Enabling large-scale seismic data acquisition, processing and waveform-inversion via rank-minimization (Rajiv Kumar, 2017)

Enabling large-scale seismic data acquisition, processing and waveform-inversion via rank-minimization

by

Rajiv Kumar

B.Sc., Hindu College, Delhi University, 2006
M.Sc., Indian Institute of Technology, Bombay, 2008

A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Faculty of Graduate and Postdoctoral Studies (Geophysics)

The University of British Columbia (Vancouver)

August 2017

© Rajiv Kumar, 2017

Abstract

In this thesis, I adapt ideas from the field of compressed sensing to mitigate the computational and memory bottlenecks of seismic processing workflows such as missing-trace interpolation, source separation and wave-equation based inversion for large-scale 3- and 5-D seismic data. For interpolation and source separation using rank-minimization, I propose three main ingredients, namely a rank-revealing transform domain, a subsampling scheme that increases the rank in the transform domain, and a practical large-scale data-consistent rank-minimization framework, which avoids the need for expensive computation of singular value decompositions. I also devise a wave-equation based factorization approach that removes computational bottlenecks and provides access to the kinematics and amplitudes of full-subsurface offset extended images via actions of full extended image volumes on probing vectors, which I use to perform amplitude-versus-angle analyses and automatic wave-equation migration velocity analyses in complex geological environments. After a brief overview of matrix completion techniques in Chapter 1, I propose a singular value decomposition (SVD)-free factorization based rank-minimization approach for large-scale matrix completion problems. I then extend this framework to deal with large-scale seismic data interpolation problems, where I show that the standard approach of partitioning the seismic data into windows, which relies on the fact that events become approximately linear within each window, is not required when the low-rank structure of seismic data is exploited. Carefully selected synthetic and realistic seismic data examples validate the efficacy of the interpolation framework. Next, I extend the SVD-free rank-minimization approach to remove the seismic cross-talk in simultaneous source acquisition. Experimental results verify that source separation using the SVD-free rank-minimization approach is comparable in quality to sparsity-promotion based techniques; however, separation via rank-minimization is significantly faster and more memory efficient. I further introduce a matrix-vector formulation to form full-subsurface extended image volumes, which removes the storage and computational bottleneck found in conventional methods. I demonstrate that the proposed matrix-vector formulation can be used to form different image gathers with which amplitude-versus-angle and wave-equation migration velocity analyses are performed, without requiring prior information on the geologic dips. Finally, I conclude the thesis by outlining potential future research directions and extensions of the thesis work.

Lay Summary

In this thesis, I have developed fast computational techniques for large-scale seismic applications, using an SVD-free factorization based rank-minimization approach for missing-trace interpolation and source separation.
The proposed framework is built upon the existing knowledge of compressed sensing as a successful signal recovery paradigm and outlines the necessary components of a low-rank domain, a rank-increasing sampling scheme, and an SVD-free rank-minimizing optimization scheme for successful missing-trace interpolation. I also proposed a matrix-vector formulation that avoids the explicit storage of full-subsurface offset extended image volumes and removes the expensive loop over shots found in conventional extended imaging.

Preface

All of the thesis work presented herein was carried out under the supervision of Dr. Felix J. Herrmann in the Seismic Laboratory for Imaging and Modeling at the University of British Columbia.

Chapter 1 was prepared by me. Figures related to seismic data survey designs are taken from the publicly available seismic literature, with citations added at the appropriate locations in the chapter.

A version of Chapter 2 was published in A. Aravkin, R. Kumar, H. Mansour, B. Recht, and F. J. Herrmann, "Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation", SIAM Journal on Scientific Computing, 36(5):S237-S266, 2014. F. Herrmann developed the idea of SVD-free matrix completion techniques for seismic data. B. Recht was involved in the initial theoretical discussion on matrix completion techniques. I collaborated with A. Aravkin and H. Mansour on the theoretical development of the SVD-free matrix completion framework, along with its robust and weighted extensions. A. Aravkin coded the SPG-LR framework. I coded all the experiments and prepared all the figures, along with the parameters related to the experiments in this chapter. The manuscript was mainly written by A. Aravkin and H. Mansour, and I contributed to the numerical and experimental section. F. Herrmann and B. Recht were involved in the final manuscript editing.

A version of Chapter 3 was published in R. Kumar, C. D. Silva, O. Akalin, A. Y. Aravkin, H. Mansour, B. Recht, and F. J. Herrmann, "Efficient matrix completion for seismic data reconstruction", Geophysics, 80(05): V97-V114, 2015. All the authors were involved in the initial layout of this work and manuscript edits. I, along with C. D. Silva, was responsible for the manuscript and examples. I performed all the matrix completion examples using SPG-LR, along with the real data case study and the comparison to sparsity-promotion based techniques. O. Akalin performed the Jellyfish comparison example and C. D. Silva performed the ADMM and LmaFit comparison examples.

A version of Chapter 4 was published in R. Kumar, H. Wason, and F. J. Herrmann, "Source separation for simultaneous towed-streamer marine acquisition: a compressed sensing approach", Geophysics, 80(06): WD73-WD88, 2015. H. Wason identified the problem of source separation techniques in the seismic literature. I, along with H. Wason, developed the theoretical framework to solve the source separation problem using compressed sensing. I coded the underlying solver using rank-minimization based techniques to solve the source separation problem. F. Herrmann proposed to incorporate the HSS techniques in the rank-minimization framework. I performed all the experiments using the rank-minimization based techniques, and H. Wason performed all the experiments using the sparsity-promotion based techniques. I, along with H. Wason, wrote the manuscript, and F. Herrmann was involved in the final manuscript editing.
H. Wason and I made an equal contribution (approximately 50%) in developing this research idea for large-scale seismic data applications. This chapter appears as Chapter 7 in H. Wason's dissertation. H. Wason has also granted permission for this chapter to appear in my dissertation.

A version of Chapter 5 was published in R. Kumar, S. Sharan, H. Wason, and F. J. Herrmann, "Time-jittered marine acquisition: a rank-minimization approach for 5D source separation", SEG Technical Program Expanded Abstracts, 2016, p. 119-123. This was my original contribution, wherein I proposed the mathematical formulation, coded it and performed the verification examples. S. Sharan helped in the simulation of continuous time-domain data. I wrote the abstract and all the coauthors were involved in editing it.

A version of Chapter 6 was published in T. van Leeuwen, R. Kumar, F. J. Herrmann, "Enabling affordable omnidirectional subsurface extended image volumes via probing", Geophysical Prospecting, 1365-2478, 2016. T. van Leeuwen and F. Herrmann proposed the idea of probing techniques for extended images. T. van Leeuwen provided the initial framework to compute the extended image volumes. I coded the AVA, 3D extended image volume and wave-equation migration velocity analysis examples. I, along with T. van Leeuwen, wrote the initial manuscript, and F. Herrmann provided substantial editing of the manuscript during the review process.

MATLAB and its parallel computing toolbox have been used to prepare all the examples in this thesis. The ℓ1 solver is a public toolbox by Ewout van den Berg and Michael P. Friedlander, whereas the SPG-LR solver was coded by Aleksandr Y. Aravkin. The Curvelet toolbox is provided by Emmanuel Candes, Laurent Demanet, David Donoho and Lexing Ying.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
    1.1 Problem statement
    1.2 Objectives
    1.3 Contributions
    1.4 Outline
2 Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation
    2.1 Summary
    2.2 Introduction
    2.3 Regularization formulations
    2.4 Factorization approach to rank optimization
    2.5 Local minima correspondence between factorized and convex formulations
    2.6 LR-BPDN algorithm
        2.6.1 Initialization
        2.6.2 Increasing k on the fly
        2.6.3 Computational efficiency
    2.7 Robust formulations
    2.8 Reweighting
        2.8.1 Projection onto the weighted Frobenius norm ball
        2.8.2 Traversing the Pareto curve
    2.9 Numerical experiments
        2.9.1 Collaborative filtering
        2.9.2 Seismic missing-trace interpolation
    2.10 Conclusions
    2.11 Appendix
3 Efficient matrix completion for seismic data reconstruction
    3.1 Summary
    3.2 Introduction
        3.2.1 Contributions
        3.2.2 Notation
    3.3 Structured signal recovery
    3.4 Low-rank promoting data organization
        3.4.1 2D seismic data
        3.4.2 3D seismic data
    3.5 Large scale data reconstruction
        3.5.1 Large scale matrix completion
        3.5.2 Large scale tensor completion
    3.6 Experiments
        3.6.1 2D seismic data
        3.6.2 3D seismic data
    3.7 Discussion
    3.8 Conclusion
4 Source separation for simultaneous towed-streamer marine acquisition—a compressed sensing approach
    4.1 Summary
    4.2 Introduction
        4.2.1 Motivation
        4.2.2 Contributions
    4.3 Theory
        4.3.1 Rank-revealing "transform domain"
        4.3.2 Hierarchical semi-separable matrix representation (HSS)
        4.3.3 Large-scale seismic data: SPG-LR framework
    4.4 Experiments
        4.4.1 Comparison with NMO-based median filtering
        4.4.2 Remark
    4.5 Discussion
    4.6 Conclusions
5 Large-scale time-jittered simultaneous marine acquisition: rank-minimization approach
    5.1 Summary
    5.2 Introduction
    5.3 Methodology
    5.4 Experiments & results
    5.5 Conclusions
6 Enabling affordable omnidirectional subsurface extended image volumes via probing
    6.1 Summary
    6.2 Introduction
        6.2.1 Notation
    6.3 Anatomy & physics
    6.4 Computational aspects
    6.5 Case study 1: computing gathers
        6.5.1 Numerical results in 2-D
        6.5.2 Numerical results in 3-D
    6.6 Case study 2: dip-angle gathers
        6.6.1 Numerical results
    6.7 Case study 3: wave-equation migration-velocity analysis (WEMVA)
        6.7.1 Numerical results
    6.8 Discussion
    6.9 Conclusions
7 Conclusions
    7.1 Main contributions
        7.1.1 SVD-free factorization based matrix completion
        7.1.2 Enabling computation of omnidirectional subsurface extended image volumes
    7.2 Follow-up work
    7.3 Current limitations
    7.4 Future extensions
        7.4.1 Extracting on-the-fly information
        7.4.2 Compressing full-subsurface offset extended image volumes
        7.4.3 Comparison of MVA and FWI
Bibliography

List of Tables

Table 2.1  Summary of the computational time (in seconds) for LR-BPDN, measuring the effect of random versus smart ([Jain et al., 2013, Algorithm 1]) initialization of L and R for factor rank k and relative error level η for (BPDNη). Comparison performed on the 1M MovieLens dataset. The type of initialization had almost no effect on the quality of the final reconstruction.

Table 2.2  Summary of the recovery results on the MovieLens (1M) data set for factor rank k and relative error level η for (BPDNη). SNR in dB (higher is better) is listed in the left table, and RMSE (lower is better) in the right table. The last row in each table gives recovery results for the non-regularized data-fitting factorized formulation solved with Riemannian optimization (ROPT). Quality degrades with k due to overfitting for the non-regularized formulation, and improves with k when regularization is used.

Table 2.3  Summary of the computational timing (in seconds) on the MovieLens (1M) data set for factor rank k and relative error level η for (BPDNη). The last row gives computational timing for the non-regularized data-fitting factorized formulation solved with Riemannian optimization.

Table 2.4  Nuclear norms of the solutions X = LR^T for the results in Table 2.2, corresponding to τ values in (LASSOτ). These values are found automatically via root finding, but are difficult to guess ahead of time.

Table 2.5  Classic SPGL1 (using a Lanczos-based truncated SVD) versus LR factorization on the MovieLens (10M) data set (10000 × 20000 matrix); shows results for a fixed iteration budget (100 iterations) for 50% subsampling of the MovieLens data. SNR, RMSE and computational time are shown for k = 5, 10, 20.

Table 2.6  LR method on the Netflix (100M) data set (17770 × 480189 matrix); shows results for 50% subsampling of the Netflix data. SNR, computational time and RMSE are shown for factor rank k and relative error level η for (BPDNη).

Table 2.7  TFOCS versus classic SPGℓ1 (using direct SVDs) versus LR factorization. The synthetic low-rank example shows results for completing a rank 10, 100 × 100 matrix with 50% missing entries; SNR, computational time and iterations are shown for η = 0.1, 0.01, 0.005, 0.0001, with the rank of the factors taken to be 10. The seismic example shows results for matrix completion of a low-frequency slice at 10 Hz, extracted from the Gulf of Suez data set, with 50% missing entries; SNR, computational time and iterations are shown for η = 0.2, 0.1, 0.09, 0.08, with the rank of the factors taken to be 28.

Table 3.1  Curvelet versus matrix completion (MC). Real data results for completing a frequency slice of size 401 × 401 with 50% and 75% missing sources. Left: 10 Hz (low frequency); right: 60 Hz (high frequency).
SNR, computational time, and number of iterations are shown for varying levels of η = 0.08, 0.1.

Table 3.2  Single reflector data results. The recovery quality (in dB) and the computational time (in minutes) are reported for each method. The quality suffers significantly as the window size decreases, due to the smaller redundancy of the input data, as discussed previously.

Table 3.3  3D seismic data results. The recovery quality (in dB) and the computational time (in minutes) are reported for each method.

Table 4.1  Comparison of computational time (in hours), memory usage (in GB) and average SNR (in dB) using sparsity-promoting and rank-minimization based techniques for the Marmousi model.

Table 4.2  Comparison of computational time (in hours), memory usage (in GB) and average SNR (in dB) using sparsity-promoting and rank-minimization based techniques for the Gulf of Suez dataset.

Table 4.3  Comparison of computational time (in hours), memory usage (in GB) and average SNR (in dB) using sparsity-promoting and rank-minimization based techniques for the BP model.

Table 6.1  Correspondence between continuous and discrete representations of the image volume. Here, ω represents frequency, x represents subsurface positions, and (i, j) represents the subsurface grid points. The colon (:) notation extracts a vector from e at the grid point i, j for all subsurface offsets.

Table 6.2  Computational complexity of the two schemes in terms of the number of sources Ns, receivers Nr, sample points Nx and desired number of subsurface offsets in each direction Nh{x,y,z}.

Table 6.3  Comparison of the computational time (in seconds) and memory (in megabytes) for computing a CIP gather on a central part of the Marmousi model. We can see the significant difference in time and memory using the probing techniques compared to the conventional method, and we expect this difference to be greatly exacerbated for realistically sized models.

List of Figures

Figure 1.1  Schematic representation of (a) marine and (b) land seismic data acquisition. Source [Enjolras, January 24 2017, RigZone, January 24 2017].

Figure 1.2  Illustration of various marine acquisition geometries, namely towed-streamer (1), an ocean bottom geometry (2), buried seafloor array (3), and Vertical Seismic Profile (4). All the seismic surveys involve a source (S), which is typically an airgun for a marine survey, and receivers (black dots) that are mainly hydrophones and/or 3-component geophones. Source Caldwell and Walker [January 24 2017].

Figure 1.3  This table summarizes the different types of marine seismic surveys. Source Caldwell and Walker [January 24 2017].

Figure 1.4  Here, we illustrate the basic difference between the 2D and 3D survey geometry. The area covered by the two surveys is exactly identical, as suggested by the dashed contour lines. Source Caldwell and Walker [January 24 2017].
Figure 1.5  Various types of seismic displays: (a) wiggle trace, (b) variable area, (c) variable area wiggle trace, and (d) variable density. Copyright: Conoco Inc.

Figure 1.6  Common-receiver gather. (a) Fully sampled and (b) 50% subsampled. The final goal is to recover the fully-sampled data from the subsampled data with minimal loss of coherent energy. (c, e) Reconstruction results from two different types of interpolation and (d, f) corresponding residual plots. We can see that the interpolation results in (e, f) are better than (c, d) because the energy loss is small, especially at the cusp of the common-receiver gather.

Figure 1.7  Simultaneous long-offset acquisition, where an extra source vessel is deployed, sailing one spread-length ahead of the main seismic vessel (see Chapter 4 for more details). We record overlapping shot records in the field and separate them into non-overlapping shots using source separation based techniques.

Figure 1.8  Over/under acquisition is an instance of low variability in source firing times, i.e., the two sources fire within 1 (or 2) seconds of each other (see Chapter 4 for more details).

Figure 1.9  Time-jittered marine continuous acquisition, where a single source vessel sails across an ocean-bottom array firing two airgun arrays at jittered source locations and time instances, with receivers recording continuously.

Figure 2.1  Gaussian (black dashed line), Laplace (red dash-dotted line), and Student's t (blue solid line); densities (left plot), negative log likelihoods (center plot), and influence functions (right plot). The Student's t-density has heavy tails, a non-convex log-likelihood, and a re-descending influence function.

Figure 2.2  Frequency slices of a seismic line from the Gulf of Suez with 354 shots, 354 receivers. Full data for (a) low frequency at 12 Hz and (b) high frequency at 60 Hz in the s-r domain. 50% subsampled data for (c) low frequency at 12 Hz and (d) high frequency at 60 Hz in the s-r domain. Full data for (e) low frequency at 12 Hz and (f) high frequency at 60 Hz in the m-h domain. 50% subsampled data for (g) low frequency at 12 Hz and (h) high frequency at 60 Hz in the m-h domain.

Figure 2.3  Singular value decay of fully sampled (a) low frequency slice at 12 Hz and (c) high frequency slice at 60 Hz in the (s-r) and (m-h) domains. Singular value decay of 50% subsampled (b) low frequency slice at 12 Hz and (d) high frequency data at 60 Hz in the (s-r) and (m-h) domains. Notice that for both high and low frequencies, the decay of singular values is faster in the fully sampled (m-h) domain than in the fully sampled (s-r) domain, and that subsampling does not significantly change the decay of singular values in the (s-r) domain, while it destroys the fast decay of singular values in the (m-h) domain.

Figure 2.4  Recovery results for 50% subsampled 2D frequency slices using the nuclear norm formulation. (a) Interpolation and (b) residual of the low frequency slice at 12 Hz with SNR = 19.1 dB. (c) Interpolation and (d) residual of the high frequency slice at 60 Hz with SNR = 15.2 dB.

Figure 2.5  Missing trace interpolation of a seismic line from the Gulf of Suez. (a) Ground truth. (b) 50% subsampled common shot gather. (c) Recovery result with an SNR of 18.5 dB. (d) Residual.
Figure 2.6  Matricization of a 4D monochromatic frequency slice. Top: (Source x, Source y) matricization. Bottom: (Source x, Receiver x) matricization. Left: fully sampled data; right: subsampled data.

Figure 2.7  Singular value decay for different matricizations of a 4D monochromatic frequency slice. Left: fully sampled data; right: subsampled data.

Figure 2.8  Missing-trace interpolation of a frequency slice at 12.3 Hz extracted from a 5D data set, 75% missing data. (a,b,c) Original, recovery and residual of a common shot gather with an SNR of 11.4 dB at a location where a shot is recorded. (d,e,f) Interpolation of common shot gathers at a location where no reference shot is present.

Figure 2.9  Missing-trace interpolation of a frequency slice at 12.3 Hz extracted from a 5D data set, 50% missing data. (a,b,c) Original, recovery and residual of a common shot gather with an SNR of 16.6 dB at a location where a shot is recorded. (d,e,f) Interpolation of common shot gathers at a location where no reference shot is present.

Figure 2.10  Comparison of regularized and non-regularized formulations. SNR of (a) low frequency slice at 12 Hz and (b) high frequency slice at 60 Hz over a range of factor ranks. Without regularization, recovery quality decays with factor rank due to over-fitting; the regularized formulation improves with higher factor rank.

Figure 2.11  Comparison of interpolation and denoising results for the Student's t and least-squares misfit functions. (a) 50% subsampled common receiver gather with another 10% of the shots replaced by large errors. (b) Recovery result using the least-squares misfit function. (c,d) Recovery and residual results using the Student's t misfit function with an SNR of 17.2 dB.

Figure 2.12  Residual error for recovery of the 11 Hz slice (a) without weighting and (b) with weighting using the true support. SNR in this case is improved by 1.5 dB.

Figure 2.13  Residual of the low frequency slice at 11 Hz (a) without weighting and (c) with support from the 10.75 Hz frequency slice; SNR is improved by 0.6 dB. Residual of the low frequency slice at 16 Hz (b) without weighting and (d) with support from the 15.75 Hz frequency slice; SNR is improved by 1 dB. Weighting using the learned support is able to improve on the unweighted interpolation results.

Figure 2.14  Recovery results for a practical scenario using the weighted factorized formulation over a frequency range of 9-17 Hz. The weighted formulation outperforms the non-weighted one for higher frequencies. For some frequency slices, the performance of the non-weighted algorithm is better, because the weighted algorithm can be negatively affected when the subspaces are less correlated.

Figure 3.1  Singular value decay in the source-receiver and midpoint-offset domains. Left: fully sampled frequency slices. Right: 50% missing shots. Top: low frequency slice. Bottom: high frequency slice. Missing-source subsampling increases the singular values in the (midpoint-offset) domain instead of decreasing them in the (src-rec) domain.

Figure 3.2  A frequency slice from the seismic dataset from the Nelson field. Left: fully sampled data. Right: 50% subsampled data.
Top: source-receiver domain. Bottom: midpoint-offset domain.

Figure 3.3  (xrec, yrec) matricization. Top: full data volume. Bottom: 50% missing sources. Left: fully sampled data. Right: zoom plot.

Figure 3.4  (ysrc, yrec) matricization. Top: fully sampled data. Bottom: 50% missing sources. Left: full data volume. Right: zoom plot. In this domain, the sampling artifacts are much closer to the idealized 'pointwise' random sampling of matrix completion.

Figure 3.5  Singular value decay (normalized) of the (xrec, yrec) matricization (left) and the (ysrc, yrec) matricization (right) for full data and 50% missing sources.

Figure 3.6  Missing-trace interpolation. Top: fully sampled data and 75% subsampled common receiver gather. Bottom: recovery and residual results with an SNR of 9.4 dB.

Figure 3.7  Qualitative performance of 2D seismic data interpolation over the 5-85 Hz frequency band for 50% and 75% subsampled data.

Figure 3.8  Recovery results using matrix-completion techniques. Left: interpolation in the source-receiver domain, low-frequency SNR 3.1 dB. Right: difference between true and interpolated slices. Since the sampling artifacts in the source-receiver domain do not increase the singular values, matrix completion in this domain is unsuccessful. This example highlights the necessity of having the appropriate principles of low-rank recovery in place before a seismic signal can be interpolated effectively.

Figure 3.9  Gulf of Mexico data set. Top: fully sampled monochromatic slice at 7 Hz. Bottom left: fully sampled data (zoomed in the square block). Bottom right: 80% subsampled sources. For visualization purposes, the subsequent figures only show the interpolated result in the square block.

Figure 3.10  Reconstruction errors for the frequency slices at 7 Hz (left) and 20 Hz (right) in the case of 80% subsampled sources. Rank-minimization based recovery with an SNR of 14.2 dB and 11.0 dB, respectively.

Figure 3.11  Frequency-wavenumber spectrum of the common receiver gather. Top left: fully-sampled data. Top right: periodically subsampled data with 80% missing sources. Bottom left: uniform-random subsampled data with 80% missing sources. Bottom right: reconstruction of uniformly-random subsampled data using rank-minimization based techniques. While periodic subsampling creates aliasing, uniform-random subsampling turns the aliases into incoherent noise across the spectrum.

Figure 3.12  Gulf of Mexico data set, common receiver gather. Left: uniformly-random subsampled data with 80% missing sources. Middle: reconstruction results using rank-minimization based techniques (SNR = 7.8 dB). Right: residual.

Figure 3.13  Missing-trace interpolation (80% sub-sampling) in the case of geological structures with a fault. Left: 80% sub-sampled data. Middle: after interpolation (SNR = 23 dB). Right: difference.

Figure 3.14  ADMM data fit + recovery quality (SNR) for single reflector data, common receiver gather.
Middle row: recovered slices; bottom row: residuals corresponding to each method in the middle row. Tensor-based windowing appears to visibly degrade the results, even with overlap.

Figure 3.15  BG 5D seismic data, 12.3 Hz, 75% missing sources. Middle row: interpolation results; bottom row: residuals.

Figure 3.16  BG 5D seismic data, 4.68 Hz. Comparison of interpolation results with and without windowing using Jellyfish for 75% missing sources. Top row: interpolation results for differing window sizes; bottom row: residuals.

Figure 4.1  Monochromatic frequency slice at 5 Hz in the source-receiver (s-r) and midpoint-offset (m-h) domains for blended data (a,c) with periodic firing times and (b,d) with uniformly random firing times for both sources.

Figure 4.2  Decay of singular values for a frequency slice at (a) 5 Hz and (b) 40 Hz of blended data. Source-receiver domain: blue—periodic, red—random delays. Midpoint-offset domain: green—periodic, cyan—random delays. Corresponding decay of the normalized curvelet coefficients for a frequency slice at (c) 5 Hz and (d) 40 Hz of blended data, in the source-channel domain.

Figure 4.3  Monochromatic frequency slice at 40 Hz in the s-r and m-h domains for blended data (a,c) with periodic firing times and (b,d) with uniformly random firing times for both sources.

Figure 4.4  HSS partitioning of a high-frequency slice at 40 Hz in the s-r domain: (a) first level, (b) second level, for randomized blended acquisition.

Figure 4.5  (a,b,c) First-level sub-block matrices (from Figure 4.4a).

Figure 4.6  Decay of singular values of the HSS sub-blocks in the s-r domain: red—Figure 4.5a, black—Figure 4.5b, blue—Figure 4.5c.

Figure 4.7  Original shot gather of (a) source 1, (b) source 2, and (c) the corresponding blended shot gather for simultaneous over/under acquisition simulated on the Marmousi model. (d, e) Corresponding common-channel gathers for each source and (f) the blended common-channel gather.

Figure 4.8  Original shot gather of (a) source 1, (b) source 2, and (c) the corresponding blended shot gather for simultaneous over/under acquisition from the Gulf of Suez dataset. (d, e) Corresponding common-channel gathers for each source and (f) the blended common-channel gather.

Figure 4.9  Original shot gather of (a) source 1, (b) source 2, and (c) the corresponding blended shot gather for simultaneous long offset acquisition simulated on the BP salt model. (d, e) Corresponding common-channel gathers for each source and (f) the blended common-channel gather.

Figure 4.10  Deblended shot gathers and difference plots (from the Marmousi model) of source 1 and source 2: (a,c) deblending using HSS based rank-minimization and (b,d) the corresponding difference plots; (e,g) deblending using curvelet-based sparsity-promotion and (f,h) the corresponding difference plots.
Figure 4.11  Deblended common-channel gathers and difference plots (from the Marmousi model) of source 1 and source 2: (a,c) deblending using HSS based rank-minimization and (b,d) the corresponding difference plots; (e,g) deblending using curvelet-based sparsity-promotion and (f,h) the corresponding difference plots.

Figure 4.12  Deblended shot gathers and difference plots (from the Gulf of Suez dataset) of source 1 and source 2: (a,c) deblending using HSS based rank-minimization and (b,d) the corresponding difference plots; (e,g) deblending using curvelet-based sparsity-promotion and (f,h) the corresponding difference plots.

Figure 4.13  Deblended common-channel gathers and difference plots (from the Gulf of Suez dataset) of source 1 and source 2: (a,c) deblending using HSS based rank-minimization and (b,d) the corresponding difference plots; (e,g) deblending using curvelet-based sparsity-promotion and (f,h) the corresponding difference plots.

Figure 4.14  Deblended shot gathers and difference plots (from the BP salt model) of source 1 and source 2: (a,c) deblending using HSS based rank-minimization and (b,d) the corresponding difference plots; (e,g) deblending using curvelet-based sparsity-promotion and (f,h) the corresponding difference plots.

Figure 4.15  Deblended common-channel gathers and difference plots (from the BP salt model) of source 1 and source 2: (a,c) deblending using HSS based rank-minimization and (b,d) the corresponding difference plots; (e,g) deblending using curvelet-based sparsity-promotion and (f,h) the corresponding difference plots.

Figure 4.16  Signal-to-noise ratio (dB) over the frequency spectrum for the deblended data from the Marmousi model. Red, blue curves—deblending without HSS; cyan, black curves—deblending using second-level HSS partitioning. Solid lines—separated source 1; + marker—separated source 2.

Figure 4.17  Blended common-midpoint gathers of (a) source 1 and (e) source 2 for the Marmousi model. Deblending using (b,f) NMO-based median filtering, (c,g) rank-minimization and (d,h) sparsity-promotion.

Figure 4.18  Blended common-midpoint gathers of (a) source 1 and (e) source 2 for the Gulf of Suez dataset. Deblending using (b,f) NMO-based median filtering, (c,g) rank-minimization and (d,h) sparsity-promotion.

Figure 4.19  Blended common-midpoint gathers of (a) source 1 and (e) source 2 for the BP salt model. Deblending using (b,f) NMO-based median filtering, (c,g) rank-minimization and (d,h) sparsity-promotion.

Figure 5.1  Aerial view of the 3D time-jittered marine acquisition. Here, we consider one source vessel with two airgun arrays firing at jittered times and locations. Starting from point a, the source vessel follows the acquisition path shown by black lines and ends at point b. The receivers are placed at the ocean bottom (red dashed lines).

Figure 5.2  Schematic representation of the sampling-transformation operator A during the forward operation. The adjoint of the operator A follows accordingly. (a, b, c) represent a monochromatic data slice from the conventional data volume and (d) represents a time slice from the continuous data volume.

Figure 5.3  Monochromatic slice at 10.0 Hz.
Fully sampled data volume and simultaneous data volume matricized as (a, c) i = (nsx, nsy), and (b, d) i = (nrx, nsx). (e) Decay of singular values. Notice that the fully sampled data organized as i = (nsx, nsy) has slower decay of the singular values (solid red curve) compared to the i = (nrx, nsx) organization (solid blue curve). However, the sampling-restriction operator slows the decay of the singular values in the i = (nrx, nsx) organization (dotted blue curve) compared to the i = (nsx, nsy) organization (dotted red curve), which is a favorable scenario for the rank-minimization formulation.

Figure 5.4  Source separation recovery. (a) A shot gather from the conventional data; (b) a section of 30 seconds from the continuous time-domain simultaneous data; (c) recovered data obtained by applying the adjoint of the sampling operator M; (d) data recovered via the proposed formulation (SNR = 20.8 dB); (e) difference of (a) and (d), where amplitudes are magnified by a factor of 8 to illustrate a very small loss in coherent energy.

Figure 6.1  Different slices through the 4-dimensional image volume e(z, z′, x, x′) around z = zk and x = xk. (a) Conventional image e(z, z, x, x), (b) image gather for horizontal and vertical offset e(z, z′, xk, x′), (c) image gather for horizontal offset e(z, z, xk, x′) and (d) image gather for a single scattering point e(zk, z′, xk, x′). (e-g) show how these slices are organized in the matrix representation of e.

Figure 6.2  Migrated images for a wrong (a) and the correct (b) background velocity are shown, with 3 locations at which we extract CIPs for a wrong (c) and the correct (d) velocity. The CIPs contain many events that do not necessarily focus. However, these events are located along the line normal to the reflectors. Therefore, it seems feasible to generate multiple CIPs simultaneously as long as they are well separated laterally. A possible application of this is the extraction of CIGs at various lateral positions. CIGs at x = 1500 m and x = 2500 m for a wrong (e) and the correct (f) velocity indeed show little evidence of crosstalk, allowing us to compute several CIGs at the cost of a single CIG.

Figure 6.3  (a) Compass 3D synthetic velocity model provided to us by the BG Group. (b) A CIP gather at (x, y, z) = (1250, 1250, 390) m. The proposed method (Algorithm 2) is 1500 times faster than the classical method (Algorithm 1) in generating the CIP gather.

Figure 6.4  Cross-sections of the Compass 3D velocity model (Figure 6.3 (a)) along the (a) x and (b) y directions.

Figure 6.5  Slices extracted along the horizontal (a,b) and vertical (c) offset directions from the CIP gather shown in Figure 6.3 (b).

Figure 6.6  Schematic depiction of the scattering point and related positioning of the reflector.

Figure 6.7  (a) Horizontal one-layer velocity model and (b) constant density model. The CIP location is x = 1250 m and z = 400 m. (c) Modulus of angle-dependent reflectivity coefficients at the CIP. The black lines are included to indicate the effective aperture at depth. The red lines are the theoretical reflectivity coefficients and the blue lines are the wave-equation based reflectivity coefficients.
Figure 6.8  Angle-dependent reflectivity coefficients in the case of a horizontal four-layer (a) velocity and (b) density model at x = 1250 m. Modulus of angle-dependent reflectivity coefficients at (c) z = 200 m, (d) z = 600 m, (e) z = 1000 m, (f) z = 1400 m.

Figure 6.9  Estimation of local geological dip. (a,b) Two-layer model. (c) CIP gather at x = 2250 m and z = 960 m overlaid on the dipping model. (d) Stack-power versus dip-angle. We can see that the maximum stack-power corresponds to a dip value of 10.8°, which is close to the true dip value of 11°.

Figure 6.10  Modulus of angle-dependent reflectivity coefficients in the two-layer model at z = 300 and 960 m and x = 2250 m. (a) Reflectivity coefficients at z = 300 m and x = 2250 m. Reflectivity coefficients at z = 960 m (b) with no dip (θ = 0°) and (c) with the dip obtained via the method described above (θ = 10.8°).

Figure 6.11  Comparison of working with CIGs versus CIPs. (a) True velocity model. The yellow line indicates the location along which we computed the CIGs and the green dot is the location where we extracted the CIPs. (b,c) CIGs extracted along the vertical and horizontal offset directions in the case of a vertical reflector. (d) CIPs extracted along the vertical reflector (z = 1.2 km, x = 1 km). (e,f) CIGs extracted along the vertical and horizontal offset directions in the case of a horizontal reflector. (g) CIPs extracted along the horizontal reflector (z = 1.5 km, x = 4.48 km).

Figure 6.12  Randomized trace estimation. (a,b) True and initial velocity model. Objective functions for WEMVA based on the Frobenius norm, as a function of velocity perturbation, using the complete matrix (blue line) and error bars of the approximated objective function evaluated via 5 different random probings with (c) K = 10 and (d) K = 80 for the Marmousi model.

Figure 6.13  WEMVA on the Marmousi model with the probing technique for a good starting model. (a,b) True and initial velocity models. Inverted model using (c) K = 10 and (d) K = 100, respectively. We can clearly see that even 10 probing vectors are good enough to start revealing the structural information.

Figure 6.14  WEMVA on the Marmousi model with the probing technique for a poor starting velocity model and an 8-25 Hz frequency band. (a,b) True and initial velocity models. Inverted model using (c) K = 100. (d) Inverted velocity model overlaid with a contour plot of the true model perturbation. We can see that we capture the shallow complexity of the model reasonably well when working with a realistic seismic acquisition and inversion scenario.

Figure 7.1  To understand the inherent redundancy of seismic data, we analyze the decay of singular values for windowed versus non-windowed cases. We see that fully sampled seismic data volumes have the fastest decay of singular values, whereas smaller window sizes result in a slower decay rate of the singular values.

Figure 7.2  To visualize the low-rank nature of image volumes, I form a full-subsurface offset extended image volume using a subsection of the Marmousi model and analyze the decay of singular values. (a) Complex subsection of the Marmousi model with highly dipping reflectors and strong lateral variations in the velocity, and (b) the corresponding full-subsurface offset extended image volume at 5 Hz.
(c) To demonstrate the low-rank nature of image volumes, I plot the decay of singular values, where I observed that only the first 10 singular vectors are required to reach a reconstruction error of 10^-4.

Acknowledgements

First and foremost, I want to express my sincere gratitude to my advisor, Professor Dr. Felix J. Herrmann. I am deeply indebted to him for his continuous support of my research, for believing in me, and for his patience, motivation, and immense knowledge, which have made my PhD experience productive. He has taught me, both consciously and unconsciously, the best practices in conducting scientific research.

I would also like to thank my committee members, Professor Dr. Eldad Haber, Professor Dr. Chen Greif, and Professor Dr. Ozgur Yilmaz, for serving on my supervisory committee and for generously offering their time, support, and invaluable advice.

I would like to express my heartiest thanks to Dr. Aleksandr Aravkin, Dr. Tristan van Leeuwen, Dr. Hassan Mansour and Dr. Rongrong Wang for their mentorship and friendship, for sharing their expertise and experiences, and for providing valuable feedback throughout my research. I would especially like to thank our late post-doc Dr. Ernie Esser (1980-2015) for his friendship, guidance and late evening conversations on research ideas. I miss cycling with him around Vancouver.

I would like to show my special appreciation to Henryk Modzelewski and Miranda Joyce for their support, friendship and generous time during my stay at the SLIM lab, and to express my special thanks to them for being great souls, always ready to help with a smile. I also would like to thank Manjit Dosanjh and Ian Hanlon for their support and help during my first year in the SLIM group.

I am forever thankful to my colleagues at the SLIM lab for their friendship and support, and for creating a cordial working environment. Many thanks to Haneet Wason, Xiang Li, Ning Tu, Shashin Sharan, Felix Oghenekohwo, Curt Da Silva, Oscar Lopez, Zhilong Fang, Art Petrenko, Luz Angelica Caudillo Mata, Tim Lin, Brendan Smithyman, Bas Peter, Ali Alfaraj and the other members of the SLIM group.

My grateful appreciation goes to Dr. James Rickett for giving me the opportunity to do an internship with Schlumberger. I also would like to thank Dr. Can Evren Yarman and Dr. Ivan Vasconcelos for their mentorship at Schlumberger, and for creating an enjoyable working environment.

Many thanks to Dr. Eric Verschuur for providing the Gulf of Suez dataset, which I used in Chapters 2 and 4, to PGS for providing the North Sea dataset, and to Chevron for providing the Gulf of Mexico dataset that I used in Chapter 3. I also would like to thank the BG Group for providing the synthetic 3D Compass velocity model and 5D seismic data that I used in Chapters 2, 3 and 5. Many thanks to the authors of IWave, SPGℓ1, SPG-LR, Madagascar, the Marmousi velocity model and the BP salt model, which I used throughout the thesis. I also would like to acknowledge the collaboration of the SENAI CIMATEC Supercomputing Center for Industrial Innovation, Bahia, Brazil, and the support of the BG Group and the International Inversion Initiative Project.

Finally, I acknowledge the people who mean a lot to me: my family, my brother and sisters, for their continuous and unparalleled love, help and support. My heartfelt regard goes to my father-in-law and mother-in-law for their love and moral support.
I owe thanks to a very special person, my wife Monika, for making countless sacrifices to help me get to this point, and to my son Rutva for abiding my ignorance and continually providing the requisite breaks from research with his innocent smiles. Monika's unconditional love and support helped me get through this period in the most positive way, and I dedicate this milestone to her.

This work was in part financially supported by the NSERC Collaborative Research and Development Grant DNOISE II (375142-08). This research was carried out as part of the SINBAD project with support from the following organizations: BG Group, BGP, CGG, Chevron, ConocoPhillips, DownUnder GeoSolutions, Hess, Petrobras, PGS, Sub-Salt Solutions, Schlumberger, and Woodside.

Chapter 1

Introduction

Exploration geophysics is an applied branch of geophysics whose aim is to predict the physical properties of the subsurface of the earth, along with its anomalies, using data acquired over the surface of the earth. These data include seismic, gravitational, magnetic, electrical and electromagnetic measurements. Seismic methods are widely used in all the major oil and gas exploration activities around the world because of their better capability to resolve small-scale anomalies compared to other existing geophysical methods. Although the acquisition principles are identical for land and marine environments, operational details such as the geometry and type of receiver systems, the density of measurements made over a given area, and the type of sensors used differ between the two environments [Caldwell and Walker, January 24 2017]. All seismic surveys involve the application of a seismic energy source at discrete surface locations, such as a vibroseis truck or shot-hole dynamite on land, or air-guns at sea (Figure 1.1). In this thesis, I focus my investigation on issues related to marine seismic data acquisition, processing and inversion. Figure 1.2 illustrates the different receiver geometries used in marine seismic surveying, while Figure 1.3 provides a list of the different types of surveys.

Figure 1.1: Schematic representation of (a) marine and (b) land seismic data acquisition. Source [Enjolras, January 24 2017, RigZone, January 24 2017].

As outlined in [Caldwell and Walker, January 24 2017], in towed-streamer acquisition, a cable that contains the hydrophones is towed, or streamed, behind a moving vessel, where the length of the streamer can vary between 3 and 12 kilometres depending upon the depth of the geological target of interest. In ocean bottom surveys, the recording system contains a hydrophone and possibly 3-component geophones at each recording location. In vertical seismic profiling, 3-component geophones are placed along vertical and/or horizontal wells. Seismic surveys are often acquired either along single lines or over an area, termed 2D and 3D seismic acquisition, respectively. In 2D acquisition, the sources and receivers are placed along a single sail line below the sea-surface, with the underlying assumption that the reflections are generated in the 2D vertical plane lying below the sail line. The processing of 2D seismic data generates a 2D image of the subsurface with detailed geological features, i.e., a map of the locations beneath the sail line where the acoustic properties of the earth change; hence the name 2D [Caldwell and Walker, January 24 2017]. Figure 1.4 shows a standard 2D survey, where the 2D lines are placed on a grid.
While 2D surveys are economical, the gaps between the receiver lines are in kilometres, which makes interpretation of the subsurface problematic if there are strong lateral variations in the earth in the cross-line direction. In that case, the 2D assumption fails and we are not able to produce a correct image of the subsurface. This limitation is overcome using 3D acquisition (Figure 1.4), where seismic surveys are acquired over an area, resulting in a 3D image of the subsurface, hence the term 3D seismic. Although 3D surveys capture the geological features in detail, they are very expensive because they involve greater investment than 2D surveying in terms of logistics, turnaround acquisition time and sophisticated equipment. Apart from 2D and 3D, seismic surveys are often acquired repeatedly over a producing hydrocarbon field, known as 4D (time-lapse) surveys, where the interval between the surveys can be on the order of months or years. The objective of 4D surveys is to estimate reservoir changes resulting from production and/or injection of fluid in the reservoir, by comparing the different datasets acquired over a period of time [Caldwell and Walker, January 24 2017].

Figure 1.2: Illustration of various marine acquisition geometries, namely towed-streamer (1), an ocean bottom geometry (2), buried seafloor array (3), and Vertical Seismic Profile (4). All the seismic surveys involve a source (S), which is typically an airgun for a marine survey, and receivers (black dots) that are mainly hydrophones and/or 3-component geophones. Source Caldwell and Walker [January 24 2017].

Figure 1.3: This table summarizes the different types of marine seismic surveys. Source Caldwell and Walker [January 24 2017].

Figure 1.4: Here, we illustrate the basic difference between the 2D and 3D survey geometry. The area covered by the two surveys is exactly identical, as suggested by the dashed contour lines. Source Caldwell and Walker [January 24 2017].

Seismic data acquisition results in millions of recorded traces, with reflection events generated at interfaces between rock layers in the subsurface having different rock properties. Each trace displays the data associated with its common depth point as a continuous function of pressure oscillating on either side of a zero-amplitude line (Figure 1.5a). The amplitude of the wiggle reflects how large the change in rock properties is between two layers. By Society of Exploration Geophysicists (SEG) convention [AAPG, January 24 2017], a reflection event is displayed as a positive peak (the polarity is positive) if it is generated from an increase in acoustic impedance, such as from a slow-velocity shale to a high-velocity dolomite. If a reflection event is generated from a decrease in acoustic impedance, then it is displayed as a trough and the polarity is negative. This convention is called normal polarity. In seismic processing, this reflection energy is mapped to an image and/or attributes of the underground geological structures that are used to infer the physical rock properties.

Figure 1.5: Various types of seismic displays: (a) wiggle trace, (b) variable area, (c) variable area wiggle trace, and (d) variable density. Copyright: Conoco Inc.

In realistic seismic data acquisition, each subsurface point is sampled multiple times to increase the fold, where fold is a measure of the redundancy of common-midpoint seismic data, equal to the number of offset receivers that record a given data point or in a given bin.
Fold improves thesignal-to-noise ratio and reduces the random environmental noise in the data during the stacking(summation) process to produce the image of the subsurface. Various factors, such as depth andthickness of the zone of interest, surface conditions and topography, play an important role in thedesign of the seismic acquisition layout. Sources and receivers grid points are controlled by the binsize, which determine how often you sample the subsurface. Bin sizes are smaller in the target-areaof interest where the aim is to get a higher resolution image of the subsurface. Apart from the binsize, the frequency spectrum of the data controls the vertical resolution, since higher frequencieshave shorter wavelengths and provide a more detailed image of the subsurface.1.1 Problem statementRealistically, conventional oil and gas fields are increasingly difficult to explore and produce, callingfor more complex wave-equation based inversion (WEI) algorithms requiring dense long-offset sam-plings and wide-azimuthal coverage. Due to budgetary and/or physical constraints, seismic dataacquisition involves coarser sampling (sub-sampling) along either sources or receivers, i.e., seismicdata along spatial sampling grids are typically sampled below Nyquist because of cost and certainphysical constraints. However, some of the seismic data processing and imaging techniques such as4surface related multiple estimation, amplitude-versus-offset (AVO) and amplitude-versus-azimuth(AVAz) analyses require densely sampled seismic data to avoid acquisition related artifacts in theinverted model of the subsurface. To mitigate these artifacts, we rely on seismic data interpolationmethods that result in dense periodically sampled data preferably at or above Nyquist. I includedFigures 1.6 (a ,b) to show a fully sampled and 50% subsampled common-receiver gather extractedfrom a 2D seismic data acquisition where sources and receivers are placed below the sea-surface.The objective of different types of interpolation algorithms are to estimate the missing traces froman undersampled dataset with minimal loss of coherent energy as shown in Figures 1.6 (c, d) and(e,f).The practitioner also proposed to acquire simultaneous source surveys to reduce costs by re-ducing acquisition time and environmental impact (shorter disturbance of an area). Simultaneousacquisition also mitigates sampling related issues and improves the quality of seismic data, whereinsingle and/or multiple source vessels fire sources at near-simultaneous or slightly random times, re-sulting in overlapping shot records (also known as blending). In general, there are two different typesof marine source surveys, namely static and dynamic surveys. During static surveys, sources aretowed behind the source vessels and receivers are fixed at the ocean-floor, whereas during dynamicsurveys, both sources and receivers are towed behind the source vessels. 
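Since later chapters are concerned with separating such blended records, the following toy sketch (purely illustrative sizes and firing times, not an actual acquisition design) shows how a continuous blended record arises when shot records are fired at jittered times and summed into one long receiver recording:

```python
import numpy as np

rng = np.random.default_rng(0)
n_shots, n_rec, n_t, dt = 10, 50, 500, 0.004           # hypothetical survey sizes
shots = rng.standard_normal((n_shots, n_rec, n_t))     # stand-ins for conventional shot records
fire_times = np.sort(rng.uniform(0.0, 30.0, n_shots))  # jittered firing times (seconds)

n_total = int(fire_times[-1] // dt) + n_t
blended = np.zeros((n_rec, n_total))
for k, t0 in enumerate(fire_times):
    i0 = int(t0 // dt)
    blended[:, i0:i0 + n_t] += shots[k]                # overlapping shot records = "blending"
```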
The current paradigm forsimultaneous towed-streamer marine acquisition (dynamic geometry) incorporates low-variabilityin source firing times—i.e., 0 ≤ 1 or 2 seconds, since both the sources and receivers are moving.Figures 1.7 and 1.8 show two instances of dynamic geometry, namely simultaneous long offset [Longet al., 2013] and over/under [Hill et al., 2006, Moldoveanu et al., 2007, Lansley et al., 2007, Long,2009, Hegna and Parkes, 2012, Torben Hoy, 2013], where we can see the low degree of randomnessin the overlapping shots in simultaneous data that presents a challenging case for source separationaccording to compressed sensing. For static geometry, [Wason and Herrmann, 2013b, Li et al.,2013] proposed an alternate sampling strategy for simultaneous acquisition (time-jittered marine)leveraging ideas from compressed sensing (CS), where a single source vessel sails across an ocean-bottom array continuously firing two airgun arrays at jittered source locations and time instanceswith receivers recording continuously (Figure 1.9). This results in a continuous time-jittered simul-taneous recording of seismic data (Figure 1.9). Simultaneous acquisitions are economically viable,since overall acquisition becomes compressed, where we record the overlapping shot records. Whilethese cost savings are very welcome, especially in the current downturn, subsequent seismic dataprocessing and imaging workflows expect recordings where the data is collected for sequential shots.So our task is to turn continuous recordings with overlapping shots into sequential recording withthe least amount of artifacts and loss of energy at late arrival time. Therefore, a practical (simulta-neous) source separation technique is required, which aims to recover unblended (non-overlapping)data—as acquired during conventional acquisition—from simultaneous data. However, all the in-terpolation and source separation workflows for 3D seismic data acquisition results in exponentialgrowth in data volumes because the recovered seismic data contains traces on the order of millions,5and prohibitive demands on computational resources. Given the data size volumes and resources,one of the key challenges is to extract meaningful information from this huge dataset (which mayin the near future grow as large as on the order of petabytes) in computationally efficient ways, i.e.,to reduce the turnaround time of each of the processing and inversion steps, apart from storing themassive volumes of interpolated and separated seismic data on disks.1.2 ObjectivesThe primary focus of this thesis is to propose fast computational techniques that are practical androbust for large-scale seismic data processing, namely missing-trace interpolation, source separation,and wave-equation based migration velocity analysis. 
The main objectives of this work are:• to identify three basic principles of compressed sensing [Donoho, 2006b] for recovering seis-mic data volumes using rank-minimization based techniques, namely a low-rank transformdomain, a rank-increasing sampling scheme, and a practical SVD-free rank-minimizing opti-mization scheme for large-scale seismic data processing;• to decrease the cost of large-scale simultaneous source acquisition for towed streamers andocean-bottom surveys by leveraging ideas from compressed sensing via randomization insource locations and time instances followed by (SVD-free) computationally efficient sourceseparation;• to derive a computationally feasible two-way wave-equation based factorization principle thatgives us access to the kinematics and amplitudes of full subsurface offset extended image vol-umes without carrying out explicit cross-correlations between source and receiver wavefieldsfor each shot.1.3 ContributionsTo our knowledge this work represents the first instance where seismic data processing, namelymissing-trace interpolation and source separation, is performed using full 5D seismic data volumeby avoiding windowed based operations. In this work, I show the benefits of large-scale SVD-freeframework in terms of the computational time and memory requirements. We also propose a novelmatrix-vector formulation to extract information from the full-subsurface offset extended image vol-umes that overcome prohibitive computational and storage costs of forming full-subsurface offsetextended image volumes, which cannot be formed using conventional extended imaging workflows.I show that the purpose matrix-vector formulation offers new perspectives on the design and imple-mentation of workflows that exploit information embedded in various types of subsurface extendedimages. I further demonstrate the benefits of this matrix-vector formulation to obtain local infor-mation, tied to individual subsurface points that can serve as quality control for velocity analyses6(a) (b)(c) (d)(e) (f)Figure 1.6: Common-receiver gather. (a) Fully sampled and (b) 50% subsampled. The finalgoal is to recover the fully-sampled data from the subsampled data with minimal loss ofcoherent energy. (c, e) Reconstruction results from two different types of interpolationand (d, f) corresponding residual plots. We can see that the interpolation results in (e,f) are better than (c, d) because the energy loss is small, especially at the cusp of thecommon-receiver gather.7(a)Figure 1.7: Simultaneous long-offset acquisition, where an extra source vessel is deployed,sailing one spread-length ahead of the main seismic vessel (see Chapter 4 for moredetails). We record overlapping shot records in the field and separate them into non-overlapping shots using source separation based techniques.or as input to localized amplitude-versus-offset analyses, or global information that can be used todrive automatic velocity analyses without requiring prior information on the geologic dip.1.4 OutlineIn Chapters 2 and 3, we first introduce the underlying theory of matrix completion and its SVD-free approach for large-scale missing-trace interpolation problem. Next, we outline three practicalprinciples for using low-rank optimization techniques to recover missing seismic data which arebuilt upon theoretical ideas from Compressed Sensing. 
We further address the computationalchallenges of using the matrix-based techniques for seismic data reconstruction, where we proposeto use either a (SVD-free) factorization based rank-minimization framework with the Pareto curveapproach or the factorization-based parallel matrix completion framework dubbed Jellyfish. Wealso examine the popular approach of windowing a large data volume into smaller data volumesto be processed in parallel, and empirically demonstrate how such a process does not respect theinherent redundancy present in the data, degrading reconstruction quality as a result. Finally,I demonstrate these observations on carefully selected real 2D seismic surveys and synthetic 3Dseismic surveys simulated using a complex velocity model provided by the BG Group. We also show8(a)Figure 1.8: Over/Under acquisition is an instance of low-variability in source firing times,i.e, two sources are firing within 1 (or 2) seconds (see Chapter 4 for more details).the computational advantages of matrix completion techniques over sparsity-promoting techniquesand tensor-based missing-trace reconstruction techniques.In Chapter 4, I extend the low-rank optimization based approaches to perform source separationin simultaneous towed-streamer marine acquisition, where I modified the matrix completion formu-lation to separate multiple sources acquired simultaneously. We address the challenge of source sep-aration for simultaneous towed-streamer acquisitions via two compressed sensing based approaches,namely sparsity-promotion and rank-minimization. We exploit the sparse structure of seismic datain the curvelet domain and low-rank structure of seismic data in the midpoint-offset domain. Ifurther incorporate the Hierarchical Semi-Separable matrix representation in rank-minimizationframework to exploit the low-rank structure of seismic data at higher frequencies. Finally, we illus-trate the performance of both the sparsity-promotion and rank-minimization based techniques bysimulating two simultaneous towed-streamer acquisition scenarios: over/under and simultaneouslong offset. A field data example from the Gulf of Suez for the over/under acquisition scenario isalso included. I further compare these two techniques with the NMO-based median filtering typeapproach.In Chapter 5, I use rank-minimization based techniques to present a computationally tractablealgorithm to separate simultaneous time-jittered continuous recording for a 3D ocean-bottom cablesurvey. First, I formulate a factorization based rank-minimization formulation that works on the9(a)Figure 1.9: Time-jittered marine continuous acquisition, where a single source vessel sailsacross an ocean-bottom array firing two airgun arrays at jittered source locations andtime instances with receivers recording continuouslytemporal-frequency domain using all monochromatic data matrices together. Then, I show theefficacy of proposed framework on a synthetic 3D seismic survey simulated on a complex geologicalvelocity model provided by the BG Group.In Chapter 6, we exploit the redundancy in extended image volumes, by first arranging theextended image volume as a matrix, followed by probing this matrix in a way that avoids explicitstorage and removes the customary and expensive loop over shots found in conventional extendedimaging. 
As a result, we end up with a matrix-vector formulation where I form different image gathers and use them to perform amplitude-versus-angle and wave-equation migration velocity analyses, without requiring prior information on the geologic dips. Next, I show how this factorization can be used to extract information on local geological dips and radiation patterns for AVA purposes, and how full-subsurface offset extended image volumes can be used to carry out automatic WEMVA. Each application is illustrated by carefully selected stylized numerical examples on 2- and 3-D velocity models.

In Chapter 7, I conclude the work presented in this thesis and propose future research directions.

Chapter 2

Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation

A version of this chapter has been published in SIAM Journal on Scientific Computing, 2014, vol. 36, pages S237-S266.

2.1 Summary

Recent SVD-free matrix factorization formulations have enabled rank minimization for systems with millions of rows and columns, paving the way for matrix completion in extremely large-scale applications, such as seismic data interpolation.

In this paper, we consider matrix completion formulations designed to hit a target data-fitting error level provided by the user, and propose an algorithm called LR-BPDN that is able to exploit factorized formulations to solve the corresponding optimization problem. Since practitioners typically have strong prior knowledge about the target error level, this innovation makes it easy to apply the algorithm in practice, leaving only the factor rank to be determined.

Within the established framework, we propose two extensions that are highly relevant to solving practical challenges of data interpolation. First, we propose a weighted extension that allows known subspace information to improve the results of matrix completion formulations. We show how this weighting can be used in the context of frequency continuation, an essential aspect of seismic data interpolation. Second, we propose matrix completion formulations that are robust to large measurement errors in the available data.

We illustrate the advantages of LR-BPDN on the collaborative filtering problem using the MovieLens 1M, 10M, and Netflix 100M datasets. Then, we use the new method, along with its robust and subspace re-weighted extensions, to obtain high-quality reconstructions for large-scale seismic interpolation problems with real data, even in the presence of data contamination.

2.2 Introduction

Sparsity- and rank-regularization have had significant impact in many areas over the last several decades. Sparsity in certain transform domains has been exploited to solve underdetermined linear systems with applications to compressed sensing Donoho [2006a], Candès and Tao [2006], natural image denoising/inpainting Starck et al. [2005], Mairal et al. [2008], Mansour et al. [2010], and seismic image processing Herrmann and Hennenfent [2008a], Neelamani et al. [2010], Herrmann et al. [2012a], Mansour et al. [2012b]. Analogously, low-rank structure has been used to efficiently solve matrix completion problems, such as the Netflix Prize problem, along with many other applications, including control, system identification, signal processing, and combinatorial optimization Fazel [2002], Recht et al. [2010b], Candès et al. [2011], and seismic data interpolation and denoising Oropeza and Sacchi [2011].

Regularization formulations for both types of problems introduce a regularization functional of the decision variable, either by adding an explicit penalty to the data-fitting term,

$$\min_x \; \rho(\mathcal{A}(x)-b) + \lambda\|x\|, \qquad (\mathrm{QP}_\lambda)$$

or by imposing constraints,

$$\min_x \; \rho(\mathcal{A}(x)-b) \quad \text{s.t.} \quad \|x\| \le \tau. \qquad (\mathrm{LASSO}_\tau)$$

In these formulations, $x$ may be either a matrix or a vector, $\|\cdot\|$ may be a sparsity- or low-rank-promoting penalty such as the $\ell_1$ norm $\|\cdot\|_1$ or the matrix nuclear norm $\|\cdot\|_*$, $\mathcal{A}$ may be any linear operator that predicts the observed data vector $b$ of size $p \times 1$, and $\rho(\cdot)$ is typically taken to be the 2-norm.

These approaches require the user to provide regularization parameters whose values are typically not known ahead of time, and otherwise may require fitting or cross-validation procedures. The alternate formulation

$$\min_x \; \|x\| \quad \text{s.t.} \quad \rho(\mathcal{A}(x)-b) \le \eta \qquad (\mathrm{BPDN}_\eta)$$

has been successfully used for the sparse regularization of large-scale systems Berg and Friedlander [2008], and proposed for nuclear norm regularization Berg and Friedlander [2011]. The (BPDNη) formulation requires the user to provide an acceptable error bound in the data-fitting domain, and is preferable for many applications, especially when practitioners know (or are able to estimate) an approximate data error level. We refer to (BPDNη), (QPλ) and (LASSOτ) as regularization formulations, since all three limit the space of feasible solutions by considering the nuclear norm of the decision variable.

A practical implementation of (BPDNη) for large-scale matrix completion problems is difficult because of the large size of the systems of interest, which makes SVD-based approaches intractable. For example, seismic inverse problems work with 4D data volumes, and matricization of such data creates structures whose size is a bottleneck for standard low-rank interpolation approaches. Fortunately, a growing literature on factorization-based rank optimization approaches has enabled matrix completion formulations of the (QPλ) and (LASSOτ) type for extremely large-scale systems that avoid costly SVD computations Rennie and Srebro [2005b], Lee et al. [2010b], Recht and Ré [2011]. These formulations are non-convex, and therefore do not have the same convergence guarantees as their convex low-rank counterparts. In addition, they require an a priori rank specification, adding a rank constraint to the original problem. Nonetheless, factorized formulations can be shown to avoid spurious local minima, so that if a local minimum is found, it corresponds to the global minimum of the convex formulation, provided the chosen factor rank is high enough. In addition, computational methods for factorized formulations are more efficient, mainly because they can completely avoid SVD (or partial SVD) computations. In this paper, we extend the framework of Berg and Friedlander [2011] to incorporate matrix factorization ideas, enabling the (BPDNη) formulation for rank regularization of large-scale problems, such as seismic data interpolation.

While the formulations in Berg and Friedlander [2008, 2011] choose ρ in (BPDNη) to be the quadratic penalty, recent extensions Aravkin et al. [2013a] allow more general penalties to be used. In particular, robust convex (see e.g. Huber [1981]) and nonconvex penalties (see e.g. Lange et al. [1989], Aravkin et al.
[2012]) can be used to measure misfit error in the (BPDNη) formulation.We incorporate these extensions into our framework, allowing matrix completion formulations thatare robust to data contamination.Finally, subspace information can be used to inform the matrix completion problem, analogouslyto how partial support information can be used to improve the sparse recovery problem Friedlanderet al. [2011]. This idea is especially important for seismic interpolation, where frequency continua-tion is used. We show that subspace information can be incorporated into the proposed frameworkusing reweighting, and that the resulting approach can improve recovery SNR in a frequency contin-uation setting. Specifically, subspace information obtained at lower frequencies can be incorporatedinto reweighted formulations for recovering data at higher frequencies.To summarize, we design factorization-based formulations and algorithms for matrix completionthat1. Achieve a specified target misfit level provided by the user (i.e. solve (BPDNη)).2. Achieve recovery in spite of severe data contamination using robust cost functions ρ in (BPDNη)3. Incorporate subspace information into the inversion using re-weighting.14The paper proceeds as follows. In section 2.3, we briefly discuss and compare the formula-tions (QPλ), (LASSOτ ), and (BPDNη). We also review the SPG`1 algorithm Berg and Friedlander[2008] to solve (BPDNη), along with recent extensions for (BPDNη) formulations developed in Ar-avkin et al. [2013a]. In section 2.4, we formulate the convex relaxation for the rank optimizationproblem, and review SVD-free factorization methods. In section 2.5, we extend analysis from Bu-rer and Monteiro [2003] to characterize the relationship between local minima of rank-optimizationproblems and their factorized counterparts in a general setting that captures all formulations ofinterest here. In section 2.6, we propose an algorithm that combines matrix factorization with theapproach developed by Berg and Friedlander [2008, 2011], Aravkin et al. [2013a]. We develop therobust extensions in section 2.7, and reweighting extensions in section 2.8. Numerical results forboth the Netflix Prize problem and for seismic trace interpolation of real data are presented insection 2.9.2.3 Regularization formulationsEach of the three formulations (QPλ), (LASSOτ ), and (BPDNη) controls the tradeoff betweendata fitting and a regularization functional using a regularization parameter. However, there areimportant differences between them.From an optimization perspective, most algorithms solve (QPλ) or (LASSOτ ), together witha continuation strategy to modify τ or λ, see e.g., Figueiredo et al. [2007], Berg and Friedlander[2008]. There are also a variety of methods to determine optimal values of the parameters; seee.g. Giryes et al. [2011] and the references within. However, from a modeling perspective (BPDNη)has a significant advantage, since the η parameter can be directly interpreted as a noise floor, ora threshold beyond which noise is commensurate with the data. In many applications, such asseismic data interpolation, scientists have good prior knowledge of the noise floor. In the absenceof such knowledge, one still wants an algorithm that returns a reasonable solution given a fixedcomputational budget, and some formulations for solving (BPDNη) satisfy this requirement.van den Berg and Friedlander Berg and Friedlander [2008] proposed the SPG`1 algorithm foroptimizing (BPDNη) that captures the features discussed above. 
Their approach solves (BPDNη) using a series of inexact solutions to (LASSOτ). The bridge between these problems is provided by the value function $v : \mathbb{R} \rightarrow \mathbb{R}$,

$$v(\tau) = \min_x \; \rho(\mathcal{A}(x)-b) \quad \text{s.t.} \quad \|x\| \le \tau, \qquad (2.1)$$

where the particular choice $\rho(\cdot) = \|\cdot\|_2$ was made in Berg and Friedlander [2008, 2011]. The graph of $v(\tau)$ is often called the Pareto curve. The (BPDNη) problem can be solved by finding the root of $v(\tau) = \eta$ using Newton's method,

$$\tau_{k+1} = \tau_k - \frac{v(\tau_k)-\eta}{v'(\tau_k)}, \qquad (2.2)$$

where the quantities $v(\tau)$ and $v'(\tau)$ can be approximated by solving (LASSOτ) problems. In the context of sparsity optimization, (BPDNη) and (LASSOτ) are known to be equivalent for certain values of the parameters τ and η. Recently, these results were extended to a much broader class of formulations (see [Aravkin et al., 2013a, Theorem 2.1]). Indeed, convexity of ρ is not required for this theorem to hold; instead, activity of the constraint at the solution plays a key role. The main hypothesis requires that the constraint is active at any solution $\bar{x}$, i.e. $\rho(b - \mathcal{A}(\bar{x})) = \eta$ and $\|\bar{x}\| = \tau$.

For any ρ, $v(\tau)$ is non-increasing, since a larger τ allows a bigger feasible set. For any convex ρ in (2.1), $v(\tau)$ is convex by inf-projection [Rockafellar and Wets, 1998, Proposition 2.22]. When ρ is also differentiable, it follows from [Aravkin et al., 2013a, Theorem 5.2] that $v(\tau)$ is differentiable, with derivative given in closed form by

$$v'(\tau) = -\big\|\mathcal{A}^*\nabla\rho(b-\mathcal{A}\bar{x})\big\|_d, \qquad (2.3)$$

where $\mathcal{A}^*$ is the adjoint of the operator $\mathcal{A}$, $\|\cdot\|_d$ is the dual norm to $\|\cdot\|$, and $\bar{x}$ solves (LASSOτ). For example, when the norm $\|\cdot\|$ in (2.1) is the 1-norm, the dual norm is the infinity norm, and (2.3) evaluates to the maximum absolute entry of the gradient. In the matrix case, $\|\cdot\|$ is typically taken to be the nuclear norm, and then $\|\cdot\|_d$ is the spectral norm, so (2.3) evaluates to the maximum singular value of $\mathcal{A}^*\nabla\rho(r)$.

To design effective optimization methods, one has to be able to evaluate $v(\tau)$ and to compute the dual norm $\|\cdot\|_d$. Evaluating $v(\tau)$ requires solving a sequence of optimization problems (2.1) for the sequence of τ given by (2.2). A key idea that makes the approach of Berg and Friedlander [2008] very useful in practice is to solve the LASSO subproblems inexactly, with increasing precision as the overarching Newton's method proceeds. The net computation is therefore much smaller than what would be required if one solved a set of LASSO problems to a pre-specified tolerance. For large-scale systems, the method of choice is typically a first-order method, such as the spectral projected gradient method, where after taking a step along the negative gradient of the mismatch function $\rho(\mathcal{A}(x)-b)$, the iterate is projected onto the norm ball $\|\cdot\| \le \tau$. Fast projection is therefore a necessary requirement for a tractable implementation, since it is used in every iteration of every subproblem.

With the inexact strategy, the convergence rate of the Newton iteration (2.2) may depend on the conditioning of the linear operator $\mathcal{A}$ [Berg and Friedlander, 2008, Theorem 3.1]. For well-conditioned problems, in practice one often needs to solve only a few (6-10) (LASSOτ) problems to find the solution of (BPDNη) for a given η. As the optimization proceeds, (LASSOτ) problems for larger τ warm-start from the solution corresponding to the previous τ; a minimal code sketch of this outer root-finding loop is given below.

2.4 Factorization approach to rank optimization

We now consider (BPDNη) in the specific context of rank minimization. In this setting, $\|\cdot\|$ is taken to be the nuclear norm, where for a matrix $X \in \mathbb{R}^{n\times m}$, $\|X\|_* = \|\sigma\|_1$, with σ the vector of singular values.
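Before continuing, the following is a minimal Python sketch (illustrative only, not the SPGℓ1 or LR-BPDN implementation) of the outer root-finding loop of section 2.3: `solve_lasso` stands for any inexact (LASSOτ) solver, `A`/`At` for the forward operator and its adjoint, and `dual_norm` for the dual norm ‖·‖_d, which in the matrix case is the spectral norm, computable for instance with the simple power method shown in `spectral_norm` (applied after reshaping the adjoint output into a matrix). All names are assumptions made for this sketch.

```python
import numpy as np

def spectral_norm(M, n_iter=50, seed=0):
    """Largest singular value of a (dense) matrix via the power method."""
    v = np.random.default_rng(seed).standard_normal(M.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = M.T @ (M @ v)
        v /= np.linalg.norm(v)
    return np.linalg.norm(M @ v)

def pareto_root_find(A, At, b, eta, solve_lasso, dual_norm,
                     tau0=0.0, max_newton=10, tol=1e-4):
    """Schematic outer loop: find tau with v(tau) = eta, where
    v(tau) = min_x ||A(x) - b||_2  s.t.  ||x|| <= tau, solved inexactly."""
    tau, x = tau0, None
    for _ in range(max_newton):
        x = solve_lasso(tau, x)              # inexact, warm-started LASSO solve
        r = b - A(x)                         # residual at the approximate solution
        v = np.linalg.norm(r)                # value of the Pareto curve at tau
        if abs(v - eta) <= tol * max(1.0, eta):
            break
        # v'(tau) = -||A* grad rho(r)||_d ; for rho = ||.||_2, grad rho(r) = r / ||r||_2
        dv = -dual_norm(At(r)) / v
        tau = tau - (v - eta) / dv           # Newton update (2.2) on the Pareto curve
    return x, tau
```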
The dual norm in this case is ‖σ‖∞, which is relatively easy to find for verylarge systems.Unfortunately, solving the optimization problem in (2.1) is much more difficult. For the largesystem case, this requires repeatedly projecting onto the set ‖X‖∗ ≤ τ , which which means repeatedSVD or partial SVD computations. This is not feasible for large systems.Factorization-based approaches allow matrix completion for extremely large-scale systems byavoiding costly SVD computations Rennie and Srebro [2005b], Lee et al. [2010a], Recht and Re´[2011]. The main idea is to parametrize the matrix X as a product,X = LRT , (2.4)and to optimize over the factors L,R. If X ∈ Rn×m, then L ∈ Rn×k, and R ∈ Rm×k. Thedecision variable therefore has dimension k(n + m), rather than nm; giving tremendous savingswhen k  m,n. The asymptotic computational complexity of factorization approaches is the sameas that of partial SVDs, as both methods are dominated by an O(nmk) cost; the former having toform X = LRT , and the latter computing partial SVDs, at every iteration. However, in practice theformer operation is much simpler than the latter, and factorization methods outperform methodsbased on partial SVDs. In addition, factorization methods keep an explicit bound on the rank ofall iterates, which might otherwise oscillate, increasing the computational burden.Using representation (2.4), the projection problem in (LASSOτ ) is trivial. For the nuclear norm,we have Rennie and Srebro [2005b]‖X‖∗ = infX=LRT12∥∥∥∥∥[LR]∥∥∥∥∥2F, (2.5)and therefore for any partiular L,R, we have‖X‖∗ = ‖LRT ‖∗ ≤ 12∥∥∥∥∥[LR]∥∥∥∥∥2F. (2.6)The nuclear norm is not the only formulation that can be factorized. Lee et al. [2010b] haverecently introduced the max norm, which is closely related to the nuclear norm and has beensuccessfully used for matrix completion.172.5 Local minima correspondence between factorized and convexformulationsAll of the algorithms we propose for matrix completion are based on the factorization approachdescribed above. Even though the change of variables X = LRT makes the problem nonconvex,it turns out that for a surprisingly general class of problems, this change of variables does notintroduce any extraneous local minima, and in particular any local minimum of the factorized(non-convex) problem corresponds to a local (and hence global) minimum of the correspondingun-factorized convex problem. This result appeared in [Burer and Monteiro, 2003, Proposition2.3] in the context of semidefinite programming (SDP); however, it holds in general, as the authorspoint out [Burer and Monteiro, 2003, p. 431].Here, we state the result for a broad class of problems, which is general enough to capture allof our formulations of interest. In particular, the continuity of the objective function is the mainhypothesis required for this correspondence. It is worthwhile to emphasize this, since in Section 2.7,we consider smooth non-convex robust misfit penalties for matrix completion, which give impressiveresults (see figure 2.11).For completeness, we provide a proof in the appendix.Theorem 1 (General Factorization Theorem) Consider an optimization problem of the formminZ0f(Z)s.t. gi(Z) ≤ 0 i = 1, . . . , nhj(Z) = 0 j = 1, . . . ,mrank(Z) ≤ r,(2.7)where Z ∈ Rn×n is positive semidefinite, and f, gi, hi are continuous. Using the change of variableZ = SST , take S ∈ Rn×r, and consider the problemminSf(SST )s.t. gi(SST ) ≤ 0 i = 1, . . . , nhj(SST ) = 0 j = 1, . . . ,m(2.8)Let Z¯ = S¯S¯T , where Z¯ is feasible for (2.7). 
Then Z¯ is a local minimum of (2.7) if and only if S¯is a local minimum of (2.8).At first glance, Theorem 1 seems restrictive to apply to a recovery problem for a generic X,since it is formulated in terms of a PSD variable Z. However, we show that all of the formulationsof interest can be expressed this way, due to the SDP characterization of the nuclear norm.It was shown in [Recht et al., 2010b, Sec. 2] that the nuclear norm admits a semi-definite18programming (SDP) formulation. Given a matrix X ∈ Rn×m, we can characterize the nuclearnorm ‖X‖∗ in terms of an auxiliary matrix positive semidefinite marix Z ∈ R(n+m)×(n+m)‖X‖∗ = minZ012Tr(Z)subject to Z1,2 = ZT2,1 = X ,(2.9)where Z1,2 is the upper right n × m block of Z, and Z2,1 is the lower left m × n block. Moreprecisely, the matrix Z is a symmetric positive semidefinite matrix having the structureZ =[LR] [LT RT]=[LLT XXT RRT], (2.10)where L and R have the same rank as X, and Tr(Z) = ‖L‖2F + ‖R‖2F .Using characterization (2.9)-(2.10), we can show that a broad class of formulations of interestin this paper are in fact problems in the class characterized by Theorem 1.Corollary 1 (General Matrix Lasso) Any optimization problem of the formminXf(X)s.t. ‖X‖∗ ≤ τrank(X) ≤ r(2.11)where f is continuous has an equivalent problem in the class of problems (2.7) characterized byTheorem 1.Proof 1 Using (2.9), write (2.11) asminZ≥0f(R(Z))s.t. Tr(Z) ≤ τrank(Z) ≤ r,(2.12)where R(Z) extracts the upper right n × m block of Z. It is clear that if rank(Z) ≤ r, thenrank(X) ≤ r, so every solution feasible for the problem in Z is feasible for the problem in Xby (2.9). On the other hand, we can use the SVD of any matrix X of rank r to write X = LRT ,with rank(L) = rank(R) = r, and then the matrix Z in (2.10) has rank r, contains X in its upperright hand corner, and has as its trace the nuclear norm of X. In particular, if X = UΣV T , wecan use L = U√Σ, and R = V√Σ to get this representation. Therefore, every feasible point forthe X problem has a corresponding Z.192.6 LR-BPDN algorithmThe factorized formulations in the previous section have been used to design several algorithmsfor large scale matrix completion and rank minimization Lee et al. [2010b], Recht and Re´ [2011].However, all of these formulations take the form (QPλ) or (LASSOτ ). The (LASSOτ ) formulationenjoys a natural relaxation interpretation, see e.g. Herrmann et al. [2012b]; on the other hand, a lotof work has focused on methods for λ-selection in (QPλ) formulations, see e.g. Giryes et al. [2011].However, both formulations require some identification procedure of the parameters λ and τ .Instead, we propose to use the factorized formulations to solve the (BPDNη) problem by travers-ing the Pareto curve of the nuclear norm minimization problem. In particular, we integrate thefactorization procedure into the SPG`1 framework, which allows to find the minimum rank solutionby solving a sequence of factorized (LASSOτ ) subproblems (2.14). The cost of solving the factor-ized (LASSOτ ) subproblems is relatively cheap and the resulting algorithm takes advantage of theinexact subproblem strategy in Berg and Friedlander [2008].For the classic nuclear norm minimization problem, we definev(τ) = minX‖A(X)− b‖22 s.t. ‖X‖∗ ≤ τ , (2.13)and find v(τ) = η using the iteration (2.2).However, rather than parameterizing our problem with X, which requires SVD for each pro-jection, we use the factorization formulation, exploiting Theorem 1 and Corollary 1. 
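A quick numerical sanity check (illustrative only) of the identity underlying this change of variables: at the SVD-derived factors L = U√Σ and R = V√Σ used in the proof above, the bound ½(‖L‖²_F + ‖R‖²_F) ≥ ‖LRᵀ‖_* is attained with equality, which is what makes the Frobenius-norm constraint a faithful surrogate for the nuclear norm.

```python
import numpy as np

rng = np.random.default_rng(0)
# build a 60 x 40 matrix of rank 8
X = rng.standard_normal((60, 8)) @ rng.standard_normal((8, 40))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
nuclear_norm = s.sum()                      # ||X||_* = sum of singular values

k = 8                                       # factor rank (here equal to rank(X))
L = U[:, :k] * np.sqrt(s[:k])               # L = U * sqrt(Sigma)
R = Vt[:k, :].T * np.sqrt(s[:k])            # R = V * sqrt(Sigma)

bound = 0.5 * (np.linalg.norm(L, 'fro')**2 + np.linalg.norm(R, 'fro')**2)

assert np.allclose(X, L @ R.T)              # exact factorization X = L R^T
assert np.isclose(nuclear_norm, bound)      # the Frobenius bound is tight at the SVD factors
```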
Specifically, when evaluating the value function $v(\tau)$, we solve the corresponding factorized formulation

$$\min_{L,R} \; \big\|\mathcal{A}(LR^T)-b\big\|_2^2 \quad \text{s.t.} \quad \frac{1}{2}\left\| \begin{bmatrix} L \\ R \end{bmatrix} \right\|_F^2 \le \tau, \qquad (2.14)$$

using decision variables $L, R$ with a fixed number $k$ of columns each.

By Theorem 1 and Corollary 1, any local solution to this problem corresponds to a local solution of the true LASSO problem, subject to a rank constraint $\mathrm{rank}(X) \le k$. We use the residual $\mathcal{A}(LR^T)-b$ reconstructed from (2.14) to evaluate both $v(\tau)$ and its derivative $v'(\tau)$. When the rank of $L, R$ is large enough, a local minimum of (2.14) corresponds to a local minimum of (2.13), and for any convex ρ, every local minimum of (LASSOτ) is also a global minimum. When the rank of the factors $L$ and $R$ is smaller than the rank of the optimal LASSO solution, the algorithm looks for local minima of the rank-constrained LASSO problem. Unfortunately, we cannot guarantee that the solutions we find are local minima for (2.14), rather than simply stationary points. Nonetheless, this approach works quickly and reliably in practice, as we show in our experiments.

Problem (2.14) is optimized using the spectral projected gradient algorithm. The gradient is easy to compute, and the projection requires rescaling all entries of $L, R$ by a single value, which is fast, simple, and parallelizable.

To evaluate $v'(\tau)$, we use the formula (2.3) for the Newton step corresponding to the original (convex) problem in $X$; this requires computing the spectral norm (largest singular value) of

$$\mathcal{A}^*\big(b - \mathcal{A}(\bar{L}\bar{R}^T)\big),$$

where $\mathcal{A}^*$ is the adjoint of the linear operator $\mathcal{A}$, while $\bar{L}$ and $\bar{R}$ are the solutions to (2.14). The largest singular value of the above matrix can be computed relatively quickly using the power method. Again, at every update requiring $v(\tau)$ and $v'(\tau)$, we are assuming that our solution $\bar{X} = \bar{L}\bar{R}^T$ is close to a local minimum of the true LASSO problem, but we do not have theoretical guarantees of this fact.

2.6.1 Initialization

The factorized LASSO problem (2.14) has a stationary point at $L = 0$, $R = 0$. This means, in particular, that we cannot initialize from this point. Instead, we recommend initializing from a small random starting point. Another possibility is to jump-start the algorithm, for example using the initialization technique of [Jain et al., 2013, Algorithm 1]. One can compute the partial SVD of the adjoint of the linear operator $\mathcal{A}$ applied to the observed data,

$$USV^T = \mathcal{A}^*b,$$

and then initialize $L$ and $R$ as

$$L = U\sqrt{S}, \qquad R = V\sqrt{S}.$$

This initialization procedure can sometimes result in faster convergence than random initialization, and has the potential to reduce the runtime of the algorithm by 30-40% for smaller values of η; see Table 2.1. The key feature of any initialization procedure is to ensure that the starting value

$$\tau_0 = \frac{1}{2}\left\| \begin{bmatrix} L_0 \\ R_0 \end{bmatrix} \right\|_F^2$$

is less than the solution of the root-finding problem for (BPDNη), $v(\tau) = \eta$ (a small code sketch of both initializations is given below).

2.6.2 Increasing k on the fly

In factorized formulations, the user must specify a factor rank. From a computational perspective, it is better that the rank stay small; however, if it is too small, it may be impossible to solve (BPDNη) to a specified error level η. For some classes of problems, where the true rank is known ahead of time (see e.g. Candès et al. [2013]), one is guaranteed that a solution will exist for a given rank.
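Returning briefly to the initialization choices of section 2.6.1, here is a small sketch (illustrative names, not the thesis code) of both the random start and the [Jain et al., 2013]-style jump start; for matrix completion, the adjoint applied to the data, A*(b), is simply the observed entries placed in an otherwise zero matrix.

```python
import numpy as np

def smart_init(A_adj_b, k):
    """Jump-start from a rank-k partial SVD of A*(b); for matrix completion,
    A_adj_b is the data matrix with zeros at the unobserved entries."""
    U, s, Vt = np.linalg.svd(A_adj_b, full_matrices=False)
    L = U[:, :k] * np.sqrt(s[:k])
    R = Vt[:k, :].T * np.sqrt(s[:k])
    return L, R

def random_init(n, m, k, scale=1e-3, seed=0):
    """Small random starting point (avoids the stationary point L = 0, R = 0)."""
    rng = np.random.default_rng(seed)
    return scale * rng.standard_normal((n, k)), scale * rng.standard_normal((m, k))

def tau0_of(L0, R0):
    """Starting value tau_0 = (1/2)||[L0; R0]||_F^2, which should lie below the
    root of v(tau) = eta; if it does not, rescale L0 and R0 downward."""
    return 0.5 * (np.linalg.norm(L0, 'fro')**2 + np.linalg.norm(R0, 'fro')**2)
```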
Comparison performed on the1M MovieLens Dataset. Type of initialization had almost no effect on quality of finalreconstruction.Random initializationk 10 20 30 50η=0.5 3.54 5.46 4.04 8.31η=0.3 11.90 6.14 8.42 20.84η=0.2 86.53 107.88 148.12 166.92Smart initializationk 10 20 30 50η=0.5 5.01 5.75 6.84 8.24η=0.3 11.15 18.88 12.38 21.02η=0.2 58.78 84.29 95.07 114.21However, if necessary, factor rank can be adjusted on the fly within our framework.Specifically, adding columns to L and R can be done on the fly, since[L l] [R r]T= LRT + lrT .Moreover, the proposed framework for solving (BPDNη) is fully compatible with this strategy, sincethe underlying root finding is blind to the factorization representation. Changing k only affectsiteration (2.2) through v(τ) and v′(τ).2.6.3 Computational efficiencyOne way of assessing the cost of LR-BPDN is to compare the computational cost per iteration ofthe factorization constrained LASSO subproblems (2.14) with that of the nuclear norm constrainedLASSO subproblems (2.13). We first consider the cost for computing the gradient direction. Agradient direction for the factor L in the factorized algorithm is given bygL = A∗(A(LRT )− b)R,with gR taking a similar form. Compare this to a gradient direction for XgX = A∗ (A(X)− b) .First, we consider the cost incurred in working with the residual and decision variables. While bothmethods must compute the action of A∗ on a vector, the factorized formulation must modify factorsL,R (at a cost of O(k(n + m)) and re-form the matrix X = LRT (at a cost of at most O(knm),for every iteration and line search evaluation. Since A is a sampling matrix for the applicationsof interest, it is sufficient to form only the entries of X that are sampled by A, thus reducing thecost to O(kp), where p is the dimension of the measurement vector b. The sparser the sampling22operator A, the greater the savings. Standard approaches update an explicit decision variable X,at a cost of O(nm), for every iteration and line search evaluation. If the fraction sampled is smallerthan the chosen rank k, the factorized approach is actually cheaper than the standard method. Itis also important to note that standard approaches have a memory footprint of O(mn), simply tostore the decision variable. In contrast, the memory used by factorized approaches are dominatedby the size of the observed data.We now consider the difference in cost involved in the projection. The main benefit for thefactorized formulation is that projection is done using the Frobenius norm formulation (2.14), andso the cost is O(k(n+m)) for every projection. In contrast, state of the art implementations thatcompute full or partial SVDs in order to accomplish the projection (see e.g. Jain et al. [2010],Becker et al. [2011]) are dominated by the cost of this calculation, which is (in the case of partialk-SVD) O(nmk), assuming without loss of generality that k ≤ min(m,n).While the complexity of both standard and factorized iterations is dominated by the termO(mnk), in practice forming X = LRT from two factors with k columns each is still cheaper thancomputing a k-partial SVD of X. This essentially explains why factorized methods are faster. Whileit is possible to obtain further speed up for standard methods using inexact SVD computations,the best reported improvement is a factor of two or three Lin and Wei [2010]. To test our approachagainst a similar approach that uses Lanczos to compute partial SVDs, we modified the projectionused by the SPGL1 code to use this acceleration. 
We compare against this accelerated code, aswell as against TFOCS Becker et al. [2011] in section 2.9 (see Table 2.7).Finally, both standard and factorized versions of the algorithm require computing the maximumsingular value in order to compute v′(τ). The analysis in section 2.5 shows that if the chosenrank of the factors L and R is larger than or equal to the rank of the global minimizers of thenuclear norm LASSO subproblems, then any local minimizer of the factorized LASSO subproblemcorresponds to a global minimizer for the convex nuclear norm LASSO formulation. Consequently,both formulations will have similar of Pareto curve updates, since the derivates are necessarilyequal at any global minimum whenever ρ is strictly convex1.2.7 Robust formulationsRobust statistics Huber [1981], Maronna et al. [2006] play a crucial role in many real-world appli-cations, allowing good solutions to be obtained in spite of data contamination. In the linear andnonlinear regression setting, the least-squares problemminX‖F (X)− b‖221It is shown in Aravkin et al. [2013a] that for any differentiable convex ρ, the dual problem for the residualr = b−Ax has a unique solution. Therefore, any global minimum for (LASSOτ ) guarantees a unique residual whenρ is strictly convex, and the claim follows, since the derivative only depends on the residual.23corresponds to the maximum likelihood estimate of X for the statistical modelb = F (X) +  , (2.15)where  is a vector of i.i.d. Gaussian variables. Robust statistical approaches relax the Gaussianassumption, allowing other (heavier tailed) distributions to be used. Maximum likelihood estimatesof X under these assumptions are more robust to data contamination. Heavy-tailed distributions,in particular the Student’s t, yield formulations that are more robust to outliers than convexformulations Lange et al. [1989], Aravkin et al. [2012]. This corresponds to the notion of a re-descending influence function Maronna et al. [2006], which is simply the derivative of the negativelog likelihood. The relationship between densities, penalties, and influence functions is shown inFigure 2.1. Assuming that  has the Student’s t density leads to the maximum likelihood estimationproblemminXρ(F (x)− b) :=∑ilog(ν + (F (X)i − bi)2), (2.16)where ν is the Student’s t degree of freedom.A general version of (BPDNη) was proposed in Aravkin et al. [2013a], allowing different penaltyfuntionals ρ. The root-finding procedure of Berg and Friedlander [2008] was extended in Aravkinet al. [2013a] to this more general context, and used for root finding for both convex and noncovexρ (e.g. as in (2.16)).The (BPDNη) formulation for any ρ do not arise directly from a maximum likelihood estimatorof (2.15), because they appear in the constraint. However, we can think about penalties ρ as agentswho, given an error budget η, distribute it between elements of the residual. The strategy that eachagent ρ will use to accomplish this task can be deduced from tail features evident in Figure 2.1.Specifically, the cost of a large residual is prohibitively expensive for the least squares penalty,since its cost is commensurate with that of a very large number of small residuals. For example,(10α)2 = 100α2; so a residual of size 10α is worth as much as 100 residuals of size α to the leastsquares penalty. Therefore, a least squares penalty will never assign a single residual a relativelylarge value, since this would quickly use up the entire error budget. 
In contrast, |10α| = 10|α|, soa residual of size 10α is worth only 10 residuals of size α when the 1-norm penalty is used. Thispenalty is likely to grant a few relatively large errors to certain residuals, if this resulted in a betterfit. For the penalty in (2.16), it is easy to see that the cost of a residual of size 10α can be worthfewer than 10 residuals of size α, and specific computations depend on ν and actual size of α. Anonconvex penalty ρ, e.g. the one in (2.16), allows large residuals, as long as the majority of theremaining residuals are fit well.From the discussion in the previous paragraph, it is clear that robust penalties are useful asconstraints in (BPDNη), and can cleverly distribute the allotted error budget η, using it for outlierswhile fitting good data. The LR-BPDN framework proposed in this paper captures the robust24Figure 2.1: Gaussian (black dashed line), Laplace (red dashdotted line), and Student’s t (bluesolid line); Densities (left plot), Negative Log Likelihoods (center plot), and InfluenceFunctions (right plot). Student’s t-density has heavy tails, a non-convex log-likelihood,and re-descending influence function.extension, allowing robust data interpolation in situations when some of available data is heavilycontaminated. To develop this extension, we follow Aravkin et al. [2013a] to define the generalizedvalue functionvρ(τ) = minXρ(A(X)− b) s.t. ‖X‖∗ ≤ τ , (2.17)and find vρ(τ) = η using the iteration (2.2). As discussed in section 2.3, for any convex smoothpenalty ρ,v′ρ(τ) = −‖A∗∇ρ(r¯)‖2 , (2.18)where ‖ · ‖2 is the spectral norm, and r¯ = A(X¯)− b for optimal solution X¯ that achieves vρ(τ). Forsmooth non-convex ρ, e.g. (2.16), we still use (2.18) in iteration (2.2).As with standard least squares, we use the factorization formulation to avoid SVDs. Note thatTheorem 1 and Corollary 1 hold for any choice of penalty ρ. When evaluating the value functionvρ(τ), we actually solveminL,Rρ(A(LRT )− b) s.t. 12∥∥∥∥∥[LR]∥∥∥∥∥2F≤ τ . (2.19)For any smooth penalty ρ, including (2.16), a stationary point for this problem can be found usingthe projected gradient method.2.8 ReweightingEvery rank-k solution X¯ of (BPDNη) lives in a lower dimensional subspace of Rn×m spanned bythe n×k row and m×k column basis vectors corresponding to the nonzero singular values of X¯. Incertain situations, it is possible to estimate the row and column subspaces of the matrix X eitherfrom prior subspace information or by solving an initial (BPDNη) problem.In the vector case, it was shown that prior information on the support (nonzero entries) canbe incorporated in the `1-recovery algorithm by solving the weighted-`1 minimization problem. Inthis case, the weights are applied such that solutions with large nonzero entries on the support25estimate have a lower cost (weighted `1 norm) than solutions with large nonzeros outside of thesupport estimate Friedlander et al. [2011].In the matrix case, the support estimate is replaced by estimates of the row and columnsubspace bases U0 ∈ Rn×k and V0 ∈ Rm×k of the largest k singular values of X. Let the matricesU˜ ∈ Rn×k and V˜ ∈ Rm×k be estimates of U0 and V0, respectively.The weighted nuclear norm minimization problem can be formulated as follows:minX||QXW ||∗ s.t. ρ(A(X)− b) ≤ η, (wBPDNη)where Q = ωU˜U˜T + U˜⊥U˜⊥T , W = ωV˜ V˜ T + V˜ ⊥V˜ ⊥T , and ω is some constant between zero andone. Here, we use the notation U˜⊥ ∈ Rn×n−k to refer to the orthogonal complement of U˜ inRn×n, and similarly for V˜ ⊥ in Rm×m. 
The matrices Q and W are weighted projection matricesof the subspaces spanned by U˜ and V˜ and their orthogonal complements. Therefore, minimizing||QXW ||∗ penalizes solutions that live in the orthogonal complement spaces more when ω < 1.Note that matrices Q and W are invertible, and hence the reweighed LASSO problem still fitsinto the class of problems characterized by Theorem 1. Specifically, we can write any objectivef(X) subject to a reweighted nuclear norm constraint asmin f(Q−1R(Z)W−1)s.t. Tr(Z) ≤ τ ,(2.20)where as in Corollary 1, R(Z) extracts the upper n ×m block of Z (see (2.10)). A factorizationsimilar to (2.14) can then be formulated for the (wBPDNη) problem in order to optimize over thelower dimensional factors L ∈ Rn×k and R ∈ Rm×k.In particular, we can solve a sequence of (LASSOτ ) problemsminL,R‖A(LRT )− b‖22 s.t.12∥∥∥∥∥[QLWR]∥∥∥∥∥2F≤ τ , (2.21)where Q and W are as defined above. Problem (2.21) can also be solved using the spectral projectedgradient algorithm. However, unlike to the non-weighted formulation, the projection in this case isnontrivial. Fortunately, the structure of the problem allows us to find an efficient formulation forthe projection operator.262.8.1 Projection onto the weighted frobenius norm ballThe projection of a point (L,R) onto the weighted Frobenius norm ball 12(‖QL‖2F + ‖WR‖2F ) ≤ τis achieved by finding the point (L˜, R˜) that solvesminLˆ,Rˆ12∥∥∥∥∥[Lˆ− LRˆ−R]∥∥∥∥∥2Fs.t.12∥∥∥∥∥[QLˆWRˆ]∥∥∥∥∥2F≤ τ.The solution to the above problem is given byL˜ =((µω2 + 1)−1U˜ U˜T + (µ+ 1)−1U˜⊥U˜⊥T)LR˜ =((µω2 + 1)−1V˜ V˜ T + (µ+ 1)−1V˜ ⊥V˜ ⊥T)R,where µ is the Lagrange multiplier that solves f(µ) ≤ τ with f(µ) given byf(µ) =12Tr[( ω2(µω2 + 1)2U˜ U˜T +1(µ+ 1)2U˜⊥U˜⊥T)LLT+( ω2(µω2 + 1)2V˜ V˜ T +1(µ+ 1)2V˜ ⊥V˜ ⊥T)RRT].(2.22)The optimal µ that solves equation (2.22) can be found using the Newton iterationµ(t) = µ(t−1) − f(µ(t−1))− τ∇f(µ(t−1)) ,where ∇f(µ) is given byTr[( −2ω4(µω2 + 1)2U˜ U˜T +−2(µ+ 1)3U˜⊥U˜⊥T)LLT+( −2ω4(µω2 + 1)3V˜ V˜ T +−2(µ+ 1)3V˜ ⊥V˜ ⊥T)RRT].2.8.2 Traversing the pareto curveThe design of an effective optimization method that solves (wBPDNη) requires 1) evaluating prob-lem (2.21), and 2) computing the dual of the weighted nuclear norm ‖QXW‖∗.We first define a gauge function κ(x) as a convex, nonnegative, positively homogeneous functionsuch that κ(0) = 0. This class of functions includes norms and therefore includes the formulationsdescribed in (wBPDNη) and (2.21). Recall from section 2.3 that taking a Newton step along thePareto curve of (wBPDNη) requires the computation of the derivative of v(τ) as in (2.3). Therefore,27we also define the polar (or dual) of κ asκo(x) = supw{wTx | κ(w) ≤ 1}. (2.23)Note that if κ is a norm, the polar reduces to the dual norm.To compute the dual of the weighted nuclear norm, we follow Theorem 5.1 of Berg and Fried-lander [2011] which defines the polar (or dual) representation of a weighted gauge function κ(Φx)as κo(Φ−1x), where Φ is an invertible linear operator. The weighted nuclear norm ‖QXW‖∗ is infact a gauge function with invertible linear weighting matrices Q and W . Therefore, the dual normis given by(‖Q(·)W‖∗)d(Z) := ‖Q−1ZW−1‖∞.2.9 Numerical experimentsWe test the performance of LR-BPDN on two example applications. In section 2.9.1, we con-sider the Netflix Prize problem, which is often solved using rank minimization Funk [2006], Gross[2011], Recht and Re´ [2011]. 
Using MovieLens 1M, 10M, and Netflix 100M datasets, we compareand discuss advantages of different formulations, compare our solver against state of the art con-vex (BPDNη) solver SPG`1, and report timing results. We show that the proposed algorithm isorders of magnitude faster than the best convex (BPDNη) solver.In section 2.9.2, we apply the proposed methods and extensions to seismic trace interpolation, akey application in exploration geophysics Sacchi et al. [1998], where rank regularization approacheshave recently been used successfully Oropeza and Sacchi [2011]. In section 2.9.2, we include anadditional comparison of LR-BPDN with classic SPG`1 as well as with TFOCS Becker et al.[2011] for small matrix completion and seismic data interpolation problems. Then, using real datacollected from the Gulf of Suez, we show results for robust completion in section 2.9.2, and presentresults for the weighted extension in section 2.9.2.2.9.1 Collaborative filteringWe tested the performance of our algorithm on completing missing entries in the MovieLens (1M), (10M),and Netflix (100M) datasets, which contain anonymous ratings of movies made by Netflix users.The ratings are on an integer scale from 1 to 5. The ratings matrix is not complete, and the goal isto infer the values in the unseen test set. In order to test our algorithm, we further subsampled theavailable ratings by randomly removing 50% of the known entries. We then solved the (BPDNη)formulation to complete the matrix, and compared the predicted (P) and actual (A) removed en-tries in order to assess algorithm performance. We report the signal-to-noise ratio (SNR) and root28means square error (RMSE):SNR = 20 log( ‖A‖F‖P −A‖F), RMSE = ‖P −A‖F /‖A‖0for different values of η in the (BPDNη) formulation.Since our algorithm requires pre-defining the rank of the factors L and R, we perform therecovery with ranks k ∈ {10, 20, 30, 50}. Table 2.2 shows the reconstruction SNR for each of theranks k and for a relative error η ∈ {0.5, 0.3, 0.2} (the data mismatch is reduced to a fraction ηof the initial error). The last row of table 2.2 shows the recovery for an unconstrained low-rankformulation, using the work and software of Vandereycken [2013]. This serves as an interestingbaseline, since the rank k of the Riemannian manifold in the unconstrained formulation functionsas a regularizer. It is clear that for small k, we get good results without additional functionalregularization; however, as k increases, the quality of the rank k solution decays without furtherconstraints. In contrast, we get better results as the rank increases, because we consider a largermodel space, but solve the BPDN formulation each time. This observation demonstrates theimportance of the nuclear norm regularization, especially when the underlying rank of the problemis unknown.Table 2.3 shows the timing (in seconds) used by all methods to obtain solutions. There areseveral conclusions that can be readily drawn. First, for error-level constrained problems, a tightererror bound requires a higher computational investment by our algorithm, which is consistent withthe original behavior of SPG`1 Berg and Friedlander [2008]. 
Second, the unconstrained prob-lem is easier to solve (using the Riemmanian manifolds approach of Vandereycken [2013]) than aconstrained problem of the same rank; however, it is interesting to note that as the rank of therepresentation increases, the unconstrained Riemmanian approach becomes more expensive thanthe constrained problem for the levels η considered, most likely due to second-order methods usedby the particular implementation of Vandereycken [2013].Table 2.4 shows the value of ‖X‖∗ of the reconstructed signal corresponding to the settings inTable 2.2. While the interpretation of the η values are straightforward (they are fractions of theinitial data error), it is much more difficult to predict ahead of time which value of τ one maywant to use when solving (LASSOτ ). This illustrates the modeling advantage of the (BPDNη)formulation: it requires only the simple parameter η, which is an estimate of the (relative) noisefloor. Once η is provided, the algorithm (not the user) will instantiate (LASSOτ ) formulations, andfind the right value τ that satisfies v(τ) = η. When no estimate of η is available, our algorithm canstill be applied to the problem, with η = 0 and a fixed computational budget (see Table 2.5. )Table 2.5 shows a comparison between classic SPG`1, accelerated with a Lanczos-based trun-cated SVD projector, against the new solver, on the MovieLens (10M) dataset, for a fixed budgetof 100 iterations. Where the classic solver takes over six hours, the proposed method finishes in lessthan a minute. For a problem of this size, explicit manipulation of X as a full matrix of size 10K29Table 2.2: Summary of the recovery results on the MovieLens (1M) data set for factor rankk and relative error level η for (BPDNη). SNR in dB (higher is better) listed in the lefttable, and RMSE (lower is better) in the right table. The last row in each table givesrecovery results for the non-regularized data fitting factorized formulation solved withRiemannian optimization (ROPT). Quality degrades with k due to overfitting for thenon-regularized formulation, and improves with k when regularization is used.k 10 20 30 50η0.5 5.93 5.93 5.93 5.930.3 10.27 10.27 10.26 10.270.2 12.50 12.54 12.56 12.56ROPT 11.16 8.38 6.01 2.6k 10 20 30 50η0.5 1.89 1.89 1.89 1.890.3 1.14 1.14 1.15 1.140.2 0.88 0.88 0.88 0.88ROPT 1.03 1.42 1.87 2.77Table 2.3: Summary of the computational timing (in seconds) on the MovieLens (1M) dataset for factor rank k and relative error level η for (BPDNη). The last row gives compu-tational timing for the non-regularized data fitting factorized formulation solved withRiemannian optimization.k 10 20 30 50η0.5 5.0 5.7 6.8 8.20.3 11.1 18.8 12.3 21.00.2 58.7 84.2 95.0 114.2ROPT 14.9 43.5 98.4 327.3by 20K is computationally prohibitive. Table 2.6 gives timing and reconstruction quality resultsfor the Netflix (100M) dataset, where the full matrix is 18K by 500K when fully constructed.2.9.2 Seismic missing-trace interpolationIn exploration seismology, large-scale data sets (approaching the order of petabytes for the latestland and wide-azimuth marine acquisitions) must be acquired and processed in order to determinethe structure of the subsurface. In many situations, only a subset of the complete data is acquireddue to physical and/or budgetary constraints. Recent insights from the field of compressed sensingallow for deliberate subsampling of seismic wavefields in order to improve reconstruction quality andreduce acquisition costs Herrmann and Hennenfent [2008a]. 
The acquired subset of the completedata is often chosen by randomly subsampling a dense regular periodic source or receiver grid.Interpolation algorithms are then used to reconstruct the dense regular grid in order to perform30Table 2.4: Nuclear-norms of the solutions X = LRT for results in Table 2.2, correspondingto τ values in (LASSOτ ). These values are found automatically via root finding, butare difficult to guess ahead of time.k 5 10 30 50η0.5 5.19e3 5.2e3 5.2e3 5.2e30.3 9.75e3 9.73e3 9.76e3 9.74e30.2 1.96e4 1.96e4 1.93e4 1.93e4Table 2.5: Classic SPGL1 (using Lanczos based truncated SVD) versus LR factorization onthe MovieLens (10M) data set ( 10000×20000 matrix) shows results for a fixed iterationbudget (100 iterations) for 50% subsampling of MovieLens data. SNR, RMSE andcomputational time are shown for k = 5, 10, 20.MovieLens (10M)k 5 10 20SPG`1SNR (dB) 11.32 11.37 11.37RMSE 1.02 1.01 1.01time (sec) 22680 93744 121392LRSNR (dB) 11.87 11.77 11.72RMSE 0.95 0.94 0.94time (sec) 54.3 48.2 47.5additional processing on the data such as removal of artifacts, improvement of spatial resolution,and key analysis, such as imaging.In this section, we apply the new rank-minimization approach, along with weighted and robustextensions, to the trace-interpolation problem for two different seismic acquisition examples. Wefirst describe the structure of the datasets, and then present the transform we use to cast theinterpolation as a rank-minimization problem.The first example is a real data example from the Gulf of Suez. Seismic data are organizedinto seismic lines, where Nr receivers and Ns sources are collocated in a straight line. Sources aredeployed sequentially, and receivers collect each shot record2 for a period of Nt time samples. TheGulf of Suez data contains Ns = 354 sources, Nr = 354 receivers, and Nt = 1024 with a samplinginterval of 0.004s, leading to a shot duration of 4s and a maximum temporal frequency of 125 Hz.Most of the energy of the seismic line is preserved when we restrict the spectrum to the 12-60Hzfrequency band. Figs. 2.2(a) and (b) illustrate the 12Hz and 60Hz frequency slices in the source-2Data collection performed for several sources taken with increasing or decreasing distance between sources andreceivers.31Table 2.6: LR method on the Netflix (100M) data set ( 17770×480189 matrix) shows resultsfor 50% subsampling of Netflix data. SNR, computational time and RMSE are shownfor factor rank k and relative error level η for (BPDNη).Netflix (100M)k 2 4 6η=0.5SNR (dB) 7.37 7.03 7.0RMSE 1.60 1.67 1.68time (sec) 236.5 333.0 335.0η=0.4SNR (dB) 8.02 7.96 7.93RMSE 1.49 1.50 1.50time (sec) 315.2 388.6 425.0η=0.3SNR (dB) 10.36 10.32 10.35RMSE 1.14 1.14 1.14time (sec) 1093.2 853.7 699.7receiver domain, respectively. Columns in these frequency slices represent the monochromaticresponse of the earth to a fixed source and as a function of the receiver coordinate. In order tosimulate missing traces, we apply a subsampling mask that randomly removes 50% of the sources,resulting in the subsampled frequency slices illustrated in Figs. 2.2 (c) and (d).State of the art trace-interpolation schemes transform the data into sparsifying domains, forexample using the Fourier Sacchi et al. [1998] and curvelet Herrmann and Hennenfent [2008a]transforms. The underlying sparse structure of the data is then exploited to recover the missingtraces. 
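Since the interpolation is carried out per frequency, the time-domain shot records described above are first mapped to monochromatic frequency slices. The sketch below illustrates one way to do this with a temporal Fourier transform and a band restriction; the array layout and the reduced demo sizes are assumptions for the example (the actual Gulf of Suez line is 1024 × 354 × 354 at 4 ms sampling), not the processing code used in the thesis.

```python
import numpy as np

def frequency_slices(data_tsr, dt, f_min=12.0, f_max=60.0):
    """Extract monochromatic frequency slices from a seismic line.

    data_tsr : real array of shape (n_t, n_rec, n_src), time x receiver x source.
    Returns the complex slices whose frequencies lie in [f_min, f_max] Hz,
    stacked along the first axis, together with those frequencies.
    """
    n_t = data_tsr.shape[0]
    spectra = np.fft.rfft(data_tsr, axis=0)       # one-sided temporal FFT
    freqs = np.fft.rfftfreq(n_t, d=dt)            # 0 ... 1/(2*dt) Hz
    band = (freqs >= f_min) & (freqs <= f_max)
    return spectra[band], freqs[band]

# Small stand-in volume (random numbers instead of real traces), 4 ms sampling.
d = np.random.randn(256, 64, 64).astype(np.float32)
slices, freqs = frequency_slices(d, dt=0.004)
print(slices.shape, float(freqs[0]), float(freqs[-1]))   # slices in the 12-60 Hz band
```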
The approach proposed in this paper allows us to instead exploit the low-rank matrix structure of seismic data, and to design formulations that can achieve trace-interpolation using matrix-completion strategies.

The main challenge in applying rank-minimization to seismic trace-interpolation is to find a transform domain that satisfies the following two properties:

1. Fully sampled seismic lines have low-rank structure (quickly decaying singular values);
2. Subsampled seismic lines have high rank (slowly decaying singular values).

When these two properties hold, rank-penalization formulations allow the recovery of missing traces. To achieve these aims, we use the transformation from the source-receiver (s-r) domain to the midpoint-offset (m-h) domain. The conversion from the (s-r) domain to the (m-h) domain is a coordinate transformation, with the midpoint defined by m = (s + r)/2 and the half-offset defined by h = (s − r)/2. (In mathematical terms, the transformation from the (s-r) domain to the (m-h) domain represents a tight frame.) This transformation is illustrated by transforming the 12Hz and 60Hz source-receiver domain frequency slices in Figs. 2.2(a) and (b) to the midpoint-offset domain frequency slices in Figs. 2.2(e) and (f). The corresponding subsampled frequency slices in the midpoint-offset domain are shown in Figs. 2.2(g) and (h).

Figure 2.2: Frequency slices of a seismic line from the Gulf of Suez with 354 shots and 354 receivers. Full data for (a) the low frequency at 12 Hz and (b) the high frequency at 60 Hz in the s-r domain. 50% subsampled data for (c) the low frequency at 12 Hz and (d) the high frequency at 60 Hz in the s-r domain. Full data for (e) the low frequency at 12 Hz and (f) the high frequency at 60 Hz in the m-h domain. 50% subsampled data for (g) the low frequency at 12 Hz and (h) the high frequency at 60 Hz in the m-h domain.

To show that the midpoint-offset transformation achieves aims 1 and 2 above, we plot the decay of the singular values of both the 12Hz and 60Hz frequency slices in the source-receiver domain and in the midpoint-offset domain in Figs. 2.3(a) and (c). Notice that the singular values of both frequency slices decay faster in the midpoint-offset domain, and that the singular value decay is slower for subsampled data in Figs. 2.3(b) and (d).

Let X denote the data matrix in the midpoint-offset domain and let R be the subsampling operator that maps Figs. 2.2(e) and (f) to Figs. 2.2(g) and (h). Denote by S the transformation operator from the source-receiver domain to the midpoint-offset domain. The resulting measurement operator in the midpoint-offset domain is then given by A = RS^H.

We formulate and solve the matrix completion problem (BPDNη) to recover a seismic line from the Gulf of Suez in the (m-h) domain. We first performed the interpolation for the frequency slices at 12Hz and 60Hz to obtain a good approximation of the lower and upper limits of the rank value. Then, we work with all the monochromatic frequency slices and adjust the rank within these limits while going from low- to high-frequency slices. We use 300 iterations of LR for all frequency slices.
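To make the (s-r) to (m-h) conversion concrete, the sketch below maps a square Ns = Nr frequency slice onto an integer midpoint-offset grid and back, and applies the resulting measurement operator A = RS^H. It is only an illustration of the coordinate change under stated assumptions; the operator used for the experiments (including any scaling) may differ, and the helper names are ours.

```python
import numpy as np

def sr_to_mh(d_sr):
    """Scatter an n x n (receiver, source) slice onto the (midpoint, offset) grid.

    On an integer grid, the pairs (s + r, s - r + n - 1) enumerate distinct bins
    of a (2n - 1) x (2n - 1) array, so every (s, r) sample lands in its own bin
    and the adjoint map below exactly undoes this one (S^H S = I).
    """
    n = d_sr.shape[0]
    rec, src = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    d_mh = np.zeros((2 * n - 1, 2 * n - 1), dtype=d_sr.dtype)
    d_mh[src + rec, src - rec + n - 1] = d_sr
    return d_mh

def mh_to_sr(d_mh):
    """Adjoint map S^H: gather the (midpoint, offset) bins back to (receiver, source)."""
    n = (d_mh.shape[0] + 1) // 2
    rec, src = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return d_mh[src + rec, src - rec + n - 1]

def measure(x_mh, src_mask):
    """Measurement operator A = R S^H: return to (s-r) and keep only the observed shots."""
    return mh_to_sr(x_mh) * src_mask[np.newaxis, :]

# Round-trip check on a random complex slice of the same size as the Gulf of Suez data.
D = np.random.randn(354, 354) + 1j * np.random.randn(354, 354)
assert np.allclose(mh_to_sr(sr_to_mh(D)), D)
```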
Figures 2.4(a) and (b) show the recovery and error plot for the low frequency slice at 12 Hz, respectively. Figures 2.4(c) and (d) show the recovery and error plot for the high frequency slice at 60 Hz, respectively. Figure 2.5 shows a common-shot gather section after missing-trace interpolation from the Gulf of Suez data set. We can clearly see that we are able to recapture most of the missing traces in the data (Figure 2.5c), which is also evident from the residual plot (Figure 2.5d).

Figure 2.3: Singular value decay of fully sampled (a) low frequency slice at 12 Hz and (c) high frequency slice at 60 Hz in the (s-r) and (m-h) domains. Singular value decay of 50% subsampled (b) low frequency slice at 12 Hz and (d) high frequency slice at 60 Hz in the (s-r) and (m-h) domains. Notice that for both high and low frequencies, the decay of the singular values is faster in the fully sampled (m-h) domain than in the fully sampled (s-r) domain, and that subsampling does not significantly change the decay of the singular values in the (s-r) domain, while it destroys the fast decay of the singular values in the (m-h) domain.

In the second acquisition example, we implement the proposed formulation on the 5D synthetic seismic data (2 source dimensions, 2 receiver dimensions, 1 temporal dimension) provided by the BG Group. We extract a frequency slice at 12.3 Hz to perform the missing-trace interpolation, where the size of the to-be-recovered matrix is 400×400 receivers spaced by 25m and 68×68 sources spaced by 150m. Due to the low spatial frequency content of the data at 12.3 Hz, we further subsample the data in the receiver coordinates by a factor of two to speed up the computation. We apply subsampling masks that randomly remove 75% and 50% of the shots. In the 4D case, we have two choices of matricization Silva and Herrmann [2013c], Demanet [2006a], as shown in Figures 2.6(a,b): we can either place the (Receiver x, Receiver y) dimensions in the rows and the (Source x, Source y) dimensions in the columns, or the (Receiver y, Source y) dimensions in the rows and the (Receiver x, Source x) dimensions in the columns. We observed the decay of the singular values for each of these strategies, as shown in Figures 2.7(a,b). We therefore selected the transform domain to be the permutation of the source and receiver coordinates, where the matricization of each 4D monochromatic frequency slice is done using the (Source x, Receiver x) and (Source y, Receiver y) coordinates. We use rank 200 for the interpolation, and run the solver for a maximum of 1000 iterations. The results after interpolation are shown in Figures 2.8 and 2.9 for 75% and 50% missing data, respectively. We can see that when 75% of the data is missing, we start losing coherent energy (Figure 2.8c). With 50% missing data, we capture most of the coherent energy (Figure 2.9c). We also obtain higher SNR values for the recovery in the case of 50% compared to 75% missing data.

To illustrate the importance of the nuclear-norm regularization, we solved the interpolation problem using a simple least-squares formulation on the same seismic data set from the Gulf of Suez.
The least squares problem was solved using the L, R factorization structure, thereby im-plicitly enforcing a rank on the recovered estimate (i.e, formulation (2.14) was optimized with-out the τ -constraint). The problem was then solved with the factors L and R having a rankk ∈ {5, 10, 20, 30, 40, 50, 80, 100}. The reconstruction SNRs comparing the recovery for the reg-ularized and non-regularized formulations are shown in Fig. 2.10. The Figure shows that theperformance of the non-regularized approach decays with rank, due to overfitting. The regularizedapproach, in contrast, obtains better recovery as the factor rank increases.Comparison with classical nuclear-norm formulationTo illustrate the advantage of proposed matrix-factorization formulation (which we refer to as LRbelow) over classical nuclear-norm formulation, we compare the reconstruction error and computa-tion time with the existing techniques. The most natural baseline is the SPG`1 algorithm Berg andFriedlander [2011] applied to the classic nuclear norm (BPDNη) formulation, where the decisionvariable is X, the penalty function ρ is the 2-norm, and the projection is done using the SVD. Thisexample tests the classic (BPDNη) formulation against the LR extension proposed in this paper.The second comparison is with the TFOCSBecker et al. [2011], which is a library of first-ordermethods for a variety of problems with explicit examples written by the authors for the (BPDNη)formulation. The TFOCS approach to (BPDNη) relies on a proximal function for the nuclear norm,which, similar to projection, requires computing SVDs or partial SVDs.The comparisons are done using three different data sets. In the first example, we interpolated35missing traces of a monochromatic slice (of size 354×354), extracted from Gulf of Suez data set. Wesubsampled the frequency slice by randomly removing the 50% of shots and performed the missing-trace interpolation in the midpoint-offset (m-h) domain. We compares the SNR, computation timeand iterations for a fixed set of η. The rank of the factors was set to 28. The seismic example inTable 2.7 shows the results. Both the classic SPG`1 algorithm and LR are faster than TFOCS.In the quality of recovery, both SPG`1 and LR have better SNR than TFOCS. In this case LR isfaster than SPG`1 by a factor of 15 (see Table 2.7).In the second example, we generated a rank 10 matrix of size 100 × 100. We subsampled thematrix by randomly removing 50% of the data entries. The synthetic low-rank example in Table2.7 shows the comparison of SNR and computational time. The rank of the factors was set to bethe true rank of the original data matrix for this experiment. The LR formulation proposed inthis paper is faster than classic SPG`1, and both are faster than TFOCS. As the error thresholdtightens, TFOCS requires a large number of iterations to converge. For a small problem size, LRand the classic SPG`1 perform comparably. When operating on decision variables with the correctrank LR gave uniformly better SNR results than classic SPG`1 and TFOCS, and the improvementwas significant for lower error thresholds.4 In reality, we do not know the rank value in advance.To make a fair comparison, we used MovieLens (10M) dataset, where we subsampled the availableratings by randomly removing 50% of the known entries. In this example, we fixed the number ofiterations to 100 and compared the SNR and computational time (Table 2.5) for multiple ranks,k = 5, 10, 20. 
It is evident that the we get better SNR in case of LR, also the computational speedof LR is significantly faster then the classic SPG`1.Simultaneous missing-trace interpolation and denoisingTo illustrate the utility of robust cost functions, we consider a situation where observed dataare heavily contaminated. The goal here is to simultaneously denoise interpolate the data. Wework with same seismic line from Gulf of Suez. To obtain the observed data, we apply a sub-sampling mask that randomly removes 50% of the shots, and to simulate contamination, we replaceanother 10% of the shots with large random errors, whose amplitudes are three times the maximumamplitude present in the data. In reality, we know the sub-sampling mask but we do not knowthe behaviour and amplitude of noise. In this example, we formulate and solve the robust matrixcompletion problem (BPDNη), where the cost ρ is taken to be the penalty (2.16); see section 2.7for the explanation and motivation. As in the previous examples, the recovery is done in the(m-h) domain. We implement the formulation in the frequency domain, where we work withmonochromatic frequency slices, and adjust the rank and ν parameter while going from low tohigh frequency slices. Figure 2.11 compares the recovery results with and without using a robust4We tested this hypothesis by re-running the experiment with higher factor rank. For example, selecting factorrank to be 40 gives SNRs of 16.5, 36.7, 42.4, 75.3 for the corresponding η values for the synthetic low-rank experimentin 2.7.36Table 2.7: TFOCS versus classic SPG`1 (using direct SVDs) versus LR factorization. Syn-thetic low rank example shows results for completing a rank 10, 100 × 100 matrix,with 50% missing entries. SNR, Computational time and iterations are shown forη = 0.1, 0.01, 0.005, 0.0001. Rank of the factors is taken to be 10. Seismic exam-ple shows results for matrix completion a low-frequency slice at 10 Hz, extracted fromthe Gulf of Suez data set, with 50% missing entries. SNR, Computational time anditerations are shown for η = 0.2, 0.1, 0.09, 0.08. Rank of factors was taken to be 28.Synthetic low rankη 0.1 0.01 0.005 0.0001TFOCSSNR (dB) 17.2 36.3 56.2 76.2time (s) 24 179 963 2499iteration 1151 8751 46701 121901SPG`1SNR (dB) 14.5 36.4 39.2 76.2time (s) 4.9 17.0 17.2 61.1iteration 12 46 47 152LRSNR (dB) 16.5 36.7 42.7 76.2time (s) 0.6 0.5 0.58 0.9iteration 27 64 73 119Seismicη 0.2 0.1 0.09 0.08TFOCSSNR (dB) 13.05 17.4 17.9 18.5time (s) 593 3232 4295 6140iteration 1201 3395 3901 4451SPG`1SNR (dB) 12.8 17.0 17.4 17.9time (s) 30.4 42.8 32.9 58.8iteration 37 52 40 73LRSNR (dB) 13.1 17.1 17.4 18.0time (s) 1.6 2.9 3.2 4.0iteration 38 80 87 113penalty function. The error budget plays a significant role in this example, and we standardizedthe problems by setting the relative error to be 20% of the initial error, so that the formulationsare comparable.We can clearly see that the standard least squares formulation is unable to recover the truesolution. The intuitive reason is that the least squares penalty is simply unable to budget largeerrors to what should be the outlying residuals. The Student’s t penalty, in contrast, achieves agood recovery in this extreme situation, with an SNR of 17.9 DB. In this example, we used 300iterations of SPG`1 for all frequency slices.Re-weightingRe-weighting for seismic trace interpolation was recently used in Mansour et al. [2012a] to improvethe interpolation of subsampled seismic traces in the context of sparsity promotion in the curveletdomain. 
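For intuition about why the robust formulation above tolerates the corrupted shots, the sketch below evaluates a Student's t-type misfit and its gradient next to the least-squares misfit. The exact penalty (2.16) and its scaling are defined earlier in the chapter and are not reproduced here, so the particular form ρ(r) = log(1 + |r|²/ν) used below is an assumption for illustration only.

```python
import numpy as np

def student_t_value_grad(r, nu=2.0):
    """Student's t-style misfit rho(r) = sum log(1 + |r|^2 / nu) and its gradient.

    The elementwise weight 2 / (nu + |r|^2) shrinks toward zero for large
    residuals, so a few huge outliers (such as the corrupted shots above)
    contribute almost nothing to the fit, unlike the least-squares penalty.
    """
    absr2 = np.abs(r) ** 2
    value = np.sum(np.log1p(absr2 / nu))
    grad = 2.0 * r / (nu + absr2)
    return value, grad

# Influence of a small versus a huge residual under the two penalties.
for res in (0.1, 100.0):
    _, g = student_t_value_grad(np.array([res]))
    print(f"residual {res:7.1f}: least-squares gradient {2*res:9.1f}, "
          f"Student's t gradient {g[0]:.4f}")
```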
The weighted `1 formulation takes advantage of curvelet support overlap across adjacentfrequency slices.Analogously, in the matrix setting, we use the weighted rank-minimization formulation (wBPDNη)to take advantage of correlated row and column subspaces for adjacent frequency slices. We firstdemonstrate the effectiveness of solving the (wBPDNη) problem when we have accurate subspaceinformation. For this purpose, we compute the row and column subspace bases of the fully sampledlow frequency (11Hz) seismic slice and pass this information to (wBPDNη) using matrices Q and37Source(m)Receiver(m)0 1000 2000 3000 400001000200030004000(a)Source(m)Receiver(m)0 1000 2000 3000 400001000200030004000(b)Source(m)Receiver(m)0 1000 2000 3000 400001000200030004000(c)Source(m)Receiver(m)0 1000 2000 3000 400001000200030004000(d)Figure 2.4: Recovery results for 50% subsampled 2D frequency slices using the nuclear normformulation. (a) Interpolation and (b) residual of low frequency slice at 12 Hz withSNR = 19.1 dB. (c) Interpolation and (d) residual of high frequency slice at 60 Hz withSNR = 15.2 dB.W . Figures 2.12(a) and (b) show the residual of the frequency slice with and without weighting.The reconstruction using the (wBPDNη) problem achieves a 1.5dB improvement in SNR over thenon-weighted (BPDNη) formulation.Next, we apply the (wBPDNη) formulation in a practical setting where we do not know subspacebases ahead of time, but learn them as we proceed from low to high frequencies. We use therow and column subspace vectors recovered using (BPDNη) for 10.75 Hz and 15.75 Hz frequencyslices as subspace estimates for the adjacent higher frequency slices at 11 Hz and 16 Hz. Usingthe (wBPDNη) formulation in this way yields SNR improvements of 0.6dB and 1dB, respectively,over (BPDNη) alone. Figures 2.13(a) and (b) show the residual for the next higher frequencywithout using the support and Figures 2.13(c) and (d) shows the residual for next higher frequencywith support from previous frequency. Figure 2.14 shows the recovery SNR versus frequency forweighted and non-weighted cases for a range of frequencies from 9 Hz to 17 Hz.38Source(m)Time(s)0 1000 2000 3000 400000.511.52(a)Source(m)Time(s)0 1000 2000 3000 400000.511.52(b)Source(m)Time(s)0 1000 2000 3000 400000.511.52(c)Source(m)Time(s)0 1000 2000 3000 400000.511.52(d)Figure 2.5: Missing trace interpolation of a seismic line from Gulf of Suez. (a) Ground truth.(b) 50% subsampled common shot gather. (c) Recovery result with a SNR of 18.5 dB.(d) Residual.2.10 ConclusionsWe have presented a new method for matrix completion. Our method combines the Pareto curveapproach for optimizing (BPDNη) formulations with SVD-free matrix factorization methods.We demonstrated the modeling advantages of the (BPDNη) formulation on the Netflix Prizeproblem, and obtained high-quality reconstruction results for the seismic trace interpolation prob-lem. Comparison with state of the art methods for the (BPDNη) formulation showed that thefactorized formulation is faster than both TFOCS and classic SPG`1 formulations that rely on theSVD. The presented factorized approach also has a small memory imprint and does not rely onSVDs, which makes this method applicable to truly large-scale problems.We also proposed two extensions. First, using robust penalties ρ in (BPDNη), we showedthat simultaneous interpolation and denoising can be achieved in the extreme data contaminationcase, where 10% of the data was replaced by large outliers. 
Second, we proposed a weighted39Source x, Source yReceiver x, Receiver y  50 100 150 200 250 300 350 40050100150200250300350400 −2−1012x 10−4(a)Source x, Source yReceiver x, Receiver y  50 100 150 200 250 300 350 40050100150200250300350400 −2−1012x 10−4(b)Source x, Receiver xSource y, Receiver y  50 100 150 200 250 300 350 40050100150200250300350400 −2−1012x 10−4(c)Source x, Receiver xSource y, Receiver y  50 100 150 200 250 300 350 40050100150200250300350400 −2−1012x 10−4(d)Figure 2.6: Matricization of 4D monochromatic frequency slice. Top: (Source x, Source y)matricization. Bottom: (Source x, Receiver x) matricization. Left: Fully sampleddata; Right: Subsampled data.100 101 102 103 10400.51number of singular valuessingular value magnitude  Acquisition DomainTransform Domain(a)100 101 102 103 10400.51number of singular valuessingular value magnitude  Acquisition DomainTransform Domain(b)Figure 2.7: Singular value decay in case of different matricization of 4D monochromatic fre-quency slice. Left: Fully sampled data; Right: Subsampled data.40Receiver yReceiver x  50 100 150 200 250 300 350 40050100150200250300350400−6−4−20246x 10−4(a)Receiver yReceiver x  50 100 150 200 250 300 350 40050100150200250300350400−6−4−20246x 10−4(b)Receiver yReceiver x  50 100 150 200 250 300 350 40050100150200250300350400−6−4−20246x 10−4(c)Receiver yReceiver x  50 100 150 200 250 300 350 40050100150200250300350400−8−6−4−202468x 10−4(d)Receiver yReceiver x  50 100 150 200 250 300 350 40050100150200250300350400−8−6−4−202468x 10−4(e)Receiver yReceiver x  50 100 150 200 250 300 350 40050100150200250300350400−6−4−20246x 10−4(f)Figure 2.8: Missing-trace interpolation of a frequency slice at 12.3Hz extracted from 5D dataset, 75% missing data. (a,b,c) Original, recovery and residual of a common shot gatherwith a SNR of 11.4 dB at the location where shot is recorded. (d,e,f) Interpolation ofcommon shot gathers at the location where no reference shot is present.extension (wBPDNη), and used it to incorporate subspace information we learned on the fly toimprove interpolation in adjacent frequencies.2.11 AppendixProof of Theorem 1 Recall [Burer and Monteiro, 2003, Lemma 2.1]: if SST = KKT , then S = KQfor some orthogonal matrix Q ∈ Rr×r. Next, note that the objective and constraints of (2.8) aregiven in terms of SST , and for any orthogonal Q ∈ Rr×r, we have SQQTST = SST , so S¯ is a localminimum of (2.8) if and only if S¯Q is a local minimum for all orthogonal Q ∈ Rr×r.If Z¯ is a local minimum of (2.7), then any factor S¯ with Z¯ = S¯S¯T is a local minimum of (2.8).Otherwise, we can find a better solution S˜ in the neighborhood of S¯, and then Z˜ := S˜S˜T will be afeasible solution for (2.7) in the neighborhood of Z¯ (by continuity of the map S → SST ).We prove the other direction by contrapositive. If Z¯ is not a local minimum for (2.7), thenyou can find a sequence of feasible solutions Zk with f(Zk) < f(Z¯) and Zk → Z¯. 
For each k,41Receiver yReceiver x  50 100 150 200 250 300 350 40050100150200250300350400−6−4−20246x 10−4(a)Receiver yReceiver x  50 100 150 200 250 300 350 40050100150200250300350400−6−4−20246x 10−4(b)Receiver yReceiver x  50 100 150 200 250 300 350 40050100150200250300350400−6−4−20246x 10−4(c)Receiver yReceiver x  50 100 150 200 250 300 350 40050100150200250300350400−8−6−4−202468x 10−4(d)Receiver yReceiver x  50 100 150 200 250 300 350 40050100150200250300350400−8−6−4−202468x 10−4(e)Receiver yReceiver x  50 100 150 200 250 300 350 40050100150200250300350400−6−4−20246x 10−4(f)Figure 2.9: Missing-trace interpolation of a frequency slice at 12.3Hz extracted from 5D dataset, 50% missing data. (a,b,c) Original, recovery and residual of a common shot gatherwith a SNR of 16.6 dB at the location where shot is recorded. (d,e,f) Interpolation ofcommon shot gathers at the location where no reference shot is present.write Zk = SkSTk . Since Zk are all feasible for (2.7), so Sk are feasible for (2.8). By assumption{Zk} is bounded, and so is Sk; we can therefore find a subsequence of Sj → S˜ with S˜S˜T = Z¯,and f(SjSTj ) < f(S˜S˜T ). In particular, we have Z¯ = S¯S¯T = S˜S˜T , and S˜ is not a local minimumfor (2.8), and therefore (by previous results) S¯ cannot be either.4220 40 60 80 100101214161820RankSNR  no−regularizationregularization(a)20 40 60 80 1000246810121416RankSNR  no−regularizationregularization(b)Figure 2.10: Comparison of regularized and non-regularized formulations. SNR of (a) lowfrequency slice at 12 Hz and (b) high frequency slice at 60 Hz over a range of factorranks. Without regularization, recovery quality decays with factor rank due to over-fiting; the regularized formulation improves with higher factor rank.43Source(m)Time(s)0 1000 2000 3000 400000.511.52(a)Source(m)Time(s)0 1000 2000 3000 400000.511.52(b)Source(m)Time(s)0 1000 2000 3000 400000.511.52(c)Source(m)Time(s)0 1000 2000 3000 400000.511.52(d)Figure 2.11: Comparison of interpolation and denoising results for the Student’s t and least-squares misfit function. (a) 50% subsampled common receiver gather with another10 % of the shots replaced by large errors. (b) Recovery result using the least-squares misfit function. (c,d) Recovery and residual results using the student’s tmisfit function with a SNR of 17.2 dB.44Source(m)Receiver(m)0 1000 2000 3000 400005001000150020002500300035004000(a)Source(m)Receiver(m)0 1000 2000 3000 400005001000150020002500300035004000(b)Figure 2.12: Residual error for recovery of 11 Hz slice (a) without weighting and (b) withweighting using true support. SNR in this case is improved by 1.5 dB.45Source(m)Receiver(m)0 1000 2000 3000 400005001000150020002500300035004000(a)Source(m)Receiver(m)0 1000 2000 3000 400005001000150020002500300035004000(b)Source(m)Receiver(m)0 1000 2000 3000 400005001000150020002500300035004000(c)Source(m)Receiver(m)0 1000 2000 3000 400005001000150020002500300035004000(d)Figure 2.13: Residual of low frequency slice at 11 Hz (a) without weighing (c) with supportfrom 10.75 Hz frequency slice. SNR is improved by 0.6 dB. Residual of low frequencyslice at 16 Hz (b) without weighing (d) with support from 15.75 Hz frequency slice.SNR is improved by 1dB. Weighting using learned support is able to improve on theunweighted interpolation results.4610 11 12 13 14 15 16 171618202224Frequency(Hz)SNR  WeightedNon−weightedFigure 2.14: Recovery results of practical scenario in case of weighted factorized formulationover a frequency range of 9-17 Hz. 
The weighted formulation outperforms the non-weighted formulation for higher frequencies. For some frequency slices, the performance of the non-weighted algorithm is better, because the weighted algorithm can be negatively affected when the subspaces are less correlated.

Chapter 3

Efficient matrix completion for seismic data reconstruction

A version of this chapter has been published in Geophysics, 2015, vol. 80, pages V97-V114.

3.1 Summary

Despite recent developments in improved acquisition, seismic data often remains undersampled along source and receiver coordinates, resulting in incomplete data for key applications such as migration and multiple prediction. We interpret the missing-trace interpolation problem in the context of matrix completion and outline three practical principles for using low-rank optimization techniques to recover seismic data. Specifically, we strive for recovery scenarios wherein the original signal is low rank and the subsampling scheme increases the singular values of the matrix. We employ an optimization program that restores this low-rank structure to recover the full volume. Omitting one or more of these principles can lead to poor interpolation results, as we show experimentally. In light of this theory, we compensate for the high-rank behaviour of data in the source-receiver domain by employing the midpoint-offset transformation for 2D data and a source-receiver permutation for 3D data to reduce the overall singular values. Simultaneously, in order to work with computationally feasible algorithms for large-scale data, we use a factorization-based approach to matrix completion, which significantly speeds up the computations compared to repeated singular value decompositions without reducing the recovery quality. In the context of our theory and experiments, we also show that windowing the data too aggressively can have adverse effects on the recovery quality. To overcome this problem, we carry out our interpolations for each frequency independently while working with the entire frequency slice. The result is a computationally efficient, theoretically motivated framework for interpolating missing-trace data. Our tests on realistic two- and three-dimensional seismic data sets show that our method compares favorably, both in terms of computational speed and recovery quality, to existing curvelet-based and tensor-based techniques.

3.2 Introduction

Coarsely sampled seismic data creates substantial problems for seismic applications such as migration and inversion [Canning and Gardner, 1998, Sacchi and Liu, 2005]. In order to mitigate acquisition-related artifacts, we rely on interpolation algorithms to reproduce the missing traces accurately. The aim of these interpolation algorithms is to reduce acquisition costs and to provide densely sampled seismic data to improve the resolution of seismic images and mitigate subsampling-related artifacts such as aliasing. A variety of methodologies, each based on various mathematical techniques, have been proposed to interpolate seismic data. Some of the methods require transforming the data into different domains, such as the Radon [Bardan, 1987, Kabir and Verschuur, 1995], Fourier [Duijndam et al., 1999, Sacchi et al., 1998, Curry, 2009, Trad, 2009] and curvelet domains [Herrmann and Hennenfent, 2008a, Sacchi et al., 2009, Wang et al., 2010]. The CS approach exploits the resulting sparsity of the signal, i.e. the small number of nonzeros [Donoho, 2006a], in these domains.
In the CS framework, the goal for effective recovery is to first find a repre-sentation in which the signal of interest is sparse, or well-approximated by a sparse signal, andwhere the the mask encoding missing traces makes the signal much less sparse. Hennenfent andHerrmann [2006b], Herrmann and Hennenfent [2008a] successfully applied the ideas of CS to thereconstruction of missing seismic traces in the curvelet domain.More recently, rank-minimization-based techniques have been applied to interpolating seismicdata [Trickett et al., 2010, Oropeza and Sacchi, 2011, Kreimer and Sacchi, 2012c,a, Yang et al., 2013].Rank minimization extends the theoretical and computational ideas of CS to the matrix case (seeRecht et al. [2010a] and the references within). The key idea is to exploit the low-rank structure ofseismic data when organized as a matrix, i.e. a small number of nonzero singular values or quicklydecaying singular values. Oropeza and Sacchi [2011] identified that seismic temporal frequencyslices organized into a block Hankel matrix, under ideal conditions, is a matrix of rank k, wherek is the number of different plane waves in the window of analysis. These authors showed thatadditive noise and missing samples increase the rank of the block Hankel matrix, and the authorspresented an iterative algorithm that resembles seismic data reconstruction with the method ofprojection onto convex sets, where they use a low-rank approximation of the Hankel matrix viathe randomized singular value decomposition [Liberty et al., 2007, Halko et al., 2011b, Mahoney,2011] to interpolate seismic temporal frequency slices. While this technique may be effective forinterpolating data with a limited number of distinct dips, first, the approach requires embedding thedata into an even larger space where each dimension of size n is mapped to a matrix of size n×n, so afrequency slice with 4 dimensions becomes a Hankel tensor with 8 dimensions. Second, the processinvolves partitioning the input data in to smaller subsets that can be processed independently.As we know the theory of matrix completion is predicated upon the notion of an m × n matrix49being relatively low rank in order to ensure successful recovery. That is, the ratio of rank of thematrix to the ambient dimension, min(m,n), should be small for rank-minimizing techniques tobe successful in recovering the matrix from appropriately subsampled data. With the practice ofwindowing, we are inherently increasing the relative rank by decreasing the ambient dimension.Although mathematically desirable due to the seismic signal being stationary in sufficiently smallwindows, the act of windowing from a matrix rank point of view can lead to lower quality results,as we will see later in experiments. Choosing window sizes apriori is also a difficult task, as it isnot altogether obvious how to ensure that the resulting sub-volume is approximately a plane-wave.Previously proposed methods for automatic window size selection include Sinha et al. [2005], Wanget al. [2011] in the context of time-frequency analysis.Other than the Hankel transformation, Yang et al. [2013] used a texture-patch based trans-formation of the data, initially proposed by Schaeffer and Osher [2013], to exploit the low-rankstructure of seismic data. They showed that seismic data can be expressed as a combination of afew textures, due to continuity of seismic data. 
They divided the signal matrix into small r × rsubmatrices, which they then vectorized in to the columns of a matrix with r2 rows using the sameordering, and approximated the resulting matrix using low rank techniques. Although experimen-tally promising, this organization has no theoretically motivated underpinning and its performanceis difficult to predict as a function of the submatrix size. The authors proposed two algorithms tosolve this matrix completion problem, namely accelerated proximal gradient method (APG) andlow-rank matrix fitting (LMaFit). APG does not scale well to large scale seismic data because itinvolves repeated singular value decompositions, which are very expensive. LMaFit, on the otherhand, parametrizes the matrix in terms of two low-rank factors and uses nonlinear successive-over-relaxation to reconstruct the seismic data, but without penalizing the nuclear norm of the matrix.As shown in Aravkin et al. [2014a], without a nuclear norm penalty, choosing an incorrect rankparameter k can lead to overfitting of the data and degrading the interpolated result. Moreover,Mishra et al. [2013] demonstrates the poor performance of LMaFit, both in terms of speed andsolution quality, compared to more modern matrix completion techniques that penalize the nuclearnorm.Another popular approach to seismic data interpolation is to exploit the multi-dimensionalnature of seismic data and parametrize it as a low-rank tensor. Many of the ideas from low rankmatrices carry over to the multidimensional case, although there is no unique extension of the SVDto tensors. It is beyond the scope of this paper to examine all of the various tensor formats in thispaper, but we refer to a few tensor-based seismic interpolation methods here. Kreimer and Sacchi[2012a] stipulates that the seismic data volume of interest is well captured by a k−rank Tuckertensor and subsequently propose a projection on to non-convex sets algorithm for interpolatingmissing traces. Silva and Herrmann [2013a] develop an algorithm for interpolating HierarchicalTucker tensors, which are similar to Tucker tensors but have much smaller dimensionality. Trickettet al. [2013] proposes to take a structured outer product of the data volume, using a tensor ordering50similar to Hankel matrices, and performs tensor completion in the CP-Parafac tensor format. Themethod of Kreimer et al. [2013], wherein the authors consider a nuclear norm-penalizing approachin each matricization of the tensor, that is to say, the reshaping of the tensor, along each dimension,in to a matrix.These previous CS-based approaches, using sparsity or rank-minimization, incur computationaldifficulties when applied to large scale seismic data volumes. Methods that involve redundanttransforms, such as curvelets, or that add additional dimensions, such as taking outer products oftensors, are not computationally tractable for large data volumes with four or more dimensions.Moreover, a number of previous rank-minimization approaches are based on heuristic techniques andare not necessarily adequately grounded in theoretical considerations. Algorithmic components suchas parameter selection can significantly affect the computed solution and “hand-tuning” parameters,in addition to incurring unnecessary computational overhead, may lead to suboptimal results [Owenand Perry, 2009, Kanagal and Sindhwani, 2010].3.2.1 ContributionsOur contributions in this work are three-fold. 
First, we outline a practical framework for recoveringseismic data volumes using matrix and tensor completion techniques built upon the theoreticalideas from CS. In particular, understanding this framework allows us to determine apriori whenthe recovery of signals sampled at sub-Nyquist will succeed or fail and provides the principles uponwhich we can design practical experiments to ensure successful recovery. The ideas themselveshave been established for some time in the literature, albeit implicitly by means of the somewhattechnical conditions of CS and matrix completion. We explicitly describe these ideas on a highlevel in a qualitative manner in the hopes of broadening the accessibility of these techniques toa wider audience. These principles are all equally necessary in order for CS-based approaches ofsignal recovery to succeed and we provide examples of how recovery can fail if one or more of theseprinciples are omitted.Second, we address the computational challenges of using these matrix-based techniques forseismic-data reconstruction, since traditional rank minimization algorithms rely on computing thesingular value decomposition (SVD), which is prohibitively expensive for large matrices. To over-come this issue we propose to use either a fast optimization approach that combines the (SVD-free)matrix factorization approach recently developed by Lee et al. [2010a] with the Pareto curve ap-proach proposed by Berg and Friedlander [2008] and the factorization-based parallel matrix comple-tion framework dubbed Jellyfish [Recht and Re´, 2013]. We demonstrate the superior computationalperformances of both of these approaches compared to the tensor-based interpolation of Kreimeret al. [2013] as well as traditional curvelet-based approaches on realistic 2D and 3D seismic datasets.Third, we examine the popular approach of windowing a large data volume in to smaller data51volumes to be processed in parallel and empirically demonstrate how such a process does not respectthe inherent redundancy present in the data, degrading reconstruction quality as a result.3.2.2 NotationIn this paper, we use lower case boldface letters to represent vectors (i.e. one-dimensional quan-tities), e.g., b, f ,x,y, . . . . We denote matrices and tensors using upper case boldface letters, e.g.,X,Y,Z, . . . and operators that act on vectors, matrices, or tensors will be denoted using calligraphicupper case letters, e.g., A. 2D seismic volumes have one source and one receiver dimensions, de-noted xsrc, xrec, respectively, and time, denoted t. 3D seismic volumes have two source dimensions,denoted xsrc, ysrc, two receiver dimensions, denoted xrec, yrec, and time t. We also denote midpointand offset coordinates as xmidpt, xoffset for the x-dimensions and similarly for the y-dimensions.The Frobenius norm of a m × n matrix X, denoted as ‖X‖F , is simply the usual `2 normof X when considered as a vector, i.e., ‖X‖F =√∑mi=1∑nj=1 X2ij . We write the SVD of X asX = USV H , where U and V are orthogonal and S = diag(s1, s2, . . . , sr) is a block diagonalmatrix of singular values, s1 ≥ s2 ≥ · · · ≥ sr ≥ 0. The matrix X has rank k when sk > 0 andsk+1 = sk+2 = · · · = sr = 0. The nuclear norm of X is defined as ‖X‖∗ =∑ri=1 si.We will use the matricization operation freely in the text below, which reshapes a tensor in toa matrix along specific dimensions. Specifically, if X is a temporal frequency slice with dimensionsxsrc, ysrc, xrec, yrec indexed by i = 1, . . . 
, 4, the matrix X(i) is formed by vectorizing the ith dimen-sion along the rows and the remaining dimensions along the columns. Matricization can also beperformed not only along singleton dimensions, but also with groups of dimensions. For example,X(i) with i = xsrc, ysrc places the x and y source dimensions along the columns and the remainingdimensions along the columns.3.3 Structured signal recoveryIn this setting, we are interested in completing a matrix X when we only view a subset of itsentries. For instance, in the 2D seismic data case, X is typically a frequency slice and missing shotscorrespond to missing columns from this matrix. Matrix completion arises as a natural extensionof Compressive Sensing ideas to recovering two dimensional signals. Here we consider three corecomponents of matrix completion.1. Signal structure - low rankCompressed Sensing is a theory that is deals with recovering vectors x that are sparse, orhave a few nonzeros. For a matrix X, a direct analogue for sparsity in a signal x is sparsity inthe singular values of X. We are interested in the case where the singular values of X decayquickly, so that X is well approximated by a rank k matrix. The set of all rank-k matriceshas low dimensionality compared to the ambient space of m × n matrices, which will allow52us to recover a low rank signal from sufficiently incoherent measurements.When our matrix X has slowly decaying singular values, i.e. is high rank, we considertransformations that promote quickly decaying singular values, which will allow us to recoverour matrix in another domain.Since we are sampling points from our underlying matrix X, we want to make sure that X isnot too ”spiky” and is sufficiently ”spread out”. If our matrix of interest was, for instance, thematrix of all zeros with one nonzero entry, we could not hope to recover this matrix withoutsampling the single, nonzero entry. In the seismic case, given that our signals of interest arecomposed of oscillatory waveforms, they are rarely, if ever, concentrated in a single region of,say, (source,receiver) space.2. Structure-destroying sampling operatorSince our matrix has an unknown but small rank k, we will look for the matrix X of smallestrank that fits our sampled data, i.e., A(X) = B, for a subsampling operator A. As such, weneed to employ subsampling schemes that increase the rank or decay of the singular valuesof the matrix. That is to say, we want to consider sampling schemes that are incoherentwith respect to the left and right singular vectors. Given a subsampling operator A, theworst possible subsampling scheme for the purposes of recovery would be removing columns(equivalently, rows) from the matrix, i.e. A(X) = XIk, where Ik is a subset of the columnsof identity matrix. Removing columns from the matrix can never allow for successful recon-struction because this operation lowers the rank, and therefore the original matrix X is nolonger the matrix of smallest rank that matches the data (for instance, the data itself wouldbe a candidate solution).Unfortunately, for, say, a 2D seismic data frequency slice X with sources placed along thecolumns and receivers along the rows, data is often acquired with missing sources, whichtranslates to missing columns of X. Similarly, periodic subsampling can be written as A(X) =ITkXIk′ , where Ik, Ik′ are subsets of the columns of the identity matrix. 
A similar considerationshows that this operator lowers the rank and thus rank minimizing interpolation will notsucceed in this sampling regime.The problematic aspect of the aforementioned sampling schemes is that they are separablewith respect to the matrix. That is, if X = USVH is the singular value decompositionof X, the previously mentioned schemes yield a subsampling operator of the form A(X) =CXDH = (CU)S(DV)H , for some matrices C,D. In the compressed sensing context, thistype of sampling is coherent with respect to the left and right singular vectors, which is anunfavourable recovery scenario.The incoherent sampling considered in the matrix completion literature is that of uniformrandom sampling, wherein the individual entries of X are sampled from the matrix with equal53probability [Cande`s and Recht, 2009, Recht, 2011]. This particular sampling scheme, althoughtheoretically convenient to analyze, is impractical to implement in the seismic context as itcorresponds to removing (source, receiver) pairs from the data. Instead, we will considernon-separable transformations, i.e., transforming data from the source-receiver domain tothe midpoint-offset domain, under which the missing sources operator is incoherent. Theresulting transformations will simultaneously increase the decay of the singular values of ouroriginal signal, thereby lowering its rank, and slow the decay of the singular values of thesubsampled signal, thereby creating a favourable recovery scenario.3. Structure-promoting optimization programSince we assume that our target signal X is low-rank and that subsampling increases therank, the natural approach to interpolation is to find the matrix of lowest possible rank thatagrees with our observations. That is, we solve the following problem for A, our measurementoperator, and B, our subsampled data, up to a given tolerance σ,minimizeX‖X‖∗ (3.1)subject to ‖A(X)−B‖F ≤ σ.Similar to using the `1 norm in the sparse recovery case, minimizing the nuclear norm promoteslow-rank structure in the final solution. Here we refer to this problem as Basis PursuitDenoising (BPDNσ).In summary, these three principles are all necessary for the recovery of subsampled signalsusing matrix completion techniques. Omitting any one of these three principles will, in general,cause such methods to fail, which we will see in the next section. Although this framework isoutlined for matrix completion, a straightforward extension of this approach also applies to thetensor completion case.3.4 Low-rank promoting data organizationBefore we can apply matrix completion techniques to interpolate F, our unvectorized frequencyslice of fully-sampled data, we must deal with the following issues. First, in the original (src, rec)domain, the missing sources operator, A, removes columns from F, which sets the singular valuesto be set to zero at the end of the spectrum, thereby decreasing the rank.Second, F itself also has high rank, owing to the presence of strong diagonal entries (zerooffset energy) and subsequent off-diagonal oscillations. Our previous theory indicates that naivelyapplying matrix completion techniques in this domain will yield poor results. Simply put, we aremissing two of the prerequisite signal recovery principles in the (src, rec) domain, which we can seeby plotting the decay of singular values in Figure 3.1. 
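The effect of the sampling scheme on the singular values is easy to check numerically on a toy example. The sketch below contrasts removing whole columns (missing shots in the (src, rec) domain) with removing entries uniformly at random on a synthetic rank-10 matrix; it illustrates the argument above rather than the seismic volumes used for Figure 3.1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 10
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))   # exactly rank 10

def numerical_rank(a, tol=1e-8):
    s = np.linalg.svd(a, compute_uv=False)
    return int(np.sum(s > tol * s[0])), s / s[0]

# Scheme 1: remove 50% of the columns (separable sampling, like missing shots).
col_mask = rng.random(n) > 0.5
X_cols = X * col_mask[np.newaxis, :]

# Scheme 2: remove 50% of the entries uniformly at random (incoherent sampling).
pt_mask = rng.random((n, n)) > 0.5
X_pts = X * pt_mask

for name, A in [("full data", X), ("columns removed", X_cols), ("entries removed", X_pts)]:
    r, s = numerical_rank(A)
    print(f"{name:16s}: numerical rank = {r:3d}, 11th normalized singular value = {s[10]:.3f}")
```

Column removal leaves the observed matrix at rank 10, so the observations are themselves a minimum-rank solution and the optimization has no reason to fill in the gaps, whereas random entry removal pushes the observed matrix to full numerical rank, which is the favourable recovery scenario described above.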
In light of our previous discussion, we will54examine different transformations under which the missing sources operator increases the singularvalues of our data matrix and hence promotes recovery in an alternative domain.3.4.1 2D seismic dataIn this case, we use the Midpoint-Offset transformation, which defines new coordinates for thematrix asxmidpt =12(xsrc + xrec)xoffset =12(xsrc − xrec).This coordinate transformation rotates the matrix F by 45 degrees and is a tight frame operatorwith a nullspace, as depicted in Figure 3.2. If we denote this operator by M, then M∗M = I,so transforming from (src, rec) to (midpt, offset) to (src, rec) returns the original signal, butMM∗ 6= I, so the transformation from (midpt, offset) to (src, rec) and back again does notreturn the original signal. By using this transformation, we move the strong diagonal energy toa single column in the new domain, which mitigates the slow singular value decay in the originaldomain. Likewise, the restriction operator A now removes super-/sub-diagonals from F rather thancolumns, demonstrated in Figure 3.2, which results in an overall increase in the singular values, asseen in Figure 3.1, placing the interpolation problem in a favourable recovery scenario as per theprevious section. Our new optimization variable is X˜ = M(X), which is the data volume in themidpoint-offset domain, and our optimization problem is thereforeminimizeX˜‖X˜‖∗s.t. ‖AM∗(X˜)−B‖F ≤ σ.3.4.2 3D seismic dataUnlike in the matrix-case, there is no unique generalization of the SVD to tensors and as a result,there is no unique notion of rank for tensors. Instead, can consider the rank of different matri-cizations of F. Instead of restricting ourselves to matricizations F(i) where i = xsrc, ysrc, xrec, yrec,we consider the case where i = {xsrc, ysrc}, {xsrc, xrec}, {xsrc, yrec}, {ysrc, xrec}. Owing to the reci-procity relationship between sources and receivers in F, we only need to consider two differentmatricizations of F, which are depicted in Figure 3.3 and Figure 3.4. As we see in Figure 3.5, thei = (xrec, yrec) organization, that is, placing both receiver coordinates along the rows, results ina matrix that has high rank and the missing sources operator removes columns from the matrix,decreasing the rank as mentioned previously. On the other hand, the i = (ysrc, yrec) matriciza-55100 200 300 4000.20.40.60.81Number of singular valueSingular value magnitude  (src−rec)(midpt−offset)100 200 300 40000.20.40.60.81Number of singular valueSingular value magnitude  (src−rec)(midpt−offset)100 200 300 4000.20.40.60.81Number of singular valueSingular value magnitude  (src−rec)(midpt−offset)100 200 300 40000.20.40.60.81Number of singular valueSingular value magnitude  (src−rec)(midpt−offset)Figure 3.1: Singular value decay in the source-receiver and midpoint-offset domain. Left :fully sampled frequency slices. Right : 50% missing shots. Top: low frequency slice.Bottom: high frequency slice. Missing source subsampling increases the singular valuesin the (midpoint-offset) domain instead of decreasing them in the (src-rec) domain.tion yields fast decay of the singular values for the original signal and a subsampling operatorthat causes the singular values to increase. This scenario is much closer to the idealized matrixcompletion sampling, which would correspond to the nonphysical process of randomly removing(xsrc, ysrc, xrec, yrec) points from F. 
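The two matricizations compared above are simple reshapes of the 4D volume; a minimal sketch is given below. The ordering of dimensions inside the rows and columns is an assumption for the example and may differ from the implementation used to produce the figures.

```python
import numpy as np

def matricize(F, row_dims):
    """Reshape a 4D frequency slice F[xsrc, ysrc, xrec, yrec] into a matrix whose
    rows are indexed by the axes listed in `row_dims` and whose columns are
    indexed by the remaining axes.
    """
    col_dims = tuple(d for d in range(F.ndim) if d not in row_dims)
    rows = int(np.prod([F.shape[d] for d in row_dims]))
    return np.transpose(F, row_dims + col_dims).reshape(rows, -1)

# Hypothetical small volume: 10 x 10 sources and 20 x 20 receivers.
nsx = nsy = 10
nrx = nry = 20
F = np.random.randn(nsx, nsy, nrx, nry)

# Receiver matricization: (xrec, yrec) along the rows, (xsrc, ysrc) along the columns.
F_rec = matricize(F, row_dims=(2, 3))     # shape (nrx*nry, nsx*nsy)

# Permuted matricization used in the text: (ysrc, yrec) rows, (xsrc, xrec) columns.
F_perm = matricize(F, row_dims=(1, 3))    # shape (nsy*nry, nsx*nrx)

print(F_rec.shape, F_perm.shape)
```

In this layout, a missing source (xsrc, ysrc) removes a whole column of F_rec, but only a scattered sub-block of F_perm, which is why the two matricizations respond so differently to subsampling.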
We note that this data organization has been considered in thecontext of solution operators of the wave equation in Demanet [2006b], which applies to our caseas our data volume F is the restriction of a Green’s function to the acquisition surface.3.5 Large scale data reconstructionIn this section, we explore the modifications necessary to extend matrix completion to 3D seismicdata and compare this approach to an existing tensor-based interpolation technique. Matrix-56Source (m)Receiver (m)0 1000 2000 3000 4000 50000500100015002000250030003500400045005000Source (m)Receiver (m)0 1000 2000 3000 4000 50000500100015002000250030003500400045005000Offset (m)Midpoint (m)−5000 0 50000500100015002000250030003500400045005000Offset (m)Midpoint (m)−5000 0 50000500100015002000250030003500400045005000Figure 3.2: A frequency slice from the the seismic dataset from Nelson field. Left : Fullysampled data. Right : 50% subsampled data. Top: Source-receiver domain. Bottom:Midpoint-offset domain.completion techniques, after some modification, easily scale to interpolate large, multidimensionalseismic volumes.3.5.1 Large scale matrix completionFor the matrix completion approach, the limiting component for large scale data is that of thenuclear norm projection. As mentioned in Aravkin et al. [2014a], the projection on to the set‖X‖∗ ≤ τ requires the computation of the SVD of X. The main computational costs of computingthe SVD of a n× n matrix has computational complexity O(n3), which is prohibitively expensivewhen X has tens of thousands or even millions of rows and columns. On the assumption that X isapproximately low-rank at a given iteration, other authors such as Stoll [2012] compute a partialSVD using a Krylov approach, which is still cost-prohibitive for large matrices.We can avoid the need for the expensive computation of SVDs via a well known factorizationof the nuclear norm. Specifically, we have the following characterization of the nuclear norm, due57xsrc, ysrcxrec,y rec100 200 300 400 500 600100200300400500600xsrc, ysrcxrec,y rec20 40 60 80 100102030405060708090100xsrc, ysrcxrec,y rec100 200 300 400 500 600100200300400500600xsrc, ysrcxrec,y rec20 40 60 80 100102030405060708090100Figure 3.3: (xrec, yrec) matricization. Top: Full data volume. Bottom: 50% missing sources.Left : Fully sampled data. Right : Zoom plotto Srebro [2004],‖X‖∗ = minimizeL,R12(‖L‖2F + ‖R‖2F )subject to X = LRT .This allows us to write X = LRT for some placeholder variables L and R of a prescribed rankk. Therefore, instead of projecting on to ‖X‖∗ ≤ τ , we can instead project on to the factor ball12(‖L‖2F +‖R‖2F ) ≤ τ . This factor ball projection only involves computing ‖L‖2F , ‖R‖2F and scalingthe factors by a constant, which is substantially cheaper than computing the SVD of X.Equipped with this factorization approach, we can still use the basic idea of SPG`1 to flip theobjective and the constraints. The resulting subproblems for solving BPDNσ can be solved muchmore efficiently in this factorized form, while still maintaining the quality of the solution. Theresulting algorithm is dubbed SPG-LR by Aravkin et al. [2014a]. This reformulation allows us to58xsrc,xrecy rec,ysrc100 200 300 400 500 600100200300400500600xsrc,xrecy rec,ysrc20 40 60 80 100102030405060708090100xsrc,xrecy rec,ysrc100 200 300 400 500 600100200300400500600xsrc,xrecy rec,ysrc20 40 60 80 100102030405060708090100Figure 3.4: (ysrc, yrec) matricization. Top: Fully sampled data. Bottom: 50% missingsources. Left : Full data volume. Right : Zoom plot. 
In this domain, the samplingartifacts are much closer to the idealized ’pointwise’ random sampling of matrix com-pletion.0 100 200 300 400 500 600 70010−710−610−510−410−310−210−1100Singular value indexNormalized singular value  No subsampling50% missing sources0 100 200 300 400 500 600 70010−710−610−510−410−310−210−1100Singular value indexNormalized singular value  No subsampling50% missing sourcesFigure 3.5: Singular value decay (normalized) of the Left : (xrec, yrec) matricization and Right :(ysrc, yrec) matricization for full data and 50% missing sources.59apply these matrix completion techniques to large scale seismic data interpolation.This factorization turns the convex subproblems for solving BPDNσ posed in terms of X intoa nonconvex problem in terms of the variables L,R, so there is a possibility for local minima ornon-critical stationary points to arise when using this approach. As it turns out, as long as theprescribed rank k is larger than the rank of the optimal X, any local minima encountered in thefactorized problem is actually a global minimum [Burer and Monteiro, 2005, Aravkin et al., 2014a].The possibility of non-critical stationary points is harder to discount, and remains an open problem.There is preliminary analysis indicating that initializing L and R so that LRT is sufficiently closeto the true X will ensure that this optimization program will converge to the true solution [Sun andLuo, 2014]. In practice, we initialize L and R randomly with appropriately scaled Gaussian randomentries, which does not noticeably change the recovery results across various random realizations.An alternative approach to solving the factorized BPDNσ is to relax the data constraint ofEquation (3.1) in to the objective, resulting in the QPλ formulation,minL,R12‖A(LRH)−B‖2F + λ(‖L‖2F + ‖R‖2F ). (3.2)The authors in Recht and Re´ [2013] exploit the resulting independance of various subblocksof the L and R factors to create a partitioning scheme that updates components of these factorsin parallel, resulting in a parallel matrix completion framework dubbed Jellyfish. By using thisJellyfish approach, each QPλ problem for fixed λ and fixed internal rank k can be solved veryefficiently and cross-validation techniques can choose the optimal λ and rank parameters.3.5.2 Large scale tensor completionFollowing the approach of Kreimer et al. [2013], which applies the method developed in Gandyet al. [2011] to seismic data, we can also exploit the tensor structure of a frequency slice F forinterpolating missing traces.We now stipulate that each matricization F(i) for i = 1, . . . , 4 has low-rank. We can proceed inan analogous way to the matrix completion case by solving the following problemminimizeF4∑i=1‖F(i)‖∗subject to ‖A(F)−B‖2 ≤ σ,i.e. look for the tensor F that has simultaneously the lowest rank in each matricization F(i) thatfits the subsampled data B. In the case of Kreimer et al. [2013], this interpolation is performed inthe (xmidpt, ymidpt, xoffset, yoffset) domain on each frequency slice, which we also employ in our laterexperiments.To solve this problem, the authors in Kreimer et al. [2013] use the Douglas-Rachford variable60splitting technique that creates 4 additional copies of the variable F, denoted Xi, with each copycorresponding to each matricization F(i). This is an inherent feature of this approach to solveconvex optimization problems with coupled objectives/constraints and thus cannot be avoided oroptimized away. 
The authors then use an Augmented Lagrangian approach to solve the decoupled problem

\[
\min_{X_1,X_2,X_3,X_4,\mathbf{F}} \; \sum_{i=1}^{4} \|X_i\|_* + \lambda\|\mathcal{A}(\mathbf{F}) - B\|_2^2
\quad \text{subject to} \quad X_i = \mathbf{F}_{(i)} \;\; \text{for } i = 1,\dots,4. \qquad (3.3)
\]

The resulting problem is convex, and thus has a unique solution. We refer to this method as the alternating direction method of multipliers (ADMM) tensor method. This variable splitting technique can be difficult to implement for realistic problems, as the tensor F often cannot be stored fully in working memory. Given the large number of elements of F, creating at minimum four extraneous copies of F can quickly overload the storage and memory of even a large computing cluster. Moreover, there are theoretical and numerical results that state that this problem formulation is in fact no better than imposing the nuclear norm penalty on a single matricization of F, at least in the case of Gaussian measurements [Oymak et al., 2012, Signoretto et al., 2011]. We shall see a similar phenomenon in our subsequent experiments.

Penalizing the nuclear norm in this fashion, as in all methods that use an explicit nuclear norm penalty, scales very poorly as the problem size grows. When our data F has four or more dimensions, the cost of computing the SVD of one of its matricizations easily dominates the overall computational costs of the method. Applying this operation four times per iteration in the above problem, as is required due to the variable splitting, prevents this technique from performing efficiently for large realistic problems.

3.6 Experiments

We perform seismic data interpolation on five different data sets. In the 2D case, the first data set, which is a shallow-water marine scenario, is from the Nelson field provided to us by PGS. The Nelson data set contains 401 × 401 sources and receivers with a temporal sampling interval of 0.004 s. The second synthetic data set is from the Gulf of Mexico (GOM) and is provided to us by Chevron. It contains 3201 sources and 801 receivers with a spatial interval of 25 m. The third data set is simulated on a synthetic velocity model (see Berkhout and Verschuur [2006]) using IWave [Symes et al., 2011a]. An anticline salt structure overlies the target, i.e., a fault structure. A seismic line is modelled using a fixed-spread configuration where sources and receivers are placed at an interval of 15 m. This results in a data set of 361 × 361 sources and receivers.

Our 3D examples consist of two different data sets. The first data set is generated on a synthetic single-layer model. This data set has 50 sources and 50 receivers and we use a frequency slice at 4 Hz. This simple data set allows us to compare the running time of the various algorithms under consideration. The Compass data set is provided to us by the BG Group and is generated from an unknown but geologically complex and realistic model. We selected a few 4D monochromatic frequency slices from this data set at 4.68, 7.34, and 12.3 Hz. Each monochromatic frequency slice has 401 × 401 receivers spaced by 25 m and 68 × 68 sources spaced by 150 m. In all the experiments, we initialize L and R using random numbers.

3.6.1 2D seismic data

In this section, we compare matrix-completion based techniques to existing curvelet-based interpolation for interpolating 2D seismic data. For details on the curvelet-based reconstruction techniques, we refer to Herrmann and Hennenfent [2008a] and Mansour et al. [2013].
For concreteness, we concern ourselves with the missing-sources scenario, although the missing-receivers scenario is analogous. In all the experiments, we set the data misfit parameter σ to be equal to η‖B‖_F, where η ∈ (0, 1) is the fraction of the input data energy to fit.

Nelson data set

Here, we remove 50% and 75% of the sources, respectively. For the sake of comparing curvelet-based and rank-minimization based reconstruction methods on identical data, we first interpolate a single 2D frequency slice at 10 Hz. When working with frequency slices using curvelets, Mansour et al. [2013] showed that the best recovery is achieved in the midpoint-offset domain, owing to the increased curvelet sparsity. Therefore, in order to draw a fair comparison with the matrix-based methods, we perform curvelet-based and matrix-completion based reconstruction in the midpoint-offset domain.

We summarize these results of interpolating a single 2D frequency slice in Table 3.1. Compared to the costs associated with applying the forward and adjoint curvelet transforms, SPG-LR is much more efficient and, as such, this approach significantly outperforms the ℓ1-based curvelet interpolation. Both methods perform similarly in terms of reconstruction quality for low frequency slices, since these slices are well represented both as a sparse superposition of curvelets and as a low-rank matrix. High frequency data slices, on the other hand, are empirically high rank, which can be shown explicitly for a homogeneous medium as a result of Lemma 2.7 in Engquist and Ying [2007], and we expect matrix completion to perform less well in this case, as high frequencies contain oscillations away from the zero offset. On the other hand, these oscillations can be well approximated by low-rank values in localized domains. To perform the reconstruction of seismic data in the high frequency regime, Kumar et al. [2013a] proposed to represent the matrix in the Hierarchical semi-separable (HSS) format, wherein data is first windowed into off-diagonal and diagonal blocks and the diagonal blocks are recursively partitioned. The interpolation is then performed on each subset separately. In the interest of brevity, we omit the inclusion of this approach here. Additionally, since the high frequency slices are very oscillatory, they are much less sparse in the curvelet dictionary.

Owing to the significantly faster performance of matrix completion compared to the curvelet-based method, we apply the former technique to an entire seismic data volume by interpolating each frequency slice in the 5-85 Hz band. Figure 3.6 shows the interpolation results in the case of 75% missing traces. In order to obtain the best rank values for interpolating the full seismic line, we first performed the interpolation for the frequency slices at 10 Hz and 60 Hz. The best rank values we obtain for these two slices are 30 and 60, respectively. Keeping this in mind, we work with all of the monochromatic frequency slices and adjust the rank linearly from 30 to 60 when moving from low to high frequencies (a small sketch of this schedule is given below). The running time is 2 h 18 min using SPG-LR on a 2 quad-core 2.6 GHz Intel processor with 16 GB memory and implicit multithreading via LAPACK libraries. We can see that we have low reconstruction error with little coherent energy in the residual when 75% of the sources are missing. Figure 3.7 shows the qualitative measurement of recovery for all frequencies in the energy band. We can further mitigate such coherent residual energy by exploiting additional structures in the data such as symmetry, as in Kumar et al. [2014].
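As an illustration only, and not the authors' code, the per-frequency rank schedule and the misfit level σ = η‖B‖_F described above could be set up as in the following sketch; the frequency axis and the choice of η are hypothetical.

```python
import numpy as np

# Hypothetical frequency axis for the 5-85 Hz band
freqs = np.arange(5.0, 86.0, 1.0)

# Linearly adjust the rank parameter between the values found at 10 Hz and 60 Hz
ranks = np.round(np.interp(freqs, [10.0, 60.0], [30, 60])).astype(int)

def misfit_level(B, eta=0.08):
    """sigma = eta * ||B||_F, the fraction of input data energy to fit."""
    return eta * np.linalg.norm(B, 'fro')
```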
Remark

It is important to note that if we omit the first two principles of matrix completion by interpolating the signal in the source-receiver domain, as discussed previously, we obtain very poor results, as shown in Figure 3.8. Similar to CS-based interpolation, choosing an appropriate transform domain for matrix and tensor completion is vital to ensure successful recovery.

Table 3.1: Curvelet versus matrix completion (MC). Real data results for completing a frequency slice of size 401 × 401 with 50% and 75% missing sources. Left: 10 Hz (low frequency), right: 60 Hz (high frequency). SNR, computational time, and number of iterations are shown for varying levels of η = 0.08, 0.1.

10 Hz (low frequency)
                      Curvelets          MC
      η               0.08    0.1      0.08    0.1
 50%  SNR (dB)        18.2    17.3     18.6    17.7
      time (s)        1249    1020     15      10
      iterations      123     103      191     124
 75%  SNR (dB)        13.5    13.2     13.0    13.3
      time (s)        1637    1410     8.5     8
      iterations      162     119      105     104

60 Hz (high frequency)
                      Curvelets          MC
      η               0.08    0.1      0.08    0.1
 50%  SNR (dB)        10.5    10.4     12.5    12.4
      time (s)        1930    1549     19      13
      iterations      186     152      169     118
 75%  SNR (dB)        6.0     5.9      6.9     7.0
      time (s)        3149    1952     15      10
      iterations      284     187      152     105

Figure 3.6: Missing-trace interpolation. Top: Fully sampled data and 75% subsampled common receiver gather. Bottom: Recovery and residual results with an SNR of 9.4 dB.

Figure 3.7: Qualitative performance of 2D seismic data interpolation for the 5-85 Hz frequency band for 50% and 75% subsampled data.

Figure 3.8: Recovery results using matrix-completion techniques. Left: Interpolation in the source-receiver domain, low-frequency SNR 3.1 dB. Right: Difference between true and interpolated slices. Since the sampling artifacts in the source-receiver domain do not increase the singular values, matrix completion in this domain is unsuccessful. This example highlights the necessity of having the appropriate principles of low-rank recovery in place before a seismic signal can be interpolated effectively.

Gulf of Mexico data set

In this case, we remove 80% of the sources. Here, we perform the interpolation on a frequency spectrum of 5-30 Hz. Figure 3.10 shows the comparison of the reconstruction error using the rank-minimization based approach for frequency slices at 7 Hz and 20 Hz. For visualization purposes, we only show a subset of interpolated data corresponding to the square block in Figure 3.9, but we interpolate the monochromatic slice over all sources and receivers. Even in the highly subsampled case of 80%, we are still able to recover to high SNRs of 14.2 dB and 10.5 dB, respectively, but we start losing coherent energy in the residual as a result of the high subsampling ratio. These results indicate that even in complex geological environments, low frequencies are still low-rank in nature. This can also be seen since, for a continuous function, the smoother the function is (i.e., the more derivatives it has), the faster its singular values decay (see, for instance, Chang and Ha [1999]). For comparison purposes, we plot the frequency-wavenumber spectrum of the 20 Hz frequency slice in Figure 3.11 along with the corresponding spectra of the matrix with 80% of the sources removed periodically and uniformly randomly. In this case, the volume is approximately three times aliased in the bandwidth of the original signal for periodic subsampling, while the randomly subsampled case has created noisy aliases.
The average sampling interval for both schemes is the same. As shown in this figure, the interpolated matrix has a significantly improved spectrum compared to the input. Figure 3.12 shows the interpolation result over a common receiver gather using rank-minimization based techniques. In this case, we set the rank parameter to 40 and use the same rank for all the frequencies. The running time on a single frequency slice in this case is 7 min using SPG-LR and 1320 min using curvelets.

Figure 3.9: Gulf of Mexico data set. Top: Fully sampled monochromatic slice at 7 Hz. Bottom left: Fully sampled data (zoomed in the square block). Bottom right: 80% subsampled sources. For visualization purposes, the subsequent figures only show the interpolated result in the square block.

Synthetic fault model

In this setting, we remove 80% of the sources and display the results in Figure 3.13. For simplicity, we only perform rank-minimization based interpolation on this data set. In this case we set the rank parameter to 30 and use the same value for all frequencies. Even though the presence of faults makes the geological environment complex, we are still able to successfully reconstruct the data volume using rank-minimization based techniques, which is also evident in the low coherency of the data residual (Figure 3.13).

Figure 3.10: Reconstruction errors for frequency slices at 7 Hz (left) and 20 Hz (right) in the case of 80% subsampled sources. Rank-minimization based recovery with an SNR of 14.2 dB and 11.0 dB, respectively.

Figure 3.11: Frequency-wavenumber spectrum of the common receiver gather. Top left: Fully sampled data. Top right: Periodically subsampled data with 80% missing sources. Bottom left: Uniform-random subsampled data with 80% missing sources. Bottom right: Reconstruction of uniformly-random subsampled data using rank-minimization based techniques. While periodic subsampling creates aliasing, uniform-random subsampling turns the aliases into incoherent noise across the spectrum.

Figure 3.12: Gulf of Mexico data set, common receiver gather. Left: Uniformly-random subsampled data with 80% missing sources. Middle: Reconstruction results using rank-minimization based techniques (SNR = 7.8 dB). Right: Residual.

Figure 3.13: Missing-trace interpolation (80% sub-sampling) in the case of geological structures with a fault. Left: 80% sub-sampled data. Middle: after interpolation (SNR = 23 dB). Right: difference.

3.6.2 3D seismic data

Single-layer reflector data

Before proceeding to a more realistically sized data set, we first test the performance of the SPG-LR matrix completion and the tensor completion method of Kreimer et al. [2013] on a small, synthetic data set generated from a simple, single-reflector model. We only use a frequency slice at 4 Hz.
We normalize the volume to unit norm and randomly remove 50% of the sources from the data. For the alternating direction method of multipliers (ADMM) tensor method, we complete the data volumes in the midpoint-offset domain, which is the same domain used in Kreimer et al. [2013]. In the context of our framework, we note that the midpoint-offset domain for recovering 3D frequency slices has the same recovery-enhancing properties as for recovering 2D frequency slices, as mentioned previously. Specifically, missing-source sampling tends to increase the rank of the individual source and receiver matricizations in this domain, making completion via rank-minimization possible in midpoint-offset compared to source-receiver. In the original source-receiver domain, removing (xsrc, xrec) points from the tensor does not increase the singular values in the xsrc and xrec matricizations and hence the reconstruction quality will suffer. On the other hand, for the matrix completion case, the midpoint-offset conversion is a tight frame that acts on the left and right singular vectors of the matricized tensor F(xsrc,xrec) and thus does not affect the rank for this particular matricization. Also in this case, we consider the effects of windowing the input data on interpolation quality and speed. We let ADMM-w denote the ADMM method with a window size of w with an additional overlap of approximately 20%. In our experiments, we consider w = 10 (small windows), w = 25 (large windows), and w = 50 (no windowing).

In the ADMM method, the two parameters of note are λ, which controls the relative penalty between data misfit and nuclear norm, and β, which controls the speed of convergence of the individual matrices X(i) to the tensor F. The λ, β parameters proposed in Kreimer et al. [2013] do not appear to work for our problems, as using the stated parameters penalizes the nuclear norm terms too much compared to the data residual term, resulting in the solution tensor converging to X = 0. Instead, we estimate the optimal λ, β parameters by cross validation, which involves removing 20% of the 50% known sources, creating a so-called "test set", and using the remaining data points as input data (a sketch of this selection procedure is given below). We use various combinations of λ, β to solve Problem 3.3, using 50 iterations, and compare the SNR of the interpolant on the test set in order to determine the best parameters, i.e., we estimate the optimal λ, β without reference to the unknown entries of the tensor. Owing to the large computational costs of the "no window" case, we scan over exponentially increasing values of λ and fix β = 0.5. For the windowed cases, we scan over exponentially increasing values of λ, β for a single window and use the estimated λ, β for interpolating the other windows. For the SPG-LR, we set our internal rank parameter to 20 and allow the algorithm to run for 1000 iterations. As shown in Aravkin et al. [2014a], as long as the chosen rank is sufficiently large, further increasing the rank parameter will not significantly change the results.
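The cross-validation split mentioned above could be implemented along the following lines; this is a minimal sketch under the assumption that sources are indexed on a regular grid, not the authors' implementation, and the array names are hypothetical.

```python
import numpy as np

def split_train_test(known_src_idx, test_fraction=0.2, seed=0):
    """Hold out a fraction of the *known* sources as a test set for
    cross-validating (lambda, beta); the remainder is used as input data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(known_src_idx)
    n_test = int(round(test_fraction * len(idx)))
    return idx[n_test:], idx[:n_test]          # train indices, test indices

def snr_db(reference, estimate):
    """SNR (in dB) of an estimate measured on the held-out test entries."""
    return 20 * np.log10(np.linalg.norm(reference) /
                         np.linalg.norm(reference - estimate))

# Hypothetical usage: 50% of 100 sources are known, 20% of those become the test set
known = np.where(np.random.rand(100) < 0.5)[0]
train_idx, test_idx = split_train_test(known)
```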
We summarize our results in Table 3.2 and display the results in Figure 3.14.

Table 3.2: Single reflector data results. The recovery quality (in dB) and the computational time (in minutes) are reported for each method. The quality suffers significantly as the window size decreases due to the smaller redundancy of the input data, as discussed previously.

Method      SNR    Solve time   Parameter selection time   Total time
SPG-LR      25.5   0.9          N/A                        0.9
ADMM - 50   20.8   87.4         320                        407.4
ADMM - 25   16.8   4.4          16.4                       20.8
ADMM - 10   10.9   0.1          0.33                       0.43

Even disregarding the time spent selecting ideal parameters, SPG-LR matrix completion drastically outperforms the ADMM method on this small example. The tensor-based, per-dimension windowing approach also degrades the overall reconstruction quality, as the algorithm is unable to take advantage of the redundancy of the full data volume once the windows are sufficiently small. There is a very prominent tradeoff between recovery speed and reconstruction quality as the size of the windows becomes smaller, owing to the expensive nature of the ADMM approach itself for large data volumes and the inherent redundancy in the full data volume that makes interpolation possible, which is decreased when windowing.

Figure 3.14: ADMM data fit and recovery quality (SNR) for single reflector data, common receiver gather. (a) True data, (b) subsampled data, (c) SPG-LR (34.5 dB), (d) ADMM-50 (26.6 dB), (e) ADMM-25 (21.4 dB), (f) ADMM-10 (16.2 dB). Middle row: recovered slices, bottom row: residuals corresponding to each method in the middle row. Tensor-based windowing appears to visibly degrade the results, even with overlap.

BG compass data

Owing to the smoothness of the data at lower frequencies, we uniformly downsample the individual frequency slices in the receiver coordinates without introducing aliasing. This reduces the overall computational complexity while simultaneously preserving the recovery quality. The 4.64 Hz, 7.34 Hz and 12.3 Hz slices were downsampled to 101 × 101, 101 × 101 and 201 × 201 receiver grids, respectively. For these problems, the data was subsampled along the source coordinates by removing 25%, 50%, and 75% of the shots.

In order to apply matrix completion without windowing on the entire data set, the data was organized as a matrix using the low-rank promoting organization described previously. We used Jellyfish and SPG-LR implementations to complete the resulting incomplete matrix and compared these methods to the ADMM tensor method and LMaFit, an alternating least-squares approach to matrix completion detailed in Wen et al. [2012]. LMaFit is a fast matrix completion solver that avoids using nuclear norm penalization but must be given an appropriate rank parameter in order to achieve reasonable results. We use the code available from the authors' website. SPG-LR, ADMM, and LMaFit were run on a 2 quad-core 2.6 GHz Intel processor with 16 GB memory and implicit multithreading via LAPACK libraries, while Jellyfish was run on a dual Xeon X650 CPU (6 x 2 cores) with 24 GB of RAM with explicit multithreading. The hardware configurations of both of these environments are very similar, which results in SPG-LR and Jellyfish performing comparably.

For the Jellyfish experiments, the model parameter µ, which plays the same role as the λ parameter above, and the optimization parameters (initial step size and step decay) were selected by validation, which required 120 iterations of the optimization procedure for each (frequency, subsampling ratio) pair. The maximum rank value was set to the rank value used in the SPG-LR results.
For the SPG-LR experiments, we interpolated a subsection of the data for various rank values and arrived at 120, 150 and 200 as the best rank parameters for each frequency. We perform the same validation techniques on the rank parameter k of LMaFit. In order to focus solely on comparing computational times, we omit reporting the parameter selection times for the ADMM method.

The results for 75% missing sources in Figure 3.15 demonstrate that, even in the low subsampling regime, matrix completion methods can successfully recover the missing shot data at these low frequencies. Table 3.3 gives an extensive summary of our results for different subsampling ratios and frequencies. The comparative results between Jellyfish and SPG-LR agree with the theoretical results that establish the equivalence of the BPDNσ and QPλ formulations. The runtime values include the parameter estimation procedure, which was carried out individually in each case. As we have seen previously, the ADMM approach does not perform well both in terms of computational time and in terms of recovery.

In our experiments, we noticed that successful parameter combinations work well for other problems too. Hence we can argue that in a real-world problem, once a parameter combination is selected, it can be used for different instances or it can be used as an initial point for a local parameter search.

Matrix completion with windowing

When windowing the data, we use the same matricizations of the data as discussed previously, but now split the volume into nonoverlapping windows. We then use matrix completion on the resulting windows of data individually. We used Jellyfish for matrix completion on individual windows. Again, we use cross validation to select our parameters. We performed the experiments with two different window sizes (a sketch of this partitioning is given below). For the large window case, the matricization was partitioned into 4 segments along rows and columns, totalling 16 windows. For the small window case, the matricization was split into 16 segments along rows and columns, yielding 256 windows. This windowing is distinctly different from the windowing explored for the single-layer model, since here we are windowing the matricized form of the tensor, in the (xsrc, xrec) unfolding, as opposed to the per-dimension windowing in the previous section. The resulting windows created in this way contain much more sampled data than in the tensor-windowing case yet are still small enough in size to be processed efficiently.

The results in Figure 3.16 suggest that for this particular form of windowing, the matrix completion results are particularly degraded by only using small windows of data at a time. As mentioned previously, since we are relying on a high redundancy (with respect to the SVD) in the underlying and sampled data to perform matrix completion, we are reducing the overall redundancy of the input data by partitioning it. On the other hand, the real-world benefits of windowing in this context become apparent when the data cannot be fit into memory, at the cost of reconstruction quality. In this case, windowing allows us to partition the problem, offsetting the I/O cost that would result from memory paging.
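For illustration only, a minimal sketch (not the authors' code) of splitting a matricized frequency slice into non-overlapping windows along rows and columns; the segment count and matrix size are hypothetical parameters standing in for the cases discussed above.

```python
import numpy as np

def split_into_windows(X, segments):
    """Split a matricized frequency slice into segments x segments
    non-overlapping windows; each window can be completed independently."""
    row_blocks = np.array_split(np.arange(X.shape[0]), segments)
    col_blocks = np.array_split(np.arange(X.shape[1]), segments)
    return [[X[np.ix_(r, c)] for c in col_blocks] for r in row_blocks]

# Hypothetical usage: 4 segments per dimension -> 16 large windows
X = np.random.randn(404, 404)
windows = split_into_windows(X, segments=4)
```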
Based on these results, whenever possible, we strive to include as much data as possible in a given problem in order to recover the original matrix/tensor adequately.

Figure 3.15: BG 5D seismic data, 12.3 Hz, 75% missing sources. (a) True data, (b) SPG-LR (8.3 dB), (c) SPG-LR difference, (d) Jellyfish (9.2 dB), (e) ADMM (-1.48 dB), (f) LMaFit (6.3 dB), (g) Jellyfish difference, (h) ADMM difference, (i) LMaFit difference. Middle row: interpolation results, bottom row: residuals.

3.7 Discussion

As the above results demonstrate, the L, R matrix completion approach significantly outperforms the ADMM tensor-based approach, owing to its avoidance of SVD computations and its minimal duplication of variables compared to the latter method. For the simple synthetic data, the ADMM method is able to achieve a similar recovery SNR to matrix completion, albeit at a much larger computational cost. For realistically sized data sets, the difference between the two methods can mean the difference between hours and days to produce an adequate result. In terms of the difference between SPG-LR and Jellyfish matrix completion, both return results that are similar in quality, which agrees with the fact that they are both based on L, R factorizations and the ranks used in these experiments are identical. Compared to these two methods, LMaFit converges much faster in regimes where there is more data available, while producing a lower quality result. When there is very little data available, as is typical in realistic seismic acquisition scenarios, the algorithm has issues converging. We note that, since it is a sparse linear algebra method, Jellyfish tends to outperform SPG-LR when the number of missing traces is high. This sparse linear algebra approach can conceivably be employed with the SPG-LR machinery.

Figure 3.16: BG 5D seismic data, 4.68 Hz. Comparison of interpolation results with and without windowing using Jellyfish for 75% missing sources. (a) No windowing (SNR 16.7 dB), (b) large window (14.5 dB), (c) small window (SNR 8.5 dB). Top row: interpolation results for differing window sizes, bottom row: residuals.

Table 3.3: 3D seismic data results. The recovery quality (SNR, in dB) and the computational time (in minutes) are reported for each method.

                            SPG-LR         Jellyfish      ADMM           LMaFit
Frequency  Missing sources  SNR    Time    SNR    Time    SNR    Time    SNR    Time
4.68 Hz    75%              15.9   84      16.34  36      0.86   1510    14.7   204
           50%              20.75  96      19.81  82      3.95   1510    17.5   91
           25%              21.47  114     19.64  124     9.17   1510    18.9   66
7.34 Hz    75%              11.2   84      11.99  52      0.39   1512    10.7   183
           50%              15.2   126     15.05  146     1.71   1512    14.1   37
           25%              16.3   138     15.31  195     4.66   1512    14.3   21
12.3 Hz    75%              7.3    324     9.34   223     0.06   2840    8.1    814
           50%              12.6   438     12.12  706     0.21   2840    11.1   72
           25%              14.02  450     12.90  1295    0.42   2840    11.3   58
In these examples, we have not made any attempts to explicitly parallelize the SPG-LR or ADMM methods, instead relying on the efficient dense linear algebra routines used in Matlab, whereas Jellyfish is an inherently parallelized method.

Without automatic mechanisms for parameter selection as in SPG-LR, the Jellyfish, ADMM, and LMaFit algorithms rely on cross-validation techniques that involve solving many problem instances at different parameters. The inherently parallel nature of Jellyfish allows it to solve each problem instance very quickly and thus achieve very similar performance to SPG-LR. LMaFit has very fast convergence when there is sufficient data, but slows down significantly in scenarios with very little data. The ADMM method, on the other hand, scales much more poorly for large data volumes and spends much more time on parameter selection than the other methods. However, in practice, we can assume that, across frequency slices, optimally chosen parameters for one frequency slice will likely work well for neighbouring frequency slices, and thus the parameter selection time can be amortized over the whole volume.

In our experiments, aside from the simple 3D layer model and the Nelson dataset, the geological models used were not low rank. That is to say, the models had complex geology and were not simply horizontally layered media. Instead, through the use of these low-rank techniques, we are exploiting the low-rank structure of the data volumes arising from the data acquisition process, not merely any low-rank structure present in the models themselves. As the temporal frequency increases, the inherent rank of the resulting frequency slices increases, which makes low-rank interpolation more challenging. Despite this observation, we still achieve reasonable results for higher frequencies using our methods.

As predicted by our theoretical considerations, the choice of windowing in this case has a negative effect on the generated results in the situation where the earth model permits a low-rank representation that is reflected in the midpoint-offset domain. In the case of earth models that are not inherently low-rank, such as those with salt bodies, we can still recover the low-frequency slices, as shown by the examples, without performing windowing on the data sets. As a general rule of thumb, we advise incorporating as much of the input data as possible into a given matrix-completion problem, but clearly there is a tradeoff between the size of the data windows, the amount of memory available to process such volumes, and the inherent complexity of the model. Additionally, one should avoid methods that needlessly create extraneous copies of the data when working with large scale volumes.

Here we have also demonstrated the importance of theoretical components for signal recovery using matrix and tensor completion methods. By ignoring these principles of matrix completion, a practitioner can unintentionally find herself in a disadvantageous scenario and produce sub-optimal results without a guiding theory to remedy the situation. However, by choosing an appropriate transform domain in which to complete the matrix or tensor, we can successfully employ this rank-minimizing machinery to interpolate a signal with missing traces in a computationally efficient manner.

From a practitioner's point of view, the purpose of this interpolation machinery is to remove the acquisition footprint from missing-trace data that is used in further downstream seismic processes such as migration and full waveform inversion.
These techniques can help mitigate the lack of data coverage in certain areas that would otherwise have created artifacts or non-physical regions in a seismic image.

3.8 Conclusion

Building upon existing knowledge of compressive sensing as a successful signal recovery paradigm, this work has outlined the necessary components of using matrix and tensor completion methods for interpolating large-scale seismic data volumes. As we have demonstrated numerically, without the necessary components of a low-rank domain, a rank-increasing sampling scheme, and a rank-minimizing optimization scheme, matrix completion-based techniques cannot successfully recover subsampled seismic signals. Once all of these ingredients are in place, however, we can use existing convex solvers to recover the fully sampled data volume. Since such solvers invariably involve computing singular value decompositions of large matrices, we have presented two alternative factorization-based formulations that scale much more efficiently than their strictly convex counterparts when the data volumes are large. We have shown that our factorization-based matrix completion approach is very competitive compared to existing curvelet-based methods for 2D seismic data and alternating direction method of multipliers tensor-based methods for 3D seismic data.

From a practical point of view, this theoretical framework is exceedingly flexible. We have shown the effectiveness of the midpoint-offset organization for 2D data and the (xsource, xreceiver) matrix organization for 3D data for promoting low-rank structure in the data volumes, but it is conceivable that other seismic data organizations could also be useful in this regard, e.g., midpoint-offset-azimuth. Our optimization framework also allows us to operate on large-scale data without having to select a large number of parameters, and we do not need to resort to using small windows of data, which may degrade the recovery results. In the seismic context, reusing the interpolated results from lower frequencies as a warm-start for interpolating data at higher frequencies can further reduce the overall computational costs. The proposed approach to matrix completion and the Jellyfish method are very promising for large scale data sets and can conceivably be applied to interpolate wide azimuth data sets as well.

Chapter 4

Source separation for simultaneous towed-streamer marine acquisition—a compressed sensing approach

4.1 Summary

Apart from performing seismic data interpolation as shown in previous chapters, rank-minimization based techniques have shown great potential in dealing with seismic data acquired in a simultaneous fashion. In the marine environment, simultaneous acquisition is an economic way to sample seismic data and speed up acquisition, wherein single and/or multiple source vessels fire sources at near-simultaneous or slightly random times, resulting in overlapping shot records. The current paradigm for simultaneous towed-streamer marine acquisition incorporates "low-variability" in source firing times—i.e., delays of 0 to 1 (or 2) seconds—since both the sources and receivers are moving. This results in a low degree of randomness in the simultaneous data, which is challenging to separate (into its constituent sources) using compressed sensing based separation techniques, since randomization is the key to successful recovery via compressed sensing.
In this chapter, we address the challenge of source separation for simultaneous towed-streamer acquisitions via two compressed sensing based approaches—i.e., sparsity-promotion and rank-minimization. We illustrate the performance of both the sparsity-promotion and rank-minimization based techniques by simulating two simultaneous towed-streamer acquisition scenarios—i.e., over/under and simultaneous long offset. A field data example from the Gulf of Suez for the over/under acquisition scenario is also included. We observe that the proposed approaches give good and comparable recovery qualities of the separated sources, but the rank-minimization technique outperforms the sparsity-promoting technique in terms of computational time and memory. We also compare these two techniques with the NMO-based median filtering type approach.

A version of this chapter has been published in Geophysics, 2016, vol. 80, pages WD73-WD88.

4.2 Introduction

The benefits of simultaneous source marine acquisition are manifold—it allows the acquisition of improved-quality seismic data at standard (conventional) acquisition turnaround, or a reduced turnaround time while maintaining similar quality, or a combination of both advantages. In simultaneous marine acquisition, a single or multiple source vessels fire sources at near-simultaneous or slightly random times, resulting in overlapping shot records [de Kok and Gillespie, 2002, Beasley, 2008, Berkhout, 2008b, Hampson et al., 2008, Moldoveanu and Quigley, 2011, Abma et al., 2013], as opposed to non-overlapping shot records in conventional marine acquisition. A variety of simultaneous source survey designs have been proposed for towed-streamer and ocean bottom acquisitions, where small-to-large random time delays between multiple sources have been used [Beasley, 2008, Moldoveanu and Fealy, 2010, Mansour et al., 2012c, Abma et al., 2013, Wason and Herrmann, 2013b, Mosher et al., 2014].

An instance of low-variability in source firing times—e.g., 0 to 1 (or 2) seconds—is the over/under (or multi-level) source acquisition [Hill et al., 2006, Moldoveanu et al., 2007, Lansley et al., 2007, Long, 2009, Hegna and Parkes, 2012, Torben Hoy, 2013]. The benefits of acquiring and processing over/under data are clear: the recorded bandwidth is extended at both the low and high ends of the spectrum, since the depths of the sources produce complementary ghost functions, avoiding deep notches in the spectrum. The over/under acquisition allows separation of the up- and down-going wavefields at the source (or receiver) using a vertical pair of sources (or receivers) to determine wave direction. Simultaneous long offset acquisition (SLO) is another variation of simultaneous towed-streamer acquisition, where an extra source vessel is deployed, sailing one spread-length ahead of the main seismic vessel [Long et al., 2013]. The SLO technique is better in comparison to conventional acquisition since it provides longer coverage in offsets, less equipment downtime (doubling the vessel count inherently reduces the streamer length by half), easier maneuvering, and shorter line turns.

Simultaneous acquisition (e.g., over/under and SLO) results in seismic interferences or source crosstalk that degrades the quality of the migrated images. Therefore, an effective (simultaneous) source separation technique is required, which aims to recover unblended interference-free data—as acquired during conventional acquisition—from simultaneous data.
The challenge of source separation (or deblending) has been addressed by many researchers [Stefani et al., 2007, Moore et al., 2008, Akerberg et al., 2008, Huo et al., 2009], wherein the key observation has been that as long as the sources are fired at suitably randomly dithered times, the resulting interferences (or source crosstalk) will appear noise-like in specific gather domains such as common-offset and common-receiver, turning the separation problem into a (random) noise removal procedure. Inversion-type algorithms [Moore, 2010, Abma et al., 2010, Mahdad et al., 2011, Doulgeris et al., 2012, Baardman and van Borselen, 2013] take advantage of sparse representations of coherent seismic signals. Wason and Herrmann [2013a] and Wason and Herrmann [2013b] proposed an alternate sampling strategy for simultaneous acquisition (time-jittered marine) that leverages ideas from compressed sensing (CS), addressing the deblending problem through a combination of tailored (blended) acquisition design and sparsity-promoting recovery via convex optimization using one-norm constraints. This represents a scenario of high-variability in source firing times—e.g., > 1 second—resulting in irregular shot locations.

One of the source separation techniques is normal moveout based median filtering, where the key idea is as follows: i) transform the blended data into the midpoint-offset domain, ii) perform semblance analysis on common-midpoint gathers to pick the normal moveout (NMO) velocities followed by NMO corrections, iii) perform median filtering along the offset directions and then apply inverse NMO corrections. One of the major assumptions in the described workflow is that the seismic events become flat after NMO corrections; however, this can be challenging when the geology is complex and/or in the presence of noise in the data. Therefore, the above process along with the velocity analysis is repeated a couple of times to get a good velocity model to eventually separate the simultaneous data.

Recently, rank-minimization based techniques have been used for source separation by Maraschini et al. [2012] and Cheng and Sacchi [2013]. The general idea is to exploit the low-rank structure of seismic data when it is organized in a matrix. Low-rank structure refers to a small number of nonzero singular values, or quickly decaying singular values. Maraschini et al. [2012] followed the rank-minimization based approach proposed by Oropeza and Sacchi [2011], who identified that seismic temporal frequency slices organized into a block Hankel matrix, in ideal conditions, form a matrix of rank k, where k is the number of different plane waves in the window of analysis. Oropeza and Sacchi [2011] showed that additive random noise increases the rank of the block Hankel matrix and presented an iterative algorithm that resembles seismic data reconstruction with the method of projection onto convex sets, where they use a low-rank approximation of the Hankel matrix via the randomized singular value decomposition [Liberty et al., 2007, Halko et al., 2011a] to interpolate seismic temporal frequency slices. While this technique may be effective, the approach requires embedding the data into an even larger space where each dimension of size n is mapped to a matrix of size n × n. Consequently, these approaches are applied on small data windows, where one has to choose the size of these windows. Although mathematically desirable due to the seismic signal being stationary in sufficiently small windows, Kumar et al.
[2015a] showed that the act of windowing, from a matrix-rank point of view, degrades the quality of reconstruction in the case of missing-trace interpolation. Choosing window sizes a priori is also a difficult task, as it is not altogether obvious how to ensure that the resulting sub-volume is approximately a plane wave.

4.2.1 Motivation

The success of CS hinges on randomization of the acquisition, as presented in our previous work on simultaneous source acquisition [Mansour et al., 2012c, Wason and Herrmann, 2013b], which represents a case of high-variability in source firing times—e.g., within a range of 1-20 seconds—resulting in overlapping shot records that lie on irregular spatial grids. Consequently, this made our method applicable to marine acquisition with ocean bottom cables/nodes. Successful separation of simultaneous data by sparse inversion via one-norm minimization, in this high-variability scenario, motivated us to analyze the performance of our separation algorithm for the low-variability, simultaneous towed-streamer acquisitions. In this chapter, we address the challenge of source separation for two types of simultaneous towed-streamer marine acquisition—over/under and simultaneous long offset. We also compare the sparsity-promoting separation technique with separation via a rank-minimization based technique, since the latter is relatively computationally faster and memory efficient, as shown by Kumar et al. [2015a] for missing-trace interpolation.

4.2.2 Contributions

Our contributions in this work are the following: first, we propose a practical framework for source separation based upon compressed sensing (CS) theory, where we outline the necessary conditions for separating the simultaneous towed-streamer data using sparsity-promoting and rank-minimization techniques. Second, we show that source separation using the rank-minimization based framework includes a "transform domain" where we exploit the low-rank structure of seismic data. We further establish that in simultaneous towed-streamer acquisition each monochromatic frequency slice of the fully sampled blended data matrix with periodic firing times has low-rank structure in the proposed transform domain. However, uniformly random firing-time delays increase the rank of the resulting frequency slice in this transform domain, which is a necessary condition for successful recovery via rank-minimization based techniques.

Third, we show that seismic frequency slices in the proposed transform domain exhibit low-rank structure at low frequencies, but not at high frequencies. Therefore, in order to exploit the low-rank structure at higher frequencies, we adopt the Hierarchical Semi-Separable matrix representation (HSS) method proposed by Chandrasekaran et al. [2006] to represent frequency slices. Finally, we combine the (SVD-free) matrix factorization approach recently developed by Lee et al. [2010a] with the Pareto curve approach proposed by Berg and Friedlander [2008]. This renders the framework suitable for large-scale seismic data since it avoids the computation of the singular value decomposition (SVD), a necessary step in traditional rank-minimization based methods, which is prohibitively expensive for large matrices.

We simulate two simultaneous towed-streamer acquisitions—over/under and simultaneous long offset—and also use a field data example for over/under acquisition. We compare the recovery in terms of separation quality, computational time and memory usage.
In addition, we also make comparisons with the NMO-based median filtering type technique proposed by Chen et al. [2014].

4.3 Theory

Compressed sensing is a signal processing technique that allows a signal to be sampled at a sub-Nyquist rate and offers three fundamental principles for successful reconstruction of the original signal from relatively few measurements. The first principle utilizes the prior knowledge that the underlying signal of interest is sparse or compressible in some transform domain—i.e., if only a small number k of the transform coefficients are nonzero or if the signal can be well approximated by the k largest-in-magnitude transform coefficients. The second principle is based upon a sampling scheme that breaks the underlying structure—i.e., decreases the sparsity of the original signal in the transform domain. Once the above two principles hold, a sparsity-promoting optimization problem can be solved in order to recover the fully sampled signal. It is well known that seismic data admit sparse representations by curvelets that capture "wavefront sets" efficiently (see e.g., Smith [1998], Candès and Demanet [2005], Hennenfent and Herrmann [2006a] and the references therein).

For high resolution data represented by the N-dimensional vector f0 ∈ R^N, which admits a sparse representation x0 ∈ C^P in some transform domain characterized by the operator S ∈ C^{P×N} with P ≥ N, the sparse recovery problem involves solving an underdetermined system of equations:

\[
b = A x_0, \qquad (4.1)
\]

where b ∈ C^n, with n ≪ N ≤ P, represents the compressively sampled data of n measurements, and A ∈ C^{n×P} represents the measurement matrix. We denote by x0 a sparse synthesis coefficient vector of f0. When x0 is strictly sparse—i.e., only k < n nonzero entries in x0—sparsity-promoting recovery can be achieved by solving the ℓ0 minimization problem, which is a combinatorial problem and quickly becomes intractable as the dimension increases. Instead, the basis pursuit denoise (BPDN) convex optimization problem

\[
\min_{x \in \mathbb{C}^P} \; \|x\|_1 \quad \text{subject to} \quad \|b - Ax\|_2 \le \epsilon \qquad \text{(BPDN)}
\]

can be used to recover x̃, which is an estimate of x0. Here, ε represents the error bound in the least-squares misfit and the ℓ1 norm ‖x‖1 is the sum of the absolute values of the elements of the vector x. The matrix A can be composed of the product of an n × N sampling (or acquisition) matrix M and the sparsifying operator S such that A := MS^H, where H denotes the Hermitian transpose. Consequently, the measurements are given by b = Ax0 = Mf0. A seismic line with Ns sources, Nr receivers, and Nt time samples can be reshaped into an N-dimensional vector f, where N = Ns × Nr × Nt. For simultaneous towed-streamer acquisition, given two unblended data vectors x1 and x2 and (blended) measurements b, we can redefine Equation 4.1 as

\[
\underbrace{\begin{bmatrix} MT_1S^H & MT_2S^H \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}}_{x} = b, \qquad (4.2)
\]

where T1 and T2 are defined as the firing-time delay operators, which apply uniformly random time delays to the first and second source, respectively. Note that accurate knowledge of the firing times is essential for successful recovery by the proposed source separation techniques. We wish to recover a sparse approximation f̃ of the discretized wavefield f (corresponding to each source) from the measurements b. This is done by solving the BPDN sparsity-promoting program, using the SPGℓ1 solver [see Berg and Friedlander, 2008, Hennenfent et al., 2008, for details], yielding f̃ = S^H x̃ for each source.
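To illustrate how firing-time delay operators such as T1 and T2 act on the data, here is a minimal frequency-domain sketch, not the authors' implementation, in which a uniformly random delay is applied as a phase shift to every shot record of each source before summation; the array shapes, sampling interval, and delay range are hypothetical, and the shift is circular (padding would avoid wrap-around in practice).

```python
import numpy as np

def blend_two_sources(d1, d2, dt=0.004, max_delay=1.0, seed=0):
    """Blend two conventional data cubes d1, d2 of shape (nt, nrec, nsrc)
    by applying a uniformly random firing-time delay in [0, max_delay] seconds
    to every shot of each source and summing the delayed records."""
    nt, nrec, nsrc = d1.shape
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(nt, d=dt)                 # temporal frequencies (Hz)
    t1 = rng.uniform(0.0, max_delay, size=nsrc)       # delays for source 1
    t2 = rng.uniform(0.0, max_delay, size=nsrc)       # delays for source 2

    def delay(d, tdel):
        # time shift implemented as a per-shot phase shift (circular)
        D = np.fft.rfft(d, axis=0)
        phase = np.exp(-2j * np.pi * freqs[:, None, None] * tdel[None, None, :])
        return np.fft.irfft(D * phase, n=nt, axis=0)

    return delay(d1, t1) + delay(d2, t2), t1, t2      # blended data + delays

# Hypothetical usage on small random cubes
d1 = np.random.randn(512, 16, 16)
d2 = np.random.randn(512, 16, 16)
blended, t1, t2 = blend_two_sources(d1, d2)
```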
Sparsity is not the only structure seismic data exhibits when three- or five-dimensional seismic data is organized as a vector. High-dimensional seismic data volumes can also be represented as matrices or tensors, where the low-rank structure of seismic data can be exploited [Trickett and Burroughs, 2009, Oropeza and Sacchi, 2011, Kreimer and Sacchi, 2012a, Silva and Herrmann, 2013b, Aravkin et al., 2014b]. This low-rank property of seismic data leads to the notion of matrix completion theory, which offers a reconstruction strategy for an unknown matrix X from known subsets of its entries [Candès and Recht, 2009, Recht et al., 2010a]. The success of the matrix completion framework hinges on the fact that the regularly sampled target dataset should exhibit a low-rank structure in the rank-revealing "transform domain", while subsampling should destroy the low-rank structure of seismic data in that domain.

4.3.1 Rank-revealing "transform domain"

Following the same analogy of CS, the main challenge in applying matrix completion techniques to the source separation problem is to find a "transform domain" wherein: i) fully sampled conventional (or unblended) seismic data have low-rank structure—i.e., quickly decaying singular values; ii) blended seismic data have high-rank structure—i.e., slowly decaying singular values. When these properties hold, rank-minimization techniques (used in matrix completion) can be used to recover the deblended signal. Kumar et al. [2013a] showed that the frequency slices of unblended seismic data do not exhibit low-rank structure in the source-receiver (s-r) domain since strong wavefronts extend diagonally across the s-r plane. However, transforming the data into the midpoint-offset (m-h) domain results in a vertical alignment of the wavefronts, thereby reducing the rank of the frequency slice matrix. The midpoint-offset domain is a coordinate transformation defined as

\[
x_{\text{midpoint}} = \tfrac{1}{2}\left(x_{\text{source}} + x_{\text{receiver}}\right), \qquad
x_{\text{offset}} = \tfrac{1}{2}\left(x_{\text{source}} - x_{\text{receiver}}\right).
\]

These observations motivate us to exploit the low-rank structure of seismic data in the midpoint-offset domain for simultaneous towed-streamer acquisition. Figures 4.1a and 4.1c show a monochromatic frequency slice (at 5 Hz) for simultaneous acquisition with periodic firing times in the source-receiver (s-r) and midpoint-offset (m-h) domains, while Figures 4.1b and 4.1d show the same for simultaneous acquisition with random firing-time delays. Note that we use source-receiver reciprocity to convert each monochromatic frequency slice of the towed-streamer acquisition to a split-spread type acquisition, which is required by our current implementation of rank-minimization based techniques for 2-D seismic acquisition. For 3-D seismic data acquisition, however, where seismic data exhibit 5-D structure, we can follow the strategy proposed by Kumar et al. [2015a], where different matricizations are used as a transform domain to exploit the low-rank structure of seismic data. Therefore, in 3-D seismic data acquisition we do not have to work in the midpoint-offset domain, which removes the requirement of source-receiver reciprocity.

As illustrated in Figure 4.1, simultaneously acquired data with periodic firing times preserves continuity of the waveforms in the s-r and m-h domains, which inherently does not change the rank of the blended data compared to the unblended data. Introducing random time delays destroys continuity of the waveforms in the s-r and m-h domains, thus increasing the rank of the blended data matrix drastically, which is a necessary condition for rank-minimization based algorithms to work effectively.
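A minimal sketch of the source-receiver to midpoint-offset mapping, under the assumptions of a square split-spread frequency slice on a uniform grid with equal source and receiver spacing; the grid and index convention are illustrative, not the authors' code.

```python
import numpy as np

def to_midpoint_offset(F_sr):
    """Map a square frequency slice F_sr[isrc, irec] onto a midpoint-offset grid.

    Midpoint index = isrc + irec, offset index = isrc - irec (shifted to be
    non-negative); entries of the m-h grid not hit by any (isrc, irec) pair
    remain zero.
    """
    n = F_sr.shape[0]
    F_mh = np.zeros((2 * n - 1, 2 * n - 1), dtype=F_sr.dtype)
    isrc, irec = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
    F_mh[isrc + irec, isrc - irec + (n - 1)] = F_sr
    return F_mh

# Hypothetical usage on a 5 Hz frequency slice with 231 sources and receivers
F_sr = np.random.randn(231, 231) + 1j * np.random.randn(231, 231)
F_mh = to_midpoint_offset(F_sr)
```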
To illustrate this behaviour, we plot the decay of the singular values of a 5 Hz monochromatic frequency slice extracted from the periodic and randomized simultaneous acquisitions in the s-r and m-h domains, respectively, in Figure 4.2a. Note that uniformly random firing-time delays do not noticeably change the decay of the singular values in the source-receiver (s-r) domain, as expected, but significantly slow down the decay rate in the m-h domain. Similar trends are observed for a monochromatic frequency slice at 40 Hz in Figure 4.2b. Following the same analogy, Figures 4.2c and 4.2d show how randomization in acquisition destroys the sparse structure of seismic data in the source-channel (or source-offset) domain—i.e., slow decay of the curvelet coefficients—hence favouring recovery via sparsity-promotion in this domain. Similarly, for simultaneous long offset acquisition, we exploit the low-rank structure of seismic data in the m-h domain, and the sparse structure in the source-channel domain.

Seismic frequency slices exhibit low-rank structure in the m-h domain at low frequencies, but the same is not true for data at high frequencies. This is because in the low-frequency slices, the vertical alignment of the wavefronts can be accurately approximated by a low-rank representation. On the other hand, high-frequency slices include a variety of wave oscillations that increase the rank, even though the energy remains focused around the diagonal [Kumar et al., 2013a]. To illustrate this phenomenon, we plot a monochromatic frequency slice at 40 Hz in the s-r domain and the m-h domain for over/under acquisition in Figure 4.3. When analyzing the decay of the singular values for high-frequency slices in the s-r domain and the m-h domain (Figure 4.2b), we observe that the singular value decay is slower for the high-frequency slice than for the low-frequency slice. Therefore, rank-minimization in the high-frequency range requires extended formulations that incorporate the low-rank structure.

Figure 4.1: Monochromatic frequency slice at 5 Hz in the source-receiver (s-r) and midpoint-offset (m-h) domains for blended data (a,c) with periodic firing times and (b,d) with uniformly random firing times for both sources.

Figure 4.2: Decay of singular values for a frequency slice at (a) 5 Hz and (b) 40 Hz of blended data. Source-receiver domain: blue—periodic, red—random delays. Midpoint-offset domain: green—periodic, cyan—random delays. Corresponding decay of the normalized curvelet coefficients for a frequency slice at (c) 5 Hz and (d) 40 Hz of blended data, in the source-channel domain.

To exploit the low-rank structure of high-frequency data, we rely on the Hierarchical Semi-Separable matrix representation (HSS) method proposed by Chandrasekaran et al. [2006] to represent frequency slices. The key idea in the HSS representation is that certain full-rank matrices, e.g., matrices that are diagonally dominant with energy decaying along the off-diagonals, can be represented by a collection of low-rank sub-matrices. Kumar et al. [2013a] showed the possibility of finding accurate low-rank approximations of sub-matrices of the high-frequency slices by partitioning the data into the HSS structure for missing-trace interpolation. Jumah and Herrmann [2014] showed that HSS representations can be used to reduce the storage and computational cost for the estimation of primaries by sparse inversion. They combined the HSS representation with the randomized SVD proposed by Halko et al. [2011c] to accelerate the matrix-vector multiplications that are required for sparse inversion.
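To make the diagnostic behind Figure 4.2 concrete, the following small sketch (not the authors' code) compares the normalized singular-value decay of a frequency slice in its s-r and m-h arrangements; it assumes the hypothetical to_midpoint_offset helper from the earlier sketch and illustrative slice variables.

```python
import numpy as np

def normalized_singular_values(X):
    """Singular values of X scaled so that the largest equals one."""
    s = np.linalg.svd(X, compute_uv=False)
    return s / s[0]

# Hypothetical comparison for a blended 5 Hz slice with random firing-time delays:
# F_random is an (nsrc x nrec) frequency slice and to_midpoint_offset is the
# sketch shown earlier; faster decay in m-h than in s-r is the low-rank signature.
# decay_sr = normalized_singular_values(F_random)
# decay_mh = normalized_singular_values(to_midpoint_offset(F_random))
```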
4.3.2 Hierarchical semi-separable matrix representation (HSS)

The HSS structure first partitions a matrix into diagonal and off-diagonal sub-matrices. The same partitioning structure is then applied recursively to the diagonal sub-matrices only. To illustrate the HSS partitioning, we consider a 2-D monochromatic high-frequency data matrix at 40 Hz in the s-r domain. We show the first level of partitioning in Figure 4.4a and the second level in Figure 4.4b, in their corresponding source-receiver domains. Figures 4.5a and 4.5b display the first-level off-diagonal sub-blocks, Figure 4.5c is the diagonal sub-block, and the corresponding decay of the singular values is displayed in Figure 4.6. We can clearly see that the off-diagonal sub-matrices have low-rank structure, while the diagonal sub-matrices have higher rank. Further partitioning of the diagonal sub-blocks (Figure 4.4b) allows us to find better low-rank approximations. The same argument holds for the simultaneous long offset acquisition. Therefore, for low-variability acquisition scenarios, each frequency slice is first partitioned using HSS and then deblended in its respective m-h domain, as shown for missing-trace interpolation by Kumar et al. [2013a].

Figure 4.3: Monochromatic frequency slice at 40 Hz in the s-r and m-h domains for blended data (a,c) with periodic firing times and (b,d) with uniformly random firing times for both sources.

Figure 4.4: HSS partitioning of a high-frequency slice at 40 Hz in the s-r domain: (a) first level, (b) second level, for randomized blended acquisition.

Figure 4.5: (a,b,c) First-level sub-block matrices (from Figure 4.4a).

Figure 4.6: Decay of singular values of the HSS sub-blocks in the s-r domain: red—Figure 4.5a, black—Figure 4.5b, blue—Figure 4.5c.

One of the limitations of matrix completion type approaches for large-scale seismic data is the nuclear-norm projection, which inherently involves the computation of singular value decompositions (SVD). Aravkin et al. [2014b] showed that the computation of the SVD is prohibitively expensive for large-scale data such as seismic data; therefore, we propose a matrix-factorization based approach to avoid the need for the expensive computation of SVDs [see Aravkin et al., 2014b, for details]. In the next section, we introduce the matrix completion framework and explore its necessary extension to separate large-scale simultaneous seismic data.

4.3.3 Large-scale seismic data: SPG-LR framework

Let X0 be a low-rank matrix in C^{n×m} and A be a linear measurement operator that maps from C^{n×m} to C^p with p ≪ n × m. Under the assumption that the blending process increases the rank of the matrix X0, the source separation problem is to find the matrix of lowest possible rank that agrees with the above observations. The rank-minimization problem involves solving the following problem for A, up to a given tolerance ε:

\[
\min_{X} \; \operatorname{rank}(X) \quad \text{subject to} \quad \|\mathcal{A}(X) - b\|_2 \le \epsilon,
\]

where rank is defined as the maximum number of linearly independent rows or columns of a matrix and b is a set of blended measurements. For simultaneous towed-streamer acquisition, we follow Equation 4.2 and redefine our system of equations as

\[
\underbrace{\begin{bmatrix} MT_1S^H & MT_2S^H \end{bmatrix}}_{\mathcal{A}}
\underbrace{\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}}_{X} = b,
\]

where S is the transformation operator from the s-r domain to the m-h domain.
Recht et al. [2010b] showed that under certain general conditions on the operator A, the solution to the rank-minimization problem can be found by solving the following nuclear-norm minimization problem:

\[
\mathop{\text{minimize}}_{X} \;\; \|X\|_* \quad \text{subject to} \quad \|A(X) - b\|_2 \leq \epsilon, \tag{BPDN}
\]

where \(\|X\|_* = \|\sigma\|_1\) and \(\sigma\) is the vector of singular values. Unfortunately, for large-scale data, solving the BPDN problem is difficult since it requires repeated projections onto the set \(\|X\|_* \leq \tau\), which means repeated SVD or partial SVD computations. Therefore, in this paper, we avoid computing singular value decompositions (SVDs) of the matrices and use an extension of the SPGℓ1 solver [Berg and Friedlander, 2008] developed for the BPDN problem in Aravkin et al. [2013b]. We refer to this extension as SPG-LR in the rest of the paper. The SPG-LR algorithm finds the solution to the BPDN problem by solving a sequence of LASSO (least absolute shrinkage and selection operator) subproblems:

\[
\mathop{\text{minimize}}_{X} \;\; \|A(X) - b\|_2 \quad \text{subject to} \quad \|X\|_* \leq \tau, \tag{LASSO$_\tau$}
\]

where \(\tau\) is updated by traversing the Pareto curve. The Pareto curve defines the optimal trade-off between the two-norm of the residual and the one-norm of the solution [Berg and Friedlander, 2008]. Solving each LASSO subproblem requires a projection onto the nuclear-norm ball \(\|X\|_* \leq \tau\) in every iteration by performing a singular value decomposition and then thresholding the singular values. For large-scale seismic problems, it becomes prohibitively expensive to carry out such a large number of SVDs. Instead, we adopt a recent factorization-based approach to nuclear-norm minimization [Rennie and Srebro, 2005a, Lee et al., 2010a, Recht and Ré, 2011]. The factorization approach parametrizes the matrices (X1, X2) in C^{n x m} as the product of two low-rank factors (L1, L2) in C^{n x k} and (R1, R2) in C^{m x k} such that

\[
X = \begin{bmatrix} L_1 R_1^H \\ L_2 R_2^H \end{bmatrix}. \tag{4.3}
\]

Here, k represents the rank of the L and R factors. The optimization scheme can then be carried out using the factors (L1, L2) and (R1, R2) instead of (X1, X2), thereby significantly reducing the size of the decision variable from 2nm to 2k(n + m) when k << m, n. Rennie and Srebro [2005a] showed that the nuclear norm obeys the relationship

\[
\|X\|_* \;\leq\; \frac{1}{2}\left\|\begin{bmatrix} L_1 \\ R_1 \end{bmatrix}\right\|_F^2 + \frac{1}{2}\left\|\begin{bmatrix} L_2 \\ R_2 \end{bmatrix}\right\|_F^2 \;=:\; \Phi(L_1, R_1, L_2, R_2), \tag{4.4}
\]

where \(\|\cdot\|_F^2\) is the squared Frobenius norm of the matrix—i.e., the sum of the squared entries. Consequently, the LASSO subproblem can be replaced by

\[
\mathop{\text{minimize}}_{L_1, R_1, L_2, R_2} \;\; \|A(X) - b\|_2 \quad \text{subject to} \quad \Phi(L_1, R_1, L_2, R_2) \leq \tau, \tag{4.5}
\]

where the projection onto \(\Phi(L_1, R_1, L_2, R_2) \leq \tau\) is easily achieved by multiplying each factor (L1, L2) and (R1, R2) by the scalar \(\sqrt{2\tau/\Phi(L_1, R_1, L_2, R_2)}\). Equation 4.4, for each HSS sub-matrix in the m-h domain, guarantees that \(\|X\|_* \leq \tau\) for any solution of 4.5. Once the optimization problem is solved, each sub-matrix in the m-h domain is transformed back into the s-r domain, where we concatenate all the sub-matrices to get the deblended monochromatic frequency data matrices. One of the advantages of the HSS representation is that it works with recursive partitioning of a matrix, and the sub-matrices can be solved in parallel, speeding up the optimization.

4.4 Experiments

We perform source separation for two simultaneous towed-streamer acquisition scenarios (over/under and simultaneous long offset) by generating synthetic datasets on complex geological models using the IWAVE [Symes et al., 2011a] time-stepping acoustic simulation software, and we also use a field dataset from the Gulf of Suez. Source separation for over/under acquisition is tested on two different datasets.
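Before turning to the datasets, the projection step of the factorized LASSO subproblem (equation 4.5) can be illustrated in a few lines. The sketch below is only schematic: dense NumPy arrays, a placeholder blending operator A with adjoint At supplied by the caller, and a plain gradient step instead of the spectral-projected-gradient machinery of SPG-LR, so it should not be read as the solver used in this thesis. The factors are rescaled so that Phi equals tau after the projection.

```python
import numpy as np

def phi(L1, R1, L2, R2):
    # Phi = 0.5*||[L1;R1]||_F^2 + 0.5*||[L2;R2]||_F^2, an upper bound on ||X||_*
    return 0.5 * (np.linalg.norm(L1)**2 + np.linalg.norm(R1)**2) \
         + 0.5 * (np.linalg.norm(L2)**2 + np.linalg.norm(R2)**2)

def project_factors(L1, R1, L2, R2, tau):
    """Scale all factors so that Phi(L1,R1,L2,R2) = tau whenever it exceeds tau."""
    p = phi(L1, R1, L2, R2)
    if p <= tau:
        return L1, R1, L2, R2
    s = np.sqrt(tau / p)   # Phi scales quadratically, so Phi(s*L, s*R) = s^2 * Phi
    return s * L1, s * R1, s * L2, s * R2

def lasso_step(L1, R1, L2, R2, A, At, b, tau, step=1e-3):
    """One projected-gradient step on ||A(X) - b||^2 with X_i = L_i R_i^H.
    A and At are placeholders for the blending operator and its adjoint."""
    X1, X2 = L1 @ R1.conj().T, L2 @ R2.conj().T
    r = A(X1, X2) - b                      # data residual
    G1, G2 = At(r)                         # gradients with respect to X1 and X2
    L1, R1 = L1 - step * (G1 @ R1), R1 - step * (G1.conj().T @ L1)
    L2, R2 = L2 - step * (G2 @ R2), R2 - step * (G2.conj().T @ L2)
    return project_factors(L1, R1, L2, R2, tau)
```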
The first dataset is simulated on the Marmousi model [Bourgeois et al., 1991], whichrepresents a complex-layer model with steeply dipping reflectors that make the data challenging.With a source (and channel/receiver) sampling of 20.0 m, one dataset is generated with a source-depth of 8.0 m (Figures 4.7a and 4.7d), while the other dataset has the source at 12.0 m depth(Figures 4.7b and 4.7e), resulting in 231 sources and 231 channels. The temporal length of eachdataset is 4.0 s with a sampling interval of 0.004 s. The second dataset is a field data example fromthe Gulf of Suez. In this case, the first source is placed at 5.0 m depth (Figures 4.8a and 4.8d)and the second source is placed at 10.0 m depth (Figures 4.8b and 4.8e). The source (and channel)sampling is 12.5 m, resulting in 178 sources and 178 channels with a time sampling interval of 0.004s.The simultaneous long offset acquisition is simulated on the BP salt model [Billette and Brandsberg-Dahl, 2004], where the presence of salt-bodies make the data challenging. The two source vesselsare 6.0 km apart and the streamer length is 6.0 km. Both the datasets (for source 1 and source2) contain 361 sources and 361 channels with a spatial interval of 12.5 m, where the source andstreamer depth is 6.25 m. The temporal length of each dataset is 6.0 s with a sampling intervalof 0.006 s. A single shot gather from each dataset is shown in Figures 4.9a and 4.9b and thecorresponding channel gathers are shown in Figures 4.9d and 4.9e. The datasets for each sourcein both the acquisition scenarios are (simply) summed for simultaneous acquisition with periodicfiring times, while uniformly random time delays between 0-1 second are applied to each source forthe randomized simultaneous acquisition. Figures 4.7c, 4.8c and 4.9c show the randomized blendedshot gathers for the Marmousi, the Gulf of Suez and the BP datasets, respectively. As illustratedin the figures, both the sources fire at random times (independent of each other) within the intervalof 0-1 second, hence, the difference between the firing times of the sources is always less than 1second. The corresponding randomized blended channel gathers are shown in Figures 4.7f, 4.8fand 4.9f. Note that the speed of the vessels in both the acquisition scenarios is no different thanthe current practical speed of the vessels in the field.For deblending via rank-minimization, second-level of HSS partitioning, on each frequency slicein the s-r domain, was sufficient for successful recovery in both the acquisition scenarios. Aftertransforming each sub-block into the m-h domain, deblending is then performed by solving thenuclear-norm minimization formulation (BPDN) on each sub-block, using 350 iterations of SPG-LR. In order to choose an appropriate rank value, we first perform deblending for frequency slicesat 0.2 Hz and 125 Hz. For the over/under acquisition simulated on the Marmousi model, the best90[a] Channel (km)0 2 4Time (s)00.511.522.533.5[b] Channel (km)0 2 4Time (s)00.511.522.533.5[c] Channel (km)0 2 4Time (s)00.511.522.533.5[d] Source (km)0 2 4Time (s)00.511.522.533.5[e] Source (km)0 2 4Time (s)00.511.522.533.5[f] Source (km)0 2 4Time (s)00.511.522.533.5Figure 4.7: Original shot gather of (a) source 1, (b) source 2, and (c) the correspondingblended shot gather for simultaneous over/under acquisition simulated on the Mar-mousi model. (d, e) Corresponding common-channel gathers for each source and (f)the blended common-channel gather.rank value is 30 and 80 for each frequency slice, respectively. 
The best rank values for the Gulf ofSuez dataset are 20 and 100, respectively. For simultaneous long offset acquisition, the best rankvalue is 10 and 90 for frequency slices at 0.15 Hz and 80 Hz, respectively. Hence, we adjust therank linearly within these ranges when moving from low to high frequencies, for each acquisitionscenario. For deblending via sparsity-promotion, we use the BPDN formulation to minimize the`1 norm (instead of the nuclear-norm) where the transformation operator S is the 2-D curveletoperator. Here, we run 350 iterations of SPG`1.For the over/under acquisition scenario simulated on the Marmousi model, Figures 4.10a and 4.10cshow the deblended shot gathers via rank-minimization and Figures 4.10e and 4.10g show the de-blended shot gathers via sparsity-promotion, respectively. The deblended common-channel gath-ers via rank-minimization and sparsity-promotion are shown in Figures 4.11a, 4.11c and Fig-ures 4.11e, 4.11g, respectively. For the Gulf of Suez field dataset, Figures 4.12 and 4.13 show91[a] Channel (km)0 1 2Time (s)00.511.522.533.5[b] Channel (km)0 1 2Time (s)00.511.522.533.5[c] Channel (km)0 1 2Time (s)00.511.522.533.5[d] Source (km)0 1 2Time (s)00.511.522.533.5[e] Source (km)0 1 2Time (s)00.511.522.533.5[f] Source (km)0 1 2Time (s)00.511.522.533.5Figure 4.8: Original shot gather of (a) source 1, (b) source 2, and (c) the correspondingblended shot gather for simultaneous over/under acquisition from the Gulf of Suezdataset. (d, e) Corresponding common-channel gathers for each source and (f) theblended common-channel gather.the deblended gathers and difference plots in the common-shot and common-channel domain, re-spectively. The corresponding deblended gathers and difference plots in the common-shot andcommon-channel domain for the simultaneous long offset acquisition scenario are shown in Fig-ures 4.14 and 4.15.As illustrated by the results and their corresponding difference plots, both the CS-based ap-proaches of rank-minimization and sparsity-promotion are able to deblend the data for the low-variability acquisition scenarios fairly well. In all the three different datasets, the average SNRsfor separation via sparsity-promotion is slightly better than rank-minimization, but the differenceplots show that the recovery via rank-minimization is equivalent to the sparsity-promoting basedrecovery where it is able to recover most of the coherent energy. Also, rank-minimization outper-forms the sparsity-promoting technique in terms of the computational time and memory usage asrepresented in Tables 4.1, 4.2 and 4.3. Both the CS-based recoveries are better for the simultane-92[a] Channel (km)0 2 4 6Time (s)012345[b] Channel (km)0 2 4 6Time (s)012345[c] Channel (km)0 2 4 6Time (s)012345[d] Source (km)0 2 4 6Time (s)012345[e] Source (km)0 2 4 6Time (s)012345[f] Source (km)0 2 4 6Time (s)012345Figure 4.9: Original shot gather of (a) source 1, (b) source 2, and (c) the correspondingblended shot gather for simultaneous long offset acquisition simulated on the BP saltmodel. (d, e) Corresponding common-channel gathers for each source and (f) theblended common-channel gather.ous long offset acquisition than the recoveries from the over/under acquisition scenario. 
A possible explanation for this improvement is the long offset distance, which increases randomization in the simultaneous acquisition and is therefore a more favourable scenario for recovery by CS-based approaches. Figure 4.16 demonstrates the advantage of the HSS partitioning, where the SNRs of the deblended data are significantly improved.

Table 4.1: Comparison of computational time (in hours), memory usage (in GB) and average SNR (in dB) using sparsity-promoting and rank-minimization based techniques for the Marmousi model.

              Time (h)   Memory (GB)   SNR (dB)
  Sparsity    167        7.0           16.7, 16.7
  Rank        12         2.8           15.0, 14.8

Table 4.2: Comparison of computational time (in hours), memory usage (in GB) and average SNR (in dB) using sparsity-promoting and rank-minimization based techniques for the Gulf of Suez dataset.

              Time (h)   Memory (GB)   SNR (dB)
  Sparsity    118        6.6           14.6
  Rank        8          2.6           12.8

4.4.1 Comparison with NMO-based median filtering

We also compare the performance of our CS-based deblending techniques with deblending using the NMO-based median filtering technique proposed by Chen et al. [2014], where we work on a common-midpoint gather from each acquisition scenario. For the over/under acquisition simulated on the Marmousi model, Figures 4.17a and 4.17e show the blended common-midpoint gathers, and deblending using the median filtering technique is shown in Figures 4.17b and 4.17f. The corresponding deblended common-midpoint gathers from the two CS-based techniques are shown in Figures 4.17(c,d,g,h). Figure 4.18 shows the blended and deblended common-midpoint gathers for the field data from the Gulf of Suez. We observe that recoveries via the proposed CS-based approaches are comparable to the recovery from the median filtering technique. Similarly, Figure 4.19 shows the results for the simultaneous long offset acquisition simulated on the BP salt model. Here, the CS-based techniques result in slightly improved recoveries.

4.4.2 Remark

It is important to note here that we perform the CS-based source separation algorithms only once; however, we can always perform a few more runs of the algorithms, where we first subtract the deblended source 1 and source 2 from the acquired blended data and then re-run the algorithms to deblend the energy in the residual data.
Hence, the recovery can be improved further if needed. Since separation via rank-minimization is computationally faster than the sparsity-based technique, multiple passes through the data are a computationally viable option for the former deblending technique.

Figure 4.10: Deblended shot gathers and difference plots (from the Marmousi model) of source 1 and source 2: (a,c) deblending using HSS based rank-minimization and (b,d) the corresponding difference plots; (e,g) deblending using curvelet-based sparsity-promotion and (f,h) the corresponding difference plots.

Figure 4.11: Deblended common-channel gathers and difference plots (from the Marmousi model) of source 1 and source 2: (a,c) deblending using HSS based rank-minimization and (b,d) the corresponding difference plots; (e,g) deblending using curvelet-based sparsity-promotion and (f,h) the corresponding difference plots.

Table 4.3: Comparison of computational time (in hours), memory usage (in GB) and average SNR (in dB) using sparsity-promoting and rank-minimization based techniques for the BP model.

              Time (h)   Memory (GB)   SNR (dB)
  Sparsity    325        7.0           32.0, 29.4
  Rank        20         2.8           29.4, 29.0

4.5 Discussion

The above experiments demonstrate the successful implementation of the proposed CS-based approaches of rank-minimization and sparsity-promotion for source separation in the low-variability simultaneous towed-streamer acquisitions. The recovery is comparable for both approaches; however, separation via rank-minimization is significantly faster and more memory efficient. This is further enhanced by incorporating the HSS partitioning, since it allows the exploitation of the low-rank structure in the high-frequency regime and renders the extension to large-scale data feasible. Note that in the current implementation we work with each temporal frequency slice and perform the source separation individually. The separation results can be enhanced further by propagating information from the previously recovered frequency slice to the next frequency slice, as shown by Mansour et al. [2013] for seismic data interpolation.

The success of CS hinges on randomization of the acquisition.
Although, the low degree ofrandomization (e.g., 0 ≤ 1 second) in simultaneous towed-streamer acquisitions seems favourablefor source separation via CS-based techniques, however, high-variability in the firing times enhancesthe recovery quality of separated seismic data volumes, as shown in Wason and Herrmann [2013a];96[a] Channel (km)0 1 2Time (s)00.511.522.533.5[b] Channel (km)0 1 2Time (s)00.511.522.533.5[c] Channel (km)0 1 2Time (s)00.511.522.533.5[d] Channel (km)0 1 2Time (s)00.511.522.533.5[e] Channel (km)0 1 2Time (s)00.511.522.533.5[f] Channel (km)0 1 2Time (s)00.511.522.533.5[g] Channel (km)0 1 2Time (s)00.511.522.533.5[h] Channel (km)0 1 2Time (s)00.511.522.533.5Figure 4.12: Deblended shot gathers and difference plots (from the Gulf of Suez dataset) ofsource 1 and source 2: (a,c) deblending using HSS based rank-minimization and (b,d)the corresponding difference plots; (e,g) deblending using curvelet-based sparsity-promotion and (f,h) the corresponding difference plots.Wason and Herrmann [2013b] for ocean-bottom cable/node acquisition with continuous recording.One of the advantages of the proposed CS-based techniques is that it does not require velocityestimation, which can be a challenge for data with complex geologies. However, the proposedtechniques require accurate knowledge of the random firing times.So far, we have not considered the case of missing traces (sources and/or receivers), however,incorporating this scenario in the current framework is straightforward. This makes the problema joint deblending and interpolation problem. In reality, seismic data are typically irregularlysampled along spatial axes, therefore, future work includes working with non-uniform samplinggrids. Finally, we envisage that our methods can, in principle, be extended to separate 3-D blendedseismic data volumes.97[a] Source (km)0 1 2Time (s)00.511.522.533.5[b] Source (km)0 1 2Time (s)00.511.522.533.5[c] Source (km)0 1 2Time (s)00.511.522.533.5[d] Source (km)0 1 2Time (s)00.511.522.533.5[e] Source (km)0 1 2Time (s)00.511.522.533.5[f] Source (km)0 1 2Time (s)00.511.522.533.5[g] Source (km)0 1 2Time (s)00.511.522.533.5[h] Source (km)0 1 2Time (s)00.511.522.533.5Figure 4.13: Deblended common-channel gathers and difference plots (from the Gulf of Suezdataset) of source 1 and source 2: (a,c) deblending using HSS based rank-minimizationand (b,d) the corresponding difference plots; (e,g) deblending using curvelet-basedsparsity-promotion and (f,h) the corresponding difference plots.4.6 ConclusionsWe have presented two compressed sensing based methods for source separation for simultaneoustowed-streamer type acquisitions, such as the over/under and the simultaneous long offset acquisi-tion. Both the compressed sensing based approaches of rank-minimization and sparsity-promotiongive comparable deblending results, however, the former approach is readily scalable to large-scaleblended seismic data volumes and is computationally faster. This can be further enhanced by in-corporating the HSS structure with factorization-based rank-regularized optimization formulations,along with improved recovery quality of the separated seismic data. We have combined the Paretocurve approach for optimizing BPDN formulations with the SVD-free matrix factorization meth-ods to solve the nuclear-norm optimization formulation, which avoids the expensive computation ofthe singular value decomposition (SVD), a necessary step in traditional rank-minimization basedmethods. 
We find that our proposed techniques are comparable to the commonly used NMO-based98[a] Channel (km)0 2 4 6Time (s)012345[b] Channel (km)0 2 4 6Time (s)012345[c] Channel (km)0 2 4 6Time (s)012345[d] Channel (km)0 2 4 6Time (s)012345[e] Channel (km)0 2 4 6Time (s)012345[f] Channel (km)0 2 4 6Time (s)012345[g] Channel (km)0 2 4 6Time (s)012345[h] Channel (km)0 2 4 6Time (s)012345Figure 4.14: Deblended shot gathers and difference plots (from the BP salt model) of source 1and source 2: (a,c) deblending using HSS based rank-minimization and (b,d) the cor-responding difference plots; (e,g) deblending using curvelet-based sparsity-promotionand (f,h) the corresponding difference plots.median filtering techniques.99[a] Source (km)0 2 4 6Time (s)012345[b] Source (km)0 2 4 6Time (s)012345[c] Source (km)0 2 4 6Time (s)012345[d] Source (km)0 2 4 6Time (s)012345[e] Source (km)0 2 4 6Time (s)012345[f] Source (km)0 2 4 6Time (s)012345[g] Source (km)0 2 4 6Time (s)012345[h] Source (km)0 2 4 6Time (s)012345Figure 4.15: Deblended common-channel gathers and difference plots (from the BP saltmodel) of source 1 and source 2: (a,c) deblending using HSS based rank-minimizationand (b,d) the corresponding difference plots; (e,g) deblending using curvelet-basedsparsity-promotion and (f,h) the corresponding difference plots.100Figure 4.16: Signal-to-noise ratio (dB) over the frequency spectrum for the deblended datafrom the Marmousi model. Red, blue curves—deblending without HSS; cyan, blackcurves—deblending using second-level HSS partitioning. Solid lines—separated source1, + marker—separated source 2.101[a] Offset (km)0 2 4Time (s)00.511.522.533.5[b] Offset (km)0 2 4Time (s)00.511.522.533.5[c] Offset (km)0 2 4Time (s)00.511.522.533.5[d] Offset (km)0 2 4Time (s)00.511.522.533.5[e] Offset (km)0 2 4Time (s)00.511.522.533.5[f] Offset (km)0 2 4Time (s)00.511.522.533.5[g] Offset (km)0 2 4Time (s)00.511.522.533.5[h] Offset (km)0 2 4Time (s)00.511.522.533.5Figure 4.17: Blended common-midpoint gathers of (a) source 1 and (e) source 2 for theMarmousi model. Deblending using (b,f) NMO-based median filtering, (c,g) rank-minimization and (d,h) sparsity-promotion.102[a] Offset (km)0 1 2Time (s)00.511.522.533.5[b] Offset (km)0 1 2Time (s)00.511.522.533.5[c] Offset (km)0 1 2Time (s)00.511.522.533.5[d] Offset (km)0 1 2Time (s)00.511.522.533.5[e] Offset (km)0 1 2Time (s)00.511.522.533.5[f] Offset (km)0 1 2Time (s)00.511.522.533.5[g] Offset (km)0 1 2Time (s)00.511.522.533.5[h] Offset (km)0 1 2Time (s)00.511.522.533.5Figure 4.18: Blended common-midpoint gathers of (a) source 1, (e) source 2 for the Gulfof Suez dataset. Deblending using (b,f) NMO-based median filtering, (c,g) rank-minimization and (d,h) sparsity-promotion.103[a] Offset (km)0 2 4 6Time (s)012345[b] Offset (km)0 2 4 6Time (s)012345[c] Offset (km)0 2 4 6Time (s)012345[d] Offset (km)0 2 4 6Time (s)012345[e] Offset (km)0 2 4 6Time (s)012345[f] Offset (km)0 2 4 6Time (s)012345[g] Offset (km)0 2 4 6Time (s)012345[h] Offset (km)0 2 4 6Time (s)012345Figure 4.19: Blended common-midpoint gathers of (a) source 1, (e) source 2 for the BP saltmodel. Deblending using (b,f) NMO-based median filtering, (c,g) rank-minimizationand (d,h) sparsity-promotion.104Chapter 5Large-scale time-jittered simultaneousmarine acquisition: rank-minimizationapproach5.1 SummaryIn this chapter, we present a computationally tractable rank-minimization algorithm to separatesimultaneous time-jittered continuous recording for a 3D ocean-bottom cable survey. 
We will showexperimentally that the proposed algorithm is computationally tractable for 3D time-jittered marineacquisition.5.2 IntroductionSimultaneous source marine acquisition mitigates the challenges posed by conventional marineacquisition in terms of sampling and survey efficiency, since more than one shot can be fired at thesame time Beasley et al. [1998], de Kok and Gillespie [2002], Berkhout [2008a]. The final objective ofsource separation is to get interference-free shot records. Wason and Herrmann [2013b] have shownthat the challenge of separating simultaneous data can be addressed through a combination oftailored single- (or multiple-) source simultaneous acquisition design and curvelet-based sparsity-promoting recovery. The idea is to design a pragmatic time-jittered marine acquisition schemewhere acquisition time is reduced and spatial sampling is improved by separating overlapping shotrecords and interpolating jittered coarse source locations to fine source sampling grid. While theproposed sparsity-promoting approach recovers densely sampled conventional data reasonably well,it poses computational challenges since curvelet-based sparsity-promoting methods can becomecomputationally intractable—in terms of speed and memory storage—especially for large-scale 5DA version of this chapter has been published in the proceedings of SEG Annual Meeting, 2016, Denver, USA105seismic data volumes.Recently, nuclear-norm minimization based methods have shown the potential to overcome thecomputational bottleneck [Kumar et al., 2015a], hence, these methods are successfully used forsource separation [Maraschini et al., 2012, Cheng and Sacchi, 2013, Kumar et al., 2015b]. The gen-eral idea is that conventional seismic data can be well approximated in some rank-revealing trans-form domain where the data exhibit low-rank structure or fast decay of singular values. Therefore,in order to use nuclear-norm minimization based algorithms for source separation, the acquisitiondesign should increase the rank or slow the decay of the singular values. In chapter 4, we usednuclear-norm minimization formulation to separate simultaneous data acquired from an over/un-der acquisition design, where the separation is performed on each monochromatic data matrixindependently. However, by virtue of the design of the simultaneous time-jittered marine acquisi-tion we formulate a nuclear-norm minimization formulation that works on the temporal-frequencydomain—i.e., using all monochromatic data matrices together. One of the computational bottle-necks of working with the nuclear-norm minimization formulation is the computation of singularvalues. Therefore, in this chapter we combine the modified nuclear-norm minimization approachwith the factorization approach recently developed by Lee et al. [2010a]. The experimental resultson a synthetic 5D data set demonstrate successful implementation of the proposed methodology.5.3 MethodologySimultaneous source separation problem can be perceived as a rank-minimization problem. 
In this chapter, we follow the time-jittered marine acquisition setting proposed by Wason and Herrmann [2013b], where a single source vessel sails across an ocean-bottom array firing two airgun arrays at jittered source locations and time instances, with receivers recording continuously (Figure 5.1). This results in a continuous time-jittered simultaneous data volume.

Figure 5.1: Aerial view of the 3D time-jittered marine acquisition. Here, we consider one source vessel with two airgun arrays firing at jittered times and locations. Starting from point a, the source vessel follows the acquisition path shown by black lines and ends at point b. The receivers are placed at the ocean bottom (red dashed lines).

A conventional 5D seismic data volume can be represented as a tensor D in C^{nf x nrx x nsx x nry x nsy}, where (nsx, nsy) and (nrx, nry) represent the number of sources and receivers along the x, y coordinates and nf represents the number of frequencies. The aim is to recover the data volume D from the continuous time-domain simultaneous data volume b in C^{nT x nrx x nry} by finding a minimum-rank solution D that satisfies the system of equations A(D) = b. Here, A represents a linear sampling-transformation operator, nT < nt x nsx x nsy is the total number of time samples in the continuous time-domain simultaneous data volume, and nt is the total number of time samples in the conventional seismic data. Note that the operator A maps D to a lower-dimensional simultaneous data volume b, since the acquisition process superimposes shot records shifted with respect to their firing times.

The sampling-transformation operator A is defined as A = MRS, where the operator S permutes the tensor coordinates from (nrx, nsx, nry, nsy) (rank-revealing domain, i.e., Figure 5.2a) to (nrx, nry, nsx, nsy) (standard acquisition ordering, i.e., Figure 5.2b) and its adjoint reverses this permutation. The restriction operator R subsamples the conventional data volume at jittered source locations (Figure 5.2c), and the sampling operator M maps the conventional subsampled temporal-frequency domain data to the simultaneous time-domain data (Figure 5.2d). Note that Figure 5.2d represents a time slice from the continuous (simultaneous) data volume, where the stars represent locations of jittered sources in the simultaneous acquisition.

Figure 5.2: Schematic representation of the sampling-transformation operator A during the forward operation. The adjoint of the operator A follows accordingly. (a, b, c) represent a monochromatic data slice from the conventional data volume and (d) represents a time slice from the continuous data volume.

Rank-minimization formulations require that the target data set exhibit a low-rank structure or fast decay of singular values. Consequently, the sampling-restriction (MR) operation should increase the rank or slow the decay of the singular values. Since there is no unique notion of rank for tensors, we can choose the rank of different matricizations of D [Kreimer and Sacchi, 2012b], where the idea is to create the matrix D^(i) by grouping the dimensions of D specified by i and vectorizing them along the rows, while vectorizing the other dimensions along the columns.

Figure 5.3: Monochromatic slice at 10.0 Hz. Fully sampled data volume and simultaneous data volume matricized as (a, c) i = (nsx, nsy), and (b, d) i = (nrx, nsx). (e) Decay of singular values. Notice that fully sampled data organized as i = (nsx, nsy) has slow decay of the singular values (solid red curve) compared to the i = (nrx, nsx) organization (solid blue curve).
However, the sampling-restriction operator slows the decay of the singular values in the i = (nrx, nsx) organization (dotted blue curve) compared to the i = (nsx, nsy) organization (dotted red curve), which is a favorable scenario for the rank-minimization formulation.

In this work, we consider the matricizations proposed by Silva and Herrmann [2013a], where i = (nsx, nsy)—i.e., placing both source coordinates along the columns (Figure 5.3a)—or i = (nrx, nsx)—i.e., placing the receiver-x and source-x coordinates along the columns (Figure 5.3b). As we see in Figure 5.3e, the matricization i = (nsx, nsy) has a higher rank, or slower decay of the singular values (solid red curve), compared to the matricization i = (nrx, nsx) (solid blue curve). The sampling-restriction operator removes random columns in the matricization i = (nsx, nsy) (Figure 5.3c); as a result, the overall singular values decay faster (dotted red curve). This is because missing columns set singular values to zero, which is the opposite of the requirement of rank-minimization algorithms. On the other hand, the sampling-restriction operator removes random blocks in the matricization i = (nrx, nsx) (Figure 5.3d), hence slowing down the decay of the singular values (dotted blue curve). This scenario is much closer to the matrix-completion problem [Recht et al., 2010b], where samples are removed at random points in a matrix. Therefore, we address the source separation problem by exploiting the low-rank structure in the matricization i = (nrx, nsx).

Since rank-minimization problems are NP-hard and therefore computationally intractable, Recht et al. [2010b] showed that solutions to rank-minimization problems can be found by solving a nuclear-norm minimization problem. Silva and Herrmann [2013a] showed that for seismic data interpolation the sampling operator M is separable; hence, data can be interpolated by working on each monochromatic data tensor independently. Since in continuous time-jittered marine acquisition the sampling operator M is nonseparable, being a combined time-shifting and shot-jittering operator, we cannot perform source separation independently over different monochromatic data tensors. Therefore, we formulate the nuclear-norm minimization over the temporal-frequency domain as follows:

\[
\mathop{\text{minimize}}_{\mathbf{D}} \;\; \sum_{j}^{n_f} \|\mathbf{D}^{(i)}_j\|_* \quad \text{subject to} \quad \|\mathcal{A}(\mathbf{D}) - \mathbf{b}\|_2 \leq \epsilon, \tag{5.1}
\]

where the sum of nuclear norms equals the sum of the one-norms of the singular-value vectors, i.e., each term satisfies \(\|\mathbf{D}^{(i)}_j\|_* = \|\sigma_j\|_1\), with \(\sigma_j\) the vector of singular values for each monochromatic data matricization. One of the main drawbacks of the nuclear-norm minimization problem is that it involves computation of the singular-value decomposition (SVD) of the matrices, which is prohibitively expensive for large-scale seismic data. Therefore, we avoid the direct approach to the nuclear-norm minimization problem and follow a factorization-based approach [Rennie and Srebro, 2005a, Lee et al., 2010a, Recht and Ré, 2011]. The factorization-based approach parametrizes each monochromatic data matrix D^(i) as a product of two low-rank factors L^(i) in C^{(nrx·nsx) x k} and R^(i) in C^{(nry·nsy) x k} such that D^(i) = L^(i)R^(i)H, where k represents the rank of the underlying matrix and H represents the Hermitian transpose. Note that the tensors L, R can be formed by concatenating the matrices L^(i), R^(i), respectively. The optimization scheme can then be carried out using the tensors L, R instead of D, thereby significantly reducing the size of the decision variable from nrx x nry x nsx x nsy x nf to 2k x nrx x nsx x nf when k <= nrx x nsx.
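To make the matricization and the factorized parametrization concrete, the following NumPy sketch uses toy dimensions and a random test tensor (assumptions for illustration only; a real frequency slice, as in Figure 5.3, is needed to reproduce the singular-value behaviour discussed above). The SVD below only serves to construct a small rank-k example, whereas the algorithm itself never computes SVDs.

```python
import numpy as np

nrx, nsx, nry, nsy, k = 10, 10, 10, 10, 5
D = np.random.randn(nrx, nsx, nry, nsy) + 1j * np.random.randn(nrx, nsx, nry, nsy)

# i = (nsx, nsy): both source coordinates grouped into one matrix dimension
D_src = D.transpose(1, 3, 0, 2).reshape(nsx * nsy, nrx * nry)

# i = (nrx, nsx): receiver-x and source-x grouped together (the organization we use)
D_rs = D.reshape(nrx * nsx, nry * nsy)

# Factorized representation of the chosen matricization: D_rs ~ L @ R^H, which
# stores 2*k*(nrx*nsx) complex numbers instead of (nrx*nsx)*(nry*nsy) per frequency.
U, s, Vh = np.linalg.svd(D_rs, full_matrices=False)
L = U[:, :k] * np.sqrt(s[:k])                 # (nrx*nsx) x k factor
R = Vh[:k, :].conj().T * np.sqrt(s[:k])       # (nry*nsy) x k factor
rel_err = np.linalg.norm(D_rs - L @ R.conj().T) / np.linalg.norm(D_rs)
print(L.shape, R.shape, rel_err)
```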
Following Rennie and Srebro [2005a], the sum of the nuclear norms obeys the relationship

\[
\sum_{j}^{n_f} \|\mathbf{D}^{(i)}_j\|_* \;\leq\; \sum_{j}^{n_f} \frac{1}{2}\left\| \begin{bmatrix} \mathbf{L}^{(i)}_j \\ \mathbf{R}^{(i)}_j \end{bmatrix} \right\|_F^2,
\]

where \(\|\cdot\|_F^2\) is the squared Frobenius norm of the matrix (sum of the squared entries).

5.4 Experiments & results

We test the efficacy of our method by simulating a synthetic 5D data set using the BG Compass velocity model (provided by the BG Group), which is a geologically complex and realistic model. We also quantify the cost savings associated with simultaneous acquisition in terms of an improved spatial-sampling ratio, defined as the ratio between the spatial grid interval of the observed simultaneous time-jittered acquisition and the spatial grid interval of the recovered conventional acquisition. The speed-up in acquisition is measured using the survey-time ratio (STR), proposed by Berkhout [2008a], which measures the ratio of the time of conventional acquisition to that of simultaneous acquisition.

Using a time-stepping finite-difference modelling code provided by Chevron, we simulate a conventional 5D data set of dimensions 2501 x 101 x 101 x 40 x 40 (nt x nrx x nry x nsx x nsy) over a survey area of approximately 4 km x 4 km. The conventional time-sampling interval is 4.0 ms, and the source- and receiver-sampling interval is 6.25 m. We use a Ricker wavelet with central frequency of 15.0 Hz
Since the sampling-transformation op-erator is nonseparable in the simultaneous time-jittered marine acquisition, we formulate the fac-torization based nuclear-norm minimization problem over the entire temporal-frequency domain,contrary to solving each monochromatic data matrix independently. We show that the proposedmethodology is able to separate and interpolate the data to a fine underlying grid reasonably well.The proposed approach is computationally memory efficient in comparison to the curvelet-basedsparsity-promoting approach.110[a] [b][c] [d][e]Figure 5.4: Source separation recovery. A shot gather from the (a) conventional data; (b)a section of 30 seconds from the continuous time-domain simultaneous data (b); (c)recovered data by applying the adjoint of the sampling operatorM; (d) data recoveredvia the proposed formulation (SNR = 20.8 dB); (e) difference of (a) and (d) whereamplitudes are magnified by a factor of 8 to illustrate a very small loss in coherentenergy.111Chapter 6Enabling affordable omnidirectionalsubsurface extended image volumesvia probing6.1 SummaryImage gathers as a function of subsurface offset are an important tool for the inference of rock prop-erties and velocity analysis in areas of complex geology. Traditionally, these gathers are thought ofas multidimensional correlations of the source and receiver wavefields. The bottleneck in computingthese gathers lies in the fact that one needs to store, compute, and correlate these wavefields for allshots in order to obtain the desired image gathers. Therefore, the image gathers are typically onlycomputed for a limited number of subsurface points and for a limited range of subsurface offsets,which may cause problems in complex geological areas with large geologic dips. We overcome in-creasing computational and storage costs of extended image volumes by introducing a formulationthat avoids explicit storage and removes the customary and expensive loop over shots, found inconventional extended imaging. As a result, we end up with a matrix-vector formulation from whichdifferent image gathers can be formed and with which amplitude-versus-angle and wave-equationmigration velocity analyses can be performed without requiring prior information on the geologicdips. Aside from demonstrating the formation of two-way extended image gathers for differentpurposes and at greatly reduced costs, we also present a new approach to conduct automatic wave-equation based migration-velocity analysis. Instead of focussing in particular offset directions andpreselected subsets of subsurface points, our method focuses every subsurface point for all subsur-face offset directions using a randomized probing technique. As a consequence, we obtain goodvelocity models at low cost for complex models without the need to provide information on theA version of this chapter has been published in Geophysical Prospecting, 2016, issn 1365-2478.112geologic dips.6.2 IntroductionSeismic reflection data are a rich source of information about the subsurface and by studying bothdynamic and kinematic properties of the data we can infer both large-scale velocity variations aswell as local rock properties. While seismic data volumes – as a function of time, source andreceiver positions – contains all reflected and refracted events, it is often more convenient to mapthe relevant events present in pre-stack data to their respective positions in the subsurface. 
Thisstripping away of the propagation effects leads to the definition of an image volume or extendedimage – as a function of depth, lateral position and some redundant spatial coordinate(s) – thathas the same (or higher) dimensionality as the original data volume. This mapping can be thoughtof as a coordinate transform, depending on the large-scale background velocity model features,that maps reflection events observed in data collected at the surface to focussed secondary pointsources tracing out the respective reflectors. The radiation pattern of these point sources revealsthe angle-dependent reflection coefficient and can be used to infer the local rock properties. Errorsin the large-scale background velocity model features are revealed through the failure of the eventsto fully focus. This principle forms the basis of many velocity-model analysis procedures.Over the past decades various methods have been proposed for computing and exploiting imagevolumes or slices thereof – the so-called image gathers [Claerbout, 1970, Doherty and Claerbout,1974, de Bruin et al., 1990, Symes and Carazzone, 1991, ten Kroode et al., 1994, Chauris et al., 2002,Biondi and Symes, 2004, Sava and Vasconcelos, 2011, Koren and Ravve, 2011]. These approachesdiffer in the way the redundant coordinate is introduced and the method used to transform datavolumes into image volumes. Perhaps the most well-known example are normal moveout correctedmidpoint gathers, where simple move-out corrections transform observed data into volumes thatcan be migrated into extended images that are a function of time, midpoint position and surfaceoffset [Claerbout, 1985b].While this type of surface-offset pre-stack images have been used for migration-velocity analysisand reservoir characterization, these extended images may suffer from unphysical artifacts. Forthis reason extended images are nowadays formed as functions of the subsurface offset, rather thansurface offset as justified by Stolk et al. [2009], who showed that this approach produces artifact-free image gathers in complex geological settings. So far, wave-equation based constructions ofthese extended images are based on the double-square-root equation [Claerbout, 1970, Dohertyand Claerbout, 1974, Biondi et al., 1999, Prucha et al., 1999, Duchkov and Maarten, 2009], whichproduces image volumes as a function of depth and virtual subsurface source and receiver positions.With the advent of reverse-time migration [Whitmore et al., 1983, Baysal et al., 1983, Levin,1984], these one-way, and therefore angle-limited one-way approaches, are gradually being replacedby methods based on the two-way wave-equation, which is able to produce gathers as a function113of depth and horizontal or vertical offset, or both [Biondi and Symes, 2004, Sava and Vasconcelos,2011]. This is achieved by forward propagating the source and backward propagating the data andsubsequently correlating these source and receiver wavefields at non-zero offset/time lag. Dependingon the extended imaging conditions [Sava and Biondi, 2004, Sava and Fomel, 2006], various typesof subsurface extended image gathers can be formed by restricting the multi-dimensional cross-correlations between the propagated source and receiver wavefields to certain specified coordinatedirections. The motivation to depart from horizontal offset only is that steeply dipping events donot optimally focus in the horizontal direction, even for a kinematically correct velocity model. 
Atemporal shift instead of a subsurface offset is sometimes also used as it is more computationallyefficient MacKay and Abma [1992], Sava and Fomel [2006].Up to this point, most advancements in image-gather based velocity-model building and reser-voir characterization are due to improved sampling of the extended images and/or a more accuratewave propagators. However, these improvements carry a heavy price tag since the computationstypically involve the solution of the forward and adjoint wave equation for each shot, followed bysubsequent cross-correlations. As a result, computational and memory requirements grow uncon-trollably and many practical implementations of subsurface offset images are therefore restricted,(e.g., by allowing only for lateral interaction over a short distance), and the gathers are computedfor a subset of image points only. Unfortunately, forming full subsurface extended image volumesrapidly becomes prohibitively expensive even for small two-dimensional problems.Extended images play an important role in wave-equation migration-velocity analysis (WEMVA),where velocity model updates are calculated by minimizing an objective function that measuresthe coherency of image gathers [Symes and Carazzone, 1991, Shen and Symes, 2008]. Regrettably,the computational and storage costs associated with this approach easily becomes unmanageableunless we restrict ourselves to a few judiciously chosen subsurface points [Yang and Sava, 2015].As a result, we end up with an expensive method, which needs to loop over all shot records andallowing for the computation of subsurface offset gathers for a limited number of subsurface pointsand directions. This is problematic because the effectiveness of WEMVA is impeded for dippingreflectors that do not focus as well along horizontal offsets.Extended images also serve as input to amplitude-versus-angle (AVA) or amplitude-versus-offset(AVO) analysis methods that derive from the (linearized) Zoeppritz equations [Aki and Richards,1980]. In this situation, the challenge is to produce reliable amplitude preserving angle/offsetgathers in complex geological environments (see e.g., de Bruin et al. [1990], van Wijngaarden[1998]) from primaries only or, more recently from surface-related multiples, as demonstrated byLu et al. [2014]. In the latter case, good angular illumination can be obtained from conventionalmarine towed-streamer acquisitions with dense receiver sampling. As WEMVA, these amplitudeanalysis are biased when reflectors are dipping, calling for corrections dependent on geological dipsthat are generally unknown [Brandsberg-Dahl et al., 2003].We present a new wave-equation based factorization principle that removes computational bot-114tlenecks and gives us access to the kinematics and amplitudes of full subsurface offset extendedimages without carrying out explicit cross-correlations between source and receiver wavefields foreach shot. We accomplish this by carrying out the correlations implicitly among all wavefields viaactions of full extended image volumes on certain probing/test vectors. Depending on the choiceof these probing vectors, these actions allow us to compute common-image-point (CIPs), common-image- (CIGs) and dip-angle gathers at certain predefined subsurface points in 2- and 3-D as well asobjective functions for WEMVA. None of these require storage of wavefields and loops over shots.The paper proceeds as follows. 
After introducing the governing equations of continuous-space monochromatic extended image volumes and their relation to migrated images, CIPs and CIGs, we derive the two-way equivalent of the double square-root equation and move to a discrete setting that reveals a wave-equation based factorization of discrete full extended image volumes. Next, we show how this factorization can be used to extract information on local geological dips and radiation patterns for AVA purposes and how full offset extended image volumes can be used to carry out automatic WEMVA. Each application is illustrated by carefully selected stylized numerical examples in 2- and 3-D.

6.2.1 Notation

In this paper, we use lower/upper case letters to represent scalars, e.g., xm, R, and lower case boldface letters to represent vectors (i.e., one-dimensional quantities), e.g., x, x'. We denote matrices and tensors using upper case boldface letters, e.g., E, Ẽ, R. The subscript i represents frequencies, j represents the number of sources and/or receivers, and k represents the subsurface grid points. 2-D refers to seismic volumes with one source coordinate, one receiver coordinate, and time; 3-D refers to seismic volumes with two source coordinates, two receiver coordinates, and time.

6.3 Anatomy & physics

Before describing the proposed methodology of extracting information from full subsurface-offset extended image volumes using the proposed probing technique, we first review the governing equations. We denote the full extended image volumes as

\[
e(\omega,\mathbf{x},\mathbf{x}') = \int_{D_s} d\mathbf{x}_s\; \overline{u(\omega,\mathbf{x},\mathbf{x}_s)}\, v(\omega,\mathbf{x}',\mathbf{x}_s), \tag{6.1}
\]

where the overline denotes complex conjugation, u and v are source and receiver wavefields as a function of the frequency \(\omega \in \Omega \subset \mathbb{R}\), subsurface positions \(\mathbf{x},\mathbf{x}' \in D \subset \mathbb{R}^n\) (n = 2 or 3) and source positions \(\mathbf{x}_s \in D_s \subset \mathbb{R}^{n-1}\). These monochromatic wavefields are obtained by solving

\[
H(m)\, u(\omega,\mathbf{x},\mathbf{x}_s) = q(\omega,\mathbf{x},\mathbf{x}_s), \tag{6.2}
\]
\[
H(m)^{*}\, v(\omega,\mathbf{x},\mathbf{x}_s) = \int_{D_r} d\mathbf{x}_r\; d(\omega,\mathbf{x}_r,\mathbf{x}_s)\,\delta(\mathbf{x}-\mathbf{x}_r), \tag{6.3}
\]

where \(H(m) = \omega^2 m(\mathbf{x}) + \nabla^2\) is the Helmholtz operator with Sommerfeld boundary conditions, the symbol * represents the conjugate transpose (adjoint), m is the squared slowness, q is the source function and d represents the reflection data at receiver positions \(\mathbf{x}_r \in D_r \subset \mathbb{R}^{n-1}\).

Equations (6.1-6.3) define a linear mapping from a 2n - 1 dimensional data volume d(\(\omega\), xr, xs) to a 2n + 1 dimensional image volume e(\(\omega\), x, x'). Conventional migrated images are obtained by applying the zero-time/offset imaging condition [Sava and Vasconcelos, 2011]

\[
r(\mathbf{x}) = \int_{\Omega} d\omega\; e(\omega,\mathbf{x},\mathbf{x}). \tag{6.4}
\]

Once we have the full extended image, we can extract all conceivable image and angle gathers by applying appropriate imaging conditions [Sava and Vasconcelos, 2011]. For instance, conventional 2-D common-image gathers (CIGs), as a function of lateral position xm, depth z and horizontal offset hx, are defined as

\[
I_{\mathrm{CIG}}(z,h;x_m) = \int_{\Omega} d\omega\; e\big(\omega,(x_m - h_x, z)^T,(x_m + h_x, z)^T\big), \tag{6.5}
\]

where T denotes the transpose. Similarly, time-shifted common-image-point (CIP) gathers at subsurface points xk, as a function of all spatial offset vectors h and temporal shifts \(\Delta t\), become

\[
I^{\mathrm{ext}}_{\mathrm{CIP}}(\mathbf{h},\Delta t;\mathbf{x}_k) = \int_{\Omega} d\omega\; e(\omega,\mathbf{x}_k,\mathbf{x}_k + \mathbf{h})\, e^{\imath\omega\Delta t}, \tag{6.6}
\]

where \(\imath = \sqrt{-1}\). Note that we departed in this expression from the usual symmetric definition e(\(\omega\), xk - h, xk + h) because it turns out to be more natural for our shot-based computations. Because we only consider these gathers as a function of full-subsurface offset, we apply a zero-time imaging condition to the CIPs, i.e., I_CIP(h; xk) = I^ext_CIP(h, \(\Delta t\) = 0; xk).
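To make these imaging conditions concrete, the toy NumPy sketch below applies equations 6.4 and 6.6 to a small, randomly filled stand-in for e(\(\omega\), x, x') on a grid of Nx points. The sizes and frequencies are assumptions, and forming e explicitly like this is precisely what the probing approach introduced later avoids.

```python
import numpy as np

nfreq, nx = 8, 50                        # toy sizes (assumptions)
e = np.random.randn(nfreq, nx, nx) + 1j * np.random.randn(nfreq, nx, nx)
omega = 2 * np.pi * np.linspace(5, 25, nfreq)

# (6.4) conventional image: zero-offset/zero-time condition, r(x) = sum_w e(w, x, x)
r = np.real(np.einsum('wkk->k', e))

# (6.6) time-shifted CIP at grid point k0: sum_w e(w, x_k0, x') * exp(i*w*dt),
# indexed here by x' rather than by the offset h = x' - x_k0
def cip(e, omega, k0, dt=0.0):
    return np.real(np.sum(e[:, k0, :] * np.exp(1j * omega * dt)[:, None], axis=0))

I_cip = cip(e, omega, k0=25)             # zero time-lag CIP gather over all offsets
```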
These gathers form the theoretical basis for AVA and WEMVA.

Moving to a discrete setting, we have Ns sources \(\{\mathbf{x}_{s,j}\}_{j=1}^{N_s} \in D_s\), Nr receivers \(\{\mathbf{x}_{r,j}\}_{j=1}^{N_r} \in D_r\), Nf frequencies \(\{\omega_i\}_{i=1}^{N_f} \in \Omega\), and we discretize the domain D using a rectangular grid with a total of Nx grid points \(\{\mathbf{x}_k\}_{k=1}^{N_x}\). We organize the source and receiver wavefields in tensors of size Nf x Nx x Ns with elements \(u_{ikj} \equiv u(\omega_i,\mathbf{x}_k,\mathbf{x}_{s,j})\) and \(v_{ikj} \equiv v(\omega_i,\mathbf{x}_k,\mathbf{x}_{s,j})\). For the i-th frequency, we can represent the wavefields for all sources as complex-valued Nx x Ns matrices Ui and Vi, where each column of the matrix represents a monochromatic source experiment.

Full 2n + 1 dimensional image volumes can now be represented as a 3-D tensor with elements \(e_{ikk'} \equiv e(\omega_i,\mathbf{x}_k,\mathbf{x}_{k'})\). A slice through this tensor at frequency i is an Nx x Nx matrix, which can be expressed as an outer product of the matrices Ui and Vi:

\[
\mathbf{E}_i = \mathbf{U}_i\mathbf{V}_i^{*}, \tag{6.7}
\]

where * denotes the complex conjugate transpose. The matrix Ei is akin to Berkhout's reflectivity matrices [Berkhout, 1993], except that Ei captures vertical interactions as well, since it is derived from the two-way wave equation. These source and receiver wavefields obey the following discretized Helmholtz equations:

\[
\mathbf{H}_i(\mathbf{m})\mathbf{U}_i = \mathbf{P}_s^{*}\mathbf{Q}_i, \tag{6.8}
\]
\[
\mathbf{H}_i(\mathbf{m})^{*}\mathbf{V}_i = \mathbf{P}_r^{*}\mathbf{D}_i, \tag{6.9}
\]

where Hi(m) represents the discretized Helmholtz operator for frequency \(\omega_i\) with absorbing boundary conditions and for a gridded squared slowness m. The Ns x Ns matrix Qi represents the sources (i.e., each column is a source function), the Nr x Ns matrix Di contains the reflection data (i.e., each column is a monochromatic shot gather after subtraction of the direct arrival), and the matrices Ps, Pr sample the wavefields at the source and receiver positions (hence, their transposes inject the sources and receivers into the grid). Remark that these are the discretized versions of equations (6.2, 6.3).

Substituting relations (6.8, 6.9) into the definition of the extended image (6.7) yields

\[
\mathbf{H}_i\mathbf{E}_i\mathbf{H}_i = \mathbf{P}_s^{*}\mathbf{Q}_i\mathbf{D}_i^{*}\mathbf{P}_r. \tag{6.10}
\]

This defines a natural two-way analogue of the well-known double-square-root equation. From equation (6.10), we derive the following expression for monochromatic full extended image volumes:

\[
\mathbf{E}_i = \mathbf{H}_i^{-1}\mathbf{P}_s^{*}\mathbf{Q}_i\mathbf{D}_i^{*}\mathbf{P}_r\mathbf{H}_i^{-1}, \tag{6.11}
\]

which is a discrete analogue of the linear mapping from data to image volumes defined in equations (6.1-6.3). Note that for co-located sources and receivers (Pr = Ps) and ideal discrete point sources (Q is the identity matrix) we find that E is complex symmetric (i.e., \(\Re(\mathbf{E}^{*}) = \Re(\mathbf{E})\) and \(\Im(\mathbf{E}^{*}) = -\Im(\mathbf{E})\)) because of source-receiver reciprocity.

The usual zero-time/offset imaging conditions translate to

\[
\mathbf{r} = \sum_{i=1}^{N_f} \operatorname{diag}(\mathbf{E}_i), \tag{6.12}
\]

where r is the discretized reflectivity and diag(A) denotes the diagonal elements of A organized in a vector. Various image gathers are embedded in our extended image volumes, as illustrated in Figure 6.1.

As we will switch between continuous and discrete notation throughout the paper, the correspondence between the image volume e(\(\omega\), x, x') and the matrices Ei is listed in Table 6.1. For simplicity, we will drop the frequency dependence from our notation for the remainder of the paper and implicitly assume that all quantities are monochromatic, with the understanding that all computations can be repeated as needed for multiple frequencies. We will also assume that the zero-time imaging condition is applied by summing over the frequencies, followed by taking the real part. We further note that full extended image volumes can be severely aliased in the case of insufficient source-receiver sampling.
Therefore, one would in practice only extract gathers from image volumes in well-sampled directions.

6.4 Computational aspects

Of course, we can never hope to explicitly form the complete image volumes owing to the enormous computational and storage costs associated with these volumes, which are quadratic in the number of grid points Nx. In particular, we will discuss the computation of monochromatic image volumes Ei and drop the subscript i in the remainder of the section.

To avoid forming extended images E explicitly, we instead propose to probe these volumes by right-multiplying them with Nx x K sampling matrices W = [w1, . . . , wK], where K denotes the number of samples and wk denotes a single probing or sampling vector. After sampling, the reduced image volume Ẽ now reads

\[
\widetilde{\mathbf{E}} = \mathbf{E}\mathbf{W} = \mathbf{H}^{-1}\mathbf{P}_s^{*}\mathbf{Q}\mathbf{D}^{*}\mathbf{P}_r\mathbf{H}^{-1}\mathbf{W}. \tag{6.13}
\]

Our main contribution is that we can compute these compressed volumes efficiently with Algorithm 1 or 2. As one can see, these computations derive from wave-equation based factorizations that avoid storage and loops over all shots.

Algorithm 1 Compute the matrix-vector multiplication of the image volume matrix with given vectors W = [w1, . . . , wK]. The computational cost is 2Ns wave-equation solves plus the cost of correlating the wavefields.
  1. compute all the source wavefields U = H^{-1} Ps^* Q;
  2. compute all the receiver wavefields V = H^{-*} Pr^* D;
  3. compute the weights Ỹ = V^* W;
  4. compute the product Ẽ = U Ỹ.

Algorithm 2 Compute the matrix-vector multiplication of the image volume matrix with given vectors W = [w1, . . . , wK]. The computational cost is 2K wave-equation solves plus the cost of correlating the data matrices.
  1. compute Ũ = H^{-1} W and sample this wavefield at the receiver locations, D̃ = Pr Ũ;
  2. correlate the result with the data, W̃ = D^* D̃, to get the source weights;
  3. use the source weights to generate the simultaneous sources Q̃ = Q W̃;
  4. compute the resulting wavefields Ẽ = H^{-1} Ps^* Q̃.

While both algorithms produce the same compressed image volume, Algorithm 1 corresponds
Because of the proposed factorization, the number of required wave-equation solves isreduced from twice the number of shots to only two per subsurface point, representing anorder-of-magnitude improvement. As long as the number of subsurface points are not toolarge, this reduction allows for targeted quality control with omnidirectional extended imagegathers.For reasonable background velocity models, we can even compute these image gathers simul-taneously as long as the corresponding subsurface points are spatially separated. In this case,the probing vectors wk correspond to simultaneous subsurface sources without encoding. Asa result, we may introduce cross-talk in the gathers, but expect that there will be very littleinterference when the locations being probed are sufficiently far away from each other. Eventhough this cross-talk may interfere with the ability to visually inspect the CIPs, we will seethat these interferences can be rendered into incoherent noise via randomized source encodingas shown by van Leeuwen et al. [2011] for full waveform inversion (FWI), a property we willlater use in automated migration-velocity analyses.CIGs We can also extract common-image-gather at a lateral position xk by (densely) sampling theimage volumes at xk = (xk, zk)T at all depth levels. In this configuration, the sampling matrixtakes the form W = Wx⊗Wz, where Wx = (0, 0, . . . , 1, . . . , 0)T is a cardinal basis vector as119before with a single non-zero entry located at the index corresponding to the midpoint positionxk and Wz = I is the identity matrix, sampling all grid points in the vertical direction. Theresulting gathers E˜ = EW are now a function of (z,∆x,∆z) and contain both vertical andlateral offsets. We can form conventional image gathers by extracting a slice of the volume at∆z = 0. The computational cost per CIG in 2-D are roughly the same as the conventionallycomputed CIGs; 2Nz vs. 2Ns wave-equation solves, where Nz denotes the number of samplesin depth. In 3-D however, the proposed way of computing the gathers via algorithm 2 is aorder of magnitude faster because the number of sources in a 3-D seismic acquisition is anorder of magnitude bigger while the number of samples in depth stays the same.As with the CIPs, we can generate these gathers simultaneously, by sampling various lateralpositions at the same time for each depth level.6.5.1 Numerical results in 2-DTo illustrate the discussed methodology, we compute various gathers on a central part of Marmousimodel. We use a grid spacing of 10 m, 81 co-located sources and receivers with 50 m spacing andfrequencies between 5 and 25 Hz with 0.5 Hz spacing. The source wavelet is a Ricker wavelet with apeak frequency of 15 Hz. The wavefields are modeled using a 9-point finite difference discretizationof the Helmholtz operator with a sponge boundary condition. The direct wave is removed prior tocomputing the image gathers.Figure 6.2 shows the migrated image for wrong (a) and correct background velocity models(b). At three locations indicated by the ∗ symbol we extract the CIP gathers, shown in (c) fora background velocity model that is too low and in (d) for the correct velocity. As a result ofthe cross-correlations between various events in propagated wavefields several spurious events arepresent so the recovered CIP gathers are only meaningful for interpretation close to the imagepoint. However, as expected most of the energy concentrates along the normal to the reflector. 
We can generate these three CIPs simultaneously at the cost of generating one CIP by defining the sampling vector $w_k$ to represent the three point sources simultaneously. As we mentioned before, this may result in cross-talk between the CIPs; however, in this case the events do not significantly interfere because they are separated laterally, as shown in Figure 6.2 (c,d).

To illustrate the benefits of the proposed scheme, we also report the computational time (in seconds) and memory (in MB) required to compute a single common-image-point gather using Algorithms 1 and 2. The results are shown in Table 6.3. We can see that, even for a small toy model, the probing technique reduces the computational time and memory requirement by factors of 20 and 30, respectively.

Finally, Figures 6.2 (e,f) contain CIGs at two lateral positions for a background velocity that is too low (e) and correct (f). In this example, we generated the image volume for each depth level simultaneously for all lateral positions. When creating many gathers, such simultaneous probing can substantially reduce the required computational cost.

6.5.2 Numerical results in 3-D

Notwithstanding the achieved speedup and memory reduction in 2-D, the proposed probing method outlined in Algorithm 2 is a true enabler in 3-D, where there is no realistic hope of storing full extended image volumes whose size is quadratic in the number of grid points $N_x$. Besides, the number of sources also becomes quadratic for full-azimuth acquisitions. As in the 2-D case, our probing technique is a key enabler allowing us to compute CIPs without allocating exorbitant amounts of memory and computational resources. To illustrate our claim, we compute a single CIP for the Compass velocity model provided to us by BG Group. Figures 6.3 and 6.4 contain vertical and lateral cross-sections of this complicated 3-D velocity model, which contains 131 × 101 × 101 grid points. The model is 780 m deep and 2.5 km across in both lateral directions.

We generated data from this velocity model following an ocean-bottom node configuration, where sources are placed at the water surface along the x and y directions with a sampling interval of 75 m. The receivers are placed at the sea bed with a sampling of 50 m, resulting in a data volume with 1156 sources and 2601 receivers. We generated this marine data volume with a 3-D time-harmonic Helmholtz solver, based on a 27-point discretization, perfectly matched boundary conditions, and a Ricker wavelet with a central frequency of 15 Hz. During the simulations and imaging, we used 15 frequencies ranging from 5 to 12 Hz with a sampling interval of 0.5 Hz. For further details on the employed wave-equation solver, we refer to Lago et al. [2014] and van Leeuwen and Herrmann [2014].

Figures 6.3b and 6.5 show an example of a full CIP gather extracted at z = 390, x = 1250 and y = 1250 m. Aside from behaving as expected, Algorithm 2 runs 1500 times faster than the time needed by Algorithm 1.
While Algorithms 1 and 2 lend themselves well to parallelization over frequencies, shots, and probing vectors, our method avoids a loop over 1156 shots while also avoiding explicit formation of the extended image, which would require the allocation of a matrix with $10^{16}$ entries.

6.6 Case study 2: dip-angle gathers

Aside from their use for kinematic quality control during velocity model building, common-image gathers (CIGs) also contain dynamic amplitude-versus-angle (AVA) information on reflecting interfaces that can serve to invert for associated rock properties [Mahmoudian and Margrave, 2009]. For this purpose, various definitions of angle-domain CIGs have been proposed [de Bruin et al., 1990, van Wijngaarden, 1998, Rickett and Sava, 2002, Sava and Fomel, 2003, Kuhel and Sacchi, 2003, Biondi and Symes, 2004, Sava and Vasconcelos, 2011]. Extending the work of de Bruin et al. [1990] to include corrections for the geological dip $\theta$, we extract angle-dependent reflection coefficients from a subsurface point $x_0$ by evaluating the following integral

$$R(x_0, \alpha; \theta) \propto \int_{\Omega} d\omega \int_{-h_{\max}}^{h_{\max}} dh\; e\!\left(\omega, x_0, x_0 + h\, n(\theta)^{\perp}\right) e^{\imath \omega \sin(\alpha) h / v(x)} \qquad (6.14)$$

over frequencies and offsets. We use the $\propto$ symbol to indicate that this expression holds up to a proportionality constant.

In this expression, $\alpha$ is the angle of incidence with respect to the normal (see Figure 6.6), $v(x)$ is the local background velocity used to convert subsurface offsets to angles, and $n(\theta)^{\perp}$ denotes the tangent vector to the reflector, defining the offset vector in this direction ($h\,n(\theta)^{\perp}$). The integral is carried out over the effective offset range denoted by $h_{\max}$, which decreases for deeper parts of the model. The integral over frequencies corresponds to the zero-time imaging condition. As illustrated in Figure 6.6, the normal $n(\theta) = (\sin\theta, \cos\theta)^T$ and tangent vectors to a reflecting interface depend on the geological dip $\theta$, which is unknown in practice. Unfortunately, ignoring this factor may lead to erroneous amplitudes in cases where this dip is steep. This means that we need to extract CIGs, following the procedure outlined above, as well as estimates of the geological dip from the extended image volumes.

Following Brandsberg-Dahl et al. [2003], we use the stack power to estimate the geological dip at subsurface point $x_k$ via

$$\hat{\theta} = \arg\max_{\theta} \int_{-h_{\max}}^{h_{\max}} dh \left| \int_{\Omega} d\omega\; e\!\left(\omega, x_k, x_k + h\, n(\theta)\right) \right|^2. \qquad (6.15)$$

This maximization is based on the assumption that the above integral attains its maximum value when we collect energy from the image volume at time zero and along a direction that corresponds to the true geologic dip. Both integrals in equations 6.14 and 6.15 require information on the monochromatic extended image volumes at subsurface position $x_k$ only—i.e., $e(\omega, x_k, x')$ for $\omega \in \Omega$—to which we have ready access via probing as described in Algorithm 2. The resulting gathers $e(\omega, x_k, x')$ contain the required information to estimate the local geologic-dip corrected angle-dependent reflection coefficient $R(\alpha; x_k, \hat{\theta})$, where the correction is carried out with the dip estimate $\hat{\theta}$ that maximizes the stack power.
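As a rough illustration of how a discrete version of equation 6.15 can be evaluated on a probed gather, the sketch below scans candidate dips by sampling a CIP cube along the direction $n(\theta)$, applies the zero-time imaging condition by summing over frequencies, and returns the dip with maximum stack power. The array layout, offset gridding and interpolation are assumptions made purely for illustration; the AVA integral of equation 6.14 would follow analogously by sampling along the tangent direction and weighting with $e^{\imath\omega\sin(\alpha)h/v}$.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def estimate_dip(cip, thetas):
    """Discrete stack-power dip search on a probed CIP gather.
    cip    : complex array (nfreq, nhx, nhz) holding e(w, x_k, x_k + (dx, dz)),
             with zero offset at the centre of the last two axes (assumption).
    thetas : candidate dips [rad]."""
    nf, nhx, nhz = cip.shape
    cx, cz = (nhx - 1) / 2.0, (nhz - 1) / 2.0
    h = np.arange(-min(cx, cz), min(cx, cz) + 1)     # offset samples (grid units)
    power = np.zeros(len(thetas))
    for i, th in enumerate(thetas):
        # sample the gather along the normal direction n(theta) = (sin, cos)
        coords = np.vstack([cx + h * np.sin(th), cz + h * np.cos(th)])
        zero_time = np.zeros(len(h), dtype=complex)
        for f in range(nf):                          # zero-time imaging condition
            re = map_coordinates(cip[f].real, coords, order=1)
            im = map_coordinates(cip[f].imag, coords, order=1)
            zero_time += re + 1j * im
        power[i] = np.sum(np.abs(zero_time) ** 2)    # integrate |.|^2 over offsets
    return thetas[np.argmax(power)], power
```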
Since we have access to all subsurface offsets at no additional computational cost, there is no need to select a maximum subsurface offset $h_{\max}$ a priori, even though the effective subsurface offset decreases with depth due to the finite aperture of the seismic data acquisition.

6.6.1 Numerical results

To illustrate the proposed method of computing angle-domain common-image gathers (CIGs), we compare the modulus (for plotting reasons) of the estimated reflection coefficients $|R(\alpha; x_k, \theta_k)|$ with the theoretical PP-reflection coefficients predicted by the Zoeppritz equations [Koefoed, 1955, Shuey, 1985].

To make this comparison, we used a finite-difference time-domain acoustic modelling code [Symes et al., 2011b] to generate three synthetic data sets for increasingly complex models, namely a two-layer velocity and constant density model (Figures 6.7 (a,b)); a four-layer model with properties taken from de Bruin et al. [1990] (Figures 6.8 (a,b)); and a two-layer laterally varying velocity and density model—i.e., one horizontal reflector and one dipping reflector (Figures 6.9 (a,b)). The purpose of these three experiments is to (i) verify the velocity-change-only angle-dependent reflection coefficients; (ii) study the effects of density and of the decreasing effective horizontal offset with depth; and (iii) illustrate the effect of the geologic dip on the reflection coefficients. We used a Ricker wavelet with a peak frequency of 15 Hz as the source signature. In all three examples, the seismic data are simulated using a split-spread acquisition. The gathers used in the AVA analyses are obtained using Algorithm 2 and discrete versions of equations 6.14 and 6.15 for smoothed versions of the layered velocity models.

The estimated angle-dependent reflection coefficients for the first model are displayed in Figure 6.7 (c). We can see that these estimates, extracted from our extended image volumes, match the theoretical reflection coefficients according to the Zoeppritz equations, after a single amplitude scaling, fairly well up to angles of 50◦, which is reasonably close to the effective aperture angle (depicted by the black line in the figures of the AVA curves). Beyond these angles, the estimates no longer match the Zoeppritz predictions.

The results for the deeper four-layer model, depicted in Figures 6.8 (c, d, e, f), clearly show the imprint of smaller horizontal offsets with depth. From the AVA plots in Figure 6.8, we observe that the reflection coefficients for the first and second reflectors are well matched up to 50◦ and 40◦, and for the third and fourth reflectors only up to 20◦. We also confirm that the angle-dependent reflection coefficient associated with a change in density only is approximately flat (see Figure 6.8 (e)). The finite aperture of the data accounts for the discrepancy beyond these angles.

To illustrate our method's ability to correct for the geological dip, we consider a common-image-point (CIP) gather at x = 2250 m and z = 960 m. As we can see from Figure 6.9 (c), this CIP is well focused and aligned with the local geologic dip of the reflector. The corresponding stack power is plotted in Figure 6.9 (d), which attains its maximum at $\hat{\theta} = 10.8°$, an estimate close to the actual dip of 11◦. To demonstrate the effect of ignoring the geological dip when computing angle gathers, we evaluate equation 6.14 for θ = 0. The results included in Figure 6.10 clearly illustrate the benefit of incorporating the dip information in angle-domain image gathers, as it allows for a more accurate estimation of the angle-dependent reflection coefficients.
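For the velocity-only, constant-density examples, the theoretical benchmark curve reduces to the plane-wave reflection coefficient of an acoustic (fluid-fluid) interface, which is the form the Zoeppritz PP coefficient takes when shear effects are absent. A sketch of such a reference curve, with hypothetical layer parameters chosen only for illustration, is:

```python
import numpy as np

def acoustic_pp_reflection(v1, v2, rho1, rho2, alpha):
    """Plane-wave PP reflection coefficient for an acoustic two-layer interface,
    as a function of incidence angle alpha [rad]; post-critical angles return nan."""
    sin_t = v2 / v1 * np.sin(alpha)                       # Snell's law
    cos_t = np.sqrt(1.0 - np.minimum(sin_t ** 2, 1.0))
    cos_i = np.cos(alpha)
    R = (rho2 * v2 * cos_i - rho1 * v1 * cos_t) / (rho2 * v2 * cos_i + rho1 * v1 * cos_t)
    return np.where(sin_t < 1.0, R, np.nan)

# e.g. a velocity-only contrast with constant density:
# alpha = np.deg2rad(np.arange(0, 61))
# R = acoustic_pp_reflection(2000.0, 2500.0, 1000.0, 1000.0, alpha)
```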
This experimental result supports our claim that the proposed extended image volume framework lends itself well to estimating geologic-dip corrected angle gathers. As we mentioned before, the proposed method also reaps the computational benefits of probing extended image volumes as outlined in Algorithm 2.

6.7 Case study 3: wave-equation migration-velocity analysis (WEMVA)

Aside from providing localized information on the kinematics and dynamics, full-subsurface offset extended image volumes also lend themselves well to automatic velocity analyses that minimize some global focusing objective [Symes and Carazzone, 1991, Biondi and Symes, 2004, Shen and Symes, 2008, Sava and Vasconcelos, 2011, Mulder, 2014, Yang and Sava, 2015]. For this reason, wave-equation migration-velocity analysis (WEMVA) can be considered another important application where subsurface image gathers are used extensively. In this case, the aim is to build kinematically correct background velocity models that either promote similarity amongst different surface-offset gathers, as in the original work by Symes and Carazzone [1991] on Differential Semblance, or that aim to focus energy at zero subsurface offset, an approach promoted by recent work on WEMVA [Brandsberg-Dahl et al., 2003, Shen and Symes, 2008, Symes, 2008b]. While these wave-equation based methods are less prone to imaging artifacts [Stolk and Symes, 2003], they are computationally expensive, restricting the number of directions in which the subsurface offsets can be calculated. This practical limitation may affect our ability to handle unknown geological dips [Sava and Vasconcelos, 2011, Yang and Sava, 2015].

Before presenting an alternative approach that overcomes these practical limitations, let us first formulate an instance of WEMVA based on the probing technique outlined in Algorithm 2. In this discrete case, WEMVA corresponds to minimizing

$$\min_{m}\; \sum_{k=1}^{N_x^2} \| S_k E(m) w_k \|_2^2, \qquad (6.16)$$

with $E(m) = \sum_{i\in\Omega} E_i(m)$. As before, the $E_i(m)$'s denote monochromatic extended image volumes for the background velocity model $m$. With this definition, we absorb the zero-time imaging condition by summing over frequencies. The vector $w_k$ represents a point source at the subsurface point corresponding to the $k$th entry of $w$. We compute this sum efficiently by first probing each monochromatic extended image volume with the vector $w_k$ and then summing over frequency, instead of summing the monochromatic full-subsurface image volumes first and probing with the vector $w_k$ afterwards. Finally, the diagonal matrix $S_k$ penalizes defocused energy by applying a weighting function. Often, this weight is chosen proportional to the lateral subsurface offset [Shen and Symes, 2008]. The main costs of this approach (cf. equation 6.16) are determined by the number of gathers, which equals the number of grid points. Of course, in practice WEMVA is conducted with a subset of the grid points $N_x$, the number of which depends on the complexity of the subsurface.

To arrive at an alternative, more cost-effective formulation that offers the flexibility to focus in all offset directions, we take a different tack by using the fact that focused extended image volumes commute with diagonal weighting matrices [Kumar et al., 2013b, Symes, 2014] that penalize off-diagonal energy—i.e., we have

$$E\,\mathrm{diag}(s) \approx \mathrm{diag}(s)\,E \qquad (6.17)$$

for a given weighting vector $s$.
For the specific case where the entries in $s$ correspond to the lateral positions of each grid point in the model, forcing the above commutation relation corresponds to the objective of equation 6.16 with weights proportional to the horizontal subsurface offset. However, other options are also available, including focusing in all offset directions. The optimization problem can now be written as

$$\min_{m}\; \left\{ \phi(m) = \| E(m)\,\mathrm{diag}(s) - \mathrm{diag}(s)\,E(m) \|_F^2 \right\}, \qquad (6.18)$$

where $\|A\|_F^2 = \sum_{i,j} a_{i,j}^2$ denotes the Frobenius norm. Minimization of this norm forces $E$ to focus as a function of the velocity model $m$.

For obvious reasons, this formulation is impractical because, even in 2-D, we cannot hope to store the extended image volumes $E$, whose size is quadratic in the number of grid points—$E$ is an $N_x^2 \times N_x^2$ matrix. However, we can still minimize the above objective function via randomized-trace estimation [Avron and Toledo, 2011], a technique that also underlies phase-encoding techniques in full-waveform inversion [van Leeuwen et al., 2011]. With this technique, equation 6.18 can be evaluated to arbitrary accuracy via actions of $E$ on random vectors $w$. With this approximation, the WEMVA objective becomes

$$\phi(m) \approx \tilde{\phi}(m) = \frac{1}{K}\sum_{k=1}^{K} \| R(m) w_k \|_2^2, \qquad (6.19)$$

where $R(m) = E(m)\,\mathrm{diag}(s) - \mathrm{diag}(s)\,E(m)$. While other choices are possible, we select the $w_k$ as Gaussian vectors with independent, identically distributed random entries with zero mean and unit variance. For this choice, $\tilde{\phi}(m)$ and $\phi(m)$ are equal in expectation, which means that the above sample average is unbiased [van Leeuwen et al., 2011, Haber et al., 2012]. As we know from source encoding in full-waveform inversion [Krebs et al., 2009, van Leeuwen et al., 2011, Haber et al., 2012], good approximations can be obtained for small sample sizes $K \ll N_x$. We will study the quality of this approximation below.

The gradient of the approximate objective is given by

$$\nabla\tilde{\phi}(m) = \frac{1}{K}\sum_{k=1}^{K} \left( \nabla E(m, \mathrm{diag}(s) w_k) - \mathrm{diag}(s)\,\nabla E(m, w_k) \right)^{*} R(m) w_k, \qquad (6.20)$$

where

$$\nabla E(m, y) = \frac{\partial E(m) y}{\partial m}$$

is the Jacobian of $E(m)y$. We do not form this Jacobian matrix explicitly, but instead compute its action on a vector as follows:

$$\nabla E(m, y)\,\delta m = -\omega^2 \left( E(m)\,\mathrm{diag}(\tilde{w}) + H(m)^{-1}\mathrm{diag}(\tilde{e}) \right)\delta m, \qquad (6.21)$$

where $\tilde{w} = H(m)^{-1} y$ and $\tilde{e} = E(m) y$. The computation of the action of the adjoint of the Jacobian follows naturally.

We can now employ an iterative gradient-based method to find a minimizer of $\phi(m)$ by using a $K$-term approximation of this objective and its gradient [van Leeuwen et al., 2011, Haber et al., 2012]. To remove possible bias from using a fixed set of random probing vectors, we redraw these vectors after each gradient update. Remaining errors can be controlled by increasing $K$ [Haber et al., 2012, Friedlander and Schmidt, 2012, van Leeuwen and Herrmann, 2014].

6.7.1 Numerical results

We test the proposed wave-equation migration-velocity analysis (WEMVA) formulation on synthetic examples, illustrating the efficacy of the probing techniques. In all experiments, we use a 9-point finite-difference discretization of the 2-D Helmholtz equation to simulate the wavefields. The direct wave is removed from the data prior to performing the velocity analysis. To regularize the inversion, we parameterize the model using cubic B-splines [Symes, 2008a].
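Equation 6.19 is straightforward to evaluate once the action of $E(m)$ on a block of vectors is available through Algorithm 2. Below is a minimal sketch, assuming a routine `probe_E` that returns $E(m)W$ (already summed over frequencies) and a weighting vector `s`, e.g. the lateral position of each grid point; these names are illustrative, not the thesis implementation.

```python
import numpy as np

def wemva_objective(probe_E, s, nx, K, rng=np.random.default_rng(0)):
    """Randomized-trace approximation of the focusing objective (cf. eq. 6.19)."""
    W = rng.standard_normal((nx, K))        # i.i.d. N(0,1) probing vectors
    S = s[:, None]                          # diag(s) applied column-wise
    RW = probe_E(S * W) - S * probe_E(W)    # R(m) W = E diag(s) W - diag(s) E W
    return np.sum(np.abs(RW) ** 2) / K      # (1/K) sum_k ||R(m) w_k||_2^2
```

Redrawing `W` at every gradient update, as described above, is then just a matter of calling the routine with a fresh random generator state.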
We solve the resulting optimization problem with the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) method [Nocedal and Wright, 2000].

Need for full subsurface-offset image volumes

Our first experiment is designed to illustrate the main benefit of working with full subsurface offsets in all directions in situations where both horizontal and vertical reflectors are present (Figure 6.11 (a), adapted from [Yang and Sava, 2015]). For this purpose, we juxtapose, for a velocity model with the correct kinematics, the focusing of common-image gathers (CIGs) against the focusing of common-image-point gathers (CIPs) for a model containing both vertical and horizontal reflectors. While both CIG and CIP gathers can be used to form WEMVA objectives, their performance can be quite different. Ideally, CIGs measure focusing along offsets in the direction of the geologic dip for subsurface points sampled along spatial coordinates perpendicular to this direction, denoted by the yellow lines in Figure 6.11 (a). The green dots denote the locations of the selected CIPs.

Since prior knowledge of geologic dips is typically not available, why not devise a WEMVA scheme that focuses in all spatial directions? In that way, we are guaranteed to focus, as illustrated in Figure 6.11 (b-d) for the horizontal reflector and in Figure 6.11 (e-g) for the vertical reflector. From these figures it is clear that CIGs do not focus (Figure 6.11 (c,e)), despite the fact that the velocity model is correct. This lack of focusing may lead to erroneous biases during CIG-based WEMVA, a problem altogether avoided when we work with CIPs (Figure 6.11 (d,f)). Because CIPs are sensitive to focusing in all directions, there is no need to focus in time [Sava and Vasconcelos, 2011]. While the advantages of CIP-based WEMVA are clear, we cheated by selecting CIPs on top of the reflecting interfaces, whose positions are also generally not known in advance. This is where randomized sampling comes to our rescue. By treating all subsurface points as random amplitude-encoded sources, we are able to form a CIP-based WEMVA objective that does not require explicit knowledge of the location of the reflecting interfaces.

Quality of the stochastic approximation

As with FWI, randomly encoded sources—e.g., via random Gaussian weights—provide a vehicle to approximate (prohibitively) expensive-to-evaluate wave-equation based objectives (cf. equations 6.18 and 6.19) to controllable accuracy by increasing $K$. Following Haber et al. [2012], we evaluate, for a small subset (Figure 6.12 (a)) of the Marmousi model [Bourgeois et al., 1991], the true objective $\phi(m_0 + \alpha\,\delta m)$ (solid blue line) and the approximate objective $\tilde{\phi}(m_0 + \alpha\,\delta m)$ (denoted by the error bars) as a function of $\alpha$ in the direction of the gradient—i.e., $\delta m = -\nabla\phi$, evaluated at the starting model $m_0$ depicted in Figure 6.12 (b). The results for K = 10 and K = 80 are included in Figures 6.12 (c,d). We calculated the error bars from 5 independent realizations. As we increase $K$, the true objective is better approximated, reflected in tighter and better centred (compared to the true objective) error bars.
These results also show that we can substantially reduce the computational costs while approximating the true objective function accurately.

WEMVA on the Marmousi model

To validate our approach to WEMVA based on random probing and full-subsurface offset common-image-point gathers (CIPs), we minimize the approximate objective in equation 6.19 with a quasi-Newton method using approximate evaluations of the objective (equation 6.18) and gradients (equation 6.20). We choose the Marmousi model plotted in Figure 6.13 (a) because of its complexity and relatively steep reflectors. The model is 3.0 km deep and 9.2 km wide, sampled at 12 m. We acquired synthetic data for this model using a split-spread acquisition geometry, resulting in 767 sources and receivers sampled at a 12 m interval. The data simulation and inversion are carried out over 201 frequencies, sampled at 0.1 Hz and ranging from 5 to 25 Hz, scaled by a Ricker wavelet with a central frequency of 10 Hz.

We use a highly smoothed starting model with small (or no) lateral variations to start the inversions for different numbers of probing vectors, K = 10 and K = 100, respectively. To regularize the inversion, we use B-splines sampled at x = 48 m and z = 48 m in the lateral and vertical directions. Figures 6.13 (c, d) show the inverted models after 25 L-BFGS iterations. We can clearly see that even 10 probing vectors are enough to reveal the structural information. Compared to computing the full image gathers (which would require 2 · 767 PDE solves per evaluation), this reduces the computational cost and memory use by roughly a factor of 60.

Encouraged by this result, we conduct a second experiment with a poor starting model (Figure 6.14 (b)) and with data containing fewer low frequencies (8-25 Hz). In this case, we use a slightly higher number of probing vectors (K = 100) to reduce the error in approximating the true objective function: Figures 6.12 (c, d) show that the errors for small K are relatively large when the starting model is poor, which corresponds to small α's in Figure 6.12. To regularize the inversion, we use B-splines sampled at x = 96 m and z = 96 m in the lateral and vertical directions in order to recover the smooth (low-frequency) component of the true velocity model. We can clearly see in Figure 6.14 (c) that we recover the low-frequency component of the velocity model. Figure 6.14 (d) shows our velocity estimate overlaid with a contour plot of the true velocity perturbation. We can see that, despite the missing low frequencies, the inverted model captures the shallow complexity of the model reasonably well.

6.8 Discussion

To our knowledge, this work represents the first instance of deriving a discrete two-way equivalent of Claerbout's double square-root equation [Claerbout, 1970, 1985a] that enables us to compute full-subsurface offset extended image volumes. Contrary to approaches based on the one-way wave equation, which are dip-limited and march along depth, our method provides access to extended image volumes via actions of a wave-equation based factorization on certain probing vectors. This factorization enables matrix-vector products with matrices that encode wavefield interactions between arbitrary pairs of points in the subsurface and that cannot be formed explicitly.
As a result, we arrive at a formulation that is computationally feasible and that offers new perspectives on the design and implementation of workflows that exploit information embedded in various types of subsurface extended images.

Depending on the choice of the probing vectors, we either obtain local information, tied to individual subsurface points, that can serve as quality control for velocity analyses or as input to localized amplitude-versus-offset analysis, or global information that, as we have shown, can be used to drive automatic velocity analyses. Aside from guiding kinematic inversions through focusing, we expect that our randomized probings of extended image volumes also provide information on the rock properties. Further extensions of the methodology include forming fully elastic image volumes, which turns our matrix representation into a tensor representation. We leave such generalizations to a future paper.

6.9 Conclusions

Extended subsurface image volumes carry information on interactions between pairs of subsurface points, encoding essential information on the kinematics—to be used during velocity analysis—and the dynamics, which serve as input to inversions for the rock properties. While conceptually beautiful, full-subsurface offset image volumes have not yet been considered in practice because these objects are too large to be formed explicitly. Through a wave-equation based factorization, we avoid explicit computations by instead forming matrix-vector products that only require two wave-equation solves each, thereby removing the customary and expensive loop over shots found in conventional extended imaging. This approach leads to significant computational gains in certain situations since we are no longer constrained by costs that scale with the number of shots and the number of subsurface points and offsets visited during the cross-correlation calculations. Instead, we circumvent these expensive explicit computations by carrying out the correlations implicitly through the wave-equation solves. As a result, we end up with a matrix-vector formulation from which different image gathers can be formed and with which amplitude-versus-angle and wave-equation migration-velocity analyses can be performed without requiring prior information on the geologic dip. We showed that these operations can be accomplished at affordable computational costs.

By means of concrete examples, we demonstrated how localized information on focusing and scattering amplitudes can be revealed by forming different extended image volumes in both 2-D and 3-D. Because full-subsurface extended image volumes are quadratic in the number of gridded subsurface parameters, it would be difficult if not impossible to obtain these results by conventional methods. We also verified that our matrix-vector formulation lends itself well to automatic migration-velocity analysis if Gaussian random probing vectors are used. These vectors act as simultaneous sources and allow for significant computational gains in the evaluation of the global focusing objectives that are key to migration-velocity analysis. Instead of focusing in a particular offset direction, as in most current approaches, our objective and its gradient force full-subsurface extended image volumes to focus in all offset directions at all subsurface points. It accomplishes this by forcing a commutation relation between the extended image volume and a matrix that expresses the Euclidean distance between points in the subsurface.
The examples show that the computational costs can be controlled by probing. Application of this new automatic migration-velocity analysis technique to a complex synthetic shows encouraging results, in particular in regions with steep geological dips.

Table 6.1: Correspondence between continuous and discrete representations of the image volume. Here, ω represents frequency, x represents subsurface positions, and (i, j) represents the subsurface grid points. The colon (:) notation extracts a vector from e at the grid point i, j for all subsurface offsets.
full image volume: continuous $e(\omega_i, x, x')$; discrete $E_i$
migrated image: continuous $\int_{\Omega} d\omega\, e(\omega, x, x)$; discrete $\sum_{i=1}^{N_f} \mathrm{diag}(E_i)$
CIP: continuous $e(\omega_i, x_k, x_{k'})$; discrete $e_{ikk'}$

Table 6.2: Computational complexity of the two schemes in terms of the number of sources $N_s$, receivers $N_r$, sample points $N_x$ and desired number of subsurface offsets in each direction $N_{h_{\{x,y,z\}}}$.
conventional: $2N_s$ PDE solves; $N_s N_{h_x} N_{h_y} N_{h_z}$ flops
this paper: $2N_x$ PDE solves; $N_r N_s$ flops

Table 6.3: Comparison of the computational time (in seconds) and memory (in megabytes) for computing a CIP gather on a central part of the Marmousi model. We can see the significant difference in time and memory using the probing technique compared to the conventional method, and we expect this difference to be greatly exacerbated for realistically sized models.
conventional: 456 s; 152 MB
this paper: 23 s; 5.1 MB

Figure 6.1: Different slices through the 4-dimensional image volume e(z, z′, x, x′) around z = z_k and x = x_k. (a) Conventional image e(z, z, x, x), (b) image gather for horizontal and vertical offset e(z, z′, x_k, x′), (c) image gather for horizontal offset e(z, z, x_k, x′), and (d) image gather for a single scattering point e(z_k, z′, x_k, x′). (e-g) show how these slices are organized in the matrix representation of e.

Figure 6.2: Migrated images for a wrong (a) and the correct (b) background velocity are shown with 3 locations at which we extract CIPs for a wrong (c) and the correct (d) velocity. The CIPs contain many events that do not necessarily focus. However, these events are located along the line normal to the reflectors. Therefore, it seems feasible to generate multiple CIPs simultaneously as long as they are well-separated laterally. A possible application of this is the extraction of CIGs at various lateral positions. CIGs at x = 1500 m and x = 2500 m for a wrong (e) and the correct (f) velocity indeed show little evidence of crosstalk, allowing us to compute several CIGs at the cost of a single CIG.

Figure 6.3: (a) Compass 3D synthetic velocity model provided to us by BG Group. (b) A CIP gather at (x, y, z) = (1250, 1250, 390) m. The proposed method (Algorithm 2) is 1500 times faster than the classical method (Algorithm 1) in generating the CIP gather.

Figure 6.4: Cross-sections of the Compass 3D velocity model (Figure 6.3 (a)) along the (a) x and (b) y directions.

Figure 6.5: Slices extracted along the horizontal (a,b) and vertical (c) offset directions from the CIP gather shown in Figure 6.3 (b).

Figure 6.6: Schematic depiction of the scattering point and the related positioning of the reflector.

Figure 6.7: (a) Horizontal one-layer velocity model and (b) constant density model. The CIP location is x = 1250 m and z = 400 m. (c) Modulus of the angle-dependent reflectivity coefficients at the CIP. The black lines are included to indicate the effective aperture at depth.
The red lines are the theoretical reflectivity coefficients and the blue lines are the wave-equation based reflectivity coefficients.

Figure 6.8: Angle-dependent reflectivity coefficients for the horizontal four-layer (a) velocity and (b) density model at x = 1250 m. Modulus of the angle-dependent reflectivity coefficients at (c) z = 200 m, (d) z = 600 m, (e) z = 1000 m, (f) z = 1400 m.

Figure 6.9: Estimation of the local geological dip. (a,b) Two-layer model. (c) CIP gather at x = 2250 m and z = 960 m overlaid on the dipping model. (d) Stack power versus dip angle. We can see that the maximum stack power corresponds to a dip value of 10.8◦, which is close to the true dip value of 11◦.

Figure 6.10: Modulus of the angle-dependent reflectivity coefficients in the two-layer model at z = 300 and 960 m and x = 2250 m. (a) Reflectivity coefficients at z = 300 m and x = 2250 m. Reflectivity coefficients at z = 960 m (b) with no dip (θ = 0◦) and (c) with the dip obtained via the method described above (θ = 10.8◦).

Figure 6.11: Comparison of working with CIGs versus CIPs. (a) True velocity model. The yellow line indicates the location along which we computed the CIGs and the green dot is the location where we extracted the CIPs. (b,c) CIGs extracted along the vertical and horizontal offset directions in the case of the vertical reflector. (d) CIP extracted at the vertical reflector (z = 1.2 km, x = 1 km). (e,f) CIGs extracted along the vertical and horizontal offset directions in the case of the horizontal reflector. (g) CIP extracted at the horizontal reflector (z = 1.5 km, x = 4.48 km).

Figure 6.12: Randomized trace estimation. (a,b) True and initial velocity models. Objective functions for WEMVA based on the Frobenius norm, as a function of the velocity perturbation, using the complete matrix (blue line) and error bars of the approximated objective function evaluated via 5 different random probings with (c) K = 10 and (d) K = 80 for the Marmousi model.

Figure 6.13: WEMVA on the Marmousi model with the probing technique for a good starting model. (a,b) True and initial velocity models. Inverted model using (c) K = 10 and (d) K = 100, respectively. We can clearly see that even 10 probing vectors are good enough to start revealing the structural information.

Figure 6.14: WEMVA on the Marmousi model with the probing technique for a poor starting velocity model and an 8-25 Hz frequency band. (a,b) True and initial velocity models. Inverted model using (c) K = 100. (d) Inverted velocity model overlaid with a contour plot of the true model perturbation. We can see that we capture the shallow complexity of the model reasonably well when working with a realistic seismic acquisition and inversion scenario.

Chapter 7

Conclusions

In this chapter I summarize the main contributions of this thesis, propose follow-up work, discuss some limitations, and outline possible extensions.

7.1 Main contributions

In this thesis, I have developed fast computational techniques for large-scale seismic applications such as missing-trace interpolation, simultaneous source separation, and wave-equation based migration velocity analysis.
In Chapters 2 and 3, I designed a large-scale singular value decomposition (SVD)-free factorization based rank-minimization approach, which builds upon the existing knowledge of compressed sensing as a successful signal-recovery paradigm, and outlined the necessary components of a low-rank domain, a rank-increasing sampling scheme, and an SVD-free rank-minimizing optimization scheme for successful missing-trace interpolation. Note that for the matrix-completion approach, the limiting component for large-scale data is the nuclear-norm projection, which requires the computation of SVDs. For large-scale 5D seismic data, it is prohibitively expensive to compute singular value decompositions because the underlying matrix has tens of thousands or even millions of rows and columns. Hence, we designed an SVD-free factorization based approach for missing-trace interpolation and source separation. In Chapters 4 and 5, I extended the SVD-free rank-minimization framework to remove source cross-talk during simultaneous source acquisition, because subsequent seismic data processing and imaging assumes well-separated data. In Chapter 6, I proposed a matrix-vector formulation which overcomes the computational and storage costs of full-subsurface offset extended image volumes that are prohibitively expensive to form for large-scale seismic imaging problems. This matrix-vector formulation avoids explicit storage of full-subsurface offset extended image volumes and removes the expensive loop over shots found in conventional extended imaging [Sava and Vasconcelos, 2011].

The remainder of this section provides more details about the aforementioned contributions.

7.1.1 SVD-free factorization based matrix completion

Current efforts towards imaging subsalt structures under complex overburdens have led to a move towards wide-azimuth towed-streamer (WATS) acquisition. Due to budgetary and/or physical constraints, WATS is typically coarsely sampled (sub-sampled) along sources and/or receivers. However, most processing and imaging techniques require densely sampled seismic data on a regular periodic grid, and thus call for large-scale seismic data interpolation techniques. Following ideas from the field of compressed sensing [Donoho, 2006b, Candès et al., 2006], a variety of methodologies, each based on various mathematical techniques, have been proposed to interpolate seismic data, where the underlying principle is to exploit the sparse or low-rank structure of seismic data in a transform domain [Hennenfent et al., 2010, Kreimer, 2013]. However, these previous compressed sensing (CS)-based approaches, using sparsity or rank-minimization, incur computational difficulties when applied to large-scale seismic data volumes. For instance, methods that involve redundant transforms, such as curvelets [Hennenfent et al., 2010], or that add additional dimensions, such as taking outer products of tensors [Kreimer, 2013], are computationally no longer tractable for large data volumes with four or more dimensions.

From a theoretical point of view, the success of rank-minimization based techniques depends upon two main principles, namely a low-rankifying transform and a sub-Nyquist sampling strategy that subdues coherent aliases in the rank-revealing transform domain.
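For illustration, the essence of a factorized, SVD-free formulation can be written down in a few lines: the unknown matrix is parameterized as $LR^T$ and the nuclear norm is replaced by the surrogate $\frac{1}{2}(\|L\|_F^2 + \|R\|_F^2)$, so no SVDs are ever computed. The sketch below uses plain gradient descent on a penalized misfit purely for exposition; the thesis itself relies on a more sophisticated solver built around the same factorized formulation, and all names and parameters here are hypothetical.

```python
import numpy as np

def factored_completion(B, mask, rank, lam=1e-2, step=1e-3, iters=500, seed=0):
    """Sketch of SVD-free matrix completion: minimize
    ||mask*(L R^T - B)||_F^2 + (lam/2)(||L||_F^2 + ||R||_F^2) by gradient descent."""
    rng = np.random.default_rng(seed)
    m, n = B.shape
    L = 0.1 * rng.standard_normal((m, rank))
    R = 0.1 * rng.standard_normal((n, rank))
    for _ in range(iters):
        Res = mask * (L @ R.T - B)      # residual on the observed entries only
        gL = Res @ R + lam * L          # gradient w.r.t. L
        gR = Res.T @ L + lam * R        # gradient w.r.t. R
        L -= step * gL
        R -= step * gR
    return L, R                          # completed matrix is L @ R.T
```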
As identified in Chapter 2 (Figures 2.2 and 2.3), for 2D seismic data acquisition, seismic frequency slices exhibit low-rank structure (fast decay of the singular values) in the midpoint-offset domain, whereas subsampling increases the rank in the midpoint-offset domain, i.e., the singular values decay slowly. Hence, we can use rank-minimization based techniques to reconstruct the low-rank representation of seismic data in the midpoint-offset domain. Similarly, as shown in Chapter 3, 4D monochromatic frequency slices (extracted from a 3D seismic data acquisition) exhibit a low-rank structure in the (source-x, receiver-x) matricization compared to other possible matricizations, whereas subsampling destroys the low-rank structure in the (source-x, receiver-x) matricization (Figures 3.4 and 3.5). Here, matricization refers to an operation that reshapes a tensor into a matrix along specific dimensions. When these two principles hold, i.e., a low-rank domain and a rank-increasing sampling scheme, rank-minimization formulations allow the recovery of missing traces.

From a practical standpoint, we need a large-scale seismic data interpolation framework that can handle seismic data measurements on the order of $10^{10}$ to $10^{12}$. Unfortunately, one of the limitations of rank-minimization based techniques for large-scale seismic problems is the nuclear-norm projection, which inherently involves prohibitively expensive computations of singular value decompositions (SVDs). For this reason, I proposed, in Chapters 2 and 3, a practical framework for recovering missing traces in large-scale seismic data volumes using a matrix-factorization based approach [Lee et al., 2010b, Recht and Ré, 2011], where I avoided the need for expensive computations of SVDs. I showed that the proposed SVD-free factorization based framework can interpolate large-scale seismic data volumes with high fidelity from significantly subsampled, and therefore economically and environmentally friendly, seismic data volumes. In Chapter 3, I also demonstrated that the proposed factorization framework allows me to work with fully sampled seismic data volumes without relying on a windowing operation that divides 5D data up into small cubes followed by normal-moveout corrections designed to remove the curvature of seismic reflection events—one of the more recent approaches in the missing-trace interpolation literature [Kreimer, 2013]. The reported results in Chapters 2 and 3 on realistic 3D and synthetic 5D seismic data interpolation demonstrate that while the SVD-free factorization based approaches are comparable (in reconstruction quality) to existing matrix completion [Becker et al., 2011, Wen et al., 2012] and sparsity-promotion based techniques that use sparsity in transform domains such as wavelets and curvelets [Herrmann and Hennenfent, 2008b], they significantly outperform these techniques in terms of computational time and memory requirements.

Large-scale simultaneous source separation

Practitioners have also proposed simultaneous seismic data acquisition to mitigate sampling-related issues and improve the quality of seismic data, wherein single or multiple source vessels fire sources at near-simultaneous or slightly random times.
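The two ingredients—low rank in the transform domain and a rank-increasing sampling scheme—can be checked numerically. The sketch below maps a monochromatic (source, receiver) slice to the midpoint-offset domain and compares the normalized singular-value decay of the fully sampled and randomly subsampled slices; the indexing convention (midpoint index $= s + r$, offset index $= s - r$) is an illustrative choice rather than the exact mapping used in the thesis.

```python
import numpy as np

def to_midpoint_offset(D):
    """Map a frequency slice D[source, receiver] into the midpoint-offset domain.
    Entries with no corresponding trace remain zero."""
    n = D.shape[0]
    M = np.zeros((2 * n - 1, 2 * n - 1), dtype=D.dtype)
    for s in range(n):
        for r in range(n):
            M[s + r, s - r + n - 1] = D[s, r]
    return M

def singular_decay(D, frac_missing=0.5, seed=0):
    """Normalized singular values of the full and randomly subsampled slice
    (missing sources) in the midpoint-offset domain."""
    rng = np.random.default_rng(seed)
    keep = rng.random(D.shape[0]) > frac_missing
    sv_full = np.linalg.svd(to_midpoint_offset(D), compute_uv=False)
    sv_sub = np.linalg.svd(to_midpoint_offset(D * keep[:, None]), compute_uv=False)
    return sv_full / sv_full[0], sv_sub / sv_sub[0]
```

The subsampled curve decays more slowly, which is exactly the rank-increasing behaviour that makes rank-minimization an effective recovery principle.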
This results in overlapping and missing shot records, as opposed to the non-overlapping shot records of conventional marine acquisition. Although simultaneous-source surveys yield improved-quality recorded seismic data volumes at a reduced acquisition turnaround time, the cost of the reduced acquisition is coherent artifacts from seismic cross-talk. These may degrade the quality of the final migrated images because subsequent seismic data processing and imaging assumes well-separated data acquired in the conventional (non-overlapping) way. Therefore, a practical (simultaneous) source-separation and interpolation technique is required, which separates the overlapping shots and interpolates the missing ones. Following the compressed-sensing principles outlined in Chapters 1 and 2 for the matrix-completion framework, I found that fully sampled conventional seismic data has low-rank structure—i.e., quickly decaying singular values in the midpoint-offset domain—whereas overlapping and missing seismic data result in high-rank structure—i.e., slowly decaying singular values. For this reason, in Chapters 4 and 5, I developed a highly parallelizable singular value decomposition (SVD)-free matrix-factorization approach for source separation and missing-trace interpolation. I showed that the proposed framework regains the low-rank structure of the separated and interpolated seismic data volumes. I tested the SVD-free framework on two different seismic data acquisition scenarios, namely static (receivers fixed at the ocean bottom) and dynamic (receivers moving with the sources) geometries. For the dynamic geometry, in Chapter 4, I investigated two instances of low variability in the source firing times—e.g., 0-1 (or 2) seconds—namely over/under and simultaneous long offset acquisition (Figures 4.7 and 4.9). In Chapter 5, I investigated an instance of the static geometry with larger variability in the source firing times—e.g., > 1 second—where a single source vessel sails across an ocean-bottom array firing two airgun arrays at jittered source locations and time instances while the receivers record continuously (Figure 5.1). For both cases, I showed that the proposed SVD-free rank-minimization framework is able to separate and interpolate the large-scale seismic data onto a desired grid with negligible loss of coherent energy, where the reconstruction quality is comparable to sparsity-promotion [Wason and Herrmann, 2013b] and NMO-based median-filtering type techniques [Chen et al., 2014]. I also demonstrated that the proposed SVD-free factorization based rank-minimization approach for source separation outperforms sparsity-promotion based techniques by an order of magnitude in computational speed while using only 1/20th of the memory.

Missing-trace interpolation without windowing

Seismic data are often acquired in large volumes, which makes the processing of seismic data computationally expensive. For this reason, seismic practitioners follow workflows that involve: i) normal-moveout (NMO) correction to remove the curvature of seismic reflection events, such that the events become approximately linear in small enough windows; ii) dividing the data into overlapping spatial-temporal windows; iii) performing a Fourier transform along the time coordinate within each window; iv) using a matrix or tensor based technique to interpolate individual monochromatic slices; v) performing the inverse Fourier transform along the frequency coordinate within each window; and vi) combining all the windows to obtain the reconstructed seismic data volume.
While this approach is computationally feasible, because it can be readily parallelized and has been used with success, there are several issues with the windowing process: i) one needs a reasonably accurate root-mean-square velocity model to perform the NMO correction, which may be difficult to compute when data are missing; ii) the proper window size depends upon the complexity of the seismic data, i.e., large windows for linear reflection events and smaller windows for complex seismic reflection events with curvature; iii) averaging operations are needed along the overlapping windows (see [Kreimer, 2013] for details). These issues are important because they may lead to underperformance.

One of the main contributions of this thesis was the design and implementation of an SVD-free matrix-factorization approach that exploits the inherent redundancy in seismic data while avoiding the customary preprocessing steps of windowing and NMO corrections. Seismic data is inherently redundant because we are collecting data on the same subsurface at different angles. In this thesis, I used this inherent redundancy of seismic data to exploit the low-rank structure, where the idea is to represent the complete seismic data volume using only a few singular vectors corresponding to the largest singular values. Note that this redundancy can only be exploited when the data are organized in a particular manner, namely by exploiting the low-rank structure of seismic data in the midpoint-offset domain for 2D seismic surveys or in the (source-x, receiver-x) matricization for 3D seismic surveys.

When data volumes are organized in this way, the singular values decay rapidly, which is a reflection of the intrinsic redundancy exhibited by seismic data. To illustrate this inherent redundancy, I considered a split-spread seismic acquisition over a complex geological model provided by the BG Group, where each source is recorded by all the receivers. This simulation resulted in seismic data with 1024 time samples, 60 × 60 sources and 60 × 60 receivers. From this seismic volume, I extracted a monochromatic 4D tensor at 10 Hz, where the size of the tensor is 60 × 60 × 60 × 60, followed by an analysis of the decay of the singular values as a function of window size. To compare the recovery within windows versus carrying out the interpolation over the complete non-windowed survey, I computed the singular values as follows: i) I performed the (xsrc, xrec) matricization (as explained in Chapter 3) on the 10 Hz monochromatic slice, resulting in a matrix of 3600 rows and 3600 columns; ii) I defined window sizes of 100, 200, 300, 400, 600, 900, 1200, 1800 and 3600; iii) for each window size, I extracted all possible sub-matrices and computed their corresponding singular values; iv) I concatenated all the singular values of the sub-windowed matrices, for a given window size, and sorted them in descending order. In Figure 7.1 (a), I compared the decay of the singular values of the monochromatic slice at 10 Hz for the different window sizes, where we can see that i) the fully sampled, non-windowed seismic data volume has the fastest decay of singular values, and ii) smaller window sizes result in a slower decay rate of the singular values, i.e., we need relatively more singular values to approximate the underlying fully sampled seismic data.
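A minimal sketch of this windowed singular-value experiment, using non-overlapping sub-blocks for simplicity (the thesis experiment considers all possible sub-matrices), is given below.

```python
import numpy as np

def windowed_singular_values(M, win):
    """Concatenate the singular values of all non-overlapping win x win sub-blocks
    of the matricized frequency slice M and sort them in descending order."""
    n = M.shape[0]
    sv = []
    for i in range(0, n - win + 1, win):
        for j in range(0, n - win + 1, win):
            sv.append(np.linalg.svd(M[i:i + win, j:j + win], compute_uv=False))
    return np.sort(np.concatenate(sv))[::-1]

# e.g. for a 3600 x 3600 matricized 10 Hz slice M:
# decay = {w: windowed_singular_values(M, w) / windowed_singular_values(M, w)[0]
#          for w in (100, 300, 900, 3600)}
```

The normalized curves produced this way reproduce the qualitative behaviour of Figure 7.1: the larger the window, the faster the decay, with the full non-windowed slice decaying fastest.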
This simple experiment demonstrated that fully sampled, non-windowed monochromatic slices exhibit low-rank structure because we can approximate them using a few singular vectors that correspond to the first few largest singular values, with minimal loss of coherent seismic energy. Therefore, non-windowed seismic data should be used during missing-trace interpolation and source separation with rank-minimization based techniques, avoiding the customary windowing- and NMO-based preprocessing steps.

Figure 7.1: To understand the inherent redundancy of seismic data, we analyze the decay of singular values for the windowed versus non-windowed cases (normalized singular values plotted against the number of singular values, with one curve per window size from 100 to 3600). We see that fully sampled seismic data volumes have the fastest decay of singular values, whereas smaller window sizes result in a slower decay rate of the singular values.

7.1.2 Enabling computation of omnidirectional subsurface extended image volumes

Image gathers contain kinematic and dynamic information on interactions between pairs of subsurface points. We can use this information to design wave-equation based velocity analysis, amplitude-versus-angle inversion to estimate the rock properties, and target-oriented imaging using redatuming techniques in geologically challenging environments to overcome the effects of complex overburden. Unfortunately, forming full-subsurface offset extended image volumes is prohibitively expensive because they are quadratic in the image size and therefore impossible to store. Apart from storage impediments, the costs associated with computing the extended image volumes scale with the number of shots, the number of subsurface points, and the number of subsurface offsets visited during the cross-correlation calculations. In Chapter 6, I proposed a computationally and memory-efficient way of gleaning information from the full-subsurface offset extended image volumes, where I first organized the full-subsurface offset extended image volumes as a matrix. Then, I computed the action of this matrix on a given vector without explicitly constructing the full-subsurface offset extended image volumes. The proposed matrix-vector product removes the expensive loop over shots found in conventional methods, thus leading to significant computational and memory gains in certain situations. Using this matrix-vector formulation, I formed image gathers along all offset directions to perform amplitude-versus-angle and automatic wave-equation migration-velocity analyses without requiring prior information on the geologic dip. By means of concrete examples, I demonstrated that we can extract localized scattering-amplitude information from image volumes computed on 2D and 3D velocity models. I further validated the potential of the probing techniques to perform automatic wave-equation migration-velocity analyses on a complex synthetic model with steep geological dips.

7.2 Follow-up work

Although I tested the proposed SVD-free factorization based rank-minimization framework on realistic 3D and complex 5D synthetic data volumes and compared it to existing matrix/tensor completion and sparsity-promotion based interpolation techniques (such as curvelets), it would be interesting to see its application to a realistic 5D data set acquired over complex geological structures such as salt.
Since our SVD-free rank-minimization framework works with the full seismic data volumes without windowing, it is computationally infeasible to interpolate the complete data on a single computing node, i.e., to perform the interpolation in serial mode. Hence, we need to devise a parallel interpolation framework that can exploit the inherent redundancy of seismic data without performing the windowing operation and that overcomes the computational bottleneck of handling large-scale seismic data volumes where the unknowns are of the order of $10^{10}$ to $10^{12}$. Finally, it would be good to see the impact of 5D seismic interpolation on various seismic post-processing workflows such as surface-related multiple estimation, wavefield decomposition, migration, waveform inversion, target imaging and uncertainty analysis.

7.3 Current limitations

One of the main requirements of the current rank-minimization approach is to find a transform domain in which the target recovered data exhibit low-rank structure. In this thesis, I showed that 2D seismic monochromatic slices exhibit low-rank structure in the midpoint-offset domain at the lower frequencies, but not at the higher frequencies. This behaviour is due to the increase in wave oscillations as we move from low- to high-frequency slices in the midpoint-offset domain, even though the energy of the wavefronts remains focused around the diagonal (see Figures 2.2 and 2.3 in Chapter 2). Therefore, interpolation via rank-minimization in the high-frequency regime requires extended formulations that incorporate low-rank structure. Kumar et al. [2013a] proposed to address this issue for 2D seismic data acquisition using the Hierarchically Semi-Separable (HSS) matrix representation [Chandrasekaran et al., 2006]. Although the initial results of HSS-based techniques for 3D seismic data interpolation and source separation are encouraging, it would be interesting to see their benefit for interpolating the higher frequencies in large-scale realistic 5D seismic data volumes generated from 3D seismic acquisitions. Moreover, one of the limitations of the proposed SVD-free matrix-factorization approach is finding the rank parameter k associated with each low-rank factor. I estimated it using cross-validation techniques (see Chapter 3 for more details) in all the examples presented in this thesis; however, this is still not a practical approach when dealing with large-scale subsampled seismic data volumes, since it involves finding an appropriate small volume of seismic data on which to run the cross-validation—a computationally expensive process.

7.4 Future extensions

7.4.1 Extracting on-the-fly information

While traditional interpolation and/or source-separation approaches can deal with missing information and/or source cross-talk during seismic data acquisition, these processing methods result in massive seismic data volumes on the order of terabytes to petabytes. This overwhelming amount of data makes subsequent imaging and inversion workflows daunting for realistic data sets, since extracting various data gathers, such as common-source and common-receiver gathers, can be very challenging (input/output costs), apart from storing the massive volumes of interpolated and/or separated seismic data on disk. To overcome this computational and memory burden, we can design a fast, resilient, and scalable workflow, where we first compress the seismic data volumes using an SVD-free rank-minimizing optimization scheme.
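As a simple illustration of the kind of on-the-fly access discussed next, suppose a monochromatic slice has been compressed into factors L and R such that the slice equals $LR^{H}$; a single common-source or common-receiver gather is then one row or column of this product and never requires forming the dense matrix. (In the thesis the factors live in the midpoint-offset or (source-x, receiver-x) domain, so an additional index mapping would be needed; the direct (source, receiver) ordering below is an assumption made for brevity.)

```python
import numpy as np

def common_source_gather(L, R, isrc):
    """Common-source gather for source index isrc from the factors of a slice
    D = L @ R.conj().T of shape (nsrc, nrec); only one row of L is touched."""
    return L[isrc, :] @ R.conj().T          # length-nrec gather

def common_receiver_gather(L, R, irec):
    """Common-receiver gather: one column of D = L R^H, without forming D."""
    return L @ R[irec, :].conj()            # length-nsrc gather
```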
We can then access the information from the compressed volumes on the fly—i.e., extract common-source and/or common-receiver gathers from the low-rank factors without forming the fully sampled seismic data volumes during the objective and gradient calculations of an inversion framework.

7.4.2 Compressing full-subsurface offset extended image volumes

Even though the probing techniques circumvent the computational and storage requirements of forming the full-subsurface offset extended image volumes, we only have access to the information from the full-subsurface offset image volumes at the subsurface locations defined by the probing vectors. As a consequence, the proposed framework of probing vectors can only be beneficial when the number of probing vectors is very small compared to the number of sources. To circumvent the computational requirement of forming the full-subsurface offset extended image volumes at every point in the subsurface, I propose to exploit the low-rank structure of the image volumes. To understand the low-rank behaviour of image volumes, I analyzed their singular-value decay on a small section of the Marmousi model (Figure 7.2 (a)). I chose this particular part of the model because it consists of highly dipping reflectors with strong lateral variations in the velocity. For this 2D model, the monochromatic full-subsurface offset extended image volume is a four-dimensional tensor with dimensions depth z, lateral position x, horizontal lag δx, and vertical lag δz. As outlined in Chapter 3, there is no unique generalization of the SVD to tensors and, as a result, there is no unique notion of rank for tensors. However, rank can be computed on the different matricizations of a tensor, where matricization reshapes a tensor into a matrix along specific dimensions. To analyze the decay of the singular values of each monochromatic full-subsurface offset extended image volume, I matricize this 4D tensor so that the depth z and vertical lag δz coordinates (z, δz) are grouped along the rows, and the lateral position x and horizontal lag δx coordinates are grouped along the columns. Figure 7.2 (b) shows the matricized tensor and Figure 7.2 (c) shows its associated singular-value decay. I also tested the decay of the singular values for all possible combinations of matricizations, i.e., (z, δx), (z, x), (z, δz), (δx, δz), and found that the (z, δz) matricization gives the fastest decay of the singular values. I further approximated the full-subsurface offset extended image volume with its first 10 singular vectors computed using the (z, δz) matricization, which results in a reconstruction error of $10^{-4}$. This example demonstrates that even for highly complex geological structures, the full-subsurface offset extended image volumes for all subsurface points exhibit low-rank structure, which can be exploited to compress full-subsurface offset extended image volumes while gleaning information from them during wave-equation migration-velocity analyses.

Figure 7.2: To visualize the low-rank nature of image volumes, I form a full-subsurface offset extended image volume using a subsection of the Marmousi model and analyze the decay of its singular values. (a) Complex subsection of the Marmousi model with highly dipping reflectors and strong lateral variations in the velocity, and (b) the corresponding full-subsurface offset extended image volume at 5 Hz.
(c) To demonstrate the low-rank nature of image volumes, I plot the decay of the singular values, where I observed that only the first 10 singular vectors are required to reach a reconstruction error of $10^{-4}$.

7.4.3 Comparison of MVA and FWI

Apart from performing velocity analysis using extended image volumes, which is known as migration velocity analysis (MVA) in the seismic literature, full-waveform inversion (FWI) is another useful tool to invert for the velocity model of the subsurface. FWI is a nonlinear data-fitting procedure, where the aim is to obtain velocity updates by minimizing the mismatch between the observed and predicted seismic data. Here, the predicted data are generated from an initial guess of the subsurface by solving a wave equation [Virieux and Operto, 2009]. Both MVA and FWI have their own pitfalls, which can lead to spurious artifacts in the velocity inversion. For example, FWI is mainly driven by the turning waves whereas MVA is mainly driven by the reflected waves. FWI has a deeper depth of penetration for lower frequencies but not for higher frequencies, whereas MVA can provide velocity updates in the deeper part of the model for both lower and higher frequencies. Another well-known drawback of FWI is the occurrence of local minima in the misfit functional, which leads to erroneous artifacts in the velocity updates. This can be circumvented either by recording low frequencies in the observed data [Virieux and Operto, 2009], which are generally absent from seismic data, or by starting with a good initial velocity model that is sufficiently close to the true model. In future work, I would like to develop an inversion framework that has flavors of both FWI and MVA and that helps us overcome some of these pitfalls.

Bibliography

AAPG. Seismic data display. American Association of Petroleum Geologists Wiki, January 24, 2017. URL http://wiki.aapg.org/Seismic data display. → pages 2

R. Abma, T. Manning, M. Tanis, J. Yu, and M. Foster. High quality separation of simultaneous sources by sparse inversion. In 72nd EAGE Conference and Exhibition, 2010. → pages 78

R. Abma, A. Ford, N. Rose-Innes, H. Mannaerts-Drew, and J. Kommedal. Continued development of simultaneous source acquisition for ocean bottom surveys. In 75th EAGE Conference and Exhibition, 2013. doi: 10.3997/2214-4609.20130081. URL http://earthdoc.eage.org/publication/publicationdetails/?publication=68897. → pages 78

P. Akerberg, G. Hampson, J. Rickett, H. Martin, and J. Cole. Simultaneous source separation by sparse radon transform. SEG Technical Program Expanded Abstracts, 27(1):2801–2805, 2008. doi: 10.1190/1.3063927. URL http://link.aip.org/link/?SGA/27/2801/1. → pages 78

K. Aki and P. G. Richards. Quantitative seismology. Freeman and Co., New York, 1980. → pages 114

A. Aravkin, M. Friedlander, F. Herrmann, and T. van Leeuwen. Robust inversion, dimensionality reduction, and randomized sampling. Mathematical Programming, 134(1):101–125, 2012. → pages 14, 24

A. Aravkin, J. Burke, and M. Friedlander. Variational properties of value functions. SIAM Journal on Optimization, 23(3):1689–1717, 2013a. doi: 10.1137/120899157. URL http://dx.doi.org/10.1137/120899157. → pages 14, 15, 16, 23, 24, 25

A. Aravkin, R. Kumar, H. Mansour, B. Recht, and F. J. Herrmann. Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation. SIAM Journal on Scientific Computing, 36(5):S237–S266, 2014a. doi: 10.1137/130919210. → pages 50, 57, 58, 60, 69

A. Y. Aravkin, J. V. Burke, and M. P. Friedlander.
Variational properties of value functions.SIAM Journal on optimization, 23(3):1689–1717, 2013b. → pages 88A. Y. Aravkin, R. Kumar, H. Mansour, B. Recht, and F. J. Herrmann. Fast methods fordenoising matrix completion formulations, with applications to robust seismic datainterpolation. SIAM Journal on Scientific Computing, 36(5):S237–S266, 10 2014b. doi:10.1137/130919210. URL http://epubs.siam.org/doi/abs/10.1137/130919210. → pages 82, 88153H. Avron and S. Toledo. Randomized algorithms for estimating the trace of an implicitsymmetric positive semi-definite matrix. Journal of the Association for Computing Machinery,58(2):P1–P16, Apr. 2011. ISSN 00045411. doi: 10.1145/1944345.1944349. URLhttp://portal.acm.org/citation.cfm?doid=1944345.1944349http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.163.6194&amp;rep=rep1&amp;type=pdf. →pages 125R. H. Baardman and R. G. van Borselen. Method and system for separating seismic sources inmarine simultaneous shooting acquisition. Patent Application, EP 2592439 A2, 2013. → pages78V. Bardan. Trace interpolation in seismic data processing. Geophysical Prospecting, 35(4):343–358, 1987. ISSN 1365-2478. doi: 10.1111/j.1365-2478.1987.tb00822.x. URLhttp://dx.doi.org/10.1111/j.1365-2478.1987.tb00822.x. → pages 49E. Baysal, D. D. Kosloff, and J. W. Sherwood. Reverse time migration. Geophysics, 48(11):1514–1524, 1983. → pages 113C. J. Beasley. A new look at marine simultaneous sources. The Leading Edge, 27(7):914–917,2008. doi: 10.1190/1.2954033. URL http://tle.geoscienceworld.org/cgi/content/abstract/27/7/914. →pages 78C. J. Beasley, R. E. Chambers, and Z. Jiang. A new look at simultaneous sources. SEG TechnicalProgram Expanded Abstracts, 17(1):133–135, 1998. doi: 10.1190/1.1820149. URLhttp://link.aip.org/link/?SGA/17/133/1. → pages 105S. R. Becker, E. J. Candes, and M. C. Grant. Templates for convex cone problems withapplications to sparse signal recovery. Mathematical Programming Computation, 3(3):165–218,2011. ISSN 1867-2949. doi: 10.1007/s12532-011-0029-5. URLhttp://dx.doi.org/10.1007/s12532-011-0029-5. → pages 23, 28, 35, 145E. v. Berg and M. P. Friedlander. Probing the pareto frontier for basis pursuit solutions. SIAMJournal on Scientific Computing, 31(2):890–912, 2008. → pages 13, 14, 15, 16, 20, 24, 29, 51,80, 82, 88, 89E. v. Berg and M. P. Friedlander. Sparse optimization with least-squares constraints. SIAM J.Optimization, 21(4):1201–1229, 2011. → pages 13, 14, 15, 28, 35A. Berkhout. Changing the mindset in seismic data acquisition. The Leading Edge, 27(7):924–938, 2008a. → pages 105, 109A. Berkhout and D. Verschuur. Imaging of multiple reflections. Geophysics, 71, no. 4(4):SI209–SI220, 2006. doi: 10.1190/1.2215359. URL http://dx.doi.org/10.1190/1.2215359. → pages 61A. J. Berkhout. A unified approach to acoustical reflection imaging. I: The forward model. TheJournal of the Acoustical Society of America, 93(4):2005–2016, 1993. ISSN 00014966. doi:10.1121/1.406714. URL http://link.aip.org/link/JASMAN/v93/i4/p2005/s1&Agg=doi. → pages 117A. J. Berkhout. Changing the mindset in seismic data acquisition. The Leading Edge, 27(7):924–938, 2008b. doi: 10.1190/1.2954035. URLhttp://tle.geoscienceworld.org/cgi/content/abstract/27/7/924. → pages 78154F. J. Billette and S. Brandsberg-Dahl. The 2004 bp velocity benchmark. In 67th EAGEConference & Exhibition, 2004. → pages 90B. Biondi and W. W. Symes. Angle-domain common-image gathers for migration velocityanalysis by wavefield-continuation imaging. Geophysics, (5), 69(5):1283, 2004. 
ISSN 00168033.doi: 10.1190/1.1801945. URL http://link.aip.org/link/GPYSA7/v69/i5/p1283/s1&Agg=doi. → pages113, 114, 121, 124B. Biondi, P. Sava, et al. Wave-equation migration velocity analysis. 69th Ann. Internat. MtgSoc. of Expl. Geophys, pages 1723–1726, 1999. → pages 113A. Bourgeois, M. Bourget, P. Lailly, M. Poulet, P. Ricarte, and R. Versteeg. Marmousi, modeland data. In 1990 workshop on Practical Aspects of Seismic Data Inversion, EuropeanAssociation of Exploration Geophysicists, volume 5-16. EAGE, 1991. → pages 90, 127S. Brandsberg-Dahl, M. de Hoop, and B. Ursin. Focusing in dip and ava compensation onscattering angle/azimuth common image gathers. Geophysics, (1), 68(1):232–254, 2003. →pages 114, 122, 124S. Burer and R. D. Monteiro. Local minima and convergence in low-rank semidefiniteprogramming. Mathematical Programming, 103(3):427–444, 2005. → pages 60S. Burer and R. D. C. Monteiro. Local minima and convergence in low-rank semidefiniteprogramming. Mathematical Programming, 103:2005, 2003. → pages 15, 18, 41J. Caldwell and C. Walker. An overview of marine seismic operations. The internationalassociation of oil and gas producers, January 24 2017. URL http://www.ogp.org.uk/pubs/448.pdf.→ pages xiii, 1, 2, 3E. J. Cande`s and L. Demanet. The curvelet representation of wave propagators is optimallysparse. Communications on Pure and Applied Mathematics, 58(11):1472–1528, 2005. ISSN1097-0312. doi: 10.1002/cpa.20078. URL http://dx.doi.org/10.1002/cpa.20078. → pages 81E. J. Cande`s and B. Recht. Exact matrix completion via convex optimization. Foundations ofComputational mathematics, 9(6):717–772, 2009. → pages 54, 82E. J. Cande`s and T. Tao. Near-optimal signal recovery from random projections: Universalencoding strategies. Information Theory, IEEE Transactions on, 52(12):5406–5425, December2006. ISSN 0018-9448. doi: 10.1109/TIT.2006.885507. → pages 13E. J. Cande`s, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccuratemeasurements. CPAM, 59(8):1207–1223, 2006. → pages 144E. J. Cande`s, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of theACM, 58(3), May 2011. → pages 13E. J. Cande`s, T. Strohmer, and V. Voroninski. Phaselift: Exact and stable signal recovery frommagnitude measurements via convex programming. Communications on Pure and AppliedMathematics, 66(8):1241–1274, 2013. ISSN 1097-0312. doi: 10.1002/cpa.21432. URLhttp://dx.doi.org/10.1002/cpa.21432. → pages 21155A. Canning and G. Gardner. Reducing 3-d acquisition footprint for 3-d dmo and 3-d prestackmigration. Geophysics, 63(4):1177–1183, 1998. doi: 10.1190/1.1444417. → pages 49S. Chandrasekaran, P. Dewilde, M. Gu, W. Lyons, and T. Pals. A fast solver for hssrepresentations via sparse matrices. SIAM J. Matrix Analysis Applications, 29(1):67–81, 2006.doi: http://dx.doi.org/10.1137/050639028. → pages 80, 84, 149C.-H. Chang and C.-W. Ha. Sharp inequalities of singular values of smooth kernels. IntegralEquations and Operator Theory, 35(1):20–27, 1999. → pages 65H. Chauris, M. S. Noble, G. Lambare´, and P. Podvin. Migration velocity analysis from locallycoherent events in 2-d laterally heterogeneous media, part i: Theoretical aspects. Geophysics,67(4):1202–1212, 2002. → pages 113Y. Chen, J. Yuan, Z. Jin, K. Chen, and L. Zhang. Deblending using normal moveout and medianfiltering in common-midpoint gathers. Journal of Geophysics and Engineering, 11(4):045012,2014. → pages 81, 93, 146J. Cheng and M. D. Sacchi. 
Separation of simultaneous source data via iterative rank reduction.In SEG Technical Program Expanded Abstracts, pages 88–93, 2013. URLhttp://dx.doi.org/10.1190/segam2013-1313.1. → pages 79, 106J. Claerbout. Imaging the earth’s interior. Blackwell Scientific Publishers, 1985a. → pages 128J. F. Claerbout. Coarse grid calculations of waves in inhomogeneous media with application todelineation of complicated seismic structure. Geophysics, 35(3):407–418, 1970. doi:10.1190/1.1440103. URL http://geophysics.geoscienceworld.org/content/35/3/407.abstract. → pages113, 128J. F. Claerbout. Fundamentals of geophysical data processing. Pennwell Books, Tulsa, OK, 1985b.→ pages 113W. J. Curry. Interpolation with fourier radial adaptive thresholding. 79th Annual InternationalMeeting, SEG, Expanded Abstracts, pages 3259–3263, 2009. doi: 10.1190/1.3255536. URLhttp://library.seg.org/doi/abs/10.1190/1.3255536. → pages 49C. de Bruin, C. Wapenaar, and A. Berkhout. Angle-dependent reflectivity by means of prestackmigration. Geophysics, (9), 55(9):1223, Sept. 1990. ISSN 1070485X. doi: 10.1190/1.1442938.URL http://library.seg.org/getabs/servlet/GetabsServlet?prog=normal&id=GPYSA7000055000009001223000001&idtype=cvips&gifs=yes. → pages 113, 114, 121, 123R. de Kok and D. Gillespie. A universal simultaneous shooting technique. In 64th EAGEConference and Exhibition, 2002. → pages 78, 105L. Demanet. Curvelets, Wave Atoms, and Wave Equations. PhD thesis, California Institute ofTechnology, 2006a. → pages 35L. Demanet. Curvelets, Wave Atoms, and Wave Equations. PhD thesis, California Institute ofTechnology, 2006b. → pages 56156S. M. Doherty and J. F. Claerbout. Velocity analysis based on the wave equation. TechnicalReport 1, Stanford Exploration Project, 1974. URLhttp://sepwww.stanford.edu/oldreports/sep01/01 12.pdf. → pages 113D. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306,2006a. → pages 13, 49D. L. Donoho. Compressed sensing. IEEE Transactions on information theory, 52(4):1289–1306,2006b. → pages 6, 144P. Doulgeris, K. Bube, G. Hampson, and G. Blacquiere. Convergence analysis of acoherency-constrained inversion for the separation of blended data. Geophysical Prospecting, 60(4):769–781, 2012. doi: 10.1111/j.1365-2478.2012.01088.x. → pages 78A. A. Duchkov and V. Maarten. Velocity continuation in the downward continuation approach toseismic imaging. Geophysical Journal International, 176(3):909–924, 2009. → pages 113A. Duijndam, M. Schonewille, and C. Hindriks. Reconstruction of band-limited signals,irregularly sampled along one spatial direction. Geophysics, 64(2):524–538, 1999. doi:10.1190/1.1444559. → pages 49B. Engquist and L. Ying. Fast directional multilevel algorithms for oscillatory kernels. SIAMJournal on Scientific Computing, 29(4):1710–1737, 2007. → pages 62J.-M. Enjolras. Total pioneers cable-less 3d seismic surveys in uganda. Oil in Uganda, January 242017. URLhttp://www.oilinuganda.org/features/environment/uganda-pioneers-3d-seismic-surveys.html. → pagesxiii, 2M. Fazel. Matrix rank minimization with applications. PhD thesis, Stanford University, 2002. →pages 13M. Figueiredo, R. Nowak, and S. Wright. Gradient projection for sparse reconstruction:Application to compressed sensing and other inverse problems. IEEE Journal of SelectedTopics in Signal Processing, 1(4):586 –597, dec. 2007. → pages 15M. Friedlander, H. Mansour, R. Saab, and O. Yilmaz. Recovering compressively sampled signalsusing partial support information. 
IEEE Transactions on Information Theory, 58(1), January2011. → pages 14, 26M. P. Friedlander and M. Schmidt. Hybrid Deterministic-Stochastic Methods for Data Fitting.SIAM Journal on Scientific Computing, 34(3):A1380–A1405, May 2012. ISSN 1064-8275. doi:10.1137/110830629. URL http://epubs.siam.org/doi/abs/10.1137/110830629. → pages 126S. Funk. Netflix update: Try this at home, December 2006. URLhttp://sifter.org/∼simon/journal/20061211.html. → pages 28S. Gandy, B. Recht, and I. Yamada. Tensor completion and low-n-rank tensor recovery via convexoptimization. Inverse Problems, 27(2):025010, Jan. 2011. → pages 60157R. Giryes, M. Elad, and Y. C. Eldar. The projected gsure for automatic parameter tuning initerative shrinkage methods. Applied and Computational Harmonic Analysis, 30(3):407–422,2011. → pages 15, 20D. Gross. Recovering Low-Rank Matrices From Few Coefficients in Any Basis. IEEETransactions on Information Theory, 57:1548–1566, 2011. → pages 28E. Haber, M. Chung, and F. Herrmann. An effective method for parameter estimation with pdeconstraints with multiple right-hand sides. SIAM Journal on Optimization, 22(3):739–757,2012. doi: 10.1137/11081126X. URL http://epubs.siam.org/doi/abs/10.1137/11081126X. → pages125, 126, 127N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilisticalgorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288,2011a. → pages 79N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilisticalgorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288,2011b. → pages 49N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilisticalgorithms for constructing approximate matrix decompositions. SIAM review, 53(2):217–288,2011c. → pages 85G. Hampson, J. Stefani, and F. Herkenhoff. Acquisition using simultaneous sources. The LeadingEdge, 27(7):918–923, 2008. doi: 10.1190/1.2954034. URLhttp://tle.geoscienceworld.org/cgi/content/abstract/27/7/918. → pages 78S. Hegna and G. E. Parkes. Method for acquiring and processing marine seismic data to extractand constructively use the up-going and down-going wave-fields emitted by the source (s),november 2012. US Patent Application 13/686,408. → pages 5, 78G. Hennenfent and F. J. Herrmann. Seismic denoising with nonuniformly sampled curvelets.Computing in Science & Engineering, 8(3):16–25, 05 2006a. doi: 10.1109/MCSE.2006.49. URLhttps://www.slim.eos.ubc.ca/Publications/Public/Journals/CiSE/2006/hennenfent06CiSEsdn/hennenfent06CiSEsdn.pdf. → pages 81G. Hennenfent and F. J. Herrmann. Application of stable signal recovery to seismic datainterpolation. In SEG Technical Program Expanded Abstracts, volume 25, pages 2797–2801.SEG, 2006b. → pages 49G. Hennenfent, E. van den Berg, M. P. Friedlander, and F. J. Herrmann. New insights intoone-norm solvers from the Pareto curve. Geophysics, 73(4):A23–A26, 07 2008. doi:10.1190/1.2944169. URL https://www.slim.eos.ubc.ca/Publications/Public/Journals/Geophysics/2008/hennenfent08GEOnii/hennenfent08GEOnii.pdf. → pages 82G. Hennenfent, L. Fenelon, and F. J. Herrmann. Nonequispaced curvelet transform for seismicdata reconstruction: A sparsity-promoting approach. Geophysics, 75(6):WB203–WB210, 122010. URL https://www.slim.eos.ubc.ca/Publications/Public/Journals/Geophysics/2010/hennenfent2010GEOPnct/hennenfent2010GEOPnct.pdf. → pages 144158F. J. Herrmann and G. Hennenfent. 
Non-parametric seismic data recovery with curvelet frames.Geophysical Journal International, 173(1):233–248, 2008a. ISSN 1365-246X. → pages 13, 30,32, 49, 62F. J. Herrmann and G. Hennenfent. Non-parametric seismic data recovery with curvelet frames.Geophysical Journal International, 173:233–248, April 2008b. URLhttps://www.slim.eos.ubc.ca/Publications/Public/Journals/GeophysicalJournalInternational/2008/herrmann08nps/herrmann08nps.pdf. → pages 145F. J. Herrmann, M. P. Friedlander, and O. Yilmaz. Fighting the curse of dimensionality:Compressive sensing in exploration seismology. Signal Processing Magazine, IEEE, 29(3):88–100, 2012a. ISSN 1053-5888. doi: 10.1109/MSP.2012.2185859. URLhttps://www.slim.eos.ubc.ca/Publications/Public/Journals/IEEESignalProcessingMagazine/2012/Herrmann11TRfcd/Herrmann11TRfcd.pdf. → pages 13F. J. Herrmann, M. P. Friedlander, and O. Yilmaz. Fighting the curse of dimensionality:compressive sensing in exploration seismology. IEEE Signal Processing Magazine, 29:88–100,May 2012b. doi: 10.1109/MSP.2012.2185859. → pages 20D. Hill, C. Combee, and J. Bacon. Over/under acquisition and data processing: the nextquantum leap in seismic technology? First Break, 24(6):81–95, 2006. → pages 5, 78P. Huber. Robust Statistics. Wiley, 1981. → pages 14, 23S. Huo, Y. Luo, and P. Kelamis. Simultaneous sources separation via multi-directionalvector-median filter. SEG Technical Program Expanded Abstracts, 28(1):31–35, 2009. doi:10.1190/1.3255522. URL http://link.aip.org/link/?SGA/28/31/1. → pages 78P. Jain, R. Meka, and I. Dhillon. Guaranteed rank minimization via singular value projection. InIn NIPS 2010, 2010. → pages 23P. Jain, P. Netrapalli, and S. Sanghavi. Low-rank matrix completion using alternatingminimization. In Proceedings of the Forty-fifth Annual ACM Symposium on Theory ofComputing, STOC ’13, pages 665–674, New York, NY, USA, 2013. ACM. ISBN978-1-4503-2029-0. doi: 10.1145/2488608.2488693. URLhttp://doi.acm.org/10.1145/2488608.2488693. → pages x, 21, 22B. Jumah and F. J. Herrmann. Dimensionality-reduced estimation of primaries by sparseinversion. Geophysical Prospecting, 62(5):972–993, 09 2014. doi: 10.1111/1365-2478.12113.URL http://onlinelibrary.wiley.com/doi/10.1111/1365-2478.12113/abstract. → pages 84M. N. Kabir and D. Verschuur. Restoration of missing offsets by parabolic radon transform1.Geophysical Prospecting, 43(3):347–368, 1995. ISSN 1365-2478. doi:10.1111/j.1365-2478.1995.tb00257.x. URL http://dx.doi.org/10.1111/j.1365-2478.1995.tb00257.x.→ pages 49B. Kanagal and V. Sindhwani. Rank selection in low-rank matrix approximations: A study ofcross-validation for nmfs. Advances in Neural Information Processing Systems, 1:10, 2010. →pages 51159O. Koefoed. On the effect of poisson’s ratios of rock strata on the reflection coefficients of planewaves. Geophysical Prospecting, 3(4):381–387, 1955. ISSN 1365-2478. doi:10.1111/j.1365-2478.1955.tb01383.x. URL http://dx.doi.org/10.1111/j.1365-2478.1955.tb01383.x.→ pages 122Z. Koren and I. Ravve. Full-azimuth subsurface angle domain wavefield decomposition andimaging part i: Directional and reflection image gathers. Geophysics, 76(1):S1–S13, 2011. →pages 113J. Krebs, J. Anderson, D. Hinkley, R. Neelamani, S. Lee, A. Baumstein, and M. Lacasse. Fastfull-wavefield seismic inversion using encoded sources. Geophysics, (6), 74(6):P177–P188, 2009.doi: 10.1190/1.3230502. URL http://dx.doi.org/10.1190/1.3230502. → pages 125N. Kreimer. Multidimensional seismic data reconstruction using tensor analysis. 2013. → pages144, 145, 146N. 
Kreimer and M. Sacchi. A tensor higher-order singular value decomposition for prestackseismic data noise reduction and interpolation. Geophysics, 77, no. 3(3):V113–V122, 2012a. doi:10.1190/geo2011-0399.1. → pages 49, 50, 82N. Kreimer and M. D. Sacchi. A tensor higher-order singular value decomposition for prestackseismic data noise reduction and interpolation. GEOPHYSICS, 77(3):V113–V122, 2012b. doi:10.1190/geo2011-0399.1. → pages 107N. Kreimer and M. D. Sacchi. Tensor completion via nuclear norm minimization for 5d seismicdata reconstruction. In 83rd Annual International Meeting, SEG, Expanded Abstracts, pages1–5, 2012c. doi: 10.1190/segam2012-0529.1. URLhttp://library.seg.org/doi/abs/10.1190/segam2012-0529.1. → pages 49N. Kreimer, A. Stanton, and M. D. Sacchi. Tensor completion based on nuclear normminimization for 5d seismic data reconstruction. Geophysics, 78, no. 6(6):V273–V284, 2013. →pages 51, 60, 69H. Kuhel and M. Sacchi. Least-squares wave-equation migration for avp/ava inversion.Geophysics, (1), 68(1):262–273, 2003. doi: 10.1190/1.1543212. URLhttp://library.seg.org/doi/abs/10.1190/1.1543212. → pages 121R. Kumar, H. Mansour, A. Y. Aravkin, and F. J. Herrmann. Reconstruction of seismic wavefieldsvia low-rank matrix factorization in the hierarchical-separable matrix representation. 84thAnnual International Meeting, SEG, Expanded Abstracts, pages 3628–3633, 9 2013a. doi:10.1190/segam2013-1165.1. URL https://www.slim.eos.ubc.ca/Publications/Public/Conferences/SEG/2013/kumar2013SEGHSS/kumar2013SEGHSS.pdf. → pages 62, 82, 84, 85, 149R. Kumar, T. van Leeuwen, and F. J. Herrmann. Efficient WEMVA using extended images. InSEG Workshop on Advances in Model Building, Imaging, and FWI; Houston, 9 2013b. →pages 124R. Kumar, A. Y. Aravkin, E. Esser, H. Mansour, and F. J. Herrmann. SVD-free low-rank matrixfactorization : wavefield reconstruction via jittered subsampling and reciprocity. 76stConference and Exhibition, EAGE, Extended Abstracts, 06 2014. doi:16010.3997/2214-4609.20141394. URL https://www.slim.eos.ubc.ca/Publications/Public/Conferences/EAGE/2014/kumar2014EAGErank/kumar2014EAGErank.pdf. → pages 63R. Kumar, C. D. Silva, O. Akalin, A. Y. Aravkin, H. Mansour, B. Recht, and F. J. Herrmann.Efficient matrix completion for seismic data reconstruction. Geophysics, 80(05):V97–V114, 092015a. doi: 10.1190/geo2014-0369.1. URL https://www.slim.eos.ubc.ca/Publications/Public/Journals/Geophysics/2015/kumar2014GEOPemc/kumar2014GEOPemc.pdf. (Geophysics). → pages79, 80, 83, 106R. Kumar, H. Wason, and F. J. Herrmann. Source separation for simultaneous towed-streamermarine acquisition –- a compressed sensing approach. Geophysics, 80(06):WD73–WD88, 112015b. doi: 10.1190/geo2015-0108.1. URL https://www.slim.eos.ubc.ca/Publications/Public/Journals/Geophysics/2015/kumar2015sss/kumar2015sss revised.pdf. (Geophysics). → pages 106R. Lago, A. Petrenko, Z. Fang, and F. J. Herrmann. Fast solution of time-harmonicwave-equation for full-waveform inversion. In EAGE Annual Conference Proceedings, 06 2014.doi: 10.3997/2214-4609.20140812. URL https://www.slim.eos.ubc.ca/Publications/Public/Conferences/EAGE/2014/lago2014EAGEfst/lago2014EAGEfst.pdf. → pages 121K. L. Lange, R. J. A. Little, and J. M. G. Taylor. Robust statistical modeling using the tdistribution. Journal of the American Statistical Association, 84(408):881–896, 1989. → pages14, 24R. M. Lansley, M. Berraki, and M. M. M. Gros. Seismic array with spaced sources havingvariable pressure, september 2007. US Patent Application 12/998,723. → pages 5, 78A. Lee, B. 
Recht, R. Salakhutdinov, Nathan Srebro, and J. A. Tropp. Practical Large-ScaleOptimization for Max-Norm Regularization. In Advances in Neural Information ProcessingSystems, 2010a. → pages 17, 51, 80, 89, 106, 109J. Lee, B. Recht, R. Salakhutdinov, N. Srebro, and J. Tropp. Practical large-scale optimizationfor max-norm regularization. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, andA. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1297–1305.2010b. → pages 14, 17, 20, 144S. A. Levin. Principle of reverse-time migration. Geophysics, 49(5):581–583, 1984. → pages 113C. Li, C. C. Mosher, L. C. Morley, Y. Ji, and J. D. Brewer. Joint source deblending andreconstruction for seismic data. SEG Technical Program Expanded Abstracts 2013, pages 82–87,2013. → pages 5E. Liberty, F. Woolfe, P.-G. Martinsson, V. Rokhlin, and M. Tygert. Randomized algorithms forthe low-rank approximation of matrices. National Academy of Sciences, 104(51):20167–20172,2007. doi: 10.1073/pnas.0709640104. → pages 49, 79Z. Lin and S. Wei. A block lanczos with warm start technique for accelerating nuclear normminimization algorithms. CoRR, abs/1012.0365, 2010. → pages 23A. Long. A new seismic method to significantly improve deeper data character andinterpretability. 2009. → pages 5, 78161A. S. Long, E. von Abendorff, M. Purves, J. Norris, and A. Moritz. Simultaneous long offset(SLO) towed streamer seismic acquisition. In 75th EAGE Conference & Exhibition, 2013. →pages 5, 78S. Lu, N. Whitmore, A. Valenciano, and N. Chemingui. Illumination from 3d imaging ofmultiples: An analysis in the angle domain. 84th SEG, Denver, USA, Expended Abstracts,2014. → pages 114S. MacKay and R. Abma. Imaging and velocity analysis with depth-focusing analysis.Geophysics, (12), 57(12):1608–1622, 1992. URL http://dx.doi.org/10.1190/1.1443228. → pages 114A. Mahdad, P. Doulgeris, and G. Blacquiere. Separation of blended data by iterative estimationand subtraction of blending interference noise. Geophysics, 76(3):Q9–Q17, 2011. doi:10.1190/1.3556597. URL http://link.aip.org/link/?GPY/76/Q9/1. → pages 78F. Mahmoudian and G. F. Margrave. A review of angle domain common image gathers. TechnicalReport, University of Calgary, 2009. URLhttp://www.crewes.org/ForOurSponsors/ResearchReports/2009/CRR200953.pdf. → pages 121M. W. Mahoney. Randomized algorithms for matrices and data. Foundations and Trends inMachine Learning, 3(2):123–224, Feb. 2011. ISSN 1935-8237. doi: 10.1561/2200000035. URLhttp://dx.doi.org/10.1561/2200000035. → pages 49J. Mairal, M. Elad, and G. Sapiro. Sparse representation for color image restoration. IEEETransactions on Image Processing, 17(1):53–69, Jan. 2008. → pages 13H. Mansour, R. Saab, P. Nasiopoulos, and R. Ward. Color image desaturation using sparsereconstruction. In Proc. of the IEEE International Conference on Acoustics, Speech, and SignalProcessing (ICASSP), pages 778–781, March 2010. → pages 13H. Mansour, F. J. Herrmann, and O. Yilmaz. Improved wavefield reconstruction from randomizedsampling via weighted one-norm minimization. submitted to Geophysics, 2012a. → pages 37H. Mansour, H. Wason, T. T. Lin, and F. J. Herrmann. Randomized marine acquisition withcompressive sampling matrices. Geophysical Prospecting, 60(4):648–662, July 2012b. URLhttp://onlinelibrary.wiley.com/doi/10.1111/j.1365-2478.2012.01075.x/abstract. → pages 13H. Mansour, H. Wason, T. T. Lin, and F. J. Herrmann. Randomized marine acquisition withcompressive sampling matrices. Geophysical Prospecting, 60(4):648–662, 2012c. 
URLhttp://onlinelibrary.wiley.com/doi/10.1111/j.1365-2478.2012.01075.x/abstract. → pages 78, 80H. Mansour, F. J. Herrmann, and O. Yilmaz. Improved wavefield reconstruction from randomizedsampling via weighted one-norm minimization. Geophysics, 78, no. 5(5):V193–V206, 08 2013.doi: 10.1190/geo2012-0383.1. URL https://www.slim.eos.ubc.ca/Publications/Public/Journals/Geophysics/2013/mansour2013GEOPiwr/mansour2013GEOPiwr.pdf. → pages 62, 96M. Maraschini, R. Dyer, K. Stevens, and D. Bird. Source separation by iterative rank reduction -theory and applications. In 74th EAGE Conference and Exhibition, 2012. → pages 79, 106R. A. Maronna, D. Martin, and Yohai. Robust Statistics. Wiley Series in Probability andStatistics. Wiley, 2006. → pages 23, 24162B. Mishra, G. Meyer, F. Bach, and R. Sepulchre. Low-rank optimization with trace norm penalty.SIAM Journal on Optimization, 23(4):2124–2149, 2013. → pages 50N. Moldoveanu and S. Fealy. Multi-vessel coil shooting acquisition. Patent Application, US20100142317 A1, 2010. → pages 78N. Moldoveanu and J. Quigley. Random sampling for seismic acquisition. In 73rd EAGEConference & Exhibition, 2011. → pages 78N. Moldoveanu, L. Combee, M. Egan, G. Hampson, L. Sydora, and W. Abriel. Over/undertowed-streamer acquisition: A method to extend seismic bandwidth to both higher and lowerfrequencies. The Leading Edge, 26:41–58, 01 2007. URLhttp://www.slb.com/∼/media/Files/westerngeco/resources/articles/2007/jan07 tle overunder.pdf. →pages 5, 78I. Moore. Simultaneous sources - processing and applications. In 72nd EAGE Conference andExhibition, 2010. → pages 78I. Moore, B. Dragoset, T. Ommundsen, D. Wilson, C. Ward, and D. Eke. Simultaneous sourceseparation using dithered sources. SEG Technical Program Expanded Abstracts, 27(1):2806–2810, 2008. doi: 10.1190/1.3063928. URL http://link.aip.org/link/?SGA/27/2806/1. → pages78C. Mosher, C. Li, L. Morley, Y. Ji, F. Janiszewski, R. Olson, and J. Brewer. Increasing theefficiency of seismic data acquisition via compressive sensing. The Leading Edge, 33(4):386–391,2014. → pages 78W. Mulder. Subsurface offset behaviour in velocity analysis with extended reflectivity images.Geophysical Prospecting, 62(1):17–33, 2014. ISSN 1365-2478. doi: 10.1111/1365-2478.12073.URL http://dx.doi.org/10.1111/1365-2478.12073. → pages 124R. N. Neelamani, C. E. Krohn, J. R. Krebs, J. K. Romberg, M. Deffenbaugh, and J. E. Anderson.Efficient seismic forward modeling using simultaneous random sources and sparsity. Geophysics,75(6):WB15–WB27, 2010. doi: 10.1190/1.3509470. URLhttp://geophysics.geoscienceworld.org/content/75/6/WB15.abstract. → pages 13J. Nocedal and S. J. Wright. Numerical Optimization. Springer, Aug. 2000. ISBN 0387987932.URL http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0387987932. →pages 126V. Oropeza and M. Sacchi. Simultaneous seismic data denoising and reconstruction viamultichannel singular spectrum analysis. Geophysics, 76(3):V25–V32, 2011. → pages 13, 28, 49,79, 82A. B. Owen and P. O. Perry. Bi-cross-validation of the svd and the nonnegative matrixfactorization. The Annals of Applied Statistics, 3(2):564–594, 06 2009. doi:10.1214/08-AOAS227. → pages 51S. Oymak, A. Jalali, M. Fazel, Y. C. Eldar, and B. Hassibi. Simultaneously structured modelswith application to sparse and low-rank matrices. IEEE Transactions on Information Theory,pages 1–1, 2012. → pages 61163M. Prucha, B. Biondi, W. Symes, et al. Angle-domain common image gathers by wave-equationmigration. 69th Ann. Internat. Mtg: Soc. of Expl. 
Geophys, pages 824–827, 1999. → pages 113B. Recht. A simpler approach to matrix completion. The Journal of Machine Learning Research,7:3413–3430, 2011. → pages 54B. Recht and C. Re´. Parallel stochastic gradient algorithms for large-scale matrix completion. InOptimization Online, 2011. → pages 14, 17, 20, 28, 89, 109, 144B. Recht and C. Re´. Parallel stochastic gradient algorithms for large-scale matrix completion.Mathematical Programming Computation, 5(2):201–226, 2013. ISSN 1867-2949. doi:10.1007/s12532-013-0053-8. URL http://dx.doi.org/10.1007/s12532-013-0053-8. → pages 51, 60B. Recht, M. Fazel, and P. Parrilo. Guaranteed minimum-rank solutions of linear matrixequations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010a. doi:10.1137/070697835. → pages 49, 82B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum rank solutions to linear matrixequations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010b. → pages 13, 18,88, 108J. D. M. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborativeprediction. In Proceedings of the 22nd international conference on Machine learning, ICML ’05,pages 713–719, New York, NY, USA, 2005a. ACM. ISBN 1-59593-180-5. → pages 89, 109J. D. M. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborativeprediction. In ICML ’05 Proceedings of the 22nd international conference on Machine learning,pages 713 – 719, 2005b. → pages 14, 17J. Rickett and P. Sava. Offset and angle domain common imagepoint gathers for shot profilemigration. Geophysics, (3), 67(3):883–889, 2002. doi: 10.1190/1.1484531. URLhttp://library.seg.org/doi/abs/10.1190/1.1484531. → pages 121RigZone. How does marine seismic work. RigZone, January 24 2017. URLhttp://www.rigzone.com/training/insight.asp-insight id=303. → pages xiii, 2R. Rockafellar and R.-B. Wets. Variational Analysis, volume 317. Springer, 1998. → pages 16M. Sacchi, T. Ulrych, and C. Walker. Interpolation and extrapolation using a high-resolutiondiscrete fourier transform. Signal Processing, IEEE Transactions on, 46(1):31 –38, jan 1998. →pages 28, 32, 49M. Sacchi, S. Kaplan, and M. Naghizadeh. F-x gabor seismic data reconstruction. 71stConference and Exhibition, EAGE, Extended Abstracts, 2009. → pages 49M. D. Sacchi and B. Liu. Minimum weighted norm wavefield reconstruction for ava imaging.Geophysical Prospecting, 53(6):787–801, 2005. ISSN 1365-2478. doi:10.1111/j.1365-2478.2005.00503.x. URL http://dx.doi.org/10.1111/j.1365-2478.2005.00503.x. →pages 49164P. Sava and S. Fomel. Angle-domain common-image gathers by wavefield continuation methods.Geophysics, (3), 68(3):1065–1074, 2003. doi: 10.1190/1.1581078. URLhttp://library.seg.org/doi/abs/10.1190/1.1581078. → pages 121P. Sava and I. Vasconcelos. Extended imaging conditions for wave-equation migration.Geophysical Prospecting, 59(1):35–55, Jan. 2011. ISSN 00168025. doi:10.1111/j.1365-2478.2010.00888.x. URL http://doi.wiley.com/10.1111/j.1365-2478.2010.00888.x. →pages 113, 114, 116, 121, 124, 127, 143P. C. Sava and B. Biondi. Wave-equation migration velocity analysis. I. Theory. GeophysicalProspecting, 52:593–606, 2004. URL http://dx.doi.org/10.1111/j.1365-2478.2004.00447.x. → pages114P. C. Sava and S. Fomel. Time-shift imaging condition in seismic migration. Geophysics, (6), 71(6):S209—-S217, 2006. URL http://dx.doi.org/10.1190/1.2338824. → pages 114H. Schaeffer and S. Osher. A low patch-rank interpretation of texture. SIAM Journal on ImagingSciences, 6(1):226–262, 2013. → pages 50P. Shen and W. W. Symes. 
Automatic velocity analysis via shot profile migration. Geophysics, 73(5):VE49–VE59, 2008. URL http://dx.doi.org/10.1190/1.2972021. → pages 114, 124R. Shuey. A simplification of the zoeppritz equations. Geophysics, (4), 50(4):609–614, 1985. doi:10.1190/1.1441936. URL http://dx.doi.org/10.1190/1.1441936. → pages 123M. Signoretto, R. Van de Plas, B. De Moor, and J. A. Suykens. Tensor versus matrix completion:a comparison with application to spectral data. Signal Processing Letters, IEEE Transactionson Information Theory, 18(7):403–406, 2011. → pages 61C. D. Silva and F. J. Herrmann. Hierarchical Tucker tensor optimization - applications to 4Dseismic data interpolation. In EAGE Annual Conference Proceedings, 06 2013a. doi:10.3997/2214-4609.20130390. URL https://www.slim.eos.ubc.ca/Publications/Public/Conferences/EAGE/2013/dasilva2013EAGEhtucktensor/dasilva2013EAGEhtucktensor.pdf. → pages 50, 108C. D. Silva and F. J. Herrmann. Structured tensor missing-trace interpolation in the HierarchicalTucker format. In SEG Technical Program Expanded Abstracts, volume 32, pages 3623–3627, 92013b. doi: 10.1190/segam2013-0709.1. URL https://www.slim.eos.ubc.ca/Publications/Public/Conferences/SEG/2013/dasilva2013SEGhtuck/dasilva2013SEGhtuck.pdf. → pages 82C. D. Silva and F. J. Herrmann. Hierarchical Tucker tensor optimization - applications to 4dseismic data interpolation. In EAGE, 06 2013c. URL https://www.slim.eos.ubc.ca/Publications/Public/Conferences/EAGE/2013/dasilva2013EAGEhtucktensor/dasilva2013EAGEhtucktensor.pdf. →pages 35S. Sinha, P. S. Routh, P. D. Anno, and J. P. Castagna. Spectral decomposition of seismic datawith continuous-wavelet transform. Geophysics, 70, no. 6(6):P19–P25, 2005. → pages 50H. F. Smith. A Hardy space for Fourier integral operators. Journal of Geometric Analysis, 8(4):629–653, 1998. → pages 81165N. Srebro. Learning with matrix factorizations, PhD Thesis. PhD thesis, Massachusetts Instituteof Technology, 2004. URL http://hdl.handle.net/1721.1/28743. → pages 58J.-L. Starck, M. Elad, and D. Donoho. Image decomposition via the combination of sparserepresentation and a variational approach. IEEE Transaction on Image Processing, 14(10),2005. → pages 13J. Stefani, G. Hampson, and F. Herkenhoff. Acquisition using simultaneous sources. In 69thEAGE Conference and Exhibition, 2007. → pages 78C. C. Stolk and W. W. Symes. Smooth objective functionals for seismic velocity inversion.Inverse Problems, 19:73–89, 2003. → pages 124C. C. Stolk, M. V. de Hoop, and W. W. Symes. Kinematics of shot-geophone migration.Geophysics, (6), 74(6):WCA19—-WCA34, 2009. ISSN 00168033. doi: 10.1190/1.3256285. URLhttp://library.seg.org/getabs/servlet/GetabsServlet?prog=normal&id=GPYSA70000740000060WCA19000001&idtype=cvips&gifs=yeshttp://dx.doi.org/10.1190/1.3256285.→ pages 113M. Stoll. A krylov-schur approach to the truncated svd. Linear Algebra and its Applications, 436(8):2795 – 2806, 2012. ISSN 0024-3795. doi: http://dx.doi.org/10.1016/j.laa.2011.07.022. URLhttp://www.sciencedirect.com/science/article/pii/S0024379511005349. → pages 57R. Sun and Z.-Q. Luo. Guaranteed matrix completion via non-convex factorization.http://arxiv.org/abs/1411.8003, accessed 28 November 2014, 2014. → pages 60W. Symes. Approximate linearized inversion by optimal scaling of prestack depth migration.Geophysics, (2), 73(2):R23–R35, 2008a. doi: 10.1190/1.2836323. URLhttp://dx.doi.org/10.1190/1.2836323. → pages 126W. Symes. Seismic inverse poblems : recent developments in theory and practice. In A. Louis,S. Arridge, and B. 
Rundell, editors, Proceedings of the Inverse Problems from Theory toApplications Conference, pages 2–6. IOP Publishing, 2014. → pages 124W. W. Symes. Migration velocity analysis and waveform inversion. Geophysical Prospecting, 56(6):765–790, 2008b. → pages 124W. W. Symes and J. J. Carazzone. Velocity inversion by differential semblance optimization.Geophysics, (5), 56(5):654–663, 1991. → pages 113, 114, 124W. W. Symes, D. Sun, and M. Enriquez. From modelling to inversion: designing a well-adaptedsimulator. Geophysical Prospecting, 59(5):814–833, 2011a. ISSN 1365-2478. doi:10.1111/j.1365-2478.2011.00977.x. URL http://dx.doi.org/10.1111/j.1365-2478.2011.00977.x. →pages 61, 90W. W. Symes, D. Sun, and M. Enriquez. From modelling to inversion: designing a well-adaptedsimulator. Geophysical Prospecting, 59(5):814–833, 2011b. ISSN 1365-2478. doi:10.1111/j.1365-2478.2011.00977.x. URL http://dx.doi.org/10.1111/j.1365-2478.2011.00977.x. →pages 123166A. ten Kroode, D.-J. Smit, and A. Verdel. Linearized inversed scattering in the presence ofcaustics. In SPIE’s 1994 International Symposium on Optics, Imaging, and Instrumentation,pages 28–42. International Society for Optics and Photonics, 1994. → pages 113P. Torben Hoy. A step change in seismic imaging–using a unique ghost free source and receiversystem. CSEG, Geoconvetion, 2013. → pages 5, 78D. Trad. Five-dimensional interpolation: Recovering from acquisition constraints. Geophysics, 74,no. 6(6):V123–V132, 2009. → pages 49S. Trickett and L. Burroughs. Prestack rank-reducing noise suppression: Theory. Society ofExploration Geophysicists, 2009. → pages 82S. Trickett, L. Burroughs, A. Milton, L. Walton, and R. Dack. Rank reduction based traceinterpolation. 80th Annual International Meeting, SEG, Expanded Abstracts, pages 3829–3833,2010. doi: 10.1190/1.3513645. → pages 49S. Trickett, L. Burroughs, and A. Milton. Interpolation using hankel tensor completion. 83rdAnnual International Meeting, SEG, Expanded Abstracts, pages 3634–3638, 2013. doi:10.1190/segam2013-0416.1. URL http://library.seg.org/doi/abs/10.1190/segam2013-0416.1. →pages 50T. van Leeuwen and F. J. Herrmann. 3D frequency-domain seismic inversion with controlledsloppiness. SIAM Journal on Scientific Computing, 36(5):S192–S217, 10 2014. doi:10.1137/130918629. URL http://epubs.siam.org/doi/abs/10.1137/130918629. (SISC). → pages 121,126T. van Leeuwen, A. Y. Aravkin, and F. J. Herrmann. Seismic waveform inversion by stochasticoptimization. International Journal of Geophysics, 2011, 12 2011. doi: 10.1155/2011/689041.URL https://www.slim.eos.ubc.ca/Publications/Public/Journals/InternationJournalOfGeophysics/2011/vanLeeuwen10IJGswi/vanLeeuwen10IJGswi.pdf. → pages 119, 125, 126A. van Wijngaarden. Imaging and Characterization of Angle-dependent Seismic Reflection Data.PhD thesis, Delft University of Technology, 1998. URLhttp://books.google.ca/books?id=x5NaAAAACAAJ. → pages 114, 121B. Vandereycken. Low-rank matrix completion by Riemannian optimization. SIAM Journal onOptimization, 23(2):1214—1236, 2013. → pages 29J. Virieux and S. Operto. An overview of full-waveform inversion in exploration geophysics.Geophysics, 74(6):WCC1–WCC26, 2009. doi: 10.1190/1.3238367. URLhttp://library.seg.org/doi/abs/10.1190/1.3238367. → pages 152J. Wang, M. Ng, and M. Perz. Seismic data interpolation by greedy local radon transform.Geophysics, 75, no. 6(6):WB225–WB234, 2010. doi: 10.1190/1.3484195. → pages 49L. Wang, J. Gao, W. Zhao, and X. Jiang. Nonstationary seismic deconvolution by adaptivemolecular decomposition. 
In Geoscience and Remote Sensing Symposium (IGARSS), pages2189–2192. IEEE Transactions on Information Theory, 2011. → pages 50167H. Wason and F. J. Herrmann. Ocean bottom seismic acquisition via jittered sampling. In EAGE,06 2013a. doi: 10.3997/2214-4609.20130379. URL https://www.slim.eos.ubc.ca/Publications/Public/Conferences/EAGE/2013/wason2013EAGEobs/wason2013EAGEobs.pdf. → pages 79, 96H. Wason and F. J. Herrmann. Time-jittered ocean bottom seismic acquisition. 32:1–6, 9 2013b.doi: 10.1190/segam2013-1391.1. URL https://www.slim.eos.ubc.ca/Publications/Public/Conferences/SEG/2013/wason2013SEGtjo/wason2013SEGtjo.pdf. → pages 5, 78, 79, 80, 97, 105, 106, 146Z. Wen, W. Yin, and Y. Zhang. Solving a low-rank factorization model for matrix completion bya nonlinear successive over-relaxation algorithm. Mathematical Programming Computation, 4(4):333–361, 2012. → pages 70, 145N. Whitmore et al. Iterative depth migration by backward time propagation. In 1983 SEGAnnual Meeting. Society of Exploration Geophysicists, 1983. → pages 113T. Yang and P. Sava. Image-domain wavefield tomography with extended common-image-pointgathers. Geophysical Prospecting, 63(5):1086–1096, 2015. ISSN 1365-2478. doi:10.1111/1365-2478.12204. URL http://dx.doi.org/10.1111/1365-2478.12204. → pages 114, 124, 126Y. Yang, J. Ma, and S. Osher. Seismic data reconstruction via matrix completion. Inverseproblem and imaging, 7(4):1379–1392, 2013. → pages 49, 50168
