Machine Learning Applications to Geophysical Data Analysis

by

Benjamin Bryan Bougher
B.Sc. in Physics, Dalhousie University, 2009

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in the Faculty of Graduate and Postdoctoral Studies (Geophysics)

The University of British Columbia (Vancouver)

August 2016

© Benjamin Bryan Bougher, 2016

Abstract

The sedimentary layers of the Earth are a complex amorphous material formed from chaotic, turbulent, and random natural processes. Exploration geophysicists use a combination of assumptions, approximate physical models, and trained pattern recognition to extract useful information from complex remote sensing data such as seismic and well logs. In this thesis I investigate supervised and unsupervised machine learning models in geophysical data analysis and present two novel applications to exploration geophysics.

Firstly, interpreted well logs from the Trenton-Black River study are used to train a classifier that results in a success rate of 67% at predicting stratigraphic units from gamma ray logs. I use the scattering transform, a multiscale analysis transform, to extract discriminating features to feed a K-nearest neighbour classifier.

A second experiment frames a conventional pre-stack seismic data characterization workflow as an unsupervised machine learning problem that is free from physical assumptions. Conventionally, the Shuey model is used to fit the angle-dependent reflectivity response of seismic data. I instead use principal-component-based approaches to learn projections from the data that improve classification. Results on the Marmousi II elastic model and an industry field dataset show that unsupervised learning models can be effective at segmenting hydrocarbon reservoirs from seismic data.

Preface

This thesis consists of original research carried out at The University of British Columbia under the supervision of Felix J.
Herrmann and the Seismic Laboratory for Imaging and Modelling.

The lab work for this thesis was entirely computational and owes much gratitude to the following open source software projects:

• The MATLAB package ScatNet, used for scattering transform calculations in Chapter 2 (https://github.com/scatnet).
• Jan Thorbecke's finite difference engine, which was used for seismic modelling in Chapter 3 (https://github.com/JanThorbecke/OpenSource).
• The Madagascar framework for reproducible science, used for migration and seismic processing in Chapter 3 (https://github.com/ahay).
• The scikit-learn machine learning library, used for classification in Chapter 2, and PCA and clustering computations in Chapter 3 (https://github.com/scikit-learn/scikit-learn).
• The linear operator toolbox SPOT, which was used for matrix inversions.
• The scientific Python stack SciPy, which was used for general computations and data visualization throughout the thesis (https://www.scipy.org/stackspec.html).

The robust PCA optimization algorithm in Section 3.3.4 is my own bespoke development.

Versions of Chapter 2 were published as a journal article and an expanded abstract at the 2016 Geoconvention, where it received honourable mention for best student presentation.

• Bougher, B. B., & Herrmann, F. J. (2016). Using the scattering transform to predict stratigraphic units from well logs. CSEG Recorder, 41(1), 22–25.
• Bougher, B. B., & Herrmann, F. J. (2015). Prediction of stratigraphic units from spectral co-occurrence coefficients of well logs. In CSEG Annual Conference Proceedings.

A version of Chapter 3 was published as an extended abstract and will be presented at the 2016 SEG annual meeting.

• Bougher, B. B., & Herrmann, F. J. (2016). AVA classification as an unsupervised machine-learning problem.

Table of Contents

Abstract
Preface
Table of Contents
List of Figures
List of Algorithms
Acknowledgments
Dedication
1 Introduction
  1.1 Motivation
  1.2 Background
  1.3 Overview
2 Predicting stratigraphic units from well logs using supervised learning
  2.1 Multifractal analysis
  2.2 Scattering transform
  2.3 Scattering transform as a convolutional neural network
  2.4 Application to the Trenton-Black River well logs
    2.4.1 Methodology
    2.4.2 Results
    2.4.3 Discussion
3 Reflection seismology analysis as an unsupervised learning problem
  3.1 Angle domain common image gathers by shot profile migration
  3.2 Overview of amplitude vs angle analysis
  3.3 Motivation for machine learning
    3.3.1 Problem formulation
  3.4 Example on Marmousi II synthetic dataset
    3.4.1 Seismic modelling
    3.4.2 Principal component analysis
    3.4.3 Clustering
    3.4.4 Kernel PCA
    3.4.5 Robust PCA
    3.4.6 Discussion
  3.5 Conclusions
4 Comparison to dynamic intercept gradient inversion
  4.1 Dynamic intercept gradient inversion
  4.2 PCA extended DIGI
  4.3 Results
  4.4 Discussion
5 Closing remarks
Bibliography

List of Figures

Figure 1.1 Netflix matrix completion. Putting the users along the rows and movies along the columns, the ratings form an incomplete matrix. Predictions can be made by completing the matrix via a generic convex optimization.
Figure 2.1 Example of a labeled gamma ray log from the Trenton-Black River dataset.
Figure 2.2 Flow diagram of a supervised learning problem. Features are extracted from labelled data, which are then split into testing and training sets. The training features and corresponding labels are used to train a classifier, which maps a set of features to a label. This mapping is applied to the testing dataset, where predictions can be compared to the true labels.
Figure 2.3 Examples of fractal geometries. Left: Mandelbrot set ["Mandelbrot Set" by Wolfgang Beyer is licensed under CC BY 3.0]. Centre: Dendritic copper crystals at 20x magnification [licensed under CC BY SA 2.0]. Right: The coast line of Britain ["Fractal Coast" by Chris Moran (2010), used with permission from author].
Figure 2.4 Singularities about 0 for α = 0 (top), α = 1 (middle), and α = 2 (bottom).
Figure 2.5 Multifractal signals and their singularity spectrum. Gaussian white noise (top), bore hole measurement (bottom). Figure adapted from Herrmann (1997).
Figure 2.6 The WTMML method applied to a geophysical signal (Herrmann, 1997). (a) A well log measurement of the compressional wave-speed. (b) The wavelet transform of (a) with modulus maxima lines overlaid. (c) The partition function formed from the modulus maxima lines. (d) The mass exponent τ(q) estimated from (c). (e) The singularity spectrum f(α) computed from the Legendre transform of (d).
Figure 2.7 The scattering transform as a convolutional neural network (Andén and Mallat, 2014).
Figure 2.8 The action of a single window of a 2-layer scattering transform on a signal.
Figure 2.9 Generic 2-layer convolutional neural network (CNN) applied to a 1-d signal. A CNN extracts a feature vector through multiple stages of convolution, non-linearities, and pooling operations. The weights for each convolution filter are learned while training the classifier.
Figure 2.10 KNN classification. The input feature vector is compared to the training vectors and the mode of the labels from the most similar trained vectors is output as the prediction.
Figure 2.11 Confusion matrix of the classifier. The diagonal cells correspond to correct classifications and off-diagonal cells are misclassifications. Cell size corresponds to the number of samples of a particular class.
Figure 3.1 Seismic survey [CC-BY-SA-NC]. Sources and receivers on the surface measure backscattered energy from stratigraphic interfaces.
Figure 3.2 Angle gather migration creates scattering-angle-dependent images of the subsurface (bottom) from seismic shot records (top).
Figure 3.3 ADCIGs for multiple midpoints.
Figure 3.4 For each shot, source (b) and receiver wavefields (c) are extrapolated (e,f) to each depth. The extrapolated wavefields are correlated for multiple space lags, creating offset-domain common-image gathers (h). The ODCIGs are slant-stacked, yielding angle-domain common-image gathers (i). The angle dependent reflectivity response (j) can be analyzed for each point in the model.
Figure 3.5 The geometric ray approximation to seismic wave scattering at an interface.
Figure 3.6 AVA curves generated using the non-linear Zoeppritz equations (Equation 3.5) and the two-term Shuey approximation (Equation 3.6).
Figure 3.7 IG scatter plot depicting the mudrock line of siliciclastic interfaces and the outlying gas-saturated sands. The seemingly positive/negative symmetry stems from the top and bottom of each layer. Every point in the scatter plot corresponds to a pixel in the seismic image.
Figure 3.8 Examples of response curves of strong reflectors from real datasets.
Figure 3.9 AVA characterization as unsupervised learning. ADCIGs are reshaped into a feature matrix, where each point becomes a row and the angle dependent reflectivity response becomes a column. The columns of the matrix are reduced to allow for visualization and to prevent overfitting. The low dimensional features can be clustered, which results in a segmented image.
Figure 3.10 Subset of the Marmousi II elastic model.
Figure 3.11 ADCIG for the true Marmousi II reflectivity model.
Figure 3.12 IG crossplot for the true Marmousi II reflectivity model.
Figure 3.13 ADCIG for the migrated seismic from the Marmousi II model.
Figure 3.14 IG crossplot for the migrated Marmousi II model.
Figure 3.15 PCA crossplot for the true Marmousi II reflectivity model.
Figure 3.16 Comparison of principal components from Zoeppritz-modelled data, migrated data, and the Shuey terms as components.
Figure 3.17 PCA crossplot (a) compared to IG crossplot (b) for the migrated Marmousi II model. Note that outliers in (a) are pushed farther from the dense background clusters than in (b).
Figure 3.18 Partitional clustering. Points are partitioned by maximizing the likelihood of membership to a distribution.
Figure 3.19 Agglomerative clustering. Each point is initially considered to be a cluster. The geometrically closest clusters are joined into a node, which is then considered a new cluster. The algorithm steps through the hierarchy until a set number of clusters is reached.
Figure 3.20 Classification results for physically consistent data.
Figure 3.21 Classification of principal components from migrated seismic data. The background trend was initially two positive/negative clusters that were joined into one for plotting purposes.
Figure 3.22 Kernel PCA crossplots for increasing values of x for physically consistent data (top) and migrated seismic data (bottom).
Figure 3.23 Image segmentation results for kernel PCA applied to physically consistent data.
Figure 3.24 Image segmentation results for kernel PCA applied to the migrated seismic data.
Figure 3.25 Robust PCA applied to the true Marmousi II reflectivity model.
Figure 3.26 Robust PCA applied to the migrated seismic from the Marmousi II model.
Figure 4.1 Migrated data section from a real field dataset provided by the BG group. An interpreted gas sand is located at sample 400 and xl range 150-220.
Figure 4.2 Segmentation results from robust PCA.
Figure 4.3 Segmentation results from thresholding ZZg values extracted using the DIGI algorithm.
Figure 4.4 Segmentation results from thresholding ZZg values extracted using the DIGI algorithm with principal components replacing the Shuey terms.
Figure 4.5 Comparison of EER sections generated from DIGI (top) and PCA-extended DIGI (bottom). The region containing the potential reservoir is highlighted in green on the trace plots.
Figure 4.6 Principal component vectors extracted from the BG dataset (top) compared to the Shuey components (bottom).

List of Algorithms

Algorithm 3.1 ADCIG by shot profile migration
Algorithm 3.2 Component Pursuit by ADMM

Acknowledgments

The entire SLIM group for their support. In particular, I would not have been able to digest the mathematical aspects of my research without the help of Curt Da Silva and Dr. Rongrong Wang.

Charles Jones and James Selvage for pointing me in the direction of AVA analysis, providing their open-source analysis code for comparison, and for providing the processed field data used in Chapter 4.

Finally, I owe extreme gratitude to my supervisor Dr. Felix J. Herrmann for providing the freedom to pursue original research, the flexibility to continue my work outside of the lab, and for his progressive ideas.

Dedication

To Miranda Joyce, for taking care of everything I needed before I knew I needed it. To my roommate Maddy Kipling, for making Vancouver feel like home for 2 years.
And finally, to Tim Morgan, whoever you are...

Chapter 1

Introduction

1.1 Motivation

Large scale natural phenomena exhibit complexity beyond any tractable physical model or deterministic computation. As Earth scientists, we are tasked with the impossible challenge of explaining the complex world we live in. We can use laboratories to control complexity and study individual phenomena, or impose assumptions and approximations to make inferences from tractable but simplified models. In either case, we fail to fully explain the large scale system we wish to study.

Complexity exists in many domains outside of the Earth sciences, e.g., financial markets, economics, human behaviour, and systems engineering, where technological advances in data storage and communications have created large datasets about complex systems. This has motivated a new approach to data analysis where, in place of directly modelling the behaviour of a complex system, one instead learns predictions and trends directly from the data. These approaches have many names, e.g., machine learning, data science, and predictive analytics, but generalize as the application of generic models to large datasets in order to extract useful information from complex systems.

As an example, consider the infamous Netflix problem: given a large database of movies and users, and a small list of movies that each user has rated, the problem is to predict user ratings for movies they have not seen. We can try to model a large population's individual taste, hoping to learn much about the intrinsic relationships between people and their cinematic preferences, but will quickly realize the problem is too complex to be modelled in an objectively correct way. Alternatively, we can try a machine learning approach that uses a general model to extract useful information from the dataset. Forming a matrix with users along the rows and movies along the columns, the user ratings now form an incomplete matrix (Figure 1.1).
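This matrix-completion idea can be made concrete with a small numerical sketch. The code below is not the Candes and Tao algorithm itself, but a simple iterative singular-value soft-thresholding scheme in the same low-rank spirit; all sizes, the threshold `tau`, and the synthetic "ratings" are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "ratings": a rank-2 users-by-movies matrix, mostly unobserved.
n_users, n_movies, rank = 60, 40, 2
truth = rng.normal(size=(n_users, rank)) @ rng.normal(size=(rank, n_movies))
mask = rng.random((n_users, n_movies)) < 0.4   # 40% of entries observed

def complete(observed, mask, tau=5.0, n_iter=200):
    """Fill missing entries by iterative singular-value soft-thresholding."""
    X = np.where(mask, observed, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt  # shrink towards low rank
        X[mask] = observed[mask]                        # keep the known ratings
    return X

X_hat = complete(truth, mask)
err = np.linalg.norm((X_hat - truth)[~mask]) / np.linalg.norm(truth[~mask])
print(f"relative error on unobserved entries: {err:.3f}")
```

Because the underlying matrix is low rank, the unobserved entries are recovered far better than the trivial zero-fill guess, even though the solver never models "taste" explicitly.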
The complex problem of predicting user taste profiles is abstracted to the general problem of matrix completion, which can be solved using a low-rank optimization scheme (Candes and Tao, 2009). This method does not allow us to directly say anything about the intrinsic relationship between users and movies, but it extracts potentially useful information and predictions from the dataset.

Figure 1.1 Netflix matrix completion. Putting the users along the rows and movies along the columns, the ratings form an incomplete matrix. Predictions can be made by completing the matrix via a generic convex optimization.

In this thesis I study the geophysical data of the sedimentary layers of the Earth, a complex amorphous material formed from chaotic, turbulent, and random natural processes. I argue that exploration geophysics fits in the regime of machine learning models and successfully demonstrate their application to two conventional analysis methodologies:

• Determining stratigraphic units from well log data
• Exploring prestack seismic data for hydrocarbons

1.2 Background

Geophysicists analyze remote sensing data such as seismic and well logs in order to characterize geological and structural properties. Measurements are visually interpreted for patterns and correlations in order to classify regions that are similar in geological structure. This type of pattern recognition fits well into a supervised learning regime, where interpreted data can be used to train a classifier, which can predict interpretations on future datasets. Convolutional neural networks (CNNs) have shown success in recognizing patterns in images and signals. For example, Li et al. (2010) used a CNN to extract meaningful features from music for automated genre classification, and an approach using CNNs achieved the best classification results on the competitive MNIST handwriting benchmark (Ciresan et al., 2012).

The scattering transform (Andén and Mallat, 2014) offers an alternative approach to training a CNN: the nodes are predefined as multiscale wavelet transforms and can be interpreted as multiscale measurements of the input signal. Using the scattering transform has achieved competitive classification results on both music (Andén and Mallat, 2014) and image classification problems (Bruna, 2013). Of particular interest to problems in geophysics, the multiscale nature of the transform is tied to multifractal signal analysis (Bruna et al., 2015). In Chapter 2 I use the scattering transform in the context of supervised machine learning to classify stratigraphic units from well logs.

Unsupervised learning, in the most general sense, is a subfield of machine learning that tries to infer hidden structure within an unlabeled dataset. Unsupervised methods are particularly useful when the inferred structure is lower dimensional than the original data. For example, given a list of n patients in a hospital and their corresponding symptoms s, it is unlikely that each patient-symptom combination is unique. A set of common disease-symptom combinations y can be inferred from the data, where y ≪ n, s. Interestingly, interpreted images and geological maps produced by geoscience workflows are substantially lower-dimensional than the original field data.

The structure of the sedimentary layers of the Earth is often explained with a simplified model: rock with similar physical properties is formed along relatively continuous interfaces and facies in the subsurface. For this reason, we can use a combination of physical models, local geological knowledge, and experience to reduce large seismic and well log datasets into low-dimensional models of the Earth. Unfortunately, the simplified physical and geological assumptions do not always hold true in practice, making the inferred model highly uncertain and biased. This problem can instead be reformulated using general machine learning models.
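A toy version of this reformulation can be sketched in a few lines: reduce curve-like data samples to a low-dimensional representation with PCA, then cluster in the reduced space. The synthetic "reflectivity" curves, their parameters, and the cluster sizes below are all invented for illustration and stand in for the real pre-stack data used later in the thesis.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
theta = np.linspace(0.0, 0.6, 30)          # hypothetical incidence angles (radians)

# Two made-up families of angle-dependent curves R(theta) = A + B*sin(theta)^2:
# a dense "background" trend and a sparse "anomaly" class.
def curves(n, A, B):
    return (A + B * np.sin(theta) ** 2)[None, :] + 0.01 * rng.normal(size=(n, 30))

X = np.vstack([curves(360, 0.10, -0.12), curves(40, -0.25, -0.30)])

# Learn a 2-D projection from the data itself, then cluster in that space.
Z = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
print(np.bincount(labels))   # one large background cluster, one small anomaly cluster
```

No physical model of the curves is imposed; the low-dimensional structure and the segmentation are both inferred directly from the data.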
Abstractly, we are inferring a low-dimensional Earth model from high-dimensional geophysical data. In Chapter 3 I apply unsupervised machine learning models to automatically classify prestack seismic data.

1.3 Overview

This thesis is organized as follows:

Chapter 2 provides an overview of fractal analysis and the motivation for using the scattering transform as a discriminating feature basis for well logs. Interpreted well logs from the Trenton-Black River dataset are used to train a classifier which can predict stratigraphic units at a 67% success rate.

Chapter 3 describes reflection seismology as a scattering physics experiment and explains the assumptions and approximations used to measure seismic radiation patterns in the subsurface. I demonstrate that PCA-based approaches can successfully classify regions of the subsurface even when the physical assumptions break down.

Chapter 4 applies the methodologies from Chapter 3 to an industry field dataset, where results are compared directly to an algorithm used by the BG group.

I finish with general conclusions and outlines for future related research.

Chapter 2

Predicting stratigraphic units from well logs using supervised learning

Well logs, measurements made down bore holes, are interpreted by geophysicists in order to characterize the subsurface (Figure 2.1). Stratigraphic units are discovered and correlated across multiple wells by recognizing patterns, textures, and similarities in the data. The manual analysis of well logs is both repetitive and time consuming and would therefore benefit from automated approaches. Supervised learning provides a framework that can train a classifier to predict human interpretations of data.

In a supervised learning experiment, one uses interpreted (labelled) datasets to build a mapping from data to interpretation (Figure 2.2). Training a classifier using raw data samples is often not useful, or even possible, as raw data can be irregularly sized, incomplete, or too cumbersome to be used directly.
Data samples are therefore transformed into feature vectors via a feature extraction step. Extracting meaningful features from a dataset requires significant domain expertise and experience, as finding an abstract representation of the data is the most challenging and sensitive aspect of supervised learning.

The sedimentary layers of the Earth are built up by random depositional processes occurring across different timescales. These multiscale processes form fractal relationships in the strata of the Earth, which is described by the influential work of Mandelbrot (1977). Furthermore, fractal analysis of well logs by Herrmann (1997) showed that bore hole measurements (well logs) reflect these relationships as multifractal signals. Building on this foundation, I extract multifractal features from well logs to train a classifier that predicts stratigraphic units.

Figure 2.1 Example of a labeled gamma ray log from the Trenton-Black River dataset.

Figure 2.2 Flow diagram of a supervised learning problem. Features are extracted from labelled data, which are then split into testing and training sets. The training features and corresponding labels are used to train a classifier, which maps a set of features to a label. This mapping is applied to the testing dataset, where predictions can be compared to the true labels.

2.1 Multifractal analysis

I limit the scope of this section to an introduction and conceptual overview of multifractal signal analysis and refer to the original works for proofs and rigorous treatments. Multifractal analysis is heavily tied to wavelet analysis, of which Mallat (1998) provides a superb treatment.

Fractal geometries can be defined as a natural phenomenon or mathematical set that repeats at each scale (Figure 2.3).
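Statistical self-similarity in a signal can be checked numerically. As a toy monofractal example (a simple random walk, not one of the multifractal well-log estimators discussed in this section), the standard deviation of the increments grows as a power law of the lag, and the scaling exponent can be read off a log-log slope:

```python
import numpy as np

rng = np.random.default_rng(5)

# A random walk is statistically self-similar: the spread of its increments
# grows as lag**H with Hurst exponent H = 0.5.
x = np.cumsum(rng.normal(size=2 ** 16))

lags = 2 ** np.arange(1, 9)
sd = np.array([np.std(x[lag:] - x[:-lag]) for lag in lags])

# The scaling exponent is the slope of the log-log plot, the same idea
# used below to read Holder exponents off wavelet coefficients.
H = np.polyfit(np.log(lags), np.log(sd), 1)[0]
print(f"estimated scaling exponent: {H:.2f}")   # close to 0.5
```

A multifractal signal differs in that a single exponent like H is not enough: the local exponent varies from point to point, which is what the singularity spectrum characterizes.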
Fractal relationships have been observed in nature and are often used as a simple way to model complex natural structures such as coast lines (Mandelbrot, 1967) and the branching patterns of trees (Morse et al., 1985). The use of fractals to model nature has become ubiquitous in the special effects industry, where intricately textured landscapes can be modeled by simple parametric models (Pentland, 1984). In addition to modeling natural geometries, fractals are a powerful signal analysis tool.

Figure 2.3 Examples of fractal geometries. Left: Mandelbrot set ["Mandelbrot Set" by Wolfgang Beyer is licensed under CC BY 3.0]. Centre: Dendritic copper crystals at 20x magnification [licensed under CC BY SA 2.0]. Right: The coast line of Britain ["Fractal Coast" by Chris Moran (2010), used with permission from author].

The fractal nature of signals can be studied by analyzing the singularities of a continuous process, where a singularity is defined as a point where the differentiability fails to be well behaved. The local differentiability of a signal f(x) can be described by the Holder exponent (Estrada and Kanwal, 2000) α(x), where

|f(x) - f(y)| \leq X |x - y|^{\alpha(x)}    (2.1)

for some constant X, with y being a point at a small distance from x. In this regard, a signal is α times differentiable around the point x. Figure 2.4 shows examples of singularities for different values of α.

A multifractal signal consists of singular points of differing values of α, where the singularity spectrum f(α) describes the distribution of the Holder coefficients (Frisch and Parisi, 1985). Loosely, the singularity spectrum characterizes the roughness of a signal. Figure 2.5 shows multifractal signals with differing singularity spectrums.

Figure 2.4 Singularities about 0 for α = 0 (top), α = 1 (middle), and α = 2 (bottom).

Figure 2.5 Multifractal signals and their singularity spectrum. Gaussian white noise (top), bore hole measurement (bottom). Figure adapted from Herrmann (1997).

The Holder exponent at a singularity can be measured by exploiting the scaling properties of the wavelet transform (Mallat and Hwang, 1992; Mallat and Zhong, 1992). The local behaviour of the wavelet transform W of a signal f behaves as

W[f](x_0, a) \sim a^{\alpha(x_0)}, \quad a \to 0^+    (2.2)

provided that n > α(x_0), where n is the number of vanishing moments of the analyzing wavelet. The Holder exponent can thus be determined from the slope of the line in the log-log plot of the wavelet coefficients at x_0.

Directly measuring the Holder exponents at each point in a signal in order to sample f(α) is not feasible. The method of wavelet transform modulus maxima lines (WTMML) (Frisch and Parisi, 1985; Mallat and Zhong, 1992; Muzy et al., 1991, 1993, 1994; Bacry et al., 1993) instead provides a statistical estimation of f(α) from the modulus maxima lines of the wavelet transform.

The scaling behaviour of the wavelet transform is approximated by the partition function

Z(q, a) = \sum_{l \in L(a)} \Big( \sup_{(x, a') \in l,\, a' \leq a} |W[f](x, a')| \Big)^q    (2.3)

where the suprema are taken along the modulus maxima lines l ∈ L(a) of the wavelet transform. A scaling exponent τ(q) can now be defined from the power law

Z(q, a) \propto a^{\tau(q)}    (2.4)

of the partition function.

Arneodo et al. (1995) has shown this partition function has analogies to thermodynamics, where q and τ(q) play the role of the inverse temperature and free energy. For the conjugate variables energy and entropy, one has the Holder exponent α and the singularity spectrum f(α), which can be determined by the Legendre transform

f(\alpha) = \min_q (q\alpha - \tau(q)).    (2.5)

Figure 2.6 shows the WTMML method applied to a multifractal signal.

Using a fractal approach to the analysis of well log data is an active field of research. Khue et al.
(2002) found that texture logs formed from multifractal measurements of microresistivity logs were indicators of heterogeneity when compared to corresponding core samples. Alvarez et al. (2003) found the scaling behaviour of wavelet coefficients distinctly different for gravel than for sands. Ouadfeul and Aliouane (2013), Aliouane and Ouadfeul (2014), and Aliouane et al. (2012) have directly used the WTMML method in combination with self-organizing maps to automatically segment well logs.

Although WTMML is a tractable method for fractal characterization and analysis, it is difficult to measure f(α) in local windows, which is required for a useful feature representation in a supervised learning problem.

2.2 Scattering transform

The scattering transform (Andén and Mallat, 2014), a cascade of wavelet transforms and modulus operators which operates on local data windows, provides an alternative method for multifractal signal analysis: Bruna et al. (2015) demonstrated that the first two layers of the scattering network carry discriminating multifractal signal information. Furthermore, Andén and Mallat (2011) and Bruna and Mallat (2010) demonstrated the effectiveness of the scattering transform as a feature basis in image and signal classification.

Given a discrete signal x[n] = x(nΔt) for n = 1, ..., N, the discrete wavelet transform uses a bandpass filter bank to decompose x[n] into J bandlimited signals. Analogous to the Fourier transform, which decomposes a signal into frequencies, the wavelet transform is considered to be a decomposition into scales and translations. The scattering transform exploits multiscale relationships in signals using a cascade of wavelet transforms, magnitude, and averaging operations (Figure 2.7).
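The first stage of this cascade can be sketched numerically. The filters below are rough Morlet-style approximations chosen for brevity, not ScatNet's actual filter bank, and only the first-order coefficients |x ∗ ψ| ∗ φ are computed; all window lengths and parameters are illustrative assumptions.

```python
import numpy as np

def morlet(scale, n=256, xi=5.0):
    """Complex Morlet-style bandpass filter at one dyadic scale (a sketch,
    omitting the admissibility correction of a true Morlet wavelet)."""
    t = (np.arange(n) - n // 2) / scale
    return np.exp(1j * xi * t) * np.exp(-0.5 * t ** 2) / scale

def scatter_layer1(x, scales, phi_width=64):
    """First-order scattering coefficients: |x * psi_j| averaged by phi."""
    phi = np.exp(-0.5 * ((np.arange(256) - 128) / phi_width) ** 2)
    phi /= phi.sum()                       # low-pass averaging window
    out = []
    for j in scales:
        u = np.abs(np.convolve(x, morlet(2.0 ** j), mode="same"))  # wavelet + modulus
        out.append(np.convolve(u, phi, mode="same"))               # local averaging
    return np.array(out)                   # shape: (n_scales, len(x))

x = np.random.default_rng(3).normal(size=1024)
S1 = scatter_layer1(x, scales=[1, 2, 3, 4])
print(S1.shape)   # (4, 1024)
```

Iterating the same wavelet-modulus step on each row of `S1` before averaging would produce the second-layer coefficients described next.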
Figure 2.8 shows the action of one scattering transform window on a signal.

At the zero level of the transform, the h_0 output coefficients are simply x[n] averaged by a window φ, i.e.,

h_0[n] = x[n] \ast \phi[n].    (2.6)

The first layer of the network is formed by taking the magnitude of the wavelet transform ψ_{λ_1} of x[n], and the h_1 output coefficients are then generated by averaging along each scale in the filter bank Λ_1 by the window φ, i.e.,

h_1[n, \lambda_1] = |x[n] \ast \psi_{\lambda_1}| \ast \phi[n].    (2.7)

Referring to Figure 2.8, each window results in J h_1 output coefficients, where J is the number of scales in Λ_1.

Figure 2.6 The WTMML method applied to a geophysical signal (Herrmann, 1997). (a) A well log measurement of the compressional wave-speed. (b) The wavelet transform of (a) with modulus maxima lines overlaid. (c) The partition function formed from the modulus maxima lines. (d) The mass exponent τ(q) estimated from (c). (e) The singularity spectrum f(α) computed from the Legendre transform of (d).

Figure 2.7 The scattering transform as a convolutional neural network (Andén and Mallat, 2014).

Figure 2.8 The action of a single window of a 2-layer scattering transform on a signal.

The second layer of the transform is realized by taking the magnitude of a second wavelet transform ψ_{λ_2} of each signal in the first layer of the network. Averaging is performed to make the h_2 output coefficients, i.e.,

h_2[n, \lambda_1, \lambda_2] = ||x[n] \ast \psi_{\lambda_1}| \ast \psi_{\lambda_2}| \ast \phi[n].    (2.8)

Again referring to Figure 2.8, the h_2 output contains JL coefficients, where J and L are the number of scales in Λ_1 and Λ_2. Note that the second layer contains multiscale mixing coefficients indexed by pairs (λ_{1j}, λ_{2l}), where λ_{2l}, the lth scale of Λ_2, operates on λ_{1j}, the jth scale of Λ_1.
This process is repeated to a defined network depth m, i.e.,

hm[n, λ1, λ2, …, λm] = |⋯||x[n] ∗ Ψλ1| ∗ Ψλ2| ∗ ⋯ ∗ Ψλm| ∗ φ[n].    (2.9)

2.3 Scattering transform as a convolutional neural network

Convolutional neural networks (CNNs) have achieved state-of-the-art results in perceptual classification problems (Li et al., 2010; Garcia and Delakis, 2004; Nasse et al., 2009). Each level in a CNN consists of convolutions, non-linearities, and pooling operations (Figure 2.9). The final level of a CNN outputs a feature representation of the input signal, which is then fed into a classifier.

In each convolution layer, a filter bank of J trainable kernels is fed by K feature maps, where xi for i = 1, …, K are the feature maps output by the previous network level. For the first level, x is just the raw input signal. For each kernel kj ∈ Rˡ, where l is the length of the kernels in the filter bank, the output feature map

yj = wj + Σi kj ∗ xi    (2.10)

is computed, where wj is a trainable bias parameter. The output feature maps are then fed to the non-linearity layer.

Conventionally, the non-linearity layer is simply a pointwise sigmoid function applied to every point in each feature map output by the convolution layer. However, any pointwise non-linear function can be applied at this stage. It is common for the non-linearity function to contain a trainable parameter, such as the rectified sigmoid zj = |gj · tanh(yj)| (LeCun et al., 2010), which contains a trainable gain parameter gj.

Figure 2.9: Generic two-layer convolutional neural network (CNN) applied to a 1-D signal. A CNN extracts a feature vector through multiple stages of convolution, non-linearities, and pooling operations. The weights for each convolution filter are learned while training the classifier.
The output feature maps zj for j = 1, …, J are the same size as the output from the convolution layer.

In the pooling layer, a neighbourhood average is computed independently on each feature map zj. Implicit in the pooling layer is a downsampling operation that is applied to remove the redundancy caused by local averaging.

The output of the pooling layer in the final level of the network is concatenated into the final feature representation of the data. The feature vector is fed into a classifier, typically a logistic classifier such as softmax, which assigns a probability of the signal belonging to each available class.

The filter weights and bias terms for every kernel in every level of the network form a large set of trainable parameters that are determined using stochastic gradient descent. The gradients for each level are computed using the backpropagation method described in Rumelhart et al. (1986). Due to the large number of parameters, CNNs are useful only when a very large training database is available.

The scattering transform shares much of the same architecture as a CNN; the biggest difference is that the kernel weights are predefined wavelet transforms instead of being trained. The non-linearity stages are simply magnitude operations, and the pooling operator is defined by φ. The scattering network also differs in that a feature representation is output at every level, instead of only at the final level. The benefit of the scattering transform is that it has the same structure as a deep network but does not require training and maintains a direct interpretation.
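To make the cascade concrete, the following is a minimal numpy sketch of the two-layer scattering coefficients h0, h1, and h2 (Equations 2.6-2.8). This is not the ScatNet implementation used later in the chapter; the Morlet parameterization (centre frequency, window width) is an illustrative assumption.

```python
import numpy as np

def morlet(scale, width=8.0):
    # Complex Morlet wavelet at a given dyadic scale (illustrative parameters).
    n = int(width * scale)
    t = np.arange(-n, n + 1)
    return np.exp(-t**2 / (2.0 * scale**2)) * np.exp(1j * 5.0 * t / scale)

def smooth(x, width):
    # Low-pass averaging window phi, here a normalized Gaussian.
    t = np.arange(-width, width + 1)
    phi = np.exp(-t**2 / (2.0 * (width / 3.0)**2))
    phi /= phi.sum()
    return np.convolve(x, phi, mode="same")

def scattering(x, J=4, width=64):
    """Two-layer scattering coefficients h0, h1, h2 (Eqs. 2.6-2.8)."""
    psi = [morlet(2.0**j) for j in range(J)]                      # dyadic bank
    h0 = smooth(x, width)                                         # x * phi
    u1 = [np.abs(np.convolve(x, p, mode="same")) for p in psi]    # |x * psi_j|
    h1 = [smooth(u, width) for u in u1]                           # |x*psi_j| * phi
    h2 = [[smooth(np.abs(np.convolve(u, p, mode="same")), width)  # ||x*psi_j|*psi_l| * phi
           for p in psi] for u in u1]
    return h0, h1, h2
```

For a bank of J scales this yields 1 + J + J² coefficient signals per window, matching the coefficient counts described above.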
The scattering transform is also useful for feature extraction in classification problems that do not have large databases of training data.

2.4 Application to the Trenton-Black River well logs

The fractal nature of well logs, together with the scattering transform as a practical fractal analysis tool, motivates the use of the scattering transform as a feature representation of well logs in a machine learning problem.

In the early 2000s, the Trenton-Black River carbonates experienced renewed interest due to speculation among geologists about potential producing reservoirs. A basin-wide collaboration to study the region resulted in many data products, including well logs with corresponding stratigraphic analysis. The dataset contained 80 gamma-ray logs with corresponding stratigraphic interpretations (labels). Although the region contained more units, some were too thin and pinched out to allow for valid signal analysis. The five most prominent units (Black River, Kope, Ordovician, Trenton/Lexington, and Utica) were used for analysis. An example of a labelled log is shown in Figure 2.1. The dataset can be downloaded at http://www.wvgs.wvnet.edu/www/tbr/.

2.4.1 Methodology

I transform the well logs into feature vectors, where each scattering window becomes a sample and the h0, h1, and h2 scattering coefficients form the feature vector. This study used a two-layer scattering network, as the number of coefficients becomes unmanageable for deep networks (large m) and Bruna (2013) found only marginal classification improvement for m > 2. I use a dyadic filter bank of Morlet wavelets for Ψm and found through trial that 256 samples was a good choice of scattering window size. The open-source MATLAB implementation ScatNet was used to calculate the scattering transforms. I apply a K-nearest neighbours (KNN) classifier (Figure 2.10) that uses Euclidean distance as the measure of similarity.
Cross-validation determined a K-value of 5.

Figure 2.10: KNN classification. The input feature vector is compared to the training vectors, and the mode of the labels from the most similar training vectors is output as the prediction.

2.4.2 Results

The classifier had an overall prediction rate of 65%. The confusion matrix (Figure 2.11) shows the prediction and misclassification rates for each unit type. The thicker units, which inherently contained more data samples, showed the best performance. This is expected, as having more training data for a particular label increases the chances of having similar vectors and better matches in the KNN classifier. The two thickest units, Black River and Ordovician, had the best prediction rates. Figure 2.1 and Figure 2.11 show that sequential units in the stratigraphic record were the most likely to be misclassified as each other, for example Trenton/Lexington and Black River. These units may actually come from the same or similar depositional systems and would share similar multiscale structure in the well logs. A more detailed assessment of the geology and of the method used to interpret stratigraphy is required to better understand these results.

Figure 2.11: Confusion matrix of the classifier. The diagonal cells correspond to correct classifications and off-diagonal cells are misclassifications. Cell size corresponds to the number of samples of a particular class.

2.4.3 Discussion

Assessing the results in a quantitative manner is highly uncertain, as the labels in the "truth" dataset are inherently subjective. The work of Bond (2015), as well as recent work by Hall et al. (2015), has shown significant deviations between interpretations of the same section of data. Additionally, what a geologist interprets as a stratigraphic unit can be somewhat arbitrary, and may be based on subjective information other than textural patterns in the data.
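The classification step itself is simple. The thesis used scikit-learn for this stage; as a self-contained sketch, the Euclidean-distance KNN with majority vote depicted in Figure 2.10 can be written directly in numpy (the toy feature dimensions below are hypothetical stand-ins for scattering coefficients):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    # Euclidean distance from every test vector to every training vector.
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k nearest training vectors for each test sample.
    nearest = np.argsort(d, axis=1)[:, :k]
    # Predict the mode (majority vote) of the neighbours' labels.
    return np.array([np.bincount(y_train[idx]).argmax() for idx in nearest])
```

In scikit-learn this corresponds to `KNeighborsClassifier(n_neighbors=5)`, and `cross_val_score` can be used to select K by cross-validation as described above.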
The scattering transform is highly parameterized, and results would likely be sensitive to the choices of wavelet banks and window sizes at each level of the transform. This basic study used dyadic Morlet wavelets at each level, but the scattering transform may use different banks in each layer of the network. Multifractal analysis is highly sensitive to the number of vanishing moments in the analyzing wavelet, as the number of vanishing moments defines the upper limit of the Hölder exponent that can be measured. The optimal analyzing wavelet at each layer can be determined in a rigorous manner by analyzing the Hölder continuity of the signal in each layer of the network. Alternatively, a pragmatic approach of using cross-validation on larger datasets could help determine the optimal parameterization.

Due to limited availability of data, this study used only the gamma-ray log and did not make use of any a priori knowledge of the region. Incorporating additional log measurements (e.g., sonic, resistivity) would help corroborate the classification, as stratigraphic tops can be correlated across many different log measurements. A priori information such as stratigraphic ordering and thickness are reasonable assumptions that should be incorporated as features or penalties in the classification problem in order to boost classification accuracy.

There is a strong correlation between the number of samples and classification accuracy, which indicates the need for more data. Generally, training a classifier requires many samples of labelled data. The availability of open labelled datasets for research has proved to be a valuable resource in many other fields. The TIMIT dataset exists for acoustic-phonetic voice classification, and the GTZAN dataset supports machine learning for music genre classification.
The MNIST digit recognition dataset provides a popular benchmark for hand-written digit recognition, while the CUReT and Outex sets provide labelled data for image texture classification. Kaggle is a popular website where datasets from many industries are made available as open machine learning competitions. The sensitive nature of interpreted geophysical data makes public datasets in geophysics extremely scarce, but without useful data it is difficult to realize the same supervised-learning progress experienced in other fields.

Chapter 3: Reflection seismology analysis as an unsupervised learning problem

In hydrocarbon exploration, geophysicists use reflection seismology to create images of the sedimentary layers. Using wave physics to model seismic wavefield propagation and scattering, physical rock properties can be inferred from the data and then used to locate hydrocarbon reserves. In reality, the acquisition and processing of seismic data require many a priori assumptions and physical approximations that inhibit calibrated measurements, making the inferred physical properties highly uncertain. As an alternative to physical models, I hypothesize that useful information can be extracted directly from the data using unsupervised learning methods.

In this chapter I describe reflection seismology as a scattering physics experiment, where a measured radiation pattern is used to characterize a material. I explain the physical assumptions required to measure and classify seismic scattering patterns at stratigraphic interfaces in the subsurface and discuss their limitations. I then propose unsupervised machine learning models that exploit low-dimensional structure directly in the data and demonstrate their application to synthetic datasets. In this experiment I segment prestack seismic data using:

1. Clustering of principal component projections extracted from conventional principal component analysis (PCA)
2. Clustering of principal component projections extracted from non-linear kernel principal component analysis (kPCA)

3. Direct image segmentation by matrix splitting using an extension of robust principal component analysis (rPCA)

3.1 Angle-domain common-image gathers by shot-profile migration

Seismic reflection data are acquired by firing an impulsive source and recording the backscattered wavefield using receiver arrays placed on the surface (Figure 3.1). The geometry of a seismic survey restricts the locations of sources and receivers to the surface, which prevents the direct measurement of subsurface scattering patterns. Data therefore need to be migrated, a process that reverses the physics of propagation and maps a reflected wavefield recorded at the surface back to the point of reflection. Assuming adequate acquisition aperture and an estimate of the propagation velocity, data can be migrated into reflection-angle-dependent images of the subsurface (Figure 3.2), referred to as angle-domain common-image gathers (ADCIGs). See Figure 3.3 for angle-dependent images at different midpoints. These angle-dependent images of reflectivity can be analyzed to characterize the rock properties of the sedimentary layers.

Consider a seismic reflection where the source wavefield propagates to a discontinuity in the Earth, scatters based on the physical properties at the discontinuity, and the backscattered wavefield then propagates to a receiver at the surface. We can use a physics model to extrapolate the recorded wavefield and source to the scattering interface.
The source and receiver wavefields are related by the scattering response of the interface, and therefore correlating the wavefields provides an estimate of the reflectivity strength (Figure 3.4).

The physics of seismic wave propagation is best described by the elastic wave equation

ρü = f + ∇λ(∇·u) + ∇μ·[∇u + (∇u)ᵀ] + (λ + 2μ)∇(∇·u) − μ∇×(∇×u)    (3.1)

where ρ denotes the spatially dependent density, λ and μ are the spatially dependent Lamé parameters describing the elastic properties of the medium, f is the space- and time-dependent vector source function, and u is the space- and time-dependent vector wavefield.

Figure 3.1: Seismic survey [CC-BY-SA-NC]. Sources and receivers on the surface measure backscattered energy from stratigraphic interfaces.

Due to the computational complexity of migration, approximations to the wave physics are required to make the wavefield extrapolations feasible. First, we use the density-independent acoustic wave equation

∇²u − (1/v_p²) ∂²u/∂t² = f    (3.2)

where u is the space- and time-dependent scalar wavefield, v_p is the pressure-wave velocity, and f is the source function. The computational load is further reduced by considering only one-directional wave propagation. The single-square-root equation

ẽ_{z+Δz}(ω, k_x, k_y) = ẽ_z(ω, k_x, k_y) e^{i k_z Δz}    (3.3)

is used to extrapolate wavefields, where ẽ_z(ω, k_x, k_y) is the Fourier transform of the wavefield u_z(t, x, y) and

k_z = −√(ω²/v(z, x, y)² − (k_x² + k_y²))    (3.4)

is the dispersion relationship for the one-way wave equation (Chapter 4 of Biondi (1998); Chapter 6 of Berkhout (1980)).

Figure 3.2: Angle-gather migration creates scattering-angle-dependent images of the subsurface (bottom) from seismic shot records (top). (a-c) Shot records at positions x0, x1, x2. (d-f) Images for scattering angles θ0, θ1, θ2.
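A single depth step of the phase-shift extrapolation in Equations 3.3 and 3.4 can be sketched in numpy as follows. A laterally constant velocity per depth slice is assumed here; the split-step extension discussed next would add a space-domain correction for lateral velocity variation.

```python
import numpy as np

def phase_shift_step(e_z, omega, kx, v, dz):
    """Extrapolate an (omega, kx)-domain wavefield one depth step (Eq. 3.3).

    e_z:  2-D complex array indexed by (omega, kx), i.e. the Fourier-
          transformed wavefield at depth z.
    v:    propagation velocity for this depth slice.
    """
    # Dispersion relationship of the one-way wave equation (Eq. 3.4).
    kz2 = (omega[:, None] / v) ** 2 - kx[None, :] ** 2
    propagating = kz2 > 0
    kz = np.sqrt(np.where(propagating, kz2, 0.0))
    # Apply the phase shift; evanescent components are zeroed out.
    return np.where(propagating, e_z * np.exp(1j * kz * dz), 0.0)
```

Because the operator is unit-modulus for propagating wavenumbers, amplitudes are preserved through the extrapolation, consistent with the one-way assumption.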
The extended split-step algorithm described by Stoffa et al. (1990) and Kessinger (1992) is a computationally efficient implementation of one-way wavefield extrapolation that is valid for laterally varying velocities.

A simple correlation for each point in the subsurface model yields an image of the zero-offset reflectivity strength. We desire the entire angle-dependent reflectivity response, which requires imaging at multiple offsets. This is accomplished by extracting multiple space lags during wavefield correlation, yielding a reflectivity measurement along subsurface offsets h, referred to as an offset-domain common-image gather (ODCIG). Having reflectivity measurements along subsurface offset allows us to measure ∂z/∂h, which is geometrically related to angle via tan θ = −∂z/∂h, where ∂z/∂h can be calculated from a slant stack. See Algorithm 3.1 and Figure 3.4. I refer to Sava and Fomel (2003) for a full treatment of the approach.

Algorithm 3.1 ADCIG by shot-profile migration
1. for each shot do
2.   for each depth z do
3.     compute src_z = forward_extrapolate(shot, z)
4.     compute rec_z = backward_extrapolate(data, z)
5.     compute odcig[z] = correlate(src_z, rec_z)
6.   end for
7. end for
8. compute adcig = slant_stack(odcig)
9. output: adcig

Figure 3.3: ADCIGs for multiple midpoints.

Figure 3.4: For each shot, the source (b) and receiver (c) wavefields for the Earth model (a) are extrapolated (e, f) to each depth. The extrapolated wavefields are correlated for multiple space lags, creating offset-domain common-image gathers (h).
The ODCIGs are slant-stacked, yielding angle-domain common-image gathers (i). The angle-dependent reflectivity response (j) can be analyzed for each point in the model.

3.2 Overview of amplitude-vs-angle analysis

Given measurements of the angle-dependent reflectivity, we need to infer physical rock properties in order to classify the subsurface. Assuming that the seismic wavelength of the source is small compared to physical variations in the Earth, we can use ray theory to describe the scattering pattern at a stratigraphic interface (Figure 3.5). The angular dependence of transmitted and reflected seismic rays for an incident P-wave is described by the Zoeppritz equations (Zoeppritz, 1919)

[ R_PP ]   [ −sin θ1    −cos φ1                 sin θ2                                     cos φ2                             ]⁻¹ [ sin θ1  ]
[ R_PS ] = [  cos θ1    −sin φ1                 cos θ2                                    −sin φ2                             ]    [ cos θ1  ]
[ T_PP ]   [  sin 2θ1   (V_P1/V_S1) cos 2φ1    (ρ2 V_S2² V_P1)/(ρ1 V_S1² V_P2) sin 2θ2    (ρ2 V_S2 V_P1)/(ρ1 V_S1²) cos 2φ2 ]    [ sin 2θ1 ]
[ T_PS ]   [ −cos 2φ1   (V_S1/V_P1) sin 2φ1    (ρ2 V_P2)/(ρ1 V_P1) cos 2φ2               −(ρ2 V_S2)/(ρ1 V_P1) sin 2φ2       ]    [ cos 2φ1 ]    (3.5)

where R_PP and R_PS denote the P-wave and shear-wave reflection coefficients, T_PP and T_PS denote the coefficients of transmission, V_S and V_P denote the shear- and P-wave velocities, and ρ denotes the density.

The Zoeppritz equations provide a physical model that relates angle-dependent reflectivity to V_P, V_S, and density. Unfortunately, these are highly non-linear relationships that make direct inversion prohibitive. Shuey (1985) provides a simplification that linearizes the Zoeppritz equations for small parameter perturbations over a constant background. For scattering angles θ < 30 degrees (Figure 3.6), the non-linear Zoeppritz equations can be modelled by the two-term Shuey approximation

R_pp(θ) = I(ΔV_P, Δρ) + G(ΔV_P, ΔV_S, Δρ) sin²θ    (3.6)

where

I(ΔV_P, Δρ) = ½ (ΔV_P/⟨V_P⟩ + Δρ/⟨ρ⟩),    (3.7)

G(ΔV_P, ΔV_S, Δρ) = ½ ΔV_P/⟨V_P⟩ − 2 (⟨V_S⟩²/⟨V_P⟩²) (Δρ/⟨ρ⟩ + 2 ΔV_S/⟨V_S⟩),    (3.8)

and ⟨V_P⟩, ⟨V_S⟩, and ⟨ρ⟩ are estimates of the background.
The Shuey formulation is linear with respect to I and G, which can be determined from the least-squares problem

min_m ‖Am − d‖²₂    (3.9)

where

A = [ 1  sin²θ1 ; … ; 1  sin²θk ],    (3.10)

m = [ I ; G ],    (3.11)

and d is the vectorized ADCIG. In this regard, Shuey-term inversion projects an angle-gather dataset d ∈ Rⁿᵏ, where n is the number of samples in the Earth model and k is the number of angles, onto two components I, G ∈ Rⁿ.

Figure 3.5: The geometric-ray approximation to seismic wave scattering at an interface.

Figure 3.6: AVA curves generated using the non-linear Zoeppritz equations (Equation 3.5) and the two-term Shuey approximation (Equation 3.6).

Measurements by Castagna et al. (1985) and Gardner et al. (1974) of shales and sandstones showed the empirical relationships

Δρ/⟨ρ⟩ = k ΔV_P/⟨V_P⟩    (3.12)

and

V_P = m V_S + c.    (3.13)

As a corollary, I and G are linearly related by the mudrock line

G = I/(1 + k) [1 − 4 (⟨V_S⟩/⟨V_P⟩) (2/m + k ⟨V_S⟩/⟨V_P⟩)]    (3.14)

where m, c, and k are constants empirically determined from well logs or laboratory measurements. Interestingly, hydrocarbon-saturated sandstones have different correlation constants than sandstones saturated with brine. Analyzing the multivariate properties of I and G therefore provides a methodology for inferring hydrocarbon reserves directly from seismic data.

Consider the situation of imaging sedimentary layers where reflection interfaces are primarily formed from shale and brine-saturated sandstones. Reflections from hydrocarbon-saturated sands are statistically rare. Castagna et al. (1998) exploit the multivariate correlations between I and G by crossplotting, which leads to a background trend of shale/brine-sand reflections and statistical outliers that correspond to potential sandstones with trapped hydrocarbons.
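The least-squares fit in Equations 3.9-3.11 reduces to one small linear system shared by all pixels. A numpy sketch follows; the array layout (pixels along rows, angles along columns) is an assumption about how the ADCIG is stored.

```python
import numpy as np

def shuey_invert(adcig, angles_deg):
    """Least-squares intercept/gradient fit R(theta) = I + G sin^2(theta).

    adcig:      (n_pixels, n_angles) reflectivity matrix.
    angles_deg: scattering angles in degrees.
    Returns per-pixel I and G (Eqs. 3.9-3.11).
    """
    s2 = np.sin(np.radians(angles_deg)) ** 2
    A = np.column_stack([np.ones_like(s2), s2])       # Eq. 3.10
    m, *_ = np.linalg.lstsq(A, adcig.T, rcond=None)   # m = [I; G], Eq. 3.11
    return m[0], m[1]
```

Because A depends only on the angles, a single factorization serves every pixel in the image.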
Figure 3.7 shows a crossplot of Shuey terms calculated from a distribution of shale/brine-sand interfaces and shale/gas-sand interfaces. The rock properties and mudrock-line parameters used are from Castagna et al. (1998) and are shown in Table 3.1 and Table 3.2.

Table 3.1: Rock properties used for Figure 3.7.

Lithology     V_P          V_S          ρ
Shale         3240 ± 50    1620 ± 50    2340 ± 50
Brine sand    2590 ± 200   1060 ± 200   2210 ± 200
Gas sand      2540 ± 20    1620 ± 20    2090 ± 20

Table 3.2: Mudrock-line parameters used for Figure 3.7.

Parameter    Value
c            1360 ± 136
k            0.25 ± 0.05
m            1.16 ± 0.116

3.3 Motivation for machine learning

Machine learning should be applied to a scientific problem when:

1. There exists an underlying but unknown relationship between input data and desired output.

2. There is no known physical model to describe the relationship, or the physical model requires too many unrealistic assumptions and approximations.

Figure 3.7: IG scatter plot depicting the mudrock line of siliciclastic interfaces and the outlying gas-saturated sands. The apparent positive/negative symmetry stems from the top and bottom of each layer. Every point in the scatter plot corresponds to a pixel in the seismic image.

In Section 3.1 and Section 3.2 I described the physics model that relates hydrocarbon-saturated sediments to seismic data. The physics relies on a simplified model of the Earth, which in reality is a complex amorphous material with no known exact physical model. The following assumptions are imposed in order to apply a physical model to seismic data analysis:

1. Wave propagation is entirely described by the acoustic wave equation with no multiple reflections (migration approximation).

2. The sedimentary layers of the Earth are flat or show only small lateral variations (one-way wave-equation approximation).

3. The source wavelength is small compared to variations in the Earth (ray-theory approximation).

4.
The survey design and receiver aperture provide full angular illumination of each point in the subsurface.

5. Reflection amplitudes are preserved by the migration operator.

6. An accurate a priori velocity model is available.

7. The effect of limited source bandwidth can be removed (perfect deconvolution).

8. There is minimal noise.

Seismic data are collected in uncontrolled environments, such as ocean surveys, where noise and environmental factors play an important and unquantifiable role. Additionally, preprocessing seismic data to meet the assumptions listed above is the subject of several active research fields, each with its own uncertainties and limitations. For example, deconvolution aims to remove the effects of the source wavelet by solving a linear inverse problem. The finite bandwidth of the source wavelet makes this an ill-posed problem that requires regularization by prior information such as sparsity (Ooe and Ulrych, 1979; Chapman and Barrodale, 1983) or smoothness (Edgar and Selvage, 2013). Both are biases that cannot be fully justified. Additionally, deconvolution assumes the source is known, when in reality it is estimated from the data and carries its own error. Advanced methods for removing multiple reflections from data, such as surface-related multiple attenuation (Verschuur et al., 1992) or estimation of primaries by sparse inversion (Groenestijn, 2010), can remove both internal and surface-related multiples. These methods require dense and regular acquisition geometries that are not practical at the scale of modern acquisitions and thus require interpolation in order to be effective. Although anecdotal, I argue that each seismic survey comes with unique acquisition and processing issues that inhibit calibrated measurements with quantifiable uncertainty.
Figure 3.8 shows angle-dependent curves from industry-processed seismic data, which exemplify the uncertainty inherent in real-world seismic data. The large discrepancy in scales in Figure 3.8 shows that even simple variability challenges exist between datasets. The seeming inability of conventional methods to adhere to the assumptions required by the physical model motivates new data-driven approaches.

Now reconsider the Shuey-term analysis method described in Section 3.2. In the most general sense, Shuey-term inversion projects a length-k vector (the angle-dependent reflectivity curve) onto a length-2 vector (I, G). The crossplot analysis described by Castagna et al. (1998) can be abstracted as finding outlying points (hydrocarbon-saturated sandstones) against a background trend (the mudrock line).

Figure 3.8: Examples of response curves of strong reflectors from real datasets: (a) the Penobscot open dataset, (b) the Mobil Viking dataset, and (c) the BG AVA dataset used in Chapter 4.

We can now use the vernacular of unsupervised learning to restate Shuey-term analysis without the need for a physical model: we apply dimensionality reduction followed by clustering. The remainder of this section demonstrates that projections learned directly from seismic data can be successful in discriminating potential hydrocarbon reserves in seismic images.

Previous work by Hami-Eddine et al. (2012) used self-organizing maps to directly classify angle gathers without using the Shuey terms as a physical model. Instead, I closely follow the conventional analysis procedure but replace the physical model with projections learned from the data.
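The restated workflow, dimensionality reduction followed by clustering, can be sketched end to end in numpy. The thesis uses scikit-learn's PCA and clustering routines; the minimal stand-ins below only show the mechanics, and the array shapes are assumptions about how the gathers are stored.

```python
import numpy as np

def pca_project(X, m=2):
    # Project samples onto the m leading principal components of the features.
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / len(Xc)                  # feature covariance
    vals, vecs = np.linalg.eigh(C)           # eigenvalues in ascending order
    return Xc @ vecs[:, ::-1][:, :m]         # keep the m largest

def kmeans(X, k, n_iter=50, seed=0):
    # Minimal Lloyd's algorithm for clustering the projected features.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# Usage on an ADCIG cube of shape (nz, nx, n_angles):
#   X = adcigs.reshape(-1, n_angles)
#   segmented = kmeans(pca_project(X, m=2), k=3).reshape(nz, nx)
```

Each pixel of the image becomes a row of the feature matrix, so the cluster labels reshape directly back into a segmented image.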
3.3.1 Problem formulation

Given angle-domain common-image gathers, we desire a segmented output image in which each point in the Earth model, or equivalently each pixel in the image, is classified according to its angle-dependent scattering response. This can be accomplished using unsupervised machine learning models (Figure 3.9).

For an unsupervised learning problem, we define a dataset in terms of samples and features, where samples are individual objects described by a vector of features. In the case of ADCIGs, each point in the Earth is a sample i described by a feature vector x_i ∈ Rᵏ consisting of the length-k angle-dependent reflectivity response. The feature vectors are formed into a matrix X ∈ Rⁿˣᵏ with samples along the rows and features along the columns. In this regard, we simply reshape our ADCIGs into a matrix where each pixel of the image becomes a row and each angle becomes a column. Generalizing the data as a feature matrix allows us to work in an unsupervised learning framework.

Assuming the existence of a lower-dimensional representation of the columns of X, we can use dimensionality-reduction techniques to reduce the number of columns, producing a new feature matrix X̂ ∈ Rⁿˣᵐ, m ≪ k. The dimensionality-reduced feature vectors can then be clustered based on similarity, which results in a segmented image.

3.4 Example on the Marmousi II synthetic dataset

The elastic Marmousi II model (Figure 3.10) is an elastic extension of the ubiquitous Marmousi model and was developed for the specific purpose of testing amplitude-vs-angle processing and analysis methodologies. I use a subset of the model that contains a gas reservoir embedded in layers of brine-saturated sand and shales in order to develop and demonstrate the following unsupervised learning methods:

1. Clustering of principal component projections extracted from conventional principal component analysis (PCA)

2.
Clustering of principal component projections extracted from non-linear kernel principal component analysis (kPCA)

3. Direct image segmentation by matrix splitting using an extension of robust principal component analysis (rPCA)

Figure 3.9: AVA characterization as unsupervised learning. ADCIGs are reshaped into a feature matrix, where each point becomes a row and the angle-dependent reflectivity response becomes a column. The columns of the matrix are reduced to allow for visualization and to prevent overfitting. The low-dimensional features can be clustered, which results in a segmented image.

3.4.1 Seismic modelling

The Marmousi II Earth model provides the physical rock properties V_P, V_S, and ρ. We wish to study angle-dependent reflectivities and therefore need to model ADCIGs (Section 3.1). As a ground-truth dataset, I use the Zoeppritz equations (Equation 3.5) to synthesize ADCIGs (Figure 3.11) directly from the physical rock properties of the model. This dataset is ideal in that the angle-dependent reflectivity responses are physically consistent with the model described in Section 3.2 and represent what we would measure if we had perfect seismic acquisition and processing. Shuey terms were extracted by inverting the linear system described in Equation 3.9 and are crossplotted in Figure 3.12. The farthest outliers in the crossplot correspond to the top and bottom of the hydrocarbon reservoir.

Figure 3.10: Subset of the Marmousi II elastic model.

I synthesize a more realistic dataset by first modelling a seismic acquisition via a finite-difference visco-acoustic wave-equation simulation. An absorbing boundary at the water bottom was used to prevent strong surface-related multiples. ADCIGs (Figure 3.13) were then formed using the shot-profile migration scheme described in Section 3.1.
Note that the angle gathers are not perfectly flat, and the wavelet spreads out at greater angles. The flatness of the gathers was corrected by peak picking in order to prevent phase errors from dominating the angle-dependent response. The spreading of the wavelet is an artefact of the migration operator, which was deliberately not corrected for in order to keep some physical inconsistencies in this synthetic dataset. No additional processing was performed, so the data still contain internal multiples, some diffraction artefacts, non-stationary wavelet effects, and non-uniform illumination.

Figure 3.11: ADCIG for the true Marmousi II reflectivity model.

The extracted Shuey terms are plotted in Figure 3.14. Visco-acoustic modelling is computationally cheaper than a full elastic scheme, but results in subtler responses. Referring to Equations 3.6-3.8, the response will be primarily characterized by the first term (Equation 3.7). Although there is a background trend and there are outliers that can be related to the underlying geology, there is significant spreading due to unmodelled contributions. I explore data projections that are robust to these unmodelled contributions.

3.4.2 Principal component analysis

PCA can be used to find data projections that maximize the variance, which is a measure of information. Performing an eigendecomposition of the covariance matrix

C = (1/n) Σᵢ xᵢ xᵢᵀ    (3.15)

where C ∈ Rᵏˣᵏ, results in a set of eigenvectors vᵢ with corresponding eigenvalues λᵢ, where i = 1, …, k. Projecting the feature matrix X ∈ Rⁿˣᵏ onto the eigenvectors with the m largest eigenvalues results in the dimensionality-reduced matrix X̂ ∈ Rⁿˣᵐ.

Figure 3.12: IG crossplot for the true Marmousi II reflectivity model.

The use of PCA as a seismic analysis tool has been studied by Scheevel et al.
(2001), where they use principle components along the time axis asfeatures while predicting porosity from seismic amplitude in a supervisedcontext. Hagen (1982) as well as Kaplan (2003) have used PCA as a meansof denoising, which again analyzes principal components along the time axis.Alternatively, Saleh et al. (2000) has recognized the connection betweenPCA of scattering responses and Shuey term inversion. The author showedthat crossplots of principle component coefficients and Shuey terms sharesimilar multivariate properties in a noise free example, but the principlecomponent projection was more robust to multiplicative noise. This is be-cause multiplicative noise is not in the span of the Shuey vectors, where as37Chapter 3. Reflection seismology analysis as an unsupervisedlearning problemFigure 3.13 ADCIG for the migrated seismic from the Marmousi II model.the data driven principle component approach will always find projectionsthat span the data.I now apply the same approach as Shuey term analysis to the syntheticdatasets, but instead project the onto the two largest principle componentsinstead of the Shuey vectors. For the physically consistent dataset modelleddirectly by the Zoeppritz equations, the crossplot geometries of PCA andShuey methods are nearly identical (Figure 3.12 and Figure 3.15). Theprinciple component vectors extracted are normalized versions of the Shueyvectors, which is shown in Figure 3.16. This is an interesting result, as itdemonstrates that under the conditions where the physics is preserved, PCAand Shuey inversion provide interpretably equivalent crossplots.The crossplot for the migrated physically inconsistent data showed dif-ferent geometries for PCA and Shuey terms (Figure 3.12 and Figure 3.15).Most importantly, outlying clusters are spread further from the backgroundtrend in the PCA crossplot. This is an important result, as anomalousresponses in this dataset correspond to hydrocarbon-saturated sediments.38Chapter 3. 
Figure 3.14 IG crossplot for the migrated Marmousi II model.

Figure 3.15 PCA crossplot for the true Marmousi II reflectivity model.

Figure 3.16 Comparison of principal components from Zoeppritz-modelled data, migrated data, and the Shuey terms as components. (a) Principal component vectors for the true Marmousi II reflectivity model. (b) Principal component vectors for the migrated Marmousi II model. (c) The Shuey terms as component vectors.

The two principal component vectors with the largest eigenvalues are compared to the vectors defined by the Shuey model (Figure 3.16). Since the principal components were extracted directly from the data, unmodeled contributions such as wavelet effects, internal multiples, and the response of the migration are in the span of the projection. In contrast, the Shuey model is static and assumes the data is explained entirely by the physics, resulting in less variance in the crossplot, which makes discriminating anomalous responses more difficult.

Although crossplots can be manually analyzed to look for background trends and anomalous clusters, we seek a fully automated approach to image segmentation. We therefore need a method to cluster the data into a finite number of classes.

Figure 3.17 PCA crossplot (a) compared to IG crossplot (b) for the migrated Marmousi II model. (a) Crossplot of migrated data projected onto principal components. (b) Crossplot of migrated data projected onto Shuey components. Note that outliers in (a) are pushed farther from the dense background clusters than in (b).
3.4.3 Clustering

Data clustering methods are typically either partitional, which classify points by assignment to an underlying distribution, or hierarchical, which cluster points based on a distance metric.

Partitional algorithms directly decompose data into disjoint clusters by minimizing a criterion function, which is usually based on a set of probability distribution functions. For example, Gaussian mixture models partition a dataset by maximizing the probability of each point being associated with one of K distributions defined by Gaussian statistics. The ubiquitous K-means algorithm is a scalable but specific case of a mixture model where each class shares identical statistics. A partitional algorithm will be sensitive to the choice of the number of clusters and the initial guess of the distribution parameters. Partitional clustering on a toy example is shown in Figure 3.18.

Figure 3.18 Partitional clustering. Points are partitioned by maximizing the likelihood of membership to a distribution.

Hierarchical algorithms cluster by iteratively joining and splitting groups of points into a hierarchy based on a distance metric (Figure 3.19). Since there is no reason to assume that dimensionality-reduced seismic data will be drawn from any known distribution, I use a hierarchical clustering algorithm for classification. BIRCH clustering (Zhang et al., 1996) is designed for large databases and is specifically optimized to minimize the number of passes through the dataset.

Figure 3.19 Agglomerative clustering. Each point is initially considered to be a cluster. The geometrically closest clusters are joined into a node, which is then considered a new cluster. The algorithm steps through the hierarchy until a set number of clusters is reached.

At the core of BIRCH clustering is the cluster feature tree. Given a cluster of N samples {x_i}, where i = 1, 2, ..., N and x_i ∈ R^m, a cluster feature is defined as the three-entry vector

    CF = (N, Σ_{i=1}^{N} x_i, Σ_{i=1}^{N} x_i^2).

A tree can be formed by joining subclusters into nodes, where the merged cluster feature vector is a simple addition of the subcluster feature vectors. Note that the cluster centroid

    x̄ = (1/N) Σ_{i=1}^{N} x_i

and cluster radius

    r = ((1/N) Σ_{i=1}^{N} (x_i − x̄)^2)^{1/2}

can be calculated from the cluster feature values.

The tree geometry is controlled by two parameters: the threshold T, which defines the maximum radius of any cluster, and the branching factor B, which defines the maximum number of subclusters allowed in each node. A cluster is split when either of these values is exceeded, and the subclusters are either left as their own leaf or merged with other clusters. The final leaves of the tree are passed to a simple agglomerative clustering algorithm (Figure 3.19), which outputs the final classifications.

Note that the formation of the cluster feature tree requires one pass through memory and can be applied in an online fashion, which makes the algorithm highly scalable.

When applied to the principal component projections of the physically consistent data, BIRCH resulted in 6 distinct clusters: 2 correspond to the top and bottom of the hydrocarbon reservoir (Figure 3.20), one corresponds to the background trend, and 3 other clusters are related to the high-amplitude reflections near the bottom of the model. Clustering principal component projections of the migrated seismic resulted in 4 distinct classes (two were merged after the fact to define the background), where the top and bottom of the reservoir were again delineated from the background trend (Figure 3.21).

BIRCH clustering provides a scalable method to automatically segment a seismic image based on principal component coefficients.
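A minimal sketch of this clustering step using scikit-learn's Birch implementation, which likewise builds a cluster feature tree and passes the leaves to a final agglomerative step; the "background plus outliers" crossplot data below is invented purely for illustration:

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
# A dense background trend plus a small outlying cluster, mimicking the
# two-component crossplots used in this chapter (values are illustrative).
background = rng.normal(0.0, 0.05, size=(500, 2))
outliers = rng.normal(0.8, 0.05, size=(20, 2))
features = np.vstack([background, outliers])

# threshold ~ maximum subcluster radius T; branching_factor ~ B; passing
# an integer n_clusters sends the CF-tree leaves through an agglomerative
# global step, as described above.
birch = Birch(threshold=0.2, branching_factor=50, n_clusters=2)
labels = birch.fit_predict(features)
```

With well-separated toy clusters, the background points all receive one label and the outlying points another, which is the behaviour relied on for segmenting the crossplots.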
The method successfully delineated a hydrocarbon reserve from a background trend in synthetic datasets. The number of clusters and branching threshold (Table 3.3) were chosen by trial to demonstrate desirable results, which I acknowledge is not a robust general solution to parameter selection. I now explore other projections with the goal of discovering multivariate geometries (locations of points on a crossplot) that are more robust to automated clustering.

Figure 3.20 Classification results for physically consistent data.

3.4.4 Kernel PCA

Figure 3.21 Classification of principal components from migrated seismic data. The background trend was initially two positive/negative clusters that were joined into one for plotting purposes.

PCA will reduce the number of features while maximizing the variance of the data (a measure of information), but it is a linear model and may not result in the best low-dimensional representation of X. Instead, I seek a non-linear projection that results in advantageous multivariate geometries. Non-linear principal component projections can be calculated by employing the "kernel trick", which implicitly transforms the data into a high-dimensional non-linear feature space (Schölkopf et al., 1997). Kernels are often employed in machine learning problems to exaggerate or linearize separation boundaries in classification problems (Hofmann et al., 2008).

In order to apply a kernel to PCA, we need to recognize that PCA can be calculated from the inner-product matrix

    X X^T = [ ⟨x_1, x_1⟩  ⟨x_1, x_2⟩  ...  ⟨x_1, x_n⟩
              ⟨x_2, x_1⟩  ⟨x_2, x_2⟩  ...  ⟨x_2, x_n⟩
              ...
              ⟨x_n, x_1⟩  ⟨x_n, x_2⟩  ...  ⟨x_n, x_n⟩ ]    (3.16)

instead of the outer-product matrix X^T X (the covariance). Although X X^T ∈ R^{n×n} compared to X^T X ∈ R^{d×d}, and d ≪ n for seismic data, X X^T is the Gramian matrix required by kernel methods. Now let U be the orthogonal matrix containing the eigenvectors of X X^T, Λ be the corresponding diagonal matrix of eigenvalues, and V be the orthogonal matrix containing the eigenvectors of X^T X. By definition, we have

    (X X^T) U = U Λ,    (3.17)

which we multiply on both sides by X^T to arrive at

    (X^T X)(X^T U) = (X^T U) Λ.    (3.18)

Note this is just an eigenvalue decomposition, which shows that the eigenvectors of the covariance matrix X^T X can be calculated from X^T U. This is an important result, as it shows that the inner-product matrix X X^T can be used to find the principal components of X.

Since the entries of X X^T depend only on the inner products of feature vectors ⟨x_i, x_j⟩, we can employ a kernel κ(x_i, x_j) that implicitly calculates the inner product ⟨φ(x_i), φ(x_j)⟩, where φ(x) defines a high-dimensional feature space. Note that by using a kernel we calculate the high-dimensional inner product in the original lower-dimensional space (Hofmann et al., 2008). Using a non-linear kernel results in a non-linear PCA operation.

Now consider the polynomial kernel

    κ(x_i, x_j) = (x_i^T x_j + c)^d.    (3.19)

It can be shown that κ(x_i, x_j) = φ(x_i)^T φ(x_j), where φ(x) is a feature transformation that contains all terms up to degree d. For example, the polynomial kernel with d = 2 and c = 1 corresponds to the feature transformation

    φ(x) = [1, √2 x_1, √2 x_2, x_1^2, x_2^2, √2 x_1 x_2]    (3.20)

when x ∈ R^2 (Murphy, 2012).

Crossplots of projected components for increasing values of d are shown in Figure 3.22. The geometry changes dramatically for higher values, and outliers become linearly separable from a densely clustered background trend. This is likely partially attributed to the power law inherent in the polynomial kernel, which puts increasing emphasis on high-amplitude features. For larger values of d floating-point errors become problematic, so I choose a value of d = 10, which pushes the background trend into a dense cluster without incurring floating-point errors.
Figure 3.22 Kernel PCA crossplots for increasing polynomial degree (d = 1, 2, 5, 10) for physically consistent data (top) and migrated seismic data (bottom).

I present the results of kernel PCA as an empirical experiment, and currently do not have a rigorous explanation as to why the data forms along a seemingly smooth curve. The feature space spanned by the implicit transform φ(x) is both very large and highly non-linear, which makes the analysis of isocurves in this space beyond the scope of this thesis.

BIRCH clustering was applied to the projected data, yielding the segmented images shown in Figure 3.23 and Figure 3.24. The reservoir is clearly delineated from the background in both of the datasets. The background formed one cluster at the apex of the curve, which allows outlying samples to be linearly separated. The farthest outlying cluster (blue) in Figure 3.24 corresponds to the high-amplitude tuning effects at the edges of the reservoir, which was not discovered as a unique cluster using linear PCA projections.

Kernel PCA becomes a computationally expensive operation for seismic data (n ≫ d), as working with the inner-product matrix X X^T is O(n^3) compared to O(d^2 n) for the outer-product matrix. This limits the size of the window that the operation can be performed on, but does not inhibit the use of the method.

In summary, kernel PCA resulted in preferential multivariate geometries compared to linear PCA at the expense of computational cost. Interpreting the physical meaning of the non-linear projection is challenging and is not currently understood, which limits the appeal of the method.
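The polynomial-kernel projection described above can be sketched with scikit-learn's KernelPCA; the feature matrix here is random stand-in data (not ADCIGs), while the degree d = 10 and constant c = 0 follow the settings reported in Table 3.4:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(1)
# Toy feature matrix standing in for windowed angle-dependent responses.
X = rng.normal(size=(300, 8))

# Polynomial kernel (x_i . x_j + c)^d with d = 10, c = 0 (coef0); the
# kernel scaling gamma is left at scikit-learn's default.
kpca = KernelPCA(n_components=2, kernel="poly", degree=10, coef0=0.0)
Z = kpca.fit_transform(X)
```

The two columns of `Z` play the role of the crossplot axes in Figure 3.22; only the Gram matrix is ever formed, so the implicit degree-10 feature space is never materialized.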
Although kernel PCA projections were less sensitive to clustering parameters than linear PCA, the choices of the number of clusters and branching threshold (Table 3.4) are still determined by trial and will likely vary between datasets.

3.4.5 Robust PCA

Clustering of both linear and kernel principal component projections was successful at delineating the reservoir, but both rely heavily on tuning of clustering parameters. Additionally, kernel PCA significantly increases the computational cost and obscures physical interpretation. For these reasons, I develop a scalable approach that can segment an image without explicit clustering, yet still has a direct physical interpretation.

I again state the assumption that the majority of reflections in the sedimentary layers are from brine-saturated sandstones and shales, and that reflections from hydrocarbon-saturated sands are statistically rare. Additionally, I impose the assumption that the angle-dependent reflectivity responses of the background trend are similar and ideally linearly related.

Figure 3.23 Image segmentation results for kernel PCA applied to physically consistent data.

With these assumptions, we can state that our feature matrix X is the summation of a low-rank matrix L consisting of the background trend and a sparse matrix S containing the outlying responses. In this regard, we can directly segment the image into a background trend and outliers by solving the optimization problem

    min_{L,S} rank(L) + ‖S‖_{row-0}  s.t.  L + S = X,    (3.21)

where X is the feature matrix, L is a low-rank matrix, S is a row-sparse matrix, and ‖S‖_{row-0} denotes the number of non-zero rows in S. Note that both terms are non-convex, which makes optimization a hard problem.

The rank(L) term can be relaxed into a convex formulation using the nuclear norm ‖L‖_*, where

    ‖L‖_* = trace(√(L^* L)) = Σ_i σ_i = ‖σ‖_1    (3.22)

Figure 3.24 Image segmentation results for kernel PCA applied to the migrated seismic data.

and σ is the vector of the magnitudes of the singular values of L. Donoho (2004) showed that minimizing the ℓ1-norm promotes a sparse solution, which means minimizing the nuclear norm results in a sparse set of singular values. The number of non-zero singular values corresponds to the rank of a matrix, and therefore minimizing ‖L‖_* will result in a low-rank matrix.

The ‖S‖_{row-0} term can be relaxed into the convex matrix norm ‖S‖_{1,∞}, which is the ℓ1-norm along rows and the ℓ∞-norm along columns. Encouraging the vector of extrema along each row to be sparse results in row sparsity. Other row-sparsity-promoting norms like ℓ_{1,2} can also be used. We can now reformulate Equation 3.21 as the convex problem

    min_{L,S} ‖L‖_* + λ‖S‖_{1,∞}  s.t.  L + S = X,    (3.23)

which extends the robust PCA work of Candes et al. (2009) from sparse matrices to row-sparse matrices. I use the alternating direction method of multipliers (ADMM) to find a solution to Equation 3.23. ADMM methods have well-studied convergence properties and have been applied to a wide range of convex optimization problems. Perhaps most relevant to seismic data problems, they are adaptable to large distributed datasets (Boyd, 2010).

Using an augmented Lagrangian we can write Equation 3.23 as the unconstrained loss function

    ℓ(L, S, Y) = ‖L‖_* + λ‖S‖_{1,∞} + ⟨Y, X − L − S⟩ + (μ/2)‖X − L − S‖_F^2,    (3.24)

where μ can be interpreted as a penalty term for the data misfit. ADMM proceeds by iteratively finding L^k, S^k via argmin_{L,S} ℓ(L, S, Y^k) and then updating the Lagrange multiplier

    Y^{k+1} = Y^k + μ(X − L^k − S^k)    (3.25)

until a convergence criterion is met. Following the approach used by Candes et al. (2009), I recognize that argmin_L ℓ(L, S, Y) and argmin_S ℓ(L, S, Y) have simple and efficient solutions using proximal operators.
I refer to Combettes and Pesquet (2011) for a background on proximal operators and iterative proximal splitting methods.

Let S_τ : R^m → R^m denote the vector shrinkage operator

    S_τ[x] = max(1 − τ/‖x‖_∞, 0) x,    (3.26)

let T_τ : R → R denote the scalar shrinkage operator

    T_τ[x] = sgn(x) max(|x| − τ, 0),    (3.27)

and let D_τ denote the singular value thresholding operator given by

    D_τ(X) = U T_τ(Σ) V^T,    (3.28)

where X = U Σ V^T is any exact or approximate singular value decomposition. It can be shown (Candes et al., 2009; Combettes and Pesquet, 2011) that

    S^k = S_{λ/μ}(X − L + μ^{−1} Y)    (3.29)

and

    L^k = D_{1/μ}(X − S + μ^{−1} Y)    (3.30)

are the minimizers for argmin_S ℓ(L, S, Y) and argmin_L ℓ(L, S, Y), respectively. The optimization proceeds by fixing S while minimizing with respect to L, then fixing L and minimizing with respect to S, and finally updating the Lagrange multiplier based on the residual X − L − S. See Algorithm 3.2.

Algorithm 3.2 Component pursuit by ADMM
1. initialize S^0 = Y^0 = 0, μ > 0.
2. while not converged do
3.   compute L^{k+1} = D_{1/μ}(X − S^k + μ^{−1} Y^k);
4.   compute S^{k+1} = S_{λ/μ}(X − L^{k+1} + μ^{−1} Y^k);
5.   compute Y^{k+1} = Y^k + μ(X − L^{k+1} − S^{k+1});
6. end while
7. output: L, S.

In practice the algorithm is not run to completion, as a seismic dataset can be very large, so both the stopping criterion and the choice of μ play a vital role in the resulting estimates of L and S. The values listed in Table 3.5 were determined through trial.

Robust PCA was able to exactly segment the reservoir for the physically consistent dataset (Figure 3.25). The reservoir was mostly delineated in the migrated seismic (Figure 3.26), but there is not perfect lateral continuity. Convergence of the algorithm has been proven for the case when the sparse component S is random noise (high rank), but I cannot assume that the outlying reflectivity responses will also be high rank. The lack of complete segmentation is therefore not surprising, but the amount of delineation shown in Figure 3.26 still provides a very useful result.
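A minimal numpy sketch of Algorithm 3.2, using the shrinkage and singular-value-thresholding operators defined above; the parameter values and any toy input are illustrative assumptions, not the thesis's settings:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding D_tau (Eq. 3.28)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)

def row_shrink(M, tau):
    """Row-wise application of the vector shrinkage S_tau (Eq. 3.26)."""
    norms = np.maximum(np.abs(M).max(axis=1, keepdims=True), 1e-12)
    return np.maximum(1.0 - tau / norms, 0.0) * M

def rpca_admm(X, lam, mu, n_iter=50):
    """Component pursuit by ADMM: X is split into low-rank L plus
    row-sparse S. Runs a fixed number of iterations rather than a
    convergence test, as done in practice in the text."""
    S = np.zeros_like(X)
    Y = np.zeros_like(X)
    for _ in range(n_iter):
        L = svt(X - S + Y / mu, 1.0 / mu)          # step 3
        S = row_shrink(X - L + Y / mu, lam / mu)   # step 4
        Y = Y + mu * (X - L - S)                   # step 5
    return L, S
```

The non-zero rows of the returned `S` are the candidate outlying responses, which is what produces the segmented images directly, without a clustering stage.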
The fact that robust PCA has a direct physical interpretation, does not require a priori knowledge of the number of clusters, and is scalable to large seismic datasets makes it an attractive method for seismic image segmentation.

Parameter              Zoeppritz data   Migrated seismic data
Number of components   2                2
Number of clusters     6                4
Table 3.3 Settings for clustering PCA-reduced features

Figure 3.25 Robust PCA applied to the true Marmousi II reflectivity model.

Parameter              Zoeppritz data   Migrated seismic data
Number of components   2                2
Degree d               10               10
Constant c             0                0
Number of clusters     3                4
Table 3.4 Settings for kernel PCA

Parameter              Zoeppritz data   Migrated seismic data
μ                      3000.0           3000.0
λ                      0.01             0.01
Number of iterations   50               50
Table 3.5 Settings for robust PCA

Figure 3.26 Robust PCA applied to the migrated seismic from the Marmousi II model.

3.4.6 Discussion

The experiments performed in this chapter were motivated by the inability of the conventional geophysical model to adequately explain the angle-dependent reflectivity response of seismic data. Seismic data acquisition is performed in an uncontrolled natural environment, and processing of the data requires many physical assumptions that prevent calibrated measurements. Seismic data can be migrated into structural images where scattering interfaces can be accurately located, but the angle-dependent response will contain a combination of the true physical response as well as unmodeled contributions. For this reason, I explored unsupervised learning models that learn projections directly from the data that help delineate outlying responses from a background trend. This is important, as outlying reflectivity responses can be associated with potential hydrocarbon reserves in the sedimentary layers.

Generalizing the physical model described in Section 3.2 as a simple projection motivated the use of PCA-based approaches.
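The projection step that underlies these comparisons (Figure 3.9, Section 3.4.2) can be sketched as follows; the array sizes and the random stand-in ADCIG cube are invented purely for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Toy ADCIG cube (depth, lateral position, angle); in the workflow each
# image point's angle-dependent response becomes one row of X.
n_z, n_x, n_theta = 60, 40, 12
adcig = rng.normal(size=(n_z, n_x, n_theta))
X = adcig.reshape(-1, n_theta)

# Project onto the two principal components with the largest eigenvalues;
# the two columns of `scores` play the role of the crossplot axes that
# replace the Shuey I/G terms.
pca = PCA(n_components=2)
scores = pca.fit_transform(X)
crossplot = scores.reshape(n_z, n_x, 2)
```

Reshaping the scores back to the image grid gives one two-component vector per image point, ready for crossplotting or clustering.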
Figure 3.12 and Figure 3.15 demonstrate that, in the unlikely case of a physically consistent dataset, the projections learned from PCA and the direct projections from the physical model are nearly equivalent. When applied to a more realistic dataset that contains unmodeled contributions from acquisition and processing, PCA projections provided advantageous crossplots where outliers (hydrocarbon-related regions) could be more easily delineated (Figure 3.17). In this regard, projecting onto principal components should be preferred to using the Shuey model, as it provides similar results in an ideal scenario and is more robust to physically inconsistent data. For this simple synthetic test case, the physical inconsistencies are minor, and the principal components therefore have similar shapes (Figure 3.16). For real seismic data (Figure 3.8), the principal component vectors vary substantially from the Shuey vectors, indicating that the difference between the PCA and Shuey term approaches will be much more pronounced.

Using a polynomial kernel provided a method to project the data onto principal components in an implicit high-dimensional, non-linear feature representation. The resulting multivariate geometry of the non-linear projection allowed outlying responses to be linearly separated from the background. Interestingly, the crossplots shown in Figure 3.22 form a distinct curve where the background densely clusters at the apex. Although this projection has properties that are useful for automatic classification, it is difficult to interpret intuitively. Further research is required to better understand the multivariate geometry created by the polynomial kernel.

BIRCH clustering was used to automatically classify the projected data. Results were promising, as the reservoir was segmented using kernel PCA and PCA for both physically consistent and migrated data (Figure 3.21, Figure 3.23, Figure 3.24).
Automated clustering requires the desired number of clusters as a user-defined input, which in practice may be an unrealistic requirement. Additionally, real-world seismic data has a continuum of responses, which may not be classifiable into a finite number of disjoint classes. Although BIRCH clustering was useful for segmenting this synthetic data, a less automated approach to classifying projections will likely be required for real datasets.

Alternatively, robust PCA directly segmented the reservoir from the background without the need of a clustering algorithm, as shown in Figure 3.25 and Figure 3.26. The efficacy of the algorithm is contingent upon desirable rank properties of both the background trend and the sparse outliers. The matrix of background angle-dependent responses is certainly low rank, but assuming that outlying responses will be significantly higher rank may not be justified. A study using real seismic data collected over a known reservoir is required to analyze the rank of hydrocarbon-related reflectivity responses to gauge whether this method can be used in practice.

Finally, these experiments were performed on simple synthetic models of the Earth. Although interesting for the development of new methodologies, the work presented is by no means a proper benchmark comparison between approaches.

3.5 Conclusions

In this chapter I used a relatively simple model to develop and demonstrate unsupervised learning methods for classifying seismic data. The novelty of this work is reformulating the conventional analysis methodology without the need of a physical model. Results on ideal data showed that projections learned directly from the data yield interpretably equivalent results to using a physical model.
Results using migrated data that contains contributions outside the span of the physical model showed that unsupervised models can provide better projections for classifying outliers from a background trend.

Future work requires testing the methodologies on seismic data acquired over a known and proven hydrocarbon reserve, which would allow proper benchmarks and firm conclusions on the effectiveness of the new methodologies.

Chapter 4

Comparison to dynamic intercept gradient inversion

Dynamic intercept gradient inversion (DIGI) is a Shuey-term-based method for seismic data analysis developed by Edgar and Selvage (2013). It is currently used by the BG group for finding potential hydrocarbon reserves in industrial seismic data. A migrated seismic dataset was provided to compare PCA-based analysis methods with the DIGI algorithm. A seismic section of interest (Figure 4.1) containing a potential gas reservoir was selected by a BG geologist and will serve as the segmentation target in the comparison. The clustering-based methods could only segment the water bottom and shallow segments, so I restrict the comparison to robust PCA and a modified version of PCA that extends the DIGI algorithm.

Figure 4.1 Migrated data section from a real field dataset provided by the BG group. An interpreted gas sand is located at sample 400 and xline range 150-220.

4.1 Dynamic intercept gradient inversion

The Shuey model described in Section 3.2 is extended to include the bandwidth-limiting effects of the seismic wavelet, where the migrated seismic data d(x, t, θ) is explained by the convolution model

    d(x, t, θ) = w(x, t, θ) * r(x, t, θ),    (4.1)

where r(x, t, θ) is the reflectivity response of the Earth and w(x, t, θ) is a set of non-stationary wavelets estimated from the data. The null space caused by the bandwidth-limited wavelet creates the ill-posed system of equations

    [d] = [W   W sin²θ] [I; G],    (4.2)

where W is a Toeplitz matrix containing the estimated set of non-stationary wavelets w(x, t, θ) and d is again the migrated ADCIGs flattened into a vector. The ill-conditioned system is regularized by forcing the gradients of I and G to be small via the augmented system

    [d; 0] = [W  W sin²θ; ∇  ∇] [I; G].    (4.3)

The system is further augmented by the minimum-energy extended elastic reflectivity (Hicks and Francis, 2006)

    EER(χ) = I cos χ + G sin χ,    (4.4)

which exploits the expected correlation between I and G, where EER(χ) = 0 for perfectly correlated terms. The value of χ is expected to vary with the velocity ratio v_P/v_S. Assuming this variation is dominated by compaction, χ should decrease with time below the seabed, T_seabed. The simple linear relationship

    χ(x, t − T_seabed) = m(x) · (t − T_seabed) + c(x)    (4.5)

is used to model the compaction trend. The final system of equations

    [d; 0; 0] = [W  W sin²θ; ∇  ∇; W(χ) cos χ  W(χ) sin χ] [I; G],    (4.6)

where χ is obtained from sin²θ = tan χ, is solved using the conjugate-gradient algorithm LSQR (Paige and Saunders, 1982).

The EER values calculated from I and G form an image where large values indicate deviations from the expected correlation. In this respect, anomalous regions can be extracted from the EER image by thresholding.

4.2 PCA-extended DIGI

In Section 3.4.2 I showed that projecting ADCIGs onto the two largest principal components had advantageous multivariate properties compared to the Shuey model. Using this knowledge, I modify the DIGI algorithm by directly replacing the Shuey terms with the two largest principal components extracted from the data, i.e.,

    [d; 0; 0] = [W c₁  W c₂; ∇  ∇; W(χ) cos χ  W(χ) sin χ] [I; G],    (4.7)

where c₁ and c₂ are the two largest principal component vectors.
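A hedged sketch of solving a stacked, regularized system of this general form with SciPy's LSQR; the three-sample wavelet, trace length, pair of representative angles, and 0.1 regularization weight are all illustrative assumptions, not the operators used by DIGI:

```python
import numpy as np
from scipy.linalg import convolution_matrix
from scipy.sparse import bmat, csr_matrix
from scipy.sparse.linalg import lsqr

n = 64                                    # samples per trace (toy size)
wavelet = np.array([0.2, 0.9, 0.2])       # stand-in stationary wavelet
W = csr_matrix(convolution_matrix(wavelet, n, mode="same"))

# First-difference operator supplying the smoothness rows.
D = csr_matrix(np.eye(n, k=1) - np.eye(n))

# Two representative angles; each contributes a [W, W sin^2(theta)] block.
s1, s2 = np.sin(np.radians(10.0)) ** 2, np.sin(np.radians(30.0)) ** 2
A = bmat([[W, s1 * W],
          [W, s2 * W],
          [0.1 * D, 0.1 * D]]).tocsr()

# Synthesize data from a known intercept/gradient spike, then invert.
i_true = np.zeros(n); i_true[20] = 1.0
g_true = np.zeros(n); g_true[20] = -0.5
rhs = np.concatenate([W @ i_true + s1 * (W @ g_true),
                      W @ i_true + s2 * (W @ g_true),
                      np.zeros(n)])
sol = lsqr(A, rhs, iter_lim=500)[0]
i_est, g_est = sol[:n], sol[n:]
```

With two distinct angles the data rows determine I and G up to the mild smoothing penalty, so the recovered intercept spike sits close to its true value; swapping the Shuey columns for data-driven component vectors leaves the solver unchanged.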
Using data-driven components allows the algorithm to explain data that is outside the span of the Shuey vectors. This method is feasible only for linear PCA, as the abstraction of kernel PCA to a higher dimension does not fit within this framework.

4.3 Results

Segmented images for robust PCA, DIGI, and PCA-extended DIGI are shown in Figures 4.2-4.4. Parameters were selected to segment the potential reservoir while maintaining the least amount of misclassified clutter. Although none of the methods was able to achieve perfect segmentation, both PCA methods had significantly less clutter than the DIGI results.

The robust PCA method (Figure 4.2) primarily segmented the water bottom as well as the potential reservoir. This is a good result, as both of these regions should deviate significantly from any background trend of reflections. There are some spurious classifications at the depth of the reservoir, but these misclassifications do not form coherent shapes and would likely be ignored by an experienced interpreter. Robust PCA had the least classification clutter for the same amount of reservoir segmentation, making it the preferred method for this test case.

The segmented image from the DIGI algorithm is shown in Figure 4.3. The image shows a significant amount of spurious classifications in addition to the potential reservoir, which would make interpreting the potential reservoir a challenge. Thresholding the DIGI results is not useful, but manually examining the EER image (Figure 4.5) for bright regions could still be effective.

The PCA-extended DIGI algorithm produced a segmented image (Figure 4.4) with significantly less clutter than the original DIGI algorithm. The interesting part of this image is that the water bottom and shallow layers were not classified as outliers, as one might have expected.
However, these are the highest-amplitude regions, and since PCA tries to explain the most variance in the data, the extracted principal components would be biased to explain this region.

EER cross-sections created from Shuey terms and principal component vectors are shown in Figure 4.5. The image generated from the principal components is sharper, and the potential reservoir stands out more strongly against the background (see the green section of the trace in Figure 4.5). Principal component vectors extracted from the data are compared to the Shuey vectors in Figure 4.6. There are substantial differences between these plots, which indicates there are significant contributions in the data that are outside the span of the Shuey vectors. For this reason, the PCA-extended DIGI provided preferential results.

Figure 4.2 Segmentation results from robust PCA.

4.4 Discussion

The PCA-based methods outperformed DIGI on this particular seismic section because the data contained contributions that were outside the span of the Shuey vectors. Both robust PCA and the PCA-extended DIGI algorithm do not rely on a static physical model and are thus able to adapt to the dataset.

The PCA-extended DIGI method extracted principal components from the entire seismic section. Future research should deconvolve the data before extracting principal components, as the convolution is part of the forward model (Equation 4.7). Additionally, studies of how the principal components vary with spatial coordinates would be useful, as extracting components from windowed sections could improve results if the components vary spatially. Finally, the EER calculation (Equation 4.4) may not be the best way to determine outliers from correlations between principal component projections. Future work could assess the efficacy of thresholding on other correlation functions.
Figure 4.3 Segmentation results from thresholding ZZg values extracted using the DIGI algorithm.

I will reserve any firm conclusions, as this was one study on a small seismic section with a very loose geological interpretation. Seismic field datasets containing proven reservoirs are required to continue research into how to find hydrocarbons directly from seismic data. Further research into feature representations and data projections will remain inconclusive without ground-truth datasets to serve as benchmarks.

Figure 4.4 Segmentation results from thresholding ZZg values extracted using the DIGI algorithm with principal components replacing the Shuey terms.

Figure 4.5 Comparison of EER sections generated from DIGI (top) and PCA-extended DIGI (bottom). The region containing the potential reservoir is highlighted in green on the trace plots.

Figure 4.6 Principal component vectors extracted from the BG dataset (top) compared to the Shuey components (bottom).

Chapter 5

Closing remarks

In this thesis I explored machine learning applications to geophysical data analysis. I made the thematic argument that geophysical data falls within the regime of machine learning models. Firstly, interpretation of geophysical data is often based on visual correlations and pattern recognition, which fits into a supervised learning framework. Geological interpretations can serve as labels to train a classifier that makes predictions on similar datasets. Secondly, I argue that the complexity of the Earth's sedimentary layers, combined with the difficulty of seismic acquisition and processing, limits the use of physical models to fully explain seismic data.
Using assumptions of low-dimensionality and latent models, I showed that unsupervised learning can be applied to extract more useful information directly from the data.

In Chapter 2 I applied supervised learning to train a classifier to predict stratigraphic units from well logs. I learned a generic mapping from multifractal feature vectors to interpreted stratigraphy that can make useful predictions of stratigraphic units. The novel contribution was using the scattering transform as a multifractal feature representation for well logs.

In Chapter 3 I took a well-known exploration geophysics analysis methodology and presented it as an unsupervised learning problem. I made the argument that seismic data contains contributions outside the span of the conventional two-component linear model. I showed that under ideal conditions PCA-based approaches provide the same results, and that they are more robust to unmodeled contributions from acquisition and migration. The main contribution of this chapter was abstracting the analysis procedure into the vernacular of unsupervised learning, which is independent of a physical model.

In Chapter 4 I applied the methods developed in Chapter 3 to a field dataset and compared the results to a method currently used by the BG Group for seismic data exploration. In this small example on a field dataset, I showed that unsupervised learning approaches could better segment a potential gas reservoir than BG's current methodology. The novel contribution of this section is demonstrating that the methodologies developed in this thesis can be effective on a real dataset.

In this thesis I developed original ideas and methodologies for using machine learning in exploration geophysics. Although several novel approaches were developed, conclusions were difficult to draw. In each experiment I lacked adequate data to perform scientific benchmarks.
Many of the advances in machine learning in other fields are owed to open datasets, which allow researchers to benchmark and publish their findings on the same data. Without this infrastructure, there are significant barriers to demonstrating the potential success of machine learning in exploration geophysics.

Bibliography

Aliouane, L., and S.-A. Ouadfeul, 2014, Heterogeneities analysis using the generalized fractal dimensions and the continuous wavelet transform. Application to the KTB boreholes: Arabian Journal of Geosciences, 7, 2959–2967. → pages 11

Aliouane, L., S.-A. Ouadfeul, and A. Boudell, 2012, Well-Logs Data Processing Using the Fractal Analysis and Neural Network, in Fractal Analysis and Chaos in Geosciences: InTech. → pages 11

Alvarez, G., B. Sanso, R. J. Michelena, and J. R. Jimenez, 2003, Lithologic characterization of a reservoir using continuous-wavelet transforms: IEEE Transactions on Geoscience and Remote Sensing, 41, 59–65. → pages 11

Andén, J., and S. Mallat, 2011, Multiscale Scattering for Audio Classification: ISMIR, 657–662. → pages 11

——, 2014, Deep scattering spectrum: IEEE Transactions on Signal Processing, 62, 4114–4128. → pages viii, 2, 3, 11, 13

Arneodo, A., E. Bacry, and J. F. Muzy, 1995, The thermodynamics of fractals revisited with wavelets: Physica A: Statistical Mechanics and its Applications, 213, 232–275. → pages 10

Bacry, E., J. F. Muzy, and A. Arnéodo, 1993, Singularity spectrum of fractal signals from wavelet analysis: Exact results: Journal of Statistical Physics, 70, 635–674. → pages 10

Berkhout, A. J., 1980, Seismic Migration: Imaging of Acoustic Energy by Wave Field Extrapolation: Elsevier. → pages 23

Biondi, B., 1998, 3-D Seismic Imaging: Society of Exploration Geophysicists. → pages 23

Bond, C. E., 2015, Uncertainty in structural interpretation: Lessons to be learnt: Journal of Structural Geology, 74, 185–200. → pages 18

Boyd, S., 2010, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers: Foundations and Trends® in Machine Learning, 3, 1–122. → pages 51

Bruna, J., 2013, Scattering representations for recognition: PhD thesis, Ecole Polytechnique X. → pages 3, 16

Bruna, J., and S. Mallat, 2010, Classification with scattering operators: arXiv preprint arXiv:1011.3023. → pages 11

Bruna, J., S. Mallat, E. Bacry, and J.-F. Muzy, 2015, Intermittent process analysis with scattering moments: The Annals of Statistics, 43, 323–351. (arXiv: 1311.4104). → pages 3, 11

Candes, E. J., X. Li, Y. Ma, and J. Wright, 2009, Robust Principal Component Analysis?: arXiv:0912.3599 [cs, math]. (arXiv: 0912.3599). → pages 50, 51

Candes, E. J., and T. Tao, 2009, The Power of Convex Relaxation: Near-Optimal Matrix Completion: arXiv:0903.1476 [cs, math]. (arXiv: 0903.1476). → pages 2

Castagna, J. P., M. L. Batzle, and R. L. Eastwood, 1985, Relationships between compressional-wave and shear-wave velocities in clastic silicate rocks: Geophysics, 50, 571–581. → pages 28

Castagna, J. P., H. W. Swan, and D. J. Foster, 1998, Framework for AVO gradient and intercept interpretation: Geophysics, 63, 948–956. → pages 29, 31

Chapman, N. R., and I. Barrodale, 1983, Deconvolution of marine seismic data using the l1 norm: Geophysical Journal International, 72, 93–100. → pages 31

Ciresan, D., U. Meier, and J. Schmidhuber, 2012, Multi-column deep neural networks for image classification: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, IEEE, 3642–3649. → pages 2

Combettes, P. L., and J.-C. Pesquet, 2011, Proximal Splitting Methods in Signal Processing, in Fixed-Point Algorithms for Inverse Problems in Science and Engineering: Springer New York, Springer Optimization and Its Applications, No. 49, 185–212. (DOI: 10.1007/978-1-4419-9569-8_10). → pages 51

Donoho, D. L., 2004, For Most Large Underdetermined Systems of Linear Equations the Minimal 1-norm Solution is also the Sparsest Solution: Comm. Pure Appl. Math, 59, 797–829. → pages 50

Edgar, J., and J. Selvage, 2013, Dynamic Intercept-gradient Inversion: Presented at the . → pages 31, 57

Estrada, R., and R. P. Kanwal, 2000, Singular Integral Equations: Springer Science & Business Media. → pages 8

Frisch, U., and G. Parisi, 1985, Fully developed turbulence and intermittency: Turbulence and predictability in geophysical fluid dynamics and climate dynamics, 88, 71–88. → pages 8, 10

Garcia, C., and M. Delakis, 2004, Convolutional face finder: A neural architecture for fast and robust face detection: IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 1408–1423. → pages 14

Gardner, G. H. F., L. W. Gardner, and A. R. Gregory, 1974, Formation velocity and density; the diagnostic basics for stratigraphic traps: Geophysics, 39, 770–780. → pages 28

Groenestijn, G.-J. v., 2010, Estimation of primaries and multiples by sparse inversion: PhD thesis, [s.n.], S.l. (OCLC: 840440078). → pages 31

Hagen, D. C., 1982, The application of principal components analysis to seismic data sets: Geoexploration, 20, 93–111. → pages 37

Hall, M., B. Bougher, and E. Bianco, 2015, Pick This! Social image interpretation: Society of Exploration Geophysicists, 1772–1775. → pages 18

Hami-Eddine, K., P. Klein, L. Richard, and A. Furniss, 2012, Anomaly Detection Using Dynamic Neural Networks, Classification of Prestack Data: Presented at the SEG-2012-1222, Society of Exploration Geophysicists. → pages 32

Herrmann, F. J., 1997, A scaling medium representation, a discussion on well-logs, fractals and waves: [s.n.]. → pages vii, viii, 7, 9, 12

Hicks, G. J., and A. M. Francis, 2006, Extended Elastic Impedance and Its Relation to AVO Crossplotting and Vp/Vs: Presented at the . → pages 58

Hofmann, T., B. Schölkopf, and A. J. Smola, 2008, Kernel methods in machine learning: The Annals of Statistics, 36, 1171–1220. → pages 45, 46

Kaplan, S. T., 2003, Principal and independent component analysis for seismic data: PhD thesis, University of British Columbia. → pages 37

Kessinger, W., 1992, Extended split-step Fourier migration, in SEG Technical Program Expanded Abstracts 1992: 917–920. → pages 23

Khue, P. N., O. Huseby, A. Saucier, and J. Muller, 2002, Application of generalized multifractal analysis for characterization of geological formations: Journal of Physics: Condensed Matter, 14, 2347. → pages 10

LeCun, Y., K. Kavukcuoglu, C. Farabet, and others, 2010, Convolutional networks and applications in vision: ISCAS, 253–256. → pages 15

Li, T. L., A. B. Chan, and A. H. Chun, 2010, Automatic musical pattern feature extraction using convolutional neural network: Presented at Proc. IMECS. → pages 2, 14

Mallat, S., and W. L. Hwang, 1992, Singularity detection and processing with wavelets: IEEE Transactions on Information Theory, 38, 617–643. → pages 8

Mallat, S., and S. Zhong, 1992, Characterization of signals from multiscale edges: IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 710–732. → pages 8, 10

Mallat, S. G., 1998, A Wavelet Tour of Signal Processing: Academic Press. → pages 7

Mandelbrot, B., 1967, How Long Is the Coast of Britain? Statistical Self-Similarity and Fractional Dimension: Science, 156, 636–638. → pages 8

Mandelbrot, B. B., 1977, The Fractal Geometry of Nature: W. H. Freeman and Company. → pages 7

Morse, D. R., J. H. Lawton, M. M. Dodson, and M. H. Williamson, 1985, Fractal dimension of vegetation and the distribution of arthropod body lengths: Nature, 314, 731–733. → pages 8

Murphy, K. P., 2012, Machine Learning: A Probabilistic Perspective, 1st ed.: The MIT Press. → pages 46

Muzy, J., E. Bacry, and A. Arneodo, 1994, The multifractal formalism revisited with wavelets: International Journal of Bifurcation and Chaos, 04, 245–302. → pages 10

Muzy, J. F., E. Bacry, and A. Arneodo, 1991, Wavelets and multifractal formalism for singular signals: Application to turbulence data: Physical Review Letters, 67, 3515–3518. → pages 10

——, 1993, Multifractal formalism for fractal signals: The structure-function approach versus the wavelet-transform modulus-maxima method: Physical Review E, 47, 875–884. → pages 10

Nasse, F., C. Thurau, and G. A. Fink, 2009, Face detection using GPU-based convolutional neural networks: International Conference on Computer Analysis of Images and Patterns, Springer, 83–90. → pages 14

Ooe, M., and T. J. Ulrych, 1979, Minimum entropy deconvolution with an exponential transformation: Geophysical Prospecting, 27, 458–473. → pages 31

Ouadfeul, S.-A., and L. Aliouane, 2013, Automatic lithofacies segmentation using the wavelet transform modulus maxima lines combined with the detrended fluctuation analysis: Arabian Journal of Geosciences, 6, 625–634. → pages 11

Paige, C. C., and M. A. Saunders, 1982, LSQR: An algorithm for sparse linear equations and sparse least squares: ACM Transactions on Mathematical Software, 8, 43–71. → pages 59

Pentland, A. P., 1984, Fractal-Based Description of Natural Scenes: IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6, 661–674. → pages 8

Rumelhart, D. E., G. E. Hinton, and R. J. Williams, 1986, Learning representations by back-propagating errors: Nature, 323, 533–536. → pages 15

Saleh, S. J., J. A. de Bruin, and others, 2000, AVO attribute extraction via principal component analysis: Presented at the 2000 SEG Annual Meeting, Society of Exploration Geophysicists. → pages 37

Sava, P. C., and S. Fomel, 2003, Angle-domain common-image gathers by wavefield continuation methods: Geophysics, 68, 1065–1074. → pages 24

Scheevel, J. R., K. Payrazyan, and others, 2001, Principal component analysis applied to 3D seismic data for reservoir property estimation: SPE Reservoir Evaluation & Engineering, 4, 64–72. → pages 37

Schölkopf, B., A. Smola, and K.-R. Müller, 1997, Kernel principal component analysis, in Artificial Neural Networks—ICANN'97: Springer, 583–588. → pages 45

Shuey, R. T., 1985, A simplification of the Zoeppritz equations: Geophysics, 50, 609–614. → pages 26

Stoffa, P. L., J. T. Fokkema, R. M. de Luna Freire, and W. P. Kessinger, 1990, Split-step Fourier migration: Geophysics, 55, 410–421. → pages 23

Verschuur, D. J., A. J. Berkhout, and C. P. A. Wapenaar, 1992, Adaptive surface-related multiple elimination: Geophysics, 57, 1166–1177. → pages 31

Zhang, T., R. Ramakrishnan, and M. Livny, 1996, BIRCH: an efficient data clustering method for very large databases: ACM Sigmod Record, ACM, 103–114. → pages 42

Zoeppritz, K., 1919, VII b. Über Reflexion und Durchgang seismischer Wellen durch Unstetigkeitsflächen: Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 1919, 66–84. → pages 26