"Science, Faculty of"@en . "Earth, Ocean and Atmospheric Sciences, Department of"@en . "DSpace"@en . "UBCV"@en . "Sacchi, Mauricio Dino"@en . "2009-02-18T20:52:13Z"@en . "1996"@en . "Doctor of Philosophy - PhD"@en . "University of British Columbia"@en . "In seismic data analysis, recorded data often are transformed to various domains to\r\ndiscriminate against coherent and incoherent noise. For instance, by mapping a shot\r\nrecord from time-space domain to frequency-wavenumber domain, coherent linear noise\r\ncan be attenuated. Similarly, by mapping a common-midpoint gather from time-space to\r\ntime-velocity domain (velocity stacks) multiples are separated from primaries based on\r\nmoveout discrimination. In these procedures the correct identification of seismic events\r\nwith similar moveout can be severely affected by the aperture of the array and the discrete\r\nsampling of the wavefield.\r\nEconomic and/or logistic reasons usually dictate the cable length and spatial sampling\r\nof the seismic experiment. This thesis examines how the resolution (the ability to distinguish\r\nclose events) of slant stack and parabolic stack operators deteriorates under limited\r\naperture. An algorithm is developed to increase the resolution of the aforementioned operators.\r\nThis procedure constructs an operator that collapses each seismic signal in\r\nthe transform domain, thus diminishing truncation artifacts. The overall procedure is\r\nequivalent to the simulation of a longer array of receivers.\r\nSlant stacks and the parabolic stacks are linear operations used to map the seismic\r\ndata into another domain, the transform domain (r \u00E2\u0080\u0094 p or r \u00E2\u0080\u0094 q). In this thesis an inverse\r\nproblem is posed. This is accomplished by considering the data as the result of a linear\r\noperation onto the transform domain. This approach permits one to incorporate prior\r\ninformation into the problem which is utilized to attenuate truncation artifacts.\r\nThe prior information is incorporated into the inverse problem by means of the\r\nBayesian formalism. The observational errors and the prior information are combined through Bayes' rule using the likelihood function and a long tailed distribution, respectively.\r\nThe posteriori probability is then used to induce the objective function of the\r\nproblem. Finally, minimizing the objective function leads to the solution of the inverse\r\nproblem. The advantage of incorporating a long tailed distribution to model the transform\r\ndomain is that the solution is constrained to be sparse which is a desired feature\r\nfor highly resolved models.\r\nThe method is also used to design an artifacts-reduced 2-D discrete Fourier transform.\r\nA by-product of the method is a high resolution periodogram. This periodogram coincides\r\nwith the periodogram that would have been computed with a longer array of receivers if\r\nthe data consist of a limited superposition of linear events."@en . "https://circle.library.ubc.ca/rest/handle/2429/4752?expand=metadata"@en . "19268727 bytes"@en . "application/pdf"@en . "A P E R T U R E C O M P E N S A T E D R A D O N A N D F O U R I E R T R A N S F O R M S by Mauricio Dino Sacchi Geofisico, Universidad Nacional de La Plata A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F D O C T O R O F P H I L O S O P H Y in T H E F A C U L T Y O F G R A D U A T E S T U D I E S G E O P H Y S I C S A N D A S T R O N O M Y We accept this thesis as conforming to the required standard T H E U N I V E R S I T Y O F BRIT ISH C O L U M B I A January 1996 @ Mauricio Dino Sacchi, 1996 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Geophysics and Astronomy The University of British Columbia 129-2219 Main Mall Vancouver, Canada V6T 1Z4 Date: Abstract In seismic data analysis, recorded data often are transformed to various domains to discriminate against coherent and incoherent noise. For instance, by mapping a shot record from time-space domain to frequency-wavenumber domain, coherent linear noise can be attenuated. Similarly, by mapping a common-midpoint gather from time-space to time-velocity domain (velocity stacks) multiples are separated from primaries based on moveout discrimination. In these procedures the correct identification of seismic events with similar moveout can be severely affected by the aperture of the array and the discrete sampling of the wavefield. Economic and/or logistic reasons usually dictate the cable length and spatial sampling of the seismic experiment. This thesis examines how the resolution (the ability to distin-guish close events) of slant stack and parabolic stack operators deteriorates under limited aperture. An algorithm is developed to increase the resolution of the aforementioned op-erators. This procedure constructs an operator that collapses each seismic signal in the transform domain, thus diminishing truncation artifacts. The overall procedure is equivalent to the simulation of a longer array of receivers. Slant stacks and the parabolic stacks are linear operations used to map the seismic data into another domain, the transform domain (r \u00E2\u0080\u0094 p or r \u00E2\u0080\u0094 q). In this thesis an inverse problem is posed. This is accomplished by considering the data as the result of a linear operation onto the transform domain. This approach permits one to incorporate prior information into the problem which is utilized to attenuate truncation artifacts. The prior information is incorporated into the inverse problem by means of the Bayesian formalism. The observational errors and the prior information are combined ii through Bayes' rule using the likelihood function and a long tailed distribution, respec-tively. The posteriori probability is then used to induce the objective function of the problem. Finally, minimizing the objective function leads to the solution of the inverse problem. The advantage of incorporating a long tailed distribution to model the trans-form domain is that the solution is constrained to be sparse which is a desired feature for highly resolved models. The method is also used to design an artifacts-reduced 2-D discrete Fourier transform. A by-product of the method is a high resolution periodogram. This periodogram coincides with the periodogram that would have been computed with a longer array of receivers if the data consist of a limited superposition of linear events. iii Table of Contents Abstract ii List of Tables vii List of Figures viii Acknowledgement xi 1 Introduction 1 1.1 Background 1 1.2 Motivation 3 1.3 Scope and contribution of this work 5 1.4 Thesis outline 5 2 Slant Stacks 7 2.1 Introduction 7 2.2 Slant stacks 10 2.2.1 The slant stack operator (conventional definition) 10 2.2.2 The inverse slant stack operator 14 2.2.3 The sampling theorem for slant stacks 16 2.3 Discrete slant stacks 18 2.3.1 The discrete slant stack operator (conventional definition) . . . . 20 2.3.2 The minimum norm discrete inverse slant stack operator 21 2.3.3 Minimum norm solution with quadratic data constraints 23 iv 2.3.4 On the stability of the operator G 26 2.3.5 The data and model resolution matrices 35 2.3.6 Resolution analysis by means of the spectral expansion 39 2.4 Synthetic data examples 42 2.5 Concluding remarks 43 3 Velocity stacks 48 3.1 Introduction 48 3.2 Velocity stacks 51 3.3 Inverse velocity stacks 53 3.4 Discrete parabolic Radon transform 55 3.5 Sampling considerations 57 3.6 Example 58 3.7 The validity of the parabolic approximation 59 4 High Resolution Slant Stack and Parabolic Stacks Operators 63 4.1 Introduction 63 4.2 Why sparse models? 66 4.3 Inverse problems and Bayes' rule 69 4.3.1 Bayes'Rule 69 4.3.2 Bayesian approach to inverse problems 70 4.3.3 Assigning a prior to the model 73 4.3.4 Assigning a probability to the noise 76 4.3.5 Maximum a Posteriori Solution (MAP) 78 4.3.6 Algorithm complexity 81 4.3.7 Convergence 82 4.3.8 Convexity of the objective function 83 v 4.3.9 Derivation of the damped least squares solution 85 4.4 Application to slant stacks 86 4.4.1 Slant stacking non-aliased data 86 4.4.2 Slant stacking beyond the alias 86 4.4.3 High resolution wavefield decomposition of VSP data 90 4.5 Application to velocity processing 98 4.5.1 Identification of multiples 98 4.5.2 Offset space reconstruction from parabolic stacks 102 4.5.3 Field data example 107 4.6 Concluding remarks and discussion 115 5 Aperture Compensated Discrete Fourier Transform and Applications 116 5.1 Introduction 116 5.2 The discrete Fourier transform as an inverse problem 118 5.3 Zero order quadratic regularization (damped least squares) 120 5.4 Regularization by the Cauchy-Gauss model 122 5.5 Hybrid two-dimensional estimator of the DFT 124 5.5.1 First example: spatio-temporal spectrum narrow band signals . . 125 5.5.2 Second example: broad band applications 126 5.6 Application to field data. Vertical Seismic Profiling (VSP) 129 5.7 Discussion 133 6 Summary 137 Appendices 148 A Scalar products 148 vi List of Tables 2.1 Eigenvalues of (l - V), Z,Z' = 0,...JV - 1 for W = 0.2, 0.3, 0.4 31 4.1 Parameters used to generate the synthetic data 86 vii List of Figures 2.1 (a) values of A*(ll, W) and (b) A*(25, W) for W = 0.1,0.2,0.3,0.4. . . . 30 2.2 Synthetic data 44 2.3 (a) T \u00E2\u0080\u0094 p panel computed with the conventional slant stack operator, (b), (c) and (d) T \u00E2\u0080\u0094 p space computed with the minimum norm slant stack operator 45 2.4 (a) Power spectrum of the 25 Hz Ricker wavelet used in the synthetic example, (b) Condition number of the matrix g(l \u00E2\u0080\u0094 l'),l,l'=0,... ,10. . . 46 2.5 Minimum and maximum eigenvalues values of fg(l \u00E2\u0080\u0094 /'), 1,1' = 0,..., 10. 47 3.1 Synthetic data 59 3.2 Velocity processing, (a) Velocity stack computed using the conventional operator (parabolic moveout and stacking), (b) Velocity panel computed using the minimum norm inverse operator 60 4.1 Linear Radon transform. Conventional and inverse operators 87 4.2 Synthetic example 90 4.3 Slant stack computed with the conventional and with inverse operator. . 91 4.4 Contour plot of the normalized envelope of the panels showed in Figure 4.3 92 4.5 Blowup of figure 4.4 93 4.6 (a) The synthetic wavefield. Noisy case 94 4.7 The T \u00E2\u0080\u0094 p panel of the noisy wavefield 94 4.8 The T \u00E2\u0080\u0094 p space computed with inverse slant stack operator derived from the Cauchy prior 95 viii 4.9 Misfit function versus frequency. 96 4.10 VSP data and r \u00E2\u0080\u0094 p panels derived using damped least squares and the IMRLS 97 4.11 Velocity processing with the conventional and inverse parabolic stack op-erator 99 4.12 A synthetic CMP gather with three primaries events and multiples . . . . 100 4.13 Velocity processing with the conventional parabolic stack and with the IMRLS algorithm 101 4.14 Parabolic processing after a t2 transformation 103 4.15 Offset space reconstruction 104 4.16 Parabolic processing after a NMO correction 105 4.17 Offset space reconstruction 106 4.18 CSP gather with missing traces 109 4.19 Parabolic transform processing after a t2 transformation 110 4.20 Parabolic transform processing after a t2 transformation. IMRLS solution. I l l 4.21 CSP gather with missing traces before and after NMO correction 112 4.22 Parabolic transform processing after a NMO correction 113 4.23 Parabolic transform processing after a NMO correction. IMRLS solution 114 5.1 2-D spectrum of three narrow band signals 127 5.2 a) Synthetic wavefield 129 5.2 b-g) Spatio-temporal spectrum using the windowed DFT and the high resolution DFT 130 5.2 i-h) Wavefield extrapolation 131 5.3 a) Synthetic wavefield contaminated with random noise 131 ix 5.3 b-g) Spatio-temporal spectrum using the windowed DFT and the high resolution DFT 132 5.4 Segment of a VSP 133 5.5 2-D spectral signature of the VSP in the interval 500 - 800m 134 5.6 2-D spectral signature of the VSP in the interval 800 - 1200m 134 5.7 2-D spectral signature of the VSP in the interval 1200 - 1600m 135 x Acknowledgement A miei genitori, ricordando con riconoscenza cuanto amore mi venne dalla sua comprensione. A Luciana y a Fedeiico, gracias poi estar siempie a mi lado durante estos anos de estudio. I am indebted to many people. In particular, to my supervisor, Tad Ulrych, for his friendship, encouragement, and for sharing his compulsory passion for time series analysis. My thanks also to Drs. M. Bostock, and L. Kirlin for their considerable input dur-ing the preparation of this thesis. My thanks also to Drs. G. K. C. Clarke, and L. Smith for their participation as committee member and university examiner, respect-ively. Moreover, I thank Drs. 0. Yilmaz (external examiner) and C. Kostov for their very constructive remarks. An appreciation is given to Dr. Doug Oldenburg. His clear inverse theory viewpoint always came to my mind during the preparation of this thesis. The friendship of Xingong Li has been very important since my very first day at UBC. The valuable input of Ken Matson in real seismic exploration problems is also appreciated. The research that leads to this thesis was supported by an NSERC grant, a University Research Grant from Imperial Oil Ltd., and by a University Graduate Fellowship from the University of British Columbia. xi Chapter 1 Introduction It was the furthest point of navigation and the culminating point of my expe-rience. Joseph Conrad: Heart of Darkness 1.1 Background Limited aperture is a long standing problem in exploration seismology. Geophysicists are constantly dealing with a finite amount of data and consequently with truncation artifacts. An important part of the seismic technology for oil exploration still relies on two basic procedures: velocity analysis and stacking. With the development of new technologies and with the necessity of revealing more complicated seismic structures, high resolution techniques for velocity processing and multiple attenuation have become an important requirement. Thorson and Claerbout (1985) were the first to note a need for high resolution tech-niques for velocity processing to cope with finite amount of data in C M P processing. They address the problem by developing an inverse technique to retrieve velocity panels from C M P data. In another scenario, the pioneering work of John P. Burg (1975) on maximum entropy spectral analysis (MESA), shows how to estimate spectral estimators without imposing any implicit assumption about the samples of the time series that were 1 Chapter 1. Introduction 2 not recorded. Burg's technique assumes that a finite length time series is only a circum-stantial outcome of a longer process and therefore, it is not convenient to assume that values outside the time series gate are zero (missing samples). The Maximum Entropy principle (Jaynes, 1967) is the elegant mechanism utilized by Burg to avoid drawing any assumption about the missing information. Slant stacking and parabolic stacking entail two-dimensional operators routinely used to process seismic data. The resolution of these operators is affected by the aperture of the seismic array. In this context, the aperture plays a role equivalent to the length of the time series in time series analysis. Slant stack and parabolic stack operators are two members of the generalized discrete Radon transform (Beylkin, 1987). Slant stacking is usually used to decompose and/or filter signals that exhibit linear moveout. An example is vertical seismic profiling (VSP), where the composite seismo-gram is a superposition of seismic events with constant ray parameter. Slant stacks also perform an important role in wave equation theory, since they may be related to plane wave decomposition (Treitel et al., 1982). The parabolic transform maps a parabola into a point in the transform domain, although this is only true when the aperture of the array is infinite. If the seismic data consist of a superposition of parabolas, then the parabolic transform will decompose the seismogram into a superposition of isolated points. However, limited aperture introduces amplitude smearing into the transform domain and therefore the resolution or focusing power of the transform is deteriorated. The same reasoning is applicable in time series analysis where truncation introduces sidelobes and, consequently, difficulty arises in the identification of close spectral peaks. In CMP or shot gathers the seismic signal is organized in hyperbolic paths. In order to apply the parabolic transform to decompose these hyperbolas, the parabolic model Chapter 1. Introduction 3 must be first satisfied. Hampson (1986) proposed to validate the parabolic approxima-tion by means of a normal moveout (NMO) correction. After NMO, there is a range of hyperbolic events that may be accurately approximated by parabolic paths and therefore the conditions to successfully apply the parabolic transformation are satisfied. Yilmaz (1989) proposed to model CMP gathers after a t2 transformation of the temporal axis. Hampson (1986) and Yilmaz (1989) posed the problem in the frequency-offset (u; \u00E2\u0080\u0094 h) domain. The latter permits the problem to be broken down into several smaller inverse problems, one at each frequency in the seismic band. These researchers have adopted zero order quadratic regularization or damped least squares (Lines and Treitel, 1984; Tit-terigton, 1985) to design the parabolic operator. In this thesis, it is shown that damped least squares is not necessarily the best means by which to solve the problem. In fact, damped least squares imposes smoothness onto the transform domain and consequently, instead of sharpening up the image (enhance the focusing power of the transform), it tends to blur and introduce smearing into the transform domain. 1.2 Motivation My interest in this topic arose after investigating together with T.J. Ulrych a technique to estimate a side-lobe-free 1-D discrete Fourier transform, a problem originally suggested to us by Colin Walker. There are many approaches to estimate high resolution spectral estimates based on the idea of side lobe suppression. However, none of these approaches deals with the problem of estimating a high resolution discrete Fourier transform, and hence none is capable of estimating the spectral amplitude as well as the phase of the signal. The spectral estimation problem can be posed as a linear inverse problem where the target is the power spectral density that honors a few lags of the autocorrelation function (Priestly, 1982). However, since the autocorrelation is a phase-less function, only Chapter 1. Introduction 4 the spectral density can be recovered. To retrieve a high resolution estimate of amplitude and phase, instead of using autocorrelation constraints, we considered data constraints. A non-parametric method to estimate the high resolution DFT was developed by posing the problem as an inverse problem and using a regularization approach derived from a Bayesian description of the problem in probability space. Two dimensional applications were developed to estimate the spatial spectral signature of narrow and broad band seismic data (Sacchi and Ulrych, 1995a). My attention then became focused on slant stacks and parabolic stacks. In the frequency-offset space, the Radon operator looks very similar to the transformation mat-rix used to retrieve the DFT. The regularization scheme developed to estimate the side-lobes free DFT was re-examined to compute slant stacks and parabolic stacks free of truncation artifacts. The primary results of the regularization scheme devised to cope with limited aper-ture were presented by myself at the 64th Annual meeting of the Society of Exploration Geophysicists in Los Angeles, USA (Sacchi and Ulrych, 1994), later expanded and pub-lished (Sacchi and Ulrych, 1995b and c). These works were primarily focused on inverse velocity stacking, a name usually reserved for Thorson's (Thorson and Claerbout, 1985) technique. Although the ultimate goal of my research involves the estimation of aperture com-pensated slant stack and parabolic stack operators, there are other areas where I found that these algorithms may be applied. As geophysicists we live in a truncated world. We are always faced with the problem of data coverage. A simple deconvolution problem may be posed as an aperture problem where from a few samples of the central band of the seismic signal we attempt to reconstruct the missing or deteriorated low and high frequencies. It is in these types of situations that the correct regularization of the in-verse problem plays a crucial role. In particular, I would like to mention the problem of Chapter 1. Introduction 5 impedance reconstruction from band-limited seismic data, extensively studied by many researchers (Oldenburg et al., 1983; Walker and Ulrych, 1983). 1.3 Scope and contribution of this work The purpose of this thesis is to investigate new means of estimating Radon operators in situations where the physical size of the seismic survey (cable length) inhibits the correct identification of close signals. The proposed methodology does not replace the existing techniques to compute 2-D linear transforms but, rather, complements them. This thesis provides a novel inverse procedure to compute Radon operators. The Bayesian framework is adopted to derive the regularization strategy that is used to solve the inverse problem. The regularization term is derived by assuming a long tailed pdf to model the prior distribution of parameters. The inverse procedure is also used to estimate a high resolution 2-D discrete Fourier transform. The goal is to estimate from a limited number of receivers the 2-D spectral signature of a group of events that are recorded on a linear array of receivers. 1.4 Thesis outline The linear Radon transform or the slant stack transform is presented in Chapter 2. The stability/resolution tradeoff is also studied. This chapter details some pivotal concepts that are used throughout this thesis. Parabolic stacks as an alternative procedure for velocity processing are presented in Chapter 3. The validity of the parabolic approximation by means of a time stretching transformation and a normal moveout correction is also studied. Chapter 4 introduces a new procedure to enhance the resolution of slant stack and parabolic stack operators. The feasibility of the technique to process seismic data is Chapter 1. Introduction 6 examined with synthetic and real field examples. In Chapter 5, the regularization scheme presented in chapter 4 is used to estimate 2-D spectral estimators. This problem may appear rather disconnected from the rest of the thesis. However, the aperture problem has a significant impact in spectral ana-lysis problems. The reader will find that many of the ideas developed to treat aperture limitations in Radon operators may also be applied in a spectral analysis scenario. Finally, in Chapter 6, the core of the thesis is reviewed. Two important aspects are examined. First, the potential strength of the techniques to process real data when dealing with severe aperture limitations. Secondly, I provide a discussion on the main limitations of the techniques. Chapter 2 Slant Stacks Y aqui mt despido yo Que he relatao a mi modo, Males que conocen todos Pero que naides conto Jose Hernandez -.Martin Fierro 2.1 Introduction Different techniques have been devised to identify and/or filter linear events. Generally, they have the following common framework. First, they assume that a set of linear events are recorded on an array with discrete and limited coverage. Secondly, they assume that the noise is uncorrelated with the signals. In geophysics, linear event identification has been an active field of research. Two classic examples are vertical seismic profiles (VSP) (Hardage, 1985) and slowness vector estimation in seismographic arrays for earthquake detection and location (Goldstein and Archuleta, 1987). In VSP processing, linear event detection-estimation is used to identify and separate the principal components of the VSP data: the up-going and the down-going waves. A general strategy for event identification-estimation involves the following approach. First, the data are transformed to a new domain where each component may be isolated. Then, after masking the undesired components, the data are mapped back to the original 7 Chapter 2. Slant Stacks 8 domain retaining only the desired information. The so-called covariance methods, spectral matrix methods or eigenstructure meth-ods, that have been borrowed from the field of sonar processing (Bienvenu and Kopp, 1983; Wax et al., 1984), exploit the separability of the signal and noise covariance ma-trices. Although they have been developed to process narrow band signals, covariance methods can be applied to broad band applications by carrying out the processing at each narrow band frequency that comprises the broad band. These techniques have been successfully applied in VSP processing (Mari and Glangeaud, 1990; Mari and Gavin, 1990) and in wavefield decomposition of multi-component records (Rutty and Jackson, 1992). To fully separate two signals of different slowness by means of the spectral matrix approach, the signals must be uncorrelated. In other words, the inner product of their steering vectors must vanish. This is a very restrictive assumption that usually does not hold. This short-coming, however, can be overcome by un-correlating the signals by means of a frequency-spatial averaging procedure. The latter implies that some type of a priori information about the dips and the extension of the frequency smoothing must be provided (Mari and Gavin, 1990). Freire and Ulrych (1988) have applied the singular value decomposition (SVD) to separate down and up-going waves in VSP processing. Their technique is another way of interpreting the Karhunen-Loeve (KL) transform in-troduced in geophysics by Hemon and Mace (1978) and extensively studied by Jones (1985). Similar to the spectral matrix approach, the SVD or KL method requires prior knowledge of the dip of the target signal. Recently, Ulrych et al. (1995), have developed a hybrid technique to cope with the aforesaid problem. First, they apply an eigenvalue based metric (Pisarenko, 1972) to identify the dip of each primary wavefield: up-going and down-going waves. Then, this information is used to rotate the wavefield target to simulate a signal with infinite apparent velocity. Finally, the packing property for flat events of the SVD or KL transform (Jones, 1985) is used to isolate the target wavefield. Chapter 2. Slant Stacks 9 The Radon transform does not require any a priori knowledge of the apparent velocity of each linear component, but the performance of the transform is severely affected by the aperture of the array and by spatial undersampling (alias). In seismic processing, the Radon transform is commonly known as the r \u00E2\u0080\u0094 p (r denotes time and p ray para-meter) or slant stack transform. The original idea developed by Radon in 1917 (Deans, 1983), has provided a basic framework for many problems of image reconstruction in physics, astronomy, medicine, optics, nondestructive testing, and geophysics. In image processing, it is also called the Hough transform (Pratt, 1991), which may be regarded as a transformation of a Une in Cartesian coordinate space to a point in polar coordinate space. In geophysics, the properties of the Radon transform were examined by Phinney et al. (1981), Durrani and Bisset (1984) and Tatham (1984). Chapman (1981) developed exact formulas for a point source in cartesian or spherical coordinates, and for a line source in cylindrical coordinates. The relationship between the Radon transform and the plane wave decomposition is also well established (StofFa et al., 1981; Treitel et al., 1982). Least squares procedures to compute the Radon transform were investigated by Thorson and Claerbout (1985), Beylkin (1987) and Rostov (1990). These authors showed how to mitigate the smearing caused by the finite aperture. Recently, Zhou and Greenhalgh (1994) linked the least squares solution to p-dependent Wiener filters. These researchers derived the slant stack formulas in the continuous domain, but the resulting algorithms are identical to those obtained by other researchers ( Beylkin, 1987; Kostov, 1990). In order to avoid the inversion of prohibitively large matrices the problem may be posed in the frequency-space domain (/ \u00E2\u0080\u0094 x). This technique was adopted by Beylkin (1987), Kostov (1990), Foster and Mosher (1992), and recently by Zhou and Green-halgh (1994). This allows us to solve several small problems in the band that comprises the signal. Some stability concerns arise when the problem is tackled in this manner. Chapter 2. Slant Stacks 10 Particularly, a least squares solution can be extremely unstable at low frequencies. In ad-dition, it is interesting that slant stacks can be also computed in the time-space domain. Thorson and Claerbout (1985) and, recently Yilmaz and Tanner (1994), have presented high resolution least squares slant stack operators designed in time-space domain. Their procedures use an iterative inversion scheme especially devised to solve large linear sparse operators. Thorson and Claerbout (1985) have also shown how to update in each itera-tion the variances of the model to drive the solution to minimum entropy. Yilmaz and Taner (1994) have also developed an interesting scheme based on fuzzy logic to mitigate the alias. This chapter is organized as follows. First, I provide the basic definitions of the continuous Radon transform in the x \u00E2\u0080\u0094 t and r\u00E2\u0080\u0094p planes. Then, two different approaches are used to formulate the Radon pair. The first definition, the conventional slant stack operator, uses the forward mapping to compute the Radon space and a deconvolution operator to map the r \u00E2\u0080\u0094 p space into the x \u00E2\u0080\u0094 t space. The second definition, the inverse slant stack operator, uses a deconvolution operator to compute the Radon space and a simple mapping to recover the x \u00E2\u0080\u0094 t space. Discrete formulas for the aforesaid operators are also derived. In the discrete case, I analyse two crucial problems that constantly arise in geophysics: stability and resolution. These problems are formally examined using prolate spheroidal discrete sequences (Slepian and Sonenblick, 1965). Finally, the theoretical analysis is illustrated with numerical simulations. 2.2 Slant stacks 2.2.1 The slant stack operator (conventional definition) Let u(h,t) represent a seismic signal. Throughout this thesis variable t designates the time and h the offset or range. For a continuous array we define the slant stack by means Chapter 2. Slant Stacks 11 of the following transformation v(p, r) = (Cu)(p, r) = f\u00C2\u00B0\u00C2\u00B0 u(h, t = r + hp)dh. (2.1) J\u00E2\u0080\u0094oo Where p and r denote the slope or ray parameter and the intercept time, respectively. v(p, r) is used to designate the signal in the r \u00E2\u0080\u0094 p domain. The adjoint transform C* is given by u(h, t) = (\u00C2\u00A3*v)(p, T) = r v{p, t = r- hp)dp. (2.2) J\u00E2\u0080\u0094oo In the frequency domain, the pair of transformations are given by, V(p,u>) = [\u00C2\u00B0\u00C2\u00B0 U(h,w)eiuphdh, (2.3) J \u00E2\u0080\u0094OO U(h,u)= f\u00C2\u00B0\u00C2\u00B0 V{p,w)e-iuphdp, (2.4) J\u00E2\u0080\u0094oo substituting, (2.3) into (2.4) yields U(h',w) / e-^^'Updh' (2.5) -OO \u00E2\u0080\u0094 oo which may be written as follows U(h,w) = U(h,u)*p(h,u), (2.6) Chapter 2. Slant Stacks 12 where * denotes convolution and the function p is given by />(*,\u00C2\u00AB)= r e-^hdp, (2.7) J\u00E2\u0080\u0094oo making the substitution z \u00E2\u0080\u0094 \u00E2\u0080\u0094up equation (2.7) becomes /O 1 O71-.\u00E2\u0080\u009EHe d z = M H h ) - (2'8) The convolution operator is a delta function with respect to the variable h. Using the property of the 8 function, U(h,u)= fcU{h,u,)*6(h) (Z.y) the inversion formula becomes, U(h,u)=l-^U(h,u). (2.10) The inverse is computed in two steps. First, the adjoint is used to evaluate U(h,u). Then, U(h,(jj) is multiplied by the frequency response of the p filter. The conventional slant stack pair in the frequency domain results in, V(p,w) = r U(h,Lo)eiu,phdh, J\u00E2\u0080\u009400 U(h,u) = Mj^Vip^e-^dp. (2.11) Chapter 2. Slant Stacks 13 Now, consider that the range of p is a finite interval, p \u00E2\u0082\u00AC [\u00E2\u0080\u0094P, P]. This case leads to the following p filter, ^ = / _ V ^ ? = 2 p ! ! ^ . (2.12) Substituting (2.12) in (2.6), It is evident that the data may be recovered after solving a deconvolution problem. Spatial deconvolution is required since the infinite range of the variable p is truncated to a finite range. The wavenumber response of the p filter has the following expression: p(k,u) = r p(h,u)eikhdh J\u00E2\u0080\u0094oo = r r e-^-^dhdp J-oo J-P fp = / 8(u>p \u00E2\u0080\u0094 k)dp 1 twP = - S(k' - k)dk' U> J-u>P (2.14) k < \u>P\ otherwise According to the last equation, the spatial deconvolution will be unstable if the wavenum-bers in the data he outside the range [\u00E2\u0080\u0094wP,u>P]. Equation (2.14) also shows that the deconvolution is unstable at low frequencies. Chapter 2. Slant Stacks 14 2.2.2 The inverse slant stack operator The definition of the forward slant stack operator and its adjoint may be changed to construct another slant stack pair, u(h, i) = (C*v)(p, r)= f \u00C2\u00B0\u00C2\u00B0 v(h, t = r- hp)dp (2.15) J oo v(p, T) = (Cu)(p, T) = r u(h, t = r + hp)dh, (2.16) J\u00E2\u0080\u0094oo the pair of transformations can be posed in the frequency-offset domain, U(h,cj) = r V(p,u)e-iui>hdp, (2.17) J\u00E2\u0080\u0094oo V(p,w) = f\u00C2\u00B0\u00C2\u00B0 U(h,w)eiuphdh. (2.18) Substituting, (2.17) into (2.18) yields, V(p,v) = r V(p',w) f\u00C2\u00B0\u00C2\u00B0 e-^-^dhdp', (2.19) J\u00E2\u0080\u0094oo J\u00E2\u0080\u0094oo where now, the convolution is with respect to the variable p, and the convolutional operator is given by /oo 1 o_ - \u00E2\u0080\u009E M \u00C2\u00A3 * = M * W - ( 2 ' 2 0 ) Chapter 2. Slant Stacks 15 The 7 filter is a delta function with respect to the variable p. Therefore, equation (2.19) becomes, 27T V(P,u) =-r-V(p,u) (2.21) M or equivalently V(p,u,) = glv-(p,u,). (2.22) From the above derivation, it is clear that the p and the 7 filters have the same frequency response. Finally, the slant stack pair becomes, V{p,u) = M I^U(h,u,)e^hdh, U{h,u) = f\u00C2\u00B0\u00C2\u00B0 V(h,w)e-iupdp. (2.23) J\u00E2\u0080\u009400 Assuming that h G [\u00E2\u0080\u0094H, H] (finite aperture), the 7 filter has the following structure 7 ( p | W ) = f \ ^ d h = 2 H ^ ^ . (2.24) J-H uHp Hence, V(p, u>) may be calculated by solving the following integral equation, v M ^ i y i M ^ ^ i i P : ( , 2 8 ) Chapter 2. Slant Stacks 16 After a comparison of the slant stacks pairs, equations (2.11) and (2.23), it is clear that a deconvolution procedure is required in both cases. In the conventional slant stack transform, the deconvolution is necessary to recover the data from the r \u00E2\u0080\u0094 p space. In the inverse slant stack operator the deconvolution process is required to estimate the r\u00E2\u0080\u0094p space. The truncation effect of the variable p may be alleviated by choosing the proper region of support of the transform. The truncation of the variable h is associated with the resolution of the transform and cannot be alleviated by simple means. Generally, both the variables h and p are truncated. Thus, deconvolution should be carried out in both the forward and inverse transform (Zhou and Greenhalgh, 1994). However, the range of p may be chosen in such a way that most of the energy in the signal lies within this range. 2.2.3 The sampling theorem for slant stacks Assuming that the wavefield is evenly sampled according to U(nAh, u>), n = 0, \u00C2\u00B1 1 , \u00C2\u00B1 2 , . . . , the relationship between the r \u00E2\u0080\u0094 p and the h \u00E2\u0080\u0094 t spaces is given by where V(p,u>) denotes the slant stack corresponding to a continuous wavefield U(h,u>). The integration domain can be decomposed into small subdomains as follows, (2.26) (2.27) Chapter 2. Slant Stacks 17 since e * 2 i m f c = 1 Vnfc, the last equation may be written in the following form U(nAh, u) = H j f * % (p, o . J e - ^ ^ p , (2.28) w Ah where the relationship between the slant stack of the continuous signal and the one corresponding to the sampled wavefield, Vd(p,u>) , is given by oo 7T Vd(p,u>) = \u00C2\u00A3 V(p + 2k\u00E2\u0080\u0094-,\u00C2\u00AB). (2.29) k=\u00E2\u0080\u0094oo Thus, the discrete signal has an w - p representation with support in the range p \u00E2\u0082\u00AC l-^h>uh]- T h e components with slope p - 2 ^ , p + 2 ^ , p - 4 ^ , p + 4 ^ , . . . will appear to have slope p and every slope outside the range ( \u00E2\u0080\u0094 ^ ^ h ) will have an alias inside this range. If the continuous signal has all the components inside that range, the aliased components do not exist and therefore we can write Vd(p, u>) = V(p, u>). It is clear from the above discussion that spatial sampling must be chosen so as to avoid the aliasing effect. If P = Pmoas = \u00E2\u0080\u0094Pmin, the following relationship guarantees the absence of alias, < , (2-30) & ~ J max where fmax = u ; m o x / 2 7 T is the maximum temporal frequency of the seismic signal. The product P fmax is also the maximum wavenumber. Similarly, if Ah is given, the maximum ray parameter that can be retrieved without alias is given by Chapter 2. Slant Stacks 18 Pmax \u00E2\u0080\u0094 0 A i f \u00E2\u0080\u00A2 (2-31) For a non-symmetric slant stack, Pmax ^ ~Pmin, equation (2.30) is modified as follows (Turner, 1990), M < pH\u00E2\u0080\u0094 , (2-32) where P' = \ P m a x - P m i n \ . 2.3 Discrete slant stacks Discrete versions of equations (2.1) and (2.2) are obtained by replacing integrals by summations and imposing finite limits. First, assume that the seismogram contains N = Lf \u00E2\u0080\u0094 Ln traces, where the indices Lf and Ln denote far and near offset traces respectively. v(p, r) = (\u00C2\u00A3u)(p, T) = t\u00C2\u00BB(fc|, r + hlP)Ah,, (2.33) l=Ln where Ahi = (/ij+i \u00E2\u0080\u0094 hi) for J = L n , . . . , Lf \u00E2\u0080\u0094 1. Similarly, we approximate (2.2) by the following expression u(M) = (C*V)(T,P) = ^2 v(h,t-hp)APj (2.34) where Apj = (pj+i \u00E2\u0080\u0094 Pj) for j = Jmin, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, Jmax \u00E2\u0080\u0094 1- Taking the Fourier transform of equations (2.31) and (2.32) yields Chapter 2. Slant Stacks 19 V(p, /) = \u00C2\u00A3 U(hh fy^Aht (2.35) J max U(h, /) = \u00C2\u00A3 ^(P. f)e-2Hfh*APj . (2.36) Using matrix notation it is possible to rewrite the slant stack and its adjoint as follows (/ is omitted to avoid notational clutter), v = FW uu = Lu (2.37) u = FHWvy - L*v (2.38) where W u = diag[A^L n,..., AhLf] and W\u00E2\u0080\u009E = diag[Ap J r a i n,..., ApJmax] account for the irregular geometry of the array and for irregular sampling of the variable p, respectively. The matrix F has dimension (Jmax \u00E2\u0080\u0094 Jmin + 1) x (Lf \u00E2\u0080\u0094 Ln + 1), and elements given by Fji = e-i2*fhlP>. (2.39) The operators L and L* form an adjoint pair. The matrix L is the forward operator and L* denotes the adjoint operator. The following useful relationship is proved in Appendix A L* - W U X L H W\u00E2\u0080\u009E . (2.40) Chapter 2. Slant Stacks 20 This formula shows that, apart from a constant factor, the Hermitian transpose of the forward mapping coincides with the adjoint when the discretization is regular in both spaces. 2.3.1 The discrete slant stack operator (conventional definition) The slant stack operator, equation (2.37), maps the t \u00E2\u0080\u0094 x space into the r \u00E2\u0080\u0094 p domain; the adjoint, equation (2.38), maps the r \u00E2\u0080\u0094 p domain into the t \u00E2\u0080\u0094 x domain. It is clear that since L is non-orthogonal L and L* do not constitute an inverse pair. Given v = Lu, the problem is how to recover u. A relationship between u and u is obtained after substituting (2.37) into (2.38) u = L*Lu . (2.41) Equation (2.41) is uniquely invertible in / \u00E2\u0082\u00AC B provided that det(L*L) ^ 0 in the band u= (L*L) _ 1u V ' (2.42) = G _ 1 u . The N x N matrix G = L*L represents a discrete version of the p filter. The pair of transformations which map a signal from / \u00E2\u0080\u0094 h to / \u00E2\u0080\u0094 p and vice-versa is given by v = Lu u= G ^ L ' v . (2.43) Chapter 2. Slant Stacks 21 The vector v always exists since it is obtained by means of a simple mapping. Both expressions constitute an inverse pair when the inverse of G exits. The forward and inverse pair does not permit to adequately model the signal when additive noise is present. If the data are contaminated with noise, the noise is mapped to the Radon domain. 2.3.2 The minimum norm discrete inverse slant stack operator Thus far I have described the reconstruction of the data from the slant stack domain by means of Radon's inversion formula for discrete signals (2.11). The resolution of the Radon operator is mainly affected by the aperture of the array. This problem, however, can be diminished by recasting the problem as an inverse problem. Assuming that the data are the result of some transformation on the Radon domain, the problem is how to solve L*v = u. This is the discrete case associated with the continuous pair discussed in section 2.2.2: u = F H W\u00E2\u0080\u009Ev = L*v (2.44) v = FW u u = Lu. (2.45) Two cases may arise depending of the number of unknowns and observations (traces). First, I will consider the underdetermined case (more unknowns than observations). Oversampling the Radon parameter p may lead to an underdetermined problem. The latter may help to remove ambiguities due to the discretization. Oversampling, however, as it is very well known, does not improve the resolution of the slant stack operator (Yilmaz, 1994). The underdetermined problem is solved by selecting a regularization strategy. A common means of computing a regularized solution is by the concept of the minimum norm. In effect, the solution chosen from among all the possible solutions that Chapter 2. Slant Stacks 22 fit the data is the one that has minimum energy (minimum Euclidean norm). This strat-egy is also called zero-order quadratic regularization (Titterigton, 1985). The regularized solution is calculated after solving the following problem minimize v H W \u00E2\u0080\u009E v , (2.46) subject to L*v = u . Incorporating the vector of Lagrange multipliers, A, the objective function of the problem becomes J = v H W \u00E2\u0080\u009E v + A*W u(L*v - u). (2.47) and minimizing J subject to data constraints leads to v = L(L*L)\"1u. (2.48) One finds that the operator L (conventional definition) has been replaced by L(L*L) _ 1 . The forward and inverse operators become v= L G V (2.49) u= L*v. The Radon operator exists provided that the inverse of G exists. The operator G plays an important role in the definition of the Radon pair. In the conventional definition, G _ 1 Chapter 2. Slant Stacks 23 is used to recover the data from the Radon domain (2.43), while in the minimum norm solution, G 1 is used to construct the Radon operator (2.49). 2.3.3 Minimum norm solution with quadratic data constraints In the previous section the Radon operator was computed using the concept of minimum norm. In this section, I will show the relationship that exists between the minimum norm solution for underdetermined problems and the technique called damped least squares. First, I will assume that the problem is overdetermined, M < N. This may appear rather misleading; the observation space has finite dimension (only N receivers are located in the field) while the model space, before discretization, lies in an infinite dimensional space. The least squares solution to the problem, L*v = u, is computed by minimizing the squared error S, S= e HW ue = (L*v - u)HWu(L*v - u). (2.50) The minimization with respect to the complex vector v yields L* H W u L*v = L * H W u u. (2.51) Combining the result proved in the Appendix A, L*H = W\u00E2\u0080\u009ELW. u -1 (2.52) with equation (2.51) results in the following expression Chapter 2. Slant Stacks 24 LL*v = L u . (2.53) Finally, assuming that the inverse of LL* exits, the least squares solution becomes v = (LL*)- 1 Lu. (2.54) Equation (2.54) is the unconstrained least squares solution. To avoid singularities the solution is usually stabilized by adding a small perturbation to the diagonal of LL*. This perturbation is called the damping factor or the ridge regression parameter. The statistical literature often refers to f3 as a smoothing parameter or a hyper-parameter. Formally, the damped least squares solution may be derived by minimizing the following objective function (Lines and Treitel, 1984), J, . = /3v H W v v + S, (2.55) after taking derivatives and equating to zero v = (01 + L L * ) - x L u . (2.56) The parameter is also the Lagrange multiplier of a minimum norm problem with a single quadratic constraint given by S. Now, considering the following identities, Chapter 2. Slant Stacks 25 /3L + LL*L = L(/3I + L*L) V ' (2.57) = (/3I + LL*)L, and since the matrices (/3I + L*L) and (/3I + LL*) are positive definite provided that /3 > 0, the following identity is valid (01 + LL*)\" X L = L(/tt + L * L ) - 1 , (2.58) and when combined with equation (2.56) yields v = L()9I + L * L ) - 1 u . (2.59) Equations (2.56) and (2.59) may be used to either solve underdetermined and/or over-determined problems. In general, it is more convenient to adopt equation (2.56) for overdetermined problems and equation (2.59) for the underdetermined case. This last assessment can be verified by considering the dimension of the matrices L*L and LL*. If the matrix G = L*L has full rank (underdetermined problem), it is possible to choose /? = 0. Equation (2.59) becomes the minimum norm solution with exact data constraints. It is interesting to note that a single quadratic constraint, S, may be used to obtain the minimum norm solution with N exact constraints. It appears that the problem formulated in (2.46) might be solved using the objective function Ju, equation (2.55), with (3 = 0. This is not true. The trick is valid only when applied to the solution of the problem and not the objective function Chapter 2. Slant Stacks 26 At this point one would think that if the problem is underdetermined, we may be able to use equation (2.59) with /3 = 0. However, the matrix G = L*L may have eigenvalues that are very small and may even fall below machine precision. This effect is particularly drastic at low frequencies. Any attempt to invert G, if it is possible, will lead to an erroneous result. 2.3.4 On the stability of the operator G Further insight into the stability of the slant stack pair is gained by considering the stability of the operator G. The entire analysis is focused on two approaches to construct the slant stack pair. In the first definition of the Radon operator, the existence and the stability of Radon's inversion formula is subject to the existence and stability of the operator G. In fact, only when G _ 1 exists, are both transformations inverses of each other. This is also applicable to the alternative definition given in the preceeding section using the concept of minimum norm. In the former situation, the stability of the operator G must be carefully considered to reconstruct the data from the r\u00E2\u0080\u0094p space. In the latter, the stability of the operator must be taken into account to design the slant stack operator. The elements of G are given by giv = Ahi \u00C2\u00A3 ApjeW'^-W, (2.60) when Ah is constant the relative offset is given by hi \u00E2\u0080\u0094 hi> = Ah(l \u00E2\u0080\u0094 V). Under this condition, the matrix G reduces to a Toeplitz structure. It is natural therefore to write 9i,i> as 9iv=g(l-V). (2.61) Chapter 2. Slant Stacks 27 If and Ap are constant, equation (2.60) corresponds to a geometric series with the following sum, 9iv = \u00C2\u00B1hAp ^ ^ 1 , , , ^ e\u00C2\u00AB . (2.62) Letting J = Jmax \u00E2\u0080\u0094 \u00E2\u0080\u0094 Jmin and M = 23 + 1, where M is the number of slopes or unknowns and introducing the variable, 9 = 2irfApAh(l \u00E2\u0080\u0094 /'), equation (2.62) becomes giv = 2wAhApDj(6), (2.63) where the function Dj(0) is known as the Dirichlet kernel, The function Dj(6) plays an important role in Fourier analysis (Priestley, 1982). The discrete Fourier transform (DFT) of a finite length sequence is the convolution of the Dirichlet kernel with the DFT of the infinite sequence. Thus, the transform of the observed finite sequence is a distorted version of the infinite sequence transform. The kernel Dj{6) has a large peak at 9 = 0 of magnitude (2 J+ l ) / 2 7 r , together with secondary peaks at approximately 9 = \u00C2\u00B15TT /(2J + 1), \u00C2\u00B1 9 T T / ( 2 J + 1), \u00C2\u00B1 1 3 T T / ( 2 J + 1),.... Since Ap and (2J + 1) are not independent, the following manipulation is convenient to continue with the analysis. Letting Ap = 2P/(2J + 1) where 2P denotes the region of support of the ray parameter, equation (2.62) becomes, Chapter 2. Slant Stacks 28 9(1 - n = 2pAh sin(2ir fPAhjl - I')) (2J + 1) ^ ( ^ / P A f c M ) * (2.65) When the number of slopes 2 J + 1 goes to infinity, while P is bounded, the elements of G reduce to where W = PfAh. The matrix (l \u00E2\u0080\u0094 I'), which arises in maximum time-frequency con-centration problems, has N eigenvectors which conform to the discrete prolate spheroidal sequences (Slepian, 1977). In time-frequency maximum concentration problems WN is also called the time-bandwidth product (Slepian and Pollack, 1961). When W < 1/2, the eigenvalues of g(l \u00E2\u0080\u0094 V) have the following properties: Property 1. The matrix (l \u00E2\u0080\u0094 /'), I, I' = 0,..., N \u00E2\u0080\u0094 1 has only N non-zero eigenvalues. They are distinct, real and positive and we order them so that 9(1 - n 1 sin(27rV7(Z - Z;)) (2.66) 1 > A0(iV, W) > X^N, W) > . . . > XN-^N, W) > 0. Property 2. The eigenvalues obey the symmetry law Xk(N,^-W) = l-XN-1\u00E2\u0080\u009Ek{N,W) Chapter 2. Slant Stacks 29 Property 3. For any arbitrarily small e > 0 there exits a constant Ce so that Xk>l-efork< 2WN - CemirNW l-e>\k>efork< 2CehnrNW Property 4. [Slepian, 1977] As N -* oo, Xk(N,W) -> 1 if k = 2WN(1 - e), while Xk(N, W) -> 0, if k = 2WN(1 + e). This is true for any e satisfying 1 > e > 0. In fact, this is a consequence of property 3. Note that all the eigenvalues are positive and different from zero which is a property of any Hermitian matrix (Strang, 1976). Strictly speaking, this does not guarantee the stability of the inverse, since several eigenvalues can be close to zero and even fall below machine precision. This is particularly true for large values of N. Table 2.1 shows the eigenvalues of the 11 x 11 matrix for W = 0.2, W = 0.3 and W = 0.4. Figure 2.1 shows the dependence of the eigenvalues on W for N = 11 and N = 25. For N > 25 the eigenvalues cannot be computed by direct solution of an eigen-problem. In fact, asymptotic expressions must be used to cope with this situation (Slepian ,1977). Property 4 highlights some interesting features of the eigen-spectra of the matrix {l - V), lt I' = 0,... N - 1 for W = 0.2, 0.3, 0.4 The aforementioned properties are valid for W < 1/2. When W = n/2, n E Z the matrix (l \u00E2\u0080\u0094 V) degenerates to the identity matrix, {l - V) = 2W8l>v . (2.67) In addition, when W > 1/2 the following property is valid: Property 5. When | > W' > with neZ Xk(N, W') = n + X(N, W), W = W' - ^ . So far I have described some relevant properties of the eigen-spectra of the matrix (f>{l \u00E2\u0080\u0094 V). One can use these properties to infer a stability criterion for the inversion of the matrix G. In particular, the condition number, Cg, of the matrix G may be assessed by means of the condition number of the matrix (l \u00E2\u0080\u0094 I'), C^,. The condition number is Chapter 2. Slant Stacks 32 defined as the ratio of the largest eigenvalue to the smallest eigenvalue. For a stability analysis, it is preferable to estimate the condition number since it is a scale-free criterion. Since the maximum eigenvalue is always bounded, it is also possible to draw similar conclusions by analysing the smallest eigenvalue of (l \u00E2\u0080\u0094 /'). Three different situations may arise depending on the value of W. Using Properties 1, 2, 3, and 4, it immediately follows that CJNW)- Xo{N>W) W>1 (2.68) Ci(N,W) = l,W=l, (2.69) Accordingly to Property 3, when N is large, the condition number becomes C*{N,W)->oo, W>\, (2.71) n C(N,W) = 1,W = -, (2.72) C,(N, W') = ^ , W = W ' - ^ < W ' < ^ n 2 2 2 (2.73) The condition number of {l \u00E2\u0080\u0094 V) may be computed by means of the condition number of the Toeplitz form g(l \u00E2\u0080\u0094 V). Since the condition number is a ratio of eigenvalues it is easy to see that Chapter 2. Slant Stacks 33 Cg(N, W) = C+(N, W). (2.74) At this point, some comments follow immediately. First, the condition number can be prohibitively large when W < 1/2. Secondly, for values of W > 1/2 the condition number stays close to 1. The latter may appear an advantage, but recalling the definition of the variable W the right hand side of the second inequality is the spatial wavenumber, while the left hand side is the maximum admissible wavenumber (Nyquist wavenumber). It is clear that when W > 1/2 the transform is aliased. Consequently, W = 1/2 is the upper limit that guarantees the absence of alias. The lower frequency limit is related to the stability of the inverse. Obviously, the inverse of G does not exist at zero frequency. The latter implies that the DC level cannot be recovered (it is clear that the DC level lies in the null space of the problem.) Using continuity arguments, it follows that the inverse of g(l \u00E2\u0080\u0094 I') is unstable at low frequencies. This situation is well reflected in the condition number. At this point, some comments on the computation of the condition number are ne-cessary. The most serious problem arises when large eigenvalues begin to cluster very close to 1. Since the values 1 \u00E2\u0080\u0094 A* fall below machine precision, eigen-problem routines cannot resolve the difference between two consecutive eigenvalues. This is particularly important when N is large. In this case the lower order eigenvalues cannot be computed by means of any eigenvalue routine. Slepian (1976) provided an asymptotic expansion for the eigenvalues of the matrix [l \u00E2\u0080\u0094 /') for fixed k, large N and W < 1/2. I am not Chapter 2. Slant Stacks 34 going to derive the following expression since it is beyond the scope of this thesis, 1 - Xk(N, W) = ir*(fc!)-12<14*+9>/4[2 - a]-( f e +5)jV f c +' e\"^ a = 1 - COS(2TW), (2.76) 7 = M1 + ^ ] . Thus, the condition number becomes C+(N, W) = t 1 L \ (2.77) where d and 7 may be computed with W = 1/2 \u00E2\u0080\u0094 W. The fit with Ao is very good for N > 6 and > 0.1. In expression (2.77) I have replaced \N-i(N,W) by X0(N,W) according to property 2. This function can be tabulated for different values of N, W to estimate the lower limit of W and hence, the low frequency cutoff, fmin, that guarantees the stability of the transform. The inverse of G may be stabilized using one of the following tricks: \u00E2\u0080\u00A2 by shifting the signal to higher frequencies where the invertibility of g(l \u00E2\u0080\u0094 /') is guaranteed (Beylkin, 1987). Unfortunately, some wavenumbers may be moved above the Nyquist limit. \u00E2\u0080\u00A2 by adding a small perturbation to the principal diagonal of g{l \u00E2\u0080\u0094 V). In this thesis, I have adopted the second strategy. Generally, a small perturbation to the diagonal of G is sufficient to guarantee the invertibility of the matrix. If f3 denotes a positive scalar, the perturbed matrix is written as Chapter 2. Slant Stacks 35 g(l - I') = \{l - V) + I, I' = 0, N - 1. (2.78) The condition number of the perturbed matrix becomes, Cg(N,W) = X0(N, W) + f{3 (2.79) Note that, according to Property 5, when N is large the condition number reduces to it is clear that Cg increases when / decreases and that the effect of (3 is dominant at low frequencies. 2.3.5 The data and model resolution matrices Consider an inverse operator that solves the problem u = L*v. An estimate of the model parameters can be written as v e , t = L + u , where L + denotes the inverse operator. Substituting v\"' into the equation L*v = u leads to the following relationship between the predicted data, u p r e d and the observed data, u, C, = {l + fP)/f0, (2.80) u' red = L*v e\" = L * L + u = R u U . (2.81) The data resolution matrix, Ru , describes how close the predicted data are to the actual observations. When = I, the data are perfectly resolved. Chapter 2. Slant Stacks 36 Similarly, the model resolution matrix is defined by Rv, where v\"' = L + L*v t r u e = (2.82) When R \u00E2\u0080\u009E = I each model parameter is perfectly resolved. On the other hand, when R\u00E2\u0080\u009E / I the estimated parameters are weighted averages of the \"true model\" parameters. In fact, one may define an effective averaging distance, d, that quantifies the departure of R\u00E2\u0080\u009E from the identity. This may be done by inspecting the width of the sidelobes of the matrix R. As an example, consider two monochromatic linear events with slopes p\ and P2. If d < \p2 \u00E2\u0080\u0094 pi|, the slant stack operator can identify both events. Therefore, v e 4 t will exhibit two narrow peaks corresponding to the events with slopes p\ and p^. Conversely, when this condition is not met, the slant stack operator cannot resolve both events, and the resulting model will show a broad single peak located somewhere between pi and j>2 \u00E2\u0080\u00A2 Different measures of resolution have been studied in the context of time series and spectral analysis (Marple, 1986). I have found the idea of spreading functions to be very attractive, not only because it highlights some important aspects concerning the resolution of linear operators but because of its historical importance. The concept of spreading functions and averaging distance form part of the first formal approach to geophysical inverse problems (Backus and Gilbert, 1968). The spreading function may be defined as where the subscript F stands for the Frobenious norm and I for the identity matrix. Equation (2.83) is also called the Dirichlet spread function. Notice that when the model Spread^) = ||I - R. | |JI = E E ( 4 i \" *wi)2 - (2.83) \u00C2\u00BB 3 Chapter 2. Slant Stacks 37 resolution matrix approaches the identity matrix, the spreading function tends to zero. The foregoing discussion leads to two cases. First, consider the unconstrained least squares solution for the overdetermined, equation (2.54), = L+L* = (LL' ) - 1 LL* (2.84) = I, and consequently the spreading function becomes Spread(R\u00E2\u0080\u009E) = 0. (2.85) In the underdetermined case, equation (2.48), the model resolution matrix becomes R v = L+L* (2.86) = L(L*L)- X L*. It is straightforward to verify that the matrix I \u00E2\u0080\u0094 R v is idempotent, hence, the following properties may be used to derive an expression for the spreading function: \u00E2\u0080\u00A2 If A is idempotent, A 2 = A, (Strang, 1976). The Frobenious norm of A is given by: | | A | \u00C2\u00A3 = Trace(AHA). (2.87) \u00E2\u0080\u00A2 For any idempotent matrix, Trace(A) = Rank(A). (2.88) Chapter 2. Slant Stacks 38 Letting A = I \u00E2\u0080\u0094 R v , the spreading function becomes Spread(R\u00E2\u0080\u009E) = Trace(I) - Trace(R\u00E2\u0080\u009E) (2.89) = M-N. In the underdetermined problem the spreading function is equal to the difference between the number of unknowns (slopes) and the number of observations (traces). The preceding analysis suggests that the overdetermined option is better than the underdetermined one. However, any assessment based on spreading functions must be carefully considered. The model resolution matrix gives an input/output relationship between two vectors. However, it is obvious that the so called \"true model\" lies in a physical space of infinite dimension. The vectors v and v t r u e are a simple discrete representation of a physical space. A simple example may help to clarify the problem. Assume that a single linear event is recorded on an array of N = 3 receivers. Then, according to the model resolution matrix, only M = 3 slopes are sufficient to perfectly resolve the model. Obviously, this is only true if one of the tentative slopes pi,i = 1,2,3 coincides with the actual slope of the linear event. If this is not the case, the vector v will have non-zero contributions in all of its entries. A similar problem is encountered in harmonic retrieval from a finite length time series. In this problem M frequencies and the associated amplitudes and phases are retrieved from a complex time series of N samples (M < N.) The problem is usually solved in two steps. First the frequencies are estimated, then the amplitudes and phases using a least squares procedure or any other method that minimizes some measure of closeness between the observed and the predicted time series. The first step is crucial since an incorrect estimation of the frequencies will incorporate a bias in the estimation of the amplitudes and phases. It is clear that the first problem, the retrieval of the frequencies, is an Chapter 2. Slant Stacks 39 underdetermined problem (the frequency is a continuous variable) which may be solved by means of an estimate of the power spectral density (PSD). The key is that the PSD is computed by solving a linearly underdetermined problem where a few samples of the autocorrelation function are the data constraints. Two examples that come to my mind are the periodogram, that is the PSD estimate with minimum energy (minimum norm solution), and the maximum entropy PSD which maximizes the entropy of a Gaussian process (Burg, 1975). 2.3.6 Resolution analysis by means of the spectral expansion First I will address the resolution problem for the conventional slant stack operator. To this end, I will assume that a monochromatic plane wave u is recorded with a symmetric array of N = 2L + 1 receivers. The ray parameter of the wave is denoted by p, u, = C - \u00C2\u00BB \u00C2\u00BB / P A M I - I ) . (2.90) Substituting the last equation into the expression for the conventional slant stack, Lv = u, we have L vk = ApAh J2 e i 2 , r /(p-p*)A W . (2.91) l=-L This equation has been already met (equation (2.63)). The sum of the geometric series is replaced by a Dirichlet kernel, DL(0), vk = 2irApAhDL{e) (2.92) Chapter 2. Slant Stacks 40 where 8 = 2irf(p \u00E2\u0080\u0094 pk)Ah. Expression (2.92) has a peak at p = pk, which corresponds to the slope of the plane wave. In the presence of many plane waves, vj, is a superposition of Dirichlet kernels. Assuming two plane waves with a slope difference of Sp, one may ask the following question: what is the minimum Sp that is discernible by the slant stack operator? One measure of resolution may be the first zero crossing, 9 = \u00C2\u00B12TT/(2L + 1), of the Dirichlet kernel. Using this measure, it follows that S p fAh(2L + 1) TT^l\u00E2\u0080\u0094 \u00E2\u0080\u00A2 (2-93) /Aperture This relationship is also known in optics as \"Rayleigh's Criterion\", (Born and Wolf, 1980), and simply states that the resolution is proportional to the inverse of the aperture. When the slant stack is computed using the minimum norm criterion (2.49) the resolution analysis may be tackled as follows. When M is large, the following expression is valid (section 2.3.4) G = k+- . (2-94) The eigenvectors of are the prolate spheroidal discrete sequences. The eigenvalues of have been extensively studied in a preceding section. The eigenvalue problem is written as, ^Grk(N, W) = Xk(N, W)rh(N, W). (2.95) Chapter 2. Slant Stacks 41 A certain economy in notation is provided by defining, jG = R A R H , (2.96) where R is an orthogonal matrix where each column is an eigenvector Tk(N, W), and A is a diagonal matrix with elements given by A^i = Xi(N, W). Equation (2.96) is the spectral expansion of the Hermitian matrix f~lG. Since R is orthogonal, the inverse of G may be computed as G 1 = R A _ 1 R H . (2.97) Now, take for instance the minimum norm solution (2.49) v = L G - ' u = i L R A \" 1 R H u / (2.98) = } L E f e \u00C2\u00A3 r f e ( r ? u ) where the coefficients jk are given by r fu The key to understanding resolution is understanding the relationship between the mag-nitude of Afc(iV, W) and W and N. It has been shown that the eigenvalues are bounded Chapter 2. Slant Stacks 42 above by one for all k, and plunge to zero for values of k > 2WN. Beyond k = 2WN, the eigenvalues fall quickly at a fairly constant rate, such that Afc/Afc+1 is approximately equal to ten (Slepian and Sonnenblick, 1965). Therefore, one may adopt k' = int[2WiV] as the maximum eigenvalue to be used in (2.98). An interesting feature of the prolate spheroidal sequence, r*, is that it may be approximated by a sinusoidal function in terms of the average distance between zero crossings. 2.4 Synthetic data examples A linear event impinging on a uniform array of receivers is used to study the resolution-stability tradeoff. The array is composed of 15 receivers located 10 m apart. The near and fax offsets are located at \u00E2\u0080\u009470 and 70m, respectively. The apparent velocity of the seismic event is 2000m/s. The seismic source corresponds to a zero phase Packer wavelet with a central frequency of 25 Hz. Most of the spectral energy is contained in the interval 0 \u00E2\u0080\u0094 50 Hz. Spectral components which are located above 50 Hz are negligible. The data are shown in Figure 2.2. The r \u00E2\u0080\u0094 p plane computed with the conventional definition is illustrated in Figure 2.3a. Similarly, the minimum norm slant stack operator is used to transform the data to the r \u00E2\u0080\u0094 p domain. In both cases M = 41 slopes were computed. The minimum norm slant stack panel was computed with three different values of pre-whitening, /? = 1 x 10~8, 1 x 10~3, 1 x 10 _ 1. The results are shown in Figures 2.3b, c and d. It is clear that the minimum norm solution provides more resolution of the seismic signal than the conventional slant stack procedure. It is also true that damping makes the minimum norm solution behave like the conventional definition. A significant part of the spectral energy is contained at low frequencies where the operator G is unstable. This feature is shown in Figure 2.4 where the condition numbers computed using the minimum and maximum eigenvalue of the matrix G before and after pre-whitening are Chapter 2. Slant Stacks 43 shown. It important to notice that the behaviour of this curve may be predicted using asymptotic values derived from the prolate spheroidal discrete sequences. The cutoff frequency is / = 50Hz (W = 1/2) that corresponds to the upper bound to avoid spatial alias. To complete the analysis I plotted the minimum and maximum eigenvalues of the matrix g{l \u00E2\u0080\u0094 V) x / . In the limiting case when N is large, these eigenvalues are approx-imately equal to those of (l \u00E2\u0080\u0094 I'). The result is portrayed in Figure 2.5 . The curve resembles the eigen-spectra of the matrix (l \u00E2\u0080\u0094 I'). 2.5 Concluding remarks This section was devoted to the presentation of two methods of computing the discrete slant stack operator. The first one involves the conventional definition of the slant stack operator followed by the inversion of the operator G to recover the data. The second solution involves interchanging the definition of the forward transform of the problem. In this case the forward transform maps the r \u00E2\u0080\u0094 p panel into the data space. The problem may be posed as an underdetermined linear inverse problem which may be solved using the concept of minimum norm. It must be stressed that other regularization schemes may be used to tackle the problem. It has been proven that the resolution of the conventional slant stack is proportional to the inverse of the aperture of the array (Rayleigh limit). In the minimum norm solution, the deconvolution of the matrix G may help to improve the resolution. However, because of the rather peculiar eigen-spectra of G , damping to stabilize the inverse is necessary. The stability is improved at the expense of resolution. Chapter 2. Slant Stacks o 0 . 2 -0 . 4 -0 . 6 -0 . 8 -- 5 0 0 5 0 o f f s e t ( m ) Figure 2.2: Synthetic data. Chapter 2. Slant Stacks 45 (a) (b) 0.2 0.4 0.6 0.8 i I I I I I I 1 I I I I I L 0.2 0.4 0.6 0.8 j \u00E2\u0080\u0094 i \u00E2\u0080\u0094 i \u00E2\u0080\u0094 i \u00E2\u0080\u0094 i \u00E2\u0080\u0094 1 _ -0.001 -0.0005 0 0.0005 0.001 p(sec/m) -0.001 -0.0005 0 0.0005 0.001 (c) p(sec/m) (d) 0.2 0.4 0.6 0.8 J I I 1 1 L _ 0.2 0.4 0.6 0.8 _ l I I I ! I I L M J \u00E2\u0080\u0094 1 \u00E2\u0080\u0094 I \u00E2\u0080\u0094 1 \u00E2\u0080\u0094 I \u00E2\u0080\u0094 1 _ -0.001 -0.0005 0 0.0005 0.001 -0.001 -0.0005 0 0.0005 0.001 p(sec/m) p(sec/m) Figure 2.3: (a) r \u00E2\u0080\u0094 p panel computed with the conventional slant stack operator, (b), (c) and (d) r \u00E2\u0080\u0094 p space computed with the minimum norm slant stack operator with P = 1 x 10- 9 ,1 x 10- 3 , and 1 x 10\" 1. Chapter 2. Slant Stacks 46 (a) 50 Frequency (Hz) 100 0) 300 ! ZOO 3 2 100 1 1 1 1 1 . 1 1 1 i i \fl=0.0 -- 0=lxlO-\u00C2\u00BB\ -0 = 1x10-' \ \u00E2\u0080\u00A2 /J=lxlO-' i i . . , \ L , , , 1\u00E2\u0080\u0094 1 i i 0 50 Frequency (Hz) 100 Figure 2.4: (a) Power spectrum of the 25 Hz Pucker wavelet used in the synthetic ex-ample, (b) Condition number of the matrix g(l \u00E2\u0080\u0094 Z'),1,1'=0,... ,10. The continuous line corresponds to the condition number without damping. The dashed lines correspond to the condition number when different amounts of damping were applied. Frequencies above 50 Hz are spatially aliased. Chapter 2. Slant Stacks 47 ~\ 1 1 r - i 1 1 r I Z o\" II *N-1 \u00E2\u0080\u00941 I I l_ 50 Frequency (Hz) 100 Figure 2.5: Minimum and maximum eigenvalues values of fg(l \u00E2\u0080\u0094 /'), /, V Note the strong similarity to the eigenvalues of the matrix (l \u00E2\u0080\u0094 V). 0 10. Chapter 3 Velocity stacks 3.1 Introduction Conventional velocity analysis is performed by measuring energy along hyperbolic paths for a set of tentative velocities. The analysis of the results in the r \u00E2\u0080\u0094 v (two way zero offset time and velocity) plane serves to estimate the stacking velocity that is later used to construct the zero offset section. The semblance (Neidell and Taner, 1971) is one of the most popular measures of coherent energy along hyperbolic trajectories in CMP gathers. The semblance measures the ratio of the signal energy within a window to the total energy in the window. Noise with non-zero mean and closely-spaced events in the same window deteriorates the velocity resolution when using this measure. The poor resolution of the semblance has lead to more sophisticated techniques based on the eigenstructure of the data covariance matrix (Biondi and Kostov, 1989; Key and Smithson, 1990). In these techniques, the data covariance matrix is decomposed into signal and noise space contributions. Different metrics based on the eigenvector of the signal space are then used to measure coherent energy along hyperbolic paths. The semblance, or any other velocity measure can be displayed as a contour map where each maximum corresponds to the kinematic pair r \u00E2\u0080\u0094 v. Mapping the original data back from this space is not possible since these energy measures do not contain phase information. This short-coming may be overcome by using constant-velocity stack gathers that consist of constant-velocity CMP-stacked traces (Thorson and Claerbout, 48 Chapter 3. Velocity stacks 49 1984). For an infinite-aperture array, the summation along hyperbolas should map into a point in the velocity space (for a spike-like wavelet). Limited aperture, however, spreads the information and not only makes the identification of each seismic event difficult, but also degrades any reconstruction of the offset space beyond the original aperture. Hence, there is a practical importance for estimating high-resolution or aperture-compensated velocity gathers. To reduce amplitude smearing in the velocity space, Thorson and Claerbout (1985) performed the inversion of a set of constant-velocity stacks in t \u00E2\u0080\u0094 h space. The problem involves the inversion of large matrices that is a difficult task. Thorson and Claerbout develop a stochastic inversion scheme that converges to a solution with minimum entropy. The latter is achieved by defining a Gaussian prior pdf with variable variance. To avoid the inversion of very large matrices, Yilmaz (1989) posed the problem in the / \u00E2\u0080\u0094 h domain. Since velocity stacks are not time-invariant, he proposed a t2 transformation to force time-invariance. Time invariant operators can be easily posed in the frequency-offset space. Thus, instead of solving a large inverse problem, we can solve several small inverse problems at each frequency within the seismic band. After a t2 transformation summation along hyperbolas is replaced by summation along parabolas. Consequently, the hyperbolic Radon operator is replaced by an operator with integration along parabolic paths. The parabolic transform was originally applied in a somewhat different manner by Hampson (1986) to attenuate multiple energy in CMP gathers. Hampson applied the parabolic transform to normal moveout (NMO) corrected CMP gathers. The residual moveout of a reflection, after NMO, can be approximated by a parabola. The multi-ples are isolated in the transform domain and after being transformed back to the offset domain they are subtracted from the original data. A similar transformation has been Chapter 3. Velocity stacks 50 proposed by Foster and Mosher (1992) who showed that the correct suppression of mul-tiples is particularly important in some prestack processes like amplitude versus offset (AVO) analysis. Their transformation is well suited for target oriented problems. The focusing power of the transform depends on a tuning parameter that may be adjusted to isolate a particular hyperbolic event. It is clear that the parabolic transform applied to NMO corrected CMP gathers is appropriate to model only energy that is organized along a parabolic path. At this point some remarks are in order. The focusing power of the method critically depends on how well the data match the Radon integration path. In parabolic stacking after NMO (Hampson, 1986) the hyperbolas can be accurately approximated by parabolas only for a limited range of time-velocity pairs. In other words, the focusing power of the transform in not guaranteed for the complete gather. On the other hand, if the parabolic path is validated by means of a t2 transformation any hyperbola can be converted into a parabola Unfortunately, the t2 transformation introduces a non-linear compression of the time axis before 1 s and an expansion after 1 s. It is clear that, after the t2 transformation, the frequency decomposition is not completely valid. This chapter is organized as follows. First, I provide analytical expressions for the forward and adjoint mappings. I assume that parabolic paths arise as a consequence of applying a t2 transformation to the CMP or CSP gather. The discrete forms are used, as in Chapter 2, to set the inverse problem in the frequency-offset space. Finally, I discuss under which conditions the NMO correction may be a good alternative to validate the parabolic approximation. Chapter 3. Velocity stacks 51 3.2 Velocity stacks On CMP or CSP gathers seismic events are more likely to be hyperbolic than linear. The T \u00E2\u0080\u0094 p transform maps hyperbolas into ellipses. This does not help much for reducing the support of the original signal in the r \u00E2\u0080\u0094 p domain. Instead of a linear slant stack it is better to define a transform with a hyperbolic integration path. Let u(h, t) be the CMP or CSP gather and v(q,r) the velocity stack. For a continuous array the velocity stack is defined by the following transformation v(q, r) = (Cu)(q, r) = f\u00C2\u00B0\u00C2\u00B0 u(h, t = JT2 + h2q)dh , (3.1) J\u00E2\u0080\u0094oo the adjoint transform C* is defined by u(h, t) = (\u00C2\u00A3*v)(q, T) = r v(h, r = Jt2 - h2q)dq . (3.2) J\u00E2\u0080\u0094oo The pair of transformations (3.1) and (3.2) is not time-invariant. This short-coming, however, may be overcome by introducing the following transformation, t' = t2 r' = r2, (3.3) this transformation compresses the time axis for samples before Is and stretches the samples after 1 s. The main problem of a t2 transformation is that shallow events can be severely aliased. However, this can be avoided by oversampling the time axis. A simple substitution shows that hyperbolic traveltimes are converted into parabolic traveltimes, Chapter 3. Velocity stacks 52 t2 = r2 + h2q \u00E2\u0080\u0094>t? = T' + h2q. (3.4) It is possible, now, to write a new pair of transformations in stretched coordinates v(q, T') = {Cu)(q, r') = f\u00C2\u00B0\u00C2\u00B0 u(h, t'= r'+ h2q)dh. (3.5) J\u00E2\u0080\u0094oo u(h, t') = (C*v){q, T') = r v(h, r' = t'- h2q)dq . (3.6) J \u00E2\u0080\u0094 O O The notation of u, v and C in expressions (3.1)-(3.2) and (3.5)-(3.6) has not been changed, but it is straightforward to see that they do not represent the same functions after the t2 transformation. The benefit of the t2 transform is that the pair of transformations (3.5) and (3.6) is time-invariant and therefore, the transformations can be expressed in the frequency-space domain, V{q,w)= r Uih^e\u00E2\u0084\u00A2** dh, (3.7) J\u00E2\u0080\u0094oo U{h,w)= f\u00C2\u00B0\u00C2\u00B0 V(q,u>)e-iuqh2dq. (3.8) Substituting equation (3.7) into (3.8) leads to U(h,u>)= f\u00C2\u00B0\u00C2\u00B0 U{h',u>) f\u00C2\u00B0\u00C2\u00B0 e-W-h'2Uqdh'. (3.9) Chapter 3. Velocity stacks 53 The last equation involves a particular type of convolution integral. The kernel in equa-tion (3.9) is given by 1>(h,u)= r e-iuh?*dq. (3.10) J\u00E2\u0080\u0094oo If the parameter q is truncated q \u00E2\u0082\u00AC [\u00E2\u0080\u0094Q, Q], the kernel V\u00C2\u00BB has the following form A special type of convolution integral must be solved in order to recover u(h, t'), 3.3 Inverse velocity stacks In chapter 2,1 showed that the forward/inverse slant stack pair may be derived using two approaches by interchanging the definition of forward and adjoint operators. Similarly, we may want to define the following pair of forward and adjoint operators for the parabolic transform u(h, t') = (Cmv)(q, T' = r v(q, t' = r' - h2q)dq, (3.13) J-co /oo u(h,t' = r' + h2q)dh. (3.14) -co In the frequency-offset space, these transformations become Chapter 3. Velocity stacks 54 U{h,w) = f\u00C2\u00B0\u00C2\u00B0 V(q,U)e-^h2dq, (3.15) J \u00E2\u0080\u0094oo V(q,u)= f\u00C2\u00B0\u00C2\u00B0 Uih^eWdh. (3.16) J\u00E2\u0080\u0094oo The substitution of equation (3.15) into (3.16) leads to V{q,u) = f\u00C2\u00B0\u00C2\u00B0 V(q',u) f\u00C2\u00B0\u00C2\u00B0 e-^'-^dhdq' (3.17) J\u00E2\u0080\u0094oo J\u00E2\u0080\u0094oo or equivalently, V(q,u,) = V(q,u)**(qiu>). (3.18) Zhou and Greenhalgh (1994) showed that \u00E2\u0080\u00A2c = l l + i s g n ( 9 ) V^ki ' ( 3- 1 9 ) Deconvolution with respect to the variable q is also required when the aperture is in-finite. This is an important distinction with respect to the linear Radon transform. Unfortunately, when h is truncated there is no analytical expression for o-(q,u>). Chapter 3. Velocity stacks 55 3.4 Discrete parabolic Radon transform Discrete versions of equations (3.5) and (3.6) are obtained replacing the integral by summation and imposing finite limits. First assume that the CMP gather contains Lf \u00E2\u0080\u0094 Ln + 1 traces, where the indices Lf and Ln denote far and near offset traces re-spectively it v{q, r') = (Cu)(q, r') = \u00C2\u00A3 u{hh r' + hl2q)Ahl, (3.20) l=Ln where Ahi = hi+i \u00E2\u0080\u0094 hi for I = Ln,..., Lf. Similarly, equation (3.6) is approximated by the following expression J max u(h, t') = (\u00C2\u00A3*V)(T>, q) = \u00C2\u00A3 v(h, t' - h*qj)Aqj. (3.21) j\u00E2\u0080\u0094Jmin where Aqj = qj+i \u00E2\u0080\u0094 qj for j = J m j \u00E2\u0080\u009E , . . . , Jmax \u00E2\u0080\u00A2 Taking the Fourier transforms of (3.20) and (3.21) yields V(q, /') = E U(hh fy^'Aht (3.22) l=Ln U(h,f')= E V(qjJ')e-WhVAqj. (3.23) Using matrix notation and suppressing the frequency dependence, equations (3.22) and (3.23) can be written as v = F W u u = L u , (3.24) Chapter 3. Velocity stacks 56 u = F f f W\u00E2\u0080\u009Ev = L*v, (3.25) where W u = diag[A/i\u00C2\u00A3 n,..., AhLf] and W\u00E2\u0080\u009E = diag[Agj r o i n,..., AqJmax]. The matrix W u accounts for the irregular geometry of the array, while W v accounts for the irregular sampling of the inverse squared slowness q. The matrix F has dimension (Jmax \u00E2\u0080\u0094 \u00C2\u00AB7m; n + 1) x (Lf \u00E2\u0080\u0094 Ln + 1), and elements given by Fij = e * 2 ^ ' h ' 3 \u00C2\u00AB i . (3.26) The reader will realize that an important part of the analysis presented in Chapter 2 also applies here. In fact, it is easy to derive inverse operators, by defining a new pair of forward and adjoint mappings as u = F H W\u00E2\u0080\u009Ev = L*v, (3.27) v = FW u u = Lu. (3.28) The conventional operator computes the f'\u00E2\u0080\u0094q space after parabolic moveout and stacking (3.24). The data are recovered using the adjoint, equation (3.25), followed by the operator G, as u = G _ 1 u . (3.29) Chapter 3. Velocity stacks 57 In the second formulation, the f' \u00E2\u0080\u0094 q space is obtained by solving equation (3.27). This equation may be solved with the zero-order quadratic regularization (damped least squares) procedures outlined in section 2.3.3. In contrast to the linear Radon transform, it is not possible to find an asymptotic formulation to assess the stability and the reso-lution of the operator. The elements of the operator G , assuming constant Aq and Ah, are given by 9 l v = AhAq \u00C2\u00A3 e * 2 ^ \u00C2\u00AB ^ 2 ( ' 3 - ' ' 2 ) (3.30) which has the following sum gm ( 2 * ' ' A *y ' - ' \">) glv = AhAq , / 2 l f / , A q A V t ^ T ~ e 3 \u00E2\u0080\u00A2 ( 3 - 3 1 ) Gulunay (1988) has pointed out that equation (3.30) behaves like a Fresnel integral (Born and Wolf, 1980). 3.5 Sampling considerations In section 2.2.3, I derived the sampling theorem for slant stacks. Unfortunately, it is not possible to derive an exact analytical expression for the parabolic stack (Hugonnet and Canadas, 1995). An approximate expression has been reported by Kabir and Verschuur (1995) Chapter 3. Velocity stacks 58 which has been obtained by making an analogy with the linear Radon transform. For the linear Radon transform we have, AV < _ * . (3.33) Wrnax Pmin | J max If we assume that the parabolic transform can be computed using the linear Radon transform after the variable transformation hi = h2, Ah' = 2hAh. (3.34) After replacing equation (3.34) into (3.33), equation (3.32) follows immediately. 3.6 Example The following steps summarize the procedures involved in the computation of velocity stacks by means of the parabolic transform: 1. Stretch the time axis of each seismic trace. 2. Transform each trace to the frequency domain. 3. Apply the parabolic stack operator (conventional or inverse operator). 4. Apply the inverse Fourier transform to each trace. 5. undo the stretching of the time axis. Figure 3.1 shows a hyperbolic event simulated with a velocity of 3000 m/sec. In figures 3.2a and 3.2b, the conventional velocity stack and the minimum norm velocity stack are portrayed. It is clear that the minimum norm velocity stack panel exhibits Chapter 3. Velocity stacks 59 o 0.2 0.4 0.6 0.8 \u00E2\u0080\u0094s *\u00E2\u0080\u0094< I *J 1.2 1.4 1.6 1.8 2 0 200 400 600 800 1000 1200 offset (m) Figure 3.1: Synthetic data. better resolution of the seismic event than the conventional panel computed with the adjoint operator. 3.7 The validity of the parabolic approximation To ensure the improvement of the resolution of the parabolic transform, it is important to examine under what conditions the parabolic approximation is valid. Bear in mind that the NMO correction and the t2 transformation are tools to make the reflection paths time invariant. In other words, while a hyperbola will change if the time axis is translated, a parabola will exhibit the same moveout after a translation of the time axis. This important concept was used to set up our problem in the / \u00E2\u0080\u0094 x space (see the transition from equations (3.20)-(3.21) to (3.22)-(3.23)). The t2 transformation maps a hyperbola in t \u00E2\u0080\u0094 x into a parabola in t2 \u00E2\u0080\u0094 x, regardless Chapter 3. Velocity stacks 60 (a) (b) 0.04 0.08 0.12 0.16 0.2 0.24 0.28 0.04 0.08 0.12 0.16 0.2 0.24 0.28 (ms!/m!) (msVm8) Figure 3.2: Velocity processing, (a) Velocity stack computed using the conventional oper-ator (parabolic moveout and stacking), (b) Velocity panel computed using the minimum norm inverse operator. Chapter 3. Velocity stacks 61 of the position and velocity of the hyperbola. In other words, the transformation is valid for any t \u00E2\u0080\u0094 v pair. Unfortunately, the frequency decomposition is not valid since the spectrum of the wavelet becomes time dependent. The latter was neglected in the formulation of the problem. The frequency distortion of shallow events that is present in CMP gathers reconstructed from velocity panels (Yilmaz, 1989; Sacchi and Ulrych, 1995a) can be explained as follows. To compute the velocity gather in r \u00E2\u0080\u0094 q we must apply a r 1 / 2 transformation to undo the original t2 transformation. The data before 1 s are stretched and the data after Is are compressed. Because of the alias introduced i by the t2 transformation before Is, the reconstruction is not exact and leads to the aforementioned distortion. The parabolic approximation can be also validated by means of an NMO correction (Hampson, 1986). Assuming that the stretching at far offsets after the correction is negligible, the spectral distortion of the signal may be very mild. In this case, the validity of the parabolic approximation depends on the position of each reflection in the velocity spectrum. If V denotes the velocity of a reflection and Vc the velocity used to perform the NMO correction, the corrected moveout can be approximated by t(h) = t-(O) + aih2 + a2h4 + \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2 (3.35) where the parameters a\, a2 are given by a i = 2 ^ 0 ) ( ^ \" ^ ) ' ( 3 - 3 6 ) a 2 = 8 W ( ^ - ^ } - ( 3 - 3 7 ) Chapter 3. Velocity stacks 62 The parabolic approximation is valid provided that If the last expression is satisfied, we can neglect the term o^/i 4 in equation (3.35) and consequently the parabolic approximation is valid. Such an approximation justifies identi-fying the Radon parameter q with the variable a i . It is easy to see that the approximation may fail for shallow reflections, f.(0) \u00E2\u0080\u0094\u00E2\u0080\u00A2 0. The correction, Vc, can be chosen to minimize the last inequality, but if there are two hyperbolas with different velocities but with similar i(0)'s, the approximation may not be completely satisfied for both hyperbolas. Integration paths, other than the parabolic, may be more adequate to model hyper-bolas after NMO correction, an example is {h) = a\h2 + a 2 / i 4 . However, the computa-tional cost of a procedure to decompose CMP or CSP gathers in a r, a i , o 2 space must be carefully considered. Chapter 4 High Resolution Slant Stack and Parabolic Stacks Operators Entia non sunt multiplicanda praeter necessitatem. William of Occam 4.1 Introduction This chapter presents an inverse procedure to enhance the resolution of slant stacks and parabolic stacks. In particular, I replace the conventional slant stack and the parabolic stack operators by operators derived from the solution of an inverse problem that is solved using a regularization that serves to improve the focusing power of the operator. These operators seek a solution that resembles the one that might have been computed with the conventional one when the aperture of the array is infinite. Hence, the overall procedure is equivalent to simulating a longer array. Specifically, two spaxseness criteria are proposed to regularize the inverse problem. These criteria reduce the support of each signal in the transform domain, and hence, the resolution or focusing power of the Radon operator is enhanced (Sacchi and Ulrych, 1994, 1995b). It has already been mentioned that the problem of estimating a highly resolved Radon operator may be posed as an inverse problem. Inverse problems are naturally ill-posed since they do not satisfy the conditions of existence, uniqueness and stability of the solu-tion (see, for example, Tikhonov and Goncharsky, 1987). The technique which permits the construction of a unique and stable solution by introducing some type of prior in-formation is called regularization. In fact, a particular regularization by means of the minimum norm has already been discussed. The prior information can either be given 63 Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 64 in a deterministic form or in a stochastic form. Positivity is a common deterministic constraint useful for solving a variety of inverse problems i.e. magnetic susceptibility in-version, density inversion, etc. A stochastic constraint assumes some relevant information about the model in terms of moments of corresponding distributions. In many problems the regularization is carried out without consideration of the nature of the model we are seeking. That is the case of the widely used zero-order quadratic regularization. This method has the advantage of imposing smoothness to the model which is the common way to avoid the amplification of random errors associated with each observation. The well known pre-whitening technique used in spiking and predictive deconvolution is an example of quadratic regularization (Robinson and Treitel, 1980). However, in many situations one may wish to use other types of regularization which permit some relevant information about the model to be incorporated. If the additional information is in the form of a pdf it may be combined with the data likelihood using Bayes' rule. The Radon operator is obtained by incorporating prior information in terms of a family of sparse priors modelled by a generalized Gaussian. In addition, the Cauchy density is used to induce another type of sparse prior. Both priors lead to solutions consisting of isolated wavelets in the transform domain while remaining consistent with the data. When the regularization is imposed by the Gaussian probability distribution the operator is identical to the so-called called stochastic inverse (Aki and Richards, 1980). The latter may be derived using arguments others than those provided by Bayes' rule. An example is the damped least square solution (Lines and Treitel, 1984), which may be either derived considering a probabilistic description of the problem or by modifying the eigen-spectra of the problem to make the inversion less sensitive to observational errors. When prior information is introduced throughout a sparse regulariser not only is the focusing power of the transform improved, but also the alias may be drastically mitigated. Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 65 Therefore, this type of algorithm may be used to process real data where severe aperture limitation and undersampling inhibit the correct separation of seismic events with close moveout. The first type of regularization discussed here is derived minimizing the lp norm of the model subject to data constraints. From a Bayesian point of view, one can generate this regularization by assuming that the unknown parameters may be modelled by a generalized Gaussian distribution (Johnson and Kotz, 1970). In this case, two parameters are used to properly tune the inversion. A shape parameter that controls the tails of the distribution and a scale parameter that may be used to regulate the degree of fitting of the reconstructed data with the original observations. The shape parameter is crucial in the inversion of sparse models. An attractive feature of the generalized Gaussian distribution is that it may be used to provoke sparse (p \u00C2\u00AB 1) or smooth models (p = 2) according to the selection of the shape parameter. A second regularization is derived using the Cauchy pdf. This pdf has been used in robust inversion of seismic data by Crase et al. (1990) and by Amundsen (1991). It is important to point out that like the lp regularization , the Cauchy regularization is used to diminish the influence of outlier or gross errors that may corrupt the result of the inversion. In this thesis, it is shown that these distributions lead to a regularization procedure that converges to a sparse solution. In summary, long tailed distributions are used to impose features onto the model and not to damp outliers as proposed by several authors (Gersztenkorn et al., 1986; Scales et al., 1988; Crase et al., 1990; Amundsen, 1991). The prior pdf is combined with the data likelihood (pdf of the observational errors) using Bayes's rule. This permits the construction of the so-called Maximum Posteriori (MAP) solution of the inverse problem. The observational errors are modelled by means of a Gaussian distribution. The latter has some advantages. First, it provides a simple Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 66 description of the noise process. Secondly, a Gaussian likelihood leads to a MAP solu-tion that resembles the least squares solution. In fact, the MAP solution is retrieved with an iteratively model re-weighted least squares (IMRLS) procedure. The latter is a least squares solution where the model is re-weighted using a non-linear functional that depends on the previous iteration. There is a strong conceptual similarity between this algorithm and iteratively re-weighed least squares (IRLS) (Scales et al., 1988). Still, it is important to mention that in IRLS the errors are re-weighted to damp the influence of outliers. In the IMRLS, the model parameters are re-weighted to impose sparseness on the solution. Finally, I would like to point out that there are strong similarities between the regular-ization strategies proposed in this thesis and the problem of band-limited extrapolation (Cabrera and Parks, 1991). In this technique, a weighted norm is minimized honoring data constraints (e.g., samples of a time series). The idea is that, theoretically, a time-limited signal cannot have a band-limited spectrum and therefore the norm is used to iteratively band limit and shape the spectrum. The weighted norm incorporates the a priori information about the shape and the spectral support of the signal. The procedure involves the computation of the weights from the previous iteration. Recently, Kabir and Verschuur (1995) applied a similar algorithm to interpolation/extrapolation of missing traces. Their scheme uses the parabolic Radon transform as a band limiting device which is iteratively applied until the missing traces are restored. 4.2 Why sparse models? In chapters 2 and 3, I showed that one of the principal shortcomings of using the con-ventional definition of the Radon operator is that amplitude smearing may mask events with similar intercept time and moveout. The inversion of the Radon operator by means Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 67 of the concept of minimum norm may help to diminish the smearing but, in situations where a highly resolved Radon panel is desired, the smoothing introduced by the mini-mum norm constraint does not guarantee enough resolution or focusing power. Assuming that the data are composed of a finite superposition of linear events or hyperbolic events, the Radon operator (for an ideal delta-like wavelet) would map each seismic event into a point. It is evident that a model with smearing is less sparse than a model without smearing and thus, minimizing a sparseness criterion, subject to data constraints, may help to reduce aperture artifacts. Sparse models were widely used in another geophysical scenario. The work of Wiggins (1978) on minimum entropy deconvolution (MED) shows how a sparseness criterion may be used to simultaneously retrieve the reflectivity and the source wavelet form the seismic trace. The deconvolution problem, when the wavelet is unknown, is a severely ill-posed problem. This problem may be tackled by assuming that the reflectivity is a sparse time series. Wiggins' technique maximizes the varimax norm, but similar results may be obtained by maximizing other norms (Oee and Ulrych, 1979; Poetic et. al, 1980). These norms measure departure from the Gaussian assumption. In other words, the MED looks for the least Gaussian reflectivity that fits the data. Projection Pursuit (PP) provides an elegant framework to understand MED (Friedman and Tukey, 1974; Huber, 1985; Jones and Sibson, 1986). PP is a statistical technique used to reduce the dimension of multivariate data. The technique operates as follows. First, a measure that defines some interesting configuration of the data is proposed Then, the data are projected into a 1-D or 2-D space where the criterion is maximized. This procedure enables us to project the data into a new system of coordinates where some interesting configuration is manifested. In MED deconvolution the data matrix (composed of the original time xAs pointed out by Huber (1985) \"We cannot expect universal agreement on what constitutes an \"interesting\" projection.\" Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 68 series properly padded with zero to represent discrete convolution) is projected into a space where the spareness is maximized. The projection operator is the MED filter and the projection is an estimate of the reflectivity. It is believed that MED tends to retrieve a reflectivity that does not represent a real sequence of primary reflections. Some authors proposed more elaborated models to describe the reflectivity and use MED-style algorithms to compute estimates of the reflectivity sequence (Walden, 1985). I do not think that this approach provides an important advance with respect to the original MED idea. I suspect that the MED failures are not only due to the statistical assumptions. The degrees of freedom of the problem axe also a crucial aspect to be taken into account. In other words, since the source wavelet is unknown, the link between the reflectivity and the trace is very weak and, thus the MED norm enters into the problem as a strong constraint. The norms that I have adopted to retrieve sparse solutions may be related to minimum entropy norms. However, there is an important distinction with respect to MED techniques. The model and the data axe linked by the Radon operator. The data are the primary piece of information, and consequently the sparseness constraints enter into the problem as a secondary effect. In other words, the model is well anchored to the data and, therefore, the regularization serves to improve the model by reducing smearing. In MED, the reflectivity and the data are related by another unknown: the source wavelet or the inverse source wavelet. I hope that this small digression on sparse models and MED illuminates the differences between MED-style techniques and the sparse regularization scheme proposed in this thesis. Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 69 4.3 Inverse problems and Bayes' rule 4.3.1 Bayes' Rule To clarify concepts, I will start with Bayes's rule. In what follows p(A\I) denotes the probability of proposition A being true conditional on I being true and, A means that A is false. The rules for Bayesian parameter estimation are simply the product and sum rules of probability theory which are p(AB\I) = p(A\I)p(B\AI) = p(B\I)p(A\BI) (4.1) and p(A\B) + p(A\B) = 1. (4.2) From (4.1), assuming that p(B\I) ^ 0, we derive Bayes' rule p(A\BI)=p(A\lfj^. (4.3) Bayes' rule establishes the way of updating p(A\I) when additional information, B, is incorporated. In equation (4.3) the proposition I is the prior information and y(A|J) is the prior probability of A conditional only on the information given by I (Loredo, 1990). The term on the left-hand-side of equation (4.3) is called the posteriori probability. For Bayesians, probability is a measure of the degree of plausibility of a proposition. Basically, Bayes' rule serves to update the plausibility of a proposition when our state of Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 70 information changes because new data are acquired. The last assessment establishes the radical difference with respect to the \"frequentist\" view of probability. For frequentists, probabilities arise from the relative frequencies of outcomes of an infinite number of identical repetitions of an experiment. This definition restricts A to be a proposition about a random variable. The Bayesian viewpoint, on the other hand, expands the domain of probability theory to include any type of proposition. 4.3.2 Bayesian approach to inverse problems The relationship between the Radon space (u> \u00E2\u0080\u0094 p or u \u00E2\u0080\u0094 q) and the data space when noise is considered is given by where n stands for the noise term and L is the slant or parabolic stack operator depending on the problem at hand. Letting v = A be the proposition we want to assess and u = B represent our data, Bayes' rule can be written as where for simplicity we have omitted J in the notation. To clarify notation, \u00E2\u0080\u00A2 p(u|v) is the probability or likelihood of obtaining the data u assuming that the model v is true. \u00E2\u0080\u00A2 p(v) is the prior probability of the model. \u00E2\u0080\u00A2 p(u) is the data UkeUhood and enters into the problem as a normalization factor. L*v + n = u (4.4) p(v|u) = p(u|v)p(v) p(u) (4.5) Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 71 \u00E2\u0080\u00A2 p(v|u) is the posterior probability of the model. To utilize the Bayesian formalism we must consider two remaining problems. First, given p(v|u), how to determine v, and second, how to determine p(v). The first problem can be solved by an appropriate choice of a decision rule. For example, one could use the maximum a posteriori (MAP) solution, V M A P , which maximizes p(v|u). This solution has maximum posteriori probability, which is the same (up to a normalization factor) as seeking the maximum joint probability. The second problem, which is probably one of the most cumbersome problems in inverse theory, is how to translate our prior appraisal about the model into a probability density function. If the prior probability and the likelihood functions have the forms p(v) oc e- 7 ' , (4.6) p(u\v) oc e~Ju . (4.7) Then, the posteriori distribution is given by, p(v|u) oc p(v) x p(u|v) (4.8) x e-(J.+J.). The MAP solution will coincide with the minimum of J = \u00E2\u0080\u0094 ln[p(v|u)], where J is usually called the objective or cost function. (4.9) Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 72 If the posterior distribution is multi-modal the objective function is not a strictly convex function, therefore a global optimization problem is required to find the MAP solution. If the prior and the likelihood are modelled with normal distributions, the objective function becomes the classical quadratic form widely used in inverse theory (Tarantola, 1987). It is important to point out that other estimators of v may be used. For instance, the mean v of the posteriori pdf v = /p(v|u)dv. (4.10) or the median, v i , 2 ^ ( v < v j ) = | . (4.11) From a computational point of view the MAP is usually preferred, since the computation of the mean and/or the median require the integration of the whole model space. For a symmetrical unimodal posterior pdf, the mean, the median and the MAP estimator coincide. The normalization constants of the pdf's are not required to minimize the objective function, and therefore they can be omitted in the analysis. Bayesian inference considers Jv as additional information which may help to deter-mine the most likely set of parameters. From a non-Bayesian perspective (rather, a classical inverse theory approach) this term serves to stabilize the inversion (Tikhonov and Goncharsky, 1987). The classical school uses the concept of regularization to obtain a stable inverse of an ill-posed problem. Typically, probabilities are not mentioned and attention is focused on formulating an objective function. In the Bayesian approach the Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 73 aim is to incorporate probabilistic information into the formulation of the inverse prob-lem. Moreover, in the Bayesian approach the objective function is an intermediate step which is required to construct a solution. There are two approaches to derive a cost function. The Bayesian approach, which is what was just explained, entails the construction of the posteriori probability and then the objective function. In inverse theory a more direct approach is usually adopted. In the latter the objective function is designed by combining a misfit criterion with a regularization function that is derived from a smoothing criterion. I feel that in many situations the Bayesian approach may highlight some formal aspects of the inversion. However, when an objective function is derived by means of Bayes' rule, a common questions arises: why that pdf and not another?. On the other hand, in classic inverse theory the objective function is accepted without any problem, i.e., smoothing with derivatives (Constable et. al, 1985). Yet we have to realize, that both approaches may lead to similar or even to the same objective function. In fact, smoothing strategies based on derivatives may be related to Gaussian models with a particular covariance structure (Wang and Braile, 1994). 4.3.3 Assigning a prior to the model Bayesian inference has been historically divided by the degree of objectivity in the selec-tion of priors. Various attempts have been made to find prior probabilities which rep-resent a state of total ignorance about the model (Jeffreys, 1961; Jaynes, 1968). These priors are derived from mathematical arguments of symmetry and invariance. An objec-tive prior may be also derived using the principle of maximum entropy (Jaynes, 1968). Consider a random variable, x, with pdf f(x). The entropy h given by Chaptei 4. High Resolution Slant Stack and Parabolic Stacks Operators 74 h = - [\u00C2\u00B0\u00C2\u00B0 f(z)]n[f(z)]dz (4.12) J\u00E2\u0080\u0094oo expresses the uncertainty associated with the distribution f(x). Therefore, the distribu-tion that most honestly describes the data, given only what it is known without assuming anything else, is the one with maximum entropy (Jaynes, 1968). It is natural, therefore, that when we attempt to make inferences based on incomplete information, that we draw them from that probability distribution that has the maximum entropy allowed by the available information. The entropy is maximized subject to the following constraints (4.13) and / f(x)gk(x)dx = afc, Ja * = 1,2,.. ,m, (4.14) where the first constraint is a normalization constraint, the second is a set of moments that are assumed to be known. Using the Euler-Lagrange equation of the calculus of variations the maximum entropy pdf is given by /(*) = ,[-A0-A1fl1(a!)-A2fl3(x)-ATOflrTO(x)] (4.15) Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 75 The Lagrange parameters A 0 , A i , . . . A m are estimated by replacing f(x) into the con-straints (Kapur and Kesavan, 1992) and solving a nonlinear problem. Many well known pdf's may be obtained using the principle of maximum entropy. An example is the generalized Gaussian distribution. In this case, the prescribed constraints are equation (4.13) and the dispersion measure associated with the lp norm, (*\u00E2\u0080\u009E)*= f\u00C2\u00B0\u00C2\u00B0 \x\pf(x)dx. (4.16) J\u00E2\u0080\u0094oo \u00E2\u0080\u00A2 The pdf that maximizes the entropy is l - l / p ^ i ] x \u00C2\u00A3 The generalized Gaussian distribution with shape parameter p and scale parameter oo to a uniform distribution. The metric induced by the generalized Gaussian distribution is the lp norm. Laplace's double exponential is a pdf that has longer tails than the normal distribution. This property is exploited in robust statistic to damp the influence of outliers (Huber, 1981). Another typical long-tailed probability density is the Cauchy distribution, = ( 4 - 1 8 ) Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 76 This distribution provides an attractive prior to model a sparse distribution of para-meters. In addition, it leads to a regularization function similar to Burg's entropy (Burg, 1975). Equation (4.18) can be derived using the principle of maximum Entropy when .E[ln(l + x2/o~l)] is prescribed (Kapur and Kesavan, 1992) (E denotes expectation). In that case, JE?[ln(l + x2/cr2)\, plays a role similar to a dispersion measure. The inverse problems studied in this thesis are posed in the frequency-offset space. Hence, the priors must be modified to cope with complex variables. Neglecting the normalizing term (not needed to compute the MAP estimator), the priors may be formally written down as f(V,ap,p)* = 2irk/N, k = 0,1,2,N \u00E2\u0080\u0094 1. It follows that since the real and imaginary parts are linear combinations of a normally distributed variable nt, they are also normally distributed with zero mean and variance given by (Priestly, 1981) Var(iVr(u;fc)) = Var(iV,(u;fe)) = a* ,k^0,N/2 { 2*1 ,k = 0,N/2 al ,k^0,N/2 0 ,k = 0,N/2 (4.22) (4.23) In addition, the covariance is given by Cw{Nr{u>k),Ni{ul)) = 0,Vk,l (4.24) similarly, Cav(Nr{u>k),Nr(en)) = Cav(Ni{uk),Ni{un)) = 0,Vfc,Z (4.25) From the above discussion it is clear that the real and imaginary parts have a bivariate distribution Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 78 1 l p{NrM, Ni(u>k)) = \u00E2\u0080\u0094 e 3 < k^O, N/2 (4.26) Now, it is possible to regard the joint distribution of the real and imaginary part as the distribution of a complex variable Nr + iNi, 4.3.5 Maximum a Posteriori Solution (MAP) When we combine the generalized Gaussian prior and the Gaussian likelihood, the fol-lowing objective function is obtained Jp = v ^ Q p V + (L*v - u ) H C n - 1 ( L * v - u) (4.28) where the diagonal matrix Q p has the following elements When p = 2, the elements of the diagonal matrix Q c are equal. The cost function Jp corresponds to a mixed lp and Z2 norm problem (Alliney and Ruzinsky, 1994). Note, however, that this is a special mixed norm problem since the argument of the norm is the amplitude of a complex variable, |Vj |, and not the complex variable itself. The matrix C n stands for the noise covariance matrix which is assumed to be known. The minimization of $ p gives rise to the following solution, Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 79 v = (Q p + L C n - 1 ^ ) \" 1 ^ - ^ . (4.30) The diagonal matrix Q p in equation (4.30) is non-linearly related to the model. Equation (4.30) can be rewritten using the following manipulations (4.31) = (Qp + L C T ^ Q p - 1 ! , . Since ( C n + L * Q P 1L) and (Q p + L C n XL*) are positive definite and thus, invertible, the following identity holds (Q p + L C n - 1 ! * ) \" 1 ^ \" 1 = Qp\" 1 L(C n + L ' Q p ^ L ) \" 1 (4.32) and expression (4.30) is equivalent to v = Q p - 1 L ( C n + I / Q p ^ L ) - 1 ! ! . (4.33) It must be stressed that for theoretical assessment regarding uniqueness and convergence, the operators (4.30) and (4.33) are equivalent. The lp regularization may lead to division by zero when Vi \u00E2\u0080\u0094\u00E2\u0080\u00A2 0. To avoid this inconsistency the following modification is included Q P i i = P 2o% ivjr 2 i f |v5i>e, , x (4.34) e p \" 2 i f |Vi| < e . Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 80 With this modification, all samples smaller that e axe weighted in the same manner. This technique was also used by Huber (1981) in robust statistical procedures to assign a constant weight to errors below a given threshold. If the Cauchy distribution is adopted, the objective function derived from Bayes theorem is given by Jc = \u00C2\u00A3 l n ( l + ! ^\u00C2\u00A3) + ( L * v - u f C n - ^ L V - u ) . (4.35) It is straightforward to see that in equations (4.30) and (4.33) the matrix Q p is replaced by Q c , \u00C2\u00AB \u00C2\u00AB < = ^ T T ^ ) - < 4 - 3 6 > Crase et. al (1990) and Amundsen (1991) used this criterion to improved the inversion of seismic data in the presence of outliers. In a time series analysis scenario, the Cauchy regularization has been applied to retrieve the spectral signature of seismic wavefields (Sacchi and Ulrych, 1995b). It is evident that some iterative strategy is required to solve equation (4.30) or (4.33). Letting Q denote Q p or Q c , the algorithm follows v f c + i = ( Q f c + L C ^ L ^ ^ L C n ^ u , (4.37) similarly we can operate with (4.33), Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 81 (4.38) The algorithm is initiated with either a flat model or with the following approximation: v = Lu. The number of iterations necessary to reach the solution will depend on the adopted regularization. Usually 5 iterations yield a good approximation to the minimum of Jp (p \u00C2\u00AB 1) and/or Jc. One advantage of the lp regularization, is that the degree of sparseness may be modified by selecting the shape parameter p. One inconvenience, however, is that trimming of small amplitudes is required. The effect of the non-linear weights Q p or Q c can be summarized as follows. In each iteration, the non-linearity produces a model that, while consistent with the data, has the minimum amount of structure or maximum sparseness. In fact, the algorithm drives the solution to a state in which most samples are equal to zero with only few samples different from zero. The algorithm is referred to as the Iteratively Model Re-weighted Least Squares (IMRLS) algorithm. The re-weighting function may be also derived using other priors than those proposed in this chapter. 4.3.6 Algorithm complexity The operators (4.30) and (4.33) are equivalent, however, it is important to consider the computational advantages of one over the other. For underdetermined problems (M > N), clearly, it is more convenient to solve equation (4.33) than equation (4.30). The following cases arise depending on the discretization in model and data space: \u00E2\u0080\u00A2 Slant Stack operator: If the ray parameter axis is evenly discretized, the matrix LL* is Toeplitz Hermitian. If a zero order quadratic regularization is adopted (damped least squares), the elements of the diagonal matrix Q are equal, consequently, Chapter 4. High Resolution Slant Stack and Parabohc Stacks Operators 82 Q + LL* is also Toeplitz. In this situation, fast solvers like Levinson recursion may be used (Rostov, 1990). When the weighting function is derived from the Cauchy prior or the generalized Gaussian (y ^ 2), the elements of the diagonal matrix Q are unequal. Hence, the Toeplitz structure is destroyed. In this case, the solution may be computed with any solver for Hermitian matrices. I have adopted a Cholesky decomposition (Press et. al, 1992) but other techniques may be utilized. Since only a few iterations are required to reach a feasible solution, the computational cost of Cholesky decomposition is reasonable. \u00E2\u0080\u00A2 Parabohc stacks: The operator LL* is Toeplitz Hermitian providing that the Radon parameter is evenly discretized. When zero order regularization is adopted (damped least squares) equation (4.30) may be solved using the Levinson recursion. Any other case is solved using Cholesky decomposition. 4.3.7 Convergence In this section, I show that the proposed algorithm minimizes the cost function Jp and/or Jc. The change of the objective function between two consecutive iterations may be approximated by a first order Taylor expansion AJk = Jk+i \u00E2\u0080\u0094 Jk = VJkHAvk. (4.39) The solution at iteration k + 1 satisfies, L C n u = (Qfe + L C ^ L ^ V k + x = (Qfc + LCn^L'Xvfc + Avk) (4.40) Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 83 thus, V J ^ - C Q f c + L C n - ^ A v * . (4.41) Finally, combining the above expressions AJk = -AvkH(Qk + LC^L-JAvfc. (4.42) The objective function is decreased providing that ( Q * + LC\u00E2\u0080\u009E 1L*) is positive definite. This is proved in the next section. 4.3.8 Convexity of the objective function If the objective function is strictly convex and possesses bounded derivatives, then the problem J = min (4.43) or V J = 0 (4.44) has a unique solution. The convexity assumption serves to guarantee equivalence between (4.43) and (4.44). If we are satisfied with local uniqueness, strict convexity can be Chapter 4. High Resolution Slant Stack and Parabolic Stacks Operators 84 replaced by convexity (Rockafellar, 1970). The function J is convex provided that the Hessian matrix, H = V V J , is positive definite x H V V J x > 0 for all x other than x = 0 , (4.45) where the entries of the Hessian matrix are 82J If the regularization is derived from the generalized Gaussian, the Hessian becomes W J P = | Q P + L C - 1 L * . (4.47) The quadratic form v H Q p v is always positive definite, while v H L C n - 1 L * v is symmetric and positive semi-definite (positive definite if L* has full rank), hence the sum is positive definite. It might happen that some elements of the matrix Q p become unbounded when Vj? \u00E2\u0080\u0094> 0. The threshold operator in equation (4.34) serves to avoid this degeneracy. If the Cauchy criterion is adopted as the prior of the problem, the Hessian of the objective function is given by V V J c = D + L C ^ L * , (4.48) where D is a diagonal matrix with the following elements (4.46) Chapter 4. High Resolution Slant Stack and ParaboHc Stacks Operators 85 Du = Since o\ > 0, the Hessian is a positive form. Thus, the convexity of J is guaranteed for both regularises. The existence of the solution is also guaranteed in both cases. Finally, note that the equivalence between equations (4.30) and (4.33) is also satisfied. 4.3.0 Derivation of the damped least squares solution What is the relationship between the well known damped least squares approach and the sparse regulariser? There is an easy way to answer this question when the regularization is induced by the generalized Gaussian. We simply set p = 2 in (4.30) to end up with If the noise covariance matrix is diagonal, C n = N. A standard approach to solving this problem is by means of zero padding. Denning a new time series consisting of the original series plus a zero extension for n = N,..., M \u00E2\u0080\u0094 1, one can estimate M spectral samples using the DFT. This procedure helps to remove ambiguities due to discretization of the Fourier transform but, as is well known, it does not reduce the sidelobes created by the temporal/spatial window or improve the resolution. Estimation of M spectral samples without zero padding may be posed as a linear inverse problem where the target is a DFT that is consistent only with the available information. At this point, it is interesting to note that the underlying philosophy is similar to Burg's maximum entropy method (MEM) (Burg, 1975). However, in Burg's MEM the target is a PSD estimate. Rewriting equation (5.2) as i M-l *n = V 7 \u00C2\u00A3 XkeiMlM n = 0,..., N - 1, (5.3) M k=o gives rise to a linear system of equations d = F m , (5.4) where the vector d \u00E2\u0082\u00AC R N and m \u00C2\u00A3 C M denote the available information and the unknown DFT, respectively. Equation (5.4) is a linearly underdetermined problem which, as is well known, can be satisfied by many different solutions. Uniqueness is imposed by defining a regularized solution, m, which may be obtained by minimizing the following cost function J(m) = $(m) + | |d -Fm| | ^ . (5.5) Chapter 5. Aperture Compensated Discrete Fourier Transform and Applications 120 The regulariser $(m) serves to impose a particular character on the solution. 5.3 Zero order quadratic regularization (damped least squares) To start with, I will assume that the data are contaminated with Gaussian noise and that the samples of the DFT can be modelled with a Gaussian distribution. For simplicity, I will assume that the discretization of both spaces is regular and that the covariance matrices of the model and noise are diagonal matrices. Under this assumption, one may use Bayes' rule to estimate a MAP estimator of the DFT. This assumption leads to the minimization of the following objective function As usual, the first term represents the model norm while the second term is the misfit function. Taking derivatives and equating to zero yield Jaa = XmH m + (d - Fm)H(d - Fm). (5.6) m = (FHF + Aljif ) _ 1 F H d . (5.7) The last expression can be rewritten using the following identity (FBF + A I M ) _ 1 F H = FH(FFH + X1N)~\ (5.8) where 1M and IN represent M x M and N x N identity matrices, respectively. Recalling equation (5.8) and that F F ^ = j^ Iw, we end up with the following expression Chapter 5. Aperture Compensated Discrete Fourier Transform and Applications 121 m = (\u00C2\u00B1 + \)-1FHd. (5.9) The result is nothing else than the DFT of xn modified by a scale factor. The solution expressed by equation (5.9) becomes ^ ^ T ^ E ^ 2 ^ - 1 ) , (5.10) which is the DFT of the windowed time series (equation (5.1)) and is equivalent to padding with zeros in the range n = N,..., M \u00E2\u0080\u0094 1. It is easy to see that this regular-ization yields a scaled version of the DFT. Hence, the associated periodogram exhibits a resolution that is proportional to the inverse of the length of the time series (the usual resolution criterion for the periodogram.) The periodogram becomes, Pk=XkX*k, fc = 0 , . . . , M - l . (5.11) It is interesting to point out that Oldenburg (1976) arrived to the same conclusion using the Backus and Gilbert (B&G) formalism which is a completely different approach to inversion. Specifically, when the first Dirichlet criterion of B&G theory is used the resulting expression leads to an expression equivalent to equation (5.10). This is a simple consequence of using this criterion; other spreading functions lead to different results. An important conclusion which may be drawn from Oldenburg's paper is that the resolution is fixed and cannot be modified. An intuitive explanation is as follows; since the resolution of the PSD is controlled by the width of the sidelobes which are independent of the scaling Chapter 5. Aperture Compensated Discrete Fourier Transform and Applications 122 factor, the DFT computed with the Gauss-Gauss model will lead to a PSD estimator that is equivalent to the periodogram of the truncated time series. 5.4 Regularization by the Cauchy-Gauss model In the previous section I showed that the Gauss-Gauss regularization does not serve the purpose of obtaining a high resolution PSD estimator. The sparse regularization scheme used to compute Radon operators may be used here to estimate a high resolution DFT. The reason is simple and can be interpreted as follows; in general, the interpretation of spatio-temporal data is involved with a finite number of events which exhibit spatial continuity. If the data show great complexity, time and/or spatial windows are chosen such that particular events may be more easily distinguished and mapped. Because of the limited number of such events, it is appropriate to require a model which consists of the minimum number of events that can satisfy the data. This is the classic idea of simplicity or parsimony and quite opposite in philosophy to norms which describe the \"structure\" of a surface by means of, for example, the smallest, flattest or smoothest models. A \"long tailed\" distribution, like the Cauchy pdf, will induce a model consisting of only few elements different from zero. The objective function of the problem becomes, Jcg = Je(m) + -^(d - Fm) f f (d - Fm) (5.12) n The function, Jc{x), which is expressed by M - l is the regulariser imposed by the Cauchy distribution and is a measure of the sparseness Chapter 5. Aperture Compensated Discrete Fourier Transform and Applications 123 of the vector of spectral powers Pk = XkX\u00C2\u00A3, k = 0,..., M \u00E2\u0080\u0094 1. The constant ac controls the amount of sparseness that can be attained by the inversion. The sparseness of the estimate will also depend on the noise level since \u00C2\u00ABrn may inhibit a reliable sparse solution. Taking derivatives of Jcg(x) and equating to zero yields the following result m = (ACT1 + F H F ) \" 1 F H d , (5.13) where A = <72/ "Thesis/Dissertation"@en . "1996-05"@en . "10.14288/1.0052951"@en . "eng"@en . "Geophysics"@en . "Vancouver : University of British Columbia Library"@en . "University of British Columbia"@en . "For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use."@en . "Graduate"@en . "Aperture compensated radon and fourier transforms"@en . "Text"@en . "http://hdl.handle.net/2429/4752"@en .