@prefix vivo: <http://vivoweb.org/ontology/core#> .
@prefix edm: <http://www.europeana.eu/schemas/edm/> .
@prefix ns0: .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

vivo:departmentOrSchool "Science, Faculty of"@en, "Earth, Ocean and Atmospheric Sciences, Department of"@en ;
edm:dataProvider "DSpace"@en ;
ns0:degreeCampus "UBCV"@en ;
dcterms:creator "Velis, Danilo Rubén"@en ;
dcterms:issued "2009-06-25T18:28:49Z"@en, "1998"@en ;
vivo:relatedDegree "Doctor of Philosophy - PhD"@en ;
ns0:degreeGrantor "University of British Columbia"@en ;
dcterms:description """Wavelet estimation, ray tracing, and traveltime inversion are fundamental problems in seismic exploration. They can ultimately be reduced to minimizing a highly nonlinear cost function with respect to a certain set of unknown parameters. I use simulated annealing (SA) to avoid the local minima and inaccurate solutions that often arise from the use of linearizing methods. I illustrate all applications using numerical and/or real data examples. The first application concerns the 4th-order cumulant matching (CM) method for wavelet estimation. Here the reliability of the derived wavelets depends strongly on the amount of data. Tapering the trace cumulant estimate significantly reduces this dependency and allows for a trace-by-trace implementation. For this purpose, a hybrid strategy that combines SA and gradient-based techniques provides efficiency and accuracy. In the second application I present SART (SA ray tracing), a novel method for solving the two-point ray tracing problem. SART overcomes some well-known difficulties of standard methods, such as the selection of new take-off angles and the multipathing problem. SA finds the take-off angles so that the total traveltime between the endpoints is a global minimum. SART is suitable for tracing direct waves, reflections, and headwaves through complex 2-D and 3-D media. I also develop a versatile model representation in terms of a number of regions delimited by curved interfaces. Traveltime tomography is the third SA application. I parameterize the subsurface geology using adaptive-grid bicubic B-splines for smooth models, or parametric 2-D functions for anomaly bodies. The second approach may find application in archaeological and other near-surface studies. The nonlinear inversion process attempts to minimize the rms error between observed and predicted traveltimes."""@en ;
edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/9599?expand=metadata"@en ;
dcterms:extent "16592142 bytes"@en ;
dc:format "application/pdf"@en ;
skos:note """APPLICATION OF SIMULATED ANNEALING TO SOME SEISMIC PROBLEMS

By Danilo Ruben Velis, Geofísico, Universidad Nacional de La Plata, Argentina

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF EARTH AND OCEAN SCIENCES

We accept this thesis as conforming to the required standard.

THE UNIVERSITY OF BRITISH COLUMBIA, August 1998
© Danilo Ruben Velis, 1998

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives.
It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Earth and Ocean Sciences, The University of British Columbia, Vancouver, Canada.

Abstract

Wavelet estimation, ray tracing, and traveltime inversion are fundamental problems in seismic exploration. They can ultimately be reduced to minimizing a highly nonlinear cost function with respect to a certain set of unknown parameters. I use simulated annealing (SA) to avoid the local minima and inaccurate solutions that often arise from the use of linearizing methods. I illustrate all applications using numerical and/or real data examples. The first application concerns the 4th-order cumulant matching (CM) method for wavelet estimation. Here the reliability of the derived wavelets depends strongly on the amount of data. Tapering the trace cumulant estimate significantly reduces this dependency and allows for a trace-by-trace implementation. For this purpose, a hybrid strategy that combines SA and gradient-based techniques provides efficiency and accuracy. In the second application I present SART (SA ray tracing), a novel method for solving the two-point ray tracing problem. SART overcomes some well-known difficulties of standard methods, such as the selection of new take-off angles and the multipathing problem. SA finds the take-off angles so that the total traveltime between the endpoints is a global minimum. SART is suitable for tracing direct waves, reflections, and headwaves through complex 2-D and 3-D media. I also develop a versatile model representation in terms of a number of regions delimited by curved interfaces. Traveltime tomography is the third SA application. I parameterize the subsurface geology using adaptive-grid bicubic B-splines for smooth models, or parametric 2-D functions for anomaly bodies. The second approach may find application in archaeological and other near-surface studies. The nonlinear inversion process attempts to minimize the rms error between observed and predicted traveltimes.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgment
1 Introduction
1.1 Introductory remarks
1.2 Simulated annealing applications
1.3 Simulated annealing, linearizing methods, and hybrid strategies
1.4 Alternatives to simulated annealing
1.5 Dissertation outline
2 Simulated Annealing
2.1 Introduction
2.2 Statistical mechanics, the Metropolis algorithm, and the annealing process
2.3 Simulated annealing optimization
2.3.1 Generating function
2.3.2 Acceptance function
2.3.3 Cooling schedule
2.4 Fast annealing (FA)
2.5 Very fast simulated annealing (VFSA)
2.5.1 Description of the algorithm
2.5.2 Re-annealing
2.5.3 Quenching
2.6 Annealing or quenching?
2.6.1 Initial temperature
2.7 Constrained simulated annealing
2.8 Conclusions
3 Wavelet Estimation Using Fourth-Order Cumulant Matching
3.1 Introduction
3.2 Background theory
3.2.1 Cumulants and moments
3.2.2 Cumulant matching (CM) for a convolutional process
3.3 Optimization problem
3.4 Discussion and explanatory examples
3.4.1 Tapering the cumulant estimate
3.4.2 VFSA vs linearizing solutions: a hybrid strategy
3.4.3 Sensitivity to wavelet phase
3.4.4 Non-Gaussianity assumption
3.5 Real data examples
3.5.1 Field data
3.5.2 Marine data
3.6 CM in the frequency domain
3.7 Conclusions
4 Two-Dimensional Boundary Value Ray Tracing
4.1 Introduction
4.2 Earth model
4.3 Solving the initial-value problem (IVP)
4.3.1 Two-dimensional cell ray tracing
4.3.2 Pros & cons of cell ray tracing: a discussion
4.4 Solving the boundary-value problem (BVP)
4.4.1 Problem definition
4.4.2 Conventional methods for solving the BVP
4.4.3 Simulated annealing ray tracing (SART)
4.5 Simulated annealing ray tracing (SART)
4.5.1 Description of the algorithm
4.5.2 SART for more complex BVPs
4.5.3 Shadow zones, multiples, and more
4.6 Numerical examples and discussion
4.6.1 Model 1: smooth model
4.6.2 Model 2: reflector model
4.6.3 Model 3: refractor model
4.7 Conclusions
5 Boundary Value Ray Tracing in Complex 3-D Media
5.1 Introduction
5.2 Earth model
5.3 Solving the initial-value problem (IVP)
5.3.1 The ray equations
5.3.2 Intersection points
5.3.3 Ray signature and stopping conditions
5.4 Solving the boundary-value problem (BVP) by means of SART
5.4.1 Problem definition
5.4.2 SART accuracy
5.4.3 Refinement of the solution: a hybrid strategy
5.5 Numerical examples
5.5.1 Fault model
5.5.2 Salt-dome model
5.5.3 Selection of SART parameters
5.6 SART performance: analysis and discussion
5.6.1 SART subprocesses efficiency
5.6.2 Further improvement of SART efficiency
5.6.3 Comparison with other ray-tracing methods
5.7 BVP as a constrained nonlinear optimization problem: a discussion
5.7.1 Straight-ray construction drawback
5.7.2 Alternative formulations
5.8 Conclusions
6 Traveltime tomography
6.1 Introduction
6.2 General theory
6.3 Model representation
6.3.1 Bicubic B-splines
6.3.2 Parametric 2-D functions
6.4 Forward modeling
6.5 Nonlinear inversion
6.5.1 Selection of VFSA parameters
6.6 Numerical examples
6.6.1 Smooth model
6.6.2 Anomaly body
6.7 Conclusions
7 Summary
7.1 Future research
References
Appendices
A Sufficient conditions for the global convergence of various SA algorithms
B Formulas required for cell ray tracing with piecewise constant velocity
C CPU time saving by using a different numerical integrator in SART

List of Tables

4.1 Decision table after reaching a predefined discontinuity
4.2 Multipathing in the three models
5.1 Model 1: velocities and interfaces defining the fault model
5.2 Model 1: multipathing in the fault model
5.3 Model 2: velocities and interfaces defining the salt-dome model
5.4 Model 2: multipathing in the salt-dome model
5.5 Parameter search ranges in VFSA
5.6 Mean number of SA iterations to achieve a specified misfit
5.7 SART profile using fourth-order RK integration
5.8 SART profile using static and dynamic integration
5.9 Model 1: comparison of SART with other methods in the fault model
5.10 Model 2: comparison of SART with other methods in the salt-dome model
5.11 Comparison of SART with other methods for different number of receivers
5.12 Number of iterations for global convergence using various cost functions
6.1 Estimated model parameters after 3000 iterations (20 runs)
B.1 Shooting from the left edge of cell
B.2 Shooting from the top edge of cell
B.3 Shooting from the right edge of cell
B.4 Shooting from the bottom edge of cell
B.5 Angles ψa and ψb to determine next cell edge

List of Figures

1.1 Comparison of "greedy" algorithms with SA
2.1 Flow chart of a basic SA algorithm
2.2 Probability density functions used in BA, FA, and VFSA
2.3 Comparison of various SA cooling schedules
3.1 Tapering the trace cumulant
3.2 Synthetic example: linearizing vs VFSA solutions
3.3 Synthetic example: deconvolved traces and error
3.4 Fourth-order moment sensitivity to wavelet phase
3.5 Non-Gaussianity assumption in the cumulant matching approach
3.6 Field data example: input data
3.7 Field data example: wavelet estimates
3.8 Marine data example: input data and wavelet estimate
4.1 Two-dimensional cell parameterization of the velocity field
4.2 Detailed geometry of a ray entering a cell from the left edge
4.3 Special cases of the ray geometry
4.4 Fictitious reflection introduced by the cell parameterization
4.5 Source-receiver and raypath geometry for tracing direct waves
4.6 Source-receiver and raypath geometry for tracing reflections
4.7 Source-receiver and raypath geometry for tracing headwaves
4.8 Source-receiver and raypath geometry for tracing diffractions
4.9 Bending method gets trapped in local minima
4.10 Straight-ray construction used by SART
4.11 Alternative strategies for tracing reflections
4.12 Yet another alternative strategy for tracing reflections
4.13 SART for tracing headwaves
4.14 SART for tracing diffractions
4.15 SART for tracing multiples
4.16 Model 1: tracing direct waves
4.17 Model 2: tracing reflected waves
4.18 Model 2: traveltime vs receiver distance, receiver distance vs take-off angle
4.19 Model 2: cost function vs take-off angle, and SART convergence
4.20 Model 3: tracing headwaves
4.21 Model 3: cost function vs entering and emerging coordinates
4.22 Model 3: SART convergence after 300 iterations
5.1 Model 1: a 3-D fault model
5.2 The ray direction is described by declination δ and azimuth ξ
5.3 Ray tracing direct waves through a 3-D medium using SART
5.4 Ray tracing reflections through a 3-D medium using SART
5.5 Ray tracing headwaves through a 3-D medium using SART
5.6 Model 1: two-dimensional slice through the fault model
5.7 Model 1: fan of rays and wavefronts in the fault model
5.8 Model 1: traveltime vs geophone depth, and common-shot gather
5.9 Model 1: two cost functions for the fault model
5.10 Model 1: multipathing in the fault model
5.11 Model 1: traveltime vs take-off angles
5.12 Model 1: SART convergence after 500 iterations
5.13 Model 1: scatter plot for 500 annealing iterations
5.14 Model 2: a 3-D salt-dome model
5.15 Model 2: two-dimensional slice through the salt-dome model
5.16 Model 2: two cost functions for the salt-dome model
5.17 Model 2: multipathing in the salt-dome model
5.18 Model 2: SART convergence after 500 iterations
5.19 Model 2: scatter plot for 500 annealing iterations
5.20 Model 2: traveltime vs geophone depth, and common-shot gather
5.21 SART convergence for 20 independent realizations
5.22 Dispersion plots of the number of iterations
5.23 Model 1: multiple ray tracing in the fault model
5.24 Model 2: multiple ray tracing in the salt-dome model
5.25 Comparison of SART with other methods
5.26 Comparison of SART with other methods for different number of receivers
5.27 The two-layer model and the "unrealistic" raypath
5.28 Model 1: alternative cost functions for the fault model
5.29 Number of iterations and traveltime T_er for various cost functions
6.1 Two-dimensional grid used for the bicubic spline parameterization
6.2 One-dimensional velocity anomaly function for various slopes
6.3 Two-dimensional velocity anomaly function (flat region only)
6.4 Acquisition geometry for the tomographic problem
6.5 Traveltime inversion for the smooth model
6.6 SART convergence after 3000 iterations (20 realizations)
6.7 Traveltime inversion for the anomaly model

Acknowledgment

I would like to express my deep thanks to my supervisor Tadeusz Ulrych for his guidance and support. He has been both a mentor and a friend to me throughout my years at UBC. I thank Laurence Lines and Toshifumi Matsuoka for helpful comments about ray tracing. I also thank Michael Bostock, Richard Pawlowicz, and the members of the committee for their excellent comments and suggestions that helped to improve the quality of this work. A note of gratitude goes to Xingong Li and David Aldridge, who let me use some of their codes. For continuing support, I am very grateful to the Facultad de Ciencias Astronómicas y Geofísicas, Universidad Nacional de La Plata, Argentina. A special thanks to my former supervisor Alberto Cominguez, for introducing and guiding me into the seismic data processing arena, and to the graduate students and colleagues at the Department of Applied Geophysics, Universidad Nacional de La Plata, for their friendship and support. Thanks to the other current and earlier Ph.D.
students and staff at the Department of Earth and Ocean Sciences, for their commitment to creating a pleasant and dynamic environment at the department, and for those unforgettable Friday afternoon soccer games. I am especially indebted to my parents, brothers, and friends, who are responsible for a great part of what I have achieved in my life. Most of all, I wish to thank my wife Susana and my son Juan Manuel for their unconditional love, patience, and understanding.

To the memory of my mother... I miss you.

Chapter 1: Introduction

"Thus, in our reasoning we depend very much on 'prior information' to help us in evaluating the degree of plausibility in a new problem. This reasoning process goes on unconsciously, almost instantaneously, and we conceal how complicated it really is by calling it 'common sense'." — E.T. Jaynes, Probability Theory: The Logic of Science

1.1 Introductory remarks

In general, geophysical inverse problems ultimately reduce to minimizing an error, objective, or cost function with respect to a set of unknown model parameters:

Cost Function(Model) = Minimum. (1.1)

Most of these problems are nonlinear and very difficult to solve using standard optimization techniques. The difficulties arise not only because of the nonlinearities involved in such optimization problems, but also because frequently the unknown parameters, which represent physical quantities, must satisfy a set of constraints. These are used to incorporate a priori geophysical or geological information on the model parameters. Generally these constraints are simply bounding ranges (e.g., a velocity must be greater than zero), but often they represent additional nonlinear functions and relationships. Consequently, the problem is usually cast as a constrained nonlinear optimization problem. Other difficulties arise from the fact that often the forward problem does not lend itself to being easily treated with standard techniques that are based on local gradient directions or derivatives (e.g., the cost function has no analytical form and is nondifferentiable).

Nonlinear optimization problems are often solved iteratively. Given an initial guess, the current solution is updated until no further improvement in the cost function is observed. Here multimodality plays a key role in the success of the algorithm at hand, for local minima may prevent it from finding the desired solution. This point represents one of the most important concerns in solving equation (1.1). It is commonly known that the minimization of a multimodal cost function using local techniques (e.g., steepest descent) leads to the solution which is closest to the starting model. On the contrary, the simulated annealing (SA) method is not much affected by the initial guess. As a result, the global minimum of the cost function can be obtained. Figure 1.1 depicts the contrasting behavior between SA and a local algorithm in minimizing a multimodal cost function. Local algorithms are "greedy" in the sense that they only make downhill moves, whereas in SA, uphill moves are not forbidden. This allows SA to avoid local minima.

1.2 Simulated annealing applications

SA applies to a wide variety of applications: from circuit design [WS92] and optimal protein-state conformation [BB89] to econometric problems [GFR94] and optimal deployment of missile interceptors [BJS88]. Van Laarhoven and Aarts describe more traditional applications [vLA88], while Sen and Stoffa focus on geophysical applications [SS95].
The first application in geophysics was made by Rothman in 1985 [Rot85, Rot86] for estimating residual statics corrections. Subsequently, the method has found several applications, such as coherence optimization [LBT89], transmission tomography [AV90], waveform inversion [SS91, LHS95], reflection inversion [PL93], resistivity inversion [SBS93], earthquake location [Bil94], earthquake source parameter estimation [HL95], residual statics corrections [LHS95], combined first-arrival traveltime and reflection coherency optimization [PL97], etc. Though the SA method is very time-consuming, the results shown so far are encouraging, as the numerous applications illustrate.

Figure 1.1: Comparison of "greedy" algorithms with SA. (a) "Greedy" algorithms (e.g., steepest descent) always perform downhill moves (dotted line), guiding the solution to the minimum which is closest to the initial model. (b) SA occasionally accepts uphill moves (solid line) so that it does not get stuck in local minima.

1.3 Simulated annealing, linearizing methods, and hybrid strategies

Both SA and linearizing optimization methods have commonly been used for solving geophysical parameter estimation problems. The drawbacks of the local (greedy) optimization algorithms have already been described. Not only is the issue of local convergence a problem, but also the fact that many geophysical inverse problems deal with very complex cost functions which are not easily linearized. Despite the limitations of local algorithms, their rapid convergence as compared to SA makes them extremely useful for most geophysical applications, since the amount of data usually involved is enormous and computational speed becomes a necessity. SA, on the other hand, is a time-consuming algorithm. This is the price to be paid in exchange for obtaining the global minimum of the objective function. Usually, hundreds or thousands of iterations must be performed until convergence, even using fast approaches like very fast simulated annealing (VFSA) [Ing89]. Simulated quenching (a variant of SA) is not always the best remedy for this problem, for a trade-off between convergence speed and global convergence arises.

One way of exploiting the advantages of both methods while working around their respective drawbacks is to combine them into some hybrid strategy. Chunduru et al. [CSS96] combine the conjugate gradient method with VFSA for the inversion of 2-D resistivity data and for seismic velocity analysis. Liu et al. [LHS95] combine the simplex method with a SA algorithm based on Lane's approach [Lan92] for 1-D acoustic waveform inversion and residual statics. Hartzell and Liu [HL95] use the hybrid method of Liu et al. for earthquake source parameter estimation. Both hybrid strategies have been shown to be more efficient than the approach based on SA alone, although their global convergence properties are not clear. In the next chapters, I will develop hybrid strategies to improve the performance of VFSA for wavelet estimation and two-point seismic ray tracing. In particular, I combine VFSA with the steepest descent method and a quasi-Newton method, respectively.

1.4 Alternatives to simulated annealing

Simulated annealing optimization encompasses a wide family of algorithms which use different strategies regarding the perturbation stage, temperature schedule, and acceptance criterion, as the variety of application problems illustrates.
Other global optimization algorithms exist to solve the same problems. Branch-and-bound and related exhaustive methods are guaranteed to find the global minimum, though not necessarily in an acceptable time [RR88]. Pure random search and the multistart methods [And72] are two closely related stochastic methods which have traditionally been used for global optimization. These methods are rarely used in modern applications due to their low efficiency in terms of the computational cost of obtaining the global minimum. Simulated evolutionary optimization frequently offers more attractive alternatives [Fog94]. These methods encompass genetic algorithms [Dav87, Gol89], evolution strategies, and evolutionary programming [Fog93]. Among other, less conventional methods for global optimization, I can mention taboo search [CK95], neural networks [RS88], etc.

Despite the fact that these methods do not lend themselves to theoretical understanding (particularly regarding convergence), researchers have found them very effective in some geophysical problems. Such is the case of genetic algorithms (see for example [SS95] for detailed accounts of genetic algorithms in geophysical applications), taboo search in connection with seismic inversion [VM96], and neural networks for automated velocity picking [FK94]. The methods mentioned, together with "greedy" methods and countless hybrid combinations, have been used successfully in many applications. However, none of the mentioned algorithms seems to be universally preferred for all problems. Researchers find, usually through experimentation, which tool best fits the problem at hand.

1.5 Dissertation outline

This dissertation addresses some fundamental inverse problems arising in geophysics which have not been solved satisfactorily using standard "greedy" algorithms nor ever addressed using SA. In all cases I use very fast simulated annealing (VFSA) to globally minimize the multimodal, nonlinear cost functions that are involved. In some cases I combine VFSA with linearizing algorithms to expedite convergence and/or to refine the solution. The first portion of the thesis, Chapter 2, provides enough introductory material on SA that the dissertation stands on its own. It also points out the differences between SA and simulated quenching, and explores strategies for dealing with constraints other than bounding ranges.

Chapter 3 deals with the problem of wavelet estimation using the fourth-order cumulant matching method [Tug87, Laz93, VU95a, VU96b]. The fourth-order cumulant matching (CM) method has recently been developed for estimating a mixed-phase wavelet from a convolutional process [Laz93]. Matching between the trace fourth-order cumulant and the wavelet fourth-order moment is done in a minimum mean-squared error sense under the assumption of a non-Gaussian, stationary, and statistically independent reflectivity series. This leads to a highly nonlinear optimization problem, usually solved by techniques that require a certain degree of linearization and that invariably converge to the minimum closest to the initial model. Alternatively, the SA approach developed in Chapter 3 provides reliability of the numerical solutions, reducing the risk of being trapped in local minima. Beyond the numerical aspect, the reliability of the derived wavelets depends strongly on the amount of data available.
However, using a multidimensional taper to smooth the trace cumulant, I show that the method can be used even in a trace-by-trace implementation, a point that is, I believe, very important from the point of view of stationarity and consistency. Here I combine both optimization techniques into a hybrid strategy to exploit the efficiency of the linearizing algorithm and the reliability of the SA solutions. I also demonstrate the viability of the method under several reflectivity models that may arise in real situations. The application of the hybrid strategy is illustrated by means of marine and field data examples. The consistency of the results is very encouraging because the improved cumulant matching strategy I developed can be effectively used in a trace-by-trace implementation. This chapter appeared previously as a paper in Geophysics [VU96b].

The purpose of Chapter 4 is to develop a new method for solving the two-point seismic ray-tracing problem based on Fermat's principle of stationary time, which I call SART (simulated annealing ray tracing). In models where more than one raypath satisfies Fermat's principle (multipathing), SART focuses on finding the direct ray trajectory whose traveltime is minimum. The nonlinear algorithm overcomes some well-known difficulties that arise in standard ray shooting and bending methods. Problems related to (1) the selection of new take-off angles and (2) local minima in multipathing cases are overcome by using an efficient SA algorithm to find the ray parameters that generate the absolute minimum path connecting source and receiver. At each iteration, the ray is propagated from the source by solving a standard initial-value problem. The last portion of the raypath is then forced to pass through the receiver. Using SA, the total traveltime is then globally minimized, and the appropriate initial conditions that produce the absolute minimum path are thus obtained. As described in Chapter 4, SART is suitable for accurately tracing rays through 2-D models only (an extension to 3-D is developed in Chapter 5). The model is parameterized by rectangular cells of constant velocity (or slowness), and at least one additional discontinuity (e.g., a reflector) can be included. This allows one to trace not only direct waves, but also reflections and headwaves of a predefined signature. In fact, SART can be used in conjunction with any available or user-preferred initial-value solver (in Chapter 5 I use a numerical ray-tracing procedure). A number of examples of multipathing in moderately complex 2-D media are examined. The results using the nonlinear two-point ray-tracing system demonstrate that the absolute minimum path connecting source and receiver can be obtained regardless of the initial guess. Parts of this chapter appeared as a paper in Geophysical Research Letters [VU96a].

Chapter 5 has two purposes: (1) to extend SART to encompass 3-D laterally varying media of arbitrary complexity, and (2) to present a very flexible model parameterization that can be used in conjunction with SART to trace complex raypaths through general 3-D media. Though the first purpose is the main goal, the second one is very important too, since solving the forward problem is not always well understood, especially in complex velocity structures. In the present formulation, the model is characterized by any number of regions separated by interfaces.
The velocity field within each individual region is specified separately. It may be any function of the space coordinates with the constraint of being twice differentiable. Interfaces can have arbitrary form with the only condition of being single-valued. The problems of local minima and take-off angle selection are more severe in the 3-D than in the 2-D case. These problems are overcome easily by SART, for the basic principles regarding global convergence defined for 2-D apply to 3-D as well. In SART (3-D case), the differential equations that govern the wave propagation are integrated using standard numerical ODE solvers until the ray emerges at the desired endpoint following the absolute minimum traveltime path. This boundary-value problem (BVP) is cast as a nonlinear optimization problem, which is in turn solved by means of VFSA. This guarantees convergence to the global minimum of the cost function representing the total traveltime from source to receiver. In 3-D, in contrast to 2-D, the cost function is at best a unimodal two-dimensional continuous smooth surface. But in real life, the cost function is often a multimodal, discontinuous, nondifferentiable, rough surface of two (or more than two) variables. To obtain the absolute minimum of such an ill-behaved function, a stochastic technique like SA becomes an invaluable tool. A number of synthetic examples are shown to demonstrate the versatility of the model parameterization and the viability of SART for solving the BVP effectively. Although the main goal of SART is to trace first-arrival direct waves, several strategies for dealing with complex raypaths (reflections, headwaves, multiples, etc.) are described in detail. A discussion of SART performance in terms of computational cost and accuracy is also presented and illustrated with numerous synthetic examples. This includes a quantitative comparison between SART and some standard shooting techniques. Finally, several alternative formulations of the optimization problem are explored and discussed in detail. Some parts of this chapter were presented at the V Congreso Argentino de Mecánica Computacional, Tucumán, Argentina, and also published in the proceedings [Vel96].

The last portion of this thesis, Chapter 6, deals with the problem of traveltime tomography using SA. The subsurface geology is parameterized either using (1) cubic B-splines defined over a 2-D grid with variable spacing in both horizontal and vertical coordinates, or (2) parametric 2-D functions aimed at representing anomaly-like bodies embedded in a smooth background velocity. The flexible grid used in the spline approach can accommodate moderately complex smooth structures with only a few nodes. The parametric approach allows one to image velocity anomalies of various shapes, including abrupt changes of velocity, using a small set of parameters. Both approaches attempt to minimize the rms error between observed and predicted arrival times, which are computed using a finite-difference algorithm. VFSA was used to estimate the velocity at the spline node locations, the grid spacing in each dimension, or the parameters defining the anomaly bodies. The solution is independent of the initial model because local minima are avoided during the process by a convenient uphill-downhill exploration of the model space. Results using synthetic examples show how moderately complex structures can be obtained using a few parameters without introducing undesirable artifacts [VU95b].
The spline approach is especially suited for smooth models, and the parametric approach for imaging buried structures such as those arising in archaeology and other near-surface studies [Vel98].

Chapter 2: Simulated Annealing

"The development of high-speed computers has made it possible to incorporate a 'randomizing unit' into some machines which will select random samples ad hoc. The use of such mechanisms requires investigation." — Maurice G. Kendall, The Advanced Theory of Statistics, 1969

2.1 Introduction

The SA method is a stochastic computational algorithm for finding near-optimal solutions to hard optimization problems. Van Laarhoven and Aarts provide a complete introductory treatment of SA and describe traditional applications [vLA88]. The SA method is derived from statistical mechanics, where the behavior of a large number of particles at a given temperature is analyzed. In 1953, Metropolis et al. [MRR+53] presented a Monte Carlo sampling technique for modeling the evolution of a solid submerged in a heat bath at a given temperature. Three decades later, the technique was generalized and applied to nonlinear optimization problems [KGV83], specifically to the problem of partitioning, component placement, and wire routing in VLSI (very large scale integration) design. Here, the unknown parameters play the role of the particles in the solid, and the cost (objective) function represents the energy of the system. In the SA approach, the model space is randomly perturbed and new configurations are accepted (or rejected) so that the cost function or energy decreases iteration after iteration. Occasionally, some increases of the cost function are allowed, and it is this uphill movement that allows the system to escape from local minima. Initially, the temperature (a control parameter) is high enough that almost all proposed configurations are accepted. On the contrary, at low temperatures the probability of accepting a new configuration corresponding to an increase of the cost function is small. The fluctuations of the random perturbations are gradually decreased along with the temperature following a predefined cooling schedule, and the process is stopped after a maximum number of iterations or when no noticeable change in the cost function is observed during a certain number of consecutive steps. At this point it is said that the system has "crystallized" in the sense that the minimum energy state (global minimum) has been obtained.

2.2 Statistical mechanics, the Metropolis algorithm, and the annealing process

The basics of statistical mechanics should be understood to appreciate the connection between condensed matter annealing and the solution of hard optimization problems, such as those occurring in the solution of some geophysical inverse problems. A system consisting of a large number of particles interacting at a finite temperature can be described by the position of the particles. The configuration of the system is referred to as the "state" x. Such is the case of a fluid or melted solid submerged in a heat bath. If T is the temperature of thermal equilibrium of the system with energy E(x), then the probability π_T(x) that the system is in state x can be described by the Boltzmann or Gibbs distribution

π_T(x) = e^{−E(x)/KT} / Z(T), (2.1)

where K is Boltzmann's constant,

Z(T) = Σ_{w∈X} e^{−E(w)/KT} (2.2)

is the partition function, and X is the set of all possible states.
The Metropolis algorithm (MA) [MRR+53] is a simple stochastic algorithm for simulating this kind of process, developed in the earliest days of scientific computing. In each step of this procedure, a small random displacement (perturbation) Δx to the current state x with energy E(x) is performed, and the energy change ΔE = E(x + Δx) − E(x) is then computed. Now, the ratio h between the probabilities of both states is evaluated:

h = π_T(x + Δx) / π_T(x) = e^{−ΔE/KT}. (2.3)

If the energy of the new state is less than the energy of the previous state, that is, if h > 1, then the new configuration is accepted as the new state. If h ≤ 1, that is, if the energy of the new state is greater than or equal to the energy of the previous state, then the new configuration is accepted with probability h. This criterion for accepting or rejecting a new state is known as the "Metropolis criterion" [MRR+53]. It can be shown that as the number of iterations tends to infinity, the probability that the system is in a given state x equals π_T(x), and hence the distribution of the generated configurations converges to the Boltzmann distribution [GG84].

It is interesting to study the behavior of the system at low energy states, in order to appreciate the crystalline structure of the melted solid as the temperature is gradually decreased (chemical annealing). For this purpose, the temperature of the heat bath must be decreased slowly enough that the probability of obtaining lower energy states according to the Boltzmann distribution (2.1) predominates. These states can be obtained by keeping the system in thermal equilibrium. As a counterpart, "quenching" is the process by which the temperature of the heat bath is lowered very quickly, giving not enough time for the system to reach thermal equilibrium. Consequently, the probability of obtaining low energy states is greatly reduced, and the resulting structure of the material may be far from constituting a crystalline lattice.

2.3 Simulated annealing optimization

In 1983, Kirkpatrick et al. [KGV83] made the analogy of a computer design problem with the physical system described in the previous section. They devised a strategy for obtaining a near-optimal solution to the problem of partitioning, component placement, and wire routing in VLSI design. Basically, a Metropolis algorithm was applied at each of a series of iterations with gradually decreasing temperatures. To apply SA to an optimization problem of the form of equation (1.1), the following analogies must be made:

- Identify the configuration of parameters (model) with the configuration of particles.
- Identify the objective or cost function, say Φ, with the energy of the system.
- Identify the process of finding a near-optimal solution with the process of finding the lowest energy states.

The Boltzmann constant K is set to 1 and the temperature T becomes a control parameter. In addition, three very important functionals must be defined:

1. g(Δx): the probability density function (pdf) for generating a new configuration in the model space;
2. h(ΔΦ): the probability density function for accepting a new configuration; and
3. T_k: the cooling schedule for the annealing process (temperature T at iteration k).

Here x represents the model space of M parameters. Note that both the generating and the accepting pdfs are functions of the temperature as well.
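As an illustration of how the three functionals fit together, here is a minimal sketch of the basic SA loop (this code and its names are mine, not from the thesis; `cost`, `generate`, `accept`, and `schedule` are hypothetical problem-dependent callables):

```python
import numpy as np

def anneal(cost, x0, generate, accept, schedule, k_max=10000, rng=None):
    """Generic SA loop: plug in the generating pdf g, the acceptance
    rule h, and the cooling schedule T_k (all problem-dependent)."""
    rng = rng or np.random.default_rng()
    x, phi = x0, cost(x0)
    x_best, phi_best = x, phi
    for k in range(k_max):
        T = schedule(k)                    # cooling schedule T_k
        x_new = generate(x, T, rng)        # draw a perturbation from g
        phi_new = cost(x_new)
        if accept(phi_new - phi, T, rng):  # acceptance rule h
            x, phi = x_new, phi_new
            if phi < phi_best:             # track the lowest-energy state
                x_best, phi_best = x, phi
    return x_best, phi_best
```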
The main differences concerning performance among various SA techniques reside in the choice of these three functions. Figure 2.1 shows a flow chart describing a typical SA algorithm. At each iteration, a new state is generated by drawing from the pdf g(Δx). The Metropolis criterion is then evaluated and, after lowering the temperature according to the cooling schedule T_k, a new iteration is started. The process is stopped when a maximum number of iterations, k_max, is reached (note that other stopping conditions may be used). When the cost function increases, the probability of accepting the uphill move is given by h. In practice, the new state is accepted if h is greater than a random number r, which is drawn from a uniform pdf, U[0, 1].

Figure 2.1: Flow chart of a basic SA algorithm.

2.3.1 Generating function

Initially, the Metropolis algorithm searched the model space using a uniform pdf for g. Faster convergence was later achieved by using a Gaussian pdf instead, which concentrates on the states with lowest energy by narrowing the search as the temperature decreases ("Boltzmann annealing" [SH87]). In this case, the M-dimensional Gaussian pdf may be expressed by

g(Δx) = (2πT)^{−M/2} e^{−|Δx|²/2T}, (2.4)

where Δx = x − x₀ is the perturbation applied to the previous configuration, x₀. Here the temperature T can be viewed as a measure of the fluctuations of g around x₀. At high temperatures, the model space is sampled more or less uniformly. Conversely, when temperatures are low, perturbations tend to be smaller.

2.3.2 Acceptance function

Generally, the Metropolis criterion is used for accepting/rejecting new configurations:

h(ΔΦ) = e^{−ΔΦ/T}. (2.5)

This expression is based on the probability of obtaining a new state with energy Φ_{k+1} relative to the previous state with energy Φ_k, where ΔΦ = Φ_{k+1} − Φ_k. This is essentially the Boltzmann distribution. Following Metropolis, a new configuration is accepted unconditionally if ΔΦ ≤ 0, and accepted with probability h(ΔΦ) if ΔΦ > 0. Clearly, as the temperature approaches zero, the acceptance probability decreases exponentially and only the lowest energy states are accepted. This is the key point of SA optimization. It states the most important difference from the so-called "greedy" algorithms. While other optimization methods perform only downhill moves (i.e., accept only those states for which the cost function decreases), SA allows uphill moves too. This behavior facilitates exploring the whole model space without being trapped in local minima. Linearizing techniques (e.g., steepest descent) belong to the class of greedy algorithms, and their success in obtaining the global minimum of multimodal cost functions is very much influenced, as is well known, by the starting model.
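A hedged sketch of these two Boltzmann-annealing ingredients — Gaussian perturbations, equation (2.4), and the Metropolis acceptance rule, equation (2.5) — compatible with the generic loop sketched earlier (function names are mine):

```python
import numpy as np

def gaussian_generate(x, T, rng):
    # Eq. (2.4): M-dimensional Gaussian perturbation with variance T;
    # the temperature controls the size of the fluctuations around x.
    return x + rng.normal(0.0, np.sqrt(T), size=x.shape)

def metropolis_accept(d_phi, T, rng):
    # Eq. (2.5): downhill moves are always accepted; uphill moves are
    # accepted when h = exp(-d_phi/T) exceeds a uniform random number r.
    return d_phi <= 0 or rng.random() < np.exp(-d_phi / T)
```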
2.3.3 Cooling schedule

The schedule used to lower the temperature of the annealing process plays a key role in SA optimization. Ideally, one would like to lower the temperature as fast as possible in order to reach the lowest energy states in a few iterations. However, theory indicates that the cooling rate cannot be faster than a prescribed function if annealing, and not quenching, is desired. Back to the thermal process analogy arising in statistical mechanics: if the metal is cooled slowly enough, the crystal lattice structure results in a configuration with minimal energy (global minimum). On the contrary, if the temperature is lowered too fast, distortions in the crystal structure which represent higher energy states (local minima) will be preserved at the end of the process. These two different processes, that is, annealing and quenching, are well known in chemical annealing. Thus, for true annealing, thermal equilibrium must be obtained before the temperature is lowered. Originally, this state was reached by keeping the temperature at a constant level [MRR+53]. Later, it was found that the equilibrium could be obtained provided that the temperature was lowered slowly enough. It was Geman and Geman [GG84] who showed that the global minimum of Φ can be (statistically) obtained if the temperature is lowered no faster than

T_k = T₀ / ln(1 + k), (2.6)

where T₀ is a constant and k is the iteration number. If the temperature is lowered faster than expressed in equation (2.6), premature crystallization may occur and the system may get trapped in a local minimum. A heuristic demonstration can be given to show that expression (2.6) is a sufficient condition for convergence to a global minimum [SH87] (see Appendix A).

2.4 Fast annealing (FA)

A faster cooling schedule can indeed be used, without affecting the convergence proof, by modifying the generating function. "Fast annealing" (FA) uses a Cauchy pdf of the form

g(Δx) = T / (|Δx|² + T²)^{(M+1)/2}. (2.7)

The cooling schedule that guarantees (statistically) that a global minimum can be found for this particular choice of g is

T_k = T₀ / k. (2.8)

The corresponding heuristic demonstration is given in Appendix A. FA has proven to be superior to standard annealing [SH87] due to the fact that its cooling rate is exponentially faster than the logarithmic schedule used by BA. The fatter tail of the Cauchy distribution (see Figure 2.2) allows the system to explore the model space more conveniently, and local minima are more easily avoided.
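For illustration only: in one dimension, a Cauchy-distributed perturbation can be drawn through the inverse cumulative distribution. The thesis uses the M-dimensional form (2.7), so this 1-D sketch is a simplification of mine:

```python
import numpy as np

def cauchy_generate_1d(x, T, rng):
    # 1-D Cauchy perturbation with scale T: dx = T * tan(pi*(u - 1/2)),
    # with u uniform on (0, 1). The fat tail keeps occasional large jumps
    # even at low temperature, which is what permits the faster schedule
    # T_k = T0/k of eq. (2.8).
    u = rng.random()
    return x + T * np.tan(np.pi * (u - 0.5))
```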
2.5 Very fast simulated annealing (VFSA)

Ingber [Ing89] proposed a SA technique that allows a still faster cooling schedule. This scheme is superior to the aforementioned SA approaches [Ros92] and in some cases superior to evolutionary methods such as genetic algorithms [IR92]. VFSA uses a generating function with fatter tails than the Cauchy's. Contrary to BA and FA, which sample infinite ranges, VFSA samples finite ranges. Besides, VFSA takes into account the differences in each parameter dimension by allowing g to expand or contract adaptively according to the sensitivity of the cost function in each dimension (this procedure is called "re-annealing").

Figure 2.2: In SA, a new model state is generated by drawing a perturbation from the probability density function g(x). The figures show the pdfs corresponding to Boltzmann annealing (BA), fast annealing (FA), and very fast simulated annealing (VFSA), for T = 1, T = 0.1, and T = 0.01. Note the fatter tail of the VFSA generating function, even at low temperatures.

2.5.1 Description of the algorithm

In VFSA, new parameters x_i in dimension i are generated via the random variable y_i using

x_i^{(k+1)} = x_i^{(k)} + y_i (B_i − A_i), y_i ∈ [−1, 1]. (2.9)

The new parameters x_i^{(k+1)} are constrained to the range [A_i, B_i], and formula (2.9) is repeated for all i until a set of M new parameters is generated. The generating function in the VFSA algorithm is defined as

g(y) = Π_{i=1}^{M} 1 / [2 (|y_i| + T_i) ln(1 + 1/T_i)], (2.10)

where I have dropped the superscript k for simplicity. The cumulative probability distribution may be obtained by the integral

∫_{−1}^{y_1} ··· ∫_{−1}^{y_M} g(y′) dy′_1 ··· dy′_M = Π_{i=1}^{M} G_i(y_i), (2.11)

where

G_i(y_i) = 1/2 + [sgn(y_i)/2] ln(1 + |y_i|/T_i) / ln(1 + 1/T_i). (2.12)

As a result, new points ruled by this distribution may be generated from a uniform distribution u_i ∈ [0, 1] considering

y_i = G_i^{−1}(u_i), (2.13)

hence

y_i = sgn(u_i − 1/2) T_i [(1 + 1/T_i)^{|2u_i − 1|} − 1]. (2.14)

Having defined g(y), the cooling schedule that statistically guarantees convergence to a global minimum can be given by

T_i = T_{0i} e^{−c_i k^{1/M}}, (2.15)

where c_i is some user-defined constant that does not affect the convergence proof given in Appendix A. In practical contexts, this constant may be used to tune the algorithm for particular applications. For example, to reach a final temperature T_{fi} in k_{fi} iterations, set

c_i = k_{fi}^{−1/M} ln(T_{0i} / T_{fi}). (2.16)
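The closed form (2.14) makes the VFSA perturbation straightforward to code. A minimal sketch of mine, following equations (2.9), (2.14), and (2.15), with per-dimension temperatures handled as a NumPy array:

```python
import numpy as np

def vfsa_step(x, lo, hi, T, rng):
    """One VFSA perturbation: draw y_i from eq. (2.14) using uniform u_i,
    then update via eq. (2.9). lo/hi are the bounds A_i and B_i; T is the
    vector of parameter temperatures T_i."""
    u = rng.random(size=x.shape)
    y = np.sign(u - 0.5) * T * ((1.0 + 1.0 / T) ** np.abs(2.0 * u - 1.0) - 1.0)
    return x + y * (hi - lo)

def vfsa_temperature(T0, c, k, M):
    # Eq. (2.15): parameter temperature at iteration k.
    return T0 * np.exp(-c * k ** (1.0 / M))
```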
Simulated Annealing 24 the number of iterations it is always worthwhile, and quenching has been found very useful in a number of circumstances [Ing93]. For example, Mirkin et al. compared various (quenching) schedules for the residual statics problem and obtained satisfactory results [MVC93]. An interesting discussion on annealing vs quenching can be found in [Ing96]. 2.6.1 Initial temperature The selection of the initial temperature is sometimes a critical issue in standard SA algorithms. If To is too high, the system will take longer to reach the lowest energy states. On the other hand, if To is too low, the system might freeze too soon. The convergence proof for B A requires that the initial temperature in equation (2.6) is sufficiently high. This is not the case for FA and V F S A , where To is arbitrary, as shown in Appendix A . Besides, since the coohng rate of both FA and V F S A is much faster that B A (see Figure 2.3), the selection of a high value for T0 is not critical at all. In practical contexts, the initial temperature should be such that the expected cost at this temperature is within one standard deviation of the mean cost [Whi84]. This is to ensure that almost all proposed configurations are accepted at this stage (the system is completed \"melted\"). For this purpose it is enough to set To numerically equal to the mean cost function for a set of arbitrary points selected at random from the model space. In Figure 2.3 I illustrated various coohng rates for different SA algorithms. For the sake of simplicity, I set the initial temperatures so that Ti = 1 in all cases. Since To is arbitrary for FA and V F S A , smaller values could have been selected for higher coohng rates without affecting the convergence proof. In terms of rate of convergence, it is worthwhile mentioning that too high values for To in B A might diminish its efficiency tremendously. In FA and V F S A one can be much less cautious in selecting the initial temperature. Chapter 2. Simulated Annealing 25 A t this point it is necessary to stress the difference, in the V F S A algorithm, between acceptance temperature, which is associated with the Metropolis criterion, and parameter temperature, which is associated with the generating function. Even though the cooling rate for both temperatures is generally the same, the init ia l temperatures might be dif-ferent. In the first case, T0 is chosen as described above (mean cost at arbitrary points). In the second case, Ten is generally set equal to 1 for all i 's, so that the model space is sampled widely independently of the init ial configuration. These are the methods I wi l l use for selecting the ini t ia l temperatures throughout this thesis. 2.7 Constrained simulated annealing There are two types of constraints that may arise in nonlinear optimization problems such as equation (1.1): • bounding constraints, and • model constraints. A n y model configuration that satisfies the given constraints, is said to belong to the feasible region. Whi le configurations not satisfying the constraints are in the infeasible region. The union of both regions form the model space. Bounding constraints usually consist of minimum and maximum values for each model parameter (search space). These constraints are naturally incorporated in V F S A by specifying the ranges for the \"random\" perturbation stage in equation (2.9). Whenever a new configuration falls outside the given ranges, equation (2.9) is repeated unti l all model parameters fall inside the feasible region. 
The incorporation of model constraints is not so direct. Mode l constraints are of four types: 1. exact linear constraints, Chapter 2. Simulated Annealing 26 2. exact nonlinear constraints, 3. inequality linear constraints, and 4. inequality nonlinear constraints. The model constraints can be any relationship, explicit or implicit, between the model parameters. Often they represent a difficult optimization problem on their own. Hand-ling these four types of model constraints using any optimization algorithm requires sometimes very different approaches and strategies. But in general, there are two ways of addressing the resulting constrained optimization problem. One way is to devise an algorithm able to generate always feasible points in an efficient way. These points are then checked for acceptance/rejection using the Metropolis algorithm and SA proceeds as usual. The second approach is to transform the constrained optimization problem into an unconstrained one by means of penalty terms. During the first stages of the optimization, model parameters usually are infeasible points, but as iteration proceeds, they tend to get into the feasible region. It is not the purpose of this chapter to develop general methods for handling model constraints with SA, although the literature in these matters is not complete and further investigation is required. In general, the first of the above two approaches (sampling from the feasible region) is desirable for nonlinear optimization, for SA concentrates on decreasing the value of the cost function within the feasible region only and does not waste the time exploring the infeasible model space. The second approach (penalizing infeasible points) is perhaps more widely used for its apparent simplicity, though the sampling may become biased towards undesirable model regions during too many SA iterations. Moreover, if penalty terms are not chosen properly, the algorithm may get easily trapped in local minima. Sampling from the feasible region for linear model constraints is in Chapter 2. Simulated Annealing 27 general easier than for nonlinear ones, for linear constraints constitute a convex set1. This convex set can be sampled with no difficulties using a number of techniques (see for example [BBK +87]). These sampling techniques can be used in conjunction with SA for generating feasible (random) points. The same algorithms can be extended for nonlinear model constraints [CH82], but only if they define a convex set. A general technique for sampling feasible points with arbitrary nonlinear model constraints remains to be found. Fortunately, particular methods can be devised for the optimization problem at hand, and a general technique is not always required. In Chapters 4 and 5 I devised some schemes for dealing with nonlinear model constraints in connection to the two-point ray-tracing problem via simulated annealing (SART). The model constraint here is that the raypath must arrive to a given receiver point. The cost function is represented by the traveltime. It is not obvious which is the feasible region in this problem. Actually, it corresponds to those take-off angles which generate a ray propagating from the source to the receiver (in the two-dimensional space defined by the two take-off angles, declina-tion and azimuth, the feasible region consists of isolated points). Clearly, not only the constraint is nonlinear, but also the feasible region is nonconvex. 
Sampling from the feasible region for linear model constraints is in general easier than for nonlinear ones, for linear constraints constitute a convex set¹. This convex set can be sampled with no difficulties using a number of techniques (see for example [BBK+87]). These sampling techniques can be used in conjunction with SA for generating feasible (random) points. The same algorithms can be extended to nonlinear model constraints [CH82], but only if they define a convex set. A general technique for sampling feasible points with arbitrary nonlinear model constraints remains to be found. Fortunately, particular methods can be devised for the optimization problem at hand, and a general technique is not always required. In Chapters 4 and 5 I devise some schemes for dealing with nonlinear model constraints in connection with the two-point ray-tracing problem via simulated annealing (SART). The model constraint here is that the raypath must arrive at a given receiver point. The cost function is represented by the traveltime. It is not obvious what the feasible region is in this problem. Actually, it corresponds to those take-off angles which generate a ray propagating from the source to the receiver (in the two-dimensional space defined by the two take-off angles, declination and azimuth, the feasible region consists of isolated points). Clearly, not only is the constraint nonlinear, but the feasible region is also nonconvex.

If one were able to sample from the feasible region, the problem would be solved almost immediately. But this is not possible in general. In SART the feasible region is extended to encompass all possible take-off angles (within a given range), though not necessarily corresponding to physically meaningful raypaths. However, when the traveltime is minimum, the final raypath satisfies the ray equations from source to receiver.

¹A set is said to be convex if it contains the straight line segment between any two of its points (see for example [Rud87]).

2.8 Conclusions

SA is a very powerful and important tool for optimizing difficult cost functions in a variety of disciplines. VFSA, a variant of SA, outperforms other SA algorithms in terms of computational cost. This is achieved by exploring the model space more efficiently and allowing for fast cooling rates. At the same time, one can be much less cautious in selecting the initial temperature. In contrast to simulated quenching (SQ), VFSA strictly adheres to the sufficient conditions for global convergence. Nevertheless, SQ is a valuable and valid alternative in terms of its practicality. Constraints come in two flavors: bounding constraints and model constraints. The first type is naturally implemented in VFSA, since a search range is specified for each model parameter. The second type requires special consideration, which may be problem dependent. Linear model constraints are by far more easily implemented than nonlinear ones. It is worth mentioning that there exist in the literature other SA algorithms not described in this chapter. To name a few, I can mention the heat bath algorithm [Rot86], SA without rejected moves [GS84], mean field annealing [BMT+89], etc. The first two are intended to reduce the low acceptance-to-rejection ratio of the standard SA algorithm. The third one is an SQ algorithm that performs very well when a mean-field theory is a good approximation to a stochastic cost function².

²In statistical mechanics, the mean-field theory is based on approximating the stochastic behavior of large systems of electronic spin elements with complex interactions by a deterministic set of equations. In the SA approach, these equations are solved iteratively.

Chapter 3

Wavelet Estimation Using Fourth-Order Cumulant Matching

Models are to be used but not to be believed.
Henry Theil

3.1 Introduction

In this chapter I am concerned primarily with the problem of wavelet estimation via the 4th-order cumulant matching (CM) approach. In essence, the problem consists of estimating, at best, two unknowns from one equation. In the seismic case, as is well known, the model of the recorded seismic trace, $x(t)$, is generally formulated as a convolution plus additive noise,

$$x(t) = a(t) * r(t) + v(t), \qquad (3.1)$$

where $a(t)$ is the seismic wavelet and $r(t)$ is the reflectivity series. The CM approach makes use of higher-order cumulants, higher-order covariance functions with special properties. It turns out that, as a result of the convolutional model which I have described, including additive Gaussian noise, the seismic wavelet can be estimated from the noisy seismogram under the constraint that the reflectivity is a stationary, non-Gaussian, and statistically independent random process [BR67, LR82, NR87].
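To make the convolutional model concrete, the following minimal sketch generates a noisy trace according to equation (3.1) from a sparse, non-Gaussian reflectivity. The generic zero-phase pulse used here is hypothetical, standing in for the seismic wavelet:

```python
import numpy as np

def synthetic_trace(n=250, wavelet=None, spike_prob=0.2, noise_pct=0.10, rng=None):
    """Build x(t) = a(t) * r(t) + v(t) with a sparse (non-Gaussian)
    reflectivity and Gaussian noise scaled by the data amplitude."""
    rng = np.random.default_rng(0) if rng is None else rng
    if wavelet is None:
        t = np.arange(-12, 13)
        wavelet = (1 - 0.5 * t**2) * np.exp(-0.25 * t**2)  # generic zero-phase pulse
    # Sparse reflectivity: Gaussian amplitudes on randomly located spikes
    r = rng.normal(size=n) * (rng.uniform(size=n) < spike_prob)
    x = np.convolve(r, wavelet, mode="same")                  # a(t) * r(t)
    x += noise_pct * np.max(np.abs(x)) * rng.normal(size=n)   # v(t)
    return x, r, wavelet
```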
Since the 2nd-order cumulant of a zero-mean process (autocorrelation) is a phaseless function, and the 3rd-order cumulant is identically zero when the process is symmetrically distributed (like most seismic reflectivity sequences), higher-order cumulants are needed for mixed-phase estimation. The information contained in the 4th-order cumulant (a 3-D function) is enough to determine the wavelet within a polarity reversal and a time shift, and no assumptions are required concerning the wavelet phase.

Tugnait [Tug87] presented a 4th-order CM method in which a mixed-phase moving-average (MA) wavelet is estimated from the data 4th-order cumulant, which is generally an estimate of the wavelet 4th-order moment. Following that work, Lazear [Laz93] presented a parametric CM method for estimating a mixed-phase wavelet using real seismic data. The CM problem turns into an optimization problem where a highly nonlinear cost function is to be minimized. The main purpose of this chapter is to offer an improved strategy for solving the optimization problem that makes use of a very fast simulated annealing (VFSA) algorithm [Ing93], reducing the risk of the solution being trapped in local minima. I also propose a hybrid strategy aimed at a trace-by-trace implementation. This approach combines the fast performance of a standard linearizing technique with the reliability of the results provided by VFSA. I propose the application of a multidimensional taper to smooth the trace cumulant as a method to provide better estimates even when small amounts of data are used. This is particularly important because the accuracy of the trace cumulant estimate depends strongly on the amount of (stationary) data available. The importance of tapering the trace cumulant is illustrated by means of both synthetic and field data examples. Very recently, Hargreaves [Har94] has pointed out that the 4th-order cumulant, like the kurtosis, may be phase insensitive below a limiting bandwidth. I explore this crucial aspect of the CM approach and show that, in general, any lag of the 4th-order moment exhibits the same degree of sensitivity to wavelet phase. I draw attention to the behavior of the CM method under various non-Gaussian reflectivity models that may arise in real situations [WH86]. Here I show various synthetic examples using as few as 250 data samples and as many as 10,000 data samples. The results are very encouraging because very good wavelet estimates are obtained for most models. Finally, I illustrate the improved CM strategy in a trace-by-trace implementation using both on-shore and marine data. With these examples I demonstrate the viability of the CM approach even when a small amount of data is used.

3.2 Background theory

3.2.1 Cumulants and moments

Cumulants and moments are higher-order covariance functions with properties that make them very useful for describing both stochastic and deterministic signals (see for example [Men91]). Given a real stationary discrete-time random process $x(t)$, $t = 0, \pm 1, \pm 2, \cdots$, its $n$th-order moment function is expressed by

$$m_n^x(\tau_1, \tau_2, \cdots, \tau_{n-1}) = E\{x(t)\,x(t+\tau_1) \cdots x(t+\tau_{n-1})\}, \qquad (3.2)$$

where $E\{\cdot\}$ denotes statistical expectation. In the case of real discrete-time deterministic finite-length signals, $a(t)$, $t = 0, 1, \cdots, M-1$, expectations are replaced by sums¹:

$$m_n^a(\tau_1, \tau_2, \cdots, \tau_{n-1}) = \sum_{t=0}^{M-1} a(t)\,a(t+\tau_1) \cdots a(t+\tau_{n-1}). \qquad (3.3)$$

These moments measure the degree of similarity between a signal and a product of delayed or advanced versions of itself. For example, $m_1^a$ is known as the "mean" value, $m_2^a(\tau_1)$ is the autocorrelation function, etc.

¹In practice, higher-order moments are estimated using this formula for both stationary random processes and finite-length deterministic signals [NP93].
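A direct sketch of equation (3.3) for the fourth-order case follows; samples outside the signal's support are treated as zero, as implied by the finite length:

```python
def moment4(a, tau1, tau2, tau3):
    """Fourth-order moment of a finite-length signal, equation (3.3) with
    n = 4: the sum over t of a(t) a(t+tau1) a(t+tau2) a(t+tau3)."""
    M = len(a)
    total = 0.0
    for t in range(M):
        if all(0 <= t + tau < M for tau in (tau1, tau2, tau3)):
            total += a[t] * a[t + tau1] * a[t + tau2] * a[t + tau3]
    return total
```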
For $n = 3, 4$, the $n$th-order cumulant function of a non-Gaussian stationary random process, $x(t)$, can be written as

$$c_n^x(\tau_1, \tau_2, \cdots, \tau_{n-1}) = m_n^x(\tau_1, \tau_2, \cdots, \tau_{n-1}) - m_n^G(\tau_1, \tau_2, \cdots, \tau_{n-1}), \qquad (3.4)$$

where $m_n^G(\tau_1, \tau_2, \cdots, \tau_{n-1})$ is the $n$th-order moment function of an equivalent Gaussian random process that has the same mean value and autocorrelation function as $x(t)$. Clearly, if $x(t)$ is Gaussian, $c_n^x(\tau_1, \tau_2, \cdots, \tau_{n-1}) = 0$. Since there is no clear advantage to using cumulants for the analysis of deterministic signals, the previous definition applies only to stochastic processes.

The 1st-order cumulant is equal to the 1st-order moment, which is the mean value. The 2nd-order cumulant is equal to the covariance and, in the case of a zero-mean process, it is equal to the autocorrelation. This function is symmetric about $\tau = 0$ and hence all phase information is lost. As a result and, as is well known, methods based on the use of the autocorrelation are only capable of identifying a minimum-phase wavelet from a convolutional process. The 3rd-order cumulant equals the 3rd-order moment in the case of a zero-mean process (i.e. $m_3^G(\tau_1, \tau_2) = 0$ for all $\tau_1, \tau_2$). It is a 2-D function that preserves phase information in its structure, which may be exploited for system identification. However, when its argument is an independent identically distributed (IID) random process (such as most reflectivities), it is identically zero. On the other hand, the 4th-order cumulant of a non-Gaussian stationary signal not only preserves phase information, but is also nonzero for an IID process. In the case of a zero-mean signal, $x(t)$,

$$m_4^G(\tau_1, \tau_2, \tau_3) = m_2^x(\tau_1)\,m_2^x(\tau_3 - \tau_2) + m_2^x(\tau_2)\,m_2^x(\tau_3 - \tau_1) + m_2^x(\tau_3)\,m_2^x(\tau_2 - \tau_1). \qquad (3.5)$$

By putting $\tau_1 = \tau_2 = \tau_3 = 0$ in the previous definitions, we have the familiar quantities

$$\gamma_2^x = c_2^x(0) = E\{x^2(t)\} \ \text{(variance)}, \quad \gamma_3^x = c_3^x(0,0) = E\{x^3(t)\} \ \text{(skewness)}, \quad \gamma_4^x = c_4^x(0,0,0) = E\{x^4(t)\} - 3E^2\{x^2(t)\} \ \text{(kurtosis)}. \qquad (3.6)$$

The normalized kurtosis is defined as $\gamma_4^x/(\gamma_2^x)^2$ and equals zero for a Gaussian process.

3.2.2 Cumulant matching (CM) for a convolutional process

Combining equations (3.1) and (3.4) and assuming that the reflectivity is a non-Gaussian, statistically independent (IID) process, the following important result is obtained [BR67]:

$$c_n^x(\tau_1, \tau_2, \cdots, \tau_{n-1}) = \gamma_n^r\,m_n^a(\tau_1, \tau_2, \cdots, \tau_{n-1}) + c_n^v(\tau_1, \tau_2, \cdots, \tau_{n-1}). \qquad (3.7)$$

Since $v(t)$ is assumed Gaussian, its $n$th-order cumulant is equal to zero. This means that the approach is blind (in theory) to additive Gaussian noise. Thus, I can rewrite formula (3.7) for $n = 4$ as

$$c_4^x(\tau_1, \tau_2, \tau_3) = \gamma_4^r\,m_4^a(\tau_1, \tau_2, \tau_3). \qquad (3.8)$$

Equation (3.8) expresses the fact that the 4th-order moment of the wavelet, $a(t)$, equals, within a scale factor, the 4th-order cumulant of the trace, $x(t)$. In practice, neither the 4th-order cumulant of the noise is zero nor is the 4th-order cumulant of the reflectivity equal to a spike of amplitude $\gamma_4^r$ at lag $(0,0,0)$, as expressed by formula (3.8). This is due to the fact that, despite the assumed stochastic properties of both noise and reflectivity, usually only a finite amount of data is available. Equation (3.8) is strictly true only for $N \to \infty$.
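A sample estimate of the normalized kurtosis in equation (3.6), useful as a quick check of the non-Gaussianity assumption (a minimal sketch):

```python
import numpy as np

def normalized_kurtosis(x):
    """Sample version of the zero-lag quantities in equation (3.6) for a
    zero-mean process: gamma4 / gamma2**2, which vanishes for Gaussian data."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                        # enforce the zero-mean assumption
    gamma2 = np.mean(x**2)                  # variance, c2(0)
    gamma4 = np.mean(x**4) - 3 * gamma2**2  # kurtosis, c4(0,0,0)
    return gamma4 / gamma2**2
```

For Gaussian noise this estimate is close to zero, while a sparse reflectivity yields a large positive value.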
For $N$ finite, one has an estimate of the trace 4th-order cumulant, and the CM is computed in a mean-squared-error sense. So I define the following cost function that relates the wavelet parameters to the estimated trace 4th-order cumulant, $\hat{c}_4^x(\tau_1, \tau_2, \tau_3)$:

$$\Phi = \sum_{\tau_1}\sum_{\tau_2}\sum_{\tau_3} \left[\hat{c}_4^x(\tau_1, \tau_2, \tau_3) - \gamma_4\,m_4^a(\tau_1, \tau_2, \tau_3)\right]^2. \qquad (3.9)$$

In the optimization algorithm, $\Phi$ is minimized with respect to the wavelet parameters (the scale factor $\gamma_4$ can be absorbed by $m_4^a$). This method for estimating a moving-average (MA) system from a convolutional model was first proposed by Lii and Rosenblatt [LR82] and extended by Tugnait [Tug87] in connection with the identification of an ARMA (autoregressive-MA) system.

Due to the various symmetry planes of the 4th-order moment function and the finite length of the wavelet, the summations in (3.9) are considerably reduced. It can be proved that all the necessary information is contained in one of the 24 regions that compound the total domain of the moment function [PIIF92]. This means that if we know the moment in any of these 24 regions, we can map across the symmetry planes and produce the entire 4th-order moment. As a result, it is enough to evaluate the summations in (3.9) for $0 \le \tau_1 \le M$, $0 \le \tau_2 \le \tau_1$, and $0 \le \tau_3 \le \tau_2$ only, $M$ being the length of the MA system. Usually, $M$ can be estimated from the autocorrelation function.

3.3 Optimization problem

Basically, two approaches can be used to minimize $\Phi$: (1) a linearizing technique based on gradient directions, and (2) a stochastic global optimization algorithm. It should be noted that $\Phi$, a multidimensional cost function, is highly nonlinear because it involves higher-order covariances. It seems reasonable to expect it to be a multimodal function. In previous works $\Phi$ has been minimized with respect to the MA parameters using a steepest descent algorithm. I am not going to consider the linearizing method here, and the reader is referred to the works of Tugnait [Tug87] and Lazear [Laz93]. However, it is well known that this kind of optimization scheme converges to the local minimum which is closest to the initial model. I have found some numerical examples (I show this below) in which the linearizing scheme converges to a local minimum which is far from the required solution. To avoid local minima that may lead to poor wavelet estimates, I propose the use of a stochastic global optimization algorithm. I use a very fast simulated annealing (VFSA) technique to solve for the wavelet parameters, which are obtained independently of the initial estimate. Nevertheless, the linearizing technique should not be discarded because in general I have found that it is faster than VFSA.
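A minimal sketch of the cost function (3.9), restricted to the reduced lag domain described above and reusing the `moment4` sketch from the previous section; `c4_trace` is a hypothetical callable returning the trace-cumulant estimate at a lag triplet, with the scale factor $\gamma_4$ assumed absorbed in the wavelet:

```python
def cm_cost(c4_trace, wavelet):
    """Cumulant-matching cost of equation (3.9), evaluated only over
    0 <= tau3 <= tau2 <= tau1 <= M thanks to the symmetry planes of the
    4th-order moment function."""
    M = len(wavelet)
    phi = 0.0
    for t1 in range(M):
        for t2 in range(t1 + 1):
            for t3 in range(t2 + 1):
                m4 = moment4(wavelet, t1, t2, t3)
                phi += (c4_trace(t1, t2, t3) - m4) ** 2
    return phi
```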
3.4 Discussion and explanatory examples

3.4.1 Tapering the cumulant estimate

Since in practice neither the cumulant of the noise is zero nor is the cumulant of the reflectivity series a multidimensional spike at zero lag, the estimated wavelet moment obtained from equation (3.8) is a distorted version of the true wavelet moment. Figures 3.1a and 3.1b show a view of the 4th-order moment of a zero-phase Ricker wavelet and the 4th-order cumulant of a trace (250 samples) that was generated by convolving a sparse reflectivity with the wavelet, plus 10% by amplitude Gaussian noise². Note that for the lags shown, the trace cumulant is a poor estimate of the wavelet moment. Any attempt to recover the wavelet using this cumulant estimate will likely fail. To improve the estimate, I have found it very useful to apply a 3-D smoothing-taper window to the trace cumulant, especially for those cases in which the cumulant estimate is very poor due to the small amount of data available. In these cases, I re-define the cost function as

$$\Phi = \sum_{\tau_1}\sum_{\tau_2}\sum_{\tau_3} \left[h(\tau_1, \tau_2, \tau_3)\,\hat{c}_4^x(\tau_1, \tau_2, \tau_3) - \gamma_4\,m_4^a(\tau_1, \tau_2, \tau_3)\right]^2, \qquad (3.10)$$

where $h(\tau_1, \tau_2, \tau_3)$ is a 3-D window function. Usually, suitable windows are applied in the case of power spectral estimation and higher-order spectral estimation to produce better estimates. The window function should have the following properties:

a) $h(\tau_1, \tau_2, \tau_3) = h(-\tau_1, \tau_2 - \tau_1, \tau_3 - \tau_1) = h(\tau_1 - \tau_2, -\tau_2, \tau_3 - \tau_2) = h(\tau_1 - \tau_3, \tau_2 - \tau_3, -\tau_3)$, and likewise for any possible exchange of any pair of the three arguments (symmetry properties of the 4th-order cumulant);

b) $h(\tau_1, \tau_2, \tau_3) = 0$ for $|\tau_i| > L$ [$L$ defines the region of support of $\hat{c}_4^x(\tau_1, \tau_2, \tau_3)$];

c) $h(0, 0, 0) = 1$ (normalizing condition);

d) $H(\omega_1, \omega_2, \omega_3) \ge 0$ for all frequencies $(\omega_1, \omega_2, \omega_3)$.

It is easy to build multidimensional windows satisfying these constraints by using standard 1-D windows (see for example [NP93]). So I can write

$$h(\tau_1, \tau_2, \tau_3) = d(\tau_1)\,d(\tau_2)\,d(\tau_3)\,d(\tau_2 - \tau_1)\,d(\tau_3 - \tau_2)\,d(\tau_3 - \tau_1), \qquad (3.11)$$

where $d(\tau) = d(-\tau)$; $d(\tau) = 0$ for $|\tau| > L$; $d(0) = 1$; and $D(\omega) \ge 0$ for all $\omega$. Among many possible 1-D windows (Hamming, triangular, etc.) I have obtained very good results using a Parzen window, which is defined by

$$d(\tau) = \begin{cases} 1 - 6(|\tau|/L)^2 + 6(|\tau|/L)^3, & |\tau| \le L/2, \\ 2(1 - |\tau|/L)^3, & L/2 < |\tau| \le L. \end{cases} \qquad (3.12)$$

Figures 3.1c and 3.1d show the trace cumulant after applying a 3-D Parzen window, and the trace cumulant when using 5,000 data samples. Observe that the improvement due to the tapering is quite important. This obviates the use of large amounts of data, where the validity of convolutional stationarity is very much in question. I have found that the CM method can be applied even in a trace-by-trace implementation, and in many cases only a few traces are required to obtain cumulant estimates good enough to provide reliable wavelet estimates. In the last section I illustrate this point using a small set of field data.

Figure 3.1: Tapering the trace cumulant. (a) Fourth-order moment function of a 25-point Ricker wavelet. (b) Fourth-order cumulant function of a 250-point trace generated by convolving the Ricker wavelet with a sparse reflectivity series (plus 10% by amplitude Gaussian noise). (c) The same cumulant function in (b) after the application of a 3-D Parzen window. (d) Fourth-order cumulant of the trace using 5,000 points. All diagrams correspond to $-12 \le \tau_1, \tau_2 \le 12$, and $\tau_3 = 0$.

²The standard deviation of the noise was made equal to 10% of the maximum amplitude present in the data.
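A sketch of the 1-D Parzen window and the resulting 3-D taper of equations (3.11)-(3.12):

```python
def parzen(tau, L):
    """1-D Parzen window of equation (3.12)."""
    u = abs(tau) / L
    if u <= 0.5:
        return 1.0 - 6.0 * u**2 + 6.0 * u**3
    elif u <= 1.0:
        return 2.0 * (1.0 - u) ** 3
    return 0.0

def taper3d(tau1, tau2, tau3, L):
    """3-D smoothing taper built from products of 1-D windows, equation
    (3.11); the six factors give it the symmetry properties of the
    4th-order cumulant listed above."""
    return (parzen(tau1, L) * parzen(tau2, L) * parzen(tau3, L)
            * parzen(tau2 - tau1, L) * parzen(tau3 - tau2, L)
            * parzen(tau3 - tau1, L))
```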
3.4.2 VFSA vs linearizing solutions: a hybrid strategy

The reliability of the derived wavelets depends not only on the degree of validity of equation (3.8), but also on the success of the minimization of the multidimensional, nonlinear, and possibly multimodal cost function $\Phi$. Hargreaves [Har94] proposed a procedure to assess the reliability of the derived wavelets that consists of applying a 90-degree phase shift to the data and verifying whether or not the new wavelet estimate exhibits the same phase shift. However, a procedure to assess the reliability of the optimization solution is still required. In his steepest descent algorithm, Lazear [Laz93] starts the iterations with a centered spike as the initializing wavelet. The results obtained by Lazear show that the solution is rather consistent and good estimates are obtained in all the examples shown. I have found that the VFSA approach generally provides increased confidence in the solutions. As an example, Figure 3.2a shows a synthetic trace consisting of 250 points ($\Delta t = 4$ ms), generated by convolving a mixed-phase Berlage wavelet [Ald90] with a sparse reflectivity series. The trace has been contaminated with 10% by amplitude Gaussian noise. In Figures 3.2b to 3.2d the true wavelet as well as the solutions obtained after minimizing $\Phi$ are plotted (20 realizations using different noisy traces). In this particular example the linearizing solution has been trapped in a local minimum of the cost function³, while the VFSA algorithm was able to recover the actual wavelet more accurately. Finally, Figure 3.3 shows the deconvolved traces using the wavelets in Figures 3.2b to 3.2d, as well as the corresponding error curves. The deconvolution was simply done by spectral division (see the book of Robinson and Treitel [RT80] for an excellent treatment of the deconvolution problem). Though the data are very band-limited, a quick look at the error curves shows that the SA solution is a better estimate of the true wavelet.

It should be pointed out here that for annealing (and not quenching) the CPU time required to find the VFSA solution is substantially greater than that required to find the linearizing one. In the example of Figure 3.2, about 20 iterations were required using the steepest descent approach. Each iteration involved the computation of the current wavelet moment function, its gradient and the step size, as well as updating the wavelet estimate. On the other hand, the VFSA solution required about 1,000 iterations, where each iteration involved basically computing the cost function and perturbing the wavelet estimate. In terms of CPU time, the linearizing approach took about 1.1 seconds for 20 iterations on a Sun Ultra 1 workstation, and VFSA about 4.0 seconds for 1,000 iterations. This implies that each linearizing iteration was about one order of magnitude more expensive than each VFSA iteration. When quenching is performed, the number of iterations may be reduced substantially, and the CPU time becomes comparable to the linearizing optimization.

In trace-by-trace processing, CPU time is a very important aspect that has to be taken into account. A strategy for an efficient implementation in real situations may be to apply the VFSA algorithm to a single trace and use this estimate as the initial model for the remaining traces, where the linearizing algorithm can be used. This procedure combines the fast performance of the linearizing technique with the more reliable solutions provided by the VFSA algorithm. At the same time, the risk of being trapped in a local minimum when using the linearizing technique is reduced (if not eliminated) because a reasonable initial model is now used (see the sketch below).

³As suggested in the previous work, I set the initial guesses for the linearizing approach equal to a centered spike. The fact that the true wavelet contains two main lobes made the method converge to a local minimum. The global minimum might have been obtained using a different starting model.
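The following sketch outlines that hybrid strategy under stated assumptions: `vfsa_minimize` and `cost_for_trace` are hypothetical stand-ins for the VFSA optimizer and the tapered CM cost of equation (3.10), and scipy's general-purpose local minimizer stands in for the steepest-descent step:

```python
from scipy.optimize import minimize

def hybrid_wavelet_estimates(traces, cost_for_trace, vfsa_minimize, w0):
    """Run the expensive global search once, then warm-start a fast local
    refinement on every remaining trace."""
    # Global (reliable) search on the first trace only
    w_ref = vfsa_minimize(lambda w: cost_for_trace(traces[0], w), w0)
    estimates = [w_ref]
    # Cheap local refinement, warm-started at the VFSA solution
    for trace in traces[1:]:
        res = minimize(lambda w: cost_for_trace(trace, w), w_ref)
        estimates.append(res.x)
    return estimates
```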
Figure 3.2: Synthetic example: linearizing vs VFSA solutions. (a) Reflectivity series and trace. (b) Actual wavelet. (c) Linearizing wavelet estimates (20 realizations). (d) SA wavelet estimates (20 realizations).

Figure 3.3: Synthetic example: deconvolved traces and error. Deconvolved traces using (a) the actual wavelet, (b) the average SA wavelet, and (c) the average linearizing wavelet. (d) Difference between series (a) and (b). (e) Difference between series (a) and (c).

3.4.3 Sensitivity to wavelet phase

The sensitivity of the zero-lag 4th-order cumulant (kurtosis) to wavelet phase has been studied by various authors in connection with the estimation of a residual phase shift present in the data by maximizing the kurtosis or the varimax norm [Whi86, LO87]. It turns out that if the data effective bandwidth, $B$, is small compared to the central frequency, $f_0$, the kurtosis (and hence the varimax norm) is rather insensitive to phase changes. Fortunately, most seismic data satisfy $B > f_0$, and the maximum kurtosis criterion is widely used in seismic processing. In order to perform a more detailed bandwidth study, I have extended the analysis to other lags of $c_4^x(\tau_1, \tau_2, \tau_3)$. For this purpose, I generated a passband-type wavelet by subtracting two low-pass filters with cut-off frequencies $f_l$ and $f_h$ (equation 3.16).

These models have leptokurtic pdf's, and it is this non-Gaussian nature that is observed in well-log data. Here, I also consider other kinds of reflectivity models: one that is essentially sparse, and one that has been proven to fit very well the empirical amplitude distribution of block-averaged well logs. The first one can be described by a Bernoulli-Gaussian process [KM78, GR81] with pdf

$$f(r) = \begin{cases} 0 & \text{with probability } \lambda, \\ N(0, \sigma^2) & \text{with probability } 1 - \lambda, \end{cases} \qquad (3.17)$$

where $N(0, \sigma^2)$ denotes a zero-mean Gaussian distribution with variance $\sigma^2$, and $0 < \lambda < 1$. The normalized kurtosis can be easily computed and is equal to $3/(1-\lambda) - 3$.

Figure 3.5: Non-Gaussianity assumption. From top to bottom, subsequent rows correspond to Bernoulli-Gaussian, Laplace mixture, and Gaussian-γ models, with curves for N = 250, 1,000, and 10,000 samples. (a,b,c) Normalized cost function for the different models. This quantity expresses the matching between the true wavelet moment and the corresponding trace cumulant. (d,e,f) Correlation between each wavelet estimate and the actual wavelet for the different models. (g,h,i) Correlation versus kurtosis for the different models.
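A sketch of a Bernoulli-Gaussian reflectivity generator following equation (3.17); its sample kurtosis can be checked against $3/(1-\lambda) - 3$ using the `normalized_kurtosis` sketch above:

```python
import numpy as np

def bernoulli_gaussian(n, lam, sigma=1.0, rng=None):
    """Reflectivity drawn from the pdf of equation (3.17): zero with
    probability lam, N(0, sigma^2) otherwise. The normalized kurtosis of
    this model is 3 / (1 - lam) - 3."""
    rng = np.random.default_rng() if rng is None else rng
    spikes = rng.uniform(size=n) >= lam
    return spikes * rng.normal(0.0, sigma, size=n)
```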
For obtaining realistic reflectivity models, $\lambda$ typically ranges from 0.7 to 0.9, corresponding to models consisting of reflections at 1 in 3 samples (kurtosis = 7) to 1 in 10 samples (kurtosis = 27), approximately. The second model (perhaps the one that is closest to a real reflectivity sequence from the point of view of stratigraphic considerations) is that in [WH86]. In this paper it is shown that real primary reflectivity sequences can be modeled as a mixture of two Laplace (or double-sided exponential) distributions:

$$f(r) = p\,\frac{\lambda_1}{2}\,e^{-\lambda_1 |r|} + (1-p)\,\frac{\lambda_2}{2}\,e^{-\lambda_2 |r|}, \qquad (3.18)$$

where $\lambda_1$ and $\lambda_2$ denote the scale parameters of each population and $p$ the proportion of the mixture, $0 < p < 1$. This alternative model makes very clear the distinction between sedimentary beds and lithologic units by simply changing the proportion of the mixture. Such a model has normalized kurtosis $6(1+c) - 3$, $c > 0$, which is always greater than or equal to 3 (here $c$ is a function of $p$, $\lambda_1$ and $\lambda_2$ only [WH86]). Several well-log data sets were fitted using this flexible model, and in all the cases shown in the aforementioned work, the results indicated a kurtosis exceeding 3.

I generated several reflectivity sequences according to the three models (i.e. Gaussian-γ, Bernoulli-Gaussian, Laplace mixture) using various parameters. The models were intended to reproduce real reflectivity sequences, and the purpose was to test the effectiveness of the CM method in recovering the true wavelet for the various models using different amounts of data. A zero-phase Ricker wavelet was used to generate each seismogram, and 10% by amplitude Gaussian noise was added. Figures 3.5a to 3.5c illustrate the results of computing the normalized mean-squared error between the trace cumulant estimate and the true wavelet moment:

$$\mathrm{MSE} = \frac{\sum_{\tau_1, \tau_2, \tau_3} \left[\hat{c}_4^x(\tau_1, \tau_2, \tau_3) - \alpha\,m_4^a(\tau_1, \tau_2, \tau_3)\right]^2}{\sum_{\tau_1, \tau_2, \tau_3} \left[\hat{c}_4^x(\tau_1, \tau_2, \tau_3)\right]^2}, \qquad (3.19)$$

where $\alpha$ is a constant that minimizes MSE. Also, I have computed the correlation of the estimated wavelets with the true wavelet, which is shown in Figures 3.5d to 3.5f. As expected, the matching increases with the amount of data. In general, good estimates were obtained even for models with small kurtosis and a relatively small amount of data (correlations exceeding 0.95 were obtained for most of the cases). For sparse models, I have found excellent results using as few as 250 samples. In Figures 3.5g to 3.5i I have plotted the effective kurtosis (as computed from the reflectivity series) versus wavelet correlation for models using 1,000 and 10,000 data samples. It is interesting to note that there is not a rigorous correlation between kurtosis and accuracy of results. This may be true only for some specific models (e.g. the Gaussian-γ), since the results appear to depend on the manner in which the wavelet interferes with the structure of the hidden reflectivity coefficients. For example, consider Figure 3.5i, corresponding to the Gaussian-γ model. The higher the kurtosis, the higher the correlation between true and estimated wavelets. This is not the case for the Bernoulli-Gaussian and Laplace models (Figures 3.5g and 3.5h).
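A sampler for the Laplace-mixture model of equation (3.18) (a sketch; note that numpy's Laplace scale parameter is the reciprocal of $\lambda$):

```python
import numpy as np

def laplace_mixture(n, p, lam1, lam2, rng=None):
    """Reflectivity drawn from the two-component Laplace mixture of
    equation (3.18), with proportion p and scale parameters lam1 and lam2
    (the two populations distinguishing beds from lithologic units)."""
    rng = np.random.default_rng() if rng is None else rng
    pick = rng.uniform(size=n) < p
    # numpy parameterizes Laplace by b = 1/lambda (density exp(-|r|/b)/2b)
    scale = np.where(pick, 1.0 / lam1, 1.0 / lam2)
    return rng.laplace(0.0, scale, size=n)
```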
3.5 Real data examples

3.5.1 Field data

In this example, the field data I used are shown in Figure 3.6a. The window consists of 21 traces, from 1.0 to 2.0 s. To illustrate the trace-by-trace process, I obtained an individual wavelet estimate for each trace and then averaged the optimally shifted estimates to obtain an average estimate. Only 250 samples were used to obtain each individual estimate. The consistency obtained in this example is clear from inspection of the individual estimates plotted in Figure 3.7a. The average estimate is shown in Figure 3.7b. If all data are used simultaneously to estimate the trace cumulant, the wavelet in Figure 3.7c is obtained. This is very similar to the wavelet obtained after synchronizing the individual wavelets. To assess the reliability of the wavelet estimate I deconvolved the original data by spectral division using the average estimate. The "zero-phase" section is shown in Figure 3.6b. These data were in turn used to obtain the residual wavelets, which are shown in Figures 3.7d to 3.7f. As expected, the residual wavelets are very close to zero-phase, a fact that demonstrates the reliability of the wavelet estimate.

3.5.2 Marine data

The real data in this example come from a marine section which had been processed carefully for impedance recovery. Figure 3.8a shows the 24 traces (325 samples per trace) used for wavelet estimation. The 24 individual wavelet estimates are shown in Figure 3.8b. The average wavelet estimate, after the hybrid trace-by-trace strategy and stacking the wavelets in panel (b) of the same figure, is illustrated in Figure 3.8c. As expected, due to the pre-processing steps employed, the recovered wavelet is very nearly zero-phase. Finally, the wavelet shown in Figure 3.8d corresponds to the estimate using all data simultaneously.

Figure 3.6: Field data example: input data. (a) Original data section used for trace-by-trace CM wavelet estimation. (b) Same section as in (a) after removing the wavelet phase using the average estimate.

Figure 3.7: Field data example: wavelet estimates. (a) 21 individual wavelet estimates of the section shown in Figure 3.6a. (b) Average wavelet estimate after synchronizing and stacking all the individual estimates. (c) Wavelet estimate using all data simultaneously. (d), (e), and (f) Residual wavelets corresponding to the zero-phased section in Figure 3.6b.

Figure 3.8: Marine data example. (a) Section of data used in the trace-by-trace CM wavelet estimation. (b) 24 individual wavelet estimates. (c) Average wavelet estimate. (d) Wavelet estimate using all data simultaneously.

3.6 CM in the frequency domain

Inasmuch as the information contained in the autocorrelation sequence is essentially present in the power spectrum, so the information contained in higher-order moments and cumulants is present in polyspectra (higher-order spectra). This duality is established upon the polyspectra definition. Polyspectra are defined, naturally, as the Fourier transform of the corresponding time domain sequence. For instance, the bispectrum (third-order spectrum) is defined to be the Fourier transform of the third-order cumulant sequence, and the trispectrum, which is the fourth-order spectrum, is defined to be the Fourier transform of the fourth-order cumulant sequence. It is worthwhile to make a distinction between two types of polyspectra: (1) cumulant spectra, which are particularly useful in the analysis of stochastic processes, and (2) moment spectra, which are of great importance in the analysis of deterministic signals.
Since we are interested in fourth-order statistics, let $x(t)$, $t = 0, \pm 1, \pm 2, \cdots$, be a real stationary discrete-time random process with fourth-order cumulant sequence $c_4^x(\tau_1, \tau_2, \tau_3)$. Then, the 4th-order cumulant spectrum (trispectrum) of the process $x(t)$ is defined as the 3-D Fourier transform

$$C_4^x(\omega_1, \omega_2, \omega_3) = \sum_{\tau_1=-\infty}^{+\infty} \sum_{\tau_2=-\infty}^{+\infty} \sum_{\tau_3=-\infty}^{+\infty} c_4^x(\tau_1, \tau_2, \tau_3)\,\exp[-j(\omega_1\tau_1 + \omega_2\tau_2 + \omega_3\tau_3)], \qquad (3.20)$$

where $|\omega_i| \le \pi$ for $i = 1, 2, 3$, and $|\omega_1 + \omega_2 + \omega_3| \le \pi$ [Bri65]. In general, $C_4^x(\omega_1, \omega_2, \omega_3)$ is complex, and it is periodic with period $2\pi$ in each dimension. Many symmetry properties can be derived from the fourth-order cumulant definition. As a result, there is a lot of redundant information in the cube $|\omega_i| \le \pi$, $|\omega_1 + \omega_2 + \omega_3| \le \pi$ (see for example [PIIF92]).

In the case of real discrete-time deterministic signals, $a(t)$, the fourth-order moment spectrum is defined as the Fourier transform of the fourth-order moment sequence. A more compact and interesting expression can be obtained in terms of the Fourier transform of the signal, $A(\omega)$, after manipulating the corresponding definitions algebraically:

$$M_4^a(\omega_1, \omega_2, \omega_3) = A(\omega_1)\,A(\omega_2)\,A(\omega_3)\,A^*(\omega_1 + \omega_2 + \omega_3), \qquad (3.21)$$

where $|\omega_i| \le \pi$ for $i = 1, 2, 3$; $|\omega_1 + \omega_2 + \omega_3| \le \pi$, and "$*$" denotes the conjugate. In terms of magnitude and phase, equation (3.21) is equivalent to

$$|M_4^a(\omega_1, \omega_2, \omega_3)| = |A(\omega_1)|\,|A(\omega_2)|\,|A(\omega_3)|\,|A(\omega_1 + \omega_2 + \omega_3)|,$$
$$\psi_4^a(\omega_1, \omega_2, \omega_3) = \phi_a(\omega_1) + \phi_a(\omega_2) + \phi_a(\omega_3) - \phi_a(\omega_1 + \omega_2 + \omega_3), \qquad (3.22)$$

where $|A(\omega)|$ and $\phi_a(\omega)$ are the magnitude and phase spectrum of the signal $a(t)$, respectively.

Back to the wavelet estimation problem, taking the Fourier transform of equation (3.8) for $N$ finite yields

$$\hat{C}_4^x(\omega_1, \omega_2, \omega_3) \simeq \gamma_4\,A(\omega_1)\,A(\omega_2)\,A(\omega_3)\,A^*(\omega_1 + \omega_2 + \omega_3), \qquad (3.23)$$

where $A(\omega)$ is the Fourier transform of the seismic wavelet. This expression states that the trispectrum of the trace approximates, within a scale factor (and a linear phase shift), the fourth-order moment spectrum of the wavelet. In this context, there are several methods in the literature for recovering the signal from higher-order spectra. These include phase recovery algorithms [LR82, MU84], phase and magnitude recovery algorithms, etc. (see for example [NP93], Chapters 6 and 7, for a concise description of various methods). One such method is based on the least-squares optimization of the following cost functions:

$$\Phi_{\mathrm{phase}} = \sum_{\omega_1}\sum_{\omega_2}\sum_{\omega_3} \left[\hat{\psi}_4^x(\omega_1, \omega_2, \omega_3) - \psi_4^a(\omega_1, \omega_2, \omega_3)\right]^2,$$
$$\Phi_{\mathrm{magnitude}} = \sum_{\omega_1}\sum_{\omega_2}\sum_{\omega_3} \left[|\hat{C}_4^x(\omega_1, \omega_2, \omega_3)| - \gamma_4\,|M_4^a(\omega_1, \omega_2, \omega_3)|\right]^2, \qquad (3.24)$$

where $\hat{\psi}_4^x(\omega_1, \omega_2, \omega_3)$ and $|\hat{C}_4^x(\omega_1, \omega_2, \omega_3)|$ are estimated from the data. These cost functions establish the duality between CM in the time domain and CM in the frequency domain. Once $\hat{\psi}_4^x(\omega_1, \omega_2, \omega_3)$ and $|\hat{C}_4^x(\omega_1, \omega_2, \omega_3)|$ are estimated, VFSA can be applied to minimize the sums with respect to the wavelet spectral coefficients. Alternatively, the cost functions may be written as the squared error between the real and imaginary parts of the data and wavelet polyspectral coefficients.
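A direct sketch of the fourth-order moment spectrum of equation (3.21), with the discrete-time Fourier transform $A(\omega)$ evaluated explicitly:

```python
import numpy as np

def moment_spectrum4(a, w1, w2, w3):
    """Fourth-order moment spectrum of a finite signal via equation (3.21):
    A(w1) A(w2) A(w3) A*(w1 + w2 + w3), with A(w) the DTFT of a(t)."""
    t = np.arange(len(a))
    def A(w):
        return np.sum(a * np.exp(-1j * w * t))
    return A(w1) * A(w2) * A(w3) * np.conj(A(w1 + w2 + w3))
```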
Though not tested here, the results in the frequency domain are expected to be (apart from multidimensional phase unwrapping and other computational issues that may arise) similar to the results in the time domain, for the information present in the polyspectra is essentially contained in the corresponding higher-order cumulant or moment sequence, as mentioned in the first paragraph of this section. Finally, it is worth mentioning that an alternative and elegant approach is to use the cepstrum of higher-order cumulants (polycepstra), and in particular the tricepstrum [SVU98]. This method allows one to obtain the wavelet coefficients analytically from the estimated tricepstrum of the data. The analysis of these techniques is beyond the scope of this work.

3.7 Conclusions

1. The VFSA optimization should be seen as a method for assessing the reliability of the numerical solution, and a way to provide consistent results regardless of the initial model used in the optimization process. This leads to a hybrid strategy that combines the fast performance of the linearizing technique with the reliability of the VFSA solution.

2. Limited effective bandwidth and missing low frequencies in the recorded data reduce the 4th-order cumulant sensitivity to wavelet phase. This effect may be partially compensated by either whitening the data or performing AGC before carrying out the wavelet estimation.

3. It is assumed that the reflectivity is a non-Gaussian process. This assumption appears, in fact, to represent a property of the earth's reflectivity. Further, if the reflectivity is also somewhat sparse, which is not an unreasonable expectation, the CM method may be used with confidence with relatively short data segments. In these cases, the application of a multidimensional taper to smooth the trace cumulant estimate and improve the matching is strongly recommended.

4. The combination of both the hybrid strategy and a convenient multidimensional tapering permitted us to develop an efficient and consistent trace-by-trace implementation. Numerical consistency is achieved by the VFSA algorithm and tapering; CPU efficiency is attained by the fast performance of a linearizing technique.

5. Despite the inherent limitations of the CM method due to bandwidth content and the amount of data available, the results obtained with both synthetic and real data demonstrate not only the viability of the method, but also the reliability of the convolutional model which is at the heart of the wavelet estimation philosophy.

Chapter 4

Two-Dimensional Boundary Value Ray Tracing

Begin at the beginning... and go on till you come to the end; then stop.
Lewis Carroll - Alice's Adventures in Wonderland

4.1 Introduction

Ray tracing plays a key role in seismological studies. Much attention has been devoted to the initial-value problem (IVP), in which the ray is specified by the initial conditions: initial point and initial direction of propagation (take-off angle). The IVP is in general a well resolved problem (see for example [Cer87]). However, geotomographic methods and earthquake location usually require precise traveltime and trajectory computations of seismic waves propagating between two fixed points in a laterally heterogeneous medium. This represents a boundary-value problem (BVP) because the ray is not specified by the initial conditions alone.
In a more general case a ray may be specified by more complicated boundary conditions at different points of its trajectory. Such is the case of tracing reflected waves or headwaves connecting two fixed points.

Traditionally, there are two techniques to find the raypath of a seismic wave propagating between two points in a heterogeneous medium: shooting and bending [JG77, AR80]. The method of finite differences [Vid88] has become widely used, especially for dealing with smooth velocity fields. The first-arrival traveltime field is first calculated at all nodes of a gridded model, and then raypaths are traced by following the wavefront normals. Most of these techniques perform very well for the particular type of velocity model they were devised for. But in general they are applicable to a limited class of models, and the more complex the media, the more obvious their limitations become. Other techniques include graph theory methods, recently developed for obtaining the shortest paths in a gridded model [Mos89, FL93]. Though these methods work well for moderately complex 2-D media, their applicability for dealing with complex 3-D structures has not been studied thoroughly, especially because they become computationally very involved when a certain accuracy is required.

In this chapter I develop a new ray-tracing method for solving the BVP through a heterogeneous 2-D medium. Although a more general technique for solving the BVP through an arbitrary 3-D medium will be developed in the next chapter, the ray-tracing method will be called "simulated annealing ray tracing" (SART) throughout. The purpose of SART is to overcome the usual deficiencies of shooting and bending for solving the BVP. The philosophy of the method is to put the BVP into a convenient optimization framework, which is in turn solved by means of an efficient simulated annealing algorithm like VFSA. In the two-point ray-tracing case, SART is an iterative procedure that attempts to find the optimum take-off angle corresponding to the raypath with minimum traveltime connecting any given source-receiver pair [Vel96]. The main advantage of SART lies in the fact that it overcomes the problem of multipathing, inasmuch as convergence to the absolute minimum traveltime is statistically guaranteed. This is demonstrated by a number of synthetic examples where not only direct waves are traced, but also reflected waves (including normal rays) and headwaves.

Figure 4.1: Two-dimensional cell parameterization. The velocity field is piecewise constant. The function $z = f(x)$ represents a reflector/refractor.

4.2 Earth model

The modeling of a complex geology in a way that it can be input and easily manipulated by a computer program is not a trivial task. Usually, it is more difficult and time consuming to devise a mathematical method for representing real velocity fields than to devise a method for tracing rays through them. Fortunately, some simple forms of parameterization can approximate real subsurface velocity structures very well for the purpose of ray tracing and understanding of the method.

In the 2-D case I use a cell ray-tracing method which is fast and accurate for the purposes of this work. This kind of parameterization requires the velocity field to be
represented by rectangular cells of constant velocity and fixed size. The model consists of $N_x$ by $N_z$ rectangular cells of size $\Delta x$ by $\Delta z$, which define $N_x + 1$ by $N_z + 1$ cell nodes (Figure 4.1). Node $(1,1)$ has coordinates $(x_{min}, z_{min})$ as shown in the figure. The two-dimensional velocity field is represented by $v = v(x, z)$, where $x$ stands for offset distance and $z$ for depth (the positive $z$-axis points downwards). Velocity $v$ may correspond to either the P- or S-wave velocity of propagation. Each rectangular cell is assigned the velocity corresponding to its upper-left node, thus

$$v_{ij} = v\left[x_{min} + (i-1)\Delta x,\; z_{min} + (j-1)\Delta z\right], \qquad i = 1, \cdots, N_x;\ j = 1, \cdots, N_z. \qquad (4.1)$$

Additionally, a predefined interface $z = f(x)$ can be superposed for tracing reflections or headwaves, as will be described later. Although this way of representing the velocity field presents some limitations for tracing particular raypaths (e.g. turning rays), the simplicity associated with the geometry of the ray trajectory makes it attractive. The raypath will be composed of a series of segments honoring Snell's Law at each cell boundary. In the next chapter, a more flexible and general model parameterization will be proposed.

4.3 Solving the initial-value problem (IVP)

4.3.1 Two-dimensional cell ray tracing

Before describing the BVP it is essential to describe the IVP for the model parameterization at hand. To solve the IVP I use a cell ray-tracing scheme where the velocity within each rectangular cell is assumed to be constant. Both the traveltime and the ray trajectory are to be found. The ray is propagated using Snell's Law at each cell boundary, and thus it is composed of a number of segments. Intersection points and angles are obtained using simple geometrical relations, as will be described in the following paragraphs.

Four cases can be distinguished according to the edge through which the ray enters a given cell: left, top, right or bottom. I will describe the first case only, since the others follow a similar reasoning. Let us suppose the ray enters cell $(i,j)$ through the left edge at point $\mathbf{x}_k = (x_k, z_k)$, as shown in Figure 4.2a, where $k$ denotes the $k$-th point in the ray trajectory. The next point will lie either on the top, right or bottom edge of the current cell, depending on the angle $\theta_k$ and the cell geometry (equation 4.2). The three subcases are depicted in Figures 4.2b to 4.2d, and the formulas required to compute the coordinates $(x_{k+1}, z_{k+1})$, the angle $\theta_{k+1}$, and the traveltime within the current cell are summarized in Table B.1. Snell's Law is applied at point $\mathbf{x}_{k+1} = (x_{k+1}, z_{k+1})$ to determine the next direction:

$$\frac{\sin\alpha_k}{v_k} = \frac{\sin\alpha_{k+1}}{v_{k+1}}, \qquad (4.4)$$

where $\alpha_k$ and $\alpha_{k+1}$ are the angles between the ray trajectory and the normal to the corresponding cell edge, and $v_k$ and $v_{k+1}$ are the velocities on opposite sides. For example, in the case of Figure 4.2b, $\alpha_k = \pi - \theta_k$, $\alpha_{k+1} = \pi - \theta_{k+1}$, $v_k = v_{i,j}$, and $v_{k+1} = v_{i,j-1}$ (4.5).

Figure 4.2: Detailed geometry of a ray entering cell $(i,j)$ from the left edge. (a) The ray enters at point $\mathbf{x}_k$, which determines the angles $\alpha$ and $\theta_k$. (b), (c) and (d) Depending on $\theta_k$ resulting from Snell's Law, point $\mathbf{x}_{k+1}$ is then computed. See Table B.1 and text for details.
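A minimal sketch of the Snell's Law step of equation (4.4) at a cell boundary, signalling the total-reflection case discussed in section 4.3.2:

```python
import numpy as np

def snell_refract(alpha_in, v_in, v_out):
    """Apply Snell's Law (equation (4.4)) at a cell edge. `alpha_in` is the
    incidence angle in radians, measured from the boundary normal. Returns
    the refracted angle, or None when sin(alpha_out) would exceed 1, i.e.
    the (fictitious) total reflection of Figure 4.4a."""
    s = np.sin(alpha_in) * v_out / v_in
    if abs(s) > 1.0:
        return None  # total reflection at the cell boundary
    return np.arcsin(s)
```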
This process is repeated until the ray exits the model boundaries or crosses the predefined discontinuity, where special attention must be paid. Finally, the total traveltime is computed as the sum of all the individual contributions corresponding to each ray segment:

$$T = \sum_k t_k. \qquad (4.6)$$

Figure 4.2 illustrates one of the four possible cases: $\mathbf{x}_k$ lying on the left edge of the cell. As a result, there is a total of twelve subcases depending on the location of $\mathbf{x}_k$ and the angle $\theta_k$. I will not describe them all here; the reader is referred to Appendix B, where all the required formulas are summarized. In summary, starting from the source, $\mathbf{x}_0 = \mathbf{x}_s$, a sequence of segments is generated according to simple geometrical relations and Snell's Law at cell boundaries or predefined discontinuities. At each step, it must be determined which edge of the cell the ray comes from, so that one of the four mentioned cases is applied. Also, each segment of the ray which is being computed must be checked to see whether it crossed the predefined reflector (or refractor) $z = f(x)$.

Special cases

A distinction must be made whenever the current point lies within the cell boundaries (e.g. when the source does not lie exactly on some cell edge), and when the ray trajectory arrives at some predefined discontinuity (e.g. a reflector). These situations are depicted in Figures 4.3a and 4.3b, respectively. In the first case, it is possible to use the same formulas as in the normal case (Table B.1) by simply exchanging $\Delta x$ for $\Delta x' = i\Delta x - x_k$, as shown in Figure 4.3a. In the second case (Figure 4.3b), a more elaborate situation must be considered. After shooting from $\mathbf{x}_k$ with angle $\theta_k$ as usual, the auxiliary point $\tilde{\mathbf{x}}_{k+1}$ is determined. If a reflector crossing is detected, the intersection point $\mathbf{x}_{k+1}$ should be obtained, becoming the next point in the ray trajectory. At this point, Snell's Law of reflection (or refraction, if required) is applied and the angle $\theta_{k+1}$ so determined. Then the propagation continues as in the case of Figure 4.3a. When a refraction is requested at this intersection point, the velocity on either side of $z = f(x)$ is generally different. This means that some cells have two velocity values, above and below the predefined refractor.

Figure 4.3: Special cases of the ray geometry. (a) Shooting from inside-left of a given cell. (b) The ray meets a reflector at point $\mathbf{x}_{k+1}$ and travels towards point $\mathbf{x}_{k+2}$ after undergoing a reflection.

The reflector (or refractor) $z = f(x)$ is approximated by a straight line within each cell, so that the intersection with the raypath is found in one step. After some simple geometrical relations, the coordinates of point $\mathbf{x}_{k+1}$ are determined (equation 4.7) as the intersection point between the raypath segment connecting $\mathbf{x}_k$ with $\tilde{\mathbf{x}}_{k+1}$ and the reflector (or refractor) segment connecting $f(x_k)$ with $f(\tilde{x}_{k+1})$. An iterative procedure (not assuming the reflector/refractor is a straight line within each cell) to find the intersection point could also be tried. But the approximation made is accurate enough for the purposes of this work, provided the reflector (refractor) is a smooth function and the number of cells is relatively large.
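A sketch of the one-step segment-segment intersection behind that construction (standard 2-D geometry, not the thesis' exact formulas):

```python
import numpy as np

def segment_intersection(p1, p2, q1, q2):
    """Intersection of the ray segment p1-p2 with the straight reflector
    segment q1-q2. Returns the intersection point, or None if the two
    segments do not cross (or are parallel)."""
    p1, p2, q1, q2 = map(np.asarray, (p1, p2, q1, q2))
    d1, d2 = p2 - p1, q2 - q1
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-14:
        return None  # parallel segments
    s = ((q1[0] - p1[0]) * d2[1] - (q1[1] - p1[1]) * d2[0]) / denom
    t = ((q1[0] - p1[0]) * d1[1] - (q1[1] - p1[1]) * d1[0]) / denom
    if 0.0 <= s <= 1.0 and 0.0 <= t <= 1.0:
        return p1 + s * d1
    return None
```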
Ray signature and stopping conditions

It is also required to define a sort of ray signature at the beginning of the process to make a decision after the function $z = f(x)$ has been crossed. Having defined the ray signature, it is possible to trace ray conversions by using different velocities after a reflection (or refraction) has occurred. For example, a ray signature may simply be a series of binary numbers $IOP(l)$, $l = 1, 2, \cdots$, associated with the discontinuity. These numbers are expressed as decimal integer values. The index $l$ indicates the number of times the discontinuity is met, thus allowing for more than one reflection, refraction and/or conversion at different points of the same discontinuity. For example, the first time the discontinuity is met, the number $IOP(1)$ is used to make the decision; the second time, $IOP(2)$; etc. Each bit of $IOP(l)$ may be either 0 or 1. The first bit is reserved for the existence of the discontinuity, the second for refraction, the third for reflection, and the fourth for phase conversion. A "0" means to ignore the corresponding event, and a "1" means to follow the directives. Table 4.1 shows the decisions to make according to each bit value. As an example, let $IOP(1) = 13$ (binary 1101) and $IOP(2) = 1$ (binary 0001). The first time the discontinuity is met, the ray is requested to undergo a reflection and a change of phase. If by chance the discontinuity is met again, the propagation is requested to stop. If a refraction is instead required by setting bit 2 to 1, but a total reflection occurs, the propagation is stopped by default. In the case where a discontinuity is not defined or $IOP(l)$ is set to 0, the propagation ends when point $\mathbf{x}_{k+1}$ lies on some model boundary (i.e. $x_{k+1} = x_{max}$ and/or $z_{k+1} = z_{max}$), when $k > k_{max}$, or when $T > T_{max}$, where $k_{max}$ and $T_{max}$ are user-predefined values.

bit | 0                     | 1
1   | ignore $z = f(x)$     | do not ignore $z = f(x)$
2   | no refraction         | refraction
3   | no reflection         | reflection
4   | no phase conversion   | phase conversion

Table 4.1: Decision table indicating what to do when the discontinuity $z = f(x)$ is met at some point of the ray trajectory. See text for explanation.
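The bit convention of Table 4.1 can be decoded with a few bit masks, as in this minimal sketch:

```python
def decode_iop(iop):
    """Decode one ray-signature value according to Table 4.1 (bit 1:
    honor the discontinuity; bit 2: refract; bit 3: reflect; bit 4:
    convert phase). E.g. decode_iop(13) -> honor, reflect, and convert,
    matching the IOP(1) = 13 example in the text."""
    return {
        "honor":   bool(iop & 0b0001),
        "refract": bool(iop & 0b0010),
        "reflect": bool(iop & 0b0100),
        "convert": bool(iop & 0b1000),
    }
```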
4.3.2 Pros and cons of cell ray tracing: a discussion

As already mentioned, cell ray tracing with piecewise constant velocity presents some limitations, especially for tracing turning rays. The problem arises when a total reflection occurs at a certain cell edge due to large incident angles. As a counterpart, the method is computationally inexpensive, simple, and accurate enough for the purposes of illustrating the two-point ray-tracing method that will be described in the next sections. The total reflection is artificially introduced by the cell parameterization, because of the first-order discontinuities in the velocity that are generated (Figure 4.4a). The question is how to deal with large incident angles, which may certainly occur. The answer would be to try to eliminate the first-order discontinuities. These may be eliminated by using nonconstant velocities within each cell. For rectangular cells, the simplest form which admits an analytical solution and ensures continuity is the quadratic slowness

$$v^{-2}(x, z) = a + bx + cz + dxz, \qquad (4.8)$$

where $a$, $b$, $c$ and $d$ are constants [Cer87]. Clearly, the ray trajectory would no longer be composed of a series of straight segments. Although the shape of the ray for a velocity function as given in equation (4.8) is known, the computational cost and coding complexity would be much greater than using piecewise constant velocities, because trigonometric and hyperbolic functions are involved. Alternatively, triangular cells could be used. Now

$$v(x, z) = a + bx + cz \qquad (4.9)$$

is the simplest form of velocity which ensures continuity. The raypath is an arc of a circle, which also leads to more expensive computations than using straight segments.

Figure 4.4: (a) Total reflection occurs when the angle $\alpha_k > \alpha_c$, where $\alpha_c$ is the critical angle derived from Snell's Law: $\sin\alpha_c = v_k/v_{k+1}$, $v_k < v_{k+1}$. This is a fictitious reflection introduced by the cell parameterization. (b) A method to eliminate the fictitious reflection in (a). The cell with velocity $v_{k+1}$ is subdivided into two smaller cells, one of them with velocity $v'_{k+1} = (v_k + v_{k+1})/2 < v_{k+1}$, which makes the new critical angle, $\alpha_c$, smaller. As a result $\alpha_k < \alpha_c$ and a refraction occurs.

However, in the case of piecewise constant velocity cells, if the velocity field is smooth and the size of the cells is relatively small, total reflections do not represent a serious problem. In this case total reflection would occur only for incident angles close to 90 degrees. This does not mean that the velocity field must be smooth for the cell ray-tracing scheme to be an accurate procedure. Jumps in the velocity model can be introduced without losing accuracy, provided that these velocity jumps coincide with cell boundaries. Besides, additional discontinuities of arbitrary shape can be introduced through the function $z = f(x)$ as a refractor, as shown in this section. So far the method I have developed allows only one function $z = f(x)$ to be defined, but more than one could be similarly introduced for greater flexibility in modeling complex structures. The use of smaller cells would obviously increase the computational cost, but not in the same measure as in the case of nonconstant velocity cells. An alternative not tested in this work would be to use some kind of adaptive gridding, so that cells are larger where the velocity is smooth and smaller where the velocity varies rapidly. This can be done only at those cells where a total reflection has been detected. At this point the cell is subdivided into smaller cells until a refraction occurs, as illustrated in Figure 4.4b. However, this scheme does not eliminate the problem entirely. Suppose the velocity increases linearly with depth and a ray is shot from the source with $\theta_s = 45°$. In theory, the ray follows an arc of a circle and travels downwards until a maximum depth where $\theta = 90°$ (the turning point). Then it starts going upwards. It is clear that this path cannot be reproduced by the described cell ray-tracing scheme, because once $\theta = 90°$ has been reached, there is no chance for $\theta$ to take on greater values. A better alternative is to use a constant-gradient velocity within the corresponding cell. Locally, the raypath is an arc of a circle of known equation, thus allowing for turning rays. Although cell ray tracing can be improved in several ways, these kinds of refinements will not be considered here. Instead, in the next chapter I will present a numerical ray-tracing method that is more flexible and accurate for dealing with complex 2-D or 3-D media.
4.4 Solving the boundary-value problem (BVP)

4.4.1 Problem definition

In general, it is the goal of a ray-tracing BVP solver to find the raypath connecting any two fixed points within the model boundaries (the two-point BVP). The raypaths must be consistent with the ray theory which is used to propagate the wave through the given velocity field (e.g. Snell's Law at each cell boundary in a cell ray-tracing scheme). Usually, this is carried out by finding the unknown take-off angle that gives rise to a ray propagating from the source to the receiver. In general, the BVP must be solved iteratively, except in special cases. Although the solution is sometimes nonunique (several raypaths may connect source and receiver while satisfying the ray equations), the purpose of SART is to find the raypath that exhibits the absolute minimum traveltime. At this point it is worthwhile to make a distinction among various BVP alternatives. In the standard two-point ray-tracing problem, the wave travels from source to receiver without any additional constraint along its trajectory. I refer to this BVP as tracing direct waves. Other BVP alternatives introduce additional constraints along the ray trajectory. Such is the case of tracing reflections (including normal rays), headwaves, or point diffractions.

Direct waves

Consider Figure 4.5. Let $\mathbf{x}_s$ represent the location of the source and $\mathbf{x}_r$ the location of the receiver. These two points are fixed in space and must lie within the model boundaries. For the sake of simplicity, I put them coincident with some of the model boundaries, but they could be placed in the interior of the rectangular area shown in the figure as well. In a tomographic experiment, for example, the source is detonated at $\mathbf{x}_s$ at time $t = 0$ and the energy propagates in all directions. In particular, the ray that leaves the source with angle $\theta_s^{opt}$ arrives at the receiver with the minimum traveltime among all other possible trajectories. Here $\theta_s^{opt}$ uniquely determines the raypath shown in the figure. It is the purpose of SART to find $\theta_s^{opt}$ as well as the associated traveltime.

Figure 4.5: Source, receiver and raypath geometry in a two-dimensional two-point experiment for tracing direct waves.

Reflections

In a reflection experiment (see Figure 4.6), not only must the ray connect both source and receiver, but it must also undergo a reflection at a prescribed reflector. The reflector can be described by a single-valued function of the form $z = f(x)$. The absolute minimum raypath corresponds to the one that leaves the source with take-off angle $\theta_s^{opt}$, undergoes a reflection at point $\mathbf{x}_u$, and arrives at the receiver. The only unknown is $\theta_s^{opt}$, for $\mathbf{x}_u$ is uniquely determined by the take-off angle.

Figure 4.6: Source, receiver, reflector and raypath geometry in a two-dimensional two-point experiment for tracing reflections.

Normal rays can be viewed as a particular case of reflected waves. Now both source and receiver are coincident. Naturally, the angle between the ray trajectory and the normal to the reflector at point $\mathbf{x}_u$ is zero. Rather than finding the take-off angle corresponding to the minimum reflecting raypath, it is more efficient to find the coordinate $x_u$, because only half of the raypath needs to be traced.

Headwaves

Figure 4.7 depicts the case of tracing headwaves.
The ray is now constrained to travel along the refractor z = f(x). The ray leaves the source with take-off angle θ_s and arrives at the refractor at point x_u. After traveling from x_u to x_v along the refractor, the ray ends up at the receiver. Note that the angles between the ray trajectory and the normals to the refractor at points x_u and x_v are equal to the critical angles as defined by Snell's Law. In this problem there are two unknowns: either θ_s and x_v, or x_u and x_v.

Figure 4.7: Source, receiver, refractor and raypath geometry in a two-dimensional two-point experiment for tracing headwaves.

Diffractions

Diffraction ray tracing is another typical case of BVP. In a two-dimensional model, a diffracting point x_u is also fixed within the model boundaries, and the minimum raypath connecting x_s with x_u, and then x_u with x_r, is to be found (see Figure 4.8). The problem can be split into two standard two-point ray-tracing problems, each one characterized by an independent take-off angle.

Different BVPs can also be defined, as is the case of multiples and complicated headwaves. In any case, it is not the purpose of this work to deal with complicated wave phases nor later arrivals that may obscure the description of the two-point ray-tracing method.

Figure 4.8: Source, receiver, diffracting point and raypath geometry in a two-dimensional two-point experiment for tracing diffractions.

4.4.2 Conventional methods for solving the BVP

A brief description of some conventional methods for solving the BVP arising in ray tracing follows. A more detailed review can be found in [Cer87]. The next section presents a novel approach called SART, a BVP solver that is intended to overcome the usual deficiencies of the conventional methods.

Shooting

The shooting method represents an iterative IVP. First, an initial point (the source) is fixed and the ray is propagated by specifying the take-off angle. Then, a search strategy is used to update this angle until the ray emerges through the desired endpoint (the receiver) within a given tolerance. Since the receiver location is frequently an ill-behaved function of the take-off angle, the strategy for updating the take-off angle may become a difficult task, and divergence is a common issue unless the model, or the results of the computations, are conveniently smoothed [LLC85]. As a consequence, some raypaths can be missed. The problem is even more severe in the 3-D case, where two take-off angles are to be found. Headwaves are obtained by using a similar trial-and-error search routine [Cas82], which is more difficult to implement and more likely to diverge, because more variables are involved.

To find the optimum take-off angle corresponding to the ray connecting both source and receiver, the distance

    d = d(θ_s) = ||x_e − x_r||    (4.10)

is to be minimized, where x_e is the emerging point (the point where the ray leaves the model boundaries), and x_r is the desired endpoint (see Figure 4.10). This is a rather difficult optimization problem. Equation (4.10) is zeroed using some kind of linearizing method that requires an estimate of the partial derivatives of d with respect to θ_s. In the simplest form, shooting produces a fan of rays with equally spaced take-off angles covering a given range, as the sketch below illustrates.
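The following toy fragment (a sketch under simplifying assumptions: a homogeneous model with straight rays, hypothetical names, not the thesis code) shows the fan idea, tabulating a signed version of the miss distance (4.10) over equally spaced take-off angles and bracketing its sign changes.

    import math

    def miss_distance(theta, src=(0.0, 50.0), rec=(100.0, 70.0)):
        """Trace a straight ray (homogeneous toy model) from src at take-off
        angle theta and return the signed miss, on the boundary x = 100,
        between the emerging point and the receiver; a signed stand-in for
        d(theta_s) of eq. (4.10), suitable for a zero-finder."""
        z_e = src[1] + math.tan(theta) * (100.0 - src[0])
        return z_e - rec[1]

    # A fan of equally spaced take-off angles, as in the simplest shooting scheme
    fan = [math.radians(a) for a in range(-40, 41, 5)]
    miss = [miss_distance(t) for t in fan]
    # Bracket the sign changes; each bracket isolates one candidate ray
    brackets = [(fan[i], fan[i + 1]) for i in range(len(fan) - 1)
                if miss[i] * miss[i + 1] < 0.0]
    print(brackets)   # one bracket around theta ~ 11.3 degrees for this geometry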
From such a fan it is possible to generate the function d = d(θ_s) empirically, and then use a standard zero-finder to solve the equation d(θ_s^opt) = 0. In the method of false position described by Julian and Gubbins [JG77], the partial derivatives of the receiver coordinates with respect to the take-off angle are estimated numerically by tracing three trial rays with slightly varying take-off angles. This method usually has slow convergence, caused by inaccuracies in the estimation of the partial derivatives. In [SK90] a more accurate set of derivatives is obtained by exploiting the fact that the information regarding the position and direction of the ray at any given point is contained in the curvature of the local wavefront. The partial derivatives of the receiver coordinates with respect to the take-off angle can thus be determined by solving the equations describing the geometrical spreading of the wavefront along with the ray equations [SK90]. The procedure requires the solution of a system of 15 linear equations per iteration for the 3-D case. To deal with discontinuities, extra equations accounting for the boundary conditions must be added. Despite the efforts made to avoid slow convergence and to increase accuracy, multipathing, discontinuities, and in general the ill-behavior of the objective function represent serious problems for the shooting method [SK90], and divergence can be expected in complex 2-D structures. In the same work, a sort of climb-out procedure to avoid local minima is devised, but I do not believe this is an efficient strategy when dealing with complicated 3-D structures.

Bending

In the bending method both points are linked by an initial guess path, which is then perturbed iteratively so as to satisfy the ray equations or Fermat's principle of stationary time. Several bending techniques are reported in the literature [JG77, UT87, PTE88, MNS92, Yon93] which, unlike shooting, always produce a ray connecting any source-receiver pair. In general, bending involves the solution of a highly nonlinear optimization problem, which requires some kind of gradient directions to update the raypath. In complicated velocity structures, bending tends to overlook multipath propagation because the solution depends on the first guess. As a result, the absolute minimum raypath can be missed. Although a method based on advanced graph theory for choosing an initial guess close to the global minimum exists [Mos89, Mos91, FL93], it is not clear whether the final raypath is a global or a local minimum [MNS92]. It is also worth mentioning that the accuracy obtained with bending methods depends on the selected parameterization (e.g. the number of nodes representing the ray trajectory), and that the phase of the resulting ray may remain unknown throughout the process. Bending methods are usually faster than shooting methods for relatively smooth models, except for tracing a large set of closely spaced rays, where shooting may be preferred. In complex models, where shooting has its main drawbacks, bending is advantageous but at a higher computational cost.

Figure 4.9 illustrates a standard bending method. The algorithm starts with an unrealistic straight raypath and perturbs it iteratively until the traveltime is minimum (in this example the raypath is parameterized with a set of segments and the algorithm perturbs the node coordinates).
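A bare-bones version of this node-perturbation idea can be written as follows (an illustrative sketch only: a toy velocity field, hypothetical names, and simple coordinate descent rather than the gradient schemes of the cited methods). Being purely local, it exhibits exactly the trapping discussed next.

    import math

    def slowness(x, z):
        """Toy smooth slowness field, s = 1/v, velocity increasing with depth."""
        return 1.0 / (1500.0 + 20.0 * z)

    def traveltime(xs, zs):
        """Discrete path integral of slowness over straight segments."""
        t = 0.0
        for i in range(len(xs) - 1):
            dl = math.hypot(xs[i + 1] - xs[i], zs[i + 1] - zs[i])
            t += slowness(0.5 * (xs[i] + xs[i + 1]),
                          0.5 * (zs[i] + zs[i + 1])) * dl
        return t

    def bend(xs, zs, dz=1.0, sweeps=200):
        """Crude bending: perturb interior node depths one at a time, keeping
        any move that lowers the traveltime (a purely local descent, hence
        prone to local-minimum trapping when multipathing exists)."""
        zs = list(zs)
        for _ in range(sweeps):
            improved = False
            for i in range(1, len(zs) - 1):
                for step in (+dz, -dz):
                    trial = zs[:i] + [zs[i] + step] + zs[i + 1:]
                    if traveltime(xs, trial) < traveltime(xs, zs):
                        zs, improved = trial, True
            if not improved:
                dz *= 0.5              # refine the perturbation size
                if dz < 1e-3:
                    break
        return zs

    xs = [i * 100.0 for i in range(11)]    # fixed, equally spaced in offset
    zs0 = [0.0] * 11                       # initial straight (flat) guess
    print(traveltime(xs, zs0), traveltime(xs, bend(xs, zs0)))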
Clearly, whenever multipathing exists, the final solution depends on the starting model. Further, if the velocity model contains discontinuities, bending presents difficulties in the node perturbation scheme, which is usually based on gradient directions. Filho and Thedy [FT95] use genetic algorithms (GA) to overcome these drawbacks. However, the optimization problem may become too expensive when the number of unknowns (nodes) is large. At the same time, the number of nodes cannot always be kept small when a certain accuracy is required or the velocity field is complicated.

Figure 4.9: In the bending method, an initial guess path is iteratively perturbed until the traveltime is minimum or the ray equations are satisfied. Usually bending gets trapped in local minima and misses the optimum (global minimum) raypath.

Finite differences

Finite-difference (FD) methods are based on solving the eikonal equation¹ by means of finite differences [Vid88, PL91]. The scheme obtains the traveltime field simultaneously at all nodes of the finite-difference grid. When the endpoint does not coincide with a grid node, the traveltime of the ray connecting source and receiver is obtained by interpolation. The corresponding raypaths are found by following the normals to the wavefronts from source to receiver. FD methods are very suitable for computing traveltimes in relatively smooth velocity models, and the efficiency reached by these schemes is quite high, especially when a large set of rays needs to be traced. However, since their accuracy depends upon the validity of the finite-difference approximation, the limitations become apparent when the model contains discontinuities or the velocity varies rapidly. Using a denser grid usually does not help much. The main drawback of the FD method is, however, that it only produces first arrivals, regardless of the phase of the obtained raypath. That is, given a gridded velocity field and a source-receiver geometry, in general it is not possible to know which type of wave phase the traveltime obtained by the FD method corresponds to. It may correspond to a diffraction, or a headwave, or a direct wave, etc., whichever arrives first. This issue is very important from the point of view of phase identification and the energy of the arrivals [GB93].

¹A form of the wave equation for harmonic waves, valid only where the variation of properties is small within a wavelength (high-frequency approximation).

Linear traveltime interpolation

Linear traveltime interpolation (LTI) was first introduced in [IRS88] and later modified in [AK90]. As in the FD method, a gridded 2-D cell model is used. Raypaths are assumed straight within each cell, and traveltimes are computed on the cell boundaries by linear interpolation of traveltimes. Cell boundaries are subdivided into several segments and the minimum traveltime is calculated at every resulting grid point. After computing traveltimes, a backward process based on the reciprocity principle is used to trace the raypaths. The LTI method is reported to be more accurate and better suited to gridded 2-D cell models than the FD method [AK93]. LTI can even handle abrupt changes of velocity without the need to smooth the velocity field or to increase the number of cells. Like FD, LTI produces only first arrivals.
Huygens' principle method

Methods based on Huygens' principle have been developed in order to avoid the difficulties associated with the branching points of a ray (see for example [Sai89]). Methods based on graph theory may also be included in this category [Mos89, Mos91, FL93]. Like FD and LTI, these methods are applicable to gridded cell models. According to Huygens' principle each grid point becomes a secondary source, from which a new ray propagates in all directions. In practice, the number of directions is fixed and therefore the number of paths connecting two points becomes finite. Traveltimes corresponding to each possible path are compared and the one with minimum traveltime is selected. The accuracy is limited by the model discretization, and the computational requirements increase dramatically with the number of cells, nodes and directions, in particular for 3-D models. Recently, graph-theory methods have been applied to 3-D media [CH96], but a thoughtful study of their applicability to complex 3-D structures remains to be done.

4.4.3 Simulated annealing ray tracing (SART)

As a counterpart, I present here a novel method for solving the BVP that combines SA with ray-tracing techniques [VU96a, Vel96]. SART has both shooting and bending features. On the one hand, a standard IVP is solved at each iteration. On the other hand, the raypath is arbitrarily modified so as to satisfy the boundary conditions. At convergence, the solution to the IVP yields the optimum ray trajectory connecting source and receiver.

It is the goal of SART to find the raypath that exhibits the absolute minimum traveltime. It is not the objective of the method to find all possible rays connecting the two fixed points, but only the one that exhibits the absolute minimum traveltime for a given wave phase. SART focuses on direct first arrivals, for it is based on solving the IVP. In many ray-tracing applications these are of greatest importance because they usually represent the most energetic events [GB93]. Arbitrary headwaves, diffractions and waves propagating through shadow zones cannot, in general, be found by SART. In these cases, bending, finite-difference, or graph-theory methods should be used. However, SART can easily be modified to handle these kinds of waves if the ray signature is defined beforehand, as shown in Figures 4.6, 4.7, and 4.8. It is important to note that the method is not suitable for obtaining raypaths corresponding to local minima (such as caustics and triplications), nor minimum paths that exhibit exactly the same traveltime in some pathological cases, which are beyond the scope of this work.

SART, unlike bending, avoids local minima and converges to the absolute minimum traveltime, provided the proper cooling schedule is utilized. As mentioned before, the number of unknowns in the bending method is usually large if a certain accuracy is required. On the contrary, the number of unknowns in SART reduces to a few, and the method is not affected by local minima or discontinuities. Any ill-behavior of the objective function does not, in general, represent a serious problem for VFSA, which is a natural tool for dealing with highly nonlinear optimization problems.

4.5 Simulated annealing ray tracing (SART)

SART essentially proceeds as follows (see Figure 4.10).
Given a fixed source point, x_s, and an initial take-off angle, θ_s, the ray is propagated until it leaves the model boundaries or arrives at the boundary of a prescribed near-receiver region (e.g. the receiver cell), determining the emerging point x_e. This point is then connected with a straight line directly to the receiver point x_r. The total traveltime, which is computed as the path integral of the slowness between x_s and x_r, is minimized with respect to θ_s using VFSA. Notice that the ray satisfies Snell's Law at all crossed cell boundaries, except along the segment connecting x_e to x_r. Nevertheless, when the minimum is obtained, Snell's Law is satisfied for the last portion of the raypath too.

4.5.1 Description of the algorithm

Basically, at each iteration an IVP is solved starting from the source, x_s, with take-off angle θ_s. The propagation is terminated when either

1. the ray arrives at the model boundary; or

2. the ray arrives at a prescribed near-receiver region (e.g. the receiver cell).

The point where the ray meets one of the above two conditions is called x_e. The raypath is completed by connecting x_e with the receiver, x_r, with a straight line. Figure 4.10 illustrates this situation when condition (1) is met for a 2-D model. Note that the segment connecting x_e with x_r is quite arbitrary, and the resulting raypath may be completely unrealistic. The purpose of the straight-ray construction is to force the raypath to satisfy the constraint imposed by the BVP. This is an intermediate raypath that, as in bending, is updated iteratively.

Figure 4.10: Straight-ray construction used by SART. When the traveltime is minimum, x_e coincides with x_r and the minimum path with optimum take-off angle θ_s^opt has been found.

The total traveltime is computed by integrating the slowness field s(x, z) = 1/v(x, z) along the path l between both endpoints:

    T = T_se + T_er,    (4.11)

where

    T_se = ∫_{x_s}^{x_e} s dl    (4.12)

is the traveltime associated with the first portion of the raypath, which starts at x_s, and

    T_er = ∫_{x_e}^{x_r} s dl    (4.13)

is the traveltime associated with the straight-ray construction starting at x_e. The integration in the latter is done along the straight line. In the cell ray-tracing scheme, the integrals are approximated by discrete sums over all ray segments that make up the raypath. Since source and receiver are fixed and x_e is uniquely determined by the solution of the IVP (I assume the IVP can be solved for any given take-off angle), T becomes a function of the take-off angle only, so

    T = T(θ_s).    (4.14)

The final raypath is found by recalling Fermat's principle: when the traveltime T is minimum, x_e coincides with x_r and the whole raypath satisfies the ray equations. As I will show later, this is not always strictly true, and some ray for which x_e ≠ x_r may arrive earlier than any other realistic path. However, I will reformulate the problem so as to overcome this eventual difficulty. The straight-ray construction could be replaced by any other arbitrary ray construction inasmuch as the traveltime T_er tends to zero as x_e tends to x_r. The straight-ray construction has been chosen for expediency only, because it adds little extra effort to computing T_er.

The basic problem now becomes an optimization one in which one parameter (the take-off angle) is to be found so that T is a global minimum.
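To make the procedure concrete, here is a self-contained toy implementation (a sketch only: straight rays in a homogeneous box, a bare-bones VFSA loop with the standard Ingber-style Cauchy move; all names and constants are hypothetical, not the thesis FORTRAN code).

    import math, random

    V = 2000.0                                  # constant toy velocity (m/s)
    SRC, REC = (0.0, 50.0), (100.0, 70.0)       # source x_s and receiver x_r

    def trace_ivp(theta):
        """Toy IVP solver: a straight ray in a homogeneous 100 x 100 model,
        traced from SRC until it leaves the model; returns (x_e, T_se)."""
        dx, dz = math.cos(theta), math.sin(theta)
        cands = [(100.0 - SRC[0]) / dx]         # right boundary (dx > 0 here)
        if dz > 0.0:
            cands.append((100.0 - SRC[1]) / dz) # bottom boundary
        elif dz < 0.0:
            cands.append(-SRC[1] / dz)          # top boundary
        l = min(c for c in cands if c > 0.0)
        return (SRC[0] + l * dx, SRC[1] + l * dz), l / V

    def cost(theta):
        """T(theta) of eq. (4.14): traced portion T_se plus the straight-ray
        completion T_er from the emerging point x_e to the receiver."""
        xe, t_se = trace_ivp(theta)
        return t_se + math.hypot(REC[0] - xe[0], REC[1] - xe[1]) / V

    def vfsa_minimize(lo=-1.5, hi=1.5, t0=1.0, c=0.05, iters=2000, seed=7):
        """Bare-bones VFSA on the single unknown (the take-off angle)."""
        rng = random.Random(seed)
        x, fx = 0.0, cost(0.0)
        best, fbest = x, fx
        for k in range(1, iters + 1):
            temp = t0 * math.exp(-c * k)        # VFSA cooling schedule (D = 1)
            u = rng.random()                    # heavy-tailed VFSA move
            y = math.copysign(
                temp * ((1.0 + 1.0 / temp) ** abs(2.0 * u - 1.0) - 1.0),
                u - 0.5)
            trial = min(hi, max(lo, x + y * (hi - lo)))
            ft = cost(trial)
            if ft < fx or rng.random() < math.exp((fx - ft) / max(temp, 1e-300)):
                x, fx = trial, ft               # Metropolis acceptance
            if fx < fbest:
                best, fbest = x, fx
        return best

    print(math.degrees(vfsa_minimize()))        # ~11.3 deg = arctan(20/100)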
Since expression (4.14) is a highly nonlinear, multimodal, and nondifferentiable function, it cannot be properly minimized using classical linearizing methods. I use instead VFSA, which is a very powerful tool for minimizing cost functions independently of their nonlinearities, discontinuities, and stochasticity.

4.5.2 SART for more complex BVPs

So far I have described the strategy for solving the two-point boundary-value ray-tracing problem that takes into account direct waves only. It is capable of finding the raypath connecting the two endpoints x_s and x_r with the absolute minimum traveltime, independently of any possible reflection along its trajectory. However, one is often interested in more complex raypaths, such as reflected waves or headwaves. This can be accomplished by incorporating minor modifications into the two-point ray-tracing scheme already described.

Handling reflected waves

In the case of reflected waves, the ray must undergo a reflection at a predefined boundary z = f(x). The reflection occurs at point x_u = [x_u, f(x_u)], which is unknown (see Figure 4.11a). Here the ray is forced to arrive at the prescribed boundary where a reflection is desired. This is done by penalizing those rays that do not reach the prescribed boundary. Once the ray reaches the boundary, it undergoes a reflection and propagates towards the emerging point x_e. Then x_e is connected with x_r using a straight line as described above. Since the ray trajectory comprises three portions, I define the following cost function to be globally minimized:

    Φ_1(θ_s) = T_su + T_ue + T_er = ∫_{x_s}^{x_u} s dl + ∫_{x_u}^{x_e} s dl + ∫_{x_e}^{x_r} s dl,  if z_k = f(x_k) for some k;
    Φ_1(θ_s) = T_max,  otherwise,    (4.15)

where

    T_max > ∫ s dl,  for all θ_s,    (4.16)

can be guessed easily from a set of trial raypaths, or set simply as T_max = l_max/v_min, where l_max is the maximum expected path length and v_min is the minimum velocity of the model. In equation (4.15), T_ue is the traveltime associated with the second portion of the raypath, starting at point x_u. The cost function Φ_1(θ_s) as defined above ensures that the resulting traveltime for those rays that do not reach the reflector is always greater than that corresponding to any reflected wave. As a consequence, the global minimum of equation (4.15) corresponds to a reflection, and x_e = x_r (a toy version of this penalized cost is sketched below). Notice that equation (4.15) introduces first-order discontinuities at those angles where the ray stops reaching the discontinuity. This is not an issue for VFSA, but it may lead to extra iterations during the optimization process, because those rays that do not reach the discontinuity are generated unnecessarily.

An alternative scheme that avoids a penalty cost function, and that ensures that all generated rays correspond to a reflection, is to start the ray propagation from point x_u towards the emerging points x_e1 and x_e2, as shown in Figure 4.11b. In this scheme the number of unknowns has increased by one: not only must coordinate x_u be obtained, but also take-off angle θ_u1. The take-off angle corresponding to the ray traveling from x_u to x_e2 is related to θ_u1 by

    θ_u2 = 2π − θ_u1 − 2 arctan[f′(x_u)],    (4.17)

easily derived from the figure. Then, it is convenient to define the cost function

    Φ_2 = Φ_2(θ_u1, x_u) = T_ue1 + T_e1s + T_ue2 + T_e2r
        = ∫_{x_u}^{x_e1} s dl + ∫_{x_e1}^{x_s} s dl + ∫_{x_u}^{x_e2} s dl + ∫_{x_e2}^{x_r} s dl,    (4.18)

which is to be globally minimized using VFSA.
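The fragment below is a toy version of the penalized cost (4.15) for the first scheme (a sketch only: homogeneous velocity, a flat reflector, hypothetical names and values, not the thesis code). Scanning it exposes the discontinuous, penalized landscape that VFSA must handle.

    import math

    V, ZREF = 2000.0, 90.0                       # toy velocity, flat reflector z = f(x)
    SRC, REC = (0.0, 50.0), (100.0, 50.0)
    TMAX = 1.0                                   # T_max, larger than any reflected time

    def phi1(theta):
        """Toy version of the penalized cost (4.15): rays that never reach the
        reflector are charged T_max; reflected rays get T_su + T_ue + T_er."""
        dx, dz = math.cos(theta), math.sin(theta)
        if dz <= 0.0 or dx <= 0.0:
            return TMAX                          # ray cannot reach the reflector
        l1 = (ZREF - SRC[1]) / dz                # down-going leg: source -> reflector
        xu = SRC[0] + l1 * dx
        if not 0.0 <= xu <= 100.0:
            return TMAX                          # leaves the model before reflecting
        # specular reflection at the flat reflector: dz -> -dz (up-going leg)
        l2 = min((100.0 - xu) / dx, ZREF / dz)   # exit via right or top boundary
        xe = (xu + l2 * dx, ZREF - l2 * dz)
        t_er = math.hypot(REC[0] - xe[0], REC[1] - xe[1]) / V
        return (l1 + l2) / V + t_er              # T_su + T_ue + T_er

    # Coarse scan just to expose the penalized landscape; in SART this
    # function is handed to VFSA instead.
    best = min((phi1(math.radians(a)), a) for a in range(1, 90))
    print(best)       # minimum near 39 deg, i.e. arctan(80/100) = 38.66 deg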
Figure 4.11: Alternative strategies used by SART for tracing reflections connecting source and receiver (see text for details). (a) Here SART searches for the optimum take-off angle θ_s until x_e = x_r and the traveltime is minimum. (b) Here the search involves x_u and θ_u1. At convergence, x_e1 = x_s and x_e2 = x_r.

Yet another alternative scheme can be devised that depends on a single unknown, as in the first case, and that always generates a reflection at the discontinuity, as in the second one. This scheme is illustrated in Figure 4.12a. Here the ray leaves the source with take-off angle θ_s and arrives either at point x_u on the discontinuity, or at point x_e on some model boundary. If x_u is met first, the ray undergoes a reflection and travels towards x_e as in Figure 4.11a. But if x_e is met first, a vertical straight ray segment is used to join it with x_u = [x_e, f(x_e)] and back again to x_e. From x_e to x_r the straight-ray construction is used as usual. Then I write

    Φ_3(θ_s) = T_su + T_ue + T_er,  if z_k = f(x_k) for some k;
    Φ_3(θ_s) = T_se + 2T_eu + T_er,  otherwise,    (4.19)

where

    T_eu = ∫_{x_e}^{x_u} s dl    (4.20)

is the traveltime along the vertical segment joining x_e and x_u. In essence, equation (4.19) is equivalent to equation (4.15), with T_max replaced by T_se + 2T_eu + T_er. Clearly this construction will generate rather arbitrary ray trajectories with no physical sense except at convergence.

Which of the three alternative schemes is the most effective and efficient SART scheme for obtaining the optimum reflection connecting source and receiver cannot be said a priori. Clearly the first one involves the fewest operations per iteration, but no final conclusion can be drawn regarding which one converges faster. At first sight the second approach would require more iterations, since the model space to be explored by VFSA is two-dimensional. The underlying velocity model and source-receiver geometry play an important role, too. It can be said, however, that once the optimum take-off angle has been found,

    Φ_1(θ_s^opt) = Φ_2(θ_u1^opt, x_u^opt) = Φ_3(θ_s^opt).    (4.21)

Normal rays

In the normal-ray case x_s = x_r, and the principle of reciprocity is applied. The ray leaves the reflector at point x_u with take-off angle θ_u and arrives at the emerging point x_e (see Figure 4.12b). Note that the tangent to the raypath at point x_u is parallel to n, the normal to the reflector. The only unknown is x_u, since θ_u can be computed from

    θ_u = π − arctan[f′(x_u)].    (4.22)

Thus, for this particular case, the cost function to be globally minimized is written as

    Φ = Φ(x_u) = 2(T_ue + T_es),    (4.23)

where

    T_ue = ∫_{x_u}^{x_e} s dl,  T_es = ∫_{x_e}^{x_s} s dl.    (4.24)

In SART, equation (4.23) is globally minimized using VFSA.

Handling headwaves

To trace headwaves along a prescribed boundary, z = f(x), the ray trajectory is broken into various parts. As shown in Figure 4.7 and described in section 4.4.1, the final raypath has three parts, but the intermediate raypaths obtained at each iteration are normally composed of five, as shown in Figure 4.13 and described below.

Figure 4.12: (a) SART for tracing reflected waves: yet another alternative strategy. When the ray leaves the model at point x_e, a double straight vertical line is used to force an (unrealistic) reflection at point x_u = [x_e, f(x_e)]. (b) Normal rays case.
In the optimization problem, coordinate x_u represents the only unknown.

Figure 4.13: SART for tracing headwaves. Both coordinates x_u and x_v are globally optimized until the total traveltime is minimum.

1. the ray that leaves point x_u = [x_u, f(x_u)] with take-off angle θ_u, and arrives at the emerging point x_e1;

2. the ray that leaves point x_v = [x_v, f(x_v)], x_v > x_u, with take-off angle θ_v, and arrives at the emerging point x_e2;

3. the ray that propagates along the boundary from x_u to x_v;

4. the straight-ray segment that connects x_e1 with x_s; and

5. the straight-ray segment that connects x_e2 with x_r.

The problem now consists of finding the unknown points, x_u and x_v, so that the first part of the ray arrives at x_s, the second arrives at x_r, and the total traveltime is minimum. That is, the problem has been split into two IVPs (now the take-off angles are known and the initial point is not), plus the ray propagation along the refractor. The two IVPs are not, however, independent: the take-off angles used to trace ray parts (1) and (2) above are closely related to the respective critical angles and normals to the refractor at points x_u and x_v. That is,

    θ_u = π + arcsin(v_u/v_r) − arctan f′(x_u),
    θ_v = π − arcsin(v_v/v_r) − arctan f′(x_v),    (4.25)

where v_u and v_v are the velocities right above the refractor at points x_u and x_v respectively, and v_r is the refractor velocity (v_r > v_u, v_r > v_v). For this problem the total traveltime to be globally minimized can be written as

    Φ_1 = Φ_1(x_u, x_v) = T_ue1 + T_ve2 + T_uv + T_e1s + T_e2r,    (4.26)

where

    T_uv = ∫_{x_u}^{x_v} s[x, f(x)] dx    (4.27)

is the traveltime along the refractor from x_u to x_v. I minimize equation (4.26) with respect to the two unknowns x_u and x_v using VFSA. Again it must be stressed that equation (4.26) is highly nonlinear and multimodal, and that although a trial-and-error procedure for finding x_u and x_v is feasible, the same kind of difficulties that arise in shooting when choosing the take-off angle are to be expected. In a 3-D velocity model, if the prescribed boundary is defined by the surface z = f(x, y), the total traveltime is a function of four variables, say (x_u, y_u) and (x_v, y_v). A trial-and-error procedure to find the four unknowns may be both inefficient and ineffective for obvious reasons. Alternatively, the total traveltime can be defined in terms of θ_s and x_v (or θ_r and x_u):

    Φ_2 = Φ_2(θ_s, x_v) = T_su + T_uv + T_ve + T_er.    (4.28)

Chapter 5. Boundary Value Ray Tracing In Complex 3-D Media

An interface crossing between two consecutive points of the raypath is detected whenever

    z_k < f(x_k, y_k) and z_{k+1} > f(x_{k+1}, y_{k+1}),  or
    z_k > f(x_k, y_k) and z_{k+1} < f(x_{k+1}, y_{k+1}).    (5.8)

Once an interface crossing has been detected, a root-finder is used to find the new Δt^i which yields z_{k+1} = f(x_{k+1}, y_{k+1}). For this purpose, I define the function

    g(Δt) = z_{k+1} − f(x_{k+1}, y_{k+1}),    (5.9)

which must be equal to zero (within a given tolerance) at the intersection point, x^i. That is,

    g(Δt^i) = z^i − f(x^i, y^i) ≈ 0.    (5.10)

Since g(0) = z_k − f(x_k, y_k), the root of equation (5.9) is bracketed in the interval [0, Δt]. The selection of an appropriate root-finder is very important. Robustness under difficult situations is required, because the function g(Δt) may take any form. Also, the convergence rate must be as fast as possible for best efficiency (note that every time g(Δt) is evaluated, the ray equations must be advanced one stepsize Δt).
The bisection method never fails, but its convergence rate is very poor. A higher-order approximation is achieved with Brent's method [PTVF92]. I chose this method because of its convergence rate, its robustness, and because derivatives are not required (note, besides, that the interfaces may be discontinuous). Brent's algorithm has quadratic termination in most cases and is as robust as the bisection method [PTVF92]. In the worst cases, such as pathological functions, Brent's method takes a bisection step. In practice I have found that one to three adjustments are enough to find the intersection point between the raypath and the interface within a small tolerance. Once the intersection point has been found, the current raypath coordinates and traveltime are updated accordingly:

    x_{k+1} = x^i,
    t_{k+1} = t_k + Δt^i.    (5.11)

Here the slowness vector is updated using the reflection/transmission laws given in equations (5.3) and (5.4). Then the timestep is restored to its initial value and the ray propagation continues with the new initial conditions. If more than one interface is crossed in a given timestep, the one whose corresponding adjusted timestep is smallest is selected as the candidate interface for applying the reflection/transmission laws. The same kind of iterative adjustment is done when the ray crosses any of the model boundaries; in this particular case, the propagation is terminated.

The unit normal to the surface z = f(x, y) at a generic point x, which is needed when applying Snell's Law at the intersection point, is calculated by

    n = (−∂f/∂x, −∂f/∂y, 1) / [1 + (∂f/∂x)² + (∂f/∂y)²]^{1/2}.    (5.12)

This requires the additional condition of differentiability, at least locally, of the interfaces.

5.3.3 Ray signature and stopping conditions

During the propagation, an index indicating the current region is saved and updated after each interface crossing. This index is used to determine which velocity function must be used in the integration of equations (5.2). A flag for each interface is also provided, which indicates the decision to make in case the ray arrives at the interface (see "Ray signature and stopping conditions" in Chapter 4). This allows us to model P- and S-waves, or even a conversion between the two, or to force a reflection at any given interface to simulate a reflector.

Note that various stopping conditions may be implemented. The ray propagation stops whenever the first of the following conditions occurs:

1. the ray arrives at some model boundary;

2. the ray arrives at some prescribed interface (e.g. some target plane);

3. a refraction has been requested but total reflection occurred;

4. the number of points in the raypath, or the traveltime, is greater than a predefined value; etc.

In my experiments, conditions (1) and (2) are set by default. For simplicity, I locate all sources and receivers on the model boundaries. In general, a refraction is requested at all interfaces, except at some individual interface for modeling reflections or headwaves. In case a refraction is not possible because the incident angle is beyond the critical angle [negative square-root argument in formula (5.4)], a reflection is generated and this is notified on output.
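The crossing detection and timestep adjustment of equations (5.8)-(5.11) can be sketched as follows (an illustrative fragment, not the thesis FORTRAN: a toy dipping-plane interface, an Euler step standing in for the Runge-Kutta advance of equations (5.2), and SciPy's brentq as an off-the-shelf implementation of Brent's method).

    from scipy.optimize import brentq

    def f_surf(x, y):
        """Toy interface z = f(x, y): a gently dipping plane."""
        return 50.0 + 0.1 * x + 0.05 * y

    def advance(p, u, dt):
        """One integration step of the ray position (Euler here; a stand-in
        for the Runge-Kutta step applied to the ray equations (5.2))."""
        return tuple(pi + dt * ui for pi, ui in zip(p, u))

    def adjust_crossing(p, u, dt):
        """Detect an interface crossing over one timestep (eq. 5.8) and shrink
        the step with Brent's method so that the new point lands on the
        interface (eqs. 5.9-5.10); returns the intersection point and the
        adjusted timestep dt^i, or None if no crossing occurred."""
        g = lambda t: advance(p, u, t)[2] - f_surf(*advance(p, u, t)[:2])
        if g(0.0) * g(dt) > 0.0:
            return None                   # same side at both ends: no crossing
        dt_i = brentq(g, 0.0, dt)         # root of g bracketed in [0, dt]
        return advance(p, u, dt_i), dt_i

    p = (0.0, 0.0, 49.0)                  # current raypath point x_k
    u = (0.6, 0.2, 0.77)                  # current ray direction (approx. unit)
    print(adjust_crossing(p, u, 5.0))     # intersection point x^i and dt^i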
In summary, given an initial point and an initial direction, the system described above is able to propagate any ray, provided the velocity and the interfaces can be evaluated at any point within the model boundaries, and provided a ray signature is specified a priori to make a decision at each interface intersection. It is also required to know which region is being traversed at any point. Raypaths honor bending in inhomogeneous regions and Snell's Law at discontinuities.

5.4 Solving the boundary-value problem (BVP) by means of SART

As already described in the previous chapter for the 2-D case, the BVP is solved by means of SART. In 3-D the procedure is analogous, the only fundamental difference being the number of degrees of freedom. Of course, in this scheme I use a numerical ray-tracing algorithm instead of a cell ray-tracing one. But this does not concern SART, which is independent of the selected IVP solver. The straight-ray construction applies here too, in such a way that the raypath is forced to reach the desired endpoint. Direct waves, reflections, headwaves, diffractions, etc., are treated similarly to the 2-D case.

Figure 5.3: Ray tracing direct waves through a 3-D medium using SART. At convergence x_e = x_r.

5.4.1 Problem definition

Direct waves

Tracing direct waves is the simplest case and the basis for tracing the other types of waves. Both source and receiver are fixed, and the optimum take-off angles θ_s^opt and ξ_s^opt are to be found so that the total traveltime is a global minimum. The total traveltime is written as

    T = T_se + T_er.    (5.13)

In equation (5.13), T_se is the traveltime obtained after solving the IVP for the given initial conditions. The second term, T_er, is the traveltime associated with the straight-ray construction (see Figure 5.3). Since two angles are needed to determine the whole ray trajectory uniquely, the traveltime T is a two-dimensional function, thus

    T = T(θ_s, ξ_s),    (5.14)

which is in general multimodal and nondifferentiable. When T is minimum, Fermat's principle is satisfied and the point x_e coincides with the receiver. Thus, the only difference between the 2-D and the 3-D case is that the whole final raypath is now defined in terms of two unknowns instead of just one. If in the 2-D case a simple exhaustive search could have been used to find the global minimum of the traveltime equation (4.11), in the 3-D case the same procedure is not recommended for obvious reasons, unless T is a very simple function. Unfortunately, T is in general multimodal and nondifferentiable, which makes the optimization problem a difficult numerical task.

Reflected waves

The case of reflected waves again involves the minimization of a two-dimensional cost function in terms of the two take-off angles:

    Φ(θ_s, ξ_s) = T_su + T_ue + T_er,  if the ray arrives at z = f(x, y);
    Φ(θ_s, ξ_s) = T_max,  otherwise,    (5.15)

where x_u is the point where the ray intersects the reflector, z = f(x, y), and T_max is the maximum guessed value the total traveltime may take over all possible take-off angles. As usual, the ray is propagated from the source with take-off angles θ_s and ξ_s until it arrives at the reflector at point x_u (see Figure 5.4). Here Snell's Law of reflection is applied and the propagation continues until the ray leaves the model boundaries at the
emerging point x_e. Finally, this point is connected with the receiver using a straight line. When the total traveltime Φ is minimum, x_e coincides with x_r.

Figure 5.4: Ray tracing reflections through a 3-D medium using SART. At convergence x_e = x_r. An alternative strategy is described in the text.

As in the 2-D case, alternative strategies for dealing with reflected waves can be devised. One such strategy starts the propagation from the unknown point x_u on the reflector surface towards the source, with take-off angles θ_u and ξ_u, also unknowns. The raypath is then completed by shooting towards the receiver starting at x_u with take-off angles θ′_u and ξ′_u. These angles can be expressed in terms of the slowness vector s_u and the unit normal to the surface, n, according to equation (5.4). Then

    sin θ′_u cos ξ′_u = sin θ_u cos ξ_u − 2α n_x,    (5.16)

where

    α = s_u · n = n_x sin θ_u cos ξ_u + n_y sin θ_u sin ξ_u + n_z cos θ_u    (5.17)

is the dot product between the slowness vector and the unit normal. As a result, the total traveltime becomes the four-dimensional function

    Φ = Φ(θ_u, ξ_u, x_u, y_u) = T_ue1 + T_e1s + T_ue2 + T_e2r.    (5.18)

All raypaths now correspond to reflections, and there is no need for penalizing rays not undergoing any reflection, as in the previous case. In equation (5.18), x_e1 and x_e2 are the emerging points after shooting from x_u with take-off angles (θ_u, ξ_u) and (θ′_u, ξ′_u) respectively. The terms T_e1s and T_e2r are the traveltime contributions associated with the corresponding straight-ray constructions, as in the direct-wave case.

Normal rays

The normal-ray case is a particular case of reflections. The problem can be cast as finding the optimum coordinates, (x_u, y_u), of the ray that travels towards the source-receiver point with an initial direction parallel to the normal of the surface at that point; that is, s = n/v. The take-off angles are then calculated from

    cos θ_u = n_z,  tan ξ_u = n_y/n_x.    (5.19)

Consequently, the total traveltime to be globally minimized is written as

    Φ = Φ(x_u, y_u) = 2(T_ue + T_es),    (5.20)

again a two-dimensional function.

Headwaves

Now the number of unknowns is four: the coordinates of the points at which the ray enters and leaves the refractor, x_u and x_v. Actually, the ray is propagated starting at these two locations towards the source and the receiver respectively. As a result the final raypath is composed of five portions, as described in Chapter 4 for the 2-D case. Figure 5.5 depicts how SART handles a headwave in a 3-D medium. Here I will assume the refractor is a plane interface of equation z = a_0 + a_1 x + a_2 y, with velocity v_r = constant. The raypath connecting x_u with x_v, which is the shortest path, honors the parametric equations

    x = λ,  λ ∈ [x_u, x_v],
    y = y_u + b_0(λ − x_u),
    z = z_u + b_1(λ − x_u),    (5.21)

where b_0 and b_1 are constants given by

    b_0 = (y_v − y_u)/(x_v − x_u),
    b_1 = (z_v − z_u)/(x_v − x_u).    (5.22)

The slowness vectors s (see Figure 5.5) at both points are expressed in terms of the tangent vector to the parametric curve defined in equation (5.21):

    s = (1/v) (dx/dλ i + dy/dλ j + dz/dλ k) [(dx/dλ)² + (dy/dλ)² + (dz/dλ)²]^{−1/2}.    (5.23)

At point x_v the slowness vector follows the direction of this ray segment. At point x_u, the range of variation of λ must be reversed, so that the slowness vectors have opposite signs.
Computing the derivatives yields

    s = −[1/(v_r ρ)] (i + b_0 j + b_1 k)  at point x_u,
    s = +[1/(v_r ρ)] (i + b_0 j + b_1 k)  at point x_v,    (5.24)

where

    ρ = (1 + b_0² + b_1²)^{1/2}.    (5.25)

Figure 5.5: Ray tracing headwaves through a 3-D medium using SART. The ray travels along the refractor with v = v_r, from x_u to x_v. At convergence x_e1 = x_s and x_e2 = x_r.

The corresponding take-off angles at the unknown points x_u and x_v are again computed according to Snell's Law (5.3).

In the linearizing stage that polishes the SA solution, the take-off angles are updated iteratively according to

    (θ_s, ξ_s)^{i+1} = (θ_s, ξ_s)^i + α_s (Δθ_s^i, Δξ_s^i),    (5.30)

where α_s is obtained by linear search or some other strategy. The selection of the search vector (Δθ_s, Δξ_s) is largely what distinguishes one method from another. In the steepest-descent (SD) approach, the greatest reduction in the cost function value is obtained in the direction of the negative gradient, thus

    (Δθ_s, Δξ_s) = −∇d² = −( 2 Σ_{i=1}^{3} (x_i − x_ri) ∂x_i/∂θ_s,  2 Σ_{i=1}^{3} (x_i − x_ri) ∂x_i/∂ξ_s ).

The following parameters control the optimization:

1. T_tol: tolerance traveltime; if T_er > T_tol after ITMAX iterations, the SA run is repeated using a different seed. Default value: 1.0 ms.

2. MAXREPEAT: maximum number of times the SA run is repeated before giving up. Default value: 3.

3. ITMAX_LIN: maximum number of iterations for the linearizing stage. Default value: 15.

4. ε_d: tolerance distance for the linearizing stage. Default value: 1 × 10⁻⁵ m.

The first two parameters, together with ITMAX, are related to the SA optimization stage. The remaining two concern exclusively the linearizing stage.

5.6 SART performance: analysis and discussion

In this section I discuss some important aspects concerning the efficiency of various SART subprocesses, and methods to further improve its overall performance. I also make a comparison with a standard grid-search algorithm, and with a multistart linearizing procedure. These two methodologies may be viewed as particular cases of shooting techniques.

5.6.1 SART subprocesses efficiency

It is commonly known that SA-based algorithms for the optimization of nonlinear problems are more expensive, in terms of computational cost, than gradient-based methods. SART is not an exception to the rule, even though it uses VFSA. The price to be paid for a reasonable certainty of finding the global minimum is the number of iterations required for convergence, as shown in the previous section (see Table 5.6). To visualize the efficiency of various SART subprocesses under different conditions, the ray-tracing method was applied to both Model 1 (Figure 5.1) and Model 2 (Figure 5.14) using different settings. In the first case, a source was located at (5, 5, 70) and 25 receivers were distributed across the surface, z = 0. In the second case, the source was put right below the salt dome at (−25, 0, 35), with 25 receivers distributed across the surface, too. The resulting 3-D raypaths are plotted in Figures 5.23 and 5.24 respectively. SART was repeated using two different constant timesteps in the RK integration. The FORTRAN codes were compiled using the "-pg" option (profile-debugging mode) so that the amount of time spent in each subroutine could be extracted from the resulting executable code. Table 5.7 summarizes some of the profile values obtained in that way¹. The tabulated values in each row have the following meaning:

1. TT: total computational time (in seconds) required to obtain the 25 optimum raypaths.

2. rays/s: number of traced rays per second (i.e. 25/TT).
3. k̄: mean raypath length (in points).

4. COST: percentage of TT spent evaluating the cost function.

5. INT: percentage of TT spent in the numerical integration of the ray equations.

6. CROSS: percentage of TT spent checking for interface crossings, finding intersections, and evaluating Snell's Law.

7. STRGHT: percentage of TT spent in the straight-ray construction subroutines.

8. LIN: percentage of TT spent in the linearizing optimization stage (the method of steepest descent for polishing the SA solution).

9. TT_Euler/TT_RK4: ratio of TT using the Euler and fourth-order RK methods for integrating the ray equations.

10. TT_RK2/TT_RK4: ratio of TT using the second- and fourth-order RK methods for integrating the ray equations.

¹The results were obtained on a 90 MHz PC-Pentium, running under Linux with f2c.

Figure 5.23: Model 1: multiple ray tracing in the fault model. (a) 25 optimum raypaths obtained by SART. The source is located at (5, 5, 70). (b) Raypath projections onto various planes.

Figure 5.24: Model 2: multiple ray tracing in the salt-dome model. (a) 25 optimum raypaths obtained by SART. The source is located at (−25, 0, 35). (b) Raypath projections onto various planes.

In SART, each iteration involves the solution of the ray equations for a given set of initial conditions. The traveltime T_er along the straight-ray construction needs to be computed, too. At each timestep, the algorithm also checks whether the ray has crossed any model interface. Usually this procedure, along with the calculation of the corresponding intersection points, is a very time-consuming stage, since all or some interfaces must be evaluated at every point of the ray trajectory. When Δt is small and/or the number of interfaces is large, this process may account for the major part of the total computational time spent in each iteration. All these procedures are involved in the evaluation of the cost function at each iteration. This is by far the most time-consuming process of SART, accounting for about 95-99% of the total computational cost, as shown in Table 5.7 (row labeled COST).

                      Model 1           Model 2
    Δt                1.0     10.0      0.5     3.0
    TT               43.2     18.1     65.6    43.6
    rays/s            0.6      1.4      0.4     0.6
    k̄                43.4      7.4     42.3    11.1
    COST             98.1     95.3     98.9    98.4
    INT              64.7     48.5     50.7    51.1
    CROSS            18.7     33.7     32.0    52.0
    STRGHT           10.9     19.7     22.6    22.0
    LIN               3.4      2.5      2.0     2.0
    TT_Euler/TT       0.51     0.64     0.62    0.62
    TT_RK2/TT         0.68     0.76     0.75    0.74

Table 5.7: SART profile using fourth-order RK integration. The first row shows the total computational time, TT, in seconds. Rows 4 to 8 show the percentage of TT spent in various subprocesses. See text for explanation.

It is interesting to see the percentage of TT spent by some of the subprocesses involved in the evaluation of the cost function, according to the different models and stepsizes. For example, consider row INT in the table. This subprocess is the most expensive one in most cases (above 50% of TT), especially for small Δt. However, when the number of interfaces is large (as is the case of Model 2), CROSS becomes relevant, and even exceeds INT in terms of computational effort when Δt is large (52.0% against 51.1% in the last column of the table). This is because the number of iterations performed by CROSS at each interface crossing is almost independent of the integration stepsize. The profile also yields other interesting results.
The straight-ray construction subprocess, STRGHT, which involves the calculation of the traveltime between the emerging point and the receiver, accounts for about 10-20% of TT. Finally, subprocess LIN takes only about 2-3% of TT, which is a small value compared with the other subprocesses. As a conclusion of this analysis, it can be said that the number of interfaces, and their complexity, plays a key role in the performance of SART. For models with no interfaces, the overall computational time would be significantly reduced. An important reduction in the overall computational time may also be obtained by using faster numerical methods, at no cost in accuracy, when the velocities are smooth functions. This point is discussed in detail below.

5.6.2 Further improvement of SART efficiency

Lower-order integration

A significant overall improvement of SART performance can be obtained if the computational cost is reduced in the most time-consuming subprocess, namely INT. This can be accomplished by using a lower-order numerical integration algorithm, especially when the velocity fields within each region are very smooth functions. For example, the per-step error of the simple Euler method is O(Δt²), which vanishes when the velocity is constant. The Euler method (see for example [PTVF92]) requires a single evaluation per step of the right-hand side of equation (5.2), whilst fourth-order RK requires four. Second-order RK can also be used: it requires two evaluations per step of the derivatives, and the associated error is O(Δt³). As a result, a significant overall reduction in computational cost can be expected. Appendix C provides a formula to estimate the ratio between the total CPU times of two different numerical integrators. For instance, the computational cost is reduced by a factor of two if the Euler method is used instead of fourth-order RK in Model 1 with Δt = 1.0 s (see the last two rows in Table 5.7 for estimated ratios).

Static versus dynamic integration

In SART, the integration of the ray equations has two options: static integration and dynamic integration. In the first case, the user must specify which method to use throughout. In the second case, each model region is given a flag number indicating which method must be used for that particular region. In this manner, every time the ray enters a new region, the appropriate method is automatically selected for optimum performance according to the smoothness of the velocity field (e.g. in the salt-dome model, use Euler for regions IV, V, VII, and VIII; second-order RK for regions I, II and VI; and third-order RK for region III). The same static/dynamic concept applies to the straight-ray construction². A sketch of the integrators and of this per-region selection is given after the table below.

²Since T_er goes to zero at convergence, the accuracy at this stage is not very important and the Euler method can be used in all cases. Nevertheless, the user can select any method at this stage.

Table 5.8 shows the profile values obtained using the Euler method (first four columns) instead of fourth-order RK (Table 5.7), and using dynamic integration (fifth and sixth columns) for the salt-dome model:

                 Model 1 (static)   Model 2 (static)   Model 2 (dynamic)
    Δt            1.0     10.0       0.5      3.0       0.5      3.0
    TT           20.6     10.1      42.2     23.2      44.9     23.6
    rays/s        1.2      2.5       0.6      1.1       0.6      1.1
    COST         96.5     92.3      97.4     97.0      98.1     96.5
    INT          23.1     15.9      15.3     16.1      14.6     15.2
    CROSS        41.2     39.3      39.4     52.6      40.8     53.2
    STRGHT       19.9     20.0      34.5     27.2      33.8     27.4
    LIN           3.2      2.8       1.6      2.1       1.6      2.0
    TT/TT_RK4     0.48     0.56      0.64     0.53      -        -

Table 5.8: SART profile using static integration (the Euler method in this case) and dynamic integration.

The results show that the improvement in overall performance estimated using formula (C.2) is a fairly good approximation to the actual values. In effect, the last row of Table 5.8 shows the ratios of the total computational times using Euler and fourth-order RK respectively. These values are to be compared with the estimates shown in Table 5.7 (row labeled TT_Euler/TT). Notice also how subprocess INT now yields its dominant role to subprocess CROSS in all the runs.
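The following sketch contrasts the per-step cost of the three integrators and shows a flag-driven, per-region selection in the spirit of the dynamic mode (illustrative only: a generic right-hand side F stands in for the ray equations (5.2), and all names and flags are hypothetical).

    def euler_step(F, y, dt):
        """First-order step, local error O(dt^2): one evaluation of F."""
        return [yi + dt * ki for yi, ki in zip(y, F(y))]

    def rk2_step(F, y, dt):
        """Second-order (midpoint) step, local error O(dt^3): two evaluations."""
        k1 = F(y)
        ym = [yi + 0.5 * dt * ki for yi, ki in zip(y, k1)]
        return [yi + dt * ki for yi, ki in zip(y, F(ym))]

    def rk4_step(F, y, dt):
        """Classical fourth-order Runge-Kutta step: four evaluations of F."""
        k1 = F(y)
        k2 = F([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
        k3 = F([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
        k4 = F([yi + dt * ki for yi, ki in zip(y, k3)])
        return [yi + dt / 6.0 * (a + 2*b + 2*c + d)
                for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

    # Dynamic integration: each model region is flagged with the cheapest
    # stepper that is accurate enough for its velocity field.
    STEPPER = {"constant_v": euler_step, "linear_v": rk2_step,
               "complex_v": rk4_step}

    def propagate_step(F, y, dt, region_flag):
        """Advance the system one timestep with the integrator assigned to
        the region currently being traversed."""
        return STEPPER[region_flag](F, y, dt)

    F = lambda y: [y[1], -y[0]]            # stand-in ODE (harmonic oscillator)
    print(propagate_step(F, [1.0, 0.0], 0.1, "complex_v"))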
CROSS becomes the most time-consuming subprocess, even for Model 1, which has only six planar interfaces. This suggests that SART efficiency will be significantly greater for models with no interfaces, or just a few.

Collecting previous cost function evaluations

Whenever more than one BVP is to be solved for a given experiment, it seems reasonable to take advantage of previously computed cost functions to improve the subsequent SA runs. This is particularly useful for source-receiver pairs sharing the same source point. At each cost function call, not only is the cost function associated with the source-receiver pair at hand evaluated, but also the cost functions associated with all remaining source-receiver pairs. In turn, the optimum solution for each pair is updated accordingly. When the next pair is considered, the computations already done for all the previous pairs are taken into account. This procedure requires fewer iterations to reach the global minimum than running SA for each pair independently. In the examples below, the maximum number of iterations for the last source-receiver pair is set, by default, equal to half the maximum number of iterations for the first source-receiver pair. For the other receivers, ITMAX varies linearly between these two values. Since the most time-consuming process is associated with the first portion of the raypath (from the source, x_s, to the emerging point x_e), for a moderate number of receivers with a common shot this strategy leads, in general, to a significant overall speedup, though the evaluation of the extra cost functions increases the computational cost of each individual run.

5.6.3 Comparison with other ray-tracing methods

In this section I compare SART, in terms of computational cost, with a simple grid-search algorithm and with a multistart linearizing procedure. The grid-search method consists of performing an exhaustive search on a regular grid that spans the take-off angle ranges, and evaluating the cost function at each point. The lowest value corresponding to each source-receiver pair is then selected, and the steepest-descent (SD) method is finally used to refine the solution. In this way, the global minimum can be located provided the discretization of the take-off angles is dense enough. This method will be called GRID. The multistart procedure, MULTI, simply applies the SD method starting from different initial take-off angles selected at random from the given search ranges. The optimum take-off angles are then chosen as those that produce the lowest cost function values. For the comparison, SART is run in two flavors. SART1 solves the optimization problem for each source-receiver pair independently.
SART2 collects information corresponding to previous cost function evaluations and uses it to improve the current solution, as explained above. In the experiments, both Model 1 and Model 2 shown in the previous sections are used. Several source-receiver pairs are defined in each case, and the optimum take-off angles are searched for using SART, GRID, and MULTI, with various values of ITMAX, grid densities, and numbers of restarting points, respectively. The comparison is done for 2-D and 3-D experiments. In the first case, only the take-off angle θ_s is searched for, while ξ_s is fixed and known a priori. Here, the acquisition geometry is that used in Figures 5.10b and 5.17b; that is, all geophones uniformly distributed along the right-hand-side borehole. In the 3-D case, both θ_s and ξ_s are searched for. Here, I use two acquisition geometries: (1) the one used in the 2-D case (i.e. aligned geophones along the right well), and (2) the one depicted in Figures 5.23a and 5.24a (i.e. geophones distributed across the surface plane). I ran SART1 and SART2 with various ITMAX and measured the total CPU time required to find all solutions³. I also measured the CPU time required by GRID for various Δθ_s (and Δξ_s), as well as the CPU time required by MULTI with different numbers of starting points (ITMAX_LIN = 5 by default). For statistical purposes, each run was repeated 5 times using different seeds, or slightly shifted grids when running GRID. In SART2, ITMAX was kept constant in Model 1 (3-D, case 2 only) and in Model 2 (all cases); otherwise, ITMAX decreased linearly to half the number of iterations of the first source-receiver pair, as explained above. The results are summarized in Table 5.9, for Model 1, and Table 5.10, for Model 2. The tables also show the percentage of rays that converged to the actual global minimum within a 1% tolerance.

³The CPU time was obtained by compiling the codes with the "-pg" option. This compiling option invokes a runtime recording mechanism that makes profiles by procedure.

Figure 5.25: Comparison of SART with other methods for both the fault model (solid line) and the salt-dome model (dashed line). (a) 2-D experiment; (b) 3-D experiment, case 1; (c) 3-D experiment, case 2. Data is shown in Tables 5.9 and 5.10. See text for details.

              2-D                  3-D, case 1          3-D, case 2
    Method    PAR     ray/s  1.0%  PAR     ray/s  1.0%  PAR    ray/s  1.0%
    SART1     100     22.90  94.4  100     16.58  82.4  100    17.86  76.8
              250     11.26  97.3  250      8.72  91.2  250    10.42  82.4
              500      5.35  99.1  500      4.96  91.8  500     5.81  91.2
              1000     2.80  99.7  1000     2.55  96.5  1000    2.94  98.4
    SART2     100     27.20  99.1  100     20.61  94.4  100    20.83  87.2
              250     12.83 100.0  250      9.58  96.2  250    10.05  90.4
              500      7.08 100.0  500      5.96  97.6  500     5.21  96.8
              1000     3.76 100.0  1000     3.24  99.4  1000    2.75  99.2
    GRID      1.0    343.43  94.4  0.9     20.73  95.6  1.7    20.11  88.0
              0.5    300.88  96.5  0.6     10.97  98.2  1.0     7.14  89.6
              0.2    241.13  99.4  0.4      4.89  98.7  0.4     4.55  94.4
              0.1    162.68 100.0  0.3      2.75  99.2  0.6     2.63  96.0
    MULTI     10     23.73   97.9  10     21.94   86.8  15    14.71   87.2
              20     11.14   99.1  25      8.72   92.9  25     9.26   94.4
              40      5.60  100.0  40      4.86   97.9  60     4.10   99.2
              80      2.80  100.0  70      2.98   98.5  100    2.63  100.0

Table 5.9: Model 1: comparison of SART with other methods in the fault model. PAR equals ITMAX, Δθ_s (and Δξ_s), and the number of starting points for SART, GRID, and MULTI, respectively.
These numbers are a measure of the effectiveness of the algorithms. The actual global-minimum raypaths for each source-receiver pair are shown in Figures 5.10b and 5.23a, for Model 1, and in Figures 5.17b and 5.24a, for Model 2. Figure 5.25 plots the data in the previous tables for each experiment. Note that in Model 1 (first acquisition geometry) there is a shadow zone for z > 67.0 m, approximately, so the total number of rays in these experiments was 5 × 68 = 340. The total number of rays for the second acquisition geometry was 5 × 25 = 125. In Model 2, these numbers were 5 × 41 = 205 and 5 × 25 = 125, respectively.

One obvious conclusion is that the number of rays traced per second decreases, and the effectiveness increases, with the number of iterations in SART, the density of the grid in GRID, and the number of starting points in MULTI. In Model 1, 2-D experiment, GRID outperforms both MULTI and SART by about one order of magnitude in terms of CPU time: GRID computes 200-300 rays per second to reach about 98-99% effectiveness, while SART2 computes only 20-30. When the model becomes more complex, as in the salt-dome case, SART2 attains the same degree of effectiveness and efficiency as GRID. Now the number of traced rays per second decreases to about 1 ray/s for 98-99% effectiveness. To obtain this effectiveness level, SART2 required an ITMAX of about 100 for Model 1 and 1000 for Model 2. For GRID, Δθ_s = 0.5 was enough in Model 1, but it had to be made as small as 0.0015 in Model 2 so that the global minima were not missed. On the other hand, both SART1 and MULTI are slower and take more iterations to reach a given effectiveness than SART2 and GRID (Figure 5.25a), especially in Model 2.

In the 3-D experiments, first acquisition geometry, things are different (see Figure 5.25b). In Model 1, GRID is still best, but SART2 attains almost the same efficiency and effectiveness. MULTI is not a bad option for the slowest runs (5 rays/s), and produces 98% effectiveness, like GRID and SART2, when it is restarted 40 times. SART requires an ITMAX of about 500, and GRID Δθ_s ≈ 0.4. In Model 2, SART2 requires 2000-3000 iterations to reach these levels of effectiveness (0.3 rays/s), while SART1 barely reaches 80% after 4000 iterations (0.15 rays/s), GRID reaches as low as 40% with Δθ_s = 0.1 (0.1 rays/s), and MULTI misses 9 out of 10 global minima at a low rate of 0.13 rays/s. In the case of MULTI and GRID, these numbers are unacceptable.

              2-D                   3-D, case 1           3-D, case 2
    Method    PAR     ray/s  1.0%   PAR     ray/s  1.0%   PAR    ray/s  1.0%
    SART1     300     2.12   45.4   1000    0.53   35.1   300    2.81   86.4
              750     0.93   66.3   2000    0.31   62.0   750    1.28   88.8
              1500    0.35   80.0   3000    0.18   74.6   1500   0.72   92.8
              3000    0.23   82.4   4000    0.14   77.8   4000   0.24   96.8
    SART2     300     1.72   97.1   1000    0.79   96.6   300    2.84   88.8
              750     1.08   98.3   2000    0.37   97.1   750    1.40   89.6
              1500    0.45   99.3   3000    0.26   98.5   1500   0.67   93.6
              3000    0.31  100.0   4000    0.19   99.0   4000   0.27  100.0
    GRID      0.0030  1.98   96.1   0.30    0.72   17.1   2.5    3.05   86.4
              0.0015  0.99   99.0   0.20    0.34   28.3   1.5    1.18   89.6
              0.0007  0.47   99.5   0.15    0.18   31.7   1.0    0.54   92.0
              0.0004  0.29  100.0   0.10    0.09   39.0   0.7    0.27   93.6
    MULTI     25      2.28   34.6   100     0.44    4.4   25     2.60   66.4
              50      1.13   57.6   150     0.31    7.3   60     1.26   79.2
              150     0.37   84.4   250     0.20    9.8   120    0.62   84.8
              200     0.25   88.4   350     0.13   12.7   300    0.24   92.0

Table 5.10: Model 2: comparison of SART with other methods in the salt-dome model.
For the second acquisition geometry in the 3-D experiment, the results show that almost all methods perform well in both models (see Figure 5.25c). In Model 1, all methods except GRID attain at least 98% effectiveness at rates of about 3 rays/s. In Model 2, only SART2 reaches these levels of effectiveness, at a rate of 0.3 rays/s; SART1 reaches 97%, while GRID and MULTI stay below 94%.

One can conclude that for relatively simple 2-D models, GRID is best. In simple 3-D models all methods work well, with SART1 the least effective. In complex situations SART is the best option, especially in the 3-D case, where GRID and MULTI become very inefficient because there are two degrees of freedom. This is not a major obstacle for SART: what makes SART less efficient in 3-D than in 2-D experiments is the complexity of the model (number of discontinuities, smoothness of the velocity fields, etc.), not the doubling of the number of degrees of freedom. For MULTI and GRID, both issues play a key role in terms of efficiency and effectiveness.

It is important to note that the number of rays sharing the same source must be considered in this quantitative comparison of methods. To compute a single ray, SART is much more efficient than GRID and MULTI for a given effectiveness level. Conversely, the larger the number of rays with the same source, the more efficient GRID becomes relative to SART and MULTI. Figure 5.26 and Table 5.11 illustrate this point. Here all computations were done so that the resulting effectiveness in finding the global minima within 1% accuracy was 98% (except in Model 1, 2-D case, where the effectiveness was 99%). In general, the CPU time increases linearly with the number of receivers, $N_r$, in both SART and MULTI.

Table 5.11: Comparison of SART with other methods for different numbers of receivers, $N_r$. The values indicate the CPU time (in seconds) required to find all global minima with 1% accuracy and 98% effectiveness (except in Model 1, 2-D case, where the resulting effectiveness was 99%). Values greater than 100 s were not computed. The last two rows show the ITMAX, $\Delta\theta_s$ (and $\Delta\xi_s$), and number of restarting points used to achieve the desired effectiveness level in SART2, GRID, and MULTI, respectively.

                        2-D                       3-D, case 1
Model    Nr     SART2    GRID   MULTI     SART2    GRID   MULTI
  1       1      0.08    0.13    0.13      0.42    5.12    0.25
         10      0.32    0.16    0.78      1.67    4.99    1.91
         25      0.79    0.18    2.01      4.75    5.60    5.40
         50      1.77    0.25    4.72      9.75    6.01    8.48
        100      3.83    0.28    7.86     22.22    7.06   16.71
        150      6.15    0.52   13.25     36.06    7.67   25.84
        200      8.98    0.64   16.67     56.49    9.89   34.02
        250     13.29    0.90   20.58     79.19   10.16   42.73
  2       1      0.58   27.77    4.24      3.35      -    39.87
         10      4.29   27.37   53.02     28.77      -       -
         25     10.56   28.16      -      66.62      -       -
         50     21.73   28.34      -         -       -       -
        100     44.04   30.03      -         -       -       -
        150     68.50   29.77      -         -       -       -
        200     92.52   31.01      -         -       -       -
        250        -    36.53      -         -       -       -
  1             100    0.200     20       750    0.600     40
  2             400    0.002    320      2300       -    2700

Figure 5.26: Comparison of SART with other methods for different numbers of receivers. (a) Fault model; (b) salt-dome model. Data are shown in Table 5.11.
By contrast, the CPU time in GRID is rather independent of the number of rays to be traced, as expected, because the cost function at all grid nodes needs to be evaluated only once, for any value of $N_r$. In SART, on the other hand, the SA run is repeated independently for each source-receiver pair (though SART2 collects information from different runs to improve subsequent solutions, as explained above). In simple 2-D models (Figure 5.26a, solid line), GRID is much more efficient than SART and MULTI in all cases, except for tracing a single ray. In complex 2-D models (Figure 5.26b, solid line), GRID is more efficient than SART and MULTI only for $N_r \gtrsim 25$. In 3-D, things are different: SART is now the most efficient method, even for tracing large numbers of rays in complex models (Figure 5.26b, dashed line). In simple 3-D models (Figure 5.26a, dashed line), GRID is superior to SART only when tracing more than 60-70 rays with the same source.

5.7 BVP as a constrained nonlinear optimization problem: a discussion

5.7.1 Straight-ray construction drawback

As mentioned in the previous chapter, the straight-ray construction may lead to unrealistic raypaths arriving at earlier times than any realistic path. Even though the total traveltime is a global minimum, it is possible in some cases that the resulting raypath does not satisfy the ray equations at all points. Consider a two-layer model where the velocity in the top layer is much lower than the velocity in the bottom one, and let source and receiver be located, for example, just above the discontinuity in the low-velocity layer, as shown in Figure 5.27. Clearly, the "unrealistic" critically refracted ray propagating through the high-velocity medium arrives earlier than the expected one if $v_2 \gg v_1$. This apparent difficulty is eliminated in the following reformulation.

Figure 5.27: Two-layer model ($v_2 > v_1$) where the traveltime of an "unrealistic" raypath is smaller than that of the true (realistic) optimum path.

5.7.2 Alternative formulations

The boundary-value ray-tracing problem can be viewed as a constrained global optimization problem. In the two-point ray-tracing case, the function to be globally minimized is the traveltime from $\mathbf{x}_s$ to $\mathbf{x}_e$, and the constraint is that $\mathbf{x}_e$ must coincide with $\mathbf{x}_r$ within a given tolerance. The set of parameters that satisfies the constraint is called the feasible region. Put mathematically,

minimize $T_{se} = \int_{\mathbf{x}_s}^{\mathbf{x}_e} \frac{1}{v(\mathbf{x})}\, ds$, subject to $d \le d_{tol}$,   (5.41)

where $d$ is the distance between $\mathbf{x}_e$ and $\mathbf{x}_r$, and $d_{tol}$ is a tolerance distance. Both distances are measured, for example, along the boundary plane. Since the traveltime usually represents an observation with a certain error, the optimization problem can be expressed in terms of a tolerance traveltime $T_{tol}$:

minimize $T_{se} = \int_{\mathbf{x}_s}^{\mathbf{x}_e} \frac{1}{v(\mathbf{x})}\, ds$, subject to $T_{er} = \int_{\mathbf{x}_e}^{\mathbf{x}_r} \frac{1}{v(\mathbf{x})}\, ds \le T_{tol}$.   (5.42)

The tolerance traveltime is related to the tolerance distance by

$d_{tol} \simeq \bar{v}\, T_{tol}$,   (5.43)

where $\bar{v}$ is some mean velocity around the receiver point.
Note that in the previous formulation the straight-ray construction is not explicitly taken into account, except for the fact that the integration involved in the constraint is done along a straight line segment connecting $\mathbf{x}_e$ and $\mathbf{x}_r$. Problem (5.42) is nonlinear in both the cost function and the constraint (except for some simple forms of the velocity field, for which the path is known), and in general it is also multimodal and nondifferentiable. This is a rather difficult optimization problem. In standard shooting methods, the optimization problem reduces to solving

$d = \|\mathbf{x}_e - \mathbf{x}_r\| \le d_{tol}$.   (5.44)

Although this formulation seems much simpler, it is necessary to find all minima of equation (5.44) for which $d \le d_{tol}$ in order to decide later which one corresponds to the global-minimum traveltime.

In SART I solve (5.42) using VFSA, taking special care to incorporate the constraint properly. For this purpose, I transform the constrained optimization problem (5.42) into an unconstrained one. This can be done in several ways. The most direct way of incorporating constraints into an SA optimization problem is to reject those proposed models that do not satisfy the constraint; that is, during the perturbation stage of the SA process, random perturbations are generated until the proposed configuration satisfies the constraint. Since $T_{tol}$ is usually very small, and so is the feasible region, this strategy may be extremely inefficient: almost no proposed configuration will satisfy the constraint, especially during the first stages of the process, when the temperature is relatively high. Alternatively, as in the case of tracing reflected waves, I can define

$\Phi(\theta_s,\xi_s) = \begin{cases} T_{se}, & T_{er} \le T_{tol},\\ T_{max}, & \text{otherwise}, \end{cases}$   (5.45)

where $T_{max} > T_{se}$ for all possible take-off angles. In this formulation, all proposed configurations are acceptable; the decision whether to accept or reject a trial configuration is made by the Metropolis criterion instead. Again, since $T_{tol}$ is usually very small, cost function (5.45) is very difficult to optimize: it is basically a flat surface with a set of "holes", each one corresponding to a beam of rays satisfying the constraint (multipathing). A better alternative is to define the cost function

$\Phi(\theta_s,\xi_s) = \begin{cases} T_{se}, & T_{er} \le T_{tol},\\ \rho\, T_{er}, & \text{otherwise}, \end{cases}$   (5.46)

where $\rho$ is a constant and $T_{se}$ is the traveltime after solving the IVP. I choose $\rho$ so as to guarantee that $\Phi[T_{er} \le T_{tol}] < \Phi[T_{er} > T_{tol}]$ for all $(\theta_s,\xi_s)$, which ensures that the global minimizer of $\Phi$ belongs to the feasible region. I set

$\rho \ge T_{max}/T_{tol}$,   (5.47)

where $T_{max}$ denotes the maximum traveltime over all possible IVP solutions. Notice that this formulation is not valid if $T_{tol}$ is zero. However, in real-world situations we rarely collect exact data, and a tolerance distance on the target position is allowed.

In formulation (5.46) the "holes" are still present, but the surface height decreases in the proximity of the feasible regions. The advantage of this formulation lies in the fact that, if the current solution is not in the feasible region, all effort is put into guiding the solution towards this region (the "holes"); on the other hand, when the current solution satisfies the constraint, all computational effort is put into minimizing the traveltime $T_{se}$, and the traveltime $T_{er}$ is no longer taken into account. Notice that the iteration is not stopped as soon as $T_{er} \le T_{tol}$.
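As an illustration, a minimal numerical sketch of cost function (5.46) follows. The IVP shooter `shoot` (returning the ray endpoint and its traveltime for given take-off angles) is a hypothetical stand-in for the ray tracer described earlier, and $T_{er}$ is approximated by $d/\bar v$, consistent with (5.43).

```python
import numpy as np

def cost_546(theta, xi, shoot, x_r, v_bar, T_tol, rho):
    """Hedged sketch of cost function (5.46).

    shoot(theta, xi) -> (x_e, T_se): hypothetical IVP ray tracer returning
    the ray endpoint and its traveltime for the given take-off angles.
    T_er is approximated by d / v_bar, the straight-segment traveltime
    between the endpoint x_e and the receiver x_r (cf. equation (5.43)).
    """
    x_e, T_se = shoot(theta, xi)
    d = np.linalg.norm(np.asarray(x_e) - np.asarray(x_r))
    T_er = d / v_bar
    if T_er <= T_tol:
        return T_se        # feasible: minimize the actual traveltime
    return rho * T_er      # infeasible: guide the ray towards the receiver

# rho chosen as in (5.47), e.g. rho = T_max / T_tol, so that the
# global minimizer is guaranteed to lie in the feasible region.
```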
Rather, the iteration continues within the feasible region so as to obtain the global minimum in case multipathing exists. SART is stopped only after a maximum number of iterations, or when the cost function has not improved after a number of consecutive steps.

Formulation (5.46) introduces a discontinuity at $d = d_{tol}$ of size at most

$\Delta\Phi = T_{max} - T_{opt}$,   (5.48)

where $T_{opt}$ is the global minimum and $d_{tol}$ is the tolerance distance defined in equation (5.43). This discontinuity does not represent a major obstacle, in general, for a stochastic optimization algorithm. But if $d_{tol}$ (or $T_{tol}$) is very small, the "diameter" of the feasible region becomes very small too (in the case $T_{tol} = 0$ the feasible region reduces to a point, and minimizing $\Phi$ is equivalent to minimizing the distance $d$). The optimization of such a cost function may become very difficult, even for VFSA. Alternatively, the cost function can be defined in terms of a conventional penalty factor:

$\Phi_p(\theta_s,\xi_s) = T_{se} + \rho_p\, T_{er}$,   (5.49)

where $\rho_p$, $\rho_p \ge 1$, is a constant that weights the trade-off between the traveltime $T_{se}$ and closeness to the feasible region. Now both terms of the sum are minimized simultaneously. In the first stages of SART, when $T_{er} > T_{tol}$, it is likely that some local minima of $T_{se}$ may delay convergence; but soon, if $\rho_p$ is selected properly, trial solutions start being feasible points. To ensure that the global minimizer of $\Phi_p$ is a feasible point, $\rho_p$ must be selected as in the previous formulation, equation (5.47). These values for $\rho$ and $\rho_p$ ensure that the global minima of both cost functions (5.46) and (5.49) correspond to realistic raypaths, since

$\Phi[T_{er} > T_{tol}] = \rho\, T_{er} \ge \frac{T_{max}}{T_{tol}}\, T_{er} > T_{max}$,   (5.50)

and

$\Phi_p[T_{er} > T_{tol}] = T_{se} + \rho_p\, T_{er} \ge T_{se} + \frac{T_{max}}{T_{tol}}\, T_{er} > T_{se} + T_{max} > T_{max}$,   (5.51)

where $T_{max}$ is the maximum traveltime $T_{se}$ may take over all possible take-off angles. Note that the larger the value of $\rho_p$, the harder the optimization problem, since for $\rho_p \gg 1$ cost function (5.49) is relatively insensitive to a change in $T_{se}$:

$\partial \Phi_p / \partial T_{er} = \rho_p$.   (5.52)

As a result, all feasible points become equally likely to be obtained, because

$\Phi_p(\theta_s,\xi_s) \simeq \rho_p\, T_{er}$.   (5.53)

This is equivalent to lowering the temperature too fast during the annealing process (quenching): a premature crystallization is obtained, and the solution depends on the initial model. On the other hand, if $\rho_p$ is too small, the chances of obtaining an unrealistic solution increase, as discussed in the first paragraphs of this section. There is a trade-off, then, between global convergence and feasibility of the solution.

As an alternative penalty formulation, one can think of a single SA run where $\rho_p = T_{max}/T_{tol}$ at the initial annealing temperature $T = T_0$; then, as the temperature decreases, $\rho_p$ decreases following a predefined schedule. Apparently, this scheme would do the job in a single SA run: at high temperatures the solution is guided towards the feasible region, and when the temperature is low, the minimization concentrates on finding the minimum traveltime $T_{se}$. However, the cost function changes during the process, and global convergence can no longer be guaranteed: nothing prevents the system from being trapped in a local minimum during the high-$\rho_p$ stages of the process, and when $T$ is low it may be too late to escape. Besides, the predefined schedule for decreasing $\rho_p$ must be tuned very carefully for each particular problem, which goes against the philosophy of SART as a simple, general algorithm for solving the two-point ray-tracing problem.
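For comparison with the sketch of (5.46) above, the penalty form (5.49) is essentially a one-liner; the same hypothetical `shoot` IVP tracer and straight-segment approximation of $T_{er}$ are assumed.

```python
import numpy as np

def cost_549(theta, xi, shoot, x_r, v_bar, rho_p=1.0):
    """Hedged sketch of the penalty cost function (5.49):
    Phi_p = T_se + rho_p * T_er, with T_er approximated as d / v_bar.
    rho_p = 1 is the SART default; rho_p >= T_max/T_tol guarantees that
    the global minimizer is a feasible (realistic) raypath.
    """
    x_e, T_se = shoot(theta, xi)
    d = np.linalg.norm(np.asarray(x_e) - np.asarray(x_r))
    T_er = d / v_bar
    return T_se + rho_p * T_er
```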
After testing SART in several numerical experiments, I have found that $\rho_p = 1$ is enough to guarantee global convergence to a feasible point in most cases. (Indeed, equation (5.49) with $\rho_p = 1$ is the formulation used in all the examples of Chapter 4.) This is an advantage, since it turns out that although cost function $\Phi_p$ with $\rho_p \gg 1$ tends to guarantee global convergence to a realistic raypath, its optimization takes more iterations, in general, than the $\rho_p = 1$ case, especially when $T_{tol}$ is very small. The same can be said of minimizing cost function $\Phi$, where $\rho$ should not, in general, be smaller than $T_{max}/T_{tol}$. This issue is demonstrated below.

Take Figure 5.28 as an example. Cost functions $\Phi$ and $\Phi_p$, which correspond to the 2-D fault model, were computed using $T_{max} = \max\{T_{se}(\theta_s)\} \simeq 35.0$ ms, $T_{tol} = 1.0$ ms, and $\rho = \rho_p = 35.0$.

Figure 5.28: Model 1: alternative cost functions for the fault model. (a) $\Phi$ and (b) $\Phi_p$, as functions of the take-off angle $\theta_s$, with $\rho = \rho_p = 35.0$. These curves are to be compared with the cost functions plotted in Figure 5.9, i.e. $d$ and $T = T_{se} + T_{er}$.

As a result of the selected values, the absolute minima of both $\Phi$ and $\Phi_p$ correspond to a ray satisfying the ray equations from source to receiver. The absolute minimum of $\Phi_p$ belongs to the feasible region, in general with $d = 0$; the solution coincides with that in Figure 5.9b: $\theta_s = 125.90°$, $\Phi_p = 34.714$ ms. Cost function $\Phi$ also has a unique global minimum, but the values are slightly different: $\theta_s = 124.61°$, $\Phi = 34.402$ ms. Now the search for the minimum within the feasible region is not biased towards $d = 0$; consequently, the global-minimum raypath does not necessarily arrive exactly at the receiver, but within a tolerance distance $d_{tol}$ as defined in equation (5.43). In the above example it is not surprising that the global minimum of cost function $\Phi$ corresponds to a distance $d$ of about 3 m, since $d_{tol} = 3$ m for the given receiver location and tolerance traveltime ($\bar{v} = 3$ km/s in the near-receiver region).

In order to determine the best option, in terms of the number of iterations needed to find the global minimum, I ran SART using several cost functions for the 3-D fault model example, taking into account both take-off angles. It is assumed that the global minima of all cost functions correspond to a feasible point, and $T_{max} = 35.0$ ms. Table 5.12 summarizes the mean and standard deviation of the number of iterations required to obtain the global minimum within a 1% accuracy, after 200 realizations in each case. Note that the linearizing stage to refine the solution was not applied in this case.

Table 5.12: Mean and standard deviation of the number of iterations, $k$, required to obtain the global minimum within 1% accuracy for various cost functions, and mean and standard deviation of the traveltime $T_{er}$.

        $\Phi$, equation (5.46)                    $\Phi_p$, equation (5.49)
T_tol      k̄       σ_k     T̄_er   σ_Ter     ρ_p      k̄       σ_k     T̄_er   σ_Ter
1.00     339.6    360.6   0.630   0.259      1     239.5    129.0   0.286   0.113
0.75     461.1    575.0   0.464   0.190      2     440.4    415.1   0.119   0.047
0.50     673.5    782.4   0.322   0.124      3     706.4    831.4   0.080   0.029
0.10    1208.2   1034.9   0.067   0.023      5     887.9   1125.2   0.048   0.016
0.03    1663.5   1421.7   0.017   0.006     10    1333.9   1304.0   0.025   0.008
0.01    2252.2   1819.2   0.007   0.002     35    2345.4   1944.6   0.007   0.002
In the case of cost function $\Phi$, $T_{tol}$ was varied from 0.01 ms to 1.00 ms, corresponding to $\rho$ in the range 35 to 3500. As $T_{tol}$ decreases, the mean number of iterations increases exponentially, as illustrated in Figure 5.29a (solid line); on the contrary, $\bar T_{er}$ decreases linearly with $T_{tol}$. This implies that obtaining a very accurate solution ($T_{er}$ small) requires a large number of iterations. Conversely, if $T_{tol}$ is set too large, the emerging point may be too far from the receiver point, no matter how many iterations are performed. Cost function $\Phi_p$ behaves differently (see Figure 5.29b); these values were obtained for $T_{tol} = 1.0$ ms. Despite $T_{tol}$ being fixed at 1.0 ms, the minimization of $\Phi_p$ leads to very accurate solutions in fewer iterations, especially for $\rho_p$ close to 1. The intersection point between the curves in Figure 5.29 is the best option for maximum accuracy and greatest efficiency.

It is worth mentioning that the refinement stage, which was not applied in constructing Table 5.12 and Figure 5.29, would have reduced $T_{er}$ to virtually zero in both examples. However, in the case of cost function $\Phi$ there may be ambiguities when dealing with very similar solutions unless $T_{tol}$ is set very small: if $T_{tol}$ is too large, two or more solutions may lie within a single "hole". Cost function $\Phi_p$ does not present this difficulty, but it may require a large $\rho_p$ to guarantee global convergence to a feasible point in some complex media. In SART, $\rho_p = 1$ is used by default. In particular models, such as that shown in Figure 5.27, $\rho_p$ can be increased in successive SA runs up to an upper bound of $\rho_p = T_{max}/T_{tol}$: whenever SART converges to an unrealistic raypath, $\rho_p$ may be increased by a given factor (say 2.0) until the global minimum corresponds to a realistic path. This process should be repeated only a few times (two or three), because repeated failure to converge to a realistic path may indicate the presence of a shadow zone rather than $\rho_p$ being too small.

Figure 5.29: Mean number of iterations and traveltime $T_{er}$ for various cost functions. Data are shown in Table 5.12. (a) Cost function $\Phi$, as a function of $T_{tol}$ (ms); (b) cost function $\Phi_p$, as a function of $\rho_p$. In both cases, $T_{max} = 35.0$ ms.
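The escalation strategy just described is simple enough to state in a few lines. The sketch below assumes a hypothetical `run_sart(rho_p)` that performs one full SA run and reports whether the resulting raypath is realistic (i.e., satisfies the ray equations end to end); neither the name nor the interface comes from the thesis.

```python
def sart_with_penalty_escalation(run_sart, T_max, T_tol,
                                 factor=2.0, max_restarts=3):
    """Hedged sketch: restart SART with an increased penalty weight
    whenever the optimum raypath is unrealistic, as described above.
    run_sart(rho_p) -> (raypath, realistic): hypothetical single SA run.
    """
    rho_p = 1.0                      # SART default
    rho_max = T_max / T_tol          # upper bound, equation (5.47)
    for _ in range(max_restarts):
        raypath, realistic = run_sart(rho_p)
        if realistic:
            return raypath
        rho_p = min(factor * rho_p, rho_max)
    # repeated failure suggests a shadow zone rather than a small rho_p
    return None
```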
5.8 Conclusions

In this chapter I have developed a 3-D extension of the ray-tracing algorithm presented in Chapter 4. SART is a very versatile computational algorithm for solving the boundary-value ray-tracing problem in a general 2-D/3-D model. SART aims to find the raypath connecting any source-receiver pair such that the associated traveltime is a global minimum. The solution is found by iteratively solving a highly nonlinear optimization problem by means of VFSA, and the results are independent of the initial guess, since VFSA is a global stochastic optimization algorithm. In contrast to SART in 2-D, which is based on a cell ray-tracing scheme, the 3-D extension of SART numerically solves the equations derived from high-frequency ray theory. This new formulation provides greater accuracy in the calculations and greater flexibility for dealing with complex models. For this purpose, I have also developed a versatile model parameterization scheme that can represent a wide variety of 3-D (or 2-D) geologic structures: any number of regions delimited by arbitrary interfaces can be defined, and the velocity (or slowness) within each region is specified separately, which allows any type of wave to be modeled, including converted waves.

As in the 2-D case, the 3-D extension of SART exhibits some improvements over existing ray-tracing techniques such as bending and shooting. The problem of local-minimum paths is eliminated, provided a sufficient number of iterations is performed. Although SART solves an IVP at each iteration, the selection of appropriate take-off angles does not represent a serious difficulty, even for ill-behaved receiver-distance functions. This issue is particularly important for tracing complex raypaths where more than two unknown parameters are to be found, for example when tracing headwaves.

It is worth mentioning that SART's goal is to find a single raypath corresponding to the global-minimum traveltime. Multiple arrivals can be obtained only if separate search ranges for the unknown parameters are specified. The optimum raypath obtained by SART corresponds to a particular solution of the IVP; consequently, first-arrival raypaths traveling across shadow zones, and arbitrary headwaves, cannot in general be found. It has been pointed out that these waves are not the most energetic events, because they fill space fastest and tend to have maximum geometric spreading, especially in models with strong lateral velocity variations [GB93]; hence the use of these arrivals (usually obtained by finite-difference methods) for geotomographic applications is limited or presents some difficulties. SART arrivals, on the contrary, correspond to the most energetic events (direct waves), since they are obtained by solving the IVP. It is emphasized that the physical nature of the resulting ray is known a priori and a posteriori, a point that is very important in phase identification. Later arrivals, such as reflections and headwaves, can be obtained by specifying a ray signature in the form of constraints; this is done by penalizing rays that do not follow the desired signature. The methodology can be extended to deal with more complex raypaths (multiples, higher-order headwaves, diffractions, etc.), including ray conversions of any type.

The accuracy of the results, unlike in bending, does not depend on the parameterization of the raypath, but on the selected integration method and timestep. Higher-order accuracy is obtained by means of 4th-order RK integration, for example, at no significant extra computational cost. The sources of error are then limited to the accuracy used in calculating the intersections between raypaths and interfaces, and to the distance between the ray ending point and the receiver. The errors associated with the intersection points can be made very small simply by increasing the accuracy of the iterative root-finder. The other source of error is eliminated by polishing the final raypath after the last annealing iteration; this is carried out with a standard linearizing optimization method (e.g., steepest descent) that reduces the receiver distance to virtually zero, within machine precision.

In terms of computational cost, SART is expensive compared to conventional boundary-value ray-tracing systems, because several iterations are required to ensure global convergence. Conventional shooting and bending methods usually converge (when they do not diverge) in a much smaller number of iterations, but their convergence is only local.
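Regarding the 4th-order RK integration mentioned above, the sketch below advances one RK4 step of the kinematic ray equations written in a standard Hamiltonian form. The specific form assumed here, $d\mathbf{x}/d\sigma = \mathbf{p}$, $d\mathbf{p}/d\sigma = \tfrac{1}{2}\nabla u^2$ with slowness $u = 1/v$, is a common textbook choice; the thesis's actual system and integration variable may differ.

```python
import numpy as np

def ray_rhs(y, grad_u2):
    """Right-hand side of the kinematic ray equations in an assumed
    Hamiltonian form: dx/dsigma = p, dp/dsigma = grad(u^2)/2, with
    u = 1/v the slowness. y = (x, p) stacked as a 6-vector."""
    x, p = y[:3], y[3:]
    return np.concatenate([p, 0.5 * grad_u2(x)])

def rk4_step(y, h, grad_u2):
    """Classical 4th-order Runge-Kutta step for the ray ODE system."""
    k1 = ray_rhs(y, grad_u2)
    k2 = ray_rhs(y + 0.5 * h * k1, grad_u2)
    k3 = ray_rhs(y + 0.5 * h * k2, grad_u2)
    k4 = ray_rhs(y + h * k3, grad_u2)
    return y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```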
For global convergence, however, it has been demonstrated that SART is much faster than shooting variants such as grid search and multistart when dealing with complex 2-D/3-D media. One disadvantage of SART is that raypaths connecting source-receiver pairs are found one at a time, which makes SART impractical for traveltime tomography when the number of sources and receivers is large. Nevertheless, when there are few source-receiver pairs, the geology is complex, and direct first arrivals are required, SART is a valuable tool; in these cases, conventional shooting, bending, and finite-difference methods present serious difficulties that SART overcomes naturally.

Finally, it is worth noting that SART may be readily parallelized: each source-receiver pair may be treated separately by an individual processor. Since the forward problem is by far the most time-consuming stage in SART, it may even be possible to obtain superlinear speedups, meaning that the speed per processor increases when processors are added.

Chapter 6

Traveltime tomography

Some circumstantial evidence is very strong, as when you find a trout in the milk.
Henry David Thoreau

6.1 Introduction

Traveltime tomography for imaging subsurface velocity variations is a very important technique in global seismology and exploration studies. The method is used to obtain the velocity distribution from measured time values (the inverse problem). The basic idea can be traced back to as early as 1917, when a method was proposed to locate ore bodies by combining traveltimes from a number of sources and receivers in various boreholes [Fes17]. Though this method requires a fairly simple medium, locating the ore bodies roughly by hand interpretation and elementary geometry, it points forward to the present use of crosswell tomography.

In its simplest form, traveltime tomography uses straight rays connecting sources and receivers; the problem then reduces to solving a linear system of algebraic equations. Aside from some issues regarding ill-conditioning and/or sparseness of the derived model matrix, the traveltime tomography problem with straight rays can be solved without major difficulty. But in highly refractive media the straight-ray approximation becomes invalid, and the resulting inverted model is inaccurate.

Curved-ray traveltime tomography, on the other hand, provides a more accurate approximation to the actual physical phenomenon. The technique was originally developed by [BPLT72] for estimating the velocity distribution between two wells. As opposed to straight-ray traveltime tomography, curved-ray traveltime tomography is a highly nonlinear problem, because the arrival times are nonlinearly related to the unknown slowness field: not only the velocity distribution is unknown, but also the raypaths. Usually the problem is solved with linearizing techniques in an iterative fashion. In general, a good starting model is required, and some form of regularization or model constraints must be introduced in the objective function to stabilize the solution and to minimize the artifacts generated by the inversion (see for example [Nol87, BBC89, SDG90, Ald92]). This procedure is also intended to introduce a priori information about the subsurface slowness field, such as flatness, closeness to a prescribed base model, etc.
An alternative is to introduce such constraints in the model itself through a suitable parameterization, an essential stage in any inverse problem. Rather than using box-like cells, the nonlinear tomographic inversion technique described in this chapter uses either bicubic B-splines with adaptive node spacing, or parametric 2-D functions representing velocity anomalies. The motivation for using bicubic B-splines is twofold: (1) smoothing is obtained beforehand, without the need to regularize in the standard fashion; and (2) the number of unknown parameters is reduced substantially, making the problem tractable with a computationally intensive algorithm like simulated annealing (SA). Although bicubic B-splines are inherently smooth, relatively sharp velocity changes can be obtained by locating a few nodes at key points [Sha93]. I use VFSA to optimize both the velocity values at the spline nodes and the locations of the nodes themselves, so the grid is adapted dynamically along with the node values. This strategy allows one to model moderately complex structures using only a few nodes (3-5 in each dimension) [VU95b]. In a previous work [Mic95], an adaptive-grid formalism using bicubic B-splines for traveltime tomography was also employed; there, however, node coordinates are updated using a local linearizing method rather than a global optimization one.

In other tomographic applications, such as archaeology, mining, and other near-surface studies, one is interested in locating one or several velocity anomalies embedded in a smooth background velocity field. Since it is very difficult to decide a priori whether the underlying structure corresponds to a smooth or to a high-contrast velocity distribution, a parameterization that does not favor either of the two forms is desirable. This is the motivation for using parametric 2-D functions as an alternative for reducing the nonuniqueness of the inverse problem at hand. This limits the number of unknowns significantly and virtually eliminates the artifacts and instabilities associated with the tomographic problem, while providing sufficient flexibility to model complex structures with a few parameters. The parameterization adopted in this chapter can be complemented by incorporating a linear trend in the velocity field, whose coefficients represent extra parameters in the traveltime inversion. Velocity constraints can easily be included by imposing range limits on the various parameters involved. Prior information about the subsurface structure can also be used to guide the spline node locations and the anomaly size and position within the model boundaries. This a priori information is very useful for reducing the number of forward-modeling computations required for convergence, but it is not essential for the success of the nonlinear optimization scheme.

The two forms of parameterization just described introduce strong nonlinearities into the optimization problem. Moreover, the resulting objective function (the misfit between observed and calculated arrival times) is multimodal and rather ill-behaved. Unless the initial model is close to the global minimum, the solution is likely to be trapped in a local minimum if a local linearizing method is used. This is why I use VFSA, at the expense of a higher computational cost. SA has already been applied to traveltime tomography without ray tracing [AV90, AV93, PL93].
But the parameterization used in those works is based on a standard grid with rectangular cells. Though spatial smoothing is introduced at the perturbation stage to reduce the unrealistic artifacts associated with the inversion, the limitations of a box-like parameterization remain.

This chapter is organized as follows. First, I describe the traveltime inversion problem in a general sense, together with the two forms of parameterization that will be used to model the velocity field. Second, I present traveltime inversion as a nonlinear optimization problem, to be solved by means of VFSA. Finally, the method is illustrated with various synthetic examples comprising crosswell and well-to-surface 2-D experiments, and the results of the nonlinear scheme are compared to a linearized inversion based on Vidale's approach [Ald92]. The two forms of parameterization may also be used, combined or alone, in conjunction with other geophysical techniques, such as magnetic or geoelectric methods, provided the forward problem can be solved at each iteration of the SA algorithm for the given model representation. The purpose of the proposed techniques is not to replace any preexisting inversion procedure for general seismic exploration studies, but to provide an alternative methodology for dealing with particular models that can be represented by a few parameters. The method is especially suited for imaging smooth (regional) models and/or near-surface structures such as archaeological and mining prospects.

6.2 General theory

Under the high-frequency approximation of wave theory, the traveltime of a seismic wave propagating through a slowness field $s(x,z)$ is given by the path integral (see for example [Nol87]):

$T(s) = \int_{L(s)} s(x,z)\, dl$,   (6.1)

where $L(s)$ is the raypath connecting source and receiver, which satisfies Fermat's principle.
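For a straight ray, integral (6.1) reduces to a 1-D quadrature along the source-receiver segment. A minimal sketch, assuming a callable slowness field `s(x, z)`, follows.

```python
import numpy as np

def straight_ray_traveltime(s, src, rec, n=200):
    """Hedged sketch of equation (6.1) for a straight raypath:
    T = integral of s(x, z) dl along the segment from src to rec,
    approximated here with the midpoint rule on n sub-segments."""
    src, rec = np.asarray(src, float), np.asarray(rec, float)
    dl = np.linalg.norm(rec - src) / n
    t = (np.arange(n) + 0.5) / n                     # midpoints in [0, 1]
    pts = src[None, :] + t[:, None] * (rec - src)[None, :]
    return dl * sum(s(x, z) for x, z in pts)
```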
In the case of straight-ray tomography and if a sufficiently good ray coverage is available, equation (6.3) can be solved directly, despite the fact that matrix A is generally nonsquare and large. Unfortunately, the rays are usually not straight nor is the coverage of paths optimal. Moreover, the observed traveltimes are contaminated by random noise. As a result, matrix A ( s ) is, in addition to large and nonsquare, sparse and rank deficient, and the inverse problem becomes ill-posed. To overcome this difficulty, the inversion procedure must introduce a criterion so that elements in the model space can be ranked with respect to each other (regularization, damping, etc). This is to overcome the nonuniqueness problem, for with the only constraint of AT being small, there exist an infinite number of models that would reproduce the data (this is the fundamental difficulty encountered in any underdetermined inverse problem). In general, the problem is solved by linearizing the objective function with respect to the model unknowns, and being extremely careful about which criterion has been selected, based on the information available, to make the inverse problem well-posed. For a complete description of methods to solve this linearized inverse problem, the reader is referred to for example [Nol87]. 6.3 M o d e l r e p r e s e n t a t i o n I adopted two contrasting parameterization schemes to represent the velocity field, v(x, z): 1. bicubic B-splines with adaptive node spacing; and 2. parametric 2-D functions. These are intended to offer an alternative to box-like parameterization and to reduce the number of parameters to a minimum. Yet, they allow a certain flexibility to accommo-date complex structures. The first approach favors smooth velocity fields. The spline coefficients are determined \"globally\" so that the interpolating function is continuous Chapter 6. Traveltime tomography 184 everywhere up to the second derivative. The second approach, which was devised to rep-resent velocity anomalies (e.g. buried structures), does not favor neither smooth models nor models with sharp discontinuities, though these characteristics could be forced easily by setting the appropriate bounding ranges for some of its parameters. 6.3.1 B i cub ic B-splines Cubic and bicubic B-splines parameterization in tomography has been effectively ap-plied by other authors [Fir87, MM91, Mic95]. In these cases, solutions are found by linearizing the objective function and solving iteratively. Chunduru et al [CSS94] used bicubic B-spline parameterization for 2-D resistivity inversion using V F S A . Their results are encouraging because they effectively inverted areas of complex geology at a low com-putational cost, even though a nonlinear optimization method was used. The bicubic spline parameterization method adopted here is based on that given in [PTVF92]. The two-dimensional rectilinear grid consists of Mx x Mz nodes. To accommodate complex structures, the spacing in each dimension is not fixed. Rather, node coordinates are given by x^, i = 1, • • • , Mx, and Zj, j = 1, • • • , Mz, with x^ < Xj and Z{ < Zj if i < j. For simplicity, the first and last nodes of each row (and column) are located on the model boundaries, i.e. { X\\ — Xmini XMX = xmax (6.5) Figure 6.1 illustrates a typical grid for Mx = Mz = 4. 
6.3 Model representation

I adopted two contrasting parameterization schemes to represent the velocity field $v(x,z)$:

1. bicubic B-splines with adaptive node spacing; and
2. parametric 2-D functions.

These are intended to offer an alternative to box-like parameterizations and to reduce the number of parameters to a minimum, while retaining enough flexibility to accommodate complex structures. The first approach favors smooth velocity fields: the spline coefficients are determined "globally", so that the interpolating function is continuous everywhere up to the second derivative. The second approach, devised to represent velocity anomalies (e.g., buried structures), favors neither smooth models nor models with sharp discontinuities, though either characteristic can easily be forced by setting appropriate bounding ranges for some of its parameters.

6.3.1 Bicubic B-splines

Cubic and bicubic B-spline parameterizations have been applied effectively in tomography by other authors [Fir87, MM91, Mic95]; in those cases, solutions are found by linearizing the objective function and solving iteratively. Chunduru et al. [CSS94] used a bicubic B-spline parameterization for 2-D resistivity inversion using VFSA. Their results are encouraging, because they effectively inverted areas of complex geology at a low computational cost, even though a nonlinear optimization method was used.

The bicubic spline parameterization adopted here is based on that given in [PTVF92]. The two-dimensional rectilinear grid consists of $M_x \times M_z$ nodes. To accommodate complex structures, the spacing in each dimension is not fixed; rather, node coordinates are given by $x_i$, $i = 1,\dots,M_x$, and $z_j$, $j = 1,\dots,M_z$, with $x_i < x_j$ and $z_i < z_j$ if $i < j$. For simplicity, the first and last nodes of each row (and column) are located on the model boundaries, i.e.

$x_1 = x_{min}, \quad x_{M_x} = x_{max}, \qquad z_1 = z_{min}, \quad z_{M_z} = z_{max}$.   (6.5)

Figure 6.1 illustrates a typical grid for $M_x = M_z = 4$.

Figure 6.1: Example of a two-dimensional grid used for the bicubic spline parameterization. The node spacing in each dimension is not fixed.

To perform a bicubic interpolation, the user must specify the function $v(x,z)$ at each node, the derivatives with respect to the coordinates, $\partial v/\partial x$ and $\partial v/\partial z$, and the corresponding second-order cross derivative, $\partial^2 v/\partial x \partial z$. The goal of bicubic spline interpolation, a special case of bicubic interpolation, is to obtain an interpolating formula that is smooth in the first derivative and continuous in the second derivative. This is achieved by setting the first derivatives equal on either side of each node; second derivatives can then be readily computed, provided additional conditions at the boundaries are given. Natural splines are obtained if the second derivatives are set to zero at the boundaries; alternatively, they may be set to values that make the first derivatives zero at the boundaries, which is the approach adopted here. Having set all these conditions, a unique set of spline coefficients can be calculated, and the resulting bicubic interpolating function can be written

$v_s(x,z) = \sum_{i=1}^{4}\sum_{j=1}^{4} c_{ij}\, t^{i-1} u^{j-1}$,   (6.6)

where the $c_{ij}$ are the spline coefficients and $t$ and $u$ are the normalized coordinates within the grid cell containing $(x,z)$ [PTVF92].
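A minimal sketch of evaluating such a smooth velocity field on nonuniformly spaced nodes is given below. It leans on SciPy's RectBivariateSpline (a B-spline surface on a rectilinear grid) rather than the Numerical-Recipes routines the thesis builds on, so the boundary-derivative conventions differ slightly; all numbers are illustrative.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# node coordinates (nonuniform; first/last on the model boundaries, eq. 6.5)
xn = np.array([0.0, 30.0, 55.0, 100.0])
zn = np.array([0.0, 25.0, 60.0, 100.0])
vn = 2.0 + 0.1 * np.arange(16.0).reshape(4, 4)   # node velocities (km/s)

spline = RectBivariateSpline(xn, zn, vn, kx=3, ky=3)

# sample the field on the regular FD grid used later for forward modeling
xg, zg = np.linspace(0, 100, 101), np.linspace(0, 100, 101)
v_grid = spline(xg, zg)                          # shape (101, 101)
```

In a VFSA iteration, both the interior node coordinates and the node values would be perturbed (within their bounds) before resampling the field.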
6.3.2 Parametric 2-D functions

In one dimension, the velocity anomaly is modeled as a flat-topped function of offset: $v_a(x) = A$ for $|x - x_c| \le R$, and $v_a(x) = f(|x - x_c| - R)$ otherwise [equations (6.10) and (6.11)], where $x_c$ is the anomaly center, $A$ its amplitude, and $R$ its half-width. The flank function $f(x)$ has its maximum, $A$, at $x = 0$, and tends to zero asymptotically as $x \to \pm\infty$. The factor $\rho$, a dimensionless constant, is defined in such a way that $\rho R$ is the distance from the origin at which the maximum of $f(x)$ decreases by a factor of two; that is,

$f(|\rho R|) = f(0)/2 = A/2$.   (6.12)

This parameter is a measure of the width of the function $f(x)$, and as a result it controls the slopes of the function $v_a(x)$. Figure 6.2 shows a series of 1-D anomalies with unit amplitude and width, centered at $x_c = 0$. Notice how the slopes of $v_a(x)$ vary with different values of $\rho$: for $\rho \to 0$, $f(x) \to 0$, and the anomaly reduces to a boxcar; for $\rho \to \infty$, $f(x) \to A$, and the "anomaly" is flat for all $x$.

Figure 6.2: One-dimensional velocity anomaly function for various slopes ($\rho = 0, 1.0, 2.0, 4.0$). Both amplitude and width are equal to one.

It is important to remark that equation (6.10) is continuous and differentiable everywhere, even at $x = x_c \pm R$, since $f(0) = A$ and $f'(0) = 0$. Equation (6.10) thus allows us to model either smooth or high-contrast 1-D anomalies using only 4 parameters, namely $x_c$, $A$, $R$, and $\rho$.

In 2-D, one can use equation (6.10) to construct anomaly functions with similar features. One possibility is to consider a surface of revolution about the axis normal to the $xz$-plane passing through the center point $(x_c, z_c)$:

$v_a(r) = \begin{cases} A, & r \le R,\\ f(r - R), & r > R, \end{cases}$   (6.13)

where $r$ is the distance from $(x,z)$ to the center point (6.14). The flat region is delimited by the curve

$\frac{|x - x_c|^p}{R_x^p} + \frac{|z - z_c|^p}{R_z^p} = 1$.   (6.15)

Here I have introduced a new parameter, $p$, that helps control the shape of the flat-region limits. Notice that for $p = 2$, equation (6.15) becomes an ellipse centered at $(x_c, z_c)$ with semiaxes $R_x$ and $R_z$; if, in addition, $R_x = R_z$, it becomes a circle. In practice, $R$ is calculated by finding the intersection point $(x_R, z_R)$ between the straight segment connecting $(x_c, z_c)$ and $(x, z)$ and the curve (6.15), where $(x,z)$ is a generic point in the $xz$-plane. After some algebraic manipulation,

$x_R = x_c + \frac{x - x_c}{\gamma}, \qquad z_R = z_c + \frac{z - z_c}{\gamma}$,   (6.16)

where

$\gamma = \left( \frac{|x - x_c|^p}{R_x^p} + \frac{|z - z_c|^p}{R_z^p} \right)^{1/p}$.   (6.17)

Since $R(x,z) = r(x_R, z_R)$, combining equations (6.14) and (6.16) yields

$R = \frac{r}{\gamma}$.   (6.18)

Figure 6.3: Two-dimensional velocity anomaly function (flat region only) for various values of $p$ [equation (6.15)]. In all cases $(x_c, z_c) = (0,0)$, $R_x = 1.0$, and $R_z = 0.5$. Notice how the anomaly takes on different shapes (diamond, ellipse, rectangle) as $p$ varies.

To define the slopes of the anomaly, it is useful to keep, for values of $r$ greater than $R$, the same shape as used in the 1-D case. This is obtained by replacing $x$ by $r$ in equation (6.11), which yields the flank $f(r)$ of equation (6.19). Finally, the 2-D anomaly function is defined as in equation (6.13), except that $R$ is no longer a constant but is computed using equation (6.18), and $f(r)$ is computed using equation (6.19). As in the 1-D case, $v_a(r)$ is continuous up to the first derivative, except for $p < 1$.

The flexibility of this model representation, and the meaning of the parameters required to define the 2-D anomaly, are illustrated in Figure 6.3 for various $p$. The shape of the anomaly can be changed easily by setting the appropriate parameters for the flat region, along with the factor $\rho$. For example, values of $p$ smaller than or equal to one can also be selected, in which case the flat region takes on a diamond-like shape. If $p \to \infty$, the flat region becomes a rectangle of size $2R_x \times 2R_z$. In effect, from equation (6.15),

$z = z_c \pm R_z \left( 1 - \frac{|x - x_c|^p}{R_x^p} \right)^{1/p}, \qquad |x - x_c| \le R_x$.   (6.20)

Taking the limit,

$\lim_{p\to\infty} z = \begin{cases} z_c \pm R_z, & |x - x_c| < R_x,\\ z_c, & |x - x_c| = R_x. \end{cases}$   (6.21)

In addition, the whole anomaly can be rotated by an angle $\omega$ about the axis normal to the $xz$-plane passing through its center. This is done using the transformation

$x' = x_c + (x - x_c)\cos\omega - (z - z_c)\sin\omega, \qquad z' = z_c + (x - x_c)\sin\omega + (z - z_c)\cos\omega$,   (6.22)

where $x'$ and $z'$ are the new coordinates in the rotated frame.

In summary, 2-D anomalies are represented by parametric functions defined by 8 parameters, namely $A$, $x_c$, $z_c$, $R_x$, $R_z$, $p$, $\rho$, and $\omega$. To reduce the number of parameters, it is possible to fix $p$ (e.g., $p = 2$) while still retaining a quite flexible model representation; in that case, the number of parameters to be found at the inversion stage reduces to 7. Finally, as in the case of bicubic splines, a linear background velocity can be added, as given in equation (6.8). As a result, the total number of parameters is $K = 11$, that is, the number of anomaly parameters plus the number of background coefficients, and the model space is given by

$\mathbf{m} = \{A, x_c, z_c, R_x, R_z, p, \rho, \omega, v_0, g_x, g_z\}$,   (6.23)

a vector of 11 elements.
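A sketch of the 2-D anomaly evaluation follows. The flank function $f$ is not fully recoverable from the text, so a Gaussian-type flank satisfying $f(0) = A$, $f'(0) = 0$, and $f(\rho R) = A/2$ is assumed here; everything else follows equations (6.15)-(6.22).

```python
import numpy as np

def anomaly_2d(x, z, A, xc, zc, Rx, Rz, p=2.0, rho=1.0, omega=0.0):
    """Hedged sketch of the 2-D parametric anomaly, eqs (6.13)-(6.22).
    The flank f is an assumed Gaussian-type shape with f(rho*R) = A/2;
    omega is in radians (the text quotes degrees)."""
    # evaluate at the coordinates rotated back by omega, cf. (6.22)
    dx, dz = x - xc, z - zc
    xr = dx * np.cos(omega) + dz * np.sin(omega)
    zr = -dx * np.sin(omega) + dz * np.cos(omega)
    r = np.hypot(xr, zr)                                  # distance to center
    gamma = (abs(xr)**p / Rx**p + abs(zr)**p / Rz**p)**(1.0 / p)   # (6.17)
    if gamma <= 1.0:                                      # flat region (6.15)
        return A
    R = r / gamma                                         # (6.18)
    return A * 0.5 ** (((r - R) / (rho * R)) ** 2)        # assumed flank
```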
6.4 Forward modeling

I adopted a finite-difference (FD) method [Vid88, Ald92, AO93] to compute the traveltimes, given a velocity model and a source-receiver geometry. Explicit raypaths are not required, since the SA inversion is based on the computation of traveltimes only. The FD method is based on the solution of the eikonal equation using finite differences on each square cell of a gridded slowness field; the eikonal equation is easily derived from the high-frequency approximation of the wave equation (see for example [Nol87, Ber91]). The grid is basically the same as that used for ray tracing in Chapter 4 (Figure 4.1), except that $\Delta x = \Delta z = h$ and velocities are assigned to the nodes. Once a model parameterization has been chosen (bicubic splines or parametric functions), the velocity field is sampled over the equally spaced grid of $N_x \times N_z$ square cells:

$v_{ij} = v(x_i, z_j)$,   (6.24)

where

$x_i = x_{min} + (i-1)h, \quad i = 1,\dots,N_x; \qquad z_j = z_{min} + (j-1)h, \quad j = 1,\dots,N_z$,   (6.25)

and $v(x,z) = v_b(x,z)$, equation (6.8), plus either $v_s(x,z)$, equation (6.6), or $v_a(x,z)$, equation (6.13). The "sampling" process given by equations (6.24) and (6.25) is repeated at each iteration, after the corresponding parameters (spline nodes, background velocity coefficients, etc.) have been updated by the SA algorithm. So, given a source-receiver geometry and the current gridded slowness field, at each iteration the FD method provides a set of calculated traveltimes

$T_n^c = T_n^c(\mathbf{m}), \quad n = 1,\dots,N$,   (6.26)

which are to be compared with the observed traveltimes $T_n^o$ (the data). Here $N$ is the total number of source-receiver pairs.
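The thesis uses Vidale-type finite differences; as a stand-in with the same role, the sketch below computes first-arrival times on a gridded velocity field with the fast-marching eikonal solver from the third-party scikit-fmm package (an assumption for illustration, not the thesis's code).

```python
import numpy as np
import skfmm   # third-party fast-marching eikonal solver (assumed stand-in)

def forward_traveltimes(v_grid, h, src_idx, rec_idx):
    """Hedged sketch of the forward problem (6.26): first-arrival times
    from one source to all receivers on an Nx x Nz gridded velocity field.
    src_idx is a (i, j) grid index; rec_idx a list of (i, j) indices."""
    phi = np.ones_like(v_grid)
    phi[src_idx] = -1.0                        # zero level set at the source
    t = skfmm.travel_time(phi, v_grid, dx=h)   # solves |grad T| = 1/v
    return np.array([t[idx] for idx in rec_idx])
```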
6.5 Nonlinear inversion

The traveltime inversion problem is cast as a nonlinear optimization problem. For this purpose, I define the cost function

$\Phi(\mathbf{m}) = \frac{1}{N} \sum_{n=1}^{N} w_n \left| T_n^o - T_n^c(\mathbf{m}) \right|^q$,   (6.27)

where the $w_n$ are weights. This equation expresses the misfit between the observed and calculated traveltimes. In general $q = 2$, which leads to a standard weighted least-squares optimization, but other values of $q$ can also be used. The objective is to minimize equation (6.27) with respect to $\mathbf{m}$, such that

$\Phi(\mathbf{m}) \le \Phi_{tol} = \chi^2$,   (6.28)

where $\Phi_{tol}$ is a tolerance cost associated with the observational errors, and $\chi$ is the expected misfit. Note that $\chi$ has the same units as the traveltimes. In general, an estimate of the right-hand side of equation (6.28) is available, so $\chi$ is a measure of the goodness-of-fit of the model to the observed data. In addition to minimizing $\Phi(\mathbf{m})$, I specify a set of bounding constraints of the form $A_k \le m_k \le B_k$, $k = 1,\dots,K$.

Table 6.1: Estimated model parameters after 3000 iterations (20 realizations). The search ranges are specified by $(A_k, B_k)$. The last two rows show the average differential model, $\epsilon$, and the final misfit, $\chi$ (mean and standard deviation values, respectively).

        Anomaly model (ANOMA)                     Smooth model (SPLIN)
        true    mean      σ     A_k    B_k          mean      σ     A_k    B_k
ω      60.00   60.48   5.01    0.0   90.0   v_9     2.04   0.02    1.4    2.6
v_0     2.00    1.99   0.01    1.0    3.0   x_2    36.3    8.7    20.0   50.0
g_x     0.00    0.01   0.00   -0.1    0.1   z_2    33.6    6.7    20.0   50.0
g_z     0.00   -0.01   0.00   -0.1    0.1
ε         -     0.67   0.10     -      -    ε       1.31   0.38     -      -
χ         -     0.19   0.02     -      -    χ       0.37   0.03     -      -

The differential models compare $v(i,j)$, the true image, with $\hat v(i,j)$, the result of the inversion. All errors in the nonlinear inversion are below 2%, and in the linearizing inversion they are below 3.5%. The results in Table 6.1 reveal that the velocity values at the spline nodes are well determined, in the sense that they show low variances. In contrast, the relatively large variances of the coordinates of the central nodes indicate that the traveltimes are not very sensitive to variations in the grid structure, but rather to the node values.

Figure 6.5: Traveltime inversion for the smooth model. (a) True model; (b) raypath coverage; (c) SA inversion; (d) SA differential model; (e) linearizing inversion; (f) linearizing differential model.

Figure 6.6: SART convergence after 3000 iterations (20 realizations).

6.6.2 Anomaly body

The traveltime inversion procedure is also tested using a high-contrast anomaly body, ANOMA, which is illustrated in Figure 6.7a. A buried structure is submerged in a constant-velocity background field. The velocity of the anomaly is 2.5 km/s and the background velocity is 2.0 km/s. Table 6.1 summarizes the parameters used to build this model with the parametric functions described in the previous section, along with the search ranges used by the SA inversion. Figure 6.7b shows the raypath coverage after exploiting all the sources. The traveltimes were then contaminated with 1% random noise, with $b = \pm 0.61$. According to formula (6.30), with $q = 2$, the expected misfit for this experiment is

$\chi = 0.35\ \text{ms}$.   (6.34)

As with the SPLIN model, a total of 20 independent realizations were run using different seeds, with ITMAX = 3000. The expected misfit was achieved well before 3000 iterations in about 50% of the cases; the convergence curves are illustrated in Figure 6.6 (right panel). Figure 6.7c shows the mean SA inversion (i.e., using the mean parameter values over the 20 realizations). It can be seen that the true model was recovered quite accurately. Figure 6.7e shows the inversion using the linearizing approach; the linearized solution also achieved the expected misfit and converged in about 5 iterations. Figures 6.7d and 6.7f show the differential models between the true and reconstructed models. All values of the two differential models fall within 5%. However, the linearized inversion introduced some artifacts not present in the original model: the basic shape of the anomaly was recovered quite well, but the sharp boundaries of the anomaly body could not be recovered, as expected. The SA inversion, on the contrary, recovered the original model very accurately, with most errors below 1%. Though the linearizing approach imposes smoothness and is not appropriate for inverting this type of model, its results are shown for illustrative purposes.

Figure 6.7: Traveltime inversion for the anomaly model. (a) True model; (b) raypath coverage; (c) SA inversion; (d) SA differential model; (e) linearizing inversion; (f) linearizing differential model.
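To make the structure of the inversion loop of Section 6.5 concrete, here is a toy sketch of a bounded stochastic-annealing update with Metropolis acceptance. The generating distribution is simplified to a temperature-scaled uniform perturbation clipped to the bounds $(A_k, B_k)$, so this illustrates the shape of the algorithm only, not the thesis's actual VFSA implementation.

```python
import numpy as np

def anneal_inversion(cost, m0, A, B, itmax=3000, T0=1.0, rng=None):
    """Toy sketch of the bounded SA inversion loop of Section 6.5.
    cost(m) evaluates (6.27) via the FD forward model; A, B are the
    bound vectors (A_k, B_k). The perturbation is a simplified,
    temperature-scaled move, not the actual VFSA generating function."""
    rng = rng or np.random.default_rng()
    m, E = np.asarray(m0, float), cost(m0)
    best_m, best_E = m.copy(), E
    for k in range(1, itmax + 1):
        T = T0 / k                                   # simplified cooling schedule
        trial = m + T * (B - A) * rng.uniform(-1, 1, m.size)
        trial = np.clip(trial, A, B)                 # enforce A_k <= m_k <= B_k
        E_trial = cost(trial)
        if E_trial < E or rng.random() < np.exp(-(E_trial - E) / T):
            m, E = trial, E_trial                    # Metropolis acceptance
            if E < best_E:
                best_m, best_E = m.copy(), E
    return best_m, best_E
```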
6.7 Conclusions

I have demonstrated the ability of the described traveltime tomography procedure for velocity imaging. The inversion results using synthetic data are in very good agreement with the true models. The method uses either bicubic B-splines defined over a grid with nonuniform spacing, or parametric 2-D functions, to model the velocity field. The parameterization is intended to reduce the number of unknowns and to stabilize the inversion. In the case of bicubic splines, smoothing is achieved beforehand; moreover, the grid spacing is dynamically adapted to accommodate moderately complex structures, leaving only a few unknown parameters for the optimization algorithm to work with. By contrast, the approach using parametric functions favors neither smooth nor high-contrast velocity distributions: this information is extracted from the data, where it exists.

The traveltime inverse problem is cast as a constrained nonlinear optimization problem, which is solved by means of VFSA in an attempt to find the global minimum regardless of the initial model. This strategy requires no ray tracing, and problems related to (1) the initial model and (2) the selection of a proper regularization in the standard inversion procedure are avoided. The selection of an adequate parameterization is very important, however, and the success of the tomographic inversion resides not only in the ability of the parameterization to accommodate the subsurface geology, but also in the amount of data available and the ray coverage.

The main drawback of the traveltime inversion presented in this chapter is perhaps that it is a time-consuming process. Even though fast finite differences are used to compute the traveltimes for each source-receiver pair, the computational cost is high, since a large number of iterations is required for convergence to a near-optimal solution. Compared to linearizing methods, which usually require just a few iterations (3-10), the SA approach requires a much larger number (1000-3000); for the size of the models used here, this represented a few minutes on a Sun Ultra 1 workstation. Nevertheless, the parameterizations used here to represent the velocity field cannot be inverted satisfactorily using a linearizing method, for there exists a large number of hidden local minima that inhibit convergence to a useful solution. The methodology is especially suited to situations where a small number of parameters can represent the subsurface model. In the case of parametric functions, the method may find application in near-surface studies, geoarchaeology, etc., provided the buried structure can be represented adequately by the parametric functions described. The bicubic representation, on the contrary, is appropriate for smooth models only. The extension to 3-D situations may be readily implemented.

Chapter 7

Summary

It is venturesome to think that a coordination of words (philosophies are nothing more than that) can resemble the universe very much. It is also venturesome to think that of all these illustrious coordinations, one of them - at least in an infinitesimal way - does not resemble the universe a bit more than the others.
Jorge Luis Borges - The Avatars of the Tortoise

This thesis contributes to knowledge in three fundamental areas of seismic processing and inversion analysis: wavelet estimation, two-point ray tracing, and traveltime tomography. These problems are viewed as nonlinear optimization problems, in which some misfit between observed and calculated data, or some other functional of the model parameters (the cost function), must be minimized. The leitmotiv of the thesis is solving these hard optimization problems by means of SA, to avoid local minima and to improve the accuracy and reliability of the results.

The problem of wavelet estimation based on the cumulant matching approach was treated in Chapter 3. Based on the convolutional model, it was assumed that the reflectivity series is non-Gaussian and that the additive noise is normally distributed; except for stationarity, there were no assumptions regarding the wavelet phase. Several improvements over previous works were proposed, with special effort devoted to the issues of data quantity and the reliability of the numerical solution. It was shown that cumulant matching leads to an optimization problem in which a good initial model is required to obtain reliable wavelet estimates if a linearizing algorithm is used; this justified the use of SA.
Since the amount of available data plays a key role, a convenient multidimensional tapering to reduce the variance of the trace-cumulant estimate was recommended. As a result, the algorithm can be used with confidence on relatively short data segments, aiming at a trace-by-trace process. A heuristic analysis of cumulant sensitivity to wavelet bandwidth was also performed, verifying that, like kurtosis-based and MED-type deconvolution algorithms, the success of the cumulant matching method relies on the wavelet bandwidth. Finally, a hybrid strategy combining SA with standard linearizing algorithms was proposed to alleviate the computational burden associated with the optimization problem. The behavior of the method was explored by using several non-Gaussian reflectivity distributions to build the synthetic traces, and by taking different amounts of data; real data were also used to test the method. The results demonstrated not only the viability of the cumulant matching method, but also the reliability of the convolutional model, which is at the heart of the wavelet estimation philosophy.

Chapter 4 introduced a novel method for solving the boundary-value ray-tracing problem in laterally varying 2-D media. The method was called SART, which stands for simulated annealing ray tracing. The motivation for SART was to overcome some difficulties arising in standard ray-tracing techniques (e.g., shooting and bending). The ray-tracing problem was put into a nonlinear optimization framework, in which the take-off angle producing the ray that connects two fixed points with absolute-minimum traveltime is to be found. The method focuses on direct arrivals, since it is based on a shooting procedure, but other phases can also be incorporated (e.g., diffractions) by constraining the raypath at intermediate points of its trajectory. It is not possible, however, to find all the raypaths connecting source and receiver for a given velocity model, nor to trace arbitrary headwaves and phases traveling through shadow zones, unless the raypath signature is totally specified a priori. This apparent disadvantage, which is shared with shooting methods, implies that the phase of the raypath being traced is always known, a priori and a posteriori, an issue that is very important from the point of view of phase identification. All the equations for ray tracing in a gridded-cell velocity model were fully developed. An arbitrary discontinuity representing a geologic horizon can be added, which allows one to trace not only direct waves but also reflections and simple headwaves. The method was illustrated with various numerical examples, including multipathing and other nonlinearities that are difficult to handle using shooting or bending; in all cases, fast convergence to the global minimum was obtained.

The SART philosophy was extended to complex 3-D models in Chapter 5. Not less important, that chapter also developed a complete model representation in terms of a number of blocks or regions that comprise the velocity field: each region is assigned an arbitrary velocity field in the form of a three-dimensional function, and block boundaries are given by arbitrary curved interfaces. The 3-D extension of SART is based on the numerical solution of the ray equations by means of standard ODE solvers. All equations necessary to solve the initial-value problem, including how to find the intercept points where the ray meets block boundaries, were presented. It was shown how the boundary-value problem can be viewed as an unconstrained nonlinear optimization problem in which a few unknown parameters are to be obtained: in general, the take-off angles that give rise to the absolute-minimum-traveltime trajectory; in the case of headwaves, for example, the coordinates at which the ray enters and leaves a predefined interface. Various alternative strategies were explored to solve the optimization problem, depending on the type of ray being traced (direct waves, reflected waves, etc.). It was found convenient to refine the solution at the later stages of the SA optimization (hybridization); here the optimization problem is redefined in terms of the coordinates of the emerging point, and efficient linearizing methods are used.
SART philosophy was extended to deal with complex 3-D models in Chapter 5. No less important, in this chapter I also developed a complete model representation in terms of a number of blocks or regions that comprise the velocity field. Each region was assigned an arbitrary velocity field in the form of a three-dimensional function. Block boundaries were given by arbitrary curved interfaces. The SART extension to 3-D is based on the numerical solution of the ray equations, which were solved by means of standard ODE solvers. All the necessary equations to solve the initial-value problem, including how to find the intercept points where the ray meets block boundaries, were presented. It was shown how the boundary-value problem can be viewed as an unconstrained nonlinear optimization problem, where a few unknown parameters were to be obtained. In general, these were the take-off angles that gave rise to the absolute minimum traveltime trajectory. In the case of headwaves, for example, the unknowns were represented by the coordinates at which the ray enters and leaves a predefined interface. Various alternative strategies were explored to solve the optimization problem, depending on the type of ray being traced (i.e. direct waves, reflected waves, etc.). It was found convenient to refine the solution at the later stages of the SA optimization (hybridization). Here the optimization problem was redefined in terms of the coordinates of the emerging point, and efficient linearizing methods were utilized. Several numerical examples were devised to test the method under difficult situations, indicating that the algorithm is reasonably robust. The models included complex structures that lead to extremely ill-behaved cost functions. Finally, an extensive analysis and discussion of SART performance and accuracy was also carried out. Several methods for improving SART efficiency were proposed. For example, it was shown that a dynamic integration scheme was much more efficient than using the same numerical solver for all blocks, without losing accuracy.

Application of SA to the traveltime tomography problem without rays was treated in Chapter 6. The iterative algorithm is based on two special parameterization schemes. The solution to the inverse problem was stabilized by virtue of the reduction in the number of unknown parameters that define the velocity field, rather than by imposing model constraints through regularization. A bicubic spline parameterization with adaptive grid spacing was proposed for dealing with smooth velocity fields. For models that include anomaly-like bodies, such as those of interest in archaeology and other near-surface studies, a versatile parametric scheme based on 2-D functions was presented in full detail. This novel scheme allowed one to model either smooth or high-contrast velocity structures with sharp boundaries, including elliptical, circular, or rectangular bodies, immersed in a smooth velocity background. Numerical examples using both crosswell and VSP data demonstrated that the method can be used to successfully image complex structures without the need to regularize in the standard fashion, provided the velocity field can be represented using one of the two parameterization schemes mentioned.
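As a rough illustration of the second scheme, here is a minimal Python sketch of a parametric velocity model (one elliptical anomaly in a linear-in-depth background) and the rms traveltime misfit that the nonlinear inversion minimizes. The parameter list, and the forward operator standing in for the finite-difference traveltime computation, are hypothetical placeholders, not the thesis formulation.

```python
import numpy as np

def velocity(x, z, params):
    """Parametric 2-D velocity model: an elliptical anomaly embedded in
    a smooth background v0 + kz*z. params = (v0, kz, x0, z0, a, b,
    v_anom); the names and the particular form are illustrative."""
    v0, kz, x0, z0, a, b, v_anom = params
    background = v0 + kz * z
    inside = ((x - x0) / a) ** 2 + ((z - z0) / b) ** 2 <= 1.0
    return np.where(inside, v_anom, background)

def rms_misfit(params, t_obs, forward):
    """rms error between observed and predicted traveltimes; 'forward'
    is a placeholder for a traveltime solver (e.g. a finite-difference
    eikonal scheme) that maps a velocity function to traveltimes."""
    t_pred = forward(lambda x, z: velocity(x, z, params))
    return float(np.sqrt(np.mean((t_pred - t_obs) ** 2)))
```

SA would then perturb the handful of entries in params, which is what keeps the inverse problem stable without explicit regularization.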
7.1 Future research

This work leaves open several areas of research. First, developing the cumulant matching approach in the frequency domain would provide interesting insights into the wavelet estimation problem using higher-order statistics. Different polyspectra may be combined into a single cost function to optimally extract their information (e.g. second-order statistics may be used for magnitude recovery, and fourth-order statistics for phase recovery). The spectra may be parameterized to reduce the number of unknowns and to increase SA efficiency and accuracy. A comparison with polycepstra-based methods would bring more understanding to this field as well.

Second, a parallelized approach to the boundary-value ray-tracing problem would give a more quantitative idea of the potential of SART for dealing with several raypaths, as is usual in tomography applications. Methods based on graph theory have not yet been implemented for dealing with complex 3-D models. It would be interesting to test their applicability, from the point of view of efficiency and accuracy, as compared to SART.

Third, better controlled experiments in Chapter 6 would add more understanding to this area. Devising alternative parameterization schemes for dealing with nonregular anomaly structures would provide a more flexible approach for imaging buried complex bodies. Three-dimensional anomalies could also be represented by this technique, as well as three-dimensional smooth velocity fields based on cubic splines or other interpolation methodologies.

Finally, it has been assumed that VFSA is one of the most efficient SA methods available for dealing with nonlinear optimization problems. But other SA variants have emerged in the last few years and have been shown to be very efficient in certain applications. Genetic algorithms, taboo search, and other less orthodox methods are always attractive algorithms for global optimization. Few published reports make a fair comparison between them. Improving existing SA techniques would certainly help to develop more efficient algorithms for solving difficult nonlinear optimization problems arising in seismic exploration and other geophysical areas. Questions like "how many iterations are required to obtain the global minimum?", "how accurate are the results after a certain number of iterations?", and "how much faster can an SA algorithm be made without affecting the reliability of the results?" have no clear and definitive answer, for the results depend tremendously on the nonlinear inverse problem at hand. It is not only the so-called "curse of dimensionality" that plays a key role in SA optimization, but also the complexity of the cost function (e.g. number of local minima, roughness). These are questions that experience will resolve.

References

[AK90] E. Asakawa and T. Kawanaka. A new ray tracing method for seismic tomography. Technical report, 1990. (In Japanese).
[AK93] E. Asakawa and T. Kawanaka. Seismic ray tracing using linear traveltime interpolation. Geophys. Prosp., 41:99-111, 1993.
[Ald90] D.F. Aldridge. The Berlage wavelet. Geophysics, 55:1508-1511, 1990.
[Ald92] D.F. Aldridge. Analysis and inversion of seismic refraction traveltimes. PhD thesis, The University of British Columbia, Vancouver, Canada, April 1992.
[And72] R.S. Anderssen. In R.S. Anderssen, L.S. Jennings, and D.M. Ryan, editors, Optimisation, pages 27-34, St. Lucia, Australia, 1972. Univ. of Queensland Press.
[AO93] D.F. Aldridge and D.W. Oldenburg. Two-dimensional tomography inversion with finite-difference traveltimes. J. Seism. Expl., 2:257-274, 1993.
[AR80] K. Aki and P.G. Richards. Quantitative seismology: theory and methods. W.H. Freeman and Co., 1980.
[AV90] C.J. Ammon and J.E. Vidale. Seismic traveltime tomography using combinatorial optimization techniques. Seis. Res. Lett., 61:39, 1990.
[AV93] C.J. Ammon and J.E. Vidale. Tomography without rays. Bull. Seism. Soc. Am., 83:509-528, 1993.
[BB89] H. Bohr and S. Brunak. A traveling salesman approach to protein conformation. Complex Sys., 3:9-28, 1989.
[BBC89] N.D. Bregman, R.C. Bailey, and C.H. Chapman. Crosshole seismic tomography. Geophysics, 54:200-215, 1989.
[BBK+87] H. Berbee, C. Boender, A. Rinnooy Kan, C. Scheffer, R. Smith, and J. Telgen. Hit-and-run algorithms for the identification of nonredundant linear inequalities. Mathl. Programming, 37:184-207, 1987.
[Ber91] J.G. Berryman. Nonlinear inversion and tomography (lecture notes). University of California, Lawrence Livermore National Laboratory, Livermore, CA 94550, October 1991.
[Bil94] S.D. Billings. Simulated annealing for earthquake location. Geophys. J. Int., 118:680-692, 1994.
[BJS88] I.O. Bohachevsky, M.E. Johnson, and M.L. Stein. Optimal deployment of missile interceptors. Am. J. Math. Management Sci., 8(3 & 4):361-387, 1988.
[BMT+89] G. Bilbro, R. Mann, T.K. Miller, W.E. Snyder, D.E. Van den Bout, and M. White. Optimization by mean field annealing. In D. Touretzky, editor, Advances in Neural Network Information Processing Systems, pages 91-98. Morgan-Kaufman, San Mateo, CA, 1989.
[BPLT72] P. Bois, M. La Porte, M. Lavergne, and G. Thomas. Well-to-well seismic measurements. Geophysics, 37(3):471-480, 1972.
[BR67] D.R. Brillinger and M. Rosenblatt. Asymptotic theory of estimates of k-th order spectra. In B. Harris, editor, Spectral Analysis of Time Series, pages 153-188. John Wiley & Sons, Inc., 1967.
[Bri65] D.R. Brillinger. An introduction to polyspectra. Ann. Math. Statist., 36:1351-1374, 1965.
[Cas82] B.R. Cassel. A method for calculating synthetic seismograms in laterally varying media. Geophys. J. R. Astr. Soc., 69:339-354, 1982.
[Cer87] V. Cerveny. Ray-tracing algorithms in three-dimensional laterally varying layered structures. In Seismic Tomography with Applications in Global Seismology and Exploration Geophysics, pages 99-133, Dordrecht, The Netherlands, 1987. D. Reidel.
[CH82] P. Chang and T. Hsieh. Constrained nonlinear optimization approaches to color-signal separation. IEEE Trans. Image Processing, 4:81-94, 1982.
[CH96] N. Cheng and L. House. Minimum traveltime calculation in 3-D graph theory. Geophysics, 61(6):1895-1898, 1996. Short note.
[CK95] D. Cvijovic and J. Klinowski. Taboo search: An approach to the multiple minima problem. Science, 267:664-666, 1995.
[CSS94] R.K. Chunduru, M.K. Sen, and P.L. Stoffa. Resistivity inversion for 2-D geologic structures using very fast simulated annealing. In 64th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, volume 94, pages 640-643, 1994.
[CSS96] R.K. Chunduru, M.K. Sen, and P.L. Stoffa. Development of efficient hybrid optimization methods for geophysical inversion. In 66th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, volume 96, pages 1130-1133, Denver, Colorado, 1996.
[Dav87] L. Davis, editor. Genetic algorithms and simulated annealing. Morgan Kaufmann Publishers, Los Altos, CA, 1987.
[EH65] V.A. Elissevnin. Analysis of rays propagating in an inhomogeneous medium. Sov. Phys. Acoust., 10:242-245, 1965. English translation.
[Fes17] R.A. Fessenden. Method and apparatus for locating ore-bodies. U.S. patent 1,240,328, 1917.
[Fir87] P. Firbas. Tomography from seismic profiles, chapter 8, pages 189-202. In Nolet [Nol87], 1987.
[FK94] B.C. Fish and T. Kusuma. A neural network approach to automate velocity picking. In 64th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, volume 94, pages 185-188, 1994.
[FL93] R. Fisher and J.M. Lees. Shortest path ray tracing with sparse graphs. Geophysics, 58:987-996, 1993.
[Fog93] D.B. Fogel. On the philosophical differences between genetic algorithms and evolutionary algorithms. In D.B. Fogel and W. Atmar, editors, Proc. of the Sec. Ann. Conf. on Evolutionary Programming, pages 23-29, La Jolla, CA, 1993. Evolutionary Programming Society.
[Fog94] D.B. Fogel. An introduction to evolutionary optimization. IEEE Trans. on Neural Networks, 5(1):3-14, 1994.
[FT95] C.A.C. Filho and F.A.C. Thedy. Tracado de raios atraves de algoritmos geneticos [Ray tracing using genetic algorithms]. In 1st Latin American Geophysical Union Conference, Expanded Abstracts, pages 339-343, 1995. With abstract in English.
[GB93] S. Geoltrain and J. Brac. Can we image complex structures with first-arrival traveltimes? Geophysics, 58:564-575, 1993.
[GFR94] W.L. Goffe, G.D. Ferrier, and J. Rogers. Global optimization of statistical functions with simulated annealing. J. Econometrics, 60(1/2):65-100, 1994. (http://emlab.berkely.edu/Software).
[GG84] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Analysis and Machine Intelligence, PAMI-6:721-741, 1984.
[Gol89] D.E. Goldberg, editor. Genetic algorithms in search, optimization, and machine learning. Addison-Wesley Publishing Company, Inc., 1989. The University of Alabama.
[GR81] R. Godfrey and F. Rocca. Zero memory nonlinear deconvolution. Geophys. Prosp., 29:189-228, 1981.
[GS84] M.R. Green and J. Supowit. Simulated annealing without rejected moves. In Digest International Conference on Computer Design, pages 658-663, October 1984.
[Har94] N.D. Hargreaves. Wavelet estimation via fourth-order cumulants. In 64th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, volume 94, pages 1588-1590, 1994.
[HL95] S.H. Hartzell and P.C. Liu. Determination of earthquake source parameters using a hybrid global search algorithm. Bull. Seism. Soc. Am., 85:516-524, 1995.
[Ing89] L. Ingber. Very fast simulated re-annealing. Mathl. Comput. Modelling, 12:967-973, 1989.
[Ing93] L. Ingber. Simulated annealing: practice versus theory. Mathl. Comput. Modelling, 18:29-57, 1993.
[Ing96] L. Ingber. Adaptive simulated annealing (ASA): Lessons learned. Control and Cybernetics, 25(1):33-54, 1996.
[IR92] L. Ingber and B. Rosen. Genetic algorithms and very fast simulated reannealing: a comparison. Mathl. Comput. Modelling, 16:87-100, 1992.
[IRS88] Y. Ishii, S. Rokugawa, and T. Susuki. On the various subsurface models and ART methods. In Proceedings 1988 Spring Meeting, SEG Japan, pages 23-26, 1988. (In Japanese).
[JG77] B.R. Julian and D. Gubbins. Three-dimensional seismic ray tracing. J. Geophys., 43:95-113, 1977.
[KGV83] S. Kirkpatrick, C.D. Gelatt Jr., and M.P. Vecchi. Optimization by simulated annealing. Science, 220:671-680, 1983.
[KM78] J. Kormylo and J.M. Mendel. On maximum-likelihood detection and estimation of reflection coefficients. In 48th Ann. Internat. Mtg., Soc. Expl. Geophys., Reprints, volume 78, 1978.
[Lan92] F. Lane. Estimation of kinematic rupture parameters from historical seismograms: an application of simulated annealing to a nonlinear optimization problem. PhD thesis, Colorado School of Mines, Golden, Colorado, 1992.
[Laz93] G.L. Lazear. Mixed-phase wavelet estimation using fourth-order cumulants. Geophysics, 58:1042-1051, 1993.
[LBT89] E. Landa, W. Beydoun, and A. Tarantola. Reference velocity model estimation from prestack waveforms: Coherency optimization by simulated annealing. Geophysics, 54:984-990, 1989.
[LHS95] P. Liu, S. Hartzell, and W. Stephenson. Nonlinear multiparameter inversion using a hybrid global search algorithm: Applications in reflection seismology. Geophys. J. Int., 122:991-1000, 1995.
[LLC85] R.T. Langan, I. Lerche, and R.T. Cutler. Tracing of rays through heterogeneous media: An accurate and efficient procedure. Geophysics, 50:1456-1465, 1985.
[LO87] S. Levy and D.W. Oldenburg. Automatic phase correction of common-midpoint stacked data. Geophysics, 52:51-59, 1987.
[LR82] K.S. Lii and M. Rosenblatt. Deconvolution and estimation of transfer function phase and coefficients for non-Gaussian linear processes. Ann. Statist., 10:1195-1208, 1982.
[LS86] P. Lancaster and K. Salkauskas. Curve and surface fitting, an introduction. Academic Press, New York, 1986.
[Men91] J.M. Mendel. Tutorial on higher-order statistics in signal processing and system theory: Theoretical results and some applications. In Proc. IEEE, volume 79, pages 278-305, 1991.
[Mic95] A. Michelini. An adaptive-grid formalism for traveltime tomography. Geophys. J. Int., 121:489-510, April 1995.
[MJC89] J.L. Mallet, P. Jacquemin, and N. Cheimanoff. GOCAD project: Geometric modeling of complex geological surfaces. In 59th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, volume 89, page 126, 1989. (http://www.ensg.u-nancy.fr/GOCAD/gocad.html).
[MM91] A. Michelini and T.V. McEvilly. Seismological studies at Parkfield I: Simultaneous inversion for velocity structure and hypocentres using cubic B-splines parameterization. Bull. Seism. Soc. Am., 81:524-552, 1991.
[MNS92] T.J. Moser, G. Nolet, and R. Snieder. Ray bending revisited. Bull. Seis. Soc. Am., 82:259-288, 1992.
[Mos89] T.J. Moser. Efficient seismic ray tracing using graph theory. In 59th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, pages 1106-1108, Dallas, 1989.
[Mos91] T.J. Moser. Shortest path calculation of seismic rays. Geophysics, 56:59-67, 1991.
[MRR+53] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. Equation of state calculations by fast computing machines. J. Chem. Phys., 21:1087-1092, 1953.
[MS97] W. Mao and G.W. Stuart. Rapid multi-wave-type ray tracing in complex 2-D and 3-D isotropic media. Geophysics, 62:298-308, 1997.
[MU84] T. Matsuoka and T.J. Ulrych. Phase estimation using the bispectrum. In Proc. of IEEE, volume 72, pages 1403-1411, October 1984.
[MVC93] G. Mirkin, K. Vasudevan, and F.A. Cook. A comparison of several cooling schedules for simulated annealing implemented on a residual statics problem. Geophys. Res. Lett., 20(1):77-80, 1993.
[Nol87] G. Nolet, editor. Seismic Tomography with Applications in Global Seismology and Exploration Geophysics, Dordrecht, The Netherlands, 1987. D. Reidel.
[NP93] C.L. Nikias and A.P. Petropulu. Higher-order spectral analysis: A nonlinear signal processing framework. Prentice-Hall, Inc., 1993.
[NR87] C.L. Nikias and M.R. Raghuveer. Bispectrum estimation: A digital signal processing framework. In Proc. IEEE, volume 75, pages 869-891, 1987.
[Per92] V. Pereyra. Two-point ray tracing in general 3-D media. Geophys. Prosp., 40:267-287, 1992.
[PIIF92] L.A. Pflug, G.E. Ioup, J.W. Ioup, and R.L. Field. Properties of higher-order correlations and spectra for bandlimited, deterministic transients. J. Acoust. Soc. Am., 92:975-988, 1992.
[PL91] P. Podvin and I. Lecomte. Finite-difference computation of traveltimes in very contrasted velocity models: A massively parallel approach and its associated tools. Geophys. J. Int., 105:271-284, 1991.
[PL93] S.K. Pullammanappallil and J.N. Louie. Inversion of seismic reflection traveltimes using a nonlinear optimization scheme. Geophysics, 58:1607-1620, 1993.
[PL97] S.K. Pullammanappallil and J.N. Louie. A combined first-arrival traveltime and reflection coherency optimization approach to velocity estimation. Geophys. Res. Lett., 24(5):511-514, 1997.
[PS82] C.C. Paige and M.A. Saunders. LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software, 8:43-71, 1982.
[PTE88] W.A. Prothero, W.J. Taylor, and J.A. Eickemeyer. A fast, two-point, three-dimensional ray-tracing algorithm using a simple step search method. Bull. Seis. Soc. Am., 78:1190-1198, 1988.
[PTVF92] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in FORTRAN: the Art of Scientific Computing. Cambridge University Press, second edition, 1992.
[Ros92] B. Rosen. Function optimization based on advanced simulated annealing. In IEEE Workshop on Physics and Computation, pages 289-293, 1992.
[Rot85] D.H. Rothman. Nonlinear inversion, statistical mechanics, and residual statics estimation. Geophysics, 50:332-346, 1985.
[Rot86] D.H. Rothman. Automatic estimation of large residual statics corrections. Geophysics, 51:2784-2796, 1986.
[RR88] H. Ratschek and J. Rokne, editors. New computer methods for global optimization. Ellis Horwood Ltd, 1988.
[RS88] J. Ramanujan and P. Sadayappan. Optimization by neural networks. In Proc. of the Int. Conf. on Neural Networks, volume II, pages 325-332, 1988.
[RT80] E.A. Robinson and S. Treitel. Geophysical signal analysis. Prentice-Hall, Inc., Englewood Cliffs, NJ 07632, 1980.
[Rud87] W. Rudin. Real and complex analysis. McGraw-Hill, Inc., 1987.
[Sai89] H. Saito. Traveltimes and raypaths of first arrival seismic waves: computation method based on Huygens' principle. In 59th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, pages 244-247, Dallas, 1989.
[SBM95] M. Sambridge, J. Braun, and H. McQueen. Geophysical parameterization and interpolation of irregular data using natural neighbors. Geophys. J. Int., 122:837-857, 1995.
[SBS93] M.K. Sen, B.B. Bhattacharya, and P.L. Stoffa. Nonlinear inversion of resistivity sounding data. Geophysics, 58:496-507, 1993.
[Sca85] L.E. Scales. Introduction to nonlinear optimization. Springer-Verlag New York Inc., 1985.
[SDG90] J.A. Scales, P. Docherty, and A. Gersztenkorn. Regularisation of nonlinear inverse problems: imaging the near surface weathering layer. Inverse Prob., 6:115-131, 1990.
[SH87] H. Szu and R. Hartley. Fast simulated annealing. Phys. Lett. A, 122(3,4):157-162, 1987.
[Sha93] E. Shalev. Cubic B-splines: Strategies of translating a simple structure to B-spline parameterization. Bull. Seism. Soc. Am., 83:1617-1627, 1993.
[SK90] M.S. Sambridge and B.L.N. Kennett. Boundary-value ray tracing in a heterogeneous medium: A simple and versatile algorithm. Geophys. J. Int., 101:157-168, 1990.
[SS91] M.K. Sen and P.L. Stoffa. Nonlinear one-dimensional seismic waveform inversion using simulated annealing. Geophysics, 56:1624-1638, 1991.
[SS95] M.K. Sen and P.L. Stoffa. Global optimization methods in geophysical inversion. Elsevier Publications, 1995.
[Sun93] Y. Sun. Ray tracing in 3-D media by parameterized shooting. Geophys. J. Int., 114:145-155, 1993.
[SVU98] M.D. Sacchi, D.R. Velis, and T.J. Ulrych. Non-minimum phase wavelet estimation using polycepstra. Journal of Seismic Exploration, 1998. (In press).
[Tho82] D.J. Thomson. Spectrum estimation and harmonic analysis. In Proc. IEEE, volume 70, pages 1055-1096, 1982.
[Tug87] J.K. Tugnait. Identification of linear stochastic systems via second- and fourth-order cumulant matching. IEEE Trans. Info. Theory, IT-33:393-407, 1987.
[UT87] J. Um and C. Thurber. A fast algorithm for two-point seismic ray tracing. Bull. Seis. Soc. Am., 77:972-986, 1987.
[Vel96] D.R. Velis. Nonlinear traveltime optimization for ray tracing in complex 3-D media. In V Congreso Argentino de Mecánica Computacional, MECOM '96, volume 16, pages 53-61, Tucumán, Argentina, 1996. Asociación Argentina de Mecánica Computacional.
[Vel98] D.R. Velis. Nonlinear traveltime inversion: a parametric approach. In 68th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, New Orleans, Louisiana, 1998.
[VF91] J. Virieux and V. Farra. Ray tracing in 3-D complex isotropic media: An analysis of the problem. Geophysics, 56(12):2057-2069, 1991.
[Vid88] J. Vidale. Finite-difference calculation of traveltimes. Bull. Seis. Soc. Am., 78:2062-2076, 1988.
[vLA88] P.J.M. van Laarhoven and E.H.L. Aarts. Simulated annealing: theory and applications. D. Reidel, Dordrecht, 1988.
[VM96] R. Vinther and K. Mosegaard. Seismic inversion through tabu search. Geophys. Prosp., 44:555-570, 1996.
[VU95a] D.R. Velis and T.J. Ulrych. Improved wavelet estimation using fourth-order cumulants based on very fast simulated annealing. In CSEG National Convention, Expanded Abstracts, pages 143-144, Calgary, Canada, 1995.
[VU95b] D.R. Velis and T.J. Ulrych. Traveltime tomography using very fast simulated annealing. In 65th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, volume 95, pages 1055-1057, Houston, USA, 1995.
[VU96a] D.R. Velis and T.J. Ulrych. Simulated annealing two-point ray tracing. Geophys. Res. Lett., 23:201-204, 1996.
[VU96b] D.R. Velis and T.J. Ulrych. Simulated annealing wavelet estimation via fourth-order cumulant matching. Geophysics, 61:1939-1948, 1996.
[Wal85] A.T. Walden. Non-Gaussian reflectivity, entropy, and deconvolution. Geophysics, 50:2862-2888, 1985.
[Wal90] A.T. Walden. Improved low-frequency decay estimation using the multitaper spectral analysis method. Geophys. Prosp., 38:61-86, 1990.
[WH86] A.T. Walden and J.W.J. Hosken. The nature of the non-Gaussianity of primary reflection coefficients and its significance for deconvolution. Geophys. Prosp., 34:1038-1066, 1986.
[Whi84] S.R. White. Concepts of scale in simulated annealing. In Proc. IEEE International Conference on Computer Design, pages 646-651, Port Chester, 1984.
[Whi86] R.E. White. Estimation problems on deconvolution: Deconvolution and inversion. Blackwell Scientific Publications, Inc., 1986.
[WS92] Q. Wu and T.H. Sloane. CMOS leaf-cell design using simulated annealing. Technical report, Bucknell University, Lewisburg, PA, 1992.
[Yon93] S. Yonghe. Ray tracing in 3-D media by parameterized shooting. Geophys. J. Int., 114:145-155, 1993.

Appendix A

Sufficient conditions for the global convergence of various SA algorithms

In this appendix, I give a heuristic demonstration showing that the given temperature schedule is a sufficient condition for convergence to a global minimum (a more rigorous proof for the convergence of BA can be found in Geman and Geman [GG84]).
For this purpose, it is enough to prove that the product of the probabilities of not generating a state x, over all annealing times subsequent to $k_0$, is zero [SH87, Ing89]:
\[
\prod_{k=k_0}^{\infty} (1 - g_k) = 0. \tag{A.1}
\]
This means that all model states are given the chance of being sampled and checked for acceptance before the system is too cool to proceed. Taking the logarithm of equation (A.1) and Taylor expanding in $g_k$,
\[
\sum_{k=k_0}^{\infty} \ln(1 - g_k) = -\sum_{k=k_0}^{\infty} \left( g_k + \frac{g_k^2}{2} + \frac{g_k^3}{3} + \cdots \right). \tag{A.2}
\]
As a result, proving equation (A.1) is equivalent to proving that
\[
\sum_{k=k_0}^{\infty} g_k = \infty. \tag{A.3}
\]
In BA the generating pdf is a Gaussian distribution, and if $T$ is chosen as in equation (2.6), then for a sufficiently high $T_0$ equation (A.3) yields
\[
\sum_{k=k_0}^{\infty} g_k \geq \sum_{k=k_0}^{\infty} e^{-\ln k} = \sum_{k=k_0}^{\infty} \frac{1}{k} = \infty. \tag{A.4}
\]
In FA, where a Cauchy distribution is used to generate new model states, $T$ can be lowered at a faster rate. In effect, substituting equation (2.8) into the corresponding generating function, for arbitrary $T_0$ equation (A.3) yields
\[
\sum_{k=k_0}^{\infty} g_k \approx \frac{T_0}{|\Delta x|^{M+1}} \sum_{k=k_0}^{\infty} \frac{1}{k} = \infty. \tag{A.5}
\]
In VFSA, a still faster cooling rate is allowed due to the nature of the generating pdf. Substituting equation (2.15) into the corresponding generating pdf, for arbitrary $T_0$, equation (A.3) becomes
\[
\sum_{k=k_0}^{\infty} g_k = \sum_{k=k_0}^{\infty} \prod_{i=1}^{M} \frac{1}{2(|y_i| + T_i)\ln(1 + 1/T_i)} \sim \sum_{k=k_0}^{\infty} \frac{1}{k} = \infty, \tag{A.6}
\]
since
\[
2(|y_i| + T_i)\ln(1 + 1/T_i) \approx 2|y_i|\left(c_i k^{1/M} - \ln T_{0i}\right) \approx 2 c_i |y_i| k^{1/M} \tag{A.7}
\]
as $T_i$ becomes small for large $k$. It should be noted that the convergence proof is not affected by the value of $c_i$. When quenching is desired, equation (2.15) is replaced by equation (2.19) with $1 < Q < M$, and convergence is no longer guaranteed:
\[
\sum_{k=k_0}^{\infty} g_k \sim \sum_{k=k_0}^{\infty} \prod_{i=1}^{M} \frac{1}{2 c_i |y_i| k^{Q/M}} \sim \sum_{k=k_0}^{\infty} \frac{1}{k^{Q}} < \infty. \tag{A.8}
\]

Appendix B

Formulas required for cell ray tracing with piecewise constant velocity

In Chapter 4 I have described a method to solve the IVP through a velocity model that has been parameterized using rectangular cells with constant velocity. In particular, I described how to trace each ray segment in the case that the first point lies on the left edge of any given cell. In this appendix I give all the formulas required to trace every possible ray segment. Tables B.1, B.2, B.3, and B.4 summarize the formulas for each particular case. The last row indicates the next set of formulas that must be used. Table B.5 shows the angles $\phi_a$ and $\phi_b$ ($0 \le \theta_k < 2\pi$).
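As a flavor of the geometry these tables encode, the Python sketch below computes where a straight ray segment leaves a constant-velocity rectangular cell, given an interior point and a propagation angle: the cell is located from the grid edge coordinates, and the first positive crossing among the four edges is selected. This is an illustrative helper, assuming the point lies strictly inside a cell; it is not the thesis's tabulated formulas, which also handle the edge and corner cases and the hand-off to the next set of formulas.

```python
import numpy as np

def next_cell_exit(p, theta, x_edges, z_edges):
    """Exit point where a straight ray segment starting at p = (x, z),
    with direction angle theta, meets the boundary of its rectangular
    constant-velocity cell. x_edges and z_edges are the sorted grid
    edge coordinates; p is assumed strictly inside a cell."""
    x, z = p
    dx, dz = np.cos(theta), np.sin(theta)
    # locate the current cell indices
    i = np.searchsorted(x_edges, x) - 1
    j = np.searchsorted(z_edges, z) - 1
    # parameter values t at which each relevant edge would be crossed
    ts = []
    if dx > 0: ts.append((x_edges[i + 1] - x) / dx)
    if dx < 0: ts.append((x_edges[i] - x) / dx)
    if dz > 0: ts.append((z_edges[j + 1] - z) / dz)
    if dz < 0: ts.append((z_edges[j] - z) / dz)
    t = min(t for t in ts if t > 0)  # first edge actually crossed
    return np.array([x + t * dx, z + t * dz])
```

Within each cell the ray is a straight line, so tracing proceeds by chaining such exit points, applying Snell's law at each cell boundary where the velocity changes.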