INVERSION AND APPRAISAL FOR THE ONE-DIMENSIONAL MAGNETOTELLURICS PROBLEM

By

Stanley Edward Dosso

M. Sc. Physics, University of Victoria, 1985
B. Sc. (Hons.) Physics and Applied Mathematics, University of Victoria, 1982

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
DEPARTMENT OF GEOPHYSICS AND ASTRONOMY

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September 1990
© Stanley Edward Dosso, 1990

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Geophysics and Astronomy
The University of British Columbia
129-2219 Main Mall
Vancouver, Canada V6T 1W5

Date: [illegible]

Abstract

The method of magnetotellurics (MT) uses surface measurements of naturally-occurring electromagnetic fields to investigate the conductivity distribution within the Earth. In many interpretations it is adequate to represent the conductivity structure by a one-dimensional (1-D) model. Inferring information about this model from surface field measurements is a non-linear inverse problem. In this thesis, linearized construction and appraisal algorithms are developed for the 1-D MT inverse problem.

To formulate a linearized approach, the forward operator is expanded in a generalized Taylor series and second-order terms are neglected. The resulting linear problem may be solved using techniques of linear inverse theory.
Since higher-order terms are neglected, the linear problem is only approximate, and this process is repeated iteratively until an acceptable model is achieved. Linearized methods have the advantage that, with an appropriate transformation, a solution may be found which minimizes a particular functional of the model known as a model norm. By explicitly minimizing the model norm at each iteration, it is hypothesized that the final constructed model represents the global minimum of this functional; however, in practice, it is difficult to verify that a global (rather than local) minimum has been found.

The linearization of the MT problem is considered in detail in this thesis by deriving complete expansions in terms of Fréchet differential series for several choices of response functional, and verifying that the responses are indeed Fréchet differentiable. The relative linearity of these responses is quantified by examining the ratio of non-linear to linear terms in order to determine the best choice for a linearized approach. In addition, the similitude equation for MT is considered as an alternative formulation to linearization and found to be inadequate in that it implicitly neglects first-order terms.

Appropriate choices of the model norm allow linearized inversion algorithms to be formulated which minimize a measure of the model structure or of the deviation from a (known) base model. These inversions construct the minimum-structure and smallest-deviatoric models, respectively. In addition, minimizing $l_2$ model norms leads to smooth solutions which represent structure in terms of continuous gradients, whereas minimizing $l_1$ norms yields layered conductivity models with structural variations occurring discontinuously. These two formulations offer complementary representations of the Earth, and in practice, a complete interpretation should consider both.
The algorithms developed here consider the model to be either conductivity or log conductivity, include an arbitrary weighting function in the model norm, and fit the data to a specified level of misfit: this provides considerable flexibility in constructing 1-D models from MT responses.

Linearized inversions may also be formulated to construct extremal models which minimize or maximize localized conductivity averages of the model. These extremal models provide bounds for the average conductivity over the region of interest, and thus may be used to appraise model features. An efficient, robust appraisal algorithm has been developed using linear programming to extremize the conductivity averages. For optimal results, the extremal models must be geophysically reasonable, and bounding the total variation in order to limit unrealistic structure is an important constraint.

Since the extremal models are constructed via linearized inversion, the possibility always exists that the computed bounds represent local rather than global extrema. In order to corroborate the results, extremal models are also computed using simulated annealing optimization. Simulated annealing makes no approximations and is well known for its inherent ability to avoid unfavourable local minima. Although the method is considerably slower than linearized analysis, it represents a general and interesting new appraisal technique.

The construction and appraisal methods developed here are illustrated using synthetic test cases and MT field data collected as part of the LITHOPROBE project. In addition, the model construction techniques are used to analyze MT responses measured at a number of sites on Vancouver Island, Canada, to investigate the monitoring of local changes in conductivity as a precursor for earthquakes.
MT responses measured at the same site over a period of four years are analyzed and indicate no significant changes in the conductivity (no earthquakes of magnitude greater than 3.0 occurred in this period). Conductivity profiles at a number of sites are also considered in an attempt to infer the regional structure.

Finally, a method of correcting linearized inversions is developed. The corrections consist of successively approximating an analytic expression for the linearization error. The method would seem to represent a novel and practical approach that can significantly reduce the number of linearized iterations. In addition, a correspondence between the correction steps and iterations of the modified Newton's method for operators is established.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 The geophysical inverse problem
  1.2 The geo-electrical conductivity
  1.3 The magnetotelluric (MT) method
    1.3.1 EM induction in the Earth
    1.3.2 The MT inverse problem
    1.3.3 Applicability of 1-D inversion
  1.4 Overview of work in this thesis
2 Fréchet differential expansions and the linearity of MT responses
  2.1 Introduction
    2.1.1 Functional analysis background
    2.1.2 Application to 1-D MT inversion
  2.2 The response $R = \partial_z E/(i\omega E)$
    2.2.1 Expansions for $R$
    2.2.2 Proof of the Fréchet differentiability of $R$
    2.2.3 Constant-conductivity example
  2.3 The response $c = -E/\partial_z E$
    2.3.1 Expansions for $c$
    2.3.2 Constant-conductivity example
  2.4 Relative linearity of $R$ and $c$
    2.4.1 Quantifying the linearity
    2.4.2 Linearity for constant-conductivity models
    2.4.3 Linearity for general models
  2.5 Alternative formulations
    2.5.1 Alternative choices for model and response
    2.5.2 The similitude equation as a basis for model-norm inversions
3 $l_2$ model norm construction
  3.1 Introduction
  3.2 Linear inverse theory
    3.2.1 Smallest model
    3.2.2 Spectral expansion
    3.2.3 Smallest-deviatoric model
    3.2.4 Flattest model
  3.3 The linearized inversion algorithm
    3.3.1 Numerical implementation
    3.3.2 Non-linear considerations
  3.4 Examples of model norm construction
    3.4.1 Flattest model construction
    3.4.2 Smallest-deviatoric model construction
4 $l_1$ model norm construction
  4.1 Introduction
  4.2 Linear inversion
    4.2.1 Minimum-variation model
    4.2.2 Smallest-deviatoric model
  4.3 The linearized inversion algorithm
  4.4 Examples of $l_1$ model norm construction
    4.4.1 Minimum-variation model construction
    4.4.2 Smallest-deviatoric model construction
5 Appraisal using extremal models of bounded variation
  5.1 Introduction
  5.2 Formulating the variation bound
    5.2.1 Method 1
    5.2.2 Method 2
  5.3 Linear appraisal example
  5.4 The linearized appraisal algorithm
  5.5 Appraisal of MT responses
    5.5.1 Synthetic MT example
    5.5.2 MT field data example
6 Non-linear appraisal using simulated annealing
  6.1 Introduction
  6.2 Simulated annealing
    6.2.1 Statistical mechanics
    6.2.2 The Metropolis algorithm
    6.2.3 Combinatorial optimization using simulated annealing
  6.3 The simulated annealing appraisal algorithm
  6.4 Appraisal examples
7 An application to MT monitoring of earthquake precursors
  7.1 Introduction
  7.2 Geological and tectonic setting
  7.3 Field experiment
  7.4 Temporal change in responses
  7.5 Temporal change required in conductivity models
  7.6 Regional interpretation
8 A modified linearized inversion algorithm
  8.1 Introduction
  8.2 Correcting the linearization
  8.3 A practical inversion algorithm
9 Summary and discussion
References
Appendix A The absolutely flattest model

List of Tables

2.1 Summary of cases considered in linearity study of MT responses $R$ and $c$
3.1 The true conductivity model for the synthetic MT example
3.2 Summary of model attributes for the $l_2$ norm inversion shown in Fig. 3.1
4.1 Summary of model attributes for the $l_1$ norm inversion shown in Fig. 4.1
6.1 Summary of bounds for $\bar{\sigma}$ computed using simulated annealing and linearized inversion for the synthetic MT example
8.1 Summary of misfits using the correction scheme for the synthetic MT example with $m(z) = \sigma(z)$
8.2 Summary of misfits using the correction scheme for the synthetic MT example with $m(z) = \log\sigma(z)$

List of Figures

2.1 Relative linearity of MT responses $R$ and $c$ for constant-conductivity models
2.2 Relative linearity of $R$ and $c$ for surface-layer true model and constant-conductivity starting models
2.3 Relative linearity of $R$ and $c$ for surface-layer true model and constant-conductivity starting models as a function of $\sigma_0$ and $T$
2.4 Relative linearity of $R$ and $c$ for surface-layer true and starting models
2.5 Relative linearity of $R$ and $c$ for surface-layer true and starting models as a function of $\sigma_0$ and $T$
2.6 Relative linearity of $R$ and $c$ for surface-layer true and starting models (different layer thickness)
2.7 Relative linearity of $R$ and $c$ for surface-layer true and starting models (different layer thickness) as a function of $\sigma_0$ and $T$
2.8 Relative linearity of $R$ and $c$ for positive-gradient true model and constant-conductivity starting models
2.9 Relative linearity of $R$ and $c$ for positive-gradient true model and constant-conductivity starting models as a function of $\sigma_0$ and $T$
2.10 Relative linearity of $R$ and $c$ for positive-gradient true and starting models
2.11 Relative linearity of $R$ and $c$ for positive-gradient true and starting models as a function of $\sigma_0$ and $T$
2.12 Relative linearity of $R$ and $c$ for positive-gradient true and starting models (different surface value)
2.13 Relative linearity of $R$ and $c$ for positive-gradient true and starting models (different surface value) as a function of $\sigma_0$ and $T$
3.1 Dependence of $\chi^2$ misfit on target (linear) misfit $\chi_l^2$
3.2 Sequence of models produced in $l_2$ flattest model inversion
3.3 The $l_2$ flattest model for $m(z) = \sigma(z)$ and $f(z) = \log z + z_0$
3.4 The $l_2$ flattest model for $m(z) = \log\sigma(z)$ and $f(z) = \log z + z_0$
3.5 The $l_2$ flattest model for $m(z) = \log\sigma(z)$ and $f(z) = \log z + z_0$ and weighting $w(z)$ which allows structural variations in narrow zones
3.6 The effect of data errors on $l_2$ flattest model construction
3.7 The $l_2$ flattest and absolutely flattest models
3.8 The $l_2$ flattest models for the LITHOPROBE MT data set
3.9 The $l_2$ smallest-deviatoric models for $m(z) = \sigma(z)$
3.10 Approximate appraisal using weighted $l_2$ model-deviation norm
4.1 Sequence of models produced in $l_1$ minimum-variation model inversion
4.2 Computation time for LP solution at each iteration
4.3 The $l_1$ minimum-variation model for $m(z) = \log\sigma(z)$
4.4 Best-fitting and $l_1$ minimum-structure models
4.5 The $l_1$ minimum-variation model for weighting function $w_i = 5$ for $z_i < 800$ m and $w_i = 1$ for $z_i > 800$ m
4.6 The effect of data errors on $l_1$ minimum-variation model construction
4.7 The effect of data outliers on $l_1$ and $l_2$ minimum-structure models
4.8 The $l_1$ minimum-structure model for the LITHOPROBE MT data set
4.9 The $l_1$ smallest-deviatoric and minimum combined-norm models
5.1 Funnel function lower and upper bounds for the linear test case
5.2 Constructed extremal models for the linear test case
5.3 Percent improvement for $V_1 = \infty$, $V_2 = 2.0$ for the linear test case
5.4 Lower and upper bounds as a function of variation $V$ for the linear test case
5.5 The $l_1$ minimum-variation model for the linear test case
5.6 Normalized bound width as function of $\Delta$ and $V$ for the linear test case
5.7 Funnel function bounds for $\bar{\sigma}(z_0, \Delta)$ for the synthetic MT example
5.8 Extremal models of unbounded variation for the synthetic MT example
5.9 Extremal models with variation bound $V_2 = 0.21$ S/m for the synthetic MT example
5.10 Percent improvement with $V_1 = \infty$, $V_2 = 0.21$ S/m for the synthetic MT example
5.11 Comparison of upper bound for $\bar{\sigma}(z_0 = 4000)$ and lower bound for $\bar{\sigma}(z_0 = 8000)$ for the synthetic MT example
5.12 Extremal models for $\bar{\sigma}(z_0 = 8000, \Delta = 4000)$ for the synthetic MT example
5.13 The effect of data errors on computed bounds
5.14 Extremal models which maximize $\bar{\sigma}$ for the apparent low conductivity region 2000-7000 m depth for the LITHOPROBE MT data set
5.15 Extremal models which minimize $\bar{\sigma}$ for the apparent high conductivity region 20000-30000 m depth for the LITHOPROBE MT data set
5.16 Extremal models which maximize $\bar{\sigma}$ for the apparent high conductivity region 20000-30000 m depth for the LITHOPROBE MT data set
6.1 Comparison of upper and lower bounds from simulated annealing and linearized inversion, synthetic MT example
6.2 Comparison of extremal models (minimization) from simulated annealing and linearized inversion, synthetic MT example
6.3 Comparison of extremal models (maximization) from simulated annealing and linearized inversion, synthetic MT example
6.4 Comparison of extremal models (maximization) from simulated annealing and linearized inversion, LITHOPROBE MT data set
7.1 Tectonic map of earthquake precursor study area on Vancouver Island
7.2 Change in observed responses at site BRF between 1988 and 1989
7.3 Minimum-structure models for BRF 1988 data set
7.4 Smallest-deviatoric models for site BRF
7.5 Minimum-structure models for sites BRF, UBC and HLN
8.1 One-step inversion for exact linearization
8.2 Correction scheme inversion for $m(z) = \sigma(z)$
8.3 Correction scheme inversion for $m(z) = \log\sigma(z)$
8.4 The $\chi^2$ misfit as a function of (linearized) inversion iteration number
8.5 The $l_2$ flattest models constructed via standard and corrected inversions
8.6 The $\chi^2$ misfit as a function of (linearized) inversion iteration number for the LITHOPROBE MT data set
8.7 The $l_2$ flattest models constructed via standard and corrected inversions for the LITHOPROBE MT data set

Acknowledgements

I would like to sincerely thank Dr. Doug Oldenburg for his valuable advice and encouragement in all aspects of this work. I would also like to thank Yaoguo Li, Drs Rob Ellis and Ken Whittall, and Dave Aldridge for many helpful and interesting discussions. I am grateful to Dr. Rob Ellis and John Amor for their help and expertise with the computing facilities, and to Drs P. Loewen, R. M. Ellis and R. D. Russell for constructive reviews of the manuscript. D. R. Auld and Dr. L. K. Law of the Pacific Geoscience Centre, Sidney, British Columbia, provided the MT data from the Vancouver Island earthquake precursor study and were the source of much useful information on the subject. Funding for the research work in this thesis was supplied by an NSERC post-graduate scholarship and NSERC research grant 5-84270. Finally, I would like to sincerely thank my friends in the Department of Geophysics and Astronomy who have made my studies there so enjoyable.

Chapter 1

Introduction

Many geophysical methods rely on the interpretation of measurements made at the Earth's surface to infer the subsurface distribution of various physical parameters. Seismic reflection and refraction, gravity and electromagnetic methods are examples of this procedure. Determining useful information about the Earth from a set of surface measurements is the realm of geophysical inverse theory.
This thesis considers the application and development of inversion procedures to the method of magnetotellurics (MT), which uses surface measurements of naturally-occurring electromagnetic fields to determine the electrical conductivity distribution within the Earth.

The first section of this chapter defines the geophysical inverse problem. Section 1.2 considers how the geo-electrical conductivity may be used to characterize the composition and physical state of the Earth's interior. Section 1.3 describes the magnetotelluric method, reviews pertinent applications of inverse theory and considers the assumption of a one-dimensional (1-D) Earth model. Finally, Section 1.4 provides an overview of the work presented in this thesis.

1.1 The geophysical inverse problem

This section briefly defines the geophysical inverse problem. Thorough reviews of the subject are given by Parker (1977a), Tarantola & Valette (1982a, b) and Menke (1984).

In a general geophysical problem, the measured responses $e$ are related to the Earth model $m$ according to

$$e(u) = F(m, u), \tag{1.1.1}$$

where the physics of the problem is represented by the mapping $F: m \to e$ and $u$ is an independent space or time variable relevant to the experiment. The problem of predicting the data $e$ given a model $m$ is known as the forward or direct problem. The corresponding inverse problem may be defined as: given data $e$, determine the model $m$ which gave rise to these measurements. Unfortunately, it is rarely possible to solve this problem unambiguously. In practice, only a finite number of responses are measured, so the data may be represented as $\{e_j = e(u_j);\ j = 1, \ldots, N\}$. Since the model solution sought often represents an (infinite-dimensional) function of position while the set of responses is finite ($N$-dimensional), the mapping $F: m \to F(m)$ is not one-to-one, and the inverse mapping is non-unique. In addition to being incomplete, practical data are always inexact so that (1.1.1) becomes

$$e_j = F_j(m) + \epsilon_j, \qquad j = 1, \ldots, N, \tag{1.1.2}$$

where $F_j(m) = F(m, u_j)$ and $\epsilon_j$ represents the (unknown) measurement errors. It is unjustified to seek a solution to the inverse problem (1.1.2) which fits the data to a greater level of precision than the uncertainty of the measurements; this compounds the problem of non-uniqueness. The inevitability of data errors and simplifying assumptions (e.g. representing the Earth by a 1-D model) also raises the question of existence for the inverse problem: for a given set of responses it is possible that no model exists which adequately reproduces the data. In general, if there exists one model which fits the responses, then infinitely many such models exist. These models can be diverse and difficult to characterize. Given the problems of existence and non-uniqueness, the goal of inverse theory is to use the observed data to infer information about the true model.

There are several general approaches to overcoming the inherent non-uniqueness of the inverse problem, as outlined by Oldenburg (1984). One approach is model construction, whereby a solution $m$ is sought which adequately reproduces the data. Another approach is that of appraisal. Rather than constructing one or more of the infinite number of possible model solutions, the goal of appraisal is to calculate properties of $m$ which all acceptable models (including the true model) share. A third approach to inverse theory is inference, whereby the measured data are used to predict other functionals of the model. Backus and Gilbert have developed a general formalism for treating inverse problems which are linear or can be linearized, such as the MT problem; their work provides the foundation for many of the applications developed here.

This thesis considers the inverse problem of determining the Earth conductivity distribution from magnetotelluric measurements.
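The non-uniqueness described in this section can be illustrated with a minimal sketch based on a small linear toy problem, not the MT problem itself. The 2 x 3 operator `G`, the models `m_a` and `m_b`, and the helper names are hypothetical and chosen only for this example: two models that differ by a vector annihilated by `G` predict identical data.

```python
# Toy linear illustration of non-uniqueness (an assumption for illustration,
# not the MT forward problem): two data, three model parameters.
G = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]


def forward(G, m):
    """Predicted data e_j = sum_k G[j][k] * m[k]."""
    return [sum(g * mk for g, mk in zip(row, m)) for row in G]


def add_scaled(m, n, t):
    """Return the model m + t * n."""
    return [mk + t * nk for mk, nk in zip(m, n)]


m_a = [1.0, 2.0, 3.0]
null_vec = [1.0, -1.0, 1.0]            # G maps this vector to zero data
m_b = add_scaled(m_a, null_vec, 5.0)   # a very different model

print("data from m_a:", forward(G, m_a))
print("data from m_b:", forward(G, m_b))
```

Adding any multiple of `null_vec` to `m_a` leaves the predicted data unchanged, which is the discrete analogue of the statement that if one model fits the responses then infinitely many do.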
Section 1.3 describes the MT method and reviews pertinent applications of inverse theory; but first, properties which influence the conductivity are briefly considered.

1.2 The geo-electrical conductivity

The electrical conductivity of the Earth is influenced by a number of factors and therefore may be considered indicative of a variety of geophysical conditions and processes. Conductivity distributions inferred from MT experiments have been used to delineate mineral deposits, geothermal areas and potential petroleum reservoirs, and to investigate the temperature distribution, composition, structure, and tectonic features of the Earth.

Electric current flow in the Earth is due to three types of conduction: ohmic, electrolytic and dielectric polarization. Ohmic (electronic) conduction dominates in materials containing free electrons, such as the metals. Electrolytic conduction is due to the physical transport of ions in fluids (mostly water) contained in rock pores. Displacement currents are due to the dielectric polarization of atoms, ions or molecules in the presence of time-varying electric fields. At the frequencies employed in MT studies, displacement currents are negligible.

Of the commonly measured physical properties of rocks, conductivity is the quantity which exhibits the greatest variation. Rocks and minerals are classified as good conductors for conductivities of $1$ to $10^{8}$ S/m (this group includes metals, graphite, sulphides and magnetite), intermediate conductors for conductivities of $10^{-7}$ to $1$ S/m (most ores, oxides and porous rocks containing water), or poor conductors (insulators) for conductivities of less than $10^{-7}$ S/m (most common rock-forming minerals, silicates, phosphates and carbonates). Good and intermediate conductors can often be detected directly using MT methods for resource prospecting. Telford et al. (1976) have compiled tables of characteristic resistivities (the reciprocal of conductivity) of minerals and rocks.
At shallow crustal depths, most commonly-occurring minerals are themselves poor conductors and current flow is due predominately to electrolytic conduction within fluid-filled pores in the rock. The conductivity of porous rock depends on the interstitial fluid saturation and conductivity, and the volume and connectivity of the pores. Water conductivity can vary considerably depending on the amount and conductivity of dissolved minerals. Temperature may also affect the fluid conductivity by influencing the solubility and mobility of ions. High conductivity anomalies are often associated with geothermal areas and MT has been successfully used in geothermal prospecting (e.g. Hoover et al. 1978; Sandberg & Hohmann 1982). The geometrical arrangement of the pores themselves does not generally have a pronounced effect on the electrolytic conductivity, but can cause the conductivity to be anisotropic, a characteristic of stratified rock (Telford et al. 1976). In general, sedimentary rocks are more conductive than igneous or metamorphic rocks, with the conductivity depending primarily on porosity and fluid content, so that MT methods can be used to delineate possible petroleum reservoirs.

Noritomi (1981) and the Electromagnetic Research Group For The Active Fault (1982, 1983) have found that active fault zones often correspond to regions of high crustal conductivity and ascribe this to water contained in the fault-fractured zone to depths greater than 10 km. Yukutake (1984) notes that the groundwater which causes this high conductivity may also allow the fault to slip more easily. Changes in crustal conductivity have been observed to occur before earthquakes (e.g. Barsukov 1972; Mazzella & Morrison 1974; Qian et al. 1983). The process by which these changes occur is not completely understood, but is believed to be caused by the opening and extension of micro-cracks in crustal rocks, due to large stresses, which subsequently fill with water.
MT studies have been carried out to monitor crustal conductivity as a possible precursor in earthquake prediction (e.g. Honkura et al. 1976; Kurtz & Niblett 1978, 1983).

In the lower crust and upper mantle the geo-electrical conductivity is influenced by a variety of geophysical factors such as temperature, heat flow, partial melting, and petrology (e.g. Gough 1974; Garland 1975; Ádám 1980). Thus, conductivity models derived from MT or other electromagnetic measurements can be used to investigate large-scale properties of the Earth. For example, a pronounced increase in electrical conductivity is observed when water-containing rocks are heated to 500-700 °C, which may be attributed to the onset of fractional melting; hence, conductivity anomalies may be useful in identifying zones of partial melt and dehydration. Oldenburg (1981) and Oldenburg, Whittall & Parker (1984) used conductivity models derived from seafloor MT measurements to infer temperature and partial melt profiles for three locations on the Pacific plate. The depths at which large amounts of partial melt were predicted correlated well with seismic low-velocity zones computed by inverting Rayleigh wave dispersion diagrams. Filloux (1980) and Oldenburg (1981) noted that the depth to the apparent zone of partial melt appears to increase with the age of the lithosphere.

Conductivity models may also be used to investigate tectonic features. For example, Kurtz et al. (1986, 1990) interpreted an MT survey across central Vancouver Island on the Pacific coast of Canada in terms of a conducting zone at depths greater than 20 km. This conductive layer correlated well with a strong seismic reflective zone which is believed to delineate the top of the subducting Juan de Fuca plate (Green et al. 1986). The conducting zone is believed to result from cracks and pores filled with saline fluids supplied by water subducted with the oceanic crust and by dehydration reactions.
Conductivity anomalies have been associated with a number of other subduction zones (Rokityansky 1982). Also, according to Rokityansky, conductivity anomalies have been discovered in all rifts (spreading centres) which have been investigated by deep electromagnetic study. The high conductivity is believed to be due to a zone of fractional melting, which agrees with the hypothesis of the uplift of hot mantle material in rifts.

Many of the studies cited in this section are based on one-dimensional (1-D) interpretations of MT measurements in which the conductivity is assumed to vary only with depth. In this case there are two possible geophysically-realistic representations of the conductivity profile. In shallow regions where abrupt conductivity changes of several orders of magnitude are observed at interfaces between different rock types or at the edges of water-saturated zones, layered or discontinuous models may be required. At greater depths, phase changes could cause discontinuous changes in the conductivity; however, in some depth ranges it is believed that the conductivity is largely controlled by temperature, in which case smoothly varying profiles are more likely (Parker 1983, 1984). Conductivity profiles of both types are considered in this thesis.

1.3 The magnetotelluric (MT) method

1.3.1 EM induction in the Earth

Natural electromagnetic (EM) fields measured at the surface of the Earth consist of two components: a primary component of origin external to the Earth, and a secondary or internal component which arises due to telluric currents induced in conductive regions of the Earth by the primary field. The penetration depth of the primary field, and therefore of the telluric currents, depends on the period of oscillation and on the conductivity distribution within the Earth, with greatest penetration for long periods and low conductivities.
The ratio of (orthogonal) horizontal components of the electric and magnetic fields measured at the surface depends primarily on the subsurface conductivity and is relatively insensitive to the properties of the source. In the magnetotelluric method, pioneered by Tikhonov (1950) and Cagniard (1953), this ratio, known as the impedance, is measured as a function of period. Impedances measured at progressively longer periods provide information about the conductivity to progressively greater depths. Inferring the Earth conductivity distribution from such measurements defines the MT inverse problem. This section presents a brief overview of the MT method; a comprehensive treatment of the subject is given in the monograph by Rokityansky (1982).

Many natural sources of geomagnetic variations contribute to the EM spectrum exploited by the MT method. Since impedance measurements constrain the conductivity only over the depth of penetration of the inducing fields, measurements over a broad band of periods $T$ are required to investigate the conductivity distribution of the Earth. Atmospherics which are generated by lightning storms and propagate in the Earth-ionosphere waveguide are an important source for short period variations of $10^{-5}$ to 0.2 s, with amplitude peaks at distinct periods known as the Schumann resonance periods. EM fields for this range of periods typically penetrate to depths of about one kilometre (controlled sources are also sometimes employed at these periods). Micropulsations in the magnetosphere dominate the spectrum for periods of 0.2 to 1000 s; these fields typically penetrate from one to tens of kilometres. Regular daily variations in the geomagnetic field including solar, lunar and diurnal variations and their harmonics have periods of 0.2 to 1 day. Magnetic storms and their low-frequency (recovery) components provide rich spectra at periods of hours to days.
Impedance measurements at these periods may be used to investigate the conductivity of the lower crust and upper mantle. The rotation of the sun and semi-yearly and yearly modulations provide variations with periods of the order of months. The longest period (extraterrestrial) variations known are due to the 11-year solar cycle with associated penetration depths of up to 1200-1800 km.

In the MT method, it is assumed that the geomagnetic variation fields at the surface of the Earth may be represented in terms of a plane wave decomposition. Much of the EM energy incident at the Earth's surface is reflected, but a small amount is transmitted. Even at large angles of incidence (with respect to the normal) the transmitted EM waves are refracted towards the normal by the extreme conductivity contrast between the air and the Earth, and diffuse vertically into the Earth inducing telluric currents in conductive regions. The plane wave assumption requires that the wavelengths associated with horizontal field fluctuations be large compared with the depth of penetration of the fields, and is generally valid except near the auroral zone or in cases of local sources such as nearby lightning or magnetic storms.

To derive the governing differential equation for EM induction, assume the Earth to be a linear, isotropic medium and consider a Cartesian coordinate system defined with origin at the surface and the positive $z$ axis vertically downward. The EM field vectors are related by Maxwell's equations:

$$\nabla \times \mathbf{E} = -\partial_t \mathbf{B}, \tag{1.3.1}$$
$$\nabla \times \mathbf{H} = \mathbf{J} + \partial_t \mathbf{D}, \tag{1.3.2}$$
$$\nabla \cdot \mathbf{D} = \rho_f, \tag{1.3.3}$$
$$\nabla \cdot \mathbf{B} = 0, \tag{1.3.4}$$

and the constitutive relations

$$\mathbf{D} = \epsilon \mathbf{E}, \qquad \mathbf{B} = \mu \mathbf{H}, \qquad \mathbf{J} = \sigma \mathbf{E}, \tag{1.3.5}$$

where $\mathbf{E}$ and $\mathbf{H}$ are the electric and magnetic field intensities, $\mathbf{B}$ is the magnetic induction, $\mathbf{D}$ is the electric displacement, $\mathbf{J}$ is the electric current density, $\rho_f$ is the free-charge density, $\sigma$ is the conductivity, and $\epsilon$ and $\mu$ are the permittivity and magnetic permeability.
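The relative size of the two terms on the right-hand side of (1.3.2) can be gauged with a short calculation. The sketch below assumes harmonic fields of angular frequency $\omega = 2\pi/T$, so that the constitutive relations (1.3.5) give $|\partial_t \mathbf{D}|/|\mathbf{J}| = \omega\epsilon/\sigma$. The function name, the sample periods and conductivities, and the relative permittivity parameter `eps_r` (vacuum value by default) are illustrative assumptions, not values taken from this thesis.

```python
import math

EPS0 = 8.8541878128e-12  # permittivity of free space (F/m)


def displacement_to_conduction_ratio(period_s, sigma, eps_r=1.0):
    """Return omega * eps / sigma for harmonic fields of period period_s.

    period_s -- period T in seconds
    sigma    -- conductivity in S/m
    eps_r    -- relative permittivity (assumed 1 unless stated otherwise)
    """
    omega = 2.0 * math.pi / period_s
    return omega * eps_r * EPS0 / sigma


# Illustrative (assumed) periods and conductivities
for period_s in (1e-3, 1.0, 1e3):
    for sigma in (1e-4, 1e-2, 1.0):
        ratio = displacement_to_conduction_ratio(period_s, sigma)
        print(f"T = {period_s:g} s, sigma = {sigma:g} S/m: ratio = {ratio:.3e}")
```

For these sample values the ratio is far below one; it grows as the period shortens or the conductivity falls, so the comparison is most worth checking at the shortest periods over very resistive ground.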
As previously mentioned, at the frequencies employed in MT studies the displacement current $\partial_t \mathbf{D}$ is negligibly small and may be omitted (the quasi-static approximation). Also, in most common rocks the magnetic permeability does not vary appreciably from its value in vacuum, $\mu_0$, and may be set identically to this value, i.e. $\mu = \mu_0$. Assuming a harmonic time dependence of $e^{i\omega t}$, where $\omega = 2\pi/T$ is the angular frequency, and making use of the constitutive relations (1.3.5), Maxwell's equations (1.3.1) and (1.3.2) become

$\nabla \times \mathbf{E} = -i\omega\mu_0 \mathbf{H}$, (1.3.6)

$\nabla \times \mathbf{H} = \sigma \mathbf{E}$. (1.3.7)

The 1-D induction equations are derived by assuming that the conductivity distribution and EM fields do not vary in the $x$ or $y$ directions, i.e. $\sigma = \sigma(z)$ and plane EM waves propagating in the $z$ direction. There is no loss in generality in assuming that the EM fields are linearly polarized as $\mathbf{E} = (E_x, 0, 0)e^{i\omega t}$, $\mathbf{H} = (0, H_y, 0)e^{i\omega t}$. Equation (1.3.6) becomes

$\partial_z E_x = -i\omega\mu_0 H_y$, (1.3.8)

and taking the curl of (1.3.6) and using (1.3.7) leads to

$\partial_z^2 E_x = i\omega\mu_0 \sigma E_x$. (1.3.9)

Equation (1.3.9) is the governing differential equation for 1-D induction. The boundary conditions are that $E_x(z) \to 0$ as $z \to \infty$ (the radiation condition) and that the surface field may be set equal to an arbitrary constant $E_x(z=0) = E_0$. Equation (1.3.9) can be solved for $E_x(z)$ analytically for layered conductivity distributions $\sigma(z)$ or numerically for continuous $\sigma(z)$; this corresponds to the forward solution for the 1-D induction problem. Equation (1.3.8) may be used to convert field measurements of $H_y$ into $\partial_z E_x$. The fundamental response or transfer function of EM induction is the impedance; in the 1-D case this may be defined as

$Z = E_x/H_y$. (1.3.10)

The impedance quantifies the effect of the conductivity distribution on the surface fields. Other response functions involving ratios of $E_x$ and $H_y$ are also commonly used, e.g. the admittance (reciprocal of the impedance), $Y = H_y/E_x$.
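The analytic forward solution for a layered model mentioned above can be sketched numerically. The following is a minimal illustration, not the thesis algorithm: it assumes the standard impedance recursion obtained by writing $E_x$ in each uniform layer as a sum of decaying and growing exponentials $e^{\mp kz}$ with $k = \sqrt{i\omega\mu_0\sigma}$, and by requiring $E_x$ and $H_y$ to be continuous at layer interfaces. The function name `layered_impedance` and its arguments are hypothetical. For a uniform model the recursion should reduce to $Z = \sqrt{i\omega\mu_0/\sigma}$, which follows directly from (1.3.8) and (1.3.9).

```python
import numpy as np

MU0 = 4e-7 * np.pi  # magnetic permeability of vacuum (H/m)


def layered_impedance(sigma, thickness, period):
    """Surface impedance Z = Ex/Hy of a stack of uniform layers.

    Hypothetical helper for illustration only.
    sigma     : conductivities (S/m), top layer first; the last entry is
                the terminating halfspace
    thickness : layer thicknesses (m), one fewer entry than sigma
    period    : period T (s); time dependence exp(+i*omega*t) is assumed
    """
    sigma = np.asarray(sigma, dtype=float)
    thickness = np.asarray(thickness, dtype=float)
    if thickness.size != sigma.size - 1:
        raise ValueError("need one thickness per layer above the halfspace")

    omega = 2.0 * np.pi / period
    # Principal square root gives Re(k) > 0, so exp(-k z) decays with depth.
    k = np.sqrt(1j * omega * MU0 * sigma)
    zeta = 1j * omega * MU0 / k  # impedance of a uniform halfspace of each sigma

    # Start in the terminating halfspace and propagate upward to the surface.
    Z = zeta[-1]
    for j in range(sigma.size - 2, -1, -1):
        t = np.tanh(k[j] * thickness[j])
        Z = zeta[j] * (Z + zeta[j] * t) / (zeta[j] + Z * t)
    return Z


# A halfspace, and the same halfspace split into artificial layers.
Z_half = layered_impedance([0.01], [], 10.0)
Z_split = layered_impedance([0.01, 0.01, 0.01], [500.0, 2000.0], 10.0)
```

Splitting a uniform halfspace into artificial layers should leave the impedance unchanged, which gives a simple consistency check on the recursion.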
As the simplest case of EM induction, consider an earth conductivity model which consists of a halfspace of constant conductivity $\sigma$. In this case it is straightforward to verify that

$E_x(z) = E_0 e^{-(1+i)z/\delta}$, $\quad H_y(z) = (1 - i)\sqrt{\sigma/(2\omega\mu_0)}\; E_x(z)$ (1.3.11)

are solutions to (1.3.8) and (1.3.9), where

$\delta(\omega) = \sqrt{2/(\omega\mu_0\sigma)}$ (1.3.12)

is known as the skin depth. The solutions (1.3.11) correspond to exponentially attenuated sinusoids with $\delta$ representing the depth over which the fields are attenuated by a factor of $1/e$. Equations (1.3.11) may be solved for the conductivity in terms of the observed EM fields to yield

$\sigma = i\omega\mu_0 \left[H_y/E_x\right]^2 = i\omega\mu_0/Z^2$. (1.3.13)

In more general cases when $\sigma$ is not a constant, the magnitude of (1.3.13), known as the apparent conductivity $\sigma_a(\omega)$, represents an average of the conductivity distribution over the depth of penetration of the EM fields. The phase $\phi(\omega)$ of $Z$ also provides information about variations in the conductivity with depth. $\sigma_a(\omega)$ and $\phi(\omega)$ are commonly used as MT response functions, and plots of $\sigma_a$ (or $\rho_a = 1/\sigma_a$) and $\phi$ as a function of $\omega$ provide an approximate idea of the behaviour of the conductivity as a function of depth. However, before interpretations regarding the conductivity distribution are possible, the MT inverse problem must be solved. Despite the simplicity of the mathematical model of the Earth and the governing differential equation (1.3.9), the responses depend on the conductivity in a non-linear manner and the inverse problem is not simple nor is it completely understood.

1.3.2 The MT inverse problem

This section briefly considers some of the methods which have been applied to the 1-D magnetotelluric inverse problem. The questions of existence and uniqueness of solutions and methods of model construction, appraisal and inference have been considered by many authors. This review is not intended to be complete, but rather to present the context for the work in this thesis.
A thorough survey of applications of inverse theory to 1-D MT is presented by Whittall & Oldenburg (1990). Parker (1980, 1983, 1984) also reviews this problem.

A number of authors have considered the question of existence for the 1-D MT inverse problem. Weidelt (1972) derived a set of necessary conditions for the existence of a 1-D model solution involving an MT response functional, formed from $E$ and $\partial_z E$ evaluated at $z = 0$, known as the complex c-response or inductive scale length (Schmucker 1970; Weidelt 1972). However, the conditions involve derivatives of the response, which cannot be calculated exactly for discrete data, and so the conditions are difficult to apply. Weidelt (1986) overcame this difficulty by deriving a set of $2N$ necessary and sufficient conditions for existence in terms of determinants involving (complex) responses measured at $N$ frequencies. These sets of conditions represent important theoretical results for the MT problem. Unfortunately, practical (noisy) responses cannot be analyzed in this manner. In general, it is likely that no 1-D conductivity model $\sigma(z)$ exists which is exactly consistent with a given set of inaccurate data, nor is it justified to seek such a model. Rather, the question becomes one of the existence of a model $\sigma(z)$ which adequately reproduces the data according to some misfit criteria. Parker (1980) shows that when no model exactly fits the data, the model which minimizes the $\chi^2$ misfit consists of delta functions of infinite conductivity, but finite conductance, separated by insulating zones of zero conductivity. Parker calls this the $D^+$ class of models. If the misfit realized by the $D^+$ model is greater than the specified allowable misfit, then the data are incompatible with a 1-D model; thus, constructing a $D^+$ model with an acceptable misfit provides a necessary and sufficient condition for the existence of a 1-D solution. Parker & Whaler (1981) present a practical algorithm for constructing $D^+$ models.
Tikhonov (1965) proved a one-to-one correspondence between every (precise) complete data set and piecewise-analytic conductivity profiles, thereby demonstrating that, in principle, surface measurements are sufficient to uniquely determine the true conductivity $\sigma(z)$. Parker (1983) demonstrated that knowledge of precise responses on any open interval $\omega_1 < \omega < \omega_2$, or at an infinite number of equally-spaced, discrete frequencies is also sufficient to uniquely determine $\sigma(z)$. Other authors have presented uniqueness proofs valid for different types of conductivity models. However, all MT uniqueness results require an infinite number of precise data. In contrast, for all practical data sets consisting of a finite number of inaccurate responses the inverse problem is always non-unique and there are two alternatives: either no model exists which adequately reproduces the responses, or an infinite number of acceptable $\sigma(z)$ exist.

The problem of constructing 1-D conductivity models which adequately reproduce a set of measured MT responses has received considerable attention (see Whittall & Oldenburg 1990 for a review and comprehensive reference list). Construction methods can generally be categorized as asymptotic inversions, exact (non-linear) inversions and linearized inversions. A variety of heuristic inversion schemes have been developed which consider asymptotic forms of the induction equations based on the physical process of EM diffusion. Although these are approximate inversions which are not guaranteed to reproduce the responses, they are straightforward to implement and are still used routinely in more sophisticated multi-dimensional interpretations. Schmucker (1987) reviews and compares a number of asymptotic methods including the well-known Niblett-Bostick transform (Niblett & Sayn-Wittgenstein 1960) and $\rho^*$-$z^*$ transform (Schmucker 1970).

Exact inversion methods which completely solve the non-linear MT inverse problem are generally two-stage algorithms.
The first stage involves completing the measured data to obtain responses at all frequencies; the second (non-linear) stage maps the completed data to a uniquely-determined conductivity model. Exact inversions are not iterative, so a starting model is not required and there are no problems concerning convergence. A variety of exact inversion schemes have been developed involving different approaches to the data completion and inverse mapping stages. For instance, the $D^+$ model construction of Parker (1980) and Parker & Whaler (1981) finds a least-squares fit of the measured responses to a partial fraction representation, which essentially completes the data. This partial fraction is then used to construct a continued fraction representation which may be interpreted in terms of a delta-comb function conductivity model. A similar procedure may be used to construct $H^+$ conductivity models which consist of uniform layers, each with a constant value of $\sigma h^2$, where $h$ is the layer thickness.

In exact inversion methods, once the completed response functional is constructed, the conductivity model is uniquely determined. Therefore, any flexibility in the model construction must be built into the data completion procedure. For instance, Weidelt (1972) transformed the data to an MT impulse response function analogous to the seismic impulse response, and inverted this response using Gel'fand-Levitan theory to obtain a unique conductivity profile. Whittall & Oldenburg (1986) and Whittall (1987) extended Weidelt's method by constructing the impulse response using linear inversion techniques; this allowed the construction of diverse, minimum-structure impulse responses. The final conductivity model constructed approximately minimizes a related structural measure.

In the linearized approach to model construction, an approximate linear relationship is derived between a perturbation $\delta\sigma$ to an arbitrary starting model $\sigma_0$ and the corresponding change in the responses.
This equation may be inverted using linear techniques to yield a solution for the model perturbation and the model is updated. This procedure is repeated iteratively until an acceptable model is achieved. Linearized inversions may be divided into two categories. Parametric or under-parametrized inversions restrict the space of possible model solutions by representing the model by a small number of parameters. This reduces or eliminates the non-uniqueness of the inversion; however, the solution is strongly dependent on the parametrization. Alternatively, over-parametrized inversions adopt a large number of model parameters to approximate an arbitrary function of depth. Backus-Gilbert linear inverse theory (Backus & Gilbert 1967, 1968, 1970) may be used to construct model perturbations which minimize some functional of $\delta\sigma$ and/or the data misfit. This approach has been used by Parker (1970), Oldenburg (1979) and Hobbs (1982). Importantly, Oldenburg (1983) transformed the linearized equations so that a functional of the model itself (not the model perturbation) is minimized at each iteration; this allows the construction of models of a specific character. Constable, Parker & Constable (1987), Smith & Booker (1988) and Dosso & Oldenburg (1989) have applied this method to construct minimum-structure conductivity models. This approach has a major advantage over exact (non-linear) inversions in that a structural measure of the model is minimized directly. The disadvantage is that, as with any iterative linearized method, it is difficult to verify if a global minimum is achieved.

A number of approaches to the problem of appraisal for the 1-D MT inverse problem have been developed. One approach is to compute unique features associated with a given set of responses. For example, Parker (1982) found the minimum depth to a perfect conductor for practical data, such that the corresponding $D^+$ model achieved an acceptable misfit.
The region below this depth may be considered the zone of total ignorance, since the data do not constrain the conductivity structure here unless additional information is supplied or assumed (e.g. prohibiting non-geophysical models such as the D+ solution). Monte Carlo and related random search techniques, which use forward modelling to test a large number of randomly generated models, have been applied to the inversion and appraisal of MT responses (e.g. Hermance & Grillot 1974; Jones & Hutton 1979; Fischer & Le Quang 1981). Model features may be appraised by generating a large number of acceptable models and estimating the most probable values or bounds for model parameters from this population. The advantage of this method is that no approximations such as linearization are required. However, since such a search can never be exhaustive, the computed bounds must be regarded as approximate at best. In addition, these methods are often not practical due to the low probability of randomly generating models consistent with modern data sets consisting of a large number of highly accurate responses. Finally, the method is strongly dependent on the model parametrization which is usually chosen for computational efficiency rather than to provide a realistic representation of the Earth. Backus & Gilbert (1968, 1970) developed a method of appraisal for linear inverse problems based on generating unique averages of the model from linear combinations of the data. For non-linear problems such as MT, Backus-Gilbert appraisal can be applied via linearization about some constructed model. Unfortunately, in this case the unique averages pertain only to models which are linearly close to the initial model. Oldenburg (1979) constructed a number of different conductivity models by inverting a set of MT responses and found different values for the model average by linearizing about these models. 
Parker (1983) and Oldenburg, Whittall & Parker (1984) have found linearized Backus-Gilbert appraisal to be inadequate for the MT inverse problem. An alternative method of appraisal was developed by Oldenburg (1983), who computed bounds for localized conductivity averages by constructing extremal models which minimize or maximize the conductivity over specified regions. This method of appraisal may be considered an application of the general inference theory of Backus (1970a, b, c; 1972). However, since the extremal models are constructed via linearized inversion, it is difficult to establish whether the computed bounds represent true global (rather than local) extrema. Weidelt (1985) provides a fully non-linear theoretical analysis of extremal models for a small number of accurate MT responses. 1.3.3 Applicability of 1-D inversion In this thesis, methods are developed to construct and appraise 1-D conductivity models by inverting MT measurements. Of course, in general, the conductivity of the Earth can vary in two or three dimensions. Higher-dimensional inversion algorithms are a topic of much current research (see e.g. the review by Oldenburg 1990). Nonetheless, inversions for 1-D structure are still an important source of information for a number of reasons. First, there are geologic regions where lateral variation is small (e.g. sedimentary basins, oceanic lithosphere) and 1-D interpretations are directly applicable. At a sufficiently large scale, the deep geo-electrical structure may often be considered approximately 1-D; this structure can be studied using very long-period responses. Departures from one-dimensionality due to relatively shallow inhomogeneities can often be approximated by a frequency-independent static distortion and only 1-D interpretations are necessary (Larsen 1977). Also, Weidelt (1972) has derived transformations for conductivity profiles and responses between a 1-D flat, Cartesian model and a radially-symmetric model. 
In practice, many MT studies use 1-D modelling to provide rudimentary interpretations of the Earth structure, and Parker's (1980) $D^+$ solution provides a straightforward method to determine the applicability of this assumption. If the conductivity structure is not 1-D, the impedance measurements generalize to a tensor with four components. A number of authors have proposed using rotationally invariant parameters of the impedance tensor (e.g. Berdichevsky & Dimitriev 1976; Ranganayaki 1984) in 1-D inversions in an attempt to minimize multi-dimensional effects. Ingham (1988) and Park & Livelybrooks (1989) used modelling studies of invariant responses to evaluate 1-D inversions in the vicinity of a 3-D heterogeneity. They concluded that the 1-D inversion of these responses provided a good representation of the structure directly beneath the site when the site was located away from a finite, highly-conductive heterogeneity, but erroneous deep structure beneath the anomaly. At sites above or away from a finite resistive heterogeneity, 1-D inversions yielded a good structural representation at shallow and intermediate depths, but erroneous structure at greater depths.

A second reason for developing 1-D inversion algorithms is that 1-D models can be combined to provide good starting models for 2-D or 3-D inversions (Park & Livelybrooks 1989). Also, 1-D inversions can be an integral component of multi-dimensional inversion schemes, as in the Rapid inversion algorithm of Smith (1989), or the AIM (Approximate Inverse Mapping) inversion of Oldenburg & Ellis (1990).

A third reason is that a thorough understanding of the 1-D inverse problem can provide a foundation for solving inverse problems in higher dimensions. This is true both in terms of understanding the inherent non-uniqueness of the inverse problem and in developing practical inversion algorithms.
In particular, it is anticipated that the construction and appraisal methods developed in this thesis for 1-D inversion could be extended to the 2-D case. Finally, Parker (1983) notes that "the usefulness of the (1-D) model can best be judged by the large number of papers in the geophysical literature relying upon it for interpretational purposes and by the almost equally large number devoted to advancing the associated theory." This statement would appear to remain true today.

1.4 Overview of work in this thesis

The purpose of this thesis is to investigate the non-linear magnetotelluric inverse problem. In particular, linearized methods are developed to construct and appraise 1-D conductivity models of the Earth by inverting MT measurements. Linearized methods are generally based on expanding a functional representing the forward problem about some starting model and neglecting the higher-order terms. Chapter 2 considers the linearization of the MT problem in detail. Complete expansions in terms of Fréchet differential series are derived for several choices of MT response and the Fréchet differentiability of these responses is verified. The relative linearity of the responses is quantified by examining the ratio of non-linear to linear terms in order to determine the best choice for a linearized approach. In addition, the similitude equation for MT, suggested by Gómez-Treviño (1987) as an alternative formulation to linearization, is examined and found to be inadequate in that it implicitly neglects first-order terms.

In Chapter 3 an iterative inversion algorithm is developed based on the local linearization of Chapter 2. This algorithm may be used to construct acceptable conductivity models which minimize an $l_2$ norm. The norms considered measure model structure or the deviation of the model from a given base model to produce minimum-structure and smallest-deviatoric models, respectively.
The constructed models are smooth and represent structural variations in terms of continuous gradients in the conductivity. Chapter 4 presents a similar inversion algorithm which may be used to construct minimum-structure or smallest-deviatoric models which minimize an $l_1$ norm. The solutions resemble layered Earth models with structural variations occurring discontinuously at distinct depths and complement the smooth solutions constructed in Chapter 3. The algorithms developed in Chapters 3 and 4 consider the model to be either $\sigma$ or $\log\sigma$, include an arbitrary weighting function in the model norm, and fit the data to a specified level of misfit: these algorithms should provide considerable flexibility in constructing 1-D conductivity models from MT responses.

Model features can be appraised by constructing extremal models which minimize and maximize localized conductivity averages (Oldenburg 1983). These extremal models provide lower and upper bounds for the conductivity average over the region of interest. In Chapter 5, an efficient, robust appraisal algorithm is developed using linear programming to minimize or maximize the conductivity averages. For optimal bounds, it is important that the extremal models are geophysically reasonable. The appraisal method is extended by constraining the total variation to limit unrealistic structure and ensure that the extremal models are plausible. The variation bound can be specified in terms of $\sigma$ or $\log\sigma$.

The extremal models in Chapter 5 are constructed via linearized inversion; therefore, the possibility always exists that the computed bounds represent local rather than global extrema. In Chapter 6 the method of simulated annealing optimization is applied to the problem of constructing extremal models.
Simulated annealing is a Monte-Carlo procedure based on an analogy between the parameters of a mathematical system to be optimized and the particles of a physical system which cools and anneals into a ground-state configuration according to the theory of statistical mechanics. Simulated annealing makes no approximations and is well known for its inherent ability to avoid unfavourable local minima. Although the method is considerably slower than linearized analysis, it represents a general and interesting new appraisal technique which may be used to corroborate results of the linearized approach.

In Chapter 7, the model construction algorithms developed in this thesis are used to analyze MT responses measured at a number of sites on Vancouver Island, Canada. The measurements were made to investigate monitoring of local changes in conductivity as a precursor for earthquakes. MT responses measured at the same site over a period of four years are analyzed and indicate no significant changes in the conductivity structure (no earthquakes of magnitude greater than 3.0 occurred in this period). Conductivity profiles at a number of sites are also considered in an attempt to infer the regional structure.

In Chapter 8, a method of correcting linearized inversion iterations is developed. The corrections consist of successively approximating the linearization error using analytic expressions developed in Chapter 2. The method would seem to represent a novel approach which can be implemented in a practical algorithm that significantly reduces the number of linearized iterations. In addition, a correspondence between the correction steps and iterations of the modified Newton's method for operators is established.

The algorithms developed in Chapters 3, 4, 5, 6 and 8 are all illustrated using synthetic test cases and MT field data collected as part of the LITHOPROBE project.
Finally, it should be noted that although the algorithms developed in this thesis are applied specifically to the 1-D MT problem, the methods are general and could be applied to a variety of inverse problems.

Chapter 2

Fréchet differential expansions and the linearity of MT responses

2.1 Introduction

In many geophysical inverse problems the observed responses are related to the Earth model by a non-linear operator or functional. A practical method of treating such problems is that of local linearization whereby the non-linear operator is replaced locally by its linear approximation. This requires that the operator be expanded in a generalized Taylor series about an arbitrary starting model; the expansion may then be linearized by neglecting the higher-order terms. A variety of linearized iterative algorithms have been applied to the non-linear MT inverse problem. Oldenburg (1983) shows how the linearized problem may be transformed so that the solution at each iteration minimizes a functional of the model called a model norm. In this chapter linearization is examined as a basis for model norm solutions. Complete expansions are derived for several choices of MT response functional and the higher-order terms are considered in an attempt to formulate the most accurate linearized inverse problem. First, however, the necessary mathematical background is presented.

2.1.1 Functional analysis background

An operator on a normed vector space is a generalization of the idea of a function of a real variable. The extension of the concepts of algebra and calculus to operators is the realm of functional analysis. In this section the necessary definitions and results from the theory of functional analysis are presented in order to develop the general expansion and linearized solution of a non-linear operator equation.
Since the MT forward problem may be considered as a non-linear mapping of the space of integrable, real-valued functions (the geo-electrical model) to the space of real or complex numbers (the observed responses), special attention is given to this case.

The most efficient approach to non-linear operator problems is often that of linearization. For ordinary functions the relevant result is Taylor's theorem which requires the existence of at least some of the derivatives at a point. To apply this idea requires a concept of the derivative for non-linear operators. There are a number of possible generalizations, but the most useful is that of the Fréchet derivative. Since many results in elementary calculus generalize naturally in terms of the Fréchet derivative, it plays an important role in functional analysis.

Definition 2.1 Fréchet derivative (Milne 1980, p. 289) Let $U$, $V$ be normed vector spaces. An operator $F: U \to V$ is Fréchet differentiable at $x_0 \in U$ if there exists a continuous linear operator $F'(x_0): U \to V$ such that for all $h \in U$

$F(x_0 + h) = F(x_0) + F'(x_0)h + \epsilon(x_0, h)$, (2.1.1)

$\lim_{\|h\|_U \to 0} \dfrac{\|\epsilon(x_0, h)\|_V}{\|h\|_U} = 0$. (2.1.2)

$F'(x_0)$ is known as the Fréchet derivative of $F$ at $x_0$. $F'(x_0)h$ is known as the Fréchet differential and is also written $F'(x_0)h = dF(x_0, h)$. The Fréchet derivative at $x_0$ is unique.

In (2.1.2), $\|\cdot\|_U$ and $\|\cdot\|_V$ refer to the norms associated with vector spaces $U$ and $V$ (when there is no chance of confusion, the subscripts are omitted). A distinction must be made between Fréchet derivatives and the ordinary derivative. In the case $F: \mathcal{R} \to \mathcal{R}$, where $\mathcal{R}$ is the set of real numbers, the two are closely related, but logically different (Griffel 1981). The ordinary derivative at $x_0$ is a number giving the slope of $F$ at $x_0$, and the ordinary differential $dF = F'(x_0)h$ is interpreted as the product of the number $F'(x_0)$ with the number $h$. The Fréchet derivative $F'(x_0)$, however, is not a number but a linear operator $F'(x_0): \mathcal{R} \to \mathcal{R}$. The Fréchet differential $dF = F'(x_0)h$ is interpreted as the result of applying the linear operator $F'(x_0)$ to the element $h \in \mathcal{R}$.

According to Definition 2.1, an operator $F$ is Fréchet differentiable if its Fréchet derivative exists. Therefore, to prove the Fréchet differentiability of $F$ at $x_0$, a linear operator $F'(x_0)$ must be found such that the remainder (non-linear) term $\epsilon(x_0, h)$ of (2.1.1) satisfies condition (2.1.2), i.e. $\epsilon(x_0, h)$ is much smaller than $h$ as $\|h\| \to 0$. In general, however, the expression for the Fréchet derivative $F'(x_0)$ alone is not easily written in an informative way; it is often more useful to compute the differential $dF = F'(x_0)h$ and not the Fréchet derivative itself (Zeidler 1985).

Equations (2.1.1) and (2.1.2) can be combined and written in the form

$F(x_0 + h) = F(x_0) + F'(x_0)h + o[\|h\|]$. (2.1.3)

This equation demonstrates that if $F$ is Fréchet differentiable at $x_0$, then $F$ is locally linear at $x_0$. In a sufficiently small neighbourhood of $x_0$ (i.e. for $h$ sufficiently small), $F(x_0 + h)$ can be approximated to arbitrary accuracy by $F(x_0) + F'(x_0)h$, which is linear in $h$ since the Fréchet derivative $F'(x_0)$ is, by definition, a linear operator. The Fréchet differential gives the best linear approximation to the non-linear operator $F$ near $x_0$ (Griffel 1981).

The Fréchet differential $F'(x_0)h \in V$ represents the application of the linear operator $F'(x_0)$ to the element $h \in U$. When $U$ represents the space of integrable, real-valued functions of a real variable on the interval $[0, d]$, and when $V$, the range of $F$, represents the space of complex numbers, any continuous linear operator $\Phi: U \to V$ may be written as

$\Phi h = \int_0^d \phi(u)\,h(u)\,du$, (2.1.4)

for some bounded function $\phi$ called the kernel of $\Phi$ (Griffel 1981).
The kernel corresponding to $F'(x_0)$ is called the Fréchet kernel of $F$, denoted by $G(x_0, u)$: thus (2.1.3) can be written as

$F(x_0 + h) = F(x_0) + \int_0^d G(x_0, u)\,h(u)\,du + o[\|h\|]$. (2.1.5)

In order to investigate the form of the non-linear terms, higher-order Fréchet derivatives are required.

Definition 2.2 Second-order Fréchet derivative (Milne 1980, p. 295) Let $F: U \to V$ be Fréchet differentiable in a neighbourhood of $x_0 \in U$. If the Fréchet derivative of $F'$ at $x_0$ exists, it is called the second Fréchet derivative of $F$ at $x_0$ and is written $F''(x_0)$. The second Fréchet differential is given by $d^2F(x_0, h) = F''(x_0)h^2$.

The second Fréchet derivative is a bi-linear operator, i.e. $F''(x_0): U \times U \to V$ is an operator which, given $(h_1, h_2) \in U \times U$ (the Cartesian product of $U$ with itself), associates it with an element of $V$ denoted by $F''(x_0)(h_1, h_2)$. $F''(x_0)$ applied to $(h_1, h_2)$ is linear in both $h_1$ and $h_2$. The second Fréchet differential $F''(x_0)h^2 = F''(x_0)(h, h)$ represents the restriction of $F''(x_0)$ to elements $(h, h) \in U \times U$. Higher-order Fréchet derivatives are defined in a manner similar to Definition 2.2. With Fréchet derivatives of all orders defined, Taylor's expansion theorem may be generalized to operators:

Theorem 2.3 Taylor series expansion (Milne 1980, p. 298) Let $F: U \to V$ have an $n$th Fréchet derivative $F^{(n)}(x_0)$ and Fréchet derivatives of order 1 through $(n-1)$ in an open ball $B(x_0, r) \subset U$, then for all $h \in U$ with $\|h\| < r$

$F(x_0 + h) = F(x_0) + F'(x_0)h + \frac{1}{2!}F''(x_0)h^2 + \cdots + \frac{1}{n!}F^{(n)}(x_0)h^n + o[\|h\|^n]$, (2.1.6)

where $F^{(n)}(x_0)h^n$ denotes the restriction of the multi-linear operator $F^{(n)}(x_0)$ to elements $(h, h, \ldots, h) \in (U \times U \times \cdots \times U)$.

In Theorem 2.3, $B(x_0, r)$, the open ball with centre $x_0 \in U$ and radius $r > 0$ ($r \in \mathcal{R}$), represents the set of all elements $x \in U$ such that $\|x - x_0\| < r$.
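As an informal numerical illustration of Definitions 2.1 and 2.2 and Theorem 2.3, consider a toy functional that is not an MT response: $F(x) = \exp\left(\int_0^d w(u)x(u)\,du\right)$ for a fixed bounded weight $w$. Expanding the exponential gives the first Fréchet kernel $G(x_0, u) = F(x_0)w(u)$ and a second Fréchet differential $F''(x_0)h^2 = F(x_0)\left(\int_0^d w\,h\,du\right)^2$. The sketch below checks, under these assumptions, that the first-order remainder divided by $\|h\|$ and the second-order remainder divided by $\|h\|^2$ both shrink as $h$ is scaled down. The weight `w`, the model `x0`, the perturbation shape and the trapezoidal quadrature are all arbitrary choices made for illustration.

```python
import numpy as np

d = 1.0
u = np.linspace(0.0, d, 2001)
du = u[1] - u[0]


def integrate(f):
    """Trapezoidal rule on the fixed grid u (a linear functional of f)."""
    return du * (f.sum() - 0.5 * (f[0] + f[-1]))


w = np.cos(3.0 * u)                          # hypothetical bounded weight
x0 = 1.0 + 0.5 * u                           # hypothetical reference model
h_shape = np.sin(2.0 * np.pi * u) + u ** 2   # hypothetical perturbation shape


def F(x):
    """Toy non-linear functional mapping a function to a real number."""
    return np.exp(integrate(w * x))


def norm(h):
    """Discrete L2 norm of h on [0, d]."""
    return np.sqrt(integrate(h * h))


G1 = F(x0) * w  # first Frechet kernel of the toy functional at x0


def remainders(scale):
    """Return (|r1| / ||h||, |r2| / ||h||^2) for h = scale * h_shape."""
    h = scale * h_shape
    first = integrate(G1 * h)                       # F'(x0) h
    second = 0.5 * F(x0) * integrate(w * h) ** 2    # (1/2!) F''(x0) h^2
    r1 = F(x0 + h) - F(x0) - first
    r2 = r1 - second
    return abs(r1) / norm(h), abs(r2) / norm(h) ** 2


ratios = [remainders(s) for s in (1e-1, 1e-2, 1e-3)]
for s, (a, b) in zip((1e-1, 1e-2, 1e-3), ratios):
    print(f"scale={s:g}  r1/||h||={a:.3e}  r2/||h||^2={b:.3e}")
```

Both ratios should fall roughly in proportion to the scale of $h$, which is the behaviour required by the $o[\|h\|]$ and $o[\|h\|^2]$ remainder terms.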
In the case where $U$ represents the space of integrable, real-valued functions defined on $[0, d]$ and $V$ represents real or complex numbers, the bilinear operator $F''(x_0)h^2$ can be written as (Griffel 1981)

$F''(x_0)h^2 = \int_0^d \int_0^d G_2(x_0, u_1, u_2)\,h(u_1)\,h(u_2)\,du_1\,du_2$, (2.1.7)

where $G_2(x_0)$ represents the second Fréchet kernel of $F(x_0)$. Extending this result to higher-order Fréchet differentials allows the Taylor series expansion (2.1.6) to be written as

$F(x_0 + h) = F(x_0) + \int_0^d G_1(x_0, u_1)\,h(u_1)\,du_1 + \frac{1}{2!}\int_0^d \int_0^d G_2(x_0, u_1, u_2)\,h(u_1)\,h(u_2)\,du_1\,du_2 + \cdots + \frac{1}{n!}\int_0^d \cdots \int_0^d G_n(x_0, u_1, \ldots, u_n)\,h(u_1) \cdots h(u_n)\,du_1 \cdots du_n + o[\|h\|^n]$, (2.1.8)

where $G_k(x_0)$ represents the Fréchet kernel of order $k$. Generalized Taylor series expansions of the form of (2.1.6) or (2.1.8) are also referred to as Fréchet differential expansions.

The concepts of Fréchet differentiation and Taylor series expansions for operators lead naturally to the generalization of linearized inversion techniques such as Newton's method. Consider the non-linear operator equation $F(x) = \theta$, where $F: U \to V$ and $\theta$ represents the zero element of $V$, with solution $x \in U$. If $F$ is Fréchet differentiable in an open ball $B(x_m, r)$ about an element $x_m \in U$ with $\|x - x_m\| < r$, then according to (2.1.3) $F(x)$ may be expanded about $x_m$ as

$F(x) = \theta = F(x_m) + F'(x_m)(x - x_m) + o[\|x - x_m\|]$. (2.1.9)

If $[F'(x_m)]^{-1}$, the inverse of $F'(x_m)$, exists, then by neglecting the second-order terms, equation (2.1.9) can be rearranged to give an approximation $x_{m+1}$ to $x$:

$x_{m+1} = x_m - [F'(x_m)]^{-1} F(x_m)$. (2.1.10)

Equation (2.1.10) may be used in an iterative fashion by selecting an initial approximation $x_0$ and solving for $x_{m+1}$ for $m = 0, 1, \ldots$ to produce a Newton series. Conditions for the convergence of Newton's method for operators exist (e.g. Hutson & Pynn 1980), but in practice these conditions are difficult to apply and it is usually simpler to carry out the computations and check convergence a posteriori (Milne 1980).
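The iteration (2.1.10) can be illustrated in the simplest finite-dimensional setting, where $U = V = \mathcal{R}^2$ and the Fréchet derivative is the ordinary Jacobian matrix. The sketch below is only a toy under that assumption: the operator `F`, its Jacobian `dF`, the starting point and the stopping tolerance are hypothetical choices and have nothing to do with the MT forward problem. The inverse operator is never formed explicitly; instead the linear equation $F'(x_m)\,s = F(x_m)$ is solved for the step $s$ at each iteration.

```python
import numpy as np


def F(x):
    """Hypothetical non-linear operator on R^2 with a root at (sqrt 2, sqrt 2)."""
    return np.array([x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]])


def dF(x):
    """Jacobian of F, i.e. its Frechet derivative in finite dimensions."""
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0, -1.0]])


def newton(F, dF, x0, tol=1e-12, max_iter=20):
    """Iterate x_{m+1} = x_m - [F'(x_m)]^{-1} F(x_m); return x and all iterates."""
    x = np.asarray(x0, dtype=float)
    history = [x.copy()]
    for _ in range(max_iter):
        # Solve F'(x_m) s = F(x_m) rather than inverting F'(x_m).
        step = np.linalg.solve(dF(x), F(x))
        x = x - step
        history.append(x.copy())
        if np.linalg.norm(step) < tol:
            break
    return x, history


x_final, history = newton(F, dF, [1.0, 0.5])
for m, xm in enumerate(history):
    print(m, xm, np.linalg.norm(F(xm)))
```

Printing the residual norm at each iterate is one way of checking convergence a posteriori, as suggested above.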
In general, Newton's method will converge provided the initial approximation $x_0$ is sufficiently near the solution. When this is the case, the rate of convergence is quadratic, i.e.

$$\|x_{m+1} - x\| \le A\,\|x_m - x\|^2, \tag{2.1.11}$$

where $A \in \mathbb{R}$ is a constant. A potential difficulty with the Newton series given by (2.1.10) is that the Fréchet derivative must be computed and a linear operator equation must be solved at each iteration. In practice, these calculations may be very time consuming. A number of modified Newton's methods have been devised to overcome this difficulty. One of the simplest is given by (Lusternik & Sobolev 1961)

$$x_{m+1} = x_m - [F'(x_0)]^{-1}F(x_m), \tag{2.1.12}$$

where the initial inverse operator is used for all iterations. This greatly reduces the computational effort, but can degrade the convergence.

The remainder of this chapter applies the concepts of Fréchet differential expansion and linearization to MT response functionals; in subsequent chapters linearized construction and appraisal algorithms based on Newton's method for operators are developed.

2.1.2 Application to 1-D MT inversion

The 1-D MT forward problem may be considered as a non-linear mapping of the space of integrable, real-valued functions of a real variable (the geo-electrical model $m(z)$, defined for $z > 0$) to the space of real or complex numbers (the observed responses $e(\omega)$ measured at $N$ distinct frequencies $\omega_j$), i.e.

$$e(\omega_j) = F(m,\omega_j), \qquad j = 1,\ldots,N. \tag{2.1.13}$$

If $F$ is Fréchet differentiable, the functional analysis methods of Section 2.1.1 may be used to solve (2.1.13) as an inverse problem. The model $m(z)$ is, of course, unknown, but it may be represented as the sum of an arbitrary starting model $m_0(z)$ and an unknown model perturbation $\delta m(z)$ so that (2.1.13) may be written

$$e(\omega_j) = F(m_0 + \delta m,\omega_j). \tag{2.1.14}$$

If $F$ is Fréchet differentiable in a neighbourhood of $m_0$, (2.1.14) may be expanded about $m_0(z)$ according to (2.1.5) to yield

$$e(\omega_j) = F(m_0,\omega_j) + \int_0^\infty G_1(m_0,\omega_j,z)\,\delta m(z)\,dz + o[\|\delta m\|], \tag{2.1.15}$$

where $G_1$ represents the first Fréchet kernel of $F$ with respect to $m$. This expansion may be linearized by neglecting the higher-order terms to produce

$$\delta e(\omega_j) = \int_0^\infty G_1(m_0,\omega_j,z)\,\delta m(z)\,dz, \qquad j = 1,\ldots,N, \tag{2.1.16}$$

where $\delta e = e - F(m_0) = F(m) - F(m_0)$. Equation (2.1.16) is in the form of a Fredholm integral equation of the first kind and can be inverted by extremizing some functional of $\delta m(z)$ using standard methods of linear inverse theory (e.g. Oldenburg 1984). Usually the smallest acceptable model perturbation is constructed by minimizing $\|\delta m\|$, and the model solution is given by $m_1(z) = m_0(z) + \delta m(z)$. Since higher-order terms have been neglected, this model may not fit the data and the process must be repeated iteratively until an acceptable model is achieved. This process is essentially Newton's method for operators described in Section 2.1.1. Parker (1970) first applied this method to the inversion of global electromagnetic induction data; it was first applied to the 1-D inversion of MT responses by Oldenburg (1979).

Alternatively, by substituting $\delta m(z) = m(z) - m_0(z)$, as first described by Oldenburg (1983), (2.1.16) can be recast as

$$\delta e(\omega_j) + \int_0^\infty G_1(m_0,\omega_j,z)\,m_0(z)\,dz = \int_0^\infty G_1(m_0,\omega_j,z)\,m(z)\,dz. \tag{2.1.17}$$

The left side of (2.1.17) consists of known quantities and may be considered modified data. However, in (2.1.17) the responses are related directly to the model, not to a model perturbation. By formulating the inverse problem according to (2.1.17), linear inversion methods can be used to minimize a functional of the model itself, not simply a functional of the model perturbation. This is referred to as model norm inversion in this thesis since the functional (which often consists of a norm) is applied directly to the model.
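The structure of an iterated model norm inversion based on (2.1.17) can be sketched with a small discrete analogue. In the Python sketch below the forward functional, the rows of `A`, the data, the starting model and all function names are hypothetical stand-ins (this is not the MT forward problem): the Fréchet kernel becomes a matrix of partial derivatives `G`, the modified data are formed as in the left side of (2.1.17), and the model of smallest Euclidean norm satisfying the linearized constraints is obtained from $m = G^T(GG^T)^{-1}\tilde d$.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical averaging weights defining a toy non-linear forward problem
# F_j(m) = (a_j . m)^2 with two data and four model parameters.
A = [[1.0, 0.5, 0.25, 0.125],
     [1.0, 1.0, 1.0, 1.0]]

def forward(m):
    return [dot(a, m) ** 2 for a in A]

def frechet_rows(m):
    """Partial derivatives dF_j/dm_i, the discrete analogue of G_1(m_0, w_j, z)."""
    return [[2.0 * dot(a, m) * ai for ai in a] for a in A]

def smallest_model(G, d):
    """Minimize ||m|| subject to G m = d for two data: m = G^T (G G^T)^{-1} d."""
    g11, g12, g22 = dot(G[0], G[0]), dot(G[0], G[1]), dot(G[1], G[1])
    det = g11 * g22 - g12 * g12
    lam1 = (g22 * d[0] - g12 * d[1]) / det
    lam2 = (g11 * d[1] - g12 * d[0]) / det
    return [lam1 * x + lam2 * y for x, y in zip(G[0], G[1])]

def model_norm_inversion(e_obs, m0, n_iter=8):
    """Repeat: linearize about the current model, form modified data, invert for m."""
    m = list(m0)
    for _ in range(n_iter):
        G = frechet_rows(m)
        predicted = forward(m)
        # Modified data: delta e + G m_0, cf. the left side of (2.1.17).
        d_mod = [e - p + dot(g, m) for e, p, g in zip(e_obs, predicted, G)]
        m = smallest_model(G, d_mod)
    return m

E_OBS = forward([1.0, 2.0, 1.0, 0.5])          # synthetic data from a known model
m_est = model_norm_inversion(E_OBS, [1.0, 1.0, 1.0, 1.0])
```

Because the norm is applied to the model rather than to the perturbation, the recovered `m_est` reproduces the data but is not the model that generated them: it is a model of smaller norm, and for this toy problem it does not depend on the details of the starting model.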
This method allows great flexibility: models of different global character can be constructed by extremizing different functionals of $m(z)$. A number of authors have applied this method to the inversion of MT data. Constable, Parker & Constable (1987), Smith & Booker (1988) and Dosso & Oldenburg (1989) use this formulation to minimize the $l_2$ norm of the first or second derivative of the model to construct 'flattest' or 'smoothest' models. These models are particularly useful in that they exhibit the minimum structure necessary to fit the data. Dosso & Oldenburg (1989) also minimize the $l_1$ norm of the model variation to produce minimum-structure models which more closely resemble layered Earth models. In addition, Oldenburg (1983) and Dosso & Oldenburg (1989) use (2.1.17) to construct extremal models which maximize or minimize box-car averages of the conductivity model. From these extremal models, upper and lower bounds may be computed for localized conductivity averages in order to appraise features of interest. These various applications of (2.1.17) are the basis for Chapters 3-5 of this thesis.

In any iterative inversion algorithm the choice of response and model may be important factors in determining the convergence. A particular choice may be 'more linear' than others in that it yields a more accurate linearized equation when the higher-order terms are neglected. The correct choice should increase the likelihood of the algorithm converging to the desired solution and minimize the number of iterations required. The conductivity $\sigma(z)$ is chosen as model in the initial study since it appears to be the choice for which the inverse problem is most linear (Smith & Booker 1988). As described in Chapter 1, magnetotelluric responses consist of ratios of orthogonal components of electric and magnetic fields $E$ and $B$ or $H$ (subscripts indicating directional components of the fields are omitted in this chapter).
The two responses commonly used in linearized inversions are

$$c(\sigma,\omega) = -\frac{E(\sigma,\omega)}{\partial_z E(\sigma,\omega)}, \tag{2.1.18}$$

considered by Parker (1970, 1972, 1977b) and Constable et al. (1987), and its (scaled) reciprocal

$$R(\sigma,\omega) = \frac{1}{i\omega}\,\frac{\partial_z E(\sigma,\omega)}{E(\sigma,\omega)}, \tag{2.1.19}$$

used by Oldenburg (1979, 1981, 1983), Smith & Booker (1988) and Dosso & Oldenburg (1989). The responses $c$ and $R$ are proportional to the impedance $Z$ and admittance $Y$, respectively. Parker (1977b) derived the linear and remainder terms for the $c$ response and he, Chave (1984) and MacBain (1986, 1987) have considered the Fréchet differentiability of this response. Oldenburg (1979) derived the linear term for $R$. No proof of the Fréchet differentiability of the $R$ response, and no expressions for the series of higher-order terms for $R$ or $c$, have been given.

In the next two sections of this chapter Fréchet differential expansions, including higher-order terms, are derived for the two MT responses and the higher-order terms are shown to sum to a closed-form remainder term. In the usual MT method, measurements are made at the surface of the Earth ($z = 0$); however, in principle measurements can be made at an arbitrary depth $z = z_m$ (e.g. sea floor MT). In this chapter all results are derived for arbitrary measurement depth; this is motivated by a statement by Gómez-Treviño (1987) that Fréchet kernels for 1-D EM induction problems in the literature are limited to surface measurements. The Fréchet differentiability of $R$ is proved. Results are illustrated for the special case of constant-conductivity models where Fréchet differentiation is equivalent to ordinary differentiation and the expansions can be evaluated directly. Section 2.5 quantifies the relative linearity of the $R$ and $c$ responses by examining the ratio of non-linear to linear terms in order to determine the most linear choice of response. Gómez-Treviño (1987) suggests that model norm inversions could be based on the similitude equation rather than on linearization.
Section 2.6 shows that linearization is indeed the correct basis for these solutions. Finally, the linearized expressions derived in this chapter are the basis for the MT inversion and appraisal applications of Chapters 3, 4 and 5, and Chapter 8 also makes use of these expressions in an attempt to include or correct for the higher-order information in a linearized solution.

2.2 The response $R = \partial_z E/(i\omega E)$

2.2.1 Expansions for R

The definition of the Fréchet derivative (Definition 2.1) states the criteria that the derivative must satisfy, but does not indicate how the Fréchet differential terms are obtained. In general, the (linear) Fréchet derivative term may be determined by a standard perturbation technique. The governing differential equation may be written in terms of a reference or starting model $m_0(z)$ and response $e(m_0)$, or a perturbed model $m_0(z) + \delta m(z)$ and response $e(m_0 + \delta m) = e(m_0) + \delta e$. A differential equation which relates the perturbation in the model $\delta m(z)$ to the resulting change in the observed response $\delta e$ may be obtained by subtracting the original equation from the perturbed expression. Neglecting higher-order terms and solving this equation leads to a linear expression for $\delta e$ in terms of the first Fréchet differential.

In this section, the perturbation analysis is carried out in a slightly different manner which emphasizes the analogy with Fréchet differential expansions and retains higher-order terms. The complete Fréchet differential expansion is derived for arbitrary measurement depth $z_m$. This expansion may be expressed as an infinite series of terms of increasing order in the model perturbation, or as a linear term and a closed-form remainder term which contains all the higher-order contributions.
The MT differential equation for the electric field is derived in Chapter 1:

$$\partial_z^2 E(\sigma,z) - i\omega\mu_0\,\sigma(z)\,E(\sigma,z) = 0, \tag{2.2.1}$$

with boundary condition $E(\sigma,z) \to 0$ as $z \to \infty$, where $\sigma(z)$ represents the conductivity model and the dependence of $E$ on $\omega$ is implicitly assumed. The data are usually measured at the Earth's surface, but the response $R$ defined by (2.1.19) may be generalized to depth as

$$R(\sigma,z) = \frac{1}{i\omega}\,\frac{\partial_z E(\sigma,z)}{E(\sigma,z)}. \tag{2.2.2}$$

Differentiating (2.2.2) and substituting from (2.2.1) yields the governing differential equation in terms of $R$:

$$\partial_z R(\sigma,z) + i\omega R^2(\sigma,z) = \mu_0\,\sigma(z). \tag{2.2.3}$$

Assume (2.2.3) is satisfied by an arbitrary conductivity model $\sigma_0(z)$ and introduce a first-order perturbation

$$\sigma(z) = \sigma_0(z) + \delta\sigma(z). \tag{2.2.4}$$

This will result in a change in the response which may be expressed as an expansion

$$R(\sigma,z) = R(\sigma_0 + \delta\sigma,z) = R_0(z) + R_1(z) + R_2(z) + R_3(z) + \ldots, \tag{2.2.5}$$

where $R_k$ represents the term of order $k$ in $\delta\sigma$. Substituting (2.2.4) and (2.2.5) into the differential equation (2.2.3) and equating terms of like order leads to

$$\begin{aligned}
\partial_z R_0 + i\omega R_0^2 &= \mu_0\sigma_0, &\text{(2.2.6a)}\\
\partial_z R_1 + (2i\omega R_0)R_1 &= \mu_0\,\delta\sigma, &\text{(2.2.6b)}\\
\partial_z R_2 + (2i\omega R_0)R_2 &= -i\omega R_1^2, &\text{(2.2.6c)}\\
\partial_z R_3 + (2i\omega R_0)R_3 &= -i\omega\,2R_1R_2, &\text{(2.2.6d)}\\
\partial_z R_4 + (2i\omega R_0)R_4 &= -i\omega\,(R_2^2 + 2R_1R_3), &\text{(2.2.6e)}\\
&\;\;\vdots
\end{aligned}$$

Consider a solution to equations (2.2.6) for a particular depth $z_m$. The zero-order equation (2.2.6a) has the form of the original differential equation (2.2.3) and is satisfied by $R_0(z_m) = R(\sigma_0,z_m)$. The remaining equations all have left sides of the same form. They are linear, first-order differential equations and may be solved using an integrating factor

$$\exp\left[\int_0^{z_m} 2i\omega R_0(u)\,du\right] = \left[\frac{E(\sigma_0,z_m)}{E(\sigma_0,0)}\right]^2. \tag{2.2.7}$$

Assuming that $R_k(\sigma,z) \to 0$ as $z \to \infty$ for $k = 0, 1, \ldots$, the solutions to (2.2.6) are

$$\begin{aligned}
R_0(z_m) &= R(\sigma_0,z_m), &\text{(2.2.8a)}\\
R_1(z_m) &= -\int_{z_m}^\infty \mu_0\left[\frac{E(\sigma_0,z)}{E(\sigma_0,z_m)}\right]^2 \delta\sigma(z)\,dz, &\text{(2.2.8b)}\\
R_2(z_m) &= \int_{z_m}^\infty i\omega\left[\frac{E(\sigma_0,z)}{E(\sigma_0,z_m)}\right]^2 R_1^2(z)\,dz, &\text{(2.2.8c)}\\
R_3(z_m) &= \int_{z_m}^\infty i\omega\left[\frac{E(\sigma_0,z)}{E(\sigma_0,z_m)}\right]^2 2R_1(z)R_2(z)\,dz, &\text{(2.2.8d)}\\
R_4(z_m) &= \int_{z_m}^\infty i\omega\left[\frac{E(\sigma_0,z)}{E(\sigma_0,z_m)}\right]^2 \left(R_2^2(z) + 2R_1(z)R_3(z)\right)dz, &\text{(2.2.8e)}\\
&\;\;\vdots
\end{aligned}$$

The zero-order term $R_0$ represents the response functional evaluated at the starting model, the first-order term $R_1$ is linear in $\delta\sigma$, and $R_2, R_3, \ldots$ represent higher-order terms. Of course, it remains to be proved that $R_2, R_3, \ldots$ actually are second or higher order in $\delta\sigma$. This implies the Fréchet differentiability of $R$ and is considered in Section 2.2.2.

To derive a closed-form remainder term $R_r(z_m)$ which contains all the higher-order contributions, the expansion for $R(\sigma,z)$ is modified to be

$$R(\sigma,z) = R(\sigma_0 + \delta\sigma,z) = R_0(z) + R_1(z) + R_r(z). \tag{2.2.9}$$

Substituting (2.2.9) and the conductivity perturbation (2.2.4) into the differential equation (2.2.3) and subtracting (2.2.6a) and (2.2.6b) leads to

$$\partial_z R_r + (2i\omega R_0)R_r = -i\omega\,(R_1 + R_r)^2. \tag{2.2.10}$$

This differential equation may be solved for a depth $z = z_m$ by using the integrating factor given by (2.2.7) and noting that $R_1 + R_r = R(\sigma) - R(\sigma_0) = \delta R$ to yield

$$R_r(z_m) = \int_{z_m}^\infty i\omega\left[\frac{E(\sigma_0,z)}{E(\sigma_0,z_m)}\right]^2 \delta R^2(z)\,dz. \tag{2.2.11}$$

Since $\delta R = R(\sigma) - R(\sigma_0)$, the remainder term $R_r$ depends on the true model $\sigma$ in a non-linear manner; nonetheless, (2.2.11) gives a convenient closed-form representation of the linearization error. The expansion of $R(\sigma)$ about arbitrary $\sigma_0$ in terms of a linear and remainder term is always exact; however, the expansion in terms of an infinite series of higher-order terms may not converge for all choices of $\sigma_0$. When $\sigma$ is within the region of convergence for the Fréchet differential expansion about $\sigma_0$, the equivalence of the remainder term and the series of higher-order terms is easily established. Substituting $\delta R = R(\sigma) - R(\sigma_0) = R_1 + R_2 + R_3 + \ldots$ into the expression for the remainder term given by (2.2.11) and breaking the integral into terms according to their order yields the expansion of higher-order terms given by (2.2.8c, d, ...).
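The recursion (2.2.8) and the remainder (2.2.11) can be checked numerically in the one case where every integral is elementary: a uniform halfspace with a uniform perturbation, for which $E \propto e^{-kz}$ with $k = \sqrt{i\omega\mu_0\sigma_0}$ and all $R_k$ are independent of depth. The Python sketch below is a minimal check under those assumptions. The function names and numerical values are illustrative, and the general source term $\sum_{j=1}^{n-1} R_j R_{n-j}$ used for order $n$ is inferred here from the pattern of (2.2.6c) to (2.2.6e).

```python
import cmath
import math

MU0 = 4.0e-7 * math.pi  # magnetic permeability of free space (H/m)

def wavenumber(sigma, omega):
    """k = sqrt(i omega mu0 sigma), principal branch, so that Re(k) > 0."""
    return cmath.sqrt(1j * omega * MU0 * sigma)

def halfspace_R(sigma, omega):
    """R = d_z E / (i omega E) for E ~ exp(-k z) in a uniform halfspace."""
    return -wavenumber(sigma, omega) / (1j * omega)

def expansion_terms(sigma0, dsigma, omega, n_max):
    """Terms R_0 ... R_n_max from the recursion (2.2.8) for uniform sigma0, dsigma."""
    k0 = wavenumber(sigma0, omega)
    # For uniform sigma0 the integral of [E(z)/E(z_m)]^2 from z_m to infinity is 1/(2 k0).
    weight = 1.0 / (2.0 * k0)
    terms = [halfspace_R(sigma0, omega), -MU0 * dsigma * weight]
    for n in range(2, n_max + 1):
        source = sum(terms[j] * terms[n - j] for j in range(1, n))
        terms.append(1j * omega * weight * source)
    return terms

def remainder(sigma0, dsigma, omega):
    """Closed-form remainder (2.2.11) evaluated for uniform sigma0 and dsigma."""
    dR = halfspace_R(sigma0 + dsigma, omega) - halfspace_R(sigma0, omega)
    return 1j * omega * dR ** 2 / (2.0 * wavenumber(sigma0, omega))

omega = 2.0 * math.pi            # angular frequency for 1 Hz
sigma0, dsigma = 0.01, 0.004     # S/m; |dsigma/sigma0| < 1
terms = expansion_terms(sigma0, dsigma, omega, 12)
exact = halfspace_R(sigma0 + dsigma, omega)

series_error = abs(sum(terms) - exact) / abs(exact)
remainder_error = abs(terms[0] + terms[1] + remainder(sigma0, dsigma, omega) - exact) / abs(exact)
```

In this setting `remainder_error` sits at rounding level for any positive `sigma0 + dsigma`, whereas `series_error` decreases with the number of terms only when `abs(dsigma / sigma0) < 1`.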
The expansion terms in the infinite series for $R(\sigma_0,z_m)$ given by (2.2.8a, b, ...) are written in a recursive form: $R_2$ depends on $R_1$, $R_3$ depends on $R_2$ and $R_1$, and so on. This form is compact, convenient for computation, and readily suggests the continuation of the series to higher order. However, the terms are not in the standard form for the Fréchet differential expansion according to Theorem 2.3 or equation (2.1.8). It is instructive to write the expansion terms in the form of (2.1.8) since this allows the identification of the Fréchet kernels. This can be done by substituting the expression for $R_k$ into $R_{k+1}$ from (2.2.8) and introducing the Heaviside step function $H$ defined as

$$H(x) = \begin{cases} 0, & \text{if } x < 0;\\ 1, & \text{if } x > 0, \end{cases} \tag{2.2.12}$$

to allow the order of integration to be rearranged. After some algebra, the results for the expansion terms in standard Fréchet differential form are

$$\begin{aligned}
R_0(z_m) &= R(\sigma_0,z_m), &\text{(2.2.13a)}\\
R_1(z_m) &= \int_0^\infty G_1(z_m,z_1)\,\delta\sigma(z_1)\,dz_1, &\text{(2.2.13b)}\\
R_2(z_m) &= \frac{1}{2!}\int_0^\infty\!\!\int_0^\infty G_2(z_m,z_1,z_2)\,\delta\sigma(z_1)\,\delta\sigma(z_2)\,dz_1\,dz_2, &\text{(2.2.13c)}\\
R_3(z_m) &= \frac{1}{3!}\int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty G_3(z_m,z_1,z_2,z_3)\,\delta\sigma(z_1)\,\delta\sigma(z_2)\,\delta\sigma(z_3)\,dz_1\,dz_2\,dz_3, &\text{(2.2.13d)}\\
&\;\;\vdots
\end{aligned}$$

where $G_k(z_m,z_1,\ldots,z_k)$ represents the Fréchet kernel of order $k$ given by

$$G_1(z_m,z_1) = -\mu_0\left[\frac{E(z_1)}{E(z_m)}\right]^2 H(z_1 - z_m), \tag{2.2.14a}$$

$$G_2(z_m,z_1,z_2) = 2i\omega\mu_0^2\int_0^\infty\left[\frac{E(z_1)E(z_2)}{E(z_3)E(z_m)}\right]^2 H(z_1 - z_3)\,H(z_2 - z_3)\,H(z_3 - z_m)\,dz_3, \tag{2.2.14b}$$

$$\begin{aligned}
G_3(z_m,z_1,z_2,z_3) = 12\,\omega^2\mu_0^3\int_0^\infty\!\!\int_0^\infty &\left[\frac{E(z_1)E(z_2)E(z_3)}{E(z_4)E(z_5)E(z_m)}\right]^2 H(z_1 - z_5)\\
&\times H(z_2 - z_4)\,H(z_3 - z_4)\,H(z_4 - z_5)\,H(z_5 - z_m)\,dz_4\,dz_5,
\end{aligned} \tag{2.2.14c}$$

and so on. The Fréchet kernels are evaluated at the starting model $\sigma_0$. A number of authors refer to $G_1$ itself as the Fréchet derivative; however, this is not strictly in keeping with Definition 2.1 of the Fréchet derivative as a linear operator. For instance, the linear operator associated with $R_1(z)$ in (2.2.13b) is given by

$$\int_0^\infty G_1(\sigma_0,z_m,z)\,\bullet\,dz, \tag{2.2.15}$$

where the '$\bullet$' indicates application to a real function.
To avoid confusion, $G_k$ will always be referred to as the $k$th Fréchet kernel in this thesis. The first Fréchet kernel is sometimes referred to as the sensitivity since it determines, to first order, how the model perturbation $\delta\sigma$ affects the response $R(\sigma_0 + \delta\sigma)$.

For $z_m \neq 0$ and $z < z_m$, the Heaviside function in (2.2.14a) ensures that $G_1(z_m,z) = 0$. This indicates that for $z < z_m$, $\delta\sigma(z)$ makes no first-order contribution to the value of $R$. In fact, $G_k(z_m,z_1,\ldots,z_k) = 0$ if $z_j < z_m$ for any $j$, $1 \le j \le k$, which indicates that $\delta\sigma(z < z_m)$ makes no contribution of any order to $R$. This demonstrates that the 1-D MT method is only sensitive to the conductivity structure below the depth of measurement.

This result has the following consequence. Since the conductivity structure above the depth of measurement $z_m$ is irrelevant, the coordinate system may always be redefined so that this depth is taken to be the origin, i.e. $z_m = 0$. The general expressions for the linear and remainder terms as well as the expansion in terms of Fréchet kernels given by (2.2.13) and (2.2.14) then reduce to those that would be derived for surface measurements. Only the recursive form of the higher-order terms in (2.2.8) requires expressions for $z_m > 0$. It is also interesting to note that although the first Fréchet kernel $G_1$ at depth $z_1$ depends only on the electric fields at depths $z_1$ and $z_m$, higher-order Fréchet kernels depend on the field over the entire depth range.

To formulate a practical inversion algorithm, the Fréchet differential expansion is linearized by neglecting the terms $R_r = R_2 + R_3 + \ldots$ . In Section 2.2.2 it is shown that the remainder term is second order in $\delta\sigma$, so this approximation is good provided $\delta\sigma$ is small. As $\delta\sigma \to 0$ the remainder term goes to zero faster than the linear term, and convergence is guaranteed.
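The interpretation of $G_1$ as a sensitivity can be checked with a simple numerical experiment. The Python sketch below perturbs the conductivity of a buried layer in a uniform halfspace and compares the resulting change in the surface response with the linear term predicted by the first Fréchet kernel (2.2.14a) with $z_m = 0$, which for a uniform halfspace is $-\mu_0 e^{-2kz}$. The layered forward recursion, the function names and the model values are assumptions made for this illustration; the recursion simply continues $y = \partial_z E/E$ upward through each uniform layer using the analytic solution of $\partial_z y + y^2 = k^2$.

```python
import cmath
import math

MU0 = 4.0e-7 * math.pi  # magnetic permeability of free space (H/m)

def surface_R(sigmas, thicknesses, omega):
    """R = d_z E / (i omega E) at z = 0 for a stack of uniform layers.

    sigmas has one more entry than thicknesses; the last entry is the
    conductivity of the underlying halfspace.
    """
    y = -cmath.sqrt(1j * omega * MU0 * sigmas[-1])   # y = d_z E / E in the halfspace
    for sigma, h in zip(reversed(sigmas[:-1]), reversed(thicknesses)):
        k = cmath.sqrt(1j * omega * MU0 * sigma)
        t = cmath.tanh(k * h)
        y = k * (y - k * t) / (k - y * t)            # bottom of layer -> top of layer
    return y / (1j * omega)

def linear_term(sigma0, dsigma, z_top, z_bot, omega):
    """Integral of G_1 * dsigma over [z_top, z_bot] with G_1 = -mu0 exp(-2 k z)."""
    k = cmath.sqrt(1j * omega * MU0 * sigma0)
    return -MU0 * dsigma * (cmath.exp(-2.0 * k * z_top) - cmath.exp(-2.0 * k * z_bot)) / (2.0 * k)

omega = 2.0 * math.pi            # angular frequency for 1 Hz
sigma0 = 0.01                    # background conductivity (S/m)
dsigma = 1.0e-5 * sigma0         # small perturbation confined to one layer
z_top, z_bot = 500.0, 1500.0     # depth range of the perturbed layer (m)
thicknesses = [z_top, z_bot - z_top]

R_reference = surface_R([sigma0, sigma0, sigma0], thicknesses, omega)
R_perturbed = surface_R([sigma0, sigma0 + dsigma, sigma0], thicknesses, omega)
exact_change = R_perturbed - R_reference
predicted_change = linear_term(sigma0, dsigma, z_top, z_bot, omega)
relative_misfit = abs(exact_change - predicted_change) / abs(predicted_change)
```

For a perturbation this small the two changes agree to roughly the order of `dsigma / sigma0`; increasing `dsigma` makes the neglected higher-order terms visible in `relative_misfit`.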
For $z_m = 0$ the linearized equation reduces to that derived by Oldenburg (1979):

$$\delta R(\omega) \approx R_1(\omega) = \int_0^\infty G_1(\sigma_0,\omega,z_m{=}0,z)\,\delta\sigma(z)\,dz, \tag{2.2.16}$$

where the dependence on $\omega$ is explicitly indicated and the Fréchet kernel is given by

$$G_1(\sigma_0,\omega,z_m{=}0,z) = -\mu_0\left[\frac{E(\sigma_0,\omega,z)}{E(\sigma_0,\omega,0)}\right]^2. \tag{2.2.17}$$

By applying the transformation given in (2.1.17), (2.2.16) may be written as

$$\delta R(\omega) + \int_0^\infty G_1(\sigma_0,\omega,z_m{=}0,z)\,\sigma_0(z)\,dz = \int_0^\infty G_1(\sigma_0,\omega,z_m{=}0,z)\,\sigma(z)\,dz. \tag{2.2.18}$$

Equation (2.2.18) can be inverted directly for a model which minimizes some functional using standard methods of linear inverse theory. Since the EM fields are complex, (2.2.18) may be considered as two equations consisting of the real and imaginary parts or manipulated and represented in terms of amplitude and phase. In practice, responses are measured at $N$ frequencies $\omega_j$ and the solution involves inverting a set of $2N$ real equations.

2.2.2 Proof of the Fréchet differentiability of R

The expansion for $R(\sigma)$ developed in Section 2.2.1 may be written

$$\delta R(z_m) = \int_0^\infty G_1(\sigma_0,z_m,z)\,\delta\sigma(z)\,dz + R_r(z_m). \tag{2.2.19}$$

This is an exact expression; however, in this form it is not amenable to solution using methods of linear inverse theory. If the remainder term can be shown to be $o[\|\delta\sigma\|]$, then it may be neglected and the expression linearized (this is equivalent to proving the Fréchet differentiability of $R$). Chave (1984) claims that it is important to prove that the remainder term actually is second order in the model perturbation, i.e. $R_r = O[\|\delta\sigma\|^2]$. This is a stronger condition than, and immediately implies, Fréchet differentiability. The linearization breaks down if the remainder term is not small, and this is observed in geophysical problems (e.g. Woodhouse 1976). A number of authors have considered the Fréchet differentiability of the MT response $c$ given by (2.1.18) (see references cited in Section 2.3.1).
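MacBain (1987) proves that the linearization error for the $c$ response is $O[\|\delta\sigma\|^2]$, which firmly establishes the Fréchet differentiability of $c$. This fact can be used to prove a similar result for the $R$ response as follows. MacBain (1987) proves that the functional $c$ satisfies

$$c(\sigma) - c(\sigma_0) = c'(\sigma_0)\,\delta\sigma + e(\sigma,\sigma_0), \tag{2.2.20}$$

where $c'(\sigma_0)\,\delta\sigma$ represents the Fréchet differential (linear) term and the linearization error satisfies

$$|e(\sigma,\sigma_0)| \le K\,\|\delta\sigma\|^2, \tag{2.2.21}$$

for some $K \in \mathbb{R}$, provided $\|\delta\sigma\| = \|\sigma - \sigma_0\|$ is small enough. Now consider the Taylor series expansion of $1/C$ about $C_0$, for complex $C$ and $C_0$, which may be written as

$$\frac{1}{C} = \frac{1}{C_0} - \frac{1}{C_0^2}\,[C - C_0] + \varepsilon(C,C_0), \tag{2.2.23}$$

where

$$|\varepsilon(C,C_0)| \le \frac{16}{|C_0|^3}\,|C - C_0|^2, \tag{2.2.24}$$

provided $|C - C_0| < |C|/2$. Using $C = c(\sigma)$, $C_0 = c(\sigma_0)$ and (2.2.20), (2.2.23) may be written as

$$\begin{aligned}
\frac{1}{c(\sigma)} &= \frac{1}{c(\sigma_0)} - \frac{1}{c^2(\sigma_0)}\left[c'(\sigma_0)\,\delta\sigma + e(\sigma,\sigma_0)\right] + \varepsilon\big(c(\sigma),c(\sigma_0)\big)\\
&= \frac{1}{c(\sigma_0)} - \frac{1}{c^2(\sigma_0)}\,c'(\sigma_0)\,\delta\sigma + \mathcal{E}(\sigma,\sigma_0),
\end{aligned} \tag{2.2.25}$$

where the error $\mathcal{E}(\sigma,\sigma_0)$ is given by

$$\mathcal{E}(\sigma,\sigma_0) = \varepsilon\big(c(\sigma),c(\sigma_0)\big) - \frac{1}{c^2(\sigma_0)}\,e(\sigma,\sigma_0). \tag{2.2.26}$$

It follows that

$$|\mathcal{E}(\sigma,\sigma_0)| \le \big|\varepsilon\big(c(\sigma),c(\sigma_0)\big)\big| + \frac{1}{|c(\sigma_0)|^2}\,|e(\sigma,\sigma_0)|, \tag{2.2.27}$$

and using (2.2.21) and (2.2.24), (2.2.27) becomes

$$|\mathcal{E}(\sigma,\sigma_0)| \le \frac{16}{|c(\sigma_0)|^3}\,|c(\sigma) - c(\sigma_0)|^2 + \frac{K}{|c(\sigma_0)|^2}\,\|\delta\sigma\|^2. \tag{2.2.28}$$

By (2.2.20) and (2.2.21)

$$|c(\sigma) - c(\sigma_0)| \le |c'(\sigma_0)\,\delta\sigma| + |e(\sigma,\sigma_0)| \le 2\,\|c'(\sigma_0)\|\,\|\delta\sigma\| + K\,\|\delta\sigma\|^2, \tag{2.2.29}$$

and so (2.2.28) may be written (to second order in $\|\delta\sigma\|$) as

$$|\mathcal{E}(\sigma,\sigma_0)| \le \frac{64\,\|c'(\sigma_0)\|^2}{|c(\sigma_0)|^3}\,\|\delta\sigma\|^2 + \frac{K}{|c(\sigma_0)|^2}\,\|\delta\sigma\|^2. \tag{2.2.30}$$

Thus by (2.2.25) and (2.2.30), the linearization error for the response $1/c$ is $O[\|\delta\sigma\|^2]$, and since the response $R = -1/(i\omega c)$, it follows that $R$ must also have this property (provided $c \neq 0$). This implies the Fréchet differentiability of $R$.

2.2.3 Constant-conductivity example

A simple yet instructive example where the Fréchet differential expansion can be evaluated directly is the case where conductivity profiles and perturbations are depth independent, i.e. $\sigma(z) = \sigma$, $\sigma_0(z) = \sigma_0$, and $\delta\sigma(z) = \delta\sigma$.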
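For a halfspace conductivity profile $\sigma$, $E(\sigma,z) = E(\sigma,0)\,e^{-\sqrt{i\omega\mu_0\sigma}\,z}$, $R(\sigma) = -\sqrt{\mu_0\sigma/i\omega}$, and it is straightforward to show that the general expressions for the Fréchet differential terms given by (2.2.8) reduce to

$$\begin{aligned}
R_0 &= -\sqrt{\frac{\mu_0\sigma_0}{i\omega}},\\
R_1 &= -\frac{1}{2}\sqrt{\frac{\mu_0}{i\omega\sigma_0}}\;\delta\sigma,\\
R_2 &= \frac{1}{8}\sqrt{\frac{\mu_0}{i\omega\sigma_0^3}}\;\delta\sigma^2,\\
R_3 &= -\frac{1}{16}\sqrt{\frac{\mu_0}{i\omega\sigma_0^5}}\;\delta\sigma^3,\\
&\;\;\vdots
\end{aligned} \tag{2.2.31}$$

and thus the Fréchet differential expansion for $R(\sigma)$ is given by

$$R(\sigma) = -\sqrt{\frac{\mu_0\sigma_0}{i\omega}}\left(1 + \frac{1}{2}\frac{\delta\sigma}{\sigma_0} - \frac{1}{8}\frac{\delta\sigma^2}{\sigma_0^2} + \frac{1}{16}\frac{\delta\sigma^3}{\sigma_0^3} - \ldots\right). \tag{2.2.32}$$

The first term represents the functional $R$ evaluated at the starting model $\sigma_0$, the second term is linear in $\delta\sigma$ and the remaining terms represent an alternating series increasing in powers of $\delta\sigma$. For this simple case of constant conductivities, Fréchet differentiation of $R$ is equivalent to ordinary differentiation and the Fréchet differential expansion reduces to an ordinary Taylor series expansion

$$R(\sigma) = R(\sigma_0) + \left.\frac{dR}{d\sigma}\right|_{\sigma_0}\delta\sigma + \frac{1}{2!}\left.\frac{d^2R}{d\sigma^2}\right|_{\sigma_0}\delta\sigma^2 + \frac{1}{3!}\left.\frac{d^3R}{d\sigma^3}\right|_{\sigma_0}\delta\sigma^3 + \ldots. \tag{2.2.33}$$

It is straightforward to show that taking $R = -\sqrt{\mu_0\sigma/i\omega}$ in (2.2.33) leads to an expression identical to (2.2.32). By using the ratio test for convergence of a series (e.g. Kaplan 1973, p. 385) it may be shown that the series of higher-order terms in (2.2.32) converges for $|\delta\sigma/\sigma_0| < 1$ and diverges for $|\delta\sigma/\sigma_0| > 1$.

The remainder term $R_r$ given by (2.2.11) reduces, for constant conductivities, to

$$R_r = \frac{1}{2}\sqrt{\frac{\mu_0}{i\omega\sigma_0}}\left(\sqrt{\sigma_0} - \sqrt{\sigma_0 + \delta\sigma}\right)^2, \tag{2.2.34}$$

and it is straightforward to verify that $R(\sigma) = R_0 + R_1 + R_r$ holds for all $\sigma_0$ and $\delta\sigma$. Writing $\left(\sqrt{\sigma_0} - \sqrt{\sigma_0 + \delta\sigma}\right)^2 = \sigma_0\left(1 - \sqrt{1 + \delta\sigma/\sigma_0}\right)^2$ in (2.2.34) and expanding $\sqrt{1 + \delta\sigma/\sigma_0}$ in a binomial series (provided $|\delta\sigma/\sigma_0| < 1$) leads to

$$R_r = \sqrt{\frac{\mu_0\sigma_0}{i\omega}}\left(\frac{1}{8}\frac{\delta\sigma^2}{\sigma_0^2} - \frac{1}{16}\frac{\delta\sigma^3}{\sigma_0^3} + \ldots\right), \tag{2.2.35}$$

which demonstrates the equivalence of the remainder term and the infinite series of higher-order terms within the region of convergence.

2.3 The response $c = -E/\partial_z E$

2.3.1 Expansions for c

Parker (1977b) derived the linear and remainder terms for the MT response $c$, defined by (2.1.18), when the point of measurement is the surface of the Earth.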
In this section, the complete expansion in terms of an infinite series of higher-order terms is derived and the equivalence of the remainder term and the series of higher-order terms is demonstrated. Also, by changing the manner in which the response is generalized to depth, these results are obtained for arbitrary measurement depth $z_m$.

Parker (1977b) generalized the $c$ response according to

$$c(\sigma,z) = -\frac{E(\sigma,z)}{\partial_z E(\sigma,0)}. \tag{2.3.1}$$

Since $c(\sigma,z)$ simply represents a scaling of the electric field, it must satisfy the differential equation (2.2.1) for $E(\sigma,z)$, i.e.

$$\partial_z^2 c(\sigma,z) - i\omega\mu_0\,\sigma(z)\,c(\sigma,z) = 0. \tag{2.3.2}$$

Following the procedure outlined in Section 2.2, assume (2.3.2) is satisfied by a conductivity $\sigma_0(z)$ and introduce a perturbation

$$\sigma(z) = \sigma_0(z) + \delta\sigma(z), \tag{2.3.3a}$$

$$c(\sigma,z) = c(\sigma_0 + \delta\sigma,z) = c_0(z) + c_1(z) + c_2(z) + \ldots. \tag{2.3.3b}$$

Equating terms of like order leads to

$$\begin{aligned}
\partial_z^2 c_0 - (i\omega\mu_0\sigma_0)\,c_0 &= 0, &\text{(2.3.4a)}\\
\partial_z^2 c_1 - (i\omega\mu_0\sigma_0)\,c_1 &= (i\omega\mu_0)\,c_0\,\delta\sigma, &\text{(2.3.4b)}\\
\partial_z^2 c_2 - (i\omega\mu_0\sigma_0)\,c_2 &= (i\omega\mu_0)\,c_1\,\delta\sigma, &\text{(2.3.4c)}\\
\partial_z^2 c_3 - (i\omega\mu_0\sigma_0)\,c_3 &= (i\omega\mu_0)\,c_2\,\delta\sigma, &\text{(2.3.4d)}\\
&\;\;\vdots
\end{aligned}$$

Consider a solution to (2.3.4) for a particular depth $z_m$. The zero-order equation (2.3.4a) is in the form of the original differential equation (2.3.2) and has a solution $c_0(z_m) = c(\sigma_0,z_m)$. The remaining equations all have the same form and can be solved if a Green's function $Q(\sigma_0,z_m,z)$ (e.g. Morse & Feshbach 1953, Chapter 7) can be found which satisfies

$$\partial_z^2 Q(\sigma_0,z_m,z) - i\omega\mu_0\,\sigma_0(z)\,Q(\sigma_0,z_m,z) = \delta(z_m - z), \tag{2.3.5}$$

where $\delta(z_m - z)$ is the Dirac delta function centred at $z = z_m$. The Green's function $Q$ must satisfy boundary conditions $Q(z) = 0$ for $z \to \infty$ and $\partial_z Q(z{=}0) = 0$ (the latter condition follows from definition (2.3.1) and the expansion $c(\sigma) = c(\sigma_0) + c_1 + c_2 + \ldots$). The solutions to (2.3.4) are then given by

$$\begin{aligned}
c_0(z_m) &= c(\sigma_0,z_m), &\text{(2.3.6a)}\\
c_1(z_m) &= \int_0^\infty i\omega\mu_0\,Q(\sigma_0,z_m,z)\,c_0(z)\,\delta\sigma(z)\,dz, &\text{(2.3.6b)}\\
c_2(z_m) &= \int_0^\infty i\omega\mu_0\,Q(\sigma_0,z_m,z)\,c_1(z)\,\delta\sigma(z)\,dz, &\text{(2.3.6c)}\\
c_3(z_m) &= \int_0^\infty i\omega\mu_0\,Q(\sigma_0,z_m,z)\,c_2(z)\,\delta\sigma(z)\,dz, &\text{(2.3.6d)}\\
&\;\;\vdots
\end{aligned}$$

As before, the zero-order term represents the response functional evaluated at the starting model, the first-order term is linear in $\delta\sigma$, and $c_2, c_3, \ldots$ represent higher-order contributions. A closed-form remainder term $c_r$ can be found by introducing an expansion

$$c(\sigma) = c(\sigma_0 + \delta\sigma) = c_0 + c_1 + c_r \tag{2.3.7}$$

into the differential equation (2.3.2). This leads to

$$c_r(z_m) = \int_0^\infty i\omega\mu_0\,Q(\sigma_0,z_m,z)\,\delta c(z)\,\delta\sigma(z)\,dz, \tag{2.3.8}$$

where $\delta c = c(\sigma) - c(\sigma_0)$. The equivalence of the series of higher-order terms and the remainder term is clear: using $\delta c = c_1 + c_2 + \ldots$ in (2.3.8) and breaking the integral into terms according to their order yields the expansion terms given by (2.3.6c, d, ...).

It is straightforward to change the recursive form of the expansion terms in (2.3.6) to the standard form for the Fréchet differential expansion according to (2.1.8). By substituting the expression for $c_{k-1}$ into the expression for $c_k$ and implicitly assuming the dependence on $\sigma_0$, (2.3.6) can be written as

$$\begin{aligned}
c_0(z_m) &= c(\sigma_0,z_m), &\text{(2.3.9a)}\\
c_1(z_m) &= \int_0^\infty G_1(z_m,z_1)\,\delta\sigma(z_1)\,dz_1, &\text{(2.3.9b)}\\
c_2(z_m) &= \frac{1}{2!}\int_0^\infty\!\!\int_0^\infty G_2(z_m,z_1,z_2)\,\delta\sigma(z_1)\,\delta\sigma(z_2)\,dz_1\,dz_2, &\text{(2.3.9c)}\\
c_3(z_m) &= \frac{1}{3!}\int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty G_3(z_m,z_1,z_2,z_3)\,\delta\sigma(z_1)\,\delta\sigma(z_2)\,\delta\sigma(z_3)\,dz_1\,dz_2\,dz_3, &\text{(2.3.9d)}\\
&\;\;\vdots
\end{aligned}$$

where the Fréchet kernels are defined by

$$\begin{aligned}
G_1(z_m,z_1) &= i\omega\mu_0\,Q(z_m,z_1)\,c(\sigma_0,z_1), &\text{(2.3.10a)}\\
G_2(z_m,z_1,z_2) &= -2\omega^2\mu_0^2\,Q(z_m,z_2)\,Q(z_2,z_1)\,c(\sigma_0,z_1), &\text{(2.3.10b)}\\
G_3(z_m,z_1,z_2,z_3) &= -6i\omega^3\mu_0^3\,Q(z_m,z_3)\,Q(z_3,z_2)\,Q(z_2,z_1)\,c(\sigma_0,z_1), &\text{(2.3.10c)}\\
&\;\;\vdots
\end{aligned}$$

The problem in making use of these expansions for $c(\sigma,z_m)$ is that the Green's function $Q(\sigma_0,z_m,z)$ may be difficult to determine for general $z_m$ and $\sigma_0$. However, Parker (1977b) shows that even when $Q(\sigma_0,z_m,z)$ cannot be determined,

$$Q(\sigma_0,z_m{=}0,z) = -c(\sigma_0,z) \tag{2.3.11}$$

may be chosen.
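The Green's function solution (2.3.6b) and the relation (2.3.11) can be checked numerically for a uniform halfspace, where $Q$ is available in closed form. The Python sketch below assumes $\sigma_0$ and $\delta\sigma$ are constant, uses a Green's function built from decaying exponentials with $\partial_z Q = 0$ at $z = 0$ (its validity can be confirmed by substitution into (2.3.5)), evaluates (2.3.6b) by trapezoidal quadrature, and compares the result with the ordinary derivative of $c(\sigma,z_m) = e^{-kz_m}/k$ with respect to $\sigma$. All function names, the truncation depth and the numerical values are assumptions made for this illustration.

```python
import cmath
import math

MU0 = 4.0e-7 * math.pi  # magnetic permeability of free space (H/m)

def green(k, z_m, z):
    """Uniform-halfspace Green's function: decays at depth, zero slope at z = 0."""
    return -(cmath.exp(-k * abs(z - z_m)) + cmath.exp(-k * (z + z_m))) / (2.0 * k)

def c0(k, z):
    """c(sigma0, z) = -E(z) / d_z E(0) = exp(-k z) / k for E ~ exp(-k z)."""
    return cmath.exp(-k * z) / k

def c1_quadrature(sigma0, dsigma, omega, z_m, n=20000):
    """Evaluate (2.3.6b) by the trapezoidal rule on [0, 40 / Re(k)]."""
    k = cmath.sqrt(1j * omega * MU0 * sigma0)
    z_max = 40.0 / k.real            # integrand is negligible beyond this depth
    h = z_max / n
    total = 0.0
    for i in range(n + 1):
        z = i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * 1j * omega * MU0 * green(k, z_m, z) * c0(k, z) * dsigma
    return total * h

def c1_taylor(sigma0, dsigma, omega, z_m):
    """(dc/dsigma) * dsigma for c = exp(-k z_m)/k with k = sqrt(i omega mu0 sigma)."""
    k = cmath.sqrt(1j * omega * MU0 * sigma0)
    return -cmath.exp(-k * z_m) * (z_m + 1.0 / k) * dsigma / (2.0 * sigma0)

omega = 2.0 * math.pi        # angular frequency for 1 Hz
sigma0, dsigma = 0.01, 1.0e-3
z_m = 2000.0                 # measurement depth (m)
k0 = cmath.sqrt(1j * omega * MU0 * sigma0)

c1_numeric = c1_quadrature(sigma0, dsigma, omega, z_m)
c1_exact = c1_taylor(sigma0, dsigma, omega, z_m)
quadrature_misfit = abs(c1_numeric - c1_exact) / abs(c1_exact)

# Relation (2.3.11): at z_m = 0 the Green's function equals -c(sigma0, z).
surface_check = abs(green(k0, 0.0, 750.0) + c0(k0, 750.0)) / abs(c0(k0, 750.0))
```

The agreement between `c1_numeric` and `c1_exact` is limited only by the quadrature step, and `surface_check` vanishes to rounding error, which is the uniform-halfspace instance of Parker's choice (2.3.11).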
Thus, for $z_m = 0$ the linearized equation reduces to

$$\delta c(\omega) \approx c_1(\omega) = \int_0^\infty G_1(\sigma_0,\omega,z)\,\delta\sigma(z)\,dz, \tag{2.3.12a}$$

$$G_1(\sigma_0,\omega,z) = -i\omega\mu_0\,c^2(\sigma_0,z), \tag{2.3.12b}$$

and the remainder term reduces to

$$c_r(\omega) = \int_0^\infty -i\omega\mu_0\,c(\sigma_0,z)\,\delta c(z)\,\delta\sigma(z)\,dz, \tag{2.3.13}$$

which correspond to the expressions derived by Parker (1977b). The linear and remainder terms for arbitrary $z_m$ as well as the series of higher-order terms given by (2.3.6c, d, ...) or (2.3.9c, d, ...) cannot be determined in this manner. In general, this does not represent a practical limitation. MT responses are usually measured at the surface, and if not, the coordinate system may be redefined so that $z_m = 0$. Thus, expressions for $\delta c$ and $c_r$ are always available, and these are all that are required to carry out a linearized inversion (an expression for $c_r$ is required to verify that the neglected terms are second order). However, for completeness the linear term, series of higher-order terms and remainder term will be derived for arbitrary measurement depth $z_m$.

Expressions for these terms can be derived if the responses are generalized to depth according to

$$\tilde{c}(\sigma,z) = -\frac{E(\sigma,z)}{\partial_z E(\sigma,z)}, \tag{2.3.14}$$

i.e. $\tilde{c}(0) = c(0)$, but in general $\tilde{c}(z) \neq c(z)$. Following the procedure outlined for expanding $R$ and $c$, a differential equation may be derived for $\tilde{c}$, and a perturbation introduced to the conductivity and response. Collecting terms according to their order and solving the resulting series of differential equations (using an integrating factor) leads to the following expressions for the expansion terms:

$$\begin{aligned}
\tilde{c}_0(z_m) &= \tilde{c}(\sigma_0,z_m), &\text{(2.3.15a)}\\
\tilde{c}_1(z_m) &= \int_{z_m}^\infty -i\omega\mu_0\,\tilde{c}_0^2(z)\,e^{s(z_m,z)}\,\delta\sigma(z)\,dz, &\text{(2.3.15b)}\\
\tilde{c}_2(z_m) &= \int_{z_m}^\infty -i\omega\mu_0\left[\tilde{c}_1^2(z)\,\sigma_0(z) + 2\tilde{c}_0(z)\,\tilde{c}_1(z)\,\delta\sigma(z)\right]e^{s(z_m,z)}\,dz, &\text{(2.3.15c)}\\
\tilde{c}_3(z_m) &= \int_{z_m}^\infty -i\omega\mu_0\left[2\tilde{c}_1(z)\,\tilde{c}_2(z)\,\sigma_0(z) + \left(\tilde{c}_1^2(z) + 2\tilde{c}_0(z)\,\tilde{c}_2(z)\right)\delta\sigma(z)\right]e^{s(z_m,z)}\,dz, &\text{(2.3.15d)}\\
&\;\;\vdots
\end{aligned}$$

where

$$s(z_m,z) = \int_{z_m}^{z} -2i\omega\mu_0\,\tilde{c}_0(u)\,\sigma_0(u)\,du. \tag{2.3.16}$$
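A closed-form remainder term $\tilde{c}_r$ is given by

$$\tilde{c}_r(z_m) = \int_{z_m}^\infty -i\omega\mu_0\left[\delta\tilde{c}^2(z)\left(\sigma_0(z) + \delta\sigma(z)\right) + 2\tilde{c}_0(z)\,\delta\tilde{c}(z)\,\delta\sigma(z)\right]e^{s(z_m,z)}\,dz. \tag{2.3.17}$$

The equivalence of the remainder term and the expansion of higher-order terms given by (2.3.15c, d, ...) is easily established by substituting $\delta\tilde{c} = \tilde{c}(\sigma) - \tilde{c}(\sigma_0) = \tilde{c}_1 + \tilde{c}_2 + \ldots$ into (2.3.17) and breaking the integral into terms according to their order.

The recursive form of the expansion terms in (2.3.15) may be changed to the standard Fréchet differential form in a manner similar to that outlined in Section 2.2.1; however, the expressions for the Fréchet kernels are lengthy and will only be given to second order:

$$\begin{aligned}
\tilde{c}_0(z_m) &= \tilde{c}(\sigma_0,z_m), &\text{(2.3.18a)}\\
\tilde{c}_1(z_m) &= \int_0^\infty \tilde{G}_1(z_m,z_1)\,\delta\sigma(z_1)\,dz_1, &\text{(2.3.18b)}\\
\tilde{c}_2(z_m) &= \frac{1}{2!}\int_0^\infty\!\!\int_0^\infty \tilde{G}_2(z_m,z_1,z_2)\,\delta\sigma(z_1)\,\delta\sigma(z_2)\,dz_1\,dz_2, &\text{(2.3.18c)}
\end{aligned}$$

where the Fréchet kernels are given by

$$\tilde{G}_1(z_m,z_1) = -i\omega\mu_0\,\tilde{c}^2(\sigma_0,z_1)\,H(z_1 - z_m)\,e^{s(z_m,z_1)}, \tag{2.3.19a}$$

$$\begin{aligned}
\tilde{G}_2(z_m,z_1,z_2) = {}& 2i\omega^3\mu_0^3\,\tilde{c}^2(\sigma_0,z_1)\,\tilde{c}^2(\sigma_0,z_2)\int_0^\infty \sigma_0(z_3)\,H(z_1 - z_3)\,H(z_2 - z_3)\,H(z_3 - z_m)\,e^{s(z_m,z_1) + s(z_3,z_2)}\,dz_3\\
&- 4\omega^2\mu_0^2\,\tilde{c}^2(\sigma_0,z_1)\,\tilde{c}(\sigma_0,z_2)\,H(z_1 - z_2)\,H(z_2 - z_m)\,e^{s(z_m,z_1)}.
\end{aligned} \tag{2.3.19b}$$

Equations (2.3.15)-(2.3.19) provide complete expansions for the generalized response $\tilde{c}(\sigma,z_m) = -E(\sigma,z_m)/\partial_z E(\sigma,z_m)$ in terms of an infinite series or a linear and remainder term. For $z_m = 0$ the Fréchet kernel of the linear term, given by (2.3.19a), may be written

$$\tilde{G}_1(z_m{=}0,z) = -i\omega\mu_0\left[\frac{E(\sigma_0,z)}{\partial_z E(\sigma_0,z)}\right]^2 \exp\left[2\int_0^z i\omega\mu_0\,\sigma_0(u)\,\frac{E(\sigma_0,u)}{\partial_u E(\sigma_0,u)}\,du\right]. \tag{2.3.20}$$

Equation (2.3.20) may be simplified by noting that

$$\exp\left[\int_0^z i\omega\mu_0\,\sigma_0(u)\,\frac{E(\sigma_0,u)}{\partial_u E(\sigma_0,u)}\,du\right] = \frac{\partial_z E(\sigma_0,z)}{\partial_z E(\sigma_0,0)} \tag{2.3.21}$$

(to obtain this relationship, divide the MT differential equation (2.2.1) by $\partial_z E(z)$ and integrate from 0 to $z$). This leads to

$$\tilde{G}_1(z_m{=}0,z) = -i\omega\mu_0\left[\frac{E(\sigma_0,z)}{\partial_z E(\sigma_0,0)}\right]^2 = -i\omega\mu_0\,c^2(\sigma_0,z), \tag{2.3.22}$$

which is equal to $G_1(z_m{=}0,z)$, the Fréchet kernel for the generalized response $c(\sigma,z_m) = -E(\sigma,z_m)/\partial_z E(\sigma,0)$ given by (2.3.12). Thus, the Fréchet kernel and linear term are identical for $z_m = 0$ regardless of how the response is generalized. This also holds for the higher-order terms and remainder term.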
For the usual case where responses are measured at the Earth's surface, the linear and remainder term for $c(\sigma,z_m{=}0)$ are simpler to work with. However, the generalization $\tilde{c}(\sigma,z_m)$ given by (2.3.14) allows the computation of the linear and remainder terms for arbitrary $z_m$ as well as the series of higher-order terms.

The proof of the Fréchet differentiability of $c$ has been the subject of a number of papers. Parker's (1970) paper in which he first derived the Fréchet kernel for the linear term and a later application to computing conductivity bounds (Parker 1972) simply assumed $c$ to be Fréchet differentiable. Anderssen (1975) questioned the validity of Parker's results since the neglected terms had not been proved to be second order. Anderssen's observations became more significant when Woodhouse (1976) showed that first- rather than second-order remainder terms result for the seismic normal mode problem when the Earth model is allowed to be discontinuous, invalidating the Fréchet kernels derived by Backus & Gilbert (1967) for this problem. Parker (1977b) appeared to prove the Fréchet differentiability of $c$ with respect to conductivity profiles $\sigma \in l_2$, the space of square-integrable functions. Chave (1984) proved Fréchet differentiability for the fundamental toroidal and poloidal modes of EM induction (of which MT is a special case) in an $l_2$ norm. MacBain (1986) detected a mathematical flaw in Parker's (1977b) proof, but proved the result for $\sigma \in C^2$, the space of functions which are twice continuously differentiable. In Parker's (1986) reply, he acknowledged the error but pointed out that MacBain's choice of model space was overly restrictive. MacBain (1987) appears to have completed the problem by proving the Fréchet differentiability of $c$ with respect to conductivity models $\sigma$ in $l_1$ and $l_2$ and showing that the $l_1$ result can be extended to include finite delta-comb functions. Thus, the Fréchet differentiability of $c$ seems to be firmly established.
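Since the Fréchet differentiability of $c$ has been extensively investigated and $\tilde{c} = c$ for the usual case of $z_m = 0$, the differentiability of $\tilde{c}$ will not be considered further in this thesis.

2.3.2 Constant-conductivity example

For the simple case where conductivity profiles and perturbations are depth independent, $\tilde{c}(\sigma, z_m) = (i\omega\mu_0\sigma)^{-1/2}$ and it is straightforward to show that the Fréchet differential terms given by (2.3.15) reduce to

$$\tilde{c}_0(z_m) = \frac{1}{\sqrt{i\omega\mu_0}\,\sigma_0^{1/2}}, \quad \tilde{c}_1(z_m) = -\frac{1}{2}\,\frac{\delta\sigma}{\sqrt{i\omega\mu_0}\,\sigma_0^{3/2}}, \quad \tilde{c}_2(z_m) = \frac{3}{8}\,\frac{\delta\sigma^2}{\sqrt{i\omega\mu_0}\,\sigma_0^{5/2}}, \quad \tilde{c}_3(z_m) = -\frac{5}{16}\,\frac{\delta\sigma^3}{\sqrt{i\omega\mu_0}\,\sigma_0^{7/2}}, \qquad (2.3.23)$$

and thus the Fréchet differential expansion is given by

$$\tilde{c} = \frac{1}{\sqrt{i\omega\mu_0\sigma_0}} \left( 1 - \frac{1}{2}\frac{\delta\sigma}{\sigma_0} + \frac{3}{8}\frac{\delta\sigma^2}{\sigma_0^2} - \frac{5}{16}\frac{\delta\sigma^3}{\sigma_0^3} + \ldots \right). \qquad (2.3.24)$$

The first term represents the functional $\tilde{c}$ evaluated at the starting model $\sigma_0$, the second term is linear in $\delta\sigma$ and the remaining terms form an alternating series in increasing powers of $\delta\sigma$. All terms are depth independent. It is straightforward to verify that an equation identical to (2.3.24) is obtained using ordinary derivatives of $\tilde{c}(\sigma, z_m)$ with respect to $\sigma$ and the standard Taylor series expansion. The ratio test for convergence of a series indicates that the series of higher-order terms in (2.3.24) converges for $|\delta\sigma/\sigma_0| < 1$ and diverges for $|\delta\sigma/\sigma_0| > 1$.

The remainder term $\tilde{c}_r(z_m)$ given by (2.3.17) reduces, for constant conductivities, to

$$\tilde{c}_r(z_m) = \frac{1}{\sqrt{i\omega\mu_0}} \left( \frac{1}{\sqrt{\sigma_0 + \delta\sigma}} - \frac{1}{\sqrt{\sigma_0}} + \frac{\delta\sigma}{2\sigma_0^{3/2}} \right), \qquad (2.3.25)$$

and it is straightforward to verify that $\tilde{c}(\sigma) = \tilde{c}_0 + \tilde{c}_1 + \tilde{c}_r$ holds for all $\sigma_0$ and $\delta\sigma$. Writing $1/\sqrt{\sigma_0 + \delta\sigma} = \sigma_0^{-1/2}(1 + \delta\sigma/\sigma_0)^{-1/2}$ in (2.3.25) and expanding $(1 + \delta\sigma/\sigma_0)^{-1/2}$ in a binomial series (provided $|\delta\sigma/\sigma_0| < 1$) leads to

$$\tilde{c}_r(z_m) = \frac{1}{\sqrt{i\omega\mu_0\sigma_0}} \left( \frac{3}{8}\frac{\delta\sigma^2}{\sigma_0^2} - \frac{5}{16}\frac{\delta\sigma^3}{\sigma_0^3} + \ldots \right), \qquad (2.3.26)$$

which demonstrates the equivalence of the remainder term and the infinite series of higher-order terms within the region of convergence.

Evaluating the expansion terms for $c(\sigma, z_m) = -E(\sigma, z_m)/\partial_z E(\sigma, 0)$ is somewhat more complicated.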
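The constant-conductivity expansion above lends itself to a quick numerical check. The following is a minimal sketch, not part of the original analysis: the function names and the chosen values are illustrative assumptions. It evaluates the series terms of (2.3.24) from the binomial coefficients of $(1+x)^{-1/2}$ and the closed-form remainder (2.3.25).

```python
import cmath
import math

MU0 = 4e-7 * math.pi  # permeability of free space (H/m)


def c_halfspace(sigma, omega):
    """Response (i*omega*mu0*sigma)**(-1/2) for a depth-independent conductivity."""
    return 1.0 / cmath.sqrt(1j * omega * MU0 * sigma)


def series_terms(sigma0, dsigma, omega, n_terms):
    """Terms c_0, c_1, ..., c_{n_terms-1} of the expansion (2.3.24) about sigma0."""
    x = dsigma / sigma0
    c0 = c_halfspace(sigma0, omega)
    terms = []
    coeff = 1.0  # coefficient of x**n in the binomial series of (1 + x)**(-1/2)
    for n in range(n_terms):
        terms.append(c0 * coeff * x**n)
        coeff *= -(2 * n + 1) / (2 * (n + 1))
    return terms


def remainder(sigma0, dsigma, omega):
    """Closed-form remainder (2.3.25): everything beyond the linear term."""
    a = cmath.sqrt(1j * omega * MU0)
    return (1.0 / math.sqrt(sigma0 + dsigma)
            - 1.0 / math.sqrt(sigma0)
            + dsigma / (2.0 * sigma0**1.5)) / a


if __name__ == "__main__":
    omega = 2.0 * math.pi / 0.1  # period of 0.1 s (illustrative value)
    sigma0 = 0.01                # starting conductivity in S/m (illustrative value)
    for dsigma in (0.005, 0.03):  # inside and outside the radius of convergence
        c0, c1 = series_terms(sigma0, dsigma, omega, 2)
        exact = c_halfspace(sigma0 + dsigma, omega)
        closed = c0 + c1 + remainder(sigma0, dsigma, omega)
        partial = sum(series_terms(sigma0, dsigma, omega, 40))
        print(dsigma, abs(exact - closed), abs(exact - partial))
```

The closed-form remainder reproduces the exact response for any admissible perturbation, whereas the partial sums only approach it when $|\delta\sigma/\sigma_0| < 1$, in line with the ratio-test statement.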
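For the special case where $\sigma_0$ is a constant, it may be verified that

$$\mathcal{G}(\sigma_0, z_m, z) = -\frac{e^{-\sqrt{i\omega\mu_0\sigma_0}\,|z - z_m|} + e^{-\sqrt{i\omega\mu_0\sigma_0}\,(z + z_m)}}{2\sqrt{i\omega\mu_0\sigma_0}} \qquad (2.3.27)$$

is a solution to the differential equation (2.3.5) for the Green's function which also satisfies the boundary conditions $\mathcal{G}(z) = 0$ for $z \to \infty$ and $\partial_z \mathcal{G}(z = 0) = 0$. For $z_m = 0$

$$\mathcal{G}(\sigma_0, z_m = 0, z) = -\frac{e^{-\sqrt{i\omega\mu_0\sigma_0}\,z}}{\sqrt{i\omega\mu_0\sigma_0}} = \frac{E(\sigma_0, z)}{\partial_z E(\sigma_0, 0)} = -c(\sigma_0, z), \qquad (2.3.28)$$

in agreement with (2.3.11). The Fréchet differential terms for $c(\sigma, z_m)$ given by (2.3.6) become

$$c_0(z_m) = \frac{e^{-\sqrt{i\omega\mu_0\sigma_0}\,z_m}}{\sqrt{i\omega\mu_0\sigma_0}},$$
$$c_1(z_m) = -\left[ \frac{1}{2\sqrt{i\omega\mu_0}\,\sigma_0^{3/2}} + \frac{z_m}{2\sigma_0} \right] e^{-\sqrt{i\omega\mu_0\sigma_0}\,z_m}\,\delta\sigma,$$
$$c_2(z_m) = \left[ \frac{3}{8\sqrt{i\omega\mu_0}\,\sigma_0^{5/2}} + \frac{3 z_m}{8\sigma_0^2} + \frac{\sqrt{i\omega\mu_0}\,z_m^2}{8\sigma_0^{3/2}} \right] e^{-\sqrt{i\omega\mu_0\sigma_0}\,z_m}\,\delta\sigma^2, \qquad (2.3.29)$$
$$c_3(z_m) = -\left[ \frac{5}{16\sqrt{i\omega\mu_0}\,\sigma_0^{7/2}} + \frac{5 z_m}{16\sigma_0^3} + \frac{\sqrt{i\omega\mu_0}\,z_m^2}{8\sigma_0^{5/2}} + \frac{i\omega\mu_0\,z_m^3}{48\sigma_0^2} \right] e^{-\sqrt{i\omega\mu_0\sigma_0}\,z_m}\,\delta\sigma^3.$$

It is straightforward to show that an ordinary Taylor series expansion of $c(\sigma, z_m)$ about $\sigma_0$ leads to an expansion with terms identical to (2.3.29). For $z_m = 0$ the expansion terms for $c(\sigma, z_m)$ given by (2.3.29) reduce to the depth-independent expansion terms of $\tilde{c}(\sigma, z_m)$ given by (2.3.23). The remainder term $c_r(z_m)$ given by (2.3.17) can be shown to reduce, for constant conductivities, to

$$c_r(z_m) = \frac{1}{\sqrt{i\omega\mu_0}} \left[ \frac{e^{-\sqrt{i\omega\mu_0(\sigma_0 + \delta\sigma)}\,z_m}}{\sqrt{\sigma_0 + \delta\sigma}} - \left( \frac{1}{\sqrt{\sigma_0}} - \frac{\delta\sigma}{2\sigma_0^{3/2}} - \frac{\sqrt{i\omega\mu_0}\,z_m\,\delta\sigma}{2\sigma_0} \right) e^{-\sqrt{i\omega\mu_0\sigma_0}\,z_m} \right]. \qquad (2.3.30)$$

It is straightforward to verify that $c(\sigma, z_m) = c_0(z_m) + c_1(z_m) + c_r(z_m)$ holds for all $\sigma_0$ and $\delta\sigma$, and that for $z_m = 0$ (2.3.30) is equivalent to the remainder term $\tilde{c}_r(z_m)$ given by (2.3.25).

2.4 Relative linearity of R and c

2.4.1 Quantifying the linearity

In Sections 2.2 and 2.3 Fréchet differential expansions are derived for the MT responses $R$ and $c$. By proving that the remainder (non-linear) term is second order and may be neglected, the linear term can be inverted using methods of linear inverse theory. Linearized inversion algorithms have been successfully implemented which make use of both responses (see references cited in Section 2.1.2). However, an investigation of the relative linearity of the two responses has not been presented.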
In a non-linear problem, the choice of model and response may well affect the linearity of the problem (i.e. the relative size of the linear and non-linear terms). Strictly speaking, a problem is either linear or non-linear; however, a particular choice of model and response may be considered 'more linear' than others if it yields a more accurate linearized equation when the higher-order terms are neglected. The correct choice should increase the likelihood of the algorithm converging to an acceptable solution, particularly when a model is sought which minimizes a particular functional. In addition, formulating the most accurate linearized inverse problem should minimize the number of iterations required to achieve an acceptable model. This may be significant, particularly for 2- and 3-D inversion schemes which require a large number of 1-D solutions to perform an approximate inversion step (Oldenburg & Ellis 1990) or to carry out uncoupled 1-D inversions at each site (Smith 1989).

Three possible choices for the model are conductivity $\sigma$, resistivity $\rho$ and log conductivity ($\log\sigma = -\log\rho$). Smith & Booker (1988) argue convincingly that conductivity is the choice of model for which the problem is most linear; therefore, $\sigma(z)$ is adopted as model in this study. The only consideration of the linearity of MT responses $R$ and $c$ in the literature is a brief heuristic argument presented by Smith & Booker (1988) to motivate their use of $R$ in a linearized inversion algorithm. By integrating the MT differential equation (2.2.1) and normalizing by the surface field, Smith & Booker derive an exact relationship (2.4.1) between $R$ and $\sigma$, in which $R$ is expressed as an integral over depth of $\sigma(z)$ weighted by the electric field normalized by its surface value. According to their argument, if $E$ were independent of $\sigma$, $R$ would be linear in $\sigma$ and therefore may be more linear than $c$. They do not demonstrate that a similar statement cannot be made about $c$. Although this argument may motivate their choice of $R$, the reasoning is not very general since $E$ is certainly not independent of $\sigma$ (the dependence of the electric field and its gradient on the Earth conductivity structure is, in fact, the basis for MT). A more general consideration of the relative linearity is required in order to determine the most linear response.

The purpose of this section is to attempt to quantify the relative linearity of $R$ and $c$ in order to select the most linear response. The quantity that will be considered diagnostic of the linearity of a response functional is the ratio of the magnitudes of the non-linear and linear terms in the Fréchet differential expansion, defined by

$$\alpha = \frac{|\text{non-linear terms}|}{|\text{linear term}|}. \qquad (2.4.2)$$

If the problem is linear, the non-linear terms are zero and $\alpha = 0$; if the non-linear terms are small compared to the linear term, $\alpha$ is small and the problem is considered 'almost linear'; if the non-linear terms dominate, $\alpha$ is large and the problem is considered 'very non-linear'. Other diagnostic quantities could be defined, but the linearity ratio $\alpha$ given by (2.4.2) provides a practical and useful measure of the linearity. Since the remainder terms $R_r$ and $c_r$ provide closed-form expressions which contain all the higher-order contributions, the linearity ratios $\alpha_R$ and $\alpha_c$ for $R$ and $c$ can be written

$$\alpha_R = \frac{|R_r|}{|R_1|}, \qquad (2.4.3)$$

$$\alpha_c = \frac{|c_r|}{|c_1|}. \qquad (2.4.4)$$

The ratio $\alpha_R/\alpha_c$ is diagnostic of the relative linearity of $R$ and $c$: when $\alpha_R/\alpha_c < 1$, $R$ may be considered more linear than $c$; when $\alpha_R/\alpha_c > 1$, $R$ is more non-linear than $c$. Calculating $\alpha_R$, $\alpha_c$ and the ratio $\alpha_R/\alpha_c$ allows the relative linearity of the responses $R$ and $c$ to be quantitatively compared.

2.4.2 Linearity for constant-conductivity models

In order to demonstrate how the relative linearity of $R$ and $c$ may be quantified using the linearity ratio $\alpha$, consider the special case where the conductivity models and perturbations are independent of depth.
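As a minimal numerical sketch of the linearity ratio defined in (2.4.2)-(2.4.4), the depth-independent case can be evaluated directly. Two assumptions are made here and are not taken from the text: $c$ is treated as proportional to $\sigma^{-1/2}$ (as in Section 2.3.2) and $R$ as proportional to $\sigma^{1/2}$ for a halfspace, so that common frequency-dependent factors cancel in each ratio. The function names are illustrative.

```python
import math


def alpha_c(sigma, sigma0):
    """|c_r| / |c_1| for a response proportional to sigma**(-1/2)."""
    dsigma = sigma - sigma0
    linear = -0.5 * sigma0**-1.5 * dsigma          # first-order Taylor term
    remainder = sigma**-0.5 - sigma0**-0.5 - linear  # all higher-order terms
    return abs(remainder) / abs(linear)


def alpha_R(sigma, sigma0):
    """|R_r| / |R_1| for a response ASSUMED proportional to sigma**(+1/2)."""
    dsigma = sigma - sigma0
    linear = 0.5 * sigma0**-0.5 * dsigma
    remainder = sigma**0.5 - sigma0**0.5 - linear
    return abs(remainder) / abs(linear)


if __name__ == "__main__":
    sigma = 0.01  # true conductivity in S/m (illustrative value)
    for ratio in (1e-3, 1e-2, 1e-1, 0.5, 2.0, 10.0, 100.0):
        sigma0 = ratio * sigma
        a_r = alpha_R(sigma, sigma0)
        a_c = alpha_c(sigma, sigma0)
        print(f"sigma0/sigma={ratio:8.3f}  alpha_R={a_r:.4f}  "
              f"alpha_c={a_c:.4f}  alpha_R/alpha_c={a_r / a_c:.4f}")
```

Under these assumptions the ratio $\alpha_R/\alpha_c$ simplifies algebraically to $1/(1 + 2\sqrt{\sigma_0/\sigma})$, which gives a convenient cross-check on the printed values.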
As before, $\sigma$ represents the true model, $\sigma_0$ represents an arbitrary starting model, and $\delta\sigma = \sigma - \sigma_0$. An expression for the linearity ratio for $R$, $\alpha_R$, may be obtained by using the linear and non-linear (remainder) terms derived for the constant-conductivity case in Section 2.2.3. Substituting the expressions for $R_1$ and $R_r$ given by (2.2.31) and (2.2.34), respectively, into (2.4.3) leads to

$$\alpha_R = \frac{\sigma - 2\sqrt{\sigma\sigma_0} + \sigma_0}{|\sigma - \sigma_0|}. \qquad (2.4.5)$$

For the case of constant conductivities, $\alpha_R$ is independent of frequency $\omega$, which simplifies the analysis. To investigate the linearity of $R$, the true model $\sigma$ is considered to be known and remains fixed, and $\alpha_R$ is computed for a wide range of starting models $\sigma_0$. A number of results follow immediately from (2.4.5): as $\sigma_0 \to \sigma$, $\alpha_R \to 0$; as $\sigma_0 \to 0$, $\alpha_R \to 1$; and as $\sigma_0 \to \infty$, $\alpha_R \to 1$. In fact, $0 \le \alpha_R < 1$ for all possible choices of $\sigma$ and $\sigma_0$, indicating that for $R$ the magnitude of the non-linear terms never exceeds that of the linear term.

An expression for the linearity ratio for $c$, $\alpha_c$, may be obtained by substituting expressions for the linear and remainder terms derived for $c$ in the constant-conductivity example of Section 2.3.2. For simplicity, the depth-independent terms $\tilde{c}_1$ and $\tilde{c}_r$ given by (2.3.23) and (2.3.25), respectively, may be used in (2.4.4) and lead to

$$\alpha_c = \frac{\sigma + 2\sigma_0^{3/2}\sigma^{-1/2} - 3\sigma_0}{|\sigma - \sigma_0|}. \qquad (2.4.6)$$

A number of results follow from (2.4.6): as $\sigma_0 \to \sigma$, $\alpha_c \to 0$; as $\sigma_0 \to 0$, $\alpha_c \to 1$; but as $\sigma_0 \to \infty$, $\alpha_c \to \infty$. Thus, $\alpha_c$ is not bounded: for large $\sigma_0$ the ratio of the non-linear to linear terms approaches infinity.

The ratio $\alpha_R/\alpha_c$ may be considered diagnostic of the relative linearity of $R$ and $c$. For constant conductivities this ratio may be written

$$\alpha_R/\alpha_c = \frac{\sigma - 2\sqrt{\sigma\sigma_0} + \sigma_0}{\sigma + 2\sigma_0^{3/2}\sigma^{-1/2} - 3\sigma_0}. \qquad (2.4.7)$$

For the case of constant conductivity profiles, it can be proved analytically that $\alpha_R/\alpha_c < 1$ for all values of $\sigma_0$. The proof begins with the identity

$$\left( \sqrt{\sigma/\sigma_0} - 1 \right)^2 > 0 \quad \text{or} \quad \frac{\sigma}{\sigma_0} > 2\sqrt{\frac{\sigma}{\sigma_0}} - 1. \qquad (2.4.8)$$

Multiplying through by $-2\sigma_0^{3/2}\sigma^{-1/2}$ (which reverses the inequality) and adding $\sigma_0 + \sigma$ to (2.4.8) leads to

$$\sigma - 2\sqrt{\sigma\sigma_0} + \sigma_0 < \sigma + 2\sigma_0^{3/2}\sigma^{-1/2} - 3\sigma_0, \qquad (2.4.9)$$

or

$$\alpha_R/\alpha_c = \frac{\sigma - 2\sqrt{\sigma\sigma_0} + \sigma_0}{\sigma + 2\sigma_0^{3/2}\sigma^{-1/2} - 3\sigma_0} < 1. \qquad (2.4.10)$$

It is noted that this proof does not hold for $\sigma_0 = 0$, $\sigma_0 = \sigma$, or $\sigma_0 \to \infty$. However, in these cases l'Hôpital's rule may be used to calculate the values: as $\sigma_0 \to 0$, $\alpha_R/\alpha_c \to 1$; as $\sigma_0 \to \sigma$, $\alpha_R/\alpha_c \to 1/3$; and as $\sigma_0 \to \infty$, $\alpha_R/\alpha_c \to 0$. Thus, $\alpha_R/\alpha_c \le 1$ holds for all values of $\sigma_0$, which indicates that for the case of depth-independent conductivities, $R$ is always at least as linear as $c$. For large values of $\sigma_0$, $c$ may be a great deal more non-linear than $R$.

To illustrate how the linearity of $R$ and $c$ varies with the starting model $\sigma_0$, Fig. 2.1 shows $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ computed from (2.4.5), (2.4.6) and (2.4.7), respectively, plotted as a function of $\sigma_0/\sigma$. Figure 2.1(a) shows that $\alpha_R$ and $\alpha_c$ both have a simple form with a single pronounced minimum occurring at $\sigma_0/\sigma = 1$. At this point $\alpha_R$ and $\alpha_c$ are actually zero, but the figure simply shows them approaching zero. The minimum represents the point where $\sigma$ and $\sigma_0$ are closest in a linear sense, i.e. the value of $\sigma_0$ where the linearized expansion most closely approximates the true response. For constant conductivities it is clear that this must occur at $\sigma_0/\sigma = 1$ since $R(\sigma) = R(\sigma_0) + \int G(\sigma_0, z)\,\delta\sigma(z)\,dz$ is exactly true for $\sigma_0 = \sigma$, $\delta\sigma = 0$ (and similarly for $c$). As $\sigma_0/\sigma$ decreases below one, $\alpha_R$ and $\alpha_c$ increase and asymptotically approach one. As $\sigma_0/\sigma$ increases above one, $\alpha_R$ again approaches one; however, $\alpha_c$ increases without bound. The relative linearity of $R$ and $c$ is illustrated in Fig. 2.1(b), which shows $\alpha_R/\alpha_c$ as a function of $\sigma_0/\sigma$. For small values of $\sigma_0/\sigma$, $\alpha_R/\alpha_c$ asymptotically approaches one. As $\sigma_0/\sigma$ increases, $\alpha_R/\alpha_c$ decreases smoothly, passing through the value 1/3 at $\sigma_0/\sigma = 1$. For large values of $\sigma_0/\sigma$, $\alpha_R/\alpha_c$ approaches zero, indicating $R$ is much more linear than $c$. It is clear from Fig.
2.1 that $\alpha_R < \alpha_c$ for all values of $\sigma_0/\sigma$, indicating that for constant-conductivity profiles, $R$ is a more linear choice of MT response than $c$.

Figure 2.1 Relative linearity of MT responses R and c for the case of constant-conductivity true and starting models. (a) shows $\alpha_R$ (solid line) and $\alpha_c$ (dashed line) as a function of $\sigma_0/\sigma$. (b) shows the ratio $\alpha_R/\alpha_c$ as a function of $\sigma_0/\sigma$.

2.4.3 Linearity for general models

In Section 2.4.2 it is shown that for the case of constant-conductivity profiles, $R$ is more linear than $c$ for all $\sigma$ and $\sigma_0$. This fact alone strongly suggests that $R$ is a better choice for linearizing the MT response than $c$; however, it is instructive to investigate the relative linearity for more general conductivity models. Considering general models introduces two difficulties not encountered in the constant-conductivity case. First, general analytic results concerning $\alpha_R$ and $\alpha_c$ are not available, and second, the linearity ratios depend on frequency $\omega$ (or period $T = 2\pi/\omega$) as well as the conductivities. In the most general formulation the linearity ratios would also depend on the depth of measurement, $z_m$; however, since the coordinate system may be redefined so $z_m = 0$, only the case corresponding to surface measurements need be considered.

Although analytic results are not available for general models, the linearity ratios $\alpha_R$ and $\alpha_c$ can be computed for a given choice of $\sigma(z)$, $\sigma_0(z)$ and $T$ using expressions for the expansion terms derived in Sections 2.2 and 2.3. The observed responses, $R(\sigma)$ and $c(\sigma)$, are expanded about the starting model $\sigma_0$ in terms of a linear and non-linear (remainder) term. The linear terms, $R_1$ and $c_1$, are calculated using (2.2.16) and (2.3.12), respectively.
Equations (2.2.11) and (2.3.13) provide expressions for the remainder terms $R_r$ and $c_r$; however, in a synthetic study, once the linear terms $R_1$ and $c_1$ have been calculated it is simpler to evaluate the remainder terms as

$$R_r = R(\sigma) - R(\sigma_0) - R_1, \qquad (2.4.11a)$$

$$c_r = c(\sigma) - c(\sigma_0) - c_1. \qquad (2.4.11b)$$

The linearity ratios $\alpha_R$ and $\alpha_c$ are calculated according to (2.4.3) and (2.4.4) as the ratio of the magnitudes of the non-linear and linear terms. Evaluating the quantities in (2.4.11) requires computing the electric and magnetic fields as a function of depth for the conductivity profiles $\sigma(z)$ and $\sigma_0(z)$; this is the forward problem for MT and can be solved in a manner similar to Oldenburg (1979). The forward problem is considered in more detail in Chapter 3.

For a given true model $\sigma$, the linearity ratios may be computed and compared for a number of starting models $\sigma_0$. It is obviously impossible to exhaustively consider all $\sigma$ and $\sigma_0$ in the model space. However, for a given $\sigma$ the linearity ratios $\alpha_R$ and $\alpha_c$ can be computed for all $\sigma_0$ within certain classes of starting models. Examining $\alpha_R$ and $\alpha_c$ for a number of cases which are representative of general conductivity structures provides insight into the relative linearity of $R$ and $c$ for arbitrary models.

As an example of the linearity analysis, consider a model $\sigma(z)$ which consists of a conductive surface layer 200-m thick with a conductivity of 0.1 S/m overlying a halfspace of conductivity 0.01 S/m. Figure 2.2 shows the linearity ratios computed when the starting models consist of halfspaces of conductivity $\sigma_0$: $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ are plotted as a function of $\sigma_0$ for periods $T$ of 0.01, 0.1, and 1.0 s. Each curve consists of 500 computed values logarithmically spaced in $\sigma_0$ from 0.001 to 10.0 S/m. The linearity ratios $\alpha_R$ and $\alpha_c$ generally vary with $\sigma_0$ in a regular manner with absolute minima occurring at the points where $\sigma$ and $\sigma_0$ are closest in a linear sense for $R$ and $c$, respectively.
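The procedure of (2.4.11b) and (2.4.4) can be sketched for the response $c$ with a layered true model and halfspace starting models. This is an illustrative sketch under stated assumptions, not the forward algorithm used in the thesis: the surface response is computed with a standard layer recursion for $c = -E/\partial_z E$, the linear term uses the surface kernel $-i\omega\mu_0 [E(\sigma_0,z)/\partial_z E(\sigma_0,0)]^2$ evaluated analytically for a halfspace, and all function names are hypothetical. The response $R$ is omitted because its kernel is defined in Section 2.2.

```python
import cmath
import math

MU0 = 4e-7 * math.pi  # permeability of free space (H/m)


def c_surface(sigmas, thicknesses, omega):
    """Surface response c = -E(0)/dzE(0) for layers over a halfspace.

    sigmas: conductivities from top to bottom; the last entry is the halfspace.
    thicknesses: thicknesses of every layer except the halfspace.
    """
    k = [cmath.sqrt(1j * omega * MU0 * s) for s in sigmas]
    c = 1.0 / k[-1]  # value at the top of the halfspace
    for kj, hj in zip(reversed(k[:-1]), reversed(thicknesses)):
        t = cmath.tanh(kj * hj)
        c = (kj * c + t) / (kj * (1.0 + kj * c * t))  # carry c up through layer j
    return c


def c_linear(sigmas, thicknesses, sigma0, omega):
    """Linear term: integral of G_1(z) * (sigma(z) - sigma0) for a halfspace start."""
    k0 = cmath.sqrt(1j * omega * MU0 * sigma0)
    total = 0.0
    top = 0.0
    for s, h in zip(sigmas[:-1], thicknesses):
        bottom = top + h
        # integral of exp(-2*k0*z) over the layer, times the perturbation
        total += (s - sigma0) * (cmath.exp(-2 * k0 * top)
                                 - cmath.exp(-2 * k0 * bottom)) / (2 * k0)
        top = bottom
    total += (sigmas[-1] - sigma0) * cmath.exp(-2 * k0 * top) / (2 * k0)
    return -1j * omega * MU0 / k0**2 * total


def alpha_c(sigmas, thicknesses, sigma0, period):
    """Linearity ratio |c_r| / |c_1| with c_r evaluated by difference."""
    omega = 2.0 * math.pi / period
    c1 = c_linear(sigmas, thicknesses, sigma0, omega)
    c_r = (c_surface(sigmas, thicknesses, omega)
           - c_surface([sigma0], [], omega) - c1)
    return abs(c_r) / abs(c1)


if __name__ == "__main__":
    # 200-m layer of 0.1 S/m over a 0.01 S/m halfspace, as in the example above
    for sigma0 in (0.001, 0.01, 0.03, 0.1, 1.0):
        print(sigma0, alpha_c([0.1, 0.01], [200.0], sigma0, 0.01))
```

When the true model is itself a halfspace, this routine should reproduce the closed-form ratio $|\sigma + 2\sigma_0^{3/2}\sigma^{-1/2} - 3\sigma_0| / |\sigma - \sigma_0|$, which is a useful consistency check.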
Unlike the constant-conductivity case shown in Fig. 2.1(a), the minima of $\alpha_R$ and $\alpha_c$ do not coincide. The locations of the minima vary with period: for $T = 0.01$ s, Fig. 2.2(a) shows the minima occur near $\sigma_0 = 0.1$ S/m (the value of the surface layer of $\sigma$), while for $T = 1.0$ s, Fig. 2.2(c) shows the minima occur near $\sigma_0 = 0.01$ S/m (the value of the underlying halfspace of $\sigma$). Figure 2.2(b) shows that for $T = 0.1$ s, the minima occur at values of $\sigma_0$ intermediate between 0.1 and 0.01 S/m. This variation may be understood by considering the depth of penetration of the electric fields in the true model $\sigma$. For the short period $T = 0.01$ s, the skin depth (depth at which $E$ has decayed to $1/e$ of its surface value) is less than the thickness of the surface layer, so the response is similar to that of a halfspace of conductivity 0.1 S/m. For the long period $T = 1.0$ s, the skin depth is an order of magnitude greater than the thickness of the surface layer, so the response is similar to that of a halfspace of conductivity 0.01 S/m. For periods much shorter than 0.01 s or longer than 1.0 s, plots of $\alpha_R$ and $\alpha_c$ simply resemble the constant-conductivity case shown in Fig. 2.1(a).

In addition to the minima, Fig. 2.2(a) also exhibits distinct maxima for $\alpha_R$ and $\alpha_c$ near $\sigma_0 = 0.1$ S/m. This feature is greatly reduced at longer periods. Except for a narrow interval near the maximum in Fig. 2.2(a), $\alpha_R$ is less than one for all $\sigma_0$ and $T$, indicating that the non-linear terms do not exceed the linear terms in magnitude. This is not true for $\alpha_c$, which increases rapidly from its minimum value as $\sigma_0$ increases. $\alpha_R$ is less than $\alpha_c$ for all $\sigma_0$ and $T$ except in the immediate vicinity of the minimum for $\alpha_c$. This point is illustrated in Fig. 2.2(d), (e) and (f), which show the ratio $\alpha_R/\alpha_c$ as a function of $\sigma_0$ for the same periods as Fig. 2.2(a), (b) and (c), respectively. The line $\alpha_R/\alpha_c = 1$ is included as a reference in Fig. 2.2(d), (e) and (f). It is clear that except in a narrow interval near the minimum for $\alpha_c$, the ratio $\alpha_R/\alpha_c$ is less than one for all values of $\sigma_0$ and $T$. In fact, $\alpha_R$ is less than $\alpha_c$ by almost an order of magnitude over much of the region, indicating that $R$ is significantly more linear than $c$.

Figure 2.2 Relative linearity of MT responses R and c. The true model $\sigma$ consists of a 200-m thick surface layer of conductivity 0.1 S/m over a halfspace of conductivity 0.01 S/m. The starting models consist of halfspaces of conductivity $\sigma_0$. (a), (b) and (c) show $\alpha_R$ (solid line) and $\alpha_c$ (dashed line) as a function of $\sigma_0$ (S/m) for periods $T$ of 0.01, 0.1 and 1.0 s, respectively. (d), (e) and (f) show the ratio $\alpha_R/\alpha_c$ corresponding to the same periods (the dotted line indicates $\alpha_R/\alpha_c = 1$).

Figure 2.2 shows $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ as a function of $\sigma_0$ for three fixed periods. In order to observe how these quantities vary with period as well as starting model, Fig. 2.3 shows surfaces of $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ as a function of $T$ and $\sigma_0$. The surfaces in Fig. 2.3(a) and (b) represent a 50 by 50 grid of computed values. Figure 2.3(c) represents 50 values of $T$ and 60 values of $\sigma_0$; the sampling interval in $\sigma_0$ is halved in the region of the maximum to more accurately define this feature. The curves in Fig. 2.2 represent 2-D 'slices' through these surfaces at periods of 0.01, 0.1, and 1.0 s. The variation with $T$ is seen to be gradual, smoothly joining the fixed-period curves.
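The skin-depth argument used to explain the period dependence of the minima in Fig. 2.2 can be checked with a few lines of arithmetic. The sketch below assumes the usual halfspace skin depth $\sqrt{2/(\omega\mu_0\sigma)}$, which is the depth at which the field amplitude falls to $1/e$ of its surface value; the function name is illustrative.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space (H/m)


def skin_depth(sigma, period):
    """Halfspace skin depth in metres for conductivity sigma (S/m) and period (s)."""
    omega = 2.0 * math.pi / period
    return math.sqrt(2.0 / (omega * MU0 * sigma))


if __name__ == "__main__":
    layer_thickness = 200.0  # metres, as in the layer-over-halfspace example
    # short period, evaluated in the 0.1 S/m surface layer
    print(skin_depth(0.1, 0.01), "m versus a", layer_thickness, "m layer")
    # long period, evaluated in the 0.01 S/m halfspace
    print(skin_depth(0.01, 1.0), "m versus a", layer_thickness, "m layer")
```

The first value comes out somewhat below the 200-m layer thickness and the second is a few kilometres, consistent with the response resembling that of the surface layer at $T = 0.01$ s and that of the underlying halfspace at $T = 1.0$ s.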
For a true model $\sigma$ consisting of a layer over a halfspace, another appropriate class of starting models for investigating the relative linearity is the set of 2-layer models. In Fig. 2.4 the linearity ratios are compared for starting models $\sigma_0$ which consist of a 200-m thick surface layer over a halfspace of conductivity 0.01 S/m, i.e. the layer thickness and halfspace conductivity are identical to the true model. The conductivity of the surface layer, $\sigma_0(z_s)$, is varied from 0.001 to 1.0 S/m. Figure 2.4(a), (b) and (c) show $\alpha_R$ and $\alpha_c$ as a function of $\sigma_0(z_s)$ for periods $T$ of 0.01, 0.1, and 1.0 s. The linearity ratios both exhibit a pronounced minimum at $\sigma_0(z_s) = 0.1$ S/m for all periods. At this point the true and starting models are identical and $\alpha_R$ and $\alpha_c$ are zero, although the figure simply shows them approaching zero. Figure 2.4(d), (e) and (f) show $\alpha_R/\alpha_c$ corresponding to the same periods. This ratio is less than one for all values of $\sigma_0(z_s)$ and $T$, indicating $R$ is more linear than $c$. For $T = 1.0$ s, Fig. 2.4(c) shows that $\alpha_R$ is more than an order of magnitude less than $\alpha_c$ over the range of $\sigma_0(z_s)$ values.

Figure 2.4 Relative linearity of MT responses R and c. The true model $\sigma$ consists of a 200-m thick surface layer of conductivity 0.1 S/m over a halfspace of conductivity 0.01 S/m. The starting models $\sigma_0$ consist of a 200-m thick layer over a 0.01-S/m halfspace; $\sigma_0(z_s)$ indicates the conductivity of the surface layer. (a), (b) and (c) show $\alpha_R$ (solid line) and $\alpha_c$ (dashed line) as a function of $\sigma_0(z_s)$ (S/m) for periods $T$ of 0.01, 0.1 and 1.0 s, respectively. (d), (e) and (f) show the ratio $\alpha_R/\alpha_c$ corresponding to the same periods.
Figure 2.5 shows surfaces of $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ as a function of $\sigma_0(z_s)$ and $T$.

Figure 2.6 shows the linearity ratios when the starting models $\sigma_0$ consist of a 300-m thick surface layer of conductivity $\sigma_0(z_s)$ between 0.001 and 1.0 S/m over a halfspace of conductivity 0.04 S/m, i.e. the layer thickness and halfspace conductivity do not correspond to the true model. Figure 2.6(a), (b) and (c) show $\alpha_R$ and $\alpha_c$ as a function of the surface layer conductivity, $\sigma_0(z_s)$, for periods $T$ of 0.01, 0.1, and 1.0 s. Figure 2.6(a) resembles Fig. 2.2(a) since for the short period $T = 0.01$ s there is little difference in expanding the responses about a starting model consisting of a 300-m surface layer or a halfspace of the same conductivity. At longer periods the minima of $\alpha_R$ and $\alpha_c$ become less pronounced, as shown in Fig. 2.6(b) and (c). Figure 2.6(d), (e) and (f) show $\alpha_R/\alpha_c$: this ratio is less than one for all values of $\sigma_0(z_s)$ and $T$ except in the immediate vicinity of the minimum for $\alpha_c$ for $T = 0.01$ and 0.1 s. Figure 2.7 shows surfaces of $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ as a function of $\sigma_0(z_s)$ and $T$. In Fig. 2.7(c) the sampling interval in $\sigma_0(z_s)$ is halved along the maximum to more accurately define this feature.

As a second example of the linearity analysis, consider a true model $\sigma(z)$ which consists of a linear conductivity gradient of $\partial_z\sigma = \sigma' = 10^{-4}$ S/m$^2$ from the surface to a depth of 10 km. The surface conductivity value is 0.01 S/m; below 10 km the conductivity is a constant 1.01 S/m. Figure 2.8 shows the linearity ratios computed when the starting models consist of halfspaces of conductivity $\sigma_0$. Figure 2.8(a), (b) and (c) show $\alpha_R$ and $\alpha_c$ as a function of the halfspace conductivity $\sigma_0$ for periods $T$ of 0.01, 0.1, and 1.0 s. The linearity ratios exhibit shallow minima, with the value of $\sigma_0$ at which the minima occur increasing slightly with period. The minima for $\alpha_R$ and $\alpha_c$ do not coincide, but $\alpha_R$ is less than $\alpha_c$ over the entire range of values for $\sigma_0$, including the minimum for $\alpha_c$.
Figure 2.8(d), (e) and (f) show $\alpha_R/\alpha_c$ as a function of $\sigma_0$ for the same values of $T$: this ratio is less than one for all $\sigma_0$ and $T$. Figure 2.9 shows surfaces of $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ as a function of $\sigma_0$ and $T$. The linearity ratios show only a weak dependence on $T$.

Figure 2.6 Relative linearity of MT responses R and c. The true model $\sigma$ consists of a 200-m thick surface layer of conductivity 0.1 S/m over a halfspace of conductivity 0.01 S/m. The starting models $\sigma_0$ consist of a 300-m thick layer over a 0.04-S/m halfspace; $\sigma_0(z_s)$ indicates the conductivity of the surface layer. (a), (b) and (c) show $\alpha_R$ (solid line) and $\alpha_c$ (dashed line) as a function of $\sigma_0(z_s)$ (S/m) for periods $T$ of 0.01, 0.1 and 1.0 s, respectively. (d), (e) and (f) show the ratio $\alpha_R/\alpha_c$ corresponding to the same periods (the dotted line indicates $\alpha_R/\alpha_c = 1$).

Figure 2.8 Relative linearity of MT responses R and c. The true model $\sigma$ consists of a conductivity gradient of $10^{-4}$ S/m$^2$ from the surface to a depth of 10 km. The surface value is 0.01 S/m and below 10 km the conductivity remains constant at 1.01 S/m. The starting models consist of halfspaces of conductivity $\sigma_0$. (a), (b) and (c) show $\alpha_R$ (solid line) and $\alpha_c$ (dashed line) as a function of $\sigma_0$ (S/m) for periods $T$ of 0.01, 0.1 and 1.0 s, respectively. (d), (e) and (f) show the ratio $\alpha_R/\alpha_c$ corresponding to the same periods (the dotted line indicates $\alpha_R/\alpha_c = 1$).

Figure 2.10 shows the linearity ratios computed when the starting models consist of linear gradients $\sigma_0'$ ranging from $10^{-7}$ to $10^{-3}$ S/m$^2$. The surface conductivity value is 0.01 S/m (identical to the true model) and below 10 km the conductivity is held constant at the value reached at that depth.
Figure 2.10(a), (b) and (c) show $\alpha_R$ and $\alpha_c$ as a function of the ratio of the gradients of the starting and true models, $\sigma_0'/\sigma'$, for periods $T$ of 0.01, 0.1, and 1.0 s. The linearity ratios exhibit a pronounced minimum at $\sigma_0'/\sigma' = 1$ where the true and starting models are identical. Figure 2.10(d), (e) and (f) show $\alpha_R/\alpha_c$ as a function of $\sigma_0'/\sigma'$ for the same values of $T$. This ratio is less than one for all $\sigma_0'/\sigma'$ and $T$. Figure 2.11 shows surfaces of $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ as a function of $\sigma_0'/\sigma'$ and $T$.

The linearity ratios computed when the starting models $\sigma_0$ consist of conductivity gradients of $10^{-7}$ to $10^{-3}$ S/m$^2$ and the surface conductivity value is 0.1 S/m (not identical to the true model) are shown in Fig. 2.12. $\alpha_R$ and $\alpha_c$ are shown in Fig. 2.12(a), (b) and (c) for $T = 0.01$, 0.1 and 1.0 s. The linearity ratios do not show pronounced minima and $\alpha_R$ is less than $\alpha_c$ for all values of $\sigma_0'/\sigma'$ and $T$. Figure 2.12(d), (e) and (f) show $\alpha_R/\alpha_c$: this ratio is almost constant at a value of about 0.2. Figure 2.13 shows surfaces of $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ as a function of $\sigma_0'/\sigma'$ and $T$.

Figures 2.1-2.13 illustrate the relative linearity of MT responses $R$ and $c$ for several choices of true model $\sigma$. The relative linearity has been investigated for a number of true models, including a halfspace, conductive/resistive surface layer, conductive/resistive buried layer and positive/negative conductivity gradient, and for a variety of types of starting models. Table 2.1 summarizes the cases that have been considered. The results of this study may be summarized as follows. Plots of $\alpha_R$ and $\alpha_c$ at fixed periods generally exhibit a minimum for some model $\sigma_0$. When $\sigma_0 = \sigma$ is a possible choice, the minima of $\alpha_R$ and $\alpha_c$ coincide at this point, achieving a value of zero; however, the value of $\alpha_R/\alpha_c$ as $\sigma_0 \to \sigma$ is less than one. When $\sigma_0 = \sigma$ is not a possible choice, the minima of $\alpha_R$ and $\alpha_c$ may be pronounced or shallow and generally do not coincide.
Except near localized maxima, $\alpha_R$ is generally less than one, indicating that the non-linear terms do not exceed the linear term in magnitude. In contrast, $\alpha_c$ often exceeds one over large ranges. Finally, $\alpha_R$ is observed to be less than $\alpha_c$ for all choices of $\sigma_0$ and periods $T$ except in the immediate vicinity of the minimum of $\alpha_c$ for some cases where the minima of $\alpha_R$ and $\alpha_c$ do not coincide. In the narrow regions where $\alpha_c$ is less than $\alpha_R$, the value of $\alpha_R$ is generally small, indicating that even where $c$ is more linear than $R$, $R$ is still quite linear.

Although this study is not exhaustive, the true models considered are chosen to be representative of general conductivity structures. The results summarized above are consistent for all choices of true and starting models considered. This study strongly indicates that $R$ is a more linear choice of MT response than $c$. The use of $R$ should result in a more accurate linearization and therefore a more effective and efficient linearized inversion algorithm.

Figure 2.10 Relative linearity of MT responses R and c. The true model $\sigma$ consists of a conductivity gradient of $10^{-4}$ S/m$^2$ from the surface to a depth of 10 km. The surface value is 0.01 S/m and below 10 km the conductivity remains constant at 1.01 S/m. The starting models consist of conductivity gradients from the surface to 10-km depth; the surface values are 0.01 S/m and below 10 km the conductivity remains constant. $\sigma_0'/\sigma'$ represents the ratio of the gradients of the starting and true models. (a), (b) and (c) show $\alpha_R$ (solid line) and $\alpha_c$ (dashed line) as a function of $\sigma_0'/\sigma'$ for periods $T$ of 0.01, 0.1 and 1.0 s, respectively. (d), (e) and (f) show the ratio $\alpha_R/\alpha_c$ corresponding to the same periods.

Figure 2.12 Relative linearity of MT responses R and c. The true model $\sigma$ consists of a conductivity gradient of $10^{-4}$ S/m$^2$ from the surface to a depth of 10 km. The surface value is 0.01 S/m and below 10 km the conductivity remains constant at 1.01 S/m. The starting models consist of conductivity gradients from the surface to 10-km depth; the surface values are 0.1 S/m and below 10 km the conductivity remains constant. $\sigma_0'/\sigma'$ represents the ratio of the gradients of the starting and true models. (a), (b) and (c) show $\alpha_R$ (solid line) and $\alpha_c$ (dashed line) as a function of $\sigma_0'/\sigma'$ for periods $T$ of 0.01, 0.1 and 1.0 s, respectively. (d), (e) and (f) show the ratio $\alpha_R/\alpha_c$ corresponding to the same periods.

Table 2.1 Summary of cases considered in linearity study of MT responses R and c.

True model                  Starting models
halfspace                   - halfspaces
                            - surface layers
                            - buried layers
conductive surface layer    - halfspaces
                            - surface layers (correct thickness & halfspace conductivity)
                            - surface layers (incorrect thickness & halfspace conductivity)
resistive surface layer     - halfspaces
                            - surface layers (correct thickness & halfspace conductivity)
                            - surface layers (incorrect thickness & halfspace conductivity)
conductive buried layer     - halfspaces
                            - buried layers (correct depths & halfspace conductivity)
                            - buried layers (incorrect depths & halfspace conductivity)
resistive buried layer      - halfspaces
                            - buried layers (correct depths & halfspace conductivity)
                            - buried layers (incorrect depths & halfspace conductivity)
positive gradient           - halfspaces
                            - positive gradients (correct surface conductivity & underlying halfspace)
                            - positive gradients (incorrect surface conductivity & underlying halfspace)
negative gradient           - halfspaces
                            - negative gradients (correct surface conductivity & underlying halfspace)
                            - negative gradients (incorrect surface conductivity & underlying halfspace)
2.5 Alternative formulations

2.5.1 Alternative choices for model and response

Section 2.4 indicates that $\sigma$ and $R$ are the most linear choices of model and response for the MT inverse problem. In general, this choice should result in the optimal linearized inversion algorithm. However, in some cases practical considerations warrant the use of alternative forms of response or model. In this section two such choices are considered.

The conductivity of the Earth can vary over many orders of magnitude. If recovering conductivity variations over a number of orders of magnitude is required, $\log\sigma$ is a more appropriate choice of model than $\sigma$. This choice of model has the additional benefit of ensuring positivity.

Field measurements are often presented in terms of the amplitude and phase $|R|$ and $\phi$ of $R$ (i.e. $R = |R|e^{i\phi}$) rather than real and imaginary parts. Although in theory it is straightforward to convert between the two representations, in practice different errors associated with the amplitude and phase measurements can make this difficult. Also, in some cases it is advantageous to examine the phase information separately (e.g. Ranganayaki 1984). It is then necessary to consider the responses as amplitude and phase and ascribe appropriate statistical errors to each.

Oldenburg (1979) transformed the linearized equation for $R$ to obtain expressions for $|R|$ and $\phi$ as responses and $\log\sigma$ as model. Although it is not explicitly mentioned in this reference, it should be noted that the transformations are not exact since they involve approximating the difference operator $\delta$ with the differential operator $d$. This is a first-order approximation which results in additional higher-order error terms, thereby increasing the non-linearity. The results from Oldenburg (1979) are as follows.
When the complex response is considered as amplitude and phase, the Fréchet kernels are given by
\[ G_{|R|}(\sigma_0,\omega,z) = \frac{\Re R\,\Re G + \Im R\,\Im G}{|R|} \tag{2.5.1a} \]
for $|R(\sigma,\omega)|$ and
\[ G_{\phi}(\sigma_0,\omega,z) = \frac{\Re R\,\Im G - \Im R\,\Re G}{|R|^2} \tag{2.5.1b} \]
for $\phi(\sigma,\omega)$, where $\Im$ and $\Re$ indicate imaginary and real parts, $R = R(\sigma_0,\omega)$ and $G = G(\sigma_0,\omega,z)$. If the model is $\log\sigma$, then the kernels $G(\sigma_0,\omega,z)$ corresponding to either real and imaginary or amplitude and phase responses are replaced by $\sigma_0(z)G(\sigma_0,\omega,z)$.

Although the inverse problem in terms of $\log\sigma$ as model or $|R|$ and $\phi$ as response is not as linear as the formulation in terms of $\sigma$ and R, studies similar to that in Section 2.4 indicate that these choices are more linear than resistivity $\rho$ as model or amplitude and phase of c as response. In this thesis, inversion methods are developed for both $\sigma$ and $\log\sigma$ as model and real and imaginary part or amplitude and phase of R as response. Although real and imaginary responses and conductivity are the favoured choices, it is considered important that inversion algorithms are flexible enough to accommodate the alternatives when required.

2.5.2 The similitude equation as a basis for model norm inversion

This chapter examines linearization as a method of solving the MT inverse problem for model norm solutions. Gómez-Treviño (1987) suggests an alternative formulation. Using scaling properties of Maxwell's equations, Gómez-Treviño derives an exact, non-linear integral equation, which he calls the similitude equation, relating the conductivity model to field measurements. Although he does not address the problem in detail, Gómez-Treviño suggests that MT construction and appraisal algorithms could be based on this formulation rather than linearization. In this section the similitude equation is compared to linearization as a basis for model norm solutions.

Gómez-Treviño (1987) derives the similitude equation in terms of the apparent conductivity $\sigma_a$ as response.
In order to compare the similitude method to the linearized equations developed in this chapter, the similitude equation is derived here for the R response; this derivation also serves to illustrate Gómez-Treviño's approach. The scaling properties of the electric and magnetic fields are well known and follow directly from Maxwell's equations: for a scalar $k$
\[ E(k\sigma, kT, z_m) = \frac{1}{k}\,E(\sigma, T, z_m), \tag{2.5.2a} \]
\[ B(k\sigma, kT, z_m) = B(\sigma, T, z_m). \tag{2.5.2b} \]
Thus, the R response scales according to
\[ R(k\sigma, kT, z_m) = \frac{B(k\sigma, kT, z_m)}{E(k\sigma, kT, z_m)} = kR(\sigma, T, z_m). \tag{2.5.3} \]
If $k = 1 + h$, (2.5.3) becomes
\[ R(\sigma + h\sigma, T + hT, z_m) = (1 + h)\,R(\sigma, T, z_m). \tag{2.5.4} \]
The quantities $h\sigma$ and $hT$ may be thought of as perturbations in conductivity and period which are simply a scaling of the original values. The perturbation in the response $\delta R$ is given by
\[ \delta R(h\sigma, hT, z_m) = R(\sigma + h\sigma, T + hT, z_m) - R(\sigma, T, z_m). \tag{2.5.5} \]
Combining (2.5.4) and (2.5.5) leads to
\[ \delta R(h\sigma, hT, z_m) = hR(\sigma, T, z_m). \tag{2.5.6} \]
However, $\delta R$ may also be expressed in terms of an expansion about $\sigma$ and $T$:
\[ \delta R(h\sigma, hT, z_m) = hT\,\frac{\partial R(\sigma, T, z_m)}{\partial T} + \int_0^\infty G(\sigma, T, z_m, z)\,h\sigma(z)\,dz + R_T + R_\sigma, \tag{2.5.7} \]
where $G$ is the first Fréchet kernel and $R_T$ and $R_\sigma$ are the remainder terms for the first-order expansion of R in $T$ and $\sigma$. Substituting for $\delta R$ from (2.5.6) and dividing by $h$, (2.5.7) may be written as
\[ R(\sigma, T, z_m) = T\,\frac{\partial R(\sigma, T, z_m)}{\partial T} + \int_0^\infty G(\sigma, T, z_m, z)\,\sigma(z)\,dz + \frac{1}{h}R_T + \frac{1}{h}R_\sigma. \tag{2.5.8} \]
In the limit as $h \to 0$, $R_T/h \to 0$ and $R_\sigma/h \to 0$, and (2.5.8) becomes
\[ R(\sigma, T, z_m) - T\,\frac{\partial R(\sigma, T, z_m)}{\partial T} = \int_0^\infty G(\sigma, T, z_m, z)\,\sigma(z)\,dz. \tag{2.5.9} \]
Equation (2.5.9) represents the similitude equation for R. It is an exact expression relating measurements of the electric field and its derivatives to the conductivity model and makes no allusion to any perturbation or starting model.
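As a quick plausibility check of (2.5.3) and (2.5.9), the following sketch evaluates both for a uniform halfspace. It assumes, as the scaling (2.5.3) implies, that R = B/E, and it uses the standard quasi-static halfspace result with an exp(+iωt) time dependence; the function name and the numerical values are illustrative only and are not taken from the thesis. For a constant conductivity, the right side of (2.5.9) is the derivative of R along a uniform scaling of the conductivity, which is estimated here by a central difference.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # magnetic permeability of free space (H/m)

def halfspace_R(sigma, T):
    """Surface response R = B/E of a uniform halfspace (assumed definition of R)."""
    omega = 2.0 * np.pi / T
    return np.sqrt(MU0 * sigma / (1j * omega))

sigma, T, k = 0.01, 10.0, 3.7   # illustrative conductivity (S/m), period (s), scale factor
R = halfspace_R(sigma, T)

# Scaling property (2.5.3): R(k*sigma, k*T) = k * R(sigma, T)
scaling_error = abs(halfspace_R(k * sigma, k * T) - k * R)

# Left side of (2.5.9): R - T dR/dT, with dR/dT from a central difference
dT = 1e-4 * T
dR_dT = (halfspace_R(sigma, T + dT) - halfspace_R(sigma, T - dT)) / (2.0 * dT)
lhs = R - T * dR_dT

# Right side of (2.5.9) for constant sigma: the Frechet derivative applied to
# sigma itself, i.e. d/d(eps) of R(sigma * (1 + eps)) at eps = 0
eps = 1e-6
rhs = (halfspace_R(sigma * (1 + eps), T) - halfspace_R(sigma * (1 - eps), T)) / (2.0 * eps)

print("scaling error:", scaling_error)
print("lhs, rhs, R/2:", lhs, rhs, R / 2)
```

For the halfspace both sides reduce to R/2, since R is proportional to the square root of both the conductivity and the period.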
For comparison, the Fréchet differential expansion for R formulated for model norm inversion is reproduced here:
\[ R(\sigma, T, z_m) - R(\sigma_0, T, z_m) + \int_0^\infty G(\sigma_0, T, z_m, z)\,\sigma_0(z)\,dz = \int_0^\infty G(\sigma_0, T, z_m, z)\,\sigma(z)\,dz + R_\sigma. \tag{2.5.10} \]
This is also an exact expression; however, to solve for $\sigma$ the non-linear remainder term $R_\sigma$ must be neglected.

Gómez-Treviño (1987) suggests that since the exact similitude equation directly relates the response to the true model, inversion algorithms could be based on this formulation rather than on a linearized approximation. The difficulty is that the Fréchet kernel in (2.5.9) is evaluated at the true model $\sigma$. Since $\sigma$ is never known a priori (it is the object of the inversion), the best that can be done in practice is to approximate it with a known starting model $\sigma_0$, i.e. approximate $G(\sigma)$ with $G(\sigma_0)$ in (2.5.9). This leads to a linear problem for $\sigma$. However, making this approximation is equivalent to neglecting an error term $S_\sigma$ at each iteration in much the same way that the Fréchet differential expansion for R is linearized by neglecting the remainder term $R_\sigma$. To illustrate this, the similitude equation (2.5.9) can be written
\[ R(\sigma) - T\,\frac{\partial R(\sigma)}{\partial T} = \int_0^\infty G(\sigma_0, z)\,\sigma(z)\,dz + S_\sigma, \tag{2.5.11} \]
where the dependence on $T$ and $z_m$ is implicitly assumed. The error term $S_\sigma$ neglected in Gómez-Treviño's approach is given by
\[ S_\sigma = \int_0^\infty \left[ G(\sigma, z) - G(\sigma_0, z) \right] \sigma(z)\,dz. \tag{2.5.12} \]
The remainder term for the linearized equation is proved to be second order in $\delta\sigma$ in Section 2.2.3. To determine the magnitude of the similitude error term $S_\sigma$, consider the following analysis. Subtract from the similitude equation (2.5.9) an identical expression evaluated at $\sigma_0$ rather than $\sigma$. This leads to
\[ \delta R - T\,\frac{\partial(\delta R)}{\partial T} = \int_0^\infty G(\sigma, z)\,\sigma(z)\,dz - \int_0^\infty G(\sigma_0, z)\,\sigma_0(z)\,dz. \tag{2.5.13} \]
After some algebra, (2.5.13) may be rearranged to give
\[ S_\sigma = \delta R - \int_0^\infty G(\sigma_0, z)\,\delta\sigma(z)\,dz - T\,\frac{\partial(\delta R)}{\partial T}. \tag{2.5.14} \]
But $\delta R - \int_0^\infty G(\sigma_0, z)\,\delta\sigma(z)\,dz = R_\sigma$, so (2.5.14) becomes
\[ S_\sigma = R_\sigma - T\,\frac{\partial(\delta R)}{\partial T}. \]
(2.5.15)

Thus, $S_\sigma$ may be considered as the sum of two terms. The first term, $R_\sigma$, is second order in $\delta\sigma$. To investigate the magnitude of the second term, consider the first-order approximation for $\delta R$
\[ \delta R(\sigma_0, \delta\sigma, T, z_m) = \int_0^\infty G(\sigma_0, T, z_m, z)\,\delta\sigma(z)\,dz, \tag{2.5.16} \]
where all dependences are explicitly indicated. It follows that
\[ T\,\frac{\partial(\delta R)}{\partial T} = T \int_0^\infty \frac{\partial}{\partial T}\left\{ G(\sigma_0, T, z_m, z) \right\} \delta\sigma(z)\,dz, \tag{2.5.17} \]
which is first order in $\delta\sigma$. Thus, the similitude equation for $\sigma$ can be written to first order as
\[ R(\sigma) - T\,\frac{\partial R(\sigma)}{\partial T} = \int_0^\infty G(\sigma_0, z)\,\sigma(z)\,dz - \int_0^\infty T\,\frac{\partial G(\sigma_0, z)}{\partial T}\,\delta\sigma(z)\,dz, \tag{2.5.18} \]
where the second term on the right side represents the error term $S_\sigma$. The error term neglected in an inversion step based on the similitude equation is first order in the model perturbation. To demonstrate the significance of this result, (2.5.18) may be written in a form which illustrates the relative size of terms:
\[ R(\sigma) - T\,\frac{\partial R(\sigma)}{\partial T} - \int_0^\infty G(\sigma_0, z)\,\sigma_0(z)\,dz = \int_0^\infty G(\sigma_0, z)\,\delta\sigma(z)\,dz - \int_0^\infty T\,\frac{\partial G(\sigma_0, z)}{\partial T}\,\delta\sigma(z)\,dz. \tag{2.5.19} \]
The quantities on the left side of (2.5.19) may be considered modified data. The first term on the right side is the linear functional of $\delta\sigma$ to be inverted; the second term represents (to first order) the error term $S_\sigma$ that is neglected in the inversion. Although a model norm inversion for $\sigma$ would make use of (2.5.18), writing the similitude equation in the form of (2.5.19) emphasizes that the error term and the inversion term are of the same order in $\delta\sigma$ and any inversion scheme based on neglecting this error term is ill-founded. Whether the model produced by such an inversion step is an improvement on $\sigma_0$ depends on the relative size of the linear functionals of $\delta\sigma$ in (2.5.19). At best, an iterative inversion algorithm based on the similitude equation might exhibit linear convergence; however, convergence is not guaranteed, even as $\sigma_0 \to \sigma$.
In contrast, the linearized equation neglects a remainder term that is second order in $\delta\sigma$; an iterative algorithm based on it (essentially Newton's method for operators) generally exhibits quadratic convergence and convergence is guaranteed provided $\sigma_0$ is close enough to the true model $\sigma$.

The similitude equation requires twice as many measurements as the linearized equation since both $R(\sigma, T)$ and $\partial_T R(\sigma, T)$ are required to form the response at each period $T$. Ideally, these extra measurements should be included in the inversion in a manner which improves the convergence properties of the algorithm. However, the similitude equation seems to combine the measurements in a way which degrades (or destroys) convergence. This point is reinforced by the following analysis. Consider the similitude equation (2.5.9) evaluated at $\sigma_0$ and substitute for the right side from the linearized equation (2.5.10) to give
\[ R(\sigma) - T\,\frac{\partial R(\sigma_0)}{\partial T} = \int_0^\infty G(\sigma_0, z)\,\sigma(z)\,dz + R_\sigma. \tag{2.5.20} \]
The left side of (2.5.20) resembles the left side of the similitude equation (2.5.11) except that $\partial_T R(\sigma)$ is replaced with $\partial_T R(\sigma_0)$. The right side resembles the right side of (2.5.11) with the first-order error term $S_\sigma$ replaced with the second-order term $R_\sigma$. Thus, replacing the observed quantity $\partial_T R(\sigma)$ with that computed for the starting model $\sigma_0$ essentially reduces the error term in the similitude equation from first to second order in $\delta\sigma$ even though the number of measurements is reduced by half. Equation (2.5.20) is a viable equation for inversion; the convergence properties of an algorithm based on this equation should be identical to those of the linearized inversion algorithm.

Even if an iterative solution of the similitude equation does converge, because the responses consist of the difference between measured quantities, there is no guarantee that the constructed model will reproduce the measurements.
Convergence criteria for iterative algorithms are considered in Chapter 3, but basically what is required is that the sequence of constructed models stabilize at a model which reproduces the responses. For instance, consider the linearized equation for $\sigma$ at iteration $n+1$:
\[ R(\sigma) - R(\sigma_n) + \int_0^\infty G(\sigma_n, z)\,\sigma_n(z)\,dz = \int_0^\infty G(\sigma_n, z)\,\sigma_{n+1}(z)\,dz. \tag{2.5.21} \]
If this sequence stabilizes after $n$ iterations so that there is no change in the constructed model, i.e. $\sigma_{n+1} = \sigma_n$, (2.5.21) reduces to $R(\sigma_n) = R(\sigma)$ and the constructed model reproduces the observations. However, consider an inversion algorithm for $\sigma$ based on the similitude equation (2.5.11) at iteration $n+1$:
\[ R(\sigma) - T\,\frac{\partial R(\sigma)}{\partial T} = \int_0^\infty G(\sigma_n, z)\,\sigma_{n+1}(z)\,dz. \tag{2.5.22} \]
If the process stabilizes after $n$ iterations so that $\sigma_{n+1} = \sigma_n$, then since $\int_0^\infty G(\sigma_n, z)\,\sigma_n(z)\,dz = R(\sigma_n) - T\,\partial_T R(\sigma_n)$, (2.5.22) becomes
\[ R(\sigma_n) - T\,\frac{\partial R(\sigma_n)}{\partial T} = R(\sigma) - T\,\frac{\partial R(\sigma)}{\partial T}. \tag{2.5.23} \]
Because the similitude equation responses consist of the difference of measured quantities, $\sigma_n$ can satisfy (2.5.23) without either $R(\sigma_n) = R(\sigma)$ or $\partial_T R(\sigma_n) = \partial_T R(\sigma)$. Thus, the constructed model is not actually required to reproduce any of the measurements.

Before the analysis in this section was carried out, an inversion algorithm based on the similitude equation (2.5.11) was implemented (the details of this algorithm are similar to the linearized inversion algorithm presented in Chapter 3). For the simple case where all models are restricted to constant-conductivity halfspaces, we found that the algorithm generally exhibited linear convergence to the true model. However, for models with any appreciable structure (even simple two-layer models), the algorithm generally did not converge. Thus it would seem both in theory and in practice that the similitude equation is not an appropriate basis for model norm inversion.

The major difficulty with using the similitude equation for inversion is that in order to implement it in the manner suggested by Gómez-Treviño (1987), a first-order error term $S_\sigma$ must be neglected.
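The halfspace case mentioned above can be reproduced with a few lines of arithmetic. The sketch below is an illustration under stated assumptions, not the algorithm implemented for the thesis: it assumes R = B/E, so that for a uniform halfspace R is proportional to the square root of both conductivity and period, the integral of G(σ_n, z) over depth equals R(σ_n)/(2σ_n), and T ∂R/∂T = R/2. Under those assumptions (2.5.21) reduces to σ_{n+1} = σ_n [2 (σ/σ_n)^{1/2} - 1] and (2.5.22) reduces to σ_{n+1} = (σ σ_n)^{1/2}. The function names and starting value are hypothetical.

```python
import math

def linearized_step(sigma_true, sigma_n):
    """One step of (2.5.21) restricted to halfspaces (assumes R ~ sqrt(sigma))."""
    return sigma_n * (2.0 * math.sqrt(sigma_true / sigma_n) - 1.0)

def similitude_step(sigma_true, sigma_n):
    """One step of (2.5.22) restricted to halfspaces (assumes R ~ sqrt(sigma * T))."""
    return math.sqrt(sigma_true * sigma_n)

def run(step, sigma_true, sigma_start, n_iter):
    """Return the relative model error after each of n_iter iterations."""
    errors, s = [], sigma_start
    for _ in range(n_iter):
        s = step(sigma_true, s)
        errors.append(abs(s - sigma_true) / sigma_true)
    return errors

sigma_true = 1.0
err_lin = run(linearized_step, sigma_true, 0.1, 5)
err_sim = run(similitude_step, sigma_true, 0.1, 5)
for n, (a, b) in enumerate(zip(err_lin, err_sim), start=1):
    print(f"iteration {n}: linearized {a:.2e}   similitude {b:.2e}")
```

In this restricted setting the similitude error is roughly halved at each iteration (linear convergence), whereas the linearized error is roughly squared (quadratic convergence). Note that the linearized step can produce a negative conductivity if the starting halfspace is more than four times too conductive, which is one motivation for the log σ model discussed in Section 2.5.1.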
A natural question to investigate is whether this error term can be included in the inversion rather than neglected. In order to include this term, the first-order similitude equation (2.5.18) may be written as
\[ R(\sigma) - T\,\frac{\partial}{\partial T}\left\{ R(\sigma) - \int_0^\infty G(\sigma_0, z)\,\delta\sigma(z)\,dz \right\} = \int_0^\infty G(\sigma_0, z)\,\sigma(z)\,dz. \tag{2.5.24} \]
Since $R(\sigma) - \int_0^\infty G(\sigma_0, z)\,\delta\sigma(z)\,dz = R(\sigma_0)$ (to first order) and $T\,\partial_T R(\sigma_0) = R(\sigma_0) - \int_0^\infty G(\sigma_0, z)\,\sigma_0(z)\,dz$, (2.5.24) reduces to
\[ \delta R + \int_0^\infty G(\sigma_0, z)\,\sigma_0(z)\,dz = \int_0^\infty G(\sigma_0, z)\,\sigma(z)\,dz, \tag{2.5.25} \]
which is simply the linearized inverse equation for $\sigma$. This is not a surprising result. The Fréchet derivative of the response R with respect to the conductivity is defined to be a linear functional which, when applied to the conductivity perturbation, produces (to first order) the response perturbation. The Fréchet derivative is unique; therefore the linearized equation represents the only linear expression relating $\delta R$ to $\delta\sigma$ (or, using transformation (2.1.17), to $\sigma$ itself) which is accurate to first order in $\delta\sigma$. Any attempt to devise a linear relationship between R and $\sigma$ which is accurate to first order must reduce to the linearized equation for R. For instance, the integral equation (2.4.1) derived by Smith & Booker (1988) also relates the model directly to the response; however, it can be shown that using (2.4.1) in a linear inversion requires an approximation that is equivalent to neglecting a first-order error term. If the error term is included in the inversion, the expression again reduces to the linearized equation. Thus, the linearized equation would seem to be the obvious basis for model norm inversions.

Chapter 3

$l_2$ Model norm construction

3.1 Introduction

In Chapter 2 an approximate linear expression (equation (2.2.18)) which relates the conductivity model directly to (modified) MT responses was derived. In this chapter an iterative inversion algorithm based on this linearized expression is developed to construct models which minimize an $l_2$ norm.
The norm can be a measure of model structure or of the deviation of the model from a given base model; minimizing these norms produces the $l_2$ flattest (minimum-structure) model and the smallest-deviatoric model, respectively. It is important to note that although recent papers by Constable et al. (1987) and Smith & Booker (1988) have considered the construction of $l_2$ minimum-structure models, the work presented in this chapter was initiated prior to those publications (or the author being aware that such research was being carried out) and has been developed independently. Certainly the development and results of these papers are germane to this chapter and comparisons and contrasts are made. Preliminary results of the inversion algorithm described in this chapter have been presented in Dosso & Oldenburg (1989) and have also been summarized in Whittall & Oldenburg's (1990) survey of 1-D MT inversion techniques.

The work in this thesis was guided by a somewhat different philosophy than that expounded by Constable et al. (1987) and Smith & Booker (1988). Those references consider the $l_2$ minimum-structure solution to be the best model with which to interpret the observed responses. While it is acknowledged that if an interpretation is to rely on a single solution, this model may be an excellent choice, the development here is based on the belief that it is always valuable to produce a variety of acceptable models and to take into account as much additional information or insight into the problem as possible. Such flexibility in model construction allows some exploration and understanding of the space of acceptable models. To this end, $l_2$ flattest and smallest-deviatoric models are developed for both $\sigma$ and $\log\sigma$ as models and an arbitrary weighting function is included in all model norms.

The MT inversion algorithm presented in this chapter is based on successive solutions of the linearized problem.
Before the general inversion scheme is described, however, it is convenient to first consider the simpler linear problem in order to describe the model norms and the inversion procedure.

3.2 Linear inverse theory

This section presents methods and results from linear inverse theory. Although the theory is general, only results relevant to the linearized inversion algorithm are considered. More complete reviews of linear inverse theory are given by Parker (1977a), Oldenburg (1984) and Bertero, De Mol & Pike (1985, 1988).

In a linear problem the model $m$ and responses $d_j$ are related via a linear functional
\[ d_j = \int_0^\infty G_j(z)\,m(z)\,dz, \qquad j = 1, \ldots, N. \tag{3.2.1} \]
Observed responses are generally inaccurate; therefore the aim of the inversion is not to fit the data exactly, but rather to achieve an acceptable level of fit based on some criteria. The error on each response $d_j$ is assumed to be due to an independent, zero-mean Gaussian process with a standard deviation $s_j$. Although this may be a poor approximation in some cases, it is retained since knowledge of the true statistical distribution of the noise is often very poor and this assumption allows the analysis to be carried out exactly (Parker 1977a). To weight each response according to its uncertainty, the data equations are divided by their standard deviation to yield
\[ e_j = \int_0^\infty g_j(z)\,m(z)\,dz, \qquad j = 1, \ldots, N, \tag{3.2.2} \]
where $e_j = d_j/s_j$ and $g_j(z) = G_j(z)/s_j$. The constraint equations (3.2.2) may be inverted for the model $m(z)$ which minimizes some penalty functional. Minimization of a penalty functional (also called a regularization functional) is a requirement for regularizing the inverse problem (Rokityansky 1982).

3.2.1 Smallest model

One of the simplest functionals to minimize is the (squared) $l_2$ norm of the model,
\[ \|m\|_2^2 = \int_0^\infty |m(z)|^2\,dz, \tag{3.2.3} \]
to yield the smallest model.
A somewhat more general formulation includes an arbitrary (positive) weighting function $w(z)$ in the norm:
\[ \|m\|_{2,w}^2 = \int_0^\infty |w(z)\,m(z)|^2\,dz, \qquad w(z) > 0. \tag{3.2.4} \]
Including $w$ in the norm allows flexibility in defining how strongly (in a relative sense) the minimization is applied to various regions of the model: where $w(z)$ is large the model will tend to be small (if possible) and vice versa.

To minimize the model norm (3.2.4) subject to the side conditions (3.2.2) the method of Lagrange multipliers (e.g. Morse & Feshbach 1953, p. 278) is employed. Each constraint equation is written as an expression equal to zero; these expressions are multiplied by an unknown Lagrange multiplier (for convenience written as $2\alpha_j$) and added to the norm to produce a new functional
\[ \Phi(m) = \|m\|_{2,w}^2 + 2\sum_{j=1}^N \alpha_j \left[ e_j - \int_0^\infty g_j(z)\,m(z)\,dz \right]. \tag{3.2.5} \]
The model norm (3.2.4) is minimized, subject to the data constraints, at the point where the functional (3.2.5) is stationary with respect to $m$ and $\alpha_j$. To investigate the variation of $\Phi$ with respect to $m$, consider the change $d\Phi = \Phi(m + dm) - \Phi(m)$ due to an infinitesimal model perturbation $dm$. It is straightforward to show that
\[ d\Phi = 2\int_0^\infty \left\{ w^2(z)\,m(z) - \sum_{j=1}^N \alpha_j g_j(z) \right\} dm(z)\,dz. \tag{3.2.6} \]
For $\Phi$ to be stationary with respect to $m$, $d\Phi$ must be zero for arbitrary $dm$; thus (3.2.6) requires
\[ m(z) = \sum_{j=1}^N \alpha_j g_j(z)/w^2(z). \tag{3.2.7} \]
According to (3.2.7), the smallest model is given by a linear combination of the (weighted) kernel functions with the Lagrange multipliers acting as the coefficients. Setting $\partial\Phi/\partial\alpha_j = 0$ yields the constraints (3.2.2); substituting (3.2.7) into (3.2.2) leads to
\[ e_j = \sum_{k=1}^N \alpha_k \int_0^\infty \frac{g_j(z)}{w(z)}\,\frac{g_k(z)}{w(z)}\,dz, \tag{3.2.8} \]
or
\[ \mathbf{e} = \Gamma\boldsymbol{\alpha}, \tag{3.2.9} \]
where $\mathbf{e} = (e_1, e_2, \ldots, e_N)^T$, $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_N)^T$ and $\Gamma$ represents the (weighted) inner-product or Gram matrix with elements
\[ \Gamma_{jk} = \int_0^\infty \frac{g_j(z)}{w(z)}\,\frac{g_k(z)}{w(z)}\,dz. \]
(3.2.10)

$\Gamma$ is an $N \times N$ symmetric, positive-definite matrix, so in theory it can be inverted, (3.2.9) solved for $\boldsymbol{\alpha}$ and the smallest model constructed using (3.2.7). However, although the problem of computing the smallest model in this manner is well-posed in the strict mathematical sense (i.e. the solution depends continuously on the data), the problem can be extremely ill-conditioned and therefore very unstable numerically, especially for large data sets. This instability is a consequence of noise on the data and the finite numerical precision of computations, which has the result that the kernel functions may not all be considered as effectively linearly independent. Also, since the responses are inherently inaccurate, it is not desired that the solution should exactly reproduce the responses. Rather, the inversion procedure should seek to fit the data only to within a level of misfit appropriate to the uncertainties of the responses. A common measure of misfit is given by
\[ \chi^2 = \sum_{j=1}^N \left( \frac{d_j - d_j(m_c)}{s_j} \right)^2, \tag{3.2.11} \]
where $d_j(m_c) = \int_0^\infty G_j(z)\,m_c(z)\,dz$ are the responses predicted for the constructed model $m_c(z)$. $\chi^2$ corresponds to the standard chi-squared statistic if the error on each response is assumed to be due to an independent, zero-mean Gaussian process with standard deviation $s_j$. The expected value of $\chi^2$ is (approximately) equal to the number of data, $N$; therefore, the constructed model should ideally produce a $\chi^2$ misfit of $N$. If $\chi^2$ is much less than $N$, the data are fit too well and the model may exhibit structure which is simply an artifact of the noise; if $\chi^2$ is much greater than $N$, the data are fit too poorly and information about the model may be lost (Oldenburg 1984).

A method that allows for a suitable misfit and which overcomes the numerical instabilities of the inversion is the spectral expansion method described by Parker (1977a). Spectral expansion isolates the components of the solution that are well determined by the data from those that are not.
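The direct construction of (3.2.7) to (3.2.11) can be sketched numerically as follows. This is only an illustration under stated assumptions: the exponential kernels, the depth grid, the standard deviations and the test model are all hypothetical, the integrals are replaced by a midpoint rule, and the data are noise-free so that an exact fit is meaningful.

```python
import numpy as np

# Depth grid; the midpoint rule stands in for the integrals over z
z = np.linspace(0.0, 10.0, 2001)
zc = 0.5 * (z[:-1] + z[1:])
dz = np.diff(z)

decay = np.array([0.5, 1.0, 2.0, 4.0, 8.0])   # hypothetical kernel decay rates
G = np.exp(-np.outer(decay, zc))               # kernels G_j(z), shape (N, M)
s = np.full(decay.size, 0.01)                  # assumed standard deviations s_j
w = np.ones_like(zc)                           # weighting function w(z) > 0

m_true = 1.0 + 0.5 * np.exp(-((zc - 2.0) ** 2))
d = (G * dz) @ m_true                          # noise-free responses d_j, as in (3.2.1)

g = G / s[:, None]                             # g_j = G_j / s_j
e = d / s                                      # e_j = d_j / s_j

gw = g / w
Gram = (gw * dz) @ gw.T                        # Gram matrix (3.2.10)
alpha = np.linalg.solve(Gram, e)               # solve e = Gamma alpha (3.2.9)
m_small = (alpha @ g) / w**2                   # smallest model (3.2.7)

d_pred = (G * dz) @ m_small
chi2 = np.sum(((d - d_pred) / s) ** 2)         # misfit (3.2.11)

print("condition number of Gram matrix:", np.linalg.cond(Gram))
print("chi-squared misfit:", chi2)
```

Even with five kernels the Gram matrix is noticeably ill-conditioned, and adding more closely spaced decay rates makes the condition number grow rapidly, which illustrates why a direct solve is not used for realistic data sets.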
In fact, it can be shown that the spectral expansion method is equivalent to the principal component or Karhunen-Loève transformation used in signal processing and information theory to extract the common signal from a set of time series (e.g. Kramer & Mathews 1968, Jones 1985).

3.2.2 Spectral expansion

In the spectral expansion method, rather than forming the model solution directly as a linear combination of the kernel functions as suggested by (3.2.7), the kernels are rotated to produce a new set of basis functions
\[ \psi_j(z) = \lambda_j^{-1/2} \sum_{k=1}^N U_{kj}\,g_k(z), \tag{3.2.12} \]
where $U$ is the orthogonal matrix which diagonalizes $\Gamma$ according to
\[ U^T \Gamma U = \Lambda. \tag{3.2.13} \]
In (3.2.13) $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$, with $\lambda_1 > \lambda_2 > \ldots > \lambda_N > 0$, is the matrix of eigenvalues of $\Gamma$ (called the spectrum of $\Gamma$) and $U$ is the matrix of column eigenvectors of $\Gamma$. The matrices $U$ and $\Lambda$ may be found by a singular value decomposition (SVD) of $\Gamma$ (Lanczos 1958). It is straightforward to verify that $\{\psi_j(z)\}$ is an orthonormal set, i.e.
\[ \int_0^\infty \psi_j(z)\,\psi_k(z)\,dz = \delta_{jk}. \tag{3.2.14} \]
This set of functions can be used as a basis for $m$:
\[ m(z) = \sum_{j=1}^N \alpha_j \psi_j(z), \tag{3.2.15} \]
where the coefficients are given by
\[ \alpha_j = \int_0^\infty \psi_j(z)\,m(z)\,dz = \lambda_j^{-1/2}\,\hat{e}_j, \tag{3.2.16} \]
and $\hat{\mathbf{e}} = U^T\mathbf{e}$ represents rotated responses. Parker (1977a) shows that the $\alpha_j$ are statistically independent and the standard deviation of each coefficient $\alpha_j$ is $\lambda_j^{-1/2}$, i.e. the coefficients associated with the smallest eigenvalues have the greatest uncertainty.

Physically, diagonalizing the inner-product matrix may be considered analogous to rotating to a 'principal axes' co-ordinate system where the co-ordinate axes, specified by the eigenvectors, correspond to the natural axes of symmetry. Parker (1977a) regards (3.2.15) as an expansion in the natural 'modes' of the data. He notes that the largest eigenvalues are generally associated with the smoothest functions $\psi_j(z)$ and that the functions become more oscillatory as $j$ increases.
Also, the magnitude of successive eigenvalues generally decreases rapidly except for the smallest ones, which often cannot be computed accurately and tend to cluster around a small number determined by the computational precision. The expansion coefficients associated with these small eigenvalues are poorly determined.

Parker (1977a) describes two methods to stabilize the inversion. One method is simply to omit the functions $\psi_j(z)$ associated with the smallest eigenvalues from the expansion (3.2.15). The truncated series will no longer fit the original data precisely, but it is straightforward to show that the $\chi^2$ misfit for the constructed model is given by
\[ \chi^2 = \sum_{j=n+1}^N \hat{e}_j^2, \tag{3.2.17} \]
where $n < N$ is the number of terms included in the truncated expansion. The number $n$ can be chosen so that $\chi^2$ approximately achieves the desired value.

An alternative approach is to retain all the eigenvalues, but to replace $\lambda_j$ by $\lambda_j + \beta$ in (3.2.12) and (3.2.16), where $\beta$ is a positive constant. This is equivalent to adding a constant to the main diagonal of $\Lambda$ to avoid singularities or near singularities and is similar to the Marquardt-Levenberg or 'ridge regression' method for least-squares inversion (Levenberg 1944; Marquardt 1963; Lines & Treitel 1984). In this case, it can be shown that the $\chi^2$ misfit is given by
\[ \chi^2 = \sum_{j=1}^N \left( \frac{\beta}{\lambda_j + \beta} \right)^2 \hat{e}_j^2. \tag{3.2.18} \]
$\beta$ represents a trade-off parameter between fitting the responses accurately and minimizing the model norm: when $\beta$ is large the model norm can be made very small by misfitting the data; when $\beta$ is small the data are accurately fit at the expense of an increase in the norm. Since $\chi^2$ is a monotonically increasing function of $\beta$, (3.2.18) can easily be solved for the value of the ridge-regression parameter $\beta$ which results in the desired level of $\chi^2$ misfit using Newton's method or a 1-D line search.
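The ridge-regression form of the spectral expansion can be sketched as follows. The kernels, noise level and test model are hypothetical, the weighting function is taken as w(z) = 1, `numpy.linalg.eigh` is used in place of an SVD (the two give the same decomposition for a symmetric positive-definite matrix, up to ordering), and a simple bisection in log β stands in for the Newton or line search mentioned above. The misfit function assumes the form χ² = Σ [β ê_j/(λ_j + β)]², which follows from replacing λ_j by λ_j + β in the expansion.

```python
import numpy as np

def chi2_of_beta(beta, lam, e_hat):
    """Misfit as a function of the ridge parameter beta."""
    return np.sum((beta * e_hat / (lam + beta)) ** 2)

def solve_beta(target, lam, e_hat, lo=1e-12, hi=1e12, n_iter=200):
    """Bisection in log(beta); valid because chi2 increases monotonically with beta."""
    for _ in range(n_iter):
        mid = np.sqrt(lo * hi)
        if chi2_of_beta(mid, lam, e_hat) < target:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

rng = np.random.default_rng(0)
M = 1000
dz = 10.0 / M
zc = (np.arange(M) + 0.5) * dz
decay = np.array([0.5, 1.0, 2.0, 4.0, 8.0])      # hypothetical kernel decay rates
N = decay.size
g = np.exp(-np.outer(decay, zc)) / 0.01          # kernels scaled by s_j = 0.01
m_true = 1.0 + 0.5 * np.sin(zc)
e = (g * dz) @ m_true + rng.standard_normal(N)   # scaled data with unit-variance noise

Gram = (g * dz) @ g.T                            # Gram matrix with w(z) = 1
lam, U = np.linalg.eigh(Gram)                    # eigenvalues (ascending) and eigenvectors
e_hat = U.T @ e                                  # rotated responses

beta = solve_beta(float(N), lam, e_hat)          # aim for chi2 = N
coeff = U @ (e_hat / (lam + beta))               # coefficients of the original kernels
m_c = coeff @ g                                  # constructed model
chi2 = np.sum((e - (g * dz) @ m_c) ** 2)         # achieved misfit

print("beta:", beta, " chi2:", chi2, " target:", N)
```

The misfit recomputed from the constructed model agrees with the closed-form expression, so the search over β never requires the model itself to be rebuilt.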
This procedure allows the construction of the smallest model with a precisely determined value for $\chi^2$.

Oldenburg (1979) applied the method of spectral expansion and truncation to the linearized MT response expansion given by (2.2.20) to compute the smallest conductivity perturbation in an iterative algorithm. In order for the linearization to hold, perturbations must be small and minimizing the $l_2$ norm of the perturbation is a logical choice. However, making use of the transformed equation (2.2.22) allows a norm to be applied to the model itself, not the perturbation. In this case, constructing the smallest conductivity model is not the best choice for several reasons. There is no geophysical reason to expect the Earth conductivity structure to correspond to the smallest model solution. In fact, minimizing the conductivity often results in highly oscillatory solutions which include regions of negative conductivity that are both unphysical and difficult to deal with computationally. The problem of negative conductivities can be remedied by considering $\log\sigma$ as model; however, more meaningful choices for the model norm can be derived by modifying the smallest model construction.

3.2.3 Smallest-deviatoric model

As mentioned above, there is no geophysical reason to expect the smallest model to represent the true Earth structure. However, if a base model is available which corresponds to the best estimate of the true model based on any available information (well logs, geological considerations, previous modelling studies, etc.), the method described in Section 3.2.1 can be modified to construct the model which deviates least from this base model. That is, the norm of the deviation between the constructed and base model is minimized subject to the constraint that the constructed model fits the data to within an acceptable level. This is referred to as the smallest-deviatoric model.
Let $m_B(z)$ represent the base model and $\Delta m(z)$ represent the (unknown) deviation from $m_B$ required so that the constructed model
\[ m(z) = m_B(z) + \Delta m(z) \tag{3.2.19} \]
fits the data to within the required level of misfit. The data constraint equations (3.2.2) become
\[ e_j = \int_0^\infty g_j(z) \left[ m_B(z) + \Delta m(z) \right] dz, \qquad j = 1, \ldots, N, \tag{3.2.20} \]
which may be written as
\[ \tilde{e}_j = \int_0^\infty g_j(z)\,\Delta m(z)\,dz, \tag{3.2.21} \]
where the modified responses are given by
\[ \tilde{e}_j = e_j - \int_0^\infty g_j(z)\,m_B(z)\,dz. \tag{3.2.22} \]
Equations (3.2.21) are modified data constraints written in terms of $\Delta m$ as model; these may be used as side conditions in minimizing the (weighted) $l_2$ norm of the model deviation
\[ \|\Delta m\|_{2,w}^2 = \int_0^\infty |w(z)\,\Delta m(z)|^2\,dz \tag{3.2.23} \]
in the manner described in Sections 3.2.1 and 3.2.2. Strictly speaking, (3.2.23) is not a norm for the model itself; however, it will sometimes be referred to loosely as a model norm. The smallest-deviatoric model is formed by adding the constructed deviation to the base model according to (3.2.19).

The weighting function $w(z)$ is particularly useful for the smallest-deviatoric model as it is often the case that the base model is well known for some depth regions but not for others. Also, by an appropriate choice of base model and weighting function, the construction of smallest-deviatoric models may be used to perform an approximate appraisal of model features; this procedure is demonstrated in Section 3.4.2.

3.2.4 Flattest model

In many cases there is insufficient additional information available to determine a reliable base model prior to the inversion. In such cases it is probably best to seek the simplest model consistent with the data at a given level of misfit. By constructing minimum-structure models, the danger of being misled by features appearing in the model that are not required by the data should be greatly reduced.
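The smallest-deviatoric construction of (3.2.19) to (3.2.23) amounts to a change of data followed by the smallest-model solve. The sketch below illustrates this with hypothetical kernels, a hypothetical base model and a weighting function that is large above 2 depth units (where the base model is presumed well known); it fits noise-free data exactly rather than to a target misfit, and the function name is illustrative.

```python
import numpy as np

def smallest_deviatoric(g, e, m_base, w, dz):
    """Model deviating least (weighted l2 sense) from m_base while fitting e exactly."""
    e_mod = e - (g * dz) @ m_base          # modified responses (3.2.22)
    gw = g / w
    Gram = (gw * dz) @ gw.T                # weighted Gram matrix (3.2.10)
    alpha = np.linalg.solve(Gram, e_mod)
    delta_m = (alpha @ g) / w**2           # smallest deviation, as in (3.2.7)
    return m_base + delta_m                # (3.2.19)

M = 1000
dz = 10.0 / M
zc = (np.arange(M) + 0.5) * dz
decay = np.array([0.5, 1.0, 2.0, 4.0, 8.0])        # hypothetical kernel decay rates
g = np.exp(-np.outer(decay, zc))
m_true = 1.0 + 0.5 * np.exp(-((zc - 4.0) ** 2))
e = (g * dz) @ m_true                              # noise-free data

m_base = np.ones(M)                                # hypothetical base model
w = np.where(zc < 2.0, 10.0, 1.0)                  # penalize deviations more strongly above z = 2

m_sd = smallest_deviatoric(g, e, m_base, w, dz)
misfit = np.max(np.abs((g * dz) @ m_sd - e))
norm_sd = np.sum((w * (m_sd - m_base)) ** 2 * dz)
norm_true = np.sum((w * (m_true - m_base)) ** 2 * dz)
print("max data residual:", misfit)
print("weighted deviation norm (constructed, true):", norm_sd, norm_true)

# If the base model already fits the data, no deviation is introduced
m_same = smallest_deviatoric(g, e, m_true, w, dz)
```

Because the true deviation also satisfies the modified constraints, its weighted norm can never be smaller than that of the constructed deviation.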
There is reason to believe that the features of the minimum-structure solution are characteristics essential to fitting the data and not simply artifacts of the noise or inversion procedure. The true model may be more complex than the simplest model, but these additional complexities are not resolved by the data and are not justified in the constructed model. Constable et al. (1987) and Smith & Booker (1988) also propound this philosophy for the inversion of MT responses.

The $l_2$ norm of the model gradient is a measure of the amount of structure of the model; minimizing this norm produces the flattest model. To express the constraints in terms of the model gradient, (3.2.2) can be integrated by parts to give
\[ \tilde{e}_j = \int_0^\infty \tilde{g}_j(z)\,m'(z)\,dz, \qquad j = 1, \ldots, N, \tag{3.2.24} \]
where
\[ \tilde{g}_j(z) = h_j(z) - h_j(\infty), \tag{3.2.25a} \]
\[ h_j(z) = \int_0^z g_j(u)\,du, \tag{3.2.25b} \]
\[ \tilde{e}_j = m(0)\,h_j(\infty) - e_j, \tag{3.2.25c} \]
and $m' = dm/dz$. Using these constraints as side conditions, the method described in Section 3.2.1 may be used to compute the smallest gradient model. The flattest model $m(z)$ is recovered directly by integrating $m'(z)$.

The procedure derived here requires the surface value $m(0)$ be known; in fact, the derivation can be generalized for $m$ known at any fixed depth, but in practice $m(0)$ is most likely to be known accurately for the MT problem. If $m(0)$ is known, it is valuable to include this information in the inversion; however, an inaccurate estimate of $m(0)$ can introduce false structure into the model. If a reliable estimate of $m(0)$ is not available, the value for $m(0)$ which produces the absolutely flattest model can be calculated. An original derivation of the absolutely flattest model and a discussion of some of its attributes are included in Appendix A. Smith & Booker (1988) also describe a similar procedure.

Minimizing the norm of the model gradient leads to the model with the minimum structure which fits the data.
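A discrete version of the integration by parts in (3.2.24) and (3.2.25) can be sketched as follows. The kernels, grid and test model are hypothetical, w(z) = 1, the data are noise-free and fit exactly, and m(0) is assumed known. On the grid, the model in cell i is taken to be m(0) plus the cumulative sum of the gradient values up to and including cell i, so the modified kernels become (minus) the tail sums of the original kernels.

```python
import numpy as np

M = 1000
dz = 10.0 / M
zc = (np.arange(M) + 0.5) * dz
decay = np.array([0.5, 1.0, 2.0, 4.0, 8.0])      # hypothetical kernel decay rates
g = np.exp(-np.outer(decay, zc))
m_true = 1.0 + 0.5 * np.tanh(zc - 3.0)           # smooth test model
e = (g * dz) @ m_true                            # noise-free data, as in (3.2.2)
m0 = m_true[0]                                   # surface value m(0), assumed known

# Discrete analogue of (3.2.25): tail[j, l] = sum over cells i >= l of g_j dz,
# which plays the role of h_j(inf) - h_j(z)
tail = np.cumsum(g[:, ::-1], axis=1)[:, ::-1] * dz
g_mod = -tail                                    # modified kernels (3.2.25a)
e_mod = m0 * tail[:, 0] - e                      # modified data (3.2.25c)

# Smallest-norm solution for the gradient m'(z), as in Section 3.2.1
Gram = (g_mod * dz) @ g_mod.T
alpha = np.linalg.solve(Gram, e_mod)
m_prime = alpha @ g_mod

# Integrate the gradient to recover the flattest model
m_flat = m0 + np.cumsum(m_prime) * dz

misfit = np.max(np.abs((g * dz) @ m_flat - e))
rough_flat = np.sum(m_prime**2) * dz
rough_true = np.sum(np.diff(m_true) ** 2) / dz
print("max data residual:", misfit)
print("gradient norm (flattest, true):", rough_flat, rough_true)
```

The constructed model reproduces the data while carrying no more gradient energy than the model that generated them, which is the sense in which it is the flattest.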
The weighting function $w(z)$ may be used to define depth regions where structure is to be discriminated against either more or less strongly. In MT the resolution of the responses generally decreases logarithmically with depth; therefore it is often appropriate to minimize the logarithmic gradient, i.e. to consider a norm
\[ \|m'\|_{2,w}^2 = \int_0^\infty \left| w(z)\,\frac{dm(z)}{df(z)} \right|^2 df(z), \tag{3.2.26} \]
where the depth function $f(z)$ may be either $z$ or $\log(z + z_0)$. Since $dm/d[\log(z + z_0)] \propto (z + z_0)\,dm/dz$, the logarithmic gradient can be accomplished by including a weighting (in addition to $w$) of $(z + z_0)^{1/2}$. The constant $z_0$ is included to avoid a singularity at $z = 0$; physically, it is required since the resolution length approaches a constant (not zero) at the Earth's surface. Smith & Booker (1988) use a similar approach and choose $z_0$ equal to half the penetration depth $\Re(c)$ (Weidelt 1972) for the highest frequency in the data since structure much shallower than this cannot be resolved. After considering a number of possibilities, this value was also adopted in our inversion algorithm.

3.3 The linearized inversion algorithm

This section describes an iterative inversion algorithm for the non-linear MT problem based on local linearization. Let the MT responses be related to the model according to
\[ R_j = F_j(m), \qquad j = 1, \ldots, N, \tag{3.3.1} \]
where $F_j$ represents the non-linear functional defining the forward problem. Chapter 2 shows that expanding the functional about a starting model, neglecting second-order terms and substituting for the model perturbation leads to a linear expression which relates the model directly to the (modified) responses:
\[ R_j - F_j(m_0) + \int_0^\infty G_j(m_0, z)\,m_0(z)\,dz = \int_0^\infty G_j(m_0, z)\,m(z)\,dz, \qquad j = 1, \ldots, N. \tag{3.3.2} \]
In (3.3.2) the model $m(z)$ may represent conductivity $\sigma(z)$ or $\log\sigma(z)$. By selecting a starting model $m_0(z)$ and computing the corresponding kernel functions, (3.3.2) can be inverted for a flattest or smallest-deviatoric model $m_1(z)$ using the linear inversion techniques described in Section 3.2.
Since higher-order terms are neglected in (3.3.2), unless $m_0$ is close to a solution it is unlikely that $m_1$ will adequately fit the observed responses, i.e. the misfit

$$\chi^2 = \sum_{j=1}^{N} \left( \frac{R_j - F_j(m_1)}{s_j} \right)^2 \tag{3.3.3}$$

will be unacceptably large. If this is the case, the inversion is repeated iteratively with $m_1$ becoming the starting model for the next iteration, and so on. This procedure is continued until the $\chi^2$ misfit reaches the desired level and the model does not change appreciably between successive iterations. Since the linear equation (3.3.2) neglects only terms of second and higher order, it may be anticipated that when the algorithm converges, convergence will be rapid (quadratic), in analogy with Newton's method for operators (Section 2.1.1). By explicitly minimizing the norm of the model gradient or deviation at each iteration, it is hypothesized that the constructed model represents the flattest or smallest-deviatoric model which fits the data. In practice, it is difficult to verify that a global (rather than local) minimum for the norm has been found; this point will be investigated in some detail.

3.3.1 Numerical implementation

Numerical implementation of the algorithm requires that a depth partitioning be introduced and the model discretized. The discretized model is intended to represent an arbitrary function of depth that is independent of the parametrization. Thus it is important to allow the discretized model to be as flexible as possible; this generally requires that the partition elements be smaller than the resolution width of the data. Because of the inherent loss of resolution with depth, the partition locations are usually distributed logarithmically with depth (with possibly a uniformly discretized region near the surface) and a uniform halfspace underlies the system.
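The depth partitioning described above can be illustrated with a minimal sketch. The function name `make_partition` and its parameters are hypothetical and are not taken from the thesis; the sketch simply places a uniformly discretized region near the surface above a set of boundaries spaced equally in the logarithm of depth.

```python
import math


def make_partition(z_uniform_max, dz_uniform, z_max, n_log):
    """Return layer-boundary depths z_0 = 0 < z_1 < ... < z_M in metres.

    Hypothetical helper: boundaries are spaced uniformly by dz_uniform down
    to z_uniform_max, then n_log further boundaries are spaced equally in
    log10(depth) down to z_max.  A uniform halfspace is assumed to underlie
    the deepest boundary.
    """
    n_uniform = int(round(z_uniform_max / dz_uniform))
    depths = [i * dz_uniform for i in range(n_uniform + 1)]
    log_top = math.log10(z_uniform_max)
    log_bottom = math.log10(z_max)
    for i in range(1, n_log + 1):
        frac = i / n_log
        depths.append(10.0 ** (log_top + frac * (log_bottom - log_top)))
    return depths


# Example: 5 m elements to 100 m depth, then 60 logarithmic elements to 100 km.
depths = make_partition(100.0, 5.0, 1.0e5, 60)
```

Refining the partition (for example, doubling `n_log`) and repeating the inversion is one way to check whether the discretization has influenced the solution.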
If there is any question of whether the partitioning has influenced the final solution, the inversion may be repeated with successively finer discretizations until there is no appreciable change in the solution. The model gradient norm (3.2.26) assumes a continuous model; however, efficient numerical solutions to the forward problem are more readily based on a parametrization in terms of a set of piece-wise constant layers. Therefore, the model is discretized so that

$$m(z) = m_i, \qquad z_{i-1} < z < z_i, \qquad i = 1, \ldots, M, \tag{3.3.4}$$

where $z_0 = 0$ and the number of layers $M$ is typically about 100. The depth partition must extend deep enough that the EM fields associated with the longest period have decayed essentially to zero; the upper limit of infinity in all integrals may then be replaced by $z_M$.

Once a starting model $m_0(z)$ has been specified on the partition, the kernel functions can be computed. The $N$ complex equations (3.3.2) are separated into $2N$ real equations corresponding to either real and imaginary or amplitude and phase representations of the complex responses $R$. Expressions for the kernel functions are derived in Chapter 2: when $m(z) = \sigma(z)$ the kernels corresponding to real and imaginary responses are given by the real and imaginary parts of (2.2.17), while the kernels corresponding to amplitude and phase responses are given by (2.5.1a) and (2.5.1b). When $m(z) = \log \sigma(z)$, these kernel functions are multiplied by $\sigma_0(z)$ as described in Section 2.5.

Computing the kernel functions and modified responses requires the solution to the forward problem: given the model $m_0(z)$, calculate the resulting fields $E(m_0)$ and $H(m_0)$ (or equivalently, $\partial_z E$). Recursive solutions to the forward problem of computing electric and magnetic fields for a layered Earth model are well known (e.g. Rokityansky 1982). Our algorithm incorporates routines written by Wannamaker, Stodt and Rijo (1987) to compute the EM fields at any depth.
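As an illustration of the kind of recursive forward solution referred to above, the following is a minimal sketch of the standard impedance recursion for a layered halfspace. It is not the routine of Wannamaker, Stodt and Rijo (1987) and it returns only the surface impedance rather than the fields at depth; the function names are hypothetical, and an $e^{+i\omega t}$ time dependence with free-space magnetic permeability is assumed, which may differ from the conventions used for the response $R$ in this thesis.

```python
import cmath
import math

MU0 = 4.0e-7 * math.pi  # magnetic permeability of free space (H/m)


def surface_impedance(conductivities, thicknesses, period):
    """Surface impedance Z = E/H of a layered halfspace (hypothetical helper).

    conductivities: layer conductivities in S/m, the last entry being the
                    basal halfspace.
    thicknesses:    layer thicknesses in metres (one fewer than conductivities).
    period:         period in seconds.
    """
    omega = 2.0 * math.pi / period
    # Start with the intrinsic impedance of the basal halfspace.
    k = cmath.sqrt(1j * omega * MU0 * conductivities[-1])
    z = 1j * omega * MU0 / k
    # Propagate the impedance upward through each layer to the surface.
    for sigma, h in zip(reversed(conductivities[:-1]), reversed(thicknesses)):
        k = cmath.sqrt(1j * omega * MU0 * sigma)
        z0 = 1j * omega * MU0 / k          # intrinsic impedance of the layer
        t = cmath.tanh(k * h)
        z = z0 * (z + z0 * t) / (z0 + z * t)
    return z


def apparent_conductivity_and_phase(z, period):
    """Apparent conductivity (S/m) and impedance phase (degrees)."""
    omega = 2.0 * math.pi / period
    sigma_a = omega * MU0 / abs(z) ** 2
    return sigma_a, math.degrees(cmath.phase(z))


# A 0.02 S/m halfspace gives sigma_a = 0.02 S/m and a phase of 45 degrees.
z_half = surface_impedance([0.02], [], 1.0)
sigma_a, phase = apparent_conductivity_and_phase(z_half, 1.0)
```

With these conventions a uniform halfspace yields an apparent conductivity and phase that are independent of period, which is a convenient check on any implementation.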
Integrations required to calculate the modified data or kernels for either the flattest or smallest-deviatoric model are computed using a Romberg integration scheme. In general, it is not necessary to compute and integrate all kernel functions to a depth $z_M$ (Oldenburg 1979). For each period there exists a depth $z_{\max}$ below which the electric field has decayed to some small proportion (say, $10^{-5}$) of its surface value. This depth can be computed adequately using a WKBJ approximation (e.g. Mathews & Walker 1970, p. 27), and below it the kernel function may be set identically to zero.

Once the appropriate kernel functions and modified responses have been computed, the linearized problem can be solved for the flattest or smallest-deviatoric model as described in Section 3.2: the inner-product matrix $\Gamma$ is computed and diagonalized using SVD, the appropriate value for the ridge-regression parameter $\beta$ is computed, and the model gradient or deviation is constructed as a linear combination of the (rotated) basis functions $\{\psi_j\}$. Finally, the model $m_1(z)$ is found by integrating $m'(z)$ (for the flattest model) or adding $m_B(z)$ to $\Delta m(z)$ (for the smallest-deviatoric model), and the $\chi^2$ misfit to the data is computed for $m_1(z)$ according to (3.3.3).

This procedure is repeated iteratively until the convergence criteria are met. The criteria are that the model must fit the data to within a tolerance $t_{\chi^2}$ of the desired misfit $\chi_d^2$, and that the model does not change appreciably between successive iterations. The latter requirement is included because the constructed model always depends on the model at the previous iteration, so it is important to verify that a stable solution has been achieved. The total change in the model at the $k$th iteration is quantified by the value $e_k$ (3.3.5), a root-sum-square measure of the change in the model elements $m_{i,k}$ over $i = 1, \ldots, M_{\max}$, where $M_{\max}$ is the index of the partition element that corresponds to the maximum depth of penetration $z_{\max}$ for the longest period response.
The limit $M_{\max}$ is used rather than $M$ so that the model change measure $e$ will not be unduly influenced by that region of the model which is not resolved by the data. In addition to requiring that the total model change $e$ be less than some specified value $e_d$ for convergence, no model element is allowed to change by more than a prescribed factor; this procedure is described in the following section. In our algorithm, the desired misfit value $\chi_d^2$, the tolerance allowed in the final misfit $t_{\chi^2}$, and an acceptable value for $e_d$ are variable parameters that are defined by the user. In practice, common values for these parameters are $\chi_d^2 = 2N$ (for complex responses at $N$ frequencies), $t_{\chi^2} = 0.1$ and $e_d = 0.01$.

3.3.2 Non-linear considerations

An important consideration that has not yet been discussed regards the applicability of the local linearization inherent in (3.3.2). Since this equation neglects second-order terms, it is only accurate for small changes in the model. One method of ensuring that model changes at each iteration remain small is to require only a small change in the misfit per iteration. In practice, this is accomplished by choosing the target misfit value $\chi_t^2$ for the $k$th iteration to be some fraction of the misfit of the previous iteration unless this value is less than the desired final misfit value:

$$\chi_{t,k}^2 = \max\left( \chi_{k-1}^2 / P,\; \chi_d^2 \right), \tag{3.3.6}$$

where $P$ is usually taken to be between 2 and 10. In addition to choosing target misfits in this manner, the size of changes in the model between iterations is controlled when the model is updated, i.e. the value of the model on the $i$th partition element at the $k$th iteration is only allowed to change by a factor of $D$ from its value at the previous iteration:

$$m_{i,k-1} / D < m_{i,k} < D\, m_{i,k-1}, \qquad i = 1, \ldots, M, \tag{3.3.7}$$

where $D$ is also usually taken to be between 2 and 10.
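The two safeguards just described, the gradual reduction of the target misfit and the limit on the change of each model element, can be sketched as follows. The function names are hypothetical, and the clamping sketch assumes positive model values (for example $m = \sigma$); it is an illustration of the rules stated in the text rather than the implementation used in the thesis.

```python
def target_misfit(previous_misfit, desired_misfit, p):
    """Target misfit for the next iteration: the previous misfit reduced by
    the factor p, but never below the desired final misfit."""
    return max(previous_misfit / p, desired_misfit)


def limit_model_change(new_model, previous_model, d):
    """Restrict each element to lie within a factor d of its previous value.

    Returns the limited model and a flag indicating whether any element was
    altered (in which case the iteration is not regarded as converged).
    Assumes all model values are positive.
    """
    limited = []
    altered = False
    for m_new, m_old in zip(new_model, previous_model):
        lower, upper = m_old / d, m_old * d
        m_clamped = min(max(m_new, lower), upper)
        altered = altered or (m_clamped != m_new)
        limited.append(m_clamped)
    return limited, altered


# Example with P = D = 10 and a 0.02 S/m halfspace as the previous model.
chi2_target = target_misfit(76700.0, 50.0, 10.0)
model, was_altered = limit_model_change([0.5, 0.02, 0.001], [0.02] * 3, 10.0)
```

Returning the `altered` flag alongside the model makes it straightforward to withhold convergence whenever the limits have been applied.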
In most cases the new model values naturally fall within these limits and the constructed model is not altered; however, in some cases applying the limits seems to be required to stabilize the first few iterations. The algorithm is not considered to have converged at a given iteration if any element of the constructed model is altered according to (3.3.7), even if the required conditions for misfit and the total model change $e$ are satisfied. In practice, these methods of keeping model changes small so that the linearization remains valid have been found to be effective and necessary precautions for a stable and robust inversion scheme.

Another consequence of the non-linearity is that solving the linearized data equations to a misfit of $\chi_t^2$ at a given iteration will not, in general, result in this same value when the true $\chi^2$ misfit is computed according to (3.3.3). Constable et al. (1987) overcome this difficulty by sweeping through values of the ridge-regression parameter $\beta$ and computing the corresponding values for the $\chi^2$ misfit in order to achieve the desired value. Unfortunately, this can require a large number of solutions to the forward problem for each iteration. While this technique may be viable for the 1-D MT problem, for 2-D MT or other problems the number of forward calculations can be prohibitive. A different approach has been developed for our algorithm. The difference in the misfits of the linearized and non-linear cases is not a problem in the early iterations, since the linearized approximation will produce a very acceptable decrease in the misfit of the non-linear problem. However, in order for the algorithm to converge precisely to the desired misfit, the non-linear effects must be taken into account in the final iteration.
In its unmodified form, the algorithm generally converges to a stable solution that has a misfit slightly larger than $\chi_d^2$. In order to reduce the misfit to the value $\chi_d^2$, it is necessary to solve the linearized equations to a misfit somewhat smaller than $\chi_d^2$ (i.e. require a target misfit $\chi_t^2$ somewhat smaller than $\chi_d^2$). The problem becomes one of finding, in an efficient manner, the value for the target (or linear) misfit $\chi_t^2$ that results in the desired misfit $\chi^2 = \chi_d^2$. In our approach, when three consecutive iterations have not reduced the misfit significantly, the target misfit $\chi_t^2$ for the third iteration is reduced by multiplying it by a factor $\chi_d^2 / \chi^2$, where $\chi^2$ represents the misfit originally achieved in the third iteration. The appropriate $\beta$ is computed so that the linearized equations are solved to a misfit of $\chi_t^2$, and the true misfit achieved is computed from the resulting model. Adjusting $\chi_t^2$ in this manner generally leads to a substantial improvement in the $\chi^2$ misfit. In some cases the $\chi^2$ value it produces is within the specified tolerance of the desired value $\chi_d^2$ and the algorithm has converged (provided the model change requirements are also satisfied). Even if this step does not result in the desired misfit, it has produced a second pair of target and actual misfits $(\chi_t^2, \chi^2)$ valid for this iteration, and thus a third target misfit value can be computed using these two pairs in an approximate Newton step. In practice, the target misfit resulting from this Newton step usually leads to the desired misfit of $\chi_d^2$; in cases when it does not, the procedure is repeated up to three times. If $\chi_d^2$ has still not been achieved, a new iteration is initiated.

It is quite possible that the desired misfit $\chi^2 = \chi_d^2$ can be obtained for more than one value of the target (or linear) misfit $\chi_t^2$, since the $\chi^2$ misfit may begin to increase when the linear misfit is decreased below a certain point as a result of the non-linearity of the problem. This is illustrated diagrammatically in Fig. 3.1.
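The adjustment of the target misfit described above can be sketched in a few lines. The function names are hypothetical, and the "approximate Newton step" is interpreted here as a secant step through the two available (target, achieved) pairs, which is an assumption about the form of the update rather than a statement of the thesis implementation.

```python
def first_adjustment(target, achieved, desired):
    """Initial reduction: scale the target misfit by desired / achieved."""
    return target * desired / achieved


def secant_adjustment(pair_a, pair_b, desired):
    """Approximate Newton (secant) step for the target misfit.

    pair_a and pair_b are (target, achieved) misfit pairs obtained at the
    same iteration.  The achieved misfit is treated as locally linear in the
    target misfit, and the target that would give `desired` is returned.
    """
    (t_a, c_a), (t_b, c_b) = pair_a, pair_b
    if t_b == t_a or c_b == c_a:
        raise ValueError("the two misfit pairs must differ")
    slope = (c_b - c_a) / (t_b - t_a)
    return t_b + (desired - c_b) / slope


# Toy illustration with a made-up linear relation between the target and
# achieved misfits; a real inversion would re-solve the linearized problem
# and the forward problem for each new target.
def achieved(target):
    return 1.1 * target + 2.0


t1 = 50.0
t2 = first_adjustment(t1, achieved(t1), 50.0)
t3 = secant_adjustment((t1, achieved(t1)), (t2, achieved(t2)), 50.0)
```

Each evaluation of the achieved misfit costs one forward solution, so a step of this kind needs far fewer forward calculations than a sweep over the ridge-regression parameter.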
When this is the case, the larger value of $\chi_t^2$ is correct because it results in a smaller model norm. Therefore, it is important to verify that our procedure has returned this value for $\chi_t^2$. This is easily checked by verifying that $\chi^2$ has decreased if $\chi_t^2$ was decreased, or that $\chi^2$ has increased if $\chi_t^2$ was increased. If this is not the case, the step is approaching the wrong root and the algorithm returns to the best previous solution which satisfied this requirement to initiate a new iteration.

This procedure for determining the misfit to the linearized problem required to precisely obtain a specified misfit to the non-linear problem has proved to be an efficient and effective method that requires a minimum of forward modelling. The algorithm has successfully inverted all data sets (measured and synthetic) that have been considered. In many cases, only one forward model solution per iteration is required for all but the final iteration, which may require two or three forward models to achieve the desired misfit $\chi_d^2$. Some particularly difficult inversions may require several forward model solutions for the final few iterations.

Figure 3.1 Diagram showing the dependence of the $\chi^2$ misfit on the target misfit $\chi_t^2$ (the misfit to the linearized data equations). The desired misfit $\chi_d^2$ is indicated by a dashed line.

The only difficulties arise when a value for $\chi_d^2$ is specified which is smaller than that which can be achieved by a finite 1-D model. This can be the case if a set of field measurements contains significant 2-D effects or the data uncertainties are under-estimated. In theory, the problem of finding the smallest possible $\chi^2$ misfit to an arbitrary set of MT responses has been solved by Parker (1980) and Parker & Whaler (1981). They show that the solution which minimizes the $\chi^2$ misfit is given by the $D^+$ model, which consists of a series of delta functions of infinite conductivity, but finite conductance, embedded in an insulating halfspace.
The $D^+$ solution is not a geophysically realistic model, but it does provide a lower limit for the $\chi^2$ misfit. In practice, however, it may not be possible to achieve this level of misfit with the algorithm described in this chapter, since the constructed models do not admit delta functions and a finite partitioning is imposed on the model. This is perfectly acceptable since the object of the inversion is to construct geophysically plausible models. In the case where an unattainable value for $\chi_d^2$ is specified, it may be that the reduction produced by the multiplicative factor $\chi_d^2 / \chi^2$ used to reduce $\chi_t^2$ in the initial step is too large and may not result in a decrease in $\chi^2$. If this is the case, the reduction is made less severe by successively taking the square root of the multiplicative factor (which moves it closer to unity) until an acceptable step is found. If an acceptable step is not found in 10 such attempts, the misfit would appear to have been reduced to (approximately) the smallest possible value and the algorithm terminates.

The algorithm described in this section has been implemented as a fully automated, self-contained routine. The algorithm generally converges from a halfspace starting model to a model with the expected value of $\chi^2$ in about six to eight iterations. The next section of this chapter presents a number of examples of $l_2$ model norm construction designed to illustrate the features of the inversion algorithm.

3.4 Examples of $l_2$ model norm construction

3.4.1 Flattest model construction

The inversion algorithm described in Sections 3.2 and 3.3 is designed to be flexible so that a variety of acceptable models of specific character can be obtained for a given data set. In this section the construction of $l_2$ flattest models is demonstrated by inverting both synthetic and measured field responses. The synthetic test case used to illustrate the inversion algorithm is that considered by Whittall & Oldenburg (1990) in their survey of 1-D MT inversion techniques.
The true model, given in Table 3.1 and shown by the dashed line in Figs 3.2-3.7, consists of four homogeneous layers overlying a uniform halfspace. Fifty data consisting of the real and imaginary parts of the response $R$ were generated at 25 periods equally spaced logarithmically from 0.0025 to 250 s. For the purpose of illustrating features of the constructed models, accurate data are considered initially; however, an uncertainty of 2 percent in all responses is assumed so that the $\chi^2$ statistic can be used to measure the relative fit of the models. Unless otherwise indicated, the absolutely flattest model was constructed in each example in this section, and a misfit of $\chi^2 = 50$ (tolerance $t_{\chi^2} = 0.1$) and maximum model change of $e_d = 0.01$ were required for convergence. In each case the starting model was taken to be a halfspace of conductivity 0.02 S/m. The effects of introducing noise into the synthetic data and inverting field measurements are considered subsequently.

The first example illustrates the convergence properties of the algorithm when the gradient norm (3.2.26) is minimized with model $m(z) = \sigma(z)$, depth function $f(z) = z$ and a weighting $w(z) = 1$. The constructed models and predicted responses at each iteration are shown in Fig. 3.2, and the corresponding values of $\chi^2$, $\|m'\|^2$ and $e$ are given in Table 3.2. The target misfit $\chi_t^2$ for each iteration was chosen according to (3.3.6) with $P = 10$, and the size of the model change between iterations was limited according to (3.3.7) with $D = 10$. Figure 3.2(a) shows the halfspace starting model; the true model is indicated by a dashed line.
Figure 3.2 The sequence of models produced in the inversion for the $l_2$ flattest model with $m(z) = \sigma(z)$, $f(z) = z$ and $w(z) = 1$. (a), (b) and (c) show the starting model and the models constructed in iterations 1 and 2, respectively (solid line); (d), (e) and (f) show the models constructed in iterations 3, 4 and 6, respectively (solid line); the true model is also indicated (dashed line). The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right. The values of $\chi^2$, $\|m'\|^2$ and $e$ for each iteration are summarized in Table 3.2.

Table 3.1 The true conductivity model for the synthetic test case.

    Depth (m)          Conductivity (S/m)
    0 - 600            0.004
    600 - 2 000        0.04
    2 000 - 6 000      0.01
    6 000 - 10 000     0.1
    > 10 000           0.04

Table 3.2 Summary of model attributes at each iteration for the inversion shown in Fig. 3.2.

    Iteration    Misfit      Derivative norm    Model change
    number       chi^2       ||m'||^2           e
    0            76 700      0
    1            6 630       4.45 x 10^-4       0.663
    2            1 290       9.52 x 10^-4       0.420
    3            169         1.87 x 10^-3       0.284
    4            53.9        2.19 x 10^-3       0.106
    5            52.8        2.20 x 10^-3       0.0353
    6            50.0        2.22 x 10^-3       0.00592

The two plots to the right of Fig. 3.2(a) compare the observed responses (squares with error bars) to the responses computed for the starting model (solid lines). Although real and imaginary parts of $R$ are used as responses in the computations, the more standard representation in terms of apparent conductivity $\sigma_a$ and phase $\phi$ is displayed. For a halfspace starting model $\sigma_a$ and $\phi$ are constants and result in a very poor fit to the data ($\chi^2 = 76\,700$). By requiring an improved fit to the data at each iteration, structure is gradually introduced into the model. The model produced by the first iteration, shown in Fig. 3.2(b), has approximately the correct surface value and general increase in conductivity with depth; likewise, the predicted responses reproduce the general trend of the true data but none of the detailed structure. Structure is gradually introduced into the constructed model and predicted responses in iterations 2, 3 and 4, shown in Fig. 3.2(c), (d) and (e). By iteration 4 the constructed model essentially reproduces the true data ($\chi^2 = 53.9$); however, several more iterations are required to precisely achieve the desired misfit and verify that the solution has stabilized. The final model, achieved in iteration 6 and shown in Fig. 3.2(f), has a misfit of $\chi^2 = 50.0$ and represents a model change of only $e = 0.006$ from the previous iteration. Only one additional forward model computation was required in the final iteration to precisely achieve the desired misfit value. The model solution clearly indicates the five layers of the true model.
However, since minimizing the $l_2$ norm of the gradient discriminates against large abrupt changes in the model, the conductivity changes in a smooth, continuous manner with depth and the structural changes are represented in terms of gradual gradients rather than discontinuous layers. Figure 3.2(f) represents the minimum-structure model (where structure is measured according to (3.2.26) with the given choices of $m$, $f$ and $w$); the only features exhibited are those required to fit the data. In practice, it is difficult to verify that this solution truly represents a global (rather than local) minimum for the structural norm. One method of investigating this is to repeat the inversion with different starting models. The algorithm has been initiated from a variety of diverse starting models with no difference in the final solution. Although this does not prove that a global minimum has been found, it does provide some confidence that the final solution is independent of the starting model. In contrast, the models constructed by Oldenburg (1979) when the linearized problem is formulated in terms of a model perturbation are clearly dependent on the starting model. The question of convergence is considered in more detail in Chapter 6, which presents another method for validating the minimization.

Since the resolution of MT responses generally decreases logarithmically with depth, it is often appropriate to consider the gradient norm (3.2.26) with the depth function $f(z) = \log(z + z_0)$ as described in Section 3.2.3. Figure 3.3(a) shows the model constructed by minimizing this norm in the inversion algorithm (solid line); the model produced when $f(z) = z$ is included for comparison (dotted line). In both cases $m(z) = \sigma(z)$, $w(z) = 1$ and the constructed models have a misfit of $\chi^2 = 50.0$.
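The relationship between the logarithmic depth function and a depth weighting can be made explicit with a short worked equation. The sketch below assumes that, for $w(z) = 1$, the gradient norm is the integral of the squared derivative of the model with respect to the depth function; this form is an assumption made here for illustration. Changing the variable of integration from $\log(z + z_0)$ to $z$ then gives

```latex
\begin{aligned}
\int_{z=0}^{z=\infty} \left[ \frac{dm}{d\log(z+z_0)} \right]^2 d\log(z+z_0)
  &= \int_0^\infty \left[ (z+z_0)\,\frac{dm}{dz} \right]^2 \frac{dz}{z+z_0} \\
  &= \int_0^\infty \left[ (z+z_0)^{1/2}\,\frac{dm}{dz} \right]^2 dz .
\end{aligned}
```

Under this assumption the logarithmic-gradient norm is a linear-gradient norm with an additional weighting of $(z + z_0)^{1/2}$, so that gradients at large depths are penalized more heavily than gradients near the surface.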
Since the logarithmic depth function is implemented by including a weighting of $(z + z_0)^{1/2}$, minimizing the logarithmic gradient results in more structure at shallow depths and less structure at large depths than minimizing the linear gradient, as is evident in Fig. 3.3(a). This requires that the low frequency responses are fit more closely and the high frequency responses fit less closely when the logarithmic-gradient norm is used compared to the linear-gradient norm. Smith & Booker (1988) maintain that it is preferable to fit the responses at all frequencies equally well (a white fit) and that using $f(z) = \log(z + z_0)$ can accomplish this, particularly when the model is taken to be $m(z) = \log \sigma(z)$. The fit to the data for the minimum logarithmic-gradient model is shown in Fig. 3.3(b) and (c).

In many cases recovering the conductivity over a number of orders of magnitude is of interest; in this case $m(z) = \log \sigma(z)$ is the appropriate choice of model. Figure 3.4(a) shows the model constructed by minimizing the norm of $d[\log \sigma] / d[\log(z + z_0)]$; the result of minimizing $d\sigma / d[\log(z + z_0)]$ is also included for comparison (note that conductivity is plotted on a logarithmic scale). In regions of high conductivity, using $\log \sigma$ as the model results in slightly more structure than using $\sigma$; however, in the low conductivity regions using $\log \sigma$ as the model results in a significant reduction of structure and improvement in the recovery of the true conductivity. The fit to the data for $m(z) = \log \sigma(z)$ is shown in Fig. 3.4(b) and (c). In many applications $m(z) = \log \sigma(z)$ and $f(z) = \log(z + z_0)$ is the practical choice.

Figure 3.3 The flattest model for $m(z) = \sigma(z)$, $f(z) = \log(z + z_0)$ and $w(z) = 1$. (a) shows the constructed model (solid line) and the true model (dashed line); the solution for $f(z) = z$ is also included for comparison (dotted line). (b) and (c) show the true data (squares with error bars) and the predicted responses (solid line). The misfit of the constructed model is $\chi^2 = 50.0$.

Figure 3.4 The $l_2$ flattest model for $m(z) = \log \sigma(z)$, $f(z) = \log(z + z_0)$ and $w(z) = 1$. (a) shows the constructed model (solid line) and the true model (dashed line); the solution for $m(z) = \sigma(z)$ is also included for comparison (dotted line). (b) and (c) show the true data (squares with error bars) and the predicted responses (solid line). The misfit of the constructed model is $\chi^2 = 50.0$.

The derivation of the $l_2$ model norm solution in Section 3.3 includes an arbitrary weighting function $w(z)$ which determines how strongly the norm minimization is applied to various depth regions. This weighting function provides additional flexibility in the model construction. For example, the $l_2$ flattest model formulation generally produces smooth models with structural changes represented by gradual gradients; however, by an appropriate choice of $w(z)$, these changes can be confined to localized regions. Figure 3.5 shows the flattest model constructed with $m(z) = \log \sigma(z)$, $f(z) = \log(z + z_0)$ and the weighting function $w(z)$ set to unity everywhere except for narrow regions three partition elements wide centred at each of the depths where the conductivity of the true model changes. These three elements were given weights of 0.1, 0.05 and 0.1. The structural changes in the constructed model, shown in Fig. 3.5(a), occur predominantly in the regions where the weighting is small; outside of these regions the model is essentially constant. By comparing Fig. 3.5(a) with Fig.
3.4(a) it is clear that this choice of weighting function results in a constructed solution which more closely resembles a layered model. Of course, in many practical cases an appropriate weighting function may not be so readily evident; however, Chapter 4 presents a new method of model construction which produces layered-type solutions and requires no a priori decisions about where to permit conductivity changes.

Figure 3.5 The $l_2$ flattest model for $m(z) = \log \sigma(z)$, $f(z) = \log(z + z_0)$ and a weighting function $w(z)$ chosen to allow structural variations in narrow zones. (a) shows the constructed model (solid line) and the true model (dashed line). (b) and (c) show the true data (squares with error bars) and the predicted responses (solid line). The misfit of the constructed model is $\chi^2 = 50.0$.

In the examples presented so far, accurate responses have been inverted to produce the constructed models. Allowing errors on the responses will always degrade the solution, but by requiring a fit to the data appropriate to the inherent uncertainties according to the $\chi^2$ criterion, the errors should not introduce false structure into the constructed models. Figure 3.6 shows constructed models and their corresponding fit to the data for three levels of error. In each inversion $m(z) = \log \sigma(z)$, $f(z) = \log(z + z_0)$ and $w(z) = 1$, and all solutions have a misfit of $\chi^2 = 50.0$. In Fig. 3.6(a) each response (real and imaginary part of $R$) has been contaminated by the addition of a random error drawn from a zero-mean Gaussian distribution with a standard deviation of 2 percent of the accurate response value. The constructed model is similar (but not identical) to the model shown in Fig. 3.4(a), where a 2 percent uncertainty in the data was assumed but accurate responses were inverted. In Fig. 3.6(b) the standard deviation of the error distribution is 10 percent of the response value. The constructed model has considerably less structure but there are still indications of the features of the true model. Figure 3.6(c) shows the results for 30 percent error: the constructed model exhibits only a general increase in conductivity with depth and there is no indication of the underlying layered structure. Likewise, the predicted data exhibit no detailed features, as only a simple structure is required to adequately fit the true data. For 50 percent errors (not shown), the constructed model is a halfspace and the predicted apparent conductivity and phase are constants; at this level of uncertainty the true responses have essentially no resolving power.

Figure 3.6 The effect of data errors: $l_2$ flattest models for $m(z) = \log \sigma(z)$, $f(z) = \log(z + z_0)$ and $w(z) = 1$ are shown in (a), (b) and (c) when the responses are contaminated by Gaussian errors of 2, 10 and 30 percent, respectively. The true model is also indicated (dashed line). The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right. The misfit of the constructed model is $\chi^2 = 50.0$.

In order to formulate the $l_2$ flattest model solution, the surface value for the model must be specified, as described in Section 3.2.4. The standard approach is simply to estimate this value (e.g. Oldenburg 1984); however, the surface value which results in the absolutely flattest model may be determined in the manner described in Appendix A. Figures 3.2-3.6 show examples of absolutely flattest models. Figure 3.7 compares the flattest models computed for an accurate and an inaccurate estimate of the surface conductivity with the absolutely flattest model.
In this example $m(z) = \log \sigma(z)$, $f(z) = \log(z + z_0)$, $w(z) = 1$ and the responses were contaminated with 4 percent Gaussian noise. Figure 3.7(a) shows the absolutely flattest model solution (note that in this figure the conductivity is plotted to $10^0$ m so that the surface partition elements are shown). The optimum surface conductivity value is found to be 0.0055 S/m and the $l_2$ norm of the model gradient is 0.0534. The constructed model is truly flat near the surface: there is no change in the conductivity over the first few partition elements (the partition length is 5 m to a depth of $10^2$ m and then increases logarithmically below this depth). The flattest model constructed when the true surface conductivity value of 0.004 S/m is specified is shown in Fig. 3.7(b). If this value is known accurately it is valuable to include it in the solution; the model shown in Fig. 3.7(b) is slightly closer to the true model to a depth of about $10^2$ m, and below this depth the two models are essentially identical. However, the flattest model shown in Fig. 3.7(b) is not truly flat near the surface, since the conductivity changes slightly at the first partition boundary at 5 m depth. The $l_2$ norm of the model gradient is 0.0550, slightly larger than the norm associated with the absolutely flattest model. The fit of the predicted to true responses is shown in the panels to the right of the constructed models. The models shown in Fig. 3.7(a) and (b) both have a misfit of $\chi^2 = 50.0$.

Figure 3.7(c) shows the flattest model constructed when an inaccurate surface value of 0.04 S/m is specified. Prescribing this surface value introduces false structure at shallow depths: it not only results in a false high conductivity region near the surface, but a spurious low conductivity region at about $10^2$ m is introduced to counter this near-surface feature. Furthermore, since the high frequency responses which constrain the shallow structure cannot be fit accurately, the low frequency responses are over-fit in an attempt to obtain an acceptable $\chi^2$ misfit. However, over-fitting the low frequency responses introduces false structure at depth which is simply an artifact of the noise on the data. The constructed model shown in Fig. 3.7(c) exhibits shallow and deep structure not required by the data, and the $l_2$ norm of the model gradient is 0.283, more than five times that of the absolutely flattest model. In addition, because of the inaccurate surface value, the best fit to the data that could be obtained with the linearized inversion algorithm was $\chi^2 = 56.5$. This example demonstrates that when a surface conductivity value cannot be reliably estimated, the absolutely flattest model is a valuable alternative.

Figure 3.7 Flattest and absolutely flattest models for $m(z) = \log \sigma(z)$, $f(z) = \log(z + z_0)$, $w(z) = 1$ and responses contaminated with 4 percent Gaussian noise. (a) shows the absolutely flattest model, (b) the flattest model when the true surface conductivity of 0.004 S/m is specified, and (c) the flattest model when an inaccurate surface conductivity of 0.04 S/m is specified. The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right. The misfit of the models in (a) and (b) is $\chi^2 = 50.0$. The model in (c) represents the best fit that could be obtained for the specified surface value; the misfit for this model is $\chi^2 = 56.5$.

As a final example of the $l_2$ flattest model inversion, a set of wide-band MT field data measured near Kootenay Lake in southeastern British Columbia, Canada, is inverted. The data were collected by PHOENIX Geophysics Ltd. (Toronto) for Jones et al. (1988) as part of the LITHOPROBE Southern Cordilleran transect, and a preliminary analysis has been presented by Jones et al.
The analysis of data from this study is also appropriate in this thesis as the author spent several weeks in the field assisting in the data collection procedure. The responses were measured at 34 periods and are shown as apparent conductivities and phases in Fig. 3.8. The actual data set inverted consists of amplitudes and phases of the response $R$ computed from the determinant averages of the impedance tensor (e.g. Ranganayaki 1984). The determinant average is a rotationally-invariant parameter and its use avoids problems with identification of the electrical strike (Park & Livelybrooks 1989).

Figure 3.8 $l_2$ flattest models and MT responses observed in southeastern British Columbia, Canada. The model solutions are for $m(z) = \log\sigma(z)$, $f(z) = \log(z + z_0)$ and $w(z) = 1$. (a), (b) and (c) show the models produced in iterations 7, 9 and 12 with misfits of $\chi^2 = 267$, 244 and 219, respectively. The true data (squares with error bars) and the predicted responses (solid lines) are shown in the panels to the right.

The uncertainties associated with the computed $|R|$ and $\phi$ values were determined by a straightforward numerical simulation procedure (Whittall 1987). Assuming that the errors in the real and imaginary parts of the determinant-average impedances $Z(\omega)$ are independent and Gaussian with zero mean, a large number of noisy $Z$ values are generated to form an (approximate) Gaussian distribution centred on the observed impedance value and having its measured standard deviation. Each (complex) noisy impedance is transformed to a value for $|R|$ and $\phi$ and the standard deviations of the corresponding distributions are determined statistically.
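The simulation procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure of Whittall (1987) itself: the function name `simulate_amp_phase_errors`, the sample count and the example complex value are hypothetical, and the amplitude and phase of $Z$ itself stand in for the transformation to $|R|$ and $\phi$, which is not reproduced here.

```python
import cmath
import random
import statistics

def simulate_amp_phase_errors(z_obs, sd, n_samples=20000, seed=0):
    """Estimate standard deviations of amplitude and phase by simulation.

    z_obs : observed complex value (a stand-in for the impedance Z)
    sd    : standard deviation of the independent, zero-mean Gaussian
            errors assumed on the real and imaginary parts
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    amps, phases = [], []
    for _ in range(n_samples):
        # Perturb the real and imaginary parts independently.
        z = complex(rng.gauss(z_obs.real, sd), rng.gauss(z_obs.imag, sd))
        amps.append(abs(z))            # amplitude of the noisy value
        phases.append(cmath.phase(z))  # phase in radians
    return statistics.stdev(amps), statistics.stdev(phases)

# Hypothetical observation: Z = 3 + 4i with errors of 0.05 on each part.
amp_sd, phase_sd = simulate_amp_phase_errors(3.0 + 4.0j, sd=0.05)
```

For errors that are small relative to $|Z|$, the two estimates should approach the first-order values `sd` and `sd / abs(z_obs)`, which gives a simple check on the simulation.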
This procedure appears to be satisfactory as the computed $|R|$ and $\phi$ distributions are also (approximately) Gaussian, and transforming the computed mean and standard deviations for $|R|$ and $\phi$ back to $Z$ using the same procedure essentially reproduces the original value. Jones et al. (1988) also processed the measured impedances and noted that the data set they arrived at was (with the exception of the longest period response) consistent with the response of a 1-D model according to the criterion of Parker (1980). However, the procedure outlined above results in somewhat smaller uncertainty estimates, especially for the outlying data points, than those presented by Jones et al. (1988). Our set of responses and uncertainty estimates is not strictly consistent with a 1-D model since the $D^+$ best-fitting solution (Parker 1980) has a misfit of $\chi^2 = 199$ for 68 data. This is likely due to an under-estimation of the errors associated with the measured impedances. Nonetheless, the $l_2$ flattest model inversion was carried out with the desired misfit set to $\chi^2 = 199$. Although it is unlikely that any finite 1-D model can achieve this misfit, the algorithm should construct the flattest model which (approximately) achieves the least possible misfit. The starting model for the inversion was taken to be a halfspace of conductivity 0.01 S/m; the misfit for this starting model is $\chi^2 = 4.09 \times 10^5$.

Figure 3.8 shows $l_2$ minimum-structure models constructed with $m(z) = \log\sigma(z)$, $f(z) = \log(z + z_0)$ and $w(z) = 1$. Since the surface conductivity value was not known in this practical example, solving for the absolutely flattest model was required. Figure 3.8(a) shows the model constructed at iteration 7 of the inversion. This model has a misfit of $\chi^2 = 267$ and a derivative norm of $\|m'\|_2 = 0.0904$. The solution is in good agreement with the 1-D models constructed by Jones et al. (1988) using the Occam's inversion algorithm of Constable et al.
(1987) and the best-fitting 1-D layered model of Fisher & Le Quang (1981). The model constructed at iteration 9, shown in Fig. 3.8(b), has a misfit of $\chi^2 = 244$ and a norm of $\|m'\|_2 = 0.137$. The best-fitting flattest model that could be constructed with the inversion algorithm is shown in Fig. 3.8(c). This model was produced at iteration 12 and is similar to those shown in Fig. 3.8(a) and (b) except that the features are more pronounced as more structure is required to reduce the misfit. The model has a misfit of $\chi^2 = 219$, only about 10 percent larger than the $D^+$ model misfit of 199, and a derivative norm of $\|m'\|_2 = 0.198$. The inversion and appraisal of this data set will be considered further in Chapters 4 and 5.

3.4.2 Smallest-deviatoric model construction

In cases when an a priori estimate of the model structure exists, it is often useful to construct the model which fits the measured MT responses but deviates by a minimal amount from this base model. As an example, consider the synthetic test case considered in Section 3.4.1 and assume that the true model is known to a depth of 2000 m from, perhaps, a well log or previous geophysical study. In this case it would be reasonable to construct models which match this known structure by using the smallest-deviatoric model formulation. Figure 3.9 shows the results of this procedure for the model $m(z) = \sigma(z)$. The base model, shown by the dotted line, consists of the true model structure to a depth of 2000 m; below this depth the conductivity is held constant. In Fig. 3.9(a) the weighting function is set to unity. The constructed smallest-deviatoric model is generally in good agreement with the base model to a depth of about 5000 m, which shows that the base model is consistent with the observed responses to this depth. Below about 5000 m, however, the data require additional structure in the model: two higher conductivity layers are clearly indicated.
The gradual decrease in the conductivity towards the base model value at large depths results from the limited information content of the data at these depths due to the decay of the EM fields. The constructed model shown was obtained for a variety of different starting models, which provides some confidence that a global minimum for the model deviation has been found.

Figure 3.9 The $l_2$ smallest-deviatoric model for $m(z) = \sigma(z)$. In (a) and (b) the base model is indicated by the dotted line, the true model by the dashed line. (a) shows the smallest-deviatoric model solution (solid line) for a uniform weighting function, (b) shows the smallest-deviatoric model when $w(z) = 1$ for $z < 2000$ m and $w(z) = 0.2$ for $z > 2000$ m. The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right. The misfit of the constructed model is $\chi^2 = 50.0$.

Since the base model is known reliably over only a certain depth range in this example, it would be reasonable to include a weighting function in the model-deviation norm (3.3.23) which reflects this limitation. Figure 3.9(b) shows the smallest-deviatoric model constructed when a weighting function of $w(z) = 1$ for $0 < z < 2000$ m and $w(z) = 0.2$ for $z > 2000$ m was included. In this case the base model is reproduced almost exactly in the region that is strongly weighted. Outside this region, the data require additional structure in a manner similar to Fig. 3.9(a). The fits to the true data for the models given in Fig. 3.9(a) and (b) are shown to the right of each plot.

The smallest-deviatoric model formulation may also be used to perform an approximate appraisal of model features by an appropriate choice of base model and weighting function (e.g. Whittall & Oldenburg 1990).
As an example, consider appraising the region of high conductivity centred at about 8000 m depth indicated by the flattest $\log\sigma$ model shown in Fig. 3.4(a). Figure 3.10 shows the results of a weighted model-deviation norm appraisal of this conductive feature. In Fig. 3.10(a) the base model (dotted line) was taken to be identical to the flattest model solution at all depths except for the conductive zone $6000 < z < 10\,000$ m in depth. In this region the base model was assigned a conductivity of 0.04 S/m, a value significantly lower than that indicated by the flattest model. The weighting function was chosen to be $w(z) = 1$ for $6000 < z < 10\,000$ m and $w(z) = 0.2$ for all other depths. This ensures that the model-deviation norm is minimized most effectively over the conductive zone, which approximates minimizing the conductivity in this region. The solid line in Fig. 3.10(a) shows the constructed smallest-deviatoric model. This solution indicates that a model with an average conductivity over the apparent high conductivity region of only 0.046 S/m is not inconsistent with the data. However, the highly conductive structures at either edge of this zone indicate that the data still require some type of conductive feature near this region. Figure 3.10(b) shows a similar analysis which attempts to maximize the conductivity over the region 6000-10 000 m in depth. The constructed solution indicates that an average conductivity over the conductive zone as high as 0.13 S/m is consistent with the data. In order to achieve this, however, the constructed model exhibits regions of low conductivity at either edge of the conductive zone. The constructed models shown in Fig. 3.10(a) and (b) both have a misfit of $\chi^2 = 50.0$; the fits to the data are shown to the right of the model solutions.

Figure 3.10 Approximate appraisal using the weighted model-deviation norm. In (a) the base model (dotted line) has a conductivity of 0.04 S/m coinciding with the high conductivity zone of the true model (dashed line). The weighting function $w(z)$ was chosen to emphasize this region. The smallest-deviatoric model (solid line) indicates that an average conductivity of only 0.046 S/m is consistent with the data. A similar analysis is shown in (b) where the base model has a conductivity of 0.2 S/m over the conductive zone. The smallest-deviatoric model has an average conductivity of 0.13 S/m in this region. The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right. The misfit of the constructed model is $\chi^2 = 50.0$.

Of course, the constructed models shown in Fig. 3.10(a) and (b) do not represent a true minimization or maximization for the conductive region since the model-deviation norm still applies to the entire model. They do, however, demonstrate a method of approximate appraisal using model construction that can be used to explore the range of models which fit the data and to test hypotheses about the true model by attempting to construct counter-examples. Chapter 5 describes a method of constructing models using linear programming which may provide true global maxima and minima for localized conductivity averages.

Chapter 4

$l_1$ Model norm construction

4.1 Introduction

In Chapter 3 an iterative algorithm for inverting MT responses was presented based on $l_2$ norm solutions to the linearized problem. This chapter presents an inversion algorithm based on $l_1$ norm solutions, i.e. the objective function that is minimized at each iteration represents an $l_1$ model norm and the observed responses are fitted according to an $l_1$ misfit criterion. Although the $l_1$ misfit does not lend itself as readily to analytic results, it is more robust and less influenced by extreme responses than the $l_2$ norm (Claerbout & Muir 1973; Parker & McNutt 1980).
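The robustness of the $l_1$ misfit can be seen in the simplest possible fitting problem, estimating a single constant from repeated measurements: the $l_2$ solution is the mean and the $l_1$ solution is the median. The sketch below uses invented values and hypothetical function names purely for illustration; it is not taken from the MT examples in this thesis.

```python
import statistics

def l1_misfit(c, data):
    """Sum of absolute residuals for the constant estimate c."""
    return sum(abs(d - c) for d in data)

def l2_misfit(c, data):
    """Sum of squared residuals for the constant estimate c."""
    return sum((d - c) ** 2 for d in data)

# Invented measurements of a quantity near 1, plus one extreme value.
clean = [0.9, 0.95, 1.0, 1.05, 1.1]
noisy = clean + [50.0]

l2_fit = statistics.mean(noisy)    # the constant that minimizes l2_misfit
l1_fit = statistics.median(noisy)  # a constant that minimizes l1_misfit
```

The single extreme value pulls the mean far from the bulk of the measurements, whereas the median stays close to 1.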
In addition, minimizing an $l_1$ model norm leads to constructed models of significantly different character than those obtained in Chapter 3 using the $l_2$ norm. Whittall & Oldenburg (1986) and Whittall (1986) present 1-D MT inversion schemes which minimize $l_1$ norms of the impulse response or of the reflectivity coefficients of the model, respectively, in an attempt to limit the model structure. However, the conductivity models constructed in this manner will not minimize any structural measure themselves. The method of inverting MT responses by applying an $l_1$ norm to the model itself would seem to be new.

As in the previous chapter, an inversion algorithm is developed to construct both minimum-structure and smallest-deviatoric models. However, rather than minimizing a norm of the model gradient, the minimum-structure solution developed here minimizes the $l_1$ norm of the model variation. The total variation of a model $m(z)$ may be defined as (Korevaar 1968, p. 406)

$$V(m) = \int_0^\infty |dm(z)| \tag{4.1.1}$$

and represents a measure of the amount of structure of the model. The models constructed by minimizing this norm resemble layered earth models and are complementary to the continuous gradient models produced using $l_2$ norm inversion. The method of minimizing the variation norm was initially developed to constrain the structure of extremal models constructed in an appraisal analysis (see Chapter 5) and has been found to be a very practical and useful formulation (Dosso & Oldenburg 1989) that has been adopted by a number of authors (e.g. Whittall & MacKay 1989; Oldenburg & Ellis 1990). The smallest-deviatoric model construction is developed in a similar manner and an algorithm is presented which minimizes a combination of both the variation of the model with respect to depth, and the deviation from an arbitrary base model.

The MT inversion algorithm presented in this chapter is based on successive solutions of the linearized inverse problem.
The linearized problem is solved at each iteration using linear programming methods (e.g. Gass 1975). Linear programming (LP) can be used to solve for a set of parameters which minimize (or maximize) a linear objective function subject to equality or inequality constraints on linear combinations of the parameters. The LP formulation is very flexible: any physical information about the model that can be written as a linear constraint can be included and different choices of the objective function allow a variety of models to be constructed. The next section describes the general formulation and solution of the linear problem for both minimum-variation and smallest-deviatoric models.

4.2 Linear inversion

4.2.1 Minimum-variation model

To formulate model construction as a linear programming problem, the model must be discretized and the model elements treated as LP parameters. As before, let

$$m(z) = m_i, \quad z_{i-1} < z < z_i, \quad i = 1, \ldots, M. \tag{4.2.1}$$

In discrete form, the $l_1$ norm of the model variation (4.1.1) can be expressed as

$$V(m) = \sum_{i=1}^{M-1} |m_{i+1} - m_i|. \tag{4.2.2}$$

The total variation is a measure of the amount of structure of the model; the goal is to construct an acceptable model which minimizes this quantity. Unfortunately, due to the absolute value function, the expression given in (4.2.2) is not in a linear form that can be minimized using LP. However, a suitable objective function can be derived by introducing $2(M-1)$ new (non-negative) LP parameters $\{p_i, q_i,\ i = 1, \ldots, M-1\}$ which are constrained to be equal to the $M-1$ model changes according to

$$m_{i+1} - m_i = p_i - q_i, \quad p_i, q_i \ge 0, \quad i = 1, \ldots, M-1. \tag{4.2.3}$$

It follows that $|m_{i+1} - m_i| \le p_i + q_i$ (with equality holding if either $p_i$ or $q_i$ is zero), and therefore a bound for the total variation is given by

$$V(m) \le \sum_{i=1}^{M-1} (p_i + q_i). \tag{4.2.4}$$

It is straightforward to establish that minimizing an objective function given by

$$\Phi = \sum_{i=1}^{M-1} (p_i + q_i) \tag{4.2.5}$$

effectively minimizes the variation of the model as follows.
Assume that a set of parameters $\{m_i, p_i, q_i\}$ are found which minimize the objective function (4.2.5) subject to the constraints (4.2.3). Then one (or both) of $p_i$ or $q_i$ must be zero for each $i$ or $\Phi$ is not a minimum. In this case (4.2.4) becomes an equality and minimizing $\Phi$ as given by (4.2.5) is equivalent to minimizing the total variation. The objective function given by (4.2.5) is a linear combination of parameters in the form suitable for the LP formulation. It is also straightforward to include a set of arbitrary weights $\{w_i,\ i = 1, \ldots, M-1\}$ in the objective function (4.2.5) to influence how strongly the variation is minimized at various depths.

To construct an acceptable minimum-variation model, the minimization of the objective function is carried out subject to the constraints that the model reproduces the observed responses. With the discretization given by (4.2.1), the data equations (normalized by their standard deviations)

$$e_j = \int_0^\infty g_j(z)\, m(z)\, dz, \quad j = 1, \ldots, N, \tag{4.2.6}$$

can be written

$$e_j = \sum_{i=1}^{M} \gamma_{ji} m_i, \quad j = 1, \ldots, N, \tag{4.2.7}$$

where

$$\gamma_{ji} = \int_{z_{i-1}}^{z_i} g_j(z)\, dz, \quad i = 1, \ldots, M. \tag{4.2.8}$$

Equations (4.2.7) represent the data equations expressed as linear constraints. However, since the responses are generally inaccurate, provision for an acceptable misfit should be included in the constraint equations. This can be accomplished in two ways. In the first method, the data constraints are simply imposed as inequalities

$$e_j - a \le \sum_{i=1}^{M} \gamma_{ji} m_i \le e_j + a, \quad j = 1, \ldots, N, \tag{4.2.9}$$

where $a$ is a constant (usually 1 or 2) which determines how closely the responses are fit. Imposing the constraints in this manner requires that each response must be fit to within $a$ standard deviations; therefore, these are often referred to as 'hard' bounds. The second method constrains the $l_1$ norm of the misfit rather than the misfit of each equation individually.
Since this allows some responses to have large misfits while limiting the total misfit, it is often referred to as a 'soft' bound. To implement this method, the data constraints are written as equalities (Levy & Fullagar 1981)

$$e_j = \sum_{i=1}^{M} \gamma_{ji} m_i + u_j - v_j, \quad u_j, v_j \ge 0, \quad j = 1, \ldots, N, \tag{4.2.10}$$

where $u_j$ and $v_j$ are new LP parameters introduced to represent the misfit to the $j$th data equation. Let $\chi^1$ be the $l_1$ norm of the misfit, then

$$\chi^1 = \sum_{j=1}^{N} |u_j - v_j| \le \sum_{j=1}^{N} (u_j + v_j). \tag{4.2.11}$$

Parker & McNutt (1980) describe the statistics of the $\chi^1$ distribution. The expected value for $N$ responses is $\sqrt{2/\pi}\,N$; thus, an upper bound for the misfit could be expressed as

$$\sum_{j=1}^{N} (u_j + v_j) \le \sqrt{2/\pi}\,N. \tag{4.2.12}$$

Constraint (4.2.12) represents a bound on the $l_1$ misfit. The actual value of the misfit can be computed from the constructed model. In practice, we have always found that the computed misfit is equal to the applied bound. When the responses are assumed to have statistical uncertainties, it is generally preferable to constrain the total misfit in this manner rather than impose hard bounds on each response (Fullagar 1981; Oldenburg 1983).

The LP problem of constructing the $l_1$ minimum-variation model consists of minimizing the objective function (4.2.5) subject to the variation parameter constraints (4.2.3) and the data constraints expressed by (4.2.9) for hard bounds or by (4.2.10) and (4.2.12) for soft bounds. In addition, it is straightforward to include limits on the model elements or any additional information about the model which can be expressed as a linear constraint in the LP formulation.

The flexibility of LP to accommodate additional physical constraints or to minimize different objective functions allows considerable scope to investigate the inverse problem. Whittall (1986) describes two methods of applying localized conductivity constraints in a LP inversion.
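The soft-bound LP described above, i.e. minimizing (4.2.5) subject to (4.2.3), (4.2.10) and (4.2.12), can be sketched with a general-purpose solver. The sketch below uses `scipy.optimize.linprog` rather than the XMP library used in this thesis, and everything about the test problem is an assumption made for illustration: the function name `min_variation_lp`, the exponential kernels, the two-layer model, the 2 percent uncertainties and the partition are all hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def min_variation_lp(gamma, e, misfit_bound):
    """Minimize sum(p + q) subject to (4.2.3), (4.2.10) and (4.2.12)."""
    n_data, n_model = gamma.shape
    k = n_model - 1
    # Unknowns are ordered [m (M), p (M-1), q (M-1), u (N), v (N)].
    ip, iq = n_model, n_model + k
    iu, iv = n_model + 2 * k, n_model + 2 * k + n_data
    n_var = iv + n_data
    cost = np.zeros(n_var)
    cost[ip:iu] = 1.0                      # objective (4.2.5)
    a_eq = np.zeros((k + n_data, n_var))
    b_eq = np.zeros(k + n_data)
    for i in range(k):                     # m[i+1] - m[i] - p[i] + q[i] = 0
        a_eq[i, i], a_eq[i, i + 1] = -1.0, 1.0
        a_eq[i, ip + i], a_eq[i, iq + i] = -1.0, 1.0
    a_eq[k:, :n_model] = gamma             # sum_i gamma_ji m_i + u_j - v_j = e_j
    a_eq[k:, iu:iv] = np.eye(n_data)
    a_eq[k:, iv:] = -np.eye(n_data)
    b_eq[k:] = e
    a_ub = np.zeros((1, n_var))
    a_ub[0, iu:] = 1.0                     # sum_j (u_j + v_j) <= bound
    # Model elements are left free; p, q, u and v are non-negative.
    bounds = [(None, None)] * n_model + [(0, None)] * (n_var - n_model)
    return linprog(cost, A_ub=a_ub, b_ub=[misfit_bound],
                   A_eq=a_eq, b_eq=b_eq, bounds=bounds)

# Hypothetical test problem: 20 log-spaced cells and a two-layer model.
edges = np.logspace(1, 4, 21)
m_true = np.where(edges[1:] <= 1000.0, 1.0, 3.0)   # total variation = 2
decay = np.logspace(1.5, 4, 8)                     # assumed kernel decay lengths
# gamma_ji: integral over cell i of the assumed kernel exp(-z/d_j)/d_j.
gamma = (np.exp(-edges[:-1][None, :] / decay[:, None])
         - np.exp(-edges[1:][None, :] / decay[:, None]))
e = gamma @ m_true
sd = 0.02 * np.abs(e)                              # assumed 2 percent uncertainties
gamma, e = gamma / sd[:, None], e / sd             # normalize by standard deviations
bound = np.sqrt(2.0 / np.pi) * len(e)              # expected l1 misfit
res = min_variation_lp(gamma, e, bound)
m = res.x[:len(m_true)]
```

Because the true model satisfies every constraint with $u_j = v_j = 0$, the total variation of the constructed model cannot exceed that of the true model; the LP trades the allowed misfit for a reduction in structure.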
If reliable physical constraints are available, they can be used to restrict the non-uniqueness of the inverse problem and construct models which are closer to reality. Alternatively, arbitrary constraints may be used to assess the extent of the non-uniqueness and explore the range of acceptable models.

The use of minimum-structure models in MT inversion as well as the method of constructing $l_1$ minimum-variation models would seem to be new. In the limit of vanishing layer thicknesses it should give similar results to an $l_1$ flattest model which can be constructed by integrating the data equations by parts to obtain constraints in terms of the model gradient and minimizing the $l_1$ norm of the gradient using LP (e.g. Oldenburg 1984). The minimum-variation formulation, however, does not require integration of the data equations and does not require a known model endpoint value. Also, in this formulation the depth function $f(z)$ is essentially controlled by the choice of the partition width as a function of depth and need not be explicitly introduced. For MT inversion a logarithmic depth partitioning is the appropriate choice. The minimum-variation formulation described here would seem to be an $l_1$ analogue of Constable et al.'s (1987) Occam's inversion which minimizes the $l_2$ norm of the variation (which they refer to as the model roughness).

Formulating minimum-structure inversion in terms of an $l_1$ norm offers benefits in addition to the flexibility of the LP algorithm. As noted previously, since the $l_2$ norm of the model gradient or variation discriminates strongly against large or abrupt changes in the model, minimizing this norm generally produces smoothly varying models which represent structural changes by continuous gradients. It is important to recognize that this form is due to the inversion procedure: the true model may or may not involve such gradients and the observed responses do not demand them.
In contrast, minimizing the $l_1$ norm of the variation does not discriminate against abrupt changes, but rather produces a minimum-structure model which more closely resembles a layered Earth with structural variations occurring at distinct depths. Thus, the $l_1$ and $l_2$ inversions offer complementary representations of the Earth in terms of gradient or layered models; in practice, a complete interpretation should consider both.

4.2.2 Smallest-deviatoric model

It is straightforward to modify the minimum-variation formulation described in the previous section to construct the model which minimizes the $l_1$ norm of the deviation from a given base model $m_B$:

$$\|m - m_B\|_1 = \sum_{i=1}^{M} |m_i - m_{Bi}|. \tag{4.2.14}$$

The data constraints can be included in the LP formulation as described in Section 4.2.1; however, the variation parameters are replaced by $2M$ new LP parameters $\{r_i, t_i,\ i = 1, \ldots, M\}$ which are constrained to be equal to the $M$ deviations according to

$$m_i - m_{Bi} = r_i - t_i, \quad r_i, t_i \ge 0, \quad i = 1, \ldots, M, \tag{4.2.15}$$

and the objective function to be minimized is given by

$$\Phi = \sum_{i=1}^{M} (r_i + t_i). \tag{4.2.16}$$

This approach differs from the standard method of writing the data equations in terms of the model deviation $\Delta m$ and solving for the smallest acceptable deviation, as described in Section 3.2.3. The advantage of the new approach is that it is straightforward to formulate the model construction to minimize an objective function which combines both the variation and deviation according to

$$\Phi = \theta \sum_{i=1}^{M-1} w_i (p_i + q_i) + (1 - \theta) \sum_{i=1}^{M} \tilde{w}_i (r_i + t_i), \tag{4.2.17}$$

where $\theta$ is a parameter which determines the trade-off between minimizing the variation and the deviation and $w_i$ and $\tilde{w}_i$ represent arbitrary weighting functions. In the LP formulation there is no difficulty in setting some of the weights $w_i$ or $\tilde{w}_i$ to zero; this differs from the $l_2$ formulation where the weights must be non-zero.

4.3 The linearized inversion algorithm

This section describes an iterative inversion algorithm for the non-linear MT problem based on successive LP solutions to the corresponding linearized problem.
The method is similar in many respects to the $l_2$ inversion algorithm described in Section 3.3 and therefore will only be described briefly. At each iteration a model solution $m(z)$, representing either $\sigma(z)$ or $\log\sigma(z)$, is sought to the linearized equations (3.3.2). By introducing a depth partitioning (4.2.1) and specifying a starting model $m_0(z)$, the kernel functions may be computed and integrated and the LP problem posed for the minimum-variation or smallest-deviatoric model as described in Section 4.2. This problem is solved using the powerful and flexible LP algorithm XMP (Experimental Mathematical Programming library) developed by R. E. Marsten (1981).

The convergence criteria for the inversion algorithm are that the $l_1$ misfit of the model, $\chi^1$ (4.3.1), must be within a tolerance $t_{\chi^1}$ of the desired misfit $\chi^1_d$ and that the total change in the model between successive iterations, given by (3.3.5), must be less than $\epsilon_d$. In our algorithm $t_{\chi^1}$, $\chi^1_d$ and $\epsilon_d$ are parameters that are defined by the user. In practice, common values for these parameters are $\chi^1_d = \sqrt{2/\pi}\,2N$ (for complex responses at $N$ frequencies), $t_{\chi^1} = 0.1$ and $\epsilon_d = 0.01$.

In order to ensure that the linearization is valid at each iteration, it is important to control the change in the model at each iteration. To accomplish this, the target misfit value at the $k$th iteration, $\chi^1_{t,k}$, is taken to be some fraction of the misfit of the previous iteration unless this value is less than $\chi^1_d$:

$$\chi^1_{t,k} = \max\left(\chi^1_{k-1}/P,\ \chi^1_d\right), \tag{4.3.2}$$

where $P$ is usually taken to be between 2 and 5. In addition to choosing target misfits in this manner, the size of changes in the conductivity between successive iterations is controlled by imposing an LP constraint on the $i$th partition element at the $k$th iteration according to

$$\sigma_{i,k-1}/D \le \sigma_{i,k} \le D\,\sigma_{i,k-1}, \quad i = 1, \ldots, M, \tag{4.3.3}$$

where $D$ is usually between 2 and 10. Note that the constraints (4.3.3) are included in the LP formulation, not simply imposed when the model is updated as was the case in the $l_2$ inversion algorithm. This represents a significant improvement in stabilizing the inversion.

A useful feature of Marsten's (1981) LP package is that it allows the user to initialize the LP algorithm with an arbitrary basis. The standard procedure in many LP algorithms is to initiate all parameters at their lower bound. We have found that by initializing the LP algorithm with the solution basis from the previous iteration, the LP computation time can be reduced significantly. This is particularly true for the final few iterations which do not change the model greatly but are required to precisely achieve the desired misfit and ensure that the model change $\epsilon$ is acceptably small. The computation time for these iterations can be reduced by an order of magnitude.

4.4 Examples of $l_1$ model norm construction

4.4.1 Minimum-variation model construction

This section presents a number of examples of model construction by inverting both synthetic and measured field responses. The first example illustrates the convergence of the algorithm when the objective function (4.2.17) is minimized with $\theta = 1$ (minimum-variation model), $m(z) = \sigma(z)$ and uniform weighting $w_i = 1$ for the synthetic test case described in Section 3.4.1. The constructed models and predicted responses at each iteration are shown in Fig. 4.1 and the corresponding values of the misfit $\chi^1$, the total variation $\|m_{i+1} - m_i\|_1$ and the total model change $\epsilon$ are given in Table 4.1. Figure 4.1(a) shows the starting model which consists of a halfspace of conductivity 0.02 S/m; the true model is indicated by a dashed line. The two plots to the right compare the observed responses (squares with error bars) to the responses computed for the starting model (solid lines). The observed data are accurate, but an uncertainty of 2 percent in the real and imaginary parts of the response $R$ is assumed so that the $\chi^1$ statistic can be used to measure the relative fit of the models.
Figure 4.1(b)-(f) show the models produced at iterations 1, 2, 3, 4 and 6, respectively. Since the expected value of $\chi^1$ for $N = 25$ complex responses is $\sqrt{2/\pi}\,2N \approx 40$, this value was used as the desired misfit $\chi^1_d$ with a tolerance of $t_{\chi^1} = 0.1$. Also, a model change of $\epsilon < 0.01$ was required for convergence. At each iteration the target misfit was chosen according to (4.3.2) with $P = 3$ and the change in the conductivity of each model element was limited according to (4.3.3) with $D = 10$. Formulating each inversion step in this manner ensures that the linearization holds and that structure is introduced into the constructed models and into the predicted data in a controlled manner, as shown in Fig. 4.1(b)-(f). By iteration 4 the constructed model reproduces the data approximately correctly ($\chi^1 = 39.7$); however, two more iterations are required to precisely achieve the desired misfit and verify that the solution has stabilized. The final model, achieved in iteration 6 and shown in Fig. 4.1(f), has a misfit of $\chi^1 = 40.0$ and represents a model change of only $\epsilon = 8.82 \times 10^{-4}$ from the previous iteration.

Figure 4.1 The sequence of models produced in the inversion for the $l_1$ minimum-variation model with $m(z) = \sigma(z)$ and $w_i = 1$. (a), (b) and (c) show the starting model and the models constructed in iterations 1 and 2, respectively (solid line); the true model is also indicated (dashed line). The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right. The values of $\chi^1$, $\|m_{i+1} - m_i\|_1$ and $\epsilon$ for each iteration are summarized in Table 4.1.
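The iteration controls used in this example can be sketched as three small helper functions. The names and signatures are hypothetical, and two details are assumptions rather than statements from the text: the target is taken to be exactly $1/P$ times the previous misfit (floored at the desired misfit), and the tolerance on the misfit is treated as an absolute difference.

```python
def target_misfit(previous_misfit, desired_misfit, P=3.0):
    """Target for the next iteration: a fraction 1/P of the previous
    misfit, but never less than the desired misfit (assumed form)."""
    return max(previous_misfit / P, desired_misfit)

def conductivity_bounds(sigma_prev, D=10.0):
    """Lower and upper LP bounds on each conductivity element, limiting
    the change from the previous iteration to a factor of D."""
    return [(s / D, s * D) for s in sigma_prev]

def converged(misfit, desired_misfit, model_change, tol=0.1, eps_d=0.01):
    """Both criteria must hold: the misfit is within tol of the desired
    value (absolute difference assumed) and the model change is small."""
    return abs(misfit - desired_misfit) <= tol and model_change < eps_d

# Values quoted in the text for the synthetic example.
first_target = target_misfit(13100.0, 40.0)      # starting-model misfit
late_target = target_misfit(67.7, 40.0)          # floored at the desired misfit
bounds = conductivity_bounds([0.02])             # halfspace starting model
done_at_4 = converged(39.7, 40.0, 0.168)
done_at_6 = converged(40.0, 40.0, 0.000882)
```

With these assumptions the convergence test rejects iteration 4 and accepts iteration 6, in line with the description above.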
Figure 4.1 (cont'd) (d), (e) and (f) show the models constructed in iterations 3, 4 and 6, respectively (solid line); the true model is also indicated (dashed line). The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right.

Table 4.1 Summary of model attributes at each iteration for the inversion shown in Fig. 4.1.

Iteration    Misfit      Variation norm             Model change
number       $\chi^1$    $\|m_{i+1} - m_i\|_1$      $\epsilon$
0            13 100      0
1            396         0.0208                     0.530
2            175         0.0524                     0.744
3            67.7        0.113                      0.314
4            39.7        0.134                      0.168
5            40.0        0.133                      0.0513
6            40.0        0.133                      0.000882

Figure 4.2 Computation times on a SUN 4/310 workstation required for the LP solution at each iteration for the inversion shown in Fig. 4.1. The triangles indicate the CPU time required when the LP parameters are initiated at their lower bound; the squares indicate the time required when the LP algorithm is initiated with the solution basis from the previous iteration.

Fortunately, the computation time for the final iterations which refine the misfit value and ensure that the solution has stabilized can be significantly reduced by initiating the LP algorithm with the solution basis of the previous iteration. Figure 4.2 shows the CPU time required for the LP inversion at each iteration on a SUN 4/310 workstation when the LP algorithm is initiated with all parameters at their lower bound (triangles), and when the algorithm is initiated with the solution basis of the previous iteration (squares) from iteration 2 on.
When the LP inversion is initiated from the lower bounds, the CPU times vary somewhat but are generally about 80 s. When the inversions are initiated from the previous solution basis, the time required generally decreases with the iteration number, with the final two inversions requiring only about 7 s each. This represents a reduction by more than a factor of 10 over the times required when the inversions are initiated at their lower bounds (for larger problems the reduction can be even more substantial). The total time required by the LP algorithm is about 125 s when the inversions are initiated from the previous solutions and about 460 s when the inversions are initiated from the lower bounds. The model solutions at each iteration are identical regardless of how the LP algorithm is initiated.

Figure 4.3 shows the results of an inversion similar to that of Fig. 4.1 except that the total variation of $m(z) = \log \sigma(z)$ is minimized. The constructed models shown in Fig. 4.1(f) and Fig. 4.3(a) illustrate the very different characteristics of the solutions when they are compared to the corresponding $l_2$ solutions shown in Fig. 3.2(f) and Fig. 3.3(a). Unlike minimizing the $l_2$ structural norm, which discriminates strongly against large, abrupt changes in favour of continuous gradients, minimizing the $l_1$ total variation produces layered-type models with structural variations occurring at distinct depths. In each case the characteristics of the constructed models are a result of the choice of norm that is minimized; both solutions fit the data equally well. Thus, the $l_1$ and $l_2$ inversions offer complementary solutions and, in practice, a complete interpretation should consider both. In cases where a minimum-structure layered model is desired, the $l_1$ minimum-variation model would seem to be an excellent choice.
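The different behaviour of the two norms can be illustrated with a small numerical sketch. It assumes discrete stand-ins for the two measures of structure: the sum of absolute differences between adjacent model elements for the $l_1$ total variation, and the sum of squared differences for the $l_2$ structural norm. The function names and the two test models are hypothetical and are not taken from the inversions described here.

```python
def l1_variation(m):
    """Sum of absolute differences between adjacent model elements."""
    return sum(abs(b - a) for a, b in zip(m, m[1:]))


def l2_roughness(m):
    """Sum of squared differences between adjacent model elements."""
    return sum((b - a) ** 2 for a, b in zip(m, m[1:]))


n = 11
step = [0.0] * 5 + [1.0] * 6            # one abrupt jump of unit size
ramp = [i / (n - 1) for i in range(n)]  # the same total rise spread over 10 increments

# The l1 variation does not distinguish the step from the ramp,
# whereas the l2 measure penalizes the step ten times more heavily.
print("l1:", l1_variation(step), l1_variation(ramp))
print("l2:", l2_roughness(step), l2_roughness(ramp))
```

Because the $l_1$ measure assigns the same cost to a single jump as to a gradual change of the same total size, it does not discourage discontinuities, whereas the squared measure favours spreading the change over many elements.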
The method does not require a priori knowledge of the number or depths of the layers; the only requirement is that the depth partitioning be sufficiently fine that the solution is essentially independent of the discretization, as described in Section 3.3.2.

Figure 4.3 The $l_1$ minimum-variation model for $m(z) = \log \sigma(z)$ and $w_i = 1$. (a) shows the constructed model (solid line) and the true model (dashed line). (b) and (c) show the true data (squares with error bars) and the predicted responses (solid line). The misfit of the constructed model is $\chi^1 = 40.0$.

An advantage of the $l_1$ formulation is that there is no intrinsic limit to the high-frequency information content of the constructed models; in contrast, the $l_2$ formulation, which forms the model as a linear combination of the kernel functions, imposes a finite limit independent of the data (Claerbout & Muir 1973). This feature is crucial to the appraisal analysis using extremal models presented in the next chapter. As an example of this difference, consider the $l_1$ and $l_2$ minimum-structure models constructed by fitting the (accurate) responses as closely as possible. Figure 4.4(a) shows the best-fitting $l_2$ flattest model that can be constructed. This model is produced whether the inversion algorithm is initiated from a halfspace starting model or the true model. Although the model solution reproduces the observed responses very well ($\chi^2 = 0.242$, $\chi^1 = 2.67$, assuming 2 percent uncertainties in the responses) and is in good agreement with the true model, it is not a perfect solution. In particular, the constructed model does not exhibit the abrupt discontinuities of the layered model and tends to overshoot the true layer conductivities and oscillate slightly. Also, the misfit to the observed responses, while small, is well above the limit of computational accuracy.
This limit on the precision of the solution results because a layered true model simply cannot be reproduced exactly as a linear combination of a finite number of smooth kernel functions. The features of the constructed model are similar to the Gibbs effect observed in attempting to construct a step discontinuity from a finite number of Fourier components. In contrast, the best-fitting $l_1$ minimum-variation model, shown in Fig. 4.4(b), reproduces the true model almost exactly (given a model partition with depth elements at the discontinuities of the true model). The observed responses are apparently fit to within the limits of computational precision ($\chi^2 = 1.77 \times 10^{-9}$, $\chi^1 = 7.30 \times 10^{-5}$) since an absolute accuracy of $10^{-5}$ is required by the algorithm in integrating the kernel functions according to (4.2.8). The same model is produced whether the inversion algorithm is initiated from a halfspace starting model or the true model. The result that the construction reproduces the true responses accurately is not simply due to the fact that the true model is layered and minimizing the $l_1$ variation norm produces layered models. For instance, the $l_1$ algorithm performs equally well if the true model and data are taken to be a constructed $l_2$ flattest model and its responses.

Figure 4.4 Best-fitting constructed $l_2$ and $l_1$ minimum-structure models (accurate data) are shown in (a) and (b), respectively. The fit to the true data is shown in the panels on the right.

The weighting parameters included in the LP objective function (4.2.17) can be used to influence how strongly the minimization is applied to various regions of the model. For example, the constructed model shown in Fig. 4.3 indicates a layer of constant conductivity extending to a depth of about 600 m where the conductivity changes abruptly.
To investigate whether this constant-conductivity surface layer could extend to 800 m depth, weights $w_i$ can be chosen in the objective function (4.2.17) to discriminate strongly against conductivity changes in this region. Figure 4.5(a) shows the model constructed by minimizing the model variation ($\theta = 1$) with weights $w_i = 5$ for $0 < z < 800$ m and $w_i = 1$ below this depth. The model indicates that it is unlikely that a realistic model with an 800-m surface layer could be consistent with the responses. Even with the strong variation weighting, the conductivity changes at about 300 m depth. Also, a narrow high-conductivity zone or spike is required at 800 m depth; this might be considered geophysically unrealistic. It is apparent from the fit to the data displayed in Fig. 4.5(b) and (c) that the constructed model significantly misfits the short-period responses, which are sensitive to this shallow structure, and, to compensate, overfits the long-period responses. This indicates that the shallow structure is not in good agreement with the data.

In the examples presented so far, accurate responses have been inverted to produce the constructed models. Figure 4.6 shows constructed $l_1$ minimum-structure models and their corresponding fit to the data for three levels of error on the responses. In each inversion $m(z) = \sigma(z)$, $w(z) = 1$, and all solutions have a misfit of $\chi^1 = 40.0$. Although the errors degrade the solution in each case, by requiring a fit to the data appropriate to the inherent uncertainties according to the $\chi^1$ criterion, the errors do not introduce any false structure into the constructed models. The error-contaminated data sets inverted in Fig. 4.6 are the same as those considered in Fig. 3.6 for the $l_2$ minimum-structure inversion. In Fig. 4.6(a) each response has been contaminated by the addition of a random error drawn from a zero-mean Gaussian distribution with a standard deviation of 2 percent of the accurate response value.

Figure 4.5 The $l_1$ minimum-variation model constructed for $m(z) = \log \sigma(z)$ with a weighting function $w_i = 5$ for $0 < z < 800$ m and $w_i = 1$ for $z > 800$ m. (a) shows the constructed model (solid line); the true model is indicated by the dashed line. (b) and (c) show the true data (squares with error bars) and the predicted responses (solid line). The misfit of the constructed model is $\chi^1 = 40.0$.

Figure 4.6 The effect of data errors: $l_1$ minimum-variation models for $m(z) = \log \sigma(z)$ and $w_i = 1$ are shown in (a), (b) and (c) when the responses are contaminated by Gaussian errors of 2, 10 and 30 percent, respectively. The true model is also indicated (dashed line). The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right. The misfit of the constructed models is $\chi^1 = 40.0$.

The constructed model in Fig. 4.6(a) is similar (but not identical) to the model shown in Fig. 4.3(a), where a 2 percent uncertainty in the data was assumed but accurate responses were inverted. In Fig. 4.6(b) the standard deviation of the error distributions is 10 percent of the response value. The constructed model exhibits considerably less structure: the conductivities of the first and last layers are reproduced quite well and the conductivity increases near 600 and 6000 m; however, the low- and high-conductivity zones are apparently not required to fit the responses and are not resolved.
Figure 4.6(c) shows the results for 30 percent error: the responses are adequately fit by essentially a two-layer model with an increase in conductivity at about 600 m depth. Likewise, the predicted data exhibit no detailed features as only a simple structure is required to adequately fit the true data.

It is well known that the $l_1$ misfit norm is more robust and less influenced by extreme or outlying responses than the $l_2$ misfit norm (e.g. Claerbout & Muir 1973). The advantage of this property is demonstrated in Fig. 4.7, which shows $l_1$ and $l_2$ minimum-structure models constructed by inverting a data set contaminated with 2 percent Gaussian noise and with the apparent conductivity response at T = 0.187 Hz in error by four standard deviations (the data set is shown in the panels on the right). Figure 4.7(a) shows the $l_1$ minimum-structure model constructed with a misfit corresponding to the expected value, $\chi^1 = 40.0$. This model exhibits some additional small-scale structure, but is generally a good representation of the true model and is similar to the model shown in Fig. 4.6(a), which was constructed by inverting the same data set without the outlying response. Figure 4.7(b) shows the constructed $l_2$ minimum-structure model; the expected value for the misfit is $\chi^2 = 50$; however, the best fit that could be obtained for this data set was $\chi^2 = 158$. Even with this large misfit value the $l_2$ model shows significant false structure when compared with the true model or with the model constructed by inverting the data set without the outlier, shown in Fig. 3.6(a). The relative insensitivity of the $l_1$ misfit norm to a small number of outliers (or 'blunders', in the terminology of Claerbout & Muir 1973) in the data set can be a significant advantage in inverting field measurements when the actual uncertainties may be difficult to estimate accurately.

Figure 4.7 The effects of outliers in the data set. The responses are contaminated with 2 percent Gaussian noise and the apparent conductivity response at T = 0.187 Hz is in error by four standard deviations. (a) shows the $l_1$ minimum-variation model with a misfit of $\chi^1 = 40.0$. (b) shows the best-fitting $l_2$ flattest model with a misfit of $\chi^2 = 158$. The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right.

As an example of $l_1$ norm inversion of MT field measurements, consider the LITHOPROBE data set measured in southeastern British Columbia, described in Section 3.4.1. As described in that section, the uncertainties associated with the outlying responses in this data set appear to be under-estimated and consequently the best-fitting $l_2$ minimum-structure model, shown in Fig. 3.8(c), may contain unnecessary structure. This contention is supported by the best-fitting $l_1$ minimum-structure model shown in Fig. 4.8(a) by the solid line. The constructed model has a misfit of $\chi^1 = 88.0$; the expected value for 68 responses is 54.3. The fit to the observed responses is shown in Fig. 4.8(b) and (c). The $l_1$ minimum-variation model has the same general features as the $l_2$ minimum-structure models shown in Fig. 3.8 (the $l_2$ model shown in Fig. 3.8(b) is included as a dotted line for comparison), but does not exhibit the additional small-scale structure of the best-fitting $l_2$ model shown in Fig. 3.8(c). The simpler structure of the $l_1$ best-fitting solution is likely due to its relative insensitivity to the outliers compared to the $l_2$ solution of Fig. 3.8(c). Figure 4.8(a) also shows another example of the complementary model types (gradient or layered) which may be constructed using both the $l_2$ and $l_1$ minimum-structure solutions.

4.4.2 Smallest-deviatoric model construction

The LP objective function (4.2.17) may be used to represent the model variation, the deviation from a base model, or any (weighted) combination of the two.
As an example of the $l_1$ smallest-deviatoric model construction, consider the synthetic test case and assume that the true model is known to a depth of 2000 m: the base model is taken to consist of the true model to this depth, and below this depth the conductivity is held constant. This same base model was used to demonstrate the $l_2$ smallest-deviatoric model in Section 3.4.2 except that here the model is taken to be $m(z) = \log \sigma(z)$. Figure 4.9(a) shows the smallest-deviatoric model (solid line) constructed by minimizing the objective function (4.2.17) with $\theta = 0$ and weights $w_i = 1$. The true model is indicated by a dashed line and the base model by a dotted line (mostly obscured by the constructed model). The constructed model exactly reproduces the base model over the region 0-2000 m depth where the base model is known accurately, indicating that this structure is consistent with the observed responses.

Figure 4.8 Best-fitting $l_1$ minimum-variation model and MT responses observed in southeastern British Columbia, Canada. The minimum-variation model with $\chi^1 = 88.0$ is shown in (a) by the solid line (the dashed line indicates the $l_2$ flattest model from Fig. 3.8(b) for comparison). The true data (squares with error bars) and the predicted responses (solid lines) are shown in (b) and (c).

Figure 4.9 $l_1$ smallest-deviatoric and minimum combined-norm models. The base model is indicated by the dotted line, the true model by the dashed line. (a) shows the smallest-deviatoric model constructed for a uniform weighting. In (b) the combined objective function is minimized with a variation to deviation trade-off of $\theta = 0.8333$, a deviation weighting of unity and a variation weighting of 1 for $z > 2000$ m and 0 for $z < 2000$ m. In (c) the weights are the same, but the trade-off parameter is $\theta = 0.9375$. The misfit of all constructed models is $\chi^1 = 40.0$; the true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right.

Below 2000 m depth the constructed model also corresponds exactly to the base model except for three narrow zones of high conductivity (each one partition element wide) that are required in order to fit the data. The sparse, spiky structure in this region is characteristic of $l_1$ norm solutions when the model structure is not explicitly minimized; this is considered in detail in Chapter 5. The conductivity spikes at depth in Fig. 4.9(a) clearly indicate that the responses require high-conductivity structure in this region that is not included in the base model. However, if such rapidly varying structure is considered geophysically implausible, the variation may be reduced by minimizing an objective function that includes both the model deviation and the model variation. Figure 4.9(b) shows the model constructed by minimizing a combined objective function with $\theta = 0.8333$ (i.e. the variation to deviation trade-off is a ratio of 5 to 1). The deviation weights were taken to be unity; however, the variation weights were taken to be zero for model elements in the depth range 0-2000 m and unity below these depths so that the contributions to the objective function in the region where the base model is well known are due entirely to the model deviation. Again, the constructed model exactly corresponds to the base model in this region. In the region where the true model differs from the base model the variation has been reduced but several conductivity spikes remain.
Figure 4.9(c) shows a similar construction with $\theta = 0.9375$ (trade-off ratio of 15 to 1). In this case the conductivity spikes are replaced by three layers which decrease in conductivity with depth. The first layer represents the high-conductivity region of the true model; the decrease in conductivity towards the base model in the third layer results from the diminishing information content of the responses at great depth. All the constructed models shown in Fig. 4.9 have a misfit of $\chi^1 = 40.0$; the fit to the true responses is shown in the panels on the right.

The models shown in Fig. 4.9 illustrate some of the diversity of acceptable models that can be constructed. It is important to note that although the constructed model shown in Fig. 4.9(c) may be more appealing (since we know the true model) than that in Fig. 4.9(a), both models are equally acceptable according to their fit to the data. If we have some prior knowledge or insight into the form of the true model (e.g. layered or gradient models as opposed to conductivity spikes) this may be valuable additional information and we are certainly justified in seeking such models by an appropriate choice of model norm or additional constraints. However, it must be kept in mind that the constructed models then reflect our bias and that a variety of models may exist which fit the data equally well. If we have no prior knowledge or insight then it may be valuable to construct a number of different models in an attempt to explore the range of acceptable models and determine common features. The inversion algorithms presented in this chapter are an important component in any compilation of inversion techniques.
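The role of the trade-off parameter in Section 4.4.2 can be sketched numerically. The exact form of the objective function (4.2.17) is not reproduced in this section, so the sketch below assumes a combined objective of the form $\theta \times$ (weighted variation) $+ (1 - \theta) \times$ (weighted deviation from the base model), which is consistent with the quoted trade-off ratios of 5 to 1 and 15 to 1. The function names and the two small test models are hypothetical, and the data constraints that any acceptable model must also satisfy are ignored.

```python
def theta_from_ratio(ratio):
    """Trade-off parameter theta such that theta / (1 - theta) equals ratio."""
    return ratio / (1.0 + ratio)


def combined_objective(m, base, theta, w_var, w_dev):
    """Assumed form: theta * weighted variation + (1 - theta) * weighted deviation."""
    variation = sum(w * abs(b - a) for w, a, b in zip(w_var, m, m[1:]))
    deviation = sum(u * abs(mi - bi) for u, mi, bi in zip(w_dev, m, base))
    return theta * variation + (1.0 - theta) * deviation


base = [1.0, 1.0, 1.0, 1.0, 1.0]
spiky = [1.0, 1.0, 3.0, 1.0, 1.0]    # a single spike: small deviation, large variation
layered = [1.0, 1.0, 2.0, 2.0, 2.0]  # a single step: larger deviation, small variation
w_var = [1.0] * 4                    # one weight per pair of adjacent elements
w_dev = [1.0] * 5                    # one weight per element

for ratio in (0.0, 5.0, 15.0):
    theta = theta_from_ratio(ratio)
    print(round(theta, 4),
          round(combined_objective(spiky, base, theta, w_var, w_dev), 4),
          round(combined_objective(layered, base, theta, w_var, w_dev), 4))
```

With $\theta = 0$ the spiky model has the smaller objective value, while for $\theta = 0.8333$ and $\theta = 0.9375$ the layered model is preferred, in keeping with the progression from Fig. 4.9(a) to Fig. 4.9(c).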
Chapter 5

Appraisal using extremal models of bounded variation

5.1 Introduction

In linear inverse theory, the general relationship between the responses and the model is given by a Fredholm integral equation of the first kind

$$ e_j = \int_0^\infty g_j(z)\, m(z)\, dz, \qquad j = 1, \ldots, N. \tag{5.1.1} $$

A fundamental difficulty in inverse theory is that of non-uniqueness: if there exists one model which adequately reproduces the data via (5.1.1), then infinitely many such models exist. The problem of overcoming this non-uniqueness to determine useful information about the true model may be addressed in several ways (e.g. Oldenburg 1984). One approach is to construct acceptable models of a specific character. Chapters 3 and 4 addressed the problem of model construction for the non-linear MT inverse problem. A second approach is that of appraisal. As a result of the inherent non-uniqueness of the inverse problem, a finite set of observed responses cannot impose any bounds on the value of the model at a fixed point. At a given point, the model may attain any (possibly infinite) value and still satisfy (5.1.1). However, model averages over a finite width are constrained by the data, provided at least one kernel function is non-zero over a portion of this width. The goal of appraisal is to determine quantitative information about such model averages.

One approach to appraisal is given by Backus & Gilbert (1970). By taking appropriate linear combinations of the data equations, they generated unique averages of the model at a depth of interest $z_0$ of the form

$$ \langle m(z_0) \rangle = \int_0^\infty A(z_0, z)\, m(z)\, dz, \tag{5.1.2} $$

where

$$ A(z_0, z) = \sum_{j=1}^{N} a_j(z_0)\, g_j(z) \tag{5.1.3} $$

is known as the averaging function or resolving kernel. The model average $\langle m(z_0) \rangle$ is unique in the sense that the inner product of $A(z_0, z)$ with any acceptable model will produce this same value. The coefficients $a_j$ are chosen to make $A(z_0, z)$ localized and centred on $z_0$. Ideally, $A(z_0, z)$ should correspond to the Dirac delta function $\delta(z - z_0)$, for then $\langle m(z_0) \rangle = m(z_0)$ and the true model would be recovered uniquely.
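The sense in which the model average is unique while point values are not can be shown with a small discrete sketch. It assumes a toy problem with two kernel functions that are piecewise constant on four cells of unit width, so that the integrals in (5.1.1) and (5.1.2) reduce to dot products; the kernels, models and coefficients are hypothetical and chosen only for illustration.

```python
def dot(u, v):
    """Discrete inner product on cells of unit width."""
    return sum(a * b for a, b in zip(u, v))


# Two toy kernel functions g_1 and g_2, constant on each of four cells.
g = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0, 0.0]]

m_a = [1.0, 2.0, 3.0, 4.0]
annihilator = [1.0, -1.0, 1.0, 0.0]               # orthogonal to both kernels
m_b = [x + n for x, n in zip(m_a, annihilator)]   # a different model with identical data

data_a = [dot(gj, m_a) for gj in g]
data_b = [dot(gj, m_b) for gj in g]

# Averaging function built from the kernels; these coefficients give unit area.
a = [0.25, 0.25]
A = [a[0] * g[0][i] + a[1] * g[1][i] for i in range(4)]

avg_a = dot(A, m_a)
avg_b = dot(A, m_b)
print(data_a, data_b)   # both models reproduce the same data
print(avg_a, avg_b)     # and share the same average
print(m_a[0], m_b[0])   # although their point values differ
```

Both models yield the same average, equal to $\sum_j a_j e_j$, even though they differ on individual cells; the last cell, where both kernels vanish, is not constrained by the data at all.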
In practice, however, this can never be accomplished with a finite number of kernel functions, so the coefficients are chosen to make $A(z_0, z)$ as close as possible (in some sense) to $\delta(z - z_0)$. Several possible 'deltaness' criteria are proposed by Backus & Gilbert (1970). The shape of the averaging function may be used to quantify the resolving power of the data. If $A(z_0, z)$ is narrow, well centred on $z_0$ and without significant sidelobes or negative values, then the data have good resolution at this depth and $\langle m(z_0) \rangle$ is a meaningful estimate of $m(z_0)$. In practical cases where the responses are inaccurate, a trade-off exists between the resolution width of the averaging function and the variance of the model average. The interpreter must select an $A(z_0, z)$ and associated $\langle m(z_0) \rangle$ that represent the most meaningful compromise between resolution and accuracy.

In some cases this analysis produces excellent results. However, for some problems the averaging function $A(z_0, z)$ may have undesirable characteristics such as significant sidelobes or negative values, or it may not be centred at the depth of interest. In such cases the model average, although unique, is not readily interpreted. These difficulties arise from the fact that $A(z_0, z)$, formed from a linear combination of the kernel functions, is restricted to that subspace of the Hilbert space spanned by the kernel functions; in some cases it simply may not be possible to construct a suitable averaging function in this manner. Huestis (1987, 1988) presents a method for computing non-negative averaging functions; however, he demonstrates that for some problems such functions do not exist, and even in cases where they do exist, the advantage gained in their use may be offset by a greatly increased computational burden.

When the relationship between the model and the responses is functionally non-linear, Backus-Gilbert appraisal can be applied by linearizing the problem about some constructed model.
Unfortunately, in this case the unique averages computed pertain only to models that are linearly close to the initial model. Oldenburg (1979) constructed a number of different conductivity models which fit a set of MT data but were not linearly close to each other, and found different values for the model average by linearizing about these models. Parker (1983) and Oldenburg, Whittall & Parker (1984) have found linearized Backus-Gilbert appraisal to be inadequate for the non-linear MT problem.

To overcome these shortcomings, it is advantageous to seek quantitative information about model averages by formulating the appropriate inference problem. The mathematical foundation for inference theory has been presented by Backus (1970a, b, c; 1972); a pragmatic application has been presented by Oldenburg (1983) and will be briefly recounted here. In Oldenburg (1983) it was shown that upper and lower bounds for predicted linear functionals of the model could be computed using LP techniques. One of the most useful linear functionals is the integral of the model with a unimodular box-car $B$ of width $\Delta$ centred at the depth of interest $z_0$:

$$ B(z_0, \Delta, z) = \begin{cases} 1/\Delta, & \text{if } |z - z_0| \le \Delta/2; \\ 0, & \text{otherwise.} \end{cases} \tag{5.1.4} $$

The resultant inner product

$$ \bar{m}(z_0, \Delta) = \int_0^\infty B(z_0, \Delta, z)\, m(z)\, dz \tag{5.1.5} $$

represents an average of the model over a width $\Delta$ centred at $z_0$. Since $B(z_0, \Delta, z)$ cannot generally be formed as a linear combination of the kernel functions, $\bar{m}(z_0, \Delta)$ cannot be determined uniquely. However, lower and upper bounds

$$ \bar{m}^L(z_0, \Delta) \le \bar{m}(z_0, \Delta) \le \bar{m}^U(z_0, \Delta) \tag{5.1.6} $$

can be obtained by constructing models which minimize and maximize (5.1.5) subject to the data constraints using LP. The constructed models which extremize the model average are referred to as extremal models. Implementation of this procedure requires that the model be discretized; as before, let

$$ m(z) = m_i, \qquad z_{i-1} < z < z_i, \qquad i = 1, \ldots, M. \tag{5.1.7} $$

Linear programming methods can be used to minimize or maximize an objective function $\Phi = \sum_i w_i m_i$ subject to constraints on linear combinations of the model parameters $m_i$. The $\{w_i\}$ are a set of arbitrary weights which may be chosen according to

$$ w_i = \begin{cases} (z_i - z_{i-1})/\Delta, & \text{if } z_0 - \Delta/2 \le z_{i-1},\ z_i \le z_0 + \Delta/2; \\ 0, & \text{otherwise,} \end{cases} \tag{5.1.8} $$

so as to make the objective function represent a discretized form of the model average, i.e.

$$ \Phi = \sum_{i=1}^{M} w_i m_i = \bar{m}(z_0, \Delta). \tag{5.1.9} $$

Lower and upper bounds $\bar{m}^L$ and $\bar{m}^U$ for $\bar{m}(z_0, \Delta)$ are calculated by minimizing and maximizing $\Phi$ subject to the data constraints of (5.1.1), which may be incorporated in discretized form as either hard or soft bounds, as described in Section 4.2.1. An advantage of this approach to appraisal is that the bounds are calculated for exact box-car averages of the true model.

This method may be applied to non-linear problems such as the appraisal of MT responses by constructing the extremal models using an iterative linearized inversion algorithm formulated so that the LP objective function is applied to the model at each iteration. If this procedure leads to the global extremization of the objective function, then true bounds for the model average have been found. This is in contrast to linearized Backus-Gilbert appraisal, which is only valid for models which are linearly close to some constructed model. In general, it is difficult to verify that a global extremum has been found. However, the analysis in this chapter and in Chapter 6 indicates that in many cases no better extremum can be found; this gives confidence that meaningful bounds have been calculated.

An important advantage of the LP method is that any physical information about the model which can be formulated as a linear constraint can be included in the inversion. For instance, a priori lower and upper limits for the model elements

$$ m_i^- \le m_i \le m_i^+ \tag{5.1.10} $$

are easily included.
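The weights of (5.1.8) and the discretized average of (5.1.9) can be sketched as follows. This is a minimal illustration, assuming that the averaging window coincides with partition boundaries so that the weights sum to one; the function name `boxcar_weights`, the partition depths and the conductivity values are hypothetical.

```python
def boxcar_weights(edges, z0, delta):
    """Weights w_i of (5.1.8): (z_i - z_{i-1}) / delta for elements inside the window."""
    lo, hi = z0 - delta / 2.0, z0 + delta / 2.0
    weights = []
    for z_prev, z_next in zip(edges, edges[1:]):
        inside = lo <= z_prev and z_next <= hi
        weights.append((z_next - z_prev) / delta if inside else 0.0)
    return weights


# Hypothetical partition depths (m) and conductivities (S/m) on each element.
edges = [0.0, 100.0, 200.0, 400.0, 800.0, 1600.0]
model = [0.01, 0.02, 0.05, 0.03, 0.01]

# A window of width 800 m centred at 400 m depth covers the first four elements.
w = boxcar_weights(edges, z0=400.0, delta=800.0)
phi = sum(wi * mi for wi, mi in zip(w, model))
print(w)
print(phi)
```

In the appraisal itself the model is unknown and $\Phi$ serves as the LP objective function; it is evaluated here for a fixed model only to show that the weighted sum reproduces the box-car average.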
In order to obtain the most meaningful bounds for the model average $\bar{m}$, it is important to include as much additional physical information or insight into the character of the true model as possible.

For each value of $z_0$ the bounds $\bar{m}^L(z_0, \Delta)$ and $\bar{m}^U(z_0, \Delta)$ may be calculated for a number of different averaging widths $\Delta$, and plotted as a function of $\Delta$. Since the computed bounds tend to converge as the averaging width increases, such a plot is referred to as a 'funnel function' diagram (Oldenburg 1983). Funnel function diagrams provide immediate insight into the resolving power of the data at the depth of interest $z_0$. The only loss of generality in this formulation is that caused by the partitioning and parametrization. This is not of practical significance, however, provided that the partition quantization is sufficiently small. Lang (1985) demonstrates that the exact problem of computing linear functionals of the model can be approximated to arbitrary accuracy by a discretized problem given a small enough partition interval.

Oldenburg (1983) illustrated the method of appraisal using extremal models for a simple numerical example and applied the method to the (non-linear) MT inverse problem. Unfortunately, it is found that the extremal models constructed by this analysis often exhibit unacceptably large oscillations. When model limits are large or absent, the extremal models are characteristically sparse and spiky, consisting of isolated pulses of high conductivity embedded in an insulating halfspace. If confining model limits are imposed, the extremal models characteristically consist of a sequence of sections which alternate between the limits, in some cases fluctuating rapidly. In many practical applications we are not willing to accept such models, even if they are consistent with the observed data. Although these models represent mathematically acceptable solutions, they are generally not geophysically realistic.
As a consequence, since the funnel function bounds are obtained from these extremal models, it is likely that bounds found using this method are unduly pessimistic. It is anticipated that more meaningful bounds could be calculated if these highly variable models are purposely winnowed from the analysis. In this chapter the total variation is used as a measure of the amount of structure of a model, and highly oscillatory models are discriminated against by placing an upper bound on the variation of the extremal models. As a consequence of restricting the model solution space in this manner, the difference between the upper and lower bounds computed for $\bar{m}(z_0, \Delta)$ is often considerably reduced. Thus, the appraisal technique of Oldenburg (1983) is extended to include the variation of the extremal models as another dimension. The interpreter may make use of any knowledge or insight regarding the variation of the earth model to select reasonable extremal models and meaningful funnel function bounds.

In the next section, two methods of bounding the total variation of the constructed models are presented. In Section 5.3 the appraisal technique and the dependence of the computed bounds on the allowed variation are demonstrated for a simple linear example. The appraisal method can be applied to non-linear problems using an iterative linearized algorithm. Section 5.4 describes this algorithm and in Section 5.5 the method is applied to the non-linear MT problem by considering synthetic and field data cases. Much of the work in this chapter has been presented in Dosso & Oldenburg (1989).

5.2 Formulating the variation bound

The total variation of a model $m(z)$ was defined in Section 4.1.1 as

$$ V(m) = \int_0^\infty |dm(z)|. \tag{5.2.1} $$

In order to eliminate extremal models which are judged to have too much structure, the appraisal method described in Oldenburg (1983) is modified to include a constraint on $V(m)$.
By placing an upper bound on the variation, models which are sparse and spiky or oscillate repeatedly between the imposed limits can be discriminated against. Abrupt or discontinuous changes are still allowed, but the total number and magnitude of such changes can be limited to an amount deemed reasonable. The goal is to select a variation bound which results in models that are judged to be geophysically realistic and produce the most meaningful funnel function bounds. Two methods of bounding the total variation are presented.

5.2.1 Method 1

The first method is applicable to models that are assumed to be continuous, i.e. $m \in C^1$. In this case (5.2.1) can be written as

$$ V(m) = \int_0^\infty |m'(z)|\, dz, \tag{5.2.2} $$

where $m' = dm/dz$. In discrete form, where the model is assumed to have a constant gradient on each partition element, the variation can be written as

$$ V = \sum_{i=1}^{M} |m'_i|\, (z_i - z_{i-1}). \tag{5.2.3} $$

Linear programming methods generally assume that the variables are non-negative, but model derivatives which may be either positive or negative can be accommodated by writing each parameter $m'_i$ as the difference of two non-negative quantities: $m'_i = r_i - t_i$, where $r_i, t_i \ge 0$ are the variables to be determined by the LP algorithm. The absolute values $|m'_i|$ in (5.2.3) cannot be included in a linear constraint; however, they may be represented as $|m'_i| \le r_i + t_i$, with equality holding when either $r_i$ or $t_i$ is zero. The total variation $V$ must obey the inequality

$$ V \le \sum_{i=1}^{M} (r_i + t_i)(z_i - z_{i-1}) \tag{5.2.4} $$

and an upper bound $V_b$ for the variation may be specified by requiring

$$ \sum_{i=1}^{M} (r_i + t_i)(z_i - z_{i-1}) \le V_b. \tag{5.2.5} $$

Equation (5.2.5) is a linear constraint for the total variation in a form that can be included in the LP algorithm. To constrain the total variation according to (5.2.5), the LP objective function and constraints must be written in a form which involves only $r_i$ and $t_i$ as unknowns (with $m'_i = r_i - t_i$).
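The splitting of each derivative into two non-negative variables, and the resulting bound on the variation, can be checked with a short sketch. It assumes a hypothetical three-element partition with a constant gradient on each element; the function names and numerical values are introduced only for illustration.

```python
def split_derivative(dm):
    """Write dm = r - t with r, t >= 0 and at most one of them non-zero."""
    return (dm, 0.0) if dm >= 0.0 else (0.0, -dm)


def variation_bound(r, t, edges):
    """Right-hand side of (5.2.4): sum of (r_i + t_i) * (z_i - z_{i-1})."""
    widths = [zb - za for za, zb in zip(edges, edges[1:])]
    return sum((ri + ti) * h for ri, ti, h in zip(r, t, widths))


edges = [0.0, 1.0, 3.0, 6.0]   # hypothetical partition boundaries
gradients = [2.0, -1.0, 0.5]   # hypothetical constant gradient on each element

# Total variation according to (5.2.3).
true_variation = sum(abs(g) * (zb - za)
                     for g, za, zb in zip(gradients, edges, edges[1:]))

# A split with r_i or t_i equal to zero attains equality in (5.2.4).
pairs = [split_derivative(g) for g in gradients]
r_tight = [p[0] for p in pairs]
t_tight = [p[1] for p in pairs]
tight_bound = variation_bound(r_tight, t_tight, edges)

# Adding the same amount to r_i and t_i leaves r_i - t_i unchanged
# but makes the inequality strict.
r_slack = [ri + 1.0 for ri in r_tight]
t_slack = [ti + 1.0 for ti in t_tight]
slack_bound = variation_bound(r_slack, t_slack, edges)

print(true_variation, tight_bound, slack_bound)
```

Since the sum in (5.2.4) is never smaller than the true variation, constraining that sum by $V_b$ as in (5.2.5) guarantees that the variation of the constructed model does not exceed $V_b$, whichever split the LP algorithm selects.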
If $m_0 = m(z = 0)$ is assumed known, then the value of the model on the $i$th partition element is given by

$$m_i = m_0 + \sum_{k=1}^{i-1} (z_k - z_{k-1})(r_k - t_k) + \tfrac{1}{2}(z_i - z_{i-1})(r_i - t_i). \qquad (5.2.6)$$

The objective function (5.1.19) is likewise written in terms of $r_i$ and $t_i$ and is denoted $\Phi$ in (5.2.7). To put the data constraints into a compatible form, integrate (5.1.1) by parts to obtain

$$f_j = \int_0^\infty \left[h_j(z) - h_j(\infty)\right] m'(z)\,dz, \qquad (5.2.8)$$

where $f_j$ and $h_j(z)$ are new data and kernels given by

$$f_j = m_0 h_j(\infty) - e_j, \qquad (5.2.9a)$$

$$h_j(z) = \int_0^z g_j(u)\,du. \qquad (5.2.9b)$$

Discretization yields

$$f_j = \sum_{i=1}^{M} \gamma_{ij}\,(r_i - t_i), \quad j = 1, \ldots, N, \qquad (5.2.10a)$$

$$\gamma_{ij} = \int_{z_{i-1}}^{z_i} \left[h_j(z) - h_j(\infty)\right] dz, \quad i = 1, \ldots, M. \qquad (5.2.10b)$$

As a final constraint, limits for individual model elements, $m_i^- \le m_i \le m_i^+$, may be included as

$$m_i^- \le m_0 + \sum_{k=1}^{i-1} (z_k - z_{k-1})(r_k - t_k) + \tfrac{1}{2}(z_i - z_{i-1})(r_i - t_i) \le m_i^+, \quad i = 1, \ldots, M. \qquad (5.2.11)$$

The LP problem of computing bounds for $\bar{m}(z_0, \Delta)$ consists of extremizing the objective function $\Phi$ given by (5.2.7) subject to the data constraints of (5.2.10), the model limits of (5.2.11) and the variation bound of (5.2.5). The extremal model may be computed according to (5.2.6).

5.2.2 Method 2

The second method does not require the model to be a continuous function; rather, the model is represented by a constant value on each partition element. In this case the total variation of the model can be characterized as

$$V = \sum_{i=1}^{M-1} |m_{i+1} - m_i|. \qquad (5.2.12)$$

Instead of formulating a linear programming problem in which the objective function, data constraints, model limits and variation bound are written in terms of the model derivative $m'_i = r_i - t_i$, the formulation in terms of the model elements $m_i$ is retained, but $2(M-1)$ new (non-negative) LP parameters $\{p_i, q_i\}$ are introduced and constrained to represent the model changes:

$$p_i - q_i = m_{i+1} - m_i, \quad p_i, q_i \ge 0, \quad i = 1, \ldots, M-1. \qquad (5.2.13)$$

It follows that $|m_{i+1} - m_i| \le p_i + q_i$ and therefore the total variation can be bounded by constraining

$$\sum_{i=1}^{M-1} (p_i + q_i) \le V_b. \qquad (5.2.14)$$

This representation of the total variation of the model is identical with that given in Section 4.2.1.
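The Method 2 bookkeeping can be illustrated with a minimal Python sketch. It is not the thesis code: the function names and the sample model are hypothetical, and the split shown is the tightest one (one of $p_i$, $q_i$ is zero), for which $\sum (p_i + q_i)$ equals the total variation. Inside an LP the solver is free to choose looser splits, so the sum is in general only an upper bound on the variation.

```python
def total_variation(m):
    """Total variation of a piecewise-constant model: sum of |m[i+1] - m[i]|."""
    return sum(abs(b - a) for a, b in zip(m, m[1:]))


def split_changes(m):
    """Write each change m[i+1] - m[i] as p - q with p, q >= 0.

    This is the tightest split (p or q is zero), so p + q = |m[i+1] - m[i]|.
    """
    p = [max(b - a, 0.0) for a, b in zip(m, m[1:])]
    q = [max(a - b, 0.0) for a, b in zip(m, m[1:])]
    return p, q


def satisfies_variation_bound(p, q, v_b):
    """Check the linear variation constraint sum(p + q) <= v_b."""
    return sum(p) + sum(q) <= v_b


# Hypothetical oscillating model on five partition elements.
model = [0.5, 2.0, 0.5, 2.0, 1.0]
p, q = split_changes(model)
```

For this sample model the variation is 5.5, so a bound of $V_b = 2.0$ would exclude it while $V_b = 6.0$ would admit it.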
However, in the application presented there the model variation was taken as the objective function and minimized by the LP algorithm in order to construct minimum-structure models. Here the objective function represents the model average $\bar{m}(z_0, \Delta)$ as given by (5.1.8) and (5.1.9). This objective function is minimized and maximized to obtain lower and upper bounds for $\bar{m}(z_0, \Delta)$. The variation is bounded according to (5.2.14) in order to ensure that the extremal models constructed in the minimization or maximization are geophysically reasonable.

For the work presented here, both methods of bounding the variation have been programmed and give essentially the same results. The advantage of the first method is that fewer variables and constraints are required in the LP algorithm. In Method 1, $2M$ variables are required to represent the model derivative elements and the variation bound is specified as a single constraint, whereas in Method 2, $3M - 2$ variables are required ($M$ model elements and $2(M-1)$ variables to represent the model changes) and a total of $M$ constraints are required to specify the variation bound. However, the sparsity of the constraint matrix is destroyed when limits for the model elements are expressed in terms of the model derivative according to (5.2.11). In practice, this can be a significant disadvantage since many LP algorithms are designed for large, sparse constraint matrices. Although the second method requires more variables and constraints, the constraints are sparse and we have found the second method to be both significantly faster and more stable computationally for large extremization problems. In addition, since the second method is formulated in terms of the model rather than its derivative, integration of the data equations and recovery of the model from the derivative solution are not required. Also, there is no need to specify a model value at an endpoint.
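The variable and constraint counts quoted above can be tabulated with a short Python sketch. The function names are hypothetical, and only the variables and the constraints needed for the variation bound are counted, as in the text; data constraints and model limits are not included.

```python
def method1_size(m):
    """Method 1: r_i and t_i for each of m elements; one variation constraint."""
    variables = 2 * m
    variation_constraints = 1
    return variables, variation_constraints


def method2_size(m):
    """Method 2: m model elements plus p_i, q_i for each of the m - 1 changes.

    The m - 1 equalities defining p_i - q_i and the single bound on their sum
    give m constraints in total.
    """
    variables = m + 2 * (m - 1)
    variation_constraints = (m - 1) + 1
    return variables, variation_constraints
```

For a partition of 100 elements this gives 200 variables and 1 constraint for Method 1, against 298 variables and 100 constraints for Method 2.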
For these reasons the second method is the recommended formulation and the numerical examples presented in this thesis are computed using Method 2. In either method, however, since the variation bound is specified as an inequality constraint, it may be that the extremal model does not achieve a total variation of $V_b$. The actual variation $V$ of the constructed model can be evaluated directly. In practice it is generally found that $V = V_b$ provided the variation bound is less than the variation of the unbounded extremal models.

5.3 Linear appraisal example

To illustrate the appraisal technique and demonstrate the improvement in resolution when the variation is bounded, a simple linear example presented in Oldenburg (1983) is re-examined. Let the model be defined on the interval [0,1] as

$$m(z) = 1 - \tfrac{1}{2}\cos(2\pi z) \qquad (5.3.1)$$

and the responses be obtained from the equations

$$e_j = \int_0^1 e^{-(j-1)z}\, m(z)\,dz, \quad j = 1, \ldots, N. \qquad (5.3.2)$$

A total of 11 accurate data were generated, and these are used to infer information about the value of the true model for a depth $z_0 = 0.5$ where the model attains its maximum value of 1.5. Figure 5.1(a) shows upper and lower bounds calculated when no limits (except a non-negativity constraint) are placed on the model elements, and no bound is placed on the total variation. Averages of the true model are indicated by the dashed line. The wide bounds indicate that the resolving power of the data is poor. For instance, for an averaging width of $\Delta = 0.2$, the model average is known to lie only within the bounds $0 \le \bar{m}(z_0, \Delta) \le 4.16$ while the true model lies in the range $1.47 < m(z) < 1.50$. Only for $\Delta > 0.5$ is $\bar{m}^L > 0$, so without additional physical information, a region of non-zero amplitude near $z_0 = 0.5$ is not a required feature of the model. Figure 5.1(b) and (c) show constructed extremal models which minimize and maximize the model average $\bar{m}(z_0, \Delta)$ for an averaging width of $\Delta = 0.2$. The true model is indicated by the dashed line.
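The quantities of this linear example are easy to reproduce numerically. The Python sketch below is illustrative only: it assumes the model average is a box-car average of width $\Delta$ centred on $z_0$, uses a simple midpoint rule, and all function names are hypothetical.

```python
import math


def true_model(z):
    """True model of the linear example, m(z) = 1 - 0.5 cos(2 pi z)."""
    return 1.0 - 0.5 * math.cos(2.0 * math.pi * z)


def integrate(f, a, b, n=2000):
    """Composite midpoint rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def response(j):
    """Datum e_j: integral of exp(-(j - 1) z) m(z) over [0, 1]."""
    return integrate(lambda z: math.exp(-(j - 1) * z) * true_model(z), 0.0, 1.0)


def boxcar_average(z0, width):
    """Average of the true model over a box-car of the given width about z0 (assumed form)."""
    return integrate(true_model, z0 - 0.5 * width, z0 + 0.5 * width) / width


def sampled_total_variation(n=2000):
    """Total variation of the true model estimated from n + 1 equally spaced samples."""
    samples = [true_model(k / n) for k in range(n + 1)]
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))


data = [response(j) for j in range(1, 12)]  # the 11 accurate data
```

Under these assumptions the box-car average for $z_0 = 0.5$, $\Delta = 0.2$ evaluates to about 1.47 and the total variation of the true model to 2.0, in agreement with the values quoted in this section.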
The constructed models consist of a sequence of regions of zero amplitude with two or three isolated zones of large amplitude, each one partition element in width. This structure is characteristic of all extremal models which produced the funnel function bounds of Fig. 5.1(a).

Figure 5.1 Lower and upper bounds for $\bar{m}(z_0 = 0.5, \Delta)$ are shown in (a), (d) and (g). In (a) only non-negativity was required, in (d) model limits $0.5 \le m_i \le 2.0$ were imposed, and in (g) a variation bound of $V_b = 2.0$ was also included. This variation bound corresponds to the actual variation of the true model. The true model averages are indicated by the dashed line. The two plots to the right of each funnel function diagram show the constructed extremal models which minimize and maximize $\bar{m}(z_0 = 0.5, \Delta = 0.2)$. In these plots the true model is indicated by the dashed line.

The constructed models for the unconstrained extremizations exhibit narrow zones or spikes with amplitudes of up to 40 or more. These values differ dramatically from the true model. If reasonable limits for the model amplitude are known, the computed bounds can be greatly improved. Figure 5.1(d) shows the bounds calculated after requiring that $0.5 \le m_i \le 2.0$. The significant improvement in resolution for all averaging widths is apparent when Fig. 5.1(a) and (d) are compared. The funnel functions also show the minimum resolution width required before the measured responses influence the computed bounds. For instance, only for $\Delta > 0.28$ is $\bar{m}^L > 0.5$, the imposed lower limit, and $\bar{m}^U < 2.0$, the imposed upper limit. Figure 5.1(e) and (f) show constructed extremal models which minimize and maximize $\bar{m}(z_0, \Delta = 0.2)$. These models consist predominantly of a sequence of sections which alternate between the imposed limits. Only a few model elements do not achieve either $m^- = 0.5$ or $m^+ = 2.0$. In some cases the extremal models fluctuate rapidly between the imposed limits. An example of this is given in Fig.
5.2 which shows the constructed model which minimizes $\bar{m}(z_0, \Delta = 1.0)$. The bimodal form of these extremal models is similar to that of Parker's (1974, 1975) ideal bodies. The ideal body $m_I(z)$ is that model which is everywhere equal to either zero or $M_0$, where $M_0$ represents the greatest lower bound on the largest value of $m$ (i.e. the smallest supremum of $m$). The model $m_I(z)$ is unique in that it is the only acceptable model which nowhere exceeds $M_0$. In the limit of $m^- \to 0$ and $m^+ \to M_0$, the LP extremal models will be equal to $m_I(z)$ regardless of the values of $z_0$ or $\Delta$. However, it appears that the extremal models retain this bimodal form for a wide range of values of $m^-$ and $m^+$, provided the discretization interval is sufficiently small.

Models such as those shown in Figs 5.1(b), (c), (e), (f) and 5.2 might not be considered geophysically realistic, and hence the computed bounds may be unduly pessimistic. Figure 5.1(g) shows the results of employing a variation bound $V_b = 2.0$, which represents the actual variation of the true model. A significant improvement in the resolution is apparent when Fig. 5.1(d) and (g) are compared. In this case there appears to be no minimum resolution width before the data influence the computed bounds. For instance, for an averaging width of $\Delta = 0.28$, the computed bounds are $0.86 \le \bar{m} \le 1.72$, while in Fig. 5.1(d), the computed bounds simply reflect the imposed limits $0.5 \le \bar{m} \le 2.0$. Figure 5.1(h) and (i) show constructed models which minimize and maximize $\bar{m}(z_0, \Delta = 0.2)$ for the variation bound $V \le 2.0$. These models do not exhibit excessive oscillations and might be considered to be more geophysically realistic.

Figure 5.2 The constructed model which minimizes $\bar{m}(z_0 = 0.5, \Delta = 1.0)$ with imposed model limits $0.5 \le m_i \le 2.0$. The true model is indicated by the dashed line.

Figure 5.3 The percent improvement, $P$, for $V_1 = \infty$ (no variation bound) and $V_2 = 2.0$, with model limits $0.5 \le m_i \le 2.0$.
To quantify the improvement in the bounds that results when the allowed variation is changed from $V_1$ to $V_2$, the 'percent improvement', $P$, is defined

$$P(z_0, V_1, V_2, \Delta) = \left[1 - \frac{\bar{m}^U_{V_2}(z_0, \Delta) - \bar{m}^L_{V_2}(z_0, \Delta)}{\bar{m}^U_{V_1}(z_0, \Delta) - \bar{m}^L_{V_1}(z_0, \Delta)}\right] \times 100\%. \qquad (5.3.3)$$

In (5.3.3) the subscripts $V_1$ and $V_2$ indicate the total variation allowed in the extremal models. The results for $V_1 = \infty$ (no variation bound) and $V_2 = 2.0$ are shown in Fig. 5.3. For most averaging widths the funnel function bounds are improved by 30-40 percent.

By reformulating the appraisal method to bound the total variation of the model, the analysis has been extended to include the variation as another dimension. Upper and lower bounds may now be considered as a function of both the averaging width and the model variation. Figure 5.4 shows the computed bounds as a function of the allowed variation for a fixed averaging width $\Delta = 0.2$. No limits (except non-negativity) were placed on the model elements. The true model has a variation of $V = 2.0$ and an average value of $\bar{m}(z_0, \Delta = 0.2) = 1.47$; this point is indicated by a cross in Fig. 5.4. For large allowed variations the bounds are wide and the model average is poorly constrained. For instance, for a variation of $V = 14.0$, the model average is only known to lie within the bounds $0 \le \bar{m}(z_0, \Delta) \le 4.15$. As the allowed variation is decreased, the bounds converge smoothly. The upper bound decreases monotonically as the allowed variation is decreased from $V = 14.0$; however, it is not until the variation is less than about $V = 5.0$ that the lower bound increases from zero. At the true model variation of $V = 2.0$, the model average is known to lie within the bounds $0.73 \le \bar{m}(z_0, \Delta) \le 1.75$. Reducing the allowed variation to a value less than 2.0 excludes the true model from the LP solution space and may result in computed bounds which do not contain the true model average.
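The percent improvement of (5.3.3) is a one-line computation once the two pairs of bounds are available. The following Python sketch uses a hypothetical function name; the sample call takes the bounds quoted earlier in this section for an averaging width of 0.28 (0.86 and 1.72 with the variation bound, and the imposed limits 0.5 and 2.0 without it).

```python
def percent_improvement(lower_v2, upper_v2, lower_v1, upper_v1):
    """Percent reduction in bound width when the allowed variation changes from V1 to V2."""
    width_v1 = upper_v1 - lower_v1
    width_v2 = upper_v2 - lower_v2
    if width_v1 <= 0.0:
        raise ValueError("reference bound width must be positive")
    return (1.0 - width_v2 / width_v1) * 100.0


# Bounds for an averaging width of 0.28: V2 = 2.0 versus no variation bound.
p_028 = percent_improvement(0.86, 1.72, 0.5, 2.0)
```

With these inputs the bound width shrinks from 1.50 to 0.86, an improvement of roughly 43 percent.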
Figure 5.4 Lower and upper bounds for $\bar{m}(z_0 = 0.5, \Delta = 0.2)$ as a function of the allowed total variation $V$ for the linear example with $z_0 = 0.5$, $\Delta = 0.2$. The true model average is indicated by the cross.

Figure 5.5 The $l_1$ minimum-variation model for the linear example. The model given by the solid line is constructed by minimizing the total variation. An identical model is constructed by minimizing or maximizing the model average $\bar{m}$ with a variation bound $V_b = 0.75$. The value $V = 0.75$ corresponds to the minimum model variation that is consistent with the model fitting the data. The true model is indicated by the dashed line.

This point is also illustrated in Fig. 5.4. As $V$ is decreased below 2.0, the bounds continue to converge; for variations $V < 1.5$ the bounds no longer contain the true model average. The upper and lower bounds meet at a variation of $V = 0.75$. For this variation the model average is known precisely since $\bar{m}^L = 1.15 \le \bar{m} \le 1.15 = \bar{m}^U$, but this value does not correspond to the true model average of $\bar{m} = 1.47$. This demonstrates that although it is important to use the best possible value for the variation bound, over-constraining the variation (or any other physical property) can cause misleading results.

The point at which the bounds meet represents the smallest possible variation which still permits an acceptable model. Any attempt to reduce the allowed variation below this value results in an inconsistency between the variation bound and the data constraints. The extremal model which achieves this minimum variation corresponds to the minimum-variation model described in Section 4.2.1. The minimum-variation model and the extremal models computed by minimizing and maximizing $\bar{m}$ for $V_b = 0.75$ are identical and are shown in Fig. 5.5.

The funnel function resolution depends on both the averaging width $\Delta$ and the allowed variation $V$. This dependence is illustrated in Fig.
5.6 in which contours of the normalized bound width $2(\bar{m}^U - \bar{m}^L)/(\bar{m}^U + \bar{m}^L)$ are plotted. The bound width increases with increasing variation and decreases with increasing averaging widths. The best resolution occurs for large averaging widths and small allowed variations. In practice, a plot like Fig. 5.6 together with an examination of the extremal models constructed for various variation bounds should enable an interpreter to determine meaningful bounds on the average value of the model over the region of interest.

Figure 5.6 Contours of the normalized bound width $2(\bar{m}^U - \bar{m}^L)/(\bar{m}^U + \bar{m}^L)$ as a function of averaging width $\Delta$ and variation $V$ for $z_0 = 0.5$.

5.4 The linearized appraisal algorithm

The method of appraisal using extremal models of bounded variation can be applied to non-linear inverse problems such as MT by constructing the extremal models in an iterative linearized algorithm formulated so that the LP objective function is applied to the model at each iteration. When the variation bound is formulated according to Method 2, Section 5.2.2, the numerical implementation of this procedure is similar in many regards to the minimum-variation model construction algorithm described in Sections 4.2 and 4.3, and will be described only briefly here.

The major difference between the construction of minimum-variation models and extremal models of bounded variation is the objective function which is extremized. In the minimum-variation algorithm, the variation of the model is minimized at each iteration. To construct extremal models of bounded variation, the box-car model average $\bar{m}(z_0, \Delta)$ given by (5.1.8) and (5.1.9) is minimized or maximized at each iteration while the model variation is constrained according to (5.2.14). The data constraints are imposed as described in Section 4.2, and model element limits (5.1.10) are also included.
To ensure that the linearization inherent in the data equations remains valid, the target $\chi^2$ misfit is reduced at each iteration as per (4.2.2) and the size of changes in the model between successive iterations is controlled as per (4.2.3). The $\chi^2$ misfit must be reduced to within a tolerance $t_{\chi^2}$ of the desired misfit $\chi_d^2$ for convergence. Since the objective function applies to only a limited region of the model, there may be a variety of acceptable models with the same value of $\bar{m}(z_0, \Delta)$ but which differ in structural detail outside the region of extremization. It is found that the algorithm sometimes cycles between such models. Since it is the objective function value, not the extremal model per se, that is of interest here, there is no requirement that the total model change e given by (3.3.5) be reduced to some specified limit. Rather, in addition to the model fitting the responses, it is required that the value of the objective function changes by less than a factor $f$ over three consecutive iterations for convergence. In our algorithm $\chi_d^2$, $t_{\chi^2}$ and $f$ are parameters that are defined by the user. In practice, common values for these parameters are $\chi_d^2 = 2N$ (the expected value for complex responses at $N$ frequencies), $t_{\chi^2} = 0.1$ and $f = 0.01$. The algorithm is usually initiated from a halfspace starting model and converges in about six to ten iterations. The LP package of Marsten (1981) is employed and the solution at one iteration is used as the starting basis for the subsequent iteration; this greatly reduces the computation time of the latter iterations.

5.5 Appraisal of MT responses

5.5.1 Synthetic MT example

This section presents a number of examples of the appraisal of MT responses using extremal models of bounded variation. The appraisal method is first demonstrated using the synthetic test case of Whittall & Oldenburg (1990) described in Section 3.3.1; in Section 5.5.2 the analysis is applied to a set of measured field responses.
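The two-part convergence test of Section 5.4 can be sketched in Python as follows. This is one possible reading rather than the thesis implementation: it assumes the misfit tolerance is an absolute difference from the desired misfit, and that "changes by less than a factor $f$ over three consecutive iterations" means the relative spread of the last three objective values is below $f$. The function name and argument names are hypothetical.

```python
def has_converged(misfits, objectives, target_misfit, misfit_tol=0.1, f=0.01):
    """Return True when the latest model fits the data and the objective is stable.

    misfits    -- chi-squared misfit at each iteration so far
    objectives -- objective function (model average) at each iteration so far
    """
    if len(objectives) < 3 or not misfits:
        return False
    # Assumed reading: absolute difference from the desired misfit.
    fits_data = abs(misfits[-1] - target_misfit) <= misfit_tol
    # Assumed reading: relative spread of the last three objective values.
    last_three = objectives[-3:]
    scale = max(abs(v) for v in last_three)
    stable = scale == 0.0 or (max(last_three) - min(last_three)) / scale < f
    return fits_data and stable
```

Because the test looks only at the objective value, it tolerates the cycling between structurally different models outside the region of extremization that is described above.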
Acceptable $l_2$ and $l_1$ minimum-structure models constructed for the synthetic MT example when the model is taken to be $m(z) = \sigma(z)$ are shown in Figs 3.2(f) and 4.1(f). Both of these models indicate a (relative) high conductivity zone centred at about 1300 m depth with a width of about 1400 m. The true model has a conductivity of 0.04 S/m at these depths. The method of appraisal using extremal models can be used to compute upper and lower bounds for conductivity averages $\bar{\sigma}(z_0 = 1300, \Delta)$ of this region. Figure 5.7(a) shows the bounds $\bar{\sigma}^U$ and $\bar{\sigma}^L$ computed when model limits $0.002 \le \sigma_i \le 0.2$ S/m are imposed, but no constraint is placed on the total variation of the extremal models. The conductivity averages of the true model are indicated by the dashed line. The upper bound decreases from the imposed limit of 0.2 S/m for $\Delta < 200$ m to a value of 0.0563 S/m at $\Delta = 1400$ m. The lower bound reflects the imposed lower limit of 0.002 S/m for $\Delta < 800$ m then increases to 0.0175 S/m at $\Delta = 1400$ m.

Examples of the extremal models which produce the funnel function bounds of Fig. 5.7(a) are given in Fig. 5.8(a) and (b) which show the constructed extremal models which minimize and maximize, respectively, the model average $\bar{\sigma}(z_0 = 1300, \Delta = 800)$. These models have the characteristic sparse, spiky form of solutions of unconstrained variation, consisting of regions of conductivity at the imposed lower limit with narrow, isolated zones of high conductivity at or near the upper limit. All extremal models constructed to produce the bounds in Fig. 5.7(a) are of this form and have total variation values of 2.0 to 4.0 S/m, more than 10 times that of the true model. The extremal models which produced the funnel function bounds in Fig. 5.7 all

Figure 5.7 Computed lower and upper bounds for $\bar{\sigma}(z_0 = 1300, \Delta)$ for the synthetic MT example.
In (a) only model limits $0.002 \le \sigma_i \le 0.2$ S/m have been imposed; in (b) a variation bound of $V_b = 0.21$ S/m (the variation of the true model) was also included. The true model averages are indicated by the dashed line.

Figure 5.8 Extremal models constructed by (a) minimizing and (b) maximizing $\bar{\sigma}(z_0 = 1300, \Delta = 800)$ for the synthetic MT example with model limits $0.002 \le \sigma_i \le 0.2$ S/m but no bound on the total variation. The true model is indicated by the dashed line. The constructed models have a misfit of $\chi^2 = 40.0$; the fit to the true responses is shown in the panels to the right.

have a misfit of $\chi^2 = 40.0$; the fit to the true responses for the constructed models in Fig. 5.8 is shown in the panels to the right.

It is interesting to compare the LP extremal models with the theoretical results of Weidelt (1985). Weidelt analytically treated the full non-linear problem of extremizing the conductance function $S(z_2) = \int_0^{z_2} \sigma(z)\,dz$ subject to exactly fitting a small number of MT responses. He determined that when no model limits (except non-negativity) were imposed, the extremal models consist of insulating zones ($\sigma = 0$) and thin regions of infinite conductivity, but finite conductance, located at isolated points. When $S(z_2)$ is maximized, a conducting region is located at $z_2 - 0$, which is just included in the region of integration, whereas when $S(z_2)$ is minimized, a conducting region is located at $z_2 + 0$, which is just excluded. When model limits $\sigma^- \le \sigma \le \sigma^+$ are imposed, Weidelt found that the extremal models consist of a sequence of sections of alternating conductivities $\sigma^-$ and $\sigma^+$. When $S(z_2)$ is maximized, a layer of conductivity $\sigma^+$ ends at $z = z_2$, whereas when $S(z_2)$ is minimized a layer of conductivity $\sigma^-$ ends at $z = z_2$. The extremal models of unconstrained variation in Fig.
5.8 appear to illustrate the character of Weidelt's exact solutions. In Fig. 5.8(a) where $\bar{\sigma}(z_0 = 1300, \Delta = 800)$ is minimized, the constructed model reflects the lower bound over the region of minimization and narrow zones of conductivity at the upper limit are just excluded from this region. In Fig. 5.8(b) where $\bar{\sigma}(z_0, \Delta)$ is maximized, narrow zones of conductivity at the upper limit are just included in the region of maximization. The discrepancies between the character of the LP extremal models and Weidelt's theory are likely due to the finite discretization interval that must be employed in a practical solution.

Weidelt's exact extremal models represent an interesting and important result. However, if the purpose of the extremization is to determine meaningful bounds for $\bar{\sigma}(z_0, \Delta)$, these types of extremal models are not satisfactory. Figure 5.7(b) shows the result of imposing a variation bound of $V_b = 0.21$ S/m, which is the total variation of the true model. A significant improvement in the resolution at all averaging widths is apparent when Fig. 5.7(a) and (b) are compared. The computed bounds in Fig. 5.7(b) are within the imposed limits for all averaging widths $\Delta$ and converge smoothly as $\Delta$ increases. The percent improvement in the bound width, $P(z_0, V_1, V_2, \Delta)$ given by (5.3.3), is shown in Fig. 5.9 for $V_1 = \infty$ (no variation bound) and $V_2 = 0.21$ S/m. For most averaging widths the funnel function bounds are improved by 50-65 percent.

Examples of the extremal models which produced the improved funnel function bounds of Fig. 5.7(b) are given in Fig. 5.10(a) and (b) which show the models which minimize and maximize $\bar{\sigma}(z_0 = 1300, \Delta = 800)$ for $V_b = 0.21$ S/m. These constructed models do not exhibit the sparse, spiky form of the models in Fig. 5.8 and would likely be considered to be more plausible from a geophysical point of view.

The extremal models which produced the funnel function bounds of Fig.
5.7 were constructed by initiating the inversion algorithm from a 0.02 S/m halfspace starting model. However, a number of the extremizations were repeated with the algorithm initiated from a diverse variety of starting models. Although the extremal models constructed sometimes differed in minor detail, we have not found a case where the objective function value representing the model average differed significantly. This provides some confidence that a better extremal model cannot be found (at least for the partition used here) and that meaningful bounds have been computed. This aspect is considered further in Chapter 6.

The constructed models shown in Figs 3.2(f) and 4.1(f) exhibit a region of low conductivity centred at about 4000 m followed by a region of high conductivity centred at about 8000 m. To verify whether this structure is required of all acceptable models, the method of appraisal using extremal models may be used to compute upper bounds for $\bar{\sigma}(z_0 = 4000)$ and lower bounds for $\bar{\sigma}(z_0 = 8000)$. Figure 5.11(a) shows the computed bounds when model limits $\sigma^- = 0.002$, $\sigma^+ = 0.2$ S/m were imposed but the total variation was unconstrained. The computed lower bound for $\bar{\sigma}(z_0 = 8000)$ is lower than the upper bound for $\bar{\sigma}(z_0 = 4000)$ for all $\Delta$, indicating that without additional information the difference between the two regions cannot be resolved from the data in this manner. It is interesting to note from Fig. 5.11(a) that $\bar{\sigma}^L(z_0 = 8000, \Delta = 4000) = 0.002$ S/m, i.e. an acceptable extremal model can be constructed with $\sigma$ at the imposed lower limit of 0.002 S/m over the entire high conductivity region of the true model, 6000-10000 m, where $\sigma = 0.1$ S/m. This extremal model, shown in Fig. 5.12(a), exhibits conductive zones at the upper

Figure 5.9 The percent improvement, $P$, for $V_1 = \infty$ (no variation bound) and $V_2 = 0.21$ S/m for the synthetic MT example with $z_0 = 1300$ m. Model limits $0.002 \le \sigma_i \le 0.2$ S/m have been imposed.
Figure 5.10 Extremal models constructed by (a) minimizing and (b) maximizing $\bar{\sigma}(z_0 = 1300, \Delta = 800)$ for the synthetic MT example with model limits $0.002 \le \sigma_i \le 0.2$ S/m and a variation bound $V_b = 0.21$ S/m. The true model is indicated by the dashed line. The constructed models have a misfit of $\chi^2 = 40.0$; the fit to the true responses is shown in the panels to the right.

Figure 5.11 The upper bound $\bar{\sigma}^U(z_0 = 4000, \Delta)$ is compared to the lower bound $\bar{\sigma}^L(z_0 = 8000, \Delta)$ as a function of averaging width $\Delta$ for the synthetic MT example. In (a) only model limits $0.002 \le \sigma_i \le 0.2$ S/m were imposed, while in (b) a variation bound of $V_b = 0.21$ S/m was also included.

Figure 5.12 Extremal models constructed by minimizing $\bar{\sigma}(z_0 = 8000, \Delta = 4000)$ with model limits $0.002 \le \sigma_i \le 0.2$ S/m. In (a) no variation bound was imposed, in (b) a variation bound $V_b = 0.21$ S/m was included. The true model is indicated by the dashed line. The constructed models have a misfit of $\chi^2 = 40.0$; the fit to the true responses is shown in the panels to the right.

limit of 0.2 S/m just excluded at either edge of the region of minimization and illustrates the extent of the non-uniqueness of the inverse problem.

To reduce the non-uniqueness and compute more meaningful funnel function bounds, the total variation may be constrained; the results of imposing a bound $V_b = 0.21$ S/m are shown in Fig. 5.11(b).
For resolution widths greater than about 1500 m the computed lower bound for $\bar{\sigma}(z_0 = 8000)$ is greater than the computed upper bound for $\bar{\sigma}(z_0 = 4000)$, indicating that the region of low conductivity followed by a region of higher conductivity is clearly resolved and is a required feature of all acceptable models with a total variation $V \le V_b$. The constructed extremal model which minimizes $\bar{\sigma}(z_0 = 8000, \Delta = 4000)$ with $V_b = 0.21$ S/m is shown in Fig. 5.12(b). The extremal models which produced the funnel function bounds in Fig. 5.11 all have a misfit of $\chi^2 = 40.0$; the fit to the true responses for the constructed models in Fig. 5.12 is shown in the panels to the right.

The final example of the appraisal method using the synthetic MT test case considers the effect of errors on the responses. Figure 5.13 shows the funnel function bounds computed for three levels of error. The error-contaminated data sets inverted in Fig. 5.13(a), (b) and (c) are the same as those considered in Figs 3.5 and 4.6 and result from the addition of a random error drawn from a zero-mean Gaussian distribution with a standard deviation of 2, 10 and 30 percent of the accurate response value, respectively. As would be expected, the bound width increases with the error level, indicating that the funnel function resolution deteriorates as the data become more imprecise.

5.5.2 MT field data example

Figure 4.8 shows $l_2$ and $l_1$ minimum-structure models constructed for the LITHOPROBE data set measured in southeastern British Columbia, described in Section 3.4.1. These two models are in good agreement and show essentially the same features as those obtained by Jones et al. (1988). In particular, the models indicate a region of low conductivity at 2000-7000 m depth and a region of high conductivity at 20 000-30 000 m depth. These features will be appraised using the method of extremal models of bounded variation.
Figure 5.13 The effect of data errors: funnel function bounds computed for $\bar{\sigma}(z_0 = 1300, \Delta)$ for the synthetic MT example are shown in (a), (b) and (c) when the responses are contaminated by Gaussian errors of 2, 10 and 30 percent, respectively. The true model average is indicated by the dashed line.

In Fig. 4.8 the model was taken to be $\log\sigma$ rather than $\sigma$ in order to recover conductivity variations over several orders of magnitude. In order to appraise model features we still wish to determine bounds for the conductivity average $\bar{\sigma}$; however, to construct realistic extremal models of $\log\sigma$, we need to constrain the total variation of $\log\sigma$. This can be accomplished as follows. If the model $m(z)$ is taken to be $\log\sigma(z)$, the definition of the total variation given by (5.2.1) becomes

$$V(\log\sigma) = b \int_0^\infty \left|\frac{d\sigma}{\sigma}\right|, \qquad (5.5.1)$$

where $b = \log e$. There are a number of ways that this can be approximated in discrete form and used to bound the variation of $\log\sigma$ at each iteration. We have found the most useful approximation of the log variation at the $k$th iteration to be

$$V(\log\sigma) \approx b \sum_{i=1}^{M-1} \frac{\left|\sigma_{i+1}^{k} - \sigma_{i}^{k}\right|}{\max\{\sigma_{i+1}^{k-1}, \sigma_{i}^{k-1}\}}. \qquad (5.5.2)$$

Then the variation of $\log\sigma$ can be constrained by applying the LP formulation of Method 2, Section 5.2.2, with the variation bound (5.2.14) modified to be

$$b \sum_{i=1}^{M-1} \frac{p_i + q_i}{\max\{\sigma_{i+1}^{k-1}, \sigma_{i}^{k-1}\}} \le V_b, \qquad (5.5.3)$$

where $p_i - q_i = \sigma_{i+1}^{k} - \sigma_{i}^{k}$ and $V_b$ is the limit on the log variation. We have found that formulating the bound in this manner preserves the character of the extremal models (e.g. it allows conductivity spikes to be just included or excluded from the region of extremization) while effectively limiting the log variation. However, since (5.5.3) is not an exact representation of the variation of $\log\sigma$, the actual log variation of the constructed model

$$V(\log\sigma) = \sum_{i=1}^{M-1} \left|\log\sigma_{i+1}^{k} - \log\sigma_{i}^{k}\right| \qquad (5.5.4)$$

must be evaluated directly.
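The difference between the linearized log variation used in the constraint and the actual log variation of a model can be seen in a small Python sketch. It assumes base-10 logarithms (so $b = \log_{10} e$) and that the denominator uses the conductivities of the previous iteration; the function names and the sample conductivities are hypothetical.

```python
import math

B = math.log10(math.e)  # b = log e, assuming base-10 logarithms


def approx_log_variation(sigma, sigma_prev):
    """Linearized log variation: scaled conductivity changes of the current model,
    normalized by the larger neighbouring conductivity of the previous iteration."""
    total = 0.0
    for i in range(len(sigma) - 1):
        change = abs(sigma[i + 1] - sigma[i])
        total += change / max(sigma_prev[i + 1], sigma_prev[i])
    return B * total


def actual_log_variation(sigma):
    """Actual log variation: sum of |log10 sigma[i+1] - log10 sigma[i]|."""
    logs = [math.log10(s) for s in sigma]
    return sum(abs(b - a) for a, b in zip(logs, logs[1:]))


# Hypothetical converged model (current and previous iterations identical), in S/m.
sigma = [0.01, 0.1, 0.1, 0.001]
approx = approx_log_variation(sigma, sigma)
actual = actual_log_variation(sigma)
```

For this sample the linearized value is about 0.82 while the actual log variation is 3.0, which illustrates why the log variation of a constructed model is evaluated directly rather than read off the constraint.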
Also, since the log variation constraint (5.5.3) depends on the model of the previous iteration, it is important that the iterations converge to a stable solution where changes in the model between iterations are negligible. Limiting the changes in the model between iterations according to (5.4.3) appears to be an effective manner of ensuring convergence to a stable solution.

Figure 5.14(a) shows the extremal model which maximizes $\bar{\sigma}$ over the apparent low conductivity region, 2000-7000 m depth. Conductivity limits $\sigma^- = 0.0001$, $\sigma^+ = 1.0$ S/m were imposed, but no bound was included on the variation. The computed upper bound for $\bar{\sigma}$ is 0.0023 S/m and the log variation of the extremal model is 72. The model is sparse and spiky, consisting of insulating regions with conductivity at the imposed lower limit and narrow, isolated zones of high conductivity. A zone of high conductivity (one partition element wide) is just included at each edge of the region of maximization. Such a model is not appealing from a geophysical point of view. Figure 5.14(b) and (c) show extremal models with log variations of 18 and 7.9, respectively. The rapid fluctuations between low and high conductivity values have been suppressed. The upper bounds for $\bar{\sigma}$ computed from the models in Fig. 5.14(b) and (c) are 0.0022 and 0.0015 S/m. The models which minimize $\bar{\sigma}$ for this region simply reflect the imposed lower limit and are not shown; the computed lower bounds for regions of low conductivity are often not particularly meaningful since MT measurements contain little information about resistive layers.

It would be advantageous if an appropriate variation bound $V_b$ could be ascertained through analysis or from the physics of the problem. Unfortunately, this is seldom the case and for many practical problems an appropriate a priori bound for the variation may not be known.
When this is the case, the interpreter may wish to construct extremal models for a number of variation bound values and select the model with the largest variation that is deemed geophysically plausible. In this manner the interpreter may make use of any knowledge or insight regarding the variation of the true model to select reasonable extremal models and meaningful bounds for the model average. For instance, the extremal model shown in Fig. 5.14(a) is not realistic; however, the model shown in Fig. 5.14(c) might be considered acceptable and therefore a meaningful upper bound for $\bar{\sigma}$ would be 0.0015 S/m.

Figure 5.14 Constructed extremal models which maximize $\bar{\sigma}$ for the apparent low conductivity region 2000-7000 m depth for the LITHOPROBE MT data set. Model limits $0.0001 \le \sigma_i \le 1.0$ S/m were imposed in each case. (a) shows the extremal model of unconstrained variation. This model has a log variation of 72 and a model average of $\bar{\sigma} = 0.0023$ S/m. (b) and (c) show the extremal models with log variations of 18 and 7.9 and model averages of 0.0022 and 0.0015 S/m, respectively.

Extremal models which minimize and maximize $\bar{\sigma}$ for the apparent high conductivity region at 20000-30000 m depth are shown in Figs 5.15 and 5.16. Model limits $\sigma^- = 0.0001$, $\sigma^+ = 1.0$ S/m were imposed in each case. Figure 5.15(a) shows the extremal model of unconstrained variation which minimizes $\bar{\sigma}$. The lower bound for $\bar{\sigma}$ computed from this model is 0.070 S/m and the log variation is 71. Figure 5.15(b) and (c) show minimization models with log variations of 19 and 7.4, respectively. The lower bounds for $\bar{\sigma}$ computed from these models are 0.076 and 0.12 S/m, respectively.
The extremal model of unconstrained variation which maximizes $\bar{\sigma}$ for the apparent high-conductivity region is shown in Fig. 5.16(a). The upper bound for $\bar{\sigma}$ computed from this model is 0.32 S/m and the log variation is 68. Figure 5.16(b) and (c) show maximization models with log variations of 21 and 7.5; the computed upper bounds for $\bar{\sigma}$ are 0.27 and 0.22 S/m, respectively. If the extremal models shown in Figs 5.15(c) and 5.16(c) are accepted as geophysically realistic representations of the Earth, bounds for the average conductivity are $0.12 < \bar{\sigma} < 0.22$ S/m. This establishes the region 20000-30000 m depth as a zone of high conductivity. The average conductivity of this region is greater than that of the low conductivity zone at 2000-7000 m depth ($\bar{\sigma} < 0.0015$ S/m) by (at least) two orders of magnitude.

Figure 5.15 Constructed extremal models which minimize $\bar{\sigma}$ for the apparent high conductivity region 20000-30000 m depth for the LITHOPROBE MT data set. Model limits $0.0001 \le \sigma_i \le 1.0$ S/m were imposed in each case. (a) shows the extremal model of unconstrained variation. This model has a log variation of 71 and a model average of $\bar{\sigma} = 0.070$ S/m. (b) and (c) show the extremal models with log variations of 19 and 7.4 and model averages of 0.076 and 0.12 S/m, respectively.

Figure 5.16 Constructed extremal models which maximize $\bar{\sigma}$ for the apparent high conductivity region 20000-30000 m depth for the LITHOPROBE MT data set. Model limits $0.0001 \le \sigma_i \le 1.0$ S/m were imposed in each case. (a) shows the extremal model of unconstrained variation. This model has a log variation of 68 and a model average of $\bar{\sigma} = 0.32$ S/m.
(b) and (c) show the extremal models with log variations of 21 and 7.5 and model averages of 0.27 and 0.22 S/m, respectively.

Chapter 6 Non-linear appraisal using simulated annealing

6.1 Introduction

In Chapter 5, two methods for appraising MT responses were described. Backus-Gilbert appraisal can be applied to the non-linear MT problem by linearizing about some reference model. Alternatively, bounds for conductivity averages can be obtained by constructing extremal models of bounded variation. A significant advantage of the latter method is that the appraisal is not limited to models that are linearly close to a particular model. However, since the extremal models are constructed via (iterated) linearized inversion, the possibility of the solution becoming trapped in a local extremum always exists. As previously noted, we have found that initiating the construction algorithm from a wide range of starting models results in the same extremal value for the model average. Also, the constructed extremal models have the same form as the exact solutions of Weidelt (1985). These facts provide some confidence that the algorithm generally converges to the global extremum (or at least an excellent approximation to it).

This chapter presents a new method of appraisal for non-linear inverse problems which is not based on linearization. Rather, the method of simulated annealing (Kirkpatrick et al. 1983) is applied to the problem of constructing extremal models that reproduce a set of MT responses. Simulated annealing is a Monte-Carlo optimization procedure which has been successfully applied to many problems in the field of combinatorial optimization (e.g. van Laarhoven & Aarts 1987); however, its application to geophysical model appraisal would seem to be new.
Simulated annealing is based on an analogy with statistical mechanics and mimics the thermodynamical process by which liquids freeze or metals cool and anneal to form crystals, which represent the state of minimum energy for the system. A major advantage of the method is its inherent ability to avoid being trapped in unfavorable local minima. This feature is of crucial importance to the application here. Although appraisal using simulated annealing is considerably less efficient than linearized methods, it represents a general and interesting new appraisal technique which may be used to corroborate the results of the extremal model analysis presented in Chapter 5.

In the next section of this chapter, the method of simulated annealing and its analogy with statistical mechanics is briefly presented (for a comprehensive treatment, see Kirkpatrick et al. 1983 and Kirkpatrick 1984, or the monograph by van Laarhoven & Aarts 1987). In Section 6.3 the simulated annealing appraisal algorithm is described, and in Section 6.4 a number of examples of the analysis are presented and compared with the results of extremal model appraisal.

6.2 Simulated annealing

Simulated annealing is a mathematical optimization procedure that mimics the physical process of annealing. Annealing is the way in which crystals are grown: a substance is first heated to melting, then cooled very slowly until a crystal is formed. At high temperatures the molecules of the liquid move freely with respect to one another; as the liquid is cooled, thermal mobility is lost. If the liquid is cooled slowly enough that the system reaches an equilibrium (steady-state) configuration at each temperature in the cooling process, the atoms are able to line themselves up and form a single pure crystal that is completely ordered and represents the global minimum free-energy state for the system.
If the liquid is cooled too quickly, it does not attain this ground state and the resulting crystal may have many defects, or the substance may form a glass with no crystalline order; these configurations represent local minima in energy. The study of the physical systems on which the method of simulated annealing is based is the domain of statistical mechanics.

6.2.1 Statistical mechanics

Statistical mechanics provides methods of describing the (average) physical properties of a macroscopic system composed of many microscopic particles (e.g. atoms or molecules) in thermal equilibrium. Let each possible configuration of the system be defined by the set of $M$ parameters $\mathbf{r} = \{r_i;\ i = 1, \ldots, M\}$ which may represent, for example, the particle positions and velocities. A fundamental result of statistical mechanics is the Gibbs or Boltzmann probability distribution

$$P(\mathbf{r}) = \frac{1}{Z}\, e^{-E(\mathbf{r})/k_B T}, \tag{6.2.1}$$

which gives the probability $P$ of the system at (absolute) temperature $T$ being in configuration $\mathbf{r}$. In (6.2.1), $E(\mathbf{r})$ is the energy of the system in configuration $\mathbf{r}$, $k_B$ is Boltzmann's constant and the normalizing constant $Z$, called the partition function, is defined by

$$Z = \sum_{\mathbf{r}} e^{-E(\mathbf{r})/k_B T}, \tag{6.2.2}$$

where the sum is over all possible system configurations. According to the Boltzmann distribution (6.2.1), the probability function for a system in equilibrium at (non-zero) temperature $T$ is distributed over all possible configurations $\mathbf{r}$. Thus, even at low temperature there is a small, but finite chance of the system being in a configuration corresponding to a high energy state.

At non-zero temperature, the system configuration $\mathbf{r}$ is perturbed continuously due to thermal agitation. Of central importance here is the fact that, according to (6.2.1), perturbations to the system that increase the energy state are allowed, although they are less probable than fluctuations that decrease the energy.
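The temperature dependence of (6.2.1) can be illustrated for a system with only a handful of configurations. The following is a minimal sketch, assuming a hypothetical three-level system and $k_B = 1$; the function name is introduced here for illustration only.

```python
import math


def boltzmann_probabilities(energies, temperature, k_b=1.0):
    """Probabilities (6.2.1) for a finite list of configuration energies."""
    # Subtracting the minimum energy leaves the probabilities unchanged
    # but avoids overflow in the exponentials at low temperature.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / (k_b * temperature)) for e in energies]
    z = sum(weights)                      # partition function (6.2.2), rescaled
    return [w / z for w in weights]


energies = [0.0, 1.0, 2.0]                # hypothetical energy levels
p_hot = boltzmann_probabilities(energies, temperature=10.0)
p_cold = boltzmann_probabilities(energies, temperature=0.1)
```

At the higher temperature the probability is spread almost evenly over the three configurations, whereas at the lower temperature it is concentrated almost entirely on the lowest-energy configuration while remaining non-zero for the others.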
Thus the configuration sometimes undergoes transitions that are 'uphill' in energy, and it is these uphill transitions which allow the system to avoid being trapped in locally optimal configurations. As the temperature $T$ decreases, however, the Boltzmann distribution (6.2.1) assigns progressively greater probability to low-energy configurations, and significant uphill excursions become increasingly less likely.

In the limit as $T \to 0$, the Boltzmann distribution collapses into the ground state for the system. This ground state often corresponds to a pure crystal which is completely ordered in all directions over distances up to billions of times the size of an individual atom and represents the global minimum-energy configuration for the system. However, in practice, low temperature is not a sufficient condition for achieving this ground state. To reach the minimum-energy state the system must be cooled very slowly, and a long time must be spent in the vicinity of the freezing point. If this is not done and the system is allowed to get out of equilibrium, it does not attain the ground state but rather forms a polycrystalline or amorphous state (glass) with no crystalline order and only metastable, locally optimal structure. These configurations represent local minima for the system energy state.

6.2.2 The Metropolis algorithm

Metropolis et al. (1953), in the earliest days of scientific computing, developed a simple algorithm which simulates the average behaviour of a physical system in thermal equilibrium. The system is parameterized by $M$ model parameters (e.g. the positions $\mathbf{r}$ of a collection of atoms). In each step of the algorithm, an atom is given a small random displacement and the resulting change in the energy of the system $\Delta E$ is computed. The probability of such a change occurring is assumed to be

$$P(\Delta E) = e^{-\Delta E / k_B T}. \tag{6.2.3}$$

If $\Delta E < 0$ (i.e. the transition has lowered the system energy) the probability according to (6.2.3) is greater than unity; in this case the change is arbitrarily assigned a probability $P = 1$, and the transition is always accepted. The case $\Delta E > 0$ (the system energy has increased) is treated probabilistically as follows. A random number $\xi$ is generated from a uniform distribution on the interval [0,1]. If $\xi < P(\Delta E)$, the new configuration is retained; if not, the original configuration is used to start the next step. Repeating this basic step many times simulates the thermal motion of atoms at a temperature $T$. The system eventually reaches equilibrium, and because of the choice of $P(\Delta E)$ in (6.2.3) the probability of a given configuration $\mathbf{r}$ evolves into the Boltzmann distribution (6.2.1). This general scheme of always accepting a downhill step while sometimes accepting an uphill step based on a probability distribution has become known as the Metropolis algorithm and is used in statistical mechanics as a random sampling technique to estimate average properties or integrals of the system (e.g. Barker & Henderson 1976; Binder 1978).

6.2.3 Combinatorial optimization using simulated annealing

Kirkpatrick et al. (1983) devised an optimization procedure based on simulating the behaviour of a system of particles using the Metropolis algorithm, and applied it to a variety of problems in the optimal design of computer components. The problems they considered are examples of combinatorial optimization: finding the optimum value of an objective function defined for a discrete, but factorially large configuration space which in practice cannot be explored exhaustively. No general exact solution is known for such problems in which the computing effort does not increase exponentially with the number of parameters $M$; therefore, heuristic methods are often employed. The most common general framework used in heuristic solutions is known as iterative improvement.
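The acceptance rule of Section 6.2.2 can be written compactly. The sketch below is illustrative only: it takes $k_B = 1$, the function name is hypothetical, and the short experiment simply checks that uphill steps with $\Delta E = T$ are accepted with a frequency close to $e^{-1}$, as (6.2.3) implies.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Decide whether a change in energy delta_e is accepted (k_B = 1)."""
    if delta_e <= 0.0:
        return True                       # downhill steps are always accepted
    # Uphill steps are accepted with probability exp(-delta_e / T), cf. (6.2.3).
    return rng.random() < math.exp(-delta_e / temperature)


rng = random.Random(0)                    # fixed seed for reproducibility
trials = 20000
uphill_rate = sum(metropolis_accept(1.0, 1.0, rng) for _ in range(trials)) / trials
```

Lowering the temperature in this experiment reduces the uphill acceptance rate, which is the mechanism exploited by the annealing procedure.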
Iterative improvement begins with the system of parameters in a known configuration. A standard perturbation operation is applied to each part of the system in turn until a new configuration is found that improves the objective function. This new configuration is then adopted and the process continued until no further improvement can be found. Iterative improvement is sometimes referred to as a 'greedy' algorithm (e.g. Press et al. 1986) since it always proceeds downhill in the objective function. Because of this, the search often gets trapped in a local minimum, and it is customary to repeat the procedure a number of times starting from different configurations and save the best result.

Kirkpatrick et al. (1983) developed simulated annealing as a new and general heuristic method for combinatorial optimization problems. The method is based on an analogy between the many undetermined parameters of the system to be optimized and the particles of an imaginary physical system. The objective function of the optimization problem is considered analogous to the energy of the physical system, with the ground state representing the optimal configuration sought in the optimization problem. The optimization procedure involves statistically modelling the evolution of the physical system using the Metropolis algorithm at a series of decreasing temperatures which allow it to anneal into a state of minimum energy. Allowing perturbations to the system which increase the objective function as well as those which decrease it according to the Metropolis criterion is crucial to escaping from local minima. In simulated annealing, the temperature $T$ of the physical system has no obvious equivalent in the system being optimized and simply acts as a control parameter in the same units as the objective or energy function $E$ (Boltzmann's constant $k_B$ is generally taken to be 1).
The simulated annealing process begins with the system to be optimized in a known configuration and some procedure of generating random perturbations or changes in the configuration. The first step is to completely 'melt' the system: i.e. to repeatedly perturb the configuration at a high enough effective temperature that essentially all changes are accepted according to the Metropolis criterion (6.2.3) regardless of whether the objective function $E$ is decreased or increased. This process completely disorders the system and renders the solution independent of the initial configuration. The temperature is then reduced in slow stages allowing enough perturbations at each temperature that the system reaches equilibrium before proceeding to a lower temperature. As the temperature is decreased, according to (6.2.3) the probability of accepting configuration changes which increase the objective function decreases. As the system configuration begins to approximate the ground state, perturbations which decrease $E$ become less frequent. Finally, at a low temperature the system will 'freeze' and no further changes are accepted.

The sequence of temperatures and the number of perturbations to the configuration attempted to reach equilibrium at each temperature is referred to as an annealing schedule. An appropriate annealing schedule is generally problem-specific and may require trial-and-error experimentation. Also, determining the most effective method of perturbing the system and which factors to incorporate into the objective function require insight into the problem being solved and may not be obvious (Kirkpatrick et al. 1983).

If an appropriate annealing schedule is followed, the configuration at which freezing occurs should approximate the global minimum for the objective function. Since controlled uphill as well as downhill steps are allowed, annealing is not a greedy algorithm like iterative improvement.
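The melt-and-cool procedure described above can be sketched for a one-parameter problem. Everything specific in this example is an assumption made for illustration: the test function (global minimum at $x = 0$, with local minima on either side), the parameter limits, the geometric cooling factor and the fixed number of perturbations per temperature. It is not the schedule used for the MT appraisal.

```python
import math
import random


def energy(x):
    """Hypothetical objective: global minimum at x = 0, local minima elsewhere."""
    return 0.1 * x * x - math.cos(3.0 * x)


def anneal(rng, lower=-5.0, upper=5.0, t0=5.0, cooling=0.95,
           n_temperatures=200, n_perturbations=50):
    """Anneal a single parameter between fixed limits (k_B = 1)."""
    x = rng.uniform(lower, upper)         # arbitrary starting configuration
    e = energy(x)
    temperature = t0                      # high enough to 'melt' the system
    for _ in range(n_temperatures):
        for _ in range(n_perturbations):
            x_new = rng.uniform(lower, upper)     # random perturbation
            e_new = energy(x_new)
            delta_e = e_new - e
            # Metropolis criterion (6.2.3)
            if delta_e <= 0.0 or rng.random() < math.exp(-delta_e / temperature):
                x, e = x_new, e_new
        temperature *= cooling            # reduce the temperature in slow stages
    return x, e


x_final, e_final = anneal(random.Random(1))
```

The configuration returned is the one at which the system has effectively frozen; with this schedule it lies in the basin of the global minimum rather than in one of the neighbouring local minima.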
Iterative improvement may be considered analogous to rapid cooling or 'quenching' of a physical system in which energy is rapidly extracted, resulting in a local minimum of the system energy (Kirkpatrick 1984). Taking many quenches will produce variations in this energy, but for large systems this variation will generally be much smaller than the difference between the quenched and ground states. It is not clear if this difference will be so great in practical combinatorial optimization problems. However, Kirkpatrick (1984) found that simulated annealing always gave better solutions than exhaustive iterative improvement searches for a number of representative optimization problems.

Another interesting feature of simulated annealing is that configuration rearrangements generally proceed in a logical order. The temperature parameter $T$ distinguishes classes of rearrangements: changes which cause the greatest decrease in energy tend to occur at high temperatures, and these features become more permanent as $T$ is lowered; small-scale refinements in the configuration which reduce $E$ only slightly are generally deferred until low temperatures.

In order for simulated annealing to be efficient, calculating the change in the energy function $\Delta E$ should require much less computational effort than calculating the energy function $E$ itself (Kirkpatrick 1984). A method of improving the efficiency of simulated annealing for some discrete problems was developed by Rothman (1986) who applied annealing to the estimation of residual statics from noisy seismic reflection data (see also Rothman 1985). He developed a one-step Metropolis algorithm by computing the relative probabilities of acceptance for each possible parameter change a priori and forming system perturbations based on weighted guesses. These perturbations are always accepted and thus the inefficiency of high rejection rates at low temperatures is eliminated.
However, for problems with a large number of possible parameter values or parameters that vary continuously between given limits this modification would not seem practical.

The final aspect of simulated annealing that will be discussed here is that the analogy between cooling a fluid and optimizing a function of many parameters may fail in an important respect. Whereas in an ideal system the atoms are all identical and the ground state is a regular crystal lattice, some optimization problems contain many distinct, non-interchangeable elements which make a regular solution unlikely. Also, conflicting objectives in the optimization problem may preclude a simple, well-ordered solution. Optimization problems with these characteristics are termed 'frustrated' problems (Kirkpatrick et al. 1983). Physical analogies of frustrated systems exist (e.g. Kirkpatrick et al. 1983; Kirkpatrick 1984), but will not be discussed here. In physical systems, frustration introduces degeneracy into the low-temperature states of the model so that a number of near-ground-state configurations exist with essentially identical energies. Similarly, in optimization problems, frustration makes the search for the optimal solution much more difficult. However, the degeneracy induced by the frustration implies that there should be many equivalent solutions which closely approximate the absolute optimum. In practice, finding one of these solutions is sufficient (Kirkpatrick 1984).

6.3 The simulated annealing appraisal algorithm

In this section, a method of appraisal using simulated annealing is developed. The appraisal technique is applied here to the MT inverse problem. However, the method is general, requires no approximations (such as linearization) and can be applied to any inverse problem for which the forward problem can be solved. The method consists of formulating extremal model construction in terms of an optimization problem which may be solved using simulated annealing. Press et al. (1986) summarize the elements required to apply simulated annealing to an optimization problem:

1) A description of the system and possible system configurations.
2) An energy or objective function $E$ to be minimized.
3) A method of randomly perturbing the system.
4) A temperature or control parameter $T$ and an appropriate annealing schedule.

The construction of extremal models may be formulated as a simulated annealing optimization problem as follows. The system to be optimized is taken to consist of the parameters $\{\sigma_i,\ i = 1, \ldots, M\}$ which form the discretized representation of the conductivity function $\sigma(z)$; this system is represented by $\boldsymbol{\sigma} = \{\sigma_i\}$. The ensemble of possible system configurations is taken to be the set of all configurations $\{\sigma_i\}$ such that $\sigma_i^- \le \sigma_i \le \sigma_i^+$, where $\sigma_i^-$, $\sigma_i^+$ represent the lower and upper limits for the $i$th conductivity element. Since each $\sigma_i$ is allowed to vary continuously between its limits, there are an infinite number of possible configurations; the problem formulated here is therefore not strictly a combinatorial optimization problem. However, with an appropriate procedure of perturbing the system it is straightforward to apply simulated annealing to this problem. Vanderbilt & Louie (1984) present a method of applying simulated annealing to continuous problems when there are no limits for the system parameters.

The construction of an extremal model which minimizes a localized conductivity average $\bar{\sigma}(\Delta)$ may be formulated as an optimization problem by minimizing the objective function (6.3.1), where $R_j(\boldsymbol{\sigma})$ represents the responses predicted for configuration $\boldsymbol{\sigma}$ and $s_j$ is the standard deviation of the $j$th datum. In (6.3.1) the first term represents the difference between the achieved and desired $\chi^2$ misfit, the second term represents the model average to be minimized, and $\alpha$ and $\beta$ are trade-off parameters which determine the relative importance of the misfit and model average in the minimization.
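An objective function with the two terms described above can be sketched as follows. The specific form $E = \alpha\,|\chi^2(\boldsymbol{\sigma}) - \chi^2_d| + \beta\,\bar{\sigma}$, the change of sign of the second term for maximization, and all function and argument names are assumptions made for illustration; they are consistent with the verbal description but are not a transcription of (6.3.1) or (6.3.2).

```python
def chi_squared(predicted, observed, std_dev):
    """Chi-squared misfit between predicted and observed responses."""
    return sum(((p - o) / s) ** 2
               for p, o, s in zip(predicted, observed, std_dev))


def energy(predicted, observed, std_dev, sigma_bar,
           chi2_desired, alpha, beta, maximize=False):
    """Assumed objective: misfit-deviation term plus model-average term."""
    misfit_term = alpha * abs(chi_squared(predicted, observed, std_dev)
                              - chi2_desired)
    # Assumption: maximizing the average is posed as minimizing its negative.
    sign = -1.0 if maximize else 1.0
    return misfit_term + sign * beta * sigma_bar


# Hypothetical two-datum example with chi-squared = 1 + 1 = 2.
e_min = energy([1.0, 3.0], [0.0, 1.0], [1.0, 2.0], sigma_bar=0.1,
               chi2_desired=2.0, alpha=2.0, beta=0.5)
e_max = energy([1.0, 3.0], [0.0, 1.0], [1.0, 2.0], sigma_bar=0.1,
               chi2_desired=2.0, alpha=2.0, beta=0.5, maximize=True)
```

In this form, reducing $\alpha$ once the desired misfit has been reached shifts the emphasis of the optimization towards the model average, while $\beta$ sets the weight of the average relative to changes in the misfit.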
The trade-off parameters are varied to keep the misfit and model average of comparable importance throughout the optimization; two parameters are included to facilitate this process (determining values for $\alpha$ and $\beta$ is considered in detail later). The goal is to minimize (6.3.1) such that a model configuration $\boldsymbol{\sigma}$ is constructed with an acceptable misfit and the smallest possible value of $\bar{\sigma}$. To construct an extremal model which maximizes $\bar{\sigma}$, the energy function (6.3.1) is modified to the form (6.3.2).

The energy functions (6.3.1) and (6.3.2) generally lead to frustrated optimization problems since minimizing or maximizing the model average while achieving an acceptable misfit tend to be contradictory objectives. Therefore, near-optimum configurations may be degenerate and in practice it can be very difficult to achieve the global optimum. However, the degeneracy implies that there should be many solutions which closely approximate the absolute optimum and are equally acceptable (Kirkpatrick 1984).

The basic step at each temperature of the annealing schedule involves perturbing the system $\boldsymbol{\sigma}$, computing the resulting change in the objective function $\Delta E$, and accepting or rejecting the new configuration based on the Metropolis criterion (6.2.3). System perturbations involve randomly changing one or more conductivity elements. A conductivity element $\sigma_i$ is changed according to

$$\sigma_i = \sigma_L + \eta\,(\sigma_U - \sigma_L), \tag{6.3.3}$$

where $\eta$ is a random number from a uniform distribution on [0,1]. In (6.3.3), $\sigma_L$ and $\sigma_U$ are initially taken to be $\sigma_i^-$ and $\sigma_i^+$ so that $\sigma_i$ can take on any value between its limits. After a sufficient number of temperature steps, large-scale structure of the solution becomes (relatively) fixed and extreme perturbations will inevitably be rejected. At this point $\sigma_L$ and $\sigma_U$ may be reset to $\sigma_i/2$ and $2\sigma_i$ (if these values are within the limits for $\sigma_i$).
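The element perturbation (6.3.3) can be sketched as below. The function name and the `local` switch are hypothetical, and clipping the narrowed range to the imposed limits is an assumption about how values falling outside the limits are handled.

```python
import random


def perturb_element(sigma, i, lower, upper, rng, local=False):
    """Return a copy of sigma with element i redrawn according to (6.3.3)."""
    if local:
        # Narrowed range sigma_i/2 to 2*sigma_i for late temperature steps,
        # clipped to the imposed limits (clipping is an assumption).
        sigma_l = max(lower[i], sigma[i] / 2.0)
        sigma_u = min(upper[i], 2.0 * sigma[i])
    else:
        sigma_l, sigma_u = lower[i], upper[i]
    eta = rng.random()                    # uniform random number
    new_sigma = list(sigma)               # leave the input model unchanged
    new_sigma[i] = sigma_l + eta * (sigma_u - sigma_l)
    return new_sigma


rng = random.Random(3)
sigma = [0.01, 0.01, 0.01, 0.01]          # hypothetical model (S/m)
lower = [0.0001] * 4
upper = [1.0] * 4
wide = perturb_element(sigma, 2, lower, upper, rng)
narrow = perturb_element(sigma, 0, lower, upper, rng, local=True)
```

Returning a new list rather than modifying the model in place makes it simple to discard a perturbation that is rejected by the Metropolis criterion.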
A system perturbation can involve changing just one element $\sigma_i$ with the perturbations sequentially cycling through the elements. Alternatively, random combinations of the elements may be changed in each perturbation. Combinations involve a random number of up to 5 elements which are chosen at random and changed according to (6.3.3). We have found the most effective manner of perturbing the system involves alternating between changing an individual element and a random combination of elements and cycling through the system a number of times.

The most subtle aspects of implementing the simulated annealing algorithm involve determining an effective annealing schedule and selecting appropriate values for the trade-off parameters $\alpha$ and $\beta$. The annealing schedule developed here is based on suggestions by van Laarhoven & Aarts (1987). Since finding the best possible extremal value for $\bar{\sigma}$ is crucial, we have adopted a cautious approach to the schedule. An initial temperature $T_0$ is chosen so that at least 90 percent of the perturbations are accepted. This effectively 'melts' the starting model. The temperature is reduced according to the sequence

$$T_{i+1} = \epsilon_T T_i, \tag{6.3.4}$$

where $\epsilon_T$ is typically 0.99. At each temperature, the system is perturbed as described above; the procedure continues until the system 'freezes' and no perturbations are accepted at a number of consecutive temperature steps. Although faster annealing schedules could likely be devised by reducing the number of perturbations at high and low temperatures, the schedule described here has proved very effective in our applications.

In addition to defining an appropriate annealing schedule, to construct extremal models it is essential to keep the misfit and model average of comparable importance in the objective function over the entire temperature range. To accomplish this, the trade-off parameter $\beta$ is varied with temperature.
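The temperature schedule can be sketched as follows. Only the 90 percent acceptance criterion and the geometric reduction (6.3.4) come from the text; the doubling search for $T_0$, the use of a sample of trial energy changes, and all names are assumptions introduced for illustration.

```python
import math


def acceptance_fraction(delta_es, temperature):
    """Expected fraction of trial perturbations accepted by the Metropolis rule."""
    accepted = 0.0
    for d in delta_es:
        accepted += 1.0 if d <= 0.0 else math.exp(-d / temperature)
    return accepted / len(delta_es)


def initial_temperature(delta_es, target=0.9, start=1.0):
    """Double the temperature until the target acceptance fraction is reached."""
    temperature = start
    while acceptance_fraction(delta_es, temperature) < target:
        temperature *= 2.0
    return temperature


def temperature_sequence(t0, eps_t=0.99, n_steps=5):
    """First n_steps temperatures of the geometric schedule (6.3.4)."""
    return [t0 * eps_t ** i for i in range(n_steps)]


trial_changes = [1.0, 2.0, -0.5, 4.0]     # hypothetical energy changes
t0 = initial_temperature(trial_changes)
schedule = temperature_sequence(t0)
```

With $\epsilon_T = 0.99$ the temperature falls by roughly a factor of $e$ every 100 steps, so a schedule spanning several orders of magnitude in temperature requires many hundreds of temperature steps.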
At high temperatures where $\boldsymbol{\sigma}$ changes freely, large changes in the misfit occur and initially the value of $\beta$ is taken to be large. At lower temperatures where structure which approximately reproduces the data becomes permanent in the system configuration, $\beta$ must be reduced. In our formulation, $\beta$ is set to a large value at the initial temperature $T_0$ and reduced with decreasing temperature according to

$$\beta(T_{i+1}) = \epsilon_\beta\, \beta(T_i), \tag{6.3.5}$$

where $\epsilon_\beta < 1$. Appropriate values of $\beta(T_0)$ and $\epsilon_\beta$ are problem-dependent and may require some trial-and-error experimentation. Also, achieving a misfit $\chi^2(\boldsymbol{\sigma})$ precisely equal to $\chi^2_d$ is not required. If $\chi^2(\boldsymbol{\sigma})$ is within a specified tolerance $t_{\chi^2}$ of $\chi^2_d$ the model is considered to be acceptable and the value of $\alpha$ may be reduced by a prescribed factor (typically 10-100) so that more importance is allotted to extremizing $\bar{\sigma}$.

Unfortunately, the computation of $\Delta E$ for each perturbation requires a full solution to the forward problem at each frequency in the data set. Therefore, simulated annealing is considerably less efficient than linearized methods in constructing extremal models and can be quite slow. However, since simulated annealing is renowned for its ability to avoid unfavourable local minima, it provides a useful method of corroborating the results of linearized analysis.

6.4 Appraisal examples

To demonstrate the method of appraisal using simulated annealing and to compare this optimization procedure with the linearized approach of Chapter 5, consider the synthetic MT test case described in Section 3.3.1. This case was analyzed using linearized appraisal in Section 5.5.1; however, the problem is altered slightly here to reduce the computation times required by the simulated annealing method. The number of elements in the depth partition is reduced and the data set is halved by omitting the response at every second frequency. It will be seen that this does not significantly affect the computed results.
As in Section 5.5.1, conductivity limits of $\sigma^- = 0.002$ S/m and $\sigma^+ = 0.2$ S/m are assumed. Figure 6.1 shows the funnel function bounds computed by minimizing and maximizing $\bar{\sigma}(z_0, \Delta)$ for $z_0 = 1300$ m, i.e. the first (relative) high conductivity zone of the true model. The solid line indicates the bounds calculated using simulated annealing, the dotted line indicates the bounds from the linearized analysis and the dashed line shows the true model averages. The lower and upper bounds are established by constructing extremal models which minimize or maximize $\bar{\sigma}(\Delta)$ for averaging widths $\Delta$ of 100, 200, 400, 600, 800, 1000, 1200, and 1400 m. The bounds computed using the two methods are almost indistinguishable over the entire range of $\Delta$, indicating that these methods produce virtually identical extremal values for $\bar{\sigma}$.

The bounds produced by simulated annealing and linearized inversion are summarized in Table 6.1. The simulated annealing solutions required, on average, almost two days of CPU time per extremization on a SUN 4/310 workstation, and represent a very careful approach to the annealing schedule to ensure that the best possible results are obtained. By comparison, the linearized extremizations required only about 3-5 minutes of computation time. Importantly, the extremal values computed from the linearized analysis are slightly better at each value of $\Delta$ than those achieved by the simulated annealing method (i.e. the linearized method yields larger upper bounds and smaller lower bounds). Although the difference is not significant, it does indicate that even an intensive application of simulated annealing could not obtain better extremal values than the linearized appraisal. This provides confidence that the linearized extremization yields meaningful bounds. Finally, comparing Fig. 6.1 with Fig. 5.7(a) indicates that bounds computed in this example are essentially the same as those obtained for the larger data set and finer partition.
Figure 6.1 Computed lower and upper bounds for $\bar{\sigma}(z_0 = 1300, \Delta)$ for the synthetic MT example with model limits $0.002 \le \sigma_i \le 0.2$ S/m. The solid line indicates bounds computed using simulated annealing appraisal, the dotted line indicates bounds from linearized appraisal, and the true model averages are indicated by the dashed line. (Horizontal axis: averaging width $\Delta$ (m).)

Table 6.1 Summary of upper and lower bounds computed for $\bar{\sigma}(z_0 = 1300, \Delta)$ using simulated annealing and linearized inversion to compute extremal models for the synthetic MT example. All values of $\bar{\sigma}$ are in S/m.

| $\Delta$ (m) | minimization, annealing | minimization, linearized | maximization, annealing | maximization, linearized |
|---|---|---|---|---|
| 100 | 0.0022 | 0.0020 | 0.200 | 0.200 |
| 200 | 0.0022 | 0.0020 | 0.200 | 0.200 |
| 400 | 0.0022 | 0.0020 | 0.118 | 0.119 |
| 600 | 0.0022 | 0.0020 | 0.0866 | 0.0884 |
| 800 | 0.0022 | 0.0020 | 0.0760 | 0.0767 |
| 1000 | 0.0134 | 0.0127 | 0.0680 | 0.0696 |
| 1200 | 0.0174 | 0.0172 | 0.0604 | 0.0616 |
| 1400 | 0.0211 | 0.0200 | 0.0537 | 0.0545 |

It is interesting to examine the extremal models which produced the bounds shown in Fig. 6.1. Figure 6.2(a) and (b) show the models which minimize $\bar{\sigma}(z_0 = 1300, \Delta = 800)$ constructed via simulated annealing and linearization, respectively (the true model is indicated by the dashed line). The similarity of the solutions to depths greater than 6000 m is remarkable considering the completely different approaches of the two methods. The form of the models over this depth range is similar to Weidelt's (1985) theoretical solution to the extremization problem for a small number of exact data. Weidelt's solutions consist of thin conductive zones embedded in an insulating halfspace with conductive zones just excluded at either edge of the region of minimization (900-1700 m depth). This would seem to suggest that both the linearized and simulated annealing extremizations approach discretized approximations to the global extremum solution. The deep structure of the extremal models in Fig.
6.2(a) and (b) differs somewhat; this structure is well removed from the region of minimization and does not effect the extremal value for a. Both constructed models have a misfit of x 1 = 16.0 and the measured and predicted responses are shown in the panels on the right. Figure 6.3(a) and (b) show the extremal models (x1 =16.0) which maximize a(z 0 = 1300, A =800) constructed using simulated annealing and linearized inversion, respectively. Again, the solutions are very similar to depths greater than 6000 m, particularly near the region of maximization 900-1700 m depth. These models resemble Weidelt's solution with thin conducting zones just included at either edge of the region of maximization. The similarity between the models constructed via simulated annealing and linearization and the correspondence in form with Weidelt's exact solution are characteristic of all the extremal models which produced the funnel function bounds shown in Fig. 6.1. A final example of appraisal using simulated annealing considers the LITHOPROBE MT data set described in Section 3.4.1. Minimum-structure models for this data set are shown in Fig. 4.8 and indicate a low conductivity region at 2000-7000 m depth. Upper bounds for this region were computed using the linearized appraisal algorithm (Fig. 5.14). Considering the full data set of complex responses at iV = 34 frequencies and a fine depth partition (M = 130) represents a 0.20 r-K B \ 0.15 CO s—' 0.10 N 0.05 0.00 IO"3 10~2 10"1 10° 101 IO2 IO"3 10-2 1 0 - 1 1 0 0 1 0 1 1 0 2 T (s) Figure 6.2 Extremal models constructed by minimizing a(z 0 = 1300, A = 800) for the synthetic MT example with model limits 0.002 < <T, < 0.2 S/m. (a) shows the model constructed using simulated annealing; (b) shows the solution from the linearized inversion. The two models are in excellent correspondence, particularly in the region where the conductivity is minimized. 
The constructed models have a misfit of x 1 = 16.0; the fit to the true responses is shown in the panels to the right. The true model is indicated by the dashed line. ^ 0.05 to 0.00 ) ' — — ' — IO"3 IO"2 10"1 10° 101 102 0.20 -^ 0.15 -m ^ 0.10 -^ 0.05 -to 0.00 102 103 z (m) 104 1 0 - 3 10~ 2 10 - 1 10° 101 102 T ( s ) Figure 6.3 Extremal models constructed by maximizing a(zo = 1300,A = 800) for the synthetic MT example with model limits 0.002 < ax < 0.2 S/m. (a) shows the model constructed using simulated annealing; (b) shows the solution from the linearized inversion. The two models are in excellent correspondence, particularly in the region where the conductivity is maximized. The constructed models have a misfit of x 1 = 16.0; the fit to the true responses is shown in the panels to the right. The true model is indicated by the dashed line. demanding test for the simulated annealing appraisal algorithm. Model limits of 0.0001 <&,< 1.0 S/m were imposed. The upper bound for a computed using simulated annealing is 0.0021 S/m and the constructed extremal model is shown in Fig. 6.4(a). For comparison, an upper bound of 0.0023 S/m was computed using the linearized appraisal algorithm, and the extremal model is reproduced in Fig. 6.4(b). Both constructed models have a misfit of x 1 =95.0. The features of the two extremal models are in good agreement in the region of maximization 2000-7000 m. The conductivity remains near the lower limit in this region with narrow high-conductivity zones just included at either edge of the region of maximization. In all the cases we have considered, we have found that if a careful annealing schedule is followed, the extremal value for a computed using simulated annealing is very close to that computed using the linearized appraisal algorithm. Importantly, however, we have never found simulated annealing to produce a better extremum than linearization. 
This indicates that the linearized approach produces excellent extremal values which in many cases may represent the best (discretized) approximation to the global extremum. The similarity in form to Weidelt's exact extremal solutions would seem to support this conclusion.

In addition to the application to constructing extremal models for MT appraisal, the simulated annealing inversion method developed in this chapter has enormous flexibility and can be applied to minimize or maximize any (linear or non-linear) functional of model and/or misfit in any inverse problem for which a solution to the forward problem exists. The computational efficiency of the annealing method depends directly on the efficiency of the forward solution.

Figure 6.4 Extremal models constructed by maximizing $\bar{\sigma}$ over the apparent low conductivity region at 2000-7000 m depth for the LITHOPROBE MT data set. Model limits are $0.0001 < \sigma_i < 1.0$ S/m. (a) shows the model constructed using simulated annealing; (b) shows the solution from the linearized inversion. The two models are in good correspondence near the region where the conductivity is maximized. The computed upper bounds for $\bar{\sigma}$ are 0.0021 S/m using simulated annealing, and 0.0023 S/m using linearized inversion. The constructed models have a misfit of $\chi^1 = 95.0$; the fit to the true responses is shown in the panels to the right.

Chapter 7

An application to MT monitoring of earthquake precursors

7.1 Introduction

Earthquake prediction is a challenging but important goal. Earthquakes may be preceded by anomalous changes in tilt, strain, seismic velocities, magnetic fields, and electrical conductivity. These precursor signals may precede the onset of an earthquake by a matter of hours, days or years.
The dilatancy model of the Earth's crust provides an explanation of how precursory signals may occur prior to a seismic event. Dilatancy is an inelastic volume increase in stressed rock which occurs prior to fracture. The volume increase results from new pores and microcracks forming and propagating within the rock due to the steady increase in tectonic stress. Water diffusing into the newly-created microcracks causes an increase in the electrical conductivity in the focal region of a forthcoming earthquake. It may be possible, therefore, to detect a change in the conductivity with time at suitably located sites. The lead time of the precursor depends on the magnitude of the earthquake and the distance between the epicentre and the recording site (Rikitake 1987). The precursory increase in conductivity may be as great as 30 percent (Sumitomo & Noritomi 1986).

Canadian studies using the magnetotelluric method to monitor earthquake precursors began in 1974 in Charlevoix County, Quebec, near the centre of seismicity on the north shore of the St. Lawrence River. A number of MT stations were established in the region, and the data collected have been analyzed by Kurtz & Niblett (1978). Their results showed an approximately 14 percent increase in the impedance tensor per year; however, there was no clear association between impedance changes and seismic activity.

Figure 7.1 Tectonic map of the earthquake precursor study site on Vancouver Island. Stars indicate earthquakes, solid circles indicate MT sites. The lower diagram shows a cross-section of the region with earthquake focal depths indicated.

The central region of Vancouver Island, where two major earthquakes have occurred this century, is another location in Canada where earthquake precursor studies are underway. Figure 7.1 shows the locations of the two earthquakes. The earthquake on the west coast of the island occurred in 1918 and had a magnitude of 7.0 (Cassidy et al. 1988).
The earthquake on the eastern side occurred in 1946 with a magnitude of 7.3 (Rogers & Hasegawa 1978). The focal depths are shown in the lower portion of the figure and correspond to approximately 15 km for the 1918 event and 30 km for the 1946 event. The recurrence interval for earthquakes of this size in the Vancouver Island region is estimated to be approximately 40 years (Rogers, personal communication), hence, a seismic event may be imminent. Relevelling surveys show a change in the sense of direction of the vertical deformation in the Campbell River region. The data indicate a relative uplift of 4 mm/yr from 1977 to 1984, and a subsidence of approximately the same magnitude from 1984 to 1988 (Dragert & Lisowski 1990). In response to the estimated earthquake recurrence interval and the change in levelling data, D. R. Auld and L. K. Law of the Pacific Geoscience Centre, Sidney, British Columbia, initiated a program in 1986 to investigate changes in electrical parameters with time. From 1986 to 1989 magnetotelluric data have been measured annually at a number of sites in central Vancouver Island (the site locations are indicated by the solid circles in Fig. 7.1 and will be discussed in a later section). The goal of this chapter is to interpret some of the data from this study by applying inversion procedures to the problem of detecting precursory changes in the conductivity. Previous MT studies have evaluated precursor signals only in terms of changes in the measured responses. Examining the responses may be sufficient to detect that a change in the conductivity has occurred; however, it is difficult to quantitatively interpret changes in the conductivity at depth simply by inspecting changes in the data. This is correctly formulated as an inverse problem. Explicitly formulating the inverse problem allows investigation of the changes required in conductivity models of the Earth corresponding to yearly variations in the responses. 
This may be much more informative in terms of evaluating the processes and depths involved in observed changes in the data. It may be important, for instance, to determine if changes are localized at the focal depth of previous earthquakes. The model construction procedures developed in Chapters 3 and 4 can be applied effectively to this inverse problem. Minimum-structure models generally provide a reasonable representation of the conductivity structure. In addition, a model constructed from data measured one year may be used as the base model in a smallest-deviatoric inversion of responses measured in a subsequent year. This provides a direct method of investigating the changes required in the earth conductivity model to accommodate the yearly variations of the data.

In this chapter, MT inversion methods are applied to the earthquake precursor study of Auld and Law. The geological and tectonic setting of the study is briefly reviewed in Section 7.2. Section 7.3 describes the field experiment and Section 7.4 considers the temporal change in the responses. In Section 7.5 model construction techniques are applied to the MT field data in order to investigate yearly changes required in conductivity models of the Earth. Finally, Section 7.6 briefly considers an interpretation of the regional structure.

7.2 Geological and Tectonic setting

Vancouver Island is composed of a number of terranes that are part of a series of accreted terranes which make up the western segment of the Canadian Cordillera. The central Vancouver Island region is part of Wrangellia, a large composite terrane made up of volcanic, plutonic, sedimentary and metamorphic rocks of Paleozoic to Jurassic age. Overlying Wrangellia and underlying part of the study area is the Nanaimo Group composed of conglomerates, sandstones, mudstones and shales of Late Cretaceous age.
This complex occurs beneath the Alberni Valley and along the eastern side of Vancouver Island, extending from approximately Nanoose Bay to the Campbell River region. The Nanaimo Group of sedimentary rock extends to depths of 200 m beneath the Alberni Valley and to at least 500 m in the coastal region (Gabrielse et al. 1990). The tectonic setting of central Vancouver Island is complex. The plate boundaries of the northeast Pacific region are shown in Fig. 7.1. Riddihough (1977) concluded that the subducting Juan de Fuca and Explorer oceanic sub-plates are interacting independently with the lithosphere beneath Vancouver Island. Hyndman et al. (1970) located a zone of faulting, known as the Nootka fault, extending from the northern end of the Juan de Fuca ridge to the continental margin off central Vancouver Island. It is likely that the tectonic forces causing earthquakes on Vancouver Island are a result of some form of stress coupling between the dynamics of the subduction zone and faults in the crust. There are a large number of old crustal faults on Vancouver Island, with the dominant fault pattern striking northwest-southeast. One of these northwest trending faults, the Beaufort Range fault, is within the current study area in the central region of Vancouver Island. Earlier MT work on Vancouver Island has been carried out by Kurtz et al. (1990) as part of the LITHOPROBE multi-disciplinary geoscience research program. Their results indicated a conducting zone at depths greater than 20 km beneath Vancouver Island which correlated with the top of the seismic E-reflector. This strong reflective zone is believed to delineate the top of the Juan de Fuca plate (Green et al. 1986). The conducting zone is believed to result from saline fluids supplied by the subducting oceanic crust and dehydration reactions. Their MT recording sites were approximately 50 km to the southeast of the present study area. 
7.3 Field experiment

Figure 7.1 shows the location of the four magnetotelluric measurement sites of the present experiment. The sites designated BRF and UBC, located on and slightly to the northeast of the Beaufort fault zone, were established in 1986. Sites designated HLN, located northwest of Campbell River, and SLW, west of Sproat Lake, were established in 1989. Data were collected at each site once a year, at the same time each year, recording for one to two months. Variations in five components of the natural EM fields were measured and recorded digitally on a cassette recorder. The responses were processed using a robust technique for magnetotelluric data developed by Egbert & Booker (1986). This method involves iterative reweighting of the data to remove outliers and rejection of Fourier harmonics which have less than a prescribed minimum power in the horizontal magnetic field measurements. The robust processing method results in more accurate estimates of the response functions and significantly reduced error estimates (e.g. Jones et al. 1989). The MT data collected in the present study are generally of good quality with standard deviations varying from 3 to less than 1 percent. Data quality varied considerably from site to site and year to year.

7.4 Temporal change in responses

It may be possible to detect changes in the Earth's conductivity prior to an earthquake by a change in the measured MT responses. The amount of temporal change in the measured data at the two sites BRF and UBC has varied. An annual variation in apparent conductivity of at least a few percent would be expected due to a number of factors including ground water level and ground temperature (Xu 1986). As an example of the amount of year to year change in the measurements, Fig. 7.2 shows the percentage change in the measured apparent conductivities and phases at BRF for the latest two years of data, 1988 and 1989 (the responses correspond to measurements of the north-south component of the magnetic field and the east-west component of the electric field). The change in apparent conductivity from 1988 to 1989 is of the order of a 5 percent decrease and varies somewhat with period. Between 1986 and 1987 the apparent conductivity increased by approximately 6 percent. Unfortunately, different instrumentation had to be used at BRF after 1987, which obscures the total change at this site over the four year period. However, at the UBC site, the total change in apparent conductivity over the four years was about a 10 percent decrease: approximately 6 percent decrease for 1986-87, no change for 1987-88, and approximately 4 percent decrease for 1988-89. Unfortunately, the UBC site is located on about 500 m of high-conductivity sediments of the Nanaimo Group, which degrades the quality of the data at this site. Yearly changes in the phase measurements at both sites were observed, but are more difficult to interpret.

Figure 7.2 Change in observed response at site BRF between 1988 and 1989. (a) shows percentage change in apparent conductivity, (b) shows percentage change in phase.

It is difficult to determine exactly what change would be required in the MT responses to qualify as a clear earthquake precursor signal. For a precursor to be detected by monitoring apparent conductivity, the change would likely have to exceed the measured annual change of up to 6 percent (Auld, personal communication). To date, no changes have been observed which could be interpreted as a precursor to a large seismic event. For the duration of the experiment, no earthquakes above magnitude 3.0 have occurred in the central Vancouver Island region.
7.5 Temporal change required in conductivity models

In this section, model construction techniques are applied to determine changes in the conductivity models which are required by the yearly variations in the data. Figure 7.3 shows the $l_1$ and $l_2$ minimum-structure models constructed for the MT responses measured at site BRF in 1988 (this data set was chosen as it appears to be of the best quality). The responses consist of determinant averages of the impedance tensor measured at 18 periods between 93 and 2643 s, and are shown as apparent conductivities and phases in Fig. 7.3(b) and (c). The actual data set that was inverted consists of amplitudes and phases of the $R$ response computed from the impedance tensor, as described in Section 3.4.1. The $l_2$ model has a $\chi^2$ misfit of 36 and the $l_1$ model has a $\chi^1$ misfit of 29. These misfits represent the expected values for $\chi^2$ and $\chi^1$, respectively, and indicate that 1-D model solutions are justified. In fact, according to the $D^+$ criterion of Parker (1980), 1-D models are justified for each of the four years of data recorded at BRF at a misfit considerably less than the expected value for $\chi^2$.

An approximate method of determining if a change in the conductivity model is required between two years, say, 1988 and 1989, is to compute the misfit of the responses predicted for the 1988 model to the responses measured in 1989. If this misfit is less than the expected value of $\chi^2 = 36$, the constructed model for 1988 adequately fits the 1989 data set and no change in the conductivity model is required by the data. If the misfit is greater than $\chi^2 = 36$, a change in the model may be required. The responses predicted for the 1988 model misfit the data measured in 1989 by $\chi^2 = 47$; this indicates that a change in the conductivity model is likely required by the data. Likewise, the 1988 model misfits the 1987 responses by $\chi^2 = 177$, which indicates that a change in the model is required between these years.
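The misfit test described above can be sketched in a few lines. This is a minimal illustration, not the thesis code: the function names are hypothetical, the data count of 36 is an assumption (amplitude and phase at 18 periods), and the expected $\chi^1$ value assumes that $\chi^1$ is the sum of absolute standardized residuals with independent Gaussian errors.

```python
import math
import numpy as np

def chi2_misfit(observed, predicted, std):
    """Sum of squared residuals, each scaled by its standard deviation."""
    r = (np.asarray(observed, float) - np.asarray(predicted, float)) / np.asarray(std, float)
    return float(np.sum(r ** 2))

def expected_chi2(n_data):
    """Expected chi-squared misfit for n_data independent Gaussian errors."""
    return float(n_data)

def expected_chi1(n_data):
    """Expected sum of absolute standardized residuals (assumed definition of chi^1)."""
    return math.sqrt(2.0 / math.pi) * n_data

def change_required(misfit, n_data):
    """True if last year's model misfits this year's data by more than expected."""
    return misfit > expected_chi2(n_data)

n_data = 36  # assumption: amplitude and phase at each of 18 periods
# Misfits of the 1988 model to other years' data, as quoted in the text.
quoted = {1989: 47.0, 1987: 177.0}
flags = {year: change_required(m, n_data) for year, m in quoted.items()}
print(flags, round(expected_chi1(n_data)))
```

Under these assumptions the expected $\chi^1$ for 36 data rounds to 29, which agrees with the $l_1$ misfit quoted for the 1988 model.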
Finally, the 1988 model misfits the 1986 data by $\chi^2 = 18$, so no change in the model is required by the data.

Figure 7.3 Minimum-structure models constructed by inverting the data recorded at site BRF in 1988. The $l_2$ solution is a smooth model, while the $l_1$ solution represents a layered (discontinuous) model.

The above misfit analysis may be used to determine if two data sets require a change in the model. In order to obtain a representation of the change required, the smallest-deviatoric model may be constructed. As an example of this procedure, the 1988 $l_2$ minimum-structure model, shown in Fig. 7.3(a), is taken to be the base model. A model may then be constructed which fits the 1986 data, but deviates least from the 1988 base model; this model is shown in Fig. 7.4(a). The model is identical to the 1988 model, which confirms that no change in the model is required between the 1986 and 1988 responses. This null result may be due to the (relatively) large uncertainties associated with the 1986 data, as shown in the panels to the right of the constructed model.

Figure 7.4(b) shows the smallest-deviatoric model constructed for the 1987 responses (solid line). The 1988 base model is also shown (dotted line). At most depths the two models are indistinguishable, indicating that the structure of the 1988 model is consistent with the 1987 data. However, the models differ in the high conductivity zone at about 10-15 km depth. The 1987 data require a slightly higher conductivity in this region. Unfortunately, it is not known to what extent the change in instrumentation between 1987 and 1988 affects these results. The smallest-deviatoric model (solid line) constructed for the 1989 responses as well as the 1988 base model (dotted line) are shown in Fig. 7.4(c).
Again, at most depths the two models are indistinguishable, but differ slightly in the high conductivity zone at about 10-15 km depth. Even though this difference is small, it indicates that the 1989 data require a slightly lower conductivity in this region. This agrees with the observation that the apparent conductivities decreased slightly from 1988 to 1989, and serves to illustrate the smallest change in the conductivity models (measured in the $l_2$ norm) compatible with the change in the data. Also, the inversion procedure takes into account both the amplitude and phase information, in contrast to evaluating changes solely in terms of the apparent conductivities.

Figure 7.4 Smallest-deviatoric models for site BRF with the 1988 minimum-structure model as base model. (a), (b) and (c) show the smallest-deviatoric models constructed by inverting the 1986, 1987 and 1989 responses, respectively.

The changes in the constructed models indicated in Fig. 7.4(b) and (c) show that the data require a decrease in the conductivity of the conductive zone at about 10-15 km depth from 1987 to 1988 and from 1988 to 1989. The percentage change in the conductivity, averaged over three partition elements at the peak of the conductive zone, indicates a 14 percent decrease between 1987 and 1988 and a 4 percent decrease between 1988 and 1989. As mentioned previously, the yearly variations in data described in Section 7.4 would likely not be judged significant as an earthquake precursor signal.
However, if changes in the conductivity model are required at depths which coincide with tectonic stress or with the focal depths of previous earthquakes, special attention might be paid to even small conductivity changes. Unfortunately, the depth of the conductive zone where changes are indicated is not reliably determined in this study. Near-surface inhomogeneities in conductivity can introduce a static shift into the measured apparent conductivities, which has the effect of displacing the conductivity as a function of depth (e.g. Jiracek 1988). Kurtz et al. (1990) carried out MT measurements at 25 locations about 50 km to the southeast of the present study. They considered 1-D models at each site based on the inversion of Fisher & LeQuang (1981) in order to construct a 2-D conductivity model of the region. The 1-D models Kurtz et al. obtained included a high-conductivity zone similar to the models in Fig. 7.3(a). In their study, the depth to this zone varied from less than 10 km to more than 40 km, and they attributed this variation primarily to static shift effects. It is likely that the depth of the conductive zone is no better constrained in the present study. Therefore, in order to use model construction techniques to reliably determine the depths of required conductivity changes, an independent measurement of the static shift is required (e.g. Sternberg et al. 1988).

7.6 Regional interpretation

An interesting difference between the models obtained in the current study and those obtained by Kurtz et al. (1990) is the presence of a secondary conductive zone shown at about 60 km depth in Fig. 7.3(a). Kurtz et al. interpreted the single conductive zone in their models as coinciding with a strong seismic reflective zone, known as the E-reflector, which is believed to delineate the top of the subducting Juan de Fuca plate. However, they saw no evidence of a similar, but shallower, reflective zone known as the C-reflector.
The C-reflector is believed to be associated with older oceanic lithosphere that was underplated to the base of the Island after a westward jump in subduction occurred in the Late Eocene (Keen & Hyndman 1979).

Given the uncertainty associated with the true depth of the conductive features, several interpretations are possible for the two conductors indicated in the present study. One possible interpretation is that the shallower conductive zone is associated with the C-reflector and the deeper conductor with the E-reflector. Alternatively, the C-reflector may not be associated with a conductive feature, as Kurtz et al. (1990) believe. In this case the shallower conductor may correspond to the E-reflector and the secondary conductive region may be a minor feature which cannot be associated with any prominent seismic event.

Figure 7.5 investigates these possibilities using $l_2$ minimum-structure models constructed by inverting the responses measured at sites BRF (1988), UBC (1988) and HLN (1989). Figure 7.5(a) shows the model constructed by inverting (unaltered) responses from site BRF. Figure 7.5(b) shows the model constructed by inverting the BRF responses which had been altered to simulate the static shift in apparent conductivity that results in the shallow conductor occurring at 18 km depth. This is the approximate depth of the C-reflector at the site. The model in Fig. 7.5(c) was constructed by inverting responses which were altered to simulate the static shift in apparent conductivity that yields the shallow conductor at 35 km depth, the approximate depth of the E-reflector. Figure 7.5(d), (e) and (f) show a similar series of models constructed by inverting unaltered and altered responses from the UBC site, and Fig. 7.5(g), (h) and (i) show the same series for the HLN site. The data sets at sites UBC and HLN were fit to a $\chi^2$ value somewhat smaller than the expected value.
This appears to be justified as it enhances the structural similarities common to all three sites and does not appear to introduce structure generated by data noise. The models constructed from the unaltered responses at the three sites, shown in Fig. 7.5(a), (d) and (g), are of similar character: resistive at shallow depths with two conductive peaks and a conductive halfspace at depth. The depths to the conductive features are not in agreement between the three stations.

Figure 7.5 Minimum-structure models for the data sets measured at sites BRF (1988), UBC (1988) and HLN (1989). (a) shows the model constructed by inverting the (unaltered) responses from site BRF. In (b) the BRF data were altered to simulate the static shift in apparent conductivity that results in the shallow conductor occurring at 18 km depth (the approximate depth of the C-reflector). In (c) the responses were altered to simulate the static shift that yields the shallow conductor at 35 km depth (the depth of the E-reflector). (d), (e) and (f) show a similar series of models constructed by inverting unaltered and altered responses from the UBC site; and (g), (h) and (i) show the same series for the HLN site. Panels (c), (f) and (i) also show a cross-section through the 2-D model of Kurtz et al. (1990) as the dashed line.

The models constructed so that the first conductive zone is at 18 km depth, shown in Fig. 7.5(b), (e) and (h), have the second conductor at approximately 80, 60 and 60 km depth for sites BRF, UBC and HLN, respectively. These depths do not agree with the depth of the E-reflector (35 km), so the interpretation of the two conductors corresponding to the C- and E-reflectors does not seem to be borne out. Figure 7.5(c), (f) and (i) show the models for the three sites constructed so that the first conductive zone is at 35 km. In these panels a cross-section through the 2-D conductivity model of Kurtz et al. (1990) closest to the sites of the present study is included as a dotted line. In this case the secondary conductive zone appears as a minor feature associated with the increase in conductivity at depth. This would appear to be the most likely interpretation.

Chapter 8

A modified linearized inversion algorithm

8.1 Introduction

In Chapter 2 complete Fréchet differential series were derived for several choices of MT response. The ratio of the higher-order (non-linear) terms to the linear term was used to quantify the relative linearity in order to determine which response yields the most accurate linearized expansion. Another goal in deriving expressions for the higher-order terms was to investigate whether the additional information contained in these terms could be used to improve the efficiency of an inversion scheme while still making use of linear inverse theory to provide the solutions. A method of accomplishing this by approximating the higher-order terms at the current model is developed in this chapter. The corrections consist of successively approximating the linearization error or remainder term in order to approximate a response functional for which the inverse problem is exactly linear. This method would seem to represent a novel approach to linearized inversion which can be implemented as a practical algorithm. Correcting the linearized solution at each iteration can reduce the number of iterations and total computational effort required to converge to an acceptable model. In addition, a correspondence between the corrected linearized solutions and iterations of the modified Newton's method for operators is established.
8.2 Correcting the linearized inversion

In Section 2.2.1 a complete expansion for the MT response $R(\sigma, z_m = 0)$ about a starting model $\sigma_0$ was derived in terms of a constant, linear and remainder term:

$$R_j(\sigma, 0) = R_j(\sigma_0, 0) + \int_0^\infty G_j(\sigma_0, z)\,\delta\sigma(z)\,dz - \int_0^\infty \frac{i\omega_j}{\mu_0}\,G_j(\sigma_0, z)\,\left[R_j(\sigma, z) - R_j(\sigma_0, z)\right]^2 dz, \quad j = 1, \ldots, N, \tag{8.2.1}$$

where $G_j(\sigma_0, z)$ represents the first Fréchet kernel, given by (2.2.17), and the subscript $j$ indicates an implicit dependence on frequency $j = 1, \ldots, N$. The last term on the right side represents the linearization error or remainder term which contains the higher-order contributions. This term is neglected in a linearized approach; however, by retaining the remainder term equation (8.2.1) may be rearranged to give

$$R_j(\sigma, 0) - R_j(\sigma_0, 0) + \int_0^\infty G_j(\sigma_0, z)\,\sigma_0(z)\,dz + \int_0^\infty \frac{i\omega_j}{\mu_0}\,G_j(\sigma_0, z)\,\left[R_j(\sigma, z) - R_j(\sigma_0, z)\right]^2 dz = \int_0^\infty G_j(\sigma_0, z)\,\sigma(z)\,dz. \tag{8.2.2}$$

This equation is in the form of a Fredholm integral equation of the first kind which may be solved for $\sigma(z)$; the remainder term is included on the left side as a component of a modified response functional. This equation is exact: no terms have been neglected and there is no requirement for $\delta\sigma = \sigma - \sigma_0$ to be small. Thus, the left side of (8.2.2) corresponds to a choice of response functional which may be expressed (exactly) as a linear functional of $\sigma$. If $R_j(\sigma, z)$ is known for all depths $z$, the modified responses may be evaluated and (8.2.2) inverted to construct a conductivity model $\sigma_1(z)$ using linear inverse theory. It should be noted that even in this ideal case when an exact equation is inverted, $\sigma_1(z)$ is not guaranteed to reproduce the measured responses, i.e. $R(\sigma_1, 0) = R(\sigma, 0)$ is not guaranteed. Rather, since the inversion has been formulated as a linear problem, the linear functionals of $\sigma$ and $\sigma_1$ will be identical, i.e.

$$\int_0^\infty G_j(\sigma_0, z)\,\sigma_1(z)\,dz = \int_0^\infty G_j(\sigma_0, z)\,\sigma(z)\,dz, \quad j = 1, \ldots, N. \tag{8.2.3}$$

However, for responses measured at a reasonable coverage of frequencies over a wide bandwidth, (8.2.3) represents a stringent condition requiring $\sigma_1$ to resemble the true model $\sigma$, and the constructed model will generally reproduce the measured responses.

Figure 8.1 shows an example of this inversion procedure for the synthetic test case described in Section 3.4.1. Figure 8.1(a) shows the true model (dashed line) and the starting model (solid line) which consists of a halfspace of conductivity 0.02 S/m. The corresponding misfit for this model is $\chi^2 = 76\,700$; the measured responses (represented as apparent conductivity and phase) and those predicted for the starting model are shown in the panel to the right. The modified response functionals, given as the left side of (8.2.2), were computed and this equation was inverted for the $l_2$ flattest conductivity model. The model constructed is shown in Fig. 8.1(b) and achieves the desired misfit of $\chi^2 = 50.0$ in one iteration; the fit to the data is shown in the panel to the right. The standard linearized solution neglects the higher-order information contained in the remainder term and requires a number of iterations to converge to an acceptable solution, as shown in Fig. 3.2 of Section 3.4.1.

Figure 8.1 One-step inversion for exact linearization. (a) shows the starting model (misfit $\chi^2 = 76\,700$) and (b) shows the $l_2$ flattest model (misfit $\chi^2 = 50.0$) constructed in one iteration when the responses are corrected for the exact remainder term. The responses predicted for the starting model and the constructed model are compared with the measured data in the panels on the right.
Of course, in any practical problem the electromagnetic fields are not measured at all depths, so $R(\sigma, z)$ and therefore the remainder term are generally not available. However, an approximate method of correcting the linearized solution can be developed which is applicable when only surface measurements are available. To formulate the method, (8.2.2) is rewritten in recursive form as

$$R_j(\sigma, 0) - R_j(\sigma_0, 0) + \int_0^\infty G_j(\sigma_0, z)\,\sigma_0(z)\,dz + \int_0^\infty \frac{i\omega_j}{\mu_0}\, G_j(\sigma_0, z) \left[ R_j(\sigma_k, z) - R_j(\sigma_0, z) \right]^2 dz = \int_0^\infty G_j(\sigma_0, z)\,\sigma_{k+1}(z)\,dz, \quad k = 0, 1, \ldots, \qquad (8.2.4)$$

where $\sigma_{k+1}$ on the right side represents the model that is constructed at the $k$th step and $R(\sigma_k, z)$ in the remainder term is used to approximate $R(\sigma, z)$. At step $k = 0$, the remainder term approximation of (8.2.4) is zero and the expression reduces to the standard linearized equation that may be inverted to produce a model $\sigma_1$. For an $l_2$ norm solution this requires computing the responses $R_j(\sigma_0, 0)$ and kernel functions $G_j(\sigma_0, z)$, and performing a singular value decomposition (SVD) of the inner product matrix $\Gamma$, as described in Chapter 3. The constructed model $\sigma_1$ should be a better approximation to the true model than the starting model $\sigma_0$; therefore $R(\sigma_1, z)$ is used in approximating the remainder term in step $k = 1$ and (8.2.4) is inverted for a new model $\sigma_2$. Note that step $k = 1$ requires only the computation of $R(\sigma_1, z)$ and the integration of the remainder term to update the left side of (8.2.4). The kernel functions are still evaluated at $\sigma_0$ and may be stored at the previous step and retrieved. In fact, the eigenvalues and eigenvectors of $\Gamma$ computed at the previous step may also be retrieved so an SVD is not required in this step. This represents a substantial saving in computational effort since the forward modelling required to compute the kernels and the SVD of $\Gamma$ are the most computationally intensive processes in an inversion iteration.
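
The saving described above arises because only the data side of the linear system changes between correction steps. The following is a minimal sketch of that idea, not the thesis implementation: it assumes a small symmetric positive-definite inner-product matrix and substitutes a Cholesky factorization for the SVD purely to keep the example free of dependencies. The matrix `gamma` and the corrected data vectors are hypothetical values chosen for illustration.

```python
import math

def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def solve_with_factor(L, b):
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Hypothetical inner-product matrix of kernels evaluated at the starting model.
gamma = [[4.0, 2.0, 0.6],
         [2.0, 5.0, 1.0],
         [0.6, 1.0, 3.0]]

factor = cholesky(gamma)  # expensive step, performed once at k = 0

# Each correction step supplies only a new (corrected) data vector.
corrected_data = [[1.0, 2.0, 3.0], [1.1, 1.9, 3.2], [1.15, 1.85, 3.25]]
coefficients = [solve_with_factor(factor, d) for d in corrected_data]
```

The factorization is computed once and every later solve costs only two triangular substitutions, which mirrors the reuse of the stored kernels and decomposition at steps $k \geq 1$.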
Thus, step $k = 1$ is simply a repeat of the inversion performed at step $k = 0$ with only the responses modified. The modification amounts to an approximate correction for the higher-order terms that were neglected in step $k = 0$. This correction can be repeated with $\sigma_2$ (the model produced by step $k = 1$) used in the remainder term approximation to produce a model $\sigma_3$ at step $k = 2$, and so on. The series of steps $k = 1, 2, \ldots$ represent sequential corrections to the responses used in the initial linearized inversion (step $k = 0$) with the corrected response functionals approximating more and more closely the modified responses of (8.2.2) for which the inverse problem is exactly linear. This method of successively correcting the responses and repeating the same linearized inversion is in contrast to the standard procedure of updating the model and kernels and performing a new linearized inversion at each iteration.

The new method of treating (8.2.2) is reminiscent of the Born approximation for integral equations used in scattering theory (e.g. Morse & Feshbach 1953, p. 1073). Since $R(\sigma, z)$ in the remainder term is unknown, it is initially set equal to the assumed starting value $R(\sigma_0, z)$. This is, in effect, the linearization approximation and allows the construction of a model $\sigma_1(z)$ using linear inversion methods. However, rather than simply repeating the process with the new model $\sigma_1$ as the starting model, $\sigma_1$ is used to compute an updated (non-zero) remainder term using $R(\sigma_1, z)$ to approximate $R(\sigma, z)$, and the initial linearized inversion is repeated with the responses corrected for the new remainder approximation. Before examples of this procedure are presented, however, an interesting correspondence is derived.
To obtain an alternate expression for the remainder term approximation in (8.2.4), consider the remainder term in the expansion of $R(\sigma_k, 0)$ about $\sigma_0$:

$$-\int_0^\infty \frac{i\omega_j}{\mu_0}\, G_j(\sigma_0, z) \left[ R_j(\sigma_k, z) - R_j(\sigma_0, z) \right]^2 dz = R_j(\sigma_k, 0) - R_j(\sigma_0, 0) - \int_0^\infty G_j(\sigma_0, z)\,\sigma_k(z)\,dz + \int_0^\infty G_j(\sigma_0, z)\,\sigma_0(z)\,dz. \qquad (8.2.5)$$

Substituting this expression into the formulation for the remainder term correction (8.2.4) leads to

$$R_j(\sigma, 0) - R_j(\sigma_k, 0) + \int_0^\infty G_j(\sigma_0, z)\,\sigma_k(z)\,dz = \int_0^\infty G_j(\sigma_0, z)\,\sigma_{k+1}(z)\,dz, \quad k = 0, 1, \ldots \qquad (8.2.6)$$

Equation (8.2.6) represents an alternative formulation for the sequence of corrections that is even simpler to evaluate than (8.2.4). For $k \geq 1$ only $R(\sigma_k, 0)$ and the integral $\int_0^\infty G(\sigma_0, z)\,\sigma_k(z)\,dz$ must be computed to update the left side of (8.2.6). $R(\sigma_k, 0)$ is generally available from the previous step (where it is computed in order to determine the misfit associated with $\sigma_k$) and the integral is very efficiently computed if $\sigma_k$ is taken to be a piece-wise constant function of depth on the partition $\{z_0, z_1, \ldots, z_M\}$ and values of $\int_{z_{i-1}}^{z_i} G(\sigma_0, z)\,dz$ are stored at step $k = 0$. By comparison, updating the modified response functional formulated according to (8.2.4) requires calculating $R(\sigma_k, z)$ for all depths $z$, and computing the remainder term requires numerical integration that is much less efficient. In either formulation, once the responses are updated in correction step $k \geq 1$, the inversion is very efficient since computation of the kernel functions and the SVD of the inner-product matrix are required only at step $k = 0$.

It is interesting to note that (8.2.6) is identical to the expression on which the standard linearized model norm inversion is based, except that the kernel functions are evaluated at the starting model for each step and not updated. This is recognized to correspond to the modified Newton's method for operator equations, described in Section 2.1.1.
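
The correspondence between (8.2.6) and the modified Newton's method can be checked numerically on a small discrete analogue. The sketch below is an illustration under stated assumptions rather than an MT computation: the two-parameter map `forward` is a hypothetical stand-in for the responses, its Jacobian plays the role of the stored layer integrals of the kernels (so that a matrix-vector product replaces the integral of $G$ against a piece-wise constant model), and the system is square so that it can be solved exactly instead of by a model norm construction.

```python
def forward(m):
    """Hypothetical non-linear forward map standing in for the MT responses."""
    return [m[0] + 0.1 * m[0] * m[1], m[1] + 0.1 * m[0] ** 2]

def jacobian(m):
    """Derivative of `forward`; plays the role of the discretized kernels."""
    return [[1.0 + 0.1 * m[1], 0.1 * m[0]], [0.2 * m[0], 1.0]]

def solve2(a, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

def matvec(a, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

m_true = [1.0, 2.0]
data = forward(m_true)            # synthetic "measured" responses
m_start = [0.8, 1.5]
g0 = jacobian(m_start)            # kernels evaluated once, at the starting model

m_resp = list(m_start)            # iterate of the corrected-response form (8.2.6)
m_newt = list(m_start)            # iterate of the modified Newton form
for k in range(20):
    # (8.2.6): data - F(m_k) + G0 m_k = G0 m_{k+1}
    pred = forward(m_resp)
    g0m = matvec(g0, m_resp)
    lhs = [data[i] - pred[i] + g0m[i] for i in range(2)]
    m_resp = solve2(g0, lhs)
    # modified Newton: m_{k+1} = m_k + G0^{-1} (data - F(m_k))
    pred = forward(m_newt)
    step = solve2(g0, [data[i] - pred[i] for i in range(2)])
    m_newt = [m_newt[i] + step[i] for i in range(2)]
```

Both sequences coincide up to rounding and approach `m_true`, since the two update rules are algebraically the same once the kernels are held fixed.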
Thus, correcting the responses by approximating the higher-order terms is equivalent to performing a modified Newton step; the procedure may be considered from either perspective. Inversion algorithms based on both formulations have been developed and give identical results; however, (8.2.6) leads to a computationally more efficient algorithm.

A question of practical importance concerns the effectiveness of this method in proceeding from a simple starting model to a complex constructed model which reproduces a set of measured responses. This question is investigated in Fig. 8.2 by considering the inversion of the synthetic MT test case initiated from three different starting models $\sigma_0$. In Fig. 8.2(a) the starting model consists of a 300-m thick surface layer of conductivity 0.004 S/m overlying a halfspace of conductivity 0.02 S/m; this model is indicated by the dotted line and has a misfit of $\chi^2$ = 6890. The model constructed from 9 correction steps applied to an initial linearized inversion is indicated by the solid curve (in this construction $m(z) = \sigma(z)$, $f(z) = \log(z + z_0)$ and the correct surface conductivity value was supplied). The constructed model achieves the desired misfit of $\chi^2$ = 50.0; however, the model has slightly more structure at depth than the equivalent (iterated) linearized inversion, shown in Fig. 3.3. This point will be considered in more detail later. The initial linearized inversion and correction steps were carried out with the target misfit $\chi^2$ at each step representing a reduction in the actual misfit by a factor of 5, i.e. P = 5 in (3.3.6). This reduction step is quite conservative and probably not optimal in terms of a rapid reduction in the misfit; however, it ensures a very stable inversion. In order to evaluate the improvement that results from the correction steps, the solution is compared to the best-fitting model that could be constructed from a single linearized inversion iteration.
This best-fitting model was constructed by a trial-and-error procedure of adjusting the value of P and performing a single linearized inversion until the (approximate) optimal value was determined which led to the greatest reduction in the misfit. The optimal value was found to be P = 100; the model constructed from this linearized inversion step is indicated by the dotted curve in Fig. 8.2(a). The linearized solution has a misfit of $\chi^2$ = 613; thus, the correction steps have reduced the misfit by more than a factor of 10. The panel to the right compares the observed responses to those computed for the correction-scheme inversion (solid curve) and the single linearized inversion iteration (dotted curve). Finally, it is noted that while the full linearized inversion required about 22 s of CPU time on a SUN 4/310 workstation, each correction step required only 1.0 s.

[Figure 8.2: plot not reproduced; the response panels are plotted against period T (s).]
Figure 8.2 Correction-scheme inversion for $m(z) = \sigma(z)$. In (a), (b) and (c) the true model is given by the dashed line, the starting model by the dotted line, the best-fitting linearized model by the dotted curve and the model constructed by successively correcting the linearization for higher-order terms by the solid curve. $\chi^2$ misfit values for each model are given in Table 8.1. The panels on the right compare the predicted responses for the correction-scheme model (solid line) and the linearized model (dotted line) with the measured data.

Table 8.1 Summary of misfits for the starting model, best-fitting linearized inversion solution and correction-scheme solution for Fig. 8.2, with model $m(z) = \sigma(z)$.

               chi^2      chi^2        chi^2        Number of
               Starting   Linearized   Correction   corrections
Fig. 8.2(a)    6 890      613          50.0         9
Fig. 8.2(b)    76 700     2351         69.7         14
Fig. 8.2(c)    33 400     7700         1820         40

Table 8.2 Summary of misfits for the starting model, best-fitting linearized inversion solution and correction-scheme solution for Fig. 8.3, with model $m(z) = \log \sigma(z)$.

               chi^2      chi^2        chi^2        Number of
               Starting   Linearized   Correction   corrections
Fig. 8.3(a)    6 890      1183         50.0         12
Fig. 8.3(b)    76 700     7750         50.0         27
Fig. 8.3(c)    33 400     3470         2100         12

Figure 8.2(b) illustrates a similar comparison except that the starting model consists of a 0.02-S/m halfspace (dotted line) which has a misfit of $\chi^2$ = 76 700. The model constructed from 14 correction steps (solid curve) achieves a misfit of $\chi^2$ = 69.7, slightly larger than the desired value of 50; however, further correction steps do not reduce this misfit significantly. Again, the constructed model exhibits some unnecessary structure at depth. The best-fitting model that could be constructed from a single linearized iteration is indicated by the dotted curve; this model has a misfit of $\chi^2$ = 2350. Thus, the correction steps have reduced the misfit by more than a factor of 30. The best-fitting linearized solution exhibits the correct general increase in conductivity with depth, but only hints at the layered structure with no indication of the high conductivity zone at 6000-10000 m depth. By comparison, the correction-scheme solution clearly resolves all layers. The panel to the right compares the responses computed for both solutions. A significant improvement is indicated for the correction-scheme solution, particularly in the phase responses.

In Fig. 8.2(c) the starting model consists of a 0.004-S/m halfspace which has a misfit of $\chi^2$ = 33 400. The model constructed from 40 correction steps achieved a misfit of $\chi^2$ = 1820. In this case the correction scheme is not able to converge to an acceptable solution and the model exhibits incorrect structure at depth.
However, this solution is still a considerable improvement on the best-fitting linearized model which has a misfit of $\chi^2$ = 7700; this improvement is evident in the panel on the right which compares the responses computed for both solutions. The misfits associated with the models shown in Fig. 8.2 are summarized in Table 8.1.

The basis for the correction-scheme inversion for the conductivity model $\sigma(z)$ is the exact equation (8.2.2). As described in Section 2.5.1, this equation can be recast in terms of $\log \sigma(z)$ as the model; however, this requires neglecting second-order terms. Therefore, it is important to verify whether the correction scheme still yields satisfactory results in this case. Figure 8.3 shows a comparison identical to that of Fig. 8.2 except that the model is taken to be $m(z) = \log \sigma(z)$. The misfits associated with the starting model, best-fitting linearized solution and correction-scheme solution and the number of correction steps applied are summarized in Table 8.2. The results are similar to those described above for Fig. 8.2: for the starting models shown in Fig. 8.3(a) and (b) the correction scheme achieves an acceptable solution, although some unnecessary structure is evident when the models are compared with the (iterated) linearized solution, shown in Fig. 3.4. In Fig. 8.3(b) the correction scheme achieves a misfit which is more than two orders of magnitude smaller than that of the best-fitting linearized inversion. In Fig. 8.3(c) the correction scheme achieves a smaller misfit than the best-fitting linearized inversion, but does not converge to an acceptable model and the solution exhibits incorrect structure at depth. Figure 8.3 indicates that the correction scheme is an effective solution for $m(z) = \log \sigma(z)$.

[Figure 8.3: plot not reproduced; the response panels are plotted against period T (s).]
Figure 8.3 Correction-scheme inversion for $m(z) = \log \sigma(z)$. In (a), (b) and (c) the true model is given by the dashed line, the starting model by the dotted line, the best-fitting linearized model by the dotted curve and the model constructed by successively correcting the linearization for higher-order terms by the solid curve. $\chi^2$ misfit values for each model are given in Table 8.2. The panels on the right compare the predicted responses for the correction-scheme model (solid line) and the linearized model (dotted line) with the measured data.

Figures 8.2 and 8.3 indicate that the higher-order correction scheme can result in a significant reduction in the misfit over the best-fitting linearized inversion, and in some cases converges to an acceptable model. However, it is noted that even when the misfit converges to the desired value, the constructed models often exhibit incorrect structure at depth (in comparison to models produced via iterated linearization). This may be understood by examining the recursive formulation (8.2.4). The error in (8.2.4) arises from approximating $R(\sigma, z)$ in the remainder term by $R(\sigma_k, z)$. In general, this approximation will be somewhat in error at all depths. However, since the depth of penetration of the electromagnetic fields, and therefore of the Fréchet kernels $G_j(z)$ and response function $R_j(z)$, increases with decreasing frequency, the accumulated error in the integral approximating the remainder term should be larger at the lower frequencies. Since it is the low-frequency responses that determine the deep structure, the greater accumulated error in the modified response functionals at low frequencies likely results in the incorrect structure at depth.

8.3 A practical inversion algorithm

The procedure of applying successive correction steps to a linearized inversion represents a fixed point iteration method. This procedure should exhibit linear convergence when it converges; by comparison, the standard (iterated) linearized approach exhibits quadratic convergence.
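
The difference between linear and quadratic convergence can be seen on a scalar toy problem. The sketch below is only an analogy for the operator case: it assumes the hypothetical equation $x^2 - 2 = 0$, and compares Newton's method (derivative re-evaluated at every step, analogous to updating the kernels) with the modified Newton's method (derivative frozen at the starting point, analogous to the correction steps).

```python
import math

def f(x):
    return x * x - 2.0           # toy scalar equation with root sqrt(2)

def df(x):
    return 2.0 * x

def iterate(x0, frozen, n_steps):
    """Return |x_k - root| for k = 0..n_steps.

    frozen=True keeps the derivative at x0 (modified Newton, analogous to
    the correction steps); frozen=False re-evaluates it (standard Newton).
    """
    root = math.sqrt(2.0)
    slope0 = df(x0)
    x = x0
    errors = [abs(x - root)]
    for _ in range(n_steps):
        slope = slope0 if frozen else df(x)
        x = x - f(x) / slope
        errors.append(abs(x - root))
    return errors

def steps_to_tolerance(errors, tol):
    """Index of the first iterate with error below tol (None if never)."""
    for k, e in enumerate(errors):
        if e < tol:
            return k
    return None

newton_err = iterate(1.5, frozen=False, n_steps=10)
modified_err = iterate(1.5, frozen=True, n_steps=10)
```

With the derivative frozen, the error shrinks by a roughly constant factor per step (linear convergence), whereas the Newton error is approximately squared at each step, so fewer Newton steps are needed to reach a given tolerance. Each frozen step is cheaper, however, which is the trade-off exploited in this section.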
The advantage of the correction method is in the greatly reduced computational requirements: once an initial linearized inversion is performed, a number of correction steps may be carried out at a fraction of the computational expense of performing another complete linearized inversion. However, if the starting model $\sigma_0$ is sufficiently far from an acceptable solution it may be that this procedure requires a prohibitive number of steps to converge or does not converge at all. In many cases it appears that the most effective approach is to perform a small number of correction steps (two to ten) between standard linearized inversion iterations. This procedure should converge whenever the standard linearized method converges. We have found that this correction procedure generally reduces the number of iterations and total computation time required to converge to an acceptable model. In this section several examples of this procedure are presented.

Figure 8.4 compares the misfit as a function of (linearized) iteration number for the standard linearized inversion (squares) and the linearized inversion corrected for higher-order terms (triangles) applied to the synthetic MT test case. The starting model consists of a 0.004-S/m halfspace and the model is $m(z) = \sigma(z)$. The dashed line indicates the desired misfit value of $\chi^2$ = 50. The target misfit $\chi^2$ at each step of the corrected inversion represented a reduction in the actual misfit by a factor of P = 5. Five corrections were applied to the first linearized iteration; however, only two were required at the second iteration to reduce the misfit to the desired level. To evaluate the improvement that results directly from the higher-order correction steps, a trial-and-error procedure was used to find the optimum value of P which results in the fewest iterations for the linearized inversion. However, even for the optimum value of P = 100, four linearized iterations were required.
For this problem a complete linearized inversion requires about 22 s computation time while a correction step requires about 1.0 s; thus, the standard linearized inversion required about 88 s total computation time, while the corrected inversion required only about 52 s. The model constructed by the corrected linearized scheme is shown in Fig. 8.5(a) by the solid curve, and its fit to the measured responses is shown in Fig. 8.5(b) and (c). The model constructed by the standard linearized inversion method is also shown in Fig. 8.5(a) by the dotted curve. The two models are almost identical although the standard inversion procedure results in slightly less structure (as measured by the $l_2$ norm of the model gradient). However, if one further linearized inversion is performed in the correction scheme, the model obtained is identical to that shown for the standard inversion.

[Figure 8.4: plot not reproduced; misfit is plotted against iteration number.]
Figure 8.4 $\chi^2$ misfit as a function of (linearized) inversion iteration number for the synthetic MT test case. The squares represent the optimal standard linearized inversion, and the triangles represent the linearized inversion corrected for higher-order terms. The starting model was a 0.004-S/m halfspace and $m(z) = \sigma(z)$.

[Figure 8.5: plot not reproduced; panel (a) is plotted against depth z (m), panels (b) and (c) against period T (s).]
Figure 8.5 $l_2$ flattest models constructed in the inversion of Fig. 8.4. In (a) the dotted curve indicates the model constructed by the standard linearized inversion, the solid curve indicates the model constructed by the linearized inversion corrected for higher-order terms. The dashed line indicates the true model. (b) and (c) compare the responses predicted for the corrected inversion.
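
The interleaved strategy used for Fig. 8.4 (a few inexpensive correction steps between full linearized iterations) can be sketched on a toy scalar problem. This is a hedged illustration only: the equation $x^2 - 2 = 0$, the starting value and the function names are hypothetical, a full Newton step stands in for a complete linearized inversion, and a frozen-derivative step stands in for a correction step. The cost weights of 22 and 1 are the approximate per-step timings quoted above, used here simply as relative weights.

```python
def f(x):
    return x * x - 2.0            # toy non-linear equation (hypothetical)

def df(x):
    return 2.0 * x

def hybrid_solve(x0, n_corr, tol=1e-10, max_outer=50):
    """Full Newton steps with up to n_corr frozen-derivative corrections after each.

    Returns (x, number of full steps, number of correction steps).
    """
    x, n_full, n_fix = x0, 0, 0
    while abs(f(x)) >= tol and n_full < max_outer:
        slope = df(x)             # "expensive": derivative re-evaluated
        x -= f(x) / slope
        n_full += 1
        for _ in range(n_corr):
            if abs(f(x)) < tol:
                break
            x -= f(x) / slope     # "cheap": derivative reused
            n_fix += 1
    return x, n_full, n_fix

def cost(n_full, n_fix, t_full=22.0, t_fix=1.0):
    """Weighted cost using the approximate per-step timings quoted in the text."""
    return n_full * t_full + n_fix * t_fix

x_std, full_std, fix_std = hybrid_solve(5.0, n_corr=0)
x_hyb, full_hyb, fix_hyb = hybrid_solve(5.0, n_corr=3)
```

In this toy setting the interleaved scheme reaches the tolerance with fewer full steps than the uncorrected iteration, and with a lower weighted cost, which is qualitatively the behaviour reported for the synthetic MT test case. The number of corrections per full step (`n_corr`) is a tuning choice, in the same spirit as the two to ten corrections suggested in the text.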
As a final example of the corrected linearization scheme, consider the LITHOPROBE data set measured in southeastern British Columbia, described in Section 3.4.1. In this example the data are represented as amplitude and phase of $R$ (rather than real and imaginary parts) and the model is taken to be $m(z) = \log \sigma(z)$; both of these choices result in additional higher-order terms that are neglected in (8.2.2), as described in Section 2.5.1. Figure 8.6 compares the misfit as a function of iteration number for the standard and corrected linearized inversions (the absolutely flattest model was constructed in each case). Again, a conservative value of P was used for the corrected inversion and a trial-and-error procedure was performed to find the value of P which resulted in the most rapid convergence for the standard linearized algorithm. Nonetheless, the corrected inversions converged to the desired misfit value of $\chi^2$ = 244 (dashed line) in three linearized inversions with seven correction steps per inversion, while the standard linearized inversion required six iterations. The two constructed models are shown to be essentially identical in Fig. 8.7.

The examples presented in this chapter have involved $l_2$ model norm inversions using SVD and the method of spectral expansion to perform the inversion. It is straightforward to apply a similar correction procedure to model norm constructions which carry out the inversions using linear programming (LP). In this case, since only the responses and not the kernel functions are modified at each correction step, the computational expense of performing the correction steps can be greatly reduced by storing the LP solution basis at each step and retrieving it at the subsequent step.

[Figure 8.6: plot not reproduced; misfit is plotted against iteration number.]
Figure 8.6 $\chi^2$ misfit as a function of (linearized) inversion iteration number for the LITHOPROBE MT data set. The squares represent the optimal standard linearized inversion, and the triangles represent the linearized inversion corrected for higher-order terms. The starting model was a 0.0004-S/m halfspace and $m(z) = \log \sigma(z)$.

[Figure 8.7: plot not reproduced; panel (a) is plotted against depth z (m), panels (b) and (c) against period T (s).]
Figure 8.7 $l_2$ flattest models constructed in the inversion of Fig. 8.6 (LITHOPROBE MT data set). In (a) the dotted curve indicates the model constructed by the standard linearized inversion, the solid curve indicates the model constructed by the linearized inversion corrected for higher-order terms. (b) and (c) compare the responses predicted for the corrected inversion.

Chapter 9
Summary and discussion

The purpose of the work presented in this thesis was to develop and apply methods of inverse theory to the problem of inferring information about the Earth conductivity structure from magnetotelluric measurements. The MT inverse problem is functionally non-linear; however, linearization allows a variety of construction and appraisal algorithms to be developed using techniques of linear inverse theory.

The linearization of the MT problem was considered in detail in Chapter 2. Complete expansions for several choices of MT response were derived by a small modification to a standard perturbation approach. These expansions consist of a linear term and an infinite series of higher-order Fréchet differentials. The higher-order terms sum to a closed-form remainder term which conveniently represents the linearization error. In a linearized approach the higher-order terms are neglected; however, inversion procedures which require second-order terms are used in optimization theory (e.g. Gill et al. 1981) but have not been applied to the MT inverse problem.
The second-order term derived here could be used in such schemes. Also, the remainder term can be used to correct inversions for the linearization error, as described later. The expansions were derived for arbitrary measurement depth $z_m$; however, it was shown that the conductivity above this depth is irrelevant, and therefore, if measurements are made at depth it is always possible to translate the coordinate system so that $z_m = 0$ and the surface-measurement expressions apply. The expansions were illustrated for the simple case of constant-conductivity models where Fréchet differentiation is equivalent to ordinary differentiation and the terms can be evaluated directly.

In a linearized approach, it is important to verify that neglected terms are of second order, which implies that the response is Fréchet differentiable. Although Fréchet differentiability of the $c$ response has received considerable attention, the proof is problem-dependent and has not been presented for the $R$ response in the literature. In Chapter 2 the Fréchet differentiability of $R$ is proved. This verifies that algorithms based on the linearization of this response are well-founded.

In a non-linear problem such as MT, the choice of response may well affect the linearity of the problem, with the correct choice resulting in the most accurate and efficient linearized algorithm. The relative linearity of the $R$ and $c$ responses was quantified by considering the ratio of non-linear to linear terms. For the special case of constant conductivity profiles, it was proved analytically that $R$ is always more linear than $c$. The general case was investigated by considering the linearity ratios for a number of representative models. The conclusions of this study are that $R$ is generally more linear than $c$ and is therefore the preferred choice of response for linearized inversion.

Alternative formulations to linearization have been suggested.
Gomez-Trevino (1987) used scaling properties of Maxwell's equations to derive an exact, non-linear integral equation relating the conductivity to measured responses. He called this expression the similitude equation, and suggested that MT inversion could be based on this formulation rather than linearization. The similitude equation was examined in Chapter 2 and found to be inadequate for inversion in that it implicitly ignores first-order terms. In fact, the uniqueness of the Fréchet derivative implies that the linearized expression is the only linear relationship between response and model that is accurate to first order. Therefore, linearization would seem to be the obvious basis for model norm inversions.

In Chapter 3 an iterative inversion algorithm was developed based on the linearization of the $R$ response. This algorithm may be used to construct acceptable conductivity models which minimize an $l_2$ norm. Two model norms were considered. The smallest-deviatoric model may be constructed by minimizing the norm of the deviation from an arbitrary base model. The base model may represent the best estimate of the Earth structure from any available information (e.g. well logs, geology), or it may be chosen to investigate the range of acceptable models. Alternatively, the flattest or minimum-structure model may be constructed by minimizing the $l_2$ norm of the model gradient. Constructing minimum-structure models reduces the possibility of being misled by model features that are not required by the data. There is reason to believe that features of the minimum-structure solution are characteristics essential to fitting the responses. The Earth may be more complex than the simplest model, but these additional complexities are not resolved by data and are not justified in the constructed model. The $l_2$ flattest and smallest-deviatoric solutions tend to be smooth models which represent structure in terms of continuous gradients in the conductivity.
The standard method of constructing flattest models requires specifying a model value at some fixed point a priori in order to express the data constraints in terms of the model derivative. In general, the flattest model is not a unique entity since supplying different values leads to different flattest models. If a model value is known reliably, it is valuable to include this information in the inversion; however, specifying an inaccurate value can introduce false structure into the constructed model. A method was presented in Appendix A to compute an optimal model value directly from the responses. This value is optimal in the sense that it results in the absolutely flattest model which has the smallest possible norm of the model gradient. It was shown in Chapter 3 that this procedure can significantly reduce the amount of structure in constructed conductivity models.

Chapter 4 presented an inversion algorithm which uses linear programming to construct minimum-structure or smallest-deviatoric models that minimize an $l_1$ norm. Both model norms are formulated as measures of variation: the minimum-structure model minimizes the variation of the model with depth, and the smallest-deviatoric model minimizes variation from the base model. This also makes it straightforward to minimize a functional which combines both model norms. In addition, the minimum-structure solution is formulated in terms of the model, not its derivative, which obviates the need to integrate the data equations or specify a model value. The $l_1$ solutions resemble layered earth models with structural variations occurring discontinuously at distinct depths. This is in contrast to the smooth solutions constructed in Chapter 3. It is important to recognize that in either case the form of the solution (gradient or layered) is due to the inversion procedure and is not demanded by the measurements. The $l_1$ and $l_2$ inversions offer complementary solutions, and in practice a complete interpretation should consider both.
The inversion algorithms presented in Chapters 3 and 4 consider the model to be either $\sigma$ or $\log \sigma$, include an arbitrary weighting function in the model norm, and fit the data to a specified level of misfit; these algorithms should provide considerable flexibility in constructing acceptable conductivity profiles. This development is based on the belief that it is always valuable to produce a variety of models and to take into account as much additional information or insight into the problem as possible. Such flexibility in model construction allows some exploration and understanding of the range of acceptable solutions.

As a result of the inherent non-uniqueness of the inverse problem, a finite data set cannot impose any bounds on the value of the model at a point. However, model averages over a finite width are generally constrained by the data. Model features can be appraised by constructing extremal models which minimize and maximize localized conductivity averages (Oldenburg 1983). These extremal models provide lower and upper bounds for the conductivity average over the region of interest. In Chapter 5 an efficient and robust appraisal algorithm was developed which uses linear programming to extremize conductivity averages. For a given depth of interest $z_0$, upper and lower bounds may be computed for a number of averaging widths $\Delta$ and plotted as a function of $\Delta$. This yields a funnel function diagram which provides immediate insight into the resolving power of the data at $z_0$. For optimal bounds it is important that the constructed extremal models are geophysically reasonable. The appraisal method was extended by constraining the total variation to limit unrealistic structure and ensure that the extremal models are plausible. The variation bound can be specified in terms of $\sigma$ or $\log \sigma$.
The extremal models of Chapter 5 as well as the minimum-structure and smallest-deviatoric models of Chapters 3 and 4 are constructed via linearized inversion; therefore, the possibility always exists of the algorithm becoming trapped in a local (rather than global) minimum. In practice, it is difficult to verify that the solution represents the absolute minimum. One method of investigating this is to repeat the inversion with different starting models. We have initiated our inversion procedures from diverse starting models and never found a case where the solution differed significantly. This does not constitute proof that a global minimum has been found, but does provide confidence that the algorithms are not strongly dependent on the starting model.

In order to corroborate the results of the linearized appraisal, a method of constructing extremal models using simulated annealing was developed in Chapter 6. Simulated annealing is a Monte-Carlo optimization procedure based on an analogy between the parameters of a mathematical system to be optimized and particles of a physical system which cools and anneals into its ground state according to the theory of statistical mechanics. Simulated annealing was developed as a combinatorial optimization procedure (Kirkpatrick et al. 1983) and is well known for its inherent ability to avoid unfavourable local minima. In the cases we have considered, simulated annealing and linearized appraisal yield essentially identical results for the model-average bounds. This provides confidence that better extrema cannot be obtained and that meaningful bounds have been computed. In addition, examining the extremal models constructed by each method indicates a close correspondence with Weidelt's (1985) analytic solution to the extremal model problem for a small number of exact MT data. Although the simulated annealing approach is considerably slower than the linearized analysis, it represents an interesting new appraisal technique.
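The annealing idea summarized above can be illustrated with a minimal sketch. This is not the appraisal algorithm of Chapter 6: the objective `energy`, the geometric cooling schedule, the Gaussian step size and all parameter values are hypothetical choices made only to show the Metropolis acceptance rule, which occasionally accepts uphill moves and so can escape local minima.

```python
import math
import random


def anneal(energy, x0, lo, hi, t_start=1.0, t_end=1e-3, cooling=0.95,
           sweeps=50, seed=0):
    """Minimize energy(x) on [lo, hi] with Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    x, e_x = x0, energy(x0)
    best_x, best_e = x, e_x
    temp = t_start
    while temp > t_end:
        for _ in range(sweeps):
            # Random perturbation of the current state, clipped to the interval.
            trial = min(hi, max(lo, x + rng.gauss(0.0, 0.5)))
            e_trial = energy(trial)
            # Metropolis rule: always accept downhill moves, and accept
            # uphill moves with probability exp(-dE / T).
            if e_trial <= e_x or rng.random() < math.exp(-(e_trial - e_x) / temp):
                x, e_x = trial, e_trial
                if e_x < best_e:
                    best_x, best_e = x, e_x
        temp *= cooling  # lower the temperature slowly
    return best_x, best_e


def energy(x):
    # Hypothetical objective with several local minima.
    return 0.1 * x * x + math.sin(3.0 * x)


x_start = 4.0
x_best, e_best = anneal(energy, x_start, -5.0, 5.0)
print(x_best, e_best)
```

The sketch keeps track of the best state visited, so the returned value can never be worse than the starting point; how close it comes to the global minimum depends on the cooling schedule, which is the main tuning choice in any annealing scheme.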
The method of simulated annealing appraisal is general, requires no approximations (such as linearization) and can be applied to any inverse problem for which the forward problem can be solved.

In Chapter 7, model construction procedures were applied to analyze MT responses measured yearly over the past four years at a number of sites on Vancouver Island, Canada. The data were recorded to monitor local changes in conductivity as a possible earthquake precursor. Previous MT studies have evaluated precursor signals based on changes in the measured responses (impedances or apparent resistivities). However, formulating an explicit inverse problem allows investigation of the corresponding changes required in conductivity models of the Earth. This may be much more informative in terms of determining the processes and depths involved in observed changes in the data. In particular, a model constructed from data measured in one year was used as the base model in a smallest-deviatoric inversion of responses measured in a subsequent year. This provided a direct method of investigating the changes required in the earth conductivity model to accommodate the yearly variations in the data. In the current study, no changes judged to be significant were observed in responses or models, and no seismic events larger than magnitude 3.0 occurred during the study period.

In Chapter 8, a method of correcting linearized inversion iterations was developed. The corrections consist of successively approximating the linearization error using the analytic expression for the remainder term derived in Chapter 2. These corrections are used to approximate a response functional for which the inverse problem is exactly linear. The procedure is reminiscent of the Born approximation for integral equations. This method would seem to represent a novel approach to linearized inversion and was implemented in a practical algorithm.
It was shown that correcting the linearized solution at each iteration can reduce the total number of linearized iterations and the total computational effort required to converge to an acceptable model. In addition, a correspondence between the corrected linearized solutions and iterations of the modified Newton's method for operators was established.

The methods developed in this thesis represent a comprehensive package of construction and appraisal algorithms for investigating the 1-D MT inverse problem. The algorithms that have been implemented are robust, practical, and efficient (except for simulated annealing). The algorithms were illustrated using synthetic test cases and MT field data collected as part of the LITHOPROBE Southern Cordilleran transect in southeastern British Columbia.

Finally, it should be noted that many of the methods developed here for the 1-D MT problem are general and could be applied to a variety of inverse problems. For instance, the author has implemented an $l_2$ inversion algorithm for the 1-D dc resistivity problem using Fréchet kernels derived by Oldenburg (1978), and has applied appraisal using extremal models of bounded variation to the problem of inferring plasma densities from laser interferometric data (e.g. Oldenburg & Samson 1979). An application that is particularly interesting would involve extending the $l_1$ minimum-variation solution to 2-D model construction. This is straightforward for a cellular 2-D model since the horizontal variation can be expressed in exactly the same manner as the vertical variation. A linear programming formulation analogous to that developed in Chapter 4 may then be used to minimize an objective function which represents the total variation in two dimensions. This would seem to be a promising approach to constructing minimum-structure 2-D models for the MT problem, but is beyond the scope of this thesis.

References

Adam, A., 1980.
Relation of mantle conductivity to physical conditions in the asthenosphere, Geophysical Surveys, 4, 43-55.
Anderssen, R. S., 1975. On the inversion of global electromagnetic induction data, Phys. Earth Planet. Int., 10, 292-298.
Aubin, J. P. & Ekeland, I., 1984. Applied nonlinear analysis, John Wiley & Sons, New York.
Backus, G. E., 1970a. Inference from inadequate and inaccurate data, 1, Proc. Natn. Acad. Sci. USA, 65, 1-7.
Backus, G. E., 1970b. Inference from inadequate and inaccurate data, 2, Proc. Natn. Acad. Sci. USA, 65, 281-287.
Backus, G. E., 1970c. Inference from inadequate and inaccurate data, 3, Proc. Natn. Acad. Sci. USA, 67, 282-289.
Backus, G. E., 1972. Inference from inadequate and inaccurate data, in Mathematical problems in geophysical sciences, American Mathematical Society, Providence, RI.
Backus, G. E. & Gilbert, F., 1967. The resolving power of gross earth data, Geophys. J. R. astr. Soc., 16, 169-205.
Backus, G. E. & Gilbert, F., 1968. Numerical applications of a formalism for geophysical inverse problems, Geophys. J. R. astr. Soc., 13, 247-276.
Backus, G. E. & Gilbert, F., 1970. Uniqueness in the inversion of inaccurate gross earth data, Phil. Trans. R. Soc. Lond. Ser. A, 266, 123-192.
Barker, J. A. & Henderson, D., 1976. What is 'liquid'? Understanding the states of matter, Reviews of Modern Physics, 48-4, 587-671.
Barsukov, O. M., 1972. Variations of electrical resistivity of mountain rocks connected with tectonic causes, Tectonophysics, 14, 273-277.
Berdichevsky, M. N. & Dimitriev, V. I., 1976. Basic principles of interpretation of magnetotelluric sounding curves, in Adám, A., Ed., Geoelectric and geothermal studies, KAPG Geophysical Monograph, Akad. Kiado, 165-221.
Bertero, M., De Mol, C. & Pike, E. R., 1985. Linear inverse problems with discrete data: I. General formulation and singular system analysis, Inverse Problems, 1, 301-330.
Bertero, M., De Mol, C. & Pike, E. R., 1988. Linear inverse problems with discrete data: II.
Stability and regularization, Inverse Problems, 4, 573-594.
Binder, K., 1978. Monte Carlo methods in statistical physics, Springer, New York.
Cagniard, L., 1953. Basic theory of the magnetotelluric method of geophysical prospecting, Geophysics, 18, 605-635.
Cassidy, J. F., Ellis, R. M. & Rogers, G. C., 1988. The 1918 and 1957 Vancouver Island earthquakes, Bull. Seism. Soc. Am., 78, 617-635.
Chave, A. D., 1984. The Fréchet derivatives of electromagnetic induction, J. Geophys. Res., 89, 3373-3380.
Claerbout, J. F. & Muir, F., 1973. Robust modelling with erratic data, Geophysics, 38, 826-844.
Constable, S. C., Parker, R. L. & Constable, C. G., 1987. Occam's inversion: a practical algorithm for generating smooth models from EM sounding data, Geophysics, 52, 289-300.
Dosso, S. E. & Oldenburg, D. W., 1989. Linear and non-linear appraisal using extremal models of bounded variation, Geophys. J. Int., 99, 483-495.
Dragert, H. & Lisowski, M., 1990. Crustal deformation measurements on Vancouver Island, British Columbia: 1976 to 1988, Proceedings of the IAG symposium, Edinburgh, 1988, In press.
Egbert, G. D. & Booker, J. R., 1986. Robust estimation of geomagnetic transfer functions, Geophys. J. R. astr. Soc., 87, 173-194.
Electromagnetic Research Group For The Active Fault, 1982. Low electrical resistivity along an active fault, the Yamasaki fault, J. Geomag. Geoelectr., 34, 103-127.
Electromagnetic Research Group For The Active Fault, 1983. Electrical resistivity structure of the Tanna and the Ukihashi fault, Bull. Earthq. Res. Inst., 58, 265-286.
Filloux, J. H., 1980. Magnetotelluric soundings over the Northeast Pacific may reveal spatial dependence of depth and conductance of the asthenosphere, Earth Planet. Sci. Lett., 46, 244-252.
Fischer, G. & Le Quang, B. V., 1981. Topography and minimization of the standard deviation in one-dimensional magnetotelluric modelling, Geophys. J. R. astr. Soc., 67, 279-292.
Fullagar, P. K., 1981.
Inversion of horizontal loop electromagnetic soundings over a stratified earth, Ph.D. thesis, University of British Columbia, Vancouver.
Gabrielse, H., Monger, J. W. H., Wheeler, J. O. & Yorath, C. J., 1990. Morphogeological belts, tectonic assemblages and terranes, The Cordilleran Orogen in Canada, GSC, Geology of Canada, 4, In press.
Garland, G. D., 1975. Correlation between electrical conductivity and other geophysical parameters, Phys. Earth Planet. Int., 10, 220-230.
Gass, S. I., 1975. Linear programming: methods and applications, McGraw-Hill, New York.
Gill, P. E., Murray, W. & Wright, M. H., 1981. Practical optimization, Academic Press, London.
Gómez-Treviño, E., 1987. Nonlinear integral equations for electromagnetic inverse problems, Geophysics, 52, 1297-1302.
Gough, D. I., 1974. Electrical conductivity under western North America in relation to heat flow, seismology, and structure, J. Geomag. Geoelectr., 26, 105-123.
Green, A. G., Clowes, R. M., Yorath, C. J., Spencer, C., Kanasewich, E. R., Brandon, M. T. & Sutherland-Brown, A., 1986. Seismic reflection imaging of the subducting Juan de Fuca plate, Nature, 317, 210-213.
Griffel, D. H., 1981. Applied functional analysis, John Wiley & Sons, New York.
Hermance, J. F. & Grillot, L. R., 1974. Constraints on temperatures beneath Iceland from magnetotelluric data, Phys. Earth Planet. Int., 8, 1-12.
Heustis, S. P., 1987. Construction of non-negative resolving kernels in Backus-Gilbert theory, Geophys. J. R. astr. Soc., 90, 495-500.
Heustis, S. P., 1988. Positive resolving kernels and annihilators in linear inverse theory, Geophys. J. Int., 94, 571-573.
Hobbs, B. A., 1982. Automatic model for finding the one-dimensional magnetotelluric problem, Geophys. J. R. astr. Soc., 68, 253-264.
Honkura, Y., Niblett, E. R. & Kurtz, R. D., 1976. Changes in magnetic and telluric fields in a seismically active region of eastern Canada: Preliminary results of earthquake prediction studies, Tectonophysics, 34, 219-230.
Hoover, D. B., Long, C. L.
& Senterfit, R. M., 1978. Some results from audiomagnetotelluric investigations in geothermal areas, Geophysics, 43, 1501-1514.
Huston, V. & Pynn, J. S., 1980. Applications of functional analysis and operator theory, Academic Press, London.
Hyndman, R. D., Riddihough, R. P. & Herzer, R., 1979. The Nootka fault zone - a new plate boundary off Western Canada, Geophys. J. R. astr. Soc., 58, 667-683.
Jiracek, G. R., 1988. Near-surface and topographic distortions in electromagnetic induction, Presented at 9th IAGA Workshop on Electromagnetic Induction in the Earth and Moon, Sochi, USSR.
Jones, A. G. & Hutton, R., 1979. A multi-station magnetotelluric study in southern Scotland - II. Monte-Carlo inversion of the data and geophysical and tectonic implications, Geophys. J. R. astr. Soc., 56, 351-368.
Jones, A. G., Kurtz, R. D., Oldenburg, D. W., Boerner, D. E. & Ellis, R., 1988. Magnetotelluric observations along the LITHOPROBE Canadian Cordilleran Transect, Geophys. Res. Lett., 15, 677-680.
Jones, A. G., Chave, A. D., Egbert, G., Auld, D. R. & Bahr, K., 1989. A comparison of techniques for magnetotelluric response function estimation, J. Geophys. Res., 94, 14201-14213.
Jones, I. F., 1985. Applications of the Karhunen-Loeve transformation in reflection seismology, Ph.D. thesis, University of British Columbia, Vancouver.
Kaplan, W., 1973. Advanced Calculus, Addison-Wesley Publishing Co., London.
Keen, C. E. & Hyndman, R. D., 1979. Geophysical review of the continental margins of eastern and western Canada, Can. J. Earth Sci., 16, 712-747.
Kirkpatrick, S., Gelatt, C. D. & Vecchi, M., 1983. Optimization by simulated annealing, Science, 220, 671-680.
Kirkpatrick, S., 1984. Optimization by simulated annealing: quantitative studies, J. Statis. Phys., 34, 975-986.
Korevaar, J., 1968. Mathematical methods, 1, Academic Press, New York.
Kramer, H. P. & Mathews, M. V., 1968. A linear coding for transmitting a set of correlated signals, Presented at the 38th Annual SEG meeting in Denver.
Kurtz, R. D. & Niblett, E. R., 1978. Time dependence of magnetotelluric fields in a tectonically active region in Eastern Canada, J. Geomag. Geoelectr., 30, 561-577.
Kurtz, R. D. & Niblett, E. R., 1983. Magnetotelluric monitoring of impedance in an area of induced seismicity at Manic 3, Quebec, Pub. Earth Phys. Branch, Ottawa, 25, 1-38.
Kurtz, R. D., DeLaurier, J. M. & Gupta, J. C., 1986. A magnetotelluric sounding across Vancouver Island detects the subducting Juan de Fuca plate, Nature, 321, 596-599.
Kurtz, R. D., DeLaurier, J. M. & Gupta, J. C., 1990. The electrical conductivity distribution beneath Vancouver Island: a region of active plate subduction, J. Geophys. Res., In press.
Lanczos, C., 1958. Linear systems in self-adjoint form, Am. math. Monthly, 65, 665-679.
Lang, S. W., 1985. Bounds from noisy linear measurements, IEEE Trans. Inform. Theory, IT-31, 490-508.
Larsen, J. C., 1977. Removal of local surface conductivity effects from low frequency mantle response curves, Acta Geodaet. et Montanist. Acad. Hung., 12, 183-186.
Levenberg, K., 1944. A method for the solution of certain nonlinear problems in least squares, Q. Appl. Math., 2, 164-168.
Levy, S. & Fullagar, P. K., 1981. The reconstruction of a sparse spike train from a portion of its spectrum and application to high resolution deconvolution, Geophysics, 46, 1235-1243.
Lines, L. R. & Treitel, S., 1984. A review of least-squares inversion and its application to geophysical problems, Geophys. Prosp., 32, 159-186.
Lusternik, L. A. & Sobolev, V. J., 1961. Elements of functional analysis, John Wiley & Sons, New York.
MacBain, J., 1986. On the Fréchet differentiability of the one-dimensional magnetotellurics problem, Geophys. J. R. astr. Soc., 86, 669-672.
MacBain, J., 1987. On the Fréchet differentiability of the one-dimensional electromagnetic induction problem, Geophys. J. R. astr. Soc., 88, 777-785.
Marquardt, D. W., 1963. An algorithm for least-squares estimation of non-linear parameters, J. Soc. Ind. Appl. Math., 11, 431-441.
Marsten, R. E., 1981. The design of the XMP linear programming library, ACM Trans. Math. Softw., 7, 481-497.
Mathews, J. & Walker, R. L., 1970. Mathematical methods of physics, W. A. Benjamin, Inc., Don Mills, Ont.
Mazzella, A. & Morrison, H. F., 1974. Electrical resistivity variations associated with earthquakes on San Andreas Fault, Science, 185, 855-857.
Menke, W., 1984. Geophysical data analysis: discrete inverse problems, Academic Press, London.
Metropolis, N. A., Rosenbluth, A., Rosenbluth, M., Teller, A. & Teller, E., 1953. Equation of state calculations by fast computing machines, J. Chem. Phys., 21, 1087-1092.
Milne, R. D., 1980. Applied functional analysis: An introductory treatment, Pitman Advanced Publishing Program, Boston.
Morse, P. M. & Feshbach, H., 1953. Methods of theoretical physics, McGraw-Hill, New York.
Niblett, E. R. & Sayn-Wittgenstein, C., 1960. Variation of electrical conductivity with depth by the magnetotelluric method, Geophysics, 25, 998-1088.
Noritomi, K., 1981. Study on fault activity using geoelectrical and geomagnetic methods, Rep. Natural Disaster Sci., A-2, 1-107.
Oldenburg, D. W., 1978. The interpretation of direct current resistivity measurements, Geophysics, 43, 610-625.
Oldenburg, D. W., 1979. One-dimensional inversion of natural source magnetotelluric observations, Geophysics, 44, 1218-1244.
Oldenburg, D. W., 1981. Conductivity structure of oceanic upper mantle beneath the Pacific plate, Geophys. J. R. astr. Soc., 65, 359-394.
Oldenburg, D. W., 1983. Funnel functions in linear and nonlinear appraisal, J. Geophys. Res., 88, 7387-7398.
Oldenburg, D. W., 1984. An introduction to linear inverse theory, IEEE Trans. Geosci. Remote Sensing, GE-22, 644-649.
Oldenburg, D. W., 1990. Inversion of electromagnetic data: an overview of new techniques, Geophysical Surveys, In press.
Oldenburg, D. W. & Samson, J. C., 1979. Inversion of interferometric data from cylindrically symmetric refractionless plasmas, J. Opt. Soc. Am., 69, 927-942.
Oldenburg, D. W., Whittall, K. P. & Parker, R. L., 1984. Inversion of ocean bottom magnetotelluric data revisited, J. Geophys. Res., 89, 1829-1833.
Oldenburg, D. W. & Ellis, R. G., 1990. Inversion of geophysical data using an approximate inverse mapping, Geophys. J. Int., Submitted.
Park, S. K. & Livelybrooks, D. W., 1989. Quantitative interpretation of rotationally invariant parameters in magnetotellurics, Geophysics, 54, 1483-1490.
Parker, R. L., 1970. The inverse problem of electrical conductivity in the mantle, Geophys. J. R. astr. Soc., 22, 121-138.
Parker, R. L., 1972. Inverse theory with grossly inadequate data, Geophys. J. R. astr. Soc., 29, 123-138.
Parker, R. L., 1974. Best bounds on density and depth from gravity data, Geophysics, 39, 644-649.
Parker, R. L., 1975. The theory of ideal bodies for gravity data interpretations, Geophys. J. R. astr. Soc., 42, 315-334.
Parker, R. L., 1977a. Understanding inverse theory, Ann. Rev. Earth Planet. Sci., 5, 35-64.
Parker, R. L., 1977b. The Fréchet derivative for the one-dimensional electromagnetic induction problem, Geophys. J. R. astr. Soc., 49, 543-547.
Parker, R. L., 1980. The inverse problem of electromagnetic induction: existence and construction of solutions based on incomplete data, J. Geophys. Res., 85, 4421-4425.
Parker, R. L., 1982. The existence of a region inaccessible to magnetotelluric sounding, Geophys. J. R. astr. Soc., 68, 165-170.
Parker, R. L., 1983. The magnetotelluric inverse problem, Geophysical Surveys, 6, 5-25.
Parker, R. L., 1984. An inverse problem of electromagnetism arising in geophysics, SIAM-AMS Proceedings, 14, 3-12.
Parker, R. L., 1986. Comments concerning 'On the Fréchet differentiability of the one-dimensional magnetotellurics problem' by John MacBain, Geophys. J. R. astr. Soc., 86, 673.
Parker, R. L. & McNutt, M. K., 1980. Statistics for the one-norm misfit error, J. Geophys. Res., 85, 4429-4430.
Parker, R. L. & Whaler, K. A., 1981.
Numerical methods for establishing solutions to the inverse problem of electromagnetic induction, J. Geophys. Res., 86, 9574-9584.
Press, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T., 1986. Numerical recipes: the art of scientific computing, Cambridge University Press, Cambridge.
Qian, F. Y., Zhao, Y. L., Yu, M. M., Wang, Z. X., Liu, X. W. & Chang, S. M., 1983. Geoelectrical resistivity anomalies before earthquakes, Scientia Sinica, B-26, 326-336.
Ranganayaki, R. P., 1984. An interpretive analysis of magnetotelluric data, Geophysics, 49, 1730-1748.
Riddihough, R. P., 1977. A model for recent plate interactions off Canada's west coast, Can. J. Earth Sci., 14, 384-396.
Rikitake, K., 1987. Magnetic and electric signals precursory to earthquakes: an analysis of Japanese data, J. Geomag. Geoelectr., 39, 47-61.
Rogers, G. C. & Hasegawa, S., 1978. A second look at the British Columbia earthquake of June 23, 1946, Bull. Seism. Soc. Am., 68, 653-675.
Rokityansky, I. I., 1982. Geoelectromagnetic investigation of the Earth's crust and mantle, Springer-Verlag, Berlin.
Rothman, D. H., 1985. Nonlinear inversion, statistical mechanics, and residual statics estimation, Geophysics, 50, 2784-2796.
Rothman, D. H., 1986. Automatic estimation of large residual statics corrections, Geophysics, 51, 332-346.
Sandberg, S. K. & Hohmann, G. W., 1982. Controlled source audiomagnetotellurics in geothermal exploration, Geophysics, 47, 100-116.
Schmucker, U., 1970. Anomalies of geomagnetic variations in the Southwestern United States, Scripps Institute of Oceanography Bulletin, 13, University of California Press, London.
Schmucker, U., 1987. Substitute conductors for electromagnetic response estimates, PAGEOPH, 125, 341-367.
Smith, J. T., 1989. Rapid inversion of multi-dimensional magnetotelluric data, Ph.D. thesis, University of Washington, Seattle.
Smith, J. T. & Booker, J. R., 1988. Magnetotelluric inversion for minimum structure, Geophysics, 53, 1565-1576.
Sternberg, B.
K., Washburne, J. C. & Pellerin, L., 1988. Correction for the static shift in magnetotellurics using transient electromagnetic soundings, Geophysics, 53, 1459-1468.
Sumitomo, N. & Noritomi, K., 1986. Synchronous precursors in the electrical earth resistivity and the geomagnetic field in relation to an earthquake near the Yamasaki fault, southwest Japan, J. Geomag. Geoelectr., 38, 971-989.
Tarantola, A. & Valette, B., 1982. Inverse problems = Quest for information, J. Geophys., 50, 159-170.
Tarantola, A. & Valette, B., 1982. Generalized nonlinear inverse problems solved using the least squares criterion, Rev. Geophys. Space Phys., 20, 219-232.
Telford, W. M., Geldart, L. P., Sheriff, R. E. & Keys, D. A., 1976. Applied geophysics, Cambridge University Press, Cambridge.
Tikhonov, A. N., 1950. On determining the electrical properties of deep-lying crustal layers, Dokl. Acad. Nauk SSSR, 73, 295-297.
Tikhonov, A. N., 1965. Mathematical basis of the theory of electromagnetic soundings, USSR Comp. Math. and Phys., 5, 207-211.
van Laarhoven, P. J. M. & Aarts, E. H. L., 1987. Simulated annealing: theory and applications, D. Reidel Publishing Co., Dordrecht, Holland.
Vanderbilt, D. & Louie, S. G., 1984. A Monte Carlo simulated annealing approach to optimization over continuous variables, J. Comput. Phys., 36, 259-271.
Wannamaker, P. E., Stodt, J. A. & Luis, R., 1987. PW2D: Finite element program for solution of magnetotelluric responses of two-dimensional earth resistivity structure, ESL-158, University of Utah Research Institute.
Weidelt, P., 1972. The inverse problem of geomagnetic induction, Z. Geophys., 38, 257-289.
Weidelt, P., 1985. Construction of conductance bounds from magnetotelluric impedances, J. Geophys., 57, 191-206.
Weidelt, P., 1986. Discrete frequency inequalities for magnetotelluric impedances of one-dimensional conductors, J. Geophys., 59, 171-176.
Whittall, K. P., 1986.
Inversion of magnetotelluric data using localized conductivity constraints, Geophysics, 51, 1603-1607.
Whittall, K. P., 1987. Exploring magnetotelluric nonuniqueness using inverse scattering methods, Ph.D. thesis, University of British Columbia, Vancouver.
Whittall, K. P. & Oldenburg, D. W., 1986. Inversion of magnetotelluric data using a practical inverse scattering formulation, Geophysics, 51, 383-395.
Whittall, K. P. & MacKay, A. L., 1989. Quantitative interpretation of NMR relaxation data, J. Magn. Reson., 84, 134-152.
Whittall, K. P. & Oldenburg, D. W., 1990. Inversion of magnetotelluric data over a one-dimensional earth, in Magnetotellurics in geophysical exploration, ed. Wannamaker, P. E., SEG publication, In press.
Woodhouse, J. H., 1976. On Rayleigh's principle, Geophys. J. R. astr. Soc., 46, 11-22.
Xu, S., 1986. Quantitative estimation of an annual variation of apparent resistivity, J. Geomag. Geoelectr., 38, 991-999.
Zeidler, E., 1985. Nonlinear functional analysis and its applications, Springer-Verlag, New York.

Appendix A

The absolutely flattest model

The model constructed by minimizing the norm of the model derivative is commonly referred to as the flattest model. The flattest model can be particularly useful since it may be considered to be a minimum-structure solution. The construction procedure requires that a model value at some fixed point be specified a priori in order to express the data constraints in terms of the model derivative. The standard derivation (e.g. Oldenburg 1984) considers specifying this model value only at the endpoints of the interval of definition; however, it is straightforward to generalize this method to specify a model value at any point in the interval. In general, the flattest model is not a unique entity, since supplying different model values leads to different flattest models. Rather, what is constructed is the flattest acceptable model which passes through the specified value.
If the model value is known reliably, it is valuable to include this information in the inversion; however, supplying an inaccurate value can introduce false structure into the constructed model. In this case, it is preferable to solve directly for the model value which results in the absolutely flattest model. The absolutely flattest model is the unique solution for which the norm of the model derivative is smaller than that of any other flattest model. Not only does this method obviate the requirement of specifying a model value a priori, but the absolutely flattest model is the true minimum-structure solution and possesses a number of attributes which may be desirable in practice.

The absolutely flattest model ($l_2$ norm) is developed and discussed here for the general linear inverse problem where $N$ observed responses $e_j$ are related to the model $m(z)$ via the linear functional

$$ e_j = \int_a^b g_j(z)\, m(z)\, dz, \qquad j = 1, \ldots, N, \tag{A1} $$

where the model and the kernel functions $g_j(z)$ are defined on the interval $[a, b]$ and equations (A1) have been normalized by their assumed uncertainties. The kernel functions are assumed to be continuous or to have at most a finite number of step discontinuities. To express the constraints in terms of the model derivative, (A1) can be integrated by parts to give

$$ m(b)\, h_j(b) - e_j = \int_a^b h_j(z)\, m'(z)\, dz, \tag{A2} $$

where $h_j(z) = \int_a^z g_j(u)\, du$ is a continuous function. A similar expression involving the model endpoint $m(a)$ is also easily derived (e.g. Oldenburg 1984). However, it is straightforward to generalize this method to write the data constraints in terms of the model value $m(c)$ specified at any arbitrary point $c$, $a \le c \le b$, by substituting

$$ m(b) = \int_a^b H(z-c)\, m'(z)\, dz + m(c) \tag{A3} $$

into (A2) to yield

$$ m(c)\, h_j(b) - e_j = \int_a^b \left[ h_j(z) - h_j(b)\, H(z-c) \right] m'(z)\, dz, \qquad j = 1, \ldots, N, \tag{A4} $$

where $H$ is the Heaviside step function. It is emphasized that in the model construction problem $m(c)$ is a parameter to be specified arbitrarily and may be considered to be independent of $c$.
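The step from (A1) to (A4) can be checked numerically for any smooth test case. The sketch below does this for a single kernel; the kernel `g`, the model `m`, the interval and the point `c` are hypothetical choices made only for the check, and the integrals are evaluated with a simple midpoint rule.

```python
import math


def integrate(f, lo, hi, n=20000):
    """Midpoint-rule quadrature of f on [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))


a, b, c = 0.0, 1.0, 0.4   # interval [a, b] and an interior point c (arbitrary)


def g(z):                 # hypothetical kernel g_j(z)
    return math.exp(-2.0 * z)


def h(z):                 # h_j(z): integral of g_j from a to z, in closed form
    return (1.0 - math.exp(-2.0 * z)) / 2.0


def m(z):                 # hypothetical smooth model m(z)
    return 1.0 + math.sin(3.0 * z)


def dm(z):                # its derivative m'(z)
    return 3.0 * math.cos(3.0 * z)


def heaviside(x):
    return 1.0 if x > 0.0 else 0.0


e = integrate(lambda z: g(z) * m(z), a, b)   # datum defined by (A1)
lhs = m(c) * h(b) - e                        # left side of (A4)
rhs = integrate(lambda z: (h(z) - h(b) * heaviside(z - c)) * dm(z), a, b)
print(abs(lhs - rhs))                        # small if (A4) holds
```

Changing `c` to any other point in the interval should leave the two sides in agreement, which is the sense in which the data constraints can be written in terms of a model value at an arbitrary point.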
The $l_2$ flattest model is constructed by minimizing the norm of the model derivative subject to the side conditions (A4), using the method of Lagrange multipliers to minimize the functional

$$ \Phi\left(m'; m(c), c\right) = \left\| m'(z) \right\|_2^2 + 2 \sum_{j=1}^{N} \alpha_j \left\{ m(c)\, h_j(b) - e_j - \int_a^b \left[ h_j(z) - h_j(b)\, H(z-c) \right] m'(z)\, dz \right\} \tag{A5} $$

(it is straightforward to include a weighting function in the model norm, but for clarity this will be omitted here). Section 3.2.1 shows that minimizing (A5) with respect to $m'$ and $\alpha_j$ leads to the result

$$ m'(z) = \sum_{j=1}^{N} \alpha_j \left[ h_j(z) - h_j(b)\, H(z-c) \right] \tag{A6a} $$

$$ \phantom{m'(z)} = \sum_{j=1}^{N} \alpha_j h_j(z) - H(z-c) \sum_{j=1}^{N} \alpha_j h_j(b), \tag{A6b} $$

where the Lagrange multipliers $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_N)^T$ are given by

$$ \boldsymbol{\alpha} = \Gamma^{-1} \left[ m(c)\, \mathbf{h}(b) - \mathbf{e} \right], \tag{A7} $$

where $\mathbf{h}(b) = (h_1(b), h_2(b), \ldots, h_N(b))^T$, $\mathbf{e} = (e_1, e_2, \ldots, e_N)^T$ and $\Gamma$ is the inner product matrix with elements

$$ \Gamma_{jk} = \int_a^b \left[ h_j(z) - h_j(b)\, H(z-c) \right] \left[ h_k(z) - h_k(b)\, H(z-c) \right] dz. \tag{A8} $$

According to (A6), the derivative of the flattest model is given by a linear combination of the (modified) kernel functions, with the Lagrange multipliers $\alpha_j$ given by (A7) acting as coefficients. The inner product matrix is symmetric and positive definite and may be decomposed using singular value decomposition as

$$ \Gamma = U \Lambda U^T, \tag{A9} $$

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$ is the diagonal matrix of eigenvalues and $U$ is the matrix of column eigenvectors. The flattest model is recovered by integrating $m'(z)$ using $m(c)$ as a fixed value:

$$ m(z) = m(c) + \int_c^z m'(u)\, du. \tag{A10} $$

According to (A6), $m'(z)$ is not defined for $z = c$ since a step discontinuity occurs in the kernel functions at this point; also, the derivative is not defined in a strict sense at the endpoints $a$ and $b$. However, the derivative may be defined at these points according to the following convention.
For an interior point $c$, $m'(c)$ is defined to be the average of the left and right limits of $m'(z)$ as defined by (A6); at the endpoints $a$ and $b$, where two-sided limits do not exist, the model derivative is defined to be the appropriate one-sided limit, i.e.

$$ m'(a) = \lim_{z \to a^+} m'(z), \tag{A11a} $$

$$ m'(b) = \lim_{z \to b^-} m'(z), \tag{A11b} $$

$$ m'(c) = \frac{1}{2} \left[ \lim_{z \to c^-} m'(z) + \lim_{z \to c^+} m'(z) \right]. \tag{A11c} $$

If the point $c$ coincides with either $a$ or $b$, the derivative at that point is defined to be the appropriate one-sided limit. In this case it is straightforward to verify that (A6) reduces to

$$ m'(z) = \sum_{j=1}^{N} \alpha_j \left[ h_j(z) - h_j(b) \right], \quad \text{for } c = a, \tag{A12a} $$

$$ m'(z) = \sum_{j=1}^{N} \alpha_j h_j(z), \quad \text{for } c = b, \tag{A12b} $$

which correspond to the expressions for the flattest model that are generally derived when the model value is specified only at the endpoints.

The standard procedure for constructing the flattest model involves specifying some estimate of the model value $m(c)$. Oldenburg (1984) illustrates how specifying an inaccurate estimate can introduce false structure into the constructed model. To construct the absolutely flattest model which truly minimizes the Lagrange functional, (A5) must also be minimized with respect to the parameter $m(c)$. Setting $\partial \Phi / \partial m(c)$ equal to zero leads to

$$ \sum_{j=1}^{N} \alpha_j h_j(b) = 0. \tag{A13} $$

Equation (A13) represents a further constraint on the $\alpha_j$'s that must be satisfied by the absolutely flattest model. Equations (A7) and (A13) represent $N+1$ equations in $N+1$ unknowns: the $N$ Lagrange multipliers $\alpha_j$ and the optimum value of $m(c)$. The system of equations may be solved for $m(c)$ to yield

$$ m(c) = \frac{\displaystyle \sum_{j=1}^{N} \tilde{h}_j(b)\, \tilde{e}_j / \lambda_j}{\displaystyle \sum_{j=1}^{N} \tilde{h}_j^2(b) / \lambda_j}, \tag{A14} $$

where $\tilde{e}_j = \sum_{k=1}^{N} U_{kj} e_k$ and $\tilde{h}_j(b) = \sum_{k=1}^{N} U_{kj} h_k(b)$ represent rotated responses and kernel functions. Using this model value will result in the absolutely flattest model.

It is straightforward to show that the absolutely flattest model does not depend on the choice of $c$ in the interval $[a, b]$. Differentiating $\Phi$ with respect to $c$ leads to

$$ \frac{\partial \Phi}{\partial c} = -2\, m'(c) \sum_{j=1}^{N} \alpha_j h_j(b). \tag{A15} $$

If condition (A13) for the absolutely flattest model is satisfied, then $\partial \Phi / \partial c = 0$ regardless of the choice of $c$. Any convenient value of $c$ can be used to evaluate $m(c)$ via (A14).

Equation (A14) for the optimal choice of $m(c)$ is not valid in the case that $h_j(b) = 0$ for all $j$ (i.e. when the original kernel functions all have zero area). If this is the case, condition (A13) is automatically satisfied and it is straightforward to verify that the modified data equations (A4) are independent of $m(c)$. In this case the absolutely flattest model is unique only to within an additive constant, and any value of $m(c)$ will result in a flattest model with the same derivative norm $\| m' \|_2$.

A number of interesting properties of the absolutely flattest model can be demonstrated. Consider first the general expression (A6) for $m'(z)$ when $c$ represents a point in the interior of the interval. According to (A6), $m'(z)$ is formed as a linear combination of kernel functions that have a step discontinuity at $c$. Therefore, in general, $m'(z)$ also has a discontinuity at $c$, i.e. the flattest model has a discontinuous derivative at $c$, so $m(z) \in C^0$. However, applying condition (A13) for the absolutely flattest model to (A6) removes the discontinuity from $m'(z)$. Thus, the absolutely flattest model is that model which forms $m'(z)$ as a linear combination of discontinuous kernel functions in such a way that the discontinuity is eliminated. The absolutely flattest model has a continuous derivative at all points and thus $m(z) \in C^1$. This property can be understood by recognizing that the flattest model is really the flattest acceptable model which passes through the specified point $m(c)$. In general, for an arbitrary value of $m(c)$ the requirement that the model pass through this point is antagonistic to minimizing the model-derivative norm while fitting the data; the model which best accomplishes these conflicting objectives has a discontinuous derivative at $c$.
However, the value of $m(c)$ specified for the absolutely flattest model is that value which is not antagonistic to the other objectives and no discontinuity results.

When the model value is specified for an interior point $c$ it follows from (A6) that $m'(a) = m'(b) = 0$ for any value of $m(c)$. When $c$ coincides with either $a$ or $b$ the general expression (A6) for the flattest model reduces to the simpler form (A12). According to (A12), when $c = b$, $m'(a) = 0$ (since $h_j(a) = 0$), but in general $m'(b) \neq 0$; when $c = a$, $m'(b) = 0$, but in general $m'(a) \neq 0$. Thus, when the model value is specified at an endpoint, the flattest model will generally have a non-zero derivative at this endpoint and zero derivative at the other endpoint. The non-zero derivative at the specified endpoint is a result of supplying a model value here which conflicts with the objective of minimizing the derivative norm while fitting the data; this introduces additional structure into the constructed model. However, applying condition (A13) for the absolutely flattest model to (A12) immediately leads to $m'(a) = m'(b) = 0$ for both $c = a$ and $c = b$; thus the absolutely flattest model has zero derivative at both endpoints.

Equation (A7) for the unknowns $\{\alpha_j\}$ requires the inverse of the inner-product matrix $\Gamma$. However, as described in Section 3.2.1, this matrix is often ill-conditioned in practical problems and an inversion which results in an appropriate (non-zero) level of misfit is desired. The spectral expansion method of Section 3.2.2 overcomes these difficulties. As described in Section 3.2.2, a ridge-regression parameter $\beta$ is added to the main diagonal of $\Lambda$ to stabilize the inversion. The value of $\beta$ which gives the appropriate misfit may be found by solving the simple non-linear equation (A16), where the (new) rotated responses $f_j$ are given by

$$ f_j = \sum_{k=1}^{N} U_{kj} \left[ m(c)\, h_k(b) - e_k \right]. \tag{A17} $$

In this case, the solution (A14) for the value of $m(c)$ which results in the absolutely flattest model becomes

$$ m(c) = \frac{\sum_{j=1}^{N} \tilde{h}_j(b)\, \tilde{e}_j / (\lambda_j + \beta)}{\sum_{j=1}^{N} \tilde{h}_j^{2}(b) / (\lambda_j + \beta)}. \tag{A18} $$

Of course, the difficulty with this approach is that $m(c)$ is required to compute $\beta$ according to (A16) and (A17), but $\beta$ is required to compute $m(c)$ according to (A18). A practical solution to this dilemma is to choose a reasonable starting value for $m(c)$, compute the corresponding $\beta$, then use this value to re-compute an improved $m(c)$. The solutions for $m(c)$ and $\beta$ may be repeated iteratively until both have converged to stable values. This procedure has been implemented to find the surface conductivity value in the MT inversion algorithm for the absolutely flattest model. It is found that the convergence for $m(c)$ and $\beta$ is stable and rapid, and the procedure is straightforward to implement since it is not necessary to recompute or decompose the matrix $\Gamma$ at each iteration.
Inversion and appraisal for the one-dimensional magnetotellurics problem Dosso, Stanley Edward 1990
Item Metadata
Title | Inversion and appraisal for the one-dimensional magnetotellurics problem |
Creator |
Dosso, Stanley Edward |
Publisher | University of British Columbia |
Date Issued | 1990 |
Description | The method of magnetotellurics (MT) uses surface measurements of naturally-occurring electromagnetic fields to investigate the conductivity distribution within the Earth. In many interpretations it is adequate to represent the conductivity structure by a one-dimensional (1-D) model. Inferring information about this model from surface field measurements is a non-linear inverse problem. In this thesis, linearized construction and appraisal algorithms are developed for the 1-D MT inverse problem. To formulate a linearized approach, the forward operator is expanded in a generalized Taylor series and second-order terms are neglected. The resulting linear problem may be solved using techniques of linear inverse theory. Since higher-order terms are neglected, the linear problem is only approximate, and this process is repeated iteratively until an acceptable model is achieved. Linearized methods have the advantage that, with an appropriate transformation, a solution may be found which minimizes a particular functional of the model known as a model norm. By explicitly minimizing the model norm at each iteration, it is hypothesized that the final constructed model represents the global minimum of this functional; however, in practice, it is difficult to verify that a global (rather than local) minimum has been found. The linearization of the MT problem is considered in detail in this thesis by deriving complete expansions in terms of Fréchet differential series for several choices of response functional, and verifying that the responses are indeed Fréchet differentiable. The relative linearity of these responses is quantified by examining the ratio of non-linear to linear terms in order to determine the best choice for a linearized approach. In addition, the similitude equation for MT is considered as an alternative formulation to linearization and found to be inadequate in that it implicitly neglects first-order terms. 
Appropriate choices of the model norm allow linearized inversion algorithms to be formulated which minimize a measure of the model structure or of the deviation from a (known) base model. These inversions construct the minimum-structure and smallest-deviatoric model, respectively. In addition, minimizing l₂ model norms leads to smooth solutions which represent structure in terms of continuous gradients, whereas minimizing l₁ norms yields layered conductivity models with structural variations occurring discontinuously. These two formulations offer complementary representations of the Earth, and in practice, a complete interpretation should consider both. The algorithms developed here consider the model to be either conductivity or log conductivity, include an arbitrary weighting function in the model norm, and fit the data to a specified level of misfit: this provides considerable flexibility in constructing 1-D models from MT responses. Linearized inversions may also be formulated to construct extremal models which minimize or maximize localized conductivity averages of the model. These extremal models provide bounds for the average conductivity over the region of interest, and thus may be used to appraise model features. An efficient, robust appraisal algorithm has been developed using linear programming to extremize the conductivity averages. For optimal results, the extremal models must be geophysically reasonable, and bounding the total variation in order to limit unrealistic structure is an important constraint. Since the extremal models are constructed via linearized inversion, the possibility always exists that the computed bounds represent local rather than global extrema. In order to corroborate the results, extremal models are also computed using simulated annealing optimization. Simulated annealing makes no approximations and is well known for its inherent ability to avoid unfavourable local minima.
Although the method is considerably slower than linearized analysis, it represents a general and interesting new appraisal technique. The construction and appraisal methods developed here are illustrated using synthetic test cases and MT field data collected as part of the LITHOPROBE project. In addition, the model construction techniques are used to analyze MT responses measured at a number of sites on Vancouver Island, Canada, to investigate the monitoring of local changes in conductivity as a precursor for earthquakes. MT responses measured at the same site over a period of four years are analyzed and indicate no significant changes in the conductivity (no earthquakes of magnitude greater than 3.0 occurred in this period). Conductivity profiles at a number of sites are also considered in an attempt to infer the regional structure. Finally, a method of correcting linearized inversions is developed. The corrections consist of successively approximating an analytic expression for the linearization error. The method would seem to represent a novel and practical approach that can significantly reduce the number of linearized iterations. In addition, a correspondence between the correction steps and iterations of the modified Newton's method for operators is established. |
Subject |
Earth currents Seismology Earthquake hazard analysis Plate tectonics |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2011-01-19 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0052739 |
URI | http://hdl.handle.net/2429/30692 |
Degree |
Doctor of Philosophy - PhD |
Program |
Geophysics |
Affiliation |
Science, Faculty of Earth, Ocean and Atmospheric Sciences, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |