Open Collections

UBC Theses and Dissertations

Inversion and appraisal for the one-dimensional magnetotellurics problem Dosso, Stanley Edward 1990

INVERSION AND APPRAISAL FOR THE ONE-DIMENSIONAL MAGNETOTELLURICS PROBLEM

By Stanley Edward Dosso

M. Sc. Physics, University of Victoria, 1985
B. Sc. (Hons.) Physics and Applied Mathematics, University of Victoria, 1982

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF GEOPHYSICS AND ASTRONOMY

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September 1990
© Stanley Edward Dosso, 1990

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Geophysics and Astronomy
The University of British Columbia
129-2219 Main Mall
Vancouver, Canada V6T 1W5

Date: [illegible]

Abstract

The method of magnetotellurics (MT) uses surface measurements of naturally-occurring electromagnetic fields to investigate the conductivity distribution within the Earth. In many interpretations it is adequate to represent the conductivity structure by a one-dimensional (1-D) model. Inferring information about this model from surface field measurements is a non-linear inverse problem. In this thesis, linearized construction and appraisal algorithms are developed for the 1-D MT inverse problem. To formulate a linearized approach, the forward operator is expanded in a generalized Taylor series and second-order terms are neglected. The resulting linear problem may be solved using techniques of linear inverse theory.
Since higher-order terms are neglected, the linear problem is only approximate, and this process is repeated iteratively until an acceptable model is achieved. Linearized methods have the advantage that, with an appropriate transformation, a solution may be found which minimizes a particular functional of the model known as a model norm. By explicitly minimizing the model norm at each iteration, it is hypothesized that the final constructed model represents the global minimum of this functional; however, in practice, it is difficult to verify that a global (rather than local) minimum has been found.

The linearization of the MT problem is considered in detail in this thesis by deriving complete expansions in terms of Fréchet differential series for several choices of response functional, and verifying that the responses are indeed Fréchet differentiable. The relative linearity of these responses is quantified by examining the ratio of non-linear to linear terms in order to determine the best choice for a linearized approach. In addition, the similitude equation for MT is considered as an alternative formulation to linearization and found to be inadequate in that it implicitly neglects first-order terms.

Appropriate choices of the model norm allow linearized inversion algorithms to be formulated which minimize a measure of the model structure or of the deviation from a (known) base model. These inversions construct the minimum-structure and smallest-deviatoric models, respectively. In addition, minimizing $l_2$ model norms leads to smooth solutions which represent structure in terms of continuous gradients, whereas minimizing $l_1$ norms yields layered conductivity models with structural variations occurring discontinuously. These two formulations offer complementary representations of the Earth, and in practice, a complete interpretation should consider both.
The algorithms developed here consider the model to be either conductivity or log conductivity, include an arbitrary weighting function in the model norm, and fit the data to a specified level of misfit: this provides considerable flexibility in constructing 1-D models from MT responses.

Linearized inversions may also be formulated to construct extremal models which minimize or maximize localized conductivity averages of the model. These extremal models provide bounds for the average conductivity over the region of interest, and thus may be used to appraise model features. An efficient, robust appraisal algorithm has been developed using linear programming to extremize the conductivity averages. For optimal results, the extremal models must be geophysically reasonable, and bounding the total variation in order to limit unrealistic structure is an important constraint.

Since the extremal models are constructed via linearized inversion, the possibility always exists that the computed bounds represent local rather than global extrema. In order to corroborate the results, extremal models are also computed using simulated annealing optimization. Simulated annealing makes no approximations and is well known for its inherent ability to avoid unfavourable local minima. Although the method is considerably slower than linearized analysis, it represents a general and interesting new appraisal technique.

The construction and appraisal methods developed here are illustrated using synthetic test cases and MT field data collected as part of the LITHOPROBE project. In addition, the model construction techniques are used to analyze MT responses measured at a number of sites on Vancouver Island, Canada, to investigate the monitoring of local changes in conductivity as a precursor for earthquakes.
MT responses measured at the same site over a period of four years are analyzed and indicate no significant changes in the conductivity (no earthquakes of magnitude greater than 3.0 occurred in this period). Conductivity profiles at a number of sites are also considered in an attempt to infer the regional structure.

Finally, a method of correcting linearized inversions is developed. The corrections consist of successively approximating an analytic expression for the linearization error. The method would seem to represent a novel and practical approach that can significantly reduce the number of linearized iterations. In addition, a correspondence between the correction steps and iterations of the modified Newton's method for operators is established.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 The geophysical inverse problem
  1.2 The geo-electrical conductivity
  1.3 The magnetotelluric (MT) method
    1.3.1 EM induction in the Earth
    1.3.2 The MT inverse problem
    1.3.3 Applicability of 1-D inversion
  1.4 Overview of work in this thesis
2 Fréchet differential expansions and the linearity of MT responses
  2.1 Introduction
    2.1.1 Functional analysis background
    2.1.2 Application to 1-D MT inversion
  2.2 The response $R = \partial_z E/(i\omega E)$
    2.2.1 Expansions for $R$
    2.2.2 Proof of the Fréchet differentiability of $R$
    2.2.3 Constant-conductivity example
  2.3 The response $c = -E/\partial_z E$
    2.3.1 Expansions for $c$
    2.3.2 Constant-conductivity example
  2.4 Relative linearity of $R$ and $c$
    2.4.1 Quantifying the linearity
    2.4.2 Linearity for constant-conductivity models
    2.4.3 Linearity for general models
  2.5 Alternative formulations
    2.5.1 Alternative choices for model and response
    2.5.2 The similitude equation as a basis for model-norm inversions
3 $l_2$ model norm construction
  3.1 Introduction
  3.2 Linear inverse theory
    3.2.1 Smallest model
    3.2.2 Spectral expansion
    3.2.3 Smallest-deviatoric model
    3.2.4 Flattest model
  3.3 The linearized inversion algorithm
    3.3.1 Numerical implementation
    3.3.2 Non-linear considerations
  3.4 Examples of model norm construction
    3.4.1 Flattest model construction
    3.4.2 Smallest-deviatoric model construction
4 $l_1$ model norm construction
  4.1 Introduction
  4.2 Linear inversion
    4.2.1 Minimum-variation model
    4.2.2 Smallest-deviatoric model
  4.3 The linearized inversion algorithm
  4.4 Examples of $l_1$ model norm construction
    4.4.1 Minimum-variation model construction
    4.4.2 Smallest-deviatoric model construction
5 Appraisal using extremal models of bounded variation
  5.1 Introduction
  5.2 Formulating the variation bound
    5.2.1 Method 1
    5.2.2 Method 2
  5.3 Linear appraisal example
  5.4 The linearized appraisal algorithm
  5.5 Appraisal of MT responses
    5.5.1 Synthetic MT example
    5.5.2 MT field data example
6 Non-linear appraisal using simulated annealing
  6.1 Introduction
  6.2 Simulated annealing
    6.2.1 Statistical mechanics
    6.2.2 The Metropolis algorithm
    6.2.3 Combinatorial optimization using simulated annealing
  6.3 The simulated annealing appraisal algorithm
  6.4 Appraisal examples
7 An application to MT monitoring of earthquake precursors
  7.1 Introduction
  7.2 Geological and tectonic setting
  7.3 Field experiment
  7.4 Temporal change in responses
  7.5 Temporal change required in conductivity models
  7.6 Regional interpretation
8 A modified linearized inversion algorithm
  8.1 Introduction
  8.2 Correcting the linearization
  8.3 A practical inversion algorithm
9 Summary and discussion
References
Appendix A The absolutely flattest model

List of Tables

2.1 Summary of cases considered in linearity study of MT responses $R$ and $c$
3.1 The true conductivity model for the synthetic MT example
3.2 Summary of model attributes for the $l_2$ norm inversion shown in Fig. 3.1
4.1 Summary of model attributes for the $l_1$ norm inversion shown in Fig. 4.1
6.1 Summary of bounds for $\bar{\sigma}$ computed using simulated annealing and linearized inversion for the synthetic MT example
8.1 Summary of misfits using the correction scheme for the synthetic MT example with $m(z) = \sigma(z)$
8.2 Summary of misfits using the correction scheme for the synthetic MT example with $m(z) = \log\sigma(z)$

List of Figures

2.1 Relative linearity of MT responses $R$ and $c$ for constant-conductivity models
2.2 Relative linearity of $R$ and $c$ for surface-layer true model and constant-conductivity starting models
2.3 Relative linearity of $R$ and $c$ for surface-layer true model and constant-conductivity starting models as a function of $\sigma_0$ and $T$
2.4 Relative linearity of $R$ and $c$ for surface-layer true and starting models
2.5 Relative linearity of $R$ and $c$ for surface-layer true and starting models as a function of $\sigma_0$ and $T$
2.6 Relative linearity of $R$ and $c$ for surface-layer true and starting models (different layer thickness)
2.7 Relative linearity of $R$ and $c$ for surface-layer true and starting models (different layer thickness) as a function of $\sigma_0$ and $T$
2.8 Relative linearity of $R$ and $c$ for positive-gradient true model and constant-conductivity starting models
2.9 Relative linearity of $R$ and $c$ for positive-gradient true model and constant-conductivity starting models as a function of $\sigma_0$ and $T$
2.10 Relative linearity of $R$ and $c$ for positive-gradient true and starting models
2.11 Relative linearity of $R$ and $c$ for positive-gradient true and starting models as a function of $\sigma_0$ and $T$
2.12 Relative linearity of $R$ and $c$ for positive-gradient true and starting models (different surface value)
2.13 Relative linearity of $R$ and $c$ for positive-gradient true and starting models (different surface value) as a function of $\sigma_0$ and $T$
3.1 Dependence of $\chi^2$ misfit on target (linear) misfit
3.2 Sequence of models produced in $l_2$ flattest model inversion
3.3 The $l_2$ flattest model for $m(z) = \sigma(z)$ and $f(z) = \log(z + z_0)$
3.4 The $l_2$ flattest model for $m(z) = \log\sigma(z)$ and $f(z) = \log(z + z_0)$
3.5 The $l_2$ flattest model for $m(z) = \log\sigma(z)$ and $f(z) = \log(z + z_0)$ and weighting $w(z)$ which allows structural variations in narrow zones
3.6 The effect of data errors on $l_2$ flattest model construction
3.7 The $l_2$ flattest and absolutely flattest models
3.8 The $l_2$ flattest models for the LITHOPROBE MT data set
3.9 The $l_2$ smallest-deviatoric models for $m(z) = \sigma(z)$
3.10 Approximate appraisal using weighted $l_2$ model-deviation norm
4.1 Sequence of models produced in $l_1$ minimum-variation model inversion
4.2 Computation time for LP solution at each iteration
4.3 The $l_1$ minimum-variation model for $m(z) = \log\sigma(z)$
4.4 Best-fitting and $l_1$ minimum-structure models
4.5 The $l_1$ minimum-variation model for weighting function $w_i = 5$ for $z_i < 800$ m and $w_i = 1$ for $z_i > 800$ m
4.6 The effect of data errors on $l_1$ minimum-variation model construction
4.7 The effect of data outliers on $l_1$ and $l_2$ minimum-structure models
4.8 The $l_1$ minimum-structure model for the LITHOPROBE MT data set
4.9 The $l_1$ smallest-deviatoric and minimum combined-norm models
5.1 Funnel function lower and upper bounds for the linear test case
5.2 Constructed extremal models for the linear test case
5.3 Percent improvement for $V_1 = \infty$, $V_2 = 2.0$ for the linear test case
5.4 Lower and upper bounds as a function of variation $V$ for the linear test case
5.5 The $l_1$ minimum-variation model for the linear test case
5.6 Normalized bound width as function of $\Delta$ and $V$ for the linear test case
5.7 Funnel function bounds for $\bar{\sigma}(z_0, \Delta)$ for the synthetic MT example
5.8 Extremal models of unbounded variation for the synthetic MT example
5.9 Extremal models with variation bound $V_2 = 0.21$ S/m for the synthetic MT example
5.10 Percent improvement with $V_1 = \infty$, $V_2 = 0.21$ S/m for the synthetic MT example
5.11 Comparison of upper bound for $\bar{\sigma}(z_0 = 4000)$ and lower bound for $\bar{\sigma}(z_0 = 8000)$ for the synthetic MT example
5.12 Extremal models for $\bar{\sigma}(z_0 = 8000, \Delta = 4000)$ for the synthetic MT example
5.13 The effect of data errors on computed bounds
5.14 Extremal models which maximize $\bar{\sigma}$ for the apparent low conductivity region 2000-7000 m depth for the LITHOPROBE MT data set
5.15 Extremal models which minimize $\bar{\sigma}$ for the apparent high conductivity region 20000-30000 m depth for the LITHOPROBE MT data set
5.16 Extremal models which maximize $\bar{\sigma}$ for the apparent high conductivity region 20000-30000 m depth for the LITHOPROBE MT data set
6.1 Comparison of upper and lower bounds from simulated annealing and linearized inversion, synthetic MT example
6.2 Comparison of extremal models (minimization) from simulated annealing and linearized inversion, synthetic MT example
6.3 Comparison of extremal models (maximization) from simulated annealing and linearized inversion, synthetic MT example
6.4 Comparison of extremal models (maximization) from simulated annealing and linearized inversion, LITHOPROBE MT data set
7.1 Tectonic map of earthquake precursor study area on Vancouver Island
7.2 Change in observed responses at site BRF between 1988 and 1989
7.3 Minimum-structure models for BRF 1988 data set
7.4 Smallest-deviatoric models for site BRF
7.5 Minimum-structure models for sites BRF, UBC and HLN
8.1 One-step inversion for exact linearization
8.2 Correction scheme inversion for $m(z) = \sigma(z)$
8.3 Correction scheme inversion for $m(z) = \log\sigma(z)$
8.4 The $\chi^2$ misfit as a function of (linearized) inversion iteration number
8.5 The $l_2$ flattest models constructed via standard and corrected inversions
8.6 The $\chi^2$ misfit as a function of (linearized) inversion iteration number for the LITHOPROBE MT data set
8.7 The $l_2$ flattest models constructed via standard and corrected inversions for the LITHOPROBE MT data set

Acknowledgements

I would like to sincerely thank Dr. Doug Oldenburg for his valuable advice and encouragement in all aspects of this work. I would also like to thank Yaoguo Li, Drs Rob Ellis and Ken Whittall, and Dave Aldridge for many helpful and interesting discussions. I am grateful to Dr. Rob Ellis and John Amor for their help and expertise with the computing facilities, and to Drs P. Loewen, R. M. Ellis and R. D. Russell for constructive reviews of the manuscript. D. R. Auld and Dr. L. K. Law of the Pacific Geoscience Centre, Sidney, British Columbia, provided the MT data from the Vancouver Island earthquake precursor study and were the source of much useful information on the subject. Funding for the research work in this thesis was supplied by an NSERC post-graduate scholarship and NSERC research grant 5-84270. Finally, I would like to sincerely thank my friends in the Department of Geophysics and Astronomy who have made my studies there so enjoyable.

Chapter 1

Introduction

Many geophysical methods rely on the interpretation of measurements made at the Earth's surface to infer the subsurface distribution of various physical parameters. Seismic reflection and refraction, gravity and electromagnetic methods are examples of this procedure. Determining useful information about the Earth from a set of surface measurements is the realm of geophysical inverse theory.
This thesis considers the application and development of inversion procedures to the method of magnetotellurics (MT), which uses surface measurements of naturally-occurring electromagnetic fields to determine the electrical conductivity distribution within the Earth. The first section of this thesis defines the geophysical inverse problem. Section 1.2 considers how the geo-electrical conductivity may be used to characterize the composition and physical state of the Earth's interior. Section 1.3 describes the magnetotelluric method, reviews pertinent applications of inverse theory and considers the assumption of a one-dimensional (1-D) Earth model. Finally, Section 1.4 provides an overview of the work presented in this thesis.

1.1 The geophysical inverse problem

This section briefly defines the geophysical inverse problem. Thorough reviews of the subject are given by Parker (1977a), Tarantola & Valette (1982a, b) and Menke (1984). In a general geophysical problem, the measured responses $e$ are related to the Earth model $m$ according to

$$e(u) = F(m, u), \qquad (1.1.1)$$

where the physics of the problem is represented by the mapping $F: m \rightarrow e$ and $u$ is an independent space or time variable relevant to the experiment. The problem of predicting the data $e$ given a model $m$ is known as the forward or direct problem. The corresponding inverse problem may be defined as: given data $e$, determine the model $m$ which gave rise to these measurements. Unfortunately, it is rarely possible to solve this problem unambiguously. In practice, only a finite number of responses are measured, so the data may be represented as $\{e_j = e(u_j);\ j = 1, \ldots, N\}$. Since the model solution sought often represents an (infinite-dimensional) function of position while the set of responses is finite ($N$-dimensional), the mapping $F: m \rightarrow F(m)$ is not one-to-one, and the inverse mapping is non-unique.
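The non-uniqueness described above can be illustrated with a small numerical sketch. It is only an illustration under stated assumptions: the forward mapping is replaced by a hypothetical linear operator `G` with random kernels (the MT problem itself is non-linear), the model is discretized into `M` cells, and all names and sizes are chosen for the example rather than taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 3, 50                     # N data, M model cells (M >> N); hypothetical sizes
G = rng.normal(size=(N, M))      # hypothetical linear kernels, one row per datum
m_true = rng.normal(size=M)      # an arbitrary "true" model
e = G @ m_true                   # exact (noise-free) data

# With full_matrices=True (the default), rows of Vt with index >= rank(G)
# span the null space of G: they contribute nothing to the data.
_, s, Vt = np.linalg.svd(G)
rank = int(np.sum(s > 1e-12 * s[0]))
null_vec = Vt[rank]              # unit-length vector annihilated by G

m_other = m_true + 10.0 * null_vec       # a substantially different model
same_data = np.allclose(G @ m_other, e)  # both models reproduce the data
model_gap = np.linalg.norm(m_other - m_true)
print(same_data, model_gap)
```

Here two models that differ by a vector of length 10 predict identical data, which is the finite-data ambiguity in its simplest linear form.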
In addition to being incomplete, practical data are always inexact so that (1.1.1) becomes

$$e_j = F_j(m) + \epsilon_j, \qquad j = 1, \ldots, N, \qquad (1.1.2)$$

where $F_j(m) = F(m, u_j)$ and $\epsilon_j$ represents the (unknown) measurement errors. It is unjustified to seek a solution to the inverse problem (1.1.2) which fits the data to a greater level of precision than the uncertainty of the measurements; this compounds the problem of non-uniqueness. The inevitability of data errors and simplifying assumptions (e.g. representing the Earth by a 1-D model) also raises the question of existence for the inverse problem: for a given set of responses it is possible that no model exists which adequately reproduces the data. In general, if there exists one model which fits the responses, then infinitely many such models exist. These models can be diverse and difficult to characterize.

Given the problems of existence and non-uniqueness, the goal of inverse theory is to use the observed data to infer information about the true model. There are several general approaches to overcoming the inherent non-uniqueness of the inverse problem, as outlined by Oldenburg (1984). One approach is model construction, whereby a solution $m$ is sought which adequately reproduces the data. Another approach is that of appraisal. Rather than constructing one or more of the infinite number of possible model solutions, the goal of appraisal is to calculate properties of $m$ which all acceptable models (including the true model) share. A third approach to inverse theory is inference, whereby the measured data are used to predict other functionals of the model. Backus and Gilbert have developed a general formalism for treating inverse problems which are linear or can be linearized, such as the MT problem; their work provides the foundation for many of the applications developed here.
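The remark that data should not be fitted more closely than their uncertainties can be made concrete with a short sketch. It assumes independent Gaussian errors with known standard deviations and uses the chi-squared statistic as the misfit measure; the synthetic responses, the noise level and the function name `chi_squared` are hypothetical choices for illustration only.

```python
import numpy as np

def chi_squared(observed, predicted, sigma):
    """Sum of squared residuals, each scaled by its standard deviation."""
    r = (np.asarray(observed) - np.asarray(predicted)) / np.asarray(sigma)
    return float(np.sum(r**2))

rng = np.random.default_rng(1)
N = 200
sigma = np.full(N, 0.05)                    # assumed known standard deviations
truth = np.sin(np.linspace(0.0, 3.0, N))    # stand-in for the exact responses F_j(m)
data = truth + rng.normal(scale=sigma)      # e_j = F_j(m) + epsilon_j

chi2_true = chi_squared(data, truth, sigma)     # expected value is N
chi2_overfit = chi_squared(data, data, sigma)   # zero: the noise itself is fitted
print(chi2_true, chi2_overfit)
```

Even the exact responses give a misfit near N rather than zero, so a constructed model with a much smaller misfit is reproducing noise rather than signal.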
This thesis considers the inverse problem of determining the Earth conductivity distribution from magnetotelluric measurements. Section 1.3 describes the MT method and reviews pertinent applications of inverse theory; but first, properties which influence the conductivity are briefly considered.

1.2 The geo-electrical conductivity

The electrical conductivity of the Earth is influenced by a number of factors and therefore may be considered indicative of a variety of geophysical conditions and processes. Conductivity distributions inferred from MT experiments have been used to delineate mineral deposits, geothermal areas and potential petroleum reservoirs, and to investigate the temperature distribution, composition, structure, and tectonic features of the Earth.

Electric current flow in the Earth is due to three types of conduction: ohmic, electrolytic and dielectric polarization. Ohmic (electronic) conduction dominates in materials containing free electrons, such as the metals. Electrolytic conduction is due to the physical transport of ions in fluids (mostly water) contained in rock pores. Displacement currents are due to the dielectric polarization of atoms, ions or molecules in the presence of time-varying electric fields. At the frequencies employed in MT studies, displacement currents are negligible.

Of the commonly measured physical properties of rocks, conductivity is the quantity which exhibits the greatest variation. Rocks and minerals are classified as good conductors for conductivities of $1$-$10^{8}$ S/m (this group includes metals, graphite, sulphides and magnetite), intermediate conductors for conductivities of $10^{-7}$-$1$ S/m (most ores, oxides and porous rocks containing water), or poor conductors (insulators) for conductivities of less than $10^{-7}$ S/m (most common rock-forming minerals, silicates, phosphates and carbonates).
Good and intermediate conductors can often be detected directly using MT methods for resource prospecting. Telford et al. (1976) have compiled tables of characteristic resistivities (the reciprocal of conductivity) of minerals and rocks.

At shallow crustal depths, most commonly-occurring minerals are themselves poor conductors and current flow is due predominately to electrolytic conduction within fluid-filled pores in the rock. The conductivity of porous rock depends on the interstitial fluid saturation and conductivity, and the volume and connectivity of the pores. Water conductivity can vary considerably depending on the amount and conductivity of dissolved minerals. Temperature may also affect the fluid conductivity by influencing the solubility and mobility of ions. High conductivity anomalies are often associated with geothermal areas and MT has been successfully used in geothermal prospecting (e.g. Hoover et al. 1978; Sandberg & Hohmann 1982). The geometrical arrangement of the pores themselves does not generally have a pronounced effect on the electrolytic conductivity, but can cause the conductivity to be anisotropic, a characteristic of stratified rock (Telford et al. 1976). In general, sedimentary rocks are more conductive than igneous or metamorphic rocks, with the conductivity depending primarily on porosity and fluid content, so that MT methods can be used to delineate possible petroleum reservoirs.

Noritomi (1981) and the Electromagnetic Research Group For The Active Fault (1982, 1983) have found that active fault zones often correspond to regions of high crustal conductivity and ascribe this to water contained in the fault fractured zone to depths greater than 10 km. Yukutake (1984) notes that the groundwater which causes this high conductivity may also allow the fault to slip more easily. Changes in crustal conductivity have been observed to occur before earthquakes (e.g.
Barsukov 1972; Mazzella & Morrison 1974; Qian et al. 1983). The process by which these changes occur is not completely understood, but is believed to be caused by the opening and extension of micro-cracks in crustal rocks, due to large stresses, which subsequently fill with water. MT studies have been carried out to monitor crustal conductivity as a possible precursor in earthquake prediction (e.g. Honkura et al. 1976; Kurtz & Niblett 1978, 1983).

In the lower crust and upper mantle the geo-electrical conductivity is influenced by a variety of geophysical factors such as temperature, heat flow, partial melting, and petrology (e.g. Gough 1974; Garland 1975; Ádám 1980). Thus, conductivity models derived from MT or other electromagnetic measurements can be used to investigate large-scale properties of the Earth. For example, a pronounced increase in electrical conductivity is observed when water-containing rocks are heated to 500-700 °C, which may be attributed to the onset of fractional melting; hence, conductivity anomalies may be useful in identifying zones of partial melt and dehydration. Oldenburg (1981) and Oldenburg, Whittall & Parker (1984) used conductivity models derived from seafloor MT measurements to infer temperature and partial melt profiles for three locations on the Pacific plate. The depths at which large amounts of partial melt were predicted correlated well with seismic low-velocity zones computed by inverting Rayleigh wave dispersion diagrams. Filloux (1980) and Oldenburg (1981) noted that the depth to the apparent zone of partial melt appears to increase with the age of the lithosphere.

Conductivity models may also be used to investigate tectonic features. For example, Kurtz et al. (1986, 1990) interpreted an MT survey across central Vancouver Island on the Pacific coast of Canada in terms of a conducting zone at depths greater than 20 km.
This conductive layer correlated well with a strong seismic reflective zone which is believed to delineate the top of the subducting Juan de Fuca plate (Green et al. 1986). The conducting zone is believed to result from cracks and pores filled with saline fluids supplied by water subducted with the oceanic crust and by dehydration reactions. Conductivity anomalies have been associated with a number of other subduction zones (Rokityansky 1982). Also, according to Rokityansky, conductivity anomalies have been discovered in all rifts (spreading centres) which have been investigated by deep electromagnetic study. The high conductivity is believed to be due to a zone of fractional melting, which agrees with the hypothesis of the uplift of hot mantle material in rifts.

Many of the studies cited in this section are based on one-dimensional (1-D) interpretations of MT measurements in which the conductivity is assumed to vary only with depth. In this case there are two possible geophysically-realistic representations of the conductivity profile. In shallow regions where abrupt conductivity changes of several orders of magnitude are observed at interfaces between different rock types or at the edges of water-saturated zones, layered or discontinuous models may be required. At greater depths, phase changes could cause discontinuous changes in the conductivity; however, in some depth ranges it is believed that the conductivity is largely controlled by temperature, and in this case smoothly varying profiles are more likely (Parker 1983, 1984). Conductivity profiles of both types are considered in this thesis.
1.3 The magnetotelluric (MT) method

1.3.1 EM induction in the Earth

Natural electromagnetic (EM) fields measured at the surface of the Earth consist of two components: a primary component of origin external to the Earth, and a secondary or internal component which arises due to telluric currents induced in conductive regions of the Earth by the primary field. The penetration depth of the primary field, and therefore of the telluric currents, depends on the periods of oscillation and on the conductivity distribution within the Earth, with greatest penetration for long periods and low conductivities. The ratio of (orthogonal) horizontal components of the electric and magnetic fields measured at the surface depends primarily on the subsurface conductivity and is relatively insensitive to the properties of the source. In the magnetotelluric method, pioneered by Tikhonov (1950) and Cagniard (1953), this ratio, known as the impedance, is measured as a function of period. Impedances measured at progressively longer periods provide information about the conductivity to progressively greater depths. Inferring the Earth conductivity distribution from such measurements defines the MT inverse problem. This section presents a brief overview of the MT method; a comprehensive treatment of the subject is given in the monograph by Rokityansky (1982).

Many natural sources of geomagnetic variations contribute to the EM spectrum exploited by the MT method. Since impedance measurements constrain the conductivity only over the depth of penetration of the inducing fields, measurements over a broad band of periods $T$ are required to investigate the conductivity distribution of the Earth. Atmospherics which are generated by lightning storms and propagate in the Earth-ionosphere waveguide are an important source for short period variations of $10^{-5}$-$0.2$ s, with amplitude peaks at distinct periods known as the Schumann resonance periods.
EM fields for this range of periods typically penetrate to depths of about one kilometre (controlled sources are also sometimes employed at these periods). Micropulsations in the magnetosphere dominate the spectrum for periods of 0.2-1000 s; these fields typically penetrate from one to tens of kilometres. Regular daily variations in the geomagnetic field including solar, lunar and diurnal variations and their harmonics have periods of 0.2-1 day. Magnetic storms and their low-frequency (recovery) components provide rich spectra at periods of hours to days. Impedance measurements at these periods may be used to investigate the conductivity of the lower crust and upper mantle. The rotation of the sun and semi-yearly and yearly modulations provide variations with periods of the order of months. The longest period (extraterrestrial) variations known are due to the 11-year solar cycle with associated penetration depths of up to 1200-1800 km.

In the MT method, it is assumed that the geomagnetic variation fields at the surface of the Earth may be represented in terms of a plane wave decomposition. Much of the EM energy incident at the Earth's surface is reflected, but a small amount is transmitted. Even at large angles of incidence (with respect to the normal) the transmitted EM waves are refracted to the normal by the extreme conductivity contrast between the air and the Earth, and diffuse vertically into the Earth inducing telluric currents in conductive regions. The plane wave assumption requires that the wavelengths associated with horizontal field fluctuations be large compared with the depth of penetration of the fields, and is generally valid except near the auroral zone or in cases of local sources such as nearby lightning or magnetic storms.
To derive the governing differential equation for EM induction, assume the Earth to be a linear, isotropic medium and consider a Cartesian coordinate system defined with origin at the surface and the positive $z$ axis vertically downward. The EM field vectors are related by Maxwell's equations:

$$\nabla \times \mathbf{E} = -\partial_t \mathbf{B}, \tag{1.3.1}$$
$$\nabla \times \mathbf{H} = \mathbf{J} + \partial_t \mathbf{D}, \tag{1.3.2}$$
$$\nabla \cdot \mathbf{D} = \rho_f, \tag{1.3.3}$$
$$\nabla \cdot \mathbf{B} = 0, \tag{1.3.4}$$

and the constitutive relations

$$\mathbf{D} = \epsilon \mathbf{E}, \quad \mathbf{B} = \mu \mathbf{H}, \quad \mathbf{J} = \sigma \mathbf{E}, \tag{1.3.5}$$

where $\mathbf{E}$ and $\mathbf{H}$ are the electric and magnetic field intensities, $\mathbf{B}$ is the magnetic induction, $\mathbf{D}$ is the electric displacement, $\mathbf{J}$ is the electric current density, $\rho_f$ is the free-charge density, $\sigma$ is the conductivity, and $\epsilon$ and $\mu$ are the permittivity and magnetic permeability. As previously mentioned, at the frequencies employed in MT studies the displacement current $\partial_t \mathbf{D}$ is negligibly small and may be omitted (the quasi-static approximation). Also, in most common rocks the magnetic permeability does not vary appreciably from its value in vacuum, $\mu_0$, and may be set identically to this value, i.e. $\mu = \mu_0$. Assuming a harmonic time dependence of $e^{i\omega t}$, where $\omega = 2\pi/T$ is the angular frequency, and making use of the constitutive relations (1.3.5), Maxwell's equations (1.3.1) and (1.3.2) become

$$\nabla \times \mathbf{E} = -i\omega\mu_0 \mathbf{H}, \tag{1.3.6}$$
$$\nabla \times \mathbf{H} = \sigma \mathbf{E}. \tag{1.3.7}$$

The 1-D induction equations are derived by assuming that the conductivity distribution and EM fields do not vary in the $x$ or $y$ directions, i.e. $\sigma = \sigma(z)$ and plane EM waves propagating in the $z$ direction. There is no loss in generality in assuming that the EM fields are linearly polarized as $\mathbf{E} = (E_x, 0, 0)e^{i\omega t}$, $\mathbf{H} = (0, H_y, 0)e^{i\omega t}$. Equation (1.3.6) becomes

$$\partial_z E_x = -i\omega\mu_0 H_y, \tag{1.3.8}$$

and taking the curl of (1.3.6) and using (1.3.7) leads to

$$\partial_z^2 E_x = i\omega\mu_0 \sigma E_x. \tag{1.3.9}$$

Equation (1.3.9) is the governing differential equation for 1-D induction.
The boundary conditions are that $E_x(z) \to 0$ as $z \to \infty$ (the radiation condition) and that the surface field may be set equal to an arbitrary constant $E_x(z=0) = E_0$. Equation (1.3.9) can be solved for $E_x(z)$ analytically for layered conductivity distributions $\sigma(z)$ or numerically for continuous $\sigma(z)$; this corresponds to the forward solution for the 1-D induction problem. Equation (1.3.8) may be used to convert field measurements of $H_y$ into $\partial_z E_x$.

The fundamental response or transfer function of EM induction is the impedance; in the 1-D case this may be defined as

$$Z = E_x / H_y. \tag{1.3.10}$$

The impedance quantifies the effect of the conductivity distribution on the surface fields. Other response functions involving ratios of $E_x$ and $H_y$ are also commonly used, e.g. the admittance (reciprocal of the impedance), $Y = H_y/E_x$.

As the simplest case of EM induction, consider an earth conductivity model which consists of a halfspace of constant conductivity $\sigma$. In this case it is straightforward to verify that

$$E_x(z) = E_0 e^{-(1+i)z/\delta}, \qquad H_y(z) = (1 - i)\sqrt{\sigma / 2\omega\mu_0}\; E_x(z) \tag{1.3.11}$$

are solutions to (1.3.8) and (1.3.9), where

$$\delta(\omega) = \sqrt{2/\omega\mu_0\sigma} \tag{1.3.12}$$

is known as the skin depth. The solutions (1.3.11) correspond to exponentially attenuated sinusoids, with $\delta$ representing the depth over which the fields are attenuated by a factor of $1/e$. Equations (1.3.11) may be solved for the conductivity in terms of the observed EM fields to yield

$$\sigma = i\omega\mu_0 \left(\frac{H_y}{E_x}\right)^2 = i\omega\mu_0 / Z^2. \tag{1.3.13}$$

In more general cases when $\sigma$ is not a constant, the magnitude of (1.3.13), known as the apparent conductivity $\sigma_a(\omega)$, represents an average of the conductivity distribution over the depth of penetration of the EM fields. The phase $\phi(\omega)$ of $Z$ also provides information about variations in the conductivity with depth.
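The halfspace results (1.3.11)-(1.3.13) and the layered forward solution mentioned above can be illustrated with a short numerical sketch. The layered case uses the standard impedance recursion for uniform layers over a halfspace, which is a textbook result and is not taken from this text; the function names, the SI units and the test model are assumptions made for illustration only.

```python
import cmath
import math

MU0 = 4e-7 * math.pi  # magnetic permeability of free space (H/m)


def skin_depth(sigma, period):
    """Skin depth (1.3.12) in metres for conductivity sigma (S/m) and period (s)."""
    omega = 2.0 * math.pi / period
    return math.sqrt(2.0 / (omega * MU0 * sigma))


def layered_impedance(sigmas, thicknesses, period):
    """Surface impedance Z = Ex/Hy for uniform layers over a basal halfspace.

    sigmas has one more entry than thicknesses; the last entry is the
    conductivity of the halfspace. An exp(+i*omega*t) time dependence is
    assumed, as in (1.3.6)-(1.3.9).
    """
    omega = 2.0 * math.pi / period

    def intrinsic(sigma):
        # k is the root of k^2 = i*omega*mu0*sigma with Re(k) > 0 (decaying field)
        k = cmath.sqrt(1j * omega * MU0 * sigma)
        return k, 1j * omega * MU0 / k

    # Start in the halfspace, where Z equals the intrinsic impedance.
    _, z = intrinsic(sigmas[-1])
    # Propagate the impedance upward through each layer to the surface.
    for sigma, h in zip(reversed(sigmas[:-1]), reversed(thicknesses)):
        k, zeta = intrinsic(sigma)
        t = cmath.tanh(k * h)
        z = zeta * (z + zeta * t) / (zeta + z * t)
    return z


def apparent_conductivity(z, period):
    """Magnitude of (1.3.13): sigma_a = omega*mu0 / |Z|^2."""
    omega = 2.0 * math.pi / period
    return omega * MU0 / abs(z) ** 2


if __name__ == "__main__":
    # Hypothetical model: 5 km of 0.001 S/m over a 0.1 S/m halfspace.
    for period in (1e-3, 1.0, 1e5):
        z = layered_impedance([0.001, 0.1], [5000.0], period)
        print(period, apparent_conductivity(z, period), math.degrees(cmath.phase(z)))
```

For a uniform halfspace the recursion returns the apparent conductivity equal to the true conductivity and an impedance phase of 45 degrees, consistent with (1.3.11). For the two-layer model the apparent conductivity tends to the upper-layer value at short periods and to the halfspace value at long periods, which is the averaging behaviour described above.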
$\sigma_a(\omega)$ and $\phi(\omega)$ are commonly used as MT response functions, and plots of $\sigma_a$ (or $\rho_a = 1/\sigma_a$) and $\phi$ as a function of $\omega$ provide an approximate idea of the behaviour of the conductivity as a function of depth. However, before interpretations regarding the conductivity distribution are possible, the MT inverse problem must be solved. Despite the simplicity of the mathematical model of the Earth and the governing differential equation (1.3.9), the responses depend on the conductivity in a non-linear manner and the inverse problem is neither simple nor completely understood.

1.3.2 The MT inverse problem

This section briefly considers some of the methods which have been applied to the 1-D magnetotelluric inverse problem. The questions of existence and uniqueness of solutions and methods of model construction, appraisal and inference have been considered by many authors. This review is not intended to be complete, but rather to present the context for the work in this thesis. A thorough survey of applications of inverse theory to 1-D MT is presented by Whittall & Oldenburg (1990). Parker (1980, 1983, 1984) also reviews this problem.

A number of authors have considered the question of existence for the 1-D MT inverse problem. Weidelt (1972) derived a set of necessary conditions for the existence of a 1-D model solution involving an MT response functional $c(\omega) = -E(z=0,\omega)/\partial_z E(z=0,\omega)$ known as the complex c-response or inductive scale length (Schmucker 1970; Weidelt 1972). However, the conditions involve derivatives of the response, which cannot be calculated exactly for discrete data, and so the conditions are difficult to apply. Weidelt (1986) overcame this difficulty by deriving a set of $2N$ necessary and sufficient conditions for existence in terms of determinants involving (complex) responses measured at $N$ frequencies. These sets of conditions represent important theoretical results for the MT problem.
Unfortunately, practical (noisy) responses cannot be analyzed in this manner. In general, it is likely that no 1-D conductivity model $\sigma(z)$ exists which is exactly consistent with a given set of inaccurate data, nor is it justified to seek such a model. Rather, the question becomes one of the existence of a model $\sigma(z)$ which adequately reproduces the data according to some misfit criterion. Parker (1980) shows that when no model exactly fits the data, the model which minimizes the $\chi^2$ misfit consists of delta functions of infinite conductivity, but finite conductance, separated by insulating zones of zero conductivity. Parker calls this the $D^+$ class of models. If the misfit realized by the $D^+$ model is greater than the specified allowable misfit, then the data are incompatible with a 1-D model; thus, constructing a $D^+$ model with an acceptable misfit provides a necessary and sufficient condition for the existence of a 1-D solution. Parker & Whaler (1981) present a practical algorithm for constructing $D^+$ models.

Tikhonov (1965) proved a one-to-one correspondence between every (precise) complete data set and piecewise-analytic conductivity profiles, thereby demonstrating that, in principle, surface measurements are sufficient to uniquely determine the true conductivity $\sigma(z)$. Parker (1983) demonstrated that knowledge of precise responses on any open interval $\omega_1 < \omega < \omega_2$, or at an infinite number of equally-spaced, discrete frequencies, is also sufficient to uniquely determine $\sigma(z)$. Other authors have presented uniqueness proofs valid for different types of conductivity models. However, all MT uniqueness results require an infinite number of precise data.
In contrast, for all practical data sets consisting of a finite number of inaccurate responses the inverse problem is always non-unique and there are two alternatives: either no model exists which adequately reproduces the responses, or an infinite number of acceptable $\sigma(z)$ exist.

The problem of constructing 1-D conductivity models which adequately reproduce a set of measured MT responses has received considerable attention (see Whittall & Oldenburg 1990 for a review and comprehensive reference list). Construction methods can generally be categorized as asymptotic inversions, exact (non-linear) inversions and linearized inversions.

A variety of heuristic inversion schemes have been developed which consider asymptotic forms of the induction equations based on the physical process of EM diffusion. Although these are approximate inversions which are not guaranteed to reproduce the responses, they are straightforward to implement and are still used routinely in more sophisticated multi-dimensional interpretations. Schmucker (1987) reviews and compares a number of asymptotic methods including the well-known Niblett-Bostick transform (Niblett & Sayn-Wittgenstein 1960) and $\rho^*$-$z^*$ transform (Schmucker 1970).

Exact inversion methods which completely solve the non-linear MT inverse problem are generally two-stage algorithms. The first stage involves completing the measured data to obtain responses at all frequencies; the second (non-linear) stage maps the completed data to a uniquely-determined conductivity model. Exact inversions are not iterative, so a starting model is not required and there are no problems concerning convergence. A variety of exact inversion schemes have been developed involving different approaches to the data completion and inverse mapping stages.
For instance, the $D^+$ model construction of Parker (1980) and Parker & Whaler (1981) finds a least-squares fit of the measured responses to a partial fraction representation, which essentially completes the data. This partial fraction is then used to construct a continued fraction representation which may be interpreted in terms of a delta-comb function conductivity model. A similar procedure may be used to construct $H^+$ conductivity models which consist of uniform layers, each with a constant value of $\sigma h^2$, where $h$ is the layer thickness. In exact inversion methods, once the completed response functional is constructed, the conductivity model is uniquely determined. Therefore, any flexibility in the model construction must be built into the data completion procedure. For instance, Weidelt (1972) transformed the data to an MT impulse response function analogous to the seismic impulse response, and inverted this response using Gel'fand-Levitan theory to obtain a unique conductivity profile. Whittall & Oldenburg (1986) and Whittall (1987) extended Weidelt's method by constructing the impulse response using linear inversion techniques; this allowed the construction of diverse, minimum-structure impulse responses. The final conductivity model constructed approximately minimizes a related structural measure.

In the linearized approach to model construction, an approximate linear relationship is derived between a perturbation $\delta\sigma$ to an arbitrary starting model $\sigma_0$ and the corresponding change in the responses. This equation may be inverted using linear techniques to yield a solution for the model perturbation, and the model is updated. This procedure is repeated iteratively until an acceptable model is achieved. Linearized inversions may be divided into two categories. Parametric or under-parametrized inversions restrict the space of possible model solutions by representing the model by a small number of parameters.
This reduces or eliminates the non-uniqueness of the inversion; however, the solution is strongly dependent on the parametrization. Alternatively, over-parametrized inversions adopt a large number of model parameters to approximate an arbitrary function of depth. Backus-Gilbert linear inverse theory (Backus & Gilbert 1967, 1968, 1970) may be used to construct model perturbations which minimize some functional of $\delta\sigma$ and/or the data misfit. This approach has been used by Parker (1970), Oldenburg (1979) and Hobbs (1982). Importantly, Oldenburg (1983) transformed the linearized equations so that a functional of the model itself (not the model perturbation) is minimized at each iteration; this allows the construction of models of a specific character. Constable, Parker & Constable (1987), Smith & Booker (1988) and Dosso & Oldenburg (1989) have applied this method to construct minimum-structure conductivity models. This approach has a major advantage over exact (non-linear) inversions in that a structural measure of the model is minimized directly. The disadvantage is that, as with any iterative linearized method, it is difficult to verify whether a global minimum is achieved.

A number of approaches to the problem of appraisal for the 1-D MT inverse problem have been developed. One approach is to compute unique features associated with a given set of responses. For example, Parker (1982) found the minimum depth to a perfect conductor for practical data, such that the corresponding $D^+$ model achieved an acceptable misfit. The region below this depth may be considered the zone of total ignorance, since the data do not constrain the conductivity structure here unless additional information is supplied or assumed (e.g. prohibiting non-geophysical models such as the $D^+$ solution).
Monte Carlo and related random search techniques, which use forward modelling to test a large number of randomly generated models, have been applied to the inversion and appraisal of MT responses (e.g. Hermance & Grillot 1974; Jones & Hutton 1979; Fischer & Le Quang 1981). Model features may be appraised by generating a large number of acceptable models and estimating the most probable values or bounds for model parameters from this population. The advantage of this method is that no approximations such as linearization are required. However, since such a search can never be exhaustive, the computed bounds must be regarded as approximate at best. In addition, these methods are often not practical due to the low probability of randomly generating models consistent with modern data sets consisting of a large number of highly accurate responses. Finally, the method is strongly dependent on the model parametrization, which is usually chosen for computational efficiency rather than to provide a realistic representation of the Earth.

Backus & Gilbert (1968, 1970) developed a method of appraisal for linear inverse problems based on generating unique averages of the model from linear combinations of the data. For non-linear problems such as MT, Backus-Gilbert appraisal can be applied via linearization about some constructed model. Unfortunately, in this case the unique averages pertain only to models which are linearly close to the initial model. Oldenburg (1979) constructed a number of different conductivity models by inverting a set of MT responses and found different values for the model average by linearizing about these models. Parker (1983) and Oldenburg, Whittall & Parker (1984) have found linearized Backus-Gilbert appraisal to be inadequate for the MT inverse problem.
An alternative method of appraisal was developed by Oldenburg (1983), who computed bounds for localized conductivity averages by constructing extremal models which minimize or maximize the conductivity over specified regions. This method of appraisal may be considered an application of the general inference theory of Backus (1970a, b, c; 1972). However, since the extremal models are constructed via linearized inversion, it is difficult to establish whether the computed bounds represent true global (rather than local) extrema. Weidelt (1985) provides a fully non-linear theoretical analysis of extremal models for a small number of accurate MT responses.

1.3.3 Applicability of 1-D inversion

In this thesis, methods are developed to construct and appraise 1-D conductivity models by inverting MT measurements. Of course, in general, the conductivity of the Earth can vary in two or three dimensions. Higher-dimensional inversion algorithms are a topic of much current research (see e.g. the review by Oldenburg 1990). Nonetheless, inversions for 1-D structure are still an important source of information for a number of reasons.

First, there are geologic regions where lateral variation is small (e.g. sedimentary basins, oceanic lithosphere) and 1-D interpretations are directly applicable. At a sufficiently large scale, the deep geo-electrical structure may often be considered approximately 1-D; this structure can be studied using very long-period responses. Departures from one-dimensionality due to relatively shallow inhomogeneities can often be approximated by a frequency-independent static distortion, and only 1-D interpretations are necessary (Larsen 1977). Also, Weidelt (1972) has derived transformations for conductivity profiles and responses between a 1-D flat, Cartesian model and a radially-symmetric model.
In practice, many MT studies use 1-D modelling to provide rudimentary interpretations of the Earth structure, and Parker's (1980) $D^+$ solution provides a straightforward method to determine the applicability of this assumption. If the conductivity structure is not 1-D, the impedance measurements generalize to a tensor with four components. A number of authors have proposed using rotationally invariant parameters of the impedance tensor (e.g. Berdichevsky & Dimitriev 1976; Ranganayaki 1984) in 1-D inversions in an attempt to minimize multi-dimensional effects. Ingham (1988) and Park & Livelybrooks (1989) used modelling studies of invariant responses to evaluate 1-D inversions in the vicinity of a 3-D heterogeneity. They concluded that the 1-D inversion of these responses provided a good representation of the structure directly beneath the site when the site was located away from a finite, highly-conductive heterogeneity, but erroneous deep structure beneath the anomaly. At sites above or away from a finite resistive heterogeneity, 1-D inversions yielded a good structural representation at shallow and intermediate depths, but erroneous structure at greater depths.

A second reason for developing 1-D inversion algorithms is that 1-D models can be combined to provide good starting models for 2- or 3-D inversions (Park & Livelybrooks 1989). Also, 1-D inversions can be an integral component of multi-dimensional inversion schemes, as in the Rapid inversion algorithm of Smith (1989), or the AIM (Approximate Inverse Mapping) inversion of Oldenburg & Ellis (1990).

A third reason is that a thorough understanding of the 1-D inverse problem can provide a foundation for solving inverse problems in higher dimensions. This is true both in terms of understanding the inherent non-uniqueness of the inverse problem and in developing practical inversion algorithms.
In particular, it is anticipated that the construction and appraisal methods developed in this thesis for 1-D inversion could be extended to the 2-D case.

Finally, Parker (1983) notes that "the usefulness of the (1-D) model can best be judged by the large number of papers in the geophysical literature relying upon it for interpretational purposes and by the almost equally large number devoted to advancing the associated theory." This statement would appear to remain true today.

1.4 Overview of work in this thesis

The purpose of this thesis is to investigate the non-linear magnetotelluric inverse problem. In particular, linearized methods are developed to construct and appraise 1-D conductivity models of the Earth by inverting MT measurements.

Linearized methods are generally based on expanding a functional representing the forward problem about some starting model and neglecting the higher-order terms. Chapter 2 considers the linearization of the MT problem in detail. Complete expansions in terms of Fréchet differential series are derived for several choices of MT response and the Fréchet differentiability of these responses is verified. The relative linearity of the responses is quantified by examining the ratio of non-linear to linear terms in order to determine the best choice for a linearized approach. In addition, the similitude equation for MT, suggested by Gómez-Treviño (1987) as an alternative formulation to linearization, is examined and found to be inadequate in that it implicitly neglects first-order terms.

In Chapter 3 an iterative inversion algorithm is developed based on the local linearization of Chapter 2. This algorithm may be used to construct acceptable conductivity models which minimize an $l_2$ norm. The norms considered measure model structure or the deviation of the model from a given base model to produce minimum-structure and smallest-deviatoric models, respectively.
The constructed models are smooth and represent structural variations in terms of continuous gradients in the conductivity. Chapter 4 presents a similar inversion algorithm which may be used to construct minimum-structure or smallest-deviatoric models which minimize an $l_1$ norm. The solutions resemble layered Earth models with structural variations occurring discontinuously at distinct depths and complement the smooth solutions constructed in Chapter 3. The algorithms developed in Chapters 3 and 4 consider the model to be either $\sigma$ or $\log\sigma$, include an arbitrary weighting function in the model norm, and fit the data to a specified level of misfit: these algorithms should provide considerable flexibility in constructing 1-D conductivity models from MT responses.

Model features can be appraised by constructing extremal models which minimize and maximize localized conductivity averages (Oldenburg 1983). These extremal models provide lower and upper bounds for the conductivity average over the region of interest. In Chapter 5, an efficient, robust appraisal algorithm is developed using linear programming to minimize or maximize the conductivity averages. For optimal bounds, it is important that the extremal models are geophysically reasonable. The appraisal method is extended by constraining the total variation to limit unrealistic structure and ensure that the extremal models are plausible. The variation bound can be specified in terms of $\sigma$ or $\log\sigma$.

The extremal models in Chapter 5 are constructed via linearized inversion; therefore, the possibility always exists that the computed bounds represent local rather than global extrema. In Chapter 6 the method of simulated annealing optimization is applied to the problem of constructing extremal models.
Simulated annealing is a Monte-Carlo procedure based on an analogy between the parameters of a mathematical system to be optimized and the particles of a physical system which cools and anneals into a ground-state configuration according to the theory of statistical mechanics. Simulated annealing makes no approximations and is well known for its inherent ability to avoid unfavourable local minima. Although the method is considerably slower than linearized analysis, it represents a general and interesting new appraisal technique which may be used to corroborate results of the linearized approach.

In Chapter 7, the model construction algorithms developed in this thesis are used to analyze MT responses measured at a number of sites on Vancouver Island, Canada. The measurements were made to investigate monitoring of local changes in conductivity as a precursor for earthquakes. MT responses measured at the same site over a period of four years are analyzed and indicate no significant changes in the conductivity structure (no earthquakes of magnitude greater than 3.0 occurred in this period). Conductivity profiles at a number of sites are also considered in an attempt to infer the regional structure.

In Chapter 8, a method of correcting linearized inversion iterations is developed. The corrections consist of successively approximating the linearization error using analytic expressions developed in Chapter 2. The method would seem to represent a novel approach which can be implemented in a practical algorithm that significantly reduces the number of linearized iterations. In addition, a correspondence between the correction steps and iterations of the modified Newton's method for operators is established.

The algorithms developed in Chapters 3, 4, 5, 6 and 8 are all illustrated using synthetic test cases and MT field data collected as part of the LITHOPROBE project.
Finally, it should be noted that although the algorithms developed in this thesis are applied specifically to the 1-D MT problem, the methods are general and could be applied to a variety of inverse problems.

Chapter 2

Fréchet differential expansions and the linearity of MT responses

2.1 Introduction

In many geophysical inverse problems the observed responses are related to the Earth model by a non-linear operator or functional. A practical method of treating such problems is that of local linearization, whereby the non-linear operator is replaced locally by its linear approximation. This requires that the operator be expanded in a generalized Taylor series about an arbitrary starting model; the expansion may then be linearized by neglecting the higher-order terms. A variety of linearized iterative algorithms have been applied to the non-linear MT inverse problem. Oldenburg (1983) shows how the linearized problem may be transformed so that the solution at each iteration minimizes a functional of the model called a model norm. In this chapter linearization is examined as a basis for model norm solutions. Complete expansions are derived for several choices of MT response functional and the higher-order terms are considered in an attempt to formulate the most accurate linearized inverse problem. First, however, the necessary mathematical background is presented.

2.1.1 Functional analysis background

An operator on a normed vector space is a generalization of the idea of a function of a real variable. The extension of the concepts of algebra and calculus to operators is the realm of functional analysis. In this section the necessary definitions and results from the theory of functional analysis are presented in order to develop the general expansion and linearized solution of a non-linear operator equation.
Since the MT forward problem may be considered as a non-linear mapping of the space of integrable, real-valued functions (the geo-electrical model) to the space of real or complex numbers (the observed responses), special attention is given to this case.

The most efficient approach to non-linear operator problems is often that of linearization. For ordinary functions the relevant result is Taylor's theorem, which requires the existence of at least some of the derivatives at a point. To apply this idea requires a concept of the derivative for non-linear operators. There are a number of possible generalizations, but the most useful is that of the Fréchet derivative. Since many results in elementary calculus generalize naturally in terms of the Fréchet derivative, it plays an important role in functional analysis.

Definition 2.1 Fréchet derivative (Milne 1980, p. 289)

Let $U$, $V$ be normed vector spaces. An operator $F: U \to V$ is Fréchet differentiable at $x_0 \in U$ if there exists a continuous linear operator $F'(x_0): U \to V$ such that for all $h \in U$

$$F(x_0 + h) = F(x_0) + F'(x_0)h + \epsilon(x_0, h), \tag{2.1.1}$$
$$\lim_{\|h\|_U \to 0} \frac{\|\epsilon(x_0, h)\|_V}{\|h\|_U} = 0. \tag{2.1.2}$$

$F'(x_0)$ is known as the Fréchet derivative of $F$ at $x_0$. $F'(x_0)h$ is known as the Fréchet differential and is also written $F'(x_0)h = dF(x_0, h)$. The Fréchet derivative at $x_0$ is unique.

In (2.1.2), $\|\cdot\|_U$ and $\|\cdot\|_V$ refer to the norms associated with vector spaces $U$ and $V$ (when there is no chance of confusion, the subscripts are omitted). A distinction must be made between Fréchet derivatives and the ordinary derivative. In the case $F: \mathbb{R} \to \mathbb{R}$, where $\mathbb{R}$ is the set of real numbers, the two are closely related, but logically different (Griffel 1981). The ordinary derivative at $x_0$ is a number giving the slope of $F$ at $x_0$, and the ordinary differential $dF = F'(x_0)h$ is interpreted as the product of the number $F'(x_0)$ with the number $h$. The Fréchet derivative $F'(x_0)$, however, is not a number but a linear operator $F'(x_0): \mathbb{R} \to \mathbb{R}$. The Fréchet differential $dF = F'(x_0)h$ is interpreted as the result of applying the linear operator $F'(x_0)$ to the element $h \in \mathbb{R}$.

According to Definition 2.1, an operator $F$ is Fréchet differentiable if its Fréchet derivative exists. Therefore, to prove the Fréchet differentiability of $F$ at $x_0$, a linear operator $F'(x_0)$ must be found such that the remainder (non-linear) term $\epsilon(x_0, h)$ of (2.1.1) satisfies condition (2.1.2), i.e. $\epsilon(x_0, h)$ is much smaller than $h$ as $\|h\| \to 0$. In general, however, the expression for the Fréchet derivative $F'(x_0)$ alone is not easily written in an informative way; it is often more useful to compute the differential $dF = F'(x_0)h$ and not the Fréchet derivative itself (Zeidler 1985).

Equations (2.1.1) and (2.1.2) can be combined and written in the form

$$F(x_0 + h) = F(x_0) + F'(x_0)h + o[\|h\|]. \tag{2.1.3}$$

This equation demonstrates that if $F$ is Fréchet differentiable at $x_0$, then $F$ is locally linear at $x_0$. In a sufficiently small neighbourhood of $x_0$ (i.e. for $h$ sufficiently small), $F(x_0 + h)$ can be approximated to arbitrary accuracy by $F(x_0) + F'(x_0)h$, which is linear in $h$ since the Fréchet derivative $F'(x_0)$ is, by definition, a linear operator. The Fréchet differential gives the best linear approximation to the non-linear operator $F$ near $x_0$ (Griffel 1981).

The Fréchet differential $F'(x_0)h \in V$ represents the application of the linear operator $F'(x_0)$ to the element $h \in U$. When $U$ represents the space of integrable, real-valued functions of a real variable on the interval $[0, d]$, and when $V$, the range of $F$, represents the space of complex numbers, any continuous linear operator $\Phi: U \to V$ may be written as

$$\Phi h = \int_0^d \phi(u) h(u)\, du, \tag{2.1.4}$$

for some bounded function $\phi$ called the kernel of $\Phi$ (Griffel 1981).
The kernel corresponding to $F'(x_0)$ is called the Fréchet kernel of $F$, denoted by $G(x_0, u)$: thus (2.1.3) can be written as

$$F(x_0 + h) = F(x_0) + \int_0^d G(x_0, u) h(u)\, du + o[\|h\|]. \tag{2.1.5}$$

In order to investigate the form of the non-linear terms, higher-order Fréchet derivatives are required.

Definition 2.2 Second-order Fréchet derivative (Milne 1980, p. 295)

Let $F: U \to V$ be Fréchet differentiable in a neighbourhood of $x_0 \in U$. If the Fréchet derivative of $F'$ at $x_0$ exists, it is called the second Fréchet derivative of $F$ at $x_0$ and is written $F''(x_0)$. The second Fréchet differential is given by $d^2F(x_0, h) = F''(x_0)h^2$.

The second Fréchet derivative is a bi-linear operator, i.e. $F''(x_0): U \times U \to V$ is an operator which, given $(h_1, h_2) \in U \times U$ (the Cartesian product of $U$ with itself), associates it with an element of $V$ denoted by $F''(x_0)(h_1, h_2)$. $F''(x_0)$ applied to $(h_1, h_2)$ is linear in both $h_1$ and $h_2$. The second Fréchet differential $F''(x_0)h^2 = F''(x_0)(h, h)$ represents the restriction of $F''(x_0)$ to elements $(h, h) \in U \times U$.

Higher-order Fréchet derivatives are defined in a manner similar to Definition 2.2. With Fréchet derivatives of all orders defined, Taylor's expansion theorem may be generalized to operators:

Theorem 2.3 Taylor series expansion (Milne 1980, p. 298)

Let $F: U \to V$ have an $n$th Fréchet derivative $F^{(n)}(x_0)$ and Fréchet derivatives of order 1 through $(n-1)$ in an open ball $B(x_0, r) \subset U$; then for all $h \in U$ with $\|h\| < r$

$$F(x_0 + h) = F(x_0) + F'(x_0)h + \frac{1}{2!}F''(x_0)h^2 + \ldots + \frac{1}{n!}F^{(n)}(x_0)h^n + o[\|h\|^n], \tag{2.1.6}$$

where $F^{(n)}(x_0)h^n$ denotes the restriction of the multi-linear operator $F^{(n)}(x_0)$ to elements $(h, h, \ldots, h) \in (U \times U \times \ldots \times U)$.

In Theorem 2.3, $B(x_0, r)$, the open ball with centre $x_0 \in U$ and radius $r > 0$ ($r \in \mathbb{R}$), represents the set of all elements $x \in U$ such that $\|x - x_0\| < r$.
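The defining condition (2.1.2) and the kernel form (2.1.5) can be checked numerically on a toy functional. The sketch below assumes the hypothetical functional $F(x) = \int_0^d x(u)^2\,du$ with the $L_2$ norm, for which the Fréchet kernel is $G(x_0, u) = 2x_0(u)$ and the remainder is exactly $\|h\|^2$. This functional, the midpoint quadrature and all names are illustrative assumptions and are unrelated to the MT responses treated later.

```python
import math

D = 1.0            # length of the interval [0, d] (assumed)
N = 200            # number of quadrature cells (assumed)
DU = D / N
U_PTS = [(i + 0.5) * DU for i in range(N)]  # midpoint-rule nodes


def integrate(values):
    """Midpoint-rule approximation of an integral over [0, d]."""
    return sum(values) * DU


def functional(x):
    """Toy non-linear functional F(x) = integral of x(u)^2."""
    return integrate([xi * xi for xi in x])


def frechet_differential(x0, h):
    """F'(x0)h = integral of G(x0,u) h(u) du with kernel G = 2*x0(u)."""
    return integrate([2.0 * a * b for a, b in zip(x0, h)])


def norm(h):
    """Discrete L2 norm on [0, d]."""
    return math.sqrt(integrate([b * b for b in h]))


X0 = [1.0 + u for u in U_PTS]                     # expansion point x0(u)
H_SHAPE = [math.sin(math.pi * u) for u in U_PTS]  # fixed perturbation shape


def remainder_ratio(scale):
    """Return ||epsilon(x0,h)|| / ||h|| for h = scale * H_SHAPE, cf. (2.1.2)."""
    h = [scale * b for b in H_SHAPE]
    x = [a + b for a, b in zip(X0, h)]
    eps = functional(x) - functional(X0) - frechet_differential(X0, h)
    return abs(eps) / norm(h)


if __name__ == "__main__":
    for scale in (1e-1, 1e-2, 1e-3):
        print(scale, remainder_ratio(scale))
```

The ratio shrinks in proportion to the size of the perturbation, which is the behaviour required by (2.1.2) for a Fréchet differentiable functional whose leading non-linear term is second order.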
In the case where $U$ represents the space of integrable, real-valued functions defined on $[0, d]$ and $V$ represents real or complex numbers, the bilinear operator $F''(x_0)h^2$ can be written as (Griffel 1981)

$$F''(x_0)h^2 = \int_0^d\!\!\int_0^d G_2(x_0, u_1, u_2)\, h(u_1) h(u_2)\, du_1\, du_2, \tag{2.1.7}$$

where $G_2(x_0)$ represents the second Fréchet kernel of $F$ at $x_0$. Extending this result to higher-order Fréchet differentials allows the Taylor series expansion (2.1.6) to be written as

$$\begin{aligned} F(x_0 + h) = {} & F(x_0) + \int_0^d G_1(x_0, u_1) h(u_1)\, du_1 \\ & + \frac{1}{2!}\int_0^d\!\!\int_0^d G_2(x_0, u_1, u_2)\, h(u_1) h(u_2)\, du_1\, du_2 + \ldots \\ & + \frac{1}{n!}\int_0^d\!\!\int_0^d\!\!\cdots\!\!\int_0^d G_n(x_0, u_1, u_2, \ldots, u_n)\, h(u_1) h(u_2) \cdots h(u_n)\, du_1\, du_2 \cdots du_n + o[\|h\|^n], \end{aligned} \tag{2.1.8}$$

where $G_k(x_0)$ represents the Fréchet kernel of order $k$. Generalized Taylor series expansions of the form of (2.1.6) or (2.1.8) are also referred to as Fréchet differential expansions.

The concepts of Fréchet differentiation and Taylor series expansions for operators lead naturally to the generalization of linearized inversion techniques such as Newton's method. Consider the non-linear operator equation $F(x) = \theta$, where $F: U \to V$ and $\theta$ represents the zero element of $V$, with solution $x \in U$. If $F$ is Fréchet differentiable in an open ball $B(x_m, r)$ about an element $x_m \in U$ with $\|x - x_m\| < r$, then according to (2.1.3) $F(x)$ may be expanded about $x_m$ as

$$F(x) = \theta = F(x_m) + F'(x_m)(x - x_m) + o[\|x - x_m\|]. \tag{2.1.9}$$

If $[F'(x_m)]^{-1}$, the inverse of $F'(x_m)$, exists, then by neglecting the second-order terms, equation (2.1.9) can be rearranged to give an approximation $x_{m+1}$ to $x$:

$$x_{m+1} = x_m - [F'(x_m)]^{-1} F(x_m). \tag{2.1.10}$$

Equation (2.1.10) may be used in an iterative fashion by selecting an initial approximation $x_0$ and solving for $x_{m+1}$ for $m = 0, 1, \ldots$ to produce a Newton series. Conditions for the convergence of Newton's method for operators exist (e.g.
Hutson & Pym 1980), but in practice these conditions are difficult to apply and it is usually simpler to carry out the computations and check convergence a posteriori (Milne 1980). In general, Newton's method will converge provided the initial approximation $x_0$ is sufficiently near the solution. When this is the case, the rate of convergence is quadratic, i.e.

$$\|x_{m+1} - x\| \le A\,\|x_m - x\|^2, \tag{2.1.11}$$

where $A \in \mathbb{R}$ is a constant. A potential difficulty with the Newton series given by (2.1.10) is that the Fréchet derivative must be computed and a linear operator equation must be solved at each iteration. In practice, these calculations may be very time consuming. A number of modified Newton's methods have been devised to overcome this difficulty. One of the simplest is given by (Lusternik & Sobolev 1961)

$$x_{m+1} = x_m - [F'(x_0)]^{-1}F(x_m), \tag{2.1.12}$$

where the initial inverse operator is used for all iterations. This greatly reduces the computational effort, but can degrade the convergence.

The remainder of this chapter applies the concepts of Fréchet differential expansion and linearization to MT response functionals; in subsequent chapters linearized construction and appraisal algorithms based on Newton's method for operators are developed.

2.1.2 Application to 1-D MT inversion

The 1-D MT forward problem may be considered as a non-linear mapping of the space of integrable, real-valued functions of a real variable (the geo-electrical model $m(z)$, defined for $z \ge 0$) to the space of real or complex numbers (the observed responses $e(\omega)$ measured at $N$ distinct frequencies $\omega_j$), i.e.

$$e(\omega_j) = F(m,\omega_j), \quad j = 1,\dots,N. \tag{2.1.13}$$

If $F$ is Fréchet differentiable, the functional analysis methods of Section 2.1.1 may be used to solve (2.1.13) as an inverse problem.
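The difference between the Newton iteration (2.1.10) and the modified iteration (2.1.12) can be seen in a scalar setting, where the Fréchet derivative reduces to an ordinary derivative. The following is a minimal sketch only: the test function $F(x) = x^2 - 2$, the starting value and the tolerance are illustrative assumptions and have nothing to do with the MT problem itself.

```python
def newton(F, dF, x0, tol=1e-12, max_iter=100, freeze_derivative=False):
    """Scalar analogue of (2.1.10). With freeze_derivative=True the derivative
    evaluated at x0 is reused at every step, as in the modified method (2.1.12)."""
    x = x0
    d0 = dF(x0)
    history = [x]
    for _ in range(max_iter):
        d = d0 if freeze_derivative else dF(x)
        x_new = x - F(x) / d
        history.append(x_new)
        if abs(x_new - x) < tol:
            break
        x = x_new
    return history


# Hypothetical test problem: solve x**2 - 2 = 0 starting from x0 = 1.5.
F = lambda x: x * x - 2.0
dF = lambda x: 2.0 * x

full = newton(F, dF, 1.5)                            # derivative updated each step
frozen = newton(F, dF, 1.5, freeze_derivative=True)  # derivative fixed at x0

print(len(full) - 1, "Newton steps;", len(frozen) - 1, "modified-Newton steps")
```

Both iterations reach the same root, but the frozen-derivative variant converges only linearly and so needs more steps, which is the trade-off between cost per iteration and convergence rate described above.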
The model $m(z)$ is, of course, unknown, but it may be represented as the sum of an arbitrary starting model $m_0(z)$ and an unknown model perturbation $\delta m(z)$ so that (2.1.13) may be written

$$e(\omega_j) = F(m_0 + \delta m, \omega_j). \tag{2.1.14}$$

If $F$ is Fréchet differentiable in a neighbourhood of $m_0$, (2.1.14) may be expanded about $m_0(z)$ according to (2.1.5) to yield

$$e(\omega_j) = F(m_0,\omega_j) + \int_0^\infty G_1(m_0,\omega_j,z)\,\delta m(z)\,dz + o[\|\delta m\|], \tag{2.1.15}$$

where $G_1$ represents the first Fréchet kernel of $F$ with respect to $m$. This expansion may be linearized by neglecting the higher-order terms to produce

$$\delta e(\omega_j) = \int_0^\infty G_1(m_0,\omega_j,z)\,\delta m(z)\,dz, \quad j = 1,\dots,N, \tag{2.1.16}$$

where $\delta e = e - F(m_0) = F(m) - F(m_0)$. Equation (2.1.16) is in the form of a Fredholm integral equation of the first kind and can be inverted by extremizing some functional of $\delta m(z)$ using standard methods of linear inverse theory (e.g. Oldenburg 1984). Usually the smallest acceptable model perturbation is constructed by minimizing $\|\delta m\|$, and the model solution is given by $m_1(z) = m_0(z) + \delta m(z)$. Since higher-order terms have been neglected, this model may not fit the data and the process must be repeated iteratively until an acceptable model is achieved. This process is essentially Newton's method for operators described in Section 2.1.1. Parker (1970) first applied this method to the inversion of global electromagnetic induction data; it was first applied to the 1-D inversion of MT responses by Oldenburg (1979).

Alternatively, by substituting $\delta m(z) = m(z) - m_0(z)$, as first described by Oldenburg (1983), (2.1.16) can be recast as

$$\delta e(\omega_j) + \int_0^\infty G_1(m_0,\omega_j,z)\,m_0(z)\,dz = \int_0^\infty G_1(m_0,\omega_j,z)\,m(z)\,dz. \tag{2.1.17}$$

The left side of (2.1.17) consists of known quantities and may be considered modified data. However, in (2.1.17) the responses are related directly to the model, not to a model perturbation.
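One linearized step of the kind described above, constructing the smallest perturbation that satisfies a discretized form of (2.1.16), can be sketched as follows. This is an illustration under stated assumptions, not the algorithm developed in later chapters: the kernels are hypothetical real decaying exponentials rather than MT Fréchet kernels, the integral is replaced by a simple rectangle-rule sum on a uniform depth grid, and the data are noise-free.

```python
import numpy as np


def smallest_perturbation(G, de, dz):
    """Minimum-norm dm satisfying sum_k G[j, k] * dm[k] * dz = de[j].

    The solution is a linear combination of the kernels, dm = sum_j alpha_j G_j,
    with coefficients obtained from the Gram matrix of kernel inner products.
    """
    gram = (G * dz) @ G.T              # Gamma_ij = sum_k G_i(z_k) G_j(z_k) dz
    alpha = np.linalg.solve(gram, de)  # solve Gamma alpha = de
    return G.T @ alpha


# Hypothetical set-up: four kernels exp(-p_j z) on a uniform depth grid.
z = np.linspace(0.0, 20.0, 2001)
dz = z[1] - z[0]
decay = np.array([0.5, 1.0, 2.0, 4.0])
G = np.exp(-np.outer(decay, z))

dm_true = np.exp(-0.5 * (z - 3.0) ** 2)   # perturbation used to generate the data
de = G @ dm_true * dz                     # synthetic data, same quadrature rule

dm = smallest_perturbation(G, de, dz)
print("data misfit:", np.max(np.abs(G @ dm * dz - de)))
print("norms (constructed, true):", np.sum(dm**2) * dz, np.sum(dm_true**2) * dz)
```

The constructed perturbation reproduces the data but is not the perturbation that generated them; it is simply the member of the set of data-fitting perturbations with the smallest norm, which is why the choice of functional to be extremized matters.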
By formulating the inverse problem according to (2.1.17), linear inversion methods can be used to minimize a functional of the model itself, not simply a functional of the model perturbation. This is referred to as model norm inversion in this thesis since the functional (which often consists of a norm) is applied directly to the model. This method allows great flexibility: models of different global character can be constructed by extremizing different functionals of $m(z)$. A number of authors have applied this method to the inversion of MT data. Constable, Parker & Constable (1987), Smith & Booker (1988) and Dosso & Oldenburg (1989) use this formulation to minimize the $l_2$ norm of the first or second derivative of the model to construct 'flattest' or 'smoothest' models. These models are particularly useful in that they exhibit the minimum structure necessary to fit the data. Dosso & Oldenburg (1989) also minimize the $l_1$ norm of the model variation to produce minimum-structure models which more closely resemble layered Earth models. In addition, Oldenburg (1983) and Dosso & Oldenburg (1989) use (2.1.17) to construct extremal models which maximize or minimize box-car averages of the conductivity model. From these extremal models, upper and lower bounds may be computed for localized conductivity averages in order to appraise features of interest. These various applications of (2.1.17) are the basis for Chapters 3-5 of this thesis.

In any iterative inversion algorithm the choice of response and model may be important factors in determining the convergence. A particular choice may be 'more linear' than others in that it yields a more accurate linearized equation when the higher-order terms are neglected. The correct choice should increase the likelihood of the algorithm converging to the desired solution and minimize the number of iterations required.
The conductivity $\sigma(z)$ is chosen as model in the initial study since it appears to be the choice for which the inverse problem is most linear (Smith & Booker 1988). As described in Chapter 1, magnetotelluric responses consist of ratios of orthogonal components of electric and magnetic fields $E$ and $B$ or $H$ (subscripts indicating directional components of the fields are omitted in this chapter). The two responses commonly used in linearized inversions are

$$c(\sigma,\omega) = -\frac{1}{i\omega}\,\frac{E(\sigma,\omega)}{B(\sigma,\omega)} = -\frac{E(\sigma,\omega)}{\partial_z E(\sigma,\omega)}, \tag{2.1.18}$$

considered by Parker (1970, 1972, 1977b) and Constable et al. (1987), and its (scaled) reciprocal

$$R(\sigma,\omega) = \frac{B(\sigma,\omega)}{E(\sigma,\omega)} = \frac{1}{i\omega}\,\frac{\partial_z E(\sigma,\omega)}{E(\sigma,\omega)}, \tag{2.1.19}$$

used by Oldenburg (1979, 1981, 1983), Smith & Booker (1988) and Dosso & Oldenburg (1989). The responses $c$ and $R$ are proportional to the impedance $Z$ and admittance $Y$, respectively. Parker (1977b) derived the linear and remainder terms for the $c$ response and he, Chave (1984) and MacBain (1986, 1987) have considered the Fréchet differentiability of this response. Oldenburg (1979) derived the linear term for $R$. No proof of the Fréchet differentiability of the $R$ response or expressions for the series of higher-order terms for $R$ or $c$ have been given.

In the next two sections of this chapter Fréchet differential expansions, including higher-order terms, are derived for the two MT responses and the higher-order terms are shown to sum to a closed-form remainder term. In the usual MT method, measurements are made at the surface of the Earth ($z = 0$); however, in principle measurements can be made at an arbitrary depth $z = z_m$ (e.g. sea floor MT). In this chapter all results are derived for arbitrary measurement depth; this is motivated by a statement by Gómez-Treviño (1987) that Fréchet kernels for 1-D EM induction problems in the literature are limited to surface measurements. The Fréchet differentiability of $R$ is proved.
Results are illustrated for the special case of constant-conductivity models where Fréchet differentiation is equivalent to ordinary differentiation and the expansions can be evaluated directly. Section 2.4 quantifies the relative linearity of the $R$ and $c$ responses by examining the ratio of non-linear to linear terms in order to determine the most linear choice of response. Gómez-Treviño (1987) suggests that model norm inversions could be based on the similitude equation rather than on linearization. Section 2.6 shows that linearization is indeed the correct basis for these solutions. Finally, the linearized expressions derived in this chapter are the basis for the MT inversion and appraisal applications of Chapters 3, 4 and 5, and Chapter 8 also makes use of these expressions in an attempt to include or correct for the higher-order information in a linearized solution.

2.2 The response $R = \partial_z E/(i\omega E)$

2.2.1 Expansions for R

The definition of the Fréchet derivative (Definition 2.1) states the criteria that the derivative must satisfy, but does not indicate how the Fréchet differential terms are obtained. In general, the (linear) Fréchet derivative term may be determined by a standard perturbation technique. The governing differential equation may be written in terms of a reference or starting model $m_0(z)$ and response $e(m_0)$, or a perturbed model $m_0(z) + \delta m(z)$ and response $e(m_0 + \delta m) = e(m_0) + \delta e$. A differential equation which relates the perturbation in the model $\delta m(z)$ to the resulting change in the observed response $\delta e$ may be obtained by subtracting the original equation from the perturbed expression. Neglecting higher-order terms and solving this equation leads to a linear expression for $\delta e$ in terms of the first Fréchet differential.

In this section, the perturbation analysis is carried out in a slightly different manner which emphasizes the analogy with Fréchet differential expansions and retains higher-order terms.
The complete Fréchet differential expansion is derived for arbitrary measurement depth $z_m$. This expansion may be expressed as an infinite series of terms of increasing order in the model perturbation, or as a linear term and a closed-form remainder term which contains all the higher-order contributions.

The MT differential equation for the electric field is derived in Chapter 1:

$$\partial_z^2 E(\sigma,z) - i\omega\mu_0\,\sigma(z)\,E(\sigma,z) = 0, \tag{2.2.1}$$

with boundary condition $E(\sigma,z) \to 0$ as $z \to \infty$, where $\sigma(z)$ represents the conductivity model and the dependence of $E$ on $\omega$ is implicitly assumed. The data are usually measured at the Earth's surface, but the response $R$ defined by (2.1.19) may be generalized to depth as

$$R(\sigma,z) = \frac{1}{i\omega}\,\frac{\partial_z E(\sigma,z)}{E(\sigma,z)}. \tag{2.2.2}$$

Differentiating (2.2.2) and substituting from (2.2.1) yields the governing differential equation in terms of $R$:

$$\partial_z R(\sigma,z) + i\omega R^2(\sigma,z) = \mu_0\,\sigma(z). \tag{2.2.3}$$

Assume (2.2.3) is satisfied by an arbitrary conductivity model $\sigma_0(z)$ and introduce a perturbation

$$\sigma(z) = \sigma_0(z) + \delta\sigma(z). \tag{2.2.4}$$

This will result in a change in the response which may be expressed as an expansion

$$R(\sigma,z) = R(\sigma_0 + \delta\sigma, z) = R_0(z) + R_1(z) + R_2(z) + R_3(z) + \dots, \tag{2.2.5}$$

where $R_k$ represents the term of order $k$ in $\delta\sigma$. Substituting (2.2.4) and (2.2.5) into the differential equation (2.2.3) and equating terms of like order leads to

$$\partial_z R_0 + i\omega R_0^2 = \mu_0\sigma_0, \tag{2.2.6a}$$
$$\partial_z R_1 + (2i\omega R_0)R_1 = \mu_0\,\delta\sigma, \tag{2.2.6b}$$
$$\partial_z R_2 + (2i\omega R_0)R_2 = -i\omega R_1^2, \tag{2.2.6c}$$
$$\partial_z R_3 + (2i\omega R_0)R_3 = -i\omega\,2R_1R_2, \tag{2.2.6d}$$
$$\partial_z R_4 + (2i\omega R_0)R_4 = -i\omega\,(R_2^2 + 2R_1R_3), \tag{2.2.6e}$$

and so on. Consider a solution to equations (2.2.6) for a particular depth $z_m$. The zero-order equation (2.2.6a) has the form of the original differential equation (2.2.3) and is satisfied by $R_0(z_m) = R(\sigma_0,z_m)$. The remaining equations all have left sides of the same form. They are linear, first-order differential equations and may be solved using an integrating factor

$$\exp\left[\int_0^{z_m} 2i\omega R_0(u)\,du\right] = \left[\frac{E(\sigma_0,z_m)}{E(\sigma_0,0)}\right]^2. \tag{2.2.7}$$

Assuming that $R_k(\sigma,z) \to 0$ as $z \to \infty$ for $k = 0,1,\dots$, the solutions to (2.2.6) are

$$R_0(z_m) = R(\sigma_0,z_m), \tag{2.2.8a}$$
$$R_1(z_m) = -\int_{z_m}^\infty \mu_0\left[\frac{E(\sigma_0,z)}{E(\sigma_0,z_m)}\right]^2 \delta\sigma(z)\,dz, \tag{2.2.8b}$$
$$R_2(z_m) = \int_{z_m}^\infty i\omega\left[\frac{E(\sigma_0,z)}{E(\sigma_0,z_m)}\right]^2 R_1^2(z)\,dz, \tag{2.2.8c}$$
$$R_3(z_m) = \int_{z_m}^\infty i\omega\left[\frac{E(\sigma_0,z)}{E(\sigma_0,z_m)}\right]^2 2R_1(z)R_2(z)\,dz, \tag{2.2.8d}$$
$$R_4(z_m) = \int_{z_m}^\infty i\omega\left[\frac{E(\sigma_0,z)}{E(\sigma_0,z_m)}\right]^2 \left(R_2^2(z) + 2R_1(z)R_3(z)\right)dz. \tag{2.2.8e}$$

The zero-order term $R_0$ represents the response functional evaluated at the starting model, the first-order term $R_1$ is linear in $\delta\sigma$, and $R_2, R_3, \dots$ represent higher-order terms. Of course, it remains to be proved that $R_2, R_3, \dots$ actually are second or higher order in $\delta\sigma$. This implies the Fréchet differentiability of $R$ and is considered in Section 2.2.2.

To derive a closed-form remainder term $R_r(z_m)$ which contains all the higher-order contributions, the expansion for $R(\sigma,z)$ is modified to be

$$R(\sigma,z) = R(\sigma_0 + \delta\sigma, z) = R_0(z) + R_1(z) + R_r(z). \tag{2.2.9}$$

Substituting (2.2.9) and the conductivity perturbation (2.2.4) into the differential equation (2.2.3) and removing the zero- and first-order equations (2.2.6a, b) leads to

$$\partial_z R_r + (2i\omega R_0)R_r = -i\omega\,(R_1 + R_r)^2. \tag{2.2.10}$$

This differential equation may be solved for a depth $z = z_m$ by using the integrating factor given by (2.2.7) and noting that $R_1 + R_r = R(\sigma) - R(\sigma_0) = \delta R$ to yield

$$R_r(z_m) = \int_{z_m}^\infty i\omega\left[\frac{E(\sigma_0,z)}{E(\sigma_0,z_m)}\right]^2 \delta R^2(z)\,dz. \tag{2.2.11}$$

Since $\delta R = R(\sigma) - R(\sigma_0)$, the remainder term $R_r$ depends on the true model $\sigma$ in a non-linear manner; nonetheless, (2.2.11) gives a convenient closed-form representation of the linearization error. The expansion of $R(\sigma)$ about arbitrary $\sigma_0$ in terms of a linear and remainder term is always exact; however, the expansion in terms of an infinite series of higher-order terms may not converge for all choices of $\sigma_0$. When $\sigma$ is within the region of convergence for the Fréchet differential expansion about $\sigma_0$, the equivalence of the remainder term and the series of higher-order terms is easily established.
Substituting $\delta R = R(\sigma) - R(\sigma_0) = R_1 + R_2 + R_3 + \dots$ into the expression for the remainder term given by (2.2.11) and breaking the integral into terms according to their order yields the expansion of higher-order terms given by (2.2.8c, d, ...).

The expansion terms in the infinite series for $R(\sigma,z_m)$ given by (2.2.8a, b, ...) are written in a recursive form: $R_2$ depends on $R_1$, $R_3$ depends on $R_2$ and $R_1$, and so on. This form is compact, convenient for computation, and readily suggests the continuation of the series to higher order. However, the terms are not in the standard form for the Fréchet differential expansion according to Theorem 2.3 or equation (2.1.8). It is instructive to write the expansion terms in the form of (2.1.8) since this allows the identification of the Fréchet kernels. This can be done by substituting the expression for $R_k$ into $R_{k+1}$ from (2.2.8) and introducing the Heaviside step function $H$ defined as

$$H(x) = \begin{cases} 0, & \text{if } x < 0; \\ 1, & \text{if } x \ge 0, \end{cases} \tag{2.2.12}$$

to allow the order of integration to be rearranged. After some algebra, the results for the expansion terms in standard Fréchet differential form are

$$R_0(z_m) = R(\sigma_0,z_m), \tag{2.2.13a}$$
$$R_1(z_m) = \int_0^\infty G_1(z_m,z_1)\,\delta\sigma(z_1)\,dz_1, \tag{2.2.13b}$$
$$R_2(z_m) = \frac{1}{2!}\int_0^\infty\!\!\int_0^\infty G_2(z_m,z_1,z_2)\,\delta\sigma(z_1)\,\delta\sigma(z_2)\,dz_1\,dz_2, \tag{2.2.13c}$$
$$R_3(z_m) = \frac{1}{3!}\int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty G_3(z_m,z_1,z_2,z_3)\,\delta\sigma(z_1)\,\delta\sigma(z_2)\,\delta\sigma(z_3)\,dz_1\,dz_2\,dz_3, \tag{2.2.13d}$$

where $G_k(z_m,z_1,\dots,z_k)$ represents the Fréchet kernel of order $k$ given by

$$G_1(z_m,z_1) = -\mu_0\left[\frac{E(z_1)}{E(z_m)}\right]^2 H(z_1 - z_m), \tag{2.2.14a}$$
$$G_2(z_m,z_1,z_2) = 2i\omega\mu_0^2\int_0^\infty\left[\frac{E(z_1)E(z_2)}{E(z_3)E(z_m)}\right]^2 H(z_1 - z_3)\,H(z_2 - z_3)\,H(z_3 - z_m)\,dz_3, \tag{2.2.14b}$$
$$G_3(z_m,z_1,z_2,z_3) = 12\,\omega^2\mu_0^3\int_0^\infty\!\!\int_0^\infty\left[\frac{E(z_1)E(z_2)E(z_3)}{E(z_4)E(z_5)E(z_m)}\right]^2 H(z_1 - z_5)\,H(z_2 - z_5)\,H(z_5 - z_4)\,H(z_3 - z_4)\,H(z_4 - z_m)\,dz_4\,dz_5. \tag{2.2.14c}$$

The Fréchet kernels are evaluated at the starting model $\sigma_0$.
A number of authors refer to $G_1$ itself as the Fréchet derivative; however, this is not strictly in keeping with Definition 2.1 of the Fréchet derivative as a linear operator. For instance, the linear operator associated with $R_1(z_m)$ in (2.2.13b) is given by

$$\int_0^\infty G_1(\sigma_0,z_m,z)\,\cdot\;dz, \tag{2.2.15}$$

where the '$\cdot$' indicates application to a real function. To avoid confusion, $G_k$ will always be referred to as the $k$th Fréchet kernel in this thesis.

The first Fréchet kernel is sometimes referred to as the sensitivity since it determines, to first order, how the model perturbation $\delta\sigma$ affects the response $R(\sigma_0 + \delta\sigma)$. For $z_m \ne 0$ and $z < z_m$, the Heaviside function in (2.2.14a) ensures that $G_1(z_m,z) = 0$. This indicates that for $z < z_m$, $\delta\sigma(z)$ makes no first-order contribution to the value of $R$. In fact, $G_k(z_m,z_1,\dots,z_k) = 0$ if $z_j < z_m$ for any $j$, $1 \le j \le k$, which indicates that $\delta\sigma(z < z_m)$ makes no contribution of any order to $R$. This demonstrates that the 1-D MT method is only sensitive to the conductivity structure below the depth of measurement.

This result has the following consequence. Since the conductivity structure above the depth of measurement $z_m$ is irrelevant, the coordinate system may always be redefined so that this depth is taken to be the origin, i.e. $z_m = 0$. The general expressions for the linear and remainder terms as well as the expansion in terms of Fréchet kernels given by (2.2.13) and (2.2.14) then reduce to those that would be derived for surface measurements. Only the recursive form of the higher-order terms in (2.2.8) requires expressions for $z_m > 0$. It is also interesting to note that although the first Fréchet kernel $G_1$ at depth $z_1$ depends only on the electric fields at depths $z_1$ and $z_m$, higher-order Fréchet kernels depend on the field over the entire depth range.

To formulate a practical inversion algorithm, the Fréchet differential expansion is linearized by neglecting the terms $R_r = R_2 + R_3 + \dots$.
In Section 2.2.2 it is shown that the remainder term is second order in $\delta\sigma$, so this approximation is good provided $\delta\sigma$ is small. As $\delta\sigma \to 0$ the remainder term goes to zero faster than the linear term, and convergence is guaranteed. For $z_m = 0$ the linearized equation reduces to that derived by Oldenburg (1979):

$$\delta R(\omega) \approx R_1(\omega) = \int_0^\infty G_1(\sigma_0,\omega,z_m{=}0,z)\,\delta\sigma(z)\,dz, \tag{2.2.16}$$

where the dependence on $\omega$ is explicitly indicated and the Fréchet kernel is given by

$$G_1(\sigma_0,\omega,z_m{=}0,z) = -\mu_0\left[\frac{E(\sigma_0,\omega,z)}{E(\sigma_0,\omega,0)}\right]^2. \tag{2.2.17}$$

By applying the transformation given in (2.1.17), (2.2.16) may be written as

$$\delta R(\omega) + \int_0^\infty G_1(\sigma_0,\omega,z_m{=}0,z)\,\sigma_0(z)\,dz = \int_0^\infty G_1(\sigma_0,\omega,z_m{=}0,z)\,\sigma(z)\,dz. \tag{2.2.18}$$

Equation (2.2.18) can be inverted directly for a model which minimizes some functional using standard methods of linear inverse theory. Since the EM fields are complex, (2.2.18) may be considered as two equations consisting of the real and imaginary parts or manipulated and represented in terms of amplitude and phase. In practice, responses are measured at $N$ frequencies $\omega_j$ and the solution involves inverting a set of $2N$ real equations.

2.2.2 Proof of the Fréchet differentiability of R

The expansion for $R(\sigma)$ developed in Section 2.2.1 may be written

$$\delta R(z_m) = \int_0^\infty G_1(\sigma_0,z_m,z)\,\delta\sigma(z)\,dz + R_r(z_m). \tag{2.2.19}$$

This is an exact expression; however, in this form it is not amenable to solution using methods of linear inverse theory. If the remainder term can be shown to be $o[\|\delta\sigma\|]$, then it may be neglected and the expression linearized (this is equivalent to proving the Fréchet differentiability of $R$). Chave (1984) claims that it is important to prove that the remainder term actually is second order in the model perturbation, i.e. $R_r = O[\|\delta\sigma\|^2]$. This is a stronger condition than Fréchet differentiability and immediately implies it.
The linearization breaks down if the remainder term is not small, and this is observed in geophysical problems (e.g. Woodhouse 1976).

A number of authors have considered the Fréchet differentiability of the MT response $c$ given by (2.1.18) (see references cited in Section 2.3.1). MacBain (1987) proves that the linearization error for the $c$ response is $O[\|\delta\sigma\|^2]$, which firmly establishes the Fréchet differentiability of $c$. This fact can be used to prove a similar result for the $R$ response as follows.

MacBain (1987) proves that the functional $c$ satisfies

$$c(\sigma) - c(\sigma_0) = c'(\sigma_0)\,\delta\sigma + e(\sigma,\sigma_0), \tag{2.2.20}$$

where $c'(\sigma_0)\,\delta\sigma$ represents the Fréchet differential (linear) term and the linearization error satisfies

$$|e(\sigma,\sigma_0)| \le K\,\|\delta\sigma\|^2, \tag{2.2.21}$$

for some $K \in \mathbb{R}$, provided $\|\delta\sigma\| = \|\sigma - \sigma_0\|$ is small enough. Now consider the Taylor series expansion of $1/C$ about $C_0$, for complex $C$ and $C_0$, which may be written as

$$\frac{1}{C} = \frac{1}{C_0} - \frac{1}{C_0^2}\,[C - C_0] + \varepsilon(C,C_0), \tag{2.2.23}$$

where

$$|\varepsilon(C,C_0)| \le \frac{16}{|C_0|^3}\,|C - C_0|^2, \tag{2.2.24}$$

provided $|C - C_0| < |C|/2$. Using $C = c(\sigma)$, $C_0 = c(\sigma_0)$ and (2.2.20), (2.2.23) may be written as

$$\frac{1}{c(\sigma)} = \frac{1}{c(\sigma_0)} - \frac{1}{c^2(\sigma_0)}\left[c'(\sigma_0)\,\delta\sigma + e(\sigma,\sigma_0)\right] + \varepsilon(c(\sigma),c(\sigma_0)) = \frac{1}{c(\sigma_0)} - \frac{1}{c^2(\sigma_0)}\,c'(\sigma_0)\,\delta\sigma + E(\sigma,\sigma_0), \tag{2.2.25}$$

where the error $E(\sigma,\sigma_0)$ is given by

$$E(\sigma,\sigma_0) = \varepsilon(c(\sigma),c(\sigma_0)) - \frac{1}{c^2(\sigma_0)}\,e(\sigma,\sigma_0). \tag{2.2.26}$$

It follows that

$$|E(\sigma,\sigma_0)| \le |\varepsilon(c(\sigma),c(\sigma_0))| + \frac{1}{|c(\sigma_0)|^2}\,|e(\sigma,\sigma_0)|, \tag{2.2.27}$$

and using (2.2.21) and (2.2.24), (2.2.27) becomes

$$|E(\sigma,\sigma_0)| \le \frac{16}{|c(\sigma_0)|^3}\,|c(\sigma) - c(\sigma_0)|^2 + \frac{K}{|c(\sigma_0)|^2}\,\|\delta\sigma\|^2. \tag{2.2.28}$$

By (2.2.20) and (2.2.21)

$$|c(\sigma) - c(\sigma_0)| \le |c'(\sigma_0)\,\delta\sigma| + |e(\sigma,\sigma_0)| \le 2\,\|c'(\sigma_0)\|\,\|\delta\sigma\| + K\,\|\delta\sigma\|^2, \tag{2.2.29}$$

and so (2.2.28) may be written (to second order in $\|\delta\sigma\|$) as

$$|E(\sigma,\sigma_0)| \le \left[\frac{64\,\|c'(\sigma_0)\|^2}{|c(\sigma_0)|^3} + \frac{K}{|c(\sigma_0)|^2}\right]\|\delta\sigma\|^2. \tag{2.2.30}$$

Thus by (2.2.25) and (2.2.30), the linearization error for the response $1/c$ is $O[\|\delta\sigma\|^2]$, and since the response $R = -1/(i\omega c)$, it follows that $R$ must also have this property (provided $c \ne 0$).
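The bound on the Taylor remainder of $1/C$ used in this argument can be checked with a short calculation. The following worked step is a sketch added for illustration and is not taken from MacBain's proof; it assumes only $C \ne 0$, $C_0 \ne 0$ and the stated condition $|C - C_0| < |C|/2$, and it takes the constant in (2.2.24) to be 16, as it appears in (2.2.28). Here $\varepsilon(C,C_0)$ denotes the remainder of the expansion (2.2.23).

```latex
\begin{align*}
\varepsilon(C,C_0) &= \frac{1}{C} - \frac{1}{C_0} + \frac{C - C_0}{C_0^2}
  = \frac{C_0^2 - C\,C_0 + C\,(C - C_0)}{C\,C_0^2}
  = \frac{(C - C_0)^2}{C\,C_0^2}, \\[4pt]
|C_0| &\le |C| + |C - C_0| < \tfrac{3}{2}\,|C|
  \quad\Longrightarrow\quad \frac{1}{|C|} < \frac{3}{2\,|C_0|}, \\[4pt]
|\varepsilon(C,C_0)| &= \frac{|C - C_0|^2}{|C|\,|C_0|^2}
  < \frac{3}{2}\,\frac{|C - C_0|^2}{|C_0|^3}
  \le \frac{16}{|C_0|^3}\,|C - C_0|^2 .
\end{align*}
```

Under these assumptions the remainder is exactly quadratic in $C - C_0$, and the constant 16 is a comfortable overestimate; only the quadratic dependence matters for the conclusion that the error in $1/c$ is second order.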
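This implies the Fréchet differentiability of $R$.

2.2.3 Constant-conductivity example

A simple yet instructive example where the Fréchet differential expansion can be evaluated directly is the case where conductivity profiles and perturbations are depth independent, i.e. $\sigma(z) = \sigma$, $\sigma_0(z) = \sigma_0$, and $\delta\sigma(z) = \delta\sigma$. For a halfspace conductivity profile $\sigma$, $E(\sigma,z) = E(\sigma,0)\,e^{-\sqrt{i\omega\mu_0\sigma}\,z}$, $R(\sigma) = -\sqrt{\mu_0\sigma/(i\omega)}$, and it is straightforward to show that the general expressions for the Fréchet differential terms given by (2.2.8) reduce to

$$R_0 = -\sqrt{\frac{\mu_0\sigma_0}{i\omega}}, \qquad R_1 = -\frac{1}{2}\sqrt{\frac{\mu_0}{i\omega\sigma_0}}\,\delta\sigma, \qquad R_2 = \frac{1}{8}\sqrt{\frac{\mu_0}{i\omega\sigma_0^3}}\,\delta\sigma^2, \qquad R_3 = -\frac{1}{16}\sqrt{\frac{\mu_0}{i\omega\sigma_0^5}}\,\delta\sigma^3, \tag{2.2.31}$$

and thus the Fréchet differential expansion for $R(\sigma)$ is given by

$$R(\sigma) = -\sqrt{\frac{\mu_0\sigma_0}{i\omega}}\left(1 + \frac{1}{2}\frac{\delta\sigma}{\sigma_0} - \frac{1}{8}\frac{\delta\sigma^2}{\sigma_0^2} + \frac{1}{16}\frac{\delta\sigma^3}{\sigma_0^3} - \dots\right). \tag{2.2.32}$$

The first term represents the functional $R$ evaluated at the starting model $\sigma_0$, the second term is linear in $\delta\sigma$ and the remaining terms represent an alternating series increasing in powers of $\delta\sigma$. For this simple case of constant conductivities, Fréchet differentiation of $R$ is equivalent to ordinary differentiation and the Fréchet differential expansion reduces to an ordinary Taylor series expansion

$$R(\sigma) = R(\sigma_0) + \left.\frac{dR}{d\sigma}\right|_{\sigma_0}\delta\sigma + \frac{1}{2!}\left.\frac{d^2R}{d\sigma^2}\right|_{\sigma_0}\delta\sigma^2 + \frac{1}{3!}\left.\frac{d^3R}{d\sigma^3}\right|_{\sigma_0}\delta\sigma^3 + \dots. \tag{2.2.33}$$

It is straightforward to show that taking $R = -\sqrt{\mu_0\sigma/(i\omega)}$ in (2.2.33) leads to an expression identical to (2.2.32). By using the ratio test for convergence of a series (e.g. Kaplan 1973, p. 385) it may be shown that the series of higher-order terms in (2.2.32) converges for $|\delta\sigma/\sigma_0| < 1$ and diverges for $|\delta\sigma/\sigma_0| > 1$. The remainder term $R_r$ given by (2.2.11) reduces, for constant conductivities, to

$$R_r = \frac{1}{2}\sqrt{\frac{\mu_0}{i\omega\sigma_0}}\left(\sqrt{\sigma_0} - \sqrt{\sigma_0 + \delta\sigma}\right)^2, \tag{2.2.34}$$

and it is straightforward to verify that $R(\sigma) = R_0 + R_1 + R_r$ holds for all $\sigma_0$ and $\delta\sigma$.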
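The two statements above, that the linear-plus-remainder form is exact for any perturbation while the infinite series converges only for $|\delta\sigma/\sigma_0| < 1$, can be checked numerically. The sketch below assumes the halfspace response $R(\sigma) = -\sqrt{\mu_0\sigma/(i\omega)}$, generates the series terms from the binomial coefficients of $(1+x)^{1/2}$, and uses the closed form $R_r = \tfrac12\sqrt{\mu_0/(i\omega\sigma_0)}\,(\sqrt{\sigma_0} - \sqrt{\sigma_0+\delta\sigma})^2$ for the remainder; the function names and the numerical values of $\sigma_0$, $\delta\sigma$ and $\omega$ are illustrative choices only.

```python
import cmath
import math

MU0 = 4e-7 * math.pi  # magnetic permeability of free space (H/m)


def R_halfspace(sigma, omega):
    """Halfspace response R = -sqrt(mu0 * sigma / (i * omega))."""
    return -cmath.sqrt(MU0 * sigma / (1j * omega))


def R_terms(sigma0, dsigma, omega, n_terms):
    """Terms R_0, R_1, ... of the expansion about sigma0 (binomial series of (1+x)^(1/2))."""
    x = dsigma / sigma0
    R0 = R_halfspace(sigma0, omega)
    terms, coeff = [], 1.0               # coeff = binomial coefficient C(1/2, k)
    for k in range(n_terms):
        terms.append(R0 * coeff * x**k)
        coeff *= (0.5 - k) / (k + 1)
    return terms


def R_remainder(sigma0, dsigma, omega):
    """Closed-form remainder for constant conductivities."""
    diff = math.sqrt(sigma0) - math.sqrt(sigma0 + dsigma)
    return 0.5 * cmath.sqrt(MU0 / (1j * omega * sigma0)) * diff**2


sigma0, omega = 0.01, 2.0 * math.pi      # S/m and rad/s (illustrative values)

# Inside the region of convergence: the series sums to the exact response.
small = R_terms(sigma0, 0.005, omega, 40)
exact_small = R_halfspace(sigma0 + 0.005, omega)

# Outside it (dsigma / sigma0 = 3): the series diverges, yet R0 + R1 + Rr is exact.
large = R_terms(sigma0, 0.03, omega, 40)
exact_large = R_halfspace(sigma0 + 0.03, omega)
closed_large = large[0] + large[1] + R_remainder(sigma0, 0.03, omega)

print(abs(sum(small) - exact_small) / abs(exact_small))
print(abs(closed_large - exact_large) / abs(exact_large), abs(large[-1]))
```

The first printed value is at round-off level, and so is the misfit of the closed form in the second line, while the last number shows the size of the fortieth series term for the large perturbation.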
Writing $(\sqrt{\sigma_0} - \sqrt{\sigma_0 + \delta\sigma})^2 = \sigma_0\,(1 - \sqrt{1 + \delta\sigma/\sigma_0})^2$ in the constant-conductivity remainder term and expanding $\sqrt{1 + \delta\sigma/\sigma_0}$ in a binomial series (provided $|\delta\sigma/\sigma_0| < 1$) leads to

$$R_r = -\sqrt{\frac{\mu_0\sigma_0}{i\omega}}\left(-\frac{1}{8}\frac{\delta\sigma^2}{\sigma_0^2} + \frac{1}{16}\frac{\delta\sigma^3}{\sigma_0^3} - \dots\right), \tag{2.2.35}$$

which demonstrates the equivalence of the remainder term and the infinite series of higher-order terms within the region of convergence.

2.3 The response $c = -E/\partial_z E$

2.3.1 Expansions for c

Parker (1977b) derived the linear and remainder terms for the MT response $c$, defined by (2.1.18), when the point of measurement is the surface of the Earth. In this section, the complete expansion in terms of an infinite series of higher-order terms is derived and the equivalence of the remainder term and the series of higher-order terms is demonstrated. Also, by changing the manner in which the response is generalized to depth, these results are obtained for arbitrary measurement depth $z_m$.

Parker (1977b) generalized the $c$ response according to

$$c(\sigma,z) = -\frac{E(\sigma,z)}{\partial_z E(\sigma,0)}. \tag{2.3.1}$$

Since $c(\sigma,z)$ simply represents a scaling of the electric field, it must satisfy the differential equation (2.2.1) for $E(\sigma,z)$, i.e.

$$\partial_z^2 c(\sigma,z) - i\omega\mu_0\,\sigma(z)\,c(\sigma,z) = 0. \tag{2.3.2}$$

Following the procedure outlined in Section 2.2, assume (2.3.2) is satisfied by a conductivity $\sigma_0(z)$ and introduce a perturbation

$$\sigma(z) = \sigma_0(z) + \delta\sigma(z), \tag{2.3.3a}$$
$$c(\sigma,z) = c(\sigma_0 + \delta\sigma, z) = c_0(z) + c_1(z) + c_2(z) + \dots. \tag{2.3.3b}$$

Equating terms of like order leads to

$$\partial_z^2 c_0 - (i\omega\mu_0\sigma_0)\,c_0 = 0, \tag{2.3.4a}$$
$$\partial_z^2 c_1 - (i\omega\mu_0\sigma_0)\,c_1 = (i\omega\mu_0)\,c_0\,\delta\sigma, \tag{2.3.4b}$$
$$\partial_z^2 c_2 - (i\omega\mu_0\sigma_0)\,c_2 = (i\omega\mu_0)\,c_1\,\delta\sigma, \tag{2.3.4c}$$
$$\partial_z^2 c_3 - (i\omega\mu_0\sigma_0)\,c_3 = (i\omega\mu_0)\,c_2\,\delta\sigma, \tag{2.3.4d}$$

and so on. Consider a solution to (2.3.4) for a particular depth $z_m$. The zero-order equation (2.3.4a) is in the form of the original differential equation (2.3.2) and has a solution $c_0(z_m) = c(\sigma_0,z_m)$. The remaining equations all have the same form and can be solved if a Green's function $Q(\sigma_0,z_m,z)$ (e.g.
Morse & Feshbach 1953, Chapter 7) can be found which satisfies

$$\partial_z^2 Q(\sigma_0,z_m,z) - i\omega\mu_0\,\sigma_0(z)\,Q(\sigma_0,z_m,z) = \delta(z_m - z), \tag{2.3.5}$$

where $\delta(z_m - z)$ is the Dirac delta function centred at $z = z_m$. The Green's function $Q$ must satisfy boundary conditions $Q(z) = 0$ for $z \to \infty$ and $\partial_z Q(z{=}0) = 0$ (the latter condition follows from definition (2.3.1) and the expansion $c(\sigma) = c(\sigma_0) + c_1 + c_2 + \dots$). The solutions to (2.3.4) are then given by

$$c_0(z_m) = c(\sigma_0,z_m), \tag{2.3.6a}$$
$$c_1(z_m) = \int_0^\infty i\omega\mu_0\,Q(\sigma_0,z_m,z)\,c_0(z)\,\delta\sigma(z)\,dz, \tag{2.3.6b}$$
$$c_2(z_m) = \int_0^\infty i\omega\mu_0\,Q(\sigma_0,z_m,z)\,c_1(z)\,\delta\sigma(z)\,dz, \tag{2.3.6c}$$
$$c_3(z_m) = \int_0^\infty i\omega\mu_0\,Q(\sigma_0,z_m,z)\,c_2(z)\,\delta\sigma(z)\,dz. \tag{2.3.6d}$$

As before, the zero-order term represents the response functional evaluated at the starting model, the first-order term is linear in $\delta\sigma$, and $c_2, c_3, \dots$ represent higher-order contributions. A closed-form remainder term $c_r$ can be found by introducing an expansion

$$c(\sigma) = c(\sigma_0 + \delta\sigma) = c_0 + c_1 + c_r \tag{2.3.7}$$

into the differential equation (2.3.2). This leads to

$$c_r(z_m) = \int_0^\infty i\omega\mu_0\,Q(\sigma_0,z_m,z)\,\delta c(z)\,\delta\sigma(z)\,dz, \tag{2.3.8}$$

where $\delta c = c(\sigma) - c(\sigma_0)$. The equivalence of the series of higher-order terms and the remainder term is clear: using $\delta c = c_1 + c_2 + \dots$ in (2.3.8) and breaking the integral into terms according to their order yields the expansion terms given by (2.3.6c, d, ...).

It is straightforward to change the recursive form of the expansion terms in (2.3.6) to the standard form for the Fréchet differential expansion according to (2.1.8). By substituting the expression for $c_{k-1}$ into the expression for $c_k$ and implicitly assuming the dependence on $\sigma_0$, (2.3.6) can be written as

$$c_0(z_m) = c(\sigma_0,z_m), \tag{2.3.9a}$$
$$c_1(z_m) = \int_0^\infty G_1(z_m,z_1)\,\delta\sigma(z_1)\,dz_1, \tag{2.3.9b}$$
$$c_2(z_m) = \frac{1}{2!}\int_0^\infty\!\!\int_0^\infty G_2(z_m,z_1,z_2)\,\delta\sigma(z_1)\,\delta\sigma(z_2)\,dz_1\,dz_2, \tag{2.3.9c}$$
$$c_3(z_m) = \frac{1}{3!}\int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty G_3(z_m,z_1,z_2,z_3)\,\delta\sigma(z_1)\,\delta\sigma(z_2)\,\delta\sigma(z_3)\,dz_1\,dz_2\,dz_3, \tag{2.3.9d}$$

where the Fréchet kernels are defined by

$$G_1(z_m,z_1) = i\omega\mu_0\,Q(z_m,z_1)\,c(\sigma_0,z_1), \tag{2.3.10a}$$
$$G_2(z_m,z_1,z_2) = -2\omega^2\mu_0^2\,Q(z_m,z_2)\,Q(z_2,z_1)\,c(\sigma_0,z_1), \tag{2.3.10b}$$
$$G_3(z_m,z_1,z_2,z_3) = -6i\omega^3\mu_0^3\,Q(z_m,z_3)\,Q(z_3,z_2)\,Q(z_2,z_1)\,c(\sigma_0,z_1). \tag{2.3.10c}$$

The problem in making use of these expansions for $c(\sigma,z_m)$ is that the Green's function $Q(\sigma_0,z_m,z)$ may be difficult to determine for general $z_m$ and $\sigma_0$. However, Parker (1977b) shows that even when $Q(\sigma_0,z_m,z)$ cannot be determined,

$$Q(\sigma_0,z_m{=}0,z) = -c(\sigma_0,z) \tag{2.3.11}$$

may be chosen. Thus, for $z_m = 0$ the linearized equation reduces to

$$\delta c(\omega) \approx c_1(\omega) = \int_0^\infty G_1(\sigma_0,\omega,z)\,\delta\sigma(z)\,dz, \tag{2.3.12a}$$
$$G_1(\sigma_0,\omega,z) = -i\omega\mu_0\,c^2(\sigma_0,z), \tag{2.3.12b}$$

and the remainder term reduces to

$$c_r(\omega) = \int_0^\infty -i\omega\mu_0\,c(\sigma_0,z)\,\delta c(z)\,\delta\sigma(z)\,dz, \tag{2.3.13}$$

which correspond to the expressions derived by Parker (1977b). The linear and remainder terms for arbitrary $z_m$ as well as the series of higher-order terms given by (2.3.6c, d, ...) or (2.3.9c, d, ...) cannot be determined in this manner.

In general, this does not represent a practical limitation. MT responses are usually measured at the surface, and if not, the coordinate system may be redefined so that $z_m = 0$. Thus, expressions for $\delta c$ and $c_r$ are always available, and these are all that are required to carry out a linearized inversion (an expression for $c_r$ is required to verify that the neglected terms are second order).
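The surface kernel (2.3.12b) can be checked in the one case where everything is known in closed form. The sketch below assumes a uniform halfspace, for which $E \propto e^{-kz}$ with $k = \sqrt{i\omega\mu_0\sigma_0}$ and hence $c(\sigma_0,z) = e^{-kz}/k$; for a depth-independent perturbation the integral of $G_1$ over depth should then equal the ordinary derivative $dc/d\sigma$ at the surface. The conductivity, frequency, integration depth and grid size are illustrative assumptions, and the integral is evaluated with a hand-written trapezoidal rule.

```python
import cmath
import math
import numpy as np

MU0 = 4e-7 * math.pi  # magnetic permeability of free space (H/m)


def c_halfspace(sigma, omega, z=0.0):
    """c(sigma, z) = -E(z) / dE/dz(0) for a uniform halfspace with E ~ exp(-k z)."""
    k = cmath.sqrt(1j * omega * MU0 * sigma)
    return np.exp(-k * z) / k


def integrated_kernel(sigma0, omega, z_max=1.0e5, n=20001):
    """Trapezoidal estimate of the depth integral of G_1 = -i omega mu0 c^2(sigma0, z)."""
    z = np.linspace(0.0, z_max, n)
    g1 = -1j * omega * MU0 * c_halfspace(sigma0, omega, z) ** 2
    return np.sum(0.5 * (g1[1:] + g1[:-1]) * np.diff(z))


sigma0, omega = 0.01, 2.0 * math.pi   # S/m and rad/s (illustrative values)

# Ordinary derivative of c = (i omega mu0 sigma)^(-1/2) with respect to sigma.
analytic = -0.5 / (cmath.sqrt(1j * omega * MU0) * sigma0**1.5)
numeric = integrated_kernel(sigma0, omega)

print(numeric, analytic)
```

The integration depth of 100 km corresponds to roughly twenty skin depths for these values, so truncating the integral there has a negligible effect on the comparison.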
However, for completeness the linear term, series of higher-order terms and remainder term will be derived for arbitrary measurement depth $z_m$.

Expressions for these terms can be derived if the responses are generalized to depth according to

$$\tilde c(\sigma,z) = -\frac{E(\sigma,z)}{\partial_z E(\sigma,z)}, \tag{2.3.14}$$

i.e. $\tilde c(0) = c(0)$, but in general $\tilde c(z) \ne c(z)$. Following the procedure outlined for expanding $R$ and $c$, a differential equation may be derived for $\tilde c$, and a perturbation introduced to the conductivity and response. Collecting terms according to their order and solving the resulting series of differential equations (using an integrating factor) leads to the following expressions for the expansion terms:

$$\tilde c_0(z_m) = \tilde c(\sigma_0,z_m), \tag{2.3.15a}$$
$$\tilde c_1(z_m) = \int_{z_m}^\infty -i\omega\mu_0\,\tilde c_0^2(z)\,e^{s(z_m,z)}\,\delta\sigma(z)\,dz, \tag{2.3.15b}$$
$$\tilde c_2(z_m) = \int_{z_m}^\infty -i\omega\mu_0\left[\tilde c_1^2(z)\,\sigma_0(z) + 2\tilde c_0(z)\,\tilde c_1(z)\,\delta\sigma(z)\right]e^{s(z_m,z)}\,dz, \tag{2.3.15c}$$
$$\tilde c_3(z_m) = \int_{z_m}^\infty -i\omega\mu_0\left[2\tilde c_1(z)\,\tilde c_2(z)\,\sigma_0(z) + \left(\tilde c_1^2(z) + 2\tilde c_0(z)\,\tilde c_2(z)\right)\delta\sigma(z)\right]e^{s(z_m,z)}\,dz, \tag{2.3.15d}$$

where

$$s(z_m,z) = \int_{z_m}^{z} -2i\omega\mu_0\,\tilde c_0(u)\,\sigma_0(u)\,du. \tag{2.3.16}$$

A closed-form remainder term $\tilde c_r$ is given by

$$\tilde c_r(z_m) = \int_{z_m}^\infty -i\omega\mu_0\left[\delta\tilde c^2(z)\left(\sigma_0(z) + \delta\sigma(z)\right) + 2\tilde c_0(z)\,\delta\tilde c(z)\,\delta\sigma(z)\right]e^{s(z_m,z)}\,dz. \tag{2.3.17}$$

The equivalence of the remainder term and the expansion of higher-order terms given by (2.3.15c, d, ...) is easily established by substituting $\delta\tilde c = \tilde c(\sigma) - \tilde c(\sigma_0) = \tilde c_1 + \tilde c_2 + \dots$ into (2.3.17) and breaking the integral into terms according to their order.
The recursive form of the expansion terms in (2.3.15) may be changed to the standard Fréchet differential form in a manner similar to that outlined in Section 2.2.1; however, the expressions for the Fréchet kernels are lengthy and will only be given to second order:

$$\tilde c_0(z_m) = \tilde c(\sigma_0,z_m), \tag{2.3.18a}$$
$$\tilde c_1(z_m) = \int_0^\infty \tilde G_1(z_m,z_1)\,\delta\sigma(z_1)\,dz_1, \tag{2.3.18b}$$
$$\tilde c_2(z_m) = \frac{1}{2!}\int_0^\infty\!\!\int_0^\infty \tilde G_2(z_m,z_1,z_2)\,\delta\sigma(z_1)\,\delta\sigma(z_2)\,dz_1\,dz_2, \tag{2.3.18c}$$

where the Fréchet kernels, obtained by substituting (2.3.15b) into (2.3.15c), are given by

$$\tilde G_1(z_m,z_1) = -i\omega\mu_0\,\tilde c^2(\sigma_0,z_1)\,H(z_1 - z_m)\,e^{s(z_m,z_1)}, \tag{2.3.19a}$$
$$\tilde G_2(z_m,z_1,z_2) = 2i\omega^3\mu_0^3\,\tilde c^2(\sigma_0,z_1)\,\tilde c^2(\sigma_0,z_2)\int_0^\infty \sigma_0(z_3)\,e^{s(z_m,z_1)+s(z_3,z_2)}\,H(z_1 - z_3)\,H(z_2 - z_3)\,H(z_3 - z_m)\,dz_3 - 4\omega^2\mu_0^2\,\tilde c^2(\sigma_0,z_1)\,\tilde c(\sigma_0,z_2)\,e^{s(z_m,z_1)}\,H(z_1 - z_2)\,H(z_2 - z_m). \tag{2.3.19b}$$

Equations (2.3.15)-(2.3.19) provide complete expansions for the generalized response $\tilde c(\sigma,z_m) = -E(\sigma,z_m)/\partial_z E(\sigma,z_m)$ in terms of an infinite series or a linear and remainder term. For $z_m = 0$ the Fréchet kernel of the linear term, given by (2.3.19a), may be written

$$\tilde G_1(z_m{=}0,z) = -i\omega\mu_0\left[\frac{E(\sigma_0,z)}{\partial_z E(\sigma_0,z)}\right]^2\exp\left[-2\int_0^z i\omega\mu_0\,\sigma_0(u)\,\tilde c(\sigma_0,u)\,du\right]. \tag{2.3.20}$$

Equation (2.3.20) may be simplified by noting that

$$\exp\left[-\int_0^z i\omega\mu_0\,\sigma_0(u)\,\tilde c(\sigma_0,u)\,du\right] = \frac{\partial_z E(\sigma_0,z)}{\partial_z E(\sigma_0,0)} \tag{2.3.21}$$

(to obtain this relationship, divide the MT differential equation (2.2.1) by $\partial_z E(z)$ and integrate from 0 to $z$). This leads to

$$\tilde G_1(z_m{=}0,z) = -i\omega\mu_0\left[\frac{E(\sigma_0,z)}{\partial_z E(\sigma_0,0)}\right]^2 = -i\omega\mu_0\,c^2(\sigma_0,z), \tag{2.3.22}$$

which is equal to $G_1(z_m{=}0,z)$, the Fréchet kernel for the generalized response $c(\sigma,z_m) = -E(\sigma,z_m)/\partial_z E(\sigma,0)$ given by (2.3.12). Thus, the Fréchet kernel and linear term are identical for $z_m = 0$ regardless of how the response is generalized. This also holds for the higher-order terms and remainder term. For the usual case where responses are measured at the Earth's surface, the linear and remainder term for $c(\sigma,z_m{=}0)$ are simpler to work with. However, the generalization $\tilde c(\sigma,z_m)$ given by (2.3.14) allows the computation of the linear and remainder terms for arbitrary $z_m$ as well as the series of higher-order terms.
The proof of the Fréchet differentiability of $c$ has been the subject of a number of papers. Parker's (1970) paper in which he first derived the Fréchet kernel for the linear term and a later application to computing conductivity bounds (Parker 1972) simply assumed $c$ to be Fréchet differentiable. Anderssen (1975) questioned the validity of Parker's results since the neglected terms had not been proved to be second order. Anderssen's observations became more significant when Woodhouse (1976) showed that first- rather than second-order remainder terms result for the seismic normal mode problem when the Earth model is allowed to be discontinuous, invalidating the Fréchet kernels derived by Backus & Gilbert (1967) for this problem. Parker (1977b) appeared to prove the Fréchet differentiability of $c$ with respect to conductivity profiles $\sigma \in L_2$, the space of square-integrable functions. Chave (1984) proved Fréchet differentiability for the fundamental toroidal and poloidal modes of EM induction (of which MT is a special case) in an $L_2$ norm. MacBain (1986) detected a mathematical flaw in Parker's (1977b) proof, but proved the result for $\sigma \in C^2$, the space of functions which are twice continuously differentiable. In Parker's (1986) reply, he acknowledged the error but pointed out that MacBain's choice of model space was overly restrictive. MacBain (1987) appears to have completed the problem by proving the Fréchet differentiability of $c$ with respect to conductivity models $\sigma$ in $L_1$ and $L_2$ and showing that the $L_1$ result can be extended to include finite delta-comb functions. Thus, the Fréchet differentiability of $c$ seems to be firmly established. Since the Fréchet differentiability of $c$ has been extensively investigated and $\tilde c = c$ for the usual case of $z_m = 0$, the differentiability of $\tilde c$ will not be considered further in this thesis.
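2.3.2 Constant-conductivity example

For the simple case where conductivity profiles and perturbations are depth independent, $\tilde{c}(\sigma, z_m) = (i\omega\mu_0\sigma)^{-1/2}$ and it is straightforward to show that the Fréchet differential terms given by (2.3.15) reduce to

$$
\begin{aligned}
\tilde{c}_0(z_m) &= \frac{1}{\sqrt{i\omega\mu_0}}\,\frac{1}{\sigma_0^{1/2}}, &
\tilde{c}_1(z_m) &= -\frac{1}{2}\,\frac{1}{\sqrt{i\omega\mu_0}}\,\frac{\delta\sigma}{\sigma_0^{3/2}}, \\
\tilde{c}_2(z_m) &= \frac{3}{8}\,\frac{1}{\sqrt{i\omega\mu_0}}\,\frac{\delta\sigma^2}{\sigma_0^{5/2}}, &
\tilde{c}_3(z_m) &= -\frac{5}{16}\,\frac{1}{\sqrt{i\omega\mu_0}}\,\frac{\delta\sigma^3}{\sigma_0^{7/2}},
\end{aligned}
\tag{2.3.23}
$$

and thus the Fréchet differential expansion is given by

$$
\tilde{c}(\sigma) = \frac{1}{\sqrt{i\omega\mu_0}}\left(\frac{1}{\sigma_0^{1/2}} - \frac{1}{2}\frac{\delta\sigma}{\sigma_0^{3/2}} + \frac{3}{8}\frac{\delta\sigma^2}{\sigma_0^{5/2}} - \frac{5}{16}\frac{\delta\sigma^3}{\sigma_0^{7/2}} + \cdots\right). \tag{2.3.24}
$$

The first term represents the functional $\tilde{c}$ evaluated at the starting model $\sigma_0$, the second term is linear in $\delta\sigma$, and the remaining terms form an alternating series in increasing powers of $\delta\sigma$. All terms are depth independent. It is straightforward to verify that an equation identical to (2.3.24) is obtained using ordinary derivatives of $\tilde{c}(\sigma, z_m)$ with respect to $\sigma$ and the standard Taylor series expansion. The ratio test for convergence of a series indicates that the series of higher-order terms in (2.3.24) converges for $|\delta\sigma/\sigma_0| < 1$ and diverges for $|\delta\sigma/\sigma_0| > 1$.

The remainder term $\tilde{c}_r(z_m)$ given by (2.3.17) reduces, for constant conductivities, to

$$
\tilde{c}_r(z_m) = \frac{1}{\sqrt{i\omega\mu_0}}\left(\frac{1}{\sqrt{\sigma_0 + \delta\sigma}} - \frac{1}{\sqrt{\sigma_0}} + \frac{1}{2}\frac{\delta\sigma}{\sigma_0^{3/2}}\right), \tag{2.3.25}
$$

and it is straightforward to verify that $\tilde{c}(\sigma) = \tilde{c}_0 + \tilde{c}_1 + \tilde{c}_r$ holds for all $\sigma_0$ and $\delta\sigma$. Writing $1/\sqrt{\sigma_0 + \delta\sigma} = \sigma_0^{-1/2}(1 + \delta\sigma/\sigma_0)^{-1/2}$ in (2.3.25) and expanding $(1 + \delta\sigma/\sigma_0)^{-1/2}$ in a binomial series (provided $|\delta\sigma/\sigma_0| < 1$) leads to

$$
\tilde{c}_r(z_m) = \frac{1}{\sqrt{i\omega\mu_0}}\left(\frac{3}{8}\frac{\delta\sigma^2}{\sigma_0^{5/2}} - \frac{5}{16}\frac{\delta\sigma^3}{\sigma_0^{7/2}} + \cdots\right), \tag{2.3.26}
$$

which demonstrates the equivalence of the remainder term and the infinite series of higher-order terms within the region of convergence.

Evaluating the expansion terms for $c(\sigma, z_m) = -E(\sigma, z_m)/\partial_z E(\sigma, 0)$ is somewhat more complicated.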
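Before turning to the depth-dependent case, the depth-independent expansion can be checked numerically. The following is a minimal sketch, assuming the halfspace response $(i\omega\mu_0\sigma)^{-1/2}$ quoted above; the function names and the test values of $\sigma_0$, $\delta\sigma$ and period are illustrative choices, not taken from the thesis. It evaluates the first four expansion terms and the closed-form remainder, then compares both with the exact response.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # magnetic permeability of free space (H/m)


def c_halfspace(sigma, omega):
    """Exact halfspace response (i*omega*mu0*sigma)**(-1/2)."""
    return 1.0 / np.sqrt(1j * omega * MU0 * sigma)


def expansion_terms(sigma0, dsigma, omega):
    """Zeroth- to third-order terms for depth-independent sigma0 and dsigma."""
    a = np.sqrt(1j * omega * MU0)
    c0 = 1.0 / (a * sigma0**0.5)
    c1 = -dsigma / (2.0 * a * sigma0**1.5)
    c2 = 3.0 * dsigma**2 / (8.0 * a * sigma0**2.5)
    c3 = -5.0 * dsigma**3 / (16.0 * a * sigma0**3.5)
    return c0, c1, c2, c3


def remainder(sigma0, dsigma, omega):
    """Closed-form remainder: everything beyond the linear term."""
    a = np.sqrt(1j * omega * MU0)
    return (1.0 / np.sqrt(sigma0 + dsigma) - 1.0 / np.sqrt(sigma0)
            + dsigma / (2.0 * sigma0**1.5)) / a


# Illustrative values (hypothetical): 0.1 s period, 20 per cent perturbation.
omega = 2.0 * np.pi / 0.1
sigma0, dsigma = 0.01, 0.002

c0, c1, c2, c3 = expansion_terms(sigma0, dsigma, omega)
c_true = c_halfspace(sigma0 + dsigma, omega)
c_r = remainder(sigma0, dsigma, omega)

# c = c0 + c1 + c_r should hold exactly (to rounding error).
exact_gap = abs(c_true - (c0 + c1 + c_r))
# The remainder should be approximated by the truncated higher-order series.
series_gap = abs(c_r - (c2 + c3))
print(exact_gap / abs(c_true), series_gap / abs(c_r))
```

The first printed number should be at rounding level, while the second reflects the truncation of the series after the cubic term and shrinks as $|\delta\sigma/\sigma_0|$ decreases.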
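For the special case where $\sigma_0$ is a constant, it may be verified that

$$
\mathcal{G}(\sigma_0, z_m, z) = -\frac{e^{-\sqrt{i\omega\mu_0\sigma_0}\,|z - z_m|} + e^{-\sqrt{i\omega\mu_0\sigma_0}\,(z + z_m)}}{2\sqrt{i\omega\mu_0\sigma_0}} \tag{2.3.27}
$$

is a solution to the differential equation (2.3.5) for the Green's function which also satisfies the boundary conditions $\mathcal{G}(z) = 0$ for $z \to \infty$ and $\partial_z \mathcal{G}(z = 0) = 0$. For $z_m = 0$

$$
\mathcal{G}(\sigma_0, z_m = 0, z) = -\frac{e^{-\sqrt{i\omega\mu_0\sigma_0}\,z}}{\sqrt{i\omega\mu_0\sigma_0}} = \frac{E(\sigma_0, z)}{\partial_z E(\sigma_0, 0)} = -c(\sigma_0, z), \tag{2.3.28}
$$

in agreement with (2.3.11). The Fréchet differential terms for $c(\sigma, z_m)$ given by (2.3.6) become

$$
\begin{aligned}
c_0(z_m) &= \frac{e^{-\sqrt{i\omega\mu_0\sigma_0}\,z_m}}{\sqrt{i\omega\mu_0\sigma_0}}, \\
c_1(z_m) &= -\frac{e^{-\sqrt{i\omega\mu_0\sigma_0}\,z_m}}{\sqrt{i\omega\mu_0}}\left[\frac{1}{2\sigma_0^{3/2}} + \frac{\sqrt{i\omega\mu_0}\,z_m}{2\sigma_0}\right]\delta\sigma, \\
c_2(z_m) &= \frac{e^{-\sqrt{i\omega\mu_0\sigma_0}\,z_m}}{\sqrt{i\omega\mu_0}}\left[\frac{3}{8\sigma_0^{5/2}} + \frac{3\sqrt{i\omega\mu_0}\,z_m}{8\sigma_0^{2}} + \frac{i\omega\mu_0 z_m^2}{8\sigma_0^{3/2}}\right]\delta\sigma^2, \\
c_3(z_m) &= -\frac{e^{-\sqrt{i\omega\mu_0\sigma_0}\,z_m}}{\sqrt{i\omega\mu_0}}\left[\frac{5}{16\sigma_0^{7/2}} + \frac{5\sqrt{i\omega\mu_0}\,z_m}{16\sigma_0^{3}} + \frac{i\omega\mu_0 z_m^2}{8\sigma_0^{5/2}} + \frac{(i\omega\mu_0)^{3/2} z_m^3}{48\sigma_0^{2}}\right]\delta\sigma^3.
\end{aligned}
\tag{2.3.29}
$$

It is straightforward to show that an ordinary Taylor series expansion of $c(\sigma, z_m)$ about $\sigma_0$ leads to an expansion with terms identical to (2.3.29). For $z_m = 0$ the expansion terms for $c(\sigma, z_m)$ given by (2.3.29) reduce to the depth-independent expansion terms of $\tilde{c}(\sigma, z_m)$ given by (2.3.23). The remainder term $c_r(z_m)$ given by (2.3.17) can be shown to reduce, for constant conductivities, to

$$
c_r(z_m) = \frac{1}{\sqrt{i\omega\mu_0}}\left[\frac{e^{-\sqrt{i\omega\mu_0(\sigma_0 + \delta\sigma)}\,z_m}}{\sqrt{\sigma_0 + \delta\sigma}} - \frac{e^{-\sqrt{i\omega\mu_0\sigma_0}\,z_m}}{\sqrt{\sigma_0}} + \left(\frac{\delta\sigma}{2\sigma_0^{3/2}} + \frac{\sqrt{i\omega\mu_0}\,\delta\sigma\, z_m}{2\sigma_0}\right) e^{-\sqrt{i\omega\mu_0\sigma_0}\,z_m}\right]. \tag{2.3.30}
$$

It is straightforward to verify that $c(\sigma, z_m) = c_0(z_m) + c_1(z_m) + c_r(z_m)$ holds for all $\sigma_0$ and $\delta\sigma$, and that for $z_m = 0$ (2.3.30) is equivalent to the remainder term $\tilde{c}_r(z_m)$ given by (2.3.25).

2.4 Relative linearity of R and c

2.4.1 Quantifying the linearity

In Sections 2.2 and 2.3 Fréchet differential expansions are derived for the MT responses $R$ and $c$. Once the remainder (non-linear) term has been proved to be second order, it may be neglected and the linear term can be inverted using methods of linear inverse theory. Linearized inversion algorithms have been successfully implemented which make use of both responses (see references cited in Section 2.1.2). However, an investigation of the relative linearity of the two responses has not been presented.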
In a non-linear problem, the choice of model and response may well affect the linearity of the problem (i.e. the relative size of the linear and non-linear terms). Strictly speaking, a problem is either linear or non-linear; however, a particular choice of model and response may be considered 'more linear' than others if it yields a more accurate linearized equation when the higher-order terms are neglected. The correct choice should increase the likelihood of the algorithm converging to an acceptable solution, particularly when a model is sought which minimizes a particular functional. In addition, formulating the most accurate linearized inverse problem should minimize the number of iterations required to achieve an acceptable model. This may be significant, particularly for 2-D and 3-D inversion schemes which require a large number of 1-D solutions to perform an approximate inversion step (Oldenburg & Ellis 1990) or to carry out uncoupled 1-D inversions at each site (Smith 1989).

Three possible choices for the model are conductivity $\sigma$, resistivity $\rho$ and log conductivity ($\log\sigma = -\log\rho$). Smith & Booker (1988) argue convincingly that conductivity is the choice of model for which the problem is most linear; therefore, $\sigma(z)$ is adopted as the model in this study.

The only consideration of the linearity of MT responses $R$ and $c$ in the literature is a brief heuristic argument presented by Smith & Booker (1988) to motivate their use of $R$ in a linearized inversion algorithm. By integrating the MT differential equation (2.2.1) and normalizing by the surface field, Smith & Booker derive an exact relationship between $R$ and $\sigma$:

$$
R(\omega) = \mu_0 \int_0^\infty \sigma(z)\,\frac{E(z, \omega)}{E(0, \omega)}\,dz. \tag{2.4.1}
$$

According to their argument, if $E$ were independent of $\sigma$, $R$ would be linear in $\sigma$, and therefore $R$ may be more linear than $c$. They do not demonstrate that a similar statement cannot be made about $c$. Although this argument may motivate their choice of $R$, the reasoning is not very general since $E$ is certainly not independent of $\sigma$ (the dependence of the electric field and its gradient on the Earth conductivity structure is, in fact, the basis for MT). A more general consideration of the relative linearity is required in order to determine the most linear response.

The purpose of this section is to attempt to quantify the relative linearity of $R$ and $c$ in order to select the most linear response. The quantity that will be considered diagnostic of the linearity of a response functional is the ratio of the magnitudes of the non-linear and linear terms in the Fréchet differential expansion, defined by

$$
\alpha = \frac{|\text{non-linear terms}|}{|\text{linear term}|}. \tag{2.4.2}
$$

If the problem is linear, the non-linear terms are zero and $\alpha = 0$; if the non-linear terms are small compared to the linear term, $\alpha$ is small and the problem is considered 'almost linear'; if the non-linear terms dominate, $\alpha$ is large and the problem is considered 'very non-linear'. Other diagnostic quantities could be defined, but the linearity ratio $\alpha$ given by (2.4.2) provides a practical and useful measure of the linearity. Since the remainder terms $R_r$ and $c_r$ provide closed-form expressions which contain all the higher-order contributions, the linearity ratios $\alpha_R$ and $\alpha_c$ for $R$ and $c$ can be written

$$
\alpha_R = \frac{|R_r|}{|R_1|}, \tag{2.4.3}
$$

$$
\alpha_c = \frac{|c_r|}{|c_1|}. \tag{2.4.4}
$$

The ratio $\alpha_R/\alpha_c$ is diagnostic of the relative linearity of $R$ and $c$: when $\alpha_R/\alpha_c < 1$, $R$ may be considered more linear than $c$; when $\alpha_R/\alpha_c > 1$, $R$ is more non-linear than $c$. Calculating $\alpha_R$, $\alpha_c$ and the ratio $\alpha_R/\alpha_c$ allows the relative linearity of the responses $R$ and $c$ to be quantitatively compared.
2.4.2 Linearity for constant-conductivity models

In order to demonstrate how the relative linearity of $R$ and $c$ may be quantified using the linearity ratio $\alpha$, consider the special case where the conductivity models and perturbations are independent of depth. As before, $\sigma$ represents the true model, $\sigma_0$ represents an arbitrary starting model, and $\delta\sigma = \sigma - \sigma_0$.

An expression for the linearity ratio for $R$, $\alpha_R$, may be obtained by using the linear and non-linear (remainder) terms derived for the constant-conductivity case in Section 2.2.3. Substituting the expressions for $R_1$ and $R_r$ given by (2.2.31) and (2.2.34), respectively, into (2.4.3) leads to

$$
\alpha_R = \frac{\left|\sigma - 2\sqrt{\sigma\sigma_0} + \sigma_0\right|}{|\sigma - \sigma_0|}. \tag{2.4.5}
$$

For the case of constant conductivities, $\alpha_R$ is independent of frequency $\omega$, which simplifies the analysis. To investigate the linearity of $R$, the true model $\sigma$ is considered to be known and remains fixed, and $\alpha_R$ is computed for a wide range of starting models $\sigma_0$. A number of results follow immediately from (2.4.5): as $\sigma_0 \to \sigma$, $\alpha_R \to 0$; as $\sigma_0 \to 0$, $\alpha_R \to 1$; and as $\sigma_0 \to \infty$, $\alpha_R \to 1$. In fact, $0 \le \alpha_R < 1$ for all possible choices of $\sigma$ and $\sigma_0$, indicating that for $R$ the magnitude of the non-linear terms never exceeds that of the linear term.

An expression for the linearity ratio for $c$, $\alpha_c$, may be obtained by substituting expressions for the linear and remainder terms derived for $c$ in the constant-conductivity example of Section 2.3.2. For simplicity, the depth-independent terms $c_1$ and $c_r$ given by (2.3.23) and (2.3.25), respectively, may be used in (2.4.4) and lead to

$$
\alpha_c = \frac{\left|\sigma + 2\sigma_0^{3/2}\sigma^{-1/2} - 3\sigma_0\right|}{|\sigma - \sigma_0|}. \tag{2.4.6}
$$

A number of results follow from (2.4.6): as $\sigma_0 \to \sigma$, $\alpha_c \to 0$; as $\sigma_0 \to 0$, $\alpha_c \to 1$; but as $\sigma_0 \to \infty$, $\alpha_c \to \infty$. Thus, $\alpha_c$ is not bounded: for large $\sigma_0$ the ratio of the non-linear to linear terms approaches infinity.

The ratio $\alpha_R/\alpha_c$ may be considered diagnostic of the relative linearity of $R$ and $c$. For constant conductivities this ratio may be written

$$
\frac{\alpha_R}{\alpha_c} = \frac{\sigma - 2\sqrt{\sigma\sigma_0} + \sigma_0}{\sigma + 2\sigma_0^{3/2}\sigma^{-1/2} - 3\sigma_0}. \tag{2.4.7}
$$

For the case of constant-conductivity profiles, it can be proved analytically that $\alpha_R/\alpha_c \le 1$ for all values of $\sigma_0$. The proof begins with the identity

$$
\left(\sqrt{\sigma/\sigma_0} - 1\right)^2 > 0 \quad\text{or}\quad \frac{\sigma}{\sigma_0} > 2\sqrt{\frac{\sigma}{\sigma_0}} - 1. \tag{2.4.8}
$$

Multiplying through by $-2\sigma_0^{3/2}\sigma^{-1/2}$ (which reverses the inequality) and adding $\sigma_0 + \sigma$ to (2.4.8) leads to

$$
\sigma - 2\sqrt{\sigma\sigma_0} + \sigma_0 < \sigma + 2\sigma_0^{3/2}\sigma^{-1/2} - 3\sigma_0, \tag{2.4.9}
$$

or

$$
\frac{\alpha_R}{\alpha_c} = \frac{\sigma - 2\sqrt{\sigma\sigma_0} + \sigma_0}{\sigma + 2\sigma_0^{3/2}\sigma^{-1/2} - 3\sigma_0} < 1. \tag{2.4.10}
$$

It is noted that this proof does not hold for $\sigma_0 = 0$, $\sigma_0 = \sigma$, or $\sigma_0 \to \infty$. However, in these cases l'Hôpital's rule may be used to calculate the limiting values: as $\sigma_0 \to 0$, $\alpha_R/\alpha_c \to 1$; as $\sigma_0 \to \sigma$, $\alpha_R/\alpha_c \to 1/3$; and as $\sigma_0 \to \infty$, $\alpha_R/\alpha_c \to 0$. Thus, $\alpha_R/\alpha_c \le 1$ holds for all values of $\sigma_0$, which indicates that for the case of depth-independent conductivities, $R$ is always at least as linear as $c$. For large values of $\sigma_0$, $c$ may be a great deal more non-linear than $R$.

To illustrate how the linearity of $R$ and $c$ varies with the starting model $\sigma_0$, Fig. 2.1 shows $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ computed from (2.4.5), (2.4.6) and (2.4.7), respectively, plotted as a function of $\sigma_0/\sigma$. Figure 2.1(a) shows that $\alpha_R$ and $\alpha_c$ both have a simple form with a single pronounced minimum occurring at $\sigma_0/\sigma = 1$. At this point $\alpha_R$ and $\alpha_c$ are actually zero, but the figure simply shows them approaching zero. The minimum represents the point where $\sigma$ and $\sigma_0$ are closest in a linear sense, i.e. the value of $\sigma_0$ where the linearized expansion most closely approximates the true response. For constant conductivities it is clear that this must occur at $\sigma_0/\sigma = 1$ since $R(\sigma) = R(\sigma_0) + \int G(\sigma_0, z)\,\delta\sigma(z)\,dz$ is exactly true for $\sigma_0 = \sigma$, $\delta\sigma = 0$ (and similarly for $c$). For $\sigma_0/\sigma < 1$, $\alpha_R$ and $\alpha_c$ increase and asymptotically approach one. For $\sigma_0/\sigma > 1$, $\alpha_R$ again approaches one; however, $\alpha_c$ increases without bound.

The relative linearity of $R$ and $c$ is illustrated in Fig. 2.1(b), which shows $\alpha_R/\alpha_c$ as a function of $\sigma_0/\sigma$. For small values of $\sigma_0/\sigma$, $\alpha_R/\alpha_c$ asymptotically approaches one.
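The constant-conductivity linearity ratios are simple enough to tabulate directly. The sketch below is illustrative only: the function names and the sampling of $\sigma_0/\sigma$ are assumptions, not part of the thesis. It also uses a simplification that is not stated in the text but follows from factoring numerator and denominator of the ratio with $x = \sqrt{\sigma_0/\sigma}$: the numerator is $\sigma(1 - x)^2$ and the denominator is $\sigma(1 - x)^2(1 + 2x)$, so $\alpha_R/\alpha_c = 1/(1 + 2x)$, which reproduces the three limits $1$, $1/3$ and $0$.

```python
import numpy as np


def alpha_R(sigma, sigma0):
    """Linearity ratio for R, constant-conductivity case."""
    return np.abs(sigma - 2.0 * np.sqrt(sigma * sigma0) + sigma0) / np.abs(sigma - sigma0)


def alpha_c(sigma, sigma0):
    """Linearity ratio for c, constant-conductivity case."""
    num = np.abs(sigma + 2.0 * sigma0**1.5 / np.sqrt(sigma) - 3.0 * sigma0)
    return num / np.abs(sigma - sigma0)


sigma = 0.01                          # true conductivity (S/m), illustrative
# 12 log-spaced samples of sigma0/sigma; this grid avoids sigma0 == sigma,
# where both ratios are 0/0 and must be taken as limits.
x2 = np.logspace(-3, 2, 12)
sigma0 = x2 * sigma

aR = alpha_R(sigma, sigma0)
ac = alpha_c(sigma, sigma0)
ratio = aR / ac
closed_form = 1.0 / (1.0 + 2.0 * np.sqrt(x2))   # factored form, see lead-in

for s0, r, c, q in zip(x2, aR, ac, ratio):
    print(f"sigma0/sigma = {s0:9.3e}  alpha_R = {r:.4f}  alpha_c = {c:10.4f}  ratio = {q:.4f}")
```

Because the ratios depend only on $\sigma_0/\sigma$, the choice of the true conductivity does not change the printed values.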
As $\sigma_0/\sigma$ increases, $\alpha_R/\alpha_c$ decreases smoothly, passing through the value 1/3 at $\sigma_0/\sigma = 1$. For large values of $\sigma_0/\sigma$, $\alpha_R/\alpha_c$ approaches zero, indicating $R$ is much more linear than $c$. It is clear from Fig. 2.1 that $\alpha_R < \alpha_c$ for all values of $\sigma_0/\sigma$, indicating that for constant-conductivity profiles, $R$ is a more linear choice of MT response than $c$.

Figure 2.1 Relative linearity of MT responses $R$ and $c$ for the case of constant-conductivity true and starting models. (a) shows $\alpha_R$ (solid line) and $\alpha_c$ (dashed line) as a function of $\sigma_0/\sigma$. (b) shows the ratio $\alpha_R/\alpha_c$ as a function of $\sigma_0/\sigma$.

2.4.3 Linearity for general models

In Section 2.4.2 it is shown that for the case of constant-conductivity profiles, $R$ is more linear than $c$ for all $\sigma$ and $\sigma_0$. This fact alone strongly suggests that $R$ is a better choice for linearizing the MT response than $c$; however, it is instructive to investigate the relative linearity for more general conductivity models. Considering general models introduces two difficulties not encountered in the constant-conductivity case. First, general analytic results concerning $\alpha_R$ and $\alpha_c$ are not available, and second, the linearity ratios depend on frequency $\omega$ (or period $T = 2\pi/\omega$) as well as the conductivities. In the most general formulation the linearity ratios would also depend on the depth of measurement, $z_m$; however, since the coordinate system may be redefined so that $z_m = 0$, only the case corresponding to surface measurements need be considered.

Although analytic results are not available for general models, the linearity ratios $\alpha_R$ and $\alpha_c$ can be computed for a given choice of $\sigma(z)$, $\sigma_0(z)$ and $T$ using expressions for the expansion terms derived in Sections 2.2 and 2.3. The observed responses, $R(\sigma)$ and $c(\sigma)$, are expanded about the starting model $\sigma_0$ in terms of a linear and a non-linear (remainder) term.
The linear terms, $R_1$ and $c_1$, are calculated using (2.2.16) and (2.3.12), respectively. Equations (2.2.11) and (2.3.13) provide expressions for the remainder terms $R_r$ and $c_r$; however, in a synthetic study, once the linear terms $R_1$ and $c_1$ have been calculated, it is simpler to evaluate the remainder terms as

$$
R_r = R(\sigma) - R(\sigma_0) - R_1, \tag{2.4.11a}
$$

$$
c_r = c(\sigma) - c(\sigma_0) - c_1. \tag{2.4.11b}
$$

The linearity ratios $\alpha_R$ and $\alpha_c$ are calculated according to (2.4.3) and (2.4.4) as the ratio of the magnitudes of the non-linear and linear terms. Evaluating the quantities in (2.4.11) requires computing the electric and magnetic fields as a function of depth for the conductivity profiles $\sigma(z)$ and $\sigma_0(z)$; this is the forward problem for MT and can be solved in a manner similar to Oldenburg (1979). The forward problem is considered in more detail in Chapter 3.

For a given true model $\sigma$, the linearity ratios may be computed and compared for a number of starting models $\sigma_0$. It is obviously impossible to exhaustively consider all $\sigma$ and $\sigma_0$ in the model space. However, for a given $\sigma$ the linearity ratios $\alpha_R$ and $\alpha_c$ can be computed for all $\sigma_0$ within certain classes of starting models. Examining $\alpha_R$ and $\alpha_c$ for a number of cases which are representative of general conductivity structures provides insight into the relative linearity of $R$ and $c$ for arbitrary models.

As an example of the linearity analysis, consider a model $\sigma(z)$ which consists of a conductive surface layer 200 m thick with a conductivity of 0.1 S/m overlying a halfspace of conductivity 0.01 S/m. Figure 2.2 shows the linearity ratios computed when the starting models consist of halfspaces of conductivity $\sigma_0$: $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ are plotted as a function of $\sigma_0$ for periods $T$ of 0.01, 0.1, and 1.0 s. Each curve consists of 500 computed values logarithmically spaced in $\sigma_0$ from 0.001 to 10.0 S/m.
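The procedure just described can be sketched for this example. The code below is a hedged illustration, not the thesis implementation, and rests on several assumptions that should be checked against Sections 2.2 and 2.3: (i) $R$ is taken to be $B/E = 1/(i\omega c)$ at the surface; (ii) the forward problem is solved with the standard layered-earth recursion for $c$ rather than the method of Oldenburg (1979); (iii) for a halfspace starting model, where $E(\sigma_0, z)/E(\sigma_0, 0) = e^{-k_0 z}$ with $k_0 = \sqrt{i\omega\mu_0\sigma_0}$, the linear kernels are taken as $-e^{-2k_0 z}/\sigma_0$ for $c$ and $\mu_0 e^{-2k_0 z}$ for $R$. All function names are hypothetical.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # magnetic permeability of free space (H/m)


def c_layered(sigmas, thicknesses, omega):
    """Surface response c of a layered model; the last sigma is the basement."""
    k = np.sqrt(1j * omega * MU0 * np.asarray(sigmas, dtype=float))
    c = 1.0 / k[-1]                                   # basement halfspace
    for kj, hj in zip(k[-2::-1], thicknesses[::-1]):  # propagate upward
        t = np.tanh(kj * hj)
        c = (kj * c + t) / (kj * (1.0 + kj * c * t))
    return c


def kernel_integral(sigmas, thicknesses, sigma0, omega):
    """Integral over depth of exp(-2*k0*z) * (sigma(z) - sigma0)."""
    k0 = np.sqrt(1j * omega * MU0 * sigma0)
    tops = np.concatenate(([0.0], np.cumsum(thicknesses)))
    total = 0.0
    for j, s in enumerate(sigmas):
        upper = np.exp(-2.0 * k0 * tops[j])
        # The basement extends to infinite depth, where the kernel vanishes.
        lower = np.exp(-2.0 * k0 * tops[j + 1]) if j + 1 < len(tops) else 0.0
        total += (s - sigma0) * (upper - lower) / (2.0 * k0)
    return total


def linearity_ratios(sigmas, thicknesses, sigma0, period):
    """Return (alpha_R, alpha_c) for a halfspace starting model sigma0."""
    omega = 2.0 * np.pi / period
    c_true = c_layered(sigmas, thicknesses, omega)
    c_0 = 1.0 / np.sqrt(1j * omega * MU0 * sigma0)
    integral = kernel_integral(sigmas, thicknesses, sigma0, omega)
    c_1 = -integral / sigma0          # assumed kernel for c (see lead-in)
    R_1 = MU0 * integral              # assumed kernel for R (see lead-in)
    R_true = 1.0 / (1j * omega * c_true)
    R_0 = 1.0 / (1j * omega * c_0)
    # Remainders evaluated by subtraction, as in (2.4.11).
    alpha_c = abs(c_true - c_0 - c_1) / abs(c_1)
    alpha_R = abs(R_true - R_0 - R_1) / abs(R_1)
    return alpha_R, alpha_c


# True model of the example: 200-m layer of 0.1 S/m over 0.01 S/m.
true_sigmas, true_thicknesses = [0.1, 0.01], [200.0]
for period in (0.01, 0.1, 1.0):
    for sigma0 in np.logspace(-3, 1, 5):   # coarse grid; the text uses 500 values
        aR, ac = linearity_ratios(true_sigmas, true_thicknesses, sigma0, period)
        print(f"T = {period:5.2f} s  sigma0 = {sigma0:8.3f}  "
              f"alpha_R = {aR:.3e}  alpha_c = {ac:.3e}")
```

When the true model is itself a halfspace, these functions reduce to the analytic constant-conductivity ratios of Section 2.4.2, which provides a convenient check on the assumed kernels.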
The linearity ratios $\alpha_R$ and $\alpha_c$ generally vary with $\sigma_0$ in a regular manner, with absolute minima occurring at the points where $\sigma$ and $\sigma_0$ are closest in a linear sense for $R$ and $c$, respectively. Unlike the constant-conductivity case shown in Fig. 2.1(a), the minima of $\alpha_R$ and $\alpha_c$ do not coincide. The locations of the minima vary with period: for $T = 0.01$ s, Fig. 2.2(a) shows the minima occur near $\sigma_0 = 0.1$ S/m (the value of the surface layer of $\sigma$), while for $T = 1.0$ s, Fig. 2.2(c) shows the minima occur near $\sigma_0 = 0.01$ S/m (the value of the underlying halfspace of $\sigma$). Figure 2.2(b) shows that for $T = 0.1$ s, the minima occur at values of $\sigma_0$ intermediate between 0.1 and 0.01 S/m. This variation may be understood by considering the depth of penetration of the electric fields in the true model $\sigma$. For the short period $T = 0.01$ s, the skin depth (the depth at which $E$ has decayed to $1/e$ of its surface value) is less than the thickness of the surface layer, so the response is similar to that of a halfspace of conductivity 0.1 S/m. For the long period $T = 1.0$ s, the skin depth is an order of magnitude greater than the thickness of the surface layer, so the response is similar to that of a halfspace of conductivity 0.01 S/m. For periods much shorter than 0.01 s or longer than 1.0 s, plots of $\alpha_R$ and $\alpha_c$ simply resemble the constant-conductivity case shown in Fig. 2.1(a). In addition to the minima, Fig. 2.2(a) also exhibits distinct maxima for $\alpha_R$ and $\alpha_c$ near $\sigma_0 = 0.1$ S/m. This feature is greatly reduced at longer periods.

Except for a narrow interval near the maximum in Fig. 2.2(a), $\alpha_R$ is less than one for all $\sigma_0$ and $T$, indicating that the non-linear terms do not exceed the linear terms in magnitude. This is not true for $\alpha_c$, which increases rapidly from its minimum value as $\sigma_0$ increases. $\alpha_R$ is less than $\alpha_c$ for all $\sigma_0$ and $T$ except in the immediate vicinity of the minimum for $\alpha_c$. This point is illustrated in Fig. 2.2(d), (e) and (f), which show the ratio $\alpha_R/\alpha_c$ as a function of $\sigma_0$ for the same periods as Fig. 2.2(a), (b) and (c), respectively. The line $\alpha_R/\alpha_c = 1$ is included as a reference in Fig. 2.2(d), (e) and (f). It is clear that except in a narrow interval near the minimum for $\alpha_c$, the ratio $\alpha_R/\alpha_c$ is less than one for all values of $\sigma_0$ and $T$. In fact, $\alpha_R$ is less than $\alpha_c$ by almost an order of magnitude over much of the region, indicating that $R$ is significantly more linear than $c$.

Figure 2.2 shows $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ as a function of $\sigma_0$ for three fixed periods. In order to observe how these quantities vary with period as well as starting model, Fig. 2.3 shows surfaces of $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ as a function of $T$ and $\sigma_0$. The surfaces in Fig. 2.3(a) and (b) represent a 50 by 50 grid of computed values. Figure 2.3(c) represents 50 values of $T$ and 60 values of $\sigma_0$; the sampling interval in $\sigma_0$ is halved in the region of the maximum to more accurately define this feature. The curves in Fig. 2.2 represent 2-D 'slices' through these surfaces at periods of 0.01, 0.1, and 1.0 s. The variation with $T$ is seen to be gradual, smoothly joining the fixed-period curves.

Figure 2.2 Relative linearity of MT responses $R$ and $c$. The true model $\sigma$ consists of a 200-m thick surface layer of conductivity 0.1 S/m over a halfspace of conductivity 0.01 S/m. The starting models consist of halfspaces of conductivity $\sigma_0$. (a), (b) and (c) show $\alpha_R$ (solid line) and $\alpha_c$ (dashed line) as a function of $\sigma_0$ for periods $T$ of 0.01, 0.1 and 1.0 s, respectively. (d), (e) and (f) show the ratio $\alpha_R/\alpha_c$ corresponding to the same periods (the dotted line indicates $\alpha_R/\alpha_c = 1$).
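The skin-depth argument used above to explain the shift of the minima with period can be made concrete with a short calculation. This is a minimal sketch using the standard halfspace skin depth $\sqrt{2/(\omega\mu_0\sigma)}$; the function name is illustrative, and pairing each period with the conductivity that dominates the response at that period is an interpretive choice, not a statement from the thesis.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # magnetic permeability of free space (H/m)


def skin_depth(sigma, period):
    """Depth (m) at which |E| falls to 1/e of its surface value in a halfspace."""
    omega = 2.0 * np.pi / period
    return np.sqrt(2.0 / (omega * MU0 * sigma))


layer_thickness = 200.0                      # m, surface layer of the true model
d_short = skin_depth(0.1, 0.01)              # T = 0.01 s in the 0.1 S/m layer
d_long = skin_depth(0.01, 1.0)               # T = 1.0 s in the 0.01 S/m halfspace

print(f"T = 0.01 s: skin depth = {d_short:7.1f} m (layer is {layer_thickness:.0f} m thick)")
print(f"T = 1.0 s:  skin depth = {d_long:7.1f} m")
```

The short-period skin depth falls inside the surface layer, while the long-period value exceeds the layer thickness by more than a factor of ten, consistent with the behaviour of the minima in Fig. 2.2(a) and (c).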
For a true model $\sigma$ consisting of a layer over a halfspace, another appropriate class of starting models for investigating the relative linearity is the set of 2-layer models. In Fig. 2.4 the linearity ratios are compared for starting models $\sigma_0$ which consist of a 200-m thick surface layer over a halfspace of conductivity 0.01 S/m, i.e. the layer thickness and halfspace conductivity are identical to the true model. The conductivity of the surface layer, $\sigma_0(z_s)$, is varied from 0.001 to 1.0 S/m. Figure 2.4(a), (b) and (c) show $\alpha_R$ and $\alpha_c$ as a function of $\sigma_0(z_s)$ for periods $T$ of 0.01, 0.1, and 1.0 s. The linearity ratios both exhibit a pronounced minimum at $\sigma_0(z_s) = 0.1$ S/m for all periods. At this point the true and starting models are identical and $\alpha_R$ and $\alpha_c$ are zero, although the figure simply shows them approaching zero. Figure 2.4(d), (e) and (f) show $\alpha_R/\alpha_c$ corresponding to the same periods. This ratio is less than one for all values of $\sigma_0(z_s)$ and $T$, indicating $R$ is more linear than $c$. For $T = 1.0$ s, Fig. 2.4(c) shows that $\alpha_R$ is more than an order of magnitude less than $\alpha_c$ over the range of $\sigma_0(z_s)$ values.

Figure 2.4 Relative linearity of MT responses $R$ and $c$. The true model $\sigma$ consists of a 200-m thick surface layer of conductivity 0.1 S/m over a halfspace of conductivity 0.01 S/m. The starting models $\sigma_0$ consist of a 200-m thick layer over a 0.01-S/m halfspace; $\sigma_0(z_s)$ indicates the conductivity of the surface layer. (a), (b) and (c) show $\alpha_R$ (solid line) and $\alpha_c$ (dashed line) as a function of $\sigma_0(z_s)$ for periods $T$ of 0.01, 0.1 and 1.0 s, respectively. (d), (e) and (f) show the ratio $\alpha_R/\alpha_c$ corresponding to the same periods.
Figure 2.5 shows surfaces of $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ as a function of $\sigma_0(z_s)$ and $T$.

Figure 2.6 shows the linearity ratios when the starting models $\sigma_0$ consist of a 300-m thick surface layer of conductivity $\sigma_0(z_s)$ between 0.001 and 1.0 S/m over a halfspace of conductivity 0.04 S/m, i.e. the layer thickness and halfspace conductivity do not correspond to the true model. Figure 2.6(a), (b) and (c) show $\alpha_R$ and $\alpha_c$ as a function of the surface layer conductivity, $\sigma_0(z_s)$, for periods $T$ of 0.01, 0.1, and 1.0 s. Figure 2.6(a) resembles Fig. 2.2(a) since for the short period $T = 0.01$ s there is little difference between expanding the responses about a starting model consisting of a 300-m surface layer and expanding them about a halfspace of the same conductivity. At longer periods the minima of $\alpha_R$ and $\alpha_c$ become less pronounced, as shown in Fig. 2.6(b) and (c). Figure 2.6(d), (e) and (f) show $\alpha_R/\alpha_c$: this ratio is less than one for all values of $\sigma_0(z_s)$ and $T$ except in the immediate vicinity of the minimum for $\alpha_c$ for $T = 0.01$ and 0.1 s. Figure 2.7 shows surfaces of $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ as a function of $\sigma_0(z_s)$ and $T$. In Fig. 2.7(c) the sampling interval in $\sigma_0(z_s)$ is halved along the maximum to more accurately define this feature.

As a second example of the linearity analysis, consider a true model $\sigma(z)$ which consists of a linear conductivity gradient of $\partial_z\sigma = \sigma' = 10^{-4}$ S/m$^2$ from the surface to a depth of 10 km. The surface conductivity value is 0.01 S/m; below 10 km the conductivity is a constant 1.01 S/m. Figure 2.8 shows the linearity ratios computed when the starting models consist of halfspaces of conductivity $\sigma_0$. Figure 2.8(a), (b) and (c) show $\alpha_R$ and $\alpha_c$ as a function of the halfspace conductivity $\sigma_0$ for periods $T$ of 0.01, 0.1, and 1.0 s. The linearity ratios exhibit shallow minima, with the value of $\sigma_0$ at which the minima occur increasing slightly with period.
The minima for $\alpha_R$ and $\alpha_c$ do not coincide, but $\alpha_R$ is less than $\alpha_c$ over the entire range of values for $\sigma_0$, including the minimum for $\alpha_c$. Figure 2.8(d), (e) and (f) show $\alpha_R/\alpha_c$ as a function of $\sigma_0$ for the same values of $T$: this ratio is less than one for all $\sigma_0$ and $T$. Figure 2.9 shows surfaces of $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ as a function of $\sigma_0$ and $T$. The linearity ratios show only a weak dependence on $T$.

Figure 2.6 Relative linearity of MT responses $R$ and $c$. The true model $\sigma$ consists of a 200-m thick surface layer of conductivity 0.1 S/m over a halfspace of conductivity 0.01 S/m. The starting models $\sigma_0$ consist of a 300-m thick layer over a 0.04-S/m halfspace; $\sigma_0(z_s)$ indicates the conductivity of the surface layer. (a), (b) and (c) show $\alpha_R$ (solid line) and $\alpha_c$ (dashed line) as a function of $\sigma_0(z_s)$ for periods $T$ of 0.01, 0.1 and 1.0 s, respectively. (d), (e) and (f) show the ratio $\alpha_R/\alpha_c$ corresponding to the same periods (the dotted line indicates $\alpha_R/\alpha_c = 1$).

Figure 2.8 Relative linearity of MT responses $R$ and $c$. The true model $\sigma$ consists of a conductivity gradient of $10^{-4}$ S/m$^2$ from the surface to a depth of 10 km. The surface value is 0.01 S/m and below 10 km the conductivity remains constant at 1.01 S/m. The starting models consist of halfspaces of conductivity $\sigma_0$. (a), (b) and (c) show $\alpha_R$ (solid line) and $\alpha_c$ (dashed line) as a function of $\sigma_0$ for periods $T$ of 0.01, 0.1 and 1.0 s, respectively. (d), (e) and (f) show the ratio $\alpha_R/\alpha_c$ corresponding to the same periods (the dotted line indicates $\alpha_R/\alpha_c = 1$).
Figure 2.10 shows the linearity ratios computed when the starting models consist of linear gradients $\sigma_0'$ ranging from $10^{-7}$ to $10^{-3}$ S/m$^2$. The surface conductivity value is 0.01 S/m (identical to the true model) and below 10 km the conductivity is held constant at the achieved value. Figure 2.10(a), (b) and (c) show $\alpha_R$ and $\alpha_c$ as a function of the ratio of the gradients of the starting and true models, $\sigma_0'/\sigma'$, for periods $T$ of 0.01, 0.1, and 1.0 s. The linearity ratios exhibit a pronounced minimum at $\sigma_0'/\sigma' = 1$, where the true and starting models are identical. Figure 2.10(d), (e) and (f) show $\alpha_R/\alpha_c$ as a function of $\sigma_0'/\sigma'$ for the same values of $T$. This ratio is less than one for all $\sigma_0'/\sigma'$ and $T$. Figure 2.11 shows surfaces of $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ as a function of $\sigma_0'/\sigma'$ and $T$.

The linearity ratios computed when the starting models $\sigma_0$ consist of conductivity gradients of $10^{-7}$ to $10^{-3}$ S/m$^2$ and the surface conductivity value is 0.1 S/m (not identical to the true model) are shown in Fig. 2.12. $\alpha_R$ and $\alpha_c$ are shown in Fig. 2.12(a), (b) and (c) for $T = 0.01$, 0.1 and 1.0 s. The linearity ratios do not show pronounced minima, and $\alpha_R$ is less than $\alpha_c$ for all values of $\sigma_0'/\sigma'$ and $T$. Figure 2.12(d), (e) and (f) show $\alpha_R/\alpha_c$: this ratio is almost constant at a value of about 0.2. Figure 2.13 shows surfaces of $\alpha_R$, $\alpha_c$ and $\alpha_R/\alpha_c$ as a function of $\sigma_0'/\sigma'$ and $T$.

Figures 2.1-2.13 illustrate the relative linearity of MT responses $R$ and $c$ for several choices of true model $\sigma$. The relative linearity has been investigated for a number of true models, including a halfspace, a conductive/resistive surface layer, a conductive/resistive buried layer and a positive/negative conductivity gradient, and for a variety of types of starting models. Table 2.1 summarizes the cases that have been considered.

The results of this study may be summarized as follows. Plots of $\alpha_R$ and $\alpha_c$ at fixed periods generally exhibit a minimum for some model $\sigma_0$.
When $\sigma_0 = \sigma$ is a possible choice, the minima of $\alpha_R$ and $\alpha_c$ coincide at this point, achieving a value of zero; however, the value of $\alpha_R/\alpha_c$ as $\sigma_0 \to \sigma$ is less than one. When $\sigma_0 = \sigma$ is not a possible choice, the minima of $\alpha_R$ and $\alpha_c$ may be pronounced or shallow and generally do not coincide. Except near localized maxima, $\alpha_R$ is generally less than one, indicating that the non-linear terms do not exceed the linear term in magnitude. In contrast, $\alpha_c$ often exceeds one over large ranges. Finally, $\alpha_R$ is observed to be less than $\alpha_c$ for all choices of $\sigma_0$ and periods $T$ except in the immediate vicinity of the minimum of $\alpha_c$ for some cases where the minima of $\alpha_R$ and $\alpha_c$ do not coincide. In the narrow regions where $\alpha_c$ is less than $\alpha_R$, the value of $\alpha_R$ is generally small, indicating that even where $c$ is more linear than $R$, $R$ is still quite linear.

Figure 2.10 Relative linearity of MT responses $R$ and $c$. The true model $\sigma$ consists of a conductivity gradient of $10^{-4}$ S/m$^2$ from the surface to a depth of 10 km. The surface value is 0.01 S/m and below 10 km the conductivity remains constant at 1.01 S/m. The starting models consist of conductivity gradients from the surface to 10-km depth; the surface values are 0.01 S/m and below 10 km the conductivity remains constant. $\sigma_0'/\sigma'$ represents the ratio of the gradients of the starting and true models. (a), (b) and (c) show $\alpha_R$ (solid line) and $\alpha_c$ (dashed line) as a function of $\sigma_0'/\sigma'$ for periods $T$ of 0.01, 0.1 and 1.0 s, respectively. (d), (e) and (f) show the ratio $\alpha_R/\alpha_c$ corresponding to the same periods.

Figure 2.12 Relative linearity of MT responses $R$ and $c$. The true model $\sigma$ consists of a conductivity gradient of $10^{-4}$ S/m$^2$ from the surface to a depth of 10 km. The surface value is 0.01 S/m and below 10 km the conductivity remains constant at 1.01 S/m. The starting models consist of conductivity gradients from the surface to 10-km depth; the surface values are 0.1 S/m and below 10 km the conductivity remains constant. $\sigma_0'/\sigma'$ represents the ratio of the gradients of the starting and true models. (a), (b) and (c) show $\alpha_R$ (solid line) and $\alpha_c$ (dashed line) as a function of $\sigma_0'/\sigma'$ for periods $T$ of 0.01, 0.1 and 1.0 s, respectively. (d), (e) and (f) show the ratio $\alpha_R/\alpha_c$ corresponding to the same periods.

Table 2.1 Summary of cases considered in linearity study of MT responses R and c.

True model                 Starting models
-------------------------  ------------------------------------------------------------------------
halfspace                  - halfspaces
                           - surface layers
                           - buried layers
conductive surface layer   - halfspaces
                           - surface layers (correct thickness & halfspace conductivity)
                           - surface layers (incorrect thickness & halfspace conductivity)
resistive surface layer    - halfspaces
                           - surface layers (correct thickness & halfspace conductivity)
                           - surface layers (incorrect thickness & halfspace conductivity)
conductive buried layer    - halfspaces
                           - buried layers (correct depths & halfspace conductivity)
                           - buried layers (incorrect depths & halfspace conductivity)
resistive buried layer     - halfspaces
                           - buried layers (correct depths & halfspace conductivity)
                           - buried layers (incorrect depths & halfspace conductivity)
positive gradient          - halfspaces
                           - positive gradients (correct surface conductivity & underlying halfspace)
                           - positive gradients (incorrect surface conductivity & underlying halfspace)
negative gradient          - halfspaces
                           - negative gradients (correct surface conductivity & underlying halfspace)
                           - negative gradients (incorrect surface conductivity & underlying halfspace)
Although this study is not exhaustive, the true models considered are chosen to be representative of general conductivity structures. The results summarized above are consistent for all choices of true and starting models considered. This study strongly indicates that $R$ is a more linear choice of MT response than $c$. The use of $R$ should result in a more accurate linearization and therefore a more effective and efficient linearized inversion algorithm.

2.5 Alternative formulations

2.5.1 Alternative choices for model and response

Section 2.4 indicates that $\sigma$ and $R$ are the most linear choices of model and response for the MT inverse problem. In general, this choice should result in the optimal linearized inversion algorithm. However, in some cases practical considerations warrant the use of alternative forms of response or model. In this section two such choices are considered.

The conductivity of the Earth can vary over many orders of magnitude. If recovering conductivity variations over a number of orders of magnitude is required, $\log\sigma$ is a more appropriate choice of model than $\sigma$. This choice of model has the additional benefit of ensuring positivity.

Field measurements are often presented in terms of the amplitude and phase $|R|$ and $\phi$ of $R$ (i.e. $R = |R|e^{i\phi}$) rather than real and imaginary parts. Although in theory it is straightforward to convert between the two representations, in practice different errors associated with the amplitude and phase measurements can make this difficult. Also, in some cases it is advantageous to examine the phase information separately (e.g. Ranganayaki 1984). It is then necessary to consider the responses as amplitude and phase and ascribe appropriate statistical errors to each.

Oldenburg (1979) transformed the linearized equation for $R$ to obtain expressions for $|R|$ and $\phi$ as responses and $\log\sigma$ as model.
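The first-order relations that underlie such a transformation can be sketched numerically. The example below is an illustration under stated assumptions, not Oldenburg's (1979) formulation: it starts from $\delta R/R = \delta|R|/|R| + i\,\delta\phi$, which holds to first order for $R = |R|e^{i\phi}$, and converts a complex kernel for $R$ into kernels for $|R|$ and $\phi$. The function name, the response value and the two-cell kernel are hypothetical.

```python
import numpy as np


def amp_phase_kernels(R0, G):
    """First-order kernels for |R| and phase from the complex kernel G of R."""
    G = np.asarray(G, dtype=complex)
    G_amp = (R0.real * G.real + R0.imag * G.imag) / abs(R0)
    G_phase = (R0.real * G.imag - R0.imag * G.real) / abs(R0) ** 2
    return G_amp, G_phase


# Hypothetical starting response and a complex kernel sampled on two depth cells.
R0 = 2.0 + 1.0j
G = np.array([0.3 - 0.7j, -0.2 + 0.5j])
dsigma_dz = np.array([1e-6, 2e-6])      # perturbation times cell thickness

dR = np.sum(G * dsigma_dz)              # linearized change in complex R
G_amp, G_phase = amp_phase_kernels(R0, G)

d_amp_linear = np.sum(G_amp * dsigma_dz)
d_phase_linear = np.sum(G_phase * dsigma_dz)
d_amp_exact = abs(R0 + dR) - abs(R0)
d_phase_exact = np.angle(R0 + dR) - np.angle(R0)

# The two pairs agree only to first order; the mismatch grows with |dR|.
print(d_amp_linear - d_amp_exact, d_phase_linear - d_phase_exact)

# For log(sigma) as model, each kernel would additionally be multiplied by sigma0(z).
```

The small but non-zero differences printed at the end illustrate that replacing finite differences by differentials introduces additional higher-order error.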
Although it is not explicitly mentioned in this reference, it should be noted that the transformations are not exact since they involve approximating the difference operator $\delta$ with the differential operator $d$. This is a first-order approximation which results in additional higher-order error terms, thereby increasing the non-linearity. The results from Oldenburg (1979) are as follows. When the complex response is considered as amplitude and phase, the Frechet kernels are given by

$$G_{|R|}(\sigma_0,\omega,z) = \frac{1}{|R|}\left[\Re(R)\,\Re(G) + \Im(R)\,\Im(G)\right] \tag{2.5.1a}$$

for $|R(\sigma,\omega)|$ and

$$G_{\phi}(\sigma_0,\omega,z) = \frac{1}{|R|^2}\left[\Re(R)\,\Im(G) - \Im(R)\,\Re(G)\right] \tag{2.5.1b}$$

for $\phi(\sigma,\omega)$, where $\Im$ and $\Re$ indicate imaginary and real parts. If the model is $\log\sigma$, then the kernels $G(\sigma_0,\omega,z)$ corresponding to either real and imaginary or amplitude and phase responses are replaced by $\sigma_0(z)\,G(\sigma_0,\omega,z)$.

Although the inverse problem in terms of $\log\sigma$ as model or $|R|$ and $\phi$ as response is not as linear as the formulation in terms of $\sigma$ and $R$, studies similar to that in Section 2.4 indicate that these choices are more linear than resistivity $\rho$ as model or amplitude and phase of $c$ as response. In this thesis, inversion methods are developed for both $\sigma$ and $\log\sigma$ as model and real and imaginary part or amplitude and phase of $R$ as response. Although real and imaginary responses and conductivity are the favoured choices, it is considered important that inversion algorithms are flexible enough to accommodate the alternatives when required.

2.5.2 The similitude equation as a basis for model norm inversion

This chapter examines linearization as a method of solving the MT inverse problem for model norm solutions. Gómez-Treviño (1987) suggests an alternative formulation. Using scaling properties of Maxwell's equations, Gómez-Treviño derives an exact, non-linear integral equation, which he calls the similitude equation, relating the conductivity model to field measurements.
Although he does not address the problem in detail, Gómez-Treviño suggests that MT construction and appraisal algorithms could be based on this formulation rather than linearization. In this section the similitude equation is compared to linearization as a basis for model norm solutions.

Gómez-Treviño (1987) derives the similitude equation in terms of the apparent conductivity $\sigma_a$ as response. In order to compare the similitude method to the linearized equations developed in this chapter, the similitude equation is derived here for the $R$ response; this derivation also serves to illustrate Gómez-Treviño's approach. The scaling properties of the electric and magnetic fields are well known and follow directly from Maxwell's equations: for a scalar $k$

$$E(k\sigma, kT, z_m) = \frac{1}{k}\,E(\sigma, T, z_m), \tag{2.5.2a}$$

$$B(k\sigma, kT, z_m) = B(\sigma, T, z_m). \tag{2.5.2b}$$

Thus, the $R$ response scales according to

$$R(k\sigma, kT, z_m) = \frac{B(k\sigma, kT, z_m)}{E(k\sigma, kT, z_m)} = k\,R(\sigma, T, z_m). \tag{2.5.3}$$

If $k = 1 + h$, (2.5.3) becomes

$$R(\sigma + h\sigma, T + hT, z_m) = (1 + h)\,R(\sigma, T, z_m). \tag{2.5.4}$$

The quantities $h\sigma$ and $hT$ may be thought of as perturbations in conductivity and period which are simply a scaling of the original values. The perturbation in the response $\delta R$ is given by

$$\delta R(h\sigma, hT, z_m) = R(\sigma + h\sigma, T + hT, z_m) - R(\sigma, T, z_m). \tag{2.5.5}$$

Combining (2.5.4) and (2.5.5) leads to

$$\delta R(h\sigma, hT, z_m) = h\,R(\sigma, T, z_m). \tag{2.5.6}$$

However, $\delta R$ may also be expressed in terms of an expansion about $\sigma$ and $T$:

$$\delta R(h\sigma, hT, z_m) = hT\,\frac{\partial R(\sigma, T, z_m)}{\partial T} + \int_0^\infty G(\sigma, T, z_m, z)\,h\sigma(z)\,dz + R_T + R_r, \tag{2.5.7}$$

where $G$ is the first Frechet kernel and $R_T$ and $R_r$ are the remainder terms for the first-order expansion of $R$ in $T$ and $\sigma$. Substituting for $\delta R$ from (2.5.6) and dividing by $h$, (2.5.7) may be written as

$$R(\sigma, T, z_m) = T\,\frac{\partial R(\sigma, T, z_m)}{\partial T} + \int_0^\infty G(\sigma, T, z_m, z)\,\sigma(z)\,dz + \frac{1}{h}R_T + \frac{1}{h}R_r. \tag{2.5.8}$$

In the limit as $h \to 0$, $R_T/h \to 0$ and $R_r/h \to 0$ and (2.5.8) becomes

$$R(\sigma, T, z_m) - T\,\frac{\partial R(\sigma, T, z_m)}{\partial T} = \int_0^\infty G(\sigma, T, z_m, z)\,\sigma(z)\,dz. \tag{2.5.9}$$

Equation (2.5.9) represents the similitude equation for $R$. It is an exact expression relating measurements of the electric field and its derivatives to the conductivity model and makes no allusion to any perturbation or starting model. For comparison, the Frechet differential expansion for $R$ formulated for model norm inversion is reproduced here:

$$R(\sigma, T, z_m) - R(\sigma_0, T, z_m) + \int_0^\infty G(\sigma_0, T, z_m, z)\,\sigma_0(z)\,dz = \int_0^\infty G(\sigma_0, T, z_m, z)\,\sigma(z)\,dz + R_r. \tag{2.5.10}$$

This is also an exact expression; however, to solve for $\sigma$ the non-linear remainder term $R_r$ must be neglected.

Gómez-Treviño (1987) suggests that since the exact similitude equation directly relates the response to the true model, inversion algorithms could be based on this formulation rather than on a linearized approximation. The difficulty is that the Frechet kernel in (2.5.9) is evaluated at the true model $\sigma$. Since $\sigma$ is never known a priori (it is the object of the inversion), the best that can be done in practice is to approximate it with a known starting model $\sigma_0$, i.e. approximate $G(\sigma)$ with $G(\sigma_0)$ in (2.5.9). This leads to a linear problem for $\sigma$. However, making this approximation is equivalent to neglecting an error term $S_r$ at each iteration in much the same way that the Frechet differential expansion for $R$ is linearized by neglecting the remainder term $R_r$. To illustrate this, the similitude equation (2.5.9) can be written

$$R(\sigma) - T\,\frac{\partial R(\sigma)}{\partial T} = \int_0^\infty G(\sigma_0, z)\,\sigma(z)\,dz + S_r, \tag{2.5.11}$$

where the dependence on $T$ and $z_m$ is implicitly assumed. The error term $S_r$ neglected in Gómez-Treviño's approach is given by

$$S_r = \int_0^\infty \left[G(\sigma, z) - G(\sigma_0, z)\right]\sigma(z)\,dz. \tag{2.5.12}$$

The remainder term for the linearized equation is proved to be second order in $\delta\sigma$ in Section 2.2.3. To determine the magnitude of the similitude error term $S_r$, consider the following analysis. Subtract from the similitude equation (2.5.9) an identical expression evaluated at $\sigma_0$ rather than $\sigma$.
This leads to

$$\delta R - T\,\frac{\partial\,\delta R}{\partial T} = \int_0^\infty G(\sigma, z)\,\sigma(z)\,dz - \int_0^\infty G(\sigma_0, z)\,\sigma_0(z)\,dz. \tag{2.5.13}$$

After some algebra, (2.5.13) may be rearranged to give

$$S_r = \delta R - \int_0^\infty G(\sigma_0, z)\,\delta\sigma(z)\,dz - T\,\frac{\partial\,\delta R}{\partial T}. \tag{2.5.14}$$

But $\delta R - \int_0^\infty G(\sigma_0, z)\,\delta\sigma(z)\,dz = R_r$, so (2.5.14) becomes

$$S_r = R_r - T\,\frac{\partial\,\delta R}{\partial T}. \tag{2.5.15}$$

Thus, $S_r$ may be considered as the sum of two terms. The first term, $R_r$, is second order in $\delta\sigma$. To investigate the magnitude of the second term, consider the first-order approximation for $\delta R$

$$\delta R(\sigma_0, \delta\sigma, T, z_m) = \int_0^\infty G(\sigma_0, T, z_m, z)\,\delta\sigma(z)\,dz, \tag{2.5.16}$$

where all dependences are explicitly indicated. It follows that

$$T\,\frac{\partial\,\delta R}{\partial T} = T\int_0^\infty \frac{\partial}{\partial T}\left\{G(\sigma_0, T, z_m, z)\right\}\delta\sigma(z)\,dz, \tag{2.5.17}$$

which is first order in $\delta\sigma$. Thus, the similitude equation for $\sigma$ can be written to first order as

$$R(\sigma) - T\,\frac{\partial R(\sigma)}{\partial T} = \int_0^\infty G(\sigma_0, z)\,\sigma(z)\,dz - \int_0^\infty T\,\frac{\partial G(\sigma_0, z)}{\partial T}\,\delta\sigma(z)\,dz, \tag{2.5.18}$$

where the second term on the right side represents the error term $S_r$. The error term neglected in an inversion step based on the similitude equation is first order in the model perturbation. To demonstrate the significance of this result, (2.5.18) may be written in a form which illustrates the relative size of terms:

$$R(\sigma) - T\,\frac{\partial R(\sigma)}{\partial T} - \int_0^\infty G(\sigma_0, z)\,\sigma_0(z)\,dz = \int_0^\infty G(\sigma_0, z)\,\delta\sigma(z)\,dz - \int_0^\infty T\,\frac{\partial G(\sigma_0, z)}{\partial T}\,\delta\sigma(z)\,dz. \tag{2.5.19}$$

The quantities on the left side of (2.5.19) may be considered modified data. The first term on the right side is the linear functional of $\delta\sigma$ to be inverted; the second term represents (to first order) the error term $S_r$ that is neglected in the inversion. Although a model norm inversion for $\sigma$ would make use of (2.5.18), writing the similitude equation in the form of (2.5.19) emphasizes that the error term and the inversion term are of the same order in $\delta\sigma$ and any inversion scheme based on neglecting this error term is ill-founded. Whether the model produced by such an inversion step is an improvement on $\sigma_0$ depends on the relative size of the linear functionals of $\delta\sigma$ in (2.5.19).
At best, an iterative inversion algorithm based on the similitude equation might exhibit linear convergence; however, convergence is not guaranteed, even as $\sigma_0 \to \sigma$. In contrast, the linearized equation neglects a remainder term that is second order in $\delta\sigma$; an iterative algorithm based on it (essentially Newton's method for operators) generally exhibits quadratic convergence and convergence is guaranteed provided $\sigma_0$ is close enough to the true model $\sigma$.

The similitude equation requires twice as many measurements as the linearized equation since both $R(\sigma, T)$ and $\partial_T R(\sigma, T)$ are required to form the response at each period $T$. Ideally, these extra measurements should be included in the inversion in a manner which improves the convergence properties of the algorithm. However, the similitude equation seems to combine the measurements in a way which degrades (or destroys) convergence. This point is reinforced by the following analysis. Consider the similitude equation (2.5.9) evaluated at $\sigma_0$ and substitute for the right side from the linearized equation (2.5.10) to give

$$R(\sigma) - T\,\frac{\partial R(\sigma_0)}{\partial T} = \int_0^\infty G(\sigma_0, z)\,\sigma(z)\,dz + R_r. \tag{2.5.20}$$

The left side of (2.5.20) resembles the left side of the similitude equation (2.5.11) except that $\partial_T R(\sigma)$ is replaced with $\partial_T R(\sigma_0)$. The right side resembles the right side of (2.5.11) with the first-order error term $S_r$ replaced with the second-order term $R_r$. Thus, replacing the observed quantity $\partial_T R(\sigma)$ with that computed for the starting model $\sigma_0$ essentially reduces the error term in the similitude equation from first to second order in $\delta\sigma$ even though the number of measurements is reduced by half. Equation (2.5.20) is a viable equation for inversion; the convergence properties of an algorithm based on this equation should be identical to those of the linearized inversion algorithm.
Even if an iterative solution of the similitude equation does converge, because the responses consist of the difference between measured quantities, there is no guarantee that the constructed model will reproduce the measurements. Convergence criteria for iterative algorithms are considered in Chapter 3, but basically what is required is that the sequence of constructed models stabilize at a model which reproduces the responses. For instance, consider the linearized equation for $\sigma$ at iteration $n+1$:

$$R(\sigma) - R(\sigma_n) + \int_0^\infty G(\sigma_n, z)\,\sigma_n(z)\,dz = \int_0^\infty G(\sigma_n, z)\,\sigma_{n+1}(z)\,dz. \tag{2.5.21}$$

If this sequence stabilizes after $n$ iterations so that there is no change in the constructed model, i.e. $\sigma_{n+1} = \sigma_n$, (2.5.21) reduces to $R(\sigma_n) = R(\sigma)$ and the constructed model reproduces the observations. However, consider an inversion algorithm for $\sigma$ based on the similitude equation (2.5.11) at iteration $n+1$:

$$R(\sigma) - T\,\frac{\partial R(\sigma)}{\partial T} = \int_0^\infty G(\sigma_n, z)\,\sigma_{n+1}(z)\,dz. \tag{2.5.22}$$

If the process stabilizes after $n$ iterations so that $\sigma_{n+1} = \sigma_n$, then since $\int_0^\infty G(\sigma_n, z)\,\sigma_n(z)\,dz = R(\sigma_n) - T\,\partial_T R(\sigma_n)$, (2.5.22) becomes

$$R(\sigma_n) - T\,\frac{\partial R(\sigma_n)}{\partial T} = R(\sigma) - T\,\frac{\partial R(\sigma)}{\partial T}. \tag{2.5.23}$$

Because the similitude equation responses consist of the difference of measured quantities, $\sigma_n$ can satisfy (2.5.23) without either $R(\sigma_n) = R(\sigma)$ or $\partial_T R(\sigma_n) = \partial_T R(\sigma)$. Thus, the constructed model is not actually required to reproduce any of the measurements.

Before the analysis in this section was carried out, an inversion algorithm based on the similitude equation (2.5.11) was implemented (the details of this algorithm are similar to the linearized inversion algorithm presented in Chapter 3). For the simple case where all models are restricted to constant-conductivity halfspaces, we found that the algorithm generally exhibited linear convergence to the true model. However, for models with any appreciable structure (even simple two-layer models), the algorithm generally did not converge.
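The halfspace behaviour described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work; it assumes that the response of a uniform halfspace is proportional to $\sqrt{\sigma T}$ (as would follow from $R = B/E$ for a halfspace), so that $T\,\partial_T R = R/2$ and, for a constant model, $\int_0^\infty G(\sigma_n, z)\,\sigma_{n+1}\,dz = \sigma_{n+1} R(\sigma_n)/(2\sigma_n)$. Under these assumptions (2.5.22) and (2.5.21) reduce to the two scalar updates below. The function names and the conductivity values are hypothetical.

```python
import math


def halfspace_response(sigma, period=1.0):
    """Assumed halfspace response, R proportional to sqrt(sigma * T).

    The (complex) constant of proportionality cancels in both updates
    below, so it is omitted.
    """
    return math.sqrt(sigma * period)


def similitude_step(sigma_n, sigma_true):
    # Halfspace form of (2.5.22): R(true)/2 = sigma_{n+1} R(sigma_n) / (2 sigma_n)
    return sigma_n * halfspace_response(sigma_true) / halfspace_response(sigma_n)


def linearized_step(sigma_n, sigma_true):
    # Halfspace form of (2.5.21):
    # R(true) - R(sigma_n) + R(sigma_n)/2 = sigma_{n+1} R(sigma_n) / (2 sigma_n)
    ratio = halfspace_response(sigma_true) / halfspace_response(sigma_n)
    return sigma_n * (2.0 * ratio - 1.0)


def relative_errors(step, sigma_start, sigma_true, n_iter):
    """Apply `step` repeatedly and record the relative model error."""
    errors, sigma = [], sigma_start
    for _ in range(n_iter):
        sigma = step(sigma, sigma_true)
        errors.append(abs(sigma - sigma_true) / sigma_true)
    return errors


# Hypothetical example: true halfspace 0.01 S/m, starting model 0.02 S/m.
sim_err = relative_errors(similitude_step, 0.02, 0.01, 5)
lin_err = relative_errors(linearized_step, 0.02, 0.01, 5)
for k, (s, l) in enumerate(zip(sim_err, lin_err), start=1):
    print(f"iteration {k}: similitude {s:.3e}   linearized {l:.3e}")
```

In this restricted setting the similitude update halves the error in $\log\sigma$ at each step (linear convergence), whereas the linearized update roughly squares the relative error, in keeping with the orders of the neglected terms $S_r$ and $R_r$.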
Thus it would seem both in theory and in practice that the similitude equation is not an appropriate basis for model norm inversion.

The major difficulty with using the similitude equation for inversion is that in order to implement it in the manner suggested by Gómez-Treviño (1987), a first-order error term $S_r$ must be neglected. A natural question to investigate is whether this error term can be included in the inversion rather than neglected. In order to include this term, the first-order similitude equation (2.5.18) may be written as

$$R(\sigma) - T\,\frac{\partial}{\partial T}\left\{R(\sigma) - \int_0^\infty G(\sigma_0, z)\,\delta\sigma(z)\,dz\right\} = \int_0^\infty G(\sigma_0, z)\,\sigma(z)\,dz. \tag{2.5.24}$$

Since $R(\sigma) - \int_0^\infty G(\sigma_0, z)\,\delta\sigma(z)\,dz = R(\sigma_0)$ (to first order) and $T\,\partial_T R(\sigma_0) = R(\sigma_0) - \int_0^\infty G(\sigma_0, z)\,\sigma_0(z)\,dz$, (2.5.24) reduces to

$$\delta R + \int_0^\infty G(\sigma_0, z)\,\sigma_0(z)\,dz = \int_0^\infty G(\sigma_0, z)\,\sigma(z)\,dz, \tag{2.5.25}$$

which is simply the linearized inverse equation for $\sigma$. This is not a surprising result. The Frechet derivative of the response $R$ with respect to the conductivity is defined to be a linear functional which, when applied to the conductivity perturbation, produces (to first order) the response perturbation. The Frechet derivative is unique; therefore the linearized equation represents the only linear expression relating $\delta R$ to $\delta\sigma$ (or, using transformation (2.1.17), to $\sigma$ itself) which is accurate to first order in $\delta\sigma$. Any attempt to devise a linear relationship between $R$ and $\sigma$ which is accurate to first order must reduce to the linearized equation for $R$. For instance, the integral equation (2.4.1) derived by Smith & Booker (1988) also relates the model directly to the response; however, it can be shown that using (2.4.1) in a linear inversion requires an approximation that is equivalent to neglecting a first-order error term. If the error term is included in the inversion, the expression again reduces to the linearized equation. Thus, the linearized equation would seem to be the obvious basis for model norm inversions.
Chapter 3

$l_2$ Model norm construction

3.1 Introduction

In Chapter 2 an approximate linear expression (equation (2.2.18)) which relates the conductivity model directly to (modified) MT responses was derived. In this chapter an iterative inversion algorithm based on this linearized expression is developed to construct models which minimize an $l_2$ norm. The norm can be a measure of model structure or of the deviation of the model from a given base model; minimizing these norms produces the $l_2$ flattest (minimum-structure) model and the smallest-deviatoric model, respectively.

It is important to note that although recent papers by Constable et al. (1987) and Smith & Booker (1988) have considered the construction of $l_2$ minimum-structure models, the work presented in this chapter was initiated prior to those publications (or the author being aware that such research was being carried out) and has been developed independently. Certainly the development and results of these papers are germane to this chapter and comparisons and contrasts are made. Preliminary results of the inversion algorithm described in this chapter have been presented in Dosso & Oldenburg (1989) and have also been summarized in Whittall & Oldenburg's (1990) survey of 1-D MT inversion techniques.

The work in this thesis was guided by a somewhat different philosophy than that expounded by Constable et al. (1987) and Smith & Booker (1988). Those references consider the $l_2$ minimum-structure solution to be the best model with which to interpret the observed responses. While it is acknowledged that if an interpretation is to rely on a single solution, this model may be an excellent choice, the development here is based on the belief that it is always valuable to produce a variety of acceptable models and to take into account as much additional information or insight into the problem as possible.
Such flexibility in model construction allows some exploration and understanding of the space of acceptable models. To this end, $l_2$ flattest and smallest-deviatoric models are developed for both $\sigma$ and $\log\sigma$ as models and an arbitrary weighting function is included in all model norms.

The MT inversion algorithm presented in this chapter is based on successive solutions of the linearized problem. Before the general inversion scheme is described, however, it is convenient to first consider the simpler linear problem in order to describe the model norms and the inversion procedure.

3.2 Linear inverse theory

This section presents methods and results from linear inverse theory. Although the theory is general, only results relevant to the linearized inversion algorithm are considered. More complete reviews of linear inverse theory are given by Parker (1977a), Oldenburg (1984) and Bertero, De Mol & Pike (1985, 1988).

In a linear problem the model $m$ and responses $d_j$ are related via a linear functional

$$d_j = \int_0^\infty G_j(z)\,m(z)\,dz, \quad j = 1, \ldots, N. \tag{3.2.1}$$

Observed responses are generally inaccurate; therefore the aim of the inversion is not to fit the data exactly, but rather to achieve an acceptable level of fit based on some criteria. The error on each response $d_j$ is assumed to be due to an independent, zero-mean Gaussian process with a standard deviation $s_j$. Although this may be a poor approximation in some cases, it is retained since knowledge of the true statistical distribution of the noise is often very poor and this assumption allows the analysis to be carried out exactly (Parker 1977a). To weight each response according to its uncertainty, the data equations are divided by their standard deviation to yield

$$e_j = \int_0^\infty g_j(z)\,m(z)\,dz, \quad j = 1, \ldots, N, \tag{3.2.2}$$

where $e_j = d_j/s_j$ and $g_j(z) = G_j(z)/s_j$. The constraint equations (3.2.2) may be inverted for the model $m(z)$ which minimizes some penalty functional.
Minimization of a penalty functional (also called a regularization functional) is a requirement for regularizing the inverse problem (Rokityansky 1982).

3.2.1 Smallest model

One of the simplest functionals to minimize is the (squared) $l_2$ norm of the model,

$$\|m\|_2^2 = \int_0^\infty |m(z)|^2\,dz, \tag{3.2.3}$$

to yield the smallest model. A somewhat more general formulation includes an arbitrary (positive) weighting function $w(z)$ in the norm:

$$\|m\|_{2,w}^2 = \int_0^\infty |w(z)\,m(z)|^2\,dz, \quad w(z) > 0. \tag{3.2.4}$$

Including $w$ in the norm allows flexibility in defining how strongly (in a relative sense) the minimization is applied to various regions of the model: where $w(z)$ is large the model will tend to be small (if possible) and vice versa.

To minimize the model norm (3.2.4) subject to the side conditions (3.2.2) the method of Lagrange multipliers (e.g. Morse & Feshbach 1953, p. 278) is employed. Each constraint equation is written as an expression equal to zero; these expressions are multiplied by an unknown Lagrange multiplier (for convenience written as $2\alpha_j$) and added to the norm to produce a new functional

$$\Phi(m) = \|m\|_{2,w}^2 + 2\sum_{j=1}^N \alpha_j\left[e_j - \int_0^\infty g_j(z)\,m(z)\,dz\right]. \tag{3.2.5}$$

The model norm (3.2.4) is minimized, subject to the data constraints, at the point where the functional (3.2.5) is stationary with respect to $m$ and $\alpha_j$. To investigate the variation of $\Phi$ with respect to $m$, consider the change $d\Phi = \Phi(m + dm) - \Phi(m)$ due to an infinitesimal model perturbation $dm$. It is straightforward to show that

$$d\Phi = 2\int_0^\infty \left\{w^2(z)\,m(z) - \sum_{j=1}^N \alpha_j\,g_j(z)\right\} dm(z)\,dz. \tag{3.2.6}$$

For $\Phi$ to be stationary with respect to $m$, $d\Phi$ must be zero for arbitrary $dm$; thus (3.2.6) requires

$$m(z) = \sum_{j=1}^N \alpha_j\,g_j(z)/w^2(z). \tag{3.2.7}$$

According to (3.2.7), the smallest model is given by a linear combination of the (weighted) kernel functions with the Lagrange multipliers acting as the coefficients.
Setting $\partial\Phi/\partial\alpha_j = 0$ yields the constraints (3.2.2); substituting (3.2.7) into (3.2.2) leads to

$$e_j = \sum_{k=1}^N \alpha_k \int_0^\infty \frac{g_j(z)}{w(z)}\,\frac{g_k(z)}{w(z)}\,dz, \tag{3.2.8}$$

or

$$\mathbf{e} = \Gamma\boldsymbol{\alpha}, \tag{3.2.9}$$

where $\mathbf{e} = (e_1, e_2, \ldots, e_N)^T$, $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_N)^T$ and $\Gamma$ represents the (weighted) inner-product or Gram matrix with elements

$$\Gamma_{jk} = \int_0^\infty \frac{g_j(z)}{w(z)}\,\frac{g_k(z)}{w(z)}\,dz. \tag{3.2.10}$$

$\Gamma$ is an $N \times N$ symmetric, positive-definite matrix, so in theory it can be inverted, (3.2.9) solved for $\boldsymbol{\alpha}$ and the smallest model constructed using (3.2.7). However, although the problem of computing the smallest model in this manner is well-posed in the strict mathematical sense (i.e. the solution depends continuously on the data), the problem can be extremely ill-conditioned and therefore very unstable numerically, especially for large data sets. This instability is a consequence of noise on the data and the finite numerical precision of computations, which has the result that the kernel functions may not all be considered as effectively linearly independent. Also, since the responses are inherently inaccurate, it is not desired that the solution should exactly reproduce the responses. Rather, the inversion procedure should seek to fit the data only to within a level of misfit appropriate to the uncertainties of the responses. A common measure of misfit is given by

$$\chi^2 = \sum_{j=1}^N \left[\frac{d_j - d_j(m_c)}{s_j}\right]^2, \tag{3.2.11}$$

where $d_j(m_c) = \int_0^\infty G_j(z)\,m_c(z)\,dz$ are the responses predicted for the constructed model $m_c(z)$. $\chi^2$ corresponds to the standard chi-squared statistic if the error on each response is assumed to be due to an independent, zero-mean Gaussian process with standard deviation $s_j$. The expected value of $\chi^2$ is (approximately) equal to the number of data, $N$; therefore, the constructed model should ideally produce a $\chi^2$ misfit of $N$.
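As an illustration of (3.2.7) to (3.2.11), the following minimal Python sketch constructs the smallest model for a synthetic linear problem. Everything specific in it is an assumption made for the example: the kernels $G_j(z) = e^{-p_j z}$, the decay rates, the standard deviations, the true model $e^{-1.5z}$ and the weighting $w(z) = 1$ are hypothetical, chosen only because the Gram matrix and the data are then available in closed form. A small Gaussian-elimination routine stands in for a library solver.

```python
import math


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (a[i][n] - tail) / a[i][i]
    return x


decay = [0.5, 1.0, 2.0]      # hypothetical kernel decay rates p_j
std = [0.01, 0.01, 0.01]     # assumed standard deviations s_j
true_rate = 1.5              # hypothetical true model m(z) = exp(-1.5 z)

# Noise-free data d_j = integral of exp(-p_j z) exp(-1.5 z) dz, as in (3.2.1)
data = [1.0 / (p + true_rate) for p in decay]
e = [d / s for d, s in zip(data, std)]                 # weighted responses

# Gram matrix (3.2.10) with w(z) = 1: integral of g_j g_k dz
gram = [[1.0 / (sj * sk * (pj + pk)) for pk, sk in zip(decay, std)]
        for pj, sj in zip(decay, std)]
alpha = solve(gram, e)                                 # (3.2.9)


def smallest_model(z):
    """Evaluate (3.2.7) with w(z) = 1."""
    return sum(a * math.exp(-p * z) / s for a, p, s in zip(alpha, decay, std))


# Predicted responses d_j(m_c) = s_j * (Gamma alpha)_j and the misfit (3.2.11)
predicted = [s * sum(g * a for g, a in zip(row, alpha))
             for row, s in zip(gram, std)]
chi2 = sum(((d - q) / s) ** 2 for d, q, s in zip(data, predicted, std))

norm_sq = sum(a * ej for a, ej in zip(alpha, e))       # alpha^T Gamma alpha
true_norm_sq = 1.0 / (2.0 * true_rate)                 # integral of exp(-3 z) dz
print(f"chi2 = {chi2:.2e}, norm^2 = {norm_sq:.6f}, true norm^2 = {true_norm_sq:.6f}")
```

With noise-free data the direct solution fits the responses essentially exactly, and its squared norm cannot exceed that of the true model, since the true model is only one of the models that satisfy the constraints.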
If $\chi^2$ is much less than $N$, the data are fit too well and the model may exhibit structure which is simply an artifact of the noise; if $\chi^2$ is much greater than $N$, the data are fit too poorly and information about the model may be lost (Oldenburg 1984).

A method that allows for a suitable misfit and which overcomes the numerical instabilities of the inversion is the spectral expansion method described by Parker (1977a). Spectral expansion isolates the components of the solution that are well determined by the data from those that are not. In fact, it can be shown that the spectral expansion method is equivalent to the principal component or Karhunen-Loeve transformation used in signal processing and information theory to extract the common signal from a set of time series (e.g. Kramer & Mathews 1968, Jones 1985).

3.2.2 Spectral expansion

In the spectral expansion method, rather than forming the model solution directly as a linear combination of the kernel functions as suggested by (3.2.7), the kernels are rotated to produce a new set of basis functions

$$\psi_j(z) = \lambda_j^{-1/2}\sum_{k=1}^N U_{kj}\,g_k(z), \tag{3.2.12}$$

where $U$ is the orthogonal matrix which diagonalizes $\Gamma$ according to

$$U^T\,\Gamma\,U = \Lambda. \tag{3.2.13}$$

In (3.2.13) $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$, with $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_N > 0$, is the matrix of eigenvalues of $\Gamma$ (called the spectrum of $\Gamma$) and $U$ is the matrix of column eigenvectors of $\Gamma$. The matrices $U$ and $\Lambda$ may be found by a singular value decomposition (SVD) of $\Gamma$ (Lanczos 1958). It is straightforward to verify that $\{\psi_j(z)\}$ is an orthonormal set, i.e.

$$\int_0^\infty \psi_j(z)\,\psi_k(z)\,dz = \delta_{jk}. \tag{3.2.14}$$

This set of functions can be used as a basis for $m$:

$$m(z) = \sum_{j=1}^N a_j\,\psi_j(z), \tag{3.2.15}$$

where the coefficients are given by

$$a_j = \int_0^\infty \psi_j(z)\,m(z)\,dz = \lambda_j^{-1/2}\,\hat{e}_j, \tag{3.2.16}$$

and $\hat{\mathbf{e}} = U^T\mathbf{e}$ represents rotated responses. Parker (1977a) shows that the $a_j$ are statistically independent and the standard deviation of each coefficient $a_j$ is $\lambda_j^{-1/2}$, i.e.
the coefficients associated with the smallest eigenvalues have the greatest uncertainty.

Physically, diagonalizing the inner-product matrix may be considered analogous to rotating to a 'principal axes' co-ordinate system where the co-ordinate axes, specified by the eigenvectors, correspond to the natural axes of symmetry. Parker (1977a) regards (3.2.15) as an expansion in the natural 'modes' of the data. He notes that the largest eigenvalues are generally associated with the smoothest functions $\psi_j(z)$ and that the functions become more oscillatory as $j$ increases. Also, the magnitude of successive eigenvalues generally decreases rapidly except for the smallest ones, which often cannot be computed accurately and tend to cluster around a small number determined by the computational precision. The expansion coefficients associated with these small eigenvalues are poorly determined.

Parker (1977a) describes two methods to stabilize the inversion. One method is simply to omit the functions $\psi_j(z)$ associated with the smallest eigenvalues from the expansion (3.2.15). The truncated series will no longer fit the original data precisely, but it is straightforward to show that the $\chi^2$ misfit for the constructed model is given by

$$\chi^2 = \sum_{j=n+1}^N \hat{e}_j^2, \tag{3.2.17}$$

where $n < N$ is the number of terms included in the truncated expansion. The number $n$ can be chosen so that $\chi^2$ approximately achieves the desired value.

An alternative approach is to retain all the eigenvalues, but to replace $\lambda_j$ by $\lambda_j + \beta$ in (3.2.12) and (3.2.16), where $\beta$ is a positive constant. This is equivalent to adding a constant to the main diagonal of $\Lambda$ to avoid singularities or near singularities and is similar to the Marquardt-Levenberg or 'ridge regression' method for least-squares inversion (Levenberg 1944; Marquardt 1963; Lines & Treitel 1984).
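The two stabilization methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the eigenvalues and rotated responses are hypothetical numbers, the ridge misfit is taken to be $\chi^2(\beta) = \sum_j [\beta\,\hat{e}_j/(\lambda_j + \beta)]^2$ (the form that follows from substituting $\lambda_j + \beta$ for $\lambda_j$ in the expansion), and plain bisection is used as the 1-D search in place of Newton's method.

```python
def truncation_misfit(rotated, n_keep):
    """Misfit (3.2.17): sum of squares of the rotated responses left out."""
    return sum(e * e for e in rotated[n_keep:])


def choose_truncation(rotated, target):
    """Smallest number of retained terms whose misfit does not exceed target."""
    for n_keep in range(len(rotated) + 1):
        if truncation_misfit(rotated, n_keep) <= target:
            return n_keep


def ridge_misfit(eigenvalues, rotated, beta):
    """Assumed ridge misfit: sum of (beta * e_j / (lambda_j + beta))^2."""
    return sum((beta * e / (lam + beta)) ** 2
               for lam, e in zip(eigenvalues, rotated))


def solve_beta(eigenvalues, rotated, target, n_bisect=200):
    """Find beta giving the target misfit by bisection (misfit increases with beta)."""
    if not 0.0 < target < sum(e * e for e in rotated):
        raise ValueError("target misfit must lie between 0 and sum of e_j^2")
    low, high = 0.0, 1.0
    while ridge_misfit(eigenvalues, rotated, high) < target:
        high *= 2.0                      # expand until the target is bracketed
    for _ in range(n_bisect):
        mid = 0.5 * (low + high)
        if ridge_misfit(eigenvalues, rotated, mid) < target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Hypothetical spectrum and rotated responses for N = 5 data
eigenvalues = [10.0, 1.0, 0.1, 1.0e-3, 1.0e-6]
rotated = [30.0, -12.0, 4.0, 1.5, -0.8]
target = float(len(rotated))             # desired chi-squared misfit of N

n_keep = choose_truncation(rotated, target)
beta = solve_beta(eigenvalues, rotated, target)
print(f"truncation keeps {n_keep} terms, misfit {truncation_misfit(rotated, n_keep):.2f}")
print(f"ridge parameter {beta:.4f}, misfit {ridge_misfit(eigenvalues, rotated, beta):.4f}")
```

Truncation can only reach the target misfit approximately, because $n$ is an integer, whereas the ridge parameter can be tuned to match it as closely as the search tolerance allows.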
In this case, it can be shown that the $\chi^2$ misfit is given by

$$\chi^2 = \sum_{j=1}^N \left(\frac{\beta}{\lambda_j + \beta}\right)^2 \hat{e}_j^2. \tag{3.2.18}$$

$\beta$ represents a trade-off parameter between fitting the responses accurately and minimizing the model norm: when $\beta$ is large the model norm can be made very small by misfitting the data; when $\beta$ is small the data are accurately fit at the expense of an increase in the norm. Since $\chi^2$ is a monotonically increasing function of $\beta$, (3.2.18) can easily be solved for the value of the ridge-regression parameter $\beta$ which results in the desired level of $\chi^2$ misfit using Newton's method or a 1-D line search. This procedure allows the construction of the smallest model with a precisely determined value for $\chi^2$.

Oldenburg (1979) applied the method of spectral expansion and truncation to the linearized MT response expansion given by (2.2.20) to compute the smallest conductivity perturbation in an iterative algorithm. In order for the linearization to hold, perturbations must be small and minimizing the $l_2$ norm of the perturbation is a logical choice. However, making use of the transformed equation (2.2.22) allows a norm to be applied to the model itself, not the perturbation. In this case, constructing the smallest conductivity model is not the best choice for several reasons. There is no geophysical reason to expect the Earth conductivity structure to correspond to the smallest model solution. In fact, minimizing the conductivity often results in highly oscillatory solutions which include regions of negative conductivity that are both unphysical and difficult to deal with computationally. The problem of negative conductivities can be remedied by considering $\log\sigma$ as model; however, more meaningful choices for the model norm can be derived by modifying the smallest model construction.

3.2.3 Smallest-deviatoric model

As mentioned above, there is no geophysical reason to expect the smallest model to represent the true Earth structure.
However, if a base model is available which corresponds to the best estimate of the true model based on any available information (well logs, geological considerations, previous modelling studies, etc.), the method described in Section 3.2.1 can be modified to construct the model which deviates least from this base model. That is, the norm of the deviation between the constructed and base model is minimized subject to the constraint that the constructed model fits the data to within an acceptable level. This is referred to as the smallest-deviatoric model.

Let $m_B(z)$ represent the base model and $\Delta m(z)$ represent the (unknown) deviation from $m_B$ required so that the constructed model

$$m(z) = m_B(z) + \Delta m(z) \tag{3.2.19}$$

fits the data to within the required level of misfit. The data constraint equations (3.2.2) become

$$e_j = \int_0^\infty g_j(z)\left[m_B(z) + \Delta m(z)\right]dz, \quad j = 1, \ldots, N, \tag{3.2.20}$$

which may be written as

$$\tilde{e}_j = \int_0^\infty g_j(z)\,\Delta m(z)\,dz, \tag{3.2.21}$$

where the modified responses are given by

$$\tilde{e}_j = e_j - \int_0^\infty g_j(z)\,m_B(z)\,dz. \tag{3.2.22}$$

Equations (3.2.21) are modified data constraints written in terms of $\Delta m$ as model; these may be used as side conditions in minimizing the (weighted) $l_2$ norm of the model deviation

$$\|\Delta m\|_{2,w}^2 = \int_0^\infty |w(z)\,\Delta m(z)|^2\,dz \tag{3.2.23}$$

in the manner described in Sections 3.2.1 and 3.2.2. Strictly speaking, (3.2.23) is not a norm for the model itself; however, it will sometimes be referred to loosely as a model norm. The smallest-deviatoric model is formed by adding the constructed deviation to the base model according to (3.2.19). The weighting function $w(z)$ is particularly useful for the smallest-deviatoric model as it is often the case that the base model is well known for some depth regions but not for others.
Also, by an appropriate choice of base model and weighting function, the construction of smallest-deviatoric models may be used to perform an approximate appraisal of model features; this procedure is demonstrated in Section 3.4.2.

3.2.4 Flattest model

In many cases there is insufficient additional information available to determine a reliable base model prior to the inversion. In such cases it is probably best to seek the simplest model consistent with the data at a given level of misfit. By constructing minimum-structure models, the danger of being misled by features appearing in the model that are not required by the data should be greatly reduced. There is reason to believe that the features of the minimum-structure solution are characteristics essential to fitting the data and not simply artifacts of the noise or inversion procedure. The true model may be more complex than the simplest model, but these additional complexities are not resolved by the data and are not justified in the constructed model. Constable et al. (1987) and Smith & Booker (1988) also propound this philosophy for the inversion of MT responses.

The $l_2$ norm of the model gradient is a measure of the amount of structure of the model; minimizing this norm produces the flattest model. To express the constraints in terms of the model gradient, (3.2.2) can be integrated by parts to give

$$\bar{e}_j = \int_0^\infty \bar{g}_j(z)\,m'(z)\,dz, \quad j = 1, \ldots, N, \tag{3.2.24}$$

where

$$\bar{g}_j(z) = h_j(z) - h_j(\infty), \tag{3.2.25a}$$

$$h_j(z) = \int_0^z g_j(u)\,du, \tag{3.2.25b}$$

$$\bar{e}_j = m(0)\,h_j(\infty) - e_j \tag{3.2.25c}$$

and $m' = dm/dz$. Using these constraints as side conditions, the method described in Section 3.2.1 may be used to compute the smallest gradient model. The flattest model $m(z)$ is recovered directly by integrating $m'(z)$.
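The integration by parts leading to (3.2.24) and (3.2.25) can be checked numerically. The short Python sketch below does so for one hypothetical kernel $g(z) = e^{-pz}$ and one hypothetical model $m(z) = 1 + 0.5\,e^{-2z}$ with unit standard deviation; these choices, the truncation depth and the trapezoidal rule are assumptions made for the example only.

```python
import math


def trapezoid(func, upper, n):
    """Composite trapezoidal rule on [0, upper] with n intervals."""
    step = upper / n
    total = 0.5 * (func(0.0) + func(upper))
    total += sum(func(i * step) for i in range(1, n))
    return total * step


p = 0.7                                   # hypothetical kernel decay rate


def kernel(z):                            # g(z), unit standard deviation assumed
    return math.exp(-p * z)


def model(z):                             # hypothetical model with m(0) = 1.5
    return 1.0 + 0.5 * math.exp(-2.0 * z)


def model_gradient(z):                    # m'(z) = dm/dz
    return -math.exp(-2.0 * z)


def h(z):                                 # (3.2.25b), integrated analytically
    return (1.0 - math.exp(-p * z)) / p


h_inf = 1.0 / p


def gradient_kernel(z):                   # (3.2.25a)
    return h(z) - h_inf


upper, n = 60.0, 60000                    # integrands are negligible beyond z = 60
e = trapezoid(lambda z: kernel(z) * model(z), upper, n)            # (3.2.2)
modified_response = model(0.0) * h_inf - e                         # (3.2.25c)
gradient_integral = trapezoid(
    lambda z: gradient_kernel(z) * model_gradient(z), upper, n)    # (3.2.24)

print(f"m(0) h(inf) - e          = {modified_response:.6f}")
print(f"integral of gbar * m' dz = {gradient_integral:.6f}")
```

For these choices both quantities equal $1/[p(p+2)]$, so the two printed values should agree to within the quadrature error.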
The procedure derived here requires that the surface value $m(0)$ be known; in fact, the derivation can be generalized for $m$ known at any fixed depth, but in practice $m(0)$ is most likely to be known accurately for the MT problem. If $m(0)$ is known, it is valuable to include this information in the inversion; however, an inaccurate estimate of $m(0)$ can introduce false structure into the model. If a reliable estimate of $m(0)$ is not available, the value for $m(0)$ which produces the absolutely flattest model can be calculated. An original derivation of the absolutely flattest model and a discussion of some of its attributes are included in Appendix A. Smith & Booker (1988) also describe a similar procedure.

Minimizing the norm of the model gradient leads to the model with the minimum structure which fits the data. The weighting function $w(z)$ may be used to define depth regions where structure is to be discriminated against either more or less strongly. In MT the resolution of the responses generally decreases logarithmically with depth; therefore it is often appropriate to minimize the logarithmic gradient, i.e. to consider a norm

$$\|m'\|_2^2 = \int_0^\infty \left[ w(z)\, \frac{dm(z)}{df(z)} \right]^2 df(z), \qquad (3.2.26)$$

where the depth function $f(z)$ may be either $z$ or $\log(z + z_0)$. Since $dm/d[\log(z + z_0)] \propto (z + z_0)\, dm/dz$, minimizing the logarithmic gradient can be accomplished by including a weighting (in addition to $w$) of $(z + z_0)^{1/2}$. The constant $z_0$ is included to avoid a singularity at $z = 0$; physically, it is required since the resolution length approaches a constant (not zero) at the Earth's surface. Smith & Booker (1988) use a similar approach and choose $z_0$ equal to half the penetration depth $\Re(c)$ (Weidelt 1972) for the highest frequency in the data, since structure much shallower than this cannot be resolved. After considering a number of possibilities, this value was also adopted in our inversion algorithm.

3.3 The linearized inversion algorithm

This section describes an iterative inversion algorithm for the non-linear MT problem based on local linearization. Let the MT responses be related to the model according to

$$R_j = F_j(m), \qquad j = 1, \ldots, N, \qquad (3.3.1)$$

where $F_j$ represents the non-linear functional defining the forward problem. Chapter 2 shows that expanding the functional about a starting model, neglecting second-order terms and substituting for the model perturbation leads to a set of $N$ linear integral equations (3.3.2), with integration over depth from 0 to $\infty$, which relate the model directly to the (modified) responses. In (3.3.2) the model $m(z)$ may represent conductivity $\sigma(z)$ or $\log \sigma(z)$. By selecting a starting model $m_0(z)$ and computing the corresponding kernel functions, (3.3.2) can be inverted for a flattest or smallest-deviatoric model $m_1(z)$ using the linear inversion techniques described in Section 3.2. Since higher-order terms are neglected in (3.3.2), unless $m_0$ is close to a solution, it is unlikely that $m_1$ will adequately fit the observed responses, i.e. the misfit

$$\chi^2 = \sum_{j=1}^{N} \left( \frac{R_j - F_j(m_1)}{s_j} \right)^2 \qquad (3.3.3)$$

will be unacceptably large. If this is the case, the inversion is repeated iteratively with $m_1$ becoming the starting model for the next iteration and so on. This procedure is continued until the $\chi^2$ misfit reaches the desired level and the model does not change appreciably between successive iterations.

Since the linear equation (3.3.2) is accurate to second order, it may be anticipated that when the algorithm converges, convergence will be rapid (quadratic) in analogy with Newton's method for operators (Section 2.1.1). By explicitly minimizing the norm of the model gradient or deviation at each iteration, it is hypothesized that the constructed model represents the flattest or smallest-deviatoric model which fits the data.
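The misfit (3.3.3) is the quantity monitored at every iteration. A minimal sketch of its evaluation follows, assuming that $s_j$ denotes the standard deviation of the $j$th response and that complex responses have already been separated into real data. The function name and the numerical values are hypothetical.

```python
import numpy as np

def chi2_misfit(observed, predicted, std):
    """Chi-squared misfit (3.3.3): sum of squared standardized residuals.

    Assumes std holds the standard deviation s_j of each datum.
    """
    residual = (np.asarray(observed, dtype=float)
                - np.asarray(predicted, dtype=float)) / np.asarray(std, dtype=float)
    return float(np.sum(residual ** 2))

# Hypothetical data: residuals of 1, 2 and 0 standard deviations.
observed = np.array([1.0, 2.0, 3.0])
predicted = np.array([1.1, 1.8, 3.0])
std = np.array([0.1, 0.1, 0.1])
misfit = chi2_misfit(observed, predicted, std)
```

In an iterative scheme of the kind described above, a value such as `misfit` would be recomputed from a full forward solution after each linearized step and compared with the desired level.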
In practice, it is difficult to verify that a global (rather than local) minimum for the norm has been found; this point will be investigated in some detail.

3.3.1 Numerical implementation

Numerical implementation of the algorithm requires that a depth partitioning be introduced and the model discretized. The discretized model is intended to represent an arbitrary function of depth that is independent of the parametrization. Thus it is important to allow the discretized model to be as flexible as possible; this generally requires that the partition elements should be smaller than the resolution width of the data. Because of the inherent loss of resolution with depth, the partition locations are usually distributed logarithmically with depth (with possibly a uniformly discretized region near the surface) and a uniform halfspace underlies the system. If there is any question of whether the partitioning has influenced the final solution, the inversion may be repeated with successively finer discretizations until there is no appreciable change in the solution.

The model gradient norm (3.2.26) assumes a continuous model; however, efficient numerical solutions to the forward problem are more readily based on a parametrization in terms of a set of piece-wise constant layers. Therefore, the model is discretized so that

$$m(z) = m_i, \qquad z_{i-1} < z < z_i, \qquad i = 1, \ldots, M, \qquad (3.3.4)$$

where $z_0 = 0$ and the number of layers $M$ is typically about 100. The depth partition must extend deep enough that the EM fields associated with the longest period have decayed essentially to zero; the upper limit of infinity in all integrals may then be replaced by $z_M$.

Once a starting model $m_0(z)$ has been specified on the partition, the kernel functions can be computed. The $N$ complex equations (3.3.2) are separated into $2N$ real equations corresponding to either real and imaginary or amplitude and phase representations of the complex responses $R$.
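A depth partition of the kind described above might be generated as in the following minimal sketch. The function names and the particular numbers (5 m elements to 100 m depth, 80 logarithmic elements to 100 km) are illustrative assumptions, as is the convention of storing the underlying halfspace as one extra model value after the $M$ layers.

```python
import numpy as np

def depth_partition(z_uniform, dz, z_bottom, n_log):
    """Interface depths z_0 = 0 < z_1 < ... < z_M.

    Uniform spacing dz down to z_uniform, then n_log logarithmically
    spaced elements from z_uniform down to z_bottom.
    """
    shallow = np.arange(0.0, z_uniform + 0.5 * dz, dz)
    deep = np.logspace(np.log10(z_uniform), np.log10(z_bottom), n_log + 1)[1:]
    return np.concatenate((shallow, deep))

def layer_value(z_nodes, m_layers, z):
    """Evaluate the piece-wise constant model (3.3.4) at depth z.

    m_layers holds M layer values followed by one halfspace value
    (an assumed storage convention).
    """
    index = np.searchsorted(z_nodes, z, side="right") - 1
    return m_layers[np.clip(index, 0, len(m_layers) - 1)]

z_nodes = depth_partition(z_uniform=100.0, dz=5.0, z_bottom=1.0e5, n_log=80)
n_layers = len(z_nodes) - 1                       # M
model = np.full(n_layers + 1, 0.02)               # halfspace starting model, S/m
```

Repeating an inversion with a larger `n_log` or a smaller `dz` is one way to carry out the check for partition dependence mentioned above.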
Expressions for the kernel functions are derived in Chapter 2: when $m(z) = \sigma(z)$ the kernels corresponding to real and imaginary responses are given by the real and imaginary parts of (2.2.17), while the kernels corresponding to amplitude and phase responses are given by (2.5.1a) and (2.5.1b). When $m(z) = \log \sigma(z)$, these kernel functions are multiplied by $\sigma_0(z)$ as described in Section 2.5.

Computing the kernel functions and modified responses requires the solution to the forward problem: given the model $m_0(z)$, calculate the resulting fields $E(m_0)$ and $H(m_0)$ (or equivalently, $\partial_z E$). Recursive solutions to the forward problem of computing electric and magnetic fields for a layered Earth model are well known (e.g. Rokityansky 1982). Our algorithm incorporates routines written by Wannamaker, Stodt and Rijo (1987) to compute the EM fields at any depth. Integrations required to calculate the modified data or kernels for either the flattest or smallest-deviatoric model are computed using a Romberg integration scheme.

In general, it is not necessary to compute and integrate all kernel functions to a depth $z_M$ (Oldenburg 1979). For each period there exists a depth $z_{max}$ below which the electric field has decayed to some small proportion (say, $10^{-5}$) of its surface value. This depth can be computed adequately using a WKBJ approximation (e.g. Mathews & Walker 1970, p. 27), and below this depth the kernel function may be set identically to zero.

Once the appropriate kernel functions and modified responses have been computed, the linearized problem can be solved for the flattest or smallest-deviatoric model as described in Section 3.2: the inner-product matrix T is computed and diagonalized using SVD, the appropriate value for the ridge-regression parameter $\beta$ is computed, and the model gradient or deviation is constructed as a linear combination of the (rotated) basis functions $\{\psi_j\}$.
Finally, the model $m_1(z)$ is found by integrating $m'(z)$ (for the flattest model) or adding $m_B(z)$ to $\Delta m(z)$ (for the smallest-deviatoric model), and the $\chi^2$ misfit to the data is computed for $m_1(z)$ according to (3.3.3).

This procedure is repeated iteratively until the convergence criteria are met. The criteria are that the model must fit the data to within a tolerance $t_{\chi^2}$ of the desired misfit $\chi_d^2$, and that the model does not change appreciably between successive iterations. The latter requirement is included because the constructed model always depends on the model at the previous iteration, so it is important to verify that a stable solution has been achieved. The total change in the model at the $k$th iteration is quantified by the value $\epsilon_k$, defined in (3.3.5) as the square root of a sum over the partition elements $i = 1, \ldots, M_{max}$ of the squared changes in the model elements, where $M_{max}$ is the index of the partition element that corresponds to the maximum depth of penetration $z_{max}$ for the longest period response. The limit $M_{max}$ is used rather than $M$ so that the model change measure $\epsilon$ will not be unduly influenced by that region of the model which is not resolved by the data. In addition to requiring that the total model change $\epsilon$ be less than some specified value $\epsilon_d$ for convergence, no model element is allowed to change by more than a prescribed factor; this procedure is described in the following section.

In our algorithm, the desired misfit value $\chi_d^2$, the tolerance allowed in the final misfit $t_{\chi^2}$, and an acceptable value for $\epsilon_d$ are variable parameters that are defined by the user. In practice, common values for these parameters are $\chi_d^2 = 2N$ (for complex responses at $N$ frequencies), $t_{\chi^2} = 0.1$ and $\epsilon_d = 0.01$.

3.3.2 Non-linear considerations

An important consideration that has not yet been discussed regards the applicability of the local linearization inherent in (3.3.2). Since this equation neglects second-order terms, it is only accurate for small changes in the model.
One method of ensuring that model changes at each iteration remain small is to require only a small change in the misfit per iteration. In practice, this is accomplished by choosing the target misfit value $\chi_t^2$ for the $k$th iteration to be some fraction of the misfit of the previous iteration unless this value is less than the desired final misfit value:

$$\chi_t^2 = \max\left( \chi_{k-1}^2 / P,\ \chi_d^2 \right), \qquad (3.3.6)$$

where $\chi_{k-1}^2$ is the misfit achieved at the previous iteration and $P$ is usually taken to be between 2 and 10. In addition to choosing target misfits in this manner, the size of changes in the model between iterations is controlled when the model is updated, i.e. the value of the model on the $i$th partition element at the $k$th iteration is only allowed to change by a factor of $D$ from its value at the previous iteration:

$$m_{i,k-1} / D \le m_{i,k} \le D\, m_{i,k-1}, \qquad i = 1, \ldots, M, \qquad (3.3.7)$$

where $D$ is also usually taken to be between 2 and 10. In most cases the new model values naturally fall within these limits and the constructed model is not altered; however, in some cases applying the limits seems to be required to stabilize the first few iterations. The algorithm is not considered to have converged at a given iteration if any element of the constructed model is altered according to (3.3.7), even if the required conditions for misfit and the total model change $\epsilon$ are satisfied. In practice, these methods of keeping model changes small so that the linearization remains valid have been found to be effective and necessary precautions for a stable and robust inversion scheme.

Another consequence of the non-linearity is that solving the linearized data equations to a misfit of $\chi_t^2$ at a given iteration will not, in general, result in this same value when the true $\chi^2$ misfit is computed according to (3.3.3). Constable et al. (1987) overcome this difficulty by sweeping through values of the ridge-regression parameter $\beta$ and computing the corresponding values for the $\chi^2$ misfit in order to achieve the desired value.
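The two safeguards (3.3.6) and (3.3.7) can be sketched as follows. This is an illustration under stated assumptions rather than the routine used in the algorithm: the function names are hypothetical, the target misfit is taken to be the larger of the reduced previous misfit and the desired misfit, and the model values are assumed to be positive (for example, conductivities) so that limiting by a factor is meaningful.

```python
import numpy as np

def target_misfit(chi2_previous, chi2_desired, P=10.0):
    """Target misfit for the next iteration, in the spirit of (3.3.6)."""
    return max(chi2_previous / P, chi2_desired)

def limit_update(m_new, m_previous, D=10.0):
    """Restrict each element to [m_previous / D, D * m_previous], as in (3.3.7).

    Assumes positive model values. Returns the limited model and a flag
    recording whether any element was altered; an altered model is not
    accepted as converged.
    """
    m_new = np.asarray(m_new, dtype=float)
    m_previous = np.asarray(m_previous, dtype=float)
    limited = np.clip(m_new, m_previous / D, m_previous * D)
    altered = bool(np.any(limited != m_new))
    return limited, altered

# Hypothetical update from a 0.02 S/m halfspace.
previous = np.full(3, 0.02)
proposed = np.array([0.5, 0.001, 0.02])
limited, altered = limit_update(proposed, previous, D=10.0)
first_target = target_misfit(76700.0, 50.0, P=10.0)
```

With a starting misfit of 76 700, a desired misfit of 50 and $P = 10$, this rule gives a first target of 7670, and the target stays at 50 once the previous misfit falls below 500.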
Unfortunately, this can require a large number of solutions to the forward problem for each iteration. While this technique may be viable for the 1-D MT problem, for 2-D MT or other problems the amount of forward calculations can be prohibitive.

A different approach has been developed for our algorithm. The difference in the misfits of the linearized and non-linear cases is not a problem in the early iterations since the linearized approximation will produce a very acceptable decrease in the misfit of the non-linear problem. However, in order for the algorithm to converge precisely to the desired misfit, the non-linear effects must be taken into account in the final iteration. In its unmodified form, the algorithm generally converges to a stable solution that has a misfit slightly larger than $\chi_d^2$. In order to reduce the misfit to the value $\chi_d^2$, it is necessary to solve the linearized equations to a misfit somewhat smaller than $\chi_d^2$ (i.e. require a target misfit $\chi_t^2$ somewhat smaller than $\chi_d^2$). The problem becomes one of finding the value for the target (or linear) misfit $\chi_t^2$ that results in the desired misfit $\chi^2 = \chi_d^2$ in an efficient manner.

In our approach, when three consecutive iterations have not reduced the misfit significantly, the target misfit $\chi_t^2$ for the third iteration is reduced by multiplying it by a factor $\chi_d^2 / \chi^2$, where $\chi^2$ represents the misfit originally achieved in the third iteration. The appropriate $\beta$ is computed so that the linearized equations are solved to a misfit of $\chi_t^2$, and the true misfit achieved is computed from the resulting model. Adjusting $\chi_t^2$ in this manner generally leads to a substantial improvement in the $\chi^2$ misfit. In some cases the $\chi^2$ value it produces is within the specified tolerance of the desired value $\chi_d^2$ and the algorithm has converged (provided the model change requirements are also satisfied).
Even if this step does not result in the desired misfit, it has produced a second pair of target and actual misfits $(\chi_t^2, \chi^2)$ valid for this iteration, and thus a third target misfit value can be computed using these two pairs in an approximate Newton step. In practice, the $\chi^2$ resulting from this Newton step usually leads to the desired misfit of $\chi_d^2$; in cases when it does not, the procedure is repeated up to three times. If $\chi_d^2$ has still not been achieved, a new iteration is initiated.

It is quite possible that the desired misfit $\chi^2 = \chi_d^2$ can be obtained for more than one value of the target (or linear) misfit $\chi_t^2$, since the $\chi^2$ misfit may begin to increase when the linear misfit is decreased below a certain point as a result of the non-linearity of the problem. This is illustrated diagrammatically in Fig. 3.1. When this is the case, the larger value of $\chi_t^2$ is correct because it results in a smaller model norm. Therefore, it is important to verify that our procedure has returned this value for $\chi_t^2$. This is easily checked by verifying that $\chi^2$ has decreased if $\chi_t^2$ was decreased or that $\chi^2$ has increased if $\chi_t^2$ was increased. If this is not the case, the step is approaching the wrong root and the algorithm returns to the best previous solution which satisfied this requirement to initiate a new iteration.

This procedure for determining the misfit to the linearized problem required to precisely obtain a specified misfit to the non-linear problem has proved to be an efficient and effective method that requires a minimum of forward modelling. The algorithm has successfully inverted all data sets (measured and synthetic) that have been considered.
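The target-adjustment procedure described above can be sketched as a short root search. The sketch assumes that the tolerance is applied as an absolute difference from the desired misfit, that the "approximate Newton step" is a secant step through the two most recent (target, actual) pairs, and that `actual_misfit` is a hypothetical callable that solves the linearized equations to a given target and returns the true $\chi^2$ (one forward solution per call). The quadratic toy relation used to exercise it is invented for illustration.

```python
def adjust_target(actual_misfit, target, chi2_desired, tol=0.1, max_newton=3):
    """Find a linear target misfit whose true misfit is close to chi2_desired.

    Returns (target, true_misfit, converged).
    """
    t0, c0 = target, actual_misfit(target)
    if abs(c0 - chi2_desired) <= tol:
        return t0, c0, True
    # First step: scale the target by the ratio of desired to achieved misfit.
    t1 = t0 * chi2_desired / c0
    c1 = actual_misfit(t1)
    for _ in range(max_newton):
        if abs(c1 - chi2_desired) <= tol:
            return t1, c1, True
        # Wrong-root check: the true misfit must move in the same
        # direction as the target; otherwise fall back to the earlier pair.
        if (t1 - t0) * (c1 - c0) <= 0.0:
            return t0, c0, False
        slope = (c1 - c0) / (t1 - t0)
        t2 = t1 + (chi2_desired - c1) / slope      # secant (approximate Newton) step
        t0, c0 = t1, c1
        t1, c1 = t2, actual_misfit(t2)
    return t1, c1, abs(c1 - chi2_desired) <= tol

# Toy relation (hypothetical): the true misfit exceeds the linear target.
calls = []
def toy_actual(t):
    calls.append(t)
    return t + 4.0 + 0.01 * (50.0 - t) ** 2

final_target, final_misfit, converged = adjust_target(toy_actual, 50.0, 50.0)
```

For the toy relation, the scaling step followed by one secant step brings the true misfit to within the tolerance, so three forward solutions are used in total.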
In many cases, only one forward model solution per iteration is required for all but the final iteration, which may require two or three forward models to achieve the desired misfit $\chi_d^2$. Some particularly difficult inversions may require several forward model solutions for the final few iterations.

Figure 3.1 Diagram showing the dependence of the $\chi^2$ misfit on the target misfit $\chi_t^2$ (the misfit to the linearized data equations). The desired misfit $\chi_d^2$ is indicated by a dashed line.

The only difficulties arise when a value for $\chi_d^2$ is specified which is smaller than that which can be achieved by a finite 1-D model. This can be the case if a set of field measurements contains significant 2-D effects or the data uncertainties are under-estimated. In theory, the problem of finding the smallest possible $\chi^2$ misfit to an arbitrary set of MT responses has been solved by Parker (1980) and Parker & Whaler (1981). They show that the solution which minimizes the $\chi^2$ misfit is given by the $D^+$ model, which consists of a series of delta functions of infinite conductivity, but finite conductance, embedded in an insulating halfspace. The $D^+$ solution is not a geophysically realistic model, but it does provide a lower limit for the $\chi^2$ misfit. In practice, however, it may not be possible to achieve this level of misfit with the algorithm described in this chapter since the constructed models do not admit delta functions and a finite partitioning is imposed on the model. This is perfectly acceptable since the object of the inversion is to construct geophysically plausible models.

In the case where an unattainable value for $\chi_d^2$ is specified, it may be that the multiplicative factor $\chi_d^2 / \chi^2$ used to reduce $\chi_t^2$ in the initial step is too large and may not result in a decrease in $\chi^2$. If this is the case, the multiplicative factor is reduced by successively taking its square root until an acceptable step is found.
If an acceptable step is not found in 10 such attempts, the misfit would appear to have been reduced to (approximately) the smallest possible value and the algorithm terminates.

The algorithm described in this section has been implemented as a fully automated, self-contained routine. The algorithm generally converges from a halfspace starting model to a model with the expected value of $\chi^2$ in about six to eight iterations. The next section of this chapter presents a number of examples of $l_2$ model norm construction designed to illustrate the features of the inversion algorithm.

3.4 Examples of $l_2$ model norm construction

3.4.1 Flattest model construction

The inversion algorithm described in Sections 3.2 and 3.3 is designed to be flexible so that a variety of acceptable models of specific character can be obtained for a given data set. In this section the construction of $l_2$ flattest models is demonstrated by inverting both synthetic and measured field responses.

The synthetic test case used to illustrate the inversion algorithm is that considered by Whittall & Oldenburg (1990) in their survey of 1-D MT inversion techniques. The true model, given in Table 3.1 and shown by the dashed line in Figs 3.2-3.7, consists of four homogeneous layers overlying a uniform halfspace. Fifty data consisting of the real and imaginary parts of the response $R$ were generated at 25 periods equally spaced in logarithmic time from 0.0025 to 250 s. For the purpose of illustrating features of the constructed models, accurate data are considered initially; however, an uncertainty of 2 percent in all responses is assumed so that the $\chi^2$ statistic can be used to measure the relative fit of the models. Unless otherwise indicated, the absolutely flattest model was constructed in each example in this section and a misfit of $\chi^2 = 50$ (tolerance $t_{\chi^2} = 0.1$) and maximum model change of $\epsilon_d = 0.01$ were required for convergence.
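Synthetic responses for a layered model of this kind can be generated with the standard upward impedance recursion for a layered halfspace. The sketch below is an assumption-laden illustration: it is not the Wannamaker, Stodt and Rijo routines used in the algorithm, it returns apparent conductivity and phase rather than the real and imaginary parts of the response $R$ defined in Chapter 2, and the function name and time convention are choices made here. The layer values are those listed for the true model in Table 3.1.

```python
import numpy as np

MU0 = 4.0e-7 * np.pi   # magnetic permeability of free space (H/m)

def mt_response(periods, thicknesses, conductivities):
    """Apparent conductivity (S/m) and impedance phase (degrees) at the surface.

    conductivities has one more entry than thicknesses; the last entry
    is the underlying uniform halfspace.
    """
    omega = 2.0 * np.pi / np.asarray(periods, dtype=float)
    sigma = np.asarray(conductivities, dtype=float)
    k = np.sqrt(1j * omega[:, None] * MU0 * sigma[None, :])   # propagation constants
    z_layer = 1j * omega[:, None] * MU0 / k                   # intrinsic impedances
    Z = z_layer[:, -1].copy()                                 # start in the halfspace
    for j in range(len(thicknesses) - 1, -1, -1):
        decay = np.exp(-2.0 * k[:, j] * thicknesses[j])
        tanh_kh = (1.0 - decay) / (1.0 + decay)               # overflow-safe tanh(k h)
        Z = z_layer[:, j] * (Z + z_layer[:, j] * tanh_kh) / (z_layer[:, j] + Z * tanh_kh)
    sigma_a = omega * MU0 / np.abs(Z) ** 2
    phase = np.degrees(np.angle(Z))
    return sigma_a, phase

# 25 periods equally spaced in log time from 0.0025 to 250 s.
periods = np.logspace(np.log10(0.0025), np.log10(250.0), 25)
true_thicknesses = [600.0, 1400.0, 4000.0, 4000.0]            # m
true_conductivities = [0.004, 0.04, 0.01, 0.1, 0.04]          # S/m
sigma_a_true, phase_true = mt_response(periods, true_thicknesses, true_conductivities)

# Halfspace starting model of 0.02 S/m: constant apparent conductivity and phase.
sigma_a_half, phase_half = mt_response(periods, [], [0.02])
```

For the halfspace the recursion loop is never entered, so the apparent conductivity equals the halfspace conductivity and the phase is 45 degrees at every period.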
In each case the starting model was taken to be a halfspace of conductivity 0.02 S/m. The effects of introducing noise into the synthetic data and inverting field measurements are considered subsequently.

Table 3.1 The true conductivity model for the synthetic test case.

Depth (m)          Conductivity (S/m)
0 - 600            0.004
600 - 2 000        0.04
2 000 - 6 000      0.01
6 000 - 10 000     0.1
> 10 000           0.04

The first example illustrates the convergence properties of the algorithm when the gradient norm (3.2.26) is minimized with model $m(z) = \sigma(z)$, depth function $f(z) = z$ and a weighting $w(z) = 1$. The constructed models and predicted responses at each iteration are shown in Fig. 3.2 and the corresponding values of $\chi^2$, $\|m'\|_2$ and $\epsilon$ are given in Table 3.2. The target misfit $\chi_t^2$ for each iteration was chosen according to (3.3.6) with $P = 10$ and the size of the model change between iterations was limited according to (3.3.7) with $D = 10$.

Figure 3.2 The sequence of models produced in the inversion for the $l_2$ flattest model with $m(z) = \sigma(z)$, $f(z) = z$ and $w(z) = 1$. (a), (b) and (c) show the starting model and the models constructed in iterations 1 and 2, respectively (solid line); the true model is also indicated (dashed line). The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right. The values of $\chi^2$, $\|m'\|_2$ and $\epsilon$ for each iteration are summarized in Table 3.2.

Figure 3.2 (cont'd) (d), (e) and (f) show the models constructed in iterations 3, 4 and 6, respectively (solid line); the true model is also indicated (dashed line). The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right.

Table 3.2 Summary of model attributes at each iteration for the inversion shown in Fig. 3.2.

Iteration    Misfit      Derivative norm    Model change
number       $\chi^2$    $\|m'\|_2$         $\epsilon$
0            76 700      0
1            6 630       4.45 x 10^-4       0.663
2            1 290       9.52 x 10^-4       0.420
3            169         1.87 x 10^-3       0.284
4            53.9        2.19 x 10^-3       0.106
5            52.8        2.20 x 10^-3       0.0353
6            50.0        2.22 x 10^-3       0.00592

Figure 3.2(a) shows the halfspace starting model; the true model is indicated by a dashed line. The two plots to the right of Fig. 3.2(a) compare the observed responses (squares with error bars) to the responses computed for the starting model (solid lines). Although real and imaginary parts of $R$ are used as responses in the computations, the more standard representation in terms of apparent conductivity $\sigma_a$ and phase $\phi$ is displayed. For a halfspace starting model $\sigma_a$ and $\phi$ are constants and result in a very poor fit to the data ($\chi^2 = 76700$). By requiring an improved fit to the data at each iteration, structure is gradually introduced into the model. The model produced by the first iteration, shown in Fig. 3.2(b), has approximately the correct surface value and general increase in conductivity with depth; likewise, the predicted responses reproduce the general trend of the true data but none of the detailed structure.
Structure is gradually introduced into the constructed model and predicted responses in iterations 2, 3 and 4, shown in Fig. 3.2(c), (d) and (e). By iteration 4 the constructed model essentially reproduces the true data ($\chi^2 = 53.9$); however, several more iterations are required to precisely achieve the desired misfit and verify that the solution has stabilized. The final model, achieved in iteration 6 and shown in Fig. 3.2(f), has a misfit of $\chi^2 = 50.0$ and represents a model change of only $\epsilon = 0.006$ from the previous iteration. Only one additional forward model computation was required in the final iteration to precisely achieve the desired misfit value. The model solution clearly indicates the five layers of the true model. However, since minimizing the $l_2$ norm of the gradient discriminates against large abrupt changes in the model, the conductivity changes in a smooth, continuous manner with depth and the structural changes are represented in terms of gradual gradients rather than discontinuous layers.

Figure 3.2(f) represents the minimum-structure model (where structure is measured according to (3.2.26) with the given choices of $m$, $f$ and $w$); the only features exhibited are those required to fit the data. In practice, it is difficult to verify that this solution truly represents a global (rather than local) minimum for the structural norm. One method of investigating this is to repeat the inversion with different starting models. The algorithm has been initiated from a variety of diverse starting models with no difference in the final solution. Although this does not prove that a global minimum has been found, it does provide some confidence that the final solution is independent of the starting model. In contrast, the models constructed by Oldenburg (1979) when the linearized problem is formulated in terms of a model perturbation are clearly dependent on the starting model.
The question of convergence is considered in more detail in Chapter 6, which presents another method for validating the minimization.

Since the resolution of MT responses generally decreases logarithmically with depth, it is often appropriate to consider the gradient norm (3.2.26) with the depth function $f(z) = \log(z + z_0)$ as described in Section 3.2.3. Figure 3.3(a) shows the model constructed by minimizing this norm in the inversion algorithm (solid line); the model produced when $f(z) = z$ is included for comparison (dotted line). In both cases $m(z) = \sigma(z)$, $w(z) = 1$ and the constructed models have a misfit of $\chi^2 = 50.0$. Since the logarithmic depth function is implemented by including a weighting of $(z + z_0)^{1/2}$, minimizing the logarithmic gradient results in more structure at shallow depths and less structure at large depths than minimizing the linear gradient, as is evident in Fig. 3.3(a). This requires that the low frequency responses are fit more closely and the high frequency responses fit less closely when the logarithmic-gradient norm is used compared to the linear-gradient norm. Smith & Booker (1988) maintain that it is preferable to fit the responses at all frequencies equally well (a white fit) and that using $f(z) = \log(z + z_0)$ can accomplish this, particularly when the model is taken to be $m(z) = \log \sigma(z)$. The fit to the data for the minimum logarithmic-gradient model is shown in Fig. 3.3(b) and (c).

In many cases recovering the conductivity over a number of orders of magnitude is of interest; in this case $m(z) = \log \sigma(z)$ is the appropriate choice of model. Figure 3.4(a) shows the model constructed by minimizing the norm of $d[\log \sigma] / d[\log(z + z_0)]$; the result of minimizing $d\sigma / d[\log(z + z_0)]$ is also included for comparison (note that conductivity is plotted on a logarithmic scale).
In regions of high conductivity, using $\log \sigma$ as model results in slightly more structure than using $\sigma$; however, in the low conductivity regions $\log \sigma$ as model results in a significant reduction of structure and improvement in the recovery of the true conductivity. The fit to the data for $m(z) = \log \sigma(z)$ is shown in Fig. 3.4(b) and (c). In many applications $m(z) = \log \sigma(z)$ and $f(z) = \log(z + z_0)$ is the practical choice.

Figure 3.3 The flattest model for $m(z) = \sigma(z)$, $f(z) = \log(z + z_0)$ and $w(z) = 1$. (a) shows the constructed model (solid line) and the true model (dashed line); the solution for $f(z) = z$ is also included for comparison (dotted line). (b) and (c) show the true data (squares with error bars) and the predicted responses (solid line). The misfit of the constructed model is $\chi^2 = 50.0$.

Figure 3.4 The $l_2$ flattest model for $m(z) = \log \sigma(z)$, $f(z) = \log(z + z_0)$ and $w(z) = 1$. (a) shows the constructed model (solid line) and the true model (dashed line); the solution for $m(z) = \sigma(z)$ is also included for comparison (dotted line). (b) and (c) show the true data (squares with error bars) and the predicted responses (solid line). The misfit of the constructed model is $\chi^2 = 50.0$.

The derivation of the $l_2$ model norm solution in Section 3.3 includes an arbitrary weighting function $w(z)$ which determines how strongly the norm minimization is applied to various depth regions. This weighting function provides additional flexibility in the model construction.
For example, the $l_2$ flattest model formulation generally produces smooth models with structural changes represented by gradual gradients; however, by an appropriate choice of $w(z)$, these changes can be confined to localized regions. Figure 3.5 shows the flattest model constructed with $m(z) = \log \sigma(z)$, $f(z) = \log(z + z_0)$ and the weighting function $w(z)$ set to unity everywhere except for narrow regions three partition elements wide centred at each of the depths where the conductivity of the true model changes. These three elements were given weights of 0.1, 0.05 and 0.1. The structural changes in the constructed model, shown in Fig. 3.5(a), occur predominately in the regions where the weighting is small; outside of these regions the model is essentially constant. By comparing Fig. 3.5(a) with Fig. 3.4(a) it is clear that this choice of weighting function results in a constructed solution which more closely resembles a layered model. Of course, in many practical cases an appropriate weighting function may not be so readily evident; however, Chapter 4 presents a new method of model construction which produces layered-type solutions and requires no a priori decisions about where to permit conductivity changes.

In the examples presented so far, accurate responses have been inverted to produce the constructed models. Allowing errors on the responses will always degrade the solution, but by requiring a fit to the data appropriate to the inherent uncertainties according to the $\chi^2$ criterion, the errors should not introduce false structure into the constructed models. Figure 3.6 shows constructed models and their corresponding fit to the data for three levels of error. In each inversion $m(z) = \log \sigma(z)$, $f(z) = \log(z + z_0)$ and $w(z) = 1$, and all solutions have a misfit of $\chi^2 = 50.0$. In Fig.
3.6(a) each response (real and imaginary part of $R$) has been contaminated by the addition of a random error drawn from a zero-mean, Gaussian distribution with a standard deviation of 2 percent of the accurate response value. The constructed model is similar (but not identical) to the model shown in Fig. 3.4(a) where a 2 percent uncertainty in the data was assumed but accurate responses were inverted.

Figure 3.5 The $l_2$ flattest model for $m(z) = \log \sigma(z)$, $f(z) = \log(z + z_0)$ and a weighting function $w(z)$ chosen to allow structural variations in narrow zones. (a) shows the constructed model (solid line) and the true model (dashed line). (b) and (c) show the true data (squares with error bars) and the predicted responses (solid line). The misfit of the constructed model is $\chi^2 = 50.0$.

Figure 3.6 The effect of data errors: $l_2$ flattest models for $m(z) = \log \sigma(z)$, $f(z) = \log(z + z_0)$ and $w(z) = 1$ are shown in (a), (b) and (c) when the responses are contaminated by Gaussian errors of 2, 10 and 30 percent, respectively. The true model is also indicated (dashed line). The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right. The misfit of the constructed model is $\chi^2 = 50.0$.

In Fig. 3.6(b) the standard deviation of the error distributions is 10 percent of the response value. The constructed model has considerably less structure but there are still indications of the features of the true model. Figure 3.6(c) shows the results for 30 percent error: the constructed model exhibits only a general increase in conductivity with depth and there is no indication of the underlying layered structure.
Likewise, the predicted data exhibit no detailed features as only a simple structure is required to adequately fit the true data. For 50 percent errors (not shown), the constructed model is a halfspace and the predicted apparent conductivity and phase are constants; at this level of uncertainty the true responses have essentially no resolving power.

In order to formulate the $l_2$ flattest model solution, the surface value for the model must be specified, as described in Section 3.2.4. The standard approach is simply to estimate this value (e.g. Oldenburg 1984); however, the surface value which results in the absolutely flattest model may be determined in the manner described in Appendix A. Figures 3.2-3.6 show examples of absolutely flattest models. Figure 3.7 compares the flattest models computed for an accurate and an inaccurate estimate of the surface conductivity with the absolutely flattest model. In this example $m(z) = \log \sigma(z)$, $f(z) = \log(z + z_0)$, $w(z) = 1$ and the responses were contaminated with 4 percent Gaussian noise.

Figure 3.7(a) shows the absolutely flattest model solution (note that in this figure the conductivity is plotted to $10^0$ m so that the surface partition elements are shown). The optimum surface conductivity value is found to be 0.0055 S/m and the $l_2$ norm of the model gradient is 0.0534. The constructed model is truly flat near the surface: there is no change in the conductivity over the first few partition elements (the partition length is 5 m to a depth of $10^2$ m and then increases logarithmically below this depth). The flattest model constructed when the true surface conductivity value of 0.004 S/m is specified is shown in Fig. 3.7(b). If this value is known accurately it is valuable to include it in the solution; the model shown in Fig. 3.7(b) is slightly closer to the true model to a depth of about $10^2$ m, and below this depth they are essentially identical. However, the flattest model shown in Fig.
3.7(b) is not truly flat near the surface since the conductivity changes slightly at the first partition boundary at 5 m depth. The $l_2$ norm of the model gradient is 0.0550, slightly larger than the norm associated with the absolutely flattest model. The fit of the predicted to true responses is shown in the panels to the right of the constructed models. The models shown in Fig. 3.7(a) and (b) both have a misfit of $\chi^2 = 50.0$.

Figure 3.7 Flattest and absolutely flattest models for $m(z) = \log\sigma(z)$, $f(z) = \log(z + z_0)$, $w(z) = 1$ and responses contaminated with 4 percent Gaussian noise. (a) shows the absolutely flattest model, (b) the flattest model when the true surface conductivity of 0.004 S/m is specified, and (c) the flattest model when an inaccurate surface conductivity of 0.04 S/m is specified. The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right. The misfit of the models in (a) and (b) is $\chi^2 = 50.0$. The model in (c) represents the best fit that could be obtained for the specified surface value; the misfit for this model is $\chi^2 = 56.5$.

Figure 3.7(c) shows the flattest model constructed when an inaccurate surface value of 0.04 S/m is specified. Prescribing this surface value introduces false structure at shallow depths: it not only results in a false high conductivity region near the surface, but a spurious low conductivity region at about $10^2$ m is introduced to counter this near-surface feature. Furthermore, since the high frequency responses which constrain the shallow structure cannot be fit accurately, the low frequency responses are over-fit in an attempt to obtain an acceptable $\chi^2$ misfit.
However, over-fitting the low frequency responses introduces false structure at depth which is simply an artifact of the noise on the data. The constructed model shown in Fig. 3.7(c) exhibits shallow and deep structure not required by the data and the $l_2$ norm of the model gradient is 0.283, more than five times that of the absolutely flattest model. In addition, because of the inaccurate surface value, the best fit to the data that could be obtained with the linearized inversion algorithm was $\chi^2 = 56.5$. This example demonstrates that when a surface conductivity value cannot be reliably estimated, the absolutely flattest model is a valuable alternative.

As a final example of the $l_2$ flattest model inversion, a set of wide-band MT field data measured near Kootenay Lake in southeastern British Columbia, Canada, are inverted. The data were collected by PHOENIX Geophysics Ltd. (Toronto) for Jones et al. (1988) as part of the LITHOPROBE Southern Cordilleran transect, and a preliminary analysis has been presented by Jones et al. The analysis of data from this study is also appropriate in this thesis as the author spent several weeks in the field assisting in the data collection procedure. The responses were measured at 34 periods and are shown as apparent conductivities and phases in Fig. 3.8. The actual data set inverted consists of amplitudes and phases of the response $R$ computed from the determinant averages of the impedance tensor (e.g. Ranganayaki 1984). The determinant average is a rotationally-invariant parameter and its use avoids problems with identification of the electrical strike (Park & Livelybrooks 1989).
The uncertainties associated with the computed $|R|$ and $\phi$ values were determined by a straightforward numerical simulation procedure (Whittall 1987). Assuming that the errors in the real and imaginary parts of the determinant-average impedances $Z(\omega)$ are independent and Gaussian with zero mean, a large number of noisy $Z$ values are generated to form an (approximate) Gaussian distribution centred on the observed impedance value and having its measured standard deviation. Each (complex) noisy impedance is transformed to a value for $|R|$ and $\phi$ and the standard deviations of the corresponding distributions are determined statistically. This procedure appears to be satisfactory as the computed $|R|$ and $\phi$ distributions are also (approximately) Gaussian, and transforming the computed mean and standard deviations for $|R|$ and $\phi$ back to $Z$ using the same procedure essentially reproduces the original value.

Figure 3.8 $l_2$ flattest models and MT responses observed in southeastern British Columbia, Canada. The model solutions are for $m(z) = \log\sigma(z)$, $f(z) = \log(z + z_0)$ and $w(z) = 1$. (a), (b) and (c) show the models produced in iterations 7, 9 and 12 with misfits of $\chi^2 = 267$, 244 and 219, respectively. The true data (squares with error bars) and the predicted responses (solid lines) are shown in the panels to the right.

Jones et al. (1988) also processed the measured impedances and noted that the data set they arrived at was (with the exception of the longest period response) consistent with the response of a 1-D model according to the criterion of Parker (1980). However, the procedure outlined above results in somewhat smaller uncertainty estimates, especially for the outlying data points, than those presented by Jones et al. (1988).
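The simulation procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of Whittall (1987): the function name `amplitude_phase_errors`, the `to_response` argument (a placeholder for the transformation from $Z$ to $R$, taken here to be the identity), the number of draws and the example impedance are all hypothetical.

```python
import cmath
import random
import statistics


def amplitude_phase_errors(z_obs, z_sd, to_response=lambda z: z,
                           n_draws=20000, seed=0):
    """Monte Carlo estimate of the mean and standard deviation of the
    amplitude and phase of a response derived from a noisy impedance.

    z_obs       : observed complex impedance
    z_sd        : standard deviation of its real and imaginary parts
    to_response : hypothetical map from Z to the response R (identity here)
    """
    rng = random.Random(seed)
    amplitudes, phases = [], []
    for _ in range(n_draws):
        # Independent zero-mean Gaussian errors on the real and imaginary parts.
        z_noisy = complex(rng.gauss(z_obs.real, z_sd),
                          rng.gauss(z_obs.imag, z_sd))
        r = to_response(z_noisy)
        amplitudes.append(abs(r))
        # Phase in radians; assumes the draws stay away from the +/-pi branch cut.
        phases.append(cmath.phase(r))
    return {
        "amp_mean": statistics.mean(amplitudes),
        "amp_sd": statistics.stdev(amplitudes),
        "phase_mean": statistics.mean(phases),
        "phase_sd": statistics.stdev(phases),
    }


# Illustrative impedance only: |Z| = 5 with a 1 percent error on each part.
stats = amplitude_phase_errors(complex(3.0, 4.0), 0.05)
print(stats)
```

For small relative errors the amplitude standard deviation should be close to the standard deviation of the impedance parts, and the phase standard deviation close to that value divided by the amplitude, which offers a quick check on any such simulation.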
Our set of responses and uncertainty estimates is not strictly consistent with a 1-D model since the $D^+$ best-fitting solution (Parker 1980) has a misfit of $\chi^2 = 199$ for 68 data. This is likely due to an under-estimation of the errors associated with the measured impedances. Nonetheless, the $l_2$ flattest model inversion was carried out with the desired misfit set to $\chi^2 = 199$. Although it is unlikely that any finite 1-D model can achieve this misfit, the algorithm should construct the flattest model which (approximately) achieves the least possible misfit. The starting model for the inversion was taken to be a halfspace of conductivity 0.01 S/m; the misfit for this starting model is $\chi^2 = 4.09 \times 10^5$.

Figure 3.8 shows $l_2$ minimum-structure models constructed with $m(z) = \log\sigma(z)$, $f(z) = \log(z + z_0)$ and $w(z) = 1$. Since the surface conductivity value was not known in this practical example, solving for the absolutely flattest model was required. Figure 3.8(a) shows the model constructed at iteration 7 of the inversion. This model has a misfit of $\chi^2 = 267$ and a derivative norm of $\|m'\|_2 = 0.0904$. The solution is in good agreement with the 1-D models constructed by Jones et al. (1988) using the Occam's inversion algorithm of Constable et al. (1987) and the best-fitting 1-D layered model of Fisher & Le Quang (1981). The model constructed at iteration 9, shown in Fig. 3.8(b), has a misfit of $\chi^2 = 244$ and a norm of $\|m'\|_2 = 0.137$. The best-fitting flattest model that could be constructed with the inversion algorithm is shown in Fig. 3.8(c). This model was produced at iteration 12 and is similar to those shown in Fig. 3.8(a) and (b) except that the features are more pronounced as more structure is required to reduce the misfit. The model has a misfit of $\chi^2 = 219$, only about 10 percent larger than the $D^+$ model misfit of 199, and a derivative norm of $\|m'\|_2 = 0.198$.
The inversion and appraisal of this data set will be considered further in Chapters 4 and 5.

3.4.2 Smallest-deviatoric model construction

In cases when an a priori estimate of the model structure exists, it is often useful to construct the model which fits the measured MT responses but deviates by a minimal amount from this base model. As an example, consider the synthetic test case of Section 3.4.1 and assume that the true model is known to a depth of 2000 m from, perhaps, a well log or previous geophysical study. In this case it would be reasonable to construct models which match this known structure by using the smallest-deviatoric model formulation. Figure 3.9 shows the results of this procedure for the model $m(z) = \sigma(z)$. The base model, shown by the dotted line, consists of the true model structure to a depth of 2000 m; below this depth the conductivity is held constant. In Fig. 3.9(a) the weighting function is set to unity. The constructed smallest-deviatoric model is generally in good agreement with the base model to a depth of about 5000 m, which shows that the base model is consistent with the observed responses to this depth. Below about 5000 m, however, the data require additional structure in the model: two higher conductivity layers are clearly indicated. The gradual decrease in the conductivity towards the base model value at large depths results from the limited information content of the data at these depths due to the decay of the EM fields. The constructed model shown was obtained for a variety of different starting models, which provides some confidence that a global minimum for the model deviation has been found.

Figure 3.9 The $l_2$ smallest-deviatoric model for $m(z) = \sigma(z)$.
In (a) and (b) the base model is indicated by the dotted line, the true model by the dashed line. (a) shows the smallest-deviatoric model solution (solid line) for a uniform weighting function; (b) shows the smallest-deviatoric model when $w(z) = 1$ for $z < 2000$ m and $w(z) = 0.2$ for $z > 2000$ m. The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right. The misfit of the constructed model is $\chi^2 = 50.0$.

Since the base model is known reliably over only a certain depth range in this example, it would be reasonable to include a weighting function in the model-deviation norm (3.3.23) which reflects this limitation. Figure 3.9(b) shows the smallest-deviatoric model constructed when a weighting function of $w(z) = 1$ for $0 < z < 2000$ m and $w(z) = 0.2$ for $z > 2000$ m was included. In this case the base model is reproduced almost exactly in the region that is strongly weighted. Outside this region, the data require additional structure in a manner similar to Fig. 3.9(a). The fits to the true data for the models given in Fig. 3.9(a) and (b) are shown to the right of each plot.

The smallest-deviatoric model formulation may also be used to perform an approximate appraisal of model features by an appropriate choice of base model and weighting function (e.g. Whittall & Oldenburg 1990). As an example, consider appraising the region of high conductivity centred at about 8000 m depth indicated by the flattest $\log\sigma$ model shown in Fig. 3.4(a). Figure 3.10 shows the results of a weighted model-deviation norm appraisal of this conductive feature. In Fig. 3.10(a) the base model (dotted line) was taken to be identical to the flattest model solution at all depths except for the conductive zone $6000 < z < 10\,000$ m in depth. In this region the base model was assigned a conductivity of 0.04 S/m, a value significantly lower than that indicated by the flattest model.
The weighting function was chosen to be $w(z) = 1$ for $6000 < z < 10\,000$ m and $w(z) = 0.2$ for all other depths. This ensures that the model-deviation norm is minimized most effectively over the conductive zone, which approximates minimizing the conductivity in this region. The solid line in Fig. 3.10(a) shows the constructed smallest-deviatoric model. This solution indicates that a model with an average conductivity over the apparent high conductivity region of only 0.046 S/m is not inconsistent with the data. However, the highly conductive structures at either edge of this zone indicate that the data still require some type of conductive feature near this region. Figure 3.10(b) shows a similar analysis which attempts to maximize the conductivity over the region 6000-10 000 m in depth. The constructed solution indicates that an average conductivity over the conductive zone as high as 0.13 S/m is consistent with the data. In order to achieve this, however, the constructed model exhibits regions of low conductivity at either edge of the conductive zone.

Figure 3.10 Approximate appraisal using the weighted model-deviation norm. In (a) the base model (dotted line) has a conductivity of 0.04 S/m coinciding with the high conductivity zone of the true model (dashed line). The weighting function $w(z)$ was chosen to emphasize this region. The smallest-deviatoric model (solid line) indicates that an average conductivity of only 0.046 S/m is consistent with the data. A similar analysis is shown in (b) where the base model has a conductivity of 0.2 S/m over the conductive zone. The smallest-deviatoric model has an average conductivity of 0.13 S/m in this region. The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right. The misfit of the constructed model is $\chi^2 = 50.0$.
The constructed models shown in Fig. 3.10(a) and (b) both have a misfit of $\chi^2 = 50.0$; the fits to the data are shown to the right of the model solutions. Of course, the constructed models shown in Fig. 3.10(a) and (b) do not represent a true minimization or maximization for the conductive region since the model-deviation norm still applies to the entire model. They do, however, demonstrate a method of approximate appraisal using model construction that can be used to explore the range of models which fit the data and to test hypotheses about the true model by attempting to construct counter-examples. Chapter 5 describes a method of constructing models using linear programming which may provide true global maxima and minima for localized conductivity averages.

Chapter 4

$l_1$ Model norm construction

4.1 Introduction

In Chapter 3 an iterative algorithm for inverting MT responses was presented based on $l_2$ norm solutions to the linearized problem. This chapter presents an inversion algorithm based on $l_1$ norm solutions, i.e. the objective function that is minimized at each iteration represents an $l_1$ model norm and the observed responses are fitted according to an $l_1$ misfit criterion. Although the $l_1$ misfit does not lend itself as readily to analytic results, it is more robust and less influenced by extreme responses than the $l_2$ norm (Claerbout & Muir 1973; Parker & McNutt 1980). In addition, minimizing an $l_1$ model norm leads to constructed models of significantly different character than those obtained in Chapter 3 using the $l_2$ norm.

Whittall & Oldenburg (1986) and Whittall (1986) present 1-D MT inversion schemes which minimize $l_1$ norms of the impulse response or of the reflectivity coefficients of the model, respectively, in an attempt to limit the model structure. However, the conductivity models constructed in this manner will not minimize any structural measure themselves.
The method of inverting MT responses by applying an $l_1$ norm to the model itself would seem to be new. As in the previous chapter, an inversion algorithm is developed to construct both minimum-structure and smallest-deviatoric models. However, rather than minimizing a norm of the model gradient, the minimum-structure solution developed here minimizes the $l_1$ norm of the model variation. The total variation of a model $m(z)$ may be defined as (Korevaar 1968, p. 406)

$$V(m) = \int_0^\infty |dm(z)| \tag{4.1.1}$$

and represents a measure of the amount of structure of the model. The models constructed by minimizing this norm resemble layered earth models and are complementary to the continuous gradient models produced using $l_2$ norm inversion. The method of minimizing the variation norm was initially developed to constrain the structure of extremal models constructed in an appraisal analysis (see Chapter 5) and has been found to be a very practical and useful formulation (Dosso & Oldenburg 1989) that has been adopted by a number of authors (e.g. Whittall & MacKay 1989; Oldenburg & Ellis 1990). The smallest-deviatoric model construction is developed in a similar manner and an algorithm is presented which minimizes a combination of both the variation of the model with respect to depth and the deviation from an arbitrary base model.

The MT inversion algorithm presented in this chapter is based on successive solutions of the linearized inverse problem. The linearized problem is solved at each iteration using linear programming methods (e.g. Gass 1975). Linear programming (LP) can be used to solve for a set of parameters which minimize (or maximize) a linear objective function subject to equality or inequality constraints on linear combinations of the parameters.
The LP formulation is very flexible: any physical information about the model that can be written as a linear constraint can be included and different choices of the objective function allow a variety of models to be constructed. The next section describes the general formulation and solution of the linear problem for both minimum-variation and smallest-deviatoric models.

4.2 Linear inversion

4.2.1 Minimum-variation model

To formulate model construction as a linear programming problem, the model must be discretized and the model elements treated as LP parameters. As before, let

$$m(z) = m_i, \quad z_{i-1} < z < z_i, \quad i = 1, \ldots, M. \tag{4.2.1}$$

In discrete form, the $l_1$ norm of the model variation (4.1.1) can be expressed as

$$V(m) = \sum_{i=1}^{M-1} |m_{i+1} - m_i|. \tag{4.2.2}$$

The total variation is a measure of the amount of structure of the model; the goal is to construct an acceptable model which minimizes this quantity. Unfortunately, due to the absolute value function, the expression given in (4.2.2) is not in a linear form that can be minimized using LP. However, a suitable objective function can be derived by introducing $2(M-1)$ new (non-negative) LP parameters $\{p_i, q_i,\ i = 1, \ldots, M-1\}$ which are constrained to be equal to the $M-1$ model changes according to

$$m_{i+1} - m_i = p_i - q_i, \quad p_i, q_i \ge 0, \quad i = 1, \ldots, M-1. \tag{4.2.3}$$

It follows that $|m_{i+1} - m_i| \le p_i + q_i$ (with equality holding if either $p_i$ or $q_i$ is zero), and therefore a bound for the total variation is given by

$$V(m) \le \sum_{i=1}^{M-1} (p_i + q_i). \tag{4.2.4}$$

It is straightforward to establish that minimizing an objective function given by

$$\Phi = \sum_{i=1}^{M-1} (p_i + q_i) \tag{4.2.5}$$

effectively minimizes the variation of the model as follows. Assume that a set of parameters $\{m_i, p_i, q_i\}$ are found which minimize the objective function (4.2.5) subject to the constraints (4.2.3). Then one (or both) of $p_i$ or $q_i$ must be zero for each $i$ or $\Phi$ is not a minimum.
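The argument above can be checked numerically with a short Python sketch. It is only an illustration of the split-variable device, not an LP solver and not the code used in this work: the function names and the example model values are hypothetical, and the model is given in arbitrary units chosen so that the arithmetic is exact.

```python
def total_variation(model):
    """Discrete total variation: sum of |m[i+1] - m[i]|."""
    return sum(abs(b - a) for a, b in zip(model, model[1:]))


def split_changes(model):
    """Split each model change into non-negative parts p and q with
    m[i+1] - m[i] = p[i] - q[i] and at most one of p[i], q[i] non-zero."""
    p, q = [], []
    for a, b in zip(model, model[1:]):
        change = b - a
        p.append(max(0.0, change))
        q.append(max(0.0, -change))
    return p, q


def objective(p, q):
    """Linear objective: sum of (p[i] + q[i])."""
    return sum(p) + sum(q)


def is_feasible(model, p, q):
    """Check the equality constraints and the non-negativity of p and q."""
    changes = [b - a for a, b in zip(model, model[1:])]
    return all(pi >= 0 and qi >= 0 and pi - qi == c
               for pi, qi, c in zip(p, q, changes))


# Hypothetical five-element model (arbitrary units).
model = [2.0, 2.0, 5.0, 5.0, 1.0]
p, q = split_changes(model)

# A second feasible choice: add the same amount to both members of each pair.
p_shifted = [pi + 1.0 for pi in p]
q_shifted = [qi + 1.0 for qi in q]

print(total_variation(model), objective(p, q), objective(p_shifted, q_shifted))
```

Adding the same positive amount to both members of a pair leaves the constraints satisfied but raises the objective, which is why a minimizing solution has at least one zero in every pair.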
In this case (4.2.4) becomes an equality and minimizing $\Phi$ as given by (4.2.5) is equivalent to minimizing the total variation. The objective function given by (4.2.5) is a linear combination of parameters in the form suitable for the LP formulation. It is also straightforward to include a set of arbitrary weights $\{w_i,\ i = 1, \ldots, M-1\}$ in the objective function (4.2.5) to influence how strongly the variation is minimized at various depths.

To construct an acceptable minimum-variation model, the minimization of the objective function is carried out subject to the constraints that the model reproduces the observed responses. With the discretization given by (4.2.1), the data equations (normalized by their standard deviations)

$$e_j = \int_0^\infty g_j(z)\, m(z)\, dz, \quad j = 1, \ldots, N, \tag{4.2.6}$$

can be written

$$e_j = \sum_{i=1}^{M} \gamma_{ji} m_i, \quad j = 1, \ldots, N, \tag{4.2.7}$$

where

$$\gamma_{ji} = \int_{z_{i-1}}^{z_i} g_j(z)\, dz, \quad i = 1, \ldots, M. \tag{4.2.8}$$

Equations (4.2.7) represent the data equations expressed as linear constraints. However, since the responses are generally inaccurate, provision for an acceptable misfit should be included in the constraint equations. This can be accomplished in two ways. In the first method, the data constraints are simply imposed as inequalities

$$e_j - a \le \sum_{i=1}^{M} \gamma_{ji} m_i \le e_j + a, \quad j = 1, \ldots, N, \tag{4.2.9}$$

where $a$ is a constant (usually 1 or 2) which determines how closely the responses are fit. Imposing the constraints in this manner requires that each response must be fit to within $a$ standard deviations; therefore, these are often referred to as 'hard' bounds. The second method constrains the $l_1$ norm of the misfit rather than the misfit of each equation individually. Since this allows some responses to have large misfits while limiting the total misfit, it is often referred to as a 'soft' bound.
To implement this method, the data constraints are written as equalities (Levy & Fullagar 1981)

$$e_j = \sum_{i=1}^{M} \gamma_{ji} m_i + u_j - v_j, \quad u_j, v_j \ge 0, \quad j = 1, \ldots, N, \tag{4.2.10}$$

where $u_j$ and $v_j$ are new LP parameters introduced to represent the misfit to the $j$th data equation. Let $\chi^1$ be the $l_1$ norm of the misfit; then

$$\chi^1 = \sum_{j=1}^{N} \Big| e_j - \sum_{i=1}^{M} \gamma_{ji} m_i \Big| \le \sum_{j=1}^{N} (u_j + v_j). \tag{4.2.11}$$

Parker & McNutt (1980) describe the statistics of the $\chi^1$ distribution. The expected value for $N$ responses is $\sqrt{2/\pi}\,N$; thus, an upper bound for the misfit could be expressed as

$$\sum_{j=1}^{N} (u_j + v_j) \le \sqrt{2/\pi}\,N. \tag{4.2.12}$$

Constraint (4.2.12) represents a bound on the $l_1$ misfit. The actual value of the misfit can be computed from the constructed model. In practice, we have always found that the computed misfit is equal to the applied bound. When the responses are assumed to have statistical uncertainties, it is generally preferable to constrain the total misfit in this manner rather than impose hard bounds on each response (Fullagar 1981; Oldenburg 1983).

The LP problem of constructing the $l_1$ minimum-variation model consists of minimizing the objective function (4.2.5) subject to the variation parameter constraints (4.2.3) and the data constraints expressed by (4.2.9) for hard bounds or by (4.2.10) and (4.2.12) for soft bounds. In addition, it is straightforward to include limits on the model elements or any additional information about the model which can be expressed as a linear constraint in the LP formulation. The flexibility of LP to accommodate additional physical constraints or to minimize different objective functions allows considerable scope to investigate the inverse problem. Whittall (1986) describes two methods of applying localized conductivity constraints in an LP inversion. If reliable physical constraints are available, they can be used to restrict the non-uniqueness of the inverse problem and construct models which are closer to reality.
Alternatively, arbitrary constraints may be used to assess the extent of the non-uniqueness and explore the range of acceptable models.

The use of $l_1$ minimum-structure models in MT inversion as well as the method of constructing $l_1$ minimum-variation models would seem to be new. In the limit of vanishing layer thicknesses it should give similar results to an $l_1$ flattest model, which can be constructed by integrating the data equations by parts to obtain constraints in terms of the model gradient and minimizing the $l_1$ norm of the gradient using LP (e.g. Oldenburg 1984). The minimum-variation formulation, however, does not require integration of the data equations and does not require a known model endpoint value. Also, in this formulation the depth function $f(z)$ is essentially controlled by the choice of the partition width as a function of depth and need not be explicitly introduced. For MT inversion a logarithmic depth partitioning is the appropriate choice. The minimum-variation formulation described here would seem to be an $l_1$ analogue of Constable et al.'s (1987) Occam's inversion, which minimizes the $l_2$ norm of the variation (which they refer to as the model roughness).

Formulating minimum-structure inversion in terms of an $l_1$ norm offers benefits in addition to the flexibility of the LP algorithm. As noted previously, since the $l_2$ norm of the model gradient or variation discriminates strongly against large or abrupt changes in the model, minimizing this norm generally produces smoothly varying models which represent structural changes by continuous gradients. It is important to recognize that this form is due to the inversion procedure: the true model may or may not involve such gradients and the observed responses do not demand them.
In contrast, minimizing the $l_1$ norm of the variation does not discriminate against abrupt changes, but rather produces a minimum-structure model which more closely resembles a layered Earth with structural variations occurring at distinct depths. Thus, the $l_1$ and $l_2$ inversions offer complementary representations of the Earth in terms of gradient or layered models; in practice, a complete interpretation should consider both.

4.2.2 Smallest-deviatoric model

It is straightforward to modify the minimum-variation formulation described in the previous section to construct the model which minimizes the $l_1$ norm of the deviation from a given base model $m_B$:

$$\sum_{i=1}^{M} |m_i - m_{Bi}|. \tag{4.2.14}$$

The data constraints can be included in the LP formulation as described in Section 4.2.1; however, the variation parameters are replaced by $2M$ new LP parameters $\{r_i, t_i,\ i = 1, \ldots, M\}$ which are constrained to be equal to the $M$ deviations according to

$$m_i - m_{Bi} = r_i - t_i, \quad r_i, t_i \ge 0, \quad i = 1, \ldots, M, \tag{4.2.15}$$

and the objective function to be minimized is given by

$$\Phi = \sum_{i=1}^{M} (r_i + t_i). \tag{4.2.16}$$

This approach differs from the standard method of writing the data equations in terms of the model deviation $\Delta m$ and solving for the smallest acceptable deviation, as described in Section 3.2.3. The advantage of the new approach is that it is straightforward to formulate the model construction to minimize an objective function which combines both the variation and deviation according to

$$\Phi = \theta \sum_{i=1}^{M-1} w_i (p_i + q_i) + (1 - \theta) \sum_{i=1}^{M} \hat{w}_i (r_i + t_i), \tag{4.2.17}$$

where $\theta$ is a parameter which determines the trade-off between minimizing the variation and the deviation and $w_i$ and $\hat{w}_i$ represent arbitrary weighting functions. In the LP formulation there is no difficulty in setting some of the weights $w_i$ or $\hat{w}_i$ to zero; this differs from the $l_2$ formulation where the weights must be non-zero.

4.3 The linearized inversion algorithm

This section describes an iterative inversion algorithm for the non-linear MT problem based on successive LP solutions to the corresponding linearized problem.
The method is similar in many respects to the $l_2$ inversion algorithm described in Section 3.3 and therefore will only be described briefly. At each iteration a model solution $m(z)$, representing either $\sigma(z)$ or $\log\sigma(z)$, is sought to the linearized equations (3.3.2). By introducing a depth partitioning (4.2.1) and specifying a starting model $m_0(z)$, the kernel functions may be computed and integrated and the LP problem posed for the minimum-variation or smallest-deviatoric model as described in Section 4.2. This problem is solved using the exceptionally powerful and flexible LP algorithm XMP (Experimental Mathematical Programming library) developed by R. E. Marsten (1981).

The convergence criteria for the inversion algorithm are that the $l_1$ misfit of the model, $\chi^1$ (4.3.1), must be within a tolerance $t_{\chi^1}$ of the desired misfit $\chi^1_d$ and that the total change in the model between successive iterations given by (3.3.5) must be less than $\epsilon_d$. In our algorithm $t_{\chi^1}$, $\chi^1_d$ and $\epsilon_d$ are parameters that are defined by the user. In practice, common values for these parameters are $\chi^1_d = \sqrt{2/\pi}\,2N$ (for complex responses at $N$ frequencies), $t_{\chi^1} = 0.1$ and $\epsilon_d = 0.01$.

In order to ensure that the linearization is valid at each iteration, it is important to control the change in the model at each iteration. To accomplish this, the target misfit value at the $k$th iteration, $\chi^1_{d,k}$, is taken to be some fraction of the misfit of the previous iteration unless this value is less than $\chi^1_d$:

$$\chi^1_{d,k} = \max\left(\chi^1_{k-1}/P,\ \chi^1_d\right), \tag{4.3.2}$$

where $P$ is usually taken to be between 2 and 5. In addition to choosing target misfits in this manner, the size of changes in the conductivity between successive iterations is controlled by imposing an LP constraint on the $i$th partition element at the $k$th iteration according to

$$\sigma_{i,k-1}/D \le \sigma_{i,k} \le D\,\sigma_{i,k-1}, \quad i = 1, \ldots, M, \tag{4.3.3}$$

where $D$ is usually between 2 and 10. Note that the constraints (4.3.3) are included in the LP formulation, not simply imposed when the model is updated as was the case in the $l_2$ inversion algorithm.
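The two step-control rules described above can be sketched in Python as follows. This is a hedged illustration only: the function names are hypothetical, the target-misfit rule is written as a simple maximum on the assumption that "some fraction" means division by $P$, and the numerical values are chosen purely for illustration.

```python
def target_misfit(previous_misfit, desired_misfit, p_factor):
    """Target for the next iteration: a fraction 1/P of the previous misfit,
    but never below the desired misfit (assumed form of the rule)."""
    return max(previous_misfit / p_factor, desired_misfit)


def conductivity_bounds(previous_sigma, d_factor):
    """Lower and upper limits on each conductivity element for the next
    iteration: sigma / D <= new sigma <= D * sigma."""
    return [(s / d_factor, s * d_factor) for s in previous_sigma]


# Illustrative values only: a 0.02 S/m halfspace on three partition elements.
sigma_start = [0.02, 0.02, 0.02]

print(target_misfit(13100.0, 40.0, 3.0))  # far from the desired misfit
print(target_misfit(67.7, 40.0, 3.0))     # the fraction would undershoot, so clamp
print(conductivity_bounds(sigma_start, 10.0)[0])
```

Here the bounds are only computed; in the algorithm described above they enter the LP as constraints on each model element, so that every trial solution respects them.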
This represents a significant improvement in stabilizing the inversion.

A useful feature of Marsten's (1981) LP package is that it allows the user to initialize the LP algorithm with an arbitrary basis. The standard procedure in many LP algorithms is to initiate all parameters at their lower bound. We have found that by initializing the LP algorithm with the solution basis from the previous iteration, the LP computation time can be reduced significantly. This is particularly true for the final few iterations which do not change the model greatly but are required to precisely achieve the desired misfit and ensure that the model change $\epsilon$ is acceptably small. The computation time for these iterations can be reduced by an order of magnitude.

4.4 Examples of $l_1$ model norm construction

4.4.1 Minimum-variation model construction

This section presents a number of examples of model construction by inverting both synthetic and measured field responses. The first example illustrates the convergence of the algorithm when the objective function (4.2.17) is minimized with $\theta = 1$ (minimum-variation model), $m(z) = \sigma(z)$ and uniform weighting $w_i = 1$ for the synthetic test case described in Section 3.4.1. The constructed models and predicted responses at each iteration are shown in Fig. 4.1 and the corresponding values of the misfit $\chi^1$, the total variation $\|m_{i+1} - m_i\|_1$ and the total model change $\epsilon$ are given in Table 4.1. Figure 4.1(a) shows the starting model which consists of a halfspace of conductivity 0.02 S/m; the true model is indicated by a dashed line. The two plots to the right compare the observed responses (squares with error bars) to the responses computed for the starting model (solid lines).
The observed data are accurate, but an uncertainty of 2 percent in the real and imaginary parts of the response $R$ is assumed so that the $\chi^1$ statistic can be used to measure the relative fit of the models. Figure 4.1(b)-(f) show the models produced at iterations 1, 2, 3, 4 and 6, respectively. Since the expected value of $\chi^1$ for $N = 25$ complex responses is $\sqrt{2/\pi}\,2N \approx 40$, this value was used as the desired misfit $\chi^1_d$ with a tolerance of $t_{\chi^1} = 0.1$. Also, a model change of $\epsilon < 0.01$ was required for convergence. At each iteration the target misfit was chosen according to (4.3.2) with $P = 3$ and the change in the conductivity of each model element was limited according to (4.3.3) with $D = 10$. Formulating each inversion step in this manner ensures that the linearization holds and that structure is introduced into the constructed models and into the predicted data in a controlled manner, as shown in Fig. 4.1(b)-(f). By iteration 4 the constructed model reproduces the data to approximately the desired level ($\chi^1 = 39.7$); however, two more iterations are required to precisely achieve the desired misfit and verify that the solution has stabilized. The final model, achieved in iteration 6 and shown in Fig. 4.1(f), has a misfit of $\chi^1 = 40.0$ and represents a model change of only $\epsilon = 8.82 \times 10^{-4}$ from the previous iteration.

Figure 4.1 The sequence of models produced in the inversion for the $l_1$ minimum-variation model with $m(z) = \sigma(z)$ and $w_i = 1$. (a), (b) and (c) show the starting model and the models constructed in iterations 1 and 2, respectively (solid line); the true model is also indicated (dashed line).
The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right. The values of $\chi^1$, $\|m_{i+1} - m_i\|_1$ and e for each iteration are summarized in Table 4.1.

Figure 4.1 (cont'd) (d), (e) and (f) show the models constructed in iterations 3, 4 and 6, respectively (solid line); the true model is also indicated (dashed line). The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right.

Table 4.1 Summary of model attributes at each iteration for the inversion shown in Fig. 4.1

Iteration   Misfit      Variation norm            Model change
number      $\chi^1$    $\|m_{i+1} - m_i\|_1$     e
0           13 100      0
1           396         0.0208                    0.530
2           175         0.0524                    0.744
3           67.7        0.113                     0.314
4           39.7        0.134                     0.168
5           40.0        0.133                     0.0513
6           40.0        0.133                     0.000882

Figure 4.2 Computation times on a SUN 4/310 workstation required for the LP solution at each iteration for the inversion shown in Fig. 4.1. The triangles indicate the CPU time required when the LP parameters are initiated at their lower bound; the squares indicate the time required when the LP algorithm is initiated with the solution basis from the previous iteration.

Fortunately, the computation time for the final iterations, which refine the misfit value and ensure that the solution has stabilized, can be significantly reduced by initiating the LP algorithm with the solution basis of the previous iteration.
Figure 4.2 shows the CPU time required for the LP inversion at each iteration on a SUN 4/310 workstation when the LP algorithm is initiated with all parameters at their lower bound (triangles), and when the algorithm is initiated with the solution basis of the previous iteration (squares) from iteration 2 on. When the LP inversion is initiated from the lower bounds, the CPU times vary somewhat but are generally about 80 s. When the inversions are initiated from the previous solution basis, the time required generally decreases with the iteration number, with the final two inversions requiring only about 7 s each. This represents a reduction by more than a factor of 10 over the times required when the inversions are initiated at their lower bounds (for larger problems the reduction can be even more substantial). The total time required by the LP algorithm is about 125 s when the inversions are initiated from the previous solutions and about 460 s when the inversions are initiated from the lower bounds. The model solutions at each iteration are identical regardless of how the LP algorithm is initiated.

Figure 4.3 shows the results of an inversion similar to that of Fig. 4.1 except that the total variation of $m(z) = \log\sigma(z)$ is minimized. The constructed models shown in Fig. 4.1(f) and Fig. 4.3(a) illustrate the very different characteristics of the solutions when they are compared to the corresponding $l_2$ solutions shown in Fig. 3.2(f) and Fig. 3.3(a). Unlike minimizing the $l_2$ structural norm, which discriminates strongly against large, abrupt changes in favour of continuous gradients, minimizing the $l_1$ total variation produces layered-type models with structural variations occurring at distinct depths. In each case the characteristics of the constructed models are a result of the choice of norm that is minimized; both solutions fit the data equally well.
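The contrast between the two norms can be illustrated with a small numerical sketch. It is not taken from the thesis: the two test profiles, the equal depth spacing and the function names are assumptions chosen for illustration, and the $l_2$-type measure used here is simply the sum of squared differences between adjacent elements. A single step and a gradual ramp between the same end values have the same total variation, whereas the squared-difference measure penalizes the step far more heavily, which is consistent with the layered character of the $l_1$ solutions.

```python
import math


def total_variation(model):
    """l1 measure: sum of absolute differences between adjacent elements."""
    return sum(abs(b - a) for a, b in zip(model, model[1:]))


def squared_roughness(model):
    """l2-type measure: sum of squared differences between adjacent elements."""
    return sum((b - a) ** 2 for a, b in zip(model, model[1:]))


# Hypothetical conductivity profiles (S/m) on 11 equal depth elements.
n = 11
low, high = 0.02, 0.10
step_model = [low] * 5 + [high] * 6                                # one abrupt change
ramp_model = [low + (high - low) * i / (n - 1) for i in range(n)]  # gradual change

tv_step = total_variation(step_model)
tv_ramp = total_variation(ramp_model)
l2_step = squared_roughness(step_model)
l2_ramp = squared_roughness(ramp_model)

print(f"total variation:   step = {tv_step:.4f}, ramp = {tv_ramp:.4f}")
print(f"squared roughness: step = {l2_step:.6f}, ramp = {l2_ramp:.6f}")
```

For any monotonic profile the total variation depends only on the end values, so the $l_1$ measure has no preference between the step and the ramp, while the squared measure favours the ramp.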
Thus, the $l_1$ and $l_2$ inversions offer complementary solutions and, in practice, a complete interpretation should consider both.

In cases where a minimum-structure layered model is desired, the $l_1$ minimum-variation model would seem to be an excellent choice. The method does not require a priori knowledge of the number or depths of the layers; the only requirement is that the depth partitioning be sufficiently fine that the solution is essentially independent of the discretization, as described in Section 3.3.2. An advantage of the formulation is that there is no intrinsic limit to the high-frequency information content of the constructed models; in contrast, the $l_2$ formulation, which forms the model as a linear combination of the kernel functions, imposes a finite limit independent of the data (Claerbout & Muir 1973). This feature is crucial to the appraisal analysis using extremal models presented in the next chapter.

Figure 4.3 The $l_1$ minimum-variation model for $m(z) = \log\sigma(z)$ and $w_i = 1$. (a) shows the constructed model (solid line) and the true model (dashed line). (b) and (c) show the true data (squares with error bars) and the predicted responses (solid line). The misfit of the constructed model is $\chi^1 = 40.0$.

As an example of this difference, consider the $l_1$ and $l_2$ minimum-structure models constructed by fitting the (accurate) responses as closely as possible. Figure 4.4(a) shows the best-fitting $l_2$ flattest model that can be constructed. This model is produced whether the inversion algorithm is initiated from a halfspace starting model or the true model. Although the model solution reproduces the observed responses very well ($\chi^2 = 0.242$, $\chi^1 = 2.67$, assuming 2 percent uncertainties in the responses) and is in good agreement with the true model, it is not a perfect solution.
In particular, the constructed model does not exhibit the abrupt discontinuities of the layered model and tends to overshoot the true layer conductivities and oscillate slightly. Also, the misfit to the observed responses, while small, is well above the limit of computational accuracy. This limit on the precision of the solution results because a layered true model simply cannot be reproduced exactly as a linear combination of a finite number of smooth kernel functions. The features of the constructed model are similar to the Gibbs effect observed in attempting to construct a step discontinuity from a finite number of Fourier components.

In contrast, the best-fitting $l_1$ minimum-variation model, shown in Fig. 4.4(b), reproduces the true model almost exactly (given a model partition with depth elements at the discontinuities of the true model). The observed responses are apparently fit to within the limits of computational precision ($\chi^2 = 1.77 \times 10^{-9}$, $\chi^1 = 7.30 \times 10^{-5}$), since an absolute accuracy of $10^{-5}$ is required by the algorithm in integrating the kernel functions according to (4.2.8). The same model is produced whether the inversion algorithm is initiated from a halfspace starting model or the true model. The result that the construction reproduces the true responses accurately is not simply due to the fact that the true model is layered and minimizing the $l_1$ variation norm produces layered models. For instance, the $l_1$ algorithm performs equally well if the true model and data are taken to be a constructed $l_2$ flattest model and its responses.

Figure 4.4 Best-fitting constructed $l_2$ and $l_1$ minimum-structure models (accurate data) are shown in (a) and (b), respectively. The fit to the true data is shown in the panels on the right.
The weighting parameters included in the LP objective function (4.2.17) can be used to influence how strongly the minimization is applied to various regions of the model. For example, the constructed model shown in Fig. 4.3 indicates a layer of constant conductivity extending to a depth of about 600 m, where the conductivity changes abruptly. To investigate whether this constant-conductivity surface layer could extend to 800 m depth, weights $w_i$ can be chosen in the objective function (4.2.17) to discriminate strongly against conductivity changes in this region. Figure 4.5(a) shows the model constructed by minimizing the model variation ($\theta = 1$) with weights $w_i = 5$ for $0 < z < 800$ m and $w_i = 1$ below this depth. The model indicates that it is unlikely that a realistic model with an 800-m surface layer could be consistent with the responses. Even with the strong variation weighting, the conductivity changes at about 300 m depth. Also, a narrow high-conductivity zone or spike is required at 800 m depth; this might be considered geophysically unrealistic. It is apparent from the fit to the data displayed in Fig. 4.5(b) and (c) that the constructed model significantly misfits the short-period responses, which are sensitive to this shallow structure, and, to compensate, overfits the long-period responses. This indicates that the shallow structure is not in good agreement with the data.

In the examples presented so far, accurate responses have been inverted to produce the constructed models. Figure 4.6 shows constructed $l_1$ minimum-structure models and their corresponding fit to the data for three levels of error on the responses. In each inversion $m(z) = \sigma(z)$, $w(z) = 1$, and all solutions have a misfit of $\chi^1 = 40.0$. Although the errors degrade the solution in each case, by requiring a fit to the data appropriate to the inherent uncertainties according to the $\chi^1$ criterion, the errors do not introduce any false structure into the constructed models.
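The target misfit of 40 used in these examples can be checked with a short calculation. The sketch below assumes that $\chi^1$ is the sum of the absolute residuals, each divided by its standard deviation, and that the real and imaginary parts of each complex response count as separate data; the function name is hypothetical. For a zero-mean, unit-variance Gaussian variable the expected absolute value is $\sqrt{2/\pi}$, so the expected misfit is that factor times the number of real data.

```python
import math


def expected_l1_misfit(n_real_data):
    """Expected sum of |residual| / sigma for independent Gaussian errors."""
    return math.sqrt(2.0 / math.pi) * n_real_data


n_complex = 25  # complex responses in the synthetic example
target = expected_l1_misfit(2 * n_complex)  # real and imaginary parts counted separately
print(f"expected chi^1 misfit for {n_complex} complex responses: {target:.1f}")
```

Under these assumptions the value is about 39.9, which is consistent with the desired misfit of approximately 40 adopted for the synthetic test case.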
The error-contaminated data sets inverted in Fig. 4.6 are the same as those considered in Fig. 3.6 for the $l_2$ minimum-structure inversion. In Fig. 4.6(a) each response has been contaminated by the addition of a random error drawn from a zero-mean Gaussian distribution with a standard deviation of 2 percent of the accurate response value. The constructed model is similar (but not identical) to the model shown in Fig. 4.3(a), where a 2 percent uncertainty in the data was assumed but accurate responses were inverted. In Fig. 4.6(b) the standard deviation of the error distributions is 10 percent of the response value.

Figure 4.5 The $l_1$ minimum-variation model constructed for $m(z) = \log\sigma(z)$ with a weighting function $w_i = 5$ for $0 < z < 800$ m and $w_i = 1$ for $z > 800$ m. (a) shows the constructed model (solid line); the true model is indicated by the dashed line. (b) and (c) show the true data (squares with error bars) and the predicted responses (solid line). The misfit of the constructed model is $\chi^1 = 40.0$.

Figure 4.6 The effect of data errors: $l_1$ minimum-variation models for $m(z) = \log\sigma(z)$ and $w_i = 1$ are shown in (a), (b) and (c) when the responses are contaminated by Gaussian errors of 2, 10 and 30 percent, respectively. The true model is also indicated (dashed line). The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right. The misfit of the constructed models is $\chi^1 = 40.0$.
The constructed model exhibits considerably less structure: the conductivities of the first and last layers are reproduced quite well and the conductivity increases near 600 and 6000 m; however, the low- and high-conductivity zones are apparently not required to fit the responses and are not resolved. Figure 4.6(c) shows the results for 30 percent error: the responses are adequately fit by essentially a two-layer model with an increase in conductivity at about 600 m depth. Likewise, the predicted data exhibit no detailed features, as only a simple structure is required to adequately fit the true data.

It is well known that the $l_1$ misfit norm is more robust and less influenced by extreme or outlying responses than the $l_2$ misfit norm (e.g. Claerbout & Muir 1973). The advantage of this property is demonstrated in Fig. 4.7, which shows $l_1$ and $l_2$ minimum-structure models constructed by inverting a data set contaminated with 2 percent Gaussian noise and with the apparent conductivity response at T = 0.187 Hz in error by four standard deviations (the data set is shown in the panels on the right). Figure 4.7(a) shows the $l_1$ minimum-structure model constructed with a misfit corresponding to the expected value, $\chi^1 = 40.0$. This model exhibits some additional small-scale structure, but is generally a good representation of the true model and is similar to the model shown in Fig. 4.6(a), which was constructed by inverting the same data set without the outlying response. Figure 4.7(b) shows the constructed $l_2$ minimum-structure model; the expected value for the misfit is $\chi^2 = 50$, however, the best fit that could be obtained for this data set was $\chi^2 = 158$. Even with this large misfit value the $l_2$ model shows significant false structure when compared with the true model or with the model constructed by inverting the data set without the outlier, shown in Fig. 3.6(a).
The relative insensitivity of the $l_1$ misfit norm to a small number of outliers (or 'blunders', in the terminology of Claerbout & Muir 1973) in the data set can be a significant advantage in inverting field measurements when the actual uncertainties may be difficult to estimate accurately.

Figure 4.7 The effects of outliers in the data set. The responses are contaminated with 2 percent Gaussian noise and the apparent conductivity response at T = 0.187 Hz is in error by four standard deviations. (a) shows the $l_1$ minimum-variation model with a misfit of $\chi^1 = 40.0$. (b) shows the best-fitting $l_2$ flattest model with a misfit of $\chi^2 = 158$. The true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right.

As an example of $l_1$ norm inversion of MT field measurements, consider the LITHOPROBE data set measured in southeastern British Columbia, described in Section 3.4.1. As described in that section, the uncertainties associated with the outlying responses in this data set appear to be underestimated and consequently the best-fitting $l_2$ minimum-structure model, shown in Fig. 3.8(c), may contain unnecessary structure. This contention is supported by the best-fitting $l_1$ minimum-structure model shown in Fig. 4.8(a) by the solid line. The constructed model has a misfit of $\chi^1 = 88.0$; the expected value for 68 responses is 54.3. The fit to the observed responses is shown in Fig. 4.8(b) and (c). The $l_1$ minimum-variation model has the same general features as the $l_2$ minimum-structure models shown in Fig. 3.8 (the $l_2$ model shown in Fig. 3.8(b) is included as a dotted line for comparison), but does not exhibit the additional small-scale structure of the best-fitting $l_2$ model shown in Fig. 3.8(c). The simpler structure of the $l_1$ best-fitting solution is likely due to its relative insensitivity to the outliers compared to the $l_2$ solution of Fig. 3.8(c).
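The robustness of the $l_1$ misfit can be seen in a much simpler setting than the MT inversion. The sketch below is an illustrative analogue only, not the thesis algorithm: it fits a single constant to a hypothetical data set containing one outlier. The constant that minimizes the sum of absolute residuals is the median, whereas the constant that minimizes the sum of squared residuals is the mean, which is pulled strongly towards the outlier.

```python
import statistics


def l1_misfit(data, c):
    """Sum of absolute residuals for a constant model c."""
    return sum(abs(d - c) for d in data)


def l2_misfit(data, c):
    """Sum of squared residuals for a constant model c."""
    return sum((d - c) ** 2 for d in data)


# Hypothetical observations scattered about 1.0, with one gross outlier (5.00).
data = [1.02, 0.98, 1.01, 0.99, 1.00, 1.03, 0.97, 1.00, 5.00]

l1_estimate = statistics.median(data)  # minimizes l1_misfit
l2_estimate = statistics.mean(data)    # minimizes l2_misfit

print(f"l1 estimate (median): {l1_estimate:.3f}")
print(f"l2 estimate (mean):   {l2_estimate:.3f}")
```

The median stays at the level of the bulk of the data, while the mean is shifted by the single outlier; the same mechanism underlies the behaviour of the two misfit norms described above.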
Figure 4.8(a) also shows another example of the complementary model types (gradient or layered) which may be constructed using both the $l_2$ and $l_1$ minimum-structure solutions.

Figure 4.8 Best-fitting $l_1$ minimum-variation model and MT responses observed in southeastern British Columbia, Canada. The minimum-variation model with $\chi^1 = 88.0$ is shown in (a) by the solid line (the dashed line indicates the $l_2$ flattest model from Fig. 3.8(b) for comparison). The true data (squares with error bars) and the predicted responses (solid lines) are shown in (b) and (c).

4.4.2 Smallest-deviatoric model construction

The LP objective function (4.2.17) may be used to represent the model variation, the deviation from a base model, or any (weighted) combination of the two. As an example of the $l_1$ smallest-deviatoric model construction, consider the synthetic test case and assume that the true model is known to a depth of 2000 m: the base model is taken to consist of the true model to this depth, and below it the conductivity is held constant. This same base model was used to demonstrate the $l_2$ smallest-deviatoric model in Section 3.4.2 except that here the model is taken to be $m(z) = \log\sigma(z)$. Figure 4.9(a) shows the smallest-deviatoric model (solid line) constructed by minimizing the objective function (4.2.17) with $\theta = 0$ and weights $w_i = 1$. The true model is indicated by a dashed line and the base model by a dotted line (mostly obscured by the constructed model). The constructed model exactly reproduces the base model over the region 0-2000 m depth where the base model is known accurately, indicating that this structure is consistent with the observed responses. Below this depth the constructed model also corresponds exactly to the base model except for three narrow zones of high conductivity (each one partition element wide) that are required in order to fit the data. The sparse, spiky structure in this region is characteristic of $l_1$ norm solutions when the model structure is not explicitly minimized; this is considered in detail in Chapter 5.

Figure 4.9 $l_1$ smallest-deviatoric and minimum combined-norm models. The base model is indicated by the dotted line, the true model by the dashed line. (a) shows the smallest-deviatoric model constructed for a uniform weighting. In (b) the combined objective function is minimized with a variation to deviation trade-off of $\theta = 0.8333$, a deviation weighting of unity and a variation weighting of 1 for $z > 2000$ m and 0 for $z < 2000$ m. In (c) the weights are the same, but the trade-off parameter is $\theta = 0.9375$. The misfit of all constructed models is $\chi^1 = 40.0$; the true data (squares with error bars) and the predicted responses (solid line) for each model are indicated in the panels on the right.

The conductivity spikes at depth in Fig. 4.9(a) clearly indicate that the responses require high-conductivity structure in this region that is not included in the base model. However, if such rapidly varying structure is considered geophysically implausible, the variation may be reduced by minimizing an objective function that includes both the model deviation and the model variation. Figure 4.9(b) shows the model constructed by minimizing a combined objective function with $\theta = 0.8333$ (i.e. the variation to deviation trade-off is a ratio of 5 to 1).
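The trade-off ratios quoted for the combined objective function can be reproduced with a one-line calculation. The sketch below assumes that the variation term is weighted by $\theta$ and the deviation term by $1 - \theta$, which is consistent with $\theta = 1$ giving the minimum-variation model and $\theta = 0$ the smallest-deviatoric model; the function name is hypothetical.

```python
def variation_to_deviation_ratio(theta):
    """Ratio of the variation weight (theta) to the deviation weight (1 - theta)."""
    return theta / (1.0 - theta)


for theta in (0.8333, 0.9375):
    ratio = variation_to_deviation_ratio(theta)
    print(f"theta = {theta}: variation to deviation ratio = {ratio:.2f}")
```

Under this assumption $\theta = 0.8333$ corresponds to a ratio of about 5 to 1 and $\theta = 0.9375$ to a ratio of 15 to 1.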
The deviation weights were taken to be unity; however, the variation weights were taken to be zero for model elements in the depth range 0-2000 m and unity below these depths, so that the contributions to the objective function in the region where the base model is well known are due entirely to the model deviation. Again, the constructed model exactly corresponds to the base model in this region. In the region where the true model differs from the base model the variation has been reduced but several conductivity spikes remain. Figure 4.9(c) shows a similar construction with $\theta = 0.9375$ (trade-off ratio of 15 to 1). In this case the conductivity spikes are replaced by three layers which decrease in conductivity with depth. The first layer represents the high-conductivity region of the true model; the decrease in conductivity towards the base model in the third layer results from the diminishing information content of the responses at great depth. All the constructed models shown in Fig. 4.9 have a misfit of $\chi^1 = 40.0$; the fit to the true responses is shown in the panels on the right.

The models shown in Fig. 4.9 illustrate some of the diversity of acceptable models that can be constructed. It is important to note that although the constructed model shown in Fig. 4.9(c) may be more appealing (since we know the true model) than that in Fig. 4.9(a), both models are equally acceptable according to their fit to the data. If we have some prior knowledge or insight into the form of the true model (e.g. layered or gradient models as opposed to conductivity spikes), this may be valuable additional information and we are certainly justified in seeking such models by an appropriate choice of model norm or additional constraints. However, it must be kept in mind that the constructed models then reflect our bias and that a variety of models may exist which fit the data equally well.
If we have no prior knowledge or insight then it may be valuable to construct a number of different models in an attempt to explore the range of acceptable models and determine common features. The inversion algorithms presented in this chapter are an important component in any compilation of inversion techniques.

Chapter 5 Appraisal using extremal models of bounded variation

5.1 Introduction

In linear inverse theory, the general relationship between the responses and the model is given by a Fredholm integral equation of the first kind

$$e_j = \int_0^\infty g_j(z)\, m(z)\, dz, \qquad j = 1,\ldots,N. \tag{5.1.1}$$

A fundamental difficulty in inverse theory is that of non-uniqueness: if there exists one model which adequately reproduces the data via (5.1.1), then infinitely many such models exist. The problem of overcoming this non-uniqueness to determine useful information about the true model may be addressed in several ways (e.g. Oldenburg 1984). One approach is to construct acceptable models of a specific character. Chapters 3 and 4 addressed the problem of model construction for the non-linear MT inverse problem. A second approach is that of appraisal. As a result of the inherent non-uniqueness of the inverse problem, a finite set of observed responses cannot impose any bounds on the value of the model at a fixed point. At a given point, the model may attain any (possibly infinite) value and still satisfy (5.1.1). However, model averages over a finite width are constrained by the data, provided at least one kernel function is non-zero over a portion of this width. The goal of appraisal is to determine quantitative information about such model averages.

One approach to appraisal is given by Backus & Gilbert (1970). By taking appropriate linear combinations of the data equations, they generated unique averages of the model at a depth of interest $z_0$ of the form

$$\langle m(z_0) \rangle = \int_0^\infty A(z_0, z)\, m(z)\, dz, \tag{5.1.2}$$

where

$$A(z_0, z) = \sum_{j=1}^{N} a_j\, g_j(z) \tag{5.1.3}$$

is known as the averaging function or resolving kernel.
The model average $\langle m(z_0) \rangle$ is unique in the sense that the inner product of $A(z_0, z)$ with any acceptable model will produce this same value. The coefficients $a_j$ are chosen to make $A(z_0, z)$ localized and centred on $z_0$. Ideally, $A(z_0, z)$ should correspond to the Dirac delta function $\delta(z - z_0)$, for then $\langle m(z_0) \rangle = m(z_0)$ and the true model would be recovered uniquely. In practice, however, this can never be accomplished with a finite number of kernel functions, so the coefficients are chosen to make $A(z_0, z)$ as close as possible (in some sense) to $\delta(z - z_0)$. Several possible 'deltaness' criteria are proposed by Backus & Gilbert (1970).

The shape of the averaging function may be used to quantify the resolving power of the data. If $A(z_0, z)$ is narrow, well centred on $z_0$ and without significant side-lobes or negative values, then the data have good resolution at this depth and $\langle m(z_0) \rangle$ is a meaningful estimate of $m(z_0)$. In practical cases where the responses are inaccurate, a trade-off exists between the resolution width of the averaging function and the variance of the model average. The interpreter must select an $A(z_0, z)$ and associated $\langle m(z_0) \rangle$ that represent the most meaningful compromise between resolution and accuracy.

In some cases this analysis produces excellent results. However, for some problems the averaging function $A(z_0, z)$ may have undesirable characteristics such as significant side-lobes or negative values, or it may not be centred at the depth of interest. In such cases the model average, although unique, is not readily interpreted. These difficulties arise from the fact that $A(z_0, z)$, formed from a linear combination of the kernel functions, is restricted to the subspace of the Hilbert space spanned by those functions; in some cases it simply may not be possible to construct a suitable averaging function in this manner.
Huestis (1987, 1988) presents a method for computing non-negative averaging functions; however, he demonstrates that for some problems such functions do not exist, and even in cases where they do exist, the advantage gained in their use may be offset by a greatly increased computational burden.

When the relationship between the model and the responses is functionally non-linear, Backus-Gilbert appraisal can be applied by linearizing the problem about some constructed model. Unfortunately, in this case the unique averages computed pertain only to models that are linearly close to the initial model. Oldenburg (1979) constructed a number of different conductivity models which fit a set of MT data but were not linearly close to each other, and found different values for the model average by linearizing about these models. Parker (1983) and Oldenburg, Whittall & Parker (1984) have found linearized Backus-Gilbert appraisal to be inadequate for the non-linear MT problem.

To overcome these shortcomings, it is advantageous to seek quantitative information about model averages by formulating the appropriate inference problem. The mathematical foundation for inference theory has been presented by Backus (1970a, b, c; 1972), and a pragmatic application has been presented by Oldenburg (1983) and will be briefly recounted here. In Oldenburg (1983) it was shown that upper and lower bounds for predicted linear functionals of the model could be computed using LP techniques. One of the most useful linear functionals is the integral of the model with a unimodular box-car B of width $\Delta$ centred at the depth of interest $z_0$:

$$B(z_0, \Delta, z) = \begin{cases} 1/\Delta, & \text{if } |z - z_0| \le \Delta/2; \\ 0, & \text{otherwise.} \end{cases} \tag{5.1.4}$$

The resultant inner product

$$\bar{m}(z_0, \Delta) = \int_0^\infty B(z_0, \Delta, z)\, m(z)\, dz \tag{5.1.5}$$

represents an average of the model over a width $\Delta$ centred at $z_0$.
Since $B(z_0, \Delta, z)$ cannot generally be formed as a linear combination of the kernel functions, $\bar{m}(z_0, \Delta)$ cannot be determined uniquely. However, lower and upper bounds

$$\bar{m}_L(z_0, \Delta) \le \bar{m}(z_0, \Delta) \le \bar{m}_U(z_0, \Delta) \tag{5.1.6}$$

can be obtained by constructing models which minimize and maximize (5.1.5) subject to the data constraints using LP. The constructed models which extremize the model average are referred to as extremal models.

Implementation of this procedure requires that the model be discretized; as before, let

$$m(z) = m_i, \qquad z_{i-1} \le z \le z_i, \quad i = 1,\ldots,M. \tag{5.1.7}$$

Linear programming methods can be used to minimize or maximize an objective function $\Phi = \sum_{i=1}^{M} w_i m_i$ subject to constraints on linear combinations of the model parameters $m_i$. The $\{w_i\}$ are a set of arbitrary weights which may be chosen according to

$$w_i = \begin{cases} (z_i - z_{i-1})/\Delta, & \text{if } z_0 - \Delta/2 \le z_{i-1},\ z_i \le z_0 + \Delta/2; \\ 0, & \text{otherwise,} \end{cases} \tag{5.1.8}$$

so as to make the objective function represent a discretized form of the model average, i.e.

$$\Phi = \sum_{i=1}^{M} w_i m_i = \bar{m}(z_0, \Delta). \tag{5.1.9}$$

Lower and upper bounds $\bar{m}_L$ and $\bar{m}_U$ for $\bar{m}(z_0, \Delta)$ are calculated by minimizing and maximizing $\Phi$ subject to the data constraints of (5.1.1), which may be incorporated in discretized form as either hard or soft bounds, as described in Section 4.2.1. An advantage of this approach to appraisal is that the bounds are calculated for exact box-car averages of the true model.

This method may be applied to non-linear problems such as the appraisal of MT responses by constructing the extremal models using an iterative linearized inversion algorithm formulated so that the LP objective function is applied to the model at each iteration. If this procedure leads to the global extremization of the objective function, then true bounds for the model average have been found. This is in contrast to linearized Backus-Gilbert appraisal, which is only valid for models which are linearly close to some constructed model.
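The weights of (5.1.8) and the discretized model average of (5.1.9) can be sketched in a few lines. The partition, the model values and the function names below are hypothetical and chosen only so that the box-car edges coincide with partition boundaries; the LP extremization itself is not shown.

```python
def boxcar_weights(z_edges, z0, width):
    """Weights of (5.1.8): element thickness / width for elements inside the box-car."""
    lower, upper = z0 - width / 2.0, z0 + width / 2.0
    weights = []
    for z_top, z_bottom in zip(z_edges, z_edges[1:]):
        inside = lower <= z_top and z_bottom <= upper
        weights.append((z_bottom - z_top) / width if inside else 0.0)
    return weights


def model_average(weights, model):
    """Objective function of (5.1.9): weighted sum of the model elements."""
    return sum(w * m for w, m in zip(weights, model))


# Hypothetical partition (depths in m) and piecewise-constant model values.
z_edges = [0.0, 100.0, 200.0, 300.0, 400.0, 500.0]
model = [1.0, 2.0, 3.0, 4.0, 5.0]

weights = boxcar_weights(z_edges, z0=250.0, width=300.0)
average = model_average(weights, model)
print("weights:", weights)
print("box-car average:", average)
```

Because only elements lying wholly inside the box-car receive a non-zero weight, the weights sum to one only when the box-car edges fall on partition boundaries, as they do in this example.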
In general, it is difficult to verify that a global extremum has been found. However, the analysis in this chapter and in Chapter 6 indicates that in many cases no better extremum can be found; this gives confidence that meaningful bounds have been calculated.

An important advantage of the LP method is that any physical information about the model which can be formulated as a linear constraint can be included in the inversion. For instance, a priori lower and upper limits for the model elements

$$m_i^- \le m_i \le m_i^+ \tag{5.1.10}$$

are easily included. In order to obtain the most meaningful bounds for the model average $\bar{m}$, it is important to include as much additional physical information or insight into the character of the true model as possible.

For each value of $z_0$ the bounds $\bar{m}_L(z_0, \Delta)$ and $\bar{m}_U(z_0, \Delta)$ may be calculated for a number of different averaging widths $\Delta$, and plotted as a function of $\Delta$. Since the computed bounds tend to converge as the averaging width increases, such a plot is referred to as a 'funnel function' diagram (Oldenburg 1983). Funnel function diagrams provide immediate insight into the resolving power of the data at the depth of interest $z_0$.

The only loss of generality in this formulation is that caused by the partitioning and parametrization. This is not of practical significance, however, provided that the partition quantization is sufficiently small. Lang (1985) demonstrates that the exact problem of computing linear functionals of the model can be approximated to arbitrary accuracy by a discretized problem given a small enough partition interval.

Oldenburg (1983) illustrated the method of appraisal using extremal models for a simple numerical example and applied the method to the (non-linear) MT inverse problem. Unfortunately, it is found that the extremal models constructed by this analysis often exhibit unacceptably large oscillations.
When model limits are large or absent, the extremal models are characteristically sparse and spiky, consisting of isolated pulses of high conductivity embedded in an insulating halfspace. If confining model limits are imposed, the extremal models characteristically consist of a sequence of sections which alternate between the limits, in some cases fluctuating rapidly. In many practical applications we are not willing to accept such models, even if they are consistent with the observed data. Although these models represent mathematically acceptable solutions, they are generally not geophysically realistic. As a consequence, since the funnel function bounds are obtained from these extremal models, it is likely that bounds found using this method are unduly pessimistic. It is anticipated that more meaningful bounds could be calculated if these highly variable models are purposely winnowed from the analysis.

In this chapter the total variation is used as a measure of the amount of structure of a model, and highly oscillatory models are discriminated against by placing an upper bound on the variation of the extremal models. As a consequence of restricting the model solution space in this manner, the difference between the upper and lower bounds computed for $\bar{m}(z_0, \Delta)$ is often considerably reduced. Thus, the appraisal technique of Oldenburg (1983) is extended to include the variation of the extremal models as another dimension. The interpreter may make use of any knowledge or insight regarding the variation of the earth model to select reasonable extremal models and meaningful funnel function bounds.

In the next section, two methods of bounding the total variation of the constructed models are presented. In Section 5.3 the appraisal technique and the dependence of the computed bounds on the allowed variation are demonstrated for a simple linear example.
The appraisal method can be applied to non-linear problems using an iterative linearized algorithm. Section 5.4 describes this algorithm and in Section 5.5 the method is applied to the non-linear MT problem by considering synthetic and field data cases. Much of the work in this chapter has been presented in Dosso & Oldenburg (1989).

5.2 Formulating the variation bound

The total variation of a model $m(z)$ was defined in Section 4.1.1 as

$$V(m) = \int_0^\infty |dm(z)|. \qquad (5.2.1)$$

In order to eliminate extremal models which are judged to have too much structure, the appraisal method described in Oldenburg (1983) is modified to include a constraint on $V(m)$. By placing an upper bound on the variation, models which are sparse and spiky or oscillate repeatedly between the imposed limits can be discriminated against. Abrupt or discontinuous changes are still allowed, but the total number and magnitude of such changes can be limited to an amount deemed reasonable. The goal is to select a variation bound which results in models that are judged to be geophysically realistic and produce the most meaningful funnel function bounds. Two methods of bounding the total variation are presented.

5.2.1 Method 1

The first method is applicable to models that are assumed to be continuous, i.e. $m \in C^1$. In this case (5.2.1) can be written as

$$V(m) = \int_0^\infty |m'(z)| \, dz, \qquad (5.2.2)$$

where $m' = dm/dz$. In discrete form, where the model is assumed to have a constant gradient on each partition element, the variation can be written as

$$V = \sum_{i=1}^{M} |m'_i| \, (z_i - z_{i-1}). \qquad (5.2.3)$$

Linear programming methods generally assume that the variables are non-negative, but model derivatives which may be either positive or negative can be accommodated by writing each parameter $m'_i$ as the difference of two non-negative quantities: $m'_i = r_i - t_i$, where $r_i, t_i \ge 0$ are the variables to be determined by the LP algorithm.
The absolute values $|m'_i|$ in (5.2.3) cannot be included in a linear constraint; however, they may be represented as $|m'_i| \le r_i + t_i$, with equality holding when either $r_i$ or $t_i$ is zero. The total variation $V$ must obey the inequality

$$V \le \sum_{i=1}^{M} (r_i + t_i)(z_i - z_{i-1}) \qquad (5.2.4)$$

and an upper bound $V_b$ for the variation may be specified by requiring

$$\sum_{i=1}^{M} (r_i + t_i)(z_i - z_{i-1}) \le V_b. \qquad (5.2.5)$$

Equation (5.2.5) is a linear constraint for the total variation in a form that can be included in the LP algorithm.

To constrain the total variation according to (5.2.5), the LP objective function and constraints must be written in a form which involves only $r_i$ and $t_i$ as unknowns (with $m'_i = r_i - t_i$). If $m_0 = m(z = 0)$ is assumed known, then the value of the model on the $i$th partition element is given by

$$m_i = m_0 + \sum_{k=1}^{i-1} (z_k - z_{k-1})(r_k - t_k) + \tfrac{1}{2} (z_i - z_{i-1})(r_i - t_i). \qquad (5.2.6)$$

The objective function (5.1.19) becomes

[Equation (5.2.7); illegible in the source text.]

To put the data constraints into a compatible form, integrate (5.1.1) by parts to obtain

$$f_j = \int_0^\infty \left[ h_j(z) - h_j(\infty) \right] m'(z) \, dz, \qquad (5.2.8)$$

where $f_j$ and $h_j(z)$ are new data and kernels given by

$$f_j = m_0 h_j(\infty) - e_j, \qquad (5.2.9a)$$

$$h_j(z) = \int_0^z g_j(u) \, du. \qquad (5.2.9b)$$

Discretization yields

$$f_j = \sum_{i=1}^{M} \gamma_{ji} (r_i - t_i), \qquad j = 1, \ldots, N, \qquad (5.2.10a)$$

$$\gamma_{ji} = \int_{z_{i-1}}^{z_i} \left[ h_j(z) - h_j(\infty) \right] dz, \qquad i = 1, \ldots, M. \qquad (5.2.10b)$$

As a final constraint, limits for individual model elements, $m_i^- \le m_i \le m_i^+$, may be included as

$$m_i^- \le m_0 + \sum_{k=1}^{i-1} (z_k - z_{k-1})(r_k - t_k) + \tfrac{1}{2} (z_i - z_{i-1})(r_i - t_i) \le m_i^+, \qquad i = 1, \ldots, M. \qquad (5.2.11)$$

The LP problem of computing bounds for $\bar{m}(z_0, \Delta)$ consists of extremizing the objective function $\Phi$ given by (5.2.7) subject to the data constraints of (5.2.10), the model limits of (5.2.11) and the variation bound of (5.2.5). The extremal model may be computed according to (5.2.6).

5.2.2 Method 2

The second method does not require the model to be a continuous function; rather, the model is represented by a constant value on each partition element.
In this case the total variation of the model can be characterized as

$$V = \sum_{i=1}^{M-1} |m_{i+1} - m_i|. \qquad (5.2.12)$$

Instead of formulating a linear programming problem in which the objective function, data constraints, model limits and variation bound are written in terms of the model derivative $m'_i = r_i - t_i$, the formulation in terms of the model elements $m_i$ is retained, but $2(M-1)$ new (non-negative) LP parameters $\{p_i, q_i\}$ are introduced and constrained to represent the model changes:

$$p_i - q_i = m_{i+1} - m_i, \qquad p_i, q_i \ge 0, \qquad i = 1, \ldots, M-1. \qquad (5.2.13)$$

It follows that $|m_{i+1} - m_i| \le p_i + q_i$ and therefore the total variation can be bounded by constraining

$$\sum_{i=1}^{M-1} (p_i + q_i) \le V_b. \qquad (5.2.14)$$

This representation of the total variation of the model is identical with that given in Section 4.2.1. However, in the application presented there the model variation was taken as the objective function and minimized by the LP algorithm in order to construct minimum-structure models. Here the objective function represents the model average $\bar{m}(z_0, \Delta)$ as given by (5.1.8) and (5.1.9). This objective function is minimized and maximized to obtain lower and upper bounds for $\bar{m}(z_0, \Delta)$. The variation is bounded according to (5.2.14) in order to ensure that the extremal models constructed in the minimization or maximization are geophysically reasonable.

For the work presented here, both methods of bounding the variation have been programmed and give essentially the same results. The advantage of the first method is that fewer variables and constraints are required in the LP algorithm. In Method 1, $2M$ variables are required to represent the model derivative elements and the variation bound is specified as a single constraint, whereas in Method 2, $3M - 2$ variables are required ($M$ model elements and $2(M-1)$ variables to represent the model changes) and a total of $M$ constraints are required to specify the variation bound.
However, the sparsity of the constraint matrix is destroyed when limits for the model elements are expressed in terms of the model derivative according to (5.2.11). In practice, this can be a significant disadvantage since many LP algorithms are designed for large, sparse constraint matrices. Despite the fact that the second method requires more variables and constraints, the constraints are sparse and we have found the second method to be both significantly faster and more stable computationally for large extremization problems. In addition, since the second method is formulated in terms of the model rather than its derivative, integration of the data equations and of the recovered model derivative solution is not required. Also, there is no need to specify a model value at an endpoint. For these reasons the second method is the recommended formulation and the numerical examples presented in this thesis are computed using Method 2.

In either method, however, since the variation bound is specified as an inequality constraint, it may be that the extremal model does not achieve a total variation of $V_b$. The actual variation $V$ of the constructed model can be evaluated directly. In practice it is generally found that $V = V_b$ provided the variation bound is less than the variation of the unbounded extremal models.

5.3 Linear appraisal example

To illustrate the appraisal technique and demonstrate the improvement in resolution when the variation is bounded, a simple linear example presented in Oldenburg (1983) is re-examined. Let the model be defined on the interval [0,1] as

$$m(z) = 1 - \tfrac{1}{2} \cos(2\pi z) \qquad (5.3.1)$$

and the responses be obtained from the equations

$$e_j = \int_0^1 e^{-(j-1)z} \, m(z) \, dz, \qquad j = 1, \ldots, N. \qquad (5.3.2)$$

A total of 11 accurate data were generated, and these are used to infer information about the value of the true model for a depth $z_0 = 0.5$ where the model attains its maximum value of 1.5.
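The Method 2 formulation of Section 5.2.2 can be sketched numerically for this linear example. The following is a minimal illustration under stated assumptions, not the implementation used in the thesis: it uses SciPy's `linprog` with the HiGHS solver, a uniform partition of $M = 50$ elements, data generated from the discretized true model (so that the true model is exactly feasible), a data fit enforced within a small tolerance `tol` for numerical robustness, and a unit-area box-car average standing in for (5.1.8) and (5.1.9). All function and variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def kernel_matrix(z_edges, n_data):
    """G[j, i] = integral of exp(-j z) over partition element i, j = 0..n_data-1."""
    a, b = z_edges[:-1], z_edges[1:]
    G = np.empty((n_data, a.size))
    G[0] = b - a
    for j in range(1, n_data):
        G[j] = (np.exp(-j * a) - np.exp(-j * b)) / j
    return G

def boxcar_weights(z_edges, z0, width):
    """Weights c such that c @ m averages m over [z0 - width/2, z0 + width/2]."""
    lo, hi = z0 - width / 2, z0 + width / 2
    overlap = np.clip(np.minimum(z_edges[1:], hi) - np.maximum(z_edges[:-1], lo), 0, None)
    return overlap / overlap.sum()

def average_bounds(G, data, c, v_bound=None, m_lo=0.0, m_hi=None, tol=1e-6):
    """Lower and upper bounds on c @ m; LP unknowns are [m, p, q] as in (5.2.13)."""
    n_data, M = G.shape
    n_var = 3 * M - 2
    # Data constraints: |G m - data| <= tol (assumed tolerance, see lead-in).
    A_ub = np.zeros((2 * n_data, n_var))
    A_ub[:n_data, :M] = G
    A_ub[n_data:, :M] = -G
    b_ub = np.concatenate([data + tol, tol - data])
    if v_bound is not None:
        # Variation bound (5.2.14): sum_i (p_i + q_i) <= V_b.
        row = np.zeros(n_var)
        row[M:] = 1.0
        A_ub = np.vstack([A_ub, row])
        b_ub = np.append(b_ub, v_bound)
    # Model changes (5.2.13): m[i+1] - m[i] - p[i] + q[i] = 0.
    idx = np.arange(M - 1)
    A_eq = np.zeros((M - 1, n_var))
    A_eq[idx, idx] = -1.0
    A_eq[idx, idx + 1] = 1.0
    A_eq[idx, M + idx] = -1.0
    A_eq[idx, 2 * M - 1 + idx] = 1.0
    bounds = [(m_lo, m_hi)] * M + [(0, None)] * (2 * (M - 1))
    cost = np.zeros(n_var)
    cost[:M] = c
    results = []
    for sign in (1.0, -1.0):  # minimize, then maximize, the model average
        res = linprog(sign * cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq,
                      b_eq=np.zeros(M - 1), bounds=bounds, method="highs")
        if res.status != 0:
            raise RuntimeError(res.message)
        results.append(sign * res.fun)
    return results[0], results[1]

M, N = 50, 11
z_edges = np.linspace(0.0, 1.0, M + 1)
z_mid = 0.5 * (z_edges[:-1] + z_edges[1:])
m_true = 1.0 - 0.5 * np.cos(2.0 * np.pi * z_mid)   # discretized true model (5.3.1)
G = kernel_matrix(z_edges, N)
data = G @ m_true                                  # assumed: data from the discrete model
c = boxcar_weights(z_edges, z0=0.5, width=0.2)
avg_true = float(c @ m_true)

lo_free, hi_free = average_bounds(G, data, c)               # non-negativity only
lo_var, hi_var = average_bounds(G, data, c, v_bound=2.0)    # with variation bound
print(f"no variation bound: {lo_free:.3f} <= average <= {hi_free:.3f}")
print(f"V_b = 2.0:          {lo_var:.3f} <= average <= {hi_var:.3f}")
```

Because the partition, the data generation and the fit tolerance are choices made for this sketch, the printed bounds need not coincide with the values quoted in the text; the qualitative narrowing of the bounds when `v_bound` is supplied is the point of the comparison.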
Figure 5.1(a) shows upper and lower bounds calculated when no limits (except a non-negativity constraint) are placed on the model elements, and no bound is placed on the total variation. Averages of the true model are indicated by the dashed line. The wide bounds indicate that the resolving power of the data is poor. For instance, for an averaging width of $\Delta = 0.2$, the model average is known to lie only within the bounds $0 \le \bar{m}(z_0, \Delta) \le 4.16$ while the true model lies in the range $1.47 \le m(z) \le 1.50$. Only for $\Delta > 0.5$ is $m^L > 0$, so without additional physical information, a region of non-zero amplitude near $z_0 = 0.5$ is not a required feature of the model.

Figure 5.1(b) and (c) show constructed extremal models which minimize and maximize the model average $\bar{m}(z_0, \Delta)$ for an averaging width of $\Delta = 0.2$. The true model is indicated by the dashed line. The constructed models consist of a sequence of regions of zero amplitude with two or three isolated zones of large amplitude, each one partition element in width. This structure is characteristic of all extremal models which produced the funnel function bounds of Fig. 5.1(a).

Figure 5.1 Lower and upper bounds for $\bar{m}(z_0 = 0.5, \Delta)$ are shown in (a), (d) and (g). In (a) only non-negativity was required, in (d) model limits $0.5 \le m_i \le 2.0$ were imposed, and in (g) a variation bound of $V_b = 2.0$ was also included. This variation bound corresponds to the actual variation of the true model. The true model averages are indicated by the dashed line. The two plots to the right of each funnel function diagram show the constructed extremal models which minimize and maximize $\bar{m}(z_0 = 0.5, \Delta = 0.2)$. In these plots the true model is indicated by the dashed line.

The constructed models for the unconstrained extremizations exhibit narrow zones or spikes with amplitudes of up to 40 or more. These values differ dramatically from the true model.
If reasonable limits for the model amplitude are known, the computed bounds can be greatly improved. Figure 5.1(d) shows the bounds calculated after requiring that $0.5 \le m_i \le 2.0$. The significant improvement in resolution for all averaging widths is apparent when Fig. 5.1(a) and (d) are compared. The funnel functions also show the minimum resolution width required before the measured responses influence the computed bounds. For instance, only for $\Delta > 0.28$ is $m^L > 0.5$, the imposed lower limit, and $m^U < 2.0$, the imposed upper limit.

Figure 5.1(e) and (f) show constructed extremal models which minimize and maximize $\bar{m}(z_0, \Delta = 0.2)$. These models consist predominantly of a sequence of sections which alternate between the imposed limits. Only a few model elements do not achieve either $m^- = 0.5$ or $m^+ = 2.0$. In some cases the extremal models fluctuate rapidly between the imposed limits. An example of this is given in Fig. 5.2, which shows the constructed model which minimizes $\bar{m}(z_0, \Delta = 1.0)$.

The bimodal form of these extremal models is similar to that of Parker's (1974, 1975) ideal bodies. The ideal body $m_I(z)$ is that model which is everywhere equal to either zero or $M_0$, where $M_0$ represents the greatest lower bound on the largest value of $m$ (i.e. the smallest supremum of $m$). The model $m_I(z)$ is unique in that it is the only acceptable model which nowhere exceeds $M_0$. In the limit of $m^- \to 0$ and $m^+ \to M_0$, the LP extremal models will be equal to $m_I(z)$ regardless of the values of $z_0$ or $\Delta$. However, it appears that the extremal models retain this bimodal form for a wide range of values of $m^-$ and $m^+$, provided the discretization interval is sufficiently small.

Models such as those shown in Figs 5.1(b), (c), (e), (f) and 5.2 might not be considered geophysically realistic, and hence the computed bounds may be unduly pessimistic.
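The role of the total variation as a measure of structure can be illustrated with the discrete definition (5.2.12). The short sketch below evaluates it for a discretized version of the true model (5.3.1) and for a hypothetical bimodal model that alternates between the limits 0.5 and 2.0; the partition of 200 elements and the alternation pattern are assumptions chosen only for illustration.

```python
import numpy as np

def total_variation(m):
    """Total variation of a piecewise-constant model, following (5.2.12)."""
    return float(np.sum(np.abs(np.diff(m))))

M = 200
z_mid = (np.arange(M) + 0.5) / M                       # element midpoints on [0, 1]
m_true = 1.0 - 0.5 * np.cos(2.0 * np.pi * z_mid)       # discretized true model

# Hypothetical bimodal model: switches between the limits every 10 elements.
m_bimodal = np.where((np.arange(M) // 10) % 2 == 0, 0.5, 2.0)

v_true = total_variation(m_true)        # close to the true-model variation of 2.0
v_bimodal = total_variation(m_bimodal)  # 19 jumps of 1.5 each
print(v_true, v_bimodal)
```

A bound $V_b$ near 2.0 therefore admits the true model while excluding models of the rapidly alternating kind.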
Figure 5.1(g) shows the results of employing a variation bound $V_b = 2.0$, which represents the actual variation of the true model. A significant improvement in the resolution is apparent when Fig. 5.1(d) and (g) are compared. In this case there appears to be no minimum resolution width before the data influence the computed bounds. For instance, for an averaging width of $\Delta = 0.28$, the computed bounds are $0.86 \le \bar{m} \le 1.72$, while in Fig. 5.1(d), the computed bounds simply reflect the imposed limits $0.5 \le \bar{m} \le 2.0$. Figure 5.1(h) and (i) show constructed models which minimize and maximize $\bar{m}(z_0, \Delta = 0.2)$ for the variation bound $V \le 2.0$. These models do not exhibit excessive oscillations and might be considered to be more geophysically realistic.

Figure 5.2 The constructed model which minimizes $\bar{m}(z_0 = 0.5, \Delta = 1.0)$ with imposed model limits $0.5 \le m_i \le 2.0$. The true model is indicated by the dashed line. [Horizontal axis: depth $z$.]

To quantify the improvement in the bounds that results when the allowed variation is changed from $V_1$ to $V_2$, the 'percent improvement', $P$, is defined as

$$P(z_0, V_1, V_2, \Delta) = \left[ 1 - \frac{m^U_{V_2}(z_0, \Delta) - m^L_{V_2}(z_0, \Delta)}{m^U_{V_1}(z_0, \Delta) - m^L_{V_1}(z_0, \Delta)} \right] \times 100\%. \qquad (5.3.3)$$

In (5.3.3) the subscripts $V_1$ and $V_2$ indicate the total variation allowed in the extremal models. The results for $V_1 = \infty$ (no variation bound) and $V_2 = 2.0$ are shown in Fig. 5.3. For most averaging widths the funnel function bounds are improved by 30-40 percent.

Figure 5.3 The percent improvement, $P$, for $V_1 = \infty$ (no variation bound) and $V_2 = 2.0$, with model limits $0.5 \le m_i \le 2.0$. [Horizontal axis: averaging width $\Delta$.]

By reformulating the appraisal method to bound the total variation of the model, the analysis has been extended to include the variation as another dimension. Upper and lower bounds may now be considered as a function of both the averaging width and the model variation. Figure 5.4 shows the computed bounds as a function of the allowed variation for a fixed averaging width $\Delta = 0.2$.
No limits (except non-negativity) were placed on the model elements. The true model has a variation of $V = 2.0$ and an average value of $\bar{m}(z_0, \Delta = 0.2) = 1.47$; this point is indicated by a cross in Fig. 5.4. For large allowed variations the bounds are wide and the model average is poorly constrained. For instance, for a variation of $V = 14.0$, the model average is only known to lie within the bounds $0 \le \bar{m}(z_0, \Delta) \le 4.15$. As the allowed variation is decreased, the bounds converge smoothly. The upper bound decreases monotonically as the allowed variation is decreased from $V = 14.0$; however, it is not until the variation is less than about $V = 5.0$ that the lower bound increases from zero. At the true model variation of $V = 2.0$, the model average is known to lie within the bounds $0.73 \le \bar{m}(z_0, \Delta) \le 1.75$.

Reducing the allowed variation to a value less than 2.0 excludes the true model from the LP solution space and may result in computed bounds which do not contain the true model average. This point is also illustrated in Fig. 5.4. As $V$ is decreased below 2.0, the bounds continue to converge; for variations $V < 1.5$ the bounds no longer contain the true model average. The upper and lower bounds meet at a variation of $V = 0.75$.

Figure 5.4 Lower and upper bounds for $\bar{m}(z_0 = 0.5, \Delta = 0.2)$ as a function of the allowed total variation $V$ for the linear example with $z_0 = 0.5$, $\Delta = 0.2$. The true model average is indicated by the cross. [Horizontal axis: variation $V$.]

Figure 5.5 The $l_1$ minimum-variation model for the linear example. The model given by the solid line is constructed by minimizing the total variation. An identical model is constructed by minimizing or maximizing the model average $\bar{m}$ with a variation bound $V_b = 0.75$. The value $V = 0.75$ corresponds to the minimum model variation that is consistent with the model fitting the data. The true model is indicated by the dashed line. [Horizontal axis: depth $z$.]
For this variation the model average is known precisely since $m^L = 1.15 \le \bar{m} \le 1.15 = m^U$, but this value does not correspond to the true model average of $\bar{m} = 1.47$. This demonstrates that although it is important to use the best possible value for the variation bound, over-constraining the variation (or any other physical property) can cause misleading results.

The point at which the bounds meet represents the smallest possible variation which still permits an acceptable model. Any attempt to reduce the allowed variation below this value results in an inconsistency between the variation bound and the data constraints. The extremal model which achieves this minimum variation corresponds to the minimum-variation model described in Section 4.2.1. The minimum-variation model and the extremal models computed by minimizing and maximizing $\bar{m}$ for $V_b = 0.75$ are identical and are shown in Fig. 5.5.

The funnel function resolution depends on both the averaging width $\Delta$ and the allowed variation $V$. This dependence is illustrated in Fig. 5.6, in which contours of the normalized bound width $2(m^U - m^L)/(m^U + m^L)$ are plotted. The bound width increases with increasing variation and decreases with increasing averaging width. The best resolution occurs for large averaging widths and small allowed variations. In practice, a plot like Fig. 5.6 together with an examination of the extremal models constructed for various variation bounds should enable an interpreter to determine meaningful bounds on the average value of the model over the region of interest.

5.4 The linearized appraisal algorithm

The method of appraisal using extremal models of bounded variation can be applied to non-linear inverse problems such as MT by constructing the extremal models in an iterative linearized algorithm formulated so that the LP objective function is applied to the model at each iteration.
When the variation bound is formulated according to Method 2, Section 5.2.2, the numerical implementation of this procedure is similar in many regards to the minimum-variation model construction algorithm described in Sections 4.2 and 4.3, and will be described only briefly here.

Figure 5.6 Contours of the normalized bound width $2(m^U - m^L)/(m^U + m^L)$ as a function of averaging width $\Delta$ and variation $V$ for $z_0 = 0.5$. [Horizontal axis: averaging width $\Delta$.]

The major difference between the construction of minimum-variation models and extremal models of bounded variation is the objective function which is extremized. In the minimum-variation algorithm, the variation of the model is minimized at each iteration. To construct extremal models of bounded variation, the box-car model average $\bar{m}(z_0, \Delta)$ given by (5.1.8) and (5.1.9) is minimized or maximized at each iteration while the model variation is constrained according to (5.2.14). The data constraints are imposed as described in Section 4.2, and model element limits (5.1.10) are also included. To ensure that the linearization inherent in the data equations remains valid, the target $\chi^2$ misfit is reduced at each iteration as per (4.2.2) and the size of changes in the model between successive iterations is controlled as per (4.2.3). The $\chi^2$ misfit must be reduced to within a tolerance $t_{\chi^2}$ of the desired misfit for convergence.

Since the objective function applies to only a limited region of the model, there may be a variety of acceptable models with the same value of $\bar{m}(z_0, \Delta)$ but which differ in structural detail outside the region of extremization. It is found that the algorithm sometimes cycles between such models. Since it is the objective function value, not the extremal model per se, that is of interest here, there is no requirement that the total model change e given by (3.3.5) be reduced to some specified limit.
Rather, in addition to the model fitting the responses, it is required that the value of the objective function changes by less than a factor $f$ over three consecutive iterations for convergence. In our algorithm the desired misfit, the tolerance $t_{\chi^2}$ and the factor $f$ are parameters that are defined by the user. In practice, common values for these parameters are a desired $\chi^2$ misfit of $2N$ (the expected value for complex responses at $N$ frequencies), $t_{\chi^2} = 0.1$ and $f = 0.01$. The algorithm is usually initiated from a halfspace starting model and converges in about six to ten iterations. The LP package of Marsten (1981) is employed and the solution at one iteration is used as the starting basis for the subsequent iteration; this greatly reduces the computation time of the later iterations.

5.5 Appraisal of MT responses

This section presents a number of examples of the appraisal of MT responses using extremal models of bounded variation. The appraisal method is first demonstrated using the synthetic test case of Whittall & Oldenburg (1990) described in Section 3.3.1; in Section 5.5.2 the analysis is applied to a set of measured field responses.

5.5.1 Synthetic MT example

Acceptable $l_2$ and minimum-structure models constructed for the synthetic MT example when the model is taken to be $m(z) = \sigma(z)$ are shown in Figs 3.2(f) and 4.1(f). Both of these models indicate a (relative) high conductivity zone centred at about 1300 m depth with a width of about 1400 m. The true model has a conductivity of 0.04 S/m at these depths. The method of appraisal using extremal models can be used to compute upper and lower bounds for conductivity averages $\bar{\sigma}(z_0 = 1300, \Delta)$ of this region. Figure 5.7(a) shows the bounds $\sigma^U$ and $\sigma^L$ computed when model limits $0.002 \le \sigma_i \le 0.2$ S/m are imposed, but no constraint is placed on the total variation of the extremal models. The conductivity averages of the true model are indicated by the dashed line.
The upper bound decreases from the imposed limit of 0.2 S/m for $\Delta < 200$ m to a value of 0.0563 S/m at $\Delta = 1400$ m. The lower bound reflects the imposed lower limit of 0.002 S/m for $\Delta < 800$ m, then increases to 0.0175 S/m at $\Delta = 1400$ m.

Examples of the extremal models which produce the funnel function bounds of Fig. 5.7(a) are given in Fig. 5.8(a) and (b), which show the constructed extremal models which minimize and maximize, respectively, the model average $\bar{\sigma}(z_0 = 1300, \Delta = 800)$. These models have the characteristic sparse, spiky form of solutions of unconstrained variation, consisting of regions of conductivity at the imposed lower limit with narrow, isolated zones of high conductivity at or near the upper limit. All extremal models constructed to produce the bounds in Fig. 5.7(a) are of this form and have total variation values of 2.0 to 4.0 S/m, more than 10 times that of the true model. The extremal models which produced the funnel function bounds in Fig. 5.7 all have a misfit of $\chi^2 = 40.0$; the fit to the true responses for the constructed models in Fig. 5.8 is shown in the panels to the right.

Figure 5.7 Computed lower and upper bounds for $\bar{\sigma}(z_0 = 1300, \Delta)$ for the synthetic MT example. In (a) only model limits $0.002 \le \sigma_i \le 0.2$ S/m have been imposed; in (b) a variation bound of $V_b = 0.21$ S/m (the variation of the true model) was also included. The true model averages are indicated by the dashed line. [Horizontal axis: averaging width $\Delta$ (m).]

Figure 5.8 Extremal models constructed by (a) minimizing and (b) maximizing $\bar{\sigma}(z_0 = 1300, \Delta = 800)$ for the synthetic MT example with model limits $0.002 \le \sigma_i \le 0.2$ S/m but no bound on the total variation. The true model is indicated by the dashed line. The constructed models have a misfit of $\chi^2 = 40.0$; the fit to the true responses is shown in the panels to the right. [Horizontal axis of response panels: period $T$ (s).]

It is interesting to compare the LP extremal models with the theoretical results of Weidelt (1985). Weidelt analytically treated the full non-linear problem of extremizing the conductance function

$$S(z_2) = \int_0^{z_2} \sigma(z) \, dz$$

subject to exactly fitting a small number of MT responses. He determined that when no model limits (except non-negativity) were imposed, the extremal models consist of insulating zones ($\sigma = 0$) and thin regions of infinite conductivity, but finite conductance, located at isolated points. When $S(z_2)$ is maximized, a conducting region is located at $z_2 - 0$, which is just included in the region of integration, whereas when $S(z_2)$ is minimized, a conducting region is located at $z_2 + 0$, which is just excluded. When model limits $\sigma^- \le \sigma \le \sigma^+$ are imposed, Weidelt found that the extremal models consist of a sequence of sections of alternating conductivities $\sigma^-$ and $\sigma^+$. When $S(z_2)$ is maximized, a layer of conductivity $\sigma^+$ ends at $z = z_2$, whereas when $S(z_2)$ is minimized a layer of conductivity $\sigma^-$ ends at $z = z_2$.

The extremal models of unconstrained variation in Fig. 5.8 appear to illustrate the character of Weidelt's exact solutions. In Fig. 5.8(a), where $\bar{\sigma}(z_0 = 1300, \Delta = 800)$ is minimized, the constructed model reflects the lower bound over the region of minimization, and narrow zones of conductivity at the upper limit are just excluded from this region. In Fig. 5.8(b), where $\bar{\sigma}(z_0, \Delta)$ is maximized, narrow zones of conductivity at the upper limit are just included in the region of maximization. The discrepancies between the character of the LP extremal models and Weidelt's theory are likely due to the finite discretization interval that must be employed in a practical solution.

Weidelt's exact extremal models represent an interesting and important result.
However, if the purpose of the extremization is to determine meaningful bounds for $\bar{\sigma}(z_0, \Delta)$, these types of extremal models are not satisfactory. Figure 5.7(b) shows the result of imposing a variation bound of $V_b = 0.21$ S/m, which is the total variation of the true model. A significant improvement in the resolution at all averaging widths is apparent when Fig. 5.7(a) and (b) are compared. The computed bounds in Fig. 5.7(b) are within the imposed limits for all averaging widths $\Delta$ and converge smoothly as $\Delta$ increases. The percent improvement in the bound width, $P(z_0, V_1, V_2, \Delta)$ given by (5.3.3), is shown in Fig. 5.9 for $V_1 = \infty$ (no variation bound) and $V_2 = 0.21$ S/m. For most averaging widths the funnel function bounds are improved by 50-65 percent.

Examples of the extremal models which produced the improved funnel function bounds of Fig. 5.7(b) are given in Fig. 5.10(a) and (b), which show the models which minimize and maximize $\bar{\sigma}(z_0 = 1300, \Delta = 800)$ for $V_b = 0.21$ S/m. These constructed models do not exhibit the sparse, spiky form of the models in Fig. 5.8 and would likely be considered to be more plausible from a geophysical point of view.

The extremal models which produced the funnel function bounds of Fig. 5.7 were constructed by initiating the inversion algorithm from a 0.02 S/m halfspace starting model. However, a number of the extremizations were repeated with the algorithm initiated from a diverse variety of starting models. Although the extremal models constructed sometimes differed in minor detail, we have not found a case where the objective function value representing the model average differed significantly. This provides some confidence that a better extremal model cannot be found (at least for the partition used here) and that meaningful bounds have been computed. This aspect is considered further in Chapter 6.
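The percent improvement (5.3.3) and the normalized bound width of Section 5.3 reduce to simple arithmetic on pairs of funnel function bounds. As a small worked sketch (the function names are illustrative), the snippet below evaluates both measures using bounds quoted in Section 5.3 for the linear example: the limits-only and variation-bounded results at $\Delta = 0.28$, and the bounds at $\Delta = 0.2$ for allowed variations of 14.0 and 2.0.

```python
def percent_improvement(bounds_v1, bounds_v2):
    """Percent improvement P of (5.3.3); each argument is a (lower, upper) pair."""
    lo1, hi1 = bounds_v1
    lo2, hi2 = bounds_v2
    return (1.0 - (hi2 - lo2) / (hi1 - lo1)) * 100.0

def normalized_width(lower, upper):
    """Normalized bound width 2 (m^U - m^L) / (m^U + m^L)."""
    return 2.0 * (upper - lower) / (upper + lower)

# Linear example, averaging width 0.28, model limits 0.5 to 2.0:
# no variation bound gives (0.5, 2.0); V_b = 2.0 gives (0.86, 1.72).
p = percent_improvement((0.5, 2.0), (0.86, 1.72))   # about 42.7 percent

# Linear example, averaging width 0.2, non-negativity only.
w_large_v = normalized_width(0.0, 4.15)    # allowed variation V = 14.0
w_true_v = normalized_width(0.73, 1.75)    # allowed variation V = 2.0
print(p, w_large_v, w_true_v)
```

Note that the normalized width takes its maximum value of 2 whenever the lower bound is zero, regardless of the upper bound.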
The constructed models shown in Figs 3.2(f) and 4.1(f) exhibit a region of low conductivity centred at about 4000 m followed by a region of high conductivity centred at about 8000 m. To verify whether this structure is required of all acceptable models, the method of appraisal using extremal models may be used to compute upper bounds for $\bar{\sigma}(z_0 = 4000)$ and lower bounds for $\bar{\sigma}(z_0 = 8000)$. Figure 5.11(a) shows the computed bounds when model limits $\sigma^- = 0.002$, $\sigma^+ = 0.2$ S/m were imposed but the total variation was unconstrained. The computed lower bound for $\bar{\sigma}(z_0 = 8000)$ is lower than the upper bound for $\bar{\sigma}(z_0 = 4000)$ for all $\Delta$, indicating that without additional information the difference between the two regions cannot be resolved from the data in this manner.

It is interesting to note from Fig. 5.11(a) that $\sigma^L(z_0 = 8000, \Delta = 4000) = 0.002$ S/m, i.e. an acceptable extremal model can be constructed with $\sigma$ at the imposed lower limit of 0.002 S/m over the entire high conductivity region of the true model, 6000-10000 m, where $\sigma = 0.1$ S/m. This extremal model, shown in Fig. 5.12(a), exhibits conductive zones at the upper limit of 0.2 S/m just excluded at either edge of the region of minimization and illustrates the extent of the non-uniqueness of the inverse problem. To reduce the non-uniqueness and compute more meaningful funnel function bounds, the total variation may be constrained; the results of imposing a bound $V_b = 0.21$ S/m are shown in Fig. 5.11(b). For resolution widths greater than about 1500 m the computed lower bound for $\bar{\sigma}(z_0 = 8000)$ is greater than the computed upper bound for $\bar{\sigma}(z_0 = 4000)$, indicating that the region of low conductivity followed by a region of higher conductivity is clearly resolved and is a required feature of all acceptable models with a total variation $V \le V_b$. The constructed extremal model which minimizes $\bar{\sigma}(z_0 = 8000, \Delta = 4000)$ with $V_b = 0.21$ S/m is shown in Fig. 5.12(b). The extremal models which produced the funnel function bounds in Fig. 5.11 all have a misfit of $\chi^2 = 40.0$; the fit to the true responses for the constructed models in Fig. 5.12 is shown in the panels to the right.

Figure 5.9 The percent improvement, $P$, for $V_1 = \infty$ (no variation bound) and $V_2 = 0.21$ S/m for the synthetic MT example with $z_0 = 1300$ m. Model limits $0.002 \le \sigma_i \le 0.2$ S/m have been imposed. [Horizontal axis: averaging width $\Delta$ (m).]

Figure 5.10 Extremal models constructed by (a) minimizing and (b) maximizing $\bar{\sigma}(z_0 = 1300, \Delta = 800)$ for the synthetic MT example with model limits $0.002 \le \sigma_i \le 0.2$ S/m and a variation bound $V_b = 0.21$ S/m. The true model is indicated by the dashed line. The constructed models have a misfit of $\chi^2 = 40.0$; the fit to the true responses is shown in the panels to the right. [Horizontal axes: depth $z$ (m); period $T$ (s).]

Figure 5.11 The upper bound $\sigma^U(z_0 = 4000, \Delta)$ is compared to the lower bound $\sigma^L(z_0 = 8000, \Delta)$ as a function of averaging width $\Delta$ for the synthetic MT example. In (a) only model limits $0.002 \le \sigma_i \le 0.2$ S/m were imposed, while in (b) a variation bound of $V_b = 0.21$ S/m was also included. [Horizontal axis: averaging width $\Delta$ (m).]

Figure 5.12 Extremal models constructed by minimizing $\bar{\sigma}(z_0 = 8000, \Delta = 4000)$ with model limits $0.002 \le \sigma_i \le 0.2$ S/m. In (a) no variation bound was imposed; in (b) a variation bound $V_b = 0.21$ S/m was included. The true model is indicated by the dashed line. The constructed models have a misfit of $\chi^2 = 40.0$; the fit to the true responses is shown in the panels to the right. [Horizontal axes: depth $z$ (m); period $T$ (s).]

The final example of the appraisal method using the synthetic MT test case considers the effect of errors on the responses. Figure 5.13 shows the funnel function bounds computed for three levels of error. The error-contaminated data sets inverted in Fig. 5.13(a), (b) and (c) are the same as those considered in Figs 3.5 and 4.6 and result from the addition of a random error drawn from a zero-mean Gaussian distribution with a standard deviation of 2, 10 and 30 percent of the accurate response value, respectively. As would be expected, the bound width increases with the error level, indicating that the funnel function resolution deteriorates as the data become more imprecise.

Figure 5.13 The effect of data errors: funnel function bounds computed for $\bar{\sigma}(z_0 = 1300, \Delta)$ for the synthetic MT example are shown in (a), (b) and (c) when the responses are contaminated by Gaussian errors of 2, 10 and 30 percent, respectively. The true model average is indicated by the dashed line. [Horizontal axis: averaging width $\Delta$ (m).]

5.5.2 MT field data example

Figure 4.8 shows $l_2$ and $l_1$ minimum-structure models constructed for the LITHOPROBE data set measured in southeastern British Columbia, described in Section 3.4.1. These two models are in good agreement and show essentially the same features as those obtained by Jones et al. (1988). In particular, the models indicate a region of low conductivity at 2000-7000 m depth and a region of high conductivity at 20 000-30 000 m depth. These features will be appraised using the method of extremal models of bounded variation.

In Fig. 4.8 the model was taken to be $\log \sigma$ rather than $\sigma$ in order to recover conductivity variations over several orders of magnitude.
In order to appraise model features we still wish to determine bounds for the conductivity average $\bar\sigma$; however, to construct realistic extremal models of $\log\sigma$, we need to constrain the total variation of $\log\sigma$. This can be accomplished as follows. If the model $m(z)$ is taken to be $\log\sigma(z)$, the definition of the total variation given by (5.3.1) becomes

$$V(\log\sigma) = b \int_0^\infty \left| \frac{d\sigma}{\sigma} \right|, \tag{5.5.1}$$

where $b = \log e$. There are a number of ways that this can be approximated in discrete form and used to bound the variation of $\log\sigma$ at each iteration. We have found the most useful approximation of the log variation at the $k$th iteration to be

$$V(\log\sigma) = b \sum_{i=1}^{M-1} \frac{\left| \sigma_{i+1}^{k} - \sigma_{i}^{k} \right|}{\max\{\sigma_{i+1}^{k-1}, \sigma_{i}^{k-1}\}}. \tag{5.5.2}$$

Then the variation of $\log\sigma$ can be constrained by applying the LP formulation of Method 2, Section 5.2.2, with the variation bound (5.2.14) modified to be

$$b \sum_{i=1}^{M-1} \frac{p_i + q_i}{\max\{\sigma_{i+1}^{k-1}, \sigma_{i}^{k-1}\}} \le V_b, \tag{5.5.3}$$

where $p_i + q_i = |\sigma_{i+1}^{k} - \sigma_{i}^{k}|$ and $V_b$ is the limit on the log variation. We have found that formulating the bound in this manner preserves the character of the extremal models (e.g. it allows conductivity spikes to be just included or excluded from the region of extremization) while effectively limiting the log variation. However, since (5.5.3) is not an exact representation of the variation of $\log\sigma$, the actual log variation of the constructed model

$$V(\log\sigma) = \sum_{i=1}^{M-1} \left| \log\sigma_{i+1}^{k} - \log\sigma_{i}^{k} \right| \tag{5.5.4}$$

must be evaluated directly. Also, since the log variation constraint (5.5.3) depends on the model of the previous iteration, it is important that the iterations converge to a stable solution where changes in the model between iterations are negligible. Limiting the changes in the model between iterations according to (5.4.3) appears to be an effective manner of ensuring convergence to a stable solution.

Figure 5.14(a) shows the extremal model which maximizes $\bar\sigma$ over the apparent low conductivity region, 2000-7000 m depth.
Conductivity limits $\sigma^- = 0.0001$, $\sigma^+ = 1.0$ S/m were imposed, but no bound was included on the variation. The computed upper bound for $\bar\sigma$ is 0.0023 S/m and the log variation of the extremal model is 72. The model is sparse and spiky, consisting of insulating regions with conductivity at the imposed lower limit and narrow, isolated zones of high conductivity. A zone of high conductivity (one partition element wide) is just included at each edge of the region of maximization. Such a model is not appealing from a geophysical point of view. Figure 5.14(b) and (c) show extremal models with log variations of 18 and 7.9, respectively. The rapid fluctuations between low and high conductivity values have been suppressed. The upper bounds for $\bar\sigma$ computed from the models in Fig. 5.14(b) and (c) are 0.0022 and 0.0015 S/m. The models which minimize $\bar\sigma$ for this region simply reflect the imposed lower limit and are not shown; the computed lower bounds for regions of low conductivity are often not particularly meaningful since MT measurements contain little information about resistive layers.

It would be advantageous if an appropriate variation bound $V_b$ could be ascertained through analysis or from the physics of the problem. Unfortunately, this is seldom the case and for many practical problems an appropriate a priori bound for the variation may not be known. When this is the case, the interpreter may wish to construct extremal models for a number of variation bound values and select the model with the largest variation that is deemed geophysically plausible. In this manner the interpreter may make use of any knowledge or insight regarding the variation of the true model to select reasonable extremal models and meaningful bounds for the model average. For instance, the extremal model shown in Fig. 5.14(a) is not realistic; however, the model shown in Fig.
5.14(c) might be considered acceptable and therefore a meaningful upper bound for $\bar\sigma$ would be 0.0015 S/m.

Figure 5.14 Constructed extremal models which maximize $\bar\sigma$ for the apparent low conductivity region 2000-7000 m depth for the LITHOPROBE MT data set. Model limits $0.0001 < \sigma_i < 1.0$ S/m were imposed in each case. (a) shows the extremal model of unconstrained variation. This model has a log variation of 72 and a model average of $\bar\sigma = 0.0023$ S/m. (b) and (c) show the extremal models with log variations of 18 and 7.9 and model averages of 0.0022 and 0.0015 S/m, respectively. (Models are plotted against depth $z$ (m) and responses against $T$ (s).)

Extremal models which minimize and maximize $\bar\sigma$ for the apparent high conductivity region at 20 000-30 000 m depth are shown in Figs 5.15 and 5.16. Model limits $\sigma^- = 0.0001$, $\sigma^+ = 1.0$ S/m were imposed in each case. Figure 5.15(a) shows the extremal model of unconstrained variation which minimizes $\bar\sigma$. The lower bound for $\bar\sigma$ computed from this model is 0.070 S/m and the log variation is 71. Figure 5.15(b) and (c) show minimization models with log variations of 19 and 7.4, respectively. The lower bounds for $\bar\sigma$ computed from these models are 0.076 and 0.12 S/m, respectively. The extremal model of unconstrained variation which maximizes $\bar\sigma$ for the apparent high-conductivity region is shown in Fig. 5.16(a). The upper bound for $\bar\sigma$ computed from this model is 0.32 S/m and the log variation is 68. Figure 5.16(b) and (c) show maximization models with log variations of 21 and 7.5; the computed upper bounds for $\bar\sigma$ are 0.27 and 0.22 S/m, respectively. If the extremal models shown in Figs 5.15(c) and 5.16(c) are accepted as geophysically realistic representations of the Earth, bounds for the average conductivity are $0.12 < \bar\sigma < 0.22$ S/m.
This establishes the region 20 000-30 000 m depth as a zone of high conductivity. The average conductivity of this region is greater than that of the low conductivity zone at 2000-7000 m depth ($\bar\sigma < 0.0015$ S/m) by (at least) two orders of magnitude.

Figure 5.15 Constructed extremal models which minimize $\bar\sigma$ for the apparent high conductivity region 20 000-30 000 m depth for the LITHOPROBE MT data set. Model limits $0.0001 < \sigma_i < 1.0$ S/m were imposed in each case. (a) shows the extremal model of unconstrained variation. This model has a log variation of 71 and a model average of $\bar\sigma = 0.070$ S/m. (b) and (c) show the extremal models with log variations of 19 and 7.4 and model averages of 0.076 and 0.12 S/m, respectively. (Models are plotted against depth $z$ (m) and responses against $T$ (s).)

Figure 5.16 Constructed extremal models which maximize $\bar\sigma$ for the apparent high conductivity region 20 000-30 000 m depth for the LITHOPROBE MT data set. Model limits $0.0001 < \sigma_i < 1.0$ S/m were imposed in each case. (a) shows the extremal model of unconstrained variation. This model has a log variation of 68 and a model average of $\bar\sigma = 0.32$ S/m. (b) and (c) show the extremal models with log variations of 21 and 7.5 and model averages of 0.27 and 0.22 S/m, respectively. (Models are plotted against depth $z$ (m) and responses against $T$ (s).)

Chapter 6 Non-linear appraisal using simulated annealing

6.1 Introduction

In Chapter 5, two methods for appraising MT responses were described. Backus-Gilbert appraisal can be applied to the non-linear MT problem by linearizing about some reference model. Alternatively, bounds for conductivity averages can be obtained by constructing extremal models of bounded variation.
A significant  advantage of  the latter method is that the appraisal is not limited to models that are linearly close to a particular model. However, since the extremal models are constructed via (iterated) linearized inversion, the possibility of  the solution becoming trapped in a local extremum always exists. As previously noted, we have found  that initiating the construction algorithm from  a wide range of  starting models results in the same extremal value for  the model average. Also, the constructed extremal models have the same form  as the exact solutions of  Weidelt (1985). These facts  provide some confidence  that the algorithm generally converges to the global extremum (or at least an excellent approximation to it). This chapter presents a new method of  appraisal for  non-linear inverse problems which is not based on linearization. Rather, the method of  simulated annealing (Kirkpatrick et al. 1983) is applied to the problem of  constructing extremal models that reproduce a set of  MT responses. Simulated annealing is a Monte-Carlo optimization procedure which has been successfully applied to many problems in the field  of  combinatorial optimization (e.g. van Laarhoven & Aarts 1987); however, its application to geophysical model appraisal would seem to be new. Simulated annealing is based on an analogy with statistical mechanics and mimics the thermodynamical process by which liquids freeze  or metals cool and anneal to form  crystals, which represent the state of  minimum energy for  the system. A major advantage of  the method is its inherent ability to avoid being trapped in unfavorable  local minima. This feature  is of crucial importance to the application here. Although appraisal using simulated annealing is considerably less efficient  than linearized methods, it represents a general and interesting new appraisal technique which may be used to corroborate the results of  the extremal model analysis presented in Chapter 5. 
In the next section of this chapter, the method of simulated annealing and its analogy with statistical mechanics are briefly presented (for a comprehensive treatment, see Kirkpatrick et al. 1983 and Kirkpatrick 1984, or the monograph by van Laarhoven & Aarts 1987). In Section 6.3 the simulated annealing appraisal algorithm is described, and in Section 6.4 a number of examples of the analysis are presented and compared with the results of extremal model appraisal.

6.2 Simulated annealing

Simulated annealing is a mathematical optimization procedure that mimics the physical process of annealing. Annealing is the way in which crystals are grown: a substance is first heated to melting, then cooled very slowly until a crystal is formed. At high temperatures the molecules of the liquid move freely with respect to one another; as the liquid is cooled, thermal mobility is lost. If the liquid is cooled slowly enough that the system reaches an equilibrium (steady-state) configuration at each temperature in the cooling process, the atoms are able to line themselves up and form a single pure crystal that is completely ordered and represents the global minimum free-energy state for the system. If the liquid is cooled too quickly, it does not obtain this ground state and the resulting crystal may have many defects, or the substance may form a glass with no crystalline order; these configurations represent local minima in energy. The study of the physical systems on which the method of simulated annealing is based is the domain of statistical mechanics.

6.2.1 Statistical mechanics

Statistical mechanics provides methods of describing the (average) physical properties of a macroscopic system composed of many microscopic particles (e.g. atoms or molecules) in thermal equilibrium. Let each possible configuration of the system be defined by the set of $M$ parameters $\mathbf{r} = \{r_i;\ i = 1, \ldots, M\}$ which may represent, for example, the particle positions and velocities. A fundamental result of statistical mechanics is the Gibbs or Boltzmann probability distribution

$$P(\mathbf{r}) = \frac{1}{Z}\, e^{-E(\mathbf{r})/k_B T}, \tag{6.2.1}$$

which gives the probability $P$ of the system at (absolute) temperature $T$ being in configuration $\mathbf{r}$. In (6.2.1), $E(\mathbf{r})$ is the energy of the system in configuration $\mathbf{r}$, $k_B$ is Boltzmann's constant and the normalizing constant $Z$, called the partition function, is defined by

$$Z = \sum_{\mathbf{r}} e^{-E(\mathbf{r})/k_B T}, \tag{6.2.2}$$

where the sum is over all possible system configurations.

According to the Boltzmann distribution (6.2.1), the probability function for a system in equilibrium at (non-zero) temperature $T$ is distributed over all possible configurations $\mathbf{r}$. Thus, even at low temperature there is a small, but finite chance of the system being in a configuration corresponding to a high energy state. At non-zero temperature, the system configuration $\mathbf{r}$ is perturbed continuously due to thermal agitation. Of central importance here is the fact that, according to (6.2.1), perturbations to the system that increase the energy state are allowed, although they are less probable than fluctuations that decrease the energy. Thus the configuration sometimes undergoes transitions that are 'uphill' in energy, and it is these uphill transitions which allow the system to avoid being trapped in locally optimal configurations. As the temperature $T$ decreases, however, the Boltzmann distribution (6.2.1) assigns progressively greater probability to low-energy configurations, and significant uphill excursions become increasingly less likely. In the limit as $T \to 0$, the Boltzmann distribution collapses into the ground state for the system. This ground state often corresponds to a pure crystal which is completely ordered in all directions over distances up to billions of times the size of an individual atom and represents the global minimum-energy configuration for the system.
However, in practice, low temperature is not a sufficient condition for achieving this ground state. To reach the minimum-energy state the system must be cooled very slowly, and a long time spent in the vicinity of the freezing point. If this is not done and the system is allowed to get out of equilibrium, it will not obtain the ground state but rather forms a polycrystalline or amorphous state (glass) with no crystalline order and only metastable, locally optimal structure. These configurations represent local minima for the system energy state.

6.2.2 The Metropolis algorithm

Metropolis et al. (1953), in the earliest days of scientific computing, developed a simple algorithm which simulates the average behaviour of a physical system in thermal equilibrium. The system is parameterized by $M$ model parameters (e.g. the positions $\mathbf{r}$ of a collection of atoms). In each step of the algorithm, an atom is given a small random displacement and the resulting change in the energy of the system $\Delta E$ is computed. The probability of such a change occurring is assumed to be

$$P(\Delta E) = e^{-\Delta E / k_B T}. \tag{6.2.3}$$

If $\Delta E < 0$ (i.e. the transition has lowered the system energy) the probability according to (6.2.3) is greater than unity; in this case the change is arbitrarily assigned a probability $P = 1$, and the transition is always accepted. The case $\Delta E > 0$ (the system energy has increased) is treated probabilistically as follows. A random number $\xi$ is generated from a uniform distribution on the interval $[0, 1]$. If $\xi < P(\Delta E)$, the new configuration is retained; if not, the original configuration is used to start the next step. Repeating this basic step many times simulates the thermal motion of atoms at a temperature $T$. The system eventually reaches equilibrium, and because of the choice of $P(\Delta E)$ in (6.2.3) the probability of a given configuration $\mathbf{r}$ evolves into the Boltzmann distribution (6.2.1).
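The acceptance rule (6.2.3) can be illustrated with a minimal Python sketch. It assumes Boltzmann's constant is taken as 1 and a strictly positive temperature; the function names `metropolis_accept` and `acceptance_rate` are hypothetical and are not taken from the thesis.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng=random):
    """Decide whether a move with energy change delta_e is accepted.

    Sketch of (6.2.3) with k_B = 1 (an assumption); temperature must be > 0.
    """
    if delta_e <= 0.0:
        # Downhill (or neutral) moves are always accepted.
        return True
    # Uphill moves are accepted with probability exp(-delta_e / T).
    return rng.random() < math.exp(-delta_e / temperature)


def acceptance_rate(delta_e, temperature, trials=20000, seed=0):
    """Empirical fraction of accepted moves for a fixed energy change."""
    rng = random.Random(seed)
    hits = sum(metropolis_accept(delta_e, temperature, rng) for _ in range(trials))
    return hits / trials
```

For a fixed uphill step, `acceptance_rate` should approach $e^{-\Delta E/T}$ as the number of trials grows, so the same step that is accepted fairly often at high temperature is almost never accepted at low temperature.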
This general scheme of  always accepting a downhill step while sometimes accepting an uphill step based on a probability distribution has become known as the Metropolis algorithm and is used in statistical mechanics as a random sampling technique to estimate average properties or integrals of  the system (e.g. Barker & Henderson 1976; Binder 1978). 6.2.3 Combinatorial optimization using simulated annealing Kirkpatrick et al. (1983) devised an optimization procedure based on simulating the behaviour of  a system of  particles using the Metropolis algorithm, and applied it to a variety of  problems in the optimal design of  computer components. The problems they considered are examples of  combinatorial optimization: finding  the optimum value of  an objective function  defined for  a discrete, but factorially  large configuration  space which in practice cannot be explored exhaustively. No general exact solution is known for  such problems in which the computing effort  does not increase exponentially with the number of  parameters M;  therefore,  heuristic methods are often  employed. The most common general framework  used in heuristic solutions is known as iterative improvement. Iterative improvement begins with the system of  parameters in a known configuration.  A standard perturbation operation is applied to each part of  the system in turn until a new configuration  is found  that improves the objective function.  This new configuration  is then adopted and the process continued until no further  improvement can be found.  Iterative improvement is sometimes referred  to as a 'greedy' algorithm (e.g. Press et al. 1986) since it always proceeds downhill in the objective function.  Because of  this, the search often  gets trapped in a local minimum, and it is customary to repeat the procedure a number of times starting from  different  configurations  and save the best result. Kirkpatrick et al. 
(1983) developed simulated annealing as a new and general heuristic method for  combinatorial optimization problems. The method is based on an analogy between the many undetermined parameters of  the system to be optimized and the particles of  an imaginary physical system. The objective function  of  the optimization problem is considered analogous to the energy of  the physical system, with the ground state representing the optimal configuration sought in the optimization problem. The optimization procedure involves statistically modelling the evolution of  the physical system using the Metropolis algorithm at a series of  decreasing temperatures which allow it to anneal into a state of  minimum energy. Allowing perturbations to the system which increase the objective function  as well as those which decrease it according to the Metropolis criterion is crucial to escaping from  local minima. In simulated annealing, the temperature T  of  the physical system has no obvious equivalent in the system being optimized and simply acts as a control parameter in the same units as the objective or energy function  E (Boltzmann's constant kB is generally taken to be 1). The simulated annealing process begins with the system to be optimized in a known configuration and some procedure of  generating random perturbations or changes in the configuration.  The first  step is to completely 'melt' the system: i.e. to repeatedly perturb the configuration  at a high enough effective  temperature that essentially all changes are accepted according to the Metropolis criterion (6.2.3) regardless of  whether the objective function  E is decreased or increased. This process completely disorders the system and renders the solution independent of  the initial configuration.  The temperature is then reduced in slow stages allowing enough perturbations at each temperature that the system reaches equilibrium before  proceeding to a lower temperature. 
As the temperature is decreased, according to (6.2.3) the probability of  accepting configuration changes which increase the objective function  decreases. As the system configuration  begins to approximate the ground state, perturbations which decrease E become less frequent.  Finally, at a low temperature the system will 'freeze'  and no further  changes are accepted. The sequence of temperatures and the number of  perturbations to the configuration  attempted to reach equilibrium at each temperature is referred  to as an annealing schedule. An appropriate annealing schedule is generally problem specific  and may require trial-and-error experimentation. Also, determining the most effective  method of  perturbing the system and which factors  to incorporate into the objective function  require insight into the problem being solved and may not be obvious (Kirkpatrick et al. 1983). If  an appropriate annealing schedule is followed,  the configuration  at which freezing  occurs should approximate the global minimum for  the objective function.  Since controlled uphill as well as downhill steps are allowed, annealing is not a greedy algorithm like iterative improvement. Iterative improvement may be considered analogous to rapid cooling or 'quenching' of  a physical system in which energy is rapidly extracted, resulting in a local minimum of  the system energy (Kirkpatrick 1984). Taking many quenches will produce variations in this energy, but for  large systems this variation will generally be much smaller than the difference  between the quenched and ground states. It is not clear if  this difference  will be so great in practical combinatorial optimization problems. However, Kirkpatrick (1984) found  that simulated annealing always gave better solutions than exhaustive iterative improvement searches for  a number of  representative optimization problems. 
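The procedure described above (melt, cool in slow stages, stop when the system freezes) can be sketched on a toy problem. This is only an illustrative outline under stated assumptions: the objective `energy`, the move generator `propose`, the parameter values and the safety cap `max_stages` are all hypothetical choices, not the schedule used in the thesis.

```python
import math
import random


def energy(x):
    # Hypothetical toy objective on the integers 0..49 with several local minima.
    return 0.02 * (x - 35) ** 2 + 2.0 * math.sin(x)


def propose(x, rng):
    # Move one grid point left or right, staying inside 0..49.
    return min(49, max(0, x + rng.choice((-1, 1))))


def anneal(energy, propose, x0, t0=10.0, eps_t=0.95, sweeps=200,
           freeze_after=5, max_stages=2000, seed=0):
    """Simulated annealing with geometric cooling and a freezing test."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t, idle = t0, 0
    for stage in range(max_stages):
        changed = 0
        for step in range(sweeps):
            cand = propose(x, rng)
            e_cand = energy(cand)
            de = e_cand - e
            # Metropolis criterion with k_B = 1.
            if de <= 0.0 or rng.random() < math.exp(-de / t):
                if cand != x:
                    changed += 1
                x, e = cand, e_cand
                if e < best_e:
                    best_x, best_e = x, e
        # The system is 'frozen' once several consecutive stages accept nothing.
        idle = 0 if changed else idle + 1
        if idle >= freeze_after:
            break
        t *= eps_t
    return best_x, best_e
```

Starting the same routine with a very small `t0` rejects essentially every uphill move, which mimics the quenching behaviour of iterative improvement and makes the result depend on the starting point `x0`.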
Another interesting feature of simulated annealing is that configuration rearrangements generally proceed in a logical order. The temperature parameter $T$ distinguishes classes of rearrangements: changes which cause the greatest decrease in energy tend to occur at high temperatures, and these features become more permanent as $T$ is lowered; small-scale refinements in the configuration which reduce $E$ only slightly are generally deferred until low temperatures.

In order for simulated annealing to be efficient, calculating the change in the energy function $\Delta E$ should require much less computational effort than calculating the energy function $E$ itself (Kirkpatrick 1984). A method of improving the efficiency of simulated annealing for some discrete problems was developed by Rothman (1986), who applied annealing to the estimation of residual statics from noisy seismic reflection data (see also Rothman 1985). He developed a one-step Metropolis algorithm by computing the relative probabilities of acceptance for each possible parameter change a priori and forming system perturbations based on weighted guesses. These perturbations are always accepted and thus the inefficiency of high rejection rates at low temperatures is eliminated. However, for problems with a large number of possible parameter values, or parameters that vary continuously between given limits, this modification would not seem practical.

The final aspect of simulated annealing that will be discussed here is that the analogy between cooling a fluid and optimizing a function of many parameters may fail in an important respect. Whereas in an ideal system the atoms are all identical and the ground state is a regular crystal lattice, some optimization problems contain many distinct, non-interchangeable elements which make a regular solution unlikely. Also, conflicting objectives in the optimization problem may preclude a simple, well-ordered solution.
Optimization problems with these characteristics are termed 'frustrated' problems (Kirkpatrick et al. 1983). Physical analogies of frustrated systems exist (e.g. Kirkpatrick et al. 1983; Kirkpatrick 1984), but will not be discussed here. In physical systems, frustration introduces degeneracy into the low-temperature states of the model so that a number of near-ground-state configurations exist with essentially identical energies. Similarly, in optimization problems, frustration makes the search for the optimal solution much more difficult. However, the degeneracy induced by the frustration implies that there should be many equivalent solutions which closely approximate the absolute optimum. In practice, finding one of these solutions is sufficient (Kirkpatrick 1984).

6.3 The simulated annealing appraisal algorithm

In this section, a method of appraisal using simulated annealing is developed. The appraisal technique is applied here to the MT inverse problem. However, the method is general, requires no approximations (such as linearization) and can be applied to any inverse problem for which the forward problem can be solved. The method consists of formulating extremal model construction in terms of an optimization problem which may be solved using simulated annealing.

Press et al. (1986) summarize the elements required to apply simulated annealing to an optimization problem:

1) A description of the system and possible system configurations.
2) An energy or objective function $E$ to be minimized.
3) A method of randomly perturbing the system.
4) A temperature or control parameter $T$ and an appropriate annealing schedule.

The construction of extremal models may be formulated as a simulated annealing optimization problem as follows. The system to be optimized is taken to consist of the parameters $\{\sigma_i,\ i = 1, \ldots, M\}$ which form the discretized representation of the conductivity function $\sigma(z)$; this system is represented by $\boldsymbol{\sigma} = \{\sigma_i\}$.
The ensemble of possible system configurations is taken to be the set of all configurations $\{\sigma_i\}$ such that $\sigma_i^- < \sigma_i < \sigma_i^+$, where $\sigma_i^-$, $\sigma_i^+$ represent the lower and upper limits for the $i$th conductivity element. Since each $\sigma_i$ is allowed to vary continuously between its limits, there are an infinite number of possible configurations; the problem formulated here is therefore not strictly a combinatorial optimization problem. However, with an appropriate procedure of perturbing the system it is straightforward to apply simulated annealing to this problem. Vanderbilt & Louie (1984) present a method of applying simulated annealing to continuous problems when there are no limits for the system parameters.

The construction of an extremal model which minimizes a localized conductivity average $\bar\sigma(\Delta)$ may be formulated as an optimization problem by minimizing the objective function (6.3.1), where $R_j(\boldsymbol{\sigma})$ represents the responses predicted for configuration $\boldsymbol{\sigma}$ and $s_j$ is the standard deviation. In (6.3.1) the first term represents the difference between the achieved and desired $\chi^2$ misfit, the second term represents the model average to be minimized, and $\alpha$ and $\beta$ are trade-off parameters which determine the relative importance of the misfit and model average in the minimization. The trade-off parameters are varied to keep the misfit and model average of comparable importance throughout the optimization; two parameters are included to facilitate this process (determining values for $\alpha$ and $\beta$ is considered in detail later). The goal is to minimize (6.3.1) such that a model configuration $\boldsymbol{\sigma}$ is constructed with an acceptable misfit and the smallest possible value of $\bar\sigma$.
To construct an extremal model which maximizes $\bar\sigma$, the energy function (6.3.1) is modified to the form (6.3.2). The energy functions (6.3.1) and (6.3.2) generally lead to frustrated optimization problems since minimizing or maximizing the model average while achieving an acceptable misfit tend to be contradictory objectives. Therefore, near-optimum configurations may be degenerate and in practice it can be very difficult to achieve the global optimum. However, the degeneracy implies that there should be many solutions which closely approximate the absolute optimum and are equally acceptable (Kirkpatrick 1984).

The basic step at each temperature of the annealing schedule involves perturbing the system $\boldsymbol{\sigma}$, computing the resulting change in the objective function $\Delta E$, and accepting or rejecting the new configuration based on the Metropolis criterion (6.2.3). System perturbations involve randomly changing one or more conductivity elements. A conductivity element $\sigma_i$ is changed according to

$$\sigma_i = \sigma_L + \eta\,(\sigma_U - \sigma_L), \tag{6.3.3}$$

where $\eta$ is a random number from a uniform distribution on $[0, 1]$. In (6.3.3), $\sigma_L$ and $\sigma_U$ are initially taken to be $\sigma_i^-$ and $\sigma_i^+$ so that $\sigma_i$ can take on any value between its limits. After a sufficient number of temperature steps, large-scale structure of the solution becomes (relatively) fixed and extreme perturbations will inevitably be rejected. At this point $\sigma_L$ and $\sigma_U$ may be reset to $\sigma_i/2$ and $2\sigma_i$ (if these values are within the limits for $\sigma_i$). A system perturbation can involve changing just one element $\sigma_i$, with the perturbations sequentially cycling through the elements. Alternatively, random combinations of the elements may be changed in each perturbation. Combinations involve a random number of up to 5 elements which are chosen at random and changed according to (6.3.3).
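The perturbation rule (6.3.3) and the two ways of applying it can be sketched in Python as follows. The function names and the `local` flag are hypothetical, and clipping the narrowed range $[\sigma_i/2, 2\sigma_i]$ to the element's limits is an assumed reading of "if these values are within the limits".

```python
import random


def perturb_element(sigma, i, lower, upper, rng, local=False):
    """Return a copy of sigma with element i redrawn according to (6.3.3)."""
    if local:
        # Later stage: search between sigma_i/2 and 2*sigma_i,
        # clipped to the element's limits (an assumption).
        lo = max(lower[i], sigma[i] / 2.0)
        hi = min(upper[i], 2.0 * sigma[i])
    else:
        # Early stage: any value between the element's limits.
        lo, hi = lower[i], upper[i]
    eta = rng.random()  # uniform random number on [0, 1)
    new = list(sigma)
    new[i] = lo + eta * (hi - lo)
    return new


def perturb_combination(sigma, lower, upper, rng, max_elements=5, local=False):
    """Redraw a random combination of up to max_elements distinct elements."""
    count = rng.randint(1, min(max_elements, len(sigma)))
    new = list(sigma)
    for i in rng.sample(range(len(sigma)), count):
        new = perturb_element(new, i, lower, upper, rng, local)
    return new
```

A driver loop could alternate calls to `perturb_element` (cycling `i` through the partition) and `perturb_combination`, passing each candidate to the Metropolis test.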
We have found the most effective manner of perturbing the system involves alternating between changing an individual element and a random combination of elements and cycling through the system a number of times.

The most subtle aspects of implementing the simulated annealing algorithm involve determining an effective annealing schedule and selecting appropriate values for the trade-off parameters $\alpha$ and $\beta$. The annealing schedule developed here is based on suggestions by van Laarhoven & Aarts (1987). Since finding the best possible extremal value for $\bar\sigma$ is crucial, we have adopted a cautious approach to the schedule. An initial temperature $T_0$ is chosen so that at least 90 percent of the perturbations are accepted. This effectively 'melts' the starting model. The temperature is reduced according to the sequence

$$T_{i+1} = \epsilon_T T_i, \tag{6.3.4}$$

where $\epsilon_T$ is typically 0.99. At each temperature, the system is perturbed as described above until the system 'freezes' and no perturbations are accepted at a number of consecutive temperature steps. Although faster annealing schedules could likely be devised by reducing the number of perturbations at high and low temperatures, the schedule described here has proved very effective in our applications.

In addition to defining an appropriate annealing schedule, to construct extremal models it is essential to keep the misfit and model average of comparable importance in the objective function over the entire temperature range. To accomplish this, the trade-off parameter $\beta$ is varied with temperature. At high temperatures where $\boldsymbol{\sigma}$ changes freely, large changes in the misfit occur and initially the value of $\beta$ is taken to be large. At lower temperatures where structure which approximately reproduces the data becomes permanent in the system configuration, $\beta$ must be reduced.
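To give a feel for how slowly the geometric rule (6.3.4) cools, the short Python sketch below lists the first temperatures of a schedule and counts the steps needed to reach a target temperature. The function names and the example values of `t0` and `t_final` are hypothetical; only $\epsilon_T = 0.99$ comes from the text.

```python
import math


def cooling_schedule(t0, eps_t=0.99, n_steps=10):
    """First n_steps temperatures of the geometric schedule T_{i+1} = eps_t * T_i."""
    temps = [t0]
    for _ in range(n_steps - 1):
        temps.append(eps_t * temps[-1])
    return temps


def steps_to_reach(t0, t_final, eps_t=0.99):
    """Smallest n such that t0 * eps_t**n <= t_final (requires 0 < eps_t < 1)."""
    return math.ceil(math.log(t_final / t0) / math.log(eps_t))
```

With $\epsilon_T = 0.99$, lowering the temperature by three orders of magnitude takes 688 temperature steps, each of which involves many forward-problem evaluations; this is one way to see why the cautious schedule is expensive.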
In our formulation, $\beta$ is set to a large value at the initial temperature $T_0$ and reduced with decreasing temperature according to

$$\beta(T_{i+1}) = \epsilon_\beta\, \beta(T_i), \tag{6.3.5}$$

where $\epsilon_\beta < 1$. Appropriate values of $\beta(T_0)$ and $\epsilon_\beta$ are problem-dependent and may require some trial-and-error experimentation. Also, achieving a misfit $\chi^2(\boldsymbol{\sigma})$ precisely equal to $\chi_d^2$ is not required. If $\chi^2(\boldsymbol{\sigma})$ is within a specified tolerance $t_{\chi^2}$ of $\chi_d^2$, the model is considered to be acceptable and the value of $\alpha$ may be reduced by a prescribed factor (typically 10-100) so that more importance is allotted to extremizing $\bar\sigma$.

Unfortunately, the computation of $\Delta E$ for each perturbation requires a full solution to the forward problem at each frequency in the data set. Therefore, simulated annealing is considerably less efficient than linearized methods in constructing extremal models and can be quite slow. However, since simulated annealing is renowned for its ability to avoid unfavourable local minima, it provides a useful method of corroborating the results of linearized analysis.

6.4 Appraisal examples

To demonstrate the method of appraisal using simulated annealing and to compare this optimization procedure with the linearized approach of Chapter 5, consider the synthetic MT test case described in Section 3.3.1. This case was analyzed using linearized appraisal in Section 5.5.1; however, the problem is altered slightly here to reduce the computation times required by the simulated annealing method. The number of elements in the depth partition is reduced and the data set is halved by omitting the response at every second frequency. It will be seen that this does not significantly affect the computed results. As in Section 5.5.1, conductivity limits of $\sigma^- = 0.002$ S/m and $\sigma^+ = 0.2$ S/m are assumed.

Figure 6.1 shows the funnel function bounds computed by minimizing and maximizing $\bar\sigma(z_0, \Delta)$ for $z_0 = 1300$ m, i.e. the first (relative) high conductivity zone of the true model.
In Fig. 6.1, the solid line indicates the bounds calculated using simulated annealing, the dotted line indicates the bounds from the linearized analysis and the dashed line shows the true model averages. The lower and upper bounds are established by constructing extremal models which minimize or maximize $\bar{\sigma}(\Delta)$ for averaging widths $\Delta$ of 100, 200, 400, 600, 800, 1000, 1200 and 1400 m. The bounds computed using the two methods are almost indistinguishable over the entire range of $\Delta$, indicating that these methods produce virtually identical extremal values for $\bar{\sigma}$. The bounds produced by simulated annealing and linearized inversion are summarized in Table 6.1.

The simulated annealing solutions required, on average, almost two days of CPU time per extremization on a SUN 4/310 workstation, and represent a very careful approach to the annealing schedule to ensure that the best possible results are obtained. By comparison, the linearized extremizations required only about 3-5 minutes of computation time. Importantly, the extremal values computed from the linearized analysis are slightly better at each value of $\Delta$ than those achieved by the simulated annealing method (i.e. the linearized method yields larger upper bounds and smaller lower bounds). Although the difference is not significant, it does indicate that even an intensive application of simulated annealing could not obtain better extremal values than the linearized appraisal. This provides confidence that the linearized extremization yields meaningful bounds. Finally, comparing Fig. 6.1 with Fig. 5.7(a) indicates that the bounds computed in this example are essentially the same as those obtained for the larger data set and finer partition.

Figure 6.1 Computed lower and upper bounds for $\bar{\sigma}(z_0 = 1300, \Delta)$ for the synthetic MT example with model limits $0.002 < \sigma_i < 0.2$ S/m.
The solid line indicates bounds computed using simulated annealing appraisal, the dotted line indicates bounds from linearized appraisal, and the true model averages are indicated by the dashed line.

Table 6.1 Summary of upper and lower bounds computed for $\bar{\sigma}(z_0 = 1300, \Delta)$ using simulated annealing and linearized inversion to compute extremal models for the synthetic MT example.

$\Delta$ (m)    $\bar{\sigma}$ (S/m), minimization    $\bar{\sigma}$ (S/m), maximization
                annealing     linearized              annealing     linearized
 100            0.0022        0.0020                  0.200         0.200
 200            0.0022        0.0020                  0.200         0.200
 400            0.0022        0.0020                  0.118         0.119
 600            0.0022        0.0020                  0.0866        0.0884
 800            0.0022        0.0020                  0.0760        0.0767
1000            0.0134        0.0127                  0.0680        0.0696
1200            0.0174        0.0172                  0.0604        0.0616
1400            0.0211        0.0200                  0.0537        0.0545

It is interesting to examine the extremal models which produced the bounds shown in Fig. 6.1. Figures 6.2(a) and (b) show the models which minimize $\bar{\sigma}(z_0 = 1300, \Delta = 800)$ constructed via simulated annealing and linearization, respectively (the true model is indicated by the dashed line). The similarity of the solutions to depths greater than 6000 m is striking considering the completely different approaches of the two methods. The form of the models over this depth range is similar to Weidelt's (1985) theoretical solution to the extremization problem for a small number of exact data. Weidelt's solutions consist of thin conductive zones embedded in an insulating halfspace with conductive zones just excluded at either edge of the region of minimization (900-1700 m depth). This would seem to suggest that both the linearized and simulated annealing extremizations approach discretized approximations to the global extremum solution. The deep structure of the extremal models in Fig. 6.2(a) and (b) differs somewhat; this structure is well removed from the region of minimization and does not affect the extremal value for $\bar{\sigma}$.
Both constructed models have a misfit of $\chi^1 = 16.0$, and the measured and predicted responses are shown in the panels on the right. Figures 6.3(a) and (b) show the extremal models ($\chi^1 = 16.0$) which maximize $\bar{\sigma}(z_0 = 1300, \Delta = 800)$ constructed using simulated annealing and linearized inversion, respectively. Again, the solutions are very similar to depths greater than 6000 m, particularly near the region of maximization at 900-1700 m depth. These models resemble Weidelt's solution with thin conducting zones just included at either edge of the region of maximization. The similarity between the models constructed via simulated annealing and linearization and the correspondence in form with Weidelt's exact solution are characteristic of all the extremal models which produced the funnel function bounds shown in Fig. 6.1.

Figure 6.2 Extremal models constructed by minimizing $\bar{\sigma}(z_0 = 1300, \Delta = 800)$ for the synthetic MT example with model limits $0.002 < \sigma_i < 0.2$ S/m. (a) shows the model constructed using simulated annealing; (b) shows the solution from the linearized inversion. The two models are in excellent correspondence, particularly in the region where the conductivity is minimized. The constructed models have a misfit of $\chi^1 = 16.0$; the fit to the true responses is shown in the panels to the right. The true model is indicated by the dashed line.

A final example of appraisal using simulated annealing considers the LITHOPROBE MT data set described in Section 3.4.1. Minimum-structure models for this data set are shown in Fig. 4.8 and indicate a low conductivity region at 2000-7000 m depth. Upper bounds for this region were computed using the linearized appraisal algorithm (Fig. 5.14). Considering the full data set of complex responses at $N = 34$ frequencies and a fine depth partition ($M = 130$) represents a
demanding test for the simulated annealing appraisal algorithm. Model limits of $0.0001 < \sigma_i < 1.0$ S/m were imposed. The upper bound for $\bar{\sigma}$ computed using simulated annealing is 0.0021 S/m and the constructed extremal model is shown in Fig. 6.4(a). For comparison, an upper bound of 0.0023 S/m was computed using the linearized appraisal algorithm, and the extremal model is reproduced in Fig. 6.4(b). Both constructed models have a misfit of $\chi^1 = 95.0$. The features of the two extremal models are in good agreement in the region of maximization at 2000-7000 m. The conductivity remains near the lower limit in this region with narrow high-conductivity zones just included at either edge of the region of maximization.

Figure 6.3 Extremal models constructed by maximizing $\bar{\sigma}(z_0 = 1300, \Delta = 800)$ for the synthetic MT example with model limits $0.002 < \sigma_i < 0.2$ S/m. (a) shows the model constructed using simulated annealing; (b) shows the solution from the linearized inversion. The two models are in excellent correspondence, particularly in the region where the conductivity is maximized. The constructed models have a misfit of $\chi^1 = 16.0$; the fit to the true responses is shown in the panels to the right. The true model is indicated by the dashed line.

In all the cases we have considered, we have found that if a careful annealing schedule is followed, the extremal value for $\bar{\sigma}$ computed using simulated annealing is very close to that computed using the linearized appraisal algorithm. Importantly, however, we have never found simulated annealing to produce a better extremum than linearization. This indicates that the linearized approach produces excellent extremal values which in many cases may represent the best (discretized) approximation to the global extremum.
The similarity in form to Weidelt's exact extremal solutions would seem to support this conclusion.

In addition to the application to constructing extremal models for MT appraisal, the simulated annealing method developed in this chapter has enormous flexibility and can be applied to minimize or maximize any (linear or non-linear) functional of model and/or misfit in any inverse problem for which a solution to the forward problem exists. The computational efficiency of the annealing method depends directly on the efficiency of the forward solution.

Figure 6.4 Extremal models constructed by maximizing $\bar{\sigma}$ over the apparent low conductivity region at 2000-7000 m depth for the LITHOPROBE MT data set. Model limits are $0.0001 < \sigma_i < 1.0$ S/m. (a) shows the model constructed using simulated annealing; (b) shows the solution from the linearized inversion. The two models are in good correspondence near the region where the conductivity is maximized. The computed upper bounds for $\bar{\sigma}$ are 0.0021 S/m using simulated annealing, and 0.0023 S/m using linearized inversion. The constructed models have a misfit of $\chi^1 = 95.0$; the fit to the true responses is shown in the panels to the right.

Chapter 7

An application to MT monitoring of earthquake precursors

7.1 Introduction

Earthquake prediction is a challenging but important goal. Earthquakes may be preceded by anomalous changes in tilt, strain, seismic velocities, magnetic fields, and electrical conductivity. These precursor signals may precede the onset of an earthquake by a matter of hours, days or years. The dilatancy model of the Earth's crust provides an explanation of how precursory signals may occur prior to a seismic event. Dilatancy is an inelastic volume increase in stressed rock which occurs prior to fracture.
The volume increase results from new pores and microcracks forming and propagating within the rock due to the steady increase in tectonic stress. Water diffusing into the newly-created microcracks causes an increase in the electrical conductivity in the focal region of a forthcoming earthquake. It may be possible, therefore, to detect a change in the conductivity with time at suitably located sites. The lead time of the precursor depends on the magnitude of the earthquake and the distance between the epicentre and the recording site (Rikitake 1987). The precursory increase in conductivity may be as great as 30 percent (Sumitomo & Noritomi 1986).

Canadian studies using the magnetotelluric method to monitor earthquake precursors began in 1974 in Charlevoix County, Quebec, near the centre of seismicity on the north shore of the St. Lawrence River. A number of MT stations were established in the region, and the data collected have been analyzed by Kurtz & Niblett (1978). Their results showed an approximately 14 percent increase in the impedance tensor per year; however, there was no clear association between impedance changes and seismic activity.

Figure 7.1 Tectonic map of the earthquake precursor study site on Vancouver Island. Stars indicate earthquakes, solid circles indicate MT sites. The lower diagram shows a cross-section of the region with earthquake focal depths indicated.

The central region of Vancouver Island, where two major earthquakes have occurred this century, is another location in Canada where earthquake precursor studies are underway. Figure 7.1 shows the locations of the two earthquakes. The earthquake on the west coast of the island occurred in 1918 and had a magnitude of 7.0 (Cassidy et al. 1988). The earthquake on the eastern side occurred in 1946 with a magnitude of 7.3 (Rogers & Hasegawa 1978).
The focal depths are shown in the lower portion of  the figure  and correspond to approximately 15 km for the 1918 event and 30 km for  the 1946 event. The recurrence interval for  earthquakes of  this size in the Vancouver Island region is estimated to be approximately 40 years (Rogers, personal communication), hence, a seismic event may be imminent. Relevelling surveys show a change in the sense of  direction of  the vertical deformation  in the Campbell River region. The data indicate a relative uplift  of  4 mm/yr from  1977 to 1984, and a subsidence of  approximately the same magnitude from  1984 to 1988 (Dragert & Lisowski 1990). In response to the estimated earthquake recurrence interval and the change in levelling data, D. R. Auld and L. K. Law of  the Pacific  Geoscience Centre, Sidney, British Columbia, initiated a program in 1986 to investigate changes in electrical parameters with time. From 1986 to 1989 magnetotelluric data have been measured annually at a number of  sites in central Vancouver Island (the site locations are indicated by the solid circles in Fig. 7.1 and will be discussed in a later section). The goal of  this chapter is to interpret some of  the data from  this study by applying inversion procedures to the problem of  detecting precursory changes in the conductivity. Previous MT studies have evaluated precursor signals only in terms of  changes in the measured responses. Examining the responses may be sufficient  to detect that a change in the conductivity has occurred; however, it is difficult  to quantitatively interpret changes in the conductivity at depth simply by inspecting changes in the data. This is correctly formulated  as an inverse problem. Explicitly formulating  the inverse problem allows investigation of  the changes required in conductivity models of  the Earth corresponding to yearly variations in the responses. 
This may be much more informative in terms of evaluating the processes and depths involved in observed changes in the data. It may be important, for instance, to determine if changes are localized at the focal depth of previous earthquakes.

The model construction procedures developed in Chapters 3 and 4 can be applied effectively to this inverse problem. Minimum-structure models generally provide a reasonable representation of the conductivity structure. In addition, a model constructed from data measured one year may be used as the base model in a smallest-deviatoric inversion of responses measured in a subsequent year. This provides a direct method of investigating the changes required in the earth conductivity model to accommodate the yearly variations of the data.

In this chapter, MT inversion methods are applied to the earthquake precursor study of Auld and Law. The geological and tectonic setting of the study is briefly reviewed in Section 7.2. Section 7.3 describes the field experiment and Section 7.4 considers the temporal change in the responses. In Section 7.5 model construction techniques are applied to the MT field data in order to investigate yearly changes required in conductivity models of the Earth. Finally, Section 7.6 briefly considers an interpretation of the regional structure.

7.2 Geological and Tectonic setting

Vancouver Island is composed of a number of terranes that are part of a series of accreted terranes which make up the western segment of the Canadian Cordillera. The central Vancouver Island region is part of Wrangellia, a large composite terrane made up of volcanic, plutonic, sedimentary and metamorphic rocks of Paleozoic to Jurassic age. Overlying Wrangellia and underlying part of the study area is the Nanaimo Group, composed of conglomerates, sandstones, mudstones and shales of Late Cretaceous age.
This complex occurs beneath the Alberni Valley and along the eastern side of  Vancouver Island, extending from  approximately Nanoose Bay to the Campbell River region. The Nanaimo Group of  sedimentary rock extends to depths of  200 m beneath the Alberni Valley and to at least 500 m in the coastal region (Gabrielse et al. 1990). The tectonic setting of  central Vancouver Island is complex. The plate boundaries of  the northeast Pacific  region are shown in Fig. 7.1. Riddihough (1977) concluded that the subducting Juan de Fuca and Explorer oceanic sub-plates are interacting independently with the lithosphere beneath Vancouver Island. Hyndman et al. (1970) located a zone of  faulting,  known as the Nootka fault,  extending from  the northern end of  the Juan de Fuca ridge to the continental margin off  central Vancouver Island. It is likely that the tectonic forces  causing earthquakes on Vancouver Island are a result of  some form  of  stress coupling between the dynamics of the subduction zone and faults  in the crust. There are a large number of  old crustal faults  on Vancouver Island, with the dominant fault  pattern striking northwest-southeast. One of  these northwest trending faults,  the Beaufort  Range fault,  is within the current study area in the central region of  Vancouver Island. Earlier MT work on Vancouver Island has been carried out by Kurtz et al. (1990) as part of  the LITHOPROBE multi-disciplinary geoscience research program. Their results indicated a conducting zone at depths greater than 20 km beneath Vancouver Island which correlated with the top of  the seismic E-reflector.  This strong reflective  zone is believed to delineate the top of the Juan de Fuca plate (Green et al. 1986). The conducting zone is believed to result from  saline fluids  supplied by the subducting oceanic crust and dehydration reactions. Their MT recording sites were approximately 50 km to the southeast of  the present study area. 
7.3 Field experiment Figure 7.1 shows the location of  the four  magnetotelluric measurement sites of  the present experiment. The sites designated BRF and UBC, located on and slightly to the northeast of the Beaufort  fault  zone, were established in 1986. Sites designated HLN, located northwest of Campbell River, and SLW, west of  Sproat Lake, were established in 1989. Data were collected at each site once a year, at the same time each year, recording for  one to two months. Variations in five  components of  the natural EM fields  were measured and recorded digitally on a cassette recorder. The responses were processed using a robust technique for  magnetotelluric data developed by Egbert & Booker (1986). This method involves iterative reweighting of  the data to remove outliers and rejection of  Fourier harmonics which have less than a prescribed minimum power in the horizontal magnetic field  measurements. The robust processing method results in more accurate estimates of  the response functions  and significantly  reduced error estimates (e.g. Jones et al. 1989). The MT data collected in the present study are generally of  good quality with standard deviations varying from  3 to less than 1 percent. Data quality varied considerably from site to site and year to year. 7.4 Temporal change in responses It may be possible to detect changes in the Earth's conductivity prior to an earthquake by a change in the measured MT responses. The amount of  temporal change in the measured data at the two sites BRF and UBC has varied. An annual variation in apparent conductivity of  at least a few  percent would be expected due to a number of  factors  including ground water level and ground temperature (Xu 1986). As an example of  the amount of  year to year change in the measurements, Fig. 
7.2 shows the percentage change in the measured apparent conductivities and phases at BRF for the latest two years of data, 1988 and 1989 (the responses correspond to measurements of the north-south component of the magnetic field and the east-west component of the electric field). The change in apparent conductivity from 1988 to 1989 is of the order of a 5 percent decrease and varies somewhat with period. Between 1986 and 1987 the apparent conductivity increased by approximately 6 percent. Unfortunately, different instrumentation had to be used at BRF after 1987, which obscures the total change at this site over the four-year period. However, at the UBC site, the total change in apparent conductivity over the four years was about a 10 percent decrease: approximately a 6 percent decrease for 1986-87, no change for 1987-88, and approximately a 4 percent decrease for 1988-89. Unfortunately, the UBC site is located on about 500 m of high-conductivity sediments of the Nanaimo Group, which degrades the quality of the data at this site. Yearly changes in the phase measurements at both sites were observed, but are more difficult to interpret.

Figure 7.2 Change in observed response at site BRF between 1988 and 1989. (a) shows percentage change in apparent conductivity, (b) shows percentage change in phase.

It is difficult to determine exactly what change would be required in the MT responses to qualify as a clear earthquake precursor signal. For a precursor to be detected by monitoring apparent conductivity, the change would likely have to exceed the measured annual change of up to 6 percent (Auld, personal communication). To date, no changes have been observed which could be interpreted as a precursor to a large seismic event. For the duration of the experiment, no earthquakes above magnitude 3.0 have occurred in the central Vancouver Island region.
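The yearly percentages quoted above for the UBC site can be combined with a short calculation. The sketch below is only illustrative (the function names are hypothetical): it treats each yearly change as a multiplicative factor, so a 6 percent decrease, no change, and a 4 percent decrease compound to a little under 10 percent, in line with the total quoted in the text.

```python
def percent_change(old, new):
    """Percentage change from `old` to `new` (negative for a decrease)."""
    return 100.0 * (new - old) / old


def compound_percent_changes(changes):
    """Total percentage change produced by successive percentage changes."""
    factor = 1.0
    for change in changes:
        factor *= 1.0 + change / 100.0
    return 100.0 * (factor - 1.0)


# Yearly changes quoted for the UBC site: 1986-87, 1987-88, 1988-89.
total_ubc = compound_percent_changes([-6.0, 0.0, -4.0])
```

Here `total_ubc` evaluates to about -9.8 percent, since 0.94 x 1.00 x 0.96 = 0.9024.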
7.5 Temporal change required in conductivity models

In this section, model construction techniques are applied to determine changes in the conductivity models which are required by the yearly variations in the data. Figure 7.3 shows the $l_1$ and $l_2$ minimum-structure models constructed for the MT responses measured at site BRF in 1988 (this data set was chosen as it appears to be of the best quality). The responses consist of determinant averages of the impedance tensor measured at 18 periods between 93 and 2643 s, and are shown as apparent conductivities and phases in Fig. 7.3(b) and (c). The actual data set that was inverted consists of amplitudes and phases of the $R$ response computed from the impedance tensor, as described in Section 3.4.1. The $l_2$ model has a $\chi^2$ misfit of 36 and the $l_1$ model has a $\chi^1$ misfit of 29. These misfits represent the expected values for $\chi^2$ and $\chi^1$, respectively, and indicate that 1-D model solutions are justified. In fact, according to the $D^+$ criterion of Parker (1980), 1-D models are justified for each of the four years of data recorded at BRF at a misfit considerably less than the expected value for $\chi^2$.

An approximate method of determining if a change in the conductivity model is required between two years, say, 1988 and 1989, is to compute the misfit of the responses predicted for the 1988 model to the responses measured in 1989. If this misfit is less than the expected value of $\chi^2 = 36$, the constructed model for 1988 adequately fits the 1989 data set and no change in the conductivity model is required by the data. If the misfit is greater than $\chi^2 = 36$, a change in the model may be required. The responses predicted for the 1988 model misfit the data measured in 1989 by $\chi^2 = 47$, which indicates that a change in the conductivity model is likely required by the data. Likewise, the 1988 model misfits the 1987 responses by $\chi^2 = 177$, which indicates that a change in the model is required between these years.
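The misfit test described above can be sketched as follows. The sketch assumes that the misfits are the usual sums of standardized residuals and that the data errors are independent and Gaussian, in which case the expected values for $N$ data are $N$ for $\chi^2$ and $N\sqrt{2/\pi}$ for $\chi^1$. For the 36 data used here (amplitude and phase at 18 periods) these give 36 and approximately 29, matching the values quoted in the text. The function names are hypothetical.

```python
import math


def chi2_misfit(observed, predicted, std):
    """Sum of squared standardized residuals."""
    return sum(((o - p) / s) ** 2 for o, p, s in zip(observed, predicted, std))


def chi1_misfit(observed, predicted, std):
    """Sum of absolute standardized residuals."""
    return sum(abs(o - p) / s for o, p, s in zip(observed, predicted, std))


def expected_chi2(n_data):
    """Expected chi-squared misfit for independent Gaussian errors."""
    return float(n_data)


def expected_chi1(n_data):
    """Expected l1 misfit for independent Gaussian errors."""
    return n_data * math.sqrt(2.0 / math.pi)


def change_may_be_required(misfit_chi2, n_data):
    """True if last year's model misfits this year's data by more than the
    expected chi-squared value, so that a model change may be required."""
    return misfit_chi2 > expected_chi2(n_data)


n_data = 2 * 18  # amplitude and phase at 18 periods
flags = {year: change_may_be_required(misfit, n_data)
         for year, misfit in {1989: 47.0, 1987: 177.0}.items()}
```

Exceeding the expected value is only an approximate indicator, as the text notes, because the misfit is itself a random variable with a spread about its expected value.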
Finally, the 1988 model misfits the 1986 data by $\chi^2 = 18$, so no change in the model is required by the data.

Figure 7.3 Minimum-structure models constructed by inverting the data recorded at site BRF in 1988. The $l_2$ solution is a smooth model, while the $l_1$ solution represents a layered (discontinuous) model.

The above misfit analysis may be used to determine if two data sets require a change in the model. In order to obtain a representation of the change required, the smallest-deviatoric model may be constructed. As an example of this procedure, the 1988 $l_2$ minimum-structure model, shown in Fig. 7.3(a), is taken to be the base model. A model may then be constructed which fits the 1986 data, but deviates least from the 1988 base model; this model is shown in Fig. 7.4(a). The model is identical to the 1988 model, which confirms that no change in the solution is required for the 1986 and 1988 responses. This null result may be due to the (relatively) large uncertainties associated with the 1986 data, as shown in the panels to the right of the constructed model.

Figure 7.4(b) shows the smallest-deviatoric model constructed for the 1987 responses (solid line). The 1988 base model is also shown (dotted line). At most depths the two models are indistinguishable, indicating that the structure of the 1988 model is consistent with the 1987 data. However, the models differ in the high conductivity zone at about 10-15 km depth. The 1987 data require a slightly higher conductivity in this region. Unfortunately, it is not known to what extent the change in instrumentation between 1987 and 1988 affects these results. The smallest-deviatoric model (solid line) constructed for the 1989 responses as well as the 1988 base model (dotted line) are shown in Fig. 7.4(c).
Again, at most depths the two models are indistinguishable, but they differ slightly in the high conductivity zone at about 10-15 km depth. Even though this difference is small, it indicates that the 1989 data require a slightly lower conductivity in this region. This agrees with the observation that the apparent conductivities decreased slightly from 1988 to 1989, and serves to illustrate the smallest change in the conductivity models (measured in the $l_2$ norm) compatible with the change in the data. Also, the inversion procedure takes into account the amplitude and phase information, in contrast to evaluating changes solely in terms of the apparent conductivities.

The changes in the constructed models indicated in Fig. 7.4(b) and (c) show that the data require a decrease in the conductivity of the conductive zone at about 10-15 km depth from 1987 to 1988 and from 1988 to 1989. The percentage change in the conductivity, averaged over three partition elements at the peak of the conductive zone, indicates a 14 percent decrease between 1987 and 1988 and a 4 percent decrease between 1988 and 1989.

Figure 7.4 Smallest-deviatoric models for site BRF with the 1988 minimum-structure model as base model. (a), (b) and (c) show the smallest-deviatoric models constructed by inverting the 1986, 1987 and 1989 responses, respectively.

As mentioned previously, the yearly variations in data described in Section 7.4 would likely not be judged significant as an earthquake precursor signal.
However, if changes in the conductivity model are required at depths which coincide with tectonic stress or with the focal depths of previous earthquakes, special attention might be paid to even small conductivity changes. Unfortunately, the depth of the conductive zone where changes are indicated is not reliably determined in this study. Near-surface inhomogeneities in conductivity can introduce a static shift into the measured apparent conductivities, which has the effect of displacing the conductivity as a function of depth (e.g. Jiracek 1988). Kurtz et al. (1990) carried out MT measurements at 25 locations about 50 km to the southeast of the present study. They considered 1-D models at each site based on the inversion of Fisher & LeQuang (1981) in order to construct a 2-D conductivity model of the region. The 1-D models Kurtz et al. obtained included a high-conductivity zone similar to the models in Fig. 7.3(a). In their study, the depth to this zone varied from less than 10 km to more than 40 km, and they attributed this variation primarily to static shift effects. It is likely that the depth of the conductive zone is no better constrained in the present study. Therefore, in order to use model construction techniques to reliably determine the depths of required conductivity changes, an independent measurement of the static shift is required (e.g. Sternberg et al. 1988).

7.6 Regional interpretation

An interesting difference between the models obtained in the current study and those obtained by Kurtz et al. (1990) is the presence of a secondary conductive zone shown at about 60 km depth in Fig. 7.3(a). Kurtz et al. interpreted the single conductive zone in their models as coinciding with a strong seismic reflective zone, known as the E-reflector, which is believed to delineate the top of the subducting Juan de Fuca plate. However, they saw no evidence of a similar, but shallower, reflective zone known as the C-reflector.
The C-reflector is believed to be associated with older oceanic lithosphere that was underplated to the base of the Island after a westward jump in subduction occurred in the Late Eocene (Keen & Hyndman 1979). Given the uncertainty associated with the true depth of the conductive features, several interpretations are possible for the two conductors indicated in the present study. One possible interpretation is that the shallower conductive zone is associated with the C-reflector and the deeper conductor with the E-reflector. Alternatively, the C-reflector may not be associated with a conductive feature, as Kurtz et al. (1990) believe. In this case the shallower conductor may correspond to the E-reflector and the secondary conductive region may be a minor feature which cannot be associated with any prominent seismic event.

Figure 7.5 investigates these possibilities using $l_2$ minimum-structure models constructed by inverting the responses measured at sites BRF (1988), UBC (1988) and HLN (1989). Figure 7.5(a) shows the model constructed by inverting (unaltered) responses from site BRF. Figure 7.5(b) shows the model constructed by inverting the BRF responses which had been altered to simulate the static shift in apparent conductivity that results in the shallow conductor occurring at 18 km depth. This is the approximate depth of the C-reflector at the site. The model in Fig. 7.5(c) was constructed by inverting responses which were altered to simulate the static shift in apparent conductivity that yields the shallow conductor at 35 km depth, the approximate depth of the E-reflector. Figure 7.5(d), (e) and (f) show a similar series of models constructed by inverting unaltered and altered responses from the UBC site, and Fig. 7.5(g), (h) and (i) show the same series for the HLN site. The data sets at sites UBC and HLN were fit to a $\chi^2$ value somewhat smaller than the expected value.
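The size of the static shift needed to move a conductor to a chosen depth can be estimated with a simple scaling argument. The sketch below assumes an idealized, frequency-independent static shift of a 1-D response: multiplying the apparent conductivity at all periods by a constant factor $c$ (with phases unchanged) corresponds to a model whose conductivities are multiplied by $c$ and whose depths are multiplied by $1/\sqrt{c}$. The function name and the numerical values are hypothetical, and this is not necessarily how the responses were altered in this study.

```python
def static_shift_for_depth(current_depth, target_depth):
    """Return (conductivity_factor, depth_factor) for an idealized static
    shift that moves a feature from current_depth to target_depth.

    depth_factor multiplies every depth in the 1-D model;
    conductivity_factor multiplies the apparent (and model) conductivities.
    """
    depth_factor = target_depth / current_depth
    conductivity_factor = 1.0 / depth_factor ** 2
    return conductivity_factor, depth_factor


# Hypothetical example: a shallow conductor found at 13.5 km is moved to
# 18 km; a second conductor at 60 km is displaced by the same depth factor.
c_factor, z_factor = static_shift_for_depth(13.5, 18.0)
second_conductor_depth = 60.0 * z_factor
```

Because every depth is stretched by the same factor, fixing the depth of the shallow conductor also fixes the depth of the deeper one, which is what allows the two interpretations to be tested against the reflector depths.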
Fitting these data sets slightly more closely than the expected misfit appears to be justified, as it enhances the structural similarities common to all three sites and does not appear to introduce structure generated by data noise.

The models constructed from the unaltered responses at the three sites, shown in Fig. 7.5(a), (d) and (g), are of similar character: resistive at shallow depths with two conductive peaks and a conductive halfspace at depth. The depths to the conductive features are not in agreement between the three stations. The models constructed so that the first conductive zone is at 18 km depth, shown in Fig. 7.5(b), (e) and (h), have the second conductor at approximately 80, 60 and 60 km depth for sites BRF, UBC and HLN, respectively. These depths do not agree with the depth of the E-reflector (35 km), so the interpretation of the two conductors corresponding to the C- and E-reflectors does not seem to be borne out.

Figure 7.5 Minimum-structure models for the data sets measured at sites BRF (1988), UBC (1988) and HLN (1989). (a) shows the model constructed by inverting the (unaltered) responses from site BRF. In (b) the BRF data were altered to simulate the static shift in apparent conductivity that results in the shallow conductor occurring at 18 km depth (the approximate depth of the C-reflector). In (c) the responses were altered to simulate the static shift that yields the shallow conductor at 35 km depth (the depth of the E-reflector). (d), (e) and (f) show a similar series of models constructed by inverting unaltered and altered responses from the UBC site; and (g), (h) and (i) show the same series for the HLN site. Panels (c), (f) and (i) also show a cross-section through the 2-D model of Kurtz et al. (1990) as the dashed line.

Figure 7.5(c), (f) and (i) show the models for the three sites constructed so that the first conductive zone is at 35 km. In these panels a cross-section through the 2-D conductivity model of Kurtz et al.
(1990) closest to the sites of the present study is included as a dotted line. In this case the secondary conductive zone appears as a minor feature associated with the increase in conductivity at depth. This would appear to be the most likely interpretation.

Chapter 8

A modified linearized inversion algorithm

8.1 Introduction

In Chapter 2 complete Fréchet differential series were derived for several choices of MT response. The ratio of the higher-order (non-linear) terms to the linear term was used to quantify the relative linearity in order to determine which response yields the most accurate linearized expansion. Another goal in deriving expressions for the higher-order terms was to investigate whether the additional information contained in these terms could be used to improve the efficiency of an inversion scheme while still making use of linear inverse theory to provide the solutions. A method of accomplishing this by approximating the higher-order terms at the current model is developed in this chapter. The corrections consist of successively approximating the linearization error or remainder term in order to approximate a response functional for which the inverse problem is exactly linear. This method would seem to represent a novel approach to linearized inversion which can be implemented as a practical algorithm. Correcting the linearized solution at each iteration can reduce the number of iterations and total computational effort required to converge to an acceptable model. In addition, a correspondence between the corrected linearized solutions and iterations of the modified Newton's method for operators is established.
8.2 Correcting the linearized inversion

In Section 2.2.1 a complete expansion for the MT response $R(\sigma, z_m = 0)$ about a starting model $\sigma_0$ was derived in terms of a constant, linear and remainder term:

$$R_j(\sigma,0) = R_j(\sigma_0,0) + \int_0^\infty G_j(\sigma_0,z)\,\delta\sigma(z)\,dz - \int_0^\infty \frac{i\omega}{\mu_0}\,G_j(\sigma_0,z)\,\left[R_j(\sigma,z) - R_j(\sigma_0,z)\right]^2 dz, \qquad j = 1,\ldots,N, \tag{8.2.1}$$

where $G_j(\sigma_0,z)$ represents the first Fréchet kernel, given by (2.2.17), and the subscript $j$ indicates an implicit dependence on frequency, $j = 1,\ldots,N$. The last term on the right side represents the linearization error or remainder term which contains the higher-order contributions. This term is neglected in a linearized approach; however, by retaining the remainder term, equation (8.2.1) may be rearranged to give

$$R_j(\sigma,0) - R_j(\sigma_0,0) + \int_0^\infty G_j(\sigma_0,z)\,\sigma_0(z)\,dz + \int_0^\infty \frac{i\omega}{\mu_0}\,G_j(\sigma_0,z)\,\left[R_j(\sigma,z) - R_j(\sigma_0,z)\right]^2 dz = \int_0^\infty G_j(\sigma_0,z)\,\sigma(z)\,dz. \tag{8.2.2}$$

This equation is in the form of a Fredholm integral equation of the first kind which may be solved for $\sigma(z)$; the remainder term is included on the left side as a component of a modified response functional. This equation is exact: no terms have been neglected and there is no requirement for $\delta\sigma = \sigma - \sigma_0$ to be small. Thus, the left side of (8.2.2) corresponds to a choice of response functional which may be expressed (exactly) as a linear functional of $\sigma$. If $R_j(\sigma,z)$ is known for all depths $z$, the modified responses may be evaluated and (8.2.2) inverted to construct a conductivity model $\sigma_1(z)$ using linear inverse theory. It should be noted that even in this ideal case when an exact equation is inverted, $\sigma_1(z)$ is not guaranteed to reproduce the measured responses, i.e. $R(\sigma_1,0) = R(\sigma,0)$ is not guaranteed. Rather, since the inversion has been formulated as a linear problem, the linear functionals of $\sigma$ and $\sigma_1$ will be identical, i.e.

$$\int_0^\infty G_j(\sigma_0,z)\,\sigma_1(z)\,dz = \int_0^\infty G_j(\sigma_0,z)\,\sigma(z)\,dz, \qquad j = 1,\ldots,N. \tag{8.2.3}$$

However, for responses measured at a reasonable coverage of frequencies over a wide bandwidth, (8.2.3) represents a stringent condition requiring $\sigma_1$ to resemble the true model $\sigma$, and the constructed model will generally reproduce the measured responses.

Figure 8.1 shows an example of this inversion procedure for the synthetic test case described in Section 3.4.1. Figure 8.1(a) shows the true model (dashed line) and the starting model (solid line) which consists of a halfspace of conductivity 0.02 S/m. The corresponding misfit for this model is $\chi^2 = 76\,700$; the measured responses (represented as apparent conductivity and phase) and those predicted for the starting model are shown in the panel to the right. The modified response functionals, given as the left side of (8.2.2), were computed and this equation was inverted for the $l_2$ flattest conductivity model. The model constructed is shown in Fig. 8.1(b) and achieves the desired misfit of $\chi^2 = 50.0$ in one iteration; the fit to the data is shown in the panel to the right. The standard linearized solution neglects the higher-order information contained in the remainder term and requires a number of iterations to converge to an acceptable solution, as shown in Fig. 3.2 of Section 3.4.1.

Figure 8.1 One-step inversion for exact linearization. (a) shows the starting model (misfit $\chi^2 = 76\,700$) and (b) shows the $l_2$ flattest model (misfit $\chi^2 = 50.0$) constructed in one iteration when the responses are corrected for the exact remainder term. The responses predicted for the starting model and the constructed model are compared with the measured data in the panels on the right.
Of course, in any practical problem the electromagnetic fields are not measured at all depths, so $R(\sigma,z)$ and therefore the remainder term are generally not available. However, an approximate method of correcting the linearized solution can be developed which is applicable when only surface measurements are available. To formulate the method, (8.2.2) is rewritten in recursive form as

$$R_j(\sigma,0) - R_j(\sigma_0,0) + \int_0^\infty G_j(\sigma_0,z)\,\sigma_0(z)\,dz + \int_0^\infty \frac{i\omega}{\mu_0}\,G_j(\sigma_0,z)\,\left[R_j(\sigma_k,z) - R_j(\sigma_0,z)\right]^2 dz = \int_0^\infty G_j(\sigma_0,z)\,\sigma_{k+1}(z)\,dz, \qquad k = 0,1,\ldots, \tag{8.2.4}$$

where $\sigma_{k+1}$ on the right side represents the model that is constructed at the $k$th step and $R(\sigma_k,z)$ in the remainder term is used to approximate $R(\sigma,z)$. At step $k = 0$, the remainder term approximation of (8.2.4) is zero and the expression reduces to the standard linearized equation that may be inverted to produce a model $\sigma_1$. For an $l_2$ norm solution this requires computing the responses $R_j(\sigma_0,0)$ and kernel functions $G_j(\sigma_0,z)$, and performing a singular value decomposition (SVD) of the inner product matrix T, as described in Chapter 3. The constructed model $\sigma_1$ should be a better approximation to the true model than the starting model $\sigma_0$; therefore, $R(\sigma_1,z)$ is used in approximating the remainder term in step $k = 1$ and (8.2.4) is inverted for a new model $\sigma_2$. Note that step $k = 1$ requires only the computation of $R(\sigma_1,z)$ and the integration of the remainder term to update the left side of (8.2.4). The kernel functions are still evaluated at $\sigma_0$ and may be stored at the previous step and retrieved. In fact, the eigenvalues and eigenvectors of T computed at the previous step may also be retrieved so an SVD is not required in this step. This represents a substantial saving in computational effort since the forward modelling required to compute the kernels and the SVD of T are the most computationally intensive processes in an inversion iteration.
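The saving from retrieving a stored decomposition can be sketched with a generic linear system. This is a minimal illustration under stated assumptions, not the thesis code: the matrix `T` is a random symmetric positive-definite stand-in for the inner product matrix, `d0` and `d1` are hypothetical stand-ins for the uncorrected and corrected responses, and the function names are invented for this example.

```python
import numpy as np

def factor_inner_product_matrix(T):
    """Eigen-decompose the symmetric matrix T once (the expensive step)."""
    eigvals, eigvecs = np.linalg.eigh(T)
    return eigvals, eigvecs

def solve_with_stored_factors(eigvals, eigvecs, d, rel_cutoff=1e-10):
    """Solve T @ alpha = d from stored factors, discarding tiny eigenvalues."""
    keep = eigvals > rel_cutoff * eigvals.max()
    coeffs = (eigvecs[:, keep].T @ d) / eigvals[keep]
    return eigvecs[:, keep] @ coeffs

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
T = A @ A.T + 6.0 * np.eye(6)            # hypothetical stand-in for the inner product matrix
d0 = rng.standard_normal(6)              # stand-in for the responses used at step k = 0
d1 = d0 + 0.1 * rng.standard_normal(6)   # stand-in for the corrected responses at step k = 1

eigvals, eigvecs = factor_inner_product_matrix(T)          # computed once and stored
alpha0 = solve_with_stored_factors(eigvals, eigvecs, d0)   # step k = 0
alpha1 = solve_with_stored_factors(eigvals, eigvecs, d1)   # step k = 1 reuses the factors
```

Only the right-hand side changes between the two solves, so the second solve costs two matrix-vector products rather than a new decomposition.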
Thus, step $k = 1$ is simply a repeat of the inversion performed at step $k = 0$ with only the responses modified. The modification amounts to an approximate correction for the higher-order terms that were neglected in step $k = 0$. This correction can be repeated with $\sigma_2$ (the model produced by step $k = 1$) used in the remainder term approximation to produce a model $\sigma_3$ at step $k = 2$, and so on. The series of steps $k = 1,2,\ldots$ represent sequential corrections to the responses used in the initial linearized inversion (step $k = 0$) with the corrected response functionals approximating more and more closely the modified responses of (8.2.2) for which the inverse problem is exactly linear. This method of successively correcting the responses and repeating the same linearized inversion is in contrast to the standard procedure of updating the model and kernels and performing a new linearized inversion at each iteration.

The new method of treating (8.2.2) is reminiscent of the Born approximation for integral equations used in scattering theory (e.g. Morse & Feshbach 1953, p. 1073). Since $R(\sigma,z)$ in the remainder term is unknown, it is initially set equal to the assumed starting value $R(\sigma_0,z)$. This is, in effect, the linearization approximation and allows the construction of a model $\sigma_1(z)$ using linear inversion methods. However, rather than simply repeating the process with the new model $\sigma_1$ as the starting model, $\sigma_1$ is used to compute an updated (non-zero) remainder term using $R(\sigma_1,z)$ to approximate $R(\sigma,z)$, and the initial linearized inversion is repeated with the responses corrected for the new remainder approximation. Before examples of this procedure are presented, however, an interesting correspondence is derived.
To obtain an alternate expression for the remainder term approximation in (8.2.4), consider the remainder term in the expansion of $R(\sigma_k,0)$ about $\sigma_0$:

$$-\int_0^\infty \frac{i\omega}{\mu_0}\,G_j(\sigma_0,z)\,\left[R_j(\sigma_k,z) - R_j(\sigma_0,z)\right]^2 dz = R_j(\sigma_k,0) - R_j(\sigma_0,0) - \int_0^\infty G_j(\sigma_0,z)\,\sigma_k(z)\,dz + \int_0^\infty G_j(\sigma_0,z)\,\sigma_0(z)\,dz. \tag{8.2.5}$$

Substituting this expression into the formulation for the remainder term correction (8.2.4) leads to

$$R_j(\sigma,0) - R_j(\sigma_k,0) + \int_0^\infty G_j(\sigma_0,z)\,\sigma_k(z)\,dz = \int_0^\infty G_j(\sigma_0,z)\,\sigma_{k+1}(z)\,dz, \qquad k = 0,1,\ldots \tag{8.2.6}$$

Equation (8.2.6) represents an alternative formulation for the sequence of corrections that is even simpler to evaluate than (8.2.4). For $k \ge 1$ only $R(\sigma_k,0)$ and the integral $\int_0^\infty G(\sigma_0,z)\,\sigma_k(z)\,dz$ must be computed to update the left side of (8.2.6). $R(\sigma_k,0)$ is generally available from the previous step (where it is computed in order to determine the misfit associated with $\sigma_k$) and the integral is very efficiently computed if $\sigma_k$ is taken to be a piece-wise constant function of depth on the partition $\{z_0, z_1, \ldots, z_M\}$ and values of $\int_{z_{i-1}}^{z_i} G(\sigma_0,z)\,dz$ are stored at step $k = 0$. By comparison, updating the modified response functional formulated according to (8.2.4) requires calculating $R(\sigma_k,z)$ for all depths $z$, and computing the remainder term requires numerical integration that is much less efficient. In either formulation, once the responses are updated in correction step $k \ge 1$, the inversion is very efficient since computation of the kernel functions and the SVD of the inner-product matrix are required only at step $k = 0$.

It is interesting to note that (8.2.6) is identical to the expression on which the standard linearized model norm inversion is based, except that the kernel functions are evaluated at the starting model for each step and not updated. This is recognized to correspond to the modified Newton's method for operator equations, described in Section 2.1.1.
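The structure of (8.2.6) can be illustrated with a small finite-dimensional analogue. The following is a sketch only: `forward` is a hypothetical two-parameter non-linear operator chosen for illustration (it is not the MT response), `jacobian` plays the role of the Fréchet kernels, and the Jacobian is evaluated once at the starting model and never updated, which is the defining feature of the modified Newton's method.

```python
import numpy as np

def forward(m):
    """Toy non-linear forward operator (hypothetical, not the MT response)."""
    return np.array([m[0] + 0.1 * m[0] ** 2,
                     m[1] + 0.1 * m[0] * m[1]])

def jacobian(m):
    """Analytic Jacobian of the toy operator; stands in for the kernels G."""
    return np.array([[1.0 + 0.2 * m[0], 0.0],
                     [0.1 * m[1], 1.0 + 0.1 * m[0]]])

m_true = np.array([1.0, 2.0])
d_obs = forward(m_true)            # noise-free "measured" responses

m0 = np.array([0.5, 1.5])          # starting model
J0 = jacobian(m0)                  # evaluated once at the starting model
J0_inv = np.linalg.inv(J0)         # factored once and reused at every step

m = m0.copy()
for k in range(50):
    # Analogue of (8.2.6): d_obs - F(m_k) + J0 m_k = J0 m_{k+1}
    corrected_data = d_obs - forward(m) + J0 @ m
    m = J0_inv @ corrected_data
```

Each pass through the loop needs one forward evaluation and two matrix-vector products; the Jacobian and its inverse are never recomputed.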
Thus, correcting the responses by approximating the higher-order terms is equivalent to performing a modified Newton step; the procedure may be considered from either perspective. Inversion algorithms based on both formulations have been developed and give identical results; however, (8.2.6) leads to a computationally more efficient algorithm.

A question of practical importance concerns the effectiveness of this method in proceeding from a simple starting model to a complex constructed model which reproduces a set of measured responses. This question is investigated in Fig. 8.2 by considering the inversion of the synthetic MT test case initiated from three different starting models $\sigma_0$. In Fig. 8.2(a) the starting model consists of a 300-m thick surface layer of conductivity 0.004 S/m overlying a halfspace of conductivity 0.02 S/m; this model is indicated by the dotted line and has a misfit of $\chi^2 = 6890$. The model constructed from 9 correction steps applied to an initial linearized inversion is indicated by the solid curve (in this construction $m(z) = \sigma(z)$, $f(z) = \log(z + z_0)$ and the correct surface conductivity value was supplied). The constructed model achieves the desired misfit of $\chi^2 = 50.0$; however, the model has slightly more structure at depth than the equivalent (iterated) linearized inversion, shown in Fig. 3.3. This point will be considered in more detail later. The initial linearized inversion and correction steps were carried out with the target misfit $\chi^2$ at each step representing a reduction in the actual misfit by a factor of 5, i.e. P = 5 in (3.3.6). This reduction step is quite conservative and probably not optimal in terms of a rapid reduction in the misfit; however, it ensures a very stable inversion. In order to evaluate the improvement that results from the correction steps, the solution is compared to the best-fitting model that could be constructed from a single linearized inversion iteration.
This best-fitting model was constructed by a trial-and-error procedure of adjusting the value of P and performing a single linearized inversion until the (approximate) optimal value was determined which led to the greatest reduction in the misfit. The optimal value was found to be P = 100; the model constructed from this linearized inversion step is indicated by the dotted curve in Fig. 8.2(a). The linearized solution has a misfit of $\chi^2 = 613$; thus, the correction steps have reduced the misfit by more than a factor of 10. The panel to the right compares the observed responses to those computed for the correction-scheme inversion (solid curve) and the single linearized inversion iteration (dotted curve). Finally, it is noted that while the full linearized inversion required about 22 s of CPU time on a SUN 4/310 workstation, each correction step required only 1.0 s.

Figure 8.2 Correction-scheme inversion for $m(z) = \sigma(z)$. In (a), (b) and (c) the true model is given by the dashed line, the starting model by the dotted line, the best-fitting linearized model by the dotted curve and the model constructed by successively correcting the linearization for higher-order terms by the solid curve. $\chi^2$ misfit values for each model are given in Table 8.1. The panels on the right compare the predicted responses for the correction-scheme model (solid line) and the linearized model (dotted line) with the measured data.

Table 8.1 Summary of misfits for the starting model, best-fitting linearized inversion solution and correction-scheme solution for Fig. 8.2, with model $m(z) = \sigma(z)$.

                 chi^2       chi^2        chi^2        Number of
                 Starting    Linearized   Correction   corrections
  Fig. 8.2(a)     6 890         613          50.0           9
  Fig. 8.2(b)    76 700       2 351          69.7          14
  Fig. 8.2(c)    33 400       7 700        1 820           40

Table 8.2 Summary of misfits for the starting model, best-fitting linearized inversion solution and correction-scheme solution for Fig. 8.3, with model $m(z) = \log\sigma(z)$.

                 chi^2       chi^2        chi^2        Number of
                 Starting    Linearized   Correction   corrections
  Fig. 8.3(a)     6 890       1 183          50.0          12
  Fig. 8.3(b)    76 700       7 750          50.0          27
  Fig. 8.3(c)    33 400       3 470        2 100           12

Figure 8.2(b) illustrates a similar comparison except that the starting model consists of a 0.02-S/m halfspace (dotted line) which has a misfit of $\chi^2 = 76\,700$. The model constructed from 14 correction steps (solid curve) achieves a misfit of $\chi^2 = 69.7$, slightly larger than the desired value of 50; however, further correction steps do not reduce this misfit significantly. Again, the constructed model exhibits some unnecessary structure at depth. The best-fitting model that could be constructed from a single linearized iteration is indicated by the dotted curve; this model has a misfit of $\chi^2 = 2350$. Thus, the correction steps have reduced the misfit by more than a factor of 30. The best-fitting linearized solution exhibits the correct general increase in conductivity with depth, but only hints at the layered structure with no indication of the high conductivity zone at 6000-10000 m depth. By comparison, the correction-scheme solution clearly resolves all layers. The panel to the right compares the responses computed for both solutions. A significant improvement is indicated for the correction-scheme solution, particularly in the phase responses.

In Fig. 8.2(c) the starting model consists of a 0.004-S/m halfspace which has a misfit of $\chi^2 = 33\,400$. The model constructed from 40 correction steps achieved a misfit of $\chi^2 = 1820$. In this case the correction scheme is not able to converge to an acceptable solution and the model exhibits incorrect structure at depth.
However, this solution is still a considerable improvement on the best-fitting linearized model which has a misfit of $\chi^2 = 7700$; this improvement is evident in the panel on the right which compares the responses computed for both solutions. The misfits associated with the models shown in Fig. 8.2 are summarized in Table 8.1.

Figure 8.3 Correction-scheme inversion for $m(z) = \log\sigma(z)$. In (a), (b) and (c) the true model is given by the dashed line, the starting model by the dotted line, the best-fitting linearized model by the dotted curve and the model constructed by successively correcting the linearization for higher-order terms by the solid curve. $\chi^2$ misfit values for each model are given in Table 8.2. The panels on the right compare the predicted responses for the correction-scheme model (solid line) and the linearized model (dotted line) with the measured data.

The basis for the correction-scheme inversion for the conductivity model $\sigma(z)$ is the exact equation (8.2.2). As described in Section 2.5.1, this equation can be recast in terms of $\log\sigma(z)$ as the model; however, this requires neglecting second-order terms. Therefore, it is important to verify whether the correction scheme still yields satisfactory results in this case. Figure 8.3 shows a comparison identical to that of Fig. 8.2 except that the model is taken to be $m(z) = \log\sigma(z)$. The misfits associated with the starting model, best-fitting linearized solution and correction-scheme solution and the number of correction steps applied are summarized in Table 8.2. The results are similar to those described above for Fig. 8.2: for the starting models shown in Fig. 8.3(a) and (b) the correction scheme achieves an acceptable solution, although some unnecessary structure is evident when the models are compared with the (iterated) linearized solution, shown in Fig. 3.4. In Fig.
8.3(b) the correction scheme achieves a misfit which is more than two orders of magnitude smaller than that of the best-fitting linearized inversion. In Fig. 8.3(c) the correction scheme achieves a smaller misfit than the best-fitting linearized inversion, but does not converge to an acceptable model and the solution exhibits incorrect structure at depth. Figure 8.3 indicates that the correction scheme is an effective solution for $m(z) = \log\sigma(z)$.

Figures 8.2 and 8.3 indicate that the higher-order correction scheme can result in a significant reduction in the misfit over the best-fitting linearized inversion, and in some cases converges to an acceptable model. However, it is noted that even when the misfit converges to the desired value, the constructed models often exhibit incorrect structure at depth (in comparison to models produced via iterated linearization). This may be understood by examining the recursive formulation (8.2.4). The error in (8.2.4) arises from approximating $R(\sigma,z)$ in the remainder term by $R(\sigma_k,z)$. In general, this approximation will be somewhat in error at all depths. However, since the depth of penetration of the electromagnetic fields, and therefore of the Fréchet kernels $G_j(z)$ and response function $R_j(z)$, increases with decreasing frequency, the accumulated error in the integral approximating the remainder term should be larger at the lower frequencies. Since it is the low-frequency responses that determine the deep structure, the greater accumulated error in the modified response functionals at low frequencies likely results in the incorrect structure at depth.

8.3 A practical inversion algorithm

The procedure of applying successive correction steps to a linearized inversion represents a fixed point iteration method. This procedure should exhibit linear convergence when it converges; by comparison, the standard (iterated) linearized approach exhibits quadratic convergence.
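The difference between linear and quadratic convergence can be seen on a one-variable example. The sketch below is illustrative only: it solves the hypothetical scalar equation x^2 - 2 = 0 rather than an MT problem, and the function name `iterate` is invented for this example. With `update_derivative=True` the derivative is re-evaluated at every step (Newton's method); with `False` it is frozen at the starting point (the modified Newton's method).

```python
def iterate(f, dfdx, x0, update_derivative, tol=1e-12, max_iter=200):
    """Newton-type iteration; the slope is either updated or frozen at x0."""
    x = x0
    slope = dfdx(x0)
    residuals = []
    for _ in range(max_iter):
        if update_derivative:
            slope = dfdx(x)          # standard Newton: new derivative each step
        x = x - f(x) / slope         # modified Newton keeps the initial slope
        residuals.append(abs(f(x)))
        if residuals[-1] < tol:
            break
    return x, residuals

def f(x):
    return x * x - 2.0

def dfdx(x):
    return 2.0 * x

x_newton, res_newton = iterate(f, dfdx, 1.0, update_derivative=True)
x_modified, res_modified = iterate(f, dfdx, 1.0, update_derivative=False)

print("Newton steps:", len(res_newton))
print("Modified Newton steps:", len(res_modified))
```

In this toy problem the frozen-slope iteration needs several times as many steps as Newton's method to reach the same tolerance, but each of its steps avoids re-evaluating the derivative, which is the trade-off exploited by the correction scheme.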
The advantage of the correction method is in the greatly reduced computational requirements: once an initial linearized inversion is performed, a number of correction steps may be carried out at a fraction of the computational expense of performing another complete linearized inversion. However, if the starting model $\sigma_0$ is sufficiently far from an acceptable solution it may be that this procedure requires a prohibitive number of steps to converge or does not converge at all. In many cases it appears that the most effective approach is to perform a small number of correction steps (two to ten) between standard linearized inversion iterations. This procedure should converge whenever the standard linearized method converges. We have found that this correction procedure generally reduces the number of iterations and total computation time required to converge to an acceptable model. In this section several examples of this procedure are presented.

Figure 8.4 compares the misfit as a function of (linearized) iteration number for the standard linearized inversion (squares) and the linearized inversion corrected for higher-order terms (triangles) applied to the synthetic MT test case. The starting model consists of a 0.004-S/m halfspace and the model is $m(z) = \sigma(z)$. The dashed line indicates the desired misfit value of $\chi^2 = 50$. The target misfit $\chi^2$ at each step of the corrected inversion represented a reduction in actual misfit by a factor of P = 5. Five corrections were applied to the first linearized iteration; however, only two were required at the second iteration to reduce the misfit to the desired level. To evaluate the improvement that results directly from the higher-order correction steps, a trial-and-error procedure was used to find the optimum value of P which results in the fewest iterations for the linearized inversion. However, even for the optimum value of P = 100, four linearized iterations were required.
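The role of the reduction factor P can be sketched as a schedule of target misfits. This is a hedged illustration: equation (3.3.6) is not reproduced in this chapter, so the rule used here (divide the current misfit by P, but never aim below the desired misfit) is an assumption, and the function names are hypothetical. The schedule is also idealized, since in a non-linear problem a step generally does not attain its target exactly.

```python
def target_misfit(current_misfit, reduction_factor, desired_misfit):
    """Assumed rule: reduce by the factor P, but never below the desired misfit."""
    return max(current_misfit / reduction_factor, desired_misfit)

def target_schedule(start_misfit, reduction_factor, desired_misfit):
    """Targets visited if every step attained its target exactly (an idealization)."""
    targets = []
    misfit = start_misfit
    while misfit > desired_misfit:
        misfit = target_misfit(misfit, reduction_factor, desired_misfit)
        targets.append(misfit)
    return targets

# Starting misfit of the 0.004-S/m halfspace (Table 8.1) and desired misfit of 50.
conservative = target_schedule(33400.0, 5.0, 50.0)    # P = 5
aggressive = target_schedule(33400.0, 100.0, 50.0)    # P = 100

print(conservative)
print(aggressive)
```

The idealized P = 100 schedule reaches the desired misfit in two targets, whereas the text reports that four linearized iterations were actually required; the gap reflects the linearization error that the correction steps are designed to reduce.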
For this problem a complete linearized inversion requires about 22 s computation time while a correction step requires about 1.0 s; thus, the standard linearized inversion required about 88 s total computation time, while the corrected inversion required only about 52 s. The model constructed by the corrected linearized scheme is shown in Fig. 8.5(a) by the solid curve, and its fit to the measured responses is shown in Fig. 8.5(b) and (c). The model constructed by the standard linearized inversion method is also shown in Fig. 8.5(a) by the dotted curve. The two models are almost identical although the standard inversion procedure results in slightly less structure (as measured by the $l_2$ norm of the model gradient). However, if one further linearized inversion is performed in the correction scheme, the model obtained is identical to that shown for the standard inversion.

Figure 8.4 $\chi^2$ misfit as a function of (linearized) inversion iteration number for the synthetic MT test case. The squares represent the optimal standard linearized inversion, and the triangles represent the linearized inversion corrected for higher-order terms. The starting model was a 0.004-S/m halfspace and $m(z) = \sigma(z)$.

Figure 8.5 $l_2$ flattest models constructed in the inversion of Fig. 8.4. In (a) the dotted curve indicates the model constructed by the standard linearized inversion, the solid curve indicates the model constructed by the linearized inversion corrected for higher-order terms. The dashed line indicates the true model. (b) and (c) compare the responses predicted for the corrected inversion.
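The timing comparison quoted above for the inversion of Fig. 8.4 can be reproduced approximately from the per-step costs. The sketch below assumes that the total cost is simply the sum of the rounded unit costs given in the text (about 22 s per linearized inversion and about 1.0 s per correction step); the function and constant names are hypothetical.

```python
LINEARIZED_INVERSION_S = 22.0   # approximate cost of one complete linearized inversion
CORRECTION_STEP_S = 1.0         # approximate cost of one correction step

def total_time(n_linearized, n_corrections):
    """Total computation time under the assumption that step costs simply add."""
    return n_linearized * LINEARIZED_INVERSION_S + n_corrections * CORRECTION_STEP_S

standard = total_time(n_linearized=4, n_corrections=0)        # four standard iterations
corrected = total_time(n_linearized=2, n_corrections=5 + 2)   # two iterations, 5 + 2 corrections

print(standard, corrected)
```

With these rounded unit costs the standard inversion comes to 88 s and the corrected inversion to 51 s, in line with the quoted values of about 88 s and about 52 s.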
As a final example of the corrected linearization scheme, consider the LITHOPROBE data set measured in southeastern British Columbia, described in Section 3.4.1. In this example the data are represented as amplitude and phase of R (rather than real and imaginary parts) and the model is taken to be $m(z) = \log\sigma(z)$; both of these choices result in additional higher-order terms that are neglected in (8.2.2), as described in Section 2.5.1. Figure 8.6 compares the misfit as a function of iteration number for the standard and corrected linearized inversions (the absolutely flattest model was constructed in each case). Again, a conservative value of P was used for the corrected inversion and a trial-and-error procedure was performed to find the value of P which resulted in the most rapid convergence for the standard linearized algorithm. Nonetheless, the corrected inversions converged to the desired misfit value of $\chi^2 = 244$ (dashed line) in three linearized inversions with seven correction steps per inversion, while the standard linearized inversion required six iterations. The two constructed models are shown to be essentially identical in Fig. 8.7.

The examples presented in this chapter have involved $l_2$ model norm inversions using SVD and the method of spectral expansion to perform the inversion. It is straightforward to apply a similar correction procedure to model norm constructions which carry out the inversions using linear programming (LP). In this case, since only the responses and not the kernel functions are modified at each correction step, the computational expense of performing the correction steps can be greatly reduced by storing the LP solution basis at each step and retrieving it at the subsequent step.

Figure 8.6 $\chi^2$ misfit as a function of (linearized) inversion iteration number for the LITHOPROBE MT data set.
The squares represent the optimal standard linearized inversion, and the triangles represent the linearized inversion corrected for higher-order terms. The starting model was a 0.0004-S/m halfspace and $m(z) = \log\sigma(z)$.

Figure 8.7 $l_2$ flattest models constructed in the inversion of Fig. 8.6 (LITHOPROBE MT data set). In (a) the dotted curve indicates the model constructed by the standard linearized inversion, the solid curve indicates the model constructed by the linearized inversion corrected for higher-order terms. (b) and (c) compare the responses predicted for the corrected inversion.

Chapter 9

Summary and discussion

The purpose of the work presented in this thesis was to develop and apply methods of inverse theory to the problem of inferring information about the Earth conductivity structure from magnetotelluric measurements. The MT inverse problem is functionally non-linear; however, linearization allows a variety of construction and appraisal algorithms to be developed using techniques of linear inverse theory.

The linearization of the MT problem was considered in detail in Chapter 2. Complete expansions for several choices of MT response were derived by a small modification to a standard perturbation approach. These expansions consist of a linear term and an infinite series of higher-order Fréchet differentials. The higher-order terms sum to a closed-form remainder term which conveniently represents the linearization error. In a linearized approach the higher-order terms are neglected; however, inversion procedures which require second-order terms are used in optimization theory (e.g. Gill et al. 1981) but have not been applied to the MT inverse problem.
The second-order term derived here could be used in such schemes. Also, the remainder term can be used to correct inversions for the linearization error, as described later. The expansions were derived for arbitrary measurement depth $z_m$; however, it was shown that the conductivity above this depth is irrelevant, and therefore, if measurements are made at depth it is always possible to translate the coordinate system so that $z_m = 0$ and the surface-measurement expressions apply. The expansions were illustrated for the simple case of constant-conductivity models where Fréchet differentiation is equivalent to ordinary differentiation and the terms can be evaluated directly.

In a linearized approach, it is important to verify that neglected terms are of second order, which implies that the response is Fréchet differentiable. Although Fréchet differentiability of the c response has received considerable attention, the proof is problem-dependent and has not been presented for the R response in the literature. In Chapter 2 the Fréchet differentiability of R is proved. This verifies that algorithms based on the linearization of this response are well-founded.

In a non-linear problem such as MT, the choice of response may well affect the linearity of the problem, with the correct choice resulting in the most accurate and efficient linearized algorithm. The relative linearity of the R and c responses was quantified by considering the ratio of non-linear to linear terms. For the special case of constant conductivity profiles, it was proved analytically that R is always more linear than c. The general case was investigated by considering the linearity ratios for a number of representative models. The conclusions of this study are that R is generally more linear than c and is therefore the preferred choice of response for linearized inversion.

Alternative formulations to linearization have been suggested.
Gomez-Trevino (1987) used scaling properties of Maxwell's equations to derive an exact, non-linear integral equation relating the conductivity to measured responses. He called this expression the similitude equation, and suggested that MT inversion could be based on this formulation rather than linearization. The similitude equation was examined in Chapter 2 and found to be inadequate for inversion in that it implicitly ignores first-order terms. In fact, the uniqueness of the Fréchet derivative implies that the linearized expression is the only linear relationship between response and model that is accurate to first order. Therefore, linearization would seem to be the obvious basis for model norm inversions.

In Chapter 3 an iterative inversion algorithm was developed based on the linearization of the R response. This algorithm may be used to construct acceptable conductivity models which minimize an $l_2$ norm. Two model norms were considered. The smallest-deviatoric model may be constructed by minimizing the norm of the deviation from an arbitrary base model. The base model may represent the best estimate of the Earth structure from any available information (e.g. well logs, geology), or it may be chosen to investigate the range of acceptable models. Alternatively, the flattest or minimum-structure model may be constructed by minimizing the $l_2$ norm of the model gradient. Constructing minimum-structure models reduces the possibility of being misled by model features that are not required by the data. There is reason to believe that features of the minimum-structure solution are characteristics essential to fitting the responses. The Earth may be more complex than the simplest model, but these additional complexities are not resolved by data and are not justified in the constructed model. The $l_2$ flattest and smallest-deviatoric solutions tend to be smooth models which represent structure in terms of continuous gradients in the conductivity.
The standard method of constructing flattest models requires specifying a model value at some fixed point a priori in order to express the data constraints in terms of the model derivative. In general, the flattest model is not a unique entity since supplying different values leads to different flattest models. If a model value is known reliably, it is valuable to include this information in the inversion; however, specifying an inaccurate value can introduce false structure into the constructed model. A method was presented in Appendix A to compute an optimal model value directly from the responses. This value is optimal in the sense that it results in the absolutely flattest model, which has the smallest possible norm of the model gradient. It was shown in Chapter 3 that this procedure can significantly reduce the amount of structure in constructed conductivity models.

Chapter 4 presented an inversion algorithm which uses linear programming to construct minimum-structure or smallest-deviatoric models that minimize an $l_1$ norm. Both model norms are formulated as measures of variation: the minimum-structure model minimizes the variation of the model with depth, while the smallest-deviatoric model minimizes variation from the base model. This also makes it straightforward to minimize a functional which combines both model norms. In addition, the minimum-structure solution is formulated in terms of the model, not its derivative, which obviates the need to integrate the data equations or specify a model value. The $l_1$ solutions resemble layered earth models with structural variations occurring discontinuously at distinct depths. This is in contrast to the smooth solutions constructed in Chapter 3. It is important to recognize that in either case the form of the solution (gradient or layered) is due to the inversion procedure and is not demanded by the measurements.
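For readers unfamiliar with how a variation norm can be minimized by linear programming, one standard device is sketched below. It is an illustration under assumptions, not necessarily the formulation of Chapter 4: the model is taken to be discretized into $M$ cells $m_i$, the linearized data equations are written with a hypothetical matrix $G_{ji}$ and imposed exactly (rather than to a specified misfit), and the auxiliary non-negative variables $p_i$, $q_i$ are introduced here purely for illustration.

```latex
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{M-1} (p_i + q_i) \\
\text{subject to} \quad & m_{i+1} - m_i = p_i - q_i, \qquad p_i \ge 0,\; q_i \ge 0, \qquad i = 1, \ldots, M-1, \\
& \sum_{i=1}^{M} G_{ji}\, m_i = e_j, \qquad j = 1, \ldots, N .
\end{aligned}
```

At an optimal solution at most one of $p_i$ and $q_i$ is non-zero for each $i$, so that $p_i + q_i = |m_{i+1} - m_i|$ and the objective equals the total variation of the discretized model. Replacing $m_{i+1} - m_i$ by the difference between $m_i$ and a base-model value gives the corresponding measure of variation from a base model.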
The $l_1$ and $l_2$ inversions offer complementary solutions, and in practice a complete interpretation should consider both. The inversion algorithms presented in Chapters 3 and 4 consider the model to be either $\sigma$ or $\log \sigma$, include an arbitrary weighting function in the model norm, and fit the data to a specified level of misfit; these algorithms should provide considerable flexibility in constructing acceptable conductivity profiles. This development is based on the belief that it is always valuable to produce a variety of models and to take into account as much additional information or insight into the problem as possible. Such flexibility in model construction allows some exploration and understanding of the range of acceptable solutions.

As a result of the inherent non-uniqueness of the inverse problem, a finite data set cannot impose any bounds on the value of the model at a point. However, model averages over a finite width are generally constrained by the data. Model features can be appraised by constructing extremal models which minimize and maximize localized conductivity averages (Oldenburg 1983). These extremal models provide lower and upper bounds for the conductivity average over the region of interest.

In Chapter 5 an efficient and robust appraisal algorithm was developed which uses linear programming to extremize conductivity averages. For a given depth of interest $z_0$, upper and lower bounds may be computed for a number of averaging widths $\Delta$ and plotted as a function of $\Delta$. This yields a funnel function diagram which provides immediate insight into the resolving power of the data at $z_0$. For optimal bounds it is important that the constructed extremal models are geophysically reasonable. The appraisal method was extended by constraining the total variation to limit unrealistic structure and ensure that the extremal models are plausible. The variation bound can be specified in terms of $\sigma$ or $\log \sigma$.
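The appraisal problem described above can be written compactly as a pair of constrained extremizations. The statement below is a sketch under assumptions: the averaging window is taken to be centred on $z_0$, and the symbols $\bar{m}$, $\mathcal{A}$ and $V$ (the variation bound) are introduced here for illustration and may differ from the notation of Chapter 5.

```latex
\bar{m}(z_0, \Delta) = \frac{1}{\Delta} \int_{z_0 - \Delta/2}^{z_0 + \Delta/2} m(z)\, dz ,
\qquad
\bar{m}^{\,\mathrm{lower}}(z_0, \Delta) = \min_{m \in \mathcal{A}} \bar{m}(z_0, \Delta),
\qquad
\bar{m}^{\,\mathrm{upper}}(z_0, \Delta) = \max_{m \in \mathcal{A}} \bar{m}(z_0, \Delta),
```

```latex
\mathcal{A} = \left\{ m(z) \;:\; m \text{ fits the responses to the specified misfit}, \quad \int \left| \frac{dm}{dz} \right| dz \le V \right\} .
```

Plotting the lower and upper bounds against $\Delta$ for fixed $z_0$ gives the funnel function diagram; the bounds typically tighten as $\Delta$ increases because wider averages are better constrained by the data.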
The extremal models of Chapter 5 as well as the minimum-structure and smallest-deviatoric models of Chapters 3 and 4 are constructed via linearized inversion; therefore, the possibility always exists of the algorithm becoming trapped in a local (rather than global) minimum. In practice, it is difficult to verify that the solution represents the absolute minimum. One method of investigating this is to repeat the inversion with different starting models. We have initiated our inversion procedures from diverse starting models and never found a case where the solution differed significantly. This does not constitute proof that a global minimum has been found, but does provide confidence that the algorithms are not strongly dependent on the starting model.

In order to corroborate the results of the linearized appraisal, a method of constructing extremal models using simulated annealing was developed in Chapter 6. Simulated annealing is a Monte-Carlo optimization procedure based on an analogy between the parameters of a mathematical system to be optimized and particles of a physical system which cools and anneals into its ground state according to the theory of statistical mechanics. Simulated annealing was developed as a combinatorial optimization procedure (Kirkpatrick et al. 1983) and is well known for its inherent ability to avoid unfavourable local minima. In the cases we have considered, simulated annealing and linearized appraisal yield essentially identical results for the model-average bounds. This provides confidence that better extrema cannot be obtained and that meaningful bounds have been computed. In addition, examining the extremal models constructed by each method indicates a close correspondence with Weidelt's (1985) analytic solution to the extremal model problem for a small number of exact MT data.
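The basic mechanics of simulated annealing can be sketched in a few lines. The example below is a generic illustration under assumptions, not the appraisal algorithm of Chapter 6: it minimizes a hypothetical one-dimensional multimodal function `toy_energy`, and the geometric cooling schedule, step size and parameter names are arbitrary choices made here. The key ingredient is the Metropolis acceptance rule, which always accepts downhill moves and accepts uphill moves with probability $\exp(-\Delta E / T)$, allowing escape from local minima while the temperature $T$ is high.

```python
import math
import random

def simulated_annealing(energy, x0, step, t_start=1.0, t_end=1e-3,
                        cooling=0.95, sweeps=50, seed=0):
    """Minimize energy(x) over a scalar x; returns the best point visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(sweeps):
            x_new = x + rng.uniform(-step, step)   # random perturbation
            e_new = energy(x_new)
            # Metropolis criterion: downhill always, uphill with prob exp(-dE/T)
            if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
                x, e = x_new, e_new
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling                               # geometric cooling schedule
    return best_x, best_e

def toy_energy(x):
    # Hypothetical objective with several local minima.
    return 0.1 * x * x + math.sin(3.0 * x)

x_best, e_best = simulated_annealing(toy_energy, x0=4.0, step=1.0)
print(x_best, e_best)
```

For an appraisal problem the scalar `x` would be replaced by a vector of model parameters and `energy` by the localized conductivity average (with a penalty for unacceptable data misfit), which requires only that the forward problem can be solved.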
Although the simulated annealing approach is considerably slower than the linearized analysis, it represents an interesting new appraisal technique. The method of simulated annealing appraisal is general, requires no approximations (such as linearization) and can be applied to any inverse problem for which the forward problem can be solved.

In Chapter 7, model construction procedures were applied to analyze MT responses measured yearly over the past four years at a number of sites on Vancouver Island, Canada. The data were recorded to monitor local changes in conductivity as a possible earthquake precursor. Previous MT studies have evaluated precursor signals based on changes in the measured responses (impedances or apparent resistivities). However, formulating an explicit inverse problem allows investigation of the corresponding changes required in conductivity models of the Earth. This may be much more informative in terms of determining the processes and depths involved in observed changes in the data. In particular, a model constructed from data measured one year was used as the base model in a smallest-deviatoric inversion of responses measured in a subsequent year. This provided a direct method of investigating the changes required in the earth conductivity model to accommodate the yearly variations in the data. In the current study, no changes judged to be significant were observed in responses or models, and no seismic events larger than magnitude 3.0 occurred during the study period.

In Chapter 8, a method of correcting linearized inversion iterations was developed. The corrections consist of successively approximating the linearization error using the analytic expression for the remainder term derived in Chapter 2. These corrections are used to approximate a response functional for which the inverse problem is exactly linear. The procedure is reminiscent of the Born approximation for integral equations.
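The idea of correcting a linearized solution for its own linearization error can be illustrated on a scalar toy problem. The sketch below is an analogy under assumptions, not the operator algorithm of Chapter 8: the forward function `f`, its derivative `dfdx`, and all names are hypothetical, and the remainder is evaluated by direct subtraction rather than from an analytic expression. The linearization is formed once at the starting point `x0`; the current estimate of the remainder is then subtracted from the data and the same linear problem is solved again.

```python
def corrected_linearized_solution(f, dfdx, target, x0, n_corrections=50):
    """Solve f(x) = target using one linearization about x0 plus corrections."""
    slope = dfdx(x0)        # derivative evaluated once, at the starting point
    dx = 0.0                # current perturbation estimate
    for _ in range(n_corrections):
        # Linearization error (remainder) for the current perturbation.
        remainder = f(x0 + dx) - f(x0) - slope * dx
        # Re-solve the linear problem with the remainder removed from the data.
        dx = (target - f(x0) - remainder) / slope
    return x0 + dx

# Toy example: find x with x**3 = 2, linearizing about x0 = 1.5.
root = corrected_linearized_solution(lambda x: x**3, lambda x: 3.0 * x**2,
                                     target=2.0, x0=1.5)
print(root)
```

Rearranging the update shows that each correction changes `dx` by `(target - f(x0 + dx)) / slope`, which is a Newton-type step with the derivative held fixed at the starting point. Convergence is therefore linear rather than quadratic, but no new derivative is needed after the first step.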
The correction method would seem to represent a novel approach to linearized inversion and was implemented in a practical algorithm. It was shown that correcting the linearized solution at each iteration can reduce the total number of linearized iterations and the total computational effort required to converge to an acceptable model. In addition, a correspondence between the corrected linearized solutions and iterations of the modified Newton's method for operators was established.

The methods developed in this thesis represent a comprehensive package of construction and appraisal algorithms for investigating the 1-D MT inverse problem. The algorithms that have been implemented are robust, practical, and efficient (except for simulated annealing). The algorithms were illustrated using synthetic test cases and MT field data collected as part of the LITHOPROBE Southern Cordilleran transect in southeastern British Columbia.

Finally, it should be noted that many of the methods developed here for the 1-D MT problem are general and could be applied to a variety of inverse problems. For instance, the author has implemented an $l_2$ inversion algorithm for the 1-D dc resistivity problem using Fréchet kernels derived by Oldenburg (1978), and has applied appraisal using extremal models of bounded variation to the problem of inferring plasma densities from laser interferometric data (e.g. Oldenburg & Samson 1979). A particularly interesting application would involve extending the $l_1$ minimum-variation solution to 2-D model construction. This is straightforward for a cellular 2-D model since the horizontal variation can be expressed in exactly the same manner as the vertical variation. A linear programming formulation analogous to that developed in Chapter 4 may then be used to minimize an objective function which represents the total variation in two dimensions.
This would seem to be a promising approach to constructing minimum-structure 2-D models for the MT problem, but is beyond the scope of this thesis.

References

Adam, A., 1980. Relation of mantle conductivity to physical conditions in the asthenosphere, Geophysical Surveys, 4, 43-55.
Anderssen, R. S., 1975. On the inversion of global electromagnetic induction data, Phys. Earth Planet. Int., 10, 292-298.
Aubin, J. P. & Ekeland, I., 1984. Applied nonlinear analysis, John Wiley & Sons, New York.
Backus, G. E., 1970a. Inference from inadequate and inaccurate data, 1, Proc. Natn. Acad. Sci. USA, 65, 1-7.
Backus, G. E., 1970b. Inference from inadequate and inaccurate data, 2, Proc. Natn. Acad. Sci. USA, 65, 281-287.
Backus, G. E., 1970c. Inference from inadequate and inaccurate data, 3, Proc. Natn. Acad. Sci. USA, 67, 282-289.
Backus, G. E., 1972. Inference from inadequate and inaccurate data, in Mathematical problems in geophysical sciences, American Mathematical Society, Providence, RI.
Backus, G. E. & Gilbert, F., 1967. The resolving power of gross earth data, Geophys. J. R. astr. Soc., 16, 169-205.
Backus, G. E. & Gilbert, F., 1968. Numerical applications of a formalism for geophysical inverse problems, Geophys. J. R. astr. Soc., 13, 247-276.
Backus, G. E. & Gilbert, F., 1970. Uniqueness in the inversion of inaccurate gross earth data, Phil. Trans. R. Soc. Lond. Ser. A., 266, 123-192.
Barker, J. A. & Henderson, D., 1976. What is 'liquid'? Understanding the states of matter, Reviews of Modern Physics, 48-4, 587-671.
Barsukov, O. M., 1972. Variations of electrical resistivity of mountain rocks connected with tectonic causes, Tectonophysics, 14, 273-277.
Berdichevsky, M. N. & Dimitriev, V. I., 1976. Basic principles of interpretation of magnetotelluric sounding curves, in Ádám, A., Ed., Geoelectric and geothermal studies, KAPG Geophysical Monograph, Akad. Kiado, 165-221.
Bertero, M., De Mol, C. & Pike, E. R., 1985.
Linear inverse problems with discrete data: I. General formulation and singular system analysis, Inverse Problems, 1, 301-330.
Bertero, M., De Mol, C. & Pike, E. R., 1988. Linear inverse problems with discrete data: II. Stability and regularization, Inverse Problems, 4, 573-594.
Binder, K., 1978. Monte Carlo methods in statistical physics, Springer, New York.
Cagniard, L., 1953. Basic theory of the magnetotelluric method of geophysical prospecting, Geophysics, 18, 605-635.
Cassidy, J. F., Ellis, R. M. & Rogers, G. C., 1988. The 1918 and 1957 Vancouver Island earthquakes, Bull. Seism. Soc. Am., 78, 617-635.
Chave, A. D., 1984. The Fréchet derivatives of electromagnetic induction, J. Geophys. Res., 89, 3373-3380.
Claerbout, J. F. & Muir, F., 1973. Robust modelling with erratic data, Geophysics, 38, 826-844.
Constable, S. C., Parker, R. L. & Constable, C. G., 1987. Occam's inversion: a practical algorithm for generating smooth models from EM sounding data, Geophysics, 52, 289-300.
Dosso, S. E. & Oldenburg, D. W., 1989. Linear and non-linear appraisal using extremal models of bounded variation, Geophys. J. Int., 99, 483-495.
Dragert, H. & Lisowski, M., 1990. Crustal deformation measurements on Vancouver Island, British Columbia: 1976 to 1988, Proceeding of the IAG symposium, Edinburgh, 1988, In press.
Egbert, G. D. & Booker, J. R., 1986. Robust estimation of geomagnetic transfer functions, Geophys. J. R. astr. Soc., 87, 173-194.
Electromagnetic Research Group For The Active Fault, 1982. Low electrical resistivity along an active fault, the Yamasaki fault, J. Geomag. Geoelectr., 34, 103-127.
Electromagnetic Research Group For The Active Fault, 1983. Electrical resistivity structure of the Tanna and the Ukihashi fault, Bull. Earthq. Res. Inst., 58, 265-286.
Filloux, J. H., 1980. Magnetotelluric soundings over the Northeast Pacific may reveal spatial dependence of depth and conductance of the asthenosphere, Earth Planet.
Sci. Lett., 46, 244-252.
Fischer, G. & Le Quang, B. V., 1981. Topography and minimization of the standard deviation in one-dimensional magnetotelluric modelling, Geophys. J. R. astr. Soc., 67, 279-292.
Fullagar, P. K., 1981. Inversion of horizontal loop electromagnetic soundings over a stratified earth, Ph.D. thesis, University of British Columbia, Vancouver.
Gabrielse, H., Monger, J. W. H., Wheeler, J. O. & Yorath, C. J., 1990. Morphogeological belts, tectonic assemblages and terranes, The Cordilleran Orogen in Canada, GSC, Geology of Canada, 4, In press.
Garland, G. D., 1975. Correlation between electrical conductivity and other geophysical parameters, Phys. Earth Planet. Int., 10, 220-230.
Gass, S. I., 1975. Linear programming: methods and applications, McGraw-Hill, New York.
Gill, P. E., Murray, W. & Wright, M. H., 1981. Practical optimization, Academic Press, London.
Gómez-Trevino, E., 1987. Nonlinear integral equations for electromagnetic inverse problems, Geophysics, 52, 1297-1302.
Gough, D. I., 1974. Electrical conductivity under western North America in relation to heat flow, seismology, and structure, J. Geomag. Geoelectr., 26, 105-123.
Green, A. G., Clowes, R. M., Yorath, C. J., Spencer, C., Kanasewich, E. R., Brandon, M. T. & Sutherland-Brown, A., 1986. Seismic reflection imaging of the subducting Juan de Fuca plate, Nature, 317, 210-213.
Griffel, D. H., 1981. Applied functional analysis, John Wiley & Sons, New York.
Hermance, J. F. & Grillot, L. R., 1974. Constraints on temperatures beneath Iceland from magnetotelluric data, Phys. Earth Planet. Int., 8, 1-12.
Heustis, S. P., 1987. Construction of non-negative resolving kernels in Backus-Gilbert theory, Geophys. J. R. astr. Soc., 90, 495-500.
Heustis, S. P., 1988. Positive resolving kernels and annihilators in linear inverse theory, Geophys. J. Int., 94, 571-573.
Hobbs, B. A., 1982.
Automatic model for finding the one-dimensional magnetotelluric problem, Geophys. J. R. astr. Soc., 68, 253-264.
Honkura, Y., Niblett, E. R. & Kurtz, R. D., 1976. Changes in magnetic and telluric fields in a seismically active region of eastern Canada: Preliminary results of earthquake prediction studies, Tectonophysics, 34, 219-230.
Hoover, D. B., Long, C. L. & Senterfit, R. M., 1978. Some results from audiomagnetotelluric investigations in geothermal areas, Geophysics, 43, 1501-1514.
Huston, V. & Pynn, J. S., 1980. Applications of functional analysis and operator theory, Academic Press, London.
Hyndman, R. D., Riddihough, R. P. & Herzer, R., 1979. The Nootka fault zone - a new plate boundary off Western Canada, Geophys. J. R. astr. Soc., 58, 667-683.
Jiracek, G. R., 1988. Near-surface and topographic distortions in electromagnetic induction, Presented at 9th IAGA Workshop on Electromagnetic Induction in the Earth and Moon, Sochi, USSR.
Jones, A. G. & Hutton, R., 1979. A multi-station magnetotelluric study in southern Scotland - II. Monte-Carlo inversion of the data and geophysical and tectonic implications, Geophys. J. R. astr. Soc., 56, 351-368.
Jones, A. G., Kurtz, R. D., Oldenburg, D. W., Boerner, D. E. & Ellis, R., 1988. Magnetotelluric observations along the LITHOPROBE Canadian Cordilleran Transect, Geophys. Res. Lett., 15, 677-680.
Jones, A. G., Chave, A. D., Egbert, G., Auld, D. R. & Bahr, K., 1989. A comparison of techniques for magnetotelluric response function estimation, J. Geophys. Res., 94, 14201-14213.
Jones, I. F., 1985. Applications of the Karhunen-Loeve transformation in reflection seismology, Ph.D. thesis, University of British Columbia, Vancouver.
Kaplan, W., 1973. Advanced Calculus, Addison-Wesley Publishing Co., London.
Keen, C. E. & Hyndman, R. D., 1979. Geophysical review of the continental margins of eastern and western Canada, Can. J. Earth Sci., 16, 712-747.
Kirkpatrick, S., Gelatt, C. D. & Vecchi, M., 1983.
Optimization by simulated annealing, Science, 220, 671-680.
Kirkpatrick, S., 1984. Optimization by simulated annealing: quantitative studies, J. Statis. Phys., 34, 975-986.
Korevaar, J., 1968. Mathematical methods, 1, Academic Press, New York.
Kramer, H. P. & Mathews, M. V., 1968. A linear coding for transmitting a set of correlated signals, Presented at the 38th Annual SEG meeting in Denver.
Kurtz, R. D. & Niblett, E. R., 1978. Time dependence of magnetotelluric fields in a tectonically active region in Eastern Canada, J. Geomag. Geoelectr., 30, 561-577.
Kurtz, R. D. & Niblett, E. R., 1983. Magnetotelluric monitoring of impedance in an area of induced seismicity at Manic 3, Quebec, Pub. Earth Phys. Branch, Ottawa, 25, 1-38.
Kurtz, R. D., DeLaurier, J. M. & Gupta, J. C., 1986. A magnetotelluric sounding across Vancouver Island detects the subducting Juan de Fuca plate, Nature, 321, 596-599.
Kurtz, R. D., DeLaurier, J. M. & Gupta, J. C., 1990. The electrical conductivity distribution beneath Vancouver Island: a region of active plate subduction, J. Geophys. Res., In press.
Lanczos, C., 1958. Linear systems in self-adjoint form, Am. math. Monthly, 65, 665-679.
Lang, S. W., 1985. Bounds from noisy linear measurements, IEEE Trans. Inform. Theory, IT-31, 490-508.
Larsen, J. C., 1977. Removal of local surface conductivity effects from low frequency mantle response curves, Acta. Geodaet. et Montanist. Acad. Hung., 12, 183-186.
Levenberg, K., 1944. A method for the solution of certain nonlinear problems in least squares, Q. Appl. Math., 2, 164-168.
Levy, S. & Fullagar, P. K., 1981. The reconstruction of a sparse spike train from a portion of its spectrum and application to high resolution deconvolution, Geophysics, 46, 1235-1243.
Lines, L. R. & Treitel, S., 1984. A review of least-squares inversion and its application to geophysical problems, Geophys. Prosp., 32, 159-186.
Lusternik, L. A. & Sobolev, V. J., 1961.
Elements of functional analysis, John Wiley & Sons, New York.
MacBain, J., 1986. On the Fréchet differentiability of the one-dimensional magnetotellurics problem, Geophys. J. R. astr. Soc., 86, 669-672.
MacBain, J., 1987. On the Fréchet differentiability of the one-dimensional electromagnetic induction problem, Geophys. J. R. astr. Soc., 88, 777-785.
Marquardt, D. W., 1963. An algorithm for least-squares estimation of non-linear parameters, J. Soc. Ind. Appl. Math., 11, 431-441.
Marsten, R. E., 1981. The design of the XMP linear programming library, ACM Trans. Math. Softw., 7, 481-497.
Mathews, J. & Walker, R. L., 1970. Mathematical methods of physics, W. A. Benjamin, Inc., Don Mills, Ont.
Mazzella, A. & Morrison, H. F., 1974. Electrical resistivity variations associated with earthquakes on San Andreas Fault, Science, 185, 855-857.
Menke, W., 1984. Geophysical data analysis: discrete inverse problems, Academic Press, London.
Metropolis, N. A., Rosenbluth, A., Rosenbluth, M., Teller, A. & Teller, E., 1953. Equation of state calculations by fast computing machines, J. of Chem. Physics, 21, 1087-1092.
Milne, R. D., 1980. Applied functional analysis: An introductory treatment, Pitman Advanced Publishing Program, Boston.
Morse, P. M. & Feshbach, H., 1953. Methods of theoretical physics, McGraw-Hill, New York.
Niblett, E. R. & Sayn-Wittgenstein, C., 1960. Variation of electrical conductivity with depth by the magnetotelluric method, Geophysics, 25, 998-1088.
Noritomi, K., 1981. Study on fault activity using geoelectrical and geomagnetic methods, Rep. Natural Disaster Sci., A-2, 1-107.
Oldenburg, D. W., 1978. The interpretation of direct current resistivity measurements, Geophysics, 43, 610-625.
Oldenburg, D. W., 1979. One-dimensional inversion of natural source magnetotelluric observations, Geophysics, 44, 1218-1244.
Oldenburg, D. W., 1981. Conductivity structure of oceanic upper mantle beneath the Pacific plate, Geophys. J. R. astr.
Soc., 65, 359-394.
Oldenburg, D. W., 1983. Funnel functions in linear and nonlinear appraisal, J. Geophys. Res., 88, 7387-7398.
Oldenburg, D. W., 1984. An introduction to linear inverse theory, IEEE Trans. Geosci. Remote Sensing, GE-22, 644-649.
Oldenburg, D. W., 1990. Inversion of electromagnetic data: an overview of new techniques, Geophysical Surveys, In press.
Oldenburg, D. W. & Samson, J. C., 1979. Inversion of interferometric data from cylindrically symmetric refractionless plasmas, J. Opt. Soc. Am., 69, 927-942.
Oldenburg, D. W., Whittall, K. P. & Parker, R. L., 1984. Inversion of ocean bottom magnetotelluric data revisited, J. Geophys. Res., 89, 1829-1833.
Oldenburg, D. W. & Ellis, R. G., 1990. Inversion of geophysical data using an approximate inverse mapping, Geophys. J. Int., Submitted.
Park, S. K. & Livelybrooks, D. W., 1989. Quantitative interpretation of rotationally invariant parameters in magnetotellurics, Geophysics, 54, 1483-1490.
Parker, R. L., 1970. The inverse problem of electrical conductivity in the mantle, Geophys. J. R. astr. Soc., 22, 121-138.
Parker, R. L., 1972. Inverse theory with grossly inadequate data, Geophys. J. R. astr. Soc., 29, 123-138.
Parker, R. L., 1974. Best bounds on density and depth from gravity data, Geophysics, 39, 644-649.
Parker, R. L., 1975. The theory of ideal bodies for gravity data interpretations, Geophys. J. R. astr. Soc., 42, 315-334.
Parker, R. L., 1977a. Understanding inverse theory, Ann. Rev. Earth. Planet. Sci., 5, 35-64.
Parker, R. L., 1977b. The Fréchet derivative for the one-dimensional electromagnetic induction problem, Geophys. J. R. astr. Soc., 49, 543-547.
Parker, R. L., 1980. The inverse problem of electromagnetic induction: existence and construction of solutions based on incomplete data, J. Geophys. Res., 85, 4421-4425.
Parker, R. L., 1982. The existence of a region inaccessible to magnetotelluric sounding, Geophys. J. R. astr. Soc., 68, 165-170.
Parker, R. L., 1983.
The magnetotelluric inverse problem, Geophysical Surveys, 6, 5-25.
Parker, R. L., 1984. An inverse problem of electromagnetism arising in geophysics, SIAM-AMS Proceedings, 14, 3-12.
Parker, R. L., 1986. Comments concerning 'On the Fréchet differentiability of the one-dimensional magnetotellurics problem' by John MacBain, Geophys. J. R. astr. Soc., 86, 673.
Parker, R. L. & McNutt, M. K., 1980. Statistics for the one-norm misfit error, J. Geophys. Res., 85, 4429-4430.
Parker, R. L. & Whaler, K. A., 1981. Numerical methods for establishing solutions to the inverse problem of electromagnetic induction, J. Geophys. Res., 86, 9574-9584.
Press, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T., 1986. Numerical recipes: the art of scientific computing, Cambridge University Press, Cambridge.
Qian, F. Y., Zhao, Y. L., Yu, M. M., Wang, Z. X., Liu, X. W. & Chang, S. M., 1983. Geoelectrical resistivity anomalies before earthquakes, Scientia Sinica, B-26, 326-336.
Ranganayaki, R. P., 1984. An interpretive analysis of magnetotelluric data, Geophysics, 49, 1730-1748.
Riddihough, R. P., 1977. A model for recent plate interactions off Canada's west coast, Can. J. Earth Sci., 14, 384-396.
Rikitake, K., 1987. Magnetic and electric signals precursory to earthquakes: an analysis of Japanese data, J. Geomag. Geoelectr., 39, 47-61.
Rogers, G. C. & Hasegawa, S., 1978. A second look at the British Columbia earthquake of June 23, 1946, Bull. Seism. Soc. Am., 68, 653-675.
Rokityansky, I. I., 1982. Geoelectromagnetic investigation of the Earth's crust and mantle, Springer-Verlag, Berlin.
Rothman, D. H., 1985. Nonlinear inversion, statistical mechanics, and residual statics estimation, Geophysics, 50, 2784-2796.
Rothman, D. H., 1986. Automatic estimation of large residual statics corrections, Geophysics, 51, 332-346.
Sandberg, S. K. & Hohmann, G. W., 1982. Controlled source audiomagnetotellurics in geothermal exploration, Geophysics, 47, 100-116.
Schmucker, U., 1970. Anomalies of geomagnetic variations in the Southwestern United States, Scripps Institute of Oceanography Bulletin, 13, University of California Press, London.
Schmucker, U., 1987. Substitute conductors for electromagnetic response estimates, PAGEOPH, 125, 341-367.
Smith, J. T., 1989. Rapid inversion of multi-dimensional magnetotelluric data, Ph.D. thesis, University of Washington, Seattle.
Smith, J. T. & Booker, J. R., 1988. Magnetotelluric inversion for minimum structure, Geophysics, 53, 1565-1576.
Sternberg, B. K., Washburne, J. C. & Pellerin, L., 1988. Correction for the static shift in magnetotellurics using transient electromagnetic soundings, Geophysics, 53, 1459-1468.
Sumitomo, N. & Noritomi, K., 1986. Synchronous precursors in the electrical earth resistivity and the geomagnetic field in relation to an earthquake near the Yamasaki fault, southwest Japan, J. Geomag. Geoelectr., 38, 971-989.
Tarantola, A. & Valette, B., 1982. Inverse problems = Quest for information, J. Geophys., 50, 159-170.
Tarantola, A. & Valette, B., 1982. Generalized nonlinear inverse problems solved using the least squares criterion, Rev. Geophys. Space Phys., 20, 219-232.
Telford, W. M., Geldart, L. P., Sheriff, R. E. & Keys, D. A., 1976. Applied geophysics, Cambridge University Press, Cambridge.
Tikhonov, A. N., 1950. On determining the electrical properties of deep-lying crustal layers, Dokl. Acad. Nauk SSSR, 73, 295-297.
Tikhonov, A. N., 1965. Mathematical basis of the theory of electromagnetic soundings, USSR Comp. Math. and Phys., 5, 207-211.
van Laarhoven, P. J. M. & Aarts, E. H. L., 1987. Simulated annealing: theory and applications, D. Reidel Publishing Co., Dordrecht, Holland.
Vanderbilt, D. & Louie, S. G., 1984. A Monte Carlo simulated annealing approach to optimization over continuous variables, J. Comput. Phys., 36, 259-271.
Wannamaker, P. E., Stodt, J. A. & Luis, R., 1987.
PW2D: Finite element program for solution of magnetotelluric responses of two-dimensional earth resistivity structure, ESL-158, University of Utah Research Institute.
Weidelt, P., 1972. The inverse problem of geomagnetic induction, Z. Geophys., 38, 257-289.
Weidelt, P., 1985. Construction of conductance bounds from magnetotelluric impedances, J. Geophys., 57, 191-206.
Weidelt, P., 1986. Discrete frequency inequalities for magnetotelluric impedances of one-dimensional conductors, J. Geophys., 59, 171-176.
Whittall, K. P., 1986. Inversion of magnetotelluric data using localized conductivity constraints, Geophysics, 51, 1603-1607.
Whittall, K. P., 1987. Exploring magnetotelluric nonuniqueness using inverse scattering methods, Ph.D. thesis, University of British Columbia, Vancouver.
Whittall, K. P. & Oldenburg, D. W., 1986. Inversion of magnetotelluric data using a practical inverse scattering formulation, Geophysics, 51, 383-395.
Whittall, K. P. & MacKay, A. L., 1989. Quantitative interpretation of NMR relaxation data, J. Magn. Reson., 84, 134-152.
Whittall, K. P. & Oldenburg, D. W., 1990. Inversion of magnetotelluric data over a one-dimensional earth, in Magnetotellurics in geophysical exploration, ed. Wannamaker, P. E., SEG publication, In press.
Woodhouse, J. H., 1976. On Rayleigh's principle, Geophys. J. R. astr. Soc., 46, 11-22.
Xu, S., 1986. Quantitative estimation of an annual variation of apparent resistivity, J. Geomag. Geoelectr., 38, 991-999.
Zeidler, E., 1985. Nonlinear functional analysis and its applications, Springer-Verlag, New York.

Appendix A

The absolutely flattest model

The model constructed by minimizing the norm of the model derivative is commonly referred to as the flattest model. The flattest model can be particularly useful since it may be considered to be a minimum-structure solution.
The construction procedure requires that a model value at some fixed point be specified a priori in order to express the data constraints in terms of the model derivative. The standard derivation (e.g. Oldenburg 1984) considers specifying this model value only at the endpoints of the interval of definition; however, it is straightforward to generalize this method to specify a model value at any point in the interval. In general, the flattest model is not a unique entity since supplying different model values leads to different flattest models. Rather, what is constructed is the flattest acceptable model which passes through the specified value. If the model value is known reliably, it is valuable to include this information in the inversion; however, supplying an inaccurate value can introduce false structure into the constructed model. When no reliable value is available, it is preferable to solve directly for the model value which results in the absolutely flattest model. The absolutely flattest model is the unique solution for which the norm of the model derivative is smaller than that of any other flattest model. Not only does this method obviate the requirement of specifying a model value a priori, but the absolutely flattest model is the true minimum-structure solution and possesses a number of attributes which may be desirable in practice.

The absolutely flattest model ($l_2$ norm) is developed and discussed here for the general linear inverse problem where $N$ observed responses $e_j$ are related to the model $m(z)$ via the linear functional

$$ e_j = \int_a^b g_j(z)\, m(z)\, dz, \qquad j = 1, \ldots, N, \tag{A1} $$

where the model and the kernel functions $g_j(z)$ are defined on the interval $[a, b]$ and equations (A1) have been normalized by their assumed uncertainties. The kernel functions are assumed to be continuous or to have at most a finite number of step discontinuities.
To express the constraints in terms of the model derivative, (A1) can be integrated by parts to give

$$ m(b)\, h_j(b) - e_j = \int_a^b h_j(z)\, m'(z)\, dz, \tag{A2} $$

where $h_j(z) = \int_a^z g_j(u)\, du$ is a continuous function. A similar expression involving the model endpoint $m(a)$ is also easily derived (e.g. Oldenburg 1984). However, it is straightforward to generalize this method to write the data constraints in terms of the model value $m(c)$ specified at any arbitrary point $c$, $a \le c \le b$, by substituting

$$ m(b) = \int_a^b H(z - c)\, m'(z)\, dz + m(c) \tag{A3} $$

into (A2) to yield

$$ m(c)\, h_j(b) - e_j = \int_a^b \left[ h_j(z) - h_j(b)\, H(z - c) \right] m'(z)\, dz, \qquad j = 1, \ldots, N, \tag{A4} $$

where $H$ is the Heaviside step function. It is emphasized that in the model construction problem $m(c)$ is a parameter to be specified arbitrarily and may be considered to be independent of $c$. The $l_2$ flattest model is constructed by minimizing the norm of the model derivative subject to the side conditions (A4) using the method of Lagrange multipliers to minimize the functional

$$ \Phi\left(m'; m(c), c\right) = \| m'(z) \|^2 + 2 \sum_{j=1}^{N} \alpha_j \left\{ m(c)\, h_j(b) - e_j - \int_a^b \left[ h_j(z) - h_j(b)\, H(z - c) \right] m'(z)\, dz \right\} \tag{A5} $$

(it is straightforward to include a weighting function in the model norm, but for clarity this will be omitted here). Section 3.2.1 shows that minimizing (A5) with respect to $m'$ and $\alpha_j$ leads to the result

$$ m'(z) = \sum_{j=1}^{N} \alpha_j \left[ h_j(z) - h_j(b)\, H(z - c) \right] \tag{A6a} $$

$$ \phantom{m'(z)} = \sum_{j=1}^{N} \alpha_j h_j(z) - H(z - c) \sum_{j=1}^{N} \alpha_j h_j(b), \tag{A6b} $$

where the Lagrange multipliers $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_N)^T$ are given by

$$ \boldsymbol{\alpha} = \Gamma^{-1} \left[ m(c)\, \mathbf{h}(b) - \mathbf{e} \right], \tag{A7} $$

where $\mathbf{h}(b) = (h_1(b), h_2(b), \ldots, h_N(b))^T$, $\mathbf{e} = (e_1, e_2, \ldots, e_N)^T$ and $\Gamma$ is the inner product matrix with elements

$$ \Gamma_{jk} = \int_a^b \left[ h_j(z) - h_j(b)\, H(z - c) \right] \left[ h_k(z) - h_k(b)\, H(z - c) \right] dz. \tag{A8} $$

According to (A6), the derivative of the flattest model is given by a linear combination of the (modified) kernel functions with the Lagrange multipliers $\alpha_j$ acting as coefficients given by (A7).
The inner product matrix $\Gamma$ is symmetric and positive definite and may be decomposed using singular value decomposition as

$$ \Gamma = U \Lambda U^T, \tag{A9} $$

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$ is the diagonal matrix of eigenvalues and $U$ is the matrix of column eigenvectors. The flattest model is recovered by integrating $m'(z)$ using $m(c)$ as a fixed value:

$$ m(z) = m(c) + \int_c^z m'(u)\, du. \tag{A10} $$

According to (A6), $m'(z)$ is not defined for $z = c$ since a step discontinuity occurs in the kernel functions at this point; also, the derivative is not defined in a strict sense at the endpoints $a$ and $b$. However, the derivative may be defined at these points according to the following convention. For an interior point $c$, $m'(c)$ is defined to be the average of the left and right limits of $m'(z)$ as defined by (A6); at the endpoints $a$ and $b$, where two-sided limits do not exist, the model derivative is defined to be the appropriate one-sided limit, i.e.

$$ m'(a) = \lim_{z \to a^+} m'(z), \tag{A11a} $$

$$ m'(b) = \lim_{z \to b^-} m'(z), \tag{A11b} $$

$$ m'(c) = \frac{1}{2} \left[ \lim_{z \to c^-} m'(z) + \lim_{z \to c^+} m'(z) \right]. \tag{A11c} $$

If the point $c$ coincides with either $a$ or $b$, the derivative at that point is defined to be the appropriate one-sided limit. In this case it is straightforward to verify that (A6) reduces to

$$ m'(z) = \sum_{j=1}^{N} \alpha_j \left[ h_j(z) - h_j(b) \right], \qquad \text{for } c = a, \tag{A12a} $$

$$ m'(z) = \sum_{j=1}^{N} \alpha_j h_j(z), \qquad \text{for } c = b, \tag{A12b} $$

which correspond to the expressions for the flattest model that are generally derived when the model value is specified only at the endpoints.

The standard procedure for constructing the flattest model involves specifying some estimate of the model value $m(c)$. Oldenburg (1984) illustrates how specifying an inaccurate estimate can introduce false structure into the constructed model. To construct the absolutely flattest model which truly minimizes the Lagrange functional, (A5) must also be minimized with respect to the parameter $m(c)$.
Setting $\partial\Phi/\partial m(c)$ equal to zero leads to

$$ \sum_{j=1}^{N} \alpha_j h_j(b) = 0. \tag{A13} $$

Equation (A13) represents a further constraint on the $\alpha_j$'s that must be satisfied by the absolutely flattest model. Equations (A7) and (A13) represent $N+1$ equations in $N+1$ unknowns: the $N$ Lagrange multipliers $\alpha_j$ and the optimum value of $m(c)$. The system of equations may be solved for $m(c)$ to yield

$$ m(c) = \frac{\displaystyle\sum_{j=1}^{N} \tilde{h}_j(b)\, \tilde{e}_j / \lambda_j}{\displaystyle\sum_{j=1}^{N} \tilde{h}_j^2(b) / \lambda_j}, \tag{A14} $$

where $\tilde{e}_j = \sum_{k=1}^{N} U_{kj} e_k$ and $\tilde{h}_j = \sum_{k=1}^{N} U_{kj} h_k$ represent rotated responses and kernel functions. Using this model value will result in the absolutely flattest model.

It is straightforward to show that the absolutely flattest model does not depend on the choice of $c$ in the interval $[a, b]$. Differentiating $\Phi$ with respect to $c$ (with $m(c)$ held fixed as an independent parameter) leads to

$$ \frac{\partial \Phi}{\partial c} = -2\, m'(c) \sum_{j=1}^{N} \alpha_j h_j(b). \tag{A15} $$

If condition (A13) for the absolutely flattest model is satisfied, then $\partial\Phi/\partial c = 0$ regardless of the choice of $c$. Any convenient value of $c$ can be used to evaluate $m(c)$ via (A14).

Equation (A14) for the optimal choice of $m(c)$ is not valid in the case that $h_j(b) = 0$ for all $j$ (i.e. when the original kernel functions all have zero area). If this is the case, condition (A13) is automatically satisfied and it is straightforward to verify that the modified data equations (A4) are independent of $m(c)$. In this case the absolutely flattest model is unique only to within an additive constant, and any value of $m(c)$ will result in a flattest model with the same derivative norm $\|m'\|_2$.

A number of interesting properties of the absolutely flattest model can be demonstrated. Consider first the general expression (A6) for $m'(z)$ when $c$ represents a point in the interior of the interval. According to (A6), $m'(z)$ is formed as a linear combination of kernel functions that have a step discontinuity at $c$. Therefore, in general, $m'(z)$ also has a discontinuity at $c$, i.e. the flattest model has a discontinuous derivative at $c$, so $m(z) \in C^0$.
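The optimal model value (A14) and the constraint (A13) can be checked numerically. The sketch below is illustrative only and is not the thesis code: it assumes a uniform grid with trapezoidal quadrature, and the helper names `modified_kernels`, `absolutely_flattest` and `derivative_norm_sq` are hypothetical. The last helper uses the identity $\|m'\|_2^2 = \boldsymbol{\alpha}^T \Gamma \boldsymbol{\alpha}$, which follows from (A6a) and (A8).

```python
import numpy as np


def modified_kernels(g, z, c):
    """Return k_j = h_j(z) - h_j(b) H(z - c), the values h_j(b), and trapezoid weights."""
    dz = z[1] - z[0]
    steps = 0.5 * (g[:, 1:] + g[:, :-1]) * dz
    h = np.hstack([np.zeros((g.shape[0], 1)), np.cumsum(steps, axis=1)])
    h_b = h[:, -1]
    k = h - np.outer(h_b, np.heaviside(z - c, 0.5))
    w = np.full(z.size, dz)
    w[0] = w[-1] = 0.5 * dz
    return k, h_b, w


def absolutely_flattest(g, z, e, c):
    """Sketch of (A9), (A13), (A14): optimal m(c) and the Lagrange multipliers."""
    k, h_b, w = modified_kernels(g, z, c)
    if np.allclose(h_b, 0.0):
        raise ValueError("(A14) is not valid when every h_j(b) is zero")
    gamma = (k * w) @ k.T
    lam, u = np.linalg.eigh(gamma)          # Gamma = U diag(lam) U^T, as in (A9)
    h_rot = u.T @ h_b                       # rotated kernel values
    e_rot = u.T @ e                         # rotated responses
    m_c = np.sum(h_rot * e_rot / lam) / np.sum(h_rot**2 / lam)   # (A14)
    alpha = u @ ((m_c * h_rot - e_rot) / lam)                    # (A7)
    return m_c, alpha, gamma, h_b


def derivative_norm_sq(gamma, h_b, e, m_c):
    """Squared derivative norm alpha^T Gamma alpha for a given value of m(c)."""
    alpha = np.linalg.solve(gamma, m_c * h_b - e)
    return alpha @ gamma @ alpha
```

As a consistency check, responses generated by a constant model $m(z) = m_0$ satisfy $e_j = m_0 h_j(b)$, in which case (A14) returns $m(c) = m_0$ and all the multipliers vanish, so the recovered model is exactly flat.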
Applying condition (A13) for the absolutely flattest model to (A6), however, removes the discontinuity from $m'(z)$. Thus, the absolutely flattest model is that model which forms $m'(z)$ as a linear combination of discontinuous kernel functions in such a way that the discontinuity is eliminated. The absolutely flattest model has a continuous derivative at all points and thus $m(z) \in C^1$. This property can be understood by recognizing that the flattest model is really the flattest acceptable model which passes through the specified point $m(c)$. In general, for an arbitrary value of $m(c)$ the requirement that the model pass through this point is antagonistic to minimizing the model-derivative norm while fitting the data; the model which best accomplishes these conflicting objectives has a discontinuous derivative at $c$. However, the value of $m(c)$ specified for the absolutely flattest model is that value which is not antagonistic to the other objectives, and no discontinuity results.

When the model value is specified for an interior point $c$ it follows from (A6) that $m'(a) = m'(b) = 0$ for any value of $m(c)$. When $c$ coincides with either $a$ or $b$ the general expression (A6) for the flattest model reduces to the simpler form (A12). According to (A12), when $c = b$, $m'(a) = 0$ (since $h_j(a) = 0$), but in general $m'(b) \neq 0$; when $c = a$, $m'(b) = 0$, but in general $m'(a) \neq 0$. Thus, when the model value is specified at an endpoint, the flattest model will generally have a non-zero derivative at this endpoint and zero derivative at the other endpoint. The non-zero derivative at the specified endpoint is a result of supplying a model value here which conflicts with the objective of minimizing the derivative norm while fitting the data; this introduces additional structure into the constructed model.
Applying condition (A13) for the absolutely flattest model to (A12), however, immediately leads to $m'(a) = m'(b) = 0$ for both $c = a$ and $c = b$; thus the absolutely flattest model has zero derivative at both endpoints.

Equation (A7) for the unknowns $\{\alpha_j\}$ requires the inverse of the inner-product matrix $\Gamma$. However, as described in Section 3.2.1, this matrix is often ill-conditioned in practical problems and an inversion which results in an appropriate (non-zero) level of misfit is desired. The spectral expansion method of Section 3.2.2 overcomes these difficulties. As described in Section 3.2.2, a ridge-regression parameter $\beta$ is added to the main diagonal of $\Lambda$ to stabilize the inversion. The value of $\beta$ which gives the appropriate misfit may be found by solving the simple non-linear equation

$$ \chi^2 = \sum_{j=1}^{N} \left[ \frac{\beta f_j}{\lambda_j + \beta} \right]^2, \tag{A16} $$

where $\chi^2$ is the desired misfit and the (new) rotated responses $f_j$ are given by

$$ f_j = \sum_{k=1}^{N} U_{kj} \left[ m(c)\, h_k(b) - e_k \right]. \tag{A17} $$

In this case, the solution (A14) for the value of $m(c)$ which results in the absolutely flattest model becomes

$$ m(c) = \frac{\displaystyle\sum_{j=1}^{N} \tilde{h}_j(b)\, \tilde{e}_j / (\lambda_j + \beta)}{\displaystyle\sum_{j=1}^{N} \tilde{h}_j^2(b) / (\lambda_j + \beta)}. \tag{A18} $$

Of course, the difficulty with this approach is that $m(c)$ is required to compute $\beta$ according to (A16) and (A17), but $\beta$ is required to compute $m(c)$ according to (A18). A practical solution to this dilemma is to choose a reasonable starting value for $m(c)$, compute the corresponding $\beta$, then use this value to re-compute an improved $m(c)$. The solutions for $m(c)$ and $\beta$ may be repeated iteratively until both have converged to stable values. This procedure has been implemented to find the surface conductivity value in the MT inversion algorithm for the absolutely flattest model. It is found that the convergence for $m(c)$ and $\beta$ is stable and rapid, and the procedure is straightforward to implement since it is not necessary to recompute or decompose the matrix $\Gamma$ at each iteration.
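The alternating update of $\beta$ and $m(c)$ described above can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm as implemented in the thesis: the misfit is assumed to take the usual ridge-regression form $\sum_j [\beta f_j/(\lambda_j+\beta)]^2$, the non-linear equation for $\beta$ is solved by bisection in $\log\beta$ (a choice made here for simplicity), and the names `misfit`, `solve_beta` and `iterate_model_value` are hypothetical.

```python
import numpy as np


def misfit(beta, lam, f):
    """Misfit as a function of beta (assumed ridge-regression form)."""
    return np.sum((beta * f / (lam + beta)) ** 2)


def solve_beta(lam, f, target, lo=1e-16, hi=1e16, n_bisect=200):
    """Find beta giving the target misfit; the misfit increases monotonically with beta."""
    if not misfit(lo, lam, f) < target < misfit(hi, lam, f):
        raise ValueError("target misfit is not attainable for this m(c)")
    for _ in range(n_bisect):
        mid = np.sqrt(lo * hi)              # geometric midpoint (bisection in log beta)
        if misfit(mid, lam, f) < target:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)


def iterate_model_value(gamma, h_b, e, target, m_c=0.0, tol=1e-10, max_iter=100):
    """Alternate between beta (for fixed m(c)) and m(c) (for fixed beta)."""
    lam, u = np.linalg.eigh(gamma)          # decomposed once, outside the loop
    h_rot, e_rot = u.T @ h_b, u.T @ e
    for n_iter in range(1, max_iter + 1):
        f = m_c * h_rot - e_rot             # rotated responses, as in (A17)
        beta = solve_beta(lam, f, target)
        # Regularized optimal model value, as in (A18)
        m_new = (np.sum(h_rot * e_rot / (lam + beta))
                 / np.sum(h_rot**2 / (lam + beta)))
        converged = abs(m_new - m_c) < tol
        m_c = m_new
        if converged:
            break
    return m_c, beta, n_iter
```

Because the eigen-decomposition is computed once outside the loop, each iteration costs only a scalar root-find and two weighted sums, which is consistent with the remark that the matrix need not be recomputed or decomposed at each iteration.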
