A PRACTICAL SOLUTION FOR LINEAR INFERENCE COMPUTATIONS by MICHAEL GARFIELD SCHLAX B.Sc. (Engineering Geoscience) U n i v e r s i t y of C a l i f o r n i a at Berkeley, 1978. A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES Department of Geophysics and Astronomy We accept t h i s t h e s i s as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA March 1984 © Michael G. Schlax, 1984 In p r e s e n t i n g t h i s t h e s i s i n p a r t i a l f u l f i l m e n t of the requirements f o r an advanced degree at the U n i v e r s i t y of B r i t i s h Columbia, I agree th a t the L i b r a r y s h a l l make i t f r e e l y a v a i l a b l e f o r reference and study. I f u r t h e r agree t h a t permission f o r ex t e n s i v e copying of t h i s t h e s i s f o r s c h o l a r l y purposes may be granted by the head o f my department or by h i s or her r e p r e s e n t a t i v e s . I t i s understood t h a t copying or p u b l i c a t i o n of t h i s t h e s i s f o r f i n a n c i a l gain s h a l l not be allowed without my w r i t t e n permission. Department of (ftoyky S t c S A streneWy The U n i v e r s i t y of B r i t i s h Columbia 1956 Main Mall Vancouver, Canada V6T 1Y3 Date j L / ^ / 3 / 8 ^ i A b s t r a c t In t h i s t h e s i s , I s h a l l present a matrix form of Backus' theory of l i n e a r i n f e r e n c e with m u l t i p l e p r e d i c t i o n s . The Bayesian approach of Backus allows the treatment of erroneous data and the imposition of the e s s e n t i a l a p r i o r i bound on the model norm. The X 2 s t a t i s t i c w i l l be introduced to c o n s t r u c t a most l i k e l y model and bound the norm of a l l acceptable models from above and below. This r e s u l t s in more r e l i a b l e , and p o s s i b l y more c o n f i n i n g , estimates of the p r e d i c t i o n s than provided by only an upper bound. The a l g o r i t h m d e r i v e d i s robust and e f f i c i e n t , and estimates comparable to those obtained from Oldenburg's l i n e a r programming al g o r i t h m have been achieved. The work done in t h i s t h e s i s was c a r r i e d out with Doug Oldenburg and the body of t h i s t h e s i s i s e s s e n t i a l l y that of a paper e n t i t l e d "A P r a c t i c a l S o l u t i o n for L i n e a r Inference Computations" w r i t t e n by Schlax and Oldenburg. As a consequence, the pronoun "we" w i l l be used throughout the remainder of t h i s t e x t . i i Table of Contents A b s t r a c t i L i s t of F i g u r e s i i i Acknowledgements i v CHAPTER 1 An I n t r o d u c t i o n 1 CHAPTER 2 Review of L i n e a r Inference 4 CHAPTER 3 Matrix N o t a t i o n f o r Inference 10 CHAPTER 4 A S t a t i s t i c a l Formulation 13 CHAPTER 5 The Matrix S o l u t i o n 18 CHAPTER 6 A P r i o r i Bounds on ||m£|| 21 CHAPTER 7 Bounds on | | m£ | | 25 CHAPTER 8 A Numerical Example 27 CHAPTER 9 D i s c u s s i o n 34 CHAPTER 10 Con c l u s i o n 39 References 40 Appendix 1 41 Appendix 2 44 Appendix 3 47 Appendix 4 49 Appendix 5 52 i i i L i s t of F i g u r e s F i g u r e 1 A Geometrical Example of the Inference Problem .. 7 Fi g u r e 2 Numerical Example I 30 F i g u r e 3 Numerical Example II 32 iv, ACKNOWLEDGEMENTS T h i s research owes a great deal to Doug Oldenburg. Besides o f f e r i n g me the op p o r t u n i t y to pursue my research i n t e r e s t s , he provided guidance, f r i e n d s h i p , and c o n t r i b u t e d many hours of thought and d i s c u s s i o n to t h i s t h e s i s . His i n t e r e s t in my s t u d i e s , and p e r p e t u a l l y a v a i l a b l e help have made my M.Sc. program a f r u i t f u l and enjoyable experience. Jose J u l i a n Cabrera Gomez kept l a s maquinas h o r r i b l e s at bay and saved me much g r i e f by f i n d i n g my i n f i n i t e loops. Diane Donnelly d i d a remarkable job of i n t e r p r e t i n g my "handwriting" and t u r n i n g out f i n e manuscripts. And, f i n a l l y , thanks to Bob K r i d e r , for making me f e e l normal. 1 CHAPTER 1 AN INTRODUCTION For many p h y s i c a l problems, there i s a l i n e a r f u n c t i o n a l r e l a t i o n s h i p between o b s e r v a t i o n a l data and the model. T h i s r e l a t i o n s h i p can be w r i t t e n as a Fredholm equation of the f i r s t k i n d 7 • = J m (x) g L (x) dx i =1,...D (1 ) In equation ( 1 ) , 7^ i s the i ' t h datum, m f(x) i s a model d e s c r i b i n g that property of the earth'about which i n f o r m a t i o n i s sought, and g^(x) i s the data kernel corresponding to the i ' t h datum. The f u n c t i o n s g-L(x) are determined from the geometry and p h y s i c a l equations r e l e v a n t to the problem and are assumed to be known. A p a r t i c u l a r experiment w i l l y i e l d D numerical v a l u e s , which are contaminated with e r r o r . These data provide a set of l i n e a r c o n s t r a i n t s upon m E(x). L i n e a r i n v e r s e theory uses these c o n s t r a i n t s to e x t r a c t i n f o r m a t i o n about the model m f(x) in three p o s s i b l e ways: (1) to c o n s t r u c t a model which reproduces the data; ( 2 ) to take l i n e a r combinations of the data to generate unique averages of the model; ( 3 ) to c a l c u l a t e the values of other l i n e a r f u n c t i o n a l s of the model. The f i r s t two approaches, which are r e s p e c t i v e l y r e f e r r e d to as model c o n s t r u c t i o n and a p p r a i s a l , are r o u t i n e l y used and do not r e q u i r e f u r t h e r e x p l a n a t i o n . However, the t h i r d approach, l i n e a r i n f e r e n c e , has had l i m i t e d exposure. The goal of l i n e a r i n f e r e n c e i s to use the ( i n a c c u r a t e ) o b s e r v a t i o n s to p r e d i c t values of other l i n e a r f u n c t i o n a l s of m_(x), that i s , to estimate P values 7". given by 2 7; = / m, (x) p- (x) dx i = 1, P (2) These p r e d i c t i o n s may represent averages of m^x) i n the sense of Backus and G i l b e r t (1968, 1970) or Oldenburg (1983), or they may be a s s o c i a t e d with some p h y s i c a l parameters which depend upon m f(x). The f u n c t i o n s p ^ ( x ) , which we r e f e r to as the p r e d i c t i o n k e r n e l s , w i l l be determined by these mathematical or p h y s i c a l c o n s i d e r a t i o n s and are assumed to be known. The mathematical foundations of l i n e a r i n f e r e n c e theory were presented i n a s e r i e s of elegant and formidable papers by Backus (1970a,b,c, 1972). These have r e c e i v e d l i t t l e a t t e n t i o n i n the ge o p h y s i c a l l i t e r a t u r e but some attempts have been made to use t h i s work. Parker (1971) a p p l i e d i n f e r e n c e theory to the c a l c u l a t i o n of seamount magnetism, and Johnson and G i l b e r t (1972) used the r e s u l t s of Backus (1970a) to assess the non-uniqueness inherent i n the i n v e r s i o n of t e l e s e i s m i c d a ta. Parker (1977a) presented a matrix form of Backus' (1970a) s o l u t i o n of the i n f e r e n c e problem with P>1 and e r r o r - f r e e data. In that paper, Parker showed how more c o n f i n e d estimates of the p r e d i c t i o n s c o u l d be had when they were c o n s i d e r e d as the c o e f f i c i e n t s of some expansion of m f(x). Jackson (1979) a l s o presented a s o l u t i o n to the i n f e r e n c e problem a f t e r assuming that m f(x) c o u l d be represented by a f i n i t e number of parameters. More r e c e n t l y , Oldenburg (1983) has a p p l i e d l i n e a r programming to the s o l u t i o n of a l i n e a r i n f e r e n c e problem. He c a l c u l a t e d upper and lower bounds to boxcar averages of m^x) in order to assess non-uniqueness. In t h i s paper, we present a matrix form of Backus' (1970a) s o l u t i o n to the i n f e r e n c e problem with m u l t i p l e p r e d i c t i o n s and 3 i n a c c u r a t e d a t a . We s h a l l g ive a b r i e f review of Backus' theory of l i n e a r i n f e r e n c e , and present the a p p r o p r i a t e matrix n o t a t i o n . Using p r o b a b i l i t y d e n s i t y f u n c t i o n s , we s h a l l show how the data and other a p r i o r i i n f o r m a t i o n about m f(x) may be combined through Bayes' theorem to d e f i n e a set of models which c o n t a i n s m f(x). The range of v a l u e s of the p r e d i c t i o n s a r i s i n g from t h i s set of models w i l l be c a l c u l a t e d to provide p r o b a b i l i s t i c estimates f o r 7*. . We wish to emphasize that we are p r e s e n t i n g a p r a c t i c a l form of Backus' work. With minor except i o n s , the theory of our p r e s e n t a t i o n i s h i s , and we hope that t h i s treatment of l i n e a r i n f e r e n c e , presented at an elementary l e v e l , w i l l demonstrate the power of Backus' ideas. 4 CHAPTER 2 REVIEW OF LINEAR INFERENCE We s h a l l f i r s t present a b r i e f review of Backus' "(1970a) treatment of the l i n e a r i n f e r e n c e problem. From the data set (equation (1)) we seek to estimate the p r e d i c t i o n s in equation (2) . We a l s o suppose that the e r r o r a s s o c i a t e d with each 7fc. i s Gaussian d i s t r i b u t e d with zero-mean and known standard d e v i a t i o n , and that the c o v a r i a n c e matrix fo r the e r r o r s i s known. We assume that m £(x), g. ( x ) , and p t-(x) are a l l elements of a r e a l H i l b e r t space with a w e l l - d e f i n e d inner product <u,v> = / u(x) v(x) dx and norm | |u| | = <u,u>^ With t h i s n o t a t i o n , equations (1) and (2) can be w r i t t e n as 7t- = <m£ ,gt. > i = 1, . . .D (3) \ =' <mE , p^ > i = 1,...P (4) In l i g h t of the geometric s t r u c t u r e equations (3) and (4) may be c h a r a c t e r i z e d as the p r o j e c t i o n s of m£ onto the v e c t o r s gt- and pt- . We s h a l l e x p l o i t t h i s geometry to f i n d a s o l u t i o n . The v e c t o r s g4- and pt- allow a convenient decomposition of M. Assuming, without l o s s of g e n e r a l i t y (Backus, 1970a), that the set i9 i•••% iP1•••PP) i s l i n e a r l y independent, we d e f i n e three f i n i t e - d i m e n s i o n a l subspaces of Let asp {} denote the a l g e b r a i c span, or the set of a l l l i n e a r combinations, of a set of v e c t o r s i n 5 t^f, and def i n e JT?=asp {g, . . .gD ,p, . . .p-p } and «2)=asp {gi...gj>}. The t h i r d vector space, the orthogonal complement of***-'in $>. ThusSA^? where # denotes the orthogonal d i r e c t sum of two subspaces. Let mg, iri^ and m[- be the orthogonal p r o j e c t i o n s of iri£ onto <Q J^"? and the complement of r e s p e c t i v e l y . Then me = m^' + rr£ + ni^ . From the d e f i n i t i o n of the subspaces i t fol l o w s that the data equations may be r e w r i t t e n as 7< = <rnj / 9; > i = 1 , . . . D ( 5) S i m i l a r l y , the p r e d i c t i o n s take the form = <m f " 'Pt > + <m" f 'Pi = <m^ ,p(."> + <m^,p£> i = l , ...P (6) which we w r i t e as ft i = l , . . . P (7) Equations (5) - (7) demonstrate a fundamental f a c t about the inference problem; the data c o n s t r a i n only m£ (equation 5 ) . In f a c t , the data are equivalent to knowledge of m£. Although 7^ " may always be found from the data, the data provide no information about 7^. I t therefore f o l l o w s from equations (6) and (7) that 7t-cannot be determined unless f u r t h e r c o n s t r a i n t s are placed upon me or unless pX=0. We s h a l l attempt to estimate 7V by p l a c i n g a p r i o r i 6 c o n s t r a i n t s upon me. On the b a s i s of p h y s i c a l reasoning there w i l l always e x i s t a p o s i t i v e number M such that ||mE||<M. The Cauchy-Schwartz i n e q u a l i t y assures us that | ^ | = |<m£,pf>| < M | |p^| | ( 8 ) from which we can wri t e 7t- = ± | T^"*" | - The value of M i n equation ( 8 ) may be re p l a c e d by Mj_ i f i t i s hypothesized that | | m^ " [ | -^ Mj_. We would l i k e to demonstrate the concepts o u t l i n e d above by c o n s i d e r i n g a very simple analogy using v e c t o r s i n the 2-dimensional E u c l i d e a n space E 2 , i l l u s t r a t e d in Figu r e 1. We co n s i d e r E 2 with the usual b a s i s {x,y}, inner product, and norm. Suppose there i s some ve c t o r me i n E 2 , and that the only i n f o r m a t i o n a v a i l a b l e about mE i s the datum, 7=<m£,x>. I t i s d e s i r e d to have an estimate 7=<mE,p> where the ve c t o r p = ax" + b$ i s the p r e d i c t i o n k e r n e l . For t h i s example the subspace the x- a x i s , and i s the y - a x i s (pJ" = by). Our datum 7 provides i n f o r m a t i o n about m^ , that i s mE' = 7 x , but provi d e s no information about nig, the y - component of mE. Equation ( 6 ) shows that our p r e d i c t i o n i s 7 = 7 a + <mE",p^ "> and consequently i s undetermined unless m E i s bounded. H y p o t h e s i z i n g that Hm^l! < M x w i l l c o n f i n e m£ to the s o l i d l i n e i n F i g u r e 1. Corresponding estimates of 7 are then given by 7 a - bM_,_ < 7 < ya + bMj_ ( 9 ) I t i s c l e a r from equation ( 9 ) that the u s e f u l n e s s of t h i s approach 7 F i g u r e 1 A Geometrical Example of the Inference Problem From the datum 7 = <m£,x> we may f i n d mE' = 7 X . The p r e d i c t i o n v e c t o r i s p = ax + by. The bound | |mg j | < M± c o n f i n e s mf to the s o l i d l i n e , from which we estimate yx= <p,m£> as 7 a-bM i£ 7 < 7a + bM x. Figure 1 9 w i l l depend h e a v i l y on the magnitude of the hypothesized bound Mj. and t h e r e f o r e some e f f o r t should be made on the part of the i n v e s t i g a t o r to make M x as small as p o s s i b l e . There are two extensions to the i n f e r e n c e problem o u t l i n e d thus f a r which we would l i k e to c o n s i d e r . The f i r s t i s to i n c l u d e the e f f e c t s of the data e r r o r s ; the second i s to allow the bounds on ||nif|| or ||m^ || to be p r o b a b i l i s t i c . These two extensions w i l l permit us to d e f i n e regions i n <--Dand w ^ w h i c h c o n t a i n m'^ and m^ a t some l e v e l of c o n f i d e n c e . The e v a l u a t i o n of <mE',p"> and <m"^,pf> f o r a l l models i n those regions w i l l provide the d e s i r e d estimates f o r % • In order to make these e x t e n s i o n s and to set up a formalism w i t h i n which the computations are e a s i l y c a r r i e d out, we devote the next two s e c t i o n s to s e t t i n g up a matrix n o t a t i o n and i n t r o d u c i n g Bayesian s u b j e c t i v e p r o b a b i l i t i e s . 10 CHAPTER 3 MATRIX NOTATION FOR INFERENCE Equation (6) demonstrates that the c a l c u l a t i o n s r e q u i r e d t o f i n d the yt- i n v o l v e only elements of the f i n i t e - d i m e n s i o n a l v e c t o r spaces o D a n d i ? V e c t o r s on those subspaces o fc^may be represented as D- and P- t u p l e s of r e a l numbers and inner products of elements ofcOorc!^may be computed using matrix o p e r a t i o n s . We s h a l l e x p l o i t t h i s f i n i t e - d i m e n s i o n a l r e p r e s e n t a t i o n and develop the s u i t a b l e matrix equations f o r the i n f e r e n c e problem. D e t a i l s may be found i n Appendix 1. F i r s t , l e t us d e f i n e some n o t a t i o n . Consider the subspace with b a s i s {gi-.-g,,}. If x i s any element o f ^ ) w e may wr i t e x = Z a, g, (10) E q u i v a l e n t l y x may be represented as the D-tuple c o n s i s t i n g of the c o e f f i c i e n t s at- , that i s , as the column vect o r [ x ] T = (a, , a 2 . . .a B ) . Thus [x] i s the matrix of c o e f f i c i e n t s . If x and y are v e c t o r s i n j Q t h e i r inner product appears i n matrix form as <x,y> = [ x f T [y] (11) where 1^. = <g^ ,g:>. T i s c a l l e d the inner product matrix, and i t i s p o s i t i v e d e f i n i t e and symmetric. With t h i s n o t a t i o n , the norm of x i s given by | |x| | 2 = <x,x> = [ x f T[x] (12) 11 We now turn our a t t e n t i o n to the c a l c u l a t i o n of the p r e d i c t i o n s . Equation (7) can be w r i t t e n as a P-tuple i n E^ >, that i s , as 7 = 7 " + ^ = (7 1"...7; ,) T+ ( 7 t . . . 7 V - ) T (13) Our goal i s to f i n d a convenient way to evaluate t h i s equation. For any model m^ i n 301 the P-tuple 7" may be evaluated by T = G T[m?] (14) where GLi = <g i fp->. To evaluate the second component of the p r e d i c t i o n , yx, we intr o d u c e , as a b a s i s f o r P, the set (p^, . . .p£ }. Let the inner product matrix with respect to t h i s b a s i s be Ax£j= <P^,P±>> Since p"£" = p. - p", i t i s e a s i l y shown that A x = A - G T r - 1G (15) where A - j = <p(-,pJ->. Thus, f o r any model m"£ i n v ! ^ we have 7 X = A x [ < ] (16) Equations (14) and (16) permit the computation of j for any model m£ = m£ + m^ . The data w i l l be used to d e f i n e a region in ( ^ c o n t a i n i n g m£ at a c e r t a i n l e v e l of co n f i d e n c e . For t h i s purpose, i t i s necessary to understand the geometrical s i g n i f i c a n c e of the data. T h i s w i l l f o l l o w from c o n s i d e r a t i o n of the dual b a s i s f o r which we write 12 as {g'.-.g0}. The data are the c o e f f i c i e n t s of rrig with respect to the dual basis (Backus, 1970a). If [m^]^ i s the c o e f f i c i e n t matrix of mE with respect to the dual basis, i t can be shown (Appendix 2) that [ m j ] = r-'fmJIk = r - 1 ? (17) where yT = (7, .. .yv ) are the data. Equation (17) may be recognized as the equation defining the smallest model of Backus and G i l b e r t . Clearly, m^ i s the model with smallest norm that s a t i s f i e s the data equations (5); th i s i s the distinguishing property of the orthogonal projection. 1 3 CHAPTER 4 A STATISTICAL FORMULATION The previous chapter showed how the p r e d i c t i o n s y c o u l d be c a l c u l a t e d i f m^' and rri^ were provided. The p r a c t i c a l d i f f i c u l t y i s that n e i t h e r of these f u n c t i o n s can be determined e x a c t l y . Let us f i r s t c o n s i d e r the d i f f i c u l t i e s i n v o lved with computing . The data "7 are i n a c c u r a t e and may be w r i t t e n as 7 = yt + 67 where 7 ( are the true data and 67 i s an e r r o r v e c t o r . We s h a l l suppose that each element of 67 has a Gaussian d i s t r i b u t i o n with zero-mean. Let V be the c o v a r i a n c e matrix of the e r r o r s . If 7^ are the true data, then a j o i n t pdf for 7 on the vector space E^ i s given by p ( f | 7 t ) = const x exp [ -%.{ 7~7 t ) T V~ 1 ( y-yt ) ] (18) Let 3> m" = L 0. g. (19) Equation (17) shows that f o r any given "7, the c o e f f i c i e n t s 0 i n equation (19) are 0 = r ~ ' 7 ; that i s , the c o e f f i c i e n t s 0 are l i n e a r combinations of the data 7*. It f o l l o w s that s t a t i s t i c a l u n c e r t a i n t i e s in 0 can be r e l a t e d d i r e c t l y to the s t a t i s t i c s of the -» -». -* data e r r o r s . If 0 T = r " ' 7 t , the pdf for 0 i s p ( 0 | 0 T ) = const x exp [-%M 0-0 T ) T r v 1 r( 0-0 T ) ] (20) Equation (20) e s t a b l i s h e s the s t a t i s t i c a l r e l a t i o n s h i p between data i n a c c u r a c i e s and s t a t i s t i c a l v a r i a b i l i t y of the c o e f f i c i e n t s 0. V a r i a b i l i t y of the c o n s t r u c t e d p o r t i o n of the model on 3D can 14 t h e r e f o r e be determined. G e o m e t r i c a l l y , equation (20) has the f o l l o w i n g i n t e r p r e t a t i o n . —» At the 1-a l e v e l of c o n f i d e n c e , 0 i s contained w i t h i n the h y p e r e l l i p s o i d a l region centered on 0-t d e f i n e d by the p o s i t i v e d e f i n i t e q u a d r a t i c form ( £ - £ t ) T r v - 1 T ( 0 - 0 T ) < 1 (21 ) We s h a l l be concerned only with t h i s g e o m e t r i c a l i n t e r p r e t a t i o n of p d f s . For our purposes i t i s necessary only to write the argument of the pdf, that i s , equation (20) w i l l be w r i t t e n as p(?|£ t ) - « - ( ? - £ t ) T r v - ' r ( M t ) We can a l s o d e f i n e a pdf f o r /?t which expresses our b e l i e f that ||m£|| < M u. The a p p r o p r i a t e pdf i s p ( j j t ) — ^ t r M ^ A (22) and t h i s s t a t e s t h a t , at the 1-a l e v e l of confidence, m£ i s c o n t a i n e d w i t h i n a sphere of r a d i u s M,( i n i D c e n t e r e d on 0. In a s i m i l a r manner, a pdf f o r f^, the c o e f f i c i e n t s of m"e with respect to the b a s i s {p\ , . . .p£ } , may be d e f i n e d i f | | m£ | | < M^ .. That pdf i s p(% >-^ t TA (23) Equations (20) and (22) represent two pdfs f o r the c o e f f i c i e n t s /3 and $t . We would l i k e to combine them i n t o one pdf f o r /L and to do so we s h a l l use the approach pursued by Bayesian 15 s t a t i s t i c i a n s . Bayesians maintain that before the experiment i s performed, the i n v e s t i g a t o r has some r a t i o n a l c o n v i c t i o n regarding the r e s u l t 7 of an experiment. T h i s a p r i o r i c o n v i c t i o n may be expressed as a pdf, P p r i o ^ T t ) • where 7 t i s the true ( e r r o r f r e e ) outcome of the experiment. A f t e r the experiment i s performed and the data 7 are obtain e d , the i n v e s t i g a t o r w i l l a l t e r h i s s u b j e c t i v e n o t i o n s i n the l i g h t of the r e s u l t and a r r i v e at an a p o s t e r i o r i pdf f o r 7^ . We wr i t e the pdf f o r 7 t , given the datum 7 as P p o s t ( 7 t l 7 ) " A s t r a i g h t f o r w a r d e x t e n s i o n of Bayes' theorem (Backus (1970a), Savage (1962)) leads to the f o l l o w i n g formula f o r p p 0st(7<|t) P p o s - J t t l T ) = « P d e i t t t(7|7 t ) P p r t 0 r(7 t ) ( 2 4 ) where K i s a s u i t a b l e n o r m a l i z i n g constant. p p o j t ( 7 t l 7 ^ w i l l have some expected value % and c o v a r i a n c e . In our problem, our a p r i o r i c o n v i c t i o n that ^ M„ l e d to an a p r i o r i pdf p pr»i<jr( 01) given by equation (22). The outcome of the experiment y i e l d e d the pdf PjaXe< (01 0* ) given by equation (20). Thus, through Bayes' theorem, we have P p o s t ^ t l ^ ) — - ( 0 - 0 T ) T ™ - 1 r ( 0 - 0 T ) + ^m-2tt < 2 5 ) I t should be noted that while we s h a l l use pdfs d e f i n e d on the f i n i t e - d i m e n s i o n a l spaces Ej, and E-p, the formal development of Backus' theory r e q u i r e s the use of p r o b a b i l i t y measures d e f i n e d on the i n f i n i t e - d i m e n s i o n a l spaced. T h i s i s a d i f f i c u l t t o p i c , and i s covered i n d e t a i l by Backus (1970c, 1972). A l s o , our use of pdfs to d e f i n e regions c o n t a i n i n g ml and nf" i n a p r o b a b i l i s t i c sense does 16 not imply that these models are the outcome of some random process; we are simply using the pdfs to express our uncertainties about mjr and ntp in a quantitative manner. This means of expressing uncertainty (or degree of knowledge) i s fundamental to Bayesian analysis. There has been considerable controversy over the application of Bayes' theorem. Rather than focusing on the mathematical v a l i d i t y of the theorem, which is unquestioned, o b j e c t i v i s t s question our a b i l i t y to define ppcier(7-c ) . a n& t n e u s e °f Pjq-u^ I % •* to define a "pdf for y±". When applying Bayes' theorem to geophysical inference problems, we n o t e : ( l ) in many, i f not a l l geophysical problems, the norm of me i s generally constrained by widely accepted physical i n t u i t i o n ; (2) the geometrical significance of the data makes i t s use in defining a region containing m£ by p r o b a b i l i s t i c means reasonable, and (3) as Backus (1970a, 1970c, 1972) points out, the solution obtained by using Bayes' theorem reduces to the correct one when the data are error-free, and also in the case with one prediction as determined without recourse to Bayesian techniques. In l i g h t of t h i s , and given the simple, e f f i c i e n t and robust solution obtained, we f e e l , along with Backus, that the Bayesian approach i s acceptable and advantageous Often, linear constraints on m£ are available prior to obtaining data from an experiment. If an a p r i o r i constraint may be expressed in terms of a linear functional then we are j u s t i f i e d in formally treating this information as data and including i t in the data set. For example, we may be convinced that the average of m f(x) over some subinterval [a,b] of [0,1] l i e s near a certain 17 value M• This conviction may be expressed as u ± 8u = J B([a,b])m (x)dx (26) o where B([a,b]) i s a suitable averaging function (e.g. a boxcar over [a,b]) and bu i s an "error" assigned in accordance with the strength of our conviction. Jackson(1979) coined the term "a p r i o r i data" for t h i s type of constraint and argued for i t s inclusion in any data set. Henceforth, the term data w i l l apply to both observational data and any a p r i o r i linear constraints that we feel j u s t i f i e d in including. 18 CHAPTER 5 THE MATRIX SOLUTION We s h a l l now apply the concepts and no t a t i o n of the previous chapters and give the matrix form for the s o l u t i o n to the i n f e r e n c e —p problem. Expressing the data as a D-tuple 7"= ( 7 , . . . ^ ) and assuming that a l l of the data are contaminated with zero-mean Gaussian e r r o r s we wr i t e 7 = 7 t +67 (where 7^ i s the true value) and express t h i s as the pdf Proceeding to c o n s t r u c t a pdf for the c o e f f i c i e n t s Pt = [m£] i n E^., we have P j a < 0 k - 0 | tt ) in equation (20) and P p r i o r ( ^ t ^ (equation (22)) r e s u l t i n g from our c o n v i c t i o n that ||m£|| < M„. Combining these with Bayes' theorem produces equation (25). W r i t i n g P p O V t ( | P ) as p(P^), and completing the square (Appendix 3), we ob t a i n p(? t )—^(it -\ ) T ( r V " T+M ^ D (0t -/3t) (27) with 0 = ( rv - T+M^D - ' n r '7 (28) Equations (27) and (28) d e f i n e a region i n Ej> which w i l l c o n t a i n the c o e f f i c i e n t s of m£ at a given l e v e l of c o n f i d e n c e . At the i - a l e v e l , t h a t w i l l be the h y p e r e l 1 i p s o i d d e f i n e d by the p o s i t i v e d e f i n i t e q u a d r a t i c form <^-0c.> ( rv- T+M ^ r ) (j?t -&c ) < 1 (29) The model mc , given by /Jc=[m&] i s the most l i k e l y c a n d i d a t e f o r m£, 19 given the data and our prejudice about the norm of mj?. Our uncertainty about m"., expressed by (27) and (28) may be used to estimate 7". From (14) we find the corresponding expected value and covariance matrix for 7" as = \ = G T ?J t (30) and v" = G T(rv-T+M-^D " 1G (31) which leads to the pdf on Ep for 7" p(7" )—*(1"-7 t ) T v " - 1 ( f " - \ ) (32) To calculate "y , we note that the only information we have about rri^ is the a p r i o r i bound | |m£ | | < Mj.. Thus, i f T?t is the P-tuple of c o e f f i c i e n t s of m"^ with respect to the basis for we find the pdf defined in equation ( 2 3 ) . Consequently <^[rf^) = 0 and the covariance matrix of 7 is V A = M * A X (33) Our desired values of 7 are found by combining 7" and 7 as independent random variables. The expected value and the covariance matrix are respectively: 6 [ 7 l = \ = G T ^ (34) 20 and V2 = V n 2 .—• | _ (35) At the 1-a l e v e l of confidence the P-tuple y w i l l be contained in the region defined by (f"7c )T V"'(?-% ) £ 1 (36) In practice, i t i s more convenient to write 7i - <*£ ± (37) i f the covariances may be ignored. It i s convenient to use data whose errors are independent with unit variance. It i s always possible to find linear combinations of the data which y i e l d an equivalent data set with those properties (Gilbert, 1971). Once this normalization has been done, the solution requires the inversion of the matrices r and (T + Mf»21) . In many geophysical problems, T w i l l be poorly conditioned. The addition of Mjj2I to T w i l l generally allow the cal c u l a t i o n of a stable inverse for ( r + Mjj 2I); r~ 1 , however, must be approximated by discarding eigenvalues and eigenvectors of r . Since T" 1 appears only as a premultiplier of G, the e f f e c t of approximating r~ 1 i s to approximate [ p " l . The resulting error in 7" may be quantified and held well beneath data error. The d e t a i l s of this s t a b i l i z a t i o n procedure may be found in Appendix 4. 21 CHAPTER 6 A PRIORI BOUNDS ON ||m^ || In t h i s s e c t i o n we s h a l l present arguments l e a d i n g to a more reasonable and r e s t r i c t i v e pdf f o r m--'. The problem with the pdf f o r m£ d e r i v e d i n the l a s t s e c t i o n was the nature of the a p r i o r i c o n s t r a i n t ||mg|| < M„ . Such a c o n s t r a i n t has the p o s s i b i l i t y of being unduly weak s i n c e i t admits f u n c t i o n s that the i n v e s t i g a t o r would thi n k unreasonable. In p a r t i c u l a r , the c o n s t r a i n t ||m£|| < MB e a s i l y admits the f u n c t i o n mg = 0, a r e s u l t which i s h i g h l y u n l i k e l y f o r many experiments. In view of t h i s , b e t t e r r e s u l t s might be expected i f numbers ML and Mrt c o u l d be found such that M L - ll m£ll - M H a n c ^ ^ t h i s c o n s t r a i n t c o u l d be i n c l u d e d ( p r o b a b i l i s t i c a l l y ) i n the manner o u t l i n e d i n the l a s t s e c t i o n . U n f o r t u n a t e l y , t h i s cannot be done d i r e c t l y , but i t can be accomplished by a l t e r i n g our problem s l i g h t l y . Instead of producing a most l i k e l y model m&, and a range of a c c e p t a b l e models around i t , as we d i d i n the l a s t s e c t i o n , we s h a l l t r y to f i n d a most l i k e l y model ( i n ) using model c o n s t r u c t i o n techniques. If /3 tagain re p r e s e n t s the c o e f f i c i e n t s of m£ with respect to the b a s i s of £>, then the c o e f f i c i e n t s ^ = r~ w i l l reproduce the data e x a c t l y . But almost a l l i n v e r s e problems are unstable i n the sense that small v a r i a t i o n s i n the data produce l a r g e f l u c t u a t i o n s i n the c o n s t r u c t e d model i f the data are to be f i t e x a c t l y . T h e r e f o r e , when the data are imprecise, i t i s important that the c o n s t r u c t e d model which estimates m£ does not reproduce those data e x a c t l y . When the data e r r o r s are independent, Gaussian, with zero-mean and u n i t v a r i a n c e , the a p p r o p r i a t e measure fo r the a c c e p t a b i l i t y of a c o n s t r u c t e d model i s the x 2 m i s f i t , x 2 i s d e f i n e d by 22 x 2 = i ht~yL ) 2 (38) where yj- = <m,g^ > i s the i ' th datum predicted by the constructed model m. If D i 5 the expected value of x 2 i s approximately D and the standard deviation of x 2 i s a^i = 2\[2D. A model generating a chi-squared m i s f i t much less than D w i l l f i t the data too well and w i l l show structure that i s merely an a r t i f a c t of the noise. Alt e r n a t i v e l y , i f x2>>D, the data are f i t too poorly and information about the model that i s contained in the data w i l l have been l o s t . We w i l l define as acceptable any model with D-cy. <x2-^D+a^t . The most l i k e l y model m0 i s that one corresponding to x2=D. The construction of m0 takes place by using a standard regularization technique. Recognizing that the i n s t a b i l i t i e s in our problem result from small eigenvalues of r , we s t a b i l i z e T by introducing the regularization parameter b and use [m] = ( r + b l ) - ' T (39) rather than (17) to construct acceptable models. Let X L, i=1...D be the eigenvalues of r and l e t U be the matrix whose columns contain the associated eigenvectors. It i s straightforward to show (Appendix 5) that u l L \ +b (40) and 23 M l 2 7? 1=1 L -XL (Xi +b) 2 (41 ) where 7=U 7. As equations (40) and (41) demonstrate, x 2 i s a monotonicaly i n c r e a s i n g f u n c t i o n of b, while ||m|| i s a monotonically d e c r e a s i n g f u n c t i o n of b. Using (39) we may f i n d those values b , b 0 , b corresponding to x 2 values D - a^*- / D» an& D + a x t ' r e s p e c t i v e l y . Using b 0 in (39) and (41) we f i n d m0 and ||m0||. Using b^ and b K in (41), we may f i n d upper and lower bounds and Mu for ||m||, i f m i s to be accep t a b l e by our x 2 c r i t e r i o n . Having c o n s t r u c t e d m0, we may reformulate the i n f e r e n c e problem by w r i t i n g m£ = m0 + 6m£ and s u b s t i t u t i n g i n t o equation (5). The new ( t r a n s l a t e d ) data are 7.* = 7- " <m0,g;> = < Srn^ , g-L > i = 1 , . . . D (42) S i m i l a r l y , the new p r e d i c t i o n s are y[ = y[ - <m0rpV> = <6m£ ,p?> i=1,...P (43) Equations (42) and (43) d e f i n e a new i n f e r e n c e problem depending upon 6m£ which may be solved by s t i p u l a t i n g only an upper bound on ||6m^ ||. That i s , s p e c i f y i n g | |6m£ | | < 6M and using the p r o b a b i l i s t i c approach o u t l i n e d in the l a s t chapter i s now p r e c i s e l y the way to proceed. A reasonable c h o i c e f o r 6M might be 6M = (M M - M b)/2 but i f MH and Mu d i f f e r g r e a t l y , and ||m0|| i s much c l o s e r to one than the other, choosing 6M as some f r a c t i o n of ||m0I I might prove to be a 24 better choice. Having defined a pdf in Ev for SrnJI, we may proceed as before and calculate 7 " via (43). The expected or most l i k e l y prediction w i l l be 7 " = G T[m 0] and the variance V" w i l l depend upon the more r e s t r i c t i v e variance for 6m". 25 CHAPTER 7 BOUNDS ON ||mg|| We have shown that the "perpendicular p a r t " of the p r e d i c t i o n w i l l depend only on the s u b j e c t i v e bound | | me I I - Mi>- Backus suggests that a bound f o r ||m^|| may be found by r e q u i r i n g that the p r o j e c t i o n of mE onto any subspace of have norm bounded by M, the bound f o r ||mf||; that i s Mj. = M. A l t e r n a t i v e l y , given an estimate m0 f o r m£, we cou l d r e q u i r e that ||mi|fs IKM 2 - ||m0||2 If t h i s q u a n t i t y turns out to be too sm a l l , we would probably wish to i n c r e a s e our bound for ||m£||. An a l t e r n a t e approach might be as f o l l o w s . Suppose that we co n s t r u c t m0 as i n the previous chapter. T h i s model was co n s t r u c t e d to have minimum norm subject to a d a t a - f i t t i n g c r i t e r i o n and i s known as the " s m a l l e s t " model. Suppose we then c o n s t r u c t the " f l a t t e s t " model, mp , whose d e r i v a t i v e has minimum norm, subject to a d a t a - f i t t i n g c r i t e r i o n . Often the f l a t t e s t model w i l l represent a more "optimum" or r e a l i s t i c model than the sma l l e s t model, and w i l l e x h i b i t l e s s s t r u c t u r e . If we f e e l that m£ has at l e a s t as much s t r u c t u r e as mf , but l e s s than m0 then M£ * ||m0 - mF || 2 might be a reasonable choice for Mx. Regardless of these r a t i o n a l e s f o r the choice of Mx, i t should be r e a l i z e d that yu w i l l always depend upon the p r e j u d i c e of the i n v e s t i g a t o r . The qu e s t i o n of what c o n s t i t u t e s r e a l i s t i c and reasonable bounds for ||mg|| deserves careful consideration problem-by-problem basis. 27 CHAPTER 8 A NUMERICAL EXAMPLE The a p p l i c a t i o n of the matrix s o l u t i o n w i l l now be demonstrated by c a l c u l a t i n g the funnel f u n c t i o n s (Oldenburg, 1 9 8 3 ) f o r a numerical example. Let m(x) = 1.0 - 0.5 cos 2irx on the i n t e r v a l [ 0 , 1 ] and consider data k e r n e l s g^(x) = e x p ( - ( k - l ) x ) ( k = l , . . . l l ) . For some x 0 in [ 0 , 1 ] we s h a l l attempt to compute upper and lower bounds (m M(x 0,A) and m l'(x 0,A)) to the average value m(x 0,A) = <m,B(x0,A)> (44) where B(x 0,A) = ' 1/A f o r | x-x 0| < A/2 0 otherwise i s a unimodular boxcar of width A centered on x 0 . Oldenburg used l i n e a r programming techniques to c a l c u l a t e m I A(x 0,A) and m l i(x 0,A) and p l o t t e d these q u a n t i t i e s as f u n c t i o n s of A. These f u n c t i o n s were c a l l e d funnel f u n c t i o n s and they q u a n t i f i e d the u n c e r t a i n t y in averages of m(x) taken over i n t e r v a l s of width A about x 0 . For our case, l e t {A,, ...Ap} be a set of numbers d e f i n i n g the widths of i n t e r v a l s over which we seek to estimate the averages (44). Our p r e d i c t i o n k e r n e l s are then p^ (x) = B(x 0,Aj_) and the set of P p r e d i c t i o n s we seek from the data a f t e r s p e c i f y i n g an a p r i o r i bound on ||m|| are 28 ?• = m ( x 0 , A i ) = <m,p|;> i-1...P (45) Using the matrix solution, we may estimate mu(x0,A^) and m u(x 0 , A ^ ) as \ L + and , respectively, and present this 1-a level confinement of the 7. as the funnel functions for m(x). The resolving power of the data is further quantified by the uncertainty function £ (A) = m U(x 0,A) - m u ( x 0 , A ) {(A) is the distance between the upper and lower funnel functions. Its value indicates the uncertainty in the average value of m over a width A centered on x 0. For the following examples we have chosen an a p r i o r i . upper bound for a s M=2.0. This is a conservative estimate, given the true value, ||m||=1.06. Figure 2a shows the " p a r a l l e l part" of the funnel functions obtained using the method of Chapter 5, with M„=M and Mj. = 0. With this choice of M H we find ||mt||=2.09. Given our assumption that M=2.0, setting Mx=0 seemingly provides reasonable estimates for the predict ions. A more conservative c a l c u l a t i o n , using Mx=1.0 is shown in Figure 2b. Along with the funnel functions, i„n Figures 2a and 2b are shown the expected values of the predictions, - J ^ -(solid line) and the true values(dashed l i n e ) . The predictions are consistently greater than the true values. The reason for this bias is clear from Figure 2c, in which the most l i k e l y model, mt, is plotted along with the true model. The large norm of mc and the high values of the prediction are c l e a r l y a result of the large amplitude and 29 o s c i l l a t i o n of mt. The u n c e r t a i n t y f u n c t i o n s are p l o t t e d in Figure 2d. The more c o n s e r v a t i v e estimate using MJ_=1.0 does not introduce a great deal more u n c e r t a i n t y i n t o the s o l u t i o n . Figure 3 shows e q u i v a l e n t r e s u l t s obtained using the method of Chapter 6. Again M=2.0. The " p a r a l l e l p a r t " of the p r e d i c t i o n s give r i s e to the funnel f u n c t i o n s in Figu r e 3a, with 6M=.5||m0|| and M^O. Since |jm0||=1.73, in Fi g u r e 3b, we show the funnel f u n c t i o n s obtained by s e t t i n g M_£ = M 2 - ||m0||2. Now the expected values of the p r e d i c t i o n s are very c l o s e to the true value. T h i s i s the r e s u l t of the s i m i l a r i t y of m and m0 demonstrated in Figure 3c. The u n c e r t a i n t y f u n c t i o n s are p l o t t e d in Figure 3d. Now the r e s u l t of s p e c i f y i n g a non-zero Mj_ i s s i g n i f i c a n t ; i f a more c o n f i n i n g estimate of Mx i s to be had, there may be much improvement i n the r e s u l t i n g u n c e r t a i n t y . The funnel f u n c t i o n s of Fi g u r e s 2a and 2b and F i g u r e 3b d i s p l a y e s s e n t i a l l y the same u n c e r t a i n t y . The l a t t e r , however, provides more r e l i a b l e estimates of the p r e d i c t i o n , and hence s u p e r i o r r e s u l t s . When compared with Oldenburg's (1983) F i g u r e 1a, we note t h a t , at the l - a l e v e l of confidence the funnel f u n c t i o n s we have c a l c u l a t e d are e q u i v a l e n t to h i s c a l c u l a t e d using l i n e a r programming. . Figure 2 Numerical Example I Funnel functions for x 0 = .5, calculated using the method of Chapter 5, with M=2.0 are shown for Mj_=0 in (a), and for MJL=1.0 in (b). Also shown in (a) and (b) are the expected values of the predictions ( s o l i d l ine) and the true model averages (dotted l i n e ) . The most l i k e l y model, me i s plotted with the true model in ( c ) . The uncertainty functions associated with (a) and (b) are the lower and upper curves, respectively, in (d). 0.2 0.4 0.6 0.8 1.0 A Figure 2 Figure 3 Numerical Example II Funnel functions for x0=5, calculated using the method of Chapter 6, with M=2.0 and 6M=.5||m0||, are shown for Mx=0 in (a) and Mj;=M2-| |m0 | | 2 in (b). Also shown are the expected values of the predictions ( s o l i d line) and the true model averages (dotted l i n e ) . The most l i k e l y model, m0 i s p l o t t e d with the true model in (c). The uncertainty functions associated with (a) and (b) are the lower and upper curves respectively, in (d). 34 CHAPTER 9 DISCUSSION Following Backus (1970a) we have derived a matrix method to solve the linear inference problem. The use of Bayesian p r o b a b i l i t y theory allows a simple treatment of erroneous data. We have shown how to apply a smoothing constraint on m"f using the x 2 s t a t i s t i c . This may lead to more reasonable and r e s t r i c t i v e estimates of the predictions than the a p r i o r i constraint on ||mE|| of Backus (1970a), although he described how the problem could be translated with respect to some pa r t i c u l a r model m0, as we have done (Backus,(1972)). Our solution i s numerically e f f i c i e n t and robust. The two matrix inversions required, may be s t a b i l i z e d and the amount of computation required for the inversions does not increase with the number of predictions desired. Increasing the number of predictions increases only the size of matrices involved in m u l t i p l i c a t i o n operations. This c h a r a c t e r i s t i c i s advantageous in problems of interpolation or appraisal where a large number of predictions may be required. The d i f f i c u l t y of computing the inner product matrices w i l l , of course, vary from problem to problem. Parker (1977a) derived a matrix solution equivalent to Backus' (1970a) solution for accurate data and showed how to treat error in a non-Bayesian fashion. We s h a l l b r i e f l y outline Parker's work using our notation to f a c i l i t a t e comparison. We have defined the subspace^, spanned by {g, .. .g^ ,p,.. .pp} with dual basis written as {g1 .. .g3* ,p'.. .p1*}. Let 7* = (71.. .7^ ,7i • • ) be the D+P-tuple of data and predictions. 7* is c l e a r l y the c o e f f i c i e n t matrix of the orthogonal projection of m£ o n t o ^ , with respect to the dual basis. That projection has norm 35 7*^ H" 1 7* (46) where H = r • G GT: A i s the inner product matrix otjv, and the submatrices r , G and A are as previously defined. We demand that the projection of mE onto any subspace have norm bounded by M, so that When the data, (assumed error-free) are substituted into (47), we have a p o s i t i v e d e f i n i t e quadratic form defining the range of y consistent with the data and a p r i o r i bound. Parker shows how a more confining solution may be obtained for a special case of the inference problem. Suppose the predictions represent the c o e f f i c i e n t s of some expansion of m^ . Presumably, we have chosen the functions P L ( X ) so that such an expansion represents m£ well. Then a bound more confining than (47) may be applied as follows. Define the subspace*^, with basis {pi...pj>} and dual basis orthogonal projection of mE onto«A% with respect to the dual basis. In approximating m E(x) as an expansion in terms of the functions 7*TH- 1 7* < M (47) {p 1...p , J}. Then the predictions, y are the c o e f f i c i e n t s of mE, the pt' (x), we write m (x) = m*(x) + m*(x) (48) 36 and hope that that part of m e(x) not accounted for in our f i n i t e expansion, m*(x), i s small. That i s , m£ is a good approximation to mg.. From (48) we find I |m*||2 S ||m,||* - ||mt||* (49) and express our prejudice about m* by ||m*|| £ M* (50) where M* is small. In matrix form, (50) becomes 7* H - 1 _ 0 0 0 A" 1 7 * < M* (51) When the data are substituted into (51), a positi v e d e f i n i t e quadratic form for 7" is defined that y i e l d s a much more r e s t r i c t i v e solution than does (47). Parker calculated the additional uncertainty imposed on the solution by erroneous data. Given the s t a t i s t i c s of the data errors, he calculated the p r o b a b i l i t y that a given prediction i s constrained by the data values through arguments associated with the geometry defined in 3^. by the quadratic forms (47) or (51). Parker concluded that this procedure i s not numerically e f f i c i e n t . Parker's algorithm was presented in a geometric context. By introducing his parameter-data space, the roles of data, the a p r i o r i bounds, and the predictions are made cle a r . For the problem of model f i t t i n g , his bound (50) and associated quadratic form (51) 37 give superior r e s u l t s . Unfortunately, the use of the dual basis for requires that both a D-dimensional matrix and a P-dimensional matrix be inverted, with no method of s t a b i l i z a t i o n proposed for i l l - c o n d i t i o n e d problems. The imposition of more r e s t r i c t i v e bounds such as (51) i s less straightforward with our algorithm. To apply such bounds, the introduction of linear a p r i o r i constraints beyond the data, which may become somewhat contrived unless the investigator i s careful in c o n t r o l l i n g his bias. Jackson (1979) derived matrix equations which may be used to solve the inference problem i f m f(x) i s represented by a f i n i t e number of parameters. His a p r i o r i information was included as linear constraints and v i a a hypothesized covariance matrix r e l a t i n g the model parameters. It i s interesting to note the s i m i l a r i t y of his and our r e s u l t s , given the d i f f e r e n t approaches to the problem. . Oldenburg (1983) has shown how linear programming (LP) may be applied to the solution of linear inference problems. The LP method has several advantages: (1) The solution i s not p r o b a b i l i s t i c ; the bounds on a prediction are the greatest and least consistent with a given data set; (2) Rather than imposing a bound on ||mg|| to find a solution, LP allows the use of non-probabilistic, pointwise constraints. (3) If applicable, a p o s i t i v i t y constraint may be used. be enlarged with suitable functions via the 38 The p r i n c i p l e drawback to the use of LP for multiple prediction i s that each prediction i s done separately; computing time increases rapidly with the number of predictions. For the example presented in t h i s paper, the LP solution required approximately three times as much computing time. 39 CHAPTER 10 CONCLUSION We have presented Backus' solution to the linear inference problem with multiple predictions in matrix form. The matrix algorithm i s e f f i c i e n t and stable, and with the careful application of l i n e a r a p r i o r i constraints and smoothing assumptions, i t appears as i f usefully r e s t r i c t i v e solutions may be obtained. We hope that t h i s elementary treatment of Backus' work w i l l help to earn for i t the attention i t deserves. 40 REFERENCES Backus, G.E., 1970a, Inference from inadequate and inaccurate data, I, Proc. Nat. Acad. S c i . U.S., 65, 1-7. Backus, G.E., 1970b, Inference from inadequate and inaccurate data, II, Proc. Nat. Acad. S c i . U.S., 65, 281-287. Backus, G.E., 1970c, Inference from inadequate and inaccurate data, III, Proc. Nat. Acad. S c i . U.S., 67, 282-289. Backus, G.E.,1972, Inference from inadequate and inaccurate data, in Mathematical Problems in the Geophysical Sciences, Amer. Math. S o c , Providence ,RI. Backus, G. and G i l b e r t , F., 1968, The resolving power of gross earth data, Geophys. J . R. astron. S o c , 16, 169-205. Backus, G. and G i l b e r t , F., 1970, Uniqueness in the inversion of gross earth data, P h i l . Trans. R. Soc. Lond. Ser. A, 266, 123-192. Gi l b e r t , F., 1971, Ranking and winnowing gross earth data for inversion and resolution, Geophys. J . R. astron. S o c , 23, 125-128. Jackson, D.D., 1979, The use of a p r i o r i data to resolve non-uniqueness in linear inversion, Geophys. J. R. astron. S o c , 57, 137-157. Johnson, L.E. and G i l b e r t , F., 1972, Inversion and inference for teleseismic ray data, Methods in Computational Physics, V. 12, Academic Press, New York, 231-266. Oldenburg, D.W., 1983, Funnel functions in linear and non-linear appraisal, Jour. Geophys. Res., 88, 7387-7398. Parker, R.L., 1971, The determination of seamount magnetism, Geophys. J . R. astron. S o c , 24, 321-324. Parker, R.L., 1977a, Linear inference and undeparameterized models, Rev. Geophys. Space Phys., 15, 446-455. Parker, R.L., 1977b, Understanding inverse theory, Ann. Rev. Earth Planet. Sci.,5, 35-64. Savage, L.J., 1962, The Foundations of S t a t i s t i c a l Inference, Methuen, London. 41 APPENDIX 1 Matrix Form of the Predictions Write the predictions as \ = X + X = <Pi>ml> + <Pt'<> i = 1 - . -P (A1.1) We s h a l l express the predictions as elements of E p , that i s as column vectors or P-tuples ? = + = (y\...y£)T + (y\,...,%±)T (A1.2) To calculate r" , write so that, for i=1,...P, 7." = <pr,ml> = <p. ,m£> T> • < f , ^ i < P i ' 9 j ' > = | , 0 j < g j ,P;> Thus, i f Gjt= <gj ,p.t >, 7." = Z G « ft t j * i J 1 J Since J3 = ( 0 l f . . . f t _ ) T = [m'']f we have 7 n = GT[m^] (A1.3) 42 To find 7 T, we need to know the inner product matrix tovfo, that i s A _ L, where Axtj = <PfrP/> To fi n d a simple expression for Ax, note that A J . £ j = <P**Pj> = < P l - P f . P j - ^ > = <Pt rPj > - <P{>Pj > = A t j " Alltj Now, A«ij - <Pi'Pj> = tPf J P tPf] Using a result from Appendix 2, and standard i n d i c i a l notation, A » t j = (r- 'Ipi^) 1 " r ( r - ' t p j V ) = <gk'PL> ^ <g1 , P j > " G*rn G l j or A , , = G T r- 1 G whence A x = A - G T T" 1 G (A1.4) Now, on$2, for each i = 1...P, Tt - < P t » " t > - f P ^ A-t [ » S 43 Since {p*f.. .p"^ } i s our basis for P, we find f x = Ax [mxB] (A1 .5) 44 APPENDIX 2 Dual Bases, Inner Product Matrices and Orthogonal Projections Consider the subspace ofi^tviO=asp{g,,.. .g p }. The vectors {g^ K i=1,...D are assumed to be l i n e a r l y independent, and thus form a basis for £>. The basis dual to thi s i s the set of D vectors {g1 , ...g1*} defined by <gl,gJ> - &q i , j = 1...D (A2.1) where 6ij i s the Kronecker del t a . The dual basis always exists and is unique. Now, i f one of the elements of the dual basis, g J i s expressed in terms of the o r i g i n a l basis, g J = i B j j k g k By (A2.1), <gL ,gJ> = j^Bjk <gx ,gk> = 6^ Defining the inner product matrix I» = <g-t ,gj > leads to Br = I or B = r~ 1. (Since T is po s i t i v e d e f i n i t e and symmetric, T"1 always exists and i s posi t i v e , d e f i n i t e and symmetric). The above relation allows the following characterization of T" 1; i t i s the matrix whose columns contain the c o e f f i c i e n t s of the vectors g l with respect to the basis {gi,...g D}. Thus i f m i s inJ0, and [m] and [m]^ are the matrices of m with respect to the o r i g i n a l and dual bases, respectively, 45 [m] = r- ' tmlj (A2.2) L e t m be an element o f ^ . The o r t h o g o n a l p r o j e c t i o n of m onto © i s t h a t v e c t o r m" i n such t h a t f o r a l l v e c t o r s x i n <m-m",x> = 0 (A2.3) By the p r o j e c t i o n theorem m" e x i s t s and i s unique. S i n c e x i s i n i Q , (A2.3) i s e q u i v a l e n t t o the D e q u a t i o n s <m-m",gi> = 0 i=1...D (A2.4) whence <m,gL> = 7i = <m",g-.> (A2.5) W r i t e m" i n terms of the d u a l b a s i s m so t h a t (A2.5) becomes \ = < m , g L > = Jjaj < g J , g L > = Lai 5^ which i m p l i e s 7i = a-L i = 1 . . .D (A2.6) Thus, the numbers yL = <m,qL> a r e the c o e f f i c i e n t s of m" w i t h 4 6 respect to the dual basis of<£D. By (A2.2) then, [mn] = T-1 [m-]j = V'y (A2.7) Equation (A2.7) demonstrates that the data are equivalent to knowledge of mE". 47 APPENDIX 3 Calculation of fl, The joint normal d i s t r i b u t i o n for #t = t ml?l i s given by equation (25) p(3 t>-*<Mt> Trv-ir(jMt> + ti(M-a2)rpt or p (0t 0 t ( rv- T + ( M j \ 2 ) r ) 0 t - 2 0 t r v - 1 r 0 + constant (A3.1) where /3 = T" 1 7 . Equation (A3.1) i s to be rewritten as p ( 0 t ) — > ( 0 t - 0 t ) C ( 0 t - 0 C ) + constant (A3. 2) which may be expanded to p(/3t)—+ 0{c/?t - 20tC0e. + constant (A3.3) For equality between (A3.1) and (A3.2) to hold, up to a constant, C = ( r v T + (M -„ 2 )D (A3.4) and 48 For t h i s to be the case fc = ( r v - ' r + (MJJ 2) r ) • 1 r v - 1 r/f (A3.5) Note that (rV"T + (Mj,2)D i s po s i t i v e d e f i n i t e and symmetric, so that i t has a positive d e f i n i t e symmetric inverse. Recalling that p=V}y, we fin d % = ( n r , r+ (Mj i 2 ) r ) - ' r v - 1 7 (A3.6) 49 APPENDIX 4 Computational Details If the errors on the data are not independent, linear combinations of the data equations may be found that y i e l d an equivalent data set with independent errors (Gilbert, 1971). Assuming that t h i s has been accomplished i t is convenient to normalize each data equation by the standard deviation of the associated error so that the resulting normalized data a l l have unit variance, that i s we use Ti = <m£ ,g_i> ai ai i = 1 . . .D (A4. 1 ) as the data equation rather than (5). The normalized data kernels are then g\ = g^ /a-L and the inner product matrix T i s found using these normalized kernels. The matrix V - 1 i s then the id e n t i t y . To carry out the solution, i t i s necessary to invert the DxD matrices ( r V ' T + M"jj2r) and T . When using normalized kernels, we find ( r v - 1 r+M7 t 2r) ~ 1 = ( r+M« 2i) - 1 r - 1 (A4.2) In many geophysical problems, T w i l l be poorly conditioned. The addition of My(2I w i l l generally be s u f f i c i e n t to allow the ca l c u l a t i o n of a stable inverse for (T + Mjj 2 l ) . T, however, may remain i l l - c o n d i t i o n e d . Note that T"1 appears only as a premultiplier of G, i . e . as r _ ,G. By (A2.2) and the d e f i n i t i o n of G, the matrix r~'G has as columns the c o e f f i c i e n t s of the vectors p". If T"1 i s found only 50 approximately, in a manner described below, the effect i s to approximate the p^ ". Since the p? are found in the expression = <p",m£>, th i s approximation w i l l introduce some error into the cal c u l a t i o n of the 7.". This error w i l l be quantified, and in cases of p r a c t i c a l interest i t should be i n s i g n i f i c a n t with respect to data error. Since T i s positive d e f i n i t e and symmetric, i t possesses the T T singular value decomposition r=UQU with inverse r'^US^'U , where U i s an orthogonal matrix containing the eigenvectors of T as i t s columns. 0 i s diagonal, with entries Xj which are the eigenvalues of T. The condition number of T i s defined as c=|Xmin/Xmax|. If c is near the precision of the machine used in the ca l c u l a t i o n of r~ 1, i t i s not possible to find a stable inverse. Suppose the eigenvalues of T are ordered with decreasing size and that the S smallest are discarded so that iX^-^/X,! i s above machine prec i s i o n . Then defining IT as the (Dx(D-S)) matrix of the f i r s t (D-S) eigenvectors of T and as the (D-S)x(D-S) diagonal matrix containing the f i r s t (D-S) eigenvalues of r , T~1 = UO~ 'u" 1 " approximates r ~ 1 . To calculate the error in 7" introduced by t h i s approximation write yr = <pr,mf> = <p",mf> + <pt"-p^,mf> where p." i s the approximation to p" found by using T" 1. Now <pV§",m;> ^ I |p"~Pt" I I I l«n£l I S . Z J I P I ' H X J M , , = e, so i f U and £2 are truncated so that c i s above machine precision, and e << 67. for i=1,...D, then r~ 1 provides a stable approximation to 51 r - 1 without introducing s i g n i f i c a n t error into the solution. 52 APPENDIX 5 Regularization of r The regularization procedure used to obtain equation (40) and (41) are most readily understood by using the orthonormal basis for given by Parker (1977b). As in Appendix 4, we consider that the data and associated kernel function have been normalized by the standard deviations of the data errors. The inner product matrix r is decomposed into We form linear combinations of the normalized kernels to find r = u n u 1 " (A5.1) (A5.2) the vectors are easily shown to be an orthonormal basis for The c o e f f i c i e n t s of mi with respect to this new basis are aL = <mner*Pi > or (A5.3) i f 7 = U 7 53 Thus yL = Vii <ml,\bL> (A5.4) Now, suppose that m i s a model found from 0 = [m] = (T + b I ) - ' 7 (A5.5) and 7c is the data we would expect from m, that i s 7L = <m,gt> Then, as before yt = <m^L> (A5.6) where 9 c= u T ^ Now, to fin d the c o e f f i c i e n t s of m with respect to the orthonormal basis, that i s <m,\bi>, we note 7 = (T + blYi = UfiUTJ3 + b i t or 53 Thus 7t = VXi. <ml,\bi> (A5.4) Now, suppose that m i s a model found from 0 = [m] = (T + b i ) ' 1 7 (A5.5) and 7« i s the data we would expect from m, that i s it = <m,gt> Then, as before = < m><> (A5.6) where 7 C = UT7*e-Now, to fin d the c o e f f i c i e n t s of m with respect to the orthonormal basis, that i s <m,4>i>, we note 7 = (T + bl)0 = UfiUT0 + bI0 or 54 7 = + b l f (A5.7) where 1?=UTJ3, since UU T = I From (A5.7), (A5.8) and 0 = U/3 Now, or <m,^> = 7L "X7+B" (A5.9) With these re s u l t s we may find x 2 as X 2 = 2(7° - 7 t > 2 - ? ( T ! - T C ) 2 = .2 X L [<m,i// t>-<m^ ,\pi>] z ? 2 [ b ] 2 (A5.10) ( r e c a l l i n g that, for the normalized data, a^=1). And | |m| | 2 = <m,m> = £ jy? Xi using (A5.9) and the fact that the 1/7 are orthonormal.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- A practical solution for linear inference computations
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
A practical solution for linear inference computations Schlax, Michael Garfield 1984
pdf
Page Metadata
Item Metadata
Title | A practical solution for linear inference computations |
Creator |
Schlax, Michael Garfield |
Publisher | University of British Columbia |
Date Issued | 1984 |
Description | In this thesis, I shall present a matrix form of Backus' theory of linear inference with multiple predictions. The Bayesian approach of Backus allows the treatment of erroneous data and the imposition of the essential a priori bound on the model norm. The X² statistic will be introduced to construct a most likely model and bound the norm of all acceptable models from above and below. This results in more reliable, and possibly more confining, estimates of the predictions than provided by only an upper bound. The algorithm derived is robust and efficient, and estimates comparable to those obtained from Oldenburg's linear programming algorithm have been achieved. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-05-16 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0096150 |
URI | http://hdl.handle.net/2429/24768 |
Degree |
Master of Science - MSc |
Program |
Physics |
Affiliation |
Science, Faculty of Physics and Astronomy, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1984_A6_7 S35.pdf [ 2.17MB ]
- Metadata
- JSON: 831-1.0096150.json
- JSON-LD: 831-1.0096150-ld.json
- RDF/XML (Pretty): 831-1.0096150-rdf.xml
- RDF/JSON: 831-1.0096150-rdf.json
- Turtle: 831-1.0096150-turtle.txt
- N-Triples: 831-1.0096150-rdf-ntriples.txt
- Original Record: 831-1.0096150-source.json
- Full Text
- 831-1.0096150-fulltext.txt
- Citation
- 831-1.0096150.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0096150/manifest