Generalized matrix inverses and the generalized Gauss-Markoff theorem. Ang, Siow-Leong (1971)

GENERALIZED MATRIX INVERSES AND THE GENERALIZED GAUSS-MARKOFF THEOREM

by

SIOW-LEONG ANG
B.Sc., Nanyang University, Singapore, 1970

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN THE DEPARTMENT OF MATHEMATICS

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September, 1970

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Mathematics
The University of British Columbia
Vancouver 8, Canada
20 October 1970

SUPERVISOR: Dr. Stanley W. Nash

ABSTRACT

In this thesis we present a generalization of the Moore-Penrose pseudo-inverse in the sense that it satisfies the following conditions. Let X be an m x n matrix of rank r, and let U and V be symmetric positive semi-definite matrices of order m and n and of rank s and t respectively, such that s, t ≥ r, and

    column space of X ⊆ column space of U,
    row space of X ⊆ row space of V.

Then X^♮ is called the generalized inverse of X with respect to U and V if and only if it satisfies

    (i)   X X^♮ X = X
    (ii)  X^♮ X X^♮ = X^♮
    (iii) (X X^♮)' = U^+ X X^♮ U
    (iv)  (X^♮ X)' = V^+ X^♮ X V,

where U^+ and V^+ are the Moore-Penrose pseudo-inverses of U and V respectively. We further use this result to generalize the fundamental Gauss-Markoff theorem for linear estimation, and we also use it in the minimum mean square error estimation of the general model y = Xβ + e; that is, we allow the covariance matrix of y to be symmetric positive semi-definite.

TABLE OF CONTENTS

CHAPTER 1 : INTRODUCTION
    §1.1 : HISTORY
    §1.2 : GENERALIZED INVERSES
    §1.3 : ESTIMATION
    §1.4 : MINIMUM MEAN SQUARE ERROR ESTIMATOR

CHAPTER 2 : GENERALIZATION OF THE MOORE-PENROSE PSEUDO-INVERSE
    §2.1 : INTRODUCTION
    §2.2 : DIAGONALIZATION OF MATRICES, AND NOTATIONS
    §2.3 : GENERALIZED INVERSE
    §2.4 : MATRIX ORDERING AND THE MINIMIZING OF MATRICES

CHAPTER 3 : GENERALIZATION OF THE FUNDAMENTAL GAUSS-MARKOFF THEOREM FOR LINEAR ESTIMATION
    §3.1 : INTRODUCTION AND NOTATIONS
    §3.2 : GENERALIZATION OF THE GAUSS-MARKOFF THEOREM WITH RESTRICTIONS ON THE COEFFICIENTS MATRIX

CHAPTER 4 : ESTIMATION BY MINIMUM MEAN SQUARE ERROR
    §4.1 : INTRODUCTION AND NOTATIONS
    §4.2 : MINIMUM MEAN SQUARE ERROR ESTIMATION
    §4.3 : THE JUSTIFICATIONS FOR LEAST SQUARES

BIBLIOGRAPHY

ACKNOWLEDGEMENTS

A great debt of gratitude is acknowledged to Dr. Stanley W. Nash for suggesting the topic of this thesis and for his assistance and encouragement in its preparation. I wish also to thank Dr. James V. Zidek for reading the thesis and Mr. and Mrs. K.G.
Choo for proofreading and typing of this thesis. The financial support of the University of British Columbia and of the National Research Council of Canada is also gratefully acknowledged.

CHAPTER 1
INTRODUCTION

§1.1 HISTORY

We deal with statistical inference based on linear models for the expectations and certain specified structures for the variances and covariances of the observations. The theory of least squares is concerned with the estimation of unknown parameters in a linear model. The essentials of the theory are found in the works of Gauss (1809) and Markoff (1900). However, certain improvements and generalizations have been made by a number of writers. Recently the Gauss-Markoff theorem has been generalized by A.C. Aitken (1935 and 1945), Paul S. Dwyer (1958), A.J. Goldman and M. Zelen (1964), John S. Chipman (1964), T.O. Lewis and P.L. Odell (1966), and Sujit K. Mitra and C. Radhakrishna Rao (1968). But most of these writers placed restrictions on the expectations or on the variances and covariances of the observations.

§1.2 GENERALIZED INVERSES

We consider the Gauss-Markoff set-up, that is, the regression model

    y = Xβ + e,    y (m x 1), X (m x n), β (n x 1), e (m x 1),

where E(e) = 0 and Var(e) = E(ee') = σ²U. A unified approach to the problem of least squares estimation covering all practical situations uses the concept of a generalized inverse for a singular matrix. The generalized inverse is consistently defined, exists, and is computable as in the case of a true inverse of a non-singular matrix. This is discussed in CHAPTER 2.

§1.3 ESTIMATION

In CHAPTER 3, we obtain an estimator β̂ of the unknown parameter β that minimizes our generalized measure of biasedness. Furthermore, among all estimators with minimum bias, β̂ has minimum covariance matrix. We call β̂ the best linear minimum bias estimator.

§1.4 MINIMUM MEAN SQUARE ERROR ESTIMATOR

Considered in CHAPTER 4 is the regression model y = Xβ + e with the further assumption that β has a probability distribution with mean and variance E(β) = β̄ and Var(β) = Σ = τ²V respectively. We obtain an explicit result for the minimum mean square error estimator β̃ of β.

CHAPTER 2
GENERALIZATION OF THE MOORE-PENROSE PSEUDO-INVERSE

§2.1 INTRODUCTION

This chapter is intended to provide the basic concepts of matrices that we need, and a partial ordering of symmetric matrices of the same order. We also introduce a generalization of the Moore-Penrose pseudo-inverse (or generalized inverse). Penrose [10] has shown that:

(I) For any m x n matrix A there is a unique n x m matrix A^+ satisfying

    (i)   A A^+ A = A
    (ii)  A^+ A A^+ = A^+                         (†)
    (iii) (A A^+)' = A A^+
    (iv)  (A^+ A)' = A^+ A.

We call A^+ the pseudo-inverse of A. It has the same rank as A, which is also the rank (and trace) of the idempotent matrices A A^+ and A^+ A.

(II) All solutions of the matrix equation AXB = C are given by X = A^+ C B^+ + [Y − A^+ A Y B B^+] if and only if A A^+ C B^+ B = C, where Y has the dimensions of X but is otherwise arbitrary.

(III) The range of A^+ equals the range of A'. A^+ A and A A^+ are, respectively, the projection operators on the range spaces of A^+ and A.
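Penrose's conditions (†) are easy to verify numerically. The following is a minimal sketch, assuming NumPy and a randomly generated rank-deficient test matrix (the sizes and seed are illustrative only); np.linalg.pinv computes the Moore-Penrose pseudo-inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))    # a rank-r (rank-deficient) matrix
Ap = np.linalg.pinv(A)                                           # Moore-Penrose pseudo-inverse

ok = lambda L, R: np.allclose(L, R, atol=1e-10)
print(ok(A @ Ap @ A, A))                    # (i)   A A^+ A = A
print(ok(Ap @ A @ Ap, Ap))                  # (ii)  A^+ A A^+ = A^+
print(ok((A @ Ap).T, A @ Ap))               # (iii) (A A^+)' = A A^+
print(ok((Ap @ A).T, Ap @ A))               # (iv)  (A^+ A)' = A^+ A
print(np.linalg.matrix_rank(A), round(np.trace(A @ Ap)))   # rank A = trace(A A^+) = r
```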
§2.2 DIAGONALIZATION OF MATRICES, AND NOTATIONS

Let A be any real m x n matrix of rank r and let U and V be m x m and n x n symmetric positive semi-definite matrices of rank s and t respectively. From elementary matrix theory we know that there exist orthogonal matrices H (m x m) and K (n x n) such that

    H'AK = [ Γ  0 ]    (r rows, m-r rows; r columns, n-r columns),
           [ 0  0 ]

where Γ = diag(γ_1, γ_2, ..., γ_r) is a diagonal matrix with real positive diagonal elements. Without loss of generality one can assume that γ_1 ≥ γ_2 ≥ ... ≥ γ_r > 0.

Note: If A is a complex matrix, then H and K are unitary and their adjoints are used instead of transposes. In the following discussion we have restricted ourselves to real matrices, though the discussion can easily be extended to the complex case and the same conclusions still hold.

Write H = [H_1, H_2] and K = [K_1, K_2], where H_1 and H_2 have r and m-r columns, and K_1 and K_2 have r and n-r columns. Then, from H'H = HH' = I_m and K'K = KK' = I_n, we have

    H_1'H_1 = I_r,  H_2'H_2 = I_{m-r},  H_1'H_2 = 0 (r x (m-r)),  H_2'H_1 = 0 ((m-r) x r),

and

    K_1'K_1 = I_r,  K_2'K_2 = I_{n-r},  K_1'K_2 = 0 (r x (n-r)),  K_2'K_1 = 0 ((n-r) x r).

Furthermore,

    A = H_1 Γ K_1'   (m x n).

Then A^+ = K_1 Γ^{-1} H_1' (n x m) is the Moore-Penrose pseudo-inverse of A, which is unique.

Similarly, there exist orthogonal matrices P (m x m) and Q (n x n) such that

    P'UP = [ Θ²  0 ]    (s rows, m-s rows; s columns, m-s columns),
           [ 0   0 ]

where Θ² = diag(θ_1², θ_2², ..., θ_s²) with θ_1 ≥ θ_2 ≥ ... ≥ θ_s > 0, say, and

    Q'VQ = [ Φ²  0 ]    (t rows, n-t rows; t columns, n-t columns),
           [ 0   0 ]

where Φ² = diag(φ_1², φ_2², ..., φ_t²) with φ_1 ≥ φ_2 ≥ ... ≥ φ_t > 0, say.

Write P = [P_1, P_2] and Q = [Q_1, Q_2], where P_1 and P_2 have s and m-s columns, and Q_1 and Q_2 have t and n-t columns. So

    U = P_1 Θ² P_1',         V = Q_1 Φ² Q_1',
    U^+ = P_1 Θ^{-2} P_1',   V^+ = Q_1 Φ^{-2} Q_1',
    U^{1/2} = P_1 Θ P_1',    (U^{1/2})^+ = P_1 Θ^{-1} P_1' = (U^+)^{1/2},
    V^{1/2} = Q_1 Φ Q_1',    (V^{1/2})^+ = Q_1 Φ^{-1} Q_1' = (V^+)^{1/2}.

Lemma More generally, put U^α = P_1 Θ^{2α} P_1' and (U^+)^β = P_1 Θ^{-2β} P_1' for any real numbers α and β. Then

    U^α (U^+)^β = (U^+)^β U^α = { U^{α-β}       for α > β,
                                { P_1 P_1'      for α = β,
                                { (U^+)^{β-α}   for β > α.

Similarly, if we put V^α = Q_1 Φ^{2α} Q_1' and (V^+)^β = Q_1 Φ^{-2β} Q_1', then

    V^α (V^+)^β = (V^+)^β V^α = { V^{α-β}       for α > β,
                                { Q_1 Q_1'      for α = β,
                                { (V^+)^{β-α}   for β > α.

Proof This follows, since P_1'P_1 = I_s and Q_1'Q_1 = I_t. Q.E.D.

Remark: The above lemma is still true if we let α and β be complex numbers.

H = [H_1, H_2] and P = [P_1, P_2] are both m x m orthogonal matrices, so there is an m x m orthogonal matrix C such that P = HC, i.e.,

    [P_1, P_2] = [H_1, H_2] [ C_11  C_12 ] = [H_1 C_11 + H_2 C_21,  H_1 C_12 + H_2 C_22],
                            [ C_21  C_22 ]

where C_11 is (r x s), C_12 is (r x (m-s)), C_21 is ((m-r) x s) and C_22 is ((m-r) x (m-s)). Thus H_1'P_1 = H_1'[H_1 C_11 + H_2 C_21] = C_11.

Similarly, there is an n x n orthogonal matrix D such that Q = KD, i.e.,

    [Q_1, Q_2] = [K_1, K_2] [ D_11  D_12 ] = [K_1 D_11 + K_2 D_21,  K_1 D_12 + K_2 D_22],
                            [ D_21  D_22 ]

where D_11 is (r x t), D_12 is (r x (n-t)), D_21 is ((n-r) x t) and D_22 is ((n-r) x (n-t)). Thus K_1'Q_1 = K_1'[K_1 D_11 + K_2 D_21] = D_11.
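The decompositions of this section can be realized numerically with a singular value decomposition of A and an eigendecomposition of U (and, in the same way, of V). The following is a minimal sketch, assuming NumPy and randomly generated test matrices; the variable names follow the notation above and are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r, s = 5, 4, 2, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))    # a rank-r matrix

# H'AK = diag(Γ, 0): a full SVD supplies H, the singular values γ_1 ≥ ... ≥ γ_r, and K
H, gamma, Kt = np.linalg.svd(A)
K = Kt.T
H1, K1, Gamma = H[:, :r], K[:, :r], np.diag(gamma[:r])
print(np.allclose(A, H1 @ Gamma @ K1.T))                                  # A = H_1 Γ K_1'
print(np.allclose(np.linalg.pinv(A), K1 @ np.linalg.inv(Gamma) @ H1.T))   # A^+ = K_1 Γ^{-1} H_1'

# U = P_1 Θ² P_1' for a symmetric PSD U of rank s; V is handled the same way with Q and Φ²
Bu = rng.standard_normal((m, s))
U = Bu @ Bu.T
w, P = np.linalg.eigh(U)                         # eigenvalues in ascending order
order = np.argsort(w)[::-1]                      # reorder so that θ_1² ≥ θ_2² ≥ ...
w, P = w[order], P[:, order]
P1, Theta2 = P[:, :s], np.diag(w[:s])
print(np.allclose(U, P1 @ Theta2 @ P1.T))
```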
§2.3 GENERALIZED INVERSE

Chipman [3, p. 1084] has generalized the Moore-Penrose pseudo-inverse in the following way. Let A be any m x n matrix and A^+ the unique Moore-Penrose pseudo-inverse of A satisfying (†), and let U and V be given symmetric positive definite matrices of order m and n respectively. Defining X = U^{1/2} A V^{-1/2} and X^♮ = V^{1/2} A^+ U^{-1/2}, it is immediately verified that

    (i)   X X^♮ X = X
    (ii)  X^♮ X X^♮ = X^♮
    (iii) (X X^♮)' = U^{-1} X X^♮ U
    (iv)  (X^♮ X)' = V^{-1} X^♮ X V.

X^♮ is unique with respect to U and V. We call X^♮ the generalized inverse of X with respect to U and V (or the (U, V)-pseudo-inverse of X). Since it is clear from the context which matrices U and V apply, we sometimes call X^♮ the generalized inverse of X for short.

We are now in a position to define our version of the generalized inverse of a matrix, which is a generalization of Chipman's generalized inverse of X with respect to U and V. We require only that U and V be symmetric positive semi-definite matrices. In this section, we use the same notations as those above.

Definition 2.1 Let S(A) denote the span of the columns {a_1, a_2, ..., a_n} of A, i.e., the column space of A is

    S(A) = { ξ ∈ E^m | ξ = Aα, α ∈ E^n },

where E^m and E^n are m- and n-dimensional Euclidean spaces. S(A) is sometimes called the range space of A. Also let R(A) denote the span of the rows {α_1', α_2', ..., α_m'} of A, i.e., the row space of A is

    R(A) = { η' ∈ E^n | η' = α'A, α ∈ E^m } = range space of A' = S(A').

To help the reading, we restate from Graybill [7, §5.4, pp. 87-91] some properties of matrices which we need for the discussion of the problems considered in the rest of the section.

Lemma 2.2 Let A be an m x n matrix and B an n x k matrix. Then
a) R(AB) ⊆ R(B) and S(AB) ⊆ S(A);
b) rank AB = rank B if and only if R(AB) = R(B), and rank AB = rank A if and only if S(AB) = S(A).

Proof
a) R(AB) = { η' ∈ E^k | η' = α'AB, α ∈ E^m } = { η' ∈ E^k | η' = ξ'B, ξ' = α'A, α ∈ E^m } ⊆ { η' ∈ E^k | η' = ξ'B, ξ ∈ E^n } = R(B). Similarly, S(AB) ⊆ S(A).
b) If rank(AB) = rank(B), and we have R(AB) ⊆ R(B), then R(AB) = R(B). For, if not, there exists η' ∈ R(B) with η' ∉ R(AB), so that rank B ≥ rank AB + 1 > rank AB = rank B, which is a contradiction. Conversely, if R(AB) = R(B), then rank AB = dimension of R(AB) = dimension of R(B) = rank B. Similarly, rank AB = rank A if and only if S(AB) = S(A). Q.E.D.

Lemma 2.3 Let A and B be m x n matrices. Then
a) R(A) ⊆ R(B) if and only if there exists a square matrix H such that A = HB; S(A) ⊆ S(B) if and only if there exists a square matrix K such that A = BK.
b) R(A) = R(B) if and only if the above H is non-singular; S(A) = S(B) if and only if the above K is non-singular.

Proof
a) Let {α_1', ..., α_m'} be the rows of A and {β_1', ..., β_m'} be the rows of B. If R(A) ⊆ R(B), then α_i' = Σ_j h_ij β_j', i.e., A = HB where H = (h_ij). Conversely, if A = HB, then R(A) ⊆ R(B) by Lemma 2.2 a). Similarly, S(A) ⊆ S(B) if and only if there exists a square matrix K such that A = BK.
b) If R(A) = R(B), then A and B have the same rank. From Graybill [7, p. 13, Th. 1.6.9], there exist non-singular matrices H and K such that A = HBK whenever A and B have the same rank. Since K operates on the columns and R(A) = R(B), it follows in the present case that K = I_n, i.e., A = HB and H is non-singular.
Conversely, if A = HB and H is non-singular, then B = H^{-1}A, and R(A) ⊆ R(B), R(B) ⊆ R(A). It follows that R(A) = R(B). Similarly, we can show that S(A) = S(B) if and only if K is non-singular. Q.E.D.

Lemma 2.4
(i) The conditions a) S(P_2) ⊆ S(H_2) and b) S(Q_2) ⊆ S(K_2) are equivalent to a') S(H_1) ⊆ S(P_1) and b') S(K_1) ⊆ S(Q_1) respectively.
(ii) The conditions a') S(H_1) ⊆ S(P_1) and b') S(K_1) ⊆ S(Q_1) are equivalent to a") S(A) ⊆ S(U) = R(U) and b") S(A') = R(A) ⊆ R(V) = S(V) respectively.

Proof We have A = H_1 Γ K_1', and U = P_1 Θ² P_1' is symmetric, so that H_1 = A K_1 Γ^{-1} and P_1 = U P_1 Θ^{-2}. Hence, by Lemma 2.2, S(A) = S(H_1) and S(P_1) = S(U) = R(U). That is, a') is equivalent to a"). Similarly b') is equivalent to b"). Part (i) holds because H, K, P and Q are orthogonal, so that S(H_1) and S(H_2), S(P_1) and S(P_2), S(K_1) and S(K_2), S(Q_1) and S(Q_2) are pairs of orthogonal complements. Q.E.D.

Lemma 2.5 Let

    A = H [ Γ  0 ] K'
          [ 0  0 ]

be any m x n matrix of rank r, and let

    U = P [ Θ²  0 ] P'   (m x m)    and    V = Q [ Φ²  0 ] Q'   (n x n)
          [ 0   0 ]                              [ 0   0 ]

be symmetric positive semi-definite matrices of rank s and t respectively, as in §2.2, such that s, t ≥ r, and H, K, P and Q are orthogonal matrices (as in §2.2) satisfying the conditions

    a) S(P_2) ⊆ S(H_2),    b) S(Q_2) ⊆ S(K_2),

where H = [H_1, H_2], K = [K_1, K_2], P = [P_1, P_2] and Q = [Q_1, Q_2], with r, m-r; r, n-r; s, m-s; and t, n-t columns respectively. Define X = U^{1/2} A (V^{1/2})^+ and X^♮ = V^{1/2} A^+ (U^{1/2})^+. Then they satisfy the following conditions:

    (i)   X X^♮ X = X
    (ii)  X^♮ X X^♮ = X^♮                         (♮)
    (iii) (X X^♮)' = U^+ X X^♮ U
    (iv)  (X^♮ X)' = V^+ X^♮ X V,

and X^♮ is unique with respect to U and V.

Proof Since P and H are both m x m orthogonal matrices there exists an orthogonal matrix

    C = [ C_11  C_12 ]
        [ C_21  C_22 ]

such that [P_1, P_2] = P = HC = [H_1 C_11 + H_2 C_21, H_1 C_12 + H_2 C_22]. Similarly, for the n x n orthogonal matrices Q and K we have [Q_1, Q_2] = Q = KD = [K_1 D_11 + K_2 D_21, K_1 D_12 + K_2 D_22], where

    D = [ D_11  D_12 ]
        [ D_21  D_22 ]

is an orthogonal matrix. From hypotheses a) S(P_2) ⊆ S(H_2) and b) S(Q_2) ⊆ S(K_2) it follows that C_12 = 0 and D_12 = 0. This implies that C_11 C_11' = I_r and D_11 D_11' = I_r, since CC' = I_m and DD' = I_n.

To show X X^♮ X = X, first note that

    X X^♮ = U^{1/2} A (V^{1/2})^+ V^{1/2} A^+ (U^{1/2})^+ = U^{1/2} H_1 Γ K_1' Q_1 Q_1' K_1 Γ^{-1} H_1' (U^{1/2})^+ = U^{1/2} H_1 H_1' (U^{1/2})^+,

since K_1'Q_1 Q_1'K_1 = D_11 D_11' = I_r. So

    X X^♮ X = U^{1/2} H_1 H_1' (U^{1/2})^+ U^{1/2} H_1 Γ K_1' (V^{1/2})^+ = U^{1/2} H_1 C_11 C_11' Γ K_1' (V^{1/2})^+ = X,

since (U^{1/2})^+ U^{1/2} = P_1 P_1' and H_1'P_1 = H_1'[H_1 C_11 + H_2 C_21] = C_11. Similarly, X^♮ X X^♮ = X^♮. Furthermore,

    (X X^♮)' = (U^{1/2})^+ H_1 H_1' U^{1/2} = U^+[U^{1/2} H_1 H_1' (U^{1/2})^+]U = U^+ X X^♮ U.

Similarly (X^♮ X)' = V^+ X^♮ X V.

It remains to show that X^♮ is unique with respect to U and V. Suppose X = U^{1/2} A^(1) (V^{1/2})^+ = U^{1/2} A^(2) (V^{1/2})^+ and S(P_2) ⊆ S(H_2^(1)), S(Q_2) ⊆ S(K_2^(1)), S(P_2) ⊆ S(H_2^(2)), S(Q_2) ⊆ S(K_2^(2)), where H^(1), K^(1) and H^(2), K^(2) are orthogonal matrices that diagonalize A^(1) and A^(2) respectively. Then

    (U^{1/2})^+ X V^{1/2} = (U^{1/2})^+ U^{1/2} A^(1) (V^{1/2})^+ V^{1/2} = (U^{1/2})^+ U^{1/2} A^(2) (V^{1/2})^+ V^{1/2},

so

    P_1 P_1' A^(1) Q_1 Q_1' = P_1 P_1' A^(2) Q_1 Q_1'.

Since in general, if S(P_2) ⊆ S(H_2), then (I − P_1 P_1')H_1 = P_2 P_2' H_1 = 0, it follows that H_1^(i) = P_1 P_1' H_1^(i); similarly K_1^(i)' = K_1^(i)' Q_1 Q_1'. Hence

    A^(1) = H_1^(1) Γ^(1) K_1^(1)' = P_1 P_1' A^(1) Q_1 Q_1' = P_1 P_1' A^(2) Q_1 Q_1' = H_1^(2) Γ^(2) K_1^(2)' = A^(2),

and therefore the corresponding generalized inverses coincide, as required. Q.E.D.

The matrix X^♮ of Lemma 2.5 will be called the generalized inverse of X with respect to U and V.
This terminology is justified by the following theorem, which shows that one can start with X itself, and need not start first with another matrix A.

THEOREM 2.1 Let U (m x m) and V (n x n) be two symmetric positive semi-definite matrices of rank s and t respectively. Let X be any m x n matrix of rank r ≤ min{s, t} satisfying the conditions

    a) S(X) ⊆ S(U),    b) R(X) ⊆ R(V).

Then there exists a unique n x m matrix X^♮, which is the generalized inverse of X with respect to U and V.

Proof Note that the equation X = U^{1/2} A (V^{1/2})^+ has solutions for A if, and only if, U^{1/2}(U^{1/2})^+ X V^{1/2}(V^{1/2})^+ = P_1 P_1' X Q_1 Q_1' = X. But this condition is equivalent to a) S(X) ⊆ S(U) and b) R(X) ⊆ R(V). One of the solutions is A = (U^{1/2})^+ X V^{1/2}. This is the solution such that S(A) ⊆ S(U) and R(A) ⊆ R(V). Note that rank X = rank A, since rank X = rank U^{1/2} A (V^{1/2})^+ ≤ rank A, and rank A = rank (U^{1/2})^+ X V^{1/2} ≤ rank X. Now write X^♮ = V^{1/2} A^+ (U^{1/2})^+ and apply Lemma 2.5 and the conditions S(P_2) ⊆ S(H_2) and S(Q_2) ⊆ S(K_2), which are equivalent to S(A) ⊆ S(U) and R(A) ⊆ R(V) by Lemma 2.4. Q.E.D.
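The construction in the proof of THEOREM 2.1 can be carried out directly. The following is a minimal sketch, assuming NumPy; the helper psd_half, the test matrices and the seeds are illustrative assumptions. It forms A = (U^{1/2})^+ X V^{1/2}, builds X^♮ = V^{1/2} A^+ (U^{1/2})^+, and checks conditions (i)-(iv) of (♮).

```python
import numpy as np

def psd_half(M, tol=1e-10):
    # symmetric square root M^{1/2} of a PSD matrix and the pseudo-inverse of that root
    w, E = np.linalg.eigh(M)
    pos = w > tol
    root = (E[:, pos] * np.sqrt(w[pos])) @ E[:, pos].T
    root_pinv = (E[:, pos] / np.sqrt(w[pos])) @ E[:, pos].T
    return root, root_pinv

rng = np.random.default_rng(2)
m, n, s, t = 6, 5, 4, 3
Bu = rng.standard_normal((m, s)); U = Bu @ Bu.T        # rank s, symmetric PSD
Cv = rng.standard_normal((n, t)); V = Cv @ Cv.T        # rank t, symmetric PSD
X = U @ rng.standard_normal((m, n)) @ V                # forces S(X) ⊆ S(U), R(X) ⊆ R(V)

Uh, Uhp = psd_half(U)                                  # U^{1/2}, (U^{1/2})^+
Vh, Vhp = psd_half(V)                                  # V^{1/2}, (V^{1/2})^+
A  = Uhp @ X @ Vh                                      # A = (U^{1/2})^+ X V^{1/2}
Xg = Vh @ np.linalg.pinv(A) @ Uhp                      # X^♮ = V^{1/2} A^+ (U^{1/2})^+

Up, Vp = np.linalg.pinv(U), np.linalg.pinv(V)
ok = lambda L, R: np.allclose(L, R, atol=1e-6)
print(ok(X @ Xg @ X, X),                               # (i)   X X^♮ X = X
      ok(Xg @ X @ Xg, Xg),                             # (ii)  X^♮ X X^♮ = X^♮
      ok((X @ Xg).T, Up @ X @ Xg @ U),                 # (iii) (X X^♮)' = U^+ X X^♮ U
      ok((Xg @ X).T, Vp @ Xg @ X @ V))                 # (iv)  (X^♮ X)' = V^+ X^♮ X V
```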
§2.4 MATRIX ORDERING AND THE MINIMIZING OF MATRICES

To be self-contained and to facilitate reading we include some properties of matrix ordering and of the minimizing of matrices, which were stated in Chipman [3, pp. 1092-1094].

Let A be any square matrix. As usual, A will be called positive definite if x'Ax > 0 for all x ≠ 0; non-negative definite if x'Ax ≥ 0 for all x; zero definite if x'Ax = 0 for all x; and positive semi-definite if x'Ax ≥ 0 for all x, but x'Ax = 0 for some x ≠ 0. For these four concepts we write A ≫ 0, A ≥ 0, A ≈ 0 and A > 0 respectively, where 0 is a null matrix. Thus "A > 0" means "A ≥ 0 but not A ≈ 0". Finally, we define A ≥ B to mean A − B ≥ 0; this may also be written B ≤ A.

Lemma 2.6 The relation ≥ among square matrices is transitive, and among symmetric matrices it is also anti-symmetric.

Proof By the identity

    x'(A − C)x = x'(A − B)x + x'(B − C)x,

it is clear that ≥ is transitive. It remains to show that, for symmetric matrices A and B, A ≥ B and B ≥ A imply that A = B, i.e., anti-symmetry holds. By definition we deduce that A ≈ B, i.e., the matrix C = A − B is zero definite and C is symmetric. Let x be such that x_i = 1 and x_j = 0 for j ≠ i; then x'Cx = 0 implies c_ii = 0. Hence c_ii = 0 for all i, i.e., all the diagonal terms of C vanish. Furthermore, let x_k = x_j = 1 and x_i = 0 for i ≠ k, j; then x'Cx = 0 implies c_kj + c_jk = 0. But C is symmetric, so c_kj = ½[c_kj + c_jk] = 0. Hence C = 0 and A = B. Q.E.D.

In view of this lemma, we shall speak of minimizing a symmetric non-negative definite matrix, which simply means finding a matrix A ∈ 𝒜, where 𝒜 is a certain class of matrices, such that B ≥ A for all B ∈ 𝒜. Owing to the anti-symmetry of the relation ≥, if a set of symmetric matrices has a minimum, the minimum matrix is a fortiori unique.

Let X be an m x n matrix and X^+ its Moore-Penrose pseudo-inverse. Let 𝒜 be a collection of n x m matrices A such that AXX^+ is independent of A, that is, 𝒜 = { A | AXX^+ = A_0, for some fixed A_0 }. Then consider the problem of minimizing AA'. Write A = A[XX^+ + (I − XX^+)]; from (XX^+)' = XX^+ and XX^+XX^+ = XX^+, we have

    AA' = AXX^+A' + (A − AXX^+)(A − AXX^+)' = AXX^+A' + A(I − XX^+)A' = A_0 A_0' + A(I − XX^+)A',

whence, since only the second term depends on which member A of 𝒜 is chosen, minimizing AA' is equivalent to minimizing A(I − XX^+)A'. The solution A = AXX^+ = A_0 for minimizing A(I − XX^+)A' is by hypothesis independent of A for A ∈ 𝒜. One gets the same solution A = AXX^+ when A(I − XX^+)A' is differentiated with respect to the elements of A and the derivatives are set equal to zero. For let Q = A(I − XX^+)A'; then δQ = δA(I − XX^+)A' + A(I − XX^+)δA'. Since δA is arbitrary, if we put δQ = 0, it follows that A − AXX^+ = 0, that is, A = AXX^+ = A_0.

It is customary, instead of minimizing a matrix AA', to minimize its trace, or what amounts to the same thing, to minimize the Euclidean (Frobenius) norm, defined by ||A|| = √(trace AA'). The trace of AA' is simply the sum of squares of all the elements of A. To show that these procedures are equivalent: if a matrix A_0 is a minimum in a set 𝒜, then A − A_0 ≥ 0 for all A ∈ 𝒜 implies, in particular, that the diagonal elements of A_0 are all at an absolute minimum; in fact their sum, trace A_0, is a minimum. A_1 ≥ A_0 always implies trace A_1 ≥ trace A_0. The converse is not true; that is, trace A_1 ≥ trace A_0 does not imply A_1 ≥ A_0. However, if a minimum matrix exists, it will have the smallest trace. Thus the minimization of trace A is a correct procedure, both for finding a minimum matrix if it exists, and for establishing its existence. Thus, while either method is valid, the procedure of minimizing the matrix A itself is simpler and more direct, and is that which will be followed here.
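As a small numerical illustration of the minimization argument above (a sketch under assumed random test matrices, not a statement from the text): over a class 𝒜 with AXX^+ fixed, AA' is smallest, in the ordering of Lemma 2.6, at the member A_0 itself, and its trace is smallest there as well.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 5, 4, 2
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
Px = X @ np.linalg.pinv(X)                     # X X^+, the projection on S(X)

A0 = rng.standard_normal((n, m)) @ Px          # a fixed A_0 with A_0 X X^+ = A_0
for _ in range(3):
    Y = rng.standard_normal((n, m))
    A = A0 + Y @ (np.eye(m) - Px)              # another member of the class 𝒜
    assert np.allclose(A @ Px, A0)             # A X X^+ = A_0, so A ∈ 𝒜
    D = A @ A.T - A0 @ A0.T                    # = A(I − XX^+)A', per the identity above
    print(np.linalg.eigvalsh(D).min() >= -1e-8,          # the difference is non-negative definite
          np.trace(A @ A.T) >= np.trace(A0 @ A0.T))      # hence the trace is no smaller
```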
CHAPTER 3
GENERALIZATION OF THE FUNDAMENTAL GAUSS-MARKOFF THEOREM FOR LINEAR ESTIMATION

§3.1 INTRODUCTION AND NOTATIONS

In this chapter, we consider the regression problem

    y = Xβ + e,    y (m x 1), X (m x n), β (n x 1), e (m x 1),

where y is an m x 1 vector of observations; X is a known (real)* m x n matrix; β is an n x 1 vector of fixed but unknown parameters to be estimated; and e is an m x 1 vector of random variables (errors) such that E(e) = 0 and the covariance matrix is E(ee') = σ²U, where U is a known symmetric positive semi-definite matrix, σ² > 0 is an unknown scalar, and E is the expectation operator. This model will be referred to as (y, Xβ, σ²U).

We are going to find, whenever possible, a linear estimator

    β̂ = By + b     (*)

of β with the property that, if ℓ' = a'X, where ℓ' = (ℓ_1, ℓ_2, ..., ℓ_n) and a' = (a_1, a_2, ..., a_m), then E(ℓ'β̂) = ℓ'β identically in β. This requirement can be reduced to E(Xβ̂) = Xβ identically in β. Thus, E(Xβ̂) = E(XBy + Xb) = XBXβ + Xb = Xβ. Hence XBX = X and Xb = 0, so that b ∈ N(X) = { η | Xη = 0 }, the null space of X, and BXBX = BX, that is, BX is idempotent. From E(ℓ'β̂) = ℓ'β we also obtain ℓ'b = 0 and ℓ'BX = ℓ', so that BX projects every linear functional ℓ' into the space of estimable functionals { ℓ' | ℓ' = a'X }, to which b is orthogonal.

* Here we restrict ourselves to real matrices, but it is easy to deal with complex matrices and to show that the same results still hold.

If an unbiased linear estimator of β does not exist, we shall seek an estimator that minimizes the bias in some sense. Unbiasedness of a linear estimator β̂ = By + b requires

    E(β̂) = E(By + b) = BXβ + b = β

identically in β, which is equivalent to the two conditions BX = I and b = 0. If rank(X) = r < n, these conditions cannot be fulfilled, so we may seek to minimize b and (I − BX) in some sense. A natural procedure is to minimize the matrix [b, I − BX][b, I − BX]' or the norm

    ||[b, I − BX]|| = √(||b||² + ||I − BX||²),

where ||·|| is the Euclidean norm defined by ||B|| = √(trace BB'). This clearly leads to the solution b = 0 for b. Therefore, we need only seek B so as to minimize (I − BX)(I − BX)', or alternatively so as to minimize ||I − BX||.

Lewis and Odell [8] measure the biasedness of the estimator with respect to the range space of X by putting b = 0 and minimizing the quadratic form

    [E(β̂) − β]'[E(β̂) − β] = β'(I − BX)'(I − BX)β,

which is equivalent to setting b = 0 and minimizing the matrix (I − BX)'(I − BX). Thus the two criteria of biasedness are similar, since (I − BX)'(I − BX) and (I − BX)(I − BX)' are both of the same order and rank, symmetric, and have the same trace.

Chipman [3] has generalized the measure of biasedness to the form (I − BX)W(I − BX)', where W is any symmetric positive definite matrix. So, when W is the inverse of the covariance matrix of β̂, the measure of biasedness is dimensionless; that is, the biasedness will remain the same in spite of any change of scale of measurement for the unknown parameters β. We now slightly generalize the measure of biasedness to the form (I − BX)V(I − BX)', where V is an n x n symmetric positive semi-definite matrix.

§3.2 GENERALIZATION OF THE GAUSS-MARKOFF THEOREM WITH RESTRICTIONS ON THE COEFFICIENTS MATRIX

In this section, we consider the model (y, Xβ, σ²U) and let V be an n x n symmetric positive semi-definite matrix. We further assume that rank(X) = r, rank(U) = s and rank(V) = t, such that r ≤ s, t, and S(X) ⊆ S(U), R(X) ⊆ R(V). Then we have

THEOREM 3.1 Let the model be (y, Xβ, σ²U), and let V be an (n x n) symmetric positive semi-definite matrix as above. Then a necessary and sufficient condition that the bias matrix (I − BX)V(I − BX)' be a minimum is that B satisfy

    BX = X^♮X,     (3.1)

where X^♮ is the generalized inverse of X with respect to U and V, or equivalently,

    BXVX' = VX'.     (3.2)

Either condition is equivalent to the condition that B satisfy (i) and (iv) of (♮) in CHAPTER 2.

Proof As in THEOREM 2.1 let A = (U^{1/2})^+XV^{1/2}. Then we can write X = U^{1/2}A(V^{1/2})^+ and X^♮ = V^{1/2}A^+(U^{1/2})^+. By (iv) of (♮), we have (X^♮X)' = V^+X^♮XV. Hence

    V(X^♮X)' = VV^+[V^{1/2}A^+(U^{1/2})^+]XV = [V^{1/2}A^+(U^{1/2})^+]XV = X^♮XV,

that is,

    V(X^♮X)' = X^♮XV.     (3.3)

Similarly, since V and V^+ are symmetric, (3.3) and the transposition of (X^♮X)' = V^+X^♮XV yield V(X^♮X)'V^+ = X^♮XVV^+ = X^♮X. It is clear that (3.1) is consistent, since X and X^♮X have the same rank r. The equivalence of (3.1) and (3.2) follows from the fact that

    X^♮XVX' = V(X^♮X)'X' = V(XX^♮X)' = VX',     (3.4)

where use is made of properties (i) and (iv) of (♮) and of (3.3). Thus, postmultiplication of (3.1) by VX' gives (3.2), and postmultiplication of (3.2) by X^♮'V^+ gives (3.1), since V(X^♮X)'V^+ = X^♮X. Thus (3.1) and (3.2) are equivalent.

We next show that, if B satisfies (3.1), then (I − BX)V(I − BX)' is a minimum. We have (I − BX) = (I − X^♮X) + (X^♮X − BX), and by the transpose of (3.4),

    X^♮XV(I − X^♮X)' = X^♮XV − X^♮XV(X^♮X)' = 0,

whence

    (I − BX)V(I − BX)' = [(I − X^♮X) + (X^♮X − BX)]V[(I − X^♮X) + (X^♮X − BX)]'
                       = (I − X^♮X)V(I − X^♮X)' + (X^♮X − BX)V(X^♮X − BX)',     (3.5)

since, using the transpose of (3.4), we have (X^♮X − BX)V(I − X^♮X)' = 0 (n x n). The first term on the right of (3.5) is independent of B, and the second is symmetric non-negative definite and equal to the null matrix if (3.1) holds. The minimum bias is therefore

    (I − X^♮X)V(I − X^♮X)' = (I − X^♮X)V,

by (3.3) and (3.4).

Conversely, if (X^♮X − BX)V(X^♮X − BX)' = 0, or equivalently, if (I − BX)V(I − BX)' is a minimum, we show that BX = X^♮X. We have from CHAPTER 2, V = Q_1Φ²Q_1', and by Lemma 2.2,

    R(X^♮X − BX) = R[(X^♮ − B)X] ⊆ R(X) ⊆ R(V) = R(Q_1Q_1').

Note that (X^♮X − BX) and Q_1Q_1' are both n x n matrices; hence by Lemma 2.3 there exists an n x n matrix N such that (X^♮X − BX) = NQ_1Q_1'. Let M = NQ = N[Q_1, Q_2] = [M_1, M_2], so that N = MQ' = M_1Q_1' + M_2Q_2', and

    (X^♮X − BX) = (M_1Q_1' + M_2Q_2')Q_1Q_1' = M_1Q_1'.

Hence

    (X^♮X − BX)V(X^♮X − BX)' = M_1Q_1' Q_1Φ²Q_1' Q_1M_1' = M_1Φ²M_1'   (n x n).

If (X^♮X − BX)V(X^♮X − BX)' = 0, it follows that the diagonal elements of M_1Φ²M_1', namely Σ_{i=1}^t φ_i² m_hi², are 0 for h = 1, 2, ..., n. Thus m_hi = 0 for all h and for i = 1, 2, ..., t. Hence M_1 = 0 (n x t) whenever (X^♮X − BX)V(X^♮X − BX)' = 0. But M_1 = 0 implies (X^♮X − BX) = M_1Q_1' = 0 (n x n). Thus X^♮X = BX, as required. Thus equation (3.1) is a necessary and sufficient condition for (I − BX)V(I − BX)' to be a minimum.

It remains to show that BX = X^♮X is equivalent to the two conditions (i) XBX = X and (iv) (BX)' = V^+BXV as in (♮). Let BX = X^♮X. Then XBX = XX^♮X = X, and (BX)' = (X^♮X)' = V^+X^♮XV = V^+BXV. Conversely, let (i) and (iv) hold, that is, let XBX = X and (BX)' = V^+BXV. We have (X^♮X)' = V^+X^♮XV, which implies

    (X^♮X)'V^+ = V^+X^♮XVV^+ = V^+X^♮[U^{1/2}A(V^{1/2})^+]VV^+ = V^+X^♮X.

Hence (X^♮X)'(BX)' = (X^♮X)'[V^+BXV], so that (BXX^♮X)' = V^+X^♮XBXV = V^+X^♮XV. Thus (BX)' = (X^♮X)', and it follows that BX = X^♮X, as desired. Q.E.D.
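THEOREM 3.1 can be checked numerically. The following is a minimal sketch, assuming NumPy; the helpers psd_half and gen_inv and the random test matrices are illustrative assumptions, with gen_inv simply carrying out the construction X^♮ = V^{1/2}A^+(U^{1/2})^+ of THEOREM 2.1. It verifies that B = X^♮ attains the minimum bias matrix (I − X^♮X)V and that an arbitrary B does no better.

```python
import numpy as np

def psd_half(M, tol=1e-10):
    # symmetric square root of a PSD matrix and the pseudo-inverse of that root
    w, E = np.linalg.eigh(M)
    pos = w > tol
    return (E[:, pos] * np.sqrt(w[pos])) @ E[:, pos].T, (E[:, pos] / np.sqrt(w[pos])) @ E[:, pos].T

def gen_inv(X, U, V):
    # X^♮ = V^{1/2} A^+ (U^{1/2})^+ with A = (U^{1/2})^+ X V^{1/2}  (THEOREM 2.1)
    Uh, Uhp = psd_half(U)
    Vh, _ = psd_half(V)
    return Vh @ np.linalg.pinv(Uhp @ X @ Vh) @ Uhp

rng = np.random.default_rng(4)
m, n, s, t = 6, 5, 5, 4
Bu = rng.standard_normal((m, s)); U = Bu @ Bu.T
Cv = rng.standard_normal((n, t)); V = Cv @ Cv.T
X = U @ rng.standard_normal((m, n)) @ V                # S(X) ⊆ S(U), R(X) ⊆ R(V)

Xg = gen_inv(X, U, V)
I = np.eye(n)
bias = lambda B: (I - B @ X) @ V @ (I - B @ X).T

min_bias = (I - Xg @ X) @ V                            # the minimum bias of THEOREM 3.1
print(np.allclose(bias(Xg), min_bias, atol=1e-6))      # B = X^♮ satisfies BX = X^♮X and attains it
B = rng.standard_normal((n, m))                        # an arbitrary competitor
gap = bias(B) - min_bias
print(np.linalg.eigvalsh((gap + gap.T) / 2).min() >= -1e-6)   # the excess bias is non-negative definite
```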
Thus, linear minimum bias estimators with respect to (I − BX)V(I − BX)' are characterized by

    b = 0, BX = X^♮X,    or equivalently    b = 0, BXVX' = VX'.     (3.6)

We proceed with a discussion of variance. The covariance matrix (or simply the variance) of y is σ²U = E[(y − Xβ)(y − Xβ)'] = E(ee'). Any linear estimator β̂ = By + b has variance

    Var(β̂) = E[(β̂ − E(β̂))(β̂ − E(β̂))'] = E[Bee'B'] = σ²BUB'.

A slight generalization of Penrose's criterion [11, p. 17] is the following.

Definition 3.1 A best linear minimum bias estimator of β is an estimator β̂ = By such that the ordered pair of matrices ⟨(I − BX)V(I − BX)', BUB'⟩ is minimized with respect to B in the lexicographic sense; that is, the optimal B_0 is such that either

    (I − BX)V(I − BX)' > (I − B_0X)V(I − B_0X)',

or

    (I − BX)V(I − BX)' ≈ (I − B_0X)V(I − B_0X)'   and   BUB' ≥ B_0UB_0',

for all conformable B.

Lewis and Odell [8] deal with the model (y, Xβ, U), where U is positive definite, and obtain BX = X^+X by minimizing (I − BX)'(I − BX). They further minimize the covariance matrix BUB' subject to BX = X^+X and obtain the result B = (X'U^{-1}X)^+X'U^{-1}. Chipman [3, pp. 1094-1097] considers the model (y, Xβ, U), where U is positive definite, and any positive definite matrix V (n x n). He obtained BX = X^♮X by minimizing (I − BX)V(I − BX)', and furthermore derived B = X^♮ by minimizing BUB' subject to BX = X^♮X, where X^♮ is the generalized inverse of X with respect to U and V.
Now we consider the model (y, Xβ, σ²U) and an (n x n) matrix V, where U and V are symmetric positive semi-definite matrices such that S(X) ⊆ S(U) and R(X) ⊆ R(V). Then we have

THEOREM 3.2 Let the regression model be (y, Xβ, σ²U) as in §3.1, and let V be an (n x n) symmetric positive semi-definite matrix such that X, U and V satisfy the same conditions as in THEOREM 3.1. Then the best linear minimum bias estimator β̂ of β, that is, the linear estimator β̂ = By for which BUB' is a minimum subject to BXVX' = VX' (or to the equivalent condition BX = X^♮X), is given by

    β̂ = [X^♮ + Z(I − XX^♮)]y,

with B = X^♮ + Z(I − XX^♮), where Z is any (n x m) matrix satisfying R(Z) ⊆ R[(I − XX^♮)U]^⊥ = S(X) ∪ S(U)^⊥. Furthermore, B = X^♮ + Z(I − XX^♮) satisfies the following identities:

    (i)   XBX = X,
    (ii)  BXB = X^♮,
    (iii) (XB)' = U^+XBU,
    (iv)  (BX)' = V^+BXV,

i.e., conditions (i), (iii) and (iv) of (♮). We call β̂ the "best" estimator of β for short. Let β̂_♮ = X^♮y; then β̂_♮ and β̂ have the same expectation E(β̂) = E(β̂_♮) = X^♮Xβ and covariance matrix Var(β̂) = Var(β̂_♮) = σ²X^♮UX^♮'.

Proof The condition BXVX' = VX' is equivalent to BX = X^♮X by THEOREM 3.1. From BX = X^♮X, since rank(X) = rank(X^♮X), we can solve for the solutions B = X^♮ + Z[I − XX^♮], where Z is any (n x m) matrix with the dimensions of B. Postmultiplying BX = X^♮X by X^♮, and using property (ii) of (♮), this becomes BXX^♮ = X^♮. Hence B = B[XX^♮ + (I − XX^♮)] = X^♮ + B(I − XX^♮).

From properties (ii) and (iii) of (♮), we have XX^♮U[I − XX^♮]' = 0, since (XX^♮)' = U^+XX^♮U, so that

    U(XX^♮)' = UU^+XX^♮U = UU^+[U^{1/2}A(V^{1/2})^+]X^♮U = [U^{1/2}A(V^{1/2})^+]X^♮U = XX^♮U.

It follows that BUB' = X^♮UX^♮' + (B − X^♮)U(B − X^♮)'. Only the second term involves B. This term is symmetric non-negative definite, and it is equal to the null matrix, i.e.,

    (B − X^♮)U(B − X^♮)' = [(B − X^♮)U^{1/2}][(B − X^♮)U^{1/2}]' = 0   (n x n),

if, and only if, (B − X^♮)U^{1/2} = Z(I − XX^♮)U^{1/2} = 0 (n x m). This happens if, and only if,

    R(Z) ⊆ R[(I − XX^♮)U^{1/2}]^⊥ = R[(I − XX^♮)(UU^+)]^⊥.

Now (XX^♮)(UU^+) = X[V^{1/2}A^+(U^{1/2})^+]UU^+ = X[V^{1/2}A^+(U^{1/2})^+] = XX^♮. Hence

    R(Z) ⊆ R(UU^+ − XX^♮)^⊥ = R[(I − UU^+) + XX^♮] = S[(I − UU^+) + XX^♮] = S(I − UU^+) ∪ S(XX^♮) = S(X) ∪ S(U)^⊥,

since S(X) = S(XX^♮) and S(U)^⊥ = S(UU^+)^⊥ = S(I − UU^+), while S(X) and S(U)^⊥ are disjoint. Thus it follows that (B − X^♮)U(B − X^♮)' = 0 if, and only if, B = X^♮ + Z(I − XX^♮), where Z is any n x m matrix satisfying R(Z) ⊆ S(X) ∪ S(U)^⊥. Hence the minimum variance is σ²X^♮UX^♮'.

By THEOREM 3.1, BX = X^♮X is equivalent to (i) XBX = X and (iv) (BX)' = V^+BXV. To show that (ii) BXB = X^♮ and (iii) (XB)' = U^+XBU hold, we first show that XB = XX^♮. If (B − X^♮)U(B − X^♮)' = 0, it follows that X(B − X^♮)U(B − X^♮)'X' = 0. We have from CHAPTER 2, U = P_1Θ²P_1', and by Lemma 2.2,

    S[X(B − X^♮)] ⊆ S(X) ⊆ S(U) = S(P_1P_1').

Note that X(B − X^♮) and P_1P_1' are both (m x m) matrices; hence by Lemma 2.3 there exists an (m x m) matrix F such that X(B − X^♮) = P_1P_1'F. Let

    G' = P'F = [ P_1'F ] = [ G_1' ],
               [ P_2'F ]   [ G_2' ]

so that F = PG' = P_1G_1' + P_2G_2', and

    X(B − X^♮) = P_1P_1'(P_1G_1' + P_2G_2') = P_1G_1'.

Hence

    X(B − X^♮)U(B − X^♮)'X' = P_1G_1' P_1Θ²P_1' G_1P_1' = TΘ²T',   where T = P_1G_1'P_1   (m x s).

If X(B − X^♮)U(B − X^♮)'X' = 0, it follows that the diagonal elements of TΘ²T', namely Σ_{i=1}^s θ_i² t_hi², are 0 for h = 1, 2, ..., m. Thus t_hi = 0 for all h and for i = 1, 2, ..., s. Hence T = 0 (m x s) whenever X(B − X^♮)U(B − X^♮)'X' = 0. But T = P_1G_1'P_1 = 0 implies G_1'P_1P_1' = P_1'P_1G_1'P_1P_1' = 0, hence X(B − X^♮)P_1P_1' = 0, and thus XB = XX^♮.

This result and the condition BX = X^♮X give (ii) BXB = X^♮XB = X^♮XX^♮ = X^♮, and (iii) (XB)' = (XX^♮)' = U^+XX^♮U = U^+XBU.

Let β̂_♮ = X^♮y. Then E(β̂_♮) = X^♮Xβ and Var(β̂_♮) = σ²X^♮UX^♮', and

    E(β̂) = E[(X^♮ + Z(I − XX^♮))y] = X^♮Xβ + Z(I − XX^♮)Xβ = X^♮Xβ,
    Var(β̂) = σ²[X^♮UX^♮' + Z(I − XX^♮)U(I − XX^♮)'Z'] = σ²X^♮UX^♮'.   Q.E.D.
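As a complement to THEOREM 3.2, the following Monte Carlo sketch (assuming NumPy, Gaussian errors for convenience, and the same illustrative psd_half/gen_inv helpers as above) draws repeated samples from (y, Xβ, σ²U) with a singular U and compares the empirical mean and covariance of β̂_♮ = X^♮y with X^♮Xβ and σ²X^♮UX^♮'.

```python
import numpy as np

def psd_half(M, tol=1e-10):
    w, E = np.linalg.eigh(M)
    pos = w > tol
    return (E[:, pos] * np.sqrt(w[pos])) @ E[:, pos].T, (E[:, pos] / np.sqrt(w[pos])) @ E[:, pos].T

def gen_inv(X, U, V):
    Uh, Uhp = psd_half(U)
    Vh, _ = psd_half(V)
    return Vh @ np.linalg.pinv(Uhp @ X @ Vh) @ Uhp

rng = np.random.default_rng(5)
m, n, s, t, sigma = 6, 5, 5, 4, 0.5
Bu = rng.standard_normal((m, s)); U = Bu @ Bu.T
Cv = rng.standard_normal((n, t)); V = Cv @ Cv.T
X = U @ rng.standard_normal((m, n)) @ V                # S(X) ⊆ S(U), R(X) ⊆ R(V)
Xg = gen_inv(X, U, V)
beta = rng.standard_normal(n)

Uh, _ = psd_half(U)
N = 20000
E = sigma * rng.standard_normal((N, m)) @ Uh.T         # errors with Var(e) = σ²U (possibly singular)
draws = (X @ beta + E) @ Xg.T                          # β̂_♮ = X^♮ y for each simulated sample
print(np.abs(draws.mean(axis=0) - Xg @ X @ beta).max())           # ≈ 0:  E(β̂_♮) = X^♮Xβ
print(np.abs(np.cov(draws.T) - sigma**2 * Xg @ U @ Xg.T).max())   # ≈ 0:  Var(β̂_♮) = σ²X^♮UX^♮'
```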
CHAPTER 4
ESTIMATION BY MINIMUM MEAN SQUARE ERROR

§4.1 INTRODUCTION AND NOTATIONS

In this chapter we consider the same regression model (y, Xβ, σ²U) as in Chipman [3, pp. 1104-1109], except that we do not restrict U to be positive definite; we allow U to be a symmetric positive semi-definite matrix. We restate the whole set-up as in Chipman [3, pp. 1104-1105]. Let the regression model be

    y = Xβ + e,    y (m x 1), X (m x n), β (n x 1), e (m x 1),

where the (n x 1) vector β has a prior probability distribution with

    E(β) = β̄,    Var(β) = E[(β − β̄)(β − β̄)'] = Σ = τ²V,  τ² > 0,

and the (m x 1) random vector e has mean and variance

    E(e) = 0,    Var(e) = E(ee') = Ω = σ²U,  σ² > 0.

Assume further that β and e are uncorrelated, that is, E[(β − β̄)e'] = 0. We shall denote the deviation of β from its prior mean by δ = β − β̄; thus

    E [ δ ] = 0,    Var [ δ ] = [ Σ  0 ] = M,   defining M.
      [ e ]             [ e ]   [ 0  Ω ]

From the joint probability distribution of β and e we obtain the conditional distributions of y = Xβ + e given, respectively, β and e, with mean and variance

    E(y|β) = Xβ,       Var(y|β) = Ω,
    E(y|e) = Xβ̄ + e,   Var(y|e) = XΣX'.

Thus the unconditional distribution of y has mean and variance

    E(y) = Xβ̄,    Var(y) = XΣX' + Ω = W,

which defines W.

§4.2 MINIMUM MEAN SQUARE ERROR ESTIMATION

Definition 4.1 Let β̃ = By + b be a linear estimator of β. Then

    P = P(B, b) = E[(β̃ − β)(β̃ − β)']

is called the matrix of mean square error, or more briefly the risk matrix.

Definition 4.2 A linear estimator β̃ = By + b is said to be a minimum mean square error estimator of β if B and b are such that the matrix of mean square error, P = P(B, b) = E[(β̃ − β)(β̃ − β)'], is a minimum.

We proceed as in Chipman [3, pp. 1104-1109], first minimizing P with respect to b, and then with respect to B. From δ = β − β̄,

    β̃ − β = B(Xβ + e) + b − β = −(I − BX)δ + Be + [b − (I − BX)β̄].

We have

    P(B, b) = (I − BX)Σ(I − BX)' + BΩB' + [b − (I − BX)β̄][b − (I − BX)β̄]',

since δ and e are uncorrelated. Only the third term on the right involves b, and it is non-negative definite. Therefore P(B, b) is minimized with respect to b when b = (I − BX)β̄. Substituting this into β̃ = By + b, we obtain

    β̃ = By + (I − BX)β̄ = B(y − Xβ̄) + β̄.

The problem is therefore reduced to finding a matrix B such that

    P(B) = (I − BX)Σ(I − BX)' + BΩB'

is a minimum.
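A quick Monte Carlo check of the risk decomposition just derived, given as a sketch under assumed Gaussian draws (the distributional form and the random test matrices are assumptions of the illustration; the identity itself depends only on the first two moments): for fixed B and b, the simulated risk matrix matches (I − BX)Σ(I − BX)' + BΩB' + dd' with d = b − (I − BX)β̄.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, k = 5, 4, 3
X = rng.standard_normal((m, n))
As = rng.standard_normal((n, n)); Sigma = As @ As.T    # Σ: positive definite prior covariance
Ce = rng.standard_normal((m, k)); Omega = Ce @ Ce.T    # Ω: singular error covariance
beta_bar = rng.standard_normal(n)
B = rng.standard_normal((n, m))                        # an arbitrary linear rule
b = rng.standard_normal(n)

N = 200_000
betas = beta_bar + rng.standard_normal((N, n)) @ As.T  # Var(β) = As As' = Σ
errors = rng.standard_normal((N, k)) @ Ce.T            # Var(e) = Ce Ce' = Ω
ys = betas @ X.T + errors                              # y = Xβ + e
D = ys @ B.T + b - betas                               # β̃ − β with β̃ = By + b
emp = D.T @ D / N                                      # simulated risk matrix

d = b - (np.eye(n) - B @ X) @ beta_bar
theory = ((np.eye(n) - B @ X) @ Sigma @ (np.eye(n) - B @ X).T
          + B @ Omega @ B.T + np.outer(d, d))
print(np.abs(emp - theory).max() / np.abs(theory).max())   # small relative Monte Carlo error
```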
Chipman [3, pp. 1105-1106] has solved this problem quite generally for the case in which Σ and Ω are both positive definite, and obtained

    B = ΣX'(XΣX' + Ω)^{-1} = (Σ^{-1} + X'Ω^{-1}X)^{-1}X'Ω^{-1}.

We slightly generalize Chipman's approach by allowing Ω to be symmetric positive semi-definite, so that U is also a symmetric positive semi-definite matrix. We have the following result.

THEOREM 4.1 Consider the regression model y = Xβ + e as in §4.1. Let X be an (m x n) matrix of rank r; let Σ be a positive definite matrix of order n and Ω a symmetric positive semi-definite matrix of order m and rank s such that s ≥ r, and S(X) ⊆ S(Ω). Then there is an (n x m) matrix B = X^⊛ (called the optimal inverse of X) which minimizes

    P = (I − BX)Σ(I − BX)' + BΩB'

and is equal to

    X^⊛ = ΣX'(XΣX' + Ω)^+ = (Σ^{-1} + X'Ω^+X)^{-1}X'Ω^+.

The minimum risk then becomes P(X^⊛) = (I − X^⊛X)Σ.

Proof First of all, since Σ is positive definite, so are Σ^{-1} and Σ^{-1} + X'Ω^+X. Define the augmented matrices

    L = [I − BX, B],    N = [ I ],    M = [ Σ  0 ].
                            [ X ]         [ 0  Ω ]

Then P = LML', and the problem is to find an n x (n + m) matrix L of this form such that LML' is a minimum.

By CHAPTER 2, there exist orthogonal matrices H, K and P such that

    H'XK = [ Γ  0 ]    and    P'ΩP = [ Θ²  0 ],
           [ 0  0 ]                  [ 0   0 ]

where Γ = diag(γ_1, ..., γ_r) with γ_1 ≥ γ_2 ≥ ... ≥ γ_r > 0, say, and Θ² = diag(θ_1², ..., θ_s²) with θ_1 ≥ θ_2 ≥ ... ≥ θ_s > 0, say. We partition H, K and P in such a way that H = [H_1, H_2], K = [K_1, K_2] and P = [P_1, P_2], with r and m-r, r and n-r, and s and m-s columns respectively. Then X = H_1 Γ K_1' and Ω = P_1 Θ² P_1'.

Since both H and P are (m x m) orthogonal matrices, there exists an orthogonal matrix

    C = [ C_11  C_12 ]
        [ C_21  C_22 ]

such that [P_1, P_2] = P = HC = [H_1 C_11 + H_2 C_21, H_1 C_12 + H_2 C_22], i.e., P_1 = H_1 C_11 + H_2 C_21 and P_2 = H_1 C_12 + H_2 C_22. By hypothesis S(X) ⊆ S(Ω), so it follows that S(P_2) ⊆ S(H_2) as in CHAPTER 2. This implies that C_12 = 0, and hence C_11 C_11' = I_r and C_11 C_21' = 0, so we have

    H_1'P_1P_1' = H_1'[H_1 C_11 + H_2 C_21][H_1 C_11 + H_2 C_21]' = C_11[C_11'H_1' + C_21'H_2'] = C_11 C_11' H_1' = H_1'.

Given M = [Σ 0; 0 Ω], it follows that M^+ = [Σ^{-1} 0; 0 Ω^+], where M^+ and Ω^+ are the Moore-Penrose pseudo-inverses of M and Ω respectively. Write

    N^⊛ = (N'M^+N)^{-1}N'M^+ = (Σ^{-1} + X'Ω^+X)^{-1}[Σ^{-1}, X'Ω^+].

Then N^⊛N = I_n, and hence NN^⊛N = N I_n = N; similarly N^⊛NN^⊛ = I_n N^⊛ = N^⊛. Also, we have

    NN^⊛ = [ (Σ^{-1} + X'Ω^+X)^{-1}Σ^{-1}     (Σ^{-1} + X'Ω^+X)^{-1}X'Ω^+  ]
           [ X(Σ^{-1} + X'Ω^+X)^{-1}Σ^{-1}    X(Σ^{-1} + X'Ω^+X)^{-1}X'Ω^+ ],

and a block-by-block computation gives

    (NN^⊛)' = M^+(NN^⊛)M,

where, since Σ and Ω are symmetric, so is Σ^{-1} + X'Ω^+X, and where

    X'Ω^+Ω = K_1 Γ H_1' P_1 Θ^{-2} P_1' P_1 Θ² P_1' = K_1 Γ H_1' P_1 P_1' = K_1 Γ H_1' = X'.
Thus N^⊛ satisfies properties (i), (ii) and (iv) of (†) and (iii) of (♮) in CHAPTER 2 (with M in place of U). Furthermore,

    NN^⊛M(NN^⊛)' = NN^⊛M[M^+NN^⊛M] = NN^⊛M = M(NN^⊛)',

since N^⊛MM^+ = N^⊛ and ΩΩ^+X = X, so that MM^+N = N. Consequently,

    P = LML' = L[NN^⊛ + (I − NN^⊛)]M[NN^⊛ + (I − NN^⊛)]'L' = (N'M^+N)^{-1} + (L − N^⊛)M(L − N^⊛)',

where use is made of the fact that LN = (I − BX) + BX = I, and of N^⊛MN^⊛' = (N'M^+N)^{-1}, since N^⊛ = (N'M^+N)^{-1}N'M^+. Only the second term involves L, and it is symmetric non-negative definite, hence equal to the null matrix if L = N^⊛. Thus we have

    L = [I − BX, B] = (Σ^{-1} + X'Ω^+X)^{-1}[Σ^{-1}, X'Ω^+] = N^⊛.

But this implies B = (Σ^{-1} + X'Ω^+X)^{-1}X'Ω^+ = X^⊛, which is as required.

The fact that B = ΣX'(XΣX' + Ω)^+ = X^⊛ follows from the identity X'Ω^+(XΣX' + Ω) = (Σ^{-1} + X'Ω^+X)ΣX', from S(X) ⊆ S(Ω) = S(XΣX' + Ω), and from the fact that (XΣX' + Ω)(XΣX' + Ω)^+ is the projection operator on S[(XΣX' + Ω)^+] = S[(XΣX' + Ω)'] = S(XΣX' + Ω), since XΣX' + Ω is symmetric.

The formula P(X^⊛) = (I − X^⊛X)Σ for the minimum mean square error follows from X^⊛ = (Σ^{-1} + X'Ω^+X)^{-1}X'Ω^+ and X'Ω^+(XΣX' + Ω) = (Σ^{-1} + X'Ω^+X)ΣX', since we have

    P(X^⊛) = (I − X^⊛X)Σ(I − X^⊛X)' + X^⊛ΩX^⊛'
           = (I − X^⊛X)Σ − (I − X^⊛X)Σ(X^⊛X)' + X^⊛ΩX^⊛'
           = (I − X^⊛X)Σ − Σ(X^⊛X)' + (X^⊛X)Σ(X^⊛X)' + X^⊛ΩX^⊛'
           = (I − X^⊛X)Σ − ΣX'Ω^+X(Σ^{-1} + X'Ω^+X)^{-1}
                 + (Σ^{-1} + X'Ω^+X)^{-1}{X'Ω^+[XΣX' + Ω]}Ω^+X(Σ^{-1} + X'Ω^+X)^{-1}
           = (I − X^⊛X)Σ − ΣX'Ω^+X(Σ^{-1} + X'Ω^+X)^{-1}
                 + (Σ^{-1} + X'Ω^+X)^{-1}(Σ^{-1} + X'Ω^+X)ΣX'Ω^+X(Σ^{-1} + X'Ω^+X)^{-1}
           = (I − X^⊛X)Σ − ΣX'Ω^+X(Σ^{-1} + X'Ω^+X)^{-1} + ΣX'Ω^+X(Σ^{-1} + X'Ω^+X)^{-1}
           = (I − X^⊛X)Σ.

Hence the best value of b is b = [I − (Σ^{-1} + X'Ω^+X)^{-1}X'Ω^+X]β̄, and so b → 0 when Σ^{-1} → 0. Q.E.D.
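The two closed forms of the optimal inverse in THEOREM 4.1 are easy to compare numerically. The following is a minimal sketch, assuming NumPy and randomly generated test matrices with a singular Ω satisfying S(X) ⊆ S(Ω); it checks that ΣX'(XΣX' + Ω)^+ agrees with (Σ^{-1} + X'Ω^+X)^{-1}X'Ω^+, that the attained risk equals (I − X^⊛X)Σ, and that an arbitrary linear rule has no smaller risk.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, s = 6, 3, 4
Cs = rng.standard_normal((n, n)); Sigma = Cs @ Cs.T           # Σ positive definite
Co = rng.standard_normal((m, s)); Omega = Co @ Co.T           # Ω symmetric PSD of rank s < m
X = Omega @ rng.standard_normal((m, n))                       # ensures S(X) ⊆ S(Ω)

Omega_p = np.linalg.pinv(Omega)
B1 = Sigma @ X.T @ np.linalg.pinv(X @ Sigma @ X.T + Omega)    # Σ X'(XΣX' + Ω)^+
B2 = np.linalg.solve(np.linalg.inv(Sigma) + X.T @ Omega_p @ X, X.T @ Omega_p)
print(np.allclose(B1, B2, atol=1e-6))                         # the two closed forms agree

I = np.eye(n)
risk = lambda B: (I - B @ X) @ Sigma @ (I - B @ X).T + B @ Omega @ B.T
min_risk = (I - B1 @ X) @ Sigma                               # (I − X^⊛X)Σ
print(np.allclose(risk(B1), min_risk, atol=1e-6))             # the stated minimum risk is attained
B = rng.standard_normal((n, m))                               # any other linear rule
gap = risk(B) - min_risk
print(np.linalg.eigvalsh((gap + gap.T) / 2).min() >= -1e-6)   # its risk is no smaller
```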
Remark: This theorem is none other than the Gauss-Markoff-Aitken theorem on least squares with a change of notation. Consider the regression model

    [ β̄ ] = [ I ] β + [ β̄ − β ],
    [ y ]   [ X ]     [   e   ]

where the random error term has mean 0 and variance M. Suppose the interpretations of β and β̄ are now reversed, and the "prior mean" β̄ is considered to be a random variable with mean equal to β and covariance matrix Var(β̄) = E[(β̄ − β)(β̄ − β)'] = Σ. Then the minimum mean square error estimator is precisely the same as the generalized least squares estimator corresponding to the above model. For let a linear estimator

    β̂ = L_1 β̄ + L_2 y = [L_1, L_2] [ β̄ ]
                                    [ y ]

be required to be unbiased and of minimum variance. Unbiasedness requires

    E(β̂) = L_1 β + L_2 Xβ = [L_1, L_2] [ I ] β = LNβ = β   for all β,
                                        [ X ]

or LN = L_1 + L_2 X = I_n, and so L_1 = I_n − L_2 X. The minimum variance condition, LML' = minimum, requires L = (N'M^+N)^{-1}N'M^+ = N^⊛, exactly as in the Gauss-Markoff-Aitken theorem, and hence that L_2 = X^⊛ = (Σ^{-1} + X'Ω^+X)^{-1}X'Ω^+. Thus β̂ = (I − X^⊛X)β̄ + X^⊛y.

§4.3 THE JUSTIFICATIONS FOR LEAST SQUARES

To facilitate reading, we rewrite some paragraphs of Chipman's [3, pp. 1107-1109] discussion. The minimum mean square error estimator of β has been found to be

    β̃ = (Σ^{-1} + X'Ω^+X)^{-1}X'Ω^+y + [I − (Σ^{-1} + X'Ω^+X)^{-1}X'Ω^+X]β̄.

Define ρ² = σ²/τ², where σ² and τ² are the same as in §4.1, i.e., Σ = τ²V and Ω = σ²U. Then β̃ may also be written

    β̃ = (ρ²V^{-1} + X'U^+X)^{-1}X'U^+y + [I − (ρ²V^{-1} + X'U^+X)^{-1}X'U^+X]β̄,

and likewise P becomes

    P = τ²(I − BX)V(I − BX)' + σ²BUB' = σ²[ρ^{-2}(I − BX)V(I − BX)' + BUB'].

As ρ^{-2} → ∞, the criterion of minimum mean square error reduces to Penrose's lexicographic criterion, infinite weight being given to the bias term (I − BX)V(I − BX)', which is to be minimized first, after which the variance BUB' is minimized subject to the condition of minimum bias. In the case in which X has full rank, the minimum bias is, of course, zero, and from the equation for β̃ above, it follows immediately that β̃ approaches the generalized least squares estimator

    β* = (X'U^+X)^+X'U^+y

as ρ² → 0.

BIBLIOGRAPHY

[1] Aitken, A.C., "On Least Squares and Linear Combinations of Observations", Proceedings of the Royal Society of Edinburgh, Vol. 55 (1935), pp. 42-48.

[2] Aitken, A.C., "Studies in Practical Mathematics. IV. On Linear Approximation by Least Squares", Proceedings of the Royal Society of Edinburgh, Section A, Vol. 62 (1945), pp. 138-146.

[3] Chipman, John S., "On Least Squares with Insufficient Observations", Journal of the American Statistical Association, Vol. 59, No. 308 (December 1964), pp. 1078-1111.

[4] Chipman, John S. and Rao, M.M., "Projections, Generalized Inverses, and Quadratic Forms", Journal of Mathematical Analysis and Applications, Vol. 9, No. 1 (August 1964), pp. 1-11.

[5] Dwyer, Paul S., "Generalizations of a Gaussian Theorem", The Annals of Mathematical Statistics, Vol. 29, No. 1 (March 1958), pp. 106-117.

[6] Goldman, A.J. and Zelen, M., "Weak Generalized Inverses and Minimum Variance Linear Unbiased Estimation", Journal of Research of the National Bureau of Standards, Section B: Mathematics and Mathematical Physics, Vol. 68B, No. 4 (October-December 1964), pp. 151-172.

[7] Graybill, Franklin A., Introduction to Matrices with Applications in Statistics, Wadsworth Publishing Company, Inc., Belmont, California (1969).

[8] Lewis, T.O. and Odell, P.L., "A Generalization of the Gauss-Markov Theorem", Journal of the American Statistical Association, Vol. 61, No. 316 (December 1966), pp. 1063-1066.

[9] Mitra, Sujit K. and Rao, C. Radhakrishna, "Some Results in Estimation and Tests of Linear Hypothesis Under the Gauss-Markoff Model", Sankhyā, Series A, Vol. 30, Part 3 (September 1968), pp. 281-290.

[10] Penrose, R., "A Generalized Inverse for Matrices", Proceedings of the Cambridge Philosophical Society, Vol. 51, Part 3 (July 1955), pp. 406-413.

[11] Penrose, R., "On Best Approximate Solutions of Linear Matrix Equations", Proceedings of the Cambridge Philosophical Society, Vol. 52, Part 1 (January 1956), pp. 17-19.

[12] Rao, C. Radhakrishna, Linear Statistical Inference and its Applications, John Wiley and Sons, Inc., New York (1965).

[13] Rohde, Charles A., "Some Results on Generalized Inverses", SIAM Review, Vol. 8, No. 2 (April 1966), pp. 201-205.
