BIAS IN LEAST SQUARES REGRESSION

by

DOUGLAS HAROLD WILLIAMS
B.Sc., Simon Fraser University, 1970

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
in the Department of FORESTRY

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
January, 1972

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at The University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
Vancouver 8, Canada

ABSTRACT

Much of the data analysed by least squares regression methods violates the assumption that independent variables are known without error. Also, it has been demonstrated that parameter estimates based on minimum residual sums of squares have a high probability of being unsatisfactory if the independent variables are not orthogonal. Both situations are examined jointly by Monte Carlo simulation and bias in least squares estimate of regression coefficients and error sums of squares is demonstrated. Techniques for regression under these conditions are reviewed but the literature does not present a practical algorithm in either case.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter
INTRODUCTION
ONE    THE LINEAR MODEL
         The Classical Model
         The Least Squares Solution
TWO    MONTE CARLO STUDIES
         The Simulation Algorithm
         Construction of Vectors of a Given Correlation
         A Single Variable Model: Study 1
         A Two-Variable Model: Study 2
         Discussion of Simulation Results
THREE  REGRESSION PROCEDURES WHEN $\sigma_1 \neq 0$, AND PREDICTOR VECTORS ARE NOT ORTHOGONAL
CONCLUSION
LITERATURE CITED

LIST OF TABLES

Table
1. Simulation results for a single variable model: $\beta = -0.25$
2. Simulation results for a single variable model: $\beta = 0.0$
3. Simulation results for a single variable model: $\beta = 0.25$
4. Simulation results for a single variable model: $\beta = 0.5$
5. Simulation results for a single variable model: $\beta = 0.75$
6. Simulation results for a two variable model: correlation $(X_1, X_2) = 0.0$
7. Simulation results for a two variable model: correlation $(X_1, X_2) = 0.1$
8. Simulation results for a two variable model: correlation $(X_1, X_2) = 0.2$
9. Simulation results for a two variable model: correlation $(X_1, X_2) = 0.3$
10. Simulation results for a two variable model: correlation $(X_1, X_2) = 0.4$
11. Simulation results for a two variable model: correlation $(X_1, X_2) = 0.5$
12. Simulation results for a two variable model: correlation $(X_1, X_2) = 0.6$
13. Simulation results for a two variable model: correlation $(X_1, X_2) = 0.7$
14. Simulation results for a two variable model: correlation $(X_1, X_2) = 0.8$
15. Simulation results for a two variable model: correlation $(X_1, X_2) = 0.9$

LIST OF FIGURES

Figure
1. The Trend of the Standard Deviation of $t = (B - \beta)/S(B)$ with $\beta$ and $\sigma_1$
2. The Effect of Non-orthogonal Predictor Variables and $\sigma_1 \neq 0$ on the Standard Deviation of $t = (B - \beta)/S(B)$

ACKNOWLEDGMENT

The author wishes to express his gratitude to Dr. A. Kozak who suggested the problem and under whose direction this study was undertaken. Drs. W. G. Warren, A. Kozak, and Mr. G. G. Young are gratefully acknowledged for their help, useful criticism and review of the thesis.
INTRODUCTION

Multiple regression methods employing the least squares principle are used throughout the sciences for identification of the function relating a set of 'independent' variables to a single responding 'dependent' variable. The widespread utility of these methods necessitates examination of the validity of the technique under conditions violating or straining the assumptions of least squares regression theory.

In particular, much of the forestry data analysed by regression methods does not fulfill the assumptions of regression theory. The independent variable may only be an estimate, such as stand age in natural 'even aged' stands. Another common regression situation in forestry is the volume equation,

$$V = a D^{b} H^{c},$$

where V = tree volume, D = tree diameter at breast height, H = tree height.

The linear form is

$$\log V = \log a + b \log D + c \log H.$$

The independent variables H and D have errors of estimate and are highly correlated.

Some aspects of this problem have been discussed by Wald (1940), Bartlett (1949), Acton (1959), Kendall and Stuart (1961), and Cox (1968). However, these authors are concerned largely with developing improved parameter estimates. Kendall (1951), in a discussion of regression and functional relationship, examined the problem of tests of significance in addition to parameter estimation. Turnbull (1968) presents a series of Monte Carlo simulations of a single variable model, and demonstrates empirically the seriousness of error of estimate of the independent variable.

This paper attempts to examine two common experimental situations: the case where the independent variables contain errors and the case where the independent variables are not orthogonal.
Monte Carlo simulation experiments are used to provide empirical data to demonstrate the trend and magnitude of effects arising from these situations.

CHAPTER ONE

THE LINEAR MODEL

1.1 The Classical Model

The general linear model for relating a response variable Y to a controlled variable X is

$$Y = X\beta + \varepsilon \tag{1}$$

where
Y is an (n x 1) vector of observations,
X is an (n x p) matrix of known form,
$\beta$ is a (p x 1) vector of parameters,
$\varepsilon$ is an (n x 1) vector of errors.

A number of assumptions are made about this model:

i) $\varepsilon \sim N(0, \sigma^2 I)$
ii) $Y \sim N(X\beta, \sigma^2 I)$. This implies that $\mathrm{cov}(Y_i, Y_j) = 0$ for all $i \neq j$.
iii) The independent variable X is known without error.

1.2 The Least Squares Solution

The least squares estimate of $\beta$ is the vector b which minimizes the error sums of squares, $\varepsilon^T\varepsilon$. From equation (1),

$$\varepsilon = Y - X\beta \tag{2}$$

and

$$\varepsilon^T\varepsilon = (Y - X\beta)^T (Y - X\beta) = Y^TY - \beta^TX^TY - Y^TX\beta + \beta^TX^TX\beta = Y^TY - 2\beta^TX^TY + \beta^TX^TX\beta. \tag{3}$$

The value of b which, when substituted for $\beta$ in (1), minimizes $\varepsilon^T\varepsilon$ is found by the classical derivative technique. Equation (3) is differentiated with respect to $\beta$. The resulting matrix is equated to zero, $\beta$ being replaced by b. The resulting system is generally referred to as the 'normal' equations,

$$(X^TX)\,b = X^TY. \tag{4}$$

If the system of equations (4) consists of p independent equations in p unknowns, then a unique solution for b can be obtained:

$$b = (X^TX)^{-1} X^TY.$$

The solution b is an estimate of $\beta$ which minimizes the error sums of squares $\varepsilon^T\varepsilon$ irrespective of the distribution properties of the errors. However, the assumption that the errors $\varepsilon$ are normally distributed is required in order to make tests of significance which depend on assumptions of normality, such as t- or F-tests.
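As an illustration of the normal equations (4), the following minimal Python sketch forms $X^TX$ and $X^TY$ and solves for b by Gaussian elimination. It is not the program used in this study: the helper names (`transpose`, `matmul`, `solve`, `least_squares`) and the error-free example data are assumptions chosen only to show the mechanics.

```python
# Minimal sketch (illustrative names and data, not the thesis program):
# solve the normal equations (X^T X) b = X^T Y by Gaussian elimination.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # Row-by-column products of two matrices given as lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, rhs):
    """Solve a z = rhs for a square, non-singular matrix a (partial pivoting)."""
    n = len(a)
    aug = [list(a[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z

def least_squares(X, Y):
    """Return b satisfying (X^T X) b = X^T Y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(x * y for x, y in zip(row, Y)) for row in Xt]
    return solve(XtX, XtY)

# Hypothetical example: intercept column plus nine X levels, no error in Y.
X = [[1.0, float(x)] for x in range(6, 15)]
Y = [10.0 + 0.5 * row[1] for row in X]
b = least_squares(X, Y)
```

With error-free data generated from a line with intercept 10 and slope 0.5, the solution b recovers those two values up to rounding, which is a convenient check that the system has been set up correctly.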
Another interesting property of the solution b is that when $\varepsilon \sim N(0, \sigma^2 I)$, then b is the maximum likelihood estimate of $\beta$. The likelihood function, L, for the n-tuple of observations $Y = [Y_1, Y_2, \ldots, Y_n]^T$ is

$$L(\varepsilon^T\varepsilon) = \prod_{i=1}^{n} \frac{1}{\sigma (2\pi)^{1/2}} \exp\left(-\varepsilon_i^2 / 2\sigma^2\right) = \frac{1}{\sigma^{n} (2\pi)^{n/2}} \exp\left(-(\varepsilon^T\varepsilon) / 2\sigma^2\right).$$

As the sum of squares error term enters the exponent with a negative sign, minimizing $\varepsilon^T\varepsilon$ maximizes the exponential expression and the likelihood function. This property provides justification for the least squares procedure in the common situation where errors are normally distributed.

CHAPTER TWO

MONTE CARLO STUDIES

2.1 The Simulation Algorithm

The Monte Carlo studies in this investigation consist of generating data from a linear equation with known parameters and normal random errors of known standard deviations. The independent variables were constructed to a predetermined correlation and corrected sum of squares by a technique described below. The dependent variable 'observations', y, were generated as

$$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_{2i},$$

where:

i) $\beta_0, \beta_1, \ldots, \beta_p$ are the parameter values of the model,
ii) $X_{1i}, X_{2i}, \ldots, X_{pi}$ are the ith level independent variables measured without error,
iii) the intercorrelation matrix of the independent variables is C,
iv) $\varepsilon_{2i}$ is the error of estimate of the ith observation and is $N(0, \sigma_2^2)$.

The dependent variable, therefore, is generated in a manner in agreement with regression theory. The error of estimate is associated with the dependent variables and is independent of the X variable, and the X variables themselves are assumed to be error free.

Next, the observed independent variables, x, were generated such that

$$x = X + \varepsilon_1, \quad \text{where } \varepsilon_1 \sim N_p(0, \sigma_1^2 I_p).$$

Each set of observations was fitted using the usual least squares procedure.
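The data-generating steps above can be sketched for a single predictor as follows. This is a hedged illustration rather than the original simulation program: the function names `fit_line` and `mean_slope`, the seed, and the parameter values (intercept 10, slope 0.5, nine X levels from 6 to 14) are assumptions made for the example. The same seed is reused for both designs so that each design sees the same stream of normal deviates.

```python
import random

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def mean_slope(beta0, beta1, sigma1, sigma2, levels, reps=1000, seed=1972):
    """Average fitted slope over `reps` simulated data sets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # y is generated from the error-free X, as regression theory assumes.
        y = [beta0 + beta1 * X + rng.gauss(0.0, sigma2) for X in levels]
        # The analyst only sees x = X + e1, the predictor observed with error.
        x = [X + rng.gauss(0.0, sigma1) for X in levels]
        total += fit_line(x, y)[1]
    return total / reps

levels = list(range(6, 15))  # hypothetical fixed X levels
slope_no_error = mean_slope(10.0, 0.5, 0.0, 1.0, levels)
slope_with_error = mean_slope(10.0, 0.5, 2.0, 1.0, levels)
```

Under these assumptions the mean slope with `sigma1 = 0` stays near the generating value, while the mean slope with `sigma1 = 2` is pulled toward zero.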
The procedure was repeated 1000 times for a number of combinations of $\sigma_1$ and C, and 'expectation' values were provided by the means of selected statistics. It should be pointed out that the same set of standard normal random deviates was used for all combinations of C and $\sigma_1$.

2.2 Construction of Vectors of a Given Correlation

The intercorrelation of the independent regression variables was a controlled parameter for studies involving multiple regression models. An algorithm for construction of variables of a given correlation is described below.

Let the unknown random vector be $X = [X_1, X_2, \ldots, X_p]$, written as a column matrix. Then the desired column vector of mean values is

$$\mu = E(X) = \begin{bmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_p) \end{bmatrix}$$

and the desired intercorrelation matrix is

$$C = \begin{bmatrix} 1 & c_{12} & \cdots & c_{1p} \\ c_{21} & 1 & \cdots & c_{2p} \\ \vdots & & \ddots & \vdots \\ c_{p1} & c_{p2} & \cdots & 1 \end{bmatrix}.$$

A covariance matrix, V, may be obtained from the correlation matrix and the desired standard deviations of each independent variable by way of the relationship:

$$c_{ij} = \frac{v_{ij}}{\sigma_i \sigma_j} \quad \text{or} \quad v_{ij} = c_{ij}\,\sigma_i \sigma_j.$$

A random standard normal deviate generator is used to produce observations of a p-dimensional normal variable Y. Let C be the resultant covariance matrix of Y, and m the mean vector (for the remainder of this section C denotes this covariance matrix of Y rather than the desired intercorrelation matrix). That is,

$$Y \sim N_p(m, C).$$

It is a property of multivariate normal distributions that if A is any p x p non-singular matrix, then

$$Z = AY \sim N_p(Am, ACA^T).$$

Without immediately determining A, assume that

$$ACA^T = I_p,$$

where $I_p$ is a p x p identity matrix.

If $V^{1/2}$ is the square root of the desired covariance matrix, then by the above property,

$$(V^{1/2})^T Z = X' \sim N\left((V^{1/2})^T A m,\; (V^{1/2})^T I\, V^{1/2}\right) = N\left((V^{1/2})^T A m,\; V\right).$$

The user specified mean vector is obtained by subtracting the mean vector m from each observation of Y and adding the user specified mean vector $\mu$.

We have not yet determined the matrix A such that

$$ACA^T = I.$$

Premultiplying both sides by $A^T$, we have

$$A^TACA^T = A^TI = A^T.$$

This implies that

$$(A^TA)^{-1}A^T = CA^T$$

and that

$$(A^TA)^{-1} = C.$$
Therefore

$$A^TA = C^{-1} \quad \text{and} \quad A = (C^{-1})^{1/2}.$$

Hence to determine the desired random vector X, we need only compute the two matrices $(C^{-1})^{1/2}$ and $(V^{1/2})^T$ and apply them to the starting random variable Y:

$$X = (V^{1/2})^T (C^{-1})^{1/2} (Y - m) + \mu \quad \text{is } N(\mu, V).$$

Much of this algorithm is incorporated into the computer program *NORMAL written by J. Halm of The University of British Columbia Computing Centre.

It should be pointed out that 'correlation' is a property of random variables. In this study we use 'correlation' in association with fixed predictor vectors for want of a better term.

2.3 A Single Variable Model: Study 1

In the first study, the straight line model used by Turnbull (1968) was simulated,

$$Y = \beta_0 + \beta_1 x + \varepsilon_2,$$

where $\beta_0 = 10$, $x = X + \varepsilon_1$, and

$$\varepsilon_1 \sim N(0, \sigma_1^2), \quad \varepsilon_2 \sim N(0, \sigma_2^2).$$

Observations were generated for 9 levels of X (X = 6, 7, ..., 13, 14) and the standard deviation of the error of Y, $\sigma_2$, was held constant over the study, $\sigma_2 = 1$. The slope, $\beta_1$, and the standard deviation of error of X, $\sigma_1$, were varied over the study in a factorial arrangement:

$\beta_1$ = -.25, 0, .25, .5, .75
$\sigma_1$ = 0, 1, 2, 3, 4

The results of the 25 experiments in Study 1 are given in Tables 1-5. The tabled variables are

B1, the least squares estimate of $\beta_1$,
S(B1), the standard deviation of B1,
T(B1), the statistic $(B1 - \beta_1)/S(B1)$,
ST(B1), the standard deviation of the above statistic T(B1),
CHI, the statistic $(n-2)\,S_{y \cdot x}^2 / \sigma_2^2$ or $SSRES / \sigma_2^2$,
V(CHI), the variance of the statistic CHI.

The table entries are the means of 1000 observations.

TABLE 1
SIMULATION RESULTS FOR A SINGLE VARIABLE MODEL

MODEL: Y = 10. - 0.25(X+E1) + E2
LEVELS OF X = 9    REPETITIONS = 1000

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1          -0.25    -0.23    -0.17    -0.12    -0.09
  S(B1)        0.13     0.12     0.12     0.10     0.09
  T(B1)       -0.02     0.22     0.77     1.43     2.10
  ST(B1)       1.21     1.17     1.19     1.27     1.40
  CHI          7.04     7.44     8.26     8.96     9.41
  V(CHI)      13.75    14.97    18.75    22.52    25.03

TABLE 2
SIMULATION RESULTS FOR A SINGLE VARIABLE MODEL

MODEL: Y = 10. + 0.0(X+E1) + E2
LEVELS OF X = 9    REPETITIONS = 1000

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1          -0.00    -0.00    -0.00    -0.00     0.00
  S(B1)        0.13     0.12     0.11     0.09     0.08
  T(B1)       -0.02    -0.00     0.01     0.01     0.00
  ST(B1)       1.21     1.19     1.17     1.16     1.16
  CHI          7.04     7.05     7.07     7.09     7.10
  V(CHI)      13.75    13.88    14.01    14.26    14.54

TABLE 3
SIMULATION RESULTS FOR A SINGLE VARIABLE MODEL

MODEL: Y = 10. + 0.25(X+E1) + E2
LEVELS OF X = 9    REPETITIONS = 1000

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1           0.25     0.23     0.17     0.12     0.09
  S(B1)        0.13     0.12     0.12     0.10     0.09
  T(B1)       -0.02    -0.23    -0.78    -1.45    -2.13
  ST(B1)       1.21     1.23     1.24     1.31     1.43
  CHI          7.04     7.44     8.27     8.98     9.45
  V(CHI)      13.75    16.09    20.04    23.59    26.09

TABLE 4
SIMULATION RESULTS FOR A SINGLE VARIABLE MODEL

MODEL: Y = 10. + 0.50(X+E1) + E2
LEVELS OF X = 9    REPETITIONS = 1000

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1           0.50     0.45     0.35     0.24     0.17
  S(B1)        0.13     0.13     0.14     0.13     0.12
  T(B1)       -0.02    -0.43    -1.31    -2.25    -3.18
  ST(B1)       1.21     1.27     1.32     1.46     1.67
  CHI          7.04     8.60    11.86    14.66    16.47
  V(CHI)      13.75    22.06    40.9?    58.28    68.61

TABLE 5
SIMULATION RESULTS FOR A SINGLE VARIABLE MODEL

MODEL: Y = 10. + 0.75(X+E1) + E2
LEVELS OF X = 9    REPETITIONS = 1000

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1           0.7?     0.63     0.52     0.37     0.26
  S(B1)        0.13     0.15     0.17     0.17     0.16
  T(B1)       -0.02    -0.58    -1.59    -2.60    -3.58
  ST(B1)       1.21     1.28     1.33     1.47     1.67
  CHI          7.04    10.55    17.85    24.10    28.14
  V(CHI)      13.75    33.29    89.42   141.79   167.96

It can be seen from Tables 1-5 that there is a trend to underestimation of $|\beta_1|$ by B1 as $\sigma_1$ increases. Also, this bias is proportional to the value of the parameter $\beta_1$. If we define bias as the absolute distance d, then for the family of models simulated,

$$d = .66\,\beta_1.$$

The t statistic behaves as would be expected from the trend of B1. However, the variance of T(B1) demonstrates departure from Student's t distribution. The expected variance of a random variable distributed as Student's t with $\nu = n-2$ degrees of freedom is

$$\frac{\nu}{\nu - 2} = \frac{n-2}{n-4} = 1.40 \quad (\text{when } n = 9).$$

When $\sigma_1 = 0$, the experimental value is close to its expectation ($ST(B1)^2 = (1.21)^2 = 1.46$) but increases as $\sigma_1$ increases. The parameter value of $\beta_1$ also appears to have a proportional effect on the standard deviation of the t-statistic, at least for $\beta_1$ = 0, .25, .5. Figure 1 shows that the trend of ST(B1) for $\beta_1 = .75$ is almost identical to the trend when $\beta_1 = .5$.

The statistic CHI, the ratio of the residual sum of squares to $\sigma_2^2$, should have a chi square distribution with mean n-2 and variance 2(n-2). When $\sigma_1 \neq 0$, the observed values for CHI and its variance V(CHI) are greater than their expectations. The bias for both statistics is directly proportional to the absolute value of the estimate.

[Figure 1. The Trend of the Standard Deviation of $t = (B - \beta)/S(B)$ with $\beta$ and $\sigma_1$. Horizontal axis: SIGMA 1; curves for $\beta$ = 0.75, 0.50, 0.25, -.25, 0.00.]

We will return to discussion of these effects after consideration of a multiple regression model.

2.4 A Two-Variable Model: Study 2

In the second study, a two variable linear model was simulated,

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon_2,$$

where

$$\beta = \begin{bmatrix} 10 \\ .5 \\ .5 \end{bmatrix}$$

and

$$[\varepsilon_{11}, \varepsilon_{12}] \sim N_2(0, \sigma_1^2 I_2), \quad \varepsilon_2 \sim N(0, \sigma_2^2), \quad \sigma_2 = 1,$$

for 10 levels of $X = [X_1\; X_2]$.
The covariance matrix, V, of $[\varepsilon_{11}, \varepsilon_{12}]$ was varied over the study,

$$V = k I_2, \quad k = 0, 1, 2, 3, 4,$$

in a factorial arrangement with the correlation of the predictor vectors of X,

$$\rho(X_1, X_2) = j(.1), \quad j = 0, 1, \ldots, 9.$$

The regression coefficients $\beta$, and $\sigma_2$, were held constant over the study.

The results of the 50 experiments of Study 2 are given in Tables 6-15. Tables 6-15 are of the same form as those of Study 1 except that data are present for two regression coefficients.

TABLE 6
SIMULATION RESULTS FOR A TWO VARIABLE MODEL

MODEL: Y = 10. + .5(X1+E11) + .5(X2+E12) + E2
LEVELS OF X = 10    REPETITIONS = 1000
CORRELATION X1,X2 = 0.0

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1           0.50     0.45     0.33     0.22     0.14
  S(B1)        0.13     0.15     0.17     0.16     0.14
  T(B1)       -0.03    -0.41    -1.19    -2.03    -2.87
  ST(B1)       1.12     1.14     1.19     1.31     1.50
  B2           0.50     0.45     0.33     0.22     0.15
  S(B2)        0.13     0.15     0.16     0.16     0.14
  T(B2)       -0.05    -0.46    -1.23    -2.07    -2.89
  ST(B2)       1.23     1.24     1.29     1.44     1.63
  CHI          7.07    10.14    16.17    20.71    23.28
  V(CHI)      14.85    29.75    67.82    98.25   112.23

TABLE 7
SIMULATION RESULTS FOR A TWO VARIABLE MODEL

MODEL: Y = 10. + .5(X1+E11) + .5(X2+E12) + E2
LEVELS OF X = 10    REPETITIONS = 1000
CORRELATION X1,X2 = 0.1

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1           0.50     0.46     0.34     0.23     0.15
  S(B1)        0.13     0.15     0.17     0.16     0.14
  T(B1)       -0.02    -0.37    -1.10    -1.91    -2.71
  ST(B1)       1.13     1.13     1.18     1.29     1.46
  B2           0.50     0.45     0.34     0.23     0.16
  S(B2)        0.13     0.15     0.17     0.16     0.14
  T(B2)       -0.05    -0.42    -1.15    -1.94    -2.74
  ST(B2)       1.23     1.24     1.29     1.42     1.59
  CHI          7.07    10.18    16.49    21.47    24.42
  V(CHI)      14.85    29.92    70.47   105.59   122.82
TABLE 8
SIMULATION RESULTS FOR A TWO VARIABLE MODEL

MODEL: Y = 10. + .5(X1+E11) + .5(X2+E12) + E2
LEVELS OF X = 10    REPETITIONS = 1000
CORRELATION X1,X2 = 0.2

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1           0.50     0.46     0.35     0.24     0.16
  S(B1)        0.13     0.15     0.17     0.16     0.15
  T(B1)       -0.02    -0.34    -1.03    -1.80    -2.58
  ST(B1)       1.15     1.14     1.18     1.28     1.44
  B2           0.50     0.45     0.35     0.24     0.17
  S(B2)        0.13     0.15     0.17     0.16     0.15
  T(B2)       -0.05    -0.40    -1.08    -1.84    -2.60
  ST(B2)       1.25     1.25     1.29     1.40     1.56
  CHI          7.07    10.21    16.76    22.17    25.50
  V(CHI)      14.85    30.08    72.92   112.66   133.46

TABLE 9
SIMULATION RESULTS FOR A TWO VARIABLE MODEL

MODEL: Y = 10. + .5(X1+E11) + .5(X2+E12) + E2
LEVELS OF X = 10    REPETITIONS = 1000
CORRELATION X1,X2 = 0.3

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1           0.50     0.46     0.36     0.25     0.17
  S(B1)        0.13     0.15     0.17     0.17     0.15
  T(B1)       -0.01    -0.32    -0.97    -1.71    -2.46
  ST(B1)       1.18     1.16     1.19     1.28     1.42
  B2           0.50     0.46     0.36     0.25     0.18
  S(B2)        0.13     0.15     0.17     0.16     0.15
  T(B2)       -0.05    -0.37    -1.02    -1.74    -2.48
  ST(B2)       1.29     1.27     1.30     1.40     1.53
  CHI          7.07    10.24    17.01    22.81    26.52
  V(CHI)      14.85    30.21    75.23   119.46   144.12

TABLE 10
SIMULATION RESULTS FOR A TWO VARIABLE MODEL

MODEL: Y = 10. + .5(X1+E11) + .5(X2+E12) + E2
LEVELS OF X = 10    REPETITIONS = 1000
CORRELATION X1,X2 = 0.4

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1           0.50     0.47     0.37     0.26     0.18
  S(B1)        0.13     0.15     0.17     0.17     0.15
  T(B1)       -0.01    -0.30    -0.91    -1.62    -2.35
  ST(B1)       1.24     1.20     1.21     1.28     1.41
  B2           0.50     0.46     0.37     0.26     0.19
  S(B2)        0.13     0.15     0.17     0.17     0.15
  T(B2)       -0.05    -0.35    -0.96    -1.66    -2.37
  ST(B2)       1.34     1.31     1.32     1.40     1.51
  CHI          7.07    10.27    17.23    23.39    27.48
  V(CHI)      14.85    30.34    77.43   126.01   154.78

TABLE 11
SIMULATION RESULTS FOR A TWO VARIABLE MODEL

MODEL: Y = 10. + .5(X1+E11) + .5(X2+E12) + E2
LEVELS OF X = 10    REPETITIONS = 1000
CORRELATION X1,X2 = 0.5

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1           0.50     0.47     0.37     0.27     0.19
  S(B1)        0.13     0.15     0.17     0.17     0.15
  T(B1)       -0.00    -0.29    -0.87    -1.56    -2.26
  ST(B1)       1.28     1.22     1.23     1.30     1.41
  B2           0.50     0.46     0.37     0.27     0.19
  S(B2)        0.13     0.15     0.17     0.17     0.15
  T(B2)       -0.05    -0.34    -0.92    -1.59    -2.28
  ST(B2)       1.37     1.32     1.33     1.41     1.51
  CHI          7.07    10.30    17.43    23.97    28.42
  V(CHI)      14.81    30.35    79.30   132.60   165.90

TABLE 12
SIMULATION RESULTS FOR A TWO VARIABLE MODEL

MODEL: Y = 10. + .5(X1+E11) + .5(X2+E12) + E2
LEVELS OF X = 10    REPETITIONS = 1000
CORRELATION X1,X2 = 0.6

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1           0.50     0.47     0.38     0.28     0.20
  S(B1)        0.13     0.15     0.17     0.17     0.16
  T(B1)        0.01    -0.26    -0.82    -1.48    -2.16
  ST(B1)       1.45     1.33     1.29     1.33     1.40
  B2           0.50     0.47     0.38     0.28     0.20
  S(B2)        0.13     0.15     0.17     0.17     0.16
  T(B2)       -0.06    -0.32    -0.87    -1.51    -2.18
  ST(B2)       1.53     1.43     1.39     1.43     1.18
  CHI          7.07    10.33    17.59    24.43    29.25
  V(CHI)      14.85    30.61    81.55   138.36   176.02

TABLE 13
SIMULATION RESULTS FOR A TWO VARIABLE MODEL

MODEL: Y = 10. + .5(X1+E11) + .5(X2+E12) + E2
LEVELS OF X = 10    REPETITIONS = 1000
CORRELATION X1,X2 = 0.7

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1           0.50     0.47     0.38     0.29     0.21
  S(B1)        0.13     0.15     0.18     0.17     0.16
  T(B1)        0.02    -0.25    -0.78    -1.41    -2.09
  ST(B1)       1.64     1.45     1.36     1.36     1.39
  B2           0.50     0.47     0.38     0.29     0.21
  S(B2)        0.13     0.15     0.17     0.17     0.16
  T(B2)       -0.07    -0.31    -0.83    -1.45    -2.10
  ST(B2)       1.72     1.55     1.46     1.45     1.48
  CHI          7.07    10.36    17.74    24.89    30.07
  V(CHI)      14.85    30.76    83.38   144.14   186.60

TABLE 14
SIMULATION RESULTS FOR A TWO VARIABLE MODEL

MODEL: Y = 10. + .5(X1+E11) + .5(X2+E12) + E2
LEVELS OF X = 10    REPETITIONS = 1000
CORRELATION X1,X2 = 0.8

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1           0.50     0.47     0.39     0.29     0.21
  S(B1)        0.13     0.15     0.18     0.17     0.16
  T(B1)        0.04    -0.23    -0.75    -1.36    -2.02
  ST(B1)       1.98     1.65     1.47     1.38     1.40
  B2           0.49     0.47     0.39     0.29     0.22
  S(B2)        0.13     0.15     0.17     0.17     0.16
  T(B2)       -0.08    -0.30    -0.80    -1.39    -2.02
  ST(B2)       2.05     1.73     1.55     1.46     1.48
  CHI          7.07    10.39    17.87    25.31    30.85
  V(CHI)      14.85    30.97    84.98   149.64   197.20

TABLE 15
SIMULATION RESULTS FOR A TWO VARIABLE MODEL

MODEL: Y = 10. + .5(X1+E11) + .5(X2+E12) + E2
LEVELS OF X = 10    REPETITIONS = 1000
CORRELATION X1,X2 = 0.9

DESIGN
  SIGMA2          1        1        1        1        1
  SIGMA1          0        1        2        3        4
STATISTICS
  B1           0.51     0.48     0.39     0.30     0.22
  S(B1)        0.13     0.15     0.18     0.18     0.16
  T(B1)        0.07    -0.21    -0.72    -1.32    -1.95
  ST(B1)       2.77     2.02     1.58     1.41     1.42
  B2           0.49     0.47     0.39     0.30     0.22
  S(B2)        0.13     0.15     0.18     0.18     0.16
  T(B2)       -0.11    -0.29    -0.76    -1.32    -1.94
  ST(B2)       2.82     2.07     1.65     1.48     1.50
  CHI          7.07    10.41    17.96    25.69    31.60
  V(CHI)      14.85    31.24    86.20   155.00   207.67

From Tables 6-15, we see that the trends due to $\sigma_1^2$ which were noted for the single variable model are generally unchanged by the addition of another variable. However, detailed comparison of the two cases does not seem justified as the method of selection of the independent variables is different. For the single variable case the x variables are set at x = (6, 7, 8, ..., 13, 14), while for the 2 variable case the x variables were generated randomly with a mean of 10 and a standard deviation of 2. Consequently, the results for the 2 variable case are stated below without comparison to the single variable case.

The regression coefficient B1 (or B2) underestimates $\beta_1$ (or $\beta_2$) as $\sigma_1$ increases.

The statistic $(B_i - \beta_i)/S(B_i)$ behaves as expected considering the effects of $\sigma_1 \neq 0$ on the estimation of $\beta$, increasing in absolute value as $\sigma_1$ increases. The rate of increase of $(B_i - \beta_i)/S(B_i)$ is inversely affected by raising the level of correlation of $X_1$, $X_2$.

The standard deviation of this statistic, ST(B1), increases proportional to $\sigma_1$. Raising the correlation of the two independent variables causes an increase in the value of ST(B1) for $\sigma_1 > 0$. However, the rate at which ST(B1) increases with $\sigma_1$ decreases for larger correlations (Figure 2).

The statistic CHI and its variance V(CHI) both show overestimation with increasing $\sigma_1$ in the 2 variable model.

[Figure 2. The Effect of Non-orthogonal Predictor Variables and $\sigma_1 \neq 0$ on the Standard Deviation of $t = (B - \beta)/S(B)$.]

2.5 Discussion of Simulation Results

The fact that b will be a biased estimate of $\beta$ when the variance of error associated with the independent variables is greater than zero is not well known by many users of regression techniques. An algebraic explanation of this result can be obtained by inserting the error of the independent variable into the normal equations.
Let x be the (n x p) matrix of observed independent variables with an associated random vector of errors $\varepsilon_1$,

$$\varepsilon_1 \sim N_p(0, \sigma_1^2 I_p).$$

The true experimental variable is X such that

$$x = X + \varepsilon_1$$

and

$$x^Tx = (X + \varepsilon_1)^T (X + \varepsilon_1) = (X^TX + X^T\varepsilon_1 + \varepsilon_1^TX + \varepsilon_1^T\varepsilon_1). \tag{6}$$

The normal equations expressed in terms of the independent variables observed with error are

$$(x^Tx)\,b = x^TY. \tag{7}$$

Using equation (6) and solving for b, equation (7) can be written

$$b = (X^TX + X^T\varepsilon_1 + \varepsilon_1^TX + \varepsilon_1^T\varepsilon_1)^{-1} (X^TY + \varepsilon_1^TY). \tag{8}$$

Taking expected values of both sides we have

$$E(b) = E\left[(X^TX + X^T\varepsilon_1 + \varepsilon_1^TX + \varepsilon_1^T\varepsilon_1)^{-1} X^TY\right], \tag{9}$$

as $E(\varepsilon_1^TY) = 0$ by the distributional properties of $\varepsilon_1$.

From (9) we can see that the bias of b will be small if $(X^T\varepsilon_1 + \varepsilon_1^TX + \varepsilon_1^T\varepsilon_1)$ is small relative to $X^TX$. Turnbull (1968), working with a single variable model, observed that the bias in b as an estimate of $\beta$ increases as the ratio of $\sigma_1^2$ to the corrected sum of squares increases. The bias of the estimate of $\beta$ is minimized if the range of the independent variables is maximized.

The practical significance of this result is apparent in the forestry problem of 'even aged' stands mentioned in the introduction. If $\sigma_1$ is one or two years and the range of stand age is large, the bias in the estimate of $\beta$ would tend to be trivial. Similarly, the bias due to $\sigma_1 > 0$ in the estimate of the regression coefficient corresponding to the variable tree height will be small if the range of tree heights is large.

Another point of interest in the simulation results was the bias of the error sum of squares, $e_2^Te_2$ (denoted CHI in Tables 1-15), as an estimate of $(n-p-1)\sigma_2^2$. It was also noted that the sampling distribution of this statistic was not a chi square distribution. An explanation of this result can be obtained through examination of the components of the statistic.
The error sum of squares SSE is computed as

$$SSE = (Y - y)^T (Y - y) = Y^TY - 2y^TY + y^Ty. \tag{10}$$

If we have a classical regression model

$$y = X\beta + \varepsilon_2, \quad \varepsilon_2 \sim N(0, \sigma_2^2),$$

then

$$y = Y + \varepsilon_2$$

and we can replace y in equation (10) with $Y + \varepsilon_2$:

$$SSE = Y^TY - 2(Y + \varepsilon_2)^TY + (Y + \varepsilon_2)^T (Y + \varepsilon_2) = Y^TY - 2Y^TY - 2\varepsilon_2^TY + Y^TY + \varepsilon_2^TY + Y^T\varepsilon_2 + \varepsilon_2^T\varepsilon_2. \tag{11}$$

Gathering $Y^TY$ terms and noting $\varepsilon_2^TY = Y^T\varepsilon_2$,

$$SSE = (Y^TY - 2Y^TY + Y^TY) + (\varepsilon_2^TY + Y^T\varepsilon_2 - 2\varepsilon_2^TY) + \varepsilon_2^T\varepsilon_2 = \varepsilon_2^T\varepsilon_2. \tag{12}$$

However, if our model contains error in the independent variables,

$$y = x\beta + \varepsilon_2, \quad \varepsilon_2 \sim N(0, \sigma_2^2),$$

where

$$x = X + \varepsilon_1, \quad \varepsilon_1 \sim N_p(0, \sigma_1^2 I_p). \tag{13}$$

Substituting equation (13) into (10) gives

$$SSE = Y^TY - 2[(X + \varepsilon_1)\beta + \varepsilon_2]^TY + [(X + \varepsilon_1)\beta + \varepsilon_2]^T [(X + \varepsilon_1)\beta + \varepsilon_2]$$

$$= Y^TY - 2(X\beta)^TY - 2(\varepsilon_1\beta)^TY - 2\varepsilon_2^TY + (X\beta + \varepsilon_1\beta + \varepsilon_2)^T (X\beta + \varepsilon_1\beta + \varepsilon_2).$$

Noting that $X\beta = Y$,

$$SSE = Y^TY - 2Y^TY - 2(\varepsilon_1\beta)^TY - 2\varepsilon_2^TY + \left(Y^TY + Y^T(\varepsilon_1\beta) + Y^T\varepsilon_2 + (\varepsilon_1\beta)^TY + (\varepsilon_1\beta)^T(\varepsilon_1\beta) + (\varepsilon_1\beta)^T\varepsilon_2 + \varepsilon_2^TY + \varepsilon_2^T(\varepsilon_1\beta) + \varepsilon_2^T\varepsilon_2\right).$$

Gathering terms,

$$SSE = \beta^T\varepsilon_1^T\varepsilon_1\beta + 2\varepsilon_2^T\varepsilon_1\beta + \varepsilon_2^T\varepsilon_2. \tag{14}$$

This follows due to the fact that $\varepsilon_2^TY$, $\varepsilon_2^T\varepsilon_1\beta$, and $Y^T\varepsilon_1\beta$ are 1 x 1 matrices, and their transposes must have the same value.

Taking expectations of (14),

$$E(SSE) = E[\beta^T\varepsilon_1^T\varepsilon_1\beta] + 2E[\varepsilon_2^T\varepsilon_1\beta] + E[\varepsilon_2^T\varepsilon_2] = E[\beta^T\varepsilon_1^T\varepsilon_1\beta] + E[\varepsilon_2^T\varepsilon_2] \tag{15}$$

when $\varepsilon_1$ and $\varepsilon_2$ are independent.

From equation (15), the expected value of the error sum of squares when the model has error of independent variables is an expression of three terms. The first term is the chi square distributed estimate of $(n-p-1)\sigma_2^2$. The second term $\beta^T\varepsilon_1^T\varepsilon_1\beta$ biases the SSE as an estimate of $(n-p-1)\sigma_2^2$. Note that the inflation of the error sums of squares is directly proportional to the magnitude of the $\beta$ parameters and the diagonal of the covariance matrix $\sigma_1^2 I_p$ of the random vector $\varepsilon_1$.
A x p X T t h i r d bias term, Ze^e^ further i n f l a t e s the error sums of squares when and are not independent. These relationships are c l e a r l y r e f l e c t e d i n the data of the simulation experiment. The experiments of both studies (1) and (2) demonstrate the effect of increasing (CT-j^p) o n t n e s t a t i s t i c l a b e l l e d CHI, the sums of squares error, and the experiment of Study 2 show the effect of varying the size of the 3^ parameters. A p r a c t i c a l value of the simulation r e s u l t s i s that they demonstrate the rapid increase i n over-estimation of the true error sum of squares of Y as a^I increases. The observations that the d i s t r i -bution of the s t a t i s t i c CHI no longer follows that of the chi square i s also explained by equation (15). The s t a t i s t i c i s no longer the sum of the squares of a standard normally distributed random variable. The reason for the departure of the s t a t i s t i c B l - 3 X S(3 1) from students' t d i s t r i b u t i o n can be seen from the above conclusions. The t d i s t r i b u t i o n i s a r a t i o of random variables, the denominator deriving from the chi squared estimate of SSE. 37 It has been demonstrated that when the predictor vectors are not orthogonal and 3^ > 0, the error sum of squares i s inflated pro-portional to both and the correlation of X^, X^. The correlation effects of Study 2 are as interesting as the X-error effects but are presented here without discussion. A precise mathematical explanation of the bias conditions associated with non-orthogonal prediction vectors i s given by Hoerl and Kennard (1970). From the standpoint of application i t i s important to point out that regression with non-orthogonal predictor vectors is far from unusual. In forestry, the linear form of the volume equation discussed earlier, log V = log a + b log D + c log H, contains two correlated independent variables. 
Presumably, a least squares fit would result in overestimation of the variance, $\sigma_2^2$.

CHAPTER THREE

REGRESSION PROCEDURES WHEN $\sigma_1^2 \neq 0$, AND PREDICTOR VECTORS ARE NOT ORTHOGONAL

Kendall and Stuart (1961) discuss the problem of "both variables subject to error." The two variables X and Y are assumed linearly related,

$$Y = \beta_0 + \beta_1 X,$$

but X and Y cannot be observed. However, $\delta$ and $\gamma$ can be observed, where

$$\delta = X + \varepsilon_x, \qquad \gamma = Y + \varepsilon_y,$$

$\varepsilon_x$, $\varepsilon_y$ being errors associated with the x and y vectors respectively.

Bartlett (1949) describes a graphical method for fitting straight lines when both X and Y are subject to error. One point on the line is $(\bar{X}, \bar{Y})$, the means of the X and Y vectors. The n data points are then divided into 3 groups such that the extreme groups contain as near $n/3$ points as possible. The means of the extreme groups, $(\bar{X}_1, \bar{Y}_1)$ and $(\bar{X}_3, \bar{Y}_3)$, are used with the original point to determine the line.

Carlson, Sobel and Watson (1966) describe a technique for fitting multiple regression models with errors of both dependent and independent variables by means of "instrumental variables." Instrumental variables $Z_1, Z_2, \ldots, Z_n$ are variables which could be used to predict X but are uncorrelated with the errors of X and Y. The linear model becomes

$$Y = \beta_0 + \beta_1 x + \varepsilon_y \tag{16}$$

where

$$x = \gamma_0 + \gamma_1 z_1 + \gamma_2 z_2 + \cdots + \gamma_n z_n + \varepsilon_2. \tag{17}$$

The authors solve the system directly by a "full information maximum likelihood" method, an iterative process involving estimation of the error covariance matrix. Results obtained by applying least squares methods to (17) and then to (16) underestimate the values found by full information maximum likelihood. The authors suggest the following compromise: solve (17) by least squares and substitute in (16). Then estimate a, b such that $\sum (y - bx - a)^2$ is a minimum.
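The compromise procedure just described (fit (17) by least squares, substitute the fitted x into (16), then fit (16) by least squares) can be sketched as follows. This is an illustration under simplifying assumptions, not the code of Carlson, Sobel and Watson: it uses a single instrumental variable z, and the function names `simple_ols` and `two_stage_fit` are hypothetical.

```python
def simple_ols(x, y):
    """Return (a, b) minimising sum((y_i - a - b * x_i) ** 2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def two_stage_fit(z, x_obs, y_obs):
    """Fit equation (17) by least squares, then equation (16) on the fitted x."""
    g0, g1 = simple_ols(z, x_obs)          # stage 1: x on the instrument z
    x_hat = [g0 + g1 * zi for zi in z]     # fitted x, free of the x-error
    return simple_ols(x_hat, y_obs)        # stage 2: y on fitted x


# Small constructed example: the x-errors are chosen to be exactly
# uncorrelated with z, so stage 1 recovers the error-free x.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x_obs = [2.0, 1.0, 6.0, 7.0, 9.0, 11.0]    # error-free x is 1 + 2 * z
y_obs = [3.5, 4.5, 5.5, 6.5, 7.5, 8.5]     # 3 + 0.5 * (error-free x)
a, b = two_stage_fit(z, x_obs, y_obs)
naive_a, naive_b = simple_ols(x_obs, y_obs)
```

In this constructed data set the two-stage slope `b` recovers the value 0.5 used to build `y_obs`, while the naive least squares slope `naive_b` of y on the error-laden x is smaller, illustrating the downward bias discussed in the simulation chapters.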
Acton (1959) presents a maximum likelihood estimate of the regression coefficient of a straight line when X and Y are in error. The maximum likelihood expression for $\beta$ involves the standard deviations of the errors of X and Y, $\sigma_{\varepsilon_x}$ and $\sigma_{\varepsilon_y}$ respectively, and the correlation of the errors, $\rho(\varepsilon_x, \varepsilon_y)$.

A likelihood function for the multiple variable case is discussed by Clutton-Brock (1967). An iterative fitting process is also demonstrated.

Techniques for fitting linear models when the independent variables are correlated are not well developed. Kendall (1957) suggests applying principal component analysis to the independent variables and then using the orthogonal variables specified by the first few principal components as predictor vectors. However, Cox (1968) criticizes this technique, arguing that there is no logical reason why the dependent variable should not be closely tied to the least important principal component.

Hoerl and Kennard (1970) propose an estimation procedure, termed Ridge Regression, for the situation where predictor variables are non-orthogonal. Estimation is based on

$$(X^T X + k I_p), \qquad k > 0,$$

where $X^T X$ is in the form of a correlation matrix. Addition of the small positive quantity k to each diagonal element causes the system

$$(X^T X + k I) B = X^T Y$$

to act more like an orthogonal system. A procedure for improving estimation of $\beta$ is given.

This summary of regression procedures when $\sigma_1^2 \neq 0$ or $C \neq I$ shows that the literature is not well developed on this subject. Practical algorithms for fitting linear models under these non-standard conditions do not yet exist.
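For two predictors, the ridge system $(X^T X + k I) B = X^T Y$ can be solved in closed form. The sketch below is only an illustration of that system, not the procedure of Hoerl and Kennard for choosing k: it assumes the variables are standardized so that $X^T X$ is the $2 \times 2$ correlation matrix with off-diagonal element `r12` and $X^T Y$ holds the correlations `r1y`, `r2y` of each predictor with Y. The function name and argument names are hypothetical.

```python
def ridge_two_predictors(r12, r1y, r2y, k):
    """Solve (X^T X + k I) B = X^T Y for two standardized predictors.

    X^T X is taken to be [[1, r12], [r12, 1]] and X^T Y to be [r1y, r2y].
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    diag = 1.0 + k                      # diagonal element after adding k
    det = diag * diag - r12 * r12       # determinant of the 2 x 2 system
    if det == 0.0:
        raise ValueError("system is singular; use k > 0")
    b1 = (diag * r1y - r12 * r2y) / det
    b2 = (diag * r2y - r12 * r1y) / det
    return b1, b2


# Predictors correlated at 0.9, with correlations to Y implied by B = (0.5, 0.5).
ols = ridge_two_predictors(0.9, 0.95, 0.95, k=0.0)
ridge = ridge_two_predictors(0.9, 0.95, 0.95, k=0.1)
```

With `k = 0` the function reduces to ordinary least squares on the correlation matrix. As k grows, the determinant `det` moves away from zero, which is the sense in which the system behaves more like an orthogonal one, at the cost of shrinking the coefficients.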
CONCLUSION

In this paper we examine the ramifications of using least squares regression methods when the assumption that independent variables are known without error is violated and the predictor vectors are non-orthogonal.

Violation of the assumption that X variables are known without error affects the estimate of the vector of regression coefficients, $\beta$. As the variance of the error of the independent variables, $\sigma_1^2$, increases, there is a distinct downward trend in $|B|$, the estimate of $|\beta|$. For a two variable model with $\beta_1 = \beta_2 = .5$, $B_1$ decreased from $B_1 = .5$ to $B_1 = .14$ as $\sigma_1^2$ increased from $\sigma_1^2 = 0$ to $\sigma_1^2 = 4$. Turnbull (1968) demonstrates that the relative bias of the estimate of $\beta$ is proportional to the sum of squares of the corresponding predictor variable.

The ratio of the residual sum of squares to the variance of the error of the y observations does not display the elementary characteristics of a chi square statistic when $\sigma_1^2 > 0$. From the expression for the expected sum of squares error when $\sigma_1^2 > 0$,

$$E[\mathrm{SSE}] = \beta^T E[\varepsilon_1^T \varepsilon_1] \beta + E[\varepsilon_2^T \varepsilon_2],$$

the bias term is a product of the sums of squares of the errors of the independent variables and their corresponding $\beta$ terms. The t statistic is also affected by the inflation of SSE.

A further small deviation from the expected sampling distribution of these statistics (for $\sigma_1^2 > 0$) with increasing correlation is due to lack of precision in computing $B^T E[\varepsilon_1^T \varepsilon_1] B$ as the system becomes less orthogonal.

From the above conclusions, two generalizations about models with more than two variables can be made:

a) Overestimation of the true error sums of squares of y is proportional to the error of independent variables. Increasing the number of variables will increase the bias of this estimate.
b) Addition of non-orthogonal predictor vectors will increase the number of non-zero off-diagonal elements of the correlation matrix. The inflation of the error sum of squares results from this lack of orthogonality and would presumably increase.

Further simulation studies of models with more than 2 non-orthogonal predictor vectors might demonstrate the practical upper limits of correlation of predictor variables for given computer precision. In general, the effects of correlated independent variables deserve further study through both simulation and algebraic analysis.

The literature does not present a practical algorithm for fitting linear models under these non-standard conditions. The most fruitful approach to the problem of error of independent variables appears to be maximum likelihood estimates of B. The effects of non-orthogonal predictor variables are less generally appreciated. Transformations to an orthogonal subset of predictor vectors by principal components methods, and the recent technique of ridge regression, are possible solutions.

LITERATURE CITED

Acton, F. S. 1959. Analysis of Straight Line Data. Wiley, New York.

Bartlett, M. S. 1949. Fitting a straight line when both variables are subject to error. Biometrics 5:207-212.

Carlson, F. D., E. Sobel and G. S. Watson. 1966. Linear relationships between variables affected by errors. Biometrics 22:252-267.

Clutton-Brock, M. 1967. Likelihood distributions for estimating functions when both variables are subject to error. Technometrics 9:261-269.

Cox, D. R. 1968. Notes on some aspects of regression analysis. J. R. Statist. Soc. A 131:265-279.

Hoerl, A. E. and R. W. Kennard. 1970. Ridge regression: biased estimation for non-orthogonal problems. Technometrics 12:55-67.

Kendall, M. G. 1951. Regression, structure, and functional relationship. Biometrics 7:11-25.

Kendall, M. G. 1957. A Course in Multivariate Analysis. Griffin, London.

Kendall, M. G. and A. Stuart. 1961. The Advanced Theory of Statistics. Vol. 2. Hafner, New York.

Turnbull, K. J. 1968. Monte Carlo studies of several regression models. Presented at the meeting of the Society of American Foresters, Division of Forest Mensuration, Philadelphia, Oct. 2, 1968.

Wald, A. 1940. The fitting of straight lines if both variables are subject to error. Ann. Math. Statist. 11:284.
Item Metadata
Title | Bias in least squares regression. |
Creator | Williams, Douglas Harold |
Publisher | University of British Columbia |
Date Issued | 1972 |
Description | Much of the data analysed by least squares regression methods violates the assumption that independent variables are known without error. Also, it has been demonstrated that parameter estimates based on minimum residual sums of squares have a high probability of being unsatisfactory if the independent variables are not orthogonal. Both situations are examined jointly by Monte Carlo simulation and bias in least squares estimate of regression coefficients and error sums of squares is demonstrated. Techniques for regression under these conditions are reviewed but the literature does not present a practical algorithm in either case. |
Subject | Least squares. |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2011-03-31 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0075416 |
URI | http://hdl.handle.net/2429/33131 |
Degree | Master of Science - MSc |
Program | Forestry |
Affiliation | Forestry, Faculty of |
Degree Grantor | University of British Columbia |
Campus | UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |