ADDITIVITY OF COMPONENT REGRESSION EQUATIONS WHEN THE UNDERLYING MODEL IS LINEAR

by

SIMEON SANDARAMU CHIYENDA

B.S.F., The University of British Columbia, 1974
M.S., Iowa State University, 1979

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
Department of Forestry

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
March 1983
© Simeon Sandaramu Chiyenda, 1983

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Forestry
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

ABSTRACT

This thesis is concerned with the theory of fitting models of the form $y = X\beta + \epsilon$, where some distributional assumptions are made on $\epsilon$. More specifically, suppose that $y_j = X\beta_j + \epsilon_j$ is a model for a component j (j = 1, 2, ..., k) and that one is interested in estimation and inference theory relating to $y_T = \sum_{j=1}^{k} y_j = X\beta_T + \epsilon_T$. The theory of estimation and inference relating to the fitting of $y_T$ is considered within the general framework of general linear model theory. The consequence of independence and dependence of the $y_j$ (j = 1, 2, ..., k) for estimation and inference is investigated. It is shown that under the assumption of independence of the $y_j$, the parameter vector of the total equation can easily be obtained by adding corresponding components of the estimates for the parameters of the component models.
Under dependence, however, this additivity property seems to break down. Inference theory under dependence is much less tractable than under independence and depends critically, of course, upon whether $y_j$ is normal or not. Finally, the theory of additivity is extended to classificatory models encountered in designed experiments. It is shown, however, that additivity does not hold in general in nonlinear models. The problem of additivity does not require new computing subroutines for estimation and inference in general in those cases where it works.

TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENT
DEDICATION

Chapter
I INTRODUCTION
II PRELIMINARIES, NOTATION AND PROBLEM DEFINITION
    2.1 Preliminaries and Notation
    2.2 Problem Definition
III LITERATURE REVIEW
IV ADDITIVITY IN THE CASE $r_j = p$
    4.1 Inferences for Total Model when $\epsilon_j \sim N(\phi, I\sigma_j^2)$
    4.2 Inferences for Total Model when $\epsilon_j \sim N(\phi, V\sigma_j^2)$
V ADDITIVITY WHEN $r_j \le p$ WITH $r_j < p$ FOR SOME j
    5.1 Estimation when $\epsilon_j \sim N(\phi, I\sigma_j^2)$
    5.2 Inference when $\epsilon_j \sim N(\phi, I\sigma_j^2)$
    5.3 Estimation and Inference when $\epsilon_j \sim N(\phi, V\sigma_j^2)$
VI A GENERALIZATION OF THE ADDITIVITY PROBLEM
VII OTHER ASPECTS OF THE ADDITIVITY PROBLEM
    7.1 The Case $y_j$, $y_\ell$ ($j \ne \ell$) Dependent
        7.1.1 Estimation for Total Model under Dependence
        7.1.2 Inference for Total Model when $y_1, y_2, \ldots, y_k$ are JMVN
        7.1.3 Inference for Total Model when $y_1, y_2, \ldots, y_k$ are not JMVN
    7.2 The Case of X Random or Measured with Error
        7.2.1 The Case of X Random
        7.2.2 The Case when X is Measured with Error
    7.3 Other General Complements
VIII SOME EXTENSIONS OF THE THEORY
    8.1 Extension to Classificatory Models
    8.2 Extension to Nonlinear Models
IX COMPUTATIONAL CONSIDERATIONS
X SOME ILLUSTRATIVE EXAMPLES
    10.1 Assessing Multivariate Normality of $y_j$
    10.2 Example 1
    10.3 Example 2
    10.4 Example 3
XI CONCLUSIONS AND REMARKS
REFERENCES
APPENDIX I
APPENDIX II
APPENDIX III

LIST OF TABLES

Table 1 Results of Koziol's (1982) test for multivariate normality on three data sets
Table 2 Results of Koziol's test for bivariate normality
Table 3 Comparison of predicted values and residuals of unrestricted total equation with those of total conditioned equation

LIST OF FIGURES

Figure 1 Scatter of Residuals from Total Unrestricted Equation and from Total Conditioned Equation

ACKNOWLEDGEMENT

I would like to express my indebtedness to Dr. Antal Kozak, my supervisor, for his advice, encouragement, and for a degree of understanding of a rare order. If it were not for his encouragement and unstinting support, this thesis would hardly have reached this stage. My most sincere gratitude is due to other members of my research committee, namely Drs. Michael Schulzer of the Department of Mathematics, J. H. G. Smith, D. D. Munro, and D. H. Williams, all of the Faculty of Forestry, for providing much-needed guidance and constructive criticism at various stages of this study. I am especially grateful to Dr. Schulzer who gave freely of his time to discuss various aspects of my research.

I would like to extend my appreciation to Dr. Stanley Nash, now Professor Emeritus in the Department of Mathematics at the University of British Columbia, for kindly reviewing an earlier draft of this thesis. My two-year association with Dr. James Zidek of the Department of Mathematics at this university, prior to his sabbatical leave at Imperial College (London), was a truly rewarding experience and I would like to express my admiration for a person so selfless and so endowed with academic vision.

I would like to record my deepest gratitude to the Faculty of Forestry for financial support during the period of my studies and to the Food and Agriculture Organization for a generous fellowship. My gratitude also goes to Professor Lewis K. Mughogho for always having confidence in me and to Professors Chimphamba and Edje for supporting me whenever I needed support.

Finally, I express much gratitude and love to my wife Jane and daughters Lucy and Margaret for putting up with a husband and daddy who was hardly home during the preparation of this thesis. I am indebted to Mrs. Nina Thurston for a typing job truly well done. There are many others that I should mention, including some very special people. To these and others, I would like to say "I have not forgotten about you."
DEDICATION

I dedicate this thesis to the loving memory of my mother Katherine who passed away in February 1982 while I studied at the University of British Columbia.

CHAPTER I

1.0 INTRODUCTION

The main objective of this thesis is to formalize and extend or generalize results obtained by Kozak (1970) concerning conditions that ensure that predicted values calculated from component regression equations add up to those obtained from a corresponding total equation. Kozak (1970) derived his results within the context of forest biomass prediction using component biomass equations and a corresponding total biomass equation. He cites examples in other areas of forestry and forest research where such a problem arises. A broader view is adopted in this thesis with regard to areas of application of the 'additivity' problem. Since biomass analysis is of interest to scientists in various disciplines of applied biology other than forestry, a formalization and generalization of the additivity problem and its related statistical theory will be of value to a large number of scientists, including those in agriculture and ecology.

To fix ideas with regard to the additivity problem as perceived in this thesis, suppose, as in Kozak (1970), that for some tree species weight of bole ($Y_1$), weight of bark on the bole ($Y_2$), weight of crown ($Y_3$ = branches and foliage), and total weight ($Y_4 = \sum_{j=1}^{3} Y_j$) can each be modelled as some function (in the linear regression sense) of diameter at breast height (X). In this setting, one refers to the equations expressing each of $Y_1$, $Y_2$, and $Y_3$ as functions of X as component equations and that giving $Y_4$ as a function of X as the total equation. More generally, one can envisage k components of an organism or system characterized by measurable attributes $Y_1, Y_2, \ldots, Y_k$ and their sum $Y_{k+1} = \sum_{j=1}^{k} Y_j$, where $Y_1, Y_2, \ldots, Y_{k+1}$ are each related to a common set of p independent variables $X_1, X_2, \ldots, X_p$ according to a multiple linear regression model.
Using the case (with k = 3) described in the preceding paragraph, Kozak (1970) states conditions under which one need only fit the component equations, the total equation being completely determined by adding coefficients of corresponding independent variables in the component equations. Kozak's (1970) results pertain to the situation where each component equation contains all independent variables under consideration.

In the sequel, our objective is essentially four-fold. First, it is intended to demonstrate, in a rather simple way, why Kozak's (1970) conditions hold and to derive explicit expressions for statistics of interest for the total equation from those of the component equations. Secondly, it will be shown that additivity can be assured even when different terms are retained in the component equations (that is, when $r_j \le p$ with strict inequality for at least one j, where $r_j$ is the number of independent variables in component equation j and p is the number in the total equation). This will be achieved by appropriately correcting the total equation in order to take into account the conditioning that forces some independent variables not to appear in some component equations. Thirdly, it will be shown that the case considered by Kozak (1970), where each component equation has the same number of independent variables as the total equation (that is, $r_j = p$), can be derived as a special case of a generalization of the conditioning principle mentioned above.

Estimation and inference theory will be developed for the above objectives based primarily on the assumption that $Y_j$ and $Y_\ell$ are independent for each $j \ne \ell$ ($j, \ell = 1, 2, \ldots, k$) and an appropriate distributional assumption on the error term corresponding to model j. Because of its relevance when considering many biological phenomena, theory relating to the case where $Y_j$, $Y_\ell$ ($j \ne \ell$) are dependent will be considered.
Finally, some examples will be worked out to illustrate the application of the theory.

It is important to emphasize here that the main objective of this dissertation is to investigate the additivity problem and its related statistical theory within a fairly general framework. Therefore, the use of certain models incorporating and, probably, excluding particular independent variables in the examples and elsewhere in this thesis should not be construed as suggesting that the equations are best in a predictive sense. More specifically, although the subject of the thesis has a direct bearing upon biomass prediction problems, it is not the objective here to arrive at a best biomass equation in any particular sense. On the other hand, the view is taken that the determination of best equations for prediction purposes is best left to particular applications of the theory to be presented here.

CHAPTER II

2.0 PRELIMINARIES, NOTATION AND PROBLEM DEFINITION

We begin with some definitions and rules of convention regarding notation to be adopted in the sequel.

2.1 Preliminaries and Notation

Since the development in this thesis will be concerned with linear regression models, we first seek to identify this class of models precisely. Accordingly, we define a regression model following Gallant (1971).

Let $\mathcal{X} \subset R^m$, $\Omega \subset R^p$, and let n and p be positive integers such that n > p. The elements of $\mathcal{X}$ and $\Omega$ will be denoted by x and $\beta$, respectively. Furthermore, we shall let $\{e_t\}_{t=1}^{\infty}$ be a sequence of random variables, $\{x_t\}_{t=1}^{\infty}$ a sequence from $\mathcal{X}$, $f(x, \beta)$ a real-valued function with argument $(x, \beta)$, and $\beta^0$ a point in $\Omega$.

Definition 1. A regression model is defined here to be the sequence of random variables $\{y_t\}_{t=1}^{\infty}$ given by $y_t = f(x_t, \beta^0) + e_t$.

We emphasize that we owe the basic idea behind this definition to Gallant (1971). Our definition is not as rigorous as Gallant's (1971) but it will suffice for our purposes.
Note that f is a mapping of points from the product space $\mathcal{X} \times \Omega$ into the real line $R^1$ (that is, $f : \mathcal{X} \times \Omega \to R^1$).

Now, suppose we denote the set of all possible regression models generated according to definition 1 by $R^*$ and define a set of regression models of the form

$$ y_t = \phi_0(x_t) + \sum_{i=1}^{p} \beta_i \phi_i(x_t) + e_t $$

where $\phi_i : \mathcal{X} \to R^1$, i = 0, 1, ..., p. Again we owe this formulation to Gallant (1971). Note that in the above, $f(x_t, \beta^0)$ in definition 1 has been replaced by $\phi_0(x_t) + \sum_{i=1}^{p} \beta_i \phi_i(x_t)$. Gallant (1971) designates the class of regression models whose members are specified according to the last equation by L, which is the class of linear regression models. Thus we have a second definition.

Definition 2. A regression model $r^*$ is called a linear regression model if $r^* \in L$, where L is as defined above.

Note, for completeness, that $L \subset R^*$. This thesis will be concerned with the theory of estimation and inference for members of L under certain conditions. As is well known, any member of L can be written in matrix form.

With regard to notation, a matrix representation will be adopted throughout most of the development here. This has an obvious aesthetic appeal but, more importantly, leads to brevity and a rather compact presentation of results which would otherwise be cumbersome using ordinary scalar arithmetic. Accordingly, let $y_j$ denote an n x 1 vector of realizations of an observable random variable corresponding to the j-th component, X an n x (p + 1) matrix defined so that $X = (X_0 | X_1 | \cdots | X_p) = (\mathbf{1} | X_1 | \cdots | X_p)$, where $X_0 = \mathbf{1}$ is an n x 1 vector with components identically equal to unity and $X_i$ (i = 1, 2, ..., p) is an n x 1 vector consisting of realizations of the independent variable $X_i$, $\beta_j$ is a (p + 1) x 1 parameter vector and $\epsilon_j$ is the corresponding n x 1 vector of errors or disturbances.

For any matrix A, say, we shall write $A^t$ to denote the transpose of A and $A^{-1}$ to denote the inverse of A, provided that the inverse exists.
Where $A^{-1}$ does not exist, we shall have occasion to use $A^{\dagger}$, the generalized inverse of A, to obtain more general results. In any case, new notation may be introduced in the discourse as the need arises but in every case our notation will be consistent with that used in standard texts in linear algebra (e.g., Noble & Daniel, 1977; Searle, 1966; Stewart, 1973; Strang, 1976).

2.2 Problem Definition

With the above conventions and notation, we shall be concerned, in this dissertation, with models of the form

$$ y_j = X\beta_j + \epsilon_j \quad (j = 1, 2, \ldots, k) \tag{2.2.1} $$

and

$$ y_T = \sum_{j=1}^{k} y_j = X\beta_T + \epsilon_T \tag{2.2.2} $$

where (2.2.1) gives us k component models and (2.2.2) the corresponding total model. We shall focus interest in the sequel on characterizing estimators of $\beta_T$, and on propounding a theory of inference relating to the total model under a number of assumptions concerning the behavior of the $\epsilon_j$ (j = 1, 2, ..., k). The theory to be presented here will be based on well-known general linear model theory. Note that specification of the behavior of the $\epsilon_j$ leads, in general, to specification of the behavior of $\epsilon_T$.

In the next chapter, a review is made of some of the work in the literature relating to the additivity problem before proceeding to propose a unified theory of estimation and inference for this problem in succeeding chapters.

CHAPTER III

3.0 LITERATURE REVIEW

Theoretical and applied biologists have traditionally viewed biomass as a useful index for assessing the productivity of various flora and fauna with respect to designated environments or ecosystems (Ovington, 1962). This index has also been used for cataloguing, in the form of inventories, the quantities of biological matter available at a given time in a given environment.

A thorough reading of the literature on biomass and related studies indicates development in two main directions.
The early part of the literature indicates that scientists essentially sought ways of quantitatively describing biomass production and productivity of various biological organisms. This approach was especially dominant in ecological studies for many years. Quite often, systematic sampling schemes (e.g., line transects) were used to obtain data which were subsequently summarized to give crude estimates of biomass. In many cases, these estimates were reported by component of the organism or system under consideration. In general, little or no statistical information, such as measures of precision, accompanied the summarizations. In any case, the very nature of the sampling schemes upon which the estimates were based militated against a meaningful statistical interpretation of the results. More recent literature is suggestive of a significant shift from the purely descriptive approach to model-based, statistically-oriented methods of describing biomass. This approach not only leads naturally to the need to address questions relating to choice of proper model for use in a given situation, but more importantly perhaps, attaches particular importance to choosing estimates that are statistically reasonable.

The following review of the literature on developments that led to the additivity problem will be brief and, hopefully, informative rather than exhaustive. For a more complete review, see Chiyenda (1974) or Kurucz (1969). More recent comprehensive reviews are given by Smith (1979) and Smith and Williams (1980).

The forestry literature credits Tufts (1919) with the first reported work on tree component biomass. In that work, Tufts correlated trunk circumference of fruit trees with the weight of their tops (or crowns). Following that work, many workers in forestry engaged in biomass studies of one form or another.
Accordingly, considerable work has been reported in the general area of total biomass production of various tree species (Honer, 1971; Kellogg and Keys, 1968; Young and Chase, 1965) and of forest ecosystems (Ovington, 1962). Some of this work was carried out as part of ongoing inventory programmes (e.g., Honer, 1971) while others were conducted in research connected with forest fire-hazard abatement efforts (e.g., Kiil, 1967, 1968; Loomis et al., 1966; Storey et al., 1969).

In a biomass study of 13 North American tree species, Storey et al. (1955) found that dry crown weight, branchwood weight, and foliage weight were significantly related to stem diameter at the base of live crown for all the species. On the other hand, Ovington (1956) investigated and compared the forms, weights, and productivity of tree species grown in close stands. This study was motivated by silvicultural and ecological considerations. In a study similar to that of 1955, Storey and Pong (1957) investigated and compared crown characteristics of a number of hardwood species.

Fahnestock (1960) used data collected from nine coniferous tree species in the Northern Rocky Mountain area to fit regression equations to predict crown weight and proceeded to construct crown weight tables for the species. Among the species investigated were Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco), western hemlock (Tsuga heterophylla [Raf.] Sarg.) and western red-cedar (Thuja plicata Donn). Tadaki et al. (1961) investigated the productivity of a young stand of birch (Betula platyphylla) in southern Hokkaido, Japan, and established linear relationships on logarithmic axes between basal area and stem biomass, basal area and branch biomass, and between basal area and foliage biomass. They also reported that estimated fresh and dry foliage weights did not vary with stand density but that branch biomass decreased with stand density.
Brown (1963) investigated the relations between crown weight and diameter in some Lake States red pine (Pinus resinosa Ait.) plantations and also studied the influence of site quality and stand density on the weight of individual tree crowns. Keen (1963) analysed average green weights and centres of gravity of samples of black spruce (Picea mariana [Mill.] B.S.P.), white spruce (Picea glauca [Moench.] Voss.), and balsam fir (Abies balsamea [L.] Mill.) and investigated their variation with species, season, and location. He also derived a tabulation of weights and centres of gravity of the trees.

Young et al. (1964) used regression equations to construct fresh and dry fibre weight tables for individual tree components, groups of components, and complete trees for seven tree species. Brown (1965) investigated the effect of site and stand density on the crown size of individual red pine and jack pine (Pinus banksiana Lamb.) trees and studied ways of estimating crown fuel weights. The study indicated that estimated amounts of foliage and branchwood per unit area varied with the age and growing conditions of the stand. Loomis et al. (1966) used analysis of covariance to test the effect of stand density on dry foliage and branchwood weights in shortleaf pine (Pinus echinata Mill.) and found that regressions of dry foliage and branchwood weights for different stand densities were not significantly different.

Dyer (1967) prepared preliminary fresh and dry weight tables for northern white cedar (Thuja occidentalis L.) and derived linear regression equations for predicting fresh and dry wood weights of various tree components as percentages of total tree fresh and dry weights. Kiil (1967) used regression analysis to construct fuel weight tables for white spruce and lodgepole pine (Pinus contorta Dougl.)
in west-central Alberta and found that a combination of diameter at breast-height and either crown width or crown length gave the most precise estimating equation for fuel weight. In a follow-up study, Kiil (1968) studied the fuel complex of 70-year-old lodgepole pine in the same area with a view to facilitating measurement and prediction of weight and size distribution of fuel components.

Kurucz (1969) obtained predictive regression equations for total and component biomass of Douglas-fir, western hemlock, and western red-cedar grown on the University of British Columbia Research Forest near Haney, British Columbia. In a study that was probably motivated as much by Kurucz's (1969) study as by others, Kozak (1970) considers the problem of additivity of component biomass regression equations for purposes of prediction. The real essence of Kozak's (1970) work does not lie in the uniqueness of the problem he poses but, rather, in the statistical problems that it raises and the potential practical impact that a solution to these problems might have.

Other studies conducted following Kozak's (1970), while essentially underscoring the importance of the biomass estimation problem in forestry and related disciplines, did not address the additivity aspect of the problem directly. Instead, many investigators continued to look for the best set of variables giving the most parsimonious predictive equation for total and component biomass (e.g., Crow, 1971; Honer, 1971; Johnstone, 1971; Muraro, 1971; Zavitkovski, 1971; Sando and Wick, 1972).

Biomass studies and methods of effectively predicting individual tree biomass continued to interest applied quantitative biologists in the mid- and late-seventies and well into the eighties. This interest in biomass is ascribable to the applicability of individual tree biomass information in addressing a wide range of ecological and forest management problems.
These include large-scale biomass inventories (Young, 1978; Ker and Van Raalte, 1980), nutrient-cycling problems (Marks and Bormann, 1972; Kimmins, 1977; Kimmins and Krumlik, 1976; Kimmins et al., 1979), as well as the determination of net productivity of forest ecosystems (Whittaker et al., 1974). Many studies, such as Jacobs and Cunia (1980), Jokela et al. (1981), Keyes and Grier (1981), Schmitt and Grigal (1981), Yandle and Wiant (1981), Zavitkovski et al. (1981), Chaturvedi and Singh (1982), Freedman et al. (1982), and Singh (1982), have used regression methods to address the biomass prediction problem.

It is worth mentioning that biomass studies of various descriptions are being conducted to date. Some of these are essentially computer-based simulations of various aspects of the biomass problem. An example of this is the FORCYTE study being undertaken by Kimmins and his associates at the University of British Columbia (see Kimmins and Scoullar, 1979; Kimmins et al., 1980). As described by its authors, "FORCYTE is an interactive simulation model designed to examine, on a site-specific basis, the long-term effects on nutrient budgets and productivity of various intensive forest management and harvesting practices." Other studies are conducted as part of ongoing national programmes aimed at identifying useful modelling procedures for predicting or, otherwise, describing biomass. An example of this is the study, again at the University of British Columbia, by Smith (1979) and Smith and Williams (1980) originally commissioned by the Canadian Forestry Service to propose the development of a comprehensive forest biomass growth model. That proposal has since been approved and work is currently under way to develop such a model.

Surprisingly, most of the studies cited earlier do not consider the additivity problem except for passing reference to Kozak's additivity result in a few instances (e.g., Ker and Van Raalte, 1980; Singh, 1982).
One might surmise that this apparent lack of interest in the additivity problem might be largely due to the fact that additivity has, since its introduction into the forestry literature by Kozak (1970), been restricted to situations in which each component equation contains the same independent variables. This precludes the use of additivity in the more common and important situations where only statistically important independent variables are used in any component equation. An extension of the additivity problem to such situations along with its corresponding statistical theory would obviously be of interest. This is what is intended to be done in succeeding chapters of the discourse. One hopes that studies such as have been cited above will, in time, benefit from or at least find useful complementary methodology in the theory to be presented in this dissertation.

CHAPTER IV

4.0 ADDITIVITY IN THE CASE $r_j = p$

We consider first the models $y_j = X\beta_j + \epsilon_j$ (j = 1, 2, ..., k) and $y_T = \sum_{j=1}^{k} y_j = X\beta_T + \epsilon_T$, where it is understood that each of these k + 1 models involves the same matrix X. This is the case considered by Kozak (1970).

Consider the estimation of $\beta_j$ assuming $\epsilon_j \sim (\phi, I\sigma_j^2)$ or $\epsilon_j \sim (\phi, V\sigma_j^2)$, where $\phi$ is an n x 1 null vector, I is an identity matrix of dimension n and V is a known symmetric positive definite matrix of dimension n. Note that we have not for now specified the form of the distribution function of $\epsilon_j$ as this is not necessary to obtain estimates with desirable properties. We restrict attention in this chapter to the situation where X is of full rank.

Under the assumption that $\epsilon_j \sim (\phi, I\sigma_j^2)$, ordinary least squares (OLS) fitting of the k component models yields Gauss-Markoff estimators

$$ \hat\beta_j = (X^t X)^{-1} X^t y_j \quad (j = 1, 2, \ldots, k). \tag{4.0.1} $$

The result given by (4.0.1) is completely basic and warrants no further comment except to note that the resulting $\hat\beta_j$ are best linear unbiased estimators (BLUE's) in the sense of the Gauss-Markoff theorem (see Graybill, 1976, p. 219; Kempthorne, 1975, p. 32; Searle, 1971, p. 88). When $\epsilon_j \sim (\phi, V\sigma_j^2)$, generalized least squares (GLS) fitting applied to each component model leads to Gauss-Markoff estimators

$$ \hat\beta_j = (X^t V^{-1} X)^{-1} X^t V^{-1} y_j \quad (j = 1, 2, \ldots, k). \tag{4.0.2} $$

Note that in (4.0.2) the existence of $V^{-1}$ is guaranteed by the positive definiteness of V. It is worth pointing out that the OLS estimator of $\beta_j$ is in general different from the GLS estimator, except in the special case where there exists a (p + 1) x (p + 1) nonsingular matrix F such that VX = XF. This is a very special result and is due to Zyskind (1962). See Graybill (1976), Kempthorne (1975), and Searle (1971) for references to this result.

Now consider the total model. Introducing the expression for $y_j$ given earlier into the expression for the total model, one gets

$$ y_T = \sum_{j=1}^{k} (X\beta_j + \epsilon_j) = X\beta_T + \epsilon_T \tag{4.0.3} $$

or

$$ y_T = X \sum_{j=1}^{k} \beta_j + \sum_{j=1}^{k} \epsilon_j = X\beta_T + \epsilon_T \tag{4.0.4} $$

from where it is clear that $\beta_T = \sum_{j=1}^{k} \beta_j$ and $\epsilon_T = \sum_{j=1}^{k} \epsilon_j$. Hence, the least squares estimator of $\beta_T$ is given by $\hat\beta_T = \sum_{j=1}^{k} \hat\beta_j$, where $\hat\beta_j$ is given by (4.0.1) or (4.0.2) according as ordinary least squares or generalized least squares fitting is used to obtain BLUE's. Thus the total equation is completely determined by coefficients of the component equations, as Kozak (1970) pointed out.

Having shown (for $r_j = p$) that the total equation is completely determined by the parameters of the component equations, it might be of interest to investigate whether statistics derived from the component analyses can be utilized to make inferences pertaining to the total equation. It will be shown, in the sequel, that this is the case in general.
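The additivity of the least squares estimates in (4.0.1) can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming simulated data: the sample size, the parameter values in `beta`, the error scales in `sigma`, and the helper `ols` are hypothetical choices made for illustration and are not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, k = 30, 2, 3  # observations, regressors, components (illustrative)
# n x (p + 1) design matrix with a leading column of ones, shared by all models.
X = np.column_stack([np.ones(n), rng.uniform(5.0, 50.0, size=(n, p))])

# Hypothetical component parameter vectors: column j holds beta_j.
beta = np.array([[1.0, -2.0, 0.5],
                 [0.8,  0.3, 0.1],
                 [0.0,  0.2, 0.4]])
sigma = np.array([1.0, 0.5, 2.0])  # component error standard deviations
Y = X @ beta + rng.normal(size=(n, k)) * sigma  # column j holds y_j


def ols(X, y):
    """Least squares solution of y = X b + e, as in (4.0.1)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Fit each component equation separately and add the coefficient vectors.
beta_hat = np.column_stack([ols(X, Y[:, j]) for j in range(k)])
beta_hat_sum = beta_hat.sum(axis=1)

# Fit the total equation directly to y_T = y_1 + ... + y_k.
y_total = Y.sum(axis=1)
beta_hat_total = ols(X, y_total)

additive = np.allclose(beta_hat_sum, beta_hat_total)
print(additive)
```

The agreement is a consequence of the estimator being linear in the response when every equation uses the same X; it does not depend on the simulated values chosen here.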
The results presented below will be useful for testing hypotheses concerning the total equation and for constructing confidence intervals. In order to simplify the derivations, we shall assume that the random variables $y_j$ and $y_\ell$ are independent for each $j \ne \ell$ ($j, \ell = 1, 2, \ldots, k$). As we point out later, this assumption may be quite unrealistic for the phenomena being modelled and this may considerably affect the utility of the theory to be developed on the basis of this assumption.

4.1 Inferences for Total Model when $\epsilon_j \sim N(\phi, I\sigma_j^2)$

We now address the problem of inference for the total model when the $\epsilon_j$ follow a multivariate normal distribution with expectation vector $\phi$ and covariance matrix $I\sigma_j^2$, where $\phi$ and I have been defined earlier. Note that we have explicitly specified the form of the distribution of $\epsilon_j$ here since such specification is necessary for inference.

Let us begin by supposing that it is desired to discover how well the independent variables in the total equation explain the observed variation in the components of $y_T$. To answer this question, one needs to determine the amount of variation in the components of $y_T$ that can be attributed jointly to these independent variables. This is the usual sum of squares due to regression. Denote the uncorrected sum of squares due to fitting the unrestricted total equation by $SS_R(\beta_T)$ and that due to fitting a version of this model restricted so that all components of $\beta_T$ other than the intercept component are set equal to zero by $SS_R(\beta_{0T})$. It is easy to show that

$$ SS_R(\beta_T) = \hat\beta_T^t X^t y_T = \Big(\sum_{j=1}^{k} \hat\beta_j^t\Big) X^t \Big(\sum_{j=1}^{k} y_j\Big) \tag{4.1.1} $$

and

$$ SS_R(\beta_{0T}) = \frac{1}{n}\Big(\sum_{i=1}^{n} y_{iT}\Big)^2 = n\bar{y}_T^2 \tag{4.1.2} $$

where $y_{iT}$ is the i-th component of $y_T$ and $\bar{y}_T$ is the mean of the components of $y_T$. It follows that the desired regression sum of squares for fitting the total equation is given by

$$ SS_R(\beta^*_T \mid \beta_{0T}) = \Big(\sum_{j=1}^{k} \hat\beta_j^t\Big) X^t \Big(\sum_{j=1}^{k} y_j\Big) - n\bar{y}_T^2. \tag{4.1.3} $$

As usual, the corrected total sum of squares for the total equation is

$$ SS_T = y_T^t y_T - n\bar{y}_T^2 = \sum_{i=1}^{n} y_{iT}^2 - n\bar{y}_T^2 \tag{4.1.4} $$

and the error sum of squares is given by

$$ SS_E = y_T^t y_T - SS_R(\beta_T) \tag{4.1.5} $$

or as the result of subtracting (4.1.3) from (4.1.4), that is, $SS_E = SS_T - SS_R(\beta^*_T \mid \beta_{0T})$.

It can be shown easily that $SS_R(\beta^*_T \mid \beta_{0T})$ is associated with p degrees of freedom while $SS_E$ is associated with n - p - 1 degrees of freedom. Furthermore, since $SS_T$ is associated with n - 1 degrees of freedom and $SS_T$ is the sum of $SS_R(\beta^*_T \mid \beta_{0T})$ and $SS_E$, it follows from Cochran's theorem (see Hogg and Craig, 1970, p. 393; Kempthorne, 1975, p. 57; Montgomery, 1976, p. 37) that $SS_R(\beta^*_T \mid \beta_{0T})$ and $SS_E$ are independent. By our distributional assumption on $\epsilon_j$ (j = 1, 2, ..., k) we have that $SS_R(\beta^*_T \mid \beta_{0T})/\sigma_T^2$ and $SS_E/\sigma_T^2$ (with $\sigma_T^2 = \sum_{j=1}^{k} \sigma_j^2$ under independence) are, respectively, noncentral chi-square with p degrees of freedom and noncentrality parameter $\beta_T^{*t} X^t X \beta^*_T / 2\sigma_T^2$, and central chi-square with n - p - 1 degrees of freedom. We have already noted that they are independent, whence it follows that a test of significance for the total equation can be obtained using data and estimates relating to the component models without actually fitting the total equation. An $R^2$ associated with the total equation is similarly obtained.

For examining the hypothesis that some component of $\beta_T$ is equal to zero, one is often interested in constructing confidence intervals about the component or performing a direct t-test (see Montgomery, 1976, p. 325). In either case, one requires an estimate of the covariance matrix of $\hat\beta_T$. Denote the true covariance matrix of $\hat\beta_j$ by $\Sigma_j$ and that of $\hat\beta_T$ by $\Sigma_T$. Then we have that

$$ \Sigma_j = (X^t X)^{-1} \sigma_j^2 \tag{4.1.6} $$

and by the fact that $\hat\beta_T = \sum_{j=1}^{k} \hat\beta_j$ and that the $\hat\beta_j$ (j = 1, 2, ..., k) are independent, we have

$$ \Sigma_T = (X^t X)^{-1} \sum_{j=1}^{k} \sigma_j^2. \tag{4.1.7} $$

The results given in (4.1.6) and (4.1.7) are completely basic and we shall not venture to prove them here.
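The sums of squares in (4.1.1) to (4.1.5) and the covariance matrix in (4.1.7) can be illustrated with a short numerical sketch. The code below is Python with NumPy and assumes simulated data; the dimensions, the randomly drawn `beta`, and the error scales in `sigma` are hypothetical and serve only to show that the analysis of variance quantities for the total equation follow from the component fits.

```python
import numpy as np

rng = np.random.default_rng(1)

n, p, k = 40, 2, 3  # observations, regressors, components (illustrative)
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, size=(n, p))])
beta = rng.normal(size=(p + 1, k))      # hypothetical component parameters
sigma = np.array([1.0, 0.5, 2.0])       # hypothetical error standard deviations
Y = X @ beta + rng.normal(size=(n, k)) * sigma  # column j holds y_j
y_T = Y.sum(axis=1)

XtX_inv = np.linalg.inv(X.T @ X)
B_hat = XtX_inv @ X.T @ Y               # column j is the estimate from (4.0.1)
beta_T_hat = B_hat.sum(axis=1)          # total coefficients by additivity

ybar = y_T.mean()
ss_r_full = beta_T_hat @ X.T @ y_T      # (4.1.1), uncorrected regression SS
ss_r = ss_r_full - n * ybar**2          # (4.1.3), corrected for the intercept
ss_t = y_T @ y_T - n * ybar**2          # (4.1.4)
ss_e = ss_t - ss_r                      # (4.1.5)
f_stat = (ss_r / p) / (ss_e / (n - p - 1))
r2 = ss_r / ss_t

# (4.1.7) with the known simulation variances plugged in.
cov_T = XtX_inv * np.sum(sigma**2)

# Cross-check the error sum of squares against a direct fit of the total equation.
resid = y_T - X @ np.linalg.lstsq(X, y_T, rcond=None)[0]
matches = np.isclose(ss_e, resid @ resid)
print(matches)
```

Only the last three lines fit the total equation, and they are there purely as a check; everything before them uses the component estimates and the summed response.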
The estimator of $\Sigma_T$ will be given by

$$\hat{\Sigma}_T = (X^t X)^{-1} \sum_{j=1}^k \hat{\sigma}_j^2 \tag{4.1.8}$$

where $\hat{\sigma}_j^2$ is the mean square error associated with fitting the $j$th component equation. Hence confidence limits on the relevant component of $\beta_T$, say $\beta_{\ell T}$, will be given by

$$\hat{\beta}_{\ell T} \pm t_{\alpha/2,\,n-p-1} \Big[C_{\ell\ell} \sum_{j=1}^k \hat{\sigma}_j^2\Big]^{1/2} \tag{4.1.9}$$

where $t_{\alpha/2,\,n-p-1}$ is the $(1 - \alpha/2)100$-th percentile of the central t-distribution with $n - p - 1$ degrees of freedom and $C_{\ell\ell}$ is the $(\ell + 1)$st diagonal element of the matrix $(X^t X)^{-1}$. Note that $\ell$ takes integer values in the range 0 to $p$ inclusive. The corresponding test based on the t-distribution is performed by computing

$$t_0 = \hat{\beta}_{\ell T} \Big/ \Big[C_{\ell\ell} \sum_{j=1}^k \hat{\sigma}_j^2\Big]^{1/2} \tag{4.1.10}$$

and rejecting the null hypothesis that $\beta_{\ell T} = 0$ if $|t_0| > t_{\alpha/2,\,n-p-1}$. Note also that the test in (4.1.10) can be derived as a special case of a more general approach in the context of general linear hypothesis theory, as will be shown in a later part of the discourse.

The inferences based on (4.1.9) and (4.1.10) are valid if $(X^t X)^{-1}$ is a diagonal matrix. In general, however, $(X^t X)^{-1}$ is not diagonal and so results obtained from (4.1.9) and (4.1.10) can be misleading (see Montgomery, 1976, p. 326). This is so because both (4.1.9) and (4.1.10) are based on the assumption that the elements of $\hat{\beta}_T$, such as $\hat{\beta}_{iT}$ and $\hat{\beta}_{jT}$ for $i \neq j$, are independent. When $(X^t X)^{-1}$ is not diagonal, this assumption does not hold in general. Therefore, a test for $\beta_{\ell T} = 0$ versus $\beta_{\ell T} \neq 0$ must be constructed using the 'extra sum of squares' principle used in deriving (4.1.3) or using the general linear hypothesis theory alluded to above. The extra sum of squares principle is described in Draper and Smith (1981, p. 97 cf.) and in Montgomery (1976, pp. 326-328).

Finally, for constructing a confidence interval about a true value, $y_{\ell T}$, corresponding to the x-coordinate $X_{\ell T} = (1, x_{1T}, \ldots, x_{pT})$, where $X_{\ell T}$ is a row vector, one needs the covariance matrix $\Sigma_{\hat{y}_T}$ of $\hat{y}_T$. Since $\hat{y}_T = X\hat{\beta}_T$, we know from theory (e.g., Morrison, 1976, pp. 83-84) that $\Sigma_{\hat{y}_T} = X \Sigma_T X^t = X (X^t X)^{-1} X^t \sum_{j=1}^k \sigma_j^2$.
(4.1.11)

Hence the estimator of $\Sigma_{\hat{y}_T}$ is given by

$$\hat{\Sigma}_{\hat{y}_T} = X (X^t X)^{-1} X^t \sum_{j=1}^k \hat{\sigma}_j^2 \tag{4.1.12}$$

and, in particular, $\hat{\sigma}_{\hat{y}_{\ell T}}$, the estimated standard error of $\hat{y}_{\ell T}$ corresponding to the x-coordinate $X_{\ell T}$, is

$$\hat{\sigma}_{\hat{y}_{\ell T}} = \Big[X_{\ell T} (X^t X)^{-1} X_{\ell T}^t \sum_{j=1}^k \hat{\sigma}_j^2\Big]^{1/2}. \tag{4.1.13}$$

Therefore, the desired confidence interval for $y_{\ell T}$ is given by

$$\hat{y}_{\ell T} \pm \Big[X_{\ell T} (X^t X)^{-1} X_{\ell T}^t \sum_{j=1}^k \hat{\sigma}_j^2\Big]^{1/2} t_{\alpha/2,\,n-p-1}. \tag{4.1.14}$$

Results given above show that tests concerning specific components of $\beta_T$ and corresponding confidence intervals can be constructed using information obtained from analyses relating to the component equations. Thus there is no need to fit the total equation in order to make inferences about it. These observations pertain to the situation where $\varepsilon_j \sim N(\phi, I\sigma_j^2)$. We show below that the same holds when $\varepsilon_j \sim N(\phi, V\sigma_j^2)$.

4.2 Inferences for Total Model when $\varepsilon_j \sim N(\phi, V\sigma_j^2)$

Consider now the situation where $\varepsilon_j \sim N(\phi, V\sigma_j^2)$ with $V$ as defined earlier. It was stated in (4.0.2) that the BLUE for $\beta_j$ under this distributional assumption is $(X^t V^{-1} X)^{-1} X^t V^{-1} y_j$. One must add here that this estimator is also a maximum likelihood estimator (MLE). It is desirable to motivate the derivation of $(X^t V^{-1} X)^{-1} X^t V^{-1} y_j$, mainly to clear the way for its use in making inferences. Accordingly, consider the model specification $y_j = X\beta_j + \varepsilon_j$ with $\varepsilon_j \sim N(\phi, V\sigma_j^2)$. Since $V$ is positive definite, there exists an $n \times n$ nonsingular matrix $P$ such that

$$V = P^t P. \tag{4.2.1}$$

Hence we have that

$$(P^t)^{-1} V P^{-1} = I. \tag{4.2.2}$$

Suppose one pre-multiplies the model $y_j = X\beta_j + \varepsilon_j$ by $(P^t)^{-1}$; then one has

$$(P^t)^{-1} y_j = (P^t)^{-1} X \beta_j + (P^t)^{-1} \varepsilon_j \tag{4.2.3}$$

which may be given equivalently by

$$y^*_j = X^* \beta_j + \varepsilon^*_j. \tag{4.2.4}$$

Note that $E\,y^*_j = X^*\beta_j = (P^t)^{-1} X \beta_j$, since $E\,\varepsilon^*_j = \phi$, and $E(\varepsilon^*_j \varepsilon_j^{*t}) = I\sigma_j^2$, so that we now have that $\varepsilon^*_j \sim N(\phi, I\sigma_j^2)$. Therefore all the theory developed in section 4.1 applies to (4.2.4). Applying OLS to (4.2.4) one obtains

$$\hat{\beta}_j = (X^{*t} X^*)^{-1} X^{*t} y^*_j = [X^t (P^t P)^{-1} X]^{-1} X^t (P^t P)^{-1} y_j = (X^t V^{-1} X)^{-1} X^t V^{-1} y_j, \tag{4.2.5}$$

which is the generalized least squares estimator given earlier. Note that this type of fitting (GLS) which produces (4.2.5) is also commonly referred to as weighted least squares fitting and, in more general presentations, as minimum V-norm fitting (see Kempthorne, 1975).

Results useful in making inferences when $\varepsilon_j \sim N(\phi, V\sigma_j^2)$ are basically similar to those derived above for $\varepsilon_j \sim N(\phi, I\sigma_j^2)$, with a few important distinctions as a result of our transformation of $y_j$ above. In testing for significance of the total equation, for instance, the regression sum of squares is given by

$$SS_R(\beta^*_T \mid \beta_{0T}) = \Big(\sum_{j=1}^k \hat{\beta}_j^t\Big) X^{*t} \Big(\sum_{j=1}^k y^*_j\Big) - n\bar{y}_T^{*2} \tag{4.2.6}$$

where $X^*$ and $y^*_j$ are as defined above and $\bar{y}^*_T$ is the mean of the components of $y^*_T = \sum_{j=1}^k y^*_j$. Furthermore, the corrected total sum of squares for the total equation is

$$SS_T = y_T^{*t} y^*_T - n\bar{y}_T^{*2} \tag{4.2.7}$$

and the error sum of squares is

$$SS_E = y_T^{*t} y^*_T - SS_R(\beta_T), \tag{4.2.8}$$

where $SS_R(\beta_T)$ is given by the first term on the right-hand side of (4.2.6). In these sums of squares, $\hat{\beta}_j$ is as defined in (4.2.5). Again, $SS_R(\beta^*_T \mid \beta_{0T})/\sigma_T^2$ and $SS_E/\sigma_T^2$ will be distributed as noted earlier except that the noncentrality parameter associated with the distribution of the former is now $\beta_T^{*t} X^t V^{-1} X \beta^*_T / 2\sigma_T^2$. As before, it is clear that a test of significance for the total regression equation and the associated $R^2$ are obtainable from data and ancillary quantities derived from fitting the component equations without recourse to actually fitting the total equation.

In light of (4.2.5) the covariance matrix of $\hat{\beta}_j$ is now given by

$$\Sigma_j = (X^t V^{-1} X)^{-1} \sigma_j^2 \tag{4.2.9}$$

and that of $\hat{\beta}_T$ by

$$\Sigma_T = (X^t V^{-1} X)^{-1} \sum_{j=1}^k \sigma_j^2 \tag{4.2.10}$$

with $\hat{\Sigma}_j$ and $\hat{\Sigma}_T$ obtained, respectively, by simply replacing $\sigma_j^2$ by $\hat{\sigma}_j^2$
in the expressions for $\Sigma_j$ and $\Sigma_T$. In this context, a $100(1 - \alpha)\%$ confidence interval for $\beta_{\ell T}$ is given by

$$\hat{\beta}_{\ell T} \pm t_{\alpha/2,\,n-p-1} \Big[C^*_{\ell\ell} \sum_{j=1}^k \hat{\sigma}_j^2\Big]^{1/2} \tag{4.2.11}$$

while a corresponding t-test is obtained by calculating

$$t^*_0 = \hat{\beta}_{\ell T} \Big/ \Big[C^*_{\ell\ell} \sum_{j=1}^k \hat{\sigma}_j^2\Big]^{1/2} \tag{4.2.12}$$

and rejecting the null hypothesis that $\beta_{\ell T} = 0$ if $|t^*_0| > t_{\alpha/2,\,n-p-1}$. In (4.2.11) and (4.2.12), $C^*_{\ell\ell}$ refers to the $(\ell + 1)$st diagonal element of $(X^t V^{-1} X)^{-1}$, with $\ell$ specified as before. The limitations of the results given by (4.2.11) and (4.2.12) when $(X^t V^{-1} X)^{-1}$ is not diagonal apply equally here, so that one must resort to using the extra sum of squares principle or general linear hypothesis theory to obtain valid tests.

To construct a $100(1 - \alpha)\%$ confidence interval about a true value of $y_T$, say $y_{\ell T}$, corresponding to the x-coordinate $X_{\ell T} = (1, x_{1T}, \ldots, x_{pT})$, for $X_{\ell T}$ a row vector, one can show easily that the covariance matrix of $\hat{y}_T = P^t \hat{y}^*_T$ is estimated by

$$\hat{\Sigma}_{\hat{y}_T} = X (X^t V^{-1} X)^{-1} X^t \sum_{j=1}^k \hat{\sigma}_j^2. \tag{4.2.13}$$

Therefore, a confidence interval for $y_{\ell T}$ is given by

$$\hat{y}_{\ell T} \pm \Big[X_{\ell T} (X^t V^{-1} X)^{-1} X_{\ell T}^t \sum_{j=1}^k \hat{\sigma}_j^2\Big]^{1/2} t_{\alpha/2,\,n-p-1}. \tag{4.2.14}$$

Once again, results obtained in this section show that tests concerning specific components of $\beta_T$ and associated confidence intervals are constructible from information relating to analyses of the component equations. This result holds when $\varepsilon_j \sim N(\phi, V\sigma_j^2)$. We already showed that it holds when $\varepsilon_j \sim N(\phi, I\sigma_j^2)$. In both cases, our derivations are based on the assumption that $y_j$ and $y_\ell$ are independent for each $j \neq \ell$ ($j, \ell = 1, 2, \ldots, k$). This essentially completes our consideration of the problem of estimation and inference for the total model when $r_j = p$.

The problem of additivity as defined here when $r_j = p$ is mathematically nice and, in a sense, trivial. The problem, however, has obvious practical limitations as a result of requiring that $r_j = p$ since, in practice, one would like to retain in each component equation only those of the $p$ independent variables that are statistically important.
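The transformation argument of section 4.2 lends itself to a quick numerical check. The sketch below is illustrative only: the AR(1)-type matrix chosen for $V$, the sizes and all variable names are assumptions, not quantities from the thesis. NumPy's Cholesky routine returns a lower-triangular $L$ with $V = LL^t$, so that $P = L^t$ and $(P^t)^{-1} = L^{-1}$ in the notation of (4.2.1).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 30, 2, 2                                   # hypothetical sizes
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

# Hypothetical positive definite V with AR(1)-type structure
idx = np.arange(n)
V = 0.6 ** np.abs(idx[:, None] - idx[None, :])
L = np.linalg.cholesky(V)                            # V = L L^t, hence P = L^t and P^t = L

# Component responses with errors having covariance V
Y = np.stack([X @ rng.normal(size=p + 1) + L @ rng.normal(size=n) for _ in range(k)])

# Transformed model (4.2.4): X* = (P^t)^{-1} X and y*_j = (P^t)^{-1} y_j
X_star = np.linalg.solve(L, X)
Y_star = np.linalg.solve(L, Y.T).T

# GLS estimator (4.2.5), computed directly from V
V_inv = np.linalg.inv(V)
A = np.linalg.inv(X.T @ V_inv @ X) @ X.T @ V_inv
B_gls = (A @ Y.T).T

# OLS on the transformed model should reproduce the GLS estimates
B_ols_star = np.linalg.lstsq(X_star, Y_star.T, rcond=None)[0].T

beta_T_add = B_gls.sum(axis=0)                       # total estimate by additivity
beta_T_fit = A @ Y.sum(axis=0)                       # direct GLS fit of the total equation
whitened = np.linalg.solve(L, V) @ np.linalg.inv(L.T)   # (P^t)^{-1} V P^{-1}, cf. (4.2.2)
```

Because the same $V$, and hence the same $P$, serves every component equation, the GLS operator `A` is common to all fits and additivity follows from linearity.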
It is, therefore, of interest to consider the consequences for estimation and inference for the total model when $r_j \le p$ with the possibility that $r_j < p$ for all $j$. This is the situation not considered by Kozak (1970). We consider this case in the next chapter.

CHAPTER V

5.0 ADDITIVITY WHEN $r_j \le p$ WITH $r_j < p$ FOR SOME $j$

We now relax the requirement that each component model contain all the independent variables in the total equation. Specifically, if the total equation contains $p$ independent variables (each assumed important), we shall allow the component equations to contain only statistically important independent variables among the $p$ variables. Thus $r_j$, the number of independent variables in component equation $j$, may be less than $p$ and strictly so for at least one $j$. This admits the possibility that $r_j < p$ for each $j$ provided that in that case each independent variable in the total equation is contained in at least one component equation. This will be consistent with additivity as defined here.

With $X = (\mathbf{1} \mid X_1 \mid \cdots \mid X_p)$ as defined earlier, we now consider models of the form

$$y_j = X\beta_j + \varepsilon_j \quad (j = 1, 2, \ldots, k) \tag{5.0.1}$$

and

$$y_T = \sum_{j=1}^k y_j = X\beta_T + \varepsilon_T \tag{5.0.2}$$

where the matrix $X$ is common to the $k + 1$ models. However, the latter models differ from those specified earlier in the following respects. In (5.0.1), $\beta_j$ is a $(p + 1) \times 1$ vector defined so that $\beta_j$ has an intercept component with $r_j \le p$ of the remaining $p$ components nonzero and the other $p - r_j$ equal to zero. The relative positions of the zero and nonzero components in the last $p$ cells of $\beta_j$ will be such as correspond to the presence or absence of particular $X_i$ ($i = 1, 2, \ldots, p$) in the $j$th component equation. $\beta_T$ will be as defined before with $p + 1$ nonzero components. Our definition of $\beta_j$ implies that the effective $X$ matrix, say $X_j$, corresponding to component equation $j$ is necessarily different for each $j$ except when $\beta_j$ and $\beta_\ell$ have nonzero components in identical positions for $j \neq \ell$ ($j, \ell = 1, 2, \ldots, k$).
Assume that $\varepsilon_j$ and $\varepsilon_T$ are distributed as specified in chapter IV and also that $y_j$ and $y_\ell$ (hence $\varepsilon_j$ and $\varepsilon_\ell$) are independent for $j \neq \ell$.

5.1 Estimation when $\varepsilon_j \sim N(\phi, I\sigma_j^2)$

We first consider the problem of estimation for the total equation when $\varepsilon_j \sim N(\phi, I\sigma_j^2)$. First, we state an intuitive result for the estimation problem and then demonstrate its validity. Note that model (5.0.1) is equivalent to model (2.2.1) with a condition adjoined, namely,

$$y_j = X\beta_j + \varepsilon_j \ \ni\ b_{q_j} = \phi \quad (j = 1, 2, \ldots, k) \tag{5.1.1}$$

where $b_{q_j}$ is a vector of zeros corresponding to the vector of $q_j$ components of $\beta_j$ in (5.0.1) which are set equal to zero in the $j$th component equation. It is clear that estimation relating to (5.1.1) can be achieved via constrained minimization. Denote the resulting solution by $\hat{\beta}^*_j$ ($j = 1, 2, \ldots, k$) and the corresponding conditioned fit by

$$\hat{y}^*_j = X\hat{\beta}^*_j \quad (j = 1, 2, \ldots, k). \tag{5.1.2}$$

Furthermore, let $\hat{y}_T$ be as defined in chapter IV; we shall occasionally refer to $\hat{y}_T$ as the unconditioned total model. Also, let $\hat{y}_{TC}$ be termed the 'conditioned' total equation in a sense to be defined momentarily and let $\hat{y}^*_{TC} = \sum_{j=1}^k X\hat{\beta}^*_j$. Then the conditioned total predictive equation is given by

$$\hat{y}_{TC} = \hat{y}_T - (\hat{y}_T - \hat{y}^*_{TC}) \tag{5.1.3}$$

where the factor $\hat{y}_T - \hat{y}^*_{TC}$ is a correction or conditioning factor which corrects the unconditioned fitted total equation (obtained in chapter IV) for parameters that are set equal to zero in some of the component equations. Note that (5.1.3) implies that $\hat{y}_{TC} = \hat{y}^*_{TC} = X \sum_{j=1}^k \hat{\beta}^*_j$ and that

$$\hat{\beta}_{TC} = \sum_{j=1}^k \hat{\beta}^*_j = \hat{\beta}_T - \Big(\hat{\beta}_T - \sum_{j=1}^k \hat{\beta}^*_j\Big). \tag{5.1.4}$$

In (5.1.4), $\hat{\beta}_{TC}$ and $\hat{\beta}_T$ are the estimator of the parameter vector of the total conditioned equation and that of the unconditioned total equation, respectively. Equation (5.1.4) implies that the parameter vector of the conditioned total equation is estimated by adding the estimates of the parameter vectors corresponding to the conditioned component equations.
Our immediate task is to demonstrate that this result is mathematically valid. To do so, we state our problem as one of minimization subject to constraints. Let us solve the estimation problem associated with fitting

$$y_T = X\beta_T + \varepsilon_T \tag{5.1.5}$$

subject to

$$\beta_T = \sum_{j=1}^k \hat{\beta}^*_j. \tag{5.1.6}$$

Note that $\hat{\beta}^*_j$ in (5.1.6) refers to the parameter vector corresponding to the conditioned component equation $j$. This problem is solved by minimizing the Lagrangian function

$$S(\beta_T, \theta) = (y_T - X\beta_T)^t (y_T - X\beta_T) + 2\theta^t \Big(\beta_T - \sum_{j=1}^k \hat{\beta}^*_j\Big), \tag{5.1.7}$$

where $2\theta$ is a vector of Lagrange multipliers. Now, differentiating (5.1.7) with respect to the elements of $\beta_T$ and $\theta$, respectively, one obtains

$$\frac{\partial S}{\partial \beta_T} = 2X^t X \beta_T + 2\theta - 2X^t y_T \tag{5.1.8}$$

$$\frac{\partial S}{\partial \theta} = 2\Big(\beta_T - \sum_{j=1}^k \hat{\beta}^*_j\Big). \tag{5.1.9}$$

Equating both (5.1.8) and (5.1.9) to zero leads to

$$X^t X \hat{\beta}_{TC} + \hat{\theta} = X^t y_T \tag{5.1.10}$$

$$\hat{\beta}_{TC} = \sum_{j=1}^k \hat{\beta}^*_j. \tag{5.1.11}$$

Our solution vector $\hat{\beta}_{TC}$ must satisfy both (5.1.10) and (5.1.11). From (5.1.10) one has

$$X^t X \hat{\beta}_{TC} = X^t y_T - \hat{\theta} \ \Rightarrow\ \hat{\beta}_{TC} = (X^t X)^{-1} X^t y_T - (X^t X)^{-1} \hat{\theta} = \hat{\beta}_T - (X^t X)^{-1} \hat{\theta}. \tag{5.1.12}$$

Now, (5.1.12) $\Rightarrow \hat{\theta} = X^t X (\hat{\beta}_T - \hat{\beta}_{TC})$ and since $\hat{\beta}_{TC} = \sum_{j=1}^k \hat{\beta}^*_j$ by (5.1.11), one has that

$$\hat{\theta} = X^t X \Big(\hat{\beta}_T - \sum_{j=1}^k \hat{\beta}^*_j\Big). \tag{5.1.13}$$

Hence, putting (5.1.13) into (5.1.12) one gets

$$\hat{\beta}_{TC} = \hat{\beta}_T - (X^t X)^{-1} X^t X \Big(\hat{\beta}_T - \sum_{j=1}^k \hat{\beta}^*_j\Big) = \hat{\beta}_T - \Big(\hat{\beta}_T - \sum_{j=1}^k \hat{\beta}^*_j\Big), \tag{5.1.14}$$

a result given earlier in equation (5.1.4). This establishes the validity of that result.

Since $\beta_{TC}$, the parameter vector of the total conditioned equation, is determined by $\beta^*_j$ ($j = 1, 2, \ldots, k$), it is important to discuss the estimation of $\beta^*_j$ here. It should be emphasized that $\beta^*_j$ is a $(p + 1) \times 1$ vector having $p - r_j$ of its components equal to zero. We are explicitly assuming that the intercept component of $\beta^*_j$ is nonzero. In estimating $\beta^*_j$, it is important to recognize that one does so conditionally on some specified components being assumed equal to zero. In the following, we cast the problem within the framework of general linear hypothesis theory. Consider testing the general hypothesis $H: K^t\beta_j = m$, where $\beta_j$
is a $(p + 1) \times 1$ parameter vector of the model (2.2.1), $K^t$ is any matrix of $s$ rows and $p + 1$ columns and $m$ is a vector, of order $s$, of specified constants. We shall require that $K^t$ be of full row rank, that is $r(K^t) = s$, where $r(\cdot)$ denotes the rank of the argument. One is interested here in estimating $\beta_j$ under the null hypothesis $H: K^t\beta_j = m$. Designate the parameter vector under the null hypothesis by $\beta^*_j$. Using constrained least squares (see Searle, 1971, pp. 113-114), the desired estimator is given by

$$\hat{\beta}^*_j = \hat{\beta}_j - (X^t X)^{-1} K [K^t (X^t X)^{-1} K]^{-1} (K^t \hat{\beta}_j - m), \tag{5.1.15}$$

where $\hat{\beta}_j$ is the unconstrained estimator of $\beta_j$ (with all independent variables included). When the hypothesis is of the form $H: b_{q_j} = \phi$ for $b_{q_j}$ a subset of $\beta_j$ of order $q_j$, we have

$$K^t = [I \;\; 0], \quad m = \phi, \quad s = q_j. \tag{5.1.16}$$

Now partition $\beta_j$, $\hat{\beta}_j$ and $(X^t X)^{-1}$ as follows:

$$\beta_j = \begin{bmatrix} b_{q_j} \\ b_{p_j} \end{bmatrix}, \quad \hat{\beta}_j = \begin{bmatrix} \hat{b}_{q_j} \\ \hat{b}_{p_j} \end{bmatrix}, \quad (X^t X)^{-1} = \begin{bmatrix} T_{q_j q_j} & T_{q_j p_j} \\ T_{p_j q_j} & T_{p_j p_j} \end{bmatrix},$$

where $p_j + q_j = p + 1$. Then the estimator of $\beta^*_j$ is

$$\hat{\beta}^*_j = \begin{bmatrix} \phi \\ \hat{b}_{p_j} - T_{p_j q_j} T_{q_j q_j}^{-1} \hat{b}_{q_j} \end{bmatrix}. \tag{5.1.17}$$

If $X$ is partitioned as $X = (X_{q_j} \mid X_{p_j})$, then the estimator in (5.1.17) can equivalently be expressed in terms of $X_{q_j}$, $X_{p_j}$ and $y_j$ (5.1.18). Observe that if the columns of $X_{q_j}$ are orthogonal to those of $X_{p_j}$, then $X^t X$ is block-diagonal, so that $T_{p_j q_j}$ is null and

$$\hat{\beta}^*_j = \begin{bmatrix} \phi \\ \hat{b}_{p_j} \end{bmatrix}. \tag{5.1.19}$$

Equation (5.1.19) expresses the expected fact that when the columns of $X_{q_j}$ and those of $X_{p_j}$ are orthogonal, then fitting only the last $p_j$ components of $X$ will yield the same $\hat{b}_{p_j}$ as fitting all components of $X$ conditionally on the first $q_j$ having zero coefficients. The consequence for estimation of having all columns of $X$ mutually orthogonal should be obvious from this.

In general, the estimator of $\beta$ for any linear model depends upon variables not included in the model, including those that are not known. When a subset of a set of predictor variables is statistically unimportant, it is common practice to fit an equation which simply ignores the unimportant subset. Unless the latter subset is orthogonal to the important one in the original set, the resulting fit will not be conditioned in the sense defined above.
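The constrained estimator (5.1.15) is straightforward to evaluate numerically. The following is a minimal sketch under stated assumptions: the helper name `constrained_ls`, the simulated data and the choice of which coefficient to set to zero are all hypothetical. It uses a general $K^t$ that picks out a single coefficient rather than the leading-block form of (5.1.16).

```python
import numpy as np

def constrained_ls(X, y, Kt, m):
    """Least squares for y = X b + e subject to Kt @ b = m, following (5.1.15)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                            # unconstrained estimator
    K = Kt.T
    # Adjustment term (X^t X)^{-1} K [K^t (X^t X)^{-1} K]^{-1} (K^t b - m)
    adj = XtX_inv @ K @ np.linalg.solve(Kt @ XtX_inv @ K, Kt @ b - m)
    return b - adj

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # intercept plus p = 3 regressors
y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(size=n)

# Hypothetical condition: the coefficient in position 2 is set equal to zero
Kt = np.zeros((1, 4))
Kt[0, 2] = 1.0
m = np.zeros(1)

b_star = constrained_ls(X, y, Kt, m)                 # conditioned estimate
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]         # unconditioned estimate
rss_star = np.sum((y - X @ b_star) ** 2)
rss_hat = np.sum((y - X @ b_hat) ** 2)
```

The conditioned estimate satisfies the constraint exactly, and its residual sum of squares cannot fall below that of the unconditioned fit.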
Hence, the corresponding estimator of the parameter vector is different from that of a corresponding conditioned fit, again unless orthogonality holds. The estimator is also of smaller order and is given by the non-null part of $\hat{\beta}^*_j$ in (5.1.19). It follows from (5.1.18) and (5.1.19) that one can correct the latter estimator and fill it out appropriately to obtain the corresponding conditioned estimator. In view of this, the remainder of this chapter will be based on conditioned component equations in the sense just defined.

5.2 Inference when $\varepsilon_j \sim N(\phi, I\sigma_j^2)$

When testing the significance of the total conditioned equation, the regression sum of squares is given by

$$SS_R(\beta^*_{TC} \mid \beta_{0TC}) = \Big(\sum_{j=1}^k \hat{\beta}_j^{*t}\Big) X^t \Big(\sum_{j=1}^k y_j\Big) - n\bar{y}_T^2. \tag{5.2.1}$$

Other sums of squares relating to this particular testing problem are obtained in an obvious way. We omit further details which, again, are completely obvious. We consider, instead, the problem of testing specific hypotheses relating to the total conditioned equation.

It has been demonstrated above that the estimator of the parameter vector $\beta_{TC}$ for the total conditioned equation is determined by summing corresponding components of conditioned component equations. Analytically, there are two possible ways in which a particular component of $\hat{\beta}_{TC}$ might turn out to be zero under additivity. First, a component of $\hat{\beta}_{TC}$ may be zero as a result of the cancellation law when adding negative and positive elements. While this is arithmetically plausible, it is not reasonable given that we have assumed a priori that each independent variable is statistically important. Secondly, if a particular independent variable has an estimated coefficient of zero in each individual component equation, then the corresponding component of $\hat{\beta}_{TC}$ will be zero. Again, this is unlikely since it is contrary to the hypothesis that each independent variable is important.
The foregoing suggests that an hypothesis which states that some component (or vector of a subset of components) of $\beta_{TC}$ is zero is not a reasonable hypothesis. Instead, an hypothesis that some component (or vector of a subset of components) of $\beta_{TC}$ is equal to $c$, where $c$ is a known scalar (or vector with all its components) different from zero, is a reasonable hypothesis. The specification of $c$ may be based on past experience or related analyses.

There are two ways in which $c$ may be specified. The specification of $c$ may be direct as described in the preceding paragraph. On the other hand, $c$ may be specified indirectly by simply specifying $c_j$ ($j = 1, 2, \ldots, k$) in a series of subhypotheses relating to each component conditioned equation. We emphasize that the value of $c$ is not specifically determined in the latter case, but it is determined under additivity of the conditioned component equations when they are fitted under the subhypotheses specified by the $c_j$. We shall refer to the test of hypothesis concerning $c$ when $c$ is specified directly as a direct test of the hypothesis $c$. The corresponding test when $c$ is specified indirectly through the $c_j$ ($j = 1, 2, \ldots, k$) will be referred to as an indirect test. We use this terminology only for its mnemonic appeal.

Let the conditioned total model specified by $\beta_{TC}$ be designated as the full model. Recall that $\hat{\beta}_{TC} = \sum_{j=1}^k \hat{\beta}^*_j$ and that the regression sum of squares due to fitting this model is $SS_R(\beta^*_{TC} \mid \beta_{0TC})$ as given in (5.2.1). Now consider fitting a version of the full model restricted so that some specified component(s) of $\beta_{TC}$ is (are) given by the scalar (vector) $c$. Let the latter model be indexed by the parameter vector $\beta^c_{TC}$. Note that $\hat{\beta}^c_{TC}$ is obtained by fitting the model indexed by $\beta_{TC}$ subject to the further restriction that some specified component(s) of $\beta_{TC}$ is (are) equal to $c$. Designate the regression sum of squares due to fitting the model indexed by $\beta^c_{TC}$ by $SS_R(\beta^{*c}_{TC} \mid \beta^c_{0TC})$.
This sum of squares is given by

$$SS_R(\beta^{*c}_{TC} \mid \beta^c_{0TC}) = (\hat{\beta}^c_{TC})^t X^t \Big(\sum_{j=1}^k y_j\Big) - n\bar{y}_T^2. \tag{5.2.2}$$

Since the model indexed by $\beta^c_{TC}$ is a restriction on the model indexed by $\beta_{TC}$, it follows that $SS_R(\beta^{*c}_{TC} \mid \beta^c_{0TC}) \le SS_R(\beta^*_{TC} \mid \beta_{0TC})$. Hence the sum of squares

$$\psi(c) = SS_R(\beta^*_{TC} \mid \beta_{0TC}) - SS_R(\beta^{*c}_{TC} \mid \beta^c_{0TC}) \tag{5.2.3}$$

can be interpreted as the extra sum of squares for testing the hypothesis that a specified component(s) of $\beta_{TC}$ is (are) equal to $c$. Thus, using $\psi(c)$ and the error sum of squares for the full model leads to a direct test for $c$.

Suppose, now, that $c$ is specified indirectly by specifying $c_j$ ($j = 1, 2, \ldots, k$). As before, let $\beta^*_j$ be the parameter vector corresponding to conditioned component model $j$. Suppose we fit the model indexed by $\beta^*_j$ under the subhypothesis that some component of $\beta^*_j$ is equal to $c_j$. This leads to estimators $\hat{\beta}^{*c}_j$ ($j = 1, 2, \ldots, k$), where the superscript $c$ on $\hat{\beta}^{*c}_j$ denotes the further restriction $c_j$. Again, if we denote the total conditioned model with the further restriction $c$ imposed indirectly through the $c_j$ as being indexed by $\beta^c_{TC}$, it follows by our result on additivity that $\hat{\beta}^c_{TC} = \sum_{j=1}^k \hat{\beta}^{*c}_j$. The regression sum of squares due to fitting the latter model is given by

$$SS_R(\beta^{*c}_{TC} \mid \beta^c_{0TC}) = \Big(\sum_{j=1}^k (\hat{\beta}^{*c}_j)^t\Big) X^t \Big(\sum_{j=1}^k y_j\Big) - n\bar{y}_T^2. \tag{5.2.4}$$

Once again, one has that $SS_R(\beta^{*c}_{TC} \mid \beta^c_{0TC}) \le SS_R(\beta^*_{TC} \mid \beta_{0TC})$, so that again the sum of squares

$$\psi^*(c) = SS_R(\beta^*_{TC} \mid \beta_{0TC}) - SS_R(\beta^{*c}_{TC} \mid \beta^c_{0TC}) \tag{5.2.5}$$

can be interpreted as the extra sum of squares for testing the hypothesis that a specified component(s) of $\beta_{TC}$ is (are) equal to $c$ via the $c_j$. Thus, using $\psi^*(c)$ and the error sum of squares for the full model leads to an indirect test for $c$.

It is worth emphasizing that the procedure for testing a hypothesis concerning $c$ directly or indirectly is applicable whether $c$ is a scalar or a vector.
Tests of hypotheses concerning individual components of $\beta_{TC}$ using the usual t-statistic and/or univariate confidence intervals about such components can be obtained in a manner similar to that described in chapter IV provided an estimate of the covariance matrix of $\hat{\beta}_{TC}$ is available. These tests and confidence intervals are especially likely to be misleading here, however, since the conditioning makes the components of $\hat{\beta}_{TC}$ correlated unless the columns of $X$ are orthogonal. Therefore, tests on individual components of $\beta_{TC}$ are best performed as outlined in the preceding paragraph. On the other hand, a confidence interval can be constructed about $y_{\ell T}$, say, corresponding to a given x-coordinate in an obvious way.

5.3 Estimation and Inference when $\varepsilon_j \sim N(\phi, V\sigma_j^2)$

Results presented in sections 5.1 and 5.2 relate to the distributional assumption $\varepsilon_j \sim N(\phi, I\sigma_j^2)$. These results carry over to the case $\varepsilon_j \sim N(\phi, V\sigma_j^2)$, for $V$ as defined in chapter IV, with the obvious modification that wherever $X$, $y_j$ and $y_T$ occur in the various expressions, they are replaced by $(P^t)^{-1} X$, $(P^t)^{-1} y_j$ and $(P^t)^{-1} y_T$, in that order, where $P^t P = V$ with $P$ as defined in that chapter. Thus, most of the results will involve $V^{-1}$ as demonstrated before.

In the following chapter, it is demonstrated that the problem of additivity when $r_j = p$ can be treated as a special case of additivity when $r_j \le p$. Such a demonstration provides a basis for constructing a unified theory relating to the additivity problem.

CHAPTER VI

6.0 A GENERALIZATION OF THE ADDITIVITY PROBLEM

Within the framework of the conditioning principle described in the preceding chapter, the fitting of the component and corresponding total equations when $r_j = p$ can be considered as a problem of fitting subject to 'null' or 'trivial' conditions. By null or trivial conditions we mean here that no further conditions are imposed on the $\beta_j$
($j = 1, 2, \ldots, k$) beyond the basic additivity requirement that $\sum_{j=1}^k \beta_j = \beta_T$. The point to observe here is that there is no requirement that any component(s) of $\beta_j$ be equal to zero. In terms of result (5.1.3) in the preceding chapter, this implies that $\hat{y}^*_{TC} = \hat{y}_T$ so that (5.1.3) reduces to

$$\hat{y}_{TC} = \hat{y}_T. \tag{6.0.1}$$

Note that $\hat{y}_T$, $\hat{y}_{TC}$ and $\hat{y}^*_{TC}$ were all defined in the previous chapter. Thus the correction factor or the conditioning factor is identically zero when $r_j = p$.

In terms of a result given in (5.1.14), the above implies that

$$\hat{\beta}_T = \sum_{j=1}^k \hat{\beta}^*_j \tag{6.0.2}$$

so that (5.1.14) then reduces to

$$\hat{\beta}_{TC} = \hat{\beta}_T. \tag{6.0.3}$$

Equations (6.0.1) through (6.0.3) suggest that the problem of additivity when $r_j = p$ can be treated as a special case of additivity when $r_j \le p$. In this connection, the theory of estimation and inference described in chapter V reduces to that presented in chapter IV when $r_j = p$. This generalization is significant, at least theoretically, since it makes it possible to visualize the additivity problem as defined here as one very general problem which can be studied under one unified theory of estimation and inference.

CHAPTER VII

7.0 OTHER ASPECTS OF THE ADDITIVITY PROBLEM

The development of the theory has, thus far, been based upon the assumption that $y_j$ and $y_\ell$ (and hence $\varepsilon_j$ and $\varepsilon_\ell$) are independent for each $j \neq \ell$ ($j, \ell = 1, 2, \ldots, k$). As indicated earlier in the thesis, however, there are examples of applications where this assumption is simply not tenable. The implications of this, especially for inference, are well worth considering and will be examined in this chapter. There will be occasion also to consider other general complements of the additivity problem such as that of the matrix $X$ not of full column rank and of $V$ not necessarily positive definite. These latter generalizations are useful when considering certain classes of the general linear model.
In particular, they permit the extension of the theory of additivity as developed here to classificatory models which are generally associated with designed experiments and are routinely analysed using analysis of variance procedures. Furthermore, it is noteworthy that it is generally assumed in most applications of regression analysis that the $X$ matrix is fixed (that is, that the independent variables are either known or are measured without error). Yet it is quite conceivable that the independent variables may themselves be random, just as the dependent variable $y$, or they may be fixed but measured with error. It is reasonable to consider briefly the implications for analysis of these possibilities, at least for completeness.

7.1 The Case $y_j$, $y_\ell$ ($j \neq \ell$) Dependent

As a preamble, suppose that $X_1, X_2, \ldots, X_k$ are multivariate normal random vectors, each of dimension $m$, with mean vectors $\mu_i$ ($i = 1, 2, \ldots, k$) and corresponding covariance matrices $\Sigma_i$ ($i = 1, 2, \ldots, k$). Now define $U = \sum_{i=1}^k X_i$ and suppose one is interested in the distribution of $U$. If, in addition, it is assumed that the vectors $X_i$ ($i = 1, 2, \ldots, k$) are independent, then by a well-known theorem in multivariate analysis (see Muirhead, 1982, p. 14), it follows that $U$ is $m$-variate normally distributed with mean vector $\mu = \sum_{i=1}^k \mu_i$ and covariance matrix $\Sigma_U = \sum_{i=1}^k \Sigma_i$. On the other hand, if the vectors $X_i$ ($i = 1, 2, \ldots, k$) are dependent, we know only that

$$E\,U = \sum_{i=1}^k \mu_i \tag{7.1.1}$$

and

$$\Sigma_U = \sum_{i=1}^k \Sigma_i + \sum_{i \neq j}\sum \Sigma_{ij}, \tag{7.1.2}$$

where $\Sigma_i$ is the variance-covariance matrix of $X_i$ and $\Sigma_{ij}$ is the covariance matrix of $X_i$ and $X_j$ ($i \neq j$). Hence it follows that $U \sim \big(\sum_{i=1}^k \mu_i,\ \sum_{i=1}^k \Sigma_i + \sum_{i \neq j}\sum \Sigma_{ij}\big)$. Note that we have only specified the parameters of the distribution of $U$ and not its form.
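The covariance identity (7.1.2) can be verified numerically by writing $U$ as a linear map of the stacked vector $(X_1^t, \ldots, X_k^t)^t$. The sketch below assumes an arbitrary positive definite joint covariance matrix; the dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
k, m = 3, 2                                          # hypothetical: k vectors of dimension m

# Hypothetical joint covariance of the stacked vector (X_1, ..., X_k)
G = rng.normal(size=(k * m, k * m))
S = G @ G.T + np.eye(k * m)                          # symmetric positive definite

# U = A @ stacked vector, with A = [I I ... I]
A = np.hstack([np.eye(m)] * k)
cov_U = A @ S @ A.T                                  # covariance of the sum

# blocks[i, :, j, :] is the m x m block Sigma_ij of S
blocks = S.reshape(k, m, k, m)
diag_part = sum(blocks[i, :, i, :] for i in range(k))              # sum of the Sigma_i
cross_part = sum(blocks[i, :, j, :]
                 for i in range(k) for j in range(k) if i != j)    # sum of Sigma_ij, i != j
```

Under independence every off-diagonal block is null, `cross_part` vanishes, and `cov_U` reduces to the sum of the $\Sigma_i$.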
Indeed, as far as is known, without making any assumptions concerning the form of dependency of the summands in the definition of $U$, the exact distribution of $U$ under dependence is largely an outstanding problem in mathematical statistics. However, as suggested above, given some knowledge of the form of dependency among the $k$ vectors $X_i$, it is possible to obtain a distribution for $U$ (Olkin, 1983, personal communication). Furthermore, one might surmise that the distribution of $U$ under dependence of the component vectors might be derivable as a multivariate generalization of the univariate analogue considered by Springer (1979, pp. 72-75). Even so, however, the explicit representation of such a distribution is likely to be nontrivial.

The representation of $\Sigma_U$ given in (7.1.2) derives from a simple generalization of the univariate case to the multivariate case. For the analogous univariate result, see Mood, Graybill, and Boes (1974, p. 179). Note that independence of the vectors $X_i$ ($i = 1, 2, \ldots, k$) implies that $\Sigma_{ij} = 0$, the null matrix, in (7.1.2).

Now let $U = y_T$ and $X_j = y_j$ ($j = 1, 2, \ldots, k$); then for $y_j$, $y_\ell$ dependent for each $j \neq \ell$, it follows that

$$y_T = \sum_{j=1}^k y_j \sim \Big(X \sum_{j=1}^k \beta_j,\ \sum_{j=1}^k \Sigma_j + \sum_{j \neq \ell}\sum \Sigma_{j\ell}\Big).$$

For simplicity in what follows, we shall mostly use $\Sigma_T$ to designate $\sum_{j=1}^k \Sigma_j + \sum_{j \neq \ell}\sum \Sigma_{j\ell}$. Even given that the $y_j$ ($j = 1, 2, \ldots, k$) are individually multivariate normal, under dependence it is not known what the distribution of $y_T$ is exactly. What is known is that $y_T$ is either multivariate normal or is not multivariate normal (see Kale, 1970). Examples are found in the literature of linear combinations of normal random variables which are themselves (the linear combinations, that is) not normal (e.g., Rosenberg, 1965; Behboodian, 1972) and of marginally normal random variables whose joint distributions are not normal (e.g., Ruymgaart, 1973). These results of course generalize to vector random variables.
The overall implication of this is that lack of knowledge of the exact distribution of $y_T$ and, in particular, its probable non-normality renders the construction of a small-sample theory of inference considerably more difficult.

Given lack of knowledge of the exact distribution of $y_T$, a small-sample theory of inference for the total regression model is constructible on the basis of normality of $y_T$ if one can demonstrate that the vectors $y_j$ ($j = 1, 2, \ldots, k$) are jointly multivariate normal. This is the case because it is well known that if $y_1, y_2, \ldots, y_k$ are jointly multivariate normal, then every linear function of these $y_j$'s is multivariate normal. (Note that the $y_j$'s are vectors here.) This result follows from a characterization of the bivariate normal distribution which generalizes to other joint multivariate normal distributions (see Rao, 1965, pp. 437-438). Therefore, in our instance, under dependence of the $y_j$'s, normal theory can be used to construct inferences concerning $y_T$ or $\beta_T$ or both if it can be shown that $y_1, y_2, \ldots, y_k$ are jointly multivariate normal. This suggests the need for methods of assessing multivariate normality based upon realizations of the vectors $y_1, y_2, \ldots, y_k$.

Graphical methods of assessing multivariate normality have been proposed in the literature (e.g., Healy, 1968; Cox, 1968; Andrews, Gnanadesikan, and Warner, 1973). Other authors have proposed analytical significance tests for testing for multivariate normality (e.g., Malkovich and Afifi, 1973; Hawkins, 1981). More recently, however, Koziol (1982) introduced a test for assessing multivariate normality which is fairly easy to use and has some nice properties. If a test for joint multivariate normality such as Koziol's (1982) leads one to entertain joint multivariate normality, then one proceeds to make inferences concerning $y_T$ or $\beta_T$ based upon the usual normality assumptions.
If, on the other hand, joint multivariate normality is rejected, then one can either appeal to asymptotic results to construct approximate tests or resort to nonparametric approaches. We shall discuss the latter approach only briefly in this thesis. But first, let us tackle the problem of estimation.

7.1.1 Estimation for Total Model under Dependence

As observed earlier, estimation should not, in general, be hampered by lack of knowledge of the distribution of $y_T$ and, in particular, by its non-normality. Consider estimation for the total model when $\varepsilon_j \sim N(\phi, I\sigma_j^2)$ and when $\varepsilon_j \sim N(\phi, V\sigma_j^2)$. It is demonstrated in this section that when the $y_j$'s are dependent, the concept of additivity, as defined here, does not hold. This follows from the following reasoning.

When the $\varepsilon_j$ are dependent, one has that $\varepsilon_T \sim \big(\phi,\ \sum_{i=1}^k \Sigma_i + \sum_{i \neq j}\sum \Sigma_{ij}\big)$ in general. In the case where $\varepsilon_j \sim N(\phi, I\sigma_j^2)$ ($j = 1, 2, \ldots, k$), the first term in the variance-covariance matrix of $\varepsilon_T$ reduces to $I\sigma_T^2$, where $\sigma_T^2 = \sum_{j=1}^k \sigma_j^2$. Furthermore, it is easy to see that $\Sigma_{ij} = I\rho_{ij}\sigma_i\sigma_j$ so that $\sum_{i \neq j}\sum \Sigma_{ij} = I \sum_{i \neq j}\sum \rho_{ij}\sigma_i\sigma_j$. Therefore, under dependence of the $\varepsilon_j$ with $\varepsilon_j \sim N(\phi, I\sigma_j^2)$, $\varepsilon_T \sim (\phi, I\sigma^2)$ where $\sigma^2 = \sum_{i=1}^k \sigma_i^2 + \sum_{i \neq j}\sum \rho_{ij}\sigma_i\sigma_j$. Thus the variance-covariance matrix of $\varepsilon_T$ is diagonal. On the other hand, when $\varepsilon_j \sim N(\phi, V\sigma_j^2)$, the first term in the variance-covariance matrix of $\varepsilon_T$ becomes $V\sigma_T^2$, with $\sigma_T^2$ as defined above, so that $\varepsilon_T \sim \big(\phi,\ V\sigma_T^2 + \sum_{i \neq j}\sum \Sigma_{ij}\big)$ where $\Sigma_{ij}$ is not diagonal.

Recall that under the assumption of independence of the $y_j$'s and $V$ positive definite, the transformation matrix $P$, such that $P^t P = V$, was the same for each component model and the total model. As a consequence, additivity followed naturally since

$$\hat{\beta}_T = (X^t V^{-1} X)^{-1} X^t V^{-1} y_T = (X^t V^{-1} X)^{-1} X^t V^{-1} \sum_{j=1}^k y_j = (X^t V^{-1} X)^{-1} X^t V^{-1} [y_1 + \cdots + y_k] = \sum_{j=1}^k \hat{\beta}_j. \tag{7.1.1.1}$$

Note that the above result holds also when $V = I$.
Now suppose that under dependence of the $\varepsilon_j$ ($j = 1, 2, \ldots, k$), the variance-covariance matrix of $\varepsilon_T$ may be written in the form $V_T\sigma_T^2$ for some matrix $V_T$. First, note that when $\varepsilon_j \sim N(\phi, V\sigma_j^2)$ with $V$ positive definite, there is no guarantee that $V_T$ is positive definite, though we know that it is at least positive semi-definite. Secondly, even if $V_T$ were positive definite, it is obvious that the matrix $P_T$ such that $V_T = P_T^tP_T$ is not necessarily equal to the matrix $P$ which transforms each of the component equations. This holds when $\varepsilon_j \sim N(\phi, I\sigma_j^2)$ because $I\sigma_T^2 \ne I\sigma_j^2$. Consequently, when the $y_j$ are dependent, $\hat\beta_T$ cannot be determined from the additivity property. This result must obviously hold for more general $V$ and $V_T$. Assuming positive definiteness of $V$ and $V_T$, nonadditivity is demonstrated as follows. Note that

$$\hat\beta_T = (X^tV_T^{-1}X)^{-1}X^tV_T^{-1}y_T = (X^tV_T^{-1}X)^{-1}X^tV_T^{-1}\sum_{j=1}^{k}y_j \ne \sum_{j=1}^{k}\hat\beta_j = (X^tV^{-1}X)^{-1}X^tV^{-1}\sum_{j=1}^{k}y_j. \qquad (7.1.1.2)$$

Indeed, equality holds only if $V_T = V$, which implies that $\Sigma_{ij} = 0$ for all $i \ne j$ and, therefore, independence of the $y_j$'s.

The above results suggest that when the $y_j$'s are dependent, the parameters of the total equation must be determined by actually fitting the corresponding total equation rather than from additivity. An exception to this would be in those cases where dependence is so weak that the first term in the covariance matrix dominates the second term (namely $\sum_{i \ne j}\sum \Sigma_{ij}$) in the sense that the entries of each $\Sigma_{ij}$ are close to zero. But this simply implies that independence largely obtains.

Since inference for the total model is essentially linked to estimation, it follows that inferences concerning parameters of the total model when the $y_j$ are dependent can only be made after fitting the relevant total models directly. Therefore, under dependence of the $y_j$, one must actually fit the total model in order to estimate its parameters and make inferences about them.

Let us return to the problem of inference under dependence.
Since, as observed earlier, $\varepsilon_T$ may or may not be multivariate normal, one has several options. First, one can test for joint multivariate normality (JMVN) of $y_1, y_2, \ldots, y_k$ along the lines of Koziol (1982). If the test shows that joint multivariate normality is tenable, then one uses normal theory to make inferences concerning the total equation as described below. If joint multivariate normality is not tenable, one may examine $y_T = \sum_{j=1}^{k}y_j$ directly for normality, since lack of joint multivariate normality does not rule out the possibility that $y_T$ is normal. If both joint multivariate normality of the $y_j$'s ($j = 1, 2, \ldots, k$) and normality of $y_T$, by direct examination of $\sum_{j=1}^{k}y_j$, are not tenable, then one may use nonparametric procedures. If certain conditions are met, one may use asymptotic results to arrive at approximate inferences (see Arnold, 1981, sections 10.1, 10.3) when $y_T$ is not normal. This latter option is not considered further here. However, the other approaches are considered briefly below.

7.1.2 Inference for Total Model when $y_1, y_2, \ldots, y_k$ are JMVN

When a procedure for assessing joint multivariate normality such as Koziol's (1982) leads to the conclusion that the assumption of joint multivariate normality is reasonable for $y_1, y_2, \ldots, y_k$, one treats $y_T = \sum_{j=1}^{k}y_j$ as multivariate normal with mean $X\beta_T$ and variance-covariance matrix $\sum_{j=1}^{k}\Sigma_{jj} + \sum_{i \ne j}\sum \Sigma_{ij}$. The normality of $y_T$ when $y_1, y_2, \ldots, y_k$ are jointly multivariate normal is a standard result in mathematical statistics, as suggested in section 7.1.

Recall that if $\varepsilon_j \sim N(\phi, I\sigma_j^2)$ ($j = 1, 2, \ldots, k$), then with the $\varepsilon_j$ jointly multivariate normal, $\varepsilon_T \sim N\big(\phi, I(\sum_{j=1}^{k}\sigma_j^2 + \sum_{i \ne j}\sum \rho_{ij}\sigma_i\sigma_j)\big)$. With these conditions, fitting $y_T = X\beta_T + \varepsilon_T$ directly by ordinary least squares yields BLUE's for $\beta_T$ and $\sigma_T^2 = \sum_{j=1}^{k}\sigma_j^2 + \sum_{i \ne j}\sum \rho_{ij}\sigma_i\sigma_j$ in the sense of the Gauss-Markoff theorem. These estimators, denoted by $\hat\beta_T$ and $\hat\sigma_T^2 = \mathrm{MSE}$, respectively, are also maximum likelihood estimators.
It is noteworthy that the components of $\sigma_T^2$ are not estimable directly from the total model. However, the usual analysis of variance tests for the total model based upon the above estimates are valid, and confidence intervals may be constructed on individual components of $\beta_T$ and on $y_T$ essentially as explained in chapter IV. This approach is also applicable if a direct examination of $y_T = \sum_{j=1}^{k}y_j$ suggests that it is normally distributed.

When $\varepsilon_j \sim N(\phi, V\sigma_j^2)$ and the $\varepsilon_j$ are jointly multivariate normal, then $\varepsilon_T \sim N(\phi, V\sum_{j=1}^{k}\sigma_j^2 + \sum_{i \ne j}\sum \Sigma_{ij})$. If it is possible to write $V\sum_{j=1}^{k}\sigma_j^2 + \sum_{i \ne j}\sum \Sigma_{ij}$ as $V_T\sigma_T^2$ for some positive definite matrix $V_T$, then it is clear by results given in chapter IV that there exists a matrix $P_T$ such that $V_T = P_T^tP_T$. Therefore, generalized least squares applied to the total equation yields BLUE's which are also maximum likelihood estimators. Tests of hypotheses concerning $\beta_T$ or its components can be achieved as outlined in section 4.2. If, on the other hand, the variance-covariance matrix of $\varepsilon_T$ is simply positive semi-definite rather than positive definite, then an approach such as is used by Zyskind (1967) and Zyskind and Martin (1969) may be employed for estimation and inference.

7.1.3 Inference for Total Model when $y_1, y_2, \ldots, y_k$ are not JMVN

It has been noted above that when $y_1, y_2, \ldots, y_k$ are not jointly multivariate normal, it is still possible that $y_T = \sum_{j=1}^{k}y_j$ is multivariate normal. It has also been indicated that when $y_T$ is multivariate normal, estimation and inferences relating to the total equation fitted directly can be carried out as outlined in the preceding section. When $y_1, y_2, \ldots, y_k$ are not jointly multivariate normal and $y_T = \sum_{j=1}^{k}y_j$ is not normal, one of the options left for inferences for the total model is the use of nonparametric procedures.
It is not the objective here to pursue the subject in detail but, for purposes of completeness, to indicate what procedures are available and to give possible references. Randles and Wolfe (1979) give a nonparametric approach to testing the slope in simple linear regression. Clearly, this is of limited use in our context. However, both the estimation problem and inference procedures for more general regression problems are considered in chapter 9 of Hollander and Wolfe (1973). Another way of dealing with non-normality of $y_T$ in estimation and inference for the total equation is to use any of a number of so-called robust regression techniques. One such technique is known as robust ridge regression, proposed by Hogg (1979). For further references to some of these robust regression techniques see Montgomery and Peck (1982, section 9.3). Finally, before leaving the subject of nonparametric approaches and how they might be applied to fitting a total model under non-normality, it is worth mentioning two fairly novel nonparametric methods which are applicable to regression situations. These are the jackknife and the bootstrap. For a reference to use of the jackknife in regression see Miller (1974), and for application of bootstrap techniques in regression see Efron (1979).

7.2 The Case of X Random or Measured with Error

It is generally assumed in regression applications that the independent variables are either fixed and known or that they are measured without error. Indeed, in most of the development in this thesis, this has been assumed implicitly. In such situations, only the dependent variable y is assumed random. In many biological applications, however, it is often the case that both y and the independent variables are random. Alternatively, it may well be that the independent variables are in fact fixed but are measured with error.
In the following, these two possibilities are considered in the light of their implication for estimation and inference for the total model. Attention is restricted in both cases to the situation of independent $y_j$'s. Without losing sight of the additivity problem, it will suffice here to examine the consequence for estimation and inference on a component equation which, for simplicity, will be referred to without the $j$ subscript as $y = X\beta + \varepsilon$.

7.2.1 The Case of X Random

Sampson (1974) distinguishes between two related regression schemes. One scheme is that in which the independent variables are constant or fixed, as is often assumed. He refers to this simply as regression analysis. The other is that in which the independent variables are random variables (or realizations of random variables). This latter regression scheme is referred to as multivariate analysis of regression. We concern ourselves in the present section with the latter scheme. The objective here is not to provide a detailed analysis of the situation but to highlight the effect that randomness of X may have on estimation and inference for component model $j$ and, hence, for the total equation. The following is largely due to Sampson (1974).

The multivariate analysis of regression scheme assumes that the vector y and the vectors consisting of the columns of the matrix X form a multivariate random variable (or a realization of a multivariate random variable). In the present case, it will be assumed that the joint distribution is multivariate normal. Denote the continuous random variable corresponding to the independent variable by X (where X is p-dimensional) and let X* be a realization of X. The random variable corresponding to the dependent variable is denoted by y and its realization by y*. For a sample of size $n$, $\{(y_i, x_i^t)^t,\ i = 1, 2, \ldots, n\}$, let $z_i = (y_i, x_i^t)^t$, with corresponding realizations $\{(y_i^*, x_i^{*t})^t,\ i = 1, 2, \ldots, n\}$ and $\{z_i^*,\ i = 1, 2, \ldots, n\}$.
In the multivariate analysis of regression, it is assumed that for $1 \le i \le n$, the $z_i$ are independently and identically distributed according to $N(\phi, \Sigma)$. In the multivariate analysis of regression model, the parameters equivalent to $\beta$ and $\sigma^2$ in the regression analysis model are $\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, where

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \qquad (7.2.1.1)$$

and $\Sigma_{21}$ is $p \times 1$. As stated by Sampson (1974), the justification for the appropriateness of $\Sigma_{22}^{-1}\Sigma_{21}$ as a parameter vector is that for $1 \le i \le n$, $E(y_i - x_i^t\gamma)^2$ is minimized for $\gamma = \Sigma_{22}^{-1}\Sigma_{21}$, so that $x_i^t\Sigma_{22}^{-1}\Sigma_{21}$ is the best linear predictor of $y_i$ in the sense of minimizing squared error loss. Thus, when one speaks of regression coefficients in the multivariate analysis of regression situation, one speaks of [...] probability distribution law of the argument. The relationship between regression analysis and (7.2.1.2) should be fairly obvious.

Without going into further technical details, we state results concerning estimation and inference in multivariate analysis of regression and how they relate to corresponding results in regression analysis with fixed or nonrandom X. An important result concerning estimation in multivariate analysis of regression is that although the maximum likelihood (ML) estimators are necessarily different from those in regression analysis (mainly because they are defined on different sample spaces), the corresponding ML estimates under the two models are exactly the same. Thus estimation under the two model formulations is the same. However, Sampson (1974) shows that for testing hypotheses in the two situations, the power functions are different. This is a significant result in that it stresses the importance of using a correct model in order to obtain tests with the correct power. This result is of considerable relevance in the present and other biological applications where X may in fact be random rather than fixed, as is often assumed in regression situations.
The implication for additivity is that testing is affected by randomness of X but estimation is not.

7.2.2 The Case when X is Measured with Error

In regression situations where the independent variables may reasonably be considered fixed, it is conceivable that an error may be introduced when measuring X at its fixed value. It is noteworthy that this problem is not necessarily the same as that of random X unless further assumptions are made about both X and y. The main objective here is to demonstrate the effect upon estimation and inference when X is measured with error. For simplicity, restrict attention to the simple linear regression model

$$y = \beta_0 + \beta_1X + \varepsilon, \qquad (7.2.2.1)$$

where it is assumed that $\varepsilon \sim N(0, \sigma_\varepsilon^2)$ and $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \ne j$. Now if X is measured with error, one does not observe X directly but rather observes

$$X' = X + \delta, \qquad (7.2.2.2)$$

where X is the true value and $\delta$ is a measurement error. Suppose that $\delta \sim N(0, \sigma_\delta^2)$, $X \sim N(\mu_X, \sigma_X^2)$ and that $\varepsilon$, $\delta$, and X are independent. Then Y and X' follow a bivariate normal distribution (Snedecor and Cochran, 1973) and the regression of Y on X' is linear with regression coefficient

$$\beta_1' = \beta_1/(1 + \lambda), \qquad (7.2.2.3)$$

where $\lambda = \sigma_\delta^2/\sigma_X^2$. Thus, when X is measured with error, the least squares estimate obtained by regressing Y on X' is biased in that it underestimates, in absolute value, the true regression coefficient of Y on X. When X is not normal, the above result holds in large samples and holds approximately in small samples if $\lambda$ is small (see Snedecor and Cochran, 1973). Inferences concerning y or the regression coefficient are valid if X is measured with error provided that $\varepsilon$, $\delta$, and the true X are approximately normal. However, predictions of y are less precise because of the increase in residuals as a result of errors in X. The results given above have some relevance in the additivity problem.
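The attenuation in (7.2.2.3) is easy to see in a small simulation. The sketch below is illustrative only: the coefficient values, the variances, the sample size, and the seed are all arbitrary assumptions, and `ols_slope` is a helper written here for the purpose. With $\sigma_\delta^2 = \sigma_X^2$ one has $\lambda = 1$, so the slope from regressing Y on X' should settle near half of $\beta_1$, while the slope from regressing Y on the true X should settle near $\beta_1$.

```python
import random

random.seed(12345)  # arbitrary seed, for reproducibility only

# Hypothetical true values (assumptions for illustration).
beta0, beta1 = 1.0, 2.0
sigma_x, sigma_delta, sigma_eps = 1.0, 1.0, 1.0
lam = sigma_delta ** 2 / sigma_x ** 2   # lambda in (7.2.2.3)
n = 20000

x_true = [random.gauss(5.0, sigma_x) for _ in range(n)]
x_obs = [x + random.gauss(0.0, sigma_delta) for x in x_true]            # X' = X + delta
y = [beta0 + beta1 * x + random.gauss(0.0, sigma_eps) for x in x_true]  # model (7.2.2.1)


def ols_slope(x, y):
    """Least squares slope of y on x in simple linear regression."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


slope_true_x = ols_slope(x_true, y)   # should be near beta1
slope_obs_x = ols_slope(x_obs, y)     # should be near beta1 / (1 + lambda)
expected = beta1 / (1 + lam)

print("slope of Y on true X :", slope_true_x)
print("slope of Y on X'     :", slope_obs_x)
print("beta1 / (1 + lambda) :", expected)
```

Changing `sigma_delta` changes $\lambda$ and hence the degree of attenuation; as `sigma_delta` shrinks toward zero the two slopes should approach one another.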
More importantly, these results point to the need for a proper regression approach if proper estimates and inferences are to be made. For other aspects of this problem see Wald (1940), Berkson (1950), and Madansky (1959).

7.3 Other General Complements

In more general applications of the general linear model, it is not uncommon that the matrix X is not of full column rank. Suppose, first, that $\varepsilon_j \sim N(\phi, I\sigma_j^2)$. Observe that $X^tX$ is singular and, therefore, a unique solution does not exist for the least squares problem. An optimal solution is obtainable, however, by using the well-known concept of a generalized inverse. Let us begin by considering a particular generalized inverse, one commonly referred to as the Moore-Penrose inverse (Moore, 1920; Penrose, 1955) but also often called the pseudo-inverse or, simply, p-inverse. Attention is restricted to real X throughout; but first some definitions are in order, given here as theorems.

Theorem 1. Suppose X is a real $n \times (p + 1)$ matrix with rank$(X) = r < p + 1$. Then the $(p + 1) \times (p + 1)$ matrix $X^tX$ has exactly $r$ positive eigenvalues $\lambda_1^2 \ge \lambda_2^2 \ge \cdots \ge \lambda_r^2 > 0$ plus the zero eigenvalue with multiplicity $p + 1 - r$.

The next theorem is based on a well-known theorem in matrix algebra called the Singular-Value Decomposition theorem.

Theorem 2. With X satisfying theorem 1, one can always find an $n \times n$ orthogonal matrix U and a $(p + 1) \times (p + 1)$ orthogonal matrix G such that $\Lambda = U^tXG$ and $X = U\Lambda G^t$, with $\Lambda$ the $n \times (p + 1)$ matrix

$$\Lambda = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix},$$

where D is an $r \times r$ diagonal matrix with $i$th diagonal element $d_{ii} = \lambda_i > 0$ for $1 \le i \le r$. The expression of X in the form $X = U\Lambda G^t$ is termed the singular-value decomposition of X.

One must remark that U and G in the above theorem are not necessarily unique. The real importance of theorem 2 is that a decomposition of the matrix X exists, a result which leads to definition of the Moore-Penrose inverse as follows.

Theorem 3.
If, from the $n \times (p + 1)$ matrix $\Lambda$ of theorem 2, one defines $\Lambda^+$ as the $(p + 1) \times n$ matrix

$$\Lambda^+ = \begin{pmatrix} D^{-1} & 0 \\ 0 & 0 \end{pmatrix},$$

then the Moore-Penrose inverse (pseudo-inverse) of the matrix X is given by $X^+ = G\Lambda^+U^t$, where G and U are as specified in theorem 2.

With $X^+$ defined as above, an optimal least squares solution for a model of the form $y = X\beta + \varepsilon$ is given by

$$\hat\beta = X^+y. \qquad (7.2.1)$$

Pertaining to the additivity problem of the discourse, result (7.2.1) implies that

$$\hat\beta_j = X^+y_j \qquad (7.2.2)$$

for component equation $j$ ($j = 1, 2, \ldots, k$), with

$$\hat\beta_T = \sum_{j=1}^{k}\hat\beta_j \qquad (7.2.3)$$

for the total model. Note that inference theory relating to the total model as discussed elsewhere in the thesis now incorporates $X^+$ in an obvious way. Note, for instance, that when $\varepsilon_j \sim N(\phi, I\sigma_j^2)$, the covariance matrix of $\hat\beta_T$ is given by

$$\Sigma_{\hat\beta_T} = X^+X^{+t}\sum_{j=1}^{k}\sigma_j^2, \qquad (7.2.4)$$

a result which can be derived easily from (7.2.2) and (7.2.3).

When $\varepsilon_j \sim N(\phi, V\sigma_j^2)$, using our usual transformation, $\hat\beta_j$ is given by

$$\hat\beta_j = X^{*+}y_j^*, \qquad (7.2.5)$$

so that

$$\hat\beta_T = \sum_{j=1}^{k}\hat\beta_j = X^{*+}\sum_{j=1}^{k}y_j^*. \qquad (7.2.6)$$

In (7.2.5) and (7.2.6), $X^{*+}$ is the Moore-Penrose inverse of $X^* = (P^t)^{-1}X$ and $y_j^* = (P^t)^{-1}y_j$. Again, the incorporation of these results in inference theory is a straightforward exercise and is omitted here.

However, some remarks are in order with respect to $X^+$ (or $X^{*+}$). First, it is noteworthy that although U and G are not necessarily unique in the decomposition of X given by theorem 2, the Moore-Penrose inverse $X^+$ (or $X^{*+}$) is unique. Therefore, different U and G will lead to the same $X^+$ and, hence, the same optimal solution $\hat\beta$. It was mentioned earlier that when X is not of full column rank, there is no unique solution to the least squares problem of fitting $y = X\beta + \varepsilon$ or its corresponding transform. While this is so, it is remarkable that the solution (7.2.1) or its transformed version is optimal in the sense that it is the only least squares solution of least 2-norm; that is, it is the best solution to the least squares problem. When X is square and nonsingular, then $X^+ = X^{-1}$, the unique inverse of X.
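The additivity stated in (7.2.2) and (7.2.3) can be illustrated on a small rank-deficient design. The sketch below is only an illustration under stated assumptions: instead of the singular-value decomposition of theorem 2, it computes the Moore-Penrose inverse from a full-rank factorization $X = BC$, for which $X^+ = C^t(CC^t)^{-1}(B^tB)^{-1}B^t$ is a standard identity, purely so that exact rational arithmetic can be used. The matrices `B` and `C`, the data vectors, and the helper names are hypothetical.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(A):
    """Exact inverse of a nonsingular square matrix by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in A[i]] + [Fraction(int(i == j)) for j in range(n)]
         for i in range(n)]
    for c in range(n):
        pivot_row = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot_row] = M[pivot_row], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def pinv_from_factors(B, C):
    """Moore-Penrose inverse of X = B C (B full column rank, C full row rank)."""
    Bt, Ct = transpose(B), transpose(C)
    left = matmul(Ct, inverse(matmul(C, Ct)))    # C^t (C C^t)^-1
    right = matmul(inverse(matmul(Bt, B)), Bt)   # (B^t B)^-1 B^t
    return matmul(left, right)


# Hypothetical design: third column = first + second, so rank(X) = 2 < 3.
B = [[1, 0], [1, 1], [1, 2], [1, 3]]
C = [[1, 0, 1], [0, 1, 1]]
X = matmul(B, C)
X_plus = pinv_from_factors(B, C)

# Two hypothetical component vectors and their total.
y1 = [[1], [3], [2], [5]]
y2 = [[2], [2], [4], [3]]
yT = [[a[0] + b[0]] for a, b in zip(y1, y2)]

b1, b2, bT = matmul(X_plus, y1), matmul(X_plus, y2), matmul(X_plus, yT)
b_sum = [[u[0] + v[0]] for u, v in zip(b1, b2)]

print("X+ y_T             :", bT)
print("sum of the X+ y_j  :", b_sum)
```

Because $X^+$ is a fixed matrix once X is given, the total solution is the sum of the component solutions for any choice of the $y_j$; the particular numbers above play no special role.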
Finally, it is important to mention that the real practical usefulness of $X^+$ hinges upon the ease with which it can be determined in any one problem. It turns out that $X^+$ is relatively easy to compute when X has a few columns. However, the task of computing $X^+$ becomes increasingly more difficult with an increasing number of columns in X. Since many practical problems tend to involve an X matrix with a fairly large number of columns (especially in classificatory models), the use of $X^+$ often presents a computational barrier. Largely because of this, a more general (weaker) generalized inverse is used in many singular situations. The following is only an introduction to this type of inverse. For more complete treatments, see Searle (1971) and Rao and Mitra (1966).

The Moore-Penrose inverse described above satisfies the following conditions:

(i) $XX^+X = X$
(ii) $X^+XX^+ = X^+$
(iii) $(X^+X)^t = X^+X$
(iv) $(XX^+)^t = XX^+$ (7.2.7)

If one defines a matrix $X^-$ satisfying only condition (i) in (7.2.7), that is, satisfying $XX^-X = X$, then $X^-$ is termed a generalized inverse of X (see Searle, 1971). Unlike $X^+$, the Moore-Penrose inverse, $X^-$ is not unique. However, $X^-$ is considerably easier to compute than $X^+$. Furthermore, any $X^-$ has the property that it generates all possible solutions relating to any given estimation problem, and these solutions are invariant under affine transformations. The latter property is of value with regard to estimation and inference for linear functions of the parameters in a given problem. It should be pointed out that $X^-$ enters into inference theory in much the same way that the Moore-Penrose inverse does. Further details relating to the use of $X^-$ are omitted here as they can be found elsewhere (e.g., Searle, 1971; Rao and Mitra, 1971).

Most of the results presented so far are based upon the assumption that whenever $\varepsilon_j \sim N(\phi, V\sigma_j^2)$, then V is positive definite and known.
While this is commonly true and lends itself to fairly straightforward mathematical manipulations, there are instances in which V is not necessarily positive definite. In addition, the elements of V may be unknown. The general approach is indicated here for the case where V is nonnegative definite and known. The case of V unknown is considerably more difficult. As was the case when X was not of full column rank, the concept of the generalized inverse is employed when dealing with models where V is not positive definite. In the general case, a solution for model $j$ ($j = 1, 2, \ldots, k$) would be given by

$$\hat\beta_j = (X^tV^-X)^-X^tV^-y_j, \qquad (7.2.8)$$

where $V^-$ is any generalized inverse of V. Corresponding results for additivity and the associated inference problem generally correspond to those presented earlier, with the obvious modification that $V^-$ is used in place of $V^{-1}$. Other details are given in Searle (1971, section 5.8), while another fairly instructive approach is given by Zyskind (1967) and Zyskind and Martin (1969).

CHAPTER VIII

8.0 SOME EXTENSIONS OF THE THEORY

It is well known that data sets generated under experimental conditions according to a predetermined design model can be analysed using the general regression approach. Indeed, although the conventional analysis of variance approach is used in analysing most such data sets, the general regression approach often represents the most efficient and, at times, the only exact method of analysis, especially for unbalanced situations. In view of this link between regression analysis and conventional analysis of variance, it is reasonable to ask whether the problem of additivity, as defined here, cannot be envisaged within the context of classificatory models. It is shown, in this chapter, that an extension of the additivity problem to classificatory models is not only theoretically plausible but also makes sense in some practical situations.
Secondly, in view of the growing interest in the use of nonlinear models in many branches of applied biology, it is of interest to investigate the extent to which the concept of additivity, as understood here, can be expected to hold in nonlinear situations. In a forestry context, such an investigation has an important bearing upon the determination of total volume or weight biomass of individual trees from corresponding component biomass using any of the well-known nonlinear models, such as the Chapman-Richards function. The main thrust of the development in this chapter will, then, be directed towards establishing the extent to which the theory of additivity, as developed in chapters IV, V, and VI, applies to classificatory and nonlinear models.

8.1 Extension to Classificatory Models

A theoretical justification for the extension of the theory of additivity to classificatory models is based upon the fact that any classificatory model can be equivalently expressed in linear regression form. A special feature to note about such a model is that the incidence matrix, otherwise known as the design matrix, is, in general, not of full column rank. Therefore, no unique solution exists for the estimation problem using least squares. Hence, one either uses the unique Moore-Penrose inverse to obtain an optimal solution or uses a generalized inverse to arrive at a solution. As indicated elsewhere in the discourse, the decision to use the Moore-Penrose inverse or a generalized inverse will depend upon considerations of computational efficiency. Furthermore, the incorporation of results from the estimation problem into inference involves the simple substitution of expressions involving the appropriate generalized inverse into statistics derived in earlier chapters. To indicate the practicality of the additivity problem in the context of a classificatory model, we describe below how such a problem might arise in practice.
We draw our example from the field of agriculture.

For simplicity, consider a controlled field crop experiment involving $a$ treatments, each replicated $n$ times. It is a simple matter to recognize the design here as a completely randomized design. We shall suppose that the leaf component of the biomass of the crop under investigation is used for human consumption as a vegetable. Further, suppose that the floral component of the crop is used as a different type of vegetable food. Next, suppose that the seed component is used as another type of food while the remaining unusable above-ground part of the plant is burned as a fuel. If one is interested in the effect of treatment upon the accumulation of biomass (on a green weight basis) in component $j$ ($j = 1, 2, 3, 4$) and the corresponding effect on total biomass accumulation (resulting from adding the four components), then one has the following problem. Designate the observed biomass of the $j$th component corresponding to the $r$th replicate of the $i$th treatment by $y_{ijr}$ ($i = 1, 2, \ldots, a$; $j = 1, 2, 3, 4$; $r = 1, 2, \ldots, n$). Then one would be interested in models of the form

$$y_{ijr} = \mu_j + \tau_{ij} + \varepsilon_{ijr} \qquad (8.1.1)$$

and

$$y_{iTr} = \sum_{j=1}^{4}y_{ijr} = \mu_T + \tau_{iT} + \varepsilon_{iTr}, \qquad (8.1.2)$$

with appropriate assumptions on the errors. Note that by using an appropriate incidence matrix X, one may write (8.1.1) and (8.1.2), respectively, in matrix form as

$$y_j = X\theta_j + \varepsilon_j \quad (j = 1, 2, 3, 4) \qquad (8.1.3)$$

and

$$y_T = \sum_{j=1}^{4}y_j = X\theta_T + \varepsilon_T. \qquad (8.1.4)$$

We remark that $\theta_j = (\mu_j, \tau_{1j}, \tau_{2j}, \ldots, \tau_{aj})^t$ while $\theta_T = (\mu_T, \tau_{1T}, \tau_{2T}, \ldots, \tau_{aT})^t$. Clearly, the estimation and inference theory presented elsewhere in this thesis can be applied to (8.1.3) and (8.1.4) subject only to the proviso that a generalized inverse or the Moore-Penrose inverse is used in place of $(X^tX)^{-1}$ or $(X^tV^{-1}X)^{-1}$.
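A small numerical sketch of (8.1.3) and (8.1.4) may help fix ideas. Everything specific in it is an assumption made for illustration: the sizes ($a = 3$, $n = 2$), the green-weight figures, and the particular generalized inverse `G` of $X^tX$, which is obtained by zeroing the row and column belonging to $\mu$ and inverting the remaining diagonal block. This is one convenient generalized inverse, not the Moore-Penrose inverse, and under it the solution sets $\mu$ to zero and each $\tau_i$ to the corresponding treatment mean.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


a, n = 3, 2   # hypothetical numbers of treatments and replicates

# Incidence matrix for y_ijr = mu_j + tau_ij + e_ijr: a column of ones followed
# by one indicator column per treatment; rows ordered by treatment, then replicate.
X = [[1] + [int(i == t) for t in range(a)] for i in range(a) for _ in range(n)]
XtX = matmul(transpose(X), X)

# One generalized inverse of X^t X (assumption: zero row/column for mu,
# then invert the remaining block, which is n times the identity).
G = [[Fraction(0)] * (a + 1)] + [
    [Fraction(0)] + [Fraction(int(i == t), n) for t in range(a)] for i in range(a)
]

# Hypothetical green weights for the four components (leaf, flower, seed, residue).
components = [
    [12, 14, 20, 22, 15, 17],
    [3, 5, 4, 6, 2, 4],
    [7, 9, 8, 8, 10, 12],
    [30, 28, 35, 37, 31, 33],
]
total = [sum(vals) for vals in zip(*components)]   # y_T = sum of the y_j


def solution(y):
    """One solution theta = G X^t y of the normal equations."""
    col = [[v] for v in y]
    return [row[0] for row in matmul(G, matmul(transpose(X), col))]


theta = [solution(y) for y in components]
theta_T = solution(total)
theta_sum = [sum(vals) for vals in zip(*theta)]

print("solution for the total model:", theta_T)
print("sum of component solutions  :", theta_sum)
```

A different generalized inverse would give a different solution vector, but the same argument applies: as long as one fixed inverse is used for every component and for the total, the solutions add.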
The regression formulation (8.1.3) and (8.1.4) of the analysis of variance models (8.1.1) and (8.1.2) makes it especially easy to estimate the parameter vector subject to the condition that certain of its components are equal to zero. Furthermore, there is no reason why the additivity concept cannot be applied to more complicated classificatory models such as the randomized complete block design, Latin square design, and other designs since, in each case, one can write the corresponding models in linear regression form. This demonstrates that the extension of the notion of additivity of component regression equations to classificatory models is not only theoretically plausible but also appears to make sense in practice.

8.2 Extension to Nonlinear Models

A definition was given in chapter II for a regression model in general and for a linear regression model in particular. From those definitions, it follows that any regression model $r^* \in R^*$ satisfying $r^* \notin L$ is a nonlinear regression model. Recall that L was defined in chapter II as the set of all linear regression models. Conventionally, nonlinear regression models are divided into two groups, namely the class of nonlinear regression models that can be made linear by applying an appropriate transformation to the nonlinear model and the class of nonlinear regression models for which there exists no known linearizing transformation. The two types of nonlinear regression models are generally referred to in the literature as intrinsically linear and intrinsically nonlinear, respectively (see Draper and Smith, 1981). The main objective in this section is to investigate whether the notion of additivity makes sense for these two types of nonlinear models. We consider models of the form

$$y_j = f(X, \beta_j) + \varepsilon_j \quad (j = 1, 2, \ldots, k) \qquad (8.2.1)$$

and

$$y_T = \sum_{j=1}^{k}y_j = f(X, \beta_T) + \varepsilon_T. \qquad (8.2.2)$$

Attention is directed here toward discovering the extent to which model (8.2.2) is arithmetically determined by the models specified in (8.2.1).
For purposes of simplicity, we restrict detailed analysis to two types of nonlinear models, namely

$$y_j = \beta_{0j}e^{\beta_{1j}X}\varepsilon_j \quad (j = 1, 2, \ldots, k) \qquad (8.2.3)$$

and

$$y_j = \beta_{0j}e^{\beta_{1j}X} + \varepsilon_j \quad (j = 1, 2, \ldots, k). \qquad (8.2.4)$$

Note that models specified by (8.2.3) are intrinsically linear, so that for purposes of estimation, one may transform them to linear form using a logarithmic transformation. This leads to

$$\ln y_j = \ln(\beta_{0j}) + \beta_{1j}X + \ln\varepsilon_j \quad (j = 1, 2, \ldots, k) \qquad (8.2.5)$$

or simply

$$y_j^* = \beta_{0j}^* + \beta_{1j}X + \varepsilon_j^* \quad (j = 1, 2, \ldots, k). \qquad (8.2.6)$$

On the other hand, models specified by (8.2.4) cannot be so transformed technically, although the first member of the expression on the right of this model is linearizable. Indeed, (8.2.4) specifies one of the simpler forms of an intrinsically nonlinear model.

Now consider fitting the linearized form of (8.2.3) and suppose one is interested in this linearized form for prediction purposes. If, in addition, one is interested in the predictive equation for the sum of the transformed form of the components, then the parameters of the latter model are determined by additivity from the component models. This is the case since we have

$$y_T^* = \sum_{j=1}^{k}y_j^* = \sum_{j=1}^{k}(\beta_{0j}^* + \beta_{1j}X + \varepsilon_j^*) = \sum_{j=1}^{k}\beta_{0j}^* + \Big(\sum_{j=1}^{k}\beta_{1j}\Big)X + \sum_{j=1}^{k}\varepsilon_j^* = \beta_{0T}^* + \beta_{1T}X + \varepsilon_T^*. \qquad (8.2.7)$$

Thus, as in the ordinary model, additivity holds here as long as the variable of interest is the transformed version of the dependent variable $y_j$. Kozak (1970) made this point in his paper and, therefore, it is not new.

Suppose, on the other hand, that one is really interested in the original nonlinear form as a predictive equation. In this case, the linearization is only an intermediate step aimed largely at simplifying the process of estimation. Note that in this case some of the parameters estimated from the linearized equation would need to be further transformed before being inserted in a nonlinear predictive equation.
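The log-scale additivity in (8.2.7) can be checked with a few lines of arithmetic. The sketch below uses made-up data for two components and a helper `ols_line` written for the purpose; none of it comes from the thesis. It fits each $\ln y_j$ on X, fits the sum of the logs on X, and then evaluates what the summed log-scale equation gives at one value of X once it is exponentiated, alongside the product and the sum of the component predictions.

```python
import math


def ols_line(x, y):
    """Intercept and slope from simple least squares of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


# Hypothetical data for k = 2 components observed at the same X values.
x = [1.0, 2.0, 3.0, 4.0]
y1 = [2.0, 3.1, 4.4, 6.5]
y2 = [5.0, 6.2, 8.1, 9.9]

log_y1 = [math.log(v) for v in y1]
log_y2 = [math.log(v) for v in y2]
log_total = [u + v for u, v in zip(log_y1, log_y2)]   # y_T* = sum of the ln y_j

a1, b1 = ols_line(x, log_y1)
a2, b2 = ols_line(x, log_y2)
aT, bT = ols_line(x, log_total)

# Predictions at an arbitrary point x0, back on the original scale.
x0 = 2.0
pred1 = math.exp(a1 + b1 * x0)
pred2 = math.exp(a2 + b2 * x0)
back_transformed = math.exp(aT + bT * x0)

print("log scale: aT vs a1 + a2:", aT, a1 + a2)
print("log scale: bT vs b1 + b2:", bT, b1 + b2)
print("exp of summed log equation :", back_transformed)
print("product of component fits  :", pred1 * pred2)
print("sum of component fits      :", pred1 + pred2)
```

On the log scale the total coefficients agree with the sums of the component coefficients up to rounding. The exponentiated total equation agrees with the product of the component predictions, since the exponential of a sum of logarithms is a product.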
Now, if the nonlinear fitted analogue of (8.2.7) were desired, note that using additivity one would need to back-transform the expression

$$\hat y_T^* = \sum_{j=1}^{k}\ln\hat y_j = \sum_{j=1}^{k}\ln\hat\beta_{0j} + \Big(\sum_{j=1}^{k}\hat\beta_{1j}\Big)X \qquad (8.2.8)$$

to obtain

$$\hat y_T = \prod_{j=1}^{k}\hat y_j = \Big(\prod_{j=1}^{k}\hat\beta_{0j}\Big)e^{(\sum_{j=1}^{k}\hat\beta_{1j})X} = \prod_{j=1}^{k}\big[\hat\beta_{0j}e^{\hat\beta_{1j}X}\big]. \qquad (8.2.9)$$

The result in (8.2.9) warrants some comment. Perhaps the most important of such comments is the following. If one is interested in predicting total biomass, say, as a sum of the components $y_j$ using a model of the form (8.2.3), then one must not do so by invoking additivity of the transformed version of (8.2.3) and then re-transforming (that is, back-transforming). If one does so, then one gets an equation which predicts the product of the components rather than their sum. In a nutshell, one gets the wrong predictive equation. Herein lies the real virtue of a proper analysis of a modelling situation. Thus for models of the form (8.2.3), a total predictive equation of the same form cannot be determined from additivity of the parameters of the transformed component equations. It may be determined at least arithmetically, however, as a simple sum of the corresponding fitted nonlinear component equations, though the predictive merits of such an equation may be debatable.

With respect to models of the form (8.2.4), it has been indicated that such models do not admit transformation to linear form. Therefore, the parameters would be estimated using any of a number of known iterative search techniques. As with models of the form (8.2.3), however, a predictive total equation cannot be obtained here by appealing to the additivity property, since the parameters of the total predictive equation cannot be determined by adding corresponding parameters in the component equations. However, if simple prediction were the objective, then a predictive total equation may be obtained by simply adding up the predictive component equations.
Once again, the predictive usefulness of such a model is largely an open question.

The foregoing discussion indicates that the additivity property, which holds almost universally for linear models, does not carry over, in general, to the class of nonlinear models. This precludes, for instance, the use of the notion of additivity in inventory and/or biomass studies if nonlinear models are used for prediction.

CHAPTER IX

9.0 COMPUTATIONAL CONSIDERATIONS

We have attempted to present, in preceding chapters, a theory of estimation and inference for the additivity problem and to indicate generalizations and extensions to other types of models. The essential objective of the discourse has been to present the additivity problem as perceived here within the general framework of linear model theory. One hopes that this objective has been achieved to a large extent. However, our derivation of expressions for estimators and associated statistics, particularly in chapter V, leaves one important question largely unanswered. This question is: Does the conditioning principle introduced to handle the additivity problem in general call for new computing subroutines or algorithms in order to obtain estimates and other statistics? We show, in this chapter, that no such subroutines or algorithms are required. All estimates and associated statistics can be computed using existing system-based software such as is provided by the various statistical packages. Examples of such packages are MIDAS (The Michigan Interactive Data Analysis System, The University of Michigan, Ann Arbor, Michigan), BMDP (Biomedical computer programmes P-series, University of California Press, Los Angeles, California), SAS (Statistical Analysis System, SAS Institute, Raleigh, North Carolina), and SPSS (Statistical Package for the Social Sciences, McGraw-Hill Inc., New York).
At computing installations where SAS is not available, BMDP is perhaps the most commendable package to use, mainly because it has options for generating diagnostic plots and other information valuable in model choice and validation.

To motivate the derivation of the main result of this chapter, consider the estimation problem associated with fitting component equation j (j = 1, 2, ..., k) as presented in chapter V. More specifically, recall that in fitting equation j, where equation j contains only statistically important independent variables, the estimator for $\beta_j^*$ is given by

$$\hat{\beta}_j^* = \begin{pmatrix} \phi \\ b_{p_j} - T_{p_j q_j} T_{q_j q_j}^{-1} b_{q_j} \end{pmatrix} \tag{9.0.1}$$

where $b_{q_j}$ and $b_{p_j}$ are given by the partition

$$\hat{\beta}_j = \begin{pmatrix} b_{q_j} \\ b_{p_j} \end{pmatrix} \tag{9.0.2}$$

of $\hat{\beta}_j$ obtained from fitting the full model (with all independent variables), while $T_{p_j q_j}$ and $T_{q_j q_j}$ are obtained from a corresponding partitioning of $(X^tX)^{-1}$ in the form

$$(X^tX)^{-1} = \begin{pmatrix} T_{q_j q_j} & T_{q_j p_j} \\ T_{p_j q_j} & T_{p_j p_j} \end{pmatrix}. \tag{9.0.3}$$

With a corresponding partitioning of the X matrix in the form $X = (X_{q_j} \mid X_{p_j})$, the result in (9.0.1) is given equivalently by (see equation (5.1.18) in chapter V)

$$\hat{\beta}_j^* = \begin{pmatrix} \phi \\ (T_{p_j p_j} - T_{p_j q_j} T_{q_j q_j}^{-1} T_{q_j p_j}) X_{p_j}^t y_j \end{pmatrix}. \tag{9.0.4}$$

Consider now fitting $y_j$ on the set $X_{p_j}$ only (i.e., so that the coefficients of the components of $X_{q_j}$ are set identically equal to zero). It is shown below that the estimator $b_{p_j}^*$, say, from the latter fit is precisely equal to the non-null part of $\hat{\beta}_j^*$ in (9.0.4). Hence $\hat{\beta}_j^*$ is completely specified by simply fitting $y_j$ on the set $X_{p_j}$. The consequence of this result is that no special algorithm is necessary to obtain $\hat{\beta}_j^*$ beyond those already available in standard statistical packages. We demonstrate this result formally by stating and proving the following theorem.

Theorem: Let $x = \{X_1, X_2, \ldots, X_{m-1}\}$ be a set of predictor variables and consider fitting the linear model (with intercept)

$$y_j = X\beta_j + \varepsilon_j$$

subject to a subset $b_{q_j}$ of $\beta_j$ of order $q_j$ being equal to the zero vector.
Here $X = (X_0 \mid X_1 \mid \ldots \mid X_{m-1})$ is an n x m (n > m) matrix of full column rank and we assume, as usual, that $\varepsilon_j \sim N(\phi, I\sigma_j^2)$. Denote the solution from fitting this constrained model by $\hat{\beta}_j^*$. Now partition X so that $X = (X_{q_j} \mid X_{p_j})$ for $q_j + p_j = m$, $0 < q_j < m$, in accordance with the constraining of $\beta_j$, so that

$$(X^tX)^{-1} = \begin{pmatrix} T_{q_j q_j} & T_{q_j p_j} \\ T_{p_j q_j} & T_{p_j p_j} \end{pmatrix}.$$

Then $\hat{\beta}_j^*$ is given by

$$\hat{\beta}_j^* = \begin{pmatrix} \phi \\ (T_{p_j p_j} - T_{p_j q_j} T_{q_j q_j}^{-1} T_{q_j p_j}) X_{p_j}^t y_j \end{pmatrix}.$$

Furthermore, if $b_{p_j}^*$ is the solution from fitting $y_j$ on the set $X_{p_j}$ (that is, ignoring $X_{q_j}$), then

$$b_{p_j}^* = (T_{p_j p_j} - T_{p_j q_j} T_{q_j q_j}^{-1} T_{q_j p_j}) X_{p_j}^t y_j.$$

We prove the above theorem by utilizing a well-known theorem in linear algebra concerning the inverse of a partitioned matrix. This theorem is stated here as a lemma, without proof, and we refer the reader to Graybill (1976, p. 19) or any standard text in linear algebra for a proof.

Lemma: Let W be an n x n nonsingular matrix that is partitioned as follows:

$$W = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}$$

where $W_{ij}$ has size $n_i \times n_j$ for i, j = 1, 2 ($n_1 + n_2 = n$, $0 < n_1 < n$). If $|W_{11}| \neq 0$ and $|W_{22}| \neq 0$, then $W^{-1}$ is given by

$$W^{-1} = \begin{pmatrix} [W_{11} - W_{12}W_{22}^{-1}W_{21}]^{-1} & -W_{11}^{-1}W_{12}[W_{22} - W_{21}W_{11}^{-1}W_{12}]^{-1} \\ -W_{22}^{-1}W_{21}[W_{11} - W_{12}W_{22}^{-1}W_{21}]^{-1} & [W_{22} - W_{21}W_{11}^{-1}W_{12}]^{-1} \end{pmatrix}.$$

Proof of Theorem: The expression for $\hat{\beta}_j^*$ given in the theorem follows from Searle (1971, pp. 113-114), as demonstrated earlier in the thesis. It remains to show that the second part of the theorem holds. With $(X^tX)^{-1}$ partitioned as in the theorem and using the above lemma, it follows that

$$(X^tX)^{-1} = \begin{pmatrix} T_{q_j q_j} & T_{q_j p_j} \\ T_{p_j q_j} & T_{p_j p_j} \end{pmatrix}$$

where

$$T_{q_j q_j} = [X_{q_j}^t X_{q_j} - X_{q_j}^t X_{p_j}(X_{p_j}^t X_{p_j})^{-1} X_{p_j}^t X_{q_j}]^{-1}$$

$$T_{q_j p_j} = -(X_{q_j}^t X_{q_j})^{-1} X_{q_j}^t X_{p_j} [X_{p_j}^t X_{p_j} - X_{p_j}^t X_{q_j}(X_{q_j}^t X_{q_j})^{-1} X_{q_j}^t X_{p_j}]^{-1}$$

$$T_{p_j q_j} = -(X_{p_j}^t X_{p_j})^{-1} X_{p_j}^t X_{q_j} [X_{q_j}^t X_{q_j} - X_{q_j}^t X_{p_j}(X_{p_j}^t X_{p_j})^{-1} X_{p_j}^t X_{q_j}]^{-1}$$

$$T_{p_j p_j} = [X_{p_j}^t X_{p_j} - X_{p_j}^t X_{q_j}(X_{q_j}^t X_{q_j})^{-1} X_{q_j}^t X_{p_j}]^{-1}.$$

Note that the existence of $(X_{q_j}^t X_{q_j})^{-1}$ and $(X_{p_j}^t X_{p_j})^{-1}$ follows from the fact that X is of full column rank.
Note also that the solution $b_{p_j}^*$ obtained from fitting $y_j$ on the set $X_{p_j}$ must also be given by

$$b_{p_j}^* = (X_{p_j}^t X_{p_j})^{-1} X_{p_j}^t y_j.$$

Therefore, to complete the proof of the theorem, we need only show that $(T_{p_j p_j} - T_{p_j q_j} T_{q_j q_j}^{-1} T_{q_j p_j}) = (X_{p_j}^t X_{p_j})^{-1}$. To this end, note that

$$\begin{aligned}
T_{p_j p_j} - T_{p_j q_j} T_{q_j q_j}^{-1} T_{q_j p_j}
&= [X_{p_j}^t X_{p_j} - X_{p_j}^t X_{q_j}(X_{q_j}^t X_{q_j})^{-1} X_{q_j}^t X_{p_j}]^{-1} \\
&\quad - \{(X_{p_j}^t X_{p_j})^{-1} X_{p_j}^t X_{q_j} [X_{q_j}^t X_{q_j} - X_{q_j}^t X_{p_j}(X_{p_j}^t X_{p_j})^{-1} X_{p_j}^t X_{q_j}]^{-1} \\
&\qquad \times [X_{q_j}^t X_{q_j} - X_{q_j}^t X_{p_j}(X_{p_j}^t X_{p_j})^{-1} X_{p_j}^t X_{q_j}] (X_{q_j}^t X_{q_j})^{-1} X_{q_j}^t X_{p_j} \\
&\qquad \times [X_{p_j}^t X_{p_j} - X_{p_j}^t X_{q_j}(X_{q_j}^t X_{q_j})^{-1} X_{q_j}^t X_{p_j}]^{-1}\} \\
&= [X_{p_j}^t X_{p_j} - X_{p_j}^t X_{q_j}(X_{q_j}^t X_{q_j})^{-1} X_{q_j}^t X_{p_j}]^{-1} \\
&\quad - \{(X_{p_j}^t X_{p_j})^{-1} X_{p_j}^t X_{q_j} (X_{q_j}^t X_{q_j})^{-1} X_{q_j}^t X_{p_j} [X_{p_j}^t X_{p_j} - X_{p_j}^t X_{q_j}(X_{q_j}^t X_{q_j})^{-1} X_{q_j}^t X_{p_j}]^{-1}\} \\
&= [I - (X_{p_j}^t X_{p_j})^{-1} X_{p_j}^t X_{q_j} (X_{q_j}^t X_{q_j})^{-1} X_{q_j}^t X_{p_j}] [X_{p_j}^t X_{p_j} - X_{p_j}^t X_{q_j}(X_{q_j}^t X_{q_j})^{-1} X_{q_j}^t X_{p_j}]^{-1} \\
&= (X_{p_j}^t X_{p_j})^{-1} [X_{p_j}^t X_{p_j} - X_{p_j}^t X_{q_j}(X_{q_j}^t X_{q_j})^{-1} X_{q_j}^t X_{p_j}] [X_{p_j}^t X_{p_j} - X_{p_j}^t X_{q_j}(X_{q_j}^t X_{q_j})^{-1} X_{q_j}^t X_{p_j}]^{-1} \\
&= (X_{p_j}^t X_{p_j})^{-1}
\end{aligned}$$

which is what we set out to show. Q.E.D.

Thus, we have established by the above theorem that the estimation problem associated with the generalized additivity problem as developed in chapter V does not require that new computing algorithms be developed to obtain estimates and related statistics. One simply fits a model containing what are construed to be statistically important independent variables. The parameter estimate corresponding to such a fit can then be augmented to the corresponding estimate for a full model constrained so that the unimportant independent variables have coefficients of zero. The virtue of the estimator of the parameter vector for component equation j (j = 1, 2, ..., k) given in chapter V is that it is of appropriate size for additivity.
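The theorem can also be checked numerically. The sketch below uses NumPy on simulated data; the sample size, the split into q constrained and p retained columns, and all variable names are assumptions made for illustration only. It compares the non-null part of the constrained estimator, computed from the partitioned inverse, with an ordinary least squares fit on the retained columns alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, p = 30, 2, 3  # q constrained columns, p retained columns (incl. intercept)

# Simulated design, partitioned as X = (X_q | X_p); values are arbitrary.
X_q = rng.normal(size=(n, q))
X_p = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
X = np.column_stack([X_q, X_p])
y = rng.normal(size=n)

# Partition (X'X)^{-1} conformably with (X_q | X_p).
T = np.linalg.inv(X.T @ X)
T_qq, T_qp = T[:q, :q], T[:q, q:]
T_pq, T_pp = T[q:, :q], T[q:, q:]

# Non-null part of the constrained estimator, in the form of (9.0.4).
b_constrained = (T_pp - T_pq @ np.linalg.inv(T_qq) @ T_qp) @ X_p.T @ y

# The same quantity from the full-model estimate, in the form of (9.0.1).
b_full = T @ X.T @ y
b_from_full = b_full[q:] - T_pq @ np.linalg.inv(T_qq) @ b_full[:q]

# Ordinary least squares of y on X_p only (ignoring X_q).
b_subset, *_ = np.linalg.lstsq(X_p, y, rcond=None)

print(np.allclose(b_constrained, b_subset))
print(np.allclose(b_from_full, b_subset))
```

Both comparisons should agree to numerical precision, so an ordinary regression routine applied to the retained variables suffices; padding the result with zeros for the omitted variables then gives a vector of the size needed for additivity.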
However, whether $\hat{\beta}_j^*$ is obtained directly as in chapter V or indirectly by fitting a subset $X_{p_j}$ and then augmenting the resulting estimator, its components may need to be permuted before invoking additivity to obtain the estimator for the corresponding total equation. Such permuting ensures that appropriate components of $\hat{\beta}_j^*$ (j = 1, 2, ..., k) are added to obtain $\hat{\beta}_{TC}$.

Finally, the theorem proved above is based upon the distributional assumption $\varepsilon_j \sim N(\phi, I\sigma_j^2)$. Clearly, obvious modifications in the theorem would make it hold for the case $\varepsilon_j \sim N(\phi, V\sigma_j^2)$ for V positive definite. Other generalizations are also possible.

CHAPTER X

10.0 SOME ILLUSTRATIVE EXAMPLES

In this chapter, some examples are given that illustrate the application of the theory presented in the discourse. First, however, a somewhat detailed analysis is given aimed at assessing the tenability of the assumption of multivariate normality of $y_T = \sum_{j=1}^{k} y_j$ for each of three data sets. As was stated in chapter VII, the usual inferences for the total model depend critically upon the assumption that $y_T = \sum_{j=1}^{k} y_j$ is multivariate normal. Since there is no prior knowledge that the assumption holds, it is necessary to assess multivariate normality in order to qualify more appropriately any inference statements in the examples.

10.1 Assessing Multivariate Normality of $y_T$

Koziol's (1982) method for assessing joint multivariate normality of the components $y_1, y_2, \ldots, y_k$ was used on three data sets. The first of these data sets is that used by Kozak (1970) to illustrate the additivity result presented in his paper. The second data set is British Columbia coastal western hemlock data used by Kurucz (1969). The third data set is western hemlock data from various parts of British Columbia and was obtained from the ENFOR project (Williams, 1983, personal communication). Note that ENFOR is an acronym for ENergy from FORests.
Koziol's (1982) method for assessing multivariate normality is based upon a Cramér-von Mises type statistic $J_n$, which is computed as follows:

1. Given that $X_1, X_2, \ldots, X_n$ are random k-dimensional vectors, calculate $\bar{X} = (\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k)^t$ and S. Here S is the sample variance-covariance matrix of the n vectors and is k x k.

2. Calculate the sample Mahalanobis squared distances $Y_1, Y_2, \ldots, Y_n$ defined by $Y_i = (X_i - \bar{X})^t S^{-1} (X_i - \bar{X})$.

3. Put $Z_i = F_{(k)}(Y_i)$, i = 1, 2, ..., n, and order the $Z_i$ in ascending order so that $Z_{(1)} \le Z_{(2)} \le \ldots \le Z_{(n)}$. $F_{(k)}(Y_i)$ here denotes the area under the chi-square density function with k degrees of freedom between the limits of zero and $Y_i$ (i.e., $F_{(k)}(Y_i) = \Pr[Y \le Y_i]$).

4. Calculate $J_n$ using

$$J_n = \sum_{i=1}^{n} \Big[Z_{(i)} - \frac{i - \tfrac{1}{2}}{n}\Big]^2 + (12n)^{-1}.$$

Note that with three components, k is equal to three in our case. The three data sets are reproduced in Appendix I (a,b,c). The first two of the data sets in the appendix are reported in imperial units, while the third data set is given in metric units. However, all analyses reported in this dissertation were carried out in metric units.

Before presenting details of the test for joint multivariate normality of the $y_j$'s, it is worth pointing out some technical considerations which simplify considerably the computation of the Koziol statistic $J_n$. In particular, they simplify the computational formula for the sample Mahalanobis squared distances $Y_i$ (i = 1, 2, ..., n). Observe that the joint distribution of the $y_j$'s in this case is conditional upon the independent variables (the X's). As a result of this, and based upon the notion that the regression of y on the X is important (significant), the sample Mahalanobis squared distances are given by

$$Y_i = (y_i - \hat{y}_i)^t S^{-1} (y_i - \hat{y}_i)$$

or by

$$Y_i = \hat{\varepsilon}_i^t S^{-1} \hat{\varepsilon}_i$$

where $y_i - \hat{y}_i = \hat{\varepsilon}_i = (y_{i1} - \hat{y}_{i1}, y_{i2} - \hat{y}_{i2}, \ldots, y_{ik} - \hat{y}_{ik})^t$ and $S = (n - 1)^{-1} E E^t$ (note that $E = (\hat{\varepsilon}_1, \hat{\varepsilon}_2, \hat{\varepsilon}_3)^t$). The distinction between $Y_i$ and $y_i$ must be borne in mind here.
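The four steps above can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the APL and Fortran programmes used in this study: the function names `chi2_cdf` and `koziol_statistic` are hypothetical, the chi-square probability is computed from its closed form for integer degrees of freedom rather than by the IMSL routine, and the demonstration data are simulated.

```python
import math
import numpy as np


def chi2_cdf(y, k):
    """Pr[chi-square with k d.f. <= y], closed form for integer k >= 1."""
    if y <= 0.0:
        return 0.0
    h = y / 2.0
    if k % 2 == 0:
        tail = sum(h**j / math.factorial(j) for j in range(k // 2))
        return 1.0 - math.exp(-h) * tail
    tail = sum(h ** (j - 0.5) / math.gamma(j + 0.5)
               for j in range(1, (k - 1) // 2 + 1))
    return math.erf(math.sqrt(h)) - math.exp(-h) * tail


def koziol_statistic(data):
    """J_n for an n x k array of observations (or of residual vectors)."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    centred = data - data.mean(axis=0)            # step 1: subtract the mean
    S = centred.T @ centred / (n - 1)             # k x k sample covariance
    S_inv = np.linalg.inv(S)
    # step 2: squared Mahalanobis distance of each row
    Y = np.einsum("ij,jk,ik->i", centred, S_inv, centred)
    # step 3: chi-square probabilities, sorted in ascending order
    Z = np.sort([chi2_cdf(y_i, k) for y_i in Y])
    # step 4: Cramer-von Mises type sum plus the 1/(12n) term
    i = np.arange(1, n + 1)
    return float(np.sum((Z - (i - 0.5) / n) ** 2) + 1.0 / (12.0 * n))


# Simulated residuals for three components and 48 trees (illustration only).
rng = np.random.default_rng(1)
residuals = rng.normal(size=(48, 3))
print(koziol_statistic(residuals))
```

When the rows are residuals from regressions that include an intercept, the column means are already zero, so the centring step leaves them unchanged and S reduces to the residual-based form given above. The resulting value would still have to be compared with Koziol's tabulated percentage points.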
The residual vectors used in computing $Y_i$ above are those obtained from fitting component equations using only statistically significant independent variables for each data set. These equations are those used to obtain conditioned total predictive equations in examples 1, 2, and 3 that follow. It should be emphasized that a test for multivariate normality of $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_k$ is equivalent here to a test for multivariate normality of $y_1, y_2, \ldots, y_k$. Thus if $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_k$ are jointly multivariate normal, one can speak of the multivariate normality of $y_1, y_2, \ldots, y_k$ and hence of $\varepsilon_T$ and $y_T$.

In computing the statistic $J_n$ for each data set, an APL programme was used to calculate the $Y_i$ as specified above using an IBM 5100 Portable Computer. This computer is located in the Mathematics Annex at the University of British Columbia. APL is an extremely efficient language when one is dealing with matrix computations. The computation of the chi-square probabilities in step 3 was achieved by calling the IMSL (International Mathematical and Statistical Libraries) subroutine MDCH, which computes cumulative chi-square probabilities. A short Fortran programme was written to call MDCH (see Appendix Id, PROGRAMME 1). Note that although DF in programme 1 is specified as 2.0, DF = 3.0 for the first part of this assessment problem. Finally, the ordered chi-square probabilities from step 3 were used in another Fortran programme to calculate $J_n$ (see PROGRAMME 2 in Appendix Id).

The results of Koziol's (1982) test on the three data sets are summarized in Table 1 below.

Table 1. Results of Koziol's (1982) test for multivariate normality on three data sets

Data Set         Sample Size   DF    Computed Koziol Statistic (J_n)   p-value
Kozak (1970)     10            3.0   0.05799                           > 0.15
Kurucz (1969)    18            3.0   0.86078                           < 0.01
ENFOR            48            3.0   4.27682                           << 0.01

The p-values in Table 1 are obtained by comparison with Koziol's Table 1 (Koziol, 1982).
It is to be emphasized that, owing to the small sample sizes associated with Kozak's (1970) and Kurucz's (1969) data, our p-values may be somewhat inaccurate. However, on the basis of these results, the assumption of joint multivariate normality will be entertained for Kozak's data but not for the other two data sets. Note that this conclusion is quite reasonable for the ENFOR data because of the moderate (n = 48) sample size.

In view of the above results (ignoring the small sample sizes in the first two data sets) it is reasonable to expect that $y_T$ is multivariate normal for Kozak's data, since it is reasonable that $y_1, y_2, y_3$ are jointly multivariate normal in this data set. On the other hand, the above results suggest only that for the two other data sets $y_T$ may or may not be normal, since it is possible for $y_T$ to be multivariate normal even when $y_1, y_2, y_3$ are not jointly multivariate normal.

For both the Kurucz (1969) and ENFOR data, it was considered of some interest to check for joint bivariate normality of the $y_j$'s. Accordingly, Koziol's test for multivariate normality was applied to pair-wise $y_j$'s; thus three tests were performed on each data set. The results are summarized in Table 2 below.

Table 2. Results of Koziol's test for bivariate normality

Pair             DF    Sample Size   Koziol's Computed Statistic (J_n)   p-value
Kurucz's (1969) data
(ε1, ε2)         2.0   18            0.8484                              < 0.01
(ε1, ε3)         2.0   18            0.5023                              < 0.01
(ε2, ε3)         2.0   18            0.6873                              < 0.01
ENFOR data
(ε1, ε2)         2.0   48            5.4432                              << 0.01
(ε1, ε3)         2.0   48            3.7356                              << 0.01
(ε2, ε3)         2.0   48            3.1189                              << 0.01

The results in Table 2 indicate that the assumption of joint bivariate normality is rejected in essentially every case. This result is not unexpected since, having rejected trivariate normality, one expects that bivariate normality should fail to obtain in at least one of the three cases. It is also probably adequate to check bivariate normality and reject trivariate normality the first time bivariate normality fails to hold.
It has been shown so far that joint multivariate normality of $y_1, y_2, y_3$ does not appear to hold for the Kurucz (1969) and ENFOR data, while multivariate normality will be entertained for the Kozak (1970) data. It should be emphasized again that, in general, one should be more cautious in accepting multivariate normality for the Kozak data because of the very small sample size. However, for purposes of the examples to follow, multivariate normality will be entertained. Once again, it is reasonable then to assume $y_T$ multivariate normal for the Kozak data. However, one is unable to decide whether or not $y_T$ is multivariate normal for the Kurucz and ENFOR data. A direct examination of the behaviour of $y_T = \sum_{j=1}^{k} y_j$ is necessary to make a judgement concerning its normality or non-normality.

One way in which information can be obtained concerning the multivariate normality, or lack of it, for $y_T = \sum_{j=1}^{k} y_j$ is to fit the component models and investigate the behaviour of the empirical distribution of $\hat{\varepsilon}_T = \sum_{j=1}^{k} \hat{\varepsilon}_j$. This can be achieved, in part, by plotting a histogram of $\hat{\varepsilon}_T$ or a normal probability plot of $\hat{\varepsilon}_T$. Unfortunately, these procedures require large enough sample sizes for the plots to be reasonably interpretable. Largely because of this, it was possible to examine such plots in this study only for the ENFOR data, with its moderate sample size (n = 48). The Kurucz data set was too small to be examined by this procedure.

Three component models were fitted using the ENFOR data. Bole biomass was regressed on $D^2H$ and DCL, where D denotes diameter at breast height, H denotes height, and CL crown length. Bark biomass was regressed on $D^2H$ and HCL, and crown biomass was regressed on DCL and HCL. The residuals from these fitted component equations were added up and a histogram and normal probability plot constructed using the BMDP P:5D subroutine. If the histogram of $\hat{\varepsilon}_T$ looks sufficiently bell-shaped, it is reasonable to conclude that $y_T$ is normal.
Similarly, normality of $y_T$ would be suggested by a sufficiently linear normal probability plot. The plots are given in Appendix II(a,b) and both suggest that $\hat{\varepsilon}_T$, and hence $y_T$, is normal for the ENFOR data.

Based upon results of this section, we can proceed as though $y_T$ were multivariate normal for the Kozak and ENFOR data but are unable to say whether $y_T$ is multivariate normal or not for the Kurucz data.

10.2 Example 1

In this section, the data given by Kozak (1970) are used to apply additivity theory as presented in chapters IV and V of this thesis. As indicated earlier, though the data are reproduced in the appendix in imperial units, all calculations here are in metric units. It is further assumed throughout that the biomass components are independent. Admittedly, this may be a tenuous assumption; however, we use it largely for purposes of demonstrating the application of the theory. For the effect of dependence on estimation and inference see the discussion in chapter VII. Let us assume further that $\varepsilon_j \sim N(\phi, I\sigma_j^2)$.

First, consider fitting the component equations using both diameter and the square of diameter as independent variables. Recall that this is the case considered by Kozak (1970). Then the fitted component equations are given by

$$\hat{y}_1 = 131.39 - 19.037X + 0.95195X^2, \quad R^2 = 0.9923$$
$$\hat{y}_2 = -1.12 + 0.205X + 0.02980X^2, \quad R^2 = 0.9965$$
$$\hat{y}_3 = -13.08 + 1.136X + 0.08361X^2, \quad R^2 = 0.9605.$$

The corresponding total fitted equation is

$$\hat{y}_T = 117.19 - 17.696X + 1.06540X^2, \quad R^2 = 0.9948.$$

One can check easily that the coefficients of the total equation are obtained by adding corresponding coefficients of the fitted component equations, as Kozak (1970) demonstrated. The regression sum of squares for the total equation is 289060, to five-digit accuracy, and given that $X^t y_T = (2300.6, 56183, 1448300)^t$, one can check easily that this is the result one obtains using additivity and equation (4.2.6) of the thesis.
Next, Kozak (1970) reports that when only statistically significant (important) independent variables are used in fitting the component equations, the first equation ($y_1$) involves both X and $X^2$, the second ($y_2$) only $X^2$, and the last ($y_3$) also only $X^2$. The metric analogues of Kozak's specification of these fitted equations are

$$\hat{y}_1 = 131.39 - 19.037X + 0.95195X^2, \quad R^2 = 0.9923$$
$$\hat{y}_2 = 0.822 + 0.03471X^2, \quad R^2 = 0.9962$$
$$\hat{y}_3 = -2.342 + 0.11079X^2, \quad R^2 = 0.9588.$$

In accordance with the extension of the concept of additivity as developed in chapter V, there is a total equation determined by additivity of the coefficients of the preceding equations. In fact, this equation is given by

$$\hat{y}_{TC} = 129.87 - 19.037X + 1.09745X^2.$$

The sum of squares regression for this conditioned total equation is given by

$$\hat{\beta}_{TC}^t X^t y_T - n\bar{y}_T^2 = 818660 - 531860.4564 = 286799.5436.$$

Since the total corrected sum of squares is 290580, it follows that the $R^2$ corresponding to the total conditioned equation is $R^2 = 0.9870$. It is worth remarking that in terms of $R^2$, this model fits the data almost as well as the unrestricted total model, with an $R^2$ of 0.9948. Other aspects of this problem, including computational details, are provided in a more detailed example in Appendix III.

In connection with this problem and related problems concerning additivity, the question naturally arises whether the variables in the conditioned component equations remain statistically significant after being incorporated into the total conditioned equation. The answer appears to be that they would be statistically significant if the variables in the conditioned component equations are not very highly correlated. However, this may not be the case if the variables are highly correlated. It should be pointed out that this has not been checked thoroughly and, thus, should be viewed here as largely a conjecture.
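The arithmetic behind the conditioned total equation and its $R^2$ can be reproduced directly from the figures quoted above. The short sketch below is illustrative only; the variable names are hypothetical, and the sums of squares are simply the values reported in the text rather than quantities recomputed from the data.

```python
# Coefficients (intercept, X, X^2) of the conditioned component equations of
# Example 1; a zero marks a variable omitted from that component equation.
components = {
    "y1": (131.39, -19.037, 0.95195),
    "y2": (0.822, 0.0, 0.03471),
    "y3": (-2.342, 0.0, 0.11079),
}

# Additivity: add corresponding coefficients across the components.
total_conditioned = tuple(
    round(sum(column), 5) for column in zip(*components.values())
)

# Sums of squares as quoted in the text for the conditioned total equation.
ss_regression = 818660 - 531860.4564
ss_total_corrected = 290580
r_squared = ss_regression / ss_total_corrected

print(total_conditioned)
print(round(r_squared, 4))
```

Padding the second and third components with a zero coefficient for X is what makes the three vectors conformable for addition.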
For the Kozak data, however, the contribution to the total conditioned equation of each variable was checked by computing the increase in residual sum of squares when a particular variable is omitted from the total conditioned equation. The following partial F-values were calculated:

$$F_X = 96.27, \quad F_{X^2} = 15.64.$$

The degrees of freedom for these partial F values are 1 and 7, respectively. It is clear from these results that both diameter at breast height and its square are statistically important in the total conditioned equation.

10.3 Example 2

In this example, use is made of the western hemlock data from coastal British Columbia to go through the basic computational results as in the previous example. These data were used by Kurucz (1969) and are given in Appendix I(b). First it should be noted that, as discussed in section 10.1, it has not been possible to determine whether $y_T$ is multivariate normal for these data or not. Therefore, inferential results given in this section relating to these data must not be viewed as strictly valid. The essence of this example is mainly to demonstrate use of the concept of additivity computationally. One would need to check that $y_T$ is reasonably multivariate normal for inference statements to carry full weight. The calculations here are carried out in metric units and the assumption is made that the components $y_j$ (j = 1, 2, 3) are independent.

An all-combinations (all subsets) procedure provided by the BMDP package (P:9R) was used to find the best variable subsets for predicting component biomass. Three components were recognized for purposes of this analysis, namely bole ($y_1$), bark ($y_2$), and crown (branches + foliage = $y_3$). The crown component was obtained by simply adding branch and fine branch components for individual trees.
Using $R^2$ as a selection criterion, the best equations were found to be

$$\hat{y}_1 = -75.708 + 0.01330X_2, \quad R^2 = 0.9907$$
$$\hat{y}_2 = -25.782 + 0.00203X_2, \quad R^2 = 0.9309$$
$$\hat{y}_3 = -24.765 + 0.95895X_1, \quad R^2 = 0.8122$$

where $X_1 = (\text{height})^2$ and $X_2 = (\text{height})(\text{diameter})^2$. A corresponding unrestricted total fitted equation, using $X_1$ and $X_2$, is given by

$$\hat{y}_T = 33.542 + 0.26819X_1 + 0.01993X_2, \quad R^2 = 0.9655.$$

The total corrected sum of squares corresponding to the latter fit is 182559547. As in the preceding example, a conditioned total fitted equation is obtained by additivity as

$$\hat{y}_{TC} = -126.255 + 0.95895X_1 + 0.01533X_2.$$

Since $X^t y_T = (53205, 113670000, 15112000000)^t$, it follows that the sum of squares regression associated with the latter conditioned equation is

$$\hat{\beta}_{TC}^t X^t y_T - n\bar{y}_T^2 = 332291089.2 - 18(2955.8)^2 = 175029523.7.$$

Therefore, the $R^2$ associated with this total conditioned equation is 0.9588. Again, if the assumption of normality of $y_T$ held, one would conclude from this that the conditioned total equation performs well when compared with the unrestricted total equation.

It may be noted that because of the great variability in the size of the trees in this data set, one needs to be careful about the predictive goodness of these models. Indeed, as mentioned elsewhere in this thesis, we are much less concerned here with using the best equations in a particular sense than with demonstrating certain aspects of additivity.

10.4 Example 3

The next example is based upon the ENFOR data for western hemlock (see Appendix I(c)). We restrict details to the level of previous examples. Using an all-combinations procedure as in the previous example, the following equations were found to be the best for predicting component biomass:

$$\hat{y}_1 = 6.49538 + 0.01541D^2H - 0.12258DCL, \quad R^2 = 0.9836$$
$$\hat{y}_2 = 0.93179 + 0.00247D^2H - 0.03112HCL, \quad R^2 = 0.9561$$
$$\hat{y}_3 = -4.82066 + 0.31477DCL - 0.23344HCL, \quad R^2 = 0.8424.$$

The unrestricted total biomass equation is given by

$$\hat{y}_T = 2.05138 - 0.000074D^2H + 0.09319DCL - 0.0667HCL, \quad R^2 = 0.8674.$$
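Because the three component equations of Example 3 involve different subsets of the independent variables, their coefficient vectors must be augmented with zeros and put in a common order before they are added. A minimal sketch of that bookkeeping follows; the dictionary layout and the helper names `augment` and `conditioned_total` are assumptions made for illustration, while the coefficients are those of the fitted component equations above.

```python
# Common variable order for the total model; each component uses a subset.
FULL_ORDER = ["intercept", "D2H", "DCL", "HCL"]

# Fitted component equations of Example 3, keyed by variable name.
component_fits = {
    "bole": {"intercept": 6.49538, "D2H": 0.01541, "DCL": -0.12258},
    "bark": {"intercept": 0.93179, "D2H": 0.00247, "HCL": -0.03112},
    "crown": {"intercept": -4.82066, "DCL": 0.31477, "HCL": -0.23344},
}


def augment(fit, order=FULL_ORDER):
    """Coefficient vector in the common order, zero for omitted variables."""
    return [fit.get(name, 0.0) for name in order]


def conditioned_total(fits, order=FULL_ORDER):
    """Add the augmented, consistently ordered component vectors."""
    vectors = [augment(fit, order) for fit in fits.values()]
    return [sum(column) for column in zip(*vectors)]


total = dict(zip(FULL_ORDER, conditioned_total(component_fits)))
for name, value in total.items():
    print(f"{name:>9}: {value:.5f}")
```

Keying the coefficients by variable name takes care of the permutation step automatically, since each coefficient is placed in its slot of the common order regardless of the order in which the component equation was fitted.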
The corresponding total conditioned equation is given by

$$\hat{y}_{TC} = 2.60651 + 0.01788D^2H + 0.19219DCL - 0.26456HCL.$$

The sum of squares regression due to fitting the latter equation is given by $\hat{\beta}_{TC}^t X^t y_T - n\bar{y}_T^2 = 2876.506$. Hence the $R^2$ corresponding to this model is 0.8215, which compares favourably with that of the unrestricted total equation. Note that even when crown variables are used for predicting crown biomass, the $R^2$ is still in the 0.80-0.90 range for that component.

In Appendix III, the computational details relating to the additivity problem are given using Kozak's (1970) data again. The objective there is to show how the various statistics are computed, especially the variances of the parameters in the total conditioned equation.

CHAPTER XI

CONCLUSIONS AND REMARKS

In the discourse we have generalized the additivity problem as originally posed in the context of forestry by Kozak (1970). It has been shown that the statistical theory of estimation and inference for the generalized additivity problem as defined here is constructible within the general framework of general linear model theory. It is important to recognize that both estimation and inference theory are, in general, dependent upon distributional assumptions for the $\varepsilon_j$ (j = 1, ..., k) and upon whether the $\varepsilon_j$ are dependent or not. When the $\varepsilon_j$ are dependent, it has been shown that additivity, as defined here, does not hold. Furthermore, inference theory relating to the total model is complicated by the fact that although the components may follow normal distributions, it does not follow automatically that their sum is also normal. This suggests a need to investigate, or otherwise justify, the normality of $y_T$ before inference can be drawn about it when dependence obtains among the components. In particular, it would be useful if future studies in this area could address the problem relating to the distribution of $y_T$ directly, using large enough data sets along the lines indicated in section 10.1.
Large data sets that might become available through projects such as the ENFOR project might make such studies possible and worthwhile. Other directions of further investigation might be the determination of the form of the dependence among components. This might simplify the problem of determining the distributional behaviour of $y_T$.

The problem of additivity has also been seen to lead to interesting but, as yet, unsolved problems in multivariate distribution theory. This is obviously a fruitful line of further research for those who are theoretically inclined.

One of the interesting results obtained here is that the additivity problem is naturally extendible to the class of linear models known as classificatory models generally encountered in designed experiments. This extension must not be construed to be accidental, since any classificatory model can, in general, be expressed in regression form. The additivity problem does not, however, extend to the class of intrinsically nonlinear models. Hence the usefulness of theory relating to the additivity problem in mensurational studies involving nonlinear functions would, at best, be minimal. However, the theory should find wide applicability among ecologists and quantitative scientists interested in the assessment of biomass.

The additivity problem does not require the construction of new computing subroutines, as clearly demonstrated in chapter IX. This should make it especially easy to use the theory of additivity as developed here. Finally, the examples given in the preceding chapter show that the concept of additivity is quite practical, realistic, and statistically appealing.

REFERENCES

Andrews, D. F., Gnanadesikan, R., and Warner, J. L. 1973. Methods of assessing multivariate normality. In Multivariate Analysis 3, Krishnaiah, P. R. (ed.), pp. 95-116, Academic Press, New York.
Arnold, S. F. 1981. The theory of linear models and multivariate analysis. John Wiley and Sons, Inc., New York.
Behboodian, J. 1972. A simple example of some properties of normal random variables. Amer. Math. Monthly, 79: 632-634. Berkson, J. 1950. Are there two regressions? J. Amer. Stat. Assoc., 45: 163-180. Brown, J. K. 1963. Crown weights in red pine plantations. U.S.D.A. Forest Serv., Lake States Forest Exp. Sta., Res. No. LS-19. Brown, J. K. 1965. Estimating crown fuel weights of red pine and jack pine. U.S.D.A. Forest Serv., Lake States Forest Exp. Sta., Res. Pap. LS-20. Chaturvedi, 0. P., and Singh, J. S. 1982. Total biomass and biomass production of Pinus roxburghii trees growing in all-aged natural forests. Can. J. For. Res. 12(3): 632-640. Chiyenda, S. S. 1974. Prediction and estimation of slash based on total and component biomass of standing trees and on measurement of post-harvest residues. Unpublished B.S.F. thesis, The University of British Columbia, Vancouver, Canada. Cox, D. R. 1968. Notes on some aspects of regression analysis (with discussion). J. Roy. Stat. Soc, A131: 265-279. Crow, T. R. 1971. Estimation of biomass in an even-aged stand— regression and 'mean tree' techniques. In Forest Biomass Studies. IUFR0 Sect. 25: Yield and Growth Working Group on Forest Biomass Studies, University of Main Press, Oronto, pp. 35-47. Draper, N. R., and Smith, H. 1981. Applied regression analysis. John Wiley and Sons, Inc., New York (second edition). Dyer, R. F. 1967. Fresh and dry weight, nutrient elements and pulping characteristics of northern white cedar, Thuja occidentalis. Maine Agric. Exp. Sta., Techn. Bull. 27. Efron, B. 1979. Bootstrap methods: Another look at the jackknife. Ann. Stat. 7: 1-26. Freedman, B., Duinker, P. N., Morash, R., and Prager, U. 1982. A com parison of measurements of standing crops of biomass and nutrients in a conifer stand in Nova Scotia. Can. J. For. Res. 12(3): 494-502. 81 Gallant, A. R. 1971. Statistical inference for nonlinear regression models. Unpublished Ph.D. dissertation, Iowa State University, Ames, Iowa. Graybill, F. A. 1976. 
Theory and application of the linear model. Duxbury Press, New York. Hawkins, D. M. 1981. A new test for multivariate normality and homo-scedasticity. Technometrics, 23: 105-110. Healy, M. J. R. 1968. Multivariate normal plotting. Appl. Stat. 17: 157-161. Hogg, R. V. 1979. An introduction to robust estimation. In Robustness in Statistics, Launer, R. L., and Wilkinson, G. N. (Eds.), Academic Press, New York. Hogg, R. V., and Craig, A. T. 1970. Introduction to mathematical statis tics. Macmillan Publ. Co., Inc., New York (third edition). Hollander, M., and Wolfe, D. A. 1973. Nonparametric statistical methods. John Wiley and Sons, Inc., New York. Honer, T. G. 1971. Weight relationships in open- and forest-grown balsam fir trees. In Forest Biomass Studies, IUFRO Sect. 25: Yield and Growth Working Group on Forest Biomass Studies, University of Maine Press, Orono, pp. 65-78. Jacobs, M. W., and Cunia, T. 1980. Use of dummy variables to harmonize tree biomass tables. Can. J. For. Res. 10(4): 483-490. Johnstone, W. D. 1971. Total standing crop and tree component distribu tions in three stands of 100-year-old lodgepole pine. In Forest Biomass Studies, IUFRO Sect. 25: Yield and Growth Working Group on Forest Biomass Studies, University of Maine Press, Orono, pp. 81-89. Jokela, E. J.., Shannon, C. A., and White, E. H. 1981. Biomass and nutrient equations for mature Betula papyrifera Marsh. Can. J. For. Res. 11(2): 298-304. Kale, B. K. 1970. Normality of linear combinations of non-normal random variables. Amer. Math. Monthly, 77: 992-995. Keen, R. E. 1963. Weights and centres of gravity involved in handling pulpwood trees. Pulp Pap. Res. Inst. Can., Techn. Rep. 340. Kellogg, R. M., and Keays, J. L. 1968. Weight distributions in western hemlock trees. Can. Dept. Fish. Forest., Ottawa, Bi-monthly Res. Notes 24(4): 32-33. Kempthorne, 0. 1975. Design and analysis of experiments. Krieger Publ. Co., Inc., New York. 82 Ker, M. F., and Van Raalte, G. C. 1980. 
Tree biomass equations for Abies balsamae and Picea glauca in northwestern New Brunswick. Can. J. For. Res. 11(1): 13-17. Keyes, M. R., and Grier, C. C. 1981. Above- and below-ground net produc tion in 40-year-old Douglas-fir stands on low and high productivity sites. Can. J. For. Res. 11(3): 599-605. Kiil, A. D. 1967. Fuel weight tables for white spruce and lodgepole pine crowns in Alberta. Can. Dept. Forest. Rural Develop. Publ. 1196. Kiil, A. D. 1968. Weight of the fuel complex in 70-year-old lodgepole pine stands of different densities. Can. Dept. Forest. Rural Develop. Publ. 1228. Kimmins, J. P. 1977. Evaluation of the consequences for future tree productivity of the loss of nutrients in whole-tree harvesting. For. Ecol. Mgt. 1: 169-183. Kimmins, J. P., and Krumlik, G. J. 1976. On the question of nutrient losses accompanying whole-tree harvesting. IUFRO Oslo Biomass Studies, University of Maine Press, Orono, Maine, pp. 41-53. Kimmins, J. P., and Scoullar, K. 1979. FORCYTE: A computer simulation approach to evaluating the effect of whole-tree harvesting on the nutrient budget in Northwest forests. Univ. of B.C., Fac. of. Forestry. Kimmins, J. P., Scoullar, K., and Feller, M. C. 1981. FORCYTE—An ecologically-based computer simulation model to evaluate the effect of intensive forest management on the productivity, economics and energy balance of forest biomass production. Univ. B.C., Fac. of Forestry. Kozak, A. 1970. Methods for ensuring additivity of biomass components by regression analysis. For. Chron. 46(5): 402-404. Koziol, J. A. 1982. A class of invariant procedures for assessing multi variate normality. Biometrika, 69: 423-427. Kurucz, J. 1969. Component weights of Douglas-fir, western hemlock and western red cedar biomass for simulation of amount and distribution of forest fuels. Unpublished M.F. Thesis, The University of British Columbia, Vancouver, Canada. Loomis, R. M., Phares, R. E., and Crosby, J. S. 1966. 
Estimating foliage and branchwood quantities in short-leaf pine. For. Sci. 12(1): 30-39.
Madansky, A. 1959. The fitting of straight lines when both variables are subject to error. J. Amer. Stat. Assoc., 54: 173-205.
Malkovich, J. F., and Afifi, A. A. 1973. On tests for multivariate normality. J. Amer. Stat. Assoc., 68: 176-179.
Marks, P. L., and Bormann, F. H. 1972. Revegetation following forest cutting: Mechanisms for return to steady-state nutrient cycling. Science, 176: 914-915.
Miller, R. G. 1974. The jackknife - a review. Biometrika, 61: 1-15.
Montgomery, D. C. 1976. Design and analysis of experiments. John Wiley and Sons, Inc., New York.
Montgomery, D. C., and Peck, E. A. 1982. Introduction to linear regression analysis. John Wiley and Sons, Inc., New York.
Mood, A. M., Graybill, F. A., and Boes, D. C. 1974. Introduction to the theory of statistics. McGraw-Hill, New York (third edition).
Moore, E. H. 1920. On the reciprocal of the general algebraic matrix. Bull. Amer. Math. Soc. 26: 394-395.
Morrison, D. F. 1976. Multivariate statistical methods. McGraw-Hill, New York (second edition).
Muirhead, R. J. 1982. Aspects of multivariate statistical theory. John Wiley and Sons, Inc., New York.
Noble, B., and Daniel, J. W. 1977. Applied linear algebra. Prentice-Hall Inc., Englewood Cliffs, N.J. (second edition).
Ovington, J. D. 1956. The form, weights and productivity of tree species grown in close stands. New Phytol., 55(3): 289-304.
Ovington, J. D. 1962. Quantitative ecology and the woodland ecosystem concept. Adv. Ecol. Res. 1: 103-192.
Penrose, R. A. 1955. A generalized inverse for matrices. Proc. Cambridge Philos. Soc. 51: 406-413.
Randles, R. H., and Wolfe, D. A. 1979. Introduction to the theory of nonparametric statistics. John Wiley and Sons, Inc., New York.
Rao, C. R. 1965. Linear statistical inference and its applications. John Wiley and Sons, Inc., New York.
Rao, C. R., and Mitra, S. K. 1971. Generalized inverse of matrices and its applications.
John Wiley and Sons, New York.
Rosenberg, L. 1965. Nonnormality of linear combinations of normally distributed random variables. Amer. Math. Monthly, 72: 888-890.
Ruymgaart, F. H. 1973. Non-normal bivariate densities with normal marginals and linear regression functions. Statistica Neerlandica, 27: 11-17.
Sampson, A. R. 1974. A tale of two regressions. J. Amer. Stat. Assoc., 69: 682-689.
Sando, R. W., and Wick, C. H. 1972. A method of evaluating crown fuels in forest stands. U.S.D.A. Forest Serv., North Central Forest Exp. Sta., Res. Pap. NC-84.
Schmitt, M. D. C., and Grigal, D. F. 1981. Generalized biomass estimation equations for Betula papyrifera Marsh. Can. J. For. Res. 11(4): 837-839.
Searle, S. R. 1966. Matrix algebra for the biological sciences. John Wiley and Sons, New York.
Searle, S. R. 1971. Linear models. John Wiley and Sons, New York.
Singh, T. 1982. Biomass equations for ten major tree species of the prairie provinces. Can. For. Serv., Northern Forest Res. Centre, Information Rept. NOR-X-242.
Smith, J. H. G. 1979. A review of the state of the art in complete tree utilization, short rotation forestry, biomass and growth modelling, and simulation. Univ. B.C., Fac. of Forestry.
Smith, J. H. G., and Williams, D. H. 1980. A proposal to develop a comprehensive forest biomass growth model. Final report for the Canadian Forestry Service, Univ. B.C., Fac. of Forestry.
Snedecor, G. W., and Cochran, W. G. 1973. Statistical methods. Iowa State Univ. Press, Ames, Iowa (6th edition).
Springer, M. D. 1979. The algebra of random variables. John Wiley and Sons, New York.
Stewart, G. W. 1973. Introduction to matrix computations. Academic Press, New York.
Storey, T. G. 1969. The weights and fuel size distribution of pinyon pine and Utah juniper. Project Flambeau - an investigation of mass fire 1964-67. Pac. S. W. Forest Range Exp. Sta., Final Rep. III.
Storey, T. G., Fons, W. L., and Sauer, F. M. 1955. Crown characteristics of several coniferous tree species. U.S.D.A.
Forest Serv., Interim Tech. Rep. AFSWP-416.
Storey, T. G., and Pong, W. Y. 1957. Crown characteristics of several hardwood tree species. U.S.D.A. Forest Serv., Interim Tech. Rep. AFSWP-968.
Strang, G. 1976. Linear algebra and its applications. Academic Press, New York.
Tadaki, Y., Shidei, T., Sakasegawa, T., and Ogino, K. 1961. Studies on productive structure of forest (II): Estimation of standing crop and some analyses on productivity of young birch stand (Betula platyphylla). J. Jap. For. Soc. 43(1): 19-26.
Tufts, W. P. 1919. Pruning of deciduous fruit trees. Calif. Agric. Exp. Sta. Bull. 313.
Wald, A. 1940. The fitting of straight lines if both variables are subject to error. Ann. Math. Stat. 11: 284-300.
Whittaker, R. H., Bormann, F. H., Likens, G. E., and Siccama, T. G. 1974. The Hubbard Brook ecosystem study: Forest biomass and production. Ecol. Monogr. 44: 233-254.
Yandle, D. O., and Wiant, H. V. 1981. Estimation of plant biomass based on the allometric equation. Can. J. For. Res. 11(4): 833.
Young, H. E. 1978. Forest biomass inventory. Proc. Symposium on Complete Tree Utilization of Southern Pine, April 1978, New Orleans, Louisiana.
Young, H. E., and Chase, A. J. 1965. Fibre weight and pulping characteristics of the logging residue of seven tree species in Maine. Maine Agric. Exp. Sta., Tech. Bull. 17.
Young, H. E., Strand, L., and Altenberger, R. 1964. Preliminary fresh and dry weight tables for seven tree species in Maine. Maine Agric. Exp. Sta., Tech. Bull. 12.
Zavitkovski, J. 1971. Dry weight and leaf area of Aspen (Populus tremuloides/Populus grandidentata) trees in northern Wisconsin. In Forest Biomass Studies. IUFRO Working Group on Forest Biomass Studies, Sect. 25: Growth and Yield, University of Florida, Gainesville, Florida.
Zavitkovski, J., Jeffers, R. M., Nienstaedt, H., and Strong, T. F. 1981. Biomass production of several jack pine provenances at three Lake States locations. Can. J. For. Res. 11(2): 441-447.
Zyskind, G. 1962.
On conditions for equality of best and simple least squares estimators. (Abstract) Ann. Math. Stat. 33: 1502-1503.
Zyskind, G. 1967. On canonical forms, non-negative covariance matrices and best and simple linear estimators in linear models. Ann. Math. Stat. 38: 1092-1109.
Zyskind, G., and Martin, F. G. 1969. On best linear estimation and a general Gauss-Markoff theorem in linear models with arbitrary non-negative covariance structure. SIAM J. Appl. Math. 17: 1190-1202.

APPENDIX 1(a)

Kozak's (1970) Biomass Data

DBH        Bole     Bark     Branches   Total
(inches)   (lbs.)   (lbs.)   (lbs.)     (lbs.)
X          Y1       Y2       Y3         YT = Y1 + Y2 + Y3
 7.2        254      29        73         356
11.0        749      60       192        1001
 9.8        519      49       151         719
 7.5        217      31       115         363
12.2       1025      76       222        1323
 6.7        242      26        76         344
 5.9        136      18        37         191
 5.5        127      18        39         184
 3.5         62       6        11          79
 8.7        375      39        98         512

Conversion factors: 1 in = 2.54 cm, 1 lb. = 0.4535924 kg

APPENDIX 1(b)

Western Hemlock Biomass Data (Kurucz, 1969)

Height   DBH        Branch    Fine Branch        Bole       Bark      Total
(feet)   (inches)   (lbs.)    + Foliage (lbs.)   (lbs.)     (lbs.)    (lbs.)
 20.0     3.5         25.23      36.26              15.56      3.57      80.62
 14.0     1.6          3.43       7.27               3.32      0.82      14.84
 34.0     6.1         55.62      71.95              60.18     16.21     203.96
 53.0    11.8        314.77     309.58             318.24     52.21     994.80
 41.0     8.8        130.36     168.71             169.58     35.45     504.10
 55.0    10.5        231.26     260.52             278.94     44.52     815.24
 55.0    11.7        289.72     275.71             346.04     56.06     967.53
 93.0    21.9       1210.37     859.97            2095.34    220.94    4395.23
 73.0    15.9        701.20     430.48             916.22    120.91    2168.81
 92.0    17.0        529.87     423.46            1329.83    188.24    2471.40
117.0    24.5       1201.44     468.42            3806.64    487.50    5964.00
167.0    30.5       3188.59     993.78            7971.30   1221.49   13375.16
123.0    23.7       2424.59     831.81            3332.09    369.61    6958.10
131.0    26.0       1658.45     522.55            4323.78    357.79    6862.57
176.0    34.1       3876.44    2013.48           11187.62   1447.37   18524.91
175.0    36.4       4027.03    1535.28           13874.81   2337.72   21774.84
148.0    31.1       2870.00    1204.30            8920.92   1196.20   14191.62
151.0    29.0       5926.09    2149.10            7537.04   1415.22   17027.45

Conversion factors: 1 in = 2.54 cm, 1 foot = 0.3048 m, 1 lb.
= 0.4535924 kg

APPENDIX 1(c)

ENFOR Biomass Data

DBH     Height   Crown    Crown    Bole     Bark     Crown    Total
                 Length   Width                               Biomass
(cm)    (m)      (m)      (m)      (kg)     (kg)     (kg)     (kg)
19.20   22.40    14.50     4.30     94.43   13.96     9.29    117.69
28.00   25.20    20.20     6.80    253.21   27.19    57.67    338.08
22.70   26.70    16.70     4.20    184.65   21.67     7.21    213.52
23.90   28.30    15.40     4.10    224.57   27.48    11.15    263.21
28.90   28.20    11.40     5.60    324.82   34.56    45.79    405.17
23.90   22.40    12.10     4.00    155.99   20.84    15.00    191.83
14.60   12.80    11.20     3.30     28.85    4.58    18.80     52.22
26.80   26.60    23.60     4.80    234.19   32.09    28.43    294.72
11.20    9.80     7.80     2.60     19.77    2.97     5.84     28.58
 3.10    3.50     3.10     1.30      0.64    0.12     0.71      1.48
 5.70    4.90     4.50     1.20      2.79    0.37     2.56      5.73
14.00   11.20     9.70     4.40     24.04    3.85    15.75     43.64
12.10   10.20     9.30     3.10     17.34    2.82     9.73     29.89
 7.00    6.10     5.60     2.40      5.43    0.74     2.09      8.26
16.40   13.40    11.90     5.20     43.07    6.66    24.29     74.02
 4.50    3.90     3.80     1.40      1.30    0.32     2.16      3.79
 9.50    7.10     7.00     2.30      7.89    1.41     4.23     13.53
10.50    8.80     8.80     2.70     13.80    2.08     6.77     22.65
17.60   10.70    10.60    14.00     36.52    5.51    17.54     59.58
18.20   12.40    10.80     4.40     46.94    7.57    24.17     78.68
15.40    8.70     7.00     3.50     25.01    4.09    10.86     39.96
14.50    8.40     7.20     4.00     19.94    3.20    12.19     35.32
32.70   20.60    16.30     3.80    232.63   43.90    89.67    366.20
31.30   21.30    15.30     6.70    326.34   50.43    86.41    463.18
16.10   13.40    10.80     4.80     39.00    7.68    26.77     73.45
11.20   10.00     8.30     3.70     16.15    2.52    11.29     29.95
42.40   22.40    19.70     6.50    523.92   96.28   129.66    749.86
17.40   11.60    10.60     2.80     46.42    6.46    28.42     81.30
16.40   12.60    11.30     3.80     41.12    6.07     8.65     55.84
29.90   17.00    16.20     5.20    131.27   18.81   142.89    292.96
11.30    8.30     6.20     2.40     13.23    3.65     4.81     21.70
21.50   13.50    11.60     3.60     82.87   13.57    16.55    112.99
12.20    6.70     5.50     3.10     14.35    3.30     5.14     22.79
14.20    9.70     9.10     4.30     22.44    3.31    16.59     42.34
 7.80    5.90     5.90     2.60      3.93    0.53     2.04      6.50
 9.40    7.30     7.30     2.00      6.07    1.00     5.04     12.11
11.00    7.60     6.60     2.60     10.94    1.16     4.42     16.51
19.80    9.70     7.60     2.60     38.36    4.72    13.77     56.85
13.60    6.90     5.20     1.20     15.57    2.18     8.54     26.29
11.20    6.10     4.40     2.30      9.91    1.50     5.22     16.62
 9.80    6.40     5.50     2.10      7.17    1.57     4.56     13.30
 9.70    8.10     8.10     2.70      8.40    0.82     1.98     11.19
12.90    9.20     9.20     4.40     18.47    1.93     6.25     26.65
17.70   10.70    10.70     4.70     31.71    3.53    14.63     49.87
11.40    9.70     8.80     4.30     14.91    1.60     6.88     23.39
14.00    9.70     9.70     2.50     21.87    3.64     6.61     32.12
18.70    9.80     9.80     3.50     35.83    4.62    19.28     59.73
11.10    6.90     6.30     2.30      9.01    1.40     5.28     15.69

APPENDIX 1(d)

PROGRAMME 1: Fortran programme calls IMSL subroutine MDCH to compute chi-square probabilities

C     Read 18 values and print the chi-square probability (2 d.f.) of each
      INTEGER IER
      REAL X(18)
      READ(5,10) (X(J),J=1,18)
   10 FORMAT(1X,18F8.5)
      DO 15 I=1,18
      DF=2.0
      CALL MDCH(X(I),DF,P,IER)
      WRITE(6,20) X(I),P
   20 FORMAT(' X(I)=',F8.5,5X,'P=',F10.5)
   15 CONTINUE
      STOP
      END

PROGRAMME 2: Fortran programme computes and prints Koziol statistics

C     JN = sum over I of (X(I) - (I-0.5)/18)**2, plus 1/(12*18) = 1/216
      REAL X(18),JN
      READ(5,10) (X(J),J=1,18)
   10 FORMAT(18F8.5)
      SUM=0.0
      DO 20 I=1,18
      SUM=SUM+(X(I)-((I-0.5)/18.0))**2
   20 CONTINUE
      JN=SUM+(1.0/216.0)
      WRITE(6,30) JN
   30 FORMAT(' JN=',F10.6)
      STOP
      END

[APPENDIX II(a): histogram of the 48 residuals for the ENFOR data. The printer plot is not reproduced; its frequency table is given below.]

Interval name    Frequency        Percentage
                 Int.   Cum.      Int.    Cum.
-42.0              1      1        2.1     2.1
-38.5 to -24.5     0      1        0.0     2.1
-21.0              2      3        4.2     6.3
-17.5              1      4        2.1     8.3
-14.0              1      5        2.1    10.4
-10.5              1      6        2.1    12.5
 -7.0              2      8        4.2    16.7
 -3.5              6     14       12.5    29.2
  0.0             13     27       27.1    56.3
  3.5              8     35       16.7    72.9
  7.0              3     38        6.3    79.2
 10.5              5     43       10.4    89.6
 14.0              3     46        6.3    95.8
 17.5              1     47        2.1    97.9
 21.0 to 80.5      0     47        0.0    97.9
 84.0              1     48        2.1   100.0
 87.5              0     48        0.0   100.0

APPENDIX II(b)

Normal probability plot of residuals (e_ = e_, subscripts illegible in the source) for the ENFOR data

[Plot not reproduced. Horizontal axis: RESIDUAL; vertical axis: normal scores from -2.4 to 2.4.]

APPENDIX III

A SOMEWHAT DETAILED COMPUTATIONAL EXAMPLE

This appendix uses Kozak's (1970) data to show some of the computational details of the generalized additivity problem. These data were chosen partly because the data set is small, which keeps the computations straightforward while still demonstrating each step. Computations for larger data sets (with more independent variables) proceed in the same way. Particular attention is given to computational details not given in section 10.2.

In the following, $X_1$ = diameter, $X_2$ = (diameter)$^2$, and the matrix $X$ refers to the $10 \times 3$ matrix $X = (X_0 \mid X_1 \mid X_2)$, where $X_0$ is a column vector of 1's. Also let $X^* = (X_0 \mid X_2)$. The following matrices will be of use in this discussion.

$$(X^tX)^{-1} = \begin{pmatrix} 8.2031 & -0.82845 & 0.019172 \\ -0.82845 & 0.087614 & -0.002093 \\ 0.019172 & -0.002093 & 0.0000514 \end{pmatrix}$$

$$(X^{*t}X^*)^{-1} = \begin{pmatrix} 0.36948 & -0.0006217 \\ -0.0006217 & 0.00000143 \end{pmatrix}, \qquad X^ty = \begin{pmatrix} 2300.6 \\ 56060 \\ 1443700 \end{pmatrix}$$

The unrestricted total fitted equation is given by

$$\hat{y}_T = 117.19 - 17.696X_1 + 1.06540X_2.$$
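The additivity property that this example relies on can be sketched numerically. The following is a minimal illustration, not the thesis's own program: it assumes the Appendix 1(a) data converted to cm and kg, fits each component and the total by ordinary least squares on the same regressors (1, X1, X2), and compares the summed component coefficients with the coefficients of the total fit. The helper names `solve` and `ols` are hypothetical, and no claim is made that the printed coefficients reproduce those quoted in the text.

```python
# Kozak's (1970) data from Appendix 1(a): DBH (in), bole, bark, branches (lb).
DATA = [
    (7.2, 254, 29, 73), (11.0, 749, 60, 192), (9.8, 519, 49, 151),
    (7.5, 217, 31, 115), (12.2, 1025, 76, 222), (6.7, 242, 26, 76),
    (5.9, 136, 18, 37), (5.5, 127, 18, 39), (3.5, 62, 6, 11),
    (8.7, 375, 39, 98),
]
CM_PER_IN, KG_PER_LB = 2.54, 0.4535924  # conversion factors from the appendix


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    rows = [(1.0, x, x * x) for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)


xs = [d[0] * CM_PER_IN for d in DATA]
components = [[d[j] * KG_PER_LB for d in DATA] for j in (1, 2, 3)]
total = [sum(vals) for vals in zip(*components)]

beta_components = [ols(xs, y) for y in components]
beta_sum = [sum(b[i] for b in beta_components) for i in range(3)]
beta_total = ols(xs, total)

print("total fit:        ", beta_total)
print("sum of components:", beta_sum)
```

Because the least-squares estimate is linear in y, the two coefficient vectors agree up to rounding whenever every component is fitted on the same design matrix. With these unit conversions the sums of y and of X1·y also agree with the first two elements of the X^t y vector quoted in the text (2300.6 and 56060).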
The estimated variance-covariance matrix of $\hat{\beta}_T$ for this model is $(X^tX)^{-1}\hat{\sigma}^2 = (X^tX)^{-1}(217.64)$, where 217.64 is the MSE associated with fitting $y_T$. The standard errors of the parameter estimates are the square roots of the diagonal elements of this matrix. In this case they are 42.25, 4.37 and 0.1058 for $\hat{\beta}_{0T}$, $\hat{\beta}_{1T}$ and $\hat{\beta}_{2T}$, respectively.

The component equations containing only important independent variables are given in section 10.2. The corresponding total conditioned equation as determined by additivity is

$$\hat{y}_{TC} = 129.87 - 19.037X_1 + 1.09745X_2.$$

The covariance matrix of $\hat{\beta}_{TC}$ for the above equation is obtained, under the assumption of independence of the $y_j$'s, as the sum of the covariance matrices of the estimated parameters of the component equations. In the present case, the estimated covariance matrix is given by

$$\hat{\Sigma}_{\hat{\beta}_{TC}} = \left(\sum_{j=1}^{3}\hat{\sigma}_j^2\right)\begin{pmatrix} 8.94206 & -0.82845 & 0.01793 \\ -0.82845 & 0.087614 & -0.002093 \\ 0.01793 & -0.002093 & 0.0000543 \end{pmatrix}$$

where $\hat{\sigma}_j^2$ is the mean square error associated with fitting component model $j$. In the present case, $\hat{\sigma}_1^2 = 194.04$, $\hat{\sigma}_2^2 = 0.55375$, and $\hat{\sigma}_3^2 = 42.198$. Therefore $\sum_{j=1}^{3}\hat{\sigma}_j^2 = 236.792$. Hence the estimated standard errors for the parameter estimates in $\hat{y}_{TC}$ are, in order, 46.02, 4.55 and 0.1134, which compare favourably with those given above for the unrestricted total model.

Before the covariance matrices are added, each is padded with zeros to the full size corresponding to all variables in the conditioned total equation, and its elements are permuted so that the same positions correspond to the same parameters. This step has not been exhibited in the above derivations.

Finally, the predicted values generated by the unrestricted total equation $\hat{y}_T$ and its residuals are compared with those obtained using the total conditioned equation $\hat{y}_{TC}$. The results are given in Table 3(a,b) and in Figure 1. For these data, at least, the total conditioned equation performs relatively well.
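The standard errors quoted in this appendix, and the zero-padding step it describes, can be checked with a short sketch. It uses only the matrices and mean squares printed in the text. The helper `pad` is hypothetical, and the choice of padding pattern (one component using all three terms, two using only the intercept and X2) is an assumption made here because it happens to reproduce the quoted 3 x 3 matrix to the printed precision; section 10.2 is not available in this part of the text to confirm it.

```python
import math

# Matrices quoted in this appendix (parameter order: intercept, X1, X2).
XTX_INV = [
    [8.2031, -0.82845, 0.019172],
    [-0.82845, 0.087614, -0.002093],
    [0.019172, -0.002093, 0.0000514],
]
XSTAR_INV = [[0.36948, -0.0006217], [-0.0006217, 0.00000143]]  # intercept, X2

MSE_TOTAL = 217.64                       # MSE of the unrestricted total fit
COMPONENT_MSE = [194.04, 0.55375, 42.198]  # MSEs of the three component fits


def pad(small, positions, size=3):
    """Embed a small matrix in a size x size zero matrix at the given indices."""
    full = [[0.0] * size for _ in range(size)]
    for a, i in enumerate(positions):
        for b, j in enumerate(positions):
            full[i][j] = small[a][b]
    return full


# Zero-pad the 2 x 2 matrix so its rows/columns line up with (intercept, X2).
padded = pad(XSTAR_INV, positions=(0, 2))

# Assumed combination: full-model matrix plus two padded reduced-model matrices.
combined = [[XTX_INV[i][j] + 2 * padded[i][j] for j in range(3)] for i in range(3)]

# Standard errors: square roots of (diagonal element) x (mean square).
se_unrestricted = [math.sqrt(XTX_INV[i][i] * MSE_TOTAL) for i in range(3)]
mse_sum = sum(COMPONENT_MSE)
se_conditioned = [math.sqrt(combined[i][i] * mse_sum) for i in range(3)]

print("unrestricted SEs:", [round(s, 4) for s in se_unrestricted])
print("conditioned SEs: ", [round(s, 4) for s in se_conditioned])
```

Rounded to the precision used in the text, the two lists agree with the quoted values (42.25, 4.37, 0.1058 and 46.02, 4.55, 0.1134). In a general setting each padded matrix would be multiplied by its own component mean square before the matrices are summed.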
Of course this may partly be ascribable to the very small sample and narrow sample range.

Table 3. Comparison of predicted values and residuals of unrestricted total equation with those of total conditioned equation

a. Unrestricted Total Equation

OBSERVED TOTAL BIOMASS   PREDICTED TOTAL BIOMASS   RESIDUAL
161.48                   166.53                     -5.05
454.05                   454.27                     -0.23
326.13                   336.65                    -10.51
164.65                   166.53                     -1.87
600.10                   591.69                      8.41
156.04                   124.40                     31.63
 86.64                    91.08                     -4.44
 83.46                    77.71                      5.75
 35.83                    43.88                     -8.05
232.24                   246.21                    -13.97

b. Total Conditioned Equation

OBSERVED TOTAL BIOMASS   PREDICTED TOTAL BIOMASS   RESIDUAL
161.48                   165.48                     -4.00
454.05                   454.69                     -0.65
326.13                   335.99                     -9.86
164.65                   165.48                     -0.83
600.10                   593.78                      6.32
156.04                   123.73                     32.30
 86.64                    91.05                     -4.41
 83.46                    78.10                      5.36
 35.83                    47.36                    -11.53
232.24                   245.10                    -12.86

Figure 1. Scatter of Residuals from Total Unrestricted Equation and from Total Conditioned Equation

[Plot not reproduced. Key: + = $(\hat{y}_T, e_T)$, X = $(\hat{y}_{TC}, e_{TC})$. Horizontal axis: PREDICTED TOTAL BIOMASS; vertical axis: residual.]
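The comparison in Table 3 can be summarised numerically. The sketch below is an illustration added here, using only the residuals printed in Table 3; the variable names are hypothetical. It computes the residual sum of squares for each equation and, for the unrestricted equation, the mean square on 10 - 3 = 7 degrees of freedom.

```python
# Residuals as printed in Table 3 (observed minus predicted total biomass).
RES_UNRESTRICTED = [-5.05, -0.23, -10.51, -1.87, 8.41,
                    31.63, -4.44, 5.75, -8.05, -13.97]
RES_CONDITIONED = [-4.00, -0.65, -9.86, -0.83, 6.32,
                   32.30, -4.41, 5.36, -11.53, -12.86]

N_OBS, N_PARAMS = 10, 3  # ten trees; intercept, X1 and X2

sse_unrestricted = sum(e * e for e in RES_UNRESTRICTED)
sse_conditioned = sum(e * e for e in RES_CONDITIONED)
mse_unrestricted = sse_unrestricted / (N_OBS - N_PARAMS)

print(f"SSE, unrestricted equation: {sse_unrestricted:.2f}")
print(f"SSE, conditioned equation:  {sse_conditioned:.2f}")
print(f"MSE, unrestricted equation: {mse_unrestricted:.2f}")
```

The unrestricted mean square recovered from the rounded residuals is close to the 217.64 quoted in Appendix III, and the conditioned equation's residual sum of squares is only slightly larger than that of the unrestricted equation, in line with the remark that it performs relatively well for these data.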
Additivity of component regression equations when the underlying model is linear Chiyenda, Simeon Sandaramu 1983
Item Metadata
Title | Additivity of component regression equations when the underlying model is linear |
Creator |
Chiyenda, Simeon Sandaramu |
Publisher | University of British Columbia |
Date | 1983 |
Date Issued | 2010-05-02T22:13:05Z |
Description | This thesis is concerned with the theory of fitting models of the form y = Xβ + ε, where some distributional assumptions are made on ε. More specifically, suppose that y[sub=j] = Xβ[sub=j] + ε[sub=j] is a model for a component j (j = 1, 2, ..., k) and that one is interested in estimation and inference theory relating to y[sub=T] = Σ [sup=k; sub=j=1] y[sub=j] = Xβ[sub=T] + ε[sub=T]. The theory of estimation and inference relating to the fitting of y[sub=T] is considered within the general framework of general linear model theory. The consequence of independence and dependence of the y[sub=j] (j = 1, 2, ..., k) for estimation and inference is investigated. It is shown that under the assumption of independence of the y[sub=j], the parameter vector of the total equation can easily be obtained by adding corresponding components of the estimates for the parameters of the component models. Under dependence, however, this additivity property seems to break down. Inference theory under dependence is much less tractable than under independence and depends critically, of course, upon whether y[sub=T] is normal or not. Finally, the theory of additivity is extended to classificatory models encountered in designed experiments. It is shown, however, that additivity does not hold in general in nonlinear models. The problem of additivity does not require new computing subroutines for estimation and inference in general in those cases where it works. |
Subject |
Linear models (Statistics) Regression analysis Estimation theory |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Collection |
Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project |
Date Available | 2010-05-02 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0075140 |
Degree |
Doctor of Philosophy - PhD |
Program |
Forestry |
Affiliation |
Forestry, Faculty of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
URI | http://hdl.handle.net/2429/24279 |
Aggregated Source Repository | DSpace |