O P T I M A L DESIGN FOR LINEAR REGRESSION W I T H V A R I A B L E COSTS A N D PRECISION R E Q U I R E M E N T S A N D ^ T S APPLICATIONS T O F O R E S T R Y r By MARGARET PENNER B. Sc. (Hons.) Lakehead University, ,1984 A THESIS SUBMITTED IN PARTIAL FULFILLMENT. OF THE REQUIREMENTS FOR T H E DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES DEPARTMENT OF FORESTRY We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA April 1988 • © MARGARET PENNER, 1988 In presenting this degree at the thesis in University of partial fulfilment of of department this thesis for or publication of by his or her representatives. for an advanced Library shall make it agree that permission for extensive scholarly purposes may be It is granted by the understood that head of copying my or this thesis for financial gain shall not be allowed without my written permission. Department The University of British Columbia Vancouver, Canada DE-6 (2/88) requirements British Columbia, I agree that the freely available for reference and study. I further copying the Abstract Various criteria are discussed and as algorithms for obtaining optimal designs for use in linear regression. Linear regression is used widely and effectively in forestry but the method of quantifying the linear relationship in terms of selecting observations or designing an experiment to obtain observations is often inefficient. The experiment or study objectives must be identified to develop a criterion for comparing designs. Then a method of obtaining observations can be found which performs well under this criterion. Biometricians in forestry have been slow to take advantage of one of the assumptions of linear regression, namely that the independent variables are fixed. In part this has been due to limitations in the theory. Two important assumptions in most optimal design work, namely that precision requirements and costs are constant for all observations, are not valid for most forestry applications. Ignoring nonconstant costs can lead to designs less efficient than ones where each combination of independent variables is selected with the same frequency. The objective of this study was to develop a method of optimal sample selection that allowed for costs and precision requirements that vary from observation to observation. The resulting practical experimental layouts are more efficient for attaining the experimenter's objectives than randomly selected observations or designs constructed using the currently available design theory. Additional features of designs that consider differing costs and precision requirements are their larger sample size and their robustness to misspecification of the sample space. Traditional optimal designs concentrated n observations on the boundaries of the sample space. By recognizing that these observations may be more costly and may not be of primary interest to the experimenter, more efficient designs can be constructed from less extreme observations. A computer program for obtaining optimal designs is also included. 111 Table of Contents Abstract ii List of Tables vii List of Figures ix Acknowledgement x 1 INTRODUCTION 1 2 LITERATURE REVIEW 4 2.1 2.2 2.3 THEORY 6 2.1.1 THE PROBLEM 6 2.1.2 OPTIMAL DESIGN FOR PARAMETER ESTIMATION . . . . 12 2.1.3 MODEL UNCERTAINTY 22 2.1.4 SEQUENTIAL DESIGN 28 2.1.5 BAYESIAN OPTIMAL DESIGN 29 2.1.6 NONLINEAR REGRESSION 31 2.1.7 DECISION THEORY 34 2.1.8 ROBUST DESIGNS 37 REVIEW OF FORESTRY LITERATURE 40 2.2.1 MODEL-BASED SAMPLING 41 2.2.2 SAMPLING FOR BIOMASS 44 ALGORITHMS 45 iv 3 4 5 2.3.1 INFORMATION MATRIX 45 2.3.2 APPROXIMATE DESIGNS 46 2.3.3 EXACT DESIGNS 50 2.3.4 SEQUENTIAL DESIGN 54 COMPLICATIONS 56 3.1 AVAILABILITY 56 3.2 PRECISION 57 3.3 COSTS 58 P R O B L E M SOLUTION 61 4.1 AVAILABILITY 62 4.1.1 OBTAINING THE WEIGHTS 64 4.1.2 INCORPORATING THE WEIGHTS 65 4.1.3 IMPLICATIONS OF MISSPECIFYING AVAILABILITY . . . . 66 4.1.4 DISCUSSION 66 4.2 GENERALIZED LOSS 71 4.3 INCORPORATING COST 72 4.4 DISCUSSION 74 EXAMPLE 76 5.1 BACKGROUND 76 5.2 PREPARATION 78 5.2.1 OBJECTIVES 78 5.2.2 INDEPENDENT VARIABLES 78 5.2.3 MODEL 78 5.2.4 RANGES 79 v 7 WEIGHTS 79 5.2.6 VARIANCE OF THE DEPENDENT VARIABLE 80 5.3 PILOT STUDY 80 5.4 ANALYSIS OF PILOT STUDY 82 5.4.1 OBTAINING A MODEL 82 5.4.2 OBTAINING THE WEIGHTS 83 5.4.3 ESTIMATING THE VARIANCE OF VOLUME 84 5.5 IDENTIFYING THE OPTIMUM DESIGN 84 5.6 COMPARISONS WITH OTHER DESIGNS 92 5.6.1 RANDOM SAMPLING 92 5.6.2 UNIFORM SAMPLING 93 5.6.3 PURE OPTIMUM DESIGN 97 5.6.4 DISCUSSION 97 5.7 6 5.2.5 MODIFICATIONS 103 5.7.1 NONLINEAR MODELS 103 5.7.2 POLYNOMIAL MODELS 104 A D D I T I O N A L APPLICATIONS A N D V A R I A N T S 106 6.1 UNIQUENESS 106 6.2 STRATIFICATION AND QUALITATIVE VARIABLES 107 6.3 DISCRETE QUANTITATIVE VARIABLES 108 CONCLUSION 109 REFERENCES CITED 111 vi List of Tables 2.1 Some examples of 0 and <p from Fedorov and Malyutov(1972) 5.2 Pilot study data relating average plot aerial photo measurements and 20 ground volume measurements 81 5.3 ANOVA for the final model 82 5.4 Comparison of various designs in terms of sample size and cost to achieve the same precision of 1 cubic metre per plot 5.5 88 Comparison of various designs in terms of sample size and minimum precision obtained for a total study cost of $50,000.00 vn 92 List of Figures 2.1 The process of seeking a mathematical model (Fedorov, 1972 p. 8) . . . 2.2 The region of operability (RO(x)) and the region of interest (RI(x)) in 5 two dimensions 14 2.3 Two possible weight functions (w(x)) for k = 1 15 4.4 The relationship between diameter at breast height, height, and relative frequency 63 4.5 The relationship between height, crown width, and dispersion = xDx. . 68 4.6 The relationship between diameter at breast height, height, and weight = relative frequency 4.7 69 The relationship between height, crown width, and weight*relative frequency 70 5.8 Guidelines for designing and conducting an experiment or survey . . . . 77 5.9 The relationship between the normalized ranks of the residuals and the residuals 83 5.10 The relationship between the residual and predicted values 84 5.11 A summary of the program to find the optimum design 86 5.12 Relative frequency of observations over the sample space for the optimum sample design for CC = 50 - 65 % '. . 89 5.13 Relative frequency of observations over the sample space for the optimum sample design for CC = 65 - 85 % viii 90 5.14 Relative frequency of observations over the sample space for the optimum sample design for CC = 85 - 100 % 91 5.15 Relative frequency of observations over the sample space for the random sample design for CC = 50 - 65 % 94 5.16 Relative frequency of observations over the sample space for the random sample design for CC = 65 - 85 % 95 5.17 Relative frequency of observations over the sample space for the random sample design for CC = 85 - 100 % 96 5.18 Relative frequency of observations over the sample space for the uniform sample design for CC = 50 - 65 % 98 5.19 Relative frequency of observations over the sample space for the uniform sample design for CC = 65 - 85 % 99 5.20 Relative frequency of observations over the sample space for the uniform sample design for CC = 85 - 100 % 100 5.21 Relative frequency of observations over the sample space for the pure optimum sample design for CC = 50 - 65 % 101 5.22 Relative frequency of observations over the sample space for the pure optimum sample design for CC = 85 - 100 %. . . . ix r 102 Acknowledgement I would like to thank my supervisor Dr. Julien Demaerschalk as well as the rest of my supervisory committee, Dr. Harry Joe, Dr. A. Kozak, and Dr. Peter Marshall for their direction and guidance. Their contributions to my thesis and forest biometrics education were enormous as was their patience. Funding for this research was supplied by the Canadian Forest Service from 1984 to 1987 and then by the National Sciences and Engineering Research Council for the final year. I would also like to thank my fellow students and friends, particularly Sylvia Regehr and Sandra Thomson. Their friendship became very important to me during my years at the University of British Columbia. They are very special people. I would also like to thank Guillaume Therien for proofreading my thesis draft and removing many of the commas. He as well as Karen Jahraus and Valerie Lemay are important to me both as peers and friends. Finally I would like to thank my family, particularly my parents who always asked when I would finish, never doubting that the time would come. Their faith in me has always amazed and encouraged me. x my grandparents, Jacob and Marichien Klassen I dedicate this thesis. xi Statistical experimental designing means the investigation and construction of reason- able, possibly in some sense optimal, strategies of observation of random elements in stochastic models with regard to a given decision situation. Bandemer, 1980 p. 105 A good experimental design is one which focuses experimental effort on what is judged important in the particular current experimental context. Box and Draper, 1987 p.505 xn Chapter 1 INTRODUCTION An important part of forest biometrics involves quantifying relationships between different measurable quantities. Ratio estimators are used to estimate load volume from truck weights. Regression lines are used to relate aerial photograph measurements to ground measurements. Experiments are conducted to relate nutrient levels to seedling growth. Curves are used to relate stem wood density to distance from the pith. Tree volume is predicted from more easily measured tree attributes. For example, a manager may be interested in knowing the relationship between a tree's height and diameter and its volume. Knowledge of this relationship enables the manager to make educated predictions of tree volume from height and diameter measurements and also to estimate the effect that changing diameter and height would have on volume. To quantify such a relationship, one must usually collect the information by either experimentation or sampling. The dependent or predicted variable can then be regressed on the independent or predictor variables. Another example of estimating relationships in forestry is the use of sample plots. Temporary and permanent sample plots are used extensively for predicting various quantities such as stand volume, height growth, basal area, etc. These plots are expensive to establish, and permanent sample plots are costly to maintain. Funds are not available to establish plots in all stand conditions. If the relationship between stand conditions and attributes of interest are known or estimated, attributes of stands which have not been sampled can be predicted. Sample plots are usually located in a wide variety of stand conditions to encompass as many forest types 1 Chapter 1. INTRODUCTION 2 as practical. Note that a network of plots located in one set of stand conditions may yield predictions which have quite different precision than predictions from a network of plots located in another set of stand conditions. The two networks of plots do not yield the same information, therefore, one network may be better than the other for some purposes. Thus the question arises, how should one collect the information? Allocation of observations or design of experiments has been examined in some detail by statisticians (see Fedorov and Malyutov, 1972; St. John and Draper, 1975; Ash and Hedayat, 1978; Pazman, 1980; Atkinson, 1982; Steinberg and Hunter, 1984). The major complication which has emerged is that the variability of a regression surface has two components, model or relationship variability (also called random error) and bias or systematic error due to model inadequacy. Unfortunately, designs which are optimal for minimizing the random error component rarely allow for checking of bias although the reverse is not necessarily true. Designs which minimize the bias may also be efficient for minimizing the random error component (Box and Draper, 1959). The problem of experimental design for regression has been examined chiefly from the theoretical point of view, and some common complications have been largely ignored. Firstly, statisticians tend to equate efficiency and thus optimality with precision and accuracy. However, most managers equate efficiency with the amount of information gathered per unit money (or time) spent. When all information is not available at the same cost and not all observations are equally informative, these two notions of efficiency lead to different optimal designs. Secondly, not all predictions need have the same precision. Statisticians tend to specify a maximum allowable error while in forestry a per cent error is often specified. Again this leads to different optimal designs. These two complications represent major obstacles to the current theory being applied to practical forestry problems. This study addresses these two complications to optimal design. First the current Chapter 1. INTRODUCTION 3 statistical literature on optimal design theory will be reviewed as well as the forestry literature on model-based sampling. Currently available algorithms for optimal design construction are discussed. Complications which limit the use of optimal design theory in forestry are identified in chapter 3. Chapter 4 outlines a method of addressing the complications and explicitly incorporating them in optimal designs. In chapter 5, an algorithm for obtaining optimal designs is developed. An example of the application of optimal design theory to forestry, accounting for the unequal cost of observations, will be discussed and the limitations of the current theory identified. An optimal design will be constructed and compared to several more common designs as well as a design obtained using the current, unmodified theory (i.e., not considering practical differences between observations). Further applications and modifications will be described. Increasing awareness of the inefficiency of many surveys and experiments in forestry has lead to a search for more methodical, objective approaches to design. Data are expensive to collect and ought to provide- as much relevant information as possible. Optimal design can provide a tool for obtaining such data. Chapter 2 LITERATURE REVIEW Optimal design of experiments is concerned with the planning of experiments to yield information which best meets the experimenter's objectives(see figure 2.1). A key development, promoted largely by Sir R. A. Fisher, was the shift from using statistics only after an experiment was completed to involving the use of statistics from the beginning of an investigation. This shift from analysis to design of experiments was prompted by the observation that analysis of experimental data was often unnecessarily difficult if not impossible because of improper planning of the experiment. By using the concepts of orthogonality and rotatability, Fisher obtained classes of designs commonly grouped under the heading "Analysis of Variance". These lead to simple calculations for estimates and symmetric variances and covariances. This was necessary at the time since calculations were done by hand. The resulting designs also had some desirable statistical properties including variances for parameter estimates which are minimal under certain conditions. A later development was to identify these desirable properties, develop an optimality criterion, and find designs which are optimal or close to optimal under this criterion. The development and sophistication of computing machines has meant that search algorithms may be used to obtain such designs. In some cases both approaches lead to the same design but in many cases they differ. Experimental design for regression is concerned with quantifying the relationship between one or more independent variables and one or more dependent variables. The relationship is often called the regression (or response) surface (or function). Much 4 Chapter 2. LITERATURE unsatisfactory model REVIEW Verify agreement between the model and the data Reconsider the model Suggest concurring models I 1 satisfactory Plan an experiment for estimation of parameters Reduce the experimental data Seek values of the parameters sought I I Plan discriminatory experiment i I experiment The clockwise route is concerned with increasing the precision of the model while counterclockwise route is concerned with increasing the accuracy of the model. Experimental design is concerned with the steps inside dashed boxes. Figure 2.1: The process of seeking a mathematical model (Fedorov, 1972 p. 8) Chapter 2. LITERATURE REVIEW 6 work was done on linear regression design prior to the 1970's and since then several algorithms have been designed and implemented. Also, increasing effort has been devoted to designs for nonlinear relationships and models with bias due to relationship misspecification. Silvey (1980) provides an excellent summary of progress to 1980. Most of the initial work was concerned with parameter estimation (Elfving,1952; Draper and Hunter, 1966, 1967a, 1967b; Kiefer, 1959). Several optimality criteria were developed. One weakness of this approach is that designs which are optimal for parameter estimation are often inadequate for providing model validation (Silvey, 1980). Another branch of research looked at designs which have been called discriminating experiment designs—designs which are optimal for discriminating among two or more possible regression models. Approaches to design construction can be classified into sequential and non-sequential. 2.1 THEORY 2.1.1 THE PROBLEM NOTATION The following notation is after Silvey (1980). The distribution of a random variable y depends on : 1. a column vector /J, ~'= (/i 1 ,/i 2 , T • • •, Mr) of control variables which may be chosen by the experimenter. These are also known as independent variables. 2. a column vector 0 = (0 0 ,..., Ok) of parameters which arefixedbut unknown T l5 2 to the experimenter. These and certain functions of them are of interest to the experimenter. Chapter 2. LITERATURE 3. a column vector T t REVIEW = (r 1 ? r , . . . , T{) 2 7 of nuisance parameters which are fixed and unknown but not of primary interest. The distinction between 9 and r may not always be obvious. In biology, an attribute, e.g., tree volume, is related to many things e.g., tree height, diameter, age, species, phenotype, genotype, etc. Typically we may only be able or interested in predicting tree volume from a subset of these attributes. The 9 vector corresponds to parameters in functions of this subset. The remaining attributes are ignored or assumed to be negligible and are lumped, along with measurement errors, into a random error component and the r accounts for this. Vectors fx can be chosen from the sample space, a given set U in R (where R T 1 is i-dimensional real space assumed to be compact), which may be constrained by the experimenter (e.g., to include previous observations or to exclude dangerous or infeasable operating conditions). The true 0 belongs to 0 in R and the true r to a set k T in R . l The experimenter may take N independent observations on y at vectors ^( ),//( ),.. .,fJ.(N) 1 2 from the set U. These selected vectors compose the iV-observation design and may then be used to estimate 9 and r and predict y for additional \i. The problem is how to best select the N observations. Just as there are many different estimators, there are many approaches to optimal design which have been proposed and tried. A decision theoretic approach which uses a utility function and prior distributions for 9 and r selects the design with the maximum utility (see Brooks, 1974; Chaloner, 1984). Although this approach has some attractions, it has some serious disadvantages. Chief among them are the computational complexities. Other approaches consider the possible biases that arise from model or relationship misspecification. However, the bulk of the groundwork was based on the Chapter 2. LITERATURE REVIEW 8 assumption of a correct model and using the information matrix and squared error loss. The addition of model bias considerably complicates the design problem. As well, the form of the bias must often be specified. If the bias function is known, the bias can often be removed. INFORMATION MATRIX Let the probability density function of y for given /z, 9 and r be given by p(y; 9, r). Then, assuming sufficient regularity of p(y; fx, 9, r) for differentiation, the partitioned Fisher information matrix for (9, r) for a single observation y at the vector \J. is J{H\B,T)=\ where Jee(p) is the k x k Jee(n) Jer{fJ>)\ Jre(n) JTT() (/*)/ matrix whose ( i j ) a ln 2 E t h (2.1) element is p(y;n,9,T) (2.2) ddidOj and other entries are similarly obtained. For the present, the observations y are assumed to be independent of each other. The information for an iV-observation design is the sum of the information matrices of the individual observations and will be denoted as K(H;9,T) and partitioned in the same manner as J(/z;#,r). For the present it will be assumed that K(/J.;0,T) is non-singular. Then K~ (fj,;9, r) 1 provides a generalized Cramer-Rao lower bound for the variance-covariance matrix of any unbiased estimator of (9,T) (Silvey, 1975). Also, its leading k x k submatrix given by Vi(/i; 0, r) = 9, r) - i^ ( ; T M 9, r)[K (^ rr 9, r)\~ K {\x\ x r6 9, r)}" 1 (2.3) is a lower bound for the variance-covariance matrix of any linear unbiased estimator of 9. As the sample size increases, the variance-covariance matrix of the maximum likelihood estimates (MLE's) converges to this lower bound (Silvey, 1975). With this asymptotic Chapter 2. LITERATURE REVIEW 9 property in mind, this matrix will also be referred to as the variance-covariance matrix for the estimate of 9. If one is interested in real-valued functions of g (9) T = (g (9),g (9),... 1 ,g (9)) 2 S and be the Dg(8) 9, k say x s gi(9), g (9),..., g (9), 2 matrix with S (ij) t k let compo- nent ^ 1 . Then, assuming sufficient regularity, given an iV-observation design, the lower bound for the variance-covariance matrix of an unbiased estimator of g(9) is V (;u; 9, r) = {Dg{9)YV {^ g e 9, r){Dg(9)} (2.4) and for large JV, the MLE of g(9) converges to this lower bound. PROBLEM STATEMENT The majority of approaches to optimal design are concerned with selecting /J, to make the lower bound of the variance-covariance matrix of the parameters of interest (9) as small as possible or to make its inverse, the Fisher information matrix, as large as possible. Two complications occur. 1. Only in special cases is it possible to find the "best" /z = fi* in the sense that Vg(fj.;9, T) — Vg(/u,*;$,T) is non-negative definite for all \i. Thus one must be satisfied with minimizing some function of V$(/j,; 9, T). 2. In the case of nonlinear regression, Vg(fj.;$,r) is dependent on the unknown pa- rameters. Therefore, prior knowledge about 0 and r is required. In linear regression, this dependence on the parameters does not occur and therefore prior knowledge of their values is not required. For now, attention will be restricted to linear models (models linear in the parameters). The main reasons for this are that the theory is better developed than for nonlinear models, and many nonlinear problems can be solved by transforming the Chapter 2. LITERATURE REVIEW 10 nonlinear model to a model linear in the parameters. As well, when one is looking for an empirical model, it is usually preferable to keep things as simple as possible. Many nonlinear relationships can be approximated by a linear model over portions of their range. The linear relationship can be stated in the form E{Y;n,d,R)=RI(TI,6) ( 2 . 5 ) or Y = v(n,0) + G(r*,T) where E[G(FJ.,r)] = 0 . ( 2 . 6 ) Specifing a model involves stating the functional form of and G(/J.,R). A linear model is one in which RJ(^I,8) is linear in the parameter R](FJ,,0) 8. That is, 77( M ,0) = / ( M ) T 0 ( 2 . 7 ) and VAX(Y;/J,,e,T) = TV(/X) ( 2 . 8 ) where V(FI) is a known function of /J,. The standard linear model has the additional simplification Vax(y;M,T) = l . This may be obtained from equation a corresponding transformation to 2 . 8 F(FI), by the transformation namely, ( 2 . 9 ) Y' = Y{RV(FJ,)}~2 F(FI)' = / ( / i ) { r u ( / j ) } " 2 . and Therefore, attention will be restricted to the standard linear model with the realization that it is not as restrictive as it may appear. For linear relationships, it is customary to estimate 8 by the method of least squares (LS). When Y is assumed to be normal, the LS estimators are also the MLE's. Selecting a vector \x from the set U is equivalent to choosing afc-vectorx T (xi, x » • • •, k) from the set X = F(U) where F is the vector-valued function (/i(/x), x 2 = Chapter 2. LITERATURE / (/x),..., fkir ))- 11 Therefore, equation 2.5 may be replaced by 1 2 REVIEW E(y;H,0,T) = x e. (2.10) T The problem becomes one of selecting N vectors X(i),X( ), 2 ••• from the set X, a subset of R . k Under the assumption that the observations on y are independent, the variancecovariance matrix of the least squares estimators 8 of 8 from an iV-observation design is V(8 X,T) ] = T(Y,X X{ )(I) I) (2.11) 1 t=i when J2iLi (i) (i) x x 1 S non-singular. Note that this quantity is independent of 8. APPROXIMATE THEORY For an iV-observation design, let pi be the proportion of total observations taken at X(j),i = 1,... ,n, where n < N with equality if, and only if, all x^ are distinct. These probabilities comprise a discrete probability distribution or design measure fa on X. Equation 2.11 may now be written as V(^;x,r) = r{ivf:p x xJ }- . 1 t (t) ) (2.12) Note that minimization is subject to the the constraints that Npi be a non-negative integer and represent the number of replicate observations of x^y If the requirement that Npi be an integer is relaxed to include all non-negative real numbers, calculus techniques may be used to facilitate finding an optimal solution. Thus, the set of fa is expanded to the set of all design measures £ € E, both discrete and continuous. It is then hoped it will be possible to find an fa close to £ that will be close to optimal. The design associated with £ is called an approximate design while one associated with fa is called an exact design. Note that £ is unique only up to multiplication by a constant, Chapter 2. LITERATURE REVIEW 12 that is, the pi are independent of N. Thus, for most of the optimality criteria, the pi can be determined first and then N calculated (Fedorov, 1972 p.63). OPTIMAL DESIGN FOR P A R A M E T E R ESTIMATION 2.1.2 Let N- M[i(N)} x = M(() = iV" 1 x^xj^ = f x^xj^dx which is the standardized x information matrix T~ K(/J,; 6, r). A good design is one which makes M(£) large in some 1 sense or its inverse small. In general, M(£) £ M where M. = {M(() : £ € H}. Let M + = {M G M. : det M ^ 0}. For now restrict M to be in A4 . Parameter estimation + criteria essentially are concerned only with reducing the random error component of a model, ignoring model bias and often leaving no means for evaluating model bias. Several criteria for prediction and parameter estimation have been studied. DESIGN CRITERIA Once an experimenter defines his objectives and constraints, he must also find a criterion to measure to what extent his objectives have been met. As Kiefer (1975) observed, when statisticians use an optimality criterion in selecting a procedure, the precisely expressed criterion is usually only an approximation to some vague notion of "goodness". Sometimes it will be useful to distinguish between the region of interest RI(x) and the region of operability RO(x). The region of interest is that subset of R for which k the experimenter is interested in using the model. The region of operability is the subset of R from which x may be selected. The region of interest will be assumed to k be contained within the region of operability, i.e., one is not interested in extrapolation (see figure 2.2). In addition, the experimenter may not be equally interested in all points in RI(x). Chapter 2. LITERATURE REVIEW 13 He may, for example, be more interested in "average" observations. This can be accounted for by introducing a weighting function over RI(x) denoted by w(x) suitably scaled to integrate to one over RI(x) (see figure 2.3). For the present, it will be assumed that RI(x) and RO(x) coincide and that w(x) is uniform over both. SPECIFIC C R I T E R I A The following criteria are essentially different ways of re- ducing M(£) to a meaningful scalar quantity suitable for comparing two or more designs. 1. G-optimality. Thefirstformal criteria was due to Smith (1918). Let x 9 be the T estimate for x 0. She proposed minimizing the maximum variance of this estiT mate for all x 6 X, that is, finding £ 6 H which minimizes max x x€ x [M(£)]~ x. T 1 This criterion has been referred to as the minimax criterion or G-optimality where the G- stands for generalized variance. This is a prediction criterion and has been criticized for being too conservative in that it protects against the worst possible situation. As will be seen, this criteria is closely related to parameter estimation criteria. It is also closely related to robust designs which protect against "wild" observations (Box and Draper, 1975). 2. D-optimality. Under the assumption of normality of errors (r ~ xjv_ ), a confifc dence ellipsoid for 9 is of the form {9; (0 — 9) M(£)~ (9 T l - 9) < constant} where 9 is the least squares estimate for 9. The content of this ellipsoid is proportional to {det M ( £ ) } . The design which makes the content of this ellipsoid _1 small by maximizing det M(£) or minimizing det M ( £ ) -1 is called D-optimal. This is essentially a parameter estimation criterion and is one of the most widely Chapter 2. LITERATURE REVIEW 14 RO(x) RI(x) Figure 2.2: The region of operability (RO(x)) and the region of interest (RI(x)) in two dimensions. Chapter 2. LITERATURE REVIEW Chapter 2. LITERATURE REVIEW 16 studied (e.g.,Wynn, 1970, 1972; Mitchell, 1974a; Hill, 1980;Johnsori and Nachtsheim, 1983). Without the assumption of normality, D-optimality also minimizes the variance of the best linear unbiased estimates (BLUE's) of the parameters (Pazman, 1986 p.79). One disadvantage of D-optimality is that the ellipsoid may be narrow but long indicating that an element of 9 has a relatively large variance. 3. E-optimality. Let C = {c G R : c c = 1}. l r Choosing ( G H to minimize max fli c M ( £ ) c is equivalent to choosing £ to maximize the minimum eigenT - 1 c6 value of M((). This is called E-optimality and minimizes the variance of the least well estimated parameter. Under the assumption of normality, an E-optimal design maximizes the power of the F-test of size a for this least well estimated parameter. 4. A-optimality. Another measure of the "size" of a matrix is its trace and a design which minimizes the trace of M ( £ ) - 1 is called A-optimal where A- stands for av- erage variance since ^tr[M(£) ] is the average variance of the 6Vs. This criterion -1 does not account for the correlations between the 6Vs. 5. L-optimality. When one is interested in predicting E(y) over a region C of R , k one may wish to minimize some average of c { M ( £ ) } c over C. If the average is T _1 with respect to some probability distribution n on C, the criterion is to minimize / c M(£)~ c/j,(dc) T 1 and is called L-optimality where L- refers to linear criterion. This integral can be expressed in the form tr[{M(£)} i?] where B = f c cfi(dc) -1 T and is positive semidefinite. Again, this is a prediction criterion averaged over the regions of interest. When B = I, this reduces to A-optimality. If B has rank s, it can be written B = AA where rank(.<4) = s. T Chapter 2. LITERATURE REVIEW 17 These are the main criteria found in the literature for cases when the form of the relationship E(y; (J.,9,T) = rj(fj,,9) is assumed known. They are all based on linear functions of 6 (Silvey,1980 p. 13). Criteria functions which are based on nonlinear estimates of 9 have the additional complication of being dependent on the value of 9. No nonlinear optimality criteria for linear estimates have achieved prominence. MINIMAL PROPERTIES A N D U N I V E R S A L O P T I M A L I T Y One of the most common dissatisfactions with the previously described optimality criteria is that they are too precise and restrictive and may only approximate the experimenter's objectives. Typically an experimenter will have several objectives or a single vaguely defined objective. He may not wish to restrict himself to a single criterion which may lead to a design which is poor in terms of another criterion. Kiefer (1975b) proposed a set of properties which any optimality function should possess. An optimality function $ should satisfy the following properties: 1. $ is convex, 2. $[6M(£)] is nonincreasing as scalar b > 0 increases, and 3. $ is invariant under each permutation of the rows and corresponding columns of M(0Kiefer called a design £* universally optimal in the set H of designs under consideration if £* optimizes $[M(£)] for every $ satisfying the above conditions. All the previously mentioned criteria satisfy these minimal properties. Kiefer also gave some conditions for proving universal optimality which were further developed by Cheng (1987). If a design with all eigenvalues equal maximizes tr[M(£)], then it minimizes $[M(£)] for all nonincreasing, convex, and orthogonally invariant $ (Cheng, 1978). Chapter 2. LITERATURE REVIEW 18 A universally optimal design greatly decreases the pressure on the experimenter to precisely define vague objectives or to choose between several optimality criteria. A universally optimal design is robust to changes in criteria. As a consequence, these designs are also robust to changes in objectives. Bondar (1983) provided alternative properties and definitions for universal optimality. Unfortunately, universally optimal designs do not always exist (Hedayat, 1981). Another desirable property of an optimality function is that it be insensitive to changes and scale of the x. However, this is rare. D-optimality is an exception in that the D-optimal design remains D-optimal with nonsingular linear transformations of the independent variables although the value of the criterion will change (Pesochinsky, 1978). More generally, D-optimal designs remain D-optimal with transformations where the determinant of Jacobian of the transformation is non zero (Mehra, 1973). This sensitivity of criteria to scale can be ameliorated by standardizing the x. EQUIVALENCE THEOREM In 1960, Kiefer and Wolfowitz demonstrated the equivalence of the following conditions for an approximate design measure £*: 1. M(C) is D-optimal, 2. M(f*) is "G-optimal, and 3. max x [M{C)]~ x T 1 x = k. In addition, x [M(£*)]~ x T 1 = k is true for all points x»(i = l...n). Any linear combination of designs satisfying these criteria also satisfies them. This theorem gives a necessary but not sufficient (Fedorov, 1972) condition for proving both D- and Goptimality. It also brings together the objectives of prediction and parameter estimation. However, it should be emphasized that the theorem is only valid for approximate designs where Var(y; /J.,6,T) = constant. Chapter 2. LITERATURE REVIEW 19 Fedorov and Malyutov (1972) gave a more general theorem. Let $ be a function defined on { M ( £ ) } _1 such that the following conditions are satisfied: 1. ^[7V- {M(0}" ] = 7 ( N ) $ { M ( £ ) } 1 1 _1 where -y(N) is a decreasing function in N, 2. $(A) < $(B) if B — A is positive semidefinite, and 3. $ { M ( £ ) } _1 is a convex function on M(£). Then 1. A necessary and sufficient condition for optimality of the approximate design £* is the fulfillment of the equation sup A(xMx, f) = trWO}- 1 (2 - 13) where ^(x,0 = x { M ( 0 } - ^ ^ ^ - { M ( 0 } - x T 1 1 (2.14) and A(x) = {Var[y(x)]} continuous on X. _1 2. The function \(x)(p(x, £*) reaches its upper bound on the support of the optimal design £*. 3. A set of optimal designs is convex. If $ { M ( £ ) } - 1 is a strictly convex function on M(£) then all the optimum designs have the same information matrix. Some examples of $ and <p are given in table 2.1. PARTIAL O P T I M A L I T Y The previous criteria are sometimes referred to as global criteria since interest is in all the parameters. . Partial optimality criteria are concerned with subsets of the parameters. For instance, the experimenter may only be interested in estimating certain linear combinations of 9, namely aj9, aj9,..., aj9. Let Chapter 2. LITERATURE REVIEW 20 Table 2.1: Some examples of 6 and <p from Fedorov and Malyutov(1972) *P(0] V(a:,0 d(x,0 = f (x)D(0f(x) T d (x,x ,() d(x ,0 2 0 0 f d(x,()dx j d {x,x ),£)dx 2 z z 0 0 2 [r{x)D{()L D{V);v = L9 L = \\h...l \\ m trAD(£) A f(x)D(0AD(0f(x) be the s x k matrix of rank s < k made up of rows a , a ,..., aj. Then interest T T 2 is in A 8 and for nonsingular M(£) the Cramer-Rao lower bound for the variancer covariance matrix of unbiased estimators of A 9 is proportional to A {M(£)}~ T T 1 A. Now, the previous criteria can be used on this new variance-covariance matrix, i.e., D -optimality = max det LA {M(£)}~ .4:] and so forth. T 1 -:1 a A special case occurs when A — [I 0], i.e., when one is only interested in the first T s s parameters of 9. The matrix M(£) may be partitioned M(x) =\ \M (i) 21 M (x)f . . 22 Then [A {M(x)}" A]- = {M (x) - M (x){M (a;)}- M (x)}- . T 1 1 1 n 12 (2.15) 22 1 21 Again, the pre- vious criteria can be applied to this new variance-covariance matrix and are usually denoted by a subscript s. An example of partial optimality occurs with a randomized complete block experimental design. The experimenter may have used blocks in the experimental design to remove some unavoidable systematic variation in the sample units. However, precise estimation of the blocking effects may not be one of the objectives of the experiment. Kiefer and Wolfowitz (1960) extended their equivalence theorem to the problem of Chapter 2. LITERATURE REVIEW 21 estimating subsets of parameters for approximate designs. The following are equivalent. 1. M (£*) is D -optimal. s s 2. M (C) is G -optimal. S s 3. m a x x [ M ( £ * ) ] i = s. T x _1 r One of the complications encountered with partial optimality is that the matrix M (C) is often singular. Thus [M (£*)] is not defined and statement 3 above cannot -1 S s be used as it is written to verify approximate D- or G-optimality. [M (£*)] must be _1 s replaced by the generalized inverse [M (£*)]"• s R E S T A T E M E N T OF C R I T E R I A . Let \ > A > . . . > A be the eigenvalues of r 2 fc the M(£) matrix. A useful family of criteria is $ (A) = [| YA=I K }* with the limiting 1 P values = 11?= i = m a x Here p < q implies $ (A) < $,(A) with P equality if and only if the Aj are all equal. Note: 1. $'(\) = D-optimality, 2. $^(A) = E-optimality, 3. $i(A) = A-optimality, and 4. ^ ( A ) > *: (A) if si > s (Zarrop, 1979). 2 2 Thus, for a given design, max A" is an upper bound for all the other criteria in this 1 family. Kiefer (1975a) found that although the optimal design £* changed appreciably with changes in p, the efficiency of £* in terms of the other criteria was surprisingly good. Chapter 2. LITERATURE REVIEW 22 DISCUSSION The emphasis has been on designing experiments for parameter estimation or prediction. Optimal design has also been applied to other problems notably response surface design (Box and Draper, 1959, 1987). As well, optimal design has been used to obtain restricted randomization experimental designs (White and Welch, 1981) and to obtain optimal designs for estimating the independent variables for a given value of the dependent variable (Ott and Myers, 1968). As pointed out by Bandemer (1980), the approach to optimal design based on the information matrix M has several potential shortcomings: 1. it is based on the assumption of a correct model and the criteria lose their meaning if this is not valid; 2. it does not make explicit use of prior knowledge of the relationship; 3. it is not always clear which criteria are appropriate under which circumstances; and 4. it is not always possible to find an exact design which is close to the optimal approximate design therefore the transition from an approximate to an exact design may not be clear. These shortcomings may or may not be serious enough in a given instance to make one abandon this approach. There are several other approaches to optimal design which specifically address these shortcomings. 2.1.3 MODEL UNCERTAINTY The previous theory was concerned with obtaining designs for specified models. However, often the experimenter may not know the model with certainty. He may have Chapter 2. LITERATURE REVIEW 23 several potential models in mind or may have a single model which he may wish to test for adequacy. The worst possible situation is when the experimenter is unable to state any information about possible models. Several approaches to solving these difficulties have been proposed. Hill (1978) reviewed design procedures for regression model discrimination. One measure of the adequacy of a model is the integrated mean squared error N r J = — / (x)E{y(x) -r](x)} dx 2 w (T J (2.16) JO where r)(x) is the true model. The mean squared error can be broken down into two components, V due to model variance and B due to model bias given by V = -f Jo w(x)E{y{x) - Ey(x)} dx 2 (2.17) N r B = — w(x){Ey(x) - r](x)} dx (2.18) J = V + B. (2.19) 2 (7* JO The previous parameter estimation theory dealt exclusively with minimizing V. Designs for dealing with model uncertainty are useful when model bias cannot be ignored. NO PRIOR I N F O R M A T I O N The case when the experimenter has no prior information about the relationship is extremely rare in practise and when it occurs, it may indicate a lack of preliminary research into available information. However, this situation is of interest from a theoretical viewpoint since it represents the opposite extreme from complete model certainty, and has an even simpler solution. It may also happen that when an experimenter is planning a pilot study, he may wish to minimize possible introduction of bias by deliberately ignoring previous results. Once the pilot study information has been collected, one or more possible models may be proposed and a sampling design suggested. Chapter 2. LITERATURE REVIEW 24 The most informative design in the absence of any information is a uniform design, i.e., one where every X(;j is sampled as equally as possible. This has been called the all-bias design since it minimizes potential model bias. Scheffe (1963) recommended such a uniform design in the absence of prior model information, but recommended a few replicate observations also be taken. MINIMIZING S Q U A R E D BIAS Consider fitting y(x) = xjb (2.20) r when the true model is n(x) = x1p\ + xT/3. (2.21) 2 The bias for this model is B — ~2~ J w(x){E(y(x))- (x)} dx. 2 V (2.22) Let Mn = N^X^Xi M Ahi = / w(x)xixjdx 12 = N^XfXi (2.23) fj,i2 = / w(x)xixjdx. A necessary and sufficient condition for the squared bias B to be minimized is that M{ M 1 1 12 = MuVi2- (2.24) A sufficient but not necessary condition for B to be minimized is that M n =/i n and M 1 2 = ^i , 2 (2.25) i.e., the moments of the design points coincide with the moments of the weight function. For a uniform weight function, this coincides with a uniform design. Chapter 2. LITERATURE REVIEW 25 Note that both of these conditions (equations 2.24 and 2.25) are independent of f3\ and fa which is not surprising since optimal variance designs are also independent of the parameter values. However, the efficiency of these designs compared to designs minimizing total mean squared error does depend on f3i and /3 . As well, one must 2 specify the form of the bias (equation 2.21). Box and Draper (1959) found that the efficiency of the all-bias design compared to the optimal (minimum mean squared error) design decreased slowly as the bias component of J decreased until B approaches 0, i.e., all-bias designs are efficient for a wide range of B and V. But for minimum variance designs, this was not true. This lead them to recommend that if one wishes to simplify the design problem, one should concentrate on minimizing bias rather than the random error component unless bias is almost negligible. Stigler (1971) pointed out that the results of Box and Draper were highly dependent on their allowing RO to extend indefinitely beyond RI. He suggested that just as it is not advisable to extrapolate if there is likely to be bias, one should constrain RO to coincide with RI. Under this constraint, the differences between the all-bias and minimum variance designs are much reduced. M O D E L DISCRIMINATING DESIGNS An experimenter may have a finite (hopefully small) number of potential designs from which he wishes to select the most adequate one. Hunter and Reiner (1965) proposed a criterion for selecting observations for discriminating between two rival regression models 77i(x,0i) and 7 7 {77i(x,^i) — 77 (x,0 )} 2 2 2 2 (x,# ). 2 The sequential criterion is to select x n + 1 to maximize where the 0's are the MLE's based on the n observations al- ready taken. In a sense, one tries to take an observation at the point of maximum divergence of the two models. Roth (1965) extended the criterion to more than two Chapter 2. LITERATURE REVIEW 26 rival models. Box and Hill (1967) pointed out that this criterion ignored the variability of the estimated response rjj(x,9j). Two precisely estimated responses which do not differ much may yield more information than two widely differing estimated responses with high variability. Therefore, they proposed maximizing the upper bound of the expected entropy change due to the (n + l) function which is defined as I j th th observation using the Shannon entropy = — JZjLi IX In Tij where JTj is the probability that the model is the true one. To obtain a computational form for the expected entropy change, Box and Hill made some rather restrictive assumptions and much subsequent work has been concerned with relaxing these assumptions. For discriminating between two models, the procedure of Hunter and Reiner (1965) is asymptotically equivalent to that of Box and Hill only if the asymptotic design is non-singular (Hill, 1978). Atkinson (1981) found that for'discriminating between two models, one of which was correct, there was no detectable systematic difference between the two criteria for asymptotically non-singular designs. There was no clear indication that one criterion converged faster than the other. Thus, for his examples, the two procedures were found to be roughly equivalent both for small samples and asymptotically. M O D E L DISCRIMINATION A N D P A R A M E T E R ESTIMATION Box and Draper (1959) proposed a criterion for minimizing the mean squared error. They found that in typical situations in which both bias and variance occur, the optimal design is very nearly the same as an optimal design obtained if V were ignored completely and only B considered. A more detailed analysis in Box and Draper (1987) showed that the optimal design obtained when considering both V and B is close to the optimal design obtained from considering only B. Only when the proportion of J resulting from V approaches 1 does the design change appreciably. Box and Draper surmised that if one wished to simplify the optimal design problem, it may be better to Chapter 2. LITERATURE REVIEW 27 ignore V than B. Fedorov (1972 p264+) described minimizing generalized loss defined as L = W\B + W2V where w\ and w are weight multipliers such that wiB «C w V 2 2 when one model is very likely and WiB 3> w V when all models are equally likely. One 2 possible pair of weights is Wi = [ ^~^]* and w — 1 — w where v 2 x is the probability associated with the most likely model, v is the number of possible models, and A is a constant between 0 and 00 (Hill et a/., 1968). W is monotonically decreasing in x PN and A. Borth (1975) proposed maximizing the expected decrease in total entropy as a criterion for optimal experimental design where total entropy measures both the uncertainty about the model and the parameter values. Welch (1983) proposed a mean squared error criterion for protecting against model bias yet still providing efficient designs for parameter estimation. T E S T I N G SPECIFIC D E P A R T U R E S Atkinson (1975) presented a criterion for testing specified departures from the model. The tentative model is as given in equation 2.10 but may contain additional parameters 7 of dimension s so that the more general model is E(Y) = x 9 + z j = x $°. T T (2.26) oT A D-optimum design for estimating the significance of the departure maximizes A(z) = \Z Z - Z X{X X)- X Z\ T r T l T = (2.27) where X, Z and X° are n-rowed matrices specifying the values of x,z and x°. A Doptimum design for estimating the parameters in the original model maximizes Ai = \X X\. T (2.28) Chapter 2. LITERATURE 28 REVIEW A multipurpose design, i.e., one which estimates the parameters and detects the departures 7 from the model maximizes (2.29) AHA(z)} where a\ and a are weights selected by the experimenter. 2 As Atkinson points out, in designing experiments to detect specific departures from the model, there is the risk that departures of other kinds may not be detected. 2.1.4 S E Q U E N T I A L DESIGN Mitchell (1974a) suggested using sequential design (less expensive) to determine the approximate sample size N and then use an exchange procedure (more expensive but theoretically closer to the optimum design) based on that N. These usually start with an all-bias design and after each observation is taken, the data updated. As the experimenter becomes more certain about the true model, the design approaches those obtained from parameter and response estimation criteria. More formally, let Y. T = (/J(i), M(2)> - -•'> A^r)) " 1 be the initial design consisting of R < N observations. These may be used to estimate the model form and, in the case of nonlinear regression, to estimate 9 by 8. Then find At(r+i) ${X(/z ,0 ) + r r J(n ,O )}: (r+1) Then re-estimate the model and parameters using ^ (2.30) r r + 1 = (A'(I),/-i(2)j •••i M(r+i)) and T repeat the process to find H( +2)- In linear regression, this corresponds tofindingx to r maximize ${M (0 + za }_1 r T (2.31) An important note is that the observations taken subsequent to the design being updated are not independent of the previous observations. As a consequence, the matrix Chapter 2. LITERATURE REVIEW 29 K(yi\ 8,r) as described earlier is no longer the Fisher information matrix (Silvey, 1980 p63). This does not invalidate using the above method of design construction; however, it can influence inferences drawn from the results. 2.1.5 B A Y E S I A N O P T I M A L DESIGN Bayesian designs are based on prior distributions for 8 and r. Restrict these to be of the form / ( 0 ; r ) ~ N(8 ,TR) (2.32) 0 where R is a specified k x k positive definite matrix, and E{r~ ) is finite. Then l p(8;r,y) ~ N(8 ,r(R + 1 X X ( T 2 . 3 3 ) where • e = (R + XX )- {Xy T + R0 ). 1 1 (2.34) o Let c be a vector and /J. a probability measure on c. Let ip = E^cc ]. 7 A ^-optimal design for squared error loss corresponds to taking an observation at the X which minimizes (2.35) + XX ]- . r tTii>[R 1 Psz-optimality is also referred to as Bayes A-optimality or, more generally, as Bayes L-optimality. It also has a non-Bayesian interpretation. If some observations Xo have already been taken or they must be observed, and the matrix X Xj 0 is non-singular, the problem remains as to how to allocate n additional observations. For given vectors c and JU, the problem can be written as find the X to minimize tr 4>{X Xj + XX ] T 0 _1 (2.36) Chapter 2. LITERATURE REVIEW 30 which is the same computation as before when one sets R = X Xj 0 (Covey-Crump and Silvey, 1970). Thus one can see that the Bayesian approach lends itself to sequential construction of optimal designs. Again, by relaxing the exact restriction to include continuous (approximate) designs, one can replace XX r by N f xx d£(x) where £ is a probability measure on X. T x Most of the non-Bayesian theory has an analog in Bayesian theory. For example, one can once again define a variety of optimality criteria based on one's objectives. An equivalence theorem analogous to the Kiefer and Wolfowitz theorem also exists for Bayesian design (Chaloner, 1984 p.287). However, an important difference is that, in general, optimal Bayesian approximate designs depend on N (Chaloner, 1984). It is worth noting that the contribution of the prior distribution approaches 0 as 7Y approaches oo, i.e., the Bayes solution asymptotically approaches the non-Bayes solution although the basic premises are quite different. The Bayesian approach has several shortcomings as well. Use of prior distributions specified by the experimenter introduces subjectivity to the design selection process. Different prior distributions lead to different designs so how does one select a prior distribution? This leads to another shortcoming. The computational complexities in Bayesian theory make it imperative that restrictions such as equation 2.32 be imposed so that a solution can be obtained. This constrains the choice of prior distributions. In fact, V>-optimality may only be appropriate in a strictly Bayesian sense under the assumptions of normality and squared error loss (Goel and DeGroot, 1980). Chapter 2. LITERATURE 2.1.6 REVIEW 31 N O N L I N E A R REGRESSION A nonlinear relationship is one in which the random variable y whose density function is p(y; / i , 0, r) is described by the model yi = v{l«J) + £i (2-37) where rj is a nonlinear function of 9 and £ depends only on r. Restrict & to be of the form E(£i) = 0 and Var(£;) = r. In general, for nonlinear functions, the experimental design depends on: 1. the form of the model, 2. the sample space U, and 3. the initial parameter estimates. Once again one can obtain the Fisher information matrix as in section 2.1.1. However, the matrix is no longer independent of the unknown 9. By defining {i) —d9— * ' the standardized information matrix for an N-observation design can be written K(n- 9, T ) = M(£, 9) = N- fl x xly 1 {l) (2.39) One can use the criteria in section 2.1.2 to obtain a design optimal for estimating the parameters 9 (usually denoted by a bracketed 9, e.g., D-optimality becomes D(0)optimality) but the interpretation changes slightly. In general, the optimal properties become asymptotic in N. As well, the information matrix is not nearly as easily obtained as before since it depends on 6 which is unknown. Thus to use design criteria based on the information matrix, one must have initial parameter estimates. This approach lends itself to sequential design of experiments where the parameter estimates Chapter 2. LITERATURE REVIEW 32 are being continually updated thereby minimizing the dependence on the initial estimates. Another approach is to use a linearized approximization to the nonlinear model (Box and Lucas, 1959) which is often satisfactory for a narrow range of X. For models which are linear in some of the parameters and nonlinear in the remaining parameters, the D-optimal design does not depend on the values of the linear parameters (Hill, 1980). Fedorov and Malyutov (1972) looked at the properties of optimal designs based on least squares estimates. They stated the following theorems. Denote the true value of 8 by 8 . Define a design £* to be locally optimal if U H M ^ o r 1 = (2.40) gi*{M(8 ,o} fl where $ is an optimality criteria defined in table 2.1. 1. Let the following conditions be satisfied. • The function rj(p,,8) be continuous on U x Q,8 £ 0, the set 0 is compact. U • The sequence of designs fa (where N is the number of observations) converges weakly lim _ x <00 fa — £ and the function v (8, 2 £ ) = f[n(fx, 8)—r](n, £ u ) ] has a unique minimum when 8 — 8 . U Then the LS estimates are strongly consistent (9 —> 8 w.p.l). U 2. Let the following conditions be satisfied in addition to those of the previous theorem. • 6 is an inner point of 0. U 2 £(d/i) Chapter 2. LITERATURE REVIEW 33 • The matrix M(0 ,O = j xx i{dx) (2.41) J U is non-singular. Then the LSE have an asymptotic normal distribution and Hm N V(9 ) = V(9 , 0 = {M(8 , N U (2.42) U JV—»oo where V($N) is a variance-covariance matrix for 9fj. 3. In addition to the conditions of the previous 2 theorems, let the following conditions be satisfied. • Let n(jM,9) have continuous first and second derivatives with respect to ix for all 0 € 0. • Let ${M(0 , £ ) } U - 1 have continuousfirstand second derivatives with respect to the elements of the matrix V(9, £) for all 8 G 0 and for all non-singular designs for given U and n(/j,, 9). Then, if the limiting design £ = lim^-^ is non-singular, it is almost sure locally optimal. The equivalence theorem of section 2.1.2 can also be extended to nonlinear models (White, 1973). Define the analog of i { M ( £ ) } z to be tr[I(z, 6){M(£, 0)}""]. A T - 1 1 design measure £* is called G(0)-optimal if suptr[/(z,0){M(f ,0)} ] =minsuptr[/(x,0){M(e,0)} ]. -1 xex _1 it- x€X The following conditions are equivalent for an approximate design measure £*. 1. M(r, 9) is D(0)-optimal. (2.43) Chapter 2. LITERATURE 2. M(£*,6) 3. su VxeX REVIEW 34 is G(0)-optimal. tT[I{x,e){M{i\6)}-'} The quantity tr[J(x, 0){M(£, = k is asymptotically proportional to the expected uncertainty at x from estimating 8. As well, White (1973) obtained similar results for partial optimality (estimating s < k parameters). As with linear regression, numerous other criteria for obtaining good designs have been examined. The model discriminating criteria of section 2.1.3 may be extended to nonlinear models. Box (1969) examined the case where one wishes to test the adequacy of a single nonlinear model. He proposed comparing the nonlinear model to linear empirical models using model discriminating techniques. Bayesian techniques may also be applied to nonlinear models (Draper and Hunter, 1967a, 1967b). 2.1.7 DECISION T H E O R Y As pointed out by Herzberg and Cox (1969 p.36), decision-theoretical ideas can be used to determine the size of a given structure by achieving an explicit balance between the cost of experimentation and the losses arising from an incorrect conclusion. P A R A M E T E R ESTIMATION Lindley (1968) investigated the problem of finding a subset of the independent variables X i , X 2 , . . . , Xfc to use for predicting a future value of y. More formally,finda subset / of {1,2,..., k} containing s members such that xj — {xi\i £ 1} is to be used for predicting y. Let xj be the complement of xj. The objective is to minimize the loss Q{xi) = {y-f{xi)Y + cr (2.44) Chapter 2. LITERATURE REVIEW 35 where y is the true value of the independent variable to be estimated and cj is the cost of observing xj. Under a number of independence assumptions, Lindley found that one needs to know the expectation of the regression parameters (E(6)), the dispersion of the unobserved x's (var(xj)), and the costs of observation c/. In particular, one does not require the dispersion of the regression parameters. For this problem the dispersion of the x's is more important than that of the #'s. However, calculating the value of the expected loss requires the dispersion of 0j. Lindley also pointed out that designed experiments do not give any information about the distribution of x and may therefore be uneconomical for this type of prediction problem unless additional information about x is available. Brooks (1972) extended Lindley's work to examine designing experiments for predicting a future value of y. As well, Brooks added a term C to the loss function (equation 2.44) to represent the cost of the experiment. Brooks made some normality assumptions about 6 and y. Using a vague prior for 9 (var(0) —• oo), one obtains the same results as if one had decided beforehand to use all the x; to predict y (i.e., xj — 0). DISCRIMINATING E X P E R I M E N T S When one is unsure of the correct form of the model, one is usually interested in finding the most plausible model from among a set of tentative models which hopefully contains the true model. Each model may be thought of as a hypothesis and the problem is how to discriminate among the rival hypotheses. Each hypothesis Hj should be characterized by the following attributes: 1. pj(y\x) = Pj[y\Vj(x,9j)], j = l,...,v; 2. E(y\x) = (x\ej); Vj Chapter 2. LITERATURE REVIEW 36 3. A(x) = {J [y — r]j(x,9j)} pj(y\x)dy} 2 1 y i.e., the efficiency (inverse of the variability) function is known for the range of y; and, 4. the results of the observations are assumed to be independent. To discriminate among rival hypotheses Hj, j = 1,... ,v, one must have a decision rule of function 8. Associated with each experiment is a loss Q{8) which includes the cost of the experiment and any penalties for incorrect decisions. For a reasonably informative experiment one would expect that as more samples are taken (and cost rises) the probability of an incorrect decision decreases. In general, there does not exist a decision function 8* which minimizes Q(8) for all y. This leads one to look for a decision function which minimizes the expected loss with respect to y, i.e., (2.45) R(8) depends on 8 and the form of the true hypothesis which is the point of the investigation. If one has a priori information about the true hypothesis, one can minimize the expected risk Efj[R(6)]. When one does not have prior information, one can find the minimax decision rule, i.e., one which minimizes the maximum risk min max S (2.46) H The value of the risk depends not only on the decision function but also on the design of the experiment, thus one can look for a design to minimize E[R(8)]. It is apparent that because the risk depends on many unknown quantities, designing discriminatory experiments can become mathematically complex. To simplify the procedure and to allow for constant updating of information, sequential experimentation is recommended. Chapter 2. LITERATURE 2.1.8 REVIEW 37 R O B U S T DESIGNS Use of the previous theory requires fairly stringent specifications of the model and the optimality criterion. Various authors have attempted to generalize the theory by investigating robust designs. Robust designs retain many of their desirable properties when some of the initial assumptions have not been met. Most of the work has been concerned with designs robust to departures from the specified model and the specified criterion. Much of the work has been discussed earlier but some additional topics will be mentioned here. M O D E L R O B U S T DESIGNS As Box and Draper (1987) observed, all models are incorrect to some extent, but some are useful. However, an experimenter may not always be confident that he has specified a useful model prior to experimentation. Thus, he may wish to have an experimental design whose optimal properties are somewhat insensitive to departures from the specified model. One option is to design a model discriminating experiment (see section 2.1.3). However, this may result in the entire experiment being devoted to model discrimination because of a little model uncertainty. For example, an experimenter may have a model he is satisfied with but may be uncertain as to whether to include an interaction term. A model discriminating design would concentrate resources on estimating the interaction term. As far as the experimenter is concerned, this may be a bit excessive. He may include the interaction term in the model if it is significant since it involves no additional measurement cost but it may be only a minor modification. One solution is to embed the model of interest in a slightly more general model which also includes the interaction term. This will provide good parameter estimates for the model of interest and still allow the interaction effect to be estimated. The most Chapter 2. LITERATURE REVIEW 38 common terms to be included in the general model are interaction terms and higher order polynomials. Another solution is to obtain an optimal design for the model of interest and augment it with observations which allow the possible interaction terms to be estimated. Some authors have tried to obtain model robust designs without specifying the nature of the possible bias. This involves allocating observations such that model adequacy may be checked as well as permiting unspecified parameters to be estimated. No general model robust procedure has been developed. O U T L I E R R O B U S T DESIGNS For the purpose of this discussion, outliers will be loosely defined as observations which disproportionately influence the conclusions drawn from the investigation. The two most common sources of outliers are human error and "unlikely" observations. Human error outliers are best handled by the experimenter although they may be affected by the design. A relatively simple design may reduce (or increase) the likelihood of human error compared to a more complex design. Unlikely observations are those which occur with very low relative frequency and may affect the conclusions more than they should. For example, snow in Vancouver is relatively rare and permanent residents would probably agree that a relatively small snow removal budget is adequate. However, if a tourist happened to visit Vancouver on a day it was snowing, he might conclude that Vancouver should have a relatively large snow removal budget. One may wish to have a design which protects against the influence of such unlikely observations. Since one of the assumptions of linear regression is that the value of the dependent variable is independent of other, values for a given combination of independent variables, one cannot simply discard unlikely observations. One must work at decreasing their influence. Outliers have two major effects. They affect the estimated variance and the Chapter 2. LITERATURE REVIEW 39 estimated relationship. Draper and Herzberg (1979) observed that if there is no model bias, outliers simply increase the integrated variance and the design which minimizes the variance function alone is best. Box and Draper (1987, p.505) stated that Goptimal designs are optimally robust to wild observations. This is a consequence of the coincidence of observations with the greatest influence and predictions with the greatest variance. Another method of reducing the influence of unlikely observations is to increase the sample size so that the relative frequency of the outlier in the sample approaches the relative frequency of the outlier in the population. C R I T E R I O N R O B U S T DESIGNS The majority of effort in developing criterion robust designs has centered around universal optimality (see section 2.1.2). However, universally optimal designs do not always exist and even when they do, they may be difficult to identify. One can obtain an optimal design in terms of one criterion and then investigate its optimality in terms of other criteria. This is always a good practise since the optimal design may not be unique according to one criterion. Another criterion can be used to distinguish between designs equivalent in terms of thefirstcriterion. Designs close in terms of one criterion may be quite dissimilar in terms of another criterion. Chapter 2. LITERATURE 2.2 REVIEW 40 REVIEW OF FORESTRY LITERATURE The term "optimal" has been used mainly in forestry in reference to optimal allocation of samples to strata in stratified sampling (Freese, 1966) or in double sampling with regression (Paine, 1981). In these sampling strategies, the proportion of observations allocated to each stratum (sample and subsample in the case of double sampling) is dependent on the cost of obtaining observations from the stratum. However, the costs are assumed to be constant within the strata. Therefore, optimal design, as used in the present study, is distinct from optimal allocation. Optimal design specifies the values of the independent variables at which one should take observations. Randomization is replaced by a statistical information approach. Thus optimal design is not restricted to sampling but is also appropriate for experimental design. There has not been much discussion of optimal design for linear regression in the forestry literature with the exception of Demaerschalk and Kozak (1974,1975a,1975b). Model-based sampling and sampling for biomass have received more attention. Among other suggestions, Demaerschalk and Kozak recommended using a uniform sampling distribution when there is model uncertainty. When the model is known, they recommended sampling to increase X X. In addition, they developed formulas for calcuT lating the sample size when precision requirements are specified for various values of x. Ker and Smith (1957) discussed sampling for height-diameter relationships. They recommended sampling two large diameter trees and two trees of approximately 30% of the diameter of the large trees to estimate a quadratic relationship. They noted, however, that more study is required to determine the optimum sampling strategy of tree heights. Chapter 2. LITERATURE 2.2.1 MODEL-BASED REVIEW 41 SAMPLING There has been some discussion of model-based sampling in the forestry literature (e.g. Schreuder and Wood,1986). Most of the discussion has been concerned with comparing model-based versus design-based sampling. Although model-based design and inference can be more efficient, it involves more assumptions than design-based inference. Hansen et al. (1983) gave an overview of model- and design-based sampling. Model-based sampling schemes have been defined as those for which sample selection is determined by the assumed model and the model also provides the basis for inference. Randomization other than that supplied by the sampling units is not considered necessary. Design-dependent or probability sampling is sampling where randomization is introduced at the sample selection stage and is the basis of inference. For modelbased sampling, estimates are model unbiased or model consistent. A model-consistent design is one where the difference between the estimate and the actual population value becomes very small as N becomes large: This reliance on model-based estimates is considered the distinguishing characteristic of model-based sampling. Design-dependent sampling makes use of randomization consistent estimators so that although models may be used in the design (e.g. used for stratification), they do not provide a basis for inference. Hansen et al. (1983) pointed out that as sample size increases, any inadequacies in the model become more pronounced while design-dependent sampling schemes rise above this shortcoming. However, one should note that although the inference is modelbased, the sample selection discussed is not equivalent to the optimal design considered in this paper. A D-optimal design would have consisted of sampling the largest x values, resulting in a considerably smaller mean squared error. Chapter 2. LITERATURE REVIEW 42 Hansen et al. also pointed out that for inference about the causal relationship or prediction based on the parameters of a superpopulation or stochastic process, only modeldependent processes are relevant. As well, for small sample sizes, model-dependent sampling may be preferable. Little (1983), in a comment on the paper by Hansen et a/., pointed out that small sample theory is probably very common, more so than the large sample theory of designbased inference. Smith (1983), in a comment on the same paper, emphasized that using models to describe the sampling error structure is only one of the uses of models in sampling design. They can also be used for stratification, for the response errors and coding errors, for non-response, etc. Sampling error may be minimal compared to these other potential errors. All models used should be viewed critically and evaluated as to their usefulness. Schreuder and Wood (1986) enumerated some points to consider when selecting between model- or design-dependent sampling. The sampling scheme should provide a high percentage of good estimates. For model-based sampling, one should have confidence in the model or test its adequacy before engaging in a costly study. One should also consider how much time and effort one is willing to expend ensuring the model and the sampling design are seen to be valid. The user of the study results may need to be convinced of their validity. Other peripheral or subsequent uses of the study results should be considered. The study should lead to a greater understanding of the underlying relationships between the data. The size of the sample may influence the design choice. Small sample sizes may require model-based sampling to achieve the desired precision. Ease of implementation of the sampling should always be considered. However, in the bulk of the studies considered, only a rather limited form of modelbased sampling was discussed. In most cases, all the values of the independent variables in the population were required to be known, as well, the model was mainly used for Chapter 2. LITERATURE REVIEW 43 stratification. However, as has been seen, model-based sampling is much broader in scope than these applications suggest. The objective of these studies was to predict population values rather than relationships. Schreuder et al. (1984) compared a form of model-based sampling and pointPoisson sampling against large sample estimates of tree volume on a timber sale. The model-based sampling scheme involved placing the independent variable (Dbh Hi 2 where DBH = diameter at breast height and Ht = tree height) into equally spaced intervals and then sampling a tree close to the midpoint of each class. The point-Poisson estimate involved sampling trees proportional to their basal area. A subsample of these trees was selected with probabilities proportional to Dbh Ht. 2 Probabilities were set equal to one if Dbh Ht exceeded a certain value. They found that the model-based 2 estimate of total volume was considerably closer to the large sample estimate than the point-Poisson estimate. Using jackknife variance estimates, model-based and pointPoisson sampling yielded similar variance estimates. Confidence intervals constructed from both sampling schemes included the large sample based confidence interval for total volume. They concluded that the two sampling schemes were similar in terms of efficiency. The advantage of model-based sampling was that the sampler can purposely avoid undesirable trees (e.g., hung, fallen, or forked trees which-are difficult and occassionally dangerous to measure). The disadvantage is that it is more dependent on the model than is point-Poisson sampling. This paper was mentioned because although it does not use optimal design theory, it is model-based and a prime candidate for optimal design techniques. The optimal design would have concentrated observations in the smallest and largest Dbh Ht classes 2 yielding predictions with lower variances. However, as in the paper, separate estimates of the population average Dbh Ht and the total number of trees would have to be 2 obtained. Chapter 2. LITERATURE REVIEW 44 Gertner (1987) used optimal design theory to evaluate sampling schemes to improve the predictive ability of already calibrated models. Using L-optimality for a comparison criterion, he compared several sampling schemes. Optimal design theory was used for the comparison but not for the construction of designs. As well, differing availabilities of observations was not considered. i 2.2.2 S A M P L I N G F O R BIOMASS There has been much discussion of sampling for specific quantities of interest to foresters. In particular, in the last two decades there has been considerable interest in predicting biomass(IUFRO working group on forest biomass proceedings in 1971, 1973, and 1976). Cunia (1979a,1979b) reviewed some of the biomass research and made some recommendations concerning statistical and sampling methods. Concerning biomass models, he recommended more attention be paid to model selection. Often allometric models are selected with little effort to evaluate the adequacy of the model. Cunia observed that linear models may be as good as, if not better than, allometric models and, in addition, have a fairly simple error structure so computing the error of combined estimates (e.g. total biomass for British Columbia) is relatively straightforward. Although he observed that one of the assumptions of linear regression is that the values of x , x , • • •, XJV be x 2 fixed, he went on to describe several methods of obtaining a randomized sample. This was done to ensure the sample was representative of the populations. Although this may be required to fulfill some additional sampling objectives, it is not required for quantifying the relationship between dependent and independent variables. What is required is that the relationship between the sample values Xi and be representative of the population relationships. This is ensured by sampling in such a manner that y; is independent of y for a given combination of independent variables. d Chapter 2. LITERATURE 2.3 REVIEW 45 ALGORITHMS Algorithms can be roughly classified into exact and approximate depending on the type of design they produce. The algorithms typically are highly dependent on the optimality criteria adopted. However, general algorithms exist, notably by Fedorov and Malyutov (1972). Most of the published work has dealt with D-optimality. Of course, general algorithms will also work for D-optimality (the D-optimal algorithms are often just special cases of the more general algorithms) but the criterion specific algorithms are more efficient in general. Several algorithms are described in detail. 2.3.1 I N F O R M A T I O N M A T R I X The design criteria in section 2.1.2 are all functions of the information matrix M(£). Several properties of M(£) are worth noting. 1. M(£) is a symmetric positive-semidefinite matrix. 2. The family of matrices M(£), f € E, is convex, that is, if £i £ H and £ £ E then 2 £ ~ a£i + (1 — a)£ is also in E for 0 < a < 1. 2 3. For any design £, the matrix M(£) can be written in the form M(t)=ibt(*i) (i) i) x x (- ) 2 47 7=1 where n < H?=i £ ( x + 1, 0 < f(z,) < 1, n is the number of distinct and i ) = 1 (from Catheodory's theorem (Fedorov, 1972 p.66)). That is, for any design M(£), a design with an equivalent information matrix can be found based on n distinct points. Chapter 2. LITERATURE 2.3.2 REVIEW 46 A P P R O X I M A T E DESIGNS Approximate design algorithms construct design measures which are independent of the number of observations. One then tries to find an exact design close to the approximate design. D - O P T I M U M DESIGNS Analytic solutions for obtaining D-optimal designs exist for some special cases, notably polynomial regression (De La Garza, 1954; Stigler, 1971; Studden, 1978, 1982; Guest, 1958; Gaffke and Krafft, 1982; Lau and Studden, 1985). However, in general, numerical search methods must be used. Approximate D-optimal designs are equivalent to approximate G-optimal designs and most algorithms are based on the equivalence theorem of Kiefer and Wolfowitz (1960). STEEPEST DESCENT This algorithm was discussed in detail by Fedorov (1972). Start with an arbitrary initial non-singular design (2.48) n where ^2C{ i) — 1 x and n > k. (2.49) i=i 1. Compute M(£ ) = E?=i f (zi) (0 (») and its inverse. x x T 0 2. Find x G X to maximize d(x ,£ ) 0 Q 0 = £ {M(£o)} ^oT 1 3. The design £ = (1 - a )£o + ao£(zo), i-e., a 0 (2.50) Chapter 2. LITERATURE REVIEW 47 is constructed where a and where 8 = d(x ,£o) 0 0 o = [8 0+ k-l)]k ( 2 - 5 1 ) k. This value of ct maximizes the increase in the Q determinant of M(£). 4. The information matrix M ( £ ) of the design ^ is computed as well as its inverse. x Steps 2 through 4. are repeated replacing £ by £ or, in general, £ by f i . The 0 x s s + point XQ is often called the direction of the step and Q is often called the length of the 0 step. To avoid repeated matrix inversions in step 4 and calculations of the determinant one can use the following relationships. a{M(6)}- x 1 {M(Ui)}- = (1 - «•) 1 f s + 1 1 - a + ad(x |M(6 )| = (l-a) f c + 1 1+ a 1-a xJ s + 1 + 1 ,£ ) {M(^)}- d(x , £ ) | M ( 6 ) | s+1 1 (2.52) 5 a (2.53) (Fedorov and Malyutov, 1972). Fedorov proved that this algorithm converges to the D-optimal design M((*) as the number of iterations goes to infinity, i.e., lim|M(6)| = s—>oo |M(r)| (2.54) Fedorov also gave a number of stopping rules for terminating the iterative procedure when a design is sufficiently "close" to being D-optimal. Terminate the above algorithm when one of the following conditions is true. 1. a < 7 i . s 2 Mis+i)\-Mt)\ < 3. k d(x ,£o) 1 0 ^ — k < 73. Chapter 2. LITERATURE 48 REVIEW Various modifications to this algorithm can be introduced. The algorithm may be slow to eliminate some near optimal points and Fedorov (1972 p. 109) gave some suggestions for removing suboptimal points and rounding-off near optimal points. The step length a can also be modified. The algorithm will converge to a D-optimal design 0 for any sequence {a } satisfying s CO di = oo and lim ct = 0 (2.55) s 1=1 One such sequence is to select an initial an and use it as long as the determinant of M(£ ) increases at each stage. When the determinant no longer increases, repeat the 3 above procedure with and then ^ and continue in this manner until a stopping rule is statisfied. In general, there does not exist a single sequence {a } which most s quickly approaches the optimal design for all function forms and efficiency functions (Fedorov,1972 pll4). This is a special case of the linear criterion algorithm described later. A C C E L E R A T E D STEEPEST DESCENT One weakness of the steepest descent algorithm is the slow reduction of the weights of non-optimal points included in the design. One solution alternates between adding points to the design and removing non-optimal points from the design by setting their weights p; to zero. Pazman (1986) and Atwood (1973) described such algorithms. M U L T I - C R I T E R I A DESIGNS L I N E A R CRITERIA Let I, be a non-negative function on M ( £ ) , i.e., _1 L(A + B) =L(A)+L(B) L(cA) = cL{A) L(A) >0 (2.56) Chapter 2. LITERATURE REVIEW 49 for all A, B £ M.. A design £* is said to be linear optimal if iP(D}1 = mi L[{M(0}- ]. (2.57) x nJ Note that L[{M(£)} ] is a convex function of M(£) and that for any linear optimal _1 design £*, there exists a design £Q based on n distinct points such that 0 L[{M(C)}- ] = L[{M(Co)}- } 1 where n < (2.58) 1 Fedorov (1972 p. 123). For notation, see section 2.1.2. 0 1. Start with an arbitrary non-singular design £ • 0 2. Find x to maximize (p(x,£ ) 0 - L[{M(£ )}] 0 where <p(x,£ ) — S 0 L[{M(£ )}~ xx 1 T 0 {^^(x.eo)}- ]. 1 3. Compute £ = (1 — a )£ x 0 + a (x ) where Q 0 0 y 0 (x ,eo)-x[{M(e )}0 1 0 7V(^o.eo)[x {M(eo)}- xo-l] T 1 1 j 0 where 7 is a constant greater than 1. 4. Repeat steps 1 to 3 replacing £o,x ,a 0 Q with f i , X i , a i , and so on. Fedorov (1972) proved, that this algorithm converges as s —• 00, i.e., JimL[{MU )}- ] 1 s = L[{M(i)}- }. 1 (2.60) If M(£) is non-singular then it converges to the L-optimal design, i.e., LliMU)}- } 1 = xmnLKATCO}-]. 1 (2.61) Chapter 2. LITERATURE REVIEW M E A N SQUARED ERROR CRITERION 50 Welch (1983) described an algo- rithm to obtain designs which minimized a mean squared error criterion <&, where is made up of two components; V, the normalized average variance, and B, an approximation of model bias. A design is sought which is efficient in terms of minimizing both V and B. 2.3.3 E X A C T DESIGNS When N is large, or when the * are close to integer values, one loses little by using an integer approximation of an approximate design. However, when N is small and the iV& are not close to integers, one may wish to use a procedure designed for obtaining an exact design. Three major complications occur with exact designs: covergence to an optimal design is difficult to prove; no equivalence theorem exists, so it is hard to prove optimality; and, in general, the optimal design depends on N. Because of the dependence of ^ on N, the matrix M[£(N)\ = NM(£) will often be used rather than M(£). As well, in general, exact designs are not unique. Note that $[M(£^)] < $[M(£*)] where fj^ and £* are the optimum exact and approximate designs respectively. D - O P T I M U M DESIGNS Cook and Nachtsheim (1980) summarized and compared some of the available algorithms. K I E F E R R O U N D - O F F P R O C E D U R E (KIEFER,1971) Given a D-optimal design measure £* over X, choose £^ such that (2.62) ( Chapter 2. LITERATURE REVIEW 51 Then ln|M(&)| i M , . , M =1 + G in|M(f*)| 777 • V A T 2 / (2-63) V 7 Note that more than one design £w could satisfy equation 2.62 with some having larger information matrix determinants but there is no attempt to distinguish between these designs. E X C H A N G E PROCEDURES Most of the other alorithms for obtaining exact D-optimal designs are exchange algorithms. One point in the design is removed and a point not currently in the design is added. The two points are chosen to maximize the improvement in the criterion function. This algorithm does not guarantee convergence to the global optimum (Fedorov, 1982 p. 164). Some variations on this basic procedure have been implemented. To reduce computing time, the exchange may be broken into two steps. Remove the point Xi from the design whose deletion results in the smallest decrease in the optimality function. Then add the point not yet in the design whose addition results in the largest improvement in the optimality function. The order of the two steps may be reversed. Mitchell (1974a,1974b) modified this further to come up with DETMAX (see also Galil and Kiefer, 1980). Instead of following each addition of a point x by the deletion of a point x, (or a single subtraction followed by an addition), he allowed the addition or subtraction of k points before returning to the N point design. These "excursions" from the N point, design are small (i.e., k = 1) for the first iterations but as the improvements in A (xi,x) become smaller the excursions become larger. s M U L T I - C R I T E R I A DESIGNS Most of the previous algorithms for obtaining exact D-optimum designs can be modified to obtain exact designs for other criteria by appropriate modifications of the objective Chapter 2. LITERATURE REVIEW 52 function. One should keep in mind that the equivalence between D- and G-optimality does not exist in general for exact designs. ROUND-OFF PROCEDURE Again, for large N, approximate designs can be rounded-off to exact designs with little loss in efficiency. For small N, the standard aproximate design 77, may provide useful information. If the second partial derivatives of the objective function ^{M(rj)} with respect to x exist then ${M(T7,)} - ${M(n)} = 0(N~ ) (2.64) 2 where 77 is the optimal exact design based on the support points of 77, (i.e., those x» for which 77„ > 0) (Silvey, 1980). Thus, one may use an approximate design to identify t the support points of an exact design to reduce the design space X of the problem. However, if the exact design 77 includes points not in the support of 77,, then ${M(77.)} - ${M(77)} = 0(/V*- ) (2.65) 1 Fedorov and Malyutov (1972) stated the following theorem. $ { M ( r ) } " < HM(CN)}1 1 < 7 ( ^ N ) ) ^{M(D}- - 1 (2.66) where 7 is as in section 2.1.2. For linear criteria, j(N) = JV . They also gave the _1 following corollary. Let fa have the same support points as the optimal approximate design £*. At every support point x^, take rf = [(N — T Z ) £ * ] measurements where [c] is the smallest + + integer satisfying the inequatliy [c] > c. Arbitrarily distribute the remaining N — + J2[(7Y — n)£*] measurements. Then + HM(fa)}' 1 - HM(CN)}- 1 < 7(JV - n) - 1 ${M(r)> (2.67) Chapter 2. LITERATURE REVIEW 53 where £^ is he optimum exact design. The corollary can be used to estimate at what N the optimum approximate design can still be used to obtain a reasonably efficient exact design. Note that as the number of support points decreases, the upper bound in equation 2.66 is decreased, indicating approximate designs with minimal support will be closest to the optimal exact design. EXCHANGE PROCEDURE Welch (1984) modified Mitchell's DETMAX to con- struct G- and A-optimal exact designs. He found that for his examples the value of |M(£/v)| did not alter considerably with changing criteria (D-, A-, and G-optimality) but that tr[M(£jy)] an< ^ rnax X [M(£JV)] :E showed more variability. Therefore, it may T _1 x be worthwhile to compromise slightly in terms of D-optimality to achieve a considerable improvement in terms of either A- or G-optimality. ANNEALING ALGORITHM "Annealing" originally refers to a process in which metals are heated and then cooled to achieve a crystalline structure corresponding to an energy state, dependent on the rate of cooling. When cooled, the atoms tend toward an arrangement to minimize potential energy, but because of the large number of possible configurations, it is likely only a local minimum will be achieved. The metal may be reheated and cooled with the expectation of achieving a lower energy state. Metropolis et al. (1953) devised an algorithm to simulate the equilibrium state of the atoms in a metal at a given temperature. A rearrangement of the atoms' configuration is accepted with probability 1 if the new configuration has a lower energy level and accepted with probability p (dependent on the temperature and the change in energy) if the energy level increases. Otherwise, the old configuration is used. Kirkpatrick et al. (1983) adapted this simulated annealing procedure for function optimization. They found that because of the probabilistic nature of acceptance of the new configuration, the algorithm Chapter 2. LITERATURE REVIEW 54 is able to escape from local optima and thus convergence to a global optimum was more likely. As well, the computing power required to solve an optimization problem increases linearly or as a small power of the size of the problem rather than exponentially as most other approaches. Bohachevsky et al. (1986) found that the Kirkpatrick algorithm may require a large number of steps before identifying a global optimum. They suggested modifing the algorithm so that the probability of accepting a detrimental step tends to zero as a global optimum is reached. They called their technique the generalized simulated annealing method. U N I F O R M DESIGNS Plackett introduced a criterion of uniformity in the discussion of Scheffe (1963). He designated the domain of a point x, the region for which x is the nearest design point. A design is uniform if the volumes of the domains of all the design points are equal. Doehlert (1970) described a method of obtaining uniform designs by generating uniform shells. Kennard and Stone (1969) in addition to recognizing the problem of model uncertainty, also identified "messy" design spaces X and the inadvisability of over replication as problems not addressed by most optimal design procedures. They developed an algorithm for obtaining uniform designs which addressed these complications. 2.3.4 S E Q U E N T I A L DESIGN N O N L I N E A R DESIGN Fedorov and Malyutov (1972) described the following algorithm for obtaining sequential ^-optimum approximate designs for nonlinear regression models. Let $ be a criterion function as defined in table 2.1 and use the notation of section Chapter 2. LITERATURE REVIEW 55 2.1.6. 1. Start with N observations taken according to the arbitrary non-singular design £/v (possibly a previous experiment). 2. Find x N+1 satisfying ip(x ,9 , N+1 <p(xJ ,t ) N N N £ ) = sup N Bgjr tp(x, 9 , (N) where N = x V(d ,t ) *l ° \ V(9 ,Z )x T d N { N iN) N N N (2.68) and 0jv is the LSE after N observations. 3. Take an observation at xw+i- 4. Go to step 2 using 9 N+i and = (1 - J ^ ) £ N Stop when the desired precision is achieved. + j^i^N+i) in place of 9^ and Chapter 3 COMPLICATIONS The previous discussion dealt mainly with statistical concerns. However, for most practical applications there are additional factors which must be considered to make the design easy and inexpensive to implement. When one is sampling, not all combinations of the independent variable occur with the same frequency. All are not easy to locate; some are rare. As well, when one is conducting an experiment, not all combinations of the independent variables are equally easy to set up. Some combinations may be dangerous or require resources which are limited. In this discussion, variable frequency and ease of set up will be termed availability. Availability must be an important part of obtaining optimal designs in forestry. Two other common complications are variable precision requirements and variable observation costs. As will be seen, these complications may be handled in somewhat similar manners. In the following discussion, restrict attention to standard linear models of the form given in equation 2.10. 3.1 AVAILABILITY In forestry, commonly the predictor variables are not truly independent of one another. For example, two of the most commonly used predictors of tree volume are the square of diameter at breast height (dbh ) and height (/it). These are highly correlated variables. 2 One result of this is that some combinations of dbh and ht are fairly common while 2 56 Chapter 3. COMPLICATIONS 57 others are quite rare. One cannot select a possible value of dbh and a feasible ht and 2 expect to find the combination commonly occuring in nature. One must consider the relationships between the independent variables. Tall, thin trees and short, fat trees are uncommon and may even be undesirable sample trees because of their rarity and disproportionate influence on results if included in a design. Even when correlation is not important or is ignored, availability is often still important. When predicting height from diameter, an optimal design which specifies which diameters to sample may be difficult to implement since not all diameters are equally common. Trees with large diameters may be quite rare and difficult to locate. When predicting wood density from growth rings per centimeter, some values of rings per centimeter will be less common than others, some being almost impossible to find. In experiments, some combinations are not desirable. For instance, high pressures and high temperatures may create dangerous operating conditions. Certain settings of the independent variables may require expensive, sophisticated equipment or specially trained personnel to obtain. In some experiments, e.g., fertilization trials, some factors may be limited. Some observations may be useful for other studies and may be more desirable than other single-use samples. 3.2 PRECISION When a regression model is used for prediction, it is not always necessary that all predictions have the same precision. Before conducting an experiment or survey, the manager should consider the size of error he is willing to allow. In some cases, this may already be dictated by government or industry policy. In other cases, the manager may use his knowledge of the acceptable error levels for effective decision making when Chapter 3. COMPLICATIONS 58 specifying allowable error levels. For instance, in forestry volume surveys, often per cent error is specified so that the allowable error increases linearly with the prediction. For example, company policy may require that volume estimates be within ±10%, 19 times out of 20. This is, in effect, a variable allowable error specification. Statisticians tend to specify an absolute allowable error and their optimal designs reflect this. The points included in an optimal design are points at which the maximum allowable error is attained and all other points have equal or smaller errors (see section 2.1.2). Techniques for obtaining optimal designs for absolute allowable error specifications require modification in order to be used efficiently to obtain optimal designs for variable allowable error specifications. One solution is to use the minimum value of the variable allowable error as the absolute allowable error. This can lead to a considerable waste of effort by attaining precision where it is not required. The design is not really "optimal" . A more effective method of incorporating variable allowable errors or precision requirements is necessary Several types of precision requirements are possible : precision specified as a functon of x, as a function of y, and for a few values of x. 3.3 COSTS The previous theory made no explicit mention of the cost of taking observations. Potential observations were evaluated and selected purely on the basis of their potential information. However, in most applications of experimental design, observation cost is important and should be explicitly considered. An observation which more expensive to obtain than another potential observation should not be included in a design, all other things being equal. 'In forestry, particularly in sample surveys, costs can vary considerably from one observation to another. In biomass studies, large trees may be Chapter 3. COMPLICATIONS 1 59 significantly; mpee. expej^ive :te: sample than small trees. Dense stands may involve : much more time? to meagre than; sparse stands. In experiments, low levels of a treatment may be less expensive tban high levels. Control observations usually are cheaper than treatments which .njay involve expensive equipment or costly labour. Several observations with-fhe'sarri©. treatment combinations may be less expensive than the same number of observations,: eachvwith-a different treatment combination. Additional factors may influence the cost of obtaining observations. For instance, when relating w^od'kili>,.drying-iimes to strength and durability properties, some drying times may result in- boards-,width: different market values. Most management practises may result in trees of varying qualities and:end uses. The potential revenues from the by-products of an experiment?©!*/survey may vary depending on the design. % • Possibly somg'tlesigns should;.ignore variable costs of observations, e.g., when the cost of obtainihgi'obseryations-tjaiminor compared to the cost of analysis. However, in the majority ojdns'tances, costs cannot be ignored. If the cost information is available, it should be josfecfc'to obtain a ^ D S t t e f f i c i e n t Hesign. The cost of jiii'ebservation may be broken down into C{ = lc{ + mci (3.69) where c; is the total cost associated with an observation at x;, lei is the location (sampling) or experimental se?up (experimentation) cost, and mc^ is the measurement cost. In general, these quantifies depend on* both x and y and possibly on the previous observations.' " . -> • J i.. . ' ' :p 1. Location cost. For most practical forestry purposes, the setup cost for X{ can be assumed to be independent of that for Xj, % ^ j and the rest of the discussion : < :<xi will be b a s e d "on this assumption. Typically, the closer the x; is to the status quo or to being "average" (i.e., near the center of X), the less costly is locating such Chapter 3. COMPLICATIONS 60 an observation or setting up such an experiment. As one moves to more extreme values of x,, lei becomes higher. For some surveys, locating specific values of the independent variables may be more expensive than a random sampling design which achieves the same precision. In these cases, additional considerations may still cause the manager to opt for a designed survey. Random surveys are usually poor for covering the range of interest. A survey which specifies which values of the independent variables to sample may be more expensive than a random survey but may be essential to avoid extrapolation. 2. Measurement cost. This typically depends on the magnitude of y and must therefore usually be estimated. For most experiments, this cost usually does not vary much with y, and is assumed to be constant. Two common exceptions may occur. The mci may depend on t/; or on x^. In sampling for tree volume or biomass, cost typically varies greatly with y and greatly influences total cost. In other studies, treatment cost or the cost of measuring or setting x, dominates the total measurement cost, e.g., as in studies relating pruning height to wood quality. In this case mci is largely a function of x,. -When total cost is independent of x and y, i.e., all observations have the same cost, the previous theory is valid. However, when costs depend on x and y, the previous theory requires modification. Chapter 4 P R O B L E M SOLUTION The previous chapter identified some of the limitations of optimal design theory which may prevent its use in forestry. Methods of addressing these limitations are discussed below. To illustrate methods of dealing with differing availabilities, precision requirements, and costs, attention will be retricted to D-optimal designs with the realization that these designs are also G-optimal. The D-optimality criterion was selected because it is invariant with respect to scale. It also has a nice interpretation in the parameter space and, as a consequence of its equivalence to G-optimality, D-optimality also has a nice interpretation in the prediction space. The most difficult aspect of incorporating differing availabilities, precision requirements, and costs into optimal designs is estimating these quantities. How much more difficult is it to locate a 40 cm Douglas-fir than a 20 cm Douglas-fir? What magnitude of error is allowable when making management decisions? How expensive is one design compared to another? Answers to these questions require considerable thought and experience from the manager and often he will only be able to guess at them. Nevertheless, they are important and should receive attention. It is not realistic to expect a cost-efficient design if one is unwilling to supply costs. The more precise and detailed the information available, the more efficient the design will be. If the biometrician finds that he is unable to supply the necessary information, perhaps he should consider undertaking a small preliminary study to obtain some of the missing observation. 61 Chapter 4. PROBLEM SOLUTION 62 Once information is available on availability, precision, and costs it should be incorporated into the design. For the remaining discussion, a design which ignores differing availabilities, costs, and precision requirements will be called a "pure" optimum design (i.e., a design which assumes only information varies from observation to observation). Designs which consider these complications will simply be called optimal designs. 4.1 AVAILABILITY Availability is the relative ease of obtaining observations for different values of the independent variables. Typically, the availabilities depend both on the availabilities of the individual independent variables as well as the joint availabilities of the independent variables. A potentially useful description of these relationships is the ellipsoid. Recall the general equation for an ellipse. ax + by + cxy + dx + ey + f = 0 2 (4.70) 2 Or, in matrix notation. Cn [(x - x )(y - y )} 0 C12 (x XQ) - 0 c 21 The centre of the matrix is (x ,y ). 0 0 c 22 = const (4.71) (y -yo) The matrix C = {c}ij is a positive definite matrix. Since all combinations of the independent variables are not equally easy to find or set, one may be interested in weighting each combination by its availability. For the dbh ,ht example, the concentric ellipses shown in figure 4.4 represent in2 creasingly rarer combinations of dbh and ht as one gets further from the center. The 2 problem now is to incorporate such information into the experimental design. A combination of independent variables on an outer ellipse should have a lower probability of being incorporated into the optimal design than a combination on an inner ellipse, all Chapter 4. PROBLEM SOLUTION Figure 4.4: The relationship between d i a m e t e r at breast height, height, a n d relative frequency. Chapter 4. PROBLEM SOLUTION 64 other things being equal. That is, if a certain combination is more difficult to locate, to set, or is less desirable for some reason, it should be less likely to be included in a design. Combinations of independent variables should be weighted according to their availability. 4.1.1 OBTAINING T H E WEIGHTS For sample surveys, where availability is often roughly proportional to the relative frequency of the observations in the population, if the covariance matrix of the independent variables is available, C can be set proportional to this. If the covariance matrix is not available, the experimenter can use the geometric interpretation of C to derive the weights by first approximating an ellipse which the experimenter feels adequately describes the relationship between the independent variables and then obtaining C. Another approach to considering explicitly the availability of an observation for a given combination of independent variables is to use a probability distribution. For example, a bivariate normal density may adequately describe the frequency of occurence of combinations of two independent variables. This approach is potentially more flexible than the previous approach using ellipsoids. The reason for selecting the ellipsoid approach over the density function approach is that the ellipsoid approach can be viewed as a linear transformation of the independent variables, a transformation of the original Rl. This may lead to some insights into the optimal solution which would not be as obvious using the density function approach. However, using a joint density function may be the only feasible approach if one or more of the independent variables is discrete and has a variable (non-uniform) joint density with one or more additional independent variables. A third approach to considering availability is to calculate a "location" cost to measure availability. If an observation has a lower availability, the location cost is greater Chapter 4. PROBLEM SOLUTION 65 and can be used to account for differing availabilities. One can then weigh combinations of independent variables by this location cost. For the remaining discussion, a design which ignores differing availabilities, costs, and precision requirements will be called a "pure" optimum design (a design which assumes only information varies from observation to observation). Designs which consider these complications will simply be called optimum designs. 4.1.2 INCORPORATING T H E WEIGHTS If one has a numerical measure of the availability of each potential observation, one can then proceed to incorporate the information into the search for an optimal design. An observation with twice the availability of another observation, e.g., twice as common, should be contain twice as much information to be given equal consideration for inclusion into the optimal design, all other things being equal. Thus the probability of selection should be directly proportional to the availability or inversely proportional to the scarcity of an observation. However, an observation's probability of being included in the optimal design is also directly proportional to the information it posesses. Therefore, the probability of selection of an observation will be formulated as follows. probability of inclusion in the optimal design a information x availability (4.72) The information associated with an observation is a second order function of x. To directly incorporate the availability of an observation x, one can use the following transformation. x = x x -^availability (4-73) The previous theory is valid for this transformed x. and now considers availability. One can anticipate design problems when availability is inversely proportional to the square root of the information. The algorithms previously discussed will have difficulty Chapter 4. PROBLEM SOLUTION 66 finding the optimal design since the associated information matrix will be singular. As far as the algorithms are concerned, all the observations are equivalent and therefore all designs are equal. In this case, rather than using another criterion to distinguish between designs, one can use a transformation of x. A prime choice would be to go back to the original sample space. The optimal design would be the best in terms of statistical criteria, ignoring availability, and would be equivalent to all other designs if both information and availability are jointly considered. Another prime candidate would be a uniform design since it does not depend on the model chosen. 4.1.3 IMPLICATIONS OF MISSPECIFYING AVAILABILITY The ease of obtaining an observation for a given combination of independent variables is often not known and must be estimated. Errors in specifying availability have no effect on the eventual statistical analysis of the data and inference drawn from the analysis, since the errors' only influence is on the selection of combinations of the independent variables, the dependent variable is not affected. However, the optimality of the design obtained is dependent on the availability specifications. If these specifications are in error, the design is no longer optimal although it may still yield very useful information. The design is no longer the most efficient for achieving the manager's objectives. 4.1.4 DISCUSSION For any convex RO, the points comprising the optimal design will all be boundary points (if no weights are used). If the variables are "weighted", the optimal design points will be on the boundaries of the transformed space, i.e., X • w(x). Most transformations change the scale of the axes and the shape of the boundary but the original boundary points remain boundary points in the transformed space. Therefore, one must take extreme care when specifying the boundaries to ensure boundary observations are Chapter 4. PROBLEM SOLUTION 67 available or feasable. Some transformations (such as the joint normal probability density function) effectively change the boundaries and protect against misspecification of boundaries in the original problem. In fact, one can use this method to avoid specifying a RO. However, one should be extremely careful about changing the boundaries. Usually, the boundaries of the original problem are where one has the most uncertainty about the model. If the optimal design does not include points on the boundary, one should augment the design with such points. The optimal design is based on the assumption of a correct model throughout the RI. In practise, this assumption is usually not verified fully until after the full study is carried out. In any case, if the boundary of the RI is not sampled, some boundary points will involve extrapolation. Extrapolation involves uncertainty which should usually be avoided if possible. Figures 4.5 through 4.7 help illustrate the effect of weighting on the optimal design. The data are from the example discussed later. The dispersions are given over the RI infigure4.5. In an unweighted design, the next design point selected for inclusion in the optimal design would be the point with the greatest dispersion (note that this is a boundary point). Figure 4.6 shows the weighting function, chosen in this case to be availability, proportional to the covariance matrix. The product of these two quantities is given infigure4.7. The optimum weighted design would choose the point with the maximum product. Note the shift from the corners. Rather than optimizing some function of the information contained in a design, one optimizes some function of the information per unit availability contained in a design. (4.74) One can use the previous theory on the transformed equation. Note that the need only be specified up to multiplication by a constant although the units of measurement are not easily interpretable then. Chapter 4. PROBLEM SOLUTION Figure 4 . 5 : The relationship between height, crown width, a n d d i s p e r s i o n = xDx. PROBLEM SOLUTION s 0 z OJV 6 *'9ht Chapter 4. PROBLEM 4.2 SOLUTION 71 G E N E R A L I Z E D LOSS In section 2.1 the only loss that was considered was due to error in prediction or parameter estimation. There is also, in general, a cost associated with obtaining observations. This cost typically is measured in terms of monetary units, time, manpower, equipment, or materials used or some combination of these. Thus, the loss associated with a experiment £ may be written L(LN) = C(t,N) + P(M(0,N) (4.75) where L((, N) is the total loss made up of the cost of the experiment, C(£, N), and the precision, P(M{(),N) (Fedorov, 1972 p.56). The form of C(£,iV) is highly dependent on the type of experiment being performed. The simplest case is when all observations are available at the same cost. Then minimizing the total loss corresponds to minimizing the number of observations, or, equivalently, maximizing the precision. However, costs are often not equal. In forest biomass studies, the cost is usually highly dependent on the size of the tree. In chemical experiments which involve using sophisticated equipment, the cost incurred when resetting equipment may be considerable and the cost of obtaining an observation may be dependent on the setting used to obtain the previous observation. In this case, it may be less costly to replicate a few design points rather than spreading observations throughout the design space. The cost may be made up of costs associated with setting or observing £(,) as well as costs associated with observing T/,: which are not known a priori but may be estimated by the proposed model. The function P(M(£), N) is usually one of the criteria in section 2.1.2 appropriately scaled. The loss function composed of a cost component and a precision component is very appealing. Conceptually, it is fairly easy to understand. The formulation allows Chapter 4. PROBLEM SOLUTION 72 one to compare the relative importance of cost and precision as well as making the consideration of cost obvious. To be useful, both components must be in the same units and although this may be difficult to specify, it has the benefit of ensuring the experimenter has thought about the relative importance of cost and precision. As well, for complicated cost functions, computationally this may be the only tractable formulation. This formulation (equation 4.92) also has several drawbacks. It is often very difficult to put a cost function and a design criterion such as D-optimality in the same units. There appears to be no conclusive argument for assuming costs and precision to be additive. Conceivably, one could use weighting functions as in section 4.1. Other methods of handling costs (section 4.3) are formulated so the current optimal design techniques are still applicable. The generalized loss approach has not been used in this discussion. 4.3 I N C O R P O R A T I N G COST Cost can be viewed as a general penalty and can be used for distinguishing among observations on the basis of practical considerations such as time or effort required to obtain observations and cost involved in obtaining the observations. Cost can be a measure of any expenditure, time, effort, money, resources, incurred. It should come as no surprise, therefore, to find that differing costs can be handled using a series of weights in the same manner as differing availabilities were. An observation which is twice as expensive as another should contain twice as much information as the other to be given the same consideration for inclusion in a design, all other things being equal. The probability that an observation is incorporated into a design should be inversely Chapter 4. PROBLEM 73 SOLUTION proportional to its cost. That is, probability of inclusion oc '. (4- 6) 7 cost But the probability of inclusion should, also be proportional to the infomation an observation possesses. Thus, the following relationshiop. probability of inclusion oc —— x information cost (4-77) The information an observation possesses is a function of the square of x. Therefore, to incorporate differing costs and yet retain the previous theory, one should use the following transformation. x[ = x x -4= t (4.78) where C{ is the cost associated with obtaining observation i. The cost may be a function of j/i, Xi, or both. The previously described optimal design algorithms may be used on this transformed sample space to minimize information per unit cost. Again, as with availability, one can anticipate problems when x± is proportional to the square root of the cost of obtaining observation X,. The optimal design algorithm chosen may fail since it will not be able to distinguish between observations. All designs are equivalent. In this case, another formulation should be used. One could ignore cost, and obtain a pure optimum design. A random design may be desirable because of the ease of implementation. However, a very strong contender for the final design used should be a uniform design. This design is not dependent on the model chosen, ensures one can evaluate the final model's performance throughout the range of interest, and may provide useful data for other future applications, because it is an all-purpose design. Chapters 4.4 PROBLEM SOLUTION 74 D I S C U S S I O N One can view the desirability of an observation as a function of two components, its information and its cost. Information is the statistical information modified by the desired precision, if precision requirements vary within the range of interest. The cost component incorporates the expenditures incurred in obtaining an observation in whatever measure the manager chooses. The optimal design should include observations which contain much information relative to their cost. The example discussed in the next chapter illustrates a technique for obtaining such an optimum design and compares the resulting design to some of the current alternatives. In general, when an absolute allowable error is specified, the pure optimal design (the optimal design which ignores complications such as different costs and availabilities) will require fewer observations than the optimal design which considers these complications. This is because pure optimal designs maximize precision or minimize sample size to achieve a given precision.- However, the observations comprising the pure optimal design, will, in general, also be more expensive or have lower availabilities. The optimal design will require more observations but they will be cheaper or have higher availabilities. The large sample size for the optimum design (relative to the pure optimum design) has several advantages. As sample sizes get larger, close-to-optimal exact designs can be constructed from optimal approximate designs with little loss in efficiency. This means that the approximate theory can be used for design construction. Large samples are also desirable when one is relying on large sample theory, e.g., asymptotic properties of estimators, asymptotic distributions of variables, asymptotic optimality of tests, etc. The influence of outliers is reduced in large samples. A large sample may provide more opportunity for checking assumptions. Halfway through a survey Chapter 4. PROBLEM SOLUTION 75 with many observations, one may have enough data to check and revise many of the initial assumptions including the variance of the dependent variable. This can prevent excessive resource expenditures based on erroneous assumptions. A large sample may also seem more credible than a sample based on fewer observations. A possible disadvantage of larger samples is the large amounts of data they produce. If the cost of data processing is greatly influenced by sample size and is substantial, a smaller sample size may be desirable. However, this cost should then be incorporated into the cost weights. If variable precision requirements are specified, pure optimal designs do not necessarily require fewer observations than an optimal design. This is because pure optimal designs will maximize precision everywhere, and will include unnecessary observations where precision is not required. Thus, in general, for variable precision requirements, pure optimal designs may require more observations, but they will be unnecessary. Chapter 5 EXAMPLE The following example illustrates the use of optimal design for linear regression applied to a forestry problem. A sampling problem was chosen but the technique is also appropriate for experimental design. The procedure given in figure 5.8 is a general guideline for designing an experiment and need not be rigorously followed. However, it does cover the major points which should be considered and may be used as a checklist to protect against omitting critical steps. One should be checking constantly to see that assumptions and specifications are met and revising the design if necessary. 5.1 BACKGROUND Study data from the United States were available from aerial photographs relating aerial photograph measurements to ^ ha plot volumes. The original source of the data is unknown but the information is from actual photograph and field measurements. This example does not presume to reflect any particular forested area and is used for illustrative purposes only. The resulting design is highly dependent upon the data and the assumptions made which may not adequately reflect reality. However, the general conclusions should have wide applicability. Tree height, crown width, and crown closure were measured on the photographs while plot volume was determined on the ground. Crown closure was measured using a dot grid. The heights and crown widths of all trees on each plot were measured and averaged by plot and will be referred to as tree 76 Chapter 5. EXAMPLE 77 1. Preparation (a) State objectives as specifically as possible. (b) Identify independent variables and transformations. (c) Identify model. (d) State range of interest. (e) Identify weights. • • • • precision. cost. availability. variability (heteroscedasticity). (f) Estimate variance(<7 ). 2 2. Conduct a Pilot Study - uniform design with replications is recommended unless additional information is required (e.g., cr ). x 3. Analysis of Pilot Study. (a) Check for agreement between pilot study results and the original specifications. (b) Check to see that model is appropriate including checking of assumptions. If not, try alternate models or modify the original model. (c) Identify the criterion which best quantifies the objectives. (d) Determine the optimum design. Check to see that the boundary of the region of interest is covered and, if necessary, augment the optimal design. 4. Conduct the full study leaving some resources for analysis of study results and possible further studies. 5. Analysis of study results. (a) Determine to what extent the objectives were met. (b) Make recommendations for future similar studies. Figure 5.8: Guidelines for designing and conducting an experiment or survey Chapters. EXAMPLE 78 height and crown width, although they are in fact averages. The pilot study was not a uniform design nor did it include any replications of combinations of the independent variables. 5.2 5.2.1 PREPARATION OBJECTIVES The objective of the study was to quantify the relationship between aerial photograph measurements and ground plot volumes for the purpose of estimating plot volumes from aerial photograph information. 5.2.2 I N D E P E N D E N T V A R I A B L E S The most common ground measurements chosen for predicting tree volume are tree height and diameter. As well, some sort of spacing indicator is often used to account for changing tree form with changing spacing. Therefore, on the aerial photographs, tree height, crown width, and crown closure were chosen as independent variables. Crown width was chosen as a good indicator of tree diameter. Crown closure, the percentage of an area covered by tree crowns, was used as an indicator of the proportion of the area occupied by trees. Smith (1965) found a multiple regression equation based on height, crown closure, and crown width to yield the best estimate of plot volume. 5.2.3 MODEL In general, for widespread use, or for extrapolation, a biologically-based or theorectical model (usually nonlinear) is preferred. However, the objective was to obtain a local volume equation. Thus, no a priori model was selected. Since the model would be empirical, it was decided to use a linear model for computational ease since analytic Chapter 5. EXAMPLE 79 forms for parameter estimates and confidence intervals for nonlinear models do not exist. Some transformations of the independent variables were selected for trial in the model particularly for crown width which possibly needed to be squared to give an indication of the cross-sectional area of a tree. Among the limitations of an empirical model are the lack of biological interpretation of the parameters and the dangers involved in extrapolation. 5.2.4 RANGES The heights of interest ranged from 10 to 40 meters. The crown widths of interest ranged from 5 to 10 meters. The crown closures of interest ranged from 50 to 100 per cent. 5.2.5 WEIGHTS For illustrative purposes, only one complication was introduced into the example. The minimum precision for the volume estimate of any given plot was set at ±lm /plot, 3 with precision level a = 0.05. Availabilities of the observations were assumed to be estimable from the values of the independent variables. The independent variables were assumed to have a jointly normal distribution with means and covariances equal to the pilot study mean vector and covariance matrix. The availabilities were set proportional to the probability density function. Thus, "average" observations were assumed to be common and more extreme observations were penalized for their low availabilities. The total cost of obtaining an observation was broken down into two components, location cost and sampling cost. Location cost was set inversely proportional to availability and ranged from a low of less than $ 1.00 for an "average" plot to a high of $ 363.00 for an "extreme" plot. This range was used for illustrative purposes and was felt to be reasonable. However, actual costs may be considerably different. Sampling cost was Chapter 5. EXAMPLE 80 the cost of obtaining the volume estimates once the plot was located and was fixed at $ 200.00. Thus, the total cost for obtaining an observation ranged from $ 200.00 to $ 563.00. The costs needed only to be specified up to multiplication by a constant. No other costs were included. As was seen in figure 4.6, this weighting function tends only to penalize the plots with short, fat tree (small heights and wide crowns) and tall, thin trees (large heights and narrow crowns). The costs for the rest of the plots are dominated by the sampling costs. 5.2.6 V A R I A N C E OF T H E D E P E N D E N T V A R I A B L E The variance of volume, conditional on height, crown width, and crown closure, was to be estimated from the pilot study. 5.3 PILOT S T U D Y A pilot study was conducted and 24 observations were taken. The size of the pilot study was rather small but the example is still usefull for illustrating the use of optimal design. The objective of the pilot study was to obtain data required for constructing an optimal design. The necessary data included an estimate of the standard error of volume given crown width, height, and crown closure (o~ \ ), and estimates of the availabilities of y x different combinations of the independent variables, as well as identification of a suitable model. The availability function was to be estimated from the covariance matrix of the independent variables. Therefore, it was necessary that the pilot study be a random sample to allow estimation of this matrix. Had an estimate of the covariance matrix been available prior to the pilot study, a replicated uniform design throughout the RI would have been recommended to facilitate model estimation and testing. The pilot study data are given in table 5.2. Chapter 5. EXAMPLE 81 Table 5.2: Pilot study data relating average plot aerial photo measurements and ground volume measurements Height(m) Crown Closure(%) Crown Width(m) Volume( m /plot) 3 27.4 84.0 7.5 13.026 22.6 58.0 7.7 12.261 21.3 68.0 8.1 10.336 22.9 88.0 7.1 10.647 20.7 88.0 7.3 11.865 17.7 87.0 5.8 9.345 24.4 48.0 6.5 10.251 26.5 53.0 7.9 10.789 24.4 72.0 6.9 12.205 21.9 96.0 7.7 13.649 18.3 59.0 7.1 9.118 21.9 77.0 7.9 10.732 21.9 62.0 7.7 10.789 24.4 86.0 7.1 14.895 24.4 84.0 8.1 13.196 27.4 78.0 9.3 15.801 16.8 85.0 6.5 7.957 18.3 72.0 7.2 9.486 17.7 92.0 6.9 9.316 27.4 85.0 7.7 12.261 25.9 93.0 7.2 14.385 22.6 94.0 8.1 14.753 33.5 72.0 9.3 18.548 26.2 73.0 8.2 16.036 Chapter 5. EXAMPLE source regression error total 5.4 82 Table 5.3: ANOVA for the final model. df sum of squares mean square F 3 126.478 42.160 27.15 20 31.052 1.553 23 157.530 P>F 0.00001 ANALYSIS OF PILOT S T U D Y 5.4.1 OBTAINING A M O D E L Initially, each independent variable was plotted against the dependent variable to determine if a relationship existed between the variables. Attention was restricted to linear models. Transformations were used to linearize curvilinear relationships. The plot of height vs. volume showed a linear relationship with increasing variance of volume with increasing height. Both crown width and crown closure showed possible curvilinear relationships with volume. The squares of both crown width and crown closure were also plotted against volume. Both these plots revealed approximately linear relationships between dependent and transformed independent variables. Backward elimination was used to find a good linear model using the independent variables, height, crown width, crown closure, crown width , and crown closure without 2 2 including unnecessary (nonsignificant) terms. The model thus obtained was VOL = f{HT,CW ,CC ). 2 2 (5.79). The ANOVA table is given in table 5.3. The assumptions of linear regression were then checked. The assumptions of normality of the residuals was checked by plotting the normalized ranks of the residuals vs. the residuals (see figure 5.9). The resulting approximately linear relationship showed no serious departures from the assumption of normality. The assumption of homogeneity of variance was checked by plotting the residual vs. predicted values (see figure Chapter 5. EXAMPLE PLOT OF NSCORE'RESIOV S3 LEGEND: t = 1 OSS. B " I OBS. ETC. I A 0.6 R I > 0.0 -0.2 RESIOV Figure 5.9: The relationship between the normalized ranks of the residuals and the residuals 5.10). There did not appear to be any clear evidence of increasing variability as volume increased. 5.4.2 OBTAINING T H E WEIGHTS Availability can be estimated using the covariance matrix and the original data or the correlation matrix and the standardized data, i.e., a = x Cx T or a = z Rz r (5.80) where x is the vector of untransformed independent variables, z is the vector of standardized independent variables, C is the covariance matrix, and R is the correlation matrix. For this example the covariance matrix and x variables were used. The availability weight, w becomes a Chapter 5. EXAMPLE RESIDV | PLOT OF P.ESIDVPREDV 84 LEGEND: » * 1 OBS, B B Z OBS, ETC. i.i • > A A A A -3.0 » I Figure 5.10: The relationship between the residual and predicted values. 5.4.3 ESTIMATING T H E V A R I A N C E OF V O L U M E The mean squared error from the regression analysis of the final model provides an estimate of the variance of volume, given the independent variables. Since the pilot study was rather small, the upper bound of the two-sided 90 per cent confidence interval of s , 2 was used rather than s itself to reduce the risk of not meeting the specified precision 2 requirements due to a low estimate of a . This was recommended by Demaerschalk 2 and Kozak (1974). , MSE(A'-p) 2 *o.9o = —T 1 = X(AT_p),0.025 5.5 1.553(20) = -2388 3 C E O y ' 0 (5.82) y IDENTIFYING T H E O P T I M U M DESIGN A computer program was written to find the optimum design for the example. The program was written in TURBO C® (Borland International, 1987) on an IBM® personal system/2 model 50 following Fedorov's steepest descent algorithm (section 2.3.2) and Chapter 5. EXAMPLE some of the subroutines described in Press et al. 85 (1986). A description of the program is given in figure 5.11. Fedorov's steepest descent algorithm was selected since it is conceptually simple. Most of the other algorithms are refinements of this algorithm. The modifications applied to this algorithm could easily be introduced to most of the other algorithms as well. Observations are added one by one to the pilot study and given weights (epsilon) to maximize the increase in the determinant of the information matrix M. An infinite population is assumed so that the availability function does not require updating after each iteration. The program was written to handle general problems but due to software and hardware constraints, the maximum dimensions of X were fixed at 100 rows and 10 columns. Later, the program was transferred to the University of British Columbia Amdahl® 5860 computer and the maximum array dimensions were increased. Execution time improved considerably as well. The user supplies an initial design matrix, an upper and lower bound for each independent variable, a function which evaluates ^%" M~ ^= using a weighting function which evaluates the weight for each x. T r x The program uses a Powell (1964) search to find the observation x which maximizes ^^ M ^=. T _ 1 space ^ The original design space X is searched rather-than the transformed since the correspondence between the two spaces need not be one to one. By searching over X, the problem of choosing among several x all corresponding to a single value of ^ is avoided. The Powell search uses conjugate gradients to optimize a function and does not require derivatives to be calculated or supplied by the user. The algorithm does occasionaly converge to a local optimum rather than a global op.timum which may be a serious defect for designs with very few observations, but for larger designs, the occasional inclusion of a suboptimal point does not have a major effect.on the overall design. However, if the point x; was known to be a local optimum Chapter 5. 86 EXAMPLE read in initial data - X, epsilon • calculate the mean of X - xmean •• calculate the covariance matrix for X - Cov, its inverse Covinv and its determinant Covinvdet. ••• prompt for upper and lower bounds for data • calculate fisher information matrix and its determinant - M , det • caculate the dispersion matrix - D = M~ x • prompt for number of observations to add • begin loop — calculate x,constrained to lie between -1 and 1, to maximize -r= D-f= T xmax. — if xmax Dxmax < rowdim(i9), found local minimum, repeat search for xmax. — transform xmax to lie within the region of interest. — update weights epsilon and add xmax to X — update dispersion matrix D — update det end loop output X, D, M, det, epsilon Figure 5.11: A summary of the program to find the optimum design — Chapter 5. (-j±= M 87 EXAMPLE << minmax-7^ M T ~ p), the search was repeated until a point T was found with a positive weight (epsilon). The determinant of M and M _ 1 are revised after each observation is added using equations 2.52 and 2.53. Once the observation to be added to the design is found, its corresponding weight is calculated using equation 2.51. This method of calculating weights was used since it helps minimize the negative effects of false convergence of the Powell algorithm. As stated in section 2.1.2, ^ ^ T M - 1 ^ = is always greater or equal to k (the number of independent variables) for points which should be included in the optimal design. The Powell algorithm will sometimes converge to a point x where M~ < k. When this occurs, the as- l sociated weight will be negative indicating observations should be removed from that point. In practise, if no observations are at that point already, the observation can be deleted from the design and the remaining weights normalized to sum to one. The program continues adding observations until the user specified number are added. Because the algorithm constructs an approximate design, the resulting design is not dependent on the sample size n. A sufficient number of points should be added so that the weights are decreased for nonoptimal points included in the initial design. The resulting design and its associated characteristics are printed. In practise, the optimal design algorithm will eventually start cycling around a finite number of points (the support of the optimal design). When this occurs, the user may wish to terminate the program, identify the support and weights, and replicate this design until the required number of observations are included. The number of points required to achieve a specific precision p can be calculated using the following formula. *VarL 2 N = . P where Var'i = max, y „ e x Dx. J (5.83) 2 For this example, one hundred points were added sequentially to the pilot study. The addition of the last observations resulted in minimal Chapter 5. EXAMPLE 88 Table 5.4: Comparison of various designs in terms of sample size and cost to achieve the same precision of 1 cubic metre per plot. Design Sample Size Total Study Average Cost Cost per observation Random 1,084 $216,797.31 $200.00 Uniform 77 $21,457.09 $280.71 Pure Optimum 53 $50,070.28 $381.51 Optimum 85 $17,115.60 $202.23 changes in the objective function. Thus, to get a complete design, one would begin replicating the points until the desired precision was reached. The resulting design is illustrated infigures5.12 through 5.14. The relative frequencies are given over the original sample space. While taking observations, the experimenter may wish to periodically check whether or not the original assumptions are justified by the data and whether the original objectives need revision. If changes are required, a new design can be sought which accounts for the adjusted design framework. Periodic checks are important to ensure the relevance of the collected data to the objectives. The results are summarized in tables 5.4 and 5.5. Table 5.4 gives the sample size and the total study cost required to meet the precision objective of estimating volume to within ±lm /plot, 19 times out of 20. Table 5.5 gives 3 the sample size and the minimum precision attained for a total study cost of $50,000.00. For the full study, one may wish to combine some of the sample points which are close together. Chapter 5. EXAMPLE 89 figure 5.12 : Relative frequency of o b s e r v a t i o n s o v e r the s a m p l e s p a c e for the o p t i m u m s a m p l e design for CC = 5 0 - 65%. Chapter 5. EXAMPLE 90 figure 5.13 : Relative frequency of o b s e r v a t i o n s over the s a m p l e s p a c e for the o p t i m u m s a m p l e design for CC = 65 - 8 5 % . Chapter 5. EXAMPLE 91 figure 5.14 : Relative frequency of o b s e r v a t i o n s over the s a m p l e s p a c e for the o p t i m u m s a m p l e design for CC = 8 5 - 1 0 0 % . Chapter 5. EXAMPLE 92 Table 5.5: Comparison of various designs in terms of sample size and minimum precision obtained for a total study cost of $50,000.00 Design Sample Size Minimum Precision Average Cost m /plot per observation 3 Random 250 4.336 $200.00 Uniform 179 0.4291 $280.71 97 1.109 $381.51 248 0.3423 $202.23 Pure Optimum Optimum 5.6 COMPARISONS W I T H O T H E R DESIGNS The following discussions refer to figures 5.12 through 5.22 and tables 5.4 and 5.5. For ease of graphical representation, crown closure has been divided into three classes, 50 65 %, 65 - 85%, and 85 - 100%. Thefiguresshow the relative frequency of sampling for various combinations of the independent variables for the different designs. 5.6.1 R A N D O M SAMPLING Random sampling should result in observations with a distribution roughly the same as the availability function. Therefore, the availability function was used as an estimate of the probable distribution of a random sample. The resulting value of the criterion function was evaluated using the following equation. det ( / x / ( x ) ) ( / x / ( x ) ) T where /(x) = (x — x) Cov(x — x) T (5.84) The advantages of this design are that it is easy to implement and requires very little planning prior to sampling. The ease of implementation is difficult to surpass.. However, the freedom from planning may be detrimental to the survey since managers may give little thought to ranges of interest, design objectives, and possible models prior to Chapter 5. EXAMPLE experimentation. 93 This can lead to studies which do not gather the data required to meet the objectives, more expensive studies, or to studies with poorly conceived objectives. Since the design, in effect, samples proportionally to the relative frequency of the observations in the population, and availability was set proportional to this relative frequency, the design minimizes according to availability, ignoring the resulting precision. Thus the design behaves very poorly when evaluated in terms of precision. As seen in table 5.4, the random design is the most expensive to implement. Considerably more observations are required to achieve the same precision as the optimum design. This may be an advantage. Large samples have many desirable properties including approximate distributions. However, the design is also more expensive to implement. As shown in table 5.4, for the example, random sampling is an expensive design to implement due to the large sample size required to obtain the desired accuracy. 5.6.2 UNIFORM SAMPLING Uniform sampling is recommended for pilot studies, for applications where the model is unknown, or when there is a wide range of future uses for the data. For example, ordinary permanent sample plots should cover the range of forest types and condition approximately uniformly since the data collected will be put to a variety of uses, some of which may not be anticipated. These are long term plots and their location should not be based on assumptions which are likely to change considerably over time. The uniform design for the example was obtained by sampling at the lower and upper bounds and the middle of the range of interest for all the independent variables. This design is conceptually the simplest and also provides data for a wide variety of uses. However, in terms of the criterion and model chosen, it is not the most efficient. The design provides precise estimates but is relatively expensive to implement. The uniform design performs better than the random and pure optimum designs but cannot match Chapter 5. EXAMPLE 94 Figure 5.15: Relative frequency of o b s e r v a t i o n s over the s a m p l e s p a c e for the r a n d o m s a m p l e design for CC = 5 0 - 65%. Chapter 5. EXAMPLE 95 Figure 5.16: Relative frequency of o b s e r v a t i o n s over the s a m p l e s p a c e for the r a n d o m s a m p l e design for CC = 65 - 8 5 % . Chapter 5. EXAMPLE 96 Figure 5.17: Relative frequency of o b s e r v a t i o n s over the s a m p l e s p a c e for the r a n d o m s a m p l e design for CC = 8 5 - 100%. Chapters. EXAMPLE 97 the optimum design with costs. The relatively small sample size may be a disadvantage if one is relying on large sample properties such as asymptotic normality. 5.6.3 P U R E O P T I M U M DESIGN Using the optimal design theory available in the literature, that is, ignoring differing availabilities, a pure design was constructed to maximize the following objective function. det XX ] T (5.85) The sample frequencies are illustrated in figures 5.21 and 5.22. The characteristics associated with the optimum design are given in table 5.4. As anticipated, the design concentrates observations on the boundary of the sample space. Although the design performs well in terms of minimizing the number of oberservations, it does not consider the difficulty of locating boundary observations (i.e., the design ignores availability). Ignoring differing availabilities (and therefore costs), this design performs the best since it achieves the desired precision with the lowest number of observations. As seen in table 5.4, when considering costs based on differing availability, this design no longer performs as well. As seen in table 5.5, for a given study cost, the pure optimum design cannot even perform as well as the uniform design. 5.6.4 DISCUSSION One can compare designs both in terms of the sample size or in terms of the expenditures required to achieve the desired precision. However, these two measures often lead to different conclusions. These quantities were determined from information obtained from the pilot study. When one undertakes the full study, one should then update the initial data. It is usually a good practise to set aside some resources for additional Chapter 5. EXAMPLE 98 figure 5.18 : Relative frequency of o b s e r v a t i o n s over the s a m p l e s p a c e for the uniform s a m p l e design, for CC = 5 0 - 65%. Chapter 5. EXAMPLE 99 figure 5.19 : Relative frequency of o b s e r v a t i o n s over the s a m p l e s p a c e for the uniform s a m p l e design, for CC = 65 - 8 5 % . Chapter 5. EXAMPLE 100 figure 5.20 : Relative frequency of observations over the sample space for the uniform sample design for CC = 85 - 100%. Chapter 5. EXAMPLE 101 figure 5.21 : Relative frequency of o b s e r v a t i o n s over the s a m p l e s p a c e for ths pure o p t i m u m s a m p l e design, for CC = 5 0 - 65%. Chapter 5. EXAMPLE 102 figure 5.22 : Relative frequency of observations over the sample space for the pure optimum sample design, for CC = 85 - 100%. Chapter 5. EXAMPLE 103 observations to be taken after the full study is complete. Observations may be lost or found to contain errors or the population variability may be greater than anticipated and additional observations may be required. In the example, the only complication introduced was differing availabilities translated into differing costs. The optimum design obtained was significantly different both in terms of cost and sample size from the pure optimum design which ignored availabilities. If a per cent error had been specified rather than an absolute allowable error, then the optimal design would be quite different again. Observations would be concentrated where the most precision was required. As can be seen, practical considerations can lead to quite different designs than one obtains when one only considers statistical properties. Without modification, optimal design theory cannot even match the performance of a uniform design (compare the cost of the pure optimum design with the cost of the uniform design in table 5.4). 5.7 5.7.1 MODIFICATIONS N O N L I N E A R MODELS The increasing use of nonlinear models in forestry, particularly in growth and yield, requires that, to be of general utility, optimal design theory must be applied to these models as well. The previous theory applies to nonlinear as well as linear models. However, as previously mentioned, the information matrix and, therefore, the optimal design depend on the values of the parameters. Estimates may be available from previous work or from a pilot study. The formulas for calculating parameter estimates are approximate as are the formulas for variances of the parameters and predictions. The algorithm presented requires little modification to handle nonlinear designs. The subroutine which Chapter 5. EXAMPLE 104 calculates x M x (funcmin.c) must be generalized to evaluate T _ 1 M~ l where /(x) is the nonlinear model and 3 the vector of parameters. This more general subroutine will also work for linear models. If analytic partial derivatives are available, they should be supplied to reduce the run time of the program. If analytic derivatives are not available, computational derivatives can be calculated by finite differences using any of the many algorithms available. Sequential experimentation is recommended for nonlinear models so that the parameter estimates can be constantly updated using the available data. 5.7.2 POLYNOMIAL MODELS The program was written to handle only freely changing variables. If an independent variable is determined completely by one or more of the other freely changing variables, it is not free to vary. The program is easily modifed to account for such variables by only searching for values of those variables free to vary. The subroutine which calculates x M x {funcmin.c) must include the constrained variables as well as the freely varying T _ 1 ones but the constrained variables will not be searched for. They will be determined by the other independent variables. For example, tofitthe following model, Vol = f% + d^Dbh + 3 Ht + 3 Dbh Ht (5.86) 2 2 3 one can search for the vector (x°) = (1, Dbh, Ht) which maximizes x M~ x T x T = T 1 where (l,Dbh,Ht,Dbh Ht). 2 If one variable is a linear combination of other freely varying independent variables, the resulting information matrix M will be singular and the inverse not defined. In this case, the model should be reformulated to remove the variable, e.g., Vol = 3 + frDbh + 8 Basal Area + 3 Ht 2 0 2 z (5.87) Chapter 5. EXAMPLE should be replaced by Vol = 3 + frDbh + 0 Ht 2 0 2 or Vol = 3 + fcBasal Area + 3 Ht 0 since Dbh is proportional to Basal Area. 2 2 Chapter 6 A D D I T I O N A L A P P L I C A T I O N S A N D VARIANTS The previous example incorporates the optimal design theory as well as considering observation availability. Applications of optimal design theory may not always be as obvious. 6.1 UNIQUENESS Returning to the example of section 3.1 where volume was predicted from diameter and height, two possible model forms are the constant form factor and combined variable equations given below (notation after Clutter et a/.(1983)). constant form factor Vol = 8 Dbh Et (6.90) 2 x combined variable Vol = 0 + 8i.Dbh Ht (6.91) 2 O Here the independent variable is Dbh Ht. Optimal design algorithms will specify the 2 optimal design as consisting of observations taken at certain values of Dbh Ht. In 2 practise then, the optimal design is not unique since different combinations of diameter and height can yield the same value of Dbh Ht. The manager is then free to select his 2 own values of diameter and height to yield the required value of Dbh Ht, or take them 2 as he encounters them in the field. If, however, different combinations of Dbh and Ht 2 occur with different frequencies, the manager may wish to differentiate combinations of Dbh and Ht based on their availability. 2 106 Chapter 6. ADDITIONAL APPLICATIONS AND VARIANTS 107 Even when independent variables uniquely correspond to actual physical quantities, the optimal design may not be unique particularly when the sample space is spheroid and the availability function is symmetric. Several (possibly an infinite number of) designs may be optimal in terms of the criterion chosen. The optimal design algorithm should be run a number of times using different starting conditions to see if more than one optimal design does exist. The manager may then choose freely from among the optimal designs. Perhaps he has some additional criteria for evaluating designs. One or more of the criteria in section 2.1.2 could be used for this purpose. Another instance when optimal designs are not unique is when an independent variable is a non-unique function of the measured quantity. For example, squared terms can correspond to the positive or negative square root. If the sample space does not eliminate all but one of the possibilities, the experimenter is free to choose from among the allowed values. 6.2 S T R A T I F I C A T I O N A N D Q U A L I T A T I V E V A R I A B L E S A manager may wish to stratify the sample space prior to obtaining observations. For example, he may wish to stratify tree volume equations by species or region. This is particularly common in surveys. The manager may wish to have a design which takes into account the stratification. One approach is to obtain a separate optimal design for each stratum. Another approach is to explicitly include the stratification by means of dummy variables. Regression equations may include qualitative variables (including dummy variables). The interpretation of the corresponding coefficients and their standard errors depend on the equation formulation. Optimal design theory handles qualitative variables in the same manner as discrete quantitative variables. Applications of optimal design theory Chapter 6. ADDITIONAL APPLICATIONS AND VARIANTS 108 to qualitative variables are limited by whether or not the particular search algorithm used to find the optimum design can handle discrete designs spaces. 6.3 D I S C R E T E Q U A N T I T A T I V E VARIABLES Often in forestry, measurements are placed in discrete intervals. Diameter is often placed in 2 cm dbh classes and heights placed in 2 m height classes. For discrete variables, the algorithm which is used to obtain the optimal design must be able to search discrete design spaces. Most of the algorithms discussed previously search the design space for the observation whose addition or deletion will most improve the objective function. If the search algorithm to find this observation relies on derivatives (e.g., steepest descent), the algorithm will not work on discrete design spaces since the derivatives are not continuous. Other algorithms, such as grid search, are required. One alternative is to approximate the optimal discrete design byfindingthe optimum continuous design and then placing the observations in classes. The optimality of the design may be compromised slightly although the design should still perfom well under the optimality criterion. The smaller the widths of the classes and the greater the number of observations, the closer the resulting approximate design will be to being optimal. The discrete and continuous designs can be compared by comparing the respective values of the criterion function. Chapter 7 CONCLUSION Optimal design is an extremely important and useful statistical tool which has not received much attention in forestry. In part, this has been due to limitations in the theory. Optimal design theory can obtain designs with very desirable statistical properties but which may be difficult to implement for practical reasons. Foresters typically deal with widely varying organisms and conditions and designs which do not account for different observation costs and availabilities may not be practical. By explicitly considering these factors in the search for an optimal design, the resulting design has desirable statistical properties as well as being practical to implement. The advantages of optimal designs are that they are more efficient than traditional forestry designs and the preparation involved (identification of objectives, ranges of interest, etc.) forces the manager to review what he is doing and plan the study in some detail prior to data aquisition. As described previously, the use of an availability weighting function allows for distinguishing between observations in terms of their cost, precision, or availability, not only on the basis of their statistical information. This combination of statistical and practical concerns leads to informative, practical designs by exploiting the experimenter's knowledge of the relationships between the variables. The superiority of the optimal design with costs over pure optimal, uniform, and random designs was demonstrated in an example where only unequal availabilities were considered. When additional complications are introduced, such as variable precision 109 Chapter 7. CONCLUSION 110 requirements, one can expect the differences between the designs to be magnified. Particularly when larger, more expensive studies are undertaken, one should investigate the different designs available. It may seem as if the amount of information necessary for constructing an optimal design is excessive, particularly if the study is being undertaken because one does not know much about the population. However, it is unrealistic to expect to be supplied with a design most efficient for meeting one's objectives without providing any information. In fact, one should be suspicious of any designs which make such claims. The standard experimental designs such as randomized complete block and latin square are not claimed to be universally optimal designs. When one has differing costs and availabilities and a unique design problem, it is to be expected that the most efficient design will also be unique. Optimal design with costs considerations can provide the most efficient designs for the criterion chosen. Unless secondary considerations make an alternative design more desirable, all other designs are suboptimal and are wasting resources somewhere. The increasing demand for predictions withfinerresolution in forestry has lead to research into new predictive models, new estimators, and to more data collection. Optimal design can do much to increase the precision of predictions. Further work is required in the area of multiple-use designs where the same design is to provide data for two or more uses. REFERENCES CITED Ash, A. and Hedayat, A. 1978. An introduction to design optimality with an overview of the literature. Commun. Statist. Theor. Meth. A7(14):1295-1325. Atkinson, Anthony C. 1975. Planning experiments for model testing and discrimination. Math. Oper. Statist. Ser. Statist.6(2):253-267. Atkinson, A. C. 1981. A comparison of two criteria for the design of experiments for discriminating between models. Technometrics. 23:301305. Atkinson, A. C. 1982. Developments in the design of experiments. Int. Statist. Rev. 50:161-177. Atwood, Corwin L. 1973. Sequences converging to D-optimal designs of experiments. Ann. Statist. 1:34-35. Bandemer, H. 1980. Problems in foundation and use of optimal experimental design in regression models. Math. Oper. Statist. Ser. Statist.ll(l):89-113. Bohachevsky, Jhor O., Johnson, Mark E., and Stein, Myron, L. 1986. Generalized simulated annealing for function optimization. Technometrics 28(3):209-217. Bondar, James B. 1983. Universal optimality of experimental designs: definitions and a criterion. Canad. J. Statist. 11(4):325-331. Borth, David M. 1975. A total entropy criterion for the dual problem of model discrimination and parameter estimation. J. Roy. Statist. Soc. Ser. B37(l):77-87. Ill Box, G. E. P. and Draper, Norman R. 1959. A basis for the selection of a response surface design. J. Roy. Statist. Soc. Ser. B54:622-654. Box, G. E. P. and Draper, Norman R. 1975. Robust designs. Biometrika 62:347-352. Box, G. E. P., and Draper, N. R. 1987. Empirical model-building and re- sponse surfaces. John Wiley and Sons. New York. 669 pp. Box, G.E.P. and Hill, W.J., 1967. Discrimination among mechanistic models. Technometrics 9:57-71. Box, G. E. P., and Lucas, H. L. 1959. Design of experiments in non-linear situations. Biometrika 46:77-90. Box, M. J. 1969. Planning experiments to test the adequacy of non-linear models. Applied Statist. 18:241-248. Brooks, R. J. 1972. A decision theoretic approach to optimal regression designs. Biometrika 59:563-571 Brooks, R. J. 1974. On the choice of an experiment for prediction in linear regression. Biometrika 61(2):303-311. Chaloner, Kathryn. 1984. Optimal Bayesian experimental design for linear models. Ann. Stat. 12(l):283-300. Cheng, Ching-Shui.. 1978. A note on (M,S)-optimality. Commun. Statist.Theor. Meth. A7(14):1327-1338. Cheng, Ching-Shui. 1987. An optimization problem with applications to optimal design theory. Ann. Statist. 15(2):712-723. Clutter, Jerome L., Fortson, James C, Pienaar, Leon V, Blister, Graham H, 112 and Bailey, Robert L. 1983. Timber Management : a Quantitative Approach. John Wiley and Sons, New York. 333 pp. Cook, R. Dennis and Nachtsheim, Christopher J. 1980. A comparison! of algorithms for constructing exact D-optimal designs. Technometrics 22(3):315-324. Covey-Crump, P. A. K. and Silvey, S. D. 1970 Optimal regression designs with previous observations. Biometrika 57(3):551-566. Cunia, T. 1979a. On tree biomass tables and regression : some statistical comments. In Workshop Proceedings : forest resource inventories. Vol II. Colorado State Univ., Fort Collins, CO. pp. 629-642. Cunia, T. 1979b. On sampling trees for biomass tables construction : some statistical comments, in Workshop Proceedings : forest resource inventories. Vol II. Colorado State Univ., Fort Collins, CO. pp 643664. De La Garza, A. 1954. Spacing of information in polynomial regression. Ann. Math. Statist. 25:123-130. Demaerschalk, J. P. and Kozak, A. 1974. Suggestions and criteria for more effective regression sampling. Can. J. For. Res. 4:341-348. Demaerschalk, J. P. and Kozak, A. 1975a. Suggestions and criteria for more effective regression sampling. 2. Can. J. For. Res. 5:496-497. Demaerschalk, J. P. and Kozak, A. 1975b. How to sample more effectively for multiple linear regression. Fac. of For. U. B. C. unpub. 18pp. Doehlert, D. H. 1970. Uniform shell designs. Appl. Statist. 19:231-239. Draper, N. R. and Herzberg, A. M. 1979. Designs to guard against outliers 113 in the presence or absence of model bias. Canad. J. Statist. 7:127135. Draper, Norman R. and Hunter, William G. 1966. Design of experiments for parameter estimation in multiresponse situations. Biometrika 53:525-533. Draper, Norman R. and Hunter, William G. 1967a. The use of prior distributions in the design of experiments for parameter estimation in non-linear situations. Biometrika 54:147-153. Draper, Norman R. and Hunter, William G. 1967b. The use of prior distributions in the design of experiments for parameter estimation in non-linear situations : multiresponse case. Biometrika 54:662-665. Elfving, G. 1952. Optimum allocation in linear regression theory. Ann. Math Stat. 23:255-262. Fedorov, V. V. 1972. Theory of Optimal Experiments. Academic Press, New York. 270 pp. Fedorov, V. V. and Malyutov, M. B. 1972. Optimal designs in regression problems. Math. Oper. Statist. Ser. Statist.3(4):281-308. Freese, Frank. 1962. Elementary Forest Sampling. Agriculture Handbook No. 232. U.S. Dept. of Agr., Forest Service. 91 pp. Gaffke, N. and Krafft, O. 1982. Exact D-optimum designs for quadratic regression. J. Roy. Statist. Soc. Ser. B44(3):394-397. Galil, X. and Kiefer, J. 1983. Time- and space-savinc computer methods related to Mitchell's DETMAX, for finding D-optimum designs. Technometrics 22(3):301-313. 114 Gertner, George. 1987. A procedure to evaluate sampling schemes for improving already calibrated models. Forest Science 33(3):632-643. Goel, P. K. and DeGroot, M. H. 1980. Only normal distributions have linear posterior expectations in linear regression. J. Amer. Statist. Soc.75:895-900. Guest, P. G. 1959. The spacing of observations in polynomial regression. Ann. Math. Statist. 29:294-299. Hansen, Morris. H., Madow, William G., and Tepping, Benjamin, J. 1983. An evaluation of model-dependent and probability sampling inferences in sample surveys. J. Amer. Statist. Soc.78(384):776-793. Hedayat, A. 1981. A study of optimality criteria in design of experiments. In Statistics and Related Topics. M. Csorgo, D. A. Dawson, J. N. K. Rao, and A. K. Md. E. Saleh, eds. North-Holland Publishing Company. Herzberg, A. M. and Cox, D. R. 1969. Recent work in design of experiments, bibliography and review. J. Roy. Statist. Soc. A 32:29-67. Hill, Peter D. H. 1978. A review of experimental design procedures for regression model discrimination. Technometrics 20(1):15-21. Hill, Peter D. H. 1980. D-optimal designs for partially non-linear regression models. Technometrics 22:275-276. Hunter, W. G. and Reiner, A. M. 1965. Designs for discriminating between two rival models. Technometrics 7:307-323. Johnson, M. E. and Nachtsheim, C. J. 1983. Some guidelines for constructing D-optimal designs on convex design spaces. Technometrics 25(3):271-277. 115 Kennard, K. W. and Stone, L. A. 1969. Computer aided design of experiments. Technometrics 11(1):137-148. Ker, J. W. &: Smith, J. H. G. 1957. Sampling for height-diameter relationships. J. of Forestry. 205-207. Kiefer, J. 1959. Optimum experimental designs and discussion. J. Roy. Statist. Soc. Ser. B21:272-319. Kiefer, J. 1975. Construction and optimality of generalized Youden designs. In A Survey of Statistical Designs and Linear Models J. N. Srivas- tava ed. North-Holland, New York. pp333-353. Kiefer, J. and Wolfowitz, J. 1960. The equivalence of two extremum problems. Canad. J. Math. 12:363-366. Kirkpatrick, S., Gelatt, C. D. Jr.,and Vecchi, M. P. 1983. Optimization by simulated annealing. Science. 220(4598): 671-680. Lau, Tai-shing and Studden, W. J. 1985. Optimal designs for trignometric and polynomial regression using canonical moments. Ann. Statist. 13(l):383-394. Lindley, D. V. 1968. The choice of variables in multiple regression and discussion. J. Roy. Statist. Soc. Ser. B30:31-66. Little, Roderick J. A. 1983. Comment on Hansen et al (1983). J. Amer. Statist. Soc.78(382):797-799. combinatorial problems in statistics. Biometrika 72(1):191-198. Mehra, R. K. 1973. Frequency domain synthesis of optimal inputs for parameter estimation. Div. of Eng. and Appl. Physics. Harvard Univ. Technical Rep. 645. 116 Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. 1953. Equation of state calculations by fast computing machines. J. Chem. Phys. 21:1087-1092. Mitchell, Toby J. 1974a. An algorithm for the construction of "D-optimal" experimental designs. Technometrics. 16(12):203-210. Mitchel, Toby J. 1974b. Computer construction of "D-6ptimal" first-order designs. Technometrics 16(2):211-220. Ott, R. L. and Myers, R. H. 1968. Optimal experimental designs for estimating the independent variable in regression. Technometrics. 10:811823. Paine, David P. 1981. . Aerial Photography and Image Interpretation for Resource Management. John Wiley & Sons, New York. 571 pp. Pazman, A. 1980b. Some features of the optimal design theory - a survey. Math. Oper. Statist. Ser. Statist.ll(3):415-446. Pazman, A. 1986. Foundations of Optimum Experimental Design. D. Reidel Publishing Co., Boston. 228pp. Pesotchinsky, L. 1978. $ -optimal second order designs for symmetric regions. J. Statist. Plann. Inf. 2:173-188. p Powell, M. J. D. 1964. An efficient method of finding the minimum of a function of several variables with calculating derivatives. The Computer Journal 7:155-162. Press, William H., Flannery, Brian P, Teukolsky, Saul A., and Vetterling, William T. 1986. Numerical Recipes - The Art of Scientific Computing. Cambridge University Press, Cambridge. 818 pp. 117 Roth, P. M. 1965. Design of experiments for discriminating among rival models. Ph. D. thesis. Princeton University, [cited in Hill,1978] St. John, R. C. and Draper, Norman R. 1975. D-optimality for regression designs: a review. Technometrics. 17(l):15-23. Scheffe, H. 1963. The simplex-centroid design for experiments with mixtures. J. Roy. Statist. Soc. Ser. B25:235-263. Schreuder, H, Brink, G, Schroeder, D, and Dieckman, R. 1984. Model-based sampling versus point-Poisson sampling on a timber sale in the Roosevelt National Forest in Colorado. Forest Science. 30(3):652-656. Schreuder, H and Wood, G. 1986. The choice between design-dependent and model-dependent sampling. Can. J. For. Res. 16:260-265. Silvey, S. D. 1975. Statistical Inference. Chapman and Hall, New York. 191pp. Silvey, S. D. 1980. Optimal Design. Chapman and Hall, New York. 86pp. Smith, J. H. G. 1965. Biological principles to guide estimation of stand volumes. Photogrammetric Engineering. 87-91. Smith, K. 1918. On the standard deviations of adjusted and interpolated values of an observed polynomial function and its constants and the guidelines they give towards a proper choice of the distribution of observations. Biometrika 12:1-85. Smith, T. M. Fred. 1983. Comment on Hansen et al (1983) J. Amer. Statist. Soc.78(382):801-802. Steinberg, David M. and Hunter, William G. 1984. Experimental design : review and comment. Technometrics 26(2):71-130. 118 Stigler, Stephen M. 1971. Optimal experimental design for polynomial regression. J. Amer. Statist. Soc.66(334):311-318. Studden, W.J. 1978. Designs for large degree polynomial regression. Commun. Statist.-Theor. Meth. A7(14):1391-1397. Studden, W. J. 1982. Some robust-type D-optimal designs in polynomial regression. J. Amer. Statist. Soc.77(380):916-921. Welch, William J. 1983. A mean squared error criterion for the design of experiments. Biometrika. 70(1):205-213. Welch, W. J. 1984. Computer-aided design of experiments for response estimation. Technometrics. 26:217-224. White, L. V. 1973. An extension of the general equivalence theorem to non-linear models. Biometrika 60:345-348. White, L. V., and Welch, W. J. 1981. A method for constructing valid restricted randomization schemes using the theory of D-optimal design of experiments. J. Roy. Statist. Soc. Ser. B43(2):167-172. Wynn, Henry P. 1970. The sequential generation of D-optimum experimental designs and discussion. Ann. Math. Statist. 41(5):1655-1664. 119 Appendix A Listing of Computer Program The following computer files comprise the program used to obtain the optimal design for the example in chapter 5. The main program is called "MINIMIZE.C" which calls all the other subroutines. The program is run by first compiling the main program and then running the object code. The user will be prompted for various options. A description of the program is given in figure 5.11. 120 Appendix A. Listing of Computer Program /* /* /* header f i l e f o r MINIMIZE.C /* contains include f i l e s #include <stdio.h> #include <math.h> #include <stdlib.h> #include "marg:matrixco.c" #include "marg:matrixda.c" */ */ */ */ Appendix A. Listing of Computer Program 122 /* constant declarations */ #define #define #define #define #define #define XMAXDIM 500 /* maximum dimension of data matrix */ MAXDIM 10 /*maximnm dimension of an array or matrix */ TOLERANCE 1.0e-10 /* tolerance f o r eigen vector convergence */ MAXITERATION 100 /* maximum # of iterations for eigen */ PI 3.14159 MAXX 1.0e+10 /* constant declarations */ Appendix A. Listing of Computer Program /* data type delcarations */ typedef double r e a l ; typedef struct { int rowdim; int coldim; real array[MAXDIM][MAXDIM]; } matrix ; typedef struct { i n t rowdim; r e a l array[MAXDIM]; } vector; typedef s t r u c t { vector l x ; vector l d i r e c t i o n ; } globvar; typedef struct { int rowdim; int coldim; real array[XMAXDIM][MAXDIM]; } xmatrix; typedef struct { int rowdim; real array[XMAXDIM]; } evector; /* data type delcarations */ 123 Appendix A. Listing of Computer Program /* /* /* /* /* /* /* /* 124 */ */ */ */ MINIMIZE. C m a i n programme **************************************/ p r o g r a m t o m i n i m i z e x ' D x where D i s t h e d i s p e r s i o n m a t r i x o f t h e i n i t i a l d a t a f o u n d i n m i n d a t . t x t and p r i n t s t h e r e s u l t s t o m i n o u t . t x t and m i n s u m . t x t r e q u i r e s t h e f u n c t i o n s p o w e l l , i n v e r t , decomp, and f u n c m i n / * input neccessary l i b r a r y f i l e s */ */ */ */ */ #include "marg:mintemp.h" /****:t^* * * * * * * * * * * * * * * * * * * * * * * /* . /* global variables /* /* /* /*• /* /* /* /* /* /* /* /* /* /* /* /*• /* */ */ */ f 1 = input f i l e format x x x contains p i l o t study data row d i m e n s i o n column d i m e n s i o n m a t r i x - e a c h row c o r r e s p o n d i n g t o one observation f2 = complete output f i l e f3 = summary o u t p u t f i l e D - d i s p e r s i o n m a t r i x = inv(M) - used i n fmin Covinv = inverse of Covariance matrix of p i l o t study data xmean = v e c t o r o f means o f p i l o t s t u d y d a t a mid = v e c t o r of m i d d l e o f range of i n t e r e s t lower = v e c t o r of lower l i m i t of range of i n t e r e s t upper = v e c t o r of upper l i m i t of range of i n t e r e s t Covinvdet = determinant of Covinv i n t e r c e p t = 1 i n d i c a t e s model w i t h i n t e r c e p t ; 0 = no i n t e r c e p t _stack = used to set stack s i z e */ */ */ */ */ */ *•/ */ */ */ */ */ */ */ */ */ /*• */ /*******************************************************************/ FILE FILE FILE * f l *f2 *f3 Appendix A. Listing of Computer Program matrix matrix vector vector real int int D; Covinv; xmean; m i d , lower, upper; Covinvdet; intercept; . s t a c k = 80; / * input necessary directory f i l e s #include #include #include #include #include #include #include 125 */ "marg:print.c" " m a r g : x p r i m e . . c" "marg:weight.c" "marg:funcmin.c" "marg:powell.c" "marg:invert.c" "marg:decomp.c" mainO */ /* /* /* local */ */ /* X /* R /* Cov /*• M /* Q,Dtemp /* d i r e c t i o n = = -= = = /* xmax - /*• = = = /* /* /* /* /*• /* /* variables nmax vmax alpha epsilon f min weightx det = = = = p i l o t s t u d y d a t a as w e l l as added o b s e r v a t i o : c o r r e l a t i o n matrix of p i l o t study */ covariance matrix of p i l o t study */ i n f o r m a t i o n m a t r i x of a l l o b s e r v a t i o n s */ temporary m a t r i c e s */ matrix of search d i r e c t i o n s */ c u r r e n t x t o maximize x ' D x */ number o f o b s e r v a t i o n t o add t o x */ temporary v e c t o r */ s c a l a r w e i g h t f o r xmax */ p r o b a b i l i t y f u n c t i o n on X */ v a l u e o f xmax'Dxmax */ weight(x) */ d e t e r m i n a n t of M */ */ xmatrix x; static matrix Dtemp,R,Q,M,direction,Cov; Appendix A. Listing of Computer Program int i,j,k,ii,nmax,count; vector static 126 VjXmax; evector epsilon; real alpha,cnst,fmin,temp,det,weightx,initial; float floattoreal,floattoreall; fl=fopen("marg:mindat.txt","r"); / * • open f i l e f o r reading */ f2=fopen("marg:minout.txt","w"); / * open f i l e f o r writing */ f3=fopen("marg:minsum.txt","w"); / * open f i l e for writing */ /* read i n the dimensions of x * / fscanfCfl, printf(" " °/.d",&x.rowdim) ; t h e row d i m e n s i o n o f x i s '/,d\n" , x . r o w d i m ) ; fscanfCfl, " °/.d",&x.coldim) ; epsilon.rowdim = x.rowdim; fprintf(f2, " t h e d i m e n s i o n s a r e '/,d by ' / , d \ n \ n " . x . r o w d i m , x . c o l d i m ) ; xmax.rowdim = x . c o l d i m ; /* prompt f o r i n t e r c e p t printf("enter scanf(" p r i n t f (" /* */ a 1 for intercept, '/.d" , & i n t e r c e p t ) ; intercept = 5£d\n" , i n t e r c e p t ) ; read i n the o r i g i n a l data matrix fprintf(f2, for(i otherwise\n"); 0 "The o r i g i n a l d a t a = 0 ; i < x.rowdim ; and w e i g h t s */ matrix\n"); i++) { for(j = intercept ; j < x.coldim+1 ; j++) { fscanfCfl," If",&floattoreal); x.array[i][j] = floattoreal; fprintf(f2," If",x.array [i][j]); > fprintfCf2," \ n " ) ; e p s i l o n . a r r a y [ i ] = 1.0/x.rowdim; } printf C'finished reading\n"); ifCintercept == 1) / * add a column o f . l ' s * / { /* if x . c o l d i m += 1; forCi = 0 ; i < x.rowdim ; i++) intercept */ Appendix A. Listing of Computer Program x.array[i][0] = 1 . 0 ; > /* calculate means of x and store i n xmean */ xmean.rowdim = x.coldim - intercept; f o r ( j = 0 ; j < xmean.rowdim ; j++) { xmean.array [ j] = 0 ; f o r ( i = 0 ; i < x.rowdim ; i++) xmean.array[j] += x.array[i][j+intercept]; xmean.array[j] /= x.rowdim; } fprintf(f2,"the vector of means of the original data\n"); mprintv(xmean); /* calculate Cov and i t s inverse Covinv using original data */ Cov.rowdim = xmean.rowdim; Cov.coldim = xmean.rowdim; Covinv.rowdim = Cov.rowdim; Covinv.coldim = Cov.coldim; f o r d = 0 ; i < Cov.coldim ; i++) for(j = i ; j < Cov.coldim ; j++) { Cov.array[i][j] = 0 ; for(k = 0 ; k < x.rowdim ; k++) Cov.array[i][j] += (x.array[k][i+intercept] -xmean.array[i])*(x.array[k][j+intercept] xmean.array[j])*epsilon.array[k] •(x.rowdim)/(x.rowdim-1); Cov.array[j][i] = Cov.array[i][j]; } fprintf(f2,"The Cov matrix\n"); mprintf(Cov); invert(Cov,&Covinv); f p r i n t f (f2, "The inverse of the covariance m a t r i x W ) ; mprintf(Covinv); decomp(Covinv,&Dtemp,&j,&v); Covinvdet = 1; f o r d = 0 ; i < Covinv.rowdim ; i++) Covinvdet *= Dtemp.array[i][i] ; Appendix A. Listing of Computer Program Covinvdet *= j ; f p r i n t f (f.2, "the determinant of Covinv i s %g\n".Covinvdet); /* prompt f o r range of interest = range of operability */ lower.rowdim = x.coldim; upper.rowdim = x.coldim; mid.rowdim = x.coldim; f o r d = intercept ; i < x.coldim ; i++) { printf ("Enter the lower limit f o r variable /,d\n",i); scanf (" '/.f",&floattoreal) ; lower.array[i] = f l o a t t o r e a l ; printf ("Enter the upper limit f o r variable °/,d\n",i); scanf(" '/.f",&floattoreal); upper.array[i] = f l o a t t o r e a l ; mid.array[i] = (upper.array[i] + lower.array[i])/2.0; ( } /* calculate the Fisher information M */ /* and the i n i t i a l direction matrix */ M.rowdim = x.coldim; M.coldim = x.coldim; direction.rowdim = x.coldim - i n t e r c e p t ; direction.coldim = x.coldim - intercept; f o r d = 0 ; i < M.rowdim ; i++) { • . f o r ( j = i ; j < M.rowdim ; j++) { M. array [i]-[j] = 0; direction.array[i][j] = 0; /* i n i t i a l i z e direction */ direction.array[j][i] = 0; /* to identity matrix */ for(k = 0 ; k < x.rowdim ; k++) { f o r d i = 0 ; i i < x.coldim-intercept ; ii++) xmax.array[ii] = x.array[k][ii+intercept]; weightx = weight(xmax); xprime(&xmax); M.array[i][j] += xmax.array[i]*xmax.array[j]*epsilon.array[k] •weightx; } M.array[j][i] = M.array[i][j]; } Appendix A. Listing of Computer Program direction.array[i][i] 129 = 1; } f p r i n t f ( f 2 , "The F i s h e r i n f o r m a t i o n m a t r i x M \ n " ) ; mprintf(M); decomp(M,&Dtemp,&j,&v); det=l; f o r ( i = 0 ; i < M . r o w d i m ; i++) d e t *= D t e m p . a r r a y [ i ] [ i ] ; d e t *= j ; f p r i n t f ( f 2 , "The d e t e r m i n a n t o f M i s °/.g\n" , d e t ) ; f p r i n t f ( f 3 , "The d e t e r m i n a n t o f M i s ' / , g \ n \ n " , d e t ) ; f p r i n t f ( f 3 , " i t e r a t i o n fmin weightx det M alpha c o s t \ n \ n " ) ; / * c a l c u l a t e d i s p e r s i o n matrix D = inv(M) * / . D.rowdim = M.rowdim; D.coldim = M.coldim; invert(M.&D); f p r i n t f ( f 2 , "The d i s p e r s i o n m a t r i x D \ n " ) ; mprintf(D); printf("finished calculating D\n"); / * prompt f o r t h e number o f o b s e r v a t i o n s t o add p r i n t f ( " E n t e r t h e number o f o b s e r v a t i o n s t o be s c a n f ( " •/.d'-.&nmax); i n i t i a l = 1.0; /* /* /* /* s t a r t major l o o p e a c h i t e r a t i o n adds one o b s e r v a t i o n for(count = 0 ; count */ added\n"); */ */ */ */ < nmax ; count++) { p r i n t f ( " d o i n g i n t e r a t i o n '/,d\n" , c o u n t ) ; / * l o o p u n t i l fmin > M.rowdim f o r ( j = 0 ; j < 100 ; j++) { / * f i n d x t o maximize x ' D x * / powell(&xmax,direction,&i,&fmin); */ Appendix A. Listing of Computer Program x . r o w d i m += 1; epsilon.rowdim for(i = 0 ; i +=1; < xmax.rowdim ; i++) { xmax.arrayCi] = sin(fmod(xmax.array[i],2*PI)); xmax.array[i] = xmax.array[i]*(upper.array[i+intercept] mid.array[i+intercept]) + - mid.array[i+intercept]; } weightx = weight(xmax); printf("doing loop '/,d\n",j); fmin = - f m i n ; i f ( ( f m i n > M.rowdim)I I ( j == 9 9 ) ) b r e a k ; / * loop exit conditions p r i n t f (" l o c a l minimum, f m i n = °/,f \ n " , f m i n ) ; x . r o w d i m -= 1; e p s i l o n . r o w d i m -= 1; f o r ( k = 0 ; k < xmax.rowdim ; k++) xmax.array[k] = -xmax.array[k+1]; } f p r i n t f (f3, for(j " °/„d If '/.f" . c o u n t , f m i n , 1.0/weightx) ; = 0 ; j < x.coldim-intercept x.array[x.rowdim if(intercept - 1][j+intercept] j++) = xmax.array[j]; == 1) x.array[x.rowdim /* ; - 1] [0] = 1.0; update t h e weights */ a l p h a = ( f m i n - M . r o w d i m ) / ( ( f m i n - 1)*M.rowdim); ford = 0 ; i < (x.rowdim-1) ; i++) epsilon.array[i] *= (1-alpha); e p s i l o n . a r r a y [ x . r o w d i m - 1] = a l p h a ; /* update the d i s p e r s i o n matrix */ xprime(&xmax); Q.rowdim = D.rowdim; Q . c o l d i m = D.coldim; cnst = alpha*weightx; cnst /= (1.0 - a l p h a + a l p h a * f m i n ) ; for(i = 0 ; i < Q.rowdim ; i++) { for(j = 0 ; j < q.rowdim ; j++) { Q.array[i][j] =0; f o r ( k = 0 ; k < Q.rowdim ; k++) Appendix A. Listing of Computer Program 131 Q. array[i] [j] += D.array[i][k]*xmax.array[j]*xmax.array[k]; Q.arrayCi][j] *= -cnst; } Q.arrayCi] [i] += 1; } Dt emp.rowdim = D.rowdim; Dtemp.coldim = D.coldim; f o r d = 0 ; i < D.rowdim ; i++) for(j = i ; j < D.rowdim ; j++) { Dtemp.array[i][j] = 0; for(k = 0 ; k < D.rowdim ; k++) Dtemp.array[i][j] += Q.array[i][k]*D.array[k][j]/(1-alpha) } f o r d = 0 ; i < D.rowdim ; i++) { D.array[i][i] = Dtemp.array[i][i]; for(j = i+1 ; j < D.rowdim ; j++) • { D.arrayCi] [j] = Dtemp.array[i] [j] ; D.array [j] [i] = Dtemp.array [i] [j] ; } > initial = -initial; for(k = 0 ; k < D.rowdim ; k++) xmax.array[k] = i n i t i a l ; / * reinitialize xmax */ / * update determinant of M */ det *= pow(l-alpha,M.rowdim)*(l+alpha/(l-alpha)*fmin); fprintf (f 3," °/.g '/.f \n" ,det .alpha) ; /* *•/. /* end loop /* */ */ > /* print, final x matrix and weights */ fprintf(f2, "The final x matrix and weights\n"); f o r d = 0 ; i < x.rowdim ; i++) Appendix A. Listing of Computer Program { > if(intercept ==1) f printf Cf 2, " 7.f",x. array [i] [0] ) ; for(j = intercept ; j < x.coldim ; j++) fprintf (f2, " °/.f",x.array[i] [j]); fprintf (f 2, " °/,f\n" ,epsilon.array [i] ) ; fprintf(f2, "\n\n cnst fprintf(f2, "The final mprintf(D); invert(D.&M); fprintf(f2, "The final mprintf(M); fprintf(f2, "The final fclose(f1); fclose(f2); fclose(f3); = °/.f,alpha = %f\n\n",cnst,alpha); dispersion matrix\n"); information matrix M\n"); determinant of M = '/.f \n\n" ,det); 132 Appendix A. Listing of Computer Program 133 /**********************************************************/ /* /* function */ */ powell() /* */ /**********************************************************/ /* call #include other f i l e s */ "marg:linemin.c" /* function declarations void powell(vector /*• /* /* /* /* containing functions */ *x,matrix d i r e c t i o n , i n t *iterations,real finds x to minimize func(x) i n i t e r a t i o n s with the corresponding value of fmin requires functions func(x), linmin(x,direction,fmin), m n b r a k ( a x , b x , c x , f a , f b , f c ) , f l m i n ( ) and brent(ax,bx,cx,tol,xmin) void powell(vector *x,matrix d i r e c t i o n , i n t *iterations,real *funcmin); */ */ */ */ */ *funcmin) { const const int real vector int real ITMAX = 1 0 0 ; PTOLERANCE = 1 . 0 e - 8 ; i,j,ibig,iter,itcount; t,fxtt,fx,del,fmin; directionn.xO,xi,xn; xO - * x ; / * s a v e i n i t i a l , p o s i t i o n i n xO * / i f ( x O . r o w d i m . == 1) { directionn.rowdim = 1 ; directionn.array[0] =1; linmin(&x0,&directionn,&fmin); *x = x O ; *funcmin = f m i n ; • i t e r a t i o n s = 0; return; } x i = *x; xn.rowdim = xO.rowdim; Appendix A. Listing of Computer Program 134 directionn.rowdim = direction.rowdim; fmin = func(xO); for( < UMAX = 0 ; iter iter ; iter++) { fx - fmin; ibig =0; d e l = 0; itcount=iter; ford = 0 ; i < xO. rowdim ; i++) { for(j = 0 ; j < xO.rowdim directionn.array[j] fxtt = ; j++) direction.array[j][i]; = fmin; linmin(&xi,&directionn,&fmin); if(fabs(fxtt - fmin) / * f i n d minimum a l o n g d i r e c t i o n i */ > del) { del = fabs(fxtt - fmin); / * r e c o r d t h e l a r g e s t d e c r e a s e so f a r * / ibig = i; } } if(2.*fabs(fx - fmin) for(j < xO.rowdim - 0 ; j <= PTOLERANCE*(fabs(fx) ; + fabs(fmin))) break; j++) { xn.array[j] = 2.0*xi.array[j] directionn.array[j] xO.arrayCj] - x0.array[j]; = xi.array[j] - x0.array[j]; = xi.arrayCj]; / * s e t x i = xi+1 * / } fxtt = f u n c ( x n ) ;. if(fxtt < fx) { t = if(t 2.0*(fx-2.0*fmin+fxtt)*pow(fx-fmin-del,2)-del*pow(fx-fxtt,2); < 0) { ford = ibig ; i for(j = 0 ; j < xO.rowdim-1 < xO.rowdim ; direction.array[j][i] for(j = 0 ; j = < xO.rowdim ; ; direction.array[j][i+l]; j++) direction.array[j][xO.rowdim-1] linmin(&xi,&directionn,&fmin); for(j = 0 ; j xO.array[j] < xO.rowdim = ; xi.array[j]; i++) j++) j++) = directionn.array[j]; / * f i n d minimum a l o n g */ /* */ direction n Appendix A. Listing of Computer Program 135 } } /* p r i n t f ( " i n t e r a t i o n °/,d i n p o w e l l \ n " , i t c o u n t ) ; } *x = x i ; •iterations = itcount; *funcmin = f m i n ; } / * end p o w e l l * / */ Appendix A. Listing of Computer /* /* 136 Program */ */ LINEMIN.C /* */ /* global variable declarations globvar GV; */ /* global declarations of functions*/ void mnbrak(real *ax,real *bx,real *cx,real *fa,real *fb, real * f c ) ; real brent(real ax,real bx.real cx,real t o l , r e a l *xmin); real fldim(real x); void linmin(vector *x,vector *direction, real *fmin); /* global function definitions #define maxi(a,b) ((a > b) ? a : b) #define sign(a,b) ((b > 0) ? fabs(a) 7* */ : -fabs(a)) */ /* function linemin /* */ */ void linmin(vector *x,vector *direction, real *fmin) /* given as input the vectors x and direction and the function f, f i n d * / /* the scalar step that minimizes f ( x + step*direction). */ /* Replace direction by step*direction. */ /* requires functions fldim(x), brent(ax,bx,cx,tol,xmin) */ /* and mnbrak(ax,bx,cx,fa,fb,fc) and func(x) */ /* also requires globvar GV */ { const real TOL = 1.0e-8; int j; real xx,xmin,fx,fb,fa,bx,ax; vector directiontemp,xtemp; xtemp = *x; GV.lx = *x; Appendix A. Listing of Computer Program 137 GV.ldirection=*direction; directiontemp=*direction; ax = 2*PI*rand()/32767; xx = ax - 2*PI; /* i n i t i t i a l guess f o r brackets */ mnbrak(&ax,&xx,&bx,&fa,&fx,&fb); *fmin = brent(ax,xx,bx,TOL,ftxmin); f o r ( j = 0 ; j < GV.lx.rowdim ; j++) { directiontemp.array[j] *= xmin; xtemp.array[j] += directiontemp.array[j]; } *x=xtemp; *direction=directiontemp; } /* end linmin */ /***********************************^ /* /* function fldim /* /***********************************^ /* begin f IdimO /* requires func(x) real fldim(real x) { int j; vector xt; real f; */ */ */ */ */ xt.rowdim = GV.lx.rowdim; f o r ( j = 0 ; j < GV.lx.rowdim ; j++) xt.array[j] = GV.lx.array[j] + x*GV.ldirection.array[j]; f = func(xt); return(f); } /* end fldim */ /* /* /* function brent() */ */ */ Appendix A. Listing of Computer Program 138 /***********************************^ /* brent function definition */ real brent(real ax,real bx,real cx,real t o l , r e a l *xmin) /* requires fldim(x) to be minimized */ /* given this function and a bracketing t r i p l e t of abscissas */ /* ax.bx, and cx, (such that bx i s between ax and cx and f(bx) */ /* i s less than both f(ax) and f(cx)) this function isolates the*/ /* minimum to a fractional precision of about t o l using brent's */ /* method. The abscissa of the minimum i s returned as xmin, and */ /* the minimum function value i s returned as brent. */ { const real CGOLD = 0.3819660; const i n t ITMAX = 100; const real ZEPS = 1.0e-10; real int a,b,d,e,et emp,fu,fv,fw,fx,p,q,r,toll,tol2,u,v,w,x,xm; iteration; d = 0.0; a = ((ax < cx) ? ax : cx); b = maxi(ax.cx); v = bx; w = v; x = v; e = 0.0; fx = fldim(x); fv = fx; fw = fx; f o r ( i t e r a t i o n = 0; iteration < ITMAX ; iteration++) { xm = 0.5*(a+b); t o l l = tol*fabs(x) + ZEPS; tol2 = 2.0*toll; if(fabs(x-xm) <= (tol2 - 0.5*(b-a))) { •xmin = x; return(fx); } if(fabs(e) > t o l l ) { r = (x-w)*(fx-fv); /* done */ Appendix A. Listing of Computer Program 139 q = (x-v)*(fx-fw); p = (x-v)*q - /* trial parabolic fit */ (x-w)*r; q = 2.0*(q-r); if(q > 0.0) p = -p; q = fabs(q); etemp — e ; e = d; if((fabs(p)>=fabs(0.5*q*etemp))I I(p <= q * ( a - x ) ) | | ( p { e = ((x >= xm) ? a - x : b-x); d = CG0LD*e; } else { i f ( q < 0.000000001) q = 0.0000000001; /* printf("q = Xe\n",q); */ d = p/q; u = x + d; if (((u-a) < tol2) II d =• s i g n ( t o l l , x m - ((b-u) < tol2)) x) ; } } else •c e = ((x } >= xm) ? a - x d = CG0LD*e; u = ((fabs(d) fu = fldim(u); if ( f u <= fx) { if(u >= a = x; else b=x; v = w; fv = fw; w = x; fw = f x ; x) : b-x); . >= t o l l ) ? x+d : x+sign(toll,d)); >= q*(b-x))) Appendix A. Listing of Computer Program fx 140 = fu; } else { if (u < x) a = u; else b = u; i f C C f u <= f w ) | | ( w == x ) ) { v = w; fv = fw; w = u; fw = f u ; } else i f ( ( f u <= f v ) I I(v == x ) I I (v ==2)) { v = u; fv = fu; } if(iteration == ITMAX) f p r i n t f ( f 2 , " t o o many i t e r a t i o n s i n brent. \n"); } > *xmin = x ; return(fx); } /* end b r e n t */ /******************************************************/ /* */ /* f u n c t i o n mnbrakO */ /* */ /******************************************************/ /* g i v e n a f u n c t i o n f l d i m and d i s t i n c t /* a t and b t , mnbrak s e a r c h e s i n t h e d o w n h i l l d i r e c t i o n */ */ /* ( a s measured by f l d i m ) /* at.bt, /* f l d i m as w e l l initial points and r e t u r n s t h e new p o i n t s and c t which b r a c k e t a minimum o f t h e f u n c t i o n * / as r e t u r n i n g t h e f u n c t i o n v a l u e s a t /*• these p o i n t s void mnbrak(real */ */ */ *at,real *bt,real *ct,real *ta,real *tb,real *tc) Appendix A. Listing of Computer Program 141 /* r e q u i r e s func(x) f o r which a minimum i s to be found */ { const const const real real real GOLD = 1.618034;/* d e f a u l t i n t e r v a l m a g n i f i c a t i o n GLIMIT = 100.0; /* max m a g n i f i c a t i o n per step TINY = 1.0e-10; */ */ r e a l ax,bx,cx,fa,fb,fc,ulim,u,r,q,fu,dum; int count=0; ax = * a t ; bx = *bt; f a = fldim(ax); fb = f l d i m ( b x ) ; i f ( f b > fa) { dum ax bx dum. fb fa = = = = = = ax; bx; dum; fb; fa; dum; > cx = bx + G0LD*(bx - ax fc = fldim(cx); w h i l e ( f b >= f c ) { /* count +=1; p r i n t f ("count i n mnbrak = °/,d\n" .count); */ r = (bx-ax)*(fb-fc); /* compute u by p a r a b o l i c e x t r a p o l a t i o n q = (bx-cx)*(fb-fa); /* from a,b, and c i f C f a b s ( q - r ) < 1.0e-15) { p r i n t f ("bx = He, ax = /,e'/,e, cx= 0 */ */ °/,ey,e\n" ,bx,fb,ax,fa,cx,f c) ; u = 3*PI;*/ /* } else u=bx+((bx-cx)*q-(bx-ax)*r)/(2.0*sign(maxi(fabs(q-r),TINY),(q-r))); ulim = bx + GLIMIT*(cx-bx); i f ( ( b x - u ) * ( u - c x ) > 0) /* p a r a b o l i c u i s between b & c */ { fu = fldim(u); i f (fu < fc) /* minimum i s between b & c */ Appendix A. Listing of Computer Program 142 { ax = bx; f a = fb; bx = u; fb = fu; break; } i f (fu > fb) /* minimum i s between a & u */ { } > cx = u; fc = fu; break; u = cx + GOLD*(cx-bx); fu = fldim(u); /* parabolic f i t was no good */ /* use GOLD magnification */ else i f ( ( c x - u)*(u - ulim) > 0) /* parabolic f i t i s between u */ { /* and i t s limit */ fu = fldim(u); i f (fu < fc) { bx = cx; cx = u; u - cx + G0LD*(cx - bx); fb = f c ; fc = fu; fu = fldim(u); } } else i f ( ( u - ulim)*(ulim - cx) >= 0) /* limit u to i t s maximum */ { u = ulim; fu = fldim(u); , } else /* reject parabolic u and use GOLD */ { } u = cx + G0LD*(cx - bx); fu = fldim(u); ax = bx; bx = cx; cx = u; /* eliminate oldest point and continue */ Appendix A. Listing of Computer Program f a = fb; fb = f c ; fc = fu; if((fabs(cx-bx) <= TINY) && (fabs(bx - ax) <= TINY)) break; } *at = ax; bx; *bt = *ct cx; *ta = fa; *tb = fb; *tc = fc; - /* end mnbrak */ 143 Appendix A. Listing of Computer Program /* /* */ */ FUNCMIN.C /* */ /* f u n c t i o n t o c a l c u l a t e x'Dx */ /* used i n minimize */ real func(vector x); real f u n c ( v e c t o r x) { i,j; int real f=0,temp.weightx; ford = 0 ; i < x.rowdim ; { i++) / * s c a l e x t o be between - 1 > MAXX) if(x.array[i] x.array[i] else if = 2.0*PI; (x.array[i] x.array[i] x.arrayCi] < -MAXX) = -2.0*PI; = sin(fmod(x.array[i],2*PI))*(upper.array[i+intercept] - mid.array[i+intercept]) + mid.array[i+intercept]; > weightx = weight(x); xprime(ftx); ford = 0 ; i < D.rowdim ; i++) { f += D . a r r a y [ i ] [ i ] * x . a r r a y [ i ] * x . a r r a y [ i ] ; for f (j - i + 1 ; j < D.rowdim ; j++) += 2 . 0 * D . a r r a y [ i ] [ j ] * x . a r r a y [ i ] * x . a r r a y [ j ] ; } return(-f*weightx); } and 1 * / Appendix A. Listing of Computer Program /* 1 4 5 */ /* DISPERSION.C /* */ */ **********************/ /* program to calculate the dispersion of the point height, cc */ /* cw f o r design D */ real dispersion(real height,real cc.real cw); real dispersion(real height,real cc.real cw) { real v a r i ; cc = cc*cc; cw = cw*cw; vari = D.array[0][0]+height*height*D.array[1][l]+cc*cc*D.array[2][2]; vari += cw*cw*D.array[3][3]+2*(height*D.array[1][0]+cc*D.array[2][0] ); vari += 2*(cw*D.array[3][0] + height*cc*D.array[1][2]); vari += 2*(height*cw*D.array[1][3] + cc*cw*D.array[2][3]); return(vari); } Appendix A. Listing of Computer Program /* /* 146 */ */ INVERT.C /* */ ******^ /* function to invert a square matrix void invert(matrix a.matrix *b); void invert(matrix a,matrix *b) A and p l a c e i n B */ { int i,j,k; printf("made for (k - it to invert\n"); 0 ; k < a.rowdim ; k++) { a . a r r a y [ k ] [k] = for(i < a.rowdim = 0 ; i if(i -1.0/a.array[k][k]; = 0 ; i for(j =0 = -a.array[i] < a.rowdim ; j a . a r r a y [ i ] [j] = 0 ; j if(j ; < a.rowdim if((i-k)*(j-k) for(j i++) != k) a.array[i][k] ford ; != 0) [k]*a.array[k][k]; i++) ; j++) - = a . a r r a y [ i ] [ j ] - a . a r r a y [ i ] [ k ] * a . a r r a y [ k ] [j] ; < a.rowdim ; j++) != k) a.array[k][j]=-a.array[k][j]*a.array[k][k]; } for(j = 0 ; j ford - < a.rowdim 0 ; i a . a r r a y E i ] [j] *b = a; } ; < a.rowdim j++) ; i++) = - a . a r r a y [ i ] [j] ; Appendix A. Listing of Computer Program /* 147 */ /* function xprime() /* /* calculates the effective x vector x' = f(x) */ */ */ void xprime(vector *x); void xprime(vector *x) { vector xtemp; xtemp = *x; > xtemp.array[3] xtemp.array[2] xtemp.array[1] xtemp.array[0] *x = xtemp; = xtemp.array[2]*xtemp.array[2]; = xtemp.array[1]*xtemp.array[l]; = xtemp.array[0]; =1.0; /* square cw */ /* square cc */ /* height */ /* intercept */ Appendix A. Listing of Computer Program /* 148 */ /* function weight() /* /********************************** /* produces the weight = e**(-x'Covinv x) /* used i n funcmin.c */ */ */ */ real weight(vector x); real weight(vector x) { int real real i, j ; prod.prob,temp.tempi; cnst=0.00000000001; prod = 0; f o r ( i = 0 ; i < Covinv.rowdim ; i++) x.array[i] -= xmean.array[i]; f o r d = 0 ; i < Covinv .rowdim ; i++) for(j = 0 ; j < Covinv.rowdim ; j++) prod += x.array[i]*x.array[j] *Covinv.array[i][j]; temp = pow(2.0*PI,Covinv.rowdim); if(prod > 50) prod = 50; /* prevent exponent overflow */ prob = 1./(cnst*sqrt(temp/Covinvdet)*exp(0.5*prod)+200); return(prob); Appendix A. 149 Listing of Computer Program /* */ /* function weightzQ */ /* */ /* produces the weight = e**(-x'Covinv x) /* used in funcmin.c */ */ real weightz(vector x); real weightz(vector x) { int i , j ; real prod,prob,temp; prod = 0; > f o r ( i = 0 ; i < Covinv.rowdim ; i++) x.array[i] = x.array[i]*(upper.array[i+intercept]mid.array[i+intercept]) + mid.array[i+intercept] - xmean.array[i]; f o r d = 0 ; i < Covinv .rowdim ; i++) f o r ( j = 0 ; j < Covinv.rowdim ; j++) prod += x.array[i]*x.array[j] •Covinv.array[i][j] ; temp = pow(2.0*PI,Covinv.rowdim); prob = 1.0/(sqrt(temp/Covinvdet)*exp(0.5*prod)); return(prob); Appendix A. Listing of Computer Program 150 j****************************************** ***************/ /* */ /* Decomp.c */ /* */ /* function to decompose a square matrix x /* used to calculate the matrix determinant */ */ /* function declaration */ void decomp(matrix x,matrix *LU,int *exch,vector *index); void { const real int vector decomp(matrix x,matrix *LU,int *exch,vector *index) real TINY = 1.0e-20; xmax,sum,temp; i,j,k,imax,d; rowmax,ind; rowmax.rowdim = x.rowdim; ind.rowdim = x.rowdim; d = 1; f o r ( i = 0 ; i < x.rowdim ; i++) { xmax = 0; f o r ( j = 0 ; j < x.rowdim ; j++) i f ( f a b s ( x . a r r a y [ i ] [ j ] ) > xmax) xmax = x. array [i] [j] ; if(xmax ==0) printf("ERROR - singular matrix\n"); rowmax.array[i] = 1.0/xmax; > ' printf("made i t past f i r s t loop\n"); f o r ( j = 0 ; j < x.coldim ; j++) { f o r d = 0 ; i <= j-1 ; i++) { sum = x . a r r a y [ i ] [ j ] ; for(k = 0 ; k <= i - 1 ; k++) sum += -x.array[i] [k]*x.array[k] [j] ; x.array[i][j] = sum; } Appendix A. Listing of Computer Program xmax = 0; f o r ( i = j ; i < x.rowdim ; i++) { sum = x.array[i] [j] ; for(k = 0 ; k <= j - 1 ; k++) sum += -x.array[i] [k]*x.array[k] [j] ; x.array[i][j] = sum; temp = rowmax.array[i]*fabs(sum); if(temp >= xmax) { imax = i ; xmax = temp; } } if(j { != imax) for(k = 0 ; k < x.rowdim ; k++) { } temp = x.array[imax][k]; x.array[imax][k] = x.array[j][k]; x. array [j ] [k] = temp; d = -d; rowmax.array [imax] = rowmax. array [j ] ; } ind.array[j] = imax; if(x.array[j] [j] == 0) x.array[j][j] = TINY; i f ( j != x.rowdim-1) { temp = 1 . 0 / x . a r r a y [ j ] [ j ] ; f o r ( i = j+1 ; i. < x.rowdim ; i++) x.array[i] [j] *= temp; } *LU = x; •index = ind; *exch = d; } > 151 Appendix A. Listing of Computer Program /* /* mprintO & mprintfO /* /**************************^ /* function to print a matrix */ x) { int i.,j; for(i = 0 ; i < x.rowdim ; i++) { for(j = 0 ; j < x . c o l d i m ; j++) p r i n t f ( " '/.f " , x . a r r a y [ i ] [ j ] ) ; printf(" W ) ; } printf("\n\n"); > void mprintf(matrix x) { int i,j ; ford = 0 ; i < x . r o w d i m ; i++) { for(j = 0 ; j f p r i n t f (f2, fprintf(f2, } fprintf(f2," < x . c o l d i m ; j++) " °/.e",x.array[i] [j]); "\n"); \n\n"); void mprintv(vector x); v o i d m p r i n t v ( v e c t o r x) { int i; */ */ */ extern FILE *f2; void mprint(matrix 152 Appendix A. Listing of Computer Program f o r ( i = 0 ; i < x.rowdim ; i++) f printf (f 2, " °/.e\n" ,x.array [i] ); f p r i n t f ( f 2 , "\n\n"); } 153 Appendix A. Listing of Computer Program /* /* input data /* . ' 24 27.4 22.6 21.3 22.9 20.7 17.7 24.4 26.5 24.4 21.9 18.3 21.9 21.9 24.4 24.4 27.4 16.8 18.3 17.7 27.4 25.9 22.6 33.5 26.2 84.0 58.0 68.0 88.0 88.0 87.0 48.0 53.0 72.0 96.0 59.0 77.0 62.0 86.0 84.0 78.0 85.0 72.0 92.0 85.0 93.0 94.0 72.0 73.0 7.5 7.7 8.1 7.1 7.3 5.8 6.5 7.9 6.9 7.7 7.1 7.9 7.7 7.1 8.1 9.3 6.5 7.2 6.9 7.7 7.2 8.1 9.3 8.2 154 */ */ */ Appendix A. Listing of Computer Program /* /* /* 155 */ summary output f i l e The determinant of M i s 2.83251 iteration fmin weightx det M alpha cost 0 44.776219 208.250211 14.3143 0.232867 1 35.250677 203.484981 58.0171 0.228103 2 30.942401 200.000146 208.947 0.224952 3 16.418169 200.000055 436.878 0.201356 4 8.139710 204.357308 555.749 0.144954 5 5.702645 202.616128 596.048 0.090515 6 6.033151 200.000146 653.223 0.100988 7 5.671553 200.000055 699.213 0.089454 8 5.035835 203.938586 721.472 0.064165 9 6.721358 204.690646 829.228 0.118912 10 5.024016 200.000528 855.111 0.063619 11 4.946451 200.000007 878.418 0.059956 12 5.143710 204.016071 911.513 0.069003 13 4.923478 200.000055 935.321 0.058843 14 5.077377 207.577253 967.163 0.066058 15 4.207452 200.000175 968.768 0.016170 16 4.992963 200.000146 997.45 0.062170 17 4.569444 200.000007 1008.48 0.039883 18 4.680347 200.000528 1023.85 0.046215 19 4.212123 200.215836 1025.62 0.016510 20 4.663896 205.108999 1040.58 0.045300 21 4.456075 200.000007 1048.24 0.032991 22 4.374703 200.000528 1053.59 0.027758 23 4.130671 200.000055 1054.31 0.010435 24 4.302212 204.293161 1057.9 0.022880 25 4.781354 206.012127 1078.52 0.051658 26 5.083598 204.396750 1115.6 0.066338 27 4.102556 200.038000 1116.07 0.008264 28 4.161839 200.000528 1117.22 0.012796 29 4.322243 200.000055 1121.51 0.024249 30 4.489293 205.065621 1130.91 0.035057 31 4.854972 200.000007 1156.73 0.055446 */ */ Appendix A. Listing of Computer Program 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 4.376778 200.000528 4.383992 200.000146 4.286449 204.894779 4.506626 207.788993 4.127847 200.000528 4.283152 200.000007 4.087589 200.000528 4.158436 200.000055 4.262090 204.730677 4.943194 205.151454 4.341091 200.000528 4.000000 200.000528 4.025834 200.000108 4.459949 204.756142 4.118826 200.000528 4.138323 200.000055 4.000693 200.000528 4.654956 200.000007 4.184546 200.000528 4.138339 205.348867 4.043121 200.000528 4.248202 204.971220 4.377497 203.259375 4.146823 200.000007 4.167280 204.468209 4.157742 200.000055 4.041702 200.000528 4.074709 200.000007 4.024391 200.000528 4.000000 200.000528 4.000000 200.000528 4.000744 200.000054 4.000008 200.000528 4.257013 200.000131 4.008829 200.000528 4.083920 200.000055 4.000184 200.000528 4.013086 200.000007 4.004305 200.000528 4.000887 200.000055 4.000008 200.000528 4.644840 204.777644 1162.7 0.027895 1168.91 0.028368 1172.51 0.021790 1182.98 0.036119 1183.75 0.010218 1187.31 0.021561 1187.68 0.007092 1188.85 0.012541 1191.94 0.020086 1224.23 0.059799 1229.47 0.025522 1229.47 0.000000 1229.5 0.002134 1238.69 0.033234 1239.39 0.009525 1240.33 0.011019 1240.33 0.000058 1257.99 0.044799 1259.65 0.014488 1260.6 0.011020 1260.7 0.003543 1263.65 0.019103 1270.19 0.027942 1271.27 0.011664 1272.66 0.013204 1273.91 0.012489 1274 0.003428 1274.29 0.006074 1274.32 0.002016 1274.32 0.000000 1274.32 0.000000 1274.32 0.000062 1274.32 0.000001 1277.5 0.019728 1277.51 0^000734 1277.87 0.006803 1277.87 0.000015 1277.88 0.001086 1277.88 0.000358 1277.88 0.000074 1277.88 0.000001 1295.57 0.044230 156 Appendix A. Listing of Computer Program 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 4. 184407 200 .000528 4.,026261 200 .000055 4. 000375 200 .000528 4. 066599 200 .000004 4. 021459 200 .000528 4. 301028 202 .339196 4. 089407 200 .000528 4. 305835 200 .000007 4. 093813 200 .000528 4.,028935 200 .000055 4.,000253 200 .000528 4., 174793 204 .817531 4.,055824 200 .000528 4.,365287 205 .379954 4.,004504 200 .000528 4.,141607 200 .000007 4.,045470 200 .000528 4.,014949 200 .000007 4.,004952 200 .000528 4.,001645 200 .000007 4.,000547 200 .000528 4.,106836 200 .000055 4..350408 205 .371338 4.,091230 200 .000055 4..316636 200 .000146 4..096539 200 .000055 1297 .29 0.014477 1297 .32 0.002169 1297 .32 0.000031 1297 .56 0.005429 1297 .58 0.001776 1301 .97 0.022798 1302 .39 0.007235 1306 .92 0.023128 1307 .38 0.007581 1307 .43 0.002388 1307 .43 0.000021 1308 .99 0.013764 1309 .15 0.004567 1315 .53 0.027136 1315 .53 0.000375 1316 .57 0.011269 1316 .68 0.003733 1316 .69 0.001240 1316 .69 0.000412 1316 .69 0.000137 1316 .69 0.000046 1317 .3 OJ308597 1323 .23 -0.026147 1323 .67 0.007378 1328 .59 0.023867 1329 .09 0.007794 157
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Optimal design for linear regression with variable...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Optimal design for linear regression with variable costs and precision requirements and its applications… Penner, Margaret 1988
pdf
Page Metadata
Item Metadata
Title | Optimal design for linear regression with variable costs and precision requirements and its applications to forestry |
Creator |
Penner, Margaret |
Publisher | University of British Columbia |
Date Issued | 1988 |
Description | Various criteria are discussed and as algorithms for obtaining optimal designs for use in linear regression. Linear regression is used widely and effectively in forestry but the method of quantifying the linear relationship in terms of selecting observations or designing an experiment to obtain observations is often inefficient. The experiment or study objectives must be identified to develop a criterion for comparing designs. Then a method of obtaining observations can be found which performs well under this criterion. Biometricians in forestry have been slow to take advantage of one of the assumptions of linear regression, namely that the independent variables are fixed. In part this has been due to limitations in the theory. Two important assumptions in most optimal design work, namely that precision requirements and costs are constant for all observations, are not valid for most forestry applications. Ignoring nonconstant costs can lead to designs less efficient than ones where each combination of independent variables is selected with the same frequency. The objective of this study was to develop a method of optimal sample selection that allowed for costs and precision requirements that vary from observation to observation. The resulting practical experimental layouts are more efficient for attaining the experimenter's objectives than randomly selected observations or designs constructed using the currently available design theory. Additional features of designs that consider differing costs and precision requirements are their larger sample size and their robustness to misspecification of the sample space. Traditional optimal designs concentrated observations on the boundaries of the sample space. By recognizing that these observations may be more costly and may not be of primary interest to the experimenter, more efficient designs can be constructed from less extreme observations. A computer program for obtaining optimal designs is also included. |
Subject |
Structural optimization Regression analysis Forests and forestry -- Data processing |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-10-08 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0075387 |
URI | http://hdl.handle.net/2429/29057 |
Degree |
Doctor of Philosophy - PhD |
Program |
Forestry |
Affiliation |
Forestry, Faculty of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-UBC_1988_A1 P46.pdf [ 7.39MB ]
- Metadata
- JSON: 831-1.0075387.json
- JSON-LD: 831-1.0075387-ld.json
- RDF/XML (Pretty): 831-1.0075387-rdf.xml
- RDF/JSON: 831-1.0075387-rdf.json
- Turtle: 831-1.0075387-turtle.txt
- N-Triples: 831-1.0075387-rdf-ntriples.txt
- Original Record: 831-1.0075387-source.json
- Full Text
- 831-1.0075387-fulltext.txt
- Citation
- 831-1.0075387.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0075387/manifest