{"Affiliation":[{"label":"Affiliation","value":"Forestry, Faculty of","attrs":{"lang":"en","ns":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","classmap":"vivo:EducationalProcess","property":"vivo:departmentOrSchool"},"iri":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","explain":"VIVO-ISF Ontology V1.6 Property; The department or school name within institution; Not intended to be an institution name."}],"AggregatedSourceRepository":[{"label":"AggregatedSourceRepository","value":"DSpace","attrs":{"lang":"en","ns":"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider","classmap":"ore:Aggregation","property":"edm:dataProvider"},"iri":"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider","explain":"A Europeana Data Model Property; The name or identifier of the organization who contributes data indirectly to an aggregation service (e.g. Europeana)"}],"Campus":[{"label":"Campus","value":"UBCV","attrs":{"lang":"en","ns":"https:\/\/open.library.ubc.ca\/terms#degreeCampus","classmap":"oc:ThesisDescription","property":"oc:degreeCampus"},"iri":"https:\/\/open.library.ubc.ca\/terms#degreeCampus","explain":"UBC Open Collections Metadata Components; Local Field; Identifies the name of the campus from which the graduate completed their degree."}],"Creator":[{"label":"Creator","value":"Penner, Margaret","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/creator","classmap":"dpla:SourceResource","property":"dcterms:creator"},"iri":"http:\/\/purl.org\/dc\/terms\/creator","explain":"A Dublin Core Terms Property; An entity primarily responsible for making the resource.; Examples of a Contributor include a person, an organization, or a service."}],"DateAvailable":[{"label":"DateAvailable","value":"2010-10-08T22:11:58Z","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/issued","classmap":"edm:WebResource","property":"dcterms:issued"},"iri":"http:\/\/purl.org\/dc\/terms\/issued","explain":"A Dublin Core Terms Property; Date of formal issuance (e.g., publication) of the resource."}],"DateIssued":[{"label":"DateIssued","value":"1988","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/issued","classmap":"oc:SourceResource","property":"dcterms:issued"},"iri":"http:\/\/purl.org\/dc\/terms\/issued","explain":"A Dublin Core Terms Property; Date of formal issuance (e.g., publication) of the resource."}],"Degree":[{"label":"Degree","value":"Doctor of Philosophy - PhD","attrs":{"lang":"en","ns":"http:\/\/vivoweb.org\/ontology\/core#relatedDegree","classmap":"vivo:ThesisDegree","property":"vivo:relatedDegree"},"iri":"http:\/\/vivoweb.org\/ontology\/core#relatedDegree","explain":"VIVO-ISF Ontology V1.6 Property; The thesis degree; Extended Property specified by UBC, as per https:\/\/wiki.duraspace.org\/display\/VIVO\/Ontology+Editor%27s+Guide"}],"DegreeGrantor":[{"label":"DegreeGrantor","value":"University of British Columbia","attrs":{"lang":"en","ns":"https:\/\/open.library.ubc.ca\/terms#degreeGrantor","classmap":"oc:ThesisDescription","property":"oc:degreeGrantor"},"iri":"https:\/\/open.library.ubc.ca\/terms#degreeGrantor","explain":"UBC Open Collections Metadata Components; Local Field; Indicates the institution where thesis was granted."}],"Description":[{"label":"Description","value":"Various criteria are discussed and as algorithms for obtaining optimal designs for use in linear regression. 
Linear regression is used widely and effectively in forestry but the method of quantifying the linear relationship in terms of selecting observations or designing an experiment to obtain observations is often inefficient. The experiment or study objectives must be identified to develop a criterion for comparing designs. Then a method of obtaining observations can be found which performs well under this criterion.\r\nBiometricians in forestry have been slow to take advantage of one of the assumptions of linear regression, namely that the independent variables are fixed. In part this has been due to limitations in the theory. Two important assumptions in most optimal design work, namely that precision requirements and costs are constant for all observations, are not valid for most forestry applications. Ignoring nonconstant costs can lead to designs less efficient than ones where each combination of independent variables is selected with the same frequency. The objective of this study was to develop a method of optimal sample selection that allowed for costs and precision requirements that vary from observation to observation.\r\nThe resulting practical experimental layouts are more efficient for attaining the experimenter's objectives than randomly selected observations or designs constructed using the currently available design theory. Additional features of designs that consider differing costs and precision requirements are their larger sample size and their robustness to misspecification of the sample space. Traditional optimal designs concentrated observations on the boundaries of the sample space. By recognizing that these observations may be more costly and may not be of primary interest to the experimenter, more efficient designs can be constructed from less extreme observations. A computer program for obtaining optimal designs is also included.","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/description","classmap":"dpla:SourceResource","property":"dcterms:description"},"iri":"http:\/\/purl.org\/dc\/terms\/description","explain":"A Dublin Core Terms Property; An account of the resource.; Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource."}],"DigitalResourceOriginalRecord":[{"label":"DigitalResourceOriginalRecord","value":"https:\/\/circle.library.ubc.ca\/rest\/handle\/2429\/29057?expand=metadata","attrs":{"lang":"en","ns":"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO","classmap":"ore:Aggregation","property":"edm:aggregatedCHO"},"iri":"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO","explain":"A Europeana Data Model Property; The identifier of the source object, e.g. the Mona Lisa itself. This could be a full linked open data URI or an internal identifier"}],"FullText":[{"label":"FullText","value":"OPTIMAL DESIGN FOR LINEAR REGRESSION WITH VARIABLE COSTS AND PRECISION REQUIREMENTS AND ITS APPLICATIONS TO FORESTRY

By MARGARET PENNER
B.Sc. (Hons.) Lakehead University, 1984

A THESIS SUBMITTED IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF FORESTRY

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1988
\u00a9 MARGARET PENNER, 1988

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of Forestry, The University of British Columbia, Vancouver, Canada

Abstract

Various criteria are discussed, as are algorithms for obtaining optimal designs for use in linear regression. Linear regression is used widely and effectively in forestry but the method of quantifying the linear relationship in terms of selecting observations or designing an experiment to obtain observations is often inefficient. The experiment or study objectives must be identified to develop a criterion for comparing designs. Then a method of obtaining observations can be found which performs well under this criterion. Biometricians in forestry have been slow to take advantage of one of the assumptions of linear regression, namely that the independent variables are fixed. In part this has been due to limitations in the theory. Two important assumptions in most optimal design work, namely that precision requirements and costs are constant for all observations, are not valid for most forestry applications. Ignoring nonconstant costs can lead to designs less efficient than ones where each combination of independent variables is selected with the same frequency. The objective of this study was to develop a method of optimal sample selection that allowed for costs and precision requirements that vary from observation to observation. The resulting practical experimental layouts are more efficient for attaining the experimenter's objectives than randomly selected observations or designs constructed using the currently available design theory. Additional features of designs that consider differing costs and precision requirements are their larger sample size and their robustness to misspecification of the sample space. Traditional optimal designs concentrated observations on the boundaries of the sample space. By recognizing that these observations may be more costly and may not be of primary interest to the experimenter, more efficient designs can be constructed from less extreme observations. A computer program for obtaining optimal designs is also included.

Table of Contents
Abstract
List of Tables
List of Figures
Acknowledgement
1 INTRODUCTION
2 LITERATURE REVIEW
2.1 THEORY
2.1.1 THE PROBLEM
2.1.2 OPTIMAL DESIGN FOR PARAMETER ESTIMATION
2.1.3 MODEL UNCERTAINTY
2.1.4 SEQUENTIAL DESIGN
2.1.5 BAYESIAN OPTIMAL DESIGN
2.1.6 NONLINEAR REGRESSION
2.1.7 DECISION THEORY
2.1.8 ROBUST DESIGNS
2.2 REVIEW OF FORESTRY LITERATURE
2.2.1 MODEL-BASED SAMPLING
2.2.2 SAMPLING FOR BIOMASS
2.3 ALGORITHMS
2.3.1 INFORMATION MATRIX
2.3.2 APPROXIMATE DESIGNS
2.3.3 EXACT DESIGNS
2.3.4 SEQUENTIAL DESIGN
3 COMPLICATIONS
3.1 AVAILABILITY
3.2 PRECISION
3.3 COSTS
4 PROBLEM SOLUTION
4.1 AVAILABILITY
4.1.1 OBTAINING THE WEIGHTS
4.1.2 INCORPORATING THE WEIGHTS
4.1.3 IMPLICATIONS OF MISSPECIFYING AVAILABILITY
4.1.4 DISCUSSION
4.2 GENERALIZED LOSS
4.3 INCORPORATING COST
4.4 DISCUSSION
5 EXAMPLE
5.1 BACKGROUND
5.2 PREPARATION
5.2.1 OBJECTIVES
5.2.2 INDEPENDENT VARIABLES
5.2.3 MODEL
5.2.4 RANGES
5.2.5 WEIGHTS
5.2.6 VARIANCE OF THE DEPENDENT VARIABLE
5.3 PILOT STUDY
5.4 ANALYSIS OF PILOT STUDY
5.4.1 OBTAINING A MODEL
5.4.2 OBTAINING THE WEIGHTS
5.4.3 ESTIMATING THE VARIANCE OF VOLUME
5.5 IDENTIFYING THE OPTIMUM DESIGN
5.6 COMPARISONS WITH OTHER DESIGNS
5.6.1 RANDOM SAMPLING
5.6.2 UNIFORM SAMPLING
5.6.3 PURE OPTIMUM DESIGN
5.6.4 DISCUSSION
5.7 MODIFICATIONS
5.7.1 NONLINEAR MODELS
5.7.2 POLYNOMIAL MODELS
6 ADDITIONAL APPLICATIONS AND VARIANTS
6.1 UNIQUENESS
6.2 STRATIFICATION AND QUALITATIVE VARIABLES
6.3 DISCRETE QUANTITATIVE VARIABLES
7 CONCLUSION
REFERENCES CITED

List of Tables
2.1 Some examples of $\Phi$ and …
$$J(\mu;\theta,\tau)=\begin{pmatrix} J_{\theta\theta}(\mu) & J_{\theta\tau}(\mu)\\ J_{\tau\theta}(\mu) & J_{\tau\tau}(\mu)\end{pmatrix} \qquad (2.1)$$

where $J_{\theta\theta}(\mu)$ is the $k \times k$ matrix whose $(i,j)$th element is

$$E\Big(-\frac{\partial^2 \ln p(y;\mu,\theta,\tau)}{\partial\theta_i\,\partial\theta_j}\Big) \qquad (2.2)$$

and other entries are similarly obtained. For the present, the observations $y$ are assumed to be independent of each other. The information for an $N$-observation design is the sum of the information matrices of the individual observations and will be denoted as $K(\mu;\theta,\tau)$ and partitioned in the same manner as $J(\mu;\theta,\tau)$. For the present it will be assumed that $K(\mu;\theta,\tau)$ is non-singular. Then $K^{-1}(\mu;\theta,\tau)$ provides a generalized Cramér-Rao lower bound for the variance-covariance matrix of any unbiased estimator of $(\theta,\tau)$ (Silvey, 1975). Also, its leading $k \times k$ submatrix given by

$$V_\theta(\mu;\theta,\tau)=\{K_{\theta\theta}(\mu;\theta,\tau)-K_{\theta\tau}(\mu;\theta,\tau)[K_{\tau\tau}(\mu;\theta,\tau)]^{-1}K_{\tau\theta}(\mu;\theta,\tau)\}^{-1} \qquad (2.3)$$

is a lower bound for the variance-covariance matrix of any linear unbiased estimator of $\theta$. As the sample size increases, the variance-covariance matrix of the maximum likelihood estimates (MLEs) converges to this lower bound (Silvey, 1975). With this asymptotic property in mind, this matrix will also be referred to as the variance-covariance matrix for the estimate of $\theta$.

If one is interested in real-valued functions of $\theta$, say $g_1(\theta), g_2(\theta), \ldots, g_s(\theta)$, let $g^T(\theta)=(g_1(\theta),g_2(\theta),\ldots,g_s(\theta))$ and $D_g(\theta)$ be the $k \times s$ matrix with $(i,j)$th component $\partial g_j(\theta)\/\partial\theta_i$. Then, assuming sufficient regularity, given an $N$-observation design, the lower bound for the variance-covariance matrix of an unbiased estimator of $g(\theta)$ is

$$V_g(\mu;\theta,\tau)=\{D_g(\theta)\}^T V_\theta(\mu;\theta,\tau)\{D_g(\theta)\} \qquad (2.4)$$

and for large $N$, the MLE of $g(\theta)$ converges to this lower bound.

PROBLEM STATEMENT

The majority of approaches to optimal design are concerned with selecting $\mu$ to make the lower bound of the variance-covariance matrix of the parameters of interest ($\theta$) as small as possible or to make its inverse, the Fisher information matrix, as large as possible. Two complications occur.

1. Only in special cases is it possible to find the \"best\" $\mu = \mu^*$ in the sense that $V_g(\mu;\theta,\tau) - V_g(\mu^*;\theta,\tau)$ is non-negative definite for all $\mu$. Thus one must be satisfied with minimizing some function of $V_g(\mu;\theta,\tau)$.

2. In the case of nonlinear regression, $V_g(\mu;\theta,\tau)$ is dependent on the unknown parameters. Therefore, prior knowledge about $\theta$ and $\tau$ is required. In linear regression, this dependence on the parameters does not occur and therefore prior knowledge of their values is not required.

For now, attention will be restricted to linear models (models linear in the parameters). The main reasons for this are that the theory is better developed than for nonlinear models, and many nonlinear problems can be solved by transforming the nonlinear model to a model linear in the parameters. As well, when one is looking for an empirical model, it is usually preferable to keep things as simple as possible. Many nonlinear relationships can be approximated by a linear model over portions of their range. The linear relationship can be stated in the form

$$E(y;\mu,\theta,\tau)=\eta(\mu,\theta) \qquad (2.5)$$

or

$$y=\eta(\mu,\theta)+\varepsilon(\mu,\tau) \qquad (2.6)$$

where $E[\varepsilon(\mu,\tau)]=0$. Specifying a model involves stating the functional form of $\eta(\mu,\theta)$ and $\varepsilon(\mu,\tau)$. A linear model is one in which $\eta(\mu,\theta)$ is linear in the parameter $\theta$. That is,

$$\eta(\mu,\theta)=f(\mu)^T\theta \qquad (2.7)$$

and

$$\mathrm{Var}(y;\mu,\theta,\tau)=\tau v(\mu) \qquad (2.8)$$

where $v(\mu)$ is a known function of $\mu$. The standard linear model has the additional simplification

$$\mathrm{Var}(y;\mu,\theta,\tau)=1. \qquad (2.9)$$

This may be obtained from equation 2.8 by the transformation $y'=y\{\tau v(\mu)\}^{-1\/2}$ and a corresponding transformation to $f(\mu)$, namely $f(\mu)'=f(\mu)\{\tau v(\mu)\}^{-1\/2}$. Therefore, attention will be restricted to the standard linear model with the realization that it is not as restrictive as it may appear.
For linear relationships, it is customary to estimate $\theta$ by the method of least squares (LS). When $y$ is assumed to be normal, the LS estimators are also the MLEs. Selecting a vector $\mu$ from the set $U$ is equivalent to choosing a $k$-vector $x^T=(x_1,x_2,\ldots,x_k)$ from the set $X=F(U)$ where $F$ is the vector-valued function $(f_1(\mu),f_2(\mu),\ldots,f_k(\mu))$. Therefore, equation 2.5 may be replaced by

$$E(y;\mu,\theta,\tau)=x^T\theta. \qquad (2.10)$$

The problem becomes one of selecting $N$ vectors $x_{(1)},x_{(2)},\ldots,x_{(N)}$ from the set $X$, a subset of $R^k$. Under the assumption that the observations on $y$ are independent, the variance-covariance matrix of the least squares estimator $\hat\theta$ of $\theta$ from an $N$-observation design is

$$V(\hat\theta;x,\tau)=\tau\Big(\sum_{i=1}^{N}x_{(i)}x_{(i)}^T\Big)^{-1} \qquad (2.11)$$

when $\sum_{i=1}^{N}x_{(i)}x_{(i)}^T$ is non-singular. Note that this quantity is independent of $\theta$.

APPROXIMATE THEORY

For an $N$-observation design, let $p_i$ be the proportion of total observations taken at $x_{(i)}$, $i=1,\ldots,n$, where $n \le N$ with equality if, and only if, all $x_{(i)}$ are distinct. These proportions comprise a discrete probability distribution or design measure $\xi_N$ on $X$. Equation 2.11 may now be written as

$$V(\hat\theta;x,\tau)=\tau\Big\{N\sum_{i=1}^{n}p_i x_{(i)}x_{(i)}^T\Big\}^{-1}. \qquad (2.12)$$

Note that minimization is subject to the constraints that $Np_i$ be a non-negative integer and represent the number of replicate observations of $x_{(i)}$. If the requirement that $Np_i$ be an integer is relaxed to include all non-negative real numbers, calculus techniques may be used to facilitate finding an optimal solution. Thus, the set of $\xi_N$ is expanded to the set of all design measures $\xi \in \Xi$, both discrete and continuous. It is then hoped it will be possible to find a $\xi_N$ close to $\xi$ that will be close to optimal. The design associated with $\xi$ is called an approximate design while one associated with $\xi_N$ is called an exact design. Note that $\xi$ is unique only up to multiplication by a constant, that is, the $p_i$ are independent of $N$. Thus, for most of the optimality criteria, the $p_i$ can be determined first and then $N$ calculated (Fedorov, 1972 p.63).

2.1.2 OPTIMAL DESIGN FOR PARAMETER ESTIMATION

Let $M(\xi)=N^{-1}\sum_i x_{(i)}x_{(i)}^T=\int_X xx^T\,\xi(dx)$, which is the standardized information matrix $\tau N^{-1}K(\mu;\theta,\tau)$. A good design is one which makes $M(\xi)$ large in some sense or its inverse small. In general, $M(\xi) \in \mathcal{M}$ where $\mathcal{M}=\{M(\xi):\xi\in\Xi\}$. Let $\mathcal{M}^+=\{M\in\mathcal{M}:\det M \ne 0\}$. For now restrict $M$ to be in $\mathcal{M}^+$. Parameter estimation criteria essentially are concerned only with reducing the random error component of a model, ignoring model bias and often leaving no means for evaluating model bias. Several criteria for prediction and parameter estimation have been studied.

DESIGN CRITERIA

Once an experimenter defines his objectives and constraints, he must also find a criterion to measure to what extent his objectives have been met. As Kiefer (1975) observed, when statisticians use an optimality criterion in selecting a procedure, the precisely expressed criterion is usually only an approximation to some vague notion of \"goodness\".
Sometimes it will be useful to distinguish between the region of interest $RI(x)$ and the region of operability $RO(x)$. The region of interest is that subset of $R^k$ for which the experimenter is interested in using the model. The region of operability is the subset of $R^k$ from which $x$ may be selected. The region of interest will be assumed to be contained within the region of operability, i.e., one is not interested in extrapolation (see figure 2.2). In addition, the experimenter may not be equally interested in all points in $RI(x)$. He may, for example, be more interested in \"average\" observations. This can be accounted for by introducing a weighting function over $RI(x)$ denoted by $w(x)$, suitably scaled to integrate to one over $RI(x)$ (see figure 2.3). For the present, it will be assumed that $RI(x)$ and $RO(x)$ coincide and that $w(x)$ is uniform over both.

[Figure 2.2: The region of operability (RO(x)) and the region of interest (RI(x)) in two dimensions.]

SPECIFIC CRITERIA

The following criteria are essentially different ways of reducing $M(\xi)$ to a meaningful scalar quantity suitable for comparing two or more designs.

1. G-optimality. The first formal criterion was due to Smith (1918). Let $x^T\hat\theta$ be the estimate for $x^T\theta$. She proposed minimizing the maximum variance of this estimate for all $x \in X$, that is, finding $\xi \in \Xi$ which minimizes $\max_{x\in X} x^T[M(\xi)]^{-1}x$. This criterion has been referred to as the minimax criterion or G-optimality where the G- stands for generalized variance. This is a prediction criterion and has been criticized for being too conservative in that it protects against the worst possible situation. As will be seen, this criterion is closely related to parameter estimation criteria. It is also closely related to robust designs which protect against \"wild\" observations (Box and Draper, 1975).

2. D-optimality. Under the assumption of normality of errors, a confidence ellipsoid for $\theta$ is of the form $\{\theta : (\theta-\hat\theta)^T M(\xi)(\theta-\hat\theta) \le \text{constant}\}$ where $\hat\theta$ is the least squares estimate for $\theta$. The content of this ellipsoid is proportional to $\{\det M(\xi)\}^{-1\/2}$. The design which makes the content of this ellipsoid small by maximizing $\det M(\xi)$, or minimizing $\det M(\xi)^{-1}$, is called D-optimal. This is essentially a parameter estimation criterion and is one of the most widely studied (e.g., Wynn, 1970, 1972; Mitchell, 1974a; Hill, 1980; Johnson and Nachtsheim, 1983). Without the assumption of normality, D-optimality also minimizes the variance of the best linear unbiased estimates (BLUEs) of the parameters (Pazman, 1986 p.79). One disadvantage of D-optimality is that the ellipsoid may be narrow but long, indicating that an element of $\theta$ has a relatively large variance.

3. E-optimality. Let $C=\{c \in R^k : c^Tc=1\}$. Choosing $\xi \in \Xi$ to minimize $\max_{c\in C} c^T M(\xi)^{-1}c$ is equivalent to choosing $\xi$ to maximize the minimum eigenvalue of $M(\xi)$. This is called E-optimality and minimizes the variance of the least well estimated parameter. Under the assumption of normality, an E-optimal design maximizes the power of the F-test of size $\alpha$ for this least well estimated parameter.

4. A-optimality.
Another measure of the \"size\" of a matrix is its trace, and a design which minimizes the trace of $M(\xi)^{-1}$ is called A-optimal, where A- stands for average variance since $\frac{1}{k}\mathrm{tr}[M(\xi)^{-1}]$ is the average variance of the $\hat\theta_i$'s. This criterion does not account for the correlations between the $\hat\theta_i$'s.

5. L-optimality. When one is interested in predicting $E(y)$ over a region $C$ of $R^k$, one may wish to minimize some average of $c^T\{M(\xi)^{-1}\}c$ over $C$. If the average is with respect to some probability distribution $\mu$ on $C$, the criterion is to minimize $\int c^T M(\xi)^{-1}c\,\mu(dc)$ and is called L-optimality, where L- refers to linear criterion. This integral can be expressed in the form $\mathrm{tr}[\{M(\xi)\}^{-1}B]$ where $B=\int cc^T\,\mu(dc)$ is positive semidefinite. Again, this is a prediction criterion averaged over the regions of interest. When $B=I$, this reduces to A-optimality. If $B$ has rank $s$, it can be written $B=AA^T$ where $\mathrm{rank}(A)=s$.

These are the main criteria found in the literature for cases when the form of the relationship $E(y;\mu,\theta,\tau)=\eta(\mu,\theta)$ is assumed known. They are all based on linear functions of $\theta$ (Silvey, 1980 p.13). Criterion functions which are based on nonlinear estimates of $\theta$ have the additional complication of being dependent on the value of $\theta$. No nonlinear optimality criteria for linear estimates have achieved prominence.

MINIMAL PROPERTIES AND UNIVERSAL OPTIMALITY

One of the most common dissatisfactions with the previously described optimality criteria is that they are too precise and restrictive and may only approximate the experimenter's objectives. Typically an experimenter will have several objectives or a single vaguely defined objective. He may not wish to restrict himself to a single criterion which may lead to a design which is poor in terms of another criterion. Kiefer (1975b) proposed a set of properties which any optimality function should possess. An optimality function $\Phi$ should satisfy the following properties:

1. $\Phi$ is convex,
2. $\Phi[bM(\xi)]$ is nonincreasing as the scalar $b>0$ increases, and
3. $\Phi$ is invariant under each permutation of the rows and corresponding columns of $M(\xi)$.

Kiefer called a design $\xi^*$ universally optimal in the set $\Xi$ of designs under consideration if $\xi^*$ optimizes $\Phi[M(\xi)]$ for every $\Phi$ satisfying the above conditions. All the previously mentioned criteria satisfy these minimal properties. Kiefer also gave some conditions for proving universal optimality which were further developed by Cheng (1987). If a design with all eigenvalues equal maximizes $\mathrm{tr}[M(\xi)]$, then it minimizes $\Phi[M(\xi)]$ for all nonincreasing, convex, and orthogonally invariant $\Phi$ (Cheng, 1978).

A universally optimal design greatly decreases the pressure on the experimenter to precisely define vague objectives or to choose between several optimality criteria. A universally optimal design is robust to changes in criteria. As a consequence, these designs are also robust to changes in objectives. Bondar (1983) provided alternative properties and definitions for universal optimality. Unfortunately, universally optimal designs do not always exist (Hedayat, 1981). Another desirable property of an optimality function is that it be insensitive to changes in the scale of the $x$. However, this is rare. D-optimality is an exception in that the D-optimal design remains D-optimal under nonsingular linear transformations of the independent variables, although the value of the criterion will change (Pesochinsky, 1978). More generally, D-optimal designs remain D-optimal under transformations where the determinant of the Jacobian of the transformation is nonzero (Mehra, 1973). This sensitivity of criteria to scale can be ameliorated by standardizing the $x$.
EQUIVALENCE THEOREM

In 1960, Kiefer and Wolfowitz demonstrated the equivalence of the following conditions for an approximate design measure $\xi^*$:

1. $M(\xi^*)$ is D-optimal,
2. $M(\xi^*)$ is G-optimal, and
3. $\max_{x\in X} x^T[M(\xi^*)]^{-1}x = k$.

In addition, $x^T[M(\xi^*)]^{-1}x = k$ is true for all support points $x_{(i)}$ $(i=1,\ldots,n)$. Any linear combination of designs satisfying these criteria also satisfies them. This theorem gives a necessary but not sufficient (Fedorov, 1972) condition for proving both D- and G-optimality. It also brings together the objectives of prediction and parameter estimation. However, it should be emphasized that the theorem is only valid for approximate designs where $\mathrm{Var}(y;\mu,\theta,\tau)$ is constant.

Fedorov and Malyutov (1972) gave a more general theorem. Let $\Phi$ be a function defined on $\{M(\xi)\}^{-1}$ such that the following conditions are satisfied:

1. $\Phi[N^{-1}\{M(\xi)\}^{-1}] = \gamma(N)\,\Phi\{M(\xi)\}^{-1}$ where $\gamma(N)$ is a decreasing function of $N$,
2. $\Phi(A) \le \Phi(B)$ if $B-A$ is positive semidefinite, and
3. $\Phi\{M(\xi)\}^{-1}$ is a convex function of $M(\xi)$.

Then:

1. A necessary and sufficient condition for optimality of the approximate design $\xi^*$ is the fulfillment of the equation
$$\sup_{x\in X}\lambda(x)\,\varphi(x,\xi^*)=\mathrm{tr}\Big[\{M(\xi^*)\}^{-1}\frac{\partial\Phi}{\partial\{M(\xi^*)\}^{-1}}\Big] \qquad (2.13)$$
where
$$\varphi(x,\xi)=x^T\{M(\xi)\}^{-1}\frac{\partial\Phi}{\partial\{M(\xi)\}^{-1}}\{M(\xi)\}^{-1}x \qquad (2.14)$$
and $\lambda(x)=\{\mathrm{Var}[y(x)]\}^{-1}$ is continuous on $X$.
2. The function $\lambda(x)\varphi(x,\xi^*)$ reaches its upper bound on the support of the optimal design $\xi^*$.
3. The set of optimal designs is convex. If $\Phi\{M(\xi)\}^{-1}$ is a strictly convex function of $M(\xi)$ then all the optimum designs have the same information matrix.

Some examples of $\Phi$ are given in Table 2.1.
Let $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_k$ be the eigenvalues of the $M(\xi)$ matrix. A useful family of criteria is

$$\Phi_p(\lambda)=\Big[\frac{1}{k}\sum_{i=1}^{k}\lambda_i^{-p}\Big]^{1\/p} \qquad (2.15)$$

with the limiting values $\Phi_0(\lambda)=\big[\prod_{i=1}^{k}\lambda_i^{-1}\big]^{1\/k}$ and $\Phi_\infty(\lambda)=\max_i \lambda_i^{-1}$. Here $p<q$ implies $\Phi_p(\lambda) \le \Phi_q(\lambda)$ with equality if and only if the $\lambda_i$ are all equal. Note:

1. $\Phi_0(\lambda)$ corresponds to D-optimality,
2. $\Phi_\infty(\lambda)$ to E-optimality,
3. $\Phi_1(\lambda)$ to A-optimality, and
4. $\Phi_{s_1}(\lambda) \ge \Phi_{s_2}(\lambda)$ if $s_1 > s_2$ (Zarrop, 1979).

Thus, for a given design, $\max_i \lambda_i^{-1}$ is an upper bound for all the other criteria in this family. Kiefer (1975a) found that although the optimal design $\xi^*$ changed appreciably with changes in $p$, the efficiency of $\xi^*$ in terms of the other criteria was surprisingly good.

DISCUSSION

The emphasis has been on designing experiments for parameter estimation or prediction. Optimal design has also been applied to other problems, notably response surface design (Box and Draper, 1959, 1987). As well, optimal design has been used to obtain restricted randomization experimental designs (White and Welch, 1981) and to obtain optimal designs for estimating the independent variables for a given value of the dependent variable (Ott and Myers, 1968).

As pointed out by Bandemer (1980), the approach to optimal design based on the information matrix $M$ has several potential shortcomings:

1. it is based on the assumption of a correct model and the criteria lose their meaning if this is not valid;
2. it does not make explicit use of prior knowledge of the relationship;
3. it is not always clear which criteria are appropriate under which circumstances; and
4. it is not always possible to find an exact design which is close to the optimal approximate design, therefore the transition from an approximate to an exact design may not be clear.

These shortcomings may or may not be serious enough in a given instance to make one abandon this approach. There are several other approaches to optimal design which specifically address these shortcomings.

2.1.3 MODEL UNCERTAINTY

The previous theory was concerned with obtaining designs for specified models. However, often the experimenter may not know the model with certainty. He may have several potential models in mind or may have a single model which he may wish to test for adequacy. The worst possible situation is when the experimenter is unable to state any information about possible models. Several approaches to solving these difficulties have been proposed. Hill (1978) reviewed design procedures for regression model discrimination.

One measure of the adequacy of a model is the integrated mean squared error

$$J=\frac{N}{\sigma^2}\int w(x)\,E\{\hat y(x)-\eta(x)\}^2\,dx \qquad (2.16)$$

where $\eta(x)$ is the true model. The mean squared error can be broken down into two components, $V$ due to model variance and $B$ due to model bias, given by

$$V=\frac{N}{\sigma^2}\int w(x)\,E\{\hat y(x)-E\hat y(x)\}^2\,dx \qquad (2.17)$$

$$B=\frac{N}{\sigma^2}\int w(x)\,\{E\hat y(x)-\eta(x)\}^2\,dx \qquad (2.18)$$

$$J=V+B. \qquad (2.19)$$

The previous parameter estimation theory dealt exclusively with minimizing $V$. Designs for dealing with model uncertainty are useful when model bias cannot be ignored. A numerical illustration of this decomposition is given in the sketch below.

NO PRIOR INFORMATION

The case when the experimenter has no prior information about the relationship is extremely rare in practice and, when it occurs, it may indicate a lack of preliminary research into available information. However, this situation is of interest from a theoretical viewpoint since it represents the opposite extreme from complete model certainty, and has an even simpler solution.
It may also happen that when an experimenter is planning a pilot study, he may wish to minimize possible introduction of bias by deliberately ignoring previous results. Once the pilot study information has been collected, one or more possible models may be proposed and a sampling design suggested.

The most informative design in the absence of any information is a uniform design, i.e., one where every $x_{(i)}$ is sampled as equally as possible. This has been called the all-bias design since it minimizes potential model bias. Scheffe (1963) recommended such a uniform design in the absence of prior model information, but recommended a few replicate observations also be taken.

MINIMIZING SQUARED BIAS

Consider fitting

$$\hat y(x)=x_1^T b_1 \qquad (2.20)$$

when the true model is

$$\eta(x)=x_1^T\beta_1+x_2^T\beta_2. \qquad (2.21)$$

The bias for this model is

$$B=\frac{N}{\sigma^2}\int w(x)\{E(\hat y(x))-\eta(x)\}^2\,dx. \qquad (2.22)$$

Let

$$M_{11}=N^{-1}X_1^TX_1,\quad M_{12}=N^{-1}X_1^TX_2,\quad \mu_{11}=\int w(x)\,x_1x_1^T\,dx,\quad \mu_{12}=\int w(x)\,x_1x_2^T\,dx. \qquad (2.23)$$

A necessary and sufficient condition for the squared bias $B$ to be minimized is that

$$M_{11}^{-1}M_{12}=\mu_{11}^{-1}\mu_{12}. \qquad (2.24)$$

A sufficient but not necessary condition for $B$ to be minimized is that

$$M_{11}=\mu_{11} \quad\text{and}\quad M_{12}=\mu_{12}, \qquad (2.25)$$

i.e., the moments of the design points coincide with the moments of the weight function. For a uniform weight function, this coincides with a uniform design.

Note that both of these conditions (equations 2.24 and 2.25) are independent of $\beta_1$ and $\beta_2$, which is not surprising since optimal variance designs are also independent of the parameter values. However, the efficiency of these designs compared to designs minimizing total mean squared error does depend on $\beta_1$ and $\beta_2$. As well, one must specify the form of the bias (equation 2.21).

Box and Draper (1959) found that the efficiency of the all-bias design compared to the optimal (minimum mean squared error) design decreased slowly as the bias component of $J$ decreased until $B$ approaches 0, i.e., all-bias designs are efficient for a wide range of $B$ and $V$. But for minimum variance designs, this was not true. This led them to recommend that if one wishes to simplify the design problem, one should concentrate on minimizing bias rather than the random error component unless bias is almost negligible. Stigler (1971) pointed out that the results of Box and Draper were highly dependent on their allowing $RO$ to extend indefinitely beyond $RI$. He suggested that just as it is not advisable to extrapolate if there is likely to be bias, one should constrain $RO$ to coincide with $RI$. Under this constraint, the differences between the all-bias and minimum variance designs are much reduced.

MODEL DISCRIMINATING DESIGNS

An experimenter may have a finite (hopefully small) number of potential models from which he wishes to select the most adequate one. Hunter and Reiner (1965) proposed a criterion for selecting observations for discriminating between two rival regression models $\eta_1(x,\theta_1)$ and $\eta_2(x,\theta_2)$. The sequential criterion is to select $x_{n+1}$ to maximize $\{\eta_1(x,\hat\theta_1)-\eta_2(x,\hat\theta_2)\}^2$ where the $\hat\theta$'s are the MLEs based on the $n$ observations already taken. In a sense, one tries to take an observation at the point of maximum divergence of the two models. Roth (1965) extended the criterion to more than two rival models. Box and Hill (1967) pointed out that this criterion ignored the variability of the estimated response $\eta_j(x,\hat\theta_j)$.
Two precisely estimated responses which do not differ much may yield more information than two widely differing estimated responses with high variability. Therefore, they proposed maximizing the upper bound of the expected entropy change due to the $(n+1)$th observation using the Shannon entropy function, which is defined as $I=-\sum_{j=1}^{v}\pi_j\ln\pi_j$ where $\pi_j$ is the probability that the $j$th model is the true one. To obtain a computational form for the expected entropy change, Box and Hill made some rather restrictive assumptions and much subsequent work has been concerned with relaxing these assumptions. For discriminating between two models, the procedure of Hunter and Reiner (1965) is asymptotically equivalent to that of Box and Hill only if the asymptotic design is non-singular (Hill, 1978). Atkinson (1981) found that for discriminating between two models, one of which was correct, there was no detectable systematic difference between the two criteria for asymptotically non-singular designs. There was no clear indication that one criterion converged faster than the other. Thus, for his examples, the two procedures were found to be roughly equivalent both for small samples and asymptotically.

MODEL DISCRIMINATION AND PARAMETER ESTIMATION

Box and Draper (1959) proposed a criterion for minimizing the mean squared error. They found that in typical situations in which both bias and variance occur, the optimal design is very nearly the same as an optimal design obtained if $V$ were ignored completely and only $B$ considered. A more detailed analysis in Box and Draper (1987) showed that the optimal design obtained when considering both $V$ and $B$ is close to the optimal design obtained from considering only $B$. Only when the proportion of $J$ resulting from $V$ approaches 1 does the design change appreciably. Box and Draper surmised that if one wished to simplify the optimal design problem, it may be better to ignore $V$ than $B$.

Fedorov (1972 p.264+) described minimizing generalized loss defined as $L=w_1B+w_2V$ where $w_1$ and $w_2$ are weight multipliers such that $w_1B \ll w_2V$ when one model is very likely and $w_1B \gg w_2V$ when all models are equally likely. One possible pair of weights is $w_1=[v(1-P_N)\/(v-1)]^\lambda$ and $w_2=1-w_1$, where $P_N$ is the probability associated with the most likely model, $v$ is the number of possible models, and $\lambda$ is a constant between 0 and $\infty$ (Hill et al., 1968). $w_1$ is monotonically decreasing in $P_N$ and $\lambda$. Borth (1975) proposed maximizing the expected decrease in total entropy as a criterion for optimal experimental design, where total entropy measures both the uncertainty about the model and the parameter values. Welch (1983) proposed a mean squared error criterion for protecting against model bias yet still providing efficient designs for parameter estimation.

TESTING SPECIFIC DEPARTURES

Atkinson (1975) presented a criterion for testing specified departures from the model. The tentative model is as given in equation 2.10 but may contain additional parameters $\gamma$ of dimension $s$ so that the more general model is

$$E(y)=x^T\theta+z^T\gamma=x^{0T}\theta^0. \qquad (2.26)$$

A D-optimum design for estimating the significance of the departure maximizes

$$\Delta(z)=|Z^TZ-Z^TX(X^TX)^{-1}X^TZ|=\frac{|X^{0T}X^0|}{|X^TX|} \qquad (2.27)$$

where $X$, $Z$ and $X^0$ are $n$-rowed matrices specifying the values of $x$, $z$ and $x^0$. A D-optimum design for estimating the parameters in the original model maximizes

$$\Delta_1=|X^TX|. \qquad (2.28)$$
A multipurpose design, i.e., one which estimates the parameters and detects the departures $\gamma$ from the model, maximizes

$$\Delta_1^{a_1}\,\Delta(z)^{a_2} \qquad (2.29)$$

where $a_1$ and $a_2$ are weights selected by the experimenter. As Atkinson points out, in designing experiments to detect specific departures from the model, there is the risk that departures of other kinds may not be detected.

2.1.4 SEQUENTIAL DESIGN

Mitchell (1974a) suggested using sequential design (less expensive) to determine the approximate sample size $N$ and then use an exchange procedure (more expensive but theoretically closer to the optimum design) based on that $N$. These usually start with an all-bias design and, after each observation is taken, the data are updated. As the experimenter becomes more certain about the true model, the design approaches those obtained from parameter and response estimation criteria. More formally, let $\mu_r^T=(\mu_{(1)},\mu_{(2)},\ldots,\mu_{(r)})^T$ be the initial design consisting of $r<N$ observations. These may be used to estimate the model form and, in the case of nonlinear regression, to estimate $\theta$ by $\hat\theta$. Then find $\mu_{(r+1)}$ to optimize

$$\Phi\{K(\mu_r;\hat\theta,\hat\tau)+J(\mu_{(r+1)};\hat\theta,\hat\tau)\}. \qquad (2.30)$$

Then re-estimate the model and parameters using $\mu_{r+1}=(\mu_{(1)},\mu_{(2)},\ldots,\mu_{(r+1)})^T$ and repeat the process to find $\mu_{(r+2)}$. In linear regression, this corresponds to finding $x$ to maximize

$$\Phi\{M_r(\xi)+xx^T\}. \qquad (2.31)$$

An important note is that the observations taken subsequent to the design being updated are not independent of the previous observations. As a consequence, the matrix $K(\mu;\theta,\tau)$ as described earlier is no longer the Fisher information matrix (Silvey, 1980 p.63). This does not invalidate using the above method of design construction; however, it can influence inferences drawn from the results.

2.1.5 BAYESIAN OPTIMAL DESIGN

Bayesian designs are based on prior distributions for $\theta$ and $\tau$. Restrict these to be of the form

$$p(\theta;\tau) \sim N(\theta_0,\tau R^{-1}) \qquad (2.32)$$

where $R$ is a specified $k \times k$ positive definite matrix and $E(\tau^{-1})$ is finite. Then

$$p(\theta;\tau,y) \sim N(\theta_1,\tau(R+XX^T)^{-1}) \qquad (2.33)$$

where

$$\theta_1=(R+XX^T)^{-1}(Xy+R\theta_0). \qquad (2.34)$$

Let $c$ be a vector and $\mu$ a probability measure on $c$. Let $\psi=E_\mu[cc^T]$. A $\psi$-optimal design for squared error loss corresponds to taking an observation at the $X$ which minimizes

$$\mathrm{tr}\,\psi[R+XX^T]^{-1}. \qquad (2.35)$$

$\psi$-optimality is also referred to as Bayes A-optimality or, more generally, as Bayes L-optimality. It also has a non-Bayesian interpretation. If some observations $X_0$ have already been taken or they must be observed, and the matrix $X_0X_0^T$ is non-singular, the problem remains as to how to allocate $n$ additional observations. For given vectors $c$ and measure $\mu$, the problem can be written as: find the $X$ to minimize

$$\mathrm{tr}\,\psi[X_0X_0^T+XX^T]^{-1} \qquad (2.36)$$

which is the same computation as before when one sets $R=X_0X_0^T$ (Covey-Crump and Silvey, 1970). Thus one can see that the Bayesian approach lends itself to sequential construction of optimal designs. Again, by relaxing the exactness restriction to include continuous (approximate) designs, one can replace $XX^T$ by $N\int_X xx^T\,d\xi(x)$ where $\xi$ is a probability measure on $X$. Most of the non-Bayesian theory has an analog in Bayesian theory. For example, one can once again define a variety of optimality criteria based on one's objectives. An equivalence theorem analogous to the Kiefer and Wolfowitz theorem also exists for Bayesian design (Chaloner, 1984 p.287).
However, an important difference is that, in general, optimal Bayesian approximate designs depend on $N$ (Chaloner, 1984). It is worth noting that the contribution of the prior distribution approaches 0 as $N$ approaches $\infty$, i.e., the Bayes solution asymptotically approaches the non-Bayes solution although the basic premises are quite different.

The Bayesian approach has several shortcomings as well. Use of prior distributions specified by the experimenter introduces subjectivity into the design selection process. Different prior distributions lead to different designs, so how does one select a prior distribution? This leads to another shortcoming. The computational complexities in Bayesian theory make it imperative that restrictions such as equation 2.32 be imposed so that a solution can be obtained. This constrains the choice of prior distributions. In fact, $\psi$-optimality may only be appropriate in a strictly Bayesian sense under the assumptions of normality and squared error loss (Goel and DeGroot, 1980).

2.1.6 NONLINEAR REGRESSION

A nonlinear relationship is one in which the random variable $y$ whose density function is $p(y;\mu,\theta,\tau)$ is described by the model

$$y_i=\eta(\mu_i,\theta)+\varepsilon_i \qquad (2.37)$$

where $\eta$ is a nonlinear function of $\theta$ and $\varepsilon$ depends only on $\tau$. Restrict $\varepsilon_i$ to be of the form $E(\varepsilon_i)=0$ and $\mathrm{Var}(\varepsilon_i)=\tau$. In general, for nonlinear functions, the experimental design depends on:

1. the form of the model,
2. the sample space $U$, and
3. the initial parameter estimates.

Once again one can obtain the Fisher information matrix as in section 2.1.1. However, the matrix is no longer independent of the unknown $\theta$. By defining

$$x_{(i)}=\frac{\partial\eta(\mu_{(i)},\theta)}{\partial\theta}, \qquad (2.38)$$

the standardized information matrix for an $N$-observation design can be written

$$M(\xi,\theta)=N^{-1}\sum_{i=1}^{N}x_{(i)}x_{(i)}^T. \qquad (2.39)$$

One can use the criteria in section 2.1.2 to obtain a design optimal for estimating the parameters $\theta$ (usually denoted by a bracketed $\theta$, e.g., D-optimality becomes D($\theta$)-optimality) but the interpretation changes slightly. In general, the optimal properties become asymptotic in $N$. As well, the information matrix is not nearly as easily obtained as before since it depends on $\theta$, which is unknown. Thus, to use design criteria based on the information matrix, one must have initial parameter estimates. This approach lends itself to sequential design of experiments where the parameter estimates are continually updated, thereby minimizing the dependence on the initial estimates. Another approach is to use a linearized approximation to the nonlinear model (Box and Lucas, 1959), which is often satisfactory for a narrow range of $X$. For models which are linear in some of the parameters and nonlinear in the remaining parameters, the D-optimal design does not depend on the values of the linear parameters (Hill, 1980).
Fedorov and Malyutov (1972) looked at the properties of optimal designs based on least squares estimates. They stated the following theorems. Denote the true value of $\theta$ by $\theta_u$. Define a design $\xi^*$ to be locally optimal if

$$\Phi\{M(\theta_u,\xi^*)\}^{-1}=\min_{\xi}\Phi\{M(\theta_u,\xi)\}^{-1} \qquad (2.40)$$

where $\Phi$ is an optimality criterion defined in table 2.1.

1. Let the following conditions be satisfied.
- The function $\eta(\mu,\theta)$ is continuous on $U \times \Theta$, $\theta_u \in \Theta$, and the set $\Theta$ is compact.
- The sequence of designs $\xi_N$ (where $N$ is the number of observations) converges weakly, $\lim_{N\to\infty}\xi_N=\xi$, and the function $v^2(\theta,\xi)=\int[\eta(\mu,\theta)-\eta(\mu,\theta_u)]^2\,\xi(d\mu)$ has a unique minimum when $\theta=\theta_u$.
Then the LS estimates are strongly consistent ($\hat\theta \to \theta_u$ w.p.1).

2. Let the following conditions be satisfied in addition to those of the previous theorem.
- $\theta_u$ is an inner point of $\Theta$.
- The matrix
$$M(\theta_u,\xi)=\int xx^T\,\xi(dx) \qquad (2.41)$$
is non-singular.
Then the LS estimates have an asymptotic normal distribution and
$$\lim_{N\to\infty}N\,V(\hat\theta_N)=V(\theta_u,\xi)=\{M(\theta_u,\xi)\}^{-1} \qquad (2.42)$$
where $V(\hat\theta_N)$ is a variance-covariance matrix for $\hat\theta_N$.

3. In addition to the conditions of the previous two theorems, let the following conditions be satisfied.
- Let $\eta(\mu,\theta)$ have continuous first and second derivatives with respect to $\mu$ for all $\theta \in \Theta$.
- Let $\Phi\{M(\theta_u,\xi)\}^{-1}$ have continuous first and second derivatives with respect to the elements of the matrix $V(\theta,\xi)$ for all $\theta \in \Theta$ and for all non-singular designs for given $U$ and $\eta(\mu,\theta)$.
Then, if the limiting design $\xi=\lim_{N\to\infty}\xi_N$ is non-singular, it is almost surely locally optimal.

The equivalence theorem of section 2.1.2 can also be extended to nonlinear models (White, 1973). Define the analog of $x^T\{M(\xi)\}^{-1}x$ to be $\mathrm{tr}[I(x,\theta)\{M(\xi,\theta)\}^{-1}]$. A design measure $\xi^*$ is called G($\theta$)-optimal if

$$\sup_{x\in X}\mathrm{tr}[I(x,\theta)\{M(\xi^*,\theta)\}^{-1}]=\min_{\xi}\sup_{x\in X}\mathrm{tr}[I(x,\theta)\{M(\xi,\theta)\}^{-1}]. \qquad (2.43)$$

The following conditions are equivalent for an approximate design measure $\xi^*$:

1. $M(\xi^*,\theta)$ is D($\theta$)-optimal.
2. $M(\xi^*,\theta)$ is G($\theta$)-optimal.
3. $\sup_{x\in X}\mathrm{tr}[I(x,\theta)\{M(\xi^*,\theta)\}^{-1}]=k$.

The quantity $\mathrm{tr}[I(x,\theta)\{M(\xi,\theta)\}^{-1}]$ is asymptotically proportional to the expected uncertainty at $x$ from estimating $\theta$. As well, White (1973) obtained similar results for partial optimality (estimating $s<k$ parameters). As with linear regression, numerous other criteria for obtaining good designs have been examined. The model discriminating criteria of section 2.1.3 may be extended to nonlinear models. Box (1969) examined the case where one wishes to test the adequacy of a single nonlinear model. He proposed comparing the nonlinear model to linear empirical models using model discriminating techniques. Bayesian techniques may also be applied to nonlinear models (Draper and Hunter, 1967a, 1967b).

2.1.7 DECISION THEORY

As pointed out by Herzberg and Cox (1969 p.36), decision-theoretic ideas can be used to determine the size of a given structure by achieving an explicit balance between the cost of experimentation and the losses arising from an incorrect conclusion.

PARAMETER ESTIMATION

Lindley (1968) investigated the problem of finding a subset of the independent variables $X_1,X_2,\ldots,X_k$ to use for predicting a future value of $y$. More formally, find a subset $I$ of $\{1,2,\ldots,k\}$ containing $s$ members such that $x_I=\{x_i : i \in I\}$ is to be used for predicting $y$. Let $x_{\bar I}$ be the complement of $x_I$. The objective is to minimize the loss

$$Q(x_I)=\{y-f(x_I)\}^2+c_I \qquad (2.44)$$

where $y$ is the true value of the dependent variable to be predicted and $c_I$ is the cost of observing $x_I$. Under a number of independence assumptions, Lindley found that one needs to know the expectation of the regression parameters ($E(\theta)$), the dispersion of the unobserved $x$'s ($\mathrm{var}(x_{\bar I})$), and the costs of observation $c_I$.
In particular, one does not require the dispersion of the regression parameters. For this problem the dispersion of the $x$'s is more important than that of the $\theta$'s. However, calculating the value of the expected loss requires the dispersion of $\theta_I$. Lindley also pointed out that designed experiments do not give any information about the distribution of $x$ and may therefore be uneconomical for this type of prediction problem unless additional information about $x$ is available.

Brooks (1972) extended Lindley's work to examine designing experiments for predicting a future value of $y$. As well, Brooks added a term $C$ to the loss function (equation 2.44) to represent the cost of the experiment. Brooks made some normality assumptions about $\theta$ and $y$. Using a vague prior for $\theta$ ($\mathrm{var}(\theta) \to \infty$), one obtains the same results as if one had decided beforehand to use all the $x_i$ to predict $y$ (i.e., $x_{\bar I} = \emptyset$).

DISCRIMINATING EXPERIMENTS

When one is unsure of the correct form of the model, one is usually interested in finding the most plausible model from among a set of tentative models which hopefully contains the true model. Each model may be thought of as a hypothesis and the problem is how to discriminate among the rival hypotheses. Each hypothesis $H_j$ should be characterized by the following attributes:

1. $p_j(y|x)=p_j[y|\eta_j(x,\theta_j)]$, $j=1,\ldots,v$;
2. $E(y|x)=\eta_j(x;\theta_j)$;
3. $\lambda(x)=\{\int_y [y-\eta_j(x,\theta_j)]^2 p_j(y|x)\,dy\}^{-1}$, i.e., the efficiency (inverse of the variability) function is known for the range of $y$; and
4. the results of the observations are assumed to be independent.

To discriminate among rival hypotheses $H_j$, $j=1,\ldots,v$, one must have a decision rule or function $\delta$. Associated with each experiment is a loss $Q(\delta)$ which includes the cost of the experiment and any penalties for incorrect decisions. For a reasonably informative experiment one would expect that as more samples are taken (and cost rises) the probability of an incorrect decision decreases. In general, there does not exist a decision function $\delta^*$ which minimizes $Q(\delta)$ for all $y$. This leads one to look for a decision function which minimizes the expected loss with respect to $y$, i.e.,

$$R(\delta)=E_y[Q(\delta)]. \qquad (2.45)$$

$R(\delta)$ depends on $\delta$ and the form of the true hypothesis, which is the point of the investigation. If one has a priori information about the true hypothesis, one can minimize the expected risk $E_H[R(\delta)]$. When one does not have prior information, one can find the minimax decision rule, i.e., one which minimizes the maximum risk

$$\min_{\delta}\max_{H} R(\delta). \qquad (2.46)$$

The value of the risk depends not only on the decision function but also on the design of the experiment, thus one can look for a design to minimize $E[R(\delta)]$. It is apparent that because the risk depends on many unknown quantities, designing discriminatory experiments can become mathematically complex. To simplify the procedure and to allow for constant updating of information, sequential experimentation is recommended.

2.1.8 ROBUST DESIGNS

Use of the previous theory requires fairly stringent specifications of the model and the optimality criterion. Various authors have attempted to generalize the theory by investigating robust designs. Robust designs retain many of their desirable properties when some of the initial assumptions have not been met. Most of the work has been concerned with designs robust to departures from the specified model and the specified criterion.
Much of the work has been discussed earlier but some additional topics will be mentioned here.

MODEL ROBUST DESIGNS

As Box and Draper (1987) observed, all models are incorrect to some extent, but some are useful. However, an experimenter may not always be confident that he has specified a useful model prior to experimentation. Thus, he may wish to have an experimental design whose optimal properties are somewhat insensitive to departures from the specified model. One option is to design a model discriminating experiment (see section 2.1.3). However, this may result in the entire experiment being devoted to model discrimination because of a little model uncertainty. For example, an experimenter may have a model he is satisfied with but may be uncertain as to whether to include an interaction term. A model discriminating design would concentrate resources on estimating the interaction term. As far as the experimenter is concerned, this may be a bit excessive. He may include the interaction term in the model if it is significant, since it involves no additional measurement cost, but it may be only a minor modification. One solution is to embed the model of interest in a slightly more general model which also includes the interaction term. This will provide good parameter estimates for the model of interest and still allow the interaction effect to be estimated. The most common terms to be included in the general model are interaction terms and higher order polynomials. Another solution is to obtain an optimal design for the model of interest and augment it with observations which allow the possible interaction terms to be estimated. Some authors have tried to obtain model robust designs without specifying the nature of the possible bias. This involves allocating observations such that model adequacy may be checked as well as permitting unspecified parameters to be estimated. No general model robust procedure has been developed.

OUTLIER ROBUST DESIGNS

For the purpose of this discussion, outliers will be loosely defined as observations which disproportionately influence the conclusions drawn from the investigation. The two most common sources of outliers are human error and \"unlikely\" observations. Human error outliers are best handled by the experimenter although they may be affected by the design. A relatively simple design may reduce (or increase) the likelihood of human error compared to a more complex design. Unlikely observations are those which occur with very low relative frequency and may affect the conclusions more than they should. For example, snow in Vancouver is relatively rare and permanent residents would probably agree that a relatively small snow removal budget is adequate. However, if a tourist happened to visit Vancouver on a day it was snowing, he might conclude that Vancouver should have a relatively large snow removal budget. One may wish to have a design which protects against the influence of such unlikely observations. Since one of the assumptions of linear regression is that the value of the dependent variable is independent of other values for a given combination of independent variables, one cannot simply discard unlikely observations. One must work at decreasing their influence. Outliers have two major effects. They affect the estimated variance and the estimated relationship.
Draper and Herzberg (1979) observed that if there is no model bias, outliers simply increase the integrated variance and the design which minimizes the variance function alone is best. Box and Draper (1987, p.505) stated that G-optimal designs are optimally robust to wild observations. This is a consequence of the coincidence of the observations with the greatest influence and the predictions with the greatest variance. Another method of reducing the influence of unlikely observations is to increase the sample size so that the relative frequency of the outlier in the sample approaches the relative frequency of the outlier in the population.

CRITERION ROBUST DESIGNS

The majority of effort in developing criterion robust designs has centered around universal optimality (see section 2.1.2). However, universally optimal designs do not always exist and, even when they do, they may be difficult to identify. One can obtain an optimal design in terms of one criterion and then investigate its optimality in terms of other criteria. This is always a good practice since the optimal design may not be unique according to one criterion. Another criterion can be used to distinguish between designs equivalent in terms of the first criterion. Designs close in terms of one criterion may be quite dissimilar in terms of another criterion.

2.2 REVIEW OF FORESTRY LITERATURE

The term \"optimal\" has been used mainly in forestry in reference to optimal allocation of samples to strata in stratified sampling (Freese, 1966) or in double sampling with regression (Paine, 1981). In these sampling strategies, the proportion of observations allocated to each stratum (sample and subsample in the case of double sampling) is dependent on the cost of obtaining observations from the stratum. However, the costs are assumed to be constant within the strata. Therefore, optimal design, as used in the present study, is distinct from optimal allocation. Optimal design specifies the values of the independent variables at which one should take observations. Randomization is replaced by a statistical information approach. Thus optimal design is not restricted to sampling but is also appropriate for experimental design.

There has not been much discussion of optimal design for linear regression in the forestry literature, with the exception of Demaerschalk and Kozak (1974, 1975a, 1975b). Model-based sampling and sampling for biomass have received more attention. Among other suggestions, Demaerschalk and Kozak recommended using a uniform sampling distribution when there is model uncertainty. When the model is known, they recommended sampling to increase $X^TX$. In addition, they developed formulas for calculating the sample size when precision requirements are specified for various values of $x$. Ker and Smith (1957) discussed sampling for height-diameter relationships. They recommended sampling two large diameter trees and two trees of approximately 30% of the diameter of the large trees to estimate a quadratic relationship. They noted, however, that more study is required to determine the optimum sampling strategy of tree heights.

2.2.1 MODEL-BASED SAMPLING

There has been some discussion of model-based sampling in the forestry literature (e.g. Schreuder and Wood, 1986). Most of the discussion has been concerned with comparing model-based versus design-based sampling.
2.2.1 MODEL-BASED SAMPLING

There has been some discussion of model-based sampling in the forestry literature (e.g., Schreuder and Wood, 1986). Most of the discussion has been concerned with comparing model-based and design-based sampling. Although model-based design and inference can be more efficient, it involves more assumptions than design-based inference. Hansen et al. (1983) gave an overview of model- and design-based sampling. Model-based sampling schemes have been defined as those for which sample selection is determined by the assumed model and the model also provides the basis for inference; randomization other than that supplied by the sampling units is not considered necessary. Design-dependent or probability sampling is sampling where randomization is introduced at the sample selection stage and is the basis of inference. For model-based sampling, estimates are model unbiased or model consistent. A model-consistent design is one where the difference between the estimate and the actual population value becomes very small as N becomes large. This reliance on model-based estimates is considered the distinguishing characteristic of model-based sampling. Design-dependent sampling makes use of randomization-consistent estimators, so that although models may be used in the design (e.g., for stratification), they do not provide a basis for inference. Hansen et al. (1983) pointed out that as sample size increases, any inadequacies in the model become more pronounced, while design-dependent sampling schemes rise above this shortcoming. However, one should note that although the inference is model-based, the sample selection discussed is not equivalent to the optimal design considered in this paper. A D-optimal design would have consisted of sampling the largest x values, resulting in a considerably smaller mean squared error.

Hansen et al. also pointed out that for inference about a causal relationship, or for prediction based on the parameters of a superpopulation or stochastic process, only model-dependent procedures are relevant. As well, for small sample sizes, model-dependent sampling may be preferable. Little (1983), in a comment on the paper by Hansen et al., pointed out that small sample situations are probably very common, more so than the large sample setting of design-based inference. Smith (1983), in a comment on the same paper, emphasized that using models to describe the sampling error structure is only one of the uses of models in sampling design. They can also be used for stratification, for response and coding errors, for non-response, etc. Sampling error may be minimal compared to these other potential errors. All models used should be viewed critically and evaluated as to their usefulness.

Schreuder and Wood (1986) enumerated some points to consider when selecting between model- and design-dependent sampling. The sampling scheme should provide a high percentage of good estimates. For model-based sampling, one should have confidence in the model or test its adequacy before engaging in a costly study. One should also consider how much time and effort one is willing to expend ensuring the model and the sampling design are seen to be valid; the user of the study results may need to be convinced of their validity. Other peripheral or subsequent uses of the study results should be considered. The study should lead to a greater understanding of the underlying relationships in the data. The size of the sample may influence the design choice: small sample sizes may require model-based sampling to achieve the desired precision. Ease of implementation of the sampling should always be considered.
However, in the bulk of the studies considered, only a rather limited form of model-based sampling was discussed. In most cases, all the values of the independent variables in the population were required to be known, and the model was mainly used for stratification. However, as has been seen, model-based sampling is much broader in scope than these applications suggest. The objective of these studies was to predict population values rather than relationships.

Schreuder et al. (1984) compared a form of model-based sampling and point-Poisson sampling against large sample estimates of tree volume on a timber sale. The model-based sampling scheme involved placing the independent variable (Dbh²Ht, where Dbh = diameter at breast height and Ht = tree height) into equally spaced intervals and then sampling a tree close to the midpoint of each class. The point-Poisson estimate involved sampling trees proportional to their basal area; a subsample of these trees was selected with probabilities proportional to Dbh²Ht, with probabilities set equal to one if Dbh²Ht exceeded a certain value. They found that the model-based estimate of total volume was considerably closer to the large sample estimate than the point-Poisson estimate. Using jackknife variance estimates, model-based and point-Poisson sampling yielded similar variance estimates, and confidence intervals constructed from both sampling schemes included the large sample based confidence interval for total volume. They concluded that the two sampling schemes were similar in terms of efficiency. The advantage of model-based sampling was that the sampler can purposely avoid undesirable trees (e.g., hung, fallen, or forked trees, which are difficult and occasionally dangerous to measure). The disadvantage is that it is more dependent on the model than is point-Poisson sampling. This paper is mentioned because, although it does not use optimal design theory, it is model-based and a prime candidate for optimal design techniques. The optimal design would have concentrated observations in the smallest and largest Dbh²Ht classes, yielding predictions with lower variances. However, as in the paper, separate estimates of the population average Dbh²Ht and the total number of trees would have to be obtained.
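The Schreuder et al. selection rule is easily stated as an algorithm: divide the range of Dbh²Ht into equally spaced classes and, within each class, measure the tree whose Dbh²Ht is closest to the class midpoint. The following Python sketch implements that rule; the tree list and the number of classes are invented for illustration.

import numpy as np

rng = np.random.default_rng(1)
# Hypothetical tree list: the covariate is Dbh^2 * Ht for each tree.
dbh2ht = rng.uniform(0.5, 12.0, size=200)

def midpoint_sample(covariate, n_classes):
    # Split the covariate range into equally spaced classes and pick, from
    # each class, the tree whose covariate is closest to the class midpoint.
    lo, hi = covariate.min(), covariate.max()
    edges = np.linspace(lo, hi, n_classes + 1)
    chosen = []
    for a, b in zip(edges[:-1], edges[1:]):
        in_class = np.where((covariate >= a) & (covariate <= b))[0]
        if in_class.size:  # a class can be empty
            mid = 0.5 * (a + b)
            chosen.append(in_class[np.argmin(np.abs(covariate[in_class] - mid))])
    return np.array(chosen)

sample = midpoint_sample(dbh2ht, n_classes=10)
print(dbh2ht[sample])  # one tree near the midpoint of each Dbh^2*Ht class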
Gertner (1987) used optimal design theory to evaluate sampling schemes for improving the predictive ability of already calibrated models. Using L-optimality as a comparison criterion, he compared several sampling schemes. Optimal design theory was used for the comparison but not for the construction of designs. As well, differing availabilities of observations were not considered.

2.2.2 SAMPLING FOR BIOMASS

There has been much discussion of sampling for specific quantities of interest to foresters. In particular, in the last two decades there has been considerable interest in predicting biomass (IUFRO working group on forest biomass proceedings in 1971, 1973, and 1976). Cunia (1979a, 1979b) reviewed some of the biomass research and made some recommendations concerning statistical and sampling methods. Concerning biomass models, he recommended more attention be paid to model selection. Often allometric models are selected with little effort to evaluate the adequacy of the model. Cunia observed that linear models may be as good as, if not better than, allometric models and, in addition, have a fairly simple error structure, so computing the error of combined estimates (e.g., total biomass for British Columbia) is relatively straightforward. Although he observed that one of the assumptions of linear regression is that the values of x_1, x_2, ..., x_N be fixed, he went on to describe several methods of obtaining a randomized sample. This was done to ensure the sample was representative of the population. Although this may be required to fulfill some additional sampling objectives, it is not required for quantifying the relationship between dependent and independent variables. What is required is that the relationship between the sample values x_i and y_i be representative of the population relationship. This is ensured by sampling in such a manner that y_i is independent of y_j for a given combination of independent variables.

2.3 ALGORITHMS

Algorithms can be roughly classified into exact and approximate, depending on the type of design they produce. The algorithms typically are highly dependent on the optimality criterion adopted. However, general algorithms exist, notably by Fedorov and Malyutov (1972). Most of the published work has dealt with D-optimality. Of course, general algorithms will also work for D-optimality (the D-optimal algorithms are often just special cases of the more general algorithms), but the criterion-specific algorithms are generally more efficient. Several algorithms are described in detail below.

2.3.1 INFORMATION MATRIX

The design criteria in section 2.1.2 are all functions of the information matrix M(ξ). Several properties of M(ξ) are worth noting.

1. M(ξ) is a symmetric positive-semidefinite matrix.

2. The family of matrices M(ξ), ξ ∈ Ξ, is convex; that is, if ξ_1 ∈ Ξ and ξ_2 ∈ Ξ, then ξ = αξ_1 + (1 − α)ξ_2 is also in Ξ for 0 ≤ α ≤ 1.

3. For any design ξ, the matrix M(ξ) can be written in the form

M(ξ) = Σ_{i=1}^{n} ξ(x_i) x_i x_i^T    (2.47)

where n ≤ k(k + 1)/2 + 1, 0 ≤ ξ(x_i) ≤ 1, n is the number of distinct points x_i, and Σ_{i=1}^{n} ξ(x_i) = 1 (from Carathéodory's theorem (Fedorov, 1972, p. 66)). That is, for any design ξ, a design with an equivalent information matrix can be found based on n distinct points.
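In computational terms, equation 2.47 is a weighted sum of outer products. The short Python sketch below constructs M(ξ) for two designs and checks properties 1 and 2 on a convex combination of them; the quadratic regressor vector and the two designs are illustrative assumptions.

import numpy as np

def regressors(x):
    # Regressor vector for an assumed quadratic model: x -> (1, x, x^2).
    return np.array([1.0, x, x**2])

def info_matrix(design):
    # M(xi) = sum_i xi(x_i) x_i x_i^T, equation (2.47); a design is a list
    # of (support point, weight) pairs with weights summing to one.
    return sum(w * np.outer(regressors(x), regressors(x)) for x, w in design)

xi1 = [(-1.0, 1/3), (0.0, 1/3), (1.0, 1/3)]
xi2 = [(-0.5, 1/2), (0.5, 1/2)]
a = 0.25
# Property 2: the information matrix of a convex combination of two designs
# is the same convex combination of their information matrices.
mix = [(x, a * w) for x, w in xi1] + [(x, (1 - a) * w) for x, w in xi2]
lhs = info_matrix(mix)
rhs = a * info_matrix(xi1) + (1 - a) * info_matrix(xi2)
print(np.allclose(lhs, rhs))  # True
# Property 1: M(xi) is symmetric positive-semidefinite (up to rounding).
print(np.allclose(lhs, lhs.T), np.linalg.eigvalsh(lhs).min() >= -1e-10)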
2.3.2 APPROXIMATE DESIGNS

Approximate design algorithms construct design measures which are independent of the number of observations. One then tries to find an exact design close to the approximate one.

D-OPTIMUM DESIGNS

Analytic solutions for obtaining D-optimal designs exist for some special cases, notably polynomial regression (De La Garza, 1954; Stigler, 1971; Studden, 1978, 1982; Guest, 1958; Gaffke and Krafft, 1982; Lau and Studden, 1985). In general, however, numerical search methods must be used. Approximate D-optimal designs are equivalent to approximate G-optimal designs, and most algorithms are based on the equivalence theorem of Kiefer and Wolfowitz (1960).

STEEPEST DESCENT

This algorithm was discussed in detail by Fedorov (1972). Start with an arbitrary initial non-singular design

ξ_0 = {x_i, ξ_0(x_i); i = 1, ..., n}    (2.48)

where

Σ_{i=1}^{n} ξ_0(x_i) = 1 and n ≥ k.    (2.49)

1. Compute M(ξ_0) = Σ_{i=1}^{n} ξ_0(x_i) x_i x_i^T and its inverse.

2. Find x_0 ∈ X to maximize d(x_0, ξ_0) = x_0^T {M(ξ_0)}^{-1} x_0.

3. Construct the design

ξ_1 = (1 − α_0)ξ_0 + α_0 ξ(x_0)    (2.50)

where

α_0 = (δ_0 − k) / [k(δ_0 − 1)]    (2.51)

and δ_0 = d(x_0, ξ_0). This value of α_0 maximizes the increase in the determinant of M(ξ).

4. Compute the information matrix M(ξ_1) of the design ξ_1, as well as its inverse.

Steps 2 through 4 are repeated replacing ξ_0 by ξ_1 or, in general, ξ_s by ξ_{s+1}. The point x_0 is often called the direction of the step, and α_0 is often called the length of the step. To avoid repeated matrix inversions in step 4 and repeated calculations of the determinant, one can use the following relationships (Fedorov and Malyutov, 1972):

{M(ξ_{s+1})}^{-1} = (1 − α_s)^{-1} [ {M(ξ_s)}^{-1} − α_s {M(ξ_s)}^{-1} x_{s+1} x_{s+1}^T {M(ξ_s)}^{-1} / (1 − α_s + α_s d(x_{s+1}, ξ_s)) ]    (2.52)

|M(ξ_{s+1})| = (1 − α_s)^k [1 + (α_s / (1 − α_s)) d(x_{s+1}, ξ_s)] |M(ξ_s)|    (2.53)

Fedorov proved that this algorithm converges to the D-optimal design ξ* as the number of iterations goes to infinity, i.e.,

lim_{s→∞} |M(ξ_s)| = |M(ξ*)|    (2.54)

Fedorov also gave a number of stopping rules for terminating the iterative procedure when a design is sufficiently "close" to being D-optimal. Terminate the above algorithm when one of the following conditions is true:

1. α_s < γ_1;

2. [|M(ξ_{s+1})| − |M(ξ_s)|] / |M(ξ_s)| < γ_2;

3. k^{-1} d(x_0, ξ_s) − 1 < γ_3.

Various modifications to this algorithm can be introduced. The algorithm may be slow to eliminate some near-optimal points, and Fedorov (1972, p. 109) gave some suggestions for removing suboptimal points and rounding off near-optimal points. The step length α_s can also be modified. The algorithm will converge to a D-optimal design for any sequence {α_s} satisfying

Σ_{s=1}^{∞} α_s = ∞ and lim_{s→∞} α_s = 0.    (2.55)

One such sequence is to select an initial α_0 and use it as long as the determinant of M(ξ_s) increases at each stage; when the determinant no longer increases, repeat the procedure with a reduced α_0, continuing in this manner until a stopping rule is satisfied. In general, there does not exist a single sequence {α_s} which most quickly approaches the optimal design for all functional forms and efficiency functions (Fedorov, 1972, p. 114). This algorithm is a special case of the linear criterion algorithm described later.

ACCELERATED STEEPEST DESCENT

One weakness of the steepest descent algorithm is the slow reduction of the weights of non-optimal points included in the design. One solution alternates between adding points to the design and removing non-optimal points from the design by setting their weights p_i to zero. Pazman (1986) and Atwood (1973) described such algorithms.
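The iteration above is compact to implement when X is discretized. The following Python sketch applies steps 1 through 4 with the optimal step length of equation 2.51; the quadratic model, the 41-point grid on [-1, 1], and the tolerance are illustrative assumptions, and for brevity the inverse is recomputed directly rather than updated via equations 2.52 and 2.53.

import numpy as np

grid = np.linspace(-1.0, 1.0, 41)                        # discretized design space X
F = np.column_stack([np.ones_like(grid), grid, grid**2]) # quadratic model, k = 3
k = F.shape[1]
w = np.full(len(grid), 1.0 / len(grid))                  # arbitrary non-singular xi_0

for s in range(2000):
    Minv = np.linalg.inv(F.T @ (w[:, None] * F))         # {M(xi_s)}^{-1}
    d = np.einsum('ij,jk,ik->i', F, Minv, F)             # d(x, xi_s) over the grid
    j = int(np.argmax(d))                                # step direction x_0
    delta = d[j]
    if delta / k - 1.0 < 1e-4:                           # stopping rule 3
        break
    alpha = (delta - k) / (k * (delta - 1.0))            # optimal step, eq. (2.51)
    w *= 1.0 - alpha                                     # xi_{s+1} = (1-a)xi_s + a xi(x_0)
    w[j] += alpha

# Most of the mass accumulates at {-1, 0, 1}, the D-optimal support for a
# quadratic on [-1, 1]; print the three heaviest points and their total weight.
top = np.argsort(w)[-3:]
print(np.sort(grid[top]), w[top].sum())

The slow decay of the remaining small weights is exactly the weakness the accelerated variant addresses by periodically setting the weights of non-optimal points to zero.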
MULTI-CRITERIA DESIGNS

LINEAR CRITERIA

Let L be a non-negative linear function on {M(ξ)}^{-1}, i.e.,

L(A + B) = L(A) + L(B), L(cA) = cL(A), L(A) ≥ 0    (2.56)

for all A, B ∈ M. A design ξ* is said to be linear optimal if

L[{M(ξ*)}^{-1}] = min_{ξ∈Ξ} L[{M(ξ)}^{-1}].    (2.57)

Note that L[{M(ξ)}^{-1}] is a convex function of M(ξ) and that for any linear optimal design ξ*, there exists a design ξ_0 based on n_0 distinct points such that

L[{M(ξ*)}^{-1}] = L[{M(ξ_0)}^{-1}]    (2.58)

where n_0 ≤ k(k + 1)/2 (Fedorov, 1972, p. 123). For notation, see section 2.1.2.

1. Start with an arbitrary non-singular design ξ_0.

2. Find x_0 to maximize φ(x, ξ_s) − L[{M(ξ_s)}^{-1}], where φ(x, ξ) is as defined in section 2.1.2.

2.3.3 EXACT DESIGNS

For an optimal approximate design, the number of support points need not exceed k(k + 1)/2 (Silvey, 1980). Thus, one may use an approximate design to identify the support points of an exact design and so reduce the design space X of the problem. However, if the exact design η includes points not in the support of η*, then

Φ{M(η)} − Φ{M(η*)} = O(N^{-1}).    (2.65)

Fedorov and Malyutov (1972) stated the following theorem:

Φ[{M(ξ*)}^{-1}] ≤ Φ[{M(ξ*_N)}^{-1}] ≤ γ((N − n)/N) Φ[{M(ξ*)}^{-1}]    (2.66)

where γ is as in section 2.1.2; for linear criteria, γ(N) = N^{-1}. They also gave the following corollary. Let ξ̃_N have the same support points as the optimal approximate design ξ*. At every support point x_i, take r_i = [(N − n)ξ*(x_i)]^+ measurements, where [c]^+ is the smallest integer satisfying the inequality [c]^+ ≥ c. Arbitrarily distribute the remaining N − Σ[(N − n)ξ*(x_i)]^+ measurements. Then

Φ[{M(ξ̃_N)}^{-1}] − Φ[{M(ξ*_N)}^{-1}] ≤ γ(N − n) Φ[{M(ξ*)}^{-1}]    (2.67)

where ξ*_N is the optimum exact design. The corollary can be used to estimate the N at which the optimum approximate design can still be used to obtain a reasonably efficient exact design. Note that as the number of support points decreases, the upper bound in equation 2.66 decreases, indicating that approximate designs with minimal support will be closest to the optimal exact design.
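As an illustration of the corollary's rounding rule, the Python sketch below allocates r_i = [(N − n)ξ*(x_i)]^+ observations to each support point and then distributes the remainder, here to the heaviest points; the approximate design used (the D-optimal design for a quadratic on a coded interval) and N = 20 are assumptions for the example.

import math

# Optimal approximate design for a quadratic on [-1, 1]:
# support points x_i and weights xi*(x_i).
support = [-1.0, 0.0, 1.0]
weights = [1/3, 1/3, 1/3]

def round_design(weights, N):
    # Take r_i = ceil((N - n) * xi*(x_i)) observations at each support point,
    # then distribute the remainder arbitrarily (here: by decreasing weight).
    n = len(weights)
    r = [math.ceil((N - n) * w) for w in weights]
    leftover = N - sum(r)               # non-negative by construction
    order = sorted(range(n), key=lambda i: -weights[i])
    for i in range(leftover):
        r[order[i % n]] += 1
    return r

print(round_design(weights, N=20))  # e.g. [7, 7, 6], summing to 20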
EXCHANGE PROCEDURE

Welch (1984) modified Mitchell's DETMAX to construct G- and A-optimal exact designs. He found that for his examples the value of |M(ξ_N)| did not alter considerably with changing criteria (D-, A-, and G-optimality), but that tr{[M(ξ_N)]^{-1}} and max_x x^T [M(ξ_N)]^{-1} x showed more variability. Therefore, it may be worthwhile to compromise slightly in terms of D-optimality to achieve a considerable improvement in terms of either A- or G-optimality.

ANNEALING ALGORITHM

"Annealing" originally refers to a process in which metals are heated and then cooled to achieve a crystalline structure corresponding to an energy state dependent on the rate of cooling. When cooled, the atoms tend toward an arrangement which minimizes potential energy, but because of the large number of possible configurations, it is likely that only a local minimum will be achieved. The metal may be reheated and cooled with the expectation of achieving a lower energy state. Metropolis et al. (1953) devised an algorithm to simulate the equilibrium state of the atoms in a metal at a given temperature. A rearrangement of the atoms' configuration is accepted with probability 1 if the new configuration has a lower energy level, and accepted with probability p (dependent on the temperature and the change in energy) if the energy level increases; otherwise, the old configuration is retained. Kirkpatrick et al. (1983) adapted this simulated annealing procedure for function optimization. They found that because of the probabilistic acceptance of new configurations, the algorithm is able to escape from local optima, and thus convergence to a global optimum is more likely. As well, the computing power required to solve an optimization problem increases linearly, or as a small power of the size of the problem, rather than exponentially as with most other approaches. Bohachevsky et al. (1986) found that the Kirkpatrick algorithm may require a large number of steps before identifying a global optimum. They suggested modifying the algorithm so that the probability of accepting a detrimental step tends to zero as a global optimum is reached. They called their technique the generalized simulated annealing method.
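Applied to exact design construction, the Metropolis acceptance rule amounts to perturbing one design point at a time and accepting a worse design with a probability that falls as the temperature is lowered. The Python sketch below shows the basic loop; the D-criterion for a quadratic model, the geometric cooling schedule, and all tuning constants are illustrative assumptions in the spirit of Kirkpatrick et al., not the generalized method of Bohachevsky et al.

import numpy as np

rng = np.random.default_rng(0)

def log_det_info(x):
    # log |X'X| for an assumed quadratic model; the objective to maximize.
    X = np.column_stack([np.ones_like(x), x, x**2])
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf

x = rng.uniform(-1, 1, size=9)          # initial exact design, N = 9
best = x.copy()
T = 1.0
for _ in range(5000):
    cand = x.copy()
    i = rng.integers(len(x))
    cand[i] = np.clip(cand[i] + rng.normal(0, 0.2), -1, 1)  # perturb one point
    delta = log_det_info(cand) - log_det_info(x)
    # Metropolis rule: always accept improvements; accept deteriorations
    # with probability exp(delta / T), which shrinks as T is lowered.
    if delta > 0 or rng.random() < np.exp(delta / T):
        x = cand
        if log_det_info(x) > log_det_info(best):
            best = x.copy()
    T *= 0.999                           # geometric cooling schedule

print(np.sort(np.round(best, 2)))       # points cluster near -1, 0, and 1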
UNIFORM DESIGNS

Plackett introduced a criterion of uniformity in the discussion of Scheffe (1963). He designated as the domain of a point x the region for which x is the nearest design point. A design is uniform if the volumes of the domains of all the design points are equal. Doehlert (1970) described a method of obtaining uniform designs by generating uniform shells. Kennard and Stone (1969), in addition to recognizing the problem of model uncertainty, also identified "messy" design spaces X and the inadvisability of over-replication as problems not addressed by most optimal design procedures. They developed an algorithm for obtaining uniform designs which addresses these complications.

2.3.4 SEQUENTIAL DESIGN

NONLINEAR DESIGN

Fedorov and Malyutov (1972) described the following algorithm for obtaining sequential Φ-optimum approximate designs for nonlinear regression models. Let Φ be a criterion function as defined in table 2.1 and use the notation of section 2.1.6.

1. Start with N observations taken according to the arbitrary non-singular design ξ_N (possibly a previous experiment).

2. Find x_{N+1} satisfying ψ(x_{N+1}, θ̂_N, ξ_N) = sup_{x∈X} ψ(x, θ̂_N, ξ_N).

In an experiment relating drying times to strength and durability properties, some drying times may result in boards with different market values. Most management practices may result in trees of varying qualities and end uses. The potential revenues from the by-products of an experiment or survey may vary depending on the design.

Possibly some designs should ignore variable costs of observations, e.g., when the cost of obtaining observations is minor compared to the cost of analysis. However, in the majority of instances, costs cannot be ignored. If the cost information is available, it should be used to obtain a cost-efficient design. The cost of an observation may be broken down into

c_i = lc_i + mc_i    (3.69)

where c_i is the total cost associated with an observation at x_i, lc_i is the location (sampling) or experimental setup (experimentation) cost, and mc_i is the measurement cost. In general, these quantities depend on both x and y and possibly on the previous observations.

1. Location cost. For most practical forestry purposes, the setup cost for x_i can be assumed to be independent of that for x_j, i ≠ j, and the rest of the discussion