DEVELOPMENT AND EXAMINATION OF SEQUENTIAL APPROACHES FOR APPLICABILITY TESTING OF TREE VOLUME MODELS

By

Yue Wang

B.Sc. (Agri.), Agricultural University of Guangxi, 1982
M.Sc. (For. Eng.), University of New Brunswick, 1990

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF FOREST RESOURCES MANAGEMENT

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

December 1994

© Yue Wang, 1994

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Abstract

In forest inventories, an existing standard volume equation may be considered for application to a local area (subpopulation), or to the same species in a different geographic region (new population). In order to use a volume model with confidence in these situations, an applicability test must be carried out to determine the actual accuracy of the model. Procedures based on a predetermined, sufficiently large sample of data (fixed sample size procedures) are available. However, since applicability testing is only used to classify a model as either acceptable or unacceptable, it is likely that a testing decision can be made with a smaller sample size, especially when the actual accuracy of the model is far below or above the user's requirement.
This is particularly of concern when data collection for model checking is very expensive, time-consuming, or destructive.

As an alternative, three sequential accuracy testing plans (SATP) were developed in this thesis by extending Freese's (1960) accuracy tests using Wald's (1947) sequential probability ratio tests (SPRT). Observations are taken sequentially, and at each stage of sampling a decision is made to accept the model, to reject it, or to continue sampling. The SATP procedures are potentially superior to a fixed sample size procedure in terms of smaller sample sizes. Approximate Operating Characteristic (OC) and Average Sample Number (ASN) equations were also suggested to assist potential users of the SATP procedures in choosing appropriate test parameters for a given problem.

The simulation results using normal distribution generators showed that the SATP procedures are reliable for classifying a volume model as either acceptable or unacceptable based on two pre-set limits of the accuracy requirement. Also, on average, the use of the SATP procedures will result in a 40 to 60% saving in sampling cost compared to an equally reliable conventional fixed sample size procedure. A detailed example is given to illustrate the application of the SATP procedures.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgement
1 Introduction
2 Literature Review
   2.1 Tree Volume Estimation
   2.2 Applicability Testing of Tree Volume Models
   2.3 Fixed Sample Size Procedures for Validating Forestry Models
      2.3.1 Freese's procedure of accuracy testing and its modifications
         2.3.1.1 Freese's procedure
         2.3.1.2 Underlying assumptions of Freese's procedure
         2.3.1.3 Modifications of Freese's procedure
         2.3.1.4 Previous applications of Freese's procedure
      2.3.2 Procedures for validating forestry simulation models
      2.3.3 Reynolds' estimation procedures
      2.3.4 Other fixed sample size procedures
   2.4 Variable Sample Size Procedures for Testing Forestry Populations
      2.4.1 Sequential probability ratio tests
         2.4.1.1 Wald's general decision rule for SPRT plans
         2.4.1.2 Wald's general OC function for SPRT plans
         2.4.1.3 Wald's general ASN function for SPRT plans
         2.4.1.4 Modifications of Wald's decision boundaries
         2.4.1.5 Previous forestry applications and studies of Wald's SPRT plans
      2.4.2 Other sequential sampling plans used in forestry
      2.4.3 Development of sequential analysis theory in statistics
   2.5 Summary
3 Extension of Freese's Procedure to Sequential Accuracy Testing Plans
   3.1 Sequential Accuracy Testing Plan I
      3.1.1 Modification of Freese's original formulation
      3.1.2 Sequential decision boundaries
   3.2 Sequential Accuracy Testing Plan II
      3.2.1 Modifications of Freese's original formulation
      3.2.2 Sequential decision boundaries
   3.3 Sequential Accuracy Testing Plan III
   3.4 Approximate OC and ASN Equations for the SATP Procedure
4 Monte Carlo Simulations for Examining Sequential Accuracy Testing Plans
   4.1 Estimates of Error Parameters of Tree Volume Estimations
   4.2 Reliability of the SATP When Errors Are iid Normal Variables With a Zero Mean and a Constant Variance
      4.2.1 Simulation methods
      4.2.2 Reliability of the SATP procedures for classifying the error variance of volume models
      4.2.3 Accuracy of the nominal error probabilities of the SATP
      4.2.4 Expected sample size of the SATP procedures
   4.3 Reliability of the SATP When Errors Are iid Normal Variables With a Non-Zero Mean and a Constant Variance
   4.4 Reliability of the SATP When Errors Are Normally Distributed With a Zero Mean and Heterogeneous Variances
      4.4.1 Simulation methods
      4.4.2 Simulation results and discussions
   4.5 Accuracy of the Approximate OC and ASN Equations of the SATP
      4.5.1 Simulation methods
      4.5.2 Comparisons between the approximate and Monte Carlo OC and ASN equations
   4.6 Effects of Truncation and Group Sampling on the Performance of the SATP
   4.7 Conclusion
5 Field Application Procedures and Example
   5.1 Field Application Procedures
      5.1.1 Preparation
      5.1.2 Sampling and testing process
      5.1.3 Post-tests for the underlying assumptions
      5.1.4 Making the final testing decision
   5.2 Application Example
      5.2.1 Problem and data
      5.2.2 Specification of test parameters
      5.2.3 Testing results
      5.2.4 Final testing decision
6 Conclusions and Recommendations
References Cited

List of Tables

4.1 Descriptive statistics of B.C. tree sectional data used for determining the error parameters of volume estimations
4.2 Calculated statistics of the errors for three volume models and B.C. tree sectional data
4.3 Monte Carlo OC estimates of SATP I for testing H0: σ² = σ₀² vs. H1: σ² = σ₁² with α = β = 0.05, when five normal generators with zero means and different variances were used to simulate the errors of volume models
4.4 Monte Carlo OC estimates of SATP I for testing H0: σ² = σ₀² vs. H1: σ² = σ₁² with α = β = 0.10, when five normal generators with zero means and constant variances were used to simulate the errors of volume models
4.5 Monte Carlo estimates of the actual error probabilities when SATP I is applied to test H0: σ² = σ₀² vs. H1: σ² = σ₁² with various σ₀², σ₁² and α = β = 0.05
4.6 Monte Carlo estimates of the actual error probabilities when SATP I is applied to test H0: σ² = σ₀² vs. H1: σ² = σ₁² with various σ₀², σ₁² and α = β = 0.10
4.7 Monte Carlo ASN estimates and the associated standard errors when σ² = σ₀² for applying SATP I to test H0: σ² = σ₀² vs. H1: σ² = σ₁² with various combinations of σ₀², σ₁², α and β
4.8 Comparison of Monte Carlo OC values when σ² = σ₀² between SATP I and III for testing H0: σ² = σ₀² vs. H1: σ² = σ₁² with α = β = 0.05 and various σ₀² and σ₁², using normal generators with a non-zero mean
4.9 Monte Carlo estimated probabilities and associated average sample sizes for accepting H0 when SATP I is applied to test H0: σ² = e₀²/z²(1−γ/2) vs. H1: σ² = e₁²/z²(1−γ/2), with e₀, e₁ and γ determined from the mixed normal generated population N(0, 0.05x)
4.10 Monte Carlo estimated probabilities and associated average sample sizes for accepting H0 when SATP I is applied to test H0: σ² = e₀²/z²(1−γ/2) vs. H1: σ² = e₁²/z²(1−γ/2), with e₀, e₁ and γ determined from the mixed normal generated population N(0, 0.5x)
4.11 Comparisons of the Monte Carlo OC and ASN curves of SATP I and III and the approximate equations with σ₀ = 0.3, σ₁ = 0.8, α = β = 0.05
4.12 Comparisons of the Monte Carlo OC and ASN curves of SATP I and III and the approximate equations with σ₀² = 1.50, σ₁² = 2.25, α = 0.05 and β = 0.10
4.13 Comparison of Monte Carlo estimates of α, β and associated ASN values of SATP I for various group sizes with σ₀² = 1.221, σ₁² = 2.747 and α = β = 0.05
4.14 Comparison of Monte Carlo estimates of α, β and ASN values of SATP I for various truncation points and different combinations of α and β with σ₀² = 1.221, σ₁² = 2.747
5.15 Field decision table for applying the SATP procedures
5.16 Calculated statistics of the lodgepole pine trees collected from FIZ E in B.C.

List of Figures

3.1 Example of an approximate Operating Characteristic (OC) curve of SATP I for testing H0: σ² = 1.50 vs. H1: σ² = 2.25 with α = 0.05 and β = 0.10
3.2 Example of an approximate Average Sample Number (ASN) curve of SATP I for testing H0: σ² = 1.50 vs. H1: σ² = 2.25 with α = 0.05 and β = 0.10
4.3 Comparison of the Monte Carlo OC and ASN curves of SATP I and III and Wald's approximate equations with σ₀ = 0.3, σ₁ = 0.8, α = β = 0.05. (a) Comparison of OC curves; (b) comparison of ASN curves
4.4 Comparison of the Monte Carlo OC and ASN curves of SATP I and III and Wald's approximate equations with σ₀² = 1.50, σ₁² = 2.25, α = 0.05 and β = 0.10. (a) Comparison of OC curves; (b) comparison of ASN curves
5.5 Field decision graph for applying the SATP procedures
5.6 Field decision graph for SATP II when applied to test the applicability of the B.C. standard volume model of lodgepole pine for application to the subregion FIZ E, with specified parameters p₀ = 10, p₁ = 15, γ = 0.05, α = β = 0.05 and n₀ = 50
5.7 Field decision graph for SATP III when applied to test the applicability of the B.C. standard volume model of lodgepole pine for application to the subregion FIZ E, with specified parameters e₀ = 0.08 and e₁ = 0.10 in logarithmic units, γ = 0.05, α = β = 0.05 and n₀ = 50

Acknowledgement

I am grateful to my major advisor, Dr. Valerie LeMay, for her guidance and encouragement. I would like to thank my committee, Drs. A. Kozak, P. Marshall and D. Tait from the Dept. of Forest Resources Management, and Dr. N. Heckman from the Dept. of Statistics. Their contributions to my thesis and forest biometrics education were enormous, as was their patience and willingness to help at all times.

I thank Dr. A. Kozak, my former major advisor, for accepting me as a Ph.D. candidate and assisting me in obtaining the data for this research from the Inventory Branch, B.C. Ministry of Forests. Thanks also go to Drs. P. Marshall and V. LeMay for their initial works, which inspired me to carry out this research.
I have also profited from working with the other graduate students in the Biometrics Hut.

Funding for this research was supplied by the University of British Columbia through various fellowships.

Finally, I would like to thank my wife, Wan Ju, for her patience and encouragement.

Chapter 1

Introduction

Although the growth of trees involves a complex biological process, and the forest is a dynamic ecosystem, timber management shares the general character of other production processes. That is, given levels of inputs, the process will result in certain outputs, which mean a profit or loss for the owners. Intervening in, or modifying, the production process may help the owners to maximize profit. The goal of timber management is to maintain or increase the yield of the timber resources. To achieve this goal, forest managers must make many decisions. Examples include selecting the optimum age to harvest (rotation age), the levels of planting density, and the timing of thinnings. Whether these decisions can be made intelligently depends on the availability of accurate and up-to-date information about the managed forest, especially quantitative information on the measurable characteristics of the forest. Since the basic management unit of the forest is a stand, and any stand is an aggregation of trees, the basic information for timber management decision-making is measurements of tree or stand characteristics (e.g., diameter, height, volume and density).

To help forest managers obtain quantitative information, forest mensurationists and biometricians have developed many mathematical functions for various purposes. For example, to sample a forest population, the relationship between a main variable and an auxiliary variable may be represented as a regression model. Based on this relationship, the parameter of the main variable can be efficiently estimated using ratio/regression estimators, since measurements of the auxiliary variable are usually easier to obtain or previously available.

To estimate the total or merchantable volume of a stand or forest, a volume equation may be developed that relates tree volume to a few easily measured variables (e.g., tree diameter and height), which reduces the difficulty of estimating tree volume. To obtain information on future forest volumes, a forest growth or yield model may be developed for making predictions. Based on this model, a yield table can be established to provide the information required for forest management decision-making. As the intensity of forest management increases, the use of forest growth and yield simulation models, which include a system of forestry prediction models, becomes more and more common in forest management activities.

In the history of applying mathematical functions, much attention in forestry has been focused on the following: (1) suggesting appropriate sampling methods for model development (optimal sampling design for model estimation) (e.g., Penner 1989, Marshall and Wang 1993); (2) proposing improved approaches for estimating the model parameters (model estimation) (e.g., Burkhart 1986, Gregoire 1987, Kozak 1988, LeMay 1990); and (3) validating the ability of the developed models to represent the real systems (model validation), or assessing the ability of the developed models to predict the dependent variable (accuracy, or prediction error testing) (e.g., Freese 1960, Reynolds 1984, Reynolds et al. 1988, Gregoire and Reynolds 1988). The research on suggesting new sampling and estimation methods provides forest researchers with better models, which give more accurate predictions, a better representation of real systems, or models with superior properties required for making statistical inferences about model parameters (i.e., constructing confidence intervals or conducting hypothesis tests).
Model validation builds confidence for using a developed model.

The specific problem addressed in this dissertation is the assessment (testing) of the applicability of forestry models, particularly tree volume estimation models. Applicability testing of forestry models is an extension of model validation; it focuses on assessing the predictive ability of a forest model when applied to a population other than that for which it was developed. The predictive ability (accuracy) of a model is defined as the closeness of the predicted or estimated values to their target values.

Applicability testing of forestry models is a common practice in forest management. Examples can be found from the early stages of checking a tree volume table established by hand-drawn curves (e.g., Bruce 1920) to the validation of the complicated computer simulation models of today (e.g., Goulding 1979; Reynolds et al. 1981). The importance of this problem is based on the fact that the procedures of model specification, parameter estimation, and model validation are usually based on data collected from a specific forest population (e.g., a particular tree species) over a large geographic region of forest lands. Developed in this way, a forestry model is valid only for making estimates or predictions for the same population for which it was developed. However, in many situations, an existing forestry model may be considered for application to a subpopulation (e.g., a subregion), a new population (e.g., the same species in a different geographic region, or a similar species), or the same population following some change (e.g., following a forest fire or silvicultural treatment). In these situations, the above assumption for applying regression models may not hold, and the accuracy of the developed model becomes uncertain. Using an existing model in these situations may result in inaccurate estimates; the user may then make incorrect management decisions (e.g., annual allowable cut levels too low or too high).

To test the applicability of a forestry model, a new sample set is required from the population to which the model will be applied (the application population). The accuracy of the model is then determined by comparing the predicted or estimated values with the actual observed values. This research is concerned with the appropriate statistical procedures that can be used to draw conclusions about the accuracy of models based on comparing the predicted and actual values. In forestry, a number of papers have been published suggesting such statistical procedures (e.g., Freese 1960, Reynolds et al. 1981, Reynolds 1984, Reynolds et al. 1988, Gregoire and Reynolds 1988). These procedures treat the differences between the predicted or estimated values and the actual values (model errors) as random variables. Statistical hypothesis testing or estimation procedures are conducted to infer the parameters (mean or variance) of these random variables.

Although these statistical procedures can be used for detailed evaluations of the predictive ability (accuracy) of forestry models, they must be based on a sample set with a predetermined, sufficiently large size from the application population (i.e., they are fixed sample size procedures). In many situations, a decision to accept or reject an existing model may be made based on data from a smaller sample set, especially when the actual accuracy of the model is far below or above the accuracy requirement of the user. A fixed sample size procedure is then considered to be inefficient for applicability testing. This is particularly true when data collection is highly expensive, time-consuming, or destructive, as with data for volume model development. Therefore, alternatives to fixed sample size procedures should be suggested for applicability testing of forestry models. The desired criteria for the alternative procedures were:

1. The accuracy determination of a model should be made with a minimum sample size, or at least a smaller sample size than required by an equally reliable fixed sample size procedure;

2. The accuracy of a model should be explicitly tested;

3. The use of the alternative procedure should not preclude the use of the available fixed sample size procedures; and

4. The testing procedures should be easy to apply under field conditions.

The principal objectives of this research were: (1) to review forestry and other literature, and from this literature to propose alternative procedures which meet the desired criteria for applicability testing of forestry models, particularly tree volume estimation models; (2) to test the reliability of the alternative procedures for various common errors of tree volume models; and (3) to suggest and illustrate appropriate procedures for applying the alternative procedures.

To meet the first objective, a literature search was conducted, first by examining forestry literature, and then extending this to the biometrics, technometrics, industrial quality control and statistics literature. In order to limit the scope of this research, only model validation procedures based on comparing the estimated and observed values (prediction error testing or estimation procedures) were examined. No procedures were found in the literature which satisfied all of the desired criteria for applicability testing of forestry models. However, the information from the literature was used to develop three new testing procedures, labelled sequential accuracy testing plans (SATP) I, II and III. The developed SATP procedures are extensions of Freese's (1960) procedure of accuracy testing using Wald's (1947) sequential probability ratio test (SPRT) plan.
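The sequential logic that Wald's SPRT contributes can be illustrated with a short sketch. The code below is not the thesis's SATP itself; it is a generic SPRT for choosing between two hypothesized error variances, assuming the model errors are independent N(0, σ²) observations, with the standard boundaries log(β/(1−α)) and log((1−β)/α). The function and variable names are illustrative, not taken from the thesis.

```python
import math

def sprt_variance(errors, var0, var1, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: sigma^2 = var0 vs. H1: sigma^2 = var1 (var1 > var0),
    assuming the model errors are iid N(0, sigma^2).

    Returns a (decision, n_used) pair, where decision is 'accept H0',
    'reject H0', or 'continue sampling' if neither boundary was crossed."""
    lower = math.log(beta / (1.0 - alpha))   # accept-H0 boundary, log B
    upper = math.log((1.0 - beta) / alpha)   # reject-H0 boundary, log A
    llr = 0.0                                # cumulative log-likelihood ratio
    for n, e in enumerate(errors, start=1):
        # contribution of one observation to log[f1(e)/f0(e)] for N(0, var)
        llr += 0.5 * math.log(var0 / var1) + 0.5 * e * e * (1.0 / var0 - 1.0 / var1)
        if llr <= lower:
            return "accept H0", n
        if llr >= upper:
            return "reject H0", n
    return "continue sampling", len(errors)

# Errors far more variable than H0 allows: the plan rejects H0 after a few trees.
decision, n_used = sprt_variance([2.5, -3.1, 2.8, -2.9], var0=1.0, var1=4.0)
```

A fixed sample size test with the same nominal α and β would require the full predetermined sample; here sampling stops as soon as the cumulative log-likelihood ratio leaves the (log B, log A) interval, which is the source of the sample-size savings pursued in this thesis.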
Since Freese's (1960) idea of accuracy testing was adopted, the SATP procedures can be used to explicitly test the accuracy of a model. As sequential sampling plans, the characteristic feature of the SATP procedures is that the number of observations required is not preset in advance, but is a function of the observations themselves. This feature allows applicability testing of forestry models to be realized with a minimum sample size, especially when the actual accuracy of the tested model is very high or very low. In order to assist potential users of the SATP procedures, an approximate Operating Characteristic (OC) equation for determining the probability of making a correct decision and an approximate Average Sample Number (ASN) equation indicating the expected average sample size required to reach a terminal decision were also derived.

To meet the second objective, Monte Carlo simulations were conducted to examine the reliability of the SATP procedures, using normal or mixed normal distribution generators to simulate different errors found in tree volume models. These simulated error populations included normally distributed errors with (1) a zero mean and a constant variance; (2) a non-zero mean and a constant variance; and (3) a zero mean and non-constant variances. Two modifications of Wald's (1947) SPRT, taking more than one observation at each stage (group sequential sampling) and setting an upper limit to avoid the situation of a very large sample size (truncation), were also examined.

Finally, to meet the third objective, field procedures for applying the SATP procedures were suggested, and an example was constructed to illustrate the suggested application procedures.

The contributions of this research are:

1. This research lays the foundation for future research on the application of sequential analysis to the applicability testing of forestry models, and thereby brings to forest researchers' attention the concept of sequential analysis for forestry model evaluation;

2. The results of this research provide an alternative to fixed sample size procedures for applicability testing of forestry models, especially for circumstances where data collection is highly expensive, time-consuming, or destructive; and

3. Although tree volume models were addressed, the methods and results of this research are applicable for determining the applicability of other forestry models with the same distributional error assumptions. The results of this research will also benefit other scientific fields where mathematical models are developed for estimation or prediction purposes.

The dissertation has been organized according to the following format. The background information for this research is presented in Chapter 2. The extension of Freese's (1960) procedure into a sequential testing procedure is presented in Chapter 3. The methods and results of Monte Carlo simulations for testing the reliability of the SATP procedures are presented in Chapter 4. The proposed field application procedures and an application example are in Chapter 5. The conclusions and recommendations for further research can be found in the last chapter.

Chapter 2

Literature Review

To achieve the first objective of this research, to propose alternatives to fixed sample size procedures for applicability testing of tree volume models, a literature search was carried out, first by examining forestry literature, and then by extending it to the biometrics, technometrics, industrial quality control, and statistics literature. A summary of this literature search is presented in this chapter. The first section is a presentation of the methods for tree volume estimation.
A brief review of the problem and history of applicability testing of tree volume models is presented in the second section. In section three, some widely used statistical procedures for validating forestry models using a fixed sample size are reviewed and discussed in detail. Variable sample size (sequential sampling) procedures for sampling forestry populations are presented in section four. Finally, an overview of the results of this literature search can be found in the last section.

2.1 Tree Volume Estimation

Tree volume information is essential for timber management decision-making. Any tree is composed of a bole or stem, a root system, and the branches and leaves that collectively make up the crown. However, a forester speaking of individual tree volumes is commonly referring to either the total volume or the commercially marketable volume of the tree bole. The total tree volume includes the volume of the main bole of a tree from ground to tree tip, whereas the merchantable tree volume includes the volume of the merchantable part of the main bole only. Methods for estimating tree volumes in forestry have been well researched, and many mathematical functions (tree volume models) have been proposed for this purpose (see Clutter et al. 1983, p.8; and Loetsch and Haller 1973, p.154).

Historically, the relationship between tree volume and other tree measurements was represented as a volume table. A volume table provides the average volume of standing trees of various sizes and species (Avery 1975, p.92). The basis for the use and construction of tree volume tables was given by Cotta in the 19th century (see Husch et al. 1982, p.157). Such tables are based on mathematical relationships between tree volume and a few easily measured tree attributes (e.g., diameter, height) for a given species and region. These relationships can be expressed as

    V = f(dbh, H, F)    (2.1)

where V is either total tree volume or merchantable volume; dbh is tree diameter outside bark at breast height (1.3 metres above ground); H is the total tree height or the height to some specified upper-stem merchantability limit of tree diameter inside bark (e.g., 6 cm, 8 cm, etc.); and F is tree form. The commonly used measures of tree form are ratios of diameters at specified heights to dbh, called form quotients (Spurr 1952). Also, since other variables besides dbh, H and F may affect the volume of a tree, an additive or multiplicative error, e, will be associated with the function above in representing tree volume (i.e., a random error term). A tree volume table derived from this function is known as a form class volume table. Because of the difficulty of measuring the upper diameter of a tree, the construction of this type of table is mainly for research purposes. Functional forms including only dbh, or dbh and height, are more common. Tree volume tables derived from these functional forms are called local tree volume tables and standard tree volume tables, respectively. Today, volume tables are rarely used in forestry practice; instead, calibrated volume functions (tree volume models) are directly used for estimating volume. Some commonly used tree volume models are:

1. Schumacher and Hall (1933)

    V = β₀(dbh)^β₁ H^β₂ e    (2.2)

2. Logarithmic Schumacher and Hall

    log(V) = β₀ + β₁ log(dbh) + β₂ log(H) + e    (2.3)

3. Combined variable (Spurr 1952)

    V = β₀ + β₁(dbh)²H + e    (2.4)

where log is the logarithm (base 10); e is the error term; and β₀, β₁ and β₂ are unknown parameters that need to be estimated.

To develop a tree volume model, a large sample of trees is selected from the given species and forest region where the volume model will be applied.
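The three model forms above can be written as short functions. This is a minimal sketch for illustration; the coefficient values used in the last line are hypothetical placeholders, not fitted values for any species, and log denotes the base-10 logarithm as in the text.

```python
import math

def schumacher_hall(dbh, height, b0, b1, b2):
    """Schumacher-Hall (1933) model, Equation 2.2, with the error term
    omitted: V = b0 * dbh^b1 * H^b2."""
    return b0 * dbh ** b1 * height ** b2

def log_schumacher_hall(dbh, height, b0, b1, b2):
    """Logarithmic form, Equation 2.3, error term omitted:
    log10(V) = b0 + b1*log10(dbh) + b2*log10(H)."""
    return b0 + b1 * math.log10(dbh) + b2 * math.log10(height)

def combined_variable(dbh, height, b0, b1):
    """Spurr's (1952) combined-variable model, Equation 2.4, error term
    omitted: V = b0 + b1 * dbh^2 * H."""
    return b0 + b1 * dbh ** 2 * height

# Hypothetical coefficients for illustration only -- not fitted values.
v_hat = combined_variable(dbh=30.0, height=25.0, b0=0.02, b1=0.00004)
```

Note that the two Schumacher-Hall forms are the same model on different scales: exponentiating the logarithmic prediction (base 10) reproduces the multiplicative form with β₀ replaced by 10^β₀.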
Besides dbh and the total height, the diameters at intervals along the stem of each sampled tree are accurately measured using a precise optical instrument, or sample trees are felled and measured. Using the measurements of sectional diameters, standard formulae (e.g., Huber's or Smalian's formulae; see Husch et al. 1982) are then applied to determine the volume of each section, and these are summed to obtain the stem volume. Using this large data set, the parameters of a tree volume model can be estimated. In the early years of volume table construction, graphical methods, such as the harmonized-curve method (e.g., Baker 1925) and alignment-chart methods (e.g., Reineke and Bruce 1932), were used to relate the measured volume to the dbh and height of the sampled trees. However, all tree volume models used today are fitted using least-squares regression.

In developing and using tree volume models, some common problems are encountered. First, since the variation of tree volume usually increases as the tree size (dbh and height) increases, the regression assumption that the variances of the dependent variable are equal across all levels of the independent variables (homogeneous variances) is violated. In this situation, the least-squares estimated coefficients are still unbiased. Since tree volume models are developed mainly for the purpose of estimation, many tree volume models with heterogeneous variances of the estimated volumes are commonly used in forestry. However, statistical inferences based on the standard error of estimate (SEE) and the standard errors of the estimated coefficients of these models will be invalid. This can be improved by using weighted least-squares regression to estimate the coefficients of the model, if the appropriate weight is known or can be estimated.

Cunia (1964) studied the variance patterns of tree volumes for some common tree species of North America (black spruce (Picea mariana (Mill.) B.S.P.), balsam fir (Abies balsamea (L.) Mill.), jack pine (Pinus banksiana Lamb.) and yellow birch (Betula alleghaniensis Britton (Betula lutea Michx.f.))). He found that the variance of tree volumes could usually be estimated using one of the following models:

    sᵥ² = k₁(dbh)^k₂    (2.5)

    sᵥ² = k₁(dbh)²H̄    (2.6)

    sᵥ² = k₁ + k₂(dbh) + k₃(dbh)²    (2.7)

where sᵥ² is the estimated variance of total tree volume; H̄ is the average tree height for a given dbh class; and k₁, k₂ and k₃ are unknown constants. Clutter et al. (1983) tabulated the assumed variance patterns for some common volume equations.

Logarithmic tree volume models (e.g., the logarithmic Schumacher and Hall (1933) model) were found to better meet the assumptions of constant variance and normality for the estimated logarithmic volumes. This model form is widely used for tree volume estimation in North America. For example, the standard tree volume equations suggested for the main species in the province of British Columbia (B.C.) are all logarithmic forms (Watts 1983, p.431-432). Two problems exist in using the logarithmic models: (1) the SEE associated with a logarithmic model is not comparable to the SEE of other models with the dependent variables in tree volume units; and (2) by taking logarithms, the estimated arithmetic means are automatically replaced by the estimated geometric means, and because the first is always larger than the second, the volumes are consistently underestimated. To solve the first problem, Meyer (1938) applied a theorem from the calculus of errors: if the standard error (σₓ) of a statistic x is known, the standard error, σf, of any function f(x) of x can be approximated as

    σf ≈ (df(x)/dx) × σₓ    (2.8)

When this method is applied, the standard error of estimate in volume units (Sᵥ) for the logarithmic Schumacher and Hall (1933) model can be approximated as follows. For log V = x and V = 10^x, this becomes:

    f(x) = 10^x
    df(x)/dx = 10^x × ln 10

Therefore, the approximate standard error of estimate in volume units is:

    Sᵥ ≈ 10^(log V) × ln 10 × SEE = 2.3026 × 10^(log V) × SEE    (2.9)

where ln is the natural logarithm and SEE is the standard error of estimate associated with the logarithmic model. The percentage standard error of estimate in volume units (Sᵥ(%)) is then:

    Sᵥ(%) = 100 × (2.3026 × 10^(log V) × SEE) / 10^(log V) = 230.26 × SEE    (2.10)

Alternatively, Meyer (1938) showed that Sᵥ(%) could also be approximated as:

    Sᵥ(%) ≈ 100 × (10^SEE − 1)    (2.11)

Replacing 10^SEE in Equation 2.11 by the first two terms of the expansion of its Taylor series, Equation 2.11 will also equal 230.26 × SEE. Meyer indicated that Equation 2.10 was appropriate when SEE was less than 0.10; in cases where SEE was greater than 0.10, Equation 2.11 should be used.

For solving the problem of underestimated volumes using logarithmic models, Meyer (1938) indicated that the underestimate depends upon the SEE. He suggested correction factors for different values of SEE (see Spurr 1952, p.74 for a detailed discussion). Other methods based on the properties of the lognormal distribution were discussed by Flewelling and Pienaar (1981).

2.2 Applicability Testing of Tree Volume Models

Applicability testing of tree volume models (tables) appears to have been first mentioned by Bruce (1920). He stated that the accuracy checking of a volume table was applicability testing when the table was applied to a population other than that for which it was developed. Spurr (1952, p.122) indicated that applicability testing of tree volume tables was used to answer the question of whether a volume table should be constructed locally for each species, or whether a single volume table could be applied to a given species wherever it occurs. Using data from 1021 red spruce, black spruce and balsam fir trees, he investigated the effect of species, type of growth, site, and locality on estimating total cubic-foot volume using standard tree volume tables. He concluded that "the locality, type of growth, and site where a tree grows apparently do not affect the total cubic-foot volume
He concluded that "the locality, type of growth, and site where a tree grows apparently do not affect the total cubic-foot volume sufficiently to justify the development of more than one volume table for a given species. Different species, however, do have different volumes for trees of the same dimensions, but this difference may usually be expressed as a constant percentage correction which is independent of tree size. One volume table, therefore, may be used for a number of species, with the final volume estimate being corrected by a single percentage to adjust for the species being estimated." Spurr's conclusion implies that a well-established standard tree volume model may be applied to a larger geographic region, and to different tree species, if the source of the model errors can be identified and a correction factor can then be used to modify the model estimates.

In applicability testing of tree volume models, the volume of sample trees should be compared with the estimated volume from the volume table to be tested (Husch et al. 1982). In order to do this, the sample trees first must be selected appropriately from the population where the tree volume model will be applied, and the volume of each sample tree must be accurately determined. Second, appropriate criteria or statistical procedures should be used to determine the accuracy of the estimation of the model (i.e., how close the estimated volume is to the actual volume). Husch et al. (1982, p. 156) suggested the following conditions for selecting sample trees for testing the applicability of a tree volume table:

1. Sample trees for a given species, or species group, should be well distributed through the population to which the volume table will be applied;

2. No size, type, or growing conditions should be unduly represented in the sample; and

3. If a sample of cut trees is used, this sample, if not representative of the timber, should be supplemented by a sample of standing trees.

Bruce (1920) suggested that the aggregate deviation and the average deviation between the estimated and actual volumes should be used to evaluate the accuracy of tree volume tables. He defined the aggregate deviation as the difference between the sum of the actual volumes and the sum of the estimated volumes, expressed as a percentage of the latter, and the average deviation as the arithmetic sum of the absolute values of the differences between actual and estimated volumes, expressed as a percentage of the sum of the estimated volumes. Spurr (1952, p. 75) indicated that the aggregate deviation is an indication of the freedom of the volume table from bias and should ordinarily not exceed one percent, while the average deviation indicates primarily the variability inherent in the data used, and may often be as high as 10 percent. He also indicated that these two measures are valid tests of precision, especially in the graphic and alignment-chart methods where the curves are fitted on the basis of the aggregate and average deviations. However, if the tree volume tables are constructed using the least-squares method, then it is advisable to assess precision in terms of the squares of the deviations rather than in terms of the deviations themselves. Husch et al. (1982, p. 156) stated that the aggregate deviation between the measured and estimated volumes of sample trees in an applicability check of tree volume tables, for practical purposes, should not exceed 2CV/√n, where CV is the coefficient of variation of the volume table being tested, and n is the number of trees used in the test.
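Bruce's two deviation measures and Husch et al.'s practical limit can be sketched in Python (illustrative only; the function names are assumptions, not part of the original text):

```python
import math

def aggregate_deviation_pct(actual, estimated):
    """Bruce's (1920) aggregate deviation: (sum(actual) - sum(estimated))
    expressed as a percentage of sum(estimated)."""
    return 100 * (sum(actual) - sum(estimated)) / sum(estimated)

def average_deviation_pct(actual, estimated):
    """Bruce's average deviation: sum of |actual - estimated| expressed
    as a percentage of sum(estimated)."""
    return 100 * sum(abs(a - e) for a, e in zip(actual, estimated)) / sum(estimated)

def husch_limit_pct(cv_pct, n):
    """Husch et al.'s practical limit for the aggregate deviation: 2CV/sqrt(n),
    with CV given in percent."""
    return 2 * cv_pct / math.sqrt(n)
```

For example, with CV = 10% and n = 25 sample trees, the limit is 2 × 10/5 = 4 percent, and a computed aggregate deviation smaller in absolute value than this limit would satisfy the criterion.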
If this criterion is met, the table is applicable without correction.

2.3 Fixed Sample Size Procedures for Validating Forestry Models

The applicability testing of forestry models is an extension of forestry model validation. Van Horn (1971) defined model validation as "the process of building an acceptable level of confidence that an inference about a simulated process is a correct or valid inference for the actual process". In other words, a model validation process involves testing the "usefulness" or "validity" of a model. Testing the usefulness or validity of a model implies that (1) we have established a set of criteria for differentiating between those models which are "useful" or "valid" and those which are "not useful" or "not valid"; and (2) we are able to apply these criteria to any given model. These two aspects of model validation require an understanding of the nature of the problem, including how the model estimation or prediction will be used and the impact of errors on these uses, as well as the availability of statistical procedures that are designed to fit the conditions of the problem (Reynolds et al. 1981, p. 350).

Model validation may be considered an essential component of the model-building process, which provides users with some indication of how well the model will perform in the real population. In this situation, model validation can be carried out by splitting the available data into an estimation set and a validation set, and no new data set is needed (i.e., the cross-validation approach; see Snee 1977 for detailed discussions). However, since applicability testing of forestry models is used when a model is considered for application to a population other than that for which it was developed, new data must be collected from the application population.

In forestry, a number of papers have been published on the philosophy and methodology of model validation (e.g., Freese 1960; Reynolds et al.
1981; Reynolds 1984; Gregoire and Reynolds 1988; and Marshall and LeMay 1990). In general, the method for validating an estimation or prediction model is to compare the data (y) obtained from the actual population of interest with corresponding data (ŷ) generated from the model. The essence of this comparison is to examine the model errors d (d = y − ŷ), or the relative errors, d/y. Widely used statistical procedures for this purpose include:

1. Freese's (1960) procedure of accuracy testing and its modifications (i.e., Rennie and Wiant 1978; Reynolds 1984; and Gregoire and Reynolds 1988);

2. Reynolds et al.'s (1981) procedures for validating stochastic simulation models;

3. Reynolds' (1984) estimation procedures; and

4. Other procedures (i.e., the paired t-test, analysis of variance, regression procedures, etc.).

The use of these procedures requires a data set from the application population with a predetermined, sufficiently large sample size; these procedures are therefore called fixed sample size procedures.

2.3.1 Freese's procedure of accuracy testing and its modifications

2.3.1.1 Freese's procedure

Freese (1960) developed a statistical procedure for determining the accuracy of a new estimating or measuring technique against an accepted standard. His procedure used a standard chi-square test for a hypothesized variance.
Freese's idea of accuracy testing was based on comparing estimated and observed values against an established standard. This idea was realized by first stating the accuracy required (or the inaccuracy that would be tolerated) by the user of the technique, then estimating the accuracy obtained by the technique, and finally determining whether the accuracy of the technique met the accuracy required through statistical testing of a hypothesis.

Freese's (1960) procedure of accuracy testing included three chi-square test statistics. The first test statistic was designed to determine the accuracy of the technique in the measurement units of the variable of interest. This test statistic is:

χ²_(n)df = Σ (y_i − ŷ_i)² / σ0² = Σ d_i² / σ0²   (2.12)

where χ²_(n)df represents a chi-square distributed variable with n degrees of freedom under the assumptions explained in Section 2.3.1.2; y_i and ŷ_i are the pair of observed and estimated values, respectively; d_i are the differences between observed and estimated values (also called the errors of a model in this dissertation); n is the sample size; and σ0² is the hypothesized variance representing the precision required by the user.

In applying this test statistic, the potential user of the technique is first required to specify an allowable error limit (e) with a desired probability level, 1 − γ. These values specify that for acceptance of the model, 100(1 − γ) percent of the estimated values (ŷ_i) must be within e units of the actual values (y_i). The specified values are then used to calculate the hypothesized variance as:

σ0² = e² / z²_{1−γ/2}   (2.13)

where z_{1−γ/2} is the 100(1 − γ/2) percentile of the standard normal distribution. Using a specified probability of a type I error, α, the model will be considered acceptable if the computed χ²_(n)df is below χ²_{1−α}(n), the 100(1 − α) percentile of the chi-square distribution with n degrees of freedom. If the model is accepted, it means that the user's accuracy requirement specified by e and γ is met.
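Freese's test I (Equations 2.12 and 2.13) can be sketched in Python. Here the Wilson-Hilferty approximation stands in for chi-square tables, and the function names are illustrative assumptions, not part of the original text:

```python
from statistics import NormalDist

def chi2_ppf(q: float, n: int) -> float:
    """Wilson-Hilferty approximation to the 100q percentile of the
    chi-square distribution with n degrees of freedom."""
    z = NormalDist().inv_cdf(q)
    c = 2.0 / (9.0 * n)
    return n * (1.0 - c + z * c ** 0.5) ** 3

def freese_test_1(observed, estimated, e, gamma=0.05, alpha=0.05):
    """Freese's (1960) chi-square test I.

    Accepts the model if at least 100(1 - gamma)% of the errors are judged
    to be within +/- e units, at significance level alpha.
    Returns (chi2_statistic, critical_value, accepted).
    """
    n = len(observed)
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    sigma0_sq = e ** 2 / z ** 2          # hypothesized variance (Equation 2.13)
    stat = sum((y - yh) ** 2 for y, yh in zip(observed, estimated)) / sigma0_sq  # (2.12)
    crit = chi2_ppf(1.0 - alpha, n)      # upper 100(1 - alpha) percentile
    return stat, crit, stat < crit
```

For three trees with errors of 0.1, 0.1 and 0.05 volume units against an allowable limit of e = 1, the statistic is far below the critical value and the model would be accepted.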
Since the sum of squared errors (Σ d_i²) is a measure of the total error, Freese (1960) indicated that this statistic would reject inaccurate techniques regardless of the source of inaccuracy (large bias, lack of precision, or both).

The second test statistic was designed to test the accuracy of a technique in relative units. The explanation for using this test is that specifying a percent error limit (p) is more convenient than specifying the error limit in absolute units (e) in many practical applications:

χ²_(n)df = Σ (d_i/y_i)² / σ0² = [100 z_{1−γ/2} / p]² Σ (d_i/y_i)²   (2.14)

where the hypothesized variance, σ0² = p² / (100 z_{1−γ/2})²; and γ is specified to define an acceptable model such that at least 100(1 − γ) percent of the relative errors, d/y, must be within ±p/100. With a significance level of α, if the computed χ²_(n)df is below χ²_{1−α}(n), the accuracy requirement of the user specified as p and γ is met, and the technique will be acceptable. This test statistic also rejects an inaccurate technique regardless of the source of inaccuracy.

The last test of Freese's (1960) procedure is a "bias-free" test. This test is considered useful since some techniques may be capable of providing excellent accuracy if the bias (systematic errors) can be removed. For different assumptions about the bias, two possible test statistics may be used for this test.

1. If the bias is assumed to be the same for all values of y, and the magnitude of the bias can be estimated by the mean error (d̄) of the technique, the test statistic is:

χ²_(n−1)df = Σ (d_i − d̄)² / σ0²   (2.15)

where χ²_(n−1)df is a chi-square distributed variable with n − 1 degrees of freedom under the assumptions explained in Section 2.3.1.2.

2. If the bias is assumed to increase or decrease directly with the true values in a linear fashion, the bias-free accuracy test requires fitting a linear regression of the estimated values on the true values, and computing the sum of squared errors (SSE) about this regression.
In this case, the test statistic is:

χ²_(n−2)df = SSE / σ0²   (2.16)

where χ²_(n−2)df is a chi-square distributed variable with n − 2 degrees of freedom under the assumptions explained in Section 2.3.1.2; and σ0² in Equations 2.15 and 2.16 is defined as in Equation 2.13. The application of this bias-free test is the same as for the first test. For convenience, the three chi-square test statistics (i.e., Equations 2.12, 2.14 and 2.15) will be denoted as Freese's chi-square tests I, II and III throughout this dissertation.

2.3.1.2 Underlying assumptions of Freese's procedure

Since it was proposed, Freese's (1960) procedure of accuracy testing has been widely used for evaluating the accuracy of forestry models (e.g., Evert 1981; Hazard and Berger 1972; Moser and Hall 1969). However, the basic assumptions underlying this procedure were not explicitly stated by Freese. Consequently, users of the procedure have not always known when the assumptions of the procedure have been violated. Recognizing this problem, Reynolds (1984), and Gregoire and Reynolds (1988), presented discussions on the basic underlying assumptions and derivation of Freese's procedure. The main points of these discussions are summarized in this section.

Freese's (1960) procedure to determine how well a model will perform involves formulating statistical hypothesis tests for the distribution of the errors, d, or the relative errors, d/y. In discussing the distributions of d, Reynolds (1984, p. 455-457) indicated that two types of distributions may relate to d. These are the conditional distribution of d for a given input vector, x (i.e., the values of the independent variables), and the unconditional distribution of d, where d is generated from input vectors that are randomly selected from all possible x. The accuracy related to the unconditional distribution of d is called the average accuracy, a measure of the "average" accuracy over all values of x.
The basic assumption underlying Freese's chi-square test I is that the n differences are a random sample from the unconditional distribution of d, and that d_1, d_2, ..., d_n are independent and identically distributed (iid) normal variables.

The derivation of Freese's (1960) test I as shown by Reynolds (1984, p. 457) requires the user to specify values of e and γ such that if:

P(|d| ≤ e) ≥ 1 − γ   (2.17)

the accuracy of the model will be acceptable. Since γ is usually small, acceptance of the model means that the differences between the estimated and observed values should be less than e with high probability. In other words, if at least 100(1 − γ) percent of the absolute values of the model errors, |d|, are less than or equal to e, the model is acceptable. If the normally distributed variable, d, can be further assumed to have a zero mean (μ_d = 0) and a variance, σ², Reynolds showed that the accuracy statement represented by Equation 2.17 will be satisfied if:

e² ≥ σ² χ²_{1−γ}(1)   (2.18)

where χ²_{1−γ}(v) is the 1 − γ percentile of the chi-square distribution with v degrees of freedom. Equation 2.18 results since d²/σ² has a chi-square distribution with 1 degree of freedom under these assumptions, and thus:

P(d²/σ² ≤ χ²_{1−γ}(1)) = 1 − γ   (2.19)

Also, since χ²_{1−γ}(1) = z²_{1−γ/2}, Equation 2.18 can be rewritten as:

σ² ≤ e² / z²_{1−γ/2}   (2.20)

This inequality represents a variance bound on the errors, d, for satisfying the accuracy requirement given as Equation 2.17. Freese's test I is designed to test whether the variance of d exceeds this bound by setting σ0² = e² / z²_{1−γ/2}, and testing the hypotheses:

H0: σ² ≤ σ0²  vs.  H1: σ² > σ0²   (2.21)

Although not explicitly stated by Freese, the derivation of this procedure assumes that the model estimation is unbiased (i.e., μ_d = 0). If the model estimation is biased, then in order to satisfy Equation 2.17 for a given pair of e and γ, the true variance of d would need to be even smaller than the bound (Equation 2.20).
If the bias is large enough, for given values of e and γ, it may not be possible to meet the accuracy requirement represented by Equation 2.17 no matter how small the true variance of d.

The underlying assumptions and derivation of Freese's (1960) chi-square test II are similar to those of test I. The difference is that the variable being examined is the relative error, d/y, instead of d, where y is the actual observed value of the dependent variable. The underlying assumption of this test is that the unconditional distribution of d/y is normal with a zero mean and a variance, σ²_{d/y}. Similar to test I, Reynolds (1984) indicated that the specified values, p and γ, were used to establish the accuracy requirement:

P(100 |d/y| ≤ p) ≥ 1 − γ   (2.22)

Under the assumptions above, and similar to test I, the following relationship holds:

P((d/y)² / σ²_{d/y} ≤ χ²_{1−γ}(1)) = 1 − γ   (2.23)

The accuracy requirement (Equation 2.22) will be satisfied if:

p² / 100² ≥ σ²_{d/y} χ²_{1−γ}(1)   (2.24)

or if:

σ²_{d/y} ≤ [p / (100 z_{1−γ/2})]²   (2.25)

To confirm the accuracy requirement (Equation 2.22), Freese's (1960) procedure is used to determine whether the variance of the unconditional distribution of d/y exceeds the variance bound (Equation 2.25) by testing the hypotheses:

H0: σ²_{d/y} ≤ σ0²  vs.  H1: σ²_{d/y} > σ0²   (2.26)

where σ0² = p² / (100 z_{1−γ/2})². Reynolds (1984, p. 458) also indicated that the distributional assumptions of Freese's tests I and II are contradictory, since if the distribution of d is normal then the distribution of d/y will not be normal, and vice versa. Therefore, when applying these two tests, the examination of normality should be conducted for the appropriate variable, d or d/y.

The normality assumption provides the crucial link to the χ² distribution, which is central to Freese's (1960) procedure of accuracy testing. Gregoire and Reynolds (1988) studied the sensitivity of Freese's procedure to non-normality through simulation experiments. In their study, three non-normal distributions were used.
These were a uniform distribution on the interval from −1 to 1, a double exponential distribution, and a mixture of two normal distributions, N(0, 1) and N(0, σ²), mixed in a ratio of 9-to-1 favoring the N(0, 1), where σ = 2, 3, 4, and 5. They concluded that non-normality can very seriously distort the operating characteristic and power of Freese's chi-square tests, and that the deterioration increases with increasing sample size.

Given the derivations of Freese's (1960) tests I and II, the derivation of Freese's chi-square test III is straightforward. The maximum likelihood estimate of the variance of d is Σ (d_i − d̄)² / n, instead of Σ d_i² / n, if the expected value of d is non-zero and unknown. Therefore, the measure of the accuracy of a model should be Σ (d_i − d̄)², instead of Σ d_i². However, Reynolds (1984, p. 457) indicated that this test is only applicable when the bias of a model is removable. For the bias to be removable, the user of a model must understand the nature of the model bias (e.g., negative or positive), and the magnitude of the model bias must be estimated using an appropriate estimate (two situations were indicated by Freese). The simplest situation is when the bias is the same for all estimated values, and the mean of the errors is the appropriate estimate of the bias.

The difference between Freese's (1960) test III and the others is that this test is used to determine the accuracy of a model after the bias is eliminated. The sources of error cannot be separated when using Freese's tests I and II; the total errors are compared with the specified requirement of accuracy (hypothesized variance). If the model is unbiased, then the results using Freese's tests I and III would be the same under the assumptions of normality and constant variance.

2.3.1.3 Modifications of Freese's procedure

Three modifications to Freese's (1960) procedure have been suggested to improve its usefulness for evaluating the predictive ability of forestry models.
First, the specification of the error or percent error limit, e or p, is sometimes not straightforward, and it depends on the user's knowledge about the desired level of accuracy. When different users are involved, one user's specified value may be too accurate or not accurate enough for others (Rennie and Wiant 1978). To overcome this problem, some researchers (Bell and Groman 1971; Boehmer and Rennie 1976) inverted Freese's testing procedure to calculate the maximum anticipated error (E*) and percent error (E*(%)), which are the maximum e and p, respectively, that will result in rejecting the model. Rennie and Wiant (1978) gave the formulae for calculating these values:

E* = z_{1−γ/2} [Σ d_i² / χ²_{1−α}(n)]^{1/2}   (2.27)

E*(%) = 100 z_{1−γ/2} [Σ (d_i/y_i)² / χ²_{1−α}(n)]^{1/2}   (2.28)

where γ is the specified probability level associated with the allowable error limit, e or p; and α is the specified probability of the type I error for testing the hypotheses given as Equations 2.21 or 2.26. If the user-specified e or p is larger than the calculated E* or E*(%), the tested model should be acceptable. If the user-specified e or p is less than the calculated E* or E*(%), the model will be unacceptable. Ek and Monserud (1979) used these calculated maximum anticipated errors as criteria to evaluate two stand growth models; the model with the smaller E* and E*(%) was considered to be better.

Reynolds (1984) stated that since the null hypothesis of Freese's (1960) original procedure was H0: σ² ≤ σ0², the model would be judged adequate unless there was strong evidence to show that it was not. This formulation may favor the acceptance of a model. Reynolds therefore suggested modifying Freese's null hypothesis to H0: σ² ≥ σ0², which would then be tested using the same test statistic as Freese's test I (Equation 2.12).
This modified test will reject the null hypothesis and accept the model at a significance level α when the calculated statistic (χ²_(n)df) is below χ²_α(n), the lower-tail 100α percentile of the chi-square distribution with n degrees of freedom. Reynolds indicated that this modified testing procedure is more conservative than the original procedure, since stronger evidence is demanded to judge a model adequate. He stated that it is probably preferable to the original procedure for most model users, who need to be reasonably sure that the model will meet their requirements.

Reynolds also related the maximum anticipated error, E*, to a confidence interval estimate. That is, under the assumptions that d is a normally distributed variable with a zero mean and a variance, σ², a 100(1 − α) percent confidence interval for the 1 − γ quantile of the distribution of |d| can be constructed as (C_L, C_U), where:

C_L = [χ²_{1−γ}(1) Σ d_i² / χ²_{1−α/2}(n)]^{1/2}   (2.29)

C_U = [χ²_{1−γ}(1) Σ d_i² / χ²_{α/2}(n)]^{1/2}   (2.30)

Instead of making a decision to accept or reject a model, this interval estimate provides some indication of how large the errors are expected to be if the model is used for estimation. That is, the interpretation of (C_L, C_U) is that one can be 100(1 − α) percent sure that 100(1 − γ) percent of the model errors will be within (C_L, C_U) if the model is repeatedly used for estimation.

Gregoire and Reynolds (1988, p. 306) further discussed Freese's (1960) original procedure and Reynolds' (1984) modified procedure. They indicated that Freese's procedure may be too liberal for typically small values of γ and α, in the sense of not requiring
reasonable evidence of accuracy attainment, and that Reynolds' modified procedure may be too conservative in the sense of requiring too stringent evidence for accepting a model. They then suggested a three-decision procedure combining the advantages of the two previous modifications of Freese's procedure (i.e., the calculation of the maximum anticipated error (E*) and Reynolds' (1984) modified testing procedure). The decision rules of their suggested three-decision procedure are:

1. If e < C_L, conclude that the actual accuracy of the model is less than that required (specified as e and γ);

2. If C_L ≤ e < C_U, no confident conclusion can be made about the accuracy of the model based on the available data; and

3. If e ≥ C_U, conclude that the actual accuracy of the model exceeds the required level.

Among these three modifications of Freese's (1960) procedure, the idea of calculating the maximum anticipated error or percent error has been applied by many researchers for forestry model testing (e.g., Boehmer and Rennie 1976; Ek and Monserud 1979).

2.3.1.4 Previous applications of Freese's procedure

Evert (1981) applied Freese's (1960) test I to validate five mortality models, which were developed from thirty permanent sample plots provided by the Canadian Forestry Service at the Petawawa National Forestry Institute, Chalk River, Ontario. The evaluation data included five periods of measurements on three permanent sample plots belonging to the same series of plots used for model estimation; they were not used in the development of the models. Instead of specifying the error limit, e, the standard error of estimate of the evaluated model was set as the hypothesized variance σ0² representing the required accuracy. The result was that the calculated Freese's chi-square statistics were all less than the tabulated values at a significance level of 0.05.
All five models were considered to be acceptable.

Nevers and Barrett (1966) applied Freese's (1960) test I to determine the accuracy of height accumulation volumes from penta prism caliper measurements. Based on the estimated volumes of 20 standing white pine (Pinus albicaulis Engelm.) trees, and the actual volumes from measurements made by climbing the trees, they concluded that at least 95% of the estimated tree volumes from the penta prism were within 2 cubic feet (0.0566 m³) of their actual volumes (no α level reported). In their application, Wilcoxon's signed rank test was used first to check the unbiasedness of d; it showed the presence of a highly significant bias in the estimated volumes. Similarly, Cost (1971) also applied Freese's test I to evaluate the accuracy of standing-tree volume estimates based on McClure Mirror caliper measurements. He first measured the upper diameters of 25 sample trees using the instrument to determine the estimated volumes, and then felled the sample trees to determine the true volumes. Wilcoxon's signed rank test was also used to test the unbiasedness of d in this application. The accuracy requirement, e, was set to 1.5 cubic feet (0.0425 m³) with γ set as 0.05. The conclusion was that at least 95% of the errors of the individual tree volume estimates from McClure Mirror caliper measurements were expected to be within 1.5 cubic feet (no α level reported).

Moser and Hall (1969) used Freese's test II to check the accuracy of a growth and yield function of basal area, which was obtained from the mathematical integration of a growth-rate equation. The measurements from 40 permanent sample plots were used as the actual values; the required accuracy was that the predicted values should be within 10 percent of the actual values. The conclusion was that for the 12-year period, the derived function provided the necessary accuracy for predicting average basal area.
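Applications like these can also be summarized by inverting the test to report the maximum anticipated percent error of Equation 2.28. A Python sketch (the Wilson-Hilferty approximation stands in for chi-square tables, and the function names are illustrative assumptions):

```python
import math
from statistics import NormalDist

def chi2_ppf(q: float, n: int) -> float:
    """Wilson-Hilferty approximation to the 100q percentile of the
    chi-square distribution with n degrees of freedom."""
    z = NormalDist().inv_cdf(q)
    c = 2.0 / (9.0 * n)
    return n * (1.0 - c + z * math.sqrt(c)) ** 3

def max_anticipated_percent_error(observed, estimated, gamma=0.05, alpha=0.05):
    """Rennie and Wiant's (1978) E*(%) (Equation 2.28): the largest p that
    still leads to rejection, so a user-specified p above E*(%) implies the
    model is acceptable."""
    n = len(observed)
    z = NormalDist().inv_cdf(1 - gamma / 2)
    ss_rel = sum(((y - yh) / y) ** 2 for y, yh in zip(observed, estimated))
    return 100 * z * math.sqrt(ss_rel / chi2_ppf(1 - alpha, n))
```

For instance, four trees each over- or under-estimated by 5 percent yield an E*(%) of roughly 6.4, so a user whose required p is 10 percent would accept the model, while one requiring 5 percent would not.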
Hazard and Berger (1972) studied the applicability of a ponderosa pine (Pinus ponderosa Laws.) volume table when applied to different species and geographical locations. A random sample of 50 trees was selected from the test area, which had a uniform distribution of trees ranging in dbh from 5 to 40 inches (12.7 to 101.6 cm). The computed volumes based on measurements made using a Barr and Stroud optical dendrometer, model FP-15, were used as the true volumes for comparison. A p value of 10 percent was specified as the required accuracy. The result of Freese's (1960) test II showed that the volume table failed to meet the accuracy requirement. They then studied the differences between the estimated and true volumes, and found that the differences ranged from -62.3 cubic feet (-1.7643 m³) to +5.6 cubic feet (0.1586 m³). Based on this result, they concluded that Freese's test II is influenced considerably by bias, and not just by the variation between the two methods of volume estimation.

Thies and Harvey (1979) applied Freese's (1960) chi-square test II to determine the accuracy of a photographic method for measuring tree defect. The required accuracy was that the areas of defect estimated by the photographic method be within ±5% (i.e., p) of the true areas. The conclusion was that the photographic method met the required accuracy standard.

2.3.2 Procedures for validating forestry simulation models

Simulation models are widely used to help understand complicated systems or processes. In forestry, tree and stand simulation models have become an important tool for decision-making in forest management. With the increased use of simulation models has come an awareness of the need to validate a model before it can be used with confidence.

A bibliography on the validation of simulation models is given by Balci and Sargent (1984). The philosophical aspects of validating simulation models were well discussed
in the papers by Van Horn (1971), Naylor and Finger (1967), and Mankin et al. (1977). In general, these authors agreed that simulation model validation is a multi-stage process of examination (Naylor and Finger 1967, p. 95). The initial stages of this process involve examining the structure and operation of the model to make sure that it is working as intended (verification stages). The next stages involve comparing model output with what is observed in the real system (validation stages).

Many statistical procedures have been proposed in the computer simulation and other literature for validation. Examples include the two-sample Hotelling's T² test for models with multivariate response variables (Balci and Sargent 1982); the decision-analytic approach for minimizing the expected losses based on the type I and II errors resulting from model acceptance or rejection (Greig 1979); and the spectral analysis approach or statistical hypothesis testing procedures for validating time-series simulation models (Fishman and Kiviat 1967; Naylor and Finger 1967; Feldman et al. 1984). However, these proposed procedures are only applicable under the specific conditions for which they were designed. They are in general not appropriate for testing forestry prediction models, especially tree volume models.

In forestry, Reynolds et al. (1981) suggested that statistical hypothesis testing approaches, which combine n independent tests of the same hypothesis into one overall test, may be used. They had several reasons for this suggestion. First, the input variables (x) of forestry simulation models usually involve a wide range of possible levels. For example, the initial stand and site characteristics (e.g., stand age, site index and basal area) input to a stand model may include a wide range of values for a given forest management unit.
Therefore, the goal in developing and using a forestry simulation model is to be able to predict the conditional distribution of the variable of interest (y). In other words, conditional, not unconditional, accuracy should be emphasized in simulation model validation. The appropriate null hypothesis for validating forestry simulation models is then H0: F(y | x) = G(y | x) for all −∞ < y < ∞ and x ∈ A, where F(y | x) and G(y | x) are the true and predicted conditional distributions of y given x, respectively; and A represents some specified set of x values. Second, using a sample set of n pairs of observations, (y_i, x_i), selected from a population, the method to compare the simulated distribution, G(y | x), with F(y | x), is usually to make m runs of the simulation model for each given x_i. This gives m independent and identically distributed values, say z_i = (z_i1, z_i2, ..., z_im), where z_i represents the m predicted values conditioned on x_i. In this way, Reynolds et al. (1981) indicated that since each of the n pairs of values (y_i, z_i), for i = 1, ..., n, was generated under different values of x_i, they might represent different initial stand and site conditions. To compare the true and predicted conditional distributions, the data obtained in this way should not be grouped into one large set; instead, the n independent pairs must be kept separate. Also, since there is only one actual observed value (y_i) in every individual pair, any test applied to an individual pair would not have much power by itself.

They then proposed six parametric testing procedures and three nonparametric procedures. One of these testing approaches assumes that F(y | x_i) and G(y | x_i) are normal distributions with the same mean and variance. Because of the independence of y_i and the z_ij, the statistic for each pair (y_i, z_i) is:

t_i = (y_i − z̄_i) / (s_i √(1 + 1/m))

which has a t-distribution with m − 1 degrees of freedom, where z̄_i = Σ z_ij / m and s_i² = Σ (z_ij − z̄_i)² / (m − 1).
The sum of these test statistics is:

U = Σ t_i

Under the null hypothesis and for m ≥ 4, the statistic

u = U / √[n(m − 1)/(m − 3)]

has approximately a standard normal distribution. Thus, the test can be carried out by comparing the observed value of the statistic with the appropriate critical value from the standard normal distribution. The idea behind the other proposed testing approaches is very similar. Reynolds et al. (1981) also indicated that the difference between this and Freese's (1960) procedure is that Freese's procedure is useful for determining whether the predicted value ŷ is close to the actual value y, whereas their procedures are useful for determining whether the distribution of ŷ is close to the distribution of y.

2.3.3 Reynolds' estimation procedures

Freese's (1960) procedure and its modifications were designed to compare the accuracy of a model against an established standard stated by users. However, in many cases of model validation, the question that the user needs to answer is not whether the model meets a particular standard, but rather what magnitude of error can be expected when the model is used for estimation or prediction. Reynolds (1984, p. 461) stated that for the latter, statistical estimation approaches are more appropriate. Instead of simply accepting or rejecting a model, estimation approaches provide the user with some indication of how far predictions using the model will be from their actual values. In this way, they may be more meaningful for the user. Also, they are applicable even when bias is present.

Reynolds (1984) then suggested various interval estimates (confidence interval, prediction interval and tolerance interval) for the mean of d. In general, if the user wants to estimate the true mean of d based on the observed errors of a model, a 100(1 − γ) percent confidence interval should be used.
If the model is used for making future predictions, a $100(1-\gamma)$ percent prediction interval of a future value of $d$ would be appropriate. The tolerance interval might be used when the users want to know the upper and lower limits (tolerance limits) for the distribution of $d$, which contain at least a given percent of the entire distribution of $d$. As with Freese's (1960) procedure, all these suggested interval estimates require the normality assumption for $d$, except for the non-parametric prediction interval for $k$ future values of $d$. For any continuous distribution of $d$, this prediction interval is defined as:
\[
P(\mathrm{Min}(d) \leq d_j \leq \mathrm{Max}(d)) = 1 - \gamma_1 \tag{2.31}
\]
and
\[
1 - \gamma_1 = \frac{n(n-1)}{(n+k)(n+k-1)} \tag{2.32}
\]
where $\mathrm{Min}(d)$ and $\mathrm{Max}(d)$, respectively, are the smallest and largest errors $d$ in the current sample; $d_j$ for $j = 1, \ldots, k$ are $k$ future values of $d$; and $1 - \gamma_1$ is the probability that all $k$ future errors, $d_j$, are within $(\mathrm{Min}(d), \mathrm{Max}(d))$. This interval means that the smallest and largest values of $d$ from the current sample give a $100(1-\gamma_1)$ percent prediction interval for all $k$ future values of $d$.

To assist with the application of Reynolds' (1984) interval estimation procedures, Rauscher (1986) developed a PC BASIC program, ATEST, which accepts the input of errors ($d_i$ for $i = 1, \ldots, n$) and outputs the calculated confidence interval, prediction interval and tolerance interval. The program also provides a routine to test the normality of the errors, $d_i$, and the relative errors, $d_i/y_i$. If the normality assumption is not met, the program calculates the various intervals based on the sample 10-percent trimmed mean and the jackknife variance, which are more robust to non-normality, instead of using the common mean and variance.

2.3.4 Other fixed sample size procedures

Besides the procedures discussed in previous subsections, other statistical procedures for comparing the estimated and actual values of a model were also found in the forestry literature.
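The coverage formula in Equation 2.32 is simple to evaluate when planning how many errors to collect; a minimal helper (the function name is an assumption):

```python
def np_prediction_coverage(n, k):
    """Coverage probability 1 - gamma_1 of the nonparametric prediction
    interval (Min(d), Max(d)) for k future errors, per Equation 2.32."""
    return n * (n - 1) / ((n + k) * (n + k - 1))
```

For example, with $n = 30$ current errors and $k = 1$ future error, the sample extremes cover the future error with probability $30 \cdot 29 / (31 \cdot 30) \approx 0.935$, and coverage improves as $n$ grows for fixed $k$.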
A brief discussion of these procedures follows.

The first procedure found was a paired $t$ test¹ for the null hypothesis $H_0: \mu_d = 0$, where $\mu_d$ is the mean of the normally distributed errors, $d$. It is well known that the test statistic of this test is $t_{n-1} = \bar{d}/S_{\bar{d}}$, where $\bar{d}$ and $S_{\bar{d}}$ are the sample mean and standard error of $d$, respectively, and $t_{n-1}$ has a Student's $t$ distribution with $n - 1$ degrees of freedom. Since $\bar{d}$ and $S_{\bar{d}}$ are sample measures of the bias and precision of a technique, respectively, Freese (1960, p.145) indicated that the paired $t$ test was not suitable for the purpose of accuracy testing, because it compares one form of accuracy (bias) to the other form (lack of precision), frequently with anomalous results.

The analysis of variance (ANOVA) procedure appears to be a logical choice to compare the estimated and actual values of a technique (model) when the observations from a population can be grouped (e.g., tree dbh or height classes). Reynolds et al. (1981, p.352) pointed out that the ANOVA procedure is not strictly appropriate for model validation since it requires the same variance within each group, and this is not likely to be true in many applications of forestry model testing.

West (1983, p.186) used the Kolmogorov-Smirnov (K-S) test of goodness of fit to evaluate the predictive ability of a forest simulation model. Naylor and Finger (1967) indicated that this distribution-free (nonparametric) test is concerned with the degree of agreement between a simulated (predicted) and an observed series of the model's response variable.

Finally, the regression approach can be used to regress the observed values on the estimated or predicted values. A test is then used to confirm whether the resulting regression equation has an intercept and a slope which are not significantly different from zero and unity, respectively. This procedure was used to assess the predictive abilities of three stand growth models by Daniels et al.
(1979), and the accuracy of aluminum band dendrometers for measuring the diameter growth of trees by Auchmoody (1976).

¹The pairs in model testing will be the predicted and observed values, $\hat{y}_i$ and $y_i$ for $i = 1, \ldots, n$, which are determined by different sets of values for the independent variables, $x_i$.

In an interesting study conducted by Ek and Monserud (1979), all procedures presented in this subsection, as well as Freese's (1960) procedure, were used to evaluate two tree growth models (FOREST and SHAF). However, no conclusion was made on which procedure was better for that application.

2.4 Variable Sample Size Procedures for Testing Forestry Populations

In the last section, procedures for forestry model validation were reviewed. These procedures can be used for detailed evaluation of the accuracy of a forestry model. However, to apply these procedures, a sufficiently large sample set with a preset size (e.g., a large sample of trees or plots) must be collected before any testing or estimating process can be carried out. These procedures are not efficient for applicability testing of forestry models, because the purpose of an applicability test is to classify an existing model into two wide classes (i.e., acceptable or unacceptable), and it is possible that a decision about the applicability of the model can be made with a small sample size. This is particularly true when the actual accuracy of the model is far below or above the accuracy requirement specified by the user. Therefore, alternatives to the fixed sample size procedures are needed. In sampling theory, a variable sample size procedure (sequential sampling) may be used as an alternative to the fixed sample size procedures.

In the statistics literature, sequential sampling procedures for testing or estimating the parameters of a population are generally called sequential analysis approaches.
Conceptually, sequential analysis is a method of statistical inference with the characteristic feature that the number of observations required by the procedure is not determined in advance (Wald 1947, p.1). The basic theory of sequential analysis for statistical hypothesis testing was developed by Wald in America and Barnard in Britain at about the same time during World War II (Wetherill and Glazebrook, 1986). Because of its value, the sequential approach was classified as restricted in America and was not made available for wider use until after World War II (Wald 1947, p.3). The advantage of a sequential analysis approach, as applied to testing statistical hypotheses, is that it requires, on average, a substantially smaller number of observations than an equally reliable test based on a fixed sample size procedure (Wald, 1947).

In discussing the problem of considering a model fitted for one population for application to a different population, Marshall and LeMay (1990) indicated that three types of approaches could be applied. First, a new sample set could be selected and a new model fitted (independent sampling approach). Second, the parameters of the existing model could be adjusted using new sample data (adjustment approach). Third, a sequential sampling plan could be used to sample until a decision regarding the applicability of the existing model is made, and a new model fitted only if necessary (sequential sampling approach). They concluded that among these three approaches, the sequential sampling approach is the only one which explicitly tests the applicability of the existing model. Also, for a given sampling design and precision requirement, a sequential sampling approach requires the least number of samples.

Brack and Marshall (1990) made an attempt to apply the sequential approach for checking model estimation.
In their study, the sequential sampling approach was used to check the applicability of a mean dominant height (MDH) equation for radiata pine (Pinus radiata D. Don) plantations in specific cutting units in New South Wales, Australia. In their sequential testing plan, the checking of model estimations was considered to be a sequence of independent Bernoulli trials with two possible outcomes (acceptable and unacceptable), obtained by comparing each observed relative error, $d_i/y_i$, to a preset limit (5% or 10% of the actual value). A binomial variable (the sum of the number of acceptable predictions) was then used to measure the accuracy of the model. By using this procedure, they showed that the sequential approach could reach a decision on whether the estimated MDH of the model was acceptable or unacceptable in as little as three plots (sample size). However, since a Bernoulli variable can take only the value of 0 or 1, much information about model errors was ignored. They suggested that other, better measures of model accuracy should be used instead of a binomial variable in order to use the sequential approach for model testing.

These two initial discussions and attempts indicated that sequential sampling procedures may be potentially superior to fixed sample size procedures for the purpose of applicability testing of forestry models.

Many applications of sequential sampling plans to pest surveys have been found in the forestry and agriculture literature. In discussing the advantages of sequential sampling plans for forest pest surveys, Fowler and Lynch (1987a) stated that the sequential approach is intuitively appealing in that very few observations (little work, time, or money) are needed to make a terminating decision when insect populations are very sparse or very abundant. Such plans required, on average, only 40 to 60% as many observations as an equally reliable fixed-sample procedure.
Therefore, sequential procedures should find fairly wide applicability where the objective is to classify insect densities or damage into two or three broad classes (e.g., treatment and non-treatment) for management decisions, and where observations are time-consuming, costly, or destructive. These properties of sequential sampling procedures exactly meet the desired criteria for alternative procedures for the applicability testing of forestry models.

Based on the similarity of purpose between a forest pest survey and an applicability test of forestry prediction models, and also on the potential advantages of the sequential approach for achieving the objective of this research, a search of the forestry and other literature for the theory and application of sequential analysis approaches was carried out.

2.4.1 Sequential probability ratio tests

2.4.1.1 Wald's general decision rule for SPRT plans

The most widely used sequential sampling plans in forestry are the sequential probability ratio test (SPRT) plans. The theory of SPRT plans was developed by Wald (1945, 1947). A detailed review of its basic theory and application will be given first.

The SPRT, in general, is a statistical procedure mainly for hypothesis testing. It is designed to test a simple hypothesis, $H_0: \theta = \theta_0$, against another simple hypothesis, $H_1: \theta = \theta_1$, where $\theta$ is the parameter of interest, and $\theta_0$, $\theta_1$ are two class limits of population classification for decision-making purposes (e.g., acceptable and unacceptable, or no need for treatment and need for treatment, etc.).² To design an SPRT procedure, besides the two class limits, $\theta_0$ and $\theta_1$, the desired levels of probability for making type I and II errors, $\alpha$ and $\beta$, are all preset. However, the sample size, $n$, is not preset, but is a random variable (variable sample size), which depends on the values of the observations taken in each random experiment.

For any given sample, the SPRT divides the $n$-dimensional sample space ($S_n = \{x_1, x_2, x_3,
\ldots, x_n\}$) into three mutually exclusive sets (i.e., $S_n = A_{n0} \cup A_{n1} \cup A_n$) for each $n$. The observations, $x_1, \ldots, x_n$, are sequentially sampled from the given distribution (population). After the first random observation $x_1$ is taken, $H_0$ is accepted if $x_1$ lies in $A_{10}$; $H_0$ is rejected if $x_1$ lies in $A_{11}$; or a second observation $x_2$ is taken if $x_1$ lies in $A_1$. If $x_1$ is in $A_1$ and a second observation $x_2$ is taken, $H_0$ is accepted, $H_0$ is rejected, or a third observation $x_3$ needs to be taken according to whether the point $(x_1, x_2)$ lies in $A_{20}$, $A_{21}$ or $A_2$. This process stops only when $H_0$ has been either accepted or rejected.

In practical applications, the basic theory is realized through the following procedures. At each stage of the test, an observation is taken at random from the given distribution and the likelihood ratio:
\[
R_n = \prod_{i=1}^{n} \frac{f(x_i, \theta_1)}{f(x_i, \theta_0)} \tag{2.33}
\]
is computed from the total number of observations taken up to that point ($n$), where $f(x_i, \theta_0)$ and $f(x_i, \theta_1)$ are the probability density functions under the null and alternative hypotheses, respectively. Based on the computed probability ratio, $R_n$, one of the following decisions is made at each stage:

$R_n \leq B$: stop sampling and accept $H_0$;
$R_n \geq A$: stop sampling and reject $H_0$; or
$B < R_n < A$: continue sampling.

This is called the decision rule of Wald's SPRT procedure. $A$ and $B$ are two constants chosen so that $0 < B < A$, and so that the test has the desired preset values of $\alpha$ and $\beta$. Wald (1947) gave the approximate formulae for determining $A$ and $B$ as:
\[
A \approx \frac{1-\beta}{\alpha}, \qquad B \approx \frac{\beta}{1-\alpha}
\]
The approximate equalities are obtained by ignoring the possible overshooting (exceeding) of the decision boundaries. In other words, $R_n \geq A$ is approximated as $R_n = A$, and $R_n \leq B$ is approximated as $R_n = B$.

²In a statistical context, $\theta_0$ and $\theta_1$ are also called the null and alternative parameters, respectively.
$R_n$ can be simplified by taking the natural logarithm ($\ln$) of each density function ratio in the product, which yields:
\[
\ln(R_n) = Z_n = \sum_{i=1}^{n} z_i \quad \text{and} \quad z_i = \ln\left[\frac{f(x_i, \theta_1)}{f(x_i, \theta_0)}\right]
\]
The decision procedure is then: (1) if $Z_n \geq a$, stop sampling and reject $H_0$; (2) if $Z_n \leq b$, stop sampling and accept $H_0$; and (3) if $b < Z_n < a$, continue sampling, where $a = \ln(A)$ and $b = \ln(B)$.

For some probability distributions (e.g., normal or binomial), using the cumulative sum of the observed log-likelihood ratios, $Z_n$, is equivalent to using the cumulative sum of the observations, $\sum x_i$, which is simple to calculate. Setting $Z_n = a$ and $Z_n = b$ and solving for $\sum x_i$ determines the upper rejection and lower acceptance boundaries, respectively. These decision boundaries are represented by two parallel lines ($L_0$ and $L_1$). The decision process is then: (1) if $\sum x_i \geq h_2 + sn$, stop sampling and reject $H_0$; (2) if $\sum x_i \leq h_1 + sn$, stop sampling and accept $H_0$; or (3) if $h_1 + sn < \sum x_i < h_2 + sn$, continue sampling, where the intercepts, $h_1$ and $h_2$, and the slope $s$ are calculated from knowledge of the underlying distribution and the specified values of $\theta_0$, $\theta_1$, $\alpha$, and $\beta$. The decision boundaries, $h_1 + sn$ and $h_2 + sn$, are also called the acceptance and rejection lines, respectively.

Fowler and Lynch (1987a) indicated that in the derivation of the SPRT, the assumptions made by Wald (1947) were:

1. Only one observation (sampling unit) is taken at each stage of the sequential process;
2. There is no predetermined upper limit to the number of observations taken before a terminating decision is made; and
3. A terminating decision is made as soon as a decision boundary is crossed.

In applying the SPRT procedure for hypothesis testing, the probability density function of the random variable of interest must be known (e.g., normal, binomial, etc.). Also, the two class limits for the null ($\theta_0$) and alternative ($\theta_1$) hypothesis parameter values of the underlying distribution, and the associated error probabilities, $\alpha$ and $\beta$, must also be specified.
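The log-form decision rule above can be sketched in a few lines, assuming Wald's approximations $A = (1-\beta)/\alpha$ and $B = \beta/(1-\alpha)$ (the function name and return strings are assumptions):

```python
import math

def sprt_decision(z_sum, alpha, beta):
    """One step of Wald's SPRT decision rule, applied to the cumulative
    log-likelihood ratio Z_n = sum of the z_i."""
    a = math.log((1 - beta) / alpha)   # a = ln(A), rejection boundary
    b = math.log(beta / (1 - alpha))   # b = ln(B), acceptance boundary
    if z_sum >= a:
        return "reject H0"
    if z_sum <= b:
        return "accept H0"
    return "continue"
```

With $\alpha = \beta = 0.1$, the boundaries are $a = \ln 9 \approx 2.20$ and $b = -\ln 9 \approx -2.20$, so sampling continues until the cumulative log-likelihood ratio drifts outside that band.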
Having met these conditions, an SPRT can be established and its decision boundaries given.

Before an SPRT procedure is actually carried out, it is desirable to know the properties of the test in order to determine its potential feasibility (e.g., expected sampling cost). Wald (1947) described the properties of an SPRT over all possible values of the random variable of interest with the Operating Characteristic (OC) and Average Sample Number (ASN) functions, and he also derived the general formulae of the OC and ASN functions for an SPRT procedure.

2.4.1.2 Wald's general OC function for SPRT plans

Wald's (1947) Operating Characteristic (OC) of an SPRT procedure was defined as:
\[
L(\theta) = P(\text{accept } H_0 \mid \theta)
\]
which is the probability of accepting the null hypothesis ($H_0$), assuming that $\theta$ is the true parameter. $L(\theta)$ is called the OC value for given $\theta$. Wald derived the general OC function of any SPRT plan (see Wald 1947, p.50) as:
\[
L(\theta) = \frac{A^{h(\theta)} - 1}{A^{h(\theta)} - B^{h(\theta)}} \tag{2.34}
\]
where $h(\theta)$ is the nonzero solution of:
\[
\int_{-\infty}^{\infty} \left[\frac{f(x, \theta_1)}{f(x, \theta_0)}\right]^{h(\theta)} f(x, \theta)\, dx = 1
\]
if the distribution of $x$ is continuous, or:
\[
\sum_{x} \left[\frac{f(x, \theta_1)}{f(x, \theta_0)}\right]^{h(\theta)} f(x, \theta) = 1
\]
if the distribution of $x$ is discrete.

In order to explain how to derive the OC equation for a specific SPRT procedure based on Wald's general OC function, an example of testing the standard deviation ($\sigma$) of a normal variable with known mean ($\mu$) is given. First, two class limits of the standard deviation ($\sigma_0$ and $\sigma_1$), and the desired probability levels of the type I and II errors ($\alpha$ and $\beta$), must be specified by the user. Based on these specified values, the hypotheses $H_0: \sigma = \sigma_0$ vs. $H_1: \sigma = \sigma_1$ can be established for classifying the true standard deviation of a normal variable $x$ with known mean into two classes.
By replacing $A$ and $B$ in Wald's general function (Equation 2.34) using the specified error probabilities, the OC equation for this specific SPRT is:
\[
L(\sigma) = \frac{\left(\frac{1-\beta}{\alpha}\right)^h - 1}{\left(\frac{1-\beta}{\alpha}\right)^h - \left(\frac{\beta}{1-\alpha}\right)^h} \tag{2.35}
\]
where $h = h(\sigma)$, which must be determined in order to calculate OC values using this equation. By Wald's (1947) definition, $h$ is the nonzero solution of:
\[
\int_{-\infty}^{\infty} \left[\frac{\sigma_0}{\sigma_1}\, e^{\left(\frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_1^2}\right)(x-\mu)^2}\right]^h \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx = 1 \tag{2.36}
\]
Wald (1947) showed that the above expression could be simplified as:
\[
\left(\frac{\sigma_1}{\sigma_0}\right)^h = \frac{1}{\sigma\sqrt{\dfrac{1}{\sigma^2} - \dfrac{h}{\sigma_0^2} + \dfrac{h}{\sigma_1^2}}} \tag{2.37}
\]
Instead of solving Equation 2.37 with respect to $h$, Wald (1947) suggested that it could be solved with respect to $\sigma$ to obtain:
\[
\sigma^2 = \frac{\left(\frac{\sigma_1}{\sigma_0}\right)^{2h} - 1}{\left(\frac{\sigma_1}{\sigma_0}\right)^{2h}\left(\dfrac{h}{\sigma_0^2} - \dfrac{h}{\sigma_1^2}\right)} \tag{2.38}
\]
With the use of Equations 2.35 and 2.38, the OC curve for this specific SPRT can be plotted. First, for any given value of $h$, $L(\sigma)$ and $\sigma$ are computed from Equations 2.35 and 2.38. The pair $[\sigma, L(\sigma)]$ obtained in this way gives a point on the OC curve, with $\sigma$ and $L(\sigma)$ as the horizontal and vertical axes, respectively. The computation of the pair $[\sigma, L(\sigma)]$ is then repeated for a sufficiently large number of values of $h$. Fowler (1978) indicated that $h$ from -4.0 to 4.0 with an interval of 0.5 ($h \neq 0$) would usually be sufficient for constructing the OC curve of an SPRT.

2.4.1.3 Wald's general ASN function for SPRT plans

The Average Sample Number (ASN) function is the expected (average) number of observations ($E_\theta(n)$) needed to make a terminating decision, assuming that $\theta$ is the true parameter. The general ASN function derived by Wald (1947) for any SPRT is:
\[
E_\theta(n) = \frac{L(\theta)\ln(B) + [1 - L(\theta)]\ln(A)}{E_\theta(z)} \tag{2.39}
\]
and
\[
E_\theta(z) = E_\theta\left(\ln\left[\frac{f(x, \theta_1)}{f(x, \theta_0)}\right]\right) \tag{2.40}
\]
where $E_\theta(n)$ is the ASN value for given $\theta$; $E_\theta(z)$ is the expected value of the log-likelihood ratio $z$; and $L(\theta)$ is the OC value for given $\theta$, obtained from Equation 2.34.
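For the normal standard-deviation example, Equations 2.35 and 2.38 give OC points, and substituting the normal-case value of $E(z)$ into Equation 2.39 gives the matching ASN value. A sketch of both calculations (function names are assumptions; Wald's $A = (1-\beta)/\alpha$ and $B = \beta/(1-\alpha)$ are used):

```python
import math

def oc_point(h, s0, s1, alpha, beta):
    """One point [sigma, L(sigma)] on the OC curve of the SPRT on the
    standard deviation of a normal variable with known mean
    (Equations 2.35 and 2.38); h must be nonzero."""
    A = (1 - beta) / alpha
    B = beta / (1 - alpha)
    L = (A**h - 1) / (A**h - B**h)
    # algebraically equivalent rearrangement of Equation 2.38
    var = (1 - (s0 / s1) ** (2 * h)) / (h * (1 / s0**2 - 1 / s1**2))
    return math.sqrt(var), L

def asn_point(h, s0, s1, alpha, beta):
    """Matching ASN value E_sigma(n) from Equation 2.39, using the
    normal-case E(z) = ln(s0/s1) + (1/(2 s0^2) - 1/(2 s1^2)) sigma^2."""
    A = (1 - beta) / alpha
    B = beta / (1 - alpha)
    sigma, L = oc_point(h, s0, s1, alpha, beta)
    ez = math.log(s0 / s1) + (1 / (2 * s0**2) - 1 / (2 * s1**2)) * sigma**2
    return (L * math.log(B) + (1 - L) * math.log(A)) / ez
```

A convenient check of such an implementation: at $h = 1$ the OC pair reduces to $[\sigma_0, 1-\alpha]$, and at $h = -1$ to $[\sigma_1, \beta]$.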
To calculate $E_\theta(n)$, the OC value and $E_\theta(z)$ must first be determined for a given $\theta$.

In order to explain how to derive the ASN equation for a specific SPRT procedure based on Wald's general ASN function, the example of testing the standard deviation ($\sigma$) of a normal variable with known mean ($\mu$) is again used. By replacing $A$ and $B$ in Wald's general ASN equation (Equation 2.39) with the specified values of $\alpha$ and $\beta$, and assuming the OC value ($L(\sigma)$) for given $\sigma$ has been calculated using the previous procedures, the ASN equation for this specific SPRT plan is:
\[
E_\sigma(n) = \frac{L(\sigma)\ln\left(\frac{\beta}{1-\alpha}\right) + [1 - L(\sigma)]\ln\left(\frac{1-\beta}{\alpha}\right)}{E_\sigma(z)} \tag{2.41}
\]
where $E_\sigma(z)$, the expected value of the log-likelihood ratio for this test, is:
\[
E_\sigma(z) = E\left(\ln\left[\frac{f(x, \sigma_1)}{f(x, \sigma_0)}\right]\right) = E\left[\ln\left(\frac{\sigma_0}{\sigma_1}\right) + \left(\frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_1^2}\right)(x - \mu)^2\right] \tag{2.42}
\]
\[
E_\sigma(z) = \ln\left(\frac{\sigma_0}{\sigma_1}\right) + \left(\frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_1^2}\right)\sigma^2 \tag{2.43}
\]
By replacing $E_\sigma(z)$ in Equation 2.41 with Equation 2.43, the ASN equation of an SPRT for testing the standard deviation of a normal variable with known mean becomes:
\[
E_\sigma(n) = \frac{L(\sigma)\ln\left(\frac{\beta}{1-\alpha}\right) + [1 - L(\sigma)]\ln\left(\frac{1-\beta}{\alpha}\right)}{\ln\left(\frac{\sigma_0}{\sigma_1}\right) + \left(\frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_1^2}\right)\sigma^2} \tag{2.44}
\]
where $\sigma^2$ is the true variance of the variable of interest. To construct the ASN curve for this test, the procedures are: (1) $h$ values for the OC equation (Equation 2.35) are chosen from -4.0 to 4.0 with an interval of 0.5 ($h \neq 0$); (2) for each $h$ selected in this interval, Equations 2.35 and 2.38 are used to determine $\sigma$ and $L(\sigma)$; (3) for each pair $[\sigma, L(\sigma)]$ obtained in this way, Equation 2.44 is used to calculate an ASN value; and (4) the pairs $[\sigma, E_\sigma(n)]$ are plotted to form the ASN curve.

2.4.1.4 Modifications of Wald's decision boundaries

As described in Section 2.4.1.1, the decision boundaries of Wald's (1947) SPRT may be represented as two parallel lines ($L_0$ and $L_1$). The decision process then involves comparing the cumulative sum of observations, $\sum x_i$, to these two decision lines to decide whether to stop or continue the testing at each stage of sampling. Wald (1947, p.158) proved that the probability is one that an SPRT plan will eventually terminate.
However, for the situation when the true parameter, $\theta$, of the population is close to the mean of the two class limits (i.e., $(\theta_0 + \theta_1)/2$), it is possible that the number of observations required by an SPRT plan will tend to be very large. This may result in a sample size which is even larger than that required by an equally reliable fixed sample size procedure.

To prevent this situation, Wald (1947) suggested modifying the decision boundaries of an SPRT plan to force a terminating decision to be made at some maximum number of observations, and called this the truncation of an SPRT plan. He also described a possible method for using truncation in practice. In this method, an upper limit ($n_0$) on the number of observations is first preset. Then, if $n_0$ is reached and no terminal decision can be made, a final decision may be made using the rule: (1) if $T_{n_0} \geq (a_{n_0} + r_{n_0})/2$, $H_0$ is rejected; and (2) if $T_{n_0} < (a_{n_0} + r_{n_0})/2$, $H_0$ is accepted, where $T_{n_0}$ is the cumulative observed value of the test statistic up to the sampling stage $n_0$ (i.e., $T_{n_0} = \sum d_i^2$ for the SATP I); $a_{n_0} = h_1 + sn_0$; and $r_{n_0} = h_2 + sn_0$. Because $a_{n_0}$ and $r_{n_0}$ are the values of the acceptance ($L_0$) and rejection ($L_1$) lines at the $n_0$ stage of sampling, respectively, they are also called the acceptance and rejection values at $n_0$. This rule will be denoted as Wald's (1947) rule of truncation for an SPRT plan. The modification will certainly change the resulting probabilities of the type I and II errors and the expected sample number (ASN) of an SPRT. However, Wald stated that if $n_0$ is chosen to be large enough (i.e., two or three times the calculated ASN value), the effect caused by the truncation will be small.

Originally, Wald (1947) derived the decision boundaries, and the general OC and ASN functions, of SPRT plans based on the assumption that the random observations are taken singly at every stage of sequential sampling.
Wald realized that this assumption was not practical in many situations, because taking random observations singly would increase the travel time between the sampling units and degrade the advantage of an SPRT plan. Wald stated that in these situations, group selection (i.e., more than one observation taken at each stage of sampling) might be used instead of single selection. The only possible effect of doing this would be an increase in the number of observations required by the test, and the resulting $\alpha$ and $\beta$ might be substantially smaller than their desired values. In other words, this modification may make an SPRT plan more conservative.

Wald (1947) did not provide any numerical evidence to confirm the real effects caused by these two modifications. Some studies in the forestry literature on the effects of these modifications will be presented in the next subsection.

2.4.1.5 Previous forestry applications and studies of Wald's SPRT plans

Numerous applications of sequential procedures can be found in the forestry as well as the agricultural literature. However, almost all of these applications used Wald's (1947) SPRT plans for pest surveys. Pieters (1978) tabulated 60 SPRT applications for agricultural and forest pest surveys. Fowler and Lynch (1987b) tabulated 70 SPRT plans (from 65 articles), of which 28 were for forest and 42 for agricultural pest surveys. The underlying distributions of the variables of interest in these applications included the binomial, negative binomial, normal, and Poisson distributions.
These applications of Wald's SPRT plans were designed to test the percent infestation ($q$) using a binomial or negative binomial distribution, or to test the mean density ($\mu$) (e.g., the average number of live larvae per plant) using a normal distribution with known variance or using a Poisson distribution. The formulae for determining the decision boundaries and the OC and ASN functions of Wald's (1947) SPRT plans for forest pest surveys were tabulated by Walters (1955), and by Fowler and Lynch (1987a). Talerico and Chapman (1970) developed a FORTRAN IV computer program (SEQUAN) to calculate the decision boundaries and the OC and ASN equations, and to plot the decision boundaries, for the binomial, negative binomial, normal and Poisson distributions.

In applying Wald's (1947) SPRT procedure, it is important to choose appropriate values for the two class limits and the levels of error probabilities. In pest surveys, the class limits (i.e., $\theta_0$ and $\theta_1$) represent economic thresholds or pest density levels. The choice of these values and the resulting gap between them (i.e., the interval between $\theta_0$ and $\theta_1$) depends on the biology and behavior of the insect and its damage, the relationship between pest densities and resulting damage, economic constraints, time or labor constraints, and other criteria. The choice of the levels of error probabilities (i.e., $\alpha$ and $\beta$) is mainly based on the perceived seriousness of each error (Fowler and Lynch 1987a, p.346).

Wald's (1947) OC and ASN functions are helpful in assisting users to design an optimal SPRT for a given problem. Since the problem of exceeding (overshooting) the decision boundaries was ignored in their derivation, Wald's OC and ASN equations are only approximate. Wald (1947) stated that the errors inherent in his equations because of exceeding the decision boundaries are small if $\alpha$ and $\beta$ are small (i.e., < 0.05), and $\theta_0$ and $\theta_1$ (the class limits) are sufficiently close together.
Because $\alpha$ and $\beta$ are usually 0.10, and the class limits are usually relatively wide for most sequential plans in forest pest management, several studies (i.e., Fowler 1978; Fowler 1983; Fowler and Lynch 1987a; and Lynch et al. 1990) have shown that the errors inherent in Wald's equations as a result of "overshooting" the decision boundaries of the SPRT can be large. The relative errors increased for the OC and ASN equations as the difference between the null ($\theta_0$) and alternative ($\theta_1$) test parameter values increased. Relative errors also increased for the ASN equation as the probabilities of type I and II errors increased. Wald's equations, in general, overestimated the true error probabilities and underestimated the true ASN. The practical consequences of these errors for the samplers of pest surveys are: (1) the actual error probabilities can be considerably smaller than the nominal error probabilities used to build the sampling plan; and (2) considerably more observations are taken in the field, on average, than necessary (Fowler 1983). Therefore, Wald's SPRT plans appear to be conservative testing procedures, because they actually give lower risk values (error probabilities) for a given amount of sampling. This could be considered a safety measure to assure reliable classification during decision making (Lynch et al. 1990).

In some situations, if the user considers Wald's (1947) equations not acceptable, two Monte Carlo alternative OC and ASN functions suggested by Fowler (1983) may be used. These are:

1. Nominal values of $\theta_0$, $\theta_1$, $\alpha$ and $\beta$ are first used to determine Wald's SPRT decision boundaries, and then, based on these decision boundaries, the Monte Carlo technique is used to estimate the actual OC and ASN curves associated with the determined SPRT procedure.
If the user considers the Monte Carlo OC and ASN curves acceptable, then these Monte Carlo OC and ASN curves, and the actual error probabilities, $\hat{\alpha}$ and $\hat{\beta}$, should be used instead of Wald's equations.

2. If the Monte Carlo OC and ASN obtained in (1) are not considered acceptable, the actual error probabilities, $\hat{\alpha}$ and $\hat{\beta}$, are then used to calculate new nominal levels of error probabilities, $\alpha'$ and $\beta'$. These are $\alpha' = \alpha^2/\hat{\alpha}$ and $\beta' = \beta^2/\hat{\beta}$. The new nominal error probabilities, $\alpha'$ and $\beta'$, are then used to determine new decision boundaries. The new SPRT procedure based on $\alpha'$ and $\beta'$ will have the desired levels of error probabilities (i.e., $\alpha$ and $\beta$).

For some entomology applications, the decision process of sequential sampling plans based on Wald's (1947) SPRT has been modified by taking more than one observation at each stage of the plan, by not making a terminating decision until some minimum number of observations has been taken, or by forcing a terminating decision to be made after some maximum number of observations. In practice, these modifications are made for biological and economic reasons after the sampling plan has been developed, and do not in any way affect the sequential decision lines. Thus, the modification is in applying the sampling plan and not in developing it. These deviations from the assumptions of Wald's SPRT will, of course, increase the inaccuracy of Wald's equations when used to describe the properties of the modified sampling plan. The errors caused by modifying Wald's assumptions were denoted as decision boundary modification errors by Fowler (1978). Fowler investigated the effects of these modifications on the accuracy of Wald's equations for a normal distribution using Monte Carlo simulation techniques. Based on
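Fowler's (1983) Monte Carlo idea can be sketched for the normal standard-deviation example used earlier in this section. The defaults for `reps` and `n_max`, the zero mean, and the midpoint truncation rule are illustrative assumptions, not Fowler's exact settings.

```python
import math, random

def mc_oc_asn(sigma_true, sigma0, sigma1, alpha, beta, reps=2000, n_max=500):
    """Monte Carlo estimates of the actual OC value and ASN of an SPRT on
    a normal standard deviation (mean assumed known and set to 0),
    with the plan truncated at n_max observations."""
    a = math.log((1 - beta) / alpha)          # upper (rejection) boundary
    b = math.log(beta / (1 - alpha))          # lower (acceptance) boundary
    const = math.log(sigma0 / sigma1)
    coef = 1 / (2 * sigma0**2) - 1 / (2 * sigma1**2)
    accepts = 0
    total_n = 0
    for _ in range(reps):
        z = 0.0
        decided = None
        n = 0
        for n in range(1, n_max + 1):
            x = random.gauss(0.0, sigma_true)
            z += const + coef * x * x         # log-likelihood ratio increment
            if z >= a:
                decided = "reject"
                break
            if z <= b:
                decided = "accept"
                break
        if decided is None:                   # truncated: split the continue region
            decided = "accept" if z < (a + b) / 2 else "reject"
        if decided == "accept":
            accepts += 1
        total_n += n
    return accepts / reps, total_n / reps
```

Running the sketch at $\sigma = \sigma_0$ and $\sigma = \sigma_1$ gives the actual acceptance probabilities (and hence $\hat{\alpha}$ and $\hat{\beta}$) that Fowler's procedure compares against the nominal levels.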
the simulation results, he concluded that taking more than one observation at each stage of the plan decreases the actual $\alpha$ and $\beta$ of the test and increases the actual ASN values. Truncating the decision process at some maximum number of observations increases the actual $\alpha$ and $\beta$ of the test and decreases the actual ASN values. Postponing a terminating decision until some minimum number of observations is taken decreases the actual $\alpha$ and $\beta$ of the test and increases the actual ASN values. He also concluded that the increase in the inaccuracy of Wald's equations due to these modifications depends on how far they deviate from the assumptions of Wald's SPRT and on what combination of modifications is used operationally. The seriousness of the inaccuracies depends on the magnitude of the inaccuracy and on the costs of making a decision.

For forest management purposes, applications of sequential sampling plans have been limited to surveying the adequacy of forest stocking. Smith and Ker (1958) applied a sequential sampling plan to assess the adequacy of the reproduction in the University of British Columbia's (B.C.) research forest at Haney. The sampling unit they defined was a cluster of four 1-milacre (0.001 acre or 0.0004 ha) quadrats along the sampling lines, and they used a test to confirm that the Poisson distribution was most appropriate for representing the mean stocking of four milacre quadrats of their data. Based on the results, they concluded that sequential sampling could result in considerable savings in the field work required in a reproduction survey. Fairweather (1985) gave a detailed discussion of applying a sequential sampling plan for assessing the adequacy of stocking. He indicated that the objective of stocking assessment is to classify forest lands into two broad classes, "stocked" and "non-stocked". Sequential sampling plans should be very efficient in this application, especially when the actual stocking of the forest is either very high or very low.
However, the disadvantages in using sequential sampling were: (1) the sample size could be very large when the true stocking percentage was between the acceptable and unacceptable limits; (2) selection of an inappropriate spatial distribution could result in an incorrect decision; and (3) theoretically, sample plots (either singly or in clusters) have to be located randomly on the site, which may reduce the efficiency of sequential sampling by increasing the travel time between plots. Fairweather conducted a Monte Carlo simulation study using stocking data obtained from five logging sites in the spruce-fir region of northern Maine (United States). Sequential sampling plans based on the Poisson and binomial distributions were tested. In his simulation, a maximum of 150 plots was arbitrarily set to stop the sampling in the "no decision" case. Fairweather also indicated that systematic selection, instead of random selection, appeared to be a valid method for assessing stocking adequacy.

2.4.2 Other sequential sampling plans used in forestry

Besides Wald's (1947) SPRT plans, two other types of sequential sampling plans have also been used in forest pest management.

Iwao's (1975) sequential sampling plans were designed to classify the mean number of forest pests per sampling unit (mean density, $\mu$) relative to a critical value of the mean density, $\mu_c$. In testing the null hypothesis $H_0: \mu = \mu_c$ against all other alternatives (i.e., $\mu < \mu_c$ or $\mu > \mu_c$), the basis used by Iwao for deriving the decision boundaries was different from that of Wald's SPRT plans. Iwao's sequential plans depend on a principal assumption: that the relationship between the variance ($\sigma^2$) and the mean density ($\mu$) of a population can be represented as:
\[
\sigma^2 = f(\mu) = (A + 1)\mu + (v - 1)\mu^2 \tag{2.45}
\]
where $A$ and $v$ are two constants determined by Iwao's (1968) linear relationship between Lloyd's (1967) mean-crowding index ($m^*$) and the mean density (i.e., $m^* = A + v\mu$, where $m^*$ is defined as $\mu + [(\sigma^2/\mu) - 1]$).
Based on this assumption, the acceptance and rejection boundaries (lines) of Iwao's (1975) sequential plans are the upper (T_u) and lower (T_l) limits of the 100(1 − α) percent confidence interval for the total number (T_n) of individuals of interest in n observations when H₀ is true, where:

    T_u = nμ_c + z_(1−α/2) √(n f(μ_c))    (2.46)

    T_l = nμ_c − z_(1−α/2) √(n f(μ_c))    (2.47)

and z_(1−α/2) is the 100(1 − α/2) percentile of a standard normal distribution.

By taking random observations from a population and recording the cumulative number of the individuals of interest in n observations (T_n), the null hypothesis is tested after each stage of sampling using the following rule: (1) if T_n > T_u, stop sampling and conclude μ > μ_c; (2) if T_n < T_l, stop sampling and conclude μ < μ_c; and (3) if T_l < T_n < T_u, continue sampling until a terminal decision can be made, or a predetermined maximum number of samples (n_max) is reached. A terminal decision can then be made by comparing the observed mean density, μ̂ = T_(n_max)/n_max, to the critical mean density, μ_c. Iwao (1975) defined n_max as:

    n_max = [z_(1−α/2)]² f(μ_c) / (D₀ μ_c)²    (2.48)

where D₀ is the required level of precision specified as the ratio of the standard error of the mean to the population mean density.

A discussion of the statistical aspects of Iwao's (1975) sequential plans was given by Nyrop and Simmons (1984). In general, the use of Iwao's plans relaxes the requirement of a known distribution of the variable of interest and the difficulty of specifying the class limits, θ₀ and θ₁, for applying Wald's (1947) SPRT plans. However, its precision depends upon the estimated relationship of the variance and the mean density of a population (i.e., Equation 2.45). Also, it is only appropriate for testing the mean densities for which a relationship between the variance and the mean has been defined. Some applications are found in Coggin and Dively (1982), and Mukerji et al. (1988).
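As a concrete illustration, Iwao's classification rule can be sketched in code. The sketch below assumes the variance-mean relationship of Equation 2.45; all function and parameter names (`iwao_boundaries`, `classify`, `A`, `nu`) are illustrative and not from the thesis.

```python
import math
from statistics import NormalDist

def iwao_boundaries(n, mu_c, A, nu, alpha=0.05):
    """Lower/upper decision limits (Eqs. 2.46-2.47) after n observations.

    The variance is modelled as f(mu) = (A + 1)*mu + (nu - 1)*mu**2,
    from Iwao's mean-crowding regression m* = A + nu*mu.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    f_mu = (A + 1) * mu_c + (nu - 1) * mu_c ** 2
    half = z * math.sqrt(n * f_mu)
    return n * mu_c - half, n * mu_c + half  # (T_l, T_u)

def classify(counts, mu_c, A, nu, alpha=0.05, n_max=200):
    """Compare the cumulative count T_n with the limits at each stage."""
    T = 0
    for n, x in enumerate(counts, start=1):
        T += x
        t_l, t_u = iwao_boundaries(n, mu_c, A, nu, alpha)
        if T > t_u:
            return "mu > mu_c", n
        if T < t_l:
            return "mu < mu_c", n
        if n >= n_max:
            break
    return "no decision; compare T/n with mu_c", n
```

With `A = 0` and `nu = 1` the variance model reduces to the Poisson case f(μ) = μ, so counts far above μ_c trigger an immediate "μ > μ_c" decision.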
Instead of classifying a population parameter (mean density) into two or three wide classes, Kuno's (1969) sequential plans were designed to estimate the mean density (μ) of a population at a specified level of precision (D₀) with a minimum expected sample size. Kuno's plans also require the assumption that the variance (σ²) can be represented as a known function of the mean density for a given population (i.e., σ² = f(μ)). Based on this assumption, Kuno expressed the standard error of the sample mean as SE(x̄) = [f(x̄)/n]^(1/2), and sampling precision as:

    D = (n/T_n) [f(T_n/n)/n]^(1/2)    (2.49)

where T_n is the cumulative number of individuals observed in n random observations; and T_n/n (i.e., x̄) is an estimate of the mean density, μ. By replacing D by the required level of precision D₀, and rearranging the relationship with respect to T_n, Equation 2.49 becomes:

    T_n = √(n f(T_n/n)) / D₀    (2.50)

Equation 2.50 represents the total number of individuals of interest required in n random observations to give the desired precision D₀. Kuno (1969) called this equation the "stop line", since the sampler can stop once the observed cumulative number of individuals in any n random observations (T_n) equals or exceeds this total. By using Kuno's (1969) sequential plans, an estimate of the population mean density with the desired level of precision will be μ̂ = T_n/n, and a 100(1 − α) percent confidence interval for the mean density can also be obtained. Some applications of Kuno's (1969) sequential plans are found in Allen et al. (1972), Newton (1989), and Newton and LeMay (1992).

2.4.3 Development of sequential analysis theory in statistics

In its early stages, sequential analysis was heavily dominated by Wald's (1947) SPRT. In the statistics literature, bibliographies were given by Johnson (1961) and Jackson (1960). The scientific fields in which Wald's SPRT plans were most widely used are clinical research (Armitage 1975) and industrial quality control (Davies 1954; Burr 1953).
In the former, patients usually enter a study serially and ethical considerations require that any unnecessary use of inferior treatment should be avoided. For the latter, the sampling inspection for batches of products is usually time-consuming or destructive. In these applications, sequential analysis is appealing.

Because of its practical value, Wald's (1947) original theory of the SPRT has been extended to sequential testing of composite hypotheses, sequential estimation, sequential decision-making, and many other statistical applications. The theory and development of sequential analysis are well documented by Wald (1947), Wetherill and Glazebrook (1986) and Ghosh and Sen (1991). Some special testing procedures found in the literature were: (i) the sequential t-test for population means (Rushton 1950); (ii) the truncated sequential t-test for a population mean (Fowler 1969); (iii) the sequential F-test for analysis of variance (Ghosh 1967); and (iv) sequentially testing the correlation coefficient for bivariate normal variables (Choi 1971).

2.5 Summary

To meet the first objective of this research, a literature search was carried out in forestry and other literature. Since applicability testing of forestry models is an extension of model validation, the available procedures for forestry model validation were reviewed, focusing on the statistical procedures for comparing estimated and observed values of a model.

In general, model validation is a process of building the confidence level for using a model. If a model is developed for estimation or prediction purposes, the predictive ability (accuracy) of the model should be determined before it can be used with confidence.
To determine the accuracy of a model (the closeness of the estimated values to their target values), statistical hypothesis testing or estimation methods may be used. Hypothesis testing methods are appropriate if the user of a model wants to determine whether the accuracy of the model meets an established requirement of accuracy. In forestry, Freese's (1960) procedure and its modifications were designed for this purpose, especially when the tested model is a deterministic function (e.g., a regression equation) and the average accuracy of the model is mainly of concern (the unconditional accuracy for a set of input variables that is randomly selected from all possible values). If the tested model is a stochastic simulation model, or the conditional accuracy of the model is of concern (the accuracy of estimated values for a specified set of input variables), then Reynolds et al.'s (1981) procedures for validating computer simulation models are appropriate. Statistical estimation methods are appropriate when the user of a model wants to know the magnitude of the errors if the model is repeatedly used for estimation in a given population. Reynolds' (1984) estimation procedures and Reynolds and Chung's (1986) regression procedures are designed for this purpose.

However, almost all procedures found for forestry model validation are fixed sample size procedures; the decision about the accuracy of a model when applying these procedures can only be made when a data set with a predetermined, sufficiently large sample size is available. The purpose of applicability testing is not to provide accurate estimates for the error parameters of forestry estimation models, but to classify a model as acceptable or unacceptable for a given application. It may be possible that a decision to accept or reject the model can be made with a small sample size, especially when the actual accuracy of a model is far below or above the requirement of the user.
A fixed sample size procedure is then not considered to be efficient for applicability testing of forestry models. This is particularly of concern when data collection is expensive, time-consuming or destructive.

Sequential sampling plans have been widely used in forest pest management. In these applications, three types of sequential sampling plans were involved. Wald's (1947) and Iwao's (1975) sequential sampling plans were designed for hypothesis testing (classifying a population parameter into two or three wide classes). Kuno's (1969) sequential sampling was designed for estimating the population mean at a specified level of precision with minimum expected sample size. The applications of the sequential sampling plans to forest pest surveys were appealing, because these procedures required, on average, only 40 to 60% as many observations as an equally reliable fixed sample size procedure (Fowler and Lynch 1987a).

Although sequential approaches have been widely used in forest and agricultural pest management, and other scientific fields, few studies were found that addressed the problem of sequentially testing the predictive ability (accuracy) of any type of mathematical model. Also, no sequential testing plans were found that met the desired criteria outlined for the alternative procedures for applicability testing of forestry models, except for two initial studies in forestry (Marshall and LeMay 1990; Brack and Marshall 1990).

Chapter 3

Extension of Freese's Procedure to Sequential Accuracy Testing Plans

Information from the literature suggested that variable sample size (sequential sampling) procedures could be superior to fixed sample size procedures in terms of sampling cost savings.
However, in order to apply sequential analysis approaches to model testing, some appropriate sequential testing plans, which have a better measure of the accuracy of a model than a binomial variable, must be developed (Marshall and LeMay 1990).

Among the fixed sample size procedures found in the literature, Freese's (1960) procedure seems to be most appropriate for the purpose of applicability testing of a model, because this procedure was designed to determine whether the accuracy of an estimation technique meets a specified accuracy requirement of the user. Also, Freese's procedure has been widely used for evaluating the accuracy of forestry estimation or prediction models (e.g., Evert 1981; Ek and Monserud 1979), and it is quite natural to consider whether this procedure can be extended to a sequential testing plan.

The distributional assumptions and the derivation of this procedure have been well researched (i.e., Reynolds 1984; Gregoire and Reynolds 1988). The only requirements for the development and use of a sequential testing plan such as Wald's (1947) SPRT are that the distribution of the variable of interest must be known, and that the two class limits and the probabilities of the type I and II errors must be specified. If the specified class limits represent the accuracy requirements for model acceptance and rejection, a sequential sampling plan may be used to classify the applicability of a model into one of two pre-defined classes (i.e., acceptable or unacceptable).

This led to the decision to extend Freese's (1960) procedure of accuracy testing into a Wald's SPRT plan to meet the first objective of this research. The extension was made for each of Freese's three chi-square test statistics. The derivations are presented in this chapter.

3.1 Sequential Accuracy Testing Plan I

3.1.1 Modification of Freese's original formulation

In order to extend Freese's test I into an SPRT plan, the following modifications were made.
First, instead of a predetermined value, the sample size required was allowed to be a random variable. Observations of the dependent variable (y) and independent variables (x) of the tested model would be sequentially taken at random from the application population.

Second, since Wald's SPRT is only applicable for testing a simple hypothesis¹ (H₀) against another simple hypothesis (H₁), the composite hypotheses² of Freese's test I (i.e., H₀: σ² ≤ σ₀² vs. H₁: σ² > σ₀², where σ₀² = e²/[z_(1−γ/2)]²) were modified into two simple hypotheses. To solve this problem, instead of specifying a single allowable error, e, two error limits, e₀ and e₁, were used. The choice of e₀ and e₁ is somewhat arbitrary, and e₀ < e < e₁. The e₀ can be interpreted as the upper limit of errors for classifying a model as acceptable, and e₁ as the lower limit of errors for classifying a model as unacceptable. Based on these two specified error limits and the desired probability level (γ) in the accuracy requirement (Equation 2.17), the modified test hypotheses of the extended test I are:

    H₀: σ² = σ₀² vs. H₁: σ² = σ₁²    (3.51)

where σ² is the true variance of d; σ₀² and σ₁² are the hypothesized variances of d for model acceptance and rejection, respectively; σ₀² = e₀²/[z_(1−γ/2)]²; and σ₁² = e₁²/[z_(1−γ/2)]². The interpretations of the modified test hypotheses are as follows. Since Freese's test statistic I was derived based on the assumption that the model error, d, is a normally distributed variable with a zero mean and variance σ², an accurate model will have:

    P(|d| ≤ e) = 1 − γ

and σ² = σₑ² (i.e., the true error variance is equal to the hypothesized variance), where σₑ² is determined from the specified values of e and γ.

¹ A simple hypothesis is defined as a hypothesis under which the probability distribution is completely known.
² A composite hypothesis is defined as a hypothesis under which the probability distribution is not completely known.
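The link between an allowable error e, the probability level γ, and a hypothesized variance (σ² = e²/[z_(1−γ/2)]²) can be sketched numerically; the function name is illustrative, and the error limits shown are arbitrary example values, not from the thesis.

```python
from statistics import NormalDist

def hypothesized_variance(e, gamma):
    """sigma^2 = e^2 / z_(1-gamma/2)^2: the error variance at which
    P(|d| <= e) = 1 - gamma for a N(0, sigma^2) error distribution."""
    z = NormalDist().inv_cdf(1 - gamma / 2)
    return (e / z) ** 2

# Class limits e0 < e1 give the acceptance/rejection variances of Eq. 3.51.
sigma0_sq = hypothesized_variance(2.0, 0.05)  # e0 = 2.0 (illustrative)
sigma1_sq = hypothesized_variance(3.0, 0.05)  # e1 = 3.0 (illustrative)
```

With γ = 0.05, an allowable error equal to z_(0.975) ≈ 1.96 corresponds to a unit error variance, which is a convenient sanity check on the conversion.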
Because σ₀² and σ₁² are chosen such that σ₀² < σₑ² < σ₁², it implies that:

    P(|d| ≤ e) > 1 − γ

if the modified null hypothesis is true, and:

    P(|d| ≤ e) < 1 − γ

if the modified alternative hypothesis is true. Therefore, the modified hypotheses may be used to substitute for the original hypotheses of Freese's test I. Also, the null and alternative parameters represent the class limits for model acceptance and rejection, respectively. In a sequential analysis context, the equivalence between the modified test hypotheses and those of the original composite hypothesis was discussed by Wald (1947, p. 125). He stated that for any value of the tested parameter, σ², we can always find two values (i.e., σ₀² and σ₁²) such that classifying the true variance as greater than σ₁² is considered an error of practical importance whenever the true variance is less than σ₀². Similarly, classifying the true variance as less than σ₀² is regarded as an error of practical importance whenever the true variance is greater than σ₁². For a true variance value between σ₀² and σ₁², there is no particular concern as to which decision is made.

Finally, in order to derive the decision boundaries of the extended test I, Wald's (1947) general stopping rule, B < Rₙ < A, for an SPRT plan was used. This requires that the probability distribution of d is known, and that the probability levels of the type I and II errors, α and β, must be preset for the test. The distributional assumptions of Freese's test I, which were outlined by Reynolds (1984) and Gregoire and Reynolds (1988), were retained. That is, the n differences between the estimated and actual values from a sequence of model estimations, d₁, d₂, …, dₙ, are independent and identically distributed (iid) normal variables with a zero mean and a variance, σ².
Also, the unconditional distribution of model errors, d, was assumed to be used in testing.

3.1.2 Sequential decision boundaries

Using the distributional assumptions stated for d, the decision boundaries of the extended test I were derived as follows. The joint probability density function of the observed errors, dᵢ for i = 1, …, n, under the null hypothesis H₀ (Equation 3.51) is:

    g₀(d₁, …, dₙ) = ∏ᵢ f₀(dᵢ) = (2π)^(−n/2) σ₀^(−n) exp[−(1/(2σ₀²)) Σᵢ dᵢ²]    (3.52)

where σ₀² = e₀²/[z_(1−γ/2)]². Similarly, the joint probability density function of d under the modified alternative hypothesis H₁ (Equation 3.51) is:

    g₁(d₁, …, dₙ) = ∏ᵢ f₁(dᵢ) = (2π)^(−n/2) σ₁^(−n) exp[−(1/(2σ₁²)) Σᵢ dᵢ²]    (3.53)

where σ₁² = e₁²/[z_(1−γ/2)]². With these known density functions, Wald's (1947) general decision rules for an SPRT plan described in Section 2.4.1.1 could be applied to derive the decision boundaries of the extended test I by first determining the sample likelihood ratio (Rₙ) of the alternative to the null hypothesis:

    Rₙ = g₁(d₁, …, dₙ) / g₀(d₁, …, dₙ) = (σ₀/σ₁)ⁿ exp[(1/2)(1/σ₀² − 1/σ₁²) Σᵢ dᵢ²]

Wald's rule, B ≤ Rₙ ≤ A, was then applied to divide the n-dimensional sample space into three mutually exclusive subspaces, where A = (1 − β)/α and B = β/(1 − α); and α and β are the specified probability levels of the type I and II errors for the test, respectively. Replacing Rₙ in Wald's rule by the above result gives:

    B ≤ (σ₀/σ₁)ⁿ exp[(1/2)(1/σ₀² − 1/σ₁²) Σᵢ dᵢ²] ≤ A

This expression was simplified by taking the natural logarithm (ln) of each term, and replacing σ₀² by e₀²/[z_(1−γ/2)]² and σ₁² by e₁²/[z_(1−γ/2)]²:

    ln(B) + (n/2) ln(e₁²/e₀²) ≤ (1/2)[z_(1−γ/2)]²(1/e₀² − 1/e₁²) Σᵢ dᵢ² ≤ ln(A) + (n/2) ln(e₁²/e₀²)

This result was further simplified to:

    [2 ln(B) + n ln(e₁²/e₀²)] / {[z_(1−γ/2)]²(1/e₀² − 1/e₁²)} ≤ Σᵢ dᵢ² ≤ [2 ln(A) + n ln(e₁²/e₀²)] / {[z_(1−γ/2)]²(1/e₀² − 1/e₁²)}    (3.54)

Equation 3.54 gives the derived decision boundaries of the extended test I, and it can be represented as two parallel decision lines by letting:

    h₁ = 2 ln[β/(1 − α)] / {[z_(1−γ/2)]²(1/e₀² − 1/e₁²)}    (3.55)
    h₂ = 2 ln[(1 − β)/α] / {[z_(1−γ/2)]²(1/e₀² − 1/e₁²)}    (3.56)

    s = ln(e₁²/e₀²) / {[z_(1−γ/2)]²(1/e₀² − 1/e₁²)}    (3.57)

    T₁ = Σᵢ dᵢ²    (3.58)

where T₁, the cumulative squared error of model estimation, is the test statistic of the extended test I; h₁ and h₂ are the intercepts; and s is the common slope of the decision lines. Equation 3.54 becomes:

    h₁ + sn ≤ T₁ ≤ h₂ + sn

At each stage of sequential sampling, one of the following decisions would be made:

1. T₁ ≥ h₂ + sn: stop sampling and conclude that the model is not acceptable (i.e., reject H₀);
2. T₁ ≤ h₁ + sn: stop sampling and conclude that the model is acceptable (i.e., accept H₀); or
3. h₁ + sn < T₁ < h₂ + sn: continue sampling. No terminal decision can be made at this stage.

Cases 1 and 2 are denoted as the terminal decisions of the extended test I; and L₀ = h₁ + sn and L₁ = h₂ + sn are called the acceptance and rejection lines, respectively. This sequential testing plan was designed to perform the same function as Freese's chi-square test I. It will be denoted as the sequential accuracy testing plan I throughout this dissertation, and labelled as SATP I. Based on the theory of Wald's (1947) SPRT outlined in Chapter 2, such an extension of Freese's test I appears to be reasonable. However, some problems in applying this extended procedure exist.

The key to this extension is the normality assumption of the model errors, d. Gregoire and Reynolds (1988) showed that Freese's original procedure was sensitive to non-normality. The extended procedure is not robust to any departure from normality, since a smaller sample size is expected using a sequential sampling procedure. Therefore, in applying this extended test, care must be exercised to ensure that the distribution of d is normal.
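The decision rules above can be sketched as a short routine. This is a minimal illustration of SATP I under the stated assumptions (iid normal errors with zero mean, observations taken singly); the function names and the numeric error limits used in the test below are illustrative, not from the thesis.

```python
import math
from statistics import NormalDist

def satp1_lines(e0, e1, gamma, alpha, beta):
    """Intercepts h1, h2 and slope s of the SATP I decision lines
    (Eqs. 3.55-3.57), from error limits e0 < e1 and probability
    levels gamma, alpha, beta."""
    z_sq = NormalDist().inv_cdf(1 - gamma / 2) ** 2
    denom = z_sq * (1 / e0 ** 2 - 1 / e1 ** 2)
    h1 = 2 * math.log(beta / (1 - alpha)) / denom   # acceptance intercept (< 0)
    h2 = 2 * math.log((1 - beta) / alpha) / denom   # rejection intercept (> 0)
    s = math.log(e1 ** 2 / e0 ** 2) / denom         # common slope
    return h1, h2, s

def satp1_test(errors, e0, e1, gamma=0.05, alpha=0.05, beta=0.10):
    """Apply the three-way decision rule to errors taken one at a time."""
    h1, h2, s = satp1_lines(e0, e1, gamma, alpha, beta)
    t1 = 0.0
    for n, d in enumerate(errors, start=1):
        t1 += d * d                                 # cumulative squared error T1
        if t1 >= h2 + s * n:
            return "reject (model unacceptable)", n
        if t1 <= h1 + s * n:
            return "accept (model acceptable)", n
    return "continue sampling", len(errors)
```

Because the decision lines for relative errors differ from Equations 3.55-3.57 only by the factor 100² absorbed into the limits, the same routine yields SATP II when each error dᵢ is replaced by 100·dᵢ/yᵢ and the limits e₀, e₁ by the percent limits p₀, p₁.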
To check this assumption, a post-testing procedure of normality is proposed and will be presented in Chapter 5 of this dissertation.

In order to classify a model as either acceptable or unacceptable, two class limits for the errors (e₀ and e₁) must be specified by the potential users. These class limits, of course, must be realistic and based on a fundamental knowledge of model evaluation and timber management purposes. These two specified values and their associated probability level, γ, are required to determine the hypothesized variances for model acceptance and rejection, σ₀² and σ₁². If the mean squared error (MSE) from the model estimation is known for the original population, it may be possible to specify the hypothesized variances directly, instead of calculating them from the specified values of e₀, e₁ and γ. Also, the smaller the difference between the class limits, the wider apart the two decision boundaries will be. The effect of the interval between the two class limits will be addressed in Chapter 4.

3.2 Sequential Accuracy Testing Plan II

3.2.1 Modifications of Freese's original formulation

As indicated in Chapter 2, Freese's test II is designed for testing the relative error (d/y) instead of d. The basic assumptions and idea for accuracy testing are exactly the same as those used for Freese's test I. According to Reynolds (1984), the derivation of Freese's test II assumes that d₁/y₁, d₂/y₂, …, dₙ/yₙ are iid normal variables with a zero mean and a variance σᵣ². The test hypotheses of Freese's original test are H₀: σᵣ² ≤ σ₀² vs. H₁: σᵣ² > σ₀², where σ₀² = [p/(100 z_(1−γ/2))]².

Similar to SATP I, to extend Freese's test II into a Wald's SPRT, the test hypotheses of the original test must be modified into two simple hypotheses:

    H₀: σᵣ² = σ_p₀² vs. H₁: σᵣ² = σ_p₁²    (3.59)

where σ_p₀² = [p₀/(100 z_(1−γ/2))]² and σ_p₁² = [p₁/(100 z_(1−γ/2))]²; p₀ and p₁ are the percent error limits for model acceptance and rejection, respectively; p₀ < p < p₁; and p and γ are the same as in Freese's test II.
Also, both the probability levels of the type I and II errors of the extended test II, α and β, must be specified. The interpretations of these modified hypotheses are similar to those of SATP I. That is, if the null hypothesis is true, it implies:

    P(100|d/y| ≤ p) > 1 − γ

Similarly, if the alternative hypothesis is true, it implies:

    P(100|d/y| ≤ p) < 1 − γ

As with SATP I, the modified hypotheses were then used to substitute for the original hypotheses of Freese's (1960) test II.

3.2.2 Sequential decision boundaries

To derive the decision boundaries for the extended test II, the methods are similar to those for SATP I. First, based on the assumption d/y ~ N(0, σᵣ²), the joint probability density functions of d₁/y₁, d₂/y₂, …, dₙ/yₙ under the modified null and alternative hypotheses (Equation 3.59) are:

    g₀(d₁/y₁, …, dₙ/yₙ) = ∏ᵢ f₀(dᵢ/yᵢ) = (2π)^(−n/2) σ_p₀^(−n) exp[−(1/(2σ_p₀²)) Σᵢ (dᵢ/yᵢ)²]    (3.60)

    g₁(d₁/y₁, …, dₙ/yₙ) = ∏ᵢ f₁(dᵢ/yᵢ) = (2π)^(−n/2) σ_p₁^(−n) exp[−(1/(2σ_p₁²)) Σᵢ (dᵢ/yᵢ)²]    (3.61)

The sample likelihood ratio (Rₙ) of the alternative to the null hypothesis is:

    Rₙ = g₁(d₁/y₁, …, dₙ/yₙ) / g₀(d₁/y₁, …, dₙ/yₙ) = (σ_p₀/σ_p₁)ⁿ exp[(1/2)(1/σ_p₀² − 1/σ_p₁²) Σᵢ (dᵢ/yᵢ)²]

Relating the variance limits, σ_p₀² and σ_p₁², to the user-specified values, p₀ and p₁ (i.e., σ_p₀² = [p₀/(100 z_(1−γ/2))]² and σ_p₁² = [p₁/(100 z_(1−γ/2))]²), and applying Wald's rule, B ≤ Rₙ ≤ A, for sample space division, the following expression is obtained:

    B ≤ (p₀/p₁)ⁿ exp[(1/2)(100 z_(1−γ/2))²(1/p₀² − 1/p₁²) Σᵢ (dᵢ/yᵢ)²] ≤ A

Taking natural logarithms and doing the necessary algebraic simplification, the expression above becomes:

    [2 ln(B) + n ln(p₁²/p₀²)] / [(100 z_(1−γ/2))²(1/p₀² − 1/p₁²)] ≤ Σᵢ (dᵢ/yᵢ)² ≤ [2 ln(A) + n ln(p₁²/p₀²)] / [(100 z_(1−γ/2))²(1/p₀² − 1/p₁²)]

Similar to SATP I, this resulting expression can be represented as two parallel lines by letting:

    h₁ = 2 ln[β/(1 − α)] / [(100 z_(1−γ/2))²(1/p₀² − 1/p₁²)]    (3.62)

    h₂ = 2 ln[(1 − β)/α] / [(100 z_(1−γ/2))²(1/p₀² − 1/p₁²)]    (3.63)

    s = ln(p₁²/p₀²) / [(100 z_(1−γ/2))²(1/p₀² − 1/p₁²)]    (3.64)
    T₂ = Σᵢ (dᵢ/yᵢ)²    (3.65)

where h₁ and h₂ are the intercepts of the lower and upper decision boundaries; s is the common slope of the decision boundaries; and T₂, the cumulative squared relative error, is the test statistic of the extended test II. The simplified decision boundary is then:

    h₁ + sn ≤ T₂ ≤ h₂ + sn

With h₁, h₂ and s defined as above, the decision rules of the extended test II at each stage of sampling are exactly the same as those outlined for SATP I after replacing the test statistic with T₂.

This testing procedure is designed to perform the function of Freese's test II. It will be denoted as the sequential accuracy testing plan II, or SATP II. Similar to SATP I, SATP II requires the assumptions that the relative errors, d/y, have a normal distribution with zero mean, and that random observations are selected singly. Therefore, in applying this test, care must be given to confirm the normality of d/y. The appropriate post-test procedures are the same as those for SATP I, and are explained in Chapter 5.

3.3 Sequential Accuracy Testing Plan III

SATPs I and II require a normality assumption for d or d/y with a zero mean. A zero mean for the error d, or relative error d/y, implies that the model estimation is unbiased. However, it may not be practical to assume the mean of d or d/y to be zero in many applications. The sum of squared errors is used as the test statistic in each of these two SATP procedures, and this is a measure of the total error of model estimation. Therefore, in applying these two sequential testing plans, an inaccurate model will be rejected regardless of the source of inaccuracy (larger bias, lower precision, or both).

For the situation when the bias can be assumed to be the same for all estimated values, Freese (1960) derived test III to give an approximate test of accuracy after eliminating bias.
Gregoire and Reynolds (1988, p. 309) indicated that this approximate test after eliminating the bias for the absolute error d cannot be used for the relative error d/y. They stated that there is no apparent way to correct for bias that preserves normality when the error is represented as a relative unit.

Similar to SATP I, an effort was made to extend Freese's (1960) test III into a Wald's (1947) SPRT plan. However, compared to the two previous extensions, the difficulty with this extension is that the population mean μ_d of d (the non-tested parameter) is unknown when bias is present. As indicated previously, Wald's (1947) SPRT is only applicable when both the null and alternative hypotheses are simple hypotheses (all parameters in the null and alternative distributions are known). When μ_d is unknown, the modified hypotheses of SATP I (Equation 3.51) are no longer simple, but composite. Wald's SPRT plan is then not applicable.

In order to formulate Freese's (1960) test III as a Wald's (1947) SPRT, some suggestions by Wald were adopted. These were that when testing the standard deviation of a normal variable (σ) in the case when the population mean (μ) is unknown, an approximate SPRT plan may be obtained using the following modifications:

1. Replace the test statistic Σᵢ(xᵢ − μ)² by Σᵢ(xᵢ − x̄)², where xᵢ is the observed value of the variable, and x̄ = Σᵢ xᵢ/n; and
2. Change the number of observations in the decision rule to n − 1 instead of n. That is, if the mean is unknown, the decision boundaries at the nth sampling stage are equal to the decision boundaries corresponding to the (n − 1)th sampling stage when the mean is known.

Since the essence of Freese's (1960) procedures is to test whether the variance of a normal variable d exceeds a given value σ₀², Wald's suggestions are applicable to extend Freese's test III into a Wald's SPRT plan.
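Wald's two adjustments can be sketched as a self-contained variant of the SATP I routine: the centered sum of squares replaces the raw sum of squares, and the decision lines are evaluated at n − 1. Function names and the numeric limits in the test are illustrative, not from the thesis.

```python
import math
from statistics import NormalDist

def satp3_test(errors, e0, e1, gamma=0.05, alpha=0.05, beta=0.10):
    """Unknown-mean (bias-corrected) variant of the variance SPRT:
    T3 = sum of (d_i - dbar)^2 compared with the SATP I lines shifted
    by one observation (n -> n - 1), per Wald's suggestion."""
    z_sq = NormalDist().inv_cdf(1 - gamma / 2) ** 2
    denom = z_sq * (1 / e0 ** 2 - 1 / e1 ** 2)
    h1 = 2 * math.log(beta / (1 - alpha)) / denom
    h2 = 2 * math.log((1 - beta) / alpha) / denom
    s = math.log(e1 ** 2 / e0 ** 2) / denom
    total = total_sq = 0.0
    for n, d in enumerate(errors, start=1):
        total += d
        total_sq += d * d
        t3 = total_sq - total ** 2 / n   # = sum of (d_i - dbar)^2
        if n == 1:
            continue                     # no spread with one observation
        if t3 >= h2 + s * (n - 1):
            return "reject", n
        if t3 <= h1 + s * (n - 1):
            return "accept", n
    return "continue", len(errors)
```

A constant error (pure bias with zero spread) keeps T₃ at zero, so the model is accepted once the acceptance line rises above zero, which illustrates how this variant ignores a constant bias.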
For testing the same test hypotheses (Equation 3.51) as SATP I, the test statistic (T₃) for the extended test III is:

    T₃ = Σᵢ (dᵢ − d̄)²    (3.66)

At each stage of sequential sampling, one of three decisions will be made:

1. If T₃ ≥ h₂ + s(n − 1), stop sampling and reject the model;
2. If T₃ ≤ h₁ + s(n − 1), stop sampling and accept the model; or
3. If h₁ + s(n − 1) < T₃ < h₂ + s(n − 1), continue sampling. No terminal decision can be made at this stage.

where the values of h₁, h₂, and s are defined as for SATP I (Equations 3.55, 3.56 and 3.57). This sequential testing plan is designed to perform the same function as Freese's test III, and it will be denoted as sequential accuracy testing plan III, or SATP III.

It should be noted that SATP III was not obtained directly through computing the sample likelihood ratio Rₙ for the situation when the mean is unknown, but was a modification of the sequential testing plan when the mean of d is known (SATP I). Therefore, it is only an approximate SPRT plan. The validity of SATP III must be examined through simulation experiments, which will be addressed in Chapter 4.

3.4 Approximate OC and ASN Equations for the SATP Procedures

In the previous sections, three of Freese's (1960) chi-square test statistics were formulated as Wald's (1947) SPRT plans for determining the applicability of forestry estimation models. The rationale of the extensions was that Freese's idea behind accuracy testing (i.e., to confirm the probability statement given by Equation 2.17) can be translated into testing the hypotheses H₀: σ² ≤ σ₀² vs.
H₁: σ² > σ₀² under the normality assumption and a zero mean for the model error d or d/y.

As sequential testing plans, the SATP procedures should have the advantage that, on average, the sample sizes required to reach a decision about the applicability of a model are smaller than those required by an equally reliable fixed sample size procedure. However, since the sample size becomes a random variable, the sampling cost is unknown in advance of sampling. This may produce administrative difficulties in practical application. To solve this problem, Wald's approximate OC and ASN equations may be applied to describe the probability of accepting the null hypothesis (i.e., the OC value) and the expected sample size (i.e., the ASN) required for a given SATP procedure. These equations will assist potential users of the SATP procedures in designing an appropriate testing plan for a given problem.

Because the essence of Freese's (1960) tests I and II is to determine whether the variance (σ²) of a normal variable with a zero mean (i.e., model errors, d or d/y) exceeds a specified value (σ₀²), and SATPs I and II are formulated to perform the same functions as the original procedure, it may be possible to extend the OC and ASN equations of Wald's (1947) SPRT plan for testing the standard deviation of a normal variable for use with SATPs I and II. Based on this analysis, the appropriate OC equations of SATPs I and II are suggested based on the equations given in Section 2.4.1. These are:

    L(σ²) = {[(1 − β)/α]^h − 1} / {[(1 − β)/α]^h − [β/(1 − α)]^h}    (3.67)

    σ² = [1 − (σ₀²/σ₁²)^h] / [h(1/σ₀² − 1/σ₁²)]    (3.68)

where L(σ²) is the OC value for a given σ² (i.e., a possible value of the true variance of the model error, d or d/y); and h can be any non-zero value (h from −4.0 to 4.0 with an interval of 0.5 may usually be used for constructing an OC curve). Equations 3.67 and 3.68 can be used to determine the approximate OC curve for any SATP I and II with a specified set of values σ₀², σ₁², α and β.
The procedures used are as follows. For any given value of h, σ² and L(σ²) are calculated using these equations. Using σ² as the horizontal axis and L(σ²) as the vertical axis, the pair (σ², L(σ²)) is plotted. When the pairs (σ², L(σ²)) are calculated for a sufficiently large number of values of h (e.g., from −4.0 to 4.0 with an interval of 0.5), the OC curve for a given SATP procedure is obtained, as in Figure 3.1. The OC curve constructed in this way has the property that the OC values are L(0) = 1, L(σ₀²) = 1 − α, L(σ₁²) = β, and L(∞) = 0. These properties may be used for examining the accuracy of an approximate OC equation through simulation experiments.

The approximate ASN equations suggested for SATPs I and II are:

    E_σ²(n) = {L(σ²)[ln(β/(1 − α)) − ln((1 − β)/α)] + ln((1 − β)/α)} / [ln(σ₀/σ₁) + (σ²/2)(1/σ₀² − 1/σ₁²)]    (3.69)

or:

    E_σ²(n) = [L(σ²)(h₁ − h₂) + h₂] / (σ² − s)    (3.70)

where E_σ²(n) is the ASN value for a given σ²; and h₁, h₂ and s are the determined intercepts and slope of the decision boundaries of the SATP procedure.

To construct an ASN curve for a specified set of σ₀², σ₁², α and β, the procedures are similar to those used to construct an OC curve. For a given h value, the (σ², E_σ²(n)) pair can be calculated using Equation 3.68, and Equation 3.69 or 3.70. Repeating the calculation for a sufficiently large number of values of h, and plotting all calculated pairs as in Figure 3.2, an approximate ASN curve is obtained.

It should be noted that these OC and ASN equations were originally derived by ignoring any possible overshooting of the decision boundaries of an SPRT plan. The accuracy of these suggested equations in approximating the actual unknown OC and

Figure 3.1: Example of an approximate Operating Characteristic (OC) curve of SATP I for testing H₀: σ₀² = 1.50 vs. H₁: σ₁² = 2.25 with α = 0.05 and β = 0.10. [OC value plotted against error variance, 0.90 to 4.57.]
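The curve-construction procedure described above can be sketched as follows, assuming the OC and ASN equations as reconstructed in Equations 3.67-3.70 (function and variable names are illustrative). Note that the approximation degenerates as h approaches 0, where σ² approaches the slope s and the denominator of Equation 3.70 vanishes; the grid below therefore excludes h = 0.

```python
import math

def oc_asn_points(sigma0_sq, sigma1_sq, alpha, beta, h_values=None):
    """(sigma^2, OC, ASN) triples over a grid of h (Eqs. 3.67-3.70)
    for a variance SPRT such as SATP I."""
    A = (1 - beta) / alpha
    B = beta / (1 - alpha)
    c = 1 / sigma0_sq - 1 / sigma1_sq
    s = math.log(sigma1_sq / sigma0_sq) / c   # slope of the decision lines
    h1 = 2 * math.log(B) / c                  # acceptance intercept
    h2 = 2 * math.log(A) / c                  # rejection intercept
    if h_values is None:
        # h from -4.0 to 4.0 in steps of 0.5, excluding h = 0
        h_values = [x / 2 for x in range(-8, 9) if x != 0]
    points = []
    for h in h_values:
        oc = (A ** h - 1) / (A ** h - B ** h)               # L(sigma^2), Eq. 3.67
        var = (1 - (sigma0_sq / sigma1_sq) ** h) / (h * c)  # sigma^2, Eq. 3.68
        asn = (oc * (h1 - h2) + h2) / (var - s)             # E(n), Eq. 3.70
        points.append((var, oc, asn))
    return points
```

For the example of Figures 3.1 and 3.2 (σ₀² = 1.50, σ₁² = 2.25, α = 0.05, β = 0.10), the grid endpoints h = ±4 give error variances of about 0.90 and 4.57, and h = 1 recovers σ² = σ₀² with L(σ₀²) = 1 − α = 0.95, matching the stated properties of the OC curve.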
Figure 3.2: Example of an approximate Average Sample Number (ASN) curve of SATP I for testing H₀: σ₀² = 1.50 vs. H₁: σ₁² = 2.25 with α = 0.05 and β = 0.10. [ASN plotted against error variance, 0.90 to 4.57.]

ASN functions of the SATP procedures must be examined through simulation. Also, since SATP III is a modification of SATP I for the situation where the mean of d is not zero, the OC and ASN equations suggested in this section may be appropriate for determining the OC and ASN values for SATP III. However, this needs to be confirmed through simulation.

Chapter 4

Monte Carlo Simulations for Examining Sequential Accuracy Testing Plans

In Chapter 3, Freese's (1960) procedure of accuracy testing was extended to Wald's (1947) SPRT plans to meet the first objective of this research. It was indicated that the rationale for such an extension is that the accuracy requirement (Equation 2.17 or 2.22) can be translated into a variance bound (Equation 2.20 or 2.25) when the errors of a model are iid normal variables with a zero mean and a constant variance. However, the reliability and behavior of the developed SATP procedures must be confirmed through sampling experiments. The reliability of the SATP procedures means that the extended procedures can reach a correct decision of model acceptance or rejection with the desired probability levels for the type I and II errors (α and β).

This chapter is designed to meet the second objective of the research, that is, to validate the SATP procedures by constructing model errors of tree volume estimates using Monte Carlo techniques. The problem addressed by the simulation studies in this chapter was the applicability testing of tree volume models. The objectives of the simulation studies were:

1.
To examine the reliability of the SATP procedures in terms of the probability of reaching a correct decision under different assumptions for the errors of tree volume models;

2. To investigate the expected sample size of the SATP procedures required to make a terminal decision for different specified parameters (i.e., σ₀², σ₁², α and β);

3. To examine the robustness of the SATP procedures to departures from the underlying assumptions (i.e., the mean of the model errors is not zero, or the variance of the errors is not constant);

4. To examine the accuracy of the approximate OC and ASN equations suggested for the SATP procedures; and

5. To study the effect of modifications of Wald's assumptions for SPRT on the performance of the SATP procedures. The modifications considered were: (1) at each stage of sampling, instead of selecting one random observation only, a group of random observations was taken; and (2) a maximum sample size (n₀) was set to terminate the testing process for the situation when no terminal decision can be reached.

In general, by using Monte Carlo techniques, some insights were provided into how the SATP procedures behaved under the simulated application conditions of model testing. Since the study of Gregoire and Reynolds (1988) confirmed that Freese's original procedure is not robust to departures from normality, the simulations were limited to normally distributed errors, or mixtures of normally distributed errors. The normal generator proposed by Forsythe et al. (1977) was used, and the unconditional distributions of the errors (d) of tree volume models were simulated. The mean (μ) and the variance (σ²) of the generated normally distributed errors represent the bias and precision of the model estimates, respectively. Three combinations of the mean and variance of the generated errors were considered in this chapter. These were:

1. Zero mean and constant variance.
These normally distributed errors were used to simulate the errors of a tree volume model when the model estimates are unbiased, and constant error variance is present across all levels of the model's independent variables;

2. A non-zero mean and constant variance. These generated errors were used to simulate the errors of a volume model when the model estimates are biased, but the variance of the errors is constant; and

3. Zero mean and various variances. A mixture of five or ten single normal distribution generators with zero means and different variances was used to simulate heterogeneous variances of model errors, which are commonly found with Spurr's (1952), or Schumacher and Hall's (1933) volume models.

Because the underlying distributional assumptions for SATP I and II are exactly the same, only SATP I and III were examined in this chapter.

The chapter is divided into seven sections. The first section presents the estimates of the error parameters from three sets of sectional tree data and three selected tree volume models. The sections following that present the simulation methods and results of examining the developed SATP procedures using the different generated error populations. The discussion and conclusions are found in the last section. It should be noted that although the problem of applicability testing of tree volume models was addressed in the simulation studies, the methods and results obtained in this chapter are applicable to other forestry models with similar error parameters.

4.1 Estimates of Error Parameters of Tree Volume Estimations

In order to address the problem of applicability testing of tree volume models, the magnitude of the errors of tree volume estimation should be known. Realistic simulations can then be designed based on the known parameters of the errors. For this purpose, three sets of tree sectional data were obtained. These data were collected by the Inventory Branch, B.C.
Ministry of Forests across the province in the 1970's. Each data set included one species: Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco), lodgepole pine (Pinus contorta Dougl.), and aspen (Populus tremuloides Michx.). The available variables in these data sets included: Forest Inventory Zone (FIZ), site quality (denoted as good, medium, and poor), total tree age (age), total height (height), diameter outside bark at breast height (1.3 metres above the ground) (dbh), total stem volume (volume), and the sectional measurements (i.e., diameters inside bark at intervals along the stem, cumulative height, and sectional volumes). After deleting the trees that were forked or had broken tops, descriptive statistics for the remaining trees in each data set were computed for age, dbh, height and volume (Table 4.1).

Using these data, and Spurr's (1952) (Equation 2.4), Schumacher and Hall's (1933) (Equation 2.2) and the logarithm of Schumacher and Hall's (1933) (Equation 2.3) tree volume models, estimates of the error parameters (i.e., the mean and variance) of tree volume models were obtained. For each data set, each of the three volume models was fitted using linear least-squares or nonlinear least-squares methods. These estimated models were assumed to be the population models for each data set. Using these fitted models, the error parameters of volume estimation were calculated by applying the population models to a local area (subpopulation). A subset was selected from each data set based on the variable FIZ (Forest Inventory Zone), and the population models were used to estimate the volume (V̂ᵢ) for each tree in the subset. The errors of volume estimation were then calculated as the differences between the actual and estimated volumes (dᵢ = Vᵢ - V̂ᵢ). Errors were also calculated by applying a population model to a different population (different species). Each population model was used to estimate the tree volume for the other two species.
For example, the Douglas-fir model was used to estimate tree volume for the lodgepole pine and aspen data. The average difference, d̄, the average absolute difference, |d|, and the mean squared difference (MSD) between the actual and estimated volumes were computed for each application.

Table 4.1: Descriptive statistics of B.C. tree sectional data used for determining the error parameters of volume estimations.

Species          Number     Variable      Mean     Standard    Minimum   Maximum
                 of Trees                          Deviation
Douglas-fir      990        Age (yrs.)    204.1    132.9       29.0      578.0
                            dbh (cm)      55.3     36.8        5.4       216.4
                            Height (m)    33.2     14.4        6.3       76.7
                            Volume (m³)   4.9210   8.652       0.0074    77.3126
lodgepole pine   2880       Age (yrs.)    111.7    46.3        15.0      302.0
                            dbh (cm)      24.4     9.5         3.8       60.0
                            Height (m)    21.7     6.8         3.1       39.7
                            Volume (m³)   0.6022   0.5346      0.0024    3.38786
aspen            1077       Age (yrs.)    95.4     26.7        27.0      192.0
                            dbh (cm)      26.9     9.1         9.4       72.9
                            Height (m)    22.3     4.6         10.3      34.56
                            Volume (m³)   0.6223   0.5405      0.0404    3.6916

The d̄'s found in this way ranged from -2.0630 to 0.1505 m³ (Table 4.2), and the MSD's found ranged from 0.0049 to 19.9330 for Spurr's (1952) and Schumacher and Hall's (1933) volume models. The d̄'s ranged from -0.2215 to 0.1646, and the MSD's ranged from 0.00536 to 0.0247 for the logarithm of Schumacher and Hall's (1933) volume model. The d̄ and MSD were used as the estimates of the bias and variance of the errors in volume estimates for B.C. trees. In the simulations, the ranges of the observed bias (i.e., -2.1 to 0.15) and the variance (i.e., 0.005 to 20.0) were covered by the means and variances of the generated normally distributed errors.
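The three error statistics just described (d̄, |d| and MSD) are simple to compute; the following is a minimal sketch, not the thesis's original code, using made-up paired volumes (the helper name error_statistics is hypothetical, and the B.C. sectional data are not reproduced here):

```python
# Sketch of the error statistics summarized in Table 4.2 (hypothetical
# helper; illustrative data only).

def error_statistics(actual, estimated):
    """Return (mean difference, mean absolute difference, MSD)
    for paired actual and model-estimated tree volumes."""
    d = [a - e for a, e in zip(actual, estimated)]
    n = len(d)
    d_bar = sum(d) / n                       # average bias of the model
    abs_d_bar = sum(abs(x) for x in d) / n   # average absolute difference
    msd = sum(x * x for x in d) / n          # mean squared difference
    return d_bar, abs_d_bar, msd

# Toy illustration with made-up volumes (cubic metres):
d_bar, abs_d, msd = error_statistics([1.0, 2.0, 3.0], [1.5, 1.5, 3.5])
```

With the toy data above, d = [-0.5, 0.5, -0.5], so the mean absolute difference is 0.5 and the MSD is 0.25; in the thesis, d̄ and MSD estimated in this way supply the bias and variance parameters for the normal generators used later.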
In this way, the simulation results are applicable for testing the volume models, and should also be applicable for any other forestry models which have similar ranges of error parameters.

4.2 Reliability of the SATP When Errors Are iid Normal Variables With a Zero Mean and a Constant Variance

In the first simulation to validate the SATP procedures, the errors of a model were assumed to be iid normal variables with a zero mean and a constant variance. Since the mean of the unconditional distribution of errors, d, measures the "average" bias of a model, a mean of zero indicates that the model estimates are unbiased estimates of the true values. To assume that a model is unbiased may not be practical. However, a mean of zero was one of the assumptions required to derive SATPs I and II in Chapter 3, corresponding to Freese's (1960) tests I and II. Also, since SATP III is simply a modification of SATP I to allow for a non-zero mean of d, the results of using SATP I and III should be very similar if the mean of d is zero. Therefore, this simulation was used to validate SATP I, and to compare SATP I and III under the ideal distributional condition.

Under the assumption of μ_d = 0, the accuracy of a model will depend only on the

Table 4.2: Calculated statistics of the errors for three volume models and B.C.
tree sectional data.

Population         Application   Number     Model   d̄          |d|       MSD
Data               Data          of Trees
Douglas-fir (DF)   FIZ B (DF)    575        [1]     -0.05387   0.54590   1.40340
                                            [2]     -0.02250   0.08860   0.01186
                                            [3]     -0.00003   0.49640   1.44599
                   AC            1077       [1]     -0.09664   0.17354   0.03284
                                            [2]     0.09343    0.11107   0.00883
                                            [3]     0.12508    0.12549   0.01845
                   PL            2880       [1]     -0.07586   0.17634   0.03720
                                            [2]     0.16458    0.17170   0.00920
                                            [3]     0.15048    0.15068   0.01955
Aspen (AC)         FIZ L (AC)    662        [1]     0.01028    0.05135   0.00935
                                            [2]     0.01128    0.05824   0.00536
                                            [3]     0.00166    0.04326   0.00591
                   PL            2880       [1]     0.03839    0.05834   0.00643
                                            [2]     0.09170    0.11100   0.01005
                                            [3]     0.02343    0.04792   0.00516
                   DF            990        [1]     -1.71060   1.71900   14.84238
                                            [2]     -0.18089   0.19973   0.02468
                                            [3]     -1.70847   1.71228   13.40570
Lodgepole pine     FIZ I (PL)    953        [1]     0.00330    0.06997   0.01250
(PL)                                        [2]     -0.01080   0.06134   0.00663
                                            [3]     -0.00117   0.06156   0.00965
                   DF            990        [1]     -2.06300   2.06441   19.93330
                                            [2]     -0.22150   0.22275   0.01554
                                            [3]     -0.86878   0.92135   2.74764
                   AC            1077       [1]     -0.04143   0.05756   0.00782
                                            [2]     -0.08247   0.09360   0.00689
                                            [3]     -0.03726   0.05570   0.00490

Note: The values for Models [1] and [3] are in cubic metre units, and the values for Model [2] are in logarithmic units (base 10). Models [1], [2] and [3] represent Spurr's (1952), the logarithmic Schumacher and Hall's (1933), and Schumacher and Hall's (1933) volume models, respectively; d̄ and |d| are the mean difference and mean absolute difference between the estimated and actual tree volumes, respectively; MSD is the mean squared difference between the estimated and actual tree volumes; and FIZ is Forest Inventory Zone.

precision (variance) of the model estimates. Since the purpose in applying the SATP procedures is to classify a model as either acceptable or unacceptable, the validity of the SATP procedures depends on whether they can correctly classify the error variance of a model into two wide classes based on two preset class limits (i.e., σ₀² and σ₁²).
However, the derivation of the SATP procedures was based on Wald's stopping rule, B < Rₙ < A, where A and B had been chosen by Wald to approximately obtain the specified probabilities of the type I and II errors, α and β (Wetherill and Glazebrook 1986, p.16). These approximations used for the actual error probabilities of the SATP procedures must be examined for various levels of σ₀², σ₁², α and β. Also, in order to determine the usefulness of the SATP procedures in terms of sampling cost savings, the expected sample size required to reach a decision by the SATP procedures should be investigated. Therefore, the concerns addressed by this simulation were:

1. Whether the SATP procedures work reliably for classifying the error variance, σ², of a volume model as either acceptable or unacceptable for various specified values of σ₀² and σ₁², and for different values of the true error variance;

2. Whether the specified error probabilities, α and β, are equal to the actual error probabilities found in applying the SATP procedures; and

3. What the expected sample size required by the SATP procedures is for reaching a terminal decision for different values of σ₀², σ₁², α and β.

To address these problems, three simulations were performed by using normal generators to simulate taking random observations from the errors of tree volume models. The means of these normal generators were all set to zero, and the variance was chosen to include five levels: 0.003, 0.03, 0.3, 3.0, and 20.0. The values were chosen in order to cover the range of the MSD's found from the B.C. tree data and the three volume models (i.e., 0.005 to 20.0 from Table 4.2).

4.2.1 Simulation methods

Based on Wald's (1947) formulations, for a given SPRT plan, the OC value gives the probability of making a correct decision and the ASN value is the expected sample size required for making a terminal decision.
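The decision boundaries and Wald's overshoot-free OC and ASN approximations can be sketched in a few lines. The sketch below assumes the standard Wald SPRT form for the variance of a zero-mean normal, in which the cumulative sum of squared errors T_n = Σdᵢ² is compared to the parallel lines h₁ + sn and h₂ + sn; the function names are hypothetical, and the parametric-h form of the OC function is the textbook Wald result, not code from the thesis:

```python
# Sketch (assumed Wald SPRT for a zero-mean normal variance): decision
# boundaries and approximate OC/ASN values, ignoring overshoot.
import math

def satp_boundaries(s0, s1, alpha, beta):
    """Intercepts h1 (acceptance), h2 (rejection) and common slope s for
    boundaries on T_n = sum of squared errors."""
    c = 1.0 / s0 - 1.0 / s1                 # common denominator term
    A = (1.0 - beta) / alpha                # Wald's upper limit
    B = beta / (1.0 - alpha)                # Wald's lower limit
    h1 = 2.0 * math.log(B) / c
    h2 = 2.0 * math.log(A) / c
    s = math.log(s1 / s0) / c
    return h1, h2, s

def oc_asn_point(h, s0, s1, alpha, beta):
    """One (sigma^2, L(sigma^2), E(n)) point for a non-zero h value,
    as in the parametric construction of the OC and ASN curves."""
    A = (1.0 - beta) / alpha
    B = beta / (1.0 - alpha)
    c = 1.0 / s0 - 1.0 / s1
    sigma2 = (1.0 - (s0 / s1) ** h) / (h * c)       # true variance at this h
    L = (A ** h - 1.0) / (A ** h - B ** h)          # OC value L(sigma^2)
    h1, h2, s = satp_boundaries(s0, s1, alpha, beta)
    En = (L * (h1 - h2) + h2) / (sigma2 - s)        # ASN (cf. Equation 3.70)
    return sigma2, L, En
```

Sweeping h over, say, -4.0 to 4.0 in steps of 0.5 (h ≠ 0) and plotting the resulting pairs reproduces curves of the kind shown in Figures 3.1 and 3.2; at h = 1 the construction gives sigma2 = σ₀² with L = 1 - α, and at h = -1 it gives sigma2 = σ₁² with L = β, matching the stated boundary properties of the OC curve.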
The relationship between the OC value and the error probabilities of an SPRT is OC_H0 = 1 - α and OC_H1 = β, if the possible overshooting of the decision boundaries can be ignored, where OC_H0 and OC_H1 are the OC values when the null or alternative hypothesis is true, respectively. Therefore, in order to examine the reliability and usefulness of the SATP procedures, or to investigate the actual error probabilities of these extended procedures, the OC and ASN values for different values of σ₀², σ₁², α and β, which may be used in the practical application of volume model testing, must be estimated. The general simulation procedures used to estimate these desired OC and ASN values were:

1. The test parameters, σ₀² and σ₁², and the error probabilities, α and β (i.e., the nominal error probabilities), were first specified. In deriving the SATP, σ₀² and σ₁² were assumed to be determined based on the error limits, e₀ and e₁ (or p₀ and p₁), and the accuracy level, γ. Each set of σ₀², σ₁², α and β represented a simulated SATP procedure. The OC and ASN values were then estimated for each simulated SATP procedure;

2. Each set of σ₀², σ₁², α and β was then used to determine the decision boundaries of SATP I or III using the formulae presented in Chapter 3 (e.g., Equations 3.55, 3.56 and 3.57 for SATP I). Since σ₀² and σ₁² were specified instead of e₀, e₁ and γ, the denominators of Equations 3.55, 3.56 and 3.57 were rewritten as (1/σ₀² - 1/σ₁²) when these formulae were used;

3. For each simulated SATP procedure, the normal generator with a zero mean and variance σ_m², denoted as N(0, σ_m²), was used to simulate taking random observations from the unconditional distribution of errors, d. The simulated SATP procedure was carried out to test the hypotheses H₀: σ² = σ₀² vs. H₁: σ² = σ₁². The variance of the normal generator that was used represented the true error variance of a model;

4.
After each observation was taken, the computed test statistic (e.g., Equation 3.58 for SATP I) was compared to the determined boundaries (i.e., h₁ + sn and h₂ + sn). A terminal decision to accept H₀ or H₁ was made, or an additional observation was taken, based on the rules corresponding to SATP I or III as outlined in Chapter 3. This step was repeated until a terminal decision was finally reached (no truncation point was set in the simulations); and

5. Steps 3 and 4 were repeated for a preset large number of trials (iterations) for each simulated SATP procedure. The percentage of times that H₀ was accepted was then calculated as the estimate of the OC value when the true error variance is σ_m², denoted as OC_{σ_m²}. The associated average number of observations sampled was also calculated as the estimate of the ASN value, denoted as ASN_{σ_m²}.

Before the simulations were conducted, a sensitivity analysis indicated that in most cases, 500 and 1,000 iterations would result in adequate precision for the Monte Carlo estimates of the ASN and OC values, respectively. However, for estimating the actual error probabilities, 2,000 iterations were used to obtain more precise estimates of the OC values.

4.2.2 Reliability of the SATP procedures for classifying the error variance of volume models

Under normality of d with a zero mean and a constant variance, it was shown in Chapter 2 that the accuracy requirement (Equation 2.17) can be translated into an error variance bound (Equation 2.20). Freese's (1960) procedure then becomes a test to determine whether the error variance exceeds a given value (i.e., the user-supplied hypothesized variance, σ₀²). In deriving SATPs I and III, the error variance bound was modified into two class limits of the error variance, σ₀² and σ₁², for model acceptance and rejection, respectively.
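The simulation steps of Section 4.2.1 can be sketched as the following Monte Carlo loop. This is a simplified sketch under the same assumed boundary form as before (T_n = Σdᵢ² compared to h₁ + sn and h₂ + sn); run_satp1 and estimate_oc_asn are hypothetical names, and Python's built-in generator stands in for the Forsythe et al. (1977) normal generator used in the thesis:

```python
# Sketch of the Monte Carlo estimation of OC and ASN values for SATP I
# (hypothetical helpers; boundary form assumed as in Equations 3.55-3.57).
import math
import random

def run_satp1(true_var, s0, s1, alpha, beta, rng, n_max=100000):
    """One sequential trial: draw errors from N(0, true_var) until the
    cumulative sum of squares crosses a boundary. Returns (accept_H0, n)."""
    c = 1.0 / s0 - 1.0 / s1
    h1 = 2.0 * math.log(beta / (1.0 - alpha)) / c   # acceptance intercept
    h2 = 2.0 * math.log((1.0 - beta) / alpha) / c   # rejection intercept
    s = math.log(s1 / s0) / c                       # common slope
    t, n = 0.0, 0
    while n < n_max:
        n += 1
        t += rng.gauss(0.0, math.sqrt(true_var)) ** 2
        if t <= h1 + s * n:
            return True, n      # accept H0: variance acceptable
        if t >= h2 + s * n:
            return False, n     # accept H1: variance unacceptable
    # Safety truncation (the thesis's main runs used no truncation point):
    return t <= (h1 + h2) / 2.0 + s * n, n

def estimate_oc_asn(true_var, s0, s1, alpha=0.05, beta=0.05,
                    iterations=1000, seed=1):
    """OC = fraction of trials accepting H0; ASN = mean terminal n."""
    rng = random.Random(seed)
    results = [run_satp1(true_var, s0, s1, alpha, beta, rng)
               for _ in range(iterations)]
    oc = sum(1 for accepted, _ in results if accepted) / iterations
    asn = sum(n for _, n in results) / iterations
    return oc, asn
```

Calling estimate_oc_asn with true_var equal to σ₀² should return an OC estimate near or above 1 - α, and with true_var equal to σ₁² an OC estimate near or below β, which is the pattern reported in Tables 4.3 and 4.4.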
For a given application, the relationship between the tested parameters, σ₀² and σ₁², and the true error variance, σ², may be described by one of the three following situations. These are: (1) σ² ≤ σ₀² (i.e., the true error variance is in the acceptance region); (2) σ² ≥ σ₁² (i.e., the true error variance is in the rejection region); and (3) σ₀² < σ² < σ₁² (i.e., the true error variance is between the acceptable and unacceptable variance limits). To determine the reliability of the SATP procedures, the probability of making a correct decision under each of these situations should be determined.

As a standard, Wald's (1947) requirements imposed on the OC function of an SPRT plan may be used for evaluating the reliability of the SATP procedures in terms of the probability of making a correct decision. For a given SPRT plan, the space of the test parameter (θ) can be divided into three mutually exclusive regions. These are: (1) θ ≤ θ₀; (2) θ₀ < θ < θ₁; and (3) θ ≥ θ₁. Regions (1) and (3) are called the acceptance and rejection regions, respectively. Region (2) is called the indifference region. It represents a situation for which there is no practical importance as to which decision is made if the true parameter falls in this region. In order to test two simple hypotheses, H₀ against H₁, Wald (1947, p.32) imposed some requirements on the OC function of an SPRT plan for given values of α and β. These were: (1) OC must be equal to or greater than 1 - α
for the region of acceptance; (2) OC must be equal to or less than β for the region of rejection; and (3) OC = 1 - α when θ = θ₀ and OC = β when θ = θ₁. These requirements are met only if the possible overshooting of the decision boundaries can be ignored.

To conduct this simulation, five values were selected for the true error variance (i.e., the variance of the normal generator, σ_m²) for each simulated SATP I or III plan. These were: (1) σ_m1² = 0.9σ₀²; (2) σ_m2² = σ₀²; (3) σ_m3² = ½(σ₀² + σ₁²); (4) σ_m4² = σ₁²; and (5) σ_m5² = 1.1σ₁². The multipliers, 0.9 and 1.1, were chosen arbitrarily to give values less than the acceptable variance, σ₀², and greater than the unacceptable variance, σ₁², respectively. To cover a wider range of possible values of the error variance, the following combinations were selected for σ₀², σ₁², α and β:

1. Five levels were set as the acceptable variance, σ₀² (i.e., 0.003, 0.03, 0.3, 3.0, 20.0). These values were selected to cover the range of MSD found in the tree data and volume models;

2. For each σ₀² level, three levels were arbitrarily chosen for the unacceptable variance, σ₁². These values were specified as a ratio of σ₁² to σ₀² (e.g., σ₁²/σ₀² = 1.10, 1.20, 2.00). The first two ratios were varied for different values of σ₀² in order to cover a wide range of simulated conditions. For the purpose of comparison, the last ratio was the same (i.e., 2.00 was used). These values do not represent the optimal choice for the interval of σ₀² to σ₁², but only provide some indication of how to set this interval for practical application; and

3. Two levels of nominal error probabilities were chosen. These were α = β = 0.05 and 0.10. These values are commonly used for Wald's SPRT plans in forest pest surveys.

These selected combinations resulted in a total of 30 simulated SATP I or III plans (i.e., 5 × 3 × 2 = 30).

For each simulated SATP I plan (a combination of σ₀², σ₁², α and β), the normal generators with a zero mean and one of five variances (i.e., σ_m1², σ_m2², σ_m3², σ_m4² and σ_m5²) were used to simulate taking random observations from the errors of a volume model. For each normal generator used, the procedures described in Section 4.2.1 were used to estimate the OC value of SATP I for a given true variance.
For the five generators used, the estimated OC values were obtained and denoted as OC_{0.9σ₀²}, OC_{σ₀²}, OC_{σ_m3²}, OC_{σ₁²} and OC_{1.1σ₁²}, representing the estimated OC value when the true error variance was less than σ₀², equal to σ₀², between σ₀² and σ₁² (i.e., σ_m3²), equal to σ₁², and greater than σ₁², respectively. Since the estimate of interest in this simulation was the OC value, 1,000 iterations were used based on the results of a sensitivity analysis.

For a given distributional generator and level of α = β, the estimated OC values were the same when the ratio of σ₁² to σ₀² was the same (i.e., 2.00). Therefore, the results for σ₁²/σ₀² = 2.00 are only given for σ₀² = 20.0 (Tables 4.3 and 4.4). This result is explainable. From the derived decision boundaries of SATP I (Equation 3.54), the OC value of an SATP I depends only on the relationship between σ₁²/σ₀² and σ²/σ₀² for given α and β, where σ² is the true variance. Because the choice of the true variance (i.e., the variance of the normal generator) was the same for each pair of σ₀² and σ₁² (i.e., σ² = σ₀², σ₁², etc.), simulated SATP I plans using the same ratio of σ₁² to σ₀² would result in the same OC values.

Based on the Monte Carlo OC estimates obtained, the reliability of SATP I for classifying the error variance of a model into two pre-defined classes was examined by comparing the resulting OC values with Wald's (1947) stated requirements for the OC function of an SPRT. These were:

1. To confirm the requirement, OC ≥ 1 - α, for the acceptance region (i.e., σ² ≤ σ₀²), the values of OC_{0.9σ₀²} and OC_{σ₀²} were examined. It was found that the resulting OC
This shows that under the normality assumptionwith a zero mean and constant variance of d, the developed SATP I worked reliablyfor classifying the variance (precision) of a volume model into the acceptance classwhen the true variance of errors is indeed within the specified acceptance level.2. To confirm the requirement, OC < 3, for the rejection region (i.e., o.2 a?), thevalues of OC2 and OC112 were examined. It was also found that the resultingOC values satisfied the requirement. The values of I— OC in the region of rejection represent the probability of making a correct decision (i.e., rejecting H0 andconcluding H1). Since all resulting OC values were less than 13, the probabilitiesof making a correct decision were greater than their nominal values (100(1 — j3) =95% or 90%) when the true variance of error fell in the rejection region.3. For the region of indifference (i.e., o <u2 <u), Wald (1947) did not indicate anyrequirement for the OC values. However, the values should be less than 1 — a andgreater than /3. The resulting OC values (i.e., OC2) ranged from 0.359 to 0.474for a= /3 = 0.05, and ranged from 0.397 to 0.495 for a= /3 = 0.10.The OC values of SATP III were also obtained using the same simulation procedures.The results were almost identical to those obtained for the SATP I. Based on the resultsobtained, it can be concluded that the SATP I and III are reliable for classifying the errorvariance of a model into two predetermined classes (acceptable and unacceptable) whenthe model errors are iid normal variables with a zero mean and a constant variance.Chapter 4. Monte Carlo Simulation 85Table 4.3: Monte Carlo OC estimates of SATP I for testing H0 : a2 ag vs. 
H₁: σ² = σ₁² with α = β = 0.05 when five normal generators with zero means and different variances were used to simulate the errors of volume models.

σ₀²      σ₁²/σ₀²   N(0, 0.9σ₀²)   N(0, σ₀²)   N(0, σ_m3²)   N(0, σ₁²)   N(0, 1.1σ₁²)
         1.15      1.000          0.991       0.411         0.044       0.003
0.003    1.20      0.999          0.981       0.435         0.049       0.006
         1.25      0.998          0.974       0.472         0.048       0.009
0.030    1.30      0.996          0.970       0.452         0.048       0.012
         1.35      0.994          0.965       0.448         0.048       0.014
0.300    1.40      0.993          0.965       0.443         0.047       0.015
         1.45      0.991          0.965       0.434         0.043       0.016
3.000    1.50      0.988          0.966       0.435         0.042       0.017
         1.55      0.987          0.966       0.432         0.041       0.018
20.000   1.60      0.987          0.966       0.425         0.042       0.018
         2.00      0.989          0.970       0.359         0.048       0.024

Note: The results are based on 1,000 iterations. σ₀² and σ₁² are the specified acceptable and unacceptable variance limits of the model errors, respectively; α and β are the specified probabilities of the type I and II errors for the test, respectively; N(0, σ²) represents the normal generator with zero mean and variance σ², which is used for simulating the errors of a tree volume model; σ_m3² = ½(σ₀² + σ₁²); OC_{σ²} is the Monte Carlo OC value when the true variance is σ²; 0.9σ₀² and 1.1σ₁² were arbitrarily chosen to represent the cases when the true variance is less than σ₀² and greater than σ₁², respectively.

Table 4.4: Monte Carlo OC estimates of SATP I for testing H₀: σ² = σ₀² vs. H₁: σ² = σ₁² with α = β = 0.10 when five normal generators with zero means and constant variances were used to simulate the errors of volume models.

σ₀²      σ₁²/σ₀²   N(0, 0.9σ₀²)   N(0, σ₀²)   N(0, σ_m3²)   N(0, σ₁²)   N(0, 1.1σ₁²)
         1.15      0.999          0.953       0.461         0.087       0.010
0.003    1.20      0.995          0.942       0.495         0.094       0.016
         1.25      0.991          0.931       0.477         0.094       0.021
0.030    1.30      0.985          0.929       0.478         0.093       0.026
         1.35      0.981          0.933       0.473         0.091       0.031
0.300    1.40      0.980          0.938       0.472         0.095       0.029
         1.45      0.977          0.936       0.464         0.096       0.031
3.000    1.50      0.975          0.936       0.464         0.098       0.037
         1.55      0.975          0.937       0.452         0.101       0.037
20.000   1.60      0.974          0.941       0.445         0.099       0.038
         2.00      0.973          0.945       0.397         0.096       0.058

Note: The results are based on 1,000 iterations.
σ₀² and σ₁² are the specified acceptable and unacceptable variance limits of the model errors, respectively; α and β are the specified probabilities of the type I and II errors for the test, respectively; N(0, σ²) represents the normal generator with zero mean and variance σ², which is used for simulating the errors of a tree volume model; σ_m3² = ½(σ₀² + σ₁²); OC_{σ²} is the Monte Carlo OC value when the true variance is σ²; 0.9σ₀² and 1.1σ₁² were arbitrarily chosen to represent the cases when the true variance is less than σ₀² and greater than σ₁², respectively.

4.2.3 Accuracy of the nominal error probabilities of the SATP

Simulations were performed to examine whether the specified error probabilities, α and β, were equal to the actual error probabilities of the SATP procedures. This examination was considered necessary, because Wald's (1947) stopping rule, B < Rₙ < A, was used to derive the decision boundaries of the SATP procedures. Wald (1947, p.46) stated that since α and β were usually small (commonly less than 0.05), the approximate boundaries of A (A ≈ (1 - β)/α) and B (B ≈ β/(1 - α)) would result in actual error probabilities of an SPRT that were very nearly equal to the specified values, α and β. However, after examining some SPRT plans used in forest pest surveys, Fowler (1983) indicated that since α and β were usually 0.10 for most sequential sampling plans for pest surveys, the difference between the nominal and actual error probabilities of Wald's SPRT as a result of overshooting the decision boundaries could be large.

In order to evaluate the accuracy of the nominal error probabilities of the SATP procedures, Monte Carlo estimates of the actual probabilities of the type I (α_m) and II (β_m) errors were obtained, and compared to the nominal values. In this simulation, α_m and β_m were estimated based on Wald's (1947) requirements imposed on the OC function of an SPRT.
These are that the OC value of an SPRT when the null hypothesis is true should be 1 - α (i.e., OC_H0 = 1 - α), and the OC value when the alternative hypothesis is true should be β (OC_H1 = β). Therefore, the actual probabilities of α and β of an SATP can be estimated through Monte Carlo estimates of the OC values when the true error variance is equal to σ₀² or σ₁², respectively (i.e., OC_{σ₀²} and OC_{σ₁²}). The estimated actual probabilities of the type I and II errors for an SATP were:

α_m = 1 - OC_{σ₀²}   (4.71)

β_m = OC_{σ₁²}   (4.72)

The procedures described in Section 4.2.1 were used again to estimate OC_{σ₀²} and OC_{σ₁²} for the same sets of the simulated SATP I used in Section 4.2.2. Since the estimates of interest were the actual error probabilities, 2,000 iterations were used in this simulation. To compare the estimated actual error probabilities to the nominal error probabilities, the following values were calculated: (1) the difference between the nominal and actual probabilities (i.e., DEα = α - α_m and DEβ = β - β_m); and (2) the relative error of the nominal error probabilities (i.e., REα = (α - α_m)/α_m and REβ = (β - β_m)/β_m).

The nominal error probabilities were consistently higher than the estimated actual error probabilities (Tables 4.5 and 4.6). In general, the differences were larger for α than for β. For example, when the nominal error probabilities were set to α = β = 0.05, the differences between the nominal and estimated actual α ranged from 0.0155 to 0.0395, and the relative errors of the nominal α ranged from 0.45 to 3.75. However, the differences between the nominal and estimated actual β only ranged from 0.0050 to 0.0110, and the relative errors of the nominal β only ranged from 0.11 to 0.30.

Also, the smallest values of σ₀² and σ₁² chosen in this simulation (i.e., σ₀² = 0.003 and σ₁²/σ₀² = 1.15) had the narrowest distance between the acceptance and the rejection lines (i.e., h₂ - h₁ = 0.271 when α = β = γ = 0.05), and resulted in the largest errors (DEα = 0.0395 and DEβ = 0.0110).
The largest values of σ₀² and σ₁² (i.e., σ₀² = 20.0 and σ₁²/σ₀² = 2.00) had the widest distance between the acceptance and the rejection lines (i.e., h₂ - h₁ = 471.112 when α = β = γ = 0.05), and resulted in smaller errors (DEα = 0.0195 and DEβ = 0.0050).

These results suggest that the difference between the nominal and actual error probabilities is a result of overshooting the decision boundaries of the SATP. As the interval between the two decision lines (L₀ and L₁) decreases, overshooting of the decision boundaries becomes more common. Therefore, the SATP procedures may be viewed as conservative testing procedures, since they always resulted in a smaller probability of making a wrong decision (a high precision testing procedure). On the other hand, increasing the precision of a testing procedure usually results in a larger sample size. The results suggest that the sample size required by the SATP I procedure is likely larger than that required to meet the specified probabilities of errors.

The accuracies of the nominal error probabilities of the SATP procedures found in this simulation were very similar to those obtained by Fowler (1978; 1983) for SPRT plans used in forest pest surveys. Therefore, in some situations, if the potential users of the SATP procedures want to obtain correct nominal error probabilities, the Monte Carlo OC and ASN functions presented in Section 2.4.1.5 may be used with the SATP procedures.

The same simulation procedures were also carried out for evaluating the accuracy of the nominal error probabilities of SATP III.
The results were almost the same as those obtained for SATP I.

4.2.4 Expected sample size of the SATP procedures

Using normal generators with zero means and constant variances, simulations were designed to provide some indication of the sampling cost savings in using the SATP procedures. Since the sample size in a sequential plan is a random variable, the expected sample size (ASN) required for reaching a terminal decision was examined.

To design this simulation, a wider range of the four specified parameters, σ₀², σ₁², α and β, was examined. Two situations were considered. These were: (1) fixed σ₀², and increasing σ₁² (e.g., σ₀² = 0.003, σ₁² = 1.25σ₀², 1.5σ₀², 2σ₀²); and (2) fixed σ₁², and increasing σ₀² (e.g., σ₁² = 0.03, σ₀² = 0.65σ₁², 0.75σ₁², 0.85σ₁²). The choice of these combinations was arbitrary. Also, three levels were set for α and β (i.e., 0.05, 0.10, and 0.20). Similarly to the OC values, for σ² = σ₀², the ASN values of an SATP procedure will depend only on

Table 4.5: Monte Carlo estimates of the actual error probabilities when SATP I is applied to test H₀: σ² = σ₀² vs. H₁: σ² = σ₁² with various σ₀², σ₁² and α = β = 0.05.

σ₀²      σ₁²/σ₀²   α_m      DEα      REα    β_m      DEβ      REβ
         1.15      0.0105   0.0395   3.75   0.0390   0.0110   0.28
0.003    1.20      0.0200   0.0300   1.50   0.0435   0.0065   0.15
         1.25      0.0255   0.0245   0.96   0.0440   0.0060   0.14
0.030    1.30      0.0310   0.0190   0.61   0.0435   0.0065   0.15
         1.35      0.0345   0.0155   0.45   0.0435   0.0065   0.15
0.300    1.40      0.0330   0.0170   0.52   0.0425   0.0075   0.18
         1.45      0.0330   0.0170   0.52   0.0400   0.0100   0.25
3.000    1.50      0.0330   0.0170   0.52   0.0395   0.0105   0.27
         1.55      0.0340   0.0160   0.47   0.0390   0.0110   0.28
20.000   1.60      0.0340   0.0160   0.47   0.0385   0.0115   0.30
         2.00      0.0305   0.0195   0.64   0.0450   0.0050   0.11

Note: The estimates were based on 2,000 iterations.
am and /3m are the estimated actual probabilitiesof type I and II errors, respectively; DEa and DE are the differences between the nominal and actualvalues of type I and II error, respectively; RE and REa are the relative differences of the nominal typeI and II errors to their actual estimates, respectively.Chapter .4. Monte Carlo Simulation 91Table 4.6: Monte Carlo estimates of the actual error probabilities when the SATP I isapplied to test H0 : o.2 crg vs. H1 : = u with various cr, c and a = /3 = 0.10.U J/Y am DEc REa /3m DE RE1.15 0.0470 0.0530 1.13 0.0810 0.0190 0.230.003 1.20 0.0605 0.0395 0.66 0.0870 0.0130 0.151.25 0.0680 0.0320 0.47 0.0875 0.0125 0.140.030 1.30 0.0700 0.0300 0.43 0.0875 0.0125 0.141.35 0.0670 0.0330 0.49 0.0860 0.0140 0.160.300 1.40 0.0645 0.0355 0.55 0.0890 0.0110 0.121.45 0.0660 0.0320 0.52 0.0920 0.0125 0.093.000 1.50 0.0665 0.0335 0.50 0.0940 0.0060 0.061.55 0.0640 0.0360 0.56 0.0965 0.0035 0.0420.000 1.60 0.0595 0.0405 0.68 0.0960 0.0040 0.042.00 0.0550 0.0450 0.82 0.0935 0.0065 0.07Note: The estimates were based on 2,000 iterations. am and /3m are the estimated actual probabilitiesof type I and II errors, respectively; DEa and DEfi are the differences between the nominal and actualvalues of type I and II error, respectively; REa and RE,, are the relative differences of the nominal typeI and II errors to their actual estimates, respectively.Chapter .. Monte Carlo Simulation 92a, /3 and u/r. That is, for given a and 3, ASN values can be expected to be the sameif the ratio of o to o is same. Finally, since o and o were specified instead e0 and e1,the specification of ‘y was not required.For each combination of o, o, a and /3, the normal generator, N(0, o) was used tosimulate taking random observations from the distribution of d. The simulation procedures described in section 4.2.1 were then used to estimate the ASN values when o isthe true error variance. 
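The mechanics of one SATP I run, and the Monte Carlo ASN estimate built from repeated runs, can be sketched as follows. This is a minimal illustration, assuming the decision boundaries of Equations 3.55 to 3.57 are the standard Wald SPRT boundaries for the variance of zero-mean normal errors (for σ₀² = 20, σ₁² = 40 and α = β = 0.05 these formulas give h2 − h1 = 471.11, matching the value quoted above); the function names are mine, not the thesis's, and the parameter values are taken from Table 4.7.

```python
import math
import random

def satp1_boundaries(s0sq, s1sq, alpha, beta):
    """Standard Wald SPRT boundaries for H0: sigma^2 = s0sq vs.
    H1: sigma^2 = s1sq with zero-mean normal errors (presumably what
    Equations 3.55-3.57 compute)."""
    c = 1.0 / s0sq - 1.0 / s1sq                      # > 0 when s1sq > s0sq
    h1 = 2.0 * math.log(beta / (1.0 - alpha)) / c    # acceptance intercept (< 0)
    h2 = 2.0 * math.log((1.0 - beta) / alpha) / c    # rejection intercept  (> 0)
    s = math.log(s1sq / s0sq) / c                    # common slope, s0sq < s < s1sq
    return h1, h2, s

def run_satp1(s0sq, s1sq, alpha, beta, true_var, rng):
    """Draw d_i ~ N(0, true_var) until T_n = sum(d_i^2) leaves the band
    (h1 + s*n, h2 + s*n); return (accepted_H0, sample_size)."""
    h1, h2, s = satp1_boundaries(s0sq, s1sq, alpha, beta)
    t, n = 0.0, 0
    while True:
        n += 1
        d = rng.gauss(0.0, math.sqrt(true_var))
        t += d * d
        if t <= h1 + s * n:
            return True, n    # accept H0: model acceptable
        if t >= h2 + s * n:
            return False, n   # reject H0: model unacceptable

# Monte Carlo ASN estimate when sigma_0^2 is the true error variance
# (sigma_0^2 = 0.003, sigma_1^2 = 2*sigma_0^2, alpha = beta = 0.05).
rng = random.Random(1)
sizes = [run_satp1(0.003, 0.006, 0.05, 0.05, 0.003, rng)[1] for _ in range(500)]
asn = sum(sizes) / len(sizes)  # Table 4.7 reports about 29.8 for this setting
```

Because the boundary spacing shrinks as the two class limits move apart, this loop reproduces the pattern discussed below: ratios σ₁²/σ₀² near 2 terminate in roughly 30 observations on average.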
Since the estimate of interest is the ASN value, 500 iterations were used, based on the results of the sensitivity analysis reported previously.

In general, it was found that the expected sample size (ASN) required to reach a terminal decision decreased rapidly, for fixed levels of error probabilities, as the interval between the two class limits (σ₀² and σ₁²) increased (Table 4.7). By fixing the level of α (or β) and increasing the level of β (or α), ASN also decreased. These results can be explained by examining the equations for the decision boundaries of SATP I. The decision boundaries of SATP I (i.e., h1 + sn and h2 + sn) are always two parallel lines. The intercepts (h1 and h2) of these lines are determined by Equations 3.55 and 3.56, and the common slope (s) is determined by Equation 3.57. Intuitively, any change that causes an increase in the distance between these two decision lines (the absolute difference of h1 and h2), or an increase in the value of s, will result in a higher ASN. For example, for given values of α and β, as the difference between the two class limits increases, the denominators of the three equations will increase, resulting in a rapid decrease in the values of h1 and h2. The distance between the two decision lines is therefore smaller, and the ASN values will decrease. Similarly, for given values of the two class limits, as the value of α or β increases, the distance between the two decision lines decreases, and the resulting ASN then decreases.

Table 4.7: Monte Carlo ASN estimates and the associated standard errors (in brackets) when σ² = σ₀², for applying SATP I to test H0: σ² = σ₀² vs. H1: σ² = σ₁² with various combinations of σ₀², σ₁², α and β.

                          α = 0.05                        α = 0.10                        α = 0.20
σ₀²       σ₁²          β=0.05    β=0.10    β=0.20     β=0.05    β=0.10    β=0.20     β=0.05    β=0.10    β=0.20
0.003     1.25σ₀²      240.8     177.8     117.6      222.2     159.8     105.9      195.2     135.3     79.6
                       (7.080)   (5.999)   (4.319)    (6.449)   (5.120)   (3.888)    (5.867)   (4.339)   (2.531)
          1.50σ₀²      75.4      57.8      39.7       71.0      53.6      35.8       62.4      46.0      28.5
                       (2.111)   (1.755)   (1.443)    (1.860)   (1.529)   (1.232)    (1.511)   (1.264)   (0.873)
          2.00σ₀²      29.8      22.8      15.9       28.3      21.7      14.7       26.1      19.4      12.2
                       (0.798)   (0.651)   (0.536)    (0.727)   (0.623)   (0.465)    (0.656)   (0.520)   (0.324)
0.65σ₁²   0.030        67.5      51.5      35.2       64.2      47.9      32.0       56.6      41.6      26.0
                       (1.815)   (1.556)   (1.244)    (1.661)   (1.381)   (1.077)    (1.368)   (1.142)   (0.795)
0.75σ₁²                144.1     108.5     74.6       136.5     101.0     65.6       117.4     83.3      51.1
                       (4.134)   (3.589)   (2.961)    (3.853)   (3.218)   (2.384)    (2.944)   (2.379)   (1.660)
0.85σ₁²                428.0     321.8     216.6      406.5     301.4     197.1      359.1     254.1     142.8
                       (13.237)  (10.851)  (8.503)    (12.618)  (10.207)  (7.895)    (11.372)  (7.978)   (4.855)
0.300     1.30σ₀²      174.8     129.1     86.9       164.1     117.3     77.8       141.2     99.3      61.0
                       (5.236)   (4.239)   (3.415)    (4.856)   (3.608)   (2.771)    (3.986)   (2.888)   (2.052)
          1.60σ₀²      57.9      44.6      30.4       55.3      42.2      27.9       49.8      36.3      22.6
                       (1.531)   (1.338)   (1.028)    (1.435)   (1.269)   (0.939)    (1.254)   (0.996)   (0.676)
          1.90σ₀²      34.2      26.2      18.5       32.4      24.6      16.8       29.7      22.0      13.9
                       (0.898)   (0.769)   (0.635)    (0.814)   (0.705)   (0.545)    (0.737)   (0.626)   (0.394)
0.60σ₁²   3.000        50.8      37.8      26.6       47.7      35.9      24.6       43.6      31.8      19.8
                       (1.355)   (1.114)   (0.907)    (1.240)   (1.076)   (0.850)    (1.113)   (0.894)   (0.580)
0.70σ₁²                96.8      73.0      50.3       90.8      67.4      44.9       77.9      57.1      35.7
                       (2.964)   (2.407)   (1.818)    (2.682)   (2.089)   (1.507)    (1.952)   (1.563)   (1.125)
0.80σ₁²                240.8     177.8     117.6      222.2     159.8     105.9      195.2     135.3     79.6
                       (7.080)   (5.999)   (4.519)    (6.448)   (5.120)   (3.888)    (5.877)   (4.339)   (2.531)

Note: The results are based on 500 iterations; the numbers in brackets are the standard errors of the ASN estimates. When σ₀² is specified as 0.60σ₁², the value of σ₁² is first specified, and σ₀² is then determined as 0.60σ₁²; similarly, when σ₁² is specified as 1.25σ₀², the value of σ₀² is first specified, and σ₁² is then determined as 1.25σ₀².

Based on the results of this simulation, it was found that for any α and β examined, a resulting ASN value of less than 30 observations could be obtained by using a ratio of σ₁² to σ₀² that is larger than 2.0. To test the hypothesis H0: σ² = σ₀² against H1: σ² = σ₁², the sample size (n) required by a fixed sample size procedure to have the desired values of α and β will be the value, n, which satisfies the following relationship:

    χ²_{1−α}(n−1) / χ²_{β}(n−1) = σ₁² / σ₀²                    (4.73)

where χ²_{1−α}(n−1) and χ²_{β}(n−1) are the 100(1−α) and 100β percentiles of a chi-square distribution with (n−1) degrees of freedom. Since it has been previously indicated that the OC and ASN values of an SATP procedure when σ₀² or σ₁² is true will depend only on the ratio σ₁²/σ₀² for given α and β, the resulting ASN values of the simulated SATP I may be compared to the sample sizes required by a fixed sample size procedure with the same α and β levels and the same value of the σ₁²/σ₀² ratio. However, as shown in the last section, the SATP procedures always resulted in smaller error probabilities than those specified. For a simulated SATP I with a ratio, σ₁²/σ₀², of 2.0, the actual probabilities of the type I and II errors (αm and βm) were 0.0305 and 0.0450, respectively, when the specified values were α = β = 0.05 (Table 4.5). The actual probabilities of the type I and II errors were 0.0550 and 0.0935, respectively, when the specified values were α = β = 0.10 (Table 4.6). Therefore, when making a comparison to fixed sample size procedures, these actual error probabilities should be used instead of those specified.

Based on Equation 4.73, using a fixed sample size procedure for testing the hypotheses H0: σ² = σ₀² against H1: σ² = σ₁², the sample size for α = 0.0305, β = 0.0450, and a ratio, σ₁²/σ₀², of 2.0 is 55. Similarly, the sample size of a fixed sample size procedure required for α = 0.0550, β = 0.0935, and a ratio, σ₁²/σ₀², of 2.0 is 35. The resulting ASN
The resulting ASNvalues of the simulated SATP I that correspond to these two fixed sample size tests are29.8 and 21.7 (Table 4.7). This means that in doing the same tests, SATP proceduresuse only 54.2% and 62% of sample sizes, respectively, when compared to the fixed samplesize procedures with the same error probabilities. This result is similar to those foundChapter . Monte Carlo Simulation 95by Fowler (1978). That is, the use of SATP procedures will also be expected to resultin a 40 to 60% of sampling cost-saving compared to an equally reliable fixed sample sizeprocedure.The ASN values of SATP III for all combinations of the four parameters were alsoestimated and examined. The results were similar to those obtained for SATP I. However,ASN values of SATP III were one to five observations higher than those of SATP I. Thisresult may be explained by the fact that SATP III requires that the sample mean of d beestimated, and therefore, a minimum of two observations is needed to make a terminaldecision. SATP I does not have this requirement.4.3 Reliability of the SATP When Errors Are iid Normal Variables With aNon-Zero Mean and a Constant VarianceSimulations were also performed to validate the SATP procedures, assuming that theunconditional distribution of the errors of a volume model are iid normal variable witha non-zero mean (i.e., estimation bias is present). Acknowledging that the mean ofmodel errors is not zero may be more realistic for applicability testing of forestry models,because changes in growth type, site quality, or genotype between forest or species likelycause systematic estimation errors.From Chapter 2, the derivation of Freese’s (1960) test I and II requires an assumptionthat d or d/y is normally distributed with a zero mean. This assumption is critical fortranslating the probability statement of accuracy requirement (Equation 2.17) into thevariance bound (Equation 2.20). 
If the mean of d or d/y is not zero, this translation is not possible, because the distribution of d² or (d/y)² will not be central chi-square, but noncentral chi-square. If the error, d, is the variable of interest and a non-zero mean of d is found, Freese (1960) suggested that Σ(d − d̄)² should be used for measuring the actual accuracy of a model (i.e., Freese's test III) instead of Σd². Therefore, SATP III is simply a modification of SATP I to allow for a non-zero mean of d. However, as indicated in Chapter 3, SATP III was not derived by directly computing the likelihood ratio, R, for the situation when μd ≠ 0. Instead, it was obtained by modifying the test statistic and the decision boundaries of SATP I. Therefore, the validity of the SATP procedures may be examined by comparing the performance of SATP I and III using normal distribution generators with non-zero means and constant variances. The problems of interest for this simulation were: (1) what is the behaviour of SATP I under different magnitudes of estimation bias; and (2) how reliable is SATP III when applied to classify the applicability of a model after bias correction.

To carry out this simulation, five levels were set for the means of the normal generators (i.e., 0.1, 0.3, 0.5, 1.0 and 2.0). These values were chosen to cover the absolute values of the mean of volume estimation errors found in the B.C. tree data and selected models (i.e., from −2.1 to 0.15). The variance of each generator was set equal to the null parameter, σ₀², for each simulated SATP. Using these selected normal generators, the simulation procedures in Section 4.2.1 were performed to estimate the OC values. However, to limit the work of this simulation, only one level of error probabilities was used (i.e., α = β = 0.05), and only the OC values when σ₀² was the true parameter (i.e., for σ² = σ₀²) were estimated.
For each simulation, 1,000 iterations were used, based on the results of the sensitivity analysis described earlier.

The effect of the bias of model estimation on the testing decision of SATP I was substantial, especially when the two specified limits of error variance were close together (Table 4.8). For these closer variance limits, the estimates of the OC values approached zero, which means that the probability of rejecting a model is nearly one even when the hypothesized acceptance variance σ₀² is known to be equal to the true variance of the generated errors. As the introduced bias (the mean of the normal generators) increased, the effect of bias on the performance of SATP I was greater. This result confirms that the testing decision of SATP I will depend on both the actual variance (precision) and mean (bias) of model errors.

Table 4.8: Comparison of Monte Carlo OC values when σ² = σ₀² between SATP I and III, for testing H0: σ² = σ₀² vs. H1: σ² = σ₁² with α = β = 0.05 and various σ₀² and σ₁², using normal generators with a non-zero mean.

                                        SATP I                                        SATP III
σ₀²      σ₁²/σ₀²   N(0.1,σ₀²)  N(0.3,σ₀²)  N(0.5,σ₀²)  N(1.0,σ₀²)  N(2.0,σ₀²)   All five generators
         1.15      0.000       0.000       0.000       0.000       0.000        0.992
0.003    1.20      0.000       0.000       0.000       0.000       0.000        0.983
         2.00      0.000       0.000       0.000       0.000       0.000        0.967
         1.25      0.012       0.000       0.000       0.000       0.000        0.978
0.030    1.30      0.022       0.000       0.000       0.000       0.000        0.971
         2.00      0.625       0.000       0.000       0.000       0.000        0.967
         1.35      0.946       0.107       0.000       0.000       0.000        0.969
0.300    1.40      0.951       0.164       0.000       0.000       0.000        0.965
         2.00      0.955       0.686       0.046       0.000       0.000        0.967
         1.45      0.964       0.955       0.878       0.165       0.000        0.962
3.000    1.50      0.965       0.952       0.887       0.214       0.000        0.966
         2.00      0.967       0.955       0.949       0.625       0.001        0.967
         1.55      0.968       0.967       0.964       0.933       0.601        0.973
20.000   1.60      0.967       0.969       0.965       0.938       0.657        0.967
         2.00      0.969       0.966       0.962       0.952       0.819        0.967

Note: The values obtained were based on 1,000 iterations. σ₀² and σ₁² are the specified acceptable and unacceptable variance limits of model errors, respectively; N(0.1, σ₀²) represents the normal generator with mean 0.1 and variance σ₀².
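This joint dependence on precision and bias can be made concrete. For zero-mean normal errors, the accuracy requirement P(|d| ≤ e) ≥ 1 − γ is equivalent to σ² ≤ e²/z²_{1−γ/2}; with a bias μ, the attained coverage becomes Φ((e − μ)/σ) − Φ((−e − μ)/σ), which falls below 1 − γ. A stdlib-only sketch (the values e = 1.0 and μ = 0.5 are hypothetical):

```python
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def coverage(e, mu, sigma):
    """P(|d| <= e) for d ~ N(mu, sigma^2)."""
    return phi((e - mu) / sigma) - phi((-e - mu) / sigma)

gamma, e = 0.05, 1.0
z = 1.959964                 # z_{1 - gamma/2} for gamma = 0.05
sigma = e / z                # largest sigma meeting the variance bound when mu = 0

unbiased = coverage(e, 0.0, sigma)  # equals 1 - gamma by construction
biased = coverage(e, 0.5, sigma)    # a bias of 0.5 units erodes the coverage
```

Here `unbiased` is 0.95 exactly, while `biased` drops to roughly 0.84: even though the variance still meets the bound, the accuracy requirement fails once the bias is introduced.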
The results discussed above support the statement made by Reynolds (1984, p.457) in discussing Freese's (1960) tests I and II for model testing. That is, if the tested model is biased, then the error variance would need to be even smaller than the variance bound (i.e., σ² ≤ e²/z²_{1−γ/2}) specified for model acceptance. If the bias is large enough, it may not be possible to meet the accuracy requirement no matter how small the error variance.

4.4 Reliability of the SATP When Errors Are Normally Distributed With a Zero Mean and Heterogeneous Variances

For the two previous simulations, normally distributed errors with a constant variance were considered, which is consistent with the distributional assumption required to extend Freese's (1960) procedure into a Wald's (1947) SPRT plan. However, models with non-constant (heterogeneous) error variances are commonly used in forestry. This situation is particularly true in tree volume estimation, since the variation of tree volume increases as tree size (e.g., dbh and height) increases. Although weighted least-squares or generalized least-squares approaches, or appropriate transformations, may be applied to address the problem of heterogeneous variance (Cunia 1964, Greene 1990), the principal purpose in applying the volume models is to estimate tree volume, and models obtained from an ordinary least-squares approach are still unbiased even with heterogeneous variance. Therefore, the validation of the SATP procedures should be extended to model errors with heterogeneous variances.

4.4.1 Simulation Methods

To simulate model errors with heterogeneous variances, a simple practical situation for volume model testing was considered: the variances of model errors increase with tree dbh class or dbh class group. Also, the errors from each dbh class or group can be assumed to follow a normal distribution with a zero mean and a constant variance, σ²(i), for i = 1, . . . ,
k, where k is the number of dbh classes or groups in the population.

In order to carry out this simulation, three problems had to be addressed. First, the variance pattern of the model errors (the mathematical relationship between the error variances and tree dbh) must be specified. Second, the mathematical model (distribution) used for generating the error population must be selected. Finally, a method to evaluate the reliability of the SATP procedures for this simulated condition must be selected.

To address the first problem, assuming that the error variance of a volume model increases as tree dbh increases, the variance estimation model suggested by Cunia (1964) was used. This is:

    σ²(i) = a₀ · dbhᵢ^a                                    (4.74)

where σ²(i) is the variance of the model errors of the ith dbh class; a₀ is an unknown constant for a given volume model; and a is the power coefficient. In the simulation, a₀ was arbitrarily set to 0.005, since the minimum value of MSD found for the B.C. data was 0.0049. Two patterns of variance were considered, σ²(i) ∝ dbhᵢ and σ²(i) ∝ dbhᵢ² (i.e., a = 1 and 2, respectively). Also, since the range of dbh of the tree sectional data was found to be from 3.8 to 216.4 cm (most trees were between 3.8 and 100 cm, especially in the lodgepole pine and aspen sets), to reduce the range of simulation, dbh was transformed by dividing by 10. This gave two variance models:

1. Model 1: σ²(i) = 0.05xᵢ, with xᵢ = 2, 4, 6, 8, 10.
2. Model 2: σ²(i) = 0.5xᵢ², with xᵢ = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

where xᵢ = dbh/10, and the numbers 1, 2, . . . , 10 cover a dbh range from 10 to 100 cm with a 10 or 20 cm class interval.

Second, to develop the simulated error populations with a variance pattern following Model 1 or 2 above, a combination (mixture) of normal generators was used. These were:

1. To simulate Model 1, five normal generators with zero means and variances of 0.1, 0.2, 0.3, 0.4 or 0.5 were linked using a uniformly distributed variable with values from 1 to 5.
At each stage of sampling, the uniform generator was first used to generate a number. If the generated number was 1, the normal generator with zero mean and variance 0.1 was used to simulate a random observation for the current stage of sequential sampling. If the value of the uniform generator was 2, the normal generator with zero mean and variance 0.2 was used for sampling, and so on. This simulated population will be denoted as dm1 ~ N(0, 0.05xᵢ) for xᵢ = 2, 4, 6, 8, 10.

2. To simulate Model 2, ten normal generators with zero means and variances of 0.5, 2.0, 4.5, 8.0, 12.5, 18.0, 24.5, 32.0, 40.5 or 50.0 were linked using a uniformly distributed variable with values from 1 to 10, using the same method used to generate dm1. This simulated population will be denoted as dm2 ~ N(0, 0.5xᵢ²) for xᵢ = 1, 2, . . . , 10.

The probability distribution of these two simulated error populations of tree volume models with heterogeneous variances may be defined statistically as a mixture of a finite number (k) of normal distributions (Johnson and Kotz 1970, p.87). The general probability density function of a mixture of normal distributions is:

    f(x) = Σᵢ₌₁ᵏ wᵢ (2πσ²(i))^{−1/2} exp{−(x − μᵢ)² / (2σ²(i))}          (4.75)

where the wᵢ are the weights for i = 1, . . . , k (0 < wᵢ, Σwᵢ = 1); and μᵢ and σ²(i) are the mean and variance of the ith normal distribution, respectively. In this simulation, the μᵢ were zero for all i, and the wᵢ were equal for all i (i.e., the k normal distributions were mixed with a weight coefficient of 1/5 or 1/10). Eisenberger (1964) studied the statistical properties of the mixture of two normal distributions, N(μ₁, σ²(1)) and N(μ₂, σ²(2)), with a weight coefficient p (0 < p < 1) from the first, and (1 − p) from the second normal distribution. From the general form given, this mixed normal variable x has the density function:

    f(x, p) = pN(μ₁, σ²(1)) + (1 − p)N(μ₂, σ²(2))

Based on his study, Eisenberger concluded, for fixed values of the variances σ²(1) and σ²(2), that the distribution of the mixture will be unimodal, and independent of the weights p and
1 − p, if the difference between the means is sufficiently small. If the difference between the two means exceeds a critical value given by:

    (μ₂ − μ₁)² > 8σ²(1)σ²(2) / (σ²(1) + σ²(2))

the distribution of x may be unimodal or bimodal, depending on the value of p. However, if p is sufficiently close to zero or one, the distribution will always be unimodal. Johnson and Kotz (1970) gave the formulae for the first to fifth moments about zero of the mixture of two normal distributions. For a random variable, x, with the probability density function of Equation 4.75 and zero means (i.e., μᵢ = 0 for all i), the mathematical expectation and variance of x are:

    E(x) = 0; and                                          (4.76)

    Var(x) = Σᵢ₌₁ᵏ wᵢ σ²(i)                                (4.77)

For a model with heterogeneous error variances as generated using the mixed normal distributions described above, the validity of applying the SATP procedures is not clear, because the translation between the accuracy requirement specified as e and γ and the variance bound, σ₀² = e²/z²_{1−γ/2}, is not valid. However, the problem of interest is whether the SATP procedures can correctly classify a model as acceptable with a probability of 1 − α when at least 100(1 − γ) percent of the model errors are within e units. In other words, knowing that the accuracy requirement, P(|d| ≤ e) ≥ 1 − γ, is true for a model, will the SATP procedures reach a decision to accept the model with a probability of 1 − α when the model errors follow a mixture of normal distributions with zero means and different variances? To examine this question, the true values of e for the different γ levels had to be determined for each of the two generated mixed normal populations. For this purpose, 10 samples with a large sample size of 10,000 were taken from dm1 or dm2. For each of the 10 samples, the e value such that 100(1 − γ) percent of the observations were within e units was calculated. The average of the e values over the 10 samples was then calculated for each γ level and population.
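The procedure just described (draw from the mixture, find the e value such that 100(1 − γ) percent of the errors fall within e units, then average over repeated samples) amounts to estimating the (1 − γ) quantile of |d|. A sketch for the dm1 population; the seed and helper names are mine:

```python
import random

def draw_dm1(rng):
    """One error from the dm1 mixture: pick one of the five dbh classes
    uniformly, then draw from N(0, 0.05*x) with x in {2, 4, 6, 8, 10}."""
    x = rng.choice([2, 4, 6, 8, 10])
    return rng.gauss(0.0, (0.05 * x) ** 0.5)

def empirical_e(gamma, rng, n=10_000, reps=10):
    """Average, over `reps` samples of size n, of the e value such that
    100*(1 - gamma) percent of |d| fall within e units."""
    es = []
    for _ in range(reps):
        a = sorted(abs(draw_dm1(rng)) for _ in range(n))
        es.append(a[int((1.0 - gamma) * n) - 1])  # empirical (1 - gamma) quantile
    return sum(es) / reps

rng = random.Random(7)
e_10 = empirical_e(0.10, rng)  # the thesis reports e = 0.9010 for gamma = 0.10
```

With the sample sizes used here, the estimate lands close to the thesis's value for γ = 0.10; the remaining γ levels are obtained by changing the `gamma` argument.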
In this way, the error limits, e, corresponding to the γ levels of 0.30, 0.20, 0.10 and 0.05 were determined. These values were:

1. For dm1, e = 0.5674, 0.7022, 0.9010 and 1.0735; and
2. For dm2, e = 4.5454, 5.6248, 7.2174 and 8.5995.

The calculated e values were then used to set the e₀ values, and the acceptable variance limit σ₀² was determined as e₀²/z²_{1−γ/2} for calibrating a simulated SATP I or III. In this way, the error population generated should be accepted with a probability of 1 − α, if the approximation (i.e., σ₀² = e₀²/z²_{1−γ/2}) can be used. Simulations were then conducted to estimate the probability of accepting H0 as a standard to evaluate the reliability of the SATP procedures.

The procedures for examining SATP I using the generated mixed normal populations were:

1. For each of the four levels of e₀ values, three levels were chosen for the unacceptable error limit, e₁, specified as a ratio of e₁ to e₀ (i.e., e₁/e₀ = 1.1, 1.2, etc.);

2. Three levels of error probabilities were chosen. These were: α = β = 0.05, 0.10, and 0.20;

3. The values for each combination of e₀, e₁, α = β and the corresponding value of γ were first used to determine the decision boundaries of SATP I using Equations 3.55, 3.56 and 3.57. Each combination of e₀, e₁, γ, and α = β represented a simulated SATP I for testing the hypothesis H0: σ² = e₀²/z²_{1−γ/2} vs. H1: σ² = e₁²/z²_{1−γ/2};

4. Using the generated population, dm1, random observations were taken, with one observation at each stage of sampling. The observed test statistic, T1 = Σ dᵢ², was then compared to the decision boundaries, (h1 + sn) and (h2 + sn), where dᵢ is the sampled value at the ith stage of sampling and n is the cumulative number of sampled values up to the current stage of sampling. Based on the decision rules outlined in Section 3.1.2, the decision of accepting or rejecting H0 was made, or an additional observation was taken;

5. Step 4 was repeated until a terminal decision was finally reached;

6.
For a simulated test determined in Step 3, Steps 4 and 5 were repeated 1,000 times. The percent of times that H0 was accepted (i.e., the estimate of the OC value) and the associated ASN value were calculated; and

7. Steps 3 to 6 were repeated for all combinations of e₀, e₁, γ and α = β.

The same procedures were repeated using the simulated population, dm2.

4.4.2 Simulation Results and Discussion

For the generated error population, dm1, it was found that all of the resulting probabilities of accepting H0 (i.e., OC values) were equal to or larger than 1 − α for all levels of α = β (Table 4.9). For α = β = 0.05, the resulting OC values ranged from 0.951 to 0.975. For α = β = 0.10, the resulting OC values ranged from 0.900 to 0.964, and for α = β = 0.20, the resulting OC values ranged from 0.820 to 0.927. Similar results were also found for dm2 (Table 4.10).

Since the values, e and γ, used to set up the null hypothesis were known to exactly meet the accuracy requirement represented as P(|dm| ≤ e) = 1 − γ, the results of this simulation confirmed that the use of the SATP I procedure resulted in the acceptance of the generated error populations (models) with a probability of at least 1 − α. In other words, the error populations with the heterogeneous variances addressed in this simulation did not appear to adversely affect the performance of the SATP procedures. Also, the resulting ASN values were similar to those found in Table 4.7. For an expected sample size of less than 30 observations, in most cases, the ratio of e₁ to e₀ should be larger than 2.00 for α = β = 0.05 or 0.10.
For α = β = 0.20, the ASN was less than 30 for e₁/e₀ ratios greater than 1.25.

It should be noted that although only two patterns (models) of error variance were considered in this simulation study (i.e., σ² ∝ dbh and σ² ∝ dbh²), they represent the most common models of error variance in tree volume estimation (Cunia 1964). Therefore, the results provide a useful indication for the application of the SATP procedures to testing tree volume models with heterogeneous error variances. However, the results obtained in this simulation depend on two critical assumptions: (1) the errors of a model can be divided into mutually exclusive subgroups, and the errors in each of these subgroups can be assumed to be iid normal variables; and (2) the observations are taken uniformly from every subgroup (i.e., the k normal distributions of errors are mixed with equal weights).

Table 4.9: Monte Carlo estimated probabilities and associated average sample sizes for accepting H0 when SATP I is applied to test H0: σ² = e₀²/z²_{1−γ/2} vs. H1: σ² = e₁²/z²_{1−γ/2}, with e₀ and γ determined from the mixed normal generated population, N(0, 0.05xᵢ).

                          α = β = 0.05          α = β = 0.10          α = β = 0.20
e₀       e₁/e₀   γ      OC     ASN (SE)        OC     ASN (SE)        OC     ASN (SE)
0.5674   1.15    0.30   0.954  181.8 (4.370)   0.910  115.4 (2.912)   0.820  54.9 (1.350)
         1.35           0.957  40.4 (0.842)    0.906  27.2 (0.552)    0.858  15.4 (0.330)
         2.00           0.973  10.0 (0.160)    0.961  7.4 (0.125)     0.923  4.6 (0.072)
0.7022   1.20    0.20   0.956  94.0 (1.918)    0.902  64.3 (1.389)    0.825  33.0 (0.758)
         1.40           0.954  31.3 (0.597)    0.929  22.6 (0.474)    0.880  12.6 (0.249)
         2.00           0.975  9.8 (0.158)     0.964  7.2 (0.120)     0.927  4.6 (0.070)
0.9010   1.25    0.10   0.951  65.4 (1.261)    0.900  45.7 (0.997)    0.842  24.0 (0.528)
         1.45           0.954  26.8 (0.523)    0.929  19.3 (0.400)    0.887  10.9 (0.221)
         2.00           0.975  9.8 (0.158)     0.964  7.2 (0.120)     0.924  4.6 (0.070)
1.0735   1.30    0.05   0.952  50.4 (1.027)    0.909  34.2 (0.741)    0.851  18.9 (0.425)
         1.50           0.957  23.5 (0.445)    0.931  16.7 (0.327)    0.890  9.4 (0.183)
         2.00           0.974  9.9 (0.159)     0.962  7.3 (0.121)     0.923  4.6 (0.071)

Note: The values are based on 1,000 iterations. N(0, 0.05xᵢ) represents the simulated mixture of 5 normal distributions with zero means and variances 0.1, 0.2, 0.3, 0.4 and 0.5, mixed with equal weight 1/5; e₀ and e₁ are the specified acceptable and unacceptable error limits; dm1 represents the mixed normal generator, N(0, 0.05xᵢ); γ is the actual accuracy level associated with e₀; both e₀ and γ were determined from the generated population based on the 10 samples of size 10,000; OC is the Monte Carlo estimated probability of accepting H0 when the sequential test is simulated; ASN is the average sample number associated with OC; SE is the standard error of the estimated ASN.

Table 4.10: Monte Carlo estimated probabilities and associated average sample sizes for accepting H0 when SATP I is applied to test H0: σ² = e₀²/z²_{1−γ/2} vs. H1: σ² = e₁²/z²_{1−γ/2}, with e₀ and γ determined from the mixed normal generated population, N(0, 0.5xᵢ²).

                          α = β = 0.05          α = β = 0.10          α = β = 0.20
e₀       e₁/e₀   γ      OC     ASN (SE)        OC     ASN (SE)        OC     ASN (SE)
4.5454   1.15    0.30   0.978  121.9 (3.262)   0.950  86.3 (2.426)    0.882  47.0 (1.590)
         1.35           0.978  34.5 (0.924)    0.958  25.4 (0.773)    0.894  13.8 (0.371)
         2.00           0.984  9.4 (0.208)     0.972  7.0 (0.150)     0.940  4.5 (0.095)
5.6248   1.20    0.20   0.984  72.5 (1.831)    0.962  52.4 (1.450)    0.896  28.8 (0.900)
         1.40           0.980  27.4 (0.704)    0.966  20.4 (0.585)    0.926  12.1 (0.349)
         2.00           0.986  9.3 (0.201)     0.974  6.8 (0.138)     0.948  4.5 (0.093)
7.2174   1.25    0.10   0.984  51.7 (1.309)    0.972  37.9 (1.106)    0.904  20.7 (0.585)
         1.45           0.980  23.0 (0.568)    0.968  17.4 (0.474)    0.936  10.2 (0.279)
         2.00           0.986  9.2 (0.195)     0.974  6.7 (0.13)      0.952  4.5 (0.092)
8.5995   1.30    0.05   0.982  39.1 (0.986)    0.974  29.0 (0.836)    0.912  16.4 (0.452)
         1.50           0.986  20.2 (0.525)    0.968  14.8 (0.374)    0.942  9.0 (0.244)
         2.00           0.986  9.1 (0.189)     0.978  6.7 (0.139)     0.954  4.5 (0.091)

Note: The values are based on 1,000 iterations. N(0, 0.5xᵢ²) represents the simulated mixture of 10 normal distributions with zero means and variances 0.5, 2.0, 4.5, 8.0, 12.5, 18.0, 24.5, 32.0, 40.5 and 50.0, mixed with equal weight 1/10; e₀ and e₁ are the specified acceptable and unacceptable error limits; dm2 represents the mixed normal generator, N(0, 0.5xᵢ²); γ is the actual accuracy level associated with e₀; both e₀ and γ were determined from the generated population based on the 10 samples of size 10,000; OC is the Monte Carlo estimated probability of accepting H0 when the sequential test is simulated; ASN is the average sample number associated with OC; SE is the standard error of the estimated ASN.

4.5 Accuracy of the Approximate OC and ASN Equations of the SATP

As indicated in Chapter 2, the OC and ASN functions are the two most important properties of Wald's (1947) SPRT plans, because these functions can be applied to describe the probabilities of making correct testing decisions and the expected sample size of an SPRT over all possible values of the parameter of interest. In practical applications of Wald's SPRT plans, these functions will assist the users in obtaining an appropriate SPRT plan for a given problem. In Chapter 3, OC and ASN equations based on Wald's general functions were suggested as approximations of the unknown OC and ASN functions of the SATP procedures (i.e., Equations 3.67, 3.68 and 3.69). These equations were originally
derived by ignoring possible overshooting of the decision boundaries; their accuracy in representing the unknown OC and ASN functions of the SATP procedures is therefore unknown.

Fowler (1978, 1983) investigated the accuracy of Wald's OC and ASN functions for SPRT plans of binomial, negative binomial, normal and Poisson distributions used in forest pest management. He concluded that Wald's OC functions underestimated the actual OC function near the lower class limit and overestimated it near the upper limit. Also, Wald's ASN functions underestimated the actual ASN function, with the maximum relative error occurring near the maximum ASN value. The practical consequences of these errors are: (1) the actual error probabilities can be smaller than the nominal probabilities of errors used to build the sampling plan; and (2) more observations are usually taken in the field than required to meet the desired probabilities of errors (i.e., α and β) (Fowler 1978, p.12). However, all of the SPRT procedures studied by Fowler were for testing population means, not population variances. Also, the parameters used in the simulations of his study were adopted from pest surveys; they may not be appropriate for model testing. Therefore, it was necessary to investigate the accuracy of the suggested approximate OC and ASN equations before applying them in practical situations.

4.5.1 Simulation methods

Since an OC and an ASN equation can be developed for any given SPRT specified by a combination of the values of the four parameters (σ₀², σ₁², α and β), it is impossible to examine the accuracy of the suggested OC and ASN equations for all possible combinations of these parameters in the range found for the volume models selected. Only two sequential accuracy testing plans were selected, to provide some indication of the accuracy of the OC and ASN equations suggested for the SATP procedures. The two testing plans selected were:

1. Plan 1: σ₀² = 0.3, σ₁² = 0.8, γ = α = β = 0.05; and

2.
Plan 2: σ0² = 1.50, σ1² = 2.25, γ = α = 0.05, β = 0.10.

The considerations in selecting these plans were to have the tested variance within the middle of the range of the actual error variances found for the B.C. tree data and models selected (i.e., MSD from 0.003 to 19.9 in Table 4.2), and to have the levels of the type I and II errors be equal and unequal.

To evaluate the accuracy of the approximate OC and ASN equations of the SATP procedure, the actual unknown OC and ASN functions for a given testing plan must be estimated using Monte Carlo techniques. The estimated equations are then used as a standard for comparison with the OC and ASN values determined by the approximate equations. The procedures used to construct the Monte Carlo OC and ASN equations of Plan 1 for SATP I were:

1. The specified values of σ0² = 0.3, σ1² = 0.8 and α = β = 0.05 were first used to determine the decision boundaries for SATP I using Equations 3.55, 3.56 and 3.57;

2. A set of values uniformly distributed across, and extending beyond, the acceptance and unacceptance limits (σ0² and σ1²) was obtained by setting the value of h in Equation 3.68 from -4.0 to 4.0 with an interval of 0.5 (h ≠ 0). For each h value, Equation 3.68 was then used to calculate σ². The calculated values represented various possible values of the true variance of errors around the tested variances. These were the data points for the horizontal axis when constructing the OC or ASN curves. These calculated values will be denoted as σi² for i = 1 ... 18;

3. The normal generator, N(0, σi²), was used to simulate taking random observations from the error population of a volume model, with one observation taken at each stage of sampling. SATP I was carried out to test the hypothesis H0: σ² = 0.3 vs. H1: σ² = 0.8 until a terminal decision was reached, where σ1² was the first calculated value in Step 2;

4. Step 3 was repeated 2,000 times.
The percent of times that H0 was accepted and the average number of observations taken were calculated, and used as the estimates of the actual OC and ASN values for the given σi²;

5. By changing the variance to σ2², σ3², etc., Steps 3 and 4 were repeated for all 18 values calculated in Step 2. These 18 pairs of Monte Carlo OC and ASN values are a function of the calculated values, σi². Together, they were used to represent the actual OC and ASN equations of Plan 1 for SATP I; and

6. By changing the specified values to those of Plan 2, all the steps above were repeated, and the Monte Carlo OC and ASN equations of Plan 2 for SATP I were obtained.

Monte Carlo OC and ASN values of these two selected plans were also obtained for SATP III using the same procedures. Finally, the approximate OC and ASN values were also obtained from the suggested OC and ASN equations (i.e., Equations 3.67, 3.68 and 3.69) for the same set of calculated σi² values, following the procedure described in Section 3.4. The OC and ASN values determined by the approximate equations are denoted as OC_a and ASN_a; those obtained from simulations are denoted as OC_m and ASN_m.

4.5.2 Comparisons between the approximate and Monte Carlo OC and ASN equations

In general, the suggested OC equation approximates the actual OC function well for both SATP I and III (Table 4.11 and Figure 4.3). However, near the acceptable variance limit (σ0²) and between the two limits, the suggested OC equation underestimated the actual OC function of the SATP procedures. The maximum difference between the Monte Carlo estimated OC values and those determined from the OC equation was -0.067 for SATP I and -0.068 for SATP III. The suggested OC equation therefore appears to be appropriate for approximating the unknown OC function of the SATP procedure.

The suggested ASN equation consistently underestimated the actual ASN function for both SATP I and III (Table 4.12 and Figure 4.4).
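The Monte Carlo construction of a single point on the OC and ASN curves (Section 4.5.1, Steps 1 to 4) can be sketched as follows. This is an illustrative reconstruction, not the original simulation code: the function names are mine, and the decision boundaries follow Wald's SPRT for the variance of a zero-mean normal error, which is the structure underlying SATP I.

```python
import math
import random

def satp1_boundaries(v0, v1, alpha, beta):
    """Boundaries for SATP I (SPRT on a normal error variance, zero mean):
    accept H0 when sum(d^2) <= h1 + s*n, reject when sum(d^2) >= h2 + s*n."""
    c = 1.0 / v0 - 1.0 / v1
    s = math.log(v1 / v0) / c
    h1 = 2.0 * math.log(beta / (1.0 - alpha)) / c   # negative intercept
    h2 = 2.0 * math.log((1.0 - beta) / alpha) / c   # positive intercept
    return h1, h2, s

def simulate_oc_asn(true_var, v0, v1, alpha, beta, iters=2000, rng=None):
    """Monte Carlo estimates of OC (P[accept H0]) and ASN for one value of
    the true error variance, observations taken singly at each stage."""
    rng = rng or random.Random(1)
    h1, h2, s = satp1_boundaries(v0, v1, alpha, beta)
    accepts, total_n = 0, 0
    for _ in range(iters):
        t, n = 0.0, 0
        while True:
            n += 1
            d = rng.gauss(0.0, math.sqrt(true_var))  # one simulated model error
            t += d * d
            if t <= h1 + s * n:   # cross acceptance line
                accepts += 1
                break
            if t >= h2 + s * n:   # cross rejection line
                break
        total_n += n
    return accepts / iters, total_n / iters

# Plan 1 evaluated at the acceptable variance (true variance = sigma0^2 = 0.3):
oc, asn = simulate_oc_asn(0.3, 0.3, 0.8, 0.05, 0.05)
```

Repeating the call over the 18 calculated variances traces out the Monte Carlo OC and ASN curves; at σ² = 0.3 the estimates should be near the Table 4.11 values (OC_m about 0.975, ASN_m about 17).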
The differences in the ASN values between the Monte Carlo estimates and those determined from the equation ranged from -0.6 to -19.0 for SATP I and from -1.6 to -17.0 for SATP III. The maximum difference occurred near σ1². This result suggests that the actual sample size required by the SATP procedure will be larger than the values calculated from the suggested ASN equation. However, the suggested ASN equation could still be used to provide some indication of the expected sample size. Such an indication is very helpful in assisting potential users to obtain an appropriate SATP plan for a given problem.

The Monte Carlo ASN values for SATP I and III showed no large differences. Since the distributional assumptions of SATP I and II are exactly the same, the suggested OC and ASN equations can be used to approximate the actual OC and ASN functions for all three developed sequential testing plans.

4.6 Effects of Truncation and Group Sampling on the Performance of the SATP

As described in Section 2.4.1.1, the decision boundaries of Wald's (1947) SPRT plans were derived under two principal assumptions. One is that the random observations are taken from the population singly at each stage of sampling; the other is that no upper limit is set on the number of observations. These assumptions were retained in extending Freese's (1960) procedure to obtain the SATP procedures. However, these
Table 4.11: Comparisons of the Monte Carlo OC and ASN curves of the SATP I and III and the approximate equations with σ0² = 0.3, σ1² = 0.8, α = β = 0.05.

         Approximate          SATP I                              SATP III
         equations
σ²       OC_a    ASN_a   OC_m    SE_OC   ASN_m   SE_ASN   OC_m    SE_OC   ASN_m   SE_ASN
0.1176   1.000   8.0     1.000   0.000   8.6     0.029    1.000   0.000   9.6     0.031
0.1327   1.000   8.4     1.000   0.000   9.0     0.039    1.000   0.000   9.9     0.038
0.1516   1.000   8.9     1.000   0.000   9.5     0.045    1.000   0.000   10.5    0.047
0.1755   1.000   9.6     1.000   0.000   10.2    0.067    1.000   0.000   11.2    0.062
0.2062   1.000   10.6    0.999   0.001   11.4    0.085    0.999   0.001   12.3    0.083
0.2465   0.990   12.3    0.994   0.002   13.3    0.119    0.995   0.002   14.2    0.116
0.3000   0.950   14.9    0.975   0.003   16.8    0.196    0.975   0.003   17.7    0.194
0.3721   0.810   18.0    0.888   0.007   23.2    0.330    0.887   0.007   23.8    0.328
0.4175   0.680   18.7    0.754   0.010   26.3    0.399    0.754   0.010   26.8    0.396
0.5335   0.320   15.9    0.322   0.010   26.1    0.412    0.336   0.011   27.8    0.414
0.6077   0.190   12.9    0.175   0.008   22.5    0.350    0.186   0.009   25.1    0.353
0.8000   0.050   7.7     0.041   0.004   18.1    0.254    0.046   0.005   20.5    0.261
1.0735   0.010   4.6     0.007   0.002   14.1    0.110    0.007   0.002   13.9    0.112
1.4667   0.000   2.8     0.001   0.001   11.0    0.078    0.002   0.001   11.5    0.076
2.0376   0.000   1.8     0.000   0.000   3.8     0.035    0.001   0.001   9.0     0.032
2.8741   0.000   1.2     0.000   0.000   2.9     0.048    0.000   0.000   3.9     0.046
4.1097   0.000   0.8     0.000   0.000   2.4     0.032    0.000   0.000   3.4     0.036
5.9481   0.000   0.5     0.000   0.000   2.0     0.031    0.000   0.000   3.0     0.029

Note: The values were based on 2,000 iterations. σ² represents the possible values for the true error variance of a volume model; OC_a and ASN_a are the OC and ASN values determined from the approximate equations; OC_m and ASN_m are the Monte Carlo estimated OC and ASN values, respectively; SE_OC and SE_ASN are the standard errors of the Monte Carlo estimated OC and ASN, respectively.
Table 4.12: Comparisons of the Monte Carlo OC and ASN curves of the SATP I and III and the approximate equations with σ0² = 1.50, σ1² = 2.25, γ = α = 0.05 and β = 0.10.

         Approximate          SATP I                              SATP III
         equations
σ²       OC_a    ASN_a   OC_m    SE_OC   ASN_m   SE_ASN   OC_m    SE_OC   ASN_m   SE_ASN
0.9028   1.000   22.0    1.000   0.000   22.6    0.141    1.000   0.000   23.6    0.147
0.9747   1.000   23.8    1.000   0.000   24.6    0.181    1.000   0.000   25.5    0.179
1.0556   1.000   26.3    1.000   0.001   27.1    0.223    1.000   0.001   28.0    0.222
1.1468   1.000   29.8    0.999   0.001   30.7    0.291    0.999   0.001   31.6    0.288
1.2500   1.000   35.0    1.000   0.001   36.1    0.402    0.998   0.001   36.9    0.393
1.3670   0.990   43.0    0.990   0.002   44.6    0.574    0.989   0.002   45.2    0.561
1.5000   0.950   55.3    0.968   0.004   59.1    0.912    0.966   0.004   59.5    0.894
1.6515   0.830   71.0    0.867   0.008   82.0    1.592    0.860   0.008   82.7    1.577
1.7352   0.710   77.1    0.739   0.010   86.7    1.576    0.734   0.010   88.7    1.593
1.9203   0.410   76.0    0.433   0.011   92.8    1.719    0.428   0.011   93.8    1.726
2.0227   0.270   68.6    0.278   0.010   86.8    1.643    0.279   0.010   86.8    1.604
2.2500   0.100   50.3    0.093   0.006   51.7    1.001    0.089   0.006   55.4    0.952
2.5114   0.030   35.6    0.026   0.004   45.9    0.652    0.029   0.004   47.9    0.636
2.8125   0.010   25.8    0.012   0.002   43.8    0.470    0.013   0.003   42.3    0.473
3.1602   0.000   19.4    0.006   0.002   41.4    0.353    0.006   0.002   41.2    0.348
3.5625   0.000   14.9    0.003   0.001   26.6    0.287    0.001   0.001   30.1    0.291
4.0288   0.000   11.8    0.000   0.000   14.8    0.226    0.000   0.000   15.7    0.220
4.5703   0.000   9.5     0.000   0.000   12.1    0.176    0.000   0.000   13.0    0.171

Note: The values were based on 2,000 iterations. σ² represents the possible values for the true error variance of a volume model; OC_a and ASN_a are the OC and ASN values determined from the approximate equations; OC_m and ASN_m are the Monte Carlo estimated OC and ASN values, respectively; SE_OC and SE_ASN are the standard errors of the Monte Carlo estimated OC and ASN, respectively.
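The "Approximate equations" columns of Tables 4.11 and 4.12 can be reproduced from Wald's closed-form OC and ASN expressions for an SPRT on a normal error variance. The sketch below is my reconstruction of that calculation (the function name and the parametrization by h are assumptions, following Wald's standard construction; h = 1 and h = -1 recover the two hypothesized variances):

```python
import math

def wald_oc_asn(v0, v1, alpha, beta, h):
    """Approximate OC and ASN at the true variance indexed by h (h != 0)."""
    c = 1.0 / v0 - 1.0 / v1                       # 1/sigma0^2 - 1/sigma1^2
    A = (1.0 - beta) / alpha                      # upper likelihood-ratio bound
    B = beta / (1.0 - alpha)                      # lower likelihood-ratio bound
    var = (1.0 - (v0 / v1) ** h) / (h * c)        # true error variance for this h
    oc = (A ** h - 1.0) / (A ** h - B ** h)       # P[accept H0]
    ez = 0.5 * math.log(v0 / v1) + 0.5 * c * var  # mean log-LR of one observation
    asn = (oc * math.log(B) + (1.0 - oc) * math.log(A)) / ez
    return var, oc, asn

# Plan 1 evaluated at sigma0^2 (h = 1) and at sigma1^2 (h = -1):
v, oc0, asn0 = wald_oc_asn(0.3, 0.8, 0.05, 0.05, 1.0)
v, oc1, asn1 = wald_oc_asn(0.3, 0.8, 0.05, 0.05, -1.0)
```

At h = 1 this returns OC = 0.95 and ASN = 14.9, and at h = -1, OC = 0.05 and ASN = 7.7, matching the approximate columns of Table 4.11 at σ² = 0.3 and 0.8; with Plan 2 (σ0² = 1.50, σ1² = 2.25, α = 0.05, β = 0.10) the same function gives ASN = 55.3 and 50.3 at the two limits, matching Table 4.12.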
[Figure 4.3: Comparison of the Monte Carlo OC and ASN curves of the SATP I and III and Wald's approximate equations with σ0² = 0.3, σ1² = 0.8, α = β = 0.05. (a) Comparison of OC curves; (b) Comparison of ASN curves. Horizontal axis: error variance; curves: Wald's, SATP I, SATP III.]

[Figure 4.4: Comparison of the Monte Carlo OC and ASN curves of the SATP I and III and Wald's approximate equations with σ0² = 1.50, σ1² = 2.25, α = 0.05 and β = 0.10. (a) Comparison of OC curves; (b) Comparison of ASN curves. Horizontal axis: error variance; curves: Wald's, SATP I, SATP III.]

assumptions may not be practical for the purpose of model testing. Taking random observations singly will increase the travel time between the sampling units, and this could greatly degrade the expected cost-saving advantage of the SATP procedures. Also, if the data to be obtained include field measurements and lab analyses, single observation selection is not practical for most applications. In addition, if no upper limit is set on the number of observations sampled, the SATP procedures may result in a sample size that is even larger than that considered large enough for an equally reliable fixed sample size procedure.

Methods suggested by Wald (1947) to solve these problems were described in Section 2.4.1.4. Instead of taking observations singly, a group of k observations may be taken at each stage of sampling (i.e., group selection). Also, a maximum number of observations (n0, called the truncation point) may be preset to stop the sequential testing process when n0 is reached.
A final decision to accept or reject the null hypothesis is then made by comparing the observed test statistic at n0 (e.g., the cumulative squared errors when SATP I is applied) with the average of the acceptance and rejection values at this point (i.e., Wald's rule of truncation). These modifications will affect the derived decision boundaries of the SATP procedure and, therefore, the resulting error probabilities. The reliability of the SATP under these situations is unknown.

To investigate the effect of group selection, one testing plan, σ0² = 1.221, σ1² = 2.747 and γ = α = β = 0.05, was arbitrarily chosen. In choosing this plan, the mean square error associated with Spurr's (1952) estimated model for the Douglas-fir data set was used for σ0², and σ1² was then arbitrarily set to 2.25σ0². Three group sizes were selected: 2, 5 and 10 (i.e., 2, 5 or 10 observations were taken at each stage of simulated sampling). For comparison, a group size of one (observations taken singly) was also used. To study the effect of truncation, σ0² = 1.221 and σ1² = 2.747 were used along with three combinations of the probabilities of the type I and II errors (i.e., α = β = 0.05; α = 0.05 and β = 0.10; and α = 0.10 and β = 0.05). These three selected plans had a specified type I error (α) that was equal to, less than, or greater than the specified type II error (β). To set the upper limit on the number of observations sampled, n0, the approximate ASN equation (Equation 3.69 or 3.70) was used to determine the ASN when σ1² was the true variance (i.e., E_{σ1²}(n)) for each of the three testing plans. Based on this calculated ASN value, four levels of the truncation point (n0) were set: n0 = E_{σ1²}(n), 1.5E_{σ1²}(n), 2E_{σ1²}(n) and 3E_{σ1²}(n). For comparison, the case of no truncation (i.e., n0 = ∞) was also used.
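The group-selection and truncation modifications can be sketched by extending the single-observation simulation. This is an illustrative reconstruction (the function name and structure are mine), using the arbitrarily chosen plan σ0² = 1.221, σ1² = 2.747, α = β = 0.05:

```python
import math
import random

def run_satp1(true_var, v0, v1, alpha, beta, k=1, n0=None, rng=None):
    """One SATP I run with group size k and optional truncation point n0.
    Returns (accepted, n_taken). Truncation follows Wald's rule: compare the
    test statistic with the average of the two boundary values at n0."""
    rng = rng or random.Random(0)
    c = 1.0 / v0 - 1.0 / v1
    s = math.log(v1 / v0) / c
    h1 = 2.0 * math.log(beta / (1.0 - alpha)) / c
    h2 = 2.0 * math.log((1.0 - beta) / alpha) / c
    t, n = 0.0, 0
    while True:
        for _ in range(k):                      # group selection: k obs per stage
            n += 1
            d = rng.gauss(0.0, math.sqrt(true_var))
            t += d * d
        if t <= h1 + s * n:                     # terminal decision: accept H0
            return True, n
        if t >= h2 + s * n:                     # terminal decision: reject H0
            return False, n
        if n0 is not None and n >= n0:          # Wald's rule of truncation
            mid = ((h1 + s * n) + (h2 + s * n)) / 2.0
            return t <= mid, n

accepted, n = run_satp1(1.221, 1.221, 2.747, 0.05, 0.05, k=5, n0=60)
```

Repeating such runs for each group size and truncation point, and tallying the acceptance rates when σ0² or σ1² is the true variance, yields the Monte Carlo estimates α_m, β_m and the associated ASN values reported below.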
In order to provide some indication of the effects of truncation and group selection on the performance of the SATP procedure, the Monte Carlo estimates of the actual probabilities of the type I and II errors, α_m and β_m, and the associated ASN values were obtained using the procedures described in Section 4.2.1 for the various group sizes and truncation points of the testing plans described above.

As the group size increased from 1 to 10, the resulting type I error (i.e., α_m = 1 - OC1) decreased from 0.026 to 0.016, and the resulting type II error (β_m = OC2) decreased from 0.047 to 0.025 (Table 4.13). This means that as the number of observations taken at each stage increased, the difference (error) between the actual error probabilities and their nominal values (0.05) increased rapidly. Also, as the group size increased from 1 to 10, the resulting ASN (average sample number) increased from 23.1 to 30.0 when σ0² was true, and from 23.9 to 34.4 when σ1² was true. This result is explainable, since the decision boundaries of the SATP procedures were obtained by ignoring possible overshooting of the decision boundaries. As the number of observations taken at each stage increases, overshooting of the decision boundaries increases. The consequence is that the resulting error probabilities are smaller than their desired values, and more observations are taken than are necessary.

In general, the resulting error probabilities decreased as the value of the truncation point increased (Table 4.14). For the type I error, a truncation point of

Table 4.13: Comparison of Monte Carlo estimates of α, β and associated ASN values of SATP I for various group sizes with σ0² = 1.221, σ1² = 2.747 and α = β = 0.05.

Monte Carlo     Group Sampling Size
Estimate        1        2        5        10
α_m             0.026    0.021    0.019    0.016
ASN1            23.1     24.0     26.7     30.0
SE_ASN          0.181    0.188    0.200    0.210
β_m             0.047    0.041    0.031    0.025
ASN2            23.9     26.0     27.5     34.4
SE_ASN          0.177    0.227    0.199    0.284

Note: The values were based on 2,000 iterations.
α_m = 1 - OC1 and β_m = OC2, where OC1 and OC2 are the Monte Carlo estimated OC values when σ0² and σ1² are true, respectively; ASN1 and ASN2 are the Monte Carlo estimates of the average sample number when σ0² or σ1² is true; SE_ASN is the standard error of the estimated ASN.

1.5E_{σ1²}(n) or 2E_{σ1²}(n) would be enough to obtain a resulting probability (α_m) equal to or less than the desired value (α). However, for the type II error, a truncation point of 3E_{σ1²}(n) would be required to obtain a resulting probability (β_m) equal to or less than the desired value (β). Therefore, based on the results of this simulation, 3E_{σ1²}(n) appears to be a reasonable truncation point when Wald's (1947) rule of truncation is used to force the SATP procedure to reach a final decision where no terminal decision has been reached.

4.7 Conclusion

Five simulation studies were conducted to examine the developed SATP procedures using normally generated populations of the errors of tree volume models. Based on the results of these simulation studies, the following conclusions may be made:

1. If the errors of a tested model are iid normally distributed variables with a zero mean (estimation is unbiased) and a constant variance, σ², both SATP I and III are reliable for classifying a volume model into two wide classes (i.e., acceptable or unacceptable) based on two preset limits of error variance. Also, the probabilities of making a correct decision (OC values) of SATP I and III are very similar, but the average sample size required (ASN values) of SATP III is slightly larger than that of SATP I (usually within 5 observations). This occurs because the sample mean of the errors must be estimated by SATP III.

2. If the errors, d, are iid normally distributed variables with non-zero means (estimates are biased) and constant variance, the testing decision to accept or reject a model reached by SATP I depends not only on the variance of the errors, but also on the magnitude of the bias.
When the value of the bias is larger than that of the error variance, the probability of rejecting a model is very high even for very small values of error variance. For a given value of bias, the probability of accepting a model obtained by SATP I is always less than or equal to that of SATP III. The probability obtained by SATP III did not change as the value of the introduced bias increased. This confirms that SATP I will reject an inaccurate model regardless of the source of errors (larger bias, lower precision, or both), whereas SATP III will determine the accuracy of a model after bias-correction.

Table 4.14: Comparison of Monte Carlo estimates of α, β and ASN values of SATP I for various truncation points and different combinations of α and β with σ0² = 1.221, σ1² = 2.747.

               Monte Carlo   Truncation Points
(α, β)         Values        E_{σ1²}(n)   1.5E_{σ1²}(n)   2E_{σ1²}(n)   3E_{σ1²}(n)   ∞
(α = 0.05,     α_m           0.076        0.050           0.040         0.031         0.021
 β = 0.05)     ASN1          17.4         20.3            21.7          22.7          23.1
               β_m           0.115        0.080           0.059         0.046         0.047
               ASN2          19.7         25.4            27.4          24.7          23.9
(α = 0.05,     α_m           0.076        0.052           0.039         0.032         0.028
 β = 0.10)     ASN1          13.0         15.1            16.3          17.2          17.6
               β_m           0.194        0.143           0.113         0.093         0.085
               ASN2          14.7         18.5            20.3          20.3          18.0
(α = 0.10,     α_m           0.113        0.076           0.061         0.052         0.050
 β = 0.05)     ASN1          16.2         19.2            20.6          21.7          22.0
               β_m           0.103        0.073           0.061         0.049         0.047
               ASN2          17.9         22.7            25.2          24.2          23.0

Note: The values were based on 2,000 iterations. Truncation point means that if the number of observations taken is equal to this value and no terminal decision has been reached, the average of the acceptance and rejection values at this stage is used to make the final decision based on Wald's rule of truncation; E_{σ1²}(n) is the calculated ASN value when σ1² is true using Wald's approximate ASN equation; ∞ means that no truncation is made; α_m and β_m are the Monte Carlo estimates of the actual α and β, respectively; ASN1 and ASN2 are the Monte Carlo estimates of the ASN when σ0² and σ1² are true, respectively.
In practical applications, the use of SATP III for accuracy testing after bias-correction requires that the bias be the same for all estimated values; otherwise, the bias in the estimates may not be removable (e.g., when the bias for different levels of the model's independent variables is not constant, but a random variable).

3. If the distribution of the errors of a volume model can be assumed to be a mixture of a finite number of normal distributions from different dbh classes, with zero means and variances proportional to dbh (or dbh²), SATP I and III are still applicable for determining the applicability of volume models when random observations are taken uniformly from every dbh class, even though the assumption of normality of the model errors is violated.

4. The nominal probabilities of the type I and II errors (α and β) used to calibrate an SATP were found to be larger than the resulting error probabilities when the SATP procedures were actually performed. This means that the SATP procedures are conservative, and always result in a higher testing precision, with a possibly higher sample size than that required to meet the nominal error probabilities.

5. The suggested ASN equation for the SATP procedures, based on Wald's (1947) general ASN function of an SPRT plan, always underestimated the actual ASN. However, in general, the suggested OC equation worked well in approximating the actual OC values for the SATP procedures.

6. Under the normality assumption of model errors with a zero mean and a constant variance, when random group selection of observations is used instead of single selection, the resulting probabilities of the type I and II errors will decrease, with an associated increase in the average sample size required, as the group size increases.

7.
If Wald's (1947) rule of truncation is used to force the SATP procedures to reach a final decision when a preset maximum number of observations sampled is reached, a reasonable truncation point is three times the expected sample size calculated from the ASN equation.

Chapter 5

Field Application Procedures And Example

In Chapter 3, three sequential accuracy testing plans (SATP) were developed for applicability testing of tree volume models. The simulation results of Chapter 4 showed that the SATP procedures were reliable for classifying a model into two predefined classes (acceptable or unacceptable) under the normality assumption of the error, d, or the relative error, d/y. However, as with Freese's (1960) original procedure, care must be exercised to ensure that the unconditional distribution of d or d/y is normal when applying the SATP procedures. Also, appropriate application procedures should be followed in order to obtain the expected advantage of the SATP procedures in sampling cost savings.

This chapter is designed to meet the third objective of this research, that is, to propose and illustrate appropriate field procedures for applying the SATP procedures.

5.1 Field Application Procedures

5.1.1 Preparation

Before an SATP for determining the accuracy of a model can be carried out, information about the model and the application population should be collected. The desired information includes: (1) the standard error of estimate (SEE) from model building, and the results of testing the regression assumptions, especially the normality of the errors (residuals); (2) the mean difference (d̄), the mean absolute difference (|d|) and the mean squared difference (MSD) of the estimated and observed values in previous applications of the model; and (3) the ranges of the dependent and independent variables in the application population (e.g., the range of tree dbh or height).
This information is required for choosing the appropriate hypothesized error limits for model acceptance and rejection for a given application (e.g., e0 and e1, or p0 and p1), and for setting the desired probability level of model accuracy (γ). The information above is also helpful for selecting an appropriate group sampling design. The sampling cost constraint, or the calculated sample size for applying a fixed sample size test (e.g., Freese's (1960) procedure), is helpful for setting the truncation point of an SATP. Based on the available information, the test parameters can then be specified to calibrate an SATP for a given application.

In order to determine an SATP, five parameters need to be specified. These are: (i) e0, the acceptable error limit (or p0, the acceptable percent error limit if SATP II is used); (ii) e1, the unacceptable error limit (or p1, the unacceptable percent error limit if SATP II is used); (iii) γ, the probability level of model accuracy associated with the specified e0 or p0; and (iv) α and β, the probability levels of the type I and II errors for the test. The choice of e0 and e1 (or p0 and p1) is not straightforward; it depends on the user's knowledge about the nature of the model errors, and on the expected sample size for reaching a terminal decision (i.e., the wider the interval between these class limits, the smaller the sample size required).

Based on the specified values of the parameters, the test hypotheses of an SATP are stated, and the decision boundaries can then be determined.

1. For applying SATP I and III, the test hypotheses are:

H0: σ² = σ0² = e0²/z²_{1−γ/2}
H1: σ² = σ1² = e1²/z²_{1−γ/2}

If SATP I is applied, μ_d = 0 should also be added to the hypotheses above.

2. For applying SATP II, the test hypotheses are:

H0: σ²_{d/y} = p0²/(100 z_{1−γ/2})²
H1: σ²_{d/y} = p1²/(100 z_{1−γ/2})²

3.
To determine the decision lines of an SATP, the intercepts, h1 and h2, and the common slope, s, must be calculated from the specified values of e0 and e1 (or p0 and p1 when applying SATP II), γ, α and β, using the formulae presented in Chapter 3. Based on these calculated intercepts and slope, the acceptance line is h1 + sn, and the rejection line is h2 + sn.

For selecting random observations from the population, random group sampling is suggested. To do the group sampling, first, the desired group size, k (e.g., 5, 10, etc.), should be preset; then k observations would be randomly taken from the population at each stage of sequential sampling. Under field conditions, k trees or plots (sampling units) could be taken at randomly selected points in the stand or forest (population). Also, a truncation point (n0) should be predetermined based on (1) the maximum sampling cost, or the calculated sample size of an equally reliable fixed sample size procedure; or (2) n0 = 3E_{σ1²}(n) (i.e., three times the ASN value calculated using the approximate ASN equation, Equation 3.69 or 3.70). However, if the first method is used to determine n0, Wald's (1947) rule of truncation cannot be used to obtain a final decision; in this case, a fixed sample size procedure should be used to draw the final decision based on the observations sampled.

The last preparation is to prepare a field decision graph (e.g., Figure 5.5) or a field decision table (e.g., Table 5.15), since the decision boundaries of the SATP procedures are completely determined once the values of the two variance limits and the error probabilities have been specified. This will simplify the field calculations when the SATP procedure is applied.

[Figure 5.5: Field decision graph for applying the SATP procedures. Horizontal axis: sample size.]

5.1.2 Sampling and testing process

The sampling and testing process for the SATP procedure is:

1. Take a group of k random observations (e.g., k trees or plots) from the application population.
Measure the values of the dependent (yi) and independent (xi) variables for i = 1 ... k;

2. Obtain an estimate from the tested model (ŷi) and compute the error of each estimate, di = yi − ŷi;

3. Calculate the cumulative squared errors as the test statistic, T = Σ di² (or Σ (di/yi)² when SATP II is applied); and

4. Compare the observed test statistic to the acceptance (h1 + sn) or rejection (h2 + sn) value at the nth stage of sampling; then stop sampling, or continue by taking an additional group of observations.

Table 5.15: Field decision table for applying the SATP procedures.

Test: H0: σ² = σ0² vs. H1: σ² = σ1²
Specified parameters: e0, e1, γ, α, β
Truncation point: n0

Number of       Accept   Acceptance   Test Statistic   Rejection   Reject
Observations    H0       Value                         Value       H0
(n)             (*)      h1 + sn      (T)              h2 + sn     (*)
5
10
15
20
25
30
35
40
45
50

Since all three SATP testing procedures require the same information (the observed errors, di, or the transformed errors, di/yi, for i = 1 ... n0), it is suggested that all three sequential testing plans be used simultaneously in the field. This can be done simply by using three field decision sheets or graphs to record the testing process for each SATP procedure; alternatively, a computer program can be developed, and a portable computer used to facilitate the testing process.

It should be noted that this is not equivalent to a multiple testing plan, although the three developed sequential testing plans are used simultaneously. The reason is that these three testing plans do not test the same null hypothesis, but three different null hypotheses (see Section 5.1.1). In other words, they are actually three separate testing plans for the same problem (i.e., determining the applicability of a model), and there is no interest in the joint probability level of the test.
The simultaneous use is only to overcome the problem that the underlying assumptions about d and d/y cannot be tested until data are collected.

In general, the sampling and testing process of Steps 1 to 4 is repeated. Whenever one of the three testing plans reaches a terminal decision, the post-tests for the normality and unbiasedness assumptions suggested in the next section are carried out for the observed errors, d or d/y, depending on whether SATP I, III, or II terminated. For example, if SATP I or III terminates, the post-tests for the normality assumption should be conducted for the observed errors, d. If the tests show that d is normally distributed, the testing process is completed. However, if d is not normally distributed, SATP I and III should be ignored; the testing process is continued until SATP II reaches a terminal decision, and the normality assumption for d/y is then checked. Finally, if no terminal decision can be reached by SATP I, II or III, the sampling and testing process is continued until the truncation point, n0, is reached.

5.1.3 Post-tests for the underlying assumptions

To validate the testing decisions reached by the SATP procedure, the normality assumption for d or d/y must be confirmed through post-tests. The approaches suggested for testing the normality of errors for Freese's (1960) procedure and for Reynolds' (1984) estimation procedures are also suggested as the post-tests for normality for the SATP procedure. These are two goodness-of-fit tests for normality based on the empirical distribution function (edf) of the errors, namely the Cramér-von Mises (W²) and Anderson-Darling (A²) tests. To apply these tests, the observed errors, d or d/y, must first be sorted in ascending order as:

d(1) ≤ d(2) ≤ ... ≤ d(n)

Then the edf for the sorted errors is given by:

F(d) = 0 if d < d(1); i/n if d(i) ≤ d < d(i+1), i = 1, 2, ..., n − 1; and 1 if d ≥ d(n)

The test statistic of the Cramér-von Mises test is:

W² = Σ_{i=1}^{n} [F(d(i)) − (2i − 1)/(2n)]² + 1/(12n)   (5.78)

and the test statistic of the Anderson-Darling test is:

A² = −(1/n) Σ_{i=1}^{n} (2i − 1){ln F(d(i)) + ln[1 − F(d(n+1−i))]} − n   (5.79)

The critical values for these tests are given by Stephens (1974) for the case where the mean and/or the variance must be estimated from the sample. If both the mean (μ_d) and the variance (σ²) of d or d/y must be estimated, another suggested test for normality is the correlation coefficient (r) test based on the normal probability plot, suggested by Filliben (1975). In an actual application, these three procedures for normality can all be carried out on the same set of model errors to increase the reliability of the decision.

To test the unbiasedness of the observed errors, d or d/y, the hypothesis H0: μ_d = 0 for the error, d (or H0: μ_{d/y} = 0 for the error, d/y), should be tested. The appropriate test statistic suggested for this hypothesis by Gregoire and Reynolds (1988) is:

T = d̄ / S_{d̄}   (5.80)

where S_{d̄} is the standard error of d̄; T follows Student's t distribution with n − 1 degrees of freedom.

If H0: μ_d = 0 is rejected, it indicates that the estimates of the tested model are biased. Under this situation, the decision of SATP III will provide some indication of the model accuracy after bias-correction for the error, d.

It should be noted that since the observed model errors (d or d/y) used to carry out the suggested post-tests are obtained from a sequential sampling process, and the sample size is a random variable instead of a predetermined value, these post-tests are only approximate tests, used to provide some indication of the underlying distributions of d or d/y.

5.1.4 Making the final testing decision

After the post-tests for checking the normality assumption, a final testing decision on the applicability of the model can be made.

1. If the conclusion from the post-tests is that the observed d is normally distributed, but d/y is not,
the decision reached by SATP I or III is valid for making the final decision about the accuracy of the model, and the decision reached by SATP II should be ignored;

2. If the post-tests conclude that the observed d/y is normally distributed, but d is not, then the decision reached by SATP II should be used as the final decision, and the decisions reached by SATP I and III should be ignored;

3. If the post-tests conclude that both d and d/y are not normal, the decisions reached by all three proposed SATP procedures should be ignored. In this case, the nonparametric procedures suggested by Reynolds (1984) may be used to provide an ad hoc final decision based on the observations collected in the sequential testing process;

4. If the hypothesis H0: μ_d = 0 is rejected, but d is normally distributed, then the final decision about the accuracy of the model should be based on the decision reached by SATP III, which represents the accuracy after bias-correction; and

5. Finally, if n0 is reached (i.e., no terminal decision is reached by the procedure), the final decision depends on how n0 was set. If n0 = 3E_{σ1²}(n), Wald's (1947) rule of truncation can be used to obtain the final decision by comparing the value of the test statistic (T) with the average of the acceptance and rejection values at n0. If n0 is the calculated sample size of an equally reliable fixed sample size procedure, then there is no gain from using the SATP procedures; a fixed sample size test should be used, or the model should be refitted using the new data.

5.2 Application Example

In order to illustrate the proposed field application methods of the SATP procedures, an example of applicability testing for applying the B.C. standard volume model to a subpopulation was constructed.

5.2.1 Problem and data

The tree volume model suggested by the B.C.
Ministry of Forests is:

log(V) = −4.34950 + 1.82276 × log(dbh) + 1.10812 × log(Height)    (5.81)

for estimating the total tree volume of lodgepole pine (Watts 1983), where log is the base 10 logarithm. This model was developed using a sample of 2,846 felled trees, and was suggested for estimating the total volume of lodgepole pine trees for all age classes over most of the province (i.e., FIZs A to J). The percent standard error of estimate in volume units (S(%)) was ±9.3%. The problem of interest was to determine whether this model was sufficiently accurate for estimating tree volume in FIZ E.

In constructing this example, the lodgepole pine trees collected from FIZ E described in Section 4.1 were used. The subset was viewed as representative of lodgepole pine growing in this zone. The calculated statistics for this data set are given in Table 5.16.

Table 5.16: Calculated statistics of the lodgepole pine trees collected from FIZ E in B.C.

Variable     N    Mean     SD       Minimum  Maximum
Age          192  47.2     25.211   15.0     163.0
dbh (cm)     192  13.1     9.331    3.8      43.0
Height (m)   192  13.4     7.901    3.1      36.8
Volume (m³)  192  0.18968  0.30440  0.00240  1.97060

5.2.2 Specification of test parameters

To specify the acceptance and non-acceptance limits of error variance, σ0² and σ1², the known percentage standard error of estimate in volume units (S(%)) was used for σ0². Since the errors d or d/y in the example would be in logarithmic units, S(%) had to be represented in logarithmic units. This was obtained by rearranging Equation 2.11 to obtain:

SEE = log(9.3/100 + 1) = 0.03862    (5.82)

This meant that the variance of the residuals (MSE) of the fitted model (Equation 5.81) in logarithmic units was SEE² = (0.03862)² = 0.00149. Using this value, σ0² was set to 0.0015. Under the assumption of normality of errors and a γ of 0.05, the acceptance error limit in logarithmic units was e0 = 1.96 × 0.03862 ≈ 0.08. The non-acceptable error limit was then arbitrarily set to e1 = 1.25 × e0 = 0.10.
This, in turn, gave σ1² = e1²/1.96² = 0.0026.

To set the truncation point (n0), Equation 3.69 was used to calculate the expected sample size (ASN value) when σ1² was true. The calculated value was E2(n) = 13.8, and 3E2(n) = 41.4 trees. The truncation point was then set to 50 trees. To carry out SATP II, the percentage acceptance and non-acceptance limits, p0 and p1, were arbitrarily set to 10 and 15, respectively. Finally, the probabilities of the type I and II errors were each set to 0.05. In summary, the specified parameters of this example were:

Acceptable error limit: e0 = 0.08 (logarithmic units)
Unacceptable error limit: e1 = 0.10 (logarithmic units)
Probability level of accuracy required: γ = 0.05
Percent acceptable error limit: p0 = 10
Percent unacceptable error limit: p1 = 15
Desired type I error: α = 0.05
Desired type II error: β = 0.05
Truncation point: n0 = 50

5.2.3 Testing results

These specified values were then used to compute the intercepts (h1 and h2) and slope (s) of the decision boundaries of the SATP procedures using the formulae presented in Chapter 3. These calculated intercepts and slopes were:

For SATP I and III: h1 = −0.02725, h2 = 0.02725, and s = 0.00207;
For SATP II: h1 = −0.02760, h2 = 0.02760, and s = 0.00380.

With these determined decision boundaries, the SATP procedures were carried out by taking five randomly selected observations from the FIZ E lodgepole pine data at each stage of sampling, and by calculating the sum of squared errors for this group of observations. The cumulative squared errors up to this stage of sampling were then calculated and compared to the predetermined decision boundaries for each of the three SATP procedures. The three developed SATP procedures were used simultaneously in this example. At the 4th stage of sampling, SATP II reached its terminal decision of accepting the model (Figure 5.6), and 20 observations were taken.
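The boundary values and the group-sequential field procedure described above can be reproduced with a short script. The sketch below is illustrative only (it is not the thesis code); it assumes the variance-test form of Wald's SPRT with σ² = (e/1.96)², which matches the values reported in this section:

```python
import math

def satp_boundaries(e0, e1, alpha, beta):
    """Wald SPRT boundaries for H0: sigma^2 = (e0/1.96)^2 against
    H1: sigma^2 = (e1/1.96)^2 (normal errors, zero mean assumed).
    Sampling continues while h1 + s*n < sum(d_i^2) < h2 + s*n."""
    s0, s1 = (e0 / 1.96) ** 2, (e1 / 1.96) ** 2
    c = 1.0 / s0 - 1.0 / s1
    h1 = 2.0 * math.log(beta / (1.0 - alpha)) / c
    h2 = 2.0 * math.log((1.0 - beta) / alpha) / c
    s = math.log(s1 / s0) / c
    return h1, h2, s

# SATP I and III: absolute error limits in logarithmic units
print(satp_boundaries(0.08, 0.10, 0.05, 0.05))
# SATP II: percent limits p0 = 10 and p1 = 15 treated as relative errors
print(satp_boundaries(0.10, 0.15, 0.05, 0.05))

def satp_decide(errors, h1, h2, s, group=5):
    """Group-sequential test: after each group of observations, compare
    the cumulative sum of squared errors with the two decision lines."""
    sse, n = 0.0, 0
    for i in range(0, len(errors), group):
        for d in errors[i:i + group]:
            sse += d * d
            n += 1
        if sse <= h1 + s * n:
            return "accept", n
        if sse >= h2 + s * n:
            return "reject", n
    return "continue", n
```

Running `satp_boundaries` with the parameters of this example returns h2 ≈ 0.02725, s ≈ 0.00207 for SATP I and III, and h2 ≈ 0.02760, s ≈ 0.00380 for SATP II, in agreement with the values above.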
The post-tests were then carried out to check the normality assumption of the relative error, d/y, based on these 20 observed relative errors. The calculated statistics for the Cramer-von Mises (W²) and Anderson-Darling (A²) tests were 0.2224 and 1.3627, respectively. The modified test statistics for these two tests determined from Stephens' (1974, p. 732) Case 3 were 0.2280 and 1.5501, respectively. Both these modified values were greater than the critical values at the 0.01 significance level. The calculated correlation coefficient based on the probability plot was 0.8599. From Table 1 of Filliben (1975, p. 113) with n = 20, this calculated value was found to be significant even at the 0.005 significance level. Therefore, all three tests rejected the normality assumption of d/y.

With the rejection of the normality of d/y, the decision reached by SATP II was ignored. The sampling and testing process was continued using SATP I and III. SATP III reached the terminal decision of accepting the model at the 5th stage of sampling (Figure 5.7), and 25 observations were sampled. Again the post-tests were carried out to check the normality assumption for the error, d, based on the 25 observed errors. The calculated statistics for the Cramer-von Mises (W²) and Anderson-Darling (A²) tests were 0.0280 and 0.1924, respectively. These values were less than the critical values in Stephens' (1974) Case 3 table even at the 0.15 significance level. The calculated correlation coefficient based on the probability plot was 0.9922. From Table 1 of Filliben (1975, p. 113) with n = 25, this calculated value was found to be not significant even at the 0.25 significance level. Therefore, all three tests confirmed that the normality assumption of d was reasonable. This means that the testing decision reached by SATP III is valid. Since SATP III is an accuracy test after bias-correction, we would like to know whether the volume estimates of the tested model are biased.
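The post-test statistics of Equations 5.78 to 5.80 are straightforward to compute in the field. The following sketch is one possible implementation (illustrative, not the thesis code); it evaluates F as the normal CDF standardized by the sample mean and standard deviation, as in Stephens' Case 3 where both parameters are estimated:

```python
import math

def norm_cdf(x, mu, sd):
    # Normal CDF evaluated at the standardized value of x
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def edf_statistics(errors):
    """Cramer-von Mises W^2 (Eq. 5.78) and Anderson-Darling A^2 (Eq. 5.79)
    for a normality check with estimated mean and variance."""
    n = len(errors)
    mu = sum(errors) / n
    sd = math.sqrt(sum((d - mu) ** 2 for d in errors) / (n - 1))
    z = sorted(norm_cdf(d, mu, sd) for d in errors)  # F(d_(i)), ordered
    w2 = sum((z[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)) + 1.0 / (12 * n)
    a2 = -sum((2 * i + 1) * (math.log(z[i]) + math.log(1.0 - z[n - 1 - i]))
              for i in range(n)) / n - n
    return w2, a2

def bias_t(errors):
    """T statistic of Eq. 5.80: mean error divided by its standard error."""
    n = len(errors)
    mu = sum(errors) / n
    sd = math.sqrt(sum((d - mu) ** 2 for d in errors) / (n - 1))
    return mu / (sd / math.sqrt(n))
```

The raw W² and A² values still need Stephens' (1974) Case 3 modification and critical values, and Filliben's (1975) table for the probability-plot correlation, before a significance decision can be made.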
This requires a test of the unbiasedness assumption of d (i.e., to test the hypothesis H0: μ_d = 0). The t-test statistic (Tn) was calculated as −3.9578 using the 25 observed errors. Since |Tn| was greater than the two-tailed 95 percent critical value of the t-distribution with 24 degrees of freedom, the null hypothesis H0 was rejected. The conclusion was that the volume estimates from the tested model were biased. With the information about the model errors obtained, the sequential testing process could be completed without continuing the testing process of SATP I.

Figure 5.6: Field decision graph for SATP II when applied to test the applicability of the B.C. standard volume model of lodgepole pine for application to the subregion FIZ E, with specified parameters: p0 = 10 and p1 = 15, γ = 0.05, α = β = 0.05 and n0 = 50.

Figure 5.7: Field decision graph for SATP III when applied to test the applicability of the B.C. standard volume model of lodgepole pine for application to the subregion FIZ E, with specified parameters: e0 = 0.08 and e1 = 0.10 in logarithmic units, γ = 0.05, α = β = 0.05 and n0 = 50.

5.2.4 Final testing decision

Since post-testing of normality indicated that the distribution of d was normal, but d/y was not, the testing decision reached by SATP III was valid, but that of SATP II was not. In this example, the information about the accuracy of the tested model based on SATP III was as follows. The volume estimates were biased; the average bias (d̄) and its standard error (S_d̄) were −0.0234 and 0.00592 in logarithmic units, respectively. However, if the model bias could be assumed to be constant across dbh classes, the accuracy of the model after bias-correction would be considered acceptable against the accuracy requirement specified.

Therefore, the conclusion could be made as follows. We would expect at least 95 percent of the absolute values of the model errors represented in logarithmic units to be less than 0.08 when the model is applied to FIZ E, if the bias in the model estimates were corrected. This is equivalent to saying that the true variance of the errors (the variation of the model estimates) is less than 0.0015 in logarithmic units. However, the estimated volumes from the model overestimated the actual tree volumes with an average bias of −0.0234 in logarithmic units. If the user considered such a bias tolerable, the model estimates would be acceptable without modification. Otherwise, a modification of −0.0234 to the estimated logarithmic volumes could be applied when the model is used in FIZ E.

To confirm the results obtained, the errors were calculated for all trees in FIZ E. The actual accuracy of the model after correcting bias (i.e., d′ = d − d̄ = d + 0.0234) was calculated. The results were P(|d′| ≤ 0.08) = 0.9531, and the true variance of d′ was 0.00143. These results are consistent with the conclusion reached by the SATP procedures. Finally, if Freese's (1960) test I or III had been used instead of the SATP procedures, the sample size required (Equation 4.73) to have the desired error probabilities, α = β = 0.05, and the ratio σ1²/σ0² = 0.0026/0.0015 = 1.733, would be 70. Therefore, the use of the developed SATP procedures in this example resulted in a 100(1 − 25/70) = 64.3% saving of the sampling cost.

Chapter 6

Conclusions and Recommendations

Mathematical models are widely used for estimating tree or stand volumes in forestry.
In the development of a tree volume model, model specification, parameter estimation, and model validation are usually based on data collected over a large geographical region of forested land, and for a particular tree species. Developed in this way, a model will only be appropriate for providing accurate volume estimates of trees in the same population. However, in forest inventories, a volume model may be considered for application to a more local region (subpopulation), or to the same species in a different geographical region (new population). Under these situations, the accuracy of the model is uncertain. An applicability test should then be carried out to determine the actual accuracy of the model in order to use it with confidence.

Statistical testing procedures based on data with a predetermined, sufficiently large sample size (fixed sample size procedures) are conventionally used for the purpose of applicability testing or validation of forestry models. Since an applicability test is not used to provide accurate estimates of the parameters of the model errors, but only to classify a model into two wide classes (i.e., acceptable or unacceptable for a given application), it is possible that a decision to accept or reject a model can be made with a small sample, especially when the actual accuracy of the model is far below or above the requirement of the user. This suggests that a fixed sample size procedure may not be efficient for the purpose of applicability testing of volume models. This is particularly of concern when data collection is expensive, time-consuming, or destructive, as with the data required for developing volume models.

The first objective of this research was to suggest alternatives to fixed sample size procedures for applicability testing of tree volume models, and of other forestry estimation or prediction models.
The second objective was to test the reliability of the suggested alternative testing procedures under different application conditions. The third objective was to suggest and illustrate appropriate field procedures for the application of the suggested alternative testing procedures.

Information from the literature was used to develop three new testing plans with a variable sample size, labeled sequential accuracy testing plans (SATP) I, II and III. The SATP procedures are extensions of Freese's (1960) procedure of accuracy testing to Wald's (1947) sequential probability ratio test (SPRT) plan. Approximate Operating Characteristic (OC) and Average Sample Number (ASN) equations were also derived for calculating the probability of making a correct decision and the expected sample size of the SATP procedures, respectively, based on Wald's OC and ASN functions for SPRT plans.

The second objective of the research was met by addressing volume model testing under different error assumptions. For this purpose, Monte Carlo techniques were used to generate normally distributed errors, or a mixture of normally distributed errors, for tree volume models. The conclusions reached from the simulations were as follows. First, if the errors of a model are iid normal variables with a zero mean (no estimation bias is present) and a constant variance, SATPs I and III are reliable for classifying the applicability of a model. The use of SATP I or SATP III will obtain a similar testing decision and expected sample size under this situation. The simulation results show that under this distributional assumption, SATPs I and III have smaller probabilities of making a wrong decision than those specified (i.e., α and β). The Monte Carlo estimates of the ASN values showed that in applying SATPs I and III, an expected sample size of less than 30 observations for making a decision can be obtained if the ratio e1/e0 is larger than 2.00 for α = β = 0.05, or if α = β is larger than 0.10 when the ratio e1/e0 is somewhat less than 2.00. Also, by comparing the resulting ASN values to the sample sizes required by equally reliable fixed sample size procedures (i.e., having the same α and β, and the same ratio e1/e0), the use of the SATP procedures would result in, on average, a 40 to 60% sampling cost-saving.

Second, if the errors of a model are iid normal variables with a non-zero mean (estimation bias is present) and a constant variance, SATP I is still valid for classifying the applicability of an existing model based on the specified values of e0, e1 and γ. However, the use of SATP I will reject an inaccurate model regardless of the source of inaccuracy (large bias, lack of precision, or both) under the distributional assumptions. If the accuracy of a model after bias-correction is desired, SATP III should be used instead of SATP I. The use of SATP III requires the assumptions that the biases in the model estimates are the same, and that the mean of the observed errors in the current sample is a good estimate of the average bias across the range of application. Since the basis of Freese's (1960) accuracy test (i.e., a test to determine whether the true variance of model errors exceeds a user-supplied hypothesized variance) was incorporated into the derivation of SATPs I and II, the simulation results confirmed a statement made for Freese's tests I and II by Reynolds (1984). That is, when bias is present, the true variance of errors would need to be even smaller than the user-supplied hypothesized variance for model acceptance (i.e., σ0² in the SATP procedures).
If the bias is large enough, it may not be possible to accept the model no matter how small the true error variance is.

If the errors of a model can be considered to be a mixture of a finite number of normal distributions with zero means, and the error variances are proportional to tree dbh, SATPs I and III are still appropriate for determining the applicability of an existing model when the random observations are taken uniformly from each distribution.

Because the distributional assumptions of SATP II are exactly the same as those required by SATP I, all conclusions obtained for SATP I are also true for SATP II. However, in applying SATP II, the distributional assumptions are made for the relative errors of a model instead of the absolute errors. Also, percent error limits must be specified as the accuracy requirement.

Simulations were carried out to examine the accuracy of the suggested OC and ASN equations, and to investigate the effects of two modifications of Wald's (1947) assumptions for SPRT plans on applicability testing of tree volume models. The results showed that the suggested OC equation worked well for approximating the actual OC values of the SATP procedures in general. However, the suggested ASN equation consistently underestimated the actual ASN values of the SATP procedures. Hence, the suggested ASN equation can be used only to provide some indication of the expected sample size of the SATP procedures.

By modifying the SATP procedures to use group selection instead of single selection of random observations, the resulting probabilities of type I and II errors consistently decreased, and the average sample size required to make a decision consistently increased, as the group size increased.
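A Monte Carlo check of the kind summarized above can be sketched as follows. This is an illustration under assumed parameters (e0 = 0.08, e1 = 0.10, a truncation point of 200, and the variance-test SPRT with σ² = (e/1.96)²), not the simulation code of the thesis:

```python
import math
import random

def run_satp(errors_iter, e0, e1, alpha, beta, group=1, n_max=200):
    """Run one SPRT on a stream of errors; returns (decision, n).
    With group > 1, the boundaries are checked only after each group."""
    s0, s1 = (e0 / 1.96) ** 2, (e1 / 1.96) ** 2
    c = 1.0 / s0 - 1.0 / s1
    h1 = 2.0 * math.log(beta / (1.0 - alpha)) / c
    h2 = 2.0 * math.log((1.0 - beta) / alpha) / c
    s = math.log(s1 / s0) / c
    sse, n = 0.0, 0
    while n < n_max:
        for _ in range(group):
            sse += next(errors_iter) ** 2
            n += 1
        if sse <= h1 + s * n:
            return "accept", n
        if sse >= h2 + s * n:
            return "reject", n
    # Wald's truncation rule: compare with the midline at the last stage
    return ("accept" if sse < (h1 + h2) / 2 + s * n else "reject"), n

def monte_carlo(sigma, reps=2000, seed=1, group=1):
    """Estimate the OC value (probability of accepting the model) and the
    ASN when the true error standard deviation is sigma."""
    rng = random.Random(seed)
    accepts, total_n = 0, 0
    for _ in range(reps):
        stream = iter(lambda: rng.gauss(0.0, sigma), None)  # endless N(0, sigma)
        decision, n = run_satp(stream, e0=0.08, e1=0.10,
                               alpha=0.05, beta=0.05, group=group)
        accepts += decision == "accept"
        total_n += n
    return accepts / reps, total_n / reps
```

Calling `monte_carlo` with sigma = 0.08/1.96 (the acceptable variance true) and sigma = 0.10/1.96 (the unacceptable variance true) gives empirical acceptance probabilities near 1 − α and near β, respectively, along with the corresponding ASN estimates; varying `group` shows the group-selection effect described above.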
In general, the use of Wald's (1947) rule for setting a maximum number of observations sampled (truncation point) to stop an SATP procedure and obtain a final decision before a terminal decision is reached was reliable only when the truncation point was larger than three times the ASN value calculated using the suggested ASN equation.

For field applications, it was suggested that sampling continue until all three tests reached a terminal decision, or until the preset truncation point was reached. Post-tests for normality and unbiasedness were also suggested.

Some problems exist in the SATP procedures, and further studies on these problems are needed to improve the usefulness of sequential sampling procedures for forestry model testing. First, the normality assumption for the unconditional distribution of model errors or relative errors is critical for applying the SATP procedures. This assumption may not be met for some applications. To overcome this problem, studies should be carried out to investigate the actual distributions of the errors for some common forestry models, and to suggest appropriate error transformation methods to meet the normality assumption. Another approach to solving this problem may be to develop sequential non-parametric testing procedures. Second, the random selection methods required to apply the SATP or other sequential sampling procedures (i.e., random single or random group selection) may be impractical for forestry model testing, since they will increase the travel time between the sampled units and degrade the usefulness of sequential procedures. Also, the observations obtained from random selection methods may not cover the entire range of the model, and this will reduce the representativeness of the observations for model testing. Therefore, other sampling methods may be appropriate for applying the sequential testing procedures.
Finally, the data processing for each selected sampling unit (i.e., tree or plot) must be completed in the field in order to continue the testing process of the SATP procedures. If the measurements obtained involve complicated analyses or computation processes, the application of the SATP procedures may become difficult. Appropriate computer programs should be developed, and the use of a portable computer is strongly recommended for applying the SATP procedures.

As variable sample size procedures, the SATP procedures do not require a predetermined, sufficiently large sample set. This property of the SATP procedures ensures that the sample size required to make a testing decision is minimal. The use of the SATP procedures does not preclude the use of other fixed sample size procedures, because sampling can always be continued to let the sample size become large enough for using a fixed sample size procedure. Therefore, the developed SATP procedures are considered to be appropriate alternatives to the conventional fixed sample size testing procedures. Also, based on the similarity between the purpose of an applicability test and model validation, the developed SATP procedures should find fairly wide applicability in validating forestry estimation or prediction models, especially when data collection is highly expensive, time-consuming or destructive.

The main contribution made by this research was to lay the foundation for future research in applying sequential analysis approaches to forestry model testing, and thereby to bring to the forestry researcher's attention the possible wide applications of sequential analysis procedures.

References Cited

Allen, J., D. Gonzalez and D. V. Gokhale. 1972. Sequential sampling plans for the Bollworm, Heliothis zea. Environ. Entomol. 1:771-780.
Armitage, P. 1975. Sequential medical trials. 2nd ed. Oxford, Blackwell. 194 pp.
Auchmoody, L. R. 1976. Accuracy of band dendrometers. USDA For. Serv. Res. Note NE-221. 4 pp.
Avery, T. E. 1975. Natural Resources Measurements. McGraw-Hill Book Company, New York. 339 pp.
Baker, F. S. 1925. The construction of taper curves. J. Agr. Res. 30:609-624.
Balci, O. and R. G. Sargent. 1982. Validation of multivariate response models using Hotelling's two-sample T² test. Simulation. 39:185-192.
Balci, O. and R. G. Sargent. 1984. A bibliography on the credibility assessment and validation of simulation and mathematical models. Simuletter. 15:15-27.
Bell, J. F. and W. A. Groman. 1971. A field test of the accuracy of the Barr and Stroud type FP-12 optical dendrometer. For. Chron. 47:69-74.
Boehmer, W. D. and J. C. Rennie. 1976. Predicting diameter inside bark for some hardwoods in west Tennessee. Wood Sci. 8:209-212.
Brack, C. L. and P. Marshall. 1990. Sequential sampling and modelling for mean dominant height estimation. Australian Forestry. 53(1):41-46.
Bruce, D. 1920. A proposed standardization of the checking of volume tables. J. For. 18:544-548.
Burkhart, H. E. 1986. Fitting analytically related models to forestry data. In Proceedings of XIII International Biometric Conference, Seattle, Washington. July 27 - August 1, 1986. 15 pp.
Burr, I. W. 1953. Fundamental principles of sequential analysis. Industrial Quality Control. 6:92-98.
Choi, S. C. 1971. Sequential test for correlation coefficients. J. Amer. Statist. Assoc. 66:575-576.
Clutter, J. L., J. C. Fortson, L. V. Pienaar, G. H. Brister and R. L. Bailey. 1983. Timber Management: A Quantitative Approach. John Wiley & Sons, New York. 333 pp.
Coggin, D. L. and G. P. Dively. 1982. Sequential sampling plan for the Armyworm in Maryland small grains. Environ. Entomol. 11:169-172.
Cost, N. D. 1971. Accuracy of standing-tree volume estimates based on McClure mirror caliper measurements. USDA For. Serv. Res. Note SE-152. 4 pp.
Cunia, T. 1964. Weighted least squares method and construction of volume tables. Forest Sci. 10:180-191.
Daniels, R. F., H. E. Burkhart and M. R. Strub. 1979. Yield estimates for Loblolly pine plantations. J. For. 76:581-583.
Davies, O. L. 1954. The Design and Analysis of Industrial Experiments. New York, Hafner. 635 pp.
Eisenberger, I. 1964. Genesis of bimodal distributions. Technometrics. 6(4):357-363.
Ek, A. R. and R. A. Monserud. 1979. Performance and comparison of stand growth models based on individual tree and diameter-class growth. Can. J. For. Res. 9:231-244.
Evert, F. 1981. A model for regular mortality in unthinned white spruce plantations. For. Chron. 57:77-79.
Fairweather, S. E. 1985. Sequential sampling for assessment of stocking adequacy. North. J. Appl. For. 2:5-8.
Feldman, R. M., G. L. Curry and T. E. Wehrly. 1984. Statistical procedure for validating a simple population model. Environmental Entomology. 13:1446-1451.
Filliben, J. J. 1975. The probability plot correlation coefficient test for normality. Technometrics. 17(1):111-117.
Fishman, G. S. and P. J. Kiviat. 1967. The analysis of simulation-generated time series. Management Sci. 13(7):525-557.
Flewelling, J. W. and L. V. Pienaar. 1981. Multiplicative regression with lognormal errors. Forest Sci. 27(2):281-289.
Forsythe, G. E., M. A. Malcolm and C. B. Moler. 1977. Computer Methods for Mathematical Computations. Prentice-Hall, Inc. 259 pp.
Fowler, G. W. 1969. An investigation of some new sequential procedures for use in forest sampling. Ph.D. Thesis, University of California, Berkeley. 118 pp.
Fowler, G. W. 1978. Errors in sampling plans based on Wald's sequential probability ratio test. USDA For. Serv. Gen. Tech. Rep. NC-46. 13 pp.
Fowler, G. W. 1983. Accuracy of sequential sampling plans based on Wald's sequential probability ratio test. Can. J. For. Res. 13:1197-1203.
Fowler, G. W. and A. M. Lynch. 1987(a). Sampling plans in insect pest management based on Wald's sequential probability ratio test. Environ. Entomol. 16:345-354.
Fowler, G. W. and A. M. Lynch. 1987(b). Bibliography of sequential sampling plans in insect pest management based on Wald's sequential probability ratio test. Great Lakes Entomologist. 20(3):165-171.
Freese, F. 1960. Testing accuracy. Forest Sci. 6:139-145.
Ghosh, B. K. 1967. Sequential analysis of variance under random and mixed models. J. Amer. Stat. Ass. 62:1401-1417.
Ghosh, B. K. and P. K. Sen. 1991. Handbook of Sequential Analysis. Marcel Dekker, Inc., New York. 637 pp.
Goulding, C. J. 1979. Validation of growth models used in forest management. New Zealand J. of Forestry. 24(1):108-124.
Greene, W. H. 1990. Econometric Analysis. MacMillan Publishing Company, New York. 783 pp.
Gregoire, T. G. 1987. Generalized error structure for forestry yield models. Forest Sci. 33:423-444.
Gregoire, T. G. and M. R. Reynolds, Jr. 1988. Accuracy testing and estimation alternatives. Forest Sci. 34:302-320.
Greig, I. D. 1979. Validation, statistical testing, and the decision to model. Simulation. 33(2):55-60.
Hazard, J. W. and J. M. Berger. 1972. Volume tables vs. dendrometers for forest surveys. J. For. 69:216-219.
Husch, B., C. I. Miller and T. W. Beers. 1982. Forest Mensuration. 3rd ed. John Wiley and Sons, New York. 402 pp.
Iwao, S. 1968. A new regression method for analyzing the aggregation pattern of animal populations. Res. Popul. Ecol. 10:1-20.
Iwao, S. 1975. A new method of sequential sampling to classify populations relative to a critical density. Res. Popul. Ecol. 16:281-288.
Jackson, J. E. 1960. Bibliography on sequential analysis. J. Am. Stat. Ass. 55:561-580.
Johnson, N. L. 1961. Sequential analysis: a survey. J. R. Stat. Soc., A. 124:372-411.
Johnson, N. L. and S. Kotz. 1970. Continuous Univariate Distributions - 1: Distributions in Statistics. John Wiley & Sons, New York. 300 pp.
Kozak, A. 1988. A variable-exponent taper equation. Can. J. For. Res. 18:1363-1368.
Kuno, E. 1969. A new method of sequential sampling to obtain a fixed level of precision. Res. Popul. Ecol. 11:127-136.
LeMay, V. M. 1990. MSLS: A technique for fitting a simultaneous system of equations with a generalized error structure. Can. J. For. Res. 20:1830-1839.
Lloyd, M. 1967. Mean crowding. J. Anim. Ecol. 36:1-30.
Loetsch, F. and K. E. Haller. 1973. Forest Inventory. Vol. 1. BLV Verlagsgesellschaft. 436 pp.
Lynch, A. M., G. W. Fowler and G. A. Simmons. 1990. Sequential sampling plans for spruce budworm (Lepidoptera: Tortricidae) egg mass density using Monte Carlo simulation. J. Econ. Entomol. 81(1):220-224.
Mankin, J. B., R. V. O'Neill, H. H. Shugart and B. W. Rust. 1977. The importance of validation in ecosystem analysis. In New directions in the analysis of ecological systems. Part 1. Simulation Councils Proceed., G. S. Innis (ed.). Ser. 5(1). pp. 63-71.
Marshall, P. and V. M. LeMay. 1990. Testing prediction equations for application to other populations. In Proceedings, IUFRO Research Forest Inventory, Monitoring, Growth and Yield Conference. August 5-11. Montreal, Canada. pp. 166-173.
Marshall, P. and Y. Wang. 1993. OPSILIR (Version 2.0): A computer program to determine the efficient sampling distribution for simple linear regression problems. Funding for the development of this program was supplied through the STDF-AGAR program of the Science Council of British Columbia, Canada.
Meyer, H. A. 1938. The standard error of estimate of tree volume from logarithmic volume equations. J. For. 36:340-342.
Moser, J. W., Jr. and O. F. Hall. 1969. Deriving growth and yield functions for uneven-aged forest stands. Forest Sci. 15:183-188.
Mukerji, M. K., O. O. Olfert and J. F. Doane. 1988. Development of sampling designs for egg and larval populations of the wheat midge, Sitodiplosis mosellana (Géhin) (Diptera: Cecidomyiidae), in wheat. Can. Ent. 120:497-505.
Naylor, T. H. and J. M. Finger. 1967. Verification of computer simulation models. Manage. Sci. 14(2):92-101.
Nevers, H. P. and J. P. Barrett. 1966. Testing accuracy of white pine volume estimates from penta prism caliper measurements. J. For. 64:811-812.
Newton, P. F. 1989. Fixed-precision list-quadrat sequential sampling for point-density estimation. For. Ecol. Manage. 27:295-308.
Newton, P. F. and V. M. LeMay. 1992. Evaluation of a sequential counting plan for point-density estimation within Black Spruce/Balsam Fir seedling populations. For. Ecol. Manage. 53:195-212.
Nyrop, J. P. and G. A. Simmons. 1984. Errors incurred when using Iwao's sequential decision rule in insect sampling. Environ. Entomol. 13:1459-1465.
Penner, M. 1989. Optimal design with variable cost and precision requirements. Can. J. For. Res. 19:1591-1597.
Pieters, E. P. 1978. Bibliography of sequential sampling plans for insects. Bull. Entomol. Soc. Am. 24:372-374.
Rauscher, H. M. 1986. The microcomputer scientific software series 4: Testing prediction accuracy. USDA For. Serv. Gen. Tech. Rep. NC-107.
Reineke, L. H. and D. Bruce. 1932. An alignment-chart method for preparing forest-tree volume tables. USDA Tech. Bull. 304. 28 pp.
Rennie, J. C. and H. V. Wiant, Jr. 1978. Modification of Freese's Chi-square test of accuracy. USDI Bur. Land Manage. Res. Inventory Note BLM-14. 3 pp.
Reynolds, M. R., Jr. 1984. Estimating the error in model predictions. Forest Sci. 30(2):454-469.
Reynolds, M. R., Jr., H. E. Burkhart and R. F. Daniels. 1981. Procedures for statistical validation of stochastic simulation models. Forest Sci. 27(2):349-364.
Reynolds, M. R., Jr., T. E. Burk and W. C. Huang. 1988. Goodness-of-fit tests and model selection procedures for diameter distribution models. Forest Sci. 34(2):373-399.
Reynolds, M. R., Jr. and J. Chung. 1986. Regression methodology for estimating model prediction error. Can. J. For. Res. 16:931-938.
Rushton, S. 1950. On a sequential t-test. Biometrika 37:326-333.
Schumacher, F. X. and F. S. Hall. 1933. Logarithmic expression of timber-tree volume. J. Agric. Res. 47:719-734.
Smith, J. H. G. and J. W. Ker. 1958. Sequential sampling in reproduction surveys. J. For. 56:106-109.
Snee, R. D. 1977. Validation of regression models: Methods and examples. Technometrics 19(4):415-428.
Spurr, S. H. 1952. Forest Inventory. The Ronald Press Company, New York. 94 pp.
Stephens, M. A. 1974. EDF statistics for goodness of fit and some comparisons. J. Am. Stat. Assoc. 69:730-737.
Talerico, R. L. and R. C. Chapman. 1970. SEQUAN: A computer program for sequential analysis. USDA For. Serv. Res. Note NE-116. 6 pp.
Thies, W. G. and R. D. Harvey, Jr. 1979. A photographic method for measuring tree defect. Can. J. For. Res. 9:541-543.
Van Horn, R. L. 1971. Validation of simulation results. Manage. Sci. 17:247-258.
Wald, A. 1945. Sequential tests of statistical hypotheses. Ann. Math. Stat. 16:117-186.
Wald, A. 1947. Sequential Analysis. John Wiley and Sons, New York. 212 pp.
Waters, W. E. 1955. Sequential sampling in forest insect surveys. Forest Sci. 1:68-79.
Watts, S. B. (ed.) 1983. Forestry Handbook for British Columbia. 4th ed. The Forestry Undergraduate Society, University of British Columbia, Vancouver. pp. 431-432.
West, P. W. 1983. Accuracy and precision of volume estimates from a simulation model for regrowth eucalypt forest. Aust. For. Res. 13:183-188.
Wetherill, G. B. and K. D. Glazebrook. 1986. Sequential Methods in Statistics. 3rd ed. Chapman and Hall, New York. 264 pp.
Item Metadata
Title | Development and examination of sequential approaches for applicability testing of tree volume models
Creator | Wang, Yue
Date Issued | 1994
Extent | 2810225 bytes
Genre | Thesis/Dissertation
Type | Text
FileFormat | application/pdf
Language | eng
Date Available | 2009-06-05
Provider | Vancouver : University of British Columbia Library
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI | 10.14288/1.0075339
URI | http://hdl.handle.net/2429/8820
Degree | Doctor of Philosophy - PhD
Program | Forestry
Affiliation | Forestry, Faculty of
Degree Grantor | University of British Columbia
GraduationDate | 1995-05
Campus | UBCV
Scholarly Level | Graduate
AggregatedSourceRepository | DSpace