BAYESIAN STATISTICS FOR FISHERY STOCK ASSESSMENT ANDMANAGEMENTByPaul G. KinasB. Sc. (Oceonology) Universidade do Rio GrandeM. Sc. (Statistics) Universidade de Sao PauloA THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinTHE FACULTY OF GRADUATE STUDIESDEPARTMENT OF STATISTICSWe accept this thesis as conformingto the required standardTHE UNIVERSITY OF BRITISH COLUMBIA1993© Paul G. Kinas, 1993In presenting this thesis in partial fulfilment of the requirements for an advanced degree at theUniversity of British Columbia, I agree that the Library shall make it freely available for refer-ence and study. I further agree that permission for extensive copying of this thesis for scholarlypurposes may be granted by the head of my department or by his or her representatives. Itis understood that copying or publication of this thesis for financial gain shall not be allowedwithout my written permission.Department of StatisticsThe University of British Columbia2075 Wesbrook PlaceVancouver, CanadaV6T 1W5Date:^/2 O c k)_zr^3AbstractThis work is about the use of Bayesian statistics in fishery stock assessment and management.Multidimensional posterior distributions replace classical parameter estimation in surplus-production and delay-difference models. The maximization of expected utilities replaces theestimation of optimal policies.Adaptive importance sampling is used to obtain approximations for posterior distributions.The importance function is given as a finite mixture of heavy-tailed Student distributions. Theperformance of the method is tested in five case-studies, two of which use data simulation.Real data refer to Skeena river salmon (Oncorhynchus nerka), Orange Roughy (Hoplostethusatlanticus) and Pacific cod (Gadus macrocephalus).The results show that the technique successfully approximates posterior distributions inhigher dimensions even if such distributions are multimodal. When comparing models in termsof their performance as management tools, simpler and less realistic models can do better thanmore sophisticated alternatives. The Bayesian approach also sheds new light on the controversyabout the Orange Roughy fishery.11Table of ContentsAbstract^ iiTable of Contents^ iiiList of Tables vList of Figures^ viiiAcknowledgement x1 Introduction 11.1 Motivation ^ 11.2 Uncertainties, Models and Learning ^ 41.3 The Bayesian Method ^ 71.4 Summary of Contents 132 Bayesian Posterior Computation 152.1 Introduction ^ 152.2 Monte Carlo Integration with Basic Importance Sampling (BIS) ^ 182.3 Adaptive Importance Sampling (AIS) and Mixture Models 212.3.1^An Example ^ 262.4 Sequential Analysis 322.4.1^AIS in Sequential Models ^ 362.4.2^An Example ^ 382.5 Summary ^ 42in3 Models for Biomass Dynamics^ 433.1 Introduction ^ 433.2 Model 1 473.2.1 Two Lemmas ^ 493.2.2 Catch and Effort Data ^ 533.3 Model 2 ^ 583.4 Delay-Difference Models ^ 633.5 Summary ^ 664 Applications^ 674.1 Introduction 674.2 Stock and Recruitment: sockeye salmon ^ 684.3 Surplus Production: Simulation Model 744.4 Surplus Production: Orange Roughy ^ 914.5 Multimodality: Simulation Model 1034.6 Multimodality: Pacific Cod ^ 1074.7 Final Remarks ^ 1135 Discussion and Conclusion^ 119Appendices^ 126A Pseudo-code for Adaptive Importance Sampling^ 127Bibliography^ 139ivList of Tables2.1 Data from McCullagh and Nelder [41] (Sec 9 3 3)^ 272.2 List of parameters used in the simulation model for artificial data generation . . 392.3 Output from simulation. The harvest rates ht were fixed in advance. Bt is theBiomass at the start of year t and h is the corresponding observed abundanceindex. Ot and Yt are the natural logarithm of Bt and h, respectively. ^ 394.4 Some biologically meaningful attributes of the parameters in the Ricker andBeverton-Holt models. ^ 694.5 Five quantiles used to determine prior distribution for parameters a and b (thesubscript according to the model: Ricker or Beverton-Holt). 'm' and 'C' aremean and variance used in Student prior. 704.6 Results for AIS applied to the Skeena river sockeye salmon data under variousassumptions. The first number is the posterior mean; the posterior standarddeviation is given in brackets. Columns R* and BH* are the estimates ofHilborn and Walters [25] 724.7 Posterior mean and Covariance for different Models fitted to the Skeena riversockeye salmon data ^ 724.8 Specification of the three simulation models used to generate catch and effort data. 764.9 Specification of 16 models used to fit catch and effort data. 'SIM' indicatesthe simulation model. 'FIT' gives the fitting model. 'Eff' specifies assumedeffort sequence. A is the assumed choice for A. An omitted entrance implies therepetition from the line above. 81v4.10 Posterior expectation (standard error) for the biases in Sopt, the bias in Eopt,and the global variance (72 = Vp -I- V ^ 854.11 The Bayes rule EB for different cases is displayed together with its "true" al-ternative [value in square brackets]. Columns 2 to 4 specify model assumptions.All 6 runs use SLIMIT-0 3 894.12 Estimated catch history, and trawl survey biomass for Orange Roughy (Ho-plostethus atlanticus) for the period 78/79 to 88/90. Data from Francis [18]934.13 Model specifications (columns 2 and 3), final entropy (Ent.) and posterior es-timates for virgin biomass B1 for Orange Roughy (values in 1000 t) and theprobability that B1 exceeds 500 thousand t. Values in brackets are estimates forposterior standard error. Run 'la' uses assumption (a) of Lemma 3.1.; the re-maining runs use assumption (b). 'D' denotes diffuse prior and 'I' the informativealternative. 954.14 Risk-averse utilities cl, associate to the five possible cumulative discounted catchCOO ^ 1004.15 Recommended decisions ai (rate of TAC reduction in thousand t/year), whichmaximize the expected posterior utilities, using three alternative utility functions 1024.16 Expected utilities obtained for all 5 decision rules, according to the form of theutility function (expressed by 0) ^ 1024.17 Parameters which are fixed in the delay-difference model for Pacific cod (Gadusmacrocephalus) ^ 1094.18 Quantiles specified for the parameters. The mean m and variance C are pa-rameters for the corresponding Gaussian distribution for the logarithm of theparameter. 1104.19 Monte Carlo estimates of posterior mean (standard deviation) for diffuse andinformative priors. The entropies are for the final importance function. ^ 113vi4.20 Monte Carlo estimates of moments of the posterior distribution p(0 I Y(r)) forPacific cod, given diffuse and informative priors. The values mo, ml and m2refer to In b, In q, and In B1 respectively. The values Cij are the correspondingcovariances 116viiList of Figures2.1 Contour plot for posterior distribution of (pi,p2). Contour lines are for .05, .10,^.25, .50, .75, .90, .95 of largest observed density 302.2 (a) Marginal posterior for pi (full line) and p2 (broken line). (b) Density estimatesfor (p2 —p1) for 4000 sample points (full line) and the 500 values remaining afterclustering (dotted line) 312.3 Prior (predictive) and posterior (filtered) distributions are shown respectively bydotted and full lines, for times t = 1 to t = 6. `v' is the observation Yt and `+'indicates the "true" et 404.4 Stock and Recruitment data for Skeena river sockeye salmon (Reproduced fromHilborn and Walters [25] ). The lines were fitted according to Ricker model (bro-ken) and Beverton-Holt model (continuous) using means of posterior distribution. 704.5 (a) and (b): Contour plots of posterior distributions for Rb and BHb (Lines at.01, .05, .10, .25, .75, .90, .95 of the largest density). (c) and (d): Approximateposterior densities for Smsy and umsy (dashed line: BHb; full and dotted lines:Ra and Rb) ^ 734.6 The two effort sequences used in the simulation models. ^ 784.7 Time-series of observed efforts (dotted line) and the resulting simulated catches(full line), for different models. ^ 794.8 Spawning stock 'S(t)' and the corresponding biomass `B(t-I-1)' for different sim-ulation models. The line is the function GO from which the data were generated. 804.9 Box-plots for "relative entropy" for the sixteen fits to simulated catch and effortdata^ 84viii4.10 Contour plots of two-dimensional marginal posterior distributions for 'run 8'(linesat .05, .10, .25, .50, .75, .90, .95 of largest density). The dots indicate the trueparameter used in the simulation model. In (d) a histogram of the 6000 MonteCarlo estimates for Eopt (the column to the far right includes all Eopt > 5). . 874.11 Expected posterior utilities over the set A of possible choices for effort. Firstcolumn uses fitting model It2 and second column uses BH. Data from simulationmodels 'case 1', 'case 2' and 'case 3' were used in rows 1, 2 and 3. 894.12 Marginal posterior distributions for virgin biomass B1 for different runs:'la'(line), '2'(dots) and '4' (broken line) ^ 964.13 Contour plots for two-dimensional marginal posterior distributions: 'run 1a':(a)and (b); 'run 2':(c) and (d); 'run 4':(e) and (f). Contour lines are drawn at .05,.10, .25, .50, .75, .90 and .95 of the highest density 974.14 Time series for Catch (Ct), observed effort (Es) and CPUE (Ii). Also a plot of`CPUE vs. E' for the final 13 years ^ 1064.15 Some marginal posteriors for data simulated using the logistic biomass modeland fitted by the mixture model described in text. The contours are at.01,.10,.20,.30,.40,.50,.70,.90 of the largest observed posterior density 1084.16 Estimated posterior marginal densities for Pacific Cod (Gadus macrocephalus).The contours represent .05, .10, .25,.50, .75, .90 and .95 of the largest observeddensity. 1124.17 Estimated posterior marginal densities for Pacific cod, using an informative priorfor the catchability parameter q. The contours represent .05, .10, .25, .50, .75,.90 and .95 of the largest observed density. 1144.18 Marginal posterior densities for: (a) In b, (b) ln B1. The full and dotted linescorrespond to the "diffuse" and "informative" priors respectively ^ 115ixAcknowledgementI am most grateful to my research supervisor Donald Ludwig for the many ideas and valuableadvice; and to Colin W. Clark whose work in bioeconoraics triggered my interest in resourcemanagement.I wish to thank the people whose suggestions and comments about preliminary versionsof this monograph have been most helpful: Mohan Delampady, Harry Joe, Marc Mangel, JonSchnute, Al Tyler, Carl J. Walters, and James V. Zidek.I also wish to thank Antonio A. Loureiro for his willingness to help me over countlesscomputing hurdles; and to Christine A. Adkins for agreeing to proofread the pseudo-code.Only the support from my wife Carmen Luiza and her patience made the journey possible.Our sons Bruno and Werner made the journey more enjoyable by always reminding me ofrecreation and fun. To the family I owe my greatest debt.Chapter 1Introduction1.1 MotivationThis work is about the practical implementation of Bayesian statistical analysis for quantitativefisheries stock assessment and management. The basis of the study is the recognition that fish-eries management is decision making in the presence of uncertainty. Different possible actionshave to be compared according to measures of their consequences. Such analysis typically seeksa compromise between biological, economical and social components — see Section 1.2 — whiletraditional assessment concentrates on the biology of the stock. The Bayesian approach de-scribed in Section 1.3 expresses uncertainty as probabilities and uses utility theory to measurethe consequences of different actions. Lindley [31] and Berger [2] give convincing argumentsendorsing the claim that coherent and rational decision making in the presence of uncertaintymust correspond to a Bayesian analysis. Hilborn and Walters [25] emphasize the importanceof a reorientation of thinking within fishery sciences. I believe that the Bayesian paradigm tostatistics has the ingredients to support such a philosophical change: (i) the consistent expres-sion of uncertainties in the appropriate language of probability, and (ii) a coherent frameworkfor decision making in the presence of uncertainties. Technical difficulties have hindered a wideruse of the Bayesian approach in practical fishery problems. The adaptive importance samplingprocedure described in Chapter 2 is designed to overcome some of these obstacles. In Chapter 3I propose ways in which such procedure can be implemented for biomass dynamic models. Ifurther illustrate the procedure for a variety of case-studies in Chapter 4 and make generalcomments and draw conclusions in Chapter 5.In fisheries sciences it is assumed that management actions are taken upon recommendations1Chapter 1. Introduction^ 2of scientific assessment of the resource and its potential production. But criteria for real-worlddecisions need to include considerations of a political, social, economic, and cultural nature aswell. Quantitative stock assessment is too often seen as synonymous with management, whilein fact assessment and management serve distinct purposes: the former should produce usefuladvice which can effectively assist decision making which is responsibility of the latter. The con-ventional approach of producing a set of statistical point estimates has proved to be misleadingfor at least two reasons: (i) it ignores uncertainties about models and estimation and (ii) it doesnot tell how the results are to be used. The scientists in charge of assessment were expected torecommend such quantities as total fishing effort and catch quotas. This procedure improperlyplaced the judgment about risk-taking into the hands of technical advisers. In contrast, whenproducing an answer in form of a (posterior) probability distribution, the Bayesian approachgives a complete description of current beliefs and uncertainties. It becomes the responsibilityof the manager to formulate an utility function which measures the consequences (biological,economical, social) of different actions. The assessment expert is no longer the decision maker,but an analyst of uncertainty. A dear indication of the appeal of Bayesian statistics in fisheries,is the considerable growth of the use of such methods in current fisheries literature (Ludwig andWalters [36], Mangel and Clark [40], Clark [6] [7], Walters [68], Charles [5], Fried and Hilborn[19], Deriso and Parma [13], Nakamura ei al. [43], Hilborn and Walters [25], Zeh et al. [80],Geiger and Koenings [20]).Fundamental characteristics of fisheries sciences are the lack of reliable controls and difficultyto obtain replicates. The possibility of performing planned experiments is severely limited.Studies of the resource, using data produced during the history of its exploitation, are mostlyobservational. In contrast, statistical science originally dealt with the collection of data andlater moved to the analysis of (replicated) data obtained under controlled conditions. Classicalfrequentist probabilities involve the concept of a long sequence of repetitions for a given situation(the probability of that a fair coin comes up heads builds on such rationale). This (frequentist)view of probability is central to classical statistics. In fishery sciences, however, where oftenChapter 1. Introduction^ 3there are no replicates, the uncertainties about the effects of an action have a very differentmeaning. For example, consider the statement: "There is a probability of that the stockwill collapse within five years if action A is taken today". The frequentist viewpoint does notapply here. The theory of subjective probability has been created to enable one to talk aboutprobabilities in such situations. A comprehensive review of some systems describing subjectiveprobabilities is found in Press [51]. According to Berger [2] (p.75), "such systems show thatthe complexity of probability theory is needed to quantify uncertainty (i.e., nothing simplerwill do), and at the same time indicate that it is not necessary to go beyond the language ofprobability". I believe that the classical statistical approach, with its ideas of repetition andrelative frequency is inadequate to cope with the reality of fishery sciences in most cases, andthat the Bayesian paradigm, founded on the concept of subjective probability, has the potentialto provide better support of scientific procedures in this field. Given the uninformative natureof most fishery data, the practitioner knows that all bits of additional information can be ofgreat relevance and should therefore be used whenever possible. A fair way of proceeding isto explicitly unfold available beliefs and insights in the form of probability distributions. Byformally stating such probabilities, uncertainties become open to public scrutiny and evaluation.Starfield and Bleloch [65], by pointing out that "people's perception about models andmodelling can be very different depending on the kind of problems they usually face and thetypes of tools they commonly use", provide a very general classification which can be helpful tosummarize the difficulties typically faced by fisheries scientists. Their classification is accordingto two factors:Factor 1 the level of understanding (and control) of the problem: low, high.Factor 2 the quality and/or quantity of available data: low, high.Most problems in engineering and physical sciences, for example, are close to a high-highclassification for Factors 1 and 2 and consequently benefit most from the developments inclassical statistics. Many problems in the nonphysical sciences, particularly fisheries, are closerto a low-low classification and are therefore unsuitable for a traditional approach.Chapter 1. Introduction^ 4Within this general picture of fishery science and given the argumentation above, the mod-elling approach proposed here has two central features:1. The sequential nature of system dynamics and data acquisition.2. The Bayesian paradigm to perform analysis and assist the decision making process in thepresence of uncertainties.The next section discusses the different levels of uncertainty encountered in fisheries and thetradeoff between information acquisition and monetary returns, when management actions arechosen. In Section 1.3 I describe the Bayesian approach. Finally, Section 1.4 gives a summaryof the contents in the remaining chapters.1.2 Uncertainties, Models and LearningThe necessity for explicit recognition of uncertainties was used in the previous section as a cen-tral argument in support to a reorientation in thinking among fishery scientists and managers.The Bayesian approach, suggested as an adequate framework in which to perform the statisticalanalysis, will be described in Section 1.3. Here I will comment on three types of uncertainty infisheries models: (i) model structure, (ii) parameter estimation and (iii) prediction; and I willdiscuss how each one of them affects assessment and/or management.Model Structure.This basic uncertainty is central to modelling in general and is not a feature confined to problemsin fisheries. It is concerned with the formulation of problems so that relevant properties arecorrectly captured. While there is no specific way to address this complex issue, a variety ofbooks can provide helpful advice: Starfield and Bleloch [65], Walters [68], Clark [7], Hilbornand Walters [25].Linhart and Zucchini [32] propose one possible strategy to confront this difficulty. TheyChapter 1. Introduction^ 5introduce the concept of an Operating Model which should be the nearest possible representa-tion of the "true situation". Ideally, it is on the operating model (or a family of alternativepossibilities) that interpretation, prediction, and decision should be based. Since these modelsusually have many unknown parameters, considerations on the behaviour of estimates play animportant limiting role in practice and will often require the use of simpler Estimation Models.This last consideration leads to the second level of uncertainty: the estimation of parameters.Parameter EstimationThe lack of good controls, often caused by management constraints, result in weak contrast infishery data (Ludwig at al.[35]). When combined with (often large) observation errors, such datatend to be non-informative. Parameter estimation under such circumstances becomes worse(increasingly uncertain) as the dimensionality of the problem grows, a phenomenon known tostatisticians as "overfitting".The uncertainties associated with parameter estimation are accuracy (bias and variance)and confounding. The estimation may be done by minimizing least squares or maximizing thelikelihood. Using simulation studies, it has been shown that for typical fisheries data, simplermodels produce better policy estimates than more realistic highly parameterized ones (Ludwigand Walters [37] [38]). For further applications of the operating model concept in fisheries seeFournier and Doonan [16], Ludwig et al. [39] and Fournier and Warburton [17].A drawback in all these procedures is the use of a single best estimate upon which manage-ment recommendations are based. It is common that various other parameter combinations fitthe data nearly as well. Different combinations, however, might imply different managementactions. Such legitimate and important uncertainty simply vanishes from the problem whenconclusions are given as point estimates.Chapter 1. Introduction^ 6PredictionAs already mentioned in Section 1.1, fishery science depends heavily on observations rather thancontrolled experiments. This limits the range of available data (because conventional manage-ment tends to seek stability) and affects the reliability of predictions. It is generally true thatmoving outside the range of observed data and controls is a very uncertain undertaking. For fish-ery models in particular, it usually involves extrapolation for non-linear systems. For example,one can only know how a fish stock responds to increased fishing pressure by trying it out. Butin many instances this form of testing might translate into short-term losses (or unacceptablerisks) for which there are no guaranties of future compensating returns. A compromise betweenshort-term performance and long-term learning rates needs to be established. An action (con-trol) which is good under performance considerations, perhaps by maximizing profit, is oftenincapable of producing useful information about the system (i.e., it results in non-informativedata). This principle has become known as the "dual effect of control". Performance orientedstrategies usually seek stability of the system, while better learning rates are associated withinstability and strong contrast in the data. Actively adaptive (or "dual control") strategies aredesigned to establish some balance between learning and short-term performance. Walters [68](Chapter 9) gives an extensive description of these methods, which involve stochastic dynamicprogramming. While acknowledging the importance of the described conflict and the relevanceof actively adaptive strategies to management of fisheries, the present work will not furtherelaborate on the issue. But, as a final remark, I mention three features which are importantfor the development of actively adaptive strategies and are of particular interest here: (1) Theidentification of alternative hypothesis about stock dynamics, (2) the development of perfor-mance criteria for comparing adaptive policy options, and (3) the formal comparison of differentoptions. In Section 1.3 it will become clear that the Bayesian approach can adequately dealwith all three demands. In this sense, a support for active adaptive strategies is a support forthe Bayesian approach as well.In Section 1.1 I have pointed out the time-series nature of models and data acquisition inChapter 1. Introduction^ 7fisheries. The adaptation of traditional time series methods (Box and Jenkins [3]) has been pro-posed in the past (Schnute [56], [58]) but has not proved to be successful in describing fisheriesstock dynamics (Hilborn and Walters [25]). Some classical autoregressive moving average modelARMA(p,q) say, can be good to describe past data but is not of much help in guiding man-agement actions for the future. The literature in time-series and dynamic system methodology(Brockwell and Davies [4], Aoki [1]) is mainly suited to situations in which a large amount ofgood quality data and a good understanding (and control) of the system dynamics is available.Pella and Tomlinson [48] introduced a sequential fitting procedure which is unrelated tothe traditional time-series methodology but has proved very useful in fisheries. They assumeda known underlying deterministic (but unobservable) model describing the biomass dynamic,together with an observation model establishing the link between observed catches and biomass.An estimate for the parameters of the biomass model was then obtained by minimizing least-squares of the differences between the observed and the predicted time-series of catches (ob-servation errors). The operating model approach, described earlier in this section, is moregeneral due to the inclusion of two features: (1) system noise (Schnute [58] discusses how theassumed noise structure can fundamentally affect the results of an assessment) and (2) moreelaborated goodness-of-fit criteria allowing for the incorporation of auxiliary information. Ishall give more details about this approach in Chapter 3 (Section 3.3) where I extend theseconcepts into Bayesian inference.1.3 The Bayesian MethodIn Sections 1.1 and 1.2, I presented a series of features of fishery science and argued in favourof a reorientation of thinking about the distinct role of fishery assessment and management. Ifurther claimed that the Bayesian paradigm of statistical modelling is the correct frameworkin which to operate. Fishery management is ultimately a problem of decision making in thepresence of uncertainties and the Bayesian theory consists of an axiomatic system developed toprovide a rational and coherent way of reaching such decisions.Chapter 1. Introduction^ 8Here I shall describe the central ideas which characterize the Bayesian method. As intro-ductory books to the subject I suggest Lindley [31] and Lee [29]. More technical treatmentsare Berger [2], Press [51] and, in the context of dynamic models, West and Harrison [77].There are four basic elements:1. The unknown quantity 9 which denotes the uncertain element in the decision process andis called the state of nature. The set of all possible states of nature will be symbolizedby 0. For definiteness and clarity of exposition, let 0 be a subset of the real line R. Aprobability density r(0) is assumed to describe the subjective relative credibility associatedto a particular O.2. A sample space of elements x and a set of probabilities po(x I 0) of x, indexed by O.The model po(x I 0) (the index 'o' indicating observation) is seen as a function of 0 and,conditional on a fixed observation of x, is denoted the likelihood function and symbolizedby 1(0). This "conditional view" — which makes the sample space (i.e., the set of allpossible outcomes for x) irrelevant in most cases — is known as the likelihood principle.This principle makes it explicit that only the observed x should be relevant to provideevidence about 0.3. A set of all possible actions under consideration. I denote an individual action by a andthe set of all such actions by A.4. A numerical value measuring the consequence (utility) of an action a if 9 turns out to bethe true state of nature. This value, symbolized by U(0, a) and denoted utility function,is assumed to be defined for all pairs (0, a) E 0 X A. Such an utility function is describedon a scale which is also based on probabilities (this provides for coherence).The probability density r(0) describes uncertainties about 9 prior to the observation x;therefore called the "prior density". In light of the information provided by the observed datax, the prior beliefs about 8 are updated using Bayes Theorem, which says:p(0 Ix) cc po(x I 0)7 r(0)Chapter I. Introduction^ 9The constant of proportionality is a normalizing constant to make p(0 I x) a proper probabilitydensity so that it integrates to 1 over the set 0.From a Bayesian viewpoint, the conditional density p(0) x) gives a complete description of0 after observation of the data x; it is denoted the "posterior density". Since p(0 I x) says itall, there is no need to go further in terms of estimation. Anything less, however, implies lossof information.Any action should have its value described on an utility scale. In the light of current beliefs,fully expressed in the posterior p(0 I x), a best action must be chosen. It is an implication ofSavage's [55] work that, given a coherent comparison of utilities, the best action is one thatmaximizes the expected (posterior) utility E[U (• , a)]. In the continuous case and for a given athis expectation is obtained as the integralE[U(., a)] = I U(0, a)p(0 I x)d0Additional arguments in favour of the maximization of expected utility are given by Lindley[31] (1). 57).Some remarks are needed here:• For simplicity, I have assumed 0 to be a subset of the real line 42 in the above description,and consequently r(9) and p(0 I x) are one-dimensional probability densities. But, Ican have 0 as finite or countable (n --+ oo) sets, in which case a discrete probabilitydistribution is used for both, prior and posterior. Also, 0 = (01, • - • ,0p)' can be a p-dimensional vector so that 0 becomes a subset of RP and multivariate probability densitieswould apply. In fact, the latter (multivariate) will be the case of interest in the remainingchapters.• Bayes' Theorem is central to the approach (hence, the name) in that in establishes asystematic way of handling (and modifying) subjective beliefs about 0 in the presence ofobservation x.Chapter 1. Introduction^ 10• In Bayesian inference it is common to consider loss functions instead of utilities, so thatminimization of expected losses replaces maximization of expected utilities in determiningan optimal action. The "squared-error loss", which definesL(0, a) = (0 — a)2is a typical example. If A = 0 = 42, it is easy to show that the posterior expectation E[0]is the value for a which minimizes the posterior expected square-error loss E[(0 — 02].In general it is adequate to think simply of the relation L(0, a) = — U(0, a); that is, thinkof loss as being a negative utility.There would be little resistance to the use of prior distributions if 0 were a random variablesubjected to frequentist interpretation. The subjectivistic concept of probability and the use ofprior distributions causes some discomfort even among advocates and users of Bayesian analysisfor fisheries. To give a taste of the controversy involving the former, I quote Lindley [31](p.20):"Some have regarded this (subjective probability) as a disadvantage. In fact, thesubjective nature of probability is its strength, for it describes a real situation ofa subject, or observer, contemplating a world, rather than talking about a worlddivorced from the observer living in it, as do some other sciences."As to the latter, Berger [2] (Sections 3.7 and 4.7) discusses extensively the common criticismassociated to prior specifications and presents possible solutions to this dilemma. I give nexta short account of his discussion regarding four arguments which are likely to be raised in anycritical discussion about the merits (or not) of a Bayesian analysis.1. Lack of Objectivity: If an overwhelming amount of (good quality) data were available,almost any analysis would yield similar conclusion, so that prior information would belargely irrelevant. But, when such data availability is limited (as in most cases), the choiceof a model and its error structure can have a comparatively much stronger impact on theanswer than the choice of a prior does; and it is just as subjective.Chapter 1. Introduction^ 112. Misuse of Prior: It is true that the use of a prior introduces an additional source for abuseof statistics. The best prevention against potential misuse consists in a proper educationregarding prior information and the handling of uncertainties. In reporting results of aBayesian analysis, prior, likelihood, and utility should be reported separately so the readercan evaluate how reasonable all subjective inputs are. It also makes it possible to try outalternative priors if such a change is felt to be appropriate.3. Robustness: "Do small changes in prior specification cause significant changes in the finaldecision ?" This question is justified and a central issue. Studies in Bayesian robustnessand related methodology address the topic. The construction of robust priors is sometimesan adequate compromise. If, however, different reasonable priors result in contradictingconclusions, this is an indication of scientific uncertainty of the problem rather than aweakness of prior specification.4. Data or Model Dependence of the Prior: The Bayesian approach assumes that the infor-mation contained in 740) does not depend in any way on the data. This idealized view ishard to match in practice. This happens because a model po(x I 19) is often chosen afterinspection of available data. Such choice further defines the parameters 0 under consider-ation; hence, the circularity between data and parameters. The answer to these criticismis twofold. First, all analyses suffer from the circularity between data and model. Second,if in a given analysis the model seems reasonable as the priors do, and other reasonablepriors give similar conclusions, then details of prior development are not central to theanalysis; this leads back to the robustness considerations discussed previously.The Bayesian paradigm is conceptually easy to implement. But the calculation of theposterior p(0 1 x) and the corresponding expected utility entail evaluation of integrals whichcan become difficult, especially for multidimensional O. To avoid such difficulties, the use ofconjugate priors has been an important concept and is explained below.Chapter 1. Introduction^ 12Conjugate Families and Numerical ApproximationsIf a class F of density functions po(x I 0) is given, and P is a restricted class of prior distributions7 40), then we say that "the class P is a conjugate family for F", if the posterior p(0 I x) isagain a member of P. Some examples:1. If (x I 0) has Gaussian distribution with mean 0 and (known) variance a2, then the classof Gaussian priors 740) is a conjugate family for the class of Gaussian (sample) densities.2. If (x I 0) has Gaussian distribution with (known) mean 0 and variance a2, then the classof Inverse-Gamma priors 71-(a2) is a conjugate family for the class of Gaussian (sample)densities; that is, p(cr2 I x) is an Inverse-Gamma density.3. If (x I 0) has Poisson distribution with intensity parameter 0 (i.e. h(x I 0) = 0x e-e (x!)-1),then the class of Gamma priors 740) is a conjugate family for the class of Poisson (sample)densities.The need to look for conjugate families is a constraint on practical application. For fisheriesin particular, the non-linearity of models further complicates the matter. In an attempt tofree applications from both difficulties, theoretical and technological developments over the lastdecade have produced a variety of computationally demanding approximating methods. Smith[60] in a recent review, divides these techniques according to three different strategies:(i) Laplace and related analytical approximations,(ii) adaptive quadrature based on classical numerical analysis and(iii) sample-based methods or Monte Carlo integration.While (i) and (ii) require a great deal of expertise and become harder with growing dimen-sionality of 0, the sample-based methods of (iii) are suitable to a broader audience for theirrelative ease of implementation and practicality for high-dimensional O.The final output of a sample-based method consists of a random sample 0 = {01,• • •,0,,,}drawn from the posterior distribution p(0 I x). The essential duality between a sample and thedistribution from which it is drawn is an important concept which I shall explore next.Chapter 1. Introduction^ 13The distribution clearly generates the sample. But, conversely, given a sample one canapproximately recreate the distribution. More generally, one can use the sample 0 to exploreand summarize features of p(0 ) x) as one would proceed in exploratory data analysis on ordinarydata x. Features which might be of interest are (Smith and Roberts [62]):• examining shapes of uni-variate marginals for 9 or for functions H(0).• producing marginal moments or quantile summaries.• examining bivariate marginals or pairs of functions.• examining trivariate distributions (by taking advantage of the powerful statistical pack-ages for graphical analysis).• connecting 0 to an utility function so that the output gives answers to a specific decisionproblem.• using 0 in predictive analysis.The Monte Carlo methods are further sub-divided into two groups. First there are theMarkov Chain methods which include the Gibbs sampler of Geman and Geman [22]. Thesecond group is based upon importance sampling. The present work uses the latter. In thenext two chapters I shall explore possibilities for the use of adaptive importance sampling withmixture models (West [75]) to obtain posterior distributions for parameters in a variety of fisherymodels.1.4 Summary of ContentsIn Chapter 1 I have described the motivations for this work and the relevance of the Bayesianparadigm in fishery sciences (Section 1.1). Different levels of uncertainty and their impact infishery assessment and management were examined in Section 1.2. The Bayesian method wasdescribed in Section 1.3. This last section is designed to give an overview of the contents ofremaining chapters and assist a selective reader.Chapter 1. Introduction^ 14In Chapter 2 I describe the Monte Carlo methods used in Bayesian computation of pos-terior probability distributions. Central aspects of basic importance sampling are reviewed inSection 2.2 and the adaptive importance sampling proposed by West [75] is reviewed and il-lustrated in Section 2.3. The use of the approach for sequential analysis of dynamic models isdescribed and illustrated in Section 2.4.In Chapter 3, I describe a variety of models for biomass dynamics: stock-recruitment,surplus-production and delay-difference. Starting from the basic time-series fitting approachof Pella and Tomlinson [48], I incorporate the adaptive importance sampling methodologyto construct an approximate posterior distribution for the underlying parameter vector. Thederivation of the results are in Sections 3.2 and 3.3. Section 3.4 gives an overview of delay-difference models.In Chapter 4 I present different case-studies to illustrate the results of Chapters 2 and 3. InSection 4.2 I construct a bivariate posterior distribution for the stock-recruitment parameters ofSkeena river sockeye salmon data. A simulation model using catch and effort data is describedin Section 4.3. This case also proposes an utility function and uses it to make managementdecisions. In Section 4.4 I examine data for the Orange Roughy fisheries in New Zealand.Different management strategies are analyzed in the context of Bayesian decision theory andthe results compared to the (non-Bayesian) risk analysis of Francis [18]. The last two case-studies refer to situations for which the posterior distributions are bi-modal. In Section 4.5 Iuse artificial data generated from simulations and in Section 4.6 I use data from the Pacific codfisheries off the coast of British Columbia, Canada.The insight gained from the work, topics for further research, and final conclusions are allgiven in Chapter 5.Chapter 2Bayesian Posterior Computation2.1 IntroductionIn the previous chapter I discussed general features of fisheries stock assessment and man-agement. I gave arguments supporting the adequacy of the Bayesian paradigm to performstatistical analysis and to guide decision making. In Section 1.3 the Bayesian method was de-scribed. While conceptually easy to implement, its application reduces mostly to calculationsof integrals of the formf H (8)f (0)cle MHO) =^ (2 .1 )f f (0)c19where 9 E RP, H(9) is a measurable function, and f(0)I f f (01)c10' a probability density. Differ-ent choices of H(9) result in different aspects of practical Bayesian inference. To focus attention,assume that f(9) is proportional to the posterior density of interest. Then,• To calculate the posterior mean 1.1, I define H(9) = . Further, taking H(9) = (0 — iu)2yields the corresponding posterior variance.• If a decision has to be made between actions al and a2 according to the expectation ofgiven utility functions U(0, ai) and U(0, a2), I define II(0)= U(0, al) — U(0, a2).• To calculate posterior probabilities of specific sets (or hypothesis), H(9) can be definedas an indicator function IA for the appropriate set A. As an example, suppose I want toestimate the posterior probability of 9 being smaller than a fixed value 90; this is obtainedby definingH(9) = h<9015Chapter 2. Bayesian Posterior Computation^ 16• Within the context of fishery models in particular, the function H(0) sometimes definesa policy (e.g. optimum fishing effort) for which an estimate is desired. The posteriorexpectation itH say, can serve this purpose (although posterior largest mode and medianare also possibilities). Posterior estimation of the corresponding standard error is obtainedby defining a new function 11*(0) --= (H(9) — 141)2.Models in fisheries are mostly non-linear in 0, which is often a vector of moderate to highdimension. As a result, the evaluation of (2.1) is in general quite complex and impractical to besolved analytically. A wide range of approximating methods have been developed to overcomethis difficulty. These methods can be subdivided into two major groups. One group concentrateson analytic approximations involving Taylor series expansions (Lindley [30], Tierney and Kadane[67], Delampady et al. [11] [10]). Difficulties with the implementation of the analytic methodsgrow rapidly as the dimensionality and complexity of the models increase. The second group ofapproximating methods, involving Monte Carlo integration, is generally necessary under thosecircumstances. These methods are presently an active field of research; see Smith [60] for anoverview.The principle behind Monte Carlo integration is as follows. Suppose one can generate anindependent identically distributed (i.i.d.) sequence of random variables {01, • • - , 0,} havingcommon density p(0) > 0 on 0. Then, for a well-behaved H(0), it follows from the strong lawof large numbers, thatnlim —1 E H(9i) . I H(0)p(0)d0n.00 n i=.1^eWhile the principle is simple, difficulties for practical implementation consist in the abilityto perform random drawings from a general density p(0); it might be computationally veryexpensive (as the dimension of 0 increases) or impossible for complicated forms of p(0). Inattempting to overcome these hurdles, Monte Carlo methods have followed two distinct paths:1. Mar/coy Chain Methods: These methods include the Gibbs sampler algorithm of Gemanand Geman (1984) [22]. Given a p-dimensional 0, the key feature of the algorithm is toChapter 2. Bayesian Posterior Computation^ 17fix the full conditional densities p(0i I 03; j 0 i) for i = 1, - • - ,p, from which compu-tationally inexpensive random draws are made. Possible limitations of the methods aretwofold: (1) the need to specify full conditionals (often a difficult task in itself) withina convenient class of distributions from which sampling is easy and (2) the possibility ofbad performance for distributions featuring two separate modes (with a region of smalldensities in between). In the latter case, and due to its Markovian nature, a chain ofrandom generation could be "trapped" at one of the modes. Refer to Smith and Roberts[62] for a review. A related method is the substitution sampler of Gelfand and Smith [21].2. Importance Sampling Methods: In its original form, as described in Geweke [23], basicimportance sampling replaces the density p(0) by a "similar" density g(0), from whichthe random draws can be made efficiently. The ability to find a suitable g(-) is the keyassumption. Adaptive sampling schemes have been developed to assist the choice of animportance function g(-) ( Kloeck and van Dijk [28], Naylor and Smith [45], Oh and Berger[46]). An important variation, specially useful for high-dimensional 0, was proposed byWest [75] [76]. It consists in modelling g(-) as a discrete mixture of distributions of somestandard parametric form. By recursively updating g(-) a few times, the final mixturemodel seems capable of providing a satisfactory approximation to X.) quite generally.Sample-based Monte Carlo methods are becoming a central tool for practical implementationof Bayesian inference. In this work I shall focus attention on adaptive importance samplingmethods with the importance function given as a finite mixture of distributions. I shall explorepossibilities for its use in a variety of biomass dynamic models in the upcoming chapters. Beforeexamining these, I use the remainder of the present chapter to describe importance sampling.The next section introduces basic importance sampling. In Section 2.3 I present the al-gorithm for adaptive importance sampling using mixture models and provide an illustrativeexample. In Section 2.4 I describe the use of the adaptive algorithm for sequential updating ofdynamic models and illustrate the procedure with another example.Chapter 2. Bayesian Posterior Computation^ 182.2 Monte Carlo Integration with Basic Importance Sampling (BIS)Let 0 have a density p(0) with support 0 E RP. Assume that it is impractical to produce asequence of i.i.d. random variables from p•); but the exact evaluation of the unnormalizednon-negative function f(0) oc p(0) is possible for a given 0. Suppose it is required to evaluatethe expectation (with respect to p(.)), of some given function H(0). Monte Carlo integrationwith importance sampling performs this calculation as follows:1. Specify a probability density g(0) as an approximation for p(0). If Eq.] and EgH denoterespectively the expectations under densities p(•) and g(.), thenEEH (-)j = f H (0)f (0)d0P f f(0)(10f H(0)fgr4g(0)c10f fgr4g(0)c18f H(0)co(0)9(0)d0f w (0)g (0)c10Eg[H(.)co(-)]Eg[w()]where w(0) = ^2. Draw a sample of size n from g(0), denoted 0 = {91, • • Onl.3. Get the set of normalized weights Si = {w1, • • • ,con}, where coi is defined as^w(19.i)^f(0) k^kg(0i)with k = E7=1 w(93).j).4. Use Monte Carlo approximation to evaluate the expectations with respect to g. Namely,E-g[H(•)co(-)]^17. -^H(9,49,)Chapter 2. Bayesian Posterior Computation^ 19Eg[co(•)]^co(8j)^-k-n j=1The final result isEP[1101 E H(0.00.,^ (2.2)In essence, the procedure consists in replacing p(.) by a discrete approximation with massfunction coj at OiThe description above assumes the existence of a good importance function g(.). Hence,the key issue is finding a suitable g•). It should be easy to draw i.i.d. random variablesfrom it while, at the same time, it should make the approximation in (2.2) accurate for n assmall as possible. These are sometimes competing goals, and different situations might needdifferent solutions. Berger [2] (pp.263-265) provides some guidelines for choosing an adequateg(). Geweke [23] gives mild conditions under which the numerical approximation in (2.2)converges almost surely to the true value and is approximately normal distributed with meanILH = EP[H(•)] and variance a2/n, as the number of Monte Carlo replications n increases. Thevariance is defined as(72 EP [(HO —141)2(001 •The performance of BIS clearly depends on (72 and n, so that a g() yielding small o2 isdesirable. Although there is not a general rule for choosing a good g(), Oh and Berger [46]give some desirable properties of a good importance function:1. It should be easy to generate random drawings from it, a requirement already mentionedearlier.2. It should not have sharper tails than f(-). The intuition being that sharper tails wouldresult in large values of co(02) for outlying values of Eli. The result is increase in thevariance u2 and instability in the estimates.3. It should mimic the true f(.). Here the goal is to avoid having a large proportion of c,..)(0j)near zero. Such values have little effect on (2.2), reducing the efficiency of the method.Chapter 2. Bayesian Posterior Computation^ 20A complementary way of reducing the variance of the importance sampling estimate hasbeen the search for appropriate sampling strategies (Rubin [54]). According to Smith [60], andprovided that a good procedure can be found, it is likely that importance sampling is moreefficient than the Markov approach mentioned earlier.Probability Densities and Random SamplesAn important aspect of all the Monte Carlo sampling methods, and importance sampling inparticular, is the duality between the density p(-) and a sample 0 = {i, • • , On} drawn from it.If only 0 is known, it is possible to approximately reconstruct the underlying density p(-) usingan histogram, kernel techniques, or the empirical cumulative distribution. Switching attentionto the sample 0 as a description of the posterior leads to two important observations.First concerns Bayesian Decision Theory. There it is usually necessary to evaluate theexpected utility EP[U(0, a)] to find the optimal action a. If 0 is an i.i.d sample from p(•), itseems natural to use the same set 0 (and SI = {w1, • ,con,}) to calculate the expected utilityfor all required values of a. Since the generation of 0 is often the most expensive part of theMonte Carlo process (Berger [2]), the availability of 0 and S2 is an important advantage of thesampling methods with respect to computational efficiency.The second observation relates to resampling techniques and can be very helpful in thecontext of Bayesian robustness. There it is desirable to determine potential changes to theposterior when alternatives to the likelihood and/or the prior are considered. In the context ofdistributions such a study can pose formidable difficulties and requires a fresh start when a newpossibility has to be investigated. Focusing attention on the available sample 0, the resamplingapproach makes such an analysis remarkably simple (Smith [60]). Starting with the sample 0from p(0), suppose the goal is to produce a new sample 0* from a distribution proportional tosome known function P(9)• Resample from 0 = {92; j = 1,- • • , n} as follows:1. Calculate coj = f*(9j)/p(0j).Chapter 2. Bayesian Posterior Computation^ 212. Obtain the normalized weightsWiqi =Ldj=13. Draw a random sample with replacement from the discrete distribution over 0 placingmass qi on 9,. Denote this sample 0* = {01, 0, —}. Then 0* is approximately dis-tributed according to the densityf*(0)/ f*(01)do'Here is an example. Let ri(9) and 71-2(9) be distinct priors to be compared and 4(9), 12(9)two likelihoods. For the posterior densities pi(0) a 1 i(0)x- i(0) for i = 1,2 the following relationholds:P2(0) oc 12(0)7r2(0)) Pi(9)U1(9)71-1(0)If 0 is a sample from pl., changes from the likelihood l(0) to 12(9) and/or from the prior ri (9) to1r2(0) can be examined using the resampling algorithm described in steps 1 to 3 above. Simplydefinev(9) = 12(9)r2(9)11.(8)ri(e)PM= Pi(9)f*(9) = v(e)pi(o)coi = v(0i)2.3 Adaptive Importance Sampling (AIS) and Mixture ModelsFor efficiency in using the importance sampling method, we must use an appropriate importancefunction g(.) (in the sense discussed in the previous section). Finding such a function requiresskills It becomes harder as the dimension of 9 increases. For wider applicability of the method,more automated ways to obtain a satisfactory g(-) are needed. This is the motivation behindthe development of adaptive importance sampling (AIS), the subject of this section.Chapter 2. Bayesian Posterior Computation^ 22The AIS procedure was introduced by Kloeck and van Dijk [28]. It starts with some crudeapproximation for p(-) denoted go(•). Because many of the sample points 9o, will have negligibleweights w0 ,j (i.e. small mass p(00,i) relative to go(00,i)), such an importance function is veryinefficient computationally, requiring huge sample size no, as well as statistically, convergingslowly or not at all. The adaptive refinement consists in taking a small no and, based onthe outcome, constructing the updated gi(-). A new sample from M.) is expected to have abetter match with p(.). The final importance function obtained by such an adaptive schemeis expected to display good mimicry to p(.); a desirable property according to the previoussection. Different methods to perform the adaptive refinement of g(-) were proposed by Naylorand Smith [45], Oh and Berger [46] and West [75] [76].I will focus on West's method: the use of a discrete mixture of distributions of some stan-dard parametric form. Three features which justify my choice for this method are: (1) thecapability of mixture models to emulate very irregular forms of distributions (Wolpert [79]), (2)the practicality for high-dimensional 0, and (3) the potential for automation.I use the remainderof this section to reviewing this method.Assume 9 is a p-dimensional vector with true density p(0). Further assume that f(9) a p(0)can be evaluated for any given 0. The adaptive procedure for obtaining a good g(0) goes asfollows:1. Define an initial importance function go•) and draw a stochastic sample of size no fromit. Denote this sample by 00 = {00j; j = 1, --• , no} and the associated weights by:= {wo,j; = 1, • • • , no}. LetF0 = {900, no, 0o, no}represent the available quadruple after this initial step.2. Compute the Monte Carlo estimate for the variance Vo, as the weighted sumnoV0 =Chapter 2. Bayesian Posterior Computation^ 23where Bo =^wo,j00,i is the Monte Carlo mean.3. Construct a revised importance function as the following weighted mixture:noMO) =^wod do 00,i, h2v0)^ (2.3)j=.1.where d(9 I m, M) denotes a known p-variate density function with mean m and varianceM. The bandwidth h is a smoothing parameter as used in conventional kernel densityestimation (Silverman [59]). Refer to the comments below for more details about bothd(.) and h.4. Draw a larger sample n1 > no from gi(-) and replace Fo byF1 = {g1(•),ni3O1,S21}5. Use diagnostics (see comments below) and either stop and base inference on F1 or repeatsteps 2 and 3 for a revised version r2.Next a few comments to clarify some aspects of this adaptive procedure:Smoothing: The mixture model g() given in (2.3) is simply a weighted kernel estimate withkernel de I m, M). The theory for kernel estimation (Silverman [59]) implies that themixture model g(.) approaches p(.) for increasing sample size n if the bandwidth decaysto zero at an appropriate rate. For Gaussian kernel, a standard h — defined as a functionof the sample size n, the dimension p of 9 and two constants a and b — ish = anilbwith a given by4^)1/ba= 1 + 2p jSilverman [59] defines b p+4. West [76] defines b = 1+4p, instead. For one-dimensionalO's, both bandwidths are identical. If p > 1 (and n is fixed), West's formula givesChapter 2. Bayesian Posterior Computation^ 24comparatively larger values. If multimodality is suspected, this standard bandwidth tendsto oversmooth and slightly smaller values, say rh (0.5 < r < 1) are preferable. West [76]also shows that, for fixed n, the function g(.) is always overdispersed: a desirable featurefor any importance function. The constant variance V in the kernel of (2.3) can be replacedby its Monte Carlo estimate obtained from the current values of 0 and O. Furthermore,in any adaptive procedure, a crude go(.) need only capture gross features of p() and asmall sample size no will do.Possible variations (not used here) consist in: (1) allowing for a different bandwidth ateach recursion, and (2) using local variances 17 for each kernel in the mixture. In (1)the spread of g(.) is controlled directly: it can be increased by intervention whenever thecurrent importance function is questionable. The use of local variances suggested in (2),can better explore the existence of local dependencies among elements of O. Although thisis a good feature, the possible gain in statistical efficiency needs to be balanced against acorresponding loss in computational tractability.The kernel: In all applications to be considered here, I shall use a heavy-tailed p-dimensionalStudent distribution as the kernel de m, M). According to West [76], the Studentdistribution is an adequate choice, provided Oi E ( — oo, -I-oo) for i = 1, • • • ,p. I use thenotation T(k, m, C) to denote a Student distribution with k degrees of freedom, mean mand scale ClDiagnostics: After each recursion in the adaptive reconstruction of (2.3), different diagnosticscan be used to check for improvement in the importance function. Particularly, Westrefers to four possibilities:1. Changes in (Monte Carlo) estimated expectations.2. Configuration of points in each dimension of the parameter space 0.3. Frequency distribution of weights 9.'The scale C and variance M of a Student distribution are related by C = YM for k> 2.Chapter 2. Bayesian Posterior Computation^ 254. Some measure of variability of weights.In the approach taken here, I shall use a measure of variability of weights as the standarddiagnostic tool. West [76] suggests the entropy relative to uniformity, which he defines as-^wi log wilog n (2.4)This value is non-negative and equals unity in case of a perfect match between g(0i) andp(0) for all 03 E 0; that is, the normalized weights coj will all equal 1/n. As the variabilityamong coj's increases, the relative entropy becomes smaller.The choice of adequate sample sizes ni, bandwidth h, and number of updates for g(•), canbe guided by one or more of these diagnostics. For instance: look for stable Monte Carloestimates of mean and variance (criteria 1); verify if the relative entropy (2.4) increasesuniformly towards 1 (criteria 4 ), etc.Reduction of MixturesFor a very large n, the mixture (2.3) might contain many redundancies, so that another mixturewith a much smaller n is alike for all practical purposes (see West [76] for examples). Thesesmaller mixture models are obtained by reducing the n-component sets (0(n), f2(n)) to thesmaller pair (0(m), S2(m)) with m < n (superscripts indicating cardinality of the sets) by someclustering scheme. The procedure he suggests goes as follows:1. Sort the elements Oi E 0(n) according to increasing weights wi E SIN, such that eicorresponds to the smallest weight col.2. Find the index i, such that 0i is the "nearest neighbour" to 01 and calculatew* =4.01 wi0* = 0,101 co20ico*Chapter 2. Bayesian Posterior Computation ^ 263. Remove 01 and 9, from 0(n) and include 0* instead, to get 0(n-1). Proceed similarly withcoi and co* to get C2(n-1)•4. Repeat the previous steps (n - m) times, until obtaining 0(7n) and S/(m).The reduced mixture keeps the form (2.3) but has only m components, with locations 0(m)and weights 52(m). The estimate for V, needed in the kernel de I m, M) and calculated fromthe larger sets 0(n) and 11(n), is kept. But a larger bandwidth h, due to the smaller number ofkernels in the "reduced" mixture, is appropriate.To help practical implementation of AIS for the biomass models of Chapter 3,1 have includeda pseudo-code for this procedure as an Appendix.2.3.1 An ExampleTo illustrate the AIS procedure, I shall perform a Bayesian analysis for a model first consideredby McCullagh and Nelder [41] (sec. 9.3.3) and, in the Bayesian context, by Smith and Gelfand[61].For i = 1,2,3, let Xi,1 and Xj,2 denote two binomial random variables with parameters(no, pi) and (ni,2,p2). Assume Xi,1 and X1,2 are conditionally independent given (pi,p2), andwith no and ni,2 specified. The observed random variables are Yi = X2,1 Xi,2.Given the observations^1'2 = y2, and Y3 = y3, the likelihood for (Pi ,P2) takes theform:i=i JEA(i)^j 0: - iwhere A(i) = {j; max-0,yi - ni,2} _< j < minfno , yil.The data of McCuRagh and Nelder are given in Table 2.1. Their goal was point estimationof (p1,p2). I shall obtain a (two-dimensional) posterior density instead.Transforming for 0: The method requires 0 = (OM), 0(2)) to be defined on the real planen2. Hence I use the logit transformation0(i) = ln(pi/(1 - pi)), for i^1,23 noE(( ni,2^pri(1 _ p2ri,2-Yi-Fi (2.5)Chapter 2. Bayesian Posterior Computation^ 27Table 2.1: Data from McCullagh and Nelder [41] (Sec.9.3.3)i no ni,2 Yi15 5 726 4 534 6 6The final results can be described for 0(i)'s or, as I did here, transformed back to the originalparameters pi.The calculations: I shall guide the reader through the various steps of the AIS algorithmfor this specific example.1. Define the prior distribution r(0).After drawing a sample of independent observations from a two-dimensional uniform dis-tribution on the unit square (0,1) x (0,1), and applying the logit transformation to eachof them, the resulting sample of values 0 has a dome-shaped distribution with (approxi-mate) mean-vector (0,0)' and diagonal covariance matrix diag(3.2, 3.2). Therefore I useda two-dimensional Student with 30 degrees of freedom and the given values as mean andscale as my prior for 0. The degrees of freedom were chosen arbitrarily; smaller valueswould result in too heavy tails inconsistent with the observed sample.2. Define go(.) = r(.). That is, use the prior distribution as the initial importance function.3. Draw a sample of size no = 1500 from go(.). This is the set00 = {00j, j = 1, • • -, no}.The sample size, again an arbitrary choice based on diagnostics and experimentation, is ofmoderate size. It is intended to capture crude features of the true posterior. In subsequentrecursions the importance function becomes more refined. Therefore the sample size willgradually increase.Chapter 2. Bayesian Posterior Computation^ 284. For each 00,j E 00 calculate:(i) the true posterior density P(eo,i) = 1(0o,i)740o,3). Where 1(9) is the likelihood (2.5)with pi's (i = 1,2) replaced by the appropriate function in 0(0 =-e°(`)(1 + egs))(ii) the value g0(90,2). In subsequent recursions the importance function g•) from whichthe current sample was drawn is used instead.(iii) the weight c4)(00j) = p(eo,i)/go(eo,i)•5. Get the set of normalized weights=-^j = 1, • • , no}where co0,3 = co(00,j)/K and K = E.72.1440o,j).6. Using 00 and 00 calculate the Monte Carlo estimates of the (2 x 2) covariance matrix V,as indicated earlier. Also calculate the relative entropy for g0(•) by using f20 in equation(2.4).7. Using the procedure for reduction of mixtures, described earlier in this section, reduce 00and 120 to smaller sets of m 500 points. Denote these new sets 0((50°) and 9,(50°) anduse them, together with the estimate V calculated in step 6, to construct the mixturemodel gi(.) of the form (2.3). I fixed the kernel de m, M) as a Student distributionwith 9 degrees of freedom (to provide for heavy tails in the importance function) and thebandwidth of Silverman [59] defined earlier.8. Draw a new sample of size n1 = 2500 from fil(•) and denote itei = {01,i,^= 1,- • •,ni}9. Repeat steps 4, 5, and 6 to obtain 91, V, and the relative entropy for gi(-).Chapter 2. Bayesian Posterior Computation^ 2910. Obtain the reduced sets .500) and S2(15"43^. Construct the new mixture model g2(.).11. Draw a new sample of size n2 = 4000 from g2(.) and denote it02 = {92,3, ./ = 1, • • • , n2}12. Repeat steps 4, 5 and 6 to obtain f22, V and the relative entropy for g20.13. Obtain the reduced sets 0(25" and Si(2500). Use them in the final mixture model (2.3) togenerate the contour plot of Figure 2.1.Results: The relative entropy (2.4) was 0.873 for go(.), 0.993 for gi(.) and 0.997 for g2(-).The major improvement is shown for the first recursion, where the entropy jumps from 0.87to 0.99. A final entropy close to 1 means good agreement between the mixture model and thetrue posterior at the 4000 observations in the set 02. Figure 2.1 is a contour2 plot based onthe mixture approximation for the posterior. Negative correlation and a slightly larger valuefor p2, as compared to pi., are identified. For comparison, I included the values reported byMcCullagh and Nelder: the quasi-likelihood estimate (0.36,0.84) (triangle) and the likelihoodestimate (0.2,1.0) (cross). Both estimates satisfy pl. -I- p2 = 1.2, and are on the (dotted) linewhich represents all such cases. I have included this line to show that the shape of the contourplot is approximately symmetric to it.Marginal posterior probabilities for p1 and p2 are shown in Figure 2.2(a). Despite the largeuncertainties, indicated by the flatness of the distributions, there is a clear trend towards smallervalues of p1 (full line) as compared to p2 (dotted line). When comparing both parameters, anatural function of interest isH(PlIP2) =P2 -P1I calculated this difference for each of the 4000 sample points in 02. A summary of the outcomeis shown as the full line in Figure 2.2(b). This curve was obtained using standard densityestimation (a simple histogram gives the same message); there is a tendency towards positive2A11 contour plots are made using `interp0', an S-Plus function.Chapter 2. Bayesian Posterior Computation^30Figure 2.1: Contour plot for posterior distribution of (p1,P2). Contour lines are for .05, .10,.25, .50, .75, .90, .95 of largest observed density.q .,--co0CD-0 c;2o_ •cr..oc\!oqo0.0 0.2^0.4^0.6p1 and p2(b)0.8^1.0(a).................^ .......Chapter 2. Bayesian Posterior Computation^ 31-1.0^-0.5^0.0^0.5^1.0p2-plFigure 2.2: (a) Marginal posterior for Th. (full line) and /32 (broken line). (b) Density estimatesfor (p2 — p1) for 4000 sample points (full line) and the 500 values remaining after clustering(dotted line).values, particularly in the range 0.5 to 0.7. The dotted line on the same graph, was obtained inan analogous way, but using only the 500 clustered values 0(25°°); little information is lost andthe result is essentially the same.In my description, I have avoided results in the form of point estimates. Posterior mode,median and (Monte Carlo) mean could be used as such. But, the posterior distribution is thefull expression of current uncertainties and beliefs about the parameters; why say less (with apoint estimate) if I can say it all with the posterior distribution ?Chapter 2. Bayesian Posterior Computation^ 322.4 Sequential AnalysisHere I shall discuss a class of models for which the parameter 9 is assumed time-dependent;hence I use Ot to associate it with time t. This topic is not needed for the developments in thenext chapters, and may be omitted without loss of continuity. But it illustrates another fieldwhere the AIS can be used.There are good biological reasons to distrust time-invariant 9 for models describing thedynamics of exploited fish stocks (Walters [69], Parma [47]). Alternatively, we might take 0i todescribe the state of the system at time t. Some examples: define Ot as the stock biomass attime t; or define Ot = (X,,,t, • • • ,Xch,t)' to be a k-dimensional vector where X,„t describes thestate of classes ci at time t. For the latter case, think of X as some abundance measurementand ci as different locations; or think of X as number of fish (or total weight) caught at age-or size-classes ci.State-space models and Kalman filtering (Brockwell and Davis [4]) have been used in suchsituations (Walters [68], Mendelssohn [42], Sullivan [66]). The Bayesian alternatives are dynamicBayesian models (DBM), the subject of this section. An extensive treatment of DBM's is givenby West and Harrison [77]. I shall describe DBM in its general form and show why (and how)adaptive importance sampling (AIS) can overcome difficulties in practical implementation. Iconclude with an application to a non-linear non-Gaussian example, where Kalman filteringdoes not apply.In its general form, the elements of a DBM are a discrete time-series of s-dimensional obser-vations Yt (t = 1,2, -.), each of which is related to a p-dimensional parameter Ot representingthe state of the system at time t. The dynamics are described as follows. At time t, theobservation Yt has a known sampling distribution.(Yt I et) -- Po(Yt I et)^ (2.6)Conditional on Ot, Yt is assumed independent of yi and Oj, for all j t (past or future). TheChapter 2. Bayesian Posterior Computation^ 33state parameter is assumed to evolve in time according to an evolution density(°t+1 I et ) ' Pe(et+i I et)^ (2.7)Once Ot is given, 9t+1 is conditionally independent of past values Oi for j < t.For each time t = 1, 2, • • • the defining probability densities are given by the observationmodel (2.6) and the evolution model (2.7). Let Di be the information available at the endof period t. If, from t to t + 1, the only relevant information is the new data point Yt+1,then the recursive relation Dt+i = {Yt-i-i , Dt} describes the dynamics of sequential informationacquisition. The conditional densities of (2.6) and (2.7) typically also depend on D. I oftenexclude this term from the notation to avoid clutter.The Bayesian element in the model is the expression of uncertainties about Ot by probabilitydensities. In particular, assume thatP(Oj I DO, j = 1, 2, - • •^ (2.8)describes the uncertainty about 9, given all current information D. In the context of time-series,the probability densities (2.8) have different names depending on the j used: filtered for j = t;smoothed for j < t and predictive for j > t. In standard Bayesian terminology (Section 1.3),filtered and smoothed densities are posteriors given Di, while predictive densities are priorsfor future states. The Bayesian (subjectivistic) interpretation of these probabilities allows themodification of any such density if relevant information outside of Di becomes available.To use the Bayes Theorem, we need a sequential updating mechanism for each additionalobservation Y. This is done by the following 3 steps:Step 1: Starting at some time t - 1 with a given posterior (filtered) distribution p(Ot_i I Dt_1),get a prior (one-step-ahead predictive) density for Ot asMt I Dr-1 ) = 1 Pe(Ot I et- i 7 Dt_i)p(et_i ( Dt_i)det-iStep 2: Get the (marginal) one-step-ahead predictive distribution for the observable Yt asp(Yt 1 Dt_1)=---- f po(Yt I et, Dt-i)p(et I Dt-i)dotChapter 2. Bayesian Posterior Computation^ 34Step 3: For an observed value Yt -,--- yt, the probability po(yt 1 Ot,Dt_i) acts as a likelihood forOt. Hence, using Bayes Theorem, update the prior of Step 1 to get the posterior (filtered)density for time t,p(ot I Dt) = Ayt I Dt_irip(ot 1 Dt—i )Po(yt I et , Dt—i)The evaluation of the above integrals has no analytic solution in general. Important excep-tions — Dynamic Linear Models and Multi-Process Models — are discussed in detail by Westand Harrison [77].In case of non-linear functions and/or non-Gaussian noise structures, two types of approxi-mations are used: analytical and numerical. Analytical approximations are used, for instance,by Delampady et al. [11] and [10]. They apply Laplace approximations of Tierney and Kadane[67] to calculate posterior moments for a Bayesian version of an autoregressive time-series modelAR(1), and for a model where Ot is the intensity parameter of a Poisson process.Numerical approximations start by considering a grid of values for O. Choice of this gridis an important point for the distinction among different methods. Sorenson [64] identifies twogeneral groups:(i) local methods which restrict analysis to one grid point only; and(ii) global methods which use the whole grid.Local MethodsThe best known local method is the extended Kalman filter (EKF) (for application to fisheriessee Walters [68]). To describe it, I defineE[Yt I Ot,Dt-il = F(et,Dt-i)^ (2.9)with expectation over observation error vt; an (s x 1) random vector; andE[Ot I et_i,Dt-i] = G(et-i,Dt-i)^ (2.10)with expectation taken over system noise wt; a (p x 1) random vector.Chapter 2. Bayesian Posterior Computation^ 35The functions G(.) and F(.) — which can be time-dependent — are assumed known. TheEKF linearizes GO and F(.) with respect to O., assumes Gaussian distributions and choosesthe grid point to be some "best linear estimate" (The ordinary Kalman filter is the specialcase where GO and F(.) are linear). The updating formulas used in Kalman filtering areequivalent to the formulas used to update mean and variance for a Gaussian conjugate familyof distributions. Therefore, within the Bayesian context, the use of the EKF implies that theposterior distributions for Ot are Gaussian. According to Sorenson [64], the first serious stepaway from a Gaussian assumption involved the use of Edgeworth expansions and "seems to bemost useful when the posterior density is unimodal even though not Gaussian".Global MethodsAny global method needs a way to solve the general problem of1. defining an initial grid,2. giving a procedure to define the grid at the next time step, and3. for a given grid, select a method to approximate the probability density.Kitagawa [27] approximates integrals by a finite sum defined over a discrete equi-spacedgrid for Ot, where densities are approximated by a first-order spline function. Smith et al. [63],Pole [49], Pole and West [50], propose an adaptive grid using Gauss-Hermite quadrature. West[74] points out the difficulties with the quadrature method:(i) the enormous computational demand in problems with more than a few parameters,(ii) the difficulty of reconstructing smooth posterior distributions based on the evaluationof only a few points, and(iii) the need to work with an approximately orthogonal transformation of the parameters.In the adaptive importance sampling approach of West [75] [76] the grid points are randomlygenerated. Such dynamic grids will be concentrated in regions of interest in the parameter space(provided the mixture models are good approximation to the "true" underlying distribution).Chapter 2. Bayesian Posterior Computation ^ 36How to (adaptively) find a good mixture model, was described (for time-invariant 0) in Sec-tion 2.3. I will show next how this can be modified for sequential models.2.4.1 AIS in Sequential ModelsUsing the notation of Section 2.3 (with the obvious changes to include the time index t) andgiven the observation model (2.6), the evolution model (2.7), and the data set Dt_1, the pos-terior p(0t-1 I Dt_1) is summarized by = {fit-10,nt_bot-bfit-1}, where gt_1(•) is thefinal mixture model before clustering and nt_i the corresponding sample size. The updatingmechanism (Steps 1 to 3 above) can be rewritten as Monte Carlo approximations.Step 1: (i) If the expectation Ee[H(Ot) Ot_i] with respect to the evolution density (2.7) canbe explicitly evaluated for some function of interest H(.), then its predictive expectationisnt-1^E[H(9t) I Dt_1]^E cot_i,jEetH(9t) et_1,2]Particularly, for mean and variance,nt-1at = E[9 I Dt_1} E wt_LiEe[et I et_i,i]Rt = Val*[Ot Dt-i] Pe, E wt-i,i(Vare[et I °t-i,j] (Ee[Ot Ot-i,j] - at )2)j=1(ii) The prior (predictive) distribution^I Dt_i) is approximated bynt-1^P(Ot I Dt-i)^E wt_i,Jpe(Ot 1 et-1,J)j=1This approximation is used to generate a stochastic sample 07 = ^= 1, • -,nt-i}from the prior distribution. It is also used to evaluate prior probabilities in step 3 below.Step 2: The sets Slt_i and Ot* are used for inferences from the (marginal) forecasting distri-bution (Yt I Dt_i). Particularly,nt-tft = E[Yi Dt-i] E wt_LJE" ezi]j=1Chapter 2. Bayesian Posterior Computation^ 37nt-1Qt Var[Yi Dt--1]-^-' E cot_i j(Varo[Yt 0;,j] (Eo[Yt^it?)where Eo and Varo are calculated with respect to the observation model (2.6).The marginal probability p(Yt Di_1) is given byat-1p(Yt I Dt_i) E cot_Lipo(Yt KJ)j=1Step 3: After /it = lit was observed, the goal is to obtain Ft --= {gto,nt,ot,nt} as an approx-imation for the posterior (filtered) distribution p(Ot I Di), completing one recursion. Theprocedure is as follows:1. Use o; and 52t_i to: (i) calculate the Monte Carlo estimate of V as indicated inSection 2.3 and (ii) obtain the clustered sets (in = {9; j = 1, • • , ml and S4 = {wZi; j =1, • • • ml with m < nt_i elements.2. Define an initial mixturegoy)) = Ewzido ezi,h2v)j=1and produce a sample Ot,o = fot,o,j; = 1, • • • , nt,01 from it.3. Calculate the corresponding set of weights Clip, from yt,o(et,o,i) andft(Ot,0,j) OC Po(Yt I et,0,j)P(Ot,0,3 Dt-1)4. The summary rt,o = {gt,o(-),nt,o, et,o,f2t,o} is used to repeat the procedure. Repeatthe procedure to obtain rt,i, rt,2, until this set is considered to mimic well the trueposterior. Use the final set as I.For the initial time t = 1 the density 1-(01) = p(01 I Do) has to be specified; usually a diffusedistribution is chosen, from which a random sample can be drawn easily. This random sampleis O. To obtain the corresponding weights f21,0, define gi,o(0) = 740) so that wo,i a po(Yi01,0j). The above procedure is then applied to generate Fla, etc.Chapter 2. Bayesian Posterior Computation^ 382.4.2 An ExampleFor illustrative purposes, I shall consider a (hypothetical) situation in which the biomass ofsome resource is described in time. I use a non-linear non-Gaussian model for which classical(Kalman filter) methods do not apply.The biomass dynamic is described byBt = St_i exp(a - bSr i)e"t^ (2.11)where Wt are independent random variables from a Student distribution with 9 degrees offreedom mean 0 and constant variance W. St denotes the biomass surviving year t. Given anannual natural survival rate C, and ht the harvest rate in year t, I define St asSt = C(1 - ht)Bt^ (2.12)Each year some abundance index It is observed. The relation between It and the unknownbiomass Bt is modelled byIt = qBtevt^ (2.13)where the observation errors vt are assumed independent random variables from a Studentdistribution with 9 degrees of freedom, mode 0 and constant variance V.After defining 0 = ln B and Y = in I, the following dynamic model is derived from (2.11),(2.12) and (2.13):where1e = G(Ot-i,ht-i)+ wtYt = F(Ot_i) + vtG(6, h) = di. + ln(1 - h ) -I- 0 - d2(1- h)PePeF(0) = in q + 0di. .1nC-1-ad2 = bCP(2.14)Chapter 2. Bayesian Posterior Computation ^ 39Table 2.2: List of parameters used in the simulation model for artificial data generationParameter ValueC 0.8a 1.65b 0.20P 2B1 4.7q 1V 0.13W 0.04d1 1.43d2 0.13Table 2.3: Output from simulation. The harvest rates ht were fixed in advance. Bt is theBiomass at the start of year t and It is the corresponding observed abundance index. Ot and Ytare the natural logarithm of Bt and It, respectively.t ht Bt It Ot Yt1 0.2 4.71 3.22 1.55 1.172 0.4 3.39 1.67 1.22 0.513 0.6 3.90 2.59 1.36 0.954 0.6 4.62 5.81 1.53 1.765 0.3 5.70 5.16 1.74 1.646 0.3 2.12 2.51 0.75 0.92Notice the non-linearity of GO with respect to O. By construction the process is non-Gaussian as well. For the purpose of illustration, all parameters a, b, p , q, C, W and V areassumed known with values listed in Table 2.2.I simulated the process for 6 time steps with harvest rates ht specified for times t = 1, • • • , 6.Table 2.3 summarizes the outcome.Notice the substantial drop in biomass at time t = 6; possibly an effect of the heavy-tailedStudent distribution modelling the random process noise w.The initial uncertainty about 01 is described by a prior r(01) which is Student with 9 degreesof freedom, mean 1 and scale 0.36 (the dotted line for t = 1 in Figure 2.3). After observingt=3 t=4t=50 32Chapter 2. Bayesian Posterior Computation^ 40t=1^ t=2Figure 2.3: Prior (predictive) and posterior (filtered) distributions are shown respectively bydotted and full lines, for times t = 1 to t = 6. `\?•' is the observation Yt and `+' indicates the"true" Ot.the posterior distribution p(01 I Y1) is obtained (the full line on the same graph). Theuse of G(01, h1) together with the known Student distribution for w1 enables the calculationof the prior (predictive) distribution p(02 I Y1) (the dotted line for t = 2 in Figure 2.3). Afterobserving Y2, the process is repeated; then for Y3 etc.At each t, I repeated the AIS procedure 3 times, with sample sizes no = 250, n1 = 350 andn = 1000. The mixture model (2.3), used to approximate the posterior p(Ot I Yi, • , Yt), wasobtained after clustering the sets 0 and SZ to m = 100 components. I used the conventionalChapter 2. Bayesian Posterior Computation^ 41bandwidth multiplied by 0.65 to allow for possible multi-modal distributions. The relativeentropy (2.4) was always larger than 0.997 for the final mixture approximation, indicating agood agreement between the importance function g(.) and the corresponding posterior p(•). Fortimes t > 1, densities for predictive (prior) distributions p(Ot I • , Yt_1) where approximatedby 1000E cot_lJpe(et I 9t_1,i)j=1where Pe(•) is the (known) evolution model: a Student density with 9 degrees of freedom, meanG(0t_1, ht_i), and variance W. In Figure 2.3, prior (broken lines) and posterior for Ot, are shownfor times t = 1 to 6. For convenience, probabilities (y-axis) were standardized for maximummode at unity. The information provided by 171 = 1.17 (displayed as V') reduces uncertaintiesas is reflected in the sharper tails of the posterior. The small deviation of VI with respect tothe prior mode 1, caused the minor shift of the mode to the right. The true (unobserved)biomass indicated by `+' was in fact higher than prior mode or observation would indicate.The influence of GO in shaping predictions can be seen in the prior for t = 2. The asymmetricshape of this distribution supports the inappropriateness of a Gaussian approximation (whichwould be implicit in case of EKF). A second observation V2 0.51, surprisingly low accordingto predictions, causes a substantial revision in current beliefs and a smaller posterior mode. Fortimes 3, 4 and 5 analogous considerations describe the dynamic of information acquisition. Att = 6, the predictive distribution again presents the asymmetric feature displayed at t = 2.Two features about the predictive distributions deserve comment: (1) the consistent lim-itation to values below 9= 2, and (2) the striking presence of sharper tails in the predictivedistribution (as compared to the posterior) for times 3, 4 and 5. Both are reflection of theparticular form of GO. But, while (1) could be anticipated by the non-linearity in the biomassproduction model, (2) was a surprise.Chapter 2. Bayesian Posterior Computation^ 422.5 SummaryThis chapter was about the implementation of the Bayesian paradigm where numerical ap-proximations of integrals involving some probability function p(.) are needed. Monte Carlointegration with adaptive importance sampling was used to calculate these approximations.In Section 2.2, I gave a review of the concepts and properties related to basic importancesampling. This approach depends critically on the existence of a "good" importance functionto replace p(-). A procedure to find such a function adaptively is described in Section 2.3.This method, proposed by West [75], uses a finite mixture of Student distributions as theimportance function. This function is refined by a recursive mechanism Such procedure isparticularly attractive for multivariate p(.). An example was included to illustrate the detailsof the required calculations.Section 2.4 extended the application of the adaptive method to situations where the param-eter of interest changes in time and the data are acquired, one by one, in sequential order. Anillustrative example was provided.Chapter 3Models for Biomass Dynamics3.1 IntroductionIn Chapter 1 I described the importance of Bayesian analysis for fisheries assessment andmanagement. Some technical difficulties which have hindered a more widespread applicationin the past can be overcome by the numerical methods described in Chapter 2. The presentchapter integrates the previous two, and proposes procedures for Bayesian analysis of biomassdynamic models.The novelty is the Bayesian framework in which traditional fishery models are treated. Ishall establish notation, describe, and justify the proposed procedures. The main result of thechapter is the posterior distribution for O. The potential advantages and new insight gainedfrom such an approach will be explored in Chapter 4 using case-studies.Here is a short guide through the chapter. The rest of this introduction describes some gen-eral features of fishery models and data. The reader familiar with them can move to Section 3.2where I introduce the first group of models. Section 3.3 presents a second group of models forwhich I extend the time-series fitting procedure of Pella and Tomlinson [48]. In Section 3.4 Idescribe delay-difference models — again an optional section. Finally, Section 3.5 is a summaryof the chapter.Fishery Data and ModelsModels describing the biomass dynamics of fish stocks are of central importance in the assess-ment of the resource. By providing information about productivity, such models are used to43Chapter 3. Models for Biomass Dynamics^ 44find optimal policies for exploitation. Generally speaking, these models imply that highest pro-ductivity is obtained at some intermediate biomass. If the biomass level is too low, the stockis being overfished in the sense of catching too many fish too early, hence wasting potential forreproduction and growth. At low biomass, the surplus due to growth typically exceeds lossescaused by natural mortality. Unduly high biomasses, on the other hand, favour mortality overgrowth. At least two causes explain this phenomenon: (1) the large proportion of older fish,which no longer use food efficiently for body growth; and (2) the waste of energy caused bycompetition for limited food supply.The above description makes it clear why estimates of biomass, and the attempt to modelits dynamic, are central in assessment studies. In the past scientists worked with a staticconcept of "optimum biomass" obtained under equilibrium (time-steadiness) conditions. Theinappropriateness of such a concept has become apparent from its failure in practical situationsand from a growing number of simulation studies. Recent developments have moved away fromthese earlier approaches in at least two ways. First, there is the use of dynamic (as opposedto equilibrium) models and the view of optimum biomass as a "moving target" subjected touncontrollable external factors and affected by the history of exploitation; in fact, optimalityitself is being questioned. Second, there is an increasing emphasis on statistical models todescribe the structure and dynamics of uncertainty.Management strategies need to reflect these changes. The historic (and still widely used)concept of "Maximum Sustainable Yield" (MSY), defined as "the average yield which would sus-tain the resource at its optimum equilibrium biomass", is outdated. MSY has been replaced byconcepts such as: (i) optimum fishing effort maximizing future average discounted catch (Lud-wig and Walters [38]) or (ii) total allowable catch quotas (TAC's) which — subjected to someconstraints — result in (estimated) smallest risk of biomass depletion (Fournier and Warburton[17]). Still more elaborate strategies using stochastic dynamic programming (Walters [68]) arepossible. Such strategies explicitly acknowledge that an action today (by affecting the dynamicsinto the future) will affect the state of the stock tomorrow. Uncertainty reduction (learning)Chapter 3. Models for Biomass Dynamics^ 45as well as economic gains can be combined to determine an acceptable compromise. Finally, inorder to disclose confounding effects between environmental factors and those related to fishingand stock size, management actions can be guided by general principles of experimental design(Walters and Collie [70]).The usefulness of the Bayesian estimation procedure and its link to management decisions,was described in Chapter 1. I will not further elaborate on the topic (although I shall come backto it in the concluding chapter), but turn to other aspects relevant for stock assessment: (1)biomass as an aggregate measurement of the stock, (2) data types and acquisition, (3) modelsand estimation of parameters.BiomassThe appeal of stock description in terms of its biomass is simplicity. But, as for all aggregatemeasurements, there is a price to pay: a loss of information regarding the internal structureof the stock. A large number of young and small fish can have the same biomass as a smallernumber of older large fish. From a biological standpoint they can be very different in behaviour,food requirement, reproductive potential, and mortality rate. In commercial terms they mightalso represent distinct markets, prices, and fishing gear requirement (hence, costs). Similarcomments apply to the spatial and/or temporal aggregation of local biomass measures.DataI shall give a short account of the type of data commonly available for fishery assessment, itssources and potential weaknesses. For a broad description of this topic including survey design,see Hilborn and Walters [25].The data in fisheries have three sources: commercial catches, surveys and tagging. Datafrom the commercial fleet form the primary source of information about the exploitation historyof the stock. Total catches (often better defined as "landings", because they ignore onboarddischarge of economically unattractive or illegal sizes and species) together with fishing effortChapter 3. Models for Biomass Dynamics^ 46and location are collected. Port samples of fish to determine age, sex, length- and weight-distribution are usually collected as well. The ratio of catch per unit of fishing effort (CPUE),is used as an index of abundance and assumed to be proportional to biomass. Such data aretypically observational in the sense that there is no control over where and how they are collectedat sea. The fisher will go where the activity is economically attractive, be it for high abundanceof fish, convenience of location, or any other reason. Assumptions of random sampling do notapply. This fact causes enormous difficulties for reliable estimation of abundance (via CPUE)and represents a serious obstacle to stock assessment.To avoid the problems associated with commercial data, management agencies have been us-ing research surveys to obtain more accurate estimates of abundance and fishing effort. Surveysare comparatively expensive and usually gather less data than would be required for acceptablelevels of confidence in estimates (in the context of classical statistics). But, surveys are valuableas a supplement to commercial data.Another possibility for abundance estimation has been tagging. Because of high cost, dif-ferential fish behaviour, and difficulties with tag recovery, its performance for abundance es-timation has been discouraging (although it is a helpful tool in the study of movement andmigration patterns).ModelsFishery data are usually compiled annually so that a discrete modelling of time is appropri-ate. Therefore the models will be presented in the form of difference (rather than differential)equations.The levels of confidence associated with classical estimates of parameters in usually non-linear models (and non-linear functions of these parameters) have been calculated in differentways: (i) variations on traditional linear and non-linear regression methods and approximations;(ii) numerical methods (jackknife, bootstrap). Alternatively, the Bayesian approach taken hereprovides estimates in the form of posterior distributions; the most complete description ofChapter 3. Models for Biomass Dynamics^ 47uncertainties.I shall present next a detailed description of models and procedures for the practical imple-mentation of Bayesian analysis in fisheries.3.2 Model 1In this section I shall present models for stock dynamics which assume only system noise. I callthem Model 1. The next section considers models with observation error and system noise; Ishall call those Model 2.Model 1 is important because it allows us to derive some useful results which build the linkbetween adaptive importance sampling (Chapter 2) and biomass dynamic models. Model 2takes in the time-series fitting procedure of Pella and Tomlinson [48] — which has proven quitesuccessful for conventional parameter estimation in biomass dynamic models — to calculatethe likelihood.The biomass at time t is denoted by B. After harvest, the surviving biomass is St. Themodel describing biomass dynamics is formulated asBt = St-iF(0,St_i)et^ (3.15)where 9 = , ,O(P))' E RP is the unknown p-dimensional parameter of interest and {vt}a sequence of independent random variables with mean 0 and constant (unknown) variance V.Management policies are often given as functions H(9) where H : is known. Predictioninto the future depend on the knowledge of 9 as well.When such models are used to describe stock dynamics they are called surplus-productionmodels (e.g. Ludwig and Walters [37][38]). When used to describe the relation between thenumber of spawners (Si) and the number of returning recruits k years later (Bj = Rj+k), theyare denoted stock-recruitment models. But for the purpose of this section such a distinction isunnecessary.Chapter 3. Models for Biomass Dynamics^ 48I shall consider two specific forms for F(0, S). The Deriso-Schnute family of models specifies^F(0, S) = a(1 - b7S)lh^ (3.16)with a > 0 and b > 0. Appropriate choice of 7 allows formulation of a variety of classicalfishery models. For instance, traditional models due to Schaeffer, Beverton-Holt, and Rickercorrespond to 7 = 1, 7 = - 1 and -y -> 0, respectively.An alternative formulation uses the generalized Ricker models of Ludwig and Walters [38].It specifiesF(0, S) = ae-bs° (3.17)Both groups, (3.16) and (3.17), overlap at the standard Ricker model, for which the limit as-y -> 0 is taken in the former and (5= 1 is assumed in the latter. In the models described below,I always consider specific sub-families by fixing in advance the parameter 7 in case of model(3.16) or /5 in case of (3.17). According to Ludwig and Walters [38], an attempt to include theseas unknown parameters would produce a worse result than assuming a plausible value instead.The components of 0 need to be defined over the real line. Therefore, I shall define 0as the appropriate transformation of parameters: for a parameter 0 < a < oo I use ln a; if0 < a < 1 I use the logit transformation ln(a/(1 - a)), etc. Hence, in the above models I define0 = (in a, in b)'.Biomass and catch are strictly positive quantities and because of this I have modelledrandomness as a multiplicative (non-negative) quantity (e"). For mathematical tractabilityit is convenient to divide both sides of (3.15) by St_i and take logarithms. The followingprobabilistic model for Yt = ln(Bt/ St_i) results:P(rt I 9, V) - dp(pt(0), V) (3.18)whereMO) = ln F(0, St--i)and dp(m, C) is some p-dimensional distribution with mean m and covariance matrix C. Thevalue St_i is assumed known at time of predicting Yt; hence, pt(0) is also available.Chapter 3. Models for Biomass Dynamics^ 49Prior uncertainties about V and 9 are assumed to be independent and given by ir(0, V) =r(0)r(V).For a given data set Y(r)^, , Yr} I will be interested in features of the posteriordistribution p(0 I Y 01), and sometimes in policy variables H(9) or utilities U(0, a). 111(9, V) =Po(Y(7) 0,V) is the likelihood for (0,V) then, given the assumption of prior independence ofand V, I obtain 1(9) = Po(Y(T) I 0) by marginalization with respect to V. I.e:1(0) = jpo(Y(7)1 0, v)r(v)dv1(9) = Elor(1T)[1(0,V)]I shall follow the notational convention of Berger [2], and use superscripts on expectationsE to indicate the random quantity (or its density) over which the expectation is to be taken.ir(V)Subscripts on E will denote fixed parameter values. For example, Ee^means the expectationwith respect to the prior for V, conditional on a fixed value O. Superscript and/or subscriptwill be omitted whenever the interpretation for E is obvious.According to Bayes Theorem (section 1.3), the posterior for 0 is obtained asPO I 17(7)) a r(0)4(v)[1(9, V)]^ (3.19)To use (3.19) in the context of adaptive importance sampling (AIS), we need to evaluateits right-hand side (up to a normalizing constant) for any given O. Technical results given asLemmas in the next subsection will enable the practical implementation in Chapter 4, but canbe omitted from a first reading without loss of continuity.3.2.1 Two LemmasThe results to be given in the two Lemmas of this subsection will be used throughout theapplications of Chapter 4.The first and most important of these Lemmas assumes a Gaussian model in (3.18) togetherwith an Inverse Gamma prior r(V). The reason for this combination is the conjugate propertyChapter 3. Models for Biomass Dynamics ^ 50of distributions. As was explained in Section 1.3, such property is a helpful tool to simplifycalculations in general, and will be used to aid the calculation of 1(0) in (3.19).Lemma 3.2.1 Let d(.,.) in (3.18) be Gaussian and r(V) /G(ctoo3o), an inverse gammadistribution. The following special cases of (3.19) are obtained given two different assumptionsregarding the distribution of (Y (r) I 0,V):(a) If (Y3 0,V), j = 1,•• •,7" are independent, thenP(9 j 17 (7)) 0( 7(9)(13T(9))ar)where ct, = co r/2 and^0„- ( 0) =^20owith Z(0) =^-(b) If (Y:i I 9,V) N((9), V) for any fixed 9 and (V I Y(j -1),0) IG(aj_1,13J-1(0)), thenp(0 I Y (r))^r^(13-7(9))c'where ai = a2_1+ 1/2 and20i_1(0)0.7(9) = 2 +^- Pi (9))2Proof:• In (a), notice that1(9, V) = H p( Y2 I 0, v) v -T/ 2 e - Z(19)/ 2 Vj=1and hence^F701[0,101 OC ty-(0+1-12+1) exp^(Z2(:) olov)) dvE7(11[49, V)] CX (0T(0))ar pe(v)dvwhere pe(V)^[3,-(0)) so that the last integral equals 1.2 + poz(e)Chapter 3. Models for Biomass Dynamics^ 51• For (b) recall that r(V) = r(V I 0) so that^Eir(v)[/(0, V)] =^Po(Y(T) I v,0)7r(v I 0)dv= po(Y(r),v I 0)dv= Po(Y(T) I 9) = 460.The pair of distributions assumed in (b) form a conjugate family, such that(V I Y(j), 0)^/3(9))and from the Gaussian observation model it follows that^(yi Y(j - 1), 9) T^(ai_ifii-1(9)) 1)where T(k,m,C) denotes a Student distribution with k degrees of freedom, mean m andscale C.The expectation in (3.19) is the marginal likelihood 1(9) = po(Y(r) I 0). Because ofthe time-series nature in the observations, assume yi dependent on past data Y(j - 1).According to the multiplication rule, 1(9) can be written as the product1(0) = H po(Yi I Y(j - 1), 0)From the assumptions it follows that each term in the product is given byPo(Yi I Y(j - 1),9) a (#i_1(0))-112 (1 + i3j-1(19)(17-i iii(9))2)2(0i-i(0))-(a,-1-1/2)2 +^(0)(Y - pi(0))2213i_1(0) (0.i(0))°' oc(/3j-1 (9))-1which completes the proof.Chapter 3. Models for Biomass Dynamics ^ 52What distinguishes assumptions (a) and (b), is the use of sequential learning about V in thelatter. At each time t an up-to-date estimate of V, given the currently available informationY(t — 1), is being used. I expected improved performance under assumption (b). However, forthe cases I have analyzed the difference was unimportant in practice.The second Lemma is useful in cases where some (posterior) estimation of the nuisanceparameter V is required, and the assumptions of Lemma 3.2.1 apply. It gives explicit formulasfor first and second central moments of the posterior (V I Y(r)); a finite mixture of Inverse-Gamma distributions. These moments are expectations over p(0 I Y(r)), which can be evaluatedeasily by Monte Carlo approximation, using the final sets 0 and Si produced by adaptiveimportance sampling.Lemma 3.2.2 Given the assumptions of Lemma 3.2.1, the first and second central momentsof (V I Y(r)) are:1 E[V lif(-r)] =^EP—^(91Y(T))[0;10] for a,- > 1a, 11 E[V2 I^ Ep(elY(T))[13,720] for a, > 2 .17(7)1 = (a,._ 1)(a,-- 2)Proof: These expectations are a direct consequence of the following facts:1. Under assumption of Lemma 3.2.1(a), and any fixed 0, it follows thatp(V I Y (r), 0) a r (V)1(0 , V).In the proof of Lemma 3.2.1, I showed that this is proportional to an Inverse Gammadistribution with parameters a„ and 0,(0).In the case of Lemma 3.2.1(b) and any fixed 0, the conjugate property of distributionsimplies that(V I Y (r), 0) ,-, IG(ar, OT(9)).Notice that, while a,- is the same for cases (a) and (b), the values of 0,(0) differ.Chapter 3. Models for Biomass Dynamics^ 532. Using properties of conditional expectation, the two moments of interest can be rewrittenas follows:E[V I Y (r)] E2[E1[V Y (r), 011E[V2 Y (r)] E2 [V ari[V I Y (r), 0] -I- ERV I Y (r), 0]]where E1 and V ari are expectation and variance taken with respect to p(V I Y (r), 0) andE2 is expectation with respect to p(0 I Y(T)).3. Finally, using the fact that for X IG(a,b) the mean is E[X] = 11(b(a - 1)) if a > 1and the variance V ar[X] 11(b2(a - 1)2(a - 2)) if a> 2, we obtain the result.3.2.2 Catch and Effort DataModel (3.18) implicitly assumes direct observation (with negligible observation errors) of the set{(St, Bt+i); t = 1, • , r} . The stochastic component modelled by v, usually denoted "systemnoise" or "process error", reflects noise in the dynamic description of the system. Such modelapplies, for instance, in studies of salmon populations to describe relations between spawningstock St_i and the returning recruits Bt = Rt+k after k years (Stock-Recruitment Models),assuming that good estimates of both S and B can be obtained. For most practical cases,however, neither S nor B are available directly.It is common to have a data history of total catches Ct and the corresponding fishing effortsE. These data need to be translated into abundance estimates for B and S in order to useModel 1. A widely used model in fisheries defines this relation as follows:Ct = Bt(1 - e-gEt)^ (3.20)St = Bt - Ct (3.21)The term ht = 1- e-gEt in (3.20) denotes the harvest rate in year t as a function of fishingeffort; the constant q > 0, the catchability coefficient, specifying the efficiency of each unit ofeffort. In (3.21) no mortality other than fishing is assumed. If the rate of survival to naturalcauses ( is to be included, this expression can be replaced by St = BtCht •Chapter 3. Models for Biomass Dynamics ^ 54From (3.20) and (3.21) the values Yt and St appearing in (3.18) are functions ofq, Ct, Ct+i, Et, Et+i; that is:Yt = y(q, Ct, Ct+i, Et, Et+i)St = s(q, Ct, Et)where these functions are as follows:Yt in (C+1 (1 — exp (— q-Ei))) qEt(1 — exp (—qEt-F1))CtSt = Ct exP "Et)— exp (—qEt)J(3.22)(3.23)Notice that these calculations require specification of q which can be incorporated into 0.The logarithm of q is the appropriate transformation and I define 9 = (in a, in b, in q)'. Givena set of catch and effort data {(Ci,Ei); j = 1, •• • , r 1) and a fixed value for the parameter0, the corresponding set {(Yi, Si); j = 1,•• • , r} can be calculated using (3.22) and (3.23). Theusual calculations of the posterior p(0 Y(r)) can now be performed. But, notice that eachgiven q will result in a different sequence of Y and S values. Also, the likelihood 1(0) is to beinterpreted as the evidence for 9 given catch and effort data.Observation ErrorsThe model so far has not considered any error in the observation of catch and/or effort. From thediscussion in Section 3.1, it should be clear that such an assumption is inadequate. To focus mydiscussion, I shall assume accurate observation of catches Ct but substantial observation errorsin efforts E. Such an assumption often agrees with practice if one considers the formidabledifficulties in standardizing measurements of effort over an entire fleet of differing vessels, gears,and skills. To examine the effects of these errors, assume Et to be unknown and that Et" isobserved instead; the difference (Ef — Et) being the observation error. Let Yt° and Sf be thealternatives to Yt and St in (3.22) and (3.23) when efforts E are replaced by E°. To keep theargument simple, assume that the observation errors are adequately modelled byYt° = Yt + xi,tChapter 3. Models for Biomass Dynamics^ 55Sf = St + x2,twhere x.,t are random quantities induced by errors in E°. It is easy to see that (3.15) can berewritten in terms of observed values asYt° = in F(0, Sf_1 - x24-1) vtParticularly, model (3.17) with 5 = 1 (the standard Ricker model) results inYt° = a - bSt° 1+ vt*where viK = vt^bx2,t_1. The parameter V in (3.18) is now the variance of the randomvariables v; (rather than vt), which include process and observation errors. To distinguishthe combined variance from the variance associated to (3.15), I denote the latter by Vi,: the"process" error.Such a (convenient) linear arrangement of random terms is not likely to occur in more real-istic formulations. The non-linear nature of most models make this relation quite involved andanalytical analysis impractical. The impact of "errors-in-variables" on modelling and manage-ment has been recognized for quite some time (Walters and Ludwig [72]) and has caused a shiftin recent developments towards the explicit modelling of statistical components (Schnute [58]).But the example demonstrates that V > Vp; the overall variance is more than just process error.How much more will depend on the magnitude of errors in observations and the particular formof the model. If an estimate of Vp is needed — maybe to determine some optimal policy —then the careless use of an estimate for V instead can result in considerable bias in the finaloutcome. How to overcome such difficulty ? Ludwig and Walters [38], in a similar context,side-step this problem by defining an additional parameter A = Vp/V, the ratio between thesetwo variancesl. The case with no errors in observation would correspond to A = 1, while A = 0assumes no process error. According to their performance evaluation, attempts to estimatethis parameter together with all the others gave poorer results than to simply fix it at someintermediate value, say A = 0.5.'They actually use A (V — VA/VChapter 3. Models for Biomass Dynamics^ 56Utility Function and Bayesian DecisionThe central interest of any practical application is not 9 itself. In the classical approach somefunction H(9) is usually to be estimated. A pragmatic approach might seek an answer tothe question: "What effort level should be recommended for future years ?" Bayesian decisiontheory (Section 1.3) can be used in connection with the posterior for 9 and some utility functionto provide an answer. I shall propose an utility function and describe how such analysis can beperformed.Given a parameter value 03 E 0 and an effort ei E A, where 0 is the set of parameter valuesand A the set of possible management actions, I define U(9, ei) as the utility associatedwith such a pair. Further p(0 I Data) denotes the posterior distribution for the parameter 0andU(i) EkelData) [u(., ei)]the expected utility for a fixed effort ei. The Bayes rule or optimal action will be the effort ei*which maximizes the expected utility; i.e.:U(i*)^U(i)A Monte Carlo approximation for U(i) is readily available (recall Section 2.2) and calculatedasU(i)In defining an utility function, it seems sensible to reach two goals simultaneously:1. To maximize (discounted) future catches.2. The avoid dangerously low surviving spawning stocks St.For a given pair (03, ei) and a starting surviving biomass So, — which is available from thedata, and defined as "the value of S at the final year of observations" — the fitting model F(-)is used to predict the average annual catches.6 over a T years period into the future, given thatChapter 3. Models for Biomass Dynamics ^ 57a constant effort E = ei is used for the whole time. For each time t = 1, • • • , r the calculationsgo as follows:Bt = F(S-1)et = Bt(1 — e—qE)St = Bt - etThe Cumulative Discounted Catch(CC) is calculated asCC(j, i) = E 4513vett=1where (Spy is the discounting factor.Further I determine the smallest value of S over the same period as beingSmin = min St1<t<Tand fix a threshold, say SLIMIT, so that smaller spawning stocks are considered to be anunacceptable risk. Such limit might be associated with a perceived danger of stock collapse ifthe stock is reduced further.Finally, I define the utility function asCC(j, i) if Smin > SLIMIT=0^if Smin < SLIMITIn words, the utility function is given by the total discounted predicted catch if the choseneffort is fixed over a T year period, if and only if during these years the spawning stock neverfalls bellow the given threshold `SLIMIT'. If the latter occurs, the utility is set to zero.This utility function is simplistic in two ways: (1) it assumes a linear (i.e., risk-neutral)relation between CC and the utility derived from it; and (2) it defines the utility to be zerowhen Smin falls below the threshold, instead of using some generic penalty. For a particularfisheries, further refinements might be advisable (see the case-study for Orange Roughy). But,for the present purpose, this straightforward definition is adequate since it embraces threecharacteristics which, I believe, should be present in any such function. Namely,Chapter 3. Models for Biomass Dynamics^ 581. A range of implementable effort levels. If too low, it might kill the fishery; if too high, itwould bring overcapitalization.2. The desire of the industry to maximize catches in order to supply the increasing marketdemand.3. The intent to avoid stock collapse, manifested by the managing agency and the generalpublic and an industry which wants to utilize the resource in the future.3.3 Model 2For this second formulation denoted Model 2, I shall propose an estimation procedure thatbuilds on the concept of observation error/time-series (OETS) fitting, introduced by Pella andTomlinson [48]. Their central idea (for conventional parameter estimation) is, to fix a parametervalue 60 E 0 (possibly including the biomass at time t = 1) and then use some model to predictthe whole time-series. A best 0 is selected to minimize some goodness-of-fit criteria SS(0).Particularly, Pella and Tomlinson usedSS(0) = E(Ct — Ct)2t=iwhere Ct denote observed catches and Ot the corresponding prediction from the model.Subsequent developments considered more elaborate delay-difference models, as well as moresophisticated goodness-of-fit criteria (Deriso [12]; Schnute [56][57]; Fournier and Doonan [16];Ludwig et al.[39]; Fournier and Warburton [17]). In the next section I shall describe delay-difference models in more detail. Here, I focus attention on the estimation procedures used bythose authors, and suggest ways of incorporating these ideas into the Bayesian framework. Thefunction SS(0), in its most general form, is given asT^1/S S (9) = A E iv? +(1 - A) E + E ri (xi — 13)2t=i^t=i^j=iwhere A denotes the "split" between process noise and observation error as described for Model1 (previous section), with {Wt} and {vt} the corresponding sets of random perturbations. TheChapter 3. Models for Biomass Dynamics^ 59values rj are pre-specified penalties associated to the deviation between model predictionsand fixed (hopefully true) values xi. A large penalty associated to a specific j = jo say, indicatesa constraint imposed upon 0, to the subset for which ijo xio.The method is clearly attractive given its great flexibility and conceptual simplicity, althoughit is at least as "subjective" as the Bayesian approach. All it requires is an efficient algorithmfor minimization of SS(0) over the set 0. Difficulties are associated with the estimation ofstandard errors of such an estimate. Extensive simulation using bootstrap methods (Efron[14][15]) may be necessary. Therefore the Bayesian approach is appealing for at least threereasons:1. By introducing a prior distribution r(0), it avoids the necessity to simulate possiblyhundreds of replicates to determine variability in the estimates. The effect of alternativepriors (and/or likelihoods) can be examined by the more efficient procedure mentionedat the end of Section 2.22. It does not require the maximization2 of SS(0). Such maximization forces choice of oneoptimal parameter 0*• This ignores the fact that often there are many choices of 9 whichgive a good fit to the data, but would demand different management actions.3. For more elaborate delay-difference models, 9 will be of high dimension. Furthermore,the set {xj; j = 1, • • , v} in SS(0) are estimates from an independent source (outside thedata set), and typically have uncertainties associated with them. Non-informative fisherydata, allow for estimation of only two or three parameters in 0, with remaining parametersincluded as additional xj's. The Bayesian framework allows (at least in principle) for acoherent treatment of all uncertainties about 9 and x as probabilities. For convenience,the latter can be incorporated into O.To formalize the proposed Bayesian approach, I replace (3.15) in Model 1, by the following2Although the maximization of an utility function will be necessary at the later stage of decision making.Chapter 3. Models for Biomass Dynamics 60pair of equationsBt = G(0, Bt_i)e" (3.24)It = F(0, Bt)evt (3.25)where (3.24) models the dynamics for biomass B, and (3.25) the relation between biomass andan observed abundance index I. The process noise {wt; t = 1, • - • , r} and the observation errorlvt; t = I,. • -,7-1 are assumed to be independent identically distributed random sequences withmean 0 and constant variances W and V, respectively. The random variables vt and wt are alsoassumed to be mutually independent.Let Yt = ln It and pt(0) = In F(0, BO. I assume the observation modelP( Yt 1 9, V) rs, d(p1(0), V)^ (3.26)where d(m, C) denotes a one-dimensional symmetric distribution with mode m and varianceC. I will use Gaussian distributions in the examples of the next chapter.In analogy to the OETS fitting procedure, I define the time-series of predicted biomasses asfollows:B = {ErEBt I f3t-i]f3i fixedif t > 2possibly El E 0(3.27)and further define the likelihood for 0 to be given by1(9) = P al/1, ' ' • 'Yr) I (121(9), ' • • , iLT(0)))^ (3.28)where it(G) = ln F(0, 13').I propose the following procedure to obtain the posterior p(0 1 Y(T)):1. Fix a value of O. (recall from Chapter 2, that a sample of O's is available from the prior70)).2. Use the models (3.24) and (3.25) together with the definition (3.27) to obtain the sequenceof predictions Iiit(0); t = 1, . • • , 71Chapter 3. Models for Biomass Dynamics^ 613. Use (3.28) to calculate the likelihood 460 after appropriate marginalization with respectto V.4. Obtain the posterior p(0 I Y(r)) oc 48)749)Two comments are appropriate: (i) Lemmas 3.2.1 and 3.2.2 can be used directly afterreplacing pe(8) by the corresponding ilt(0); (ii) the time-series {Bt} and {/t} are considered ofthe same length in the above formulation, but the abundance data need to be available onlyover a subset of time points. In fact, the Orange Roughy example of Chapter 4 considers sucha case. There, catch data (useful in (3.24)) are available for a certain time sequence of lengthT say, while abundance estimates were obtained only for the s final years: T - 3 +1,• • • , T.The side-stepping of process noise by way of definitions (3.27) and (3.28) is an ad hocprocedure obtained by analogy to the OETS approach. I will discuss next, "why" (or at least"when") definitions (3.27) and (3.28) are justified.The handling of process noiseSuppose that in (3.24) I accept model G(.) as a fair description of the underlying dynamicson average, and that model imperfections and remaining sources of variability are adequatelycaptured by the random noise w. It is known from basic statistics that a* = Etr[Bt I f3t--i]minimizesEY [(Be - a)2 1 Et-ii •But a* is the definition of lit given in (3.27). In Bayesian decision theory (recall Section 1.3),the optimal action (Bayes rule) minimizes the posterior expectation of a given loss functionL(0, a). In particular, L(Bt, a) = (Bt - a)2 is the "squared-error loss"; and the definition of Etimitates the Bayes rule (although the expectation is not taken over the posterior distribution ofBe). Based on the rationale for choosing a Bayes rule, I could, in principle, use any alternative"loss-function" and modify definition (3.27) accordingly without affecting the remaining stepsof the overall procedure.Chapter 3. Models for Biomass Dynamics^ 62On the other hand, definition (3.28) is not so clear. For simplicity, assume a single knownobservation yt so that the subscript t can be dropped. Then, according to (3.26), the likelihoodfor 9 is I = p(y j p(0)); while in (3.28) I use I = p(y j ft(0)) instead. I am changing the mode ofthe symmetric distribution from the unknown true p(0) to its estimate f(0); the latter being afixed value given 0, while the former is random. Definition (3.28) might be adequate on averageif, for a given 0, the expectation of I (with respect to the random p) is approximately given byi; i.e:E;[1] IWhile unable to assess the reasonability of such an assumption in general, I shall considera particular case which is relevant for later applications. Assuming Gaussian distributions forw and v (with respective variances W and V), and fi = ln F satisfying ED.t — = 0 andE[A — Al2 = W, it is easy to show (using Taylor series expansion up to quadratic terms) thatE[l]^W1 +^1+ y — 11)2) (3.29)2V^V-17If the left-hand side of (3.29) equals 1 there is no bias; such is trivially true for W = 0.For a ratio smaller than 1, lis an overestimate of 4[1], while a value greater than 1 denotesunderestimation. Given W> 0, and defining the standardized deviation c = (y — /2)2[/V, thefollowing holds for (3.29):1. The smallest ratio of 1 (W/ 2V) > 1 is obtained for c = 0; the ratio increases withincreasing values of c.2. Since is a function of 0, (3.29) changes with 0 for fixed y. For those values of 9 causingc to be large, the underestimation in I is more pronounced.3. Larger values of V imply smaller values of c (hence, less bias). For V>> W the bias isnegligible.4. For fixed y and W, different values of 0 giving the same squared deviation c result inidentical bias.Chapter 3. Models for Biomass Dynamics^ 63The central message in (3.29) is that I consistently underestimates EY[d] and, as a functionof increasing values of c, the underestimation gets worse.A bias correction can be attempted if the relationis assumed, and estimates of A and V are available. Expression W/2V in (3.29) is then replacedbyA2(1 — A)and W is eliminated from the calculations.3.4 Delay-Difference ModelsHere is an overview of delay-difference (DD) models. Such models can be used for GO in(3.24). For a more detailed analysis, see Hilborn and Walters (Chapter 9) [25].The simple biomass dynamic models described by (3.16) and (3.17) cannot deal with time-delays and changes in population structure. The new class of DD models, introduced by Deriso[12], still describes the dynamics of the aggregate biomass but, by virtue of its construction, candeal with various time-delays. This because growth and recruitment are modelled explicitly.The aggregate biomass Bt at time t, is defined as the sum00Bt = E Na,tWaGk(3.30)where a > k are ages (with k > 0 denoting the age of recruitment to the fishery), Na,t thenumber of fish of age a at the starting of year t and wa the average body weight for age a.The possibility of modelling details of growth (w), recruitment (Nk.), and survival (Na,tNa+i,t+i) separately makes the model very flexible.In formal terms, the three basic assumptions concerning growth, recruitment and survivalto be used throughout are as follows:Chapter 3. Models for Biomass Dynamics ^ 64• Growth: A linear relation is assumed to describe average weight dynamics for all agesa > k, where k is a minimum age at which an individual is first recruited to the fishery.This relation requires two growth parameters a and p as follows.wa = a + Pwa-iThe weight wk is usually also fixed as an parameter.• Recruitment: Let St represent the spawning biomass in a given year t. Particularly, Iwill define St as " biomass which survives harvest in year t". The average recruitmentdynamics will be described by a deterministic relation of the formRt÷. = R{St -i}where RH is a given known function, u is a non-negative integer that describes a possiblelag between birth and recruitment, and Rti-u represents the average number of recruitsjoining the stock at time t u.• Survival: The total survival fraction in any year, st, is usually modelled as the productof a constant natural survival C and a time-varying survival to harvest (1 - ht), such thatSt = C(1- he).Given these basic assumptions, the following pair of dynamic equations for biomass Bt andnumber Nt can be derived:Bt+i = St (aNt PBt) wkRt+i^ (3.31)Nt+i = stNt Rt+i (3.32)Initial conditions need to be specified before the above recurrence equations can be used.The way in which these are established might change according to any particular application.I assume a start from equilibrium (i.e. dropping the t subscripts in (3.31) and (3.32) ) andChapter 3. Models for Biomass Dynamics^ 65negligible harvest prior to time t = 1 (i.e., h = 0). Such an assumption is reasonable if aninitial unfished population can be assumed. Rewrite (3.32) asNe = 141(1— ()^ (3.33)Replacing Ne in (3.31) and solving for Re results in an equilibrium recruitment Re given asBe(1 — (p)Re =^e (3.34)( icje-='Z + wk)The recruitment function I will use in the case-study of Chapter 4 is a one-parameter Rickercurve defined asSt-1St-1Rt+. = Rei,—e exp { b (1 —Be)}^ (3.35)Generalization of Deriso's model was proposed by Schnute [56]. Further improvements byFournier and Doonan [16] included the use of moments for weight and length as auxiliaryinformation. Incidently, they also considered a Bayesian structure allowing for the inclusion ofa prior distribution for 9. This information was then combined with the likelihood to determinethe posterior mode as an estimate for 0 (the generalized maximum likelihood estimate in Bayesiananalysis). They did not consider, however, the possibility of obtaining the posterior distribution.Alternatively Schnute [57] proposed a model for which he used size rather than age classes. Hefurther assumed "knife-edge" recruitment for a minimum size rather than a minimum age.Hence his largest departure from previous approaches, relates to the modelling of recruitment.An advantage of DD models over the simpler surplus-production family relates to the un-derlying parameters which, in the DD case, have a direct biological meaning. This is a desirablefeature for the Bayesian approach if elicitation of prior uncertainty is to be used. An disad-vantage with respect to simpler models is the high dimensionality of the parameter O. Thisdemands more computation and, because of uninformative data, often results in poorer perfor-mance (Ludwig and Walters [37]).Chapter 3. Models for Biomass Dynamics^ 663.5 SummaryIn this chapter I have outlined the types of data commonly used to assess the biomass offish stocks subjected to exploitation. They are: (I) catch and fishing effort statistics from thecommercial fleet, (2) abundance estimates obtained through surveys undertaken by managementagencies. While the former is not a random sample, the weakness of the latter is its high cost.The practical solution usually is a compromise between both types of data.Posterior distributions for the parameter 0 were obtained for surplus-production modelsin Section 3.2 (Model I) assuming the time-series of biomass Bt and surviving biomass St aredirectly observed, or estimated from catch and effort data. To deal with time-series of catch andabundance index, or more elaborate delay-difference models, I have proposed an adaptation ofthe observation error/time -series fitting procedure of Pella and Tomlinson [48]. This approachwas described in Section 3.3 (Model 2). Finally, Section 3.4 gives an account of general featuresof delay- difference models.The implications of process noise and observation errors in measurements, are also examinedthroughout. Corrections are possible whenever their relative magnitudes can be assessed.Chapter 4Applications4.1 IntroductionThis chapter is dedicated to specific applications of the adaptive importance sampling procedure(Section 2.3) to models for biomass dynamics (Sections 3.2 and 3.3). I shall consider 5 differentcase-studies, one per section. Each section was written to be self-contained, so that the readercan select the case-study of his or her particular interest. But I emphasize different details ofthe analysis in different case-studies. This avoids excessive repetition for the reader of all five.The notation developed in the two previous chapters is used throughout. In some cases I usereal data, while others rely on simulations.Sections 4.2 and 4.3 are devoted to "Model 1" of Chapter 3. In Section 4.2 I constructa posterior distribution for the parameters in models for stock and recruitment, using datafor Skeena river sockeye salmon. Section 4.3 refers to surplus-production models. There I usesimulated data and compare my results to those obtained by Ludwig and Walters [38], showingthat some of their main conclusions can also be verified here. Since my results are posteriordistributions, I can perform a Bayesian decision analysis. I suggest a sensible utility functionand show that the best decision is robust with respect to alternative models and priors.Sections 4.4, 4.5, and 4.6 use the fitting procedure outlined in Chapter 3 as "Model 2":the Bayesian alternative to the observation error/time-series fitting method. Section 4.4 is acase study of Orange Roughy, a valuable fishery in New Zealand which has been the subject ofrecent controversy. I show that the determining factor in the dispute is the choice of an utilityfunction. Different prior distributions and technical details about models are of little relevance.In Sections 4.5 and 4.6 I examine situations with multi-modal posterior distributions. In the67Chapter 4. Applications^ 68former I again use simulated data, while the latter uses a delay-difference model to analyze datafor Pacific cod. Finally, Section 4.7 summarizes the results of the chapter.4.2 Stock and Recruitment: sockeye salmonI shall consider a Bayesian alternative to the estimation of parameters in stock and recruitmentmodels when applied to data for Skeena river sockeye salmon (Oncorhynchus nerka). I use thedata presented in Hilborn and Walters [25] (Fig7.1, p.243) and shall conduct my analysis toallow direct comparisons with some of the results presented in their Table 7.4 (p.279).Pacific salmon have played an important role in the development of stock and recruitmentmodels. This can be explained in part by their particular life history. The fish hatches in freshwater and later migrates to the ocean where it lives most of its life. At the time of sexualmaturity (four years later for sockeye), the fish returns to the small watershed where it hadhatched. It then reproduces and dies. The size of the spawning stock can be measured in freshwater. Ageing techniques are used to determine, for each spawning stock, the number of adultsthat recruit to the fishery. This gives the pair of needed observations: spawning stock and thecorresponding recruits.The data consist of a sequence {(Ri, Si); j = 1, • •, r} of observed data, where Si is "thenumber of spawners in year-class j" and Ri "the number of recruits produced by Si". Hence,in (3.15), I define Bi+i = Rj and consider two alternative models to fit the data:• R: Ricker model (Expression (3.17) with (5 = 1)F(0, S) ae-bs• BH: Beverton-Holt model (Expression (3.16) with 7 = -1)aF(0, S) - ^1 bSwhere 9 = (in a, In b)'.R^BIT1. Slope at Origin2. Sma,x3. Rmax4. Smsy5. umsya1-6go'11+2(.5 — .071na)ln a(.5 — .071n a)Chapter 4. Applications^ 69Table 4.4: Some biologically meaningful attributes of the parameters in the Ricker and Bever-ton-Holt models.For the deterministic case (i.e.: V = 0 in (3.15)), and under equilibrium conditions, theparameter 0 relates to quantities of biological meaning. Table 4.4 (adapted from Hilborn andWalters) gives a summary of those quantities. Smax is the spawning stock which maximizes therecruitment; this maximum recruitment is given by Rmax. Notice the basic difference betweenthe models. While R predicts the existence of an intermediate spawning stock able to producemaximum recruitment, in BH the recruitment tends towards an asymptote, as S increases.Smsy is the optimum (spawning) stock size for "Maximum Sustainable Yield" (MSY1), withumsy denoting the corresponding optimal harvest rate.I used lines 1, 2 and 3 of Table 4.4 and the data-plot of R versus S displayed in Figure 4.4 toset the range for the parameters of the prior r(0). I started by determining five quantiles of an"idealized" (marginal) log-normal distribution for each parameter. If the fitting of a log-normaldistribution revealed inconsistency among these quantiles, I would try a new set of values.Table 4.5 gives the quantiles which were finally chosen. For instance, the value bp (parameter bunder the Ricker model) used as benchmark the interval 380 < Smax < 1430, while for brillI assumed 100 < Rmax < 5000. The parameters of the corresponding Gaussian distributionfor 0 (columns 'm' and 'C') were used as mean and variance of a heavy-tailed two-dimensionalStudent distribution with 9 degrees of freedom and diagonal covariance matrix.Prior uncertainties are smaller for model R as compared to BH. This is a direct consequenceof the larger difficulty in eliciting quantiles for bBll as compared to bR. The reason for such a'See Section 3.1 for its definition.Chapter 4. Applications^ 70200^400^600^800^1000^1200^1400SFigure 4.4: Stock and Recruitment data for Skeena river sockeye salmon (Reproduced fromHilborn and Walters [25]). The lines were fitted according to Ricker model (broken) and Bev-erton-Holt model (continuous) using means of posterior distribution.Table 4.5: Five quantiles used to determine prior distribution for parameters a and b (thesubscript according to the model: Ricker or Beverton-Holt). 'm' and 'C' are mean and varianceused in Student prior.Q u ant ile s 5 25 50 75 95 m Ca 1.0 1.5 2.0 2.7 4.0 0.70 0.18bR x 10-4 7 11 14 18 26 -6.65 0.15bim x 10-4 2 7 14 31 100 -6.55 1.37Chapter 4. Applications^ 71difficulty is the different biological meaning of the parameter (see Table 4.4). Prior assessmentof bBH needs the joint consideration of two uncertain quantities: Rmax and the slope at origina. For comparison, bit relates only to Smax.As a diffuse prior for V, I chose an inverse gamma distribution IG(ao = 2,00 = 1) whichhas mean=1, mode=0.33 and oo variance.The goal is to construct the posterior p(0 Y(T)), using the AIS approximation of Sec-tion 2.3. For convenience, I define the initial importance function go(S) to be the prior r(.).This results in initial weights wo j proportional to the marginal likelihood Po(Y (7) I 00 j). Aftersampling no = 1000 points from this prior, I perform two updating recursions using ni = 1500and n2 = n = 4000, respectively. At each loop the mixture model is clustered to a smallermixture of m = 500 before proceeding. The reported sample sizes were chosen empirically.Sample sizes that were too small caused instability in measurements of the relative entropies(2.4). Monte Carlo estimates for the mean and variance of 9 and functions H(9), were calcu-lated using the full sample of n = 4000 points. With respect to the assumptions described inLemma 3.2.1, I considered three different scenarios:• Ra Ricker model and assumptions of Lemma 3.2.1 (a).• Rb Ricker model and assumptions of Lemma 3.2.1 (b).• BHb Beverton-Holt model and assumptions of Lemma 3.2.1 (b).The values of "entropy" in Table 4.6 were calculated according to expression (2.4). Thismeasure is an indication of the goodness-of-fit between the current importance function andthe actual posterior densities at the given grid points; a value of 1 corresponds to a perfectfit. The observed entropies indicate the usefulness of the adaptive procedure in constructing agood importance function. In particular, the first recursion shows a considerable improvement,considering the relatively small sample sizes. After focusing the sampling effort on the regionof interest, the refinement of a second recursion takes the value of entropy even higher. Allfinal approximations have entropies above 0.99 (recall that 1.0 is a perfect fit). By comparison,Chapter 4. Applications^ 72Table 4.6: Results for AIS applied to the Skeena river sockeye salmon data under variousassumptions. The first number is the posterior mean; the posterior standard deviation is givenin brackets. Columns R* and BH* are the estimates of Hilborn and Walters [25].MODEL Ra Rb BHb R* BH*Entropy .466 .540 .635.955 .984 .984.995 .996 .994V I Y(r) .258(.056) .259(.056) .260(.056) .22 .23Smsy I Y(r) 718(148) 706(134) 1203(544) 851(246) 916(271)umsy I Y(r) .53(.04) .53(.04) .45(.04) .55(.05) .48(.06)P(Smsy > 800) .21(.007) .19(.006) .82(.006)P(Smsy > 1000) .05(.003) .03(.003) .03(.008)Table 4.7: Posterior mean and Covariance for different Models fitted to the Skeena river sockeyesalmon data.Model mo m1 Co,o Co,i C1,1Ra 1.29 -7.20 0.016 0.024 0.058Rh 1.29 -7.18 0.015 0.021 0.052BHb 1.20 -7.22 0.026 0.071 0.266a one-time sample of 6500 points using the initial importance function produced an entropy of0.60 in model Ra and less acceptable results for estimates. The same pattern was observed forother cases.Posterior mean and variance for 0 = (In a, in b)' are displayed in Table 4.7 and contour plotsof the posteriors for Rb and BHb are shown in Figure 4.5 (a) and (b), respectively. There ispositive correlation between in a and In b. In the case of model BHb, one can see that for largevalues of in a, the conditional distribution of in b is sharper (i.e. less variable) than for smallervalues of in a.Figure 4.5 (c) and (d), show approximations for posterior densities of Smsy and umsy. Theseposterior densities were obtained by fitting an ordinary (Gaussian) kernel estimator (Silverman[59]) to the 4000 observations of the appropriate function H(9i) (I used the 'density' functionin Splus). The strong agreement between the posteriors obtained for models Ra (full line)(a) (b)0.8^1.0^1.2^1.4^1.6^1.8In(a)(d)500^1000^1500S(msy)20-00Chapter 4. Applications^ 73Figure 4.5: (a) and (b): Contour plots of posterior distributions for Rb and BHb (Lines at .01,.05, .10, .25, .75, .90, .95 of the largest density). (c) and (d): Approximate posterior densitiesfor Smsy and umsy (dashed line: BHb; full and dotted lines: Ra and Rb).Chapter 4. Applications^ 74and Rb (broken line), indicate that there is no much to be gained by the more elaborateassumption (b) in Lemma 3.2.1 for this particular example. A comparison of the estimatesreported in Table 4.6 for both models (Ra and b) also support this conclusion. However,comparing models Rb and BHb, show marked differences for Smsy and umsy. Under BHbthe posterior for Smsy is quite flat as compared to Rb. The heavy upper-tail for Smsy in theformer is supported by the Monte Carlo estimate P(Smsy > 800) = 0.82 as compared to only0.19 for the latter. The posterior for umsy also suggests different actions under different modelassumptions. If Rb is chosen, an optimal harvest rate in the range (0.5, 0.6) seems adequate.Alternatively, for BHb, smaller values within (0.4, 0.5) are more adequate. But which one isthe better model ?The two lines fitted in Figure 4.4 use Monte Carlo estimates for the posterior means ofthe parameters a and b (the Bayes estimators under squared-error loss) for different models:full line for It and broken line for BHb. I cannot tell from these fits which model is better.The distinction between the models becomes important when predictions for high levels ofspawning stock (S> 1000) are made. Such model uncertainty can only be resolved by explicitexperimentation in management, allowing high spawning stocks (i.e. zero harvest rate !).The results obtained by Hilborn and Walters [25] are displayed for comparison, as columnsR* and BH* of Table 4.6. There are differences in the values for estimates of both mean andstandard error. The strongest disagreement is for Smsy under model BHb. But in generalterms the outcome of both methods agree.There is an advantage in describing uncertainties by posterior distributions rather thanpoint estimates when an action has to be taken. The next two sections, where the process ofdecision making under uncertainty is directly addressed, will illustrate this point.4.3 Surplus Production: Simulation ModelIn this section I shall use simulations to generate artificial data for catch and effort. These dataare fitted to a model of the form (3.15) according to the procedure described in Section 3.2. IChapter 4. Applications^ 75replicate some of the cases described in Ludwig and Walters [38], (denoted L&W throughoutthis section) and use their definition of "optimal effort" (Eopt), as the policy variable to beestimated from the data. In a second stage of the analysis, I shall replace this policy variable bythe effort EB (where B stands for `Bayes'), chosen to maximize some specified expected utility.L&W judge the appropriateness of a given model/estimation scheme by examining bias andvariability in the estimates for Eopt. In order to obtain a distribution of such estimates, theyreplicate a large set of Monte Carlo experiments for any given fixed simulation model. Whileinterested in the same kind of analysis, I will proceed differently to obtain the distribution ofthe estimates. Since a stochastic sample2 of values for 0 is available, the corresponding samplefor Eopt — a function of 0 — is available too. Therefore, I can approximate the (Bayesian)posterior distribution of Eopt directly, without starting the whole process from scratch.I shall illustrate the fitting procedure for catch and effort data described in Section 3.2 andcompare the outcome to results obtained by L&W. Given the artificial data generation, I havecontrol over the amount of process noise and observation error and can determine its influenceover the final outcome of the estimation. Also, the effect of different exploitation histories canbe analyzed by defining different effort sequences. In the data generation process I used threedifferent simulation models which I will now describe.Simulation ModelsThree different schemes were used to simulate the artificial data. In all cases I used the "Deriso-Schnute" model (3.16) with a = 1.65 and b = .2. The cases differ according to: (1) the modelto describe the biomass dynamic (choice of 7), (2) the catchability parameter q, (3) the startingstock size So, and (4) the global variance a2 = vp + V defined as the sum of process noise iipand observation error V. The values used are listed in Table 4.8. 'Case 3' differs from the othertwo, by having a much smaller overall variance a2 and a comparatively small catchability q.The parameter A = V7,/(Vp-1- V) is fixed at A = 0.5 in all cases; so that the global variability is2Recall that it is a sample drawn from the posterior distribution p(0 I Data).Chapter 4. Applications^ 76Table 4.8: Specification of the three simulation models used to generate catch and effort data.Case No. -y q So a21 -1 0.6 2 0.162 1 0.6 1.5 0.163 -0.5 0.31 1 0.05equally divided between Vp and V. The observation error is associated with effort measurementsonly. More specifically, for t = 1,. - • ,26 and a sequence of independent Gaussian randomvariables vt, with mean 0 and variance V, I assume thatEf = Ete'where Et is a specified effort sequence. The corresponding catch Cf is obtained from (3.20)when Ef is used as effort measurement. I further use expression (3.21) to get the surviving stockSt, an input required for the dynamic model (3.16). The random process noise wt, associatedto the biomass dynamics, is modelled as Gaussian with mean 0 and variance V.Fitting ModelsAfter having generated the data, I use another set of models to perform estimation:R1 The generalized Ricker model (3.17) with power 6. = 1.R2 Similar to above, with power b = 2.BH Model (3.16) with -y = —1, commonly known as the "Beverton-Holt" model.In all but one case, the "fitting model" is different from the underlying "real" model used togenerate the data. For the case in which both models are the same, the posterior distributiondescribes perceived uncertainties for the "true" parameter vector 0 (known from the simulationmodel). In the remaining situations such interpretation does not apply and I focus on estimatesfor the policy parameters. As in the approach taken by L&W, I am concerned with the esti-mation of an optimal effort (Eopt) and the corresponding optimal surviving biomass (Sot).Chapter 4. Applications^ 77The exact optimal values are known from the parameters of the simulation model generatingthe data. I will construct an approximate posterior distribution for such a policy variable byusing the AIS procedure of Section 2.3. Further, I will compare my results to those reportedby L&W, which were produced by an unrelated estimation scheme.The optimization criteria of L&W are designed to choose a constant effort to maximizethe expected present value of future harvest. Appendix B of their paper details the calculationof such an optimal effort Eopt and its associated stock size Sopt. While the details of thesecalculations are not of concern here, it is important to point out that Eopt is a function of00 = ln a, 92 = In q and the variance of the process noise Vp. Particularly, for the Beverton-Holtmodel an explicit solution is possible, and given byEopt(0, vp) -e-022 (Bo Vp — ln Spv) (4.36)where Spy is the discounting constant used in present value calculation (this value is fixed at6pv = 0.9 for all my calculations). From the above expression, it is clear that an overestimateof Vp causes overestimation of Zvi. This scenario is likely to occur whenever only process noiseis assumed. To control this feature, I specify A, which indicates the amount of global varianceto be attributed to process noise. I perform different fits using the assumptions of process noiseonly (A = 1), observation error only (A = 0) , and an intermediate value (A = .5).In order to verify the effect of different exploitation histories over the estimates, I usedtwo alternative sequences of effort Et displayed in Figure 4.6. Although the sequence `Eff 1'is expected to be more informative than the slowly increasing "one-way-trip" of `Eff 2', thelatter is closer to situations encountered in practice. Table 4.9 gives a summary of the differentscenarios which were analyzed. Rows 1 to 8 allow comparisons within the same simulationmodel, given in Table 4.8 as 'case 1'. To make comparisons among different models, one cantake for instance rows 5, 10 and 14 ( all are R2 ); or alternatively rows 7, 8, 12 and 15 ( all BH ).Figure 4.7 shows the simulated sequences of efforts (dotted line) and the corresponding catches(full line). In Figure 4.8 I shown the observed pairs (Se, Bt+i) for the different simulations. Thelines exhibit the true model — GO in (3.24) — used to generate the data.Chapter 4. Applications^ 78Eff 10^5^10^15^20^25timeEff 25^10^15^20timeFigure 4.6: The two effort sequences used in the simulation models.2020 10timeCase 310timeCase 2Chapter 4. Applications^ 79Case 1(1) Case 1(2)Figure 4.7: Time-series of observed efforts (dotted line) and the resulting simulated catches(full line), for different models.Case 1(2)Case 1(1)qChapter 4. Applications^ 802.0 2.5 3.0 3.5 4.0^2^3^4^5^6^7S(t)^ S(t)Case 2 Case 3q .---------▪ ........▪^..•• ft.---••• •1.0^1.5^2.0^2.5S(t) 1.0 1.5• •• ale• ...^••••2.0^2.5^3.0S(t)Figure 4.8: Spawning stock 'S(t)' and the corresponding biomass `13(t-F1)' for different simula-tion models. The line is the function GO from which the data were generated.Chapter 4. Applications^ 81Table 4.9: Specification of 16 models used to fit catch and effort data. 'SIM' indicates thesimulation model. 'FIT' gives the fitting model. `Eff' specifies assumed effort sequence. A isthe assumed choice for A. An omitted entrance implies the repetition from the line above.Row No. SIM FIT Eff A1 1 R1 2 12 - - 2 0.53 - - 1 0.54 - - 2 05 - R2 2 0.56 - - 1 0.57 - BH 2 18 - - 1 0.59 2 R1 1 0.510 - R2 - 0.511 - - - 012 - BH - 0.513 3 R1 2 0.514 - R2 - 0.515 - BH - 0.516 - - 0Chapter 4. Applications^ 82Given the data, the approximate posterior p(0 Y(25)) for the parameter vector 9 =(in a, In b, in q)' is obtained using the AIS procedure of Section 2.3 and the mixture model (2.3).I define the initial importance function go(.) to be the prior TO. This will allow for thecalculation of initial weights c4.70,3 proportional to the marginal likelihood p0(Y(25) I 00,j). Idraw an initial sample of size no = 1000 points from this prior, and perform two updatingrecursions, using respectively n1 = 2000 and n2 = n = 6000. At each stage, the mixture modelis clustered to a smaller mixture of m = 800, before proceeding with a new sample draw. Thereported sample sizes were chosen empirically. Too small sample sizes, caused instability inmeasurements of relative entropies (2.4). Monte Carlo estimates for the mean and variance ofand functions thereafter were calculated using the full sample of n = 6000 points. In all casesreported here, I used the assumption of Lemma 3.2.1(b).As prior 7r(0) for 9=(9i, 02, 93)'=(ln a, in b, in q)', I defined a three-dimensional Student dis-tribution with 9 degrees of freedom, mean vector m = (mi, m2, m3)' and diagonal covariancematrix C = diag(Ci, C2, C3)). To establish values for the moments m and C, I first determinedquantiles for some biologically meaningful variables. To illustrate the procedure, I explain thederivation of m2 and C2 for 92 = ln b in the standard Ricker model Rl. According to this model,the surviving (or spawning) biomass producing maximum output, is given by Smax = 1/b. Asquantiles 5, 25, 50, 75 and 95 for Smax, I used the values 1.1, 2.0, 3.1, 5.3, 10.0. These valueswere checked to approximately cohere with a log-normal distribution. Such a distribution wouldapproximately have mode 2, median 3, mean 4 and standard deviation 3. Satisfied with thesenumbers, and using the transformation 02 = —1n(Smax) at the given quantiles, I calculatedthe quantiles of the corresponding Gaussian distribution for 02. From each pair of quantiles Icalculated m2 and C2 of the Gaussian distribution; the average of these values were my chosenparameters: m2 = —1.18 and C2 = 0.46. For the generalized Ricker model with power b itholds that b = 1/(5Sfnax). If b(2) denotes the parameter b for model R2 (i.e., b = 2), then, bydefining 02(2) = in b(2), it is easy to see that02(2) = 202 — In 2Chapter 4. Applications^ 83I used this linear relation to modify mean and variance of the parameter 92 accordingly. Fornone of the other parameters was a transformation necessary when going from R1 to R2. Theremaining parameters were chosen using a similar procedure. The final values are:R1 : m = (0.57, —1.18, —1.18)' and C = diag(0.11, 0.46,0.46).R2 : m = (0.57, —3.05,-1.18)' and C = diag(0.11,1.84, 0.46).BH : m = (0.57, —2.14, —1.18)' and C = diag(0.11, 0.30, 0.46).Finally, as a prior for V, I defined an Inverse Gamma distribution with parameters ao = 2 and00 = 5. To simplify comparisons, I used the same priors in all simulations.Description of the ResultsThe entropies (2.4) of the different mixture models give a measure of the goodness-of-fit betweenthe current importance function and the actual posterior densities at the given grid points; avalue of 1 corresponding to a perfect fit. The observed entropies consistently approach 1 asthe mixture model is updated. A summary of the observed entropies is given in Figure 4.9. Itshows the substantial improvement at the first recursion, followed by the refinement of a laterstep.The following results are summarized in Table 4.10. They are Monte Carlo estimates ob-tained from the sample of 6000 observations taken from the final mixture model. The effect offixing A can be verified for models R1, R2 and BH, when comparing the set of runs {1,2,4},{10,11} and {15, 16}. Regardless of the model, there is an improvement in the estimates (i.e.reduction of bias and variance) as A decreases. This result implies that, for the situations stud-ied here, the best strategy is to assume all random disturbance as observation error. Similarfindings were reported in Ludwig at al. [39].The influence of different exploitation histories can be analyzed by comparing the outcomesfor models differing only in their effort sequences. The pairs {2,3} for R1, {5,6} for 112 and{7,8} for BH, provide the output for such comparisons. In all those fits, I fixed A arbitrarily atChapter 4. Applications^ 84g(0), g(1), g(2)Figure 4.9: Box-plots for "relative entropy" for the sixteen fits to simulated catch and effortdata.Chapter 4. Applications^ 85Table 4.10: Posterior expectation (standard error) for the biases in Sopt, the bias in Eopt, andthe global variance a2 = Vp + V.Row No. Sopt — SIat Eopt — Eo*pt a21 0.43(1.01) 0.12(0.46) 0.23(0.07)2 0.43(1.01) 0.12(0.46) 0.24(0.07)3 0.36(0.77) 0.66(0.68) 0.15(0.04)4 0.08(0.85) 0.01(0.40) 0.24(0.07)5 0.87(1.60) -0.19(0.34) 0.24(0.07)6 1.10(1.15) 0.46(0.57) 0.15(0.04)7 1.87(1.52) 1.05(1.10) 0.23(0.07)8 1.48(1.40) 1.59(1.67) 0.16(0.05)9 0.29(0.59) 0.61(0.70) 0.19(0.06)10 0.84(0.84) 0.27(0.49) 0.22(0.06)11 0.44(0.94) 0.07(0.48) 0.22(0.06)12 1.24(1.15) 1.72(1.93) 0.21(0.06)13 -0.21(0.44) -0.15(0.57) 0.07(0.02)14 0.16(0.69) -0.31(0.47) 0.07(0.02)15 0.30(0.77) 0.36(1.10) 0.07(0.02)16 0.15(0.72) 0.30(1.03) 0.07(0.02)Chapter 4. Applications^ 860.5. The results for estimates of Sopt and Eopt oppose each other. In all cases, going from `Eff2' to `Eff 1' reduced the variance for Sopt and increased bias and variance for Eopt. For Soptthe bias increased only in the case of model R2. The sequence `Eff 1' definitively improved theestimates for the global variance a2. Under the simulation model described as 'case 1', onlythe data generated from `Eff 1' (runs 3, 6 and 8) have expectations close to the real value 0.16,the general tendency being towards overestimation. Since `Eff 1' would theoretically produce amore informative data sequence than `Eff 2', the inconclusive results came as a surprise.Another question to be answered: "How do different fitting models (R1, 1t2 or BH) performunder different circumstances (i.e. different simulation models) ?" This addresses the robust-ness of a given estimation model to deviations from the correct underlying biomass dynamicsSuch relative comparisons are possible for the sets {2,5,7}, {3,6,8}, {9,10,12}, and {13,14,15}.Considering both, bias and variance, model R1 performs best for Sopt, while R2 is the bettermodel for Eopt. It is important to notice that model BH failed, even in 'case 1' where BHwas the "correct" model from which the data were generated. Similar findings in Ludwig et al.[39] and L&W, indicate that the generalized Ricker model is more effective in producing goodestimates for policy parameters Eopt and Soo than some of its competitors. This includescases for which the data were effectively generated from such competing models.As was already mentioned, the posterior estimates of variance tend to overestimate its truevalue. In the different situations investigated here, such overestimation is roughly one standarderror. Exceptions are runs 3, 6 and 8 (already discussed), and runs 13 to 16 corresponding tosimulations in 'case 3'.In runs 7 and 8, the simulation and fitting models are the same. I have used 'run 8' to con-struct the posterior distribution for the parameter vector (In a, in b, in q)'. The two-dimensionalmarginals are shown in Figure 4.10 (a) to (c); the correct parameter combination used in thedata generation (i.e. simulation model) is shown as a dot in each contour plot. In (d), I displaya histogram of the 6000 estimates for Eopt; the true value is 0.54. The strong asymmetry of thehistogram is a consequence of a few extremely large values (the maximum being Eopt = 16.3).(a) (b)-3 -2In(b)-1yk:Chapter 4. Applications^ 87Figure 4.10: Contour plots of two-dimensional marginal posterior distributions for 'run 8'(linesat .05, .10, .25, .50, .75, .90, .95 of largest density). The dots indicate the true parameter usedin the simulation model. In (d) a histogram of the 6000 Monte Carlo estimates for Eopt (thecolumn to the far right includes all Eopt > 5).Chapter 4. Applications^ 88Utilities and Bayesian DecisionSo far I have concentrated my analysis on policy criteria of L&W in order to compare theBayesian procedure and the "total least-squares" method used in their paper. But, in orderto adequately explore the power of Bayesian decision making, I need a different approach.According to the description of Section 1.3, an optimal policy corresponding to a"Bayes rule"has to maximize the expected utility. To describe this approach, I use the utility functionintroduced in Section 3.2. It remains to define a set of decision rules A and the thresholdSLIMIT.For A, I specify a finite set of efforts, over a range — from 0.05 to 2.50 — which I assumeimplementable for the imaginary fishery under consideration. This set (50 points) is fixed asA = {0.05, 0.10, 0.15, - • • , 2.45, 2.50}More precision could be achieved with a finer grid. But when it comes to enforcement of suchpolicy, it might be of little use to distinguish between 0.68 and 0.71 — from a practical viewpoint0.7 serves both cases.The expected utilities for six simulations are displayed in Figure 4.11. Each row correspondsto a different simulation model. The first column (Runs 1, 3 and 5) relates to the fitting modelR2; BH is the fitting model in the second column (Runs 2, 4 and 6). Table 4.11 also accountsfor the models — notice that, in contrast to calculations of Eopt, no value of 5, is needed— and gives the effort EB which maximizes the expected utilities. In square brackets is thecorresponding "true effort". It gives the best choice (in the set A) if complete information wereavailable, the dynamics deterministic, and the above defined utility function were used as themaximizing criteria. All runs use SLIMIT = 0.3 — the threshold for surviving stock in theutility function.The evaluation of R2 and BH is done by comparing EB with the corresponding "true"policy. For simulation models 'case 1' and 'case 2' R2 performed best while in 'case 3' thecorrect estimate was given by BH. For model BH two features are observed: (1) the value0.3 0•9.w Cd•••-.••••...„LUChapter 4. Applications^ 89Run 1^ Run 2 0. •.•-•...41....•.••••••••••■•••••••••••oNs.sk-r 3N • 01 •0.0^0.5^1.0^1.5^2.0^2.5^0.0^0.5^1.0^1.5^2.0^25Effort EffortRun 3 Run 40.0^0.5^1.0^1.5^2.0^2.5^0.0^0.5^1.0^1.5^2.0^2.5Effort EffortRun 5 Run 6oV3a•0.0^0.5^1.0^1.5^2.0^2.5^0.0^0.5^1.0^1.5^2.0Effort EffortFigure 4.11: Expected posterior utilities over the set A of possible choices for effort. Firstcolumn uses fitting model 1t2 and second column uses BH. Data from simulation models 'case1', 'case 2' and 'case 3' were used in rows 1, 2 and 3.Table 4.11: The Bayes rule EB for different cases is displayed together with its "true" alter-native [value in square brackets]. Columns 2 to 4 specify model assumptions. All 6 runs useSLIMIT=0.3.No. SIM Eff FIT EB1 1 1 R2 0.75 [0.60]2 1 2 BH 2.15 [0.60]3 2 1 1t2 0.60 [0.60]4 2 1 BH 1.80 [0.60]5 3 2 1t2 0.70 [1.20]6 3 2 BH 1.20 [1.20]Chapter 4. Applications^ 90EB is always larger than the corresponding value for 1t2; (2) the expected utilities (near themaximum) are very close over a wide range of effort selection — in run 2, for instance any valuefrom effort 1.5 to 2.5 would do. Further notice that, similarly to the findings in the first partof this section, the "correct" fitting model of 'run 2' is outperformed by the "incorrect" R2 of'run 1'.I also experimented with different values for SLIMIT —not shown here. A sufficiently largeincrease of SLIMIT will tend to produce smaller values for EB, reflecting a more conservativemanagement. Finally, there is no effect of different exploitation histories (Eff 1 versus Eff 2)over the final outcomes in all cases I have analyzed.ConclusionHere is a summary of the results for this section. Two distinct analysis are considered. The firstestimates the optimal effort Eopt as defined by L&W and the results are discussed in the lightof earlier findings reported by them. The second analysis is new; it uses an utility function forwhich the best effort is the Bayes rule EB, chosen to maximize the expected posterior utility.First Analysis: Given the goal of estimating the optimal effort, the three central findingsagree with L&W. I summarize them as follows:1. When deciding on how to "split" the overall variance into process noise and observa-tion error, it is preferable to favour the observation error. In particular, observationerror only (A = 0), was shown to outperform the assumption of process noise only(A = 1).2. An "incorrect" fitting model can lead to superior estimates of policy variables. Forcatch and effort data with large error and/or low variation in stock size, one mightgive up fidelity of the fitting model to the underlying dynamics in order to obtainimproved estimates for the policy parameter of interest. In particular, the Rickermodels (R1 and 1t2) have outperformed the Beverton-Holt (BH) alternative.Chapter 4. Applications^ 913. Different exploitation histories (here represented by distinct effort sequences) canaffect policy parameters in different ways. Overall, there is a trade-off between biasand variance. For the cases analyzed here, there is no single model which is superiorfor both parameters Eopt and Sopt under all circumstances.Second Analysis: Here I use Bayesian decision-making. The utility function incorporates thecentral goals that a fishery manager would like to achieve: maximization of catches andreduction in the risk of stock collapse. hi order to determine the best solution, there is noneed for equilibrium assumptions, neither is it necessary to specify a parameter Ä dealingwith process noise and observation error. The results show that:1. The "incorrect" fitting model performed better than the correct model. This featurewas also observed in the previous analysis.2. For all three simulation models, at least one of the fits identified the "true" (perfectinformation) effort. It seems that the generalized Ricker model (particularly 1t2)does better than model BH while always staying on the conservative side (smallerefforts). Only in 'case 3', for which the effort has less effect over stock depletion3,did BH give the correct effort.4.4 Surplus Production: Orange RoughyThis section is a case-study of the Orange Roughy (Hoplostethus atlanticus) fishery on theChatham Rise, New Zealand. The analysis uses data previously published by Francis [18]. Therisk analysis performed by Francis in that paper has caused some recent controversy (Hilbornet al. [24]) regarding the state of the fishery and consequently the management actions to betaken. Bayesian analysis which is made possible by the procedures developed in Chapters 2and 3, can help to clarify some of the controversy.Using an age-structured model with many parameters (which were estimated elsewhere and3The catchability q = .31 is half the value used in cases 1 and 2.Chapter 4. Applications^ 92are assumed known), Francis built a cumulative distribution function for the virgin (unfished)biomass B1. This construction was done by replicating simulations 100 times and using afrequentist approach to make empirical estimates of cumulative probabilities. He proceededby fitting a Johnson's distribution (Johnson and Kotz [26] p.23) to the simulated cumulativedistribution. Taking the first derivative of this fit, he obtained a probability density for B1.The mode of this distribution was used as a maximum likelihood estimate of B1. His goalwas the assessment of the risk of collapse within the next five years, given different rates inthe reduction of annual total allowable catches (TAC's) within that period. I shall give furtherdetails about these different decision options later.Hilborn et al. [24] performed a series of Monte Carlo experiments for scenarios in whichFrancis' method of constructing probability distributions for B1 fails: cases with low harvestrate and uninformative data. Based on these results, they question the estimate for B1 producedby Francis.A key element in the controversy is the uncertainty about the virgin biomass B1 and its usein decision making Francis describes uncertainty by a density for B1, from which he then drawsrandom values of B1 to produce replicates for a frequentist construction of his risk function.Further uncertainties are ignored in the process. I shall use the Bayesian alternative describedas 'Model 2' (Section 3.3) to construct a posterior distribution for the parameter vector 0 whichincludes ln B1 as one of its elements. This multi-dimensional posterior distribution is furtherused in the context of Bayesian Decision Theory to determine an optimal action.The data of Francis are reproduced in Table 4.12. They include a time series of catches{Ct; t = 1, • • • ,11) and estimates of abundance from trawl surveys {It; t = 6, , 11). Accordingto Francis' assumptions, the errors in Ct are negligible, so that all observation error is relatedto It.My general model, is given by (3.24) and (3.25). Particularly, I define= (In a, —ln^, ln q)'b = 1/B1Chapter 4. Applications^ 93Table 4.12: Estimated catch history, and trawl survey biomass for Orange Roughy (Hoplostethusatlanticus) for the period 78/79 to 88/90. Data from Francis [18].Time Catch (t) Trawl Index1^15,3402^40,4303^36,6604^32,3545^20,0646^32,263^164,8357^38,142^149,4258^39,098^102,9759^39,896^80,39710^31,478^97,10811^42,621^66,291St = Bt - CtG(O, Bt_i) = St - i -I- aBt_i (1 - bBt_i) for t = 2, • • - , 11F(9, Bi) = qBt for t = 6, • - - , 11Model GO describing biomass dynamics, is the discrete form (i.e. difference equation) ofthe Schaeffer model. The observation model F(.) assumes direct proportionality between indexI and the corresponding true biomass B, with constant of proportionality q.Determining PriorsI used two different priors for O. Both are three-dimensional Student distributions, with 9degrees of freedom. They represent my interpretation of the information provided and generalassumptions used by Francis. The first prior, denoted "informative", assumes a relatively highconfidence in prior identification of the mean values of the parameters. It is based on a guess formean as well as a range to represent possible smallest and largest values. For the second prior,denoted "diffuse", I made a more careful (and pessimistic) elicitation. Starting with guesses forpossible values of five pre-specified quantiles — 5, 25, 50, 75 and 95 — I found the parametersChapter 4. Applications^ 94of an idealized log-normal distribution which would best agree to those values (If these numberswere not coherent with such a distribution, I tried to improve with a new set of values). Theparameters corresponding to the underlying Gaussian distribution were used as parameters inthe Student prior. I also assumed a diagonal prior covariance.For the parameter q I used as quantiles the values 0.20, 0.30, 0.41, 0.56, and 0.90 — thesevalues more than cover the range of possibilities considered by Francis. Similarly, for B1 I usedthe values (in thousands) 200, 400, 650, 1000, and 2000 and for a the values 0.04, 0.10, 0.19,0.49, and 0.95.The prior parameters for the Inverse Gamma distribution of the observation variance Vwere the same in all runs. They were based on Francis' assumption of an coefficient of variationof 0.20 for observation errors in the abundance index. By fixing ao = 2 and setting the modefor V at 0.40, the value 00 = 8.0 was calculated. The values for the prior were as follows:• V /G(ao = 2.0, /3 = 8.0)• Informative: ro(0) T3(9, Mo, Co) with mo = (-1, 61, –13.36, –0.66)' and Codiag(0.30, 0.20, 0.05)• Diffuse: ro(0) T3(9, mo, Co) with mo (-1, 62, –13.37, –0.88)' and Codiag(0.97, 0.48, 0.21)ResultsTable 4.13 summarizes model specifications (columns 2 and 3) and the results for differentruns. Various bandwidths between h2 = 1 and h2 = .6 were used. I wanted to make surenot to smooth out potential "ill behaved" features, such as multimodality, by using too largea bandwidth. Variations in estimates were small within the given range for h2. For the resultspresented here, I used h2 = 0.64 and h2 = 0.90.In the AIS procedure (Section 2.3) I used three updating steps with sample sizes no = 1000,n1 = 2000, n2 = 2500, and n3 = n = 6000 . At each stage the mixture model was clusteredChapter 4. Applications^ 95Table 4.13: Model specifications (columns 2 and 3), final entropy (Ent.) and posterior estimatesfor virgin biomass B1 for Orange Roughy (values in 1000 t) and the probability that B1 exceeds500 thousand t. Values in brackets are estimates for posterior standard error. Run 'la' usesassumption (a) of Lemma 3.1.; the remaining runs use assumption (b). 'D' denotes diffuse priorand 'I' the informative alternative.Run 112 Prior Ent. B1 P(Bi > 500)la 0.64 D 0.94 450 (108) 0.25 (.006)2 0.90 D 0.98 436 (92) 0.19 (.005)3 0.64 D 0.98 431 (90) 0.18 (.005)4 0.64 I 0.99 385 (48) 0.02 (.002)5 0.64 I 0.99 384 (49) 0.02 (.002)to m = 800 components of three-variate Student distributions with 9 degrees of freedom beforeproceeding. The relative entropy (2.4) follows the usual pattern described in previous sections.A typical outcome (Run 5 of Table 4.13) gives the values 0.597, 0.947, 0.987, and 0.991 for theimportance functions gi(-) for i = 0, • • , 3, respectively.The prior clearly influences the final estimates. If diffuse, the Monte Carlo estimates ofposterior mean and standard error are higher (approx. 433,000 t and 90,000 t), than estimatesobtained under the informative alternative (approx. 384,000 t and 48,000 t). Similarly, theestimate of the probability that B1 exceeds 500,000 t, is about 0.2 for the diffuse prior, while inthe informative case the estimate is 10 times smaller. The marked differences due to differentpriors are also shown in Figure 4.12. Three different marginal posteriors are displayed: for 'runla' (line), 'run 2' (dots) and 'run 4' (broken line). If the diffuse prior is accepted, values of B1up to 800,000 t are possible. This is quite different from the informative case, where valuesof B1 exceeding 600,000 t are unlikely. A comparison with the estimated density reported byFrancis (his Figure 1(B)), suggests the density which was obtained for 'run 4' (broken line).In other words: the acceptance of Francis' distribution corresponds to the acceptance of theinformative prior in my model.The marginal distribution for 'run la' (line) shows an interesting feature suggesting bimodal-ity. This was an atypical feature. It is included as an illustration of a particularly interestingChapter 4. Applications^ 96210A5^4*10A5^610115^8*10A5^10A6^1.2*10A6^1.4*10AEVirgin BiomassFigure 4.12: Marginal posterior distributions for virgin biomass B1 for different runs: `laVine),'2'(dots) and '4' (broken line).13.5(a) (b)12.4^12.6^12.8^13.0^13.2^13.4111(8_1)12.6 13.212.4 13.412.8^110In(B 1)Chapter 4. Applications 9712.0 12.5 13.0In(B 1)(C)13.5 14.0 12.0 12.5 135 14.00.Tcy •12.5 13.0In(B_1)(e)12.5 13.5Figure 4.13: Contour plots for two-dimensional marginal posterior distributions: 'run la':(a)and (b); 'run 2':(c) and (d); 'run 4':(e) and (f). Contour lines are drawn at .05, .10, .25, .50,.75, .90 and .95 of the highest density.possibility. To better see this feature, I included contour plots of the two-dimensional marginalposterior distributions for (ln Bi,ln a) and (ln q) in Figure 4.13 for the same three cases.For 'run la' (displayed in (a) and (b)), the two modes are clearly shown. All remaining casesare unimodal. The negative correlation between B1 and q is consistently present (contours (b),(d), and (e)).Chapter 4. Applications^ 98Decision- makingThe analysis so far was able to describe some general features of marginal uni- and bi-dimensional posterior distributions and its relation to prior specification. But the final goalof Francis' analysis was the use of his risk-assessment to make management decisions. A key el-ement in Hilborn et. al.'s criticism was the absence of a coherent framework for decision-makingunder uncertainty. I shall use the three-dimensional posterior distribution for 9 together withan appropriate utility function to perform a Bayesian decision analysis and choose the Bayesrule policy from among the five different options considered by Francis. But first I shall describethese management options.The fishery for Orange Roughy is managed by annually setting the TAC. For instance, thefishing season of 89/90 (the year immediately following the data sequence of Table 4.12), hadits TAC fixed at 28,637 t. According to Francis, there seems to be an agreement to reduce TAC.But two questions remain: (1)"to what level should the TAC be reduced" and (2)"at what rateshould the reduction take place".The target level was fixed at a maximum constant yield of 7,500 t — accounting for question(1). Francis defined this value to be -IMSY, where MSY denotes the deterministic maximumsustainable yield. This value is questionable on two grounds: (i) it is an arbitrary choice and(ii) its calculation uses the estimate B1 = 411,000 t, ignoring the uncertainties of this estimate.What I will try to answer here is question (2): the rate at which such reduction shouldbe made. Table 5 of Francis displays five alternative options. I will use the same options todefine my set of actions A = {al, a2, a3, a4, a5}. Given action al, the rate of TAC reduction is3 thousand tons per year. Similarly for a2 up to a5, the rates are respectively 5, 7, 9 and 12thousand tons.Francis considers "the risk to the fisheries" which each of these options represents by esti-mating (using a frequentist empirical estimate) "the risk that the fishery would collapse withina five year period". The fishery is said to have collapsed if, within this period, in some year theTAC is greater than 2/3 of the corresponding biomass.Chapter 4. Applications^ 99To construct my utility function I assume that, for a fixed ai and in any given year, thecatch will equal the prescribed TAC; I denote this total quota by T(i). Hence, starting fromthe current To = 28,637, future quotas for s = 1,-• • , 5 are given byT2) = max{To — sat; 7500)Defining the discount rate of future catches by Spy, the cumulative discounted catch overthe five year period is denoted CC(i) and given by5Cr(i) = E (5-TPs=iI further need to identify the "occurrence of collapse given decision ai". This is done by theindicator functionA(2) = 1 1 if TP) > B12-i-8 for some s = 0, • • • , 50 otherwise(4.37)Given a pair (Oi , ai) E 0 X A (recall that 0 is the random sample from the posteriordistribution), the utility function can be stated in terms of the notation just introduced:U(83, ai) = (1 — A())0(cc(2))^ (4.38)where c/) is some known non-decreasing function.The utility (4.38) is zero whenever collapse is predicted (A(i) = 1). Otherwise it is 0(CC(i)).The dependence on the posterior p(9 I Data) becomes clear when considering the algorithm usedto calculate (4.38):1. Fix (ki E 0 and use GO together with the observed catch data {Ct; t = 1,•• • , 11) topredict the average time series for {Bt; t = 2, • ,12}.2. Fix a decision rule ai E A and define {To, TP, • • • , TP} to be the catches into the future.3. Evaluate A(2) according to (4.37).4. Calculate U(93, a1) using (4.38).i ai COO x10-3 0(CC(0)1 3 83 .9812 5 63 .9503 7 52 .9164 9 46 .8885 12 40 .850Chapter 4. Applications^ 100Table 4.14: Risk-averse utilities 0 associate to the five possible cumulative discounted catchCOO.5. Repeat the whole procedure for the next pair (0i, ai).I consider 3 alternatives for 0:1. 0(x). x.2. 0(x) = 1.3. q:(x) is a risk-averse function defined in Table 4.14.The first assumes 0(x) = x; so that the utility is linear in CC whenever the possibility forcollapse is excluded. A more conservative approach, which is concerned only with the risk ofcollapse defines 0(x) = 1. This second alternative assumes the same utility for different catchesregardless of its size; an unrealistic assumption. A compromise between both extremes can beobtained by a concave function for 0 and is given as the third option.Some comments on Table 4.14. Since the values CC() are fixed for each i, I only need todefine 0 for those particular points. Notice that the utilities become smaller with smaller valuesof CC, but the reduction is non-linear. This values were determined assuming 0(160) = 1 andusing Table A.1 for constant risk-aversion in Lindley [31] (p.196). For comparison, the linear(or risk-neutral) values would be .52, .39, .33, .29, .25 for i = 1 to 5, respectively.For the manager who needs a practical answer, the results in Table 4.15 are the mostimportant of the section — with complementing details on Table 4.16. The following featuresare highlighted there:Chapter 4. Applications^ 101• The decisions are robust over different bandwidths [1a,2] and prior assumptions ([1a,5]and [3,4]).• What defines the optimal rate is the utility function. If the linear relation 0(x) = x ischosen, the optimal decision is al which recommends a reduction in TAC of 3000 t per yearover the next 5 years. On the opposite end, the conservative '0-1' option (0(x) = 1)recommends a5, a drastic rate of annual reduction by 12000 t. The intermediate risk-averse utility recommends rates in between: it chooses a3, a reduction rate of 7000 t peryear, as optimal.• The expected utilities in the '0-1' case, have a special interpretation: the complement ofthe probability of collapse. For any decision a, the probability of collapse pi, is estimatedaspi =1— E[U(0,ai)]If, for instance, I take 'run 5' in Table 4.16, these estimated are .27, .13, .08, .06, .05for actions al to a5, respectively (these probabilities change little for diffuse prior:`run2'). Such values are in disagreement with the results presented in Francis' Figure 3: hisestimates of the risk of collapse exceed .6 for al, while probabilities smaller than .2 areobserved only for a4 and a5. Since both methods are derived under different paradigms, adirect comparison is not possible. However, Francis' model is complex (many parameters)and the estimates of risk are sensitive to recruitment assumptions (as indicated in hisFigure 3(d)). My model is comparatively simple and does not need special assumptionsfor recruitment. The findings in Ludwig and Walters [37] [38] indicate that simple modelsare likely to be more robust for management purposes.The results suggest that, for practical decision-making, the focus of discussions should bethe assessment of the most adequate utility function to which all parts can agree (industry,agency, fisher, environmentalist, etc.). Given the available data, details regarding technicalitiesin parameter estimation and modelling seem to be of minor consequence.Chapter 4. Applications^ 102Table 4.15: Recommended decisions ai (rate of TAC reduction in thousand t/year), whichmaximize the expected posterior utilities, using three alternative utility functionsRun Linear Risk-Averse 0-1la - 7 122 - 123 3 7 -4 3 7 -5 - 7 12Table 4.16: Expected utilities obtained for all 5 decision rules, according to the form of theutility function (expressed by 0).4) Run al a2 a3 a4 a5Linear 3 .622 .538 .462 .420 .3734 .609 .540 .470 .427 .378Risk-Averse la .740 .814 .823 .814 .7933 .733 .811 .824 .817 .7964 .717 .825 .838 .830 .8085 .716 .820 .837 .828 .8100-1 la .767 .871 .913 .931 .9452 .753 .862 .906 .922 .9385 .730 .870 .918 .934 .951Chapter 4. Applications^ 103SummaryThere are 4 conclusions from this case-study for Orange Roughy:1. The simple production of probabilities of collapse, as done by Francis, is not enough fordecision making. The uncertainties (given by probabilities), need to be linked to measuresof the consequences (utilities) resulting from different actions. The Bayesian paradigmfurther says that, for rational and coherent decision making, the maximization of expectedutilities is the correct criterion to use. Such analysis was carried out with three distinctutility functions. By taking this additional step, my analysis completes the work ofFrancis.2. The shape of the marginal posterior distributions for B 1 is affected by the choice of a prior.I used two alternative priors: informative and diffuse. The distribution used by Franciscorresponds to the informative prior. The diffuse prior resulted in a heavier upper tail,increasing the possibility of values B1 > 500,000t from 5% to about 20%.3. Decision making is robust with respect to choices for prior. Given any of the utility functions,the optimal decision was not affected by the particular choice of prior (informative ordiffuse). This indicates that for the current model/data situation, and assuming consensusabout the set of actions A, the tail behaviour in the distribution for B 1 was not strongenough to change the overall optimal decision.4. The construction of a sensible utility function is an important tool in the negotiating processamong all interest groups. My results suggests that this aspect of the analysis overshadowstechnicalities concerning the choice of a model for biomass dynamics4.5 Multimodality: Simulation ModelThe advantage of replacing classical parameter estimation by the Bayesian alternative is mostapparent when the posterior distribution is multimodal. It indicates that more than one modelChapter 4. Applications^ 104describes data reasonably well. Such information is an important tool in the design of adap-tive management policies (Walters [68]), where one of the requirements is the identification ofalternative hypothesis. A procedure that forces the choice upon a single best model, as theclassical approach does (least-squares, maximum likelihood), is unable to capture the uncer-tainty present in the problem. Such ideas were the motivation to examine practical situationsfor which multimodality might emerge.In this section and the next, I give examples for which the posterior distribution is multi-modal. Here I use simulations to produce artificial data. The next section is a case-study ofPacific cod. I employ two different approaches:1. In the simulated example I use an observation mixture-model (MM) given as:(Yt I 0, V), r • N(141)(0),V) -1- (1 - r) • N(142(9), V) with r E (0,1)^(4.39)/111) ,^(2) ,.where^r pt for all t. Such model might be adequate in situations where availableknowledge suggests a mixture of two possible hypothesis (or stocks): a very productivestock with low virgin biomass, against a less productive stock with large virgin biomass.2. The approach of the next section maintains the original observation model (without mix-ture) but modifies the description of biomass dynamics by replacing the surplus-productionmodel by a more elaborate delay-difference (DD) model (see Section 3.4).The observation mixture model (MM)For data generation I used a standard logistic (Schaeffer) model for biomass dynamics. Thesimulation model is as follows:Bt = St-i + aBt-i(1 - bBt-i)St = Bt - CtCt = Bt(1 - e-c2")Chapter 4. Applications^ 105Ef = Etevt , where vt are i.i.d. N(0,ae2 = 0.05)I = CtlEt)where Bt is biomass, Ct the catch, Et the effective effort, Ef the observed effort and h theobserved abundance index. Parameters in the model are a = 0.2, B1 = 1/b = 4000 andcat chability Q. The cat chability Q is modelled as a random variable from a mixture of uniformdensities, namelyQ ,,, EU(.20, .40) + (1 — OU(.55, .65)with c = 0.4. The real (but unknown) effort sequence is defined for t = 1,- • • ,26 to be0.40 if 9 < t < 18Et = 0.20 if t = 9 and t = 1810.04 otherwiseFigure 4.14 shows the time series of Ct, Ef and It for the 26 years of data. The last plotdisplays the abundance index against the corresponding observed effort for the last 13 years.For fitting purposes I assume that only these 13 abundance observations are known along withall 26 observations for catch.The observation model for Yt = log It uses MM defined in (4.39) with r = 0.5. I furtherdefine p(i)(0) as/4°(9) = ln(q(i)e) for i = 1,2where for i = 1 I use a(1), b(1), q(1) according to the relation(ln a(1), ln b(1), ln q(1)) = (00,01,92)while for i = 2, I take(a(2), 1)(2), q(2)) = (a^' 2 '(1) b(1) 2 . q(1) )2 In the above construction I am mixing the hypothesis (i = 1) of high productivity, low virginbiomass, and a given catchabifity, with an alternative hypothesis (i = 2) of low productivity,large virgin biomass, and high catchability.000 0091.^0001.^0093/00CV011..^ o90 90 VO £'0 Z'O VO 00pogo•Chapter 4. Applications^ 106ocm01....0009^009^00t7^00Z^0 40W0000k^000£^000Z^0001.3/00cm00Figure 4.14: Time series for Catch (Ct), observed effort (Es) and CPUE (It). Also a plot of`CPUE vs. E' for the final 13 yearsChapter 4. Applications^ 107The posterior distribution for the parameter vector 9 is obtained by using the algorithm ofSection 3.3. What changes is the use of (4.39) as the observation model.Some two- and three-dimensional' marginal posteriors are displayed in Figure 4.15. Theperturbation caused by the stochastic behaviour of the catchability Q is translated into un-certainties regarding the virgin biomass. A secondary mode is estimated for a biomass near1600 while the primary mode is estimated near a biomass of 3200. Also notice the negativecorrelation between in a and in B1 which reflects the confounding between a small productivestock versus a larger less productive one.If both modes are interpreted as summarizing conflicting hypothesis, the highest one in-dicates the hypothesis with the strongest support. In this example, the large stock (virginbiomass near 3200) with a smaller intrinsic growth rate (parameter a) is the most likely of bothalternatives. This hypothesis is closest to the true virgin biomass of 4000.4.6 Multimodality: Pacific CodOne might argue that the mixture model (4.39) used in Section 4.5 is artificial. Therefore, Iinvestigate the emergence of a multimodal posterior in a more natural way, without using aspecific mixture model for observations. Possibly surplus-production models are too simplistic5.A next step towards more flexible models, capable of incorporating characteristics of growth,recruitment, and time-delays, is the class of delay-difference (DD) models. By establishing as-sumptions for growth, survival and recruitment, these models can better depict the interactionsof these elements and the resulting time-delays. For more details on the DD model and thenotation used throughout, see Section 3.4.Here I shall present a case-study using the data for Pacific cod (Gadus rnacrocephalus)presented in Hilborn and Walters [25] (p.345). For the purpose of the analysis, parameters forgrowth and natural survival are assumed known (Walters pers. comm ) and given in Table 4.17.The observed data are two time-series: catches {C1, • • , Cr} and abundance index4For three-dimensional plots I used `perspO' function in S-Plus.5But recall that in the case-study for Orange Roughy one of the fits also suggested bimodality.El. I. 81:01:1YOZ'00 TO-ApuepChapter 4. Applications^ 108I. 8"09'0VOZ*0 0 TO-AmsuepZ^1.^0^l-^Z-^E-^V- I.^o^I--^z-^C-(e)Boi (b)601Figure 4.15: Some marginal posteriors for data simulated using the logistic biomass model andfitted by the mixture model described in text. The contours are at .01,.10,.20,.30,.40,.50,.70,.90of the largest observed posterior density.Parameter ValueaWk1.00.80.50.63Chapter 4. Applications^ 109Table 4.17: Parameters which are fixed in the delay-difference model for Pacific cod (Gadusrnacrocephalus).{h, -,/,} for a total of T = 22 years. The abundance index is catch per unit effort, CPUE. Ishall define harvest rates in terms of catches byht=CtlBt for t = 1, • • , rThe abundance index h is assumed proportional to the corresponding true biomass Bt butsubjected to i.i.d. observation errors vt. Hence, I use the standard modelh = qBtevt for t = 1, • • • ,Twhere q is the constant of proportionality.The unknown parameters are: b in the recruitment equation (3.35); B1 = Be the (initial)equilibrium biomass needed in (3.34) and (3.35); and the constant of proportionality q. Theparameter vector 0 = 02, 03)' is given by0 = (ln ln q,lnThe observation model, written as a function of 0, is defined by:(Yt I 0,V), N(pt(0),V)^ (4.40)For any fixed value 9 and time-series of catches {Ct}, the sequence^t = 1, • • , 7} isobtained recursively, using the following steps:1. Start with B1 = e93 calculate R1 using (3.34).Chapter 4. Applications^ 110Table 4.18: Quantiles specified for the parameters. The mean m and variance C are parametersfor the corresponding Gaussian distribution for the logarithm of the parameter.Quant. 5 25 50 75 95 m Cb 1.0 1.4 2.5 3.5 5.0 0.82 0.28q x 10-3 1 6 25 100 500 -3.74 3.73B1 1 3 6 9 18 1.60 0.752. Substitute R1 into (3.33) to obtain Ni.3. For t < 5, define Rt = R1; otherwise use (3.35) and the corresponding "lagged" spawningstock4. Use (3.31) and (3.32) recursively to obtain the sequences {13t t . 1, • • • ,r} and Ott = 1, ..., r}.5. Calculate {MO) : t = 1, ..., r}, where ft(G) = ln(q.ht) and q = 02.The posterior distribution for 0 is obtained using the adaptive importance sampling (AIS)of Section 2.3. For data fitting I use the algorithm proposed in Section 3.3.To describe background information about the parameter vector 0, I constructed the priordistribution based on educated guesses for specified quantiles of the original parameters to whichbiological meaning can be attached. Table 4.18 gives these values for five different quantilesof an "imaginary" marginal log-normal distribution for each parameter. The listed quantileswere the outcome after a series of minor adjustments until reaching consistency among all fivevalues. The parameters which would correspond to the underlying Gaussian distribution wereused as the mean m and variance C in a three-dimensional Student distribution with 4 degreesof freedom and diagonal covariance matrix.In order to evaluate the range of values for b, notice that1 _ Soptb — BeChapter 4. Applications^ 111where Soo is the spawning stock which produces the maximum number of recruits Ropt. It iseasy to see from (3.35) thatRRop: = _bleb_iso that for b = 5 the recruitment Ropt would be eleven times larger then its value at unfishedequilibrium Re. It is also reasonable to assume 0 < Soo < Be, which implies b> 1. For q andB1 the quantiles can be interpreted directly.As prior for the variance V I assume an Inverse-Gamma distribution with parameters av = 2and fiv = 8 which is diffuse (infinite variance) and corresponds to a coefficient of variability ofapproximately 20% in the observed abundance.The approximation for the posterior distributions were obtained after 4 iterations usingno = 1000, n1 = 2000, n2 = 2500, n3 = 3500, and n4 = n = 6000. At each stage, a reductionto a smaller mixture model of m = 800 components was performed before proceeding. Theheavier tails (4 degrees of freedom) were chosen empirically after trying different possibilitieswhile aiming for stable measures of relative entropy (2.4). For the results discussed here, I setthe bandwidth h2 = 0.9. The entropy of the final approximations was 0.91 (recall that a valueof 1 corresponds to a perfect fit at the posterior sample points). In Figure 4.16 (a, b and c),contour-plots for the two-dimensional marginal posterior distributions are displayed. The pair(ln b,ln Bi) (Fig. 4.16(a)) has a primary mode near (1.4,1.3), while a secondary (less likely)mode is near (0.5, 2.1). Bi-modality is also shown in (c), the marginal posterior for (lnq,ln b).In (b) a strong (negative) correlation between in B1 and in q features the confounding effecttypically found for those parameters. Figure 4.16 (d) shows the one-dimensional marginal forin B1. The comparison of this one-dimensional marginal distribution with the contour plots (a)and (b) gives a good indication of loss in information incurred by the reduction in dimensions.Still, such a one-dimensional marginal posterior distribution has much more to say than a pointestimate as, for instance, the Monte Carlo mean 1.89 with standard deviation 0.36.The confounding effect between B1 and q suggests that a good prior knowledge about q islikely to reduce posterior uncertainty about B1. In order to see this, I have defined a more542^3In B_11Chapter 4. Applications^ 112(a)^ (b)cvT7-1^0^1^2^0^2^3^4^5In b In B_1Figure 4.16: Estimated posterior marginal densities for Pacific Cod (Gadus macrocephalus).The contours represent .05, .10, .25,.50, .75, .90 and .95 of the largest observed density.Chapter 4. Applications^ 113Table 4.19: Monte Carlo estimates of posterior mean (standard deviation) for diffuse andinformative priors. The entropies are for the final importance function.Estimates Diffuse InformativeV 0.28 (0.10) 0.30 (0.10)B1 8.1 (7.5) 8.7 (4.0)P(Bi > 9.0) 0.29 (.006) 0.40 (.006)P(Bi > 10.0) 0.23 (.005) 0.28 (.006)Entropy 0.91 0.95informative prior for q, setting the quantiles 5, 25, 50, 75, 90 and 95 at values .03, .05, .08, .12,.22 respectively. Such resulted in the prior moments m = —2.53 and C = 0.38 for ln q. Thisprior favours the less likely of both modes. Figure 4.17 shows the contours (and perspectiveplot) for the marginal posterior between In b and ln B1, together with the one-dimensionalmarginals for the individual parameters. Notice the increased credibility associated to thissecondary mode, confirming my predictions. At the same time the mode corresponding to thesmaller biomass dominates — a consequence of the strong favouritism towards the recruitmentparameter b associated to that mode. The change in the shape of the marginal posteriors forln B1 and ln b are most noticeable. Figure 4.18 serves this purpose: the marginal posteriorsfor the "informative" prior (broken line) increase possibilities for the smaller mode for In b andsubstantially reduce the chances for values of ln B1 below 1.5.Table 4.19 and Table 4.20 give additional summaries for estimates obtained for both fits.The case with diffuse prior suggests a slightly smaller expected initial biomass of 8.1 as comparedto 8.7 obtained for the informative case. The increased background information assumed in thelatter case is also reflected in the overall reduction in variances; the exception is hi b for whichthe variance increased slightly — possibly due to the more pronounced bimodality.4.7 Final RemarksThroughout this Chapter I have examined a series of case studies for Bayesian inference anddecision making These applications were intended to provide a range of illustrations for the(a)qy-2^3In B_11 41 43(b)0.0 0.5 1.0 1.5 2.0In bChapter 4. Applications^ 114Figure 4.17: Estimated posterior marginal densities for Pacific cod, using an informative priorfor the catchability parameter q. The contours represent .05, .10, .25, .50, .75, .90 and .95 ofthe largest observed density.Chapter 4. Applications^ 115(a)-1^0^1^2In b(b)1^2^3^4^5In B_1Figure 4.18: Marginal posterior densities for: (a) ln b, (b) ln Bi. The full and dotted linescorrespond to the "diffuse" and "informative" priors respectively.Chapter 4. Applications^ 116Table 4.20: Monte Carlo estimates of moments of the posterior distribution p(0 I Y(7)) forPacific cod, given diffuse and informative priors. The values mo, ml and m2 refer to in b, ln q,and In B respectively. The values are the corresponding covariances.Parameter Diffuse Informativemo 1.16 1.07mi. -2.09 -2.28M2 1.89 2.08CO30 0.14 0.16Coo. 0.11 0.05CO32 -0.13 -0.09Ci.o. 0.36 0.15C1,2 -0.35 -0.15C2,2 0.36 0.17procedures described in Chapters 2 and 3, especially the use of adaptive importance sampling(AIS) to approximate posterior distributions of parameters in biomass dynamic models. Thefull benefits of such a rich description of uncertainties, are realized in decision making In twocase-studies — the simulation of Section 4.3 and the Orange Roughy fishery of Section 4.4 — Itake this extra step after defining a suitable utility function and a set of possible managementactions.The first case-study (Section 4.2) considered stock and recruitment data for Skeena Riversockeye salmon (Oncorhynchus nerka). I constructed a posterior distribution for the two-dimensional parameter and used standard density estimation techniques to approximate pos-terior distributions for two policy parameters. The estimates were discussed in the light of aprevious analysis performed by Hilborn and Walters [25] for the same data. I further used thisexample to illustrate the advantage of adaptive importance sampling over one-time importancesampling.A second case-study (Section 4.3) was based on simulated data. The effect of process noiseand observation error and the robustness of different fitting models were examined. For thestudy I replicated some of the simulation models used by Ludwig and Walters [38], but usedthe Bayesian models to estimate the "optimal effort" defined in their paper. I further includedChapter 4. Applications^ 117a new criterion to choose some "best effort", based on utility theory and Bayesian decisionanalysis. This new approach is attractive given its simplicity of concept and the avoidance ofunrealistic equilibrium assumptions. Since the approach requires a posterior distribution, it isan exclusive feature of the Bayesian method. If a fitting model is judged by its capability ofcorrectly estimating the policy parameters of interest, one is compelled to give up the fidelity ofthe model to the "true" underlying dynamics in order to improve the performance of estimation.Such findings were reported previously by Ludwig and Walters [38] and are confirmed withinthe Bayesian setting. The same feature was also identified for the new approach by comparingthe estimated best decision to its perfect information alternative.The third case-study (Section 4.4) refers to the fishery of Orange Roughy (Hoplostethasatlanticus) off the New Zealand coast. I used the data of Francis [18] and a surplus-productionmodel to examine the risk of different management options. Francis produced estimates of riskof collapse from a sophisticated age-structured model. Such risks are too high if compared to myresults based on a much simpler and possibly more robust model. Furthermore, the Bayesianframework of my analysis can better account for uncertainties in underlying parameters andits consequences for decision making. My results suggest that the choice between variousmanagement options is primarily driven by the utility function which compromises betweenrisk of collapse and future catches. The more one favours the former (large risk-aversion), themore conservative (i.e. larger rate of annual reduction in total allowable catches (TAC)) thesolution.The richness of information available through a posterior distribution becomes very evidentin cases where this distribution is multimodal. Such situations highlight the weakness of anyapproach which forces estimation to one point only. Decision making that is based on pointestimates ignores the uncertainty among a multitude of reasonable answers. The last twocase-studies examine multimodality. In Section 4.5 I once again resort to a simulation model.I consider an imaginary fishery for which the effectiveness of fishing effort (measured by thecoefficient of catchability) changes randomly from year to year. The results show a clear bimodalChapter 4. Applications^ 118posterior, indicating the duality which the fitting model is incapable to resolve. Section 4.6returns to real data. This time I consider Pacific cod' (Gadus macrocephalus) and use a moreelaborate delay- difference model (see Section 3.4) to describe biomass dynamics. The posteriordistribution indicates two modes. The analysis of one- and two-dimensional marginal posteriordensities also depicts the loss in information incurred by dimensionality reduction.6The potential for bimodality in this particular data set was pointed out to me by C. J. Walters.Chapter 5Discussion and Conclusion"Fisheries management is a decision problem, not an assessment problem", say Hilborn andWalters [25] after a detailed description and analysis of where and why it has failed in the past.The confusion between management and assessment, they claim, is at the root of the problem.Why ?People used their expertise to build more complex, realistic models. It was thought thatinclusion of any new insight about biological or ecological phenomena, would enhance thepredictive capabilities of such models. In seeking to maintain the credibility of fishery sciences,there was a tendency towards suppression or elimination of all elements of doubt. Confessionof uncertainty was considered to be a sign of weakness. But failures in the real world revealedthat this procedure was not effective.In the 1970's and 1980's, Hilborn, Ludwig, Walters, and others ([71], [34], [37], [33], [39],[38]) indicated where to look for possible flaws. They used simulation studies to show thattypical fishery data are not informative enough to allow for reliable estimation in realistic buthighly parameterized models. It was recognized that a compromise between performance ofestimates and complexities of models had to be made. Traditional management practices,based on equilibrium assumptions and goals for stable catches, were in collision course with thedesire for learning about the systems potential. Less realistic but simpler models were shown tohave a better performance in management than more sophisticated alternatives. Uncertaintywas exposed; its description presented a new challenge. Such a major change in thinking aboutmodels simultaneously changed the concept of management.Traditional notions and management practices were questioned. "It is far more important to119Chapter 5. Discussion and Conclusion ^ 120explicitly consider uncertainty than to apply sophisticated analytical techniques", say Hilbornand Walters [25]. The pretence of certainty was abandoned and fishery management seen asdecision making under uncertainty. It became important to identify competing models andcompare possible actions. The distinctive roles of assessment and management became clear:assessment should produce the alternative models (hypothesis) and their relative credibility,while management had to establish a set of actions and the measurement of their consequences.All these insights led naturally to use of Bayesian statistics.Lindley [31] states that, for coherent decision making in the presence of uncertainty, thereis only one way to go:1. All uncertainties must be quantified by probabilities.2. The consequences of various actions must be described by utilities.3. The action, expected to give the largest utility, must be taken.There is considerable theoretical evidence — see Berger [2] and de Finetti [8] for lengthytreatments and further references — to support the conclusion that Bayesian decision theorygives the framework to accomplish this goal. It has been shown that any decision making whichdoes not correspond to a Bayesian analysis, is liable to lead to incoherent procedures, and hencerepeated use may lead to the collapse of the fishery.From a Bayesian standpoint, probabilities are the expression of beliefs about uncertainevents. The individualistic perception which goes into such interpretation, gave it the name:subjective probability. Classical non-Bayesian views, reject such probabilities on two grounds:1. The frequentist interpretation of probability (e.g., the probability 1 of heads in tossing afair coin) as the only acceptable one; and2. the impression of lack in scientific objectivity, caused by the inclusion of personal opinions,perceived as arbitrary.Chapter 5. Discussion and Conclusion^ 121In answer to argument 1, it can be said that the flexibility gained from subjective probabil-ities is in fact an advantage of Bayesian statistics. In a frequentist approach there is no roomfor statements like "high probability that the Orange Roughy fishery will collapse within thenext 5 years" while it offers no difficulty to the subjectivist. As to the criticism in 2, any statis-tical analysis leading to decision making, relies on subjective (personal) inputs. The Bayesianapproach makes such inputs explicit and open to evaluation by other experts; this is certainlya scientific attitude. The reader can find more details about the controversy between 'objec-tivists' and 'subjectivists' and the philosophical foundations underlying the Bayesian approach,in Berger [2] (sections 1.6, 3.1 and 4.1) or, in a less technical language, in Lindley [31].If we agree with subjective probabilities as a coherent way to express uncertain beliefs, Bayes'Theorem becomes an important conceptual link between (prior) beliefs and the (statistical)information provided by data. The resulting posterior probability gives the complete descriptionof current beliefs. The theorem tells how beliefs have to change in the light of a new piece ofevidence (data); it is a mathematical description for coherent inductive reasoning.But, in public decision making — as is the case in fisheries — it can be difficult to finda prior or utility to which all participants would agree. The Bayesian statistician may use a"diffuse" (or "vague") prior or may report the outcome of the analysis over a set of reasonablealternatives. Often — as was observed for Orange Roughy — the management decision isrobust to such alternatives. If, however, different reasonable priors give conflicting answers,then the scientific uncertainty is to be treated as a feature of the problem and should never beeliminated by artificial or hidden assumptions. A set of sensible utilities, on the other hand, isan useful negotiating tool among various interest groups.Management: description or prescriptionOur modern thinking about assessment and management includes concepts of adaptive man-agement and replication. Such tools are needed to develop a better understanding of stockpotential and enable more reliable predictions. The answers sought are therefore descriptive;Chapter 5. Discussion and Conclusion ^ 122they explain how nature behaves.Management decisions are needed, and will always be based on beliefs, no matter howgood the understanding of the system or how reliable the predictions. In the worst scenario,uncertainties are simply ignored. To ensure better decision making, prescriptive answers areneeded.According to de Finetti [8], the subjectivistic theory of probability is prescriptive ratherthen descriptive. A posterior distribution is not the description of a real phenomena but anexpression of the perceived uncertainties and beliefs about it, given all the available evidence.Such distribution becomes of practical relevance when used in combination with measures ofconsequences (utilities), to prescribe a best action.Adaptive management uses experimentation to enhance understanding (description) of themechanisms acting on renewable resources under exploitation. There are, however, recognizeddifficulties with the practical implementation of experimental management programs. Thedifficulties go from technical (insufficient replicates) to institutional (coordination, enforcement)and political (who sets priorities and accepts the responsibilities for the decision). But how canwe predict the effect of an action if there are no replicates and controls to compare with ? Howto respond to unexpected (or contradictory) predictions from different working models ?Regardless of the ability to overcome these problems, the prescriptive role of managementwill still be necessary. Although often preferred, inaction is merely one of many possible choices.I believe that Bayesian statistics has much to offer in helping to improve the decision makingprocess. If people are trained in better elicitation of uncertainties, if utilities give a fair descrip-tion of preferences and attitudes towards risk, then a coherent framework is available. Howcan we accomplish the complex task (decision making) if we can not do the elementary ones(describe uncertainties and utilities) ?Chapter 5. Discussion and Conclusion^ 123The contribution of this workTechnical difficulties have hindered the implementation of the Bayesian approach. For fisherysciences in particular, the non-linearity of models and plethora of parameters have impededeasy implementation.This technical challenge had to be tackled so that the Bayesian paradigm could becomeof practical significance. By using the adaptive importance sampling procedure introduced byWest [75] [76], I was able to construct multivariate posterior distributions for the parametersof biomass dynamic models. Since the procedure gives a random sample of values from theposterior distribution, this output can be coupled easily with an utility function to calculateapproximate Monte Carlo expectations — a requirement for Bayesian decision making.The application of these results to a variety of case-studies and its combination with rea-sonable utility functions served to illustrate the new possibilities. For instance, it shed newlight on the controversy about the New Zealand Orange Roughy fishery. The (frequentist)results of Francis [18] were shown to agree with the Bayesian analysis only when a particular(informative) prior was assumed. But the sensitivity of the model to different reasonable priorsturned out to be unimportant when deciding upon the best policy. For this case in particular,my analysis suggests that the decisive issue is the choice of an utility function; neither prioruncertainties nor technical details about models are influential. Because this choice is politicaland not technical, the scientist doing assessment should not impose his or her judgement of"adequate" risk taking. He or she are entitled, however, to assist in the construction of anadequate utility function and in the elicitation of prior beliefs as well as the calculation ofexpected utilities. In this context the role of assessment is more than data analysis; it includesthe analysis of beliefs and attitudes towards risk.Chapter 5. Discussion and Conclusion^ 124Future WorkThere are many aspects of the Bayesian method where practical difficulties remain and wheremore work is needed. Some are general: elicitation of priors and utilities or the problem ofpublic decision (i.e., more than one decision maker). Others are specific: the extension to otherclasses of fishery models and/or importance functions. I shall comment on each separately:Elicitation: In his description of the importance of probabilities as the adequate measurementof uncertainty, Lindley [31] uses an analogy in geometry. He states that in the same wayas the laws of geometry have to be satisfied in the construction of a building, the laws ofprobability are to be satisfied if a coherent description of uncertainties is to be achieved.Pursuing the analogy further, he compares the tools to measure lengths and angles tomethods for elicitation of priors. Such methods might be simple or sophisticated, inac-curate or precise. The decision maker, likewise the architect, will need tools to make therequired measurements. Much needs to be done to develop better tools for measurementof uncertainty. Nevertheless, concludes Lindley, while technological applications are stilldeficient, the scientific principles are clear.Public Decision: In the process of decision making it is usually a committee that has to arriveat a consensus, each committee member having his or her own utility function and prioruncertainty. All analyses considered throughout previous chapters made the implicitassumption of one decision maker only. What can we do when beliefs and interestsof different individuals have to be used ? The theory for dealing with group decision(Weerahandi and Zidek [73], Zidek [81], de Waal et al. [9]) was developed to address thisissue and can be tried out here.Fishery Models: Surplus-production models were central to this work. I commented thatthere was no conceptual difficulty in extending the procedure to more elaborate alterna-tives. In fact, a standard delay-difference model was used in the case-study for Pacificcod. We have also seen throughout that, given the common limitations in quantity andChapter 5. Discussion and Conclusion ^ 125quality of fishery data, simpler (i.e., less parameterized) surplus-production models tendto be more robust to correctly identify the best policy choice. But if we can improve priorelicitation of parameters in the more demanding delay-difference models, the potentialgain is considerable. To train fishery biologists for a better prior elicitation might includethe following steps:1. Acquaint the elicitors with probabilities and its role in describing uncertainties.2. Generate artificial data using detailed and realistic models.3. Get elicitation of parameters for specified models and based on the artificial datafrom previous step.4. Perform analysis to identify failures of model and elicitation. Try to answer questionslike:• Which parameters are more critical ?• Which models are more robust ?• Can elicitors learn from past mistakes ?The extension to nonstationary models (i.e. time-dependent Ot) is another promising areafor further research, as is the search for alternative mixture models which enable to workwith strictly positive variables directly (i.e., no need to use log-transformations).ConclusionActive adaptive strategies are the essence of modern fishery management. Such experimentalapproach can increase our understanding (description) of the system and improve our capabili-ties for prediction. But, it needs the features of Bayesian analysis to determine how uncertaintiespropagate in time as new evidence becomes available through new data.Policy choices need to be made, independent of the progress in adaptive management prac-tices. Bayesian analysis provides coherent prescriptions of how such choices are to be made. "Inall cases where details are available", says Wilimovsky [78] in (non-Bayesian) discussion aboutChapter 5. Discussion and Conclusion^ 126fishery management, "the output of formal predictions is modified by judgment. The method-ology of judgment is not formalized so that it is unlikely that two or more individuals wouldarrive at the same management recommendation or even have the ability to trace the other'slogic".Appendix APseudo-code for Adaptive Importance SamplingThis appendix presents a pseudo-code which describes the implementation of the adaptive im-portance sampling (AIS) of Chapter 2. In particular, it refers to the procedures describedin Section 2.3. For concreteness, the code considers a fixed number of updating steps (mod-ifications for an interactive procedure are easy to implement in principle) and assumes twotime-series of data (not necessarily of the same length): (i) total catches and (ii) some abun-dance index. I use the fitting procedure described in Section 3.3 as "Model 2" and take thefunctions G(-) and F(.) to be known.I shall specify some notation to be used in the pseudo-code:• A vector (al , • • -,ap) of order P, will be denoted a[1...13]. A single element ai from thisvector is identified as a [i.] .• A matrix (N x P) with N lines and P columns with its elements given by fai,i : j =1,• • ,N;i =1,- • ,P}, will be denoted a [1 . . .N, 1. . .13] . A single element^is identifiedas a [j , i].• The j-th line of the above matrix — the vector of order P given by ^• • ai,p1 —is denoted a [j , 1. .13] or simply a [j ,*] . Any operation performed on such a line j isunderstood to be applied to each of its P elements individually.• Let S be a symmetric positive-definite matrix of order P. I define a vector T of orderK = P*(P -F 1)/2, containing the elements of the lower triangle of S, taken by row. Thatis, for each element (Si,i : j > i) define r = (j(j — 1)12) i to getT,. =127Appendix A. Pseudo-code for Adaptive Importance Sampling ^ 128Covariance matrices will be stored in the form T rather than S throughout the code.• The different steps in the pseudo-code are numbered. Global numbers (1, 2, etc.) if read inisolation, give a summary description of the sequential structure of the algorithm. Lowerlevels of numbering (2.1, 2.1.1, etc.) give details about how the task of '2' is accomplished.• The conditional operator IF (condition) GOTO (x .x) is to be interpreted as follows:1. Verify condition.2. If condition is TRUE, than go to line number x.x.3. If condition is FALSE, than go to next line.The description of the code is sub-divided into three parts: (I) - a "list of variables" forwhich the values need to be specified in advance; (II) - The pseudo-code for the main program;(III) - Pseudo-code describing subroutines used in (II).I - LIST OF VARIABLES:P: dimension of the parameter vector 'Theta'.K = ( P*(P+1)/2 ): the dimension of a vector giving thelower triangle of PxP Covariance matrix.G: number of iterations in the AIS procedure.NO: size of sample to be drawn from prior.N[1...G]: sample size to be drawn in each iteration.M: number of Student distributions to be used in the clusteredmixture model.DFO: degrees of freedom in prior Student distribution.DFk: degrees of freedom of the Student kernel in the mixturemodel.H: the bandwidth of kernel variance.Appendix A. Pseudo-code for Adaptive Importance Sampling^ 129DATA_C: number of observed catch data.DATA_A: number of observed abundance index data.('DATA_A' issmaller or equal to 'DATA_C)THETA_MEAN[1...P]: prior mean for 'Theta'.THETA_VAR[1...K]: lower triangle of the prior covariance matrixfor 'Theta'.ALPHAO and BETAO: parameters for Inverse Gamma prior.CATCH[1...DATA_C]: time-series of observed catches.ABUN[1...DATA_A]: time-series of observed abundance index data.II - PSEUDO-CODE:1: Draw a sample of size NO from the prior distribution and storethe results in THETA[1...N0,1...P].1.1: Perform a "Choleski Decomposition" (see final remarks) ofTHETA_VAR[1...K] and put the result in CHOLESKI[1...K].1.2: For j=1 to j=NO1.2.1: Draw a random vector from a "Multivariate StudentDistribution" (see final remarks) using as inputTHETA_MEAN[1...P], CHOLESKI[1...K] and DFO.1.2.2: Store the output in THETA[j,*].Next j2: Calculate the likelihood for ABUN[1...DATA_A] for eachgiven value of THETA[1...N0,*] and store the normalizedAppendix A. Pseudo-code for Adaptive Importance Sampling ^ 130results in OMEGA[1...N0].2.1: For i=1 to DATA_A2.1.1. Get Y[i] = ln( ABUN[i] )Next i2.2: Initialize SUM = 02.3: For j=1 to NO2.3.1: Get the time series of deterministic averagebiomass dynamics by using THETA[j,*], ALPHAO,BETA0 and CATCH[1...DATA_C] in the assumedbiomass model (G(.) and F(.)). Store therelevant values in MU[1...DATA_A].2.3.2: Store the likelihood value in OMEGA[j].2.3.3: SUM = SUM + OMEGA[j]Next j2.4: For j=1 to NO2.4.1: OMEGA[j] = OMEGA[j]/SUMNext j3: Calculate the "relative entropy" of the prior and store it asEO or send it to the output device.3.1: Initialize ENTROPY = 03.2: For j=1 to NO3.2.1: ENTROPY = ENTROPY + ( OMEGA[j]*ln(OMEGA[j]) )Next j3.3: EO = -ENTROPY/ln(NO)3.4: Send 'EO' to output device.Appendix A. Pseudo-code for Adaptive Importance Sampling ^ 1314: Perform the G recursions of the adaptive importance samplingprocedure. In the process keep track of the entropies E[1...G].Store the final values THETA[1...N[G]] and OMEGA[1...N[G]] forfurther use.4.1: Get the "bandwidth" for kernel variance and store theoutcome in WINDOW.4.1.1: X1 = (4/(1+2P))-(1/(1+4P))4.1.2: X2 = M-(1/(1+4P))4.1.3: WINDOW = H*(X1*X1)/(X2*X2)4.2: For k=1 to G4.2.1: IF ( k>1 ) GOTO (4.2.5)4.2.2: Use current THETA[1...NO,*] and OMEGA[1...NO]to get Monte Carlo mean and covariance. Store themean as KER_MEAN[1...P] and the lower triangle ofthe covariance matrix as KER_VAR[1...K].4.2.3: Use the subroutine "CLUSTERO" (described below)and the input of THETA[1...NO,*] andOMEGA[1...NO] to get the smaller setsT_CL[1...M,*] and 'O_CL[1...M].4.2.4: Go to (4.2.7)4.2.5: Use current T_OLD[1...N[k],*] andO_KER[1...N[k]] to get Monte Carlo mean andcovariance. Store the mean as KER_MEAN[1...P] andthe lower triangle of the covariance matrix asKER_VAR[1...K].4.2.6: Use the subroutine "CLUSTER()" and the input ofT_OLD[1...N[k],*] and 0_KER[1...N[k]] to getAppendix A. Pseudo-code for Adaptive Importance Sampling ^ 132T_CL[1...M,*] and O_CL[1...M].4.2.7: For 1=0 to K*.1: KER_VAR[i]=WINDOW*KER_VAR[i]Next i4.2.8: Perform a "Choleski Decomposition" ofKER_VAR[1...K] and put the result inKER_CHOL[1...K].4.2.9: For j=1 to N[k]*.1:Draw a random integer from the set {1,...,14}according to probabilities given byO_CL[1...M]. Store it in A.*.2:Draw a random vector from a "MultivariateStudent distribution" using as inputT_CL[A,*], KER_CHOL[1...,K] and DFK.*.3: Store the output in THETA_KER[j,1...P].Next j4.2.10: Calculate the weights of the random sampleTHETA_KER[1...,N[k],*] which was drawn from thecurrent "mixture model". Store this value inO_KER[1...N[k]].*.1:Initialize SUM=0*.2:For j=1 to N[k]*.2.1: Calculate the "density" ofTHETA_KER[j,*] for a P-dimensionalStudent distribution with parametersTHETA_MEAND, THETA_VARD and DFO.Store the result in PRIOR.Appendix A. Pseudo-code for Adaptive Importance Sampling ^ 133*.2.2: Initialize MIXTURE = 0*.2.3: For 1=1 to M*.2.3.1: Calculate the "density" ofTHETA_KER[j,*] for aP-dimensional Studentdistrib. with parametersT_CL[i,*], KER_VARD and DFK.Store the result in KERNEL.*.2.3.2: KERNEL = O_CL[i]*KERNEL*.2.3.3: MIXTURE = MIXTURE + KERNELNext i*.2.4: Get the time series of averagebiomass dynamics using THETA_KER[j,*],ALPHAO, BETAO, and CATCH[1...DATA_C]in equation G(.).Store the mean values inMU[1...DATA_A] and get the likelihoodLIK for data ABUN[1...DATA_A].*.2.5: F = LIK*PRIOR*.2.6: O_KER[j] = F/MIXTURE*.2.7: SUM = SUM + O_KER[j]Next j*.3:For j=1 to N[k]*.3.1: O_KER[j] = O_KER[j]/SUMNext j*.4:Calculate the "relative entropy" of thecurrent "mixture model" and store it asAppendix A. Pseudo-code for Adaptive Importance Sampling^ 134E[k] or send it to output.*.4.1: Initialize ENTROPY = 0*.4.2: For j=1 to N[k]*.4.2.1: ENTROPY = ENTROPY +0_kER[j]*ln(O_KER[j]).Next j*.4.3: E[k] = -ENTROPY/ln(N[k])*.4.4: Send E[k] to output.*.5:IF (k=G) GOTO (*.8)*.6:For j=1 to N[k]*.6.1: T_OLD[j,*] = THETA_KER[j,*]Next j*.7:Go to (4.2.1)*.8:For j=1 to N[G]*.8.1: THETA[j,*] = THETA_KER[j,*]*.8.2: OMEGA[j] = O_KER[j]Next jNext k5: Use THETA[1...N[G],*] and OMEGA[1...N[G]] to get Monte Carlomean and covariance. Store the mean in POST_MEAN[1...P] andthe lower triangle of the covariance matrix asPOST_VAR[1...K]. Send this values to the output device forfurther use.6: (Optional): Get Monte Carlo estimates for mean and variance offunctions "H(Theta)".Appendix A. Pseudo-code for Adaptive Importance Sampling ^ 1357: Use THETA[1...N[G],*] and OMEGA[1...N[G]] in the subroutine"CLUSTER()" to get the final "mixture model". Store these valuesas T_CL[1...M,*] and O_CL[1...11] and keep them in a File forfurther use.III - SUB-ROUTINES:CLUSTER(): This subroutine performs the clustering algorithm ofsection 2.4 (Chapter 2).Input: - The dimension P of the parameter vector.- A positive integer N denoting the size of the data set.- A positive integer M (M < N) defining the desired size ofthe data set after clustering.- A data matrix THETA[1...N,1...P]_ A vector of weights OMEGA[1...N]Output: - A matrix T_CL[1...M,1...P]_ A vector O_CL[1...M]1: Initialize D=N2: Make a copy of OMEGAD. Also createa vector of identification indices.2.1: For j=1 to N2.1.1: O_COPY[j] = OMEGA[j]Appendix A. Pseudo-code for Adaptive Importance Sampling^ 1362.1.2: ID[j] = jNext j3: Sort (see final remarks about "sorting") the vector0_COPY[1...N] into ascending numerical order andmake the corresponding rearrangement ofID[1...N].4: Sort the values in T_COPY[1...N,*] to agree with theordered vector O_COPY[1...C.4.1: For j=1 to N4.1.1: S = ID[j]4.1.2: T_COPY[j,*] = THETA[S,*]Next j5: IF ( D=M ) GOTO (7)6: Reduce the data set by one unit, by merging the first casewith its "nearest neighbour" and reallocating the new valuewithin the sorted set.6.1: For j=2 to D6.1.1: Calculate the "distance" between T_COPY[1,*] andT_COPY[j,*] and store its value in DIST.6.1.2: IF ( j>2 ) GOTO (6.1.5)6.1.3: Initialize: DMIN = DISTWEIGHT = O_COPY[2]S = 2Appendix A. Pseudo-code for Adaptive Importance Sampling ^ 1376.1.5: IF ( DIST>=DMIN ) GO TO (6.1.9)6.1.6: DMIN = DIST6.1.7: WEIGHT = O_COPY[j]6.1.8: S = jNext j6.2: O_TEST = O_COPY[1] + O_COPY[S]6.3: For i=1 to P6.3.1: T_TEST[i] = (0_COPY[1]*T_COPY[1,i] +O_COPY[S]*T_COPY[S,i])/O_TESTNext i6.4: For j=2 to D-16.4.1: IF ( j>=S ) GOTO (6.4.5)6.4.2: O_C0PY[j-1]=0_C0PY[j]6.4.3: T_COPY[j-1,*]=T_COPY[j,*]6.4.4: Go to (6.4.13)6.4.5: IF ( O_TEST<O_COPY[j+1] ) GOTO (6.4.9)6.4.6: O_COPY[j-1] = 0_COPY[j+1]6.4.7: T_COPY[j-1,*] = T_COPY[j+1,*]6.4.8: Go to (6.4.13)6.4.9: O_COPY[j-1] = O_TEST6.4.10: O_COPY[j] = 0_COPY[j+1]6.4.11: T_COPY[j-1,1...P] = T_TEST[1...P]6.4.12: T_COPY[j,*] = T_COPY[j+1,*]Next j6.5: D = D-16.6: Return to (5)Appendix A. Pseudo-code for Adaptive Importance Sampling^ 1387: Store T_COPY [1 . . . M , *] and O_COPY [1. . .M] in the arrays definedfor output.7.1: For j=1 to M7 .1. 1: O_CL [j] = O_COPY [j]7 .2 . 2 : T_CL [j , *] = T_COPY [j , *]Final RemarksCholeski Decomposition: The Choleski decomposition of a covariance matrix E results in alower-triangular matrix L satisfyingz . LLTSuch decomposition is helpful here in two ways: (i) to perform random draws from amultivariate distribution with covariance matrix E and (ii) to calculate the inverse matrixE-1. For further details and algorithms I suggest Ripley [53] and Nash [44].Student Distribution: In order to generate random draws from uni- and multidimensionalStudent densities, algorithms can be found in Ripley [53].Sorting: To perform the "sorting" in the CLUSTER() subroutine, I used the "Heapsort"algorithm in Press et al. [52].Bibliography[1] M. Aoki. State space modelling of time series. Springer Verlag, 1987.[2] J. 0. Berger. Statistical decision theory and Bayesian analysis. Springer Verlag, 1985.[3] G. E. P. Box and G. M. Jenkins. Time Series Analysis: Forecasting and Control. Holden-Day, revised edition, 1976.[4] P. J. Brockwell and R. A. Davis. Time series: Theory and Methods. Springer Verlag, 1987.[5] A. Charles. In-season fishery management: a Bayesian model. Nat. Res. Mod., 2:457-498,1988.[6] C. A. Clark. Bioeconornic Modelling and Fisheries Management. John Wiley & Sons Inc.,1985.[7] C. A. Clark. Mathematical Bioeconomics: the optimal management of renewable resources.John Wiley & Sons Inc., 2nd edition, 1990.[8] B. de Finetti. Probability, induction and statistics. John Wiley & Sons, 1972.[9] D. de Waal, P. Groenewald, D. van Zyl, and J. V. Zidek. Multi-Bayesian estimation theory.Statistics and Decisions, 03, 1985.[10] M. Delampady, I. Yee, and J. V. Zidek. Hierarchical Bayesian analysis of a discrete timeseries of Poisson counts. Technical Report 111, Dept. of Statistics - University of BritishColumbia, 1991.[11] M. Delampady, I. Yee, and J. V. Zidek. A smoothness-prior model for discrete time series.Technical Report 110, Dept. of Statistics - University of British Columbia, 1991.[12] R. B. Deriso. Harvesting strategies and parameter estimation for an age-structured model.Can. J. Fish. Aquat. Sci., 37:268-282, 1980.[13] R. B. Deriso and A. M. Parma. Dynamics of age and size for a stochastic populationmodel. Can. J. Fish. Aquat. Sci., 45:1054-1068, 1988.[14] B. Efron. Non parametric estimates of standard error: the jackknife, the bootstrap andother methods. Biometrika, 68:589-599, 1981.[15] B. Efron. Bootstrap confidence intervals for a class of parametric problems. Biometrika,72:45-58, 1985.[16] D. A. Fournier and I. J. Doonan. A length-based stock assessment method utilizing ageneralized delay-difference model. Can. J. Fish. Aquat. Sci., 44:422-437, 1987.139Bibliography^ 140[17] D. A. Fournier and A. Warburton. Evaluating fisheries management models by simulatedadaptive control — Introducing the composite model. Can. J. Fish. Aquat. Sci., 46:1002—1012, 1989.[18] R. I. C. C. Francis. Use of risk analysis to assess fishery management startegies: a casestudy using Orange Roughy (Hoplostehus atlanticus) on the Chatham Rise, New Zealand.Can. J. Fish. Aquat. Sci., 49:922-930, 1992.[19] S. M. Fried and It. Hilborn. Inseason forecasting of Bristol Bay, Alaska sockeye salmon(Oncorhynchus nerka) abundance using Bayesian probability theory. Can. J. Fish. Aquat.Sc., 45:850-855, 1988.[20] H. J. Geiger and J. P. Koenings. Escapement goals for sockeye salmon with informativeprior probabilities based on habitat considerations. Fish. Res., 11:239-256, 1991.[21] A. E. Gelfand and A. F. M. Smith. Sampling based approaches to calculating marginaldensities. J. Am. Stat. Assoc., 85:398-409, 1990.[22] S. Geman and D. Geman. Stochastic relaxation, Gibbs distribution and the Bayesianrestoration of images. IEEE Trans. Pattn. Anal. Mach. Intell., 6:721-741, 1984.[23] J. Geweke. Bayesian inference in econometric models using Monte Carlo integration.Econometrica, 57:1317-1339, 1989.[24] It. Hilborn, E. K. Pikitch, M. M. McAllister, and A. Punt. A comment on "Use of riskanalysis to assess fishery management startegies; a case study using Orange Roughy, (ho-plosthethus atlanticus) on the Chatham Rise, New Zealand" by R.I.C.C. Francis. Can. J.Fish. Aquat. Sci., 1993. To appear.[25] It. Hilborn and C. J. Walters. Quantitative fisheries stock assessment: choice, dynamics6 uncertainty. Chapman and Hall, 1992.[26] N. L. Johnson and S. Kotz. Continuous univariate distributions. Houghton Mifflin, 1970.[27] G. Kitagawa. Non-gaussian state-space modeling of nonstationary time series. J. Amer.Stat. Assoc., 82:1032-1050, 1987.[28] T. Kloek and H. K. van Dijk. Bayesian estimates of equation system parameters: anapplication of integration by Monte Carlo. Econometrica, 46:1-19, 1978.[29] P. M. Lee. Bayesian statistics: an introduction. Oxford University Press, 1989.[30] D. V. Lindley. Approximate Bayesian methods. In J. M. Bernardo et al., editor, BayesianStatistics 2, pages 223-245. Valencia University Press, 1980.[31] D. V. Lindley. Making Decisions. John Wiley & Sons, 2nd edition, 1985.[32] H. Linhart and W. Zucchini. Model Selection. John Wiley & Sons, Inc., 1986.Bibliography^ 141[33] D. Ludwig. Small models are beautiful: efficient estimators are even more beautiful. InSpringer Verlag, editor, Unknown, 1988. Conference held at Cornell University.[34] D. Ludwig and R. Hilborn. Adaptive probing strategies for age-structured fish stocks.Can. J. Fish. Aquat. Sci., 40:559-569,1983.[35] D. Ludwig, R. Hilborn, and C. J. Walters. Uncertainty, resource exploitation, and conser-vation: Lessons from History. Science, 260:17,1993.[36] D. Ludwig and C. J. Walters. Optimal harvesting with imprecise parameter estimates.Ecol. Model., 14:273-292,1982.[37] D. Ludwig and C. J. Walters. Are age structured models appropriate for catch-effort data?Can. J. Fish. Aquat. Sci., 42:1066-1072,1985.[38] D. Ludwig and C. J. Walters. A robust method for parameter estimation from catch andeffort data. Can. J. Fish. Aquat. Sci., 46:137-144,1989.[39] D. Ludwig, C. J. Walters, and J. Cooke. Comparison of two models and two estimationmethods for catch and effort data. Nat. Res. Model., 2:457-498,1988.[40] M. Mangel and C. W. Clark. Uncertainty, search and information in fisheries. J. Cons.Ciem., 41:93-103,1983.[41] P. McCullagh and J. A. Nelder. Generalized linear models. Chapman and Hall, 2nd edition,1989.[42] R. Mendelssohn. Some problems in estimating population sizes from catch-at-age data.Fishery Bulletin, 86:617-630,1988.[43] T. S. Nakamura, S. Ohnishi, and Y. Matsumiya. A Bayesian cohort model for catch-at-agedata obtained from research takes of whales. Rep. Int. Whal. Comm., 39:375-382,1989.[44] J. C. Nash. Compact Numerical Methods for Computers. Adam Hilger Ltd., 1979.[45] J. C. Naylor and A. F. M. Smith. Econometric illustrations of novel numerical integrationstrategies for Bayesian inference. J. of Econometrics, 38:103-125,1988.[46] M. S. Oh and J. 0. Berger. Adaptive importance sampling in Monte Carlo integration. J.Statist. Computn. Simuln., 1991. (to appear).[47] A. M. Parma. Optimal harvesting of fish populations with non-stationary stock-recruitment relationships. Nat. Resource Mod., 4:39-76,1990.[48] J. J. Pella and P. K. Tomlinson. A generalized stock production model. Bull. Inter-Am.Trop. Tuna Comm., 13:419-496,1969.[49] A. Pole. Efficient Bayesian learning in non-linear dynamic models. Technical report, Dep.of Statistics — University of Warwick, 1988.Bibliography^ 142[50] A. Pole and M. West. Efficient Bayesian learning in non-linear dynamic models. J. Fore-casting, 9:119-136, 1990.[51] S. J. Press. Bayesian statistics: principles, models and applications. John Wiley & Sons,1989.[52] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical recipies inC. Cambridge University Press, 1988.[53] B. Ripley. Stochastic Simulation. John Wiley Si Sons, 1987.[54] D. B. Rubin. Comment on "The Calculation of posterior distributions by data augmenta-tion", by M.A.Tanner and W.H.Wong. J. Am. Statist. Assoc., 82:543-546, 1987.[55] L. J. Savage. The foundations of statistics. John Wiley Si Sons, 1954.[56] J. Schnute. A general theory for analysis of catch and effort data. Can. J. Fish. Aquat.Sci., 42:414-429, 1985.[57] J. Schnute. A general fishery model for size-structured fish population. Can. J. Fish.Aquat. Sci., 44:924-940, 1987.[58] J. Schnute. The importance of noise in fish population models. Fisheries Research,11:197-223, 1991.[59] B. W. Silverman. Density estimation for statistics and data annalysis. Chapman and Hall,1986.[60] A. F. M. Smith. Bayesian computational methods. Phil. Trans. R. Soc. Lond. A, 337:369-386, 1991.[61] A. F. M. Smith and A. E. Gelfand. Bayesian statistics without tears: a sampling-resampling perspective. The Am. Statistician, 46:84-88, 1992.[62] A. F. M. Smith and G. 0. Roberts. Bayesian computation via the Gibbs sampler andrelated Markov ChainMomte Carlo methods. J. R. Statist. Soc. B, 55:3-23, 1993.[63] A. F. M. Smith, A. M. Skene, J. E. H. Shaw, and J. C. Naylor. Progress with numericaland graphical methods for practical Bayesian statistics. The Statistician, 36:75-82, 1987.[64] H. W. Sorenson. Recursive estimation for nonlinear dynamic systems. In J. C. Spa11, editor,Bayesian Analysis of time-series and dynamic models, pages 127-165. Marcel Dekker, 1988.[65] A. M. Starfield and A. L. Bleloch. Building models for conservation and wildlife. McMillanPub. Co., 1986.[66] P. J. Sullivan. A Kalman-Filter approach to catch-at-length analysis. Biometrics, 48:237-257, 1992.Bibliography^ 143[67] L. Tierney and J. B. Kadane. Accurate approximations for posterior moments. J. Amer.Statist. Assoc., 81:82-86, 1986.[68] C. J. Walters. Adaptive management of renewable resources. McMillan Pub. Co., 1986.[69] C. J. Walters. Nonstationarity of production relationships in exploited populations. Can.J. Fish. Aquat. Sci., 44:156-165, 1987. Suppl. 2.[70] C. J. Walters and J. S. Collie. Design of an experimental program for groundfish manage-ment in the face of large uncertainty about stock size and production. In R. J. Beamishand G. A. MacFarlane, editors, Effects of Ocean Variability on Recruitment and an Eval-uation of Parameters used in Stock Assessment Models., pages 13-25. Canadian SpecialPublications in Fisheries and Aquatic Sciences, 1989. No. 108.[71] C. J. Walters and It. Hilborn. Ecological optimization and adaptive management. Ann.Rev. EcoL Syst., 9:157-188, 1978.[72] C. J. Walters and D. Ludwig. Effects of measurement errors on the assessment of stock-recruitment relationships. Can. J. Fish. Aquat. Sci., 38:704-710, 1981.[73] S. Weerahandi and J. V. Zidek. Elements of multi-Bayesian decision theory. Ann. Statist.,11:1032-1046, 1983.[74] M. West. Mixture models, Monte Carlo, Bayesian updating and dynamic models. Discus-sion paper, ISDS — Duke University, 1992.[75] M. West. Modelling with mixtures. In J.M. Bernardo, J.O. Berger, A.P. Dawid, andA.F.M. Smith, editors, Bayesian Statistics .4, pages 503-524. Oxford University Press,1992.[76] M. West. Approximating posterior distributions by mixtures. J. R. Statist. Soc. B, 55:409—422, 1993.[77] M. West and J. Harrison. Bayesian forecasting and dynamic models. Springer Verlag,1989.[78] N. J. Wilimowsky. The need for formalization of decision algorithms and risk levels infishery research and management. Can. J. Fish. Aquat. Sci., 42:258-262, 1985. Suppl. 1.[79] R. L. Wolpert. Monte Carlo integration in Bayesian statistical analysis. Contemp. Math.,115(101-116), 1991.[80] J. E. Zeh, A. E. Raftery, and Y. Qing. Assessment of tracking algorithm performance andits effect on population estimates using bowhead whales, Balaena mysticetus, identifiedvisually and acoustically in 1986 off Point Barrow, Alaska. Rep. Int. Whaling Comm.,40:411-421, 1990.[81] J. V. Zidek. Multi-Bayesianity. Technical Report 05, Dept. of Statistics — University ofBritish columbia, 1984.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Bayesian statistics for fishery stock assessment and...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Bayesian statistics for fishery stock assessment and management Kinas, Paul G. 1993-09-10
pdf
Page Metadata
Item Metadata
Title | Bayesian statistics for fishery stock assessment and management |
Creator |
Kinas, Paul G. |
Date Issued | 1993 |
Description | This work is about the use of Bayesian statistics in fishery stock assessment and management. Multidimensional posterior distributions replace classical parameter estimation in surplus-production and delay-difference models. The maximization of expected utilities replaces the estimation of optimal policies. Adaptive importance sampling is used to obtain approximations for posterior distributions. The importance function is given as a finite mixture of heavy-tailed Student distributions. The performance of the method is tested in five case-studies, two of which use data simulation. Real data refer to Skeena river salmon (Oncorhynchus nerka), Orange Roughy (Hoplostethus atlanticus) and Pacific cod (Gadus macrocephalus). The results show that the technique successfully approximates posterior distributions in higher dimensions even if such distributions are multimodal. When comparing models in terms of their performance as management tools, simpler and less realistic models can do better than more sophisticated alternatives. The Bayesian approach also sheds new light on the controversy about the Orange Roughy fishery. |
Extent | 5815821 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2008-09-10 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0086451 |
URI | http://hdl.handle.net/2429/1849 |
Degree |
Doctor of Philosophy - PhD |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 1993-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_1993_fall_phd_kinas_paul.pdf [ 5.55MB ]
- Metadata
- JSON: 831-1.0086451.json
- JSON-LD: 831-1.0086451-ld.json
- RDF/XML (Pretty): 831-1.0086451-rdf.xml
- RDF/JSON: 831-1.0086451-rdf.json
- Turtle: 831-1.0086451-turtle.txt
- N-Triples: 831-1.0086451-rdf-ntriples.txt
- Original Record: 831-1.0086451-source.json
- Full Text
- 831-1.0086451-fulltext.txt
- Citation
- 831-1.0086451.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0086451/manifest