BAYESIAN STATISTICS FOR FISHERY STOCK ASSESSMENT AND MANAGEMENT By Paul G. Kinas B. Sc. (Oceonology) Universidade do Rio Grande M. Sc. (Statistics) Universidade de Sao Paulo A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES DEPARTMENT OF STATISTICS We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA 1993 © Paul G. Kinas, 1993 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of Statistics The University of British Columbia 2075 Wesbrook Place Vancouver, Canada V6T 1W5 Date:^ /2 O c k)_zr^3 Abstract This work is about the use of Bayesian statistics in fishery stock assessment and management. Multidimensional posterior distributions replace classical parameter estimation in surplusproduction and delay-difference models. The maximization of expected utilities replaces the estimation of optimal policies. Adaptive importance sampling is used to obtain approximations for posterior distributions. The importance function is given as a finite mixture of heavy-tailed Student distributions. The performance of the method is tested in five case-studies, two of which use data simulation. Real data refer to Skeena river salmon (Oncorhynchus nerka), Orange Roughy (Hoplostethus atlanticus) and Pacific cod (Gadus macrocephalus). The results show that the technique successfully approximates posterior distributions in higher dimensions even if such distributions are multimodal. When comparing models in terms of their performance as management tools, simpler and less realistic models can do better than more sophisticated alternatives. The Bayesian approach also sheds new light on the controversy about the Orange Roughy fishery. 11 Table of Contents Abstract^ ii iii Table of Contents^ v List of Tables^ viii List of Figures^ Acknowledgement^ x 1 1 2 Introduction 1 1.1 Motivation 1.2 Uncertainties, Models and Learning ^ 4 1.3 The Bayesian Method ^ 7 1.4 Summary of Contents ^ 13 ^ Bayesian Posterior Computation 15 2.1 Introduction ^ 15 2.2 Monte Carlo Integration with Basic Importance Sampling (BIS) ^ 18 2.3 Adaptive Importance Sampling (AIS) and Mixture Models ^ 21 2.3.1^An Example 26 2.4 2.5 ^ Sequential Analysis ^ 32 2.4.1^AIS in Sequential Models ^ 36 2.4.2^An Example 38 ^ Summary ^ 42 in 3 Models for Biomass Dynamics ^ 43 3.1 Introduction ^ 43 3.2 Model 1 ^ 47 3.2.1 Two Lemmas ^ 49 3.2.2 Catch and Effort Data ^ 53 3.3 Model 2 ^ 58 3.4 Delay-Difference Models ^ 63 3.5 Summary ^ 66 4 Applications^ 67 4.1 Introduction ^ 67 4.2 Stock and Recruitment: sockeye salmon ^ 68 4.3 Surplus Production: Simulation Model ^ 74 4.4 Surplus Production: Orange Roughy ^ 91 4.5 Multimodality: Simulation Model ^ 103 4.6 Multimodality: Pacific Cod ^ 107 4.7 Final Remarks ^ 113 5 Discussion and Conclusion ^ 119 Appendices^ 126 A Pseudo-code for Adaptive Importance Sampling^ 127 Bibliography^ 139 iv List of Tables 2.1 Data from McCullagh and Nelder [41] (Sec 9 3 3) ^ 27 2.2 List of parameters used in the simulation model for artificial data generation . . 39 2.3 Output from simulation. The harvest rates ht were fixed in advance. Bt is the Biomass at the start of year t and h is the corresponding observed abundance index. Ot and Yt are the natural logarithm of Bt and h, respectively. ^ 39 4.4 Some biologically meaningful attributes of the parameters in the Ricker and Beverton-Holt models. ^ 69 4.5 Five quantiles used to determine prior distribution for parameters a and b (the subscript according to the model: Ricker or Beverton-Holt). 'm' and 'C' are mean and variance used in Student prior. 70 4.6 Results for AIS applied to the Skeena river sockeye salmon data under various assumptions. The first number is the posterior mean; the posterior standard deviation is given in brackets. Columns R* and BH* are the estimates of Hilborn and Walters [25] 72 4.7 Posterior mean and Covariance for different Models fitted to the Skeena river sockeye salmon data ^ 72 4.8 Specification of the three simulation models used to generate catch and effort data. 76 4.9 Specification of 16 models used to fit catch and effort data. 'SIM' indicates the simulation model. 'FIT' gives the fitting model. 'Eff' specifies assumed effort sequence. A is the assumed choice for A. An omitted entrance implies the repetition from the line above. v 81 4.10 Posterior expectation (standard error) for the biases in Sopt, the bias in Eopt, and the global variance (72 = Vp -I- V ^ 85 4.11 The Bayes rule EB for different cases is displayed together with its "true" alternative [value in square brackets]. Columns 2 to 4 specify model assumptions. All 6 runs use SLIMIT-0 3 89 4.12 Estimated catch history, and trawl survey biomass for Orange Roughy (Hoplostethus atlanticus) for the period 78/79 to 88/90. Data from Francis [18] 93 4.13 Model specifications (columns 2 and 3), final entropy (Ent.) and posterior estimates for virgin biomass B1 for Orange Roughy (values in 1000 t) and the probability that B1 exceeds 500 thousand t. Values in brackets are estimates for posterior standard error. Run 'la' uses assumption (a) of Lemma 3.1.; the remaining runs use assumption (b). 'D' denotes diffuse prior and 'I' the informative alternative. 95 4.14 Risk-averse utilities cl, associate to the five possible cumulative discounted catch COO ^ 100 4.15 Recommended decisions ai (rate of TAC reduction in thousand t/year), which maximize the expected posterior utilities, using three alternative utility functions 102 4.16 Expected utilities obtained for all 5 decision rules, according to the form of the utility function (expressed by 0) ^ 102 4.17 Parameters which are fixed in the delay-difference model for Pacific cod (Gadus macrocephalus) ^ 109 4.18 Quantiles specified for the parameters. The mean m and variance C are parameters for the corresponding Gaussian distribution for the logarithm of the parameter. 110 4.19 Monte Carlo estimates of posterior mean (standard deviation) for diffuse and informative priors. The entropies are for the final importance function. ^ 113 vi 4.20 Monte Carlo estimates of moments of the posterior distribution p(0 I Y(r)) for Pacific cod, given diffuse and informative priors. The values mo, ml and m2 refer to In b, In q, and In B1 respectively. The values Cij are the corresponding covariances vii 116 ^ List of Figures 2.1 Contour plot for posterior distribution of (pi,p2). Contour lines are for .05, .10, .25, .50, .75, .90, .95 of largest observed density 30 2.2 (a) Marginal posterior for pi (full line) and p2 (broken line). (b) Density estimates for (p2 —p1) for 4000 sample points (full line) and the 500 values remaining after clustering (dotted line) 31 2.3 Prior (predictive) and posterior (filtered) distributions are shown respectively by dotted and full lines, for times t = 1 to t = 6. `v' is the observation Yt and `+' indicates the "true" et 40 4.4 Stock and Recruitment data for Skeena river sockeye salmon (Reproduced from Hilborn and Walters [25] ). The lines were fitted according to Ricker model (broken) and Beverton-Holt model (continuous) using means of posterior distribution. 70 4.5 (a) and (b): Contour plots of posterior distributions for Rb and BHb (Lines at .01, .05, .10, .25, .75, .90, .95 of the largest density). (c) and (d): Approximate posterior densities for Smsy and umsy (dashed line: BHb; full and dotted lines: Ra and Rb) ^ 73 4.6 The two effort sequences used in the simulation models. ^ 78 4.7 Time-series of observed efforts (dotted line) and the resulting simulated catches (full line), for different models. ^ 79 4.8 Spawning stock 'S(t)' and the corresponding biomass `B(t-I-1)' for different simulation models. The line is the function GO from which the data were generated. 80 4.9 Box-plots for "relative entropy" for the sixteen fits to simulated catch and effort data ^ 84 viii 4.10 Contour plots of two-dimensional marginal posterior distributions for 'run 8'(lines at .05, .10, .25, .50, .75, .90, .95 of largest density). The dots indicate the true parameter used in the simulation model. In (d) a histogram of the 6000 Monte Carlo estimates for Eopt (the column to the far right includes all Eopt > 5). . 87 4.11 Expected posterior utilities over the set A of possible choices for effort. First column uses fitting model It2 and second column uses BH. Data from simulation models 'case 1', 'case 2' and 'case 3' were used in rows 1, 2 and 3. 89 4.12 Marginal posterior distributions for virgin biomass B1 for different runs: 'la'(line), '2'(dots) and '4' (broken line) ^ 96 4.13 Contour plots for two-dimensional marginal posterior distributions: 'run 1a':(a) and (b); 'run 2':(c) and (d); 'run 4':(e) and (f). Contour lines are drawn at .05, .10, .25, .50, .75, .90 and .95 of the highest density 97 4.14 Time series for Catch (Ct), observed effort (Es) and CPUE (Ii). Also a plot of `CPUE vs. E' for the final 13 years ^ 106 4.15 Some marginal posteriors for data simulated using the logistic biomass model and fitted by the mixture model described in text. The contours are at .01,.10,.20,.30,.40,.50,.70,.90 of the largest observed posterior density 108 4.16 Estimated posterior marginal densities for Pacific Cod (Gadus macrocephalus). The contours represent .05, .10, .25,.50, .75, .90 and .95 of the largest observed density. 112 4.17 Estimated posterior marginal densities for Pacific cod, using an informative prior for the catchability parameter q. The contours represent .05, .10, .25, .50, .75, .90 and .95 of the largest observed density. 114 4.18 Marginal posterior densities for: (a) In b, (b) ln B1. The full and dotted lines correspond to the "diffuse" and "informative" priors respectively ^ 115 ix Acknowledgement I am most grateful to my research supervisor Donald Ludwig for the many ideas and valuable advice; and to Colin W. Clark whose work in bioeconoraics triggered my interest in resource management. I wish to thank the people whose suggestions and comments about preliminary versions of this monograph have been most helpful: Mohan Delampady, Harry Joe, Marc Mangel, Jon Schnute, Al Tyler, Carl J. Walters, and James V. Zidek. I also wish to thank Antonio A. Loureiro for his willingness to help me over countless computing hurdles; and to Christine A. Adkins for agreeing to proofread the pseudo-code. Only the support from my wife Carmen Luiza and her patience made the journey possible. Our sons Bruno and Werner made the journey more enjoyable by always reminding me of recreation and fun. To the family I owe my greatest debt. Chapter 1 Introduction 1.1 Motivation This work is about the practical implementation of Bayesian statistical analysis for quantitative fisheries stock assessment and management. The basis of the study is the recognition that fisheries management is decision making in the presence of uncertainty. Different possible actions have to be compared according to measures of their consequences. Such analysis typically seeks a compromise between biological, economical and social components — see Section 1.2 — while traditional assessment concentrates on the biology of the stock. The Bayesian approach described in Section 1.3 expresses uncertainty as probabilities and uses utility theory to measure the consequences of different actions. Lindley [31] and Berger [2] give convincing arguments endorsing the claim that coherent and rational decision making in the presence of uncertainty must correspond to a Bayesian analysis. Hilborn and Walters [25] emphasize the importance of a reorientation of thinking within fishery sciences. I believe that the Bayesian paradigm to statistics has the ingredients to support such a philosophical change: (i) the consistent expression of uncertainties in the appropriate language of probability, and (ii) a coherent framework for decision making in the presence of uncertainties. Technical difficulties have hindered a wider use of the Bayesian approach in practical fishery problems. The adaptive importance sampling procedure described in Chapter 2 is designed to overcome some of these obstacles. In Chapter 3 I propose ways in which such procedure can be implemented for biomass dynamic models. I further illustrate the procedure for a variety of case-studies in Chapter 4 and make general comments and draw conclusions in Chapter 5. In fisheries sciences it is assumed that management actions are taken upon recommendations 1 Chapter 1. Introduction^ 2 of scientific assessment of the resource and its potential production. But criteria for real-world decisions need to include considerations of a political, social, economic, and cultural nature as well. Quantitative stock assessment is too often seen as synonymous with management, while in fact assessment and management serve distinct purposes: the former should produce useful advice which can effectively assist decision making which is responsibility of the latter. The conventional approach of producing a set of statistical point estimates has proved to be misleading for at least two reasons: (i) it ignores uncertainties about models and estimation and (ii) it does not tell how the results are to be used. The scientists in charge of assessment were expected to recommend such quantities as total fishing effort and catch quotas. This procedure improperly placed the judgment about risk-taking into the hands of technical advisers. In contrast, when producing an answer in form of a (posterior) probability distribution, the Bayesian approach gives a complete description of current beliefs and uncertainties. It becomes the responsibility of the manager to formulate an utility function which measures the consequences (biological, economical, social) of different actions. The assessment expert is no longer the decision maker, but an analyst of uncertainty. A dear indication of the appeal of Bayesian statistics in fisheries, is the considerable growth of the use of such methods in current fisheries literature (Ludwig and Walters [36], Mangel and Clark [40], Clark [6] [7], Walters [68], Charles [5], Fried and Hilborn [19], Deriso and Parma [13], Nakamura ei al. [43], Hilborn and Walters [25], Zeh et al. [80], Geiger and Koenings [20]). Fundamental characteristics of fisheries sciences are the lack of reliable controls and difficulty to obtain replicates. The possibility of performing planned experiments is severely limited. Studies of the resource, using data produced during the history of its exploitation, are mostly observational. In contrast, statistical science originally dealt with the collection of data and later moved to the analysis of (replicated) data obtained under controlled conditions. Classical frequentist probabilities involve the concept of a long sequence of repetitions for a given situation (the probability of that a fair coin comes up heads builds on such rationale). This (frequentist) view of probability is central to classical statistics. In fishery sciences, however, where often Chapter 1. Introduction^ 3 there are no replicates, the uncertainties about the effects of an action have a very different meaning. For example, consider the statement: "There is a probability of that the stock will collapse within five years if action A is taken today". The frequentist viewpoint does not apply here. The theory of subjective probability has been created to enable one to talk about probabilities in such situations. A comprehensive review of some systems describing subjective probabilities is found in Press [51]. According to Berger [2] (p.75), "such systems show that the complexity of probability theory is needed to quantify uncertainty (i.e., nothing simpler will do), and at the same time indicate that it is not necessary to go beyond the language of probability". I believe that the classical statistical approach, with its ideas of repetition and relative frequency is inadequate to cope with the reality of fishery sciences in most cases, and that the Bayesian paradigm, founded on the concept of subjective probability, has the potential to provide better support of scientific procedures in this field. Given the uninformative nature of most fishery data, the practitioner knows that all bits of additional information can be of great relevance and should therefore be used whenever possible. A fair way of proceeding is to explicitly unfold available beliefs and insights in the form of probability distributions. By formally stating such probabilities, uncertainties become open to public scrutiny and evaluation. Starfield and Bleloch [65], by pointing out that "people's perception about models and modelling can be very different depending on the kind of problems they usually face and the types of tools they commonly use", provide a very general classification which can be helpful to summarize the difficulties typically faced by fisheries scientists. Their classification is according to two factors: Factor 1 the level of understanding (and control) of the problem: low, high. Factor 2 the quality and/or quantity of available data: low, high. Most problems in engineering and physical sciences, for example, are close to a high-high classification for Factors 1 and 2 and consequently benefit most from the developments in classical statistics. Many problems in the nonphysical sciences, particularly fisheries, are closer to a low-low classification and are therefore unsuitable for a traditional approach. Chapter 1. Introduction^ 4 Within this general picture of fishery science and given the argumentation above, the modelling approach proposed here has two central features: 1. The sequential nature of system dynamics and data acquisition. 2. The Bayesian paradigm to perform analysis and assist the decision making process in the presence of uncertainties. The next section discusses the different levels of uncertainty encountered in fisheries and the tradeoff between information acquisition and monetary returns, when management actions are chosen. In Section 1.3 I describe the Bayesian approach. Finally, Section 1.4 gives a summary of the contents in the remaining chapters. 1.2 Uncertainties, Models and Learning The necessity for explicit recognition of uncertainties was used in the previous section as a central argument in support to a reorientation in thinking among fishery scientists and managers. The Bayesian approach, suggested as an adequate framework in which to perform the statistical analysis, will be described in Section 1.3. Here I will comment on three types of uncertainty in fisheries models: (i) model structure, (ii) parameter estimation and (iii) prediction; and I will discuss how each one of them affects assessment and/or management. Model Structure. This basic uncertainty is central to modelling in general and is not a feature confined to problems in fisheries. It is concerned with the formulation of problems so that relevant properties are correctly captured. While there is no specific way to address this complex issue, a variety of books can provide helpful advice: Starfield and Bleloch [65], Walters [68], Clark [7], Hilborn and Walters [25]. Linhart and Zucchini [32] propose one possible strategy to confront this difficulty. They Chapter 1. Introduction^ 5 introduce the concept of an Operating Model which should be the nearest possible representation of the "true situation". Ideally, it is on the operating model (or a family of alternative possibilities) that interpretation, prediction, and decision should be based. Since these models usually have many unknown parameters, considerations on the behaviour of estimates play an important limiting role in practice and will often require the use of simpler Estimation Models. This last consideration leads to the second level of uncertainty: the estimation of parameters. Parameter Estimation The lack of good controls, often caused by management constraints, result in weak contrast in fishery data (Ludwig at al.[35]). When combined with (often large) observation errors, such data tend to be non-informative. Parameter estimation under such circumstances becomes worse (increasingly uncertain) as the dimensionality of the problem grows, a phenomenon known to statisticians as "overfitting". The uncertainties associated with parameter estimation are accuracy (bias and variance) and confounding. The estimation may be done by minimizing least squares or maximizing the likelihood. Using simulation studies, it has been shown that for typical fisheries data, simpler models produce better policy estimates than more realistic highly parameterized ones (Ludwig and Walters [37] [38]). For further applications of the operating model concept in fisheries see Fournier and Doonan [16], Ludwig et al. [39] and Fournier and Warburton [17]. A drawback in all these procedures is the use of a single best estimate upon which management recommendations are based. It is common that various other parameter combinations fit the data nearly as well. Different combinations, however, might imply different management actions. Such legitimate and important uncertainty simply vanishes from the problem when conclusions are given as point estimates. Chapter 1. Introduction^ 6 Prediction As already mentioned in Section 1.1, fishery science depends heavily on observations rather than controlled experiments. This limits the range of available data (because conventional management tends to seek stability) and affects the reliability of predictions. It is generally true that moving outside the range of observed data and controls is a very uncertain undertaking. For fishery models in particular, it usually involves extrapolation for non-linear systems. For example, one can only know how a fish stock responds to increased fishing pressure by trying it out. But in many instances this form of testing might translate into short-term losses (or unacceptable risks) for which there are no guaranties of future compensating returns. A compromise between short-term performance and long-term learning rates needs to be established. An action (control) which is good under performance considerations, perhaps by maximizing profit, is often incapable of producing useful information about the system (i.e., it results in non-informative data). This principle has become known as the "dual effect of control". Performance oriented strategies usually seek stability of the system, while better learning rates are associated with instability and strong contrast in the data. Actively adaptive (or "dual control") strategies are designed to establish some balance between learning and short-term performance. Walters [68] (Chapter 9) gives an extensive description of these methods, which involve stochastic dynamic programming. While acknowledging the importance of the described conflict and the relevance of actively adaptive strategies to management of fisheries, the present work will not further elaborate on the issue. But, as a final remark, I mention three features which are important for the development of actively adaptive strategies and are of particular interest here: (1) The identification of alternative hypothesis about stock dynamics, (2) the development of performance criteria for comparing adaptive policy options, and (3) the formal comparison of different options. In Section 1.3 it will become clear that the Bayesian approach can adequately deal with all three demands. In this sense, a support for active adaptive strategies is a support for the Bayesian approach as well. In Section 1.1 I have pointed out the time-series nature of models and data acquisition in Chapter 1. Introduction^ 7 fisheries. The adaptation of traditional time series methods (Box and Jenkins [3]) has been proposed in the past (Schnute [56], [58]) but has not proved to be successful in describing fisheries stock dynamics (Hilborn and Walters [25]). Some classical autoregressive moving average model ARMA(p,q) say, can be good to describe past data but is not of much help in guiding management actions for the future. The literature in time-series and dynamic system methodology (Brockwell and Davies [4], Aoki [1]) is mainly suited to situations in which a large amount of good quality data and a good understanding (and control) of the system dynamics is available. Pella and Tomlinson [48] introduced a sequential fitting procedure which is unrelated to the traditional time-series methodology but has proved very useful in fisheries. They assumed a known underlying deterministic (but unobservable) model describing the biomass dynamic, together with an observation model establishing the link between observed catches and biomass. An estimate for the parameters of the biomass model was then obtained by minimizing leastsquares of the differences between the observed and the predicted time-series of catches (observation errors). The operating model approach, described earlier in this section, is more general due to the inclusion of two features: (1) system noise (Schnute [58] discusses how the assumed noise structure can fundamentally affect the results of an assessment) and (2) more elaborated goodness-of-fit criteria allowing for the incorporation of auxiliary information. I shall give more details about this approach in Chapter 3 (Section 3.3) where I extend these concepts into Bayesian inference. 1.3 The Bayesian Method In Sections 1.1 and 1.2, I presented a series of features of fishery science and argued in favour of a reorientation of thinking about the distinct role of fishery assessment and management. I further claimed that the Bayesian paradigm of statistical modelling is the correct framework in which to operate. Fishery management is ultimately a problem of decision making in the presence of uncertainties and the Bayesian theory consists of an axiomatic system developed to provide a rational and coherent way of reaching such decisions. Chapter 1. Introduction^ 8 Here I shall describe the central ideas which characterize the Bayesian method. As introductory books to the subject I suggest Lindley [31] and Lee [29]. More technical treatments are Berger [2], Press [51] and, in the context of dynamic models, West and Harrison [77]. There are four basic elements: 1. The unknown quantity 9 which denotes the uncertain element in the decision process and is called the state of nature. The set of all possible states of nature will be symbolized by 0. For definiteness and clarity of exposition, let 0 be a subset of the real line R. A probability density r(0) is assumed to describe the subjective relative credibility associated to a particular O. 2. A sample space of elements x and a set of probabilities po(x I 0) of x, indexed by O. The model po(x I 0) (the index 'o' indicating observation) is seen as a function of 0 and, conditional on a fixed observation of x, is denoted the likelihood function and symbolized by 1(0). This "conditional view" — which makes the sample space (i.e., the set of all possible outcomes for x) irrelevant in most cases — is known as the likelihood principle. This principle makes it explicit that only the observed x should be relevant to provide evidence about 0. 3. A set of all possible actions under consideration. I denote an individual action by a and the set of all such actions by A. 4. A numerical value measuring the consequence (utility) of an action a if 9 turns out to be the true state of nature. This value, symbolized by U(0, a) and denoted utility function, is assumed to be defined for all pairs (0, a) E 0 X A. Such an utility function is described on a scale which is also based on probabilities (this provides for coherence). The probability density r(0) describes uncertainties about 9 prior to the observation x; therefore called the "prior density". In light of the information provided by the observed data x, the prior beliefs about 8 are updated using Bayes Theorem, which says: p(0 Ix) cc po(x I 0)7 r(0) Chapter I. Introduction^ 9 The constant of proportionality is a normalizing constant to make p(0 I x) a proper probability density so that it integrates to 1 over the set 0. From a Bayesian viewpoint, the conditional density p(0) x) gives a complete description of 0 after observation of the data x; it is denoted the "posterior density". Since p(0 I x) says it all, there is no need to go further in terms of estimation. Anything less, however, implies loss of information. Any action should have its value described on an utility scale. In the light of current beliefs, fully expressed in the posterior p(0 I x), a best action must be chosen. It is an implication of Savage's [55] work that, given a coherent comparison of utilities, the best action is one that maximizes the expected (posterior) utility E[U (• , a)]. In the continuous case and for a given a this expectation is obtained as the integral E[U(., a)] = I U(0, a)p(0 I x)d0 Additional arguments in favour of the maximization of expected utility are given by Lindley [31] (1). 57). Some remarks are needed here: • For simplicity, I have assumed 0 to be a subset of the real line 42 in the above description, and consequently r(9) and p(0 I x) are one-dimensional probability densities. But, I can have 0 as finite or countable (n --+ oo) sets, in which case a discrete probability distribution is used for both, prior and posterior. Also, 0 = (01, • - • ,0p)' can be a pdimensional vector so that 0 becomes a subset of RP and multivariate probability densities would apply. In fact, the latter (multivariate) will be the case of interest in the remaining chapters. • Bayes' Theorem is central to the approach (hence, the name) in that in establishes a systematic way of handling (and modifying) subjective beliefs about 0 in the presence of observation x. Chapter 1. Introduction^ 10 • In Bayesian inference it is common to consider loss functions instead of utilities, so that minimization of expected losses replaces maximization of expected utilities in determining an optimal action. The "squared-error loss", which defines L(0, a) = (0 — a)2 is a typical example. If A = 0 = 42, it is easy to show that the posterior expectation E[0] is the value for a which minimizes the posterior expected square-error loss E[(0 — 02]. In general it is adequate to think simply of the relation L(0, a) = — U(0, a); that is, think of loss as being a negative utility. There would be little resistance to the use of prior distributions if 0 were a random variable subjected to frequentist interpretation. The subjectivistic concept of probability and the use of prior distributions causes some discomfort even among advocates and users of Bayesian analysis for fisheries. To give a taste of the controversy involving the former, I quote Lindley [31](p.20): "Some have regarded this (subjective probability) as a disadvantage. In fact, the subjective nature of probability is its strength, for it describes a real situation of a subject, or observer, contemplating a world, rather than talking about a world divorced from the observer living in it, as do some other sciences." As to the latter, Berger [2] (Sections 3.7 and 4.7) discusses extensively the common criticism associated to prior specifications and presents possible solutions to this dilemma. I give next a short account of his discussion regarding four arguments which are likely to be raised in any critical discussion about the merits (or not) of a Bayesian analysis. 1. Lack of Objectivity: If an overwhelming amount of (good quality) data were available, almost any analysis would yield similar conclusion, so that prior information would be largely irrelevant. But, when such data availability is limited (as in most cases), the choice of a model and its error structure can have a comparatively much stronger impact on the answer than the choice of a prior does; and it is just as subjective. Chapter 1. Introduction^ 11 2. Misuse of Prior: It is true that the use of a prior introduces an additional source for abuse of statistics. The best prevention against potential misuse consists in a proper education regarding prior information and the handling of uncertainties. In reporting results of a Bayesian analysis, prior, likelihood, and utility should be reported separately so the reader can evaluate how reasonable all subjective inputs are. It also makes it possible to try out alternative priors if such a change is felt to be appropriate. 3. Robustness: "Do small changes in prior specification cause significant changes in the final decision ?" This question is justified and a central issue. Studies in Bayesian robustness and related methodology address the topic. The construction of robust priors is sometimes an adequate compromise. If, however, different reasonable priors result in contradicting conclusions, this is an indication of scientific uncertainty of the problem rather than a weakness of prior specification. 4. Data or Model Dependence of the Prior: The Bayesian approach assumes that the information contained in 740) does not depend in any way on the data. This idealized view is hard to match in practice. This happens because a model po(x I 19) is often chosen after inspection of available data. Such choice further defines the parameters 0 under consideration; hence, the circularity between data and parameters. The answer to these criticism is twofold. First, all analyses suffer from the circularity between data and model. Second, if in a given analysis the model seems reasonable as the priors do, and other reasonable priors give similar conclusions, then details of prior development are not central to the analysis; this leads back to the robustness considerations discussed previously. The Bayesian paradigm is conceptually easy to implement. But the calculation of the posterior p(0 1 x) and the corresponding expected utility entail evaluation of integrals which can become difficult, especially for multidimensional O. To avoid such difficulties, the use of conjugate priors has been an important concept and is explained below. Chapter 1. Introduction^ 12 Conjugate Families and Numerical Approximations If a class F of density functions po(x I 0) is given, and P is a restricted class of prior distributions 7 40), then we say that "the class P is a conjugate family for F", if the posterior p(0 I x) is again a member of P. Some examples: 1. If (x I 0) has Gaussian distribution with mean 0 and (known) variance a2, then the class of Gaussian priors 740) is a conjugate family for the class of Gaussian (sample) densities. 2. If (x I 0) has Gaussian distribution with (known) mean 0 and variance a2, then the class of Inverse-Gamma priors 71-(a2) is a conjugate family for the class of Gaussian (sample) densities; that is, p(cr2 I x) is an Inverse-Gamma density. 3. If (x I 0) has Poisson distribution with intensity parameter 0 (i.e. h(x I 0) = 0x e - e (x!) - 1), then the class of Gamma priors 740) is a conjugate family for the class of Poisson (sample) densities. The need to look for conjugate families is a constraint on practical application. For fisheries in particular, the non-linearity of models further complicates the matter. In an attempt to free applications from both difficulties, theoretical and technological developments over the last decade have produced a variety of computationally demanding approximating methods. Smith [60] in a recent review, divides these techniques according to three different strategies: (i) Laplace and related analytical approximations, (ii) adaptive quadrature based on classical numerical analysis and (iii) sample-based methods or Monte Carlo integration. While (i) and (ii) require a great deal of expertise and become harder with growing dimensionality of 0, the sample-based methods of (iii) are suitable to a broader audience for their relative ease of implementation and practicality for high-dimensional O. The final output of a sample-based method consists of a random sample 0 = {01,• • •,0,,,} drawn from the posterior distribution p(0 I x). The essential duality between a sample and the distribution from which it is drawn is an important concept which I shall explore next. Chapter 1. Introduction^ 13 The distribution clearly generates the sample. But, conversely, given a sample one can approximately recreate the distribution. More generally, one can use the sample 0 to explore and summarize features of p(0 ) x) as one would proceed in exploratory data analysis on ordinary data x. Features which might be of interest are (Smith and Roberts [62]): • examining shapes of uni-variate marginals for 9 or for functions H(0). • producing marginal moments or quantile summaries. • examining bivariate marginals or pairs of functions. • examining trivariate distributions (by taking advantage of the powerful statistical packages for graphical analysis). • connecting 0 to an utility function so that the output gives answers to a specific decision problem. • using 0 in predictive analysis. The Monte Carlo methods are further sub-divided into two groups. First there are the Markov Chain methods which include the Gibbs sampler of Geman and Geman [22]. The second group is based upon importance sampling. The present work uses the latter. In the next two chapters I shall explore possibilities for the use of adaptive importance sampling with mixture models (West [75]) to obtain posterior distributions for parameters in a variety of fishery models. 1.4 Summary of Contents In Chapter 1 I have described the motivations for this work and the relevance of the Bayesian paradigm in fishery sciences (Section 1.1). Different levels of uncertainty and their impact in fishery assessment and management were examined in Section 1.2. The Bayesian method was described in Section 1.3. This last section is designed to give an overview of the contents of remaining chapters and assist a selective reader. Chapter 1. Introduction^ 14 In Chapter 2 I describe the Monte Carlo methods used in Bayesian computation of posterior probability distributions. Central aspects of basic importance sampling are reviewed in Section 2.2 and the adaptive importance sampling proposed by West [75] is reviewed and illustrated in Section 2.3. The use of the approach for sequential analysis of dynamic models is described and illustrated in Section 2.4. In Chapter 3, I describe a variety of models for biomass dynamics: stock-recruitment, surplus-production and delay-difference. Starting from the basic time-series fitting approach of Pella and Tomlinson [48], I incorporate the adaptive importance sampling methodology to construct an approximate posterior distribution for the underlying parameter vector. The derivation of the results are in Sections 3.2 and 3.3. Section 3.4 gives an overview of delaydifference models. In Chapter 4 I present different case-studies to illustrate the results of Chapters 2 and 3. In Section 4.2 I construct a bivariate posterior distribution for the stock-recruitment parameters of Skeena river sockeye salmon data. A simulation model using catch and effort data is described in Section 4.3. This case also proposes an utility function and uses it to make management decisions. In Section 4.4 I examine data for the Orange Roughy fisheries in New Zealand. Different management strategies are analyzed in the context of Bayesian decision theory and the results compared to the (non-Bayesian) risk analysis of Francis [18]. The last two casestudies refer to situations for which the posterior distributions are bi-modal. In Section 4.5 I use artificial data generated from simulations and in Section 4.6 I use data from the Pacific cod fisheries off the coast of British Columbia, Canada. The insight gained from the work, topics for further research, and final conclusions are all given in Chapter 5. Chapter 2 Bayesian Posterior Computation 2.1 Introduction In the previous chapter I discussed general features of fisheries stock assessment and management. I gave arguments supporting the adequacy of the Bayesian paradigm to perform statistical analysis and to guide decision making. In Section 1.3 the Bayesian method was described. While conceptually easy to implement, its application reduces mostly to calculations of integrals of the form f H (8)f (0)cle MHO) =^ f f (0)c19 (2 .1 ) where 9 E RP, H(9) is a measurable function, and f(0)I f f (01)c10' a probability density. Different choices of H(9) result in different aspects of practical Bayesian inference. To focus attention, assume that f(9) is proportional to the posterior density of interest. Then, • To calculate the posterior mean 1.1, I define H(9) = . Further, taking H(9) = (0 — iu)2 yields the corresponding posterior variance. • If a decision has to be made between actions al and a2 according to the expectation of given utility functions U(0, ai) and U(0, a2), I define II(0)= U(0, al) — U(0, a2). • To calculate posterior probabilities of specific sets (or hypothesis), H(9) can be defined as an indicator function IA for the appropriate set A. As an example, suppose I want to estimate the posterior probability of 9 being smaller than a fixed value 90; this is obtained by defining H(9) = h<90 15 Chapter 2. Bayesian Posterior Computation ^ 16 • Within the context of fishery models in particular, the function H(0) sometimes defines a policy (e.g. optimum fishing effort) for which an estimate is desired. The posterior expectation itH say, can serve this purpose (although posterior largest mode and median are also possibilities). Posterior estimation of the corresponding standard error is obtained by defining a new function 11*(0) --= (H(9) — 141)2. Models in fisheries are mostly non-linear in 0, which is often a vector of moderate to high dimension. As a result, the evaluation of (2.1) is in general quite complex and impractical to be solved analytically. A wide range of approximating methods have been developed to overcome this difficulty. These methods can be subdivided into two major groups. One group concentrates on analytic approximations involving Taylor series expansions (Lindley [30], Tierney and Kadane [67], Delampady et al. [11] [10]). Difficulties with the implementation of the analytic methods grow rapidly as the dimensionality and complexity of the models increase. The second group of approximating methods, involving Monte Carlo integration, is generally necessary under those circumstances. These methods are presently an active field of research; see Smith [60] for an overview. The principle behind Monte Carlo integration is as follows. Suppose one can generate an independent identically distributed (i.i.d.) sequence of random variables {01, • • - , 0,} having common density p(0) > 0 on 0. Then, for a well-behaved H(0), it follows from the strong law of large numbers, that E n . 1 lim — H(9i) I H(0)p(0)d0 n.00 n i=.1^e While the principle is simple, difficulties for practical implementation consist in the ability to perform random drawings from a general density p(0); it might be computationally very expensive (as the dimension of 0 increases) or impossible for complicated forms of p(0). In attempting to overcome these hurdles, Monte Carlo methods have followed two distinct paths: 1. Mar/coy Chain Methods: These methods include the Gibbs sampler algorithm of Geman and Geman (1984) [22]. Given a p-dimensional 0, the key feature of the algorithm is to Chapter 2. Bayesian Posterior Computation^ 17 fix the full conditional densities p(0i I 03; j 0 i) for i = 1, - • - ,p, from which computationally inexpensive random draws are made. Possible limitations of the methods are twofold: (1) the need to specify full conditionals (often a difficult task in itself) within a convenient class of distributions from which sampling is easy and (2) the possibility of bad performance for distributions featuring two separate modes (with a region of small densities in between). In the latter case, and due to its Markovian nature, a chain of random generation could be "trapped" at one of the modes. Refer to Smith and Roberts [62] for a review. A related method is the substitution sampler of Gelfand and Smith [21]. 2. Importance Sampling Methods: In its original form, as described in Geweke [23], basic importance sampling replaces the density p(0) by a "similar" density g(0), from which the random draws can be made efficiently. The ability to find a suitable g(-) is the key assumption. Adaptive sampling schemes have been developed to assist the choice of an importance function g(-) ( Kloeck and van Dijk [28], Naylor and Smith [45], Oh and Berger [46]). An important variation, specially useful for high-dimensional 0, was proposed by West [75] [76]. It consists in modelling g(-) as a discrete mixture of distributions of some standard parametric form. By recursively updating g(-) a few times, the final mixture model seems capable of providing a satisfactory approximation to X.) quite generally. Sample-based Monte Carlo methods are becoming a central tool for practical implementation of Bayesian inference. In this work I shall focus attention on adaptive importance sampling methods with the importance function given as a finite mixture of distributions. I shall explore possibilities for its use in a variety of biomass dynamic models in the upcoming chapters. Before examining these, I use the remainder of the present chapter to describe importance sampling. The next section introduces basic importance sampling. In Section 2.3 I present the algorithm for adaptive importance sampling using mixture models and provide an illustrative example. In Section 2.4 I describe the use of the adaptive algorithm for sequential updating of dynamic models and illustrate the procedure with another example. Chapter 2. Bayesian Posterior Computation^ 18 2.2 Monte Carlo Integration with Basic Importance Sampling (BIS) Let 0 have a density p(0) with support 0 E RP. Assume that it is impractical to produce a sequence of i.i.d. random variables from p•); but the exact evaluation of the unnormalized non-negative function f(0) oc p(0) is possible for a given 0. Suppose it is required to evaluate the expectation (with respect to p(.)), of some given function H(0). Monte Carlo integration with importance sampling performs this calculation as follows: 1. Specify a probability density g(0) as an approximation for p(0). If Eq.] and EgH denote respectively the expectations under densities p(•) and g(.), then EEH P (-)j = f H (0)f (0)d0 f f(0)(10 f H(0)fgr4g(0)c10 f fgr4g(0)c18 f H(0)co(0)9(0)d0 f w (0)g (0)c10 Eg[H(.)co(-)] Eg[w()] where w(0) = ^ 2. Draw a sample of size n from g(0), denoted 0 = {91, • • Onl. 3. Get the set of normalized weights Si = {w1, • • • ,con}, where coi is defined as ^w(19.i)^f(0) k^kg(0i) j). with k = E7=1 w(93). 4. Use Monte Carlo approximation to evaluate the expectations with respect to g. Namely, E-g[H(•)co(-)]^. 17 -^ H(9,49,) 19 Chapter 2. Bayesian Posterior Computation^ Eg[co(•)]^co(8j)^-k- n j=1 The final result is EP[1101 E H(0.00.,^ (2.2) In essence, the procedure consists in replacing p(.) by a discrete approximation with mass function coj at Oi The description above assumes the existence of a good importance function g(.). Hence, the key issue is finding a suitable g•). It should be easy to draw i.i.d. random variables from it while, at the same time, it should make the approximation in (2.2) accurate for n as small as possible. These are sometimes competing goals, and different situations might need different solutions. Berger [2] (pp.263-265) provides some guidelines for choosing an adequate g(). Geweke [23] gives mild conditions under which the numerical approximation in (2.2) converges almost surely to the true value and is approximately normal distributed with mean ILH = EP[H(•)] and variance a2/n, as the number of Monte Carlo replications n increases. The variance is defined as (72 EP [(HO —141)2(001 • The performance of BIS clearly depends on (72 and n, so that a g() yielding small o2 is desirable. Although there is not a general rule for choosing a good g(), Oh and Berger [46] give some desirable properties of a good importance function: 1. It should be easy to generate random drawings from it, a requirement already mentioned earlier. 2. It should not have sharper tails than f(-). The intuition being that sharper tails would result in large values of co(02) for outlying values of Eli. The result is increase in the variance u2 and instability in the estimates. 3. It should mimic the true f(.). Here the goal is to avoid having a large proportion of c,..)(0j) near zero. Such values have little effect on (2.2), reducing the efficiency of the method. Chapter 2. Bayesian Posterior Computation^ 20 A complementary way of reducing the variance of the importance sampling estimate has been the search for appropriate sampling strategies (Rubin [54]). According to Smith [60], and provided that a good procedure can be found, it is likely that importance sampling is more efficient than the Markov approach mentioned earlier. Probability Densities and Random Samples An important aspect of all the Monte Carlo sampling methods, and importance sampling in particular, is the duality between the density p(-) and a sample 0 = {i, • • , On} drawn from it. If only 0 is known, it is possible to approximately reconstruct the underlying density p(-) using an histogram, kernel techniques, or the empirical cumulative distribution. Switching attention to the sample 0 as a description of the posterior leads to two important observations. First concerns Bayesian Decision Theory. There it is usually necessary to evaluate the expected utility EP[U(0, a)] to find the optimal action a. If 0 is an i.i.d sample from p(•), it seems natural to use the same set 0 (and SI = {w1, • ,con,}) to calculate the expected utility for all required values of a. Since the generation of 0 is often the most expensive part of the Monte Carlo process (Berger [2]), the availability of 0 and S2 is an important advantage of the sampling methods with respect to computational efficiency. The second observation relates to resampling techniques and can be very helpful in the context of Bayesian robustness. There it is desirable to determine potential changes to the posterior when alternatives to the likelihood and/or the prior are considered. In the context of distributions such a study can pose formidable difficulties and requires a fresh start when a new possibility has to be investigated. Focusing attention on the available sample 0, the resampling approach makes such an analysis remarkably simple (Smith [60]). Starting with the sample 0 from p(0), suppose the goal is to produce a new sample 0* from a distribution proportional to some known function P(9)• Resample from 0 = {92; j = 1,- • • , n} as follows: 1. Calculate coj = f*(9j)/p(0j). Chapter 2. Bayesian Posterior Computation^ 21 2. Obtain the normalized weights qi = Wi Ldj=1 3. Draw a random sample with replacement from the discrete distribution over 0 placing mass qi on 9,. Denote this sample 0* = {01, 0, —}. Then 0* is approximately distributed according to the density f*(0)/ f*(01)do' Here is an example. Let ri(9) and 71-2(9) be distinct priors to be compared and 4(9), 12(9) two likelihoods. For the posterior densities pi(0) a 1 i(0)x- i(0) for i = 1,2 the following relation holds: 12(0)7r2(0)) P2(0) oc Pi(9) U1(9)71-1(0) If 0 is a sample from pl., changes from the likelihood l(0) to 12(9) and/or from the prior ri (9) to 1r2(0) can be examined using the resampling algorithm described in steps 1 to 3 above. Simply define v(9) = 12(9)r2(9) 11.(8)ri(e) PM= Pi(9) f*(9) = v(e)pi(o) coi = v(0i) 2.3 Adaptive Importance Sampling (AIS) and Mixture Models For efficiency in using the importance sampling method, we must use an appropriate importance function g(.) (in the sense discussed in the previous section). Finding such a function requires skills It becomes harder as the dimension of 9 increases. For wider applicability of the method, more automated ways to obtain a satisfactory g(-) are needed. This is the motivation behind the development of adaptive importance sampling (AIS), the subject of this section. Chapter 2. Bayesian Posterior Computation^ 22 The AIS procedure was introduced by Kloeck and van Dijk [28]. It starts with some crude approximation for p(-) denoted go(•). Because many of the sample points 9o, will have negligible weights w0 ,j (i.e. small mass p(00,i) relative to go(00,i)), such an importance function is very inefficient computationally, requiring huge sample size no, as well as statistically, converging slowly or not at all. The adaptive refinement consists in taking a small no and, based on the outcome, constructing the updated gi(-). A new sample from M.) is expected to have a better match with p(.). The final importance function obtained by such an adaptive scheme is expected to display good mimicry to p(.); a desirable property according to the previous section. Different methods to perform the adaptive refinement of g(-) were proposed by Naylor and Smith [45], Oh and Berger [46] and West [75] [76]. I will focus on West's method: the use of a discrete mixture of distributions of some standard parametric form. Three features which justify my choice for this method are: (1) the capability of mixture models to emulate very irregular forms of distributions (Wolpert [79]), (2) the practicality for high-dimensional 0, and (3) the potential for automation.I use the remainder of this section to reviewing this method. Assume 9 is a p-dimensional vector with true density p(0). Further assume that f(9) a p(0) can be evaluated for any given 0. The adaptive procedure for obtaining a good g(0) goes as follows: 1. Define an initial importance function go•) and draw a stochastic sample of size no from it. Denote this sample by 00 = {00j; j = 1, --• , no} and the associated weights by := {wo,j; = 1, • • • , no}. Let F0 = {900, no, 0o, no} represent the available quadruple after this initial step. 2. Compute the Monte Carlo estimate for the variance Vo, as the weighted sum no V0 = Chapter 2. Bayesian Posterior Computation ^ 23 where Bo =^wo,j00,i is the Monte Carlo mean. 3. Construct a revised importance function as the following weighted mixture: no MO) =^wod j=.1. do 00,i, h2v0)^ (2.3) where d(9 I m, M) denotes a known p-variate density function with mean m and variance M. The bandwidth h is a smoothing parameter as used in conventional kernel density estimation (Silverman [59]). Refer to the comments below for more details about both d(.) and h. 4. Draw a larger sample n1 > no from gi(-) and replace Fo by F1 = {g1(•),ni3O1,S21} 5. Use diagnostics (see comments below) and either stop and base inference on F1 or repeat steps 2 and 3 for a revised version r2. Next a few comments to clarify some aspects of this adaptive procedure: Smoothing: The mixture model g() given in (2.3) is simply a weighted kernel estimate with kernel de I m, M). The theory for kernel estimation (Silverman [59]) implies that the mixture model g(.) approaches p(.) for increasing sample size n if the bandwidth decays to zero at an appropriate rate. For Gaussian kernel, a standard h — defined as a function of the sample size n, the dimension p of 9 and two constants a and b — is h= a nilb with a given by a= 4^ )1/b 1 + 2p j Silverman [59] defines b p+4. West [76] defines b = 1+4p, instead. For one-dimensional O's, both bandwidths are identical. If p > 1 (and n is fixed), West's formula gives 24 Chapter 2. Bayesian Posterior Computation^ comparatively larger values. If multimodality is suspected, this standard bandwidth tends to oversmooth and slightly smaller values, say rh (0.5 < r < 1) are preferable. West [76] also shows that, for fixed n, the function g(.) is always overdispersed: a desirable feature for any importance function. The constant variance V in the kernel of (2.3) can be replaced by its Monte Carlo estimate obtained from the current values of 0 and O. Furthermore, in any adaptive procedure, a crude go(.) need only capture gross features of p() and a small sample size no will do. Possible variations (not used here) consist in: (1) allowing for a different bandwidth at each recursion, and (2) using local variances 17 for each kernel in the mixture. In (1) the spread of g(.) is controlled directly: it can be increased by intervention whenever the current importance function is questionable. The use of local variances suggested in (2), can better explore the existence of local dependencies among elements of O. Although this is a good feature, the possible gain in statistical efficiency needs to be balanced against a corresponding loss in computational tractability. The kernel: In all applications to be considered here, I shall use a heavy-tailed p-dimensional Student distribution as the kernel de m, M). According to West [76], the Student distribution is an adequate choice, provided Oi E ( — oo, -I-oo) for i = 1, • • • ,p. I use the notation T(k, m, C) to denote a Student distribution with k degrees of freedom, mean m and scale Cl Diagnostics: After each recursion in the adaptive reconstruction of (2.3), different diagnostics can be used to check for improvement in the importance function. Particularly, West refers to four possibilities: 1. Changes in (Monte Carlo) estimated expectations. 2. Configuration of points in each dimension of the parameter space 0. 3. Frequency distribution of weights 9. 'The scale C and variance M of a Student distribution are related by C = YM for k> 2. Chapter 2. Bayesian Posterior Computation^ 25 4. Some measure of variability of weights. In the approach taken here, I shall use a measure of variability of weights as the standard diagnostic tool. West [76] suggests the entropy relative to uniformity, which he defines as -^wi log wi log n (2.4) This value is non-negative and equals unity in case of a perfect match between g(0i) and p(0) for all 03 E 0; that is, the normalized weights coj will all equal 1/n. As the variability among coj's increases, the relative entropy becomes smaller. The choice of adequate sample sizes ni, bandwidth h, and number of updates for g(•), can be guided by one or more of these diagnostics. For instance: look for stable Monte Carlo estimates of mean and variance (criteria 1); verify if the relative entropy (2.4) increases uniformly towards 1 (criteria 4 ), etc. Reduction of Mixtures For a very large n, the mixture (2.3) might contain many redundancies, so that another mixture with a much smaller n is alike for all practical purposes (see West [76] for examples). These smaller mixture models are obtained by reducing the n-component sets (0(n), f2(n)) to the smaller pair (0(m), S2(m)) with m < n (superscripts indicating cardinality of the sets) by some clustering scheme. The procedure he suggests goes as follows: 1. Sort the elements Oi E 0(n) according to increasing weights wi E SIN, such that ei corresponds to the smallest weight col. 2. Find the index i, such that 0i is the "nearest neighbour" to 01 and calculate w* =4.01 wi 0* = 0,101 co20i co* Chapter 2. Bayesian Posterior Computation ^ 26 3. Remove 01 and 9, from 0(n) and include 0* instead, to get 0(n-1). Proceed similarly with coi and co* to get C2(n-1)• 4. Repeat the previous steps (n - m) times, until obtaining 0(7n) and S/(m). The reduced mixture keeps the form (2.3) but has only m components, with locations 0(m) and weights 52(m). The estimate for V, needed in the kernel de I m, M) and calculated from the larger sets 0(n) and 11(n), is kept. But a larger bandwidth h, due to the smaller number of kernels in the "reduced" mixture, is appropriate. To help practical implementation of AIS for the biomass models of Chapter 3,1 have included a pseudo-code for this procedure as an Appendix. 2.3.1 An Example To illustrate the AIS procedure, I shall perform a Bayesian analysis for a model first considered by McCullagh and Nelder [41] (sec. 9.3.3) and, in the Bayesian context, by Smith and Gelfand [61]. For i = 1,2,3, let Xi,1 and Xj,2 denote two binomial random variables with parameters (no, pi) and (ni,2,p2). Assume Xi,1 and with no and ni,2 X1,2 are conditionally independent given (pi,p2), and specified. The observed random variables are Yi = X2,1 Given the observations^1'2 = y2, and Y3 = Xi,2. y3, the likelihood for (Pi ,P2) takes the form: 3 ( ni,2^pri(1 _ p2ri,2-Yi-Fi no E ( j^ i=i JEA(i)^ (2.5) 0: - i where A(i) = {j; max-0,yi - ni,2} _< j < minfno , yil. The data of McCuRagh and Nelder are given in Table 2.1. Their goal was point estimation of (p1,p2). I shall obtain a (two-dimensional) posterior density instead. Transforming for 0: The method requires 0 = (OM), 0(2)) to be defined on the real plane n2. Hence I use the logit transformation 0(i) = ln(pi/(1 - pi)), for i^1,2 Chapter 2. Bayesian Posterior Computation^ 27 Table 2.1: Data from McCullagh and Nelder [41] (Sec.9.3.3) i no 15 26 34 ni,2 Yi 5 4 6 7 5 6 The final results can be described for 0(i)'s or, as I did here, transformed back to the original parameters pi. The calculations: I shall guide the reader through the various steps of the AIS algorithm for this specific example. 1. Define the prior distribution r(0). After drawing a sample of independent observations from a two-dimensional uniform distribution on the unit square (0,1) x (0,1), and applying the logit transformation to each of them, the resulting sample of values 0 has a dome-shaped distribution with (approximate) mean-vector (0,0)' and diagonal covariance matrix diag(3.2, 3.2). Therefore I used a two-dimensional Student with 30 degrees of freedom and the given values as mean and scale as my prior for 0. The degrees of freedom were chosen arbitrarily; smaller values would result in too heavy tails inconsistent with the observed sample. 2. Define go(.) = r(.). That is, use the prior distribution as the initial importance function. 3. Draw a sample of size no = 1500 from go(.). This is the set 00 = {00j, j = 1, • • -, no}. The sample size, again an arbitrary choice based on diagnostics and experimentation, is of moderate size. It is intended to capture crude features of the true posterior. In subsequent recursions the importance function becomes more refined. Therefore the sample size will gradually increase. Chapter 2. Bayesian Posterior Computation^ 28 4. For each 00,j E 00 calculate: (i) the true posterior density P(eo,i) = 1(0o,i)740o,3). Where 1(9) is the likelihood (2.5) with pi's (i = 1,2) replaced by the appropriate function in 0(0 =e°(`) (1 + egs)) (ii) the value g0(90,2). In subsequent recursions the importance function g•) from which the current sample was drawn is used instead. (iii) the weight c4)(00j) = p(eo,i)/go(eo,i)• 5. Get the set of normalized weights =-^j = 1, • • , no} where co0,3 = co(00,j)/K and K = E.72.1440o,j). 6. Using 00 and 00 calculate the Monte Carlo estimates of the (2 x 2) covariance matrix V, as indicated earlier. Also calculate the relative entropy for g0(•) by using f20 in equation (2.4). 7. Using the procedure for reduction of mixtures, described earlier in this section, reduce 00 and 120 to smaller sets of m 500 points. Denote these new sets 0((50°) and 9,(50°) and use them, together with the estimate V calculated in step 6, to construct the mixture model gi(.) of the form (2.3). I fixed the kernel de m, M) as a Student distribution with 9 degrees of freedom (to provide for heavy tails in the importance function) and the bandwidth of Silverman [59] defined earlier. 8. Draw a new sample of size n1 = 2500 from fil(•) and denote it ei = {01,i,^= 1,- • •,ni} 9. Repeat steps 4, 5, and 6 to obtain 91, V, and the relative entropy for gi(-). Chapter 2. Bayesian Posterior Computation^ 29 43^. and S2(15" Construct the new mixture model g2(.). 10. Obtain the reduced sets .500) 11. Draw a new sample of size n2 = 4000 from g2(.) and denote it 02 = {92,3, ./ = 1, • • • , n2} 12. Repeat steps 4, 5 and 6 to obtain f22, V and the relative entropy for g20. 13. Obtain the reduced sets 0(25" and Si(2500). Use them in the final mixture model (2.3) to generate the contour plot of Figure 2.1. Results: The relative entropy (2.4) was 0.873 for go(.), 0.993 for gi(.) and 0.997 for g2(-). The major improvement is shown for the first recursion, where the entropy jumps from 0.87 to 0.99. A final entropy close to 1 means good agreement between the mixture model and the true posterior at the 4000 observations in the set 02. Figure 2.1 is a contour2 plot based on the mixture approximation for the posterior. Negative correlation and a slightly larger value for p2, as compared to pi., are identified. For comparison, I included the values reported by McCullagh and Nelder: the quasi-likelihood estimate (0.36,0.84) (triangle) and the likelihood estimate (0.2,1.0) (cross). Both estimates satisfy pl. -I- p2 = 1.2, and are on the (dotted) line which represents all such cases. I have included this line to show that the shape of the contour plot is approximately symmetric to it. Marginal posterior probabilities for p1 and p2 are shown in Figure 2.2(a). Despite the large uncertainties, indicated by the flatness of the distributions, there is a clear trend towards smaller values of p1 (full line) as compared to p2 (dotted line). When comparing both parameters, a natural function of interest is H(PlIP2) =P2 -P1 I calculated this difference for each of the 4000 sample points in 02. A summary of the outcome is shown as the full line in Figure 2.2(b). This curve was obtained using standard density estimation (a simple histogram gives the same message); there is a tendency towards positive 2A11 contour plots are made using `interp0', an S-Plus function. Chapter 2. Bayesian Posterior Computation ^ Figure 2.1: Contour plot for posterior distribution of (p1,P2). Contour lines are for .05, .10, .25, .50, .75, .90, .95 of largest observed density. 30 31 Chapter 2. Bayesian Posterior Computation^ (a) q . ,-co 0 CD -0 c; 2 o_ •cr.. o c\! o q o ............ .....^ 0.0 0.2^0.4^0.6 p1 and p2 ....... 0.8 ^ 1.0 (b) -1.0 ^ -0.5 ^ ^ 1.0 0.5 0.0 p2-pl ^ Figure 2.2: (a) Marginal posterior for Th. (full line) and /32 (broken line). (b) Density estimates for (p2 — p1) for 4000 sample points (full line) and the 500 values remaining after clustering (dotted line). values, particularly in the range 0.5 to 0.7. The dotted line on the same graph, was obtained in an analogous way, but using only the 500 clustered values 0(25°°); little information is lost and the result is essentially the same. In my description, I have avoided results in the form of point estimates. Posterior mode, median and (Monte Carlo) mean could be used as such. But, the posterior distribution is the full expression of current uncertainties and beliefs about the parameters; why say less (with a point estimate) if I can say it all with the posterior distribution ? 32 Chapter 2. Bayesian Posterior Computation ^ 2.4 Sequential Analysis Here I shall discuss a class of models for which the parameter 9 is assumed time-dependent; hence I use Ot to associate it with time t. This topic is not needed for the developments in the next chapters, and may be omitted without loss of continuity. But it illustrates another field where the AIS can be used. There are good biological reasons to distrust time-invariant 9 for models describing the dynamics of exploited fish stocks (Walters [69], Parma [47]). Alternatively, we might take 0i to describe the state of the system at time t. Some examples: define Ot as the stock biomass at time t; or define Ot = (X,,,t, • • • ,Xch,t)' to be a k-dimensional vector where X,„t describes the state of classes ci at time t. For the latter case, think of X as some abundance measurement and ci as different locations; or think of X as number of fish (or total weight) caught at ageor size-classes ci. State - space models and Kalman filtering (Brockwell and Davis [4]) have been used in such situations (Walters [68], Mendelssohn [42], Sullivan [66]). The Bayesian alternatives are dynamic Bayesian models (DBM), the subject of this section. An extensive treatment of DBM's is given by West and Harrison [77]. I shall describe DBM in its general form and show why (and how) adaptive importance sampling (AIS) can overcome difficulties in practical implementation. I conclude with an application to a non-linear non-Gaussian example, where Kalman filtering does not apply. In its general form, the elements of a DBM are a discrete time-series of s-dimensional observations Yt (t = 1,2, -.), each of which is related to a p-dimensional parameter Ot representing the state of the system at time t. The dynamics are described as follows. At time t, the observation Yt has a known sampling distribution. (Yt I et) •--• Po(Yt I et)^ Conditional on Ot, Yt is assumed independent of yi and Oj, for all j (2.6) t (past or future). The 33 Chapter 2. Bayesian Posterior Computation^ state parameter is assumed to evolve in time according to an evolution density (°t+1 I et ) ' Pe(et+i I et)^ (2.7) Once Ot is given, 9t+1 is conditionally independent of past values Oi for j < t. For each time t = 1, 2, • • • the defining probability densities are given by the observation model (2.6) and the evolution model (2.7). Let Di be the information available at the end of period t. If, from t to t + 1, the only relevant information is the new data point Yt+1, then the recursive relation Dt+i = {Yt-i-i , Dt} describes the dynamics of sequential information acquisition. The conditional densities of (2.6) and (2.7) typically also depend on D. I often exclude this term from the notation to avoid clutter. The Bayesian element in the model is the expression of uncertainties about Ot by probability densities. In particular, assume that P(Oj I DO, j = 1, 2, - • •^ (2.8) describes the uncertainty about 9, given all current information D. In the context of time-series, the probability densities (2.8) have different names depending on the j used: filtered for j = t; smoothed for j < t and predictive for j > t. In standard Bayesian terminology (Section 1.3), filtered and smoothed densities are posteriors given Di, while predictive densities are priors for future states. The Bayesian (subjectivistic) interpretation of these probabilities allows the modification of any such density if relevant information outside of Di becomes available. To use the Bayes Theorem, we need a sequential updating mechanism for each additional observation Y. This is done by the following 3 steps: Step 1: Starting at some time t - 1 with a given posterior (filtered) distribution p(Ot_i I Dt_1), get a prior (one-step-ahead predictive) density for Ot as Mt I Dr-1 ) = 1 Pe(Ot I et - i 7 Dt_i)p(et_i ( Dt_i)det-i Step 2: Get the (marginal) one-step-ahead predictive distribution for the observable Yt as p(Yt 1 Dt_1)=---- f po(Yt I et, Dt-i)p(et I Dt-i)dot Chapter 2. Bayesian Posterior Computation^ 34 Step 3: For an observed value Yt -,--- yt, the probability po(yt 1 Ot,Dt_i) acts as a likelihood for Ot. Hence, using Bayes Theorem, update the prior of Step 1 to get the posterior (filtered) density for time t, p(ot I Dt) = Ayt I D t_ i ri p( ot 1 Dt—i )Po(yt I et , Dt—i) The evaluation of the above integrals has no analytic solution in general. Important exceptions — Dynamic Linear Models and Multi-Process Models — are discussed in detail by West and Harrison [77]. In case of non-linear functions and/or non-Gaussian noise structures, two types of approximations are used: analytical and numerical. Analytical approximations are used, for instance, by Delampady et al. [11] and [10]. They apply Laplace approximations of Tierney and Kadane [67] to calculate posterior moments for a Bayesian version of an autoregressive time-series model AR(1), and for a model where Ot is the intensity parameter of a Poisson process. Numerical approximations start by considering a grid of values for O. Choice of this grid is an important point for the distinction among different methods. Sorenson [64] identifies two general groups: (i) local methods which restrict analysis to one grid point only; and (ii) global methods which use the whole grid. Local Methods The best known local method is the extended Kalman filter (EKF) (for application to fisheries see Walters [68]). To describe it, I define E[Yt I Ot,Dt-il = F(et,Dt-i) ^ (2.9) with expectation over observation error vt; an (s x 1) random vector; and E[Ot I et_i,Dt-i] = G(et-i,Dt-i) ^ with expectation taken over system noise wt; a (p x 1) random vector. (2.10) Chapter 2. Bayesian Posterior Computation^ 35 The functions G(.) and F(.) — which can be time-dependent — are assumed known. The EKF linearizes GO and F(.) with respect to O., assumes Gaussian distributions and chooses the grid point to be some "best linear estimate" (The ordinary Kalman filter is the special case where GO and F(.) are linear). The updating formulas used in Kalman filtering are equivalent to the formulas used to update mean and variance for a Gaussian conjugate family of distributions. Therefore, within the Bayesian context, the use of the EKF implies that the posterior distributions for Ot are Gaussian. According to Sorenson [64], the first serious step away from a Gaussian assumption involved the use of Edgeworth expansions and "seems to be most useful when the posterior density is unimodal even though not Gaussian". Global Methods Any global method needs a way to solve the general problem of 1. defining an initial grid, 2. giving a procedure to define the grid at the next time step, and 3. for a given grid, select a method to approximate the probability density. Kitagawa [27] approximates integrals by a finite sum defined over a discrete equi-spaced grid for Ot, where densities are approximated by a first-order spline function. Smith et al. [63], Pole [49], Pole and West [50], propose an adaptive grid using Gauss-Hermite quadrature. West [74] points out the difficulties with the quadrature method: (i) the enormous computational demand in problems with more than a few parameters, (ii) the difficulty of reconstructing smooth posterior distributions based on the evaluation of only a few points, and (iii) the need to work with an approximately orthogonal transformation of the parameters. In the adaptive importance sampling approach of West [75] [76] the grid points are randomly generated. Such dynamic grids will be concentrated in regions of interest in the parameter space (provided the mixture models are good approximation to the "true" underlying distribution). 36 Chapter 2. Bayesian Posterior Computation ^ How to (adaptively) find a good mixture model, was described (for time-invariant 0) in Section 2.3. I will show next how this can be modified for sequential models. 2.4.1 AIS in Sequential Models Using the notation of Section 2.3 (with the obvious changes to include the time index t) and given the observation model (2.6), the evolution model (2.7), and the data set Dt_1, the posterior p(0t-1 I Dt_1) is summarized by = {fit-10,nt_bot-bfit-1}, where gt_1(•) is the final mixture model before clustering and nt_i the corresponding sample size. The updating mechanism (Steps 1 to 3 above) can be rewritten as Monte Carlo approximations. Step 1: (i) If the expectation Ee[H(Ot) Ot_i] with respect to the evolution density (2.7) can be explicitly evaluated for some function of interest H(.), then its predictive expectation is nt-1 ^E[H(9t) Dt_1]^E cot_i,jEetH(9t) et_1,2] I Particularly, for mean and variance, nt-1 at = E[9 I Dt_1} E wt_LiEe[et I et_i,i] Rt = Val*[Ot Dt-i] Pe, E wt-i,i(Vare[et I °t-i,j] (Ee[Ot Ot-i,j] - at )2) j=1 (ii) The prior (predictive) distribution ^I Dt_i) is approximated by ^P(Ot I nt-1 Dt-i)^E wt_i,Jpe(Ot 1 et-1,J) j=1 This approximation is used to generate a stochastic sample 07 = ^= 1, • -,nt-i} from the prior distribution. It is also used to evaluate prior probabilities in step 3 below. Step 2: The sets Slt_i and Ot* are used for inferences from the (marginal) forecasting distribution (Yt I Dt_i). Particularly, nt-t ft = E[Yi Dt-i] E wt_LJE" ezi] j=1 Chapter 2. Bayesian Posterior Computation^ 37 nt-1 Qt Var[Yi Dt--1]-^-' E cot_i j(Varo[Yt 0;,j] (Eo[Yt^it?) where Eo and Varo are calculated with respect to the observation model (2.6). The marginal probability p(Yt Di_1) is given by p(Yt I Dt_i) at-1 E cot_Lipo(Yt KJ) j=1 Step 3: After /it = lit was observed, the goal is to obtain Ft --= {gto,nt,ot,nt} as an approximation for the posterior (filtered) distribution p(Ot I Di), completing one recursion. The procedure is as follows: 1. Use o; and 52t_i to: (i) calculate the Monte Carlo estimate of V as indicated in Section 2.3 and (ii) obtain the clustered sets (in = {9; j = 1, • • , ml and S4 = {wZi; j = 1, • • • ml with m < nt_i elements. 2. Define an initial mixture goy)) = Ewzido j=1 and produce a sample Ot,o = fot,o,j; = 1, • • • , ezi,h2v) nt,01 from it. 3. Calculate the corresponding set of weights Clip, from yt,o(et,o,i) and I ft(Ot,0,j) OC Po(Yt et,0,j)P(Ot,0,3 Dt - 1) 4. The summary rt,o = {gt,o(-),nt,o, et,o,f2t,o} is used to repeat the procedure. Repeat the procedure to obtain rt,i, rt,2, • until this set is considered to mimic well the true posterior. Use the final set as I. For the initial time t = 1 the density 1-(01) = p(01 I Do) has to be specified; usually a diffuse distribution is chosen, from which a random sample can be drawn easily. This random sample is O. To obtain the corresponding weights f21,0, define gi,o(0) = 740) so that wo,i a po(Yi 01,0j). The above procedure is then applied to generate Fla, etc. Chapter 2. Bayesian Posterior Computation^ 38 2.4.2 An Example For illustrative purposes, I shall consider a (hypothetical) situation in which the biomass of some resource is described in time. I use a non-linear non-Gaussian model for which classical (Kalman filter) methods do not apply. The biomass dynamic is described by Bt = St_i exp(a - bSr i)e"t^ (2.11) where Wt are independent random variables from a Student distribution with 9 degrees of freedom mean 0 and constant variance W. St denotes the biomass surviving year t. Given an annual natural survival rate C, and ht the harvest rate in year t, I define St as St = C(1 - ht)Bt^ (2.12) Each year some abundance index It is observed. The relation between It and the unknown biomass Bt is modelled by It = qBtevt^ (2.13) where the observation errors vt are assumed independent random variables from a Student distribution with 9 degrees of freedom, mode 0 and constant variance V. After defining 0 = ln B and Y = in I, the following dynamic model is derived from (2.11), (2.12) and (2.13): 1e = G(Ot-i,ht-i)+ wt Yt = F(Ot_i) + vt where G(6, h) = di. + ln(1 - h ) -I- 0 - d2(1- h)PePe F(0) = in q + 0 di. .1nC-1-a d2 = bCP (2.14) Chapter 2. Bayesian Posterior Computation ^ 39 Table 2.2: List of parameters used in the simulation model for artificial data generation Parameter C a b P B1 q V W d1 d2 Value 0.8 1.65 0.20 2 4.7 1 0.13 0.04 1.43 0.13 Table 2.3: Output from simulation. The harvest rates ht were fixed in advance. Bt is the Biomass at the start of year t and It is the corresponding observed abundance index. Ot and Yt are the natural logarithm of Bt and It, respectively. t ht Bt It 1 2 3 4 5 6 0.2 0.4 0.6 0.6 0.3 0.3 4.71 3.39 3.90 4.62 5.70 2.12 3.22 1.67 2.59 5.81 5.16 2.51 Ot 1.55 1.22 1.36 1.53 1.74 0.75 Yt 1.17 0.51 0.95 1.76 1.64 0.92 Notice the non-linearity of GO with respect to O. By construction the process is nonGaussian as well. For the purpose of illustration, all parameters a, b, p , q, C, W and V are assumed known with values listed in Table 2.2. I simulated the process for 6 time steps with harvest rates ht specified for times t = 1, • • • , 6. Table 2.3 summarizes the outcome. Notice the substantial drop in biomass at time t = 6; possibly an effect of the heavy-tailed Student distribution modelling the random process noise w. The initial uncertainty about 01 is described by a prior r(01) which is Student with 9 degrees of freedom, mean 1 and scale 0.36 (the dotted line for t = 1 in Figure 2.3). After observing Chapter 2. Bayesian Posterior Computation t= 1 ^ ^ 40 t=2 t=4 t=3 t=5 0 2 3 Figure 2.3: Prior (predictive) and posterior (filtered) distributions are shown respectively by dotted and full lines, for times t = 1 to t = 6. `\?•' is the observation Yt and `+' indicates the "true" Ot. the posterior distribution p(01 I Y1) is obtained (the full line on the same graph). The use of G(01, h1) together with the known Student distribution for w1 enables the calculation of the prior (predictive) distribution p(02 I Y1) (the dotted line for t = 2 in Figure 2.3). After observing Y2, the process is repeated; then for Y3 etc. At each t, I repeated the AIS procedure 3 times, with sample sizes no = 250, n1 = 350 and n = 1000. The mixture model (2.3), used to approximate the posterior p(Ot I Yi, • , Yt), was obtained after clustering the sets 0 and SZ to m = 100 components. I used the conventional Chapter 2. Bayesian Posterior Computation^ 41 bandwidth multiplied by 0.65 to allow for possible multi-modal distributions. The relative entropy (2.4) was always larger than 0.997 for the final mixture approximation, indicating a good agreement between the importance function g(.) and the corresponding posterior p(•). For times t > 1, densities for predictive (prior) distributions p(Ot I • , Yt_1) where approximated by 1000 E cot_lJpe(et I 9t_1,i) j=1 where Pe(•) is the (known) evolution model: a Student density with 9 degrees of freedom, mean G(0t_1, ht_i), and variance W. In Figure 2.3, prior (broken lines) and posterior for Ot, are shown for times t = 1 to 6. For convenience, probabilities (y-axis) were standardized for maximum mode at unity. The information provided by 171 = 1.17 (displayed as V') reduces uncertainties as is reflected in the sharper tails of the posterior. The small deviation of VI with respect to the prior mode 1, caused the minor shift of the mode to the right. The true (unobserved) biomass indicated by `+' was in fact higher than prior mode or observation would indicate. The influence of GO in shaping predictions can be seen in the prior for t = 2. The asymmetric shape of this distribution supports the inappropriateness of a Gaussian approximation (which would be implicit in case of EKF). A second observation V2 0.51, surprisingly low according to predictions, causes a substantial revision in current beliefs and a smaller posterior mode. For times 3, 4 and 5 analogous considerations describe the dynamic of information acquisition. At t = 6, the predictive distribution again presents the asymmetric feature displayed at t = 2. Two features about the predictive distributions deserve comment: (1) the consistent limitation to values below 9= 2, and (2) the striking presence of sharper tails in the predictive distribution (as compared to the posterior) for times 3, 4 and 5. Both are reflection of the particular form of GO. But, while (1) could be anticipated by the non-linearity in the biomass production model, (2) was a surprise. Chapter 2. Bayesian Posterior Computation^ 42 2.5 Summary This chapter was about the implementation of the Bayesian paradigm where numerical approximations of integrals involving some probability function p(.) are needed. Monte Carlo integration with adaptive importance sampling was used to calculate these approximations. In Section 2.2, I gave a review of the concepts and properties related to basic importance sampling. This approach depends critically on the existence of a "good" importance function to replace p(-). A procedure to find such a function adaptively is described in Section 2.3. This method, proposed by West [75], uses a finite mixture of Student distributions as the importance function. This function is refined by a recursive mechanism Such procedure is particularly attractive for multivariate p(.). An example was included to illustrate the details of the required calculations. Section 2.4 extended the application of the adaptive method to situations where the parameter of interest changes in time and the data are acquired, one by one, in sequential order. An illustrative example was provided. Chapter 3 Models for Biomass Dynamics 3.1 Introduction In Chapter 1 I described the importance of Bayesian analysis for fisheries assessment and management. Some technical difficulties which have hindered a more widespread application in the past can be overcome by the numerical methods described in Chapter 2. The present chapter integrates the previous two, and proposes procedures for Bayesian analysis of biomass dynamic models. The novelty is the Bayesian framework in which traditional fishery models are treated. I shall establish notation, describe, and justify the proposed procedures. The main result of the chapter is the posterior distribution for O. The potential advantages and new insight gained from such an approach will be explored in Chapter 4 using case-studies. Here is a short guide through the chapter. The rest of this introduction describes some general features of fishery models and data. The reader familiar with them can move to Section 3.2 where I introduce the first group of models. Section 3.3 presents a second group of models for which I extend the time-series fitting procedure of Pella and Tomlinson [48]. In Section 3.4 I describe delay-difference models — again an optional section. Finally, Section 3.5 is a summary of the chapter. Fishery Data and Models Models describing the biomass dynamics of fish stocks are of central importance in the assessment of the resource. By providing information about productivity, such models are used to 43 Chapter 3. Models for Biomass Dynamics^ 44 find optimal policies for exploitation. Generally speaking, these models imply that highest productivity is obtained at some intermediate biomass. If the biomass level is too low, the stock is being overfished in the sense of catching too many fish too early, hence wasting potential for reproduction and growth. At low biomass, the surplus due to growth typically exceeds losses caused by natural mortality. Unduly high biomasses, on the other hand, favour mortality over growth. At least two causes explain this phenomenon: (1) the large proportion of older fish, which no longer use food efficiently for body growth; and (2) the waste of energy caused by competition for limited food supply. The above description makes it clear why estimates of biomass, and the attempt to model its dynamic, are central in assessment studies. In the past scientists worked with a static concept of "optimum biomass" obtained under equilibrium (time-steadiness) conditions. The inappropriateness of such a concept has become apparent from its failure in practical situations and from a growing number of simulation studies. Recent developments have moved away from these earlier approaches in at least two ways. First, there is the use of dynamic (as opposed to equilibrium) models and the view of optimum biomass as a "moving target" subjected to uncontrollable external factors and affected by the history of exploitation; in fact, optimality itself is being questioned. Second, there is an increasing emphasis on statistical models to describe the structure and dynamics of uncertainty. Management strategies need to reflect these changes. The historic (and still widely used) concept of "Maximum Sustainable Yield" (MSY), defined as "the average yield which would sustain the resource at its optimum equilibrium biomass", is outdated. MSY has been replaced by concepts such as: (i) optimum fishing effort maximizing future average discounted catch (Ludwig and Walters [38]) or (ii) total allowable catch quotas (TAC's) which — subjected to some constraints — result in (estimated) smallest risk of biomass depletion (Fournier and Warburton [17]). Still more elaborate strategies using stochastic dynamic programming (Walters [68]) are possible. Such strategies explicitly acknowledge that an action today (by affecting the dynamics into the future) will affect the state of the stock tomorrow. Uncertainty reduction (learning) Chapter 3. Models for Biomass Dynamics^ 45 as well as economic gains can be combined to determine an acceptable compromise. Finally, in order to disclose confounding effects between environmental factors and those related to fishing and stock size, management actions can be guided by general principles of experimental design (Walters and Collie [70]). The usefulness of the Bayesian estimation procedure and its link to management decisions, was described in Chapter 1. I will not further elaborate on the topic (although I shall come back to it in the concluding chapter), but turn to other aspects relevant for stock assessment: (1) biomass as an aggregate measurement of the stock, (2) data types and acquisition, (3) models and estimation of parameters. Biomass The appeal of stock description in terms of its biomass is simplicity. But, as for all aggregate measurements, there is a price to pay: a loss of information regarding the internal structure of the stock. A large number of young and small fish can have the same biomass as a smaller number of older large fish. From a biological standpoint they can be very different in behaviour, food requirement, reproductive potential, and mortality rate. In commercial terms they might also represent distinct markets, prices, and fishing gear requirement (hence, costs). Similar comments apply to the spatial and/or temporal aggregation of local biomass measures. Data I shall give a short account of the type of data commonly available for fishery assessment, its sources and potential weaknesses. For a broad description of this topic including survey design, see Hilborn and Walters [25]. The data in fisheries have three sources: commercial catches, surveys and tagging. Data from the commercial fleet form the primary source of information about the exploitation history of the stock. Total catches (often better defined as "landings", because they ignore onboard discharge of economically unattractive or illegal sizes and species) together with fishing effort Chapter 3. Models for Biomass Dynamics^ 46 and location are collected. Port samples of fish to determine age, sex, length- and weightdistribution are usually collected as well. The ratio of catch per unit of fishing effort (CPUE), is used as an index of abundance and assumed to be proportional to biomass. Such data are typically observational in the sense that there is no control over where and how they are collected at sea. The fisher will go where the activity is economically attractive, be it for high abundance of fish, convenience of location, or any other reason. Assumptions of random sampling do not apply. This fact causes enormous difficulties for reliable estimation of abundance (via CPUE) and represents a serious obstacle to stock assessment. To avoid the problems associated with commercial data, management agencies have been using research surveys to obtain more accurate estimates of abundance and fishing effort. Surveys are comparatively expensive and usually gather less data than would be required for acceptable levels of confidence in estimates (in the context of classical statistics). But, surveys are valuable as a supplement to commercial data. Another possibility for abundance estimation has been tagging. Because of high cost, differential fish behaviour, and difficulties with tag recovery, its performance for abundance estimation has been discouraging (although it is a helpful tool in the study of movement and migration patterns). Models Fishery data are usually compiled annually so that a discrete modelling of time is appropriate. Therefore the models will be presented in the form of difference (rather than differential) equations. The levels of confidence associated with classical estimates of parameters in usually nonlinear models (and non-linear functions of these parameters) have been calculated in different ways: (i) variations on traditional linear and non-linear regression methods and approximations; (ii) numerical methods (jackknife, bootstrap). Alternatively, the Bayesian approach taken here provides estimates in the form of posterior distributions; the most complete description of Chapter 3. Models for Biomass Dynamics^ 47 uncertainties. I shall present next a detailed description of models and procedures for the practical implementation of Bayesian analysis in fisheries. 3.2 Model 1 In this section I shall present models for stock dynamics which assume only system noise. I call them Model 1. The next section considers models with observation error and system noise; I shall call those Model 2. Model 1 is important because it allows us to derive some useful results which build the link between adaptive importance sampling (Chapter 2) and biomass dynamic models. Model 2 takes in the time-series fitting procedure of Pella and Tomlinson [48] — which has proven quite successful for conventional parameter estimation in biomass dynamic models — to calculate the likelihood. The biomass at time t is denoted by B. After harvest, the surviving biomass is St. The model describing biomass dynamics is formulated as Bt = St-iF(0,St_i)et^ (3.15) where 9 = , ,O(P))' E RP is the unknown p-dimensional parameter of interest and {vt} a sequence of independent random variables with mean 0 and constant (unknown) variance V. Management policies are often given as functions H(9) where H : is known. Prediction into the future depend on the knowledge of 9 as well. When such models are used to describe stock dynamics they are called surplus-production models (e.g. Ludwig and Walters [37][38]). When used to describe the relation between the number of spawners (Si) and the number of returning recruits k years later (Bj = Rj+k), they are denoted stock-recruitment models. But for the purpose of this section such a distinction is unnecessary. Chapter 3. Models for Biomass Dynamics^ 48 I shall consider two specific forms for F(0, S). The Deriso - Schnute family of models specifies ^F(0, S) = a(1 - b7S)lh^ (3.16) with a > 0 and b > 0. Appropriate choice of 7 allows formulation of a variety of classical fishery models. For instance, traditional models due to Schaeffer, Beverton-Holt, and Ricker correspond to 7 = 1, 7 = - 1 and -y -> 0, respectively. An alternative formulation uses the generalized Ricker models of Ludwig and Walters [38]. It specifies F(0, S) = ae - bs° (3.17) Both groups, (3.16) and (3.17), overlap at the standard Ricker model, for which the limit as -y -> 0 is taken in the former and (5= 1 is assumed in the latter. In the models described below, I always consider specific sub-families by fixing in advance the parameter 7 in case of model (3.16) or /5 in case of (3.17). According to Ludwig and Walters [38], an attempt to include these as unknown parameters would produce a worse result than assuming a plausible value instead. The components of 0 need to be defined over the real line. Therefore, I shall define 0 as the appropriate transformation of parameters: for a parameter 0 < a < oo I use ln a; if 0 < a < 1 I use the logit transformation ln(a/(1 - a)), etc. Hence, in the above models I define 0 = (in a, in b)'. Biomass and catch are strictly positive quantities and because of this I have modelled randomness as a multiplicative (non-negative) quantity (e"). For mathematical tractability it is convenient to divide both sides of (3.15) by St_i and take logarithms. The following probabilistic model for Yt = ln(Bt/ St_i) results: P(rt I 9, V - dp(pt(0 V ) ), ) (3.18) where MO) = ln F(0, St--i) and dp(m, C) is some p-dimensional distribution with mean m and covariance matrix C. The value St_i is assumed known at time of predicting Yt; hence, pt(0) is also available. Chapter 3. Models for Biomass Dynamics^ 49 Prior uncertainties about V and 9 are assumed to be independent and given by ir(0, V) = r(0)r(V). For a given data set Y(r)^, , Yr} I will be interested in features of the posterior distribution p(0 I Y 01), and sometimes in policy variables H(9) or utilities U(0, a). 111(9, V) = Po(Y(7) 0,V) is the likelihood for (0,V) then, given the assumption of prior independence of and V, I obtain 1(9) = Po(Y(T) I 0) by marginalization with respect to V. I.e: 1(0) = jpo(Y(7)1 0, v)r(v)dv 1(9) = Elor(1T)[1(0,V)] I shall follow the notational convention of Berger [2], and use superscripts on expectations E to indicate the random quantity (or its density) over which the expectation is to be taken. ir(V) Subscripts on E will denote fixed parameter values. For example, Ee^means the expectation with respect to the prior for V, conditional on a fixed value O. Superscript and/or subscript will be omitted whenever the interpretation for E is obvious. According to Bayes Theorem (section 1.3), the posterior for 0 is obtained as PO I 17(7)) a r(0)4(v)[1(9, V)] ^ (3.19) To use (3.19) in the context of adaptive importance sampling (AIS), we need to evaluate its right-hand side (up to a normalizing constant) for any given O. Technical results given as Lemmas in the next subsection will enable the practical implementation in Chapter 4, but can be omitted from a first reading without loss of continuity. 3.2.1 Two Lemmas The results to be given in the two Lemmas of this subsection will be used throughout the applications of Chapter 4. The first and most important of these Lemmas assumes a Gaussian model in (3.18) together with an Inverse Gamma prior r(V). The reason for this combination is the conjugate property Chapter 3. Models for Biomass Dynamics ^ 50 of distributions. As was explained in Section 1.3, such property is a helpful tool to simplify calculations in general, and will be used to aid the calculation of 1(0) in (3.19). Lemma 3.2.1 Let d(.,.) in (3.18) be Gaussian and r(V) /G(ctoo3o), an inverse gamma distribution. The following special cases of (3.19) are obtained given two different assumptions regarding the distribution of (Y (r) I 0,V): (a) If (Y3 0,V), j = 1,•• •,7" are independent, then P(9 j 17 (7)) 0( 7(9)(13T(9))ar) where ct, = co r/2 and ^0„- 0 ) =^20o 2 + poz(e) ( with Z(0) =^- (b) If (Y:i I 9,V) N((9), V) for any fixed 9 and (V I Y(j -1),0) IG(aj_1,13J-1(0)), then p(0 I Y (r))^r^(13-7(9))c' where ai = a2_1+ 1/2 and 20i_1(0) 0.7(9) = 2 +^ - Pi (9))2 Proof: • In (a), notice that 1(9, V) = H p( Y2 I 0, v) v -T/ 2 e - Z(19)/ 2 V j=1 and hence ^F701[0,101 OC ty-(0+1-12+1) exp^(Z2(:) olov)) E7(11[49, V)] CX (0T(0))ar pe(v)dv where pe(V)^[3,-(0)) so that the last integral equals 1. dv ^ Chapter 3. Models for Biomass Dynamics^ 51 • For (b) recall that r(V) = r(V I 0) so that ^Eir(v)[/(0, V)] =^Po(Y(T) I v,0)7r(v I 0)dv = po(Y(r),v I 0)dv = Po(Y(T) I 9) = 460. The pair of distributions assumed in (b) form a conjugate family, such that (V I Y(j), 0)^/3(9)) and from the Gaussian observation model it follows that (yi Y(j - 1), 9) T^(ai_ifii-1(9)) 1) where T(k,m,C) denotes a Student distribution with k degrees of freedom, mean m and scale C. The expectation in (3.19) is the marginal likelihood 1(9) = po(Y(r) I 0). Because of the time-series nature in the observations, assume yi dependent on past data Y(j - 1). According to the multiplication rule, 1(9) can be written as the product 1(0) = H po(Yi I Y(j - 1), 0) From the assumptions it follows that each term in the product is given by iii(9))2) Po(Yi I Y(j - 1),9) a (#i_1(0))-112 (1 + i3j-1(19)(17-i 2 (0i-i(0))-(a,-1-1/2) oc which completes the proof. 213i_1(0) 2 +^(0)(Y - pi(0))2 (0.i(0))°' (/3j-1 (9))-1 Chapter 3. Models for Biomass Dynamics ^ 52 What distinguishes assumptions (a) and (b), is the use of sequential learning about V in the latter. At each time t an up-to-date estimate of V, given the currently available information Y(t — 1), is being used. I expected improved performance under assumption (b). However, for the cases I have analyzed the difference was unimportant in practice. The second Lemma is useful in cases where some (posterior) estimation of the nuisance parameter V is required, and the assumptions of Lemma 3.2.1 apply. It gives explicit formulas for first and second central moments of the posterior (V I Y(r)); a finite mixture of InverseGamma distributions. These moments are expectations over p(0 I Y(r)), which can be evaluated easily by Monte Carlo approximation, using the final sets 0 and Si produced by adaptive importance sampling. Lemma 3.2.2 Given the assumptions of Lemma 3.2.1, the first and second central moments of (V I Y(r)) are: 1 E[V lif(-r)] =^EP (91Y(T))[0;10] for a,- > 1 a, —^ 1 1 Ep(elY(T))[13,720] for a, > 2 . E[V2 I^ 17(7)1 = (a,._ 1)(a,-- 2) Proof: These expectations are a direct consequence of the following facts: 1. Under assumption of Lemma 3.2.1(a), and any fixed 0, it follows that p(V I Y (r), 0) a r (V)1(0 , V). In the proof of Lemma 3.2.1, I showed that this is proportional to an Inverse Gamma distribution with parameters a„ and 0,(0). In the case of Lemma 3.2.1(b) and any fixed 0, the conjugate property of distributions implies that (V I Y (r), 0) ,-, IG(ar, OT(9)). Notice that, while a,- is the same for cases (a) and (b), the values of 0,(0) differ. Chapter 3. Models for Biomass Dynamics ^ 53 2. Using properties of conditional expectation, the two moments of interest can be rewritten as follows: E[V I Y (r)] E2[E1[V Y (r), 011 E[V2 Y (r)] E2 [V ari[V I Y (r), 0] -I- ERV I Y (r), 0]] where E1 and V ari are expectation and variance taken with respect to p(V I Y (r), 0) and E2 is expectation with respect to p(0 I Y(T)). 3. Finally, using the fact that for X IG(a,b) the mean is E[X] = 11(b(a - 1)) if a > 1 and the variance V ar[X] 11(b2(a - 1)2(a - 2)) if a> 2, we obtain the result. 3.2.2 Catch and Effort Data Model (3.18) implicitly assumes direct observation (with negligible observation errors) of the set {(St, Bt+i); t = 1, • , r} . The stochastic component modelled by v, usually denoted "system noise" or "process error", reflects noise in the dynamic description of the system. Such model applies, for instance, in studies of salmon populations to describe relations between spawning stock St_i and the returning recruits Bt = Rt+k after k years (Stock-Recruitment Models), assuming that good estimates of both S and B can be obtained. For most practical cases, however, neither S nor B are available directly. It is common to have a data history of total catches Ct and the corresponding fishing efforts E. These data need to be translated into abundance estimates for B and S in order to use Model 1. A widely used model in fisheries defines this relation as follows: Ct = Bt(1 - e-gEt)^ (3.20) St = Bt - Ct^ (3.21) The term ht = 1- e-gEt in (3.20) denotes the harvest rate in year t as a function of fishing effort; the constant q > 0, the catchability coefficient, specifying the efficiency of each unit of effort. In (3.21) no mortality other than fishing is assumed. If the rate of survival to natural causes ( is to be included, this expression can be replaced by St = BtCht • Chapter 3. Models for Biomass Dynamics ^ 54 From (3.20) and (3.21) the values Yt and St appearing in (3.18) are functions of q, Ct, Ct+i, Et, Et+i; that is: Yt = y(q, Ct, Ct+i, Et, Et+i) St = s ( q , Ct, Et ) where these functions are as follows: Yt in (C+1 (1 — exp (— q-Ei))) (1 — exp (—qEt-F1))Ct qEt St = Ct exP "Et) exp (—qEt)J (3.22) (3.23) — Notice that these calculations require specification of q which can be incorporated into 0. The logarithm of q is the appropriate transformation and I define 9 = (in a, in b, in q)'. Given a set of catch and effort data {(Ci,Ei); j = 1, •• • , r 1) and a fixed value for the parameter 0, the corresponding set {(Yi, Si); j = 1,•• • , r} can be calculated using (3.22) and (3.23). The usual calculations of the posterior p(0 Y(r)) can now be performed. But, notice that each given q will result in a different sequence of Y and S values. Also, the likelihood 1(0) is to be interpreted as the evidence for 9 given catch and effort data. Observation Errors The model so far has not considered any error in the observation of catch and/or effort. From the discussion in Section 3.1, it should be clear that such an assumption is inadequate. To focus my discussion, I shall assume accurate observation of catches Ct but substantial observation errors in efforts E. Such an assumption often agrees with practice if one considers the formidable difficulties in standardizing measurements of effort over an entire fleet of differing vessels, gears, and skills. To examine the effects of these errors, assume Et to be unknown and that Et" is observed instead; the difference (Ef — Et) being the observation error. Let Yt° and Sf be the alternatives to Yt and St in (3.22) and (3.23) when efforts E are replaced by E°. To keep the argument simple, assume that the observation errors are adequately modelled by Yt° = Yt + xi,t Chapter 3. Models for Biomass Dynamics ^ 55 Sf = St + x2,t where x.,t are random quantities induced by errors in E°. It is easy to see that (3.15) can be rewritten in terms of observed values as Yt° = in F(0, Sf_1 - x24-1) vt Particularly, model (3.17) with 5 = 1 (the standard Ricker model) results in Yt° = a - bSt° 1+ vt* where viK = vt^bx2,t_1. The parameter V in (3.18) is now the variance of the random variables v; (rather than vt), which include process and observation errors. To distinguish the combined variance from the variance associated to (3.15), I denote the latter by Vi,: the "process" error. Such a (convenient) linear arrangement of random terms is not likely to occur in more realistic formulations. The non-linear nature of most models make this relation quite involved and analytical analysis impractical. The impact of "errors-in-variables" on modelling and management has been recognized for quite some time (Walters and Ludwig [72]) and has caused a shift in recent developments towards the explicit modelling of statistical components (Schnute [58]). But the example demonstrates that V > Vp; the overall variance is more than just process error. How much more will depend on the magnitude of errors in observations and the particular form of the model. If an estimate of Vp is needed — maybe to determine some optimal policy — then the careless use of an estimate for V instead can result in considerable bias in the final outcome. How to overcome such difficulty ? Ludwig and Walters [38], in a similar context, side-step this problem by defining an additional parameter A = Vp/V, the ratio between these two variancesl. The case with no errors in observation would correspond to A = 1, while A = 0 assumes no process error. According to their performance evaluation, attempts to estimate this parameter together with all the others gave poorer results than to simply fix it at some intermediate value, say A = 0.5. 'They actually use A (V — VA/V Chapter 3. Models for Biomass Dynamics^ 56 Utility Function and Bayesian Decision The central interest of any practical application is not 9 itself. In the classical approach some function H(9) is usually to be estimated. A pragmatic approach might seek an answer to the question: "What effort level should be recommended for future years ?" Bayesian decision theory (Section 1.3) can be used in connection with the posterior for 9 and some utility function to provide an answer. I shall propose an utility function and describe how such analysis can be performed. Given a parameter value 03 E 0 and an effort ei E A, where 0 is the set of parameter values and A the set of possible management actions, I define U(9, ei) as the utility associated with such a pair. Further p(0 I Data) denotes the posterior distribution for the parameter 0 and U(i) EkelData) [u(., ei)] the expected utility for a fixed effort ei. The Bayes rule or optimal action will be the effort ei* which maximizes the expected utility; i.e.: U(i*)^U(i) A Monte Carlo approximation for U(i) is readily available (recall Section 2.2) and calculated as U(i) In defining an utility function, it seems sensible to reach two goals simultaneously: 1. To maximize (discounted) future catches. 2. The avoid dangerously low surviving spawning stocks St. For a given pair (03, ei) and a starting surviving biomass So, — which is available from the data, and defined as "the value of S at the final year of observations" — the fitting model F(-) is used to predict the average annual catches.6 over a T years period into the future, given that Chapter 3. Models for Biomass Dynamics ^ 57 a constant effort E = ei is used for the whole time. For each time t = 1, • • • , r the calculations go as follows: Bt = F(S-1) et = Bt(1 — e—q E) St = Bt - et The Cumulative Discounted Catch(CC) is calculated as CC(j, i) =E 4513vet t= 1 where (Spy is the discounting factor. Further I determine the smallest value of S over the same period as being Smin = min St 1<t<T and fix a threshold, say SLIMIT, so that smaller spawning stocks are considered to be an unacceptable risk. Such limit might be associated with a perceived danger of stock collapse if the stock is reduced further. Finally, I define the utility function as = CC(j, i) if Smin > SLIMIT 0^if Smin < SLIMIT In words, the utility function is given by the total discounted predicted catch if the chosen effort is fixed over a T year period, if and only if during these years the spawning stock never falls bellow the given threshold `SLIMIT'. If the latter occurs, the utility is set to zero. This utility function is simplistic in two ways: (1) it assumes a linear (i.e., risk-neutral) relation between CC and the utility derived from it; and (2) it defines the utility to be zero when Smin falls below the threshold, instead of using some generic penalty. For a particular fisheries, further refinements might be advisable (see the case-study for Orange Roughy). But, for the present purpose, this straightforward definition is adequate since it embraces three characteristics which, I believe, should be present in any such function. Namely, Chapter 3. Models for Biomass Dynamics^ 58 1. A range of implementable effort levels. If too low, it might kill the fishery; if too high, it would bring overcapitalization. 2. The desire of the industry to maximize catches in order to supply the increasing market demand. 3. The intent to avoid stock collapse, manifested by the managing agency and the general public and an industry which wants to utilize the resource in the future. 3.3 Model 2 For this second formulation denoted Model 2, I shall propose an estimation procedure that builds on the concept of observation error/time-series (OETS) fitting, introduced by Pella and Tomlinson [48]. Their central idea (for conventional parameter estimation) is, to fix a parameter value 60 E 0 (possibly including the biomass at time t = 1) and then use some model to predict the whole time-series. A best 0 is selected to minimize some goodness-of-fit criteria SS(0). Particularly, Pella and Tomlinson used SS(0) = E(Ct — Ct)2 t=i where Ct denote observed catches and Ot the corresponding prediction from the model. Subsequent developments considered more elaborate delay-difference models, as well as more sophisticated goodness-of-fit criteria (Deriso [12]; Schnute [56][57]; Fournier and Doonan [16]; Ludwig et al.[39]; Fournier and Warburton [17]). In the next section I shall describe delaydifference models in more detail. Here, I focus attention on the estimation procedures used by those authors, and suggest ways of incorporating these ideas into the Bayesian framework. The function SS(0), in its most general form, is given as T^1/ S S (9) = A E iv? +(1 - A) E + E ri (xi — 13)2 t=i^t=i^j=i where A denotes the "split" between process noise and observation error as described for Model 1 (previous section), with {Wt} and {vt} the corresponding sets of random perturbations. The Chapter 3. Models for Biomass Dynamics^ 59 values rj are pre-specified penalties associated to the deviation between model predictions and fixed (hopefully true) values xi. A large penalty associated to a specific j = jo say, indicates a constraint imposed upon 0, to the subset for which ijo xio. The method is clearly attractive given its great flexibility and conceptual simplicity, although it is at least as "subjective" as the Bayesian approach. All it requires is an efficient algorithm for minimization of SS(0) over the set 0. Difficulties are associated with the estimation of standard errors of such an estimate. Extensive simulation using bootstrap methods (Efron [14][15]) may be necessary. Therefore the Bayesian approach is appealing for at least three reasons: 1. By introducing a prior distribution r(0), it avoids the necessity to simulate possibly hundreds of replicates to determine variability in the estimates. The effect of alternative priors (and/or likelihoods) can be examined by the more efficient procedure mentioned at the end of Section 2.2 2. It does not require the maximization2 of SS(0). Such maximization forces choice of one optimal parameter 0*• This ignores the fact that often there are many choices of 9 which give a good fit to the data, but would demand different management actions. 3. For more elaborate delay-difference models, 9 will be of high dimension. Furthermore, the set {xj; j = 1, • • , v} in SS(0) are estimates from an independent source (outside the data set), and typically have uncertainties associated with them. Non-informative fishery data, allow for estimation of only two or three parameters in 0, with remaining parameters included as additional xj's. The Bayesian framework allows (at least in principle) for a coherent treatment of all uncertainties about 9 and x as probabilities. For convenience, the latter can be incorporated into O. To formalize the proposed Bayesian approach, I replace (3.15) in Model 1, by the following 2Although the maximization of an utility function will be necessary at the later stage of decision making. 60 Chapter 3. Models for Biomass Dynamics pair of equations Bt = G(0, Bt_i)e" (3.24) It = F(0, Bt)evt (3.25) where (3.24) models the dynamics for biomass B, and (3.25) the relation between biomass and an observed abundance index I. The process noise {wt; t = 1, • - • , r} and the observation error lvt; t = I,. • -,7-1 are assumed to be independent identically distributed random sequences with mean 0 and constant variances W and V, respectively. The random variables vt and wt are also assumed to be mutually independent. Let Yt = ln It and pt(0) = In F(0, BO. I assume the observation model P Yt 1 9, V) rs, d(p1(0), V) ^ ( (3.26) where d(m, C) denotes a one-dimensional symmetric distribution with mode m and variance C. I will use Gaussian distributions in the examples of the next chapter. In analogy to the OETS fitting procedure, I define the time-series of predicted biomasses as follows: B = {ErEBt I f3t-i] f3i fixed if t > 2 (3.27) possibly El E 0 and further define the likelihood for 0 to be given by 1(9) = P al/1, ' ' • 'Yr) I (121(9), ' • • , iLT(0))) ^ (3.28) where it(G) = ln F(0, 13'). I propose the following procedure to obtain the posterior p(0 1 Y(T)): 1. Fix a value of O. (recall from Chapter 2, that a sample of O's is available from the prior 70)). 2. Use the models (3.24) and (3.25) together with the definition (3.27) to obtain the sequence of predictions Iiit(0); t = 1, . • • , 71 Chapter 3. Models for Biomass Dynamics^ 61 3. Use (3.28) to calculate the likelihood 460 after appropriate marginalization with respect to V. 4. Obtain the posterior p(0 I Y(r)) oc 48)749) Two comments are appropriate: (i) Lemmas 3.2.1 and 3.2.2 can be used directly after replacing pe(8) by the corresponding ilt(0); (ii) the time-series {Bt} and {/t} are considered of the same length in the above formulation, but the abundance data need to be available only over a subset of time points. In fact, the Orange Roughy example of Chapter 4 considers such a case. There, catch data (useful in (3.24)) are available for a certain time sequence of length T say, while abundance estimates were obtained only for the s final years: T - 3 +1,• • • , T. The side-stepping of process noise by way of definitions (3.27) and (3.28) is an ad hoc procedure obtained by analogy to the OETS approach. I will discuss next, "why" (or at least "when") definitions (3.27) and (3.28) are justified. The handling of process noise Suppose that in (3.24) I accept model G(.) as a fair description of the underlying dynamics on average, and that model imperfections and remaining sources of variability are adequately captured by the random noise w. It is known from basic statistics that a* = Etr[Bt I f3t--i] minimizes EY [(Be - a)2 1 Et-ii • But a* is the definition of lit given in (3.27). In Bayesian decision theory (recall Section 1.3), the optimal action (Bayes rule) minimizes the posterior expectation of a given loss function L(0, a). In particular, L(Bt, a) = (Bt - a)2 is the "squared-error loss"; and the definition of Et imitates the Bayes rule (although the expectation is not taken over the posterior distribution of Be). Based on the rationale for choosing a Bayes rule, I could, in principle, use any alternative "loss-function" and modify definition (3.27) accordingly without affecting the remaining steps of the overall procedure. Chapter 3. Models for Biomass Dynamics ^ 62 On the other hand, definition (3.28) is not so clear. For simplicity, assume a single known observation yt so that the subscript t can be dropped. Then, according to (3.26), the likelihood for 9 is I = p(y j p(0)); while in (3.28) I use I = p(y j ft(0)) instead. I am changing the mode of the symmetric distribution from the unknown true p(0) to its estimate f(0); the latter being a fixed value given 0, while the former is random. Definition (3.28) might be adequate on average if, for a given 0, the expectation of I (with respect to the random p) is approximately given by i; i.e: E;[1] I While unable to assess the reasonability of such an assumption in general, I shall consider a particular case which is relevant for later applications. Assuming Gaussian distributions for w and v (with respective variances W and V), and fi = ln F satisfying ED.t — = 0 and E[A — Al2 = W, it is easy to show (using Taylor series expansion up to quadratic terms) that E[l]^W 1+ y — 11)2) 1 +^ 2V^V-17 (3.29) If the left-hand side of (3.29) equals 1 there is no bias; such is trivially true for W = 0. For a ratio smaller than 1, lis an overestimate of 4[1], while a value greater than 1 denotes underestimation. Given W> 0, and defining the standardized deviation c = (y — /2)2[/V, the following holds for (3.29): 1. The smallest ratio of 1 (W/ 2V) > 1 is obtained for c = 0; the ratio increases with increasing values of c. 2. Since is a function of 0, (3.29) changes with 0 for fixed y. For those values of 9 causing c to be large, the underestimation in I is more pronounced. 3. Larger values of V imply smaller values of c (hence, less bias). For V>> W the bias is negligible. 4. For fixed y and W, different values of 0 giving the same squared deviation c result in identical bias. Chapter 3. Models for Biomass Dynamics ^ 63 The central message in (3.29) is that I consistently underestimates EY[d] and, as a function of increasing values of c, the underestimation gets worse. A bias correction can be attempted if the relation is assumed, and estimates of A and V are available. Expression W/2V in (3.29) is then replaced by A 2(1 — A) and W is eliminated from the calculations. 3.4 Delay-Difference Models Here is an overview of delay-difference (DD) models. Such models can be used for GO in (3.24). For a more detailed analysis, see Hilborn and Walters (Chapter 9) [25]. The simple biomass dynamic models described by (3.16) and (3.17) cannot deal with timedelays and changes in population structure. The new class of DD models, introduced by Deriso [12], still describes the dynamics of the aggregate biomass but, by virtue of its construction, can deal with various time-delays. This because growth and recruitment are modelled explicitly. The aggregate biomass Bt at time t, is defined as the sum 00 Bt = E Na,tWa (3.30) Gk where a > k are ages (with k > 0 denoting the age of recruitment to the fishery), Na,t the number of fish of age a at the starting of year t and wa the average body weight for age a. The possibility of modelling details of growth (w), recruitment (Nk.), and survival (Na,t Na+i,t+i) separately makes the model very flexible. In formal terms, the three basic assumptions concerning growth, recruitment and survival to be used throughout are as follows: Chapter 3. Models for Biomass Dynamics ^ 64 • Growth: A linear relation is assumed to describe average weight dynamics for all ages a > k, where k is a minimum age at which an individual is first recruited to the fishery. This relation requires two growth parameters a and p as follows. wa = a + Pwa-i The weight wk is usually also fixed as an parameter. • Recruitment: Let St represent the spawning biomass in a given year t. Particularly, I will define St as " biomass which survives harvest in year t". The average recruitment dynamics will be described by a deterministic relation of the form Rt÷. = R{St - i} where RH is a given known function, u is a non-negative integer that describes a possible lag between birth and recruitment, and Rti-u represents the average number of recruits joining the stock at time t u. • Survival: The total survival fraction in any year, st, is usually modelled as the product of a constant natural survival C and a time-varying survival to harvest (1 - ht), such that St = C(1- he). Given these basic assumptions, the following pair of dynamic equations for biomass Bt and number Nt can be derived: Bt+i = St (aNt PBt) wkRt+i^ (3.31) Nt+i = stNt Rt+i (3.32) Initial conditions need to be specified before the above recurrence equations can be used. The way in which these are established might change according to any particular application. I assume a start from equilibrium (i.e. dropping the t subscripts in (3.31) and (3.32) ) and Chapter 3. Models for Biomass Dynamics^ 65 negligible harvest prior to time t = 1 (i.e., h = 0). Such an assumption is reasonable if an initial unfished population can be assumed. Rewrite (3.32) as Ne = 141(1— ()^ (3.33) Replacing Ne in (3.31) and solving for Re results in an equilibrium recruitment Re given as Be(1 — (p) Re =^e^ ( je ic-='Z + wk) (3.34) The recruitment function I will use in the case-study of Chapter 4 is a one-parameter Ricker curve defined as St-1St-1 Be)}^ e exp { b (1 — Rt+. = Rei,— (3.35) Generalization of Deriso's model was proposed by Schnute [56]. Further improvements by Fournier and Doonan [16] included the use of moments for weight and length as auxiliary information. Incidently, they also considered a Bayesian structure allowing for the inclusion of a prior distribution for 9. This information was then combined with the likelihood to determine the posterior mode as an estimate for 0 (the generalized maximum likelihood estimate in Bayesian analysis). They did not consider, however, the possibility of obtaining the posterior distribution. Alternatively Schnute [57] proposed a model for which he used size rather than age classes. He further assumed "knife-edge" recruitment for a minimum size rather than a minimum age. Hence his largest departure from previous approaches, relates to the modelling of recruitment. An advantage of DD models over the simpler surplus-production family relates to the underlying parameters which, in the DD case, have a direct biological meaning. This is a desirable feature for the Bayesian approach if elicitation of prior uncertainty is to be used. An disadvantage with respect to simpler models is the high dimensionality of the parameter O. This demands more computation and, because of uninformative data, often results in poorer performance (Ludwig and Walters [37]). Chapter 3. Models for Biomass Dynamics ^ 66 3.5 Summary In this chapter I have outlined the types of data commonly used to assess the biomass of fish stocks subjected to exploitation. They are: (I) catch and fishing effort statistics from the commercial fleet, (2) abundance estimates obtained through surveys undertaken by management agencies. While the former is not a random sample, the weakness of the latter is its high cost. The practical solution usually is a compromise between both types of data. Posterior distributions for the parameter 0 were obtained for surplus production models - in Section 3.2 (Model I) assuming the time-series of biomass Bt and surviving biomass St are directly observed, or estimated from catch and effort data. To deal with time-series of catch and abundance index, or more elaborate delay difference models, I have proposed an adaptation of - the observation error/time series fitting procedure of Pella and Tomlinson [48]. This approach - was described in Section 3.3 (Model 2). Finally, Section 3.4 gives an account of general features of delay difference models. - The implications of process noise and observation errors in measurements, are also examined throughout. Corrections are possible whenever their relative magnitudes can be assessed. Chapter 4 Applications 4.1 Introduction This chapter is dedicated to specific applications of the adaptive importance sampling procedure (Section 2.3) to models for biomass dynamics (Sections 3.2 and 3.3). I shall consider 5 different case-studies, one per section. Each section was written to be self-contained, so that the reader can select the case-study of his or her particular interest. But I emphasize different details of the analysis in different case-studies. This avoids excessive repetition for the reader of all five. The notation developed in the two previous chapters is used throughout. In some cases I use real data, while others rely on simulations. Sections 4.2 and 4.3 are devoted to "Model 1" of Chapter 3. In Section 4.2 I construct a posterior distribution for the parameters in models for stock and recruitment, using data for Skeena river sockeye salmon. Section 4.3 refers to surplus-production models. There I use simulated data and compare my results to those obtained by Ludwig and Walters [38], showing that some of their main conclusions can also be verified here. Since my results are posterior distributions, I can perform a Bayesian decision analysis. I suggest a sensible utility function and show that the best decision is robust with respect to alternative models and priors. Sections 4.4, 4.5, and 4.6 use the fitting procedure outlined in Chapter 3 as "Model 2": the Bayesian alternative to the observation error/time-series fitting method. Section 4.4 is a case study of Orange Roughy, a valuable fishery in New Zealand which has been the subject of recent controversy. I show that the determining factor in the dispute is the choice of an utility function. Different prior distributions and technical details about models are of little relevance. In Sections 4.5 and 4.6 I examine situations with multi-modal posterior distributions. In the 67 Chapter 4. Applications^ 68 former I again use simulated data, while the latter uses a delay-difference model to analyze data for Pacific cod. Finally, Section 4.7 summarizes the results of the chapter. 4.2 Stock and Recruitment: sockeye salmon I shall consider a Bayesian alternative to the estimation of parameters in stock and recruitment models when applied to data for Skeena river sockeye salmon (Oncorhynchus nerka). I use the data presented in Hilborn and Walters [25] (Fig7.1, p.243) and shall conduct my analysis to allow direct comparisons with some of the results presented in their Table 7.4 (p.279). Pacific salmon have played an important role in the development of stock and recruitment models. This can be explained in part by their particular life history. The fish hatches in fresh water and later migrates to the ocean where it lives most of its life. At the time of sexual maturity (four years later for sockeye), the fish returns to the small watershed where it had hatched. It then reproduces and dies. The size of the spawning stock can be measured in fresh water. Ageing techniques are used to determine, for each spawning stock, the number of adults that recruit to the fishery. This gives the pair of needed observations: spawning stock and the corresponding recruits. The data consist of a sequence {(Ri, Si); j = 1, • •, r} of observed data, where Si is "the number of spawners in year-class j" and Ri "the number of recruits produced by Si". Hence, in (3.15), I define Bi+i = Rj and consider two alternative models to fit the data: • R: Ricker model (Expression (3.17) with (5 = 1) F(0, S) ae-bs • BH: Beverton-Holt model (Expression (3.16) with 7 = -1) a F(0, S) - ^ 1 bS where 9 = (in a, In b)'. Chapter 4. Applications^ 69 Table 4.4: Some biologically meaningful attributes of the parameters in the Ricker and Beverton-Holt models. R^BIT 1. Slope at Origin 2. Sma,x 3. Rmax 4. Smsy 5. umsy a 1 -6 go' 11+2(.5 — .071na) ln a(.5 — .071n a) For the deterministic case (i.e.: V = 0 in (3.15)), and under equilibrium conditions, the parameter 0 relates to quantities of biological meaning. Table 4.4 (adapted from Hilborn and Walters) gives a summary of those quantities. Smax is the spawning stock which maximizes the recruitment; this maximum recruitment is given by Rmax. Notice the basic difference between the models. While R predicts the existence of an intermediate spawning stock able to produce maximum recruitment, in BH the recruitment tends towards an asymptote, as S increases. Smsy is the optimum (spawning) stock size for "Maximum Sustainable Yield" (MSY1), with umsy denoting the corresponding optimal harvest rate. I used lines 1, 2 and 3 of Table 4.4 and the data-plot of R versus S displayed in Figure 4.4 to set the range for the parameters of the prior r(0). I started by determining five quantiles of an "idealized" (marginal) log-normal distribution for each parameter. If the fitting of a log-normal distribution revealed inconsistency among these quantiles, I would try a new set of values. Table 4.5 gives the quantiles which were finally chosen. For instance, the value bp (parameter b under the Ricker model) used as benchmark the interval 380 < Smax < 1430, while for brill I assumed 100 < Rmax < 5000. The parameters of the corresponding Gaussian distribution for 0 (columns 'm' and 'C') were used as mean and variance of a heavy-tailed two-dimensional Student distribution with 9 degrees of freedom and diagonal covariance matrix. Prior uncertainties are smaller for model R as compared to BH. This is a direct consequence of the larger difficulty in eliciting quantiles for bBll as compared to bR. The reason for such a 'See Section 3.1 for its definition. 70 Chapter 4. Applications^ 200^400^600^800^1000^1200 S ^ 1400 Figure 4.4: Stock and Recruitment data for Skeena river sockeye salmon (Reproduced from Hilborn and Walters [25]). The lines were fitted according to Ricker model (broken) and Beverton-Holt model (continuous) using means of posterior distribution. Table 4.5: Five quantiles used to determine prior distribution for parameters a and b (the subscript according to the model: Ricker or Beverton-Holt). 'm' and 'C' are mean and variance used in Student prior. Q u ant ile s a bR x 10-4 bim x 10-4 5 1.0 7 2 25 1.5 11 7 50 2.0 14 14 75 2.7 18 31 95 4.0 26 100 m 0.70 -6.65 -6.55 C 0.18 0.15 1.37 Chapter 4. Applications^ 71 difficulty is the different biological meaning of the parameter (see Table 4.4). Prior assessment of bBH needs the joint consideration of two uncertain quantities: Rmax and the slope at origin a. For comparison, bit relates only to Smax. As a diffuse prior for V, I chose an inverse gamma distribution IG(ao = 2,00 = 1) which has mean=1, mode=0.33 and oo variance. The goal is to construct the posterior p(0 Y(T)), using the AIS approximation of Section 2.3. For convenience, I define the initial importance function go(S) to be the prior r(.). This results in initial weights wo j proportional to the marginal likelihood Po(Y (7) I 00 j). After sampling no = 1000 points from this prior, I perform two updating recursions using ni = 1500 and n2 = n = 4000, respectively. At each loop the mixture model is clustered to a smaller mixture of m = 500 before proceeding. The reported sample sizes were chosen empirically. Sample sizes that were too small caused instability in measurements of the relative entropies (2.4). Monte Carlo estimates for the mean and variance of 9 and functions H(9), were calculated using the full sample of n = 4000 points. With respect to the assumptions described in Lemma 3.2.1, I considered three different scenarios: • Ra Ricker model and assumptions of Lemma 3.2.1 (a). • Rb Ricker model and assumptions of Lemma 3.2.1 (b). • BHb Beverton-Holt model and assumptions of Lemma 3.2.1 (b). The values of "entropy" in Table 4.6 were calculated according to expression (2.4). This measure is an indication of the goodness-of-fit between the current importance function and the actual posterior densities at the given grid points; a value of 1 corresponds to a perfect fit. The observed entropies indicate the usefulness of the adaptive procedure in constructing a good importance function. In particular, the first recursion shows a considerable improvement, considering the relatively small sample sizes. After focusing the sampling effort on the region of interest, the refinement of a second recursion takes the value of entropy even higher. All final approximations have entropies above 0.99 (recall that 1.0 is a perfect fit). By comparison, Chapter 4. Applications^ 72 Table 4.6: Results for AIS applied to the Skeena river sockeye salmon data under various assumptions. The first number is the posterior mean; the posterior standard deviation is given in brackets. Columns R* and BH* are the estimates of Hilborn and Walters [25]. MODEL Entropy V I Y(r) Smsy I Y(r) umsy I Y(r) P(Smsy > 800) P(Smsy > 1000) Ra .466 .955 .995 .258(.056) 718(148) .53(.04) .21(.007) .05(.003) Rb .540 .984 .996 .259(.056) 706(134) .53(.04) .19(.006) .03(.003) BHb .635 .984 .994 .260(.056) 1203(544) .45(.04) .82(.006) .03(.008) R* BH* .22 851(246) .55(.05) .23 916(271) .48(.06) Table 4.7: Posterior mean and Covariance for different Models fitted to the Skeena river sockeye salmon data. Model Ra Rh BHb mo 1.29 1.29 1.20 m1 -7.20 -7.18 -7.22 Co,o 0.016 0.015 0.026 Co,i 0.024 0.021 0.071 C1,1 0.058 0.052 0.266 a one-time sample of 6500 points using the initial importance function produced an entropy of 0.60 in model Ra and less acceptable results for estimates. The same pattern was observed for other cases. Posterior mean and variance for 0 = (In a, in b)' are displayed in Table 4.7 and contour plots of the posteriors for Rb and BHb are shown in Figure 4.5 (a) and (b), respectively. There is positive correlation between in a and In b. In the case of model BHb, one can see that for large values of in a, the conditional distribution of in b is sharper (i.e. less variable) than for smaller values of in a. Figure 4.5 (c) and (d), show approximations for posterior densities of Smsy and umsy. These posterior densities were obtained by fitting an ordinary (Gaussian) kernel estimator (Silverman [59]) to the 4000 observations of the appropriate function H(9i) (I used the 'density' function in Splus). The strong agreement between the posteriors obtained for models Ra (full line) Chapter 4. Applications^ 73 (b) (a) 0.8^1.0^1.2^1.4^1.6^1.8 In(a) (d) 500^1000^1500 S(msy) 20-0 0 Figure 4.5: (a) and (b): Contour plots of posterior distributions for Rb and BHb (Lines at .01, .05, .10, .25, .75, .90, .95 of the largest density). (c) and (d): Approximate posterior densities for Smsy and umsy (dashed line: BHb; full and dotted lines: Ra and Rb). Chapter 4. Applications^ 74 and Rb (broken line), indicate that there is no much to be gained by the more elaborate assumption (b) in Lemma 3.2.1 for this particular example. A comparison of the estimates reported in Table 4.6 for both models (Ra and b) also support this conclusion. However, comparing models Rb and BHb, show marked differences for Smsy and umsy. Under BHb the posterior for Smsy is quite flat as compared to Rb. The heavy upper-tail for Smsy in the former is supported by the Monte Carlo estimate P(Smsy > 800) = 0.82 as compared to only 0.19 for the latter. The posterior for umsy also suggests different actions under different model assumptions. If Rb is chosen, an optimal harvest rate in the range (0.5, 0.6) seems adequate. Alternatively, for BHb, smaller values within (0.4, 0.5) are more adequate. But which one is the better model ? The two lines fitted in Figure 4.4 use Monte Carlo estimates for the posterior means of the parameters a and b (the Bayes estimators under squared-error loss) for different models: full line for It and broken line for BHb. I cannot tell from these fits which model is better. The distinction between the models becomes important when predictions for high levels of spawning stock (S> 1000) are made. Such model uncertainty can only be resolved by explicit experimentation in management, allowing high spawning stocks (i.e. zero harvest rate !). The results obtained by Hilborn and Walters [25] are displayed for comparison, as columns R* and BH* of Table 4.6. There are differences in the values for estimates of both mean and standard error. The strongest disagreement is for Smsy under model BHb. But in general terms the outcome of both methods agree. There is an advantage in describing uncertainties by posterior distributions rather than point estimates when an action has to be taken. The next two sections, where the process of decision making under uncertainty is directly addressed, will illustrate this point. 4.3 Surplus Production: Simulation Model In this section I shall use simulations to generate artificial data for catch and effort. These data are fitted to a model of the form (3.15) according to the procedure described in Section 3.2. I Chapter 4. Applications^ 75 replicate some of the cases described in Ludwig and Walters [38], (denoted L&W throughout this section) and use their definition of "optimal effort" (Eopt), as the policy variable to be estimated from the data. In a second stage of the analysis, I shall replace this policy variable by the effort EB (where B stands for `Bayes'), chosen to maximize some specified expected utility. L&W judge the appropriateness of a given model/estimation scheme by examining bias and variability in the estimates for Eopt. In order to obtain a distribution of such estimates, they replicate a large set of Monte Carlo experiments for any given fixed simulation model. While interested in the same kind of analysis, I will proceed differently to obtain the distribution of the estimates. Since a stochastic sample2 of values for 0 is available, the corresponding sample for Eopt — a function of 0 — is available too. Therefore, I can approximate the (Bayesian) posterior distribution of Eopt directly, without starting the whole process from scratch. I shall illustrate the fitting procedure for catch and effort data described in Section 3.2 and compare the outcome to results obtained by L&W. Given the artificial data generation, I have control over the amount of process noise and observation error and can determine its influence over the final outcome of the estimation. Also, the effect of different exploitation histories can be analyzed by defining different effort sequences. In the data generation process I used three different simulation models which I will now describe. Simulation Models Three different schemes were used to simulate the artificial data. In all cases I used the "DerisoSchnute" model (3.16) with a = 1.65 and b = .2. The cases differ according to: (1) the model to describe the biomass dynamic (choice of 7), (2) the catchability parameter q, (3) the starting stock size So, and (4) the global variance a2 = vp + V defined as the sum of process noise iip and observation error V. The values used are listed in Table 4.8. 'Case 3' differs from the other two, by having a much smaller overall variance a2 and a comparatively small catchability q. The parameter A = V7,/(Vp-1- V) is fixed at A = 0.5 in all cases; so that the global variability is 2Recall that it is a sample drawn from the posterior distribution p(0 I Data). Chapter 4. Applications^ 76 Table 4.8: Specification of the three simulation models used to generate catch and effort data. Case No. 1 2 3 -y -1 1 -0.5 q 0.6 0.6 0.31 So 2 1.5 1 2 a 0.16 0.16 0.05 equally divided between Vp and V. The observation error is associated with effort measurements only. More specifically, for t = 1,. - • ,26 and a sequence of independent Gaussian random variables vt, with mean 0 and variance V, I assume that Ef = Ete' where Et is a specified effort sequence. The corresponding catch Cf is obtained from (3.20) when Ef is used as effort measurement. I further use expression (3.21) to get the surviving stock St, an input required for the dynamic model (3.16). The random process noise wt, associated to the biomass dynamics, is modelled as Gaussian with mean 0 and variance V. Fitting Models After having generated the data, I use another set of models to perform estimation: R1 The generalized Ricker model (3.17) with power 6. = 1. R2 Similar to above, with power b = 2. BH Model (3.16) with -y = —1, commonly known as the "Beverton-Holt" model. In all but one case, the "fitting model" is different from the underlying "real" model used to generate the data. For the case in which both models are the same, the posterior distribution describes perceived uncertainties for the "true" parameter vector 0 (known from the simulation model). In the remaining situations such interpretation does not apply and I focus on estimates for the policy parameters. As in the approach taken by L&W, I am concerned with the estimation of an optimal effort (Eopt) and the corresponding optimal surviving biomass (Sot). Chapter 4. Applications^ 77 The exact optimal values are known from the parameters of the simulation model generating the data. I will construct an approximate posterior distribution for such a policy variable by using the AIS procedure of Section 2.3. Further, I will compare my results to those reported by L&W, which were produced by an unrelated estimation scheme. The optimization criteria of L&W are designed to choose a constant effort to maximize the expected present value of future harvest. Appendix B of their paper details the calculation of such an optimal effort Eopt and its associated stock size Sopt. While the details of these calculations are not of concern here, it is important to point out that Eopt is a function of 00 = ln a, 92 = In q and the variance of the process noise Vp. Particularly, for the Beverton-Holt model an explicit solution is possible, and given by Eopt(0, vp) e-02 2 - (Bo Vp — ln Spv) (4.36) where Spy is the discounting constant used in present value calculation (this value is fixed at 6pv = 0.9 for all my calculations). From the above expression, it is clear that an overestimate of Vp causes overestimation of Zvi. This scenario is likely to occur whenever only process noise is assumed. To control this feature, I specify A, which indicates the amount of global variance to be attributed to process noise. I perform different fits using the assumptions of process noise only (A = 1), observation error only (A = 0) , and an intermediate value (A = .5). In order to verify the effect of different exploitation histories over the estimates, I used two alternative sequences of effort Et displayed in Figure 4.6. Although the sequence `Eff 1' is expected to be more informative than the slowly increasing "one-way-trip" of `Eff 2', the latter is closer to situations encountered in practice. Table 4.9 gives a summary of the different scenarios which were analyzed. Rows 1 to 8 allow comparisons within the same simulation model, given in Table 4.8 as 'case 1'. To make comparisons among different models, one can take for instance rows 5, 10 and 14 ( all are R2 ); or alternatively rows 7, 8, 12 and 15 ( all BH ). Figure 4.7 shows the simulated sequences of efforts (dotted line) and the corresponding catches (full line). In Figure 4.8 I shown the observed pairs (Se, Bt+i) for the different simulations. The lines exhibit the true model — GO in (3.24) — used to generate the data. Chapter 4. Applications^ Eff 1 0 ^ ^ ^ ^ ^ 5 10 15 20 25 time Eff 2 5 ^ ^ ^ 10 15 20 time Figure 4.6: The two effort sequences used in the simulation models. 78 79 Chapter 4. Applications^ Case 1(2) Case 1(1) 10 time Case 2 20 10 time 20 Case 3 Figure 4.7: Time-series of observed efforts (dotted line) and the resulting simulated catches (full line), for different models. ▪^ 80 Chapter 4. Applications^ Case 1(2) Case 1(1) 2.0 2.5 3.0 3.5 4.0^2^3^4^5 S(t) S(t)^ ^ ^ 6 7 Case 3 Case 2^ q. • • • ▪ ........ • ale ...^ • •• --------- .. • • • • ft.--- •• •• 1.0 ^ 1.5^2.0 S(t) ^ q 2.5 1.0 1.5 2.0 S(t) ^ 2.5 ^ 3.0 Figure 4.8: Spawning stock 'S(t)' and the corresponding biomass `13(t-F1)' for different simulation models. The line is the function GO from which the data were generated. Chapter 4. Applications^ 81 Table 4.9: Specification of 16 models used to fit catch and effort data. 'SIM' indicates the simulation model. 'FIT' gives the fitting model. `Eff' specifies assumed effort sequence. A is the assumed choice for A. An omitted entrance implies the repetition from the line above. Row No. SIM FIT Eff A 1 2 3 4 5 6 7 8 1 - R1 R2 BH - 2 2 1 2 2 1 2 1 1 0.5 0.5 0 0.5 0.5 1 0.5 9 10 11 12 2 - R1 R2 BH 1 - 0.5 0.5 0 0.5 13 14 15 16 3 - R1 R2 BH - 2 - 0.5 0.5 0.5 0 Chapter 4. Applications^ 82 Given the data, the approximate posterior p(0 Y(25)) for the parameter vector 9 = (in a, In b, in q)' is obtained using the AIS procedure of Section 2.3 and the mixture model (2.3). I define the initial importance function go(.) to be the prior TO. This will allow for the calculation of initial weights c4.70,3 proportional to the marginal likelihood p0(Y(25) I 00,j). I draw an initial sample of size no = 1000 points from this prior, and perform two updating recursions, using respectively n1 = 2000 and n2 = n = 6000. At each stage, the mixture model is clustered to a smaller mixture of m = 800, before proceeding with a new sample draw. The reported sample sizes were chosen empirically. Too small sample sizes, caused instability in measurements of relative entropies (2.4). Monte Carlo estimates for the mean and variance of and functions thereafter were calculated using the full sample of n = 6000 points. In all cases reported here, I used the assumption of Lemma 3.2.1(b). As prior 7r(0) for 9=(9i, 02, 93)'=(ln a, in b, in q)', I defined a three-dimensional Student distribution with 9 degrees of freedom, mean vector m = (mi, m2, m3)' and diagonal covariance matrix C = diag(Ci, C2, C3)). To establish values for the moments m and C, I first determined quantiles for some biologically meaningful variables. To illustrate the procedure, I explain the derivation of m2 and C2 for 92 = ln b in the standard Ricker model Rl. According to this model, the surviving (or spawning) biomass producing maximum output, is given by Smax = 1/b. As quantiles 5, 25, 50, 75 and 95 for Smax, I used the values 1.1, 2.0, 3.1, 5.3, 10.0. These values were checked to approximately cohere with a log-normal distribution. Such a distribution would approximately have mode 2, median 3, mean 4 and standard deviation 3. Satisfied with these numbers, and using the transformation 02 = —1n(Smax) at the given quantiles, I calculated the quantiles of the corresponding Gaussian distribution for 02. From each pair of quantiles I calculated m2 and C2 of the Gaussian distribution; the average of these values were my chosen parameters: m2 = —1.18 and C2 = 0.46. For the generalized Ricker model with power b it holds that b = 1/(5Sfnax). If b(2) denotes the parameter b for model R2 (i.e., b = 2), then, by defining 02(2) = in b(2), it is easy to see that 02(2) = 202 — In 2 Chapter 4. Applications^ 83 I used this linear relation to modify mean and variance of the parameter 92 accordingly. For none of the other parameters was a transformation necessary when going from R1 to R2. The remaining parameters were chosen using a similar procedure. The final values are: R1 : m = (0.57, —1.18, —1.18)' and C = diag(0.11, 0.46,0.46). R2 : m = (0.57, —3.05,-1.18)' and C = diag(0.11,1.84, 0.46). BH : m = (0.57, —2.14, —1.18)' and C = diag(0.11, 0.30, 0.46). Finally, as a prior for V, I defined an Inverse Gamma distribution with parameters ao = 2 and 00 = 5. To simplify comparisons, I used the same priors in all simulations. Description of the Results The entropies (2.4) of the different mixture models give a measure of the goodness-of-fit between the current importance function and the actual posterior densities at the given grid points; a value of 1 corresponding to a perfect fit. The observed entropies consistently approach 1 as the mixture model is updated. A summary of the observed entropies is given in Figure 4.9. It shows the substantial improvement at the first recursion, followed by the refinement of a later step. The following results are summarized in Table 4.10. They are Monte Carlo estimates obtained from the sample of 6000 observations taken from the final mixture model. The effect of fixing A can be verified for models R1, R2 and BH, when comparing the set of runs {1,2,4}, {10,11} and {15, 16}. Regardless of the model, there is an improvement in the estimates (i.e. reduction of bias and variance) as A decreases. This result implies that, for the situations studied here, the best strategy is to assume all random disturbance as observation error. Similar findings were reported in Ludwig at al. [39]. The influence of different exploitation histories can be analyzed by comparing the outcomes for models differing only in their effort sequences. The pairs {2,3} for R1, {5,6} for 112 and {7,8} for BH, provide the output for such comparisons. In all those fits, I fixed A arbitrarily at Chapter 4. Applications^ 84 g(0), g(1), g(2) Figure 4.9: Box-plots for "relative entropy" for the sixteen fits to simulated catch and effort data. 85 Chapter 4. Applications^ Table 4.10: Posterior expectation (standard error) for the biases in Sopt, the bias in Eopt, and the global variance a2 = Vp + V. Row No. Sopt — SIat Eopt — Eo*pt a2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 0.43(1.01) 0.43(1.01) 0.36(0.77) 0.08(0.85) 0.87(1.60) 1.10(1.15) 1.87(1.52) 1.48(1.40) 0.29(0.59) 0.84(0.84) 0.44(0.94) 1.24(1.15) -0.21(0.44) 0.16(0.69) 0.30(0.77) 0.15(0.72) 0.12(0.46) 0.12(0.46) 0.66(0.68) 0.01(0.40) -0.19(0.34) 0.46(0.57) 1.05(1.10) 1.59(1.67) 0.61(0.70) 0.27(0.49) 0.07(0.48) 1.72(1.93) -0.15(0.57) -0.31(0.47) 0.36(1.10) 0.30(1.03) 0.23(0.07) 0.24(0.07) 0.15(0.04) 0.24(0.07) 0.24(0.07) 0.15(0.04) 0.23(0.07) 0.16(0.05) 0.19(0.06) 0.22(0.06) 0.22(0.06) 0.21(0.06) 0.07(0.02) 0.07(0.02) 0.07(0.02) 0.07(0.02) Chapter 4. Applications^ 86 0.5. The results for estimates of Sopt and Eopt oppose each other. In all cases, going from `Eff 2' to `Eff 1' reduced the variance for Sopt and increased bias and variance for Eopt. For Sopt the bias increased only in the case of model R2. The sequence `Eff 1' definitively improved the estimates for the global variance a2. Under the simulation model described as 'case 1', only the data generated from `Eff 1' (runs 3, 6 and 8) have expectations close to the real value 0.16, the general tendency being towards overestimation. Since `Eff 1' would theoretically produce a more informative data sequence than `Eff 2', the inconclusive results came as a surprise. Another question to be answered: "How do different fitting models (R1, 1t2 or BH) perform under different circumstances (i.e. different simulation models) ?" This addresses the robustness of a given estimation model to deviations from the correct underlying biomass dynamics Such relative comparisons are possible for the sets {2,5,7}, {3,6,8}, {9,10,12}, and {13,14,15}. Considering both, bias and variance, model R1 performs best for Sopt, while R2 is the better model for Eopt. It is important to notice that model BH failed, even in 'case 1' where BH was the "correct" model from which the data were generated. Similar findings in Ludwig et al. [39] and L&W, indicate that the generalized Ricker model is more effective in producing good estimates for policy parameters Eopt and Soo than some of its competitors. This includes cases for which the data were effectively generated from such competing models. As was already mentioned, the posterior estimates of variance tend to overestimate its true value. In the different situations investigated here, such overestimation is roughly one standard error. Exceptions are runs 3, 6 and 8 (already discussed), and runs 13 to 16 corresponding to simulations in 'case 3'. In runs 7 and 8, the simulation and fitting models are the same. I have used 'run 8' to construct the posterior distribution for the parameter vector (In a, in b, in q)'. The two-dimensional marginals are shown in Figure 4.10 (a) to (c); the correct parameter combination used in the data generation (i.e. simulation model) is shown as a dot in each contour plot. In (d), I display a histogram of the 6000 estimates for Eopt; the true value is 0.54. The strong asymmetry of the histogram is a consequence of a few extremely large values (the maximum being Eopt = 16.3). Chapter 4. Applications^ 87 (a) (b) yk: -3 -2 In(b) -1 Figure 4.10: Contour plots of two-dimensional marginal posterior distributions for 'run 8'(lines at .05, .10, .25, .50, .75, .90, .95 of largest density). The dots indicate the true parameter used in the simulation model. In (d) a histogram of the 6000 Monte Carlo estimates for Eopt (the column to the far right includes all Eopt > 5). 88 Chapter 4. Applications^ Utilities and Bayesian Decision So far I have concentrated my analysis on policy criteria of L&W in order to compare the Bayesian procedure and the "total least-squares" method used in their paper. But, in order to adequately explore the power of Bayesian decision making, I need a different approach. According to the description of Section 1.3, an optimal policy corresponding to a"Bayes rule" has to maximize the expected utility. To describe this approach, I use the utility function introduced in Section 3.2. It remains to define a set of decision rules A and the threshold SLIMIT. For A, I specify a finite set of efforts, over a range — from 0.05 to 2.50 — which I assume implementable for the imaginary fishery under consideration. This set (50 points) is fixed as A = {0.05, 0.10, 0.15, - • • , 2.45, 2.50} More precision could be achieved with a finer grid. But when it comes to enforcement of such policy, it might be of little use to distinguish between 0.68 and 0.71 — from a practical viewpoint 0.7 serves both cases. The expected utilities for six simulations are displayed in Figure 4.11. Each row corresponds to a different simulation model. The first column (Runs 1, 3 and 5) relates to the fitting model R2; BH is the fitting model in the second column (Runs 2, 4 and 6). Table 4.11 also accounts for the models — notice that, in contrast to calculations of Eopt, no value of 5, is needed — and gives the effort EB which maximizes the expected utilities. In square brackets is the corresponding "true effort". It gives the best choice (in the set A) if complete information were available, the dynamics deterministic, and the above defined utility function were used as the maximizing criteria. All runs use SLIMIT = 0.3 — the threshold for surviving stock in the utility function. The evaluation of R2 and BH is done by comparing EB with the corresponding "true" policy. For simulation models 'case 1' and 'case 2' R2 performed best while in 'case 3' the correct estimate was given by BH. For model BH two features are observed: (1) the value Chapter 4. Applications ^ Run 1 89 ^ Run 2 • .•-•...41....•.••••••••••■••••••••••• 0. o k-r • 3 Ns.s LU 01 • N • ^ ^ ^ ^ ^ 2.0 25 0.5^1.0^1.5 2.0 2.5^0.0 0.0^0.5^1.0^1.5 ^ Effort Effort Run 3 ^ Run 4 0. 3 0• 9. w Cd o •••-. V3 a ••••...„ ^ ^ ^ ^ ^ 1.0^1.5 0.5 2.0^2.5 2.0 2.5^0.0 0.0^0.5^1.0^1.5 ^ Effort Effort Run 5 ^ Run 6 • ^ ^ 2.0 2.0^2.5^0.0^0.5^1.0^1.5 0.0^0.5^1.0^1.5 Effort Effort Figure 4.11: Expected posterior utilities over the set A of possible choices for effort. First column uses fitting model 1t2 and second column uses BH. Data from simulation models 'case 1', 'case 2' and 'case 3' were used in rows 1, 2 and 3. Table 4.11: The Bayes rule EB for different cases is displayed together with its "true" alternative [value in square brackets]. Columns 2 to 4 specify model assumptions. All 6 runs use SLIMIT=0.3. No. 1 2 3 4 5 6 SIM 1 1 2 2 3 3 Eff 1 2 1 1 2 2 FIT R2 BH 1t2 BH 1t2 BH EB 0.75 [0.60] 2.15 [0.60] 0.60 [0.60] 1.80 [0.60] 0.70 [1.20] 1.20 [1.20] Chapter 4. Applications^ 90 EB is always larger than the corresponding value for 1t2; (2) the expected utilities (near the maximum) are very close over a wide range of effort selection — in run 2, for instance any value from effort 1.5 to 2.5 would do. Further notice that, similarly to the findings in the first part of this section, the "correct" fitting model of 'run 2' is outperformed by the "incorrect" R2 of 'run 1'. I also experimented with different values for SLIMIT —not shown here. A sufficiently large increase of SLIMIT will tend to produce smaller values for EB, reflecting a more conservative management. Finally, there is no effect of different exploitation histories (Eff 1 versus Eff 2) over the final outcomes in all cases I have analyzed. Conclusion Here is a summary of the results for this section. Two distinct analysis are considered. The first estimates the optimal effort Eopt as defined by L&W and the results are discussed in the light of earlier findings reported by them. The second analysis is new; it uses an utility function for which the best effort is the Bayes rule EB, chosen to maximize the expected posterior utility. First Analysis: Given the goal of estimating the optimal effort, the three central findings agree with L&W. I summarize them as follows: 1. When deciding on how to "split" the overall variance into process noise and observation error, it is preferable to favour the observation error. In particular, observation error only (A = 0), was shown to outperform the assumption of process noise only (A = 1). 2. An "incorrect" fitting model can lead to superior estimates of policy variables. For catch and effort data with large error and/or low variation in stock size, one might give up fidelity of the fitting model to the underlying dynamics in order to obtain improved estimates for the policy parameter of interest. In particular, the Ricker models (R1 and 1t2) have outperformed the Beverton-Holt (BH) alternative. Chapter 4. Applications^ 91 3. Different exploitation histories (here represented by distinct effort sequences) can affect policy parameters in different ways. Overall, there is a trade-off between bias and variance. For the cases analyzed here, there is no single model which is superior for both parameters Eopt and Sopt under all circumstances. Second Analysis: Here I use Bayesian decision-making. The utility function incorporates the central goals that a fishery manager would like to achieve: maximization of catches and reduction in the risk of stock collapse. hi order to determine the best solution, there is no need for equilibrium assumptions, neither is it necessary to specify a parameter Ä dealing with process noise and observation error. The results show that: 1. The "incorrect" fitting model performed better than the correct model. This feature was also observed in the previous analysis. 2. For all three simulation models, at least one of the fits identified the "true" (perfect information) effort. It seems that the generalized Ricker model (particularly 1t2) does better than model BH while always staying on the conservative side (smaller efforts). Only in 'case 3', for which the effort has less effect over stock depletion3, did BH give the correct effort. 4.4 Surplus Production: Orange Roughy This section is a case-study of the Orange Roughy (Hoplostethus atlanticus) fishery on the Chatham Rise, New Zealand. The analysis uses data previously published by Francis [18]. The risk analysis performed by Francis in that paper has caused some recent controversy (Hilborn et al. [24]) regarding the state of the fishery and consequently the management actions to be taken. Bayesian analysis which is made possible by the procedures developed in Chapters 2 and 3, can help to clarify some of the controversy. Using an age-structured model with many parameters (which were estimated elsewhere and 3The catchability q = .31 is half the value used in cases 1 and 2. Chapter 4. Applications^ 92 are assumed known), Francis built a cumulative distribution function for the virgin (unfished) biomass B1. This construction was done by replicating simulations 100 times and using a frequentist approach to make empirical estimates of cumulative probabilities. He proceeded by fitting a Johnson's distribution (Johnson and Kotz [26] p.23) to the simulated cumulative distribution. Taking the first derivative of this fit, he obtained a probability density for B1. The mode of this distribution was used as a maximum likelihood estimate of B1. His goal was the assessment of the risk of collapse within the next five years, given different rates in the reduction of annual total allowable catches (TAC's) within that period. I shall give further details about these different decision options later. Hilborn et al. [24] performed a series of Monte Carlo experiments for scenarios in which Francis' method of constructing probability distributions for B1 fails: cases with low harvest rate and uninformative data. Based on these results, they question the estimate for B1 produced by Francis. A key element in the controversy is the uncertainty about the virgin biomass B1 and its use in decision making Francis describes uncertainty by a density for B1, from which he then draws random values of B1 to produce replicates for a frequentist construction of his risk function. Further uncertainties are ignored in the process. I shall use the Bayesian alternative described as 'Model 2' (Section 3.3) to construct a posterior distribution for the parameter vector 0 which includes ln B1 as one of its elements. This multi-dimensional posterior distribution is further used in the context of Bayesian Decision Theory to determine an optimal action. The data of Francis are reproduced in Table 4.12. They include a time series of catches {Ct; t = 1, • • • ,11) and estimates of abundance from trawl surveys {It; t = 6, , 11). According to Francis' assumptions, the errors in Ct are negligible, so that all observation error is related to It. My general model, is given by (3.24) and (3.25). Particularly, I define = (In a, —ln^, ln q)' b = 1/B1 93 Chapter 4. Applications^ Table 4.12: Estimated catch history, and trawl survey biomass for Orange Roughy (Hoplostethus atlanticus) for the period 78/79 to 88/90. Data from Francis [18]. Time Catch (t) Trawl Index 1^15,340 2^40,430 3^36,660 4^32,354 5^20,064 6^32,263^164,835 7^38,142^149,425 8^39,098^102,975 9^39,896^80,397 10^31,478^97,108 11^42,621^66,291 St = Bt - Ct G(O, Bt_i) = St i I aBt_i (1 - bBt_i) for t = 2, • • - , 11 - - - F(9, Bi) = qBt for t = 6, • - - , 11 Model GO describing biomass dynamics, is the discrete form (i.e. difference equation) of the Schaeffer model. The observation model F(.) assumes direct proportionality between index I and the corresponding true biomass B, with constant of proportionality q. Determining Priors I used two different priors for O. Both are three-dimensional Student distributions, with 9 degrees of freedom. They represent my interpretation of the information provided and general assumptions used by Francis. The first prior, denoted "informative", assumes a relatively high confidence in prior identification of the mean values of the parameters. It is based on a guess for mean as well as a range to represent possible smallest and largest values. For the second prior, denoted "diffuse", I made a more careful (and pessimistic) elicitation. Starting with guesses for possible values of five pre-specified quantiles — 5, 25, 50, 75 and 95 — I found the parameters Chapter 4. Applications^ 94 of an idealized log-normal distribution which would best agree to those values (If these numbers were not coherent with such a distribution, I tried to improve with a new set of values). The parameters corresponding to the underlying Gaussian distribution were used as parameters in the Student prior. I also assumed a diagonal prior covariance. For the parameter q I used as quantiles the values 0.20, 0.30, 0.41, 0.56, and 0.90 — these values more than cover the range of possibilities considered by Francis. Similarly, for B1 I used the values (in thousands) 200, 400, 650, 1000, and 2000 and for a the values 0.04, 0.10, 0.19, 0.49, and 0.95. The prior parameters for the Inverse Gamma distribution of the observation variance V were the same in all runs. They were based on Francis' assumption of an coefficient of variation of 0.20 for observation errors in the abundance index. By fixing ao = 2 and setting the mode for V at 0.40, the value 00 = 8.0 was calculated. The values for the prior were as follows: • V /G(ao = 2.0, /3 = 8.0) • Informative: ro(0) T3(9, Mo, Co) with mo = (-1, 61, –13.36, –0.66)' and Co diag(0.30, 0.20, 0.05) • Diffuse: ro(0) T3(9, mo, Co) with mo (-1, 62, –13.37, –0.88)' and Co diag(0.97, 0.48, 0.21) Results Table 4.13 summarizes model specifications (columns 2 and 3) and the results for different runs. Various bandwidths between h2 = 1 and h2 = .6 were used. I wanted to make sure not to smooth out potential "ill behaved" features, such as multimodality, by using too large a bandwidth. Variations in estimates were small within the given range for h2. For the results presented here, I used h2 = 0.64 and h2 = 0.90. In the AIS procedure (Section 2.3) I used three updating steps with sample sizes no = 1000, n1 = 2000, n2 = 2500, and n3 = n = 6000 . At each stage the mixture model was clustered Chapter 4. Applications^ 95 Table 4.13: Model specifications (columns 2 and 3), final entropy (Ent.) and posterior estimates for virgin biomass B1 for Orange Roughy (values in 1000 t) and the probability that B1 exceeds 500 thousand t. Values in brackets are estimates for posterior standard error. Run 'la' uses assumption (a) of Lemma 3.1.; the remaining runs use assumption (b). 'D' denotes diffuse prior and 'I' the informative alternative. Run la 2 3 4 5 112 0.64 0.90 0.64 0.64 0.64 Prior D D D I I Ent. 0.94 0.98 0.98 0.99 0.99 B1 450 (108) 436 (92) 431 (90) 385 (48) 384 (49) P(Bi > 500) 0.25 (.006) 0.19 (.005) 0.18 (.005) 0.02 (.002) 0.02 (.002) to m = 800 components of three-variate Student distributions with 9 degrees of freedom before proceeding. The relative entropy (2.4) follows the usual pattern described in previous sections. A typical outcome (Run 5 of Table 4.13) gives the values 0.597, 0.947, 0.987, and 0.991 for the importance functions gi(-) for i = 0, • • , 3, respectively. The prior clearly influences the final estimates. If diffuse, the Monte Carlo estimates of posterior mean and standard error are higher (approx. 433,000 t and 90,000 t), than estimates obtained under the informative alternative (approx. 384,000 t and 48,000 t). Similarly, the estimate of the probability that B1 exceeds 500,000 t, is about 0.2 for the diffuse prior, while in the informative case the estimate is 10 times smaller. The marked differences due to different priors are also shown in Figure 4.12. Three different marginal posteriors are displayed: for 'run la' (line), 'run 2' (dots) and 'run 4' (broken line). If the diffuse prior is accepted, values of B1 up to 800,000 t are possible. This is quite different from the informative case, where values of B1 exceeding 600,000 t are unlikely. A comparison with the estimated density reported by Francis (his Figure 1(B)), suggests the density which was obtained for 'run 4' (broken line). In other words: the acceptance of Francis' distribution corresponds to the acceptance of the informative prior in my model. The marginal distribution for 'run la' (line) shows an interesting feature suggesting bimodality. This was an atypical feature. It is included as an illustration of a particularly interesting Chapter 4. Applications^ 96 210A5^4*10A5^610115^8*10A5^10A6^1.2*10A6^1.4*10AE Virgin Biomass Figure 4.12: Marginal posterior distributions for virgin biomass B1 for different runs: `laVine), '2'(dots) and '4' (broken line). Chapter 4. Applications 97 (a) 12.0 12.5 13.0 In(B 1) (b) 13.5 14.0 12.0 12.5 135 14.0 (C) 0. Tcy • 12.5 13.0 In(B_1) 13.5 12.5 13.5 (e) 12.4 12.6 12.8^110 In(B 1) 13.2 13.4 12.4^12.6^12.8^13.0^13.2^13.4 111(8_1) Figure 4.13: Contour plots for two-dimensional marginal posterior distributions: 'run la':(a) and (b); 'run 2':(c) and (d); 'run 4':(e) and (f). Contour lines are drawn at .05, .10, .25, .50, .75, .90 and .95 of the highest density. possibility. To better see this feature, I included contour plots of the two-dimensional marginal posterior distributions for (ln Bi,ln a) and (ln q) in Figure 4.13 for the same three cases. For 'run la' (displayed in (a) and (b)), the two modes are clearly shown. All remaining cases are unimodal. The negative correlation between B1 and q is consistently present (contours (b), (d), and (e)). Chapter 4. Applications^ 98 Decision making - The analysis so far was able to describe some general features of marginal uni- and bidimensional posterior distributions and its relation to prior specification. But the final goal of Francis' analysis was the use of his risk-assessment to make management decisions. A key element in Hilborn et. al.'s criticism was the absence of a coherent framework for decision-making under uncertainty. I shall use the three-dimensional posterior distribution for 9 together with an appropriate utility function to perform a Bayesian decision analysis and choose the Bayes rule policy from among the five different options considered by Francis. But first I shall describe these management options. The fishery for Orange Roughy is managed by annually setting the TAC. For instance, the fishing season of 89/90 (the year immediately following the data sequence of Table 4.12), had its TAC fixed at 28,637 t. According to Francis, there seems to be an agreement to reduce TAC. But two questions remain: (1)"to what level should the TAC be reduced" and (2)"at what rate should the reduction take place". The target level was fixed at a maximum constant yield of 7,500 t — accounting for question (1). Francis defined this value to be -IMSY, where MSY denotes the deterministic maximum sustainable yield. This value is questionable on two grounds: (i) it is an arbitrary choice and (ii) its calculation uses the estimate B1 = 411,000 t, ignoring the uncertainties of this estimate. What I will try to answer here is question (2): the rate at which such reduction should be made. Table 5 of Francis displays five alternative options. I will use the same options to define my set of actions A = {al, a2, a3, a4, a5}. Given action al, the rate of TAC reduction is 3 thousand tons per year. Similarly for a2 up to a5, the rates are respectively 5, 7, 9 and 12 thousand tons. Francis considers "the risk to the fisheries" which each of these options represents by estimating (using a frequentist empirical estimate) "the risk that the fishery would collapse within a five year period". The fishery is said to have collapsed if, within this period, in some year the TAC is greater than 2/3 of the corresponding biomass. 99 Chapter 4. Applications^ To construct my utility function I assume that, for a fixed ai and in any given year, the catch will equal the prescribed TAC; I denote this total quota by T(i). Hence, starting from the current To = 28,637, future quotas for s = 1,-• • , 5 are given by T2) = max{To — sat; 7500) Defining the discount rate of future catches by Spy, the cumulative discounted catch over the five year period is denoted CC(i) and given by Cr(i) 5 (5-TP E = s=i I further need to identify the "occurrence of collapse given decision ai". This is done by the indicator function A(2) = 1 1 if TP) > B12-i-8 for some s = 0, • • • , 5 0 otherwise (4.37) Given a pair (Oi , ai) E 0 X A (recall that 0 is the random sample from the posterior distribution), the utility function can be stated in terms of the notation just introduced: U( 83, ai) = (1 — A())0(cc(2)) ^ (4.38) where c/) is some known non-decreasing function. The utility (4.38) is zero whenever collapse is predicted (A(i) = 1). Otherwise it is 0(CC(i)). The dependence on the posterior p(9 I Data) becomes clear when considering the algorithm used to calculate (4.38): 1. Fix (ki E 0 and use GO together with the observed catch data {Ct; t = 1,•• • , 11) to predict the average time series for {Bt; t = 2, • ,12}. 2. Fix a decision rule ai E A and define {To, TP, • • • , TP} to be the catches into the future. 3. Evaluate A(2) according to (4.37). 4. Calculate U(93, a1) using (4.38). Chapter 4. Applications^ 100 Table 4.14: Risk-averse utilities 0 associate to the five possible cumulative discounted catch COO. i ai COO x10-3 0(CC(0) 3 83 .981 1 2 5 63 .950 3 7 52 .916 4 9 .888 46 .850 5 12 40 5. Repeat the whole procedure for the next pair (0i, ai). I consider 3 alternatives for 0: 1. 0(x). x. 2. 0(x) = 1. 3. q:(x) is a risk-averse function defined in Table 4.14. The first assumes 0(x) = x; so that the utility is linear in CC whenever the possibility for collapse is excluded. A more conservative approach, which is concerned only with the risk of collapse defines 0(x) = 1. This second alternative assumes the same utility for different catches regardless of its size; an unrealistic assumption. A compromise between both extremes can be obtained by a concave function for 0 and is given as the third option. Some comments on Table 4.14. Since the values CC() are fixed for each i, I only need to define 0 for those particular points. Notice that the utilities become smaller with smaller values of CC, but the reduction is non-linear. This values were determined assuming 0(160) = 1 and using Table A.1 for constant risk-aversion in Lindley [31] (p.196). For comparison, the linear (or risk-neutral) values would be .52, .39, .33, .29, .25 for i = 1 to 5, respectively. For the manager who needs a practical answer, the results in Table 4.15 are the most important of the section — with complementing details on Table 4.16. The following features are highlighted there: Chapter 4. Applications^ 101 • The decisions are robust over different bandwidths [1a,2] and prior assumptions ([1a,5] and [3,4]). • What defines the optimal rate is the utility function. If the linear relation 0(x) = x is chosen, the optimal decision is al which recommends a reduction in TAC of 3000 t per year over the next 5 years. On the opposite end, the conservative '0-1' option (0(x) = 1) recommends a5, a drastic rate of annual reduction by 12000 t. The intermediate risk- averse utility recommends rates in between: it chooses a3, a reduction rate of 7000 t per year, as optimal. • The expected utilities in the '0-1' case, have a special interpretation: the complement of the probability of collapse. For any decision a, the probability of collapse pi, is estimated as pi =1— E[U(0,ai)] If, for instance, I take 'run 5' in Table 4.16, these estimated are .27, .13, .08, .06, .05 for actions al to a5, respectively (these probabilities change little for diffuse prior:`run 2'). Such values are in disagreement with the results presented in Francis' Figure 3: his estimates of the risk of collapse exceed .6 for al, while probabilities smaller than .2 are observed only for a4 and a5. Since both methods are derived under different paradigms, a direct comparison is not possible. However, Francis' model is complex (many parameters) and the estimates of risk are sensitive to recruitment assumptions (as indicated in his Figure 3(d)). My model is comparatively simple and does not need special assumptions for recruitment. The findings in Ludwig and Walters [37] [38] indicate that simple models are likely to be more robust for management purposes. The results suggest that, for practical decision-making, the focus of discussions should be the assessment of the most adequate utility function to which all parts can agree (industry, agency, fisher, environmentalist, etc.). Given the available data, details regarding technicalities in parameter estimation and modelling seem to be of minor consequence. 102 Chapter 4. Applications^ Table 4.15: Recommended decisions ai (rate of TAC reduction in thousand t/year), which maximize the expected posterior utilities, using three alternative utility functions Run la 2 3 4 5 Linear 3 3 - Risk-Averse 7 7 7 7 0-1 12 12 12 Table 4.16: Expected utilities obtained for all 5 decision rules, according to the form of the utility function (expressed by 0). 4) Linear Risk-Averse 0-1 Run 3 4 la 3 4 5 la 2 5 al .622 .609 .740 .733 .717 .716 .767 .753 .730 a2 .538 .540 .814 .811 .825 .820 .871 .862 .870 a3 .462 .470 .823 .824 .838 .837 .913 .906 .918 a4 .420 .427 .814 .817 .830 .828 .931 .922 .934 a5 .373 .378 .793 .796 .808 .810 .945 .938 .951 Chapter 4. Applications^ 103 Summary There are 4 conclusions from this case-study for Orange Roughy: 1. The simple production of probabilities of collapse, as done by Francis, is not enough for decision making. The uncertainties (given by probabilities), need to be linked to measures of the consequences (utilities) resulting from different actions. The Bayesian paradigm further says that, for rational and coherent decision making, the maximization of expected utilities is the correct criterion to use. Such analysis was carried out with three distinct utility functions. By taking this additional step, my analysis completes the work of Francis. 2. The shape of the marginal posterior distributions for B 1 is affected by the choice of a prior. I used two alternative priors: informative and diffuse. The distribution used by Francis corresponds to the informative prior. The diffuse prior resulted in a heavier upper tail, increasing the possibility of values B 1 > 500,000t from 5% to about 20%. 3. Decision making is robust with respect to choices for prior. Given any of the utility functions, the optimal decision was not affected by the particular choice of prior (informative or diffuse). This indicates that for the current model/data situation, and assuming consensus about the set of actions A, the tail behaviour in the distribution for B 1 was not strong enough to change the overall optimal decision. 4. The construction of a sensible utility function is an important tool in the negotiating process among all interest groups. My results suggests that this aspect of the analysis overshadows technicalities concerning the choice of a model for biomass dynamics 4.5 Multimodality: Simulation Model The advantage of replacing classical parameter estimation by the Bayesian alternative is most apparent when the posterior distribution is multimodal. It indicates that more than one model Chapter 4. Applications ^ 104 describes data reasonably well. Such information is an important tool in the design of adaptive management policies (Walters [68]), where one of the requirements is the identification of alternative hypothesis. A procedure that forces the choice upon a single best model, as the classical approach does (least-squares, maximum likelihood), is unable to capture the uncertainty present in the problem. Such ideas were the motivation to examine practical situations for which multimodality might emerge. In this section and the next, I give examples for which the posterior distribution is multimodal. Here I use simulations to produce artificial data. The next section is a case-study of Pacific cod. I employ two different approaches: 1. In the simulated example I use an observation mixture-model (MM) given as: (Yt I 0, V), r • N(141)(0),V) -1- (1 - r) • N(142(9), V) with r E (0,1)^(4.39) /111) ,^(2) ,. where^r pt for all t. Such model might be adequate in situations where available knowledge suggests a mixture of two possible hypothesis (or stocks): a very productive stock with low virgin biomass, against a less productive stock with large virgin biomass. 2. The approach of the next section maintains the original observation model (without mixture) but modifies the description of biomass dynamics by replacing the surplus -production model by a more elaborate delay - difference (DD) model (see Section 3.4). The observation mixture model (MM) For data generation I used a standard logistic (Schaeffer) model for biomass dynamics. The simulation model is as follows: Bt = St - i + aBt-i(1 - bBt-i) St = Bt - Ct Ct = Bt(1 - e-c2") 105 Chapter 4. Applications^ Ef = Etevt , where vt are i.i.d. N(0,ae2 = 0.05) I = CtlEt) where Bt is biomass, Ct the catch, Et the effective effort, Ef the observed effort and h the observed abundance index. Parameters in the model are a = 0.2, B1 = 1/b = 4000 and cat chability Q. The cat chability Q is modelled as a random variable from a mixture of uniform densities, namely Q ,,, EU(.20, .40) + (1 — OU(.55, .65) with c = 0.4. The real (but unknown) effort sequence is defined for t = 1,- • • ,26 to be 0.40 if 9 < t < 18 Et = 10.20 if t = 9 and t = 18 0.04 otherwise Figure 4.14 shows the time series of Ct, Ef and It for the 26 years of data. The last plot displays the abundance index against the corresponding observed effort for the last 13 years. For fitting purposes I assume that only these 13 abundance observations are known along with all 26 observations for catch. The observation model for Yt = log It uses MM defined in (4.39) with r = 0.5. I further ( define p i)(0) as /4°(9) = ln(q(i)e) for i = 1,2 where for i = 1 I use a(1), b(1), q(1) according to the relation (ln a(1), ln b(1), ln q(1)) = (00,01,92) while for i = 2, I take (a(2), 1)(2), q(2)) = ( a^ (1) b(1) 2 ' 2 ' 2 . q(1) ) In the above construction I am mixing the hypothesis (i = 1) of high productivity, low virgin biomass, and a given catchabifity, with an alternative hypothesis (i = 2) of low productivity, large virgin biomass, and high catchability. Chapter 4. Applications^ 106 0 CV 0 11. • .^ o 90 90 VO £'0 Z'O VO 00 000 0091.^0001.^009 pogo 3/0 o cm 0 0 0 cm 1.... 0 0 009^009^00t7^00Z^0 40W0 000k^000£^000Z^0001. 3/0 Figure 4.14: Time series for Catch (Ct), observed effort (Es) and CPUE (It). Also a plot of `CPUE vs. E' for the final 13 years Chapter 4. Applications^ 107 The posterior distribution for the parameter vector 9 is obtained by using the algorithm of Section 3.3. What changes is the use of (4.39) as the observation model. Some two- and three-dimensional' marginal posteriors are displayed in Figure 4.15. The perturbation caused by the stochastic behaviour of the catchability Q is translated into uncertainties regarding the virgin biomass. A secondary mode is estimated for a biomass near 1600 while the primary mode is estimated near a biomass of 3200. Also notice the negative correlation between in a and in B1 which reflects the confounding between a small productive stock versus a larger less productive one. If both modes are interpreted as summarizing conflicting hypothesis, the highest one indicates the hypothesis with the strongest support. In this example, the large stock (virgin biomass near 3200) with a smaller intrinsic growth rate (parameter a) is the most likely of both alternatives. This hypothesis is closest to the true virgin biomass of 4000. 4.6 Multimodality: Pacific Cod One might argue that the mixture model (4.39) used in Section 4.5 is artificial. Therefore, I investigate the emergence of a multimodal posterior in a more natural way, without using a specific mixture model for observations. Possibly surplus-production models are too simplistic5. A next step towards more flexible models, capable of incorporating characteristics of growth, recruitment, and time-delays, is the class of delay-difference (DD) models. By establishing assumptions for growth, survival and recruitment, these models can better depict the interactions of these elements and the resulting time-delays. For more details on the DD model and the notation used throughout, see Section 3.4. Here I shall present a case-study using the data for Pacific cod (Gadus rnacrocephalus) presented in Hilborn and Walters [25] (p.345). For the purpose of the analysis, parameters for growth and natural survival are assumed known (Walters pers. comm ) and given in Table 4.17. The observed data are two time-series: catches {C1, • • , Cr} and abundance index 4For three-dimensional plots I used `perspO' function in S-Plus. 5But recall that in the case-study for Orange Roughy one of the fits also suggested bimodality. Chapter 4. Applications^ I. 8"09'0VOZ*0 0 TOAmsuep Z^1.^0^l-^Z-^E-^V-^ ^ (e)Boi 108 El. I. 81:01:1YOZ'00 TOApuep I.^o^I-- ^ z-^C- (b)601 Figure 4.15: Some marginal posteriors for data simulated using the logistic biomass model and fitted by the mixture model described in text. The contours are at .01,.10,.20,.30,.40,.50,.70,.90 of the largest observed posterior density. 109 Chapter 4. Applications^ Table 4.17: Parameters which are fixed in the delay-difference model for Pacific cod (Gadus rnacrocephalus). Parameter Value 1.0 a 0.8 0.5 Wk 0.6 3 {h, -,/,} for a total of T = 22 years. The abundance index is catch per unit effort, CPUE. I shall define harvest rates in terms of catches by ht=CtlBt for t = 1, • • , r The abundance index h is assumed proportional to the corresponding true biomass Bt but subjected to i.i.d. observation errors vt. Hence, I use the standard model h = qBtevt for t = 1, • • • ,T where q is the constant of proportionality. The unknown parameters are: b in the recruitment equation (3.35); B1 = Be the (initial) equilibrium biomass needed in (3.34) and (3.35); and the constant of proportionality q. The parameter vector 0 = 02, 03)' is given by 0 = (ln ln q,ln The observation model, written as a function of 0, is defined by: (Yt I 0,V), N(pt(0),V)^ For any fixed value 9 and time-series of catches {Ct}, the sequence obtained recursively, using the following steps: 1. Start with B1 = e93 calculate R1 using (3.34). ^ (4.40) t = 1, • • , 7} is 110 Chapter 4. Applications^ Table 4.18: Quantiles specified for the parameters. The mean m and variance C are parameters for the corresponding Gaussian distribution for the logarithm of the parameter. Quant. b q x 10-3 B1 5 1.0 1 1 25 1.4 6 3 50 2.5 25 6 75 3.5 100 9 95 5.0 500 18 m 0.82 -3.74 1.60 C 0.28 3.73 0.75 2. Substitute R1 into (3.33) to obtain Ni. 3. For t < 5, define Rt = R1; otherwise use (3.35) and the corresponding "lagged" spawning stock 4. Use (3.31) and (3.32) recursively to obtain the sequences {13t t . 1, • • • ,r} and Ot t = 1, ..., r}. 5. Calculate {MO) : t = 1, ..., r}, where ft(G) = ln(q.ht) and q = 02. The posterior distribution for 0 is obtained using the adaptive importance sampling (AIS) of Section 2.3. For data fitting I use the algorithm proposed in Section 3.3. To describe background information about the parameter vector 0, I constructed the prior distribution based on educated guesses for specified quantiles of the original parameters to which biological meaning can be attached. Table 4.18 gives these values for five different quantiles of an "imaginary" marginal log-normal distribution for each parameter. The listed quantiles were the outcome after a series of minor adjustments until reaching consistency among all five values. The parameters which would correspond to the underlying Gaussian distribution were used as the mean m and variance C in a three-dimensional Student distribution with 4 degrees of freedom and diagonal covariance matrix. In order to evaluate the range of values for b, notice that 1 _ Sopt b — Be Chapter 4. Applications ^ 111 where Soo is the spawning stock which produces the maximum number of recruits Ropt. It is easy to see from (3.35) that RRop: = _bleb_i so that for b = 5 the recruitment Ropt would be eleven times larger then its value at unfished equilibrium Re. It is also reasonable to assume 0 < Soo < Be, which implies b> 1. For q and B1 the quantiles can be interpreted directly. As prior for the variance V I assume an Inverse-Gamma distribution with parameters av = 2 and fiv = 8 which is diffuse (infinite variance) and corresponds to a coefficient of variability of approximately 20% in the observed abundance. The approximation for the posterior distributions were obtained after 4 iterations using no = 1000, n1 = 2000, n2 = 2500, n3 = 3500, and n4 = n = 6000. At each stage, a reduction to a smaller mixture model of m = 800 components was performed before proceeding. The heavier tails (4 degrees of freedom) were chosen empirically after trying different possibilities while aiming for stable measures of relative entropy (2.4). For the results discussed here, I set the bandwidth h2 = 0.9. The entropy of the final approximations was 0.91 (recall that a value of 1 corresponds to a perfect fit at the posterior sample points). In Figure 4.16 (a, b and c), contour-plots for the two-dimensional marginal posterior distributions are displayed. The pair (ln b,ln Bi) (Fig. 4.16(a)) has a primary mode near (1.4,1.3), while a secondary (less likely) mode is near (0.5, 2.1). Bi-modality is also shown in (c), the marginal posterior for (lnq,ln b). In (b) a strong (negative) correlation between in B1 and in q features the confounding effect typically found for those parameters. Figure 4.16 (d) shows the one-dimensional marginal for in B1. The comparison of this one-dimensional marginal distribution with the contour plots (a) and (b) gives a good indication of loss in information incurred by the reduction in dimensions. Still, such a one-dimensional marginal posterior distribution has much more to say than a point estimate as, for instance, the Monte Carlo mean 1.89 with standard deviation 0.36. The confounding effect between B1 and q suggests that a good prior knowledge about q is likely to reduce posterior uncertainty about B1. In order to see this, I have defined a more Chapter 4. Applications ^ (a) 112 ^ (b) cv T 7 1 -1 2^3 In B_1 4 5 ^ ^^ ^ ^ ^ ^ 2 2^3 4 0 1 0 5 ^ In b In B_1 Figure 4.16: Estimated posterior marginal densities for Pacific Cod (Gadus macrocephalus). The contours represent .05, .10, .25,.50, .75, .90 and .95 of the largest observed density. Chapter 4. Applications^ 113 Table 4.19: Monte Carlo estimates of posterior mean (standard deviation) for diffuse and informative priors. The entropies are for the final importance function. Estimates V B1 P(Bi > 9.0) P(Bi > 10.0) Entropy Diffuse 0.28 (0.10) 8.1 (7.5) 0.29 (.006) 0.23 (.005) 0.91 Informative 0.30 (0.10) 8.7 (4.0) 0.40 (.006) 0.28 (.006) 0.95 informative prior for q, setting the quantiles 5, 25, 50, 75, 90 and 95 at values .03, .05, .08, .12, .22 respectively. Such resulted in the prior moments m = —2.53 and C = 0.38 for ln q. This prior favours the less likely of both modes. Figure 4.17 shows the contours (and perspective plot) for the marginal posterior between In b and ln B1, together with the one-dimensional marginals for the individual parameters. Notice the increased credibility associated to this secondary mode, confirming my predictions. At the same time the mode corresponding to the smaller biomass dominates — a consequence of the strong favouritism towards the recruitment parameter b associated to that mode. The change in the shape of the marginal posteriors for ln B1 and ln b are most noticeable. Figure 4.18 serves this purpose: the marginal posteriors for the "informative" prior (broken line) increase possibilities for the smaller mode for In b and substantially reduce the chances for values of ln B1 below 1.5. Table 4.19 and Table 4.20 give additional summaries for estimates obtained for both fits. The case with diffuse prior suggests a slightly smaller expected initial biomass of 8.1 as compared to 8.7 obtained for the informative case. The increased background information assumed in the latter case is also reflected in the overall reduction in variances; the exception is hi b for which the variance increased slightly — possibly due to the more pronounced bimodality. 4.7 Final Remarks Throughout this Chapter I have examined a series of case studies for Bayesian inference and decision making These applications were intended to provide a range of illustrations for the Chapter 4. Applications ^ (a) 1 114 (b) 3 4 0.0 0.5 1.0 1.5 2.0 In b q y- 1 2^3 In B_1 4 Figure 4.17: Estimated posterior marginal densities for Pacific cod, using an informative prior for the catchability parameter q. The contours represent .05, .10, .25, .50, .75, .90 and .95 of the largest observed density. Chapter 4. Applications ^ 115 (a) -1 ^ ^^ 0 1 2 In b (b) 1 ^ 2^3 In B_1 ^ ^ 4 5 Figure 4.18: Marginal posterior densities for: (a) ln b, (b) ln Bi. The full and dotted lines correspond to the "diffuse" and "informative" priors respectively. Chapter 4. Applications^ 116 Table 4.20: Monte Carlo estimates of moments of the posterior distribution p(0 I Y(7)) for Pacific cod, given diffuse and informative priors. The values mo, ml and m2 refer to in b, ln q, and In B respectively. The values are the corresponding covariances. Parameter mo mi. M2 CO30 Coo. CO32 Ci.o. C1,2 C2,2 Diffuse 1.16 -2.09 1.89 0.14 0.11 -0.13 0.36 -0.35 0.36 Informative 1.07 -2.28 2.08 0.16 0.05 -0.09 0.15 -0.15 0.17 procedures described in Chapters 2 and 3, especially the use of adaptive importance sampling (AIS) to approximate posterior distributions of parameters in biomass dynamic models. The full benefits of such a rich description of uncertainties, are realized in decision making In two case-studies — the simulation of Section 4.3 and the Orange Roughy fishery of Section 4.4 — I take this extra step after defining a suitable utility function and a set of possible management actions. The first case-study (Section 4.2) considered stock and recruitment data for Skeena River sockeye salmon (Oncorhynchus nerka). I constructed a posterior distribution for the twodimensional parameter and used standard density estimation techniques to approximate posterior distributions for two policy parameters. The estimates were discussed in the light of a previous analysis performed by Hilborn and Walters [25] for the same data. I further used this example to illustrate the advantage of adaptive importance sampling over one-time importance sampling. A second case-study (Section 4.3) was based on simulated data. The effect of process noise and observation error and the robustness of different fitting models were examined. For the study I replicated some of the simulation models used by Ludwig and Walters [38], but used the Bayesian models to estimate the "optimal effort" defined in their paper. I further included Chapter 4. Applications^ 117 a new criterion to choose some "best effort", based on utility theory and Bayesian decision analysis. This new approach is attractive given its simplicity of concept and the avoidance of unrealistic equilibrium assumptions. Since the approach requires a posterior distribution, it is an exclusive feature of the Bayesian method. If a fitting model is judged by its capability of correctly estimating the policy parameters of interest, one is compelled to give up the fidelity of the model to the "true" underlying dynamics in order to improve the performance of estimation. Such findings were reported previously by Ludwig and Walters [38] and are confirmed within the Bayesian setting. The same feature was also identified for the new approach by comparing the estimated best decision to its perfect information alternative. The third case-study (Section 4.4) refers to the fishery of Orange Roughy (Hoplostethas atlanticus) off the New Zealand coast. I used the data of Francis [18] and a surplus-production model to examine the risk of different management options. Francis produced estimates of risk of collapse from a sophisticated age-structured model. Such risks are too high if compared to my results based on a much simpler and possibly more robust model. Furthermore, the Bayesian framework of my analysis can better account for uncertainties in underlying parameters and its consequences for decision making. My results suggest that the choice between various management options is primarily driven by the utility function which compromises between risk of collapse and future catches. The more one favours the former (large risk-aversion), the more conservative (i.e. larger rate of annual reduction in total allowable catches (TAC)) the solution. The richness of information available through a posterior distribution becomes very evident in cases where this distribution is multimodal. Such situations highlight the weakness of any approach which forces estimation to one point only. Decision making that is based on point estimates ignores the uncertainty among a multitude of reasonable answers. The last two case-studies examine multimodality. In Section 4.5 I once again resort to a simulation model. I consider an imaginary fishery for which the effectiveness of fishing effort (measured by the coefficient of catchability) changes randomly from year to year. The results show a clear bimodal Chapter 4. Applications^ 118 posterior, indicating the duality which the fitting model is incapable to resolve. Section 4.6 returns to real data. This time I consider Pacific cod' (Gadus macrocephalus) and use a more elaborate delay difference model (see Section 3.4) to describe biomass dynamics. The posterior - distribution indicates two modes. The analysis of one- and two-dimensional marginal posterior densities also depicts the loss in information incurred by dimensionality reduction. 6The potential for bimodality in this particular data set was pointed out to me by C. J. Walters. Chapter 5 Discussion and Conclusion "Fisheries management is a decision problem, not an assessment problem", say Hilborn and Walters [25] after a detailed description and analysis of where and why it has failed in the past. The confusion between management and assessment, they claim, is at the root of the problem. Why ? People used their expertise to build more complex, realistic models. It was thought that inclusion of any new insight about biological or ecological phenomena, would enhance the predictive capabilities of such models. In seeking to maintain the credibility of fishery sciences, there was a tendency towards suppression or elimination of all elements of doubt. Confession of uncertainty was considered to be a sign of weakness. But failures in the real world revealed that this procedure was not effective. In the 1970's and 1980's, Hilborn, Ludwig, Walters, and others ([71], [34], [37], [33], [39], [38]) indicated where to look for possible flaws. They used simulation studies to show that typical fishery data are not informative enough to allow for reliable estimation in realistic but highly parameterized models. It was recognized that a compromise between performance of estimates and complexities of models had to be made. Traditional management practices, based on equilibrium assumptions and goals for stable catches, were in collision course with the desire for learning about the systems potential. Less realistic but simpler models were shown to have a better performance in management than more sophisticated alternatives. Uncertainty was exposed; its description presented a new challenge. Such a major change in thinking about models simultaneously changed the concept of management. Traditional notions and management practices were questioned. "It is far more important to 119 Chapter 5. Discussion and Conclusion ^ 120 explicitly consider uncertainty than to apply sophisticated analytical techniques", say Hilborn and Walters [25]. The pretence of certainty was abandoned and fishery management seen as decision making under uncertainty. It became important to identify competing models and compare possible actions. The distinctive roles of assessment and management became clear: assessment should produce the alternative models (hypothesis) and their relative credibility, while management had to establish a set of actions and the measurement of their consequences. All these insights led naturally to use of Bayesian statistics. Lindley [31] states that, for coherent decision making in the presence of uncertainty, there is only one way to go: 1. All uncertainties must be quantified by probabilities. 2. The consequences of various actions must be described by utilities. 3. The action, expected to give the largest utility, must be taken. There is considerable theoretical evidence — see Berger [2] and de Finetti [8] for lengthy treatments and further references — to support the conclusion that Bayesian decision theory gives the framework to accomplish this goal. It has been shown that any decision making which does not correspond to a Bayesian analysis, is liable to lead to incoherent procedures, and hence repeated use may lead to the collapse of the fishery. From a Bayesian standpoint, probabilities are the expression of beliefs about uncertain events. The individualistic perception which goes into such interpretation, gave it the name: subjective probability. Classical non-Bayesian views, reject such probabilities on two grounds: 1. The frequentist interpretation of probability (e.g., the probability 1 of heads in tossing a fair coin) as the only acceptable one; and 2. the impression of lack in scientific objectivity, caused by the inclusion of personal opinions, perceived as arbitrary. Chapter 5. Discussion and Conclusion^ 121 In answer to argument 1, it can be said that the flexibility gained from subjective probabilities is in fact an advantage of Bayesian statistics. In a frequentist approach there is no room for statements like "high probability that the Orange Roughy fishery will collapse within the next 5 years" while it offers no difficulty to the subjectivist. As to the criticism in 2, any statistical analysis leading to decision making, relies on subjective (personal) inputs. The Bayesian approach makes such inputs explicit and open to evaluation by other experts; this is certainly a scientific attitude. The reader can find more details about the controversy between 'objectivists' and 'subjectivists' and the philosophical foundations underlying the Bayesian approach, in Berger [2] (sections 1.6, 3.1 and 4.1) or, in a less technical language, in Lindley [31]. If we agree with subjective probabilities as a coherent way to express uncertain beliefs, Bayes' Theorem becomes an important conceptual link between (prior) beliefs and the (statistical) information provided by data. The resulting posterior probability gives the complete description of current beliefs. The theorem tells how beliefs have to change in the light of a new piece of evidence (data); it is a mathematical description for coherent inductive reasoning. But, in public decision making — as is the case in fisheries — it can be difficult to find a prior or utility to which all participants would agree. The Bayesian statistician may use a "diffuse" (or "vague") prior or may report the outcome of the analysis over a set of reasonable alternatives. Often — as was observed for Orange Roughy — the management decision is robust to such alternatives. If, however, different reasonable priors give conflicting answers, then the scientific uncertainty is to be treated as a feature of the problem and should never be eliminated by artificial or hidden assumptions. A set of sensible utilities, on the other hand, is an useful negotiating tool among various interest groups. Management: description or prescription Our modern thinking about assessment and management includes concepts of adaptive management and replication. Such tools are needed to develop a better understanding of stock potential and enable more reliable predictions. The answers sought are therefore descriptive; Chapter 5. Discussion and Conclusion ^ 122 they explain how nature behaves. Management decisions are needed, and will always be based on beliefs, no matter how good the understanding of the system or how reliable the predictions. In the worst scenario, uncertainties are simply ignored. To ensure better decision making, prescriptive answers are needed. According to de Finetti [8], the subjectivistic theory of probability is prescriptive rather then descriptive. A posterior distribution is not the description of a real phenomena but an expression of the perceived uncertainties and beliefs about it, given all the available evidence. Such distribution becomes of practical relevance when used in combination with measures of consequences (utilities), to prescribe a best action. Adaptive management uses experimentation to enhance understanding (description) of the mechanisms acting on renewable resources under exploitation. There are, however, recognized difficulties with the practical implementation of experimental management programs. The difficulties go from technical (insufficient replicates) to institutional (coordination, enforcement) and political (who sets priorities and accepts the responsibilities for the decision). But how can we predict the effect of an action if there are no replicates and controls to compare with ? How to respond to unexpected (or contradictory) predictions from different working models ? Regardless of the ability to overcome these problems, the prescriptive role of management will still be necessary. Although often preferred, inaction is merely one of many possible choices. I believe that Bayesian statistics has much to offer in helping to improve the decision making process. If people are trained in better elicitation of uncertainties, if utilities give a fair description of preferences and attitudes towards risk, then a coherent framework is available. How can we accomplish the complex task (decision making) if we can not do the elementary ones (describe uncertainties and utilities) ? Chapter 5. Discussion and Conclusion^ 123 The contribution of this work Technical difficulties have hindered the implementation of the Bayesian approach. For fishery sciences in particular, the non-linearity of models and plethora of parameters have impeded easy implementation. This technical challenge had to be tackled so that the Bayesian paradigm could become of practical significance. By using the adaptive importance sampling procedure introduced by West [75] [76], I was able to construct multivariate posterior distributions for the parameters of biomass dynamic models. Since the procedure gives a random sample of values from the posterior distribution, this output can be coupled easily with an utility function to calculate approximate Monte Carlo expectations — a requirement for Bayesian decision making. The application of these results to a variety of case-studies and its combination with reasonable utility functions served to illustrate the new possibilities. For instance, it shed new light on the controversy about the New Zealand Orange Roughy fishery. The (frequentist) results of Francis [18] were shown to agree with the Bayesian analysis only when a particular (informative) prior was assumed. But the sensitivity of the model to different reasonable priors turned out to be unimportant when deciding upon the best policy. For this case in particular, my analysis suggests that the decisive issue is the choice of an utility function; neither prior uncertainties nor technical details about models are influential. Because this choice is political and not technical, the scientist doing assessment should not impose his or her judgement of "adequate" risk taking. He or she are entitled, however, to assist in the construction of an adequate utility function and in the elicitation of prior beliefs as well as the calculation of expected utilities. In this context the role of assessment is more than data analysis; it includes the analysis of beliefs and attitudes towards risk. Chapter 5. Discussion and Conclusion ^ 124 Future Work There are many aspects of the Bayesian method where practical difficulties remain and where more work is needed. Some are general: elicitation of priors and utilities or the problem of public decision (i.e., more than one decision maker). Others are specific: the extension to other classes of fishery models and/or importance functions. I shall comment on each separately: Elicitation: In his description of the importance of probabilities as the adequate measurement of uncertainty, Lindley [31] uses an analogy in geometry. He states that in the same way as the laws of geometry have to be satisfied in the construction of a building, the laws of probability are to be satisfied if a coherent description of uncertainties is to be achieved. Pursuing the analogy further, he compares the tools to measure lengths and angles to methods for elicitation of priors. Such methods might be simple or sophisticated, inaccurate or precise. The decision maker, likewise the architect, will need tools to make the required measurements. Much needs to be done to develop better tools for measurement of uncertainty. Nevertheless, concludes Lindley, while technological applications are still deficient, the scientific principles are clear. Public Decision: In the process of decision making it is usually a committee that has to arrive at a consensus, each committee member having his or her own utility function and prior uncertainty. All analyses considered throughout previous chapters made the implicit assumption of one decision maker only. What can we do when beliefs and interests of different individuals have to be used ? The theory for dealing with group decision (Weerahandi and Zidek [73], Zidek [81], de Waal et al. [9]) was developed to address this issue and can be tried out here. Fishery Models: Surplus-production models were central to this work. I commented that there was no conceptual difficulty in extending the procedure to more elaborate alternatives. In fact, a standard delay-difference model was used in the case-study for Pacific cod. We have also seen throughout that, given the common limitations in quantity and Chapter 5. Discussion and Conclusion ^ 125 quality of fishery data, simpler (i.e., less parameterized) surplus-production models tend to be more robust to correctly identify the best policy choice. But if we can improve prior elicitation of parameters in the more demanding delay-difference models, the potential gain is considerable. To train fishery biologists for a better prior elicitation might include the following steps: 1. Acquaint the elicitors with probabilities and its role in describing uncertainties. 2. Generate artificial data using detailed and realistic models. 3. Get elicitation of parameters for specified models and based on the artificial data from previous step. 4. Perform analysis to identify failures of model and elicitation. Try to answer questions like: • Which parameters are more critical ? • Which models are more robust ? • Can elicitors learn from past mistakes ? The extension to nonstationary models (i.e. time-dependent Ot) is another promising area for further research, as is the search for alternative mixture models which enable to work with strictly positive variables directly (i.e., no need to use log-transformations). Conclusion Active adaptive strategies are the essence of modern fishery management. Such experimental approach can increase our understanding (description) of the system and improve our capabilities for prediction. But, it needs the features of Bayesian analysis to determine how uncertainties propagate in time as new evidence becomes available through new data. Policy choices need to be made, independent of the progress in adaptive management practices. Bayesian analysis provides coherent prescriptions of how such choices are to be made. "In all cases where details are available", says Wilimovsky [78] in (non-Bayesian) discussion about Chapter 5. Discussion and Conclusion ^ 126 fishery management, "the output of formal predictions is modified by judgment. The methodology of judgment is not formalized so that it is unlikely that two or more individuals would arrive at the same management recommendation or even have the ability to trace the other's logic". Appendix A Pseudo-code for Adaptive Importance Sampling This appendix presents a pseudo-code which describes the implementation of the adaptive importance sampling (AIS) of Chapter 2. In particular, it refers to the procedures described in Section 2.3. For concreteness, the code considers a fixed number of updating steps (modifications for an interactive procedure are easy to implement in principle) and assumes two time-series of data (not necessarily of the same length): (i) total catches and (ii) some abundance index. I use the fitting procedure described in Section 3.3 as "Model 2" and take the functions G(-) and F(.) to be known. I shall specify some notation to be used in the pseudo-code: • A vector (al , • • -,ap) of order P, will be denoted a[1...13]. A single element ai from this vector is identified as a [i.] . • A matrix (N x P) with N lines and P columns with its elements given by fai,i : j = 1,• • ,N;i =1,- • ,P}, will be denoted a [1 . . .N, 1. . .13] . A single element^is identified as a [j , i]. • The j-th line of the above matrix — the vector of order P given by ^• • ai,p1 — is denoted a [j , 1. .13] or simply a [j ,*] . Any operation performed on such a line j is understood to be applied to each of its P elements individually. • Let S be a symmetric positive-definite matrix of order P. I define a vector T of order K = P*(P -F 1)/2, containing the elements of the lower triangle of S, taken by row. That is, for each element (Si,i : j > i) define r = (j(j — 1)12) i to get T,. = 127 Appendix A. Pseudo-code for Adaptive Importance Sampling ^ 128 Covariance matrices will be stored in the form T rather than S throughout the code. • The different steps in the pseudo-code are numbered. Global numbers (1, 2, etc.) if read in isolation, give a summary description of the sequential structure of the algorithm. Lower levels of numbering (2.1, 2.1.1, etc.) give details about how the task of '2' is accomplished. • The conditional operator IF (condition) GOTO (x .x) is to be interpreted as follows: 1. Verify condition. 2. If condition is TRUE, than go to line number x.x. 3. If condition is FALSE, than go to next line. The description of the code is sub-divided into three parts: (I) - a "list of variables" for which the values need to be specified in advance; (II) - The pseudo-code for the main program; (III) - Pseudo-code describing subroutines used in (II). I - LIST OF VARIABLES: P: dimension of the parameter vector 'Theta'. K = ( P*(P+1)/2 ): the dimension of a vector giving the lower triangle of PxP Covariance matrix. G: number of iterations in the AIS procedure. NO: size of sample to be drawn from prior. N[1...G]: sample size to be drawn in each iteration. M: number of Student distributions to be used in the clustered mixture model. DFO: degrees of freedom in prior Student distribution. DFk: degrees of freedom of the Student kernel in the mixture model. H: the bandwidth of kernel variance. Appendix A. Pseudo-code for Adaptive Importance Sampling ^ DATA_C: number of observed catch data. DATA_A: number of observed abundance index data.('DATA_A' is smaller or equal to 'DATA_C) THETA_MEAN[1...P]: prior mean for 'Theta'. THETA_VAR[1...K]: lower triangle of the prior covariance matrix for 'Theta'. ALPHAO and BETAO: parameters for Inverse Gamma prior. CATCH[1...DATA_C]: time-series of observed catches. ABUN[1...DATA_A]: time-series of observed abundance index data. II - PSEUDO-CODE: 1: Draw a sample of size NO from the prior distribution and store the results in THETA[1...N0,1...P]. 1.1: Perform a "Choleski Decomposition" (see final remarks) of THETA_VAR[1...K] and put the result in CHOLESKI[1...K]. 1.2: For j=1 to j=NO 1.2.1: Draw a random vector from a "Multivariate Student Distribution" (see final remarks) using as input THETA_MEAN[1...P], CHOLESKI[1...K] and DFO. 1.2.2: Store the output in THETA[j,*]. Next j 2: Calculate the likelihood for ABUN[1...DATA_A] for each given value of THETA[1...N0,*] and store the normalized 129 Appendix A. Pseudo-code for Adaptive Importance Sampling ^ results in OMEGA[1...N0]. 2.1: For i=1 to DATA_A 2.1.1. Get Y[i] = ln( ABUN[i] ) Next i 2.2: Initialize SUM = 0 2.3: For j=1 to NO 2.3.1: Get the time series of deterministic average biomass dynamics by using THETA[j,*], ALPHAO, BETA0 and CATCH[1...DATA_C] in the assumed biomass model (G(.) and F(.)). Store the relevant values in MU[1...DATA_A]. 2.3.2: Store the likelihood value in OMEGA[j]. 2.3.3: SUM = SUM + OMEGA[j] Next j 2.4: For j=1 to NO 2.4.1: OMEGA[j] = OMEGA[j]/SUM Next j 3: Calculate the "relative entropy" of the prior and store it as EO or send it to the output device. 3.1: Initialize ENTROPY = 0 3.2: For j=1 to NO 3.2.1: ENTROPY = ENTROPY + ( OMEGA[j]*ln(OMEGA[j]) ) Next j 3.3: EO = -ENTROPY/ln(NO) 3.4: Send 'EO' to output device. 130 Appendix A. Pseudo-code for Adaptive Importance Sampling ^ 4: Perform the G recursions of the adaptive importance sampling procedure. In the process keep track of the entropies E[1...G]. Store the final values THETA[1...N[G]] and OMEGA[1...N[G]] for further use. 4.1: Get the "bandwidth" for kernel variance and store the outcome in WINDOW. 4.1.1: X1 = (4/(1+2P))-(1/(1+4P)) 4.1.2: X2 = M-(1/(1+4P)) 4.1.3: WINDOW = H*(X1*X1)/(X2*X2) 4.2: For k=1 to G 4.2.1: IF ( k>1 ) GOTO (4.2.5) 4.2.2: Use current THETA[1...NO,*] and OMEGA[1...NO] to get Monte Carlo mean and covariance. Store the mean as KER_MEAN[1...P] and the lower triangle of the covariance matrix as KER_VAR[1...K]. 4.2.3: Use the subroutine "CLUSTERO" (described below) and the input of THETA[1...NO,*] and OMEGA[1...NO] to get the smaller sets T_CL[1...M,*] and 'O_CL[1...M]. 4.2.4: Go to (4.2.7) 4.2.5: Use current T_OLD[1...N[k],*] and O_KER[1...N[k]] to get Monte Carlo mean and covariance. Store the mean as KER_MEAN[1...P] and the lower triangle of the covariance matrix as KER_VAR[1...K]. 4.2.6: Use the subroutine "CLUSTER()" and the input of T_OLD[1...N[k],*] and 0_KER[1...N[k]] to get 131 Appendix A. Pseudo-code for Adaptive Importance Sampling ^ T_CL[1...M,*] and O_CL[1...M]. 4.2.7: For 1=0 to K *.1: KER_VAR[i]=WINDOW*KER_VAR[i] Next i 4.2.8: Perform a "Choleski Decomposition" of KER_VAR[1...K] and put the result in KER_CHOL[1...K]. 4.2.9: For j=1 to N[k] *.1: Draw a random integer from the set {1,...,14} according to probabilities given by O_CL[1...M]. Store it in A. *.2: Draw a random vector from a "Multivariate Student distribution" using as input T_CL[A,*], KER_CHOL[1...,K] and DFK. *.3: Store the output in THETA_KER[j,1...P]. Next j 4.2.10: Calculate the weights of the random sample THETA_KER[1...,N[k],*] which was drawn from the current "mixture model". Store this value in O_KER[1...N[k]]. *.1: Initialize SUM=0 *.2: For j=1 to N[k] *.2.1: Calculate the "density" of THETA_KER[j,*] for a P-dimensional Student distribution with parameters THETA_MEAND, THETA_VARD and DFO. Store the result in PRIOR. 132 Appendix A. Pseudo-code for Adaptive Importance Sampling ^ *.2.2: Initialize MIXTURE = 0 *.2.3: For 1=1 to M *.2.3.1: Calculate the "density" of THETA_KER[j,*] for a P-dimensional Student distrib. with parameters T_CL[i,*], KER_VARD and DFK. Store the result in KERNEL. *.2.3.2: KERNEL = O_CL[i]*KERNEL *.2.3.3: MIXTURE = MIXTURE + KERNEL Next i *.2.4: Get the time series of average biomass dynamics using THETA_KER[j,*], ALPHAO, BETAO, and CATCH[1...DATA_C] in equation G(.). Store the mean values in MU[1...DATA_A] and get the likelihood LIK for data ABUN[1...DATA_A]. *.2.5: F = LIK*PRIOR *.2.6: O_KER[j] = F/MIXTURE *.2.7: SUM = SUM + O_KER[j] Next j *.3: For j=1 to N[k] *.3.1: O_KER[j] = O_KER[j]/SUM Next j *.4: Calculate the "relative entropy" of the current "mixture model" and store it as 133 Appendix A. Pseudo-code for Adaptive Importance Sampling ^ E[k] or send it to output. *.4.1: Initialize ENTROPY = 0 *.4.2: For j=1 to N[k] *.4.2.1: ENTROPY = ENTROPY + 0_kER[j]*ln(O_KER[j]). Next j *.4.3: E[k] = -ENTROPY/ln(N[k]) *.4.4: Send E[k] to output. *.5: IF (k=G) GOTO (*.8) *.6: For j=1 to N[k] *.6.1: T_OLD[j,*] = THETA_KER[j,*] Next j *.7: Go to (4.2.1) *.8: For j=1 to N[G] *.8.1: THETA[j,*] = THETA_KER[j,*] *.8.2: OMEGA[j] = O_KER[j] Next j Next k 5: Use THETA[1...N[G],*] and OMEGA[1...N[G]] to get Monte Carlo mean and covariance. Store the mean in POST_MEAN[1...P] and the lower triangle of the covariance matrix as POST_VAR[1...K]. Send this values to the output device for further use. 6: (Optional): Get Monte Carlo estimates for mean and variance of functions "H(Theta)". 134 Appendix A. Pseudo-code for Adaptive Importance Sampling ^ 7: Use THETA[1...N[G],*] and OMEGA[1...N[G]] in the subroutine "CLUSTER()" to get the final "mixture model". Store these values as T_CL[1...M,*] and O_CL[1...11] and keep them in a File for further use. III - SUB-ROUTINES: CLUSTER(): This subroutine performs the clustering algorithm of section 2.4 (Chapter 2). Input: - The dimension P of the parameter vector. - A positive integer N denoting the size of the data set. - A positive integer M (M < N) defining the desired size of the data set after clustering. - A data matrix THETA[1...N,1...P] _ A vector of weights OMEGA[1...N] Output: - A matrix T_CL[1...M,1...P] _ A vector O_CL[1...M] 1: Initialize D=N 2: Make a copy of OMEGAD. Also create a vector of identification indices. 2.1: For j=1 to N 2.1.1: O_COPY[j] = OMEGA[j] 135 Appendix A. Pseudo-code for Adaptive Importance Sampling ^ 2.1.2: ID[j] = j Next j 3: Sort (see final remarks about "sorting") the vector 0_COPY[1...N] into ascending numerical order and make the corresponding rearrangement of ID[1...N]. 4: Sort the values in T_COPY[1...N,*] to agree with the ordered vector O_COPY[1...C. 4.1: For j=1 to N 4.1.1: S = ID[j] 4.1.2: T_COPY[j,*] = THETA[S,*] Next j 5: IF ( D=M ) GOTO (7) 6: Reduce the data set by one unit, by merging the first case with its "nearest neighbour" and reallocating the new value within the sorted set. 6.1: For j=2 to D 6.1.1: Calculate the "distance" between T_COPY[1,*] and T_COPY[j,*] and store its value in DIST. 6.1.2: IF ( j>2 ) GOTO (6.1.5) 6.1.3: Initialize: DMIN = DIST WEIGHT = O_COPY[2] S = 2 136 Appendix A. Pseudo-code for Adaptive Importance Sampling ^ 6.1.5: IF ( DIST>=DMIN ) GO TO (6.1.9) 6.1.6: DMIN = DIST 6.1.7: WEIGHT = O_COPY[j] 6.1.8: S = j Next j 6.2: O_TEST = O_COPY[1] + O_COPY[S] 6.3: For i=1 to P 6.3.1: T_TEST[i] = (0_COPY[1]*T_COPY[1,i] + O_COPY[S]*T_COPY[S,i])/O_TEST Next i 6.4: For j=2 to D-1 6.4.1: IF ( j>=S ) GOTO (6.4.5) 6.4.2: O_C0PY[j-1]=0_C0PY[j] 6.4.3: T_COPY[j-1,*]=T_COPY[j,*] 6.4.4: Go to (6.4.13) 6.4.5: IF ( O_TEST<O_COPY[j+1] ) GOTO (6.4.9) 6.4.6: O_COPY[j-1] = 0_COPY[j+1] 6.4.7: T_COPY[j-1,*] = T_COPY[j+1,*] 6.4.8: Go to (6.4.13) 6.4.9: O_COPY[j-1] = O_TEST 6.4.10: O_COPY[j] = 0_COPY[j+1] 6.4.11: T_COPY[j-1,1...P] = T_TEST[1...P] 6.4.12: T_COPY[j,*] = T_COPY[j+1,*] Next j 6.5: D = D-1 6.6: Return to (5) 137 Appendix A. Pseudo-code for Adaptive Importance Sampling ^ 138 7: Store T_COPY [1 . . . M , *] and O_COPY [1. . .M] in the arrays defined for output. 7.1: For j=1 to M 7 .1. 1: O_CL [j] = O_COPY [j] 7 .2 . 2 : T_CL [j , *] = T_COPY [j , *] Final Remarks Choleski Decomposition: The Choleski decomposition of a covariance matrix E results in a lower-triangular matrix L satisfying z . LLT Such decomposition is helpful here in two ways: (i) to perform random draws from a multivariate distribution with covariance matrix E and (ii) to calculate the inverse matrix E-1. For further details and algorithms I suggest Ripley [53] and Nash [44]. Student Distribution: In order to generate random draws from uni- and multidimensional Student densities, algorithms can be found in Ripley [53]. Sorting: To perform the "sorting" in the CLUSTER() subroutine, I used the "Heapsort" algorithm in Press et al. [52]. Bibliography [1] M. Aoki. State space modelling of time series. Springer Verlag, 1987. [2] J. 0. Berger. Statistical decision theory and Bayesian analysis. Springer Verlag, 1985. [3] G. E. P. Box and G. M. Jenkins. Time Series Analysis: Forecasting and Control. HoldenDay, revised edition, 1976. [4] P. J. Brockwell and R. A. Davis. Time series: Theory and Methods. Springer Verlag, 1987. [5] A. Charles. In-season fishery management: a Bayesian model. Nat. Res. Mod., 2:457-498, 1988. [6] C. A. Clark. Bioeconornic Modelling and Fisheries Management. John Wiley & Sons Inc., 1985. [7] C. A. Clark. Mathematical Bioeconomics: the optimal management of renewable resources. John Wiley & Sons Inc., 2nd edition, 1990. [8] B. de Finetti. Probability, induction and statistics. John Wiley & Sons, 1972. [9] D. de Waal, P. Groenewald, D. van Zyl, and J. V. Zidek. Multi-Bayesian estimation theory. Statistics and Decisions, 03, 1985. [10] M. Delampady, I. Yee, and J. V. Zidek. Hierarchical Bayesian analysis of a discrete time series of Poisson counts. Technical Report 111, Dept. of Statistics - University of British Columbia, 1991. [11] M. Delampady, I. Yee, and J. V. Zidek. A smoothness-prior model for discrete time series. Technical Report 110, Dept. of Statistics - University of British Columbia, 1991. [12] R. B. Deriso. Harvesting strategies and parameter estimation for an age-structured model. Can. J. Fish. Aquat. Sci., 37:268 282, 1980. - [13] R. B. Deriso and A. M. Parma. Dynamics of age and size for a stochastic population model. Can. J. Fish. Aquat. Sci., 45:1054-1068, 1988. [14] B. Efron. Non parametric estimates of standard error: the jackknife, the bootstrap and other methods. Biometrika, 68:589-599, 1981. [15] B. Efron. Bootstrap confidence intervals for a class of parametric problems. Biometrika, 72:45-58, 1985. [16] D. A. Fournier and I. J. Doonan. A length-based stock assessment method utilizing a generalized delay-difference model. Can. J. Fish. Aquat. Sci., 44:422-437, 1987. 139 Bibliography^ 140 [17] D. A. Fournier and A. Warburton. Evaluating fisheries management models by simulated adaptive control — Introducing the composite model. Can. J. Fish. Aquat. Sci., 46:1002— 1012, 1989. [18] R. I. C. C. Francis. Use of risk analysis to assess fishery management startegies: a case study using Orange Roughy (Hoplostehus atlanticus) on the Chatham Rise, New Zealand. Can. J. Fish. Aquat. Sci., 49:922 930, 1992. - [19] S. M. Fried and It. Hilborn. Inseason forecasting of Bristol Bay, Alaska sockeye salmon (Oncorhynchus nerka) abundance using Bayesian probability theory. Can. J. Fish. Aquat. Sc., 45:850-855, 1988. [20] H. J. Geiger and J. P. Koenings. Escapement goals for sockeye salmon with informative prior probabilities based on habitat considerations. Fish. Res., 11:239-256, 1991. [21] A. E. Gelfand and A. F. M. Smith. Sampling based approaches to calculating marginal densities. J. Am. Stat. Assoc., 85:398-409, 1990. [22] S. Geman and D. Geman. Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images. IEEE Trans. Pattn. Anal. Mach. Intell., 6:721-741, 1984. [23] J. Geweke. Bayesian inference in econometric models using Monte Carlo integration. Econometrica, 57:1317-1339, 1989. [24] It. Hilborn, E. K. Pikitch, M. M. McAllister, and A. Punt. A comment on "Use of risk analysis to assess fishery management startegies; a case study using Orange Roughy, (hoplosthethus atlanticus) on the Chatham Rise, New Zealand" by R.I.C.C. Francis. Can. J. Fish. Aquat. Sci., 1993. To appear. [25] It. Hilborn and C. J. Walters. Quantitative fisheries stock assessment: choice, dynamics 6 uncertainty. Chapman and Hall, 1992. [26] N. L. Johnson and S. Kotz. Continuous univariate distributions. Houghton Mifflin, 1970. [27] G. Kitagawa. Non-gaussian state-space modeling of nonstationary time series. J. Amer. Stat. Assoc., 82:1032-1050, 1987. [28] T. Kloek and H. K. van Dijk. Bayesian estimates of equation system parameters: an application of integration by Monte Carlo. Econometrica, 46:1-19, 1978. [29] P. M. Lee. Bayesian statistics: an introduction. Oxford University Press, 1989. [30] D. V. Lindley. Approximate Bayesian methods. In J. M. Bernardo et al., editor, Bayesian Statistics 2, pages 223-245. Valencia University Press, 1980. [31] D. V. Lindley. Making Decisions. John Wiley & Sons, 2nd edition, 1985. [32] H. Linhart and W. Zucchini. Model Selection. John Wiley & Sons, Inc., 1986. 141 Bibliography^ [33] D. Ludwig. Small models are beautiful: efficient estimators are even more beautiful. In Springer Verlag, editor, Unknown, 1988. Conference held at Cornell University. [34] D. Ludwig and R. Hilborn. Adaptive probing strategies for age-structured fish stocks. Can. J. Fish. Aquat. Sci., 40:559-569,1983. [35] D. Ludwig, R. Hilborn, and C. J. Walters. Uncertainty, resource exploitation, and conservation: Lessons from History. Science, 260:17,1993. [36] D. Ludwig and C. J. Walters. Optimal harvesting with imprecise parameter estimates. Ecol. Model., 14:273-292,1982. [37] D. Ludwig and C. J. Walters. Are age structured models appropriate for catch-effort data? Can. J. Fish. Aquat. Sci., 42:1066-1072,1985. [38] D. Ludwig and C. J. Walters. A robust method for parameter estimation from catch and effort data. Can. J. Fish. Aquat. Sci., 46:137-144,1989. [39] D. Ludwig, C. J. Walters, and J. Cooke. Comparison of two models and two estimation methods for catch and effort data. Nat. Res. Model., 2:457-498,1988. [40] M. Mangel and C. W. Clark. Uncertainty, search and information in fisheries. J. Cons. Ciem., 41:93-103,1983. [41] P. McCullagh and J. A. Nelder. Generalized linear models. Chapman and Hall, 2nd edition, 1989. [42] R. Mendelssohn. Some problems in estimating population sizes from catch-at-age data. Fishery Bulletin, 86:617 630,1988. - [43] T. S. Nakamura, S. Ohnishi, and Y. Matsumiya. A Bayesian cohort model for catch-at-age data obtained from research takes of whales. Rep. Int. Whal. Comm., 39:375-382,1989. [44] J. C. Nash. Compact Numerical Methods for Computers. Adam Hilger Ltd., 1979. [45] J. C. Naylor and A. F. M. Smith. Econometric illustrations of novel numerical integration strategies for Bayesian inference. J. of Econometrics, 38:103-125,1988. [46] M. S. Oh and J. 0. Berger. Adaptive importance sampling in Monte Carlo integration. J. Statist. Computn. Simuln., 1991. (to appear). [47] A. M. Parma. Optimal harvesting of fish populations with non-stationary stockrecruitment relationships. Nat. Resource Mod., 4:39-76,1990. [48] J. J. Pella and P. K. Tomlinson. A generalized stock production model. Bull. Inter Am. Trop. Tuna Comm., 13:419 496,1969. - - [49] A. Pole. Efficient Bayesian learning in non-linear dynamic models. Technical report, Dep. of Statistics — University of Warwick, 1988. Bibliography^ 142 [50] A. Pole and M. West. Efficient Bayesian learning in non-linear dynamic models. J. Forecasting, 9:119-136, 1990. [51] S. J. Press. Bayesian statistics: principles, models and applications. John Wiley & Sons, 1989. [52] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical recipies in C. Cambridge University Press, 1988. [53] B. Ripley. Stochastic Simulation. John Wiley Si Sons, 1987. [54] D. B. Rubin. Comment on "The Calculation of posterior distributions by data augmentation", by M.A.Tanner and W.H.Wong. J. Am. Statist. Assoc., 82:543-546, 1987. [55] L. J. Savage. The foundations of statistics. John Wiley Si Sons, 1954. [56] J. Schnute. A general theory for analysis of catch and effort data. Can. J. Fish. Aquat. Sci., 42:414-429, 1985. [57] J. Schnute. A general fishery model for size-structured fish population. Can. J. Fish. Aquat. Sci., 44:924-940, 1987. [58] J. Schnute. The importance of noise in fish population models. Fisheries Research,11:197223, 1991. [59] B. W. Silverman. Density estimation for statistics and data annalysis. Chapman and Hall, 1986. [60] A. F. M. Smith. Bayesian computational methods. Phil. Trans. R. Soc. Lond. A, 337:369386, 1991. [61] A. F. M. Smith and A. E. Gelfand. Bayesian statistics without tears: a samplingresampling perspective. The Am. Statistician, 46:84-88, 1992. [62] A. F. M. Smith and G. 0. Roberts. Bayesian computation via the Gibbs sampler and related Markov ChainMomte Carlo methods. J. R. Statist. Soc. B, 55:3-23, 1993. [63] A. F. M. Smith, A. M. Skene, J. E. H. Shaw, and J. C. Naylor. Progress with numerical and graphical methods for practical Bayesian statistics. The Statistician, 36:75-82, 1987. [64] H. W. Sorenson. Recursive estimation for nonlinear dynamic systems. In J. C. Spa11, editor, Bayesian Analysis of time-series and dynamic models, pages 127-165. Marcel Dekker, 1988. [65] A. M. Starfield and A. L. Bleloch. Building models for conservation and wildlife. McMillan Pub. Co., 1986. [66] P. J. Sullivan. A Kalman-Filter approach to catch-at-length analysis. Biometrics, 48:237257, 1992. Bibliography^ 143 [67] L. Tierney and J. B. Kadane. Accurate approximations for posterior moments. J. Amer. Statist. Assoc., 81:82 86, 1986. - [68] C. J. Walters. Adaptive management of renewable resources. McMillan Pub. Co., 1986. [69] C. J. Walters. Nonstationarity of production relationships in exploited populations. Can. J. Fish. Aquat. Sci., 44:156-165, 1987. Suppl. 2. [70] C. J. Walters and J. S. Collie. Design of an experimental program for groundfish management in the face of large uncertainty about stock size and production. In R. J. Beamish and G. A. MacFarlane, editors, Effects of Ocean Variability on Recruitment and an Evaluation of Parameters used in Stock Assessment Models., pages 13 25. Canadian Special Publications in Fisheries and Aquatic Sciences, 1989. No. 108. - [71] C. J. Walters and It. Hilborn. Ecological optimization and adaptive management. Ann. Rev. EcoL Syst., 9:157-188, 1978. [72] C. J. Walters and D. Ludwig. Effects of measurement errors on the assessment of stockrecruitment relationships. Can. J. Fish. Aquat. Sci., 38:704-710, 1981. [73] S. Weerahandi and J. V. Zidek. Elements of multi-Bayesian decision theory. Ann. Statist., 11:1032-1046, 1983. [74] M. West. Mixture models, Monte Carlo, Bayesian updating and dynamic models. Discussion paper, ISDS — Duke University, 1992. [75] M. West. Modelling with mixtures. In J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith, editors, Bayesian Statistics .4, pages 503-524. Oxford University Press, 1992. [76] M. West. Approximating posterior distributions by mixtures. J. R. Statist. Soc. B, 55:409— 422, 1993. [77] M. West and J. Harrison. Bayesian forecasting and dynamic models. Springer Verlag, 1989. [78] N. J. Wilimowsky. The need for formalization of decision algorithms and risk levels in fishery research and management. Can. J. Fish. Aquat. Sci., 42:258-262, 1985. Suppl. 1. [79] R. L. Wolpert. Monte Carlo integration in Bayesian statistical analysis. Contemp. Math., 115(101-116), 1991. [80] J. E. Zeh, A. E. Raftery, and Y. Qing. Assessment of tracking algorithm performance and its effect on population estimates using bowhead whales, Balaena mysticetus, identified visually and acoustically in 1986 off Point Barrow, Alaska. Rep. Int. Whaling Comm., 40:411-421, 1990. [81] J. V. Zidek. Multi-Bayesianity. Technical Report 05, Dept. of Statistics — University of British columbia, 1984.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Bayesian statistics for fishery stock assessment and...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Bayesian statistics for fishery stock assessment and management Kinas, Paul G. 1993-12-31
pdf
Page Metadata
Item Metadata
Title | Bayesian statistics for fishery stock assessment and management |
Creator |
Kinas, Paul G. |
Date | 1993 |
Date Issued | 2008-09-10T23:50:24Z |
Description | This work is about the use of Bayesian statistics in fishery stock assessment and management. Multidimensional posterior distributions replace classical parameter estimation in surplus-production and delay-difference models. The maximization of expected utilities replaces the estimation of optimal policies. Adaptive importance sampling is used to obtain approximations for posterior distributions. The importance function is given as a finite mixture of heavy-tailed Student distributions. The performance of the method is tested in five case-studies, two of which use data simulation. Real data refer to Skeena river salmon (Oncorhynchus nerka), Orange Roughy (Hoplostethus atlanticus) and Pacific cod (Gadus macrocephalus). The results show that the technique successfully approximates posterior distributions in higher dimensions even if such distributions are multimodal. When comparing models in terms of their performance as management tools, simpler and less realistic models can do better than more sophisticated alternatives. The Bayesian approach also sheds new light on the controversy about the Orange Roughy fishery. |
Extent | 5815821 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Collection |
Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project |
Date Available | 2008-09-10 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0086451 |
URI | http://hdl.handle.net/2429/1849 |
Degree |
Doctor of Philosophy - PhD |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 1993-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- [if-you-see-this-DO-NOT-CLICK]
- ubc_1993_fall_phd_kinas_paul.pdf [ 5.55MB ]
- Metadata
- JSON: 1.0086451.json
- JSON-LD: 1.0086451+ld.json
- RDF/XML (Pretty): 1.0086451.xml
- RDF/JSON: 1.0086451+rdf.json
- Turtle: 1.0086451+rdf-turtle.txt
- N-Triples: 1.0086451+rdf-ntriples.txt
- Original Record: 1.0086451 +original-record.json
- Full Text
- 1.0086451.txt
- Citation
- 1.0086451.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Country | Views | Downloads |
---|---|---|
United States | 15 | 1 |
China | 11 | 42 |
Canada | 5 | 12 |
India | 4 | 0 |
Japan | 4 | 0 |
France | 3 | 0 |
Russia | 2 | 0 |
Australia | 2 | 0 |
Nigeria | 1 | 0 |
United Kingdom | 1 | 1 |
Germany | 1 | 0 |
Mexico | 1 | 0 |
City | Views | Downloads |
---|---|---|
Unknown | 7 | 41 |
Shenzhen | 7 | 42 |
Ashburn | 5 | 0 |
Tokyo | 4 | 0 |
Tiruvannamalai | 4 | 0 |
New Haven | 4 | 0 |
Beijing | 4 | 0 |
Montreal | 3 | 0 |
Saint Petersburg | 2 | 0 |
Rancho Palos Verdes | 2 | 0 |
Perth | 2 | 0 |
Redmond | 1 | 0 |
Los Angeles | 1 | 0 |
{[{ mDataHeader[type] }]} | {[{ month[type] }]} | {[{ tData[type] }]} |
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0086451/manifest