MIXED REGRESSION MODELS FOR DISCRETE DATA

By Peiming Wang

B. Sc. (Mathematics), Shanghai Second Polytechnic University, 1983
M. Sc. (Engineering), Shanghai Institute of Mechanical Engineering, 1988
M. A. (Statistics), York University, 1990

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, FACULTY OF COMMERCE AND BUSINESS ADMINISTRATION

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
August, 1994
© Peiming Wang, 1994

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

(Signature)
Department of ______________________
The University of British Columbia
Vancouver, Canada
Date

Abstract

The dissertation consists of two parts. In the first part we introduce and investigate a class of mixed Poisson regression models that include covariates in both mixing probabilities and Poisson rates. The proposed models generalize the usual Poisson regression in several ways, and can be used to adjust for extra-Poisson variation. The features of the models, identifiability, estimation methods based on the EM and quasi-Newton algorithms, properties of these estimates, model selection criteria and residual analysis are discussed. A Monte Carlo study investigates implementation and model choice issues. Several applications of this approach are analyzed. This analysis is compared to quasi-likelihood approaches.
In the second part we introduce and investigate a class of mixed logistic regression models that include covariates in both mixing probabilities and binomial parameters with the logit link. The proposed models generalize the usual logistic regression in several ways, and can be used to adjust for extra-binomial variation. The features of the models, identifiability, estimation methods based on the EM and quasi-Newton algorithms, properties of these estimates, model selection criteria and residual analysis are discussed. A Monte Carlo study investigates implementation and model choice issues. An application of this approach is analyzed and the results are compared to those obtained by quasi-likelihood approaches.

The dissertation also discusses future research in these areas and provides FORTRAN code for all computations required to apply the models.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgement
Dedication
1 Introduction
2 Mixed Poisson Regression Models
  2.1 Poisson regression and its modifications
  2.2 Implications of Overdispersion
  2.3 Tests for Extra-Poisson Variation
  2.4 Mixed Poisson Regression Models
    2.4.1 The Model
    2.4.2 Identifiability
  2.5 Parameter Estimation for the mixed Poisson regression models
    2.5.1 EM and Quasi-Newton Algorithms
    2.5.2 Starting Values
  2.6 A Monte Carlo Study
    2.6.1 Performance of the Estimation Algorithm
    2.6.2 The mixed Poisson regression Models For Some Typical Problems
  2.7 Implementation Issues
    2.7.1 Model Selection
    2.7.2 Classification
    2.7.3 Residual Analysis and Goodness-of-fit Test
  2.8 Applications
    2.8.1 R&D and Patents
    2.8.2 Seizure Frequency in a Clinical Trial
    2.8.3 Terrorist Bombing
    2.8.4 Accidents in Worksites
    2.8.5 Ames Salmonella Assay Data
  2.9 Tables and Figures in Chapter 2
3 Mixed Logistic Regression Models
  3.1 Logistic Regression and Its Modifications
    3.1.1 Link Modifications
    3.1.2 Frequency Distribution Modifications
  3.2 Tests For Extra-binomial Variation
  3.3 A Mixed Logistic Regression Model
    3.3.1 The Model
    3.3.2 Features of the Mixed Logistic Regression Models
    3.3.3 Identifiability
  3.4 Parameter Estimation
    3.4.1 The EM algorithm
    3.4.2 Starting Values
    3.4.3 A Monte Carlo Study
  3.5 Implementation Issues
    3.5.1 Model Selection
    3.5.2 Classification
    3.5.3 Residual Analysis and Goodness-of-fit
  3.6 An Application
  3.7 Tables and Figures in Chapter 3
4 Summary, Conclusions and Future Research
  4.1 Summary and Conclusions
  4.2 Mixed Exponential Regression Models
  4.3 Hidden Markov Poisson Regression Models
    4.3.1 The Model
    4.3.2 Moment Structure
    4.3.3 Identifiability
    4.3.4 Estimation
    4.3.5 The Probabilities of Initial States and Starting Values
    4.3.6 Implementation and Remaining Issues
Bibliography
A FORTRAN Program

List of Tables

2.1 The results of the simulations for the mixed Poisson regression models
2.2 The result of a Monte Carlo study on the 2-component mixed Poisson regression model with constant mixing probabilities and variable rates - I
2.3 The results of the likelihood ratio tests for the hypothesis of $\alpha_2 = 0$ based on the 2-component mixed Poisson regression model - I
2.4 The result of a Monte Carlo study on the 2-component mixed Poisson regression model with constant mixing probabilities and variable rates - II
2.5 The results of fitting mixed Poisson regression model to the data from a Monte Carlo study on the 2-component mixed Poisson regression model with constant mixing probabilities and variable rates
2.6 The results of the likelihood ratio tests for the hypothesis of $\alpha_2 = 0$ based on the 2-component mixed Poisson regression model - II
2.7 The results of model selection based on AIC and BIC values for the Monte Carlo study
2.8 Poisson regression and overdispersion test statistics for patent data
2.9 Mixed Poisson regression model estimates for patent data
2.10 Parameter estimates for five models for patent data
2.11 Parameter estimates for five methods for seizure data analysis
2.12 Mixed Poisson regression model estimates for seizure data
2.13 Mixed Poisson regression model estimates for terrorist bombing data
2.14 Mixed Poisson regression model estimates for workplace injury data
2.15 Number of revertant colonies of salmonella (ye)
2.16 Mixed Poisson regression model estimates for Ames salmonella assay data
2.17 Parameter estimates for five estimation methods for assay data
3.1 Data of Busvine (1938)
3.2 The results of the simulations for the mixed logistic regression model (Model 1)
3.3 The results of the simulations for the mixed logistic regression model (Model 2)
3.4 The results of the simulations for the mixed logistic regression model (Model 3)
3.5 Number of trout with liver tumors / number in tank
3.6 Logistic regression and mixed logistic regression model estimates for fish data
3.7 Parameter estimates for four models for fish data

List of Figures

2.1 The index plot of Pearson residuals from the fitted 3-component mixed Poisson regression model for patent data
2.2 The index plot of deviance residuals from the fitted 3-component mixed Poisson regression model for patent data
2.3 The index plot of likelihood residuals from the fitted 3-component mixed Poisson regression model for patent data
2.4 The index plot of average relative coefficient changes from the fitted 3-component mixed Poisson regression model for patent data
2.5 The plot of patent data
2.6 Classification of patent data according to estimated posterior probabilities based on the fitted mixed Poisson model
2.7 Daily epileptic seizure counts
2.8 Estimated hourly seizure rates and classification of seizure data according to estimated posterior probabilities based on the fitted mixed Poisson regression model
2.9 Estimated mean and variance based on the fitted mixed Poisson regression model for seizure data
2.10 The index plot of Pearson residuals from the fitted mixed Poisson regression model for seizure data
2.11 The index plot of deviance residuals from the fitted mixed Poisson regression model for seizure data
2.12 The index plot of likelihood residuals from the fitted mixed
Poisson regression model for seizure data
2.13 The index plot of average relative coefficient changes from the fitted mixed Poisson regression model for seizure data
2.14 The time plot of terrorist bombing data
2.15 Classification of terrorist bombing episodes according to estimated posterior probabilities based on the fitted mixed Poisson regression model
2.16 The index plot of Pearson residuals from the fitted mixed Poisson regression model for terrorist bombing data
2.17 The index plot of deviance residuals from the fitted mixed Poisson regression model for terrorist bombing data
2.18 The index plot of likelihood residuals from the fitted mixed Poisson regression model for terrorist bombing data
2.19 The index plot of average relative coefficient changes from the fitted mixed Poisson regression model for terrorist bombing data
2.20 Classification of accident data according to estimated posterior probabilities based on the fitted mixed Poisson regression model
2.21 The index plot of Pearson residuals from the fitted mixed Poisson regression model for accident data
2.22 The index plot of deviance residuals from the fitted mixed Poisson regression model for accident data
2.23 The index plot of likelihood residuals from the fitted mixed Poisson regression model for accident data
2.24 The index plot of average relative coefficient changes from the fitted mixed Poisson regression model for accident data
2.25 Classification of Ames data according to estimated posterior probabilities based on the fitted mixed Poisson regression model
2.26 The index plot of Pearson residuals from the fitted mixed Poisson regression model for Ames data
2.27 The index plot of deviance residuals from the fitted mixed Poisson regression model for Ames data
2.28 The index plot of likelihood residuals from the fitted mixed Poisson regression model for Ames data
2.29 The index plot of average relative
coefficient changes from the fitted mixed Poisson regression model for Ames data
3.1 The index plot of Pearson residuals from the fitted mixed logistic regression model for fish data
3.2 The index plot of deviance residuals from the fitted mixed logistic regression model for fish data
3.3 The index plot of likelihood residuals from the fitted mixed logistic regression model for fish data
3.4 The index plot of average relative coefficient changes based on the fitted mixed logistic regression model for fish data
3.5 Classification and dose-response curves for fish data
3.6 The mean-variance relationship based on the fitted mixed logistic regression model for fish data

Acknowledgement

First and foremost, I would like to thank my supervisor Prof. Martin L. Puterman. Marty suggested the basic models used in this thesis and provided me with much needed encouragement, especially in the early stages of our work. His assistance in the thesis research and writing, and his financial support through NSERC grant A5527, are deeply appreciated. His comments on drafts of the thesis reflect a thoughtful, serious reading and have substantially improved the final version. To him I give my deepest thanks.

Also, I would like to express my appreciation to my thesis committee members. Prof. Bent Jorgensen has raised many important questions about the thesis and has provided me with very valuable information on related research work. Prof. Iain Cockburn introduced me to patent data and provided helpful comments concerning econometrical issues related to it. I also thank Dr. Nhu Le for constructive comments concerning mixed Poisson regression models, and for providing assistance with seizure data analysis.

In addition, I acknowledge the receipt of the MacPhee Memorial Fellowship and the Leslie G. J. Wong Memorial Fellowship, which provided support during my graduate schooling at this university.
Finally, I must thank my fellow students, faculty and staff of the management science division. The atmosphere has been open, relaxed and hospitable. I consider myself very fortunate to have known and become friends with so many people here.

To my parents

Chapter 1

Introduction

Poisson and logistic regression models are widely used for analyzing discrete data. Using such models, we implicitly assume that the response variable follows either a Poisson distribution or a binomial distribution with mean depending on covariates. Sometimes such assumptions may not be appropriate, in the sense that the mean-variance relationship specified by the distribution of the response variable is not valid. In most of these cases, we observe that the data are overdispersed, i.e., the observed sample variance is larger than that predicted by inserting the observed sample mean into the mean-variance relationship. On the other hand, in a few cases of data analysis, we may also observe that the data are underdispersed, i.e., the observed sample variance is smaller than that predicted by inserting the observed sample mean into the mean-variance relationship. Using these regression models without taking overdispersion or underdispersion into account may lead to biased parameter estimates and incorrect inferences about the parameters.

In this thesis, we propose using a finite mixture model approach to adjust for overdispersion. Specifically, we incorporate covariates in both the mixing probabilities and the component parameters of a finite mixture model in such a way that overdispersion may be explicitly interpreted by the model structure. The proposed models have applications in many different disciplines including economics, biostatistics and epidemiology.

The work in this thesis was motivated by several studies in different areas. One of these studies is to analyze the relationship between technological innovation and research and development expenditures for U.S. high-tech companies.
Another study is to assess treatment effects in a clinical trial on epileptic patients carried out in British Columbia Children's Hospital. For the clinical study, for instance, the patients were randomly assigned into two groups: control and treatment. Those patients in the treatment group received monthly infusions of intravenous gammaglobulin (IVIG), while those patients in the control group received "best available therapy". The primary end point of the trial was daily seizure frequency. The principal data source was a daily seizure diary which contained the number of hours of parental observation and the number of seizures of each type during the observation period. We analyzed a typical series of myoclonic seizure counts from a single subject receiving IVIG. Data extracted from the seizure diary were the daily counts and the hours of parental observation. The questions of interest here are fitting a model to these counts that describes the pattern of epileptic seizure activity, and assessing IVIG effects on suppression of myoclonic seizures. Although it is a reasonable assumption that a daily seizure count follows a Poisson distribution, which implies random occurrence of seizures in time, the data were overdispersed with respect to the Poisson regression model with mean including treatment effect. As indicated by the clinical investigators conducting this study, they have observed subjects to have "bad days" and "good days" with no obvious explanation of this effect. Hence, we are led to consider the mixed Poisson regression models, which allow the seizure frequency function to change in a random fashion.

Several alternative approaches for modelling overdispersion with respect to the Poisson assumption are reviewed in Chapter 2.
In that chapter, we propose a mixed Poisson regression model and show that it includes several special cases such as the usual Poisson regression model, the mixed Poisson regression model with constant mixing probabilities and the mixed Poisson regression model with constant Poisson rates. We also discuss identifiability of the proposed model and provide sufficient conditions for identifiability. Maximum likelihood parameter estimation is used. An algorithm for computation of maximum likelihood estimates is presented (FORTRAN code for implementation of the algorithm is provided in Appendix A). In particular, for a fixed finite number of components, the algorithm finds maximum likelihood estimates in two steps: (1) using the EM algorithm first until neither the observed log likelihood nor the parameter estimates change by more than a given tolerance, and (2) using a quasi-Newton algorithm which maximizes the observed log likelihood function. The results of a Monte Carlo study on performance of the algorithm are given there. A model selection procedure for determining the number of components, and inference about regression parameters, are also presented. Classification based on the estimated posterior probabilities from the fitted model is discussed. Finally, four applications of this model are given, and results are compared to those from quasi-likelihood approaches.

Several alternative approaches for modelling overdispersion with respect to the binomial assumption are reviewed in Chapter 3. In that chapter, we propose a mixed logistic regression model and show that it includes several special cases such as the usual logistic regression model, the mixed logistic regression model with constant mixing probabilities and the mixed logistic regression model with constant binomial parameters. We also discuss identifiability of the proposed model and provide sufficient conditions for identifiability. Maximum likelihood parameter estimation is used.
An algorithm for computation of maximum likelihood estimates is presented (FORTRAN code for implementation of the algorithm is provided in Appendix A). In particular, for a fixed finite number of components, the algorithm finds maximum likelihood estimates in two steps: (1) using the EM algorithm first until neither the observed log likelihood nor the parameter estimates change by more than a given tolerance, and (2) using a quasi-Newton algorithm which maximizes the observed log likelihood function. The results of a Monte Carlo study on performance of the algorithm are given there. A model selection procedure for determining the number of components, and inference about regression parameters, are also presented. Classification based on the estimated posterior probabilities from the fitted model is discussed. Finally, an application of this model is given, and results are compared to those from quasi-likelihood approaches.

Chapter 4 concerns summary, conclusions and future research. We discuss some similarities and differences between the mixed Poisson regression and mixed logistic regression models. We extend the mixed Poisson regression and logistic regression models to the more general case of a one-parameter exponential distribution. Mixed exponential regression models are considered in this chapter. Furthermore, we propose hidden Markov Poisson regression models for longitudinal data. In particular, we give preliminary results for this model, including model definition, moment structure, identifiability and parameter estimation.

Chapter 2

Mixed Poisson Regression Models

2.1 Poisson regression and its modifications

The Poisson regression model has been widely used for analyzing count data in which each observation consists of a discrete response variable and a vector of covariates or predictors. Typical examples of such data include counts of events in a Poisson or Poisson-like process where the upper limit to the number is infinite or effectively so.
For instance, the response variable may represent the number of failures of a piece of equipment per unit time, the number of purchases of a particular commodity per family, or the number of bacteria per unit volume of suspension. In practice, however, the model sometimes fits poorly, suggesting the need for alternative models. In this case, it is not uncommon that the observed data are overdispersed, i.e., the variance of an observation is greater than its mean. This may be reflected in an over-large residual deviance and in adjusted residuals which have a variance greater than 1. Without consideration for the overdispersion, using the Poisson regression model may not be justified. In the first part of this dissertation, mixed Poisson regression models are introduced and investigated. These models are applicable in several different situations where the Poisson regression model appears inadequate, and they provide an alternative way to adjust for extra-Poisson variation with a more meaningful interpretation.

Suppose that the $i$th response variable $Y_i$ is a count, and associated with this response is a covariate vector $x_i = (x_{i1}, \ldots, x_{ir})'$ for $1 \le i \le n$. The Poisson regression model assumes that the $Y_i$ are distributed independently Poisson($\lambda_i$) with density function

$$f(y_i \mid \lambda_i) = \frac{\lambda_i^{y_i}}{y_i!} \exp(-\lambda_i) \quad \text{for } y_i = 0, 1, 2, \ldots, \tag{2.1}$$

where $\lambda_i = \exp(x_i'\alpha)$ and $\alpha \in R^r$ is an $r$-dimensional vector of unknown parameters. Note that the Poisson parameter $\lambda_i = E(Y_i)$ is related to the covariate vector $x_i$ by a link function so that the dependence of $\lambda_i$ on $x_i$ is assumed to be multiplicative and is usually written in the logarithmic form

$$\log(\lambda_i) = x_i'\alpha. \tag{2.2}$$

Equations (2.1) and (2.2) are sometimes referred to as a log-linear model.

The Poisson regression model has been applied in many areas (e.g., Frome, Kutner, and Beauchamp 1973; Frome 1983; Holford 1983; Hausman et al. 1984; Mannering 1989).
For instance, Frome et al. (1973) used the Poisson regression model to describe the relationship between the number of failures of a piece of electronic equipment per unit time (response variable) and the times spent in regimes one and two (covariates), and the relationship between the number of colonies produced in the spleen of recipient animals (response variable) and the concentration of injected cells and the radiation dose (covariates). Frome (1983) applied the Poisson regression model in the analysis of survival time data. He analyzed data that were obtained in epidemiologic follow-up studies and organized into a format similar to that of a life table. Holford (1983) analyzed data that consist of numbers of prostatic cancer deaths and mid-period population denominators for non-whites in the US by age and calendar period, and fitted the Poisson regression model with age and cohort effects to the death rates. Hausman et al. (1984) introduced the Poisson regression model to analyze the relationship between the research and development (R&D) expenditures of firms and the number of patents applied for and received by them. Mannering (1989) used the Poisson regression model to investigate the determinants of commuter flexibility in changing routes and departure times for the morning trip to work. He assumed that the number of route and departure time changes occurring during a one month period follows a Poisson distribution with mean depending on a vector of commuting and socioeconomic characteristics for an individual.

The Poisson regression model is analogous to the normal linear regression model in many ways. The estimation of unknown parameters is straightforward and is done either by an iterative weighted least squares technique or by a maximum likelihood algorithm. The log likelihood function is globally concave so that maximization routines converge rapidly.
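To make the estimation step concrete, the following is a minimal sketch of Newton-Raphson maximization of the Poisson log likelihood under the log link (2.2); for this canonical link the iteration coincides with iteratively weighted least squares. It is an illustration only, not the FORTRAN code of Appendix A. The function names `solve` and `fit_poisson`, the starting value (which assumes the first column of the design matrix is an intercept and that the mean count is positive) and the small data set are all hypothetical.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rhs = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = rhs / M[i][i]
    return x

def fit_poisson(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for the log-linear model log(lambda_i) = x_i' alpha.

    Assumption: the first column of X is a column of ones (intercept),
    so alpha[0] can be started at log(mean(y)).
    """
    r = len(X[0])
    alpha = [0.0] * r
    alpha[0] = math.log(sum(y) / len(y))
    for _ in range(max_iter):
        lam = [math.exp(sum(a * v for a, v in zip(alpha, xi))) for xi in X]
        # Score vector: sum_i (y_i - lambda_i) x_i
        score = [sum((yi - li) * xi[j] for xi, yi, li in zip(X, y, lam))
                 for j in range(r)]
        # Information matrix: sum_i lambda_i x_i x_i'
        info = [[sum(li * xi[j] * xi[k] for xi, li in zip(X, lam))
                 for k in range(r)] for j in range(r)]
        step = solve(info, score)
        alpha = [a + s for a, s in zip(alpha, step)]
        if max(abs(s) for s in step) < tol:
            break
    return alpha

# Hypothetical data: counts that grow roughly exponentially with one covariate.
X_demo = [[1.0, float(x)] for x in range(5)]
y_demo = [1, 2, 4, 7, 12]
alpha_hat = fit_poisson(X_demo, y_demo)
print(alpha_hat)
```

At convergence the score equations hold, so with an intercept in the model the fitted means sum to the observed total count, which gives a simple check on the fit.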
Residual analysis is carried out in the same way as for the normal linear regression model, except that the definition of the residual is different.

The Poisson regression model is used for many different purposes. Sometimes, inference concerning the regression parameters $\alpha$ is of primary importance. For example, $Y$ may denote the number of car accidents for an individual. Large values of the $\alpha$'s (relative to their standard errors) then correspond to factors which significantly increase the chance of accidents. On the other hand, when one is primarily interested in creating a good predictive model, the interpretation of parameters may take a secondary role.

The Poisson regression model is an example of a Generalized Linear Model (McCullagh and Nelder, 1989) in which the frequency distribution of the response $Y$ is a Poisson distribution with mean $\lambda(x)$, and the link is a log function: $g(\lambda) = \log(\lambda(x)) = x'\alpha$. A consequence of using the Poisson regression model is that the variance equals the mean, i.e., $\mathrm{Var}(Y) = E(Y)$. In practice, however, we often have overdispersed data, i.e., $\mathrm{Var}(Y) > E(Y)$. When the Poisson regression model fits the count data poorly, overdispersion is often a cause of the problem.

There are several ways to modify the Poisson regression model. Using the GLM formulation, we can modify it by choosing either an alternative link function or an alternative frequency distribution, or both. Since the log link has nice properties such as multiplicative effects of covariates on the Poisson mean, few researchers have suggested use of alternative link functions. On the other hand, there are many studies of alternative frequency distributions to the Poisson distribution (e.g., Breslow 1984; Efron 1986; Lawless 1987b and Dean et al. 1989).

To adjust for extra-Poisson variation, mixed Poisson distributions have been used as frequency distributions (Efron 1986; Lawless 1987b and Dean et al. 1989).
In these models, the Poisson means associated with each observed count are defined as latent variables that are sampled from a specified parametric distribution. In other words, the Poisson means are random variables following a specific distribution. Under such a setup, the marginal density function of the response $Y$ without covariates can often be given by

$$\Pr(Y = y \mid \lambda, g) = \int_0^\infty \frac{(\nu\lambda)^{y}}{y!} \exp(-\nu\lambda)\, g(\nu)\, d\nu, \quad y = 0, 1, \ldots, \tag{2.3}$$

where $g(\nu)$ is a mixing probability density function and $\lambda > 0$ is an unknown parameter. Such models can be viewed as multiplicative Poisson random-effects models (Brillinger 1986) for the following reasons: (1) there is a random effect $T$, with density $g(\nu)$, in the model; (2) conditional on $T = \nu > 0$, the response $Y$ has a Poisson distribution with mean $\nu\lambda$. Without loss of generality we can assume that $E(T) = 1$.

Most authors have considered a gamma mixing distribution, which leads to a negative binomial distribution for the observed data (Manton, Woodbury, and Stallard 1981; Margolin, Kaplan, and Zeiger 1981). In this case the mixing distribution $g(\nu)$ is

$$g(\nu) = \begin{cases} \dfrac{k^{k}}{\Gamma(k)}\, \nu^{k-1} \exp(-k\nu) & \text{for } \nu > 0, \\ 0 & \text{otherwise,} \end{cases}$$

where $k > 0$ and $\lambda > 0$ are unknown parameters. Note that $E(T) = 1$ and $\mathrm{Var}(T) = 1/k$. Hence (2.3) becomes

$$f(y \mid \lambda, k) = \frac{\Gamma(y + k)}{\Gamma(k)\, y!} \left(\frac{\lambda}{\lambda + k}\right)^{y} \left(\frac{k}{\lambda + k}\right)^{k}, \quad \text{for } y = 0, 1, 2, \ldots, \tag{2.4}$$

where $k > 0$ is often referred to as the index or dispersion parameter. The mean and variance of $Y$ are

$$E(Y) = \lambda \quad \text{and} \quad \mathrm{Var}(Y) = \lambda + (1/k)\lambda^{2}. \tag{2.5}$$

As a natural extension of the above models, several researchers (e.g., Lawless, 1987b, and Hausman, Hall, and Griliches, 1984) have studied negative binomial regression models in which covariates are related to the parameter $\lambda$ by a positive function $\lambda(x)$. Usually one takes the common log-linear form $\lambda(x) = \exp(x'\alpha)$ so that random and fixed effects are added on the same exponential scale.
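As a quick numerical check on (2.4) and (2.5), the sketch below evaluates the negative binomial density on a truncated support and compares the resulting mean and variance with $\lambda$ and $\lambda + \lambda^2/k$. The function names, the truncation point `y_max` and the parameter values are illustrative assumptions, not part of the thesis.

```python
import math

def neg_binomial_pmf(y, lam, k):
    """Density (2.4): gamma-mixed Poisson with mean lam and index k, computed on the log scale."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + y * math.log(lam / (lam + k))
             + k * math.log(k / (lam + k)))
    return math.exp(log_p)

def truncated_moments(lam, k, y_max=500):
    """Total probability, mean and variance of (2.4) summed over y = 0, ..., y_max."""
    probs = [neg_binomial_pmf(y, lam, k) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

# Hypothetical parameter values chosen only for illustration.
lam, k = 3.0, 2.0
total, mean, var = truncated_moments(lam, k)
print(total, mean, var)            # compare with 1, lam and lam + lam**2 / k
print(lam + lam ** 2 / k)          # variance implied by (2.5)
```

The variance exceeds the mean by $\lambda^2/k$, so smaller values of $k$ correspond to stronger overdispersion, and the Poisson case is approached as $k$ grows large.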
The negative binomial regression model may be interpreted as follows: if $T$ is a positive-valued random variable with mean 1 and variance $1/k$, and if the distribution of $Y$, conditional on $T = \nu$ and covariates $x$, is Poisson($\nu\lambda(x)$), then the marginal mean and variance of $Y$ are as in (2.5), and the marginal distribution of $Y$ is the negative binomial defined by (2.4).

Note that in the negative binomial regression model, the shape parameter $k$ is a constant for all observations. In this case, the likelihood equations based on the negative binomial model are unbiased and the maximum likelihood estimates of the mean parameters are consistent, regardless of the true variance function (Lawless, 1987b and Hausman, Hall, and Griliches 1984).

Several researchers apply the negative binomial model in different situations. For instance, for count data without covariates, Anscombe (1950) gives a comprehensive discussion of properties of the model and several examples of its use. Ehrenberg (1972) applies it to model market behaviour for frequently purchased low-cost products by assuming that the number of purchases follows the negative binomial distribution. For count data with covariates, Manton et al. (1981) use it in the analysis of mortality rates. They assume that variation in individual risk levels follows the gamma distribution within each category, and that conditional on the individual risk levels, the number of cancer deaths follows the Poisson distribution with mean depending on some covariates including age and race. Hausman, Hall, and Griliches (1984) introduce it to study the relation between technical innovation and firm characteristics (mainly R&D spending and sales) at the firm level.
They assume that there is a random firm effect described by the gamma distribution, and that the number of patents applied for by a company per year, $Y$, follows a negative binomial regression model in which $E(Y) = \lambda(x)$ is a log-linear function of the covariates: annual R&D spending and sales of the company.

Another useful choice of the mixing distribution $g(\nu)$ in (2.3) is an inverse Gaussian distribution (e.g., Folks and Chhikara 1978, Tweedie 1957) for $T$, with density

$$g(\nu) = (2\pi\tau\nu^{3})^{-1/2} \exp\left(-(\nu - 1)^{2} / 2\tau\nu\right), \quad \nu > 0. \tag{2.6}$$

The parameter $\tau$ is unknown, and equals $\mathrm{Var}(T)$. The marginal distribution of $Y$ from (2.3) is then a Poisson-inverse-Gaussian model with the mean and variance relationship $E(Y) = \lambda$ and $\mathrm{Var}(Y) = \lambda + \tau\lambda^{2}$. This model provides a heavier-tailed alternative to the negative binomial model, although both have the same mean and variance relationship. A difficulty in using the model is computing the integral in (2.3).

Dean, Lawless and Willmot (1989) introduce a Poisson-inverse-Gaussian regression model by taking the common log-linear form $\lambda(x) = \exp(x'\alpha)$. This model has almost the same structure and interpretation as the negative binomial regression models. Jorgensen (1987) and Stein and Juritz (1988) also propose other versions of Poisson-inverse-Gaussian models by using different variance functions. Jorgensen (1987) defines both the Poisson and inverse-Gaussian distributions as exponential dispersion models so that his mixture model is an exponential dispersion model and satisfies an appealing convolution property. Stein and Juritz's model is structured so that the regression parameter vector $\alpha$ is orthogonal to the shape parameter (analogous to the $\tau$ in the above model) specifying the degree of extra-Poisson variation. Neither model has, however, the simple structure
of the above model in terms of the multiplicative random effects.

A lognormal mixing distribution for $g(\nu)$ has also been advocated (e.g., Hinde 1982 and Pocock et al. 1981). In this model, the Poisson mean has a lognormal distribution with location parameter related to a linear function of covariates and a constant scale parameter.

Efron (1986) introduces the double Poisson distribution as an alternative frequency distribution to accommodate extra-Poisson variation. The exact double Poisson density is

$$\tilde{f}_{\lambda,\theta}(y) = c(\lambda, \theta)\, f_{\lambda,\theta}(y),$$

where

$$f_{\lambda,\theta}(y) = \left(\theta^{1/2} e^{-\theta\lambda}\right) \left(\frac{e^{-y} y^{y}}{y!}\right) \left(\frac{e\lambda}{y}\right)^{\theta y}, \quad \text{for } y = 0, 1, 2, \ldots,$$

and the factor $c(\lambda, \theta)$ can be calculated as

$$\frac{1}{c(\lambda, \theta)} = \sum_{y=0}^{\infty} f_{\lambda,\theta}(y) \approx 1 + \frac{1 - \theta}{12\lambda\theta} \left(1 + \frac{1}{\lambda\theta}\right).$$

Since the constant $c(\lambda, \theta)$ nearly equals 1, the approximate probability density function for the double Poisson distribution is $f_{\lambda,\theta}(y)$. Usually $\lambda$ is referred to as a mean parameter and $\theta$ as a dispersion parameter. The double Poisson distribution allows us to individually adjust the mean and variance of the response $Y$ using the parameters $\lambda$ and $\theta$, and it only involves rescaled Poisson distributions, in the approximate sense that $Y$ is approximately expressed by $X/\theta$ where $X$ follows the Poisson distribution with mean $\lambda\theta$. For count data with covariates, we can incorporate covariates into either $\lambda$ or $\theta$ or both. Efron suggests that the double Poisson regression model may be more appropriate for count data in which subjects may be, for example, obtained in clumps rather than by genuine random sampling. Note that such clumped sampling may be one of the possible causes of overdispersion.

Another approach to modifying the Poisson regression model is quasi-likelihood. This approach specifies only the mean and variance structure of $Y$ implied by the mixed Poisson model, and estimates the regression coefficients by quasi-likelihood and the variance parameter by the method of moments (e.g., Williams 1982 and Breslow 1987). The attraction is that unduly rigorous assumptions about the frequency distribution are avoided.
The trade-off is that estimation based on the quasi-likelihood model is not as efficient as estimation based on the fully parametric model (Lawless 1987b).

Several researchers have studied different quasi-likelihood models by assuming different relationships between mean and variance. Breslow (1984) introduces quasi-likelihood models by assuming that, conditional on $\lambda_i$ and exposure $t_i$, the response $Y_i$ has an independent Poisson distribution with mean $E(Y_i \mid \lambda_i) = t_i\lambda_i$ and $\log\lambda_i = x_i'\alpha + \epsilon_i$, where $\alpha$ is a vector of unknown parameters and the $\epsilon_i$ are random error terms having mean 0 and a constant unknown variance $\sigma^2$. Note that there are no assumptions on the probability distribution of the random effects $\epsilon_i$ except for the first two moments.

Breslow (1984) also proposes two procedures to fit count data to the model. One applies when the data have relatively large values of $Y_i$. In this case $\log(Y_i/t_i)$ may be regarded as approximately normally distributed with mean $x_i'\alpha$ and variance $\sigma^2 + \tau_i^2$, where $\tau_i^2 = 1/E(Y_i)$. Hence the estimation method is based on the iteration of the following two steps: (1) obtain estimates of the regression parameters by weighted least squares using the empirical weights $w_i = (\hat\sigma^2 + \hat\tau_i^2)^{-1}$, and (2) obtain the value of $\hat\sigma^2$ by setting the chi-square criterion equal to its degrees of freedom, i.e.,

$$\sum_{i=1}^{n} \{\log(y_i/t_i) - x_i'\hat\alpha\}^2 / (\hat\sigma^2 + \hat\tau_i^2) = n - p,$$

where $p$ is the number of parameters in the model.

The other applies when the data have relatively small values of $Y_i$. In this case, the normal approximation is in doubt. Since the above assumptions lead to the approximate mean and variance relationship $E(Y_i) = \mu_i = t_i\exp(x_i'\alpha)$ and $\mathrm{Var}(Y_i) = \mu_i + \sigma^2\mu_i^2$, the maximum quasi-likelihood estimates are obtained with GLIM (Baker and Nelder, 1978) by using the Poisson error function and the natural log link, declaring $\log(t_i)$ as an offset, and defining prior weights $w_i = (1 + \hat\sigma^2\hat\mu_i)^{-1}$.
The value of $\hat\sigma^2$ is also obtained by setting the chi-square criterion equal to its degrees of freedom, i.e.,

$$\sum_{i=1}^{n} (y_i - \hat\mu_i)^2 / \{\hat\mu_i + \hat\sigma^2\hat\mu_i^2\} = n - p,$$

where $p$ is the number of parameters in the model. Note that this approach can also be applied to data that have both small and large values of $Y_i$, because the above approximation of the mean and variance relationship can still hold.

There are also other quasi-likelihood models in the literature for analyzing overdispersed count data. For instance, many non-Poisson distributions encountered in statistical practice have a connection between the mean and variance of a response $Y$ of the form

$$\mathrm{Var}(Y) = c_1 E(Y) + c_2 \{E(Y)\}^2.$$

This relation was used by Bartlett (1936) to analyze counts for field experiments. Both Armitage (1957) and Finney (1976) define another mean-variance relationship, $\mathrm{Var}(Y) \propto \{E(Y)\}^{b}$, and find by the study of examples that $1 < b < 2$. Breslow (1990) also uses a quasi-likelihood model with the above mean-variance relationship to analyze viral activity from pock counts.

Another approach for modifying the Poisson distribution is through finite mixture models, which are obtained by taking the mixing distribution in (2.3) as a discrete probability distribution with $c$ points of support. Hence the distribution of $Y$ is

$$\Pr(Y = y \mid p_1, \ldots, p_c, \lambda_1, \ldots, \lambda_c) = \sum_{j=1}^{c} p_j\,\mathrm{Po}(y \mid \lambda_j),$$

where $\sum_{j=1}^{c} p_j = 1$ and $p_j > 0$ $(1 \le j \le c)$, and the $\mathrm{Po}(y \mid \lambda_j)$ are Poisson probability functions with mean $\lambda_j$. This approach applies to a wide variety of applications and has received an increasing amount of attention of late. See for example Everitt and Hand (1981) and Titterington et al. (1985). Simar (1976) and Leroux (1989) study finite mixtures with an unknown number of components for overdispersed count data. No researchers have systematically studied regression-type finite mixture models with covariates.

2.2 Implications of Overdispersion

Overdispersion as an issue has been recognized for many years.
In Poisson regression analysis of count data, residual variability is sometimes greater than what is predicted by Poisson models, suggesting either lack-of-fit (incorrect mean) or overdispersion (incorrect variance), or both. It is important to note that so far the various score tests cannot distinguish lack-of-fit from true overdispersion. In our discussion, we mainly concentrate on the issue of overdispersion rather than the choice of the link function. Without consideration of overdispersion, using the Poisson regression model may be misleading in statistical analysis. This will be illustrated in our examples later.

Many authors have studied the effects of overdispersion on inferences made under the Poisson regression model. As Cox (1983) indicates, overdispersion in general has two effects. One is that summary statistics have a larger variance than anticipated under the simple model. The second effect is a possible loss of efficiency. It is important to note that the implications of overdispersion may also depend on the type of overdispersion specified. For Poisson regression analysis, if the overdispersion is accommodated by randomizing the Poisson mean to obtain gamma-Poisson models and quasi-likelihood models, among others (e.g., Cox 1983), maximum likelihood fitting of a log-linear model for Poisson-distributed data retains high efficiency for a modest amount of overdispersion, provided that the log-linear model determines the expected value of the observed count (Cox, 1983). Specifically, parameter estimates based on the Poisson regression model are generally not seriously biased or inefficient, but estimated standard errors are too small and tests are too liberal (Breslow 1990; Cox 1983; Firth 1987; Hill and Tsai 1988; McCullagh and Nelder 1989). On the other hand, when there is serious overdispersion, using the usual Poisson regression may lead to either seriously biased or inefficient parameter estimates.
For instance, in a random coefficient log-linear Poisson regression, the response $Y$ is Poisson($e^{\alpha + \beta x}$) given $\alpha$ and $\beta$, but each individual has a different random baseline $\alpha$ or a different responsiveness to treatment $\beta$; parameter estimates of $\alpha$ and $\beta$, as well as their standard errors, based on the Poisson regression may then be misleading. In particular, the mean under a random coefficient model is not the Poisson mean evaluated at the average of the random coefficients (see Neuhaus et al. 1991). Also, if the true log-mean is $\alpha + x\beta + z\gamma$ but only $x$ is recorded, then the assumed log-mean $\alpha^* + x\beta$ has a random intercept $\alpha^* = \alpha + z\gamma$ that varies with $z$. In this case the extent of the overdispersion depends on $z$, and the parameter estimate of $\alpha^*$ based on the Poisson regression may be seriously biased when the overdispersion is serious. When the extra-Poisson variation is explained by the mixed Poisson regression model, we will show, in examples, that accounting for the overdispersion can give results rather different from those of the usual Poisson regression.

2.3 Tests for Extra-Poisson Variation

There are several overdispersed Poisson regression models which have been discussed in the literature. Without fitting a particular overdispersed Poisson model, we would like to know whether there is serious overdispersion. Several methods have been proposed to detect overdispersion relative to the Poisson assumption.

An informal graphical approach is introduced by Lambert and Roeder (1993) and Lindsay and Roeder (1992). For instance, for log-linear Poisson regression, Lambert and Roeder (1993) define the function

$$C(\mu) = n^{-1} \sum_{i=1}^{n} \left(\frac{\mu}{\hat\mu_i}\right)^{y_i} \exp(\hat\mu_i - \mu),$$

where $\hat\mu_i = \exp(x_i'\hat\beta)$ and $\mu > 0$. They show that $C(\mu)$ tends to be convex when the data are from a random mean Poisson regression model, a random coefficient Poisson regression model, or a double Poisson regression model. Thus they suggest plotting $C(\mu)$ against $\mu$.
The more convex $C(\mu)$ appears, the more evidence there is of overdispersion or an omitted variable. It is not clear, however, whether this approach can be applied to other modified Poisson regression models, such as the finite mixture of Poisson regression models, for dealing with extra-Poisson variation.

Another simple approach is to fit a more comprehensive model that contains the Poisson model and then test for a reduction to the simple model using, for instance, a likelihood ratio test. This approach, however, may provide misleading results (Dean, 1992). As Lawless (1987a) indicates, in certain circumstances the asymptotic distributions used with these tests may not be reliable because they tend to underestimate the evidence against the base model.

A widely used approach is through score tests. With these tests we may fit the Poisson regression model as a first step in the model building process and test for overdispersion. Score tests for detecting extra-Poisson variation have been discussed by Cameron and Trivedi (1986), Collings and Margolin (1985), Dean and Lawless (1989), and Fisher (1950). Concern has been expressed over the suitability of tests and confidence intervals based on overly simple models for extra-Poisson variation. Breslow (1990) proposes tests for parameters that appear in the mean, using model-free estimates of variance for each case. He found these to be robust to incorrect specification of the variance function, but not as powerful as tests based on a correct model for the response variation. Dean (1992) develops a unifying theory for all the score tests mentioned above.

Before applying the mixed Poisson regression models, we need to determine whether the data are overdispersed with respect to the Poisson distribution in Poisson regression models. We use three score test statistics proposed by Dean (1992).
They test the hypothesis of no overdispersion against alternatives representing different forms of overdispersion. The test statistics, denoted $P_a$, $P_b$ and $P_c$, correspond to the following specifications of overdispersion:

(a) $E(y_i) = \mu_i$, $\mathrm{Var}(y_i) \approx \mu_i(1 + \tau\mu_i)$ for $\tau$ small;
(b) $E(y_i) = \mu_i$, $\mathrm{Var}(y_i) = \mu_i(1 + \tau\mu_i)$;
(c) $E(y_i) = \mu_i$, $\mathrm{Var}(y_i) = \mu_i(1 + \tau)$.

In these statistics $\hat\mu_i$ is the estimated mean value for the $i$th observation based on Poisson regression. Under $H_0: \tau = 0$, each statistic asymptotically follows a standard normal distribution. Note that the difference between (a) and (b) is that the former has approximate forms for the first two moments, whereas the latter has the exact ones.

For small samples, Dean (1992) provides "corrected" versions $P_a'$, $P_b'$ and $P_c'$ corresponding to $P_a$, $P_b$ and $P_c$ respectively. The corrections involve $h_{ii}$, the $i$th diagonal element of the matrix $H = W^{1/2} X (X^T W X)^{-1} X^T W^{1/2}$, with $W = \mathrm{diag}(\hat\mu_1, \ldots, \hat\mu_n)$ and $X$ being an $n \times p$ design matrix. Dean (1992) points out that the distributions of these corrected statistics converge very quickly to normality.

2.4 Mixed Poisson Regression Models

Without covariates, the finite mixture approach has been used for analyzing count data exhibiting extra-Poisson variation (cf. Titterington et al., 1985; Simar, 1976; and Leroux, 1989). With covariates, however, this approach has not been systematically studied or directly applied to regression-type count data. In this section, we extend the finite Poisson mixture model to the mixed Poisson regression model by allowing both the component Poisson parameters and the mixing probabilities of a mixture to depend on covariates. We investigate some basic features of the model. We also discuss identifiability for the model and provide sufficient conditions for identifiability.
2.4.1 The Model

Let the random variable $Y_i$ denote the $i$th count response, and let $\{(y_i, t_i, x_i),\ i = 1, \ldots, n\}$ denote observations, where $y_i$ is the observed value of $Y_i$, $t_i$ a non-negative number representing the time period or exposure during which observation $y_i$ is generated, and $x_i = (x_i^{(r)}, x_i^{(m)})$ a covariate vector in which $x_i^{(r)}$ and $x_i^{(m)}$ are $k_1$- and $k_2$-dimensional covariate vectors corresponding to the regression part and the mixing part of the model respectively. We allow some or all components of $x_i^{(r)}$ and $x_i^{(m)}$ to be identical. Usually the first element of each of $x_i^{(m)}$ and $x_i^{(r)}$ is a 1 corresponding to an intercept. The mixed Poisson regression model assumes:

(1) The unobserved mixing process can occupy any one of $c$ states, where $c$ is finite and unknown;

(2) For each observed count $y_i$, there is an unobserved random variable, $A_i$, representing the component which generates $y_i$. Further, the $(Y_i, A_i)$ are pairwise independent;

(3) The $A_i$ follow discrete distributions with $c$ points of support, $\Pr(A_i = j) = p_{ij}$, $j = 1, \ldots, c$, where $\sum_{j=1}^{c} p_{ij} = 1$ for each $i$ and

$$p_{ij} = p_j(x_i^{(m)}, \beta) = \frac{\exp(\beta_j' x_i^{(m)})}{1 + \sum_{k=1}^{c-1} \exp(\beta_k' x_i^{(m)})} \quad \text{for } j = 1, \ldots, c-1, \tag{2.7}$$

$$p_{ic} = p_c(x_i^{(m)}, \beta) = 1 - \sum_{j=1}^{c-1} p_{ij}, \tag{2.8}$$

where $\beta = (\beta_1', \ldots, \beta_{c-1}')'$ and $\beta_j = (\beta_{j1}, \ldots, \beta_{jk_2})'$, $j = 1, \ldots, c-1$, are unknown parameters. Note that all components of $\beta$ appear in each mixing probability $p_{ij}$;

(4) Conditional on $A_i = j$, $Y_i$ follows a Poisson distribution which we denote by

$$f_j(y_i \mid x_i^{(r)}, \alpha_j, t_i) = \mathrm{Po}(y_i \mid t_i\lambda_{ij}) = \exp(-t_i\lambda_{ij})\,(t_i\lambda_{ij})^{y_i} / y_i!, \tag{2.9}$$

where we define a log link function between the Poisson mean and covariates as $t_i\lambda_{ij} = t_i\lambda_j(x_i^{(r)}, \alpha_j) = t_i \exp(\alpha_j' x_i^{(r)})$ for $j = 1, \ldots, c$, and $\alpha = (\alpha_1', \ldots, \alpha_c')'$ with $\alpha_j = (\alpha_{j1}, \ldots, \alpha_{jk_1})'$, $j = 1, \ldots, c$, are unknown parameters. Note that we could also choose other link functions.
The above assumptions define the unconditional distribution of the observations $y_i$ as a finite Poisson mixture in which the mixing probabilities, $p_{ij}$, are related to the covariates $x_i^{(m)}$ through the logit function, and the component distributions are Poisson distributions with mean determined by the exposure, $t_i$, and by the Poisson rate $\lambda_j(x_i^{(r)}, \alpha_j)$, which is related to the covariates $x_i^{(r)}$ through an exponential function. If the observations can be classified into $c$ groups corresponding to the $c$ underlying states, the vector of unknown parameters $\alpha_j$ may be interpreted as the coefficients of the Poisson regression for group $j$. On the other hand, the unknown parameters $\beta$ may be interpreted as the coefficients of the multinomial regression in which $A_i$ and $x_i^{(m)}$ are the dependent and independent variables respectively. Note that our model allows some or all components of $x_i^{(m)}$ and $x_i^{(r)}$ to be identical, and allows some coefficients of the Poisson rates, $\alpha_j$, to be constant across the components $j = 1, \ldots, c$, or to be 0 for one or several covariates for some $j$, $j = 1, \ldots, c$.

Under the above assumptions the probability function of $Y_i$ satisfies

$$f(y_i \mid x_i^{(m)}, x_i^{(r)}, t_i, \alpha, \beta) = \sum_{j=1}^{c} p_{ij}\, f_j(y_i \mid x_i^{(r)}, \alpha_j, t_i), \tag{2.10}$$

where $p_{ij}$ and the component Poisson probability functions $f_j$ are given by (2.7), (2.8) and (2.9) respectively.

We may equivalently view the model as arising from the following sampling scheme: observations are independent; for observation $i$, component $j$ is chosen according to a multinomial distribution with probability $p_{ij}$; subsequently, $y_i$ is generated from the corresponding Poisson distribution (2.9).

A justification for the mixed Poisson regression models is to assume that the coefficient vector $\alpha$ in the usual Poisson regression model, $\log(\lambda) = \alpha' x$, is a random variable following a discrete distribution with $c$ points of support: $\Pr(\alpha = \alpha_j) = p_j$ for $j = 1, \ldots, c$. By making the further assumption that the $p_j$ are related to a covariate vector $x^{(m)}$ through a logit link $p_j(x^{(m)}, \beta)$, we are led to the model of equation (2.10).
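The construction in (2.7) to (2.10) can be sketched in a few lines of code. The following is a minimal illustration, not the FORTRAN implementation referred to in the thesis: the function names `mixing_probs`, `poisson_pmf` and `mixture_pmf` are hypothetical, the last component is taken as the reference category of the logit, and the component mean is assumed to be the exposure times `exp(alpha_j' x_r)`.

```python
import math


def mixing_probs(x_m, beta):
    """Mixing probabilities p_i1, ..., p_ic as in (2.7)-(2.8).

    x_m  : mixing covariates for one observation (first entry usually 1)
    beta : list of c-1 coefficient vectors; component c is the reference
    """
    scores = [sum(b * x for b, x in zip(beta_j, x_m)) for beta_j in beta]
    denom = 1.0 + sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / denom for s in scores]
    probs.append(1.0 / denom)  # equals 1 - sum of the others, as in (2.8)
    return probs


def poisson_pmf(y, mean):
    """Poisson probability of count y for a strictly positive mean."""
    return math.exp(y * math.log(mean) - mean - math.lgamma(y + 1))


def mixture_pmf(y, t, x_r, x_m, alpha, beta):
    """Unconditional probability of y as in (2.10); assumes exposure t > 0.

    alpha : list of c coefficient vectors for the Poisson rates
    """
    total = 0.0
    for p_j, alpha_j in zip(mixing_probs(x_m, beta), alpha):
        rate = math.exp(sum(a * x for a, x in zip(alpha_j, x_r)))
        total += p_j * poisson_pmf(y, t * rate)
    return total
```

For example, with illustrative (assumed) values `alpha = [[2.8, -2.9], [2.6, 0.4], [3.6, 0.2]]` and `beta = [[2.0, -2.0], [-1.4, 1.5]]`, calling `mixture_pmf(5, 1.0, [1.0, 0.2], [1.0, 0.2], alpha, beta)` returns the probability of observing a count of 5 for a unit exposure and covariate value 0.2.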
Note that this model includes many previously studied models as special cases.

• Choosing $c = 1$ yields the Poisson regression model;
• Setting $x_i^{(r)} = x_i^{(m)} = 1$ and $t_i = 1$ for all $i$ yields an independent Poisson mixture model (Simar (1976) and Leroux (1989));
• Setting $x_i^{(m)} = 1$ yields an independent finite mixture of Poisson regressions. Further, letting the Poisson rates have common regression parameters and different intercepts yields a Poisson regression with a random intercept which follows a discrete mixing distribution;
• Setting $x_i^{(m)} = 1$, $c = 2$ and $\lambda_1(x_i^{(r)}, \alpha_1) \equiv 0$ yields a Poisson regression model with an extra mass at 0;
• Setting $x_i^{(r)} = 1$ yields an independent multinomial mixture of Poisson distributions with constant rates.

For the above model, the mean and variance of observation $Y_i$ are, respectively,

$$E(Y_i) = E\{E(Y_i \mid A_i)\} = \sum_{j=1}^{c} t_i\, p_{ij}\lambda_{ij} \tag{2.11}$$

and

$$\mathrm{Var}(Y_i) = E\{\mathrm{Var}(Y_i \mid A_i)\} + \mathrm{Var}\{E(Y_i \mid A_i)\} = E(Y_i) + t_i^2 \left\{ \sum_{j=1}^{c} p_{ij}\lambda_{ij}^2 - \Big( \sum_{j=1}^{c} p_{ij}\lambda_{ij} \Big)^2 \right\}, \tag{2.12}$$

where $\lambda_{ij} = \lambda_j(x_i^{(r)}, \alpha_j)$. Obviously,

$$\mathrm{Var}\{E(Y_i \mid A_i)\} = 0 \text{ if and only if } \lambda_{i1} = \lambda_{i2} = \cdots = \lambda_{ic}. \tag{2.13}$$

This implies that the mixture model is able to cope with extra-Poisson variation among $Y_1, \ldots, Y_n$ due to heterogeneity in the population.

2.4.2 Identifiability

To be able to estimate the parameters of (2.10), it is important to establish identifiability of the model, that is, that two sets of parameters in the mixture which do not agree after permutation cannot yield the same mixture distribution. Furthermore, identifiability is a necessary requirement for the usual asymptotic theory to hold for the estimation procedures considered later.

For finite mixture models without covariates, identifiability is defined as follows. Let $\mathcal{F} = \{F(x, \theta);\ \theta \in \Theta,\ x \in R^d\}$ be the class of $d$-dimensional distribution functions from which finite mixtures are to be formed. This class is identifiable if, for $x \in R^d$,

$$\sum_{j=1}^{c} p_j F(x, \theta_j) = \sum_{j=1}^{c^*} p_j^* F(x, \theta_j^*),$$

where $\sum_{j=1}^{c} p_j = \sum_{j=1}^{c^*} p_j^* = 1$ and the $p_j$, $p_j^*$ are positive, implies that $c = c^*$ and that we can order the summations such that $p_j = p_j^*$ and $F(x, \theta_j) = F(x, \theta_j^*)$, $j = 1, \ldots, c$. Note that if a class of models is not identifiable we cannot discriminate between (at least two) parameter values using data generated by the model. Without covariates, Teicher (1961) proves that the class of finite mixtures of Poisson distributions is identifiable. Considering covariates, we extend the above definition of identifiability as follows.

Definition 1: Consider the collection of probability models $\{f(y \mid x^{(m)}, x^{(r)}, t, c, \alpha, \beta)\}$, with the restriction that $\lambda_{i1} < \cdots < \lambda_{ic}$, sample spaces $\mathcal{Y}_1, \ldots, \mathcal{Y}_n$, parameter space $\mathcal{C} \times \mathcal{A} \times \mathcal{B}$, and fixed covariate vectors $(x_1^{(r)}, x_1^{(m)}), \ldots, (x_n^{(r)}, x_n^{(m)})$, where $x_i^{(r)} \in R^{k_1}$ and $x_i^{(m)} \in R^{k_2}$ for $i = 1, \ldots, n$. The collection of probability models is identifiable if for $(c, \alpha, \beta), (c^*, \alpha^*, \beta^*) \in \mathcal{C} \times \mathcal{A} \times \mathcal{B}$,

$$f(y_i \mid x_i^{(m)}, x_i^{(r)}, t_i, c, \alpha, \beta) = f(y_i \mid x_i^{(m)}, x_i^{(r)}, t_i, c^*, \alpha^*, \beta^*) \tag{2.14}$$

for all $y_i \in \mathcal{Y}_i$, $i = 1, \ldots, n$, implies $(c, \alpha, \beta) = (c^*, \alpha^*, \beta^*)$.

Note that the order restriction in the definition means that two models are equivalent if they agree up to permutations of parameters. We now provide sufficient conditions for identifiability.

Theorem 1: The mixed Poisson regression model is identifiable if both matrices $X^{(m)}$ and $X^{(r)}$ are of full rank, where $X^{(m)} = (x_1^{(m)}, \ldots, x_n^{(m)})$ and $X^{(r)} = (x_1^{(r)}, \ldots, x_n^{(r)})$.

Proof: Suppose that $(c, \alpha, \beta)$ and $(c^*, \alpha^*, \beta^*)$ satisfy (2.14). This then implies that for each $i$ and all $y_i$,

$$\sum_{j=1}^{c} p_{ij}\,\mathrm{Po}(y_i \mid t_i\lambda_{ij}) = \sum_{j=1}^{c^*} p_{ij}^*\,\mathrm{Po}(y_i \mid t_i\lambda_{ij}^*), \tag{2.15}$$

where $p_{ij} = p_j(x_i^{(m)}, \beta)$ and $\lambda_{ij} = \lambda_j(x_i^{(r)}, \alpha_j)$ are defined above, and $p_{ij}^*$, $\lambda_{ij}^*$ are the corresponding quantities under $(c^*, \alpha^*, \beta^*)$. Note that each side of (2.15) may be regarded as a finite Poisson mixture without covariates. Teicher's result implies that $c = c^*$, $p_{ij} = p_{ij}^*$ and $\lambda_{ij} = \lambda_{ij}^*$ for $i = 1, \ldots, n$ and $j = 1, \ldots, c$. By the definition of the model, we obtain

$$\exp(\beta_j' x_i^{(m)}) = \exp(\beta_j^{*\prime} x_i^{(m)}) \quad \text{for } j = 1, \ldots, c-1, \tag{2.16}$$

$$\exp(\alpha_j' x_i^{(r)}) = \exp(\alpha_j^{*\prime} x_i^{(r)}) \quad \text{for } j = 1, \ldots, c. \tag{2.17}$$

From (2.16) and (2.17) we obtain

$$(\beta_j - \beta_j^*)' x_i^{(m)} = 0 \quad \text{for } j = 1, \ldots, c-1 \text{ and } i = 1, \ldots, n,$$
$$(\alpha_j - \alpha_j^*)' x_i^{(r)} = 0 \quad \text{for } j = 1, \ldots, c \text{ and } i = 1, \ldots, n,$$

or

$$(\beta_j - \beta_j^*)' X^{(m)} = 0 \quad \text{for } j = 1, \ldots, c-1, \tag{2.18}$$

$$(\alpha_j - \alpha_j^*)' X^{(r)} = 0 \quad \text{for } j = 1, \ldots, c. \tag{2.19}$$

Sufficient conditions for identifiability are that both $X^{(m)}$ and $X^{(r)}$ are full rank matrices, in which case (2.18) and (2.19) imply that $(\alpha, \beta) = (\alpha^*, \beta^*)$. We can assume this without loss of generality, such as might be the case in an ANOVA structure, since if it does not hold we can reparameterize the model accordingly. □

2.5 Parameter Estimation for the Mixed Poisson Regression Models

Finding the maximum likelihood estimates of the parameters in the mixed Poisson regression model requires an iterative algorithm. Two kinds of widely used algorithms can be applied to this case: (1) the EM algorithm due to Dempster, Laird and Rubin (1977) and (2) quasi-Newton algorithms (e.g., Nash 1990, and Dennis and Schnabel 1983). In this section we discuss how to find the estimates for the mixed Poisson regression model with a known number of components by combining both algorithms. We also report the results of a Monte Carlo study which investigates the performance of our codes and some implementation issues which will be discussed later.

2.5.1 EM and Quasi-Newton Algorithms

For a fixed number of components $c$, we obtain maximum likelihood estimates of the parameters in the above model using the EM algorithm (Dempster, Laird and Rubin (1977)). As is now standard in mixture model estimation, we implement it by treating the unobservable component membership of the observations as missing data and constructing a complete data set for the model. We discuss the choice of the number of components below.

Suppose that $(Y, X^{(r)}, X^{(m)}, T) = \{(y_i, x_i^{(r)}, x_i^{(m)}, t_i);\ i = 1, \ldots, n\}$ is the observed data generated by the mixed Poisson regression model. Let $(Y, Z, X^{(r)}, X^{(m)}, T) = \{(y_i, z_i, x_i^{(r)}, x_i^{(m)}, t_i);\ i = 1, \ldots, n\}$ be the complete data for the mixture, where the unobserved quantity $z_i = (z_{i1}, \ldots, z_{ic})'$ satisfies

$$z_{ij} = \begin{cases} 1 & \text{if } A_i = j, \\ 0 & \text{otherwise.} \end{cases}$$
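The log likelihood for the complete data is

$$l^c(\alpha, \beta \mid Y, Z, X^{(r)}, X^{(m)}, T) = \sum_{i=1}^{n}\sum_{j=1}^{c} z_{ij}\log(p_{ij}) + \sum_{i=1}^{n}\sum_{j=1}^{c} z_{ij}\log \mathrm{Po}(y_i \mid t_i\lambda_{ij}),$$

where $p_{ij}$ and $\mathrm{Po}(y_i \mid t_i\lambda_{ij})$ are given by (2.7), (2.8) and (2.9) respectively.

The EM approach finds the maximum likelihood estimates using an iterative procedure consisting of two steps: an E-step and an M-step. At the E-step, it replaces the missing data by its expectation conditional on the observed data. At the M-step, it finds the parameter estimates which maximize the expected log likelihood for the complete data, conditional on the expected values of the missing data. In our case, this procedure can be stated as follows.

E-step: Given $\alpha^{(0)}$ and $\beta^{(0)}$, replace the missing data $Z$ by its expectation conditioned on these initial values of the parameters and the observed data, $(Y, X^{(r)}, X^{(m)}, T)$. In this case, the conditional expectation of the $j$th component of $z_i$ equals the probability that the observation $y_i$ was generated by the $j$th component of the mixture distribution, conditional on the parameters, the data and the covariates. Denote the conditional expectation of the $j$th component of $z_i$ by $\hat z_{ij}(\alpha^{(0)}, \beta^{(0)})$. Then

$$\hat z_{ij}(\alpha^{(0)}, \beta^{(0)}) = E(z_{ij} \mid \alpha^{(0)}, \beta^{(0)}, Y, X^{(m)}, X^{(r)}, T) = \Pr(z_{ij} = 1 \mid \alpha^{(0)}, \beta^{(0)}, Y, X^{(m)}, X^{(r)}, T) = \frac{p_j(x_i^{(m)}, \beta^{(0)})\, f_j(y_i \mid x_i^{(r)}, t_i, \alpha_j^{(0)})}{\sum_{l=1}^{c} p_l(x_i^{(m)}, \beta^{(0)})\, f_l(y_i \mid x_i^{(r)}, t_i, \alpha_l^{(0)})}. \tag{2.20}$$

M-step: Given the conditional probabilities $\{\hat z_i(\alpha^{(0)}, \beta^{(0)}) = (\hat z_{i1}, \ldots, \hat z_{ic})';\ i = 1, \ldots, n\}$, obtain estimates of the parameters by maximizing, with respect to $\alpha$ and $\beta$,

$$Q(\alpha, \beta \mid \alpha^{(0)}, \beta^{(0)}) = E\{l^c(\alpha, \beta \mid Y, Z, X^{(r)}, X^{(m)}, T)\} = Q_1 + Q_2,$$

where

$$Q_1 = \sum_{i=1}^{n}\sum_{j=1}^{c} \hat z_{ij}(\alpha^{(0)}, \beta^{(0)}) \log(p_{ij}) \quad \text{and} \quad Q_2 = \sum_{i=1}^{n}\sum_{j=1}^{c} \hat z_{ij}(\alpha^{(0)}, \beta^{(0)}) \log \mathrm{Po}(y_i \mid t_i\lambda_{ij}).$$

The estimated parameters, $\hat\alpha$ and $\hat\beta$, satisfy the following M-step equations:

$$\frac{\partial Q_2}{\partial \alpha}\Big|_{\hat\alpha} = \sum_{i=1}^{n}\sum_{j=1}^{c} \hat z_{ij}\, \frac{\partial}{\partial\alpha}\left[\log \mathrm{Po}(y_i \mid t_i\lambda_{ij})\right]\Big|_{\hat\alpha} = 0, \tag{2.21}$$

$$\frac{\partial Q_1}{\partial \beta}\Big|_{\hat\beta} = \sum_{i=1}^{n}\sum_{j=1}^{c} \hat z_{ij}\, \frac{\partial}{\partial\beta}\left[\log(p_{ij})\right]\Big|_{\hat\beta} = 0. \tag{2.22}$$

Since closed form solutions of these equations are unavailable, we use a quasi-Newton approach (Nash, 1990) to obtain estimates. This approach makes use of the function $Q$ and its gradient $g = (\partial Q/\partial\alpha, \partial Q/\partial\beta)'$ to find the estimates through the iterative formula

$$(\hat\alpha, \hat\beta) = (\alpha, \beta) + kBg, \tag{2.23}$$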
where $B$ is a transformation matrix evaluated at $(\alpha, \beta)$, and $k$ the step length. Note that when $B$ in the above iterative equation equals the inverse Hessian matrix of the function $Q$, this is Newton's method.

We implement the E and M steps in the following way to obtain parameter estimates.

Step 0: Specify starting values $\alpha^{(0)} = (\alpha_1^{(0)}, \ldots, \alpha_c^{(0)})$ and $\beta^{(0)} = (\beta_1^{(0)}, \ldots, \beta_{c-1}^{(0)})$ and two tolerances $\epsilon_0$ and $\epsilon$;

Step 1: (E-step) Compute $\hat z_{ij}$ $(i = 1, \ldots, n;\ j = 1, \ldots, c)$ using (2.20). To avoid overflow in the calculation of $\hat z_{ij}$, we divide both the numerator and the denominator in (2.20) by the largest term in the sum in the denominator;

Step 2: (M-step) Find values of $\hat\alpha$ and $\hat\beta$ to solve (2.21) and (2.22) using the quasi-Newton algorithm (Nash, 1990). This algorithm consists of two parts: a matrix updating formula for $B$ and a line search procedure for $k$ in (2.23). Given $B$ and $w$ $(0 < w < 1)$, it chooses $k = 1, w, w^2, \ldots$ successively until

$$[Q(\hat\alpha, \hat\beta) - Q(\alpha, \beta)] / t^T g \ge \epsilon_1,$$

where $t = (\hat\alpha, \hat\beta) - (\alpha, \beta) = -kBg$ and $0 < \epsilon_1 \ll 1$ is given. Given $t$, $B$ is updated by

$$B \leftarrow B + d\,tt^T - [t(B\delta g)^T + (B\delta g)t^T] / t^T\delta g,$$

where $\delta g = g(\hat\alpha, \hat\beta) - g(\alpha, \beta)$ and $d = (1 + \delta g^T B \delta g / t^T\delta g) / t^T\delta g$. Initially, $B$ is set equal to an identity matrix $I$. Reset $B = I$ if any of the following occurs:

(a) $t^T g \ge 0$;
(b) $(\hat\alpha, \hat\beta) = (\alpha, \beta)$;
(c) $t^T\delta g \le 0$.

The stopping criterion for the iterations is

$$\sum_{j=1}^{c}\sum_{l=1}^{k_1} |\hat\alpha_{j,l} - \alpha_{j,l}| + \sum_{j=1}^{c-1}\sum_{l=1}^{k_2} |\hat\beta_{j,l} - \beta_{j,l}| < \epsilon_2,$$

where $\epsilon_2$ is a very small positive number;

Step 3: If at least one of the following conditions is true, set $\alpha^{(0)} = \hat\alpha$ and $\beta^{(0)} = \hat\beta$, and go to Step 1; otherwise, stop.

(1) $\sum_{j=1}^{c}\sum_{l=1}^{k_1} |\hat\alpha_{j,l} - \alpha_{j,l}^{(0)}| > \epsilon$;
(2) $\sum_{j=1}^{c-1}\sum_{l=1}^{k_2} |\hat\beta_{j,l} - \beta_{j,l}^{(0)}| > \epsilon$;
(3) $|l(\hat\alpha, \hat\beta \mid Y, X^{(r)}, X^{(m)}, T) - l(\alpha^{(0)}, \beta^{(0)} \mid Y, X^{(r)}, X^{(m)}, T)| > \epsilon_0$,

where $l(\alpha, \beta \mid Y, X^{(r)}, X^{(m)}, T)$ is the observed log likelihood function.

Note that we could have used other versions of the quasi-Newton algorithm which use different updating schemes for $B$.
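The overflow safeguard in Step 1 can be illustrated with a short sketch. It is an assumption-laden illustration rather than the thesis code: the function name `e_step` is hypothetical, the last component is taken as the reference category of the logit, exposures are assumed strictly positive, and the division by the largest term is carried out on the log scale (subtracting the largest log term), which is algebraically the same operation. Factors common to all components, namely $1/y_i!$ and the denominator of the mixing probabilities, cancel in the ratio (2.20) and are therefore omitted.

```python
import math


def e_step(y, t, x_r, x_m, alpha, beta):
    """Posterior component probabilities z_hat[i][j], in the spirit of (2.20).

    y, t     : counts and (strictly positive) exposures
    x_r, x_m : per-observation covariate vectors for rates and mixing part
    alpha    : c coefficient vectors; beta : c-1 coefficient vectors
    """
    z_hat = []
    for y_i, t_i, xr_i, xm_i in zip(y, t, x_r, x_m):
        # Linear predictors of the mixing part; the reference component has 0.
        eta = [sum(b * x for b, x in zip(b_j, xm_i)) for b_j in beta] + [0.0]
        log_terms = []
        for eta_j, a_j in zip(eta, alpha):
            mean = t_i * math.exp(sum(a * x for a, x in zip(a_j, xr_i)))
            # log of p_ij * Po(y_i | mean), up to terms shared by all j
            log_terms.append(eta_j + y_i * math.log(mean) - mean)
        largest = max(log_terms)
        # Dividing every term by the largest one keeps all weights in [0, 1].
        weights = [math.exp(v - largest) for v in log_terms]
        total = sum(weights)
        z_hat.append([w / total for w in weights])
    return z_hat
```

With a count such as 1000 and a component mean of 10, the unscaled term $10^{1000}$ cannot be represented in double precision, whereas the scaled weights above stay between 0 and 1.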
Dempster, Laird and Rubin (1977) and Wu (1983) discussed the convergence properties of the EM algorithm in a general setting. Since $Q(\alpha, \beta \mid \alpha^{(0)}, \beta^{(0)})$ and its first order partial derivatives are continuous in $\alpha$, $\beta$, $\alpha^{(0)}$ and $\beta^{(0)}$, applying Wu's theorems (1983) in our case, we conclude that the sequence of observed data likelihood values $l(\alpha^{(p)}, \beta^{(p)} \mid Y, X^{(r)}, X^{(m)}, T)$ converges to a local maximum value $l(\alpha^*, \beta^* \mid Y, X^{(r)}, X^{(m)}, T)$, provided that it is not trapped at any saddle point. Furthermore, if $\|\alpha^{(p+1)} - \alpha^{(p)}\| \to 0$, $\|\beta^{(p+1)} - \beta^{(p)}\| \to 0$ and the set of local maxima with a given $l$ value is discrete, then $(\alpha^{(p)}, \beta^{(p)})$ converges to $(\alpha^*, \beta^*)$. Note that for some starting values the stopping criteria in Step 3 above might not be valid. Also, $l(\alpha, \beta \mid Y, X^{(r)}, X^{(m)}, T)$ need not, in general, be globally concave. For these reasons, we need to choose initial values carefully in order to increase the chance that the algorithm converges to the global maximum. We will discuss our starting value approach later.

Note that the above EM algorithm does not directly yield estimates of the standard errors corresponding to the parameter estimates. On the other hand, when the number of components $c$ is known, asymptotic normality of $\sqrt{n}\,((\hat\alpha, \hat\beta) - (\alpha, \beta))$ is easily proved under standard regularity conditions (Lehmann, 1983). To approximate the standard errors, we compute $\sigma(\hat\alpha_{j,l})$ and $\sigma(\hat\beta_{j,l})$ from the diagonal elements of the inverse of the $(c \cdot k_1 + (c-1) \cdot k_2)$-dimensional observed information matrix with $c$ fixed at $\hat c$, which is defined as

$$I(\hat\alpha, \hat\beta \mid Y, X^{(r)}, X^{(m)}, T) = - \begin{pmatrix} \dfrac{\partial^2 l}{\partial\alpha^2} & \dfrac{\partial^2 l}{\partial\alpha\,\partial\beta} \\[2mm] \dfrac{\partial^2 l}{\partial\alpha\,\partial\beta} & \dfrac{\partial^2 l}{\partial\beta^2} \end{pmatrix}.$$

An alternative to the EM algorithm, which maximizes the observed log-likelihood

$$l(\alpha, \beta) = l(\alpha, \beta \mid Y, X^{(r)}, X^{(m)}, T) = \sum_{i=1}^{n} \log\Big\{ \sum_{j=1}^{c} p_{ij}\,\mathrm{Po}(y_i \mid t_i\lambda_{ij}) \Big\},$$

is a quasi-Newton algorithm (e.g., Nash 1990).
Instead of using the E and M steps, we maximize $l(\alpha, \beta)$ by computing successive parameter iterates via the formula

$$(\alpha^{(p+1)}, \beta^{(p+1)}) = (\alpha^{(p)}, \beta^{(p)}) + k B_l g_l,$$

where $B_l$ is the transformation matrix evaluated at $(\alpha^{(p)}, \beta^{(p)})$, $g_l$ the gradient of $l(\alpha, \beta)$ at $(\alpha^{(p)}, \beta^{(p)})$, and $k$ a search step length. Note that the maximization of $l(\alpha, \beta)$ is different from maximizing the expected complete data log-likelihood $Q(\alpha, \beta)$, though the quasi-Newton algorithm is applied in both cases.

In principle either the EM or the quasi-Newton algorithm can be used to produce the maximum likelihood estimates for the mixed Poisson regression model. The EM and quasi-Newton algorithms, however, have complementary strengths. The convergence rate of the EM algorithm is linear, which can be quite slow. In fact adjectives such as 'exceedingly' (McCullagh and Nelder, 1989), 'maddeningly' (Redner and Walker, 1984), and 'painfully' (Haberman, 1977) have been used. As proven by Wu (1983), however, the EM algorithm converges to a stationary point regardless of the initial guess. A quasi-Newton algorithm, on the other hand, often requires rather good initial guesses in order to converge, but the convergence rate in a neighborhood of the solution is much faster than for the EM: it is superlinear for a quasi-Newton algorithm (and quadratic for Newton's method itself).

A sensible combination of these two algorithms is to use the EM until the iterates are in a neighborhood of the solution and to finish up with the quasi-Newton algorithm. This is an obvious algorithm to propose and suggestions similar to this have been made. Bock and Aitkin (1981) suggest performing a few EM steps and then one Newton-Raphson step. Dempster et al. (1977) suggest using a Newton step, while Redner and Walker (1984) suggest switching to a quasi-Newton procedure at some point. Note that, using the quasi-Newton algorithm, we can obtain the approximate standard errors of the estimates as a by-product.
To combine the EM and the quasi-Newton algorithms for our case, we modify the above Step 3 as follows:

Step 3': (a) If at least one of the following conditions is true, set $\alpha^{(0)} = \hat\alpha$ and $\beta^{(0)} = \hat\beta$ and go to Step 1; otherwise, go to (b).

(1) $\sum_{j=1}^{c}\sum_{l=1}^{k_1} |\hat\alpha_{j,l} - \alpha_{j,l}^{(0)}| > \epsilon$;
(2) $\sum_{j=1}^{c-1}\sum_{l=1}^{k_2} |\hat\beta_{j,l} - \beta_{j,l}^{(0)}| > \epsilon$;
(3) $|l(\hat\alpha, \hat\beta \mid Y, X^{(r)}, X^{(m)}, T) - l(\alpha^{(0)}, \beta^{(0)} \mid Y, X^{(r)}, X^{(m)}, T)| > \epsilon_0$.

(b) Maximize the observed likelihood function $l(\alpha, \beta \mid Y, X^{(m)}, X^{(r)}, T)$ using the quasi-Newton algorithm (Nash, 1990) with $\hat\alpha$ and $\hat\beta$ as initial values. Then, stop.

2.5.2 Starting Values

We assume that $c$ is known. The first step of our approach divides the data, $\{y_1, \ldots, y_n\}$, into $c$ groups in terms of its percentiles and fits the data to a $c$-component independent Poisson mixture model without covariates, choosing initial values based on the percentile information. The second step, if necessary, fits the data to a mixed Poisson regression model containing only one covariate in either the Poisson rates or the mixing probabilities, in such a way that the initial values of the parameters included in the previous mixture model equal the estimates of the corresponding parameters from the previously fitted model, and the initial values of the parameters not in the previously fitted model are set to a small value, say, 0.00001. This process is iterated until a complete set of initial values for the mixture model is obtained. The motivation for this ad hoc approach is based on the idea of cluster analysis. At each iteration, we use a different criterion to classify the data. First, the data are classified in terms of its percentiles. Then the data are classified in terms of the independent Poisson mixture model, and subsequently in terms of mixed Poisson regression models. Note that choosing a complete set of initial values for a mixture model step by step in such a way guarantees that the likelihood values will increase in each step.
Also our approach obtains maximum likelihood estimates for a sequence of nested mixture models.

We use an example to explain this approach. Suppose that we need to choose initial values to fit a 3-component mixture model with covariates $x_i^{(r)} = (1, d_i)$ and $x_i^{(m)} = (1, e_i)$, where $d_i$ and $e_i$ are real numbers. First, we find the 16.5, 33.0, 49.5, 66.0 and 82.5 percentiles of the observations $\{y_1, \ldots, y_n\}$, denoted $q_1$ to $q_5$ respectively, and fit the data to a 3-component independent Poisson mixture model ($x_i^{(r)} = x_i^{(m)} = (1)$) with the initial values of $\alpha_{1,1}$, $\alpha_{2,1}$ and $\alpha_{3,1}$ equal to $\log(q_1)$, $\log(q_3)$ and $\log(q_5)$ respectively, and both the initial values of $\beta_{1,1}$ and $\beta_{2,1}$ equal to 0. Note that under this specification and an exponential link function, the initial values of $\lambda_j(x_i^{(r)}, \alpha_j)$ $(j = 1, 2, 3)$ are equal to $q_1$, $q_3$ and $q_5$ with the same mixing probabilities $1/3$. Second, we fit the data to the 3-component Poisson mixture model with $x_i^{(r)} = (1, d_i)$ and $x_i^{(m)} = (1)$ by choosing the initial values of $\alpha_{1,2}$, $\alpha_{2,2}$ and $\alpha_{3,2}$ equal to 0.00001 and the initial values of the other parameters equal to the estimates of the corresponding parameters of the first fitted model. Finally, we choose initial values for the 3-component Poisson mixture model with $x_i^{(r)} = (1, d_i)$ and $x_i^{(m)} = (1, e_i)$ in such a way that $\beta_{1,2}$ and $\beta_{2,2}$ are equal to 0.00001 and the other parameters are equal to the estimates of the corresponding parameters of the second fitted model.

2.6 A Monte Carlo Study

This section consists of two parts. In the first part, we use Monte Carlo methods to examine the performance of the above algorithm. In particular, we wish to verify the reliability of our code, determine the precision of the estimates and investigate some model selection criteria. We use three 3-component mixture models. For each, we analyzed 100 replicates, each with 100 observations.
In the second part, we use Monte Carlo methods to study how the mixed Poisson regression models can be used to analyze some typical problems in practice. We also fit Poisson regression models to the simulated data and compare them with the mixed Poisson regression models.

2.6.1 Performance of the Estimation Algorithm

Two different approaches for choosing initial values are compared in the study. In one, we use the true parameter values of the model generating the observations as initial values in order to determine the performance of the algorithm in the best case. The other chooses initial values of $\alpha_{1,1}$, $\alpha_{2,1}$, $\alpha_{3,1}$, $\beta_{1,1}$ and $\beta_{2,1}$ according to the approach described in Section 2.5.2, and fits the samples to a 3-component independent Poisson mixture model. Then, following the approach of Section 2.5.2, we choose a complete set of initial values for the parameters of the model generating the samples. These two approaches to choosing initial values lead to essentially the same estimates. We describe the details below.

Model 1: A model with Poisson rates depending on one time-dependent covariate, with constant mixing probabilities and $t_i = 1$. For the regression part,

    $x_i^{(r)} = (1, d_i)$,    (2.24)

where $d_i = 0.2$ for $i = 1, \ldots, 10$, $d_i = 0.4$ for $i = 11, \ldots, 20$, etc., and

    $\alpha = (\alpha_1, \alpha_2, \alpha_3)$,    (2.25)

where $\alpha_1 = (2.8, -2.9)$, $\alpha_2 = (2.6, 0.4)$ and $\alpha_3 = (3.6, 0.2)$. For the mixing part, $x_i^{(m)} = (1)$ and $\beta = (\beta_1, \beta_2) = (1.1, 0.6)$. For the Poisson rates, we choose an exponential link function, so that

    $\lambda_1(x_i^{(r)}, \alpha_1) = \exp(2.8 - 2.9 d_i)$,    (2.26)
    $\lambda_2(x_i^{(r)}, \alpha_2) = \exp(2.6 + 0.4 d_i)$,    (2.27)
    $\lambda_3(x_i^{(r)}, \alpha_3) = \exp(3.6 + 0.2 d_i)$,    (2.28)

and the mixing probabilities are $p_1(x_i^{(m)}, \beta) = 0.5156$, $p_2(x_i^{(m)}, \beta) = 0.3127$ and $p_3(x_i^{(m)}, \beta) = 0.1717$.

Model 2: A model with constant Poisson rates and mixing probabilities depending on one time-dependent covariate. That is, for the regression part, $x_i^{(r)} = (1)$ and

    $\alpha = (\alpha_1, \alpha_2, \alpha_3) = (0.4, 3.0, 2.0)$
and for the mixing part,

    $x_i^{(m)} = (1, d_i)$,    (2.29)

where $d_i$ is defined as above, and

    $\beta = (\beta_1, \beta_2)$,    (2.30)

with intercepts $(\beta_{1,1}, \beta_{2,1}) = (2.0, -1.4)$ and slopes $(\beta_{1,2}, \beta_{2,2}) = (-2.0, 1.5)$. The Poisson rates, then, are $\lambda_1(x_i^{(r)}, \alpha_1) = 1.49$, $\lambda_2(x_i^{(r)}, \alpha_2) = 20.08$ and $\lambda_3(x_i^{(r)}, \alpha_3) = 7.39$, and the mixing probabilities are given by

    $p_1(x_i^{(m)}, \beta) = \dfrac{\exp(2.0 - 2.0 d_i)}{\exp(2.0 - 2.0 d_i) + \exp(-1.4 + 1.5 d_i) + 1}$,    (2.31)
    $p_2(x_i^{(m)}, \beta) = \dfrac{\exp(-1.4 + 1.5 d_i)}{\exp(2.0 - 2.0 d_i) + \exp(-1.4 + 1.5 d_i) + 1}$,    (2.32)
    $p_3(x_i^{(m)}, \beta) = \dfrac{1}{\exp(2.0 - 2.0 d_i) + \exp(-1.4 + 1.5 d_i) + 1}$.    (2.33)

Model 3: Both the Poisson rates and the mixing probabilities depend on the covariate $d_i$. For the regression part, $x_i^{(r)}$, $\alpha$ and $\lambda_j(x_i^{(r)}, \alpha_j)$ are given by (2.24), (2.25), (2.26), (2.27) and (2.28) respectively; for the mixing part, $x_i^{(m)}$, $\beta$ and $p_j(x_i^{(m)}, \beta)$ are given by (2.29), (2.30), (2.31), (2.32) and (2.33) respectively.

We chose the above parameter values so that the Poisson rate functions do not cross each other and the ranges of the mixing probabilities for each component do not overlap. We would expect the algorithm to perform well in this case.

We carried out these simulations, each with 100 replicates. In each case, the responses $y_i$ were obtained by first generating a uniform (0,1) random number $u$ and then assigning $y_i \sim \mathrm{Poisson}(\lambda_1(x_i^{(r)}, \alpha_1))$ if $u \le p_1(x_i^{(m)}, \beta)$, $y_i \sim \mathrm{Poisson}(\lambda_2(x_i^{(r)}, \alpha_2))$ if $p_1(x_i^{(m)}, \beta) < u \le p_1(x_i^{(m)}, \beta) + p_2(x_i^{(m)}, \beta)$, or $y_i \sim \mathrm{Poisson}(\lambda_3(x_i^{(r)}, \alpha_3))$ if $u > p_1(x_i^{(m)}, \beta) + p_2(x_i^{(m)}, \beta)$.

The results of the Monte Carlo study are presented in Table 2.1. The table shows that for each parameter the mean of the estimates is very close to the true value, suggesting that the global maximum of the observed likelihood is reached. For Model 1, the sample means are quite close to the true values and the standard deviations are relatively small. Although the Poisson rates of Model 2 are estimated accurately, the estimates of the mixing probabilities are more variable.
This suggests that estimating mixing probability parameters in this model is intrinsically more difficult than estimating Poisson rates. This agrees with observations in the literature (Titterington et al., 1985; McLachlan and Basford, 1988). Estimates of the parameters of Model 3 show the same pattern as in Model 2: estimates of the mixing probability parameters are more variable than those of the Poisson rate parameters. Note, however, that although the estimates of the mixing probability parameters, $\hat\beta$, vary somewhat, the estimated mixing probabilities, $p_j(x_i^{(m)}, \hat\beta)$, are more precise because of the multinomial link function between the parameters and the mixing probabilities.

Our implementation of the algorithm used FORTRAN on a Sun SPARC station 1. Under the stopping criterion $\epsilon = 0.01$, the average number of iterations of the EM algorithm is 4.75 for Model 1, 4.93 for Model 2 and 55.6 for Model 3, and the average time is 6.65, 7.39 and 79.2 seconds respectively.

2.6.2 The Mixed Poisson Regression Models for Some Typical Problems

In a clinical trial it is not uncommon for a treatment to have a significant effect on some subjects but not on others. Thus subjects under treatment may be classified into two groups: responding and non-responding. Models that ignore this distinction are often unable to detect such a treatment effect. For example, in a clinical trial carried out at British Columbia Children's Hospital that investigated the effect of intravenous gammaglobulin (IVIG) on suppression of epileptic seizures, the clinical investigators conducting the study found that some patients responded to the treatment and others did not. Using Poisson regression to analyze the seizure count data, we found that the data are seriously overdispersed. To explore whether the proposed mixed Poisson regression models can be used to describe and analyze such a scenario, we carried out the following Monte Carlo study.
In the study, we used eight 2-component mixed Poisson regression models in which the mixing probabilities are constants $p_1$ and $p_2$, and the Poisson rates are defined by

    $\lambda_1(x_i, \alpha_1, \alpha_2) = \exp(\alpha_1 + \alpha_2 x_i)$ and $\lambda_2(x_i, \alpha_1) = \exp(\alpha_1)$,

where $x_i = 1$ if $i \le 50$ and 0 otherwise, and $i = 1, \ldots, 100$. This model describes the following situation: there are 50 subjects in each of two groups (e.g., treatment and control groups) in a study that records the observed responses for all subjects; the background effects are characterized by the Poisson rate $\exp(\alpha_1)$; $p_1 \times 100\%$ of the subjects in the treatment group respond to the treatment, whose effect is characterized by the Poisson rate $\exp(\alpha_1 + \alpha_2)$ where $\alpha_2 < 0$; and the other $p_2 \times 100\%$ of the subjects in the treatment group do not respond to the treatment, so that their responses are the same as the background effects. The eight models in the study are defined by choosing all combinations of parameter values from the following: $p_1 = 0.6, 0.4$; $\alpha_1 = 1.0, 2.0$; and $\alpha_2 = -0.5, -2.5$. Note that the actual Poisson rates of the background effects, evaluated by $\exp(\alpha_1)$, are 2.7183 and 7.3891 respectively, and the rates of the treatment effects, evaluated by $\exp(\alpha_1 + \alpha_2)$, are 1.6487, 4.4817, 0.2231 and 0.6065 respectively.

We carried out these simulations, each with 200 replicates. The responses $y_i$ were obtained by first generating a uniform (0,1) random number $u$ and then assigning $y_i \sim \mathrm{Poisson}(\lambda_1(x_i, \alpha_1, \alpha_2))$ if $u \le p_1$ and $y_i \sim \mathrm{Poisson}(\lambda_2(x_i, \alpha_1))$ otherwise. Our implementation of the algorithm used FORTRAN on a Sun SPARC station 1.

The results are reported in Table 2.2 and Table 2.3, which summarize the properties of the estimated coefficients. For all eight models the means of $\hat\alpha_1$, $\hat\alpha_2$ and $\hat p_1$ are very close to their true values, and their sample standard deviations are very small compared with the magnitudes of the estimates.
This means that the maximum likelihood estimates are achievable and robust, not only for different choices of background and treatment effects but also for different choices of response rates. Since the means and medians of the parameter estimates are very close and the upper and lower quartiles are roughly symmetric about the means, the parameter estimates follow approximately normal distributions. Indeed, the histograms of the parameter estimates (not given here) show normal distribution patterns.

To investigate the treatment effect, we test the hypothesis $\alpha_2 = 0$ by computing the likelihood ratio test statistic. Note that the chi-squared approximation for the likelihood ratio test statistic may not be justified here because the regularity conditions may not be satisfied on the boundary. We use it in these cases as a guideline. The test results are summarized in Table 2.4, in which the numerator in each cell is the number of times that we reject the hypothesis at the 5% significance level, and the denominator is the total number of replicates. Clearly, when the treatment effect is highly significant ($\alpha_2 = -2.5$), we reject the hypothesis $\alpha_2 = 0$ for almost all replicates at the 5% significance level. This means the likelihood ratio test may work well in these cases. On the other hand, when the treatment effect is small ($\alpha_2 = -0.5$), the likelihood ratio test may not be appropriate, partly because the difference between the background and treatment effects may not be large enough for the test to detect. The baseline effects may not affect the tests significantly, while the mixing probabilities (response rates) have some impact on the tests. Note that when $p_1 = 0.4$, only about 20 subjects out of 100 may have a significant treatment effect.

In order to compare the mixed Poisson regression model with Poisson regression, we fitted the simulated data with the Poisson regression model with covariate $(1, x_i)$.
The results are summarized in Table 2.5. The means of the intercept estimates in the Poisson regression are very close to the true values in these cases, suggesting that the background effects are appropriately estimated. However, the treatment effects are seriously underestimated in these cases because the model cannot distinguish the non-responding subjects from the responding subjects. For example, in the two cases of the low treatment ($\alpha_2 = -0.5$) and low background ($\alpha_1 = 1.0$) effect, the estimate of the treatment effect by the Poisson regression is -0.2668 for the mixing probability $p_1 = 0.6$ and -0.1611 for $p_1 = 0.4$, which are about one half and one third of the true parameter value respectively; in the two cases of the high treatment ($\alpha_2 = -2.5$) and high background ($\alpha_1 = 2.0$) effect, the estimate of the treatment effect by the Poisson regression is -0.8065 for the mixing probability $p_1 = 0.6$ and -0.4536 for $p_1 = 0.4$, which are about one third and less than one fifth of the true value respectively.

We also carried out the test of the hypothesis $\alpha_2 = 0$ using the likelihood ratio test statistic. The test results are given in Table 2.6, in which the numerator in each cell is the number of times that we reject the hypothesis, and the denominator is the total number of tests. For example, in the two cases of the low background ($\alpha_1 = 1.0$) and low treatment ($\alpha_2 = -0.5$) effect, we reject the hypothesis at the 5% significance level 99 times out of 200 for mixing probability $p_1 = 0.6$ and 47 times out of 200 for $p_1 = 0.4$; in the two cases of the high background ($\alpha_1 = 2.0$) and high treatment ($\alpha_2 = -2.5$) effect, we always reject the hypothesis at the 5% significance level for both values of the mixing probability. Note that the Poisson regression is more powerful except in one case, although Table 2.4 and Table 2.6 have a similar pattern.
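The underestimation described above can be checked with a short calculation. The following is a minimal sketch, not the FORTRAN code used in the study; the function names and the use of NumPy are assumptions made for illustration. It relies on the fact that, with an intercept and a single binary covariate, the Poisson regression estimate of the slope is the log ratio of the two group means, so in large samples it approaches $\log(p_1 e^{\alpha_2} + 1 - p_1)$ rather than $\alpha_2$. For $\alpha_2 = -0.5$ this limit is about -0.269 ($p_1 = 0.6$) and -0.171 ($p_1 = 0.4$), and for $\alpha_2 = -2.5$ about -0.800 and -0.458, which is in line with the averages quoted from Table 2.5.

```python
import numpy as np


def marginal_log_rate_ratio(p1, a2):
    """Large-sample Poisson-regression slope when a fraction p1 of treated
    subjects has rate exp(a1 + a2) and the rest keep the rate exp(a1).
    The value does not depend on a1."""
    return float(np.log(p1 * np.exp(a2) + (1.0 - p1)))


def simulate_slope(p1, a1, a2, n_per_group=50, n_rep=200, seed=0):
    """Average slope estimate over n_rep simulated data sets (illustrative)."""
    rng = np.random.default_rng(seed)
    slopes = []
    for _ in range(n_rep):
        # Treated subjects respond with probability p1 (u <= p1).
        respond = rng.uniform(size=n_per_group) <= p1
        rate_treated = np.where(respond, np.exp(a1 + a2), np.exp(a1))
        y_treated = rng.poisson(rate_treated)
        y_control = rng.poisson(np.exp(a1), size=n_per_group)
        # With covariate (1, x_i), x_i binary, the Poisson-regression MLE of
        # the slope is the log ratio of the treated and control sample means.
        slopes.append(np.log(y_treated.mean() / y_control.mean()))
    return float(np.mean(slopes))


if __name__ == "__main__":
    for a2 in (-0.5, -2.5):
        for p1 in (0.6, 0.4):
            print(a2, p1, round(marginal_log_rate_ratio(p1, a2), 4))
```

The simulated average from `simulate_slope` should settle near the closed-form value, which makes clear that the attenuation is a property of the misspecified model and not of sampling noise.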
Using the mixed Poisson regression model, we can classify subjects as responding and non-responding. In the Monte Carlo study, for $x_i = 1$, $y_i$ is identified with group one, generated by the Poisson rate $\lambda_1(x_i, \alpha_1, \alpha_2)$, if the estimated posterior probability of belonging to group one is greater than 0.5, and with the other group, generated by the Poisson rate $\lambda_2(x_i, \alpha_1)$, otherwise. Over the 200 replicates the mean number of subjects in the treatment group classified as responding to the treatment is very close to $50 p_1$, suggesting that the classification criterion works well.

2.7 Implementation Issues

2.7.1 Model Selection

We need to address the following two issues when applying a mixed Poisson regression model: (a) we must determine the number of components $c$, and (b) we must have a method to carry out inference about model parameters. When $c$ is known, inference for the parameters can be based on a likelihood ratio test. In practice, however, this is rarely the case. When $c$ is unknown, the likelihood ratio test is no longer valid for determining $c$ or testing hypotheses about parameter values. This is because the usual regularity conditions do not hold for the likelihood ratio test statistic to have its standard asymptotic null distribution of chi-squared with degrees of freedom equal to the difference between the number of parameters under the null and alternative hypotheses. One of the regularity conditions requires that the parameters in a mixture are identifiable without any restriction. This ensures that the information matrix is non-singular. The main problem here is a lack of identifiability of the parameters under the null hypothesis, even when the class of mixed Poisson regression models is identifiable. To illustrate this, following McLachlan and Basford (1988), consider a 2-component mixture without covariates.
The null hypothesis that there is one underlying population, $H_0: c = 1$, can be approached by testing whether $p_1 = 1$, which is on the boundary of the parameter space, with a consequent breakdown in the standard regularity conditions. Alternatively, we can view $H_0$ as testing whether $\lambda_1 = \lambda_2$, where now the value of $p_1$ is irrelevant. If, for a specified value of $p_1$, regularity conditions held, so that the log likelihood ratio test statistic under $H_0$ were distributed asymptotically as chi-squared, then the null asymptotic distribution of the likelihood ratio test statistic where $p_1$ is unspecified would correspond to the maximum of a set of dependent chi-squared variables. A comprehensive account of the breakdown in regularity conditions has been given by Ghosh and Sen (1985); see also Hartigan (1985a,b), Titterington, Smith and Makov (1985), and McLachlan and Basford (1988). We propose the following methods for model selection.

In general, there are two criteria used for statistical model selection: the principle of parsimony and closeness to the true distribution. The former means that a more parsimonious use of parameters should be pursued so as to raise the accuracy of the estimates of the unknown parameters in a model. On the other hand, closeness to the true model is incompatible with parsimony of parameters. These two criteria form a trade-off: if one pursues one of the criteria, the other must necessarily be sacrificed. The multiple correlation coefficient adjusted for the degrees of freedom may be the most commonly used statistic that incorporates these two incompatible criteria into a single statistic.

Akaike (1973) proposed a more general and more widely applicable statistic that ingeniously incorporates the above two criteria. As it is based on the Kullback-Leibler Information Criterion (KLIC), Akaike's statistic is called the Akaike Information Criterion and is abbreviated as the AIC. The AIC can be derived as follows.
Suppose that the adequacy of a postulated model $F(y \mid \theta)$ to approximate the unknown true distribution $G(y)$ is measured by the KLIC

    $I(G : F(\cdot \mid \theta)) = E_G\left[\log \dfrac{g(Y)}{f(Y \mid \theta)}\right]$,

where $\theta$ is a finite-dimensional vector of unknown parameters; $g$ and $f$ are the density (or probability) functions of $G$ and $F$ respectively; and $E_G(\cdot)$ stands for expectation with respect to the true distribution $G$. We define a pseudo-true model $F(\cdot \mid \theta_0)$ with a parameter value $\theta_0$ such that

    $I(G : F(\cdot \mid \theta_0)) \le I(G : F(\cdot \mid \theta))$

for any possible $\theta$ in the admissible parameter space. The model $F(\cdot \mid \theta_0)$ may be regarded as the most adequate within the family of models $F(y \mid \theta)$ in the sense that the KLIC for $F(y \mid \theta)$ is minimized by $F(y \mid \theta_0)$. Assuming that $I(G : F(\cdot \mid \theta_0)) = O(n^{-1})$, i.e., the pseudo-true model is nearly true, Akaike (1973) derives

    $\mathrm{AIC}(F(\cdot \mid \theta)) = -2 \log f(y \mid \hat\theta) + 2k$

as an almost unbiased estimate of $-2 E_G[\log f(Y \mid \hat\theta)]$, where $\hat\theta$ is the maximum likelihood estimate of $\theta$ based on the observation $y$ and $k$ is the number of unknown parameters, i.e., the dimension of $\theta$. Note that the first term of the AIC measures the goodness of fit of the model to a given set of data, because $f(y \mid \hat\theta)$ is the maximized likelihood function. The second term is interpreted as a penalty that should be paid for increasing the number of parameters. In this sense the AIC may be regarded as an explicit formulation of the so-called principle of parsimony in model building.

Schwarz (1978) proposed another model selection criterion: the Bayesian Information Criterion (BIC). The BIC is defined through a large-sample version of Bayes procedures by placing a prior distribution on the parameter space including all dimensions and models considered. It can be derived as follows. We assume that the observations are generated by a distribution from a family with a density

    $f(y, \theta) = \exp(\theta \cdot x(y) - b(\theta))$,

where $\theta \in \Theta$, a convex subset of the $K$-dimensional Euclidean space, and $x(y)$ is the sufficient $K$-dimensional statistic. The competing models are denoted by sets $m_j \cap \Theta$, where $m_j$ is a $k_j$-dimensional linear submanifold of $K$-dimensional space. Since the a priori distribution need not be known exactly for the asymptotic results, we assume that it is of the form $\sum_j a_j \mu_j$, where $a_j$ is the a priori probability of the $j$th model being the true one, and $\mu_j$, the conditional a priori distribution of $\theta$ given the $j$th model, has a $k_j$-dimensional density that is bounded and locally bounded away from zero throughout $m_j \cap \Theta$.

Finally, we assume a fixed penalty for guessing the wrong model. Under this assumption, the Bayes solution consists of selecting the model that is a posteriori most probable. That is equivalent to choosing the $j$ that maximizes

    $S(X, n, j) = \log \int a_j \exp\{(X \cdot \theta - b(\theta)) n\} \, d\mu_j(\theta)$,

where the integral extends over $m_j \cap \Theta$, and $X$ is the averaged $x$-statistic $(1/n) \sum_i x(y_i)$. For fixed $X$ and $j$, as $n$ tends to infinity, we obtain the asymptotic expansion of $S(X, n, j)$ as

    $S(X, n, j) = n \sup(X \cdot \theta - b(\theta)) - \tfrac{1}{2} k_j \log n + R$,

where the remainder $R = R(X, n, j)$ is bounded in $n$ for fixed $X$ and $j$. Therefore, for a large sample, maximizing $S(X, n, j)$ in $j$ is equivalent to maximizing

    $\mathrm{IC} = \log f_j(y_1, \ldots, y_n) - \tfrac{1}{2} k_j \log n$,

where $f_j(y_1, \ldots, y_n)$ is the maximum likelihood function for model $j$, and $k_j$ is the dimension of the model.

Qualitatively, both the AIC and the BIC give a mathematical formulation of the principle of parsimony in model building. Quantitatively, since the BIC differs from the AIC only in that the dimension is multiplied by $(\log n)/2$, the BIC leans more than the AIC towards lower-dimensional models. For large numbers of observations the two model selection procedures differ markedly from each other.

McLachlan and Basford (1988) discussed the use of the AIC to determine the number of components in a finite mixture model.
Leroux and Puterman (1992) applied the AIC and BIC to select independent Poisson mixture models. We define the AIC and BIC criteria for the mixed Poisson regression model as follows:

• AIC: choose the model for which $l_c(X) - a_c(X)$ is largest;
• BIC: choose the model for which $l_c(X) - \tfrac{1}{2} \log(n) \, a_c(X)$ is largest,

where $l_c(X)$ is the maximum log-likelihood of the mixture with $c$ components and covariate $X$, $a_c(X) = c \, k_1 + (c - 1) k_2$, where $k_1$ and $k_2$ are the dimensions of $\alpha_j$ and $\beta_j$ respectively, and $n$ is the total number of observations. As discussed above, these two criteria do not always select the same model; the BIC tends to select a smaller number of components than the AIC when there are 8 or more observations, since $(\log n)/2 > 1$ for $n \ge 8$.

Using the BIC (AIC), our model selection approach consists of two stages. At the first stage, we determine $c$ to maximize the BIC (AIC) values for the saturated 1-3 (1-4) component mixture models that contain all possible covariates in both the rates and the mixing probabilities. Although we compute both AIC and BIC values in our applications, we recommend using the BIC because the Monte Carlo studies reported below suggest that the BIC is more reliable for model selection. At the second stage, our model selection approach depends on the objectives of the analysis. If our goal is inference about some particular model parameter, we carry out likelihood ratio tests for nested $c$-component mixture models. If the goal is choosing an appropriate model to fit the data, we select the model that maximizes the BIC (AIC) value among the $c$-component mixture models concerned. Since this selection method is heuristic and only gives a guideline in applications, other specific concerns in model selection should be taken into account case by case.
For instance, in some applications the number of components and some parameters in a mixture may be explicitly or implicitly determined by underlying theory, especially when a mixture model is intended as a direct representation of the underlying physical phenomenon. For a housing market in disequilibrium, the market has two phases: supply and demand. If we regard the phase in operation in any given month as the unobservable underlying state, because it may not be clear which phase is in operation, we have a two-component mixture model. Goldfeld and Quandt (1973) discuss such a model and call it a switching regression model.

In the Monte Carlo studies discussed in Section 2.6.1, we computed both AIC and BIC values for all possible mixed 2- to 4-component models. Table 2.7 shows that the AIC and BIC are reliable methods for choosing the correct models. The AIC chose the correct model 96% of the time for Model 1, 87% of the time for Model 2 and 91% of the time for Model 3. When the AIC failed to select the correct model, it always chose a model with too many components, suggesting that the AIC may under-penalize the number of parameters in the mixtures. On the other hand, the BIC always chose the correct models, suggesting that the BIC may not over-penalize the number of parameters. Note that all sample sizes in the Monte Carlo studies are 100. The examples in the next section will exhibit this procedure in practice.

2.7.2 Classification

In classification, the number and composition of groups are not known at the start of the investigation. On occasion, the aim of a classification study may be to enable the subsequent assignment of new objects. For instance, in pattern recognition (Fukunaga, 1972; Duda and Hart, 1973), information about 'patterns' can be obtained from a 'training' set of observations, which may be analyzed by a classification method.
In fitting the mixed Poisson regression models to Poisson-distributed data, we assume that each observation belongs to one of $c$ groups characterized by the Poisson rate functions. One possible use of the mixed Poisson regression model is to classify data on the basis of a probabilistic model rather than an ad hoc clustering technique. Since the quantity defined in (2.20) is the estimated posterior probability that the $i$th observation $y_i$ is generated by the $j$th component distribution $f_j(y_i \mid x_i^{(r)}, \alpha_j)$, this information can be used to classify observations into different groups characterized by the component distributions. For instance, for a $c$-component mixture model we may postulate $c$ different groups defined by the $c$ different forms of Poisson rates, $\lambda_j(x_i^{(r)}, \alpha_j)$ ($j = 1, \ldots, c$), of the model. According to the classification criterion, observation $i$ is identified with the component that maximizes this estimated posterior probability. In our Monte Carlo study this classification criterion works very well. Also, in our applications, the maximum values of this quantity all exceed 0.5. Note that if the parameters of the model were known, this classification criterion would be the optimal or Bayes rule (Anderson, 1984, chapter 6), which minimizes the overall error rate. Such an approach has also been referred to as latent class analysis (Aitkin et al., 1981). We illustrate this approach in the examples below.

2.7.3 Residual Analysis and Goodness-of-fit Test

Once a mixed Poisson regression model has been fitted to a set of observations, it is essential to assess the quality of the fit. For this purpose, we consider Pearson, deviance and likelihood residuals for mixed Poisson regression models, and use them to identify individual poorly fitting observations as well as observations that are influential on the overall fit of the model. We also define a quantity to measure the influence of individual observations on the set of parameter estimates, and use it to identify influential observations.
In addition, we give goodness-of-fit statistics for mixed Poisson regression models.

Definitions of Residuals

For Normal regression models, we can express an observation $y_i$ of the response variable in the form

    $y_i = \hat\mu_i + (y_i - \hat\mu_i)$,

where $\hat\mu_i$ is the maximum likelihood estimate of the mean of $y_i$, i.e., data = fitted value + residual. Residuals are used in many procedures designed to detect various types of disagreement between the data and the assumed model. For example, the scatterplot of residuals versus fitted values that accompanies a linear least squares fit is a standard tool used to diagnose nonconstant variance, curvature and outliers. Diagnostic tools such as this plot have two important uses. First, they may result in the recognition of important phenomena that might otherwise have gone unnoticed. Outlier detection is an example of this, where an outlying case may indicate conditions under which a process works differently, possibly worse or better. Second, the diagnostic methods can be used to suggest appropriate remedial action in the analysis of the model.

For generalized linear models there are at least three types of generalized residuals that are widely used in practice. One is the Pearson residual, defined as

    $r_{Pi} = \dfrac{y_i - \hat\mu_i}{\sqrt{V(\hat\mu_i)}}$,    (2.34)

where $V(\hat\mu_i)$ is the variance function and $\hat\mu_i$ is the maximum likelihood estimate of the $i$th mean under the fitted regression model. These residuals are the signed square roots of the contributions to the Pearson goodness-of-fit statistic $X^2$. For the usual Poisson regression model, the Pearson residual is

    $r_{Pi} = \dfrac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i}}$,

where $\hat\mu_i = \exp(x_i'\hat\alpha)$ and $\hat\alpha$ is the maximum likelihood estimate of the regression parameters; for the usual logistic regression model,

    $r_{Pi} = \dfrac{y_i - m_i \hat p_i}{\sqrt{m_i \hat p_i (1 - \hat p_i)}}$,

where $\mathrm{logit}(\hat p_i) = x_i'\hat\alpha$.
The second type of generalized residual is the deviance residual, defined as

    $r_{Di} = \mathrm{sign}(y_i - \hat\mu_i) \sqrt{2[l(y_i, y_i) - l(\hat\mu_i, y_i)]} = \mathrm{sign}(y_i - \hat\mu_i) \sqrt{d_i}$,    (2.35)

where $l(\mu_i, y_i)$ is the log likelihood function for $y_i$ and $d_i$ is the contribution to the deviance goodness-of-fit statistic $D$. For the usual Poisson regression model,

    $d_i = 2\{y_i \ln(y_i/\hat\mu_i) - y_i + \hat\mu_i\}$,

where $\hat\mu_i = \exp(x_i'\hat\alpha)$; for the usual logistic regression model,

    $d_i = 2 y_i \ln\left(\dfrac{y_i}{\hat\mu_i}\right) + 2(m_i - y_i) \ln\left(\dfrac{m_i - y_i}{m_i - \hat\mu_i}\right)$,

where $\hat\mu_i = m_i \hat p_i$ and $\mathrm{logit}(\hat p_i) = x_i'\hat\alpha$.

The third type of generalized residual is the likelihood residual, which is derived by comparing the deviance obtained on fitting a linear model to the complete set of $n$ cases with the deviance obtained when the same model is fitted to the $n - 1$ cases excluding the $i$th, for $i = 1, \ldots, n$. This gives rise to a quantity that measures the change in the deviance when each case in turn is excluded from the data set. The likelihood residual for the $i$th case is defined as

    $r_{Li} = \mathrm{sign}(y_i - \hat\mu_i) \sqrt{D - D_{(i)}}$,    (2.36)

where $D$ and $D_{(i)}$ are the deviances based on $n$ and $n - 1$ cases respectively. Pregibon (1981) derives a useful one-step approximation to the above exact value,

    $r_{Li} \approx \mathrm{sign}(y_i - \hat\mu_i) \sqrt{h_i r_{Pi}^2 + (1 - h_i) r_{Di}^2}$,

where $h_i$ is the $i$th diagonal element of the $n \times n$ matrix

    $H = W^{1/2} X (X'WX)^{-1} X' W^{1/2}$.    (2.37)

In this expression for $H$, $W$ is the $n \times n$ diagonal matrix of weights used in fitting the linear model and $X$ is the $n \times k$ design matrix. For the Poisson regression model, $W = \mathrm{diag}\{\hat\mu_1, \ldots, \hat\mu_n\}$; for the logistic regression model, the $i$th diagonal element of $W$ is $m_i \hat p_i (1 - \hat p_i)$.

Note that when the response $Y$ follows a Normal distribution, $r_{Li}^2$ follows a $\chi^2$ distribution; when $Y$ follows a non-Normal distribution, $r_{Li}^2$ does not asymptotically follow a $\chi^2$ distribution as $n \to \infty$ because the asymptotic theory does not hold in this case (Williams, 1987).

To standardize the above residuals so that they have approximately unit variance, one needs to account for the inherent variation in the fitted values $\hat\mu_i$.
In general, for any type of residual $R(y_i, \hat\mu_i)$, Pierce and Schafer (1986) show that its variance is approximately given by

    $\mathrm{Var}[R(y_i, \hat\mu_i)] \approx \mathrm{Var}[R(y_i, \mu_i)] - \mathrm{Var}[(\hat\mu_i - \mu_i)/\mathrm{SD}(y_i)]$    (2.38)

as $n \to \infty$. For either the Poisson regression or the logistic regression model,

    $\mathrm{Var}[(\hat\mu_i - \mu_i)/\mathrm{SD}(y_i)] = h_i$,

where $h_i$ is defined by (2.37). Therefore, we can standardize $r_{Pi}$, $r_{Di}$ and $r_{Li}$ by dividing by the factor $\sqrt{1 - h_i}$, because for all three types of residuals the first term on the right side of (2.38) is 1.

Several researchers have compared these three types of residuals (e.g., Pierce and Schafer, 1986; Williams, 1987; McCullagh and Nelder, 1989; and Collett, 1991). The value of $r_{Li}$ is intermediate between $r_{Di}$ and $r_{Pi}$, and it is usually much closer to $r_{Di}$ than to $r_{Pi}$. Both $r_{Di}$ and $r_{Li}$ take account of the shape of the distribution of $Y$, which is ignored by $r_{Pi}$. Both $r_{Di}$ and $r_{Li}$ have distributions that are closer to normality than that of $r_{Pi}$. For outlier detection $r_{Li}$ seems the best choice because of its relevance to the measurement of case influence on likelihood ratio tests.

Several types of residual plots are useful for different diagnostic purposes. For example, an index plot, in which the residuals are displayed against the corresponding observation number or index, is particularly suitable for the detection of outliers. Although a plot of the residuals against the fitted values $\hat\mu_i$ or an explanatory variable is more informative than an index plot for normal regression, it may be uninformative for Poisson regression because, when the mean of the response variable is small, there may be a pattern in the plot whether or not the model is correct. Indeed, if $y_i = 0$, $r_{Pi} = -\hat\mu_i/\sqrt{\hat\mu_i} = -\sqrt{\hat\mu_i}$. This means that for small mean values, the residuals are not approximately normal.

Analogously, we can define the same three types of residuals for mixed Poisson regression models. That is, the Pearson residual, $r_{Pi}$, for mixed Poisson regression models is given by defining $\hat\mu_i$ and $V(\hat\mu_i)$ in (2.34) as

    $\hat\mu_i = t_i \sum_{j=1}^{c} \hat p_{ij} \hat\lambda_{ij}$,    (2.39)

where

    $\hat\lambda_{ij} = \exp(\hat\alpha_j' x_i^{(r)})$,
    $\hat p_{ij} = \dfrac{\exp(\hat\beta_j' x_i^{(m)})}{\sum_{k=1}^{c-1} \exp(\hat\beta_k' x_i^{(m)}) + 1}$ for $j = 1, \ldots, c - 1$, and $\hat p_{ic} = \dfrac{1}{\sum_{k=1}^{c-1} \exp(\hat\beta_k' x_i^{(m)}) + 1}$,

and

    $V(\hat\mu_i) = t_i \sum_{j=1}^{c} \hat p_{ij} \hat\lambda_{ij} + t_i^2 \left\{ \sum_{j=1}^{c} \hat p_{ij} \hat\lambda_{ij}^2 - \left[ \sum_{j=1}^{c} \hat p_{ij} \hat\lambda_{ij} \right]^2 \right\}$.

The deviance residual, $r_{Di}$, for mixed Poisson regression models is given by defining the log likelihood function $l(\hat\mu_i, y_i)$ in (2.35) as

    $l(\hat\mu_i, y_i) = \log[f(y_i; x_i^{(r)}, x_i^{(m)}, t_i, \hat\alpha, \hat\beta)]$,    (2.40)

where $f(y_i; x_i^{(r)}, x_i^{(m)}, t_i, \alpha, \beta)$ is defined in (2.10). Note that $l(y_i, y_i)$ is the same for both generalized linear models and mixed Poisson regression models because we have the following relation:

    $f(y_i \mid x_i^{(r)}, x_i^{(m)}, t_i, \alpha, \beta) = \sum_{j=1}^{c} p_{ij} \, \mathrm{Po}(y_i \mid t_i \lambda_{ij}) \le \mathrm{Po}(y_i \mid y_i)$.

This indicates that generalized linear models and mixed Poisson regression models share the same baseline.

The likelihood residual $r_{Li}$ for mixed Poisson regression models is given by defining $\hat\mu_i$ as specified in (2.39), and $D$ and $D_{(i)}$ as the deviances based on the data sets of $n$ and $n - 1$ cases for the mixed Poisson regression model. Computing the likelihood residuals requires fitting the model $n$ times, each time with good starting values, which are already available in our algorithm. In contrast, linear normal regression requires fitting the model only once.

Note that for the residuals of mixed Poisson regression models, equation (2.38) still holds. Thus, to account for variation in the fitted values $\hat\mu_i$ in these three types of residuals, we need to calculate

    $\mathrm{Var}[(\hat\mu_i - \mu_i)/\mathrm{SD}(y_i)]$.

However, the computation of this variance now becomes too complicated. Fortunately, for large samples, the $\hat\mu_i$ are very close to the $\mu_i$, so that the variation in the fitted values may be negligible.

Example. R&D and Patent

In modeling the patent data from Section 2.8.1 on the relationship between R&D spending and the number of patent applications at the firm level, a 3-component mixed Poisson regression model is found to be satisfactory. The analysis will be given in Section 2.8.1.
Figure 2.1, Figure 2.2 and Figure 2.3 give index plots of the Pearson, deviance and likelihood residuals respectively. Figure 2.1 shows that the Pearson residuals may not be approximately normal. On the other hand, Figure 2.2 and Figure 2.3 show that the deviance and likelihood residuals are very similar to each other. Note that the 6th observation has the largest Pearson residual and the 8th has both the largest deviance and the largest likelihood residuals. These plots suggest that the deviance residuals and the likelihood residuals are likely to perform similarly in terms of the ranking of extreme observations. In fact, the empirical evidence to be presented in the examples in Section 2.8 suggests the same. The numerical studies also indicate that $r_{Di}$ and $r_{Li}$ are more nearly normal than $r_{Pi}$. Since the likelihood residuals are much more difficult to compute than any other type of residuals, we recommend using $r_{Di}$ routinely.

Detection of Outliers and Influential Observations

The residuals obtained after fitting a mixed Poisson regression model to an observed set of data form the basis of diagnostic techniques for assessing model adequacy. Since our primary objective in residual analysis for mixed Poisson regression models is to identify outliers and influential observations, we discuss how these residuals can be used for this purpose.

As with generalized linear models, we define outliers as those observations that are surprisingly distant from the remaining observations in the sample. Such observations may occur as a result of measurement errors, that is, errors in reading, calculating or recording a numerical value; or they may be just an extreme manifestation of natural variability. Since large residuals indicate poorly fitting observations, we use index plots of residuals for detection of outliers, that is, observations that have unusually large residuals.
For example, in the previous example, the 8th observation stands out from the rest as having a relatively large residual in all three index plots of the residuals. The outlying nature of this observation is obvious from these plots.

The influence of a particular observation on the overall fit of a model can be assessed from the change in the value of a summary measure of goodness of fit that results from excluding the observation from the data set. Since $r_{Li}^2$ is the change in deviance on omitting the $i$th observation from the fit, an index plot of these values is the best way of assessing the contribution of each observation to the overall goodness of fit of the model. In the previous example, Figure 2.3 shows that the 8th observation has great impact on the overall fit of the model to the data, as measured by the deviance. Indeed, on omitting the 8th observation, the deviance reduction is $r_{L,8}^2 = (3.392)^2 = 11.506$.

To examine how the $i$th observation affects the set of parameter estimates, we define the following quantity

$$w_i = \frac{1}{p} \Big\{ \big\| (\hat{\alpha} - \hat{\alpha}_{(i)})/se(\hat{\alpha}) \big\| + \big\| (\hat{\beta} - \hat{\beta}_{(i)})/se(\hat{\beta}) \big\| \Big\} \tag{2.41}$$

where $\hat{\alpha}$ and $\hat{\beta}$ are the maximum likelihood parameter estimates of the mixed Poisson regression model based on the complete data set of $n$ cases, and $\hat{\alpha}_{(i)}$ and $\hat{\beta}_{(i)}$ are those based on the data set of $n - 1$ cases excluding the $i$th case; $se(\hat{\alpha})$ and $se(\hat{\beta})$ are the estimated standard errors of the corresponding estimates based on the $n$ cases, and $p = (c - 1)k_1 + c k_2$. Because each term in (2.41) measures a relative change in an individual coefficient, $w_i$ may be interpreted as the average relative coefficient change for a set of estimates. This is a useful quantity for assessing the extent to which the set of parameter estimates is affected by the exclusion of the $i$th observation. Relatively large values of this quantity indicate that the corresponding observations are influential and cause instability in the fitted model. An index plot of $w_i$ is the most useful way of presenting these values.
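The case-deletion quantity in (2.41) can be sketched as follows. The sketch assumes that the norm in (2.41) is a sum of absolute elementwise changes (so that $w_i$ is an average over the $p$ coefficients), and it takes a hypothetical user-supplied `fit` function that returns the stacked parameter estimates and their standard errors; neither detail is confirmed by the text.

```python
def average_relative_change(full, deleted, se):
    """w_i: mean absolute standardized change over all p coefficients.

    full    : estimates from all n cases (alpha and beta stacked in one list)
    deleted : estimates from the n - 1 cases with case i removed
    se      : standard errors of `full`
    """
    if not (len(full) == len(deleted) == len(se)):
        raise ValueError("all three sequences must have length p")
    p = len(full)
    return sum(abs((f - d) / s) for f, d, s in zip(full, deleted, se)) / p

def influence_index(data, fit):
    """Refit with each case left out in turn and return [w_1, ..., w_n].

    `fit` is a hypothetical callable: fit(data) -> (estimates, standard_errors).
    """
    full, se = fit(data)
    w = []
    for i in range(len(data)):
        reduced = data[:i] + data[i + 1:]      # data set without case i
        deleted, _ = fit(reduced)
        w.append(average_relative_change(full, deleted, se))
    return w
```

In practice each refit would be started from the full-data estimates, as noted above for the likelihood residuals.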
For the previous example, Figure 2.4 is the index plot of $w_i$. Clearly, the plot shows that the 8th, 12th, 47th, 64th, 65th and 66th observations are influential, so that omitting each of them from the data has a great effect on the set of parameter estimates. For example, if the 12th observation is excluded from the data set, each parameter estimate changes by about 33% on average. Although the 47th observation has a relatively large value of $w_i$, it has a relatively small value of both the likelihood residual and the deviance residual. This indicates that an influential observation need not be an outlier. In particular, an influential observation that is not an outlier will occur when the observation distorts the form of the fitted model to such an extent that the observation itself has a small residual value. Note that in this example, the 8th observation is both an influential observation and an outlier. On the other hand, the first observation appears to be an outlier but has a rather small value of $w_i$.

Goodness-of-fit Statistics

After fitting a mixed Poisson regression model to a set of data, it is natural to inquire about the extent to which the fitted values of the response variable under the model compare with the observed values. If the agreement between the observations and the corresponding fitted values is good, the model may be acceptable. If not, the current form of the model will certainly not be acceptable and the model will need to be revised. This aspect of the adequacy of a model is widely referred to as goodness of fit.

There are at least two widely used goodness-of-fit statistics which can be used here. One is the deviance statistic, $D$, defined as

$$D = \sum_{i=1}^{n} r_{Di}^2,$$

where the $r_{Di}$ are the deviance residuals for the mixed Poisson regression model; the other is Pearson's statistic, $X^2$, defined as

$$X^2 = \sum_{i=1}^{n} r_{Pi}^2,$$

where the $r_{Pi}$ are the Pearson residuals for the mixed Poisson regression model.
In order to evaluate the extent to which an adopted mixed Poisson regression model fits a set of data, the distribution of either the deviance or the Pearson statistic, under the assumption that the model is correct, is needed. For normal linear models, the deviance and Pearson's $X^2$ statistics are distributed as $\chi^2$ with $(n - p)$ degrees of freedom, where $n$ is the number of observations and $p$ is the number of unknown parameters in the model. In general, many studies have shown that the distribution of the Pearson statistic is often much more nearly chi-squared than that of the deviance (e.g., Larntz, 1978). For this reason, we use the Pearson statistic for overall goodness-of-fit tests for the mixed Poisson regression models.

2.8 Applications

2.8.1 R&D and Patents

Economists studying technological innovation often use patent applications as an indicator of inventive activity. The nature of much industrial R&D activity suggests that it is natural to assume that patent counts follow a Poisson distribution: patent applications can be thought of as measuring the number of successful outcomes among a large (but unobserved) number of projects within a firm's R&D lab, each of which has a small probability of success. Econometricians have accordingly examined the relationship between R&D and patenting by using Poisson regression to estimate a “production function for patents” of the form $E(y_i) = \exp(\alpha' x_i)$, where $y_i$ is the number of patents applied for by firm $i$ and $x_i$ is a vector of explanatory variables, including R&D spending. (There are many problems with using patent counts as indicators of innovative output, but they remain the only comprehensive, objective, and readily available measure of inventive activity. See Griliches (1990).)

In economics there are two important characteristics associated with a production function $f(x)$: returns to scale and elasticity.
The former identifies how output responds to a proportionate, scaled expansion in inputs. If a proportionate increase in all inputs increases output by the same proportion, the production function is said to exhibit constant returns to scale. This can be mathematically described by $t f(x) = f(tx)$, where $x$ is a vector representing inputs and $t$ is a positive real number. Similarly, if a more (less) than proportionate increase in output is obtained, there are increasing (decreasing) returns to scale. These can be mathematically described by $t f(x) < f(tx)$ and $t f(x) > f(tx)$ respectively. An input ($x_j$) elasticity of output is a measure of the responsiveness of output to that input, defined as the percent change in output divided by the percent change in the input. This is given by

$$\frac{\Delta f / f}{\Delta x_j / x_j} = \frac{\partial (\log f)}{\partial (\log x_j)}.$$

Note that for the above production function for patents, for instance, the R&D elasticity of patent applications is independent of the units in which patents are measured, and is thus a more meaningful measure of the responsiveness of patent applications to R&D spending. The R&D elasticity of patent applications simply measures the percentage change in patent applications when R&D spending changes by a small percentage.

The parameters of the above model have a direct and interesting economic interpretation: they provide estimates of returns to scale in performing R&D. However, efforts to test for returns to scale using these data have been hampered by the fact that they are typically quite severely overdispersed. Hausman, Hall and Griliches (1984), Bound et al. (1984), and Hall, Griliches, and Hausman (1986) estimated variations of the Poisson model which account for the overdispersion by including an additive random firm effect in the patent equation. The random firm effect can be thought of as capturing unobserved firm-specific factors affecting R&D productivity.
As is well known, if the additive random effect is distributed gamma then the unconditional distribution of the response variable is negative binomial. If the distributional assumption is incorrect, inconsistent parameter estimates will be obtained. These studies also present results from the quasi-generalized pseudo maximum likelihood (QGPML) estimators proposed by Gourieroux, Monfort, and Trognon (1984), which allow the random firm effect to be drawn from an unspecified distribution. Though results obtained using the Poisson, negative binomial, and QGPML estimators were qualitatively the same, the estimated coefficient on R&D varied substantially. The authors attributed this problem to “instability” in the R&D-patents relationship over time and across firms.

We treat the unobserved heterogeneity in these data quite differently, and show how the overdispersion can be accounted for in an alternative and perhaps more interesting way by using finite rather than continuous mixtures. Rather than assume that all firms have common regression coefficients and a random intercept, we allow both the intercept and the coefficient on R&D to vary from firm to firm, but in a restricted way. We postulate a discrete Poisson mixture model in which firms can be in a finite number of different states defined by different degrees of R&D productivity, for example “high”, “medium”, and “low”. In this model the coefficients vary from state to state, rather than from firm to firm. One way to motivate this model is to assume that all firms have access to the same technological opportunities, but have different unobservable innovative capabilities (e.g. “Type A” or “Type B” or “Type C” organizational structures). Alternatively, we could assume that all firms have the same innovative capabilities, but have differential access to technological opportunities: some firms are working in “hot” areas of the underlying science while others are not.
The data are patent applications and R&D spending in 1976 for 70 pharmaceutical and biomedical companies, taken from the NBER R&D Masterfile (see Hall (1988) for documentation of this data set). The data are displayed in Figure 2.5, where the horizontal axis is the logarithm of R&D. Formal test results in Table 2.8 confirm the visual impression that the data are overdispersed: all of the tests strongly reject the null hypothesis of no overdispersion. As in the standard model used in previous studies, the dependent variable is a count of patent applications, and the explanatory variables are $\log(\text{R\&D})$ and a quadratic term $(\log(\text{R\&D}))^2$ included to capture non-linearities in the relationship. The coefficients on these variables provide a direct estimate of the elasticity of innovative output with respect to R&D spending, and thus of the extent to which there are scale economies in performing R&D. If the elasticity is greater than one then an increase in R&D spending would generate a more than proportionate increase in patents. The coefficient on the quadratic term is particularly interesting since it captures the extent to which economies of scale vary with the size of a firm's R&D effort, a question which has been hotly (though inconclusively) debated by economists for many years.

To apply our mixture model, we assume that (1) the total number of patents applied for by firm $i$ is associated with covariates $(x_i^{(m)}, x_i^{(r)})$, where $t_i = 1$ (one year), $x_i^{(m)} = (1)$ and $x_i^{(r)} = (1, \log(\text{R\&D}_i), (\log(\text{R\&D}_i))^2)$, where $\text{R\&D}_i$ is the R&D expenditure of firm $i$ in 1976. Note that $x_i^{(m)} = (1)$ corresponds to the assumption of constant mixing probabilities. Note also that the mixing probability here may be interpreted as the likelihood that a firm stays in a particular underlying state during a one-year period.
Since R&D expenditure is usually calculated at the end of a year, for one-year patent data it is legitimate to assume that the mixing probabilities are independent of R&D covariates; (2) patent counts of different firms are independent; (3) each patent count follows a mixed Poisson distribution with Poisson rates defined by exponential link functions

$$\lambda_j(x_i^{(r)}, \alpha_j) = \exp[\alpha_{j0} + \alpha_{j1} \log(\text{R\&D}_i) + \alpha_{j2} (\log(\text{R\&D}_i))^2],$$

where $i = 1, 2, \ldots, 70$, $j = 1, 2, \ldots, c$, and $c$ is the number of components in the mixture.

The maximum likelihood estimates for the saturated 1-4 component mixture models and several constrained 3-component mixture models applied to the data are given in Table 2.9. Among the four saturated mixture models, both AIC and BIC lead to the choice of 3-component mixtures. Within the class of 3-component mixture models, the saturated 3-component mixture model is considered the most appropriate one to fit the data in terms of BIC (AIC).

After fitting the 3-component mixed Poisson regression model to the data, the Pearson goodness-of-fit statistic $X^2$ is 64.53 with 59 degrees of freedom. This value does not exceed the upper 95% critical point of the $\chi^2$-distribution on 59 degrees of freedom, $\chi^2_{59, 0.95} = 77.93$, suggesting that the mixed Poisson regression model fits adequately. Moreover, as discussed in Section 2.7.3, the residual analysis shows that there are a few influential observations and outliers. For example, the 12th observation is an influential observation corresponding to the company which spent $33.8 million on R&D for 59 patent applications. On omitting the 12th observation, the new parameter estimates become $\hat{\alpha}_1 = (13.7407, 7.7893, -0.8036)$, $\hat{\alpha}_2 = (0.4344, 1.8847, -0.2071)$, $\hat{\alpha}_3 = (0.7056, 0.5177, 0.0744)$, $\hat{p}_1 = 0.1653$, $\hat{p}_2 = 0.1929$, and $\hat{p}_3 = 0.6418$. Note that the changes in the parameter estimates of the first component are relatively large, while the changes in the other parameter estimates are not significant.
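The goodness-of-fit comparison above (a Pearson statistic set against the upper 95% point of a chi-squared distribution) can be reproduced approximately without statistical tables. The sketch below is an illustration only: it uses the Wilson-Hilferty approximation to the chi-squared quantile, which is a substitute technique and not the computation used in the thesis, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def chi2_upper_point(df, level=0.95):
    """Wilson-Hilferty approximation to the `level` quantile of chi-squared(df)."""
    z = NormalDist().inv_cdf(level)       # standard normal quantile
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3

def pearson_gof(pearson_residuals, n_params, level=0.95):
    """Return (X^2, degrees of freedom, critical point, lack-of-fit flag)."""
    x2 = sum(r * r for r in pearson_residuals)
    df = len(pearson_residuals) - n_params    # n - p
    crit = chi2_upper_point(df, level)
    return x2, df, crit, x2 > crit
```

For 59 degrees of freedom the approximation agrees with the tabulated 77.93 to about two decimal places, so $X^2 = 64.53$ falls well below the critical point.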
The fitted mixed Poisson regression model suggests that patent counts are generated by three underlying Poisson distributions with rates defined by three different R&D productivity functions, respectively,

$$\lambda_1(x_i^{(r)}, \hat{\alpha}_1) = \exp[-16.223 + 9.309 \log(\text{R\&D}_i) - 1.014 (\log(\text{R\&D}_i))^2],$$

$$\lambda_2(x_i^{(r)}, \hat{\alpha}_2) = \exp[0.590 + 1.780 \log(\text{R\&D}_i) - 0.196 (\log(\text{R\&D}_i))^2]$$

and

$$\lambda_3(x_i^{(r)}, \hat{\alpha}_3) = \exp[0.703 + 0.518 \log(\text{R\&D}_i) + 0.076 (\log(\text{R\&D}_i))^2].$$

Note that since the above three rate functions are conditional on the three underlying states respectively, the coefficients in these functions should be interpreted as effects on the conditional mean. For instance, $\hat{\alpha}_{11} = 9.309$ is the $\log(\text{R\&D})$ effect on patents when a firm is in state one.

The three dotted lines in Figure 2.6 represent the curves of the above functions respectively. The implied R&D elasticities (derivatives with respect to $\log(\text{R\&D})$) are $9.309 - 2.028 \log(\text{R\&D})$, $1.780 - 0.392 \log(\text{R\&D})$ and $0.518 + 0.152 \log(\text{R\&D})$, suggesting that returns to scale differ across components.

Note that when we fit the data by the usual Poisson regression, which fails to account for the excess variation, the quadratic term is not significant. (The difference in the log likelihood between the usual Poisson regression models with and without the quadratic term is 0.45, and $\chi^2_{1, 0.99} = 6.634 > 2 \times 0.45 = 0.9$.) If this were the correct model, we would conclude that economies of scale do not vary significantly with the size of the firm's R&D program. The mixture model estimated above indicates, however, that the quadratic term is significant in terms of the likelihood ratio test. (The difference in the log likelihood between the 3-component mixture models with and without the quadratic term is 6.56, and $\chi^2_{3, 0.99} = 11.345 < 2 \times 6.56 = 13.12$.) This result exemplifies that overdispersion in the usual Poisson regression may result in standard error estimates that are too large, and subsequently in too many terms being rejected from the usual Poisson regression.
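The implied elasticities quoted above follow from differentiating the exponent of each rate function with respect to $\log(\text{R\&D})$. A minimal Python sketch of that arithmetic, using the fitted coefficients from the text (the dictionary and function names are illustrative, not part of the thesis code):

```python
import math

# Fitted coefficients quoted in the text, per component:
# (intercept, coefficient on log(R&D), coefficient on (log(R&D))^2)
COMPONENTS = {
    1: (-16.223, 9.309, -1.014),
    2: (0.590, 1.780, -0.196),
    3: (0.703, 0.518, 0.076),
}

def rate(alpha, log_rd):
    """Poisson rate exp[a0 + a1*log(R&D) + a2*(log(R&D))^2]."""
    a0, a1, a2 = alpha
    return math.exp(a0 + a1 * log_rd + a2 * log_rd ** 2)

def elasticity(alpha, log_rd):
    """d log(rate) / d log(R&D) = a1 + 2*a2*log(R&D)."""
    _, a1, a2 = alpha
    return a1 + 2.0 * a2 * log_rd
```

For example, `elasticity(COMPONENTS[1], 1.0)` evaluates $9.309 - 2.028 \times 1$; the elasticity of the third component increases with R&D while those of the first two decrease.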
If we postulate three different states in terms of the above three different forms of the Poisson rates, a firm has probability 0.1819 of being in state 1, 0.1773 of being in state 2 and 0.6408 of being in state 3. Based on the estimated posterior probabilities defined in (2.20), we identify each firm with one of the three states. Figure 2.6 displays this classification, in which a firm is identified with a state if the estimated posterior probability of the firm's being in that state has the largest value. The maximum estimated posterior probability always exceeds 0.5 in this application. Note that those observations marked as “1” form a group characterized by $\lambda_1(x_i^{(r)}, \hat{\alpha}_1)$, those marked as “2” by $\lambda_2(x_i^{(r)}, \hat{\alpha}_2)$, and those marked as “3” by $\lambda_3(x_i^{(r)}, \hat{\alpha}_3)$.

For the purpose of comparison, we fit the data to three widely used quasi-likelihood models. The first assumes a variance function $\mathrm{Var}(Y_i) = \sigma^2 E(Y_i)$, and the second $\mathrm{Var}(Y_i) = E(Y_i) + \sigma^2 E(Y_i)^2$. Note that the negative binomial model has such a mean-variance relationship. Further, the parameter estimates under the negative binomial model may not be significantly different from those obtained by the quasi-likelihood, though the former may be more efficient (Lawless, 1987). The third assumes that $\lambda_i$ is a random variable, and that $\log(\lambda_i) = x_i'\beta + \epsilon_i$, where $x_i$ are covariates, $\beta$ are unknown regression parameters, and $\epsilon_i$ are random error terms having mean 0 and a constant unknown variance $\sigma^2$. The unknown parameter $\sigma^2$ in these models is called the unexplained variance. Estimation for these models is discussed by McCullagh and Nelder (1989) and Breslow (1984).

The parameter estimates and standard errors are given in Table 2.10. Computing the t-statistic (estimated coefficient/standard error) and comparing the mixed Poisson regression model with the quasi-likelihood models, we find that all three quasi-likelihood models may underestimate the effects of R&D innovation.
For example, the absolute values of the t-statistics of the estimated coefficient for $(\log(\text{R\&D}))^2$ are 0.398, 3.418 and 1.554 for the quasi-likelihood models I, II and III respectively, while the values for the same coefficient in the mixed Poisson regression model are 4.955, 4.560 and 4.314 for the first, second and third components.

In summary, we have applied the mixed Poisson regression model to analyze the relationship between technological innovation and R&D at firm level. The patent data are well fitted by the 3-component mixed Poisson regression model with constant mixing probabilities and Poisson rates defined by quadratic functions in $\log(\text{R\&D})$. This shows that both covariates $\log(\text{R\&D})$ and $(\log(\text{R\&D}))^2$ are significant predictors of the number of patent applications. On the other hand, the covariate $(\log(\text{R\&D}))^2$ is not significant in the usual Poisson regression model, which may not be justifiable here because of overdispersion. The goodness-of-fit test shows that there is no significant evidence of lack of fit in the mixed Poisson regression model. In addition, the residual analysis identifies outliers and influential observations in terms of the fitted model. According to the fitted model, the firms are classified into three categories, each characterized by a Poisson rate function. Note that the significance of the parameter estimates of the mixed Poisson regression model is quite different from that obtained by the quasi-likelihood methods for dealing with extra-Poisson variation.

2.8.2 Seizure Frequency in a Clinical Trial

The timing and circumstances of epileptic seizure recurrence are a source of apprehension for the patients and a mystery for the neurologists.
Thus there have been many clinical studies of different treatments for reducing the occurrence of epileptic seizures, and accordingly various methods have been used to assess a reduction in seizure frequency (e.g., Wilensky et al., 1981; Hopkins et al., 1985; Milton et al., 1987; Gram, 1988; and Albert, 1991). Some of these methods, like the percentage of patients “improved,” “unchanged,” or “worse,” are rather subjective. Methods of this kind cannot be used to form anything other than an impressionistic opinion of the value of a treatment unless formal criteria for evaluating the significance of changes in the various parameters are first defined. Others are designed for particular situations. For instance, Hopkins et al. (1985) first proposed a two-state Markov mixture model to describe apparent clustering among daily seizure counts for epileptics. They assume that in each state the number of seizures is generated by a Poisson distribution, and that transitions between the two states are governed by a Markov chain. Albert (1991) and Le, Leroux and Puterman (1993) presented two different algorithms to find the estimates of the parameters in the model. None of these methods directly includes treatment effects as covariates in model building, so the treatment effects may be difficult to assess.

In this subsection we analyze data from a clinical trial carried out at British Columbia's Children's Hospital which investigated the effect of intravenous gammaglobulin (IVIG) on suppression of epileptic seizures. Subjects were randomized into two groups. After a four-week (28 days) baseline observation, the treatment group received monthly infusions of IVIG while the control group received “best available therapy”. The primary end point of the trial was daily seizure frequency.
The principal data source was a daily seizure diary which contained the number of hours of parental observation and the number of seizures of each type during the observation period.

We use Poisson regression to analyze a series of myoclonic seizure counts from a single subject receiving IVIG. The data extracted from the seizure diary were the daily counts, $y_i$, and the hours of parental observation, $t_i$, for the $i$th day. Figure 2.7 gives the time plot of daily seizure counts. As covariates we use treatment ($x_{i1}$), trend ($x_{i2}$) and treatment-trend interaction ($x_{i3}$), where

$$x_{i1} = \begin{cases} 1 & \text{if there is a treatment } (i > 28) \\ 0 & \text{otherwise } (i \le 28), \end{cases} \tag{2.42}$$

$$x_{i2} = \log(i) \tag{2.43}$$

and

$$x_{i3} = x_{i1} x_{i2}. \tag{2.44}$$

The second column in Table 2.11 reports results of fitting the data using the usual Poisson regression with the covariates defined in (2.42), (2.43) and (2.44), and a log link function. The data are overdispersed with respect to the Poisson distribution, since each of the overdispersion tests is highly significant ($P_a = 16.18$, $P_b = 16.22$ and $P_c = 36.33$). This suggests the inadequacy of the usual Poisson regression model.

We apply the mixture model assuming that (1) each daily observed seizure count, $y_i$, is associated with time exposure (observation hours), $t_i$, and covariates $x_i^{(m)} = (1)$ and $x_i^{(r)} = (1, x_{i1}, x_{i2}, x_{i3})$, where $x_{i1}$, $x_{i2}$ and $x_{i3}$ are defined in (2.42), (2.43) and (2.44). Note that we assume constant mixing probabilities here because it is believed that the likelihood of being in a particular state is constant for a patient; (2) daily seizure counts are independent and follow a mixed Poisson regression model with means equal to the product of observation time ($t_i$) and the Poisson rate (number of seizures per hour). Rates are specified by exponential link functions

$$\lambda_j(x_i^{(r)}, \alpha_j) = \exp(\alpha_{j0} + \alpha_{j1} x_{i1} + \alpha_{j2} x_{i2} + \alpha_{j3} x_{i3}),$$

where $i = 1, \ldots, 140$, $j = 1, \ldots, c$, and $c$ is the number of components in the mixture model.
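The covariates in (2.42)-(2.44) and the resulting component mean can be written out directly. The following Python sketch is illustrative only; the function names are hypothetical and the coefficient vector is supplied by the caller.

```python
import math

BASELINE_DAYS = 28   # treatment starts after the 28-day baseline

def seizure_covariates(day):
    """Rate covariates (1, x_i1, x_i2, x_i3) for day i = 1, 2, ..."""
    treatment = 1.0 if day > BASELINE_DAYS else 0.0   # (2.42)
    trend = math.log(day)                             # (2.43)
    interaction = treatment * trend                   # (2.44)
    return (1.0, treatment, trend, interaction)

def component_mean(day, hours, alpha):
    """Expected count in one component: observation hours times hourly rate."""
    x = seizure_covariates(day)
    hourly_rate = math.exp(sum(a * xi for a, xi in zip(alpha, x)))
    return hours * hourly_rate
```

Because the interaction term is zero during the baseline period, the trend coefficient alone governs the first 28 days, and the sum of the trend and interaction coefficients governs the treatment period.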
This model allows the treatment, trend and treatment-trend interaction to affect the Poisson rate, and the regression coefficients to vary across components.

Table 2.12 provides the results of fitting these models. Among the three saturated mixture models, both AIC and BIC suggest a 2-component model. Within the class of two-component models, we can carry out likelihood ratio tests for the treatment, trend and interaction effects respectively. For example, to test the interaction effect, i.e., $H_0: \alpha_{j3} = 0$ for $j = 1, 2$, we find that the likelihood ratio statistic equals $2 \times (426.21 - 376.18) = 100.06 > \chi^2_{2, 0.99} = 9.21$. This suggests a highly significant treatment-trend interaction. The model we finally select is the 2-component saturated mixture.

After fitting the 2-component mixed Poisson regression model to the data, the Pearson goodness-of-fit statistic $X^2$ is 134.0 with 131 degrees of freedom. This value does not exceed the upper 95% critical point of the $\chi^2$-distribution on 131 degrees of freedom, $\chi^2_{131, 0.95} = 158.7$, suggesting that the mixed Poisson regression model fits adequately.

Furthermore, the Pearson, deviance and likelihood residuals from the fitted model are calculated and displayed in Figure 2.10, Figure 2.11 and Figure 2.12 respectively. Figure 2.10 shows that the Pearson residuals may not be approximately normal. On the other hand, both Figure 2.11 and Figure 2.12 show that the deviance residuals and likelihood residuals are very similar to each other, and that the 61st observation is far distant from the remaining observations in both plots, suggesting that it may be an outlier. On omitting this observation, the deviance reduction is $r_{L,61}^2 = (-3.14)^2 = 9.86$. This means that the 61st observation has great impact on the overall fit of the mixed Poisson regression model to the data.
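The likelihood ratio test for the interaction effect described above reduces to simple arithmetic, because the chi-squared distribution with 2 degrees of freedom has the closed-form upper tail $\exp(-x/2)$. The sketch below assumes that 376.18 and 426.21 are the negatives of the maximized log likelihoods of the saturated and restricted 2-component models; the function names are hypothetical.

```python
import math

def lr_statistic(loglik_full, loglik_reduced):
    """Likelihood ratio statistic: twice the gain in log likelihood."""
    return 2.0 * (loglik_full - loglik_reduced)

def chi2_sf_2df(x):
    """Upper tail probability of chi-squared with 2 df (closed form)."""
    return math.exp(-x / 2.0)

def chi2_crit_2df(alpha):
    """Point exceeded with probability alpha under chi-squared with 2 df."""
    return -2.0 * math.log(alpha)

# Assumed signs: log likelihoods are negative, the full model being larger.
stat = lr_statistic(-376.18, -426.21)
crit = chi2_crit_2df(0.01)
significant = stat > crit
```

Two degrees of freedom apply here because the null hypothesis removes one interaction coefficient from each of the two components.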
For detection of influential observations, the average relative coefficient changes $w_i$ are calculated and displayed in Figure 2.13. Clearly, the 6th observation is the only influential observation suggested by the plot. On omitting the 6th observation, the average relative coefficient change for each parameter estimate is about 20%, and the new parameter estimates become $\hat{\alpha}_1 = (2.2701, 1.8800, -0.2006, -0.6373)$, $\hat{\alpha}_2 = (2.0045, 7.4989, -0.2444, -2.3026)$, $\hat{p}_1 = 0.2740$ and $\hat{p}_2 = 0.7260$. Note that the changes in the parameter estimates of the first component are relatively larger than those in the other parameter estimates. After excluding the 6th observation, we reanalyze the data by fitting the Poisson regression and the 2-3 component mixed Poisson regression models, and select the same mixed Poisson regression model with the above new parameter estimates. In fact, the values of AIC for the Poisson regression and the saturated 2-3 component mixed Poisson regression models are -576.4, -379.7 and -383.8 respectively, and the values of BIC are -582.3, -392.9 and -404.4 respectively. Further, the likelihood ratio tests lead to the choice of the saturated 2-component mixed Poisson regression model. Hence, residual analysis identifies possible outliers and influential observations in terms of the mixed Poisson regression model.

We now interpret the fitted model. In it the mixing probabilities equal 0.2761 and 0.7239 and the respective rates are

$$\lambda_1(x_i^{(r)}, \hat{\alpha}_1) = \exp[2.8450 + 1.3020 x_{i1} - 0.4063 x_{i2} - 0.4309 x_{i3}]$$

and

$$\lambda_2(x_i^{(r)}, \hat{\alpha}_2) = \exp[2.0704 + 7.4318 x_{i1} - 0.2707 x_{i2} - 2.2762 x_{i3}].$$

Note that since the above two rate functions are conditional on the two underlying states respectively, the coefficients in these functions should be interpreted as effects on the conditional mean. For example, $\hat{\alpha}_{11} = 1.3020$ is the treatment effect when the patient is in state one, while $\hat{\alpha}_{21} = 7.4318$ is the treatment effect in state two.
Figure 2.8 provides the estimated hourly seizure rate corresponding to each component (the solid line is the rate for component one and the dotted line for component two) and the observed hourly seizure rate $y_i/t_i$. Observe that with the treatment both hourly rates are lower and the trend is less steep than at baseline, suggesting that this patient benefited from IVIG therapy. Figure 2.9 depicts the estimated mean $E(Y_i)$ (the solid line) and variance $\mathrm{Var}(Y_i)$ (the dotted line) for the fitted model obtained through (2.11) and (2.12). Observe that with the treatment the variance becomes much closer to the mean, suggesting that the patient's situation becomes more stable. Further, the variance exceeds the mean throughout, with the greatest difference in the baseline period. The “bumpiness” in these quantities is due to the non-constant exposure. Note also that there is no obvious parametric relationship between the estimated mean and variance.

We note that the clinical investigators conducting this study found the two-component model plausible. They said that they have observed subjects to have “bad days” and “good days” with no obvious explanation of this effect. We believe our model captures this aspect of the data and by doing so provides a clinically meaningful explanation of overdispersion. Note that Figure 2.8 also classifies the days in terms of the estimated posterior probabilities. Those observations marked as “1” form a group which is characterized by the Poisson rate function $\lambda_1(x_i^{(r)}, \hat{\alpha}_1)$, while those marked as “2” form another group which is characterized by $\lambda_2(x_i^{(r)}, \hat{\alpha}_2)$. We may regard $\lambda_1(x_i^{(r)}, \hat{\alpha}_1)$ as the Poisson regression specification for group one, and $\lambda_2(x_i^{(r)}, \hat{\alpha}_2)$ as that for group two. In this sense, our model consists of two Poisson regression models, describing the seizure frequency rate on “bad days” and “good days” respectively.
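The classification of days by estimated posterior probability can be sketched as follows. Equation (2.20) is not reproduced in this section, so the sketch assumes the usual Bayes-rule form of the posterior, in which each component's mixing probability is weighted by its Poisson probability of the observed count; the function names are hypothetical.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu > 0."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def posterior_probabilities(y, t, p, lam):
    """Posterior probability of each component given count y and exposure t."""
    joint = [pj * poisson_pmf(y, t * lj) for pj, lj in zip(p, lam)]
    total = sum(joint)
    return [v / total for v in joint]

def classify(y, t, p, lam):
    """1-based label of the component with the largest posterior probability."""
    post = posterior_probabilities(y, t, p, lam)
    return max(range(len(post)), key=post.__getitem__) + 1
```

With two well-separated rates, a low count is assigned to the low-rate component and a high count to the high-rate component, which is the pattern marked “1” and “2” in the classification plot.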
For the purpose of comparison, we also fit the data to the three quasi-likelihood models defined in Section 2.8.1. Table 2.11 reports parameter estimates for these methods. From Table 2.11 we find that using different methods for overdispersion may lead to different parameter estimates, different standard errors, or both. For instance, the coefficient estimate for the treatment effect is 4.132 under model I, 4.656 under model II, 3.757 under model III, and 1.3020 for component 1 and 7.4318 for component 2 under our mixture. Further, the ratio of estimate to standard error for trend is 5.8535 under model I, 4.2559 under model II, 4.5145 under model III, and 2.6550 for component 1 and 14.587 for component 2 under our mixture. This implies that these methods disagree on the significance of the background trend effect. Compared with the three methods for overdispersion, our mixture model gives narrower confidence intervals for the parameter estimates.

In this example, we have analyzed the series of myoclonic seizure counts from a clinical trial. The data are well fitted by a 2-component mixed Poisson regression model with constant mixing probabilities and Poisson rates depending on the covariates treatment, trend and treatment-trend interaction. The goodness-of-fit test suggests that there is no significant evidence of lack of fit in the model. In addition, the residual analysis identifies influential observations and outliers. According to this model, the patient may have two states of seizure frequency rate, which describe “bad days” and “good days” respectively. Compared with the quasi-likelihood methods, the mixed Poisson regression model gives narrower confidence intervals for the parameter estimates. Note that both the parameter estimates and the standard error estimates under the mixed Poisson regression model differ from those obtained by the quasi-likelihood methods.
2.8.3 Terrorist Bombing

We analyze data consisting of a time series of the number of international terrorist bombing episodes (Roberts, 1991, p.432). Roberts (1991) notes that the data do not behave as a single homogeneous series, and suggests that an indicator variable be used to model a level shift for the last seven years. This is reinforced by the time plot (Figure 2.14), which suggests that there might have been a change in rate in 1973.

We first apply the usual Poisson regression with an intercept, a trend variable $\log(i)$ and a step variable $s_i$ defined by

\[ s_i = \begin{cases} 0 & \text{for } i < 60, \\ 1 & \text{otherwise.} \end{cases} \tag{2.45} \]

Note that by defining the step variable as above, we assume that the step change happened in the 60th month. The trend variable is insignificant, and the regression estimates (standard errors) are 0.7498 (0.0887) for the intercept and 1.158 (0.0981) for the coefficient of the step variable. The deviance for the model is 368.1 with 142 degrees of freedom. Note that the data are overdispersed relative to the Poisson regression, since all three score tests for overdispersion (Dean, 1992) are highly significant ($P_a = P_b = 13.84$ and $P_c = 14.59$).

We apply a mixed Poisson model in which

(1) the monthly terrorist bombing count, $y_i$, is associated with exposure $t_i$ and covariates $x_i^{(r)} = (1)$ and $x_i^{(m)} = (1, \log(i), s_i)$, where $t_i = 1$ (one month) and $s_i$ is defined by (2.45). Note that the covariate $\log(i)$ represents a trend, and that the Poisson rates are constant;

(2) $y_i$, $i = 1, \ldots, 144$, are independent and follow a mixed Poisson model with rates $\lambda_j$ and mixing probabilities defined as

\[ p_j(x_i^{(m)}, \beta) = \frac{\exp\{\beta_{j0} + \beta_{j1}\log(i) + \beta_{j2} s_i\}}{\sum_{k=1}^{c-1} \exp\{\beta_{k0} + \beta_{k1}\log(i) + \beta_{k2} s_i\} + 1}, \quad j = 1, \ldots, c-1, \]

and

\[ p_c(x_i^{(m)}, \beta) = 1 - \sum_{j=1}^{c-1} p_j(x_i^{(m)}, \beta), \]

where $i = 1, \ldots, 144$ and $c$ is the number of components in the mixture.

This model allows the mixing probabilities to depend on the trend variable and the step change, and to vary between different forms of them. Table 2.13 provides the results of model fitting.
Among the four saturated mixture models, both AIC and BIC suggest a 3-component mixture model. To test whether the trend effect is significant, we first compare the mixture whose covariates include an intercept and the step change with the 3-component saturated mixture. The difference in log-likelihood between the two is 0.89, so the chi-square test statistic is $2 \times 0.89 = 1.78$ with 2 degrees of freedom. Hence the trend effect is not significant based on the usual likelihood ratio test. Similarly, comparing the mixture without covariates with the one that includes the step change variable, we find that the step change is significant based on the likelihood ratio test (the chi-square test statistic is 61.62 with 2 degrees of freedom). Further, we can compare the two non-nested mixtures, one with only the step change variable as a covariate and the other with only the trend variable, using either AIC or BIC. Clearly, the former has larger AIC and BIC values. According to the model selection procedure, we finally choose, within the class of 3-component mixtures, the model with a step change in the mixing probabilities.

After fitting the 3-component mixed Poisson regression model to the data, the Pearson goodness-of-fit statistic $X^2$ is 134.7 with 137 degrees of freedom. This value does not exceed the upper 95% critical point of the $\chi^2$-distribution on 137 degrees of freedom, $\chi^2_{137,0.95} = 165.3$, suggesting that there is no evidence of lack of fit. Furthermore, the Pearson, deviance and likelihood residuals from the fitted model are calculated and displayed in Figure 2.16, Figure 2.17 and Figure 2.18 respectively. Figure 2.16 shows that the Pearson residuals may not be approximately normal.
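The comparison of the Pearson statistic with its critical point can be checked numerically. The sketch below is an illustration only, not the thesis's FORTRAN code: it uses the Wilson-Hilferty cube-root approximation to the chi-square quantile (an approximation chosen here for convenience), and the function name `chi2_quantile_wh` is hypothetical.

```python
from statistics import NormalDist


def chi2_quantile_wh(prob: float, df: int) -> float:
    """Approximate chi-square quantile via the Wilson-Hilferty cube-root formula."""
    z = NormalDist().inv_cdf(prob)  # standard normal quantile
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3


pearson_x2 = 134.7  # Pearson statistic quoted in the text
df = 137            # degrees of freedom quoted with the statistic
critical = chi2_quantile_wh(0.95, df)
lack_of_fit = pearson_x2 > critical
print(f"approximate critical value: {critical:.1f}; lack of fit flagged: {lack_of_fit}")
```

With 137 degrees of freedom the approximation agrees with the quoted critical point of 165.3 to the precision given, so the statistic 134.7 falls well below it.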
Figure 2.17 and Figure 2.18, on the other hand, show that the deviance residuals and likelihood residuals are very similar to each other, and that the 7th observation is far from the remaining observations in both plots, suggesting that it may be an outlier. On omitting this observation, the deviance reduction is $r_7^2 = (3.121)^2 = 9.741$. This means that the 7th observation has a great impact on the overall fit of the mixed Poisson regression model to the data.

For detection of influential observations, the average relative coefficient changes $w_i$ are calculated and displayed in Figure 2.19. Clearly, the 7th observation is the only influential observation suggested by the plot. On omitting the 7th observation, the average relative coefficient change for each parameter estimate is about 586%, and the new parameter estimates become $\hat\beta_1 = (30.28, -30.10)$, $\hat\beta_2 = (27.56, -26.05)$, $\hat\lambda_1 = 1.6874$, $\hat\lambda_2 = 6.3577$ and $\hat\lambda_3 = 14.239$. Note that the changes in the regression parameter estimates of the mixing probabilities are very large. This may be due to the fact that, after excluding the 7th observation, the first 60 observations are all generated by the first two components. Hence the mixing probability of the third component is almost zero. In this case, the parameter estimates may tend to infinity because they are on the boundary of the parameter space, as often happens in logistic regression. Note also that the Poisson rates do not change appreciably, suggesting that the 7th observation has great influence on the mixing probabilities rather than on the Poisson rates. The residual analysis confirms that the fitted model is adequate.

We interpret the fitted mixed Poisson regression model as follows.
In it the mixing probabilities are

\[ p_1(x_i^{(m)}, \hat\beta) = \frac{\exp(4.0231 - 3.8535 s_i)}{\exp(4.0231 - 3.8535 s_i) + \exp(1.3141 + 0.1741 s_i) + 1}, \]

\[ p_2(x_i^{(m)}, \hat\beta) = \frac{\exp(1.3141 + 0.1741 s_i)}{\exp(4.0231 - 3.8535 s_i) + \exp(1.3141 + 0.1741 s_i) + 1} \]

and

\[ p_3(x_i^{(m)}, \hat\beta) = \frac{1}{\exp(4.0231 - 3.8535 s_i) + \exp(1.3141 + 0.1741 s_i) + 1}, \]

and the Poisson rates are $\hat\lambda_1 = 1.6864$, $\hat\lambda_2 = 6.3611$ and $\hat\lambda_3 = 14.044$.

This model suggests that the mixing probabilities have a jump. During the first 60 months, the number of episodes follows one of three Poisson distributions: a low rate of 1.6864 (episodes per month) with probability 0.9221, a medium rate of 6.3611 with probability 0.0614, and a high rate of 14.044 with probability 0.0164. After December 1972, the data follow one of the same Poisson distributions, but the probabilities have changed to 0.1791, 0.6697 and 0.1512 respectively. This indicates that terrorist bombing incidents became significantly more frequent between 1973 and 1979. Furthermore, the mixture model suggests that the time trend (monthly index) is not significant, suggesting that the rates are stable within these periods.

If we postulate three levels of terrorist bombing corresponding to the three different Poisson rates, each month occupies one of the levels according to the mixing probabilities. Based on the estimated posterior probabilities defined in (2.20), we identify each observation with a level if its estimated posterior probability of being at that level is greater than 0.5. Figure 2.15 classifies months in this way. Note that the high intensity component accounts for the large number of episodes in July 1968 as well as many of the data points after 1973. From the fitted model, we find that the estimated mean and variance are 2.18 and 5.82, respectively, for the first five years, and 6.69 and 19.5 for the last seven years. Clearly, the mixed Poisson model accounts for overdispersion.
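The probabilities and moments quoted above follow directly from the fitted coefficients. The following sketch, an illustration rather than the author's code, evaluates the multinomial-logit mixing probabilities before and after the step, and then uses the standard finite-mixture identities $E(Y) = \sum_j p_j \lambda_j$ and $\mathrm{Var}(Y) = E(Y) + \sum_j p_j \lambda_j^2 - E(Y)^2$. The function and constant names are hypothetical.

```python
import math

# Fitted values quoted in the text (3-component model, step change in mixing probabilities).
RATES = (1.6864, 6.3611, 14.044)               # Poisson rates, episodes per month
BETA = ((4.0231, -3.8535), (1.3141, 0.1741))   # (intercept, step coefficient), components 1 and 2


def mixing_probs(step):
    """Multinomial-logit mixing probabilities; component 3 is the reference category."""
    weights = [math.exp(b0 + b1 * step) for b0, b1 in BETA] + [1.0]
    total = sum(weights)
    return [w / total for w in weights]


def mixture_moments(probs, rates):
    """Mean and variance of a finite mixture of Poisson distributions."""
    mean = sum(p * r for p, r in zip(probs, rates))
    second = sum(p * r * r for p, r in zip(probs, rates))
    return mean, mean + second - mean ** 2


for step, label in ((0, "before the step"), (1, "after the step")):
    probs = mixing_probs(step)
    mean, var = mixture_moments(probs, RATES)
    print(label, [round(p, 4) for p in probs], round(mean, 2), round(var, 2))
```

Because the coefficients are rounded to four or five significant figures, the recomputed variances can differ from the quoted ones in the second decimal place; in both periods the variance clearly exceeds the mean.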
Note that we also fit the data using a mixed Poisson regression model with a step change in the rates, and have found that the above model fits better.

In summary, the terrorist bombing data have been fitted by the 3-component mixed Poisson regression model with constant Poisson rates and mixing probabilities depending on a step change. This means that since 1973 terrorist bombings have become more intensive, because the probability of being in a state with a higher bombing rate has increased. The goodness-of-fit test shows that there is no significant evidence of lack of fit. In addition, the residual analysis identifies one observation which is not only an outlier but also an influential observation in terms of the fitted model.

2.8.4 Accidents in Worksites

There have been many studies of the relationship between alcohol and accident injuries (e.g., McDermott, 1977; Dietz and Baker, 1974; Hingson and Howland, 1987; and Wechsler et al., 1969). Some of these studies established a link between alcohol and accidental injuries (McDermott, 1977), but others have not. In particular, there is no strong evidence implicating alcohol in workplace injuries. Some methodological issues associated with these studies include data collection, alcohol measurement and the choice of appropriate statistical models. Webb et al. (1994) conducted a study to analyze the relationship between problem drinking and industrial workplace injuries. They collected data from 470 employees of a large industrial plant manufacturing metal products in the Hunter Valley region of New South Wales, Australia, employed during the period May 1985 to July 1986. Problem drinking was measured by the Mortimer-Filkins test, which was devised initially to detect alcohol problems among persons charged with drunk-driving (Mortimer et al., 1971). The test scores (MFts) in the data range from -3 to 37.
The numbers of work injuries were obtained from medical reports completed for all injuries reported to the medical center by study participants, for a period of 12 months from the time of administration of the questionnaire to each study participant. The data also contain socio-demographic measures including age and job satisfaction.

A question of interest here is to find significant predictors of work injuries. A review of studies on the relationship between alcohol and work injuries revealed that the evidence is contradictory and that many of the studies contain methodological flaws (Webb et al., 1994). As a standard method for count data analysis, we use Poisson regression, defining the number of work injuries for subject $i$ as $Y_i$ and including the covariates

\[ x_{i1} = \log(\mathrm{age}_i), \tag{2.46} \]

\[ x_{i2} = \begin{cases} 1 & \text{if individual } i \text{ has a low level of job satisfaction,} \\ 0 & \text{otherwise,} \end{cases} \tag{2.47} \]

\[ x_{i3} = \log(\mathrm{MFts}_i + 10) \tag{2.48} \]

and a fourth covariate $x_{i4}$, defined by (2.49). Thus the model for the Poisson mean $\lambda_i$ is

\[ \log(\lambda_i) = \alpha_0 + \alpha_1 x_{i1} + \alpha_2 x_{i2} + \alpha_3 x_{i3} + \alpha_4 x_{i4}. \]

Note that we add the constant 10 in (2.48) so that $\mathrm{MFts}_i + 10 > 0$ and the log transformation can be applied. The first row in Table 2.14 reports the results of fitting the data using the usual Poisson regression. Comparing the t-statistics (parameter estimate/standard error), all covariates except $x_{i4}$ are highly significant. However, these results may be misleading because the data are seriously overdispersed. The overdispersion score test statistic $P_a$ has a value of 24.33, which was compared to the $N(0, 1)$ reference value and suggests inadequacy of the usual Poisson regression model.

To apply the mixed Poisson regression model, we assume that

(1) the number of work injuries for individual $i$ is associated with covariates $x_i = (x_i^{(m)}, x_i^{(r)})$ with $x_i^{(m)} = (1, x_{i1}, x_{i2}, x_{i3})$ and $x_i^{(r)} = (1, x_{i1}, x_{i2}, x_{i3}, x_{i4})$, where $x_{i1}$, $x_{i2}$, $x_{i3}$ and $x_{i4}$ are defined by (2.46), (2.47), (2.48) and (2.49) respectively.
Note that we choose $t_i = 1$ for all $i$;

(2) injury counts of different individuals are independent and follow a mixed Poisson regression model with rates (number of work injuries per year) given by the link functions

\[ \lambda_j(x_i^{(r)}, \alpha_j) = \exp(\alpha_{j0} + \alpha_{j1} x_{i1} + \alpha_{j2} x_{i2} + \alpha_{j3} x_{i3} + \alpha_{j4} x_{i4}), \]

where $i = 1, 2, \ldots, 470$, $j = 1, 2, \ldots, c$ and $c$ is the number of components in the mixture.

Table 2.14 shows the results of fitting these models. In order to determine the number of components first, we compare the values of BIC and AIC among the three saturated models. Clearly, both BIC and AIC lead to the choice of 2-component mixture models. Within these 2-component mixtures, we carry out inference using likelihood ratio tests.

First we test the hypothesis that the effects of the covariates $x_{i2}$, $x_{i3}$ and $x_{i4}$ are insignificant by comparing the model including only $x_{i1}$ in both the mixing probabilities and the rates with the saturated 2-component model. Since the chi-square test statistic is $2 \times (-897.74 + 903.45) = 11.42 < \chi^2_{8,0.95} = 15.51$, we do not reject the hypothesis at the 5% significance level. This implies that neither the level of job satisfaction nor the Mortimer-Filkins test score has a significant effect on the mixing probabilities or the Poisson rates. Then we test whether the effect of age ($x_{i1}$) is insignificant in the mixing probabilities. Indeed, age is a significant covariate in the mixing probabilities because the chi-square test statistic for the corresponding hypothesis is $2 \times (-899.15 + 906.31) = 14.32 > \chi^2_{1,0.95} = 3.84$. The age covariate $x_{i1}$ is also highly significant in the Poisson rates because the corresponding test statistic is $2 \times (-903.45 + 909.57) = 12.24 > \chi^2_{2,0.95} = 5.99$. Finally we test the hypothesis of a common slope for both components, i.e., $\alpha_{11} = \alpha_{21}$. This hypothesis is not rejected at the 5% significance level because the test statistic is $2 \times (-903.45 + 903.48) = 0.06 < \chi^2_{1,0.95} = 3.84$.
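The four likelihood ratio tests above all have the same form: twice the difference in maximized log-likelihoods compared with a chi-square reference distribution. As a hedged illustration (not the thesis's own code), the sketch below recomputes the statistics and attaches p-values, using closed-form chi-square tail probabilities that hold for 1 degree of freedom and for even degrees of freedom. The degrees of freedom are inferred here from the critical values quoted in the text (15.51, 3.84, 5.99, 3.84), and the helper names `chi2_sf` and `lrt` are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for df == 1 or even df (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half, term, total = x / 2.0, 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k      # (x/2)^k / k!
            total += term
        return math.exp(-half) * total
    raise ValueError("closed form implemented only for df == 1 or even df")


def lrt(loglik_full, loglik_reduced, df):
    """Likelihood ratio statistic and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


# (log-likelihood of larger model, log-likelihood of smaller model, assumed df)
tests = {
    "drop x2, x3, x4": (-897.74, -903.45, 8),
    "age in mixing probabilities": (-899.15, -906.31, 1),
    "age in Poisson rates": (-903.45, -909.57, 2),
    "common age slope": (-903.45, -903.48, 1),
}
results = {name: lrt(*args) for name, args in tests.items()}
for name, (stat, p) in results.items():
    print(f"{name}: statistic = {stat:.2f}, p = {p:.4f}")
```

The p-values lead to the same decisions as the critical-value comparisons: the first and last hypotheses are not rejected at the 5% level, while both age effects are significant.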
Therefore we choose the 2-component mixture model with the age covariate in the mixing probabilities and in the Poisson rates, the latter with a common coefficient. This model fits the data best.

After fitting the 2-component mixed Poisson regression model to the data, the Pearson goodness-of-fit statistic $X^2$ is 510.8 with 465 degrees of freedom. This value does not exceed the upper 95% critical point of the $\chi^2$-distribution on 465 degrees of freedom, $\chi^2_{465,0.95} = 516.27$, suggesting that there is no evidence of lack of fit in the mixed Poisson regression model. Furthermore, the Pearson, deviance and likelihood residuals from the fitted model are calculated and displayed in Figure 2.21, Figure 2.22 and Figure 2.23 respectively. Figure 2.21 shows that the Pearson residuals may not be approximately normal. On the other hand, Figure 2.22 and Figure 2.23 show that the deviance residuals and likelihood residuals are very similar to each other, and that the numbers of possible outliers in these two plots are the same, with the 72nd observation having the largest deviance and likelihood residuals. On omitting the 72nd observation, the deviance reduction is $r_{72}^2 = (3.488)^2 = 12.166$. This means that this observation has a great impact on the overall fit of the mixed Poisson regression model to the data.

For detection of influential observations, the average relative coefficient changes $w_i$ are calculated and displayed in Figure 2.24. Clearly, the plot shows that there are a couple of influential observations, with the 434th observation having the largest value (0.417). On omitting the 434th observation, the average relative coefficient change for each parameter estimate is about 42%, and the new parameter estimates become $\hat\alpha_1 = (-1.1505, 0.2566)$, $\hat\alpha_2 = (0.5850, 0.2566)$ and $\hat\beta = (-6.5083, 1.9982)$.
Note that the changes in the regression parameter estimates of the Poisson rates, especially the common regression parameter, are relatively larger than those in the mixing probabilities. This suggests that the 434th observation has great influence on the Poisson rates rather than on the mixing probabilities. The residual analysis identifies possible outliers and influential observations in terms of the mixed Poisson regression model.

We now interpret the fitted model as follows. The chosen mixture model suggests that work injury counts are generated by two underlying Poisson distributions with rates defined by

\[ \lambda_1(x_i^{(r)}, \hat\alpha_1) = \exp(-1.4545 + 0.3341 \log(\mathrm{age}_i)) \]

and

\[ \lambda_2(x_i^{(r)}, \hat\alpha_2) = \exp(0.3066 + 0.3341 \log(\mathrm{age}_i)). \]

These two distributions are mixed according to the mixing probabilities defined by

\[ p_1(x_i^{(m)}, \hat\beta) = \frac{\exp(-6.8705 + 2.1068 \log(\mathrm{age}_i))}{\exp(-6.8705 + 2.1068 \log(\mathrm{age}_i)) + 1} \]

and

\[ p_2(x_i^{(m)}, \hat\beta) = \frac{1}{\exp(-6.8705 + 2.1068 \log(\mathrm{age}_i)) + 1}. \]

According to this model, employees may be classified into two groups on the basis of work injury rates. Those in group one have a relatively low baseline risk, and those in group two a high baseline risk. Age, however, has the same effect on both groups: as employees get older, their chances of having a work injury increase. On the other hand, since the mixing probability for group one, $p_1(x_i^{(m)}, \beta)$, increases with age, there are more senior employees than young ones in the low risk group. For example, a 25 year old employee has a 47.8% chance of being classified into the low risk group, with an accident rate of 0.7 work injuries per year, and a 52.2% chance of being classified into the high risk group, with an accident rate of 4.0 work injuries per year. A 50 year old employee has a 79.8% chance of being classified into the low risk group, with an accident rate of 0.9 work injuries per year, and a 20.2% chance of being classified into the high risk group, with an accident rate of 5.0 work injuries per year.
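The worked figures for 25 and 50 year old employees can be recomputed from the fitted coefficients. The sketch below is illustrative only; it assumes a common age slope of 0.3341 in both rate functions (the value that reproduces the quoted rates of 0.7, 4.0, 0.9 and 5.0), and the function names are hypothetical.

```python
import math

INTERCEPT_LOW, INTERCEPT_HIGH = -1.4545, 0.3066  # rate intercepts for the two groups
COMMON_SLOPE = 0.3341                            # assumed common coefficient of log(age)
MIX_INTERCEPT, MIX_SLOPE = -6.8705, 2.1068       # logit of the low-risk mixing probability


def injury_rates(age):
    """Expected work injuries per year for the low- and high-risk groups."""
    shared = COMMON_SLOPE * math.log(age)
    return math.exp(INTERCEPT_LOW + shared), math.exp(INTERCEPT_HIGH + shared)


def prob_low_risk(age):
    """Mixing probability of the low-risk group (logistic in log(age))."""
    eta = MIX_INTERCEPT + MIX_SLOPE * math.log(age)
    return 1.0 / (1.0 + math.exp(-eta))


for age in (25, 50):
    low, high = injury_rates(age)
    p_low = prob_low_risk(age)
    print(f"age {age}: P(low risk) = {p_low:.3f}, rates = {low:.1f} / {high:.1f}, "
          f"overall mean = {p_low * low + (1 - p_low) * high:.2f}")
```

The last quantity printed, the mixture mean $p_1 \lambda_1 + p_2 \lambda_2$, is not quoted in the text; it is included only to show how the two groups combine into an overall expected injury count at a given age.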
Figure 2.20 provides the estimated work injury rate corresponding to each group (the solid line is the rate for the low risk group and the dotted line that for the high risk group). Note that Figure 2.20 also classifies the employees in terms of the estimated posterior probabilities. Those observations marked as "1" form the low risk group, which is characterized by the function $\lambda_1(x_i^{(r)}, \alpha_1)$, while those marked as "2" form the high risk group, which is characterized by the function $\lambda_2(x_i^{(r)}, \alpha_2)$.

In this example, we have found that neither the problem drinking measure (the Mortimer-Filkins test score) nor the job satisfaction score is a good predictor of workplace injuries. On the other hand, age is a significant predictor of workplace injuries. After taking age effects into account, the accident rates do not depend on the Mortimer-Filkins test score or on job satisfaction, but only on age through the log-linear function. The workplace injury data are well fitted by the 2-component mixed Poisson regression model, which consists of two Poisson regression models. According to the model, the employees can be classified into two groups depending on baseline risk, with the likelihood of being in one of the baseline groups associated with age. Note also that the inferences differ from those obtained through the usual Poisson regression analysis. The goodness-of-fit test shows that there is no significant evidence of lack of fit. In addition, the residual analysis identifies several outliers and influential observations in terms of the fitted model.

2.8.5 Ames Salmonella Assay Data

The data in this example were first presented by Margolin et al. (1981) from an Ames salmonella reverse mutagenicity assay, and were analyzed by Breslow (1984) and Lawless (1987b) using quasi-likelihood and negative binomial approaches respectively.
Table 2.15 shows the number of revertant colonies ($y_i$) observed on each of three replicate plates tested at each of six dose levels of quinoline ($d_i$). Lawless (1987b) defined the expected frequency of revertants as

\[ E(Y_i \mid d_i) = \lambda(d_i) = \exp(\alpha_0 + \alpha_1 d_i + \alpha_2 \log(d_i + 10)), \]

while Breslow (1984) assumed $E(Y_i \mid d_i) = \lambda(d_i)$. At issue is whether a mutagenic effect is present. This corresponds to testing the hypothesis that $\alpha_2 = 0$. The data are overdispersed relative to Poisson regression with the rate defined above, since each of the three tests for overdispersion is highly significant ($P_a = 5.628$, $P_b = 5.656$ and $P_c = 5.607$).

To account for overdispersion, Breslow (1984) assumed the variance function $\mathrm{Var}(Y_i) = \lambda(d_i) + \sigma^2 \lambda(d_i)^2$ and obtained parameter estimates by using weighted least squares combined with the method of moments. Similarly, Lawless (1987b) fitted the data with a negative binomial model in which the variance function is $\mathrm{Var}(Y_i) = \lambda(d_i) + \sigma^2 \lambda(d_i)^2$, and obtained parameter estimates by maximum likelihood. Parameter estimates (standard errors) are reported in Table 2.17.

Our analysis of the data using mixed Poisson regression models follows. We assume that

(1) the number of observed revertant colonies, $y_i$, is associated with covariates $x_i = (1, d_i, \log(d_i + 10))$, and $t_i = 1$;

(2) $Y_i$ are independent and follow a mixed Poisson regression model with Poisson rates

\[ \lambda_j(x_i, \alpha_j) = \exp(\alpha_{j0} + \alpha_{j1} d_i + \alpha_{j2} \log(d_i + 10)), \]

where $i = 1, \ldots, 18$ and $j = 1, \ldots, c$.

Table 2.16 shows the results of fitting these models. Among the three saturated models, both AIC and BIC lead to the choice of 2-component mixtures. To test for mutagenic effects, we compare the saturated model to the one without the covariate $\log(d_i + 10)$ by a likelihood ratio test. Since the chi-square test statistic equals $2 \times (68.81 - 60.90) = 15.82 > \chi^2_{2,0.99} = 9.21$, mutagenic effects are significant.
Further, the similar regression coefficient estimates for each component in the saturated model suggest common regression coefficients for both components. This is indeed confirmed by the likelihood ratio test (the chi-square test statistic is $2 \times 0.01 = 0.02 < \chi^2_{2,0.99} = 9.21$). Hence we choose to represent the data by the 2-component mixture with common regression coefficients and different intercepts for each component.

The fitted model may be interpreted as follows. In it the mixing probabilities equal 0.8173 and 0.1827, and the respective rates are

\[ \lambda_1(x_i, \hat\alpha_1) = \exp(1.9094 - 0.00126 d_i + 0.3640 \log(d_i + 10)) \]

and

\[ \lambda_2(x_i, \hat\alpha_2) = \exp(2.4768 - 0.00126 d_i + 0.3640 \log(d_i + 10)). \]

This model indicates that mutagenic effects are the same for both components. The model may also be regarded as a Poisson regression with a random intercept following a discrete mixing distribution with two points of support. Figure 2.25 shows the classification for the data, in which each observation is identified with one of the two components in the mixture according to the estimated posterior probabilities defined by (2.20). This plot may provide a way to visualize overdispersion in the data. From it we conjecture that the three observations classified with component 2 may be outliers in terms of the Poisson regression model, and that overdispersion may be due to these three observations. In fact, the residual analysis below adds strength to this conjecture.

After fitting the 2-component mixed Poisson regression model to the data, the Pearson goodness-of-fit statistic $X^2$ is 16.2 with 13 degrees of freedom. This value does not exceed the upper 95% critical point of the $\chi^2$-distribution on 13 degrees of freedom, $\chi^2_{13,0.95} = 22.36$, suggesting that there is no evidence of lack of fit. Moreover, the Pearson, deviance and likelihood residuals are displayed in Figure 2.26, Figure 2.27 and Figure 2.28 respectively.
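The random-intercept reading of the fitted two-component model can be made concrete with a few lines of arithmetic. The sketch below is an illustration under stated assumptions, not the thesis's code: it evaluates both component rates at a dose value chosen here purely for illustration (`dose = 100.0` is hypothetical, since Table 2.15 is not reproduced in this section), and then applies the finite-mixture identities for the mean and variance. Function names are hypothetical.

```python
import math

MIX = (0.8173, 0.1827)           # fitted mixing probabilities
INTERCEPTS = (1.9094, 2.4768)    # component-specific intercepts
COEF_DOSE, COEF_LOG = -0.00126, 0.3640   # common regression coefficients


def component_rates(dose):
    """Poisson rates of the two components at a given dose."""
    shared = COEF_DOSE * dose + COEF_LOG * math.log(dose + 10.0)
    return [math.exp(a0 + shared) for a0 in INTERCEPTS]


def mixture_mean_var(dose):
    """Mean and variance of the fitted two-component Poisson mixture."""
    rates = component_rates(dose)
    mean = sum(p * r for p, r in zip(MIX, rates))
    second = sum(p * r * r for p, r in zip(MIX, rates))
    return mean, mean + second - mean ** 2


dose = 100.0  # hypothetical dose used only for illustration
low, high = component_rates(dose)
mean, var = mixture_mean_var(dose)
print(f"rate ratio (component 2 / component 1): {high / low:.3f}")
print(f"mixture mean = {mean:.2f}, variance = {var:.2f}")
```

Because the two components share their regression coefficients, the ratio of their rates is $\exp(2.4768 - 1.9094)$ at every dose, and the excess of the variance over the mean comes entirely from the spread between the two intercepts.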
Figure 2.27 and Figure 2.28 show that the deviance and likelihood residuals are very similar to each other. On the other hand, Figure 2.26 indicates that the Pearson residuals may not be approximately normal.

For detection of influential observations, the average relative coefficient changes $w_i$ are calculated and displayed in Figure 2.29. Clearly, the plot shows that the 12th observation is influential. On omitting the 12th observation, the new estimates of the intercepts in the two components are 2.2242 and 2.5460 respectively; the new estimates of the other common regression parameters are -0.00067 and 0.2430 respectively; and the new estimates of the mixing probabilities for the two components are 0.5644 and 0.4356 respectively. Note that the new intercept estimates are very close, suggesting that the data excluding the 12th observation may not be overdispersed. In fact, we fit these data to the Poisson regression model and find that there is no strong evidence of overdispersion, because none of the three overdispersion score test statistics is significant ($P_a = 1.6142$, $P_b = 1.6132$ and $P_c = 1.8543$). If we use the corrected forms of these score test statistics for small samples, the values are 2.1339, 2.1328 and 2.3688. These values are marginal relative to the normal critical values at critical level $\alpha = 0.5$, suggesting again that there is no strong evidence of overdispersion. Assuming that the data excluding the 12th observation are overdispersed, we also fit the data to the 2- and 3-component mixed Poisson regression models, and select the (one-component) Poisson regression model because it yields the largest values of AIC and BIC among the three saturated models. That is, the values of AIC for the Poisson regression and the 2- and 3-component saturated mixed Poisson regression models are -61.3, -61.4 and -64.4 respectively, and the values of BIC are -62.7, -64.5 and -68.9 respectively.
The analysis shows that extra-Poisson variation may be caused by outliers in terms of Poisson regression, and that the mixed Poisson regression model may tend to model these outliers by extra components. Note also that the changes in the parameter estimates and corresponding standard errors between the two Poisson regression models, with and without the 12th observation, may not be very large, suggesting that the 12th observation may be an outlier in terms of the Poisson regression with the complete data.

From Table 2.17, we note that the regression coefficient estimates, $\hat\alpha_1$ and $\hat\alpha_2$, do not vary drastically across models, but their standard errors do. For instance, the value of $\hat\alpha_2 / \mathrm{se}(\hat\alpha_2)$ changes from $0.3640/0.0665 = 5.4737$ under the mixed Poisson regression model to $0.3110/0.09901 = 3.1411$ under the quasi-likelihood model. Thus, although all four models agree that mutagenic effects are significant, they disagree as to the degree of significance of the effects. Note that confidence intervals under the mixed Poisson regression model are much narrower than under either the quasi-likelihood or the negative binomial model. Hence effects are estimated more precisely. For example, an approximate 95% confidence interval for the coefficient of $\log(\mathrm{dose} + 10)$ is $0.3640 \pm 0.1303$ under the mixed Poisson regression, $0.3110 \pm 0.1941$ under quasi-likelihood, and $0.313 \pm 0.1701$ under the negative binomial model. This suggests that using different models to account for overdispersion may lead to different conclusions.

In this example, we analyzed the data set from an Ames salmonella reverse mutagenicity assay. The data are well fitted by the 2-component mixed Poisson regression model with constant mixing probabilities and Poisson rates as functions of dose level. Note that the mutagenic effects are the same for both components, while the intercepts in the Poisson rates vary between the two components.
The goodness-of-fit test suggests that there is no evidence of lack of fit in the model. In addition, the residual analysis identifies one influential observation. Excluding this observation, the data are not overdispersed. This example suggests that extra-Poisson variation may be caused by the presence of outliers in terms of Poisson regression, and that the mixed Poisson regression may model these outliers by including extra components.

This example also illustrates a difference between our approach and the usual approaches for accounting for overdispersion. Since the variance exceeds the mean, methods which correct for this by increasing the variance may lead to less significant regression coefficient estimates. Our approach has a different effect. By attributing overdispersion to the presence of several components, the mixed Poisson regression model estimates coefficient effects with smaller error.

2.9 Tables and Figures in Chapter 2

Table 2.1: The results of the simulations for the mixed Poisson regression models

The First Model

        Poisson rates                                                                              Mixing probabilities
Comp.   $\alpha_{j0}$  $E(\hat\alpha_{j0})$  $\mathrm{Var}(\hat\alpha_{j0})$  $\alpha_{j1}$  $E(\hat\alpha_{j1})$  $\mathrm{Var}(\hat\alpha_{j1})$  $\beta_{j0}$  $E(\hat\beta_{j0})$  $\mathrm{Var}(\hat\beta_{j0})$
1       2.8    2.7955    0.0424    -2.9    -2.9033    0.1076    1.1    1.1764    0.0845
2       2.6    2.6146    0.0183     0.4     0.3903    0.0090    0.6    0.5938    0.1050
3       3.6    3.5986    0.0095     0.2     0.1983    0.0065

The Second Model

        Poisson rates                                 Mixing probabilities
Comp.   $\alpha_{j0}$  $E(\hat\alpha_{j0})$  $\mathrm{Var}(\hat\alpha_{j0})$  $\beta_{j0}$  $E(\hat\beta_{j0})$  $\mathrm{Var}(\hat\beta_{j0})$  $\beta_{j1}$  $E(\hat\beta_{j1})$  $\mathrm{Var}(\hat\beta_{j1})$
1       0.4    0.3699    0.0268     2.0     1.9238    0.6256    -2.0    -2.0034    0.5776
2       3.0    3.0019    0.0011    -1.4    -1.5175    0.8428     1.5     1.5686    0.4572
3       2.0    1.9882    0.0104

The Third Model

        Poisson rates                                                                              Mixing probabilities
Comp.   $\alpha_{j0}$  $E(\hat\alpha_{j0})$  $\mathrm{Var}(\hat\alpha_{j0})$  $\alpha_{j1}$  $E(\hat\alpha_{j1})$  $\mathrm{Var}(\hat\alpha_{j1})$  $\beta_{j0}$  $E(\hat\beta_{j0})$  $\mathrm{Var}(\hat\beta_{j0})$  $\beta_{j1}$  $E(\hat\beta_{j1})$  $\mathrm{Var}(\hat\beta_{j1})$
1       2.8    2.8141    0.0376    -2.9    -2.9461    0.1523     2.0     2.2648    0.9838    -2.0    -2.2528    0.9904
2       3.6    3.5811    0.0069     0.2     0.2119    0.0027    -1.4    -1.4346    0.7530     1.5     1.5776    0.4372
3       2.6    2.5870    0.0187     0.4     0.4062    0.0104

00 I a2 1 a 2 a 1 a a2 a 2 a 1 a parameter 2.5000 0.5510 2.5000 0.5981 -2.4999 0.6475 -2.4997 0.7507 0.0801 0.0705 2.4828 0.5961 2.5 0.6 0.41 1728 2.5000 1.9704 1.9875 1.9999 2.0000 2.0176 0.0474 1.9932 2.0 0.3652 0.5276 0.5878 0.6466 0.8029 0.0865 0.5878 0.6 -2.5205 -2.5085 2.5000 2.5000 -2.4945 0.1419 2.4629 2.5 0.8467 0.9347 1.0000 1.0000 1.0613 0.0562 0.9734 1.0 0.3628 0.5524 0.6055 0.7015 0.9239 0.1437 0.6239 0.6 O.5327 0.5000 0.5000 0.4224 0.0585 0.4766 0.5 0.4689 2.0570 0.0420 1.9985 2.0 1.9277 2.01 19 0.8878 0.0699 0.6037 0.6 0.5442 1.9781 0.5996 0.6688 0.5023 2.0000 0.5000 -0.4762 0.4388 0.0699 0.4793 0.5 0.8972 0.9558 0.5 120 1.0000 1.0000 1.0501 0.0630 0.9885 1.0 lower extreme lower quartile 0.29 17 median upper quartile upper extreme standard deviation mean true value

Table 2.2: The result of a Monte Carlo study on the 2-component mixed Poisson regression model with constant mixing probabilities and variable rates (I)

b... a2 1 a a2 1 a 2 a a a2 a parameter 0.0678 0.0484 0.0657 0.1683 0.0860 0.0819 0.9906 -0.4820 0.3975 1.9967 -0.4692 0.4299 0.9813 -2.4817 0.3818 1.9938 -2.4763 0.3965 1.0 0.5 0.4 2.0 -0.5 0.4 1.0 .2.5 0.4 2.0 -2.5 0.4 0.0751 0.1045 0.0371 0.0615 0.0430 0.1715 standard deviation mean true value 2.0105 -0.4699 2.0511 -0.4298 2.5000 2.5000 0.4483 2.0000 2.0203 0.5900 0.4296 -2.4998 1.0000 0.5619 2.4995 1.0547 0.4907 0.4759 0.6965 0.6654 -0.4707 1.0105 upper quartile 4) .4305 1.0829 upper extreme 0.3967 2.5000 2.0000 0.3818 2.5000 1.0000 0.4087 -0.4993 2.0000 0.3999 -0.5000 1.0000 median 0.3506 2.5000 1.9858 0.3323 2.5000 0.9625 0.3401 -0.5000 1.9820 0.3268 0.5000 0.9584 lower quartile

Table 2.3: The result of a Monte Carlo study on the 2-component mixed Poisson regression model with constant mixing probabilities and variable rates (II)

0.2086 2.5000 1.9657 0.1991 2.5000 0.9064 0.1195 -0.5432 1.9399 0.1045 -0.5304 0.8813 lower extreme
F Chapter 2. Mixed Poisson Regression Models Pt 1 a 1.0 2.0 2 a 0.6 -0.5 63/200 17/200 -2.5 200/200 198/200 -0.5 174/200 119/200 -2.5 200/200 200/200 Table 2.4: The results of the likelihood ratio tests for the hypothesis the 2-component mixed Poisson regression model—I. 0.4 of a 2 = 0 based on Chapter 2. Mixed Poisson Regression Models 87 p,=O.4 6 = 1 p o. parameter 1 a — true value mean standard deviation mean standard deviation 1.0 0.9910 0.0849 0.9882 0.0861 -0.5 -0.2668 0.1362 -0.1611 0.1306 a1 2.0 1.9985 0.5669 1.9954 0.0546 2 a -0.5 -0.2708 0.0889 -0.1693 0.0862 1 a 1.0 0.9976 0.0797 0.9903 0.0761 2.5 -0.8055 0.2021 0.4377 0. 1598 aI 2.0 1.9959 0.0500 1.9931 0.0517 2 a -2.5 -0.8065 0.1789 -0.4536 0.1425 2 a a 2 Table 2.5: The results of fitting mixed Poisson regression model to the data from a Monte Carlo study on the 2-component mixed Poisson regression model with consta nt mixing probabilities and variable rates. ‘Jhapter 2. Mixed Poisson Regression Models — Pt 1 a 1.0 2 a 0.6 0.4 -0.5 99/200 47/200 -2.5 200/200 177/200 181/200 122/200 200/200 200/200 -0.5 2.0 -2.5 . Table 2.6: The results of the likelihood ratio tests for the hypothesis the 2-component mixed Poisson regression model—IT. 2 = of a 0 based on 96 87 91 1 2 3 100 100 100 100 total number of replicates 100 100 # of repicates that BIC leads the choice of the right model Table 2.7: The results of model selection based on AIC and BIC values for the Monte Carlo study. # of replicates that AIC leads the choice of the right model Model Number 00 CD I * ‘a ‘b 24.45 24.47 374.29 a 24.45 24.47 374.29 b and P are score test statistics which asymptotically follow the standard normal distribution. -316.24 0.8560 (0.0770) 0.6207 (0.1206) 0.0123 (0.0127) -316.69 0.9279 (0.0231) 0.5392 (0.0906) (log (RND)) 2 -1780. log (RND) loglikelihood 3.155 (0.2466) 1 Covariates Table 2.8: Poisson regression and overdispersion test statistics for the patent data. 28.89 28.77 374.29 I. •.4 . 0 4 2 ! 
Table 2.10: Parameter estimates for five models for patent data.

Model                               Intercept          log(R&D)           (log(R&D))^2        Dispersion parameter   Mixing probability
Poisson regression                  0.6207 (0.1206)    0.8560 (0.0770)    0.0123 (0.0127)     1.0
Quasi-likelihood I                  0.6207 (0.2935)    0.8560 (0.1874)    0.0123 (0.0309)     5.917
Quasi-likelihood II                 0.9734 (0.1256)    0.5268 (0.0806)    0.0687 (0.0201)     0.2094
Quasi-likelihood III                0.6626 (0.1540)    0.7321 (0.1087)    0.0415 (0.0267)     0.4070
Mixed Poisson regression, comp 1    .16.233            986 (1.M           -1.0137 (0.2046)                           0.1819
Mixed Poisson regression, comp 2    0.5900 (0.4149)    1.7801 (0.2748)    -0.1961 (0.0430)                           0.1773
Mixed Poisson regression, comp 3    0.7025 (0.1422)    0.5182 (0.0916)    0.0755 (0.0175)                            0.6408

Table 2.11: Parameter estimates for five methods for seizure data.

Method                              Intercept          $x_1$              $x_2$               $x_3$               Mixing probability   Unexplained variance
Poisson regression                  2.118 (0.0815)     4.132 (0.3032)     -0.2257 (0.0329)    -1.320 (0.0800)     NA                   1.0
Method I                            2.118 (0.1897)     4.132 (0.7059)     -0.2257 (0.0766)    -1.320 (0.1863)     NA                   5.4206
Method II                           2.148 (0.5539)     4.656 (1.094)      -0.2412 (0.2191)    -1.440 (0.3098)     NA                   0.8631
Method III                          2.129 (0.3846)     3.757 (0.8322)     -0.2408 (0.1523)    -1.221 (0.2316)     NA                   0.4051
Mixed Poisson regression, comp 1    2.8450 (0.2360)    1.3020 (0.4904)    -0.4063 (0.0909)    -0.4309 (0.1385)    0.2762               NA
Mixed Poisson regression, comp 2    2.0704 (0.0890)    7.4318 (0.5095)    -0.2707 (0.0377)    -2.2762 (0.1377)    0.7238               NA

Table 2.12: Mixed Poisson regression model estimates for seizure data.

Model                  Component   Mixing probability   Poisson rate coefficients (in printed order)                             Log likelihood   AIC        BIC
1-component mixture    1                                2.118, 4.132, -0.27, -1.320                                              -583.16          -587.16    -593.04
2-component mixture    1           0.4128               1.2183                                                                   -700.10          -703.10    -707.41
                       2           0.5872               -1.1571
2-component mixture    1           0.3715               1.8959, -1.2761                                                          -462.79          -467.79    -475.14
                       2           0.6285               1.3777, -3.1018
2-component mixture    1           0.3736               2.9919, -0.4732, -0.4718                                                 -426.21          -433.21    -443.51
                       2           0.6264               2.1791, -2.3248, -0.3379
2-component mixture    1           0.2761               2.8450 (0.2360), 1.3020 (0.4924), -0.4063 (0.0909), -0.4309 (0.1385)     -376.18          -385.18    -398.41
                       2           0.7239               2.0704 (0.0890), 7.4318 (0.5095), -0.2707 (0.0377), -2.2762 (0.1371)
3-component mixture    1           0.2742               2.8440, 1.2938, -0.4054, -0.4294                                         -375.29          -389.29    -409.88
                       2           0.0277               2.0809, -28.767, -0.3928, 5.4488
                       3           0.6981               2.0694, 7.3197, -0.2648, -2.2478
Table 2.15: Number of revertant colonies of salmonella ($y_i$).

Dose of quinoline $d_i$ (µg/plate)    0     10    33    100    333    1000
Observed # of colonies                15    16    16    27     33     20
                                      21    18    26    41     38     27
                                      29    21    33    60     41     42

Table 2.16: Mixed Poisson regression model estimates for Ames salmonella assay data.

Model                  Component   Mixing probability $p_j$   $\alpha_{j0}$   $\alpha_{j1}$   $\alpha_{j2}$   Log likelihood   AIC       BIC
1-component mixture    1                                      2.173           -0.001013       0.3198          -68.13           -71.13    -72.47
2-component mixture    1           0.6145                     3.0779                                          -68.93           -71.93    -73.27
                       2           0.3855                     3.7112
2-component mixture    1           0.5617                     2.9886          0.000188                        -68.81           -73.81    -76.04
                       2           0.4383                     3.6428          0.000082
2-component mixture    1           0.8132                     1.9125          -0.001247       0.3623          -60.90           -67.90    -71.02
                       2           0.1868                     2.4064          -0.001294       0.3790
2-component mixture    1           0.8173                     1.9094          -0.001260       0.3640          -60.91           -65.91    -68.14
                       2           0.1827                     2.4768
3-component mixture    1           0.5918                     1.8484          -0.001190       0.3640          -60.78           -71.78    -76.68
                       2           0.3241                     2.8535          -0.000154       0.1476
                       3           0.0841                     5.9320          -0.000100       -0.3895

Table 2.17: Parameter estimates for five estimation methods for assay data.

Method                                       Intercept          Dose                    log(Dose + 10)      Mixing probability   Unexplained variance
Poisson regression, complete                 2.173 (0.2183)     -0.001013 (0.000245)    0.3198 (0.05698)                         1.0
Poisson regression, incomplete (3)           2.308 (0.2266)     -0.000750 (0.000265)    0.2632 (0.0607)                          1.0
Quasi-likelihood (1)                         2.203 (0.3634)     -0.000974 (0.000437)    0.3110 (0.09901)                         0.07181
Negative binomial (2)                        2.203 (0.359)      -0.000980 (0.000381)    0.313 (0.0868)                           0.0488
Mixed Poisson regression, comp 1             1.9094 (0.2674)                                                0.8173
Mixed Poisson regression, comp 2             2.3768 (0.2753)                                                0.1827
Mixed Poisson regression, both components                       -0.001260 (0.000275)    0.3640 (0.0665)

(1) as appeared in Breslow (1984). (2) as appeared in Lawless (1987b). (3) The data excluding the 12th observation.

Figure 2.1: The index plot of the Pearson residuals from the fitted 3-component mixed Poisson regression model for the patent data.

Figure 2.2: The index plot of the deviance residuals from the fitted 3-component mixed Poisson regression model for the patent data.

Figure 2.3: The index plot of the likelihood residuals from the fitted 3-component mixed Poisson regression model for the patent data.

Figure 2.4: The index plot of the average relative coefficient changes from the fitted 3-component mixed Poisson regression model for the patent data.

Figure 2.5: The plot of the patent data. (Horizontal axis: log(R&D).)

Figure 2.6: The classification of the patent data according to the estimated posterior probabilities based on the fitted mixed Poisson regression model. (Horizontal axis: log(R&D).)

Figure 2.7: Daily epileptic seizure counts. (Horizontal axis: day.)

Figure 2.8: Estimated hourly seizure rates and classification of the seizure data according to the estimated posterior probabilities based on the fitted mixed Poisson regression model. (Horizontal axis: day; baseline and treatment periods marked.)

Figure 2.9: Estimated mean and variance based on the fitted mixed Poisson regression model for the seizure data. (Horizontal axis: day; baseline and treatment periods marked.)

Figure 2.10: The index plot of the Pearson residuals from the fitted mixed Poisson regression model for the seizure data.

Figure 2.11: The index plot of the deviance residuals from the fitted mixed Poisson regression model for the seizure data.

Figure 2.12: The index plot of the likelihood residuals from the fitted mixed Poisson regression model for the seizure data.

Figure 2.13: The index plot of the average relative coefficient changes from the fitted mixed Poisson regression model for the seizure data. (The 6th observation is marked.)

Figure 2.14: The time plot of the terrorist bombing data. (Horizontal axis: 1968 to 1980.)

Figure 2.15: Classification of the terrorist bombing episodes according to the estimated posterior probabilities based on the fitted mixed Poisson regression model. (Horizontal axis: month; rate 1 = 1.6864, rate 2 = 6.3611, rate 3 = 14.044.)

Figure 2.16: The index plot of the Pearson residuals from the fitted mixed Poisson regression model for the terrorist bombing data.

Figure 2.17: The index plot of the deviance residuals from the fitted mixed Poisson regression model for the terrorist bombing data.

Figure 2.18: The index plot of the likelihood residuals from the fitted mixed Poisson regression model for the terrorist bombing data.

Figure 2.19: The index plot of the average relative coefficient changes from the fitted mixed Poisson regression model for the terrorist bombing data. (The 7th observation is marked.)

Figure 2.20: [caption illegible in the source; axis label: number of accidents]

Figure 2.21: The index plot of the Pearson residuals from the fitted mixed Poisson regression model for the accident data.

Figure 2.22: The index plot of the deviance residuals from the fitted Poisson regression model for the accident data.

Figure 2.23: The index plot of the likelihood residuals from the fitted mixed Poisson regression model for the accident data.
Figure 2.24: The index plot of the average relative coefficient changes from the fitted mixed Poisson regression model for the accident data. (The 434th observation is marked.)

Figure 2.25: Classification of the Ames salmonella assay data according to the estimated posterior probabilities based on the fitted mixed Poisson regression model. (Horizontal axis: dose of quinoline (x).)

Figure 2.26: The index plot of the Pearson residuals from the fitted mixed Poisson regression model for the Ames salmonella assay data.

Figure 2.27: The index plot of the deviance residuals from the fitted mixed Poisson regression model for the Ames salmonella assay data.

Figure 2.28: The index plot of the likelihood residuals from the fitted mixed Poisson regression model for the Ames salmonella assay data.

Figure 2.29: The index plot of the average relative coefficient changes from the mixed Poisson regression model for the Ames salmonella assay data. (The 12th observation is marked.)

Chapter 3

Mixed Logistic Regression Models

3.1 Logistic Regression and Its Modifications

The logistic regression model has been widely used for analyzing count data in which each observation consists of a finite valued response variable and a vector of covariates or predictors. Areas of application include epidemiology, quantal bioassay, and the social sciences. Sometimes the model fits poorly, suggesting the need for alternative models.
In this case, it is not uncommon for the observed data to be overdispersed relative to the binomial assumption. In the second part of this dissertation, mixed logistic regression models are introduced and investigated. These models are applicable in several different situations where the usual logistic regression model is inadequate. They provide an alternative to the quasi-likelihood approach and others for modelling extra-binomial variation, with a more meaningful interpretation.

Suppose that the $i$th response $Y_i$ is a count of successes in $m_i$ trials, and associated with this response is a covariate vector $x_i = (x_{i1}, \ldots, x_{ik})'$ for $1 \le i \le n$. The logistic regression model assumes that the $Y_i$ are distributed independently binomial$(m_i, \pi_i)$ with density function given by

$$f(y_i \mid \alpha, x_i, m_i) = \binom{m_i}{y_i} \pi_i^{y_i} (1 - \pi_i)^{m_i - y_i},$$

where $\pi_i = \pi(x_i, \alpha) = \exp(x_i'\alpha)/(1 + \exp(x_i'\alpha))$, $\alpha \in R^k$ is an unknown regression parameter vector, $m_i$ is an integer and $y_i = 0, 1, \ldots, m_i$. Note that the binomial parameter $\pi_i$ is related to the linear part, $x_i'\alpha$, through a logit transformation. Note also that $m_i$ may vary with $i$.

The logistic regression model may be used as follows. Sometimes, inference concerning the $\alpha$'s is of primary importance. For example, when $m_i = 1$, $Y_i = 1$ may denote the occurrence of a particular event of interest. Large $\alpha$'s (relative to their standard errors) correspond to factors which increase the chance of the event.

There are several reasons for the widespread popularity of the logistic regression model. Cox (1970) argues from considerations of sufficiency. By writing down the likelihood based on $\{(y_1, x_1), \ldots, (y_n, x_n)\}$, one discovers that the vector $(\sum_i y_i x_{i1}, \ldots, \sum_i y_i x_{ik})$ is sufficient for $\alpha$. Cox (1970) feels that this model is the most useful analogue, for binomial data, of the normal linear model.
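The binomial density and the logit relationship introduced at the start of this section can be illustrated numerically. The following Python fragment is only a minimal sketch and is not the FORTRAN code mentioned in the abstract; the function names and the toy data are hypothetical and chosen purely for illustration.

```python
import math

def logistic_prob(x, alpha):
    """pi(x, alpha) = exp(x'alpha) / (1 + exp(x'alpha))."""
    eta = sum(xj * aj for xj, aj in zip(x, alpha))  # linear part x'alpha
    return 1.0 / (1.0 + math.exp(-eta))

def binomial_logpmf(y, m, pi):
    """log of C(m, y) * pi**y * (1 - pi)**(m - y), for 0 < pi < 1."""
    return (math.log(math.comb(m, y))
            + y * math.log(pi)
            + (m - y) * math.log(1.0 - pi))

def log_likelihood(alpha, data):
    """Sum of binomial log-densities over records (y_i, m_i, x_i)."""
    return sum(binomial_logpmf(y, m, logistic_prob(x, alpha))
               for y, m, x in data)

# Hypothetical toy data: (successes, trials, covariates including an intercept).
toy_data = [(3, 10, (1.0, -1.0)), (5, 10, (1.0, 0.0)), (8, 10, (1.0, 1.0))]
print(log_likelihood((0.0, 1.0), toy_data))
```

Maximizing `log_likelihood` over `alpha` would give the usual maximum likelihood estimates; the sketch deliberately stops at evaluating the likelihood.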
When the covariates are nominal or ordinal, there is a correspondence between the logistic parameters and the parameters of a log linear model for cross-classified data (Fienberg (1981)). Finally, inference for the $\alpha$'s remains unaffected regardless of whether the data are sampled prospectively or retrospectively (see, for example, McCullagh and Nelder (1989)).

The logistic regression model is an example of a Generalized Linear Model (GLM), which is discussed by McCullagh and Nelder (1989). GLMs are models for regression data, i.e. a response $Y$ measured along with a vector of covariates $x$. Under the GLM formulation, the response $Y$ has a distribution which is a member of the exponential family, and some monotonic differentiable function $g(\mu)$ of the expected value $\mu = E(Y)$ (called the link function) is expressed as a linear combination of covariates and parameters. For binomial regression data, the proportion $Y/m$ is regarded as the response and $E(Y/m) = \pi$. Hence for the logistic regression model, the link function is the logit function, i.e.,

$$g(\pi) = \log(\pi/(1 - \pi)).$$

When the logistic regression model fits the data poorly there are several alternative models to consider. Using the GLM formulation, these alternatives can be dichotomized into link function or frequency distribution modifications.

To understand some of these generalizations, it becomes important to distinguish between two types of data sets. Suppose that in a designed experiment, experimental units are sampled and a 0-1 response along with some covariates is recorded for each unit. We call such data sets ungrouped or point binomial, and the fundamental experimental units Bernoulli experimental units. Observations of this nature arise, for instance, in some medical trials where an end-period result for each patient (experimental unit) is either recovered ($Y = 1$) or unrecovered ($Y = 0$).
Alternatively, if 0-1 responses are grouped under each experimental condition and the cumulative number of positive responses for each condition is recorded along with a vector of covariates describing the condition, we call such data sets grouped and the fundamental experimental units binomial ones. In a toxicity experiment, for example, tanks of fish are exposed to some toxic agent at several levels and the incidence of liver tumors in each tank is recorded. Here the tanks are the fundamental experimental units and each provides a $0, 1, \ldots, m_i$ response, where $m_i$ is the size of the $i$th tank.

With the logistic regression model, the distinction between these two types of data sets is superfluous. The log-likelihoods under the two regimes differ only by an irrelevant constant term $\sum_i \ln \binom{m_i}{y_i}$, and inference remains unaffected. When considering generalizations, however, the distinction between the two types of data can be crucial. While grouped data can be modelled by non-binomial frequency distributions, with ungrouped data we do not have this option. Any model for a Bernoulli response, $Y = 0$ or $1$, is determined by $P(Y = 1)$, which specifies a binomial model with $m = 1$ and $\pi = P(Y = 1)$.

3.1.1 Link Modifications

A wide choice of link function $g(\pi)$ is available. In addition to the logistic function, at least two other functions are commonly used in practice: (1) the probit function $g(\pi) = \Phi^{-1}(\pi)$, where $\Phi^{-1}(\pi)$ is the inverse of the standard Normal integral. This function is symmetric in $\pi$, and for any value of $\pi$ in the range $(0, 1)$ the corresponding value of the probit of $\pi$ will lie between $-\infty$ and $\infty$. Note that when $\pi = 0.5$, probit$(\pi) = 0$; and (2) the complementary log-log function $\ln(-\ln(1 - \pi))$. This function again transforms a probability in the range $(0, 1)$ to a value in $(-\infty, \infty)$, but unlike the logistic and probit transformations, this function is not symmetric about $\pi = 0.5$.
Note that all three link functions can be regarded as special cases of a general procedure that relates the probability of a positive response to the covariates through a link $G^{-1}(\pi)$, where $G$ is some continuous distribution function. In fact, the logistic link is the inverse of the logistic distribution function, which is defined as $\pi(z) = \exp(z)/(1 + \exp(z)) = \Pr(Z \le z)$, where $Z$ is a standard logistic random variable. Similarly, the complementary log-log link can be derived by taking the inverse of the extreme value distribution function as the link function.

McCullagh and Nelder (1989) discuss and compare these link functions. Of these three link functions, the use of the complementary log-log function is limited to those situations where it is appropriate to deal with success probabilities in an asymmetric manner. The logit and probit link functions are quite similar to each other, but from a computational viewpoint the logistic transformation is more convenient because it has an explicit analytical form. There are two other reasons why the logit link function is preferred to the other two link functions. First, it has a direct interpretation in terms of the logarithm of the odds in favor of a success. Second, models based on the logit link function are particularly appropriate for the analysis of data that have been collected retrospectively, such as in a case-control study.

Other links include the angular, $g(\pi) = \sin^{-1}(\pi^{1/2})$, and the linear, $g(\pi) = \pi$. These links are discussed in Cox (1970). Of the links discussed above, the linear, angular, probit and logit are symmetric in the sense that $g^{-1}(z) = 1 - g^{-1}(-z)$, and these links are similar for probabilities in the range $(0.1, 0.9)$.

Relaxing the requirement that $g(\pi)$ be a linear function of the covariates, we can use nonlinear link functions to obtain a richer class of probability functions than the class specified by a linear link.
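The symmetry properties of the logit, probit and complementary log-log links discussed above can be checked numerically. The Python sketch below is illustrative only; the function names are hypothetical, and the probit is computed with the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def logit(p):
    """log(p / (1 - p)), the log-odds."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Inverse of the standard normal distribution function."""
    return _STD_NORMAL.inv_cdf(p)

def cloglog(p):
    """Complementary log-log: ln(-ln(1 - p))."""
    return math.log(-math.log(1.0 - p))

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def inv_probit(z):
    return _STD_NORMAL.cdf(z)

def inv_cloglog(z):
    return 1.0 - math.exp(-math.exp(z))

# Tabulate the three links on a few probabilities.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(logit(p), 4), round(probit(p), 4), round(cloglog(p), 4))
```

For the logit and probit, the values at $p$ and $1 - p$ differ only in sign, whereas the complementary log-log values do not, which is the asymmetry described in the text.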
Prentice (1976) generalizes the logistic link to

$$\pi = \int_{-\infty}^{x'\alpha} \frac{\exp(w\gamma_1)\,(1 + \exp(w))^{-(\gamma_1 + \gamma_2)}}{B(\gamma_1, \gamma_2)}\,dw = \int_{-\infty}^{x'\alpha} f(w)\,dw, \tag{3.1}$$

where $B(a, b)$ is the beta function. When $\gamma_1 = \gamma_2 = 1$, inverting (3.1) yields the logistic link. The parameters $\gamma_1$ and $\gamma_2$ indicate skewness and heaviness of tails of the density $f(w)$. Other special cases of $f(w)$ are extreme minimum value, extreme maximum value, probit, exponential, reflected exponential, and double exponential. Thus this model can be viewed as specifying a richer class of threshold distributions than the logistic alone. Other link functions include the power transformations of the logit probability (Aranda-Ordaz (1981) and Guerrero and Johnson (1982)). A problem with these nonlinear link functions is that in some cases it may be difficult to compute the maximum likelihood estimates under the corresponding models. With the development of high speed computers, this problem may become less important.

Carroll et al. (1984) modify the probit link function by including covariates measured with error in Bernoulli experiments. With normal measurement errors, they discuss procedures to compute estimates for this model. They also demonstrate that the usual estimate of the probability of a positive response can be substantially in error when covariates are measured with non-trivial error. Their modification differs from the previous link alternatives in that the modified link is derived to accommodate a specific problem.

These approaches try to modify or enrich the basic logistic model by focusing on the relationship between the covariates and the probability of a positive response.

3.1.2 Frequency Distribution Modifications

A consequence of using the binomial frequency distribution in the logistic regression is that $\mathrm{Var}(Y) = m\pi(x, \alpha)(1 - \pi(x, \alpha))$. In practice, however, we often have $\mathrm{Var}(Y) > m\pi(x, \alpha)(1 - \pi(x, \alpha))$, suggesting the need for alternative frequency distributions. This may be reflected in an over-large residual deviance and in adjusted residuals which have a variance greater than 1. We note that if a positive response $Y$ can be expressed as the sum of $m$ independent Bernoulli random variables each with success probability $\pi(x, \alpha)$, then $\mathrm{Var}(Y) = m\pi(x, \alpha)(1 - \pi(x, \alpha))$. Hence, to use a non-binomial frequency distribution implicitly requires viewing $Y$ as the fundamental response, that is, to have binomial experimental units. Several researchers have proposed approaches to accommodate extra-binomial variability.

Without covariates, an alternative frequency distribution is the beta-binomial distribution

$$f(y \mid a, b, m) = \int_0^1 \binom{m}{y} \frac{\pi^{y + a - 1}(1 - \pi)^{m - y + b - 1}}{B(a, b)}\,d\pi. \tag{3.2}$$

The model is derived by assuming that the binomial parameter $\pi$ is a positive random variable following a beta$(a, b)$ mixing distribution. Hence the marginal distribution of the response $Y$ is the beta-binomial. Williams (1975) discusses this model for the data from completely randomized toxicological experiments in which the experimental units are animal litters. In the model, the number of deaths among pups within a litter is assumed to have a beta-binomial distribution. This is a sensible situation in which to consider binomial generalizations, because litter mates often tend to respond more alike than pups from different litters, whereas a binomial model assumes independence between litter mates.

Several researchers generalize the beta-binomial distribution to incorporate covariates in the parameters for some particular applications. Crowder (1978) generalizes the beta-binomial model for 1 and 2 way layouts. It is not obvious, however, how his approach generalizes to continuous covariates. A difficulty with generalizing the beta-binomial to allow more complicated settings, for example continuous covariates, is that one ought to somehow relate the beta-binomial parameters $a$ and $b$ to a covariate vector $x$ via some functions $a(x)$ and $b(x)$.
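The extra-binomial variation implied by the beta-binomial distribution (3.2) can be seen in a small numerical sketch. The Python fragment below is illustrative only: it uses the standard closed form of the integral in (3.2), $\binom{m}{y} B(y + a, m - y + b)/B(a, b)$, and the values of $m$, $a$ and $b$ as well as the function names are hypothetical.

```python
import math

def log_beta(a, b):
    """log B(a, b), computed through log-gamma functions."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(y, m, a, b):
    """Closed form of (3.2): C(m, y) * B(y + a, m - y + b) / B(a, b)."""
    return math.comb(m, y) * math.exp(log_beta(y + a, m - y + b) - log_beta(a, b))

def mean_and_variance(pmf):
    """Mean and variance of a distribution on 0, 1, ..., len(pmf) - 1."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var

m, a, b = 10, 2.0, 3.0  # hypothetical values, for illustration only
pmf = [beta_binomial_pmf(y, m, a, b) for y in range(m + 1)]
mean, var = mean_and_variance(pmf)
pi = a / (a + b)                     # mean of the beta mixing distribution
binomial_var = m * pi * (1.0 - pi)   # variance under a plain binomial(m, pi)
print(mean, var, binomial_var)
```

With the same mean $m\pi$, the beta-binomial variance exceeds the binomial variance $m\pi(1 - \pi)$ by the factor $1 + (m - 1)/(a + b + 1)$, which is the kind of overdispersion this subsection is concerned with.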
As Ochi and Prentice (1984) point out, it is hard to specify such functions with intuitive appeal. Otake and Prentice (1984) model the number of aberrant cells in samples of 100 cells taken from human survivors of the atom bombings of Hiroshima and Nagasaki. Possibly due to measurement error in the radiation doses, the data exhibit extra-binomial variability. At each unique $x$ vector, they estimate $a(x)$ and $b(x)$ by maximum likelihood using the beta-binomial model of equation (3.2). They then fit a linear model $\hat{\mu}(x) = x'\alpha$ via weighted least squares, where $\hat{\mu}(x)$ is the average number of responses at covariate vector $x$. The weights are the inverses of the estimated variance of $\hat{\mu}(x)$ (based on $\hat{a}(x)$ and $\hat{b}(x)$) under the beta-binomial model. They point out that failure to accommodate this variability results in overly precise inference concerning the $\alpha$'s.

Pierce and Sands (1975) use a different approach. They assume that unmeasured covariates or measurement errors might have an additive random effect on the log-odds scale, and that $\mathrm{logit}(\pi) = x'\alpha$, where the intercept $\alpha_0$ is distributed as a normal $N(\mu, \sigma^2)$ random variable. Likelihood estimation and residual analysis are discussed, as well as an approximate analysis necessitated by the complicated nature of the likelihood function.

Efron (1986) introduces double exponential families as constituent distributions in GLMs, in which means and variances are allowed to depend on covariates. As an example of his model, he modifies the binomial distribution $(m, \pi)$ by rescaling it with sample size $m$ to define a double binomial family

$$f(y \mid \pi, \theta, m) = c(\pi, \theta, m)\,\theta^{1/2}\,\{g_{\pi,m}(y)\}^{\theta}\,\{g_{y,m}(y)\}^{1 - \theta}\,[dG_m(y)],$$

where

$$g_{\pi,m}(y) = \binom{m}{my} \pi^{my} (1 - \pi)^{m(1 - y)},$$

$G_m(y)$ is the discrete distribution putting mass $\binom{m}{my} 2^{-m}$ at $y = 0, 1/m, \ldots, 1$, and $c(\pi, \theta, m)$ satisfies

$$\int f(y \mid \pi, \theta, m)\,dG_m(y) = 1.$$
Based on this model, he analyzes the toxoplasmosis data by incorporating covariates into the mean and variance in such a way that $\mathrm{logit}(\pi_i) = \alpha_0 + \alpha_1 x_i + \alpha_2 x_i^2 + \alpha_3 x_i^3$, where $x_i$ is the standardized rainfall for city $i$, and $\theta_i = 1.25/(1 + \exp(-\lambda_i))$, where $\lambda_i$ is modelled as a function of $M_i$, the standardized value of the sample size $m_i$ for city $i$.

Another approach for modifying the binomial frequency distribution is quasi-likelihood, which specifies only the first two moments of $Y$ rather than the complete distribution. The attraction is that unduly rigorous assumptions about the frequency distribution are avoided. To model binomial regression data $(y_1, x_1), \ldots, (y_n, x_n)$, McCullagh and Nelder (1983) suggest assuming that $E(Y_i) = m_i\pi_i$ and $\mathrm{Var}(Y_i) = m_i\sigma^2\pi_i(1 - \pi_i)$ rather than specifying a complete distribution for $Y_i$, where $\pi_i = \pi(x_i, \alpha)$. This approach is similar to one advocated by Finney (1971), who used the probit instead of the logit link. Note that for the logistic regression model, $\sigma^2 = 1$. Therefore $\sigma^2 > 1$ corresponds to extra-binomial variability or overdispersion, while $\sigma^2 < 1$ corresponds to underdispersion. Since the complete distribution of $Y_i$ is not specified, maximum likelihood estimation is precluded. Estimates of $\alpha$ and $\sigma^2$ are computed via a quasi-likelihood (Wedderburn, 1974) approach. In fact, the maximum quasi-likelihood estimates of $\alpha$ are the same as the usual logistic regression maximum likelihood estimates regardless of the value of $\sigma^2$, and the moment estimate of $\sigma^2$ equals Pearson's chi-square value divided by the degrees of freedom. This estimate is consistent in the limit as the number of observations increases to infinity with $m$ fixed, and its asymptotic distribution is known (McCullagh and Nelder (1983)). Another estimate of $\sigma^2$ is obtained as the deviance divided by the degrees of freedom. A problem with this model is its lack of interpretation: it cannot explain the cause of overdispersion, as other quasi-likelihood models such as that of Williams do.

Williams (1982) considers two quasi-likelihood models which fine tune the previous approach. By regarding the binomial parameter as an unspecified random variable $\Pi_i$ following a continuous mixing distribution on $(0, 1)$ with $E(\Pi_i) = \theta_i$ and $\mathrm{Var}(\Pi_i) = \phi\theta_i(1 - \theta_i)$, he shows that the unconditional mean and variance of $Y_i$ are

$$E(Y_i) = m_i\theta_i$$

and

$$\mathrm{Var}(Y_i) = m_i\theta_i(1 - \theta_i)(1 + \phi(m_i - 1)),$$

where $\theta_i = \exp(x_i'\alpha)/(1 + \exp(x_i'\alpha))$. Note that in the absence of random variation in the response probabilities, $Y_i$ would have a binomial distribution, $\mathrm{Bi}(m_i, \theta_i)$, and in this case $\mathrm{Var}(Y_i) = m_i\theta_i(1 - \theta_i)$. This corresponds to the situation where $\phi = 0$ in the above equation. On the other hand, if there is variation amongst the response probabilities, so that $\phi$ is greater than zero, the unconditional variance of $Y_i$ will exceed $m_i\theta_i(1 - \theta_i)$ by a factor $(1 + \phi(m_i - 1))$. Thus variation amongst the response probabilities causes the variance of the observed number of successes to be greater than it would have been if the response probabilities did not vary at random, resulting in overdispersion.

As Ochi and Prentice (1984) and Collett (1991) mention, this model can also be derived by assuming that there is a common correlation between the Bernoulli responses within a binomial experimental unit. Suppose that the $i$th of $n$ sets of binary data consists of $Y_i$ successes in $m_i$ observations. Let $R_{i1}, \ldots, R_{im_i}$ be the random variables associated with the $m_i$ observations in this set, where $R_{ij} = 1$, for $j = 1, \ldots, m_i$, corresponds to a success, and $R_{ij} = 0$ to a failure. Now suppose that the probability of a success is $\theta_i$, so that $P(R_{ij} = 1) = \theta_i$, $E(R_{ij}) = \theta_i$ and $\mathrm{Var}(R_{ij}) = \theta_i(1 - \theta_i)$. The number of successes $Y_i$ is then the random variable $\sum_{j=1}^{m_i} R_{ij}$, and so $E(Y_i) = \sum_{j=1}^{m_i} E(R_{ij}) = m_i\theta_i$, and the variance of $Y_i$ is given by

$$\mathrm{Var}(Y_i) = \sum_{j=1}^{m_i} \mathrm{Var}(R_{ij}) + \sum_{j=1}^{m_i} \sum_{k \ne j} \mathrm{Cov}(R_{ij}, R_{ik}),$$

where $\mathrm{Cov}(R_{ij}, R_{ik})$ is the covariance between $R_{ij}$ and $R_{ik}$ for $j \ne k$, and $k = 1, \ldots, m_i$. If the $m_i$ random variables $R_{i1}, \ldots, R_{im_i}$ were mutually independent, each of these covariance terms would be zero. However, since we assume that the correlation between $R_{ij}$ and $R_{ik}$ is

$$\delta = \frac{\mathrm{Cov}(R_{ij}, R_{ik})}{\sqrt{\mathrm{Var}(R_{ij})\,\mathrm{Var}(R_{ik})}},$$

we have $\mathrm{Cov}(R_{ij}, R_{ik}) = \delta\theta_i(1 - \theta_i)$ and

$$\mathrm{Var}(Y_i) = \sum_{j=1}^{m_i} \theta_i(1 - \theta_i) + \sum_{j=1}^{m_i} \sum_{k \ne j} \delta\theta_i(1 - \theta_i) = m_i\theta_i(1 - \theta_i) + m_i(m_i - 1)[\delta\theta_i(1 - \theta_i)] = m_i\theta_i(1 - \theta_i)[1 + (m_i - 1)\delta].$$

Note that the approach of McCullagh and Nelder lacks this interpretation unless $m_i = m$ for $i = 1, \ldots, n$. An iterative algorithm which produces estimates of $\alpha$ and $\phi$ is also presented. Unlike the approach of McCullagh and Nelder, the estimates of $\alpha$ may be different from the usual logistic regression maximum likelihood estimates unless $m_i = m$ for $i = 1, \ldots, n$.

Williams (1982) also discusses another model where the logit of $\pi_i$ is a random variable with $E(\mathrm{logit}(\pi_i)) = x_i'\alpha$ and $\mathrm{Var}(\mathrm{logit}(\pi_i)) = \sigma^2$. As a consequence of this assumption, the true response probability is a random variable $\Pi_i$ whose expected value is $\theta_i$. The resulting model for $\mathrm{logit}(\Pi_i)$ is then

$$\mathrm{logit}(\Pi_i) = x_i'\alpha + \delta_i,$$

and the term $\delta_i$ is known as a random effect. This model generalizes the approach of Pierce and Sands by relaxing the assumption that the intercept of the regression has a normal distribution. Williams (1982) notes that these two models are quite similar, though the latter has a more elegant interpretation since the fixed and random effects are on the same scale.

Follmann and Lambert (1989) propose a non-parametric mixture of logistic regression models in which the intercept in the regression is a random variable with an unknown mixing probability distribution, and the other regression coefficients are unknown constants.
The mixed probability function of the response $Y$ associated with a covariate vector $x$ and $m$ trials is given by

$$f(y \mid x, \alpha, m, H) = \int_{-\infty}^{\infty} \binom{m}{y}\, \pi(a + x'\alpha)^{y}\,\bigl(1 - \pi(a + x'\alpha)\bigr)^{m-y}\, dH(a), \tag{3.3}$$

where $\pi(a + x'\alpha) = \exp(a + x'\alpha)/(1 + \exp(a + x'\alpha))$ and $y = 0, 1, \dots, m$. Although the mixing distribution $H$ is not indexed by parameters, Laird (1978) has shown, under general conditions, that when estimating any mixture model (without covariates), the nonparametric maximum likelihood estimator of $H$ is a step function with a finite number of steps. Lindsay (1983) also discusses some general results for nonparametric mixtures. He shows that existence, uniqueness and support size of the maximum likelihood estimate are related to properties of the convex hull of the likelihood. The results of Laird (1978) and Lindsay (1983) imply that, as far as the maximum likelihood estimate is concerned, it makes no difference whether $H$ is assumed to be a nonparametric distribution or a discrete distribution with $c$ points of support, where $c$ is an unknown finite integer. In this sense, (3.3) may be equivalently expressed by a finite mixture with an unknown number of components. In the next section we will propose a mixed logistic regression model which generalizes Follmann and Lambert's model.

3.2 Tests for Extra-binomial Variation

To check whether data are overdispersed relative to the binomial assumption, we need a way to test for extra-binomial variation for regression type data. Note that it may be misleading to test for extra-binomial variation by fitting a more comprehensive model that includes the binomial, and then testing a reduction to the simple model using, for instance, a likelihood ratio test. Lawless (1987a) points out that in some circumstances the asymptotic distributions used in these cases may be unreliable, as they tend to underestimate the evidence against the base model.
An informal approach to detect extra-binomial variation is to use convexity plots (Lindsay and Roeder, 1992, and Lambert and Roeder, 1993). For example, Lambert and Roeder (1993) define the following function $C(\pi)$ and propose plotting it against $\pi$ for logistic regression:

$$C(\pi) = n^{-1}\sum_{i=1}^{n}\left(\frac{\pi}{\hat\pi_i}\right)^{y_i}\left(\frac{1-\pi}{1-\hat\pi_i}\right)^{m_i - y_i},$$

where $\hat\pi_i = \exp(x_i'\hat\alpha)/(1 + \exp(x_i'\hat\alpha))$, $\hat\alpha$ is the maximum likelihood estimate of the regression parameter vector $\alpha$, and $\pi \in (0, 1)$. They prove that if observations are generated by a logistic regression model with random coefficients or random means, $C(\pi)$ is approximately convex for a large sample. Therefore, the more convex $C(\pi)$ appears, the more evidence there is of overdispersion or an omitted variable. Note that this approach cannot distinguish overdispersion from a lack-of-fit problem.

Several researchers use score tests for extra-binomial variation by fitting the binomial model as a first step in the model building and testing for overdispersion. Tarone (1976) considers a correlated binomial alternative model, and applies the $C(\alpha)$ procedure of Neyman (1959) to derive the score test statistic for the adequacy of the binomial distribution. Taking a different approach, Efron (1986) derives the score test statistic against beta-binomial alternatives. Dean (1992) develops a unifying theory for the score tests mentioned above and provides three score test statistics for the hypothesis of no overdispersion in the usual logistic regression model against alternatives based on three different forms of extra-binomial variation respectively.
These score test statistics are

$$N_a = \frac{\sum_{i=1}^{n}\{(y_i - m_i\hat\pi_i)^2 - m_i\hat\pi_i(1-\hat\pi_i)\}}{\hat V^{1/2}},$$

$$N_b = \frac{\sum_{i=1}^{n}\{[\hat\pi_i(1-\hat\pi_i)]^{-1}[(y_i - m_i\hat\pi_i)^2 + \hat\pi_i(y_i - m_i\hat\pi_i) - y_i(1-\hat\pi_i)]\}}{\{2\sum_{i=1}^{n} m_i(m_i-1)\}^{1/2}},$$

$$N_c = \frac{\sum_{i=1}^{n}\{[(m_i-1)\hat\pi_i(1-\hat\pi_i)]^{-1}[(y_i - m_i\hat\pi_i)^2 + \hat\pi_i(y_i - m_i\hat\pi_i) - y_i(1-\hat\pi_i)]\}}{\{2\sum_{i=1}^{n} m_i(m_i-1)^{-1}\}^{1/2}},$$

corresponding to the following specifications of overdispersion:

(a) $E(Y_i) = m_i\pi_i$ and $\mathrm{Var}(Y_i) = m_i\pi_i(1-\pi_i)[1 + \theta(m_i-1)\pi_i(1-\pi_i)]$ for $\theta$ small,
(b) $E(Y_i) = m_i\pi_i$ and $\mathrm{Var}(Y_i) = m_i\pi_i(1-\pi_i)[1 + \theta(m_i-1)]$, and
(c) $E(Y_i) = m_i\pi_i$ and $\mathrm{Var}(Y_i) = m_i\pi_i(1-\pi_i)(1+\theta)$ for $\theta > 0$.

In the formula for $N_a$, $\hat V$ is calculated by

$$\hat V = \sum_{i=1}^{n}\{2m_i^2\hat\pi_i^2(1-\hat\pi_i)^2 + m_i\hat\pi_i(1-\hat\pi_i)(1 - 6\hat\pi_i + 6\hat\pi_i^2)\} - \sum_{i=1}^{n}\sum_{j=1}^{n}\frac{W_{2i}W_{2j}h_{ij}}{(W_{1i}W_{1j})^{1/2}},$$

where $W_{1i} = m_i\hat\pi_i(1-\hat\pi_i)$, $W_{2i} = m_i\hat\pi_i(1-\hat\pi_i)(1-2\hat\pi_i)$, and $h_{ij}$ are the elements of the matrix $H = W_1^{1/2}X(X'W_1X)^{-1}X'W_1^{1/2}$, where $W_1 = \mathrm{diag}\{W_{11}, \dots, W_{1n}\}$ and $X$ is the design matrix. In the above three formulae $\hat\pi_i$ is the estimated probability of a positive response for the $i$th observation based on the usual logistic regression. Under the null hypothesis $H_0: \theta = 0$, each statistic asymptotically follows the standard normal distribution. Note that the first two types of overdispersion, (a) and (b), are the mean-variance relationships of the models proposed by Williams (1982), and (c) is that introduced by McCullagh and Nelder (1983).

3.3 A Mixed Logistic Regression Model

Without covariates the finite mixture approach has been widely used in many applications (cf. Titterington et al. (1985)). With covariates, however, this approach has not been systematically studied or directly applied to the analysis of binomial response data. In this section we extend the finite binomial mixture model to the logistic regression model by allowing both the component binomial parameters and the mixing probabilities to depend on covariates. We investigate some basic features of the model. We also discuss identifiability for the model and provide sufficient conditions for identifiability.
3.3.1 The Model

Let the random variable $Y_i$ denote the $i$th binomial response variable, and let $\{(y_i, m_i, x_i);\ i = 1, \dots, n\}$ denote the observations, where $y_i$ is the observed value of $Y_i$, $m_i$ is the total number of trials and $x_i = (x_i^{(m)}, x_i^{(r)})$ is the $k$-dimensional covariate vector associated with $y_i$. Note that $x_i^{(m)}$ and $x_i^{(r)}$ are $k_1$-dimensional and $k_2$-dimensional vectors corresponding to the regression part of the mixing probabilities and of the component binomial parameters respectively. Usually the first element of $x_i^{(m)}$ and of $x_i^{(r)}$ is 1, corresponding to an intercept. Our mixed logistic regression model assumes:

(1) The unobserved mixing process can occupy any one of $c$ states, where $c$ is finite and unknown;

(2) For each observed binomial response $y_i$, associated with a binomial denominator $m_i$, there is an unobserved random variable, $\Pi_i$, representing the component which generates $y_i$. Further, the $(Y_i, \Pi_i)$ are pairwise independent;

(3) Conditional on the covariate $x_i^{(m)}$, $\Pi_i$ follows a discrete distribution with $c$ points of support, and $\Pr(\Pi_i = j \mid x_i^{(m)}, \beta) = p_j(x_i^{(m)}, \beta)$, where $\sum_{j=1}^{c} p_j(x_i^{(m)}, \beta) = 1$ and $p_{ij} = p_j(x_i^{(m)}, \beta)$ is defined by

$$p_{ij} = \frac{\exp(\beta_j' x_i^{(m)})}{1 + \sum_{l=1}^{c-1}\exp(\beta_l' x_i^{(m)})} \quad\text{for } j = 1, \dots, c-1, \tag{3.4}$$

and

$$p_{ic} = p_c(x_i^{(m)}, \beta) = 1 - \sum_{j=1}^{c-1} p_{ij}, \tag{3.5}$$

where $\beta = (\beta_1', \dots, \beta_{c-1}')'$ and $\beta_j = (\beta_{j1}, \dots, \beta_{jk_1})'$, $j = 1, \dots, c-1$, are unknown parameters. In fact, conditional on $x_i^{(m)}$, $\Pi_i$ follows a multinomial distribution $(1, p_{i1}, \dots, p_{ic})$. Note that $\beta$ appears in each $p_{ij}$ for $1 \le j \le c$;

(4) Conditional on $\Pi_i = j$ and the binomial denominator $m_i$, $Y_i$ follows a binomial distribution which we denote by

$$\mathrm{bi}(y_i \mid m_i, \pi_{ij}) = \binom{m_i}{y_i}\,\pi_{ij}^{y_i}\,(1-\pi_{ij})^{m_i - y_i}, \tag{3.6}$$

where

$$\pi_{ij} = \pi_j(x_i^{(r)}, \alpha_j) = \frac{\exp(\alpha_j' x_i^{(r)})}{1 + \exp(\alpha_j' x_i^{(r)})} \quad\text{for } j = 1, \dots, c,$$

and where $\alpha = (\alpha_1', \dots, \alpha_c')'$ and $\alpha_j = (\alpha_{j1}, \dots, \alpha_{jk_2})'$, $j = 1, \dots, c$, are unknown parameters.
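
The two links in assumptions (3) and (4) can be sketched in a few lines of Python. This is only an illustrative sketch, not the FORTRAN code of the dissertation: the function names `mixing_probs`, `success_prob` and `binom_pmf` are hypothetical, and the numerical values in the usage lines are chosen purely for illustration.

```python
import math

def mixing_probs(x_m, beta):
    """Mixing probabilities p_i1, ..., p_ic as in (3.4)-(3.5).

    x_m  : covariate vector of the mixing part (first entry 1 for an intercept)
    beta : list of c-1 coefficient vectors; component c is the reference.
    """
    etas = [sum(b * x for b, x in zip(beta_j, x_m)) for beta_j in beta]
    # Shift by the largest linear predictor (the reference component has
    # linear predictor 0) so that exp() cannot overflow.
    shift = max(etas + [0.0])
    weights = [math.exp(eta - shift) for eta in etas] + [math.exp(-shift)]
    total = sum(weights)
    return [w / total for w in weights]

def success_prob(x_r, alpha_j):
    """Component success probability pi_ij: inverse logit of alpha_j' x_r."""
    eta = sum(a * x for a, x in zip(alpha_j, x_r))
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def binom_pmf(y, m, pi):
    """Binomial probability bi(y | m, pi) as in (3.6)."""
    return math.comb(m, y) * pi ** y * (1.0 - pi) ** (m - y)

# Illustrative values (assumed, three components, one covariate equal to 0.2).
p = mixing_probs([1.0, 0.2], [[-2.1129, 1.6057], [-0.9692, 1.3805]])
print(p)  # roughly [0.1, 0.3, 0.6]
print(success_prob([1.0, 0.2], [-1.2962, -0.4505]))  # roughly 0.2
```

Treating the last component as the reference category, with its linear predictor fixed at 0, is what makes (3.5) hold automatically.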
Note that the component binomial parameters $\pi_{ij}$ relate to the covariates $x_i^{(r)}$ through the logit function. The above assumptions define the unconditional distribution of the observations, $y_i$, as a finite binomial mixture in which the mixing probabilities, $p_{ij}$, depend on the covariates $x_i^{(m)}$ through the multinomial link function, and the component distributions are binomial distributions with probabilities, $\pi_{ij}$, depending on the covariates $x_i^{(r)}$ through the logit function. If the observations can be classified into $c$ groups corresponding to the $c$ unobservable states, $\alpha_j$ may be interpreted as the coefficients of the logistic regression of observations in group $j$. On the other hand, $\beta$ may be interpreted as the coefficients of the multinomial regression in which $\Pi_i$ and $x_i^{(m)}$ are the dependent and independent variables respectively. Note that the model allows some or all components of $x_i^{(m)}$ and $x_i^{(r)}$ to be identical, and some coefficients, $\alpha$'s, to be constant across components, i.e., $\alpha_{jl} = \alpha_l$ for $j = 1, \dots, c$, or to be 0 in one or several covariates, i.e., $\alpha_{jl} = 0$ for some $j$, $j = 1, \dots, c$. We denote $X^{(m)} = (x_1^{(m)}, \dots, x_n^{(m)})'$ and $X^{(r)} = (x_1^{(r)}, \dots, x_n^{(r)})'$ as the two design matrices.

Under the above assumptions the probability function of $Y_i$ satisfies

$$f(y_i \mid x_i^{(r)}, x_i^{(m)}, m_i, \alpha, \beta) = \sum_{j=1}^{c} p_{ij}\,\mathrm{bi}(y_i \mid m_i, \pi_{ij}), \tag{3.7}$$

where $p_{ij}$ and $\mathrm{bi}(y_i \mid m_i, \pi_{ij})$ are specified by (3.4), (3.5) and (3.6) respectively. We may equivalently view the model as arising from the following sampling scheme: observations are independent; for observation $i$, component $j$ is chosen according to a multinomial distribution with probabilities $p_{ij}$; subsequently, $y_i$ is generated from a binomial distribution with $m_i$ trials and probability $\pi_{ij}$.

There are several justifications for mixed logistic regression models. Suppose that each experimental unit or object has some underlying propensity for a positive response which is captured by one of the $c$ response curves $\mathrm{logit}(\pi_j) = x^{(r)\prime}\alpha_j$ $(1 \le j \le c)$, and that the proportion of the experimental units captured by the $j$th curve depends on a covariate vector $x^{(m)}$, i.e., $p_j(x^{(m)}, \beta)$. Thus we are led to the model of equation (3.7). Another argument for the mixed logistic regression model is that the coefficient vector $\alpha$ in the usual logistic regression model, $\mathrm{logit}(\pi) = x^{(r)\prime}\alpha$, is a random variable with the discrete distribution $\Pr(\alpha = \alpha_j) = p_j$ for $j = 1, \dots, c$. By making the further assumption that the $p_j$ are related to a covariate vector $x^{(m)}$, we are led to the model of equation (3.7).

Note that the above model includes several interesting models as special cases. Some of them were previously studied.

• Choosing $c = 1$ yields a logistic regression model;

• Setting $x^{(m)} = (1)$ yields a finite mixed logistic regression model with constant mixing probabilities;

• Setting $x^{(m)} = (1)$, $x_{i1}^{(r)} = 1$ and $\alpha_{jl} = \alpha_{kl}$ for $l \ne 1$ yields Follmann and Lambert's model (1989);

• Setting $x^{(r)} = (1)$ yields the finite binomial mixture in which the component binomial parameters are constant and the mixing probabilities depend on covariates $x^{(m)}$.

3.3.2 Features of the Mixed Logistic Regression Models

To use the mixed logistic regression models we have to distinguish experimental units as either Bernoulli or binomial. For binary data ($m_i = 1$ for $1 \le i \le n$) we can rewrite equation (3.7) as

$$f(y_i \mid x_i^{(r)}, x_i^{(m)}, m_i, \alpha, \beta) = \Bigl[\sum_{j=1}^{c} p_{ij}\pi_{ij}\Bigr]^{y_i}\Bigl[1 - \sum_{j=1}^{c} p_{ij}\pi_{ij}\Bigr]^{1-y_i}.$$

The above equation implies that we only modify the link function, with the probability $\pi_i^* = \sum_{j=1}^{c} p_{ij}\pi_{ij}$. In this case, no matter whether the binary responses are heterogeneous, the responses always have Bernoulli distributions. This also means that the above model cannot adjust for overdispersion relative to the Bernoulli assumption. Furthermore, the model may not be identifiable without imposing some unrealistic restrictions on covariates.
Hence we recommend not using the mixed logistic regression models when dealing with binary data.

For binomial experimental units, the distribution defined by equation (3.7) is no longer a member of the exponential family, so the representation of a generalized linear model does not apply. In this case the component distributions have the logistic link, and the frequency distribution is a finite binomial mixture. For the mixed logistic regression models, the unconditional mean and variance of $Y_i$ are, respectively,

$$E(Y_i) = E(E(Y_i \mid \Pi_i)) = m_i\sum_{j=1}^{c} p_{ij}\pi_{ij} = m_i\pi_i^* \tag{3.8}$$

and

$$\mathrm{Var}(Y_i) = E(\mathrm{Var}(Y_i \mid \Pi_i)) + \mathrm{Var}(E(Y_i \mid \Pi_i)) = m_i\pi_i^*(1-\pi_i^*) + \frac{m_i - 1}{m_i}\,\mathrm{Var}(E(Y_i \mid \Pi_i)),$$

where

$$\mathrm{Var}(E(Y_i \mid \Pi_i)) = m_i^2\Bigl\{\sum_{j=1}^{c} p_{ij}\pi_{ij}^2 - \Bigl(\sum_{j=1}^{c} p_{ij}\pi_{ij}\Bigr)^2\Bigr\}. \tag{3.9}$$

Since $m_i > 1$, $\mathrm{Var}(E(Y_i \mid \Pi_i)) = 0$ holds if and only if $E(Y_i \mid \Pi_i)$ is constant. Hence, if we regard $\pi_i^*$ as the new probability, $\mathrm{Var}(Y_i) = m_i\pi_i^*(1-\pi_i^*)$ if and only if $\pi_{i1} = \dots = \pi_{ic}$ for $1 \le i \le n$. This implies that the proposed model is able to cope with extra-binomial variation among $Y_1, \dots, Y_n$ due to heterogeneity in the population.

3.3.3 Identifiability

To be able to reliably estimate the parameters of (3.7) we require the mixture to be identifiable, that is, two sets of parameters which do not agree after permutation cannot yield the same mixture distribution. Although an unrestricted class of finite binomial mixtures may not be identifiable, classes of finite mixtures of some subfamilies of binomials may be identifiable. Without covariates, Teicher (1961, 1963), Blischke (1964) and Margolin, Kim and Risko (1989) give necessary and sufficient conditions for identifiability of finite binomial mixtures. These results may be summarized as follows. In the binomial family $\mathrm{bi}(M, \pi)$, $0 < \pi < 1$, for fixed $M$ but varying $\pi$, the class of mixtures of at most $k$ members is identifiable if and only if $M \ge 2k - 1$.
That is, suppose there are two representations of the same mixture,

$$F_1(y) = \sum_{j=1}^{c_1} p_j\,\mathrm{bi}(y \mid M, \pi_j), \qquad F_2(y) = \sum_{j=1}^{c_2} \tilde p_j\,\mathrm{bi}(y \mid M, \tilde\pi_j),$$

with $y = 0, \dots, M$, $0 < \pi_j, \tilde\pi_j < 1$ and $c_1, c_2 \le k$. Then $F_1 = F_2$ implies $c_1 = c_2$, $\pi_j = \tilde\pi_j$ and $p_j = \tilde p_j$ for $j = 1, \dots, c_1$, if and only if $k \le (M+1)/2$.

With covariates, Follmann and Lambert (1991) discuss sufficient conditions for the identifiability of the nonparametric logistic regression model with common nonrandom regression coefficients and a random intercept with a finite, unknown mixing distribution. Note that their model may be equivalently viewed as a special case of our models. They show that for binary responses the number of components in the mixture must be bounded by a function of the number of covariate vectors that agree except for one coordinate; and for binomial responses the number of components must satisfy the same bound or be bounded by a function of the largest number of trials per response ($M$), i.e., $c \le (M+1)/2$.

To discuss sufficient conditions for identifiability in our case, we first define identifiability as follows.

Definition: Let $\mathcal{F}$ denote the class of probability models $\{f(y_i \mid x_i^{(r)}, x_i^{(m)}, m_i, \alpha, \beta),\ i = 1, \dots, n\}$ with at most $c$ components, a restriction that $\alpha_1 < \dots < \alpha_c$, parameter space $C \times U \times \Omega$, sample spaces $\mathcal{Y}_1, \dots, \mathcal{Y}_n$, and fixed total numbers of trials and covariate vectors $(m_i, (x_i^{(r)}, x_i^{(m)}))$, where $x_i^{(r)} \in R^{k_2}$ and $x_i^{(m)} \in R^{k_1}$ for $i = 1, \dots, n$. $\mathcal{F}$ is identifiable if for $(c, \alpha, \beta), (c^*, \alpha^*, \beta^*) \in C \times U \times \Omega$,

$$f(y_i \mid x_i^{(r)}, x_i^{(m)}, m_i, \alpha, \beta) = f(y_i \mid x_i^{(r)}, x_i^{(m)}, m_i, \alpha^*, \beta^*) \tag{3.10}$$

for all $y_i \in \mathcal{Y}_i$, $i = 1, \dots, n$, implies $(c, \alpha, \beta) = (c^*, \alpha^*, \beta^*)$.

Note that the order restriction in the definition means that two models are equivalent if they agree up to permutations of parameters. As in the setting without covariates, we give sufficient conditions for identifiability by imposing a restriction on $c$ specified by the minimum number of trials for proper subsets of the observations.
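We state them below.

Theorem 2. Let $S_\lambda = \{(y_{\lambda_i}, m_{\lambda_i}, x_{\lambda_i});\ i = 1, \dots, t \text{ for some } t\}$ denote a subset of the observations, indexed by $\lambda \in \Lambda$, such that the ranks of the vectors $\{x_{\lambda_1}^{(m)}, \dots, x_{\lambda_t}^{(m)}\}$ and $\{x_{\lambda_1}^{(r)}, \dots, x_{\lambda_t}^{(r)}\}$ equal the ranks of the design matrices $X^{(m)}$ and $X^{(r)}$ respectively, and let $N_\lambda = \min\{m_{\lambda_1}, \dots, m_{\lambda_t}\}$ and $N_{\lambda_0} = \max_{\lambda \in \Lambda}\{N_\lambda\}$. Then $\mathcal{F}$ is identifiable if (1) $c \le \tfrac{1}{2}(N_{\lambda_0} + 1)$, and (2) $X^{(r)}$ and $X^{(m)}$ are full rank.

Proof. Without loss of generality, we assume that the subset of the first $t$ observations is $S_{\lambda_0}$, corresponding to $N_{\lambda_0}$. Suppose that $(c, \alpha, \beta)$ and $(c^*, \alpha^*, \beta^*)$ satisfy equation (3.10). This then implies that for each $i$ and all $y_i \in \mathcal{Y}_i$,

$$\sum_{j=1}^{c} p_{ij}\,\mathrm{bi}(y_i \mid m_i, \pi_{ij}) = \sum_{j=1}^{c^*} p_{ij}^*\,\mathrm{bi}(y_i \mid m_i, \pi_{ij}^*), \tag{3.11}$$

where $p_{ij} = p_j(x_i^{(m)}, \beta)$ and $\pi_{ij} = \pi_j(x_i^{(r)}, \alpha_j)$ are defined above, and $p_{ij}^*$ and $\pi_{ij}^*$ are defined analogously. Note that each side of equation (3.11) may be regarded as a finite binomial mixture without covariates. Since $c, c^* \le \tfrac{1}{2}(N_{\lambda_0} + 1) \le \tfrac{1}{2}(m_i + 1)$ for $i = 1, \dots, t$, Teicher's results (1961, 1963) imply that $c = c^*$,

$$p_{ij} = p_{ij}^* \quad\text{and}\quad \pi_{ij} = \pi_{ij}^* \tag{3.12}$$

for $i = 1, \dots, t$ and $j = 1, \dots, c$. By the definition of the model, we obtain

$$\exp(\beta_j' x_i^{(m)}) = \exp(\beta_j^{*\prime} x_i^{(m)}) \quad\text{for } j = 1, \dots, c-1, \tag{3.13}$$

$$\mathrm{logit}^{-1}(\alpha_j' x_i^{(r)}) = \mathrm{logit}^{-1}(\alpha_j^{*\prime} x_i^{(r)}) \quad\text{for } j = 1, \dots, c, \tag{3.14}$$

for $i = 1, \dots, t$. Since the exponential and logit functions are monotone, from (3.13) and (3.14) we obtain

$$(\beta_j - \beta_j^*)' x_i^{(m)} = 0 \quad\text{for } j = 1, \dots, c-1 \text{ and } i = 1, \dots, t,$$

$$(\alpha_j - \alpha_j^*)' x_i^{(r)} = 0 \quad\text{for } j = 1, \dots, c \text{ and } i = 1, \dots, t,$$

or

$$X_t^{(m)}(\beta_j - \beta_j^*) = 0 \quad\text{for } j = 1, \dots, c-1, \tag{3.15}$$

$$X_t^{(r)}(\alpha_j - \alpha_j^*) = 0 \quad\text{for } j = 1, \dots, c, \tag{3.16}$$

where $X_t^{(m)}$ and $X_t^{(r)}$ are the submatrices consisting of the first $t$ rows of $X^{(m)}$ and $X^{(r)}$ respectively. Since the ranks of $X_t^{(m)}$ and $X_t^{(r)}$ are equal to the ranks of $X^{(m)}$ and $X^{(r)}$, which are full rank, (3.15) and (3.16) imply that $(\alpha, \beta) = (\alpha^*, \beta^*)$. Thus $\mathcal{F}$ is identifiable. □

Note that we can assume that condition (2) holds without loss of generality, since if it does not we can reparameterize the model accordingly.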
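
Condition (1) of Theorem 2 is easy to check numerically. The sketch below is an illustration under stated assumptions, not part of the dissertation's code: the helper names `rank`, `n_lambda0` and `max_components` are hypothetical, and condition (2) (full-rank design matrices) is assumed rather than verified. It uses the fact that, since adding observations can never lower the rank, the largest attainable minimum number of trials is the largest $N$ for which the observations with $m_i \ge N$ already give both design matrices their full rank.

```python
def rank(rows, tol=1e-9):
    """Rank of a matrix given as a list of rows (Gaussian elimination)."""
    a = [list(map(float, r)) for r in rows]
    ncols = len(a[0]) if a else 0
    rk = 0
    for col in range(ncols):
        # Pivot: remaining row with the largest absolute entry in this column.
        pivot = max(range(rk, len(a)), key=lambda r: abs(a[r][col]), default=None)
        if pivot is None or abs(a[pivot][col]) < tol:
            continue
        a[rk], a[pivot] = a[pivot], a[rk]
        for r in range(rk + 1, len(a)):
            f = a[r][col] / a[rk][col]
            for k in range(col, ncols):
                a[r][k] -= f * a[rk][k]
        rk += 1
    return rk

def n_lambda0(m, X_m, X_r):
    """Largest N such that the rows with m_i >= N reach the rank of both
    design matrices (the quantity written N_{lambda_0} in Theorem 2)."""
    full_m, full_r = rank(X_m), rank(X_r)
    for N in sorted(set(m), reverse=True):
        idx = [i for i, mi in enumerate(m) if mi >= N]
        if (rank([X_m[i] for i in idx]) == full_m
                and rank([X_r[i] for i in idx]) == full_r):
            return N
    return min(m)  # not reached: at N = min(m) every row is included

def max_components(n0):
    """Largest integer c satisfying c <= (n0 + 1) / 2."""
    return (n0 + 1) // 2

# Hypothetical data: four binomial observations, an intercept and one dose.
m = [24, 31, 29, 26]
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
n0 = n_lambda0(m, X, X)
print(n0, max_components(n0))
```

In this made-up example a single observation cannot identify both an intercept and a slope, so the two observations with the most trials (31 and 29) determine the bound.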
Note also that the sufficient conditions for identifiability depend on partial information from the observations. The conditions in Theorem 2 mean that if the two design matrices are full rank, the mixed logistic regression models are identifiable up to $[(N_{\lambda_0} + 1)/2]$ components. For instance, if $N_{\lambda_0} = 4$, the theorem only guarantees that one- or two-component mixed logistic regression models are identifiable. Note that the sufficient condition $c \le \tfrac{1}{2}(N_{\lambda_0} + 1)$ may not be the sharpest possible bound for identifiability.

As a simple illustration of Theorem 2, consider the data in Table 3.1 on the toxicity of ethylene oxide for grain beetles (Busvine, 1938). Note that Follmann and Lambert (1991) discuss identifiability of their model for this data set. We assume a mixed logistic regression model with both binomial parameters and mixing probabilities depending on dose level $x$ and an intercept. Hence the ranks of the design matrices $X^{(m)}$ and $X^{(r)}$ are 2. Since any $2 \times 2$ submatrix of either $X^{(m)}$ or $X^{(r)}$ is full rank, there are $45 \times 45 = 2025$ elements in the index set $\Lambda$. $N_\lambda$ ranges from 24 to 31, and $N_{\lambda_0} = 31$. Therefore, Theorem 2 allows 16 components in the mixed logistic regression model. This sufficient condition is the same as that given by Follmann and Lambert (1991).

For two special cases of our model, constant mixing probabilities ($X^{(m)} = 1$) and constant binomial parameters ($X^{(r)} = 1$), the above sufficient conditions can be stated as follows.

Corollary 1. Let $S_\lambda = \{(y_{\lambda_i}, m_{\lambda_i}, x_{\lambda_i}^{(r)});\ i = 1, \dots, k_2\}$ denote a subset of the observations, indexed by $\lambda \in \Lambda$, such that the rank of the vectors $\{x_{\lambda_1}^{(r)}, \dots, x_{\lambda_{k_2}}^{(r)}\}$ equals the rank of the design matrix $X^{(r)}$, and let $N_\lambda = \min\{m_{\lambda_1}, \dots, m_{\lambda_{k_2}}\}$ and $N_{\lambda_0} = \max_{\lambda \in \Lambda}\{N_\lambda\}$. Then $\mathcal{F}$ is identifiable if (1) $c \le \tfrac{1}{2}(N_{\lambda_0} + 1)$, and (2) $X^{(r)}$ is full rank.

Corollary 2. Let $S_\lambda = \{(y_{\lambda_i}, m_{\lambda_i}, x_{\lambda_i}^{(m)});\ i = 1, \dots, k_1\}$ denote a subset of the observations, indexed by $\lambda \in \Lambda$, such that the rank of the vectors $\{x_{\lambda_1}^{(m)}, \dots, x_{\lambda_{k_1}}^{(m)}\}$ equals the rank of the design matrix $X^{(m)}$, and let $N_\lambda = \min\{m_{\lambda_1}, \dots, m_{\lambda_{k_1}}\}$ and $N_{\lambda_0} = \max_{\lambda \in \Lambda}\{N_\lambda\}$. Then $\mathcal{F}$ is identifiable if (1) $c \le \tfrac{1}{2}(N_{\lambda_0} + 1)$, and (2) $X^{(m)}$ is full rank.

3.4 Parameter Estimation

Obtaining the maximum likelihood estimates of the parameters in the proposed model requires an iterative algorithm. Two widely used algorithms can be applied to this case: (1) the EM algorithm (Dempster, Laird and Rubin, 1977) and (2) quasi-Newton algorithms. In this section we discuss how to apply the EM algorithm and the quasi-Newton algorithm to our model with a known number of components. Note that when implementing the EM algorithm, we also use a quasi-Newton approach for the M-step. We present results of a Monte Carlo study to investigate the performance of our codes and discuss some implementation issues.

3.4.1 The EM algorithm

For a fixed number of components $c$ we obtain maximum likelihood estimates of the parameters in the above model using both the EM algorithm (Dempster, Laird and Rubin, 1977) and the quasi-Newton approach (Nash, 1991). As is now standard in mixture model estimation, we implement the EM algorithm by treating the unobservable component membership of the observations as missing data. We discuss the choice of the number of components below.

Suppose that $(Y, M, X^{(m)}, X^{(r)}) = \{(y_i, m_i, x_i^{(m)}, x_i^{(r)});\ i = 1, \dots, n\}$ is the observed data generated by the mixed logistic regression model. Let $(Y, Z, M, X^{(m)}, X^{(r)}) = \{(y_i, z_i, m_i, x_i^{(m)}, x_i^{(r)});\ i = 1, \dots, n\}$ denote the complete data for the model, where the unobserved quantity $z_i = (z_{i1}, \dots, z_{ic})'$ satisfies

$$z_{ij} = \begin{cases} 1 & \text{if } \Pi_i = j, \\ 0 & \text{otherwise.} \end{cases}$$

The log-likelihood of the complete data is

$$l^c(\alpha, \beta \mid Y, Z, M, X^{(m)}, X^{(r)}) = \sum_{i=1}^{n}\sum_{j=1}^{c} z_{ij}\log p_{ij} + \sum_{i=1}^{n}\sum_{j=1}^{c} z_{ij}\log \mathrm{bi}(y_i \mid m_i, \pi_{ij}),$$

where $p_{ij}$ and $\mathrm{bi}(y_i \mid m_i, \pi_{ij})$ are defined by (3.4), (3.5) and (3.6) respectively.

The EM approach finds the maximum likelihood estimates using an iterative procedure consisting of two steps: the E-step and the M-step.
At the E-step, it replaces the missing data by its expectation, conditional on the observed data and the initial values of the parameters. At the M-step, it finds the parameter estimates which maximize the expected log likelihood for the complete data, conditional on the expected values of the missing data. Iteration stops when the log likelihood for the observed data does not increase significantly. In our case this procedure can be stated as follows.

E-step: Given the values $\alpha^{(0)}$ and $\beta^{(0)}$, replace the missing data, $Z$, by its expectation conditional on these initial values of the parameters and the observed data, $(Y, M, X^{(m)}, X^{(r)})$. In this case, the conditional expectation of the $j$th component of $z_i$ equals the probability that the observation $y_i$ was generated by the $j$th component of the mixture distribution, conditional on the parameters, the data and the covariates. Denote the conditional expectation of the $j$th component of $z_i$ by $\hat z_{ij}(\alpha^{(0)}, \beta^{(0)})$. Then

$$\hat z_{ij}(\alpha^{(0)}, \beta^{(0)}) = E(z_{ij} \mid y_i, m_i, x_i^{(m)}, x_i^{(r)}, \alpha^{(0)}, \beta^{(0)}) = \Pr(z_{ij} = 1 \mid y_i, m_i, x_i^{(m)}, x_i^{(r)}, \alpha^{(0)}, \beta^{(0)}) = \frac{p_j(x_i^{(m)}, \beta^{(0)})\,\mathrm{bi}(y_i \mid m_i, \pi_j(x_i^{(r)}, \alpha_j^{(0)}))}{\sum_{l=1}^{c} p_l(x_i^{(m)}, \beta^{(0)})\,\mathrm{bi}(y_i \mid m_i, \pi_l(x_i^{(r)}, \alpha_l^{(0)}))} \quad\text{for } j = 1, \dots, c, \tag{3.17}$$

where $p_j(x_i^{(m)}, \beta^{(0)})$ and $\mathrm{bi}(y_i \mid m_i, \pi_j(x_i^{(r)}, \alpha_j^{(0)}))$ are defined by (3.4), (3.5) and (3.6) respectively.

M-step: Given the conditional probabilities $\{\hat z_i(\alpha^{(0)}, \beta^{(0)}) = (\hat z_{i1}, \dots, \hat z_{ic})';\ i = 1, \dots, n\}$, obtain estimates of the parameters by maximizing, with respect to $\alpha$ and $\beta$,

$$Q(\alpha, \beta \mid \alpha^{(0)}, \beta^{(0)}) = E(l^c \mid Y, M, X^{(m)}, X^{(r)}, \alpha^{(0)}, \beta^{(0)}) = Q_1(\beta \mid \alpha^{(0)}, \beta^{(0)}) + Q_2(\alpha \mid \alpha^{(0)}, \beta^{(0)}),$$

where

$$Q_1 = \sum_{i=1}^{n}\sum_{j=1}^{c}\hat z_{ij}\log(p_{ij}) \quad\text{and}\quad Q_2 = \sum_{i=1}^{n}\sum_{j=1}^{c}\hat z_{ij}\log(\mathrm{bi}(y_i \mid m_i, \pi_{ij})).$$

The estimated parameters, $\hat\alpha$ and $\hat\beta$, satisfy the following M-step equations:

$$\frac{\partial Q_2}{\partial \alpha} = 0, \tag{3.18}$$

$$\frac{\partial Q_1}{\partial \beta} = 0. \tag{3.19}$$

Since closed form solutions of these equations are unavailable, we use a quasi-Newton approach (Nash, 1990) to obtain estimates. We implement the E and M steps in the following way to obtain parameter estimates.

Step 0: Specify starting values $\alpha^{(0)} = (\alpha_1^{(0)}, \dots, \alpha_c^{(0)})$ and $\beta^{(0)} = (\beta_1^{(0)}, \dots, \beta_{c-1}^{(0)})$ and two tolerances $\epsilon_1$ and $\epsilon_2$;

Step 1: (E-step) Compute $\hat z_i = (\hat z_{i1}, \dots, \hat z_{ic})'$, $(1 \le i \le n)$, using (3.17). To avoid an overflow problem in the calculation of $\hat z_{ij}$, we divide both the numerator and the denominator in (3.17) by the largest term in the sum in the denominator;

Step 2: (M-step) Find values of $\hat\alpha$ and $\hat\beta$ to solve (3.18) and (3.19), respectively, using the quasi-Newton algorithm (Nash, 1990);

Step 3: If at least one of the following conditions is true, set $\alpha^{(0)} = \hat\alpha$ and $\beta^{(0)} = \hat\beta$, and go to Step 1; otherwise, stop.

(1) $\|\hat\alpha - \alpha^{(0)}\| > \epsilon_1$;
(2) $\|\hat\beta - \beta^{(0)}\| > \epsilon_1$;
(3) $|\,l(\hat\alpha, \hat\beta \mid Y, M, X^{(m)}, X^{(r)}) - l(\alpha^{(0)}, \beta^{(0)} \mid Y, M, X^{(m)}, X^{(r)})\,| > \epsilon_2$,

where $l(\alpha, \beta \mid Y, M, X^{(m)}, X^{(r)})$ is the observed likelihood function.

Dempster, Laird and Rubin (1977) and Wu (1983) discussed the convergence properties of the EM algorithm in a general setting. Since $Q(\alpha, \beta \mid \alpha^{(0)}, \beta^{(0)})$ and its first order partial derivatives are continuous in $\alpha$, $\beta$, $\alpha^{(0)}$ and $\beta^{(0)}$, applying Wu's theorems (1983) lets us conclude that the sequence of observed likelihoods $l(\alpha^{(k)}, \beta^{(k)} \mid Y, M, X^{(m)}, X^{(r)})$ converges to a local maximum or saddle point. Note that the observed likelihood function $l(\alpha, \beta \mid Y, M, X^{(m)}, X^{(r)})$ need not, in general, be globally concave. Thus we need to choose initial values carefully in order to increase the chance that the algorithm converges to the global maximum. Our approach will be discussed below.

Note that the above EM algorithm does not directly yield estimates of the standard errors corresponding to the parameter estimates. On the other hand, when $c$ is known, asymptotic normality of $\sqrt{n}((\hat\alpha, \hat\beta) - (\alpha, \beta))$ is easily proved under standard regularity conditions (Lehmann, 1983). To approximate standard errors, we may compute $\hat\sigma(\hat\alpha)$ and $\hat\sigma(\hat\beta)$ from the diagonal elements of the inverse of the $(c \cdot k_2 + (c-1) \cdot k_1)$-dimensional observed information matrix, with $c$ fixed at $\hat c$, which is defined as

$$I(\hat\alpha, \hat\beta \mid X^{(r)}, X^{(m)}, M, Y) = \begin{pmatrix} -\dfrac{\partial^2 l}{\partial\alpha^2} & -\dfrac{\partial^2 l}{\partial\alpha\,\partial\beta} \\[2ex] -\dfrac{\partial^2 l}{\partial\alpha\,\partial\beta} & -\dfrac{\partial^2 l}{\partial\beta^2} \end{pmatrix}\Bigg|_{(\hat\alpha, \hat\beta)}.$$

Although the EM algorithm is relatively robust to the choice of initial values, it has a lower convergence rate than the quasi-Newton algorithms. To balance the trade-off between these two algorithms, we first use the EM algorithm until either the likelihood value does not increase significantly in terms of a given tolerance $\epsilon_2$ or the parameter estimates do not change significantly in terms of a given tolerance $\epsilon_1$, and then shift to a quasi-Newton algorithm which maximizes the observed likelihood function. In doing so we can obtain approximate standard errors of the estimates as a by-product of the quasi-Newton approach. Note that in some cases the approximate standard errors from the quasi-Newton approach may not be accurate. Hence we recommend calculating the information matrix numerically whenever possible. We modify the above Step 3 as follows:

Step 3': (a) If at least one of the following conditions is true, set $\alpha^{(0)} = \hat\alpha$ and $\beta^{(0)} = \hat\beta$ and go to Step 1; otherwise, go to (b).

(1) $\|\hat\alpha - \alpha^{(0)}\| > \epsilon_1$;
(2) $\|\hat\beta - \beta^{(0)}\| > \epsilon_1$;
(3) $|\,l(\hat\alpha, \hat\beta \mid Y, M, X^{(m)}, X^{(r)}) - l(\alpha^{(0)}, \beta^{(0)} \mid Y, M, X^{(m)}, X^{(r)})\,| > \epsilon_2$.

(b) Maximize the observed log likelihood function $l(\alpha, \beta \mid Y, M, X^{(m)}, X^{(r)})$ using the quasi-Newton algorithm (Nash, 1990) with $\hat\alpha$ and $\hat\beta$ as initial values. Then, stop.

3.4.2 Starting Values

To run the code of the above algorithm, we need to choose starting values for the parameters in the model. Note that the EM algorithm only ensures, under some regularity conditions (Wu, 1983), that the estimates converge to a local maximum point of the likelihood function for the observed data. Furthermore, since the likelihood function may not be globally concave, several starting points are needed to find the maximum likelihood estimates, $\hat\alpha$ and $\hat\beta$.
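
The E-step of the EM algorithm above, including the safeguard of dividing by the largest term, can be sketched as follows for a single observation. This is a minimal illustration, not the dissertation's FORTRAN implementation: the names `log_binom_pmf`, `e_step` and `obs_loglik_term` are hypothetical, and working on the log scale (so that the largest term becomes exactly 1 after scaling) is one assumed way of realising the same idea. The mixing and success probabilities are taken as already evaluated at the current parameter values and are assumed to lie strictly between 0 and 1.

```python
import math

def log_binom_pmf(y, m, pi):
    """log bi(y | m, pi) for 0 < pi < 1, using log-gamma for the coefficient."""
    return (math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
            + y * math.log(pi) + (m - y) * math.log(1.0 - pi))

def _log_terms(y, m, p, pi):
    """log of p_j * bi(y | m, pi_j) for each component j."""
    return [math.log(pj) + log_binom_pmf(y, m, pij) for pj, pij in zip(p, pi)]

def e_step(y, m, p, pi):
    """Posterior component probabilities for one observation, as in (3.17).

    p  : current mixing probabilities p_i1, ..., p_ic
    pi : current success probabilities pi_i1, ..., pi_ic
    """
    terms = _log_terms(y, m, p, pi)
    largest = max(terms)
    # Dividing numerator and denominator by the largest term: on the log
    # scale this is a subtraction, after which the largest term equals 1.
    scaled = [math.exp(t - largest) for t in terms]
    total = sum(scaled)
    return [s / total for s in scaled]

def obs_loglik_term(y, m, p, pi):
    """Contribution of one observation to the observed log likelihood."""
    terms = _log_terms(y, m, p, pi)
    largest = max(terms)
    return largest + math.log(sum(math.exp(t - largest) for t in terms))

# Illustrative values (assumed): 3 successes in 30 trials, three components.
z = e_step(3, 30, [0.2, 0.3, 0.5], [0.1, 0.3, 0.8])
print(z)
```

Summing `obs_loglik_term` over the observations gives the quantity compared in condition (3) of the stopping rule.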
We propose the following approach for choosing the starting values. We assume that $c$ is known. At the first step, the approach calculates the ratios $\{y_1/m_1, \dots, y_n/m_n\}$, divides the set of ratios into $c$ groups in terms of its percentiles, and fits the observed data to a $c$-component mixture with constant covariates, $x^{(m)} = x^{(r)} = (1)$, choosing initial values based on the percentile information. At the second step, if necessary, it fits the observed data to a mixed logistic regression model containing only one regression term, in either the success probabilities or the mixing probabilities, in such a way that the initial values of the parameters included in the previous mixture model equal the estimates of the corresponding parameters from the previously fitted model, and the initial values of the parameters not in the previously fitted model are set to a small value, say, 0.00001. This process is iterated until a complete set of initial values for the mixture model is obtained.

The motivation for this ad hoc approach is based on the idea of cluster analysis. At each iteration, we use different criteria to classify the data. First, the data are classified in terms of their percentiles. Then the data are classified in terms of a finite binomial mixture without covariates, and subsequently in terms of mixed logistic regression models. Note that choosing a complete set of initial values for a mixture model step by step in this way guarantees that the likelihood values will increase at each step. Also, our approach produces maximum likelihood estimates for a sequence of nested mixture models while it builds a complete set of initial values for the mixture model.

We use an example to explain this approach. Suppose that we need to choose initial values to fit a 3-component mixture model with covariates $x_i^{(r)} = (1, s_i)$ and $x_i^{(m)} = (1, t_i)$, where $s_i$ and $t_i$ are real numbers, each with a regression term. First, we find the 16.5, 33.0, 49.5, 66.0 and 82.5 percentiles of the observed ratios $\{y_1/m_1, \dots, y_n/m_n\}$, denoted $q_1$ to $q_5$ respectively, and fit the data to a 3-component binomial mixture with constant covariates ($x^{(r)} = x^{(m)} = (1)$), with the initial values of $\alpha_{11}$, $\alpha_{21}$ and $\alpha_{31}$ equal to $\mathrm{logit}(q_1)$, $\mathrm{logit}(q_3)$ and $\mathrm{logit}(q_5)$ respectively, and the initial values of both $\beta_{11}$ and $\beta_{21}$ equal to 0. Note that under this specification and the logit link function, the initial values of $\pi_j(x_i^{(r)}, \alpha_j)$, $(j = 1, 2, 3)$, are equal to $q_1$, $q_3$ and $q_5$, with the same mixing probabilities 1/3. Second, we fit the data to the 3-component mixed logistic regression model with $x_i^{(r)} = (1, s_i)$ and $x_i^{(m)} = (1)$ by choosing the initial values of $\alpha_{12}$, $\alpha_{22}$ and $\alpha_{32}$ equal to 0.00001 and the initial values of the other parameters equal to the estimates of the corresponding parameters of the first fitted model. Finally, we choose initial values for the 3-component mixed logistic regression model with $x_i^{(r)} = (1, s_i)$ and $x_i^{(m)} = (1, t_i)$ in such a way that $\beta_{12}$ and $\beta_{22}$ are equal to 0.00001 and the other parameters are equal to the estimates of the corresponding parameters of the second fitted model.

3.4.3 A Monte Carlo Study

We use Monte Carlo methods to examine the performance of the above algorithm. In particular, we wished to verify the reliability of our code, determine the precision of the estimates and investigate some model selection criteria to be discussed below. We use three 3-component mixture models. For each, we analyzed 101 replicates, each with 100 observations. Two different approaches for choosing initial values are compared in the study. In one, we use the true parameter values of the model generating the observations as initial values in order to determine the performance of the algorithm in the best case.
The other 21 and a 31 as initial values, chooses initial values 11 a a uses the true parameter values of , of /3ii and /921 according to the approach described in 3.4.2 section, and fits the samples to a 3-component binomial mixture with constant covariates. Then, following the approach 158 Chapter 3. Mixed Logistic Regression Models of section 3.4.2, we choose a complete set of initial values for the parameters of the model generating the samples. These two different approaches of choosing initial values lead to essentially the same estimates. We describe the details below. Model 1: A model with the success probabilities, tributions, bi(y Trjj, of the component binomial dis m, irjj), depending on one time-dependent covariate, with constant 30. For the logistic regression part, mixing probabilities, where m (r) = (3.20) (1, sj, wheres=0.2fori=1,...,10,d=0.4fori=11,...,20,etc.,and a where c = = (—1.2962,—0.4505), a , 2 (a,, a = (3.21) Q3) (—1.3148,1.0811) and a = (0.6973,0.7499). For the mixing part, (m) 13 = 1 = (th’ 132) = (—0.9163, —0.5108). For the success probabilities can be written with the form Tri(x, ai) (xc 3 ir ) 3 a = logit(—1.2962 0.4505s) (3.22) = logit(—1.3148 + 1.0811s) (3.23) = logit(0.6973 + 0.7499s), (3.24) — and the mixing probabilities • and (Xm),/3) 1 p 0.2, (xm),13) 2 p 0.25 (m) x 2 ( 3 p ,/3) 0.5. 159 Chapter 3. Mixed Logistic Regression Models Note that choosing the parameters as the above makes the component distributions easily decreases from 0.3 to 0.1, Pi2 increases from 0.3 to 0.7 distinguished. In this model, and increases from 0.7 to 0.9. Thus there are no overlap among them. Model 2: A model with constant success probabilities, lrjj, of the component binomial distributions, bi(y, m,, covariate, where m: (r) a 2rj,), and mixing probabilities depending on one time-dependent 30. 
That is, for the logistic regression part, $x_i^{(r)} = 1$ and $\alpha = (\alpha_1, \alpha_2, \alpha_3) = (-2.1972, -0.8473, 1.3863)$, and for the mixing part,

$$x_i^{(m)} = (1, s_i), \tag{3.25}$$

where $s_i$ is defined as above, and

$$\beta = (\beta_1, \beta_2), \tag{3.26}$$

where $\beta_1 = (-2.1129, 1.6057)$ and $\beta_2 = (-0.9692, 1.3805)$. The success probabilities, then, are $\pi_1(x_i^{(r)}, \alpha_1) = 0.1$, $\pi_2(x_i^{(r)}, \alpha_2) = 0.3$ and $\pi_3(x_i^{(r)}, \alpha_3) = 0.8$, and the mixing probabilities are given by

$$p_1(x_i^{(m)}, \beta) = \frac{\exp(-2.1129 + 1.6057 s_i)}{\exp(-2.1129 + 1.6057 s_i) + \exp(-0.9692 + 1.3805 s_i) + 1}, \tag{3.27}$$

$$p_2(x_i^{(m)}, \beta) = \frac{\exp(-0.9692 + 1.3805 s_i)}{\exp(-2.1129 + 1.6057 s_i) + \exp(-0.9692 + 1.3805 s_i) + 1}, \tag{3.28}$$

$$p_3(x_i^{(m)}, \beta) = \frac{1}{\exp(-2.1129 + 1.6057 s_i) + \exp(-0.9692 + 1.3805 s_i) + 1}. \tag{3.29}$$

Note that with these values of $\beta$, as $s_i$ runs from 0.2 to 2.0, $p_{i1}$ increases from 0.1 to 0.3, $p_{i2}$ increases from 0.3 to 0.6 and $p_{i3}$ decreases from 0.6 to 0.1.

Model 3: Both the success probabilities and the mixing probabilities depend on the covariate $s_i$. For the regression part, $x_i^{(r)}$, $\alpha$ and $\pi_j(x_i^{(r)}, \alpha_j)$ are given by (3.20), (3.21), (3.22), (3.23) and (3.24) respectively; for the mixing part, $x_i^{(m)}$, $\beta$ and $p_j(x_i^{(m)}, \beta)$ are given by (3.25), (3.26), (3.27), (3.28) and (3.29) respectively.

We chose the above parameter values so that the success probabilities of the components do not overlap. We would expect the algorithm to perform well in this case. We carried out these simulations, each with 100 replicates. The responses $y_i$ were obtained by first generating a uniform (0,1) random number $u_i$ and then assigning $y_i \sim \mathrm{binomial}(m_i, \pi_{i1})$ if $u_i \le p_1(x_i^{(m)}, \beta)$; $y_i \sim \mathrm{binomial}(m_i, \pi_{i2})$ if $p_1(x_i^{(m)}, \beta) < u_i \le p_1(x_i^{(m)}, \beta) + p_2(x_i^{(m)}, \beta)$; and $y_i \sim \mathrm{binomial}(m_i, \pi_{i3})$ if $u_i > p_1(x_i^{(m)}, \beta) + p_2(x_i^{(m)}, \beta)$. Our implementation of the algorithm was written in FORTRAN and run on a Sun SPARCstation 1.

The results of the Monte Carlo study are presented in Table 3.2, Table 3.3 and Table 3.4.
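The sampling scheme described above can be sketched in a few lines. This is only an illustration, not the FORTRAN code used in the study: the function names are hypothetical, the coefficients are those stated for Model 3, and the covariate is assumed to continue in steps of 0.2 up to 2.0 for the 100 observations.

```python
import math
import random

# Coefficients of Model 3 as stated in the text: (3.21) and (3.26).
ALPHA = [(-1.2962, -0.4505), (-1.3148, 1.0811), (0.6973, 0.7499)]
BETA = [(-2.1129, 1.6057), (-0.9692, 1.3805)]
M = 30  # binomial denominator m_i


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def covariate(i):
    """s_i for i = 1..100: 0.2 for i = 1..10, 0.4 for i = 11..20, and so on.
    Assumption: the pattern continues in steps of 0.2 up to 2.0."""
    return 0.2 * ((i - 1) // 10 + 1)


def success_probs(s):
    """Component success probabilities, as in (3.22)-(3.24)."""
    return [inv_logit(a0 + a1 * s) for a0, a1 in ALPHA]


def mixing_probs(s):
    """Mixing probabilities, as in (3.27)-(3.29); the last component is the baseline."""
    e = [math.exp(b0 + b1 * s) for b0, b1 in BETA]
    denom = 1.0 + sum(e)
    return [v / denom for v in e] + [1.0 / denom]


def simulate(n=100, seed=0):
    """Draw n observations (s_i, y_i, component index) from the mixture."""
    rng = random.Random(seed)
    data = []
    for i in range(1, n + 1):
        s = covariate(i)
        pi, p = success_probs(s), mixing_probs(s)
        u = rng.random()  # uniform(0, 1) draw that selects the component
        if u <= p[0]:
            j = 0
        elif u <= p[0] + p[1]:
            j = 1
        else:
            j = 2
        # Binomial(M, pi[j]) draw as a sum of M Bernoulli trials.
        y = sum(rng.random() < pi[j] for _ in range(M))
        data.append((s, y, j))
    return data
```

Evaluating `success_probs` and `mixing_probs` at the two ends of the covariate range is a quick way to check how well separated the components are for a given choice of coefficients.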
These tables show that the means of the estimates are very close to the true parameter values in the models, suggesting that the global maximum of the observed likelihood is reached. For Model 1, the sample means are quite close to the true values and the standard deviations are relatively small. Although the coefficients of the logistic regression of Model 2 are estimated accurately, estimates of the mixing probability parameters are more variable. This suggests that estimating mixing probability parameters in this model is intrinsically more difficult than estimating the success probabilities. This agrees with observations in the literature (Titterington et al., 1985; McLachlan and Basford, 1988). Estimates of the parameters of Model 3 show the same pattern as in Model 2: estimates of the mixing probability parameters are more variable than those of the success probability parameters. Note, however, that although the estimates of the mixing probability parameters, $\hat\beta$, vary somewhat, the estimated mixing probabilities, $p_j(x_i^{(m)}, \hat\beta)$, are more precise owing to the multinomial link function between the parameters and the mixing probabilities. The average number of iterations of the EM algorithm is 8.24 for Model 1, 12.35 for Model 2 and 20.2 for Model 3 under a stopping criterion of 0.01, and the average time is 12.5, 19.4 and 120.5 seconds respectively.

3.5 Implementation Issues

3.5.1 Model Selection

We need to address the following three issues when we apply a mixed logistic regression model: (a) we need to determine the conditions of identifiability for the model; (b) we need to determine the number of components, $c$, of the mixture; and (c) we need a method to carry out inference about model parameters. When $c$ is known, inference for the parameters can be based on a standard likelihood ratio test. In practice, however, this case may not be common.
When $c$ is unknown, the usual likelihood ratio test is no longer valid for determining $c$ or testing hypotheses about parameter values. As we discuss in Section 2.7.1, this is because mixing probabilities may lie on the boundary of the parameter space when the hypothesized number of components is less than the fitted number of components. Hence the usual regularity conditions for the likelihood ratio test do not hold. We propose the following methods for model selection.

Two widely used model selection criteria are Akaike's Information Criterion (AIC) (see Akaike, 1973; 1974) and the Bayesian Information Criterion (BIC) (Schwarz, 1978), both discussed in Section 2.7.1. For the mixed logistic regression models, we define the AIC and BIC as follows:

• AIC: choose the model for which $l_c(X) - a_c(X)$ is largest;

• BIC: choose the model for which $l_c(X) - \frac{1}{2}(\log(n))\,a_c(X)$ is largest,

where $l_c(X)$ is the maximum log-likelihood of the mixture with $c$ components and covariate $X$, $a_c(X) = c \cdot k_1 + (c - 1) \cdot k_2$, where $k_1$ and $k_2$ are the dimensions of $\alpha_j$ and $\beta_j$ respectively, and $n$ is the total number of observations. These two criteria do not always select the same model.

Using the BIC (AIC), our model selection procedure consists of two stages. At the first stage, we determine $c$ to maximize BIC (AIC) values for the saturated 1-3 (1-4) component mixture models that contain all possible covariates in both success probabilities and mixing probabilities. Note that the $c$ values must be within the range satisfying the identifiability conditions. Although we compute both AIC and BIC values in our applications, we recommend using BIC because our Monte Carlo studies suggest that BIC is more reliable in model selection. At the second stage, our model selection approach depends on our analysis objectives. If our goal is inference about some particular model parameter, we carry out likelihood ratio tests for nested $c$-component mixture models. If the goal is choosing an appropriate model to fit the data, we select a model to maximize BIC (AIC) values among the $c$-component mixture models concerned. Since this selection method is heuristic and only gives a guideline in applications, other specific concerns in model selection should be taken into account from case to case. For instance, in some applications the number of components and some parameters in a mixture may be explicitly or implicitly determined by underlying theory.

In the Monte Carlo studies discussed in Section 3.4.3, we computed both AIC and BIC values for all possible mixed 2 to 4 component models. Table 2.5.1.1 shows that AIC and BIC are reliable methods for choosing the correct models. AIC chose the correct model 94% of the time for Model 1, 82% of the time for Model 2 and 93% of the time for Model 3. When AIC failed to select the correct model, it always chose a model with too many components, suggesting that AIC may under-penalize the number of parameters in the mixtures. On the other hand, BIC always chose the correct models, suggesting that BIC may not over-penalize the number of parameters. Note that all sample sizes in the Monte Carlo studies are 100. The examples in the next section will exhibit this procedure in practice.

3.5.2 Classification

One possible use of the mixed logistic regression model is to classify data on the basis of a probabilistic model rather than an ad hoc clustering technique. Since the quantity in (3.17) is the estimated posterior probability that the observation $y_i$ is generated by the component distribution $\mathrm{bi}(y_i \mid m_i, \pi_{ij})$, this information can be used to classify observations into different groups characterized by the component distributions.
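As a rough illustration of this classification idea, the sketch below computes posterior component probabilities by Bayes' rule, that is, each component's mixing probability times its binomial likelihood, normalized over components. This is the standard form for finite mixtures and is assumed here to agree with (3.17), which is not reproduced in this section; the function names and the numerical values in the usage note are illustrative only.

```python
import math


def binom_pmf(y, m, pi):
    """Binomial probability bi(y | m, pi)."""
    return math.comb(m, y) * pi**y * (1.0 - pi) ** (m - y)


def posterior_probs(y, m, mix, pis):
    """Posterior probability of each component for one observation.

    mix: mixing probabilities p_ij for this observation
    pis: component success probabilities pi_ij for this observation
    """
    weights = [p * binom_pmf(y, m, pi) for p, pi in zip(mix, pis)]
    total = sum(weights)
    return [w / total for w in weights]


def classify(y, m, mix, pis):
    """Index (0-based) of the component with the largest posterior probability."""
    post = posterior_probs(y, m, mix, pis)
    return max(range(len(post)), key=post.__getitem__)
```

For example, with hypothetical values `mix = [0.2, 0.3, 0.5]`, `pis = [0.1, 0.7, 0.9]` and `m = 30`, an observation `y = 3` is assigned to the first component and `y = 27` to the third.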
For instance, for a $c$-component mixture model we may postulate $c$ different groups defined by the $c$ different sets of coefficients of the logistic regressions, $\pi_j(x_i^{(r)}, \alpha_j)$ ($j = 1, \ldots, c$), of the model. According to the classification criterion, observation $i$ is identified with the component that maximizes the estimated posterior probability in (3.17). In our applications, the maximum values of this quantity all exceed 0.5. Note that if the parameters of the model were known, this classification criterion would be the optimal or Bayes rule (Anderson, 1984, chapter 6), which minimizes the overall error rate. Such an approach has also been referred to as latent class analysis (Aitkin et al., 1981). We illustrate this approach in the examples below.

3.5.3 Residual Analysis and Goodness-of-fit

Once a mixed logistic regression model has been fit to a set of observations, it is essential to check the quality of the fit. For this purpose, we first define Pearson, deviance and likelihood residuals for mixed logistic regression models, and then use them to identify individually poorly fitting observations as well as observations that are influential on the overall fit of the model. We also define a quantity to measure the influence of individual observations on the set of parameter estimates, and use it to identify influential observations. In addition, we provide goodness-of-fit statistics for mixed logistic regression models.

Definitions of Residuals

As in Section 2.7.3, we define Pearson, deviance and likelihood residuals for a mixed logistic regression model. The Pearson residual is defined as

$$r_{P,i} = \frac{y_i - \hat\mu_i}{\sqrt{V(\hat\mu_i)}}, \tag{3.30}$$

where

$$\hat\mu_i = m_i \sum_{j=1}^{c} \hat p_{ij} \hat\pi_{ij}, \tag{3.31}$$

$$\hat\pi_{ij} = \frac{\exp(\hat\alpha_j^T x_i^{(r)})}{1 + \exp(\hat\alpha_j^T x_i^{(r)})},$$

$$\hat p_{ij} = \frac{\exp(\hat\beta_j^T x_i^{(m)})}{\sum_{k=1}^{c-1} \exp(\hat\beta_k^T x_i^{(m)}) + 1} \quad \text{for } j = 1, \ldots, c-1, \qquad \hat p_{ic} = \frac{1}{\sum_{k=1}^{c-1} \exp(\hat\beta_k^T x_i^{(m)}) + 1},$$

and

$$V(\hat\mu_i) = m_i \sum_{j=1}^{c} \hat p_{ij} \hat\pi_{ij} + m_i (m_i - 1) \sum_{j=1}^{c} \hat p_{ij} \hat\pi_{ij}^2 - \Big\{ m_i \sum_{j=1}^{c} \hat p_{ij} \hat\pi_{ij} \Big\}^2. \tag{3.32}$$

The deviance residual is defined as

$$r_{D,i} = \mathrm{sign}(y_i - \hat\mu_i) \sqrt{2[l(y_i, y_i) - l(\hat\mu_i, y_i)]} = \mathrm{sign}(y_i - \hat\mu_i) \sqrt{d_i}, \tag{3.33}$$

where $l(\hat\mu_i, y_i)$ is the log likelihood function of the mixed logistic regression model for observation $y_i$ and $d_i = 2(l(y_i, y_i) - l(\hat\mu_i, y_i))$ is the contribution to the deviance goodness-of-fit statistic $D$, which is defined as

$$D = 2 \sum_{i=1}^{n} [l(y_i, y_i) - l(\hat\mu_i, y_i)]. \tag{3.34}$$

Note that $l(y_i, y_i)$ is the same for both the usual logistic regression and mixed logistic regression models because

$$f(y_i; x_i^{(r)}, x_i^{(m)}, m_i, \alpha, \beta) = \sum_{j=1}^{c} p_{ij}\, \mathrm{bi}(y_i \mid m_i, \pi_{ij}) \tag{3.35}$$

$$\le \sum_{j=1}^{c} p_{ij}\, \mathrm{bi}(y_i \mid m_i, y_i/m_i) \tag{3.36}$$

$$= \mathrm{bi}(y_i \mid m_i, y_i/m_i). \tag{3.37}$$

This indicates that the usual logistic regression models and the mixed logistic regression models share the same baseline.

The likelihood residual is derived by comparing the deviance obtained on fitting a mixed logistic regression model to the complete set of $n$ cases with the deviance obtained when the same model is fitted to the $n - 1$ cases excluding the $i$th, for $i = 1, \ldots, n$. This gives rise to a quantity that measures the change in the deviance when each case in turn is excluded from the data set. The value of the likelihood residual for the $i$th case is defined as

$$r_{L,i} = \mathrm{sign}(y_i - \hat\mu_i) \sqrt{D - D_{(i)}}, \tag{3.38}$$

where $\hat\mu_i$ is defined by (3.31); $\hat\alpha$ and $\hat\alpha_{(i)}$ are the maximum likelihood estimates of the regression parameters based on the complete data set of $n$ cases and on the data set of $n - 1$ cases excluding the $i$th case respectively; and $D$ and $D_{(i)}$ are the deviances based on $n$ and $n - 1$ cases respectively.

Note that for large binomial denominators $m_i$, all three types of residuals approximately follow the standard normal distribution if the fitted model is adequate. Our numerical results show that the Pearson residuals may not be as close to normal as the other two types of residuals.

Detection of Outliers and Influential Observations

The residuals obtained after fitting a mixed logistic regression model to an observed set of data form the basis of a large number of diagnostic techniques for assessing model adequacy.
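The Pearson and deviance residuals defined earlier in this section can be sketched for a single observation as follows. The sketch assumes the fitted mixing probabilities and success probabilities for that observation are already available, and it uses the usual mean and variance of a finite binomial mixture; the function names are hypothetical.

```python
import math


def binom_pmf(y, m, pi):
    """Binomial probability bi(y | m, pi)."""
    return math.comb(m, y) * pi**y * (1.0 - pi) ** (m - y)


def mixture_mean_var(m, mix, pis):
    """Mean and variance of a binomial mixture with denominator m."""
    s1 = sum(p * pi for p, pi in zip(mix, pis))
    s2 = sum(p * pi * pi for p, pi in zip(mix, pis))
    mean = m * s1
    var = m * s1 + m * (m - 1) * s2 - mean**2
    return mean, var


def pearson_residual(y, m, mix, pis):
    mean, var = mixture_mean_var(m, mix, pis)
    return (y - mean) / math.sqrt(var)


def deviance_residual(y, m, mix, pis):
    mean, _ = mixture_mean_var(m, mix, pis)
    # Saturated log-likelihood: a single binomial with success probability y/m.
    l_sat = math.log(binom_pmf(y, m, y / m))
    # Fitted log-likelihood of the mixture for this observation.
    l_fit = math.log(sum(p * binom_pmf(y, m, pi) for p, pi in zip(mix, pis)))
    d = 2.0 * (l_sat - l_fit)
    # Guard against tiny negative values caused by rounding.
    return math.copysign(math.sqrt(max(d, 0.0)), y - mean)
```

With a single component the variance reduces to the ordinary binomial variance, which gives a simple sanity check. The likelihood residual is not sketched because it requires refitting the model with each case deleted.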
Since our primary objective in residual analysis for mixed logistic regression models is to identify outliers and influential cases, we discuss how the residuals can be used for this purpose. As for mixed Poisson regression models, we define outliers as those observations that are surprisingly distant from the remaining observations in the sample. Such observations may occur as a result of measurement errors, that is, errors in reading, calculating or recording a numerical value; or they may be just an extreme manifestation of natural variability. Since large residuals indicate poorly fitting observations, we use index plots of residuals to detect outliers, that is, observations that have unusually large residuals.

The influence of a particular observation on the overall fit of a model can be assessed from the change in the value of a summary measure of goodness of fit that results from excluding the observation from the data set. Since $r_{L,i}^2$ is the change in deviance on omitting the $i$th observation from the fit, an index plot of these values is the best way of assessing the contribution of each observation to the overall goodness of fit of the model.

To examine how the $i$th observation affects the set of parameter estimates, we define the following quantity:

$$w_i = \frac{1}{p} \Big\{ \sum_{j=1}^{c} \sum_{l=1}^{k_1} \Big| \frac{\hat\alpha_{jl} - \hat\alpha_{jl(i)}}{se(\hat\alpha_{jl})} \Big| + \sum_{j=1}^{c-1} \sum_{l=1}^{k_2} \Big| \frac{\hat\beta_{jl} - \hat\beta_{jl(i)}}{se(\hat\beta_{jl})} \Big| \Big\}, \tag{3.39}$$

where $\hat\alpha_{jl}$, $\hat\beta_{jl}$ and $\hat\alpha_{jl(i)}$, $\hat\beta_{jl(i)}$ are the maximum likelihood parameter estimates of the mixed logistic regression model based on the complete data set of $n$ cases and on the data set of $n - 1$ cases excluding the $i$th case, respectively; $se(\hat\alpha_{jl})$ and $se(\hat\beta_{jl})$ are the estimated standard errors of the corresponding estimates based on the $n$ cases; and $p = c k_1 + (c - 1) k_2$. Because each term in (3.39) measures a relative change in an individual coefficient, $w_i$ can be interpreted as the average relative coefficient change for a set of estimates.
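A minimal sketch of this influence measure is given below, under the reading that it averages the absolute coefficient changes, each scaled by its standard error, over all estimated parameters. The function name is hypothetical. The usage values are the estimates and standard errors reported for the fish data in Section 3.6 and Tables 3.6 and 3.7, with the common treatment coefficient counted once.

```python
def average_relative_change(full, deleted, se):
    """Average of |full - deleted| / se over all parameters.

    full:    estimates from the complete data set
    deleted: estimates with one case excluded
    se:      standard errors from the complete data set
    """
    if not (len(full) == len(deleted) == len(se)):
        raise ValueError("all three sequences must have the same length")
    changes = [abs(f - d) / s for f, d, s in zip(full, deleted, se)]
    return sum(changes) / len(changes)


# Fish data, 37th observation omitted (values as reported in the text).
full = [-0.8161, 6.6209, 1.1686, -4.7798, 122.92, -44.38, 1183.7]
deleted = [-0.8838, 7.2151, 1.2232, -4.8242, 123.29, -44.38, 1183.7]
se = [0.0982, 0.6155, 0.0851, 0.2896, 12.49, 407.8, 4.453]
w_37 = average_relative_change(full, deleted, se)
```

With these rounded inputs `w_37` comes out at about 0.354, in line with the value 0.3543 reported for the 37th observation.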
This is a useful quantity for assessing the extent to which the set of parameter estimates is affected by the exclusion of the $i$th observation. Relatively large values of this quantity indicate that the corresponding observations are influential and cause instability in the fitted model. An index plot of $w_i$ is the most useful way of presenting these values. The example in the next section will illustrate these points.

Goodness-of-fit Statistics

After fitting a mixed logistic regression model to a set of data, it is natural to enquire about the extent to which the fitted values of the response variable under the model compare with the observed values. If the agreement between the observations and the corresponding fitted values is good, the model may be acceptable. If not, the current form of the model will certainly not be acceptable and the model will need to be revised. This aspect of the adequacy of a model is widely referred to as goodness of fit.

There are at least two widely used goodness-of-fit statistics which can be used here. One is the deviance, defined as

$$D = \sum_{i=1}^{n} r_{D,i}^2, \tag{3.40}$$

where $r_{D,i}$ are the deviance residuals for the mixed logistic regression model; the other is Pearson's $X^2$ statistic, defined as

$$X^2 = \sum_{i=1}^{n} r_{P,i}^2, \tag{3.41}$$

where $r_{P,i}$ are the Pearson residuals for the mixed logistic regression model. In order to evaluate the extent to which an adopted mixed logistic regression model fits a set of data, the distribution of either the deviance or the Pearson statistic, under the assumption that the model is correct, is needed. In general, the deviance and Pearson's $X^2$ statistics are asymptotically distributed as $\chi^2$ with $(n - p)$ degrees of freedom, where $n$ is the number of observations and $p$ is the number of unknown parameters in the model. Many studies have shown that the distribution of the Pearson statistic is often much more nearly chi-squared than that of the deviance (e.g., Larntz, 1978).
For this reason we use the Pearson statistic for overall goodness-of-fit tests for the mixed logistic regression models.

3.6 An Application

This example uses data from an experiment reported by Ganio and Schafer (1992), which investigates the carcinogenic effects of aflatoxin, a toxic by-product produced by a mold that infects cottonseed meal, peanuts, and grains. Forty tanks of rainbow trout embryos were exposed to either aflatoxin B1 or a related compound, aflatoxicol, at one of five doses for one hour, and the incidence of liver tumors in each tank was recorded after one year. The data in Table 3.5 are the proportions of fish with liver tumors in each of the 40 tanks. The researchers believe that there may exist extra-binomial variation due to tank effects and different treatments. They believe that aflatoxicol must undergo more chemical changes than aflatoxin B1 to produce tumors in fish. This may result in more variation in the effective doses reaching the liver of fish in aflatoxicol tanks and, therefore, a greater degree of extra-binomial variation for the aflatoxicol group. The issue of interest is to assess dose level and treatment effects on the proportions of fish with liver tumors while taking extra-binomial variation into account.

We first apply the usual logistic regression model with covariates including an intercept, dose level ($x_{i1}$), treatment ($x_{i2}$) and dose-treatment interaction ($x_{i3}$), where

$$x_{i2} = \begin{cases} 0 & \text{if fish in tank } i \text{ were exposed to aflatoxin B1} \\ 1 & \text{if fish in tank } i \text{ were exposed to aflatoxicol} \end{cases} \tag{3.42}$$

and

$$x_{i3} = x_{i1} x_{i2}. \tag{3.43}$$

The top part of Table 3.6 reports the results of fitting the data to the usual logistic regression models. Note that the deviance and Pearson goodness-of-fit statistics for the model with covariates $x_{i1}$, $x_{i2}$ and $x_{i3}$ are 391.08 and 365.3, respectively, with 36 degrees of freedom, suggesting that there is significant evidence of lack of fit in the logistic regression model.
Furthermore, the data are overdispersed with respect to the binomial distribution, since each of the overdispersion tests is highly significant (Na = 68.26, N = 36.42 and N = 36.3). This also indicates the inadequacy of the usual logistic regression model.

Ganio and Schafer (1992) only present exploratory techniques for use in an early stage of data analysis to aid the modelling of extra-binomial variation. They take some function of the dispersion parameter in a generalized linear model to depend on explanatory variables. To detect extra-binomial variation in the fish data, they consider three models for dispersion. Let $\pi_{ij}$ be the probability of tumor for concentration level $i$ and carcinogen group $j$ ($i = 1, \ldots, 5$; $j = 1, 2$), and let $Y_{ijk}$ be the number of tumors observed in $m_{ijk}$ fish in tank $k$ of treatment $ij$. They then model the variance of this count as $m_{ijk} \pi_{ij} (1 - \pi_{ij}) / \phi_{ijk}$ and consider the following forms for the dispersion parameter: (a) $\phi_{ijk}$ constant; (b) $\phi_{ijk} = \lambda + a z_j$, where $z_j = (j - 1)$; and (c) $\phi_{ijk} = [\lambda + a z_{ijk}]^{-1}$, where $z_{ijk} = (m_{ijk} - 1) \pi_{ij} (1 - \pi_{ij})$. Note that Model (a) is a generalized linear model with constant dispersion. Model (b) contains separate dispersion parameters for the two carcinogen groups. Model (c) is the approximate variance of $Y_{ijk}$ if a random effect, with mean 0 and variance $a$, is additive on the logit scale (Williams, 1982). They find that the extra-binomial variation is associated with the type of carcinogen and cannot be explained simply by differences in the $\pi$'s. Note, however, that they do not analyze the extra-binomial variation and the dose-response function in the mean simultaneously.
We apply the mixed logistic regression model assuming that (1) each observed number of tumors, $y_i$, in $m_i$ fish in tank $i$ is associated with covariates $x_i^{(m)} = (1, x_{i1}, x_{i2})$ and $x_i^{(r)} = (1, x_{i1}, x_{i2}, x_{i3})$, where $x_{i1}$, $x_{i2}$ and $x_{i3}$ are defined above; and (2) the numbers of tumors in different tanks are independent and follow a mixed logistic regression model with binomial parameters $\pi_{ij}$ given by the link function

$$\pi_{ij} = \frac{\exp(\alpha_{j0} + \alpha_{j1} x_{i1} + \alpha_{j2} x_{i2} + \alpha_{j3} x_{i3})}{1 + \exp(\alpha_{j0} + \alpha_{j1} x_{i1} + \alpha_{j2} x_{i2} + \alpha_{j3} x_{i3})}, \tag{3.44}$$

where $i = 1, \ldots, 40$ and $j = 1, \ldots, c$, and the mixing probabilities $p_{ij}$ given by

$$p_{ij} = \frac{\exp(\beta_{j0} + \beta_{j1} x_{i1} + \beta_{j2} x_{i2})}{1 + \sum_{k=1}^{c-1} \exp(\beta_{k0} + \beta_{k1} x_{i1} + \beta_{k2} x_{i2})} \quad \text{for } j = 1, \ldots, c-1 \tag{3.45}$$

and

$$p_{ic} = 1 - \sum_{j=1}^{c-1} p_{ij}. \tag{3.46}$$

Note that since the smallest binomial denominator in the data set is 80, the mixed logistic regression model is identifiable if $c < (80 + 1)/2 = 40.5$. Thus, there are virtually no restrictions on identifiability in this example.

Table 3.6 provides the results of fitting these models. To determine the number of components first, we compare the values of AIC and BIC among the three saturated models. Clearly, both AIC and BIC lead to the choice of a 2-component mixed logistic regression model. Within these 2-component models we carry out inference using likelihood ratio tests. First we test which covariates in the mixing probabilities are significant. Comparing the model that excludes only $x_{i2}$ from the mixing probabilities and includes all covariates in the binomial parameters with the saturated 2-component model, the chi-square test statistic is 0 to two decimal places. This clearly indicates that $x_{i2}$ is insignificant in the mixing probabilities. Then we test the hypothesis that $x_{i1}$ is insignificant in the mixing probabilities. The corresponding chi-square test statistic is 2(1784.36 - 1757.48) = 53.76 with one degree of freedom, suggesting that $x_{i1}$ is highly significant in the mixing probabilities.
For the binomial parameters we first test the hypothesis that the dose-treatment interaction is insignificant. Comparing the model that excludes only $x_{i3}$ from the binomial parameters and includes $x_{i1}$ in the mixing probabilities with the model that includes all covariates in the binomial parameters and $x_{i1}$ in the mixing probabilities, the chi-square test statistic is 2(1760.57 - 1757.48) = 6.18 with 2 degrees of freedom. Since the p-value of the test statistic is 0.0455, we do not reject the hypothesis, at the 1% level, that the interaction effect is insignificant. On the other hand, the effects of both covariates $x_{i1}$ and $x_{i2}$ are significant. For instance, to test the hypothesis that the effect of $x_{i2}$ is insignificant, we compare the model including $x_{i2}$ in the binomial parameters only and $x_{i1}$ in both the mixing probabilities and the binomial parameters with the model including only $x_{i1}$ in both the mixing probabilities and the binomial parameters, and obtain the corresponding chi-square test statistic 2(1834.94 - 1760.57) = 148.74 with 2 degrees of freedom. Clearly we reject the hypothesis that the effect of $x_{i2}$ is insignificant. Finally, we test the hypothesis of a common treatment effect for both components, i.e., $\alpha_{12} = \alpha_{22}$. This hypothesis is not rejected because the test statistic is 0 to two decimal places. Therefore we select the 2-component mixed logistic regression model with the dose level covariate in both the mixing probabilities and the binomial parameters and a common coefficient for the treatment covariate in the binomial parameters. This model fits the data best.

After fitting the 2-component mixed logistic regression model, the Pearson goodness-of-fit test statistic $X^2$ is 52.18 with 33 degrees of freedom. The p-value of the test statistic is 0.0181, suggesting that there is no evidence of lack of fit at the 1% significance level. Note that the deviance for the fitted model is 51.46 with 33 degrees of freedom.
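The likelihood ratio and goodness-of-fit calculations quoted above can be checked with a short script. The sketch below is an illustration only: the function names are hypothetical, and the chi-square upper-tail probability is computed from the standard closed-form series for integer degrees of freedom rather than from any routine used in the thesis.

```python
import math


def lr_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic 2 * (l_full - l_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail term plus a finite series in odd powers of sqrt(x).
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2.0))
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2.0 * math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi) * total


# Log-likelihoods as quoted in the text for the 2-component models.
stat_interaction = lr_statistic(-1760.57, -1757.48)   # 6.18 on 2 df
p_interaction = chi2_sf(stat_interaction, 2)
stat_dose_mixing = lr_statistic(-1784.36, -1757.48)   # 53.76 on 1 df
p_dose_mixing = chi2_sf(stat_dose_mixing, 1)
p_pearson = chi2_sf(52.18, 33)                        # Pearson X^2 on 33 df
```

Here `p_interaction` rounds to 0.0455, matching the p-value quoted for the interaction test, while `p_dose_mixing` is vanishingly small.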
In addition, the Pearson, deviance and likelihood residuals from the fitted model are calculated and displayed in Figure 3.1, Figure 3.2 and Figure 3.3 respectively. These plots show that the three types of residuals are very similar to each other, and that the 37th observation is far distant from the remaining observations in these plots, suggesting that it is an outlier. On omitting this observation, the deviance reduction is $r_{L,37}^2 = (-3.1651)^2 = 10.0179$. This means that the 37th observation has a great impact on the overall fit of the mixed logistic regression model to the data.

For detection of influential observations, the average relative coefficient changes $w_i$ are calculated and displayed in Figure 3.4. Clearly, the 37th observation also has the largest value (0.3543). On omitting this observation, the average relative coefficient change for each parameter estimate is about 35%, and the new parameter estimates become

$$\hat\beta_1 = (-44.38, 1183.7), \tag{3.47}$$

$$\hat\alpha_1 = (-0.8838, 7.2151, 1.2232), \tag{3.48}$$

$$\hat\alpha_2 = (-4.8242, 123.29, 1.2232). \tag{3.49}$$

Note that the changes in the binomial parameter estimates for the first component are relatively large, while there are almost no changes in the parameter estimates for the mixing probabilities. This indicates that the 37th observation has greater influence on the first component than on the second component.

We now interpret the fitted model. The chosen mixed logistic regression model suggests that the numbers of fish with liver tumors are generated by two underlying binomial distributions with binomial parameters defined by, respectively,

$$\hat\pi_{i1} = \frac{\exp(-0.8161 + 6.6209 x_{i1} + 1.1686 x_{i2})}{1 + \exp(-0.8161 + 6.6209 x_{i1} + 1.1686 x_{i2})} \tag{3.50}$$

and

$$\hat\pi_{i2} = \frac{\exp(-4.7798 + 122.92 x_{i1} + 1.1686 x_{i2})}{1 + \exp(-4.7798 + 122.92 x_{i1} + 1.1686 x_{i2})}. \tag{3.51}$$

In addition, these two distributions are mixed according to the mixing probabilities defined by

$$\hat p_{i1} = \frac{\exp(-44.38 + 1183.7 x_{i1})}{1 + \exp(-44.38 + 1183.7 x_{i1})} \tag{3.52}$$

and

$$\hat p_{i2} = \frac{1}{1 + \exp(-44.38 + 1183.7 x_{i1})}. \tag{3.53}$$

According to this model, tanks in either of the two treatments may be classified into two groups on the basis of the two dose-response functions. For either treatment, fish in tanks exposed to a higher dose level (> 0.025 ppm) follow one dose-response function, and fish exposed to a lower dose level ($\le$ 0.025 ppm) follow another. In addition, the treatment effect is the same for both groups. When exposed to a higher dose level, there is a higher chance for fish to follow the first dose-response function because the mixing probability for component one is very close to 1. Similarly, when exposed to a lower dose level, there is a higher chance for fish to follow the second dose-response function because the mixing probability for component two is close to 1. Figure 3.5 provides the estimated proportions of fish with liver tumors corresponding to each group for either of the two treatments (the solid line is the proportion for group one and the dotted line for group two). Note that Figure 3.5 also classifies the observed proportions in terms of the estimated posterior probabilities from the fitted model.
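To see how the fitted model separates the dose levels, the two dose-response functions and the mixing probability of component one can be evaluated at the five experimental doses. The sketch below simply codes the fitted expressions with the coefficients reported in (3.50) to (3.53); the function names are hypothetical.

```python
import math

DOSES = [0.010, 0.025, 0.050, 0.100, 0.250]  # ppm, as in Table 3.5


def inv_logit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tumor_prob_component1(dose, aflatoxicol):
    """Fitted binomial parameter of component one, as in (3.50)."""
    return inv_logit(-0.8161 + 6.6209 * dose + 1.1686 * aflatoxicol)


def tumor_prob_component2(dose, aflatoxicol):
    """Fitted binomial parameter of component two, as in (3.51)."""
    return inv_logit(-4.7798 + 122.92 * dose + 1.1686 * aflatoxicol)


def mixing_prob_component1(dose):
    """Fitted mixing probability of component one, as in (3.52)."""
    return inv_logit(-44.38 + 1183.7 * dose)


for dose in DOSES:
    p1 = mixing_prob_component1(dose)
    for treatment in (0, 1):  # 0 = aflatoxin B1, 1 = aflatoxicol
        pi1 = tumor_prob_component1(dose, treatment)
        pi2 = tumor_prob_component2(dose, treatment)
        print(f"dose={dose:.3f} x2={treatment} p1={p1:.6f} pi1={pi1:.4f} pi2={pi2:.4f}")
```

The mixing probability of component one is essentially 0 at 0.010 and 0.025 ppm and essentially 1 at the three higher doses, which is the dose-based grouping described above.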
Those observations marked as "1" form group one, which is characterized by the function (3.50), while those marked as "2" form group two, which is characterized by the function (3.51). Figure 3.6 depicts the mean-variance relationship for the fitted model based on the estimated mean and variance obtained through (3.8) and (3.9). Note that there is no obvious parametric relationship between the estimated mean and variance.

For the purpose of comparison, we also fit the data to the two quasi-likelihood models discussed by McCullagh and Nelder (1989) and Williams (1982) respectively. The first assumes a variance of the form $\mathrm{Var}(Y_i) = \sigma^2 m_i \pi_i (1 - \pi_i)$, and the second $\mathrm{Var}(Y_i) = m_i \pi_i (1 - \pi_i)[1 + (m_i - 1)\phi]$. Note that the unknown parameters $\sigma^2$ and $\phi$ are usually called unexplained variance. The parameter estimates and standard errors are given in Table 3.7. Note that the dose-treatment interaction effect is not significant in the quasi-likelihood models (estimates not reported here). As expected, the parameter estimates for the two quasi-likelihood models are very similar to each other because the binomial denominators $m_i$ do not vary much. From Table 3.7, we find that the parameter estimates under the quasi-likelihood models and the mixed logistic regression model are different, suggesting that using different methods to model extra-binomial variation may lead to different parameter estimates, different standard errors, or both. For instance, the coefficient estimate for dose level is 12.82 and 12.81 under quasi-likelihood methods I and II respectively, and 6.6209 for component one and 122.92 for component two under the mixed logistic regression model. Furthermore, computing the t-statistic (estimated coefficient/standard error) and comparing the mixed logistic regression model with the quasi-likelihood models, we find that quasi-likelihood models may underestimate the treatment and dose effects.
For example, the values of the t-statistic of the estimated coefficient for $x_{i2}$ are 4.1556 and 4.1441 for quasi-likelihood models I and II respectively, while the value for the mixed logistic regression model is 13.732. Thus, compared with quasi-likelihood methods, the mixed logistic regression model has narrower confidence intervals for parameter estimates.

In summary, we have applied the mixed logistic regression model to analyze the data from a fish toxicology study. The data are well fitted by a 2-component mixed logistic regression model with mixing probabilities depending on the dose level covariate and binomial parameters depending on both dose level and treatment covariates. The goodness-of-fit test suggests that there is no evidence of lack of fit in the model. In addition, the residual analysis identifies an outlier and influential observations. According to this model, there are two dose-response functions for each treatment, which describe the lower dose level and higher dose level situations respectively. Compared with the quasi-likelihood methods, the mixed logistic regression model gives narrower confidence intervals for parameter estimates. Note that both the parameter estimates and the standard errors under the mixed logistic regression model differ from those obtained by the quasi-likelihood methods.

3.7 Tables and Figures in Chapter 3

Table 3.1: Data of Busvine (1938)

Jar label   Dose    Jar total   Number dead
1           0.033   24          0
2           0.167   31          10
3           0.199   30          17
4           0.225   31          12
5           0.260   27          7
6           0.314   26          23
7           0.322   30          22
8           0.362   31          29
9           0.391   30          30
10          0.394   30          23

Table 3.2: The results of the simulations for the mixed logistic regression model (Model 1) Initial values set as the true values Parameter True value Upper extreme Upper quartile Median Lower quartile -0.9163 -0.2618 -0.7825 -0.9698 -1.
1354 -1.6169 -0.9643 -0.5108 0.0366 -0.3293 -0.5443 -0.7185 -1.2269 -0.5380 a it 4.2962 0.6248 4.0144 4.2717 4.4430 -1.9056 4.2402 a 12 -0.4505 0. 1955 -0.2766 0.4593 0.6294 -0.9752 -0.4855 a 21 -1.3148 -0.8427 -1.2082 -1.3796 4.5097 4.7884 4.3619 a 22 1.0811 1.4963 1.2325 1.1158 1.0344 0.7893 1.1211 a 31 0.6973 1.0054 0.8004 0.6911 0.5779 0.3414 0.6892 a 32 0.7499 1.1164 0.8855 0.7695 0.6527 0.4017 0.7683 ç H t ‘-21 Lower extreme Average Initial values chosen step by step -0.9163 -0.2467 -0.7711 -0.9626 -1.1331 -1.6162 -0.9586 -0.5108 0.0336 -0.3317 -0.5500 -0.7468 -1.2571 -0.5386 a 1 1 4.2962 0.6238 -0.9970 -1 .2561 -1.4414 4.9909 4.2324 a 12 0.4505 0. 1540 -0.3189 -0.4839 -0.6473 -0.9759 -0.4976 a 21 4.3148 0.7701 4.1968 4.3648 4.5076 -1.7880 -1.2844 a 22 1.0811 1.5773 1.2293 1.1066 0.9885 0.6696 1.0599 a 31 0.6973 1.0076 0.813i 0.6996 0.5814 0.3416 0.7011 a 32 0.7499 1.1165 0.8772 O7656 0.6473 0.3954 0.7577 , i-fl ‘-21 Chapter 3. Mixed Logistic Regression Models 178 Table 33: The results of the simulations for the mixed logistic regression model (Model 2). 
Parameter  True value  Upper extreme  Upper quartile  Median  Lower quartile  Lower extreme  Average

Initial values set as the true values
β_11   -2.1129  -0.7531  -1.6871  -2.2560  -2.8070  -3.8851  -2.3068
β_12    1.6057   2.8955   2.0599   1.7048   1.4069   0.5755   1.7467
β_21   -0.9692   0.2170  -0.7619  -1.0517  -1.4186  -2.3082  -1.0958
β_22    1.3805   2.3960   1.7416   1.4624   1.1670   0.4227   1.4941
α_11   -2.1972  -1.8090  -2.0612  -2.1586  -2.2975  -2.6451  -2.2061
α_21   -0.8473  -0.6443  -0.7807  -0.8517  -0.9046  -1.0277  -0.8473
α_31    1.3863   1.5637   1.4346   1.3926   1.3371   1.2438   1.3892

Initial values chosen step by step
β_11   -2.1129  -0.7410  -1.6260  -2.2150  -2.7535  -3.8757  -2.2733
β_12    1.6057   2.8799   2.0629   1.6947   1.3974   0.5713   1.7469
β_21   -0.9692   0.2106  -0.7705  -1.0538  -1.4351  -2.3073  -1.1090
β_22    1.3805   2.3804   1.7385   1.4463   1.1490   0.4193   1.4889
α_11   -2.1972  -1.6279  -2.0135  -2.1347   2.2806   2.6425   2.1818
α_21    0.8473   0.6152   0.7692   0.8452   0.8980   4.0274   0.8378
α_31    1.3863   1.5637   1.4348   1.3926   1.3386   1.2438   1.3895

Table 3.4: The results of the simulations for the mixed logistic regression model (Model 3).

Parameter  True value  Upper extreme  Upper quartile  Median  Lower quartile  Lower extreme  Average

Initial values set as the true values
β_11   -2.1129  -0.2943  -1.5967  -2.2683  -2.8381   4.5911  -2.2618
β_12    1.6057   3.3711   2.1703   1.6764   1.2100  -0.1373   1.7061
β_21   -0.9692   0.0446  -0.8472  -1.0817  -1.4547  -2.1604  -1.1588
β_22    1.3805   2.6180   1.7743   1.4772   1.1977   0.3613   1.5383
α_11   -1.2962  -0.4551  -0.9429  -1.1646  -1.5292  -2.1857  -1.2114
α_12    0.4505   0.3340   0.2551   0.4819   0.6523   4.0381   0.5067
α_21    4.3148  -0.8501   4.1829  -1.3497   4.4191  -1.6918   4.3164
α_22    1.0811   1.3219   1.1537   1.0830   0.9804   0.7596   1.0718
α_31    0.6973   1.0964   0.8138   0.6710   0.5802   0.3398   0.6881
α_32    0.7499   1.2727   0.9020   0.7549   0.5985   0.2325   0.7604

Initial values chosen step by step
β_11   -2.1129  -0.2945  -1.5197  -2.1655  -2.8181  -4.6014  -2.2109
β_12    1.6057   3.3687   2.1429   1.6063   1.1472  -0.3411   1.6660
β_21   -0.9692   0.0457  -0.8123  -1.0815  -1.4438  -2.3682  -1.1286
β_22    1.3805   2.5867   1.7570   1.4766   1.1845   0.3604   1.5061
α_11   -1.2962  -0.4557  -0.9293  -1.1561  -1.5302  -2.1914  -1.2054
α_12    0.4505   0.3340  -0.2544   0.4896   0.6639   4.0377  -0.5096
α_21    4.3148   0.8394  -1.1830  -1.3459  -1.4153  -1.6962  -1.2974
α_22    1.0811   1.3233   1.1550   1.0768   0.9769   0.7526   1.0577
α_31    0.6973   1.0967   0.8149   0.6848   0.5824   0.3398   0.7022
α_32    0.7499   1.2727   0.8991   0.7518   0.5948   0.1788   0.7455

Table 3.5: Number of trout with liver tumors/number in tank

Dose (ppm)   Aflatoxin B1                    Aflatoxicol
0.010        3/86, 5/86, 4/88, 2/86          9/87, 5/86, 2/89, 9/85
0.025        14/87, 14/90, 9/83, 12/88       30/86, 41/86, 27/86, 34/88
0.050        29/90, 31/89, 33/89, 26/87      54/89, 53/86, 64/90, 55/88
0.100        44/86, 40/80, 44/89, 43/88      71/88, 73/89, 65/88, 72/90
0.250        62/87, 67/88, 59/88, 58/84      66/86, 75/82, 72/81, 73/89

Table 3.6: Logistic regression and mixed logistic regression model estimates for fish data.
Binomial parameters Mixing probability S (3) 0 fi = P,., BIC c,, 2 P,

Logistic regression model (1-component) I NA NA NA -1.758 (0.0847) 11.93 (0.6750) 0.8911 (0.1143) 1 NA NA NA -1.839 (0.0764) 12.82 (0.5424) 1.063 (0.0803) 2.402 (1.154) -1930.38 -1934.38 -1937.76 .1932.62 -1935.62 -1938.15 -1757.48 -1768.48 -1776.77 -1757.48 -1767.48 -1775.92 -1784.36 -1793.36 -1800.96 -1760.57 -1768.57 -1775.32 -1760.57 -1767.57 -1773.48 -1834.98 -1840.98 -1846.05 -1922.26 -1926.26 -1929.64 -1750.80 -1768.80 .1784.00

2-component mixture 1 -43.89 1170.6 -0.0103 2 42.81 1 1141.58 2 1 0.4095 2 1 42.72 1139.6 2 1 -44.38 (407.8) 1183.7 (4.453) 2 1 -2.6492 64.73 2 -0.9220 7.4548 1.4198 -2.1632 4.0710 90.50 0.1337 47.67 -0.9220 7.4547 1.4198 -2.1631 4.0710 90.50 0.1337 47.67 -0.9232 7.4613 [.4141 -2.1311 4.0708 90.4747 0.1356 47.48 -0.8156 6.6201 1.1676 4.7821 122.94 -0.8161 (0.0982) 6.6209 (0.6155) -4.7798 (0.2896) 122.92 (12.49) -0.5843 7.9411 -3.8221 87.33 1 4.9315 -1.5167 2 -[00.94 0.7843 1.1716 1.1686 (0.0851)

3-component mixture 1 25.66 -195.59 0.2586 -1.3636 13.58 1.1820 1.5813 2 74.16 -1500.7 .0.3769 -4.0710 90.50 0.1337 47.67 -0.6032 5.9736 1.6905 -3.5038 3 1

Log-likelihood does not include the constant term.

Table 3.7: Parameter estimates for four models for fish data

Covariates             Logistic regression   Quasi-likelihood I   Quasi-likelihood II   Mixed logistic regression
                                                                                         component 1        component 2
α_0 (intercept)        -1.839 (0.0764)       -1.839 (0.2434)      -1.841 (0.2424)       -0.8161 (0.0982)   -4.7798 (0.2896)
α_1 (dose level)       12.82 (0.5424)        12.82 (1.7278)       12.81 (1.714)         6.6209 (0.6155)    122.92 (12.49)
α_2 (treatment)        1.063 (0.0803)        1.063 (0.2558)       1.058 (0.2553)                 1.1686 (0.0851)
Unexplained variance   1.0                   10.15                0.1056                         NA

Figure 3.1: The index plot of the Pearson residuals from the fitted logistic regression model for the fish data.
Figure 3.2: The index plot of the deviance residuals from the fitted mixed logistic regression model for the fish data.

Figure 3.3: The index plot of the likelihood residuals from the fitted mixed logistic regression model for the fish data.

Figure 3.4: The index plot of the average relative coefficient changes from the fitted mixed logistic regression model for the fish data.

[Figure 3.5: plots with vertical axis "probabilities"]

Figure 3.6: The plot of the mean-variance relationship based on the fitted mixed logistic regression model for the fish data.

Chapter 4

Summary, Conclusions and Future Research

In this chapter, we summarize similarities and differences between the mixed Poisson regression and mixed logistic regression models discussed in the previous chapters. Furthermore, we discuss some extensions of these mixed regression models and related remaining issues for future research. Section 4.2 formulates a mixed exponential family regression model which includes the mixed Poisson regression and mixed logistic regression models as special cases. Section 4.3 concerns hidden Markov Poisson regression models for longitudinal data. We give some preliminary results for this model.

4.1 Summary and Conclusions

There are many similarities between the mixed Poisson regression and mixed logistic regression models discussed in Chapters 2 and 3.
Specifically:

• both models assume an unobserved mixing process which can occupy any one of c states, where c is finite and unknown; independent pairs of observed and unobserved random variables; covariates consisting of two parts, one related to the mixing probabilities and the other to the component parameters; and the same multinomial link in the mixing probabilities;

• both models can model overdispersion in the sense that the variances of the mixed regression models are larger than those specified by the mean-variance relationships of the corresponding usual regression models;

• parameters are estimated by maximum likelihood. Parameter estimates of both models are obtained by applying (1) the EM algorithm, treating the unobserved random variable as missing data, and (2) a quasi-Newton approach for the M-step and for maximizing the observed log likelihood functions;

• the model selection procedures for both models are the same, i.e., first determining the number of components by comparing the AIC and BIC values among the saturated models, and then carrying out inferences about regression parameters within c-component mixtures by likelihood ratio tests;

• classification, residual analysis and goodness-of-fit tests for both models are carried out in the same way.

There are several differences between the mixed Poisson regression and mixed logistic regression models. Obviously, the component distributions of the mixtures and the link functions are different. This leads to different sufficient conditions for identifiability of these models. For the mixed Poisson regression models, the sufficient conditions for identifiability are satisfied in virtually all applications; for the mixed logistic regression models, the sufficient conditions for identifiability depend on the binomial denominators, which may restrict the applications of these models in some cases.
Although the algorithms for computing parameter estimates for both models are similar, the implementations of these algorithms are quite different because different rescaling schemes are needed to overcome numerical overflow or underflow problems. Note that coding these algorithms might be a formidable task.

Both the mixed Poisson regression and mixed logistic regression models provide new tools to analyze discrete data when data are overdispersed with respect to either the Poisson or binomial assumption. Allowing covariates in both the mixing probabilities and the component parameters gives a direct way to assess the effect of each covariate on the response variable. Using these models, we can classify observations into different groups characterized by different regression functions. This may give a more meaningful interpretation of overdispersion.

The mixed regression models are not always preferable to other models for modelling overdispersion, such as parametric mixtures or quasi-likelihood regression. When overdispersion is reasonably modelled by a continuous mixing distribution, either parametric mixtures or quasi-likelihood regression models may be better. For the Poisson case, for instance, if extra-Poisson variation is caused by a random effect in the mean which is reasonably modelled by a continuous distribution, say a gamma distribution, then the negative binomial model is more suitable. Likewise, for the binomial case, if extra-binomial variation varies smoothly in the binomial denominators, Williams' quasi-likelihood models (1984) may be better. Nevertheless, the mixed regression models are suitable in many applications, as we have demonstrated in the previous chapters.

The same technique of accommodating heterogeneity with mixture models can be applied to other cases. We discuss some generalizations below.
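The overdispersion property summarized in this section can be checked numerically for the Poisson case. The following is a minimal sketch, not the thesis code (which is in FORTRAN): the function name and the example probabilities and rates are hypothetical, and covariates are ignored so that each component has a fixed rate. It uses the law of total variance, under which a finite Poisson mixture has variance equal to its mean plus the variance of the component rates.

```python
def poisson_mixture_moments(probs, rates):
    """Return (mean, variance) of a finite Poisson mixture.

    probs: mixing probabilities (assumed to sum to 1)
    rates: Poisson rate of each component
    """
    mean = sum(p * lam for p, lam in zip(probs, rates))
    # Law of total variance: E[Var(Y | component)] + Var(E[Y | component]).
    within = mean  # a Poisson component has variance equal to its rate
    between = sum(p * (lam - mean) ** 2 for p, lam in zip(probs, rates))
    return mean, within + between


# Hypothetical two-component example.
mean, var = poisson_mixture_moments([0.3, 0.7], [2.0, 10.0])
print(mean, var, var / mean)
```

The ratio of variance to mean exceeds 1 whenever the component rates differ, and equals 1 when they coincide, which is the sense in which the mixture accommodates extra-Poisson variation.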
4.2 Mixed Exponential Regression Models

For a given one-parameter one-dimensional exponential model, the mean-variance relationship is determined by a single parameter. The one-parameter exponential density is

$$h(y)\exp(\theta y - \chi(\theta)),$$

where $h(y)$ is a real function and $\chi(\theta)$ is the log moment generating function, with mean $\chi'(\theta)$ and variance $\chi''(\theta)$. Sometimes samples are found to be either too heterogeneous or too homogeneous to be explained by a one-parameter exponential family of models, in the sense that the implicit mean-variance relationship in such a model is violated by the data. If the sample variance is large compared with that predicted by inserting the sample mean into the mean-variance relationship, overdispersion occurs. On the other hand, if the sample variance is small compared with that predicted by the mean-variance relationship, underdispersion occurs. In this section, we suggest a mixed exponential regression model to adjust for overdispersion in terms of the mean-variance relationship of the one-parameter exponential model.

Let the random variable $Y_i$ denote the $i$th response variable, and let $\{(y_i, x_i),\ i = 1, \ldots, n\}$ denote observations, where the $y_i$ are observed values of $Y_i$ and the $x_i$ are $k$-dimensional covariate vectors associated with $y_i$. Note that $x_i = (x_i^{(m)}, x_i^{(r)})$, where $x_i^{(m)}$ and $x_i^{(r)}$ are $k_1$-dimensional and $k_2$-dimensional vectors corresponding to the regression part of the mixing probabilities and of the component parameters respectively. Usually the first element of $x_i^{(m)}$ and of $x_i^{(r)}$ is 1, corresponding to an intercept. Our mixed exponential regression model assumes that

(1) the unobserved mixing process can occupy any one of $c$ states, where $c$ is finite and unknown;

(2) for each observed response $y_i$ there is an unobserved random variable, $\Theta_i$, representing the component which generates $y_i$. Further, the $(Y_i, \Theta_i)$ are pairwise independent;

(3) conditional on the covariate $x_i^{(m)}$, $\Theta_i$ follows a discrete distribution with $c$ points of support, $1, \ldots, c$, and $\Pr(\Theta_i = j \mid x_i^{(m)}, \beta) = p_j(x_i^{(m)}, \beta) = p_{ij}$, where $\sum_{j=1}^{c} p_j(x_i^{(m)}, \beta) = 1$ and $p_{ij}$ is defined by

$$p_{ij} = p_j(x_i^{(m)}, \beta) = \frac{\exp(\beta_j' x_i^{(m)})}{1 + \sum_{k=1}^{c-1} \exp(\beta_k' x_i^{(m)})} \quad \text{for } j = 1, \ldots, c-1, \tag{4.1}$$

and

$$p_{ic} = p_c(x_i^{(m)}, \beta) = 1 - \sum_{j=1}^{c-1} p_{ij}, \tag{4.2}$$

with $\beta = (\beta_1', \ldots, \beta_{c-1}')'$, where $\beta_j = (\beta_{j1}, \ldots, \beta_{jk_1})'$, $j = 1, \ldots, c-1$, are unknown parameters. In fact, conditional on $x_i^{(m)}$, $\Theta_i$ follows a multinomial distribution $(1, p_{i1}, \ldots, p_{ic})$. Note that $\beta_j$ appears in each $p_{ij}$ for $j = 1, \ldots, c$;

(4) conditional on $\Theta_i = j$, $Y_i$ follows a one-parameter exponential distribution which we denote by

$$f_j(y_i \mid x_i^{(r)}, \alpha_j) = h(y_i)\exp(\theta_{ij} y_i - \chi(\theta_{ij})), \tag{4.3}$$

where $\theta_{ij} = h(x_i^{(r)}, \alpha_j)$ for $j = 1, \ldots, c$, and $\alpha_j = (\alpha_{j1}, \ldots, \alpha_{jk_2})'$, $j = 1, \ldots, c$, are unknown parameters. Note that the component parameters $\theta_{ij}$ relate to the covariates $x_i^{(r)}$ through the link function $h$.

Under the above assumptions the probability "density" of $Y_i$ satisfies

$$f(y_i \mid x_i^{(m)}, x_i^{(r)}, \alpha, \beta) = \sum_{j=1}^{c} p_{ij}\, h(y_i)\exp(\theta_{ij} y_i - \chi(\theta_{ij})), \tag{4.4}$$

where $p_{ij}$ and $\theta_{ij}$ are specified by (4.1), (4.2) and (4.3) respectively.

Note that the mixed Poisson regression and mixed logistic regression models discussed in the previous chapters are special cases of the mixed exponential regression models in which the component distributions are Poisson and binomial distributions respectively.

Another example of the mixed exponential regression models is the mixed normal regression model, which assumes that the component distributions are normal distributions with a common variance for all components. In this case, the component distributions can be denoted by

$$f_j(y_i \mid x_i^{(r)}, \alpha_j) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{1}{2\sigma^2}(y_i - \mu_{ij})^2\right),$$

where $\mu_{ij} = \alpha_j' x_i^{(r)}$ for $j = 1, \ldots, c$. Note that the link function is the identity function.

To apply the mixed exponential regression models, we need to show under what conditions the unconditional variance of $Y_i$ is larger than that allowed by the one-parameter exponential distribution.
The results given by Shaked (1980) may provide insight into this question. Since different assumptions about the component distributions may lead to different conditions for identifiability of the mixed regression models, as we showed in the previous chapters, we also need to show under what conditions the mixed exponential regression models are identifiable.

As we did for the mixed Poisson regression and mixed binomial regression models, parameter estimation for the mixed exponential regression model can be carried out by maximum likelihood. Obtaining the maximum likelihood estimates requires an iterative algorithm similar to the ones in the previous chapters. Specifically, for a fixed number of components c, we may apply the EM algorithm by defining the unobserved random variable as missing data and using a quasi-Newton approach for the M-step. When either the observed likelihood or the parameter estimates do not change by more than a given tolerance, we apply a quasi-Newton approach for maximizing the observed likelihood function.

After fitting data to the proposed model, we need to carry out residual analysis to identify possible outliers and influential observations, and a goodness-of-fit test for the fitted model. As we do for the mixed Poisson and logistic regression models, we propose using Pearson, deviance and likelihood residuals as well as relative average coefficient changes in a similar way for this purpose. We also suggest using the estimated posterior probabilities from the fitted model to classify observations into c groups, each characterized by a regression function.

4.3 Hidden Markov Poisson Regression Models

In this section, we consider a statistical method for longitudinal discrete data where the objective of data analysis is to describe an observed count, $Y_{ki}$, for subject $k$ during the $i$th time interval $\Delta t_{ki}$, as a function of covariates, $x_{ki}$.
Longitudinal data are characterized by the fact that there may exist some dependence structure between repeated observations for a subject. The model which we have developed assumes that the dependence between repeated observations for a subject is determined by a finite state Markov chain in such a way that, conditional on a state, an observed count, $Y_{ki}$, follows a Poisson distribution with mean specified by the product of the exposure, $\Delta t_{ki} = t_{ki} - t_{k,i-1}$, and a Poisson rate defined by a log-linear function of covariates, $x_{ki}$, in which coefficients may vary from state to state. This model allows for overdispersion relative to the usual Poisson regression model.

Our initial motivation comes from economic studies which investigate the relationship between research and development and patent activity at the firm level, based on longitudinal discrete data associated with covariates. The previous studies have suggested that the data may be overdispersed relative to the usual Poisson regression and that there may exist some correlation between repeated observations for a firm (Hausman, Hall and Griliches (1984) and Hall, Griliches and Hausman (1986)). However, these studies do not discuss directly modelling the dependence structure between repeated observations. Our approach explicitly specifies the dependence structure as a finite state Markov chain and estimates both the parameters of the Markov chain and the coefficients in the Poisson regression corresponding to each underlying state.

In the context of generalized linear models, several approaches have been developed for longitudinal data. Liang and Zeger (1986) proposed a general framework for analysis of longitudinal data based on generalized linear models, and Zeger (1988), Kaufmann (1987), Stiratelli, Laird and Ware (1984), Zeger, Liang and Self (1985) and Zeger and Qaqish (1988) developed methods for serially correlated discrete observations.
In applications to economics in which data are primarily continuous, some approaches allow parameter values to change suddenly according to the states of a Markov chain, cf. Goldfeld and Quandt (1973), Lindgren (1978), Sclove (1983) and Tyssedal and Tjostheim (1988).

In applications without covariates, Albert (1991) proposed a two-state Markov mixture model for longitudinal epileptic seizure counts. Leroux and Puterman (1992), Leroux (1989) and Le, Leroux and Puterman (1992) developed a finite state Markov mixture model for the sequence of counts of fetal movements. Our approach extends theirs by incorporating covariates into the model and allowing variable exposure. We also use a rescaling scheme to overcome numerical overflow or underflow in applying the EM algorithm, so that our algorithm improves on the ones proposed by these authors.

4.3.1 The Model

The model we study in this paper embeds a finite state Markov chain in Poisson regression in which the regression coefficients depend on the chosen state. Specifically, let $\{(y_{ki}, x_{ki}, t_{ki});\ i = 1, \ldots, n_k,\ 0 = t_{k0} < t_{k1} < \cdots < t_{kn_k}\}$ be a sequence of observed data for a subject $k$, where $y_{ki}$ is an observed count associated with covariates $x_{ki}$ of dimension $d$ during a time interval $\Delta t_{ki} = t_{ki} - t_{k,i-1}$. For simplicity we suppress the subscript for subjects in the following discussion. A Markov Poisson regression model assumes

(1) The unobserved stochastic process has $c$ possible states, where $c$ is finite and unknown;

(2) For each observed count, $y_i$, at time point $t_i$, there exists an unobserved discrete random variable, $S_i$, representing the state at which $y_i$ is generated. Further, $S_i$ has $c$ points of support, $\{1, \ldots, c\}$;

(3) The S-process, $\{S_1, S_2, \ldots, S_n\}$, follows a $c$-state Markov chain with transition probabilities defined by

$$\Pr(S_i = j \mid S_{i-1} = k) = p_{kj}, \quad j, k = 1, \ldots, c; \tag{4.5}$$

(4) Conditional on $S_i = j$, $Y_i$ follows a Poisson distribution which we denote as

$$f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j) = \mathrm{Po}(y_i \mid \lambda(x_i, \alpha_j)\Delta t_i) = \frac{[\lambda(x_i, \alpha_j)\Delta t_i]^{y_i}\exp[-\lambda(x_i, \alpha_j)\Delta t_i]}{y_i!}, \tag{4.6}$$

where $y_i = 0, 1, \ldots$, $\Delta t_i = t_i - t_{i-1}$, and $\lambda(x_i, \alpha_j)$ is a nonnegative function equal to the Poisson rate; for example, $\lambda(x_i, \alpha_j) = \exp(\alpha_j' x_i)$, where $\alpha_j = (\alpha_{j1}, \ldots, \alpha_{jd})'$, $j = 1, \ldots, c$, are unknown parameter vectors. Note that $\Delta t_i$ may equal 1 for all $i$ or correspond to the time of observation in time series data.

The above assumptions define a semi-Markov process $\{(Y_i, S_i);\ i = 1, \ldots, n,\ 0 = t_0 < t_1 < \cdots < t_n\}$ in which the transitions of the S-process follow a stationary, first-order Markov process, and the count, $Y_i$, is renewed at each transition point, $t_i$, so that the conditional component distributions for the count depend only upon which state is exited. Note that the Poisson rates of the conditional component distributions vary between states through different coefficients in the same Poisson regression specification. Furthermore, since the covariates can include parts of an individual's past history, the proposed model provides a means of relaxing the assumptions about the transition process of the renewal counts. Note that the transition probabilities $p_{jk}$ do not depend on covariates.

Under the above assumptions, the joint probability "density" function of a sequence of observed counts, $Y = \{y_1, \ldots, y_n\}$, associated with covariates, $X = \{x_1, \ldots, x_n\}$, and exposure, $T = \{\Delta t_1, \ldots, \Delta t_n\}$, satisfies the following equation:

$$f(Y \mid X, T, \theta) = \sum_{S_1=1}^{c} \cdots \sum_{S_n=1}^{c} p^{(1)}_{S_1} f_{S_1}(y_1 \mid x_1, \Delta t_1, \alpha_{S_1}, S_1) \prod_{i=2}^{n} p_{S_{i-1}S_i}\, f_{S_i}(y_i \mid x_i, \Delta t_i, \alpha_{S_i}, S_i), \tag{4.7}$$

where $\theta = (\alpha_{11}, \ldots, \alpha_{1d}, \alpha_{21}, \ldots, \alpha_{2d}, \ldots, \alpha_{c1}, \ldots, \alpha_{cd}, p_{11}, \ldots, p_{1c}, \ldots, p_{c1}, \ldots, p_{cc})$ is an unknown parameter vector, $p_j^{(1)} = \Pr(S_1 = j)$, $j = 1, \ldots, c$, are the probabilities of the initial states for the subject, and $p_{S_{i-1}S_i}$ and $f_{S_i}(y_i \mid x_i, \Delta t_i, \alpha_{S_i}, S_i)$ are defined by (4.5) and (4.6) respectively.
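For short sequences, the sum in (4.7) can be evaluated literally by enumerating all $c^n$ state paths. The sketch below is only an illustration under stated assumptions: the function names and the example data are hypothetical, the rate is taken to be the log-linear example $\lambda(x_i, \alpha_j) = \exp(\alpha_j' x_i)$, and `trans[j][k]` is assumed to hold $p_{jk}$, the probability of moving from state $j$ to state $k$. The enumeration is exponential in $n$, so it is useful for checking other implementations rather than for fitting.

```python
import itertools
import math


def poisson_pmf(y, mean):
    """Poisson probability of count y for a positive mean."""
    return math.exp(y * math.log(mean) - mean - math.lgamma(y + 1))


def state_means(x, dt, alphas):
    """Conditional Poisson means dt * exp(alpha_j' x), one per state."""
    return [dt * math.exp(sum(a * v for a, v in zip(alpha, x))) for alpha in alphas]


def joint_likelihood(ys, xs, dts, alphas, init, trans):
    """Evaluate (4.7) by summing over all c**n state paths (small n only)."""
    c, n = len(alphas), len(ys)
    means = [state_means(xs[i], dts[i], alphas) for i in range(n)]
    total = 0.0
    for path in itertools.product(range(c), repeat=n):
        # Initial state probability times the first conditional Poisson term.
        prob = init[path[0]] * poisson_pmf(ys[0], means[0][path[0]])
        for i in range(1, n):
            prob *= trans[path[i - 1]][path[i]] * poisson_pmf(ys[i], means[i][path[i]])
        total += prob
    return total


# Hypothetical data: three observations, two states, intercept plus one covariate.
ys = [3, 0, 5]
xs = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.0]]
dts = [1.0, 2.0, 1.5]
alphas = [[0.1, 0.3], [1.2, -0.4]]
print(joint_likelihood(ys, xs, dts, alphas, [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]]))
```

Two sanity checks follow directly from the form of (4.7): with a single state the value is a product of Poisson probabilities, and with an identity transition matrix it is a weighted sum, over states, of such products.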
Note that the probabilities of the initial states, $p_j^{(1)}$, are assumed known. We will discuss how to determine their values below. Note also that $\sum_{k=1}^{c} p_{jk} = 1$ for all $j$.

Some previously studied models are special cases of the above model.

• Choosing $c = 1$ yields a Poisson regression model;

• Choosing the transition probability matrix as an identity matrix yields an independent mixed Poisson regression model which is a special case of the generalized mixed Poisson regression models discussed by Wang, Puterman, Le and Cockburn (1994);

• Setting $x_i = (1)$ and $\Delta t_i = 1$ for all $i$ yields a Markov Poisson mixture without covariates, which is studied by Leroux and Puterman (1992) and Albert (1991).

4.3.2 Moment Structure

From the above definition we can derive the basic moment structure of observed counts. Using the properties of conditional expectation, we obtain

$$E(Y_i \mid S_i) = \lambda_{iS_i} \quad \text{and} \quad \mathrm{Var}(Y_i \mid S_i) = \lambda_{iS_i},$$

where $\lambda_{ij}$ denotes the Poisson mean of $Y_i$ in state $j$. Thus the unconditional mean and variance of $Y_i$ are

$$E(Y_i) = E(E(Y_i \mid S_i)) = \sum_{j=1}^{c} \Pr(S_i = j)\lambda_{ij}, \tag{4.8}$$

$$\mathrm{Var}(Y_i) = E(\mathrm{Var}(Y_i \mid S_i)) + \mathrm{Var}(E(Y_i \mid S_i)) = \sum_{j=1}^{c} \Pr(S_i = j)\lambda_{ij} + \sum_{j=1}^{c} \Pr(S_i = j)\Big\{\lambda_{ij} - \sum_{l=1}^{c} \Pr(S_i = l)\lambda_{il}\Big\}^2. \tag{4.9}$$

Since the second term in (4.9) is always nonnegative, (4.8) and (4.9) show that the proposed model can accommodate overdispersion relative to Poisson regression, and that the observed data are homogeneous if and only if $\lambda_{i1} = \cdots = \lambda_{ic}$ for all $i$.

The covariance of $Y_i$ and $Y_{i+m}$ is given by

$$\mathrm{Cov}(Y_i, Y_{i+m}) = \mathrm{Cov}(E(Y_i \mid S_i), E(Y_{i+m} \mid S_{i+m})) = \sum_{j=1}^{c}\sum_{k=1}^{c} \lambda_{ij}\lambda_{i+m,k}\Pr(S_i = j, S_{i+m} = k) - \sum_{j=1}^{c} \Pr(S_i = j)\lambda_{ij} \sum_{k=1}^{c} \Pr(S_{i+m} = k)\lambda_{i+m,k}.$$

4.3.3 Identifiability

In applying the Markov Poisson regression models we must be concerned with their identifiability. Without covariates, Teicher (1961, 1967) proves that both the class of finite Poisson mixtures and the class of all mixtures of Poisson distribution products are identifiable. We will apply these results to derive sufficient conditions for identifiability of the model.
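But we first define identifiability for the Markov Poisson regression model as follows.

Definition: Consider the class of probability models, $\{f(Y \mid X, T, \theta)\}$, with $f(Y \mid X, T, \theta)$ defined by (4.7), an order restriction $\alpha_1 < \cdots < \alpha_c$ on the component parameter vectors, parameter space $C \times \Theta$, sample space $\mathcal{Y}_1 \times \cdots \times \mathcal{Y}_n$, and fixed covariate matrices $X$ and $T$. The class of probability models is identifiable if for $(c, \theta), (c^*, \theta^*) \in C \times \Theta$,

$$f(Y \mid X, T, \theta) = f(Y \mid X, T, \theta^*) \tag{4.10}$$

for all $(y_1, \ldots, y_n) \in \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_n$ implies $(c, \theta) = (c^*, \theta^*)$.

Note that the order restriction in the definition indicates that two models are equivalent if they agree up to permutations of parameters. We now provide a sufficient condition for identifiability as follows.

Theorem 3: The hidden Markov Poisson regression model is identifiable if the design matrix $X$ is of full rank.

Proof: Suppose that $(c, \theta)$ and $(c^*, \theta^*)$ satisfy (4.10). Summing both sides of equation (4.10) over $y_2, \ldots, y_n$ yields

$$\sum_{j=1}^{c} p_j^{(1)}\,\mathrm{Po}(y_1 \mid \lambda_{1j}) = \sum_{j=1}^{c^*} p_j^{(1)*}\,\mathrm{Po}(y_1 \mid \lambda_{1j}^*) \tag{4.11}$$

for all $y_1 \in \mathcal{Y}_1$. Since each side of equation (4.11) may be regarded as a finite Poisson mixture without covariates, Teicher's result (1961) implies that

$$c = c^*, \quad p_j^{(1)} = p_j^{(1)*} > 0 \quad \text{and} \quad \lambda_{1j} = \lambda_{1j}^* \tag{4.12}$$

for $j = 1, \ldots, c$. Now summing both sides of (4.10) over $y_3, \ldots, y_n$ yields

$$\sum_{j=1}^{c}\sum_{k=1}^{c} p_j^{(1)} p_{jk}\,\mathrm{Po}(y_1 \mid \lambda_{1j})\,\mathrm{Po}(y_2 \mid \lambda_{2k}) = \sum_{j=1}^{c}\sum_{k=1}^{c} p_j^{(1)*} p_{jk}^*\,\mathrm{Po}(y_1 \mid \lambda_{1j}^*)\,\mathrm{Po}(y_2 \mid \lambda_{2k}^*) \tag{4.13}$$

for all $(y_1, y_2) \in \mathcal{Y}_1 \times \mathcal{Y}_2$. Since each side of equation (4.13) may be regarded as a finite mixture of products of two Poisson distributions without covariates, Teicher's result (1967) implies that

$$\lambda_{2j} = \lambda_{2j}^* \quad \text{for } j = 1, \ldots, c, \tag{4.14}$$

and

$$p_j^{(1)} p_{jk} = p_j^{(1)*} p_{jk}^* \quad \text{for } j, k = 1, \ldots, c, \tag{4.15}$$

or $p_{jk} = p_{jk}^*$ for $j, k = 1, \ldots, c$.

For each $i > 2$, summing both sides of (4.10) over $y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n$ yields

$$\sum_{k=1}^{c}\Big(\sum_{s_1=1}^{c}\cdots\sum_{s_{i-1}=1}^{c} p_{s_1}^{(1)} p_{s_1 s_2}\cdots p_{s_{i-1}k}\Big)\mathrm{Po}(y_i \mid \lambda_{ik}) = \sum_{k=1}^{c}\Big(\sum_{s_1=1}^{c}\cdots\sum_{s_{i-1}=1}^{c} p_{s_1}^{(1)*} p_{s_1 s_2}^*\cdots p_{s_{i-1}k}^*\Big)\mathrm{Po}(y_i \mid \lambda_{ik}^*) \tag{4.16}$$

for all $y_i \in \mathcal{Y}_i$. Equation (4.16) implies that

$$\lambda_{ij} = \lambda_{ij}^* \quad \text{for } i = 3, \ldots, n \text{ and } j = 1, \ldots, c. \tag{4.17}$$

From (4.12), (4.14) and (4.17) we obtain

$$\exp(\alpha_j' x_i) = \exp(\alpha_j^{*\prime} x_i) \quad \text{for } i = 1, \ldots, n \text{ and } j = 1, \ldots, c.$$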
This is equivalent to

$$(\alpha_j - \alpha_j^*)' x_i = 0 \quad \text{for } i = 1, \ldots, n \text{ and } j = 1, \ldots, c,$$

or

$$(\alpha_j - \alpha_j^*)' X = 0 \quad \text{for } j = 1, \ldots, c. \tag{4.18}$$

Thus a sufficient condition for identifiability is that $X$ is of full rank, in which case (4.18) implies that $\alpha_j = \alpha_j^*$ for $j = 1, \ldots, c$. We can assume that this sufficient condition holds without loss of generality, since if it does not we can reparameterize the model accordingly. □

4.3.4 Estimation

The EM algorithm

In order to find the maximum likelihood estimates of the unknown parameters for the above model, we apply the EM algorithm (Dempster, Laird and Rubin, 1977), treating the unobservable state variable $S_i$ as missing information. In doing so, we represent a complete data set by introducing the following indicator functions:

$$zz(i,j,k) = \begin{cases} 1 & \text{if } S_{i-1} = j \text{ and } S_i = k, \\ 0 & \text{otherwise;} \end{cases} \qquad z(i,j) = \begin{cases} 1 & \text{if } S_i = j, \\ 0 & \text{otherwise.} \end{cases}$$

Thus the log-likelihood of the complete data set, $\{(y_i, x_i, z(i,j), zz(i,j,k));\ i = 1, \ldots, n \text{ and } j, k = 1, \ldots, c\}$, with $\theta = \theta^0$ is

$$Q(\theta \mid \theta^0) = \sum_{j=1}^{c} z(1,j)\log p_j^{(1)} + \sum_{i=2}^{n}\sum_{j=1}^{c}\sum_{k=1}^{c} zz(i,j,k)\log p_{jk} + \sum_{i=1}^{n}\sum_{j=1}^{c} z(i,j)\log f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j)$$

$$= \sum_{j=1}^{c} z(1,j)\log p_j^{(1)} + Q_1(\theta_1 \mid \theta^0) + Q_2(\theta_2 \mid \theta^0),$$

where $\theta_1 = (p_{11}, \ldots, p_{1c}, \ldots, p_{c1}, \ldots, p_{cc})$, $\theta_2 = (\alpha_{11}, \ldots, \alpha_{1d}, \ldots, \alpha_{c1}, \ldots, \alpha_{cd})$,

$$Q_1(\theta_1 \mid \theta^0) = \sum_{i=2}^{n}\sum_{j=1}^{c}\sum_{k=1}^{c} zz(i,j,k)\log p_{jk}$$

and

$$Q_2(\theta_2 \mid \theta^0) = \sum_{i=1}^{n}\sum_{j=1}^{c} z(i,j)\log f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j).$$

The EM algorithm finds the maximum likelihood estimates by proceeding iteratively in two steps: the E-step and the M-step. At the E-step, it replaces the missing data in $Q(\theta \mid \theta^0)$ by their expectations, conditional on the observed data and the initial values of the parameters. At the M-step, it finds the estimates of the parameters by maximizing the expected log likelihood for the complete data set, conditional on the observed data. It repeats the two steps until the log likelihood of the observed data no longer increases. Note that the EM algorithm guarantees that the log likelihood does not decrease at any iteration.
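In our case, the E-step of the EM algorithm updates the expected values of the missing data $z(i,j)$ and $zz(i,j,k)$ in each iteration, given the observed data and the initial values of the parameters. By definition,

$$\hat z(i,j) = E\{z(i,j) \mid y_1, \ldots, y_n\} = \Pr(S_i = j \mid y_1, \ldots, y_n) = \Pr(y_1, \ldots, y_n, S_i = j)/\Pr(y_1, \ldots, y_n),$$

$$\hat{zz}(i,j,k) = E(zz(i,j,k) \mid y_1, \ldots, y_n) = \Pr(zz(i,j,k) = 1 \mid y_1, \ldots, y_n) = \Pr(y_1, \ldots, y_{i-1}, S_{i-1} = j)\, p_{jk}\, \Pr(y_i, \ldots, y_n \mid S_i = k)/\Pr(y_1, \ldots, y_n).$$

As first proposed by Baum et al. (1970), we use the following quantities to set up the forward-backward recursive formulae for the computation of $\hat z(i,j)$ and $\hat{zz}(i,j,k)$:

$$a_j(i) = \Pr(y_1, \ldots, y_i, S_i = j) \quad \text{for } i = 2, \ldots, n \text{ and } j = 1, \ldots, c,$$

$$a_j(1) = p_j^{(1)} f_j(y_1 \mid x_1, \Delta t_1, \alpha_j, S_1 = j) \quad \text{for } j = 1, \ldots, c,$$

$$b_j(i) = \Pr(y_{i+1}, \ldots, y_n \mid S_i = j) \quad \text{for } i = 1, \ldots, n-1 \text{ and } j = 1, \ldots, c,$$

$$b_j(n) = 1 \quad \text{for } j = 1, \ldots, c.$$

Thus $\hat z(i,j)$ and $\hat{zz}(i,j,k)$ can be written as

$$\hat z(i,j) = a_j(i)\,b_j(i)\Big/\sum_{l=1}^{c} a_l(n), \tag{4.19}$$

$$\hat{zz}(i,j,k) = p_{jk}\, f_k(y_i \mid x_i, \Delta t_i, \alpha_k, S_i = k)\, a_j(i-1)\, b_k(i)\Big/\sum_{l=1}^{c} a_l(n). \tag{4.20}$$

The advantage of the above expressions is that $a_j(i)$ and $b_j(i)$ can be computed by the following recursive formulae:

$$a_j(i) = \sum_{k=1}^{c} a_k(i-1)\, p_{kj}\, f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j),$$

$$b_j(i) = \sum_{k=1}^{c} p_{jk}\, f_k(y_{i+1} \mid x_{i+1}, \Delta t_{i+1}, \alpha_k, S_{i+1} = k)\, b_k(i+1).$$

The M-step is equivalent to maximizing the following two functions with respect to $\theta_1$ and $\theta_2$ separately:

$$Q_1(\theta_1 \mid \theta^0) = \sum_{i=2}^{n}\sum_{j=1}^{c}\sum_{k=1}^{c} \hat{zz}(i,j,k)\log p_{jk} \quad \text{and} \quad Q_2(\theta_2 \mid \theta^0) = \sum_{i=1}^{n}\sum_{j=1}^{c} \hat z(i,j)\log f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j).$$

To maximize $Q_1(\theta_1 \mid \theta^0)$ with respect to $\theta_1$, the estimated values of the transition probabilities, $\hat\theta_1 = (\hat p_{jk})$, should satisfy the following equation (subject to the constraints $\sum_{k} p_{jk} = 1$):

$$\frac{\partial Q_1(\theta_1 \mid \theta^0)}{\partial \theta_1} = 0. \tag{4.21}$$

Solving (4.21), we obtain $\hat\theta_1 = (\hat p_{jk})$ by

$$\hat p_{jk} = \frac{\sum_{i=2}^{n} \hat{zz}(i,j,k)}{\sum_{i=2}^{n}\sum_{l=1}^{c} \hat{zz}(i,j,l)}, \quad j, k = 1, \ldots, c. \tag{4.22}$$

To maximize $Q_2(\theta_2 \mid \theta^0)$ with respect to $\theta_2$, the estimated value $\hat\theta_2$ should satisfy the following equation:

$$\frac{\partial Q_2(\theta_2 \mid \theta^0)}{\partial \theta_2} = 0. \tag{4.23}$$

However, there is usually no closed-form solution of (4.23). We use the quasi-Newton approach (Nash, 1990) to solve it for $\hat\theta_2$.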
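The forward-backward quantities and the closed-form update (4.22) can be sketched as follows. This is a minimal illustration rather than the thesis implementation: the function names and the array layout are assumptions, the conditional probabilities $f_j(y_i \mid \cdot)$ are taken as precomputed inputs in `dens`, and no rescaling is applied, so it is only suitable for short sequences.

```python
def forward_backward(dens, init, trans):
    """Unscaled forward-backward pass for a c-state hidden Markov model.

    dens[i][j] : conditional probability of observation i given state j
    init[j]    : initial state probability
    trans[j][k]: probability of moving from state j to state k
    Returns (z_hat, zz_hat, likelihood) following (4.19) and (4.20).
    """
    n, c = len(dens), len(init)
    a = [[0.0] * c for _ in range(n)]
    b = [[1.0] * c for _ in range(n)]  # last row stays at 1
    for j in range(c):
        a[0][j] = init[j] * dens[0][j]
    for i in range(1, n):
        for j in range(c):
            a[i][j] = sum(a[i - 1][k] * trans[k][j] for k in range(c)) * dens[i][j]
    for i in range(n - 2, -1, -1):
        for j in range(c):
            b[i][j] = sum(trans[j][k] * dens[i + 1][k] * b[i + 1][k] for k in range(c))
    lik = sum(a[n - 1])
    z_hat = [[a[i][j] * b[i][j] / lik for j in range(c)] for i in range(n)]
    # zz_hat[i][j][k] estimates Pr(S_{i-1} = j, S_i = k | data); entry 0 is unused.
    zz_hat = [[[0.0] * c for _ in range(c)] for _ in range(n)]
    for i in range(1, n):
        for j in range(c):
            for k in range(c):
                zz_hat[i][j][k] = a[i - 1][j] * trans[j][k] * dens[i][k] * b[i][k] / lik
    return z_hat, zz_hat, lik


def update_transitions(zz_hat):
    """M-step update (4.22): expected transition counts normalised by row."""
    n, c = len(zz_hat), len(zz_hat[0])
    counts = [[sum(zz_hat[i][j][k] for i in range(1, n)) for k in range(c)]
              for j in range(c)]
    return [[counts[j][k] / sum(counts[j]) for k in range(c)] for j in range(c)]


# Hypothetical inputs: four observations, two states.
dens = [[0.2, 0.05], [0.1, 0.3], [0.02, 0.25], [0.15, 0.1]]
z_hat, zz_hat, lik = forward_backward(dens, [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]])
print(update_transitions(zz_hat))
```

A useful consistency check is that, for each $i \geq 2$, summing $\hat{zz}(i,j,k)$ over $k$ recovers $\hat z(i-1,j)$ and summing over $j$ recovers $\hat z(i,k)$.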
We now summarize the EM algorithm for the hidden Markov Poisson regression model below.

Step 0: Specify starting values $\theta_1^{(0)}$ and $\theta_2^{(0)}$ and a tolerance $\epsilon$;

Step 1: (E-step) Compute $\hat z(i,j)$ and $\hat{zz}(i,j,k)$ using (4.19) and (4.20) respectively, for $i = 2, \ldots, n$ and $j, k = 1, \ldots, c$;

Step 2: (M-step)
1. Find the values of $\hat\theta_1 = (\hat p_{jk})$ using (4.22);
2. Find the values of $\hat\theta_2$ to solve (4.23) using the quasi-Newton approach (Nash, 1990);

Step 3: If $\sum_{j}\sum_{k} |\hat p_{jk} - p_{jk}^{(0)}| \geq \epsilon$ or $\|\hat\theta_2 - \theta_2^{(0)}\| \geq \epsilon$, set $\theta_1^{(0)} = \hat\theta_1$ and $\theta_2^{(0)} = \hat\theta_2$, and go to Step 1; otherwise, stop.

The E-step of the EM algorithm

The difficulty in computing $\hat z(i,j)$ and $\hat{zz}(i,j,k)$ by (4.19) and (4.20) is that $a_j(i)$ and $b_j(i)$ converge to 0 or $\infty$ very fast as $i$ increases. This will cause overflow or underflow problems in the computation. To overcome this difficulty, we introduce an approach to rescale $a_j(i)$ and $b_j(i)$ so that both maximum values are around 1 for each $i$. This approach takes the special structure of the model into account. It first represents, for such $j$ that $a_j(i) \neq 0$,

$$a_j(i) = \Big(\sum_{k=1}^{c} a_k(i-1)\,p_{kj}\Big) f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j) = \exp\Big\{\log\Big(\sum_{k=1}^{c} a_k(i-1)\,p_{kj}\Big) + \log(f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j))\Big\} = \exp(q_{ij}),$$

where $q_{ij} = \log(\sum_{k} a_k(i-1)\,p_{kj}) + \log(f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j))$. It then rescales $a_j(i)$ by multiplying by $\exp(-\max_i)$ for such $j$ that $a_j(i) \neq 0$, where $\max_i = \max_k\{q_{ik}\}$, and stores the order of $a_j(i)$ in $\mathrm{power}_a(i)$. This order will be used to calculate the orders of $\hat z(i,j)$ and $\hat{zz}(i,j,k)$. The same procedure is applied to calculate $b_j(i)$.

Before we state the computation of the E-step of the EM algorithm, we first define some notation for simplicity as follows:

$$f(i,j) = y_i!\, f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j) = [\Delta t_i\,\lambda(x_i, \alpha_j)]^{y_i}\exp(-\Delta t_i\,\lambda(x_i, \alpha_j)) = \exp\{y_i\log(\Delta t_i) + y_i(\alpha_j' x_i) - \Delta t_i\exp(\alpha_j' x_i)\} = \exp(r_{i,j}),$$

where

$$r_{i,j} = y_i\log(\Delta t_i) + y_i(\alpha_j' x_i) - \Delta t_i\exp(\alpha_j' x_i)$$

for $i = 1, \ldots, n$ and $j = 1, \ldots, c$. Note that the factorials in the numerators and denominators of (4.19) and (4.20) cancel out.
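The rescaling idea described above can be sketched for the forward quantities as follows. This is a hedged illustration, not the thesis's FORTRAN code: the function names and data layout are hypothetical, `trans[k][j]` is assumed to hold $p_{kj}$, and the log-linear rate $\exp(\alpha_j' x_i)$ is assumed. At each step the largest $q_{ij}$ is subtracted before exponentiating, and the subtracted amounts are accumulated so that the original order of magnitude can be restored on the log scale.

```python
import math


def log_kernel(y, x, dt, alpha):
    """r_{i,j} = y log(dt) + y (alpha' x) - dt exp(alpha' x); the y! term is omitted."""
    eta = sum(a * v for a, v in zip(alpha, x))
    return y * math.log(dt) + y * eta - dt * math.exp(eta)


def scaled_forward(ys, xs, dts, alphas, init, trans):
    """Forward pass with per-step max-rescaling.

    Returns (a, power): a[i][j] is the rescaled forward quantity, whose maximum
    over j is 1 for every step after the first, and power[i] accumulates the
    logs of the scale factors removed up to step i.
    """
    n, c = len(ys), len(alphas)
    r = [[log_kernel(ys[i], xs[i], dts[i], alphas[j]) for j in range(c)]
         for i in range(n)]
    a = [[init[j] * math.exp(r[0][j]) for j in range(c)]]  # first step left unscaled
    power = [0.0]
    for i in range(1, n):
        q = []
        for j in range(c):
            s = sum(a[i - 1][k] * trans[k][j] for k in range(c))
            # States that cannot be reached are excluded from the rescaling.
            q.append(math.log(s) + r[i][j] if s > 0.0 else None)
        temp = max(v for v in q if v is not None)
        a.append([math.exp(v - temp) if v is not None else 0.0 for v in q])
        power.append(power[i - 1] + temp)
    return a, power


# Hypothetical data: four observations, two states.
ys = [3, 0, 5, 2]
xs = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.0], [1.0, 0.0]]
dts = [1.0, 2.0, 1.5, 1.0]
alphas = [[0.1, 0.3], [1.2, -0.4]]
a, power = scaled_forward(ys, xs, dts, alphas, [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]])
# Log of the likelihood kernel (factorial terms excluded).
print(math.log(sum(a[-1])) + power[-1])
```

The sketch assumes at least one state is reachable at every step; otherwise the maximum in the rescaling is undefined.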
Dropping the factorials in this way simplifies the computation of the E-step. The E-step of the EM algorithm can now be carried out as follows:

(a) compute $a_j(1) = p_j^{(1)} f(1,j)$, $j = 1, \ldots, c$, and set $\mathrm{power}_a(1) = 0$;

(b) compute $a_j(i)$ for $i = 2, \ldots, n$ and $j = 1, \ldots, c$ as follows:
1. identify an index set $K^a(i) = \{k;\ \sum_{l=1}^{c} a_l(i-1)\,p_{lk} \neq 0,\ k = 1, \ldots, c\}$;
2. find $\mathrm{temp}_a(i) = \max_{k \in K^a(i)}\{\log(\sum_{l=1}^{c} a_l(i-1)\,p_{lk}) + r_{i,k}\}$;
3. compute $a_j(i) = \exp\{\log(\sum_{l=1}^{c} a_l(i-1)\,p_{lj}) + r_{i,j} - \mathrm{temp}_a(i)\}$ for $j \in K^a(i)$ and 0 otherwise;
4. set $\mathrm{power}_a(i) = \mathrm{power}_a(i-1) + \mathrm{temp}_a(i)$.

(c) Set $b_j(n) = 1$ for $j = 1, \ldots, c$, and $\mathrm{power}_b(n) = 0$;

(d) compute $b_j(i)$ for $i = n-1, n-2, \ldots, 1$ and $j = 1, \ldots, c$ as follows:
1. identify an index set $K^b(i) = \{k;\ \sum_{l=1}^{c} p_{kl}\exp(r_{i+1,l})\,b_l(i+1) \neq 0,\ k = 1, \ldots, c\}$;
2. find $\mathrm{temp}_b(i) = \max_{k \in K^b(i)}\{\log(\sum_{l=1}^{c} p_{kl}\exp(r_{i+1,l})\,b_l(i+1))\}$;
3. compute $b_j(i) = \exp\{\log(\sum_{l=1}^{c} p_{jl}\exp(r_{i+1,l})\,b_l(i+1)) - \mathrm{temp}_b(i)\}$ for $j \in K^b(i)$ and 0 otherwise;
4. set $\mathrm{power}_b(i) = \mathrm{power}_b(i+1) + \mathrm{temp}_b(i)$.

(e) For $i = 2, \ldots, n$ and $j = 1, \ldots, c$ compute $\mathrm{temp}(i) = \exp\{\mathrm{power}_a(i) + \mathrm{power}_b(i) - \mathrm{power}_a(n)\}$ and

$$\hat z(i,j) = \mathrm{temp}(i)\,a_j(i)\,b_j(i)\Big/\sum_{l=1}^{c} a_l(n);$$

(f) For $i = 2, \ldots, n$ and $j, k = 1, \ldots, c$, compute

$$\hat{zz}(i,j,k) = \mathrm{temp}(i)\,p_{jk}\,f(i,k)\,a_j(i-1)\,b_k(i)\Big/\sum_{l=1}^{c} a_l(n).$$

4.3.5 The Probabilities of Initial States and Starting Values

In the above model we define the probabilities of the initial states as known parameters. To determine their values, we consider two types of data: (1) data for a single subject and (2) data for several subjects. In the first case only the first observation is directly related to the initial states, so there is little information about the initial probabilities. Thus we set $p_1^{(1)} = \cdots = p_c^{(1)}$. Since the data in this case usually contain a rather long sequence of observations, the values of the probabilities may not have significant effects on estimation in terms of asymptotic properties.
Without covariates, Leroux (1989) proves that the effect of the probabilities vanishes as the number of observations increases. In the second case we choose the values of the probabilities as the estimates of the mixing probabilities obtained by fitting a c-component mixed Poisson regression model, with constant mixing probabilities and covariates in the Poisson rates (Wang, Puterman, Le and Cockburn, 1993), to the first observation of each of the subjects $k = 1, \ldots, m$. Note that in this case the mixing probabilities can be equivalently interpreted as the probabilities of initial states for the Markov mixture model. Further, in many applications like this, the data contain many subjects but short series.

To run the EM algorithm, we need to choose starting values for the unknown parameters in the model. The EM algorithm only guarantees, under some regularity conditions (Wu, 1983), that the parameter estimates are local maxima of the likelihood function. As the number of unknown parameters in the model increases, there may be more local maxima. Further, a poor choice of starting values may slow down the convergence of the EM algorithm. Indeed, in some cases where the likelihood is unbounded on the edge of the parameter space, the sequence of estimates generated by the EM algorithm may diverge if the starting values are too close to the boundary. For these reasons it is important to choose the starting values carefully so as to increase the chance of reaching the maximum likelihood estimate. We use the following approach, which works well in our applications.

We assume that c is known. We first fit a c-component independent mixed Poisson regression model to the observed data. We then take the corresponding estimates from this fit as the initial values of the regression parameters.
Further, we assign each observation to the state for which its estimated posterior probability, calculated by (4.19), is largest. We then calculate the frequencies of the transitions from state j to state k, and set these frequencies as the initial values of the corresponding transition probabilities, $p_{jk}$.

4.3.6 Implementation and Remaining Issues

We suggest using BIC or AIC to determine the number of underlying states, and carrying out inference about parameters by likelihood ratio tests. Specifically, we first determine the number of components c by comparing BIC and AIC values among saturated models which include all covariates in the Poisson means. After c is determined, we carry out inference about regression parameters by likelihood ratio tests within c-component mixture models. We plan to conduct a Monte Carlo study to investigate this model selection procedure. In addition, using the quantities in (4.19) and (4.20) from the fitted model, we can classify observations into one of c states, and identify transitions for each subject. This information may be useful in applications.

Note that our code works well for fitting the proposed model without covariates to the fetal movement data (Leroux, 1989); the results are the same as those given by Leroux (1989).

Bibliography

[1] Aitkin, M., Anderson, D. and Hinde, J. (1981), “Statistical Modelling of Data on Teaching Styles (with discussion),” Journal of the Royal Statistical Society, Ser. A, 144, 419-461.
[2] Akaike, H. (1973), “Information Theory and an Extension of the Maximum Likelihood Principle,” Second International Symposium on Information Theory (B.N. Petrov and F. Csaki, Eds.), Budapest: Akadémiai Kiadó, 267-81.
[3] Akaike, H. (1974), “A New Look at the Statistical Model Identification,” IEEE Trans. on Automatic Control, AC-19, 716-23.
[4] Albert, P.S. (1991), “A two-state Markov model for a time series of epileptic seizure counts,” Biometrics, 47, 1371-1381.
[5] Armitage, P. (1957), “Studies in the variability of pock counts,” J. Hyg., Camb., 55, 564-581.
[6] Anderson, T.W. (1984), An Introduction to Multivariate Statistical Analysis, Second Edition, New York: Wiley.
[7] Anscombe, F.J. (1950), “Sampling theory of the negative binomial and logarithmic series distributions,” Biometrika, 37, 358-382.
[8] Aranda-Ordaz, F.J. (1981), “Quantal response analysis for a mixture of population,” Biometrics, 28, 981-988.
[9] Ashford, R. and Walker, P.J. (1972), “Quantal Response Analysis For a Mixture of Populations,” Biometrics, 28, 981-988.
[10] Baker, R.J. and Nelder, J.A. (1978), The GLIM System, Release 3, Oxford: Numerical Algorithms Group.
[11] Bartlett, M.S. (1936), “Some notes on insecticide tests in the laboratory and in the field,” J. R. Statist. Soc., Suppl., 3, 185-194.
[12] Baum, L.E., Petrie, T., Soules, G. and Weiss, N. (1970), “A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains,” Annals of Mathematical Statistics, 41, 164-171.
[13] Blischke, W.R. (1964), “Estimating the Parameters of Mixtures of Binomial Distributions,” Journal of the American Statistical Association, 59, 510-528.
[14] Bock, R.D. and Aitkin, M. (1981), “Marginal maximum likelihood estimation of item parameters: application of an EM algorithm,” Psychometrika, 46, 443-459.
[15] Bound, J., Cummins, C., Griliches, Z., Hall, B.H. and Jaffe, A. (1984), “Who does R and D and who patents?” National Bureau of Economic Research Working Paper No. 908, in Z. Griliches, ed., R and D, Patents, and Productivity, (Chicago: University of Chicago Press), 21-54.
[16] Breslow, N. (1984), “Extra-Poisson Variation in Log-linear Models,” Applied Statistics, 33, 38-44.
[17] Breslow, N. (1990a), “Tests of Hypotheses in Overdispersed Poisson Regression and Other Quasi-likelihood Methods,” Journal of the American Statistical Association, 85, 565-571.
[18] Breslow, N.
(1990b), “Further Studies in Variability of Pock Counts,” Statistics in Medicine, Vol. 9, 615-626.
[19] Brillinger, D.R. (1986), “The Natural Variability of Vital Rates and Associated Statistics (with discussion),” Biometrics, 42, 693-734.
[20] Busvine, J.R. (1938), “The toxicity of ethylene oxide to Calandra oryzae, C. granaria, Tribolium castaneum, and Cimex lectularius,” Annals of Applied Biology, 25, 605-632.
[21] Cameron, A.C. and Trivedi, P.K. (1990), “Regression-Based Tests for Overdispersion in the Poisson Model,” Journal of Econometrics, 46, 347-364.
[22] Cameron, A.C. and Trivedi, P.K. (1986), “Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests,” Journal of Applied Econometrics, 1, 29-53.
[23] Carroll, R.J., Spiegelman, C.H., Lan, K.K.G., Bailey, K.T. and Abbott, R.D. (1984), “Errors-in-variables for binary regression models,” Biometrika, 71, 19-26.
[24] Collett, D. (1991), Modelling Binary Data, Chapman and Hall.
[25] Collings, B.J. and Margolin, B.H. (1985), “Testing Goodness-of-Fit for the Poisson Assumption When Observations Are Not Identically Distributed,” Journal of the American Statistical Association, 80, 411-18.
[26] Cox, D.R. (1970), The Analysis of Binary Data, London: Chapman and Hall.
[27] Cox, D.R. (1983), “Some Remarks On Overdispersion,” Biometrika, 70, 269-274.
[28] Crowder, M.J. (1978), “Beta-Binomial Anova for Proportions,” Applied Statistics, 27, 34-37.
[29] Dean, C. and Lawless, J.F. (1989), “Tests for Detecting Overdispersion in Poisson Regression Model,” Journal of the American Statistical Association, 84, 467-472.
[30] Dean, C., Lawless, J.F. and Willmot, G.E. (1989), “A mixed Poisson-inverse-Gaussian regression model,” Canadian Journal of Statistics, 17, 171-181.
[31] Dean, C. (1992), “Testing for Overdispersion in Poisson and Binomial Regression Models,” Journal of the American Statistical Association, 87, 451-457.
[32] Dempster, A.P., Laird, N.M. and Rubin, D.B.
(1977), “Maximum Likelihood from Incomplete Data Via the EM Algorithm (with discussion),” Journal of the Royal Statistical Society B, 39, 1-38.
[33] Dennis, J.E. Jr. and Schnabel, R.B. (1983), Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Englewood Cliffs, New Jersey: Prentice-Hall.
[34] Dietz, P.E. and Baker, S.P. (1974), “Drowning: epidemiology and prevention,” American Journal of Public Health, 64, 303-312.
[35] Duda, R.O. and Hart, P.E. (1973), Pattern Classification and Scene Analysis, New York: Wiley.
[36] Efron, B. (1986), “Double exponential families and their use in generalized linear regression,” Journal of the American Statistical Association, 81, 709-721.
[37] Ehrenberg (1972), Repeated-Buying, Amsterdam: North-Holland Publishing Co.; New York: American Elsevier Publishing Co.
[38] Everitt, B.S. and Hand, D.J. (1981), Finite Mixture Distributions, London: Chapman and Hall.
[39] Fienberg, S.E. (1981), The Analysis of Cross-Classified Categorical Data, Second Edition, Cambridge: The MIT Press.
[40] Finney, D.J. (1976), “Radioligand assay,” Biometrics, 32, 721-740.
[41] Firth, D. (1987), “On the efficiency of quasi-likelihood estimation,” Biometrika, 74, 233-245.
[42] Fisher, R.A. (1950), “The Significance of Deviations From Expectation in a Poisson Series,” Biometrics, 6, 17-24.
[43] Folks, J.L. and Chhikara, R.S. (1978), “The inverse Gaussian distribution and its statistical application: a review,” Journal of the Royal Statistical Society B, 40, 263-289.
[44] Follmann, D.A. and Lambert, D. (1989), “Generalizing Logistic Regression by Nonparametric Mixing,” Journal of the American Statistical Association, 84, 294-30.
[45] Follmann, D.A. and Lambert, D. (1991), “Identifiability of Finite Mixture of Logistic Regression Models,” Journal of Statistical Planning and Inference, 27, 375-381.
[46] Formann, A.K.
(1992), “Linear Logistic Latent Class Analysis for Polytomous Data,” Journal of the American Statistical Association, 87, 476-486.
[47] Frome, E.L., Kutner, M.H. and Beauchamp, J.J. (1973), “Regression analysis of Poisson-distributed data,” Journal of the American Statistical Association, 68, 935-940.
[48] Frome, E.L. (1983), “The analysis of rates using Poisson regression models,” Biometrics, 39, 665-674.
[49] Fukunaga, K. (1972), Introduction to Statistical Pattern Recognition, New York: Academic Press.
[50] Ganio, L.M. and Schafer, D.W. (1992), “Diagnostics for overdispersion,” Journal of the American Statistical Association, 87, 795-804.
[51] Ghosh, J.K. and Sen, P.K. (1985), “On the asymptotic performance of the log likelihood ratio statistic for the mixture model and related results,” Proc. Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer (Vol. II), L.M. Le Cam and R.A. Olshen (Eds.). Monterey: Wadsworth, 789-806.
[52] Goldfeld, S.M. and Quandt, R.E. (1973), “A Markov model for switching regressions,” Journal of Econometrics, 1, 3-16.
[53] Gourieroux, C., Monfort, A. and Trognon, A. (1984), “Pseudo maximum likelihood methods: applications to Poisson models,” Econometrica, 52, 701-720.
[54] Gram, L. (1988), “Experimental studies and controlled clinical testing of valproate and vigabatrin,” Acta. Neurol. Scand., 78, 241-270.
[55] Griliches, Z. (1990), “Patent statistics as economic indicators: a survey,” Journal of Economic Literature, XXVIII, 1661-1707.
[56] Guerrero, V.M. and Johnson, R.A. (1982), “Use of the Box-Cox transformation with binary response models,” Biometrika, 69, 309-314.
[57] Haberman, S.J. (1977), “Maximum likelihood estimation with incomplete data via the EM algorithm (discussion),” Journal of the Royal Statistical Society B, 39, 1-38.
[58] Hall, B.H., Griliches, Z. and Hausman, J.A. (1986), “Patents and R and D: is there a lag,” International Economic Review, 27, 265-283.
[59] Hartigan, J.A.
(1985a), “Statistical theory in clustering,” Journal of Classification, 2, 63-76.
[60] Hartigan, J.A. (1985b), “A failure of likelihood asymptotics for normal mixtures,” Proc. Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer (Vol. II), L.M. Le Cam and R.A. Olshen (Eds.). Monterey: Wadsworth, 807-810.
[61] Hausman, J.A., Hall, B.H. and Griliches, Z. (1984), “Econometric models for count data with an application to the patents R and D relationship,” Econometrica, 52, 909-938.
[62] Hinde, J. (1982), “Compound Poisson regression model,” GLIM 82: Proc. Internat. Conf. Generalized Linear Models (R. Gilchrist, ed.), Springer, Berlin, 109-121.
[63] Hill, J.R. and Tsai, C. (1988), “Calculating the efficiency of maximum quasi-likelihood estimation,” Applied Statistics, 37, 219-230.
[64] Hingson, R. and Rowland, J. (1987), “Alcohol as a risk factor for injury or death resulting from accidental falls: a review of the literature,” J. Stud. Alc., 48, 212-219.
[65] Holford, T.R. (1983), “The estimation of age, period and cohort effects for vital rates,” Biometrics, 39, 311-324.
[66] Hopkins, A., Davies, P. and Dobson, C. (1985), “Mathematical models of patterns of seizures,” Arch. Neurol., 42, 463-467.
[67] Jorgensen, B. (1987), “Exponential dispersion models (with Discussion),” Journal of the Royal Statistical Society B, 49, 127-162.
[68] Kaufmann, H. (1987), “Regression models for nonstationary categorical time series: asymptotic estimation theory,” Annals of Statistics, 15, 79-98.
[69] Laird, N.M. (1978), “Nonparametric Maximum Likelihood Estimation of Mixing Distribution,” Journal of the American Statistical Association, 73, 805-811.
[70] Lambert, D. and Roeder, K. (1993), “Overdispersion diagnostics for generalized linear models,” working paper.
[71] Lawless, J.F. (1987a), “Regression Methods For Poisson Process Data,” Journal of the American Statistical Association, 82, 808-815.
[72] Lawless, J.F.
(1987b), “Negative Binomial And Mixed Poisson Regression,” The Canadian Journal of Statistics, 15, 209-225.
[73] Le, N., Leroux, B.G. and Puterman, M.L. (1992), “Exact likelihood evaluation in a Markov mixture model for time series of seizure counts,” Biometrics, 48, 317-323.
[74] Lehmann, E.L. (1983), Theory of Point Estimation, New York: Wiley.
[75] Leroux, B.G. (1989), “Maximum Likelihood Estimation for Mixture Distribution and Hidden Markov Models,” University of British Columbia, Ph.D. dissertation.
[76] Leroux, B.G. and Puterman, M.L. (1992), “Maximum Penalized Likelihood Estimation for Independent and Markov Dependent Mixture Models,” Biometrics, 48, 545-558.
[77] Liang, K.Y. and Zeger, S.L. (1986), “Longitudinal data analysis using generalized linear models,” Biometrika, 73, 370-384.
[78] Lindsay, B.G. (1983), “The Geometry of Mixing Likelihood: a General Theory,” The Annals of Statistics, 11, 86-94.
[79] Lindsay, B.G. and Roeder, K. (1992), “Residual diagnostics for mixture models,” Journal of the American Statistical Association, 87, 785-794.
[80] Linhart, H. and Zucchini, W. (1986), Model Selection, New York: John Wiley.
[81] Mannering, F.L. (1989), “Poisson analysis of commuter flexibility in changing routes and departure times,” Transpn. Res. B., 23B, 53-60.
[82] Manton, K.G., Woodbury, M.A. and Stallard, E. (1981), “A variance components approach to categorical data models with heterogeneous cell populations: Analysis of spatial gradients in lung cancer mortality rates in North Carolina counties,” Biometrics, 37, 259-269.
[83] Margolin, B.H., Kaplan, N. and Zeiger, E. (1981), “Statistical analysis of the Ames salmonella/microsome test,” Proc. Nat. Acad. Sci. U.S.A., 76, 3779-3783.
[84] Margolin, B.H., Kim, B.S. and Risko, K.J. (1989), “The Ames Salmonella/Microsome Mutagenicity Assay: Issues of Inference and Validation,” Journal of the American Statistical Association, 84, 651-661.
[85] McCullagh, P. and Nelder, J.A.
(1989), Generalized Linear Models (Second Edition), London: Chapman and Hall.
[86] McDermott, F.T. (1977), “Alcohol, road crash casualties, and countermeasures,” A.N.Z. J. Surgery, 47, 156-161.
[87] McLachlan, G.J. and Basford, K.E. (1988), Mixture Models, New York: Marcel Dekker, Inc.
[88] Milton, J.G., Gotman, J., Remillard, G.M. and Adermann, F. (1987), “Timing of seizure recurrence in adult epileptic patients: a statistical analysis,” Epilepsia, 28, 471-478.
[89] Nash, J.C. (1990), Compact Numerical Methods for Computers, Adam Hilger.
[90] Neyman, J. (1959), “Optimal asymptotic tests of composite statistical hypotheses,” In Probability and Statistics, Ed. U. Grenander, 213-234, New York: Wiley.
[91] Neuhaus, J.M., Kalbfleisch, J.D. and Hauck, W.W. (1991), “A comparison of cluster-specific and population averaged approaches for analyzing correlated binary data,” International Statistical Review, 59, 22-35.
[92] Ochi, Y. and Prentice, R.L. (1984), “Likelihood inference in a correlated probit regression model,” Biometrika, 71, 531-554.
[93] Otake, M. and Prentice, R.L. (1984), “The analysis of chromosomally aberrant cells based on beta-binomial distribution,” Radiation Research, 98, 456-470.
[94] Pierce, D.A. and Sands, B.R. (1975), “Extra-Bernoulli variation in binary data,” Technical Report, 46, Oregon State University, Department of Statistics.
[95] Pierce, D.A. and Schafer, D.W. (1986), “Residuals in generalized linear models,” Journal of the American Statistical Association, 81, 977-981.
[96] Pocock, S.J., Cook, D.G. and Beresford, S.A.A. (1981), “Regression of area mortality rates on explanatory variables: What weighting is appropriate?,” Applied Statistics, 30, 370-384.
[97] Pregibon, D. (1981), “Logistic regression diagnostics,” Annals of Statistics, 9, 705-724.
[98] Prentice, R.L. (1976), “A generalization of the probit and logit methods for dose response curves,” Biometrics, 32, 761-768.
[99] Redner, R.A.
and Walker, H.F. (1984), “Mixture densities, maximum likelihood and the EM algorithm,” SIAM Review, 26, 195-239.
[100] Roberts, H.V. (1991), Data Analysis for Managers with Minitab, The Scientific Press, South San Francisco.
[101] Schall, R. (1991), “Estimation In Generalized Linear Models With Random Effects,” Biometrika, 78, 719-27.
[102] Schwarz, G. (1978), “Estimating the Dimension of a Model,” The Annals of Statistics, Vol. 6, 461-464.
[103] Sclove, S. (1983), “Time-series segmentation: a model and a method,” Information Sciences, 29, 7-25.
[104] Shaked, M. (1980), “On mixtures from exponential families,” J. R. Statist. Soc. B, 42, 192-198.
[105] Simar, L. (1976), “Maximum Likelihood Estimation of a Compound Poisson Process,” The Annals of Statistics, 4, 1200-1209.
[106] Stein, G.Z. and Juritz, J.M. (1988), “Linear models with an inverse-Gaussian distribution,” Comm. Statist. Theory Methods, 17, 557-571.
[107] Stiratelli, R., Laird, N. and Ware, J.H. (1984), “Random-effect models for series observations with binary response,” Biometrics, 40, 961-971.
[108] Tarone, R.E. (1976), “Testing the goodness of fit of the binomial distribution,” Biometrika, 66, 585-590.
[109] Teicher, H. (1961), “Identifiability of Mixtures,” Annals of Mathematical Statistics, 32, 244-248.
[110] Teicher, H. (1963), “Identifiability of Finite Mixtures,” Annals of Mathematical Statistics, 34, 1265-1269.
[111] Titterington, D.M., Smith, A.F.M. and Makov, U.E. (1985), Statistical Analysis of Finite Mixture Distributions, Chichester: John Wiley & Sons.
[112] Tweedie, M.C.K. (1957), “Statistical properties of inverse Gaussian distributions,” International Annals of Mathematical Statistics, 28, 362-372.
[113] Tyssedal, J.S. and Tjostheim, D. (1988), “An autoregression model with suddenly changing parameters and an application to stock market prices,” Applied Statistics, 37, 353-369.
[114] Walker, P.J.
(1966), “A Method of Measuring the Sensitivity of Trypanosome to Acriflavine and Trivalent Tryparsamide,” Journal of General Microbiology, 43, 45-58.
[115] Webb, G.R., Redman, S., Hennrikus, D.J., Kelman, G.R., Gibberd, R.W. and Sanson-Fisher, R.W. (1994), “The relationships between high-risk and problem drinking and the occurrence of work injuries and related absences,” Journal of Studies on Alcohol, forthcoming.
[116] Wechsler, H., Kasey, E.H., Thum, D. and Demone, H.W. (1969), “Alcohol level and home accidents,” Public Health Reports, 84, 1043-1050.
[117] Wedderburn, R.W.M. (1974), “Quasi-likelihood Functions, Generalized Linear Models and the Gauss-Newton Method,” Biometrika, 61, 439-447.
[118] Wilensky, A.J., Ojemann, L.M., Temkin, N.R., Troupin, A.S. and Dodrill, C.B. (1981), “Clorazepate and phenobarbital as antiepileptic drugs: a double-blind study,” Neurology, 31, 1271-1276.
[119] Williams, D.A. (1975), “The analysis of binary response from toxicological experiments involving reproduction and teratogenicity,” Biometrika, 61, 439-447.
[120] Williams, D.A. (1982), “Extra-binomial Variation in Logistic Linear Models,” Applied Statistics, 31, 144-148.
[121] Williams, D.A. (1984), “Generalized linear model diagnostics using deviance and single case deletions,” Applied Statistics, 36, 181-191.
[122] Wu, C.F.J. (1983), “On the Convergence Properties of the EM Algorithm,” The Annals of Statistics, 11, 95-103.
[123] Zeger, S.L. (1988), “A regression model for time series of counts,” Biometrika, 75, 621-629.
[124] Zeger, S.L., Liang, K.Y. and Self, S.G. (1985), “The analysis of binary longitudinal data with time-independent covariates,” Biometrics, 72, 31-38.
[125] Zeger, S.L. and Qaqish, B. (1988), “Markov regression models for time series: a quasi-likelihood approach,” Biometrics, 44, 1019-1031.

Appendix A

1. Fortran program for computing the maximum likelihood estimates of the mixed Poisson regression model.
C PROGRMf GENMDC C C C C C C C C Thi. I. dceigird for dii. in whldi e.dR cinc.vatii i. wi4hi t poriod. Tidi • piorid.. fitlid von., Pcui&. Ma*i.Aic XSQR. Devinico, P..o.’. d.vin.or oc.Inh for l i1 LZI for radi NV-I OF VARIABLE REGRESSION COEFFICIENTS FOR E4C1I COMPONENT. NT I OF COMMOM REGRESSION COEFFICIENTS. • IMPLICrF DOUBLE PRECTSION(A.H,O-Z) INTEGER NOBSNSFAT,NX,NXI,NV,NF,NI$2 DiMENSION OBS(I000),TU000),M(1000,8),X141(1000,8),Z(1000,5) COMMON OBS,TIME,XM,XM1,Z,NOBS,NFrAT,NX,NXI.NV.NF,Nl,N2 DIMENSION BGUESSQ0),OB(30),HO0,30),AGUESS(IQ),OADG) DIMENSION PA(10),PB(I,PC(10),Wr(30),DRES(1aOo),TDEES(IC0o) DIMENSION Frr(i000,12),RE (IC ,iid0,3Q), RESL(i000) dioxoajai tdi.(I0C0),tImo(I000),(I000,8),I(lOQ0,8) diiorork.i s(lOOO),ae3O),cp.oQO),w(I0C0) INTEGER N1,N2,UII,1112,MON,NEVALS,IFAIL,NSFEF INTEGER N’r,N’FEMPI,NTEMP2,n13,NUM,IrER.NrOI. OPEN( UNTr=I,FILE=’lcuI’) OPEN( UNrr-2,FnE=rcon.r) OPEN( UNIF..3,FIE.’Iflntor’) OPEN( UNrr=7.PILE-rcauk’) opni( aiit=8fiin&flt.a) OPEN( UNrr—9,FILE—’ld.taoi) READ(1,I00) NOBS,NSTAT,NX,NXI,NV,NF w,100) ncin,ini.t,ior,n.I,NV,NF 100 EORMAT(615) Ni ..(NTrAT-I)NX N2=NSTAT*NV+NF NT=NI+N2 NTEMP1=NI NTEMP2N2 READ(1,i13) (OBS(I),TIME(I), l.’I,NOBS) 110 FORMAT(Fi0.5) wriln(*,113) (oin(i),tiin.(i),i—i,nois) do lii 1i,nob. 
READ (1,112) (XM(,J),Ii,NX) if(Lgt.i0) go to ill wñte°,iI2) (xm(Lj)j—i,xoc) 111 112 FORMAT(6G16.8) 113 FORMAT(2F10.5) READ(l,1I0) (BGUESS(I),I—1,N1) wnlc(9,i10) (bgucaa(i),ii,n1) DO 115 1—l,NOBS READ(i,1IZ QcMI(1,i),J=1,NXI) if (I gt.i0) go to 115 writo (,I12) (xxnl(i.j)j1,mcl) 115 CONTINUE READ(l,1 10) (AGUESS(I),l= l,N2) WRrrE(9,1 10) (AGUESS(O,1 i,N2) do 118 i’I,oobe toin(i)’’cb(i) do 116 j=i,nx 116 lxm(Lj)xm(ij) do Ill j1,nxl 117 mI(Lj)=xm1(i.j) 118 ccoih.r NEVALS- 1000 uhIN1 IH2’N2 1113-NT MONt TOL.=0.0001 TOL.L’..O.OI DO I 1=I,NOBS l TDRES(I)=0.0 OBSINF0.0 220 DEV=0.ODO DO 150 1=1,NOBS IF (OBSQ).EQ.0.0) GO TO 150 TDRESQ)=OBS(1)*(DLOG(OBSQ))1 .0D0D0)oba(i)*dllog(time(i)) DEV=DEV+TDRES(1) TEMP1 =0.ODO NSTEP=INT(OBS(1)) DO 145 J=1,NSTEP TEMP1 =TEMPI +DLOG(DFLOAT(J)) 145 CONTINUE obainf=obainf+TEMPIOBS(I)*DLOG(T1ME(1)) 150 CONTINUE NTOL=NOE5 ODEV=DEV OOESINF=OBSINF do 155 i=1,ntol 155 w(i)=0.ODO 160 DO 888 1TER=0,NTOL PREL=(10.0D0ca10.0D0) 200 DO 202 I=1,N1 202 OBm=EGuESso) DO 205 1=1,N2 205 OA(I)=AGUESS(1) CALL ESTEP(NTEMPI,NTEMP2,EGUESS,AGUESS) CALL MSTEPIa4TEMP2,AGUESS,H,P,1H2,NEVALS,IFML,MON) CALL MSTEfl(NTEMP1,BGUESS,H,P,IHI,NEVALS,IFAIL,MON) SSRI =0.ODO SSR2=0.ODO DO 350 I=l,Nl SSRI =SSRI+(OEW-BGUESS(I))°’QODO 350 CONTINUE DO 352 l=1,N2 352 SSR2=(OA(1)-AGUESS(l))2.0D0+SSR2 DO 353 K=l,Nl 353 BT(K)=BGUESS(K) DO 354 K=1,N2 354 BT(K+Nl)=AGUESS(K) CALL LL1KELY’lT,BT,F) TEMP=-F-PREL IF (TEMP.LT.TOLL) GO TO 359 PREL=F IF (ITER.GT.0) GO TO 356 f=-f-obainF WRITE(9,1II1) F wcitc(*,l11l) f liii fonnat(4x,G16.8) 356 IF ((SSRI .GT.TOL).OR.(SSR2.GT.TOL)) GO TO 200 359 call cawton(nI,bt,li,pO,nt,tr.vals,ifail,mon,std) if (itcr.eq.0) call Fllkely(nt,bt,f,dits) if (ilec.gt.0) call llhktly(nt,bt,l) do 3M i=l,nl 364 bgueas(i)=bt(i) do 365 i=I,n2 365 agtas(i)=bt(ol+i) DEV=2*(DEV(F)) wcite(*,9 itec,cdev,tdcea(itcc),f C IF (ITER.EQ.0) TDEV=DEV IF (1TER.GT.0) GO TO 500 f--f-obuiof wcite(9,ll11) f call gfit(ol,n2,bgueua,agiraa,XSQR,flt,RES,pa,pb,pc) do 366 i-1,nobS 
temp=fIt(i,l)-fit(i,2) if (Icoipcqo.ODO) sign(i)=0.ODO if (lcmpat.O000) sign(i)=tecap/(aba(tcmp)) dQ)(2*Q).(i))yICK(05IJffl3t3) dces(i)=sign(i)*drea(i) 366 continue wcitc(9,4444) 4444 fonnat(4x,’gccdneaa of fit--XSQR, DeviancE’) write(9,7777) XSQR, DEv do 368 i1,nobe wrile(8,369) (FfflI,J),J= 1,2÷2aNSTAT) o WRJTE(2,369) (RES(I,J),J=l,l+NSTAT) wcite(7,l 12) (zQj),j I,natat) NUM= 1 DO 367 J=l,NSTAT-l IF (ZQ,J)GT.ZO,J+l)) GO TO 367 NUM=J+I 367 CONTINUE WRITE(2,370) Fff(I,l),Ffr(I,2),RES(I,l),DRES(I),FIT(I,NUM+2), o RES(I,l+NUM),NUM 368 continue 369 fonnat(12g16.8) 370 FORMAT(G12.6,x,g12.6,x,g12. DO 400 J=1,N1 400 BGUESS(J)=BT(J) DO 402 J=I,N2 402 AGUESS(J)=BT(J+N1) call estep(ntempl,ntcmp2,bgueas,a temp=dfloat(nl +n2) do 410 k=1,nl ct,ar(k)=bgucsa(k) se(k)=(std(k,k)*a(0.500DO))*temp 410 continue do 420 k=1,n2 opar(k+n1)aguess(k) sek+nl)(sk+nl,k+nl)ca(0.5D0D0))temp 420 continue WRfl’E(9,5555) WR]TE(9,7777) (BGUESS(1), std(I,i)ca(0.SDODO), 1l,Nl) writc(*,7777) (bguess(i),i= 1,nl) write(9,6666) write(9,7777) (AGUESS(I), atd(l+Nl,i+n1)°’(0.5D0D0),l= I,N2) writet*,7777) (aguess(i),i=l,n2) wiite(9,7787) writc(9,7777) (pa(i),i l,nstat) write(9,7797) write(9,7777) (pb(i),i l,natat) wiite(9,7799) wrlte(9,7777) (pc(i),i l,nstat) GO TO 504 500 RESLQTER)=TDEV-DEV do 501 k=l,nl w(Uer)=w(iter)+abs(bguees(k)-oparikl)/sc(k) bguess(k)opar(k) 501 continue do 502 k=l,n2 w(iter)=w(iter)+abe(aguess(k)-opa(k+n1))Ise(k+nl) aguess(k)=opar(k+nl) 502 continue IF (TTER.EQ.NTOL) GO TO 889 504 NOBS=NTOL-l DEV=ODEV.TDRES(ITER+ I) if (iter.eq.0) go to 514 do 510 k=l,iter obs(k)tobs(k) time(k)ttime(k) do 505 j=1,nx 505 xm(k,j)=tzcm(kj) do 506 j=1,nxl 506 xm1(kj)=ntm1(kj) 510 continue if (iter.eq.noba) go to 888 514 DO 520 K=ITER+l,NOBS OBS(K)=TOBS(K+l) TIME(K)=tTIME(K+l) DO 515 J=1,NX 515 XM(K,J)=tXM(K+l,J) DO 516 J=1,NXI 516 XMI(K,J)=tXMI(K+l,J) 520 CONTINUE 888 CONTINUE 889 do 900 i=l,ntol resl(i)=sign(i)*(resl(i)ncO.SD000) write(3,7778) ds(i),ca(,l),nul(i),w(i) 900 continue 5555 
fonnat(4x,’beta-vecto?) 6666 fonnat(4x,’alpl,a-vecto?) 7777 fom.at(4x,2g16.8) 7778 fonnat(4x,4g16.8) 7787 fonnat(’pa’) 7797 fonnat(’pb’) 7799 fonnat(’pc) 9999 sro END SUBROUflNE FuNcr4,B,P) Cnu0on0a C This subroutlue computes the value of function QI in Chapter 2. C Data input: N diiuension of vector B; B=betavector; C C output: P = Ut flmction value Q1(B). IMPLICiT DOUBLE PRECISION (A-H,O-Z) INTEGER NOBS,NSTAT,NX,NXI,NV,NF,N1,N2 DIMENSION OBS(1000),TIME(1000),XM(1000,S),XM1(l000,8),Z(1000,5) COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NXI,NV,NF,N1,N2 INTEGER N DIMENSION E(N),BX(5) P=0.ODO DO 100 1=l,NOBS DOS J=l,NSTAT-l BX(J)=0.ODO DO 6 M=I,NX 6 BX(J)=BX(J)+XM(I,M)*B(M+(J.I)*NX) S CONTINUE BX(l) 5 P=P+Z(I,l) TEMP1 =EX(l) IF (TEMPI .LT.0.ODO) TEMPI =0.ODO DO 20 J=2,NSTAT-I P=P+Z(I,J)*BX(J) IF (BX(J).GT.BX(J-I)) TEMPI =BX(J) 20 CONTINUE P=P-TEMPI CALL AEXP(-TEMPI,TEMP2) DO 30J=I,NSTAT-I CALL AEXP((BX(J)-TEMP1),TEMP3) TEMP2=TEMP2+TEMP3 30 CONTINUE P=P-DLOG(TEMP2) 100 CONTINUE P=-F RETURN END ct SUBROUrINE GRADasB,G) C This subroutine computes Ut first derivative of QI (see eqn 2.21). C Data input: N = dimension of vector B; C B=hetavector; C output:G=tlsederivativeofQlatli. :::.::..:. .:. 
C IMPLICiT DOUBLE PRECISION (A-H,O-Z) INTEGER NOBS,NSTAT,NX,NX1,NV,NF,NI,N2 DIMENSION OBS(l000),TIME(I000),XM(l000,S),XMI (l000,S),Z(l000,5) COMMON OBS,TIME,XM,XMI,Z,NOES,NSTAT,NX,NX1,NV,NF,Nl,N2 INTEGER N DIMENSION G(25),B(N),TEMP(25),BX(5) DOS I=l,N S G(I)=0.ODO DO 100 I=l,NOBS DO 20 J=l,NSTAT-I BX(J)=0.ODO DO 1OM=I,NX 10 BX(J)=BX(fl+XM(I,M)*B(M+(3l)*NX) 20 CONTINUE TEMP1 =BX(1) IF (FEMP1 .LT.0.ODO) TEMP1 =0.ODO DO 30 J=2,NSTAT-I IF (BX(J).GT.BXQ-I)) TEMPI =BX(3) 30 CONTINUE CALL AEXP&TEMPI ,TEMP2) DO 40 J=l,NSTAT-I CALL AEXP((BX(J)-TEMPI),TEMPQ)) TEMP2=TEMP2+TEMP(J) 40 CONTINUE DO 60J=l,NSTAT-1 DO 50 M=I,NX G(M+(Jl)aNX)=G(M+(J.l)aNX)+XM(I,M)*(z(I,J).TEMP(J)IfEMp2) 50 CONTINUE 60 CONTINUE 100 CONTINUE DO 200 I=I,N 200 G(I)=-G(I) RETURN END SUBROI.TrINE MSTEP2(N,B,H,P0,IH,NEVALS,IFAIL,MON) Ct©©ttt5©t C This subroutine is a quasi-Newton algorithm (Nash, 1990) which C maximizes Ut function Ql. 13-3 C Data input: N dimenskai of vector B; B = beta vector; 111 = dimansion of tiz Hessian matrix; C NEVALS C 1/of evaluations for the function QI; output: H = Liz Hessian matrix; P0 nzxinasn value; C C B = optimal values of beta vector. 
IMPLICiT DOUBLE PRECISION(A-H,O-Z) INTEGER NOBS,NSrAT,NX,NXI,NV,NF,Nl,N2 DIMENSION OBS(l000),TIME(l000),XM(l000,8),XMI(1000,8),Z(l000,5) COMMON OBS,TIME,XM,XMI,Z,NOBS,NSTAT,NX,NXI,NV,NF,N1,N2 DIMENSION B4), H(IH,N) DIMENSION X(30), C(30), 0(30), T(30) DOUBLE PRECISION K INTEGER COUNT DATA W,TOL.2,1.0D0D-4/,EPS/1 .ODOD-61 IF (N.LT.0.OR.N.GT.23) GO TO 160 IFN = N+l 10 = RLIM=7.2D0*(l0.ODOec74.ODO) CALL FUNCr(N,B,P0) IF(P0.GT.RLIM)GOTO18O CALL GRAD4,B,G) C C C RESEF HESSIAN 10 DO 301 = l,N DO 20 J l,N 20 H(I,J) = 0.0130 30 H(I,l) = I .ODODO ILAST = 10 C C C TOP OF ITERATION 40 DO 501 l,N X(1) = B(I) 50 C(I)=G(1) C C C FIND SEARCH DIRECrION T Dl = 0.ODO SN=0.000 DO 701 = l,N S = 0.0 DO 60 J = 1,N 60 S = S-H(I,J)*G(J) T(I) = S SN = SN+S*S 70 Dl = Dl-SG(I) C C C CHECK IF DOWNHILL IF (D1.LE.0.ODO) GO TO 10 C C C SEARCH ALONG T SN = 0.SDODO/DSQRT(SN) K = DMIN1(1.000DO,SN) 80 COUNT =0 DO 901 = l,N B(I) = X(1)+K*T(I) IF (DABS(B(I)-X(1)).LT.EPS) COUNT 90 CONTINUE C C C CHECK IF CONVERGED IF (COUNT.EQ.N) GO TO 150 CALL FUNCT(N,B,P) IFN = IFN+1 IF (IFN.GE.NEVALS) GO TO 170 IF (P.LT.P0DI*K9X)L) GO TO 100 K = W*K GO TO 80 C C C NEW LOWEST VALUE 100 P0 P 10 = 10+1 CALL GRAD(N,B,G) IFN IFN+N C C UPDATE HESSIAN COUNT+l C DI = 0.0D0 DO 1101 = 1,N TI)) = K T(I) 5 C(I) = G(l)-C(1) 110 Dl = D1+T(1)*C(l) C C C CHECK IF +VE DEF ADDITION IF (D1.LE.0.ODO) GO TO 10 D2 = (l.ODO DO 130 1 = l,N S = 0.0130 DO 120 J = I,N 120 5 = S+H(1,J)C(J) XCI) = S 130 D2 = D2+S’C(l) 1)2 = I+D2/D1 DO 140 I = I,N DO 1401 = 1,N 140 H(I,J) = H(I,J)-(T(I)*X(J)+T(J)*X(I)-D2fl(Ifl(J))ID1 GO TO 40 150 WAIL = 0 C SUCCESSFUL CONCLUSION RETURN 160 WAlL = I C N Our OF RANGE RETURN 170 WAIL = 2 C TOO MANY FUNCTION EVALUATIONS RETURN 180 WAIL=3 C IND1AL POINT INFEASIBLE RETURN 2005 FORMAT( 2X,3G16.4) END SUBROUTINE AEXP(X,F) C C This subroutite computes a expotrntial function value. C Data input: X = real number; output: F = exp(X). 
C C•••• IMPLICIT DOUBLE PRECISION (A-H,O-Z) INTEGER NSTEP TEMP1 =ABS(X) IF (TEMPI.GT.79.9D0) GO TO 50 F=DEXP(X) GO TO 200 50 IF (X.LT.-79.9D0) GO TO ISO IF (X.GT.lS0.ODO) X=150.ODO F=l.ODODO+X NSTEP=l FAcrrOR= 1 .ODODO TEMPI =DFLOAT(NSTEP) TEMP2=XITEMP1 100 IF (TEMP2.LT.l.ODODO) GO TO 200 NETEP=NSTEP+ 1 TEMPI =DFLOAT(NSTEP) FACTOR=XITEMPI FACTOR 5 TEMP2=TEMP2 F=F+TEMP2 GO TO 100 150 F=0.ODO 200 RETURN END SUBROUTINE FUNCT1(N,B,P) C This subroutine computes the value of function Q2 in Chapter 2. C Data input: N = dinrnsion of vector B; B=alphavector; C C output: P = the fanction value Q2(B). IMPLICIT DOUBLE PRECISION (A-H,O-Z) INTEGER NOBS,NSTAT,NX,NXI,NY,NF,NI ,N2 DIMENSION OBS(1000),TIME(I100),XM(1IM),8),XMI (I,8),Z(II00,5) COMMON OBS,TIME,XM,XMI,Z,NOBS,NSTAT,NX,NX I ,NV,NF,N I ,N2 INTEGER N i DIMENSION B(N),BX(5) P=0.ODO DO 100 I=1,NOBS DOS J=1,NSTAT BX(J)=0.000 DOS M=1,NV 6 BX(J)=BX(J)+XMI(1,M)SB(M+Q.1)*NV) S CONTINUE TEMP=0.000 DO I2M=I,NP TEMP=TEMP+XMI (I,NV+M)*B(M+NSTAT*NV) 12 CONTINUE DO 14J=1,NSTAT BX(J)=BX(J)+TEMP 14 CONTINUE DO 20 J=1,NSTAT CALL AEXP(BX(J),TEMP) P=P+ZQ,J)*(OBS(I)*BX(J)TIME(1)*TEMP) 20 CONTINUE 100 CONTINUE P=-p RETURN END SUBROUTINE ORADI(N,B,O) C C This subroutine computes the first derivative of Q2 (see eqn 2.22). C Data input: N = dimension of vector B; C B = alpha vector; output: G...thederivativeofQ2atB. 
C
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF
      INTEGER N
      DIMENSION G(30),B(N),BX(5)
      DO 5 I=1,N
    5 G(I)=0.0D0
      DO 100 I=1,NOBS
      DO 20 J=1,NSTAT
      BX(J)=0.0D0
      DO 10 M=1,NV
   10 BX(J)=BX(J)+XM1(I,M)*B(M+(J-1)*NV)
   20 CONTINUE
      TEMP=0.0D0
      DO 22 M=1,NF
      TEMP=TEMP+XM1(I,M+NV)*B(M+NSTAT*NV)
   22 CONTINUE
      DO 24 J=1,NSTAT
      BX(J)=BX(J)+TEMP
   24 CONTINUE
      DO 32 J=1,NSTAT
      CALL AEXP(BX(J),TEMP)
      DO 30 M=1,NV
      G(M+(J-1)*NV)=G(M+(J-1)*NV)
     C  +Z(I,J)*XM1(I,M)*(OBS(I)-TIME(I)*TEMP)
   30 CONTINUE
   32 CONTINUE
      DO 42 M=1,NF
      DO 40 J=1,NSTAT
      CALL AEXP(BX(J),TEMP)
      G(M+NSTAT*NV)=G(M+NSTAT*NV)
     C  +Z(I,J)*XM1(I,M+NV)*(OBS(I)-TIME(I)*TEMP)
   40 CONTINUE
   42 CONTINUE
  100 CONTINUE
      DO 200 I=1,N
  200 G(I)=-G(I)
      RETURN
      END
      SUBROUTINE MSTEP1(N,B,H,P0,IH,NEVALS,IFAIL,MON)
C  This subroutine is a quasi-Newton algorithm (Nash, 1990) which
C  maximizes the function Q2.
C  Data input: N = dimension of vector B; B = alpha vector;
C              IH = dimension of the corresponding Hessian matrix;
C              NEVALS = # of evaluations for the function Q2
C  output: H = the Hessian matrix; P0 = maximum value;
C          B = optimal values of alpha vector.
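C
C  Editorial sketch of the line search, inferred from the code
C  of this routine and not part of the original header. With
C  search direction T = -H*G and D1 = -T'G > 0, the first trial
C  step is K = min(1, 0.5/|T|). The trial point B = X + K*T is
C  accepted when
C      P < P0 - D1*K*TOL,
C  otherwise K is multiplied by W = 0.2 and the trial is
C  repeated. Convergence is declared when every component of B
C  moves by less than EPS.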
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION B(N), H(IH,N)
      DIMENSION X(30), C(30), G(30), T(30)
      DOUBLE PRECISION K
      INTEGER COUNT
      DATA W,TOL/0.2,1.0D-4/,EPS/1.0D-6/
      IF (N.LT.0.OR.N.GT.23) GO TO 160
      IFN = N+1
      IG = 1
      RLIM=7.2D0*(10.0D0**74.0D0)
      CALL FUNCT1(N,B,P0)
      IF (P0.GT.RLIM) GO TO 180
      CALL GRAD1(N,B,G)
C
C  RESET HESSIAN
C
   10 DO 30 I = 1,N
      DO 20 J = 1,N
   20 H(I,J) = 0.0D0
   30 H(I,I) = 1.0D0
      ILAST = IG
C
C  TOP OF ITERATION
C
   40 DO 50 I = 1,N
      X(I) = B(I)
   50 C(I) = G(I)
C
C  FIND SEARCH DIRECTION T
C
      D1 = 0.0D0
      SN = 0.0D0
      DO 70 I = 1,N
      S = 0.0D0
      DO 60 J = 1,N
   60 S = S-H(I,J)*G(J)
      T(I) = S
      SN = SN+S*S
   70 D1 = D1-S*G(I)
C
C  CHECK IF DOWNHILL
C
      IF (D1.LE.0.0D0) GO TO 10
C
C  SEARCH ALONG T
C
      SN = 0.5D0/DSQRT(SN)
      K = DMIN1(1.0D0,SN)
   80 COUNT = 0
      DO 90 I = 1,N
      B(I) = X(I)+K*T(I)
      IF (DABS(B(I)-X(I)).LT.EPS) COUNT = COUNT+1
   90 CONTINUE
C
C  CHECK IF CONVERGED
C
      IF (COUNT.EQ.N) GO TO 150
      CALL FUNCT1(N,B,P)
      IFN = IFN+1
      IF (IFN.GE.NEVALS) GO TO 170
      IF (P.LT.P0-D1*K*TOL) GO TO 100
      K = W*K
      GO TO 80
C
C  NEW LOWEST VALUE
C
  100 P0 = P
      IG = IG+1
      CALL GRAD1(N,B,G)
      IFN = IFN+N
C
C  UPDATE HESSIAN
C
      D1 = 0.0D0
      DO 110 I = 1,N
      T(I) = K*T(I)
      C(I) = G(I)-C(I)
  110 D1 = D1+T(I)*C(I)
C
C  CHECK IF +VE DEF ADDITION
C
      IF (D1.LE.0.0D0) GO TO 10
      D2 = 0.0D0
      DO 130 I = 1,N
      S = 0.0D0
      DO 120 J = 1,N
  120 S = S+H(I,J)*C(J)
      X(I) = S
  130 D2 = D2+S*C(I)
      D2 = 1+D2/D1
      DO 140 I = 1,N
      DO 140 J = 1,N
  140 H(I,J) = H(I,J)-(T(I)*X(J)+T(J)*X(I)-D2*T(I)*T(J))/D1
      GO TO 40
  150 IFAIL = 0
C  SUCCESSFUL CONCLUSION
      RETURN
  160 IFAIL = 1
C  N OUT OF RANGE
      RETURN
  170 IFAIL = 2
C  TOO MANY FUNCTION EVALUATIONS
      RETURN
  180 IFAIL = 3
C  INITIAL POINT INFEASIBLE
      RETURN
 2005 FORMAT( 2X,3G16.4)
      END
      SUBROUTINE ESTEP(NTEMP1,NTEMP2,B,B1)
C  This subroutine executes the E-step of the EM algorithm.
C  Data input: NTEMP1 = dimension of vector B;
C              NTEMP2 = dimension of vector B1;
C              B = beta vector;
C              B1 = alpha vector.
C  Output: updated posterior probabilities, Z(I,J).
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION B(NTEMP1),B1(NTEMP2),TEMP(5),BX(5),BX1(5),TEMPL(5)
      INTEGER NTEMP1,NTEMP2
      SMALL=-79.9D0
      SMALL0=10000000000.0D0
      SMALL0=1.0D0/SMALL0
      DO 100 I=1,NOBS
      DO 12 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
      BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   10 CONTINUE
   12 CONTINUE
      BX(NSTAT)=0.0D0
      DO 18 J=1,NSTAT
      BX1(J)=0.0D0
      DO 16 M=1,NV
      BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NV)
   16 CONTINUE
   18 CONTINUE
      IF (NF.EQ.0) GO TO 25
      TEMPP=0.0D0
      DO 22 M=1,NF
      TEMPP=TEMPP+XM1(I,NV+M)*B1(M+NSTAT*NV)
   22 CONTINUE
      DO 24 J=1,NSTAT
      BX1(J)=BX1(J)+TEMPP
   24 CONTINUE
   25 CONTINUE
      CALL AEXP(BX1(1),TEMP(1))
      TEMP1=BX(1)+OBS(I)*BX1(1)-TIME(I)*TEMP(1)
      DO 30 J=2,NSTAT
      CALL AEXP(BX1(J),TEMP(J))
      TEMP12=(TEMP(J)-TEMP(J-1))*TIME(I)
      TEMP12=(BX(J)-BX(J-1))+OBS(I)*(BX1(J)-BX1(J-1))-TEMP12
      IF (TEMP12.GT.0.0D0) TEMP1=BX(J)+OBS(I)*BX1(J)-TIME(I)*TEMP(J)
   30 CONTINUE
      TEMP2=0.0D0
      DO 40 J=1,NSTAT
      TEMP(J)=BX(J)+OBS(I)*BX1(J)-TIME(I)*TEMP(J)
      CALL AEXP((TEMP(J)-TEMP1),TEMPL(J))
      TEMP2=TEMP2+TEMPL(J)
   40 CONTINUE
      DO 50 J=1,NSTAT
      Z(I,J)=TEMPL(J)/TEMP2
      IF (Z(I,J).LT.SMALL0) Z(I,J)=0.0D0
   50 CONTINUE
  100 CONTINUE
      RETURN
      END
      SUBROUTINE LLIKELY(NT,BT,F)
C
C  This subroutine computes the observed log likelihood value.
C  Data input: NT = total dimension of vector BT;
C              BT = vector combining beta and alpha vectors.
C  Output: F = the observed log likelihood value at BT.
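C
C  Editorial sketch, inferred from the code of this routine and
C  not part of the original header; a(j), b(j), m and c are
C  labels introduced here. For observation i let
C      b(j) = BX(j)   (mixing-probability predictor, b(NSTAT)=0),
C      a(j) = b(j) + OBS(i)*BX1(j) - TIME(i)*exp(BX1(j)).
C  Up to additive terms that do not depend on the parameters,
C  the contribution of observation i to the log likelihood is
C      log sum_j exp(a(j)) - log sum_j exp(b(j)),
C  which the code evaluates as
C      m + log sum_j exp(a(j)-m) - c - log sum_j exp(b(j)-c)
C  with shifts m = TEMP1 and c = TEMPP1 to guard against
C  overflow. F is returned as the negative of the total.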
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      INTEGER NT
      DIMENSION B(30),B1(30),BT(NT),BX(5),BX1(5),TEMP(5)
      DO 1 J=1,N1
    1 B(J)=BT(J)
      DO 2 J=1,N2
    2 B1(J)=BT(N1+J)
      F=0.0D0
      DO 100 I=1,NOBS
      DO 12 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
   10 BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   12 CONTINUE
      BX(NSTAT)=0.0D0
      DO 18 J=1,NSTAT
      BX1(J)=0.0D0
      DO 16 M=1,NV
      BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NV)
   16 CONTINUE
   18 CONTINUE
      IF (NF.EQ.0) GO TO 25
      TEMPP=0.0D0
      DO 22 M=1,NF
      TEMPP=TEMPP+XM1(I,NV+M)*B1(M+NSTAT*NV)
   22 CONTINUE
      DO 24 J=1,NSTAT
      BX1(J)=BX1(J)+TEMPP
   24 CONTINUE
   25 CONTINUE
      CALL AEXP(BX1(1),TEMP(1))
      TEMP1=BX(1)+OBS(I)*BX1(1)-TIME(I)*TEMP(1)
      TEMPP1=BX(1)
      DO 30 J=2,NSTAT
      IF (BX(J).GT.BX(J-1)) TEMPP1=BX(J)
      CALL AEXP(BX1(J),TEMP(J))
      TEMP12=(TEMP(J)-TEMP(J-1))*TIME(I)
      TEMP12=(BX(J)-BX(J-1))+OBS(I)*(BX1(J)-BX1(J-1))-TEMP12
      IF (TEMP12.GT.0.0D0) TEMP1=BX(J)+OBS(I)*BX1(J)-TIME(I)*TEMP(J)
   30 CONTINUE
      TEMP2=0.0D0
      TEMPP2=0.0D0
      DO 40 J=1,NSTAT
      CALL AEXP((BX(J)-TEMPP1),TEMPP21)
      TEMPP2=TEMPP2+TEMPP21
      TEMP12=BX(J)+OBS(I)*BX1(J)-TIME(I)*TEMP(J)
      TEMP12=TEMP12-TEMP1
      CALL AEXP(TEMP12,TEMP22)
      TEMP2=TEMP2+TEMP22
   40 CONTINUE
      F=F+TEMP1+DLOG(TEMP2)-TEMPP1-DLOG(TEMPP2)
  100 CONTINUE
      F=-F
      RETURN
      END
      SUBROUTINE NEWTON(N,B,H,P0,IH,NEVALS,IFAIL,MON,std)
C  This subroutine is a quasi-Newton algorithm (Nash, 1990) which
C  maximizes the observed log likelihood function.
C  Data input: N = dimension of vector B;
C              B = vector combining beta and alpha vectors;
C              IH = dimension of the corresponding Hessian matrix;
C              NEVALS = # of evaluations for the observed log
C                       likelihood function;
C  output: H = the Hessian matrix; P0 = maximum value;
C          B = optimal values of alpha vector
C          std = approximate standard errors.
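C
C  Editorial sketch of the matrix update, inferred from the code
C  of this routine and not part of the original header; s and y
C  are labels introduced here. The routine minimises the negative
C  log likelihood returned by LLIKELY, and H approximates the
C  inverse of its Hessian. With step s = K*T, gradient change
C  y = G(new) - G(old), D1 = s'y, X = H*y and
C  D2 = 1 + y'H*y/D1, the update applied is
C      H <- H - (s*X' + X*s' - D2*s*s')/D1,
C  which has the form of the BFGS inverse-Hessian update. H is
C  reset to the identity whenever D1 <= 0. On convergence H is
C  copied into std.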
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION B(N), H(IH,N),std(30,30)
      DIMENSION X(30), C(30), G(30), T(30)
      DOUBLE PRECISION K
      INTEGER COUNT,ih,n
      DATA W,TOL/0.2,1.0D-4/,EPS/1.0D-6/
      IF (N.LT.0.OR.N.GT.23) GO TO 160
      IFN = N+1
      IG = 1
      RLIM=7.2D0*(10.0D0**74.0D0)
      CALL LLIKELY(N,B,P0)
      IF (P0.GT.RLIM) GO TO 180
      CALL GLIKELY(N,B,G)
C
C  RESET HESSIAN
C
   10 DO 30 I = 1,N
      DO 20 J = 1,N
   20 H(I,J) = 0.0D0
   30 H(I,I) = 1.0D0
      ILAST = IG
C
C  TOP OF ITERATION
C
   40 DO 50 I = 1,N
      X(I) = B(I)
   50 C(I) = G(I)
C
C  FIND SEARCH DIRECTION T
C
      D1 = 0.0D0
      SN = 0.0D0
      DO 70 I = 1,N
      S = 0.0D0
      DO 60 J = 1,N
   60 S = S-H(I,J)*G(J)
      T(I) = S
      SN = SN+S*S
   70 D1 = D1-S*G(I)
C
C  CHECK IF DOWNHILL
C
      IF (D1.LE.0.0D0) GO TO 10
C
C  SEARCH ALONG T
C
      SN = 0.5D0/DSQRT(SN)
      K = DMIN1(1.0D0,SN)
   80 COUNT = 0
      DO 90 I = 1,N
      B(I) = X(I)+K*T(I)
      IF (DABS(B(I)-X(I)).LT.EPS) COUNT = COUNT+1
   90 CONTINUE
C
C  CHECK IF CONVERGED
C
      IF (COUNT.EQ.N) GO TO 150
      CALL LLIKELY(N,B,P)
      IFN = IFN+1
      IF (IFN.GE.NEVALS) GO TO 170
      IF (P.LT.P0-D1*K*TOL) GO TO 100
      K = W*K
      GO TO 80
C
C  NEW LOWEST VALUE
C
  100 P0 = P
      IG = IG+1
      CALL GLIKELY(N,B,G)
      IFN = IFN+N
C
C  UPDATE HESSIAN
C
      D1 = 0.0D0
      DO 110 I = 1,N
      T(I) = K*T(I)
      C(I) = G(I)-C(I)
  110 D1 = D1+T(I)*C(I)
C
C  CHECK IF +VE DEF ADDITION
C
      IF (D1.LE.0.0D0) GO TO 10
      D2 = 0.0D0
      DO 130 I = 1,N
      S = 0.0D0
      DO 120 J = 1,N
  120 S = S+H(I,J)*C(J)
      X(I) = S
  130 D2 = D2+S*C(I)
      D2 = 1+D2/D1
      DO 140 I = 1,N
      DO 140 J = 1,N
  140 H(I,J) = H(I,J)-(T(I)*X(J)+T(J)*X(I)-D2*T(I)*T(J))/D1
      GO TO 40
  150 do 141 i=1,n
      do 141 j=1,n
  141 std(i,j)=h(i,j)
      IFAIL = 0
C  SUCCESSFUL CONCLUSION
      RETURN
  160 IFAIL = 1
C  N OUT OF RANGE
      RETURN
  170 IFAIL = 2
C  TOO MANY FUNCTION EVALUATIONS
      RETURN
  180 IFAIL = 3
C  INITIAL POINT INFEASIBLE
      RETURN
 2005 FORMAT( 2X,3G16.4)
      END
      SUBROUTINE GLIKELY(N,B,G)
C
C  This subroutine computes the first derivative of the observed log
C  likelihood function.
C  Data input: N = dimension of vector B;
C              B = vector combining beta and alpha vectors;
C  Output: G = the derivative of the function at B.
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      INTEGER N
      DIMENSION G(30),B(N),BX(5),BX1(5),BTEMP(30),ATEMP(30)
      DO 1 J=1,N1
    1 BTEMP(J)=B(J)
      DO 2 J=1,N2
    2 ATEMP(J)=B(N1+J)
      CALL ESTEP(N1,N2,BTEMP,ATEMP)
      DO 5 I=1,N
    5 G(I)=0.0D0
      DO 100 I=1,NOBS
      DO 12 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
      BX(J)=BX(J)+XM(I,M)*BTEMP(M+(J-1)*NX)
   10 CONTINUE
   12 CONTINUE
      BX(NSTAT)=0.0D0
      TPROB=0.0D0
      DO 13 J=1,NSTAT
      CALL AEXP(BX(J),TEMP1)
      TPROB=TPROB+TEMP1
      BX(J)=TEMP1
   13 CONTINUE
      DO 14 J=1,NSTAT
   14 BX(J)=BX(J)/TPROB
      DO 18 J=1,NSTAT
      BX1(J)=0.0D0
      DO 16 M=1,NV
      BX1(J)=BX1(J)+XM1(I,M)*ATEMP(M+(J-1)*NV)
   16 CONTINUE
   18 CONTINUE
      IF (NF.EQ.0) GO TO 22
      TEMPP=0.0D0
      DO 20 M=1,NF
      TEMPP=TEMPP+XM1(I,NV+M)*ATEMP(M+NSTAT*NV)
   20 CONTINUE
      DO 21 J=1,NSTAT
      BX1(J)=BX1(J)+TEMPP
   21 CONTINUE
   22 CONTINUE
      DO 24 J=1,NSTAT-1
      DO 23 M=1,NX
      G(M+(J-1)*NX)=G(M+(J-1)*NX)+XM(I,M)*(Z(I,J)-BX(J))
   23 CONTINUE
   24 CONTINUE
      DO 30 J=1,NSTAT
      CALL AEXP(BX1(J),TRATE)
      BX1(J)=TRATE
   30 CONTINUE
      DO 45 J=1,NSTAT
      DO 42 M=1,NV
      G(M+(J-1)*NV+N1)=G(M+(J-1)*NV+N1)
     C  +XM1(I,M)*Z(I,J)*(OBS(I)-BX1(J))
   42 CONTINUE
   45 CONTINUE
      IF (NF.EQ.0) GO TO 60
      DO 55 M=1,NF
      DO 52 J=1,NSTAT
      G(M+N1+NSTAT*NV)=G(M+N1+NSTAT*NV)
     C  +XM1(I,M+NV)*Z(I,J)*(OBS(I)-BX1(J))
   52 CONTINUE
   55 CONTINUE
   60 CONTINUE
  100 CONTINUE
      DO 200 I=1,N
  200 G(I)=-G(I)
      RETURN
      END
      SUBROUTINE GFIT(NTEMP1,NTEMP2,B,B1,XSQR,FIT,RES,PA,PB,PC)
C
C  This subroutine computes Pearson statistic, fitted values, Pearson
C  residuals, overdispersion test statistics for each component.
C  Data input: NTEMP1 = dimension of vector B;
C              NTEMP2 = dimension of vector B1;
C              B = beta vector; B1 = alpha vector;
C  Output: XSQR = Pearson statistic;
C          FIT = fitted values including for each component;
C          RES = Pearson residuals including for each component;
C          PA, PB and PC are vectors containing the A, B and C
C          overdispersion test statistics for each component.
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      INTEGER NTEMP1,NTEMP2,NCOUNT(10)
      DIMENSION B(NTEMP1),B1(NTEMP2),BX(5),BX1(5),FIT(1000,12)
      DIMENSION PA(10),TPA(10,2)
      DIMENSION CFIT(10),RES(1000,6),PB(10),TPB(10,2),PC(10),TPC(10,2)
      DO 2 J=1,NSTAT
      NCOUNT(J)=0
      TPA(J,1)=0.0D0
      TPA(J,2)=0.0D0
      TPB(J,1)=0.0D0
      TPB(J,2)=0.0D0
      TPC(J,1)=0.0D0
    2 CONTINUE
      XSQR=0.0
      DO 100 I=1,NOBS
      FIT(I,1)=OBS(I)
      FIT(I,2)=0.0D0
      DO 20 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
      BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   10 CONTINUE
   20 CONTINUE
      BX(NSTAT)=0.0D0
      TEMP1=BX(1)
      DO 21 J=2,NSTAT
      IF (BX(J).GT.BX(J-1)) TEMP1=BX(J)
   21 CONTINUE
      TEMPD=0.0D0
      DO 25 J=1,NSTAT
      CALL AEXP((BX(J)-TEMP1),TEMP11)
      BX(J)=TEMP11
      TEMPD=TEMPD+TEMP11
      BX1(J)=0.0D0
      DO 22 M=1,NV
      BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NV)
   22 CONTINUE
      if (nf.eq.0) go to 24
      DO 23 M=1,NF
      BX1(J)=BX1(J)+XM1(I,M+NV)*B1(M+NSTAT*NV)
   23 CONTINUE
   24 CALL AEXP(BX1(J),TEMP2)
      BX1(J)=TEMP2*TIME(I)
      FIT(I,2+J)=BX1(J)
      CFIT(J)=BX1(J)
   25 CONTINUE
      DO 26 J=1,NSTAT
      FIT(I,2+NSTAT+J)=BX(J)/TEMPD
      FIT(I,2)=FIT(I,2)+FIT(I,2+J)*FIT(I,2+NSTAT+J)
      RES(I,1+J)=(FIT(I,1)-FIT(I,2+J))*(FIT(I,2+J)**(-0.5D0))
   26 CONTINUE
      E2=0.0D0
      DO 30 J=1,NSTAT
      E2=E2+FIT(I,2+NSTAT+J)*(FIT(I,2+J)**2.0D0)
   30 CONTINUE
      E2=FIT(I,2)+E2-(FIT(I,2)**(2.0D0))
      RES(I,1)=(FIT(I,1)-FIT(I,2))*(E2**(-0.5D0))
      XSQR=XSQR+(RES(I,1)**(2.0D0))
      NUM=1
      DO 40 J=1,NSTAT-1
      IF (Z(I,J).GT.Z(I,J+1)) GO TO 40
      NUM=J+1
   40 CONTINUE
      NCOUNT(NUM)=NCOUNT(NUM)+1
      TPA(NUM,1)=TPA(NUM,1)+(OBS(I)-CFIT(NUM))**2.0D0-CFIT(NUM)
      TPB(NUM,1)=TPB(NUM,1)+(OBS(I)-CFIT(NUM))**2.0D0-OBS(I)
      TPA(NUM,2)=TPA(NUM,2)+CFIT(NUM)**2.0D0
      TPB(NUM,2)=TPA(NUM,2)
      TPC(NUM,1)=TPC(NUM,1)+((OBS(I)-CFIT(NUM))**2.0D0
     C  -OBS(I))/CFIT(NUM)
  100 CONTINUE
      DO 150 J=1,NSTAT
      PA(J)=TPA(J,1)/((2.0D0*TPA(J,2))**0.5D0)
      PB(J)=TPB(J,1)/((2.0D0*TPB(J,2))**0.5D0)
      PC(J)=TPC(J,1)/((DFLOAT(NCOUNT(J))*2.0D0)**0.5D0)
  150 CONTINUE
      RETURN
      END
      SUBROUTINE FLIKELY(NT,BT,F,DRES)
C
C  This subroutine computes the deviance residuals.
C  Data input: NT = dimension of vector BT;
C              BT = vector combining beta and alpha vectors;
C  Output: DRES = deviance residuals;
C          F = the observed log likelihood function value at BT.
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      INTEGER NT
      DIMENSION B(30),B1(30),BT(NT),BX(5),BX1(5),TEMP(5),DRES(1000)
      DO 1 J=1,N1
    1 B(J)=BT(J)
      DO 2 J=1,N2
    2 B1(J)=BT(N1+J)
      F=0.0D0
      DO 100 I=1,NOBS
      DO 12 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
   10 BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   12 CONTINUE
      BX(NSTAT)=0.0D0
      DO 18 J=1,NSTAT
      BX1(J)=0.0D0
      DO 16 M=1,NV
      BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NV)
   16 CONTINUE
   18 CONTINUE
      IF (NF.EQ.0) GO TO 25
      TEMPP=0.0D0
      DO 22 M=1,NF
      TEMPP=TEMPP+XM1(I,NV+M)*B1(M+NSTAT*NV)
   22 CONTINUE
      DO 24 J=1,NSTAT
      BX1(J)=BX1(J)+TEMPP
   24 CONTINUE
   25 CONTINUE
      CALL AEXP(BX1(1),TEMP(1))
      TEMP1=BX(1)+OBS(I)*BX1(1)-TIME(I)*TEMP(1)
      TEMPP1=BX(1)
      DO 30 J=2,NSTAT
      IF (BX(J).GT.BX(J-1)) TEMPP1=BX(J)
      CALL AEXP(BX1(J),TEMP(J))
      TEMP12=(TEMP(J)-TEMP(J-1))*TIME(I)
      TEMP12=(BX(J)-BX(J-1))+OBS(I)*(BX1(J)-BX1(J-1))-TEMP12
      IF (TEMP12.GT.0.0D0) TEMP1=BX(J)+OBS(I)*BX1(J)-TIME(I)*TEMP(J)
   30 CONTINUE
      TEMP2=0.0D0
      TEMPP2=0.0D0
      DO 40 J=1,NSTAT
      CALL AEXP((BX(J)-TEMPP1),TEMPP21)
      TEMPP2=TEMPP2+TEMPP21
      TEMP12=BX(J)+OBS(I)*BX1(J)-TIME(I)*TEMP(J)
      TEMP12=TEMP12-TEMP1
      CALL AEXP(TEMP12,TEMP22)
      TEMP2=TEMP2+TEMP22
   40 CONTINUE
      DRES(I)=TEMP1
     C  +DLOG(TEMP2)-TEMPP1-DLOG(TEMPP2)
      F=F+DRES(I)
  100 CONTINUE
      F=-F
      RETURN
      END

2. Fortran program for computing the maximum likelihood estimates of the mixed logistic regression model.

      PROGRAM BINMIX
C
C  This code finds maximum likelihood estimates of the parameters for
C  the mixed binomial regression model. Observed data should be
C  associated with n(i), which is the number of total trials related
C  to observation i.
C  In this code we allow common regression coefficients to be chosen
C  for different components.
C     NVAR = # of different coefficients
C     NCOM = # of common coefficients
C  If NCOM = 0, this is the most general case. Note that this code
C  does not impose any restriction on mixing probabilities.
C  The program gives the estimated standard errors from the
C  quasi-Newton approach.
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION BGUESS(30),OB(30),H(30,30),AGUESS(30),OA(30)
      DIMENSION FIT(1000,13),RES(1000,8),TB(30),TDRES(1000),DRES(1000)
      INTEGER N1,N2,IH1,IH2,MON,NEVALS,IFAIL,NSTEP,NSTEP1,N,IH3
      integer ntol,iter
      dimension tobs(1000),ttime(1000),txm(1000,8),
     c  txm1(1000,8),res1(1000),sign(1000)
      dimension opar(30),se(30),w(1000)
      OPEN( UNIT=1,FILE='Iostl')
      OPEN( UNIT=2,FILE='residual')
      open( unit=3,file='IiIres')
      open( unit=8,file='fitout')
      OPEN( UNIT=7,FILE='result')
      OPEN( UNIT=9,FILE='Idataout')
      READ(1,100) NOBS,NSTAT,NX,NX1,NVAR,NCOM
      write(*,100) nobs,nstat,nx,nx1,NVAR,NCOM
  100 FORMAT(6I5)
      N1=(NSTAT-1)*NX
      N2=NSTAT*NVAR+NCOM
      N=N1+N2
      READ(1,113) (OBS(I),TIME(I), I=1,NOBS)
  110 FORMAT(F10.5)
      write(*,113) (obs(i),time(i),i=1,nobs)
      do 111 i=1,nobs
      READ (1,112) (XM(I,J),J=1,NX)
      write(*,112) (xm(i,j),j=1,nx)
  111 continue
  112 FORMAT(6G16.8)
  113 FORMAT(2F10.5)
      READ(1,110) (BGUESS(I),I=1,N1)
      write(9,110) (bguess(i),i=1,n1)
      DO 115 I=1,NOBS
      READ(1,112) (XM1(I,J),J=1,NX1)
      write (*,112) (xm1(i,j),j=1,nx1)
  115 CONTINUE
      READ(1,110) (AGUESS(I),I=1,N2)
      WRITE(9,110) (AGUESS(I),I=1,N2)
      do 118 i=1,nobs
      tobs(i)=obs(i)
      ttime(i)=time(i)
      do 116 j=1,nx
  116 txm(i,j)=xm(i,j)
      do 117 j=1,nx1
  117 txm1(i,j)=xm1(i,j)
  118 continue
      NEVALS=1000
      IH1=N1
      IH2=N2
      IH3=N
      MON=1
      TOL=0.0D01
      tol1=0.0D01
      OBSINF=0.0D0
      DEV=0.0D0
      DO 120 I=1,NOBS
  120 TDRES(I)=0.0D0
      do 121 i=1,nobs
  121 w(i)=0.0D0
      DO 150 I=1,NOBS
      IF (OBS(I).EQ.0.0D0) GO TO 150
      IF (OBS(I).EQ.TIME(I)) GO TO 150
      TDRES(I)=OBS(I)*DLOG(OBS(I))-TIME(I)*DLOG(TIME(I))
     C  +(TIME(I)-OBS(I))*DLOG(TIME(I)-OBS(I))
      DEV=DEV+TDRES(I)
      TEMPSUM=0.0D0
      NSTEP=INT(OBS(I))
      NSTEP1=INT(TIME(I))
      DO 142 J=NSTEP+1,NSTEP1
      TEMPSUM=TEMPSUM+DLOG(DFLOAT(J))
  142 CONTINUE
      OBSINF=OBSINF+TEMPSUM
      TEMPSUM=0.0D0
      NSTEP=INT(TIME(I)-OBS(I))
      DO 144 J=1,NSTEP
      TEMPSUM=TEMPSUM+DLOG(DFLOAT(J))
  144 CONTINUE
      OBSINF=OBSINF+TEMPSUM
  150 CONTINUE
      ntol=nobs
      odev=dev
      do 888 iter=0,ntol
      pre1=-(10.0D0**10.0D0)
  200 DO 202 I=1,N1
  202 OB(I)=BGUESS(I)
      DO 205 I=1,N2
  205 OA(I)=AGUESS(I)
      CALL ESTEP(N1,N2,BGUESS,AGUESS)
      CALL MSTEP1(N2,AGUESS,H,P,IH2,NEVALS,IFAIL,MON)
      CALL MSTEP2(N1,BGUESS,H,P,IH1,NEVALS,IFAIL,MON)
      SSR1=0.0D0
      SSR2=0.0D0
      DO 210 I=1,N1
      SSR1=SSR1+(OB(I)-BGUESS(I))**2.0D0
  210 CONTINUE
      DO 220 I=1,N2
  220 SSR2=(OA(I)-AGUESS(I))**2.0D0+SSR2
 1111 format(4x,2G16.8)
      do 230 i=1,n1
  230 tb(i)=bguess(i)
      do 240 i=1,n2
  240 tb(i+n1)=aguess(i)
      call llikely(n,tb,f)
      temp=-f-pre1
      if (temp.lt.tol1) go to 368
      pre1=-f
      if (iter.gt.0) go to 363
      f0=-f
      f=-f-obsinf
      write(9,1111) f,f0
      write(*,1111) f,f0
  363 IF ((SSR1.GT.TOL).OR.(SSR2.GT.TOL)) GO TO 200
  368 call newton(n,tb,h,f,ih3,nevals,ifail,mon)
      if (iter.eq.0) call flikely(n,tb,f,dres)
      if (iter.gt.0) call llikely(n,tb,f)
      do 375 i=1,n1
  375 bguess(i)=tb(i)
      do 376 i=1,n2
  376 aguess(i)=tb(i+n1)
      DEV=2*(DEV-(-F))
      if (iter.eq.0) tdev=dev
      if (iter.gt.0) go to 600
      f0=-f
      f=-f-obsinf
      write(9,1111) f,f0
      call estep(n1,n2,bguess,aguess)
      call gfit(n1,n2,bguess,aguess,XSQR,fit,RES)
      temp=dfloat(n1+n2)
      do 378 k=1,n1
      opar(k)=bguess(k)
      se(k)=(h(k,k)**(0.5D0))*temp
  378 continue
      do 379 k=1,n2
      opar(k+n1)=aguess(k)
      se(k+n1)=(h(k+n1,k+n1)**(0.5D0))*temp
  379 continue
      write(9,4444)
 4444 format(4x,'goodness of fit--XSQR, total deviance')
      write(9,7777) XSQR, DEV
      do 380 i=1,nobs
C     IF (OBS(I).EQ.0.0D0) GO TO 380
C     IF (OBS(I).EQ.TIME(I)) GO TO 380
      TEMP=tobs(i)-FIT(I,3)
      IF (TEMP.EQ.0.0D0) sign(I)=0.0D0
      IF (TEMP.NE.0.0D0) sign(i)=TEMP/(ABS(TEMP))
      DRES(I)=(2*(TDRES(I)-DRES(I)))**0.5D0
      DRES(I)=sign(i)*DRES(I)
  380 CONTINUE
      do 385 i=1,nobs
      write(8,398) (FIT(I,J),J=1,2+2*NSTAT+1)
      WRITE(2,398) DRES(i), (RES(I,J),J=1,1+NSTAT)
  385 continue
  398 format(6g18.7)
      do 500 i=1,nobs
      write(7,112) (z(i,j),j=1,nstat)
  500 continue
      WRITE(9,5555)
      WRITE(9,7777) (BGUESS(I),(h(i,i)**0.5D0),I=1,N1)
      write(*,7777) (bguess(i),i=1,n1)
      write(9,6666)
      write(9,7777) (AGUESS(I),(h(i+n1,i+n1)**0.5D0),I=1,N2)
      write(*,7777) (aguess(i),i=1,n2)
      go to 603
  600 res1(iter)=tdev-dev
      do 601 k=1,n1
      w(iter)=w(iter)+abs(bguess(k)-opar(k))/se(k)
      bguess(k)=opar(k)
  601 continue
      do 602 k=1,n2
      w(iter)=w(iter)+abs(aguess(k)-opar(k+n1))/se(k+n1)
      aguess(k)=opar(k+n1)
  602 continue
      if (iter.eq.ntol) go to 889
  603 nobs=ntol-1
      dev=odev-tdres(iter+1)
      if (iter.eq.0) go to 614
      do 610 k=1,iter
      obs(k)=tobs(k)
      time(k)=ttime(k)
      do 605 j=1,nx
  605 xm(k,j)=txm(k,j)
      do 606 j=1,nx1
  606 xm1(k,j)=txm1(k,j)
  610 continue
      if (iter.eq.nobs) go to 888
  614 do 620 k=iter+1,nobs
      obs(k)=tobs(k+1)
      time(k)=ttime(k+1)
      do 615 j=1,nx
  615 xm(k,j)=txm(k+1,j)
      do 616 j=1,nx1
  616 xm1(k,j)=txm1(k+1,j)
  620 continue
  888 continue
  889 do 900 i=1,ntol
      res1(i)=sign(i)*(res1(i)**(0.5D0))
      write(3,7778) dres(i),res(i,1),res1(i),w(i)
  900 continue
 5555 format(4x,'beta-vector')
 6666 format(4x,'alpha-vector')
 7777 format(4x,2g16.8)
 7778 format(4x,4g16.8)
 9999 STOP
      END
      SUBROUTINE FUNCT(N,B,P)
C  This subroutine computes the value of function Q1 in Chapter 3.
C  Data input: N = dimension of vector B;
C              B = beta vector;
C  output: P = the function value Q1(B).
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N
      DIMENSION B(N),BX(5)
      P=0.0D0
      DO 100 I=1,NOBS
      DO 8 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 6 M=1,NX
    6 BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
    8 CONTINUE
      BX(NSTAT)=0.0D0
      TEMPMAX=BX(1)
C
C  Loop 20 finds the largest BX(J), TEMPMAX.
C
      DO 20 J=2,NSTAT
      IF (BX(J).GT.BX(J-1)) TEMPMAX=BX(J)
   20 CONTINUE
      TEMPSUM=0.0D0
      DO 30 J=1,NSTAT
      P=P+Z(I,J)*BX(J)
      CALL AEXP((BX(J)-TEMPMAX),TEMP3)
      TEMPSUM=TEMPSUM+TEMP3
   30 CONTINUE
      P=P-TEMPMAX-DLOG(TEMPSUM)
  100 CONTINUE
      P=-P
      RETURN
      END
      SUBROUTINE GRAD(N,B,G)
C  This subroutine computes the first derivative of Q1 (see eqn 3.18).
C  Data input: N = dimension of vector B;
C              B = beta vector;
C  output: G = the derivative of Q1 at B.
C
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N
      DIMENSION G(25),B(N),TEMP(25),BX(5)
      DO 5 I=1,N
    5 G(I)=0.0D0
      DO 100 I=1,NOBS
      DO 20 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
   10 BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   20 CONTINUE
      BX(NSTAT)=0.0D0
      TEMPMAX=BX(1)
C
C  Loop 30 finds the largest BX(J), TEMPMAX.
C
      DO 30 J=2,NSTAT
      IF (BX(J).GT.BX(J-1)) TEMPMAX=BX(J)
   30 CONTINUE
      TEMPSUM=0.0D0
      DO 40 J=1,NSTAT
      BX(J)=BX(J)-TEMPMAX
      CALL AEXP(BX(J),TEMP(J))
      TEMPSUM=TEMPSUM+TEMP(J)
   40 CONTINUE
      DO 60 J=1,NSTAT-1
      TEMPPRO=TEMP(J)/TEMPSUM
      DO 50 M=1,NX
      G(M+(J-1)*NX)=G(M+(J-1)*NX)+XM(I,M)*(Z(I,J)-TEMPPRO)
   50 CONTINUE
   60 CONTINUE
  100 CONTINUE
      DO 200 I=1,N
  200 G(I)=-G(I)
      RETURN
      END
      SUBROUTINE MSTEP2(N,B,H,P0,IH,NEVALS,IFAIL,MON)
C
C  This subroutine is a quasi-Newton algorithm (Nash, 1990) which
C  maximizes the function Q1.
C  Data input: N = dimension of vector B; B = beta vector;
C              IH = dimension of the Hessian matrix;
C              NEVALS = # of evaluations for the function Q1;
C  output: H = the Hessian matrix; P0 = maximum value;
C          B = optimal values of beta vector.
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION B(N), H(IH,N)
      DIMENSION X(30), C(30), G(30), T(30)
      DOUBLE PRECISION K
      INTEGER COUNT
      DATA W,TOL/0.2,1.0D-4/,EPS/1.0D-6/
      IF (N.LT.0.OR.N.GT.23) GO TO 160
      IFN = N+1
      IG = 1
      RLIM=7.2*(10.0D0**74.0)
      CALL FUNCT(N,B,P0)
      IF (P0.GT.RLIM) GO TO 180
      CALL GRAD(N,B,G)
C
C  RESET HESSIAN
C
   10 DO 30 I = 1,N
      DO 20 J = 1,N
   20 H(I,J) = 0.0D0
   30 H(I,I) = 1.0D0
      ILAST = IG
C
C  TOP OF ITERATION
C
   40 DO 50 I = 1,N
      X(I) = B(I)
   50 C(I) = G(I)
C
C  FIND SEARCH DIRECTION T
C
      D1 = 0.0D0
      SN = 0.0D0
      DO 70 I = 1,N
      S = 0.0D0
      DO 60 J = 1,N
   60 S = S-H(I,J)*G(J)
      T(I) = S
      SN = SN+S*S
   70 D1 = D1-S*G(I)
C
C  CHECK IF DOWNHILL
C
      IF (D1.LE.0.0D0) GO TO 10
C
C  SEARCH ALONG T
C
      SN = 0.5D0/DSQRT(SN)
      K = DMIN1(1.0D0,SN)
   80 COUNT = 0
      DO 90 I = 1,N
      B(I) = X(I)+K*T(I)
      IF (DABS(B(I)-X(I)).LT.EPS) COUNT = COUNT+1
   90 CONTINUE
C
C  CHECK IF CONVERGED
C
      IF (COUNT.EQ.N) GO TO 150
      CALL FUNCT(N,B,P)
      IFN = IFN+1
      IF (IFN.GE.NEVALS) GO TO 170
      IF (P.LT.P0-D1*K*TOL) GO TO 100
      K = W*K
      GO TO 80
C
C  NEW LOWEST VALUE
C
  100 P0 = P
      IG = IG+1
      CALL GRAD(N,B,G)
      IFN = IFN+N
C
C  UPDATE HESSIAN
C
      D1 = 0.0D0
      DO 110 I =
     C  1,N
      T(I) = K*T(I)
      C(I) = G(I)-C(I)
  110 D1 = D1+T(I)*C(I)
C
C  CHECK IF +VE DEF ADDITION
C
      IF (D1.LE.0.0D0) GO TO 10
      D2 = 0.0D0
      DO 130 I = 1,N
      S = 0.0D0
      DO 120 J = 1,N
  120 S = S+H(I,J)*C(J)
      X(I) = S
  130 D2 = D2+S*C(I)
      D2 = 1+D2/D1
      DO 140 I = 1,N
      DO 140 J = 1,N
  140 H(I,J) = H(I,J)-(T(I)*X(J)+T(J)*X(I)-D2*T(I)*T(J))/D1
      GO TO 40
  150 IFAIL = 0
C  SUCCESSFUL CONCLUSION
      RETURN
  160 IFAIL = 1
C  N OUT OF RANGE
      RETURN
  170 IFAIL = 2
C  TOO MANY FUNCTION EVALUATIONS
      RETURN
  180 IFAIL = 3
C  INITIAL POINT INFEASIBLE
      RETURN
 2005 FORMAT( 2X,3G16.4)
      END
      SUBROUTINE AEXP(X,F)
C
C  This subroutine computes an exponential function value.
C  Data input: X = real number; output: F = exp(X).
C
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NSTEP
      TEMP1=ABS(X)
      IF (TEMP1.GT.79.9) GO TO 50
      F=DEXP(X)
      GO TO 200
   50 IF (X.LT.-79.9) GO TO 150
      IF (X.GT.150.0D0) X=150.0D0
      F=1.0D0+X
      NSTEP=1
      FACTOR=1.0D0
      TEMP1=DFLOAT(NSTEP)
      TEMP2=X/TEMP1
  100 IF (TEMP2.LT.1.0D0) GO TO 200
      NSTEP=NSTEP+1
      TEMP1=DFLOAT(NSTEP)
      FACTOR=X/TEMP1
      TEMP2=TEMP2*FACTOR
      F=F+TEMP2
      GO TO 100
  150 F=0.0D0
  200 RETURN
      END
      SUBROUTINE FUNCT1(N,B,P)
C  This subroutine computes the value of function Q2 in Chapter 3.
C  Data input: N = dimension of vector B;
C              B = alpha vector;
C  output: P = the function value Q2(B).
C
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N
      DIMENSION B(N),BX(5)
      P=0.0D0
      TEMP1=1.0D0
      DO 100 I=1,NOBS
C
C  Loop 8 computes BX(J) for variable coefficient part.
C
      DO 8 J=1,NSTAT
      BX(J)=0.0D0
      DO 6 M=1,NVAR
    6 BX(J)=BX(J)+XM1(I,M)*B(M+(J-1)*NVAR)
    8 CONTINUE
      IF (NCOM.EQ.0) GO TO 11
      DO 10 J=1,NSTAT
      DO 9 M=1,NCOM
      BX(J)=BX(J)+XM1(I,NVAR+M)*B(NVAR*NSTAT+M)
    9 CONTINUE
   10 CONTINUE
   11 CONTINUE
      DO 20 J=1,NSTAT
      IF (BX(J).LT.0.0D0) GO TO 15
      CALL AEXP(-BX(J),TEMP)
      P=P+Z(I,J)*((OBS(I)-TIME(I))*BX(J)
     C  -TIME(I)*DLOG(TEMP1+TEMP))
      GO TO 20
   15 CALL AEXP(BX(J),TEMP)
      P=P+Z(I,J)*(OBS(I)*BX(J)-TIME(I)*DLOG(TEMP1+TEMP))
   20 CONTINUE
  100 CONTINUE
      P=-P
      RETURN
      END
      SUBROUTINE GRAD1(N,B,G)
C  This subroutine computes the first derivative of Q2 (see eqn 3.19).
C  Data input: N = dimension of vector B;
C              B = alpha vector;
C  output: G = the derivative of Q2 at B.
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N
      DIMENSION G(30),B(N),BX(5)
      TEMP1=1.0D0
      DO 5 I=1,N
    5 G(I)=0.0D0
      DO 100 I=1,NOBS
C
C  Loop 20 computes BX(J) for variable coefficient part.
C
      DO 20 J=1,NSTAT
      BX(J)=0.0D0
      DO 10 M=1,NVAR
   10 BX(J)=BX(J)+XM1(I,M)*B(M+(J-1)*NVAR)
   20 CONTINUE
      IF (NCOM.EQ.0) GO TO 25
      DO 24 J=1,NSTAT
      DO 22 M=1,NCOM
      BX(J)=BX(J)+XM1(I,NVAR+M)*B(NSTAT*NVAR+M)
   22 CONTINUE
   24 CONTINUE
   25 CONTINUE
C
C  Loop 40 computes the gradient.
C
      DO 40 J=1,NSTAT
      IF (BX(J).GT.0.0D0) GO TO 35
      CALL AEXP(BX(J),TEMP)
      DO 30 M=1,NVAR
      G(M+(J-1)*NVAR)=G(M+(J-1)*NVAR)
     C  +Z(I,J)*XM1(I,M)*(OBS(I)-TIME(I)*TEMP/(TEMP1+TEMP))
   30 CONTINUE
      GO TO 40
   35 CALL AEXP(-BX(J),TEMP)
      DO 36 M=1,NVAR
      G(M+(J-1)*NVAR)=G(M+(J-1)*NVAR)
     C  +Z(I,J)*XM1(I,M)*(OBS(I)-TIME(I)/(TEMP1+TEMP))
   36 CONTINUE
   40 CONTINUE
      IF (NCOM.EQ.0) GO TO 81
      DO 80 M=1,NCOM
      DO 60 J=1,NSTAT
      IF (BX(J).GT.0.0D0) GO TO 55
      CALL AEXP(BX(J),TEMP)
      G(M+NVAR*NSTAT)=G(M+NVAR*NSTAT)
     C  +Z(I,J)*XM1(I,M+NVAR)*(OBS(I)-TIME(I)*TEMP/(TEMP1+TEMP))
      GO TO 60
   55 CALL AEXP(-BX(J),TEMP)
      G(M+NVAR*NSTAT)=G(M+NVAR*NSTAT)
     C  +Z(I,J)*XM1(I,M+NVAR)*(OBS(I)-TIME(I)/(TEMP1+TEMP))
   60 CONTINUE
   80 CONTINUE
   81 CONTINUE
  100 CONTINUE
      DO 200 I=1,N
  200 G(I)=-G(I)
      RETURN
      END
      SUBROUTINE MSTEP1(N,B,H,P0,IH,NEVALS,IFAIL,MON)
C
C  This subroutine is a quasi-Newton algorithm (Nash, 1990) which
C  maximizes the function Q2.
C  Data input: N = dimension of vector B; B = alpha vector;
C              IH = dimension of the corresponding Hessian matrix;
C              NEVALS = # of evaluations for the function Q2;
C  output: H = the Hessian matrix; P0 = maximum value;
C          B = optimal values of alpha vector.
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION B(N), H(IH,N)
      DIMENSION X(30), C(30), G(30), T(30)
      DOUBLE PRECISION K
      INTEGER COUNT
      DATA W,TOL/0.2,1.0D-4/,EPS/1.0D-6/
      IF (N.LT.0.OR.N.GT.23) GO TO 160
      IFN = N+1
      IG = 1
      RLIM=7.2*(10.0D0**74.0)
      CALL FUNCT1(N,B,P0)
      IF (P0.GT.RLIM) GO TO 180
      CALL GRAD1(N,B,G)
C
C  RESET HESSIAN
C
   10 DO 30 I = 1,N
      DO 20 J = 1,N
   20 H(I,J) = 0.0D0
   30 H(I,I) = 1.0D0
      ILAST = IG
C
C  TOP OF ITERATION
C
   40 DO 50 I = 1,N
      X(I) = B(I)
   50 C(I) = G(I)
C
C  FIND SEARCH DIRECTION T
C
      D1 = 0.0D0
      SN = 0.0D0
      DO 70 I = 1,N
      S = 0.0D0
      DO 60 J = 1,N
   60 S = S-H(I,J)*G(J)
      T(I) = S
      SN = SN+S*S
   70 D1 = D1-S*G(I)
C
C  CHECK IF DOWNHILL
C
      IF (D1.LE.0.0D0) GO TO 10
C
C  SEARCH ALONG T
C
      SN = 0.5D0/DSQRT(SN)
      K = DMIN1(1.0D0,SN)
   80 COUNT = 0
      DO 90 I = 1,N
      B(I) = X(I)+K*T(I)
      IF (DABS(B(I)-X(I)).LT.EPS) COUNT = COUNT+1
   90 CONTINUE
C
C  CHECK IF CONVERGED
C
      IF (COUNT.EQ.N) GO TO 150
      CALL FUNCT1(N,B,P)
      IFN = IFN+1
      IF (IFN.GE.NEVALS) GO TO 170
      IF (P.LT.P0-D1*K*TOL) GO TO 100
      K = W*K
      GO TO 80
C
C  NEW LOWEST VALUE
C
  100 P0 = P
      IG = IG+1
      CALL GRAD1(N,B,G)
      IFN = IFN+N
C
C  UPDATE HESSIAN
C
      D1 = 0.0D0
      DO 110 I = 1,N
      T(I) = K*T(I)
      C(I) = G(I)-C(I)
  110 D1 = D1+T(I)*C(I)
C
C  CHECK IF +VE DEF ADDITION
C
      IF (D1.LE.0.0D0) GO TO 10
      D2 = 0.0D0
      DO 130 I = 1,N
      S = 0.0D0
      DO 120 J = 1,N
  120 S = S+H(I,J)*C(J)
      X(I) = S
  130 D2 = D2+S*C(I)
      D2 = 1+D2/D1
      DO 140 I = 1,N
      DO 140 J = 1,N
  140 H(I,J) = H(I,J)-(T(I)*X(J)+T(J)*X(I)-D2*T(I)*T(J))/D1
      GO TO 40
  150 IFAIL = 0
C  SUCCESSFUL CONCLUSION
      RETURN
  160 IFAIL = 1
C  N OUT OF RANGE
      RETURN
  170 IFAIL = 2
C  TOO MANY FUNCTION EVALUATIONS
      RETURN
  180 IFAIL = 3
C  INITIAL POINT INFEASIBLE
      RETURN
 2005 FORMAT( 2X,3G16.4)
      END
      SUBROUTINE ESTEP(N1,N2,B,B1)
C
C  This subroutine executes the E-step of the EM algorithm.
C  Data input: N1 = dimension of vector B;
C              N2 = dimension of vector B1;
C              B = beta vector;
C              B1 = alpha vector.
C  Output: updated posterior probabilities, Z(I,J).
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N1,N2
      DIMENSION B(N1),B1(N2),TEMP(5),BX(5),BX1(5)
      SMALL0=10000000000.0D0
      SMALL0=1.0D0/SMALL0
      ONE=1.0D0
      DO 100 I=1,NOBS
C
C  Loop 12 computes BX(J).
C
      DO 12 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
      BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   10 CONTINUE
   12 CONTINUE
      BX(NSTAT)=0.0D0
C
C  Loop 21 computes BX1(J) for variable coefficient part.
C
      DO 21 J=1,NSTAT
      BX1(J)=0.0D0
      DO 20 M=1,NVAR
   20 BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NVAR)
   21 CONTINUE
      IF (NCOM.EQ.0) GO TO 24
      DO 23 J=1,NSTAT
      DO 22 M=1,NCOM
      BX1(J)=BX1(J)+XM1(I,M+NVAR)*B1(NSTAT*NVAR+M)
   22 CONTINUE
   23 CONTINUE
   24 CONTINUE
C
C  Loop 30 finds the largest item in exponential functions.
C
      IF (BX1(1).GT.0.0D0) GO TO 25
      CALL AEXP(BX1(1),TEMP1)
      TEMP(1)=BX(1)+OBS(I)*BX1(1)-TIME(I)*DLOG(ONE+TEMP1)
      GO TO 26
   25 CALL AEXP(-BX1(1),TEMP1)
      TEMP(1)=BX(1)+(OBS(I)-TIME(I))*BX1(1)-TIME(I)*DLOG(ONE+TEMP1)
   26 TEMPMAX=TEMP(1)
      DO 34 J=2,NSTAT
      IF (BX1(J).GT.0.0D0) GO TO 32
      CALL AEXP(BX1(J),TEMP1)
      TEMP(J)=BX(J)+OBS(I)*BX1(J)-TIME(I)*DLOG(ONE+TEMP1)
      GO TO 33
   32 CALL AEXP(-BX1(J),TEMP1)
      TEMP(J)=BX(J)+(OBS(I)-TIME(I))*BX1(J)-TIME(I)*DLOG(ONE+TEMP1)
   33 IF (TEMP(J).GT.TEMP(J-1)) TEMPMAX=TEMP(J)
   34 CONTINUE
C
C  Loops 40 and 50 compute Z(I,J) values.
C
      TEMPSUM=0.0D0
      DO 40 J=1,NSTAT
      TEMPP=TEMP(J)-TEMPMAX
      CALL AEXP(TEMPP,TEMP(J))
      TEMPSUM=TEMPSUM+TEMP(J)
   40 CONTINUE
      DO 50 J=1,NSTAT
      Z(I,J)=TEMP(J)/TEMPSUM
      IF (Z(I,J).LT.SMALL0) Z(I,J)=0.0D0
   50 CONTINUE
  100 CONTINUE
      RETURN
      END
      SUBROUTINE GFIT(N1,N2,B,B1,XSQR,FIT,RES)
C  This subroutine computes Pearson statistic, fitted values, Pearson
C  residuals.
C  Data input: N1 = dimension of vector B;
C              N2 = dimension of vector B1;
C              B = beta vector; B1 = alpha vector;
C  Output: XSQR = Pearson statistic;
C          FIT = fitted values including for each component;
C          RES = Pearson residuals including for each component;
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N1,N2
      DIMENSION B(N1),B1(N2),BX(5),BX1(5),TEMP(5)
      DIMENSION FIT(1000,13),RES(1000,8)
      ONE=1.0D0
      XSQR=0.0D0
      DO 100 I=1,NOBS
      FIT(I,1+1)=OBS(I)
      FIT(I,2+1)=0.0D0
C
C  Loop 12 computes BX(J).
C
      DO 12 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
      BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   10 CONTINUE
   12 CONTINUE
      BX(NSTAT)=0.0D0
      TEMP1=BX(1)
      DO 14 J=2,NSTAT
      IF (BX(J).GT.BX(J-1)) TEMP1=BX(J)
   14 CONTINUE
      TEMPD=0.0D0
      DO 16 J=1,NSTAT
      CALL AEXP((BX(J)-TEMP1),TEMP11)
      BX(J)=TEMP11
      TEMPD=TEMPD+TEMP11
   16 CONTINUE
      DO 18 J=1,NSTAT
      FIT(I,2+NSTAT+J+1)=BX(J)/TEMPD
   18 CONTINUE
C
C  Loop 21 computes BX1(J) for variable coefficient part.
      DO 21 J=1,NSTAT
      BX1(J)=0.0D0
      DO 20 M=1,NVAR
   20 BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NVAR)
   21 CONTINUE
      IF (NCOM.EQ.0) GO TO 24
      DO 23 J=1,NSTAT
      DO 22 M=1,NCOM
      BX1(J)=BX1(J)+XM1(I,M+NVAR)*B1(NSTAT*NVAR+M)
   22 CONTINUE
   23 CONTINUE
   24 CONTINUE
C
      E1=0.0D0
      DO 40 J=1,NSTAT
      IF (BX1(J).LT.0.0D0) GO TO 35
      CALL AEXP(-BX1(J),TEMPP)
      TEMP(J)=ONE/(ONE+TEMPP)
      TEMP(J)=TIME(I)*TEMP(J)*(ONE-TEMP(J))
      E1=E1+FIT(I,2+NSTAT+J+1)*TEMP(J)
      FIT(I,2+J+1)=TIME(I)/(ONE+TEMPP)
      GO TO 40
   35 CALL AEXP(BX1(J),TEMPP)
      TEMP(J)=TEMPP/(ONE+TEMPP)
      TEMP(J)=TIME(I)*TEMP(J)*(ONE-TEMP(J))
      E1=E1+FIT(I,2+NSTAT+J+1)*TEMP(J)
      FIT(I,2+J+1)=TIME(I)*TEMPP/(ONE+TEMPP)
   40 CONTINUE
      DO 42 J=1,NSTAT
      FIT(I,2+1)=FIT(I,2+1)+FIT(I,2+NSTAT+J+1)*FIT(I,2+J+1)
      RES(I,1+J)=(FIT(I,1+1)-FIT(I,2+J+1))*(TEMP(J)**(-0.5D0))
   42 CONTINUE
      E2=0.0D0
      DO 50 J=1,NSTAT
      E2=E2+FIT(I,2+NSTAT+J+1)*(FIT(I,2+J+1)**(2.0D0))
   50 CONTINUE
      E2=E1+E2-(FIT(I,2+1)**(2.0D0))
      FIT(I,1)=E2
      RES(I,1)=(FIT(I,1+1)-FIT(I,2+1))*(E2**(-0.5D0))
      XSQR=XSQR+(RES(I,1)**(2.0D0))
  100 CONTINUE
      RETURN
      END
      SUBROUTINE GLIKELY(N,TB,G)
C
C This subroutine computes the first derivative of the observed log
C likelihood function.
C Data input: N = dimension of vector B;
C             B = vector combining beta and alpha vectors;
C Output: G = the derivative of the function at B.
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N,N1,N2
      DIMENSION TB(N),B(25),B1(25),BX(5),BX1(5),TEMP(5),COM(5)
     +,G(25),P(5)
      DO 1 I=1,N
    1 G(I)=0.0D0
      N1=(NSTAT-1)*NX
      N2=NSTAT*NVAR+NCOM
      DO 2 I=1,N1
    2 B(I)=TB(I)
      DO 3 I=1,N2
    3 B1(I)=TB(N1+I)
      ONE=1.0D0
      DO 100 I=1,NOBS
C
C Loop 20 computes BX(J).
C
      DO 20 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
   10 BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   20 CONTINUE
      BX(NSTAT)=0.0D0
      DO 22 J=1,NSTAT
      BX1(J)=0.0D0
C
C Loop 21 computes BX1(J) for variable coefficient part.
C
      DO 21 M=1,NVAR
      BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NVAR)
   21 CONTINUE
   22 CONTINUE
      IF (NCOM.EQ.0) GO TO 25
      DO 24 J=1,NSTAT
      DO 23 M=1,NCOM
      BX1(J)=BX1(J)+XM1(I,M+NVAR)*B1(M+NSTAT*NVAR)
   23 CONTINUE
   24 CONTINUE
   25 CONTINUE
C PMAX holds the running maximum of BX(J).
      PMAX=BX(1)
      DO 26 J=2,NSTAT
      IF (BX(J).GT.PMAX) PMAX=BX(J)
   26 CONTINUE
C Shift by PMAX before exponentiating; the normalised P(J) are
C unaffected by the shift.
      PSUM=0.0D0
      DO 28 J=1,NSTAT
      CALL AEXP(BX(J)-PMAX,P(J))
      PSUM=PSUM+P(J)
   28 CONTINUE
C CALCULATE MIXING PROBABILITIES Pj
      DO 29 J=1,NSTAT
   29 P(J)=P(J)/PSUM
C
C CALCULATE BINOMIAL PARAMETERS THETAj
      DO 40 J=1,NSTAT
      IF (BX1(J).LT.0.0D0) GO TO 35
      CALL AEXP(-BX1(J),TEMPP)
      TEMP(J)=BX(J)+(OBS(I)-TIME(I))*BX1(J)-TIME(I)*DLOG(ONE+TEMPP)
      BX1(J)=1.0D0/(1.0D0+TEMPP)
      GO TO 40
   35 CALL AEXP(BX1(J),TEMPP)
      TEMP(J)=BX(J)+OBS(I)*BX1(J)-TIME(I)*DLOG(ONE+TEMPP)
      BX1(J)=TEMPP/(1.0D0+TEMPP)
   40 CONTINUE
C TEMPMAX holds the running maximum of TEMP(J).
      TEMPMAX=TEMP(1)
      DO 45 J=2,NSTAT
      IF (TEMP(J).GT.TEMPMAX) TEMPMAX=TEMP(J)
   45 CONTINUE
      TEMPSUM=0.0D0
      DO 48 J=1,NSTAT
      TEMPP=TEMP(J)-TEMPMAX
      CALL AEXP(TEMPP,COM(J))
      TEMPSUM=TEMPSUM+COM(J)
   48 CONTINUE
      DO 50 J=1,NSTAT
      COM(J)=COM(J)/TEMPSUM
   50 CONTINUE
      DO 70 J=1,NSTAT-1
      TEMPP=COM(J)-P(J)
      DO 65 M=1,NX
      G(M+NX*(J-1))=G(M+NX*(J-1))+XM(I,M)*TEMPP
   65 CONTINUE
   70 CONTINUE
      TEMPSUM=0.0D0
      DO 80 J=1,NSTAT
      TEMPP=COM(J)*(OBS(I)-TIME(I)*BX1(J))
      TEMPSUM=TEMPSUM+TEMPP
      DO 75 M=1,NVAR
      G(M+N1+NVAR*(J-1))=G(M+N1+NVAR*(J-1))+TEMPP*XM1(I,M)
   75 CONTINUE
   80 CONTINUE
      IF (NCOM.EQ.0) GO TO 100
      DO 90 M=1,NCOM
      G(M+N1+NVAR*NSTAT)=G(M+N1+NVAR*NSTAT)+
     +TEMPSUM*XM1(I,M+NVAR)
   90 CONTINUE
  100 CONTINUE
      DO 200 I=1,N
  200 G(I)=-G(I)
      RETURN
      END
      SUBROUTINE LLIKELY(N,TB,F)
C
C This subroutine computes the observed log likelihood value.
C Data input: N = total dimension of vector BT;
C             TB = vector combining beta and alpha vectors.
C Output: F = the observed log likelihood value at BT.
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N1,N2
      DIMENSION TB(N),B(25),B1(25),BX(5),BX1(5),TEMP(5)
      N1=(NSTAT-1)*NX
      N2=NSTAT*NVAR+NCOM
      DO 1 I=1,N1
    1 B(I)=TB(I)
      DO 2 I=1,N2
    2 B1(I)=TB(N1+I)
      F=0.0D0
      ONE=1.0D0
      DO 100 I=1,NOBS
C
C Loop 20 computes BX(J).
C
      DO 20 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
   10 BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   20 CONTINUE
      BX(NSTAT)=0.0D0
      DO 22 J=1,NSTAT
      BX1(J)=0.0D0
C
C Loop 21 computes BX1(J) for variable coefficient part.
C
      DO 21 M=1,NVAR
      BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NVAR)
   21 CONTINUE
   22 CONTINUE
      IF (NCOM.EQ.0) GO TO 25
      DO 24 J=1,NSTAT
      DO 23 M=1,NCOM
      BX1(J)=BX1(J)+XM1(I,M+NVAR)*B1(M+NSTAT*NVAR)
   23 CONTINUE
   24 CONTINUE
   25 CONTINUE
C PMAX holds the running maximum of BX(J).
      PMAX=BX(1)
      DO 26 J=2,NSTAT
      IF (BX(J).GT.PMAX) PMAX=BX(J)
   26 CONTINUE
      PSUM=0.0D0
      DO 28 J=1,NSTAT
      BX(J)=BX(J)-PMAX
      CALL AEXP(BX(J),TEMPP)
      PSUM=PSUM+TEMPP
   28 CONTINUE
      F=F-DLOG(PSUM)
C
      DO 40 J=1,NSTAT
      IF (BX1(J).LT.0.0D0) GO TO 35
      CALL AEXP(-BX1(J),TEMPP)
      TEMP(J)=BX(J)+(OBS(I)-TIME(I))*BX1(J)-TIME(I)*DLOG(ONE+TEMPP)
      GO TO 40
   35 CALL AEXP(BX1(J),TEMPP)
      TEMP(J)=BX(J)+OBS(I)*BX1(J)-TIME(I)*DLOG(ONE+TEMPP)
   40 CONTINUE
C TEMPMAX holds the running maximum of TEMP(J).
      TEMPMAX=TEMP(1)
      DO 45 J=2,NSTAT
      IF (TEMP(J).GT.TEMPMAX) TEMPMAX=TEMP(J)
   45 CONTINUE
      TEMPSUM=0.0D0
      DO 48 J=1,NSTAT
      TEMPP=TEMP(J)-TEMPMAX
      CALL AEXP(TEMPP,TEMP2)
      TEMPSUM=TEMPSUM+TEMP2
   48 CONTINUE
      F=F+TEMPMAX+DLOG(TEMPSUM)
  100 CONTINUE
C Return the negative log likelihood, which QNEWTON minimizes.
      F=-F
      RETURN
      END
      SUBROUTINE QNEWTON(N,B,H,P0,NEVALS,IFAIL,MON)
C
C This subroutine is a quasi-Newton algorithm (Nash, 1990) which
C maximizes the observed log likelihood function by minimizing its
C negative, as returned by LLIKELY.
C Data input: N = dimension of vector B;
C             B = vector combining beta and alpha vectors;
C             IH = dimension of the corresponding Hessian matrix;
C             NEVALS = maximum number of evaluations of the
C                      observed log likelihood function;
C Output: H = the Hessian matrix; P0 = maximum value;
C         B = optimal values of the combined beta and alpha vector.
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION B(N), H(30,30)
      DIMENSION X(30), C(30), G(30), T(30)
      DOUBLE PRECISION K
      INTEGER COUNT,IH,N
      DATA W,TOL/0.2D0,1.0D-4/,EPS/1.0D-6/
      IH=N
      IF (N.LT.0.OR.N.GT.23) GO TO 160
      IFN = N+1
      IG = 1
C RLIM is the largest function value accepted as feasible.
      RLIM=7.2D0*(10.0D0**74.0D0)
      CALL LLIKELY(N,B,P0)
      IF (P0.GT.RLIM) GO TO 180
      CALL GLIKELY(N,B,G)
C
C RESET HESSIAN
C
   10 DO 30 I = 1,N
      DO 20 J = 1,N
   20 H(I,J) = 0.0D0
   30 H(I,I) = 1.0D0
      LAST = IG
C
C TOP OF ITERATION
C
   40 DO 50 I = 1,N
      X(I) = B(I)
   50 C(I) = G(I)
C
C FIND SEARCH DIRECTION T
C
      D1 = 0.0D0
      SN = 0.0D0
      DO 70 I = 1,N
      S = 0.0D0
      DO 60 J = 1,N
   60 S = S-H(I,J)*G(J)
      T(I) = S
      SN = SN+S*S
   70 D1 = D1-S*G(I)
C
C CHECK IF DOWNHILL
C
      IF (D1.LE.0.0D0) GO TO 10
C
C SEARCH ALONG T
C
      SN = 0.5D0/DSQRT(SN)
      K = DMIN1(1.0D0,SN)
   80 COUNT = 0
      DO 90 I = 1,N
      B(I) = X(I)+K*T(I)
      IF (DABS(B(I)-X(I)).LT.EPS) COUNT = COUNT+1
   90 CONTINUE
C
C CHECK IF CONVERGED
C
      IF (COUNT.EQ.N) GO TO 150
      CALL LLIKELY(N,B,P)
      IFN = IFN+1
      IF (IFN.GE.NEVALS) GO TO 170
      IF (P.LT.P0-D1*K*TOL) GO TO 100
      K = W*K
      GO TO 80
C
C NEW LOWEST VALUE
C
  100 P0 = P
      IG = IG+1
      CALL GLIKELY(N,B,G)
      IFN = IFN+N
C
C UPDATE HESSIAN
C
      D1 = 0.0D0
      DO 110 I = 1,N
      T(I) = K*T(I)
      C(I) = G(I)-C(I)
  110 D1 = D1+T(I)*C(I)
C
C CHECK IF +VE DEF ADDITION
C
      IF (D1.LE.0.0D0) GO TO 10
      D2 = 0.0D0
      DO 130 I = 1,N
      S = 0.0D0
      DO 120 J = 1,N
  120 S = S+H(I,J)*C(J)
      X(I) = S
  130 D2 = D2+S*C(I)
      D2 = 1.0D0+D2/D1
      DO 140 I = 1,N
      DO 140 J = 1,N
  140 H(I,J) = H(I,J)-(T(I)*X(J)+T(J)*X(I)-D2*T(I)*T(J))/D1
      GO TO 40
  150 IFAIL = 0
C SUCCESSFUL CONCLUSION
      RETURN
  160 IFAIL = 1
C N OUT OF RANGE
      RETURN
  170 IFAIL = 2
C
C TOO MANY FUNCTION EVALUATIONS
      RETURN
  180 IFAIL = 3
C INITIAL POINT INFEASIBLE
      RETURN
 2005 FORMAT( 2X,3G16.4)
      END
      SUBROUTINE FLIKELY(N,TB,F,DRES)
C
C This subroutine computes the deviance residuals.
C Data input: N = dimension of vector BT;
C             TB = vector combining beta and alpha vectors;
C Output: DRES = deviance residuals;
C         F = the observed log likelihood function value at BT.
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N1,N2
      DIMENSION TB(N),B(25),B1(25),BX(5),BX1(5),TEMP(5),DRES(1000)
      N1=(NSTAT-1)*NX
      N2=NSTAT*NVAR+NCOM
      DO 1 I=1,N1
    1 B(I)=TB(I)
      DO 2 I=1,N2
    2 B1(I)=TB(N1+I)
      F=0.0D0
      ONE=1.0D0
      DO 100 I=1,NOBS
C
C Loop 20 computes BX(J).
C
      DO 20 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
   10 BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   20 CONTINUE
      BX(NSTAT)=0.0D0
      DO 22 J=1,NSTAT
      BX1(J)=0.0D0
C
C Loop 21 computes BX1(J) for variable coefficient part.
C
      DO 21 M=1,NVAR
      BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NVAR)
   21 CONTINUE
   22 CONTINUE
      IF (NCOM.EQ.0) GO TO 25
      DO 24 J=1,NSTAT
      DO 23 M=1,NCOM
      BX1(J)=BX1(J)+XM1(I,M+NVAR)*B1(M+NSTAT*NVAR)
   23 CONTINUE
   24 CONTINUE
   25 CONTINUE
C PMAX holds the running maximum of BX(J).
      PMAX=BX(1)
      DO 26 J=2,NSTAT
      IF (BX(J).GT.PMAX) PMAX=BX(J)
   26 CONTINUE
      PSUM=0.0D0
      DO 28 J=1,NSTAT
      BX(J)=BX(J)-PMAX
      CALL AEXP(BX(J),TEMPP)
      PSUM=PSUM+TEMPP
   28 CONTINUE
      DRES(I)=-DLOG(PSUM)
      DO 40 J=1,NSTAT
      IF (BX1(J).LT.0.0D0) GO TO 35
      CALL AEXP(-BX1(J),TEMPP)
      TEMP(J)=BX(J)+(OBS(I)-TIME(I))*BX1(J)-TIME(I)*DLOG(ONE+TEMPP)
      GO TO 40
   35 CALL AEXP(BX1(J),TEMPP)
      TEMP(J)=BX(J)+OBS(I)*BX1(J)-TIME(I)*DLOG(ONE+TEMPP)
   40 CONTINUE
C TEMPMAX holds the running maximum of TEMP(J).
      TEMPMAX=TEMP(1)
      DO 45 J=2,NSTAT
      IF (TEMP(J).GT.TEMPMAX) TEMPMAX=TEMP(J)
   45 CONTINUE
      TEMPSUM=0.0D0
      DO 48 J=1,NSTAT
      TEMPP=TEMP(J)-TEMPMAX
      CALL AEXP(TEMPP,TEMP2)
      TEMPSUM=TEMPSUM+TEMP2
   48 CONTINUE
      DRES(I)=DRES(I)+TEMPMAX+DLOG(TEMPSUM)
      F=F+DRES(I)
  100 CONTINUE
      F=-F
      RETURN
      END
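C ----------------------------------------------------------------
C Illustrative sketch only, not part of the original listing.
C It isolates the shifted-exponential (log-sum-exp) device that the
C routines above use when forming the posterior probabilities
C Z(I,J) and the observed log likelihood: subtract the largest
C exponent, exponentiate, normalise, then add the shift back.
C Assumptions: SAFEXP is a hypothetical stand-in for the AEXP
C routine called above (its source is not shown in this part of
C the listing and may behave differently); the cutoff XLIM, the
C program name LSEDEM and the two input values are chosen purely
C for demonstration.
C ----------------------------------------------------------------
      PROGRAM LSEDEM
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      DIMENSION TEMP(2),POST(2)
C Log of (mixing weight times component likelihood), assumed values.
      NSTAT=2
      TEMP(1)=-1000.0D0
      TEMP(2)=-1001.0D0
C Running maximum of TEMP(J).
      TMAX=TEMP(1)
      DO 10 J=2,NSTAT
      IF (TEMP(J).GT.TMAX) TMAX=TEMP(J)
   10 CONTINUE
C Exponentiate after subtracting the maximum, then normalise.
      TSUM=0.0D0
      DO 20 J=1,NSTAT
      CALL SAFEXP(TEMP(J)-TMAX,POST(J))
      TSUM=TSUM+POST(J)
   20 CONTINUE
      DO 30 J=1,NSTAT
      POST(J)=POST(J)/TSUM
   30 CONTINUE
C Log likelihood contribution of this observation.
      FLOG=TMAX+DLOG(TSUM)
      WRITE(*,*) 'POSTERIOR: ',POST(1),POST(2)
      WRITE(*,*) 'LOG LIK:   ',FLOG
      STOP
      END
      SUBROUTINE SAFEXP(X,Y)
C Returns Y=EXP(X) with X clipped to [-XLIM,XLIM], so that DEXP
C neither overflows nor underflows (assumed behaviour).
      DOUBLE PRECISION X,Y,XLIM,XC
      PARAMETER (XLIM=700.0D0)
      XC=DMAX1(-XLIM,DMIN1(XLIM,X))
      Y=DEXP(XC)
      RETURN
      END
C Without the shift, DEXP(-1000.0D0) would underflow to zero for
C both components and the normalisation would divide by zero.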
Mixed regression models for discrete data Wang, Peiming 1994
Item Metadata
Title | Mixed regression models for discrete data |
Creator | Wang, Peiming |
Date Issued | 1994 |
Description | The dissertation consists of two parts. In the first part we introduce and investigate a class of mixed Poisson regression models that include covariates in both mixing probabilities and Poisson rates. The proposed models generalize the usual Poisson regression in several ways, and can be used to adjust for extra-Poisson variation. The features of the models, identifiability, estimation methods based on the EM and quasi-Newton algorithms, properties of these estimates, model selection criteria and residual analysis are discussed. A Monte Carlo study investigates implementation and model choice issues. Several applications of this approach are analyzed. This analysis is compared to quasi-likelihood approaches. In the second part we introduce and investigate a class of mixed logistic regression models that include covariates in both mixing probabilities and binomial parameters with the logit link. The proposed models generalize the usual logistic regression in several ways, and can be used to adjust for extra-binomial variation. The features of the models, identifiability, estimation methods based on the EM and quasi-Newton algorithms, properties of these estimates, model selection criteria and residual analysis are discussed. A Monte Carlo study investigates implementation and model choice issues. An application of this approach is analyzed and the results are compared to those from quasi-likelihood approaches. The dissertation also discusses future research in the areas and provides FORTRAN codes for all computations required to apply the models. |
Extent | 5609440 bytes |
Genre | Thesis/Dissertation |
Type | Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-04-15 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0088106 |
URI | http://hdl.handle.net/2429/7176 |
Degree | Doctor of Philosophy - PhD |
Program | Business Administration |
Affiliation | Business, Sauder School of |
Degree Grantor | University of British Columbia |
Graduation Date | 1994-11 |
Campus | UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |