MIXED REGRESSION MODELS FOR DISCRETE DATA

By
Peiming Wang

B. Sc. (Mathematics) Shanghai Second Polytechnic University, 1983
M. Sc. (Engineering) Shanghai Institute of Mechanical Engineering, 1988
M. A. (Statistics) York University, 1990

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE STUDIES
FACULTY OF COMMERCE AND BUSINESS ADMINISTRATION

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
August, 1994
© Peiming Wang, 1994

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

(Signature)
Department of ______________________
The University of British Columbia
Vancouver, Canada
Date

Abstract

The dissertation consists of two parts. In the first part we introduce and investigate a class of mixed Poisson regression models that include covariates in both mixing probabilities and Poisson rates. The proposed models generalize the usual Poisson regression in several ways, and can be used to adjust for extra-Poisson variation. The features of the models, identifiability, estimation methods based on the EM and quasi-Newton algorithms, properties of these estimates, model selection criteria and residual analysis are discussed. A Monte Carlo study investigates implementation and model choice issues. Several applications of this approach are analyzed.
These analyses are compared with quasi-likelihood approaches.

In the second part we introduce and investigate a class of mixed logistic regression models that include covariates in both mixing probabilities and binomial parameters with the logit link. The proposed models generalize the usual logistic regression in several ways, and can be used to adjust for extra-binomial variation. The features of the models, identifiability, estimation methods based on the EM and quasi-Newton algorithms, properties of these estimates, model selection criteria and residual analysis are discussed. A Monte Carlo study investigates implementation and model choice issues. An application of this approach is analyzed and the results are compared to those from quasi-likelihood approaches.

The dissertation also discusses future research in these areas and provides FORTRAN code for all computations required to apply the models.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgement
Dedication
1 Introduction
2 Mixed Poisson Regression Models
  2.1 Poisson regression and its modifications
  2.2 Implications of Overdispersion
  2.3 Tests for Extra-Poisson Variation
  2.4 Mixed Poisson Regression Models
    2.4.1 The Model
    2.4.2 Identifiability
  2.5 Parameter Estimation for the mixed Poisson regression models
    2.5.1 EM and Quasi-Newton Algorithms
    2.5.2 Starting Values
  2.6 A Monte Carlo Study
    2.6.1 Performance of the Estimation Algorithm
    2.6.2 The mixed Poisson regression Models For Some Typical Problems
  2.7 Implementation Issues
    2.7.1 Model Selection
    2.7.2 Classification
    2.7.3 Residual Analysis and Goodness-of-fit Test
  2.8 Applications
    2.8.1 R&D and Patents
    2.8.2 Seizure Frequency in a Clinical Trial
    2.8.3 Terrorist Bombing
    2.8.4 Accidents in Worksites
    2.8.5 Ames Salmonella Assay Data
  2.9 Tables and Figures in Chapter 2
3 Mixed Logistic Regression Models
  3.1 Logistic Regression and Its Modifications
    3.1.1 Link Modifications
    3.1.2 Frequency Distribution Modifications
  3.2 Tests For Extra-binomial Variation
  3.3 A Mixed Logistic Regression Model
    3.3.1 The Model
    3.3.2 Features of the Mixed Logistic Regression Models
    3.3.3 Identifiability
  3.4 Parameter Estimation
    3.4.1 The EM algorithm
    3.4.2 Starting Values
    3.4.3 A Monte Carlo Study
  3.5 Implementation Issues
    3.5.1 Model Selection
    3.5.2 Classification
    3.5.3 Residual Analysis and Goodness-of-fit
  3.6 An Application
  3.7 Tables and Figures in Chapter 3
4 Summary, Conclusions and Future Research
  4.1 Summary and Conclusions
  4.2 Mixed Exponential Regression Models
  4.3 Hidden Markov Poisson Regression Models
    4.3.1 The Model
    4.3.2 Moment Structure
    4.3.3 Identifiability
    4.3.4 Estimation
    4.3.5 The Probabilities of Initial States and Starting Values
    4.3.6 Implementation and Remaining Issues
Bibliography
A FORTRAN PROGRAM

List of Tables

2.1 The results of the simulations for the mixed Poisson regression models
2.2 The result of a Monte Carlo study on the 2-component mixed Poisson regression model with constant mixing probabilities and variable rates - I
2.3 The result of a Monte Carlo study on the 2-component mixed Poisson regression model with constant mixing probabilities and variable rates - II
2.4 The results of the likelihood ratio tests for the hypothesis of $\alpha_2 = 0$ based on the 2-component mixed Poisson regression model - I
2.5 The results of fitting mixed Poisson regression model to the data from a Monte Carlo study on the 2-component mixed Poisson regression model with constant mixing probabilities and variable rates
2.6 The results of the likelihood ratio tests for the hypothesis of $\alpha_2 = 0$ based on the 2-component mixed Poisson regression model - II
2.7 The results of model selection based on AIC and BIC values for the Monte Carlo study
2.8 Poisson regression and overdispersion test statistics for patent data
2.9 Mixed Poisson regression model estimates for patent data
2.10 Parameter estimates for five models for patent data
2.11 Parameter estimates for five methods for seizure data analysis
2.12 Mixed Poisson regression model estimates for seizure data
2.13 Mixed Poisson regression model estimates for terrorist bombing data
2.14 Mixed Poisson regression model estimates for workplace injury data
2.15 Number of revertant colonies of salmonella (ye)
2.16 Mixed Poisson regression model estimates for Ames salmonella assay data
2.17 Parameter estimates for five estimation methods for assay data
3.1 Data of Busvine (1938)
3.2 The results of the simulations for the mixed logistic regression model (Model 1)
3.3 The results of the simulations for the mixed logistic regression model (Model 2)
3.4 The results of the simulations for the mixed logistic regression model (Model 3)
3.5 Number of trout with liver tumors / number in tank
3.6 Logistic regression and mixed logistic regression model estimates for fish data
3.7 Parameter estimates for four models for fish data

List of Figures

2.1 The index plot of Pearson residuals from the fitted 3-component mixed Poisson regression model for patent data
2.2 The index plot of deviance residuals from the fitted 3-component mixed Poisson regression model for patent data
2.3 The index plot of likelihood residuals from the fitted 3-component mixed Poisson regression model for patent data
2.4 The index plot of average relative coefficient changes from the fitted 3-component mixed Poisson regression model for patent data
2.5 The plot of patent data
2.6 Classification of patent data according to estimated posterior probabilities based on the fitted mixed Poisson model
2.7 Daily epileptic seizure counts
2.8 Estimated hourly seizure rates and classification of seizure data according to estimated posterior probabilities based on the fitted mixed Poisson regression model
2.9 Estimated mean
and variance based on the fitted mixed Poisson regression model for seizure data
2.10 The index plot of Pearson residuals from the fitted mixed Poisson regression model for seizure data
2.11 The index plot of deviance residuals from the fitted mixed Poisson regression model for seizure data
2.12 The index plot of likelihood residuals from the fitted mixed Poisson regression model for seizure data
2.13 The index plot of average relative coefficient changes from the fitted mixed Poisson regression model for seizure data
2.14 The time plot of terrorist bombing data
2.15 Classification of terrorist bombing episodes according to estimated posterior probabilities based on the fitted mixed Poisson regression model
2.16 The index plot of Pearson residuals from the fitted mixed Poisson regression model for terrorist bombing data
2.17 The index plot of deviance residuals from the fitted mixed Poisson regression model for terrorist bombing data
2.18 The index plot of likelihood residuals from the fitted mixed Poisson regression model for terrorist bombing data
2.19 The index plot of average relative coefficient changes from the fitted mixed Poisson regression model for terrorist bombing data
2.20 Classification of accident data according to estimated posterior probabilities based on the fitted mixed Poisson regression model
2.21 The index plot of Pearson residuals from the fitted mixed Poisson regression model for accident data
2.22 The index plot of deviance residuals from the fitted mixed Poisson regression model for accident data
2.23 The index plot of likelihood residuals from the fitted mixed Poisson regression model for accident data
2.24 The index plot of average relative coefficient changes from the fitted mixed Poisson regression model for accident data
2.25 Classification of Ames data according to estimated posterior probabilities based on the fitted mixed Poisson regression model
2.26 The index plot of Pearson
residuals from the fitted mixed Poisson regression model for Ames data
2.27 The index plot of deviance residuals from the fitted mixed Poisson regression model for Ames data
2.28 The index plot of likelihood residuals from the fitted mixed Poisson regression model for Ames data
2.29 The index plot of average relative coefficient changes from the fitted mixed Poisson regression model for Ames data
3.1 The index plot of Pearson residuals from the fitted mixed logistic regression model for fish data
3.2 The index plot of deviance residuals from the fitted mixed logistic regression model for fish data
3.3 The index plot of likelihood residuals from the fitted mixed logistic regression model for fish data
3.4 The index plot of average relative coefficient changes based on the fitted mixed logistic regression model for fish data
3.5 Classification and dose-response curves for fish data
3.6 The mean-variance relationship based on the fitted mixed logistic regression model for fish data

Acknowledgement

First and foremost, I would like to thank my supervisor Prof. Martin L. Puterman. Marty suggested the basic models used in this thesis and provided me with much needed encouragement, especially in the early stages of our work. His assistance with the thesis research and writing, and his financial support through NSERC grant A5527, are deeply appreciated. His comments on drafts of the thesis reflect a thoughtful, serious reading and have substantially improved the final version. To him I give my deepest thanks.

Also, I would like to express my appreciation to my thesis committee members. Prof. Bent Jorgensen has raised many important questions about the thesis and has provided me with very valuable information on related research work. Prof. Iain Cockburn introduced me to patent data and provided helpful comments concerning the econometric issues related to it.

I also thank Dr.
Nhu Le for constructive comments concerning mixed Poisson regression models, and for providing assistance with the seizure data analysis.

In addition, I acknowledge the receipt of the MacPhee Memorial Fellowship and the Leslie G. J. Wong Memorial Fellowship, which provided support during my graduate schooling at this university.

Finally, I must thank my fellow students and the faculty and staff of the management science division. The atmosphere has been open, relaxed and hospitable. I consider myself very fortunate to have known and become friends with so many people here.

To my parents

Chapter 1

Introduction

Poisson and logistic regression models are widely used for analyzing discrete data. In using such models, we implicitly assume that the response variable follows either a Poisson distribution or a binomial distribution with mean depending on covariates. Sometimes such assumptions may not be appropriate, in the sense that the mean-variance relationship specified by the distribution of the response variable is not valid. In most of these cases the data are overdispersed, i.e., the observed sample variance is larger than that predicted by inserting the observed sample mean into the mean-variance relationship. In a few cases the data may instead be underdispersed, i.e., the observed sample variance is smaller than that predicted by inserting the observed sample mean into the mean-variance relationship. If overdispersion or underdispersion is not taken into account, using these regression models may lead to biased parameter estimates and incorrect inferences about the parameters. In this thesis, we propose using a finite mixture model approach to adjust for overdispersion. Specifically, we incorporate covariates in both the mixing probabilities and the component parameters of a finite mixture model in such a way that overdispersion may be explicitly interpreted by the model structure.
The proposed models have applications in many different disciplines, including economics, biostatistics and epidemiology.

The work in this thesis was motivated by several studies in different areas. One of these studies analyzes the relationship between technological innovation and research and development expenditures for U.S. high-tech companies. Another assesses treatment effects in a clinical trial on epileptic patients carried out at British Columbia Children's Hospital. In the clinical study, for instance, the patients were randomly assigned to two groups: control and treatment. Patients in the treatment group received monthly infusions of intravenous gammaglobulin (IVIG), while patients in the control group received “best available therapy”. The primary end point of the trial was daily seizure frequency. The principal data source was a daily seizure diary, which contained the number of hours of parental observation and the number of seizures of each type during the observation period. We analyzed a typical series of myoclonic seizure counts from a single subject receiving IVIG. The data extracted from the seizure diary were the daily counts and the hours of parental observation. The questions of interest here are fitting a model to these counts that describes the pattern of epileptic seizure activity, and assessing the effect of IVIG on the suppression of myoclonic seizures. Although it is reasonable to assume that a daily seizure count follows a Poisson distribution, which implies random occurrence of seizures in time, the data were overdispersed with respect to the Poisson regression model with a mean including the treatment effect. The clinical investigators conducting this study indicated that they have observed subjects to have “bad days” and “good days” with no obvious explanation of this effect.
Hence, we are led to consider mixed Poisson regression models, which allow the seizure frequency function to change in a random fashion.

Several alternative approaches for modelling overdispersion with respect to the Poisson assumption are reviewed in Chapter 2. In this chapter, we propose a mixed Poisson regression model and show that it includes several special cases, such as the usual Poisson regression model, the mixed Poisson regression model with constant mixing probabilities and the mixed Poisson regression model with constant Poisson rates. We also discuss identifiability of the proposed model and provide sufficient conditions for identifiability. Maximum likelihood parameter estimation is used. An algorithm for computing the maximum likelihood estimates is presented (FORTRAN code implementing the algorithm is provided in Appendix A). In particular, for a fixed finite number of components, the algorithm finds maximum likelihood estimates in two steps: (1) using the EM algorithm first until either the observed log likelihood or the parameter estimates change by no more than a given tolerance, and (2) then using a quasi-Newton algorithm to maximize the observed log likelihood function. The results of a Monte Carlo study on the performance of the algorithm are given here. A model selection procedure for determining the number of components, and inference about the regression parameters, are also presented. Classification based on the estimated posterior probabilities from the fitted model is discussed. Finally, four applications of this model are given, and the results are compared to those from quasi-likelihood approaches.

Several alternative approaches for modelling overdispersion with respect to the binomial assumption are reviewed in Chapter 3.
In this chapter, we propose a mixed logistic regression model and show that it includes several special cases, such as the usual logistic regression model, the mixed logistic regression model with constant mixing probabilities and the mixed logistic regression model with constant binomial parameters. We also discuss identifiability of the proposed model and provide sufficient conditions for identifiability. Maximum likelihood parameter estimation is used. An algorithm for computing the maximum likelihood estimates is presented (FORTRAN code implementing the algorithm is provided in Appendix A). In particular, for a fixed finite number of components, the algorithm finds maximum likelihood estimates in two steps: (1) using the EM algorithm first until either the observed log likelihood or the parameter estimates change by no more than a given tolerance, and (2) then using a quasi-Newton algorithm to maximize the observed log likelihood function. The results of a Monte Carlo study on the performance of the algorithm are given here. A model selection procedure for determining the number of components, and inference about the regression parameters, are also presented. Classification based on the estimated posterior probabilities from the fitted model is discussed. Finally, an application of this model is given, and the results are compared to those from quasi-likelihood approaches.

Chapter 4 presents the summary, conclusions and future research. We discuss some similarities and differences between the mixed Poisson regression and mixed logistic regression models. We extend the mixed Poisson regression and logistic regression models to the more general case of a one-parameter exponential distribution. Mixed exponential regression models are considered in this chapter. Furthermore, we propose hidden Markov Poisson regression models for longitudinal data.
In particular, we give preliminary results for this model, including model definition, moment structure, identifiability and parameter estimation.

Chapter 2

Mixed Poisson Regression Models

2.1 Poisson regression and its modifications

The Poisson regression model has been widely used for analyzing count data in which each observation consists of a discrete response variable and a vector of covariates or predictors. Typical examples of such data include counts of events in a Poisson or Poisson-like process where the upper limit to the number is infinite or effectively so. For instance, the response variable may represent the number of failures of a piece of equipment per unit time, the number of purchases of a particular commodity per family, or the number of bacteria per unit volume of suspension. In practice, however, the model sometimes fits poorly, suggesting the need for alternative models. In this case, it is not uncommon that the observed data are overdispersed, i.e., the variance of an observation is greater than its mean. This may be reflected in an over-large residual deviance and in adjusted residuals which have a variance > 1. Without consideration of the overdispersion, using the Poisson regression model may not be justified. In the first part of this dissertation, mixed Poisson regression models are introduced and investigated. These models are applicable in several different situations where the Poisson regression model appears inadequate, and they provide an alternative way to adjust for extra-Poisson variation with a more meaningful interpretation.

Suppose that the $i$th response variable $Y_i$ is a count, and associated with this response is a covariate vector $x_i = (x_{i1}, \ldots, x_{ir})'$ for $1 \le i \le n$. The Poisson regression model assumes that the $Y_i$ are independently distributed Poisson($\lambda_i$) with density function

$$f(y_i \mid x_i, \alpha) = \frac{\lambda_i^{y_i}}{y_i!} \exp(-\lambda_i) \quad \text{for } y_i = 0, 1, 2, \ldots, \tag{2.1}$$

where $\lambda_i = \exp(x_i'\alpha)$ and $\alpha \in R^r$ is an $r$-dimensional vector of unknown parameters.
Note that the Poisson parameter $\lambda_i = E(Y_i)$ is related to the covariate vector $x_i$ by a link function, so that the dependence of $\lambda_i$ on $x_i$ is assumed to be multiplicative and is usually written in the logarithmic form

$$\log(\lambda_i) = x_i'\alpha. \tag{2.2}$$

Equations (2.1) and (2.2) are sometimes referred to as a log-linear model.

The Poisson regression model has been applied in many areas (e.g., Frome, Kutner, and Beauchamp 1973; Frome 1983; Holford 1983; Hausman et al. 1984; Mannering 1989). For instance, Frome et al. (1973) used the Poisson regression model to describe the relationship between the number of failures of a piece of electronic equipment per unit time (response variable) and the times spent in regimes one and two (covariates), and the relationship between the number of colonies produced in the spleen of recipient animals (response variable) and the concentration of injected cells and the radiation dose (covariates). Frome (1983) applied the Poisson regression model in the analysis of survival time data. He analyzed data that were obtained in epidemiologic follow-up studies and organized into a format similar to that of a life table. Holford (1983) analyzed data consisting of numbers of prostatic cancer deaths and mid-period population denominators for non-whites in the US by age and calendar period, and fitted a Poisson regression model with age and cohort effects to the death rates. Hausman et al. (1984) introduced the Poisson regression model to analyze the relationship between the research and development (R&D) expenditures of firms and the number of patents applied for and received by them. Mannering (1989) used the Poisson regression model to investigate the determinants of commuter flexibility in changing routes and departure times for the morning trip to work.
He assumed that the number of route and departure time changes occurring during a one month period follows a Poisson distribution with mean depending on a vector of commuting and socioeconomic characteristics for an individual.

The Poisson regression model is analogous to the normal linear regression model in many ways. The estimation of unknown parameters is straightforward and is done either by an iterative weighted least squares technique or by a maximum likelihood algorithm. The log likelihood function is globally concave, so that maximization routines converge rapidly. Residual analysis is carried out in the same way as for the normal linear regression model, except that the definition of the residual is different.

The Poisson regression model is used for many different purposes. Sometimes, inference concerning the regression parameters $\alpha$ is of primary importance. For example, $Y$ may denote the number of car accidents for an individual. Large values of the $\alpha$s (relative to their standard errors) then correspond to factors which significantly increase the chance of accidents. On the other hand, when one is primarily interested in creating a good predictive model, the interpretation of parameters may take a secondary role.

The Poisson regression model is an example of a Generalized Linear Model (GLM) (McCullagh and Nelder, 1989) in which the frequency distribution of the response $Y$ is a Poisson distribution with mean $\lambda(x)$, and the link is the log function: $g(\lambda) = \log(\lambda(x)) = x'\alpha$.

A consequence of using the Poisson regression model is that the variance equals the mean, i.e., $\mathrm{Var}(Y) = E(Y)$. In practice, however, we often have overdispersed data, i.e., $\mathrm{Var}(Y) > E(Y)$. When the Poisson regression model fits the count data poorly, overdispersion is often a cause of the problem. There are several ways to modify the Poisson regression model. Using the GLM formulation, we can modify it by choosing either an alternative link function or an alternative frequency distribution, or both.
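As a rough illustration of the iterative weighted least squares fit and of the variance-equals-mean property, the following sketch fits a Poisson regression with an intercept and one covariate and then computes the Pearson chi-square statistic divided by its degrees of freedom. The function names `fit_poisson_irls` and `pearson_dispersion` and the small data set are hypothetical, chosen only for illustration; they are not taken from the thesis or its FORTRAN appendix.

```python
import math

def fit_poisson_irls(x, y, n_iter=25):
    """Fit log(lambda_i) = a0 + a1 * x_i by iteratively reweighted least squares."""
    a0, a1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = [a0 + a1 * xi for xi in x]
        lam = [math.exp(e) for e in eta]
        # Working response z_i = eta_i + (y_i - lam_i) / lam_i; weights are lam_i.
        z = [e + (yi - li) / li for e, yi, li in zip(eta, y, lam)]
        sw = sum(lam)
        swx = sum(li * xi for li, xi in zip(lam, x))
        swxx = sum(li * xi * xi for li, xi in zip(lam, x))
        swz = sum(li * zi for li, zi in zip(lam, z))
        swxz = sum(li * xi * zi for li, xi, zi in zip(lam, x, z))
        det = sw * swxx - swx * swx
        # Solve the 2 x 2 weighted normal equations in closed form.
        a0 = (swxx * swz - swx * swxz) / det
        a1 = (sw * swxz - swx * swz) / det
    return a0, a1

def pearson_dispersion(x, y, a0, a1, n_params=2):
    """Pearson chi-square divided by its degrees of freedom."""
    lam = [math.exp(a0 + a1 * xi) for xi in x]
    chi2 = sum((yi - li) ** 2 / li for yi, li in zip(y, lam))
    return chi2 / (len(y) - n_params)

# Hypothetical counts, for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 3, 2, 6, 9, 14]
a0, a1 = fit_poisson_irls(x, y)
print("alpha:", a0, a1)
print("dispersion:", pearson_dispersion(x, y, a0, a1))
```

Under the Poisson assumption the statistic should be close to 1; a value well above 1 would hint at overdispersion, although with so few observations it is only indicative.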
Since the log link has nice properties, such as multiplicative effects of covariates on the Poisson mean, few researchers have suggested the use of alternative link functions. On the other hand, there are many studies of alternative frequency distributions to the Poisson distribution (e.g., Breslow 1984; Efron 1986; Lawless 1987b and Dean et al. 1989).

To adjust for extra-Poisson variation, mixed Poisson distributions have been used as frequency distributions (Efron 1986; Lawless 1987b and Dean et al. 1989). In these models, the Poisson means associated with each observed count are defined as latent variables that are sampled from a specified parametric distribution. In other words, the Poisson means are random variables following a specific distribution. Under such a set-up, the marginal density function of the response $Y$ without covariates can often be given by

$$\Pr(Y = y \mid \lambda, g) = \int \frac{(v\lambda)^y}{y!} \exp(-v\lambda)\, g(v)\, dv, \quad y = 0, 1, \ldots, \tag{2.3}$$

where $g(v)$ is a mixing probability density function and $\lambda > 0$ is an unknown parameter. Such models can be viewed as multiplicative Poisson random-effects models (Brillinger 1986) for the following reasons: (1) there is a random effect $T$ with a density $g(v)$, $v > 0$, in the model; (2) conditional on $T = v$, the response $Y$ has a Poisson distribution with mean $v\lambda$. Without loss of generality we can assume that $E(T) = 1$.

Most authors have considered a gamma mixing distribution, which leads to a negative binomial distribution for the observed data (Manton, Woodbury, and Stallard 1981, Margolin, Kaplan, and Zeiger 1981). In this case the mixing distribution $g(v)$ is

$$g(v) = \begin{cases} \dfrac{k^k}{\Gamma(k)}\, v^{k-1} \exp(-kv) & \text{for } v \ge 0, \\ 0 & \text{otherwise,} \end{cases}$$

where $k > 0$ and $\lambda > 0$ are unknown parameters. Note that $E(T) = 1$ and $\mathrm{Var}(T) = 1/k$. Hence (2.3) becomes

$$f(y \mid \lambda, k) = \frac{\Gamma(y + k)}{\Gamma(k)\, y!} \left(\frac{\lambda}{\lambda + k}\right)^{y} \left(\frac{k}{\lambda + k}\right)^{k}, \quad \text{for } y = 0, 1, 2, \ldots, \tag{2.4}$$

where $k > 0$ is often referred to as the index or dispersion parameter. The mean and variance of $Y$ are

$$E(Y) = \lambda \quad \text{and} \quad \mathrm{Var}(Y) = \lambda + (1/k)\lambda^2. \tag{2.5}$$

As a natural extension of the above models, several researchers (e.g., Lawless, 1987b, and Hausman, Hall, and Griliches, 1984) have studied negative binomial regression models in which covariates are related to the parameter $\lambda$ by a positive function $\lambda(x)$. Usually one takes the common log-linear form $\lambda(x) = \exp(x'\alpha)$, so that random and fixed effects are added on the same exponential scale. The negative binomial regression model may be interpreted as follows: if $T$ is a positive-valued random variable with mean 1 and variance $1/k$, and if the distribution of $Y$, conditional on $T = v$ and covariates $x$, is Poisson($v\lambda(x)$), then the marginal mean and variance of $Y$ are as in (2.5), and the marginal distribution of $Y$ is the negative binomial defined by (2.4).

Note that in the negative binomial regression model, the shape parameter $k$ is a constant for all observations. In this case, the likelihood equations based on the negative binomial model are unbiased and the maximum likelihood estimates of the mean parameters are consistent, regardless of the true variance function (Lawless, 1987b and Hausman, Hall, and Griliches 1984).

Several researchers apply the negative binomial model in different situations. For instance, for count data without covariates, Anscombe (1950) gives a comprehensive discussion of the properties of the model and several examples of its use. Ehrenberg (1972) applies it to model market behaviour for frequently purchased low-cost products by assuming that the number of purchases follows the negative binomial distribution. For count data with covariates, Manton et al. (1981) use it in the analysis of mortality rates. They assume that variation in individual risk levels follows the gamma distribution within each category, and that conditional on the individual risk levels, the number of cancer deaths follows the Poisson distribution with mean depending on some covariates including age and race.
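The step from the mixture integral (2.3) with gamma mixing to the negative binomial form (2.4), together with the moments in (2.5), can be checked numerically. The sketch below is illustrative only: the function names `nb_pmf` and `mixture_pmf` and the values $\lambda = 4$, $k = 2$ are arbitrary assumptions, and the integral is approximated by a simple midpoint rule rather than by any method used in the thesis.

```python
import math

def nb_pmf(y, lam, k):
    """Negative binomial probability in the (lambda, k) parametrization of (2.4)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + y * math.log(lam / (lam + k)) + k * math.log(k / (lam + k)))
    return math.exp(log_p)

def mixture_pmf(y, lam, k, upper=30.0, step=0.001):
    """Midpoint-rule approximation of (2.3) with the gamma mixing density."""
    total = 0.0
    n_steps = int(upper / step)
    for j in range(n_steps):
        v = (j + 0.5) * step
        # log of the Poisson(v * lam) probability of y
        log_pois = -v * lam + y * math.log(v * lam) - math.lgamma(y + 1)
        # log of the gamma density with mean 1 and variance 1 / k
        log_gam = k * math.log(k) + (k - 1) * math.log(v) - k * v - math.lgamma(k)
        total += math.exp(log_pois + log_gam) * step
    return total

lam, k = 4.0, 2.0  # arbitrary illustrative values
probs = [nb_pmf(y, lam, k) for y in range(400)]  # tail beyond 400 is negligible here
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print("mean:", mean, "variance:", var, "lambda + lambda^2 / k:", lam + lam ** 2 / k)
print("pmf at y = 3:", nb_pmf(3, lam, k), "integral:", mixture_pmf(3, lam, k))
```

The two probabilities printed on the last line should agree closely, and the computed mean and variance should match $\lambda$ and $\lambda + \lambda^2/k$ up to truncation error.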
Hausman, Hall, and Griliches (1984) introduce the negative binomial model to study the relation between technical innovation and firm characteristics (mainly R&D spending and sales) at the firm level. They assume that there is a random firm effect described by the gamma distribution, and that the number of patents applied for by a company per year, $Y$, follows a negative binomial regression model in which $E(Y) = \lambda(x)$ is a log-linear function of the covariates: annual R&D spending and sales of the company.

Another useful choice of the mixing distribution $g(v)$ in (2.3) is an inverse Gaussian distribution (e.g., Folks and Chhikara 1978, Tweedie 1957) for $T$, with density

$$g(v) = (2\pi\tau v^3)^{-1/2} \exp\{-(v - 1)^2 / (2\tau v)\}, \quad v > 0. \tag{2.6}$$

The parameter $\tau$ is unknown, and equals $\mathrm{Var}(T)$. The marginal distribution of $Y$ from (2.3) is then a Poisson-inverse-Gaussian model with the mean and variance relationship $E(Y) = \lambda$ and $\mathrm{Var}(Y) = \lambda + \tau\lambda^2$. This model provides a heavier-tailed alternative to the negative binomial model, although both have the same mean and variance relationship. A difficulty in using the model is computing the integral in (2.3).

Dean, Lawless and Willmot (1989) introduce a Poisson-inverse-Gaussian regression model by taking the common log-linear form $\lambda(x) = \exp(x'\alpha)$. This model has almost the same structure and interpretation as the negative binomial regression models. Jorgensen (1987) and Stein and Juritz (1988) also propose other versions of Poisson-inverse-Gaussian models by using different variance functions. Jorgensen (1987) defines both the Poisson and inverse-Gaussian distributions as exponential dispersion models, so that his mixture model is an exponential dispersion model and satisfies an appealing convolution property. Stein and Juritz's model is structured so that the regression parameter vector $\alpha$ is orthogonal to the shape parameter (analogous to the $\tau$ in the above model) specifying the degree of extra-Poisson variation. Neither model has, however, the simple structure
of the above model in terms of the multiplicative random effects.

A log normal mixing distribution for $g(v)$ has also been advocated (e.g., Hinde 1982 and Pocock et al. 1981). In this model, the Poisson mean has a lognormal distribution with location parameter related to a linear function of covariates and a constant scale parameter.

Efron (1986) introduces the double Poisson distribution as an alternative frequency distribution to accommodate extra-Poisson variation. The exact double Poisson density is

$$\tilde f_{\lambda,\theta}(y) = c(\lambda, \theta)\, f_{\lambda,\theta}(y),$$

where

$$f_{\lambda,\theta}(y) = \left(\theta^{1/2} e^{-\theta\lambda}\right) \left(\frac{e^{-y} y^{y}}{y!}\right) \left(\frac{e\lambda}{y}\right)^{\theta y}, \quad \text{for } y = 0, 1, 2, \ldots,$$

and the factor $c(\lambda, \theta)$ can be calculated as

$$\frac{1}{c(\lambda, \theta)} = \sum_{y=0}^{\infty} f_{\lambda,\theta}(y) \approx 1 + \frac{1 - \theta}{12\lambda\theta} \left(1 + \frac{1}{\lambda\theta}\right).$$

Since the constant $c(\lambda, \theta)$ nearly equals 1, the approximate probability density function for the double Poisson distribution is $f_{\lambda,\theta}(y)$. Usually $\lambda$ is referred to as a mean parameter and $\theta$ as a dispersion parameter. The double Poisson distribution allows us to adjust the mean and variance of the response $Y$ individually using the parameters $\lambda$ and $\theta$, and it only involves rescaled Poisson distributions, in the approximate sense that $Y$ is approximately expressed by $X/\theta$ where $X$ follows the Poisson distribution with mean $\lambda\theta$. For count data with covariates, we can incorporate covariates into either $\lambda$ or $\theta$, or both. Efron suggests that the double Poisson regression model may be more appropriate for count data in which subjects may be, for example, obtained in clumps rather than by genuine random sampling. Note that such clumped sampling may be one of the possible causes of overdispersion.
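The claim that the constant $c(\lambda, \theta)$ is close to 1 can be examined numerically. The sketch below assumes the standard form of Efron's (1986) double Poisson density and of his closed-form approximation to $1/c(\lambda, \theta)$; the function names and the values $\lambda = 5$, $\theta = 0.5$ are hypothetical choices made only for illustration.

```python
import math

def double_poisson_unnormalized(y, lam, theta):
    """Approximate double Poisson density f_{lam,theta}(y), without the factor c."""
    if y == 0:
        # y^y and (e * lam / y)^(theta * y) are both taken as 1 at y = 0.
        return math.sqrt(theta) * math.exp(-theta * lam)
    log_f = (0.5 * math.log(theta) - theta * lam
             - y + y * math.log(y) - math.lgamma(y + 1)
             + theta * y * (1.0 + math.log(lam) - math.log(y)))
    return math.exp(log_f)

def inverse_c_exact(lam, theta, y_max=500):
    """1 / c(lam, theta) by direct summation, truncated at y_max."""
    return sum(double_poisson_unnormalized(y, lam, theta) for y in range(y_max + 1))

def inverse_c_approx(lam, theta):
    """Closed-form approximation to 1 / c(lam, theta)."""
    return 1.0 + (1.0 - theta) / (12.0 * lam * theta) * (1.0 + 1.0 / (lam * theta))

lam, theta = 5.0, 0.5  # arbitrary illustrative values
inv_c = inverse_c_exact(lam, theta)
probs = [double_poisson_unnormalized(y, lam, theta) / inv_c for y in range(501)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print("1/c by summation:", inv_c, "approximation:", inverse_c_approx(lam, theta))
print("mean:", mean, "variance:", var)
```

With $\theta = 1$ the density reduces to the ordinary Poisson density and the sum is 1. For $\theta < 1$, the rescaling interpretation $Y \approx X/\theta$ suggests a mean near $\lambda$ and a variance near $\lambda/\theta$, which the printed values can be compared against.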
Another approach to modifying the Poisson regression model is quasi-likelihood. This approach specifies only the mean and variance structure of $Y$ implied by the mixed Poisson model, and estimates the regression coefficients by quasi-likelihood and the variance parameter by the method of moments (e.g., Williams 1982 and Breslow 1987). The attraction is that unduly restrictive assumptions about the frequency distribution are avoided. The trade-off is that estimation based on the quasi-likelihood model is not as efficient as estimation based on the fully parametric model (Lawless 1987b).

Several researchers have studied different quasi-likelihood models by assuming different relationships between the mean and variance. Breslow (1984) introduces quasi-likelihood models by assuming that, conditional on $\lambda_i$ and exposure $t_i$, the response $Y_i$ has an independent Poisson distribution with mean $E(Y_i) = t_i\lambda_i$ and $\log\lambda_i = x_i'\alpha + \epsilon_i$, where $\alpha$ is a vector of unknown parameters and the $\epsilon_i$ are random error terms having mean 0 and a constant unknown variance $\sigma^{2}$. Note that no assumptions are made about the probability distribution of the random effects $\epsilon_i$ beyond the first two moments.

Breslow (1984) also proposes two procedures to fit count data to the model. One applies when the data have relatively large values of $Y_i$. In this case $z_i = \log(Y_i/t_i)$ may be regarded as approximately normally distributed with mean $x_i'\alpha$ and variance $\sigma^{2} + \tau_i^{2}$, where $\tau_i^{2} = 1/E(Y_i)$. The estimation method is then based on iterating the following two steps: (1) obtain estimates of the regression parameters by weighted least squares using the empirical weights $w_i = (\hat\sigma^{2} + \hat\tau_i^{2})^{-1}$, and (2) obtain the value of $\hat\sigma^{2}$ by setting the chi-square criterion equal to its degrees of freedom, i.e.,
$$ \sum_{i=1}^{n} (z_i - x_i'\hat\alpha)^{2}/(\hat\sigma^{2} + \hat\tau_i^{2}) = n - p, $$
where $p$ is the number of parameters in the model.

The other applies when the data have relatively small values of $Y_i$. In this case, the normal approximation is in doubt. Since the above assumptions lead to the approximate
mean and variance relationship $E(Y_i) = \mu_i = t_i\exp(x_i'\alpha)$ and $\mathrm{Var}(Y_i) \approx \mu_i + \sigma^{2}\mu_i^{2}$, the maximum quasi-likelihood estimates are obtained with GLIM (Baker and Nelder, 1978) by using the Poisson error function and the natural log link, declaring $\log(t_i)$ as an offset, and defining prior weights $w_i = (1 + \hat\sigma^{2}\hat\mu_i)^{-1}$. The value of $\hat\sigma^{2}$ is again obtained by setting the chi-square criterion equal to its degrees of freedom, i.e.,
$$ \sum_{i=1}^{n} (y_i - \hat\mu_i)^{2}/\{\hat\mu_i + \hat\sigma^{2}\hat\mu_i^{2}\} = n - p, $$
where $p$ is the number of parameters in the model. Note that this approach can also be applied to data that have both small and large values of $Y_i$, because the above approximation of the mean and variance relationship still holds.

There are also other quasi-likelihood models in the literature for analyzing overdispersed count data. For instance, many non-Poisson distributions encountered in statistical practice may have the connection between the mean and variance of a response $Y$ expressed by
$$ \mathrm{Var}(Y) = c_1 E(Y) + c_2\{E(Y)\}^{2}. $$
This relation was used by Bartlett (1936) to analyze counts for field experiments. Both Armitage (1957) and Finney (1976) define another mean-variance relationship,
$$ \mathrm{Var}(Y) \propto \{E(Y)\}^{b}, $$
and find by the study of examples that $1 < b < 2$. Breslow (1990) also uses a quasi-likelihood model with the above mean-variance relationship to analyze viral activity from pock counts.

Another approach for modifying the Poisson distribution is through finite mixture models, which are obtained by taking the mixing distribution in (2.3) as a discrete probability distribution with $c$ points of support. Hence the distribution of $Y$ is
$$ \Pr(Y = y \mid p_1, \ldots, p_c, \lambda_1, \ldots, \lambda_c) = \sum_{j=1}^{c} p_j\, \mathrm{Po}(y \mid \lambda_j), $$
where $\sum_{j=1}^{c} p_j = 1$ and $p_j > 0$ ($1 \le j \le c$), and $\mathrm{Po}(y \mid \lambda_j)$ are Poisson probability functions with mean $\lambda_j$. This approach applies to a wide variety of applications and has received an increasing amount of attention lately. See for example Everitt and Hand (1981) and Titterington et al (1985).
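The finite Poisson mixture above is easy to evaluate directly. The following Python sketch, with illustrative function names and made-up parameter values, computes the mixture probability function together with its mean and variance, which makes the extra-Poisson variation visible: the variance exceeds the mean whenever the component means differ.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability of the count y for mean lam > 0."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

def mixture_pmf(y, probs, rates):
    """Pr(Y = y) for a finite Poisson mixture with weights probs and means rates."""
    return sum(p * poisson_pmf(y, lam) for p, lam in zip(probs, rates))

def mixture_mean_var(probs, rates):
    """Mean and variance of the mixture.

    The variance is the mean (average within-component Poisson variance)
    plus the between-component variance of the means.
    """
    mean = sum(p * lam for p, lam in zip(probs, rates))
    between = sum(p * (lam - mean) ** 2 for p, lam in zip(probs, rates))
    return mean, mean + between

# Hypothetical two-component example: 70% of subjects with mean 1, 30% with mean 6.
mean, var = mixture_mean_var([0.7, 0.3], [1.0, 6.0])
```

In this hypothetical example the mean is 2.5 while the variance is 7.75, so a single Poisson distribution with the same mean would understate the variability considerably.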
Simar (1976) and Leroux (1989) study finite mixtures with an unknown number of components for overdispersed count data. No researchers have systematically studied regression-type finite mixture models with covariates.

2.2 Implications of Overdispersion

Overdispersion has been recognized as an issue for many years. In Poisson regression analysis of count data, residual variability is sometimes greater than what is predicted by Poisson models, suggesting either lack of fit (incorrect mean) or overdispersion, or both. It is important to note that the score tests proposed so far cannot distinguish lack of fit from true overdispersion (incorrect variance). In our discussion, we concentrate mainly on the issue of overdispersion rather than the choice of the link function. Ignoring overdispersion when using the Poisson regression model may lead to misleading statistical analysis. This will be illustrated in our examples later.

Many authors have studied the effects of overdispersion on inferences made under the Poisson regression model. As Cox (1983) indicates, overdispersion in general has two effects. One is that summary statistics have a larger variance than anticipated under the simple model. The second is a possible loss of efficiency. It is important to note that the implications of overdispersion may also depend on the type of overdispersion specified. For the Poisson regression analysis, if the overdispersion is accommodated by randomizing the Poisson mean to obtain gamma-Poisson models and quasi-likelihood models, among others (e.g.
Cox 1983), maximum likelihood fitting of a log-linear model for Poisson-distributed data retains high efficiency for a modest amount of overdispersion, provided that the log-linear model determines the expected value of the observed count (Cox, 1983). Specifically, parameter estimates based on the Poisson regression model are generally not seriously biased or inefficient, but estimated standard errors are too small and tests are too liberal (Breslow 1990; Cox 1983; Firth 1987; Hill and Tsai 1988; McCullagh and Nelder 1989).

On the other hand, when there is serious overdispersion, using the usual Poisson regression may lead to either seriously biased or inefficient parameter estimates. For instance, in a random coefficient log-linear Poisson regression, the response $Y$ is Poisson($e^{\alpha + x\beta}$) given $\alpha$ and $\beta$, but each individual has a different random baseline $\alpha$ or a different responsiveness to treatment $\beta$; in this case the parameter estimates of $\alpha$ and $\beta$, as well as their standard errors based on the Poisson regression, may be misleading. In particular, the mean of a random coefficient model is not the Poisson mean evaluated at the average of the random coefficients (see Neuhaus et al. 1991). Also, if the true log-mean is $\alpha + x\beta + z\gamma$ but only $x$ is recorded, then the assumed log-mean $\alpha^{*} + x\beta$ has a random intercept $\alpha^{*} = \alpha + z\gamma$ that varies with $z$. In this case the extent of the overdispersion depends on $z$, and the parameter estimate of $\alpha^{*}$ based on the Poisson regression may be seriously biased when the overdispersion is serious. When the extra-Poisson variation is explained by the mixed Poisson regression model, we will show in examples that accounting for the overdispersion may give results rather different from those of the usual Poisson regression.

2.3 Tests for Extra-Poisson Variation

There are several overdispersed Poisson regression models which have been discussed in the literature.
Without fitting a particular overdispersed Poisson model, we would like to know whether there is serious overdispersion. Several methods have been proposed to detect overdispersion relative to the Poisson assumption. An informal graphical approach is introduced by Lambert and Roeder (1993) and Lindsay and Roeder (1992). For instance, for log-linear Poisson regression, Lambert and Roeder (1993) define the function
$$ C(\mu) = n^{-1}\sum_{i=1}^{n} \exp(\hat\mu_i - \mu)\,(\mu/\hat\mu_i)^{y_i}, $$
where $\hat\mu_i = \exp(x_i\hat\beta)$ and $\mu > 0$. They show that $C(\mu)$ tends to be convex when the data are from a random mean Poisson regression model, a random coefficient Poisson regression model, or a double Poisson regression model. They therefore suggest plotting $C(\mu)$ against $\mu$. The more convex $C(\mu)$ appears, the more evidence there is of overdispersion or an omitted variable. It is not clear, however, whether this approach can be applied to other modified Poisson regression models, such as the finite mixture of Poisson regression models, for dealing with extra-Poisson variation.

Another simple approach is to fit a more comprehensive model that contains the Poisson model and then test for a reduction to the simple model using, for instance, a likelihood ratio test. This approach, however, may provide misleading results (Dean, 1992). As Lawless (1987a) indicates, in certain circumstances the asymptotic distributions used with these tests may not be reliable because they tend to underestimate the evidence against the base model.

A widely used approach is through score tests. With these tests we may fit the Poisson regression model as a first step in the model building process and test for overdispersion. Score tests for detecting extra-Poisson variation have been discussed by Cameron and Trivedi (1986), Collings and Margolin (1985), Dean and Lawless (1989), and Fisher (1950). Concern has been expressed over the suitability of tests and confidence intervals based on overly simple models for extra-Poisson variation.
Breslow (1990) proposes tests for parameters that appear in the mean, using model-free estimates of variance for each case. He found these to be robust to incorrect specification of the variance function, but not as powerful as tests based on a correct model for the response variation. Dean (1992) develops a unifying theory for all the score tests mentioned above.

Before applying the mixed Poisson regression models, we need to determine whether the data are overdispersed with respect to the Poisson distribution in Poisson regression models. We use three score test statistics proposed by Dean (1992), which we denote $P_a$, $P_b$ and $P_c$. They test the hypothesis of no overdispersion against alternatives representing different forms of overdispersion, corresponding to the following specifications:

(a) $E(y_i) = \mu_i$, $\mathrm{Var}(y_i) \approx \mu_i(1 + \tau\mu_i)$ for $\tau$ small;
(b) $E(y_i) = \mu_i$, $\mathrm{Var}(y_i) = \mu_i(1 + \tau\mu_i)$;
(c) $E(y_i) = \mu_i$, $\mathrm{Var}(y_i) = \mu_i(1 + \tau)$.

In these statistics $\hat\mu_i$ is the estimated mean of the $i$th observation based on Poisson regression. Under $H_0$: $\tau = 0$, each statistic asymptotically follows a standard normal distribution. Note that the difference between (a) and (b) is that the former gives approximate forms for the first two moments, whereas the latter gives the exact ones.

For small samples, Dean (1992) provides "corrected" versions $P_a'$, $P_b'$ and $P_c'$ corresponding to $P_a$, $P_b$ and $P_c$ respectively. The corrections involve $h_{ii}$, the $i$th diagonal element of the matrix $H = W^{1/2}X(X^{T}WX)^{-1}X^{T}W^{1/2}$, with $W = \mathrm{diag}(\hat\mu_1, \ldots, \hat\mu_n)$ and $X$ being an $n \times p$ design matrix. Dean (1992) points out that the distributions of these corrected statistics converge very quickly to normality.

2.4 Mixed Poisson Regression Models

Without covariates, the finite mixture approach has been used for analyzing count data exhibiting extra-Poisson variation (cf. Titterington et al., 1985; Simar 1976; and Leroux, 1989).
With covariates, however, this approach has not been systematically studied or directly applied to regression-type count data. In this section, we extend the finite Poisson mixture model to the mixed Poisson regression model by allowing both the component Poisson parameters and the mixing probabilities of a mixture to depend on covariates. We investigate some basic features of the model. We also discuss identifiability for the model and provide sufficient conditions for identifiability.

2.4.1 The Model

Let the random variable $Y_i$ denote the $i$th count response, and let $\{(y_i, t_i, x_i),\ i = 1, \ldots, n\}$ denote observations, where $y_i$ is the observed value of $Y_i$, $t_i$ is a non-negative number representing the time period or exposure during which observation $y_i$ is generated, and $x_i = (x_i^{(r)}, x_i^{(m)})$ is a covariate vector in which $x_i^{(r)}$ and $x_i^{(m)}$ are $k_1$- and $k_2$-dimensional covariate vectors corresponding to the regression part and the mixing part of the model respectively. We allow some or all components of $x_i^{(r)}$ and $x_i^{(m)}$ to be identical. Usually the first element of both $x_i^{(m)}$ and $x_i^{(r)}$ is a 1 corresponding to an intercept. The mixed Poisson regression model assumes:

(1) The unobserved mixing process can occupy any one of $c$ states, where $c$ is finite and unknown;

(2) For each observed count $y_i$, there is an unobserved random variable, $A_i$, representing the component which generates $y_i$. Further, the $(Y_i, A_i)$ are pairwise independent;

(3) The $A_i$ follow discrete distributions with $c$ points of support, $1, \ldots, c$, and $\Pr(A_i = j) = p_{ij}$, where $\sum_{j=1}^{c} p_{ij} = 1$ for each $i$ and
$$ p_{ij} = p_j(x_i^{(m)}, \beta) = \frac{\exp(\beta_j' x_i^{(m)})}{1 + \sum_{k=1}^{c-1}\exp(\beta_k' x_i^{(m)})}, \qquad j = 1, \ldots, c-1, \tag{2.7} $$
$$ p_{ic} = p_c(x_i^{(m)}, \beta) = 1 - \sum_{j=1}^{c-1} p_{ij}, \tag{2.8} $$
where $\beta = (\beta_1', \ldots, \beta_{c-1}')'$ and $\beta_j = (\beta_{j1}, \ldots, \beta_{jk_2})'$, $j = 1, \ldots, c-1$, are unknown parameters. Note that all components of $\beta$ appear in each mixing probability $p_{ij}$;
(4) Conditional on $A_i = j$, $Y_i$ follows a Poisson distribution which we denote by
$$ \mathrm{Po}(y_i \mid \lambda_{ij}) = \frac{\lambda_{ij}^{\,y_i}\exp(-\lambda_{ij})}{y_i!}, \tag{2.9} $$
where we define a log link function between the Poisson mean and covariates as
$$ \lambda_{ij} = t_i\lambda_j(x_i^{(r)}, \alpha_j) = t_i\exp(\alpha_j' x_i^{(r)}), \qquad j = 1, \ldots, c, $$
where $\alpha = (\alpha_1', \ldots, \alpha_c')'$ are unknown parameters, and $\alpha_j = (\alpha_{j1}, \ldots, \alpha_{jk_1})'$, $j = 1, \ldots, c$. Note that we could also choose other link functions.

The above assumptions define the unconditional distribution of the observations $y_i$ as a finite Poisson mixture in which the mixing probabilities, $p_{ij}$, are related to the covariates $x_i^{(m)}$ through the logit function, and the component distributions are Poisson distributions with mean determined by the exposure, $t_i$, and by the Poisson rate $\lambda_j(x_i^{(r)}, \alpha_j)$, which is related to the covariates $x_i^{(r)}$ through an exponential function. If the observations can be classified into $c$ groups corresponding to the $c$ underlying states, the vector of unknown parameters $\alpha_j$ may be interpreted as the coefficients of the Poisson regression for group $j$. On the other hand, the unknown parameters $\beta$ may be interpreted as the coefficients of a multinomial regression in which $A_i$ and $x_i^{(m)}$ are the dependent and independent variables respectively.

Note that our model allows some or all components of $x_i^{(m)}$ and $x_i^{(r)}$ to be identical, and allows some coefficients of the Poisson rates, $\alpha_{js}$, to be constant across components, i.e., $\alpha_{js} = \alpha_s$ for $j = 1, \ldots, c$, or to be 0 for one or several covariates, i.e., $\alpha_{js} = 0$ for some $s$ and $j = 1, \ldots, c$.

Under the above assumptions the probability function of $Y_i$ satisfies
$$ f(y_i \mid x_i^{(r)}, x_i^{(m)}, t_i, \alpha, \beta) = \sum_{j=1}^{c} p_{ij}\,\mathrm{Po}(y_i \mid \lambda_{ij}), \tag{2.10} $$
where $p_{ij}$ and $\mathrm{Po}(y_i \mid \lambda_{ij})$ are given by (2.7), (2.8) and (2.9) respectively.

We may equivalently view the model as arising from the following sampling scheme: observations are independent; for observation $i$, component $j$ is chosen according to a multinomial distribution with probability $p_{ij}$; subsequently, $y_i$ is generated from a Poisson distribution with mean $\lambda_{ij}$.

A justification for the mixed Poisson regression models is to assume that the coefficient vector $\alpha$ in the usual Poisson regression model, $\log(\lambda) = \alpha'x$, is a random variable following a discrete distribution with $c$ points of support: $\Pr(\alpha = \alpha_j) = p_j$ for $j = 1, \ldots, c$. By making the further assumption that the $p_j$ are related to a covariate vector $x^{(m)}$ through a logit link $p_j(x^{(m)}, \beta)$, we are led to the model of equation (2.10).

Note that this model includes many previously studied models as special cases.

• Choosing $c = 1$ yields the Poisson regression model;
• Setting $x_i^{(r)} = x_i^{(m)} = 1$ and $t_i = 1$ for all $i$ yields an independent Poisson mixture model (Simar (1976) and Leroux (1989));
• Setting $x_i^{(m)} = 1$ yields an independent finite mixture of Poisson regressions. Further, letting the Poisson rates have common regression parameters and different intercepts yields a Poisson regression with a random intercept which follows a discrete mixing distribution;
• Setting $x_i^{(m)} = 1$, $c = 2$ and $\lambda_1(x_i^{(r)}, \alpha_1) \equiv 0$ yields a Poisson regression model with an extra mass at 0;
• Setting $x_i^{(r)} = 1$ yields an independent multinomial mixture of Poisson distributions with constant rates.

For the above model, the mean and variance of observation $Y_i$ are, respectively,
$$ E(Y_i) = E(E(Y_i \mid A_i)) = \sum_{j=1}^{c} p_{ij}\lambda_{ij} \tag{2.11} $$
and
$$ \mathrm{Var}(Y_i) = E(\mathrm{Var}(Y_i \mid A_i)) + \mathrm{Var}(E(Y_i \mid A_i)) = E(Y_i) + \Big\{\sum_{j=1}^{c} p_{ij}\lambda_{ij}^{2} - \Big(\sum_{j=1}^{c} p_{ij}\lambda_{ij}\Big)^{2}\Big\}. \tag{2.12} $$
Obviously, $\mathrm{Var}(E(Y_i \mid A_i)) = 0$ if and only if
$$ \lambda_{i1} = \lambda_{i2} = \cdots = \lambda_{ic}. \tag{2.13} $$
This implies that the mixture model is able to cope with extra-Poisson variation among $Y_1, \ldots$
, $Y_n$ due to heterogeneity in the population.

2.4.2 Identifiability

To be able to estimate the parameters of (2.10), it is important to establish identifiability of the model; that is, two sets of parameters in the mixture which do not agree after permutation cannot yield the same mixture distribution. Furthermore, identifiability is a necessary requirement for the usual asymptotic theory to hold for the estimation procedures considered later. For finite mixture models without covariates, identifiability is defined as follows.

Let $\mathcal{F} = \{F(x, \theta);\ \theta \in \Theta,\ x \in R^{d}\}$ be the class of $d$-dimensional distribution functions from which finite mixtures are to be formed. This class is identifiable if
$$ \sum_{j=1}^{c} p_j F(x, \theta_j) = \sum_{j=1}^{\tilde c} \tilde p_j F(x, \tilde\theta_j) \quad \text{for } x \in R^{d}, $$
where $\sum_{j} p_j = \sum_{j} \tilde p_j = 1$ and $p_j$, $\tilde p_j$ are positive, implies that $c = \tilde c$ and that we can order the summations such that $p_j = \tilde p_j$ and $F(x, \theta_j) = F(x, \tilde\theta_j)$, $j = 1, \ldots, c$. Note that if a class of models is not identifiable, we cannot discriminate between (at least two) parameter values using data generated by the model.

Without covariates, Teicher (1961) proves that the class of finite mixtures of Poisson distributions is identifiable. To take covariates into account, we extend the above definition of identifiability as follows.

Definition 1: Consider the collection of probability models $\{f(y_1 \mid x_1^{(r)}, x_1^{(m)}, t_1, \alpha, \beta), \ldots, f(y_n \mid x_n^{(r)}, x_n^{(m)}, t_n, \alpha, \beta)\}$, with the restriction that $\lambda_{i1} < \cdots < \lambda_{ic}$, parameter space $C \times \mathcal{A} \times \mathcal{B}$, sample spaces $\mathcal{Y}_1, \ldots, \mathcal{Y}_n$, and fixed covariate vectors $(x_1^{(r)}, x_1^{(m)}), \ldots, (x_n^{(r)}, x_n^{(m)})$, where $x_i^{(r)} \in R^{k_1}$ and $x_i^{(m)} \in R^{k_2}$ for $i = 1, \ldots, n$. The collection of probability models is identifiable if, for $(c, \alpha, \beta), (c^{*}, \alpha^{*}, \beta^{*}) \in C \times \mathcal{A} \times \mathcal{B}$,
$$ f(y_i \mid x_i^{(r)}, x_i^{(m)}, t_i, \alpha, \beta) = f(y_i \mid x_i^{(r)}, x_i^{(m)}, t_i, \alpha^{*}, \beta^{*}) \tag{2.14} $$
for all $y_i \in \mathcal{Y}_i$, $i = 1, \ldots$
, $n$, implies $(c, \alpha, \beta) = (c^{*}, \alpha^{*}, \beta^{*})$.

Note that the order restriction in the definition means that two models are equivalent if they agree up to permutations of parameters.

We now provide sufficient conditions for identifiability.

Theorem 1: The mixed Poisson regression model is identifiable if both matrices $X^{(m)}$ and $X^{(r)}$ are of full rank, where $X^{(m)} = (x_1^{(m)}, \ldots, x_n^{(m)})$ and $X^{(r)} = (x_1^{(r)}, \ldots, x_n^{(r)})$.

Proof: Suppose that $(c, \alpha, \beta)$ and $(c^{*}, \alpha^{*}, \beta^{*})$ satisfy (2.14). This implies that for each $i$ and all $y_i$,
$$ \sum_{j=1}^{c} p_{ij}\,\mathrm{Po}(y_i \mid \lambda_{ij}) = \sum_{j=1}^{c^{*}} p_{ij}^{*}\,\mathrm{Po}(y_i \mid \lambda_{ij}^{*}), \tag{2.15} $$
where $p_{ij} = p_j(x_i^{(m)}, \beta)$ and $\lambda_{ij} = t_i\lambda_j(x_i^{(r)}, \alpha_j)$ are defined above. Note that each side of (2.15) may be regarded as a finite Poisson mixture without covariates. Teicher's result implies that
$$ c = c^{*}, \quad p_{ij} = p_{ij}^{*} \quad \text{and} \quad \lambda_{ij} = \lambda_{ij}^{*} $$
for $i = 1, \ldots, n$ and $j = 1, \ldots, c$. By the definition of the model, we obtain
$$ \exp(\beta_j' x_i^{(m)}) = \exp(\beta_j^{*\prime} x_i^{(m)}) \quad \text{for } j = 1, \ldots, c-1, \tag{2.16} $$
$$ \exp(\alpha_j' x_i^{(r)}) = \exp(\alpha_j^{*\prime} x_i^{(r)}) \quad \text{for } j = 1, \ldots, c. \tag{2.17} $$
From (2.16) and (2.17) we obtain
$$ (\beta_j - \beta_j^{*})' x_i^{(m)} = 0 \quad \text{for } j = 1, \ldots, c-1 \text{ and } i = 1, \ldots, n, $$
$$ (\alpha_j - \alpha_j^{*})' x_i^{(r)} = 0 \quad \text{for } j = 1, \ldots, c \text{ and } i = 1, \ldots, n, $$
or
$$ (\beta_j - \beta_j^{*})' X^{(m)} = 0 \quad \text{for } j = 1, \ldots, c-1, \tag{2.18} $$
$$ (\alpha_j - \alpha_j^{*})' X^{(r)} = 0 \quad \text{for } j = 1, \ldots, c. \tag{2.19} $$
Sufficient conditions for identifiability are that both $X^{(m)}$ and $X^{(r)}$ are full rank matrices, in which case (2.18) and (2.19) imply that $(\alpha, \beta) = (\alpha^{*}, \beta^{*})$. We can assume this without loss of generality, even when it fails at first (as might be the case in an ANOVA structure), since if it does not hold we can reparameterize the model accordingly. □

2.5 Parameter Estimation for the mixed Poisson regression models

Finding the maximum likelihood estimates of the parameters in the mixed Poisson regression model requires an iterative algorithm. Two kinds of widely used algorithms can be applied to this case: (1) the EM algorithm due to Dempster, Laird and Rubin (1977) and (2) quasi-Newton algorithms (e.g., Nash 1990, and Dennis and Schnabel 1983). In this section we discuss how to find the estimates for the mixed Poisson regression model with a known number of components by combining both algorithms.
We also report the results of a Monte Carlo study which investigates the performance of our codes and some implementation issues which will be discussed later.

2.5.1 EM and Quasi-Newton Algorithms

For a fixed number of components $c$, we obtain maximum likelihood estimates of the parameters in the above model using the EM algorithm (Dempster, Laird and Rubin (1977)). As is now standard in mixture model estimation, we implement it by treating the unobservable component membership of the observations as missing data and constructing a complete data set for the model. We discuss the choice of the number of components below.

Suppose that $(Y, X^{(r)}, X^{(m)}, T) = \{(y_i, x_i^{(r)}, x_i^{(m)}, t_i);\ i = 1, \ldots, n\}$ is the observed data generated by the mixed Poisson regression model. Let $(Y, Z, X^{(r)}, X^{(m)}, T) = \{(y_i, z_i, x_i^{(r)}, x_i^{(m)}, t_i);\ i = 1, \ldots, n\}$ be the complete data for the mixture, where the unobserved quantity $z_i = (z_{i1}, \ldots, z_{ic})'$ satisfies
$$ z_{ij} = \begin{cases} 1 & \text{if } A_i = j, \\ 0 & \text{otherwise.} \end{cases} $$
The log likelihood for the complete data is
$$ l^{C}(\alpha, \beta \mid Y, Z, X^{(r)}, X^{(m)}, T) = \sum_{i=1}^{n}\sum_{j=1}^{c} z_{ij}\log(p_{ij}) + \sum_{i=1}^{n}\sum_{j=1}^{c} z_{ij}\log \mathrm{Po}(y_i \mid \lambda_{ij}), $$
where $p_{ij}$ and $\mathrm{Po}(y_i \mid \lambda_{ij})$ are given by (2.7), (2.8) and (2.9) respectively.

The EM approach finds the maximum likelihood estimates using an iterative procedure consisting of two steps: an E-step and an M-step. At the E-step, it replaces the missing data by its expectation conditional on the observed data. At the M-step, it finds the parameter estimates which maximize the expected log likelihood for the complete data, conditional on the expected values of the missing data. In our case, this procedure can be stated as follows.

E-step: Given $\alpha^{(0)}$ and $\beta^{(0)}$, replace the missing data $Z$ by its expectation conditional on these initial values of the parameters and the observed data, $(Y, X^{(r)}, X^{(m)}, T)$. In this case, the conditional expectation of the $j$th component of $z_i$ equals the probability
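that the observation $y_i$ was generated by the $j$th component of the mixture distribution, conditional on the parameters, the data and the covariates. Denote the conditional expectation of the $j$th component of $z_i$ by $\hat z_{ij}(\alpha^{(0)}, \beta^{(0)})$. Then
$$ \hat z_{ij}(\alpha^{(0)}, \beta^{(0)}) = E(z_{ij} \mid \alpha^{(0)}, \beta^{(0)}, Y, X^{(m)}, X^{(r)}, T) = \Pr(z_{ij} = 1 \mid \alpha^{(0)}, \beta^{(0)}, Y, X^{(m)}, X^{(r)}, T) = \frac{p_j(x_i^{(m)}, \beta^{(0)})\, f_j(y_i \mid x_i^{(r)}, t_i, \alpha_j^{(0)})}{\sum_{l=1}^{c} p_l(x_i^{(m)}, \beta^{(0)})\, f_l(y_i \mid x_i^{(r)}, t_i, \alpha_l^{(0)})}, \tag{2.20} $$
where $f_j(y_i \mid x_i^{(r)}, t_i, \alpha_j) = \mathrm{Po}(y_i \mid \lambda_{ij})$ is the $j$th component probability function in (2.9).

M-step: Given the conditional probabilities $\{\hat z_i(\alpha^{(0)}, \beta^{(0)}) = (\hat z_{i1}, \ldots, \hat z_{ic})';\ i = 1, \ldots, n\}$, obtain estimates of the parameters by maximizing, with respect to $\alpha$ and $\beta$,
$$ Q(\alpha, \beta \mid \alpha^{(0)}, \beta^{(0)}) = E\{l^{C}(\alpha, \beta \mid Y, Z, X^{(r)}, X^{(m)}, T) \mid Y, \alpha^{(0)}, \beta^{(0)}\} = Q_1 + Q_2, $$
where
$$ Q_1 = \sum_{i=1}^{n}\sum_{j=1}^{c} \hat z_{ij}(\alpha^{(0)}, \beta^{(0)})\log(p_{ij}) \quad \text{and} \quad Q_2 = \sum_{i=1}^{n}\sum_{j=1}^{c} \hat z_{ij}(\alpha^{(0)}, \beta^{(0)})\log(\mathrm{Po}(y_i \mid \lambda_{ij})). $$
The estimated parameters, $\hat\alpha$ and $\hat\beta$, satisfy the following M-step equations:
$$ \frac{\partial Q}{\partial\alpha}\Big|_{\hat\alpha,\hat\beta} = \frac{\partial Q_2}{\partial\alpha}\Big|_{\hat\alpha} = \sum_{i=1}^{n}\sum_{j=1}^{c} \hat z_{ij}\frac{\partial}{\partial\alpha}\left[\log(\mathrm{Po}(y_i \mid \lambda_{ij}))\right] = 0, \tag{2.21} $$
$$ \frac{\partial Q}{\partial\beta}\Big|_{\hat\alpha,\hat\beta} = \frac{\partial Q_1}{\partial\beta}\Big|_{\hat\beta} = \sum_{i=1}^{n}\sum_{j=1}^{c} \hat z_{ij}\frac{\partial}{\partial\beta}\left[\log(p_{ij})\right] = 0. \tag{2.22} $$
Since closed form solutions of these equations are unavailable, we use a quasi-Newton approach (Nash, 1990) to obtain estimates. This approach makes use of the function $Q$ and its gradient $g = (\partial Q/\partial\alpha, \partial Q/\partial\beta)'$ to find the estimates through the iterative formula
$$ (\hat\alpha, \hat\beta) = (\alpha, \beta) + kBg, \tag{2.23} $$
where $B$ is a transformation matrix evaluated at $(\alpha, \beta)$, and $k$ is the step length. Note that when $B$ in the above iterative equation equals the inverse Hessian matrix of the function $Q$, this is Newton's method. We implement the E and M steps in the following way to obtain parameter estimates.

Step 0: Specify starting values $\alpha^{(0)} = (\alpha_1^{(0)}, \ldots, \alpha_c^{(0)})$ and $\beta^{(0)} = (\beta_1^{(0)}, \ldots, \beta_{c-1}^{(0)})$ and two tolerances $\epsilon_0$ and $\epsilon$;

Step 1: (E-step) Compute $\hat z_{ij}$ ($i = 1, \ldots, n$; $j = 1, \ldots, c$) using (2.20). To avoid overflow in the calculation of $\hat z_{ij}$, we divide both the numerator and denominator in (2.20) by the largest term in the sum in the denominator;

Step 2: (M-step) Find values of $\hat\alpha$ and $\hat\beta$ to solve (2.21) and (2.22) using the quasi-Newton algorithm (Nash, 1990). This algorithm consists of two parts: a matrix updating formula for $B$ and a line search procedure for $k$ in (2.23).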
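Given $B$ and $w$ ($0 < w < 1$), it chooses $k = 1, w, w^{2}, \ldots$ successively until
$$ 0 < \epsilon_1 < [Q(\hat\alpha, \hat\beta) - Q(\alpha, \beta)]/t^{T}g \quad \text{for } \epsilon_1 \ll 1, $$
where $t = (\hat\alpha, \hat\beta) - (\alpha, \beta) = -kBg$ and $\epsilon_1$ is given. Given $t$, $B$ is updated by
$$ B \leftarrow B + d\,tt^{T} - [t(B\delta g)^{T} + (B\delta g)t^{T}]/t^{T}\delta g, $$
where $\delta g = g(\hat\alpha, \hat\beta) - g(\alpha, \beta)$ and $d = (1 + \delta g^{T}B\delta g/t^{T}\delta g)/t^{T}\delta g$. Initially, $B$ is set equal to an identity matrix $I$. Reset $B = I$ if any of the following occurs:
(a): $t^{T}g \ge 0$;
(b): $(\hat\alpha, \hat\beta) = (\alpha, \beta)$;
(c): $t^{T}\delta g \le 0$.
The stopping criterion for the iterations is
$$ \|(\hat\alpha, \hat\beta) - (\alpha, \beta)\| = \sum_{j=1}^{c}\sum_{l=1}^{k_1} |\hat\alpha_{jl} - \alpha_{jl}| + \sum_{j=1}^{c-1}\sum_{l=1}^{k_2} |\hat\beta_{jl} - \beta_{jl}| < \epsilon_2, $$
where $\epsilon_2$ is a very small positive number;

Step 3: If at least one of the following conditions is true, set $\alpha^{(0)} = \hat\alpha$ and $\beta^{(0)} = \hat\beta$, and go to Step 1; otherwise, stop.
(1) $\|\hat\alpha - \alpha^{(0)}\| = \sum_{j=1}^{c}\sum_{l=1}^{k_1} |\hat\alpha_{jl} - \alpha_{jl}^{(0)}| \ge \epsilon$;
(2) $\|\hat\beta - \beta^{(0)}\| = \sum_{j=1}^{c-1}\sum_{l=1}^{k_2} |\hat\beta_{jl} - \beta_{jl}^{(0)}| \ge \epsilon$;
(3) $|l(\hat\alpha, \hat\beta \mid Y, X^{(r)}, X^{(m)}, T) - l(\alpha^{(0)}, \beta^{(0)} \mid Y, X^{(r)}, X^{(m)}, T)| \ge \epsilon_0$, where $l(\alpha, \beta \mid Y, X^{(r)}, X^{(m)}, T)$ is the observed log likelihood function.

Note that we could have used other versions of the quasi-Newton algorithm which use different updating schemes for $B$.

Dempster, Laird and Rubin (1977) and Wu (1983) discuss the convergence properties of the EM algorithm in a general setting. Since $Q(\alpha, \beta \mid \alpha^{(0)}, \beta^{(0)})$ and its first order partial derivatives are continuous in $\alpha$, $\beta$, $\alpha^{(0)}$ and $\beta^{(0)}$, applying Wu's theorems (1983) in our case, we conclude that the sequence of observed data likelihood values $l(\alpha^{(p)}, \beta^{(p)} \mid Y, X^{(r)}, X^{(m)}, T)$ converges to a local maximum value $l(\alpha^{*}, \beta^{*} \mid Y, X^{(r)}, X^{(m)}, T)$, provided that it is not trapped at a saddle point. Furthermore, if $\|\alpha^{(p+1)} - \alpha^{(p)}\| \to 0$, $\|\beta^{(p+1)} - \beta^{(p)}\| \to 0$ and the set of local maxima with a given $l$ value is discrete, then $(\alpha^{(p)}, \beta^{(p)})$ converges to $(\alpha^{*}, \beta^{*})$. Note that for some starting values the stopping criteria in Step 3 above might not be valid. Also, $l(\alpha, \beta \mid Y, X^{(r)}, X^{(m)}, T)$ need not, in general, be globally concave. For these reasons, we need to choose initial values carefully in order to increase the chance that the algorithm converges to the global maximum.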
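The E- and M-steps above can be sketched in a few lines of Python for the simplest special case, an intercept-only mixture ($x_i^{(r)} = x_i^{(m)} = 1$), where the M-step has a closed form: each rate is a weighted mean of the counts and each mixing probability is an average of the posterior memberships. This is only a minimal illustration under that assumption; with covariates the M-step would need the numerical maximization described in Step 2. The function names, tolerance and toy data are all hypothetical.

```python
import math

def poisson_logpmf(y, mean):
    return y * math.log(mean) - mean - math.lgamma(y + 1)

def e_step(y, t, rates, probs):
    """Posterior membership probabilities, in the spirit of (2.20)."""
    post = []
    for yi, ti in zip(y, t):
        logs = [math.log(p) + poisson_logpmf(yi, ti * lam)
                for p, lam in zip(probs, rates)]
        top = max(logs)                      # largest term, divided out to avoid overflow
        weights = [math.exp(v - top) for v in logs]
        total = sum(weights)
        post.append([w / total for w in weights])
    return post

def m_step(y, t, post):
    """Closed-form maximizers for constant rates and mixing probabilities."""
    n, c = len(y), len(post[0])
    probs = [sum(row[j] for row in post) / n for j in range(c)]
    rates = [sum(row[j] * yi for row, yi in zip(post, y)) /
             sum(row[j] * ti for row, ti in zip(post, t))
             for j in range(c)]
    return rates, probs

def log_likelihood(y, t, rates, probs):
    """Observed-data log likelihood of the mixture."""
    total = 0.0
    for yi, ti in zip(y, t):
        logs = [math.log(p) + poisson_logpmf(yi, ti * lam)
                for p, lam in zip(probs, rates)]
        top = max(logs)
        total += top + math.log(sum(math.exp(v - top) for v in logs))
    return total

def em(y, t, rates, probs, tol=1e-8, max_iter=1000):
    history = [log_likelihood(y, t, rates, probs)]
    for _ in range(max_iter):
        post = e_step(y, t, rates, probs)
        rates, probs = m_step(y, t, post)
        history.append(log_likelihood(y, t, rates, probs))
        if abs(history[-1] - history[-2]) < tol:   # change in log likelihood
            break
    return rates, probs, history

# Toy data with two visible clusters of counts and unit exposure.
y = [0, 1, 0, 2, 1, 9, 11, 10, 8, 12]
t = [1.0] * len(y)
rates, probs, history = em(y, t, rates=[1.0, 8.0], probs=[0.5, 0.5])
```

The recorded `history` should never decrease, which is a convenient check that an EM implementation is behaving as the theory says it should.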
We will discuss our starting value approach later.

Note that the above EM algorithm does not directly yield estimates of the standard errors corresponding to the parameter estimates. On the other hand, when the number of components $c$ is known, asymptotic normality of $\sqrt{n}((\hat\alpha, \hat\beta) - (\alpha, \beta))$ is easily proved under standard regularity conditions (Lehmann, 1983). To approximate the standard errors, we compute $\sigma(\hat\alpha_{jl})$ and $\sigma(\hat\beta_{jl})$ from the diagonal elements of the inverse of the $(c \cdot k_1 + (c-1) \cdot k_2)$-dimensional observed information matrix, with $c$ fixed at $\hat c$, which is defined as
$$ I(\hat\alpha, \hat\beta \mid Y, X^{(r)}, X^{(m)}, T) = -\begin{pmatrix} \dfrac{\partial^{2} l}{\partial\alpha^{2}} & \dfrac{\partial^{2} l}{\partial\alpha\,\partial\beta} \\[2ex] \dfrac{\partial^{2} l}{\partial\beta\,\partial\alpha} & \dfrac{\partial^{2} l}{\partial\beta^{2}} \end{pmatrix}\Bigg|_{(\hat\alpha, \hat\beta)}. $$

An alternative to the EM algorithm which maximizes the observed log likelihood
$$ l(\alpha, \beta) = l(\alpha, \beta \mid Y, X^{(r)}, X^{(m)}, T) = \sum_{i=1}^{n}\log\Big\{\sum_{j=1}^{c} p_{ij}\,\mathrm{Po}(y_i \mid \lambda_{ij})\Big\} $$
is a quasi-Newton algorithm (e.g., Nash 1990). Instead of using the E and M steps, we maximize $l(\alpha, \beta)$ by computing successive parameter iterates via the formula
$$ (\alpha^{(p+1)}, \beta^{(p+1)}) = (\alpha^{(p)}, \beta^{(p)}) + kB_l g_l, $$
where $B_l$ is the transformation matrix evaluated at $(\alpha^{(p)}, \beta^{(p)})$, $g_l$ is the gradient of $l(\alpha, \beta)$ at $(\alpha^{(p)}, \beta^{(p)})$, and $k$ is a search step length. Note that the maximization of $l(\alpha, \beta)$ is different from maximizing the expected complete data log likelihood $Q(\alpha, \beta)$, though the quasi-Newton algorithm is applied in both cases.

In principle either the EM or the quasi-Newton algorithm can be used to produce the maximum likelihood estimates for the mixed Poisson regression model. The EM and quasi-Newton algorithms, however, have complementary strengths. The convergence rate of the EM algorithm is linear, which can be quite slow. In fact adjectives such as 'exceedingly' (McCullagh and Nelder 1989), 'maddeningly' (Redner and Walker 1984), and 'painfully' (Haberman 1977) have been used. As proven by Wu (1983), however, the EM algorithm converges to a stationary point regardless of the initial guess.
A quasi-Newton algorithm, on the other hand, often requires rather good initial guesses in order to converge, but its convergence rate in a neighborhood of the solution is much faster than that of the EM. The rate is superlinear for a quasi-Newton algorithm (and quadratic for Newton's method itself).

A sensible combination of these two algorithms is to use the EM until the iterates are in a neighborhood of the solution and finish up with the quasi-Newton algorithm. This is an obvious algorithm to propose, and similar suggestions have been made. Bock and Aitkin (1981) suggest performing a few EM steps and then one Newton-Raphson step. Dempster et al (1977) suggest using a Newton step, while Redner and Walker (1984) suggest switching to a quasi-Newton procedure at some point. Note that by using the quasi-Newton algorithm, we can obtain the approximate standard errors of the estimates as a by-product.

To combine the EM and the quasi-Newton algorithm for our case, we modify the above Step 3 as follows:

Step 3': (a) If at least one of the following conditions is true, set $\alpha^{(0)} = \hat\alpha$ and $\beta^{(0)} = \hat\beta$ and go to Step 1; otherwise, go to (b).
(1) $\|\hat\alpha - \alpha^{(0)}\| = \sum_{j=1}^{c}\sum_{l=1}^{k_1} |\hat\alpha_{jl} - \alpha_{jl}^{(0)}| \ge \epsilon$;
(2) $\|\hat\beta - \beta^{(0)}\| = \sum_{j=1}^{c-1}\sum_{l=1}^{k_2} |\hat\beta_{jl} - \beta_{jl}^{(0)}| \ge \epsilon$;
(3) $|l(\hat\alpha, \hat\beta \mid Y, X^{(r)}, X^{(m)}, T) - l(\alpha^{(0)}, \beta^{(0)} \mid Y, X^{(r)}, X^{(m)}, T)| \ge \epsilon_0$.
(b) Maximize the observed likelihood function $l(\alpha, \beta \mid Y, X^{(m)}, X^{(r)}, T)$ using the quasi-Newton algorithm (Nash, 1990) with $\hat\alpha$ and $\hat\beta$ as initial values. Then, stop.

2.5.2 Starting Values

We assume that $c$ is known. The first step of our approach divides the data, $\{y_1, \ldots, y_n\}$, into $c$ groups in terms of its percentiles and fits the data to a $c$-component independent Poisson mixture model without covariates by choosing initial values based on the
percentile information. The second step, if necessary, fits the data to a mixed Poisson regression model containing only one covariate in either the Poisson rates or the mixing probabilities, in such a way that the initial values of the parameters included in the previous mixture model equal the estimates of the corresponding parameters from the previously fitted model, and the initial values of the parameters not in the previously fitted model are set to a small value, say 0.00001. This process is iterated until a complete set of initial values for the mixture model is obtained. The motivation for this ad hoc approach is based on the idea of cluster analysis. At each iteration, we use different criteria to classify the data. First, the data are classified in terms of their percentiles. Then the data are classified in terms of the independent Poisson mixture model, and subsequently in terms of mixed Poisson regression models. Note that choosing a complete set of initial values for a mixture model step by step in this way guarantees that the likelihood values will increase at each step. Also, our approach obtains maximum likelihood estimates for a sequence of nested mixture models.

We use an example to explain this approach. Suppose that we need to choose initial values to fit a 3-component mixture model with covariates $x_i^{(r)} = (1, d_i)$ and $x_i^{(m)} = (1, e_i)$, where $d_i$ and $e_i$ are real numbers. First, we find the 16.5, 33.0, 49.5, 66.0 and 82.5 percentiles of the observations $\{y_1, \ldots, y_n\}$, denoted $q_1$ to $q_5$ respectively, and fit the data to a 3-component independent Poisson mixture model ($x_i^{(r)} = x_i^{(m)} = (1)$) with the initial values of $\alpha_{1,1}$, $\alpha_{2,1}$ and $\alpha_{3,1}$ equal to $\log(q_1)$, $\log(q_3)$ and $\log(q_5)$ respectively, and both the initial values of $\beta_{1,1}$ and $\beta_{2,1}$ equal to 0. Note that under this specification and an exponential link function, the initial values of $\lambda_j(x_i^{(r)}, \alpha_j)$ ($j = 1, 2, 3$) are equal to $q_1$, $q_3$ and $q_5$, with the same mixing probabilities 1/3.
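The first step of this example can be sketched in Python as follows. The percentile rule (nearest rank) and the `floor` used when a percentile is 0 are assumptions of this sketch: the text does not say which percentile definition is used or how zero percentiles are handled before taking logarithms. The function names are hypothetical.

```python
import math

def percentile(sorted_values, q):
    """Nearest-rank percentile, one of several common definitions."""
    rank = max(1, math.ceil(q / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]

def three_component_start(y, floor=0.5):
    """Starting values for a 3-component independent Poisson mixture.

    Returns the initial intercepts (log q1, log q3, log q5) and the two
    initial mixing coefficients, both 0, so each component starts with
    mixing probability 1/3.
    """
    ys = sorted(y)
    q1, q3, q5 = (percentile(ys, level) for level in (16.5, 49.5, 82.5))
    # log(0) is undefined, so a zero percentile is replaced by `floor`
    # (an assumption of this sketch, not a rule taken from the thesis).
    alphas = [math.log(max(q, floor)) for q in (q1, q3, q5)]
    betas = [0.0, 0.0]
    return alphas, betas

alphas, betas = three_component_start(list(range(1, 101)))
start_probs = [math.exp(b) / (1.0 + sum(math.exp(v) for v in betas)) for b in betas]
```

For the counts 1 to 100 this gives initial rates of 17, 50 and 83, and the first two mixing probabilities both equal 1/3, leaving 1/3 for the third component.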
Second, we fit the data to the 3-component Poisson mixture model with $x_i^{(r)} = (1, d_i)$ and $x_i^{(m)} = (1)$ by choosing the initial values of $\alpha_{1,2}$, $\alpha_{2,2}$ and $\alpha_{3,2}$ equal to 0.00001 and the initial values of the other parameters equal to the estimates of the corresponding parameters of the first fitted model. Finally, we choose initial values for the 3-component Poisson mixture model with $x_i^{(r)} = (1, d_i)$ and $x_i^{(m)} = (1, e_i)$ in such a way that $\beta_{1,2}$ and $\beta_{2,2}$ are equal to 0.00001 and the other parameters are equal to the estimates of the corresponding parameters of the second fitted model.

2.6 A Monte Carlo Study

This section consists of two parts. In the first part, we use Monte Carlo methods to examine the performance of the above algorithm. In particular, we wish to verify the reliability of our code, determine the precision of the estimates and investigate some model selection criteria. We use three 3-component mixture models. For each, we analyze 100 replicates, each with 100 observations. In the second part, we use Monte Carlo methods to study how the mixed Poisson regression models can be used to analyze some typical problems in practice. We also fit the simulated data to Poisson regression models and compare them with the mixed Poisson regression models.

2.6.1 Performance of the Estimation Algorithm

Two different approaches for choosing initial values are compared in the study. In one, we use the true parameter values of the model generating the observations as initial values in order to determine the performance of the algorithm in the best case. The other uses the true parameter values of $\alpha_{1,1}$, $\alpha_{2,1}$ and $\alpha_{3,1}$ as initial values, chooses initial values of $\beta_{1,1}$ and $\beta_{2,1}$ according to the approach described in section 2.5.2, and fits the samples to a 3-component independent Poisson mixture model. Then, following the approach of section 2.5.2, we choose a complete set of initial values for the parameters of the model generating the samples.
These two approaches to choosing initial values lead to essentially the same estimates. We describe the details below.

Model 1: A model with Poisson rates depending on one time-dependent covariate, with constant mixing probabilities and $t_i = 1$. For the regression part,
\[ x_i^{(r)} = (1, d_i), \tag{2.24} \]
where $d_i = 0.2$ for $i = 1, \ldots, 10$, $d_i = 0.4$ for $i = 11, \ldots, 20$, etc., and
\[ \alpha = (\alpha_1, \alpha_2, \alpha_3), \tag{2.25} \]
where $\alpha_1 = (2.8, -2.9)$, $\alpha_2 = (2.6, 0.4)$ and $\alpha_3 = (3.6, 0.2)$. For the mixing part, $x_i^{(m)} = 1$ and $\beta = (\beta_1, \beta_2) = (1.1, 0.6)$. For the Poisson rates, we choose an exponential link function defined by
\[ \lambda_1(x_i^{(r)}, \alpha_1) = \exp(2.8 - 2.9 d_i), \tag{2.26} \]
\[ \lambda_2(x_i^{(r)}, \alpha_2) = \exp(2.6 + 0.4 d_i), \tag{2.27} \]
\[ \lambda_3(x_i^{(r)}, \alpha_3) = \exp(3.6 + 0.2 d_i), \tag{2.28} \]
and the mixing probabilities are $p_1(x_i^{(m)}, \beta) = 0.5156$, $p_2(x_i^{(m)}, \beta) = 0.3127$ and $p_3(x_i^{(m)}, \beta) = 0.1717$.

Model 2: A model with constant Poisson rates and mixing probabilities depending on one time-dependent covariate. That is, for the regression part, $x_i^{(r)} = 1$ and $\alpha = (\alpha_1, \alpha_2, \alpha_3) = (0.4, 3.0, 2.0)$, and for the mixing part,
\[ x_i^{(m)} = (1, d_i), \tag{2.29} \]
where $d_i$ is defined as above, and
\[ \beta = (\beta_1, \beta_2), \tag{2.30} \]
where the intercepts are $(\beta_{1,1}, \beta_{2,1}) = (2.0, -1.4)$ and the slopes are $(\beta_{1,2}, \beta_{2,2}) = (-2.0, 1.5)$. The Poisson rates, then, are $\lambda_1(x_i^{(r)}, \alpha_1) = 1.49$, $\lambda_2(x_i^{(r)}, \alpha_2) = 20.08$ and $\lambda_3(x_i^{(r)}, \alpha_3) = 7.39$, and the mixing probabilities are given by
\[ p_1(x_i^{(m)}, \beta) = \frac{\exp(2.0 - 2.0 d_i)}{\exp(2.0 - 2.0 d_i) + \exp(-1.4 + 1.5 d_i) + 1}, \tag{2.31} \]
\[ p_2(x_i^{(m)}, \beta) = \frac{\exp(-1.4 + 1.5 d_i)}{\exp(2.0 - 2.0 d_i) + \exp(-1.4 + 1.5 d_i) + 1}, \tag{2.32} \]
\[ p_3(x_i^{(m)}, \beta) = \frac{1}{\exp(2.0 - 2.0 d_i) + \exp(-1.4 + 1.5 d_i) + 1}. \tag{2.33} \]

Model 3: Both the Poisson rates and the mixing probabilities depend on the covariate $d_i$. For the regression part, $x_i^{(r)}$, $\alpha$ and $\lambda_j(x_i^{(r)}, \alpha_j)$ are given by (2.24), (2.25), (2.26), (2.27) and (2.28) respectively. For the mixing part, $x_i^{(m)}$, $\beta$ and $p_j(x_i^{(m)}, \beta)$ are given by (2.29), (2.30), (2.31), (2.32) and (2.33) respectively.

We chose the above parameter values so that the Poisson rate functions do not cross each other and the ranges of the mixing probabilities for each component do not overlap. We would expect the algorithm to perform well in this case.

We carried out these simulations, each with 100 replicates. In each case, the responses $y_i$ were obtained by first generating a uniform (0,1) random number $u$ and then assigning $y_i \sim \text{Poisson}(\lambda_1(x_i^{(r)}, \alpha_1))$ if $u \le p_1(x_i^{(m)}, \beta)$, $y_i \sim \text{Poisson}(\lambda_2(x_i^{(r)}, \alpha_2))$ if $p_1(x_i^{(m)}, \beta) < u \le p_1(x_i^{(m)}, \beta) + p_2(x_i^{(m)}, \beta)$, or $y_i \sim \text{Poisson}(\lambda_3(x_i^{(r)}, \alpha_3))$ if $u > p_1(x_i^{(m)}, \beta) + p_2(x_i^{(m)}, \beta)$.

The results of the Monte Carlo study are presented in Table 2.1. The table shows that for each parameter the mean of the estimates is very close to the true value in the models, suggesting that the global maximum of the observed likelihood is reached. For Model 1, the sample means are quite close to the true values and the standard deviations are relatively small. Although the Poisson rates of Model 2 are estimated accurately, the estimates of the mixing probabilities are more variable. This suggests that estimating mixing probability parameters in this model is intrinsically more difficult than estimating Poisson rates, which agrees with observations in the literature (Titterington et al., 1985; McLachlan and Basford, 1988). Estimates of the parameters of Model 3 show the same pattern as in Model 2: estimates of the mixing probability parameters are more variable than those of the Poisson rate parameters. Note, however, that although the estimates of the mixing probability parameters, $\hat\beta$, vary somewhat, the estimated mixing probabilities, $p_j(x_i^{(m)}, \hat\beta)$, are more precise because of the multinomial link function between the parameters and the mixing probabilities.

Our implementation of the algorithm used FORTRAN on a Sun SPARCstation 1. The average number of iterations of the EM algorithm is 4.75 for Model 1, 4.93 for Model 2 and 55.6 for Model 3 under a stopping criterion of 0.01, and the average time is 6.65, 7.39 and 79.2 seconds respectively.

2.6.2 The Mixed Poisson Regression Models for Some Typical Problems

In a clinical trial it may not be uncommon for a treatment to have a significant effect on some subjects but not on others. Thus subjects under treatment may be classified into two groups: responding and non-responding. Models which ignore this distinction are often unable to detect such a treatment effect. For example, in a clinical trial carried out at British Columbia Children's Hospital which investigated the effect of intravenous gammaglobulin (IVIG) on suppression of epileptic seizures, the clinical investigators conducting the study found that some patients responded to the treatment and others did not. Using Poisson regression to analyze the seizure count data, we found that the data are seriously overdispersed. To explore whether the proposed mixed Poisson regression models can be used to describe and analyze such a scenario, we carried out the following Monte Carlo study.

In the study, we used eight 2-component mixed Poisson regression models in which the mixing probabilities are constants $p_1$ and $p_2$, and the Poisson rates are defined by
\[ \lambda_1(x_i, \alpha_1, \alpha_2) = \exp(\alpha_1 + \alpha_2 x_i) \quad \text{and} \quad \lambda_2(x_i, \alpha_1) = \exp(\alpha_1), \]
where $x_i = 1$ if $i \le 50$ and 0 otherwise, and $i = 1, \ldots, 100$.
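As an illustration of how one replicate from such a two-component model could be generated, the sketch below reuses the uniform-threshold assignment described above for the three-component models. It is written in Python rather than the FORTRAN used in the thesis, and the function names, the seed handling and the choice of Knuth's multiplication method for the Poisson draw are assumptions made here.

```python
import math
import random

def poisson_draw(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit = math.exp(-rate)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate(p1, alpha1, alpha2, n=100, n_treated=50, seed=0):
    """One replicate: x_i = 1 for the first n_treated subjects, 0 otherwise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(1, n + 1):
        x = 1 if i <= n_treated else 0
        u = rng.random()
        if u <= p1:
            rate = math.exp(alpha1 + alpha2 * x)  # component 1
        else:
            rate = math.exp(alpha1)               # component 2 (background)
        xs.append(x)
        ys.append(poisson_draw(rate, rng))
    return xs, ys
```

For control subjects ($x_i = 0$) both components have the rate $\exp(\alpha_1)$, so the component assignment only matters in the treatment group.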
This model describes the following situation: there are 50 subjects in each of two groups (e.g., treatment and control groups) in a study which records the observed responses for all subjects; the background effects are characterized by the Poisson rate $\exp(\alpha_1)$; $100 p_1$% of the subjects in the treatment group respond to the treatment, which has an effect characterized by the Poisson rate $\exp(\alpha_1 + \alpha_2)$ where $\alpha_2 < 0$; and the other $100 p_2$% of the subjects in the treatment group do not respond to the treatment, and their responses are the same as the background effects. The eight models in the study are defined by choosing all combinations of parameter values from the following: $p_1 = 0.6, 0.4$; $\alpha_1 = 1.0, 2.0$; and $\alpha_2 = -0.5, -2.5$. Note that the actual Poisson rates of the background effects, $\exp(\alpha_1)$, are 2.7183 and 7.3891 respectively, and the rates of the treatment effects, $\exp(\alpha_1 + \alpha_2)$, are 1.6487, 4.4817, 0.2231 and 0.6065 respectively.

We carried out these simulations, each with 200 replicates. The responses $y_i$ were obtained by first generating a uniform (0,1) random number $u$ and then assigning $y_i \sim \text{Poisson}(\lambda_1(x_i, \alpha_1, \alpha_2))$ if $u \le p_1$ and $y_i \sim \text{Poisson}(\lambda_2(x_i, \alpha_1))$ otherwise. Our implementation of the algorithm used FORTRAN on a Sun SPARCstation 1.

The results are reported in Table 2.2 and Table 2.3, which summarize the properties of the estimated coefficients. For all eight models the means of $\hat\alpha_1$, $\hat\alpha_2$ and $\hat p_1$ are very close to their true values, and their sample standard deviations are very small compared with the magnitudes of the estimates. This means that the maximum likelihood estimates are achievable and robust not only to different choices of background and treatment effects but also to different choices of response rates. Since the means and medians of the parameter estimates are very close, and the upper and lower quartiles are roughly symmetric about the means, the parameter estimates follow approximately normal distributions.
Indeed, the histograms of the parameter estimates (not given here) show normal distribution patterns.

To investigate the treatment effect, we test the hypothesis $\alpha_2 = 0$ by computing the likelihood ratio test statistic. Note that the chi-squared approximation for the likelihood ratio test statistic may not be justified here because the regularity conditions may not be satisfied on the boundary. We use it in these cases as a guideline. The test results are summarized in Table 2.4, in which the numerator in each cell is the number of times that we reject the hypothesis at the 5% significance level, and the denominator is the total number of replicates. Clearly, when the treatment effect is highly significant ($\alpha_2 = -2.5$), we reject the hypothesis $\alpha_2 = 0$ for almost all replicates at the 5% significance level. This means the likelihood ratio test may work well in these cases. On the other hand, when the treatment effect is small ($\alpha_2 = -0.5$), the likelihood ratio test may not be appropriate, partly because the difference between the background and treatment effects may not be large enough for the test. The baseline effects may not affect the tests significantly, while the mixing probabilities (response rates) have some impact on the tests. Note that when $p_1 = 0.4$, there may be only 20 subjects out of 200 who have a significant treatment effect.

In order to compare the mixed Poisson regression model with Poisson regression, we fitted the simulated data with the Poisson regression model with covariate $(1, x_i)$. The results are summarized in Table 2.5. The means of the intercept estimates in the Poisson regression are very close to the true values in these cases, suggesting that the background effects are appropriately estimated. However, the treatment effects are seriously underestimated in these cases because the model cannot distinguish the non-responding subjects from the responding subjects.
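A short calculation, added here as an aside rather than taken from the thesis, indicates how large this underestimation should be. Under the two-component model the mean response is $e^{\alpha_1}$ in the control group and $e^{\alpha_1}(p_1 e^{\alpha_2} + p_2)$ in the treatment group. A Poisson regression with an intercept and a binary covariate reproduces the two group means, so its slope estimates $\log(p_1 e^{\alpha_2} + p_2)$ rather than $\alpha_2$. The Python sketch below evaluates this limiting value for the four combinations of $p_1$ and $\alpha_2$; the function name is hypothetical and the printed numbers are theoretical limits, not simulation results.

```python
import math

def attenuated_slope(p1, alpha2):
    """Large-sample limit of the ordinary Poisson regression slope when only
    a fraction p1 of the treated subjects respond to the treatment."""
    return math.log(p1 * math.exp(alpha2) + (1.0 - p1))

for alpha2 in (-0.5, -2.5):
    for p1 in (0.6, 0.4):
        limit = attenuated_slope(p1, alpha2)
        print(f"alpha2={alpha2:5.1f}  p1={p1:.1f}  limiting slope={limit:.4f}")
```

The limit does not depend on $\alpha_1$, and it equals $\alpha_2$ only when $p_1 = 1$, that is, when every treated subject responds.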
For example, in the two cases of the low treatment ($\alpha_2 = -0.5$) and low background ($\alpha_1 = 1.0$) effect, the estimate of the treatment effect by the Poisson regression is $-0.2668$ for the mixing probability $p_1 = 0.6$ and $-0.1611$ for $p_1 = 0.4$, which are about one half and one quarter of the true parameter value respectively. In the two cases of the high treatment ($\alpha_2 = -2.5$) and high background ($\alpha_1 = 2.0$) effect, the estimate of the treatment effect by the Poisson regression is $-0.8065$ for the mixing probability $p_1 = 0.6$ and $-0.4536$ for $p_1 = 0.4$, which are less than one quarter and one fifth of the true value respectively. We also tested the hypothesis $\alpha_2 = 0$ using the likelihood ratio test statistic. The test results are given in Table 2.6, in which the numerator in each cell is the number of times that we reject the hypothesis, and the denominator is the total number of tests. For example, in the two cases of the low background ($\alpha_1 = 1.0$) and low treatment ($\alpha_2 = -0.5$) effect, we reject the hypothesis at the 5% significance level 99 times out of 200 for mixing probability $p_1 = 0.6$ and 47 times out of 200 for $p_1 = 0.4$. In the two cases of the high background ($\alpha_1 = 2.0$) and high treatment ($\alpha_2 = -2.5$) effect, we always reject the hypothesis at the 5% significance level for both values of the mixing probability. Note that the Poisson regression is more powerful except in one case, although Table 2.4 and Table 2.6 have a similar pattern.

Using the mixed Poisson regression model, we can classify subjects as responding and non-responding. In the Monte Carlo study, for $x_i = 1$, $y_i$ is identified with group one, generated by the Poisson rate $\lambda_1(x_i, \alpha_1, \alpha_2)$, if the estimated posterior probability of belonging to group one exceeds 0.5, and with the other group, generated by the Poisson rate $\lambda_2(x_i, \alpha_1)$, otherwise.
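The classification rule just described can be sketched as follows for a treated subject. This Python fragment is illustrative only: the function names are hypothetical, and for simplicity it plugs in given parameter values, whereas the study uses the maximum likelihood estimates.

```python
import math

def poisson_pmf(y, rate):
    """Poisson probability of count y, computed on the log scale."""
    return math.exp(y * math.log(rate) - rate - math.lgamma(y + 1))

def posterior_group_one(y, p1, rate1, rate2):
    """Posterior probability that y was generated by component one."""
    w1 = p1 * poisson_pmf(y, rate1)
    w2 = (1.0 - p1) * poisson_pmf(y, rate2)
    return w1 / (w1 + w2)

def classify(y, p1, alpha1, alpha2):
    """Return 1 (responding) if the posterior probability of group one
    exceeds 0.5, and 2 (non-responding) otherwise; for subjects with x_i = 1."""
    post = posterior_group_one(y, p1, math.exp(alpha1 + alpha2), math.exp(alpha1))
    return 1 if post > 0.5 else 2
```

With $p_1 = 0.6$, $\alpha_1 = 2.0$ and $\alpha_2 = -2.5$, for example, a count of 0 is assigned to the responding group and a count of 8 to the non-responding group, since the two component rates (about 0.61 and 7.39) are well separated.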
For the 200 replicates the mean number of subjects in the treatment group classified as responding to the treatment is very close to $50 p_1$, suggesting that the classification criterion works well.

2.7 Implementation Issues

2.7.1 Model Selection

We need to address the following two issues when applying a mixed Poisson regression model: (a) we must determine the number of components $c$, and (b) we must have a method to carry out inference about model parameters. When $c$ is known, inference for the parameters can be based on a likelihood ratio test. In practice, however, this is rarely the case. When $c$ is unknown, the likelihood ratio test is no longer valid for determining $c$ or testing hypotheses about parameter values. This is because the usual regularity conditions do not hold for the likelihood ratio test statistic to have its standard asymptotic null distribution of chi-squared with degrees of freedom equal to the difference between the number of parameters under the null and alternative hypotheses. One of the regularity conditions requires that the parameters in a mixture be identifiable without any restriction. This ensures that the information matrix is non-singular. The main problem here is the lack of identifiability even when the class of mixed Poisson regression models is identifiable. To illustrate this, following McLachlan and Basford (1988), consider a 2-component mixture without covariates. The null hypothesis that there is one underlying population, $H_0 : c = 1$, can be approached by testing whether $p_1 = 1$, which is on the boundary of the parameter space with a consequent breakdown in the standard regularity conditions.
Alternatively, we can view $H_0$ as testing whether $\lambda_1 = \lambda_2$, where now the value of $p_1$ is irrelevant. If for a specified value of $p_1$ the regularity conditions held, so that the log likelihood ratio test statistic under $H_0$ were distributed asymptotically as chi-squared, then the null asymptotic distribution of the likelihood ratio test statistic where $p_1$ is unspecified would correspond to the maximum of a set of dependent chi-squared variables. A comprehensive account of the breakdown in regularity conditions has been given by Ghosh and Sen (1985); see also Hartigan (1985a,b), Titterington, Smith and Makov (1985), and McLachlan and Basford (1988). We propose the following methods for model selection.

In general, there are two criteria used for statistical model selection: the principle of parsimony and closeness to the true distribution. The former means that more parsimonious use of parameters should be pursued so as to raise the accuracy of estimates for unknown parameters in a model. On the other hand, closeness to the true model is incompatible with parsimony of parameters. These two criteria form a trade-off: if one pursues one of the criteria, the other must necessarily be sacrificed. The multiple correlation coefficient adjusted for the degrees of freedom may be the most commonly used statistic that incorporates these two incompatible criteria into a single statistic.

Akaike (1973) has proposed a more general as well as more widely applicable statistic that ingeniously incorporates the above two criteria. As it is based on the Kullback-Leibler Information Criterion (KLIC), Akaike's statistic is called the Akaike Information Criterion and is abbreviated as the AIC. The AIC can be derived as follows.

Suppose that the adequacy of a postulated model $F(y \mid \theta)$ to approximate the unknown true distribution $G(y)$ is measured by the KLIC
\[ I(G : F(\cdot \mid \theta)) = E_G\left[\log \frac{g(Y)}{f(Y \mid \theta)}\right], \]
where $\theta$ is a finite-dimensional vector of unknown parameters; $g$ and $f$ are density (or probability) functions of $G$ and $F$ respectively; and $E_G(\cdot)$ stands for expectation with respect to the true distribution $G$. We define a pseudo-true model $F(\cdot \mid \theta_0)$ with a parameter value $\theta_0$ such that
\[ I(G : F(\cdot \mid \theta_0)) \le I(G : F(\cdot \mid \theta)) \]
for any possible $\theta$ in the admissible parameter space. The model $F(\cdot \mid \theta_0)$ may be regarded as the most adequate within the family of models $F(y \mid \theta)$ in the sense that the KLIC for $F(y \mid \theta)$ is minimized by $F(y \mid \theta_0)$.

Assuming that $I(G : F(\cdot \mid \theta_0)) = O(n^{-1})$, i.e., the pseudo-true model is nearly true, Akaike (1973) derives
\[ \text{AIC}(F(\cdot \mid \hat\theta)) = -2 \log f(y \mid \hat\theta) + 2k \]
as an almost unbiased estimate of $-2 E_G[\log f(Y \mid \theta_0)]$, where $\hat\theta$ is the maximum likelihood estimate of $\theta$ based on the observation $y$ and $k$ is the number of unknown parameters, i.e., the dimension of $\theta$. Note that the first term of the AIC measures the goodness-of-fit of the model to a given set of data, because $f(y \mid \hat\theta)$ is the maximized likelihood function. The second term is interpreted as a penalty that should be paid for increasing the number of parameters. In this sense the AIC may be regarded as an explicit formulation of the so-called principle of parsimony in model building.

Schwarz (1978) has proposed another model selection criterion: the Bayesian Information Criterion (BIC). The BIC is defined through a large-sample version of Bayes procedures by placing a prior distribution on the parameter space including all dimensions and models considered. It can be derived as follows.

We assume that observations are generated by a distribution from a family with a density
\[ f(y, \theta) = \exp(\theta \cdot x(y) - b(\theta)), \]
where $\theta \in \Theta$, a convex subset of the $K$-dimensional Euclidean space, and $x(y)$ is the sufficient $K$-dimensional statistic.
The competing models are denoted by sets $m_j \cap \Theta$, where $m_j$ is a $k_j$-dimensional linear submanifold of $K$-dimensional space.

Since the a priori distribution need not be known exactly for the asymptotic results, we assume that it is of the form $\sum_j \alpha_j \mu_j$, where $\alpha_j$ is the a priori probability of the $j$th model being the true one, and $\mu_j$, the conditional a priori distribution of $\theta$ given the $j$th model, has a $k_j$-dimensional density that is bounded and locally bounded away from zero throughout $m_j \cap \Theta$.

Finally, we assume a fixed penalty for guessing the wrong model. Under this assumption, the Bayes solution consists of selecting the model that is a posteriori most probable. That is equivalent to choosing the $j$ that maximizes
\[ S(X, n, j) = \log \int \alpha_j \exp\{(X \cdot \theta - b(\theta)) n\} \, d\mu_j(\theta), \]
where the integral extends over $m_j \cap \Theta$, and $X$ is the averaged $x$-statistic $(1/n) \sum_i x(y_i)$. For fixed $X$ and $j$, as $n$ tends to infinity, we obtain the asymptotic expansion of $S(X, n, j)$ as
\[ S(X, n, j) = n \sup_{\theta \in m_j \cap \Theta} (X \cdot \theta - b(\theta)) - \tfrac{1}{2} k_j \log n + R, \]
where the remainder $R = R(X, n, j)$ is bounded in $n$ for fixed $X$ and $j$. Therefore, for a large sample, maximizing $S(X, n, j)$ in $j$ is equivalent to maximizing
\[ \text{IC} = \log f_j(y_1, \ldots, y_n) - \tfrac{1}{2} k_j \log n, \]
where $f_j(y_1, \ldots, y_n)$ is the maximized likelihood function for model $j$, and $k_j$ is the dimension of the model.

Qualitatively, both the AIC and BIC give a mathematical formulation of the principle of parsimony in model building. Quantitatively, since the BIC differs from the AIC only in that the dimension is multiplied by $(\log n)/2$, the BIC leans more than the AIC towards lower-dimensional models. For large numbers of observations the two model selection procedures differ markedly from each other.

McLachlan and Basford (1988) discussed the use of AIC to determine the number of components in a finite mixture model. Leroux and Puterman (1992) applied AIC and BIC to select independent Poisson mixture models.
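The two general criteria derived above can be compared numerically with a small sketch. The Python below is only an illustration; the function names are hypothetical, and any log-likelihood values passed to them would have to come from a fitted model.

```python
import math

def aic(loglik, k):
    """Akaike's criterion, -2 log f + 2k; smaller values are preferred."""
    return -2.0 * loglik + 2.0 * k

def schwarz_ic(loglik, k, n):
    """Schwarz's criterion, log f - (k/2) log n; larger values are preferred."""
    return loglik - 0.5 * k * math.log(n)

def penalty_ratio(n):
    """Per-parameter penalty of Schwarz's criterion relative to the AIC,
    both expressed on the log-likelihood scale: (log n) / 2."""
    return 0.5 * math.log(n)
```

On the log-likelihood scale the AIC charges one unit per parameter and Schwarz's criterion charges $(\log n)/2$, so the latter is the stricter of the two once $n > e^2 \approx 7.4$, that is, from $n = 8$ onwards.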
We define the AIC and BIC criteria for the mixed Poisson regression model as follows:
• AIC: choose the model for which $l_c(X) - a_c(X)$ is largest;
• BIC: choose the model for which $l_c(X) - \tfrac{1}{2}(\log n)\, a_c(X)$ is largest,
where $l_c(X)$ is the maximum log-likelihood of the mixture with $c$ components and covariate $X$, $a_c(X) = c k_1 + (c - 1) k_2$, where $k_1$ and $k_2$ are the dimensions of $\alpha_j$ and $\beta_j$ respectively, and $n$ is the total number of observations. As discussed above, these two criteria do not always select the same model; the BIC tends to select a smaller number of components than the AIC when there are 8 or more observations.

Using the BIC (AIC), our model selection approach consists of two stages. At the first stage, we determine $c$ to maximize BIC (AIC) values for the saturated 1-3 (1-4) component mixture models that contain all possible covariates in both rates and mixing probabilities. Although we compute both AIC and BIC values in our applications, we recommend using BIC because the Monte Carlo studies reported below suggest that BIC is more reliable for model selection. At the second stage, our model selection approach depends on the objectives of the analysis. If our goal is inference about some particular model parameter, we carry out likelihood ratio tests for nested $c$-component mixture models. If the goal is choosing an appropriate model to fit the data, we select the model that maximizes the BIC (AIC) value among the $c$-component mixture models concerned. Since this selection method is heuristic and only gives a guideline in applications, other specific concerns in model selection should be taken into account from case to case. For instance, in some applications the number of components and some parameters in a mixture may be explicitly or implicitly determined by underlying theory, especially when a mixture model is intended as a direct representation of the underlying physical phenomenon.
For a housing market in disequilibrium, the market has two phases: supply and demand. If we regard the phase in operation in any given month as the unobservable underlying state, because it may not be clear which phase is in operation, we have a two-component mixture model. Goldfeld and Quandt (1973) discuss such a model and call it a switching regression model.

In the Monte Carlo studies discussed in Section 2.6.1, we computed both AIC and BIC values for all possible mixed 2 to 4 component models. Table 2.7 shows that AIC and BIC are reliable methods for choosing the correct models. AIC chose the correct model 96% of the time for Model 1, 87% of the time for Model 2 and 91% of the time for Model 3. When AIC failed to select the correct model, it always chose a model with too many components, suggesting that AIC may under-penalize the number of parameters in the mixtures. On the other hand, BIC always chose the correct models, suggesting that BIC may not over-penalize the number of parameters. Note that all sample sizes in the Monte Carlo studies are 100. The examples in the next section will exhibit this procedure in practice.

2.7.2 Classification

In classification, the number and composition of groups are not known at the start of the investigation. On occasion, the aim of a classification study may be to enable the subsequent assignment of new objects. For instance, in pattern recognition (Fukunaga, 1972, and Duda and Hart, 1973), information about 'patterns' can be obtained from a 'training' set of observations which may be analyzed by a classification method.

In fitting the mixed Poisson regression models to Poisson-distributed data, we assume that each observation belongs to one of $c$ groups characterized by the Poisson rate functions. One possible use of the mixed Poisson regression model is to classify data on the basis of a probabilistic model rather than an ad hoc clustering technique.
Since the quantity defined in (2.20) is the estimated posterior probability that the $i$th observation $y_i$ is generated by the $j$th component distribution $f_j(y_i \mid x_i^{(r)}, \hat\alpha_j)$, this information can be used to classify observations into different groups characterized by the component distributions. For instance, for a $c$-component mixture model we may postulate $c$ different groups defined by the $c$ different forms of Poisson rates, $\lambda_j(x_i^{(r)}, \alpha_j)$ $(j = 1, \ldots, c)$, of the model. According to the classification criterion, an observation $i$ is identified with the component which maximizes this posterior probability. In our Monte Carlo study this classification criterion works very well. Also, in our applications, the maximum values of this quantity all exceed 0.5. Note that if the parameters of the model were known, this classification criterion would be the optimal or Bayes rule (Anderson, 1984, chapter 6), which minimizes the overall error rate. Such an approach has also been referred to as latent class analysis (Aitkin et al., 1981). We illustrate this approach in the examples below.

2.7.3 Residual Analysis and Goodness-of-fit Test

Once a mixed Poisson regression model has been fitted to a set of observations, it is essential to assess the quality of the fit. For this purpose, we consider Pearson, deviance and likelihood residuals for mixed Poisson regression models, and use them to identify individual poorly fitting observations as well as observations that are influential on the overall fit of the model. We also define a quantity to measure the influence of individual observations on the set of parameter estimates, and use it to identify influential observations. In addition, we give goodness-of-fit statistics for mixed Poisson regression models.

Definitions of Residuals

For Normal regression models, we can express an observation $y_i$ of the response variable in the form
\[ y_i = \hat\mu_i + (y_i - \hat\mu_i), \]
where $\hat\mu_i$ is the maximum likelihood estimate of the mean of $y_i$, i.e., data = fitted value + residual.
Residuals are used in many procedures designed to detect various types of disagreement between the data and an assumed model. For example, the scatterplot of residuals versus fitted values that accompanies a linear least squares fit is a standard tool used to diagnose nonconstant variance, curvature, and outliers. Diagnostic tools such as this plot have two important uses. First, they may result in the recognition of important phenomena that might otherwise have gone unnoticed. Outlier detection is an example of this, where an outlying case may indicate conditions under which a process works differently, possibly worse or better. Second, the diagnostic methods can be used to suggest appropriate remedial action in the analysis of the model.

For generalized linear models there are at least three types of generalized residuals which are widely used in practice. One is the Pearson residual, defined as
\[ r_{Pi} = \frac{y_i - \hat\mu_i}{\sqrt{V(\hat\mu_i)}}, \tag{2.34} \]
where $V(\mu_i)$ is the variance function and $\hat\mu_i$ is the maximum likelihood estimate of the $i$th mean of $y_i$ fitted by the regression model. These residuals are the signed square roots of the contributions to the Pearson goodness-of-fit statistic $X^2$. For the usual Poisson regression model, the Pearson residual is
\[ r_{Pi} = \frac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i}}, \]
where $\hat\mu_i = \exp(x_i'\hat\alpha)$ and $\hat\alpha$ is the maximum likelihood estimate of the regression parameters; for the usual logistic regression model,
\[ r_{Pi} = \frac{y_i - m_i \hat p_i}{\sqrt{m_i \hat p_i (1 - \hat p_i)}}, \]
where $\text{logit}(\hat p_i) = x_i'\hat\alpha$.

The second type of generalized residual is the deviance residual, defined as
\[ r_{Di} = \text{sign}(y_i - \hat\mu_i) \sqrt{2[l(y_i, y_i) - l(\hat\mu_i, y_i)]} = \text{sign}(y_i - \hat\mu_i) \sqrt{d_i}, \tag{2.35} \]
where $l(\mu_i, y_i)$ is the log likelihood function for $y_i$ and $d_i$ is the contribution to the deviance goodness-of-fit statistic $D$. For the usual Poisson regression model,
\[ d_i = 2(y_i \ln(y_i / \hat\mu_i) - y_i + \hat\mu_i), \]
where $\hat\mu_i = \exp(x_i'\hat\alpha)$; for the usual logistic regression model,
\[ d_i = 2 y_i \ln\left(\frac{y_i}{\hat\mu_i}\right) + 2(m_i - y_i) \ln\left(\frac{m_i - y_i}{m_i - \hat\mu_i}\right), \]
where $\hat\mu_i = m_i \hat p_i$ and $\text{logit}(\hat p_i) = x_i'\hat\alpha$.
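For the ordinary Poisson regression case, the Pearson and deviance residuals defined above can be computed as in the following Python sketch. The function names are hypothetical, and the convention that $y_i \ln(y_i/\hat\mu_i)$ equals 0 when $y_i = 0$ (its limiting value) is an assumption stated here because the formula is otherwise undefined at zero counts.

```python
import math

def pearson_residual(y, mu):
    """Pearson residual for a Poisson response with fitted mean mu > 0."""
    return (y - mu) / math.sqrt(mu)

def deviance_residual(y, mu):
    """Deviance residual for a Poisson response with fitted mean mu > 0."""
    # y * ln(y / mu) is taken as 0 when y == 0, its limit as y -> 0.
    term = y * math.log(y / mu) if y > 0 else 0.0
    d = 2.0 * (term - y + mu)
    sign = 1.0 if y >= mu else -1.0
    # max(..., 0) protects against tiny negative values from rounding.
    return sign * math.sqrt(max(d, 0.0))
```

For a zero count the two residuals reduce to $-\sqrt{\hat\mu_i}$ and $-\sqrt{2\hat\mu_i}$ respectively, so both are determined entirely by the fitted mean.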
The third type of generalized residual is the likelihood residual, which is derived by comparing the deviance obtained on fitting a linear model to the complete set of $n$ cases with the deviance obtained when the same model is fitted to the $n - 1$ cases excluding the $i$th, for $i = 1, \ldots, n$. This gives rise to a quantity that measures the change in the deviance when each case in turn is excluded from the data set. The likelihood residual for the $i$th case is defined as
\[ r_{Li} = \text{sign}(y_i - \hat\mu_i) \sqrt{D - D_{(i)}}, \tag{2.36} \]
where $D$ and $D_{(i)}$ are the deviances based on $n$ and $n - 1$ cases respectively. Pregibon (1981) derives a useful one-step approximation to the above exact value,
\[ r_{Li} \approx \text{sign}(y_i - \hat\mu_i) \sqrt{\frac{h_i}{1 - h_i} r_{Pi}^2 + r_{Di}^2}, \]
where $h_i$ is the $i$th diagonal element of the $n \times n$ matrix
\[ H = W^{1/2} X (X'WX)^{-1} X' W^{1/2}. \tag{2.37} \]
In this expression for $H$, $W$ is the $n \times n$ diagonal matrix of weights used in fitting the linear model and $X$ is the $n \times k$ design matrix. For the Poisson regression model, $W = \text{diag}\{\hat\mu_1, \ldots, \hat\mu_n\}$; for the logistic regression model, the $i$th diagonal element of $W$ is $m_i \hat p_i (1 - \hat p_i)$.

Note that when the response $Y$ follows a Normal distribution, $r_{Li}^2$ follows a $\chi^2_1$ distribution; when $Y$ follows a non-Normal distribution, $r_{Li}^2$ does not asymptotically follow a $\chi^2_1$ distribution as $n \to \infty$ because the asymptotic theory does not hold in this case (Williams, 1987).

To standardize the above residuals so that they have approximately unit variance, one needs to account for the inherent variation in the fitted values $\hat\mu_i$. In general, for any type of residual $R(y_i, \hat\mu_i)$, Pierce and Schafer (1986) show that its variance is approximately given by
\[ \text{Var}[R(y_i, \hat\mu_i)] \approx \text{Var}[R(y_i, \mu_i)] - \text{Var}[(\hat\mu_i - \mu_i)/\text{SD}(y_i)] \tag{2.38} \]
as $n \to \infty$. For either the Poisson regression or the logistic regression model,
\[ \text{Var}[(\hat\mu_i - \mu_i)/\text{SD}(y_i)] = h_i, \]
where $h_i$ is defined by (2.37).
Therefore, we can standardize $r_{Pi}$ and $r_{Di}$ by dividing by the factor $\sqrt{1 - h_i}$, because for all three types of residuals the first term on the right side of (2.38) is 1.

Several researchers have compared these three types of residuals (e.g., Pierce and Schafer, 1986; Williams, 1987; McCullagh and Nelder, 1989; and Collett, 1991). The value of $r_{Li}$ is intermediate between $r_{Di}$ and $r_{Pi}$, and it is usually much closer to $r_{Di}$ than to $r_{Pi}$. Both $r_{Di}$ and $r_{Li}$ take account of the shape of the distribution of $Y$, which is ignored by $r_{Pi}$. Both $r_{Di}$ and $r_{Li}$ have distributions which are closer to normality than that of $r_{Pi}$. For outlier detection $r_{Li}$ seems the best choice because of its relevance to the measurement of case influence on likelihood ratio tests.

Several types of residual plots are useful for different diagnostic purposes. For example, an index plot, in which the residuals are displayed against the corresponding observation number or index, is particularly suitable for the detection of outliers. Although a plot of the residuals against the fitted values $\hat\mu_i$ or an explanatory variable is more informative than an index plot for normal regression, it may be uninformative for Poisson regression: when the mean of the response variable is small, there may be a pattern in the plot whether or not the model is correct. Indeed, if $y_i = 0$, then $r_{Pi} = -\sqrt{\hat\mu_i}$ and $r_{Di} = -\sqrt{2\hat\mu_i}$. This means that for small mean values, the residuals are not approximately normal.

Analogously, we can define the same three types of residuals for mixed Poisson regression models. That is, the Pearson residual, $r_{Pi}$, for mixed Poisson regression models is given by defining $\hat\mu_i$ and $V(\hat\mu_i)$ in (2.34) as
\[ \hat\mu_i = t_i \sum_{j=1}^{c} \hat p_{ij} \hat\lambda_{ij}, \tag{2.39} \]
where
\[ \hat\lambda_{ij} = \exp(\hat\alpha_j' x_i^{(r)}), \]
\[ \hat p_{ij} = \frac{\exp(\hat\beta_j' x_i^{(m)})}{\sum_{k=1}^{c-1} \exp(\hat\beta_k' x_i^{(m)}) + 1} \quad \text{for } j = 1, \ldots, c - 1, \quad \text{and} \quad \hat p_{ic} = \frac{1}{\sum_{k=1}^{c-1} \exp(\hat\beta_k' x_i^{(m)}) + 1}, \]
and
\[ V(\hat\mu_i) = \hat\mu_i + t_i^2 \left\{ \sum_{j=1}^{c} \hat p_{ij} \hat\lambda_{ij}^2 - \Big[ \sum_{j=1}^{c} \hat p_{ij} \hat\lambda_{ij} \Big]^2 \right\}. \]

The deviance residual, $r_{Di}$, for mixed Poisson regression models is given by defining the log likelihood function $l(\hat\mu_i, y_i)$ in (2.35) as
\[ l(\hat\mu_i, y_i) = \log[f(y_i \mid x_i^{(r)}, x_i^{(m)}, t_i, \hat\alpha, \hat\beta)], \tag{2.40} \]
where $f(y_i \mid x_i^{(r)}, x_i^{(m)}, t_i, \alpha, \beta)$ is defined in (2.10). Note that $l(y_i, y_i)$ is the same for both generalized linear models and mixed Poisson regression models because we have the following relation:
\[ f(y_i \mid x_i^{(r)}, x_i^{(m)}, t_i, \alpha, \beta) = \sum_{j=1}^{c} p_{ij} \, \text{Po}(y_i \mid t_i \lambda_{ij}) \le \text{Po}(y_i \mid y_i). \]
This indicates that there is the same baseline for generalized linear models and mixed Poisson regression models.

The likelihood residual, $r_{Li}$, for mixed Poisson regression models is given by defining $\hat\mu_i$ as specified in (2.39), and $D$ and $D_{(i)}$ as the deviances based on the data sets of $n$ and $n - 1$ cases for the mixed Poisson regression model. Computing the likelihood residuals requires fitting the model $n$ times, each time with good starting values which are already available in our algorithm. In contrast, linear normal regression may require fitting the model only once.

Note that for the residuals of mixed Poisson regression models, equation (2.38) still holds. Thus, to account for variation in the fitted values $\hat\mu_i$ in these three types of residuals, we need to calculate
\[ \text{Var}[(\hat\mu_i - \mu_i)/\text{SD}(y_i)]. \]
However, the computation of this variance now becomes too complicated. Fortunately, for large samples, the $\hat\mu_i$ are very close to the $\mu_i$, so that the variation in the fitted values may be negligible.

Example. R&D and Patents. In modeling the patent data from Section 2.8.1 on the relationship between R&D spending and the number of patent applications at firm level, a 3-component mixed Poisson regression model is found to be satisfactory. The analysis will be given in Section 2.8.1. Figure 2.1, Figure 2.2 and Figure 2.3 give index plots of the Pearson, deviance and likelihood residuals respectively.

Figure 2.1 shows that the Pearson residuals may not be approximately normal.
On the other hand, Figure 2.2 and Figure 2.3 show that the deviance and likelihood residuals are very similar to each other. Note that the 6th observation has the largest Pearson residual and the 8th has both the largest deviance residual and the largest likelihood residual. These plots suggest that the deviance residuals and the likelihood residuals are likely to perform similarly in terms of the ranking of extreme observations. In fact, the empirical evidence to be presented in the examples in Section 2.8 suggests the same. The numerical studies also indicate that $r_{Di}$ and $r_{Li}$ are more nearly normal than $r_{Pi}$. Since the likelihood residuals are much more difficult to compute than any other type of residual, we recommend using $r_{Di}$ routinely.

Detection of Outliers and Influential Observations

The residuals obtained after fitting a mixed Poisson regression model to an observed set of data form the basis of diagnostic techniques for assessing model adequacy. Since our primary objective in residual analysis for mixed Poisson regression models is to identify outliers and influential observations, we discuss how these residuals can be used for this objective.

As for generalized linear models, we define outliers as those observations that are surprisingly distant from the remaining observations in the sample. Such observations may occur as a result of measurement errors, that is, errors in reading, calculating or recording a numerical value; or they may be just an extreme manifestation of natural variability. Since large residuals indicate poorly fitting observations, we use index plots of residuals for the detection of outliers, that is, observations that have unusually large residuals. For example, in the previous example, the 8th observation stands out from the rest as having a relatively large residual in all three index plots of the residuals.
The outlying nature of this observation is obvious from these plots.

The influence of a particular observation on the overall fit of a model can be assessed from the change in the value of a summary measure of goodness of fit that results from excluding the observation from the data set. Since $r_{Li}^2$ is the change in deviance on omitting the $i$th observation from the fit, an index plot of these values is the best way of assessing the contribution of each observation to the overall goodness of fit of the model. In the previous example, Figure 2.3 shows that the 8th observation has great impact on the overall fit of the model to the data, as measured by the deviance. Indeed, on omitting the 8th observation, the deviance reduction is $r_{L,8}^2 = (3.392)^2 = 11.506$.

To examine how the $i$th observation affects the set of parameter estimates, we define the following quantity:

$$w_i = \frac{1}{p} \Big\{ \sum_{j=1}^{c} \sum_{k=1}^{k_1} \Big| \frac{\hat\alpha_{jk} - \hat\alpha_{jk(i)}}{se(\hat\alpha_{jk})} \Big| + \sum_{j=1}^{c-1} \sum_{l=1}^{k_2} \Big| \frac{\hat\beta_{jl} - \hat\beta_{jl(i)}}{se(\hat\beta_{jl})} \Big| \Big\} \qquad (2.41)$$

where $\hat\alpha$ and $\hat\beta$ are the maximum likelihood parameter estimates of the mixed Poisson regression model based on the complete data set of $n$ cases, and $\hat\alpha_{(i)}$ and $\hat\beta_{(i)}$ are those based on the data set of $n - 1$ cases excluding the $i$th case; $se(\hat\alpha)$ and $se(\hat\beta)$ are the estimated standard errors of the corresponding estimates based on the $n$ cases, and $p = ck_1 + (c - 1)k_2$. Because each term in (2.41) measures a relative change in an individual coefficient, $w_i$ may be interpreted as the average relative coefficient change for a set of estimates. This is a useful quantity for assessing the extent to which the set of parameter estimates is affected by the exclusion of the $i$th observation. Relatively large values of this quantity indicate that the corresponding observations are influential and cause instability in the fitted model. An index plot of $w_i$ is the most useful way of presenting these values.

For the previous example, Figure 2.4 is the index plot of $w_i$. 
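As a rough illustration of how the influence measure in (2.41) might be computed once the leave-one-out fits are available, the following Python sketch averages the absolute changes in the estimates, each scaled by its full-data standard error. The function name and the numbers in the usage lines are hypothetical, and the refitting step itself is assumed to be done elsewhere.

```python
from typing import Sequence


def average_relative_change(full: Sequence[float],
                            leave_one_out: Sequence[float],
                            std_err: Sequence[float]) -> float:
    """Average absolute change in the estimates, scaled by standard errors.

    full          -- all p estimates (alphas and betas) from the n-case fit
    leave_one_out -- the same p estimates after deleting one observation
    std_err       -- standard errors of the full-data estimates
    """
    if not full or not (len(full) == len(leave_one_out) == len(std_err)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - b) / s)
                for a, b, s in zip(full, leave_one_out, std_err))
    return total / len(full)


# Hypothetical values for a model with p = 3 parameters.
w_i = average_relative_change([1.0, -2.0, 0.5],
                              [1.2, -2.0, 0.4],
                              [0.4, 0.5, 0.2])
```

Calling this function once per deleted case gives the values that would go into an index plot such as Figure 2.4.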
Clearly, the plot shows that the 8th, 12th, 47th, 64th, 65th and 66th observations are influential, so that omitting any one of them from the data has a great effect on the set of parameter estimates. For example, if the 12th observation is excluded from the data set, each parameter estimate changes by about 33% on average. Although the 47th observation has a relatively large value of $w_i$, it has a relatively small likelihood residual and deviance residual. This indicates that an influential observation need not be an outlier. In particular, an influential observation that is not an outlier will occur when the observation distorts the form of the fitted model to such an extent that the observation itself has a small residual. Note that in this example, the 8th observation is both an influential observation and an outlier. On the other hand, the first observation appears to be an outlier but has a rather small value of $w_i$.

Goodness-of-fit Statistics

After fitting a mixed Poisson regression model to a set of data, it is natural to ask how well the fitted values of the response variable under the model agree with the observed values. If the agreement between the observations and the corresponding fitted values is good, the model may be acceptable. If not, the current form of the model is not acceptable and the model needs to be revised. This aspect of the adequacy of a model is widely referred to as goodness of fit.

There are at least two widely used goodness-of-fit statistics which can be used here. One is the deviance statistic, $D$, defined as

$$D = \sum_{i=1}^{n} r_{Di}^2,$$

where the $r_{Di}$ are the deviance residuals for the mixed Poisson regression model. The other is Pearson's statistic, $X^2$, defined as

$$X^2 = \sum_{i=1}^{n} r_{Pi}^2,$$

where the $r_{Pi}$ are the Pearson residuals for the mixed Poisson regression model. 
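A minimal Python sketch of the two statistics follows. It assumes the residuals have already been computed and that the statistic is referred to a chi-squared distribution on $n - p$ degrees of freedom. As an assumption not taken from the text, it uses the Wilson-Hilferty approximation to the chi-squared quantile so that only the standard library is needed; the function names are illustrative.

```python
from statistics import NormalDist
from typing import Sequence


def sum_of_squares(residuals: Sequence[float]) -> float:
    """Sum of squared residuals: D from deviance residuals, X^2 from Pearson."""
    return sum(r * r for r in residuals)


def chi2_quantile_wh(prob: float, df: int) -> float:
    """Approximate chi-squared quantile (Wilson-Hilferty), adequate for large df."""
    z = NormalDist().inv_cdf(prob)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3


def fits_adequately(statistic: float, df: int, level: float = 0.95) -> bool:
    """True when the statistic does not exceed the upper critical point."""
    return statistic <= chi2_quantile_wh(level, df)
```

For instance, `fits_adequately(64.53, 59)` compares the Pearson statistic reported for the patent data in Section 2.8.1 with an approximate critical point close to 77.9. An exact quantile routine would be preferable when the degrees of freedom are small.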
In order to evaluate the extent to which an adopted mixed Poisson regression model fits a set of data, the distribution of either the deviance or the Pearson statistic, under the assumption that the model is correct, is needed. For normal linear models, the deviance and Pearson's $X^2$ statistics are distributed as $\chi^2$ with $(n - p)$ degrees of freedom, where $n$ is the number of observations and $p$ is the number of unknown parameters in the model. In general, many studies have shown that the distribution of the Pearson statistic is often much closer to chi-squared than that of the deviance (e.g., Larntz, 1978). For this reason, we use the Pearson statistic for overall goodness-of-fit tests for the mixed Poisson regression models.

2.8 Applications

2.8.1 R&D and Patents

Economists studying technological innovation often use patent applications as an indicator of inventive activity. The nature of much industrial R&D activity suggests that it is natural to assume that patent counts follow a Poisson distribution: patent applications can be thought of as measuring the number of successful outcomes among a large (but unobserved) number of projects within a firm's R&D lab, each of which has a small probability of success. Econometricians have accordingly examined the relationship between R&D and patenting by using Poisson regression to estimate a "production function for patents" of the form $E(y_i) = \exp(\alpha' x_i)$, where $y_i$ is the number of patents applied for by firm $i$ and $x_i$ is a vector of explanatory variables, including R&D spending. (There are many problems with using patent counts as indicators of innovative output, but they remain the only comprehensive, objective, and readily available measure of inventive activity. See Griliches (1990).)

In economics there are two important characteristics associated with a production function $f(x)$: returns to scale and elasticity. The former identifies how output responds to a proportionate, scaled expansion in inputs. 
If a proportionate increase in all inputs increases output by the same proportion, the production function is said to exhibit constant returns to scale. This can be described mathematically by

$$t f(x) = f(tx),$$

where $x$ is a vector representing inputs, and $t$ is a positive real number. Similarly, if a more (less) than proportionate increase in output is obtained, there are increasing (decreasing) returns to scale. These can be described mathematically by

$$t f(x) < f(tx)$$

and

$$t f(x) > f(tx),$$

respectively.

The elasticity of output with respect to an input $x_i$ measures the responsiveness of output to that input as the percent change in output divided by the percent change in the input. It is given by

$$\frac{\Delta f / f}{\Delta x_i / x_i} = \frac{\partial (\log f)}{\partial (\log x_i)}.$$

Note that for the above production function for patents, for instance, the R&D elasticity of patent applications is independent of the units in which patents are measured, and is thus a more meaningful measure of the responsiveness of patent applications to R&D spending. The R&D elasticity of patent applications simply measures the percentage change in patent applications when R&D spending changes by a small percentage.

The parameters of the above model have a direct and interesting economic interpretation: they provide estimates of returns to scale in performing R&D. However, efforts to test for returns to scale using these data have been hampered by the fact that they are typically quite severely overdispersed. Hausman, Hall and Griliches (1984), Bound et al. (1984), and Hall, Griliches, and Hausman (1986) estimated variations of the Poisson model which account for the overdispersion by including an additive random firm effect in the patent equation. The random firm effect can be thought of as capturing unobserved firm-specific factors affecting R&D productivity. As is well known, if the additive random effect is gamma distributed, then the unconditional distribution of the response variable is negative binomial. If the distributional assumption is incorrect, inconsistent parameter estimates will be obtained. These studies also present results from the quasi-generalized pseudo maximum likelihood (QGPML) estimators proposed by Gourieroux, Monfort, and Trognon (1984), which allow the random firm effect to be drawn from an unspecified distribution. Though results obtained using the Poisson, negative binomial, and QGPML estimators were qualitatively the same, the estimated coefficient on R&D varied substantially. The authors attributed this problem to "instability" in the R&D-patents relationship over time and across firms.

We treat the unobserved heterogeneity in these data quite differently, and show how the overdispersion can be accounted for in an alternative and perhaps more interesting way by using finite rather than continuous mixtures. Rather than assume that all firms have common regression coefficients and a random intercept, we allow both the intercept and the coefficient on R&D to vary from firm to firm, but in a restricted way. We postulate a discrete Poisson mixture model in which firms can be in a finite number of different states defined by different degrees of R&D productivity, for example "high", "medium", and "low". In this model the coefficients vary from state to state, rather than from firm to firm. One way to motivate this model is to assume that all firms have access to the same technological opportunities, but have different unobservable innovative capabilities (e.g. "Type A" or "Type B" or "Type C" organizational structures). 
Alternatively, we could assume that all firms have the same innovative capabilities, but have differential access to technological opportunities: some firms are working in "hot" areas of the underlying science while others are not.

The data are patent applications and R&D spending in 1976 for 70 pharmaceutical and biomedical companies, taken from the NBER R&D Masterfile (see Hall (1988) for documentation of this data set). The data are displayed in Figure 2.5, where the horizontal axis is the logarithm of R&D. Formal test results in Table 2.8 confirm the visual impression that the data are overdispersed: all of the tests strongly reject the null hypothesis of no overdispersion. As in the standard model used in previous studies, the dependent variable is a count of patent applications, and the explanatory variables are $\log(\mathrm{R\&D})$ and a quadratic term $(\log(\mathrm{R\&D}))^2$ included to capture non-linearities in the relationship. The coefficients on these variables provide a direct estimate of the elasticity of innovative output with respect to R&D spending, and thus of the extent to which there are scale economies in performing R&D. If the elasticity is greater than one, then an increase in R&D spending would generate a more than proportionate increase in patents. The coefficient on the quadratic term is particularly interesting since it captures the extent to which economies of scale vary with the size of a firm's R&D effort, a question which has been hotly (though inconclusively) debated by economists for many years.

To apply our mixture model, we assume that

(1) the total number of patents applied for by firm $i$ is associated with covariates $x_i = (x_i^{(m)}, x_i^{(r)})$, where $t_i = 1$ (one year), $x_i^{(m)} = (1)$ and $x_i^{(r)} = (1, \log(\mathrm{R\&D}_i), (\log(\mathrm{R\&D}_i))^2)$, and $\mathrm{R\&D}_i$ is the R&D expenditure of firm $i$ in 1976. Note that $x_i^{(m)} = (1)$ corresponds to the assumption of constant mixing probabilities. 
Note also that the mixing probability here may be interpreted as the likelihood that a firm stays in a particular underlying state during a one-year period. Since R&D expenditure is usually calculated at the end of a year, for one-year patent data it is legitimate to assume that the mixing probabilities are independent of R&D covariates;

(2) patent counts of different firms are independent;

(3) each patent count follows a mixed Poisson distribution with Poisson rates defined by exponential link functions

$$\lambda_j(x_i^{(r)}, \alpha_j) = \exp[\alpha_{j0} + \alpha_{j1} \log(\mathrm{R\&D}_i) + \alpha_{j2} (\log(\mathrm{R\&D}_i))^2],$$

where $i = 1, 2, \ldots, 70$, $j = 1, 2, \ldots, c$, and $c$ is the number of components in the mixture.

The maximum likelihood estimates for the saturated 1-4 component mixture models and several constrained 3-component mixture models applied to the data are given in Table 2.9. Among the four saturated mixture models, both AIC and BIC lead to the choice of the 3-component mixture. Within the class of 3-component mixture models, the saturated 3-component mixture model is considered the most appropriate one for the data in terms of BIC (and AIC).

After fitting the 3-component mixed Poisson regression model to the data, the Pearson goodness-of-fit statistic $X^2$ is 64.53 with 59 degrees of freedom. This value does not exceed the upper 95% critical point of the $\chi^2$-distribution on 59 degrees of freedom, $\chi^2_{59, 0.95} = 77.93$, suggesting that the mixed Poisson regression model fits adequately. Moreover, as discussed in Section 2.7.3, the residual analysis shows that there are a few influential observations and outliers. For example, the 12th observation is an influential observation corresponding to the company which spent $33.8 million on R&D for 59 patent applications. 
On omitting the 12th observation, the new parameter estimates become

$$\hat\alpha_1 = (13.7407, 7.7893, -0.8036),$$
$$\hat\alpha_2 = (0.4344, 1.8847, -0.2071),$$
$$\hat\alpha_3 = (0.7056, 0.5177, 0.0744),$$
$$\hat p_1 = 0.1653, \quad \hat p_2 = 0.1929, \quad \text{and} \quad \hat p_3 = 0.6418.$$

Note that the changes in the parameter estimates of the first component are relatively large, while the changes in the other parameter estimates are not significant.

The fitted mixed Poisson regression model suggests that patent counts are generated by three underlying Poisson distributions with rates defined by three different R&D productivity functions, respectively,

$$\lambda_1(x_i^{(r)}, \hat\alpha_1) = \exp[-16.223 + 9.309 \log(\mathrm{R\&D}) - 1.014 (\log(\mathrm{R\&D}))^2],$$
$$\lambda_2(x_i^{(r)}, \hat\alpha_2) = \exp[0.590 + 1.780 \log(\mathrm{R\&D}) - 0.196 (\log(\mathrm{R\&D}))^2]$$

and

$$\lambda_3(x_i^{(r)}, \hat\alpha_3) = \exp[0.703 + 0.518 \log(\mathrm{R\&D}) + 0.076 (\log(\mathrm{R\&D}))^2].$$

Note that since the above three rate functions are conditional on the three underlying states, the coefficients in these functions should be interpreted as effects on the conditional mean. For instance, $\hat\alpha_{12} = 9.309$ is the $\log(\mathrm{R\&D})$ effect on patents when a firm is in state one.

The three dotted lines in Figure 2.6 represent the curves of the above functions. The implied R&D elasticities (derivatives of the log rate with respect to $\log(\mathrm{R\&D})$) are $9.309 - 2.028 \log(\mathrm{R\&D})$, $1.780 - 0.392 \log(\mathrm{R\&D})$ and $0.518 + 0.152 \log(\mathrm{R\&D})$, suggesting that returns to scale differ across components.

Note that when we fit the data by the usual Poisson regression, which fails to account for the excess variation, the quadratic term is not significant. (The difference in log-likelihood between the usual Poisson regression models with and without the quadratic term is 0.45, and $\chi^2_{1, 0.99} = 6.634 > 2 \times 0.45 = 0.9$.) If this were the correct model, we would conclude that economies of scale do not vary significantly with the size of the firm's R&D program. The mixture model estimated above indicates, however, that the quadratic term is significant in terms of the likelihood ratio test. 
(The difference in log-likelihood between the 3-component mixture models with and without the quadratic term is 6.56, and $\chi^2_{3, 0.99} = 11.345 < 2 \times 6.56 = 13.12$.) This result illustrates that overdispersion in the usual Poisson regression may result in standard error estimates that are too large, so that too many terms are subsequently rejected from the usual Poisson regression.

If we postulate three different states in terms of the above three different forms of the Poisson rates, a firm has probability 0.1819 of being in state 1, 0.1773 of being in state 2 and 0.6408 of being in state 3. Based on the estimated posterior probabilities defined in (2.20), we identify each firm with one of the three states. Figure 2.6 displays this classification, in which a firm is identified with a state if the estimated posterior probability of the firm's being in that state has the largest value. The maximum estimated posterior probability always exceeds 0.5 in this application. Note that those observations marked as "1" form a group characterized by $\lambda_1(x_i^{(r)}, \hat\alpha_1)$, those marked as "2" by $\lambda_2(x_i^{(r)}, \hat\alpha_2)$, and those marked as "3" by $\lambda_3(x_i^{(r)}, \hat\alpha_3)$.

For the purpose of comparison, we fit the data to three widely used quasi-likelihood models. The first assumes a variance function $\mathrm{Var}(Y_i) = \sigma^2 E(Y_i)$, and the second $\mathrm{Var}(Y_i) = E(Y_i) + \sigma^2 E(Y_i)^2$. Note that the negative binomial model has the latter mean-variance relationship. Further, the parameter estimates under the negative binomial model may not be significantly different from those obtained by quasi-likelihood, though the former may be more efficient (Lawless, 1987). The third assumes that the Poisson rate $\lambda_i$ is a random variable, and that $\log(\lambda_i) = x_i'\beta + \epsilon_i$, where the $x_i$ are covariates, $\beta$ are unknown regression parameters, and the $\epsilon_i$ are random error terms having mean 0 and a constant unknown variance $\sigma^2$. The unknown parameter $\sigma^2$ in these models is called the unexplained variance. 
Estimation for these models is discussed by McCullagh and Nelder (1989) and Breslow (1984).

The parameter estimates and standard errors are given in Table 2.10. Computing the t-statistic (estimated coefficient/standard error) and comparing the mixed Poisson regression model with the quasi-likelihood models, we find that all three quasi-likelihood models may underestimate the effects of R&D on innovation. For example, the absolute values of the t-statistics of the estimated coefficient for $(\log(\mathrm{R\&D}))^2$ are 0.398, 3.418 and 1.554 for quasi-likelihood models I, II and III respectively, while the values for the same coefficient in the mixed Poisson regression model are 4.955, 4.560 and 4.314 for the first, second and third components.

In summary, we have applied the mixed Poisson regression model to analyze the relationship between technological innovation and R&D at the firm level. The patent data are well fitted by the 3-component mixed Poisson regression model with constant mixing probabilities and Poisson rates defined by quadratic functions in $\log(\mathrm{R\&D})$. This shows that both covariates, $\log(\mathrm{R\&D})$ and $(\log(\mathrm{R\&D}))^2$, are significant predictors of the number of patent applications. On the other hand, the covariate $(\log(\mathrm{R\&D}))^2$ is not significant in the usual Poisson regression model, which may not be justifiable here because of overdispersion. The goodness-of-fit test shows that there is no significant evidence of lack of fit in the mixed Poisson regression model. In addition, the residual analysis identifies outliers and influential observations in terms of the fitted model. According to the fitted model, the firms are classified into three categories, each characterized by a Poisson rate function. 
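The classification of firms into states can be sketched in Python as follows. The sketch assumes that the posterior probability in (2.20) is the usual Bayes-rule posterior for a finite mixture, uses the mixing probabilities and rate coefficients quoted above for the saturated 3-component fit, and takes $\log(\mathrm{R\&D})$ as an input whose units must match those used in the fit. The function names are illustrative.

```python
import math
from typing import List, Sequence

# Estimates quoted in the text for the saturated 3-component model.
MIX_PROBS = (0.1819, 0.1773, 0.6408)
COEFS = (
    (-16.223, 9.309, -1.014),
    (0.590, 1.780, -0.196),
    (0.703, 0.518, 0.076),
)


def poisson_rate(coef: Sequence[float], log_rd: float) -> float:
    """Rate exp[a0 + a1*log(R&D) + a2*log(R&D)^2] for one component."""
    a0, a1, a2 = coef
    return math.exp(a0 + a1 * log_rd + a2 * log_rd ** 2)


def log_poisson_pmf(y: int, rate: float) -> float:
    """Log of the Poisson probability of count y at the given rate."""
    return y * math.log(rate) - rate - math.lgamma(y + 1)


def posterior_probs(y: int, log_rd: float) -> List[float]:
    """Posterior probability of each state given the observed count."""
    logs = [math.log(p) + log_poisson_pmf(y, poisson_rate(c, log_rd))
            for p, c in zip(MIX_PROBS, COEFS)]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def classify(y: int, log_rd: float) -> int:
    """1-based index of the state with the largest posterior probability."""
    probs = posterior_probs(y, log_rd)
    return probs.index(max(probs)) + 1
```

Working on the log scale avoids underflow when a count is very unlikely under one of the components.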
Note that the significance of the parameter estimates of the mixed Poisson regression model is quite different from that obtained by the quasi-likelihood methods for dealing with extra-Poisson variation.

2.8.2 Seizure Frequency in a Clinical Trial

The timing and circumstances of epileptic seizure recurrence are a source of apprehension for patients and a mystery for neurologists. There have thus been many clinical studies of different treatments for reducing the occurrence of epileptic seizures, and accordingly various methods have been used to assess a reduction in seizure frequency (e.g., Wilensky et al., 1981, Hopkins et al., 1985, Milton et al., 1987, Gram, 1988, and Albert, 1991). Some of these methods, such as the percentage of patients "improved," "unchanged," or "worse," are rather subjective. Methods of this kind cannot be used to form anything other than an impressionistic opinion of the value of a treatment unless formal criteria for evaluating the significance of changes in the various parameters are first defined. Others are designed for particular situations. For instance, Hopkins et al. (1985) first proposed a two-state Markov mixture model to describe apparent clustering among daily seizure counts for epileptics. They assume that in each state the number of seizures is generated by a Poisson distribution, and that transitions between the two states are governed by a Markov chain. Albert (1991) and Le, Leroux and Puterman (1993) presented two different algorithms for estimating the parameters in the model. None of these methods directly includes treatment effects as covariates in model building, so that the treatment effects may be difficult to assess.

In this subsection we analyze data from a clinical trial carried out at British Columbia's Children's Hospital which investigated the effect of intravenous gammaglobulin (IVIG) on suppression of epileptic seizures. Subjects were randomized into two groups. 
After a four-week (28-day) baseline observation period, the treatment group received monthly infusions of IVIG while the control group received "best available therapy". The primary end point of the trial was daily seizure frequency. The principal data source was a daily seizure diary which contained the number of hours of parental observation and the number of seizures of each type during the observation period.

We use Poisson regression to analyze a series of myoclonic seizure counts from a single subject receiving IVIG. The data extracted from the seizure diary were the daily counts, $y_i$, and the hours of parental observation, $t_i$, for the $i$th day. Figure 2.7 gives the time plot of daily seizure counts. As covariates we use treatment ($x_{i1}$), trend ($x_{i2}$) and treatment-trend interaction ($x_{i3}$), where

$$x_{i1} = \begin{cases} 1 & \text{if there is a treatment } (i > 28) \\ 0 & \text{otherwise } (i \le 28), \end{cases} \qquad (2.42)$$

$$x_{i2} = \log(i) \qquad (2.43)$$

and

$$x_{i3} = x_{i1} x_{i2}. \qquad (2.44)$$

The second column in Table 2.11 reports the results of fitting the data using the usual Poisson regression with covariates defined in (2.42), (2.43) and (2.44), and a log link function. The data are overdispersed with respect to the Poisson distribution, since each of the overdispersion tests is highly significant ($P_a = 16.18$, $P_b = 16.22$ and $P_c = 36.33$). This suggests the inadequacy of the usual Poisson regression model.

We apply the mixture model assuming that

(1) each daily observed seizure count, $y_i$, is associated with time exposure (observation hours), $t_i$, and covariates $x_i^{(m)} = (1)$ and $x_i^{(r)} = (x_{i1}, x_{i2}, x_{i3})$, where $x_{i1}$, $x_{i2}$ and $x_{i3}$ are defined in (2.42), (2.43) and (2.44). Note that we assume constant mixing probabilities here because it is believed that the likelihood of being in a particular state is constant for a patient;

(2) daily seizure counts are independent and follow a mixed Poisson regression model with means equal to the product of observation time ($t_i$) and the Poisson rate (number of seizures per hour). 
Rates are specified by exponential link functions

$$\lambda_j(x_i^{(r)}, \alpha_j) = \exp(\alpha_{j0} + \alpha_{j1} x_{i1} + \alpha_{j2} x_{i2} + \alpha_{j3} x_{i3}),$$

where $i = 1, \ldots, 140$, $j = 1, \ldots, c$, and $c$ is the number of components in the mixture model. This model allows the treatment, the trend and the treatment-trend interaction to affect the Poisson rate, and the regression coefficients to vary across components.

Table 2.12 provides the results of fitting these models. Among the three saturated mixture models, both AIC and BIC suggest a 2-component model. Within the class of 2-component models, we can carry out likelihood ratio tests for the treatment, trend and interaction effects respectively. For example, to test the interaction effect, i.e., $H_0: \alpha_{j3} = 0$ for $j = 1, 2$, we find that the likelihood ratio statistic equals $2 \times (426.21 - 376.18) = 100.06 > \chi^2_{2, 0.99} = 9.21$. This suggests a highly significant treatment-trend interaction. The model we finally select is the 2-component saturated mixture.

After fitting the 2-component mixed Poisson regression model to the data, the Pearson goodness-of-fit statistic $X^2$ is 134.0 with 131 degrees of freedom. This value does not exceed the upper 95% critical point of the $\chi^2$-distribution on 131 degrees of freedom, $\chi^2_{131, 0.95} = 158.7$, suggesting that the mixed Poisson regression model fits adequately. Furthermore, the Pearson, deviance and likelihood residuals from the fitted model are calculated and displayed in Figure 2.10, Figure 2.11 and Figure 2.12 respectively. Figure 2.10 shows that the Pearson residuals may not be approximately normal. On the other hand, both Figure 2.11 and Figure 2.12 show that the deviance residuals and likelihood residuals are very similar to each other, and that the 61st observation is far from the remaining observations in both plots, suggesting that it may be an outlier. On omitting this observation, the deviance reduction is $r_{L,61}^2 = (-3.14)^2 = 9.86$. 
This means that the 61st observation has great impact on the overall fit of the mixed Poisson regression model to the data.

For detection of influential observations, the average relative coefficient changes $w_i$ are calculated and displayed in Figure 2.13. Clearly, the 6th observation is the only influential observation suggested by the plot. On omitting the 6th observation, the average relative coefficient change for each parameter estimate is about 20%, and the new parameter estimates become

$$\hat\alpha_1 = (2.2701, 1.8800, -0.2006, -0.6373),$$
$$\hat\alpha_2 = (2.0045, 7.4989, -0.2444, -2.3026),$$
$$\hat p_1 = 0.2740 \quad \text{and} \quad \hat p_2 = 0.7260.$$

Note that the changes in the parameter estimates of the first component are relatively larger than those in the other parameter estimates. After excluding the 6th observation, we reanalyze the data by fitting the Poisson regression and the 2-3 component mixed Poisson regression models, and select the same mixed Poisson regression model with the above new parameter estimates. In fact, the values of AIC for the Poisson regression and the saturated 2-3 component mixed Poisson regression models are -576.4, -379.7 and -383.8 respectively, and the values of BIC are -582.3, -392.9 and -404.4 respectively. Further, the likelihood ratio tests lead to the choice of the saturated 2-component mixed Poisson regression model. Hence, residual analysis identifies possible outliers and influential observations in terms of the mixed Poisson regression model.

We now interpret the fitted model. In it the mixing probabilities equal 0.2761 and 0.7239 and the respective rates are

$$\lambda_1(x_i^{(r)}, \hat\alpha_1) = \exp[2.8450 + 1.3020 x_{i1} - 0.4063 x_{i2} - 0.4309 x_{i3}]$$

and

$$\lambda_2(x_i^{(r)}, \hat\alpha_2) = \exp[2.0704 + 7.4318 x_{i1} - 0.2707 x_{i2} - 2.2762 x_{i3}].$$

Note that since the above two rate functions are conditional on the two underlying states, the coefficients in these functions should be interpreted as effects on the conditional mean. 
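To make the fitted model concrete, the Python sketch below evaluates the two estimated hourly rate functions for a given day and the implied mean and variance of the daily count for a given number of observation hours. It assumes the standard moment formulas for a finite Poisson mixture with exposure $t$, namely $E(Y) = t \sum_j p_j \lambda_j$ and $\mathrm{Var}(Y) = E(Y) + t^2 \{\sum_j p_j \lambda_j^2 - (\sum_j p_j \lambda_j)^2\}$; the function names and the 10-hour exposure in the usage line are illustrative only.

```python
import math
from typing import Tuple

# Fitted values quoted in the text for the 2-component model.
MIX_PROBS = (0.2761, 0.7239)
COEFS = (
    (2.8450, 1.3020, -0.4063, -0.4309),
    (2.0704, 7.4318, -0.2707, -2.2762),
)
BASELINE_DAYS = 28  # treatment starts after the 28-day baseline


def covariates(day: int) -> Tuple[float, float, float, float]:
    """Intercept, treatment indicator, log trend and their interaction."""
    treatment = 1.0 if day > BASELINE_DAYS else 0.0
    trend = math.log(day)
    return (1.0, treatment, trend, treatment * trend)


def hourly_rates(day: int) -> Tuple[float, ...]:
    """Estimated seizures per hour in each state on the given day."""
    x = covariates(day)
    return tuple(math.exp(sum(a * v for a, v in zip(coef, x)))
                 for coef in COEFS)


def mixture_mean_variance(day: int, hours: float) -> Tuple[float, float]:
    """Mean and variance of the daily count under the fitted mixture."""
    rates = hourly_rates(day)
    first = sum(p * r for p, r in zip(MIX_PROBS, rates))
    second = sum(p * r * r for p, r in zip(MIX_PROBS, rates))
    mean = hours * first
    variance = mean + hours ** 2 * (second - first ** 2)
    return mean, variance


mean_29, var_29 = mixture_mean_variance(29, 10.0)  # hypothetical 10 hours
```

Because the variance adds a nonnegative between-state term to the mean, this construction always yields a variance at least as large as the mean.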
For example, $\hat\alpha_{12} = 1.3020$ is the treatment effect when the patient is in state one, while $\hat\alpha_{22} = 7.4318$ is the treatment effect in state two.

Figure 2.8 provides the estimated hourly seizure rate corresponding to each component (the solid line is the rate for component one and the dotted line for component two) and the observed hourly seizure rate $y_i / t_i$. Observe that with the treatment both hourly rates are lower and the trend is less steep than at baseline, suggesting that this patient benefited from IVIG therapy. Figure 2.9 depicts the estimated mean $E(Y_i)$ (the solid line) and variance $\mathrm{Var}(Y_i)$ (the dotted line) for the fitted model obtained through (2.11) and (2.12). Observe that with the treatment the variance becomes much closer to the mean, suggesting that the patient's situation becomes more stable. Further, the variance exceeds the mean throughout, with the greatest difference in the baseline period. The "bumpiness" in these quantities is due to the non-constant exposure. Note also that there is no obvious parametric relationship between the estimated mean and variance.

We note that the clinical investigators conducting this study found the two-component model plausible. They said that they have observed subjects to have "bad days" and "good days" with no obvious explanation of this effect. We believe our model captures this aspect of the data and by doing so provides a clinically meaningful explanation of overdispersion. Note that Figure 2.8 also classifies the days in terms of the estimated posterior probabilities. Those observations marked as "1" form a group which is characterized by the Poisson rate function $\lambda_1(x_i^{(r)}, \hat\alpha_1)$, while those marked as "2" form another group which is characterized by $\lambda_2(x_i^{(r)}, \hat\alpha_2)$. We may regard $\lambda_1(x_i^{(r)}, \hat\alpha_1)$ as the Poisson regression specification for group one, and $\lambda_2(x_i^{(r)}, \hat\alpha_2)$ for group two. 
In this sense, our model consists of two Poisson regression models, describing the seizure frequency rate on "bad days" and "good days" respectively.

For the purpose of comparison, we also fit the data to the three quasi-likelihood models defined in Section 2.8.1. Table 2.11 reports parameter estimates for these methods. From Table 2.11 we find that using different methods for overdispersion may lead to different parameter estimates, different standard errors, or both. For instance, the coefficient estimate for the treatment effect is 4.132 under model I, 4.656 under model II, 3.757 under model III, and 1.3020 for component 1 and 7.4318 for component 2 under our mixture. Further, the ratio of estimate to standard error for trend is 5.8535 under Method I, 4.2559 under Method II, 4.5145 under Method III, and 2.6550 for component 1 and 14.587 for component 2 under our mixture. This implies that these methods disagree about the significance of the background trend effect. Compared with the three methods for overdispersion, our mixture model has narrower confidence intervals for the parameter estimates.

In this example, we have analyzed the series of myoclonic seizure counts from a clinical trial. The data are well fitted by the 2-component mixed Poisson regression model with constant mixing probabilities and Poisson rates depending on the covariates treatment, trend and treatment-trend interaction. The goodness-of-fit test suggests that there is no significant evidence of lack of fit in the model. In addition, the residual analysis identifies influential observations and outliers. According to this model, the patient may have two states of seizure frequency rate, which describe "bad days" and "good days" respectively. Compared with the quasi-likelihood methods, the mixed Poisson regression model gives narrower confidence intervals for the parameter estimates. 
Note that both the parameter and the standard error estimates under the mixed Poisson regression model differ from those obtained by the quasi-likelihood methods.

2.8.3 Terrorist Bombing

We analyze data consisting of a time series of the number of international terrorist bombing episodes (Roberts, 1991, p.432). Roberts (1991) notes that the data do not behave as a single homogeneous series, and suggests that an indicator variable be used to model a level shift for the last seven years. This is reinforced by the time plot (Figure 2.14), which suggests that there might have been a change in rate in 1973.

We first apply the usual Poisson regression with an intercept, a trend variable $\log(i)$ and a step variable $s_i$, defined by

$$s_i = \begin{cases} 0 & \text{for } i < 60 \\ 1 & \text{otherwise.} \end{cases} \qquad (2.45)$$

Note that by defining the step variable as above, we assume that the step change happened in the 60th month. The trend variable is insignificant, and the regression estimates (standard errors) are 0.7498 (0.0887) for the intercept and 1.158 (0.0981) for the coefficient of the step variable. The deviance for the model is 368.1 with 142 degrees of freedom. Note that the data are overdispersed with respect to the Poisson regression, since all three score tests for overdispersion (Dean, 1992) are highly significant ($P_a = P_b = 13.84$, and $P_c = 14.59$).

We apply a mixed Poisson model in which

(1) the monthly terrorist bombing count, $y_i$, is associated with exposure $t_i$ and covariates $x_i^{(r)} = (1)$ and $x_i^{(m)} = (1, \log(i), s_i)$, where $t_i = 1$ (one month) and $s_i$ is defined by (2.45). Note that the covariate $\log(i)$ represents a trend, and the Poisson rates are constant;

(2) $y_i$, $i = 1, \ldots, 144$, are independent and follow a mixed Poisson model with rates $\lambda_j$ and mixing probabilities defined as

$$p_j(x_i^{(m)}, \beta) = \frac{\exp[\beta_{j0} + \beta_{j1} \log(i) + \beta_{j2} s_i]}{\sum_{k=1}^{c-1} \exp[\beta_{k0} + \beta_{k1} \log(i) + \beta_{k2} s_i] + 1}, \qquad j = 1, \ldots, c - 1,$$

and

$$p_c(x_i^{(m)}, \beta) = 1 - \sum_{j=1}^{c-1} p_j(x_i^{(m)}, \beta),$$

where i = 1, ...
, 144, and c is the number of components in the mixture.

This model allows the mixing probabilities to depend on the trend variable and the step change, and to take different forms accordingly.

Table 2.13 provides the results of model fitting. Among the four saturated mixture models, both AIC and BIC suggest a 3-component mixture model. To test whether the trend effect is significant, we first compare the mixture whose covariates include only an intercept and the step change with the 3-component saturated mixture. The difference in log-likelihood between the two is 0.89, and the chi-square test statistic is $2 \times 0.89 = 1.78$ with 2 degrees of freedom. Hence the trend effect is not significant based on the usual likelihood ratio test. Similarly, comparing the mixture without covariates with the one that includes the step change variable, we find that the step change is significant based on the likelihood ratio test. (The chi-square test statistic is 61.62 with 2 degrees of freedom.) Further, we can compare the two non-nested mixtures, one with only the step change variable and the other with only the trend variable as covariates, using either AIC or BIC. Clearly, the former has larger AIC and BIC values. According to the model selection procedure, we finally choose, within the class of 3-component mixtures, the model with a step change in the mixing probabilities.

After fitting the 3-component mixed Poisson regression model to the data, the Pearson goodness-of-fit statistic $X^2$ is 134.7 with 137 degrees of freedom. This value does not exceed the upper 95% critical point of the $\chi^2$-distribution on 137 degrees of freedom, $\chi^2_{137, 0.95} = 165.3$, suggesting that there is no evidence of lack of fit. Furthermore, the Pearson, deviance and likelihood residuals from the fitted model are calculated and displayed in Figure 2.16, Figure 2.17 and Figure 2.18 respectively. Figure 2.16 shows that the Pearson residuals may not be approximately normal. 
On the other hand, Figure 2.17 and Figure 2.18 show that the deviance residuals and likelihood residuals are very similar to each other, and that the 7th observation lies far from the remaining observations in both plots, suggesting that it may be an outlier. On omitting this observation, the deviance reduction is $r_7^2 = (3.121)^2 = 9.741$. This means that the 7th observation has a great impact on the overall fit of the mixed Poisson regression model to the data.

For detection of influential observations, the average relative coefficient changes $w_i$ are calculated and displayed in Figure 2.19. Clearly, the 7th observation is the only influential observation suggested by the plot. On omitting the 7th observation, the average relative coefficient change for each parameter estimate is about 586%, and the new parameter estimates become

$$\hat\beta_1 = (30.28, -30.10), \quad \hat\beta_2 = (27.56, -26.05), \quad \hat\lambda_1 = 1.6874, \quad \hat\lambda_2 = 6.3577 \quad \text{and} \quad \hat\lambda_3 = 14.239.$$

Note that the changes in the regression parameter estimates of the mixing probabilities are very large. This may be because, after excluding the 7th observation, the first 60 observations are all attributed to the first two components, so that the mixing probability of the third component is almost zero in that period. In this case the parameter estimates may tend to infinity because they lie on the boundary of the parameter space, as often happens in logistic regression. Note also that the Poisson rates do not change appreciably, suggesting that the 7th observation has great influence on the mixing probabilities rather than on the Poisson rates. The residual analysis confirms that the fitted model is adequate.
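The boundary behaviour described above can be illustrated numerically. The following is a minimal sketch, assuming the multinomial-logit form of the mixing probabilities with the third component as baseline and using only the step variable; the function name `mixing_probs` is hypothetical and not part of the thesis code.

```python
import math

def mixing_probs(betas, s):
    """Multinomial-logit mixing probabilities, last component as baseline.

    betas: (intercept, step coefficient) pairs for components 1..c-1.
    s: value of the step variable (0 or 1).
    """
    etas = [b0 + b1 * s for b0, b1 in betas] + [0.0]  # baseline predictor is 0
    m = max(etas)                                     # shift by the max for numerical stability
    weights = [math.exp(e - m) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

# Estimates quoted in the text after omitting the 7th observation.
betas_without_obs7 = [(30.28, -30.10), (27.56, -26.05)]

before_step = mixing_probs(betas_without_obs7, s=0)
after_step = mixing_probs(betas_without_obs7, s=1)

print("s = 0:", ["%.3g" % p for p in before_step])
print("s = 1:", ["%.3f" % p for p in after_step])
```

With intercepts near 30, the third component receives essentially zero weight while the step variable is 0, whereas the probabilities after the step remain moderate. This is the pattern expected when an estimate drifts toward the boundary of the parameter space.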
We interpret the fitted mixed Poisson regression model as follows. In it the mixing probabilities are

$$p_1(x_i^{(m)}, \hat\beta) = \frac{\exp(4.0231 - 3.8535 s_i)}{\exp(4.0231 - 3.8535 s_i) + \exp(1.3141 + 0.1741 s_i) + 1},$$

$$p_2(x_i^{(m)}, \hat\beta) = \frac{\exp(1.3141 + 0.1741 s_i)}{\exp(4.0231 - 3.8535 s_i) + \exp(1.3141 + 0.1741 s_i) + 1},$$

$$\text{and} \quad p_3(x_i^{(m)}, \hat\beta) = \frac{1}{\exp(4.0231 - 3.8535 s_i) + \exp(1.3141 + 0.1741 s_i) + 1},$$

and the Poisson rates are

$$\hat\lambda_1 = 1.6864, \quad \hat\lambda_2 = 6.3611 \quad \text{and} \quad \hat\lambda_3 = 14.044.$$

This model suggests that the mixing probabilities have a jump. During the first 60 months, the number of episodes follows one of three Poisson distributions: a low rate of 1.6864 (episodes per month) with probability 0.9221, a medium rate of 6.3611 with probability 0.0614, and a high rate of 14.044 with probability 0.0164. After December 1972, the data follow one of the same Poisson distributions, but the probabilities change to 0.1791, 0.6697 and 0.1512 respectively. This indicates that terrorist bombing incidents became significantly more frequent between 1973 and 1979. Furthermore, the mixture model suggests that the time trend (monthly index) is not significant, suggesting that rates are stable within each of these periods.

If we postulate three levels of terrorist bombing corresponding to the three different Poisson rates, each month occupies one of the levels according to the mixing probabilities. Based on the estimated posterior probabilities defined in (2.20), we identify each observation with a level if its estimated posterior probability of being at that level is greater than 0.5. Figure 2.15 classifies months in this way. Note that the high-intensity component accounts for the large number of episodes in July 1968 as well as for many of the post-1973 observations.

From the fitted model, we find that the estimated mean and variance are 2.18 and 5.82, respectively, for the first five years, and 6.69 and 19.5 for the last seven years.
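The implied mixing probabilities and the mean-variance relationship can be recomputed from the fitted coefficients. The sketch below assumes the standard finite-mixture identities, mean $= \sum_j p_j \lambda_j$ and variance $= \text{mean} + \sum_j p_j \lambda_j^2 - \text{mean}^2$; the helper names are illustrative only.

```python
import math

RATES = (1.6864, 6.3611, 14.044)                 # fitted Poisson rates
BETAS = ((4.0231, -3.8535), (1.3141, 0.1741))    # (intercept, step coefficient), components 1 and 2

def mixing_probs(s):
    """Mixing probabilities for step value s; component 3 is the baseline."""
    weights = [math.exp(b0 + b1 * s) for b0, b1 in BETAS] + [1.0]
    total = sum(weights)
    return [w / total for w in weights]

def mixture_mean_var(probs, rates):
    """Mean and variance of a finite Poisson mixture."""
    mean = sum(p * r for p, r in zip(probs, rates))
    second_moment = sum(p * r * r for p, r in zip(probs, rates))
    # Var(Y) = E[Var(Y | component)] + Var(E[Y | component])
    return mean, mean + second_moment - mean ** 2

summary = {}
for s in (0, 1):
    probs = mixing_probs(s)
    summary[s] = (probs, *mixture_mean_var(probs, RATES))
    print(s, [round(p, 4) for p in probs],
          round(summary[s][1], 2), round(summary[s][2], 2))
```

In both periods the mixture variance is well above the mixture mean, which is how the model absorbs the extra-Poisson variation.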
Clearly, the mixed Poisson model accounts for overdispersion.

Note that we also fitted the data using a mixed Poisson regression model with a step change in the rate, and found that the above model fits better.

In summary, the terrorist bombing data have been fitted by the 3-component mixed Poisson regression model with constant Poisson rates and mixing probabilities depending on a step change. This means that since January 1973 terrorist bombings have become more intensive because of a greater likelihood of being at a higher bombing rate. The goodness-of-fit test shows that there is no significant evidence of lack of fit. In addition, the residual analysis identifies one observation which is not only an outlier but also an influential observation in terms of the fitted model.

2.8.4 Accidents in Worksites

There have been many studies of the relationship between alcohol and accident injuries (e.g., McDermott, 1977; Dietz and Baker, 1974; Hingson and Howland, 1987; and Wechsler et al., 1969). Some of these studies established a link between alcohol and accidental injuries (McDermott, 1977), but others have not. In particular, there is no strong evidence implicating alcohol in workplace injuries. Methodological issues associated with these studies include data collection, alcohol measurement and the choice of appropriate statistical models. Webb et al. (1994) conducted a study to analyze the relationship between problem drinking and industrial workplace injuries. They collected data from 470 employees of a large industrial plant manufacturing metal products in the Hunter Valley region of New South Wales, Australia, employed during the period May 1985 to July 1986. Problem drinking was measured by the Mortimer-Filkins test, which was devised initially to detect alcohol problems among persons charged with drunk-driving (Mortimer et al., 1971). The test scores (MFts) in the data range from -3 to 37.
The numbers of work injuries were obtained from medical reports completed for all injuries reported to the medical center by study participants, for a period of 12 months from the time of administration of the questionnaire to each study participant. The data also contain socio-demographic measures including age and job satisfaction. A question of interest here is to find significant predictors of work injuries.

A review of studies on the relationship between alcohol and work injuries revealed that the evidence is contradictory and that many of the studies contain methodological flaws (Webb et al., 1994). As a standard method for count data analysis, we use Poisson regression, defining the number of work injuries for subject $i$ as $Y_i$ and including the covariates

$$x_{i1} = \log(\text{age}_i), \qquad (2.46)$$

$$x_{i2} = \begin{cases} 1 & \text{if individual } i \text{ has a low level of job satisfaction,} \\ 0 & \text{otherwise,} \end{cases} \qquad (2.47)$$

$$x_{i3} = \log(\text{MFts}_i + 10) \qquad (2.48)$$

$$\text{and} \quad x_{i4} = x_{i3}^2. \qquad (2.49)$$

Thus the model for the Poisson mean $\lambda_i$ is

$$\log(\lambda_i) = \alpha_0 + \alpha_1 x_{i1} + \alpha_2 x_{i2} + \alpha_3 x_{i3} + \alpha_4 x_{i4}.$$

Note that we add the constant 10 in (2.48) so that $\text{MFts}_i + 10 > 0$ and the log transformation can be applied.

The first row in Table 2.14 reports the results of fitting the data using the usual Poisson regression. Judging by the t-statistics (parameter estimate divided by standard error), all covariates except $x_4$ are highly significant. However, these results may be misleading because the data are seriously overdispersed. The overdispersion score test statistic $P_a$ has a value of 24.33, which, compared with the N(0, 1) reference distribution, indicates inadequacy of the usual Poisson regression model.

To apply the mixed Poisson regression model, we assume that

(1) the number of work injuries for individual $i$ is associated with covariates $x_i = (x_i^{(m)}, x_i^{(r)})$ with $x_i^{(m)} = (1, x_{i1}, x_{i2}, x_{i3})$ and $x_i^{(r)} = (1, x_{i1}, x_{i2}, x_{i3}, x_{i4})$, where $x_{i1}$, $x_{i2}$, $x_{i3}$ and $x_{i4}$ are defined by (2.46), (2.47), (2.48) and (2.49) respectively.
Note that we choose $t_i = 1$ for all $i$;

(2) injury counts of different individuals are independent and follow a mixed Poisson regression model with rates (number of work injuries per year) given by the link functions

$$\lambda_j(x_i^{(r)}, \alpha_j) = \exp(\alpha_{j0} + \alpha_{j1}x_{i1} + \alpha_{j2}x_{i2} + \alpha_{j3}x_{i3} + \alpha_{j4}x_{i4}),$$

where $i = 1, 2, \ldots, 470$, $j = 1, 2, \ldots, c$ and $c$ is the number of components in the mixture.

Table 2.14 shows the results of fitting these models. To determine the number of components first, we compare the values of BIC and AIC among the three saturated models. Clearly, both BIC and AIC lead to the choice of 2-component mixture models. Within these 2-component mixtures, we carry out inference using likelihood ratio tests.

First we test the hypothesis that the effects of covariates $x_2$, $x_3$ and $x_4$ are insignificant by comparing the model including only $x_1$ in both mixing probabilities and rates with the saturated 2-component model. Since the chi-square test statistic is $2 \times (-897.74 + 903.45) = 11.42 < \chi^2_{8,0.95} = 15.51$, we do not reject the hypothesis at the 5% significance level. This implies that neither the level of job satisfaction nor the Mortimer-Filkins test score has a significant effect on the mixing probabilities or the Poisson rates.

Then we test whether the effect of age ($x_1$) is insignificant in the mixing probabilities. Age is in fact a significant covariate in the mixing probabilities, because the chi-square test statistic for the corresponding hypothesis is $2 \times (-899.15 + 906.31) = 14.32 > \chi^2_{1,0.95} = 3.84$.

The age covariate $x_1$ is also highly significant in the Poisson rates, because the corresponding test statistic is $2 \times (-903.45 + 909.57) = 12.24 > \chi^2_{2,0.95} = 5.99$.

Finally we test the hypothesis of a common slope for both components, i.e., $\alpha_{11} = \alpha_{21}$. This hypothesis is not rejected at the 5% significance level because the test statistic is $2 \times (-903.45 + 903.48) = 0.06 < \chi^2_{1,0.95} = 3.84$.
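The four likelihood ratio comparisons above can be reproduced from the quoted log-likelihoods. This is a minimal sketch using only the Python standard library; `chi2_sf` is a hypothetical helper implementing the closed-form upper tail of the chi-square distribution for integer degrees of freedom, and the dictionary labels are shorthand rather than the thesis's own names.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-half) * sum(half ** k / math.factorial(k)
                                     for k in range(df // 2))
    # Odd df: complementary error function plus a finite correction series.
    series = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                 for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series

def lr_test(loglik_full, loglik_reduced, df):
    """Likelihood ratio statistic and its asymptotic chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)

# (log-likelihood of larger model, log-likelihood of smaller model, df), as quoted in the text.
tests = {
    "x2, x3, x4 dropped": (-897.74, -903.45, 8),
    "age dropped from mixing probabilities": (-899.15, -906.31, 1),
    "age dropped from Poisson rates": (-903.45, -909.57, 2),
    "common age slope": (-903.45, -903.48, 1),
}
results = {name: lr_test(*args) for name, args in tests.items()}
for name, (stat, p) in results.items():
    print(f"{name}: LR = {stat:.2f}, p = {p:.4f}")
```

Comparing each p-value with 0.05 gives the same decisions as comparing the statistics with the tabulated critical values 15.51, 3.84 and 5.99.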
Therefore we choose the 2-component mixture model with the age covariate in the mixing probabilities and in the Poisson rates, with a common age coefficient in the rates. This model fits the data best.

After fitting the 2-component mixed Poisson regression model to the data, the Pearson goodness-of-fit statistic $X^2$ is 510.8 with 465 degrees of freedom. This value does not exceed the upper 95% critical point of the $\chi^2$-distribution on 465 degrees of freedom, $\chi^2_{465,0.95} = 516.27$, suggesting that there is no evidence of lack of fit in the mixed Poisson regression model. Furthermore, the Pearson, deviance and likelihood residuals from the fitted model are calculated and displayed in Figure 2.21, Figure 2.22 and Figure 2.23 respectively. Figure 2.21 shows that the Pearson residuals may not be approximately normal. On the other hand, Figure 2.22 and Figure 2.23 show that the deviance residuals and likelihood residuals are very similar to each other, and that the numbers of possible outliers in these two plots are the same, with the 72nd observation having the largest deviance and likelihood residuals. On omitting the 72nd observation, the deviance reduction is $r_{72}^2 = (3.488)^2 = 12.166$. This means that this observation has a great impact on the overall fit of the mixed Poisson regression model to the data.

For detection of influential observations, the average relative coefficient changes $w_i$ are calculated and displayed in Figure 2.24. The plot shows that there are a couple of influential observations, with the 434th observation having the largest value (0.417). On omitting the 434th observation, the average relative coefficient change for each parameter estimate is about 42%, and the new parameter estimates become

$$\hat\alpha_1 = (-1.1505, 0.2566), \quad \hat\alpha_2 = (0.5850, 0.2566) \quad \text{and} \quad \hat\beta = (-6.5083, 1.9982).$$

Note that the changes in the regression parameter estimates of the Poisson rates, especially the common regression parameter, are relatively larger than those in the mixing probabilities.
This suggests that the 434th observation has great influence on the Poisson rates rather than on the mixing probabilities. The residual analysis identifies possible outliers and influential observations in terms of the mixed Poisson regression model. We now interpret the fitted model as follows.

The chosen mixture model suggests that work injury counts are generated by two underlying Poisson distributions with rates defined by

$$\lambda_1(x_i^{(r)}, \hat\alpha_1) = \exp(-1.4545 + 0.3341\log(\text{age}_i))$$

$$\text{and} \quad \lambda_2(x_i^{(r)}, \hat\alpha_2) = \exp(0.3066 + 0.3341\log(\text{age}_i)).$$

These two distributions are mixed according to the mixing probabilities defined by

$$p_1(x_i^{(m)}, \hat\beta) = \frac{\exp(-6.8705 + 2.1068\log(\text{age}_i))}{\exp(-6.8705 + 2.1068\log(\text{age}_i)) + 1}$$

$$\text{and} \quad p_2(x_i^{(m)}, \hat\beta) = \frac{1}{\exp(-6.8705 + 2.1068\log(\text{age}_i)) + 1}.$$

According to this model, employees may be classified into two groups on the basis of work injury rates. Those in group one have a relatively low baseline risk, and those in group two a high baseline risk. Age, however, has the same effect on both groups: as employees get older, their chances of having a work injury increase. On the other hand, since the mixing probability for group one, $p_1(x_i^{(m)}, \hat\beta)$, increases with age, there are more senior employees than young ones in the low risk group. For example, for a 25 year old employee, there is a 47.8% chance of being classified into the low risk group with an accident rate of 0.7 work injuries per year, and a 52.2% chance of being classified into the high risk group with an accident rate of 4.0 work injuries per year. For a 50 year old employee, there is a 79.8% chance of being classified into the low risk group with an accident rate of 0.9 work injuries per year, and a 20.2% chance of being classified into the high risk group with an accident rate of 5.0 work injuries per year. Figure 2.20 provides the estimated work injury rate corresponding to each group (the solid line is the rate for the low risk group and the dotted line is the rate for the high risk group).
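The worked figures for the 25 and 50 year old employees follow directly from the fitted equations. The sketch below assumes a common log(age) coefficient of 0.3341 in both rates, which is the value that reproduces the quoted rates of 4.0 and 5.0; `injury_profile` is a hypothetical helper name.

```python
import math

COMMON_AGE_SLOPE = 0.3341  # assumption: common log(age) coefficient in both Poisson rates

def injury_profile(age):
    """Return (P(low risk group), low-risk rate, high-risk rate) for a given age."""
    log_age = math.log(age)
    eta = -6.8705 + 2.1068 * log_age           # linear predictor of the mixing probability
    p_low = 1.0 / (1.0 + math.exp(-eta))       # logistic form of p_1
    rate_low = math.exp(-1.4545 + COMMON_AGE_SLOPE * log_age)
    rate_high = math.exp(0.3066 + COMMON_AGE_SLOPE * log_age)
    return p_low, rate_low, rate_high

profiles = {age: injury_profile(age) for age in (25, 50)}
for age, (p_low, rate_low, rate_high) in profiles.items():
    print(f"age {age}: P(low risk) = {p_low:.3f}, "
          f"low rate = {rate_low:.1f}, high rate = {rate_high:.1f}")
```

Because the age slope is shared, the ratio of the high-risk rate to the low-risk rate is the same at every age; only the probability of membership in each group changes with age.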
Note that Figure 2.20 also classifies the employees in terms of the estimated posterior probabilities. Those observations marked as "1" form the low risk group, which is characterized by the function $\lambda_1(x_i^{(r)}, \hat\alpha_1)$, while those marked as "2" form the high risk group, which is characterized by the function $\lambda_2(x_i^{(r)}, \hat\alpha_2)$.

In this example, we have found that neither the problem drinking measure, the Mortimer-Filkins test score, nor the job satisfaction score is a good predictor of workplace injuries. On the other hand, age is a significant predictor of workplace injuries. After taking age effects into account, the accident rates do not depend on the Mortimer-Filkins test score or job satisfaction, but only on age through the log-linear function. The workplace injury data are well fitted by the 2-component mixed Poisson regression model, which consists of two Poisson regression models. According to the model, the employees can be classified into two groups depending on baseline risk, and the likelihood of being in one of the baseline groups is associated with age. Note also that the inferences differ from those obtained through the usual Poisson regression analysis. The goodness-of-fit test shows that there is no significant evidence of lack of fit. In addition, the residual analysis identifies several outliers and influential observations in terms of the fitted model.

2.8.5 Ames Salmonella Assay Data

The data in this example were first presented by Margolin et al. (1981) from an Ames salmonella reverse mutagenicity assay, and were analyzed by Breslow (1984) and Lawless (1987b) using quasi-likelihood and negative binomial approaches respectively. Table 2.15 shows the number of revertant colonies ($y_i$) observed on each of three replicate plates tested at each of six dose levels of quinoline ($d_i$).

Lawless (1987b) defined the expected frequency of revertants as

$$E(Y_i \mid d_i) = \lambda(d_i) = \exp(\alpha_0 + \alpha_1 d_i + \alpha_2\log(d_i + 10)),$$

while Breslow (1984) assumed $E(Y_i \mid d_i) = \lambda(d_i)$.
At issue is whether a mutagenic effect is present. This corresponds to testing the hypothesis that $\alpha_2 = 0$. The data are overdispersed relative to Poisson regression with the rate defined above, since each of the three tests for overdispersion is highly significant ($P_a = 5.628$, $P_b = 5.656$ and $P_c = 5.607$). To account for overdispersion, Breslow (1984) assumed a variance function $\mathrm{Var}(Y_i) = \lambda(d_i) + \sigma^2\lambda(d_i)$, and obtained parameter estimates by using weighted least squares combined with the method of moments. Similarly, Lawless (1987b) fitted the data with a negative binomial model in which the variance function is $\mathrm{Var}(Y_i) = \lambda(d_i) + \sigma^2\lambda(d_i)^2$, and obtained parameter estimates by maximum likelihood. Parameter estimates (standard errors) are reported in Table 2.17.

Our analysis of the data using mixed Poisson regression models follows. We assume that

(1) the number of observed revertant colonies, $y_i$, is associated with covariates $x_i = (1, d_i, \log(d_i + 10))$, and $t_i = 1$;

(2) $Y_i$ are independent and follow a mixed Poisson regression model with Poisson rates

$$\lambda_j(x_i, \alpha_j) = \exp(\alpha_{j0} + \alpha_{j1}d_i + \alpha_{j2}\log(d_i + 10)),$$

where $i = 1, \ldots, 18$ and $j = 1, \ldots, c$.

Table 2.16 shows the results of fitting these models. Among the three saturated models, both AIC and BIC lead to the choice of 2-component mixtures. To test for mutagenic effects, we compare the saturated model with the one without the covariate $\log(d_i + 10)$ by a likelihood ratio test. Since the chi-square test statistic equals $2 \times (68.81 - 60.90) = 15.82 > \chi^2_{2,0.99} = 9.21$, mutagenic effects are significant. Further, the similar regression coefficient estimates for each component in the saturated model suggest common regression coefficients for both components. This is confirmed by the likelihood ratio test (the chi-square test statistic is $2 \times 0.01 = 0.02 < \chi^2_{2,0.99} = 9.21$).
Hence we choose to represent the data by the 2-component mixture with common regression coefficients and different intercepts for each component.

The fitted model may be interpreted as follows. In it the mixing probabilities equal 0.8173 and 0.1827, and the respective rates are

$$\lambda_1(x_i, \hat\alpha_1) = \exp(1.9094 - 0.00126 d_i + 0.3640\log(d_i + 10))$$

$$\text{and} \quad \lambda_2(x_i, \hat\alpha_2) = \exp(2.4768 - 0.00126 d_i + 0.3640\log(d_i + 10)).$$

This model indicates that mutagenic effects are the same for both components. The model may also be regarded as a Poisson regression with a random intercept following a discrete mixing distribution with two points of support. Figure 2.25 shows the classification of the data, in which each observation is identified with one of the two components in the mixture according to the estimated posterior probabilities defined by (2.20). This plot may provide a way to visualize overdispersion in the data. From it we conjecture that the three observations classified with component 2 may be outliers in terms of the Poisson regression model, and that overdispersion may be due to these three observations. In fact, the residual analysis below adds strength to this conjecture.

After fitting the 2-component mixed Poisson regression model to the data, the Pearson goodness-of-fit statistic $X^2$ is 16.2 with 13 degrees of freedom. This value does not exceed the upper 95% critical point of the $\chi^2$-distribution on 13 degrees of freedom, $\chi^2_{13,0.95} = 22.36$, suggesting that there is no evidence of lack of fit. Moreover, the Pearson, deviance and likelihood residuals are displayed in Figure 2.26, Figure 2.27 and Figure 2.28 respectively. Figure 2.27 and Figure 2.28 show that the deviance and likelihood residuals are very similar to each other. On the other hand, Figure 2.26 indicates that the Pearson residuals may not be approximately normal.

For detection of influential observations, the average relative coefficient changes $w_i$ are calculated and displayed in Figure 2.29.
Clearly, the plot shows that the 12th observation is influential. On omitting the 12th observation, the new estimates of the intercepts in the two components are 2.2242 and 2.5460 respectively; the new estimates of the other common regression parameters are -0.00067 and 0.2430 respectively; and the new estimates of the mixing probabilities for the two components are 0.5644 and 0.4356 respectively. Note that the new intercept estimates are very close, suggesting that the data excluding the 12th observation may not be overdispersed. In fact, when we fit the Poisson regression model to these data, we find no strong evidence of overdispersion, because none of the three overdispersion score test statistics is significant ($P_a = 1.6142$, $P_b = 1.6132$ and $P_c = 1.8543$). If we use the corrected forms of these score test statistics for small samples, the values are 2.1339, 2.1328 and 2.3688 respectively. These values are marginal relative to the normal critical values at level $\alpha = 0.05$, suggesting again that there is no strong evidence of overdispersion. Allowing for the possibility that the data excluding the 12th observation are overdispersed, we also fit the 2- and 3-component mixed Poisson regression models to them, and select the (one-component) Poisson regression model because it yields the largest values of AIC and BIC among the three saturated models. That is, the values of AIC for the Poisson regression and the 2- and 3-component saturated mixed Poisson regression models are -61.3, -61.4 and -64.4 respectively, and the values of BIC are -62.7, -64.5 and -68.9 respectively. The analysis shows that extra-Poisson variation may be caused by outliers in terms of Poisson regression, and that the mixed Poisson regression model may tend to model these outliers by extra components.
Note also that the changes in the parameter estimates and corresponding standard errors between the two Poisson regression models with and without the 12th observation may not be very large, suggesting that the 12th observation may be an outlier in terms of the Poisson regression with the complete data.

From Table 2.17, we note that the regression coefficient estimates, $\hat\alpha_1$ and $\hat\alpha_2$, do not vary drastically across models, but their standard errors do. For instance, the value of $\hat\alpha_2/\mathrm{se}(\hat\alpha_2)$ changes from 0.3640/0.0665 = 5.4737 under the mixed Poisson regression model to 0.3110/0.09901 = 3.1411 under the quasi-likelihood model. Thus, although all four models agree that mutagenic effects are significant, they disagree as to the degree of significance of the effects. Note that confidence intervals under the mixed Poisson regression model are much narrower than those under either the quasi-likelihood or the negative binomial model. Hence effects are estimated more precisely. For example, an approximate 95% confidence interval for the coefficient of log(dose + 10) is 0.3640 ± 0.1303 under the mixed Poisson regression model, 0.3110 ± 0.1941 under quasi-likelihood, and 0.313 ± 0.1701 under the negative binomial model. This suggests that using different models to account for overdispersion may lead to different conclusions.

In this example, we analyzed the data set from an Ames salmonella reverse mutagenicity assay. The data are well fitted by the 2-component mixed Poisson regression model with constant mixing probabilities and Poisson rates as functions of dose level. Note that the mutagenic effects are the same for both components, while the intercepts in the Poisson rates vary between the two components. The goodness-of-fit test suggests that there is no evidence of lack of fit in the model. In addition, the residual analysis identifies one influential observation. Excluding this observation, the data are not overdispersed.
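The standard-error comparison for the coefficient of log(dose + 10) can be checked with a few lines of arithmetic. This sketch assumes the usual Wald construction, estimate ± 1.96 × standard error, and takes the estimates and standard errors quoted in the text; the variable names are illustrative.

```python
Z_975 = 1.96  # normal quantile used for an approximate 95% interval

# (estimate, standard error) for the coefficient of log(dose + 10), as quoted in the text.
estimates = {
    "mixed Poisson": (0.3640, 0.0665),
    "quasi-likelihood": (0.3110, 0.09901),
    "negative binomial": (0.313, 0.0868),
}

def wald_summary(estimate, std_err):
    """Return the Wald ratio and the half-width of the approximate 95% interval."""
    return estimate / std_err, Z_975 * std_err

summaries = {name: wald_summary(*pair) for name, pair in estimates.items()}
for name, (ratio, half_width) in summaries.items():
    est = estimates[name][0]
    print(f"{name}: ratio = {ratio:.4f}, "
          f"interval = ({est - half_width:.4f}, {est + half_width:.4f})")
```

All three intervals exclude zero, so the models agree on the presence of a mutagenic effect; they differ in how tightly the effect is pinned down.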
This example suggests that extra-Poisson variation may be causedby the presence of outliers in terms of Poisson regression, and that the mixed Poissonregression may model these outliers by including extra components. This example alsoillustrates a difference between our approach and the usual approaches for accounting foroverdispersion. Since the variance exceeds the mean, methods which correct for this byincreasing the variance may lead to less significant regression coefficient estimates. Ourapproach has a different effect. By attributing overdispersion to the presence of severalcomponents, the mixed Poisson regression model estimates coefficient effects with smallererror.2.9 Tables and Figures in Chapter 2Table2.1TheresultsofthesimulationsforthemixedPoissonregressionmodelsITheFirstModelComp.PoissonRatesMixingProbabilitiesa,E(&,0)Var(&10)a11E(&11)Var(&,1)!1OE(I10)Var(g,,)nE(i,1)Var(1t)12.82.79550.0424-2.9-2.90330.10761.11.17640.084522.62.61460.01830.40.39030.00900.60.59380.105033.63.59860.00950.20.19830.0065TheSecondModelComp.PoissonRatesMixingProbabilitiesE(&)Var(&10)a11E(&1)Var(&1)IoE(0)Var(,)IiE(,1)Var(j,1)10.40.36990.02682.01.92380.6256-2.0-2.00340.577623.03.00190.0011-1.4-1.51750.84281.51.56860.457232.01.98820.0104TheThirdModelComp.PoissonRatesMixingProbabilitiescx0E(&10)Var(&,0)a11E(&11)Var(&11)i3E(j,0)Var(t)i3E(g,1)Var(,,1)12.82.81410.0376-2.9-2.94610.15232.02.26480.9838-2.0-2.25280.990423.63.58110.00690.20.21190.0027-1.4-1.43460.75301.51.57760.437232.62.58700.01870.40.40620.010400Table2.2:Theresultof 
aMonteCarlostudyonthe2-componentmixedPoissionregressionmodelb...withconstantmixingprobabilitiesandvariablerates--I.parametertruevaluemeanstandardupperuppermedianlowerlowerdeviationextremequartilequartileextremea11.00.98850.06301.05011.00001.00000.95580.8972a20.50.47930.06990.4388-0.47620.50000.50230.54420.60.60370.06990.88780.66880.59960.51200.2917a2.01.99850.04202.05702.01192.00001.97811.9277a0.50.47660.05850.42240.46890.50000.5000O.532720.60.62390.14370.92390.70150.60550.55240.3628a11.00.97340.05621.06131.00001.00000.93470.8467a22.52.46290.1419-2.49452.50002.5000-2.5085-2.52050.60.58780.08650.80290.64660.58780.52760.3652a12.01.99320.04742.01762.00001.99991.98751.9704a2.52.48280.0801-2.4997-2.49992.50002.50002.500020.60.59610.07050.75070.64750.59810.55100.411728Table2.3:TheresultofaMonteCarlostudyonthe2-componentmixedPoissionregressionmodelF I.withconstantmixingprobabilitiesandvariablerates--ILparametertruevaluemeanstandardupperuppermedianlowerlowerdeviationextremequartilequartileextremea1.00.99060.06781.08291.01051.00000.95840.8813a0.5-0.48200.04844).4305-0.4707-0.50000.5000-0.530420.40.39750.17150.69650.47590.39990.32680.1045a2.01.99670.04302.05112.01052.00001.98201.9399-0.5-0.46920.0657-0.4298-0.4699-0.4993-0.5000-0.5432a20.40.42990.16830.66540.49070.40870.34010.1195a11.00.98130.06151.05471.00001.00000.96250.9064a.2.5-2.48170.08602.4995-2.49982.50002.50002.500020.40.38180.08190.56190.42960.38180.33230.1991a12.01.99380.03712.02032.00002.00001.98581.9657a-2.5-2.47630.10452.50002.50002.50002.50002.500020.40.39650.07510.59000.44830.39670.35060.2086Chapter 2. Mixed Poisson Regression ModelsPt0.6 0.4a1 a2-0.5 63/200 17/2001.0-2.5 200/200 198/200-0.5 174/200 119/2002.0-2.5 200/200 200/200Table 2.4: The results of the likelihood ratio tests for the hypothesis of a2 = 0 based onthe 2-component mixed Poisson regression model—I.Chapter 2. 
Mixed Poisson Regression Models 87p1=o.6 p,=O.4parameter — true value mean standard mean standarddeviation deviation1.0 0.9910 0.0849 0.9882 0.0861a1-0.5-0.2668 0.1362-0.1611 0.1306a2a 2.0 1.9985 0.5669 1.9954 0.05461-0.5-0.2708 0.0889-0.1693 0.0862a21.0 0.9976 0.0797 0.9903 0.0761a1a 2.5 -0.8055 0.2021 0.4377 0. 15982a 2.0 1.9959 0.0500 1.9931 0.0517I-2.5-0.8065 0.1789-0.4536 0.1425a2Table 2.5: The results of fitting mixed Poisson regression model to the data from a MonteCarlo study on the 2-component mixed Poisson regression model with constant mixingprobabilities and variable rates.‘Jhapter 2. Mixed Poisson Regression Models—Pt0.6 0.4a1 a2-0.5 99/200 47/2001.0-2.5 200/200 177/200-0.5 . 181/200 122/2002.0-2.5 200/200 200/200Table 2.6: The results of the likelihood ratio tests for the hypothesis of a2 = 0 based onthe 2-component mixed Poisson regression model—IT.ModelNumber#of replicatesthatAICleads#ofrepicatesthatBICleadstotalnumberof replicatesthechoiceoftherightmodelthechoiceoftherightmodel196100100287100100391100100Table2.7:TheresultsofmodelselectionbasedonAICandBICvaluesfortheMonteCarlostudy.I 00 CDTable2.8:Poissonregressionandoverdispersionteststatisticsfor thepatentdata.I.Covariateslog-1log (RND)(log(RND))2likelihoodab3.155(0.2466)-1780.374.29374.29374.290.53920.9279(0.0906)(0.0231)-316.6924.4724.4728.770.62070.85600.0123(0.1206)(0.0770)(0.0127)-316.2424.4524.4528.89*‘a‘bandParescoreteststatisticswhichasymptoticallyfollowthestandardnormaldistribution.‘a‘a‘a1.31.3P.PPrPjfr(A4‘0U((A000o(•.400000000—QGaPp—p-4..U.l.a0LA0jLA.a‘0—oo.-‘a800—,!l?S-.,O’.001.)r—.0e—‘aOs(A.4bI0.•1b0e2 ! $ 20.3 Iop Os LA 1.3‘a 0 0 00 0.0 00•1 If I I I t I&-iiiiiiii1. 
D .408p——l1•3—sot) iu.I 00w 40 .4 000Table2.10:Parameterestimatesfor fivemodelsfor patent data.I ciParametersPoissonQuasi-Quasi-Quasi-MixedPoisson_RegressionEstimatedRegressionlikelihoodIlikelihoodIIlikelihoodificomp1comp2comp3Intercept0.62070.62070.97340.6626.16.2330.59000.7025(0.1206)(0.2935)(0.1256)(0.1540)-__(0.4149)(0.1422)log(R&D)0.85600.85600.52680.73219861.78010.5182(0.0770)(0.1874)(0.0806)(0.1087)(1.M(0.2748)(0.0916)(log(R&D))20.01230.01230.06870.0415-1.0137-0.19610.0755(0.0127)(0.0309)(0.0201)(0.0267)(0.2046)(0.0430)(0.0175)Mixingprobabilities0.18190.17730.6408dispersion1.05.9170.20940.4070parametertzChapter 2. Mixed Poisson Regression ?pfodejs. 93Table 2.11: Parameter estimates for five methods for seizure data.Parameters Poisson Method Method Method Mixed Poissoe RegressiceEstimated Regression I U IllComp 1 Comp 2Intercept 2.118 2.118 2.148 2.129 2.8450 2.0704(0.0815) (0.1897) (0.5539) (0.3846) (0.2360) (0.0890)x1 4.132 4.132 4.656 3.757 1.3020 7.4318(0.3032) (0.7059) (1.094) (0.8322) (0.4904) (0.5095)x2 -0.2257 -0.2257 -0.2412 -0.2408-0.4063 -0.2707(0.0329) (0.0766) (0.2191) (0.1523) (0.0909) (0.0377)x3 -1.320 -1.320 -1.440 -1.221 -0.4309-2.2762(0.0800) (0.1863) (0.3098) (0.2316) (0.1385) (0.1377)Mixing NA NA NA NA 0.2762 0.7238• ProbabilitiesUnexplained 1.0 5.4206 0.8631 0.4051 NA NAVarianceChapter 2. 
Mixed Poisson Regression Models 94Table 2.12: Mixed Poisson regression model estimates for seizure data.Coapceeo Mixing Poisson rate logprobabilitylikelihood AIC BIC0)p1 a10 a1 aj2 a131 -component mixture[ [ 2.118 4.132 -0.27 -1.320 [ -583.16 [ -587.16 [ -3.042-component mixture1 0.4128 1.2183-700.10-703.10-707.412 0.5872 -1.1571• 1 0.3715 1.8959-1.2761-462.79-467.79 475.142 0.6285 1.3777-3.10181 0.3736 2.9919-0.4732-0.4718426.21-433.21-443.512 0.6264 2.1791-2.3248-0.33791 0.2761 2.8450 1.3020-0.4063-0.4309(0.2360) (0.4924) (0.0909) (0.1385) -376.18-385.18 -398.412 0.7239 2.0704 7.4318-0.2707-2.2762(0.0890) (0.5095) (0.0377) (0.1371)3-component mixture1 0.2742 2.8440 1.2938 -0.4054-0.42942 0.0277 2.0809-28.767-0.3928 5.4488-375.29-389.29 409.883 0.6981 2.0694 7.3197 .0.2648-2.24781%)I-t’)I.t)s-”-M000000t’)GoI’.)00t-t)I’.)-0’Vi00— -4(%)-40 Ls)jb0’(‘3i—00’Vi0Vi0Vi%QOi.--3 Vi‘.3!.3 Vi0O Vil30 -3bI-.-4.1U.(‘3V)O0’t-,—4i—000I-’—30’-.ppt)I—0-4i%)—300‘30Vi000-0II—wWVi;0•‘-3Vi—4Vi-440’i-.ViViC-400Vi4(‘300.O\000çVi(s).0II..)(.J.Vi-40’.-4-00)IlVi‘.3C’ 0 0 0 >4C)(‘-3 C’ >4I z z z 00 I— (.3 Vi 00 0 0’. (‘.3 1:.. 00 (‘.3 00 0I C’ 0 I >4C)‘-—‘0.0 >4 S. -t 0 0 0 cn C,) 0 0I-I C.)0 (‘.30’.0’I— Vi00 (‘.3 (‘-3 t&)\000 400Vi..(3“I—‘Ut00 0 00 0’.0 00—3 0 Vi0C,’IIII1J3 ViViVi00Vi.00U)i_il..3U)(‘3UiU)IIIIIU)U)U)U)U)-.30’.a’.0’‘.0—30043.000bobo0030’.U)‘0I-’43.Vi‘0C90•0(10%0%I-0%0.III•1II:iIUmIj,,x8.IIII0009z1I:900000a...‘qi-000%9r9——.-.ar—10000!.0(I(.—(19‘o--:£)?rIr-1-°9000.r)1990.%0‘0v•lf)$.w,r,000‘9??0‘9‘?II‘0V.0.00(‘0V.I—F!“10999(1V.S9(-Ia’909090%000(‘I9V.—0’1—fl(11-4—Table2.15:Number Ofrevertantcoloniesof salmonella(y1)Doseofquinolined,(j.Lg/plate)________________________0I10331003331000Observed#of151616273320colonies211826413827292133604142I.Chapter 2. 
Mixed Poisson Regression Models 98C_z Pviuoa rate loga1t probability likelihood PJC BKp, - a, ai1 a121-component mixture1 [ J 2.173 -0.001013 0.3198 -68.13 -71.13 -72.472-component mixture1 0.6145 3.0779-68.93 -71.93 -73.272 0.3855 3.71121 0.5617 2.9886 0.000188.68.81 -73.81 -76.04• 2 0.4383 3.6428 0.0000821 0.8132 1.9125 -0.001247 0.3623-60.90 -67.90 -71.022 0.1868 2.4064 -0.001294 0.37901 0.8173 1.9094-0.001260 0.3640 -60.91 -65.91 -68.142 0.1827 2.4768-___________3-component mixture1 0.5918 1.8484 ..0.001190 0.36402 0.3241 2.8535 -0.000154 0.1476-60.78 -71.78 -76.683 0.0841 5.9320 -0.000100 -03895Table 2.16: Mixed Poisson regression model estimates for Ames salmonella assay data.Table2.17:Parameterestimatesforfiveestimationmethodsforassaydata1asappearedinBreslow(1984)2asappearedinLawless(1987b).I.ParametersPoissonregressionQuasi-NegativeMixedPoissonRegressionIIEstimatedLikelihood’Binomial2completeincomplete3Comp1Comp2Intercept2.1732.3082.2032.2031.90942.3768(0.2183)(0.2266)(0.3634)(0.359)(0.2674)(0.2753)Dose-0.001013-0.000750-0.000974-0.000980-0.001260(0.000245)(0.000265)(0.000437)(0.000381)(0.000275)log(Dose•10)0.31980.26320.31100.3130.3640(0.05698)(0.0607)(0.09901)(0.0868)(0.0665)Mixing0.81730.1827ProbabilitiesUnexplained1.01.00.071810.0488VarianceThedataexcludingthe12thobservation.Fkiure2.1:The jndx.pJotofthePearsonresidualsfromthefitted3-ComponentmixedF’issonregressionmodeltorthepatentdatar.....C4j1•0—.(I) 0 U) C 0 Cl) aS a) 0..... 0I—I...20.4060indexB.. IFkiure2.2:The.indexplotofthedevianceresidualsfromthefitted3-componentmixedF’Otssonregressionmodelforthepatentdata.Ce)..c’J...(I) 0 (1) 0 C ( > 0) 0.0•..10204060.I-index.Fkiure2.3:The.indejp.lotofthelikeljhoodresidualsfromthefitted3-omponentmixedt’Oissonregressionmodelbrmepatentdata..c’j..(0 D U) ci) I 0 0 0 ci)0tb3 I. I...C4;J...0204060indexFiqtjye2.4:Theindexplçtofthe.averaqerelativecoefficientchanqesfromthGtitted3-componentmixedI-’oissonregressionmodelbrthepatentdata.Cl) a, 0) C (‘S C.) 
Figure 2.5: The plot of the patent data. (Horizontal axis: log(R&D).)

Figure 2.6: The classification of the patent data according to the estimated posterior probabilities based on the fitted mixed Poisson regression model. (Horizontal axis: log(R&D).)

Figure 2.7: Daily epileptic seizure counts. (Horizontal axis: day; baseline and treatment periods marked.)

Figure 2.8: Estimated hourly seizure rates and classification of the seizure data according to the estimated posterior probabilities based on the fitted mixed Poisson regression model. (Horizontal axis: day.)

Figure 2.9: Estimated mean and variance based on the fitted mixed Poisson regression model for the seizure data. (Horizontal axis: day; baseline and treatment periods marked.)

Figure 2.10: The index plot of the Pearson residuals from the fitted mixed Poisson regression model for the seizure data.

Figure 2.11: The index plot of the deviance residuals from the fitted mixed Poisson regression model for the seizure data.

Figure 2.12: The index plot of the likelihood residuals from the fitted mixed Poisson regression model for the seizure data.

Figure 2.13: The index plot of the average relative coefficient changes from the fitted mixed Poisson regression model for the seizure data.

Figure 2.14: The time plot of the terrorist bombing data.

Figure 2.15: Classification of the terrorist bombing episodes according to the estimated posterior probabilities based on the fitted mixed Poisson regression model. (Horizontal axis: month.)

Figure 2.16: The index plot of the Pearson residuals from the fitted mixed Poisson regression model for the terrorist bombing data.

Figure 2.17: The index plot of the deviance residuals from the fitted mixed Poisson regression model for the terrorist bombing data.

Figure 2.18: The index plot of the likelihood residuals from the fitted mixed Poisson regression model for the terrorist bombing data.

Figure 2.19: The index plot of the average relative coefficient changes from the fitted mixed Poisson regression model for the terrorist bombing data. (Labelled point: 7th observation.)

[Figure 2.20: caption illegible in the source; axis label: number of accidents.]

Figure 2.21: The index plot of the Pearson residuals from the fitted mixed Poisson regression model for the accident data.
Figure 2.22: The index plot of the deviance residuals from the fitted Poisson regression model for the accident data.

Figure 2.23: The index plot of the likelihood residuals from the fitted mixed Poisson regression model for the accident data.

Figure 2.24: The index plot of the average relative coefficient changes from the fitted mixed Poisson regression model for the accident data. (Labelled point: 434th observation.)

Figure 2.25: Classification of the Ames salmonella assay data according to the estimated posterior probabilities based on the fitted mixed Poisson regression model. (Horizontal axis: dose of quinoline.)

Figure 2.26: The index plot of the Pearson residuals from the fitted mixed Poisson regression model for the Ames salmonella assay data.

Figure 2.27: The index plot of the deviance residuals from the fitted mixed Poisson regression model for the Ames salmonella assay data.

Figure 2.28: The index plot of the likelihood residuals from the fitted mixed Poisson regression model for the Ames salmonella assay data.

Figure 2.29: The index plot of the average relative coefficient changes from the mixed Poisson regression model for the Ames salmonella assay data. (Labelled point: 12th observation.)

Chapter 3

Mixed Logistic Regression Models

3.1 Logistic Regression and Its Modifications

The logistic regression model has been widely used for analyzing count data in which each observation consists of a finite-valued response variable and a vector of covariates or predictors. Areas of application include epidemiology, quantal bioassay, and the social sciences.
Sometimes the model fits poorly, suggesting the need for alternative models. In this case, it is not uncommon that the observed data are overdispersed relative to the binomial assumption. In the second part of this dissertation, mixed logistic regression models are introduced and investigated. These models are applicable in several different situations where the usual logistic regression model is inadequate. They provide an alternative to the quasi-likelihood approach and others for modelling extra-binomial variation, with a more meaningful interpretation.

Suppose that the $i$th response $Y_i$ is a count of successes in $m_i$ trials, and associated with this response is a covariate vector $x_i = (x_{i1}, \ldots, x_{ik})'$ for $1 \le i \le n$. The logistic regression model assumes that the $Y_i$ are distributed independently binomial$(m_i, \pi_i)$ with density function given by

$$f(y_i \mid \alpha, x_i, m_i) = \binom{m_i}{y_i} \pi_i^{y_i} (1 - \pi_i)^{m_i - y_i},$$

where $\pi_i = \pi(x_i, \alpha) = \exp(x_i'\alpha)/(1 + \exp(x_i'\alpha))$, $\alpha \in \mathbb{R}^k$ is an unknown regression parameter vector, $m_i$ is an integer and $y_i = 0, 1, \ldots, m_i$. Note that the binomial parameter $\pi_i$ is related to the linear part, $x_i'\alpha$, through a logit transformation. Note also that $m_i$ may vary with $i$.

The logistic regression model may be used as follows. Sometimes, inference concerning the $\alpha$'s is of primary importance. For example, when $m_i = 1$, $Y_i = 1$ may denote the occurrence of a particular event of interest. Large $\alpha$'s (relative to their standard errors) correspond to factors which increase the chance of the event.

There are several reasons for the widespread popularity of the logistic regression model. Cox (1970) argues from considerations of sufficiency. By writing down the likelihood based on $\{(y_1, x_1), \ldots, (y_n, x_n)\}$, one discovers that the vector $(\sum_i y_i x_{i1}, \ldots, \sum_i y_i x_{ik})'$ is sufficient for $\alpha$. Cox (1970) feels that this model is the most useful analogue, for binomial data, of the normal linear model.
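The sufficiency argument can be illustrated numerically. The following is a minimal sketch, not taken from the thesis; the function names and the toy data are illustrative assumptions. It evaluates the binomial logistic log-likelihood and shows that two data sets with the same design, the same $m_i$ and the same value of the sufficient vector have log-likelihoods that differ only by a constant not depending on $\alpha$.

```python
import math


def inv_logit(z):
    """Logistic function: pi = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def binomial_logistic_loglik(alpha, X, y, m):
    """Log-likelihood of independent binomial(m_i, pi_i) counts
    with logit(pi_i) = x_i' alpha."""
    total = 0.0
    for x_i, y_i, m_i in zip(X, y, m):
        eta = sum(a * x for a, x in zip(alpha, x_i))
        pi = inv_logit(eta)
        total += (math.log(math.comb(m_i, y_i))
                  + y_i * math.log(pi)
                  + (m_i - y_i) * math.log(1.0 - pi))
    return total


def sufficient_statistic(X, y):
    """The vector (sum_i y_i x_i1, ..., sum_i y_i x_ik)."""
    k = len(X[0])
    return [sum(y_i * x_i[j] for x_i, y_i in zip(X, y)) for j in range(k)]


# Hypothetical toy data: an intercept and one binary covariate.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
m = [5, 5, 5, 5]
y_a = [1, 3, 2, 4]
y_b = [2, 2, 3, 3]  # different counts, same sufficient vector as y_a

for alpha in ([0.2, -0.5], [-1.0, 0.7]):
    diff = (binomial_logistic_loglik(alpha, X, y_a, m)
            - binomial_logistic_loglik(alpha, X, y_b, m))
    print(alpha, diff)  # the difference does not change with alpha
```

The constant difference is the difference of the two sums of log binomial coefficients, which carries no information about $\alpha$.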
When the covariates are nominal or ordinal, there is a correspondence between the logistic parameters and the parameters of a log linear model for cross-classified data (Fienberg (1981)). Finally, inference for the $\alpha$'s remains unaffected regardless of whether the data are sampled prospectively or retrospectively (see, for example, McCullagh and Nelder (1989)).

The logistic regression model is an example of a Generalized Linear Model (GLM) which is discussed by McCullagh and Nelder (1989). GLMs are models for regression data, i.e. a response $Y$ measured along with a vector of covariates $x$. Under the GLM formulation, the response $Y$ has a distribution which is a member of the exponential family and some monotonic differentiable function of the expected value $\mu = E(Y)$ (called the link function), $g(\mu)$, is expressed as a linear combination of covariates and parameters. For binomial regression data, the proportion $Y/m$ is regarded as the response and $E(Y/m) = \pi$. Hence for the logistic regression model, the link function is the logit function, i.e., $g(\pi) = \log(\pi/(1 - \pi))$.

When the logistic regression model fits the data poorly there are several alternative models to consider. Using the GLM formulation, these alternatives can be dichotomized into link function or frequency distribution modifications. To understand some of these generalizations, it becomes important to distinguish between two types of data sets. Suppose that in a designed experiment, experimental units are sampled and a 0-1 response along with some covariates is recorded for each unit. We call such data sets ungrouped or point binomial, and the fundamental experimental units Bernoulli experimental units. Observations of this nature arise, for instance, in some medical trials where an end-period result for each patient (experimental unit) is either recovered ($Y = 1$) or unrecovered ($Y = 0$).
Alternatively, if 0-1 responses are grouped under each experimental condition and the cumulative number of positive responses for each condition is recorded along with a vector of covariates describing the condition, we call such data sets grouped and the fundamental experimental units binomial experimental units. In a toxicity experiment, for example, tanks of fish are exposed to some toxic agent at several levels and the incidence of liver tumors in each tank is recorded. Here the tanks are the fundamental experimental units and each provides a $0, 1, \ldots, m_i$ response where $m_i$ is the size of the $i$th tank. With the logistic regression model, the distinction between these two types of data sets is superfluous. The log-likelihoods under the two regimes differ only by an irrelevant constant term $\sum_i \ln \binom{m_i}{y_i}$, and inference remains unaffected. When considering generalizations, however, the distinction between the two types of data can be crucial. While grouped data can be modelled by non-binomial frequency distributions, with ungrouped data we do not have this option. Any model for a Bernoulli response, $Y = 0$ or $1$, is determined by $P(Y = 1)$, which specifies a binomial model with $m = 1$ and $\pi = P(Y = 1)$.

3.1.1 Link Modifications

A wide choice of link function $g(\pi)$ is available. In addition to the logistic function, at least two other functions are commonly used in practice: (1) the probit function $g(\pi) = \Phi^{-1}(\pi)$, where $\Phi^{-1}(\pi)$ is the inverse of the standard Normal integral. This function is symmetric in $\pi$ and for any value of $\pi$ in the range $(0, 1)$, the corresponding value of the probit of $\pi$ will lie between $-\infty$ and $\infty$. Note that when $\pi = 0.5$, probit$(\pi) = 0$; and (2) the complementary log-log function $\ln(-\ln(1 - \pi))$. This function again transforms a probability in the range $(0, 1)$ to a value in $(-\infty, \infty)$, but unlike the logistic and probit transformations, this function is not symmetric about $\pi = 0.5$.
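The symmetry properties of the three links can be checked directly. The sketch below is illustrative only: the function names are assumptions, and the probit is computed with the standard library's `statistics.NormalDist`. A link is symmetric about $\pi = 0.5$ when $g(\pi) = -g(1 - \pi)$.

```python
import math
from statistics import NormalDist


def logit(p):
    """Logit link: log odds of p."""
    return math.log(p / (1.0 - p))


def probit(p):
    """Probit link: inverse of the standard normal distribution function."""
    return NormalDist().inv_cdf(p)


def cloglog(p):
    """Complementary log-log link: ln(-ln(1 - p))."""
    return math.log(-math.log(1.0 - p))


for p in (0.1, 0.3, 0.5):
    # For a symmetric link, g(p) + g(1 - p) is zero.
    print(p,
          logit(p) + logit(1.0 - p),
          probit(p) + probit(1.0 - p),
          cloglog(p) + cloglog(1.0 - p))
```

For the logit and probit the printed sums are zero up to rounding, while for the complementary log-log link they are not.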
Note that all three link functions can be regarded as special cases of a general procedure that relates the probability of a positive response to the covariates through a link $G^{-1}(\pi)$, where $G$ is some continuous distribution function. In fact, the logistic link is the inverse of the logistic distribution, which is defined as $\pi(z) = \exp(z)/(1 + \exp(z)) = \Pr(Z \le z)$ where $Z$ is a standard logistic random variable. Similarly, the complementary log-log link can be derived by taking the inverse of the extreme value distribution function as the link function.

McCullagh and Nelder (1989) discuss and compare these link functions. Of these three link functions, the use of the complementary log-log function is limited to those situations where it is appropriate to deal with success probabilities in an asymmetric manner. The logit and probit link functions are quite similar to each other, but from a computational viewpoint, the logistic transformation is more convenient because it has an explicit analytical form. There are two other reasons why the logit link function is preferred to the other two link functions. First, it has a direct interpretation in terms of the logarithm of the odds in favor of a success. Second, models based on the logit link function are particularly appropriate for analysis of data that have been collected retrospectively, such as in a case-control study.

Other links include the angular, $g(\pi) = \sin^{-1}(\pi^{1/2})$, and the linear, $g(\pi) = \pi$. These links are discussed in Cox (1970). Of the links discussed above, the linear, angular, probit and logit are symmetric in the sense that $g^{-1}(z) = 1 - g^{-1}(-z)$, and these links are similar for probabilities in the range $(0.1, 0.9)$.

Relaxing the requirement that $g(\pi)$ be a linear function of the covariates, we can use nonlinear link functions to obtain a richer class of probability functions than the class specified by a linear link.
Prentice (1976) generalizes the logistic link symmetrically to

$$\pi = \int_{-\infty}^{x'\alpha} \frac{\exp(\gamma_1 w)\,(1 + \exp(w))^{-(\gamma_1 + \gamma_2)}}{B(\gamma_1, \gamma_2)}\,dw = \int_{-\infty}^{x'\alpha} f(w)\,dw, \qquad (3.1)$$

where $B(a, b)$ is the beta function. When $\gamma_1 = \gamma_2 = 1$, inverting (3.1) yields the logistic link. The parameters $\gamma_1$ and $\gamma_2$ indicate skewness and heaviness of tails of the density $f(w)$. Other special cases of $f(w)$ are extreme minimum value, extreme maximum value, probit, exponential, reflected exponential, and double exponential. Thus this model can be viewed as specifying a richer class of threshold distributions than the logistic alone.

Other link functions include the power transformations of the logit probability (Aranda-Ordaz (1981) and Guerrero and Johnson (1982)). A problem with these nonlinear link functions is that in some cases it may be difficult to compute the maximum likelihood estimates under the corresponding models. With the development of high speed computers, this problem may become less important.

Carroll et al. (1984) modify the probit link function by including covariates measured with error in Bernoulli experiments. With normal measurement errors, they discuss procedures to compute estimates for this model. They also demonstrate that the usual estimate of the probability of a positive response can be substantially in error when covariates are measured with non-trivial error. Their modification differs from the previous link alternatives in that the modified link is derived to accommodate a specific problem.

These approaches try to modify or enrich the basic logistic model by focusing on the relationship between the covariates and the probability of a positive response.

3.1.2 Frequency Distribution Modifications

A consequence of using the binomial frequency distribution in the logistic regression is that $\mathrm{Var}(Y) = m\pi(x, \alpha)(1 - \pi(x, \alpha))$. In practice, however, we often have $\mathrm{Var}(Y) > m\pi(x, \alpha)(1 - \pi(x, \alpha))$, suggesting the need for alternative frequency distributions.
This may be reflected in an over-large residual deviance and adjusted residuals which have a variance greater than 1. We note that if a positive response $Y$ can be expressed as the sum of $m$ independent Bernoulli random variables each with success probability $\pi(x, \alpha)$, then $\mathrm{Var}(Y) = m\pi(x, \alpha)(1 - \pi(x, \alpha))$. Hence, to use a non-binomial frequency distribution implicitly requires viewing $Y$ as the fundamental response, that is, to have binomial experimental units. Several researchers have proposed approaches to accommodate extra-binomial variability.

Without covariates, an alternative frequency distribution is the beta-binomial distribution

$$f(y \mid a, b, m) = \binom{m}{y} \frac{1}{B(a, b)} \int_0^1 \pi^{y + a - 1} (1 - \pi)^{m - y + b - 1}\,d\pi. \qquad (3.2)$$

The model is derived by assuming that the binomial parameter $\pi$ is a positive random variable following a beta$(a, b)$ mixing distribution. Hence the marginal distribution of the response $Y$ is the beta-binomial. Williams (1975) discusses this model for the data from completely randomized toxicological experiments in which the experimental units are animal litters. In the model, the number of deaths among pups within a litter is assumed to have a beta-binomial distribution. This is a sensible situation in which to consider binomial generalizations because litter mates often tend to respond more alike than pups from different litters and a binomial model assumes independence between litter mates.

Several researchers generalize the beta-binomial distribution to incorporate covariates in the parameters for some particular applications. Crowder (1978) generalizes the beta-binomial model for 1 and 2 way layouts. It is not obvious, however, how his approach generalizes to continuous covariates. A difficulty with generalizing the beta-binomial to allow more complicated settings, for example continuous covariates, is that one ought to somehow relate the beta-binomial parameters $a$ and $b$ to a covariate vector $x$ via some functions $a(x)$ and $b(x)$.
As Ochi and Prentice (1984) point out, it is hard to specify such functions with intuitive appeal.

Otake and Prentice (1984) model the number of aberrant cells in samples of 100 cells taken from human survivors of the atomic bombings of Hiroshima and Nagasaki. Possibly due to measurement error of the radiation doses, the data exhibit extra-binomial variability. At each unique $x$ vector, they estimate $a(x)$, $b(x)$ by maximum likelihood using the beta-binomial model of equation (3.2). They then fit a linear model $\mu(x) = x'\alpha$ via weighted least squares, where $\hat\mu(x)$ is the average number of responses at covariate vector $x$. The weights are the inverses of the estimated variance of $\hat\mu(x)$ (based on $\hat a(x)$, $\hat b(x)$) under the beta-binomial model. They point out that failure to accommodate this variability results in overly precise inference concerning the $\alpha$'s.

Pierce and Sands (1975) used a different approach. They assume that unmeasured covariates or measurement errors might have an additive random effect on the log-odds scale, and that logit$(\pi) = x'\alpha$ where the intercept $\alpha_0$ is distributed as a normal $(\mu, \sigma^2)$ random variable. Likelihood estimation and residual analysis are discussed, as well as an approximate analysis necessitated by the complicated nature of the likelihood function.

Efron (1986) introduces double exponential families as constituent distributions in GLMs, in which means and variances are allowed to depend on covariates. As an example of his model, he modifies the binomial distribution $(m, \pi)$ by rescaling it with sample size $m$ to define a double binomial family

$$f(y \mid \pi, \theta, m) = c(\pi, \theta, m)\,\theta^{1/2}\,\{g_{\pi,m}(y)\}^{\theta}\,\{g_{y,m}(y)\}^{1-\theta}\,[dG_m(y)],$$

where

$$g_{\pi,m}(y) = \binom{m}{my} \pi^{my} (1 - \pi)^{m(1-y)},$$

$G_m(y)$ is the discrete distribution putting mass $\binom{m}{my} 2^{-m}$ at $y = 0, 1/m, \ldots, 1$, and $c(\pi, \theta, m)$ satisfies

$$\int f(y \mid \pi, \theta, m)\,dG_m(y) = 1.$$

Based on this model, he analyzes the toxoplasmosis data by incorporating covariates into the mean and variance in such a way that logit$(\pi_i) = \alpha_0 + \alpha_1 x_i + \alpha_2 x_i^2 + \alpha_3 x_i^3$, where $x_i$ is the standardized rainfall for city $i$, and $\theta_i = 1.25/(1 + \exp(-\lambda_i))$, where $\lambda_i$ depends on $M_i$, the standardized value of the sample size $m_i$ for city $i$.

Another approach for modifying the binomial frequency distribution is quasi-likelihood, which specifies only the first two moments of $Y$ rather than the complete distribution. The attraction is that unduly rigorous assumptions about the frequency distribution are avoided. To model binomial regression data $(y_1, x_1), \ldots, (y_n, x_n)$, McCullagh and Nelder (1983) suggest assuming that $E(Y_i) = m_i\pi_i$ and $\mathrm{Var}(Y_i) = m_i\sigma^2\pi_i(1 - \pi_i)$ rather than specifying a complete distribution for $Y_i$, where $\pi_i = \pi(x_i, \alpha)$. This approach is similar to one advocated by Finney (1971), who used the probit instead of the logit link. Note that for the logistic regression model, $\sigma^2 = 1$. Therefore $\sigma^2 > 1$ corresponds to extra-binomial variability or overdispersion, while $\sigma^2 < 1$ corresponds to underdispersion. Since the complete distribution of $Y_i$ is not specified, maximum likelihood estimation is precluded. Estimates of $\alpha$ and $\sigma^2$ are computed via a quasi-likelihood (Wedderburn, 1974) approach. In fact, the maximum quasi-likelihood estimates of $\alpha$ are the same as the usual logistic regression maximum likelihood estimates regardless of the value of $\sigma^2$, and the moment estimate of $\sigma^2$ equals Pearson's chi-square value divided by the degrees of freedom. This estimate is consistent in the limit as the number of observations increases to infinity with $m$ fixed, and its asymptotic distribution is known (McCullagh and Nelder (1983)). Another estimate of $\sigma^2$ is obtained by dividing the deviance by the degrees of freedom.
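The moment estimate of the dispersion parameter is simple to compute once fitted probabilities are available. The sketch below is illustrative: the function name, its arguments and the numbers in the usage example are assumptions, and the fitted probabilities are taken as given rather than estimated here.

```python
def pearson_dispersion(y, m, pi_hat, n_params):
    """Pearson chi-square statistic divided by the residual degrees of
    freedom, n - n_params, for binomial counts y_i out of m_i trials
    with fitted success probabilities pi_hat_i."""
    chi_square = sum(
        (y_i - m_i * p_i) ** 2 / (m_i * p_i * (1.0 - p_i))
        for y_i, m_i, p_i in zip(y, m, pi_hat)
    )
    return chi_square / (len(y) - n_params)


# Hypothetical counts and fitted probabilities from an intercept-only fit.
y = [2, 8, 5]
m = [10, 10, 10]
pi_hat = [0.5, 0.5, 0.5]
print(pearson_dispersion(y, m, pi_hat, n_params=1))
```

A value well above 1 points to overdispersion relative to the binomial variance, and a value well below 1 to underdispersion.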
A problem with this model is its lack of interpretation: it cannot explain the cause of overdispersion, as other quasi-likelihood models such as those of Williams do.

Williams (1982) considers two quasi-likelihood models which fine tune the previous approach. By regarding the binomial parameter as an unspecified random variable $\Pi_i$ following a continuous mixing distribution on $(0, 1)$ with $E(\Pi_i) = \theta_i$ and $\mathrm{Var}(\Pi_i) = \phi\theta_i(1 - \theta_i)$, he shows that the unconditional mean and variance of $Y_i$ are

$$E(Y_i) = m_i\theta_i \quad \text{and} \quad \mathrm{Var}(Y_i) = m_i\theta_i(1 - \theta_i)(1 + \phi(m_i - 1)),$$

where $\theta_i = \exp(x_i'\alpha)/(1 + \exp(x_i'\alpha))$. Note that in the absence of random variation in the response probabilities, $Y_i$ would have a binomial distribution, Bi$(m_i, \theta_i)$, and in this case $\mathrm{Var}(Y_i) = m_i\theta_i(1 - \theta_i)$. This corresponds to the situation where $\phi = 0$ in the above equation. On the other hand, if there is variation amongst the response probabilities, so that $\phi$ is greater than zero, the unconditional variance of $Y_i$ will exceed $m_i\theta_i(1 - \theta_i)$ by a factor $(1 + \phi(m_i - 1))$. Thus variation amongst the response probabilities causes the variance of the observed number of successes to be greater than it would have been if the response probabilities did not vary at random, resulting in overdispersion.

As Ochi and Prentice (1984) and Collett (1991) mention, this model can also be derived by assuming that there is a common correlation between the Bernoulli responses within a binomial experimental unit. Suppose that the $i$th of $n$ sets of binary data consists of $Y_i$ successes in $m_i$ observations. Let $R_{i1}, \ldots, R_{im_i}$ be the random variables associated with the $m_i$ observations in this set, where $R_{ij} = 1$, for $j = 1, \ldots, m_i$, corresponds to a success, and $R_{ij} = 0$ to a failure. Now suppose that the probability of a success is $\theta_i$, so that $P(R_{ij} = 1) = \theta_i$, $E(R_{ij}) = \theta_i$ and $\mathrm{Var}(R_{ij}) = \theta_i(1 - \theta_i)$.
The number of successes $Y_i$ is then the random variable $\sum_{j=1}^{m_i} R_{ij}$, and so $E(Y_i) = \sum_{j=1}^{m_i} E(R_{ij}) = m_i\theta_i$, and the variance of $Y_i$ is given by

$$\mathrm{Var}(Y_i) = \sum_{j=1}^{m_i} \mathrm{Var}(R_{ij}) + \sum_{j=1}^{m_i} \sum_{k \ne j} \mathrm{Cov}(R_{ij}, R_{ik}),$$

where $\mathrm{Cov}(R_{ij}, R_{ik})$ is the covariance between $R_{ij}$ and $R_{ik}$ for $j \ne k$, and $k = 1, \ldots, m_i$. If the $m_i$ random variables $R_{i1}, \ldots, R_{im_i}$ were mutually independent, each of these covariance terms would be zero. However, since we assume that the correlation between $R_{ij}$ and $R_{ik}$ is

$$\delta = \frac{\mathrm{Cov}(R_{ij}, R_{ik})}{\sqrt{\mathrm{Var}(R_{ij})\,\mathrm{Var}(R_{ik})}},$$

we have $\mathrm{Cov}(R_{ij}, R_{ik}) = \delta\theta_i(1 - \theta_i)$ and

$$\mathrm{Var}(Y_i) = \sum_{j=1}^{m_i} \theta_i(1 - \theta_i) + \sum_{j=1}^{m_i} \sum_{k \ne j} \delta\theta_i(1 - \theta_i) = m_i\theta_i(1 - \theta_i) + m_i(m_i - 1)[\delta\theta_i(1 - \theta_i)] = m_i\theta_i(1 - \theta_i)[1 + (m_i - 1)\delta].$$

Note that the approach of McCullagh and Nelder lacks this interpretation unless $m_i = m$ for $i = 1, \ldots, n$. An iterative algorithm which produces estimates of $\alpha$ and $\phi$ is also presented. Unlike the approach of McCullagh and Nelder, the estimates of $\alpha$ may be different from the usual logistic regression maximum likelihood estimates unless $m_i = m$ for $i = 1, \ldots, n$.

Williams (1982) also discusses another model where the logit of $\pi_i$ is a random variable with $E(\mathrm{logit}(\pi_i)) = x_i'\alpha$ and $\mathrm{Var}(\mathrm{logit}(\pi_i)) = \sigma^2$. As a consequence of this assumption, the true response probability is a random variable $\Pi_i$ whose expected value is $\theta_i$. The resulting model for logit$(\Pi_i)$ is then

$$\mathrm{logit}(\Pi_i) = x_i'\alpha + \delta_i,$$

and the term $\delta_i$ is known as a random effect. This model generalizes the approach of Pierce and Sands by relaxing the assumption that the intercept of the regression has a normal distribution.
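The variance inflation factor $1 + \phi(m_i - 1)$ of Williams's first model can be verified for one concrete mixing distribution. The sketch below is illustrative and assumes a beta$(a, b)$ mixing distribution, for which $\theta = a/(a + b)$ and $\phi = 1/(a + b + 1)$, so that $Y$ is beta-binomial as in (3.2); the function names and parameter values are assumptions.

```python
import math


def log_beta(p, q):
    """Logarithm of the beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def beta_binomial_pmf(y, m, a, b):
    """Beta-binomial probability of y successes in m trials, i.e. the
    closed form of the integral in (3.2)."""
    return math.comb(m, y) * math.exp(log_beta(y + a, m - y + b) - log_beta(a, b))


def exact_moments(m, a, b):
    """Mean and variance of the beta-binomial by direct summation."""
    probs = [beta_binomial_pmf(y, m, a, b) for y in range(m + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


m, a, b = 10, 2.0, 6.0
theta = a / (a + b)          # E(Pi)
phi = 1.0 / (a + b + 1.0)    # Var(Pi) = phi * theta * (1 - theta)
mean, var = exact_moments(m, a, b)
print(mean, m * theta)
print(var, m * theta * (1.0 - theta) * (1.0 + phi * (m - 1)))
```

Each printed pair agrees up to rounding, which is the mean-variance relationship above with $\phi$ playing the role of the common correlation $\delta$.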
Williams (1982) notes that these two models are quite similar, though the latter has a more elegant interpretation since the fixed and random effects are on the same scale.

Follmann and Lambert (1989) propose a non-parametric mixture of logistic regression models in which the intercept in the regression is a random variable with an unknown mixing probability distribution, and the other regression coefficients are unknown constants. The mixed probability function of the response $Y$ associated with a covariate vector $x$ and $m$ trials is given by

$$f(y \mid x, \alpha, m, H) = \binom{m}{y} \int_{-\infty}^{\infty} \pi(a + x'\alpha)^{y}\,(1 - \pi(a + x'\alpha))^{m - y}\,dH(a), \qquad (3.3)$$

where $\pi(a + x'\alpha) = \exp(a + x'\alpha)/(1 + \exp(a + x'\alpha))$ and $y = 0, 1, \ldots, m$.

Although the mixing distribution $H$ is not indexed by parameters, Laird (1978) has shown, under general conditions, that when estimating any mixture model (without covariates), the nonparametric maximum likelihood estimator of $H$ is a step function with a finite number of steps. Lindsay (1983) also discusses some general results for nonparametric mixtures. He shows that existence, uniqueness and support size of the maximum likelihood estimate are related to properties of the convex hull of the likelihood. These results given by Laird (1978) and Lindsay (1983) imply that the maximum likelihood estimate is the same no matter whether $H$ is assumed to be a nonparametric distribution or a discrete distribution with $c$ points of support, where $c$ is an unknown finite integer. In this sense, (3.3) may be equivalently expressed by a finite mixture with an unknown number of components. In the next section we will propose a mixed logistic regression model which generalizes Follmann and Lambert's model.

3.2 Tests For Extra-binomial Variation

To check whether data are overdispersed relative to the binomial assumption, we need a way to test for extra-binomial variation for regression type data.
Note that it may be misleading if one tests for extra-binomial variation by fitting a more comprehensive model that includes the binomial, and tests a reduction to the simple model using, for instance, a likelihood ratio test. Lawless (1987a) points out that in some circumstances the asymptotic distributions used in these cases may be unreliable, as they tend to underestimate the evidence against the base model.

An informal approach to detect extra-binomial variation is to use convexity plots (Lindsay and Roeder, 1992, and Lambert and Roeder, 1993). For example, Lambert and Roeder (1993) define the following function $C(\pi)$ and propose plotting it against $\pi$ for logistic regression:

$$C(\pi) = n^{-1} \sum_{i=1}^{n} \left(\frac{\pi}{\hat\pi_i}\right)^{y_i} \left(\frac{1 - \pi}{1 - \hat\pi_i}\right)^{m_i - y_i},$$

where $\hat\pi_i = \exp(x_i'\hat\alpha)/(1 + \exp(x_i'\hat\alpha))$, $\hat\alpha$ is the maximum likelihood estimate of the regression parameter vector $\alpha$, and $\pi \in (0, 1)$. They prove that if observations are generated by a logistic regression model with random coefficients or random means, $C(\pi)$ is approximately convex for a large sample. Therefore, the more convex $C(\pi)$ appears, the more evidence there is of overdispersion or an omitted variable. Note that this approach cannot distinguish overdispersion from a lack-of-fit problem.

Several researchers use score tests for extra-binomial variation by fitting the binomial model as a first step in the model building and testing for overdispersion. Tarone (1976) considers a correlated binomial alternative model, and applies the $C(\alpha)$ procedure of Neyman (1959) to derive the score test statistic for the adequacy of the binomial distribution. Taking a different approach, Efron (1986) derives the score test statistic against beta-binomial alternatives. Dean (1992) develops a unifying theory for the score tests mentioned above and provides three score test statistics for the hypothesis of no overdispersion in the usual logistic regression model against alternatives based on three different forms of extra-binomial variation respectively.
These score test statistics are

$$N_a = \frac{\sum_{i=1}^{n} \{(y_i - m_i\hat\pi_i)^2 - m_i\hat\pi_i(1 - \hat\pi_i)\}}{\hat V},$$

$$N_b = \frac{\sum_{i=1}^{n} [\hat\pi_i(1 - \hat\pi_i)]^{-1} \{(y_i - m_i\hat\pi_i)^2 + \hat\pi_i(y_i - m_i\hat\pi_i) - y_i(1 - \hat\pi_i)\}}{\{2\sum_{i=1}^{n} m_i(m_i - 1)\}^{1/2}},$$

$$N_c = \frac{\sum_{i=1}^{n} \{(m_i - 1)\hat\pi_i(1 - \hat\pi_i)\}^{-1} \{(y_i - m_i\hat\pi_i)^2 + \hat\pi_i(y_i - m_i\hat\pi_i) - y_i(1 - \hat\pi_i)\}}{\{2\sum_{i=1}^{n} m_i(m_i - 1)^{-1}\}^{1/2}},$$

corresponding to the following specifications of overdispersion:

(a) $E(Y_i) = m_i\pi_i$ and $\mathrm{Var}(Y_i) = m_i\pi_i(1 - \pi_i)[1 + \theta(m_i - 1)\pi_i(1 - \pi_i)]$ for $\theta$ small,

(b) $E(Y_i) = m_i\pi_i$ and $\mathrm{Var}(Y_i) = m_i\pi_i(1 - \pi_i)[1 + \theta(m_i - 1)]$, and

(c) $E(Y_i) = m_i\pi_i$ and $\mathrm{Var}(Y_i) = m_i\pi_i(1 - \pi_i)(1 + \theta)$ for $\theta > 0$.

In the formula for $N_a$, $V$ is calculated by

$$V^2 = \sum_{i=1}^{n} \{2m_i^2\pi_i^2(1 - \pi_i)^2 + m_i\pi_i(1 - \pi_i)(1 - 6\pi_i + 6\pi_i^2)\} - \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{W_{2i}W_{2j}h_{ij}}{(W_{1i}W_{1j})^{1/2}},$$

where $W_{1i} = m_i\pi_i(1 - \pi_i)$, $W_{2i} = m_i\pi_i(1 - \pi_i)(1 - 2\pi_i)$, and $h_{ij}$ are the elements of the matrix $H = W_1^{1/2}X(X'W_1X)^{-1}X'W_1^{1/2}$, where $W_1 = \mathrm{diag}\{W_{11}, \ldots, W_{1n}\}$ and $X$ is the design matrix. In the above three formulae the $\hat\pi_i$ are the estimated probabilities of a positive response based on the usual logistic regression. Under the null hypothesis $H_0: \theta = 0$, each statistic asymptotically follows the standard normal distribution. Note that the first two types of overdispersion, (a) and (b), are the mean-variance relationships of the models proposed by Williams (1982), and (c) is that introduced by McCullagh and Nelder (1983).

3.3 A Mixed Logistic Regression Model

Without covariates the finite mixture approach has been widely used in many applications (cf. Titterington et al. (1985)). With covariates, however, this approach has not been systematically studied and directly applied for analyzing binomial response data. In this section we extend the finite binomial mixture model to the logistic regression model by allowing both the component binomial parameters and the mixing probabilities to depend on covariates. We investigate some basic features of the model. We also discuss identifiability for the model and provide sufficient conditions for identifiability.

3.3.1 The Model

Let the random variable $Y_i$ denote the $i$th binomial response variable, and let $\{(y_i, m_i, x_i),\ i = 1, \ldots, n\}$ denote observations, where $y_i$ is the observed value of $Y_i$, $m_i$ is the total number of trials for $Y_i$, and $x_i = (x_i^{(m)}, x_i^{(r)})$ are $k$-dimensional covariate vectors associated with $y_i$. Note that $x_i^{(m)}$ and $x_i^{(r)}$ are $k_1$-dimensional and $k_2$-dimensional vectors corresponding to the regression part of the mixing probabilities and the component binomial parameters respectively. Usually the first element of $x_i^{(m)}$ and $x_i^{(r)}$ is 1, corresponding to an intercept. Our mixed logistic regression model assumes:

(1) The unobserved mixing process can occupy any one of $c$ states, where $c$ is finite and unknown;

(2) For each observed binomial response $y_i$, associated with a binomial denominator $m_i$, there is an unobserved random variable, $\Pi_i$, representing the component which generates $y_i$. Further, the $(Y_i, \Pi_i)$ are pairwise independent;

(3) Conditional on the covariate $x_i^{(m)}$, $\Pi_i$ follows a discrete distribution with $c$ points of support, and $\Pr(\Pi_i = j \mid x_i^{(m)}, \beta) = p_j(x_i^{(m)}, \beta)$, where $\sum_{j=1}^{c} p_j(x_i^{(m)}, \beta) = 1$ and $p_j(x_i^{(m)}, \beta)$ is defined by

$$p_{ij} = p_j(x_i^{(m)}, \beta) = \frac{\exp(\beta_j' x_i^{(m)})}{1 + \sum_{k=1}^{c-1} \exp(\beta_k' x_i^{(m)})}, \quad \text{for } j = 1, \ldots, c - 1, \qquad (3.4)$$

and

$$p_{ic} = p_c(x_i^{(m)}, \beta) = 1 - \sum_{j=1}^{c-1} p_{ij}, \qquad (3.5)$$

where $\beta = (\beta_1', \ldots, \beta_{c-1}')'$ and $\beta_j = (\beta_{j1}, \ldots, \beta_{jk_1})'$, $j = 1, \ldots, c - 1$, are unknown parameters. In fact, conditional on $x_i^{(m)}$, $\Pi_i$ follows a multinomial distribution $(1, p_{i1}, \ldots, p_{ic})$. Note that $\beta$ appears in each $p_{ij}$ for $1 \le j \le c$;

(4) Conditional on $\Pi_i = j$ and the binomial denominator $m_i$, $Y_i$ follows a binomial distribution which we denote by

$$f_j(y_i \mid x_i^{(r)}, m_i, \alpha_j) = bi(y_i \mid m_i, \pi_{ij}) = \binom{m_i}{y_i} \pi_{ij}^{y_i} (1 - \pi_{ij})^{m_i - y_i}, \qquad (3.6)$$

where

$$\pi_{ij} = \pi(x_i^{(r)}, \alpha_j) = \frac{\exp(\alpha_j' x_i^{(r)})}{1 + \exp(\alpha_j' x_i^{(r)})}, \quad \text{for } j = 1, \ldots, c,$$

where $\alpha = (\alpha_1', \ldots, \alpha_c')'$ are unknown parameters and $\alpha_j = (\alpha_{j1}, \ldots, \alpha_{jk_2})'$, $j = 1, \ldots, c$.
Note that the component binomial parameters $\pi_{ij}$ relate to the covariates $x_i^{(r)}$ through the logit function.

The above assumptions define the unconditional distribution of the observations, $y_i$, as a finite binomial mixture in which the mixing probabilities, $p_{ij}$, depend on the covariates $x_i^{(m)}$ through the multinomial link function, and the component distributions are binomial distributions with the probabilities, $\pi_{ij}$, depending on the covariates $x_i^{(r)}$ through the logit function. Suppose that observations can be classified into $c$ groups corresponding to the $c$ unobservable states; then $\alpha_j$ may be interpreted as the coefficients of the logistic regression of observations in group $j$. On the other hand, $\beta$ may be interpreted as the coefficients of the multinomial regression in which $\Pi_i$ and $x_i^{(m)}$ are the dependent and independent variables respectively.

Note that the model allows some or all components of $x_i^{(m)}$ and $x_i^{(r)}$ to be identical, and some coefficients, $\alpha$'s, to be constant across components, i.e., $\alpha_{jl} = \alpha_l$ for $j = 1, \ldots, c$, or 0 in one or several covariates, i.e., $\alpha_{jl} = 0$ for some $j$, $j = 1, \ldots, c$. We denote $X^{(m)} = (x_1^{(m)}, \ldots, x_n^{(m)})'$ and $X^{(r)} = (x_1^{(r)}, \ldots, x_n^{(r)})'$ as two design matrices.

Under the above assumptions the probability function of $Y_i$ satisfies

$$f(y_i \mid x_i^{(r)}, x_i^{(m)}, m_i, \alpha, \beta) = \sum_{j=1}^{c} p_{ij}\,bi(y_i \mid m_i, \pi_{ij}), \qquad (3.7)$$

where $p_{ij}$ and $bi(y_i \mid m_i, \pi_{ij})$ are specified by (3.4), (3.5) and (3.6) respectively.

We may equivalently view the model as arising from the following sampling scheme: observations are independent; for observation $i$, component $j$ is chosen according to a multinomial distribution with probabilities $p_{ij}$; subsequently, $y_i$ is generated from a binomial distribution with $m_i$ trials and probability $\pi_{ij}$.

There are several justifications for mixed logistic regression models. Suppose that each experimental unit or object has some underlying propensity for a positive response which is captured by one of the $c$ response curves, logit$(\pi_{ij}) = x_i^{(r)\prime}\alpha_j$ $(1 \le j \le c)$, and that the proportion of the experimental units captured by the $j$th curve depends on a covariate vector $x_i^{(m)}$, i.e., $p_j(x_i^{(m)}, \beta)$. Thus we are led to the model of equation (3.7).

Another argument for the mixed logistic regression model is that the coefficient vector $\alpha$ in the usual logistic regression model, logit$(\pi) = x^{(r)\prime}\alpha$, is a random variable with the discrete distribution $\Pr(\alpha = \alpha_j) = p_j$ for $j = 1, \ldots, c$. By making the further assumption that the $p_j$ are related to a covariate vector $x^{(m)}$, we are led to the model of equation (3.7).

Note that the above model includes several interesting models as special cases. Some of them were previously studied.

• Choosing $c = 1$ yields a logistic regression model;

• Setting $x_i^{(m)} = (1)$ yields a finite mixed logistic regression model with constant mixing probabilities;

• Setting $x_i^{(m)} = (1)$, $x_{i1}^{(r)} = 1$ and $\alpha_{jk} = \alpha_k$ for $k \ne 1$ yields Follmann and Lambert's model (1989);

• Setting $x_i^{(r)} = (1)$ yields the finite binomial mixture in which the component binomial parameters are constant and the mixing probabilities depend on the covariates $x_i^{(m)}$.

3.3.2 Features of the Mixed Logistic Regression Models

To use the mixed logistic regression models we have to distinguish experimental units as either Bernoulli or binomial. For binary data ($m_i = 1$ for $1 \le i \le n$) we can rewrite equation (3.7) as

$$f(y_i \mid x_i^{(r)}, x_i^{(m)}, m_i, \alpha, \beta) = \Big[\sum_{j=1}^{c} p_{ij}\pi_{ij}\Big]^{y_i} \Big[1 - \sum_{j=1}^{c} p_{ij}\pi_{ij}\Big]^{1 - y_i}.$$

The above equation implies that we only modify the link function, with the probability $\bar\pi_i = \sum_{j=1}^{c} p_{ij}\pi_{ij}$. In this case, no matter whether the binary responses are heterogeneous, the responses always have Bernoulli distributions. This also means that the above model cannot adjust for overdispersion relative to the Bernoulli assumption. Furthermore, the model may not be identifiable without imposing some unrealistic restrictions on covariates.
Hence we recommend not using the mixed logistic regression models when dealing with binary data.

For binomial experimental units, the distribution defined by equation (3.7) is no longer a member of the exponential family, so the representation of a generalized linear model does not apply. In this case the component distributions have the logistic link, and the frequency distribution is a finite binomial mixture.

For the mixed logistic regression models, the unconditional mean and variance of $Y_i$ are, respectively,
\[ E(Y_i) = E(E(Y_i \mid \Pi_i)) = m_i \sum_{j=1}^{c} p_{ij}\pi_{ij} = m_i \bar{\pi}_i \tag{3.8} \]
and
\[ \begin{aligned} \mathrm{Var}(Y_i) &= E(\mathrm{Var}(Y_i \mid \Pi_i)) + \mathrm{Var}(E(Y_i \mid \Pi_i)) \\ &= m_i \sum_{j=1}^{c} p_{ij}\pi_{ij}\Big(1 - \sum_{j=1}^{c} p_{ij}\pi_{ij}\Big) + \frac{m_i - 1}{m_i}\, \mathrm{Var}(E(Y_i \mid \Pi_i)) \\ &= m_i \bar{\pi}_i (1 - \bar{\pi}_i) + \frac{m_i - 1}{m_i}\, \mathrm{Var}(E(Y_i \mid \Pi_i)), \end{aligned} \tag{3.9} \]
where
\[ \mathrm{Var}(E(Y_i \mid \Pi_i)) = m_i^2 \Big\{ \sum_{j=1}^{c} p_{ij}\pi_{ij}^2 - \Big(\sum_{j=1}^{c} p_{ij}\pi_{ij}\Big)^2 \Big\}. \]
Since $m_i > 1$, the second term of (3.9) vanishes if and only if $\mathrm{Var}(E(Y_i \mid \Pi_i)) = 0$, which holds if and only if $E(Y_i \mid \Pi_i)$ is constant. Hence, if we regard $\bar{\pi}_i$ as the new probability, $\mathrm{Var}(Y_i) = m_i \bar{\pi}_i (1 - \bar{\pi}_i)$ if and only if $\pi_{i1} = \cdots = \pi_{ic}$ for $1 \le i \le n$. This implies that the proposed model is able to cope with extra-binomial variation among $Y_1, \ldots, Y_n$ due to heterogeneity in the population.

3.3.3 Identifiability

To be able to estimate the parameters of (3.7) reliably we require the mixture to be identifiable, that is, two sets of parameters which do not agree after permutation cannot yield the same mixture distribution. Although an unrestricted class of finite binomial mixtures may not be identifiable, classes of finite mixtures of some subfamilies of binomials may be identifiable. Without covariates, Teicher (1961, 1963), Blischke (1964) and Margolin, Kim and Risko (1989) give necessary and sufficient conditions for identifiability of finite binomial mixtures. These results may be summarized as follows. In the binomial family $bi(M, \pi)$, $0 < \pi < 1$, for fixed $M$ but varying $\pi$, the class of mixtures of at most $k$ members is identifiable if and only if $M \ge 2k - 1$. That is, if there are two representations of the same mixture,
\[ \sum_{j=1}^{c_1} p_j F_j(y) = \sum_{j=1}^{c_2} \tilde{p}_j \tilde{F}_j(y), \quad y = 0, \ldots, M, \]
with $F_j(y) = bi(y \mid M, \pi_j)$, $\tilde{F}_j(y) = bi(y \mid M, \tilde{\pi}_j)$, $0 < \pi_j < 1$ for $1 \le j \le c_1$, $0 < \tilde{\pi}_j < 1$ for $1 \le j \le c_2$, $\sum_j p_j = \sum_j \tilde{p}_j = 1$ and $c_1, c_2 \le k$, then
\[ c_1 = c_2, \quad \pi_j = \tilde{\pi}_j, \quad \text{and} \quad p_j = \tilde{p}_j \quad \text{for } j = 1, \ldots, c_1 \]
if and only if $k \le (M + 1)/2$.

With covariates, Follmann and Lambert (1991) discuss sufficient conditions for the identifiability of the nonparametric logistic regression model with common nonrandom regression coefficients and a random intercept with a finite, unknown mixing distribution. Note that their model may be equivalently viewed as a special case of our models. They show that for binary responses the number of components in the mixture must be bounded by a function of the number of covariate vectors that agree except for one coordinate; and for binomial responses the number of components must satisfy the same bound or be bounded by a function of the largest number of trials per response ($M$), i.e., $c \le (M + 1)/2$.

To discuss sufficient conditions for identifiability in our case, we first define identifiability as follows.

Definition: Let $\Phi$ denote the class of probability models $\{f(y_1 \mid x_1^{(r)}, x_1^{(m)}, m_1, \alpha, \beta), \ldots, f(y_n \mid x_n^{(r)}, x_n^{(m)}, m_n, \alpha, \beta)\}$ with at most $c$ components, an order restriction $\alpha_1 < \cdots < \alpha_c$, parameter space $C \times U \times \Omega$, sample spaces $\mathcal{Y}_1, \ldots, \mathcal{Y}_n$, and fixed total numbers of trials and covariate vectors $(m_1, (x_1^{(r)}, x_1^{(m)})), \ldots, (m_n, (x_n^{(r)}, x_n^{(m)}))$, where $x_i^{(r)} \in R^{k_1}$ and $x_i^{(m)} \in R^{k_2}$ for $i = 1, \ldots, n$. $\Phi$ is identifiable if, for $(c, \alpha, \beta), (c^*, \alpha^*, \beta^*) \in C \times U \times \Omega$,
\[ f(y_i \mid x_i^{(r)}, x_i^{(m)}, m_i, \alpha, \beta) = f(y_i \mid x_i^{(r)}, x_i^{(m)}, m_i, \alpha^*, \beta^*) \tag{3.10} \]
for all $y_i \in \mathcal{Y}_i$, $i = 1, \ldots, n$, implies $(c, \alpha, \beta) = (c^*, \alpha^*, \beta^*)$.

Note that the order restriction in the definition means that two models are equivalent if they agree up to permutations of parameters.

As in the setting without covariates, we give sufficient conditions for identifiability by imposing a restriction on $c$ specified by the minimum number of trials for proper subsets of the observations. We state them below.

Theorem 2. Let $S_\lambda = \{(y_{\lambda_i}, m_{\lambda_i}, x_{\lambda_i}^{(m)}, x_{\lambda_i}^{(r)});\ i = 1, \ldots, t \text{ for some } t\}$ denote a subset of the observations, indexed by $\lambda \in \Lambda$, such that the ranks of the vectors $\{x_{\lambda_1}^{(m)}, \ldots, x_{\lambda_t}^{(m)}\}$ and $\{x_{\lambda_1}^{(r)}, \ldots, x_{\lambda_t}^{(r)}\}$ equal the ranks of the design matrices $X^{(m)}$ and $X^{(r)}$ respectively, and let $N_\lambda = \min\{m_{\lambda_1}, \ldots, m_{\lambda_t}\}$ and $N_{\lambda_0} = \max_{\lambda \in \Lambda}\{N_\lambda\}$. Then $\Phi$ is identifiable if (1) $c \le \frac{1}{2}(N_{\lambda_0} + 1)$, and (2) $X^{(r)}$ and $X^{(m)}$ are of full rank.

Proof. Without loss of generality, we assume that the subset of the first $t$ observations is $S_{\lambda_0}$, corresponding to $N_{\lambda_0}$. Suppose that $(c, \alpha, \beta)$ and $(c^*, \alpha^*, \beta^*)$ satisfy equation (3.10). This implies that for each $i = 1, \ldots, t$ and all $y_i \in \mathcal{Y}_i$,
\[ \sum_{j=1}^{c} p_{ij}\, bi(y_i \mid m_i, \pi_{ij}) = \sum_{j=1}^{c^*} p_{ij}^*\, bi(y_i \mid m_i, \pi_{ij}^*), \tag{3.11} \]
where $p_{ij} = p_j(x_i^{(m)}, \beta)$ and $\pi_{ij} = \pi_j(x_i^{(r)}, \alpha_j)$ are defined above, and $p_{ij}^*$ and $\pi_{ij}^*$ are defined analogously. Note that each side of equation (3.11) may be regarded as a finite binomial mixture without covariates. Since $c, c^* \le \frac{1}{2}(N_{\lambda_0} + 1) \le \frac{1}{2}(m_i + 1)$, Teicher's results (1961, 1963) imply that
\[ c = c^*, \quad p_{ij} = p_{ij}^*, \quad \text{and} \quad \pi_{ij} = \pi_{ij}^* \tag{3.12} \]
for $i = 1, \ldots, t$ and $j = 1, \ldots, c$. By the definition of the model, we obtain
\[ \exp(\beta_j' x_i^{(m)}) = \exp(\beta_j^{*\prime} x_i^{(m)}) \quad \text{for } j = 1, \ldots, c-1, \tag{3.13} \]
\[ \mathrm{logit}^{-1}(\alpha_j' x_i^{(r)}) = \mathrm{logit}^{-1}(\alpha_j^{*\prime} x_i^{(r)}) \quad \text{for } j = 1, \ldots, c, \tag{3.14} \]
for $i = 1, \ldots, t$. Since the exponential and inverse logit functions are strictly monotone, from (3.13) and (3.14) we obtain
\[ (\beta_j - \beta_j^*)' x_i^{(m)} = 0 \quad \text{for } j = 1, \ldots, c-1 \text{ and } i = 1, \ldots, t, \]
\[ (\alpha_j - \alpha_j^*)' x_i^{(r)} = 0 \quad \text{for } j = 1, \ldots, c \text{ and } i = 1, \ldots, t, \]
or, in matrix form,
\[ X_t^{(m)} (\beta_j - \beta_j^*) = 0 \quad \text{for } j = 1, \ldots, c-1, \tag{3.15} \]
\[ X_t^{(r)} (\alpha_j - \alpha_j^*) = 0 \quad \text{for } j = 1, \ldots, c, \tag{3.16} \]
where $X_t^{(m)}$ and $X_t^{(r)}$ are the submatrices consisting of the first $t$ rows of $X^{(m)}$ and $X^{(r)}$ respectively. Since the ranks of $X_t^{(m)}$ and $X_t^{(r)}$ equal the ranks of $X^{(m)}$ and $X^{(r)}$, which are of full rank, (3.15) and (3.16) imply that $(\alpha, \beta) = (\alpha^*, \beta^*)$. Thus $\Phi$ is identifiable. □

Note that we can assume that condition (2) holds without loss of generality, since if it does not we can reparameterize the model accordingly.
Note also that the sufficient conditions for identifiability depend only on partial information about the observations.

The conditions in Theorem 2 mean that if the two design matrices are of full rank, the mixed logistic regression models are identifiable up to $[(N_{\lambda_0} + 1)/2]$ components. For instance, if $N_{\lambda_0} = 4$, the theorem only guarantees that one- or two-component mixed logistic regression models are identifiable. Note that the sufficient condition $c \le \frac{1}{2}(N_{\lambda_0} + 1)$ may not be the sharpest bound for identifiability.

As a simple illustration of Theorem 2, consider the data in Table 3.1 on the toxicity of ethylene oxide for grain beetles (Busvine, 1938). Note that Follmann and Lambert (1991) discuss identifiability of their model for this data set. We assume a mixed logistic regression model with both binomial parameters and mixing probabilities depending on dose level $x$ and an intercept. Hence the ranks of the design matrices $X^{(m)}$ and $X^{(r)}$ are 2. Since any $2 \times 2$ submatrix of either $X^{(m)}$ or $X^{(r)}$ is of full rank, there are $45 \times 45 = 2025$ elements in the index set $\Lambda$. $N_\lambda$ ranges from 24 to 31, and $N_{\lambda_0} = 31$. Therefore, Theorem 2 allows 16 components in the mixed logistic regression model. This sufficient condition is the same as that given by Follmann and Lambert (1991).

For two special cases of our model, constant mixing probabilities ($X^{(m)} = 1$) and constant binomial parameters ($X^{(r)} = 1$), the above sufficient conditions can be stated as follows.

Corollary 1. Let $S_\lambda = \{(y_{\lambda_i}, m_{\lambda_i}, x_{\lambda_i}^{(r)});\ i = 1, \ldots, k_2\}$ denote a subset of the observations, indexed by $\lambda \in \Lambda$, such that the rank of the vectors $\{x_{\lambda_1}^{(r)}, \ldots, x_{\lambda_{k_2}}^{(r)}\}$ equals the rank of the design matrix $X^{(r)}$, and let $N_\lambda = \min\{m_{\lambda_1}, \ldots, m_{\lambda_{k_2}}\}$ and $N_{\lambda_0} = \max_{\lambda \in \Lambda}\{N_\lambda\}$. Then $\Phi$ is identifiable if (1) $c \le \frac{1}{2}(N_{\lambda_0} + 1)$, and (2) $X^{(r)}$ is of full rank.

Corollary 2. Let $S_\lambda = \{(y_{\lambda_i}, m_{\lambda_i}, x_{\lambda_i}^{(m)});\ i = 1, \ldots, k_1\}$ denote a subset of the observations, indexed by $\lambda \in \Lambda$, such that the rank of the vectors $\{x_{\lambda_1}^{(m)}, \ldots, x_{\lambda_{k_1}}^{(m)}\}$ equals the rank of the design matrix $X^{(m)}$, and let $N_\lambda = \min\{m_{\lambda_1}, \ldots, m_{\lambda_{k_1}}\}$ and $N_{\lambda_0} = \max_{\lambda \in \Lambda}\{N_\lambda\}$. Then $\Phi$ is identifiable if (1) $c \le \frac{1}{2}(N_{\lambda_0} + 1)$, and (2) $X^{(m)}$ is of full rank.

3.4 Parameter Estimation

Obtaining the maximum likelihood estimates of the parameters in the proposed model requires an iterative algorithm. Two widely used algorithms can be applied to this case: (1) the EM algorithm (Dempster, Laird and Rubin, 1977) and (2) quasi-Newton algorithms. In this section we discuss how to apply the EM algorithm and the quasi-Newton algorithm to our model with a known number of components. Note that when implementing the EM algorithm, we also use a quasi-Newton approach for the M-step. We present the results of a Monte Carlo study to investigate the performance of our codes and discuss some implementation issues.

3.4.1 The EM algorithm

For a fixed number of components $c$ we obtain maximum likelihood estimates of the parameters in the above model using both the EM algorithm (Dempster, Laird and Rubin, 1977) and the quasi-Newton approach (Nash, 1991). As is now standard in mixture model estimation, we implement the EM algorithm by treating the unobservable component membership of the observations as missing data. We discuss the choice of the number of components below.

Suppose that $(Y, M, X^{(m)}, X^{(r)}) = \{(y_i, m_i, x_i^{(m)}, x_i^{(r)});\ i = 1, \ldots, n\}$ is the observed data generated by the mixed logistic regression model. Let
\[ (Y, Z, M, X^{(m)}, X^{(r)}) = \{(y_i, z_i, m_i, x_i^{(m)}, x_i^{(r)});\ i = 1, \ldots, n\} \]
denote the complete data for the model, where the unobserved quantity $z_i = (z_{i1}, \ldots, z_{ic})'$ satisfies
\[ z_{ij} = \begin{cases} 1 & \text{if } \Pi_i = j, \\ 0 & \text{otherwise.} \end{cases} \]
The log-likelihood of the complete data is
\[ l_c(\alpha, \beta \mid Y, Z, M, X^{(m)}, X^{(r)}) = \sum_{i=1}^{n} \sum_{j=1}^{c} z_{ij} \log(p_{ij}) + \sum_{i=1}^{n} \sum_{j=1}^{c} z_{ij} \log\big(bi(y_i \mid m_i, \pi_{ij})\big), \]
where $p_{ij}$ and $bi(y_i \mid m_i, \pi_{ij})$ are defined by (3.4), (3.5) and (3.6) respectively.

The EM approach finds the maximum likelihood estimates using an iterative procedure consisting of two steps: the E-step and the M-step.
At the E-step, it replaces the missing data by its expectation, conditional on the observed data and the initial values of the parameters. At the M-step, it finds the parameter estimates which maximize the expected log likelihood for the complete data, conditional on the expected values of the missing data. Iteration stops when the log likelihood for the observed data does not increase significantly. In our case this procedure can be stated as follows.

E-step: Given the values $\alpha^{(0)}$ and $\beta^{(0)}$, replace the missing data, $Z$, by its expectation conditioned on these initial values of the parameters and the observed data, $(Y, M, X^{(m)}, X^{(r)})$. In this case, the conditional expectation of the $j$th component of $z_i$ equals the probability that the observation $y_i$ was generated by the $j$th component of the mixture distribution, conditional on the parameters, the data and the covariates. Denote the conditional expectation of the $j$th component of $z_i$ by $\hat{z}_{ij}(\alpha^{(0)}, \beta^{(0)})$. Then
\[ \begin{aligned} \hat{z}_{ij}(\alpha^{(0)}, \beta^{(0)}) &= E\big(z_{ij} \mid Y, M, X^{(m)}, X^{(r)}, \alpha^{(0)}, \beta^{(0)}\big) \\ &= \Pr\big(z_{ij} = 1 \mid Y, M, X^{(m)}, X^{(r)}, \alpha^{(0)}, \beta^{(0)}\big) \\ &= \frac{p_j(x_i^{(m)}, \beta^{(0)})\, bi\big(y_i \mid m_i, \pi_j(x_i^{(r)}, \alpha_j^{(0)})\big)}{\sum_{l=1}^{c} p_l(x_i^{(m)}, \beta^{(0)})\, bi\big(y_i \mid m_i, \pi_l(x_i^{(r)}, \alpha_l^{(0)})\big)} \quad \text{for } j = 1, \ldots, c, \end{aligned} \tag{3.17} \]
where $p_j(x_i^{(m)}, \beta^{(0)})$ and $bi(y_i \mid m_i, \pi_j(x_i^{(r)}, \alpha_j^{(0)}))$ are defined by (3.4), (3.5) and (3.6) respectively.

M-step: Given the conditional probabilities $\{\hat{z}_i(\alpha^{(0)}, \beta^{(0)}) = (\hat{z}_{i1}, \ldots, \hat{z}_{ic})';\ i = 1, \ldots, n\}$, obtain estimates of the parameters by maximizing, with respect to $\alpha$ and $\beta$,
\[ Q(\alpha, \beta \mid \alpha^{(0)}, \beta^{(0)}) = E\big(l_c \mid Y, M, X^{(m)}, X^{(r)}, \alpha^{(0)}, \beta^{(0)}\big) = Q_1(\beta \mid \alpha^{(0)}, \beta^{(0)}) + Q_2(\alpha \mid \alpha^{(0)}, \beta^{(0)}), \]
where
\[ Q_1 = \sum_{i=1}^{n} \sum_{j=1}^{c} \hat{z}_{ij} \log(p_{ij}) \quad \text{and} \quad Q_2 = \sum_{i=1}^{n} \sum_{j=1}^{c} \hat{z}_{ij} \log\big(bi(y_i \mid m_i, \pi_{ij})\big). \]
The estimated parameters, $\hat{\alpha}$ and $\hat{\beta}$, satisfy the following M-step equations:
\[ \frac{\partial Q_1}{\partial \beta} = 0, \tag{3.18} \]
\[ \frac{\partial Q_2}{\partial \alpha} = 0. \tag{3.19} \]
Since closed form solutions of these equations are unavailable, we use a quasi-Newton approach (Nash, 1990) to obtain estimates. We implement the E and M steps in the following way to obtain parameter estimates.

Step 0: Specify starting values $\alpha^{(0)} = (\alpha_1^{(0)}, \ldots, \alpha_c^{(0)})$ and $\beta^{(0)} = (\beta_1^{(0)}, \ldots, \beta_{c-1}^{(0)})$ and two tolerances $\epsilon_1$ and $\epsilon_2$;

Step 1: (E-step) Compute $\hat{z}_i = (\hat{z}_{i1}, \ldots, \hat{z}_{ic})'$, $(1 \le i \le n)$, using (3.17). To avoid overflow problems in the calculation of $\hat{z}_{ij}$ we divide both the numerator and the denominator in (3.17) by the largest term in the sum in the denominator;

Step 2: (M-step) Find values of $\hat{\beta}$ and $\hat{\alpha}$ that solve (3.18) and (3.19), respectively, using the quasi-Newton algorithm (Nash, 1990);

Step 3: If at least one of the following conditions is true, set $\alpha^{(0)} = \hat{\alpha}$ and $\beta^{(0)} = \hat{\beta}$, and go to Step 1; otherwise, stop.
(1) $\|\hat{\alpha} - \alpha^{(0)}\| = \sum_{j=1}^{c} \sum_{l=1}^{k_1} |\hat{\alpha}_{jl} - \alpha_{jl}^{(0)}| > \epsilon_1$;
(2) $\|\hat{\beta} - \beta^{(0)}\| = \sum_{j=1}^{c-1} \sum_{l=1}^{k_2} |\hat{\beta}_{jl} - \beta_{jl}^{(0)}| > \epsilon_1$;
(3) $|l(\hat{\alpha}, \hat{\beta} \mid Y, M, X^{(m)}, X^{(r)}) - l(\alpha^{(0)}, \beta^{(0)} \mid Y, M, X^{(m)}, X^{(r)})| > \epsilon_2$, where $l(\alpha, \beta \mid Y, M, X^{(m)}, X^{(r)})$ is the observed log likelihood function.

Dempster, Laird and Rubin (1977) and Wu (1983) discussed the convergence properties of the EM algorithm in a general setting. Since $Q(\alpha, \beta \mid \alpha^{(0)}, \beta^{(0)})$ and its first order partial derivatives are continuous in $\alpha$, $\beta$, $\alpha^{(0)}$ and $\beta^{(0)}$, applying Wu's theorems (1983) lets us conclude that the sequence of observed likelihood values $l(\alpha^{(k)}, \beta^{(k)} \mid Y, M, X^{(m)}, X^{(r)})$ converges to a local maximum or a saddle point. Note that the observed likelihood function $l(\alpha, \beta \mid Y, M, X^{(m)}, X^{(r)})$ need not, in general, be globally concave. Thus we need to choose initial values carefully in order to increase the chance that the algorithm converges to the global maximum. Our approach will be discussed below.

Note that the above EM algorithm does not directly yield estimates of the standard errors corresponding to the parameter estimates. On the other hand, when $c$ is known, asymptotic normality of $\sqrt{n}\,((\hat{\alpha}, \hat{\beta}) - (\alpha, \beta))$ is easily proved under standard regularity conditions (Lehmann, 1983). To approximate standard errors, we may compute $\hat{\sigma}(\hat{\alpha}_{jl})$ and $\hat{\sigma}(\hat{\beta}_{jl})$ from the diagonal elements of the inverse of the $(c \cdot k_1 + (c - 1) \cdot k_2)$-dimensional observed information matrix with $c$ fixed at $\hat{c}$, which is defined as
\[ I(\hat{\alpha}, \hat{\beta} \mid X^{(r)}, X^{(m)}, M, Y) = - \begin{pmatrix} \dfrac{\partial^2 l}{\partial \alpha^2} & \dfrac{\partial^2 l}{\partial \alpha\, \partial \beta} \\[2ex] \dfrac{\partial^2 l}{\partial \alpha\, \partial \beta} & \dfrac{\partial^2 l}{\partial \beta^2} \end{pmatrix} \Bigg|_{(\hat{\alpha}, \hat{\beta})}. \]

Although the EM algorithm is relatively robust to the choice of initial values, it has a lower convergence rate than the quasi-Newton algorithms. To balance the trade-off between these two algorithms, we first use the EM algorithm until either the likelihood value does not increase significantly in terms of a given tolerance $\epsilon_2$ or the parameter estimates do not change significantly in terms of a given tolerance $\epsilon_1$, and then shift to a quasi-Newton algorithm which maximizes the observed likelihood function. In doing so we obtain approximate standard errors of the estimates as a by-product of the quasi-Newton approach. Note that in some cases the approximate standard errors from the quasi-Newton approach may not be accurate. Hence we recommend calculating the information matrix numerically whenever possible. We modify the above Step 3 as follows:

Step 3': (a) If at least one of the following conditions is true, set $\alpha^{(0)} = \hat{\alpha}$ and $\beta^{(0)} = \hat{\beta}$ and go to Step 1; otherwise, go to (b).
(1) $\|\hat{\alpha} - \alpha^{(0)}\| = \sum_{j=1}^{c} \sum_{l=1}^{k_1} |\hat{\alpha}_{jl} - \alpha_{jl}^{(0)}| > \epsilon_1$;
(2) $\|\hat{\beta} - \beta^{(0)}\| = \sum_{j=1}^{c-1} \sum_{l=1}^{k_2} |\hat{\beta}_{jl} - \beta_{jl}^{(0)}| > \epsilon_1$;
(3) $|l(\hat{\alpha}, \hat{\beta} \mid Y, M, X^{(m)}, X^{(r)}) - l(\alpha^{(0)}, \beta^{(0)} \mid Y, M, X^{(m)}, X^{(r)})| > \epsilon_2$.
(b) Maximize the observed log likelihood function $l(\alpha, \beta \mid Y, M, X^{(m)}, X^{(r)})$ using the quasi-Newton algorithm (Nash, 1990) with $\hat{\alpha}$ and $\hat{\beta}$ as initial values. Then, stop.

3.4.2 Starting Values

To run the code of the above algorithm, we need to choose starting values for the parameters in the model. Note that the EM algorithm only ensures, under some regularity conditions (Wu, 1983), that the estimates converge to a local maximum of the likelihood function for the observed data. Furthermore, since the likelihood function may not be globally concave, several starting points are needed to find the maximum likelihood estimates, $\hat{\alpha}$ and $\hat{\beta}$.
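The E-step of equation (3.17), together with the overflow safeguard described in Step 1 of Section 3.4.1, can be sketched as follows. This is a minimal Python illustration rather than the FORTRAN implementation used in the dissertation; the function names and the example values are assumptions. Working with logarithms and subtracting the largest log term is numerically the same as dividing the numerator and denominator of (3.17) by the largest term of the denominator.

```python
import math


def log_binom_pmf(y, m, pi):
    """log bi(y | m, pi) for 0 < pi < 1, via log-gamma to avoid huge factorials."""
    log_coef = math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
    return log_coef + y * math.log(pi) + (m - y) * math.log(1.0 - pi)


def posterior_probs(y, m, p, pi):
    """E-step (3.17) for one observation.

    p:  mixing probabilities (p_i1, ..., p_ic) at the current beta.
    pi: success probabilities (pi_i1, ..., pi_ic) at the current alpha.
    Returns (z_i1, ..., z_ic), the posterior component probabilities.
    """
    log_terms = [math.log(pj) + log_binom_pmf(y, m, pij)
                 for pj, pij in zip(p, pi)]
    top = max(log_terms)                       # largest term of the denominator
    scaled = [math.exp(t - top) for t in log_terms]
    total = sum(scaled)
    return [s / total for s in scaled]


# Illustrative call: 3 successes out of 30 trials, three candidate components.
z = posterior_probs(3, 30, [0.2, 0.3, 0.5], [0.1, 0.3, 0.8])
```

With these example values almost all of the posterior mass falls on the first component, whose success probability 0.1 matches the observed ratio 3/30.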
We propose the following approach for choosing the starting values.We assume that c is known. At the first step of the approach it calculates theratios, {y1/mi,. . . , yn/mn}, divides the set of the ratios into c groups in terms of its percentiles and fits the observed data into a c-component mixture with constant covariates,(m) = (r) (1) by choosing initial values based on the percentile information. At thesecond step, if necessary, it fits the observed data into a mixed logistic regression modelcontaining only one regression term in either the success probabilities or the mixing probabilities in such a way that the initial values of the parameters included in the previousmixture model equal the estimates of the corresponding parameters from the previousfitting model, and initial values of the parameters not in the previous fitting model areset to a small value, say, 0.00001. This process is iterated until a complete set of initialvalues for the mixture model is obtained. The motivation of this ad hoc approach is basedon the idea of cluster analysis. At each iteration, we use different criteria to classify thedata. First, the data are classified in terms of its percentiles. Then the data are classifiedin terms of a finite binomial mixture without covariates, and subsequently in terms ofmixed logistic regression models. Note that choosing a complete set of initial values fora mixture model step by step in such a way guarantees that the likelihood values willincrease in each step. Also our approach produces maximum likelihood estimates for asequence of nested mixture models while it achieves a complete set of initial values forthe mixture model.We use an example to explain this approach. Suppose that we need to choose initial(r) (m)values to fit a 3-component mixture model with covariates X: = (1, s) and x. = (1, t)where s and t are real numbers, each with a regression term. First, we find 16.5, 33.0,Chapter 3. 
Mixed Logistic Regression Models 15749.5, 66.0 and 82.5 percentiles of the observed ratios {yi/mi,. . . , yn/mn} denoted as q—qrespectively, and fit the data into a 3-component binomial mixture of constant covariates(x = m) = (1)) with the initial values of cr11, a21 and a31 equal to logit(qi),logit’(q3)and logit(q5) respectively, and both the initial values of /3 and /321 equalto 0. Note that under this specification and the logit link function, the initial values of(r) . . . .a,), (j = 1, 2,3) are equal to q, q3 and q5 with the same mixing probabilities1/3. Second, we fit the data into the 3-component mixed logistic regression model with(r) (m) .= (1, s) and ; = (1) by choosing the initial values of a12, a22 and a32 equalto 0.00001 and the initial values of the other parameters equal to the estimates of thecorresponding parameters of the first fitting model. Finally, we choose initial values forthe 3-component mixed logistic regression model with x = (1, s) and x = (1, t1) insuch a way that /312 and /322 are equal to 0.00001 and the other parameters is equal tothe estimates of corresponding parameters of the second fitting model.3.4.3 A Monte Carlo StudyWe use Monte Carlo methods to examine the performance of the above algorithm. Particularly, we wished to verify the reliability of our code, determine the precision of estimates and investigate some model selection criteria to be discussed below. We usethree 3-component mixture models. For each, we analyzed 101 replicates, each with 100observations.Two different approaches for choosing initial values are compared in the study. Inone, we use the true parameter values of the model generating the observations as initialvalues in order to determine performance of the algorithm in the best case. 
The other uses the true parameter values of $\alpha_{11}$, $\alpha_{21}$ and $\alpha_{31}$ as initial values, chooses initial values of $\beta_{11}$ and $\beta_{21}$ according to the approach described in Section 3.4.2, and fits the samples to a 3-component binomial mixture with constant covariates. Then, following the approach of Section 3.4.2, we choose a complete set of initial values for the parameters of the model generating the samples. These two approaches to choosing initial values lead to essentially the same estimates. We describe the details below.

Model 1: A model with the success probabilities, $\pi_{ij}$, of the component binomial distributions, $bi(y_i \mid m_i, \pi_{ij})$, depending on one time-dependent covariate, with constant mixing probabilities, where $m_i = 30$. For the logistic regression part,
\[ x_i^{(r)} = (1, s_i)', \tag{3.20} \]
where $s_i = 0.2$ for $i = 1, \ldots, 10$, $s_i = 0.4$ for $i = 11, \ldots, 20$, etc., and
\[ \alpha = (\alpha_1, \alpha_2, \alpha_3), \tag{3.21} \]
where $\alpha_1 = (-1.2962, -0.4505)$, $\alpha_2 = (-1.3148, 1.0811)$ and $\alpha_3 = (0.6973, 0.7499)$. For the mixing part,
\[ x_i^{(m)} = 1, \quad \beta = (\beta_1, \beta_2) = (-0.9163, -0.5108). \]
The success probabilities can be written in the form
\[ \pi_1(x_i^{(r)}, \alpha_1) = \mathrm{logit}^{-1}(-1.2962 - 0.4505 s_i), \tag{3.22} \]
\[ \pi_2(x_i^{(r)}, \alpha_2) = \mathrm{logit}^{-1}(-1.3148 + 1.0811 s_i), \tag{3.23} \]
\[ \pi_3(x_i^{(r)}, \alpha_3) = \mathrm{logit}^{-1}(0.6973 + 0.7499 s_i), \tag{3.24} \]
and the mixing probabilities are
\[ p_1(x_i^{(m)}, \beta) = 0.2, \quad p_2(x_i^{(m)}, \beta) = 0.3 \quad \text{and} \quad p_3(x_i^{(m)}, \beta) = 0.5. \]
Note that choosing the parameters as above makes the component distributions easy to distinguish. In this model, as $s_i$ goes from 0.2 to 2.0, $\pi_{i1}$ decreases from 0.2 to 0.1, $\pi_{i2}$ increases from 0.25 to 0.7 and $\pi_{i3}$ increases from 0.7 to 0.9. Thus there is no overlap among them.

Model 2: A model with constant success probabilities, $\pi_{ij}$, of the component binomial distributions, $bi(y_i \mid m_i, \pi_{ij})$, and mixing probabilities depending on one time-dependent covariate, where $m_i = 30$.
That is, for the logistic regression part,
\[ x_i^{(r)} = 1, \quad \alpha = (\alpha_1, \alpha_2, \alpha_3) = (-2.1972, -0.8473, 1.3863), \]
and for the mixing part,
\[ x_i^{(m)} = (1, s_i)', \tag{3.25} \]
where $s_i$ is defined as above, and
\[ \beta = (\beta_1, \beta_2), \tag{3.26} \]
where $\beta_1 = (-2.1129, 1.6057)$ and $\beta_2 = (-0.9692, 1.3805)$. The probabilities of a positive response, then, are
\[ \pi_1(x_i^{(r)}, \alpha_1) = 0.1, \quad \pi_2(x_i^{(r)}, \alpha_2) = 0.3 \quad \text{and} \quad \pi_3(x_i^{(r)}, \alpha_3) = 0.8, \]
and the mixing probabilities are given by
\[ p_1(x_i^{(m)}, \beta) = \frac{\exp(-2.1129 + 1.6057 s_i)}{\exp(-2.1129 + 1.6057 s_i) + \exp(-0.9692 + 1.3805 s_i) + 1}, \tag{3.27} \]
\[ p_2(x_i^{(m)}, \beta) = \frac{\exp(-0.9692 + 1.3805 s_i)}{\exp(-2.1129 + 1.6057 s_i) + \exp(-0.9692 + 1.3805 s_i) + 1}, \tag{3.28} \]
\[ p_3(x_i^{(m)}, \beta) = \frac{1}{\exp(-2.1129 + 1.6057 s_i) + \exp(-0.9692 + 1.3805 s_i) + 1}. \tag{3.29} \]
Note that with these values of $\beta$, as $s_i$ goes from 0.2 to 2.0, $p_{i1}$ increases from 0.1 to 0.3, $p_{i2}$ increases from 0.3 to 0.6 and $p_{i3}$ decreases from 0.6 to 0.1.

Model 3: Both the success probabilities and the mixing probabilities depend on the covariates. For the regression part, $x_i^{(r)}$, $\alpha$ and $\pi_j(x_i^{(r)}, \alpha_j)$ are given by (3.20), (3.21), (3.22), (3.23) and (3.24) respectively; for the mixing part, $x_i^{(m)}$, $\beta$ and $p_j(x_i^{(m)}, \beta)$ are given by (3.25), (3.26), (3.27), (3.28) and (3.29) respectively.

We chose the above parameter values so that the success probabilities and mixing probabilities for each component do not overlap. We would expect that in this case the algorithm would perform well.

We carried out these simulations, each with 100 replicates. The responses $y_i$ were obtained by first generating a uniform (0,1) random number $u_i$ and then drawing $y_i$ from binomial$(m_i, \pi_{i1})$ if $u_i \le p_1(x_i^{(m)}, \beta)$; from binomial$(m_i, \pi_{i2})$ if $p_1(x_i^{(m)}, \beta) < u_i \le p_1(x_i^{(m)}, \beta) + p_2(x_i^{(m)}, \beta)$; and from binomial$(m_i, \pi_{i3})$ if $u_i > p_1(x_i^{(m)}, \beta) + p_2(x_i^{(m)}, \beta)$. Our implementation of the algorithm used a FORTRAN version on a Sun SPARCstation 1.

The results of the Monte Carlo study are presented in Table 3.2, Table 3.3 and Table 3.4. These tables show that the means of the estimates are very close to the true parameter values in the models, suggesting that the global maximum of the observed likelihood is reached.
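The sampling scheme used to generate the responses can be sketched in Python for Model 3, using the parameter values of (3.21) and (3.26). This is an illustrative reconstruction, not the FORTRAN code of the study: the function names, the seed and the way the binomial draw is made (as a sum of Bernoulli trials) are assumptions of the sketch.

```python
import math
import random

ALPHA = [(-1.2962, -0.4505), (-1.3148, 1.0811), (0.6973, 0.7499)]  # (3.21)
BETA = [(-2.1129, 1.6057), (-0.9692, 1.3805)]                       # (3.26)


def inv_logit(eta):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def mixing_probs(s):
    """Mixing probabilities (3.27)-(3.29) at covariate value s."""
    e = [math.exp(b0 + b1 * s) for b0, b1 in BETA]
    denom = 1.0 + sum(e)
    return [ej / denom for ej in e] + [1.0 / denom]


def simulate_model3(n=100, m=30, seed=0):
    """Draw n observations (y_i, m_i, s_i, component index) from Model 3."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        s = 0.2 * (i // 10 + 1)          # s = 0.2 for i = 1..10, 0.4 for 11..20, ...
        p = mixing_probs(s)
        u = rng.random()                 # uniform (0,1) draw selects the component
        j = 0 if u <= p[0] else (1 if u <= p[0] + p[1] else 2)
        pi = inv_logit(ALPHA[j][0] + ALPHA[j][1] * s)
        y = sum(rng.random() < pi for _ in range(m))   # binomial(m, pi) draw
        data.append((y, m, s, j))
    return data


sample = simulate_model3()
```

The component index is returned only so that a simulated data set can be checked against the classification discussed later; a fitting routine would see only $(y_i, m_i, s_i)$.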
For Model 1, the sample means are quite close to the true values and the standard deviations are relatively small. Although the coefficients of the logistic regression of Model 2 are estimated accurately, the estimates of the mixing probability parameters are more variable. This suggests that estimating the mixing probability parameters in this model is intrinsically more difficult than estimating the success probabilities. This agrees with observations in the literature (Titterington et al., 1985; McLachlan and Basford, 1988). Estimates of the parameters of Model 3 show the same pattern as in Model 2, where estimates of the mixing probability parameters are more variable than those of the success probability parameters. Note, however, that although the estimates of the mixing probability parameters, $\hat{\beta}$, vary somewhat, the estimated mixing probabilities, $p_j(x_i^{(m)}, \hat{\beta})$, are more precise due to the multinomial link function between the parameters and the mixing probabilities.

The average number of iterations of the EM algorithm is 8.24 for Model 1, 12.35 for Model 2 and 20.2 for Model 3 under a stopping tolerance of 0.01, and the average time is 12.5, 19.4 and 120.5 seconds respectively.

3.5 Implementation Issues

3.5.1 Model Selection

We need to address the following three issues when we apply a mixed logistic regression model: (a) we need to determine the conditions of identifiability for the model; (b) we need to determine the number of components, $c$, of the mixture; and (c) we need a method to carry out inference about the model parameters. When $c$ is known, inference for the parameters can be based on a standard likelihood ratio test. In practice, however, this case may not be common. When $c$ is unknown, the usual likelihood ratio test is no longer valid for determining $c$ or testing hypotheses about parameter values.
As we discuss in Section 2.7.1, this is because mixing probabilities may lie on the boundary of the parameter space when the hypothesized number of components is less than the fitted number of components. Hence the usual regularity conditions for the likelihood ratio test do not hold. We propose the following methods for model selection.

Two widely used model selection criteria are Akaike's Information Criterion (AIC) (Akaike, 1973; 1974) and the Bayesian Information Criterion (BIC) (Schwarz, 1978) (see Section 2.7.1). For the mixed logistic regression models, we define the AIC and BIC criteria as follows:
• AIC: choose the model for which $l_c(X) - a_c(X)$ is largest;
• BIC: choose the model for which $l_c(X) - \frac{1}{2}\log(n)\, a_c(X)$ is largest,
where $l_c(X)$ is the maximum log-likelihood of the mixture with $c$ components and covariates $X$, $a_c(X) = c \cdot k_1 + (c - 1) \cdot k_2$, where $k_1$ and $k_2$ are the dimensions of $\alpha_j$ and $\beta_j$ respectively, and $n$ is the total number of observations. These two criteria do not always select the same model.

Using the BIC (AIC), our model selection procedure consists of two stages. At the first stage, we determine $c$ to maximize the BIC (AIC) values for the saturated 1-3 (1-4) component mixture models that contain all possible covariates in both the success probabilities and the mixing probabilities. Note that the $c$ values must be within the range satisfying the identifiability conditions. Although we compute both AIC and BIC values in our applications, we recommend using BIC because our Monte Carlo studies suggest that BIC is more reliable for model selection. At the second stage, our model selection approach depends on the objectives of the analysis. If our goal is inference about some particular model parameter, we carry out likelihood ratio tests for nested $c$-component mixture models. If the goal is choosing an appropriate model to fit the data, we select the model that maximizes the BIC (AIC) value among the $c$-component mixture models concerned.
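The first stage of this procedure can be illustrated with a short Python sketch. It assumes the criteria are written as maximized log-likelihood minus a penalty, with the BIC penalty taken in Schwarz's form $\frac{1}{2}\log(n)$ times the number of parameters; the function names and the log-likelihood values in the example are hypothetical and are not results from the dissertation.

```python
import math


def n_params(c, k1, k2):
    """Number of free parameters a_c(X) = c*k1 + (c-1)*k2."""
    return c * k1 + (c - 1) * k2


def aic(loglik, c, k1, k2):
    """AIC in 'larger is better' form: log-likelihood minus parameter count."""
    return loglik - n_params(c, k1, k2)


def bic(loglik, c, k1, k2, n):
    """BIC in 'larger is better' form (assumed Schwarz penalty 0.5*log(n))."""
    return loglik - 0.5 * math.log(n) * n_params(c, k1, k2)


def select_c(logliks, k1, k2, n, criterion="bic"):
    """Return the number of components with the largest criterion value.

    logliks: dict mapping c to the maximized log-likelihood of the
             saturated c-component model.
    """
    def score(c):
        if criterion == "aic":
            return aic(logliks[c], c, k1, k2)
        return bic(logliks[c], c, k1, k2, n)
    return max(logliks, key=score)


# Hypothetical maximized log-likelihoods for c = 1, 2, 3 (made-up numbers).
fits = {1: -260.0, 2: -231.5, 3: -229.8}
best_bic = select_c(fits, k1=2, k2=2, n=100, criterion="bic")
best_aic = select_c(fits, k1=2, k2=2, n=100, criterion="aic")
```

In this made-up example both criteria pick two components: the gain in log-likelihood from a third component (1.7) is smaller than the penalty for its four extra parameters under either criterion.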
Since this selection method is heuristic and only gives a guideline in applications, other specific concerns in model selection should be taken into account from case to case. For instance, in some applications the number of components and some parameters in a mixture may be explicitly or implicitly determined by underlying theory.

In the Monte Carlo studies discussed in Section 3.4.3, we computed both AIC and BIC values for all possible mixed 2 to 4 component models. Table 2.5.1.1 shows that AIC and BIC are reliable methods for choosing the correct models. AIC chose the correct model 94% of the time for Model 1, 82% of the time for Model 2 and 93% of the time for Model 3. When AIC failed to select the correct model, it always chose a model with too many components, suggesting that AIC may under-penalize the number of parameters in the mixtures. On the other hand, BIC always chose the correct models, suggesting that BIC may not over-penalize the number of parameters. Note that all sample sizes in the Monte Carlo studies are 100. The examples in the next section will exhibit this procedure in practice.

3.5.2 Classification

One possible use of the mixed logistic regression model is to classify data on the basis of a probabilistic model rather than an ad hoc clustering technique. Since $\hat{z}_{ij}$ in (3.17) is the estimated posterior probability that the observation $y_i$ is generated by the component distribution $bi(y_i \mid m_i, \pi_{ij})$, this information can be used to classify observations into different groups characterized by the component distributions. For instance, for a $c$-component mixture model we may postulate $c$ different groups defined by the $c$ different sets of coefficients of the logistic regression, $\pi_j(x_i^{(r)}, \alpha_j)$ $(j = 1, \ldots, c)$, of the model. According to the classification criterion, an observation $i$ is identified with the component which maximizes $\hat{z}_{ij}$. In our applications, the maximum values of this quantity all exceed 0.5.
Note that if the parameters of the model were known, this classification criterion would be the optimal or Bayes rule (Anderson, 1984, chapter 6), which minimizes the overall error rate. Such an approach has also been referred to as latent class analysis (Aitkin et al. 1981). We illustrate this approach in examples below.

3.5.3 Residual Analysis and Goodness-of-fit

Once a mixed logistic regression model has been fit to a set of observations, it is essential to check the quality of the fit. For this purpose, we first define Pearson, deviance and likelihood residuals for mixed logistic regression models, and then use them to identify individual poorly fitting observations as well as observations that are influential on the overall fit of the model. We also define a quantity to measure the influence of individual observations on the set of parameter estimates, and use it to identify influential observations. In addition, we provide goodness-of-fit statistics for mixed logistic regression models.

Definitions of Residuals

As we discuss in Section 2.7.3, we define Pearson, deviance and likelihood residuals for a mixed logistic regression model. The Pearson residual is defined as
\[ r_{P_i} = \frac{y_i - \hat{\mu}_i}{\sqrt{V(\hat{\mu}_i)}}, \tag{3.30} \]
where
\[ \hat{\mu}_i = m_i \sum_{j=1}^{c} \hat{p}_{ij} \hat{\pi}_{ij}, \tag{3.31} \]
\[ \hat{\pi}_{ij} = \frac{\exp(\hat{\alpha}_j' x_i^{(r)})}{1 + \exp(\hat{\alpha}_j' x_i^{(r)})}, \]
\[ \hat{p}_{ij} = \frac{\exp(\hat{\beta}_j' x_i^{(m)})}{\sum_{k=1}^{c-1} \exp(\hat{\beta}_k' x_i^{(m)}) + 1} \quad \text{for } j = 1, \ldots, c-1, \quad \text{and} \quad \hat{p}_{ic} = \frac{1}{\sum_{k=1}^{c-1} \exp(\hat{\beta}_k' x_i^{(m)}) + 1}, \]
and
\[ V(\hat{\mu}_i) = m_i \sum_{j=1}^{c} \hat{p}_{ij}\hat{\pi}_{ij} \Big(1 - \sum_{j=1}^{c} \hat{p}_{ij}\hat{\pi}_{ij}\Big) + m_i (m_i - 1) \Big\{ \sum_{j=1}^{c} \hat{p}_{ij}\hat{\pi}_{ij}^2 - \Big(\sum_{j=1}^{c} \hat{p}_{ij}\hat{\pi}_{ij}\Big)^2 \Big\}. \tag{3.32} \]
The deviance residual is defined as
\[ r_{D_i} = \mathrm{sign}(y_i - \hat{\mu}_i) \sqrt{2[l(y_i, y_i) - l(\hat{\mu}_i, y_i)]} = \mathrm{sign}(y_i - \hat{\mu}_i) \sqrt{d_i}, \tag{3.33} \]
where $l(\hat{\mu}_i, y_i)$ is the log likelihood function of the mixed logistic regression model for observation $y_i$, and $d_i = 2(l(y_i, y_i) - l(\hat{\mu}_i, y_i))$ is the contribution to the deviance goodness-of-fit statistic $D$, which is defined as
\[ D = \sum_{i=1}^{n} d_i = 2 \sum_{i=1}^{n} [l(y_i, y_i) - l(\hat{\mu}_i, y_i)]. \tag{3.34} \]
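The Pearson and deviance residuals of (3.30) to (3.33) can be computed for a single observation as in the following Python sketch. It assumes the fitted mixing probabilities $\hat{p}_{ij}$ and success probabilities $\hat{\pi}_{ij}$ are already available, and takes the saturated log likelihood $l(y_i, y_i)$ to be the binomial log likelihood at $\pi = y_i/m_i$; the function names and example numbers are illustrative only.

```python
import math


def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def log_binom(y, m, pi):
    """log bi(y | m, pi); pi may be 0 or 1 only if the matching count is 0."""
    log_coef = math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
    return log_coef + _xlogy(y, pi) + _xlogy(m - y, 1.0 - pi)


def pearson_residual(y, m, p_hat, pi_hat):
    """Pearson residual (3.30) with mean (3.31) and variance (3.32)."""
    mean_pi = sum(p * q for p, q in zip(p_hat, pi_hat))
    mean_pi2 = sum(p * q * q for p, q in zip(p_hat, pi_hat))
    mu = m * mean_pi
    var = (m * mean_pi * (1.0 - mean_pi)
           + m * (m - 1) * (mean_pi2 - mean_pi ** 2))
    return (y - mu) / math.sqrt(var)


def deviance_residual(y, m, p_hat, pi_hat):
    """Deviance residual (3.33) for one observation."""
    mu = m * sum(p * q for p, q in zip(p_hat, pi_hat))
    fitted = sum(p * math.exp(log_binom(y, m, q)) for p, q in zip(p_hat, pi_hat))
    saturated = log_binom(y, m, y / m)
    d_i = max(2.0 * (saturated - math.log(fitted)), 0.0)  # guard rounding below 0
    sign = (y > mu) - (y < mu)
    return sign * math.sqrt(d_i)


# Illustrative values: y = 12 out of m = 30 under a two-component fit.
r_p = pearson_residual(12, 30, [0.5, 0.5], [0.2, 0.4])
r_d = deviance_residual(12, 30, [0.5, 0.5], [0.2, 0.4])
```

With a single component the variance in (3.32) reduces to the binomial variance, so both functions return the usual logistic regression residuals in that case.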
(3.34)Note that l(yj, yj) is the same for both the usual logistic regression and mixed logisticregression models becauseC(r) (m)f(y ; ,x ,m,a,6) = pbi(y I (3.35)j=1< m,y) (3.36)= bi(y I mj,yj) (3.37)This indicates that there is the same baseline for the usual logistic regression models andmixed logistic regression models.The likelihood residual is derived by comparing the deviance obtained on fitting amixed logistic regression model to the complete set of n cases with the deviance obtainedwhen the same model is fitted to the n — 1 cases, excluding the ith, for i = 1,. . . , n.This gives rise to a quantity that measures the change in the deviance when each case inturn is excluded from the data set. The value of the likelihood residual for the ith caseis defined asrL = sign(yt — [t)/D — D() (3.38)where /% is defined by (3.31); & and & are the maximum estimates of the regressionparameters based on the complete data set of n cases and the data set of n — 1 casesChapter 3. Mixed Logistic Regression Models 166excluding the i case respectively; and D and D() are the deviances based on n and n — 1cases respectively.Note that for large binomial denominators m, all three types of residuals approximately follow the standard normal distribution if the fitted model is adequate. Ournumerical results show that the Pearson residuals may not be as approximately normalas the other two types of residuals.Detection of Outliers and Influential ObservationsThe residuals obtained after fitting a mixed logistic regression model to an observedset of data form the basis of a large number of diagnostic techniques for assessing modeladequacy. Since our primary objective of residual analysis for mixed logistic regressionmodels is to identify outliers and influential cases, we discuss how the residuals can beused for this objective.Like mixed Poisson regression models, we define outliers as those observations thatare surprisingly distant from the remaining observations in the sample. 
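As a hedged illustration of the Pearson residual in (3.30)-(3.32), the sketch below computes the mixture mean, the mixture variance and the residual for one observation. The function names and the numerical inputs are hypothetical. The variance is the standard variance of a finite binomial mixture, written as $m[s_1 - s_1^2] + m(m-1)[s_2 - s_1^2]$ with $s_1 = \sum_j p_j\pi_j$ and $s_2 = \sum_j p_j\pi_j^2$, which is assumed here to be the form intended by (3.32).

```python
import math

def mixture_mean_var(m, mixing, pis):
    """Mean and variance of a finite mixture of binomial(m, pi_j) distributions."""
    s1 = sum(p * pi for p, pi in zip(mixing, pis))       # sum_j p_j * pi_j
    s2 = sum(p * pi * pi for p, pi in zip(mixing, pis))  # sum_j p_j * pi_j^2
    mean = m * s1
    var = m * (s1 - s1 * s1) + m * (m - 1) * (s2 - s1 * s1)
    return mean, var

def pearson_residual(y, m, mixing, pis):
    """Pearson residual (y - mean) / sqrt(variance) under the mixture."""
    mean, var = mixture_mean_var(m, mixing, pis)
    return (y - mean) / math.sqrt(var)

# Hypothetical example: two equally weighted components with m = 10 trials.
example_mean, example_var = mixture_mean_var(10, [0.5, 0.5], [0.2, 0.8])
example_residual = pearson_residual(9, 10, [0.5, 0.5], [0.2, 0.8])
```

In this hypothetical example the mixture variance (10.6) is well above the binomial variance at the same mean (2.5), which is the extra-binomial variation the model is meant to capture. With a single component the formula reduces to the usual binomial variance.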
Such observations may occur as a result of measurement errors, that is, errors in reading, calculating or recording a numerical value; or they may be just an extreme manifestation of natural variability. Since large residuals indicate poorly fitting observations, we use index plots of residuals for detection of outliers, that is, observations that have unusually large residuals.

The influence of a particular observation on the overall fit of a model can be assessed from the change in the value of a summary measure of goodness of fit that results from excluding the observation from the data set. Since $r_{Li}^2 = D - D_{(i)}$ is the change in deviance on omitting the $i$th observation from the fit, an index plot of these values is the best way of assessing the contribution of each observation to the overall goodness of fit of the model.

To examine how the $i$th observation affects the set of parameter estimates, we define the following quantity:

$$w_i = \frac{1}{p}\left\{\left\|\frac{\hat{\alpha} - \hat{\alpha}_{(i)}}{se(\hat{\alpha})}\right\| + \left\|\frac{\hat{\beta} - \hat{\beta}_{(i)}}{se(\hat{\beta})}\right\|\right\} = \frac{1}{p}\left\{\sum_{j=1}^{c}\sum_{l=1}^{k_1}\frac{|\hat{\alpha}_{jl} - \hat{\alpha}_{jl(i)}|}{se(\hat{\alpha}_{jl})} + \sum_{j=1}^{c-1}\sum_{l=1}^{k_2}\frac{|\hat{\beta}_{jl} - \hat{\beta}_{jl(i)}|}{se(\hat{\beta}_{jl})}\right\}, \qquad (3.39)$$

where $\hat{\alpha}$ and $\hat{\beta}$ are the maximum likelihood parameter estimates of the mixed logistic regression model based on the complete data set of $n$ cases, and $\hat{\alpha}_{(i)}$ and $\hat{\beta}_{(i)}$ are those based on the data set of $n - 1$ cases excluding the $i$th case; $se(\hat{\alpha})$ and $se(\hat{\beta})$ are the estimated standard errors of the corresponding estimates based on the $n$ cases; and $p = ck_1 + (c - 1)k_2$. Because each term in (3.39) measures a relative change in an individual coefficient, $w_i$ can be interpreted as the average relative coefficient change for a set of estimates. This is a useful quantity for assessing the extent to which the set of parameter estimates is affected by the exclusion of the $i$th observation. Relatively large values of this quantity indicate that the corresponding observations are influential and cause instability in the fitted model. An index plot of $w_i$ is the most useful way of presenting these values.
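The influence measure can be sketched as a short function. The fragment below is an illustration only: the function name is hypothetical, and the quantity is assumed to be the average of the absolute coefficient changes divided by their standard errors. The inputs are the estimates and standard errors printed for the selected 2-component model in Table 3.6 and the estimates after omitting the 37th observation given in (3.47)-(3.49). The slope of the mixing probability is taken as 1183.7, consistent with (3.52), and the pairing of the two mixing-probability standard errors with their coefficients is an assumption that does not affect the result, because those two coefficients do not change.

```python
def average_relative_change(full, loo, se):
    """Average of |full - leave_one_out| / standard_error over all coefficients."""
    if not (len(full) == len(loo) == len(se)):
        raise ValueError("all three sequences must have the same length")
    return sum(abs(f - l) / s for f, l, s in zip(full, loo, se)) / len(full)

# Order: beta_0, beta_1, alpha_10, alpha_11, common alpha_2, alpha_20, alpha_21.
full_estimates = [-44.38, 1183.7, -0.8161, 6.6209, 1.1686, -4.7798, 122.92]
without_case_37 = [-44.38, 1183.7, -0.8838, 7.2151, 1.2232, -4.8242, 123.29]
standard_errors = [407.8, 4.453, 0.0982, 0.6155, 0.0851, 0.2896, 12.49]

w_37 = average_relative_change(full_estimates, without_case_37, standard_errors)
```

Under these assumptions the result is close to the value 0.3543 reported for the 37th observation in Section 3.6.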
The example in the next section will illustrate these points.

Goodness-of-fit Statistics

After fitting a mixed logistic regression model to a set of data, it is natural to enquire about the extent to which the fitted values of the response variable under the model compare with the observed values. If the agreement between the observations and the corresponding fitted values is good, the model may be acceptable. If not, the current form of the model is not acceptable and the model will need to be revised. This aspect of the adequacy of a model is widely referred to as goodness of fit.

There are at least two widely used goodness-of-fit statistics which can be used here. One is the deviance, defined as

$$D = \sum_{i=1}^{n} r_{Di}^2, \qquad (3.40)$$

where $r_{Di}$ are the deviance residuals for the mixed logistic regression model; the other is Pearson's $X^2$ statistic, defined as

$$X^2 = \sum_{i=1}^{n} r_{Pi}^2, \qquad (3.41)$$

where $r_{Pi}$ are the Pearson residuals for the mixed logistic regression model. In order to evaluate the extent to which an adopted mixed logistic regression model fits a set of data, the distribution of either the deviance or the Pearson statistic, under the assumption that the model is correct, is needed. In general, the deviance and Pearson's $X^2$ statistic are asymptotically distributed as $\chi^2$ with $(n - p)$ degrees of freedom, where $n$ is the number of observations and $p$ is the number of unknown parameters in the model. Many studies have shown that the distribution of the Pearson statistic is often much closer to chi-squared than that of the deviance (e.g., Larntz, 1978). For this reason we use the Pearson statistic for overall goodness-of-fit tests for the mixed logistic regression models.

3.6 An Application

This example uses data from an experiment reported by Ganio and Schafer (1992), which investigates the carcinogenic effects of aflatoxin, a toxic by-product produced by a mold that infects cottonseed meal, peanuts, and grains.
Forty tanks of rainbow trout embryos were exposed to either aflatoxin B1 or a related compound, aflatoxicol, at one of five doses for one hour, and the incidence of liver tumors in each tank was recorded after one year. The data in Table 3.5 are the proportions of fish with liver tumors in each of the 40 tanks. The researchers believe that there may exist extra-binomial variation due to tank effects and different treatments. They believe that aflatoxicol must undergo more chemical changes than aflatoxin B1 to produce tumors in fish. This may result in more variation of effective doses reaching the liver of fish in aflatoxicol tanks and, therefore, a greater degree of extra-binomial variation for the aflatoxicol group. The issue of interest is to assess dose level and treatment effects on the proportions of fish with liver tumors while taking extra-binomial variation into account.

We first apply the usual logistic regression model with covariates including an intercept, dose level ($x_{i1}$), treatment ($x_{i2}$) and dose-treatment interaction ($x_{i3}$), where

$$x_{i2} = \begin{cases} 0 & \text{if fish in tank } i \text{ were exposed to aflatoxin B1} \\ 1 & \text{if fish in tank } i \text{ were exposed to aflatoxicol} \end{cases} \qquad (3.42)$$

and

$$x_{i3} = x_{i1}x_{i2}. \qquad (3.43)$$

The top part of Table 3.6 reports the results of fitting the usual logistic regression models to the data. Note that the deviance and Pearson goodness-of-fit statistics for the model with covariates $x_{i1}$, $x_{i2}$ and $x_{i3}$ are 391.08 and 365.3, respectively, with 36 degrees of freedom, suggesting that there is significant evidence of lack of fit in the logistic regression model. Furthermore, the data are overdispersed with respect to the binomial distribution, since each of the overdispersion tests is highly significant (Na = 68.26, N = 36.42 and N = 36.3). This also indicates inadequacy of the usual logistic regression model.

Ganio and Schafer (1992) only present exploratory techniques for use in an early stage of data analysis to aid modelling extra-binomial variation.
They take some function of the dispersion parameter in a generalized linear model to depend on explanatory variables. To detect extra-binomial variation for the fish data, they consider three models for dispersion. Let $\pi_{ij}$ be the probability of tumor for concentration level $i$ and carcinogen group $j$ ($i = 1, \ldots, 5$; $j = 1, 2$), and let $Y_{ijk}$ be the number of tumors observed in $m_{ijk}$ fish in tank $k$ of treatment $ij$. Then they model the variance of this count as $m_{ijk}\pi_{ij}(1 - \pi_{ij})/\phi_{ijk}$ and consider the following forms for the dispersion parameter: (a) $\phi_{ijk} = \lambda$; (b) $\phi_{ijk} = \lambda + \alpha z_j$, where $z_j = (j - 1)$; and (c) $\phi_{ijk} = [\lambda + \alpha z_{ijk}]^{-1}$, where $z_{ijk} = (m_{ijk} - 1)\pi_{ij}(1 - \pi_{ij})$. Note that Model (a) is a generalized linear model with constant dispersion. Model (b) contains separate dispersion parameters for the two carcinogen groups. Model (c) gives the approximate variance of $Y$ if a random effect, with mean 0 and variance $\alpha$, is additive on the logit scale (Williams, 1982). They find that the extra-binomial variation is associated with the type of carcinogen and cannot be explained simply by differences in the $\pi_{ij}$'s. Note, however, that they do not analyze extra-binomial variation and the dose-response function in the mean simultaneously.

We apply the mixed logistic regression model assuming that

(1) each observed number of tumors, $y_i$, in $m_i$ fish in tank $i$ is associated with covariates $x_i^{(m)} = (1, x_{i1}, x_{i2})$ and $x_i^{(r)} = (1, x_{i1}, x_{i2}, x_{i3})$, where $x_{i1}$, $x_{i2}$ and $x_{i3}$ are defined above;

(2) numbers of tumors in different tanks are independent and follow a mixed logistic regression model with binomial parameters $\pi_{ij}$ given by the link function

$$\pi_{ij} = \frac{\exp(\alpha_{j0} + \alpha_{j1}x_{i1} + \alpha_{j2}x_{i2} + \alpha_{j3}x_{i3})}{1 + \exp(\alpha_{j0} + \alpha_{j1}x_{i1} + \alpha_{j2}x_{i2} + \alpha_{j3}x_{i3})}, \qquad (3.44)$$

where $i = 1, \ldots, 40$ and $j = 1, \ldots, c$, and the mixing probabilities $p_{ij}$ given by

$$p_{ij} = \frac{\exp(\beta_{j0} + \beta_{j1}x_{i1} + \beta_{j2}x_{i2})}{1 + \sum_{k=1}^{c-1}\exp(\beta_{k0} + \beta_{k1}x_{i1} + \beta_{k2}x_{i2})} \quad \text{for } j = 1, \ldots, c-1 \qquad (3.45)$$

and

$$p_{ic} = 1 - \sum_{j=1}^{c-1} p_{ij}. \qquad (3.46)$$

Note that since the smallest binomial denominator in the data set is 80, the mixed logistic regression model is identifiable if $c < (80 + 1)/2 = 40.5$. Thus, there are virtually no restrictions on identifiability in this example.

Table 3.6 provides the results of fitting these models. In order to determine the number of components first, we compare the values of AIC and BIC among the three saturated models. Clearly, both AIC and BIC lead to the choice of a 2-component mixed logistic regression model. Within these 2-component models we carry out inference using likelihood ratio tests. First we test which covariates in the mixing probabilities are significant. Comparing the model that excludes only $x_{i2}$ from the mixing probabilities and includes all covariates in the binomial parameters with the saturated 2-component model, the chi-square test statistic is 0 to two decimal places. This clearly indicates that $x_{i2}$ is insignificant in the mixing probabilities. Then we test the hypothesis that $x_{i1}$ is insignificant in the mixing probabilities. The corresponding chi-square test statistic is 2(1784.36 - 1757.48) = 53.76 with one degree of freedom, suggesting that $x_{i1}$ is highly significant in the mixing probabilities.

For the binomial parameters we first test the hypothesis that the dose-treatment interaction is insignificant. Comparing the model that excludes only $x_{i3}$ from the binomial parameters and includes $x_{i1}$ in the mixing probabilities with the model that includes all covariates in the binomial parameters and $x_{i1}$ in the mixing probabilities, the chi-square test statistic is 2(1760.57 - 1757.48) = 6.18 with 2 degrees of freedom. Since the p-value of the test statistic is 0.0455, we do not reject the hypothesis, at the 1% level, that the interaction effect is insignificant. On the other hand, the effects of both covariates $x_{i1}$ and $x_{i2}$ are significant.
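The first step of the model selection described above, the comparison of AIC and BIC among the saturated models, can be sketched as follows. This is an illustration under stated assumptions: the sign convention AIC = log-likelihood - p and BIC = log-likelihood - (p/2) log n, with larger values preferred, is assumed because it reproduces the AIC entries and most BIC entries printed in Table 3.6; the function names are hypothetical; the log-likelihoods are those printed in Table 3.6, and the parameter counts 4, 11 and 18 are counted from the saturated 1-, 2- and 3-component models.

```python
import math

def aic(loglik, n_params):
    """AIC on the log-likelihood scale (larger is better under this convention)."""
    return loglik - n_params

def bic(loglik, n_params, n_obs):
    """BIC on the log-likelihood scale (larger is better under this convention)."""
    return loglik - 0.5 * n_params * math.log(n_obs)

# components -> (log-likelihood, number of parameters), saturated models only
saturated = {1: (-1930.38, 4), 2: (-1757.48, 11), 3: (-1750.80, 18)}
n_tanks = 40

aic_values = {c: aic(l, p) for c, (l, p) in saturated.items()}
bic_values = {c: bic(l, p, n_tanks) for c, (l, p) in saturated.items()}
best_by_aic = max(aic_values, key=aic_values.get)
best_by_bic = max(bic_values, key=bic_values.get)
```

Under this convention both criteria select the 2-component model, in agreement with the choice reported in the text.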
For instance, to test the hypothesis that the effect of $x_{i2}$ is insignificant, we compare the model including $x_{i2}$ in the binomial parameters only and $x_{i1}$ in both the mixing probabilities and the binomial parameters with the model including only $x_{i1}$ in both the mixing probabilities and the binomial parameters, and obtain the corresponding chi-square test statistic 2(1834.94 - 1760.57) = 148.74 with 2 degrees of freedom. Clearly we reject the hypothesis that the effect of $x_{i2}$ is insignificant. Finally, we test the hypothesis of a common effect of treatment for both components, i.e., $\alpha_{12} = \alpha_{22}$. This hypothesis is not rejected because the test statistic is 0 to two decimal places. Therefore we select the 2-component mixed logistic regression model with the dose level covariate in both the mixing probabilities and the binomial parameters and a common coefficient for the treatment covariate in the binomial parameters. This model fits the data best.

After fitting the 2-component mixed logistic regression model, the Pearson goodness-of-fit test statistic $X^2$ is 52.18 with 33 degrees of freedom. The p-value of the test statistic is 0.0181, suggesting that there is no evidence of lack of fit at the 1% significance level. Note that the deviance for the fitted model is 51.46 with 33 degrees of freedom. In addition, the Pearson, deviance and likelihood residuals from the fitted model are calculated and displayed in Figure 3.1, Figure 3.2 and Figure 3.3 respectively.
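The p-values quoted above for the likelihood ratio tests and for the Pearson statistic can be checked with a small chi-square tail function. The sketch below is not taken from the thesis code: the function name is hypothetical, and it uses the standard finite-sum expressions for the chi-square upper tail with integer degrees of freedom, so that only the Python standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term = 1.0
        total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: two-sided normal tail plus a finite correction sum
    # of chi^(2r-1) / (1*3*...*(2r-1)) for r = 1, ..., (df-1)/2.
    chi = math.sqrt(x)
    term = chi
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total

p_interaction = chi2_sf(2 * (1760.57 - 1757.48), 2)     # interaction test, 2 df
p_dose_in_mixing = chi2_sf(2 * (1784.36 - 1757.48), 1)  # dose in mixing probabilities, 1 df
p_pearson = chi2_sf(52.18, 33)                          # Pearson goodness-of-fit, 33 df
```

The first and third values agree with the p-values 0.0455 and 0.0181 reported in the text to the printed precision, while the second is negligibly small.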
These plots show that the three types of residuals are very similar to each other, and that the 37th observation is far distant from the remaining observations in these plots, suggesting that it is an outlier. On omitting the observation, the deviance reduction is $r_{L,37}^2 = (-3.1651)^2 = 10.0179$. This means that the 37th observation has a great impact on the overall fit of the mixed logistic regression model to the data.

For detection of influential observations, the average relative coefficient changes $w_i$ are calculated and displayed in Figure 3.4. Clearly, the 37th observation also has the largest value (0.3543). On omitting this observation, the average relative coefficient change for each parameter estimate is about 35%, and the new parameter estimates become

$$\hat{\beta} = (-44.38, 1183.7), \qquad (3.47)$$

$$\hat{\alpha}_1 = (-0.8838, 7.2151, 1.2232), \qquad (3.48)$$

$$\hat{\alpha}_2 = (-4.8242, 123.29, 1.2232). \qquad (3.49)$$

Note that changes in the binomial parameter estimates for the first component are relatively large, while there are almost no changes in the parameter estimates for the mixing probabilities. This indicates that the 37th observation has greater influence on the first component than on the second component. We now interpret the fitted model.

The chosen mixed logistic regression model suggests that numbers of fish with liver tumors are generated by two underlying binomial distributions with binomial parameters defined by, respectively,

$$\pi_{i1} = \frac{\exp(-0.8161 + 6.6209x_{i1} + 1.1686x_{i2})}{1 + \exp(-0.8161 + 6.6209x_{i1} + 1.1686x_{i2})} \qquad (3.50)$$

and

$$\pi_{i2} = \frac{\exp(-4.7798 + 122.92x_{i1} + 1.1686x_{i2})}{1 + \exp(-4.7798 + 122.92x_{i1} + 1.1686x_{i2})}. \qquad (3.51)$$

In addition, these two distributions are mixed according to the mixing probabilities defined by

$$p_{i1} = \frac{\exp(-44.38 + 1183.7x_{i1})}{1 + \exp(-44.38 + 1183.7x_{i1})} \qquad (3.52)$$

and

$$p_{i2} = \frac{1}{1 + \exp(-44.38 + 1183.7x_{i1})}. \qquad (3.53)$$

According to this model, tanks in either of the two treatments may be classified into two groups on the basis of the two dose-response functions.
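The fitted functions (3.50)-(3.53) can be evaluated directly to see where the model switches between the two dose-response functions. The sketch below is an illustration only; the function names are hypothetical, the coefficients are those of the fitted model, and the doses are the five levels listed in Table 3.5.

```python
import math

def expit(t):
    """Numerically stable logistic function exp(t) / (1 + exp(t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def mixing_prob_component_one(dose):
    """Mixing probability of component one, as in (3.52)."""
    return expit(-44.38 + 1183.7 * dose)

def tumor_prob(dose, aflatoxicol, component):
    """Binomial parameter of the chosen component, as in (3.50) and (3.51)."""
    if component == 1:
        return expit(-0.8161 + 6.6209 * dose + 1.1686 * aflatoxicol)
    return expit(-4.7798 + 122.92 * dose + 1.1686 * aflatoxicol)

# Dose at which the two mixing probabilities are equal (linear predictor zero).
crossover_dose = 44.38 / 1183.7

doses = [0.010, 0.025, 0.050, 0.100, 0.250]
mixing_by_dose = {d: mixing_prob_component_one(d) for d in doses}
```

The crossover dose is about 0.0375 ppm, which lies between the dose levels 0.025 and 0.050. At the two lowest doses the mixing probability of component one is essentially 0 and at the three highest doses it is essentially 1, so each tank is effectively governed by a single dose-response function.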
For either of the two treatments, fish in those tanks exposed to a higher dose level (> 0.025 ppm) follow one dose-response function, and fish exposed to a lower dose level (≤ 0.025 ppm) follow another. In addition, the treatment effect is the same for both groups. When exposed to a higher dose level, there is a higher chance for fish to follow the first dose-response function because the mixing probability for component one is very close to 1. Similarly, when exposed to a lower dose level, there is a higher chance for fish to follow the second dose-response function because the mixing probability for component two is close to 1. Figure 3.5 provides the estimated proportions of fish with liver tumors corresponding to each group for either of the two treatments (the solid line is the proportion for group one and the dotted line for group two). Note that Figure 3.5 also classifies the observed proportions in terms of the estimated posterior probabilities from the fitted model. Those observations marked as "1" form group one, which is characterized by the function $\pi_{i1}$ in (3.50), while those marked as "2" form group two, which is characterized by the function $\pi_{i2}$ in (3.51).

Figure 3.6 depicts the mean-variance relationship for the fitted model based on the estimated mean and variance obtained through (3.8) and (3.9). Note that there is no obvious parametric relationship between the estimated mean and variance.

For the purpose of comparison, we also fit the data to the two quasi-likelihood models which are discussed by McCullagh and Nelder (1989) and Williams (1982) respectively. The first assumes a variance of the form $\mathrm{Var}(Y) = \sigma^2 m\pi(1 - \pi)$, and the second $\mathrm{Var}(Y) = m\pi(1 - \pi)[1 + (m - 1)\phi]$. Note that the unknown parameters $\sigma^2$ and $\phi$ are usually called unexplained variance. The parameter estimates and standard errors are given in Table 3.7. Note that the dose-treatment effect is not significant in the quasi-likelihood models (estimates not reported here).
As expected, the parameter estimates for both quasi-likelihood models are very similar to each other because the binomial denominators $m_i$ do not vary much. From Table 3.7, we find that the parameter estimates under the quasi-likelihood models and the mixed logistic regression model are different, suggesting that using different methods to model extra-binomial variation may lead to different parameter estimates, different standard errors, or both. For instance, the coefficient estimate for dose level is 12.82 and 12.81 by quasi-likelihood methods I and II respectively, whereas under the mixed logistic regression model it is 6.6209 for component one and 122.92 for component two. Furthermore, computing the t-statistic (estimated coefficient/standard error) and comparing the mixed logistic regression model with the quasi-likelihood models, we find that quasi-likelihood models may underestimate the treatment and dose effects. For example, the values of the t-statistic of the estimated coefficient for $x_{i2}$ (treatment) are 4.1556 and 4.1441 for quasi-likelihood models I and II respectively, while the value for the mixed logistic regression model is 13.732. Thus, compared with quasi-likelihood methods, the mixed logistic regression model has smaller confidence intervals for parameter estimates.

In summary, we have applied the mixed logistic regression model to analyze the data from a fish toxicology study. The data are well fitted by a 2-component mixed logistic regression model with mixing probabilities depending on the dose level covariate and binomial parameters depending on both the dose level and treatment covariates. The goodness-of-fit test suggests that there is no evidence of lack of fit in the model. In addition, the residual analysis identifies an outlier and influential observations. According to this model, there are two dose-response functions for each treatment, which describe lower dose level and higher dose level situations respectively.
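The t-statistics quoted above are simple ratios of entries in Table 3.7, and the standard errors of the first quasi-likelihood model are related to the logistic regression standard errors by the square root of the estimated dispersion, which is the usual property of that model. The following sketch checks both. The variable names are hypothetical, and the dispersion estimate is read here as 10.154 from a partly garbled entry of Table 3.7, which is an assumption.

```python
import math

# Logistic regression standard errors (intercept, dose level, treatment), Table 3.7.
logistic_se = [0.0764, 0.5424, 0.0803]

# Assumed reading of the dispersion estimate for quasi-likelihood model I.
sigma2_hat = 10.154

# Quasi-likelihood I standard errors implied by scaling with sqrt(dispersion).
scaled_se = [math.sqrt(sigma2_hat) * s for s in logistic_se]

# t-statistics for the treatment coefficient: estimate / standard error.
t_quasi_1 = 1.063 / 0.2558
t_quasi_2 = 1.058 / 0.2553
t_mixed = 1.1686 / 0.0851
```

The scaled standard errors agree with the quasi-likelihood I column of Table 3.7 to about three decimal places, and the three ratios reproduce the values 4.1556, 4.1441 and 13.732 quoted in the text, which identifies the covariate in question as the treatment indicator.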
Compared with the quasi-likelihood methods, the mixed logistic regression model gives smaller confidence intervals for the parameter estimates. Note that both the parameter estimates and the standard errors under the mixed logistic regression model differ from those obtained by the quasi-likelihood methods.

3.7 Tables and Figures in Chapter 3

Table 3.1: Data of Busvine (1938)

Jar label   Dose    Jar total   Number dead
1           0.033   24          0
2           0.167   31          10
3           0.199   30          17
4           0.225   31          12
5           0.260   27          7
6           0.314   26          23
7           0.322   30          22
8           0.362   31          29
9           0.391   30          30
10          0.394   30          23

Table 3.2: The results of the simulations for the mixed logistic regression model (Model 1)
Initial values set as the true valuesParameter True Upper Upper Median Lower Lower Averagevalue extreme quartile quartile extremeç-0.9163 -0.2618-0.7825 -0.9698-1. 1354 -1.6169-0.9643tH-0.5108 0.0366 -0.3293 -0.5443-0.7185-1.2269 -0.5380‘-21a 4.2962 0.6248 4.0144 4.2717 4.4430 -1.9056 4.2402ita -0.4505 0. 1955 -0.2766 0.4593 0.6294 -0.9752 -0.485512a -1.3148 -0.8427 -1.2082 -1.3796 4.5097 4.7884 4.361921a 1.0811 1.4963 1.2325 1.1158 1.0344 0.7893 1.121122a 0.6973 1.0054 0.8004 0.6911 0.5779 0.3414 0.689231a 0.7499 1.1164 0.8855 0.7695 0.6527 0.4017 0.768332Initial values chosen step by step, -0.9163 -0.2467 -0.7711 -0.9626-1.1331 -1.6162 -0.9586i-fl-0.5108 0.0336 -0.3317 -0.5500-0.7468-1.2571 -0.5386‘-21a11 4.2962 0.6238 -0.9970 -1 .2561 -1.4414 4.9909 4.2324a 0.4505 0. 1540 -0.3189 -0.4839 -0.6473 -0.9759-0.497612a 4.3148 0.7701 4.1968 4.3648 4.5076 -1.7880 -1.284421a 1.0811 1.5773 1.2293 1.1066 0.9885 0.6696 1.059922a 0.6973 1.0076 0.813i 0.6996 0.5814 0.3416 0.701131a 0.7499 1.1165 0.8772 O7656 0.6473 0.3954 0.757732Chapter 3. 
Mixed Logistic Regression Models 178Table 33: The results of the simulations for the mixed logistic regression model (Model 2).Initial values set as the true valuesParameter True Upper Upper Median Lower Lower Averagevalue extreme quartile quartile extremep-2.1129 -0.7531 -1.6871 -2.2560 -2.8070 -3.8851 -2.30681.’llp 1.6057 2.8955 2.0599 1.7048 1.4069 0.5755 1.7467‘ 12p-0.9692 0.2170 -0.7619-1.0517 -1.4186 -2.3082 -1.0958‘21p 1.3805 2.3960 1.7416 1.4624 1.1670 0.4227 1.494122-2.1972 -1.8090 -2.0612 -2.1586-2.2975 -2.6451 -2.2061a11a -0.8473 -0.6443 -0.7807 -0.8517 -0.9046 -1.0277 -0.847321a 1.3863 1.5637 1.4346 1.3926 1.3371 1.2438 1.389231Initial values chosen step by stepi -2.1129 -0.7410 -1.6260 -2.2150 -2.7535 -3.8757 -2.2733p 1.6057 2.8799 2.0629 1.6947 1.3974 0.5713 1.7469“12p-0.9692 0.2106 -0.7705-1.0538-1.4351 -2.3073 -1.1090‘21p 1.3805 2.3804 1.7385 1.4463 1.1490 0.4193 1.4889‘22a -2.1972 -1.6279 -2.0135 -2.1347 2.2806 2.6425 2.1818: 11a 0.8473 0.6152 0.7692 0.8452 0.8980 4.0274 0.837821-____________a 1.3863 1.5637 1.4348 1.3926 1.3386 1.2438 1.389531Chapter 3. 
Mixed Logistic Regression Models 179Table 3.4: The results of the simulations for the mixed logistic regression model (Model 3)Initial values set as the true valuesParameter True Upper Upper Median Lower Lower Averagevalue extreme quartile quartile extremep-2.1129 -0.2943-1.5967 -2.2683-2.8381 4.5911 -2.26181.111.p 1.6057 3.3711 2.1703 1.6764 1.2100 -0.1373 1.7061‘12p -0.9692 0.0446 -0.8472 -1.0817 -1.4547 -2.1604-1.1588‘21p 1.3805 2.6180 1.7743 1.4772 1.1977 0.3613 1.538322-1.2962 -0.4551 -0.9429 -1.1646-1.5292 -2.1857-1.2114a11a 0.4505 0.3340 0.2551 0.4819 0.6523 4.0381 0.506712a 4.3148 -0.8501 4.1829 -1.3497 4.4191 -1.6918 4.3164211.0811 1.3219 1.1537 1.0830 0.9804 0.7596 1.0718a220.6973 1.0964 0.8138 0.6710 0.5802 0.3398 0.6881310.7499 1.2727 0.9020 0.7549 0.5985 0.2325 0.7604a32Initial values chosen step by step-2.1129 -0.2945 -1.5197 -2.1655-2.8181 -4.6014-2.2109tJi1p 1.6057 3.3687 2.1429 1.6063 1.1472 -0.3411 1.666012p -0.9692 0.0457 -0.8123 -1.0815-1.4438 -2.3682-1.1286‘21p 1.3805 2.5867 1.7570 1.4766 1.1845 0.3604 1.506122-1.2962 -0.4557 -0.9293 -1.1561-1.5302 -2.1914 -1.2054a11a 0.4505 0.3340 -0.2544 0.4896 0.6639 4.0377 -0.509612a 4.3148 0.8394 -1.1830 -1.3459 -1.4153 -1.6962 -1.297421a 1.0811 1.3233 1.1550 1.0768 0.9769 0.7526 1.0577220.6973 1.0967 0.8149 0.6848 0.5824 0.3398 0.7022a310.7499 1.2727 0.8991 0.7518 0.5948 0.1788 0.7455a32Chapter 3. Mixed Logistic Regression Models 180Table 35: Number of trout with liver tumors/number in tankDose (ppm) Aflatoxin Bi Aflatoxicol0.010 3/86, 5/86, 4/88, 2/86 9/87, 5/86,2/89,9/850.025 14/87,14/90, 9/83 12/88 30/86, 41/86, 27/86, 34/880.050 29/90, 3 1/89, 33/89, 26/87 54/89, 53/86, 64/90, 55/880.100 44/86,40/80, 44/89, 43/88 71/88,73/89, 65/88, 72/900.250 62/87,67/88,59/88,58/84 66/86, 75/82, 72/81,73/89Chapter 3. 
Mixed Logistic Regression Models 181

Table 3.6: Logistic regression and mixed logistic regression model estimates for fish data.

Columns: component j; mixing probability coefficients ($\beta_{j0}$, $\beta_{j1}$, $\beta_{j2}$); binomial parameter coefficients ($\alpha_{j0}$, $\alpha_{j1}$, $\alpha_{j2}$, $\alpha_{j3}$); log-likelihood (1); AIC; BIC. Standard errors are in parentheses. Entries are listed row by row in printed order.

Logistic regression model (1-component)
1   NA NA NA   -1.758 11.93 0.8911 2.402   -1930.38 -1934.38 -1937.76
               (0.0847) (0.6750) (0.1143) (1.154)
1   NA NA NA   -1.839 12.82 1.063   -1932.62 -1935.62 -1938.15
               (0.0764) (0.5424) (0.0803)

2-component mixture
1   -43.89 1170.6 -0.0103   -0.9220 7.4548 1.4198 -2.1632   -1757.48 -1768.48 -1776.77
2   4.0710 90.50 0.1337 47.67
1   42.81 1141.58   -0.9220 7.4547 1.4198 -2.1631   -1757.48 -1767.48 -1775.92
2   4.0710 90.50 0.1337 47.67
1   0.4095   -0.9232 7.4613 1.4141 -2.1311   -1784.36 -1793.36 -1800.96
2   4.0708 90.4747 0.1356 47.48
1   42.72 1139.6   -0.8156 6.6201 1.1676   -1760.57 -1768.57 -1775.32
2   4.7821 122.94 1.1716
1   -44.38 1183.7   -0.8161 6.6209 1.1686   -1760.57 -1767.57 -1773.48
    (407.8) (4.453) (0.0982) (0.6155) (0.0851)
2   -4.7798 122.92
    (0.2896) (12.49)
1   -2.6492 64.73   -0.5843 7.9411   -1834.98 -1840.98 -1846.05
2   -3.8221 87.33
1   4.9315 -1.5167   -1922.26 -1926.26 -1929.64
2   -100.94 0.7843

3-component mixture
1   25.66 -195.59 0.2586   -1.3636 13.58 1.1820 1.5813
2   74.16 -1500.7 -0.3769   -4.0710 90.50 0.1337 47.67   -1750.80 -1768.80 -1784.00
3   -0.6032 5.9736 1.6905 -3.5038

(1) Log-likelihood does not include the constant term.

Table 3.7: Parameter estimates for four models for fish data (standard errors in parentheses)

Covariates             Logistic regression   Quasi-likelihood I   Quasi-likelihood II   Mixed logistic regression
                                                                                        component 1        component 2
$\alpha_0$ (intercept)   -1.839 (0.0764)       -1.839 (0.2434)      -1.841 (0.2424)       -0.8161 (0.0982)   -4.7798 (0.2896)
$\alpha_1$ (dose level)  12.82 (0.5424)        12.82 (1.7278)       12.81 (1.714)         6.6209 (0.6155)    122.92 (12.49)
$\alpha_2$ (treatment)   1.063 (0.0803)        1.063 (0.2558)       1.058 (0.2553)        1.1686 (0.0851), common to both components
Unexplained variance   $\sigma^2 = 1.0$      $\sigma^2 = 10.154$  $\phi = 0.1056$       NA

Figure 3.1: The index plot of the Pearson residuals from the fitted logistic regression model for the fish data. [Plot not reproduced; horizontal axis: index.]

Figure 3.2: The index plot of the deviance residuals from the fitted mixed logistic regression model for the fish data. [Plot not reproduced; horizontal axis: index.]

Figure 3.3: The index plot of the likelihood residuals from the fitted mixed logistic regression model for the fish data. [Plot not reproduced; horizontal axis: index.]

Figure 3.4: The index plot of the average relative coefficient changes from the fitted mixed logistic regression model for the fish data. [Plot not reproduced; horizontal axis: index.]

Figure 3.5: [Caption and plot not recoverable; the axes are labelled "probabilities" and run from 0.0 to 1.0.]

Figure 3.6: The plot of the mean-variance relationship based on the fitted mixed logistic regression model for the fish data. [Plot not reproduced; horizontal axis: mean.]

Chapter 4

Summary, Conclusions and Future Research

In this chapter, we summarize similarities and differences between the mixed Poisson regression and mixed logistic regression models discussed in the previous chapters. Furthermore, we discuss some extensions of these mixed regression models and related remaining issues for future research. Section 4.2 formulates a mixed exponential family regression model which includes the mixed Poisson regression and mixed logistic regression models as special cases. Section 4.3 concerns hidden Markov Poisson regression models for longitudinal data. We give some preliminary results for this model.

4.1 Summary and Conclusions

There are many similarities between the mixed Poisson regression and mixed logistic regression models discussed in Chapters 2 and 3.
These are that

• both models assume an unobserved mixing process which can occupy any one of c states, where c is finite and unknown; independent pairs of observed and unobserved random variables; covariates consisting of two parts, one related to the mixing probabilities and the other to the component parameters; and the same multinomial link in the mixing probabilities;

• both models can model overdispersion in the sense that the variances of the mixed regression models are larger than those specified by the mean-variance relationships of the corresponding usual regression models;

• parameters are estimated by maximum likelihood. Parameter estimates of both models are obtained by applying (1) the EM algorithm, treating the unobserved random variable as missing data, and (2) a quasi-Newton approach for the M-step and for maximizing the observed log likelihood functions;

• the model selection procedures for both models are the same, i.e., first determining the number of components by comparing the AIC and BIC values among the saturated models, and then carrying out inferences about regression parameters within c-component mixtures by likelihood ratio tests;

• classification, residual analysis and goodness-of-fit tests for both models are carried out in the same way.

There are several differences between the mixed Poisson regression and mixed logistic regression models. Obviously, the component distributions of the mixtures and the link functions are different. This leads to different sufficient conditions for identifiability of these models. For the mixed Poisson regression models, the sufficient conditions for identifiability are satisfied in virtually all applications; for the mixed logistic regression models, since the sufficient conditions for identifiability depend on the binomial denominators, these may restrict the applications of these models in some cases.
Although the algorithms for computation of parameter estimates for both models are similar, the implementation of these algorithms is quite different because there are different rescaling schemes to overcome numerical overflow or underflow problems. Note that coding these algorithms might be a formidable task.

Both the mixed Poisson regression and mixed logistic regression models provide new tools to analyze discrete data when data are overdispersed with respect to either the Poisson or binomial assumption. Allowing covariates in both the mixing probabilities and the component parameters gives a direct way to assess the effects of each covariate on the response variable. Using these models, we can classify observations into different groups characterized by different regression functions. This may give a more meaningful interpretation for overdispersion.

The mixed regression models are not always preferable to other models for modelling overdispersion, such as parametric mixtures or quasi-likelihood regression. When overdispersion is reasonably modelled by a continuous mixing distribution, either parametric mixtures or quasi-likelihood regression models may be better. For the Poisson case, for instance, if extra-Poisson variation is caused by a random effect in the mean which is reasonably modelled by a continuous distribution, say a gamma distribution, then the negative binomial model is more suitable. Likewise, for the binomial case, if extra-binomial variation varies smoothly in the binomial denominators, Williams' quasi-likelihood models (1984) may be better. Nevertheless, the mixed regression models are suitable in many applications, which we have demonstrated in the previous chapters. The same technique of accommodating heterogeneity with mixture models can be applied to other cases.
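The overflow and underflow problems mentioned above arise because products of mixing probabilities and component densities can be far smaller than the smallest representable floating-point number. One common remedy, shown here only as an illustration and not as the rescaling scheme actually used in the FORTRAN code of this thesis, is to work on the log scale and subtract the largest term before exponentiating. The function names are hypothetical, and the mixing probabilities and binomial parameters are assumed to lie strictly between 0 and 1.

```python
import math

def log_binom_pmf(y, m, pi):
    """Log of the binomial probability, computed without forming large factorials."""
    log_choose = math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
    return log_choose + y * math.log(pi) + (m - y) * math.log1p(-pi)

def log_mixture_density(y, m, mixing, pis):
    """Log mixture density and posterior probabilities via the log-sum-exp device."""
    terms = [math.log(p) + log_binom_pmf(y, m, pi) for p, pi in zip(mixing, pis)]
    top = max(terms)  # subtracting the maximum keeps every exponent <= 0
    log_f = top + math.log(sum(math.exp(t - top) for t in terms))
    posterior = [math.exp(t - log_f) for t in terms]
    return log_f, posterior

# Hypothetical case with a large denominator, where the direct product underflows.
big_log_f, big_post = log_mixture_density(2500, 5000, [0.5, 0.5], [0.01, 0.5])
```

The same posterior probabilities are the quantities needed in the E-step of an EM algorithm, so the device serves both the likelihood evaluation and the classification step.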
We discuss some generalizations below.

4.2 Mixed Exponential Regression Models

For a given one-parameter, one-dimensional exponential model, the mean-variance relationship is determined by a single parameter. The one-parameter exponential density is

$$h(y)\exp(\theta y - \chi(\theta)),$$

where $h(y)$ is a real function and $\chi(\theta)$ is the log moment generating function, with mean $\mu = \chi'(\theta)$ and variance $\chi''(\theta)$. Sometimes samples are found to be either too heterogeneous or too homogeneous to be explained by a one-parameter exponential model, in the sense that the implicit mean-variance relationship in such a model is violated by the data. If the sample variance is large compared with that predicted by inserting the sample mean into the mean-variance relationship, overdispersion occurs. On the other hand, if the sample variance is small compared with that predicted by the mean-variance relationship, underdispersion occurs. In this section, we suggest a mixed exponential regression model to adjust for overdispersion in terms of the mean-variance relationship of the one-parameter exponential model.

Let the random variable $Y_i$ denote the $i$th response variable, and let $\{(y_i, x_i),\ i = 1, \ldots, n\}$ denote the observations, where $y_i$ is the observed value of $Y_i$ and $x_i = (x_i^{(m)}, x_i^{(r)})$ is the $k$-dimensional covariate vector associated with $y_i$. Note that $x_i^{(m)}$ and $x_i^{(r)}$ are $k_1$-dimensional and $k_2$-dimensional vectors corresponding to the regression parts of the mixing probabilities and the component parameters respectively. Usually the first element of $x_i^{(m)}$ and of $x_i^{(r)}$ is 1, corresponding to an intercept. Our mixed exponential regression model assumes that

(1) the unobserved mixing process can occupy any one of $c$ states, where $c$ is finite and unknown;

(2) for each observed response $y_i$, there is an unobserved random variable, $\Theta_i$, representing the component which generates $y_i$. Further, the $(Y_i, \Theta_i)$ are pairwise independent;

(3) conditional on the covariate $x_i^{(m)}$, $\Theta_i$ follows a discrete distribution with $c$ points of support, $1, \ldots, c$, and $\Pr(\Theta_i = j \mid x_i^{(m)}, \beta) = p_{ij} = p_j(x_i^{(m)}, \beta)$, where $\sum_{j=1}^{c} p_j(x_i^{(m)}, \beta) = 1$ and $p_j(x_i^{(m)}, \beta)$ is defined by

$$p_{ij} = p_j(x_i^{(m)}, \beta) = \frac{\exp(\beta_j' x_i^{(m)})}{1 + \sum_{k=1}^{c-1} \exp(\beta_k' x_i^{(m)})}, \quad j = 1, \ldots, c-1, \tag{4.1}$$

and

$$p_{ic} = p_c(x_i^{(m)}, \beta) = 1 - \sum_{j=1}^{c-1} p_{ij} = \frac{1}{1 + \sum_{k=1}^{c-1} \exp(\beta_k' x_i^{(m)})}, \tag{4.2}$$

where $\beta = (\beta_1', \ldots, \beta_{c-1}')'$ and $\beta_j = (\beta_{j1}, \ldots, \beta_{jk_1})'$, $j = 1, \ldots, c-1$, are unknown parameters. In fact, conditional on $x_i^{(m)}$, $\Theta_i$ follows a multinomial distribution $(1, p_{i1}, \ldots, p_{ic})$. Note that $\beta$ appears in each $p_{ij}$ for $1 \le j \le c$;

(4) conditional on $\Theta_i = j$, $Y_i$ follows a one-parameter exponential distribution, which we denote by

$$f_j(y_i \mid x_i^{(r)}, \alpha_j) = h(y_i)\exp(\theta_{ij} y_i - \chi(\theta_{ij})), \tag{4.3}$$

where

$$\theta_{ij} = h(x_i^{(r)}, \alpha_j) \quad \text{for } j = 1, \ldots, c,$$

and $\alpha = (\alpha_1', \ldots, \alpha_c')'$ are unknown parameters with $\alpha_j = (\alpha_{j1}, \ldots, \alpha_{jk_2})'$, $j = 1, \ldots, c$. Note that the component parameters $\theta_{ij}$ relate to the covariates $x_i^{(r)}$ through the link function $h$.

Under the above assumptions the probability "density" of $Y_i$ satisfies

$$f(y_i \mid x_i^{(m)}, x_i^{(r)}, \alpha, \beta) = \sum_{j=1}^{c} p_{ij}\, h(y_i)\exp(\theta_{ij} y_i - \chi(\theta_{ij})), \tag{4.4}$$

where $p_{ij}$ and $\theta_{ij}$ are specified by (4.1), (4.2) and (4.3) respectively.

Note that the mixed Poisson regression and mixed logistic regression models discussed in the previous chapters are special cases of the mixed exponential regression models in which the component distributions are Poisson and binomial distributions respectively. Another example of the mixed exponential regression models is the mixed normal regression model, which assumes that the component distributions are normal distributions with a common variance for all components. In this case, the component distributions can be denoted by

$$f_j(y_i \mid x_i^{(r)}, \alpha_j) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2\sigma^2}(y_i - \mu_{ij})^2\right),$$

where

$$\mu_{ij} = \alpha_j' x_i^{(r)} \quad \text{for } j = 1, \ldots, c.$$

Note that the link function is the identity function.

To apply the mixed exponential regression models, we need to show under what conditions the unconditional variance of $Y_i$ is larger than that allowed by the one-parameter exponential distribution.
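For the Poisson special case this condition can be checked numerically. The following is a minimal Python sketch, not the FORTRAN code that accompanies the thesis: it evaluates the mixing probabilities (4.1)-(4.2) and the mixture density (4.4) with Poisson components, and compares the variance of the mixture with its mean. The function names, the choice of a log link for the component rates, and the toy parameter values are assumptions made only for illustration.

```python
import math

def mixing_probs(x_m, betas):
    """Multinomial-logit mixing probabilities as in (4.1)-(4.2).

    betas holds the c-1 coefficient vectors; component c is the reference.
    """
    scores = [sum(b * x for b, x in zip(beta, x_m)) for beta in betas]
    shift = max([0.0] + scores)  # guard against overflow in exp
    expo = [math.exp(s - shift) for s in scores] + [math.exp(-shift)]
    total = sum(expo)
    return [e / total for e in expo]

def poisson_rates(x_r, alphas):
    """Component rates under an assumed log link: lambda_j = exp(alpha_j' x)."""
    return [math.exp(sum(a * x for a, x in zip(alpha, x_r))) for alpha in alphas]

def poisson_pmf(y, lam):
    """Poisson probability computed on the log scale."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

def mixture_pmf(y, x_m, x_r, betas, alphas):
    """Mixture 'density' (4.4) with Poisson components."""
    p = mixing_probs(x_m, betas)
    lams = poisson_rates(x_r, alphas)
    return sum(pj * poisson_pmf(y, lj) for pj, lj in zip(p, lams))

def mixture_mean_var(x_m, x_r, betas, alphas):
    """Mean and variance of the Poisson mixture via the law of total variance."""
    p = mixing_probs(x_m, betas)
    lams = poisson_rates(x_r, alphas)
    mean = sum(pj * lj for pj, lj in zip(p, lams))
    between = sum(pj * (lj - mean) ** 2 for pj, lj in zip(p, lams))
    return mean, mean + between

# Two components with intercepts only: equal mixing weights, rates 1 and 3.
mean, var = mixture_mean_var([1.0], [1.0], [[0.0]], [[0.0], [math.log(3.0)]])
print(mean, var)  # the variance exceeds the mean unless all rates coincide
```

In this toy setting the between-component term is strictly positive, so the mixture is overdispersed relative to a single Poisson distribution with the same mean; for other exponential families the comparison depends on the shape of the variance function.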
The results given by Shaked (1980) may provide insight into this question. Since different assumptions about the component distributions may lead to different conditions for identifiability of the mixed regression models, as we showed in the previous chapters, we also need to show under what conditions the mixed exponential regression model is identifiable.

As for the mixed Poisson regression and mixed binomial regression models, parameter estimation for the mixed exponential regression model can be carried out by maximum likelihood. Obtaining the maximum likelihood estimates requires an iterative algorithm similar to the ones in the previous chapters. Specifically, for a fixed number of components $c$, we may apply the EM algorithm by treating the unobserved random variable as missing data and using a quasi-Newton approach for the M-step. When either the observed likelihood or the parameter estimates do not change by more than a given tolerance, we switch to a quasi-Newton approach for maximizing the observed likelihood function directly.

After fitting the proposed model to the data, we need to carry out residual analysis to identify possible outliers and influential observations, and a goodness-of-fit test for the fitted model. As for the mixed Poisson and logistic regression models, we propose using Pearson, deviance and likelihood residuals, as well as relative average coefficient changes, in a similar way for this purpose. We also suggest using the estimated posterior probabilities from the fitted model to classify observations into $c$ groups, each characterized by a regression function.

4.3 Hidden Markov Poisson Regression Models

In this section, we consider a statistical method for longitudinal discrete data where the objective of data analysis is to describe an observed count, $y_{ki}$, for subject $k$ during the $i$th time interval $\Delta t_{ki}$, as a function of covariates, $x_{ki}$.
Longitudinal data are characterized by the fact that there may exist some dependence structure between repeated observations for a subject. The model which we have developed assumes that the dependence between repeated observations for a subject is determined by a finite-state Markov chain in such a way that, conditional on a state, an observed count, $y_{ki}$, follows a Poisson distribution with mean specified by the product of the exposure, $\Delta t_{ki} = t_{ki} - t_{k,i-1}$, and a Poisson rate defined by a log-linear function of the covariates, $x_{ki}$, in which the coefficients may vary from state to state. This model allows for overdispersion relative to the usual Poisson regression model.

Our initial motivation comes from economic studies which investigate the relationship between research and development and patent activity at the firm level, based on longitudinal discrete data associated with covariates. Previous studies have suggested that the data may be overdispersed relative to the usual Poisson regression and that there may exist some correlation between repeated observations for a firm (Hausman, Hall and Griliches (1984) and Hall, Griliches and Hausman (1986)). However, these studies do not discuss modelling the dependence structure between repeated observations directly. Our approach explicitly specifies the dependence structure as a finite-state Markov chain and estimates both the parameters of the Markov chain and the coefficients of the Poisson regression corresponding to each underlying state.

In the context of generalized linear models, several approaches have been developed for longitudinal data. Liang and Zeger (1986) proposed a general framework for the analysis of longitudinal data based on generalized linear models, and Zeger (1988), Kaufmann (1987), Stiratelli, Laird and Ware (1984), Zeger, Liang and Self (1985) and Zeger and Qaqish (1988) developed methods for serially correlated discrete observations.
In applications to economics in which data are primarily continuous, some approaches allow parameter values to change suddenly according to the states of a Markov chain, cf. Goldfeld and Quandt (1973), Lindgren (1978), Sclove (1983) and Tyssedal and Tjostheim (1988).

In applications without covariates, Albert (1991) proposed a two-state Markov mixture model for longitudinal epileptic seizure counts. Leroux and Puterman (1992), Leroux (1989) and Le, Leroux and Puterman (1992) developed a finite-state Markov mixture model for the sequence of counts of fetal movements. Our approach extends theirs by incorporating covariates into the model and allowing variable exposure. We also use a rescaling scheme to overcome numerical overflow or underflow in applying the EM algorithm, so that our algorithm improves on the ones proposed by these authors.

4.3.1 The Model

The model we study here embeds a finite-state Markov chain in Poisson regression in which the regression coefficients depend on the chosen state. Specifically, let $\{(y_{ki}, x_{ki}, t_{ki});\ i = 1, \ldots, n_k,\ 0 = t_{k0} < t_{k1} < \cdots < t_{kn_k}\}$ be a sequence of observed data for a subject $k$, where $y_{ki}$ is an observed count associated with a $d$-dimensional covariate vector $x_{ki}$ during the time interval $\Delta t_{ki} = t_{ki} - t_{k,i-1}$. For simplicity we suppress the subscript for subjects in the following discussion. A Markov Poisson regression model assumes

(1) The unobserved stochastic process has $c$ possible states, where $c$ is finite and unknown;

(2) For each observed count, $y_i$, at time point $t_i$, there exists an unobserved discrete random variable, $S_i$, representing the state in which $y_i$ is generated. Further, $S_i$ has $c$ points of support, $\{1, \ldots, c\}$;

(3) The $S$-process, $\{S_1, S_2, \ldots, S_n\}$, follows a $c$-state Markov chain with transition probabilities defined by

$$\Pr(S_i = j \mid S_{i-1} = k) = p_{kj}, \quad j, k = 1, \ldots, c; \tag{4.5}$$

(4) Conditional on $S_i = j$, $Y_i$ follows a Poisson distribution, which we denote as

$$f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j) = \mathrm{Po}(y_i \mid \lambda_{ij}) = \frac{1}{y_i!}\left[\lambda_j(x_i, \alpha_j)\Delta t_i\right]^{y_i}\exp\left[-\lambda_j(x_i, \alpha_j)\Delta t_i\right], \tag{4.6}$$

where $y_i = 0, 1, \ldots$, $\lambda_{ij} = \Delta t_i\,\lambda_j(x_i, \alpha_j)$, and $\lambda_j(x_i, \alpha_j)$ is a nonnegative function equal to the Poisson rate; for example,

$$\lambda_j(x_i, \alpha_j) = \exp(\alpha_j' x_i),$$

where $\alpha_j = (\alpha_{j1}, \ldots, \alpha_{jd})'$, $j = 1, \ldots, c$, are unknown parameter vectors. Note that $\Delta t_i = t_i - t_{i-1}$ may equal 1 for all $i$ or correspond to the time of observation in time series data.

The above assumptions define a semi-Markov process $\{(Y_i, S_i);\ i = 1, \ldots, n,\ 0 = t_0 < t_1 < \cdots < t_n\}$ in which the transitions of the $S$-process follow a stationary, first-order Markov process, and the count, $Y_i$, is renewed at each transition point, $t_i$, so that the conditional component distributions for the count depend only upon which state is exited. Note that the Poisson rates of the conditional component distributions vary between states through different coefficients in the same Poisson regression specification. Furthermore, since the covariates can include parts of an individual's past history, the proposed model provides a means of relaxing the assumptions about the transition process of the renewal counts. Note that the transition probabilities $p_{jk}$ do not depend on covariates.

Under the above assumptions, the joint probability "density" function of a sequence of observed counts, $Y = \{y_1, \ldots, y_n\}$, associated with covariates, $X = \{x_1, \ldots, x_n\}$, and exposures, $T = \{\Delta t_1, \ldots, \Delta t_n\}$, satisfies

$$f(Y \mid X, T, \theta) = \sum_{s_1=1}^{c}\sum_{s_2=1}^{c}\cdots\sum_{s_n=1}^{c} p^{(1)}_{s_1} f_{s_1}(y_1 \mid x_1, \Delta t_1, \alpha_{s_1}, S_1 = s_1)\prod_{i=2}^{n} p_{s_{i-1}s_i} f_{s_i}(y_i \mid x_i, \Delta t_i, \alpha_{s_i}, S_i = s_i), \tag{4.7}$$

where $\theta = (\alpha_{11}, \ldots, \alpha_{1d}, \alpha_{21}, \ldots, \alpha_{2d}, \ldots, \alpha_{c1}, \ldots, \alpha_{cd}, p_{11}, \ldots, p_{1c}, \ldots, p_{c1}, \ldots, p_{cc})$ is an unknown parameter vector, $p^{(1)}_j = \Pr(S_1 = j)$, $j = 1, \ldots, c$, are the probabilities of the initial states for the subject, and $p_{s_{i-1}s_i}$ and $f_{s_i}(y_i \mid x_i, \Delta t_i, \alpha_{s_i}, S_i = s_i)$ are defined by (4.5) and (4.6) respectively.

Note that the probabilities of the initial states, $p^{(1)}_j$, are assumed known. We will discuss how to determine their values below. Note also that $\sum_{k=1}^{c} p_{jk} = 1$ for all $j$.

Some previously studied models are special cases of the above model.

• Choosing $c = 1$ yields a Poisson regression model;

• Choosing the transition probability matrix as an identity matrix yields an independent mixed Poisson regression model which is a special case of the generalized mixed Poisson regression models discussed by Wang, Puterman, Le and Cockburn (1994);

• Setting $x_i = (1)$ and $\Delta t_i = 1$ for all $i$ yields the Markov Poisson mixture without covariates which is studied by Leroux and Puterman (1992) and Albert (1991).

4.3.2 Moment Structure

From the above definition we can derive the basic moment structure of the observed counts. Using the properties of conditional expectation, we obtain

$$E(Y_i \mid S_i) = \lambda_{iS_i} \quad \text{and} \quad \mathrm{Var}(Y_i \mid S_i) = \lambda_{iS_i}.$$

Thus the unconditional mean and variance of $Y_i$ are

$$E(Y_i) = E(E(Y_i \mid S_i)) = \sum_{j=1}^{c} \Pr(S_i = j)\lambda_{ij}, \tag{4.8}$$

$$\mathrm{Var}(Y_i) = E(\mathrm{Var}(Y_i \mid S_i)) + \mathrm{Var}(E(Y_i \mid S_i)) = \sum_{j=1}^{c} \Pr(S_i = j)\lambda_{ij} + \left\{\sum_{j=1}^{c} \Pr(S_i = j)\lambda_{ij}^2 - \Big(\sum_{j=1}^{c} \Pr(S_i = j)\lambda_{ij}\Big)^2\right\}. \tag{4.9}$$

Since the second term in (4.9) is always nonnegative, (4.8) and (4.9) show that the proposed model can accommodate overdispersion relative to Poisson regression, and that the observed data are homogeneous if and only if $\lambda_{i1} = \cdots = \lambda_{ic}$ for all $i$.

The covariance of $Y_i$ and $Y_{i+m}$ is given by

$$\mathrm{Cov}(Y_i, Y_{i+m}) = \mathrm{Cov}(E(Y_i \mid S_i), E(Y_{i+m} \mid S_{i+m})) = E(\lambda_{iS_i}\lambda_{i+m,S_{i+m}}) - E(Y_i)E(Y_{i+m}) = \sum_{j=1}^{c}\sum_{k=1}^{c} \lambda_{ij}\lambda_{i+m,k}\Pr(S_i = j, S_{i+m} = k) - E(Y_i)E(Y_{i+m}),$$

where $E(Y_i)$ and $E(Y_{i+m})$ are given by (4.8).

4.3.3 Identifiability

Along with the applications of the Markov Poisson regression models, we must be concerned with the identifiability of the models. Without covariates, Teicher (1961, 1967) proves that both the class of finite Poisson mixtures and the class of all mixtures of products of Poisson distributions are identifiable.
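We will apply these results to derive sufficient conditions for identifiability of the model. But we first define identifiability for the Markov Poisson regression model as follows.

Definition: Consider the class of probability models, $\{f(Y \mid X, T, \theta)\}$, with $f(Y \mid X, T, \theta)$ defined by (4.7), the restriction that $\lambda_1 < \cdots < \lambda_c$, parameter space $C \times \Theta$, sample space $\mathcal{Y}_1 \times \cdots \times \mathcal{Y}_n$ and fixed covariate matrices $X$ and $T$. The class of probability models is identifiable if, for $(c, \theta), (c^*, \theta^*) \in C \times \Theta$,

$$f(Y \mid X, T, \theta) = f(Y \mid X, T, \theta^*) \tag{4.10}$$

for all $(y_1, \ldots, y_n) \in \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_n$ implies $(c, \theta) = (c^*, \theta^*)$.

Note that the order restriction in the definition indicates that two models are equivalent if they agree up to permutations of parameters. We now provide a sufficient condition for identifiability as follows.

Theorem 3: The hidden Markov Poisson regression model is identifiable if the design matrix $X$ is of full rank.

Proof: Suppose that $(c, \theta)$ and $(c^*, \theta^*)$ satisfy (4.10). Then summing both sides of equation (4.10) over $y_2, \ldots, y_n$ yields

$$\sum_{j=1}^{c} p^{(1)}_j\,\mathrm{Po}(y_1 \mid \lambda_{1j}) = \sum_{j=1}^{c^*} p^{(1)*}_j\,\mathrm{Po}(y_1 \mid \lambda^*_{1j}) \tag{4.11}$$

for all $y_1 \in \mathcal{Y}_1$. Since each side of equation (4.11) may be regarded as a finite Poisson mixture without covariates, Teicher's result (1961) implies that

$$c = c^*, \quad p^{(1)}_j = p^{(1)*}_j > 0 \quad \text{and} \quad \lambda_{1j} = \lambda^*_{1j} \tag{4.12}$$

for $j = 1, \ldots, c$.

Now summing both sides of (4.10) over $y_3, \ldots, y_n$ yields

$$\sum_{j=1}^{c}\sum_{k=1}^{c} p^{(1)}_j p_{jk}\,\mathrm{Po}(y_1 \mid \lambda_{1j})\,\mathrm{Po}(y_2 \mid \lambda_{2k}) = \sum_{j=1}^{c}\sum_{k=1}^{c} p^{(1)*}_j p^*_{jk}\,\mathrm{Po}(y_1 \mid \lambda^*_{1j})\,\mathrm{Po}(y_2 \mid \lambda^*_{2k}) \tag{4.13}$$

for all $(y_1, y_2) \in \mathcal{Y}_1 \times \mathcal{Y}_2$. Since each side of equation (4.13) may be regarded as a finite mixture of products of two Poisson distributions without covariates, Teicher's result (1967) implies that

$$\lambda_{2j} = \lambda^*_{2j}, \quad j = 1, \ldots, c, \tag{4.14}$$

$$p^{(1)}_j p_{jk} = p^{(1)*}_j p^*_{jk}, \quad j, k = 1, \ldots, c,$$

or

$$p_{jk} = p^*_{jk}, \quad j, k = 1, \ldots, c. \tag{4.15}$$

For each $i > 2$, summing both sides of (4.10) over $y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n$ yields

$$\sum_{k=1}^{c}\Big(\sum_{j=1}^{c}\cdots\sum_{s_{i-1}=1}^{c} p^{(1)}_j \cdots p_{s_{i-1}k}\Big)\mathrm{Po}(y_i \mid \lambda_{ik}) = \sum_{k=1}^{c}\Big(\sum_{j=1}^{c}\cdots\sum_{s_{i-1}=1}^{c} p^{(1)*}_j \cdots p^*_{s_{i-1}k}\Big)\mathrm{Po}(y_i \mid \lambda^*_{ik}) \tag{4.16}$$

for all $y_i \in \mathcal{Y}_i$. Equation (4.16) implies that

$$\lambda_{ij} = \lambda^*_{ij}, \quad i = 3, \ldots, n \text{ and } j = 1, \ldots, c. \tag{4.17}$$

From (4.12), (4.14) and (4.17) we obtain

$$\exp(\alpha_j' x_i) = \exp(\alpha_j^{*\prime} x_i) \quad \text{for } i = 1, \ldots, n \text{ and } j = 1, \ldots, c.$$

This is equivalent to

$$(\alpha_j - \alpha^*_j)' x_i = 0, \quad i = 1, \ldots, n \text{ and } j = 1, \ldots, c,$$

or

$$(\alpha_j - \alpha^*_j)' X = 0, \quad j = 1, \ldots, c. \tag{4.18}$$

Thus a sufficient condition for identifiability is that $X$ is of full rank, in which case (4.18) implies that $\alpha_j = \alpha^*_j$ for $j = 1, \ldots, c$. We can assume that this sufficient condition holds without loss of generality, since if it does not we can reparameterize the model accordingly. □

4.3.4 Estimation

The EM algorithm

In order to find the maximum likelihood estimates of the unknown parameters for the above model, we apply the EM algorithm (Dempster, Laird and Rubin, 1977), treating the unobservable state variable $S_i$ as missing information. In doing so, we represent a complete data set by introducing the following indicator functions:

$$zz(i,j,k) = \begin{cases} 1 & \text{if } S_{i-1} = j \text{ and } S_i = k \\ 0 & \text{otherwise;} \end{cases} \qquad z(i,j) = \begin{cases} 1 & \text{if } S_i = j \\ 0 & \text{otherwise.} \end{cases}$$

Thus the log-likelihood of the complete data set, $\{(y_i, x_i, z(i,j), zz(i,j,k));\ i = 1, \ldots, n \text{ and } j, k = 1, \ldots, c\}$, with $\theta = \theta^0$ is

$$Q(\theta \mid \theta^0) = \sum_{j=1}^{c} z(1,j)\log p^{(1)}_j + \sum_{i=2}^{n}\sum_{j=1}^{c}\sum_{k=1}^{c} zz(i,j,k)\log p_{jk} + \sum_{i=1}^{n}\sum_{j=1}^{c} z(i,j)\log f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j) = \sum_{j=1}^{c} z(1,j)\log p^{(1)}_j + Q_1(\theta_1 \mid \theta^0) + Q_2(\theta_2 \mid \theta^0),$$

where $\theta_1 = (p_{11}, \ldots, p_{1c}, \ldots, p_{c1}, \ldots, p_{cc})$, $\theta_2 = (\alpha_{11}, \ldots, \alpha_{1d}, \ldots, \alpha_{c1}, \ldots, \alpha_{cd})$,

$$Q_1(\theta_1 \mid \theta^0) = \sum_{i=2}^{n}\sum_{j=1}^{c}\sum_{k=1}^{c} zz(i,j,k)\log p_{jk} \quad \text{and} \quad Q_2(\theta_2 \mid \theta^0) = \sum_{i=1}^{n}\sum_{j=1}^{c} z(i,j)\log f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j).$$

The EM algorithm finds the maximum likelihood estimates by proceeding iteratively in two steps: the E-step and the M-step. At the E-step, it replaces the missing data in $Q(\theta \mid \theta^0)$ by their expectations, conditional on the observed data and the current parameter values $\theta^0$. At the M-step, it finds the estimates of the parameters by maximizing the expected log-likelihood for the complete data set, conditional on the observed data.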
It repeats the two steps until the log-likelihood of the observed data no longer increases. Note that the EM algorithm guarantees that the log-likelihood does not decrease at any iteration.

In our case, the E-step of the EM algorithm updates the expected values of the missing data $z(i,j)$ and $zz(i,j,k)$ in each iteration, given the observed data and the current values of the parameters. By definition,

$$\hat{z}(i,j) = E\{z(i,j) \mid y_1, \ldots, y_n\} = \Pr(S_i = j \mid y_1, \ldots, y_n) = \frac{\Pr(y_1, \ldots, y_i, S_i = j)\Pr(y_{i+1}, \ldots, y_n \mid S_i = j)}{\Pr(y_1, \ldots, y_n)},$$

$$\hat{z}z(i,j,k) = E\{zz(i,j,k) \mid y_1, \ldots, y_n\} = \frac{\Pr(y_1, \ldots, y_n, S_{i-1} = j, S_i = k)}{\Pr(y_1, \ldots, y_n)} = \frac{\Pr(y_1, \ldots, y_{i-1}, S_{i-1} = j)\, p_{jk}\Pr(y_i, \ldots, y_n \mid S_i = k)}{\Pr(y_1, \ldots, y_n)}.$$

As first proposed by Baum et al. (1970), we use the following quantities to set up the forward-backward recursive formulae for the computation of $\hat{z}(i,j)$ and $\hat{z}z(i,j,k)$:

$$a_j(i) = \Pr(y_1, \ldots, y_i, S_i = j) \text{ for } i = 2, \ldots, n, \quad a_j(1) = p^{(1)}_j f_j(y_1 \mid x_1, \Delta t_1, \alpha_j, S_1 = j) \text{ for } j = 1, \ldots, c,$$

$$b_j(i) = \Pr(y_{i+1}, \ldots, y_n \mid S_i = j) \text{ for } i = 1, \ldots, n-1, \quad b_j(n) = 1 \text{ for } j = 1, \ldots, c.$$

Thus $\hat{z}(i,j)$ and $\hat{z}z(i,j,k)$ can be written as

$$\hat{z}(i,j) = \frac{a_j(i)\,b_j(i)}{\sum_{l=1}^{c} a_l(n)}, \tag{4.19}$$

$$\hat{z}z(i,j,k) = \frac{p_{jk}\, f_k(y_i \mid x_i, \Delta t_i, \alpha_k, S_i = k)\, a_j(i-1)\, b_k(i)}{\sum_{l=1}^{c} a_l(n)}. \tag{4.20}$$

The advantage of the above expressions is that $a_j(i)$ and $b_j(i)$ can be computed by the following recursive formulae:

$$a_j(i) = \sum_{k=1}^{c} \Pr(y_1, \ldots, y_{i-1}, S_{i-1} = k)\, p_{kj}\, f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j) = \sum_{k=1}^{c} a_k(i-1)\, p_{kj}\, f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j),$$

$$b_j(i) = \sum_{k=1}^{c} \Pr(y_{i+1}, \ldots, y_n, S_{i+1} = k \mid S_i = j) = \sum_{k=1}^{c} p_{jk}\, f_k(y_{i+1} \mid x_{i+1}, \Delta t_{i+1}, \alpha_k, S_{i+1} = k)\, b_k(i+1).$$

The M-step is equivalent to maximizing the following two functions with respect to $\theta_1$ and $\theta_2$ separately:

$$Q_1(\theta_1 \mid \theta^0) = \sum_{i=2}^{n}\sum_{j=1}^{c}\sum_{k=1}^{c} \hat{z}z(i,j,k)\log p_{jk} \quad \text{and} \quad Q_2(\theta_2 \mid \theta^0) = \sum_{i=1}^{n}\sum_{j=1}^{c} \hat{z}(i,j)\log f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j).$$

To maximize $Q_1(\theta_1 \mid \theta^0)$ with respect to $\theta_1$, subject to $\sum_{k=1}^{c} p_{jk} = 1$ for each $j$, the estimated values of the transition probabilities, $\hat{\theta}_1 = (\hat{p}_{jk})$, should satisfy the following equation

$$\frac{\partial Q_1(\theta_1 \mid \theta^0)}{\partial \theta_1} = 0. \tag{4.21}$$

Solving (4.21), we obtain $\hat{\theta}_1 = (\hat{p}_{jk})$ by

$$\hat{p}_{jk} = \frac{\sum_{i=2}^{n} \hat{z}z(i,j,k)}{\sum_{i=2}^{n}\sum_{l=1}^{c} \hat{z}z(i,j,l)}, \quad j, k = 1, \ldots, c. \tag{4.22}$$

To maximize $Q_2(\theta_2 \mid \theta^0)$ with respect to $\theta_2$, the estimated value $\hat{\theta}_2$ should satisfy the following equation

$$\frac{\partial Q_2(\theta_2 \mid \theta^0)}{\partial \theta_2} = 0. \tag{4.23}$$

However, there is usually no closed form for the solution of (4.23). We use the quasi-Newton approach (Nash, 1990) to solve it for $\hat{\theta}_2$.

We now summarize the EM algorithm for the hidden Markov Poisson regression model below.

Step 0: Specify starting values $\theta_1^{(0)}$ and $\theta_2^{(0)}$ and a tolerance $\epsilon$;

Step 1: (E-step) Compute $\hat{z}(i,j)$ for $i = 1, \ldots, n$ and $\hat{z}z(i,j,k)$ for $i = 2, \ldots, n$, using (4.19) and (4.20) respectively, for $j, k = 1, \ldots, c$;

Step 2: (M-step)
1. Find the values of $\hat{\theta}_1 = (\hat{p}_{jk})$ using (4.22);
2. Find the values of $\hat{\theta}_2$ by solving (4.23) with the quasi-Newton approach (Nash, 1990);

Step 3: If $\|\hat{\theta}_1 - \theta_1^{(0)}\| = \sum_{j}\sum_{k} |\hat{p}_{jk} - p_{jk}^{(0)}| \ge \epsilon$ or $\|\hat{\theta}_2 - \theta_2^{(0)}\| = \sum_{j}\sum_{k} |\hat{\alpha}_{jk} - \alpha_{jk}^{(0)}| \ge \epsilon$, set $\theta_1^{(0)} = \hat{\theta}_1$ and $\theta_2^{(0)} = \hat{\theta}_2$, and go to Step 1; otherwise, stop.

The E-step of the EM algorithm

The difficulty in computing $\hat{z}(i,j)$ and $\hat{z}z(i,j,k)$ by (4.19) and (4.20) is that $a_j(i)$ and $b_j(i)$ converge to 0 or $\infty$ very fast as the number of observations entering them grows. This causes underflow or overflow problems in the computation. To overcome this difficulty, we introduce an approach to rescale $a_j(i)$ and $b_j(i)$ so that their maximum values are around 1 for each $i$. This approach takes the special structure of the model into account. It first represents, for those $j$ such that $a_j(i) \ne 0$,

$$a_j(i) = \Big(\sum_{k=1}^{c} a_k(i-1)\,p_{kj}\Big) f_j(y_i \mid x_i, \alpha_j, S_i = j) = \exp\Big\{\log\Big(\sum_{k=1}^{c} a_k(i-1)\,p_{kj}\Big) + \log f_j(y_i \mid x_i, \alpha_j, S_i = j)\Big\} = \exp(q_{ij}),$$

where $q_{ij} = \log\big(\sum_{k=1}^{c} a_k(i-1)\,p_{kj}\big) + \log f_j(y_i \mid x_i, \alpha_j, S_i = j)$. It then rescales $a_j(i)$ by multiplying by $\exp(-\mathrm{maxt}_i)$ for those $j$ such that $a_j(i) \ne 0$, where $\mathrm{maxt}_i = \max_k\{q_{ik}\}$, and stores the order of $a_j(i)$ in $\mathrm{powera}(i)$, which accumulates these maxima over $i$. This order will be used to calculate the orders of $\hat{z}(i,j)$ and $\hat{z}z(i,j,k)$. The same procedure is applied to calculate $b_j(i)$.

Before we state the computation of the E-step of the EM algorithm, we first define some notation for simplicity:

$$f(i,j) = y_i!\, f_j(y_i \mid x_i, \Delta t_i, \alpha_j, S_i = j) = [\Delta t_i \lambda_j(x_i, \alpha_j)]^{y_i}\exp(-\Delta t_i \lambda_j(x_i, \alpha_j)) = \exp\{y_i\log(\Delta t_i) + y_i(\alpha_j' x_i) - \Delta t_i\exp(\alpha_j' x_i)\} = \exp(r_{i,j}),$$

where $r_{i,j} = y_i\log(\Delta t_i) + y_i(\alpha_j' x_i) - \Delta t_i\exp(\alpha_j' x_i)$ for $i = 1, \ldots, n$ and $j = 1, \ldots, c$. Note that the factorials in the numerators and denominators of (4.19) and (4.20) cancel. This simplifies the computation of the E-step.

The E-step of the EM algorithm can now be carried out as follows, where $a_j(i)$ and $b_j(i)$ denote the rescaled quantities:

(a) compute $a_j(1) = p^{(1)}_j f(1,j)$, $j = 1, \ldots, c$, and set $\mathrm{powera}(1) = 0$;

(b) compute $a_j(i)$ for $i = 2, \ldots, n$ and $j = 1, \ldots, c$ as follows:
1. identify the index set $K^a(i) = \{j;\ \sum_{k=1}^{c} a_k(i-1)\,p_{kj} \ne 0,\ j = 1, \ldots, c\}$;
2. find $\mathrm{tempa}(i) = \max_{j \in K^a(i)}\{\log(\sum_{k=1}^{c} a_k(i-1)\,p_{kj}) + r_{i,j}\}$;
3. compute $a_j(i) = \exp\{\log(\sum_{k=1}^{c} a_k(i-1)\,p_{kj}) + r_{i,j} - \mathrm{tempa}(i)\}$ for $j \in K^a(i)$ and 0 otherwise;
4. set $\mathrm{powera}(i) = \mathrm{powera}(i-1) + \mathrm{tempa}(i)$.

(c) set $b_j(n) = 1$ for $j = 1, \ldots, c$, and $\mathrm{powerb}(n) = 0$;

(d) compute $b_j(i)$ for $i = n-1, n-2, \ldots, 1$ and $j = 1, \ldots, c$ as follows:
1. identify the index set $K^b(i) = \{j;\ \sum_{k=1}^{c} p_{jk}\, b_k(i+1) \ne 0,\ j = 1, \ldots, c\}$;
2. find $\mathrm{tempb}(i) = \max_{j \in K^b(i)}\{\log(\sum_{k=1}^{c} p_{jk}\exp(r_{i+1,k})\, b_k(i+1))\}$;
3. compute $b_j(i) = \exp\{\log(\sum_{k=1}^{c} p_{jk}\exp(r_{i+1,k})\, b_k(i+1)) - \mathrm{tempb}(i)\}$ for $j \in K^b(i)$ and 0 otherwise;
4. set $\mathrm{powerb}(i) = \mathrm{powerb}(i+1) + \mathrm{tempb}(i)$.

(e) For $i = 1, \ldots, n$ and $j = 1, \ldots, c$ compute $\mathrm{temp}(i) = \exp\{\mathrm{powera}(i) + \mathrm{powerb}(i) - \mathrm{powera}(n)\}$ and

$$\hat{z}(i,j) = \mathrm{temp}(i)\, a_j(i)\, b_j(i) \Big/ \sum_{l=1}^{c} a_l(n);$$

(f) For $i = 2, \ldots, n$ and $j, k = 1, \ldots, c$, compute

$$\hat{z}z(i,j,k) = \exp\{\mathrm{powera}(i-1) + \mathrm{powerb}(i) - \mathrm{powera}(n)\}\, p_{jk}\, f(i,k)\, a_j(i-1)\, b_k(i) \Big/ \sum_{l=1}^{c} a_l(n).$$

4.3.5 The Probabilities of Initial States and Starting Values

In the above model we define the probabilities of the initial states as known parameters. To determine their values, we consider two types of data: (1) data for a single subject and (2) data for several subjects. In the first case only the first observation is directly related to the initial states, so there is little information about the initial probabilities. Thus we set $p^{(1)}_1 = \cdots = p^{(1)}_c = 1/c$. Since the data in this case usually contain a rather long sequence of observations, the values of the probabilities may not have a significant effect on estimation in terms of asymptotic properties.
Without covariates, Leroux (1989) proves that the effect of the initial-state probabilities vanishes as the number of observations increases.

In the second case we choose the values of the probabilities as the estimates of the mixing probabilities obtained by fitting a $c$-component mixed Poisson regression model with constant mixing probabilities and covariates in the Poisson rates (Wang, Puterman, Le and Cockburn, 1993) to the first observations of the subjects, $\{y_{k1};\ k = 1, \ldots, m\}$. Note that in this case the mixing probabilities can be equivalently interpreted as the probabilities of the initial states for the Markov mixture model. Further, in many applications like this, the data contain many subjects but short series.

To be able to run the EM algorithm, we need to choose starting values for the unknown parameters in the model. The EM algorithm only guarantees, under some regularity conditions (Wu, 1983), that the parameter estimates correspond to a local maximum of the likelihood function. As the number of unknown parameters in the model increases, there may be more local maxima. Further, a poor choice of starting values may slow down the convergence of the EM algorithm. Indeed, in some cases where the likelihood is unbounded on the edge of the parameter space, the sequence of estimates generated by the EM algorithm may diverge if the starting values are too close to the boundary. For these reasons it is important to choose the starting values carefully so as to increase the chance of reaching the maximum likelihood estimate. We use the following approach, which works well in our applications.

We assume that $c$ is known. We first fit a $c$-component independent mixed Poisson regression model to the observed data. Then we choose the initial values of the regression parameters as the corresponding estimates from this fit.
Further, we assign each observation to the one of the $c$ states for which its estimated posterior probability, calculated by (4.19), is largest. We then calculate the relative frequencies of the transitions from state $j$ to state $k$, and set these frequencies as the initial values of the corresponding transition probabilities, $p_{jk}$.

4.3.6 Implementation and Remaining Issues

We suggest using BIC or AIC to determine the number of underlying states, and carrying out inference about parameters by likelihood ratio tests. Specifically, we first determine the number of components $c$ by comparing BIC and AIC values among saturated models which include all covariates in the Poisson means. After $c$ is determined, we then carry out inference about the regression parameters by likelihood ratio tests within $c$-component mixture models. We plan to conduct a Monte Carlo study to investigate this model selection procedure.

On the other hand, using the quantities $\hat{z}(i,j)$ and $\hat{z}z(i,j,k)$ from the fitted model, we can classify observations into one of $c$ states and identify transitions for each subject. This information may be useful in applications. Note that our code works well for fitting the proposed model without covariates to the fetal movement data (Leroux, 1989); the results are the same as those given by Leroux (1989).

Bibliography

[1] Aitkin, M., Anderson, D. and Hinde, J. (1981), “Statistical Modelling of Data on Teaching Styles (with discussion),” Journal of the Royal Statistical Society, Ser. A, 144, 419-461.
[2] Akaike, H. (1973), “Information Theory and an Extension of the Maximum Likelihood Principle,” Second International Symposium on Information Theory (B.N. Petrov and F. Csaki, Eds.), Budapest: Akademiai Kiado, 267-281.
[3] Akaike, H. (1974), “A New Look at the Statistical Model Identification,” IEEE Trans. on Automatic Control, AC-19, 716-723.
[4] Albert, P.S. (1991), “A two-state Markov model for a time series of epileptic seizure counts,” Biometrics, 47, 1371-1381.
[5] Armitage, P.
(1957), “Studies in the variability of pock counts,” J. Hyg., Camb., 55, 564-581.
[6] Anderson, T.W. (1984), An Introduction to Multivariate Statistical Analysis, Second Edition, New York: Wiley.
[7] Anscombe, F.J. (1950), “Sampling theory of the negative binomial and logarithmic series distributions,” Biometrika, 37, 358-382.
[8] Aranda-Ordaz, F.J. (1981), “Quantal response analysis for a mixture of population,” Biometrics, 28, 981-988.
[9] Ashford, R. and Walker, P.J. (1972), “Quantal Response Analysis For a Mixture of Populations,” Biometrics, 28, 981-988.
[10] Baker, R.J. and Nelder, J.A. (1978), The GLIM System, Release 3, Oxford: Numerical Algorithms Group.
[11] Bartlett, M.S. (1936), “Some notes on insecticide tests in the laboratory and in the field,” J. R. Statist. Soc., Suppl., 3, 185-194.
[12] Baum, L.E., Petrie, T., Soules, G. and Weiss, N. (1970), “A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains,” Annals of Mathematical Statistics, 41, 164-171.
[13] Blischke, W.R. (1964), “Estimating the Parameters of Mixtures of Binomial Distributions,” Journal of the American Statistical Association, 59, 510-528.
[14] Bock, R.D. and Aitkin, M. (1981), “Marginal maximum likelihood estimation of item parameters: application of an EM algorithm,” Psychometrika, 46, 443-459.
[15] Bound, J., Cummins, C., Griliches, Z., Hall, B.H. and Jaffe, A. (1984), “Who does R and D and who patents?” National Bureau of Economic Research Working Paper No. 908, in Z. Griliches, ed., R and D, Patents, and Productivity (Chicago: University of Chicago Press), 21-54.
[16] Breslow, N. (1984), “Extra-Poisson Variation in Log-linear Models,” Applied Statistics, 33, 38-44.
[17] Breslow, N. (1990a), “Tests of Hypotheses in Overdispersed Poisson Regression and Other Quasi-likelihood Methods,” Journal of the American Statistical Association, 85, 565-571.
[18] Breslow, N.
(1990b), “Further Studies in Variability of Pock Counts,” Statistics in Medicine, 9, 615-626.
[19] Brillinger, D.R. (1986), “The Natural Variability of Vital Rates and Associated Statistics (with discussion),” Biometrics, 42, 693-734.
[20] Busvine, J.R. (1938), “The toxicity of ethylene oxide to Calandra oryzae, C. granaria, Tribolium castaneum, and Cimex lectularius,” Biology, 25, 605-632.
[21] Cameron, A.C. and Trivedi, P.K. (1990), “Regression-Based Tests for Overdispersion in the Poisson Model,” Journal of Econometrics, 46, 347-364.
[22] Cameron, A.C. and Trivedi, P.K. (1986), “Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests,” Journal of Applied Econometrics, 1, 29-53.
[23] Carroll, R.J., Spiegelman, C.H., Lan, K.K.G., Bailey, K.T. and Abbott, R.D. (1984), “Errors-in-variables for binary regression models,” Biometrika, 71, 19-26.
[24] Collett, D. (1991), Modelling Binary Data, Chapman and Hall.
[25] Collings, B.J. and Margolin, B.H. (1985), “Testing Goodness-of-Fit for the Poisson Assumption When Observations Are Not Identically Distributed,” Journal of the American Statistical Association, 80, 411-418.
[26] Cox, D.R. (1970), The Analysis of Binary Data, London: Chapman and Hall.
[27] Cox, D.R. (1983), “Some Remarks On Overdispersion,” Biometrika, 70, 269-274.
[28] Crowder, M.J. (1978), “Beta-Binomial Anova for Proportions,” Applied Statistics, 27, 34-37.
[29] Dean, C. and Lawless, J.F. (1989), “Tests for Detecting Overdispersion in Poisson Regression Model,” Journal of the American Statistical Association, 84, 467-472.
[30] Dean, C., Lawless, J.F. and Willmot, G.E. (1989), “A mixed Poisson-inverse-Gaussian regression model,” Canadian Journal of Statistics, 17, 171-181.
[31] Dean, C. (1992), “Testing for Overdispersion in Poisson and Binomial Regression Models,” Journal of the American Statistical Association, 87, 451-457.
[32] Dempster, A.P., Laird, N.M. and Rubin, D.B.
(1977), “Maximum Likelihood from Incomplete Data Via the EM Algorithm (with discussion),” Journal of the Royal Statistical Society B, 39, 1-38.
[33] Dennis, J.E. Jr. and Schnabel, R.B. (1983), Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Englewood Cliffs, New Jersey: Prentice-Hall.
[34] Dietz, P.E. and Baker, S.P. (1974), “Drowning: epidemiology and prevention,” American Journal of Public Health, 64, 303-312.
[35] Duda, R.O. and Hart, P.E. (1973), Pattern Classification and Scene Analysis, New York: Wiley.
[36] Efron, B. (1986), “Double exponential families and their use in generalized linear regression,” Journal of the American Statistical Association, 81, 709-721.
[37] Ehrenberg (1972), Repeated-Buying, Amsterdam: North-Holland Publishing Co.; New York: American Elsevier Publishing Co.
[38] Everitt, B.S. and Hand, D.J. (1981), Finite Mixture Distributions, London: Chapman and Hall.
[39] Fienberg, S.E. (1981), The Analysis of Cross-Classified Categorical Data, Second Edition, Cambridge: The MIT Press.
[40] Finney, D.J. (1976), “Radioligand assay,” Biometrics, 32, 721-740.
[41] Firth, D. (1987), “On the efficiency of quasi-likelihood estimation,” Biometrika, 74, 233-245.
[42] Fisher, R.A. (1950), “The Significance of Deviations From Expectation in a Poisson Series,” Biometrics, 6, 17-24.
[43] Folks, J.L. and Chhikara, R.S. (1978), “The inverse Gaussian distribution and its statistical application - a review,” Journal of the Royal Statistical Society B, 40, 263-289.
[44] Follmann, D.A. and Lambert, D. (1989), “Generalizing Logistic Regression by Nonparametric Mixing,” Journal of the American Statistical Association, 84, 294-30.
[45] Follmann, D.A. and Lambert, D. (1991), “Identifiability of Finite Mixture of Logistic Regression Models,” Journal of Statistical Planning and Inference, 27, 375-381.
[46] Formann, A.K.
(1992), “Linear Logistic Latent Class Analysis for Polytomous Data,” Journal of American Statistical Association, 87, 476-486.
[47] Frome, E.L., Kutner, M.H. and Beauchamp, J.J. (1973), “Regression analysis of Poisson-distributed data,” Journal of American Statistical Association, 68, 935-940.
[48] Frome, E.L. (1983), “The analysis of rates using Poisson regression models,” Biometrics, 39, 665-674.
[49] Fukunaga, K. (1972), Introduction to Statistical Pattern Recognition, New York: Academic Press.
[50] Ganio, L.M. and Schafer, D.W. (1992), “Diagnostics for overdispersion,” Journal of American Statistical Association, 87, 795-804.
[51] Ghosh, J.K. and Sen, P.K. (1985), “On the asymptotic performance of the log likelihood ratio statistic for the mixture model and related results,” Proc. Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer (Vol. II), L.M. Le Cam and R.A. Olshen (Eds.), Monterey: Wadsworth, 789-806.
[52] Goldfeld, S.M. and Quandt, R.E. (1973), “A Markov model for switching regressions,” Journal of Econometrics, 1, 3-16.
[53] Gourieroux, C., Monfort, A. and Trognon, A. (1984), “Pseudo maximum likelihood methods: applications to Poisson models,” Econometrica, 52, 701-720.
[54] Gram, L. (1988), “Experimental studies and controlled clinical testing of valproate and vigabatrin,” Acta. Neurol. Scand., 78, 241-270.
[55] Griliches, Z. (1990), “Patent statistics as economic indicators: a survey,” Journal of Economic Literature, XXVIII, 1661-1707.
[56] Guerrero, V.M. and Johnson, R.A. (1982), “Use of the Box-Cox transformation with binary response models,” Biometrika, 69, 309-314.
[57] Haberman, S.J. (1977), “Maximum likelihood estimation with incomplete data via the EM algorithm (discussion),” Journal of the Royal Statistical Society B, 39, 1-38.
[58] Hall, B.H., Griliches, Z. and Hausman, J.A. (1986), “Patents and R and D: is there a lag,” International Economic Review, 27, 265-283.
[59] Hartigan, J.A.
(1985a), “Statistical theory in clustering,” Journal of Classification, 2, 63-76.
[60] Hartigan, J.A. (1985b), “A failure of likelihood asymptotics for normal mixtures,” Proc. Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer (Vol. II), L.M. Le Cam and R.A. Olshen (Eds.). Monterey: Wadsworth, 807-810.
[61] Hausman, J.A., Hall, B.H. and Griliches, Z., (1984), “Econometric models for count data with an application to the patents R and D relationship,” Econometrica, 52, 909-938.
[62] Hinde, J. (1982), “Compound Poisson regression model,” GLIM 82: Proc. Internat. Conf. Generalized Linear Models (R. Gilchrist, ed.), Springer, Berlin, 109-121.
[63] Hill, J.R. and Tsai, C., (1988), “Calculating the efficiency of maximum quasi-likelihood estimation,” Applied Statistics, 37, 219-230.
[64] Hingson, R. and Rowland, J., (1987), “Alcohol as a risk factor for injury or death resulting from accidental falls: a review of the literature,” J. Stud. Alc., 48, 212-219.
[65] Holford, T.R. (1983), “The estimation of age, period and cohort effects for vital rates,” Biometrics, 39, 311-324.
[66] Hopkins, A., Davies, P. and Dobson, C., (1985), “Mathematical models of patterns of seizures,” Arch. Neurol., 42, 463-467.
[67] Jorgensen, B. (1987), “Exponential dispersion models (with Discussion),” Journal of the Royal Statistical Society B, 49, 127-162.
[68] Kaufmann, H. (1987), “Regression models for nonstationary categorical time series: asymptotic estimation theory,” Annals of Statistics, 15, 79-98.
[69] Laird, N.M. (1978), “Nonparametric Maximum Likelihood Estimation of Mixing Distribution,” Journal of the American Statistical Association, 73, 805-811.
[70] Lambert, D., and Roeder, K., (1993), “Overdispersion diagnostics for generalized linear models,” working paper.
[71] Lawless, J.F. (1987a), “Regression Methods For Poisson Process Data,” Journal of the American Statistical Association, 82, 808-815.
[72] Lawless, J.F.
(1987b), “Negative Binomial And Mixed Poisson Regression,” The Canadian Journal of Statistics, 15, 209-225.
[73] Le, N., Leroux, B.G. and Puterman, M.L. (1992), “Exact likelihood evaluation in a Markov mixture model for time series of seizure counts,” Biometrics, 48, 317-323.
[74] Lehmann, E.L. (1983), Theory of Point Estimation, New York: Wiley.
[75] Leroux, B.G. (1989), “Maximum Likelihood Estimation for Mixture Distribution and Hidden Markov Models,” University of British Columbia, Ph.D. dissertation.
[76] Leroux, B.G. and Puterman, M.L. (1992), “Maximum Penalized Likelihood Estimation for Independent and Markov Dependent Mixture Models,” Biometrics, 48, 545-558.
[77] Liang, K.Y. and Zeger, S.L., (1986), “Longitudinal data analysis using generalized linear models,” Biometrika, 73, 370-384.
[78] Lindsay, B.G. (1983), “The Geometry of Mixing Likelihood: a General Theory,” The Annals of Statistics, 11, 86-94.
[79] Lindsay, B.G. and Roeder, K. (1992), “Residual diagnostics for mixture models,” Journal of the American Statistical Association, 87, 785-794.
[80] Linhart, H. and Zucchini, W. (1986), Model Selection, New York: John Wiley.
[81] Mannering, F.L. (1989), “Poisson analysis of commuter flexibility in changing routes and departure times,” Transpn. Res. B., 23B, 53-60.
[82] Manton, K.G.; Woodbury, M.A., and Stallard, E., (1981), “A variance components approach to categorical data models with heterogeneous cell populations: Analysis of spatial gradients in lung cancer mortality rates in North Carolina counties,” Biometrics, 37, 259-269.
[83] Margolin, B.H.; Kaplan, N., and Zeiger, E., (1981), “Statistical analysis of the Ames salmonella/microsome test,” Proc. Nat. Acad. Sci. U.S.A., 76, 3779-3783.
[84] Margolin, B.H., Kim, B.S. and Risko, K.J. (1989), “The Ames Salmonella/Microsome Mutagenicity Assay: Issues of Inference and Validation,” Journal of the American Statistical Association, 84, 651-661.
[85] McCullagh, P. and Nelder, J.A.
(1989), Generalized Linear Models (Second Edition), London: Chapman and Hall.
[86] McDermott, F.T., (1977), “Alcohol, road crash casualties, and countermeasures,” A.N.Z. J. Surgery, 47, 156-161.
[87] McLachlan, G.J. and Basford, K.E. (1988), Mixture Models, New York: Marcel Dekker, Inc.
[88] Milton, J.G., Gotman, J., Remillard, G.M. and Andermann, F., (1987), “Timing of seizure recurrence in adult epileptic patients: a statistical analysis,” Epilepsia, 28, 471-478.
[89] Nash, J.C. (1990), Compact Numerical Methods for Computers, Adam Hilger.
[90] Neyman, J., (1959), “Optimal asymptotic tests of composite statistical hypotheses,” In Probability and Statistics, Ed. U. Grenander, 213-234, New York: Wiley.
[91] Neuhaus, J.M., Kalbfleisch, J.D., and Hauck, W.W. (1991), “A comparison of cluster-specific and population averaged approaches for analyzing correlated binary data,” International Statistical Review, 59, 22-35.
[92] Ochi, Y. and Prentice, R.L., (1984), “Likelihood inference in a correlated probit regression model,” Biometrika, 71, 531-554.
[93] Otake, M. and Prentice, R.L., (1984), “The analysis of chromosomally aberrant cells based on beta-binomial distribution,” Radiation Research, 98, 456-470.
[94] Pierce, D.A. and Sands, B.R. (1975), “Extra-Bernoulli variation in binary data,” Technical Report, 46, Oregon State University, Department of Statistics.
[95] Pierce, D.A. and Schafer, D.W., (1986), “Residuals in generalized linear models,” Journal of the American Statistical Association, 81, 977-981.
[96] Pocock, S.J., Cook, D.G., and Beresford, S.A.A., (1981), “Regression of area mortality rates on explanatory variables: What weighting is appropriate?,” Applied Statistics, 30, 370-384.
[97] Pregibon, D., (1981), “Logistic regression diagnostics,” Annals of Statistics, 9, 705-724.
[98] Prentice, R.L., (1976), “A generalization of the probit and logit methods for dose response curves,” Biometrics, 32, 761-768.
[99] Redner, R.A.
and Walker, H.F., (1984), “Mixture densities, maximum likelihood and the EM algorithm,” SIAM Review, 26, 195-239.
[100] Roberts, H.V. (1991), Data Analysis for Managers with Minitab, The Scientific Press, South San Francisco.
[101] Schall, R. (1991), “Estimation In Generalized Linear Models With Random Effects,” Biometrika, 78, 719-27.
[102] Schwarz, G. (1978), “Estimating the Dimension of a Model,” The Annals of Statistics, Vol. 6, 461-464.
[103] Sclove, S., (1983), “Time-series segmentation: a model and a method,” Information Sciences, 29, 7-25.
[104] Shaked, M., (1980), “On mixtures from exponential families,” J.R. Statist. Soc. B, 42, 192-198.
[105] Simar, L. (1976), “Maximum Likelihood Estimation of a Compound Poisson Process,” The Annals of Statistics, 4, 1200-1209.
[106] Stein, G.Z., and Juritz, J.M. (1988), “Linear models with an inverse-Gaussian distribution,” Comm. Statist. Theory Methods, 17, 557-571.
[107] Stiratelli, R., Laird, N. and Ware, J.H., (1984), “Random-effect models for series observations with binary response,” Biometrics, 40, 961-971.
[108] Tarone, R.E., (1976), “Testing the goodness of fit of the binomial distribution,” Biometrika, 66, 585-590.
[109] Teicher, H. (1961), “Identifiability of Mixtures,” Annals of Mathematical Statistics, 32, 244-248.
[110] Teicher, H. (1963), “Identifiability of Finite Mixtures,” Annals of Mathematical Statistics, 34, 1265-1269.
[111] Titterington, D.M., Smith, A.F. and Makov, U.E. (1985), Statistical Analysis of Finite Mixture Models, Chichester: John Wiley & Sons.
[112] Tweedie, M.C.K., (1957), “Statistical properties of inverse Gaussian distributions,” International Annals of Mathematical Statistics, 28, 362-372.
[113] Tyssedal, J.S., and Tjostheim, D., (1988), “An autoregression model with suddenly changing parameters and an application to stock market prices,” Applied Statistics, 37, 353-369.
[114] Walker, P.J.
(1966), “A Method of Measuring the Sensitivity of Trypanosomes to Acriflavine and Trivalent Tryparsamide,” Journal of General Microbiol., 43, 45-58.
[115] Webb, G.R., Redman, S., Hennrikus, D.J., Kelman, G.R., Gibberd, R.W. and Sanson-Fisher, R.W., (1994), “The relationships between high-risk and problem drinking and the occurrence of work injuries and related absences,” Journal of Studies on Alcohol, forthcoming.
[116] Wechsler, H., Kasey, E.H., Thum, D. and Demone, H.W., (1969), “Alcohol level and home accidents,” Public Health Reports, 84, 1043-1050.
[117] Wedderburn, R.W.M. (1974), “Quasi-likelihood Functions, Generalized Linear Models and the Gauss-Newton Method,” Biometrika, 61, 439-447.
[118] Wilensky, A.J., Ojemann, L.M., Temkin, N.R., Troupin, A.S. and Dodrill, C.B., (1981), “Clorazepate and phenobarbital as antiepileptic drugs: a double-blind study,” Neurology, 31, 1271-1276.
[119] Williams, D.A. (1975), “The analysis of binary response from toxicological experiments involving reproduction and teratogenicity,” Biometrika, 61, 439-447.
[120] Williams, D.A. (1982), “Extra-binomial Variation in Logistic Linear Models,” Applied Statistics, 31, 144-148.
[121] Williams, D.A. (1984), “Generalized linear model diagnostics using deviance and single case deletions,” Applied Statistics, 36, 181-191.
[122] Wu, C.F.J. (1983), “On the Convergence Properties of the EM Algorithm,” The Annals of Statistics, 11, 95-103.
[123] Zeger, S.L., (1988), “A regression model for time series of counts,” Biometrika, 75, 621-629.
[124] Zeger, S.L., Liang, K.Y. and Self, S.G., (1985), “The analysis of binary longitudinal data with time-independent covariates,” Biometrics, 72, 31-38.
[125] Zeger, S.L. and Qaqish, B., (1988), “Markov regression models for time series: a quasi-likelihood approach,” Biometrics, 44, 1019-1031.

Appendix A

1. Fortran program for computing the maximum likelihood estimates of the mixed Poisson regression model.

CPROGRMf GENMDCCC Thi. I. dceigird for dii. 
in whldi e.dR cinc.vatii i.C wi4hi t poriod. Tidi • piorid.. fitlid von., Pcui&. Ma*i.AicC XSQR. Devinico, P..o.’. d.vin.or oc.Inh for l • i1C LZI for radiC NV-I OF VARIABLE REGRESSION COEFFICIENTS FOR E4C1I COMPONENT.C NT I OF COMMOM REGRESSION COEFFICIENTS.CIMPLICrF DOUBLE PRECTSION(A.H,O-Z)INTEGER NOBSNSFAT,NX,NXI,NV,NF,NI$2DiMENSION OBS(I000),TU000),M(1000,8),X141(1000,8),Z(1000,5)COMMON OBS,TIME,XM,XM1,Z,NOBS,NFrAT,NX,NXI.NV.NF,Nl,N2DIMENSION BGUESSQ0),OB(30),HO0,30),AGUESS(IQ),OADG)DIMENSION PA(10),PB(I,PC(10),Wr(30),DRES(1aOo),TDEES(IC0o)DIMENSION Frr(i000,12),RE (IC ,iid0,3Q), RESL(i000)dioxoajai tdi.(I0C0),tImo(I000),(I000,8),I(lOQ0,8)diiorork.i s(lOOO),ae3O),cp.oQO),w(I0C0)INTEGER N1,N2,UII,1112,MON,NEVALS,IFAIL,NSFEFINTEGER N’r,N’FEMPI,NTEMP2,n13,NUM,IrER.NrOI.OPEN( UNTr=I,FILE=’lcuI’)OPEN( UNrr-2,FnE=rcon.r)OPEN( UNIF..3,FIE.’Iflntor’)OPEN( UNrr=7.PILE-rcauk’)opni( aiit=8fiin&flt.a)OPEN( UNrr—9,FILE—’ld.taoi)READ(1,I00) NOBS,NSTAT,NX,NXI,NV,NFw,100) ncin,ini.t,ior,n.I,NV,NF100 EORMAT(615)Ni ..(NTrAT-I)NXN2=NSTAT*NV+NFNT=NI+N2NTEMP1=NINTEMP2N2READ(1,i13) (OBS(I),TIME(I), l.’I,NOBS)110 FORMAT(Fi0.5)wriln(*,113) (oin(i),tiin.(i),i—i,nois)do lii 1i,nob.READ (1,112) (XM(,J),Ii,NX)if(Lgt.i0) go to illwñte°,iI2) (xm(Lj)j—i,xoc)111112 FORMAT(6G16.8)113 FORMAT(2F10.5)READ(l,1I0) (BGUESS(I),I—1,N1)wnlc(9,i10) (bgucaa(i),ii,n1)DO 115 1—l,NOBSREAD(i,1IZ QcMI(1,i),J=1,NXI)if (I gt.i0) go to 115writo (,I12) (xxnl(i.j)j1,mcl)115 CONTINUEREAD(l,1 10) (AGUESS(I),l= l,N2)WRrrE(9,1 10) (AGUESS(O,1 i,N2)do 118 i’I,oobetoin(i)’’cb(i)do 116 j=i,nx116 lxm(Lj)xm(ij)do Ill j1,nxl117 mI(Lj)=xm1(i.j)118 ccoih.rNEVALS- 1000uhIN1IH2’N21113-NTMONtTOL.=0.0001TOL.L’..O.OIDO I 1=I,NOBSl TDRES(I)=0.0OBSINF0.0220DEV=0.ODODO 150 1=1,NOBSIF (OBSQ).EQ.0.0) GO TO 150TDRESQ)=OBS(1)*(DLOG(OBSQ))1 .0D0D0)oba(i)*dllog(time(i))DEV=DEV+TDRES(1)TEMP1 =0.ODONSTEP=INT(OBS(1))DO 145 J=1,NSTEPTEMP1 =TEMPI +DLOG(DFLOAT(J))145 CONTINUEobainf=obainf+TEMPIOBS(I)*DLOG(T1ME(1))150 
CONTINUENTOL=NOE5ODEV=DEVOOESINF=OBSINFdo 155 i=1,ntol155 w(i)=0.ODO160 DO 888 1TER=0,NTOLPREL=(10.0D0ca10.0D0)200 DO 202 I=1,N1202 OBm=EGuESso)DO 205 1=1,N2205 OA(I)=AGUESS(1)CALL ESTEP(NTEMPI,NTEMP2,EGUESS,AGUESS)CALL MSTEPIa4TEMP2,AGUESS,H,P,1H2,NEVALS,IFML,MON)CALL MSTEfl(NTEMP1,BGUESS,H,P,IHI,NEVALS,IFAIL,MON)SSRI =0.ODOSSR2=0.ODODO 350 I=l,NlSSRI =SSRI+(OEW-BGUESS(I))°’QODO350 CONTINUEDO 352 l=1,N2352 SSR2=(OA(1)-AGUESS(l))2.0D0+SSR2DO 353 K=l,Nl353 BT(K)=BGUESS(K)DO 354 K=1,N2354 BT(K+Nl)=AGUESS(K)CALL LL1KELY’lT,BT,F)TEMP=-F-PRELIF (TEMP.LT.TOLL) GO TO 359PREL=FIF (ITER.GT.0) GO TO 356f=-f-obainFWRITE(9,1II1) Fwcitc(*,l11l) fliii fonnat(4x,G16.8)356 IF ((SSRI .GT.TOL).OR.(SSR2.GT.TOL)) GO TO 200359 call cawton(nI,bt,li,pO,nt,tr.vals,ifail,mon,std)if (itcr.eq.0) call Fllkely(nt,bt,f,dits)if (ilec.gt.0) call llhktly(nt,bt,l)do 3M i=l,nl364 bgueas(i)=bt(i)do 365 i=I,n2365 agtas(i)=bt(ol+i)DEV=2*(DEV(F))C wcite(*,9 itec,cdev,tdcea(itcc),fIF (ITER.EQ.0) TDEV=DEVIF (1TER.GT.0) GO TO 500f--f-obuiofwcite(9,ll11) fcall gfit(ol,n2,bgueua,agiraa,XSQR,flt,RES,pa,pb,pc)do 366 i-1,nobStemp=fIt(i,l)-fit(i,2)if (Icoipcqo.ODO) sign(i)=0.ODOif (lcmpat.O000) sign(i)=tecap/(aba(tcmp))dQ)(2*Q).(i))yICK(05IJffl3t3)dces(i)=sign(i)*drea(i)366 continuewcitc(9,4444)4444 fonnat(4x,’gccdneaa of fit--XSQR, DeviancE’)write(9,7777) XSQR, DEvdo 368 i1,nobewrile(8,369) (FfflI,J),J= 1,2÷2aNSTAT)o WRJTE(2,369) (RES(I,J),J=l,l+NSTAT)wcite(7,l 12) (zQj),j I,natat)NUM= 1DO 367 J=l,NSTAT-lIF (ZQ,J)GT.ZO,J+l)) GO TO 367NUM=J+I367 CONTINUEWRITE(2,370) Fff(I,l),Ffr(I,2),RES(I,l),DRES(I),FIT(I,NUM+2),o RES(I,l+NUM),NUM368 continue369 fonnat(12g16.8)370 FORMAT(G12.6,x,g12.6,x,g12.DO 400 J=1,N1400 BGUESS(J)=BT(J)DO 402 J=I,N2402 AGUESS(J)=BT(J+N1)call estep(ntempl,ntcmp2,bgueas,atemp=dfloat(nl +n2)do 410 k=1,nlct,ar(k)=bgucsa(k)se(k)=(std(k,k)*a(0.500DO))*temp410 continuedo 420 k=1,n2opar(k+n1)aguess(k)sek+nl)(sk+nl,k+nl)ca(0.5D0D0))temp420 continueWRfl’E(9,5555)WR]TE(9,7777) (BGUESS(1), 
std(I,i)ca(0.SDODO), 1l,Nl)writc(*,7777) (bguess(i),i= 1,nl)write(9,6666)write(9,7777) (AGUESS(I), atd(l+Nl,i+n1)°’(0.5D0D0),l= I,N2)writet*,7777) (aguess(i),i=l,n2)wiite(9,7787)writc(9,7777) (pa(i),i l,nstat)write(9,7797)write(9,7777) (pb(i),i l,natat)wiite(9,7799)wrlte(9,7777) (pc(i),i l,nstat)GO TO 504500 RESLQTER)=TDEV-DEVdo 501 k=l,nlw(Uer)=w(iter)+abs(bguees(k)-oparikl)/sc(k)bguess(k)opar(k)501 continuedo 502 k=l,n2w(iter)=w(iter)+abe(aguess(k)-opa(k+n1))Ise(k+nl)aguess(k)=opar(k+nl)502 continueIF (TTER.EQ.NTOL) GO TO 889504 NOBS=NTOL-lDEV=ODEV.TDRES(ITER+ I)if (iter.eq.0) go to 514do 510 k=l,iterobs(k)tobs(k)time(k)ttime(k)do 505 j=1,nx505 xm(k,j)=tzcm(kj)do 506 j=1,nxl506 xm1(kj)=ntm1(kj)510 continueif (iter.eq.noba) go to 888514 DO 520 K=ITER+l,NOBSOBS(K)=TOBS(K+l)TIME(K)=tTIME(K+l)DO 515 J=1,NX515 XM(K,J)=tXM(K+l,J)DO 516 J=1,NXI516 XMI(K,J)=tXMI(K+l,J)520 CONTINUE888 CONTINUE889 do 900 i=l,ntolresl(i)=sign(i)*(resl(i)ncO.SD000)write(3,7778) ds(i),ca(,l),nul(i),w(i)900 continue5555 fonnat(4x,’beta-vecto?)6666 fonnat(4x,’alpl,a-vecto?)7777 fom.at(4x,2g16.8)7778 fonnat(4x,4g16.8)7787 fonnat(’pa’)7797 fonnat(’pb’)7799 fonnat(’pc)9999 sroENDSUBROUflNE FuNcr4,B,P)Cnu0on0aC This subroutlue computes the value of function QI in Chapter 2.C Data input: N diiuension of vector B;C B=betavector;C output: P = Ut flmction value Q1(B).IMPLICiT DOUBLE PRECISION (A-H,O-Z)INTEGER NOBS,NSTAT,NX,NXI,NV,NF,N1,N2DIMENSION OBS(1000),TIME(1000),XM(1000,S),XM1(l000,8),Z(1000,5)COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NXI,NV,NF,N1,N2INTEGER NDIMENSION E(N),BX(5)P=0.ODODO 100 1=l,NOBSDOS J=l,NSTAT-lBX(J)=0.ODODO 6 M=I,NX6 BX(J)=BX(J)+XM(I,M)*B(M+(J.I)*NX)S CONTINUEP=P+Z(I,l)5BX(l)TEMP1 =EX(l)IF (TEMPI .LT.0.ODO) TEMPI =0.ODODO 20 J=2,NSTAT-IP=P+Z(I,J)*BX(J)IF (BX(J).GT.BX(J-I)) TEMPI =BX(J)20 CONTINUEP=P-TEMPICALL AEXP(-TEMPI,TEMP2)DO 30J=I,NSTAT-ICALL AEXP((BX(J)-TEMP1),TEMP3)TEMP2=TEMP2+TEMP330 CONTINUEP=P-DLOG(TEMP2)100 CONTINUEP=-FRETURNENDSUBROUrINE GRADasB,G)ctC This subroutine 
C computes the first derivative of Q1 (see eqn 2.21).
C Data input: N = dimension of vector B;
C             B = beta vector;
C output:     G = the derivative of Q1 at B.
C
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      INTEGER N
      DIMENSION G(25),B(N),TEMP(25),BX(5)
      DO 5 I=1,N
    5 G(I)=0.0D0
      DO 100 I=1,NOBS
      DO 20 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
   10 BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   20 CONTINUE
      TEMP1=BX(1)
      IF (TEMP1.LT.0.0D0) TEMP1=0.0D0
      DO 30 J=2,NSTAT-1
      IF (BX(J).GT.BX(J-1)) TEMP1=BX(J)
   30 CONTINUE
      CALL AEXP(-TEMP1,TEMP2)
      DO 40 J=1,NSTAT-1
      CALL AEXP((BX(J)-TEMP1),TEMP(J))
      TEMP2=TEMP2+TEMP(J)
   40 CONTINUE
      DO 60 J=1,NSTAT-1
      DO 50 M=1,NX
      G(M+(J-1)*NX)=G(M+(J-1)*NX)+XM(I,M)*(Z(I,J)-TEMP(J)/TEMP2)
   50 CONTINUE
   60 CONTINUE
  100 CONTINUE
      DO 200 I=1,N
  200 G(I)=-G(I)
      RETURN
      END
      SUBROUTINE MSTEP2(N,B,H,P0,IH,NEVALS,IFAIL,MON)
C
C This subroutine is a quasi-Newton algorithm (Nash, 1990) which
C maximizes the function Q1.
C Data input: N = dimension of vector B; B = beta vector;
C             IH = dimension of the Hessian matrix;
C             NEVALS = # of evaluations for the function Q1;
C output:     H = the Hessian matrix; P0 = maximum value;
C             B = optimal values of beta vector.
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION B(N), H(IH,N)
      DIMENSION X(30), C(30), G(30), T(30)
      DOUBLE PRECISION K
      INTEGER COUNT
      DATA W,TOL/0.2,1.0D-4/,EPS/1.0D-6/
      IF (N.LT.0.OR.N.GT.23) GO TO 160
      IFN = N+1
      IG = 1
      RLIM=7.2D0*(10.0D0**74.0D0)
      CALL FUNCT(N,B,P0)
      IF (P0.GT.RLIM) GO TO 180
      CALL GRAD(N,B,G)
C
C RESET HESSIAN
C
   10 DO 30 I = 1,N
      DO 20 J = 1,N
   20 H(I,J) = 0.0D0
   30 H(I,I) = 1.0D0
      ILAST = IG
C
C TOP OF ITERATION
C
   40 DO 50 I = 1,N
      X(I) = B(I)
   50 C(I)=G(I)
C
C FIND SEARCH DIRECTION T
C
      D1 = 0.0D0
      SN=0.0D0
      DO 70 I = 1,N
      S = 0.0D0
      DO 60 J = 1,N
   60 S = S-H(I,J)*G(J)
      T(I) = S
      SN = SN+S*S
   70 D1 = D1-S*G(I)
C
C CHECK IF DOWNHILL
C
      IF
     C (D1.LE.0.0D0) GO TO 10
C
C SEARCH ALONG T
C
      SN = 0.5D0/DSQRT(SN)
      K = DMIN1(1.0D0,SN)
   80 COUNT = 0
      DO 90 I = 1,N
      B(I) = X(I)+K*T(I)
      IF (DABS(B(I)-X(I)).LT.EPS) COUNT = COUNT+1
   90 CONTINUE
C
C CHECK IF CONVERGED
C
      IF (COUNT.EQ.N) GO TO 150
      CALL FUNCT(N,B,P)
      IFN = IFN+1
      IF (IFN.GE.NEVALS) GO TO 170
      IF (P.LT.P0-D1*K*TOL) GO TO 100
      K = W*K
      GO TO 80
C
C NEW LOWEST VALUE
C
  100 P0 = P
      IG = IG+1
      CALL GRAD(N,B,G)
      IFN = IFN+N
C
C UPDATE HESSIAN
C
      D1 = 0.0D0
      DO 110 I = 1,N
      T(I) = K*T(I)
      C(I) = G(I)-C(I)
  110 D1 = D1+T(I)*C(I)
C
C CHECK IF +VE DEF ADDITION
C
      IF (D1.LE.0.0D0) GO TO 10
      D2 = 0.0D0
      DO 130 I = 1,N
      S = 0.0D0
      DO 120 J = 1,N
  120 S = S+H(I,J)*C(J)
      X(I) = S
  130 D2 = D2+S*C(I)
      D2 = 1+D2/D1
      DO 140 I = 1,N
      DO 140 J = 1,N
  140 H(I,J) = H(I,J)-(T(I)*X(J)+T(J)*X(I)-D2*T(I)*T(J))/D1
      GO TO 40
  150 IFAIL = 0
C SUCCESSFUL CONCLUSION
      RETURN
  160 IFAIL = 1
C N OUT OF RANGE
      RETURN
  170 IFAIL = 2
C TOO MANY FUNCTION EVALUATIONS
      RETURN
  180 IFAIL = 3
C INITIAL POINT INFEASIBLE
      RETURN
 2005 FORMAT( 2X,3G16.4)
      END
      SUBROUTINE AEXP(X,F)
C
C This subroutine computes an exponential function value.
C Data input: X = real number;
C output:     F = exp(X).
C
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NSTEP
      TEMP1=ABS(X)
      IF (TEMP1.GT.79.9D0) GO TO 50
      F=DEXP(X)
      GO TO 200
   50 IF (X.LT.-79.9D0) GO TO 150
      IF (X.GT.150.0D0) X=150.0D0
      F=1.0D0+X
      NSTEP=1
      FACTOR=1.0D0
      TEMP1=DFLOAT(NSTEP)
      TEMP2=X/TEMP1
  100 IF (TEMP2.LT.1.0D0) GO TO 200
      NSTEP=NSTEP+1
      TEMP1=DFLOAT(NSTEP)
      FACTOR=X/TEMP1
      TEMP2=TEMP2*FACTOR
      F=F+TEMP2
      GO TO 100
  150 F=0.0D0
  200 RETURN
      END
      SUBROUTINE FUNCT1(N,B,P)
C This subroutine computes the value of function Q2 in Chapter 2.
C Data input: N = dimension of vector B;
C             B = alpha vector;
C output:     P = the function value Q2(B).
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      INTEGER N
      DIMENSION B(N),BX(5)
      P=0.0D0
      DO 100 I=1,NOBS
      DO 8 J=1,NSTAT
      BX(J)=0.0D0
      DO 6 M=1,NV
    6 BX(J)=BX(J)+XM1(I,M)*B(M+(J-1)*NV)
    8 CONTINUE
      TEMP=0.0D0
      DO 12 M=1,NF
      TEMP=TEMP+XM1(I,NV+M)*B(M+NSTAT*NV)
   12
     C CONTINUE
      DO 14 J=1,NSTAT
      BX(J)=BX(J)+TEMP
   14 CONTINUE
      DO 20 J=1,NSTAT
      CALL AEXP(BX(J),TEMP)
      P=P+Z(I,J)*(OBS(I)*BX(J)-TIME(I)*TEMP)
   20 CONTINUE
  100 CONTINUE
      P=-P
      RETURN
      END
      SUBROUTINE GRAD1(N,B,G)
C
C This subroutine computes the first derivative of Q2 (see eqn 2.22).
C Data input: N = dimension of vector B;
C             B = alpha vector;
C output:     G = the derivative of Q2 at B.
C
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF
      INTEGER N
      DIMENSION G(30),B(N),BX(5)
      DO 5 I=1,N
    5 G(I)=0.0D0
      DO 100 I=1,NOBS
      DO 20 J=1,NSTAT
      BX(J)=0.0D0
      DO 10 M=1,NV
   10 BX(J)=BX(J)+XM1(I,M)*B(M+(J-1)*NV)
   20 CONTINUE
      TEMP=0.0D0
      DO 22 M=1,NF
      TEMP=TEMP+XM1(I,M+NV)*B(M+NSTAT*NV)
   22 CONTINUE
      DO 24 J=1,NSTAT
      BX(J)=BX(J)+TEMP
   24 CONTINUE
      DO 32 J=1,NSTAT
      CALL AEXP(BX(J),TEMP)
      DO 30 M=1,NV
      G(M+(J-1)*NV)=G(M+(J-1)*NV)
     C +Z(I,J)*XM1(I,M)*(OBS(I)-TIME(I)*TEMP)
   30 CONTINUE
   32 CONTINUE
      DO 42 M=1,NF
      DO 40 J=1,NSTAT
      CALL AEXP(BX(J),TEMP)
      G(M+NSTAT*NV)=G(M+NSTAT*NV)
     C +Z(I,J)*XM1(I,M)*(OBS(I)-TIME(I)*TEMP)
   40 CONTINUE
   42 CONTINUE
  100 CONTINUE
      DO 200 I=1,N
  200 G(I)=-G(I)
      RETURN
      END
      SUBROUTINE MSTEP1(N,B,H,P0,IH,NEVALS,IFAIL,MON)
C This subroutine is a quasi-Newton algorithm (Nash, 1990) which
C maximizes the function Q2.
C Data input: N = dimension of vector B; B = alpha vector;
C             IH = dimension of the corresponding Hessian matrix;
C             NEVALS = # of evaluations for the function Q2
C output:     H = the Hessian matrix; P0 = maximum value;
C             B = optimal values of alpha vector.
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION B(N), H(IH,N)
      DIMENSION X(30), C(30), G(30), T(30)
      DOUBLE PRECISION K
      INTEGER COUNT
      DATA W,TOL/0.2,1.0D-4/,EPS/1.0D-6/
      IF (N.LT.0.OR.N.GT.23) GO TO 160
      IFN = N+1
      IG = 1
      RLIM=7.2D0*(10.0D0**74.0D0)
      CALL FUNCT1(N,B,P0)
      IF (P0.GT.RLIM) GO TO 180
      CALL GRAD1(N,B,G)
C
C RESET HESSIAN
C
   10 DO 30 I =
     C 1,N
      DO 20 J = 1,N
   20 H(I,J) = 0.0D0
   30 H(I,I) = 1.0D0
      ILAST = IG
C
C TOP OF ITERATION
C
   40 DO 50 I = 1,N
      X(I) = B(I)
   50 C(I)=G(I)
C
C FIND SEARCH DIRECTION T
C
      D1 = 0.0D0
      SN=0.0D0
      DO 70 I = 1,N
      S = 0.0D0
      DO 60 J = 1,N
   60 S = S-H(I,J)*G(J)
      T(I) = S
      SN = SN+S*S
   70 D1 = D1-S*G(I)
C
C CHECK IF DOWNHILL
C
      IF (D1.LE.0.0D0) GO TO 10
C
C SEARCH ALONG T
C
      SN = 0.5D0/DSQRT(SN)
      K = DMIN1(1.0D0,SN)
   80 COUNT = 0
      DO 90 I = 1,N
      B(I) = X(I)+K*T(I)
      IF (DABS(B(I)-X(I)).LT.EPS) COUNT = COUNT+1
   90 CONTINUE
C
C CHECK IF CONVERGED
C
      IF (COUNT.EQ.N) GO TO 150
      CALL FUNCT1(N,B,P)
      IFN = IFN+1
      IF (IFN.GE.NEVALS) GO TO 170
      IF (P.LT.P0-D1*K*TOL) GO TO 100
      K = W*K
      GO TO 80
C
C NEW LOWEST VALUE
C
  100 P0 = P
      IG = IG+1
      CALL GRAD1(N,B,G)
      IFN = IFN+N
C
C UPDATE HESSIAN
C
      D1 = 0.0D0
      DO 110 I = 1,N
      T(I) = K*T(I)
      C(I) = G(I)-C(I)
  110 D1 = D1+T(I)*C(I)
C
C CHECK IF +VE DEF ADDITION
C
      IF (D1.LE.0.0D0) GO TO 10
      D2 = 0.0D0
      DO 130 I = 1,N
      S = 0.0D0
      DO 120 J = 1,N
  120 S = S+H(I,J)*C(J)
      X(I) = S
  130 D2 = D2+S*C(I)
      D2 = 1+D2/D1
      DO 140 I = 1,N
      DO 140 J = 1,N
  140 H(I,J) = H(I,J)-(T(I)*X(J)+T(J)*X(I)-D2*T(I)*T(J))/D1
      GO TO 40
  150 IFAIL = 0
C SUCCESSFUL CONCLUSION
      RETURN
  160 IFAIL = 1
C N OUT OF RANGE
      RETURN
  170 IFAIL = 2
C TOO MANY FUNCTION EVALUATIONS
      RETURN
  180 IFAIL = 3
C INITIAL POINT INFEASIBLE
      RETURN
 2005 FORMAT( 2X,3G16.4)
      END
      SUBROUTINE ESTEP(NTEMP1,NTEMP2,B,B1)
C This subroutine executes the E-step of the EM algorithm.
C Data input: NTEMP1 = dimension of vector B;
C             NTEMP2 = dimension of vector B1;
C             B = beta vector;
C             B1 = alpha vector.
C Output:     updated posterior probabilities, Z(I,J).
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION B(NTEMP1),B1(NTEMP2),TEMP(5),BX(5),BX1(5),TEMPL(5)
      INTEGER NTEMP1,NTEMP2
      SMALL=-79.9D0
      SMALL0=10000000000.0D0
      SMALL0=1.0D0/SMALL0
      DO 100 I=1,NOBS
      DO 12 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
      BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   10 CONTINUE
   12 CONTINUE
      BX(NSTAT)=0.0D0
      DO 18 J=1,NSTAT
      BX1(J)=0.0D0
      DO 16 M=1,NV
      BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NV)
   16 CONTINUE
   18 CONTINUE
      IF (NF.EQ.0) GO TO
     C 25
      TEMPP=0.0D0
      DO 22 M=1,NF
      TEMPP=TEMPP+XM1(I,NV+M)*B1(M+NSTAT*NV)
   22 CONTINUE
      DO 24 J=1,NSTAT
      BX1(J)=BX1(J)+TEMPP
   24 CONTINUE
   25 CONTINUE
      CALL AEXP(BX1(1),TEMP(1))
      TEMP1=BX(1)+OBS(I)*BX1(1)-TIME(I)*TEMP(1)
      DO 30 J=2,NSTAT
      CALL AEXP(BX1(J),TEMP(J))
      TEMP12=(TEMP(J)-TEMP(J-1))*TIME(I)
      TEMP12=(BX(J)-BX(J-1))+OBS(I)*(BX1(J)-BX1(J-1))-TEMP12
      IF (TEMP12.GT.0.0D0) TEMP1=BX(J)+OBS(I)*BX1(J)-TIME(I)*TEMP(J)
   30 CONTINUE
      TEMP2=0.0D0
      DO 40 J=1,NSTAT
      TEMP(J)=BX(J)+OBS(I)*BX1(J)-TIME(I)*TEMP(J)
      CALL AEXP((TEMP(J)-TEMP1),TEMPL(J))
      TEMP2=TEMP2+TEMPL(J)
   40 CONTINUE
      DO 50 J=1,NSTAT
      Z(I,J)=TEMPL(J)/TEMP2
      IF (Z(I,J).LT.SMALL0) Z(I,J)=0.0D0
   50 CONTINUE
  100 CONTINUE
      RETURN
      END
      SUBROUTINE LLIKELY(NT,BT,F)
C
C This subroutine computes the observed log likelihood value.
C Data Input: NT = total dimension of vector BT;
C             BT = vector combining beta and alpha vectors.
C Output:     F = the observed log likelihood value at BT.
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      INTEGER NT
      DIMENSION B(30),B1(30),BT(NT),BX(5),BX1(5),TEMP(5)
      DO 1 J=1,N1
    1 B(J)=BT(J)
      DO 2 J=1,N2
    2 B1(J)=BT(N1+J)
      F=0.0D0
      DO 100 I=1,NOBS
      DO 12 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
   10 BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   12 CONTINUE
      BX(NSTAT)=0.0D0
      DO 18 J=1,NSTAT
      BX1(J)=0.0D0
      DO 16 M=1,NV
      BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NV)
   16 CONTINUE
   18 CONTINUE
      IF (NF.EQ.0) GO TO 25
      TEMPP=0.0D0
      DO 22 M=1,NF
      TEMPP=TEMPP+XM1(I,NV+M)*B1(M+NSTAT*NV)
   22 CONTINUE
      DO 24 J=1,NSTAT
      BX1(J)=BX1(J)+TEMPP
   24 CONTINUE
   25 CONTINUE
      CALL AEXP(BX1(1),TEMP(1))
      TEMP1=BX(1)+OBS(I)*BX1(1)-TIME(I)*TEMP(1)
      TEMPP1=BX(1)
      DO 30 J=2,NSTAT
      IF (BX(J).GT.BX(J-1)) TEMPP1=BX(J)
      CALL AEXP(BX1(J),TEMP(J))
      TEMP12=(TEMP(J)-TEMP(J-1))*TIME(I)
      TEMP12=(BX(J)-BX(J-1))+OBS(I)*(BX1(J)-BX1(J-1))-TEMP12
      IF (TEMP12.GT.0.0D0) TEMP1=BX(J)+OBS(I)*BX1(J)-TIME(I)*TEMP(J)
   30 CONTINUE
      TEMP2=0.0D0
      TEMPP2=0.0D0
      DO 40 J=1,NSTAT
      CALL
     C AEXP((BX(J)-TEMPP1),TEMPP21)
      TEMPP2=TEMPP2+TEMPP21
      TEMP12=BX(J)+OBS(I)*BX1(J)-TIME(I)*TEMP(J)
      TEMP12=TEMP12-TEMP1
      CALL AEXP(TEMP12,TEMP22)
      TEMP2=TEMP2+TEMP22
   40 CONTINUE
      F=F+TEMP1+DLOG(TEMP2)-TEMPP1-DLOG(TEMPP2)
  100 CONTINUE
      F=-F
      RETURN
      END
      SUBROUTINE NEWTON(N,B,H,P0,IH,NEVALS,IFAIL,MON,std)
C This subroutine is a quasi-Newton algorithm (Nash, 1990) which
C maximizes the observed log likelihood function.
C Data input: N = dimension of vector B;
C             B = vector combining beta and alpha vectors;
C             IH = dimension of the corresponding Hessian matrix;
C             NEVALS = # of evaluations for the observed log
C                      likelihood function;
C output:     H = the Hessian matrix; P0 = maximum value;
C             B = optimal values of alpha vector
C             std = approximate standard errors.
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NV,NF,N1,N2
      DIMENSION B(N), H(IH,N),std(30,30)
      DIMENSION X(30), C(30), G(30), T(30)
      DOUBLE PRECISION K
      INTEGER COUNT,ih,n
      DATA W,TOL/0.2,1.0D-4/,EPS/1.0D-6/
      IF (N.LT.0.OR.N.GT.23) GO TO 160
      IFN = N+1
      IG = 1
      RLIM=7.2D0*(10.0D0**74.0D0
     C )
      CALL LLIKELY(N,B,P0)
      IF (P0.GT.RLIM) GOTO 180
      CALL GLIKELY(N,B,G)
C
C RESET HESSIAN
C
   10 DO 30 I = 1,N
      DO 20 J = 1,N
   20 H(I,J) = 0.0D0
   30 H(I,I) = 1.0D0
      ILAST = IG
C
C TOP OF ITERATION
C
   40 DO 50 I = 1,N
      X(I) = B(I)
   50 C(I)=G(I)
C
C FIND SEARCH DIRECTION T
C
      D1 = 0.0D0
      SN=0.0D0
      DO 70 I = 1,N
      S = 0.0D0
      DO 60 J = 1,N
   60 S = S-H(I,J)*G(J)
      T(I) = S
      SN = SN+S*S
   70 D1 = D1-S*G(I)
C
C CHECK IF DOWNHILL
C
      IF (D1.LE.0.0D0) GO TO 10
C
C SEARCH ALONG T
C
      SN = 0.5D0/DSQRT(SN)
      K = DMIN1(1.0D0,SN)
   80 COUNT = 0
      DO 90 I = 1,N
      B(I) = X(I)+K*T(I)
      IF (DABS(B(I)-X(I)).LT.EPS) COUNT = COUNT+1
   90 CONTINUE
C
C CHECK IF CONVERGED
C
      IF (COUNT.EQ.N) GO TO 150
      CALL LLIKELY(N,B,P)
      IFN = IFN+1
      IF (IFN.GE.NEVALS) GO TO 170
      IF (P.LT.P0-D1*K*TOL) GO TO 100
      K = W*K
      GO TO 80
C
C NEW LOWEST VALUE
C
  100 P0 = P
      IG = IG+1
      CALL GLIKELY(N,B,G)
      IFN = IFN+N
C
C UPDATE HESSIAN
C
      D1 = 0.0D0
      DO 110 I = 1,N
      T(I) = K*T(I)
      C(I) = G(I)-C(I)
  110 D1 = D1+T(I)*C(I)
C
C CHECK IF +VE DEF ADDITION
C
      IF (D1.LE.0.0D0) GO TO 10
      D2 = 0.0D0
      DO 130 I = 1,N
      S = 0.0D0
      DO 120 J = 1,N
  120 S = S+H(I,J)*C(J)
      X(I) = S
  130 D2 = D2+S*C(I)
      D2 = 1+D2/D1
      DO 140 I = 1,N
      DO 140 J = 1,N
  140 H(I,J) = H(I,J)-(T(I)*X(J)+T(J)*X(I)-D2*T(I)*T(J))/D1
      GO TO 40
  150 do 141 i=1,n
      do 141 j=1,n
  141 std(i,j)=h(i,j)
      IFAIL
= 0C SUCCESSFUL CONCLUSIONRETURN160 WAIL 1C N OUT OF RANGERETURN170 WAIL = 2C TOO MANY FUNCTION EVALUATIONSRETURNISO WAIL = 3C INITIAL POINT INFEASIBLERETURN2005 FORMAT( 2X,3G16.4)ENDSUBROUTINE OLBCELYQ4,B,G)C©°veC This subroutInc computes the first derivative of the observed logC llktlihocd function.C Data input: N = dinantaice of vector B;C B = vector combining bets and alpha vectors;C Output: G=thederivative of the function atE.IMPLICIT DOUBLE PRECISION (A-H,O-Z)INTEGER NOBS,NSTAT,NX,NX1,NV,NF,NI,N2DIMENSION OBS(l000),TIME(l000),xM(I000,I),xMI (1000 ,S),Z(l000,5)COMMON OBS,TIME,XM,XMI,Z,NOBS,NSTAT,NX,NXI,NV,NF,Ni,N2INTEGER NDIMENSION G(30),B(N),BX(5),BX1(5),BTEMP(30),ATEMP(30)DO I J=I,Nl1 BTEMP(J)=B(J)DO 2 J=I,N22 ATEMP(J)=B(Nl+J)CALL ESTEP(Nl,N2,BTEMP,ATEMP)DOS I=I,NS 0(I)=0.000DO 100 I=I,NOBSDO t21=l,NSTAT-IBX(J)=0.000DO 10 M=l,NX10 BX(J)=BX(J)+XM(I,M)5TEMP(M+ (J-I)*NX)12 CONTINUEBX(NSTAT)=0.000TPROB=0.000DO 13 J=t,NSTATCALL AEXP(BX(J),TEMP1)TFROB=TPROB+TEMpIBX(J)=TEMPI13 CONTINUEDO 14J=1,NSTAT14 BX(J)=BX(J)/TPROBDO 183=1,NSTATBX1(J)=O.000DO 16 M=1,NVBX1(J)=BX1(J)+XM1(I,M)*ATEMP(M+(J1)*NV)16 CONTINUE11 CONTINUEIF (NF.EQ.0) GO TO 22TEMPP=0.ODODO 20 M=I,NFTEMPP=TEMPP+XMI(I,NV+M)*ATEMP(M+NSTAT*NV)20 CONTINUEDO 21 J=1,NSTATBX1(J)=BXI(J)+TEMPP21 CONTINUE22 CONTINUEDO 24 J=1,NSTAT-1DO 23 M=1,NXG(M+(J1)*NX)=G(M + (J1)*NX)+XM(I,M)*(Z(1,J)BX(T))23 CONTINUE24 CONTINUEDO 30 3=1,NSTATCALL AEXP(BXI(J),TBATE)I1X1(J)=TRATE30 CONTINUEDO 45 J=1,NSTATDO 42 M=1,NVGQv1+(J-l)NV+Nl)=G(M+Q-l)NV+N1)C +XMl(l,M)*Z(l,J)*(OBS(I)BX1(J))42 CONTINUE45 CONTINUEIF (NF.EQ.0) GO TO 60DO 55 M=1,NFDO 52 J=I,NSTATG(M+N1 +NSrA’rNV)=o(M+N1 +NSTAT5NV)C +XMI (I,M+NVfl(I,J)OBS(1)-BXI(J)52 CONTINUE55 CONTINUE60 CONTINUE100 CONTINUEDO 200 I=1,N200 G(1)=-G(I)RETURNENDSUBROUTINE GFIT(NTEMP1,NTEMP2,B,Bl,XSQR,FIT,RES,PA,PB,PC)Ci4+ ,.++‘.+‘::4+ +:z:.:: .,: .C This suhroutitr computes Pearson statistic, fitted values, PearsonC residuals, overdispemion test statistics for each component.C Data input: NTEMPI = 
dimession of vector B;C NTEMP2 = dimension of vector El;C B =brtavector; El =alphavector;C Output: XSQR = Pearson statistic;C FTT = fitted values including for each component;C RES = Pearson residuals including for each component;C PA, PB and PC are vectors containing tes A, B and CC overdinperion teat statistics for each component.Cssssssssssssssssuasussssssssssssssssssssssssssssssssssssssssss50555soIMPLICIT DOUBLE PRECISION(A-H,O-Z)INTEGER NOBS,NSTAT,NX,NX1,NV,NF,Nl,N2DIMENSION OBS(1000),TIME(l000),XM(l000,8),XMI(I000,I),Z(I000,5)COMMON OBS,TIME,XM,XMI ,Z,NOBS,NSTAT,NX,NXl,NV,NP,Nl,N2INTEGER NTEMP1,NTEMP2,NCOUNT(l0)DIMENSION Bq’TEMPI),BIQqTEMP2),BX(5),BXI(5),FTT(I000,l2)DIMENSION PA(10),TPA(10,2)DIMENSION CFTT(lO),RES(l000,6),PB(l0),TPB(l0,2),PC(l0),TPC(l0,2)DO2J=l,NSTATNCOUNT(J)=0TPA(J,l)=0.ODOTPA(J,2)=0.ODOTPB(J,I)=0.ODOTPB(J,2)=0.ODOTPC(J,l)=0.ODO2 CONTINUEXSQR=0.0DO 100 l=I,NOESFIT(I,l)=OBS(I)FIT(l,2)=0.000DO 20 l=1,NSTAT-1BX(J)=0.ODODO 10 M=1,NX10 BX(J)=BX(J)+XM(I,M)5M+(J-1)N )20 CONTINUEBX(NSTAT)0.0D0TEMPI =BX(1)DO 21 J=2,NSTATIF (BX(J).GT.IIX(J-1)) TEMPI =BX(J)21 CONTINUETEMPD=0.ODODO 25 J=1,NSTATCALL AEXP((BX(J)-TEMP1),TEMPI 1)BX(J)=TEMPIITEMPTh=TEMPD+TEMP1 1BXI(J)=0.ODODO 22 M=1,NVBxl(J)=Bx1(J)÷xMl(I,M)*Bl(M÷q-1)Nv)22 CONTINUEif (nv.eq.0) go to 25DO 23 M=1,NFBXI(J)=BXI(J)+XMI(l,M+NV)5E1(M+NSTATNV)23 CONTINUE24 CALL AEXP(BXI(J),TEMP2)EX1(J)=TEMP2*TIME(l)FIT(i,2+J)=BX1(J)CFIT(J)=BXIQ)25 CONTINUEDO 26 J=l,NSTATFrF(I,2+NSTAT+J)=BX(J)IrEMPDFIT(I,2)=FIT(I,2)÷FITa,2÷J)IT(I,2+NSTAT÷JRES(l,l +J)=(FlT(l,1)FlT(I,2+J))s(Frr(I,2+J)*%0iD0D0))26 CONTINUEE2=0.000DO 30 J=l,NSTATE2=E2+FIT(I,2+NSTAT+J)*(F1T(l,2+J)*K2.ODODO)30 CONTINUEE2=Frf(J,2)+E2-(FITq,2)(2.oD0D0))RES(l,l)=(F1T(I,1)F1T(I,2))*(E2fl(0.5D0DO))XSQR=XSQR+(RESa,l)ve(2.ODODO))NUM= IDO 40 J=I,NSTAT-lIF (Z(l,J).GT.Z(I,J+l)) GO TO 40NUM=J+140 CONTINUENCOUNTa4UM)=NCOUNT4UM)+ 
ITPA(NUM,1)=TPA(NUM,1)+(OBS(l)-CF1T(NUMfletaODO-CFTra4UM)TPB(NUM,l)=TPB(NUM,i)+(OES(i)-CF1T(NUM))2.0D0-OBS(l)TPA4UM,2)=TPAQ4UM,2)+CFIT(NUM)tte2.0D0TPB4UM,2)=TPA(NUM,2)TPC(NUM,l)=TPC(NUM,l)+((OBS(I)-CFlT(NUM))2.0D0C -OBSØ)/CFEf4UM)100 CONTINUEDO 150 J=I,NSTATPA(J) =TPA(J,l)/((2.0D0DcWI’PA(J,2)).5D0D0)PB(J)=TPB(J,1)/((2.0D0D0TPB(J,2)y5.5DOD0)PC(J)=TPC(J,l)/((DFLOAT(NCOUNT(J))n.ODODO)y’°v.SDODO150 CONTINUERETURNENDSUBROUTINE FLIKELY(NT,BT,F,DRES)CC Thia subroutine computes Ut deviance residuals.C Data input: NT dimension of vector BT1C BT = vector combining beta and alpba vectors;C Output: ORES = deviance residuals;C F = the observed log likelihood fsnctlon value at liT.Cn0©anacccccncccccIMPLICIT DOUBLE PRECISION(A-I4,O-Z)INTEGER NOBS,NSTAT,NX,NXl,NV,NF,Nl,N2DIMENSION OBS(1000),TIME(l000),XM(l000,8),XMI(I000,8),Z(l000,5)COMMON OBS,TIME,XM,XMI,Z,NOBS,NSTAT,NX,NX1,NV,NF,Nl,N2INTEGER NTDIMENSION B(30),Bl (30),BT(NT),BX(5),BX1(5),TEMP(5),DRES(l000)DO I J=l,Nl1 B(J)=BT(J)DO 2 J=l,N22 lil(J)=BT041+J)F=0.000anDO 100 I=l,NOBSDO 12 J=1,NSTAT-IBX(J)=0.ODODO 10 M=1,NX10 BX(J)=BX(fl+XM(I,My°B(M+ (J.1)NX)12 CONTINUEBXO1STAT)=0.ODODO 18J=I,NSTATBXI(J)=0.ODODO 16 M=I,NVBXI(J)=BX1(J)+XMIO,MrBl(M±(J.1)*NV)16 CONTINUE18 CONTINUEIF (NF.EQ.0) GO To 25TEMPP=0.ODODO 22 M=1,NFTEMPP=TEMPP+XMI(1,NV+M)5B1(M+NSTATNV)22 CONTINUEDO 24]=I,NSTATBX1(J)=BX1(J)+TEMPP24 CONTINUE25 CONTINUECALL AEXP(BX1(1),TEMP(1))TEMP1 =BX(1)+OBS(I)5BXl(l).TIME(1)8EMP(1)TEMPPI=BX(1)DO 30 J=2,NflATIF (BX(J).GT.BX(J-1)) TEMPPI =BX(J)CALL AEXP(BXI(J),TEMP(J))TEMP12=(rEMP(J).TEMP(J1fl*rIME(1)TEMP12=(BXQ)-BX(3-1))+OBS(I)5(BXI(J)- I(J.I))-TEMP12IF (TEMPI2.GT.0.ODO) TEMPI =.BX(J)+OBS(I)*BX1Q)TIME(I)*TEMP(J)30 CONTINUETEMP2=0.ODOTEMPP2=0.ODODO 40 J=1,NSTATCALL AEXP((BX(J)-TEMPPI),TEMPP2I)TEMPP2=TEMPF2+TEMPP2ITEMPI2=BX(J)+OBS(I)*BXI (J)-11ME(IflEMI%I)TEMPI2=TEMP12-TEMPICALL AEXP(rEMPI2,TEMP22)TEMP2=TEMP2+TEMP2240 CONTINUEDRES(1)=TEMPI +DLOO(TEMF2)-TEMPPI-DLOO(TEMPP2)F=F+DRES(1)100 CONTINUEF=-FREFURNEND2. 
Fortran program for computing the maximum likelihood estimates of the mixed logistic regression model.

      PROGRAM BINMIX
C
C     This code finds maximum likelihood estimates of the parameters for
C     the mixed binomial regression model. Observed data should be
C     associated with n(i), which is the number of total trials related
C     to observation i.
C     In this code we allow common regression coefficients to be chosen
C     for different components. NVAR = # of different coefficients
C                               NCOM = # of common coefficients
C     If NCOM = 0, this is the most general case. Note that this code
C     does not impose any restriction on mixing probabilities.
C     The program gives the estimated standard errors from the
C     quasi-Newton approach.
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION BGUESS(30),OB(30),H(30,30),AGUESS(30),OA(30)
      DIMENSION FIT(1000,13),RES(1000,8),TB(30),TDRES(1000),DRES(1000)
      INTEGER N1,N2,IH1,IH2,MON,NEVALS,IFAIL,NSTEP,NSTEP1,N,IH3
      integer ntol,iter
      dimension tobs(1000),ttime(1000),txm(1000,8),
     c txm1(1000,8),res1(1000),sign(1000)
      dimension opar(30),se(30),w(1000)
      OPEN( UNIT=1,FILE='Iostl')
      OPEN( UNIT=2,FILE='residsaY')
      open( unit=3,file='IiIres')
      open( unit=8,file='fitout')
      OPEN( UNIT=7,FILE='result')
      OPEN( UNIT=9,FILE='Idataout')
      READ(1,100) NOBS,NSTAT,NX,NX1,NVAR,NCOM
      write(*,100) nobs,nstat,nx,nx1,NVAR,NCOM
  100 FORMAT(6I5)
      N1=(NSTAT-1)*NX
      N2=NSTAT*NVAR+NCOM
      N=N1+N2
      READ(1,113) (OBS(I),TIME(I), I=1,NOBS)
  110 FORMAT(F10.5)
      write(*,113) (obs(i),time(i),i=1,nobs)
      do 111 i=1,nobs
      READ (1,112) (XM(I,J),J=1,NX)
      write(*,112) (xm(i,j),j=1,nx)
  111 continue
  112 FORMAT(6G16.8)
  113 FORMAT(2F10.5)
      READ(1,110) (BGUESS(I),I=1,N1)
      write(9,110) (bguess(i),i=1,n1)
      DO 115 I=1,NOBS
      READ(1,112) (XM1(I,J),J=1,NX1)
      write (*,112) (xm1(i,j),j=1,nx1)
  115 CONTINUE
      READ(1,110) (AGUESS(I),I=1,N2)
      WRITE(9,110) (AGUESS(I),I=1,N2)
      do 118 i=1,nobs
      tobs(i)=obs(i)
      ttime(i)=time(i)
      do 116 j=1,nx
  116 txm(i,j)=xm(i,j)
      do 117 j=1,nx1
  117 txm1(i,j)=xm1(i,j)
  118 continue
      NEVALS=1000
      IH1=N1
      IH2=N2
      IH3=N
      MON=1
      TOL=0.0D01
      tol1=0.0D01
      OBSINF=0.0D0
      DEV=0.0D0
      DO 120 I=1,NOBS
  120 TDRES(I)=0.0D0
      do 121 i=1,nobs
  121 w(i)=0.0D0
      DO 150 I=1,NOBS
      IF (OBS(I).EQ.0.0D0) GO TO 150
      IF (OBS(I).EQ.TIME(I)) GO TO 150
      TDRES(I)=OBS(I)*DLOG(OBS(I))-TIME(I)*DLOG(TIME(I))
     C +(TIME(I)-OBS(I))*DLOG(TIME(I)-OBS(I))
      DEV=DEV+TDRES(I)
      TEMPSUM=0.0D0
      NSTEP=INT(OBS(I))
      NSTEP1=INT(TIME(I))
      DO 142 J=NSTEP+1,NSTEP1
      TEMPSUM=TEMPSUM+DLOG(DFLOAT(J))
  142 CONTINUE
      OBSINF=OBSINF+TEMPSUM
      TEMPSUM=0.0D0
      NSTEP=INT(TIME(I)-OBS(I))
      DO 144 J=1,NSTEP
      TEMPSUM=TEMPSUM+DLOG(DFLOAT(J))
  144 CONTINUE
      OBSINF=OBSINF+TEMPSUM
  150 CONTINUE
      ntol=nobs
      odev=dev
      do 888 iter=0,ntol
      prel=-(10.0D0**10.0D0)
  200 DO 202 I=1,N1
  202 OB(I)=BGUESS(I)
      DO 205 I=1,N2
  205 OA(I)=AGUESS(I)
      CALL ESTEP(N1,N2,BGUESS,AGUESS)
      CALL MSTEP1(N2,AGUESS,H,P,IH2,NEVALS,IFAIL,MON)
      CALL MSTEP2(N1,BGUESS,H,P,IH1,NEVALS,IFAIL,MON)
      SSR1=0.0D0
      SSR2=0.0D0
      DO 210 I=1,N1
      SSR1=SSR1+(OB(I)-BGUESS(I))**2.0D0
  210 CONTINUE
      DO 220 I=1,N2
  220 SSR2=(OA(I)-AGUESS(I))**2.0D0+SSR2
 1111 format(4x,2G16.8)
      do 230 i=1,n1
  230 tb(i)=bguess(i)
      do 240 i=1,n2
  240 tb(i+n1)=aguess(i)
      call llikely(n,tb,f)
      temp=-f-prel
      if (temp.lt.tol1) go to 368
      prel=-f
      if (iter.gt.0) go to 363
      f0=-f
      f=-f-obsinf
      write(9,1111) f,f0
      write(*,1111) f,f0
  363 IF ((SSR1.GT.TOL).OR.(SSR2.GT.TOL)) GO TO 200
  368 call qnewton(n,tb,h,f,nevals,ifail,mon)
      if (iter.eq.0) call flikely(n,tb,f,dres)
      if (iter.gt.0) call llikely(n,tb,f)
      do 375 i=1,n1
  375 bguess(i)=tb(i)
      do 376 i=1,n2
  376 aguess(i)=tb(i+n1)
      DEV=2*(DEV-(-F))
      if (iter.eq.0) tdev=dev
      if (iter.gt.0) go to 600
      f0=-f
      f=-f-obsinf
      write(9,1111) f,f0
      call estep(n1,n2,bguess,aguess)
      call gfit(n1,n2,bguess,aguess,XSQR,fit,RES)
      temp=dfloat(n1+n2)
      do 378 k=1,n1
      opar(k)=bguess(k)
      se(k)=(h(k,k)**(0.5D0))*temp
  378 continue
      do 379 k=1,n2
      opar(k+n1)=aguess(k)
      se(k+n1)=(h(k+n1,k+n1)**(0.5D0))*temp
  379 continue
      write(9,4444)
 4444 format(4x,'goodness of fit--XSQR, total deviance')
      write(9,7777) XSQR, DEV
      do 380 i=1,nobs
C     IF (OBS(I).EQ.0.0D0) GO TO 380
C     IF (OBS(I).EQ.TIME(I)) GO TO 380
      TEMP=tobs(i)-FIT(I,3)
      IF (TEMP.EQ.0.0D0) sign(I)=0.0D0
      IF (TEMP.NE.0.0D0) sign(i)=TEMP/(ABS(TEMP))
      DRES(I)=(2*(TDRES(I)-DRES(I)))**0.5D0
      DRES(I)=sign(i)*DRES(I)
  380 CONTINUE
      do 385 i=1,nobs
      write(8,398) (FIT(I,J),J=1,2+2*NSTAT+1)
      WRITE(2,398) DRES(i), (RES(I,J),J=1,1+NSTAT)
  385 continue
  398 format(6g18.7)
      do 500 i=1,nobs
      write(7,112) (z(i,j),j=1,nstat)
  500 continue
      WRITE(9,5555)
      WRITE(9,7777) (BGUESS(I),(h(i,i)**0.5D0),I=1,N1)
      write(*,7777) (bguess(i),i=1,n1)
      write(9,6666)
      write(9,7777) (AGUESS(I),(h(i+n1,i+n1)**0.5D0),I=1,N2)
      write(*,7777) (aguess(i),i=1,n2)
      go to 603
  600 res1(iter)=tdev-dev
      do 601 k=1,n1
      w(iter)=w(iter)+abs(bguess(k)-opar(k))/se(k)
      bguess(k)=opar(k)
  601 continue
      do 602 k=1,n2
      w(iter)=w(iter)+abs(aguess(k)-opar(k+n1))/se(k+n1)
      aguess(k)=opar(k+n1)
  602 continue
      if (iter.eq.ntol) go to 889
  603 nobs=ntol-1
      dev=odev-tdres(iter+1)
      if (iter.eq.0) go to 614
      do 610 k=1,iter
      obs(k)=tobs(k)
      time(k)=ttime(k)
      do 605 j=1,nx
  605 xm(k,j)=txm(k,j)
      do 606 j=1,nx1
  606 xm1(k,j)=txm1(k,j)
  610 continue
      if (iter.eq.nobs) go to 888
  614 do 620 k=iter+1,nobs
      obs(k)=tobs(k+1)
      time(k)=ttime(k+1)
      do 615 j=1,nx
  615 xm(k,j)=txm(k+1,j)
      do 616 j=1,nx1
  616 xm1(k,j)=txm1(k+1,j)
  620 continue
  888 continue
  889 do 900 i=1,ntol
      res1(i)=sign(i)*(res1(i)**(0.5D0))
      write(3,7778) dres(i),res(i,1),res1(i),w(i)
  900 continue
 5555 format(4x,'beta-vector')
 6666 format(4x,'alpha-vector')
 7777 format(4x,2g16.8)
 7778 format(4x,4g16.8)
 9999 STOP
      END
      SUBROUTINE FUNCT(N,B,P)
C     This subroutine computes the value of function Q1 in Chapter 3.
C     Data input: N = dimension of vector B;
C                 B = beta vector;
C     Output: P = the function value Q1(B).
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N
      DIMENSION B(N),BX(5)
      P=0.0D0
      DO 100 I=1,NOBS
      DO 8 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 6 M=1,NX
    6 BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
    8 CONTINUE
      BX(NSTAT)=0.0D0
      TEMPMAX=BX(1)
C
C     Loop 20 finds the largest BX(J), TEMPMAX.
C
      DO 20 J=2,NSTAT
      IF (BX(J).GT.BX(J-1)) TEMPMAX=BX(J)
   20 CONTINUE
      TEMPSUM=0.0D0
      DO 30 J=1,NSTAT
      P=P+Z(I,J)*BX(J)
      CALL AEXP((BX(J)-TEMPMAX),TEMP3)
      TEMPSUM=TEMPSUM+TEMP3
   30 CONTINUE
      P=P-TEMPMAX-DLOG(TEMPSUM)
  100 CONTINUE
      P=-P
      RETURN
      END
      SUBROUTINE GRAD(N,B,G)
C     This subroutine computes the first derivative of Q1 (see eqn 3.18).
C     Data input: N = dimension of vector B;
C                 B = beta vector;
C     Output: G = the derivative of Q1 at B.
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N
      DIMENSION G(25),B(N),TEMP(25),BX(5)
      DO 5 I=1,N
    5 G(I)=0.0D0
      DO 100 I=1,NOBS
      DO 20 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
   10 BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   20 CONTINUE
      BX(NSTAT)=0.0D0
      TEMPMAX=BX(1)
C
C     Loop 30 finds the largest BX(J), TEMPMAX.
C
      DO 30 J=2,NSTAT
      IF (BX(J).GT.BX(J-1)) TEMPMAX=BX(J)
   30 CONTINUE
      TEMPSUM=0.0D0
      DO 40 J=1,NSTAT
      BX(J)=BX(J)-TEMPMAX
      CALL AEXP(BX(J),TEMP(J))
      TEMPSUM=TEMPSUM+TEMP(J)
   40 CONTINUE
      DO 60 J=1,NSTAT-1
      TEMPPRO=TEMP(J)/TEMPSUM
      DO 50 M=1,NX
      G(M+(J-1)*NX)=G(M+(J-1)*NX)+XM(I,M)*(Z(I,J)-TEMPPRO)
   50 CONTINUE
   60 CONTINUE
  100 CONTINUE
      DO 200 I=1,N
  200 G(I)=-G(I)
      RETURN
      END
      SUBROUTINE MSTEP2(N,B,H,P0,IH,NEVALS,IFAIL,MON)
C
C     This subroutine is a quasi-Newton algorithm (Nash, 1990) which
C     maximizes the function Q1.
C     Data input: N = dimension of vector B; B = beta vector;
C                 IH = dimension of the Hessian matrix;
C                 NEVALS = # of evaluations for the function Q1;
C     Output: H = the Hessian matrix; P0 = maximum value;
C             B = optimal values of beta vector.
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION B(N), H(IH,N)
      DIMENSION X(30), C(30), G(30), T(30)
      DOUBLE PRECISION K
      INTEGER COUNT
      DATA W,TOL/0.2,1.0D-4/,EPS/1.0D-6/
      IF (N.LT.0.OR.N.GT.23) GO TO 160
      IFN = N+1
      IG = 1
      RLIM=7.2*(10.0D0**74.0)
      CALL FUNCT(N,B,P0)
      IF (P0.GT.RLIM) GO TO 180
      CALL GRAD(N,B,G)
C
C     RESET HESSIAN
C
   10 DO 30 I = 1,N
      DO 20 J = 1,N
   20 H(I,J) = 0.0D0
   30 H(I,I) = 1.0D0
      ILAST = IG
C
C     TOP OF ITERATION
C
   40 DO 50 I = 1,N
      X(I) = B(I)
   50 C(I) = G(I)
C
C     FIND SEARCH DIRECTION T
C
      D1 = 0.0D0
      SN = 0.0D0
      DO 70 I = 1,N
      S = 0.0D0
      DO 60 J = 1,N
   60 S = S-H(I,J)*G(J)
      T(I) = S
      SN = SN+S*S
   70 D1 = D1-S*G(I)
C
C     CHECK IF DOWNHILL
C
      IF (D1.LE.0.0D0) GO TO 10
C
C     SEARCH ALONG T
C
      SN = 0.5D0/DSQRT(SN)
      K = DMIN1(1.0D0,SN)
   80 COUNT = 0
      DO 90 I = 1,N
      B(I) = X(I)+K*T(I)
      IF (DABS(B(I)-X(I)).LT.EPS) COUNT = COUNT+1
   90 CONTINUE
C
C     CHECK IF CONVERGED
C
      IF (COUNT.EQ.N) GO TO 150
      CALL FUNCT(N,B,P)
      IFN = IFN+1
      IF (IFN.GE.NEVALS) GO TO 170
      IF (P.LT.P0-D1*K*TOL) GO TO 100
      K = W*K
      GO TO 80
C
C     NEW LOWEST VALUE
C
  100 P0 = P
      IG = IG+1
      CALL GRAD(N,B,G)
      IFN = IFN+N
C
C     UPDATE HESSIAN
C
      D1 = 0.0D0
      DO 110 I = 1,N
      T(I) = K*T(I)
      C(I) = G(I)-C(I)
  110 D1 = D1+T(I)*C(I)
C
C     CHECK IF +VE DEF ADDITION
C
      IF (D1.LE.0.0D0) GO TO 10
      D2 = 0.0D0
      DO 130 I = 1,N
      S = 0.0D0
      DO 120 J = 1,N
  120 S = S+H(I,J)*C(J)
      X(I) = S
  130 D2 = D2+S*C(I)
      D2 = 1+D2/D1
      DO 140 I = 1,N
      DO 140 J = 1,N
  140 H(I,J) = H(I,J)-(T(I)*X(J)+T(J)*X(I)-D2*T(I)*T(J))/D1
      GO TO 40
  150 IFAIL = 0
C     SUCCESSFUL CONCLUSION
      RETURN
  160 IFAIL = 1
C     N OUT OF RANGE
      RETURN
  170 IFAIL = 2
C     TOO MANY FUNCTION EVALUATIONS
      RETURN
  180 IFAIL = 3
C     INITIAL POINT INFEASIBLE
      RETURN
 2005 FORMAT( 2X,3G16.4)
      END
      SUBROUTINE AEXP(X,F)
C     This subroutine computes an
C     exponential function value.
C     Data input: X = real number;
C     Output: F = exp(X).
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NSTEP
      TEMP1=ABS(X)
      IF (TEMP1.GT.79.9) GO TO 50
      F=DEXP(X)
      GO TO 200
   50 IF (X.LT.-79.9) GO TO 150
      IF (X.GT.150.0D0) X=150.0D0
      F=1.0D0+X
      NSTEP=1
      FACTOR=1.0D0
      TEMP1=DFLOAT(NSTEP)
      TEMP2=X/TEMP1
  100 IF (TEMP2.LT.1.0D0) GO TO 200
      NSTEP=NSTEP+1
      TEMP1=DFLOAT(NSTEP)
      FACTOR=X/TEMP1
      TEMP2=TEMP2*FACTOR
      F=F+TEMP2
      GO TO 100
  150 F=0.0D0
  200 RETURN
      END
      SUBROUTINE FUNCT1(N,B,P)
C     This subroutine computes the value of function Q2 in Chapter 3.
C     Data input: N = dimension of vector B;
C                 B = alpha vector;
C     Output: P = the function value Q2(B).
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N
      DIMENSION B(N),BX(5)
      P=0.0D0
      TEMP1=1.0D0
      DO 100 I=1,NOBS
C
C     Loop 8 computes BX(J) for variable coefficient part.
C
      DO 8 J=1,NSTAT
      BX(J)=0.0D0
      DO 6 M=1,NVAR
    6 BX(J)=BX(J)+XM1(I,M)*B(M+(J-1)*NVAR)
    8 CONTINUE
      IF (NCOM.EQ.0) GO TO 11
      DO 10 J=1,NSTAT
      DO 9 M=1,NCOM
      BX(J)=BX(J)+XM1(I,NVAR+M)*B(NVAR*NSTAT+M)
    9 CONTINUE
   10 CONTINUE
   11 CONTINUE
      DO 20 J=1,NSTAT
      IF (BX(J).LT.0.0D0) GO TO 15
      CALL AEXP(-BX(J),TEMP)
      P=P+Z(I,J)*((OBS(I)-TIME(I))*BX(J)-TIME(I)*DLOG(TEMP1+TEMP))
      GO TO 20
   15 CALL AEXP(BX(J),TEMP)
      P=P+Z(I,J)*(OBS(I)*BX(J)-TIME(I)*DLOG(TEMP1+TEMP))
   20 CONTINUE
  100 CONTINUE
      P=-P
      RETURN
      END
      SUBROUTINE GRAD1(N,B,G)
C     This subroutine computes the first derivative of Q2 (see eqn 3.19).
C     Data input: N = dimension of vector B;
C                 B = alpha vector;
C     Output: G = the derivative of Q2 at B.
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N
      DIMENSION G(30),B(N),BX(5)
      TEMP1=1.0D0
      DO 5 I=1,N
    5 G(I)=0.0D0
      DO 100 I=1,NOBS
C
C     Loop 20 computes BX(J) for variable coefficient part.
C
      DO 20 J=1,NSTAT
      BX(J)=0.0D0
      DO 10 M=1,NVAR
   10 BX(J)=BX(J)+XM1(I,M)*B(M+(J-1)*NVAR)
   20 CONTINUE
      IF (NCOM.EQ.0) GO TO 25
      DO 24 J=1,NSTAT
      DO 22 M=1,NCOM
      BX(J)=BX(J)+XM1(I,NVAR+M)*B(NSTAT*NVAR+M)
   22 CONTINUE
   24 CONTINUE
   25 CONTINUE
C
C     Loop 40 computes the gradient.
C
      DO 40 J=1,NSTAT
      IF (BX(J).GT.0.0D0) GO TO 35
      CALL AEXP(BX(J),TEMP)
      DO 30 M=1,NVAR
      G(M+(J-1)*NVAR)=G(M+(J-1)*NVAR)
     C +Z(I,J)*XM1(I,M)*(OBS(I)-TIME(I)*TEMP/(TEMP1+TEMP))
   30 CONTINUE
      GO TO 40
   35 CALL AEXP(-BX(J),TEMP)
      DO 36 M=1,NVAR
      G(M+(J-1)*NVAR)=G(M+(J-1)*NVAR)
     C +Z(I,J)*XM1(I,M)*(OBS(I)-TIME(I)/(TEMP1+TEMP))
   36 CONTINUE
   40 CONTINUE
      IF (NCOM.EQ.0) GO TO 81
      DO 80 M=1,NCOM
      DO 60 J=1,NSTAT
      IF (BX(J).GT.0.0D0) GO TO 55
      CALL AEXP(BX(J),TEMP)
      G(M+NVAR*NSTAT)=G(M+NVAR*NSTAT)
     C +Z(I,J)*XM1(I,M+NVAR)*(OBS(I)-TIME(I)*TEMP/(TEMP1+TEMP))
      GO TO 60
   55 CALL AEXP(-BX(J),TEMP)
      G(M+NVAR*NSTAT)=G(M+NVAR*NSTAT)
     C +Z(I,J)*XM1(I,M+NVAR)*(OBS(I)-TIME(I)/(TEMP1+TEMP))
   60 CONTINUE
   80 CONTINUE
   81 CONTINUE
  100 CONTINUE
      DO 200 I=1,N
  200 G(I)=-G(I)
      RETURN
      END
      SUBROUTINE MSTEP1(N,B,H,P0,IH,NEVALS,IFAIL,MON)
C
C     This subroutine is a quasi-Newton algorithm (Nash, 1990) which
C     maximizes the function Q2.
C     Data input: N = dimension of vector B; B = alpha vector;
C                 IH = dimension of the corresponding Hessian matrix;
C                 NEVALS = # of evaluations for the function Q2;
C     Output: H = the Hessian matrix; P0 = maximum value;
C             B = optimal values of alpha vector.
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION B(N), H(IH,N)
      DIMENSION X(30), C(30), G(30), T(30)
      DOUBLE PRECISION K
      INTEGER COUNT
      DATA W,TOL/0.2,1.0D-4/,EPS/1.0D-6/
      IF (N.LT.0.OR.N.GT.23) GO TO 160
      IFN = N+1
      IG = 1
      RLIM=7.2*(10.0D0**74.0)
      CALL FUNCT1(N,B,P0)
      IF (P0.GT.RLIM) GO TO 180
      CALL GRAD1(N,B,G)
C
C     RESET HESSIAN
C
   10 DO 30 I = 1,N
      DO 20 J = 1,N
   20 H(I,J) = 0.0D0
   30 H(I,I) = 1.0D0
      ILAST = IG
C
C     TOP OF ITERATION
C
   40 DO 50 I = 1,N
      X(I) = B(I)
   50 C(I) = G(I)
C
C     FIND SEARCH DIRECTION T
C
      D1 = 0.0D0
      SN = 0.0D0
      DO 70 I = 1,N
      S = 0.0D0
      DO 60 J = 1,N
   60 S = S-H(I,J)*G(J)
      T(I) = S
      SN = SN+S*S
   70 D1 = D1-S*G(I)
C
C     CHECK IF DOWNHILL
C
      IF (D1.LE.0.0D0) GO TO 10
C
C     SEARCH ALONG T
C
      SN = 0.5D0/DSQRT(SN)
      K = DMIN1(1.0D0,SN)
   80 COUNT = 0
      DO 90 I = 1,N
      B(I) = X(I)+K*T(I)
      IF (DABS(B(I)-X(I)).LT.EPS) COUNT = COUNT+1
   90 CONTINUE
C
C     CHECK IF CONVERGED
C
      IF (COUNT.EQ.N) GO TO 150
      CALL FUNCT1(N,B,P)
      IFN = IFN+1
      IF (IFN.GE.NEVALS) GO TO 170
      IF (P.LT.P0-D1*K*TOL) GO TO 100
      K = W*K
      GO TO 80
C
C     NEW LOWEST VALUE
C
  100 P0 = P
      IG = IG+1
      CALL GRAD1(N,B,G)
      IFN = IFN+N
C
C     UPDATE HESSIAN
C
      D1 = 0.0D0
      DO 110 I = 1,N
      T(I) = K*T(I)
      C(I) = G(I)-C(I)
  110 D1 = D1+T(I)*C(I)
C
C     CHECK IF +VE DEF ADDITION
C
      IF (D1.LE.0.0D0) GO TO 10
      D2 = 0.0D0
      DO 130 I = 1,N
      S = 0.0D0
      DO 120 J = 1,N
  120 S = S+H(I,J)*C(J)
      X(I) = S
  130 D2 = D2+S*C(I)
      D2 = 1+D2/D1
      DO 140 I = 1,N
      DO 140 J = 1,N
  140 H(I,J) = H(I,J)-(T(I)*X(J)+T(J)*X(I)-D2*T(I)*T(J))/D1
      GO TO 40
  150 IFAIL = 0
C     SUCCESSFUL CONCLUSION
      RETURN
  160 IFAIL = 1
C     N OUT OF RANGE
      RETURN
  170 IFAIL = 2
C     TOO MANY FUNCTION EVALUATIONS
      RETURN
  180 IFAIL = 3
C     INITIAL POINT
C     INFEASIBLE
      RETURN
 2005 FORMAT( 2X,3G16.4)
      END
      SUBROUTINE ESTEP(N1,N2,B,B1)
C
C     This subroutine executes the E-step of the EM algorithm.
C     Data input: N1 = dimension of vector B;
C                 N2 = dimension of vector B1;
C                 B = beta vector;
C                 B1 = alpha vector.
C     Output: updated posterior probabilities, Z(I,J).
C
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N1,N2
      DIMENSION B(N1),B1(N2),TEMP(5),BX(5),BX1(5)
      SMALL0=10000000000.0D0
      SMALL0=1.0D0/SMALL0
      ONE=1.0D0
      DO 100 I=1,NOBS
C
C     Loop 12 computes BX(J).
C
      DO 12 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
      BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   10 CONTINUE
   12 CONTINUE
      BX(NSTAT)=0.0D0
C
C     Loop 21 computes BX1(J) for variable coefficient part.
C
      DO 21 J=1,NSTAT
      BX1(J)=0.0D0
      DO 20 M=1,NVAR
   20 BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NVAR)
   21 CONTINUE
      IF (NCOM.EQ.0) GO TO 24
      DO 23 J=1,NSTAT
      DO 22 M=1,NCOM
      BX1(J)=BX1(J)+XM1(I,M+NVAR)*B1(NSTAT*NVAR+M)
   22 CONTINUE
   23 CONTINUE
   24 CONTINUE
C
C     Loop 34 finds the largest item in exponential functions.
C
      IF (BX1(1).GT.0.0D0) GO TO 25
      CALL AEXP(BX1(1),TEMP1)
      TEMP(1)=BX(1)+OBS(I)*BX1(1)-TIME(I)*DLOG(ONE+TEMP1)
      GO TO 26
   25 CALL AEXP(-BX1(1),TEMP1)
      TEMP(1)=BX(1)+(OBS(I)-TIME(I))*BX1(1)-TIME(I)*DLOG(ONE+TEMP1)
   26 TEMPMAX=TEMP(1)
      DO 34 J=2,NSTAT
      IF (BX1(J).GT.0.0D0) GO TO 32
      CALL AEXP(BX1(J),TEMP1)
      TEMP(J)=BX(J)+OBS(I)*BX1(J)-TIME(I)*DLOG(ONE+TEMP1)
      GO TO 33
   32 CALL AEXP(-BX1(J),TEMP1)
      TEMP(J)=BX(J)+(OBS(I)-TIME(I))*BX1(J)-TIME(I)*DLOG(ONE+TEMP1)
   33 IF (TEMP(J).GT.TEMP(J-1)) TEMPMAX=TEMP(J)
   34 CONTINUE
C
C     Loops 40 and 50 compute Z(I,J) values.
C
      TEMPSUM=0.0D0
      DO 40 J=1,NSTAT
      TEMPP=TEMP(J)-TEMPMAX
      CALL AEXP(TEMPP,TEMP(J))
      TEMPSUM=TEMPSUM+TEMP(J)
   40 CONTINUE
      DO 50 J=1,NSTAT
      Z(I,J)=TEMP(J)/TEMPSUM
      IF (Z(I,J).LT.SMALL0) Z(I,J)=0.0D0
   50 CONTINUE
  100 CONTINUE
      RETURN
      END
      SUBROUTINE GFIT(N1,N2,B,B1,XSQR,FIT,RES)
C
C     This subroutine computes Pearson statistic, fitted values, Pearson
C     residuals.
C     Data input: N1 = dimension of vector B;
C                 N2 = dimension of vector B1;
C                 B = beta vector; B1 = alpha vector;
C     Output: XSQR = Pearson statistic;
C             FIT = fitted values including for each component;
C             RES = Pearson residuals including for each component;
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N1,N2
      DIMENSION B(N1),B1(N2),BX(5),BX1(5),TEMP(5)
      DIMENSION FIT(1000,13),RES(1000,8)
      ONE=1.0D0
      XSQR=0.0D0
      DO 100 I=1,NOBS
      FIT(I,1+1)=OBS(I)
      FIT(I,2+1)=0.0D0
C
C     Loop 12 computes BX(J).
C
      DO 12 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
      BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   10 CONTINUE
   12 CONTINUE
      BX(NSTAT)=0.0D0
      TEMP1=BX(1)
      DO 14 J=2,NSTAT
      IF (BX(J).GT.BX(J-1)) TEMP1=BX(J)
   14 CONTINUE
      TEMPD=0.0D0
      DO 16 J=1,NSTAT
      CALL AEXP((BX(J)-TEMP1),TEMP11)
      BX(J)=TEMP11
      TEMPD=TEMPD+TEMP11
   16 CONTINUE
      DO 18 J=1,NSTAT
      FIT(I,2+NSTAT+J+1)=BX(J)/TEMPD
   18 CONTINUE
C
C     Loop 21 computes BX1(J) for variable coefficient part.
C
      DO 21 J=1,NSTAT
      BX1(J)=0.0D0
      DO 20 M=1,NVAR
   20 BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NVAR)
   21 CONTINUE
      IF (NCOM.EQ.0) GO TO 24
      DO 23 J=1,NSTAT
      DO 22 M=1,NCOM
      BX1(J)=BX1(J)+XM1(I,M+NVAR)*B1(NSTAT*NVAR+M)
   22 CONTINUE
   23 CONTINUE
   24 CONTINUE
      E1=0.0D0
      DO 40 J=1,NSTAT
      IF (BX1(J).LT.0.0D0) GO TO 35
      CALL AEXP(-BX1(J),TEMPP)
      TEMP(J)=ONE/(ONE+TEMPP)
      TEMP(J)=TIME(I)*TEMP(J)*(ONE-TEMP(J))
      E1=E1+FIT(I,2+NSTAT+J+1)*TEMP(J)
      FIT(I,2+J+1)=TIME(I)/(ONE+TEMPP)
      GO TO 40
   35 CALL AEXP(BX1(J),TEMPP)
      TEMP(J)=TEMPP/(ONE+TEMPP)
      TEMP(J)=TIME(I)*TEMP(J)*(ONE-TEMP(J))
      E1=E1+FIT(I,2+NSTAT+J+1)*TEMP(J)
      FIT(I,2+J+1)=TIME(I)*TEMPP/(ONE+TEMPP)
   40 CONTINUE
      DO 42 J=1,NSTAT
      FIT(I,2+1)=FIT(I,2+1)+FIT(I,2+NSTAT+J+1)*FIT(I,2+J+1)
      RES(I,1+J)=(FIT(I,1+1)-FIT(I,2+J+1))*(TEMP(J)**(-0.5D0))
   42 CONTINUE
      E2=0.0D0
      DO 50 J=1,NSTAT
      E2=E2+FIT(I,2+NSTAT+J+1)*(FIT(I,2+J+1)**(2.0D0))
   50 CONTINUE
      E2=E1+E2-(FIT(I,2+1)**(2.0D0))
      FIT(I,1)=E2
      RES(I,1)=(FIT(I,1+1)-FIT(I,2+1))*(E2**(-0.5D0))
      XSQR=XSQR+(RES(I,1)**(2.0D0))
  100 CONTINUE
      RETURN
      END
      SUBROUTINE GLIKELY(N,TB,G)
C
C     This subroutine computes the first derivative of the observed log
C     likelihood function.
C     Data input: N = dimension of vector TB;
C                 TB = vector combining beta and alpha vectors;
C     Output: G = the derivative of the function at TB.
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),
     C Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N,N1,N2
      DIMENSION TB(N),B(25),B1(25),BX(5),BX1(5),TEMP(5),COM(5)
     C ,G(25),P(5)
      DO 1 I=1,N
    1 G(I)=0.0D0
      N1=(NSTAT-1)*NX
      N2=NSTAT*NVAR+NCOM
      DO 2 I=1,N1
    2 B(I)=TB(I)
      DO 3 I=1,N2
    3 B1(I)=TB(N1+I)
      ONE=1.0D0
      DO 100 I=1,NOBS
C
C     Loop 20 computes BX(J).
C
      DO 20 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
   10 BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   20 CONTINUE
      BX(NSTAT)=0.0D0
      DO 22 J=1,NSTAT
      BX1(J)=0.0D0
C
C     Loop 21 computes BX1(J) for variable coefficient part.
C
      DO 21 M=1,NVAR
      BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NVAR)
   21 CONTINUE
   22 CONTINUE
      IF (NCOM.EQ.0) GO TO 25
      DO 24 J=1,NSTAT
      DO 23 M=1,NCOM
      BX1(J)=BX1(J)+XM1(I,M+NVAR)*B1(M+NSTAT*NVAR)
   23 CONTINUE
   24 CONTINUE
   25 CONTINUE
      PMAX=BX(1)
      DO 26 J=2,NSTAT
      IF (BX(J).GT.BX(J-1)) PMAX=BX(J)
   26 CONTINUE
      PSUM=0.0D0
      DO 28 J=1,NSTAT
      CALL AEXP(BX(J),P(J))
      PSUM=PSUM+P(J)
   28 CONTINUE
C
C     CALCULATE MIXING PROBABILITIES Pj
C
      DO 29 J=1,NSTAT
   29 P(J)=P(J)/PSUM
C
C     CALCULATE BINOMIAL PARAMETERS THETAj
C
      DO 40 J=1,NSTAT
      IF (BX1(J).LT.0.0D0) GO TO 35
      CALL AEXP(-BX1(J),TEMPP)
      TEMP(J)=BX(J)+(OBS(I)-TIME(I))*BX1(J)-TIME(I)*DLOG(ONE+TEMPP)
      BX1(J)=1.0D0/(1.0D0+TEMPP)
      GO TO 40
   35 CALL AEXP(BX1(J),TEMPP)
      TEMP(J)=BX(J)+OBS(I)*BX1(J)-TIME(I)*DLOG(ONE+TEMPP)
      BX1(J)=TEMPP/(1.0D0+TEMPP)
   40 CONTINUE
      TEMPMAX=TEMP(1)
      DO 45 J=2,NSTAT
      IF (TEMP(J).GT.TEMP(J-1)) TEMPMAX=TEMP(J)
   45 CONTINUE
      TEMPSUM=0.0D0
      DO 48 J=1,NSTAT
      TEMPP=TEMP(J)-TEMPMAX
      CALL AEXP(TEMPP,COM(J))
      TEMPSUM=TEMPSUM+COM(J)
   48 CONTINUE
      DO 50 J=1,NSTAT
      COM(J)=COM(J)/TEMPSUM
   50 CONTINUE
      DO 70 J=1,NSTAT-1
      TEMPP=COM(J)-P(J)
      DO 65 M=1,NX
      G(M+NX*(J-1))=G(M+NX*(J-1))+XM(I,M)*TEMPP
   65 CONTINUE
   70 CONTINUE
      TEMPSUM=0.0D0
      DO 80 J=1,NSTAT
      TEMPP=COM(J)*(OBS(I)-TIME(I)*BX1(J))
      TEMPSUM=TEMPSUM+TEMPP
      DO 75 M=1,NVAR
      G(M+N1+NVAR*(J-1))=G(M+N1+NVAR*(J-1))+TEMPP*XM1(I,M)
   75 CONTINUE
   80 CONTINUE
      IF (NCOM.EQ.0) GO TO 100
      DO 90 M=1,NCOM
      G(M+N1+NVAR*NSTAT)=G(M+N1+NVAR*NSTAT)+
     C TEMPSUM*XM1(I,M+NVAR)
   90 CONTINUE
  100 CONTINUE
      do 200 i=1,n
  200 g(i)=-g(i)
      RETURN
      END
      SUBROUTINE LLIKELY(N,TB,F)
C
C     This subroutine computes the observed log likelihood value.
C     Data input: N = total dimension of vector TB;
C                 TB = vector combining beta and alpha vectors.
C     Output: F = the observed log likelihood value at TB.
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),
     C Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N1,N2
      DIMENSION TB(N),B(25),B1(25),BX(5),BX1(5),TEMP(5)
      N1=(NSTAT-1)*NX
      N2=NSTAT*NVAR+NCOM
      DO 1 I=1,N1
    1 B(I)=TB(I)
      DO 2 I=1,N2
    2 B1(I)=TB(N1+I)
      F=0.0D0
      ONE=1.0D0
      DO 100 I=1,NOBS
C
C     Loop 20 computes BX(J).
C
      DO 20 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
   10 BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   20 CONTINUE
      BX(NSTAT)=0.0D0
      DO 22 J=1,NSTAT
      BX1(J)=0.0D0
C
C     Loop 21 computes BX1(J) for variable coefficient part.
C
      DO 21 M=1,NVAR
      BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NVAR)
   21 CONTINUE
   22 CONTINUE
      IF (NCOM.EQ.0) GO TO 25
      DO 24 J=1,NSTAT
      DO 23 M=1,NCOM
      BX1(J)=BX1(J)+XM1(I,M+NVAR)*B1(M+NSTAT*NVAR)
   23 CONTINUE
   24 CONTINUE
   25 CONTINUE
      PMAX=BX(1)
      DO 26 J=2,NSTAT
      IF (BX(J).GT.BX(J-1)) PMAX=BX(J)
   26 CONTINUE
      PSUM=0.0D0
      DO 28 J=1,NSTAT
      bx(j)=bx(j)-pmax
      CALL AEXP(BX(J),TEMPP)
      PSUM=PSUM+TEMPP
   28 CONTINUE
      F=F-DLOG(PSUM)
      DO 40 J=1,NSTAT
      IF (BX1(J).LT.0.0D0) GO TO 35
      CALL AEXP(-BX1(J),TEMPP)
      TEMP(J)=BX(J)+(OBS(I)-TIME(I))*BX1(J)-TIME(I)*DLOG(ONE+TEMPP)
      GO TO 40
   35 CALL AEXP(BX1(J),TEMPP)
      TEMP(J)=BX(J)+OBS(I)*BX1(J)-TIME(I)*DLOG(ONE+TEMPP)
   40 CONTINUE
      TEMPMAX=TEMP(1)
      DO 45 J=2,NSTAT
      IF (TEMP(J).GT.TEMP(J-1)) TEMPMAX=TEMP(J)
   45 CONTINUE
      TEMPSUM=0.0D0
      DO 48 J=1,NSTAT
      TEMPP=TEMP(J)-TEMPMAX
      CALL AEXP(TEMPP,TEMP2)
      TEMPSUM=TEMPSUM+TEMP2
   48 CONTINUE
      F=F+TEMPMAX+DLOG(TEMPSUM)
  100 CONTINUE
      f=-f
      RETURN
      END
      SUBROUTINE QNEWTON(N,B,H,P0,NEVALS,IFAIL,MON)
C
C     This subroutine is a quasi-Newton algorithm (Nash, 1990) which
C     maximizes the observed log likelihood function.
C     Data input: N = dimension of vector B;
C                 B = vector combining beta and alpha vectors;
C                 IH = dimension of the corresponding Hessian matrix;
C                 NEVALS = # of evaluations for the observed log
C                          likelihood function;
C     Output: H = the Hessian matrix; P0 = maximum value;
C             B = optimal values of alpha vector
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),
     C Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION B(N), H(30,30)
      DIMENSION X(30), C(30), G(30), T(30)
      DOUBLE PRECISION K
      INTEGER COUNT,IH,N
      DATA W,TOL/0.2,1.0D-4/,EPS/1.0D-6/
      ih=n
      IF (N.LT.0.OR.N.GT.23) GO TO 160
      IFN = N+1
      IG = 1
      RLIM=7.2D0*(10.0D0**74.0D0)
      CALL LLIKELY(N,B,P0)
      IF (P0.GT.RLIM) GO TO 180
      CALL GLIKELY(N,B,G)
C
C     RESET HESSIAN
C
   10 DO 30 I = 1,N
      DO 20 J = 1,N
   20 H(I,J) = 0.0D0
   30 H(I,I) = 1.0D0
      ILAST = IG
C
C     TOP OF ITERATION
C
   40 DO 50 I = 1,N
      X(I) = B(I)
   50 C(I) = G(I)
C
C     FIND SEARCH DIRECTION T
C
      D1 = 0.0D0
      SN = 0.0D0
      DO 70 I = 1,N
      S = 0.0D0
      DO 60 J = 1,N
   60 S = S-H(I,J)*G(J)
      T(I) = S
      SN = SN+S*S
   70 D1 = D1-S*G(I)
C
C     CHECK IF DOWNHILL
C
      IF (D1.LE.0.0D0) GO TO 10
C
C     SEARCH ALONG T
C
      SN = 0.5D0/DSQRT(SN)
      K = DMIN1(1.0D0,SN)
   80 COUNT = 0
      DO 90 I = 1,N
      B(I) = X(I)+K*T(I)
      IF (DABS(B(I)-X(I)).LT.EPS) COUNT = COUNT+1
   90 CONTINUE
C
C     CHECK IF CONVERGED
C
      IF (COUNT.EQ.N) GO TO 150
      CALL LLIKELY(N,B,P)
      IFN = IFN+1
      IF (IFN.GE.NEVALS) GO TO 170
      IF (P.LT.P0-D1*K*TOL) GO TO 100
      K = W*K
      GO TO 80
C
C     NEW LOWEST VALUE
C
  100 P0 = P
      IG = IG+1
      CALL GLIKELY(N,B,G)
      IFN = IFN+N
C
C     UPDATE HESSIAN
C
      D1 = 0.0D0
      DO 110 I = 1,N
      T(I) = K*T(I)
      C(I) = G(I)-C(I)
  110 D1 = D1+T(I)*C(I)
C
C     CHECK IF +VE DEF ADDITION
C
      IF (D1.LE.0.0D0) GO TO 10
      D2 = 0.0D0
      DO 130 I = 1,N
      S = 0.0D0
      DO 120 J = 1,N
  120 S = S+H(I,J)*C(J)
      X(I) = S
  130 D2 = D2+S*C(I)
      D2 = 1+D2/D1
      DO 140 I = 1,N
      DO 140 J = 1,N
  140 H(I,J) = H(I,J)-(T(I)*X(J)+T(J)*X(I)-D2*T(I)*T(J))/D1
      GO TO 40
  150 IFAIL = 0
C     SUCCESSFUL CONCLUSION
      RETURN
  160 IFAIL = 1
C     N OUT OF RANGE
      RETURN
  170 IFAIL = 2
C     TOO MANY FUNCTION EVALUATIONS
      RETURN
  180 IFAIL = 3
C     INITIAL POINT INFEASIBLE
      RETURN
 2005 FORMAT( 2X,3G16.4)
      END
      SUBROUTINE FLIKELY(N,TB,F,DRES)
C
C     This subroutine computes the deviance residuals.
C     Data input: N = dimension of vector TB;
C                 TB = vector combining beta and alpha vectors;
C     Output: DRES = deviance residuals;
C             F = the observed log likelihood function value at TB.
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      INTEGER NOBS,NSTAT,NX,NX1,NVAR,NCOM
      DIMENSION OBS(1000),TIME(1000),XM(1000,8),XM1(1000,8),
     C Z(1000,5)
      COMMON OBS,TIME,XM,XM1,Z,NOBS,NSTAT,NX,NX1,NVAR,NCOM
      INTEGER N1,N2
      DIMENSION TB(N),B(25),B1(25),BX(5),BX1(5),TEMP(5),DRES(1000)
      N1=(NSTAT-1)*NX
      N2=NSTAT*NVAR+NCOM
      DO 1 I=1,N1
    1 B(I)=TB(I)
      DO 2 I=1,N2
    2 B1(I)=TB(N1+I)
      F=0.0D0
      ONE=1.0D0
      DO 100 I=1,NOBS
C
C     Loop 20 computes BX(J).
C
      DO 20 J=1,NSTAT-1
      BX(J)=0.0D0
      DO 10 M=1,NX
   10 BX(J)=BX(J)+XM(I,M)*B(M+(J-1)*NX)
   20 CONTINUE
      BX(NSTAT)=0.0D0
      DO 22 J=1,NSTAT
      BX1(J)=0.0D0
C
C     Loop 21 computes BX1(J) for variable coefficient part.
C
      DO 21 M=1,NVAR
      BX1(J)=BX1(J)+XM1(I,M)*B1(M+(J-1)*NVAR)
   21 CONTINUE
   22 CONTINUE
      IF (NCOM.EQ.0) GO TO 25
      DO 24 J=1,NSTAT
      DO 23 M=1,NCOM
      BX1(J)=BX1(J)+XM1(I,M+NVAR)*B1(M+NSTAT*NVAR)
   23 CONTINUE
   24 CONTINUE
   25 CONTINUE
      PMAX=BX(1)
      DO 26 J=2,NSTAT
      IF (BX(J).GT.BX(J-1)) PMAX=BX(J)
   26 CONTINUE
      PSUM=0.0D0
      DO 28 J=1,NSTAT
      bx(j)=bx(j)-pmax
      CALL AEXP(BX(J),TEMPP)
      PSUM=PSUM+TEMPP
   28 CONTINUE
      DRES(I)=-DLOG(PSUM)
      DO 40 J=1,NSTAT
      IF (BX1(J).LT.0.0D0) GO TO 35
      CALL AEXP(-BX1(J),TEMPP)
      TEMP(J)=BX(J)+(OBS(I)-TIME(I))*BX1(J)-TIME(I)*DLOG(ONE+TEMPP)
      GO TO 40
   35 CALL AEXP(BX1(J),TEMPP)
      TEMP(J)=BX(J)+OBS(I)*BX1(J)-TIME(I)*DLOG(ONE+TEMPP)
   40 CONTINUE
      TEMPMAX=TEMP(1)
      DO 45 J=2,NSTAT
      IF (TEMP(J).GT.TEMP(J-1)) TEMPMAX=TEMP(J)
   45 CONTINUE
      TEMPSUM=0.0D0
      DO 48 J=1,NSTAT
      TEMPP=TEMP(J)-TEMPMAX
      CALL AEXP(TEMPP,TEMP2)
      TEMPSUM=TEMPSUM+TEMP2
   48 CONTINUE
      DRES(I)=DRES(I)+TEMPMAX+DLOG(TEMPSUM)
      F=F+DRES(I)
  100 CONTINUE
      f=-f
      RETURN
      END
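
The guarded exponentials in the ESTEP, FUNCT1 and LLIKELY subroutines appear to rest on two algebraic identities, which may be easier to follow when written out. The symbols below are introduced here for illustration only and are assumed to map onto the listing as y_i = OBS(I), n_i = TIME(I), a_ij = BX(J) (with a_ic = 0 for the last of the c = NSTAT components), eta_ij = BX1(J) and m_i = TEMPMAX. The binomial coefficient is omitted, as it is in these subroutines.

```latex
\begin{align*}
% Binomial log-kernel of component j. The two forms are equal for every
% \eta_{ij}; the code uses the one whose exponent is non-positive.
\ell_{ij} &= y_i\,\eta_{ij} - n_i \log\bigl(1 + e^{\eta_{ij}}\bigr)
            && \text{(used when } \eta_{ij} < 0\text{)} \\
          &= (y_i - n_i)\,\eta_{ij} - n_i \log\bigl(1 + e^{-\eta_{ij}}\bigr)
            && \text{(used when } \eta_{ij} \ge 0\text{)} \\[4pt]
% Unnormalised log posterior weight of component j
t_{ij} &= a_{ij} + \ell_{ij} \\[4pt]
% E-step: posterior probabilities, invariant to the shift m_i
z_{ij} &= \frac{e^{\,t_{ij} - m_i}}{\sum_{k=1}^{c} e^{\,t_{ik} - m_i}} \\[4pt]
% Contribution of observation i to the observed log likelihood
\log L_i &= m_i + \log \sum_{k=1}^{c} e^{\,t_{ik} - m_i}
            - \log \sum_{k=1}^{c} e^{\,a_{ik}}
\end{align*}
```

Because the last two expressions hold for any shift m_i, the computed values do not depend on m_i being exactly the largest t_ik; a shift close to the maximum simply keeps the exponents small enough for AEXP to evaluate safely.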
Title | Mixed regression models for discrete data |
Creator | Wang, Peiming |
Date Issued | 1994 |
Description | The dissertation consists of two parts. In the first part we introduce and investigate a class of mixed Poisson regression models that include covariates in both mixing probabilities and Poisson rates. The proposed models generalize the usual Poisson regression in several ways, and can be used to adjust for extra-Poisson variation. The features of the models, identifiability, estimation methods based on the EM and quasi-Newton algorithms, properties of these estimates, model selection criteria and residual analysis are discussed. A Monte Carlo study investigates implementation and model choice issues. Several applications of this approach are analyzed. This analysis is compared to quasi-likelihood approaches. In the second part we introduce and investigate a class of mixed logistic regression models that include covariates in both mixing probabilities and binomial parameters with the logit link. The proposed models generalize the usual logistic regression in several ways, and can be used to adjust for extra-binomial variation. The features of the models, identifiability, estimation methods based on the EM and quasi-Newton algorithms, properties of these estimates, model selection criteria and residual analysis are discussed. A Monte Carlo study investigates implementation and model choice issues. An application of this approach is analyzed and the results are compared with those from quasi-likelihood approaches. The dissertation also discusses future research in these areas and provides FORTRAN codes for all computations required to apply the models. |
Extent | 5609440 bytes |
Genre | Thesis/Dissertation |
Type | Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-04-15 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0088106 |
URI | http://hdl.handle.net/2429/7176 |
Degree | Doctor of Philosophy - PhD |
Program | Business Administration |
Affiliation | Business, Sauder School of |
Degree Grantor | University of British Columbia |
GraduationDate | 1994-11 |
Campus | UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |