AN ORTHODOX BLUP APPROACH TO GENERALIZED LINEAR MIXED MODELS by Renjun Ma B.Sc, Wuhan University of Water Transportation Engineering, China, 1982 M.Sc, Wuhan University, China, 1987 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES Department of Statistics We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA April, 1999 ©Renjun Ma, 1999 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of The University of British Columbia Vancouver, Canada DE-6 (2/88) Abstract We introduce a new class of generalized linear mixed models assuming Tweedie exponential dispersion model distributions for both the response and the random ef-fects. This class of models accommodates a wide range of discrete, continuous and mixed data. By letting the random effects enter as weights as well as means in the conditional distributions, the variance matrix may be expressed as a sum of variance components. We consider an orthodox BLUP approach to parameter estimation and random effects prediction for this new class of models based on a predictor of the random effects that is truly best linear and unbiased, in contrast to the conventional BLUP which is the conditional mode. We obtain an optimal estimating equation based on the orthodox BLUP, which is solved by a modified Newton algorithm. This approach facilitates analysis of residuals and allows justification of asymptotic results under realistic conditions through standard estimating equation theory. An impor-tant feature of this approach is that the principal results depend only on the first and second moment assumptions of unobserved random effects. The common fitting algorithm based on orthodox BLUP enables us to study this new class of models as a single class, rather than as a collection of unrelated different models. This approach is illustrated with the analyses of seed germination data, epilepsy data and cake bak-ing data. By means of asymptotic justifications, simulations and worked examples, we conclude that the orthodox BLUP approach is of practical value for analysis of clustered non-normal data. ii Contents Table of Contents iii List of Tables vii List of Figures viii Acknowledgements x 1 Introduction 1 2 Preliminaries 7 2.1 Generalized linear models with random effects 7 2.1.1 Definitions 7 2.1.2 Tweedie exponential dispersion models 9 2.2 Prediction of random effects 11 2.2.1 Likelihoods 11 2.2.2 Penalized quasi-likelihood and h-likelihood approaches . . . . 12 2.2.3 Comparison for different predictors 13 2.3 Estimating functions . 15 2.3.1 Unbiased estimating functions 15 2.3.2 Optimal estimating functions 19 iii 2.3.3 Nuisance parameter case 19 2.4 Data examples 20 2.4.1 Epilepsy data . . 21 2.4.2 Seed germination data 23 2.4.3 Cake baking data 23 3 Tweedie mixed models 26 3.1 Models 26 3.2 Covariance structure • . 31 3.2.1 Derivation 31 3.2.2 Matrix expressions 33 4 Orthodox B L U P predictors of random effects 37 4.1 Orthodox BLUP 38 4.2 Random effects predictors 39 4.3 Mean squared distance 42 4.4 Consistency 45 5 Parameter estimation 49 5.1 Estimation of regression parameters 49 5.1.1 Estimated score function 49 5.1.2 Standard errors of regression parameter estimators 52 5.1.3 Newton scoring algorithm 52 5.1.4 Optimality 54 5.1.5 Derivation of matrix form of ib((3) 57 5.2 Estimation of dispersion parameters 61 5.2.1 Adjusted Pearson estimators 61 iv 5.2.2 Asymptotic properties 63 5.2.3 Heterogeneity 65 6 Residual analysis and computational procedure 67 6.1 Residual analysis 67 6.2 Computational procedure 69 7 Conventional, semiparametric and binomial mixed models 71 7.1 Conventional Tweedie mixed models 72 7.1.1 Covariance and variance function 73 7.1.2 Random effects predictors 75 7.1.3 Parameter estimation 77 7.1.4 Adjusted Pearson estimators 78 7.2 Random effects beyond Tweedie family 78 7.2.1 Nonparametric random effects 79 7.2.2 Log-normal random effects 81 7.2.3 Binomial mixed models 83 8 Data analysis 86 8.1 Count data 86 8.1.1 Overview of the previous analyses . . . 87 8.1.2 Poisson-Tweedie models 89 8.1.3 Model checking 91 8.1.4 Comparison of different approaches 94 8.1.5 Computational aspects 99 8.2 Binomial data 99 8.2.1 Paired Poisson-Tweedie models 100 v 8.2.2 Model checking 102 8.2.3 Comparison of different approaches . . . 102 8.3 Continuous data . 105 9 Simulation 112 9.1 Results over 100 simulations 112 9.1.1 Summary statistics 114 9.1.2 Confidence and prediction intervals 116 9.1.3 Normality of parameter estimates and random effects 117 9.2 Residual analysis 120 10 Discussion 124 10.1 Conclusion 124 10.2 Further study 127 10.2.1 More than two levels of random effects 127 10.2.2 Crossed designs 128 10.2.3 Survival data analysis 129 Appendix 132 A Data sets 132 Bibliography 136 v i List of Tables 2.1 Summary of Tweedie exponential dispersion models 10 8.1 HGLM parameter estimates for the epilepsy data 89 8.2 Parameter estimates for the epilepsy data 91 8.3 Parameter estimates for the seed germination data based on paired Poisson mixed models 100 8.4 Parameter estimates for the seed data based on binomial models. . . 102 8.5 Parameter estimates for cake baking data 107 9.1 Averages of parameter estimates over 100 simulations 114 9.2 Simulated and estimated standard errors 115 9.3 Mean squared errors over 100 simulations 115 9.4 Coverage counts of 95% confidence and prediction intervals 117 A. l Seed germination data (a)seeds: O. Aegyptiaca 75 and 73, (b)root extracts: bean and cucumber 132 A.2 Epilepsy data: successive two-week seizure counts for 59 epileptics, (a) Trt: treatment 0=placebo, l=progabide); (b) Base: eight-week baseline seizure counts; (c)Age (in years) 133 A.3 Cake baking data: breaking angles (degrees) 135 vii List of Figures 2.1 Overdispersion diagnostic plot for epilepsy data 22 2.2 Scatter plot of proportions of seeds germinated versus plates 24 2.3 Boxplots of cake baking data by batches 25 8.1 Profile plot of log transformed epilepsy seizure counts . 87 8.2 Normal plots of level 1, 2 and 3 residuals for epilepsy data 92 8.3 Scatter plots of level 2 and 3 residuals for epilepsy data 93 8.4 Heterogeneity plots for epilepsy data: (a) scatter plot of within subject sample variances versus sample means; (b) boxplots of within subject responses for all patients 97 8.5 Heterogeneity plot for epilepsy data 98 8.6 Normal plots of residuals for seed germination data 103 8.7 Scatter plots of residuals for seed germination data 104 8.8 Normal plot of ANOVA model residuals for cake baking data 108 8.9 Boxplots of cake baking data by temperature 109 8.10 Normal plots of residuals for cake baking data 110 8.11 Scatter plots of residuals for cake baking data I l l viii 9.1 Normal plots of parameters over 100 simulations: x and y axes are quantiles of standard normal distribution and ordered parameter esti-mates over 100 simulations, respectively. . ., 118 9.2 Plots for random effects predictors ui and u n over 100 simulations. . 119 9.3 Comparison of normal plots of level 1, 2 and 3 residuals for epilepsy data and simulated data 121 9.4 Comparison of histograms of level 1, 2 and 3 residuals for epilepsy data and simulated data 122 9.5 Comparison of scatter plots of level 2 and 3 residuals for epilepsy data and simulated data 123 ix Acknowledgements Foremost I would like to thank my supervisor, Dr. Bent J0rgensen, for his excel-lent guidance, for his inspiration, for his constant encouragement and for his great patience. I am very grateful to other members of my supervisory committee, Dr. John Petkau and Dr. Martin Puterman for their valuable comments and assistance. I am par-ticularly indebted to Dr. John Petkau for his careful reading of the manuscript; his helpful discussion and constructive suggestions lead to the substantial improvement of the presentation of this thesis. I wish to thank my program advisor, Dr. Jean Meloche, for helping me to adapt to the new environment. I would like to thank Dr. Nancy Heckman for her invaluable advice and help, for her constant encouragement and support throughout my program. Many thanks go to Dr. Harry Joe for his useful advice and help throughout my program. Many thanks go to Dr. Li Sun, Hongbin Zhang and Jeevanantham Rajeswaran for their valuable help with Latex and computer programming. I am very grateful to Christine Graham for her constant help and patience. I express my gratitude to Department of Theoretical Statistics, Aarhus University for providing me help and access to research facility during my academic visit. I also acknowledge the support of the University of British Columbia through a University Graduate Fellowship. Chapter 1 Introduction In recent years much effort has been devoted to extending regression methodology to clustered non-normal data. A wide range of applications of generalized linear mixed models to clustered data has been investigated by many researchers (Breslow and Clayton 1993; Lee and Nelder 1996). Traditional generalized linear models (McCul-lagh and Nelder 1989) have unified regression analysis for a variety of independent responses, but have difficulties in providing valid inference for clustered, and hence correlated, data. Clustered data usually exhibit significant heterogeneity and over-dispersion as well (cf. Engel 1987; McCullagh and Nelder 1989, p. 124-126, 198-200). Such cluster effects are often modelled by incorporating random effects into general-ized linear models. The application of generalized linear mixed models to clustered data has attracted much attention since the seventies (Crowder 1978; Laird 1978). However, the early development of generalized linear mixed models focussed mainly on relatively simple cases such as random intercept models (Hinde 1982; Williams 1982; Breslow 1984; Stiratelli, Laird and Ware 1984; Anderson and Aitkin 1985; Brillinger and Preisler 1 1986). The modelling of more complicated problems has largely been hampered by the intractability of high-dimensional integrals involved in evaluating the likelihood. To avoid these numerical problems, numerous alternative approaches to the analysis of clustered data have been proposed recently. The most popular approaches are generalized estimating equations (Liang and Zeger 1986; Zeger, Liang and Albert 1988), Bayesian methods using computational techniques (Zeger and Karim 1991), penalized quasi-likelihood (Breslow and Clayton 1993) and maximum hierarchical likelihood (Lee and Nelder 1996) methods. We briefly review these approaches and discuss their advantages and disadvantages. Generalized estimating equations (GEE) approach focuses on the marginal rela-tionship between covariates and clustered responses, but only estimates the covari-ances as nuisance parameters by adopting a so-called "working correlation". This approach enjoys robustness against mis-specification of covariances, but sometimes suffers from a lack of efficiency due to such incorrect specification (Lipsitz et al. 1994; Fitzmaurice 1995). It is of limited use when the correlation structure is of primary interest. This approach mainly deals with marginal models instead of random effects models. On the other hand, in areas such as genetics, it is often the unobserved genetic effects which are of primary interest (Harville 1977; Clayton 1991; Karim and Zeger 1992). Robinson (1991) presented a variety of applications where the random effects themselves are of interest; therefore random effects models are more relevant in such situations. As a general approach to complicated generalized linear mixed models, Zeger and Karim (1991) proposed to cast these models in a Bayesian framework and ap-proximate maximum likelihood estimates using flat or diffuse priors. However, such 2 approximations are often impossible because the posterior may not exist for such priors (Natarajan and McCulloch 1995; Hobert and Casella 1996). This problem of the posterior may not be detected when using computational techniques such as the Gibbs sampler; therefore misleading estimates may result. On the other hand, recent non-Bayesian approaches to generalized linear mixed models have focussed on the explicit or implicit modification of the E-step in the EM algorithm (Dempster, Laird and Rubin 1977) due to the difficulty evaluating the con-ditional expectation of the random effects given the data. The most widely adopted technique is the generalization of the linear mixed model equations of Henderson et al. (1975), but with various approximations (Gilmour, Anderson and Rae 1984; Harville and Mee 1984; Schall 1991; Breslow and Clayton 1993; Wolfinger 1993; McGilchrist 1994; Lee and Nelder 1996). It can be shown that these approaches essentially mod-ify the EM algorithm with the conditional expectation of the random effects given the data being replaced by the predictors based on the mode of the corresponding conditional distributions. These modal predictors are often referred to as the 'BLUP' (Schall 1991; McGilchrist 1994), although in general it is neither linear nor unbiased for non-normal distributions. In contrast to the modal predictor approaches, Bayesian approaches enjoy great flexibility in modelling the data with normal or non-normal random effects distribu-tions, but are computationally intensive. The computational time required is suf-ficiently long as to possibly discourage fitting several different models (Karim and Zeger 1992). This drawback may pose serious problems in practice because it is gen-erally difficult to justify a particular distribution for the unobserved random effects. Data will often point with almost equal emphasis at several different models and it is 3 important that we recognize these models and their possible different interpretations. On the other hand, the modal predictor approaches enable us to explore several dif-ferent models with reasonable computing time. The penalized quasi-likelihood (PQL) approach deals with models with approximate multivariate normal random effects. The maximum hierarchical likelihood (MHL) approach widens the choice of random effects distributions to include conjugate distributions (George et al. 1987), but fails to model the dependence structure of the random effects. The introduction of flex-ible, yet tractable distribution classes to model both the distributional shape and dependence structure of random effects is certainly needed to facilitate appropriate inference. The implementation of the Bayesian approach is relatively straightforward in con-trast to the requirement for manipulation of large matrices for modal predictor ap-proaches when there are a large number of random effects; however the assessment of convergence of computational techniques such as the Gibbs sampler remains an area of debate (Glifford 1993; Smith and Roberts 1993). On the other hand, non-Bayesian approaches provide a natural framework for model checking, but are forced to rely on asymptotics. In fact, the existing approaches to generalized linear mixed models mainly concentrate on the model fitting part, but to a large extent ignore another important ingredient of the modelling process, the model checking component (Lee and Nelder 1996). The justifications of these modal predictor approaches were generally intuitive un-til Breslow and Clayton (1993) presented some ad hoc justifications for their penalized quasi-likelihood approach and Lee and Nelder (1996) provided rigorous justification for their maximum hierarchical likelihood approach. The asymptotic results for quite 4 general settings are obtained under certain conditions by Nelder and Lee (1996), but with respect to large cluster sizes with fixed number of clusters. This raised con-cerns about more practical situations where the number of clusters is large, but with relatively small cluster sizes (Clayton 1996; Engel and Keen 1996). In addition, the estimating equations based on the modal preditors are generally biased (Breslow and Lin 1995; Lin and Breslow 1996a). This thesis considers generalized linear mixed models based on the class of Tweedie exponential dispersion model distributions (J0rgensen 1987a) for both the response and the random effects. This gives a very flexible class of models which includes various combinations of Poisson, normal, gamma, inverse Gaussian, compound Pois-son and extreme stable and positive stable distributions. In the context of clustered data, the hierarchical structure is very clear so that the modeled covariance struc-ture should clearly reflect those hierarchies. The dominant tradition in accounting for dependence between or within clusters is to explicitly incorporate random effects into a monotonic transform of the conditional mean ignoring the dispersion compo-nents. This approach generally does not lead to variance components decomposition structure for covariance matrix of the response. By incorporating correlated or un-correlated random effects into both mean and dispersion components, the covariance matrix of our model possesses an interpretable variance components decomposition. The novelty of our approach lies in the introduction of a new unbiased estimating equation, which is based on a modification of the EM algorithm where conditional expectations are approximated by an orthodox BLUP, in the context of generalized linear mixed models. An orthodox BLUP is defined as the best linear unbiased predic-tor in the literal sense, (cf. Brockwell and Davis 1991 p. 64). The unbiased estimating 5 equations introduced via the orthodox BLUP lead to consistent estimators for both regression and dispersion parameters under practical conditions where the number of clusters is large. The estimating equation for the regression parameters is optimal in the sense of Godambe (1976). While the parametric nature of our models facilitates residual analysis, our estimating procedure also allows a semi-parametric interpreta-tion of the models. Our approach does not require manipulation of large matrices; therefore is computationally simpler than the modal predictor approaches. This ap-proach is applicable to a wide range of clustered discrete, continuous and mixed data. The organization of this thesis is as follows. In Chapter 2, besides a brief intro-duction of Tweedie exponential dispersion model distributions and some estimating function results, we compare the orthodox BLUP and modal predictors and present . a few data examples to motivate our study. In Chapter 3, we propose a class of nested random effect models and derive their moment structures. The prediction of random effects and the consistency properties of the orthodox BLUP are discussed in Chapters 4. In Chapter 5, we discuss estimation for both regression and dispersion parameters as well as asymptotic properties of these parameter estimators. An out-line of the residual analysis and computational procedure is presented in Chapter 6. We address in detail a so-called conventional model and its relationship with other models in Chapter 7. Illustrative examples and simulations are presented in Chapters 8 and 9, respectively. We present a discussion in Chapter 10. 6 Chapter 2 Preliminaries 2.1 Generalized linear models with random effects 2.1.1 Definitions We study generalized linear models with random effects based on the class of Tweedie exponential dispersion model distributions. Here we first define exponential dispersion models. A random variable Y is said to follow reproductive exponential dispersion model ED(n, a2) if its probability density functions can be written in the form: P(V, , V) = ) exp{0[y rj - K(T,)]}, (2.1) where \x = E[Y] and a2 = 1/0. Let «'(•), the first derivative of K(-), be denoted by r(-). V(/JL) = T'{T~1 (fj,)} is called the variance function. Further, d(y; /*) = 2[y{T-\y) - r " 1 ^ ) } - /.{r" 1^)} + /.{r" 1^)}] (2.2) is known as the (unit) deviance function. The distribution of Z = <\>Y is called an additive exponential dispersion model, 7 denoted by Z ~ ED*(r],(f)). The justification of terminology 'reproductive' and 'ad-ditive' can be found in J0rgensen (1997, p l l ) . Now we can define a generalized linear model with random effects. Let Y = ( Y i i , V i m , Y m i , Y m n j n ) T be an n = YnLi n^-dimensional vector of observed responses. Let (3 = (Pi,3P)T be a p-dimensional vector of fixed effects and U = (Ui,Um)T be a m-dimensional vector of random effects. We suppose that, given U = u, Y u , Y i n i , Y m i , Y m n j n are conditionally in-dependent and Yij\U = u^ED(^,a2), (2.3) Let fx u = (/z^, . . . , / / i n i , ...,Aimi. - > / C n m ) T a n d l e t #(•) denote the link function. We denote ( g ^ ) , g ( t f n J , g ( ^ ) \ g ( ^ n J ) T by #(/i u). Suppose further that g(^) = Xf3 + Zu, (2.4) where X and Z are two known matrices. Then E [ F i j | U = u] = ^ , (2.5) The expectation and covariance matrix of U are denoted by the following equa-tions: E[U] = 77 and Var[U] = D ( 7 ) , (2.6) where D ( 7 ) is a q x q covariance matrix depending on an unknown vector of variance components 7 . 8 The expectations of U or transformed U are often assumed to be known. The transformed U is also often called random effects. 2.1.2 Tweedie exponential dispersion models Many exponential dispersion models have variance functions that are asymptotically of the Tweedie form, leading to a general convergence theorem with the Tweedie models as limiting distributions (J0rgensen et al. 1994). For this reason, Tweedie models occupy a central position among exponential dispersion models. Now we define Tweedie exponential dispersion models. We call (2.1) a reproductive Tweedie exponential dispersion model, denoted by Twp(/i, a2), if E(Y) = p and Var(Y) = a2pP. Here p = 0, 2, 3 and 1 < p < 2 correspond to well known normal, gamma, inverse Gaussian and compound Poisson distributions respectively. The case p = 1 with a2 = 1 corresponds to Poisson distributions. In fact, using usual notations, we have Twi(/i, 1) = Poisson(^ x), Tw2(//, cr2)= Gamma(/i, ff2), and Tw3(/i,a2) = Inverse Gaussian(//, a2). Furthermore Tweedie exponential dispersion models posses the following scale transformation property: cTwp{p, a2) = Twp(cp, c2'pa2). (2.7) A complete list of Tweedie exponential dispersion models is given in Table 2.1 where S, f2 and 0 denote the support, mean parameter domain and canonical pa-rameter domain of the Tweedie exponential dispersion model, respectively (J0rgensen 9 Table 2.1: Summary of Tweedie exponential dispersion models. Distributions P S 0 e Extreme stable p<0 R R + Ro Normal p = 0 R R R [Do not exist] 0 3 R+ R+ —Ro Extreme stable p = oo R R R_ Notation: —Ro = (—oo, 0] 1987a). To ease the derivation of the estimation function for regression parameter for our random effects models later, we rewrite the density of the Tweedie exponential dis-persion models Twp(/i, a2) as follows: fP(y; A*, o-2) = { c , ( 2 / ; ^ 2 ) e x P { ^ ( ^ - ^ ) } i f P 7 U , 2 , c 2(2/;^ 2)exp{-^ (j + log^)} iip = 2, Ci(y)exp{ylog/u - p,} if P = 1, where the explicit expressions for cp(y;a2) are given by J0rgensen (1987). The fact that cp(y; a2) do not depend on p is crucial to the derivation of our unbiased estimating function in Chapter 5. The exact expressions for cp(y; a2) are immaterial in the derivation of the unbiased estimating function, thus omitted. For more details about 10 Tweedie exponential dispersion model, see J0rgensen (1987, 1996). 2.2 Prediction of random effects The prediction of random effects plays an important role in random effects models. It is especially useful in the identification of outliers. The distinction between Bayesian and non-Bayesian approaches to generalized linear models with random effects is clear. Bayesian approaches use posterior means or modes as point estimators for both parameters and random effects. The regression coefficients and the random effect variance are assumed to be random vectors and treated symmetrically with the observed responses and unobserved random effects. On the other hand, non-Bayesian approaches such as penalized quasi-likelihood approach of Breslow and Clayton (1993) and the maximum h-likelihood estimation approach of Lee and Nelder (1996) predict random effects using the mode of the conditional distribution of the random effects given the data. We concentrate on making comparisons of these two approaches with our approach. 2.2.1 Likelihoods Before we make comparisons, we need to introduce the concept of the partially ob-served joint (log) likelihood as follows. Let Y be the response and U be the unobserved random effects. We assume that both the conditional distribution of Y given U and the distribution of U are parametric. The partially observed joint log likelihood with only the response being 11 observed is defined as £(/3, a, 7; Y, U) - a; Y|U) + £ ( T , U), where £((3, a 2; Y|U) is the logarithm of the conditional density of Y given U with (3 and cc being the regression and dispersion parameters respectively, and £ ( 7 ; U) is that for U with parameter 7 . Thus the marginal likelihood is |exp[£(/3,a,7;Y,U)]dU. In Breslow and Clayton's paper, the partially observed joint likelihood is approx-imated by the penalized quasi-likelihood defined as q(J3,a,r, Y , U ) = ^n^Ud^Y^p^) - ^u T D- 1 ( 7 )u , (2.8) where dY,-(-; •) denotes the deviance function defined in Section 2.1.1. 2.2.2 Penalized quasi-likelihood and h-likelihood approaches Breslow and Clayton consider the model in (2.3), but assume U follows, at least approximately, a multivariate normal distribution with mean 0. They use solutions (/3, u), for fixed 7 , from the following equations flg(y,/3,u,7) _ n dg(y»ft ,u,7) _ n d(3 du as the estimators for ((3, u). That is, they obtain estimators through maximizing an approximation of the partially observed joint likelihood. They adopted the Fisher's scoring algorithm to obtain the solutions. The multivariate normality assumption is convenient to incorporate correlation structure among random effects, but quite 12 restrictive to model the distributional shape. On the other hand, Lee and Nelder consider the model in (2.3), that is Yij\\J = u~ED(^,o2), but with (0;Y) = E £ i ^ i ( 0 ; Y i ) = £ £ i V \ ( 0 ) , where tp^ i = 1,..., m are unbiased esti-mating functions, that is Egi/,i(0;Y) = 0, where tb(0;Y) is of the same dimension as the dimension of 0. We also assume that ib(9;Y) is regular, (J0rgensen and Labouriau 1994; McLeish and Small, 1988). Let the density function of Y and the range of Y be denoted by 15 p(Y; 0) and X, respectively. We define a regular estimating function as follows: Definition 2.1 An estimating function ib(0;Y) is said to be regular if the following conditions are satisfied for all 0 in the parameter space: 1. The support ofY does not depend on 0; 2. E0ih(0;Y) = 0; 3. The partial derivative a^^'Y^ exists for almost every Y in X, j = 1,..., q; 4- The order of integration and differentiation may be interchanged as follows: A I m Y) P (Y; = jx J - me; Y W Y ; *)} dY, j = l,...,q. 5. S(0) = Eg d^^'Y^ is a nonsingular q x q matrix; 6. V(0) = EQ[IP(0; Y)xb(9; Y) T ] is a q x q positive-definite matrix. S(0) and V(0) are called the sensitivity and variability matrices, respectively. Note that we do not assume that ib^O; Y)s are regular. Actually the corresponding sensitivity and variability matrices for Y) are often singular in practice, due to the presence of categorical covariates. However, we assume that 1/^ (0; Y) satisfies all above conditions except the non-singularity requirement of its sensitivity and vari-ability matrices. 16 In the estimating function approach one considers estimators which can be ex-pressed as solutions of the following estimating equation: m tf(0;Y) = 5>i(0) = O. i-l By standard asymptotic theory for estimating functions, we may show that under certain regularity conditions, the sequence of roots, 9^ \ associated with the estimat-ing function ^ ^ ( 6 ) , is consistent for 0 and asymptotically normal. Specifically we have (Artes and J0rgensen 1998): Lemma 2.1 xAr7(0(m) -ff)-2+N (0, J _ 1(0)) as m -> oo, (2.10) where 3(6) can be expressed in terms of the sensitivity and variability matrices for cluster i, defined by S*(0) = E0 { ^ r 1 } , and V;(0) = E0 {^(0)V\T(0)} respec-tively: ^ E S i W J j m - ^ V ^ ) ! j m - ^ S i W J - (2-11) The asymptotic covariance matrix of 0 is therefore given by the inverse of the Godambe information matrix defined by 3(0) = S(0)V(0)-1S(0)T, (2.12) where S(0) = ET=i Si(0) and V(0) = Z?=1 V<(0). The estimating function is a generalization of the score function U(6): 17 where £(Y; 0) = logp(Y; 0) is the log likelihood. The sensitivity and variability matrices for the score function have following rela-tionship: S(0) = -V(0), which does not hold for regular estimating functions in general. The estimator and Godambe information matrix for the score function are the maximum likelihood estimator, denoted by MLE, and the Fisher information ma-trix, denoted by 1(0), respectively. Among all regular estimating functions, the score function is optimal in the sense that the estimator associated with the score func-tion attains the minimum asymptotic variance among estimators associated with all regular estimating functions. To state this result precisely, let the Godambe informa- , tion matrix for any given regular estimating function ip be denoted by J^(0). Then (J0rgensen and Labouriau 1994) jj(0)-r\0), is nonnegative-definite for all 0 in the parameter space. However, in the context of generalized linear mixed models, the full score function for both the response and the random effects is not available since the random effects are unobserved. Thus we consider estimating functions other than the score function. We can also consider the optimality property, but within a more restricted class of estimating functions. In the next section, we state an optimality result within a certain linear class. 18 2.3.2 Optimal estimating functions Crowder (1986, 1987) proved, under some regularity conditions, the following Lemma 2.2 m *l>opt(0) = Z2Si(0)V-1(0)ibl(0) (2.13) i=i is an optimal estimating function within the class of all linear estimating functions of the form m EQiWiW. (2-14) i=i where Qi(0) is a constant weights matrix of appropriate dimensions. More specifically, Qi(0) is a matrix function of parameter 0, but does not involve Y. The solution 0 from the estimating equation xbopt(0) = 0 is then asymptotically normal with the asymptotic mean 0 and asymptotic variance given by the inverse of m m m •VP) = £ - S ; ( 0 ) = £ V , ( 0 ) = £ Ji(0). i=l i=l i=l That is, the Godambe information for the optimal estimating equation is the sum of the Godambe informations for each i. 2.3.3 Nuisance parameter case Suppose that 0 = (0(i), 0(I2))T where 0(i) and and 0(2) are taken as the parame-ter of interest and the nuisance parameter, respectively. We partition ,ibi(0) into {\bf\0), tp[2\0)), where xb^f1 and ipf1 are of the same dimensions as those of 0(X) and 0(2), respectively. Let ib^ = Y^Li ipf"1 k = 1, 2. Then the asymptotic covariance matrix for 0 ( 1 ) is given by (2.15) if E e d ^ { 0 ) = 0. (2) Lemma 2.3 Asymptotic Var(0{l)) = S^(0)Vn(0)S^(0)T, (2.15) 19 where Skl{0) = EQ^O^- andVkl(0) = EQ (0)V>(/)(0)t| k,l = 1,2. Knudsen (1998) obtained this result in his unpublished thesis. Note that since Si2(0) = 0, his proof is straightforward, and is based on the following equation: Sn(0) 0 V1 Vn(6>) V 1 2(0) ^ S21(0) S22(0) ) \ V 2 1(0) V 2 2(0) j \ S21(0) S22(0) Sn(0) 0 - T sri1WVn(0)sr1 1(0)T * (°" V Clearly the upper left block of the right hand side, Sr11(0)Vii(0)Sn1(0)T, is the asymptotic covariance of 0(i) since the right hand side is the asymptotic covariance of 0. The asymptotic covariance of 0(i) will, in general, be affected by both the variance of the nuisance parameter estimator and the variability for the estimating function for the nuisance parameter. However, under nuisance parameter insensitivity, that is> Efl \ A = 0, this result tells us that the asymptotic covariance matrix of U 0 " ( 2 ) the estimator for the parameter of interest will be affected by the variability and sensitivity of the estimating function for the parameter of interest, but not by the remaining parts of the variability and sensitivity matrices for the estimating function. 2.4 Data examples In this section, we describe some clustered data examples where over-dispersion and heterogeneity may exist. We will analyze these data sets in Chapter 8. 20 2.4.1 Epilepsy data Thall and Vail (1990) presented longitudinal data (see Table A.2) from a clinical trial of 59 epileptics who were randomized to a new anti-epileptic drug progabide (Trt=l) or a placebo (Trt=0). Baseline data available at the start of the trial included the number of epileptic seizures during the 8-week period prior to randomization and the patient's age. A multivariate response variable consisted of the counts of seizures during the 2-week periods before each of four visits to the clinic. Count data are traditionally modelled by the Poisson distribution; however the Poisson assumption that E(Y) = Var(F) is generally inconsistent with the empirical behaviour for clustered data. A set of count data is called over-dispersed or under-dispersed if Var(y) > E(F) or Var(F) < E(F), respectively. Over-dispersion is very common in biological data. An over-dispersion diagnostic plot for count data relative to Poisson regression models (Lambert and Roeder, 1995) is displayed for the epilepsy data in Figure 2.1. The graph is convex if over-dispersion exists with respect to the corresponding generalized linear model. The clear convexity of the over-dispersion diagnostic graph for the epilepsy data with respect to the Poisson generalized linear model may indicate the existence of over-dispersion. Subject (random) effects are often incorporated into Poisson models to account for heterogeneity, over-dispersion relative to Poisson models, and dependence among the repeated measurements within the same subject. It is natural to introduce a second level of random effects at familial level to account for familial aggregation of epilepsy (Paik, Tsai and Ottman 1994); however we do not have such familial grouping information for this data set. A second level of random effects is often introduced to 21 O 10 2 0 30 4 0 5 0 6 0 mean parameter Figure 2.1: Overdispersion diagnostic plot for epilepsy data. 22 account for uncontrollable covariates at each visit (Lee and Nelder 1996). We will thus consider generalized linear models with two levels of random effects in next section. Such two levels of random effects arises from many practical situations (Hedeker and Gibbons 1994; Goldstein 1995). 2.4.2 Seed germination data Crowder (1978) presented data (see Table A.l) on the proportion of seeds that germi-nated on each of 21 plates arranged according to a 2x2 factorial layout by seed variety and type of root extract. In particular, he presented the total number of seeds on each plate and the number of seeds germinated on that plate (Table A.l). He noted that there is between-plate heterogeneity of proportions. Figure 2.2 shows that these proportions vary from 0 to 0.83 among plates. Furthermore, the variability among clustered binary responses also often exceeds what would be expected due to binomial variation alone. It is natural to account for such heterogeneity and overdispersion by means of random effects models. 2.4.3 Cake baking data Cochran and Cox (1957) presented data (see Table A.3) from an experiment in baking chocolate cakes. Three recipes were tested and each recipe was replicated 15 times, giving total of 45 batches of cake mix. Each batch was divided into six cakes and these were baked at six different temperatures. After baking, a breaking angle was measured as the response of the experiment. Firth and Harris. (1991) found the residual plots based on the analysis of variance revealed heterogeneity between batches. This can also be seen from the boxplot of the responses by batches in Figure 2.3. 23 -8 10 plate numbers Figure 2.2: Scatter plot of proportions of seeds germinated versus plates. 24 8 H batches Figure 2.3: Boxplots of cake baking data by batches. 25 Chapter 3 Tweedie mixed models In this chapter, we introduce a class of models with nested random effects based on the class of Tweedie exponential dispersion model distributions. We call these models Tweedie mixed models. When the conditional responses follow a specific distribution, say, the Poisson or gamma, we call it a Poisson mixed model or gamma mixed model, respectively. For a model with Poisson responses and gamma random effects distribu-tions, we call it Poisson-gamma model. We will also discuss the covariance structure of the model. In this section, we consider three-level hierarchical models where each model is com-posed of m independent clusters indexed by i. Within each cluster i, there are J; correlated sub-clusters indexed by Then within each sub-cluster there are correlated observations. Let the vector of observations be denoted by Y = ( Y i n , . . . , Y n n i l , . . . , Y m J r n l , . . . , Y m j m n m J m ) T . Then Yijk represents the kth observation in sub-cluster (i, j). One such hierarchy would be a multi-center longitudinal clini-cal trial involving repeated measurements within patients and patients nested within 3.1 Models 26 study centers. Another example is teaching method evaluation involving high scool students within classes and classes nested within schools. (Goldstein 1995; Gray et al. 1995). Yijk denotes the A;th measurement for patient j taken in the ith medical center for the former example, whereas the represents the test score for student k in class j of the ith. school. Denote the cluster, sub-cluster and observation covariates by z^Zjj and z^, re-spectively. We assume that there exist cluster and sub-cluster specific random ef-fects. The random effects for the cluster i and sub-cluster (i,j) are denoted by Ui and Uij, respectively. Thus the vector of the random effects can be written as U = (Ui, ...,Um,Un,UmJm)T = (U*, £/**), where U* and stand for (Uu Um)T and (Un, ...,Umjm)T, for short. We assume further that, given the random effects, the responses are independent and follow certain Tweedie distributions. Specifically: Al) Given U = u, Y m , Y n n u , Y i n , Y i j n . . , Y m J m i , F m J m „ m J m are con-ditionally independent, and the conditional distribution of Y ^ , given U = u, depends on Uij only which is Yijk\U = u ~ r£wp{}iijkuij,p2uijl~lp) = UijTwp ifiijh, — ) , (3.1) V Uij) where / i ;^ = exp(z A^./9(3)). For case p = 1, namely the Poisson distribution, p2 = 1. Furthermore we assume that, given the cluster level random effects, the sub-cluster level random effects are independent and follow Tweedie distributions as follows: A2) Given U* = u*, Un, ...,Umjm are conditionally independent, and the condi-27 tional distribution of [/»_,-, given U* = u*, depends on Ui only which is t/ij|U„, = U * ~ TWq(fJ,ijUi,UJ2Ui1 Q) = UiTwq (^lij, ^- j . (3.2) where ptj = exp(zJ/3(2)). A3) Finally we assume that the cluster level random effects are independent with Ui ~ Tw r(^,a 2), (3.3) where ^ = exp(ztT/3(1)). L e t x5fc = (z7> zij> zijfe)- T h e n t h e f u l 1 covariate matrix is X = ( x m , . . . , x m J m „ m J m ) The Tweedie distributions Tw r, Tw 9 and Twp are called the level 1, level 2 and level 3 distributions, respectively. To avoid non-positive random effects, r > 2 and q > 2 are required. Now we discuss some implications of the model assumptions. First, note that (3.1) interprets Uij as weights. Note also that the dispersion component takes a special form p2U\~v for the assumed level 3 Tweedie distribution in assumption Al). This special form will enable us to obtain unbiased estimating equation for the regression parameters later. It also leads to variance component structure of the covariance ma-trix which will be shown in the next section. This special dispersion component form also gives the linear structure of the conditional variance of the level 3 distributions. Thus we have the following double linearity: 28 Double linearity E[Yijk\\J] = m^Uij, Var[^ i fc|U] = p2ppijkUl3. The first statement is clear from the assumption, whereas the second statement is immediately verified as follows: V a r o l i i ] = P2Ut-pi4jkU% = p 2 ^ . That is, both the conditional mean and the conditional variance of the level 3 distribution is linear in the second level random effects. In fact, for any given p, the double linearity is equivalent to the assumption Al). Similarly, for any given q, A2) is equivalent to the following double linearity of the level 2 distributions. E[cyU»] = fMjUi, Varol i i*] = wV'A We discuss some decompositions of the models. We consider the structure of the conditional mean. In the literature, the modeling of within cluster dependence is usually focused on incorporating random effects into a monotonic transform of the conditional mean assuming constant dispersion components. For multiplicative models, it is usually defined as E[Yijk\V,V] = pHjkUiVij. This implies that Var[Yijk\\J]=uj2V(LiijkUiVij) where V(-) is the variance function for the conditional distribution of Yijk, given (U, V). In addition, Ui and V}j are often taken as independent, but the covariances 29 of the responses in general do not then exhibit a variance component structure (Mor-ton 1987; Thall and Vail 1990; Firth and Harris 1991; Lee and Nelder 1996). In our model assumptions, the conditional expectation of Y given U is linear in the random effects. To compare our model assumptions with others, we give the following multi-plicative decomposition for the conditional expectation of Y given U. Multiplicative decomposition EfY -^fclU] = fajkUij = fajkUi-jji- = PijkUiVij, where Vij = ^ji- and Cov[C/j, V~ij] = 0. The latter can be shown as follows: E , | , u . ] - E { ™ } therefore Cov[^,vy = E (ujfy - E(0i)E = E (Uij) - HiHij = 0. Additive decomposition Besides the above multiplicative decomposition, the response of this model also possesses the following additive decomposition with three uncorrelated, but dependent components as follows: Yijk = {Yijk — P-ijkUij) + {p-ijkUij — Hij^ijkUi) + HijllijkUi. (3.4) 30 This additive decomposition will facilitate residual analysis. The two sides of the equation are clearly equal. The three components on the right-hand side can be eas-ily shown to be uncorrelated through the covariance structure described in the next section. Tweedie mixed models with one level of random effects A Tweedie mixed model with one level of random effects is a special case of the Tweedie mixed model with two levels of random effects by setting us2 = 0 and J; = 1 for all i. 3.2 Covariance structure The derivations of both our parameter estimators and random effects predictors are based on the moment structure of the model. Thus we investigate the moment struc-ture of the model here. 3.2.1 Derivation We begin our investigation with the moments of the random effects. The intra-dependence within clusters are clearly reflected by the covariance structure described below. The derivations of the following moment expressions are straightforward us-ing the conditioning technique. Using Kronecker notation 8^) being 1 if s = i, 0 otherwise, the covariance structure can be expressed in the following way: E[Ui] = ^ and Cov[Us, Ui] = 8(s, i)o-2f4. E[iy.y»] = iHjUi and Var[Ly[/,] = co2^. 31 Cav[Ua, Uij] = S(s, i) t^ijjy Vij* = (Mtjl> • • • J ^ijmjY") Mi** = (Mil*> • • • ' MZ/i*) a n d Uij* = (^it^ijIHjli • • • i fJ'ifJ'ijH'ijriij) • Then E(Y i) = ^i = K T i * , - - - , ^ i * ) T -In addition, for any a = (ai,... , an), ar is defined as (a[,..., a^ ). Then we have Cov[Ui,Yi] = a-2i4-1vJ, (3.11) Covp7y, Yi] = ^tt-^uj + ^ ( O 7 , . . . , i/?,, . . . , 0T). (3.12) Now we derive an explicit expression for the inverse of Var(Y). Note first that I Var-^Yi) Var _ 1 (Y) 0 \ V 0 V a r - ^ Y J j since different clusters are independent. Thus it is enough to derive the inverse of Var(Yj). The inversion of Var(Yj) can be further simplified by noting that Var(Y;) is a patterned matrix which reflects the hierarchy. To find a simple expression for 34 Var(Yi), we define Then / v •A-ij P P'iP'ij U i 0 X Var(Y i) = 0 Therefore Var(Yj) can be expressed as a positive-definite matrix plus a matrix product between a column vector and a row vector. Thus we can invert Var(Yj) using the following well-known formula (cf. Rao 1973) (A + a b ^ - 1 = A " 1 A-labTA~l 1 + b T Aa ' where a and b are vectors and A is invertible. With (3.13) ( V Aii 0 0 \ a = i, we obtain VarfY,)- - (A, + O V - W ) " = A."' - f f ^ f f i ^ . (3.14) where 35 A T 1 A - 1 0 0 Now where each block A^- is again of the form (3.13). Applying the same matrix inversion formula to the A^- gives < ± 0 A " 1 \ (fillip [ M ' where ^ = l/(p2 + to2^ 1 £*=i M i ? / ) - Thus we have (3.15) Ajj ^ jj* 2 A ^ I J * ^ Mij Z^fc=l H-ijk Wij 1-p p2 Mij* _ 1-P — wijH>ij* , Plugging the last two equalities into (3.14) gives the explicit formula for the inverse of the covariance matrix of the responses. We will write out this explicit expression with simpler notation later. This explicit expression will be very useful in our discus-sion of random effects prediction and parameter estimation in the following chapters. 36 Chapter 4 Orthodox BLUP predictors of random effects In this chapter, we study the orthodox BLUP of random effects. Robinson (1991) presented a variety of applications where the prediction of random effects is of inter-est. These applications arise from areas such as sports, genetic study, quality and management, time series, image analysis, geostatistics, actuarial science and small-area estimation. The prediction of random effects is also found to be very useful in the identification of outliers. Fellner (1986) discussed this topic with some examples. He found that looking at the predictors of random effects is often more sensitive for detecting outliers than merely looking at residuals. We derive the explicit expressions for the orthodox predictors of random effects. Let us begin with the general expressions for orthodox BLUP. 37 4.1 Orthodox BLUP Let U and Y be random vectors with finite second moments. We define the orthodox BLUP of U given Y by: U = E(U) + Cov(U, Y )Va r _ 1 (Y) (Y - E(Y)). (4.1) The mean squared distance between U and U can be evaluated through the fol-lowing equation (Harvey 1981; J0rgensen et al. 1996b): V a r ( U - U ) = E [ ( U - U ) ( U - U ) T ] = Var(U)-Cov(U,Y)Var- 1(Y)Cov(Y,U). (4.2) This mean squared distance can be further decomposed into the sum of the mean squared distance between orthodox BLUP of U given Y and the conditional expec-tation of U given Y and mean squared distance between the conditional expectation of U given Y and U as follows: E [ ( U - U ) ( U - U ) T ] = E[ (E(U-E(U |Y) ) (U-E(U |Y) ) T ] +E[(E(U|Y) - U)(E(U|Y) - U) T]. (4.3) In addition, we have the following two desirable orthogonality properties concern-ing the orthodox BLUP: Cov[U - U, U] = 0 and Cov[U - U , Y] = 0. (4.4) That is, the residuals between the random effects and their orthodox BLUP pre-dictors are orthogonal to both the response and the predictors. These orthogonality properties will be repeatedly used later. 38 The first two moments of the orthodox BLUP are as follows: E(U) = E(U), Var(U) = Cov(U, Y)Var 1(Y)Cov(Y,U). (4.5) The first equation follows immediately from (4.1) since E[Y — E(Y)] = 0. The second statement can be easily verified by using (4.4) as follows: Var(U) = Var(U - U + U) = Var(U - U) + Var(U) + 2Cov(U - U, U) = Var(U - U) + Var(U) + 0. (4.6) The verification is completed by noting (4.2). Comparing (4.2) with (4.5), we have Var(U) < Var(U). Thus the variance of the orthodox BLUP predictor of random effects is generally smaller than that of the random effects. Thus the orthodox BLUP predictor is also referred to as the shrinkage predictor of random effects. 4.2 R a n d o m ef fects p r e d i c t o r s Explicit expressions for the orthodox BLUP predictors of random effects U given Y can be derived from (4.1). Since different clusters are independent, we derive the random effects predictors from the following two formulae: Ui = E(Ui) + Cov(Uu Y i)Var(Y i)" 1 {Yt - E(Y^), (4.7)' and Un = E(Uij) + Cov(% Y i)Var(Y i)" 1 (Y* - E(Y,)). (4.8) 39 Let us first derive the explicit expression for the orthodox BLUP predictor Ui. It is more convenient to derive the expression using a matrix form based on (3.11), (3.12) and (3.14) as follows: u- - u I . V - W A - I ^ r 2 A ^ ( A r ^ ) T ) ( Y v ) 1 + a^-2ujA^Ui Noting that we have Ji riij j=l fc=l Hence where u>ij = l/(p 2 + co2/4j 1 Y^=i Pnjk) a s defined in (3.15). (4.9) Thus the orthodox BLUP predictor for each cluster i is a linear function of re-sponses within the cluster. A reasonable predictor of Ui should be nonnegative since Ui is. The orthodox BLUP predictor Ui is clearly nonnegative when the responses are. In fact, we mainly concentrate on models with nonnegative responses in this 40 thesis. Similarly we can derive an explicit expression for l \ . We introduce a notation which denotes (0 T , . . . , t/J^,..., 0 T ) T . We have Cov[tfy, Yt] = o 2 ^ 1 ^ + uj 2^ eJ3. (4.12) Thus Uij = (a2prx^vl + ^ - ^ V a r ^ ) " 1 ^ - n) = HjUi + rftf^elVaxOTiy^Yi-Vi) „ f F x „ 2 „ H 0 T | i - i a2p\-2AllUi(AilUi)T\ / v l + a ^ - ^ A - 1 ^ ) ^ = mjUi + ^ Vr1(Ar1%)T(Yi - -UilJi). (4.13) Plugging A^ey = (0 T , . . . , Wij(fj,l~p)T,..., 0 T ) T into (4.13), we obtain Uij = /Jy/7i + uj2nlJlWij Pi'kiYijk - HijHijhUi) fc=i = p2wijfMijUi + u2fj,qi[1WijJ^i21i[kpYijk, (4.14) fc=i where the second expression is obtained from the identity Uij 2 i 2 o - l 2-p pwij = l - u filj Wij 2^ fc=l 41 The first expression for Uij shows that U^ is PijUi adjusted by a linear function of the responses within sub-cluster (i,j). The second expression shows that Uij is also nonnegative when the responses are. 4.3 Mean squared distance To evaluate the distance between random effects predictors and the corresponding random effects, we study the diagonal elements of Var(U — U). These mean squared distances form the basis of our discussion of consistency of random effects predictors in next section. Using the notation c(i) = E(Ui — Ui)2 and c(ij) = E(f/y — U^)2, it follows from (4.6) that Vai(Ui) = Var(Ui-Ui + Ui) = Vai(Ui-Ui) + Vai(Ui) + 2Cov[Ui-Ui,Ui] = c(i) + V a r © ) + 0. Hence c(i) = Viur(Ui)-Vax(Ui) = a 2 ^ - V a r (ft). (4.15) Similarly, we have c(ij) = V a r ( i y - V a r ( l \ ) = a wijVijVijk = 7777 - 1-j=ifc=i c w Furthermore, it follows from (4.4) that Cov(Uij, Uij) = Cov(C/y, C/y). Now we have V a r ( l \ ) = CoviU&Uij) = Cov(p2WijHijUi + u2\i\~xWij VijkYijk, Uij) fe=i = (PwijiUjCoviUi, Uij) + Ui, and c(ij) —¥ 0 implies Uij —> Uij. To draw our consistency results based on this inequality, we need to find upper bounds for c(i) and c(ij). Since oversimplified upper bounds do not result in desirable consis-tency results, we derive the upper bounds through a sequence of inequalities. After some algebra, we can show that c(i) is bounded above as follows: Lemma 4.1 fr2/4 (f?+ u2minj{nij)maxj(tf71)minjjk{p]jkp)) c{i)< p2 + u2minj(nij) maxj(p,jj )minj,k(pijk) + a2minj(riij)Ji mui m^nj(lIij)m'''nj,k(fJ'ijk). Proof We start our investigation with w^. 1 P2+U2p1j Efc=l Mijfc 1 p2 + w2minJ-(riiJ0minj(/j?71)mini/b(^7) ^ . 9 , . . 9 _ : _ <- / , 2 - P V (4 - 2 0 ) 45 Also we have nij sr^ij . , 2 - p E 2 - p _ Z ^ f c = l H'jjk wijf1ijk fc=l P - I - W /Xy 2^k=l Pijk 1 „ P 2 \-L02ufl~l E nij ,,2-p I ^ PlJ fc = l y^fc > m i n J ( n , J ) £ i n j f c ( ^ ) + a ; 2 m a X ^ ^ 1 ) mi"j(^)minjfc(^ rfcP) p2 + u2mmj(nij)ma,xj([iqi71)mmjk(^ ' hence 2 r - i v * ^ 2 - P ^ gV» ^min^^ )mmj(nij)mmjk{pijkp) i=ifc=i P +w 2 min J (n l i )max i (^ )mmjk(nijkp) Now we have c( i ) = < o2ti ( P 2 + ^2m%(^j)maxJ(pf~1)min j ; f c( p2 + w2minj(nij)maxi(pf71)minJ-fc(pJ27fep) + ^min^n^) J^ ^mm /^Jminj,k{lAjk ) • The derivation of an upper bound for c(ij) is straightforward. Note that Propo-sition 4.1 immediately implies that c(i) < a 2 t f • In addition, we have piw.. = ?! < i 13 o2 + uPulj1 E ^ i lAik 46 Thus c(ij) = p2wij {uPliifij + p2c{i)wijp2i]) ^ 2 I 2 Q , 2 r 2 \ It follows from (4.20) that c(ij) is bounded above as follows: Lemma 4.2 c(ij) < The following 'small dispersion asymptotics' (J0rgensen 1987b) is an immediate consequence of Lemma 4.1 and 4.2: Proposition 4.3 1. Ui as a2 -> 0 or (UJ2, p2) -> (0,0); 2. ftj - A as p2 -> 0 or (a 2 , a;2) -)• (0,0). For large sample asymptotics, clearly we have Proposition 4.4 p Ui as Ji —>• oo and C/j,- p C/jj as mirijinij) —>• oo, a// Pi, Uij and p,ijk are contained in a compact set not containing zero. 47 This result distinguishes the consistency conditions for Ui from those for U^. The former is implied by a large number of sub-clusters within a certain cluster, while the latter is implied by large sizes of the sub-clusters. This result matches our intuition. The above results can be expressed in a more delicate form derived from Cheby-shev's inequality as follows: Proposition 4.5 under the same conditions. It follows from (4.3) that c(i) ->• 0 and c(ij) -> 0 also imply Ui - A E(Ui\Y) and Uij - A E{Uij\Y), respectively. Thus Ui - A E(Ui\Y) and U{j - A E(J7„|y) are clearly implied by the conditions stated above. Similarly, we have Proposition 4.6 A more delicate discussion of the consistency of the random effects predictors would involve the relationships among the quantities fa, fj,ijk,p,q,r,riij. Some further results will be discussed in Section 7.1.1. 48 Chapter 5 Parameter estimation 5.1 Estimation of regression parameters We begin our discussion on the estimation for the regression parameters with the case of known dispersion parameters. The inclusion of unknown dispersion parameters wi l l be discussed in next section. 5.1.1 Estimated score function As the score function is optimal among al l regular estimating functions, we begin our investigation with the partially observed score function defined below: 30(2) m Ji / X 1 " 9 — £ £ Zij~~2~(Uij — UiPij), !) i=l j=l £ £ £ Zijk K O^ijk UijPijk)-m Ji ritj 1-p 3/3(3) i=l j=l k=l P' 49 The marginal score function for the response Y is obtained by taking expectation of the above partially observed score function with respect to the conditional distri-bution of the unobserved random effects U given the responses Y. Since the partially observed score function is linear in the unobserved random effects U, the marginal score function for the response Y is obtained by replacing unobserved random effects U in the partially observed score function by the expectations of the unobserved ran-dom effects U given the responses Y, denoted by E(U|Y). To find the maximum likelihood estimate, in principle, we can then solve the marginal score equation ob-tained by setting the marginal score function to zero.. However, a closed form expression for E(U|Y) is generally difficult or impossible to obtain. Note that E(U|Y) is the best unbiased predictor of unobserved random effects U given the response Y and we showed that the orthodox BLUP predictor, the best linear unbiased predictor of the unobserved random effects U, generally converges to E(U|Y) in probability. Thus we approximate E(U|Y) by the orthodox BLUP predictor of the unobserved random effects in the marginal score function as follows: m m (5.1) (2)(/3) = £ £ z ; IkpiP) (ul3{(3)-um^M) = iZ^\P) (5.2) i=i j=i (3)(/3) = £ £ £ Z y f c i = l jz=l k = l m Ji ™ij P 2 (Yljk - UyfflfMikiPJ) = z24Z\P)- (5-3) i=i 50 The three functions in (5.1),(5.2) and (5.3) are clearly linear functions of the responses as the orthodox BLUP predictors of the random effects are. We call ib((3) as defined below the estimated score function (based on the orthodox BLUP): = (V> ( W,V> ( W,V> ( 3 W) T m m m i=l i=l i=l m i=i m = £V>,(/3), i=l where ipi{0) corresponds to the estimated score function for the zth independent cluster. Clearly t/>j(/3) is an unbiased estimating function as the orthodox BLUP predictors of the random effects are unbiased. If we treat the unobserved random effects as 'missing data', the EM algorithm can then be used to obtain the maximum likelihood estimate. This algorithm iterates between an E-step, which involves evaluating the conditional expectations of unob-served random effects given the response in the marginal score function using current parameter values, and an M-step, which involves obtaining updated parameter es-timates by solving the marginal score equation. The orthodox BLUP approach is equivalent to approximating the E-step by replacing the conditional expectations by orthodox BLUP predictors and approximating the M-step by solving m E ^ ( / 3 ) = 0 . 1=1 The roots of this equation are then used as regression parameter estimates. 51 5.1.2 Standard errors of regression parameter estimators Letting (3^ denote the sequence of roots oiY%L\ rfiifi) = 0, it follows from Lemma 2.1 that $ ^ is consistent for (3 and asymptotically normal as m —> oo. Specifically, the asymptotic variance is given by the inverse of the Godambe information matrix: J(/3) = S(/3)V(/3)-1S(/3)T, (5.4) where the sensitivity matrix S(j3) and the variability matrix V(/3) are given by: m m V(/3) = £ V M = zZE/3 {*M1>J(P)} • 1=1 1=1 From now on, we simply denote ft ^ by /3. An analogue of Wald's test is available for testing the hypothesis H0 : (3^ = 0, where (3^ is a sub-vector of (3. The test statistic is: ^ = 3(r1){j11(3)}~13(1), where Jn(0) is the corresponding block of the asymptotic covariance matrix of /3. Asymptotically, this statistic follows a x2(A;)-distribution, where k is the size of the sub-vector (3^. 5.1.3 Newton scoring algorithm The orthodox BLUP approach is actually a linearized EM algorithm. Instead of us-ing this approximate EM algorithm, we adopt a more efficient algorithm, the Newton 52 scoring algorithm, introduced by J0rgensen et al. (1996a) to solve the estimating equation ib((3) = 0. The Newton scoring algorithm is defined as the Newton algorithm applied to the equation il)((3) = 0, but with the derivative of t/?(/3) replaced by its expectation S((3). The resulting algorithm gives the following updated value for (3, (3* =(3-S-1{(3)ib{P). (5.5) Clearly the sensitivity and variability matrices are crucial in this estimation proce-dure. USij denotes and denotes E(V>(i)(/3)V>0)(/3)T), then the sensitivity and variability matrices can be expressed as follows: i Sn s 1 2 S13 \ m = S21 S22 S23 \ S31 S32 S33 / V n v 1 2 v 1 3 V 2 i V 2 2 V 2 3 V31 v 3 2 V 3 3 Note that S(/3) is apparently of asymmetric form. For example, Si 3 = ^ar , whereas S31 = y J^>'• Actually S{(3) can be shown to be symmetric. However a block-by-block direct derivation of these matrix blocks from (5.1), (5.2) and (5.3) would be complicated by the nonlinearity of f3 in U (J0rgensen et al. 1996b, 1996c). Instead, we introduce a concise matrix expression for ib(f3) in the next section. This 53 expression not only facilitates derivation of the sensitivity and variability matrices, but also helps to clarify the relationships between the sensitivity and variability ma-trices. 5.1.4 O pt imality After introducing a matrix expression for we will study the relationships be-tween the sensitivity and variability matrices and the optimality of the estimated score function. The concept of optimality here is in the sense that the estimated score function attains the minimum asymptotic covariance for / 3 among a certain linear class discussed below. Recall that the full covariate matrix is X with x7fc = (zT, zJ-: zj f c). Let Xj denote the sub-matrix of X corresponding to the ith cluster. We state the following results: Theorem 5.1 For Tweedie mixed models, the estimated score function, sensitivity and variability matrices can be expressed as follows: 1. = X.rdiag{E{Y)) Var~l(Y) (Y - E{Y)), 2. V(/3) = Var{xb{f3)) = XTdiag(E(Y)) VarCY)'1 diag{E(Y)) X, 3. J ( / 3 ) = - S ( / 3 ) = V ( / 3 ) . The first statement gives a global matrix expression for the estimated score func-tion. This expression can be viewed as a multivariate version of the quasi-score function first proposed by Wedderburn (1974) . The second statement gives the ex-pression for the variability matrix. The third statement for the sensitivity matrix is 54 similar to that for the Fisher information matrix. Both the derivation and computing efforts are greatly eased by the relationship given in the second and the third state-ments. Clearly now the asymptotic covariance matrix of (3 is given by the inverse of the variability matrix alone. The key statement in Theorem 5.1 is the first statement; the second and third statements are immediate consequences. We will leave the derivation of ib((3) to a separate section since it is long and technical. Now we show the second statement. V((3) = Var(V(/3)) = Var(X T diag(E(Y))Var- 1 (Y)(Y-E(Y))) = XTdiag(E(Y))Var- 1(Y)Var(Y)Var- 1(Y)diag(E(Y))X = XTdiag(E(Y))Var _ 1(Y)diag(E(Y))X. The third statement can also be easily shown based on the first statement. Noting that <9E(Y) d/3T and E ^ (Y - E(Y)) = 0, we have = diag(E(Y))X f d (xTdiag (E(Y)) VarfY)- 1) E/3 j ((Y - E(Y)) ® E) + E / 3 | (xTdiag (E(Y)) Var(Y)- 1) 9 ( Y ~ ^ ( Y ) ) } f3(xTdiag(E(Y))Var(Y)-1) . . ^ (E0 (Y - E(Y)) ® E) + (xTdiag(E(Y))Var(Y)- 1) ( - ^ ^ ) 55 = 0-X T diag(E(Y))Var(Y)- 1 diag(E(Y))X = -V(/3), ' where denotes the Kronecker product and E is an identity matrix whose order equals the size of /3. Now we can derive the explicit expression for S(/3). Note that S((3) = -V09) = -X Tdiag(E(Y))Var(Y)- 1diag(E(Y))X = - ( X T , . . . , Xjjdiag (E(Y)) Var(Y)-1diag (E(Y)) = - £ X I T d i a g ( E ( Y i ) ) V a r ( Y i ) - 1 d i a g ( E ( Yi)) Xj. i = i (5.6) Rewrite Var X(Y,) as Var-^YO A n V o o Aij , / 1-p \ c(i) / I 1 l-p\T\ where A y 1 is given in (3.15). Plugging this expression for Var 1{Yi) into (5.6), we have m Ji nij jt mj S(P) = zZ c(S) (zZ zZ wijtojVij~kxijk) (zZ ZZ wijto3t£jkXijk)T i=l j = i fc=i j-i fc=i m ^2^2 Ji n{j n{j + zZ zZ VijtfPizZ V2m^Jk)CzZ »ij~kXijk)T i = l P j=l fc=i fc=l 56 - £ £ £ -^rffkXijkxJjk- (5-7) i=ij=ifc=iP In next section, we show that tl>M = XTdiag (E(Yj)) VarCY,)-1 (Y, - E(Y;)). This immediately implies that the results of Theorem 5.1 hold for each cluster. Ap-plying Lemma 2.2 to the estimated score functions for each cluster, we may directly show the optimality of the estimated score function among the class of all linear esti-mating functions of the form Y%Li Qi(/3)V>j(/3) if the variability matrices Vj(/3)s are nonsingular; however, these variability matrices are often singular in practice. On the other hand, J2iLi Qi(/3)V'i(/3) is a subclass of the class of all linear estimating functions of the form YliLi^iiP) (Yj — E(Y;)) since is a linear function of Yj — E(Yj). Applying Lemma 2.2 to the latter linear class, we have m (/3) = E X i r d i a g ( E ( Y 0 ) V a r ( Y l ) - 1 ( Y l - E ( F i ) ) i=i = -tEp{d { Y l e f Y l ) ) } V a r ( Y - ) - 1 ( Y » " E ( Y ' ) } is the optimal estimating function among the linear class. 5.1.5 Derivation of matrix form of ip((3) We will derive the matrix expression for ib((3) in terms of ipi((3). Furthermore we will deal with t/> (^/3), %bf\(3) and ij)f\p) separately. The basic technique of han-dling V>|^ (/3) and ibf\(3) is to convert (5.1) and (5.2) into explicit linear functions of Yj - E(Yj) directly. We will then deal with il>f\(3) by converting (5.3) into explicit linear function of Yj — E(Y*) indirectly due to the technical convenience. 57 The derivation of ib\l\(3) is quite straightforward. First rewrite Ui as Ui = tM + aVr 1 E(Y i ) T Var(Y i )" 1 (Y* - E(Y,)). Thus we have # } ( / 3 ) = *£l'Ui-tii) = z i ^ ( a V r 1 E ( Y l ) T V a r ( Y l ) - 1 (Y, - E(Y,)) = z iE(Y i)TVar(Y l)" 1 (Y, - E(YS)) = (Zi,Zi)diag (E(Yj)) V a r ^ ) " 1 (Y, - E ( Y 0 ) • Similarly, we may deal with ibf^ (/3) by noting that = iHjUi + o ; V r l e 5 V a r ( Y i ) _ 1 ( Y i - " 0 -Thus we obtain Uij - KjUi = ^nf-'eljVMCY^CYi - E ( Y 0 ) . • Plugging this formula in ib^\/3), we have ib?\(3) = ^Zij^&j-U^j) j=i v = E z ^ ( ^ ^ ^ V a r f Y O - ^ Y , - E(Y,))) 3=1 U = X;^-e5Var(Y0- 1 (Y i -E(Y0) = (E z ^ V a r f Y O - ^ Y i - E(Y,)) J'=I = . . . ,z l J ii/ l T J i)Var(Y i)- 1(Y i - E(Y,)) = (za,..., za , . . . , zUi,..., z i7i)diag (E(Y;)) Var(Y i)"1 (Y, - E(Y;)). 58 A direct derivation of ib\ ((3) would be much more complicated than those of tp^\(3) and xp[2\(3). Instead, we introduce a new matrix notation to ease the deriva-tion. Let Uj*A*i** — (UnPm,..., UiiHnnn,..., Uu^ij^,..., Uu^u^j.) . This new vector is the same size as the longer, vector piJfif with components UijHijhS- That is, each component pijk of p^ is matched by a random effect com-ponent Uij within the same sub-cluster. We may call it a nested product between = (Un,..., UUi) and p^ = (pin,Pijinu.)- This nested product notation is especially convenient in the generalization of Tweedie mixed models with two levels of nested random effects models to higher levels. Similarly we have Ui*Pi** = (Uiipni,..., C/jiMiinjD . . . , Uij^ij^,..., UuifJ-iJimj.)1'• With this notation, we can express ibf\(3) simply in terms of Yj—-XJ^p^. Before doing so, we note: Ui^i** = E(Yj) + C o v ( U ^ „ , Yi)Var(Yi)- 1 (Y< - E(Y<)), Now we can rewrite the covariance of the responses in terms of that between Ut*/A I ] M , and Yi as follows: P Pin UH1 Var(Yi) = Cov(Ui#A*<**,Yi) + P ViJiniJ.ViJlniji J 59 / 2 P— 1 ' P M i l l = Cov(U i , / * i „ ,Y i ) + P^PHJXJ. J diag(E(YO) Hence Yj — U i * / / ^ has the following simple expression: Yj - U i , / i i M = Yi - E(Yi) - C o v ( U ^ „ Yi)Var(Yi)- 1 ( Y - E(Y,)) = {Var(Yi) - C o v ^ ^ , YO} Var(Y l)- 1 (Y< - E(Yj)) / 2 P—1 ' P Mil l p ^ „ r - i ; diag (E(Yi)) Var(Y l)" 1 (Yi - E(Y,)) Now we have W (0) = EE zufc " J " - ^ijMijfc) i=ifc=i P i-p . / - P — /'Mill ^iJinjjj s / v :pjr v — I 2 Z i l l 5 - • • •> 2 Z i ^ n i J , A * » — P , p ' = ( Z i l l , . . • , Zi J i n i J { )d iag (E(Yj)) Var(Yi)"1 (Y, - E(Yj)). These equations for ipf\l3), ibf\p) and ibf\/3) give ^!2)(/3) V ^3)(/3) Zil Z i l l Z i Z i J i Z i J v n v diag(E(Yi)) VartY,)"1 (Yi - E(Y,)) 60 = (x m , . . . , xUiniJi )diag (E(YO) VarCY,)"1 (F, - E(Y,)) = xTdiag (E(YO) VarCY,)"1 (Y, - E(Y,)). (5.8) Now Theorem 5.1 follows immediately. Noting that Var _ 1 (Y) is block diagonal, we have m (/3) = £ < / > i ( / 3 ) t=i m = E X z T d i a g( E ( Y 0) Var(Yj)-1 (Yi - E( F/)) i= i (X7,...,X^)diag(E(Y))Var(Y) - l / Y ! - E ( Y ! ) ^ ^ Y m - E(Y m ) ) = X'diag(E(Y))Var- 1 (Y)(Y-E(Y)) . 5.2 Estimation of dispersion parameters We now discuss the situation when the dispersion parameters are unknown. We esti-mate unknown dispersion parameters using the adjusted Pearson estimator (J0rgensen et al. 1996b); that is, the Pearson estimator adjusted by bias correction. 5.2.1 Adjusted Pearson estimators Recall (4.16), or equivalently a c(i) , E(Ui - Hif + M i M i We may thus estimate a2 by the following adjusted Pearson estimator: (5.9) 2 _ 1 " (Uj - / / j ) 2 1 " c(i) ™ ,,r ™ 2-~i ,,r • i = l M i mT=i Mi (5.10) 61 To obtain an unbiased estimator for to2, we consider E(Uij - mftif = Var(r7ij) + tfjVaxiUi) -2piijCov(Ui, Uij) = VaxiUij) + tfjVaxiUi) -2fXijCov(UhUij), (5.11) where the last equality follows from (4.4). Rewrite (4.15) and (4.16) as Var(£/i) = o2p\ - c(i), (5.12) and VaxiUij) = o2iiriix\j + ~ c(ij). (5.13) Plugging (5.12), (5.13) and (4.18) into (5.11), after some simplification we obtain w2//i/4 = E(Uij - HijUi)2 + c{i)ii2j + c(ij) - 2p2c(i)wiJiJ2j. (5.14) Thus we may estimate u2 as follows: Lb2 = 1 1 (Uij l^ijUi) m i=l *^ » j=zl VipHj | 1 A 1 ^c(i)tfj + c(ij)-2p2c(i)Wijtfj The dispersion parameter p2 can be estimated similarly. Noting that (4.4) implies CovCYijk, U^) = Cov(Yijk, U^), we have E(Yijk - UijUijk)2 = VsLi(Yijk) + p2jkV&v(Uij) - 2pijkCov(Yijk, l \ ) = Va,v(Yijk) + p,2jkVa,v(Uij) - 2pJijkCov(Yijk, U^) 62 (Vax(Yijk) - PijkCov(Yijk, Uij)) +H%k (Var(fty) - Var(E^)) P2PiPijPPijk + c(ij)n2jk. (5.16) Therefore p2 can be estimated as follows: ^2 1 1 1 (^ijfc UijP*ijk} m i=i j= i n i j fe=l PiPijP%k . (5,7) m i=i •'t j= i n » j fc=l MiMij Similar to the REML estimator, we may also consider a small sample degree of freedom correction. For example, we may replace m by m minus the number of regression parameters estimated; however the latter correction is suggested on an intuitive basis. 5.2.2 Asymptotic properties Let £ denote (o2, u2, p2)T. Let Pi Pi 0( 2) (0 |) = 1 jr f ^ ~ ^ ' ^ ^ i + C ^ ~ 2 P 2 c ^ W i ^ l | and ^ ( 3 ) ^ 0 = £ ~ E l Uij^ijk^ + ^fi^Jk \ - p2 j=l nij fc=l I PiPijPijk PiPij I Let m -i -p m -I W 0 = E - ( ^ O , 0 , ?] ( f t # } ( / 3 > 0 ) = E 0-i = i m i = i m 63 If we replace (a2,u2,p2) in (5.10), (5.15) and (5.17) by (a2, cu2, p2)T, clearly these three equations can be rewritten as an unbiased estimating equation m -I i=l m where ;(/3, £) is unbiased estimating function. Together with m V(/3,0 = £ ^ ( / 3 ) = 0, i=l we obtain a set of unbiased estimating equations. Applying Lemma 2.1 again, we obtain that the sequence of roots, ( /3 T , £ T ) T , is consistent for ( / 3 T , £ T ) T and asymp-totically normal with mean (/3 T , ,£ T ) T , as m —t oo. Note that = f 3 (xTdiag(E(Y)) Var-^Y) (Y - E(Y))) | = a ( x ^ i a g ( E ( Y ) ) V a r ^ ( Y ) ) E ( Y ^ = 0. Taking /3 and £ as parameter of interest and nuisance parameter respectively and applying Lemma 2.15 to (tb((3, £ ) T , 2, p2) being replaced by cluster specific dispersion parameters (a2,u2,p2). All the previous derivations of orthodox BLUP random effects predictors and estimated score function remain valid if we replace (a2,u2,p2) by (a2, to2, pf). The dispersion parameters can now be estimated through the following equations: + l_ c(i)/4- + c(ij) - 2p2c{i)wijp2j Ji 3 = 1 PihAj (5.18) 65 and 1 Ji 1 nij (v. _ fj .. \2 ~2 uijh lijk) Pi ~f / , / y p J i j=i nij k=l \ XiP Jiji Xijk 1 ^ 1 ^ c(ij)ii2-kp / +yE — E 3 • (5-19) Ji j—i n \ Ii j k uijH'ijk) 3 m j=l n i j fc=l P'i^ijPijk + I g i g c W ) ^ _ ( 5 2 1 ) m i=l ^ t j fc=l MiMij The index j here often represents time in longitudinal data settings where the dis-persion parameter estimators are obtained by averaging the appropriate quantities across time. These two estimators are consistent for large number of clusters. 66 Chapter 6 Residual analysis and computational procedure In this chapter, we briefly discuss residual analysis and computational issues. 6.1 Residual analysis Residual analysis for both the responses and the random effects is an important ingredient of our approach. Note that the marginal residuals of the responses can be decomposed into three uncorrelated residual components as follows: Yijk — PiPijPijk = (Yijk — PijkUij) + (Uij — Hij)HijkUi + (Ui — HijHijHijk- (6.1) These three components actually correspond to the residuals for the three levels of distributional assumptions given in the model. Thus we may check those distri-butional assumptions via estimated residuals Yijk — HijJJij, Uij — [HjUi a n d Ui — All three estimated residuals have zero mean and can be standardized to have unit variance. Actually the variances of the three estimated residuals are already available from (5.16), (5.14) and (5.9). We now define the appropriate types of standardized 67 (estimated) residuals for the three levels of distributions. Level 1: r(i) = Ui - Hi (6.2) Level 2: Uij Pij Ui (6.3) \l^2PiPqij ~ c{i)H2j - c(ij) + 2p2c{i)wijH2j Level 3: ijk HijkUij (6.4) ^p2HiHijHvik + c(ij)ix\k The level 2 and 3 residuals, and , are residuals for conditional distributions, so we will call them conditional residuals. The level 1 residual is the residual for the first level random effects distribution. We can also check the marginal distribu-tional assumptions of the response and the second level random effects through the following marginal residuals: The basic idea in residual analysis is then to use plots of standardized residuals in much the same way as in standard generalized linear models. Plots of standard-relative to the model. To check the log link assumption, we plot logYyfc against the Ui r(3) = ized residuals against each covariate are useful for detecting nonlinearity of the data 68 log fitted values logPijkUijk- Ideally this plot should show a horizontal linear rela-tionship; curvature or other unusual shape would indicate inadequacy of the log link assumption. As in generalized linear models, we may check the form of the variance functions and hence the distributional forms by plotting the standardized residuals against the corresponding fitted values. In order to check the variance function of the distribution for the random effects, we thus plot the standardized residuals against log fitted values. 6.2 Computational procedure Initial values for regression parameter estimates are taken as the regression parame-ter estimates obtained from standard generalized linear models techniques assuming independent responses. We take initial random effects predictions for Ui and Uij as the average of the responses within cluster i divided by the average of all responses and the average of the responses within sub-cluster divided by the average of all responses, respectively. That is m ^i=\ j{ 2^j=l n i j 2"k=\ Iijk 1 x^ nij v. U*3 _i_ sr^m J_ y>Ji _1_ \-^ nij -y m ^»=1 Ji £"j=l riij ^k=l The initial dispersion parameter estimates are calculated from Pearson estimators via (5.10), (5.15) and (5.17), but omitting the bias-correction terms. More specifically, of = and 69 we have and a(o) = — E i™{Ui- in? U) ( o ) - ~ L j L TTTF. > " b i = l Ji j=l riF-ij and P (O) = - E T E — E m i-l Ji j=l nij k=l ViVijV' ijk The algorithm then iterates between updating the regression parameter estimates via the Newton scoring algorithm, updating random effect predictors via the ortho-dox BLUP and updating dispersion parameter estimates via the adjusted Pearson estimators. 70 Chapter 7 Conventional, semiparametric and binomial mixed models In the literature of multiplicative random effects models, the marginal means of the random effects are usually taken to be 1 (Morton 1987; Thall and Vail 1990; Firth and Harris 1991; Lee and Nelder 1996). Imposing these conventional constraints on the Tweedie mixed models is equivalent to taking fa = I and faj = 1 for all i and j. We thus call a Tweedie mixed model with such contraints a conventional Tweedie mixed model. After discussing the conventional Tweedie mixed models in the following section, we will discuss mixed models beyond the Tweedie family which are related to the conventional Tweedie mixed models. The first model is a semiparametric mixed model with only first and second monents assumptions about the random effects; the second one is a mixed model assuming log-normally distributed random effects; whereas the last model involves binomial distributions for the conditional responses. 71 7.1 Conventional Tweedie mixed models To be more specific, we give the assumptions for the conventional Tweedie mixed models: BI) Given U = u, Y 1 U , Y U n n , Y i j U Y i j n i j , Y m J m l , Y m J j n r i m J m are con-ditionally independent, and the conditional distribution of Y ^ , given U = u, depends on only as Yijk\U = u ~ Twp(ij,ijkUij,p2Uij1~p) P2 = UijTwp(p,ijk,—), Uij where nijk = exp(xT.fe/3). B2) Given U* = u», Un, ...,Umjm are conditionally independent, and the condi-tional distribution of Uij, given U* = u*, depends on Ui only, which is £/y-|U, = u» ~ Tw q(ui, cu2 Ui l- g) to2 = U i T w , ( l , — ) . Hi B3) Ui,...,Um iid T w r ( l , ( 7 2 ) . The derived random effects predictors, estimated score function and adjusted Pearson estimators for the more general models in Section 3.1 remain valid with the constraints p; = /iy = 1. Since the detailed expressions for the random effects predictors, estimated score function and adjusted Pearson estimators will be useful to discuss related models beyond the Tweedie family, we will present these expressions in the following sections. 72 7.1.1 Covariance and variance function The first and second moments of the second level random effects possess the following simple structure E[Ul3] = 1, and ' a2 + u2 if (s,t) = a2 if s = i and t ^ j 0 otherwise. The covariance between the random effects and responses is simply given by cov[[/ s i,cy = ^ Cov[Ust, Yijk] = 5(s,i) [a2 + 5(t,j)tu2} /j,ijk. The first and second moments of the response for the conventional Tweedie mixed models are then given by and E[Vij/t] — pijk, Cov[Ysth Yijk P2f4jk + (o2 + 0J2)p?ijk if (s, *, I) = (i, j, k) (CT 2 + uj2)iii]kpiji if (s, t) = andl^k o2VijkViti if s = i and t ^ j 0 otherwise. The covariance of the response displays an interpretable variance components struc-ture which clearly shows the variance contribution from each level of random effects. 73 The marginal variance of Yi3k for conventional Tweedie mixed models has a simple form as follows: Var(Yyfc) = p2^k + (a2 + a,2)/,2.,. (7.1) The case p = 1 and p = 2 which correspond to Poisson-Tweedie models and gamma-Tweedie models are especially interesting. The marginal variances of these two models are 1 Var(Y^) = pijk + (a2 + u)2)p2jk, and Var(y ijfc) = (p2 + a 2 + to2)tfjk, respectively. Therefore the marginal variance functions of these two models coincide with those of the negative binomial and gamma distributions, respectively. It is known that the marginal distribution of Yijk of the conventional Poisson-gamma models with only one level of random effects (to2 — 0) follows a negative binomial distribution. This negative binomial distribution is frequently adopted to account for overdisper-sion relative to Poisson model (Venables and Ripley 1994). It is unclear, in general, if the marginal distributions of the responses of the Poisson-Tweedie and gamma-Tweedie models follow the negative binomial and gamma distributions, respectively. It is known that the distribution of a random variable is uniquely characterized by its variance function within the exponential dispersion models (J0rgensen 1997); how-ever, the marginal distribution of Yijk may not follow an exponential dispersion model (J0rgensen 1987). Hence whether the marginal distributions of the responses of these two models follow the negative binomial and gamma distributions, respectively, re-mains an open question. 74 Finally, we give an expression for the correlation between the responses: U o r r ( r j , - f c , Ystl) = . .— — . ylftiik + (°2 + rfHkJPAu + (°2 + <*2)tit Clearly the correlation matrix still depends on both the regression and the dispersion parameters for the conventional Tweedie mixed models. 7.1.2 Random effects predictors The random effects predictors for conventional Tweedie mixed models also have simple expressions as follows: fr _ 1 + ° 2 s £ i ffii , 7 -1 + ° 2-3=1 Lk=l Wi3Vijk where w{j = l/(p 2 + u2 1%$), and Uij = p2WijUi + UJ2Wij VijkYi3k- (7-3) k=l The conventional Poisson-gamma model coincides with the conjugate Poisson-gamma model studied by Lee and Nelder (1996) when there is only one level of the random effects. For this model both the orthodox BLUP predictors and MHLEs for the random effects are exactly the conditional expectations of the random effects given the responses; therefore both orthodox BLUP and MHLE estimation procedure lead to maximum likelihood estimates for (3 when dispersion parameters are known. In general, the orthodox BLUP predictors and MHLEs are not identical. Lee and Nelder (1996) presented a class of conjugate hierarchical generalized linear models with one level of random effects, and provided an explicit expression for MHLEs of the random 75 effects. As with any multiplicative conjugate hierarchical generalized linear models within this class, a direct calculation shows that the MHLEs for the random effects are linear functions of the response. However, the orthodox BLUP predictors of the random effects are the best linear unbiased predictor for Tweedie mixed models. The estimates for Cov(/3, U ) and Var(U—U) are useful in making inferences about realized or sample values of U (Harville 1976). Lee and Nelder (1996) mentioned that they have not found consistent estimates for these quantities except in few special cases. They further conjectured that consistent estimates may not exist under their regularity condition: the number of clusters is fixed when the number of observations increases. We have not obtained estimates for Cov(/3, U ) for the orthodox BLUP approach yet, but exact expressions for Var(U — U ) are available below: Cov[Ui - Uh Us - Us] = 5(Sii)c(i) Covfft - Ui, Ust - Uat] = S(s,i)P2c(i)wit Gov[Uij - U^, Ust - Ust] = 5(s,i) [p2Wij[5^j)Uj2•+ p2c(i)wit]} . These are clearly consistent estimates for Var(U — U ) when the parameter esti-mators are plugged in. Besides the general small dispersion and large sample asymptotics established in Section 4.4, we give another small dispersion asymptotics result: c ^ < a 2 (p 2 + ^^in^nijQmiiijjfe^2^)) ^ ^ ~ p2 +a;2minj(ny)min:/fc(tfrfcp) + a2mmj{nij)mmjk{p2~kp)Ji and 76 P > 2 + CT2) P2 + ^ 2 min j (n i j )min j f c (^) Obviously, when p ^ 2, we also have c ( « < . . . , ^ , _ \ " J _ , > - R V (7-5) Uij - A Uij as min^/x 2 /) oo. For Poisson-Tweedie (p = 1) and compound Poisson-Tweedie (1 < p < 2) mod-els, m\Ujk(iJ?i3~kp) —> oo is equivalent to all /^ fc -> oo. For positive stable Tweedie models (p > 2), including the inverse Gaussian, minjfc(p2rfcp) oo is equivalent to all Vijk ->• 0. Clearly when p = 2, the same large sample asymptotics conclusion as in Section 4.4 holds without any restrictions on /XjjfcS. Actually, the large sample asymptotics conclusion upon the second level random effects now holds for any p under much re-laxed conditions such as mmj(nij)mmjk(fi2~kp) —>• oo as min,-(njj) —> oo. A sufficient condition is that m in^ / i 2 ^) > clog(minj(ny))/minj(nij) for a positive constant c. That is, the only restriction is that p,ijk should not tend to zero too fast for the Pois-son and compound Poisson cases, whereas minjfc(pJ27fcp) should not tend to infinity too fast for positive stable case. Actually, for conventional Tweedie mixed models, only the behavior of Uij affects the orthodox BLUP estimation procedure. This will become apparent in Section 7.2. 7.1.3 Parameter estimation The estimated score function now reduces to m Ji nij l - p W ) = E E E *m^t{Yijk - U i m k \ (7.6) 1=1 j = \ k=l p 77 whereas the matrix form of the estimated score function is still given by ip(P) = XTdiag (E(Y)) Var_ 1(Y) (Y - E(Y)). In addition, the expression for sensitivity matrix S(/3) also has the following sim-plified version: m Ji nij Jt Uij S(/3) = £ c ( z ) ( £ zZ wijPiik^ijk)(zZ zZ wi3hhik*ijk)T i=\ j=lk=l j=lk=l m Ji . .2 "y ritj i = l j = l " fc=l fc=l ^h^ijk ^ijk^ijk i = l j = l jfc=l P 7.1 .4 Adjusted Pearson estimators The adjusted Pearson estimators have simpler forms: 1 m -i m * 2 = 1)2+ - £ ' ( 0 , • (7-8) m " ™ t = i . 2 1 ^ 1 * = - E T E {(ft* - ft)2 + c(tj) + c(i) - 2p2c(i)wij\ , (7.9) i=l 1 j = l ? = ltjt^"±{ (V"k ~S"lU'i)2 + cdfinl-A . (7.10) j = i J i j=i 'Hj k=l I Mijfc J 7.2 Random effects beyond Tweedie family For the general Tweedie mixed models defined in Section 3.1, the estimation pro-cedure of the orthodox BLUP approach depends on the specific parametric forms of the random effects distributions among Tweedie family of distributions through 78 Pi, Uij, q and r. Since we have //; = pij = 1 for conventional Tweedie mixed models, this dependence of the estimation procedure on the specific parametric random ef-fects distributional forms forms is then via q and r only. However, for conventional Tweedie mixed models, q and r actually do not enter the estimation procedure in either the estimated score function, the sensitivity matrix, the random effects pre-dictors or the adjusted Pearson estimators. That is, the estimation procedure of the orthodox BLUP approach to conventional Tweedie mixed models is robust against misspecification of the random effects distributions within the Tweedie family. This robustness indicates that the orthodox BLUP approach to conventional Tweedie mixed models depends on the random effects only via the first and second moments of the second level random effects. Therefore we can extend the orthodox BLUP approach to deal with conventional models with only the first and second mo-ment assumptions about the second level of random effects. These models are usually referred as semiparametric random effects models with nonparametric random effects. 7.2.1 Nonparametric random effects To discuss this semiparametric model in more detail, we assume that Cl) Given U = u, F m , ...;Ynnn,Yyi,Yijnij,Ymjmx,YmjmnmJm are con-ditionally independent, and the conditional distribution of Y^, given U = u, depends on only as Yijk\U = u ~ Twp(pijkUij,p2Uij1~p) P2 = UijTwp(fj,ijk,—), (7.11) Uij where pijk = exp(xyfc/3). For case p = l , namely Poisson distribution, p2 = 1. 79 C2) The random effects Un,..., Umjm are positive random variables with E(Ulj) = l, and Cov[Ust, Uij] — 5(s>i) \a2 + oo since (a,/3) is shown to be a consistent estimate for (a,/3) in Section 5.2.2. There are many different approaches to Poisson mixed models in the literature. The paired Poisson mixed models trick provides a way of handling binomial mixed models via Poisson mixed models. The extension of this trick to handle multinomial mixed models via Poisson mixed models is straightforward. Such multinomial data examples can be found in McCullagh and Nelder (1989). 85 Chapter 8 Data analysis Illustrative examples for the orthodox BLUP approach to analyzing data of different types based on conventional Tweedie mixed models are presented in this chapter. We will concentrate mainly on the analysis of the epilepsy data to illustrate the use of variations of the models. We will also analyze the seed germination data and cake baking data to illustrate the use of orthodox BLUP approach to handle binomial and continuous data. 8.1 Count data In this section, we illustrate the orthodox BLUP approach to count data with analyses of the epilepsy data described in Section 2.4.1. Preliminary analysis indicated that the counts were much lower during the fourth visit so Breslow and Clayton (1993) introduced a linear trend covariate Visit (coded (-0.3,-0.1,0.1,0.3)). To facilitate vi-sual understanding of the data, Breslow (1996) plotted the logarithm of the seizure counts reported at baseline (visit 0) and at each of the four visits as in Figure 8.1. Counts at the four follow-up visits were increased by 0.5 when taking log-transform 86 0 1 2 3 4 visit number Figure 8.1: Profile plot of log transformed epilepsy seizure counts. to avoid infinities. The baseline counts were divided by four since the baseline counts were measured over eight weeks instead of two weeks. The baseline seizure counts are less variable because the period over which they were measured was four times as long as for each of the subsequent visits. Patient 207 had exceptionally high counts at baseline count and all subsequent visits. 8.1.1 Overview of the previous analyses Breslow and Clayton (1993) reanalyzed the epilepsy data using the penalized quasi-likelihood approach, whereas Lee and Nelder (1996) also reanalyzed the same data based on hierarchical generalized linear models. To make a systematic comparison of 87 these analyses, we summarize their models as follows. Lee and Nelder (1996) considered a model where the response Y given the random effects (U, V) follows a Poisson distribution and Yij\U = u , V = v ~ Poisson(iJ,ijUiVij) i = 1,..., 59, j — 1,2,3,4, (8.1) where Us and V~ijS are all mutually independent having E(£/j) = E(Vjj) = 1 and Var(£/j) = a 2 and Var(Vjj) = 2 = 0), Model(a;2), Model(w2) and Model(a>?) assume a;2 = 0, LO2, to2 and LO2 in (8.2), respectively. Thus Model(o;2 = 0) is the same one level model as HGLM 1, but without the gamma distributional assumption for the random effects. Model(a>2) is the model with equal to2 for all (i, j) corresponding to HGLM 3. Model(a>2), namely the model with distinct co2 for different visits, has the same covariance structure of the response as that of model 22 proposed by Thall and Vail through crossed random effects design. The last model, Model(cj2), has distinct co2 for each subject. We start with the Model ( C J 2 ) , including all possible two way interaction terms among the covariates. We removed all those interaction terms except.the interaction term Base.Trt from the model based on the Wald test and residual analysis. The p-value of Wald test for this last interaction term is slightly larger than 5%, but we retained it in all our models to allow a systematic comparison. That is, we take Base, Age, Trt, Visit and Base.Trt as the covariates. We fit the four models and present the results in Table 8.2. 90 Table 8.2: Parameter estimates for the epilepsy data. Model(cj2 = 0) Model(cj2) Model(cj2) Model(cj2) Parameter est.* s.e.* est. s.e. est. s.e. est. s e. Constant -1.35 1.22 -1.35 1.22 -1.37 1.16 -1.11 0.72 Base 0.88 0.14 0.88 0.14 0.87 0.13 0.94 0.07 Trt -0.89 0.41 -0.89 0.42 -0.91 0.40 -0.49 0.28 Base.Trt 0.34 0.21 0.34 0.21 0.35 0.20 0.03 0.13 Age 0.51 0.36 0.51 0.36 0.52 0.34 0.36 0.21 Visit -0.22 0.10 -0.22 0.22 -0.28 0.23 -0.35 0.18 a 2 0.28 0.19 0.16 0.007 u, 2 0.36 0.33 0.03 w\ 0.32 0.08 u\ 0.67 0.54 "I 0.25 12.9 * est. and s.e. represent estimates and standard errors respectively. 8.1.3 Model checking We have done model checking through residual analyses. The normal plots for Model(cj2), Model(cj2) and Model(a;2) are displayed in the first, second and third columns, respectively of Figure 8.2. The plots in the three rows correspond to the level 1, 2 and 3 residuals defined in Section 6.1. As expected, Model (CJ?) exhibits the least curvature since there are many more dispersion parameters estimated. Unlike Model (CJ 2 ) , the dispersion parameter estimators are known to be consistent for the Model(cj2) and Model(cjJ). There is not much difference between the normal plots for Model(cj2) and Model(cj2). We plot the standardized residuals against fitted values for the level 2 and level 91 Model(u;2) Normal plot of laval 1 ratiduali Model(o;2) Models 2) Notrrnl plot ol IMI 2 raaiduaU Normal ptol ol laval 2 Model(o;2) Normal ptol ol I oval 2 raiiduaJi Model (u;2) Model(o;2) Normal ploi ol laval 3 ratidualt Normal plot ol loval 3 raiiduali Model(o;2) Model(o;2) Model(w2) Figure 8.2: Normal plots of level 1, 2 and 3 residuals for epilepsy data. 92 Standardized residuals vs fitted values Residuals vs lilted values Residuals vs fitted values Model(a;2) Model(u;2) Model(w2) Standardized residuals vs fitted values Residuals vs log (fitted values) Residuals vs log (fitted values) leg Model(o;2) Model(a;2) Model(o;2) Figure 8.3: Scatter plots of level 2 and 3 residuals for epilepsy data. 3 distributions in Figure 8.3. The level 1 residual plot is omitted since the fitted values are fixed at 1. The residual plots for these three models are arranged in the same fashion as the normal plots. The residuals for the level 3 distribution from the three models are approximately around zero, but display distinct lines due to the discreteness of the Poisson distribution. The residuals of Model(u;2) for the level 2 distribution show a slightly megaphone shape, whereas the residuals of Model(o;2) for the level 2 distribution exhibit an upward trend. The residuals of Model(u 2) for the level 2 distribution show less pattern than the other two. 93 Based on the residual plots, Model(o;2) appears to fit slightly better than Model ( C J 2 ) . But the inferences about regression parameters are not significantly different for these two models. The dispersion parameter estimators are shown to be consistent for these two models as well. On the other hand, the normal plots for Model(u;2) look less curved. 8.1.4 Comparison of different approaches Thall and Vail (1990) found their model 22 had the best fit among their fitted models. The numerical results from Breslow and Clayton (1993) were reported as similar to those obtained from this model 22. Analyses from the hierarchical generalized linear models and penalized quasi-likelihood are also very similar except for the intercept estimators. The regression parameter estimates and the standard errors obtained from Model(cj2 = 0) and Model(cj2) are quite similar to those in their counterpart from HGLMs except our standard error estimates are slightly larger. Thus all dif-ferent approaches yield essentially the same results about the regression parameters although different random effects distributional assumptions were made. The conclusions based on Model(c<;2 = 0), Model(c<;2) and Model(a;2) are similar to those previous studies. The interaction between the treatment and baseline is re-tained in the models since the approximate p-values for this interaction is just slightly larger than a = 0.05. As Thall and Vail indicated, this means the predicted mean seizure rate for the treatment group is either higher or lower than that for the placebo group, accordingly as the baseline count does or does not exceed a critical threshold (minus the ratio of the regression parameter for Trt to the regression coefficient for 94 the interaction term). This seems to suggest that the new drug progabide may be contraindicated for patients with high seizure rates. However they pointed out this suggestion should be treated with caution since it is based on a single dataset and a particular family of models. In fact, the interaction between the treatment and baseline is statistically insignif-icant in Model (of). Based on the Wald test, the p-value of dropping the interaction term from Model (LO2) is as high as 0.8. That is, the contraindication of the new drug progabide for patients with high seizure rates disappears after accounting properly for the marginal heterogeneity among patients. Dropping this interaction term, the p-value for the treatment effect in this model becomes extremely small. That is, the effectiveness of this new drug progabide is statistically significant. Thall and Vail (1990) focused their analyses on the regression parameters in the log-linear models for the marginal seizure rates and the covariance parameters in var-ious patterned covariance matrices. They carefully examined the marginal residuals comparing the observed and fitted counts of each subject at each visit, and identified patients 207, 135, 225, 227 and 112 as 'outliers'. The primary interest of Breslow and Clayton (1993) is systematic identification of patients who had extreme levels, or extreme degrees of change over time, in their seizure rates. They identified patient 135 as having marked improvement over time after an initially high seizure rate, and identified patients 227, 225 and 112 as having the highest overall count levels relative to expectation based on the covariates. They also identified patient 232 as having low and zero counts. However Lee and Nelder were more cautious in claiming outliers and indicated a single outlier, patient 227. 95 The marginal heterogeneity among patients is substantial, even after accounting for the treatment and baseline variables. In all models except Model(w2), the dis-persion parameters are assumed to remain the same across subjects. That is, the marginal variance only changes with the marginal mean in a systematic way. How-ever, in biological studies, due to the genetic, environmental or other unknown factors, the marginal heterogeneity across subjects often exceeds what could be explained by the mean change. We plot the sample variances s2 against sample means y~i across subjects in Figure 8.4 to study the mean-to-variance relationship. The boxplots of the responses for different subjects are also displayed in Figure 8.4 to serve this purpose. As indicated by (7.1), the variance is a monotone function of the mean for Tweedie mixed models with equal dispersion parameters. Thus the two plots in Figure 8.4 should reflect this monotone pattern if the dispersion parameters do not vary across subjects. These two plots do show a kind of monotone pattern, but not very closely. To account for the irregular marginal heterogeneity, we considered Model (a;2) where different dispersion parameters u>2 are allowed across patients. This model seems to have effectively captured this irregular heterogeneity. The minimum, first quartile, third quartile and the maximum of Qjs are displayed in Table 8.2 labeled as to2 for j = 1,2,3,4. The estimated u;2s range from 0.03 to 12.9 corresponding to patients 213 and 225 respectively. Model(u;2) also seems to be very sensitive in detecting outliers. The scatter plot of cD2s in Figure 8.5 identifies patients 135, 227, 112, 207 and 225 with large Q2. These patients match outliers reported by Thall and Vail (1990) and Breslow and Clayton (1993). 96 S c a t t e r plot 2 0 4 0 w i t h i n s u b j e c t s a m p l e m e a n s B o x p l o t s of with i n s u b j e c t r e s p o n s e s B = i « . « 0 1 • • • — 1 — —•—•—•—•—•—•—•—•—•—r—c^jcOojcQcQojoocOc^ Figure 8.4: Heterogeneity plots for epilepsy data: (a) scatter plot of within subject sample variances versus sample means; (b) boxplots of within subject responses for all patients. 97 ( Figure 8.5: Heterogeneity plot for epilepsy data. 98 8.1.5 Computational aspects With regards to computing, we ran Splus on a Sparc ultra machine. All parame-ter estimates for the Model(u;2) updated in monotone directions after five iterations. Those included regression and dispersion parameters as well as standard errors for regression parameters. All estimates at the fifth iteration are close to those in Table 8.2. The same two decimal point precision for all parameter estimates was achieved at the eighteenth iteration. At this iteration, the absolute value of the standardized components of ib((3) were less than 0.001. The absolute distance between regression parameter estimates and their updates were less than 0.00003. We stopped the pro-gram after sixty iterations. The maximum of the absolute values of the standardized components of i/>(/3) and the absolute distance between regression parameter esti-mates and their updates were less than 2E — 12 and 3E — 13, respectively. Each iteration took about 10 seconds of user time and 0.01 seconds of system time on the Sparc ultra machine. Twenty iterations took about four minutes. As to the other two level random effects models, the computing time for Model(a;2) was slightly longer than that for Model(a;2). For Model(c<;2), all parameter estimates stabilized after eight iterations, but they converged much more slowly although still in a monotone direction. Those estimates approached the values reported in Table 8.2 after about fifty iterations. Each iteration took about 15 seconds on the same machine. 8.2 Binomial data We illustrate the orthodox BLUP approach to binomial data with analysis of the seed germination data described in Section 2.4.2. 99 8.2.1 Paired Poisson-Tweedie models We analyze the seed germination data based on the paired Poisson-Tweedie models. The model notation is similar to that for the epilepsy data. The estimates and cor-responding standard errors are displayed in Table 8.3. Table 8.3: Parameter estimates for the seed germination data based on paired Poisson mixed models. Paired GLM Model(a2 - 0) Model(o;2) Model^J) Parameter est.* s.e.* s.e. s.e. s.e. Constant -0.558 0.126 0.349 0.181 0.179 Seed (2) 0.146 0.223 0.512 0.289 0.287 Extract (2) 1.318 0.177 0.475 0.250 0.248 Interaction -0.778 0.306 0.708 0.398 0.395 est. est. est. a2 0.229 0.226 co2 0.265 0.042 0.055 co\ 0.026 * est. and s.e. represent estimates and standard errors respectively. As indicated in Chapter 7, the asymmetric binomial mixed models can be con-sidered based on the paired two level Poisson-Tweedie models with the second level random effects having unequal dispersion parameters co2s between the two groups j = 1,2. We started our modeling with this asymmetric model, Model (co2), but there is little evidence for this asymmetry as the difference between u2 and 0J\ is small relative to a2. Thus we proceed to fit the symmetric paired Poisson mixed model with one and two levels of random effects, namely Model(a2 = 0) and Model(t<;2). 100 The estimates and standard errors presented in the first and second columns were obtained from the standard paired Poisson generalized linear model, namely, the paired Poisson generalized linear models without random effects. This standard paired Poisson generalized linear model corresponds to the standard binomial gen-eralized linear models. The regression parameter estimates and their corresponding standard errors obtained from these two models are exactly the same. The regression parameter estimates obtained from all above paired Poisson-Tweedie models with various distributional assumptions for random effects are almost identi-cal to those obtained from the standard binomial generalized linear models. Breslow and Clayton (1993) also reported similar results from their simulation study based on some simulated data sets generated from a balanced binomial mixed model de-sign. Their marginal quasi-likelihood (MQL) approach to binomial mixed models led to regression coefficient estimates identical to those obtained from standard logistic regression; however neither the MQL nor PQL regression parameter estimates for this seed germination data were identical to those obtained from standard logistic regression. On the other hand, the standard error estimates obtained from the paired Poisson mixed models are much larger than those obtained from the standard paired Poisson regression models. Furthermore, the standard error estimates clearly vary from one random effects model to another. That is, the standard error estimates depend heav-ily on the random effects model assumptions. This is different from the results for the epilepsy data. 101 8.2.2 Model checking We performed residual analysis based on the paired Poisson-Tweedie models. Since the procedure is exactly the same as those presented in last section, we present the residual plots for Model(a;2) only. The normal plots of the level 1, 2 and 3 residuals, for Model(u;2) are displayed in Figure 8.6 and the scatter plots of level 2 and 3 resid-uals are displayed in Figure 8.7. Normal plots at all three levels exhibit moderate curvature, whereas the scatter plots of level 2 and 3 residuals against log-fitted values show moderate upward trends. Some patterns usually exist for a small data set with discrete responses. 8.2.3 Comparison of different approaches Breslow and Clayton (1993) and Lee and Nelder (1996) also analyzed the seed ger-mination data. The estimates and the corresponding standard errors are presented in Table 8.4. Ta Die 8.4: Paramet :er estimates for t ie seed data based on binomial moc els. Parameter GLM est.* s.e.* PQL est. s.e. HGLM est. s.e. Constant Seed (2) Extract (2) Interaction a2 -0.558 0.126 0.146 0.223 1.318 0.177 -0.778 0.306 -0.542 0.190 0.077 0.308 1.339 0.270 -0.825 0.430 0.098 -0.543 0.187 0.080 0.303 1.337 0.265 -0.822 0.423 0.045 * est. and s.e. represent estimates and standard errors respectively. The standard errors obtained from the paired Poisson-Tweedie models with one 102 N o r m a l p l o t o f l e v e l 1 r e s i d u a l s q u a n t i l e s o f s t a n d a r d n o r m a l N o r m a l p l o t o f l e v e l 2. r e s i d u a l s q u o n t i l o o of s t a n d a r d n o r m a l N o r m a l p l o t o f l e v e l 3 r e s i d u a l s qLianti los o f s t a n d a r d n o r m a l Figure 8.6: Normal plots of residuals for seed germination data. 103 Residuals vs fitted values 0 . 2 0 . 4 O.e 0 . 8 L O 1 . 2 1 . 4 1 . 6 l e v e l 1 r a n d o m e f f e c t s Residuals vs log(fitted values) CM I . O 1 . 5 2 . 0 2 . 5 3 . 0 3 . 5 4 . 0 l o g ( f i t t e d v a l u e s ) Figure 8.7: Scatter plots of residuals for seed germination data. 104 level of random effects are almost twice those obtained from penalized quasi-likelihood and maximum hierarchical likelihood approaches, whereas the symmetric paired Pois-son with two levels of random effects model gives similar, but slightly smaller standard error estimates than those obtained from penalized quasi-likelihood and maximum hi-erarchical likelihood approaches. The Model(cr2 = 0) corresponds to binomial-beta model (Lee and Nelder 1996); however the former gives quite different standard error estimates than the latter. This difference is not easy to explain. Further investigation is needed. 8.3 Continuous data We illustrate our approach to clustered continuous data with an analysis of the cake baking data described in Section 2.4.3. Cochran and Cox (1957) analyzed the cake baking data using analysis of variance. The normal plot of residuals for the ANOVA model in Figure 8.8 shows a curvature. Firth and Harris (1991) re-analyzed the cake baking data using multiplicative models and they found strong support for the hy-pothesis of a constant coefficient of variation. Thus we analyze the cake baking data based on the conventional gamma mixed models, Model(a;2), where the two levels of random effects are at the batch and observation levels. Temperature and recipe are taken as factors. We screen these factors based on the Wald test. We found that the recipe effect and any interaction between temperature and recipe appear negligible, but the temperature effect is highly statistically significant. In fact, Figure 8.9 shows that there is an increasing trend in breaking angles as the temperature increases. Cochran and Cox (1957) and Firth and Harris (1991) reached similar results to 105 ours except they found the recipe effect is significant at the statistical significance level of 0.05. The normal plots of level 1, 2 and 3 residuals in Figure 8.10 do not reveal serious patterns except some curvature for the level 1 residuals. The scatter plot for level 2 residuals in Figure 8.11 does not exhibit any serious pattern, whereas the scatter plot for level 3 residuals shows an upward trend. Noting that the dispersion parameter p2 is nearly zero, one would expect that the effect of level 3 residuals on the model be small. To have a systematic comparison, we display estimates and corresponding standard errors for ANOVA, gamma and gamma mixed models without interaction in Table 8.5. The parameter estimates from ANOVA model are quite different from those obtained using standard generalized linear model and gamma mixed model; however the regression parameters for the latter two models are again almost identical for this balanced design. After accounting for random effects, the recipe factor becomes less statistically significant, whereas the temperature factor becomes more statistically significant. 106 Table 8.5: Parameter estimates for cake baking data. ANOVA GLM Model(o;2) Parameters est. s.e. est. s.e. est. s.e. Constan 32.122 0.474 3.466 0.015 3.466 0.030 RI 0.989 0.821 -0.023 0.018 -0.023 0.037 R2 0.819 0.474 -0.008 0.010 -0.008 0.021 T l 0.598 0.335 0.034 0.026 0.034 0.014 T2 1.092 0.260 0.028 0.015 0.028 0.008 T3 0.647 0.212 0.020 0.010 0.020 0.006 T4 -0.739 0.581 0.033 0.008 0.033 0.005 T5 -0.261 0.335 0.020 0.007 0.020 0.004 a 2 0.038 LO2 0.019 P 2 0.00025 R and T represent recipe and temperature factors. 107 108 175 185 195 205 temperature 225 Figure 8.9: Boxplots of cake baking data by temperature. 109 N o r m a l p l o t o f l e v e l 1 r e s i d u a l s - 2 -1 O 1 2 q t i a n t i l e o o f s t a n d a r d n o r m a l N o r m a l p l o t o f l e v e l 2 r e s i d u a l s - 3 - 2 -1 O 1 2 3 q u a n t i l o o o f s t a n d a r d n o r m a l N o r m a l p l o t o f l e v e l 3 r e s i d u a l s - 3 - 2 -1 O 1 2 3 q u a n t i l e a o f s t a n d a r d n o r m a l Figure 8.10: Normal plots of residuals for cake baking data. 110 Level 2 residuals vs level 1 random effects o . a 1 .0 1 .2 1 .4 l e v e l 1 r a n d o m e f f e c t s Level 3 residuals vs log-fitted values • • • • • * * - : s • • • : : • -- i • • • • • • • " • • • ! ! : 1 I i • ! • • t • I t * '. * • • • • - : . • • • . • * • " t • • - * • • t - • • • 3 . 0 3 . 2 3 . 4 3 . 6 3 . 8 4 . 0 l o g - f i t t e d v a l u e s Figure 8.11: Scatter plots of residuals for cake baking data. I l l Chapter 9 Simulat ion In this chapter, we evaluate the performance of the orthodox BLUP approach based on a simulation study. We will focus on Poisson mixed models since these models play an important role in the analysis of discrete data. Illustrative examples of applications of Poisson mixed models to analyze count data and binomial data were presented in the previous chapter. A possible connection between Poisson mixed models and the random effects Cox proportional hazard models for censored data will be discussed in next chapter. 9.1 Results over 100 simulations To investigate the performance of the orthodox BLUP approach under realistic con-ditions, we 'replicate' the epilepsy data set 100 times via simulation using Poisson-gamma model. To generate the random effects and responses via simulation from this model, we take the covariates Constant, Base, Trt, Age, Visit and Base.Trt of epilepsy data as the covariates for simulation. The corresponding regression and dispersion parameter estimates for the epilepsy data are taken as true model parameters. There are listed in Table 9.1 as 'true values' (30, px, ..., @5, a2 and LO2, respectively. Each 112 of these 100 data sets is then simulated through the following three steps: • Step 1: Generate 59 samples from Gamma(l, a2), denoted by Vq°\ . . . , u^; • Step 2: Generate 4 samples from Gamma(uf , w 2 « f ) for each i, denoted by j = 1,2,3,4 and i = 1,..., 59; • Step 3: Generate a sample from Poisson (uij exp(x /^3)) for each denoted by y£\ j = 1, 2, 3,4 and i = 1,..., 59, where the design matrix X is taken exactly the same as that of epilepsy data. The regression and dispersion parameter were fixed at the corresponding estimates for epilepsy data, that is, (3 = (1.30,0.88,-0.88,0.50,-0.23,0.34), a2 = 0.24 and co2 — 0.44, respectively. The 100 data sets are obtained by repeating this procedure for k = 1,...,100. We analyzed each of these 100 data sets based on the standard Poisson regres-sion model (GLM) and Poisson-Tweedie models with and without degree of freedom correction for small samples, denoted by BLUP.c and BLUP, respectively. The sum-maries are presented in the next subsection. 113 9.1.1 Summary statistics The averages of regression and dispersion parameter estimates over 100 simulations are displayed in Table 9.1. Table 9.1: Averages of parameter estimates over 100 simulations. f% Pi P2 Ps PA f% a2 cu2 true value -1.30 0.88 -0.88 0.50 -0.23 0.34 0.24 0.44 GLM -1.37 0.88 -0.78 0.51 -0.26 0.29 BLUP -1.36 0.87 -0.89 0.51 -0.25 0.34 0.17 0.50 BLUP.c -1.31 0.87 -0.88 0.50 -0.25 0.34 0.23 0.61 The regression parameters are reasonably estimated by both GLM and orthodox BLUP approaches with or without small-sample correction. On the other hand, the dispersion parameters a2 and cu2 are underestimated and overestimated respectively. This phenomenon is different from simulation studies reported by Breslow and Clay-ton (1993) where the dispersion parameters were always underestimated. However the possibility of overestimation was previously predicted by Engel and Keen (1996). The small-sample correction alleviates the negative bias of the estimate of a2, but worsens the positive bias of the estimate of cu2. The standard errors of the estimates over 100 simulations and the averages of the 100 estimated standard errors are termed as simulated and estimated standard errors, respectively (Lin and Breslow 1996b). Table 9.2 displays the simulated and estimated standard errors. The expressions of the simulated and estimated standard errors for Pi are as follows: 1 100 1 100 simulated s.e.(A) = , — Y(p\k) - — Y B\k))2, 114 and 1 1 0 0 estimated s.e.(0i) = s.e.(3^). v ' 100 i Table 9.2: Simulated and estimated standard errors s.e. . 0o 0i A 0s As a2 ui2 GLM Sim. 1.65 0.17 0.60 0.48 0.28 0.31 Est. 0.41 0.04 0.15 0.12 0.10 0.06 BLUP Sim. 1.42 0.15 0.43 0.41 0.22 0.23 0.07 0.07 Est. 1.25 0.14 0.42 0.36 0.24 0.22 BLUP.c Sim. 1.42 0.15 0.43 0.41 0.22 0.23 0.07 0.09 Est. 1.39 0.16 0.47 0.41 0.26 0.24 * Simulated s.e.: s.e. of estimates over 100 simulations. ** Estimated s.e.: average of 100 estimated s.e.s. Clearly the GLM standard error estimates are seriously negatively biased. In contrast, the orthodox BLUP approach with small-sample correction slightly overes-timates standard errors for regression parameters, except for the intercept (30. On the other hand, the orthodox BLUP approach without small-sample correction slightly underestimates the standard errors for the regression parameters, severely so for the intercept. Table 9.3: Mean squared errors over f% 0i 02 03 04 05 a2 LO2 GLM 2.73 0.03 0.36 0.23 0.08 0.10 BLUP 2.01 0.02 0.18 0.17 0.05 0.05 0.01 0.01 BLUP.c 2.01 0.02 0.19 0.17 0.05 0.05 0.01 0.04 00 simulations. To evaluate the overall performance of the estimates over 100 simulations, we dis-115 play the mean squared errors (MSEs) in Table 9.3. As expected, the mean squared » errors of regression parameter estimates based on standard Poisson model are larger than those based on the orthodox BLUP approaches. According to mean squared errors, the orthodox BLUP approach without small-sample correction performed slightly better than the orthodox BLUP approach with small-sample correction. All three approaches gave exceptionally large mean squared errors of the intercept ft. 9.1.2 Confidence and prediction intervals Based on the simulated and estimated standard errors, we constructed simulated and estimated 95% confidence intervals for the parameters and 95% prediction intervals for the random effects (u\ and un) assuming normality of the estimators and predictors. We present in Table 9.4 the counts of the number of times the true values are covered by 95% confidence intervals or prediction intervals. The kth confidence intervals for ft are defined as follows: simulated C.I. for ft = - 1.96 simulated s.e .( f t) , $[k) + 1.96 simulated s .e . ( f t ) ) , and estimated C.I. for ft = ($*} - 1.96 estimated s.e.0[k)), 0{k) - 1.96 estimated s.e.0[k))) . As expected, the GLM estimated 95% confidence intervals performed very poorly. On the other hand, the estimated 95% confidence intervals based on the orthodox BLUP approaches performed reasonably well, especially those with small-sample cor-rection. The orthodox BLUP approaches also gave reasonable 95% prediction inter-vals, but with lower coverage counts for u\ and higher coverage counts for u n . 116 Table 9.4: Coverage counts of 95% confidence and prediction intervals. A) A i A> A3 A i As Ui GLM Sim. 95 96 94 94 94 93 Est. 35 28 38 35 44 31 BLUP Sim. 96 97 96 95 94 96 88 100 Est. 90 96 94 91 97 92 84 98 BLUP.c Sim. 96 97 94 95 94 97 92 100 Est. 95 96 95 94 98 97 91 100 9.1.3 Normality of parameter estimates and random effects To check the normality of parameter estimators, we did a normal plot for each of regression and dispersion parameter estimates over the 100 simulations. The results are displayed in Figure 9.1. The departures from normality do not seem serious even for the dispersion parameters. To assess the performance of the orthodox BLUP predictors, we plotted the simu-lated random effects versus their orthodox BLUP predictors over the 100 simulations. In particular, we plotted u[k^ versus and versus u[ki over k = 1,..., 100 in the upper rows of Figure 9.2. Perfect prediction would lead to diagonal lines in these two plots. The prediction for the second level of random effects seems to be better than that for the first level random effects. Normal plots of the orthodox BLUP predictors of random effects Ui and un over 100 simulations are displayed in the lower row of Figure 9.2. The clear curvatures in these normal plots imply that random effects predictors are not well approximated by normal distributions. The numerical results and plots from this simulation study show that the param-117 y : 3 y Po Pi P2 ; y / S y s Pz PA Ps 3 Figure 9.1: Normal plots of parameters over 100 simulations: x and y axes are quan-tiles of standard normal distribution and ordered parameter estimates over 100 sim-ulations, respectively. 118 Simulated vs predicted first level random effects over 100 simulations Simulated vs predicted second level random effects over 100 simulations o.o 0.5 1.0 1.5 2.0 2.5 simulated first level random effects 1 simulated second level random effects Normal plot of first level random effects Normal plot of second level random effects -1 1 quantiles of standard normal - 2 - 1 0 1 2 quantiles of standard normal Figure 9.2: Plots for random effects predictors iii and uu over 100 simulations. 119 eters are reasonably well estimated through orthodox BLUP approach. 9.2 Residual analysis In Chapter 8, we did model checking through residual plots. Serious irregularities exhibited in residual plots may indicate serious violation of model assumptions. To help to set up standards for judging irregularity exhibited by residual plots, we will study the behavior of residuals through simulation in this section. In this section, we make comparison and contrast of residual plots for the epilepsy data and simulated data. The normal plots for the level 1, 2 and 3 residuals for the epilepsy data and simulated data are displayed at row 1, 2, and 3 of Figure 9.3 where the normal plots for the epilepsy data and simulated data are on the left and right hand side, respectively. The histograms and scatter plots for the epilepsy data and simulated data are displayed in Figure 9.4 and 9.5, respectively, in the same fashion as the normal plots. The simulation shows that the residual plots may exhibit some patterns even if the data were generated from a Poisson-gamma model. That is, moderate patterns in residual plots may not indicate violation of the model assumptions. 120 Normal plot of level 1 residuals Normal plot of level 1 residuals (simulated data) quantiles of standard normal Normal plot of level 2 residuals quantiles of standard normal Normal plot of level 2 residuals (simulated data) quantiles of standard normal Normal plot of level 3 residuals quantiles of standard normal Normal plot of level 3 residuals (simulated data) quantiles of standard normal quantiles of standard normal Figure 9.3: Comparison of normal plots of level 1, 2 and 3 residuals for epilepsy data and simulated data. 121 Histogram of level 1 residuals Histogram of level 1 residuals (simulated data) -2 -1 Histogram of level 2 residuals Histogram of level 2 residuals (simulated data) Histogram of level 3 residuals Histogram of level 3 residuals (simulated data) Figure 9.4: Comparison of histograms of level 1, 2 and 3 residuals for epilepsy data and simulated data. 122 Level 2 residuals vs level 1 random effects Level 2 residuals vs level 1 random effects (simulated data) (•vol 1 random affects Level 3 residuals vs log-fitted values k>g(fltlnd valuas) Level 3 residuals vs log-fitted values (simulated data) Figure 9.5: Comparison of scatter plots of level 2 and 3 residuals for epilepsy and simulated data. 123 Chapter 10 Discussion 10.1 Conclusion In this thesis we have introduced a new class of generalized linear mixed models, Tweedie mixed models, and adopted the orthodox BLUP in the fitting algorithm. This approach has several advantages. 1. Tweedie mixed models take both the distributional shape and intra-dependence of clustered data into account. The resulting variance component decomposition of the structure of the covariance matrix of responses is not only very interpretable, but also makes the model fitting much simpler than modal predictor approaches; all expressions for the estimating equations were explicitly derived based on (3.13) so there is no need to invert large matrices. 2. The orthodox approach provides us a common method for computing parame-ter estimates and random effects predictors for all Tweedie mixed models; therefore we can study Tweedie mixed models as a single class, rather than as a collection of 124 unrelated different models. Furthermore, the conventional Tweedie mixed models are shown to have close connection with well-known Poisson-gamma, binomial-beta and Poisson-lognormal models. The semiparametric interpretation of the conventional Tweedie mixed models allows specification only the first and second moments of the unobserved random effects. 3. The orthodox BLUP for random effects in (4.1), estimated score function for regression parameter in Theorem (5.1) and adjusted Pearson estimators for dispersion parameters in (5.10), (5.15) and (5.17) show that our orthodox BLUP approach relies on only the first and second moment structure of (U, Y). It is important to note that the orthodox BLUP approach requires specification only of the first and second mo-ments of the random effects model. Tweedie mixed models play an interpretable role within this larger family of models just as exponential dispersion models do within generalized linear models. 4. The asymptotic justification of orthodox BLUP approach does not rely on any kind of normality of random effects. In contrast, the asymptotic justifications of pe-nalized quasi-likelihood and maximum hierarchical likelihood approaches largely rely on the approximate normality for random effects or 'the right transformed normality for random effects on the right scale' , respectively. It is important to note that our asymptotic variance of regression parameter estimator is not affected by the variabil-ity of estimating dispersion parameters. Our orthodox BLUP approach leads to the same estimating equation for the regression parameters as that obtained by the generalized estimating equation ap-proach. This estimating equation was also reached via the quasi-likelihood function 125 (McCullagh 1983) and multilevel modelling (Goldstein 1995) approaches. All these approaches except the orthodox BLUP approach are marginal modelling instead of conditional modelling approaches, that is, they are useful to make the 'population-averaged' instead of 'subject-specific' inferences on the mean (Schabenberger 1996; Burton et al. 1998). On the other hand, the orthodox BLUP approach explicitly incorporates random effects and mergers the 'subject-specific' and the 'population-averaged' inferences using the same model. In addition, the marginal covariance structures for our models are generally different from those considered by the other three approaches. In the development of random effects modelling methodologies, the model check-ing has generally been ignored (see e.g. Zeger and Karim 1991; Breslow and Clayton 1993) or done via normality checking of residuals (Lee and Nelder 1996); however the justification of the normality of residuals has not been found in the literature. It appears from our simulation study that moderate departures from normality may not necessarily indicate departures from the model assumptions. We illustrated the orthodox BLUP approach to analyses of clustered count, bino-mial and continuous data using Tweedie mixed models. In addition, our compound Poisson-Tweedie models may be applied to positive continuous data with a positive probability component at zero. One data example is car insurance data, for example, where the zero component corresponds to the case of no claims for a given insur-ance policy and the positive part of the distribution is the claim-size distribution (J0rgensen and Souze 1994). Another data example is the customers' expenditures in a certain shopping center. 126 In this thesis, we take dispersion parameters as nuisance parameters and their standard errors are not estimated; however it would be of interest to develop tests for the presence of random effects as well as issues such as over-dispersion and het-erogeneity. Testing for such hypotheses is complicated because the null values lie on the boundary of the dispersion parameter space. Bootstrap methods may be used to construct such confidence intervals (Lele 1991), but these are controversial as regards their ability to fully reflect the relevant uncertainties. 10.2 Further study In this section, we discuss some further research of the orthodox BLUP approach to generalized linear mixed models. 10.2.1 More than two levels of random effects We have presented the Tweedie mixed model with two levels of random effects in detail. This model is useful to handle three-level hierarchical structure; however, hi-erarchies involving more levels also occur frequently in practice (Goldstein 1995). One such example is multi-center longitudinal clinical trials with centers further nested within geographical regions such as cities or countries. The extension of the Tweedie mixed models with two levels of random effects is straightforward. The estimation procedure for more than two levels of nested ran-dom effects models is much the same as that for two-level models. The estimated score function and sensitivity matrix have the same global matrix expression as in Theorem 5.1. The key derivation of the orthodox BLUP approach is the calculation 127 of the orthodox BLUP prediction of random effects which involves the inverse of the covariance matrix of the response. However this inverse can be obtained by repeat-edly applying (3.13) to the covariance matrix of the response, twice for the models with two levels of random effects, three times for three levels of random effects, and so on. We have derived explicit expressions for all quantities of interest. These expres-sions are useful for theoretical study. In practice, those explicit expressions are not necessary for computing purposes. Calculation of the covariance matrix of the re-sponse can be programmed based on (3.13), and all quantities of interest can be evaluated in terms of this inverse. This remark will be especially useful to resolve the computing issue for these extended models. 10.2.2 Crossed designs In our analyses of the epilepsy data, we considered the nested random effects only. In clinical trials, the nature and degree of the variability of such epileptic seizure counts over time may be as important as its average behavior (Thall and Vail 1990). This aspect of the epilepsy data may be analyzed through the following simple crossed factor Tweedie mixed model with p — 1: Yijk\U = u,V = v ~ Twp {pijk(ui + Vj), p2(ui + U j ) 1 _ p } , where Ui,..., Um, V\,...,Vn are mutually independent with E(Ui) = E(Vj) = \ . Replacing the random effects in the partially observed score function by their 128 orthodox B L U P predictors would give us an estimated score function as follows: i=lj=lk=l P = X T diag(E(Y))Var _ 1 (Y)(Y-E(Y)) . This estimating equation is clearly unbiased, but its asymptotic properties have yet to be studied. Ways of handling crossed designs are still rather limited in the literature of generalized linear mixed models. Further work on crossed factor random effects models would be of interest. The development of models for complicated crossed factor designs such as salamander mating data (McCullagh and Nelder 1989) is under way. 10.2.3 Survival data analysis Incorporating random effects into Cox proportional hazard models has gained in-creasing attention in analyses of epidemiological or event history data. However, these models pose considerable theoretical difficulties in the development of estima-tion and inference procedures (Clayton 1991). Following Whitehead's (1980) idea of fitting the traditional Cox model using Poisson modelling techniques, we showed elsewhere that the random effects Cox model can also be fitted using random effects Poisson modelling techniques. To be more specific, we consider a Cox model with two levels of random effects as follows: Let the kth individual in the j th sub-cluster of the ith cluster be indexed by k). Let the hazard function for individual k) at time t be denoted by hijk(t). We assume that, given random effects U = u, the hazard functions for individuals 129 are conditionally independent with hijk(t) = h0(t)Uij exp(x;FC/3), (10.1) where h0(t) is the baseline hazard function and Ui3s are positive random effects, or 'frailties', shared by all individuals of the same cluster. Suppose further that • T i , . . . , rg are distinct death times; • m/i is the multiplicity of deaths at time T > , ; • the risk set at rh is TZ(rh) = {(i,j,k) : ti3k > Th}, where ti3k is the observed survival time for individual (z, j, k). Let Yijk,h be 1 if individual (i,j,k) dies at Th and 0 otherwise. The fitting of random effects Cox model is equivalent to fitting the following auxiliary random effects Poisson model: Yijk,h\^J — u ~ Poisson (ui3 exp(ah + xjjkf3f) . (10-2) Given random effects, Peto's version of the conditional partial likelihood (Cox and Oakes 1984) and conditional Poisson likelihood are v£(3- Y I U - u) - TT " ( ^ t t f o ) ^ ( m h l ) and /(«, /3; Y | U = u) = JI " ' " ^ ^ ' > X p ^ + * 5 f ^ * (10.4) It can be shown that, given random effects, Peto's version of the conditional partial likelihood and conditional Poisson likelihood lead to the same maximum likelihood 130 estimates for regression parameters. Hence the orthodox BLUP approach may be adopted to fit the random effects Cox models via Poisson mixed models. 131 A p p e n d i x A D a t a sets Table A. l : Seed germination data (a)seeds: O. Aegyptiaca 75 and 73, (b)root tracts: bean and cucumber 0. Aegyptiaca 75 0. Aegyptiaca 73 Bean Cucumber Bean Cucumber r n r/n r n r/n r n r/n r n r/n 10 39 0.26 5 6 0.83 8 16 0.50 3 12 0.25 23 62 0.37 53 74 0.72 10 30 0.33 22 41 0.54 23 81 0.28 55 72 0.76 8 28 0.29 15 30 0.50 26 51 0.51 32 51 0.63 23 45 0.51 32 51 0.63 17 39 0.44 46 79 0.58 0 4 0 3 7 0.43 10 13 0.77 132 Table A.2: Epilepsy data: successive two-week seizure counts for 59 epileptics, (a) Trt: treatment 0=placebo, l=progabide); (b) Base: eight-week baseline seizure counts; (c)Age (in years). ID Vi Y2 Y3 n Base Age Trt 104 5 3 3 3 11 31 0 106 3 5 3 3 11 30 0 107 2 4 0 5 6 25 0 114 4 4 1 4 8 36 0 116 7 18 9 21 66 22 0 118 5 2 8 7 27 29 0 123 6 4 0 2 12 31 0 126 40 20 23 12 52 42 0 130 5 6 6 5 23 37 0 135 14 13 6 0 10 28 0 141 26 12 6 22 52 36 0 145 12 6 8 4 33 24 0 201 4 4 6 2 18 23 0 202 7 9 12 14 42 36 0 205 16 24 10 9 87 26 0 206 11 0 0 5 50 26 0 210 0 0 3 3 18 28 0 213 37 29 28 29 111 31 0 215 3 5 2 5 18 32 0 217 3 0 6 7 20 21 0 219 3 4 3 4 12 29 0 220 3 4 3 4 9 21 0 222 2 3 3 5 ' 17 32 0 226 8 12 2 8 28 25 0 227 18 24 76 25 55' 30 0 230 2 1 2 1 9 40 0 234 3 1 4 2 10 19 0 238 13 15 13 12 47 22 0 101 11 14 9 8 76 18 1 102 8 7 9 4 38 32 1 133 Table A.2: continued ID Yi Y2 Yz Yt Base Age Trt 103 0 4 3 0 19 20 1 108 3 6 1 3 10 30 1 110 2 6 7 4 19 18 1 111 4 3 1 3 24 24 1 112 22 17 19 16 31 30 1 113 5 4 7 4 14 35 1 117 2 4 0 4 11 27 1 121 3 7 7 7 67 20 1 122 4 18 2 5 41 22 1 124 2 1 1 0 7 28 1 128 0 2 4 0 22 23 1 129 5 4 0 3 13 40 1 137 11 14 25 15 46 33 1 139 10 5 3 8 36 21 1 143 19 7 6 7 38 35 1 147 1 1 2 3 7 25 1 203 6 10 8 8 36 26 1 204 2 1 0 0 11 25 1 207 102 65 72 63 151 22 1 208 4 3 2 4 22 32 1 209 8 6 5 7 41 25 1 211 1 3 1 5 32 35 1 214 18 11 28 13 56 21 1 218 6 3 4 0 24 41 1 221 3 5 4 3 16 32 1 225 1 23 19 8 22 26 1 228 2 3 0 1 25 21 1 232 0 0 0 0 13 36 1 236 1 4 3 2 12 37 1 134 Table A.3: Cake baking data: breaking angles (deg Temperature Rep.. 175° 185° 195° 205° 215° 225° 1 42 46 47 39 53 42 2 47 29 35 47 57 45 3 32 32 37 43 45 45 4 26 32 35 24 39 26 5 28 30 31 37 41 47 6 24 22 22 29 35 26 7 26 23 25 27 33 35 Recipe I 8 24 33 23 32 31 34 9 24 27 28 33 34 23 10 24 33 27 31 30 33 11 33 39 33 28 33 30 12 28 31 27 39 35 43 13 29 28 31 29 37 33 14 24 40 29 40 40 31 15 26 28 32 25 37 33 1 39 46 51 49 55 42 2 35 46 47 39 52 61 3 34 30 42 35 42 35 4 25 26 28 46 37 37 5 31 30 29 35 40 36 6 24 29 29 29 24 35 7 22 25 26 26 29 36 Recipe II 8 26 23 24 31 27 37 9 27 26 32 28 32 33 10 21 24 24 27 37 30 11 20 27 33 31 28 33 12 23 28 31 34 31 29 13 32 35 30 27 35 30 14 23 25 22 19 21 35 15 21 21 28 26 27 20 1 46 44 45 46 48 63 . 2 43 43 43 46 47 58 3 33 24 40 37 41 38 4 38 41 38 30 36 35 5 21 25 31 35 33 23 6 24 33 30 30 37 35 7 20 21 31 24 30 33 Recipe III 8 24 23 21 24 21 35 9 24 18 21 26 28 28 10 26 28 27 27 35 35 11 28 25 26 25 38 28 12 24 30 28 35 33 28 13 28 29 43 28 33 37 14 19 22 27 25 25 35 15 21 28 . 25 25 31 25 135 Bibl iography [1] Aichison, J. and Ho, C H . (1989). The multivariate Poisson log-normal distribu-tion. Biometrika 76, 643-653. [2] Anderson, D.A. and Aitkin, M. (1985) Variance component models with binary responses: interviewer variability. J. Roy. Statist. Soc. Ser. B 47, 203-210. [3] Artes, R. and J0rgensen, B. (1998). Longitudinal data estimating equations for dispersion models. Technical report RT-MAE 9802, Institute of Mathematical Statistics, University of Sao Paulo. [4] Atkinson, A.C. (1981). Two graphical displays for outlying and influential ob-servations in regression. Biometrika 68, 13-20. [5] Breslow, N.E. (1984) Extra-Poisson variation in log-linear models. Applied Statist. 33, 38-44. [6] Breslow, N.E. (1996). Generalized linear models: Checking assumptions and strengthening conclusions. Statistica Applicata 8, 23-41. [7] Breslow, N.E. and Clayton, D.G. (1993). Approximate inference in generalized linear mixed model. J. Amer. Statist. Assoc. 88, 9-25. [8] Breslow, N.E. and Lin, X. (1995). Bias correction in generalized linear mixed models with a single component of dispersion. Biometrika 82, 81-91. 136 [9] Brillinger, D. and Preisler, H.K. (1986) Two examples of quantal data analy-sis. Proceedings of the 13th International Biometrics Conference Seattle: The Biometrics Society. 95-113. [10] Brockwell, P.J. and Davis, R.A. (1991). Time Series: Theory and Methods 2nd ed. New York: Springer-Verlag. [11] Burton, P., Gurrin, L., and Sly, P. (1998). Tutorial in biastatistics. Extending the simple linear regression model to account for correlated responese: an in-troduction to generalized estimating equations and multi-level mixed modelling. Statist. Med. 17, 1261-1291. [12] Clayton, D.G. (1991) A Monte Carlo method for Bayesian inference in frailty models. Biometrics 47, 467-485. [13] Clayton, D.G. (1996). Discussion of the paper by Lee Y., and Nelder J.A., Hier-archical generalized linear models. J. Roy. Statist. Soc. Ser. B 58, 667. [14] Cochran, W.G. and Cox, D.M. (1957). Experimental Designs. 2ed. New York: Wiley. [15] Cox, D.R. and Oakes, D. (1984) Analysis of Survival Data. London: Chapman & Hall. [16] Crowder, M.J. (1978). Beta-binomial Anova for proportions. Appl. Statist. 27, 34-37. [17] Crowder, M.J. (1986). On consistency and inconsistency of estimating equations. Econometric Theory 3, 305-330. [18] Crowder, M.J. (1987). On linear and quadratic estimating function. Biometrika 74, 591-597. 137 [19] Dean, C B . and Lawless, J.F. (1989). A mixed Poisson-inverse Gaussian regres-sion model. Canad. J. Statist. 17, 171-181. [20] Dempster, A.P., Laird, N. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B 39, 1-38. [21] Draper, D. (1996). Discussion of the paper by Lee Y., and Nelder J.A., Hierar-chical generalized linear models J. Roy. Statist. Soc, Ser. B 58, 662. [22] Engel, B. and Keen, A. (1996). Discussion of the paper by'Lee Y., and Nelder J. A., Hierarchical generalized linear models, J. Roy. Statist. Soc, Ser. B 58, 656. [23] Firth, D. and Harris, I.R. (1991). Quasi-likelihood for multiplicative random effects. Biometrika 78, 545-555. [24] Fitzmaurice, G.M. (1995). A caveat concerning independence estimating equa-tions with multivariate binary data. Biometrics 51, 309-317. [25] George, E.I., Makov, U.E. and Smith, I.G. (1987). Conjugate likelihood distri-butions. Scand. J. Statist. 20, 147-156. [26] Geyer, C J . and Thompson, E.A. (1992). Constrained Monte Carlo maximum likehihood for dependent data. J. Roy. Statist. Soc, Ser. B 54, 657-699. [27] Gilks, W.R., Wang, C C , Yvonnet, B., and Coursaget, P. (1993). Random-effects models for longitudinal data using Gibbs sampling. Biomatrics 49, 441-453. [28] Gilmour, A.R., Anderson, R. and Rae, A.L. (1985) The analysis of binomial data by a generalized linear mixed model. Biometrika 72, 593-599. 138 [29] Glifford, P. (1993). Discussion on the meeting on the Gibbs sampler and other Markov chain Monte Carlo methods. J. Roy. Statist. Soc. Ser. B 55, 53-54. [30] Godambe, V.P. (1976). Conditional likelihood and unconditional optimum esti-mating equations. Biometrika 63, 277-284. [31] Goldstein, H. (1995). Multilevel Statistical Models. New York: Wiley. [32] Gray, J., Jesson, D., Goldstein, H., Hedger, J. and Rasbash, J. (1995). A multi-level analysis of school improvement: changes in schools' performance over time. School Effectiveness and School Improvement, 6, 97-114. [33] Harvey, A.C. (1981). Time Series Models. Oxford: Allan. [34] Harville, D.A. and Mee, R.W. (1984) A mixed model procedure for analyzing ordered categorical data. Biometrics 40, 393-408. [35] Harville, D. (1977). Maximum likelihood approaches to variance component es-timation and related problems. J. Amer. Statist. Assoc. 72, 320-340. [36] Hedeker, D. and Gibbons, R.D. (1994). A random-effects ordinal regression model for multilevel analysis. Biometrics 50, 933-944. [37] Henderson, C.R. (1975). Best linear unbiased estimation and prediction under a selection model. Biomatrics 31, 423-447. [38] Hinde, J. (1982) Compound Poisson regression models. In GLIM 82: Proceedings of International Conference on Generalized Models, ed. R. Gilchrist, Berlin: Springer. 109-121. [39] Hobert, J. and Casella, G. (1996) The effect of improper priors on Gibbs Sam-pling in hierarchical linear mixed models. J. Amer. Statist. Assoc. 91, 1461-1473. 139 [40] J0rgensen, B. (1987). Exponential dispersion models (with discussion). J. Roy. Statist. Soc. Ser. B 49, 127-162. [41] J0rgensen, B. (1997). The Theory of Dispersion Models. London: Chapman & Hall. [42] J0rgensen, B. and Labouriau, R.S. (1995). Exponential Family and Theoret-ical Inference. Lecture notes. Department of Statistics, University of British Columbia, Vancouver, Canada. [43] J0rgensen, B., Labouriau, R. and Lundbye-Christensen, S. (1996a). Linear growth curve analysis based on exponential dispersion model. J. Roy. Statist. Soc. Ser. B 58, 573-592. [44] J0rgensen, B., Lundbye-Christensen, S., Song, X.-K. and Sun, L. (1996b). A longitudinal study of emergency room visits and air pollution for Prince George, British Columbia. Statist. Med. 15, 823-836. [45] J0rgensen, B., Lundbye-Christensen, S., Song, X.-K. and Sun, L. (1996c). State-space models for multivariate longitudinal data of mixed types. Canad. J. Statist. 24, 385-402. [46] J0rgensen, B., Martinez, J.R. and Tsao, M. (1994). Asymptotic behaviour of the variance function. Scand. J. Statist. 21, 223-243. [47] J0rgensen, B. and Souza, M.C.P. (1994). Fitting Tweedie's compound Poisson model to insurance claims data. Scand. Actuarial J. 1, 69-93. [48] Karim, M.R. and Zeger, S.L. (1992). Generalized linear models with random effects; Salamander Mating Revisited. Biometrics 48, 631-644. 140 [49] Knudsen, S.J. (1998). Estimating functions and Separate Inference. Cand. Scient. thesis, Department of Theoretical Statistics, Aarhus University. [50] Laird, N.M. (1978). Empirical Bayes methods for two-way contingency tables. Biometrika 65, 581-590. [51] Lambert, D. and Roeder, K. (1995). Overdispersion diagnostics for generalized linear models. J. Amer. Statist. Assoc. 90, 1225-1236. [52] Lawless, J.F. (1987). Negative binomial and mixed Poisson regression. Canad. J. of Stat. 15, 209-225. [53] Lee, Y. and Nelder, J.A. (1996). Hierarchical generalized linear models. J. Roy. Statist. Soc. Ser. B 58, 619-678. [54] Lele, S. (1991). Resampling using estimating equations. Theory of Estimating Equations.(ed. Godambe, V.P.) Oxford: Clarendon Press. [55] Liang, K.-Y. and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22. [56] Lin, X. and Breslow, N.E. (1996a). Bias Correction in Generalized Linear Mixed Models with Multiple Components of Dispersion. J. Amer. Statist. Assoc. 9 1 , 1007-1016. [57] Lin, X. and Breslow, N.E. (1996b). Analysis of correlated binomial data in logistic-normal models. J. Statist. Comput. Simul. 55, 130-146. [58] Lipsitz, S.R., Fitzmaurice, G. M., Orav, E. J., and Laird, N. M. (1994). Per-formance of generalized estimating equations in practical situations. Biometrics 50, 270-278. 141 [59] McCullagh, P. (1983). Quasi-likelihood functions Annals of statistics 11, 59-67. [60] McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models. 2nd ed. Lon-don: Chapman & Hall. [61] McCulloch, C.E. (1997). Maximum likelihood algorithms for generalized linear mixed models. J. Amer. Statist. Assoc. 92, 162-170. [62] McGilchrist, C A . (1994). Estimation in generalized mixed models. J. Roy. Statist. Soc. Ser. B 56, 61-69. [63] McGilchrist, C.A.and Yau, K.K.W. (1995). The derivation of BLUP, ML, REML estimation methods for generalised linear mixed models. Commun. Statist., The-ory and Methods 24, 2963-2980. [64] McLeish, D.L. and Small, C.G. (1988). The Theory and Applications of Statistical Inference Functions. Lecture Notes in Statistics 44, New York: Springer-Verlag. [65] Morton, R. (1987). A generalized linear model with nested strata of extra-Poisson variation. Biometrics 74, 247-257. [66] Natarajan, R. and McCullogh, C.E. (1995). A note on existence of the posterior distribution for a class of mixed models for binomial responses. Biometrika 82, 639-643. [67] Paik, M . C , Tsai, W Y . and Ottman, R. (1994). Multivariate survival analysis using piecewise gamma frailty. Biometrics 50, 975-988. [68] Rao, C.R. (1973). Linear Statistical Inference and Its Applications. 2nd ed. New York: Wiley. 142 [69] Ripley, B.D. and Venables, W.N. (1994). Modern Applied Statistics with Splus. New York: Springer-Verlag. [70] Robinson, C.K. (1991). That BLUP Is a Good Thing: The Estimation of Random Effects. Statist. Sci. 6, 15-32. [71] Schabenberger, O. (1996). Population-averaged and subject-specific approaches for clustered categorical data. J. Statist. Simul. 54, 231-253. [72] Schall, R. (1991). Estimation in generalized linear models with random effects. Biometrika 78, 719-727. [73] Smith, A.F.M. and Roberts, G.O. (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. J. Roy. Statist. Soc. Ser. B 55, 3-23. [74] Stiratelli, R., Laird, N. and Ware, J. (1984) Random effects models for serial observations with binary responses. Biometrics 40, 961-971. [75] Thall, P.F. and Vail, S.C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics 46, 657-671. [76] Waclawiw, M.A. and Liang, K.Y. (1993). Prediction of random effects in the generalized linear model. J. Amer. Statist. Assoc. 88, 171-178. [77] Whitehead, J. (1980) Fitting Cox's regression model to survival data using GLIM. Applied Statistics 29, 268-275. [78] Williams, D.A. (1982) Extra-binomial variation in logistic linear models. Applied Statist. 3 1 , 144-148. 143 [79] Wolfinger, R. and O'Connell, M. (1993). Generalized linear mixed models: A pseudo-likelihood approach. J. Statist. Cornput. Simul. 48, 233-243. [80] Zeger, S.L. and Karim, M.R. (1991). Generalized linear models with random effects; A Gibbs sampling approach. J. Amer. Statist. Assoc. 86, 79-86. [81] Zeger, S.L. and Liang, K.-Y. (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42, 121-130. 144