- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Simultaneous inference for generalized linear mixed...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Simultaneous inference for generalized linear mixed models with informative dropout and missing covariates 2003
pdf
Page Metadata
Item Metadata
Title | Simultaneous inference for generalized linear mixed models with informative dropout and missing covariates |
Creator |
Wu, Kunling |
Date Created | 2009-11-20T01:27:04Z |
Date Issued | 2009-11-20T01:27:04Z |
Date | 2003 |
Description | Generalized linear mixed effects models (GLMMs) are popular in many longitudinal studies. In these studies, however, missing data problems arise frequently, which makes statistical analyses more complicated. In this thesis, we propose an exact method and an approximate method for GLMMs with informative dropouts and missing covariates, and provide a unified approach for simultaneous inference. Both methods are implemented by Monte Carlo EM algorithms. The approximate method is based on Taylor series expansion, and it avoids sampling the random effects in the E-step. Thus, the approximate method may be computationally more efficient when the dimension of random effects is not small. We also briefly discuss other methods for accelerating the EM algorithms. To illustrate the proposed methods, we analyze two real datasets, a AIDS 315 dataset and a dataset from a parent bereavement project, using these methods. A simulation study is conducted to evaluate the performance of the proposed methods under various situations. |
Extent | 3704414 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | Eng |
Collection |
Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project [http://www.library.ubc.ca/archives/retro_theses/] |
Date Available | 2009-11-20T01:27:04Z |
DOI | 10.14288/1.0091594 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of |
Degree Grantor | University of British Columbia |
Graduation Date | 2004-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
URI | http://hdl.handle.net/2429/15346 |
Aggregated Source Repository | DSpace |
Digital Resource Original Record | https://open.library.ubc.ca/collections/831/items/1.0091594/source |
Download
- Media
- ubc_2004-0111.pdf [ 3.53MB ]
- Metadata
- JSON: 1.0091594.json
- JSON-LD: 1.0091594+ld.json
- RDF/XML (Pretty): 1.0091594.xml
- RDF/JSON: 1.0091594+rdf.json
- Turtle: 1.0091594+rdf-turtle.txt
- N-Triples: 1.0091594+rdf-ntriples.txt
- Citation
- 1.0091594.ris
Full Text
Simultaneous Inference for Generalized Linear Mixed Models with Informative Dropout and Missing Covariates by Kunling Wu M.Sc, Beijing Polytechnic University, China, 1999 A THESIS SUBMITTED IN PARTIAL F U L F I L L M E N T OF T H E REQUIREMENTS FOR T H E D E G R E E OF Master of Science in T H E F A C U L T Y OF G R A D U A T E STUDIES (Department of Statistics) We accept this thesis as conforming to the required standard The University of British Columbia December 2003 © Kunling Wu, 2003 Library Authorization In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Kunling Wu 19/12/2003 Name of Author (please print) Date (dd/mm/yyyy) Title of Thesis: Simultaneous Inference for Generalized Linear Mixed Models with Informative Dropout and Missing Covariates Degree: Master of Science Department of Statistics The University of British Columbia Vancouver, B C Canada Year: 2003 Abstract Generalized linear mixed effects models (GLMMs) are popular in many longitudinal studies. In these studies, however, missing data problems arise frequently, which makes statistical analyses more complicated. In this thesis, we propose an exact method and an approximate method for GLMMs with informative dropouts and missing covariates, and provide a unified approach for simultaneous inference. Both methods are implemented by Monte Carlo E M algorithms. The approximate method is based on Taylor series ex- pansion, and it avoids sampling the random effects in the E-step. Thus, the approximate method may be computationally more efficient when the dimension of random effects is not small. We also briefly discuss other methods for accelerating the E M algorithms. To illustrate the proposed methods, we analyze two real datasets, a AIDS 315 dataset and a dataset from a parent bereavement project, using these methods. A simulation study is conducted to evaluate the performance of the proposed methods under various situations. ii Contents Abstract ii Contents iii List of Tables vii List of Figures viii Acknowledgements ix Dedication x 1 Introduction 1 1.1 Generalized Linear Mixed Effect Models 1 1.2 Missing Data Problems 3 1.3 Motivating Examples 5 1.3.1 Example 1 5 1.3.2 Example 2 5 1.4 Objectives and Outline 6 2 Generalized Linear Mixed Models and Missing Data 8 2.1 Introduction 8 iii 2.2 Generalized Linear Models 8 2.2.1 Model Specification 9 2.2.2 Maximum Likelihood Estimation in GLMs 10 2.2.3 Quasi-Likelihood Approach 13 2.3 Generalized Linear Mixed Models 14 2.3.1 Generalized Linear Mixed Models 14 2.3.2 Maximum Likelihood Estimation 15 2.3.3 Literature for Generalized Linear Mixed Models 16 2.4 Literature for Missing Data 17 2.4.1 Literature of Informative Dropout 17 2.4.2 Literature of Missing Covariates 18 3 Exact Inference for G L M M s with Informative Dropout and Missing Covariates 20 3.1 Introduction 20 3.2 Models and Likelihood . . . •. 21 3.3 Monte Carlo E M Algorithm 23 3.3.1 E-step 24 3.3.2 M-step 26 3.3.3 Variance Estimation 27 3.4 Sampling Methods 28 3.4.1 Gibbs Sampler 28 3.4.2 Adaptive Rejection Algorithm -. . 29 3.4.3 Rejection Sampling 30 3.4.4 Sampling Method for Binary Variables 31 iv 3.5 P X - E M Algorithm 31 3.6 Convergence 34 4 Approximate Inference for G L M M s with Informative Dropout and Missing Covariates 35 4.1 Introduction 35 4.2 Approximate Inference without Missing Values 36 4.3 Approximate Inference with Missing Values 39 4.4 Strategies for Sampling the Missing Values 42 4.5 P X - E M 43 5 Covariate Models and Dropout Models 46 5.1 Introduction 46 5.2 Dropout Models 46 5.3 Covariate Models 48 5.4 Sensitivity Analyses 49 6 Real Data Examples 50 6.1 Introduction 50 6.2 Example 1 51 6.2.1 Data Description . . . ' 51 6.2.2 Models 52 6.2.3 Analysis and Results 54 6.2.4 Sensitivity Analysis 56 6.2.5 Conclusion 58 6.3 Example 2 '. • 59 6.3.1 Data Description 59 v 6.3.2 Models 60 6.3.3 Analysis and Results 62 6.3.4 Sensitivity Analysis 63 6.3.5 Conclusion 65 6.4 Computation Issues 65 7 Simulation Study 71 7.1 Introduction 71 7.2 Description of the Simulation Study 72 7.2.1 Models 72 7.2.2 Bias and Mean-Squared Error 73 7.3 Simulation Results 74 7.3.1 .Comparison of Methods with Varying Missing Rates 74 7.3.2 Comparison of Methods with Different Variances 75 7.3.3 Comparison of Methods with Different Sample sizes 76 7.3.4 Comparison of Methods with Varying Intfa-individual Measurements 76 7.3.5 Conclusion 77 8 Conclusion and Discussion 81 Bibliography 84 vi List of Tables 6.1 Summary statistics 52 6.2 Estimates for the AIDS data 55 6.3 Sensitivity analysis for covariate models 57 6.4 Sensitivity analysis for. dropout models •. • • • 58 6.5 Summary statistics , 60 6.6 Estimates for the Parent Bereavement data 62 6.7 Sensitivity analysis for covariate models 63 6.8 Sensitivity analysis for dropout models 64 7.1 Simulation results with varying missing rates 75 7.2 Simulation results with varying variances 76 7.3 Simulation results with varying sample sizes 77 7.4 Simulation results with varying intra-individual measurements 78 vii List of Figures 6.1 Viral loads (log 1 0 scale) for six randomly selected patients. The open dots are the observed values and the dashed line indicates the detection limit of viral loads. The viral loads below the detection limit are substituted with log 1 0 (50) 67 6.2 GSI scores for six randomly selected parents. The open dots are the ob- served values and the GSI scores at time 0 are the baseline values 68 6.3 (a) Time series and (b) autocorrelation function plots for CH50 69 6.4 (a) Time series and (b) autocorrelation function plots for 6 4 6 associated with patient 46 70 7.1 (a) Time series and (b) autocorrelation function plots for z 2 79 7.2 (a) Time series and (b) autocorrelation function plots for 6 1 8 associated with individual 18 80 viii Acknowledgements First and foremost, I would like to thank my supervisor, Dr. Lang Wu, for his excellent guidance and immense help during my study at UBC. Without his support, expertise and patience, this thesis would not have been completed. Also, I would like to thank my second reader, Dr. Paul Gustafson, for his invaluable comments and suggestions on this thesis. Furthermore, I thank Dr. Nancy Heckman and Dr. Bertrand Clarke for their invaluable advice on my consulting projects, which will benefit me very much in the future. I thank all the faculty and staff in the Department of Statistics at UBC for providing such a nice academic environment. I also thank all the graduate students in the Department of Statistics for making my study at UBC so enjoyable. Most importantly I would like to thank my parents for loving me and believing in me. My big thanks goes to my beloved husband, Weiliang Qiu, for his love, his constant support and encouragement, which push me to be the best at everything I do. K U N L I N G W U The University of British Columbia December 2003 ix To my parents and husband. Chapter 1 Introduction 1.1 Generalized Linear Mixed Effect Models Longitudinal data or repeated measurement data occur frequently in many applications where repeated measurements are obtained for each individual. Statistical analysis of longitudinal data are reviewed in Diggle, Liang and Zeger (1994). One of the key ad- vantages of a longitudinal study over a cross-sectional study is to separate variation over time within an individual from difference among individuals, while a cross-sectional study can not do this because it simply records one measurement for each individual. So the analysis of cross-sectional data may confound time effect and may give misleading results. For longitudinal data, it is important to recognize two sources of variations: intra-individual variation produced by different measurements within a given individual, and inter-individual variation among different individuals. Generalized linear models (GLMs) such as logistic regression models, extend nor- mal linear models to allow non-normal error distributions in the natural exponential family such as Poisson and binomial distributions. GLMs can handle not only continu- ous variables but also discrete variables, as long as the distribution of the variable belongs 1 to the natural exponential family. Therefore, GLMs provide a unified different approach for continuous and discrete responses and have wide applicability in practice. For ex- ample, in Agresti (1990), a sample of male residents of Framingham, Massachusetts, was collected according to their blood pressures. During a follow-up period, whether or not these male residents developed coronary heart diseases, was recorded and viewed as response. So the response variable is binary. To investigate the relationship between the blood pressure and the coronary heart disease, we can build a logistic regression model and then make statistical inferences based on this GLM. Generalized linear mixed models (GLMMs) are useful for longitudinal studies which extend GLMs by introducing random effects to account for correlation within the repeated measurements for a given individual. Such models can separate two kinds of variations, borrow information cross individuals and allow discrete and continuous responses. Therefore, GLMMs are very popular in the analysis of longitudinal data. A G L M M may be written as a hierarchical two-stage model. At the first stage, intra-individual variation is charactered by a G L M . In the second stage, inter-individual variation is represented through individual-specific regression parameters. Covariates are often introduced in the second stage to partially explain systematic variation. There are two main approaches to estimate parameters in GLMMs: (i) an exact likelihood inference based on numerical integration (Booth and Hobert (1999)), and (ii) an approximate inference based on linearization procedures via Taylor series expansion (Breslow and Clayton (1993); Vonesh et al. (2002)). In the exact inference, marginal likelihood is obtained by integrating out random effects from the joint distribution of re- sponse and random effects. By maximizing the marginal likelihood, we obtain estimates of parameters of interest. However, the integration is usually intractable, one may use Monte Carlo approximations to evaluate it (Wei and Tanner (1990)). The exact likelihood 2 inference works well with a small dimension of random effects. However, computation may become quite demanding or unstable as the dimension of random effects increases. In such cases, we may consider the approximate inference which avoids this computation difficulty by integrating out the random effects. The strategy for the approximate infer- ence is to iteratively solve L M E models based on second-order Taylor series expansion around current estimates. If the number of measurements for each individual is large enough, approximate methods may give reasonable estimates for parameters. Otherwise, approximate maximum likelihood estimates may be inconsistent. 1.2 Missing Data Problems Missing data are a serious problem in many applications and arise frequently in lon- gitudinal studies. Two kinds of missing data often occur in a longitudinal study: (i) missing covariates due to different measurement schedules for covariates and response or other problems; and (ii) missing responses due to dropout or missing visits. For example, individuals may withdraw or die before the end of study or do not come to the study center for measurements at scheduled times for various reasons. Missing data problems make statistical analysis in longitudinal studies much more complicated, since standard complete-data methods are not directly applicable. Commonly-used naive methods for missing data include the complete-case method, which deletes all incomplete observations, the mean imputation method, which substi- tutes missing values with the mean values of observed data, and the last-value-carried- forward method, which imputes a missing value by the immediate previous observed data. At the presence of missing data, the missing data mechanism must be taken into account in order to obtain valid statistical inference. Little and Rubin (1987) define three types 3 of missing data mechanisms. Missing data are missing completely at random (MCAR) if the probability of missingness is independent of both observed and unobserved data. For example, patients do not come to the study center because of reasons irrelevant to the treatment such as simply forgetting the appointment. Missing data are missing at ran- dom (MAR) if the probability of missingness depends only on observed data, but not on unobserved data. For example, a patient may occasionally fail to visit the clinic because he/she is too old. Missing data are nonignorable or informative missing data (NIM) if the probability of missingness depends on unobserved data. For example, a patient fails to visit the clinic because he/she is too sick. If missing values are MCAR, the complete-case method will give unbiased, but inefficient estimates. If the missing data are not MCAR, the naive methods may give biased, even misleading results due to not taking missing data information into consideration. MCAR and MAR are called ignorably missing. We can ignore the missing data mechanism in likelihood inference when missing values are ignorably missing (Little and Rubin (1987)). Little (1992, 1995) gave a review on missing covariates in regression and drop-out in repeated-measures studies. Ibrahim, Lipsitz, and Chen (1999) proposed a Monte-Carlo E M method for estimating parameters in GLMs with nonignorable missing covariates. Wu and Wu (2001) estimated parameters in nonlinear mixed effects models with missing covariates (MAR) by a three-step multiple imputation method. Wu and Carroll (1988) considered linear mixed effect models with informative dropout and assumed missingness depending on random effects. Ibrahim, Chen and Lipsitz (2001) developed a Monte Carlo E M algorithm for estimating parameters in GLMMs with informative dropout. However, little literature considers parameter estimation in GLMMs with informative dropout and missing covariates simultaneously. 4 1.3 Motivating Examples 1.3.1 Example 1 Our research is motivated by a longitudinal study from the AIDS Clinical Trial Group (ACTG) Protocol 315 (Wu and Ding (1999)). In this study, 46 HIV infected patients were treated with a potent antiviral drug. Plasma HIV-1 RNA (viral loads) were repeatedly quantified on days 2, 7, 10, 14, 21, 28, and weeks 8, 12, and 24 after initiation of treatment. After the antiviral treatment, the patients' viral loads will decay, and the decay rate may reflect the efficacy of the treatment. The Nucleic Acid Sequence-Based Amplification assay that is used to measure the viral load has a detection limit. If the viral load drops below the detection limit after the treatment, the viral load can not be measured, which may indicate that the treatment may be successful. To investigate the treatment effect, one approach is to define the response as whether the viral load is below the detection limit or not, which is thus a binary variable. In this study, patients drop out before the end of the study, and the dropout may be informative. Thus, the response contains non- ignorable missing values. Preliminary studies show that some baseline covariates such as CD4 cell counts, tumor necrosis factor (measured by TNF levels) and total complement levels (measured by CH50), may partially explain variation in the viral load trajectory. However, some of these covariates are also missing. Our objectives are to model the viral load trajectory and to identify covariates that may partially predict changes of viral loads, in the presence of informative dropouts and missing covariates. 1.3.2 Example 2 Our second example involves a longitudinal study from a parent bereavement project, which investigates the long-term mental outcomes of parents whose children died by 5 accident, suicide, or homicide. After their children's death, the parents usually experience a high level of mental distress. In this study, the mental distress of 239 parents were measured at baseline (i.e. 4 to 6 weeks after their children's death), and then at 4, 12, 24 and 60 months post-death. The Global Severity Index (GSI), which is the most sensitive indicator of mental distress, is used to measure the parents' distress levels. A high GSI score indicates a high level of mental distress. If the parents' adjustment to their children's death goes well, their GSI scores will decrease over time, at least lower than their baseline GSI scores. To examine how the parents' mental distress changes over time after their children's death, we treat whether or not a parent's GSI score after baseline is lower than his/her baseline value as response. Several other relevant factors were also obtained at baseline, including parents' gender, marital status, age, education, annual income, the cause of death, age and gender of the deceased child. These baseline factors may be important predictors of parents' distress and thus are viewed as covariates. Note that some baseline covariates such as income contain missing values, and some responses are also missing. Our objectives are to investigate the change of parent's distress levels over time and to determine which covariates affect the parent's mental distress. 1.4 Objectives and Outline In this thesis, we develop an exact inference method, implemented by a Monte-Carlo E M algorithm, to make simultaneous inferences for GLMMs with informative dropout and missing covariates. To avoid computational difficulties when the dimension of random effects is not small, we propose an approximate inference method, which integrates out the random effects for more efficient computation. The remainder of this thesis is organized as follows. Chapter 2 introduces GLMs 6 and GLMMs and reviews the literature about informative dropout and missing covari- ates. Chapter 3 discusses the exact inference method for estimation of GLMMs with informative dropout and missing covariates. The approximate inference method based on linearization is presented in Chapter 4. We discuss dropout models and covariate models in Chapter 5. In Chapter 6, we apply our methods to two real data examples. Chapter 7 presents our simulation study. We conclude the thesis with a discussion in Chapter 8. 7 Chapter 2 Generalized Linear Mixed Models and Missing Data 2.1 Introduction Before we present our methods for estimating parameters in GLMMs with informative dropout and missing covariates, we give a brief introduction to GLMs, GLMMs, and methods for the missing data problems in this chapter. In Section 2.2, we introduce GLMs and the methods of estimation for parameters in GLMs. Section 2.3 describes GLMMs, briefly discusses two main methods for estimating parameters in GLMMs and reviews the literature for GLMMs. In Section 2.4, we give a literature review about methods of handling informative dropout and missing covariates respectively. 2.2 Generalized Linear Models A classical linear model is useful to model a continuous response under the assumption that the response follows a normal distribution and a linear relationship exists between 8 the mean of the response and covariates. However, in practise, some non-normal dis- tributions such as binomial, Poisson, etc, may be better assumptions for some response variables such as discrete variables. For example, we may want to study whether devel- oping a heart disease relates to the blood pressure level. Here, we treat the health status of patients' heart as our response. The response is thus a binary variable which takes values of 0 or 1, where 0 means that a patient has a heart disease and 1 means that a patient has no heart disease. Obviously, here the assumption of normality is completely unrealistic. Moreover, frequently the mean of the response can not be expressed as a linear form of the covariates. In those situations, we can not use standard linear models. Generalized linear models (GLMs), which are a extension of classical linear models, can not only deal with variables whose distributions come from the exponential family but also allow nonlinear forms between the mean of responses and the covariates. Variables in the exponential family include continuous variables such as normal and exponential, and discrete variables such as binomial and Poisson. Due to the capability to handle continuous data as well as discrete data, GLMs unify different methodologies and thus have wide applicability in practice. 2.2.1 M o d e l Specification GLMs are specified by three components including a random component, a systematic component, and a link function. Let y — (yi,y2) • • • IVN)T be a vector of independent and identically distributed (i.i.d) observations whose distribution belongs to the natural exponential family. Then the density function of each observation y* can be expressed in the form f(yu A) = exp{[yi6»i - ¥>(0i)]/a(0) + <Vu </>)}, (2.1) 9 where a(.), ip(.) and c(.) are specific functions, 6 = (#1,62, • • • , 6N)T is called the natural parameters, and <p is called the dispersion parameter. The random component of GLMs is specified by the above density function of the response variable. The systematic component specifies the relation between covariates CCJ to the linear predictor rji by a linear form V i = xJ(3 i = l,2,.•• ,N, (2.2) where j3 is a vector of regression parameters. ' The mean \i{ = E[%ji) is related to the linear predictor through the link component of GLMs rh = xj(3 = g(iii) i = l,2,---,N, (2.3) or Pi = g-1(Vi)=9~1(xJf3) i = l,2,---,N, (2.4) where g(.) which is a monotone and differentiate function called the link function. In the exponential family, if a link function g(.) satisfies g(fa) = Oi(fJ-i), then the link is called the canonical link or natural link. Binomial, Poisson and normal variables all have canonical link functions. A function g(/i,) = fa gives the identity link. For example, normal variables have the identity link function. In summary, GLMs allow for linear as well as non-linear models under a single framework. Moreover, GLMs make it possible to fit models where the underlying data are normal, Poisson, binomial, etc, by a suitable choice of the link functions. 2.2.2 Maximum Likelihood Estimation in GLMs The principal method of estimation used in GLMs is the maximum likelihood method. In this section, we will briefly describe how to obtain the maximum likelihood esti- 10 mates of parameters in GLMs. We assume <p is known, then c(yi, (f>) is a constant in the log-likelihood function about 9 and thus is not ignored in the following log-likelihood function. For N independent observations, the log-likelihood function is N l(0\y) = Y^m\yi) i=l N i=l N S o m e u s e f u l E q u a t i o n s Now we will derive some useful identities used in maximizing the likelihood function. The derivation of (2.5) with respect to 0$ gives dl 1 / d<p(0i)\ , . an i aytfli) ( 2 7 ) 39} a{cp) de2 ' 1 ' ' The following is two well-known likelihood results that we use here: K i t ' - 0 - <2-8» i dl \ „ (d2l » Substituting (2.6) and (2.7) to (2.8) and (2.9) respectively gives and Efa) = M*<) = ^ , (2-10) i ay(^) I dmjQi) I n where V ( / / i ) = dni(6i)/d9i is often called the variance function. Equation (2.11) indicates that the variance of the response depends on its mean. We differentiate two sides of equation (2.3) with respect to (3 and obtain = dg{fM) d^j d9j Upon rearrangement, the above equation can be written as d6i 1 d(3 V(ni)dg(fii)/dfjLi Maximum Likelihood Estimation (2.12) To obtain the maximum likelihood estimates .(MLEs) of /3, we differentiate (2.5) with respect to (3, and then apply (2.10), (2.11) and (2.12) to get the following score function dl(0\y) 509) = 0(3 N .dkip^ddi E -Uh\Pi\yi) Wi d6i df3 (2-13) 1 N = _ L _ y V i - fM ^ a(<^) ~[ V{ni)dg(pi)/diii Let W = diag" 1 {Vi^idgi^/d^f, ••• , VifMN^dgifi^/diJLftf}, X = (x1,x2,--- ,xN), A = d\B%{dg(ni)/dtiudgM/dya, • • • ,dg(ixN)/dfj,N} , Then (2.13) can be rewritten as = S((3) = -^-rXWA(Y - M)- (2-14) df3 a(4>) 12 MLEs can be obtained by solving the following score equation S(f3) = -^XWA(Y-»((3)) = Q- (2.15) The solution to the above equation (2.15) can be performed by Fisher scoring algorithm or Gauss-Newton algorithm. In the case of canonical links, both Fisher scoring and Newton-Raphson reduce to the iteratively re-weighted least squares algorithm. Under the regularity condition, MLEs of parameters in GLMs have the asymptotic normality property P^N((3,a(<l>)(XWXT)-1). As we see, the asymptotic covariance matrix of (3 is equal to the inverse of the expected Fisher information matrix, which is F(/3) = -E(^M) = -±-XWXT. (2.16) K 1 V d(3d(3T ' a(</>) y ' With a large sample size, we can apply this property to make inference about f3. 2.2.3 Quasi-Likelihood Approach Based on the fact that the just first two moments of variables are mainly involved in the score function, Wedderburn (1974) proposed the quasi-likelihood method for estimating parameters in GLMs. The advantages of this method are that we do not need to make specific distribution assumptions, and that its estimators own the similar asymptotic properties as MLEs. 13 2.3 Generalized Linear Mixed Models 2.3.1 Generalized Linear Mixed Models A G L M M is a extension of a G L M to longitudinal data by introducing random effects to account for correlation within repeated measurements for a given individual. It can sepa- rate the inter-individual variation and the intra-individual variation and borrow strength across individuals. Thus, a G L M M is very popular in the analysis of longitudinal data. A G L M M may be written as a hierarchical two-stage model. In the first stage, the intra- individual variation is specified by a generalized linear regression model. In the second stage, the inter-individual variation is represented through individual-specific regression parameters. Let yij denote the jth observation on individual i, % = 1, 2, • • • , JV; j = 1, 2, • • • , n,. Then there are a total of YliLi ni observations. • stage 1 (intra-individual variation) Let bi be the random effects associated with individual i. We assume that con- ditioning on bi, observations ya,yi2,-'' ,VirH a r e independent and each has the density function from the natural exponential family. fiVijlP, bi) = exp{[j/y0y - (p(dij)]/a(<l>) + c{yij, </))}, (2.17) E{yij\p, h) = faj = g(x%(3 + z%bi), (2.18) where 0 is a dispersion parameter, Here we assume that (f> is known. The function g(.) is the link function, 77^ = xfj/3 + is the linear predictor, and at̂ and are two vectors of covariates such as time, baseline value, etc. 14 • stage 2 (inter-individual variation) r1i = xjf3 + zjbi, (2.19) bi~N(0,D), (2.20) where ^ = [rjn,--- ,r)ini)T, xt - (xa, • • •-, xini), and zt = (zn,--- ,zini), and /3 is a vector of fixed parameters. We assume that the random effects, bi's, are i.i.d. The covariance matrix D in (2.20) quantifies the random inter-individual variation. 2.3.2 Maximum Likelihood Estimation Let Vi = (y )T. From the preceding section, the joint density of y = (yl, • • • , yN) and 6 = (6i, • • • , 6jv) can be written as N m f(y, b\f3, D) = HH f(yij\(3, bMibilD). (2.21) i=i j=i Since random effects b are unobservable variables, we integrate out random effects and obtain the marginal distribution for y N . m f(y\(3,D) = TJ / lJ{/(yy|A W(M^)}* . - (2-22) Thus, the corresponding log-likelihood is N / . m \ l(f3,D\y) = yj J[fMP> KfiPiWh J . (2.23) If the above log-likelihood has a closed form, we can obtain the MLEs of param- eters in GLMMs by solving the score equation as usual. However, usually integration with respect to random effects is intractable such that we can not get the closed form for the log-likelihood. This problem results in two main approaches of estimation for pa- rameters in GLMMs including an exact likelihood inference method based on numerical 15 integration and an approximate inference method based on linearization procedures via Taylor series expansion. In the exact inference, when integration becomes intractable due to moderate to large random effects, one may solve this problem by implementing the Monte Carlo E M algorithm. The exact inference method works very well with a small dimension of random effects. However, the computation may become quite de- manding or unstable as the dimension of random effects increases, while the approximate inference method avoids the computation problem by integrating out the random effects. The strategy for the approximate inference method is to iteratively solve L M E models based on first-order or second-order Taylor series expansion around current estimates. If the number of the intra-individual measurements (measurements for each individual) is large enough, the approximate method may give reasonable estimates for parameters. Otherwise, approximate MLEs may be inconsistent. 2.3.3 Literature for Generalized Linear Mixed Models McCulloch (1997) derived a Monte Carlo Newton-Raphson algorithm and combined it with a simulated maximum likelihood method to come up with a hybrid method for GLMMs. His simulation study showed that the Monte Carlo E M algorithm, the Monte Carlo Newton-Raphson algorithm and the hybrid method worked well in calculating MLEs for GLMMs, and the hybrid method gave more precise estimators. Booth and Hobert (1999) proposed two new implementations for the maximum likelihood fitting in GLMMs. Both methods are carried out by the Monte Carlo E M algorithm. The main difference is that the first method uses a rejection sampling to generate samples from the exact conditional distribution of the random effects, while the second one uses a mul- tivariate t importance sampling. Breslow (1993) proposed a penalized quasi-likelihood (PQL) method and a marginal quasi-likelihood (MQL) method, and demonstrated their 16 suitability for inference in GLMMs by simulation and application in several examples. In the simulation study, PQL and M Q L made correct inferences on regression coefficients, but underestimated parameters a bit (in absolute value). Vonesh (2002) proposed a conditional second-order generalized estimation equation (CGEE2) to estimate the pa- rameters in GLMMs and also showed that the efficiency of estimators was improved due to the involvement of the second-order moment. 2.4 Literature for Missing Data We frequently encounter the missing data (response or/and covariate) problem in prac- tise. However, ignoring missing data or using overly simple methods to handle missing data often leads to invalid inference. Thus, it is very important to find appropriate ap- proaches to deal with missing data in our hand. Various strategies for considering the missing data mechanism have been proposed in the recent literature. 2.4.1 Literature of Informative Dropout Wu and Carroll (1988) considered linear mixed effect models with informative dropout under the assumption that the informative dropout could be modeled by a probit model which included the random effects as its covariates. Diggle and Kenward (1994) de- rived a likelihood method to get MLEs in a multivariate linear model with informative dropout modeled by a logistic regression model which included the response as covariate. Computation of the likelihood was speeded up by using the probit approximation to the logit transformation. Their simulation work has shown that considering the informative dropout mechanism in the statistical inference reduces the bias caused by the ordinary least square (OLS) estimator or by only considering the informative dropout as M A R . 17 Little (1995) gave a review on modeling the dropout mechanism in repeated-measures studies. Regarding how to factor the dropout mechanism, models handling dropout were classified into selection models and pattern-mixture models. The main difference between two types of models is that we need to specify the form of missing data mechanism in the selection models while pattern-mixture models do not require that. He classified NIM into nonignorable Outcome-Based missing data where the dropout depends on missing values, and Random-effect-Based missing data where the dropout depends on future val- ues. He also suggested to examine the sensitivity of results to the choice of missing data mechanism when we almost know nothing about the missing data mechanism. Ibrahim, Chen and Lipsitz (2001) developed a Monte Carlo E M algorithm to obtain MLEs in GLMMs with informative dropout and nonmontone missing data patterns. Moreover, they proposed that the missing data mechanism may be modeled by a logistic regression or a sequence of one-dimensional conditional distributions which may reduce the number of nuisance parameters. 2.4.2 Literature of Missing Covariates Little (1992) defined three special types of patterns of missing covariates: (i) univariate missing data where only one covariate values are missing, (ii) monotone or nested missing data where the (j + l)th covariate Xj+1 is observed for every case in which the jth (j = 1, 2, • • • ,p) covariate Xj is observed and (iii) a special pattern where two covariates can not be observed at the same time. He reviewed the methods of estimation in the' regression models with missing covariates. The six reviewed statistical methods dealing with missing covariates are compared in this paper, including complete-case methods, available-case methods, least squares on imputed data, maximum likelihood, Bayesian methods and multiple imputation. He suggested that the maximum likelihood, Bayesian methods and 18 multiple imputation would be a better choice for dealing with missing covariate problems. Moreover, he preferred the maximum likelihood in a large sample and Bayesian methods or multiple imputation in a small sample. Ibrahim (1990) analyzed the problem of missing covariates (MAR) in GLMs with discrete covariates and applied the E M algorithm to obtain MLEs under the assumption that the missing covariates came from a discrete distribution. The asymptotic variance of MLEs was estimated by computing the observed information matrix via Louis's method. Ibrahim, Lipsitz, and Ghen (1999) proposed a Monte-Carlo E M algorithm for estimating parameters in GLMs with nonignorable missing covariates. In this paper, they assumed a multinomial model for the missing data mechanism and a sequence of one-dimensional conditional distribution for unobserved covariates. Wu and Wu (2001) estimated parameters in nonlinear mixed effect models with missing covariates (MAR) by a three-step multiple imputation method. In first step, they fitted a hierarchical model without covariates. Then they imputed the missing covariates based on a multivariate linear model implemented by Gibbs sampler, and created B independent complete datasets in the second step. In the last step, they used the standard complete-data method to analyze each dataset and thus obtained the overall inference based on B analysis results. 19 Chapter 3 Exact Inference for GLMMs with Informative Dropout and Missing Covariates 3.1 Introduction In this chapter, we develop an exact inference method based on numerical integration to obtain MLEs for parameters in GLMMs with informative dropout and missing covariates. The proposed exact method is implemented by a Monte Carlo E M algorithm, which need to generate samples for missing values and random effects by Gibbs sampler in each E M step. In Section 3.2, we give a description of GLMMs with informative dropout and missing covariates, considered in this thesis. Section 3.3 describes a Monte Carlo E M algorithm for implementing the exact inference method. A detailed description of our sampling methods is provided in Section 3.4. In Section 3.5, we present a P X - E M algorithm, which may boost the convergence rate of the standard E M algorithm. Computation issues regarding our algorithm are discussed in Section 3.6. 20 3.2 Models and Likelihood We assume that data are collected from N individuals. Let y{ = (yu, • • • ,yini)T', where ytj is the outcome for individual % at time Uj, j = 1,2,-•• , n$, i = 1,2, ••• ,JV. The response t/j may contain missing values due to dropouts. So we write y{ = (yobSti, ymis,i)> where yO0Sti corresponds to the observed components oiyt, and ymis j contains the missing components of yt. Let r, = (ra,-- - ,riTH)T be a vector of missing response indicators such that Tij = 1 if is missing for individual % at time Uj, and = 0 if yij is observed for individual i at time t^. Let a;, = (xa, • • • ,Xif)T be a (/ x 1) vector of time-independent covariates for individual i. Since the time-independent covariates may also contain missing values, we write Xi = (x^i, xmiSti), where xO0S!i = the observed part of Xi and xmiS^ = the missing part of Xi. Let Si = (sn, • • • , S j / ) T be a vector of missing covariate indicators such that = 1 if Xik is missing and Sik = 0 if xu- is observed, k = l,---,f. Let /(.) denote a generic density function. If the response and all covariates are completely observed for each individual, the corresponding G L M M can be written as a hierarchical two-stage model as follows. IMP, bi) = e x P [{ViAjM - vtfijMfiM) + c(Vij, 4>)], (3.1) Vij =9(lMj) = Al/3 + zJjbi, (3.2) bil^dN(0,D), j = l,---,m, i = l,---,N, where £ , (y i J |6 j) = pij and (j) is the dispersion parameter (here we assume that </> is known). The function g(.) is a link function, is the linear predictor, (3 = (/3i, • • • , (5P)T is a vector of fixed effects, and 6j = (bn, ••• • ,biq)T is a vector of random effects. The covariate Afj = (xf,tfj) is a (1 x p) vector, where is a vector of time-dependent covariates. Usually, the covariate vector ẑ - is a subset of Aij. The q x q matrix D 21 quantifies the random inter-individual covariance. By integrating out the unobservable random effects bi: we obtain the following complete-data marginal distribution /(!/!••• ,yN\Xl--- ,xN,f3,D) = l [ / Y[{f(yij\p,bi,Xi)f(bi\D)}dbi. (3.3) i=l J j=l In the presence of missing values in the response and covariates, the complete- data marginal distribution becomes more complicated. When the missing responses are informative, we have to take into account the missing data mechanism, i.e., the distribution of the missing data indicators TV Otherwise, the estimates of parameters may be biased. In this thesis, we make the following assumptions: (i) The missing covariates are MAR, i.e, the missing covariate mechanism does not depend on any unobserved values, but may depend on observed values. In other words, the density function for the missing covariate indicator S j satisfies f(si\yuxt,S) = f ( s i \ y o b s i , x o b S t i , S ) , where S is a vector of parameters, (ii) The missing responses are informative, i.e, the missing response mechanism may depend on the unobserved values. We denote fir^y^Xi,^}) as the density function of the missing response indicator, where tf) is a vector of parameters, (iii) Let f(Xi\a) to be the density function for covariates X j , where a is a vector of parameters. Modeling strategies for specifying the missing data mechanism fir^y^Xi,^) and the covariate model f(xi\a) are explored in Chapter 5. By integrating out y m i s i and x m i S j i , we obtain the marginal distribution for the observed data ( y o b s , x o b s , r , s). N n p /* Tli f(y0bs,xobs,r,s\(3,D,ip,d,at.) =]J / Y[{f(yij\P,bi,xi)f(bi\D)f(xi\a) t=iJ J J j=i (3.4) fiTilVi, x u )f{Si\Xi, yi, 6)}dbidxmiStidymiSi, where y o b s = (yobSil, ••• , yobs,N), x o b s = (xobSil, ••• , x o b s > N ) , r = (n , • • • , r N ) and s = (si, • • • , SN). Rubin (1976) showed that the missing data mechanism can be ignored from likelihood inference if the data are MAR. Since we assume that the missing covariates 22 are M A R , ignoring the missing covariates mechanism leads to the the following observed data log-likelihood: Maximizing the above log-likelihood gives us the MLEs for parameters in the G L M M . However, the intractable integration in (3.5) makes the observed data log-likelihood diffi- cult to maximize..In this thesis, we propose an exact inference via Markov Chain Monte Carlo techniques and an approximate inference method via Taylor series expansion. In next section, we describe the Monte Carlo E M algorithm in details, which implements the exact inference method. The approximate inference method will be illustrated in Chapter 4. The E M algorithm (Dempster, Laid, and Rubin, 1977) is a very useful and powerful algo- rithm to compute MLEs in a wide variety of situations such as missing data and random effect models. Each iteration of a E M algorithm consists of an E-step that evaluates the expectation of "complete data" log-likelihood conditional on the observed data and previous parameter estimates, and a M-step that updates the parameter estimates by maximizing the expectation of the conditional log-likelihood. This iterative computation between the E-step and M-step till convergence leads to the MLEs. If we treat (yobSti, ymia^ xobsA, xmis4, bt, T\) = (yi} xt, fy, n) as the "complete data", l((3,D,ij>,cx\yobs,xobs,r,s (3.5) 3.3 Monte Carlo E M Algorithm 23 the complete data density for individual i is given by f(yi)xi)bi,ri\f3,D,ip,a) = f{Vi\P, bi, ̂ i)/(x i|o;)/(6 i| JD)/(r i|t/ i, xu ip). This leads to the complete data log-likelihood JV i=l N = £ [log{/d/il)3,6i,Xi)} + l o g g i a ) } ( 3 - 6 ) i=l + log{ / (6 i | £>)} + logifinlVi, xu iP)}], where 7 = (j3, a, ip, D) and (̂7; yt, xiy 7-j) is the contribution to the complete data log- likelihood from the ith individual. Note that we are mainly interested in estimating the parameters (/3,D), and treat (cx,ip) as nuisance parameters. Ibrahim et al. (2001) proposed a Monte Carlo E M algorithm for estimating param- eters in GLMMs with informative dropout without missing covariates. Here we extend their method to GLMMs with informative dropout and missing covariates for simultane- ous inference. 3.3.1 E-step Let 7^ be the current parameter estimates. Then the conditional expectation of the complete-data log-likelihood given the observed data for individual % at the (t + l)st E M 24 iteration is given by Qi{i\i(t)) E! (jiilf > Vii x i i Ti)\yobs,ii Xobs,ii fii ^) = / / /[log{/(j/ i | iS,6i ) a; i )} + log{/(aj i.|a)} (3.7) + logl/^D)} + iog{f(n\yi, Xi,ib)} f{ymis,iiXrnis,iibi\yobs,i>Xobs,iirii7^)dbidym^ =h +I2 + I3 + h. In general, the above integration is intractable and does not have a closed form expres- sion. However, this integral can be evaluated by using Monte Carlo approximations (Wei and Tanner (1990)). Specifically, a sample of size {(y^i3i,x^t ubf^), b\mi))} can be drawn from f ( y m i a i i , xmia,u bi\yobS:i, xobs^, ru 7 ( i ) ) via Gibbs sampler along with the adaptive rejection algorithm (Gilks and Wild, 1992). Then we may approximate QtOrl*/^) by 1 rrii + m log{/(aJo6»,i, âmL.iI")} + - E 1 ° g { / ( ^ ) i ^ ) } (3.8) 1 i=i For simplicity, we may take m; constant in each iteration. However, increasing m ; with each iteration may speed up the E M convergence (Booth and Hobert, 1999). The E-step 25 for all individuals at the (t + l)st iteration can be written as . C ? ( 7 | 7 ( i ))=£ft ( 7 l 7 W ) i = i = E E — l°g{f(yobs,i, Vmis,i\P> ^ \ X o b S t l , X^i)} i = l j=l 1 + E E ~ l o g { / ' ( X < ^ , i > V m i s M ) ) * = i i = i * (3.9) JV J7lj _ + £ £ - i ° g { / ( * ? , i ^ ) } JV TOj v m,- + £ £ ~ l 0 S { / ( r ^ 0 b S , i . J/mL,i. </>)} t = l j = l =Q(1)(/3|7W) + g ( 2 ) (« |7 W ) + Q ( 3 ) (^l7 W ) + Q [ i ) W \ l [ t ) ) . 3.3.2 M-s tep We can obtain the updated estimates 7^ + 1) at the (t + l)st iteration by maximizing Q(7 |7^) - Assuming that the parameters (3, a, D and ip are alldistinct, we can update (3, a, D and ip by maximizing Q^2\ and Q ( 4 ) separately at the M-step. The maximizer / 3^ + 1 ' for may be computed via iteratively re-weighted least squares where the missing values are replaced by their simulated values {y^is j , x^is t, 6^}. / 3 ( t + 1 ) =argmax{Q ( 1 )(/3, |7 ( i ))} JV (3.10) =argmaxj^ ^ —{f(y0bs,u Vmu,i\P, b\J\xobsA, x ^ { ) } . t=l 3=1 The maximizer £)( t + 1 ) for can be written as follows: L>(*+1) =argmax{Q ( 3 )(£>, |7 ( t ))} JV rm (3-11) =argmax£E — M/C^P)} 1=1 .7 = 1 26 To update a and ip, one can use standard methods for commonly used models such as multivariate normal models and logistic regression models. =argmax{Q(2>(a,|7(t))} a N mi (3.12) =arg max VV- \og{ f {x o b s ^ , x^] 41 a)}, V ( t + 1 ) =argmax{Q^(^,|7(*))} Af m j (3.13) =argmax E E ~ l o g { / ( r ^ o b S , i > I/£L,i, */>)}• i=l 3=1 To obtain the MLEs 7, we may start with any reasonable values for 7, which can be obtained by the complete-case method or other naive methods , and then iterate between E-step and M-step until convergence is reached. 3.3.3 Variance Estimation The asymptotic covariance matrix of 7 can be obtained by the method of Louis (1982). Specifically, note that the observed information matrix equals the conditional expected complete information minus the missing information, that is, Iobsil) = Icom{l) ~ Imis\obs(l)- (3-14) Let and Qh\i) = E <?<(7l7) = E E -s<*h)> i=l ' i=l k = l r r k 27 Since /3, a, ip and D are distinct, matrices <2(7|7), Q(7|7) and ̂ •('y) are block diagonal. Then, based on (3.14), the asymptotic observed information matrix gives { N mi N -\ E E " 5 « ( 7 ) ^ ( 7 ) - E 4 ( 7 l 7 ) Q r ( 7 | 7 ) • (3-15) i=l j = l ^ i=l J Thus, the asymptotic covariance matrix of 7 can be approximated by , c o v ( 7 ) = / ^ ( 7 ) . ( 3 - 1 6 ) 3.4 Sampling Methods 3.4.1 G i b b s Sampler As we can see from the preceding section, generating samples from the conditional dis- tribution /(2/mi S, i,a;mj S ]i,6 i |t/ o h S )j,cc 06 S ii,ri,7 ( t )) is crucial for implementing the E-step of the Monte-Carlo E M algorithm. Gibbs sampler is a popular method to generate samples from a complicated multidimensional distribution by sampling from each of the full con- ditional distributions in turn. Here we use the Gibbs sampler to simulate the "missing values" as follows. Set initial values y ^ i s i , x ^ i s i and b\°\ Supposed that the current generated values are yj*]S ) i, x { ^ i s i and bf\ Step 1. draw a sample for the missing responses {2/̂ }̂ from f{ymis,i\yobs,iiXobs,i,Xmis,iibi ,fi,^ ^), Step 2. draw a sample for the missing covariates {x^1-} from f{Xmis,i\yobs,ii v\nis}i xobs,i, °| \ ri,~f^), a n d Step 3. draw a sample {of + 1 )} from f ^ y ^ y ^ ^ X o ^ u X ^ 1 } ^ ^ ^ ) . 28 After a sufficiently large burn-in of d iterations, the sampled values will achieve a steady state, that is, {(y^ts]} > xm?s]i> b^1^), k = d + 1, • • • ,B} can be treated as samples from the multidimensional density function f ( y m i s A , xmis^, bi\yobs<i, x o b S t i , rit 7(t)). 3.4.2 Adaptive Rejection Algorithm Gilk and Wilks (1992) proposed an adaptive rejection algorithm for effectively sampling any univariate log-concave density function. In the current situation, we can write f(Vmisj\Vob8,i>xi>bi,ru'Yit)) oc f(yi\bi,xi,P{t))fir^y^Xi,^), fiXmiaAVu xobs,i, bi, n,7 ( i )) oc /(yjbj, x{,P{t))f(xi\a(t))f{ri\yi, Xi,ip{t)), f i b ^ x u r u ^ ) oc fiyilbuXupMmbilD®). Density functions f(yi\bi,xi,j3^>) and f(bi\D^) often come from the exponential.fam- ily, and thus are log-concave in each component of y m i s i , x m i S t i , and b, respectively. If /(rjlj/j, Xi,ip^) is log-concave in each component of y m i s < i and f(xi\D^) is log-concave in each component of x m i s < i , then the products of log-concave functions, f{ymiS,i\y0bs,u xi> bi, fi,7W), f(xmi8j\Vi,Xob8,i,bi,ri,'y(t)), and / ( b j l ^ a ^ r ^ W ) , are log-concave. So we can use the adaptive rejection algorithm to generate samples from f(ymiSti\y0bs,i,xi, bt, rt, 7^), f(xmis,i\yi'xobs,i,bi,ri,^) and / ( b ^ , xt,r*j,7(i)) respectively. Note that the adaptive rejection sampling can only be applied to the univariate case, but y m i s i , x m i S j and bj are often multidimensional. Thus, to implement the Gibbs sampler described earlier, we need to modify the sampling scheme to incorporate multidimensional variables, as described below. For example, suppose that ymis^ is a multivariate of dimension I, that is, y m i s t = )T. Since the function f{ymis^\yobs,i, xi, bi, Tj,7 ( t ) ) is log-concave with 29 respect to each component of y m i s i , and k,i\Vobs,ii {ymis,h,i, h ^ k}, Xi, bi, riy 7̂ ) OC f{ymiSti\yooS:i, Xi, bi, 7"j, 7 )̂, where k = l , - - - ,1, the function f{ymis^yobs^ {ymis,h,i,h ^ fc}, xit bi, rit 7«) is log- concave with respect to y m i s Thus, another Gibbs sampler, together with the adaptive rejection sampling, can be used for generating a sample from f ( y m i S i i \ y o b s j , xiy 6j, 7**, 7 )̂. Specifically, we can proceed as follows. Step 1. use the adaptive rejection sampling method to generate j fr°m f{ymis,iAy0bs^{ymis,h^h> 2},xuburu~i®); Step 2. use the adaptive rejection sampling method to generate y^ts^li f r o m f(ymis,2,i\yobs,i,{yttslli>y{mis,h,i,h > 3 > , » i . b * > 7 ( t ) ) ; Step Z. use the adaptive rejection sampling method to generate y m i s t i from f(ymi8,l,i\Vob8,i> {yttsl,Vh ^ Xu 6;, rU 7W). After a burn-in period, the vector jy^ , 1 ^; • • • , 2/mis,V} m a y D e treated as a sample from f(ymis,i\Vobs^xhbi^in[t))- Samples from f(xmia,i\Vi>zofeSii,buru7(i)), and / ( b ^ , a:*, T*i)7 )̂ c a n D e obtained in a similar way. 3.4.3 Reject ion Sampling When the density functions do not satisfy the log-concave property, the usual rejection sampling method can be used for generating the desired samples. For example, suppose that we want to sample from f(bi\yi,Xi,ri,'y®), which can be written as f(bi\y^Xi,ri, 7M) = cf(bi\D^)f(yi\bi,xi,f3^), where c is a constant, then the usual rejection sam- pling method can be described as follows. 30 Step 1. generate a random value b* from f(bi\D^), and draw a sample U from the Uni- form(0,l), Step 2. accept b* as a sample from fibily^Xi,^,^^) if U < f(yi\bi,xI,^)/T where T — sup{f(yi\u,xi,(3^)}. Otherwise, reject b* and go to step 1. u Samples from f(ymi3ti\yobati,Xi,bi,ri,'yW) and f(xmia<i\yi,xobaji,bi,ri,'yW) can be ob- tained in a similar way. 3.4.4 S a m p l i n g M e t h o d for B i n a r y Var i ab l e s If the missing variables are binary variables, then we may use an easier way to generate the desired sample. Here, we take the missing response ymis, as an example. Suppose that the response is binary and we want to draw samples from f{ymiSi\y0bsi,xii bi,ri,*y^). For simplicity, here we assume that ymiSji is univariate. The corresponding sampling procedure is described as follows. Step 1. draw a sample U from the Uniform(0,l), Step 2. take 0 as a sample from f{ymiSti\yobSti, xu bu rh 7W) if U < f{0\yobSti, xu ru 7(t))- Otherwise, take 1 as a sample. 3.5 P X - E M Algorithm Although the E M algorithm is a very popular tool for estimation due to its easy imple- mentation and stable convergence, it may converge quite slowly in some applications such as ours. To speed up the convergence, many acceleration methods have been proposed (e.g. Liu and Rubin, 1994a, Meng and Van Dyk, 1997), A particularly useful method is 31 the P X - E M algorithm (Liu et al, 1998) , which speeds up the E M algorithm by introduc- ing additional working parameters to the model. For the P X - E M , the E-step is usually the same as the standard E M , while the M-step needs to maximize the expected log- likelihood over the original parameters and the working parameters. Thus, the P X - E M algorithm can be obtained by simple modification of the standard E M . Liu et al (1998) showed that the P X - E M algorithm may dramatically accelerate the rate of convergence without loss of the simplicity and stability of the standard E M . Next, we show how to ap- ply the P X - E M algorithm to GLMMs with informative dropouts and missing covariates. We may expand the G L M M (3.1)-(3.2) by introducing additional working parameters as follows. f(Vij\P*, bi) = exp [{yijdijinij) - <p(0ij(lMj))}/a.(<t>) + c(y«, <f>)], (3.17) Vij = 9(lMj) = ATj/3* + zJjAbi, (3.18) 6 , ~ i V ( 0 , D * ) , j = l,..-,ni, i = l,-..,N, where A is a q x q matrix, called working parameters. The P X - E M algorithm is the stan- dard E M applied to the expanded models (3.17)-(3.18) rather than the original models (3.1)-(3.2). Specifically, the E-step and the M-step are described as follows. E - s t e p : Let 0 ( 4 ) = ( /3* ( 4 \a * W V>*W,£>* ( i ), A ( i ) ) = {P{t\ot^\^t],D^,Iqxq) be the current estimates of the expanded parameters. The E-step of P X - E M is obtained by simply adding the working parameters A to Q(.), i.e, the E-step of the standard E M in Section 3.2.1. Then the conditional expectation of the complete-data log-likelihood given the observed data for the model (3.17)-(3.18) can be written as 32 N g*(0|0(*)) =^g*(©|0(t)) 2 = 1 N nu = E E ~ l°s{f(yobs,u V%L,i\P*, ^\xobSti, a£L,z)} i=l 3=1 ^ N mi ^ 1=1 3 = 1 N mi ^ + E E ^ l o § { / ( 6 ? ) i ^ ) } i=i j=i 1 N mi ^ + E E ~ l o S { / ( r ^ o 6 S , i > Vmls^ V>*)} i=l j=l ^ =Q*W(P\ A|0(t)) + Q<2\a*\®W) + Q<3\D*\&^) + Q<4\ip-*\&V). (3.19) Obviously, everything in this E-step is the same as the E-step of the standard E M in Section 3.2.1, except the extra working parameters A in (3.18). M - s t e p : By the same standard maximization procedures as the M-step in Section 3.2.2, we maximize Q*^\ Q*^2\ Q*^ and Q*^ separately to update the estimates of the expanded parameters to (3<t+l\ a < t + 1 \ ip<t+1\ D<t+^ and A( t + 1 ) . The only difference in this step between the P X - E M and the E M is that the P X - E M maximizes Q*^ over (3* and A, while the E M does this only over /3. The reduction to the original parameters in the models (3.1) — (3.2) gives p{t+l) _ )g»(t+i) ) a ( t + l ) = a*(*+i)> = £)(*+!) = \(t+l)Jj*(t+l)j{(t+l)T_ This iterative calculation between the E-step and M-step until convergence leads to the MLEs of parameters in the original models (3.1) — (3.2). 33 3.6 Convergence When carrying out the Monte Carlo E M algorithm, Monte Carlo samples for the "missing data" are drawn at each iteration to approximate true values. Consequently, Monte Carlo errors are introduced. One way to reduce the Monte Carlo errors is to increase the Monte Carlo sample size m*. However, the computation is intensive for a large rrij. Because the estimate 7^ in the initial E M steps is often far from the true values of the parameters, Monte Carlo samples of a large size may be wasted. Thus, we usually use a small m, at initial iterations, and then increase with the iteration, as suggested by Booth and Hobert (1999). After an initial burn-in period, the Gibbs sampler converges to a stationary state and thus produces draws from the conditional density function f(ymisi,xmis^,bi\yobSti, &obs,i, ri,7^)- Obviously, the determination of the burn-in period is very important. We way use common diagnostic methods to determine the burn-in period, such as time series plots. The proposed Monte Carlo E M algorithm often works well for the models with a small dimension of random effects. When the dimension of the random effects is not small, however, the proposed E M algorithm and Gibbs sampler may converge very slowly or even not converge. Therefore, in next chapter, we propose an approximate inference method which may avoid these convergence difficulties and may be much more efficient. 34 Chapter 4 Approximate Inference for G L M M s with Informative Dropout and Missing Covariates 4.1 Introduction In Chapter 3, we have described the exact inference method implemented by the Monte Carlo E M algorithm. However, the exact method may be computationally intensive and may even offer potentially computational difficulties such as slow or non-convergence. Moreover, when the dimension of random effects is not small, sampling random effects may result in inefficient and computationally unstable Gibbs samplers, which may lead to a high degree of autocorrelation and a lack of convergence. In the presence of missing response and missing covariates, these problems become more serious. In this chapter, we propose an approximate inference method which is not only much more efficient, but also avoids potential computational difficulties. This approximate method is obtained by Taylor series expansion and it avoids sampling the random effects in the E-step by 35 integrating them out. Pinheiro et al. (2001), in a different context, have showed that the convergence rate of the E M algorithm can be greatly improved by integrating out the random effects in the E-step. The outline of this chapter is as follows. In Section 4.2, we present the approximate inference method for GLMMs without missing values, and then extend this method to GLMMs with informative dropout and missing covariates implemented by the Monte Carlo E M algorithm in Section 4.3. In Section 4.4, we briefly describe the sampling methods used in Section 4.3. We conclude this chapter with a discussion about the P X - E M algorithm, an extension of the standard E M algorithm. 4.2 Approximate Inference without Missing Values As described in Chapter 3, a G L M M is written as 1/3, bi) = exp [ { y y 0 y ( / z « ) - ¥>(0y(/xy))}/a(#) + ^Va^)], (4.1) Vij = 9(tMj) = AjjP + zJjbi, (4.2) b ^ N ^ D ) , j = l,---,rH, i = l,---,N, where the notation is the same as (3.1)-(3.2) in Chapter 3. Denoting the observation vector as yt = (yn-- - ,yim)T, and design matrices as Ai = (An,-- - ,Afc) T and z% = (ZJI, • • • ,ziq)T, then the conditional mean and covariance of ?/j satisfy E(yi\bi) = fj,t = g~1(Af(3 + zjbi) and cov(y i|b i) = C f = diag{V(//y)/a(0), j = 1, • • • ,nj} respectively. The above G L M M yields a marginal log-likelihood function by integrating out the 36 random effects KP,D\Vi-m- ,VN) N Hi = l o g | /flfifiVijlfrbimbiWdbi 1 [J i=lj=l ) / e x P fe E { l0S(f(yij\P, bi)) + l o g ( / ( 0 i | D ) ) } | dbx • • • db (4.3) i=l 3 = 1 To estimate the parameters in the G L M M , we need to maximize the log-likelihood func- tion (4.3). However, in most cases, the integral in (4.3) is intractable. Evaluating the integral in (4.3) by Monte Carlo methods for the exact inference method may offer po- tential computational problems, as noted earlier. Here, we consider a much more efficient approximate inference method based on Taylor series expansion. The following approxi- mation is based on a second-order Taylor series expansion about the current parameter estimates 6, which is equivalent to the Laplace's approximation (see [20] [26]), d2k(6)I J ek^dO « (2TT)! 0=0 ok{0) (4.4) dOdOT1 where 0 is a q x 1 vector, k{0) is a function of 0, and 6 is a maximizer to k(0). Applying the Laplace's approximation to the log-likelihood (4.3) yields l(P,D\yi • VN) ~ ~y log(27r) 1 N 5> i = l a2fci(64) dbidbj bi=b? N $ > ( & ? ) , (4:5) i = i where ki(bi) = Y^Li { l°g(/(yy|/3, bi)) + log(/(6j|D))|, and 6° maximizes the function Hh). Maximizing the approximate log-likelihood function (4.5) with respect to /3 and D, and maximizing &i(bj) with respect to 6j are equivalent to jointly solving the following score equations (see [20] [26]), E i 1 V W i - 1 J 5 i ( y i - ^ i ) = 0> ' (4.6) a(</»)^ T W i - 1 5 i (y i - ^) = D-%, i = 1, • • • , N, 37 where J5j is a x diagonal matrix with diagonal terms <9#(̂ y )/cty%- and Wt = B^Bi. It can be shown that the solution to (4.6) via Fisher scoring is equivalent to iteratively solving the following linear equations (see [20] [26]), ATW~lA ATW~XZ ZTW-XA ZTW-XZ + DA-X ATW-^y [ ZTW-'y ) (4.7) where i/j = A?p + zfbi + 5i(?/j - g 1(AfJ3 + zfbi)), J3 and b{ are the current es- timates. Row vectors are yT = (yT, • • • ,yjf) and bT = (bf, • • • ,bff), and matrices are AT = (AT... ,ATN), ZT = diag(zf, • • • ,zTN), Dd = diag{L>, • • • , D} and W = diag{Wi, • • • , WN}- Solving the linear equations (4.7) is equivalent to solving the follow- ing linear mixed effect (LME) model (see [3] [26]), y^Af/3 + zfbi + ei, i = l,---,N, (4.8) where ej's are independent with a normal distribution N(0,Wi), bj's are independent and have an identical normal distribution N(0, D), and and bi are independent. From (4.8) we can derive bAn ~ N (EiZiWr^y. - AtP), E j ) , (4.9) where E* = (ZiW^zf + J 9 - 1 ) - 1 , and yi~N(AT{3,zTDzi + Wi). (4.10) In summary, approximate estimates for GLMMs can be obtained by iteratively solving the liner mixed effect model (4.8), which can be easily handled by standard software packages such as Splus and SAS. 38 4.3 Approximate Inference with Missing Values In the previous section, we discuss an approximate inference method for the G L M M without missing values. However, in our G L M M (4.1) — (4.2), the response y{ is non- ignorably missing and the covariate xt is ignorably missing. In this section we consider a similar method for GLMMs with missing values. Note that missing values in G L M M (4.1) — (4.2) correspond to missing responses in y{ and missing covariates in Xi in the L M E model (4.8) respectively. Here we write yt = (yobs^,ymiS}i), where yobSji contains the observed components of y{ and y m i s i contains the missing components of y i . Note that yobSti and ymis^ are appropriate functions of the missing and observed components of y{ respectively. So the missing response indicator for y{ is the same as the missing response indicator for yt. For L M E models with non-ignorable missing responses, Ibrahim et al. (2001) derived a much more efficient Monte Carlo E M algorithm by integrating out the random effects in the E-step. Here we extend their approaches to the G L M M with informative dropout and missing covariates by iteratively solving the L M E model (4.8) with non-ignorable missing responses and ignorable missing covariates. Since sampling random effects is avoided in the E-step, the rate of convergence of the E M algorithm may be greatly improved. The E-step and M-step are described in details as follows. E - s t e p : Let 7 ( t ) = (/3 ( t ) ,a ( t ) ,•j/> ( t ) ,£ ) (* )) be the current parameter estimates. The response in the L M E model (4.8) can be written as yt = Aj(3w + zfb^ + B<f\yi - g-\AT^+zJbf)) where bf = ^ z ^ 1 (y-Af^), B® = ^ l ^ - i ^ c o ^ > } and W f } = B ^ B ^ „ m . ' As in the previous section, the contribution of individual i in the (t + l)st iteration 3 9 is given by Qi{i\i[t)) J J J {\og(f{yi\!3,buxi))+\og(f{bi\D)) + log (f(Xi\a)) + log (firiiyi,xuij>))} X J'(ilmis,ii xmis,it & i | j / c * . s , i ' xobs,i, r i , 7̂ ^)dbidymis idxr { l o g ( / ( j / i | A f t i , ^ ) ) + log(/(bi|I>)) + \ogtf{xi\a)) + log(firiljji, Xi, •0))} /(bi |j/ i ,a;i .7 W )^» filfmis,ii xmis,i\yobs,ii xobs,ii fit 7̂ ^)dymis,idxmis,i = h + h + h + h, (4.11) where /(yJ/3,6j, Xj) is the normal density with mean Af/3 + zfbi and covariance W/*'. Equation (4:9) implies b ^ , xu 7 ( t ) ~ N(bf\ Sf } ) , where S 4 W = ( Z j W ^ z f + Z ) « ~ 1 ) - 1 . After some algebra, we can integrate out the random effects 6, from (4.11) and obtain the following results / i = - ^ l o g | ^ W | - ^ (yt - AJ(3 - zTbi)TW^\y, - Aff3 - zfbi) f(bi\iji,Xi,7^ ^)db^f (ymisi,xmiS^\yobsi,x0bs,t, ft,7^ ^)dymiSidxmiSti - l i o g i w ^ - ^ z i w r 1 ^ ) \ I J(Vi - AJ(3 - z?b®)TW®-\yi - AJ(3 - zjb®) f^Vmis,ii xmis,i\yobs,ii xobs,ii r i i 7̂ ^)dymiStidxmisj. 40 I2 = -l-\og\D\-l-J J {JibfDbiMbt^Xin^dbi} f{ymis,i> xmis,i\yobs,ii xobs,ii ^ii ^)dy'mis,id'Xmis,i = -1-log\D\--\tv(D-^f)) j^i ^^>i ^ ^ m i s > i ' XmisAyobs,ii xobs,ii rii lf^)dymiStidxr Since f(Xi\a) and f(ri\yi,^.i,'d}) do not depend on bj, we have ^ 3 = J J f{xi\a)f{ymis4>xrnisAyobs,iixobs,i, rii'y^ ^)dymisidxmiSti, and — J J f{Ti\yi> xi>^){ymis,i> xrnisAyobs,ii xobs,ii^ As we can see, integrals 7j, I2, h and I4 do not involve the random effects Oj. Thus we only need to generate random samples from f { y m i 3 i i , xmiS}i\y0bs,i, xobs,i, ru 7(t)). This leads to a much more efficient E M than that for the exact method. Suppose that { ( y { ^ i s 4 , x^J, • • • , a&) 4 ) } is a sample of size m* generated from f(ymiSii,xmiSti\y0bs,ii x0bsti, fi, 7 )̂. Let x\ ^ = (x 0(,S )i, xUisi), y\ ^ = (y0bs,iiymls,i)- Then 6 f f c ) = s f ^ W ^ f o W - AtT^), k = 1,2, • • • , m,. Thus C?j(7l7(i)) can be approximated as Qi{i\i[t)) - ^ W ^ - ^ i z i W ^ z ^ ) + [ - i log |D| - W E ? ' ) - ̂ B&r̂ r')] 1 fe=l fc=i fc=i 41 Therefore, the E-step for all individuals at the (t + l)st iteration can be written as Q ( 7 | 7 « ) = f ^ ( 7 | 7 « ) 1=1 - E [ - ^ ° s i ^ ( t ) i - ^ ^ ( i r l ^ ) i=l m. k=l N _ ~ rru i=l 1 k=l +E £ E /N'V)] + E [ i - E / ( ' . i * f » ! " . * ) ] =1 ' fc=l i=l ' fc=l = Q(1)(/3|7W) + Q(2)(^l7(t)) + Q ( 3 ) (« |7 W ) + QW(iph{t)). (4.13) M-step: Since the parameters in 7 are all distinct, we can maximize Q(7|7^) by max- imizing and separately, leading to the updated estimate 7(i+1). These maximizations can be accomplished by standard complete data optimization methods. The covariance matrix for the parameter estimates, 7, can again be obtained using Louis's method (1982), as in Chapter 3. 4.4 Strategies for Sampling the Missing Values To implement the E-step of the E M algorithm, we need to generate random samples for the missing response y m i s i and missing covariates xmiSii from the joint density function f(ymis,i>xmisAyob8,i>xobs,i,ri,'y(t)). As in Chapter 3, we can use Gibber sampler to draw the desired samples. The procedure is described as follows. Suppose that the current samples for missing values are y^ia^ and x^isi. Step 1. Draw a sample for the missing responses {y^s}} from fill mis,ill/obs,ii xobs,ii x m i s , i i r i i 7̂ 0> 42 Step 2. Draw a sample fror missing covarites {xmisi} from f(xmis,i\yobs,iiymis,i > Xobs,i> rii T*' After a burn-in period, the sampled vaules {ymis}ixm7s}) can.be treated as the true sample from the density function f{ymi3i, xmis,i\Vobs,ii xobs,iiri>7^ ). Note that 'f(Vmi8,AVoba,i>xi>ru'lf{t)) <x f(Vi\xi,'r{t))f(ri\yi,Xi,'yit)), where /(j/Ja^, 7^) is a normal density function with mean Af(3^ and covariance zfD^Zi. If the density function /(T*J|2/J, X j , 7^) is log-concave, we can use the adaptive rejection algorithm to draw sample in Step 1; otherwise, we may consider the general rejection sampling method. Similarly, since f(xmia,i I Vi, xobs,i, 7 W ) oc / ( & | x{, 7(t)) f{xt |7 w ) f(rt I Vi, Xi, 7(t)), as in Chapter 3, samples from f(xmiSti\yi, x0bSti, 7^) in Step 2, can be obtained using the adaptive rejection algorithm or the rejection sampling method, depending on whether both /(a3j |7^) and f(ri\yu ,̂7'*') are log-concave. 4.5 P X - E M The E M algorithm described in Section 4.3 may still be quite slow. To improve the speed of the E M algorithm, in this section we again consider the P X - E M for the approximate method, which is obtained by applying the standard E M algorithm to an expanded model. Specifically, we introduce a q x q working parameter matrix V to the L M E model (4.8) and obtained the following expanded L M E model yi = Ajf3* + zfVbl + eu i = l,-..,N, (4.14) 43 where the €j's are independent error terms with an identical distribution N(0, Wi), bi l~d N(0,D*), and €j and bt are independent. Let 0 = ((3*, a*, tp*, D*, V), where is a vector of parameters for the dropout model and a* is a vector of parameters for the covariate model. Note that model (4.14) is reduced to model (4.8) when V = Iqxq. E-step: Let 0(*> = (f3{t),a^,ip{t),D{t),Iqxq) be the current parameter estimates. Then in the E-step the conditional expectation of the complete-data log-likelihood given the observed data for the expanded model (4.14) can be written as JV <?*(0|0«)=E ^ ( e i e W ) £ [ - \ log |wf I - l-HV?ZiWrlz*V^) i=i N i=l 1 rat £ ( v i f c ) - %F ~ ̂ vb^fwr1^ - AJ(3* - zfvbT) 1 k=l + E [ - j log m - i ^ - E j . ) , _ _ L g ( «J-V- i f " ' ) i=l 1 fc=l J (4.15) where W® = diag { V J ^ ) ^ ^ ) / ^ ) 2 / ^ ^ ^ ^ , , ^ ^ ) } , i f = ( z ^ ^ z f + /jw-1)-!, = ( a ^ , ^ , ) , y f } - (ySL,,Vo*,i), ^ = s W ^ W i W - ' ^ W - A ^ W ) . The sample of size rrii {(y£] S i i , a£L,i)> • • • , (ySi, ̂ S )} i s d r a w n f r o m / ( i / m » , < > ami a,i| yobSti, xobSti, 7**, 0 ^ ) by Gibbs sampler along with the adaptive rejection algorithm. Again, everything in this E-step is the same as the E-step of the standard E M in Section 4.3, except the extra working parameters A in (4.15). M-step: In the M-step, we maximize <2* ( 1 ), Q*{2\. <2* ( 3 ), and <2* ( 4 ) separately to 44 update the parameter estimates to / 3 * ( t + 1 ) , a<t+1\ t/>*(t+1), D*(t+1) and V^t+1). Then the estimates of original parameters are given by p(t+i) = /g*(<+1);Q;(i+1) = a*(t+1),ip-{t+1) = t/>*(t+1), Jj( t + 1) = y(*+1)rj)*(t+1)v(*+1)T. 45 Chapter 5 Covariate Models and Dropout Models 5.1 Introduction In the previous chapters, we have discussed how to estimate the parameters in GLMMs with informative dropout and missing covariates. As we note earlier, to provide a valid inference, we need to specify a dropout model for the missing response, and a covariate model for the time-independent covariates, and then incorporate them into our analyses. In this chapter, we describe how to specify these models. Sections 5.2 and 5.3 introduce dropout models and covariate models respectively. In Section 5.4, we discuss sensitivity analyses for the dropout model and covariate model. 5.2 Dropout Models A dropout model is the distribution for the missing response indicators r^. The param- eters in the dropout model are treated as nuisance parameters and are usually not of 46 inference interest. Thus, we should try to reduce the number of nuisance parameters to make the estimation of /3 more efficient. Moreover, too many nuisance parameters may even make the G L M M unidentifiable. Therefore, one should be very cautious about adding extra nuisance parameters. Since the missing response indicators r^, are binary, a natural model for the r^'s is a logistic regression model as follows. We may assume Tli f i r ^ x ^ ) = H-'J(\ - - , • , • ) ' ( 5 . 1 ) i.e. an independent assumption for the r^'s, and M T 13 ) = H^;yi,Xi,tij), ( 5 . 2 ) where TI^ = P r ( r y = 1) and h(.) is an often linear function of y i 5 Xi and ty . To determine a suitable function /i(.), one can consider standard model selection techniques, such as the likelihood ratio test, A I C / B I C , or consider simple and reasonable linear functions. For example, if we believe that the current missing response indicator only depends on the current or previous response values, then it may be reasonable to assume h(ip; yit Xi, tij) = ipo + ipiVij + V ^ y i j - i - Note that the independence model ( 5 . 1 ) is simple and may not contain too many nuisance parameters, but it fails to incorporate possible correlation among the r^'s. To incorporate possible correlation among the r^'s, we may adapt the model considered in Ibrahim et al. ( 2 0 0 1 ) , f(ri} ip) =f(ril\yi, xutu V j / ^ l n i , yt, Xi,ti7 ip2) ••• x / ( r y l r a , - - - , r i ) (j_i), y i 5 xu tu ip3) ( 5 - 3 ) • • • x /(r j j j jr j i , • • • , r j 5 ( n i _ i ) ,y i ; Xi,ti, ipni), where the ipfs are the parameters for the jth one-dimensional conditional distribution, ip = (•j/'i, 1P2, • • • J V ' M ) a n d M = maxlnj}. We assume that the i/>'s are distinct. 47 The one-dimensional conditional distributions in the product of (5.3) may be chosen to be logistic regression models. Again, one can choose parsimonious one-dimensional distributions in (5.3) by standard model selection techniques. Lipsitz and Ibrahim (1996) noted that model (5.3) approximates a joint log-linear model, a natural model for binary variables. 5.3 Covariate Models When some covariates are missing, we need a distributional assumption for the covariates. The parameters in the covariate model are also viewed as nuisance parameters. Ibrahim (1990) proposed a saturated multinomial model for categorical covariates with missing values. A drawback of his method is that the saturated model greatly increases the number of nuisance parameters, which increases computation burden and may make the model unidentifiable. When the missing covariates are all continuous, we may assume a multivariate normal distribution for the covariates (see [15]). To allow both continuous and categorical covariates, we may write the covariate distribution as a product of one- dimensional conditional distributions, as in Ibrahim, et al. (1999), f(Xi, CX.) =f(Xic\Xn, • • • , £ j ) C _ i , CX-c) . X f(XitC-i\Xn, • • • ,XitC-2,Oic-1) (5.4) • • • x /(arii.ai), where a = ( a i , a 2 , • • • , a c ) and a i , • • • , ac are distinct. The index c is the number of covariates that include missing values. Note that we do not need to make distributional assumptions for the completely observed covariates, which are conditioned on and are suppressed in the expressions. Note also that this modeling scheme allows the missing covariates to be continuous, categorical and mixed. For example, suppose that x\ is 48 continuous and x2 is binary. By the above modeling strategy, we may specify a normal distribution for x\ and a logistic regression model for x2 conditional on x\. 5.4 Sensitivity Analyses Since both the dropout model and the covariate model are not verifiable based on the observed data, it is important to conduct sensitivity analyses. That is, we should try other plausible dropout models and covariate models, and then assess the sensitivity of results to those different models. If there is not much difference between the results based on different models, we draw a relatively reliable conclusion. Otherwise, the results may depend on the assumed models and the conclusions may not be reliable. 49 Chapter 6 Real Data Examples 6.1 Introduction In previous chapters, we have described an exact method and an approximate method for GLMMs with informative dropouts and missing covariates. In this chapter, we will discuss application of these methods to two real datasets. In Section 6.2, we consider a dataset from the AIDS Clinical Trial Group (ACTG) Protocol 315 and investigate the viral load trajectory after an antiviral treatment. In Section 6.3, we consider a dataset from a parent bereavement project to study the pattern of change of parents' mental distress over time after their children's death. In Section 6.4, we discuss computation issues in the analyses of our examples. 50 6.2 Example 1 6.2.1 Data Description Our research is motivated by a longitudinal study from the AIDS Clinical Trial Group (ACTG) Protocol 315 (Wu and Ding, 1999). In this study, 46 HIV infected patients were treated with a potent antiviral drug, a combination of ritonavir, 3TC, and AZT. Plasma HIV-1 RNA (viral load) was repeatedly quantified on days 2, 7, 10, 14, 21, 28, and weeks 8, 12, and 24 after initiation of treatment. After the antiviral treatment, the patients' viral loads will decay, and the decay rate may reflect the efficacy of the treatment. Throughout the time course, due to individual characteristics, the viral load may continue to decay, fluctuate, or start to rebound (rise). The Nucleic Acid Sequence- Based Amplification assay that is used to measure the viral load has a detection limit of 100 RNA copies per ml plasma. If the viral load drops below the detection limit after the treatment, the viral load can not be measured, which may indicate that the treatment is successful. Note that patients with a viral load below the detection limit at early stage may have viral rebound and may have a viral load dropping again after rebound. Figure 6.1 shows the viral load trajectories for six randomly selected patients. To investigate the treatment effect, one approach is to define the response as whether the viral load is below the detection limit or not, which is thus a binary variable. In this study, some patients drop out before the end of the study, and the dropout may be informative. Thus, the response contains non-ignorable missing values. We summarize our data in Table 6.1. As we see from Table 6.1, 8.9% of our responses are missing due to patients' dropout. Preliminary studies show that some baseline covariates such as CD4 cell counts, tumor necrosis factor (measured by TNF levels) and total complement levels (measured by CH50), may partially explain variation in the viral load trajectory. However, some of 51 Table 6.1: Summary statistics Variable Sample mean Sample standard deviation Percentage of missing values Response 0.1 0.3 8.9% CD4 175.4 87.5 0% CH50 242.3 49.6 15.2% T N F 60.0 29.0 8.7% # of patients: TV = 46. # of observations per patient: nj = 7 or 8. these covariates are also missing, since in the multi-center study some baseline covariates may not be measured at some centers. As indicated in Table 6.1, the baseline CH50 con- tains approximately 15.2% missing values, the TNF level contains roughly 8.7% missing values and the CD4 cell count is completely observed. Our objectives are to model the viral load trajectory and to identify covariates that may partially predict changes of viral loads, in the presence of informative dropouts and missing covariates. 6.2.2 Models Let tjij be the viral load status for patient i at the j th visit, i = 1,2,-•• ,N, j = 1, 2, • • • ,rij, where N = 46 and nj = 7 or 8. If the viral load for patient i at the jtb. visit is below the detection limit, = 1; otherwise, y^ = 0. Naturally, we consider a logistic regression model for the binary response. To take into account the inter-patient variation and the intra-patient correlation, we add a random effect term bi to the logistic regression model and obtain the following G L M M . Pv(yij = l\f3,bi) \ l-Pv(Vlj = l\(3,bi)) ( 6 1 ) 52 logit{Pr( ?/j j = l|/3,6j)} = log where j3 = (fa,fa, fa,fa, fa), xu is the baseline CD4 cell count for patient i, xi2 is the baseline CH50 for patient i, x^ is the baseline TNF for patient i, and iy is the jth measurement time for patient i. The regression coefficients, fa, fa, fa, and fa, represent the fixed effects associated with the baseline CD4, CH50, TNF, and time respectively, and bi represents the random effect associated with each patient. We assume that b\, • • • ,b^ are independent and follow an identical normal distribution N(0, a2) with a2 unknown. In this study, the baseline CH50 and T N F contain missing values and the CD4 cell count is completely observed. For this example, it appears reasonable to assume that the missingness of baseline covariates is MAR, i.e., the missingness may depend on the observed values but not on the missing values. To make a valid likelihood inference, we need to specify a model for the covariates which contain missing values. Since CH50 and T N F are continuous and each approximately has a normal marginal distribution, the joint distribution of CH50 and TNF (i.e., Xi2 and x^) may be written as a product of two one-dimensional normal distributions. f(xi2,xa\xii,cx) = f{xa\xii,Xi2)f(xi2\xn), (6.2) where a = ( « ! , - • • , a 7 ) T , f(xi2\xii) is the density function of N(ot\ + a2Xn,a3), and f{xa\xii,Xi2) is the density function of iV(of4 + ct^xn + a§Xi2, aj). As noted earlier, 8.9% of the responses yy are missing due to patients' dropout. The dropout may be due to drug intolerance or drug resistance, so we assume that the response is non-ignorably missing or the dropout is informative, i.e., the missingness of responses may depend on the missing values. When the missing data (response) are non- ignorable, we must model the missing data mechanism in order to obtain valid statistical results. To incorporate the missing data mechanism, we need to specify a distribution for the missing response indicator. The missing response indicator is defined as = 1 if yij is missing; = 0 if is observed. Here, we use a logistic regression model for the 53 missing data indicator, which includes the current response j/y, CD4 cell count Xn and time tij as covariates and is chosen based on the likelihood ratio test. logit {Pr(r y = 1\4>)} = log | T T ^ ^ ^ y } = ^ + <t>m + + M j , (6.3) where <f> = (<f>o, 4>i, 4>2, 4>3)T- Thus, in model (6.3), we link the missingness of the response to the values being missing and therefore allow the response to be non-ignorably missing. For simplicity, we focus on the following independent model TV ni f(r\4>) = n i l P r ( ^ = W y { l - P'(ry = l ^ ) } 1 ^ - i=l j=l More complicated models without assuming r^-'s being independent are possible as well, which contain more nuisance parameters and may be unidentifiable. 6.2.3 Analysis and Results In this section, we analyze the A C T G protocol 315 dataset using our proposed methods. Note that before our analysis, covariates CD4, CH50 and T N F were standardized to avoid extremely small estimates. We consider the following methods to estimate the parameters in model (6.1) — (6.3) with informative dropouts and missing covariates: (i) the exact method using the Monte Carlo E M algorithm, (ii) the approximate method using the Monte Carlo E M algorithm. Table 6.2 shows maximum likelihood estimates of (3 = (/?o,Pi,P2,@3, At), along with their standard errors and p-values, based on the above two methods. Compared with the approximate method, the exact method sometimes gave somewhat larger estimates and smaller standard errors. For example, CD4 cell count is marginally significant (p- value 0.107) based on the approximate method, but highly significant (p-value 0.025) 54 Table 6.2: Estimates for the AIDS data Methods Parameters Po Pi Pi Pz PA Exact Estimate -4.811 0.898 0.634 -0.745 5.869 method SE 0.998 0.400 0.456 0.538 1.361 p- value < 0.001 0.025 0.164 0.166 < 0.001 Approximate Estimate -3.879 0.861 0.395 -0.291 6.240 method SE 0.789 0.534 0.598 0.600 0.880 p- value < 0.001 0.107 0.508 0.627 < 0.001 * SE refers to the standard error. based on the exact method. Since the approximate method integrates out the random effects instead of sampling the random effects, it should be faster than the exact method. However, in this example, the E M convergence for the exact method is obtained in 21 iterations, while the E M convergence for the approximate method is reached in 55 iterations. This is probably because only one random effect is included in model (6.1). From Table 6.2, both methods indicate that the time covariate is highly signifi- cant, suggesting a strong relationship between the viral load trajectory and time. The estimated coefficient for the time covariate, fa, is 5.869 based on the exact method. This means that the estimated odds of patients' viral loads dropping below the detection limit is exp(5.869) = 354 times higher when time increases by one unit (6 months). The exact method suggests that CD4 cell count may be associated with patients' viral loads. The estimated coefficient for CD4, fa, is 0.898. This means that the estimated odds of pa- tients' viral loads dropping below the detection limit, is exp(0.898) = 2.45 times higher when CD4 increases by one unit (262 cell counts). Based on the p-values, the baseline CH50 and T N F do not appear to have significant effects on patients' viral loads, using either of the two methods of estimation. 55 As discussed in previous chapters, the P X - E M algorithm should have a higher convergence rate than E M . This is confirmed in this example. The number of iterations to convergence for the exact method is 21 by the E M algorithm, while the number of iterations to convergence for the exact method is 10 by the P X - E M algorithm. 6.2.4 Sensitivity Analysis To check the sensitivity of our results to the choice of the covariate model, we re-analyze the dataset using the following alternative covariate models, which are obtained based on a standard model selection method - the likelihood ratio test. (i) Alternative Covariate Model 1 (CM1): Model (6.2) with a2 = a5 = 0. The two conditional distribution in the covariate model become Xi2\xn ~ N(cti,ctz) and ~ iV(a 4 + aexi2, a 4), i.e., xi2 and x i 3 are independent of xa. (ii) Alternative Covariate Model 2 (CM2): Model (6.2) with a2 = a5 = a 6 = 0. The two conditional distributions in the covariate model become xi2\xn ~ N(oti,ct3) and Xi3\xn,Xi2 ~ N(a±,OLT), i.e., xu, Xi2 and x^ are independent. The estimates based on the original covariate model (6.2), and the alternative covariate models CM1 and CM2 are shown in Table 6.3. As we can see from Table 6.3, the three covariate models gave similar results. This suggests that parameter estimates and their standard errors may not depend on the covariate models. Similarly, we need to assess sensitivity of our results to the dropout models. We performed sensitivity analyses based on the following choices of the dropout models. (i) Alternative Dropout Model 1 (DM1): logit {Pr(ry = 1|<£)} = (j)Q + fayij-i + foVij + foxn + fatij] 56 Table 6.3: Sensitivity analysis for covariate models Covariate Parameters Models A) Ih ft Original Estimate -4.811 0.898 0.634 -0.745 5.869 Model SE 0.998 0.400 0.456 0.538 1.361 (6.2) p- value < 0.001 0.025 0.164 0.166 < 0.001 Estimate -4.731 0.906 0.594 -0.767 5.754 CM1 SE 1.007 0.387 0.433 0.520 1.375 p- value < 0.001 0.019 0.170 0.140 < 0.001 Estimate -4.893 0.921 0.676 -0.763 5.920 CM2 SE 0.999 0.424 0.455 0.543 1.356 p- value < 0.001 0.030 0.137 0.160 < 0.001 * SE refers to the standard error. (ii) Alternative Dropout Model 2 (DM2): logit {Pr(ry = l \ < f > ) } = (po + (piVij-i + hVa', (iii) Alternative Dropout Model 3 (DM3): logit {Pr(r y = l\<f>)} = (p0 + <h^n + hUj- Note that DM3 corresponds to an ignorable missing data mechanism (i.e., MAR) . Table 6.4 gives the estimates, standard errors, and p-values based on the original dropout model and the alternative dropout models DMI , DM2 and DM3 respectively. As we can see from Table 6.4, all these dropout models, except dropout model DM2, gave similar results. Dropout model DM2, which excludes CD4 and time, produced slightly smaller absolute values of estimates and standard errors. But this does not affect our conclusion. Thus, our results are not very sensitive to the choice of the dropout models. 57 Table 6.4: Sensitivity analysis for dropout models Dropout Parameters Models ft ft ft ft ft Original Estimate -4.811 0.898 0.634 -0.745 5.869 Model SE 0.998 0.400 0.456 0.538 1.361 . (6.3) p- value < 0.001 0.025 0.164 0.166 < 0.001 Estimate -4.911 1.047 0.611 -0.735 6.656 D M I SE 1.027 0.417 0.500 0.613 1.341 p- value < 0.001 0.012 0.221 0.230 < 0.001 Estimate -3.761 0.772 0.428 -0.323 6.011 DM2 SE 0.571 0.299 0.311 0.324 0.898 p- value < 0.001 0.010 0.168 0.318 < 0.001 Estimate -4.867 0.949 0.630 -0.753 6.153 DM3 SE 1.001 0.410 0.466 0.549 1.345 p- value < 0.0001 0.021 0.177 0.171 < 0.001 * SE refers to the standard error. 6.2.5 Conclusion Based on our analyses, we conclude that a patient's viral load tends to more likely drop below the detection limit if he/she stays longer in the study, and a patient with a higher baseline CD4 cell count is more likely to have his/her viral load below the detection limit over the time course. In this example, the exact method converged much faster than the approximate method, and gave smaller standard errors. Also, different covariate models and different dropout models lead to essentially the same results, so our conclusions may be robust. 58 6.3 Example 2 6.3.1 Data Description Our second example involves a longitudinal study from a parent bereavement project, which investigates long-term mental outcomes of parents whose children died by accident, suicide, or homicide. After their children's death, the parents usually experience a high level of mental distress. In this study, the mental distress of 239 parents were calculated based on their answers in the questionnaire, at baseline (i.e. 4 to 6 weeks after their children's death), and then at 4, 12, 24 and 60 months post-death. The Global Severity Index (GSI), which is the most sensitive indicator of mental distress, is used to measure the parents' distress levels. A high GSI score indicates a high level of mental distress. If the parents' adjustment to their children's death goes well, their GSI scores will decrease over time, at least lower than their baseline GSI scores. Figure 6.2 shows the profiles of GSI scores for six randomly selected parents from 239 parents enrolled in this study. To investigate how the parents' mental distress changes over time after their children's death, we treat whether or not a parent's GSI score after baseline is lower than his/her baseline value as response. Several other relevant factors were also obtained at baseline, including parents' gender, marital status, age, education, annual income, cause of death, age, and gender of the deceased child. These baseline factors may be important predictors of parents' distress and thus are viewed as covariates. Summary statistics for the response, education and income are given in Table 6.5. Since the response is binary, we consider a G L M M model for analysis. Note that some baseline covariates such as income contain missing values, and some responses are also missing since some parents may be not in a good mood at the scheduled time. As indicated in Table 6.5, 4.2% of incomes are missing and 19.7% of responses are missing. 59 Table 6.5: Summary statistics Variable Sample mean Sample standard deviation Percentage of missing values Response 0.7 0.5 19.7% Education 13.7 -2.3 0% Income 4.67 1.9 4.2% # of parents: N = 239. # of observations per parent: = 4. Our objectives are to investigate the change of parent's distress levels over time and to determine which covariates affect the parent's mental distress. We will use the methods developed here for G L M M models with informative dropouts and missing covariates. 6 . 3 . 2 M o d e l s To get a parsimonious model, we used standard model selection techniques such as the likelihood ratio test to determine which covariates should be included in our model. Since some covariates and responses contain missing values, model selection were carried out based on the complete-case method. Based on these analyses, we include income, education, and time as covariates in the model. Note that education does not have missing values, while income contains 4.2% missing values. We denote y^ as the response for parent % at the j th time after baseline, i = 1, 2, • • • , N, j•= 1, 2, • • • , n{. Here N = 239 and nt = 4. If GSI for parent i at the j'th time is lower than his/her baseline GSI, yy = 1; otherwise, y^ = 0. We consider the following G L M M to investigate the effects of covariates on the mental distress. Pr(yi3- = 1|/3A) \ 1 -P r (yy = 11/3,6,)/ ( g 4 ) = fa + fiixn + (32Xi2 + fotij + h, logit {Pr(y y = l|/3,6 i)} = log 60 where (3 = (Po, Pi, P2, Ps), %n is the education level for parent i, xi2 is the annual family income for parent i, and ti3- is the j th scheduled time for parent i. The regression coeffi- cients, Pi, p\, and Pz, represent the fixed effects associated with the parents' education level, income, and time respectively, and 6$ represents the random effect associated with each parent. Here, we assume that , frjv are independent and have an identical normal distribution N(0,a2) with a2 unknown. . Note that income xi2 contains approximately 4.2% missing values. Here, we assume that the missing income is M A R . To incorporate missing covariates, we need to specify a model for income. We assume the following covariate distribution xi2\xn ~ N(cti + a2xn,a3). (6.5) Note also that 19.7% of responses yi3- are missing. The response (i.e., GSI status) is missing probably due to parents' high stress, so we assume that our response is non- ignorably missing, i.e., the missingness of responses may depend on the missing values. To incorporate the missing data mechanism in our analysis, we specify a distribution for the missing response indicator. Recall that the missing response indicator is defined as = 1 if yij is missing; = 0 if j/y is observed. Here, we use a logistic model for the missing response mechanism, which includes the current response value yy, the immediate previous response value yij-i, education xn , and time Uj as covariates. f Pr(r - = 1|0) 1 logit {Pr(r y = 1|0)} = l o g ^ _ p r ^ = ^ j = 4>o + (piVij-i + faVij + h^n + <M«> (6.6) where <p = (4>o, (f>i, §2, <i>z)T• We assume that the r^-'s are independent for all i and j. Note that the covariates in model (6.6) are selected based on the likelihood ratio test. 61 Table 6.6: Estimates for the Parent Bereavement data Methods Parameters ft ft ft ft Exact Estimate -1.882 0.182 0.083 0.345 method SE 0.966 0.070 0.165 . 0.258 p- value 0.051 • 0.010 0.612 0.181 Approximate Estimate -1.579 0.139 0.058 -0.239 method SE 0.898 0.065 0.152 0.193 p- value 0.079 0.033 0.704 0.216 :" SE refers to the standard error. 6:3.3 Analysis and Results We consider the following methods to estimate the parameters in models (6.4)-(6.6). (i) the exact method using the Monte Carlo E M algorithm, (ii) the approximate method using the Monte Carlo E M algorithm. Estimates of (3, along with their standard errors and p-values, are shown in Table 6.6. Compared with the exact method, the approximate method resulted in smaller absolute values of estimates and smaller standard errors. Especially for the estimate of fa, the exact and approximate methods gave opposite results. As discussed in previous chapters, the approximate method should have a faster convergence rate, since it avoids sampling the random effect in each E M iteration. However, for this example, the number of iterations to convergence for the approximate method is 24, larger than the number of iterations to convergence for the exact method, which is 13. The P X - E M algorithm improved the convergence speed a bit in this example. The number of iterations to convergence for the exact method based on P X - E M is 9, smaller than 13. Table 6.6 shows that education is significant based on the exact method and the approximate method. The estimate for education fa based on the exact method is 0.182, 62 Table 6.7: Sensitivity analysis for covariate models Covariate Parameters Models Po Pi P2 Ps Original Estimate -1.882 0.182 0.083 0.345 model SE 0.966 0.070 0.165 0.258 (6.5) p- value 0.051 0.010 0.612 0.181 Estimate -1.969 0.189 0.107 0.312 CM1 SE 1.043 0.076 0.175 0.255 p- value 0.059 0.013 0.542 0.222 *SE refers to the standard error. which suggests that the estimated odds of having a lower distress than the baseline value is exp(0.182) = 1.2 times higher, when parents increase their education level by one unit. Based on both the exact method and the approximate method, income and time do not have significant effects on change of parents' mental distress. 6.3.4 Sensitivity Analysis To check the sensitivity of the above results to the covariate models, we consider the following alternative covariate model (i) Alternative Covariate Model 1 (CM1): Model (6.5) with a2 = 0. That is, x i 2 \ x n ~ N(ai,az), i.e., xi2 is independent of Xu. Table 6.7 shows that results based on the original covariate model and the alternative covariate model are quite similar. This suggests that the results may be robust to the covariate models. We also check the sensitivity of our results to the dropout models. We consider the following alternative dropout models. 63 Table 6.8: Sensitivity analysis for dropout models Dropout • Parameters Model ft ft ft ft Original Estimate -1.882 0.182 0.083 0.345 model SE 0.966 0.070 0.165 0.258 (6.6) p- value 0.051 0.010 0.612 0.181 Estimate -1.592 0.161 0.063 0.545 D M I SE 0.808 0.059 0.136 0.273 p- value 0.049 0.006 0.644 0.046 Estimate -1.958 0.193 0.087 0.239 DM2 SE 1.019 0.074 0.169 0.254 p- value 0.055 0.010 0.604 0.347 Estimate -1.460 0.167 0.094 0.939 DM3 SE 1.006 0.073 0.172 0.282 p- value 0.146 0.022 0.585 < 0.001 * SE refers to the standard error. (i) Alternative Dropout Model 1 (DMI): logit {Pr(ry = l\<p)} = 0o + <P\Va + foxn + hUj] (ii) Alternative Dropout Model 2 (DM2): logit {Pr(ry = 1|0)} = 0O + (piVij-i + foVij] (iii) Alternative Dropout Model 3 (DM3): logit {Pr(ry = l\cp)} = 0o + faxn + faUj. Note that DM3 suggests that the missing responses may be M A R . The comparison of estimates based on the original dropout model and the above alternative dropout models is given in Table 6.8. As we can see from Table 6.8, whether yi3- and yij-i are included in the dropout model affects our inference on the coefficient of the time covariate (i.e., fa). For the dropout model DM3, which excludes yi3- and y%j-\ as covariates, we obtain a highly significant p-value (< 0.001) for fa. For the dropout model D M I , which excludes Vij-i, we get a marginally significant fa. Other dropout models lead to insignificant fa. That is, the conclusion about fa is sensitive to the choices of the dropout models. However, estimates of other parameters and their standard errors are quite robust to the different dropout models. 6.3.5 Conclusion Our analyses suggest that parents with a higher education level are more likely to have a lower level of mental distress, i.e., they may have a good adjustment to their children's death. Possibly due to the low dimension of random effects and the small number of intra-parent measurements, the approximate method in this example did not improve the convergence rate. Unlike in Example 1, the approximate method in this example gave smaller standard errors than the exact method. For this example, sensitivity analyses suggest that our conclusions about the time covariate may not be reliable, i.e., they may depend on the choices of dropout models, but our conclusions about other covariates are reliable. 6.4 Computation Issues Starting values. For the E M algorithms in our examples, the starting values for j3 were obtained based on the logistic regressions using the completely observed cases, the starting values for a were obtained based on linear regression models using the completely observed cases, and the starting values for <p were obtained based on logistic regressions using the last-value-carried-forward method. Convergence of the Gibbs sampler. We checked the convergence of the Gibbs 65 sampler used in each Monte-Carlo E M by examining the time series and autocorrelation function plots. For example, Figure 6.3 shows the time series and autocorrelation function plots for generating missing CH50 in the first example. From Figure 6.3, we notice that the Gibbs sampler converged quickly and the autocorrelations between successive generated samples are negligible. We also drew the time series plot and autocorrelation function for the random effect 646 associated with patient 46 in the first example, shown in Figure 6.4. It shows that the Markov chain converged quickly, but the autocorrelations are negligible after lag 6. Time series and autocorrelation function plots for other random effects and other missing covariates show similar behaviors. Therefore, for each E M iteration, we discarded the first 200 samples as the burn-in, and then we took one sample from every 10 simulated samples until 500 samples were obtained. S t o p p i n g r u l e . The stopping rule for the E M and P X - E M algorithms in our examples is that the relative change in the parameter values from successive iterations is smaller than a given tolerance level (e.g. 0.01). However, due to Monte Carlo errors induced by the Gibbs sampler, it is difficult to converge for a extremely small tolerance level, otherwise it may converge very slowly. 66 \ i i i r 0 50 100 150 200 day Figure 6.1: Viral loads (log 1 0 scale) for six randomly selected patients. The open dots are the observed values and the dashed line indicates the detection limit of viral loads. The viral loads below the detection limit are substituted with log 1 0 (50). 67 o C O in O o o Ui 55 o ci o d month Figure 6.2: GSI scores for six randomly selected parents. The open dots are the observed values and the GSI scores at time 0 are the baseline values. 68 Time series plot o t o I O CM I I 100 200 300 400 500 iteration (a) Autocorrelation function plot 00 c i LL O c i I . ' • . I . I I . . I "i r ~ l 1 1 1 10 15 20 25 Lag (b) Figure 6.3: (a) Time series and (b) autocorrelation function plots for CH50. 69 Time series plot o CD CD E o "O c CO CM I i 100 200 300 400 500 O oo d o d iteration (a) Autocorrelation function plot I 1 i 10 H r - 15 20 Lag (b) 25 Figure 6.4: (a) Time series and (b) autocorrelation function plots for 6 4 6 associated with patient 46. 70 Chapter 7 Simulation Study 7.1 Introduction To evaluate the performance of the two proposed methods: the exact method (EX) and the approximate method (AP), we conduct a simulation study in this chapter. In our simulations, we compare E X and A P in terms of biases and mean-squared errors of their estimates. Section 7.2 gives a description of data generation models in our simulations. In Section 7.3, we compare two methods of estimation in four different situations, and examine the effects of missing rates, variance of random effects, sample size, and number of intra-individual measurements. We conclude this chapter in Section 7.4. 71 7.2 Description of the Simulation Study 7.2.1 Models We generate the response variable from the following G L M M logit{Pr(yy = 11/3,6; j)} =Po + AaJii + fax%i + /33ijj + h (7.1) i = l , - - - , i V , j = ! , - • • , where /3 = (Po,Pi, fa, fa), the random effects fcj's are assumed to be i.i.d with a normal distribution iV(0, a 2). The true values of /3 and a 2 are /3 = (-3,0.5, -0.3,4) and a2 = 0.3. The number of individuals (sample size) is N = 50, and the number of intra-individual measurements is = 10. The n* time points for each individual are 2, 7, 9, 14, 20, 28, 40, 56, 70, and 84. The covariates Xu and xi2 are continuous variables. Covariate variable Xu is generated from iV( 1,0.1) and covariate xi2 is generated from the following model where a = (ai,a2,a3) and the true value of at. is a = (—1.5,1,0.2). In our simulation study, the missing covariate mechanism is assumed to be MAR. For each generated data set, we keep xu completely observed and delete those values of xa with probability 0.8 which correspond to the largest values of x^. To evaluate the proposed methods, we also generate some missing values of re- sponses j/jj's as follows. The model for the missingness of the response is where <j> = (<J)Q, 4>I) and the missing response indicator is a binary variable. The above model suggests that the missingness of the response depends on the missing values, and xi2\Xii ~ N(cti + a2xn, a3) (7.2) logit {Pr(ry = 1|<£)} = (j)Q + fayy (7.3) 72 thus the response is non-ignorably missing (NIM). We generate missing responses based on the model (7.3). That is, if ri3 = 1, then yi3 is deleted; if ri3 = 0, yi3 is retained. Note that different values of d) will lead to different missing rates of responses. We will discuss E X and A P in two different values of <f> in Section 7.3.1. 7.2.2 Bias and Mean-Squared Error We examine the convergence of Monte Carlo Markov Chains by their time series plots and autocorrelations function plots. Time series plots and autocorrelation function plots have shown that Markov Chains converged very fast, usually in 100 or 200 iterations, and autocorrelations are negligible after lag 2. Figure 7.1 and Figure 7.2 show typical time series plots and typical autocorrelation plots for the missing covariates and random effects from our simulated data sets. Thus, to ensure the convergence, we conservatively discard the first 500 samples, and then take one sample every 10 samples until we obtained the desired number of samples. We run B = 100 replicates in each simulation, and compare E X and A P in terms of biases and mean square errors (MSEs). Here, bias and MSE are reported in terms of percent relative bias and percent relative root mean-squared error. The bias for P3, the j th component of /3, is defined as bias.,- = J3j — Pj, where Pj is the estimate of Pj. The mean-squared error for @3 is defined as MSEj = bias2 + s2, where s3 is the simulated standard error of P3. Then, the percent relative bias of J33 (%bias) is biasj/Pj x 100%, 73 and the percent relative VMSE ( % V / M S E ) is y/MSEj/lPj] x 1 0 0 % . In our simulations, we consider (i) two missing rates: 2 0 % and 4 0 % , (ii) two different variances of bp. a2 = 0 . 3 and a2 = 1, (iii) two different sample sizes: N = 5 0 and N = 1 0 0 , (iv) three different numbers of intra-individual measurements: n, = 5 , n, = 1 0 and rij = 2 0 . In the above situations, we compare estimates based on E X and AP, and investigate how the missing rate, the variance of bi, the sample size, and the number of intra-individual measurements affect estimation of the parameters. 7.3 Simulation Results 7.3.1 Comparison of Methods with Varying Missing Rates To see the impact of the missing rates on estimation by E X and AP, we estimate the parameters based on two missing rates respectively. A 2 0 % missing rate and a 4 0 % missing rate are considered. In our case, if the true values of <f> are (p = ( — 1 . 8 , 1 ) , the missing response mechanism ( 7 . 3 ) leads to an average of 2 0 % missing rate for the response; if <f> = ( — 0 . 8 , 1 ) , the missing response mechanism ( 7 . 3 ) leads to an average of 4 0 % missing rate. Regarding the covariate x2 with missing values, we take the same missing rate as the response. Table 7 . 1 shows average simulation results from 1 0 0 simulated data sets based on methods E X and AP. E X and AP yield comparable results for the two missing rates. In the 2 0 % missing rate case, compared with AP, E X gives smaller biases, but slightly larger mean-squared errors. In the 4 0 % missing rate case, E X produces slightly larger biases and mean-squared errors than AP. As we can see from Table 7 . 1 , the missing rate 7 4 Table 7.1: Simulation results with varying missing rates Missingness rate (%) Parameter (true values) %bias % V M S E E X A P E X A P 20 A> = - 3 6 1 29 28 P\ = 0.5 -6 -8 115 112 & = -0.3 -3 -5 144 141 ft = 4 2 -2 12 11 40 A) = - 3 22 14 46 40 A = 0.5 44 39 164 148 ft = -0.3 50 48 153 148 & = 4 3 -4 17 15 greatly affects biases and mean-squared errors of estimates from two methods, especially estimates from E X , that is, the absolute values of biases and the mean-squared errors increase with the missing rate. 7.3.2 Comparison of Methods with Different Variances To investigate how the variability of 6* affects the estimates from two methods, we con- sider two sets of values of cr2: a small variance a1 = 0.3 and a moderate variance a2 = 1 at the same missing rate 20%. We summarize the simulation results of estimation from E X and A P in Table 7.2. E X produces slightly lager mean-squared errors of estimates than A P in both cases. However, the performance of E X is still quite close to AP. We also note that the mean- squared errors of estimates based on E X and A P increase as a2 increases. That is, the variability of random effects affects estimation of E X and AP. 75 Table 7.2: Simulation results with varying variances Variance Parameter (true values) %bias % V M S E E X A P E X A P Small ft = - 3 6 1 29 28 Variance ft = 0.5 -6 -8 115 112 a2 = 0.3 ft = -0.3 -3 -5 144 141 ft = 4 2 -2 12 11 Moderate ft = - 3 4 -5 36 36 Variance ft = 0.5 -3 -7 156 150 a2 = 1 ft = -0.3 9 7 176 169 ft = 4 2 -6 12 12 7.3.3 Comparison of Methods with Different Sample sizes To examine the effect of the sample size on estimation, we estimate the parameters based on E X and A P with two different sample sizes: N = 50 and iV = 100, with a 20% missing rate. The average simulation results from E X and A P are shown in Table 7.3. We note that, as the sample size increases from 50 to 100, A P becomes more reliable in the sense that A P provides somewhat smaller biases and mean-squared errors than E X . However, A P does not outperform E X much. Moreover, both A P and E X yield smaller mean-squared errors for larger sample sizes. 7.3.4 Comparison of Methods with Varying Intra-individual Mea- surements To see how the number of intra-individual measurements affects our estimates, we con- sider the two methods of estimation under three different numbers of intra-individual measurements : n^ = 5, n, = 10 and nj = 20. If n* = 5, the time points for each individ- 76 Table 7.3: Simulation results with varying sample sizes Number of Parameter %bias %VMSE individuals (true values) E X A P E X A P N=50 ft = - 3 6 1 29 28 ft = 0.5 -6 -8 115 112 ft = -0.3 -3 -5 144 141 ft = 4 2 -2 12 11 N=100 ft = - 3 9 4 21 20 ft = 0.5 8 4 82 79 ft - -0.3 11 9 101 98 ft = 4 2 -2 9 8 ual are 2, 9, 20, 40, 70; if = 10, the time points for each individual are 2, 7, 9, 14, 20, 28, 40, 56, 70 and 84; if = 20, the time points for each individual are 2, 4, 7, 9, 12, 14, 17, 20, 24, 28, 33, 40, 46, 53, 56, 60, 66, 70, 76 and 84. The simulation results are indicated in Table 7.4. Both E X and A P produce smaller mean-squared errors as the number of intra-individual measurements increases (i.e., as rii increases). Compared with E X , A P provides slightly smaller mean-squared errors in the three cases. But, the results from E X and A P are still quite close and become closer as n, gets larger. 7.3.5 Conclusion Based on the simulation results in the preceding sections, we may draw conclusions as follows. • Estimates based on E X and A P get worse in terms of biases and mean square errors as the missing rate gets larger. • The mean-squared errors of estimates from both E X and A P increase as the vari- 77 Table 7.4: Simulation results with varying intra-individual measurements Number of intra-individual Parameter %bias % V M S E measurements (true values) E X A P E X A P rii = 5 Po = - 3 11 4 40 37 Pi = 0.5 9 1 150 148 Pi = -0.3 -0.1 -7 200 193 03 = 4 7 2 19 16 Hi = 10 A) = - 3 6 1 29 28 ft = 0.5 -6 -8 115 112 P2 = T 0.3 -3 -5 144 141 03=4 2 -2 12 11 rii = 20 0o = - 3 5 1 22 21 0i = 0.5 -4 -4 91 90 02 = -0.3 2 4 117 118 03=4 1 -3 8 8 ability of random effects a2 increases. • Increasing the sample size reduces the mean-squared errors of estimates for both E X and AP. • Increasing the number of intra-individual measurements reduces biases and mean- squared errors of estimates for both E X and AP. • A P yields somewhat smaller mean-squared errors than E X and thus provides more stable results. This is probably because sampling the random effect in the E X , may lead to unstable Gibbs samplers and thus induce more Monte Carlo errors. Note that the convergence rate of E X is approximately as fast as that of A P in our simulations probably due to the fact that only one random effect is included in our GLMMs. 78 Time series plot i 1 1 1 1 r~ 0 100 200 300 400 500 iteration (a) Autocorrelation function plot 00 c> LL — O < ci o 1 1 I I . I , , o I I ' ' I I I I I I ' i 1 1 1 1 r 0 5 10 15 20 25 Lag (b) Figure 7.1: (a) Time series and (b) autocorrelation function plots for z2. 79 Time series plot Figure 7.2: (a) Time series and (b) autocorrelation function plots for 6 1 8 associated with individual 18. 80 Chapter 8 Conclusion and Discussion In this thesis, we have proposed two methods to estimate the parameters for GLMMs with informative dropouts and missing covariates. These include an exact method and an approximate method, which are implemented by the Monte Carlo E M algorithm. For the exact method, the conditional expectation in the E-step of the Monte-Carlo E M is evaluated by Monte Carlo approximations (Wei and Tanner, 1990), which generate random samples for the unobservable random effects, missing covariates, and missing responses. However, sampling the random effects may offer potential computational difficulties such as very slow or non-convergence, especially when the dimension of the random effects is not small. To overcome this difficulty, in the more efficient approximate method, we integrate out the random effect in the E-step and thus avoid sampling the random effects in the Monte Carlo E M . Pinheiro and Wu (2001) have shown that the convergence rate of the E M algorithm can be greatly improved by integrating out the random effects. To further speed up the Monte Carlo E M , we also applied a P X - E M algorithm, which accelerates the E M algorithm by introducing additional working parameters to the model. Based on our two examples, the P X - E M algorithm is much faster than the 81 standard E M algorithm. We conducted a simulation study to compare the performance of the exact method and the approximate method. In our simulations, in general, the approximate method gives somewhat more stable results than the exact method in the sense that it provides smaller mean-squared errors. As the number of intra-individual measurements or the sample size increases, the performance of the approximate method and the exact method becomes similar. Our simulations also suggest that the proportion of missing values, the variance of random effects, the sample size, the number of intra-individual measurements, may affect the performance of the exact method and the approximate method. The proposed methods were applied to an AIDS dataset to evaluate an antiviral treatment. The results of our analyses based on the exact and approximate methods suggest that the viral loads of HIV patients tend to decrease with time, and that patients with higher CD4 cell counts are more likely to have their viral loads suppressed below the detection limit. We also applied our methods to a data set from a parent bereavement project to investigate the change of parents' mental distress after their children's death and to determine which factors influence parents' mental distress. We conclude that parents with a higher education level are more likely to have a better adjustment to their children's death. Note that we have assumed parametric models for the missing covariates and missing response indicators. So it is important to conduct sensitivity analyses of our results to these parametric models. Based on our sensitivity analyses, except for /33 in the second example, the results of the two examples are quite robust to the choices of the covariate model and the dropout model. Thus these results except for f33 in the second example may be reliable. Finally, we give an outline for possible future work. 82 (i) For simplicity, in our examples and simulations, we include only one random ef- fect in the GLMMs to demonstrate our methods. In the future, we will study models with more random effects and further investigate the computational advan- tages/disadvantages of the proposed methods, via,simulations. (ii) In our examples and simulations, we only consider mixed effect logistic regression models with informative dropouts and missing covariates. Generally, our proposed methods can be applied to other GLMMs, such as mixed-effect Poisson models, and nonlinear mixed effect models with informative dropouts and missing covariates. (iii) In the thesis, we assume that covariates with missing values are time-independent. When some covariates with missing values are time-dependent, similar methods can be proposed. (iv) We have assumed that the missing responses depend on the values being miss- ing. We could also apply our methods to shared-parameter models, in which the missingness of responses is assumed to depend on the unobservable random effects. 83 Bibliography [1] Agresti, A. Categorical Data Analysis. New York: John Wiley, 1990. [2] Booth, J. G. and Hobert, J. P. Maximizing generalized linear mixed model likeli- hoods with an automated Monte Carlo E M algorithm. Journal of the Royal Statis- tical Society, Soc. B, 61:265-285, 1999. [3] Breslow, N . E. and Clayton, D. G. Imcomplete data in generalized linear models. Journal of the American Statistical Association, 88:9-25, 1993. [4] Dempster, A. P., Laird, N . M . , and Rubin, D. B. Maximum likelihood estimation from incomplete data via the the E M algorithm (with Discussion). Journal of the Royal Statistical Society, Soc. B, 39:1-38, 1977. [5] Diggle, P. and Kenward, M . G. Informaive Drop-out in Longitudinal Data Analysis. Apllied Statistics, 43:49-93, 1994. [6] Diggle, P. J., Liang, K. Y . , and Zeger, S. L. Analysis of Longitudinal Data. Oxford: Oxford University Press, 1994. [7] Gilks, W. R. and Wild, P. Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41:337-348, 1992. 84 [8] Ibrahim, J. G. Incomplete data in generalized linear models. Journal of the American Statistical Association, 85:765-769, 1990. [9] Ibrahim, J. G., Chen, M . H., and Lipsitz, S. R. Missing responses in generalized linear mixed models when the missing data mechanism is nonignorable. Biometrika, 88:551-564, 2001. [10] Ibrahim, J. G. and Lipsitz, S. R. Missing covariates in generalized linear models when the missing data mechanism is non-ignorable. Journal of the Royal Statistical Society, Soc. B, 61:173-190, 1999. [11] Lipsitz, S. R. and Ibrahim, J. G. A conditional model for incomplete covariates in parametric regression models. Biometrika, 83:916-922, 1996. [12] Little, R. J. A. Regression with missing X's: A review. Journal of the American Statistical Association, 87:1227-1237, 1992. [13] Little, R. J. A. Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90:1112-1121, 1995. [14] Little, R. J. A. and Rubin, D. B. Statistical Analysis with Missing Data. New York: John Wiley, 1987. [15] Little, R. J. A. and Schlucher, M . D. Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika, 72:497-512, 1985. [16] Liu, C. and Rubin, D. B. The E C M E algorithm: a simple extension of E M and E C M with faster monotone convergence. Biometrika, 81:633-648, 1994a. [17] Liu, C , Rubin, D. B., and Wu, Y . N . Parameter expansion to accerlate E M : The P X - E M algorithm. Biometrika, 85:755-770, 1998. 85 [18] Louis, T. A. Finding the observed information matix when using the E M algorithm. Journal of the Royal Statistical Society, Ser. B, 44:226-233, 1982. [19] McCulloch, C. E. Maximum likelihood algorithm for generalized linear mixed mod- els. Journal of the American Statistical Association, 92:162-170, 1997. [20] McCullock, C. E. and Searle, S. R. Generalized, Linear, and Mixed Models. New York: Wiley, 2001. [21] Meng, X . L. and Van Dyk. The E M algorithm - an old folk song sung to a fast new tune (with Discussion). Journal of the Royal Statistical Society, Soc. B, 59:511-567, 1997. [22] Pinheiro, J. C. and Wu, Y . Efficient algorithms for bobust estimation in linear mixed- effects models using the multivariate t-distribution. Journal of Computational and Graphical Statistics, 10:249-276, 2001. [23] Vonesh, E. F., Wang, H., Nie, L., and Majumdar, D. Conditional second-order generalized estimating equations for generalized linear and nonlinear mixed-effects models. Journal of the American Statistical Association, 97:271-283, 2002. [24] Wedderburn, R. W. M . Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61:439-447, 1974. [25] Wei, G. C. and Tanner, M . A. A Monte Carlo implementation of the E M algo- rithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association, 85:699-704, 1990. [26] Wolfinger, R. Laplace's approximation for nonlinear mixed models. Journal of the American Statistical Association, 80:791-795, 1993. 86 [27] Wu, H. and Ding, A. Population HIV-1 Dynamics in Vivo: Applicable Models and Inferential Tools for Virological Data from AIDS Clinical Trials. Biometrics, 55:410-418, 1999. [28] Wu, H. and Wu, L. A multiple imputation method for missing covariates in non- linear mixed-effects models with application to HIV dynamics. Statistics in medicine, 20:1755-1769, 2001. [29] Wu, M . C. and Carroll, R. J. Estimation and comparision of changes in the pres- ence of informative right censoring by modeling the censoring process. Biometrics, 44:175-188, 1988. 87
Cite
Citation Scheme:
Usage Statistics
Country | Views | Downloads |
---|---|---|
China | 1 | 7 |
Japan | 1 | 0 |
City | Views | Downloads |
---|---|---|
Beijing | 1 | 7 |
Tokyo | 1 | 0 |
{[{ mDataHeader[type] }]} | {[{ month[type] }]} | {[{ tData[type] }]} |
Share
Share to: