Nonlinear Mixed Effects Models with Dropout and Missing Covariates When the Dropout Depends on the Random Effects

by

Shijun Song
B.Sc., Peking University, 2003

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE STUDIES (Statistics)

The University of British Columbia
August 2005
© Shijun Song, 2005

Abstract

Nonlinear mixed effects models (NLMEs) are very popular in longitudinal studies such as HIV viral dynamic studies, pharmacokinetic analyses, and studies of growth and decay. In these studies, however, missing data problems often arise, which complicate statistical analysis. In this thesis, we propose an exact method and an approximate method for NLMEs with random-effects-based informative dropout and missing covariates, and we propose methods for simultaneous inference. Monte Carlo EM algorithms are used in both methods. The approximate method, which is based on a Taylor series expansion, avoids sampling the random effects in the E-step and thus reduces the computational burden substantially. To illustrate the proposed methods, we analyze two real datasets. The exact method is applied to a dataset with covariates and to a dataset without covariates; the approximate method is applied to the dataset without covariates. The results show that, for both datasets, dropout may be correlated with the individual random effects. Ignoring the missingness, or assuming it is ignorable, may lead to unreliable inferences. A simulation study evaluates the two proposed methods under various situations.
Contents

Abstract ii
Contents iii
List of Tables vii
List of Figures viii
Acknowledgements ix
Dedication x

1 Introduction 1
1.1 Longitudinal Data Analysis 1
1.1.1 Longitudinal Studies 1
1.1.2 Approaches to Longitudinal Data Analysis 3
1.2 Missing Data Problems 5
1.2.1 Missing Covariates and Responses 5
1.2.2 Classification of Missing Value Mechanisms 5
1.2.3 Literature on Missing Data Problems 7
1.3 Motivating Examples 8
1.4 Objectives and Outline 11

2 Nonlinear Mixed Effects Models 12
2.1 Introduction 12
2.2 Nonlinear Mixed Effects Models 12
2.3 Literature Review on NLME Models with Informative Missing Data 15

3 An Exact Method for NLME Models with Informative Dropout and Missing Covariates 17
3.1 Introduction 17
3.2 The Models 18
3.3 A Monte Carlo EM Method 20
3.3.1 E-step 22
3.3.2 M-step 25
3.4 Sampling Methods 27
3.4.1 Gibbs Sampler 27
3.4.2 Multivariate Rejection Algorithm 28
3.4.3 Importance Sampling 29
3.5 Convergence 30

4 An Approximate Method for NLME Models with Informative Dropout and Missing Covariates 33
4.1 Introduction 33
4.2 The Approximate Method 35
4.2.1 E-step 36
4.2.2 M-step 45
4.3 Monte Carlo Sampling 45

5 Covariate Models and Dropout Models 47
5.1 Introduction 47
5.2 Covariate Models 48
5.3 Dropout Models 49
5.4 Sensitivity Analyses 50

6 Data Analysis 52
6.1 Introduction 52
6.2 Example 1 53
6.2.1 Data Description 53
6.2.2 Models 55
6.2.3 Analysis and Results 57
6.2.4 Sensitivity Analysis 59
6.2.5 Conclusion 61
6.3 Example 2 62
6.3.1 Data Description 62
6.3.2 Models 64
6.3.3 Analysis and Results 65
6.3.4 Sensitivity Analysis 66
6.3.5 Conclusion 67
6.4 Computation Issues 67

7 Simulation Study 69
7.1 Introduction 69
7.2 Design of the Simulation Study 70
7.2.1 Models 70
7.2.2 Comparison Criteria 71
7.3 Simulation Results 71
7.3.1 Comparison of Methods with Varying Missing Rates 71
7.3.2 Comparison of Methods with Different Random Effects Covariances 73
7.3.3 Comparison of Methods with Varying Intra-individual Measurements 74
7.3.4 Comparison of Methods with Different Variances 75
7.4 Conclusions 76

8 Conclusion and Discussion 78

References 80

List of Tables

6.1 Data summary of Example 1 53
6.2 Estimates of response model parameters (Example 1) 58
6.3 Estimates of dropout model parameters (Example 1) 59
6.4 Sensitivity analyses for dropout models (Example 1) 60
6.5 Data summary of Example 2 62
6.6 Estimates of dynamic model parameters in Models (6.11) and (6.12) (Example 2) 65
6.7 Estimates of dropout model parameters (Example 2) 66
6.8 Sensitivity analyses for dropout models (Example 2) 67
7.1 Simulation results for varying missing rates 73
7.2 Simulation results for different covariance matrices for random effects 74
7.3 Simulation results for varying intra-individual measurements 75
7.4 Simulation results for varying variances 76

List of Figures

1.1 Hypothetical data on the relationship between height and age 2
1.2 Viral loads of four randomly selected patients 10
6.1 Viral loads of four randomly selected patients (Example 1) 54
6.2 Q-Q plots for covariates (Example 1) 56
6.3 Viral loads of four randomly selected patients (Example 2) 63

Acknowledgements

First and foremost, I would like to thank my supervisor, Dr. Lang Wu, for his excellent guidance and immense help during my study at the University of British Columbia. Without his support, expertise and patience, this thesis would not have been completed.
Also, I would like to thank my co-supervisor and second reader, Dr. Harry Joe, for his invaluable comments and suggestions on this thesis. I would also like to thank Ms. Kunling Wu, a previous student of Dr. Lang Wu; her master's thesis and personal advice benefited me greatly in completing this thesis. Furthermore, I would like to thank Dr. John Petkau for his invaluable advice on my consulting projects, which has benefited me very much and will continue to do so. I thank all the faculty and staff in the Department of Statistics at the University of British Columbia for providing such a nice academic environment, and all the graduate students in the department for making my study so enjoyable.

Most importantly, I would like to thank my parents for loving me and believing in me. Their love, constant support and encouragement push me to be the best at everything I do.

SHIJUN SONG
The University of British Columbia
August 2005

To my parents.

Chapter 1

Introduction

1.1 Longitudinal Data Analysis

1.1.1 Longitudinal Studies

The key characteristic of a longitudinal study is that individuals are measured repeatedly over time. Longitudinal studies differ from cross-sectional studies, in which a single outcome is measured for each individual. In many studies, especially clinical trials, longitudinal data are very common. Even when the same scientific questions could be addressed in either a longitudinal or a cross-sectional study, there may be many advantages to addressing them longitudinally. Figure 1.1 illustrates this idea. In Figure 1.1(a), height is plotted against age for a hypothetical cross-sectional study of boys; height appears to be shorter among older boys. In Figure 1.1(b), we connect the data points from each individual, and it becomes clear that everyone's height increases with age.
This example shows that longitudinal studies can distinguish changes over time within individuals from differences among people in their baseline levels; cross-sectional studies cannot.

Figure 1.1: Hypothetical data on the relationship between height and age. Panel (a) shows the pooled cross-sectional scatter of height against age; panel (b) connects each individual's measurements.

Longitudinal data can be collected either prospectively, following subjects forward in time, or retrospectively, by extracting multiple measurements on each person from historical records. Longitudinal data require special statistical methods because the set of observations on one subject tends to be inter-correlated. This correlation must be taken into account to draw valid inferences.

Correlation must also be taken into account when analyzing a single long time series of measurements. In most time series studies, however, there is only one series available, and conclusions must be drawn from that series alone. Analysis of longitudinal data tends to be simpler because subjects are usually assumed independent, and valid inferences can be made by borrowing information across people. That is, the consistency of a pattern across subjects is the basis for substantive conclusions. For this reason, inferences from longitudinal studies can be made more robust to model assumptions than those from time series data, particularly to assumptions about the nature of the correlation.

1.1.2 Approaches to Longitudinal Data Analysis

Let y_{ij} denote a response variable and x_{ij} a p x 1 vector of p explanatory variables observed at time point t_{ij}, for measurement j on subject i, j = 1, ..., n_i, i = 1, ..., N. The mean and variance of y_{ij} are denoted E(y_{ij}) = μ_{ij} and Var(y_{ij}) = v_{ij}. The set of repeated outcomes for subject i is collected into an n_i x 1 vector y_i = (y_{i1}, ..., y_{in_i})^T, with mean E(y_i) = μ_i and n_i x n_i covariance matrix Var(y_i) = V_i, where the (j, k) element of V_i is the covariance between y_{ij} and y_{ik}, denoted Cov(y_{ij}, y_{ik}) = v_{ijk}. The covariate matrix for the ith subject is denoted X_i = (x_{i1}, ..., x_{in_i})^T, an n_i x p matrix, and we write R_i for the n_i x n_i correlation matrix of y_i. The responses for all subjects are denoted y = (y_1^T, ..., y_N^T)^T, an m x 1 vector with m = Σ_{i=1}^N n_i, and the covariates for all subjects are X = (X_1^T, ..., X_N^T)^T, an m x p matrix.

There are three approaches to longitudinal data analysis. The first approach, often called the marginal model approach, models the univariate responses while treating the dependence as a nuisance. Marginal methods are mainly used for regression with dependent data when the main interest is inference for the regression parameters. For example, in a clinical trial the difference between control and treatment is most important, not the difference for any one individual.

A second approach, the random effects model approach, assumes that correlation arises among repeated responses because the regression coefficients vary across individuals. Here, we model y_{ij} given the individual-specific coefficients β_i by

    h(E(y_{ij} | β_i)) = x_{ij}^T β_i,   (1.1)

where h(·) is a link function; for normal responses it is typically the identity, and for binary responses it may be the logit (log odds).

Usually, there are too few data on a single person to estimate β_i from (y_i, X_i) alone. We further assume that the β_i's are independent realizations from some distribution with mean β, and write β_i = β + b_i, where β is fixed and b_i is a vector of zero-mean random variables. The basic assumption can then be restated in terms of the latent variables b_i: there are unobserved factors, represented by b_i, that are common to all responses for a given individual but vary across individuals.
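As a toy illustration of the second-stage assumption β_i = β + b_i, the following sketch (with an identity link and entirely hypothetical parameter values) simulates individual-specific coefficients and then recovers the population fixed effects β by averaging per-subject least-squares fits:

```python
import numpy as np

rng = np.random.default_rng(0)

N, n_i, p = 50, 6, 2          # subjects, measurements per subject, covariates
beta = np.array([2.0, -0.5])  # population fixed effects (hypothetical)
D = np.diag([0.3, 0.05])      # covariance of the random effects b_i
sigma = 0.4                   # within-subject error SD

# Stage 2: individual coefficients beta_i = beta + b_i, b_i ~ N(0, D)
b = rng.multivariate_normal(np.zeros(p), D, size=N)
beta_i = beta + b

# Stage 1 (identity link): y_ij = x_ij' beta_i + e_ij
x = np.stack([np.ones((N, n_i)), np.tile(np.arange(n_i), (N, 1))], axis=-1)
y = np.einsum('ijk,ik->ij', x, beta_i) + rng.normal(0, sigma, (N, n_i))

# Per-subject OLS estimates beta_i only noisily; averaging them estimates beta
beta_hat_i = np.array([np.linalg.lstsq(x[i], y[i], rcond=None)[0]
                       for i in range(N)])
beta_hat = beta_hat_i.mean(axis=0)
```

With only n_i = 6 points per subject, each beta_hat_i is unstable, but the average across N subjects recovers β well; this is the "borrowing information across people" described above.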
Random effects models are particularly useful when inferences are to be made about individual trajectories, such as in AIDS studies. They focus on both the population parameters β and the individual characteristics b_i.

The third approach is the transition model approach. It focuses on the conditional distribution of y_{ij} given the past outcomes y_{i,j-1}, ..., y_{i1}. Here, the data analyst specifies a regression model for h(E(y_{ij} | y_{i,j-1}, ..., y_{i1}, x_{ij})) as an explicit function of x_{ij} and of the past responses.

In each of the three approaches, we consider both the dependence of the responses on explanatory variables and the correlation among the responses. With cross-sectional data, only the dependence of the responses on the explanatory variables needs to be specified; there is no correlation of responses. In longitudinal studies, where correlation usually exists among responses, there are at least two consequences of ignoring it: (1) inferences about the regression coefficients β may be incorrect; in particular, confidence intervals based on an assumption of independence are too short when there is in fact positive dependence; (2) the estimator of β may be inefficient, that is, less precise than possible.
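The first consequence can be seen in a small simulation (hypothetical model and values): with positive within-subject correlation, induced here by a shared subject effect, the standard error computed under an independence assumption is visibly smaller than a valid one based on independent subject means, so the independence-based confidence interval would be too short.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_i = 200, 5                      # subjects, repeated measures per subject
mu, tau, sigma = 1.0, 1.0, 1.0       # mean, between- and within-subject SDs

# Exchangeable positive correlation via a shared subject effect
y = mu + rng.normal(0, tau, (N, 1)) + rng.normal(0, sigma, (N, n_i))

ybar = y.mean()
# Naive SE: pretends all N * n_i observations are independent
se_naive = y.std(ddof=1) / np.sqrt(N * n_i)
# Valid SE: subjects are independent, so use the variability of subject means
se_valid = y.mean(axis=1).std(ddof=1) / np.sqrt(N)
```

Here se_naive understates the uncertainty roughly by the factor implied by the intraclass correlation; the valid analysis treats the N subject means, not the N * n_i raw observations, as the independent units.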
Missing data make statistical analysis in longitudinal studies much more complicated, because standard methods, which are usually designed for complete data, are not directly applicable. Commonly used naive methods for missing data include the complete-case method, which uses only the complete observations and deletes all incomplete ones; mean imputation, which replaces missing values by the means of the observed data; and the last-value-carried-forward method, which imputes a missing value by the most recent observed value.

1.2.2 Classification of Missing Value Mechanisms

In the presence of missing data, the missing data mechanism must be taken into account in order to obtain valid statistical inferences. Little and Rubin (1987) and Little (1995) give a general treatment of statistical analysis with missing values. Let y = (y^(o), y^(m)), with y^(o) denoting the measurements actually obtained and y^(m) denoting the measurements that would have been available had they not been missing. Let r denote a set of indicator random variables identifying which elements of y fall into y^(o) and which into y^(m). A probability model for the missing value mechanism then defines the probability distribution of r. Little and Rubin (1987) classify missing value mechanisms as follows:

• Missing data are missing completely at random (MCAR) if the probability of missingness is independent of both observed and unobserved data. When missing data are caused by features of the study design, rather than the behavior of the study subjects, the MCAR mechanism may be quite plausible. For example, some values may be missing for reasons irrelevant to the treatment, such as the medical equipment breaking down on a certain day. Thus missingness is MCAR if r is independent of both y^(o) and y^(m).

• Missing data are missing at random (MAR) if the probability of missingness depends only on observed data, but not on unobserved data.
For example, a patient may fail to visit the clinic because he or she is too old. In mathematical notation, missingness is MAR if r is independent of y^(m).

• Missing data are nonignorable or informative (NIM) if the probability of missingness depends on unobserved data. In the context of random effects models, NIM has two cases: (i) the missingness depends on unobserved responses; for example, a patient fails to visit the clinic because he or she is too sick. We call the missingness outcome-based informative (Little, 1995) if r depends on y^(m). (ii) The probability of missingness depends on the unknown random effects (i.e., individual characteristics such as individual decay rates), which may substantially affect the responses. We call the missingness random-effect-based informative (Little, 1995) if r depends on the random effects b_i.

It turns out that, for likelihood-based inference, the crucial distinction is between random and informative missing values. Both the MCAR and MAR mechanisms are sometimes referred to, without distinction, as ignorable. Little and Rubin (1987) show that, when missing data are nonignorable, likelihood inference must incorporate the missing data mechanism.

1.2.3 Literature on Missing Data Problems

Little (1992) reviewed methods of estimation in regression models with missing covariates, comparing six approaches: complete-case methods, available-case methods, least squares on imputed data, maximum likelihood, Bayesian methods, and multiple imputation. He suggested that maximum likelihood, Bayesian methods, and multiple imputation perform well, with maximum likelihood preferred in large samples and Bayesian methods or multiple imputation preferred in small samples.
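To make the mechanisms of Section 1.2.2 concrete, the following simulation sketch (hypothetical models and coefficients throughout) generates missingness indicators r under MCAR, MAR, and an outcome-based informative mechanism, and contrasts the observed-case mean of the response under MCAR with that under the informative mechanism:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
age = rng.uniform(20, 80, n)            # fully observed covariate
y = 0.05 * age + rng.normal(0, 1, n)    # response, subject to missingness

def logistic(u):
    return 1 / (1 + np.exp(-u))

# MCAR: missingness ignores both observed and unobserved data
r_mcar = rng.random(n) < 0.2
# MAR: missingness depends only on the observed covariate (older -> more missing)
r_mar = rng.random(n) < logistic(-4 + 0.05 * age)
# NIM: missingness depends on the (possibly unobserved) response itself
r_nim = rng.random(n) < logistic(-4 + 1.5 * y)

mean_full = y.mean()
mean_mcar = y[~r_mcar].mean()   # essentially unbiased for mean_full
mean_nim = y[~r_nim].mean()     # biased: large responses are preferentially lost
```

Under the informative mechanism the observed cases over-represent small responses, so a complete-case analysis of the mean is biased downward; under MCAR the observed cases remain a random subsample.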
Ibrahim (1990) considered missing covariates (MAR) in generalized linear models (GLMs) with discrete covariates, and applied the EM algorithm to obtain MLEs under the assumption that the missing covariates come from a discrete distribution. Ibrahim, Lipsitz, and Chen (1999) proposed a Monte Carlo EM algorithm for GLMs with nonignorable missing covariates. Wu and Carroll (1988) considered linear mixed effects models (LMEs) with informative dropout, under the assumption that the informative dropout can be modeled by a probit model that includes the random effects as covariates. Diggle and Kenward (1994) considered general approaches to informative dropout in multivariate data and longitudinal data; they showed that incorporating an informative dropout mechanism in the statistical inference reduces the bias incurred by treating informative dropout as merely MAR. Ten Have et al. (1998) discussed mixed effects logistic regression models for longitudinal binary responses with informative dropout. Roy and Lin (2002) considered multivariate longitudinal data with nonignorable dropout and missing covariates.

Little (1995) gives an excellent review of modeling the dropout mechanism in repeated-measures studies. Dropout models are classified into selection models and pattern-mixture models; the main difference between the two types is that the form of the missing data mechanism must be specified in selection models but not in pattern-mixture models. He classified NIM into nonignorable outcome-based missing data, where the dropout depends on the missing values, and random-effect-based missing data, where the dropout depends on random effects. He also suggested examining the sensitivity of the results to the choice of missing data mechanism when almost nothing is known about that mechanism.
Ibrahim, Chen and Lipsitz (2001) developed a Monte Carlo EM algorithm to obtain MLEs in GLMMs with informative dropout. They proposed modeling the missing data mechanism by a logistic regression through a sequence of one-dimensional conditional distributions, which may reduce the number of nuisance parameters.

1.3 Motivating Examples

Our research is motivated by studies of HIV viral dynamics, which have received great attention in AIDS research in recent years (Ho et al. 1995; Perelson et al. 1996; Wu and Ding 1999). These viral dynamic models provide a good understanding of the pathogenesis of HIV infection and support the evaluation of antiretroviral therapies, and the dynamic parameters may reflect the efficacy of the antiviral treatments (Ding and Wu, 2001). A common problem in these studies is that some subjects may drop out of the study or miss visits due to drug intolerance and other problems (although dropout patients may return to the study later), and covariates may contain missing data as well. It is important to evaluate how the dropout patients affect estimates of the viral decay rates, since the decay rates may reflect the efficacy of the antiviral treatments.

The dataset that motivates our research consists of 48 HIV-infected patients who were treated with a potent antiviral regimen. Viral load was measured repeatedly after initiation of the treatment. After antiviral treatment, a patient's viral load will often decay, and the decay rate may reflect the efficacy of the treatment. We consider only the viral load data before viral rebound and within the first three months, since data after three months are likely to be contaminated by long-term clinical factors. The number of measurements per patient varies from 2 to 7. Fourteen patients have missing viral loads at scheduled time points due to dropout or other problems.
The baseline covariates CD4 cell count, total complement level (CH50), and tumor necrosis factor (TNF) contain 3.7%, 12.3% and 16.4% missing data, respectively. Four patients were randomly selected and their viral loads are plotted in Figure 1.2. Visual inspection of the raw data seems to indicate that dropout patients have slower viral decay rates than the remaining patients; thus, the dropouts are likely to be informative or nonignorable. This dataset was analyzed previously, but the dropout patients were discarded and the missing viral loads were assumed to be missing completely at random (Wu and Ding 1999; Wu and Wu 2001). Wu (2004) re-analyzed the dataset, proposing a missing mechanism based on the unobserved responses (viral loads). In this thesis, our objectives are to model the viral load, incorporating a nonignorable missing mechanism based on the unknown random effects, and to check whether the estimates of the decay rates differ.

Figure 1.2: Viral loads of four randomly selected patients, plotted against days after treatment.

1.4 Objectives and Outline

In this thesis, we develop an exact inference method, implemented by a Monte Carlo EM algorithm, to make simultaneous inferences for NLMEs with informative dropout and missing covariates. To avoid computational difficulties when the dimension of the random effects is not small, we also propose an approximate inference method, which integrates out the random effects in the EM algorithm for more efficient computation. Our methods differ from Wu (2004) in that the proposed dropout mechanism depends on the random effects rather than the unobserved responses.

The remainder of this thesis is organized as follows. Chapter 2 introduces NLMEs. Chapter 3 discusses the exact inference method for estimation of NLMEs with informative dropout and missing covariates. The approximate inference method, based on linearization, is presented in Chapter 4.
We discuss dropout models and covariate models in Chapter 5. In Chapter 6, we apply our methods to real datasets. Chapter 7 presents our simulation study. We conclude the thesis with a discussion in Chapter 8.

Chapter 2

Nonlinear Mixed Effects Models

2.1 Introduction

Before presenting our methods for estimating parameters in NLMEs with informative dropout and missing covariates, we give a brief introduction to NLMEs in this chapter. In Section 2.2, we introduce NLMEs for longitudinal data. Section 2.3 gives a literature review on NLMEs with informative dropout and missing covariates.

2.2 Nonlinear Mixed Effects Models

Linear models, such as polynomials, are often empirical models based on the observed data; they may therefore be valid only within the observed range of the data, with no theoretical consideration of the underlying mechanism that generates the data. In many longitudinal studies, such as HIV viral dynamics, pharmacokinetic analyses, and studies of growth and decay, nonlinear modeling is often required for meaningful analyses. Nonlinear mixed effects models (NLMEs), or hierarchical nonlinear models, are popular in these studies for characterizing both the intra-individual variation and the inter-individual variation (Davidian and Giltinan, 1995; Vonesh and Chinchilli, 1996). As a generalization of linear models, nonlinear models have many advantages: (1) nonlinear models are often mechanistic, that is, based on the mechanism that produces the data, so the model parameters generally have a natural physical interpretation; (2) a nonlinear model generally uses fewer parameters than a competing linear model, such as a polynomial, offering a more parsimonious description of the data; (3) nonlinear models often provide more reliable prediction for responses outside the observed data range.
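Advantage (2) can be illustrated numerically. In this hypothetical toy example, a two-parameter exponential decay y = a * exp(-b t) describes simulated decay data, while an empirical quartic polynomial needs five coefficients; since the responses stay positive here, the nonlinear model is fitted simply by least squares on the log scale:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 10, 30)
y = 5.0 * np.exp(-0.4 * t) + rng.normal(0, 0.02, t.size)  # mechanistic truth

# Mechanistic model y = a * exp(-b t): 2 parameters.
# Since y > 0 here, fit by ordinary least squares on log y = log a - b t.
slope, intercept = np.polyfit(t, np.log(y), 1)
a_hat, b_hat = np.exp(intercept), -slope

# Empirical competitor: a quartic polynomial needs 5 parameters
poly_coef = np.polyfit(t, y, 4)
n_params = {"exponential": 2, "quartic": poly_coef.size}
```

The decay-rate parameter b also has a direct physical reading (fractional decline per unit time), whereas the quartic coefficients have none, which is advantage (1) in miniature.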
However, compared with linear models, nonlinear models usually do not have a closed-form expression for the marginal likelihood, and thus parameter estimation is more computationally intensive.

For longitudinal data analysis, nonlinear mixed effects models are popular for inference. Suppose that there are N individuals, with individual i having n_i measurements at times t_{i1}, ..., t_{in_i}. Let y_{ij} be the response value for individual i at time t_{ij}, subject to informative dropout, i = 1, ..., N; j = 1, ..., n_i, and denote y_i = (y_{i1}, ..., y_{in_i})^T. Let z_i = (z_{i1}, ..., z_{ip})^T be a collection of incompletely observed baseline (time-independent) covariates for individual i, and let v_i = (v_{i1}, ..., v_{iq})^T be a collection of completely observed baseline (time-independent) covariates for individual i. A general NLME model can be written as a hierarchical two-stage model as follows (Davidian and Giltinan, 1995):

    y_{ij} = g(t_{ij}, x_{ij}; β_i) + e_{ij},   e_i ~ N(0, σ² I),   j = 1, ..., n_i,   (2.1)
    β_i = d(z_i, v_i; β, b_i),   b_i ~ N(0, D),   i = 1, ..., N,   (2.2)

where g(·) is an arbitrary nonlinear function, x_{ij} contains possible within-individual design variables, z_i and v_i are respectively (p x 1) and (q x 1) vectors of covariates, e_i = (e_{i1}, ..., e_{in_i})^T represents the measurement errors, β_i = (β_{i1}, ..., β_{is})^T is an (s x 1) vector of individual-specific regression parameters, β = (β_1, ..., β_r)^T is an (r x 1) vector of population parameters (fixed effects), d(·) is an s-dimensional vector-valued function, b_i = (b_{i1}, ..., b_{is})^T is the vector of random effects and is independent of e_i, σ² is the unknown within-individual variance, I is the identity matrix, and the (s x s) matrix D quantifies the inter-individual random-effects covariance. We write D = D(η), where η denotes the collection of all distinct parameters in D.

Let f(·) denote a generic density function and f(y|x)
After integrating out the unobserved random effects vector, the density of the responses f( \zi, yi denote the conditional is given by Vi, P, a , D) = J f(yi\zi, 2 v P, a , b , D)f(h\D)dh, 2 i ; (2.3) and the likelihood function is N L(P, a , D\y) 2 = J] / / ( X i N i . v*; P, a , b, D)f(b\D)db, »=i ^ 2 (2.4) which generally does not have a closed-form expression. Exact likelihood calculations therefore require numerical evaluation of an integral whose dimension is equal to the number of random effects b^. T h i s is straightforward to do by direct numerical integration when the dimension of is 1 or 2. However, when bj has a dimension of 3 or more, people need to consider alternative methods, such as Monte Carlo method. Here, for simplicity, we are abusing mathematical notation, by using / for many different densities, and the function can be determined from the arguments. x 14 Lindstrom and Bates (1990) propose an approximate method based on firstorder Taylor expansions about the random effects 6;. T h e resulting algorithm provides a computationally fast, albeit approximate, method for a wide class of non-linear models. 2.3 Literature Review on N L M E Models with Informative Missing D a t a W u and W u (2001) estimate parameters in nonlinear mixed effects models with missing covariates ( M A R ) by a three-step multiple imputation method. In first step, they fitted a hierarchical model without covariates. T h e n they imputed the missing covariates based on a multivariate linear model, implemented by Gibbs sampler, and created B independent complete datasets in the second step. In the last step, they used the standard complete-data method to analyze each dataset and combine B obtain the overall inference. W u (2002) proposed a method for N L M E s with censored responses and covariates measured with errors. W u and W u (2002) also proposed a method for analyzing N L M E s with missing time-dependent covariates. 
Later, Wu (2004a) proposed an exact and an approximate method for analyzing data with missing covariates in nonlinear mixed effects models. The exact method is implemented by a Monte Carlo EM algorithm, while the approximate method linearizes the nonlinear model based on a Taylor expansion and substantially reduces the computational load. Wu (2004b) proposed a Monte Carlo EM method for estimating parameters in NLMEs with nonignorable missing covariates and dropout, with a dropout mechanism depending on the unobserved responses. However, no one has considered parameter estimation in NLMEs with informative dropout and missing covariates when the dropout mechanism depends on the unknown random effects. In the following chapters, we focus on NLMEs with informative dropout and missing covariates under a random-effects-based dropout mechanism. Since the random effects are shared by both the response model and the dropout model, this approach may also be referred to as a shared parameter model.

Chapter 3

An Exact Method for NLME Models with Informative Dropout and Missing Covariates

3.1 Introduction

In this chapter, we develop an exact inference method based on Monte Carlo methods to obtain MLEs for the parameters in NLMEs with informative dropout and missing covariates. The proposed exact method is implemented by a Monte Carlo EM algorithm. In Section 3.2, we describe NLMEs with informative dropout and missing covariates. Section 3.3 describes a Monte Carlo EM algorithm. A detailed description of our sampling methods is provided in Section 3.4. Computational issues regarding our algorithm are discussed in Section 3.5.

3.2 The Models

We consider models (2.1) and (2.2). Let r_i = (r_{i1}, ..., r_{in_i})^T be a vector of missing data indicators for individual i such that r_{ij} = 1 if y_{ij} is missing and 0 otherwise. We write y_i = (y_{mis,i}, y_{obs,i}), where y_{mis,i} corresponds to the missing components of y_i and y_{obs,i} contains the observed components of y_i. Similarly, we write z_i = (z_{mis,i}, z_{obs,i}), where z_{mis,i} corresponds to the missing components of the covariate vector z_i and z_{obs,i} contains the observed components of z_i. We assume that the missing covariates are ignorable (missing at random), i.e., the missing covariate mechanism may depend on the observed data but not on the covariate values that are missing. The observed data are {(y_{obs,i}, z_{obs,i}, v_i, r_i), i = 1, ..., N}. Note that the dimensions of y_{obs,i} and z_{obs,i} depend on i.

To facilitate likelihood inference, we need to make a distributional assumption for the incompletely observed covariates z_i conditional on the completely observed covariates v_i. We denote the covariate distribution by f(z_i | v_i; α), where the parameters α may be viewed as nuisance parameters. To allow for informative missing responses, we assume a distribution f(r_i | y_i, z_i, v_i; φ) for r_i, where the missingness may depend on the values being missing (i.e., y_{mis,i} and z_{mis,i}). The parameters φ are treated as nuisance parameters.

When the responses y_i and the covariates z_i contain missing data, likelihood inference based on (2.4) becomes more complicated. In this section, we consider a Monte Carlo EM algorithm (Wei and Tanner, 1990) for 'exact' likelihood inference, incorporating missing responses and missing covariates simultaneously. By treating the unobservable random effects b_i as missing data, the 'complete data' become {(y_i, z_i, v_i, r_i, b_i), i = 1, ..., N}, and the 'complete-data' log-likelihood can be written as

    l_c(ψ) = Σ_{i=1}^N { log f(r_i | y_i, b_i, z_i, v_i; φ) + log f(y_i | b_i, z_i, v_i; β, σ²)
             + log f(b_i | D(η)) + log f(z_i | v_i; α) },   (3.1)

where ψ = (α, β, σ², η, φ) denotes the collection of all parameters. We assume that the parameters α, β, σ², η, and φ do not overlap. The term f(r_i | y_i, b_i, z_i, v_i; φ) is a general expression of the missing response mechanism. Little (1995) points out two ways to incorporate informative missingness:

• outcome-based informative, if f(r_i | y_i, b_i, z_i, v_i; φ) = f(r_i | y_i, z_i, v_i; φ). That is, in addition to the covariates, the missing probability of the current response depends on the possibly unobserved response y_{ij}; for example, a patient does not show up because he is too sick to go to the clinic.

• random-effect-based informative, if f(r_i | y_i, b_i, z_i, v_i; φ) = f(r_i | b_i, z_i, v_i; φ). That is, in addition to the covariates, the missing probability of the current response depends on the underlying unobservable random effects b_i. For example, a patient may be more likely to miss a scheduled exam because the treatment is less effective for him, so he loses faith in the treatment; or a patient may be more likely to drop out early because his true (but unobservable) viral decay rate is slower than that of other patients.

Diggle and Kenward (1994), Little (1995), and Ibrahim et al. (2001) discussed various specifications of the outcome-based informative missing mechanism f(r_i | y_i, z_i, v_i; φ). In this thesis, we focus on the random-effect-based informative missing mechanism f(r_i | b_i, z_i, v_i; φ). We may assume, for example, that r_{i1}, ..., r_{in_i} are independent with the same distribution:

    logit[P(r_{ij} = 1 | b_i; φ)] = φ_0 + φ_1 b_{i1} + ... + φ_s b_{is}.   (3.2)

In this dropout model, the missing probabilities of the responses depend only on the random effects of that patient. More complicated dropout models can be specified in a similar way. Note that the assumed models are not testable based on the observed data, so it is important to perform sensitivity analyses with various missing mechanisms. If the main parameter estimates β are largely independent of the assumed dropout model, we may be confident about the results.
Otherwise, if the estimates are very sensitive to the assumed dropout model, we need to justify the dropout model first in order to obtain reasonable parameter estimates. The covariate model f(z_i | v_i) can be chosen in a similar way, and sensitivity analyses should also be performed (Ibrahim et al., 1999).

3.3 A Monte Carlo EM Method

The EM algorithm (Dempster, Laird, and Rubin, 1977) is a very useful and powerful algorithm for computing MLEs in a wide variety of situations, such as missing data and random effects models, although it does not directly provide an estimated covariance matrix for the MLEs. Each iteration of an EM algorithm consists of an E-step, which evaluates the expectation of the "complete-data" log-likelihood conditional on the observed data and the previous parameter estimates, and an M-step, which updates the parameter estimates by maximizing this conditional expectation. Iterating between the E-step and the M-step until convergence yields the MLEs.

If we treat (y_obs,i, y_mis,i, z_obs,i, z_mis,i, v_i, r_i, b_i) = (y_i, z_i, v_i, r_i, b_i) as the "complete" data, the complete-data density for individual i is given by

    f(y_i, z_i, v_i, r_i, b_i | α, β, σ², η, φ)
      = f(r_i | y_i, b_i, z_i, v_i; φ) f(y_i | b_i, z_i, v_i; β, σ²) f(b_i | D(η)) f(z_i | v_i; α).   (3.3)

This leads to the complete-data log-likelihood

    l_c(ψ) = Σ_{i=1}^N l_i(ψ | y_i, z_i, v_i, r_i, b_i)
           = Σ_{i=1}^N { log f(r_i | y_i, b_i, z_i, v_i; φ) + log f(y_i | b_i, z_i, v_i; β, σ²)
                         + log f(b_i | D(η)) + log f(z_i | v_i; α) },                     (3.4)

where ψ = (α, β, σ², η, φ) and l_i(ψ | y_i, z_i, v_i, r_i, b_i) is the contribution to the complete-data log-likelihood from the i-th individual. Note that we are mainly interested in estimating (β, σ²), and treat (α, φ, η) as nuisance parameters.

Ibrahim et al. (1999, 2001) proposed Monte Carlo EM algorithms for estimating parameters in GLMs with missing covariates and in GLMMs with informative dropout (but without missing covariates).
Wu (2004a, b) extended these methods to NLME models and provided a unified approach that addresses dropouts and missing covariates simultaneously, under an outcome-based informative missing response mechanism. Here, we extend the EM methods to NLME models with dropouts and missing covariates under a random-effect-based informative missing mechanism.

3.3.1 E-step

Let ψ^(t) be the parameter estimates from the t-th EM iteration. The E-step for individual i at the (t+1)-st EM iteration can be written as

    Q_i(ψ | ψ^(t)) = E[ l_i(ψ | y_i, z_i, v_i, r_i, b_i) | y_obs,i, z_obs,i, v_i, r_i; ψ^(t) ]
      = ∫∫∫ { log f(r_i | y_i, b_i, z_i, v_i; φ) + log f(y_i | b_i, z_i, v_i; β, σ²)
              + log f(z_i | v_i; α) + log f(b_i | D(η)) }
        × f(y_mis,i, z_mis,i, b_i | y_obs,i, z_obs,i, v_i, r_i; ψ^(t)) db_i dy_mis,i dz_mis,i.   (3.5)

In general, the above integral is intractable and has no closed-form expression. However, since the integral is an expectation with respect to f(y_mis,i, z_mis,i, b_i | y_obs,i, z_obs,i, v_i, r_i; ψ^(t)), it may be evaluated using the Monte Carlo EM of Wei and Tanner (1990) and Ibrahim et al. (2001). Specifically, at the t-th iteration, we may generate m_t samples from f(y_mis,i, z_mis,i, b_i | y_obs,i, z_obs,i, v_i, r_i; ψ^(t)) and then approximate the expectation Q_i(ψ | ψ^(t)) by its empirical mean, with the missing data replaced by the simulated values:

    Q_i(ψ | ψ^(t)) ≈ (1/m_t) Σ_{j=1}^{m_t} log f(r_i | y_obs,i, ỹ_mis,i^(j), b̃_i^(j), z_obs,i, z̃_mis,i^(j), v_i; φ)
      + (1/m_t) Σ_{j=1}^{m_t} log f(y_obs,i, ỹ_mis,i^(j) | b̃_i^(j), z_obs,i, z̃_mis,i^(j), v_i; β, σ²)
      + (1/m_t) Σ_{j=1}^{m_t} log f(z_obs,i, z̃_mis,i^(j) | v_i; α)
      + (1/m_t) Σ_{j=1}^{m_t} log f(b̃_i^(j) | D(η)),                                     (3.6)

where {(ỹ_mis,i^(j), z̃_mis,i^(j), b̃_i^(j)), j = 1, ..., m_t} are the m_t simulated values of the missing responses, missing covariates, and unobservable random effects. We may choose m_0 as a large number and set m_t = m_{t-1} + m_{t-1}/k (k > 1) at the t-th iteration. Increasing m_t with each iteration may speed up the EM convergence (Booth and Hobert, 1999).
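The empirical-mean approximation (3.6) and the sample-size schedule can be sketched in a few lines of Python. This is only a minimal illustration, not the software used in this thesis: the four log-density functions and the `draws` list of simulated (y_mis, z_mis, b) triples are hypothetical placeholders supplied by the user.

```python
def q_i_hat(log_f_r, log_f_y, log_f_z, log_f_b, draws):
    """Empirical-mean approximation (3.6) of Q_i(psi | psi^(t)):
    average the complete-data log-likelihood over the m_t simulated
    (y_mis, z_mis, b) triples.  The four arguments are user-supplied
    log-densities for r_i, y_i, z_i and b_i (placeholders here)."""
    total = 0.0
    for y_mis, z_mis, b in draws:
        total += (log_f_r(y_mis, z_mis, b) + log_f_y(y_mis, z_mis, b)
                  + log_f_z(z_mis) + log_f_b(b))
    return total / len(draws)


def next_mc_size(m_prev, k=3):
    """Sample-size schedule m_t = m_{t-1} + m_{t-1}/k with k > 1, so
    that m_t grows with the EM iteration (Booth and Hobert, 1999)."""
    return m_prev + m_prev // k
```

The same averaging applies term by term, so each of the four sums in (3.6) can be accumulated separately, which is what makes the separate M-step maximizations below possible.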
The E-step for all individuals at the (t+1)-st iteration can then be written as

    Q(ψ | ψ^(t)) = Σ_{i=1}^N Q_i(ψ | ψ^(t))
      ≈ Σ_{i=1}^N (1/m_t) Σ_{j=1}^{m_t} log f(r_i | y_obs,i, ỹ_mis,i^(j), b̃_i^(j), z_obs,i, z̃_mis,i^(j), v_i; φ)
        + Σ_{i=1}^N (1/m_t) Σ_{j=1}^{m_t} log f(y_obs,i, ỹ_mis,i^(j) | b̃_i^(j), z_obs,i, z̃_mis,i^(j), v_i; β, σ²)
        + Σ_{i=1}^N (1/m_t) Σ_{j=1}^{m_t} log f(z_obs,i, z̃_mis,i^(j) | v_i; α)
        + Σ_{i=1}^N (1/m_t) Σ_{j=1}^{m_t} log f(b̃_i^(j) | D(η))
      ≡ Q^(1)(φ | ψ^(t)) + Q^(2)(β, σ² | ψ^(t)) + Q^(3)(α | ψ^(t)) + Q^(4)(D(η) | ψ^(t)).   (3.7)

To generate independent samples from f(y_mis,i, z_mis,i, b_i | y_obs,i, z_obs,i, v_i, r_i; ψ^(t)), we may use the Gibbs sampler (Gelfand and Smith, 1990) by sampling iteratively from the following three full conditionals:

• f(y_mis,i | y_obs,i, z_i, b_i, v_i, r_i; ψ^(t)),
• f(z_mis,i | z_obs,i, y_i, b_i, v_i, r_i; ψ^(t)),
• f(b_i | y_i, z_i, v_i, r_i; ψ^(t)).

The details of this sampling are discussed in Section 3.4.1. In practice, when we generate samples from f(y_mis,i | y_obs,i, z_i, b_i, v_i, r_i; ψ^(t)), we note that

    f(y_mis,i | y_obs,i, z_i, b_i, v_i, r_i; ψ^(t))
      = f(y_i, r_i | z_i, b_i, v_i; ψ^(t)) / f(y_obs,i, r_i | z_i, b_i, v_i; ψ^(t))
      = f(y_i | z_i, b_i, v_i; ψ^(t)) · f(r_i | y_i, z_i, b_i, v_i; ψ^(t)) / f(y_obs,i, r_i | z_i, b_i, v_i; ψ^(t)).   (3.8)

Since we focus on the random-effect-based missing mechanism, in which the probability of missingness is explained by b_i rather than y_i, both the numerator and the denominator of the second factor in (3.8) are constant with respect to the newly generated samples of y_mis,i. Thus,

    f(y_mis,i | y_obs,i, z_i, b_i, v_i, r_i; ψ^(t)) ∝ f(y_i | z_i, b_i, v_i; ψ^(t)).       (3.9)

Similarly, we can simplify the other two full conditionals:

    f(z_mis,i | z_obs,i, y_i, b_i, v_i, r_i; ψ^(t))
      = f(z_i, y_i, b_i, r_i | v_i; ψ^(t)) / f(z_obs,i, y_i, b_i, r_i | v_i; ψ^(t))
      = f(z_i | v_i; ψ^(t)) f(b_i | z_i, v_i; ψ^(t)) f(r_i | z_i, b_i, v_i; ψ^(t)) f(y_i | r_i, z_i, b_i, v_i; ψ^(t))
        / f(z_obs,i, y_i, b_i, r_i | v_i; ψ^(t))
      ∝ f(z_i | v_i; ψ^(t)) f_i(z_i),                                                     (3.10)

and

    f(b_i | y_i, z_i, v_i, r_i; ψ^(t)) = f(b_i, y_i, r_i | z_i, v_i; ψ^(t)) / f(y_i, r_i | z_i, v_i; ψ^(t))
      = f(b_i | z_i, v_i; ψ^(t)) f(r_i | b_i, z_i, v_i; ψ^(t)) f(y_i | b_i, r_i, z_i, v_i; ψ^(t)) / f(y_i, r_i | z_i, v_i; ψ^(t))
      ∝ f(b_i | ψ^(t)) f(r_i | b_i, z_i, v_i; ψ^(t)) f(y_i | b_i, r_i, z_i, v_i; ψ^(t))
      = f(b_i | ψ^(t)) f_b(b_i),                                                          (3.11)

where

    f_i(z_i) = f(r_i | z_i, b_i, v_i; ψ^(t)) f(y_i | r_i, z_i, b_i, v_i; ψ^(t)),          (3.12)

and

    f_b(b_i) = f(r_i | b_i, z_i, v_i; ψ^(t)) f(y_i | b_i, r_i, z_i, v_i; ψ^(t)).          (3.13)

3.3.2 M-step

We update the estimates at the (t+1)-st iteration by maximizing Q(ψ | ψ^(t)). Since the parameters β, α, D(η), φ, and σ² are all distinct, we can update φ, (β, σ²), α, and D(η) by maximizing Q^(1), Q^(2), Q^(3), and Q^(4) separately in the M-step. The maximizer φ^(t+1) of Q^(1) may be computed via iteratively reweighted least squares, with the missing values replaced by their simulated values {ỹ_mis,i^(j), z̃_mis,i^(j), b̃_i^(j)}:

    φ^(t+1) = argmax_φ Q^(1)(φ | ψ^(t))
            = argmax_φ Σ_{i=1}^N (1/m_t) Σ_{j=1}^{m_t} log f(r_i | y_obs,i, ỹ_mis,i^(j), b̃_i^(j), z_obs,i, z̃_mis,i^(j), v_i; φ).   (3.14)

Similarly, the maximizer (β^(t+1), σ²^(t+1)) of Q^(2) can be written as

    (β, σ²)^(t+1) = argmax_{β,σ²} Q^(2)(β, σ² | ψ^(t))
                  = argmax_{β,σ²} Σ_{i=1}^N (1/m_t) Σ_{j=1}^{m_t} log f(y_obs,i, ỹ_mis,i^(j) | b̃_i^(j), z_obs,i, z̃_mis,i^(j), v_i; β, σ²);   (3.15)

the maximizer α^(t+1) of Q^(3) can be written as

    α^(t+1) = argmax_α Q^(3)(α | ψ^(t))
            = argmax_α Σ_{i=1}^N (1/m_t) Σ_{j=1}^{m_t} log f(z_obs,i, z̃_mis,i^(j) | v_i; α);   (3.16)

and the maximizer D(η)^(t+1) of Q^(4) can be written as

    D(η)^(t+1) = argmax_D Q^(4)(D | ψ^(t))
               = argmax_D Σ_{i=1}^N (1/m_t) Σ_{j=1}^{m_t} log f(b̃_i^(j) | D(η)).           (3.17)

Generally, (3.14)–(3.17) involve nonlinear functions and have no closed-form solutions. The maximizers can be obtained via standard optimization procedures for complete data, such as the Newton-Raphson and quasi-Newton methods.
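Two of the separate maximizations do have closed forms when the component densities are normal, which makes the M-step easy to sketch. Assuming b_i ~ N(0, D), the maximizer of Q^(4) over D is the empirical second-moment matrix of the pooled simulated random effects, and, given β, the σ² part of Q^(2) is maximized by the mean squared residual. The updates for φ and β generally need Newton-type iterations and are omitted; the helper names below are ours, not the thesis's.

```python
import numpy as np

def update_D(b_draws):
    """Closed-form maximizer of Q^(4)(D | psi^(t)): with b ~ N(0, D),
    maximizing the average log-density over all simulated b's gives the
    empirical second-moment matrix D-hat = (1/n) sum_j b^(j) b^(j)^T."""
    B = np.asarray(b_draws, dtype=float)   # shape (n_draws, s)
    return B.T @ B / B.shape[0]


def update_sigma2(residuals):
    """sigma^2 part of the Q^(2) update: the mean squared residual
    y_ij - g_ij(z, beta, b), pooled over the simulated complete data."""
    r = np.asarray(residuals, dtype=float)
    return float(np.mean(r ** 2))
```

Both updates pool the N × m_t simulated complete datasets, which is exactly the effect of the double sums in (3.15) and (3.17).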
The asymptotic variance-covariance matrix of ψ̂ can be obtained using well-known complete-data formulas (Louis, 1982; McLachlan and Krishnan, 1997), where the expectations in those formulas can be approximated by Monte Carlo means. Specifically, note that the observed information matrix equals the expected complete-data information minus the missing information, that is,

    I_obs(ψ̂) = I_com(ψ̂) − I_mis(ψ̂).                                                     (3.18)

Let

    Q(ψ | ψ̂) = Σ_{i=1}^N Q_i(ψ | ψ̂),                                                    (3.19)

    Q̇_i(ψ̂) = (1/m_t) Σ_{j=1}^{m_t} ∂l_i^(j)(ψ̂)/∂ψ,                                      (3.20)

and

    Q̈_i(ψ̂) = (1/m_t) Σ_{j=1}^{m_t} ∂²l_i^(j)(ψ̂)/∂ψ ∂ψ^T,                                (3.21)

where l_i^(j)(ψ) denotes l_i(ψ | y_obs,i, ỹ_mis,i^(j), z_obs,i, z̃_mis,i^(j), v_i, r_i, b̃_i^(j)). Since β, α, φ, and η are all distinct, the matrices Q(ψ | ψ̂), Q̇_i(ψ̂), and Q̈_i(ψ̂) are block diagonal. Then, based on (3.18), the asymptotic observed information matrix can be approximated by

    I_obs(ψ̂) ≈ Σ_{i=1}^N { −Q̈_i(ψ̂) − (1/m_t) Σ_{j=1}^{m_t} [∂l_i^(j)(ψ̂)/∂ψ][∂l_i^(j)(ψ̂)/∂ψ]^T + Q̇_i(ψ̂) Q̇_i(ψ̂)^T }.   (3.22)

Thus, the asymptotic variance-covariance matrix of ψ̂ can be approximated by

    Cov(ψ̂) = I_obs^{-1}(ψ̂).                                                              (3.23)

3.4 Sampling Methods

3.4.1 Gibbs Sampler

From the previous sections, we can see that generating samples from the conditional distribution f(y_mis,i, z_mis,i, b_i | y_obs,i, z_obs,i, v_i, r_i; ψ^(t)) is an important step in implementing the E-step of the Monte Carlo EM algorithm. The Gibbs sampler (Gelfand and Smith, 1990) is a popular method for generating samples from a complicated multidimensional distribution, by sampling from each of its full conditional distributions in turn, provided the distribution has a convenient representation via conditional distributions. Here, we use the Gibbs sampler to simulate the missing values as follows. Set initial values (ỹ_mis,i^(0), z̃_mis,i^(0), b̃_i^(0)). Suppose that the current generated values are (ỹ_mis,i^(k), z̃_mis,i^(k), b̃_i^(k)); we obtain (ỹ_mis,i^(k+1), z̃_mis,i^(k+1), b̃_i^(k+1)) as follows.

Step 1. Draw a sample ỹ_mis,i^(k+1) for the missing responses from f(y_mis,i | z̃_mis,i^(k), b̃_i^(k), y_obs,i, z_obs,i, v_i, r_i; ψ^(t)).

Step 2. Draw a sample z̃_mis,i^(k+1) for the missing covariates from f(z_mis,i | ỹ_mis,i^(k+1), b̃_i^(k), y_obs,i, z_obs,i, v_i, r_i; ψ^(t)).

Step 3.
Draw a sample b̃_i^(k+1) for the "missing" random effects from f(b_i | ỹ_mis,i^(k+1), z̃_mis,i^(k+1), y_obs,i, z_obs,i, v_i, r_i; ψ^(t)).

After a sufficiently long burn-in of d iterations, the sampled values will reach a steady state. Then {(ỹ_mis,i^(k), z̃_mis,i^(k), b̃_i^(k)), k = d+1, ..., B} can be treated as samples from the multidimensional density function f(y_mis,i, z_mis,i, b_i | y_obs,i, z_obs,i, v_i, r_i; ψ^(t)). And, if we choose a sufficiently large gap d' (usually smaller than d), we can treat the sample series {(ỹ_mis,i^(k), z̃_mis,i^(k), b̃_i^(k)), k = d+d', d+2d', ...} as independent samples from this multidimensional density function. There are several ways to choose the initial values (ỹ_mis,i^(0), z̃_mis,i^(0), b̃_i^(0)). A simple way is to set (ỹ_mis,i^(0), z̃_mis,i^(0)) to the average values of the observed data, and to set b̃_i^(0) to 0.

3.4.2 Multivariate Rejection Algorithm

Sampling from the three full conditional distributions can be accomplished by an adaptive rejection algorithm (Gilks and Wild, 1992) if the appropriate densities are log-concave. However, for arbitrary NLME models, some densities may not be log-concave. In such cases, multivariate rejection sampling methods may be considered. Booth and Hobert (1999) discussed such a method for complete-data GLMM models, which can be extended to NLME models with dropouts and missing covariates as follows.

Consider sampling from f(b_i | y_i, z_i, v_i, r_i; ψ^(t)). As in (3.13), let f_b(b_i) = f(r_i | b_i, z_i, v_i; ψ^(t)) f(y_i | b_i, r_i, z_i, v_i; ψ^(t)), and let τ = sup_u f_b(u). We assume τ < ∞. A random sample from f(b_i | y_i, z_i, v_i, r_i; ψ^(t)) can then be obtained as follows.

Step 1. Sample b_i* from f(b_i | ψ^(t)) and, independently, sample w from the uniform(0, 1) distribution.

Step 2. If w ≤ f_b(b_i*)/τ, accept b_i*; otherwise, go back to Step 1.

Samples from f(y_mis,i | y_obs,i, z_i, b_i, v_i, r_i; ψ^(t)) and f(z_mis,i | z_obs,i, y_i, b_i, v_i, r_i; ψ^(t)) can be obtained in a similar way.
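The two-step rejection scheme can be sketched as follows. For illustration only, we take f(b_i | ψ^(t)) to be a standard normal prior and pass f_b in log form; these choices, and the cap τ, are assumptions of this sketch rather than part of the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_b_rejection(log_fb, tau, s, max_tries=100_000):
    """Multivariate rejection sampling for f(b | y, z, v, r):
    Step 1: draw b* from the 'prior' f(b) -- here N(0, I_s) for
            illustration -- and w ~ Uniform(0, 1);
    Step 2: accept b* when w <= f_b(b*) / tau, with tau >= sup f_b."""
    for _ in range(max_tries):
        b = rng.standard_normal(s)
        w = rng.uniform()
        if w <= np.exp(log_fb(b)) / tau:
            return b
    raise RuntimeError("acceptance rate too low; check tau")
```

With log f_b(b) = −b^T b / 2 (so τ = 1), the accepted draws follow N(0, I/2), the normalized product of the prior and f_b, which is the defining property of the rejection step.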
Note that, when the dropout probability depends only on the random effects b_i, and not on y_i, we have

    f(y_mis,i | y_obs,i, z_i, b_i, v_i, r_i; ψ^(t))
      = f(y_i, r_i | z_i, b_i, v_i; ψ^(t)) / f(y_obs,i, r_i | z_i, b_i, v_i; ψ^(t))
      = f(y_i | z_i, b_i, v_i; ψ^(t)) f(r_i | z_i, b_i, v_i; ψ^(t)) / f(y_obs,i, r_i | z_i, b_i, v_i; ψ^(t))
      = f(y_i | z_i, b_i, v_i; ψ^(t)) f_y*(y_i).                                           (3.24)

The function f_y*(y_i) = f(r_i | z_i, b_i, v_i; ψ^(t)) / f(y_obs,i, r_i | z_i, b_i, v_i; ψ^(t)) is a constant with respect to the missing responses y_mis,i. So, in Step 2, we always have f_y*(y_i*) = τ for any generated y_i*. Thus f_y*(y_i*)/τ = 1 ≥ w for any w generated from uniform(0, 1), and y_i* is always accepted. That is, any y_i* generated from f(y_i | z_i, b_i, v_i; ψ^(t)) in Step 1 is always accepted in Step 2.

3.4.3 Importance Sampling

When the dimension of y_mis,i, z_mis,i, or b_i is large, the above rejection sampling methods may be slow. In this case, we may consider importance sampling methods, where the importance function can be chosen to be a multivariate normal density or a multivariate Student t density whose mean and variance match the mode and curvature of f(y_mis,i, z_mis,i, b_i | y_obs,i, z_obs,i, v_i, r_i; ψ^(t)). Booth and Hobert (1999) discussed an importance sampling method for complete-data GLMM models. Here, we may extend their method to NLME models with dropout and missing covariates. Specifically, we may write

    f(y_mis,i, z_mis,i, b_i | y_obs,i, z_obs,i, v_i, r_i; ψ^(t)) = c exp(h(y_mis,i, z_mis,i, b_i)),   (3.25)

where c is an unknown normalizing constant. Let ḣ(y_mis,i, z_mis,i, b_i) and ḧ(y_mis,i, z_mis,i, b_i) be the first and second derivatives of h(y_mis,i, z_mis,i, b_i), respectively, and let
(y_mis,i*, z_mis,i*, b_i*) be the solution of ḣ(y_mis,i, z_mis,i, b_i) = 0, which is the maximizer of h(y_mis,i, z_mis,i, b_i). Then, the Laplace approximations of the mean and variance of f(y_mis,i, z_mis,i, b_i | y_obs,i, z_obs,i, v_i, r_i; ψ^(t)) are (y_mis,i*, z_mis,i*, b_i*) and −[ḧ(y_mis,i*, z_mis,i*, b_i*)]^{-1}, respectively. Suppose that {(ỹ_mis,i^(1), z̃_mis,i^(1), b̃_i^(1)), ..., (ỹ_mis,i^(m_t), z̃_mis,i^(m_t), b̃_i^(m_t))} is a random sample of size m_t generated from the importance function h*(y_mis,i, z_mis,i, b_i), which is assumed to have the same support as f(y_mis,i, z_mis,i, b_i | y_obs,i, z_obs,i, v_i, r_i; ψ^(t)). Then we have

    Q(ψ | ψ^(t)) ≈ Σ_{i=1}^N { (1/m_t) Σ_{j=1}^{m_t} w_i^(j) l_i(ψ | y_obs,i, ỹ_mis,i^(j), z_obs,i, z̃_mis,i^(j), v_i, r_i, b̃_i^(j)) },   (3.26)

where

    w_i^(j) = f(ỹ_mis,i^(j), z̃_mis,i^(j), b̃_i^(j) | y_obs,i, z_obs,i, v_i, r_i; ψ^(t)) / h*(ỹ_mis,i^(j), z̃_mis,i^(j), b̃_i^(j))

are the importance weights.

Among the above sampling methods, the rejection sampling methods may be more efficient when the dimension of the random effects and the sample size are small, while the importance sampling method may be more efficient when the sample sizes are large, since in that case the importance distribution may closely resemble the true conditional distribution.

3.5 Convergence

When applying the Monte Carlo EM algorithm, Monte Carlo samples of the "missing data" are drawn at each iteration to approximate the conditional expectations, and Monte Carlo errors are thereby introduced. These errors are affected by the Monte Carlo sample size: larger values of m_t, the Monte Carlo sample size, obviously result in more precise but slower computation. A common strategy is to increase m_t as the number of EM iterations increases (Booth and Hobert, 1999). For sufficiently large values of m_t, the Monte Carlo EM algorithm inherits the properties of the exact version, such as the likelihood-increasing property of EM, but the computational workload increases substantially. Thus, we usually use a relatively small m_t in the initial iterations and then increase m_t with the iteration number.
If the Monte Carlo error associated with ψ^(t+1) is large, the (t+1)-th iteration of the Monte Carlo EM is wasted, because the EM step has been swamped by the Monte Carlo error. Booth and Hobert (1999) proposed an automated method for choosing m_t in the context of complete-data GLMM models. Here we extend their method to NLME models with dropout and missing covariates. Let

    Q^(1)(ψ | ψ^(t)) = ∂Q(ψ | ψ^(t))/∂ψ,    Q^(2)(ψ | ψ^(t)) = ∂²Q(ψ | ψ^(t))/∂ψ ∂ψ^T,

and let ψ*^(t+1) be the solution of

    Q^(1)(ψ | ψ^(t)) = 0.                                                                  (3.28)

When the simulated samples are independent, it can be seen that the conditional distribution of ψ^(t+1) given ψ^(t) is approximately normal with mean ψ*^(t+1) and a variance that can be estimated by

    Var(ψ^(t+1) | ψ^(t)) ≈ [Q^(2)(ψ^(t+1) | ψ^(t))]^{-1} Var[Q^(1)(ψ^(t+1) | ψ^(t))] [Q^(2)(ψ^(t+1) | ψ^(t))]^{-1},   (3.30)

where the middle term is approximated from the simulated samples (ỹ_mis,i^(j), z̃_mis,i^(j), b̃_i^(j)) with weights w_i^(j) (3.31), which are the importance weights when importance sampling is used and are all set equal to 1 when rejection sampling methods are used.

After the (t+1)-th iteration, we may then construct an approximate 100(1 − α)% confidence ellipsoid for ψ*^(t+1) based on the above normal approximation. The EM step is swamped by Monte Carlo error if the previous estimate ψ^(t) lies in this confidence ellipsoid, and in that case we need to increase m_t. For example, we may set m_t to m_{t−1} + m_{t−1}/k for some positive constant k and an appropriate m_0.

The proposed Monte Carlo EM algorithm often works well for models with a small dimension of random effects. When the dimension of the random effects is not small, however, the proposed EM algorithm and Gibbs sampler may converge very slowly or even fail to converge. Therefore, in the next chapter, we propose an approximate inference method which may avoid these convergence difficulties and may be computationally more efficient.
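The stopping rule above — grow m_t whenever ψ^(t) falls inside the confidence ellipsoid around ψ^(t+1) — reduces to a Mahalanobis-distance test. The sketch below assumes the variance estimate (3.30) is already available as a matrix; the function names are ours.

```python
import numpy as np

def swamped_by_mc_error(psi_prev, psi_new, var_new, chi2_crit):
    """The EM step is judged swamped by Monte Carlo error when psi^(t)
    lies inside the approximate confidence ellipsoid of psi^(t+1):
    (psi_prev - psi_new)^T Var^{-1} (psi_prev - psi_new) is below the
    chi-square critical value for the chosen confidence level."""
    d = np.asarray(psi_prev, float) - np.asarray(psi_new, float)
    return float(d @ np.linalg.solve(var_new, d)) <= chi2_crit


def maybe_grow_mc_size(m_t, swamped, k=3):
    """Grow the Monte Carlo sample size only after a swamped step."""
    return m_t + m_t // k if swamped else m_t
```

In this way the sampling effort grows only when the Monte Carlo noise actually dominates the EM increment, which is the point of the Booth-Hobert rule.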
Chapter 4

An Approximate Method for NLME Models with Informative Dropout and Missing Covariates

4.1 Introduction

In the previous chapter, we described a Monte Carlo EM algorithm for "exact" likelihood inference in NLME models with informative dropout and missing covariates. However, the exact method may present computational problems, such as slow convergence or non-convergence, especially when the dimension of the random effects b_i is large or the intra-individual data are not rich. When the dimension of the random effects b_i is not small, sampling the random effects may lead to inefficient and computationally unstable Gibbs samplers, and may lead to a high degree of autocorrelation and lack of convergence. When the intra-individual data are sparse, the individual nonlinear regressions used in the M-step may fit poorly, leading to slow convergence or non-convergence. To reduce the computational workload, in this chapter we propose an approximate method which iteratively solves certain LME models and avoids sampling the random effects in the E-step. The proposed approximate method may be preferable when the exact method exhibits computational difficulties. Alternatively, the approximate estimates can serve as excellent starting values for the exact method. Note that this approximate method is exact for LME models and for certain NLME models in which the model may be nonlinear in the fixed effects but is strictly linear in the random effects.

For complete-data NLME models, approximate methods have been widely used, and they often perform reasonably well (Lindstrom and Bates, 1990; Pinheiro and Bates, 1995; Vonesh et al., 2002). These approximate methods are typically obtained via Taylor expansions or Laplace approximations to the nonlinear models.
One particularly popular approximate method for complete-data models is that of Lindstrom and Bates (1990), which is equivalent to iteratively solving certain LME models (Wolfinger, 1993). For LME models with missing responses but completely observed covariates, Ibrahim et al. (2001) proposed an efficient EM method obtained by integrating out the random effects. These methods can be extended to NLME models with informative dropout and missing covariates for approximate inference. Here, we propose an approximate method based on Lindstrom and Bates (1990), which uses first-order Taylor expansions around the updated parameter and random-effects estimates and is equivalent to iteratively solving certain LME models. We then handle the missing responses and missing covariates in the LME model step, in which the random effects can be integrated out. Thus, this approximate method avoids sampling the random effects in the E-step and avoids fitting nonlinear models in the M-step, and therefore avoids the potential computational difficulties associated with the exact method. Moreover, well-known efficient EM-type algorithms for complete-data LME models (Meng and van Dyk, 1997; Liu et al., 1998) can be incorporated in the LME model step to further speed up the convergence.

4.2 The Approximate Method

We can rewrite the NLME model (2.1) and (2.2) as a single equation by combining the two stages:

    y_ij = g_ij(z_i, β, b_i) + e_ij,   j = 1, ..., n_i,  i = 1, ..., N,                    (4.1)

where g_ij(·) is a nonlinear function. To simplify the notation, we suppress the completely observed covariates v_i and denote the current estimate of ψ at the t-th EM iteration by ψ^(t) = (α^(t), β^(t), σ²^(t), D(η)^(t), φ^(t)). Let g_i = (g_i1, ..., g_in_i)^T. Following Lindstrom and Bates (1990) and Wolfinger (1993), the proposed approximate method iteratively solves the following LME model:

    ỹ_i = X_i β + T_i b_i + e_i,                                                           (4.2)

where
    ỹ_i = y_i − g_i(z_i, β^(t), b̃_i^(t)) + X_i β^(t) + T_i b̃_i^(t),                       (4.3)

    X_i = X_i(β^(t), b̃_i^(t)) = (X_i1^T, ..., X_in_i^T)^T,   X_ij = ∂g_ij(z_i, β, b_i)/∂β^T |_{(β^(t), b̃_i^(t))},   (4.4)

    T_i = T_i(β^(t), b̃_i^(t)) = (T_i1^T, ..., T_in_i^T)^T,   T_ij = ∂g_ij(z_i, β, b_i)/∂b_i^T |_{(β^(t), b̃_i^(t))},   (4.5)

and ỹ_i = (ỹ_i1, ..., ỹ_in_i)^T. Note that we have

    ỹ_mis,i = y_mis,i − g_mis,i(z_i, β^(t), b̃_i^(t)) + X_mis,i β^(t) + T_mis,i b̃_i^(t),    (4.6)

where g_mis,i is the sub-vector of g_i corresponding to the missing components, X_mis,i and T_mis,i are the corresponding submatrices of X_i and T_i, respectively, and

    ỹ_i = (ỹ_mis,i^T, ỹ_obs,i^T)^T.                                                        (4.7)

Under the LME model (4.2), it is straightforward to show that

    f(b_i | ỹ_i, z_i; ψ^(t)) ~ N(b̂_i, Σ_i),                                                (4.8)

where

    Σ_i = (σ^{-2(t)} T_i^T T_i + D^{-1}(η^(t)))^{-1},                                       (4.9)

    b̂_i = Σ_i T_i^T (ỹ_i − X_i β^(t)) / σ²^(t).                                             (4.10)

4.2.1 E-step

We can integrate out b_i and obtain the following results:

    Q_i(ψ | ψ^(t)) = E[ l_i(ψ | ỹ_i, z_i, v_i, r_i, b_i) | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t) ]
      = ∫∫∫ { log f(r_i | ỹ_i, b_i, z_i, v_i; φ) + log f(ỹ_i | b_i, z_i, v_i; β, σ²)
              + log f(z_i | v_i; α) + log f(b_i | D(η)) }
        × f(ỹ_mis,i, z_mis,i, b_i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) db_i dỹ_mis,i dz_mis,i
      ≡ I_1 + I_2 + I_3 + I_4.                                                              (4.11)

For the first term,

    I_1 = ∫∫∫ log f(r_i | ỹ_i, b_i, z_i, v_i; φ) f(ỹ_mis,i, z_mis,i, b_i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) db_i dỹ_mis,i dz_mis,i
        = ∫∫ { ∫ log f(r_i | ỹ_i, b_i, z_i, v_i; φ) f(b_i | ỹ_i, z_i; ψ^(t)) db_i }
          × f(ỹ_mis,i, z_mis,i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) dỹ_mis,i dz_mis,i.      (4.12)

Consider the dropout model (3.2), and suppose that the missing probabilities at different time points are conditionally mutually independent. Then we have

    f(r_i | ỹ_i, b_i, z_i, v_i; φ^(t)) = Π_{j=1}^{n_i} P(R_ij = 1 | φ^(t), b_i)^{r_ij} (1 − P(R_ij = 1 | φ^(t), b_i))^{1−r_ij}.   (4.13)

Note that, here, we use R_ij to represent the argument in the equation and r_ij to represent the value we observed. Define φ = (φ_1, ..., φ_s)^T; we can then rewrite (4.12) as
    I_1 = ∫∫ Ω_1i f(ỹ_mis,i, z_mis,i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) dỹ_mis,i dz_mis,i,   (4.14)

where, writing P(R_ij = 1 | b_i) = e^{φ_0 + φ^T b_i} / (1 + e^{φ_0 + φ^T b_i}),

    Ω_1i = ∫ Σ_{j=1}^{n_i} { r_ij log[ e^{φ_0 + φ^T b_i} / (1 + e^{φ_0 + φ^T b_i}) ]
           + (1 − r_ij) log[ 1 / (1 + e^{φ_0 + φ^T b_i}) ] } f(b_i | ỹ_i, z_i; ψ^(t)) db_i.   (4.15)

Define a_i = φ_0 + φ^T b_i. From (4.8), b_i is multivariate normally distributed, with mean

    b̂_i = Σ_i T_i^T (ỹ_i − X_i β^(t)) / σ²^(t)                                               (4.16)

and variance

    Σ_i = (σ^{-2(t)} T_i^T T_i + D^{-1}(η^(t)))^{-1},                                         (4.17)

so a_i is a univariate normal random variable with mean

    μ_ai = φ_0 + φ^T b̂_i                                                                     (4.18)

and variance

    σ²_ai = φ^T Σ_i φ.                                                                        (4.19)

Take an R^s → R^s variable transformation

    a_i = φ_0 + φ^T b_i,   b_2i = b_2i, ..., b_si = b_si,                                     (4.20)

and assume that at least one φ_j ≠ 0, j = 1, 2, ..., s, say φ_1 ≠ 0, so that the Jacobian determinant is

    ∂(a_i, b_2i, ..., b_si) / ∂(b_1i, b_2i, ..., b_si) = φ_1,                                 (4.21)

and we can write Ω_1i as follows. Changing variables, integrating out (b_2i, ..., b_si), and noting that the integrand depends on b_i only through a_i, we obtain

    Ω_1i = ∫ Σ_{j=1}^{n_i} { r_ij log[ e^{a_i} / (1 + e^{a_i}) ] + (1 − r_ij) log[ 1 / (1 + e^{a_i}) ] } f(a_i | ỹ_i, z_i; ψ^(t)) da_i,   (4.22)

where f(a_i | ỹ_i, z_i; ψ^(t)) is the density function of N(μ_ai, σ²_ai).
By a first-order Taylor expansion of the integrand about a_i = μ_ai, in which the linear term integrates to zero since ∫ (a_i − μ_ai) f(a_i | ỹ_i, z_i; ψ^(t)) da_i = 0, we can write Ω_1i approximately as

    Ω_1i ≈ Σ_{j=1}^{n_i} { r_ij log[ e^{μ_ai} / (1 + e^{μ_ai}) ] + (1 − r_ij) log[ 1 / (1 + e^{μ_ai}) ] }
         = n_mis,i μ_ai − n_i log(1 + e^{μ_ai}).                                              (4.23)

Therefore,

    I_1 ≈ ∫∫ { n_mis,i μ_ai − n_i log(1 + e^{μ_ai}) } f(ỹ_mis,i, z_mis,i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) dỹ_mis,i dz_mis,i,   (4.24)

where n_mis,i is the total number of missing responses for the i-th subject, i.e.,

    n_mis,i = Σ_{j=1}^{n_i} r_ij.                                                             (4.25)

Note that μ_ai depends on (ỹ_mis,i, z_mis,i) through b̂_i. Next,

    I_2 = ∫∫∫ log f(ỹ_i | b_i, z_i, v_i; β, σ²) f(ỹ_mis,i, z_mis,i, b_i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) db_i dỹ_mis,i dz_mis,i
        = ∫∫ { ∫ log f(ỹ_i | b_i, z_i, v_i; β, σ²) f(b_i | ỹ_i, z_i; ψ^(t)) db_i }
          × f(ỹ_mis,i, z_mis,i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) dỹ_mis,i dz_mis,i
        ≡ ∫∫ Ω_2i f(ỹ_mis,i, z_mis,i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) dỹ_mis,i dz_mis,i,   (4.26)

where

    Ω_2i = ∫ log f(ỹ_i | b_i, z_i, v_i; β, σ²) f(b_i | ỹ_i, z_i; ψ^(t)) db_i
         = C_2 − (n_i/2) log σ² − (1/(2σ²)) ∫ (ỹ_i − X_i β − T_i b_i)^T (ỹ_i − X_i β − T_i b_i) f(b_i | ỹ_i, z_i; ψ^(t)) db_i,   (4.27)

where s is the dimension of the random effects b_i, as defined in Section 3.2, and C_2 = −(n_i/2) log(2π). We can perform the variable transformation b_i = b̂_i + Σ_i^{1/2} k_i, and obtain:
    Ω_2i = C_2 − (n_i/2) log σ²
           − (1/(2σ²)) ∫ (ỹ_i − X_i β − T_i b̂_i − T_i Σ_i^{1/2} k_i)^T (ỹ_i − X_i β − T_i b̂_i − T_i Σ_i^{1/2} k_i)
             × (2π)^{−s/2} exp(−k_i^T k_i / 2) dk_i
         = C_2 − (n_i/2) log σ² − (1/(2σ²)) [ (ỹ_i − X_i β − T_i b̂_i)^T (ỹ_i − X_i β − T_i b̂_i) + tr(T_i^T T_i Σ_i) ].   (4.28)

Thus,

    I_2 ≈ ∫∫ { C_2 − (n_i/2) log σ² − (1/(2σ²)) [ (ỹ_i − X_i β − T_i b̂_i)^T (ỹ_i − X_i β − T_i b̂_i) + tr(T_i^T T_i Σ_i) ] }
          × f(ỹ_mis,i, z_mis,i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) dỹ_mis,i dz_mis,i.

Also,

    I_3 = ∫∫∫ log f(z_i | v_i; α) f(ỹ_mis,i, z_mis,i, b_i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) db_i dỹ_mis,i dz_mis,i
        = ∫∫ log f(z_i | v_i; α) f(ỹ_mis,i, z_mis,i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) dỹ_mis,i dz_mis,i,   (4.29)

and

    I_4 = ∫∫∫ log f(b_i | D) f(ỹ_mis,i, z_mis,i, b_i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) db_i dỹ_mis,i dz_mis,i
        = ∫∫ { ∫ log f(b_i | D) f(b_i | ỹ_i, z_i; ψ^(t)) db_i }
          × f(ỹ_mis,i, z_mis,i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) dỹ_mis,i dz_mis,i
        ≡ ∫∫ Ω_4i f(ỹ_mis,i, z_mis,i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) dỹ_mis,i dz_mis,i,   (4.30)

where

    Ω_4i = ∫ log f(b_i | D) f(b_i | ỹ_i, z_i; ψ^(t)) db_i
         = C_4 − (1/2) log|D| − (1/2) ∫ b_i^T D^{-1} b_i f(b_i | ỹ_i, z_i; ψ^(t)) db_i,       (4.31)

and C_4 = −(s/2) log(2π). Again, we may perform the variable transformation b_i = b̂_i + Σ_i^{1/2} k_i and obtain

    Ω_4i = C_4 − (1/2) log|D| − (1/2) b̂_i^T D^{-1} b̂_i − (1/2) tr(D^{-1} Σ_i).                (4.32)
Thus,

    I_4 ≈ ∫∫ { C_4 − (1/2) log|D| − (1/2) b̂_i^T D^{-1} b̂_i − (1/2) tr(D^{-1} Σ_i) }
          × f(ỹ_mis,i, z_mis,i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)) dỹ_mis,i dz_mis,i.        (4.33)

From the discussion above, we have analytically integrated out b_i. Consequently, Q_i(ψ | ψ^(t)) can be evaluated as an integral with respect to the density function f(ỹ_mis,i, z_mis,i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)). Thus, in the E-step, we need only sample ỹ_mis,i and z_mis,i. This may substantially reduce the computational burden, since integration with respect to b_i is no longer needed.

4.2.2 M-step

The M-step in each iteration of the approximate method is similar to the M-step of the exact method, which is discussed in Section 3.3.2. The asymptotic variance-covariance matrix of ψ̂ can again be obtained using the well-known complete-data formulas described in Section 3.3.2. The only difference is that the likelihood contribution l_i is now based on the model (4.2).

4.3 Monte Carlo Sampling

Having integrated out the random effects b_i, in the E-step we need only simulate samples from f(ỹ_mis,i, z_mis,i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)). As in Chapter 3, we can again use the Gibbs sampler to draw the desired samples. The procedure is as follows. Set initial values (ỹ_mis,i^(0), z̃_mis,i^(0)), and suppose that the current generated values are (ỹ_mis,i^(k), z̃_mis,i^(k)).

Step 1. Draw a sample ỹ_mis,i^(k+1) for the missing responses from f(ỹ_mis,i | z̃_mis,i^(k), ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)).

Step 2. Draw a sample z̃_mis,i^(k+1) for the missing covariates from f(z_mis,i | ỹ_mis,i^(k+1), ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)).

After a burn-in period of d iterations, the sampled values (ỹ_mis,i^(k), z̃_mis,i^(k)) can be treated as samples from the density function f(ỹ_mis,i, z_mis,i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)). And, if we choose a sufficiently large gap d' (usually smaller than d), we can treat the sample series {(ỹ_mis,i^(k), z̃_mis,i^(k)), k = d+d', d+2d', ...} as independent samples from the multidimensional density function f(ỹ_mis,i, z_mis,i | ỹ_obs,i, z_obs,i, v_i, r_i; ψ^(t)).
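As a closing numerical check on the linearization used in the E-step, recall that (4.23)-(4.24) replace E[ n_mis,i a_i − n_i log(1 + e^{a_i}) ] by its value at the posterior mean μ_ai. Since log(1 + e^a) is convex, the plug-in value always slightly overstates Ω_1i, and the gap shrinks with σ_ai. The following sketch (with invented toy values) compares the two:

```python
import numpy as np

rng = np.random.default_rng(2)

def omega1_mc(mu_a, sd_a, n_mis, n_i, m=200_000):
    """Monte Carlo evaluation of Omega_1i in (4.22):
    E[ n_mis * a - n_i * log(1 + e^a) ] with a ~ N(mu_a, sd_a^2)."""
    a = rng.normal(mu_a, sd_a, m)
    return float(np.mean(n_mis * a - n_i * np.log1p(np.exp(a))))


def omega1_plugin(mu_a, n_mis, n_i):
    """First-order approximation (4.23): plug the posterior mean mu_ai
    into the integrand."""
    return n_mis * mu_a - n_i * float(np.log1p(np.exp(mu_a)))
```

For, say, mu_a = -1, sd_a = 0.3, n_mis = 2, n_i = 5, the two values agree to a few hundredths, with the Monte Carlo value the smaller, as convexity predicts.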
Chapter 5

Covariate Models and Dropout Models

5.1 Introduction

In the foregoing chapters, we have discussed the methodology for estimating the parameters in NLMEs with informative dropout and missing covariates. To provide valid inference, we need to specify a dropout model for the missing responses and a covariate model for the incompletely observed covariates, and then incorporate them into our analyses. However, the dropout model is usually unknown and hard to verify from the observed data. Sensitivity analyses are thus very important, in that they show how strongly our conclusions rely on the assumed models. If our estimates vary substantially when we choose different dropout models or covariate models, they may be unreliable, because we do not know whether the assumed covariate model and dropout model are true. On the other hand, if our estimates are robust to the model choice, we may believe that the estimates are reliable. In Section 5.2 we introduce covariate models. In Section 5.3 we discuss possible dropout models. In Section 5.4 we discuss sensitivity analyses for the dropout model and the covariate model.

5.2 Covariate Models

When some covariates are missing, we need to assume a distribution for the covariates. The parameters in the covariate model are also viewed as nuisance parameters. Ibrahim (1990) proposed a saturated multinomial model for categorical covariates with missing values. A drawback of this method is that the saturated model greatly increases the number of nuisance parameters, which increases the computational burden and may make the model unidentifiable. When the missing covariates are all continuous, we may assume a multivariate normal distribution for the covariates (Little and Schluchter, 1985). To allow both continuous and categorical covariates, we may write the covariate distribution as a product of one-dimensional conditional distributions, as in Ibrahim et al.
(1999):

    f(z_i; α) = f(z_{i,p} | z_{i,1}, ..., z_{i,p-1}; α_p) f(z_{i,p-1} | z_{i,1}, ..., z_{i,p-2}; α_{p-1}) × ... × f(z_{i,1}; α_1),   (5.1)

where z_i is the covariate vector for the ith subject, α = (α_1^T, α_2^T, ..., α_p^T)^T is the parameter vector that characterizes the relationships among the covariates, α_1, α_2, ..., α_p are all distinct, and p is the number of covariates. Note that we do not need to make distributional assumptions for the completely observed covariates, which are conditioned on and are suppressed in the above expressions. Note also that this modeling scheme allows the missing covariates to be continuous, categorical, or mixed. For example, suppose that z_1 is continuous and z_2 is binary. By the above modeling strategy, we may specify a normal distribution for z_1 and a logistic regression model for z_2 conditional on z_1.

For the dataset described in Section 1.3, all three covariates in the models, CD4 (z_1), TNF (z_2), and CH50 (z_3), contain missing values. Thus we need to assume a joint covariate model for likelihood inference. As we have discussed, we model the joint distribution of z_i = (z_{i1}, z_{i2}, z_{i3})^T as a product of three one-dimensional conditional distributions:

    f(z_{i1}, z_{i2}, z_{i3}; α) = f(z_{i3} | z_{i1}, z_{i2}; α_3) f(z_{i2} | z_{i1}; α_2) f(z_{i1}; α_1),   (5.2)

where α = (α_1^T, α_2^T, α_3^T)^T. We focus on the following saturated model:

    (z_{i3} | z_{i1}, z_{i2}; α_3) ~ N(α_30 + α_31 z_{i1} + α_32 z_{i2}, α_33),   (5.3)
    (z_{i2} | z_{i1}; α_2) ~ N(α_20 + α_21 z_{i1}, α_22),   (5.4)
    (z_{i1}; α_1) ~ N(α_10, α_11),   (5.5)

where α_3 = (α_30, α_31, α_32, α_33)^T, α_2 = (α_20, α_21, α_22)^T, and α_1 = (α_10, α_11)^T. We will also consider other, more parsimonious models for sensitivity analysis.

5.3 Dropout Models

Dropout models are models for the missing response indicators r_ij. The parameters in the dropout model are treated as nuisance parameters and are usually not of inferential interest.
Thus, we try to reduce the number of nuisance parameters to make the estimation of β more efficient. Moreover, too many nuisance parameters may even make the NLME model unidentifiable. Therefore, one should be very cautious about adding extra nuisance parameters.

Since the missing response indicators are binary, a simple model for them is a logistic regression model. We may assume that the missing probabilities at the different time points are independent, i.e.,

    f(r_i | y_i, b_i, z_i, v_i; φ) = ∏_{j=1}^{n_i} π_ij^{r_ij} (1 − π_ij)^{1−r_ij},   (5.6)

and

    log[π_ij / (1 − π_ij)] = h(y_i, b_i, z_i, v_i; φ),   (5.7)

where π_ij = P(R_ij = 1) is the probability that y_ij is missing, and h(·) is often a linear function of y_i, b_i, z_i, and v_i. More complicated models can also be considered, but they may introduce more parameters and increase the computational burden.

In general, the probability that y_ij is missing may depend on many factors, such as past or current responses, individual random effects, covariates, etc. However, since in this thesis we focus on random-effects based informative dropouts, we may assume that the missing probabilities of the responses are explainable through b_i, the random effects, i.e.,

    f(r_i | y_i, b_i, z_i, v_i; φ) = f(r_i | b_i; φ).   (5.8)

Thus, we have

    f(r_i | y_i, b_i, z_i, v_i; φ) = ∏_{j=1}^{n_i} P(R_ij = 1 | φ, b_i)^{r_ij} (1 − P(R_ij = 1 | φ, b_i))^{1−r_ij}.   (5.9)

Again, note that, as in Section 3.3.2, we use R_ij to represent the argument in the function and r_ij to represent the observed value of R_ij.

5.4 Sensitivity Analyses

The dropout model and the covariate model are not verifiable based on the observed data, so it is important to conduct sensitivity analyses. That is, we need to try other plausible dropout models and covariate models, and then assess the sensitivity of the results to those different models. If there is not much difference between the results based on different models, we can draw relatively reliable conclusions.
Otherwise, the results may depend on the assumed models, and the conclusions may not be reliable.

Chapter 6

Data Analysis

6.1 Introduction

In the previous chapters, we have discussed an exact method and an approximate method for NLMEs with random-effects based informative dropouts and missing covariates. In this chapter, we analyze two real datasets. In Section 6.2, we analyze the data described in Section 1.3. This dataset has both missing covariates and missing responses; we apply only the exact method. In Section 6.3, we analyze another AIDS dataset, which does not have covariates; we apply both the exact method and the approximate method to this second dataset. In Section 6.4, we discuss some computational issues.

6.2 Example 1

6.2.1 Data Description

The dataset that motivated our research consists of 48 HIV infected patients who were treated with a potent antiviral regimen. The plasma HIV-1 RNA (viral load) is repeatedly measured on days 2, 7, 10, 14, and 28, and weeks 8, 12, and 24 after initiation of the treatment. After the antiviral treatment, a patient's viral load will decay, and the decay rate may reflect the efficacy of the treatment. Throughout the time course, due to individual characteristics, the viral load may continue to decay, fluctuate, or even start to rise (rebound). We only consider the data from the first three months, since data after three months are likely to be contaminated by long-term clinical factors. The number of measurements for each patient varies from 2 to 7. Fourteen patients have missing viral loads at scheduled time points due to dropout or other problems.

Table 6.1: Data summary of Example 1.

    Variable     Sample mean   Sample SD   % missing
    Viral load   3.57          0.99        6.7%
    CD4          177.3         86.9        3.7%
    CH50         241.1         48.6        16.4%
    TNF          61.0          28.8        12.3%

    Number of patients: N = 48. Observations per patient: n_i = 2 to 7.
The baseline covariates CD4 cell count, total complement level (CH50), and tumor necrosis factor (TNF) contain 3.7%, 16.4%, and 12.3% missing data, respectively. We summarize the data in Table 6.1 (viral load is on the log10 scale). Viral loads of four randomly selected patients are plotted in Figure 6.1.

Figure 6.1: Viral loads of four randomly selected patients. (Example 1)

Visual inspection of the raw data seems to indicate that dropout patients have slower viral decay, compared to the remaining patients. Thus, the dropouts are likely to be informative or nonignorable. This dataset was analyzed previously, but dropout patients were discarded and the missing viral loads were assumed to be missing completely at random (Wu and Ding, 1999; Wu and Wu, 2001). Wu (2004) re-analyzed the dataset, assuming that the missing mechanism depends on the unobserved responses (viral loads). In this section, our objectives are to consider a random-effects based nonignorable missing mechanism, and to check whether the estimates of the decay rates are different.

6.2.2 Models

The following two-phase HIV viral dynamic model has been proposed for this study (Wu and Ding, 1999; Wu and Wu, 2001):

    y_ij = log10(P_1i e^{−λ_1i t_ij} + P_2i e^{−λ_2i t_ij}) + e_ij,   (6.1)

    log10(P_1i) = β_1 + β_2 TNF_i + b_1i,   λ_1i = β_3 + β_4 CD4_i + b_2i,
    log10(P_2i) = β_5 + β_6 TNF_i + b_3i,   λ_2i = β_7 + β_8 CD4_i + β_9 CH50_i + b_4i,   (6.2)

where y_ij is the log10 transformation of the viral load for patient i at the jth visit, i = 1, ..., N; j = 1, ..., n_i, with N = 48 and n_i varying from 2 to 7; λ_1i and λ_2i represent the two viral decay rates; P_1i and P_2i are baseline values; b_ki, k = 1, ..., 4, are random effects; and e_ij represents the within-individual error.

In this study, the baseline CD4 (z_1), CH50 (z_2), and TNF (z_3) all contain missing values.
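To fix ideas, the bi-exponential mean structure of model (6.1) can be computed directly; the parameter values and time unit below are hypothetical, chosen only to produce a bi-phasic decay of the kind described above.

```python
import numpy as np

def mean_log10_load(t, P1, lam1, P2, lam2):
    """Mean log10 viral load under the bi-exponential decay model (6.1)."""
    return np.log10(P1 * np.exp(-lam1 * t) + P2 * np.exp(-lam2 * t))

# hypothetical subject-level values: a fast first phase (lam1) dominating
# early on, and a slow second phase (lam2) dominating later
t = np.linspace(0.05, 1.0, 20)        # rescaled study time
curve = mean_log10_load(t, P1=1e5, lam1=30.0, P2=300.0, lam2=2.0)
print(curve.round(2))
```

Because both exponential terms decrease in t, the mean curve is strictly decreasing, with the early slope governed by the first phase and the late slope by the second.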
To make valid likelihood inference, we need to specify a model for these three covariates. We model the joint distribution of z_i = (z_{i1}, z_{i2}, z_{i3})^T as a product of three one-dimensional conditional distributions:

    f(z_{i1}, z_{i2}, z_{i3}; α) = f(z_{i3} | z_{i1}, z_{i2}; α_3) f(z_{i2} | z_{i1}; α_2) f(z_{i1}; α_1),   (6.3)

where α = (α_1^T, α_2^T, α_3^T)^T. We first focus on the following saturated model:

    (z_{i3} | z_{i1}, z_{i2}; α_3) ~ N(α_30 + α_31 z_{i1} + α_32 z_{i2}, α_33),   (6.4)
    (z_{i2} | z_{i1}; α_2) ~ N(α_20 + α_21 z_{i1}, α_22),   (6.5)
    (z_{i1}; α_1) ~ N(α_10, α_11).   (6.6)

Figure 6.2 shows the Q-Q plots for each of the three covariates. It appears that the normality assumption may be plausible.

Figure 6.2: Q-Q plots for covariates (Example 1)

The responses y_ij contain 6.7% missing values. Thus, we also need to assume a model for the dropout mechanism in order to make valid likelihood inference. Note that although dropout models are not verifiable based on the observed data, subject-area knowledge and sensitivity analyses based on plausible models may still lead to reasonable models. Since, in this thesis, we focus on random-effects based nonignorable missing data, we propose the following missing data model:

    f(r_i | y_i, b_i, z_i, v_i; φ) = ∏_{j=1}^{n_i} P(R_ij = 1 | φ, b_i)^{r_ij} (1 − P(R_ij = 1 | φ, b_i))^{1−r_ij},   (6.7)

where r_i = (r_{i1}, ..., r_{i n_i})^T is a vector of missing data indicators for individual i, such that r_ij = 1 if y_ij is missing and 0 otherwise.

6.2.3 Analysis and Results

We consider estimating the population parameters β = (β_1, ..., β_9)^T using three methods: the complete-case method, the exact method assuming ignorable missingness, and the exact method with nonignorable missingness; details of the computation are given in Section 6.4.
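As an aside, the factored covariate model (6.4)-(6.6) above is easy to simulate from, which is useful for checking the model and for generating starting values. The α values below are made-up; the third components of α_1, α_2, α_3 play the role of the conditional variances, as in the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter values for the saturated model (6.4)-(6.6).
a1 = (0.0, 1.0)                 # alpha_10 (mean), alpha_11 (variance)
a2 = (0.5, 0.3, 1.0)            # alpha_20, alpha_21, alpha_22 (variance)
a3 = (-0.2, 0.4, 0.1, 1.0)      # alpha_30, alpha_31, alpha_32, alpha_33

def draw_covariates(n):
    """Draw z = (z1, z2, z3) from the product of conditionals (6.4)-(6.6)."""
    z1 = rng.normal(a1[0], np.sqrt(a1[1]), n)
    z2 = rng.normal(a2[0] + a2[1] * z1, np.sqrt(a2[2]))
    z3 = rng.normal(a3[0] + a3[1] * z1 + a3[2] * z2, np.sqrt(a3[3]))
    return np.column_stack([z1, z2, z3])

z = draw_covariates(100_000)
print(z.mean(axis=0))   # population means are (0, 0.5, -0.15) for these alphas
```

Each conditional is drawn in the order the factorization dictates, so the joint distribution is sampled exactly.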
The results are shown in Table 6.2; jackknife standard errors (JSE) are included for comparison, as a check on the accuracy of the standard error estimation method described in Section 3.3.2.

Table 6.2: Estimates of the response model parameters. (Example 1)

    Parameter   Exact (Ignor.)        Exact (Non-ignor.)    Complete Case
                Est    SE     JSE     Est    SE     JSE     Est    SE     JSE
    β_1         12.29  0.02   0.04    12.27  0.02   0.04    13.45  0.32   0.08
    β_2         0.59   0.01   0.06    0.54   0.10   0.04    0.28   0.09   0.61
    β_3         37.54  0.50   0.34    35.84  0.39   0.71    40.65  4.40   0.90
    β_4         1.26   1.07   0.14    2.85   1.75   0.63    3.43   2.01   0.49
    β_5         5.46   0.55   0.45    5.59   0.76   0.57    7.69   2.06   0.20
    β_6         7.60   0.02   0.04    7.77   0.02   0.05    7.83   0.32   0.04
    β_7         0.30   0.01   0.08    0.30   0.03   0.03    0.34   0.14   0.07
    β_8         1.92   0.03   0.07    1.99   0.03   0.08    2.08   0.49   0.07
    β_9         0.092  0.01   0.05    0.09   0.01   0.05    0.22   0.20   0.13
    σ           0.56   0.02           0.61   0.12           0.58   0.32

    Complete Case is based on the 30 subjects without any missing measurements.

We see that the results are somewhat different under the different methods. For the most important parameter β_3, the initial decay rate, the exact method assuming nonignorable missingness gives the smallest estimate, the exact method assuming ignorable missingness gives a moderate estimate, and the complete-case method gives the largest. This suggests that studies assuming an ignorable missing data mechanism, or discarding dropout patients, may over-estimate the initial viral decay rate.

Another informative parameter is β_1, the intercept term of the baseline viral load. Although it is not of much interest for testing the efficacy of the new treatment, it shows the difference between the estimation methods. The two exact methods give almost the same estimate of β_1, whether or not we assume ignorable missingness, while the complete-case method gives a larger estimate. A possible interpretation is that, since the complete-case method over-estimates the initial decay rate, the baseline response "intercept" is correspondingly larger.
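The jackknife standard errors (JSE) in Table 6.2 are obtained by re-estimating the parameters with one subject deleted at a time. The generic calculation is sketched below for the sample mean of made-up data (the thesis applies the same formula to the NLME estimates); for the mean, the jackknife SE coincides exactly with the usual standard error.

```python
import numpy as np

def jackknife_se(estimates_minus_i):
    """Jackknife SE from the N leave-one-subject-out estimates theta_(-i)."""
    est = np.asarray(estimates_minus_i, dtype=float)
    n = len(est)
    return np.sqrt((n - 1) / n * np.sum((est - est.mean()) ** 2))

# toy illustration: leave-one-out means of a made-up sample of N = 48 subjects
rng = np.random.default_rng(4)
x = rng.normal(0, 1, 48)
loo = np.array([np.delete(x, i).mean() for i in range(48)])
print(jackknife_se(loo))
```

The same loop, with the NLME fit in place of the mean, gives the JSE columns reported in the tables.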
The JSE is often, but not always, larger than the SE. Comparing the estimated SEs based on the 30 complete cases with those based on all 48 subjects suggests that the JSE may be more reliable than the SE obtained using the method of Section 3.3.2 and nlme(). The reliability of the different approaches to SE estimation is a topic of future research.

Although the parameters in the dropout model are nuisance parameters and are usually not of interest, they sometimes contain useful information. We summarize the estimates of φ = (φ_0, ..., φ_4)^T in Table 6.3.

Table 6.3: Estimates of the dropout model parameters. (Example 1)

    Parameter        φ_0      φ_1     φ_2     φ_3     φ_4
    Estimate         2.64     0.14    -0.11   -0.12   0.07
    Standard Error   0.0017   0.11    0.067   0.17    0.42
    p-value          <0.001   0.21    0.10    0.48    0.86

From Table 6.3, we see that the estimate of φ_1 is positive and the estimate of φ_2 is negative. In view of models (6.1) and (6.2), we may conclude that patients who have higher baseline viral loads and slower (true, but unobservable) initial decay rates are more likely to drop out. The new treatment may be less efficient for such patients, and ignoring these patients may result in over-optimistic conclusions about the treatment effects.

Note that the p-value associated with φ_1 is 0.21, which may be acceptable in practice. However, φ_3 and φ_4 are not statistically significant, and we may consequently remove them to simplify the model. Note also that the Wald tests are only approximate here, so these p-values should serve only as rough guidance. In the next section, we perform sensitivity analyses on alternative dropout models.

6.2.4 Sensitivity Analysis

It is important to check the sensitivity of the parameter estimates to various plausible dropout models. Subject-area knowledge may help us determine alternative dropout models. It is conceivable that dropout may be related to an individual's random effects, current and previous viral load measurements, or covariates such as CD4 cell counts.
Such relationships may be very complicated, but simple logistic regressions may provide reasonable approximations. Note that we should make use of conclusions from preliminary studies and try to propose simple but plausible dropout models. We should avoid building too complicated a dropout model, since the parameters may become non-identifiable (Fitzmaurice et al., 1996). Here we consider the following plausible dropout models:

    log[P(R_ij = 1 | φ, b_i) / (1 − P(R_ij = 1 | φ, b_i))] = φ_0 + φ_1 b_1i + φ_2 b_2i + φ_3 b_3i + φ_4 b_4i,   (6.8)
    log[P(R_ij = 1 | φ, b_i) / (1 − P(R_ij = 1 | φ, b_i))] = φ_0 + φ_1 b_1i + φ_2 b_2i,   (6.9)
    log[P(R_ij = 1 | φ, y_i) / (1 − P(R_ij = 1 | φ, y_i))] = φ_0 + φ_1 y_ij.   (6.10)

Since we focus on random-effects based informative missingness in this thesis, models (6.8) and (6.9) are of main interest. Model (6.10) is response-based informative missingness; Wu (2004) discusses and analyzes a similar dataset based on such an approach. Here, we consider this kind of dropout model only for comparison. The parameter estimates under the different dropout models are summarized in Table 6.4.

Table 6.4: Sensitivity analyses for dropout models. (Example 1)

    Parameter   Model (6.8)           Model (6.9)           Model (6.10)
                Est    SE     JSE     Est    SE     JSE     Est    SE     JSE
    β_1         12.27  0.02   0.04    12.27  0.02   0.04    11.50  0.02   0.03
    β_2         0.54   0.10   0.04    0.63   0.08   0.04    0.55   0.06   0.05
    β_3         35.84  0.39   0.71    36.57  0.50   0.70    35.90  0.29   0.57
    β_4         2.85   1.75   0.63    1.87   1.12   0.60    1.77   0.07   0.68
    β_5         5.59   0.76   0.57    7.25   0.80   0.57    6.63   0.17   0.55
    β_6         7.77   0.02   0.05    7.66   0.02   0.04    7.09   0.01   0.04
    β_7         0.30   0.03   0.03    0.32   0.03   0.03    0.28   0.01   0.04
    β_8         1.99   0.03   0.08    1.40   0.02   0.05    1.88   0.02   0.07
    β_9         0.09   0.01   0.05    0.07   0.02   0.05    0.09   0.01   0.06
    σ           0.61   0.02           0.62   0.02           0.80   0.01

The standard error estimates are almost consistent across the different dropout models, although for model (6.10) the SE estimation method described in Section 3.3.2 tends to yield smaller estimates. We find that the resulting estimates of the viral dynamic parameters β are all comparable.
This suggests that the estimates of the viral dynamic parameters may be robust against plausible dropout models, and thus the estimates in Table 6.2 may be reliable. Compared with models (6.8) and (6.9), model (6.10) gives smaller standard errors for the estimates, while the jackknife SEs for the three models are much closer for each β. This may indicate potential problems with the SE estimation method of Section 3.3.2. Alternatively, it may be because model (6.10) is more parsimonious than models (6.8) and (6.9) (i.e., it has fewer nuisance parameters). Another possible explanation is that model (6.10) models the missing probability at each scheduled time point, while the other two models model the missing probability for each subject; thus, model (6.10) captures more information.

6.2.5 Conclusion

Based on our analyses, we conclude that, for the HIV viral load data, complete-case analyses and analyses assuming ignorable missingness may over-estimate the initial viral decay rate. Assuming non-ignorable missingness incorporates the possible mechanisms that lead to patients' dropout, and may therefore give more reliable parameter estimates. Either response-based or random-effects based non-ignorable dropout models may be used to obtain valid inferences, and either may serve as a tool for sensitivity analysis of the other.

6.3 Example 2

6.3.1 Data Description

Our second example is from another HIV study. The data contain HIV viral load measurements from 51 patients. The plasma HIV-1 RNA (viral load) is repeatedly measured on days 1, 2, 3, 7, 10, 14, and 28, and weeks 8, 12, and 24 after initiation of an anti-HIV treatment. After the antiviral treatment, a patient's viral load will decay, and the decay rate may reflect the efficacy of the treatment. As in Example 1, throughout the time course, due to individual characteristics, the viral load may continue to decay, fluctuate, or even start to rise (rebound). We only consider the first three months of data, since data after three months are likely to be contaminated by long-term clinical factors. The number of measurements for each patient varies from 4 to 10. Fifteen patients have missing viral loads at scheduled time points due to dropout or other problems. However, unlike Example 1, this study does not contain baseline covariates. Viral loads of four randomly selected patients are plotted in Figure 6.3. We summarize the data in Table 6.5.

Table 6.5: Data summary of Example 2.

    Variable     Sample mean   Sample SD   % missing
    Viral load   4.27          1.12        6.5%

    Number of patients: N = 51. Total number of observations: 415.
    Visits per patient: n_i = 4 to 10 (mean(n_i) = 8.1, SD(n_i) = 1.6).
    Number of missing values: 27 (from 15 patients).

Figure 6.3: Viral loads of four randomly selected patients. (Example 2)

As in Example 1, visual inspection of the raw data seems to indicate that dropout patients have slower viral decay, compared to the remaining patients. Thus, the dropouts are likely to be informative or nonignorable.

6.3.2 Models

We consider the following HIV viral dynamic model, which does not contain covariates (Wu and Ding, 1999):

    y_ij = log10(P_1i e^{−λ_1i t_ij} + P_2i e^{−λ_2i t_ij}) + e_ij,   (6.11)

    log10(P_1i) = β_1 + b_1i,   λ_1i = β_2 + b_2i,
    log10(P_2i) = β_3 + b_3i,   λ_2i = β_4 + b_4i,   (6.12)

where y_ij is the log10 transformation of the viral load for patient i at the jth visit, i = 1, ..., N; j = 1, ..., n_i, with N = 51 and n_i varying from 4 to 10; λ_1i and λ_2i represent the two viral decay rates; P_1i and P_2i are baseline values; b_ki, k = 1, ..., 4, are random effects; and e_ij represents the within-individual error.

The responses contain 6.5% missing values. Thus, we also need to assume a model for the dropout mechanism in order to make valid likelihood inference.
Note that although dropout models are not verifiable based on the observed data, subject-area knowledge and sensitivity analyses based on plausible models may still lead to reasonable models. Since, in this thesis, we focus on random-effects based nonignorable missing data, we propose the following missing response model:

    f(r_i | y_i, b_i, z_i, v_i; φ) = ∏_{j=1}^{n_i} P(R_ij = 1 | φ, b_i)^{r_ij} (1 − P(R_ij = 1 | φ, b_i))^{1−r_ij},
    log[P(R_ij = 1 | φ, b_i) / (1 − P(R_ij = 1 | φ, b_i))] = φ_0 + φ_1 b_1i + φ_2 b_2i + φ_3 b_3i + φ_4 b_4i,   (6.13)

where r_i = (r_{i1}, ..., r_{i n_i})^T is a vector of missing data indicators for individual i, such that r_ij = 1 if y_ij is missing and 0 otherwise.

6.3.3 Analysis and Results

We consider estimating the population parameters β = (β_1, ..., β_4)^T using four methods: the complete-case method, the exact method assuming ignorable missingness, the exact method with nonignorable missingness, and the approximate method with nonignorable missingness. The results are shown in Table 6.6.

Table 6.6: Estimates of the dynamic model parameters in models (6.11) and (6.12). (Example 2)

    Parameter   Exact (Nonig.)        Exact (Ignor.)        Approx. (Nonig.)      Comp. Case
                Est    SE     JSE     Est    SE     JSE     Est    SE     JSE     Est    SE     JSE
    β_1         12.23  0.002  0.004   13.33  0.003  0.004   11.45  0.002  0.004   12.27  0.23   0.01
    β_2         60.13  0.15   0.27    66.07  1.12   0.27    55.78  0.08   0.35    58.53  2.27   0.59
    β_3         8.60   0.05   0.07    9.55   0.20   0.07    9.01   0.004  0.08    8.56   0.23   0.19
    β_4         3.96   0.13   0.19    4.16   0.11   0.14    2.88   0.076  0.22    3.80   0.30   0.39
    σ           0.50   0.08           0.52   0.08           0.32   0.02           0.25   0.17

We see that the results are somewhat different under the different methods. For the most important parameter β_2, the initial decay rate, the exact method with nonignorable missingness gives a smaller estimate than the exact method with ignorable missingness, the approximate method gives the smallest estimate, and the complete-case method gives a moderate estimate but with a large estimation variance. This suggests that studies assuming an ignorable missing data mechanism, or discarding dropout patients, may over-estimate the initial viral decay rate.
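For concreteness, the contribution of the missingness indicators to the likelihood under a random-effects logit model such as (6.13) can be evaluated as follows; all numerical values here are made-up, and the sketch assumes the same logit applies at every visit of a subject, as in the model.

```python
import numpy as np

def dropout_loglik(r, b, phi):
    """Log-likelihood of the missingness indicators r under a
    random-effects logit model like (6.13)."""
    eta = phi[0] + b @ phi[1:]            # one logit per subject
    p = 1.0 / (1.0 + np.exp(-eta))        # P(R_ij = 1 | phi, b_i)
    # within subject i the probability is the same at every visit j
    return np.sum(r * np.log(p)[:, None] + (1 - r) * np.log(1 - p)[:, None])

# toy check with made-up values: 3 subjects, 4 visits, 2 random effects
rng = np.random.default_rng(2)
b = rng.normal(size=(3, 2))
r = rng.integers(0, 2, size=(3, 4))
phi = np.array([-1.0, 0.5, -0.5])
print(dropout_loglik(r, b, phi))
```

In the EM algorithm this term is evaluated with the sampled random effects (or, in the approximate method, with their conditional estimates) plugged in.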
Note that the approximate method gives smaller estimates and standard errors. For some parameters, the standard error estimates from the approximate method seem too small, which may indicate that the standard error estimation of Section 4.2.2 is not reliable. For the approximate method, the jackknife standard errors may be more reliable.

The estimates of the parameters in the dropout model are summarized in Table 6.7.

Table 6.7: Estimates of the dropout model parameters. (Example 2)

    Parameter        φ_0      φ_1      φ_2      φ_3      φ_4
    Estimate         2.95     0.611    -2.046   -0.562   0.1034
    Standard Error   0.0346   0.0296   0.448    0.0465   0.125
    p-value          <0.001   <0.001   <0.001   <0.001   0.41

We find that, except for φ_4, all coefficients are statistically significant. This suggests that the missing mechanism may be non-ignorable, and that the missingness may be related to the underlying unobservable individual characteristics (the random effects).

6.3.4 Sensitivity Analysis

From the previous section, we conclude that the dropout mechanism may be nonignorable, because most of the p-values associated with the coefficients φ in the dropout model are statistically significant. We still need to check whether different dropout models affect the estimates of the parameters, because the dropout model is not verifiable based on the observed data. Thus, it is important to check the sensitivity of the parameter estimates to various plausible dropout models; subject-area knowledge may help us determine alternatives. We consider the following possible models:

    log[P(R_ij = 1 | φ, b_i) / (1 − P(R_ij = 1 | φ, b_i))] = φ_0 + φ_1 b_1i + φ_2 b_2i + φ_3 b_3i + φ_4 b_4i,   (6.14)
    log[P(R_ij = 1 | φ, b_i) / (1 − P(R_ij = 1 | φ, b_i))] = φ_0 + φ_1 b_1i + φ_2 b_2i + φ_3 b_3i,   (6.15)
    log[P(R_ij = 1 | φ, b_i) / (1 − P(R_ij = 1 | φ, b_i))] = φ_0 + φ_1 b_1i + φ_2 b_2i.   (6.16)

The parameter estimates obtained by the exact method under the different dropout models are summarized in Table 6.8.

Table 6.8: Sensitivity analyses for dropout models. (Example 2)

    Parameter   Model (6.14)          Model (6.15)          Model (6.16)
                Est    SE     JSE     Est    SE     JSE     Est    SE     JSE
    β_1         12.23  0.002  0.004   12.23  0.002  0.004   12.25  0.002  0.004
    β_2         60.13  0.15   0.27    60.13  0.14   0.27    60.09  0.14   0.25
    β_3         8.60   0.05   0.07    8.59   0.06   0.07    8.66   0.06   0.09
    β_4         3.96   0.13   0.19    3.99   0.13   0.20    4.02   0.11   0.25
    σ           0.50   0.08           0.49   0.08           0.44   0.08

We find that the three models result in almost the same estimates; therefore, we may conclude that the parameter estimates may be robust against different dropout models.

6.3.5 Conclusion

From the above analyses, we conclude that, for this dataset, the dropout may be nonignorable. Simply ignoring the dropouts may over-estimate the initial viral decay rate β_2. The approximate method is much faster than the exact method.

6.4 Computational Issues

Starting values. For the EM algorithms in our examples, the starting values for β were obtained from the complete-case method. For Example 1, the starting values for α in the covariate model were obtained from the linear regression models (6.4)-(6.6) using the completely observed cases. The starting values for φ were set to (φ_0, φ_1, φ_2, φ_3, φ_4) = (1, 0, 0, 0, 0).

Stopping rule. The stopping rule for the EM algorithms in our examples is that the relative change in the parameter estimates from successive iterations is smaller than a given tolerance level (e.g., 0.01). However, due to the Monte Carlo error induced by the Gibbs sampler, the algorithm may fail to converge, or may converge very slowly, under an extremely small tolerance level. The actual tolerance level used in our examples is 0.05: the EM algorithm is stopped when each new parameter estimate falls within 5% of the corresponding estimate from the last EM iteration, for two consecutive iterations.

Running time. For the data of Example 1, the algorithm for the exact method converged in about three hours on a Sun SPARC workstation (Ultra 60).
For the data of Example 2, the algorithm for the exact method converged in about one hour, while the algorithm for the approximate method converged in about 15 minutes. Thus, the approximate method is computationally much more efficient.

Sampling method. In both examples, we used the multivariate rejection sampling method. Other sampling methods may also be applied and may be even more efficient.

Chapter 7

Simulation Study

7.1 Introduction

In order to evaluate the performance of the two proposed methods, the exact method (EX) and the approximate method (AP), we conduct a simulation study. In our simulations, we compare EX and AP in terms of the biases and mean-squared errors of their estimates under various situations. We also include the complete-data (CD) method in our comparisons. Section 7.2 describes the data generation models, Section 7.3 compares the methods of estimation in several different situations, and we conclude with our results in Section 7.4.

7.2 Design of the Simulation Study

7.2.1 Models

We generate the response variable y_ij from the following NLME:

    y_ij = log10(P_1i e^{−λ_1i t_ij} + P_2i e^{−λ_2i t_ij}) + e_ij,   (7.1)

    log10(P_1i) = β_1 + b_1i,   λ_1i = β_2 + b_2i,
    log10(P_2i) = β_3 + b_3i,   λ_2i = β_4 + b_4i,   (7.2)

where β = (β_1, β_2, β_3, β_4)^T are the model parameters of interest, the b_i = (b_1i, b_2i, b_3i, b_4i)^T are assumed to be i.i.d. with a normal distribution N(0, D), and D is a 4 × 4 diagonal matrix. The true value of β is taken as (10, 40, 8, 4)^T. The number of individuals is N = 48. The choices of t_ij, n_i, and σ² will be reported with the results in later sections.

To evaluate the proposed methods, we generate missing values among the responses y_ij as follows. The model for the missing responses is

    log[P(R_ij = 1 | φ, b_i) / (1 − P(R_ij = 1 | φ, b_i))] = φ_0 + φ_1 b_1i + φ_2 b_2i + φ_3 b_3i + φ_4 b_4i,   (7.3)

where φ = (φ_0, φ_1, φ_2, φ_3, φ_4)^T and r_ij is the missing response indicator.
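The data-generating mechanism of (7.1)-(7.3) can be sketched directly. The settings below mirror those reported in Section 7.3 (β = (10, 40, 8, 4)^T, D = diag(4, 25, 4, 1), σ = 0.5, the relative scheduled time points, a φ targeting a roughly 10% missing rate, and the rule that the first two visits are always kept); they are reproduced here only to exercise the code.

```python
import numpy as np

rng = np.random.default_rng(5)

beta = np.array([10.0, 40.0, 8.0, 4.0])          # true (beta1, ..., beta4)
D = np.diag([4.0, 25.0, 4.0, 1.0])               # Var(b_i)
sigma = 0.5
t = np.array([0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0])
phi = np.array([-2.6, 0.1, -0.1, -0.1, 0.1])     # ~10% missing-rate setting

def simulate(N=48):
    b = rng.multivariate_normal(np.zeros(4), D, size=N)
    eff = beta + b                                # subject-level parameters
    y = np.log10(10 ** eff[:, [0]] * np.exp(-eff[:, [1]] * t)
                 + 10 ** eff[:, [2]] * np.exp(-eff[:, [3]] * t))
    y += rng.normal(0.0, sigma, y.shape)          # within-individual error
    p = 1 / (1 + np.exp(-(phi[0] + b @ phi[1:]))) # dropout prob., model (7.3)
    r = rng.random(y.shape) < p[:, None]          # missingness indicators
    r[:, :2] = False                              # first two visits always kept
    return y, r

y, r = simulate()
print(y.shape, r.mean().round(3))
```

Note that the dropout probability is constant across a subject's visits, which is why the first two visits are forced to be observed in the study design.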
Model (7.3) implies that the missingness of the responses depends on the random effects of each individual, and thus the responses are nonignorably missing. We generate the missing responses based on model (7.3): we choose appropriate values of φ to mimic a certain missing rate, and we generate binary r_ij based on the values of φ and b_i. If r_ij = 1, then y_ij is deleted; if r_ij = 0, y_ij is considered observed.

7.2.2 Comparison Criteria

We compare EX and AP in terms of biases and mean-squared errors (MSEs). Here, bias and MSE are assessed via the percent relative bias and the percent relative root mean-squared error, defined as follows. The bias for β_j, the jth component of β, is defined as

    bias_j = β̂_j − β_j,   (7.4)

where β̂_j is the estimate of β_j. The mean-squared error for β_j is defined as

    MSE_j = bias_j² + s_j²,   (7.5)

where s_j is the simulated standard error of β̂_j. Then, the percent relative bias of β̂_j is defined as

    bias_j / β_j × 100%,   (7.6)

and the percent relative root MSE is

    √MSE_j / |β_j| × 100%.   (7.7)

To show the difference between EX and AP, we also calculate the mean and standard error of the absolute differences between the estimates from EX and AP.

7.3 Simulation Results

7.3.1 Comparison of Methods with Varying Missing Rates

To check the impact of the missing rate on estimation by EX and AP, we estimate the parameters under two missing rates, 10% and 20%. Specifically, in our model (7.3), we set φ = (−2.6, 0.1, −0.1, −0.1, 0.1)^T to get roughly a 10% missing rate, and φ = (−1.4, 0.1, −0.1, −0.1, 0.1)^T to get an average 20% missing rate. Since the random-effects based probability of a missing observation is constant for a particular subject, there are situations where a subject has a high missing probability and the responses are missing for ALL visits. To overcome this difficulty, we always keep the responses for the first two visits.
That is, every subject has at least two observations. Specifically, we choose t_ij = (0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0), the relative scheduled time points, for each subject, and generate r_ij (j = 1, ..., n_i = 9). If r_ij = 1, we remove the jth value from the generated responses y_ij for the ith subject. We choose D = diag(4, 25, 4, 1) and σ = 0.5 in these simulations. To check the effect of each factor, we change one factor at a time and compare the estimates with the original ones; therefore, one half of each of the following tables is exactly the same.

Table 7.1 shows average simulation results based on 150 simulations. We see that the exact method performs better than the approximate method, in the sense that the exact method yields smaller relative MSEs and smaller biases. CD may yield very biased estimates, especially for the most important parameter β_2, the initial viral decay rate. All methods perform better when the missing rate is lower. The absolute differences between EX and AP are similar under the different missing rates.

Table 7.1: Simulation results for varying missing rates.

    Missing   True        %bias                   %√MSE                  |EX-AP|
    rate      value       EX      AP      CD      EX     AP     CD      Mean   SD
    10%       β_1 = 10    3.5     3.2     2.9     4.7    12.11  20.10   0.05   0.02
              β_2 = 40    -3.1    -5.5    -2.5    9.5    27.82  21.77   0.97   0.64
              β_3 = 8     0.22    -0.5    0.76    4.1    5.64   11.76   0.06   0.03
              β_4 = 4     0.33    -2.8    1.10    6.7    7.50   9.01    0.03   0.01
    20%       β_1 = 10    3.66    4.71    6.11    11.03  21.22  25.03   0.08   0.06
              β_2 = 40    -10.53  -10.29  12.76   21.83  30.33  40.93   1.33   0.84
              β_3 = 8     3.23    -3.01   -6.11   5.32   12.78  17.19   0.06   0.05
              β_4 = 4     -2.77   -3.75   5.75    5.66   22.50  24.71   0.07   0.03

7.3.2 Comparison of Methods with Different Random-Effects Covariances

To see how the variability of b_i affects the estimates from the three methods, we consider two variance-covariance matrices for b_i:
    Var(b_i) = diag(1, 9, 1, 1)   and   Var(b_i) = diag(4, 25, 4, 1).

Table 7.2 shows average simulation results for the methods EX, AP, and CD. We choose t_ij = (0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0), the relative scheduled time points, for each subject, and generate r_ij (j = 1, ..., n_i = 9). If r_ij = 1, we remove the jth value from the generated responses y_ij for the ith subject. We choose φ = (−2.0, 0.1, −0.1, −0.1, 0.1)^T and σ = 0.5 in these simulations; the response missing rate is roughly 10%.

Table 7.2: Simulation results for different covariance matrices of the random effects.

    Covariance          True        %bias                %√MSE                 |EX-AP|
    matrix              value       EX     AP     CD     EX    AP     CD      Mean   SD
    diag(1, 9, 1, 1)    β_1 = 10    2.7    2.4    2.1    3.9   5.6    9.1     0.02   0.01
                        β_2 = 40    2.3    5.0    -9.0   6.0   11.2   15.7    0.23   0.14
                        β_3 = 8     0.4    2.1    3.1    2.1   2.4    12.6    0.02   0.01
                        β_4 = 4     1.5    2.5    5.0    1.9   5.5    6.5     0.03   0.02
    diag(4, 25, 4, 1)   β_1 = 10    3.5    3.2    2.9    4.7   12.11  20.10   0.05   0.02
                        β_2 = 40    -3.1   -5.5   -2.5   9.5   27.82  21.77   0.97   0.64
                        β_3 = 8     0.22   -0.5   0.76   4.1   5.64   11.76   0.06   0.03
                        β_4 = 4     0.33   -2.8   1.10   6.7   7.50   9.01    0.03   0.01

    (The lower half of this table is identical to the upper half of Table 7.1.)

We may conclude that, for most of the parameters, the exact method performs better than the approximate method, in the sense that the exact method yields smaller relative MSEs and smaller biases. Both methods perform better when the variances of the random effects are smaller. For β_1, the estimate from the approximate method has a slightly smaller bias. The absolute differences between EX and AP have smaller mean and standard deviation when the variances of b_i are smaller.

7.3.3 Comparison of Methods with Varying Intra-individual Measurements

To examine how the number of intra-individual measurements affects our estimates, we consider the two methods of estimation under two maximum numbers of measurements, n_i = 9 and n_i = 15. For the case in which the maximum number of measurements is 9, we choose t_ij = (0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0) as the relative scheduled time points for each subject. For the case in which the maximum number of measurements is 15, we choose t_ij = (0.01, 0.02, 0.04, 0.07, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0) as the relative scheduled time points for each subject. In both cases,
For the case in which the maximum number of measurements is 9, we choose t_ij = (0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0) as the relative scheduled time points for each subject. For the case in which the maximum number of measurements is 15, we choose t_ij = (0.01, 0.02, 0.04, 0.07, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0) as the relative scheduled time points for each subject. In both cases, we always keep the response values for the first two visits of each patient, but the responses from the third visit onward may be missing. Therefore, the actual number of observations for each patient may vary from 2 to 9 or from 2 to 15, respectively. Additionally, we use a missing rate of roughly 10%, Var(b_i) = diag(4, 25, 4, 1), and σ = 0.5 in these simulations.

Table 7.3: Simulation results for varying intra-individual measurements.

  # of obs.     True               %bias                  %√MSE              |EX-AP|
  per patient   value      EX      AP     CD       EX     AP     CD        Mean   SD
  2 to 9        β1 = 10    3.5     3.2    2.9      4.7    12.11  20.10     0.05   0.02
                β2 = 40   -3.1    -5.5   -2.5      9.5    27.82  21.77     0.97   0.64
                β3 = 8     0.22   -0.5    0.76     4.1    5.64   11.76     0.06   0.03
                β4 = 4     0.33   -2.8    1.10     6.7    7.50   9.01      0.03   0.01
  2 to 15       β1 = 10    2.1     3.0    2.8      3.5    9.6    11.4      0.03   0.01
                β2 = 40   -1.9    -3.0    5.7      7.5    10.9   11.9      0.22   0.12
                β3 = 8    -0.10    0.1    1.3      2.7    4.9    9.5       0.03   0.02
                β4 = 4     0.30   -2.0    0.88     2.7    3.6    6.1       0.05   0.02

  Note: the upper half of this table is identical to the upper half of Table 7.1.

The results of 100 simulations are summarized in Table 7.3. We see that, when there are more intra-individual measurements, all methods perform better. Again, the exact method performs better than the approximate method, no matter what the number of intra-individual measurements is. CD performs worse than EX and AP. The absolute differences between EX and AP have smaller mean and standard deviation when there are more observations for each subject.
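The simulation design used throughout this section (scheduled relative time points, a dropout indicator r_ij that depends on the individual random effects, and %bias / %√MSE summaries over repeated datasets) can be sketched roughly as follows. The two-parameter response curve, the logistic dropout link, and all numeric values here are simplified placeholder assumptions for illustration, not the four-parameter viral dynamic model and dropout model actually simulated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder response curve; the thesis uses a four-parameter NLME viral
# dynamic model, so this two-parameter linear-in-time curve is only a sketch.
def mean_curve(t, b):
    p1, lam1 = 10 + b[0], 40 + b[1]   # individual parameter = fixed + random effect
    return p1 - lam1 * t

def simulate_subject(times, D, sigma, phi):
    """Generate one subject's responses and apply random-effects-based dropout."""
    b = rng.multivariate_normal(np.zeros(2), D)          # random effects b_i
    y = mean_curve(times, b) + rng.normal(0, sigma, len(times))
    # Assumed logistic dropout link: r_ij depends on the random effects only.
    p_drop = 1 / (1 + np.exp(-(phi[0] + phi[1] * b[1])))
    r = rng.random(len(times)) < p_drop
    r[:2] = False                                        # first two visits always kept
    return np.where(r, np.nan, y)                        # remove y_ij where r_ij = 1

def percent_bias_and_rmse(estimates, true_value):
    """%bias and %sqrt(MSE) relative to the true value, as in Tables 7.1-7.4."""
    est = np.asarray(estimates, dtype=float)
    bias = 100 * (est.mean() - true_value) / true_value
    rmse = 100 * np.sqrt(np.mean((est - true_value) ** 2)) / true_value
    return bias, rmse
```

In a full study, `simulate_subject` would be called for each subject in each of the 100 or 150 simulated datasets, each dataset fitted by EX, AP, and CD, and `percent_bias_and_rmse` applied to the resulting parameter estimates.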
7.3.4 Comparison of Methods with Different Variances

To investigate the impact of intra-individual variability on EX and AP, we estimate the parameters under two data-generation strategies, with σ = 0.5 and σ = 1, respectively.

Table 7.4: Simulation results for varying variances.

  σ        True               %bias                  %√MSE              |EX-AP|
           value      EX      AP     CD       EX     AP     CD        Mean   SD
  σ = 0.5  β1 = 10    3.5     3.2    2.9      4.7    12.11  20.10     0.05   0.02
           β2 = 40   -3.1    -5.5   -2.5      9.5    27.82  21.77     0.97   0.64
           β3 = 8     0.22   -0.5    0.76     4.1    5.64   11.76     0.06   0.03
           β4 = 4     0.33   -2.8    1.10     6.7    7.50   9.01      0.03   0.01
  σ = 1    β1 = 10    3.4     3.3    3.1      6.3    14.1   19.18     0.06   0.04
           β2 = 40   -2.7    -5.1    6.9      8.8    27.3   27.10     0.78   0.54
           β3 = 8     0.31   -0.6    1.2      6.35   9.8    9.28      0.05   0.08
           β4 = 4     0.23   -2.7    1.9      7.15   9.7    17.3      0.03   0.03

  Note: the upper half of this table is identical to the upper half of Table 7.1.

Table 7.4 shows the average simulation results based on 150 simulations. In these simulations, we choose t_ij = (0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0) as the relative scheduled time points for each subject, and generate y_ij (j = 1, ..., n_i = 9). If r_ij = 1, we manually remove the jth value from the generated responses y_ij (j = 1, ..., n_i = 9) for the ith subject. We choose Var(b_i) = diag(4, 25, 4, 1), and the missing rate is roughly 10%. We find that EX performs better than AP and yields less biased estimates. The complete-case method tends to overestimate the initial decay rate and yields large estimation variances. The absolute differences between EX and AP are relatively similar under the two variances.

7.4 Conclusions

Based on the simulation results in the preceding sections, we may draw the following conclusions.

• EX and AP give quite similar results.

• For most cases, EX performs better than AP, in the sense that EX yields smaller biases and smaller relative MSEs. However, in some cases, the estimates from AP may have smaller bias. CD often performs the worst, in the sense of large MSEs.

• Both methods perform better when the missing rates are lower.
• Both methods perform better when the variances of the random effects are smaller.

• Both methods perform better when the number of observations for each subject is larger.

• The complete-case method tends to overestimate the initial decay rate, and the corresponding estimates are positively biased, especially when the missing rate is large.

Note that, in our simulation studies, AP is computationally more efficient than EX. The running time of AP is about 1/4 of that of EX, and AP does not have convergence problems. In practice, EX is preferable when the computational load is not too heavy; when EX is too slow, AP is preferable.

Chapter 8

Conclusion and Discussion

In this thesis, we have proposed two methods to estimate the parameters of NLMEs with random-effects-based informative dropout and missing covariates. The proposed methods include an exact method and an approximate method, both implemented by a Monte Carlo EM algorithm. For the exact method, sampling the random effects may cause computational difficulties such as slow convergence or non-convergence, especially when the dimension of the random effects is large. To overcome this difficulty, we proposed an approximate method which integrates out the random effects in the E-step and thus avoids sampling the random effects in the Monte Carlo EM. Pinheiro and Wu (2001) show that the convergence rate of the EM algorithm can be improved greatly by integrating out the random effects. We also conducted a simulation study to compare the performance of the exact method and the approximate method. In our simulations, the exact method gives somewhat more reliable results than the approximate method, in the sense that it provides smaller mean squared errors for the parameter estimates.
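To make the Monte Carlo EM structure concrete, a generic iteration can be sketched for a deliberately simple toy model. The normal-normal model below, its closed-form conditional distribution, and all parameter names are illustrative assumptions only; the thesis's actual NLME, dropout, and covariate models have no closed-form conditionals, which is why Gibbs sampling or multivariate rejection sampling is needed in the E-step there.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: y_i = theta + b_i + e_i, with unobserved b_i ~ N(0, tau2)
# and e_i ~ N(0, sigma2). The E-step draws Monte Carlo samples of the
# unobserved b_i given the data; the M-step maximizes the Monte Carlo
# average of the complete-data log-likelihood (closed form here).
def mcem(y, tau2=1.0, sigma2=1.0, theta=0.0, n_mc=200, n_iter=50):
    y = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        # E-step: sample b_i from its conditional given (y_i, theta).
        # Here it is exactly normal; in general one would use Gibbs or
        # rejection sampling, as in the thesis's exact method.
        post_var = 1 / (1 / tau2 + 1 / sigma2)
        post_mean = post_var * (y - theta) / sigma2
        b = rng.normal(post_mean, np.sqrt(post_var), size=(n_mc, len(y)))
        # M-step: maximizing the averaged complete-data log-likelihood
        # over theta reduces to a sample mean in this toy model.
        theta = np.mean(y - b.mean(axis=0))
    return theta
```

The marginal MLE of theta here is the sample mean of y (since y_i ~ N(theta, tau2 + sigma2)), and the iteration converges to it up to Monte Carlo noise; the approximate method of Chapter 4 replaces the sampling of b with an analytic integration after linearization.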
Our simulations also suggest that the proportion of missing values, the variances of the random effects, the number of intra-individual measurements, and the intra-individual variability may affect the performance of the exact method and the approximate method.

The exact method was applied to an HIV dataset with missing covariates and informative dropouts. We find that patients who have higher baseline CD4 cell counts and slower initial viral decay rates may be more likely to drop out. Thus, ignoring dropouts may lead to an over-optimistic assessment of the antiviral treatment. We also applied the exact and the approximate methods to the second HIV dataset and obtained conclusions similar to those for the first dataset. We find that the approximate method may under-estimate the initial viral decay rate, and thus may be somewhat conservative in assessing the treatment effect. This may be caused by the inadequacy of our approximate model. We also notice that the estimation of standard errors in the approximate method is not reliable, and jackknife standard errors are recommended. On the other hand, in practice, the approximate method is computationally much more efficient than the exact method.

Since the parametric models for the missing responses are not testable based on the observed data, it is important to conduct sensitivity analyses. Based on our sensitivity analyses, we find that the estimates are robust to different dropout models.

Finally, we give an outline of possible future work.

(1) For simplicity, in our examples and simulations, we only include random effects in the dropout models. It is conceivable that the dropout probability may also depend on covariates and the responses. In the future, we may study dropout models which simultaneously consider random effects, covariates, and responses.

(2) In our study, we only consider nonlinear mixed effects models for normal data.
Generally, our proposed methods may be extended to other models, such as generalized linear mixed effects models (GLMMs) and generalized nonlinear mixed effects models (GNLMMs).

(3) We have only considered baseline covariates. Our proposed methods may be extended to time-dependent covariates.

(4) Multivariate rejection sampling methods were used in our analyses and simulations. In general, other sampling methods, such as adaptive rejection sampling and importance sampling, may also be used and may be even more efficient.

(5) In our approximate method, we used a first-order Taylor expansion to linearize the model. In general, this is not necessarily a good approximation. In the future, we may investigate better approximations, such as higher-order Taylor expansions, the Laplace approximation, etc.

(6) Investigation of the accuracy of standard error estimation. In our work, the estimation of SEs may be unreliable, especially in the approximate method.

(7) In our work, we assume e_i | β_i ~ N(0, σ²I). In the future, we may consider more complicated structures for the error terms, for example, a random effects model for longitudinal data with the errors following an AR(1) time series.

References

Booth, J. G. and Hobert, J. P. (1999). Maximizing generalized linear mixed models likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society, Ser. B 61, 265-285.

Davidian, M. and Giltinan, D. M. (1995). Nonlinear Models for Repeated Measurement Data. Chapman & Hall.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Ser. B 39, 1-38.

Diggle, P., Heagerty, P., Liang, K.-Y., and Zeger, S. (2002). Analysis of Longitudinal Data, 2nd Edition. Oxford.

Diggle, P. and Kenward, M. G. (1994). Informative drop-out in longitudinal data analysis (with discussion).
Applied Statistics 43, 49-93.

Ding, A. and Wu, H. (2001). Assessing antiviral potency of anti-HIV therapies in vivo by comparing viral decay rates in viral dynamic models. Biostatistics 2(1), 13-29.

Fitzmaurice, G. M., Laird, N. M., and Zahner, G. E. P. (1996). Multivariate logistic models for incomplete binary responses. Journal of the American Statistical Association 91, 99-108.

Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85, 398-409.

Gilks, W. R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics 41, 337-348.

Ho, D. D., Neumann, A. U., Perelson, A. S., Chen, W., Leonard, J. M., and Markowitz, M. (1995). Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection. Nature 373, 123-126.

Ibrahim, J. G. (1990). Incomplete data in generalized linear models. Journal of the American Statistical Association 85, 765-769.

Ibrahim, J. G., Lipsitz, S. R., and Chen, M. H. (1999). Missing covariates in generalized linear models when the missing data mechanism is nonignorable. Journal of the Royal Statistical Society, Ser. B 61, 173-190.

Ibrahim, J. G., Lipsitz, S. R., and Chen, M. H. (2001). Missing responses in generalized linear models when the missing data mechanism is nonignorable. Biometrika 88, 551-564.

Lindstrom, M. J. and Bates, D. M. (1990). Nonlinear mixed effects models for repeated measures data. Biometrics 46, 673-687.

Little, R. J. A. and Rubin, D. B. (1987). Statistical Analysis with Missing Data. New York: John Wiley.

Little, R. J. A. (1992). Regression with missing X's: a review. Journal of the American Statistical Association 87, 1227-1237.

Little, R. J. A. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association 90, 1112-1121.

Little, R. J. A.
and Schluchter, M. D. (1985). Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika 72, 497-512.

Liu, C., Rubin, D. B., and Wu, Y. N. (1998). Parameter expansion to accelerate EM: the PX-EM algorithm. Biometrika 85, 755-770.

Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Ser. B 44, 226-233.

McLachlan, G. J. and Krishnan, T. (1997). The EM Algorithm and Extensions. John Wiley & Sons.

Meng, X. L. and van Dyk, D. A. (1997). The EM algorithm - an old folk-song sung to a fast new tune (with discussion). Journal of the Royal Statistical Society, Ser. B 59, 511-567.

Perelson, A. S., Neumann, A. U., Markowitz, M., Leonard, J. M., and Ho, D. D. (1996). HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science 271, 1582-1586.

Pinheiro, J. C. and Bates, D. M. (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics 4, 12-35.

Pinheiro, J. C. and Wu, Y. (2001). Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate t-distribution. Journal of Computational and Graphical Statistics 10, 249-276.

Robert, C. P. and Casella, G. (1999). Monte Carlo Statistical Methods. Springer-Verlag, New York.

Roy, J. and Lin, X. (2002). Analysis of multivariate longitudinal outcomes with nonignorable dropouts and missing covariates: changes in methadone treatment practices. Journal of the American Statistical Association 97, 40-52.

Ten Have, T. R., Pulkstenis, E., Kunselman, A., and Landis, J. R. (1998). Mixed effects logistic regression models for longitudinal binary response data with informative dropout. Biometrics 54, 367-383.

Vonesh, E. F. and Chinchilli, V. M. (1996).
Linear and Nonlinear Models for the Analysis of Repeated Measurements. Marcel Dekker, New York.

Vonesh, E. F., Wang, H., Nie, L., and Majumdar, D. (2002). Conditional second-order generalized estimating equations for generalized linear and nonlinear mixed-effects models. Journal of the American Statistical Association 97, 271-283.

Wei, G. C. and Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association 85, 699-704.

Wolfinger, R. (1993). Laplace's approximation for nonlinear mixed models. Biometrika 80, 791-795.

Wu, M. C. and Carroll, R. J. (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics 44, 175-188.

Wu, H. and Ding, A. (1999). Population HIV-1 dynamics in vivo: applicable models and inferential tools for virological data from AIDS clinical trials. Biometrics 55, 410-418.

Wu, H. and Wu, L. (2001). A multiple imputation method for missing covariates in nonlinear mixed-effect models, with application to HIV dynamics. Statistics in Medicine 20, 1755-1769.

Wu, K. (2003). Simultaneous inference for generalized linear mixed models with informative dropout and missing covariates. Master's thesis, the University of British Columbia.

Wu, L. (2002). A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies. Journal of the American Statistical Association 97, 955-964.

Wu, L. (2004a). Exact and approximate inferences for nonlinear mixed-effects models with missing covariates. Journal of the American Statistical Association 99, 700-709.

Wu, L. (2004b).
Exact and approximate simultaneous inferences for nonlinear mixed-effects models with dropouts and missing covariates. Technical report, Department of Statistics, the University of British Columbia.

Wu, L. and Wu, H. (2002). Missing time-dependent covariates in HIV viral dynamic models. Journal of the Royal Statistical Society, Ser. C 51, 297-318.
Item Metadata
Title | Nonlinear mixed effects models with dropout and missing covariates when the dropout depends on the random effects |
Creator | Song, Shijun |
Date Issued | 2005 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2009-12-15 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0092156 |
URI | http://hdl.handle.net/2429/16692 |
Degree | Master of Science - MSc |
Program | Statistics |
Affiliation | Science, Faculty of; Statistics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2005-11 |
Campus | UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |