- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Nonlinear mixed effects models with dropout and missing...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Nonlinear mixed effects models with dropout and missing covariates when the dropout depends on the random.. 2005
pdf
Page Metadata
Item Metadata
Title | Nonlinear mixed effects models with dropout and missing covariates when the dropout depends on the random effects |
Creator |
Song, Shijun |
Date Created | 2009-12-15T21:31:29Z |
Date Issued | 2009-12-15T21:31:29Z |
Date | 2005 |
Description | Nonlinear mixed effects models (NLMEs) are very popular in many longitudinal studies such as HIV viral dynamic studies, pharmacokinetics analyses, and studies of growth and decay. In these studies, however, missing data problems often arise, which make some statistical analyses complicated. In this thesis, we proposed an exact method and an approximate method for NLMEs with random-effects based informative dropouts and missing covariates, and propose methods for simultaneous inference. Monte Carlo E M algorithms are used in both methods. The approximate method, which is based on a Taylor series expansion, avoids sampling the random effects in the E-step and thus reduces the computation burden substantially. To illustrate the proposed methods, we analyze two real datasets. The exact method is applied to a dataset with covariates and a dataset without covariates. The approximate method is applied to the dataset without covariates. The result shows that, for both datasets, dropouts may be correlated with individual random effects. Ignoring the missingness or assuming ignorable missingness may lead to unreliable inferences. A simulation study is performed to evaluate the two proposed methods under various situations. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | Eng |
Collection |
Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project [http://www.library.ubc.ca/archives/retro_theses/] |
Date Available | 2009-12-15T21:31:29Z |
DOI | 10.14288/1.0092156 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of |
Degree Grantor | University of British Columbia |
Graduation Date | 2005-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
URI | http://hdl.handle.net/2429/16692 |
Aggregated Source Repository | DSpace |
Digital Resource Original Record | https://open.library.ubc.ca/collections/831/items/1.0092156/source |
Download
- Media
- ubc_2005-0647.pdf [ 3.03MB ]
- Metadata
- JSON: 1.0092156.json
- JSON-LD: 1.0092156+ld.json
- RDF/XML (Pretty): 1.0092156.xml
- RDF/JSON: 1.0092156+rdf.json
- Turtle: 1.0092156+rdf-turtle.txt
- N-Triples: 1.0092156+rdf-ntriples.txt
- Citation
- 1.0092156.ris
Full Text
Nonlinear Mixed Effects Models with Dropout and Missing Covariates When the Dropout Depends on the Random Effects by Shijun Song B.Sc, Peking University, 2003 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE STUDIES (Statistics) The University of British Columbia August 2005 © Shijun Song, 2005 Abstract Nonlinear mixed effects models (NLMEs) are very popular in many longitudi- nal studies such as HIV viral dynamic studies, pharmacokinetics analyses, and studies of growth and decay. In these studies, however, missing data problems often arise, which make some statistical analyses complicated. In this thesis, we proposed an exact method and an approximate method for N L M E s with random-effects based in- formative dropouts and missing covariates, and propose methods for simultaneous inference. Monte Carlo E M algorithms are used in both methods. The approximate method, which is based on a Taylor series expansion, avoids sampling the random effects in the E-step and thus reduces the computation burden substantially. To il- lustrate the proposed methods, we analyze two real datasets. The exact method is applied to a dataset with covariates and a dataset without covariates. The approxi- mate method is applied to the dataset without covariates. The result shows that, for both datasets, dropouts may be correlated with individual random effects. Ignoring the missingness or assuming ignorable missingness may lead to unreliable inferences. A simulation study is performed to evaluate the two proposed methods under various situations. ii Contents A b s t r a c t i i C o n t e n t s ii i L i s t of Tab les v i i L i s t of F i g u r e s v i i i A c k n o w l e d g e m e n t s ix D e d i c a t i o n x 1 I n t r o d u c t i o n 1 1.1 Longitudinal Data Analysis 1 1.1.1 Longitudinal Studies 1 1.1.2 Approaches to Longitudinal Data Analysis 3 1.2 Missing Data Problems 5 1.2.1 Missing Covariates and Responses 5 1.2.2 Classification of Missing Value Mechanisms 5 1.2.3 Literature on Missing Data Problems 7 1.3 Motivating Examples 8 iii 1.4 Objectives and Outline 11 2 N o n l i n e a r M i x e d Effects M o d e l s 12 2.1 Introduction 12 2.2 Nonlinear Mixed Effects Models 12 2.3 Literature Review on N L M E Models with Informative Missing Data . 15 3 A n E x a c t M e t h o d for N L M E M o d e l s w i t h Informat ive D r o p o u t a n d M i s s i n g Covar ia tes 17 3.1 Introduction 17 3.2 The Models 18 3.3 A Monte Carlo E M Method 20 3.3.1 E-step 22 3.3.2 M-step 25 3.4 Sampling Methods 27 3.4.1 Gibbs Sampler 27 3.4.2 Multivariate Rejection Algorithm 28 3.4.3 Importance Sampling 29 3.5 Convergence 30 4 A n A p p r o x i m a t e M e t h o d for N L M E M o d e l s w i t h Informat ive D r o p o u t a n d M i s s i n g Covar ia t e s 33 4.1 Introduction 33 4.2 The Approximate Method 35 4.2.1 E-step 36 4.2.2 M-step 45 4.3 Monte Carlo Sampling 45 iv 5 Covar ia te s M o d e l s a n d D r o p o u t M o d e l s 47 5.1 Introduction 47 5.2 Covariate Models 48 5.3 Dropout Models 49 5.4 Sensitivity Analyses 50 6 D a t a A n a l y s i s 52 6.1 Introduction 52 6.2 Example 1 53 6.2.1 Data Description 53 6.2.2 Models 55 6.2.3 Analysis and Results 57 6.2.4 Sensitivity Analysis 59 6.2.5 Conclusion . 61 6.3 Example 2 62 6.3.1 Data Description 62 6.3.2 Models 64 6.3.3 Analysis and Results 65 6.3.4 Sensitivity Analysis 66 6.3.5 Conclusion 67 6.4 Computation Issues 67 7 S i m u l a t i o n S t u d y 69 7.1 Introduction 69 7.2 Design of the Simulation Study 70 7.2.1 Models 70 v 7.2.2 Comparison Criteria 71 7.3 Simulation Results 71 7.3.1 Comparison of Methods with Varying Missing Rates 71 7.3.2 Comparison of Methods with Different Random Effects Covari- ances 73 7.3.3 Comparison of Methods with Varying Intra-individual Measure- ments 74 7.3.4 Comparison of Methods with Different Variances 75 7.4 Conclusions 76 8 C o n c l u s i o n a n d D i scuss ion 78 References 80 ) vi List of Tables 6.1 Data summary of Example 1 53 6.2 Estimations for response model parameters. (Example 1) 58 6.3 Estimations for dropout model parameters. (Example 1) 59 6.4 Sensitivity analyses for dropout models. (Example 1) 60 6.5 Data summary of Example 2 62 6.6 Estimates for dynamic model parameters in Model (6.11) and (6.12). (Example 2) 65 6.7 Estimations for dropout model parameters. (Example 2) 66 6.8 Sensitivity analyses for dropout models. (Example 2) 67 7.1 Simulation results for varying missing rates 73 7.2 Simulation results for different covariance matrices for random effects. 74 7.3 Simulation results for varying intra-individual measurements 75 7.4 Simulation results for varying variances 76 vii List of Figures 1.1 Hypothetical data on the relationship between height and age 2 1.2 Vira l loads of four randomly selected patients 10 6.1 Vira l loads of four randomly selected patients. (Example 1) 54 6.2 Q - Q plots for covariates (Example 1) 56 6.3 Vira l loads of four randomly selected patients. (Example 2) 63 viii Acknowledgements First and foremost, I would like to thank my supervisor, Dr. Lang Wu, for his excellent guidance and immense help during my study at the University of British Columbia. Without his support, expertise and patience, this thesis would not have been completed. Also, I would like to thank my co-supervisor and second reader, Dr. Harry Joe, for his invaluable comments and suggestions on this thesis. I would also like to thank Ms. Kunling Wu, a previous student of Dr. Lang Wu. Her master thesis and personal advice benefit me a lot in completion of this thesis. Furthermore, I would like to thank Dr. John Petkau for his invaluable advice on my consulting projects, which benefit me very much in the past and future. I thank all the faculty and staff in Department of Statistics at the University of British Columbia for providing such a nice academic environment. I should also thank all the graduate students in the Department of Statistics for making my study so enjoyable. Most importantly I would like to thank my parents for loving me and believing in me. Their love, constant support and encouragement push me to be the best at everything I do. SHIJUN SONG The University of British Columbia August 2005 ix To my parents. x Chapter 1 Introduction 1.1 Longitudinal Data Analysis 1.1.1 Longitudinal Studies The key characteristic of a longitudinal study is that individuals are measured re- peatedly over time. Longitudinal studies differ from cross-sectional studies, in which a single outcome is measured for each individual. In many studies, especially in clin- ical trials, longitudinal data are very common. Even when it is possible to address the same scientific questions in a longitudinal or cross-sectional study, there may be many advantages in addressing them in a longitudinal study. A n example, which can illustrate this idea, is Figure 1.1. In Figure 1.1(a), height is plotted against age for a hypothetical cross-sectional study of boys. Height appears to be shorter among older boys. In Figure 1.1(b), we connect the data points from each individual. Now, it is clear that everyone's height increases with age. This example shows that longitudinal studies can distinguish changes over time within individuals from differences among people in their baseline levels. Cross-sectional studies cannot. Longitudinal data can 1 a b 10 12 14 16 18 10 12 14 16 18 age age Figure 1.1: Hypothetical data on the relationship between height and age. be collected either prospectively, following subjects forward in time, or retrospectively, by extracting multiple measurements on each person from historical records. Lon- gitudinal data require special statistical methods because the set of observations on one subject tends to be inter-correlated. This correlation must be taken into account to draw valid inferences. Correlation is also taken into account when analyzing a single long time series of measurements. In most time series studies, there is only one series available and people usually try to find clues and draw conclusions from that series itself. Anal- ysis of longitudinal data tends to be simpler because subjects are usually assumed independent. Valid inferences can be made by borrowing information across people. That is, the consistency of a pattern across subjects is the basis for substantive conclu- sions. For this reason, inferences from longitudinal studies can be made more robust to model assumptions than those from time series data, particularly to assumptions about the nature of the correlation. 2 1.1.2 Approaches to Longitudinal Data Analysis Let yij represent a response variable and X y represent a p x l vector of p explanatory variables observed at time point Uj, for measurement j on subject i, j = 1,..., n,, i — 1,..., N. The mean and variance of are represented by E (y^) = ^ and Var (y^) — u^. The set of repeated outcomes for subject i are collected into an x 1 vector, yt = (yu,... ,yini)T, with mean E (y*) = /x; and x rij covariance matrix Var(yi) = V, where the (j,k) element of Vi is the covariance between y^ and y^, denoted by Cov (y^, y,fc) = t ^ . The covariate matrix for the ith subject is denoted as Xi = {xn, • • • , X m J T , an n; x p matrix. We use B4 for the n* x rij correlation matrix of y*. The responses for all subjects are denoted as y = ( y f , . . . , y ^ ) r , which is an m x 1-vector with m — YlZ=i ni- The covariates for all units are referred to as X — (Xf,..., Xj^)T, which is an m x p matrix. There are three approaches to longitudinal data analysis. The first approach, which is often called marginal models, is to model univariate responses ignoring de- pendence. Marginal methods are mainly used for regression with dependent data with the main interest is inference for the regression parameters. For example, in a clinical trial the difference between control and treatment is most important, not the difference for any one individual. A second approach, the random effects model approach, assumes that correla- tion arises among repeated responses because the regression coefficients vary across individuals. Here, we model given the individual-specific coefficients, by h(E(yij\0i))=xfJ(3i. (1.1) Here, h(-) is a link function. For normal responses, it is the expectation and for binary responses, it may be the log odds ratio. Usually, there are too little data 3 on a single person to estimate /3j from (y^Xj) alone. We further assume that the Pi's are independent realizations from some distribution with mean (3. We can write Pi = P + hi, where P is fixed and bj is a vector of zero-mean random variables. Then the basic assumption can be restated in terms of the latent variables bj. That is, there are unobserved factors represented by bj that are common to all responses for a given individual but which vary across individuals. Random effects models are particularly useful when inferences are to be made about individual trajectories, such as in AIDS studies. They focus on both population parameters P and individuals characteristics bj's. The third approach is called a transition model approach. This focuses on the conditional distribution of y^ given past outcomes, r/y_i, • • • ,yu- Here, the data- analyst specifies a regression model for h(E (yij\yij-i, • • •, yn, x^)), as an explicit func- tion of Xy and of the past responses. In each of the three approaches, we consider both the dependence of the re- sponses on explanatory variables and the correlation among the responses. W i t h cross-sectional data, only the dependence of the responses on the explanatory vari- ables needs to be specified; there is no correlation of responses. In longitudinal studies, in which correlation usually exists among responses, there are at least two consequences of ignoring it: (1) incorrect inferences about regression coefficients P, particularly, confidence intervals are too short based on assumption of independence, when in fact there is positive dependence; (2) the estimating method of P may be inefficient, that is, less precise than possible; 4 1.2 Missing Data Problems 1.2.1 Missing Covariates and Responses In many applications, especially in longitudinal studies, missing data are a serious problem. Ignoring missing data or using over-simplified methods to handle missing data often leads to invalid inferences. Thus, it is very important to find appropriate approaches to deal with missing data. Two kinds of missing data in longitudinal studies are common: (i) missing covariates; and (ii) missing responses due to dropout or missing visits. For example, individuals may not come to study center for mea- surements at scheduled time points for various reasons, or they may even dropout permanently because of drug intolerance or death. Missing data make statistical analysis in longitudinal studies much more complicated, because standard methods, which are usually designed for complete-data, are not directly applicable. Commonly-used naive methods for missing data include the complete-data method, which only uses the complete observations and deletes all incomplete obser- vations, the mean imputation method, which replaces the missing values by the mean values of observed data, and the last-value-carried-forward method, which imputes a missing value by the immediate previous observed data. 1.2.2 Classification of Missing Value Mechanisms At the presence of missing data, the missing data mechanism must be taken into account in order to obtain valid statistical inferences. Little and Rubin (1987) and Little (1995) give a general treatment of statistical analysis with missing values. Let y = ( with y(°) denoting the measurements actually obtained and y'm^ de- 5 noting the measurements which would have been available had they not been missing. Let r denote a set of indicator random variables, denoting which elements of y fall into y(°) and which into y( m ) . Now, a probability model for the missing value mech- anism defines the probability distribution of r. Little and Rubin (1987) classify the missing value mechanism as • Missing data are missing completely at random ( M C A R ) if the probability of missingness is independent of both observed and unobserved data. When miss- ing data are caused by features of the study design, rather than the behavior of the study subjects, the M C A R mechanism may be quite plausible. For ex- ample, some values are missing because of reasons irrelevant to the treatment such as the medical equipment is broken down on a certain day. So missingness is M C A R if r is independent of both y ( o ) and y ( m ) • Missing data are missing at random ( M A R ) if the probability of missingness depends only on observed data, but not on unobserved data. For example, a patient may fail to visit the clinic because he/she is too old. In mathematical notations, missingness is M A R if r is independent of y( m ) . • Missing data are nonignorable or informative (NIM) if the probability of miss- ingness depends on unobserved data. To be specific, N I M has two cases in the context of random effects models: (i) the missingness depends on unobserved responses. For example, a patient fails to visit the clinic because he/she is too sick. We call the missingness outcome-based informative (Little, 1995) if r is dependent on y ^ . (ii) The probability of missingness depends on unknown ran- dom effects (i.e. individual characteristics such as individual decay rates) which may substantially affect the responses. We call missingness random-effect-based 6 informative (Little, 1995) if r is dependent on random effect b*. It turns out that, for likelihood-based inference, the crucial distinction is be- tween random and informative missing values. Both M C A R and M A R missing mech- anisms are sometimes referred to without distinction as ignorable. Little and Rubin (1987) show that, when missing data are non-ignorable, likelihood inference must incorporate the missing data mechanism. 1.2.3 Literature on Missing Data Problems Little (1992) reviewed methods of estimation in regression models with missing co- variates. Six methods dealing with missing covariates are compared: complete-case methods, available-case methods, least squares on imputed data, maximum likelihood methods, Bayesian methods and multiple imputation. He suggested that the maxi- mum likelihood method, Bayesian methods, and multiple imputation method perform well, and the maximum likelihood method is preferred in a large samples and Bayesian methods or multiple imputation method are preferred in a small samples. Ibrahim (1990) considered missing covariates ( M A R ) in generalized linear models (GLMs) with discrete covariates, and applied the E M algorithm to obtain M L E s under the assumption that the missing covariates are from a discrete distribution. Ibrahim, Lipsitz, and Chen (1999) proposed a Monte Carlo E M algorithm for G L M s with nonignorable missing covariates. W u and Carroll (1988) consider linear mixed effects models (LMEs) with infor- mative dropout under the assumption that the informative dropout could be modeled by a probit model which includes the random effects as its covariates. Diggle and Ken- ward (1994) consider general approaches to informative dropouts in multivariate data and longitudinal data. They show that considering informative dropout mechanisms in the statistical inference reduces the bias caused by considering the informative dropout as only M A R . Ten Have et al. (1998) discuss mixed effects logistics regres- sion models for longitudinal binary responses with informative dropout. Roy and L i n (2002) consider multivariate longitudinal data with nonignorable dropouts and missing covariates. Little (1995) gives an excellent review on modeling the dropout mechanism in repeated-measures studies. Dropout models were classified into selec- tion models and pattern-mixture models. The main difference between the two type of dropout models is that the form of missing data mechanism needs to be spec- ified in the selection models but not in the pattern-mixture models. He classified N I M into nonignorable outcome-based missing data where the dropout depends on the missing values, and random-effect-based missing data where the dropout depends on random effects. He also suggested to examine the sensitivity of the results to the choice of missing data mechanisms when we almost know nothing about the missing data mechanism. Ibrahim, Chen and Lipsitz (2001) develop a Monte Carlo E M algo- rithm to obtain M L E s in G L M M s with informative dropouts. They propose that the missing data mechanism may be modelled by a logistic regression and a sequence of one-dimensional conditional distributions which may reduce the number of nuisance parameters. 1.3 Motivating Examples Our research is motivated by studies of H I V viral dynamics, which have received great attention in AIDS studies in recent years (Ho et al. 1995, Perelson et al. 1996; W u and Ding 1999). These viral dynamic models provide good understanding of the pathogenesis of HIV infection and evaluation of antiretroviral therapies, and the 8 dynamic parameters may reflect the efficacy of the antiviral treatments (Ding and W u , 2001). A common problem in these studies is that some subjects may drop out of the study or miss visits due to drug intolerance and other problems (although dropout patients may return to study later), and covariates may contain missing data as well. It is important to evaluate how the dropout patients affect estimates of the viral decay rates since the decay rates may reflect the efficacy of the antiviral treatments. The dataset which motivate our research consists of 48 H I V infected patients who were treated by a potent antiviral regimen. The viral load is repeatedly measured after initiation of the treatment. After the antiviral treatment, the patient's viral loads will often decay, and the decay rate may reflect the efficacy of the treatment. We only consider the viral load data before viral rebound and the first three months data since data after three months are likely to be contaminated by long-term clinical factors. The number of measurements for each patient varies from 2 to 7. Fourteen patients have missing viral loads at scheduled time points due to dropout or other problems. The baseline covariates C D 4 cell counts, total complement levels (CH50), and tumor necrosis factor ( T N F ) contain 3.7%, 12.3% and 16.4% missing data respectively. Four patients are randomly selected and their viral loads are plotted in Figure 1.2. Visual inspection of the raw data seems to indicate that dropout patients ap- pear to have slower viral decay rates, compared to the remaining patients. Thus, the dropouts are likely to be informative or nonignorable. This dataset was ana- lyzed previously, but dropout patients were discarded and the missing viral loads were assume to be missing completely at random (Wu and Ding 1999; W u and W u 2001). W u (2004) re-analyze the dataset, proposing a missing mechanism based on the unobserved responses (viral loads). In this thesis, our objectives are to model the viral load, incorporating non-ignorable missing mechanism, based on unknown 9 "i 1 1 1 r~ 0 10 20 30 40 days after treatment Figure 1.2: Viral loads of four randomly selected patients. 10 random effects, and check if the estimates of decay rates are different. 1.4 Objectives and Outline In this thesis, we develop an exact inference method, implemented by a Monte Carlo E M algorithm, to make simultaneous inferences for N L M E s with informative dropout and missing covariates. To avoid computational difficulties when the dimension of random effects is not small, we also propose an approximate inference method, which integrates out the random effects in the E M algorithm for more efficient computation. Our methods differ from W u (2004) in that the proposed dropout mechanism depends on the random effects rather than the unobserved responses. The remainder of this thesis is organized as follows. Chapter 2 introduces N L M E s . Chapter 3 discusses the exact inference method for estimation of N L M E s with informative dropout and missing covariates. The approximate inference method based on linearization is presented in Chapter 4. We discuss dropout models and covariate models in Chapter 5. In Chapter 6, we apply our methods to real datasets. Chapter 7 presents our simulation study. We conclude the thesis with a discussion in Chapter 8. 11 Chapter 2 Nonlinear Mixed Effects Models 2.1 Introduction Before we present our methods for estimating parameters in N L M E s with informa- tive dropout and missing covariates, we give a brief introduction to N L M E s in this chapter. In Section 2.2, we introduce N L M E s for longitudinal data. Section 2.3 gives a literature review on N L M E s with informative dropout and missing covariates. 2.2 Nonlinear Mixed Effects Models Linear models, such as polynomials, are often empirical models based on the ob- served data. Therefore, they may be only valid within the observed range of the data. There is often no theoretical consideration about the underlying mechanism, which generates the data. In many longitudinal studies such as H I V viral dynamics, pharmacokinetics analyses, and studies of growth and decay, nonlinear modeling is often required for meaningful analyses. Nonlinear mixed effects models ( N L M E s ) , or hierarchical nonlinear models, are popular in these studies in characterizing both the 12 intra-individual variation and the inter-individual variation (Davidian and Giltinan, 1995; Vonesh and Chinchilli, 1996). As a generalization of linear models, nonlinear models have many advantages: (1) Nonlinear models are often mechanistic, that is, they are often based on the mechanism which produces the data, so the model parameters generally have a natural physical interpretation. (2) A nonlinear model generally uses fewer parameters than a competing linear model, such as a polynomial, offering a more parsimonious description of the data. (3) Nonlinear models often provide more reliable prediction for the responses outside the observed data range. However, compared with linear models, nonlinear models usually don't have a close form expression for the marginal likelihood, and thus parameter estimation is more computationally intensive. For longitudinal data analysis, nonlinear mixed effects models are popular for inferences. Suppose that there are N individuals, with individual i having n» measurements at times tn,..., t j n . . Let be the response value for individual i at time iy , subject to informative dropout, i = l,...,N;j = l , . . . , n j . Denote Yi = (yn, • • •) Vim)7Let z$ = (zn,..., zip)T be a collection of incompletely observed baseline time-independent covariates for individual i. Let v$ = (vn,... ,Viq)T be a collection of completely observed baseline time-independent covariates for individual i. A general N L M E model can be written as a hierarchical two-stage model as follows (Davidian and Giltinan, 1995) 13 Va = 9(^ij,^ij;Pi) + eij, ~ N(0,a2I) (2.1) pt = d ( z i , v i ; / 3 , b i ) , bt^NfrD), j = l,...,m,i = l,...,N, (2.2) where g(-) is an arbitrary nonlinear function, z y- and v y - are respectively (p x 1) and (g x 1) vectors of covariates, e, = (en,..., e j n ; ) r represents measurement errors, Pi — (Ai) • • • iPis)T is a (s x 1) vector of individual-specific regression parameters, P = ( / ? i , . . . , (3r)T is a (rx 1) vector of population parameters (fixed effects), d(-) is a s- dimensional vector-valued function, = (bn,..., bi3)T is the vector of random effects and is independent of ê , a1 is the unknown within individual variance, / is the identity matrix, and the (s x s) matrix D quantifies the random inter-individual covariance. We write D = D(rj), where r) denotes the collection of all distinct parameters in D. Let /(•) 1 denote a generic density function and f(y\x) denote the conditional density function of y given x. After integrating out the unobserved random effects vector, the density of the responses is given by f(yi\zi, Vi, P, a2, D) = J f(yi\zi, v i ; P, a2, b, D)f(h\D)dh, (2.3) and the likelihood function is N L(P, a2, D\y) = J] / / ( X i N i . v*; P, a2, b, D)f(b\D)db, (2.4) »=i ^ which generally does not have a closed-form expression. Exact likelihood calculations therefore require numerical evaluation of an integral whose dimension is equal to the number of random effects b^. This is straightforward to do by direct numerical integration when the dimension of is 1 or 2. However, when bj has a dimension of 3 or more, people need to consider alternative methods, such as Monte Carlo method. xHere, for simplicity, we are abusing mathematical notation, by using / for many different den- sities, and the function can be determined from the arguments. 14 Lindstrom and Bates (1990) propose an approximate method based on first- order Taylor expansions about the random effects 6;. The resulting algorithm provides a computationally fast, albeit approximate, method for a wide class of non-linear models. 2.3 Literature Review on N L M E Models with In- formative Missing Data W u and W u (2001) estimate parameters in nonlinear mixed effects models with miss- ing covariates ( M A R ) by a three-step multiple imputation method. In first step, they fitted a hierarchical model without covariates. Then they imputed the missing covari- ates based on a multivariate linear model, implemented by Gibbs sampler, and created B independent complete datasets in the second step. In the last step, they used the standard complete-data method to analyze each dataset and combine B obtain the overall inference. W u (2002) proposed a method for N L M E s with censored responses and covariates measured with errors. W u and W u (2002) also proposed a method for analyzing N L M E s with missing time-dependent covariates. Later, W u (2004, a) proposed an exact and an approximate method for analyzing data with missing co- variates in nonlinear mixed effects models. The exact method is implemented by a Monte Carlo E M algorithm, and the approximate method linearizes the nonlinear model based on a Taylor expansion, and it substantially reduces the computation load. W u (2004, b) proposed a Monte Carlo E M method for estimating parame- ters in N L M E s with nonignorable missing covariates and dropout, with a dropout mechanism depending on unobserved responses. However, no one has considered pa- 15 rameter estimation in N L M E s with informative dropout and missing covariates, with a dropout mechanism depending on unknown random effects. In the following chap- ters, we focus on N L M E s with informative dropout and missing covariates, with a random-effects based dropout mechanism. Since the random effects are shared by both the response model and the dropout model, this approach may also be referred to as a shared parameter model. 16 Chapter 3 An Exact Method for N L M E Models with Informative Dropout and Missing Covariates 3.1 Introduction In this chapter, we develop an exact inference method based on Monte Carlo meth- ods to obtain M L E s for parameters in N L M E s with informative dropout and missing covariates. The proposed exact method is implemented by a Monte Carlo E M algo- rithm. In Section 3.2, we give a description of N L M E s with informative dropout and missing covariates. Section 3.3 describes a Monte Carlo E M algorithm. A detailed description of our sampling methods is provided in Section 3.4. Computation issues regarding our algorithm are discussed in Section 3.5. 17 3.2 The Models We consider the models (2.1) and (2.2). Let rj = (rn,... , r i n i ) T be a vector of missing data indicators for individual i such that = 1 if y^ is missing and 0 otherwise. We write y; = ( y ™ i S i i , YobSi)T, where y T Oj s ,j corresponds to the missing components of Yi and yobs,i contains the observed components of y». We write Zj = ( z ^ i s i , z ^ s i ) r , where z m i S ) j corresponds to the missing components of covariate vector Zj and z0{,5]j contains the observed components of Zj. We assume that the missing covariates are ignorable (or missing at random), i.e., the missing covariate mechanism may depend on the observed data but not on the covariate values being missing. The observed data are {(y06S,i> zobs,i, v*, rj), i = 1 , . . . , N}. Note that the dimensions of y0bs,i and z 0 ts , i depend on i. To facilitate likelihood inference, we need to make a distributional assumption for the incompletely observed covariates zit conditional on the completely observed covariates Vj. We denote the covariate distribution by / ( z j | v j ;a ) , where the param- eters a may be viewed as nuisance parameters. To allow for informative missing responses, we assume a distribution for rj as / ( r » | y j , Z j , v*; <p), where the missingness may depend on the values being missing (i.e., ymiS,i and zmis,i)- The parameter <p are treated as nuisance parameters. When the responses y$ and the covariates z* contain missing data, likelihood inference becomes more complicated for (2.4). In this section, we consider a Monte Carlo E M algorithm (Wei and Tanner, 1990) for 'exact' likelihood inference, in- corporating missing responses and missing covariates simultaneously. By treating the unobservable random effects bj as missing data, the 'complete data' become {(yi, Zj, Vj, Ti, bj), i — 1,. . . ,N}, and the 'complete-data' log-likelihood can be writ- 18 ten as N iw = X ] { l o g ^ ^ r i i y i ' b i ' Z i ' V i ; ^ + l o g ^ ( y i i b i ' z i ' v i ; ' ( 3 ' c r 2 ) » = i + log/(b i|D(»7)) + l og / ( z i | v i ; a ) } ) (3.1) where ip = (a, /3, a2,77, (p) denotes the collection of all parameters. We assume that the parameters a, f3,a2,r], and cp do not overlap. The term /(r^y*, b;, Zj, v*; cp) is a general expression of the missing response mechanism. Little (1995) points out two ways to incorporate informative missingness: • outcome-based informative if /(r^y*, b^, z,, v$; cp) = /(r^y*, zt, Vjj 0). That is, in addition to the covariates, the missing probability of the current response depends on the possibly unobserved response For example, a patient does not show up because he is too sick to go to the clinic. • random-effect-based informative if / ( r ^ y j , b;, z,, v;; cp) = / ( r j |b j , Zj, v^; (p). That is, in addition to the covariates, the missing probability of the current response depends on the underlying unobservable random effect bj. For example, a pa- tient is more likely to miss the scheduled exam because the treatment is less effective on him and therefore, he does not believe the treatment. Or, a patient may be more likely to dropout early because his true (but unobservable) viral decay rate is slower than other patients. Diggle and Kenward (1994), Little (1995), and Ibrahim et al. (2001) discussed various specifications of the outcome-based informative missing mechanism / ( r j | y j , zit Vi, <p). In this thesis, we focus on the random-effect-based informative missing mech- anism / ( r ; |b j , z;, cp). We may assume, for example, riX,..., rini are independent 19 with the same distribution: logit [P(ry = l | b i , <p)] = fa + 0i6i H + <t>,bt. (3.2) In this dropout model, the missing probabilities of responses only depend on the random effects of that patient. More complicated dropout models can be specified in a similar way. Note that the assumed models are not testable based on the observed data, so it is important to perform sensitivity analyses on various missing mechanisms. If the main parameter estimates /3 are quite independent of the assumed dropout model, we may be confident about the results. Otherwise, if the estimates are very sensitive to the assumed dropout model, we need to justify the dropout model first to get reasonable estimates of the parameters. The covariate model /(z;|v;) can be chosen in a similar way and sensitivity analyses should also be performed (Ibrahim et al. 1999). 3.3 A Monte Carlo E M Method The E M algorithm (Dempster, Laird, and Rubin, 1977) is a very useful and powerful algorithm to compute M L E s in a wide variety of situations, such as missing data and random effects models, but it fails to get an estimating covariance matrix of M L E s . Each iteration of a E M algorithm consists of an E-step that evaluates the expecta- tion of 'complete data' log-likelihood conditional on the observed data and previous parameter estimates, and an M-step that updates the parameter estimates by maxi- mizing the the conditional expectation of log-likelihood. This iterative computation between the E-step and M-step till convergence leads to the M L E s . If we treat (y o f e s , i , ymis,i, z o 6 S ) i , z m i S ) i , Vj, r;, ht) = (y i ; zit v,, rh b{) as the 'com- 20 plete' data, the complete data density for individual i is given by f(Yi, Zi, Vi , Ti, bi\a, [3, a2, t], cp) = / W y * , bi , Z i , Vi; 0 ) / ( y i | b i , Z i , v*; /3, cr 2)/(bi|Z?(r7))/(zi|vi; a). (3.3) This leads to the complete data log-likelihood JV i=l N = z~2{ l o § / Wy;> b » z » v " <W + l o s /(yil bi> z»> v » ; A °"2) i=l + log/(bi |D(r?)) + l og / ( z i | v i ;a ) } , (3.4) where ip = (a,/3, a2,77, 0) and k{ip\yi, zi} vit rit bi) is the contribution to the com- plete data log-likelihood from the ith individual. Note that we mainly interested in estimating (/3, a2), and treat (a,cp, rf) as nuisance parameters. Ibrahim et al. (1999, 2001) proposed a Monte Carlo E M algorithm for esti- mating parameters in G L M M s with informative dropout without missing covariates and for G L M s with missing covariates respectively. W u (2004, a, b) extends their methods to N L M E models and provide a unified approach to address dropouts and missing covariates simultaneously, under outcome-based informative missing response mechanism. Here, we extend the E M methods to N L M E models with dropouts and missing covariates, under random effects based informative missing mechanism. 21 3.3.1 E-step Let xpW be the parameter estimates from the i th E M iteration. The E-step for individual i at the (t + l)st E M iteration can be written as <2*(V# ( t )) = E t / i ^ l y i . Z i . V i . r i . b ^ l y o i ^ . Z o b ^ . V j . r i ; ^ ] = JJJ { l o g / C r i l y i . b j . Z i . V j ; ^ ) + l o g / ( y ; | b i , Z i , v i ; /3, a 2 ) + l o g / ( z i | v i ; a ) + log/(b i | J D(r 7 ))} ^/ (ymiSjt )z m i s , i ;bj |y 0 i , s j , z 0 ( , S jVj ,Tj ,T/>^ ^)dbidymisidzmis^. (3.5) In general, the above integration is intractable and does not have a closed form expres- sion. However, since the integral is an expectation with respect to f(ymis,i, z m i s , i , b » | yobs,it Z 0 b S ) i V j , r;;V> ( t )), it may be evaluated using the Monte Carlo E M of Wei and Tanner (1990) and Ibrahim et al. (2001). Specifically, in the i th iteration, we may generate mt samples from / ( y m i s > j , zmiSii, b j | y o 6 S ] i , z o 6 5 > i V j , r ;̂ ip^) and then approxi- mate the expectation Qi{ip\ip®) by its empirical mean, with missing data replaced by simulated values ^ mt + ~ Y l°gf(yobs,i, ymi,i\hi\ zobs,i, z%iSii, v 4 ; /3, a2) m t j=i ^ mt + ~ Y l 0 S /(z<*».<> Z m L , i l v i ; " ) 1 m ' + - E ^ / ( b ? ) i £ > f a ) ) > - ( 3 ' 6 ) where { ( y ^ ] s > j , z ^ - s i , b - - 7 ' ) , j = l , . . . , m t } are the m 4 simulated values of missing responses, missing covariates, and unobservable random effects. We may choose m0 as a large number and mt = mt-\ + mt-i/k, (k > 1) in the ith iteration. Increasing 22 mt with each iteration may speed up the E M convergence (Booth and Hobert, 1999). The E-step for all individuals at the (t + l)st iteration can be written as 2=1 j=l ' ^ N mt + m t ^ E l o S f(yobs,i,y^.,M\ Zobs,i, zm\si, v < ; / 3 , a 2 ) i=i j=i AT m j + ^ E E l o § / ( z «* . ,< . z H l . J v i ; a ) »=i j=i + ^ E E 1 ° g / ( b ? ) i ^ ) ) } - t=i j=i =E Q ( 1 ) ( 0 | ^ W ) + Q ( 2 ) ( / 3 , a 2 ] ^ ) + Q ( 3 ) ( « | ^ ( t ) ) +Q ( 4 ) (D(r7) |V,W) (3.7) To generate independent samples from f(ymiaii, zmia<i, b i | y o 6 S i i , zoba4Vi, r f ; ip{t)), we may use Gibbs sampler (Gelfand and Smith, 1990) by sampling from the following three full conditionals iteratively: * f(ymis,i\yobs}ii ^ii b i , V^, ^), * f i^mis^^obs^ii Yij b j , V^, Tj, "0̂ )̂, * and / ( b i | y i , z i , v i , r i ; i / > ( ' ) ) . The details about this sampling will be discussed in Section 3.4.1. In practice, when we generate samples with respect to /(y m » s , tIyofc«, i> Zj , b j , v^, 23 r^;?/^), we can note that f {yTnis,i\yobs,i) Zj, bi , Vj, Yi^lp^ ^) = f(yi, r»|zi , b i ; V i ; ip{t))/f{y0bs,i, Yi\zu bi , v 4; V>(t)) = / ( y i | z i , b i , V i ; V u ) • — • — TTtyT- (3.8) / ( y 0 6 s , i , r i | z i , b i , V i ; ^ ^ ' ) Since we are focusing on random effects based missing mechanism, assuming that the probability of missingness can be explained by b, rather than y*, both the numerator and denominator of the second term in (3.8) are constant with respect to the new- generated samples of ymu,i- Thus, / (ymis ) ; |yo6 s , i ,Zi ,bi ,Vi ,r i ; i /> ( t ) ) oc / ( y ^ , b i ; \ t ; ip{t)) (3.9) Similarly, we can simplify the other two full conditionals as: f (.2»mis,i\2>obs,ii Yi> bi , V i , I"i, )̂ = / ( z i , yu b i , r-i, |vi; tp{t))/f(zobs<i, yt, bt, r i ( |vij ip{t)) / (Zi lv i ; V> ( t ) ) / (bi | Z j , Vjj 0 W ) / ( r i | Z j , b f, v < ; ^ / ( y ^ , z f , bj, v i ; - 0 ^ ) / (z 0 6 5 , i ,y i ,bi ,r i |v i ; i /> ( 4 ) ) oc / ( Z i | v i ; - 0 ( t ) ) / ( r i | z i , bi , v*; ip{t))f(yi\ri, zubi, v { ; 0 W ) = / ( Z i l v i ; ^ ) / ^ ^ ) (3.10) / ( b i | y i , Z i , V i , r i ; 0 ( T ) ) = / ( b i , y i , r i | z i , V i ; V ( t ) ) / / ( y » » r i | z i , V i ; i / ' ( < ) ) = / (bj |z j , V J ; 0 ( t ) ) / ( r j | b j , Zj , V J ; V> ( t ) )/(yilbj, ru zu \j\ip{i)) f(yi,Yi\zuwi;ip{i)) cx / ( b i | - 0 ( t ) ) / ( r d b i , Z i , v i ; V W ) / ( y i | b i , r ( ) zu v < ; 0 W ) = / ( b i | V W ) / 6 ( b , ) , (3.11) where /;(zi) = / ( r i | z i , b i , V i ; 0 « ) / ( y i | r i , Z i , b i , V i ; V ( t ) ) , (3.12) 24 and ft fa) = / ( r i | b i ! z i ! v i ; t / ; W ) / ( y i | b i , r i , z i ) v i ; ^ ) . (3.13) 3.3.2 M-step We can update the estimates at the (i+l)st iteration by maximizing Q(ip\ip^tS>). Suppose that the parameters f3, a, D(rf), 4> and a are all different, we can update <fi, (3, a, a and D(rj) by maximizing Q ( 2 ) , Q ( 3 ) and Q ( 4 ) separately in the M-step. The maximizer for QW may be computed via iteratively re-weighted least squares where the missing values are replaced by their simulated values {ym\s,n z^L.ti 0('+D = a r g m a x { Q « ( 0 | i / ; W ) } = a r g m a x — zZ l o S f(ri\yobs,i, y £ ] S i i , b ^ , z o 6 S i i , z ^ , v i ; 0). (3.14) 0 m t i = i j=i A n d , similarly, the maximizer (/3^+1', CT2^+1^) for could be written as: (/3,a2)<<+1> = arg max {Q(2>(/3, a 2 | V ( t ) ) } j JV m i = arg max — ^ ^ l o g / ( y o 6 s , i , y ^ L , i | b p \ z o 6 s , i , z ^ ^ V i ; ^ , ^ 2 ) ; (3.15) the maximizer a ^ t + 1 ' for ( ^ ( a l V ^ ) can be written as: a(t+i) = a r g m ^ x { Q ^ ( a | V ( t ) ) } ^ N mt = a r g m a x — ^ ^ l o g / ( z o 6 S ) i , z ^ S i i | v i ; a ) ; (3.16) a rnt 1 i=i j=i 25 and the maximizer L>(/7)' t +1' for Q^(D(rj)\ip^) can be written as: D ( m ) = argmax{QW(D|^ ( t ) ) } = M g m a x — ^ g l o g / ( b ^ | £ > ( » 7 ) ) } . (3.17) Generally, (3.14) - (3.17) are nonlinear functions and have no closed-form ex- pressions. The maximizers could be obtained via standard optimization procedures for complete data, such as the Newton-Raphson method and Quasi-Newton method. The asymptotic variance-covariance matrix of ip can be obtained using well-known complete-data formulas (Louis, 1982; McLachlan and Krishnan, 1997), where the ex- pectations in those formulas can be approximated by Monte-Carlo means. Specifically, note that the observed information matrix equals the expected complete information minus the missing information, that is, IobsW = IcomW ~ ImisW- (3-18) Let N N mt 1 Q M V O = YlQiWi') = E E - s « i W - ( 3 - 2 ° ) Tlil t = l j = l and W ) - W * # > - £ £ ± - 2 g j ± . (3.21) v^'^; 8ipdipT ^ ^ m t dip v ; ' ' i = i j = i Since (3, a, <p and r) are all different, matrices Q(ip\,ip), O^IVO a n d 5^ (ip) are block diagonal. Then based on (3.18), the asymptotic observed information matrix is { N mt N ~\ E E 3 ^ # ) - E < W W # M ^ ) • (3-22) i = l j=l 1 i = l J 26 Thus, the asymptotic variance-covariance matrix of tp can be approximated by C o v $ ) = / ^ $ ) . (3.23) 3.4 Sampling Methods 3.4.1 Gibbs Sampler From the previous sections, we can see that generating samples from the conditional distribution f(ymis,i, z m i S i i , b i\yobs,i> zobs,ivi> ri> V*^) * s a n important step for imple- menting the E-step of the Monte Carlo E M algorithm. Gibbs sampler (Gelfand and Smith, 1990) is a popular method to generate samples from a complicated multi- dimensional distribution by sampling from each of the full conditional distributions in turn, if the distribution has a convenient representation via conditional distribu- tions. Here, we use the Gibbs sampler to simulate the missing values as follows. Set initial values (v^L,*) zmiS,i> b- 0'). Supposed that the current generated values are ( v ( f c ) 7{k) b{k)) we can obtain i v ( f e + 1 ) 7 ( f c + 1 ) h{k+1)) as- Step 1. draw a sample for the missing responses y ^ V from f (ymis,i\zmis,i> b i > yobs,i> zobs,i~^i> " 0 ^ Step 2. draw a sample for the missing covariates z ^ * V from f {zmis,i\(ymis,i i bj \ yobs,i> zobs,i'vi, ? i \ ^ ) - Step 3. draw a sample for the "missing" random effects b^ f e + 1 ' from ffoilymisji ZmisJ < vo(>s,i) zobs,ivit ri\ ^ ^)- After a sufficiently large burn-in of d iterations, the sampled values will achieve a steady state. Then, {(yjf,^, z ^ - s i , hf^),k = d + 1 , . . . , B} can be treated as samples 27 from the multidimensional density function f(ymis,i) zmis,ii bj|y 0 i) S ] j, Z 0 ( , s j V j , Tj) i / / )̂. A n d , if we choose a sufficiently large gap d! (usually smaller than d), we can treat the sample series {{y^is>it z ^ 8 ] i , bf^), k = d + <f, d + 2d',...} as independent samples from the multidimensional density function. There are several ways to get the initial values (yJJ]S ] i, z ^ ] s i , b ^ ) . A simple way is to replace (y^] Sj, z^] s i ) by the average values of the observed data, and set b | 0 ) as 0. 3.4.2 Multivariate Rejection Algorithm Sampling from the three full conditional distributions can be accomplished by an adaptive rejection algorithm (Gilk and Wi ld , 1992) if the appropriate densities are log-concave. However, for arbitrary N L M E models, some densities may not be log- concave. In such cases, multivariate rejection sampling methods may be considered. Booth and Hobert (1999) discussed such a method for complete-data G L M M models, which can be extended to N L M E models with dropouts and missing covariate as follows. Considering sampling from / (bj |y j , zt, v;, r-j, ip^). As in (3.13), let fb*(bi) = / ( r i | b i , z i , v i ) i / ' ( t ) ) / ( y j | b j , r i , z i , v I , ' 0 W ) , and r = sup/ 6*(u). We assume r < oo. A u random sample from f(bi\yi,zi,vi,vi,'d)^) can be obtained as follows. Step 1. sample b* from f(bi\ip^), and independently, sample w from the uniform (0,1) distribution. Step 2. if w < /* (b*) /r , then accept b*; otherwise, go to Step 1. Samples from f{ymis,i\yobs,i> z i , b;, v,, ri} t/>(t)) and / ( z mis,i\zobs,ii V j , "ii V i , can be obtained in a similar way. Note that, when the dropout probability only de- 28 pends on random effect b, but not yi} we have f{ymis,i\yobs,ii z i ; bj, Vj, TjJ )̂ / ( y i , r f |zi, b i , V J ; ip(t))/f(yobs,i, Ti\zu bit Vjj T /> ( < ) ^ / ( y j | z j , b j , V j ; ^ ^ ^ l y ^ ' ^ v ^ W ) V i i ^ W ) / ( y 0 6 S , i . r » l z » . b » . v i ; ^ ( t ) ) = / ( y i l z i . b i . V i ; ^ ) / ^ ^ ) . (3.24) The function / * ( y » ) = / (r j | z j , bj, v i ; ip^)/f(yobs,i, r»|zi , b i , Vj, i/>(t)) is a constant with respect to the missing responses y m j S i j . So, in Step 2, we always have fy{y*i) = r for any generated y*. Thus / y ( y * ) / r = 1 > to, for any u; generated from uniform(0,l) and y* is always accepted. That is, for any y* generated from / ( y i | z j , b i , Vi,ijj®) in Step 1, we always accept y* in Step 2. 3 .4 .3 Importance Sampling When the dimension of y m j S ) j , z m j S ] j or b i is large, the above rejection sampling methods may be slow. In this case, we may consider importance sampling methods where the importance function can be chosen to be a multivariate normal density or a multivariate Student t density whose mean and variance match the mode and curvature of / ( y m i S , i , z m j S i i ) b j | y o 6 s , j , z 06 S , iVj, r i ; ip^). Booth and Hobert (1999) discuss an importance sampling method for complete-data G L M M models. Here, we may extend their method to N L M E models with dropout and missing covariates. Specially, we may write / (ymis , i j zmis,ii hi\y0bs,ii z o 6 s , i v i , I"i, l / ^ ') = C e x p ( / l ( y m j s j , Z m i S ] i , bi)), (3.25) 29 where c is a unknown normalizing constant. Let h(ymisj, zmis>i, bj) and / i (y m i s , i , Zmta.i, bj) be the first and second derivatives of h ( y m i S t i , z m j S , i , bj) respectively, and let (.ymis,uzmis,i^i) be the solution of h(ymi3}i,zmis^,bi) - 0, which is the maximizer of h{ymis,i> zmis,i,hi). Then, the Laplace approximations of the mean and variance of f(ymis,i, z m i , , i , b i \ y o b s > u Zo^jVj, ru ip {t)) are (y*mis<i, z*mi^, b*) a n d - ( h ( y m i s , i , «mi»,i»b*))" respectively. Suppose that { ( y ^ , z ^ , b* ( 1 )) , . . . , ( y ^ , z ^ > , b * ^ ) } is a ran- dom sample of size mt generated from the importance function / i*(y m j S ) i , zmiSj, bj), that is assumed to have the same support as / (y m j s , j , z m i j J | j , b j |y o 6 S i j , z o 6 s,jVj, r f , V> ( t )). Then we have N . nit QW\4>M) « { ^ E w W ^ y ^ , y S i , ^ . i , < t 2 1 o b i * 0 ) ) } , (3.26) 1=1 J=l where (t) _ f(ymJl,i> Zmis,i> lypba.i) Z0ha,jVj, Tj, Tp^) w*i ~ h*(v {j)- z'U)- b*ij)) { ] are importance weights. For the above sampling methods, the rejection sampling methods may be more efficient when the dimension of the random effects and the sample size are small, while the importance sampling method may be more efficient when the sample sizes are large since in this case the importance distribution may closely resembles the true conditional distribution. 3.5 Convergence When applying the Monte Carlo E M algorithm, Monte Carlo samples for the "missing data" are drawn at each iteration to approximate the true values. Consequently, Monte Carlo errors are introduced. The Monte Carlo errors are affected by the 30 Monte Carlo sample size. It is obvious that larger values of mt, the Monte Carlo sample size, will result in more precise but slower computation. A common strategy is to increase mt as the number of E M iterations increase (Booth and Hobert, 1999). For sufficiently large values of mt, the Monte Carlo E M algorithm would inherit the properties of the exact versions, such as the likelihood increasing properties of E M , but would substantially increase the computation work load. Thus, we usually use a relatively small mt at initial iterations, and then increase mt with the iteration. If the Monte Carlo error associated with i / / + ^ is large, the (t + l)th iteration of the Monte Carlo E M is wasted because the E M step has been swamped by the Monte Carlo error. Booth and Hobert (1999) proposed an automated method for choosing mt in the context of complete-data G L M M models. Here we extend their method to N L M E models with dropout and missing covariates. Let Q ' " ^ ' " ) = B-8^f1, (3.28) g » W / > ) „ < W ! > , ( 3 , 9 ) and let rb*^t+^ be the solution of Q^(tp\xp^) — 0. When the simulated samples are independent, it can be seen that the conditional distribution of / ( • J / > ^ + 1 ^ | ' 0 ^ ) is approximately normal with mean 0 * ( T + 1 ) and a variance that can be estimated by V a ? ( ^ ( t + 1 ) | ^ ( t ) ) = Q W ^ + ^ i ^ ^ V ^ ^ f ^ ^ ^ ^ i ^ ^ ^ g w ^ ^ ' + ^ i ^ ^ V 1 (3.30) 31 where t , b P ) ; t f < « + 1 > ) ] r } , (3.31) ym\si' zmL,» a n o ^ b j j ; are simulated samples, and u>y are the importance weights when the importance sampling is used and are all set equal to 1 when rejection sampling methods are used. After the (t + l ) th iterations, we may then construct an approximate 100(1 — a)% confidence ellipsoid for ip^t+1^ based on the above normal approximation. The E M step is swamped by Monte Carlo error if the previous if>® lies in the confidence ellipsoid, and in that case we need to increase mt. For example, we may set mt to be mt-i +mt-i/k for some positive constant k and appropriate m0. The proposed Monte Carlo E M algorithm often works well for the models with a small dimension of random effects. When the dimension of the random effects is not small, however, the proposed E M algorithm and Gibbs sampler may converge very slowly or even not converge. Therefore, in the next chapter, we propose an approximate inference method which may avoid these convergence difficulties and may be more efficient in computation. 32 Chapter 4 An Approximate Method for N L M E Models with Informative Dropout and Missing Covariates 4.1 Introduction In the previous chapter, we have described a Monte Carlo E M algorithm for "exact" likelihood inference for N L M E models with informative dropout and missing covari- ates. However, the exact method may offer potential computational problems such as slow or non-convergence, especially when the dimension of the random effects b* is large or the intra-individual data are not rich. When the dimension of the ran- dom effects bi is not small, sampling the random effects may lead to inefficient and computationally unstable Gibbs samplers, and may lead to a high degree of auto- correlation and lack of convergence. When the intra-individual data are sparse, the individual nonlinear regressions used in the M-step may fit poorly, leading to slow or non-convergence. To reduce computation work load, in this section, we propose an ap- 33 proximate method which iteratively solves certain L M E models and avoids sampling the random effects in E-step. The proposed approximate method may be preferable when the exact method exhibits computational difficulties. Alternatively, the approx- imate estimates can be excellent starting values for the exact method. Note that this approximate method is exact for L M E models and certain N L M E models where the model may be nonlinear in the fixed effects but is strictly linear in the random effects. For complete-data N L M E models, approximate methods have been widely used, and these approximate methods often perform reasonably well in most cases (Lindstorm and Bates, 1990; Pinheiro and Bates, 1995; Vonesh et al. 2002). These approximate methods are typically obtained via Taylor expansions or Laplace approx- imations to the nonlinear models. One particularly popular approximate method for complete-data models is that of Lindstorm and Bates (1990), which is equivalent to iteratively solving certain L M E models (Wolfinger, 1993). For L M E models with miss- ing responses but completely observed covariates, Ibrahim et al. (2001) propose an efficient E M method which is obtained by integrating out the random effects. These methods can be extended to N L M E models with informative dropout and missing co- variates for approximate inference. Here, we propose an approximate method based on Lindstorm and Bates (1990), which uses first-order Taylor expansions around the updated parameter and random effects estimates and is equivalent to iteratively solv- ing certain L M E models. Then we propose to handle the missing responses and missing covariates in the L M E model step, for which the random effects can be inte- grated out. Thus, this approximate method avoids sampling the random effects in the E-step and avoids fitting some nonlinear models in the M-step, and therefore avoids potential computational difficulties associated with the exact method. Moreover, well known efficient EM-type algorithms for complete-data L M E models (Meng and van 34 Dyk, 1997; L i u et al. 1998) can be incorporated in the L M E model step to further speed up the convergence. 4.2 The Approximate Method We can rewrite the N L M E model (2.1) and (2.2) as a single equation by combining the two stages: Vii = 9ij{tuPM) j = l,...,nui = 1,...,N, (4.1) where <7ij(-) is a nonlinear function. To simplify the notation, we suppress the com- plete observed covariates V; and denote the current estimate of ip in the t-th E M iteration by V W = (a<*>, P{t\ Dfa)« a 2 « <p«). L e t g l = (giU .. .,gini)T. Following Lindstorm and Bates (1990) and Wolfinger (1993), the proposed approximate method iteratively solves the following L M E model: y i = Xir3 + + e;, (4.2) where fi = Y i - S i f a ^ . b ^ + XipV+TfrV, (4.3) (/3(",bi")' ( 4 4 ) ( / 3 ( W ( 4- 5) %j(zi , /3 ,bi) X i j — w ~ _ dgi:j{zup,bi) i j dbl Xt = Xfa) = (X?lt... ,Xlf, Ti = T^) = (Tl.. .,Tfnf, a n d Y i = ( y „ , . . . ,yini)T. Note that we have &*>) r«;V ( t )), (4.6) 35 where ymis,i ~ Ymis.i — gmia,i('Zlmis,i, $ \ h\ + -Xmj S )j(zmj S ]j)/3^ + Tmis,i{zmis,i)b\ \ (4.7) and Xmis,i,TmiSii are submatrice of Xi,Tt respectively. g m i S ) i is a sub-vector function of defined similarly, and yt = ( y £ i j M , yo6 S , i) T- Under the L M E model (4.2), it is straightforward to show that HbilyuZi^V) ~ Nfati),- (4.8) where ti^ia-WlfTi + D-1®)-1, (4.9) hi = tiTf (Si-Xip^)/^. (4.10) 4.2.1 E-step We can integrate out bj and obtain the following results. Qi(tp\*p{t)) = £ ' [ / i ( V ' | y i , Z i , v i , r i , b i ) | y o 6 a i i , z 0 ( i a | i , v i , r i ; V ' ( t ) ] = / / / { l o g / ( r i | y i , hu Zi, v i ; </>«) + log f(yi\hi, Zi, vt; / 3 « <7 ( ° 2 ) + l o g / ( Z i | v i ; Q(") + log/(bi|Z?(77))} xf(ymia,i, z m i S , i , bj |y 0 6 S , i , z o t S | i V j , TJ; • 0 ( ' ) ) d b i d y m i S , i d z m i S | i = h + h + h + h- (4.11) 36 h = JJJ ^ogf{ri\yi,bi,Zi,Vi\<p{t))f(ymiSii,zmiatU^^ X (ibjdy misjd-Z'mis ,t x / ( b i | y i , Zi, $w)dbidymi3tidzmia,i = JJ[J l o g / ^ l y i . b i . Z i . V i j a i W j / C b i l y i . Z i - . ^ J r f b i } Vi,ri;T/> ( V y m i s . i d Z n ^ i • (4.12) Consider the dropout model (3.2), and suppose that the missing probabilities for each time points are conditionally mutually independent. Then, we have / f a l y i . b i . Z i . V i j c i W ) - n p ( ^ ' = l | 0 ( t \ b i ) r « ( l - P(Ra = l | ^ ( t ) , b0) 1 _ r«<4 .13) j=i Note that, here, we use Rij to represent the argument in the equation, and use r y - to represent the value we observed. Define 0 = (<pi,..., (ps)T, and we can re-write (4.12) as h = JJ {y log | n ^ ( ^ = i i 0 ( t ) , b o r y ( i - p ( R i j = l i ^ . b o ) 1 - ^ / ( b i | y i , Z j ; i /> ( t ) )dbi | x f{ymis,uzmia,i\9obs,i, z o 6 s , iV j , r t ; i p { t ) ) d y m i s < i d z m i s < i = JJ { / ( E h = ! | b i - ^ W ) ) + (1 " ^ ) l Q g ( l - P(Rii = l | b i ( 0 ( t > ) ) ] x / (b i |y i , z i ; 1 />W) ( ib i } / (y ipW)dy Tnis.i^^mis.i - inti 1̂ + 00 + 0 ^ 7 j ,1 + 00 + 0 b ^ x / ( b i | y i , Z i ; V W ) r f b i } / ( y mis,», Z m i S | i | y 0 5 a i , Z 0 5 s i V i , mis i ^ Z m i , i ~ JJ {it^^1}?^™^'2™^^'^ (4.14) 37 where fij log i)0 + (pTb 1 + 00 + 0 ^ x / (b i | y i ,Z i ; i /> w )db i — + ( l - r y ) l o g 1 , 1+00 + 0 b i ; (4.15) Define ai mean 0o + 0 bi . From (4.8), since bi is multivariate normally distributed, with t*bi = Zi1?(yi-Xi0W)/o*n and variance E b i = ( a - 2 W 3 f T i + Z ? - 1 W r 1 > Oj is a univariate normal random variable with mean and variance Take an R s ̂ M 3 variable transformation (4.16) (4.17) (4.18) (4.19) fli = 0 o + 0 bi , 6 2 i = b2i, (4.20) and assume that at least one 0,- ^ Q,j = 1 ,2 , . . . , a, say 0 ! ^ 0, so the Jacobi determinant d(ai,b2i,.. .,bai) d{blu b2i, •.., bai) (4.21) 38 and we can write Q i as: n , - / = / • • 7 h l o s ( i ^ - ) + ( i - r « ) i ° e ( r ^ - ) v . 11 u L I~ i(t)\ \ 9(ai,b2i, • • • ,bsi) xf{ai,b2i,...,bSi\yi,zi;tl:V>)i / ( b i | y i , z i ; ^ w ) d b i db\idb2i • • • dbs d{bu, b2i,bsi) x / ( a i , &2», • • • , &*i|y»> z t i il>{t))<f)\dbudb2i • • • dbsi = / • • 7 h i ^ ( i ^ - ) + < i - ^ ^ ( T ^ - ) ] x / ( a i , &2i, • • • , z,; tl){t))d{<t)ib{)db2i • • • dbsi = / - / [ W i r J + o - W r + y x / ( a j , 6 2i, • • • , 6«|y»» Zii ip{t))daidb2i • • • dbsi x /(fltilyi, z { ; ip{t))f{b2i,b3i\oi, yit z<; ip^daidbu • • • db. = y J "J f(b2u---,bsi\ai,yi,Zi;ip{t))db2i---db: 4 n j l o s ( r T ^ ) + ( 1 - ^ ) l o g ( i ^ ) . = / [ r « l o g ( i T ^ ) + ( 1 - r i ^ l o g ( r + T : ) / ( « i | y i > Z i ; V W ) r f a i , ( 4 . 2 2 ) where / (a i | y j ,Z i ; is the density function of i V ( / i 0 ( , S a . ) . B y Taylor expansion of 39 a* at / i a i , we can write Cli approximately as: « i * / ( r « flog ( ^ - ) + ( - 1 ) ( a i - ^ = y rtj l og / i a i - log(l + /z a j ] / K | y i ; z i ; i / ; w )da; ^ / ( a i l y i . Z i ; ^ ) ^ J LA'oi Mai + 1J [r t f l o g ^ - log(l + /i a i )] y /(c-lyi, z i ; V W ) c f a i y (a.' ~ AO/(oi |y .- ,z<; V ( t V a » = r y - l o g / i 0 i - l o g ( l + /z a i) = rylog/ ia , - l 0 g ( l + / X a J . X 1 + Ma4 + 1 x 0 Therefore, h (4.23) / / { E r ^ l o S ^ ° ' ~ 1 ° S ( 1 + A**)} \9oba,i, Zo63,iVi, I-,; tb(t))dymis = y^ {"i.mia log ~ Tli log(l + / i 0 i ) } |y 06 s,i, z 0 6 S i iVi, r 4 ; ip{t))dymiSiidzmiSii (4.24) where n i i m i S is the total number of missing responses from the ith subject, i. e., n. 'itmis — ^ ^ Tjj. J = l (4.25) 40 Next, ^ f ijfmisfit ^mis,ii = JJJ \°%f{yi\buii)vi\P{t\o-{t)2)f(yrni3^ x / ( b i | y i , Z i ; V W ) d b i d y miSfidlt-mis^i = JJ {J l o g / ^ l b i . Z i . V i ^ W . f f W ' j / ^ l y i . Z i j ^ W j d b i } ^/(ymiSji) ^Tnis,i\yobs>ii ^obs,i^i^ ^i) ^ II^2^^mis' i'Zmi3' i^obs' i'Zobs' iVi,ri'^ t)^ (4-26) fi2 = y l o g / ( y i | b i , z i , v i ; / 3 W , a W 2 ) / ( b i | y i , Z i ; ' 0 W ) r f b i = /log G v ^ U r e x p ( ~ ^ ( y i ~ * i / 3 - T i b i ) T ( y i - ^ ~ T i h i ) ) ) = c 2 - * log ̂ - 1 logdEij) - — / ( y * ~ X ^ ~ r ^ ) r ( y * - ^ - T ^ x exp ( - ^ ( b i - b i ) T E - 1 ( b i - b i ) ) db; (4.27) where s is the dimension of random effects b*, as defined in Section 3.2, and C2 = - * ± M o g ( 2 7 T ) . 41 can perform a variable transformation as bj = b; + T,1/2^, and obtain: C 2 - ^ \ o g a 2 - 1 - l o g ( \ t i \ ) _ J _ f (Yi ~ Xj(3 - TM - T^y%nyi - X£ - TM - T&\'X) 2 ° 2 J (v^lE, 1 7 2 ! C a - ^ l o g ^ - i l o g d E i l ) _ J _ / (Yi ~ Xj(3 - TM - T ^ \ ) T ( y i - Xtf - - T \ t y \ ) 2 ° 2 J ( v ^ F ) ' | E j / a | x e x p ^ - i k f k ^ |Ej / 2 | d (k i ) C a - ^ l o g ^ - i l o g d E i l ) _ J _ f (y< - Xjfl - Tjb, - r i s ? / 2 k i ) r ( y < - x& - T& - T^X) 2a2 J xexp ^ - i k f k i ^ dki C 2 - ^ log a 2 - i log(|Ei|) - ^ ( y i - X / 3 - T i b i f ( y i - X£ - T&) / 0 ^ < * - - * C2 - 2 i log a 2 - i log(|Ei|) - (y* - ^ i /3 - T i b i ) T ( y i - X£ - T&) C 2 - y log a 2 - 1 log(|Ei|) - ^ ( y i - X£ - T^f^ - Xfi - TM) 2a 2 1 TriTfTiti). (4.28) 42 Thus, X ( y i Xi/3 ^ i b i ) ] / ( y m i 5 ] i , Z m i S ( i | y 0 ^ 3 j i , Z 0 5 a ) i V j , T i , ^ ) d y m i a , i ^ Z m i 5 ) i . Also, ^3 = l o g / ( Z i | v i ; a ( t ) ) / ( y m i 3 , i , z m j 3 , i , b i | ^ = yyy i os /^k; a W ) / ( y ™ « , t ' z m i 3 i i | y o i ) 3 ; i , z ^ , , ^ , ry, ^ ) = ^ l o g / ( z i | v i ; a w ) / ( y m i 3 , i , z m i 3 i i | y ^ (4.29) and, -̂ 4 = y ^ l o g / ( b i | D ) / ( y T O i 3 , i , z m i 3 i i , b ^ = JJj log / ( b i | £ > ) / ( y m i S ) i , z m i 3 ] i | y o 6 S i i , z o i ) 3 i i V i , r<; i/>(t)) x / ( b i | y i , z { ; ip^dbidymis^dz^i = JJ {J l o g / ^ l ^ / C b i l y i . z * ; ^ 0 ) ^ } x / ( y m i 5 , i ) Z m j 3 ) i | y 0 ( , 3 j i , Z 0 b 3 ) i V i , T i , ^ ) ^ y m i s , t ^ Z m i 3 ) i = y^ ^ 4 / ( y m i s , i i Zmts,i|yofcs,i) Z 0 ( , 3 j i V j , Tj' , ^ ^ y n ^ ^ i d Z n ^ i a i , (4.30) where, fi4 = y i o g / ( b i | £ » ) / ( b i | y i , Z i ; i / > W ) d b i = C 4 - | log |£>l " | / ( b f / ? - 1 ^ ) X ( v ^ ) - | S i | ^ e X P { - \ { h i ~ 6 i ) T £ _ 1 ( b i - ^ ( 4 3 1 ) 43 and d = - f log(27r). Again, we may perform a variable transformation b ; = bj + E - / 2 k j and obtain: Q 4 = C 4 - i l o g | D | - i y * ( b i + E j / 2 k i ) r D - 1 ( b i + S j / 2 k i ) ^ ^ e x p ( ^ k r k l ) , ( b i + E ^ ) = C 4 - i log \D\ - \J (bj + E J ^ f i r ' f o + E ^ k i ) ^ ^ e x p ( - I k f k t ) | E V 2 | , ( k t ) = C 4 - \ log p | - \ J (bi + E , 1 / 2 k j f D " 1 ^ + E t 1 / 2 k i ) ^ e x p ( - i k r k i ) d k i = Q - i l o g P I - i b f D ^ b i - \ / ( E f k i ) ^ - ( E f k t ) ^ exp ( - | k f k i ) dki 4 / b ^ - ( E f k ^ e x p ( - i k f k i ) ^ ^ / ^ k i f ^ b i ^ e x p f - I k f k i ) ^ = c . - i i o g i D i - i b ^ b i -\J ( E j k i f Z r ^ k i ) ^ ^ exp ( - i k f k ^ dki - 0 - 0 = C 4 - ^ l o g | Z 3 | - l b J , Z ? - 1 b i - i r r ( D - 1 f : i ) . (4.32) Thus, 7 4 « Q - i l o g p l - i l Q b f ^ b i + i r ^ E i ) ) ^ f{ymis,i> zmis,i\yobs,ii zoba,i^i> tp{i))dymi3,idzmiSti. (4.33) From the analytical discussion above, we have integrated out bj. Consequently, Q i ( - 0 | 0 ( < ) ) can be evaluated as an integral with respect to the density function 44 / ( y m i S , i , Z m i s , i | y o 6 3 , i , z 0 6 s , i V i , r i ; i / > w ) . Thus, in E-step, we can only sample y m i s A and zmis,i- This may substantially reduce computation burden since the integration with respect to b * is not needed. 4.2.2 M-step The M-step in each iteration of the approximate method is similar to the M-steps of the exact method, which is discussed in Section 3.3.2. The asymptotic variance- covariance matrix of ip can again be obtained using well-known complete-data formu- las as described in Section 3.3.2. The only difference is that the likelihood function li is for the model (4.2). 4.3 Monte Carlo Sampling Having integrated out the random effects b ; , in E-step we only need to simulate samples from f{ymis,i, zmis,i\9obs,i,z06s,iV»,r^; T / > w ) . As in Chapter 3, we can again use the Gibbs sampler to draw the desired samples. The procedure is described as follows. Set initial values ( y ™ ] 3 | i , z ^ - s i ) . Supposed that the current generated values Afr(fc) (fc) \ Step 1. draw a sample for the missing responses 9mu!i from f (9mis,i\Zmis,i' 9obs,ii zobs,i^ii ^)- Step 2. draw a sample for the missing covariates z ^ V from f {zmis,i\{9mis,i , 9obs,ii Z 0 ( , S i j V j , Yi] 1p^ ^). After a burn-in period, the sampled values (y^.V' zm?s*i) c a n D e treated as the true sample from the density function / ( y m i « , t , zmis,i\9obs,i, z0bs,iVi, i \ ; ip{t)). And, if we 45 choose a sufficiently large gap d! (usually smaller than d), we can treat the sam- ple series { ( y ^ , z ^ ) , k = d + d',d + 2d',...} as independent samples from the multidimensional density function f(ymis,i, 2miSii\yobStU zoba^u r i ; ipw). 46 Chapter 5 Covariates Models and Dropout Models 5.1 Introduction In the foregoing chapters, we have already discussed the methodology for estimation of parameters in NLMEs with informative dropout and missing covariates. To provide valid inference, we need to specify a dropout model for the missing response, and a covariate model for the incompletely observed covariates, and then incorporate them into our analyses. However, the dropout model is usually unknown and hard to be verified from the observed data. Sensitivity analyses are thus very important in that they can show us how sensitive our conclusion rely on our models. If our estimates vary a lot when we choose different dropout models or covariate models, they may be unreliable because we do not know whether our covariate model and dropout model are true. On the other hand, if our estimates are robust to model selection, we can believe that the estimates may be reliable. In Section 5.2 we introduce covariate models. In Section 5.3 we discuss possible dropout models. In Section 5.4 we discuss 47 sensitivity analyses for the dropout model and covariate model. 5.2 Covariate Models When some covariates are missing, we need to assume a distribution for the covari- ates. The parameters in the covariate model are also viewed as nuisance parameters. Ibrahim (1990) proposed a saturated multinomial model for categorical covariates with missing values. A drawback of his method is that the saturated model greatly increases the number of nuisance parameters, which increases computation burden and may make the model unidentifiable. When the missing covariates are all continu- ous, we may assume a multivariate normal distribution for the covariates (Little and Schlucher, 1985). To allow both continuous and categorical covariates, we may write the covariate distribution as a product of one-dimensional conditional distributions, as in Ibrahim, et al. (1999) / ( z i ; a ) = / ( z j ) P | z i i i , . . . , z i i p _ i ; a p ) x / ( ^ i , p - l k i , l ) • • • , Zi,v-I'i ap-l) x ••• x / ( z M | a i ) , (5.1) where z* is the covariate vector for the ith subject, a = (aj, a 2 , • • • , aT)T is the pa- rameter which characterize the relationships among the covariates, and c*i, a 2 , • • • , a p are all different. The index p is the number of covariates. Note that we do not need to make distributional assumption for the completely observed covariates, which are conditioned on and are suppressed in the above expressions. Note also that this mod- eling scheme allows the missing covariates to be continuous, categorical and mixed. For example, suppose that Z\ is continuous and z2 is binary. B y the above modeling 48 strategy, we may specify a normal distribution for Z\ and a logistic regression model for Z2 conditional on z\. For the dataset described in Section 1.3, all of the three covariates in the models, C D 4 (z\), T N F (z 2) and CH50 (23), contain missing values. Thus we need to assume a joint covariate model for likelihood inference. As we have discussed, we model the joint distribution of z = (z\, z2, z3)T as a product of three one-dimensional conditional distributions: f(zii,Zi2,Zi3\a) = f{za\zil,zi2\a.3)f{zi2\zii\a.2)f{zil\ctl). (5.2) where a = (otj, cx^, c* 3 ) T . We focus on the following saturated model, (zi3\zn,zi2;oc3) ~ N(a30 + a31za+a32zi2,a33), (5.3) (zi2\zn; a2) ~ ^ ( 0 2 0 + 0:21^1,0:22), (5.4) {zn\an) ~ i V ( a i 0 ) a i i ) , (5.5) where a 3 = ( a 3 0 , a 3 i , a 3 2 , a 3 3 ) r , a 2 = ( a 2 0 , a 2 i , a 2 2 ) T , and a i = ( a i o , a n ) r . We will also consider other more parsimonious models for sensitivity analysis. 5.3 Dropout Models Dropout models are the models for the missing responses indicators rtj. The param- eters in the dropout model are treated as nuisance parameters and are usually not of inference interest. Thus, we try to reduce the number of nuisance parameters to make the estimation of (3 more efficient. Moreover, too many nuisance parameters may even make the N L M E model unidentifiable. Therefore, one should be very cautious about adding extra nuisance parameters. 49 Since the missing responses indicators are binary, a simple model for them is a logistic regression model as follows. We may assume that the missing probabilities for each time points are independent, i . e., Tli f{Vi\yu b i , Z i , v i ; V>) = f l (* " * t t ) 1 _ r y . (5-6) and l o g ( i _ 3 7 r . ) = ft^yi'bi'Zi'Vi; ( 5 7 ) where 7Ty = P{Rij = 1) is the probability that is missing, and /i(-) is often an linear function of y i , b j , Z j and Vj. More complicated models can also be considered, but they may introduce more parameters and increase the computational burden. In general, the probability that y^ is missing may depend on many factors, such as past or current responses, individual random effects, covariates, etc. However, since in this thesis we focus on random effects based informative dropouts, we may assume that the missing probabilities of responses are only explainable through b j , the random effects, i . e., /(r»|y». b i , Z i , V i ; ip) = / ( r ^ b j , 0 ) . (5.8) Thus, we have Tli / ( r i | y i , b i , Z i , v I ; 0 ) = [ ] > ( ^ = l | 0 . b i ) r " ( l - P{Rij = b * ) ) 1 " ^ (5.9) Again, note that, as in Section 3.3.2, we use i?y to represent the argument in the function and TY, to represent the value of Rij as we have observed. 5.4 Sensitivity Analyses The dropout model and covariate model are not verifiable based on the observed data, so it is important to conduct sensitivity analyses. That is, we need to try other 50 plausible dropout models and covariate models, and then assess the sensitivity of results to those different models. If there is not much difference between the results based on different models, we can draw relatively reliable conclusions. Otherwise, the results may depend on assumed the models and the conclusions may not be reliable. 51 Chapter 6 Data Analysis 6.1 Introduction In the previous chapters, we have discussed an exact method and an approximate method for N L M E s with random effects based informative dropouts and missing co- variates. In this chapter, we will analyze two real datasets. In Section 6.2, we analyze the data described in Section 1.3. This dataset has both missing covariates and miss- ing responses. We will only apply the exact method. In Section 6.3, we analyze another A I D S dataset, which does not have covariates. We will apply both the exact method and approximate method on the second dataset. In Section 6.4, we discuss some computational issues. 52 Table 6.1: Data summary of Example 1. Variable Sample mean Sample standard deviation Percentage of missing values Vira l load 3.57 0.99 6.7% C D 4 177.3 86.9 3.7% CH50 241.1 48.6 16.4% T N F 61.0 28.8 12.3% # of patients: N=48 # of observations per patient: n;=2 to 7. 6.2 Example 1 6.2.1 Data Description The dataset which motivates our research consists of 48 H I V infected patients who were treated by a potent antiviral regimen. The Plasma HIV-1 R N A (viral load) is repeatedly measured on days 2, 7, 10, 14, 28, and weeks 8, 12 and 24, after initiation of the treatment. After the antiviral treatment, the patient's viral loads will decay, and the decay rate may reflect the efficacy of the treatment. Throughout the time course, due to individual characteristics, the viral load may continue to decay, fluc- tuate, or even start to rise (rebound). We only consider the data for the first three months data since data after three months are likely to be contaminated by long- term clinical factors. The number of measurements for each patient varies from 2 to 7. Fourteen patients have missing viral loads at scheduled time points due to dropout or other problems. The baseline covariates C D 4 cell counts, total complement levels (CH50), and tumor necrosis factor ( T N F ) contain 3.7% 16.4%, and 12.3% missing data respectively. We summarize our data in Table 6.1 (viral load is in log 1 0 scale). Four patients are randomly selected and their viral loads are plotted in Figure 53 0 10 20 30 40 days after treatment Figure 6.1: Viral loads of four randomly selected patients. (Example 1) 6.1. Visual inspection of the raw data seems to indicate that dropout patients appear to have slower viral decay, compared to the remaining patients. Thus, the dropouts are likely to be informative or nonignorable. This dataset was analyzed previously, but dropout patients were discarded and the missing viral loads were assume to be missing completely at random (Wu and Ding 1999; W u and W u 2001). W u (2004) re-analyzed the dataset, assuming that the missing mechanism depends on the unobserved responses (viral loads). In this section, our objectives are to consider random effects based nonignorable missing mechanism, and check if the estimates of decay rates are different. 54 6.2.2 Models The following two-phase H I V viral dynamic model has been proposed for this study (Wu and Ding, 1999; W u and W u , 2001) Vi, = l o g 1 0 ( P l i e - A l i t l i + P 2 i e - A 2 i t 2 l ) + ^ , (6-1) l o g 1 0 ( P H ) = Pt+foTNF + bu, Xu = 03 + P4,TNF + P6CDA + b3i, l o g 1 0 ( P 2 I ) = p6 + p7TNF + b3i, X2i = pa + p9CH50 + b4i, (6.2) where y^ is the l og 1 0 transformation of viral load for patient i at the j t h visit, i — 1, . . . , N; j = 1 , . . . , rij, N = 48 and n* varies from 2 to 7, A H and A 2 I represents two viral decay rates, P H and P 2 I are baseline values, bki,k — 1, ...,4, are random effects, and £ij represents within individual errors. In this study, the baseline C D 4 (zi), CH50 (z 2) and TNF(zs ) all contains missing values. To make a valid likelihood inference, we need to specify a model for these three covariates. We model the joint distribution of z = (z i , z 2 , z3)T as a product of three one-dimensional conditional distributions: f(zii,zi2,zi3\a) = f{za\zn,zi2;a3) • f(zi2\zn; a2) • f(zn\cxi). (6.3) where a = ( a f , a^ , aJ)T. We first focus on the following saturated model: {zi3\zn,zi2\a3) ~ N{a3o + a3iZii + a32zi2,a33), (6.4) (2»2 | * i i ; "2 ) ~ ^ ( " 2 0 + a 2 i ^ i , a 2 2 ) , (6.5) (Zii |ai) ~ ^ ( a i o . a n ) . (6.6) Figure 6.2 shows the Q - Q plots for each of the three covariates. It appears that the normality assumption may be plausible. The responses yij contain 6.7% missing values. Thus, we also need to assume a model for the dropout mechanism in order to make valid likelihood inference. Note 55 Norma! Q-Q Plot Normal Q-Q Plot Normal Q-Q Plot - 2 - 1 0 1 Theoretical Quantiles - 2 - 1 0 1 2 Theoretical Quantiles - i — r -3 -2 -1 0 1 2 Theoretical Quantiles Figure 6.2: Q - Q plots for covariates (Example 1) 56 that although dropout models are not verifiable based on observed data, subject-area knowledge and sensitivity analyses based on plausible models may still lead to rea- sonable models. Since, in this thesis, we focus on random-effects based nonignorable missing data, we propose the following missing model. where = [ m , . . . , J " j n i ) r is a vector of missing data indicators for individual i such that r-jj = 1 if yij is missing and 0 otherwise. 6.2.3 Analysis and Results We consider estimating the population parameters /3 = (0\,..., [3g)T using three methods: the complete case method, exact method assuming ignorable missingness, and exact method with nonignorable missingness. Details of the computation are given in Section 6.4. The results are shown in Table 6.2; Jackknife standard errors are included for comparison to check on the accuracy of method for standard error estimation described in Section 3.3.2. We see that the results are somewhat different under different methods. For the most important parameter fa, the initial decay rate, the exact method assuming non- ignorable missing gives the smallest estimate, the exact method assuming ignorable missing gives an moderate estimate, and the complete case method gives the largest. This suggests that studies assuming ignorable missing data mechanism or discarding dropout patients may over-estimate the initial viral decay rate. Another informative parameter is f3i, which is the intercept term of the baseline viral load. Although it is not of much interest for testing the efficiency of the new / ( r i | y i , b i , Z i V i ; 0 ) = n p ( ^ = 1 l ^ b ' ) r y ( 1 - - P ( ^ = 1 ^ ' b * ) ) (6.7) 57 Table 6.2: Estimations for response model parameters. (Example 1) Exact (Ignor.) Exact (Non-ignor.) Complete Case Estimate SE JSE Estimate SE JSE Estimate SE JSE ft 12.29 0.02 0.04 12.27 0.02 0.04 13.45 0.32 0.08 ft 0.61 0.01 0.06 0.54 0.10 0.04 0.59 0.28 0.09 ft 37.54 0.34 0.50 35.84 0.39 0.71 40.65 4.40 0.90 ft 0.49 0.14 1.26 2.85 1.75 0.63 1.07 3.43 2.01 ft 7.69 0.20 0.45 5.59 0.76 0.57 5.46 2.06 0.55 ft 7.83 0.02 0.04 7.77 0.02 0.05 7.60 0.32 0.04 ft 0.30 0.01 0.08 0.30 0.03 0.03 0.34 0.14 0.07 ft 2.08 0.03 0.07 1.99 0.03 0.08 1.92 0.49 0.07 ft 0.13 0.01 0.05 0.092 0.01 0.05 0.20 0.22 0.09 a 0.58 0.12 0.61 0.02 0.32 0.56 Complete Case is based on 30 subjects without any missing measurements. treatment, it shows the difference between different estimation methods. The exact methods almost give the same estimate for (3i, no matter whether we assume ignorable missingness or not. However, the complete case method gives a larger estimate. A possible interpretation is that, since the complete case method overestimate the initial decay rate, the baseline response "intercept" is correspondingly larger. The JSE is often but not always larger. Based on comparing the estimated SEs from the sample sizes of 30 complete cases and 48 subjects in total, it looks like JSE is more reliable than SE using method of Section 3.3.2 and nlme( ). The reliability of different approaches for SE estimation is a topic of future research. Although the parameters in the dropout models are nuisance parameters and are usually not of interest, they sometimes may contain useful information. We summarize the estimates of <p = (0o,..., (pi)T in Table 6.3. From Table 6.3, we could see that the estimate of <f>\ is positive and the estimate of <fo is negative. Consider our model (6.1) and (6.2). We may conclude that patients who have higher baseline 58 Table 6.3: Estimations for dropout model parameters. (Example 1) Parameters 0o 01 02 03 04 Estimate 2.64 0.14 -0.11 -0.12 0.07 Standard Error 0.0017 0.11 0.067 0.17 0.42 p- value < 0.001 0.21 0.10 0.48 0.86 viral loads and slower (true, but unobservable) initial decay rate are more likely to dropout. The new treatment may be less efficient on such patients, and ignoring these patients may result in over-optimistic conclusion on the treatment effects. Note that the p-value associated with (pi is 0.21, which may be acceptable in practice. However, <p3 and 0 4 are not statistically significant, and we may conse- quently remove them to simplify the model. Note also that the Wald tests are only approximate here, so these p-values should only serve as rough guidance. In next section, we will perform sensitivity analyses on alternative dropout models. 6.2.4 Sensitivity Analysis It is important to check sensitivity of parameter estimates to various plausible dropout models. Subject-area knowledge may help us to determine alternative dropouts mod- els. It is conceivable that dropout may be related to individual's random effects, current and previous viral load measurements, or covariates such as C D 4 cell counts. Such relationship may be very complicated, but simple logistic regressions may pro- vide reasonable approximation. Note that we should make use of the conclusions from preliminary studies and try to propose simple but plausible dropout models. We should avoid building a too complicated dropout model since the parameters may become non-identifiable (Fitzmaurice et al. 1996). Here we consider the following 59 Table 6.4: Sensitivity analyses for dropout models. (Example 1) Model (6.8) Model (6.9) Model (6.10) Estimate SE JSE Estimate SE JSE Estimate SE JSE ft 12.27 0.02 0.04 12.27 0.02 0.04 11.50 0.02 0.03 ft 0.54 0.10 0.04 0.63 0.08 0.04 0.55 0.06 0.05 ft 35.84 0.39 0.71 36.57 0.50 0.70 35.90 0.29 0.57 ft 2.85 1.75 0.63 1.87 1.12 0.60 1.77 0.07 0.68 ft 5.59 0.76 0.57 7.25 0.80 0.57 6.63 0.17 0.55 ft 7.77 0.02 0.05 7.66 0.02 0.04 7.09 0.01 0.04 ft 0.30 0.03 0.03 0.32 0.03 0.03 0.28 0.01 0.04 ft 1.99 0.03 0.08 1.40 0.02 0.05 1.88 0.02 0.07 ft 0.09 0.01 0.05 0.07 0.02 0.05 0.09 0.01 0.06 (7 0.61 0.02 0.62 0.02 0.80 0.01 plausible dropout models 00 + 01&li + 02&2i + fahi + 04&4i, (6-8) 00 + 01&li + 02&2i (6.9) 0o + <PiVij (6.10) Since we only focus on random-effects based informative missingness in this thesis, model (6.8) and (6.9) are of main interest. Model (6.10) is responses based informative missingness, and Wu (2004) discusses and analyzes a similar dataset based on such approach. Here, we consider this kind of dropout model only for comparison. The parameter estimates under different dropout models are summarized in Table 6.4. The standard error estimates are almost consistent among different dropout models. For model (6.10), the SE estimates described in Section 3.3.2 tends to yield smaller estimates. We find that the resulting parameter estimates for viral dynamic parameters /3 60 log log log P{Rii = l|0>,bj) i - P(Ra = l|0,bO P(Rij = l|0,bi) i - P{Rn = l|0,bi) P(PHJ = l|0,bi) i - P(Rij = l|0,b,) are all comparable. This suggests that the estimation of the viral dynamic parameters may be robust against plausible dropout models, and thus the estimates in Table 6.2 may be reliable. Compared with models (6.8) and (6.9), model (6.10) gives smaller standard errors on the estimates. However the jackknife SEs for the three models are much closer for each beta. Hence this is an indication of potential problems with the method of Section 3.3.2. Or it may be because model (6.10) is more parsimonious than model (6.8) and (6.9) (i.e., less nuisance parameters). Another possible explanation is that model (6.10) models the missing probability at each scheduled time point, while the other two models model the missing probability for each subject. Thus, model (6.10) captured more information. 6.2.5 Conclusion Based on our analyses, we conclude that, for the HIV viral load data, complete case analyses and analyses assuming ignorable missingness may over-estimate the initial decay rate. Assuming non-ignorable missingness incorporates possible mechanism which leads to patients' dropouts, and therefore may give parameter estimates which may be more reliable. Either responses-based or random-effects based non-ignorable dropout models may be used to get valid inferences, and either may be used as a tool for sensitivity analysis for the other one. 61 Table 6.5: Data summary of Example 2. Variable Sample Sample Percentage of mean standard deviation missing values "Vira l load 4.27 1.12 6.5% # of patients: iV=51, Total # of observations: 415, # of visits per patient: rij=4 to 10(mean(nt)=8.1,SD(nj)=1.6)), # of missing values: 27 (from 15 patients). 6.3 Example 2 6.3.1 Data Description Our second example is from another H I V study. The data contain H I V viral measure- ments from 51 patients. The Plasma HIV-1 R N A (viral load) is repeatedly measured on days 1, 2, 3, 7, 10, 14, 28, and weeks 8, 12 and 24, after initiation of an anti-HIV treatment. After the antiviral treatment, the patient's viral loads will decay, and the decay rate may reflect the efficacy of the treatment. As in the first dataset in Example 1, throughout the time course, due to individual characteristics, the viral load may continue to decay, fluctuate, or even start to rise (rebound). We only consider the first three months data since data after three months are likely to be contaminated by long-term clinical factors. The number of measurements for each patient varies from 2 to 7. 15 patients have missing viral loads at scheduled time points due to dropout or other problems. However, different from Example 1, this study does not contain baseline covariates. Four patients are randomly selected and their viral load measurements are plotted in Figure 6.3. We summarize our data in Table 6.5. As in Example 1, visual inspection of the raw data seems to indicate that dropout patients appear to have slower viral decay, compared to the remaining pa- 62 Figure 6.3: Viral loads of four randomly selected patients. (Example 2) 63 tients. Thus, the dropouts are likely to be informative or nonignorable. 6.3.2 Models We consider the following HIV viral dynamic model, which does not contain covariates (Wu and Ding, 1999) V i j = l o g 1 0 ( P l i e - A " t l i + P 2 i e - A 2 ^ ) + ^ I (6-11) log 1 0 (Pii) = fa + bu, \u = Pi + b2U loglo(Ai) = P3 + hi, \2i=Pi + h U (6.12) where j/y is the log 1 0 transformation of viral load for patient i at the j t h visit, i = 1 , . . . , N; j = 1 , . . . , nit where N'= 51 and n, varies from 4 to 10, A H and A 2 I represents two viral decay rates, Pn and P2t are baseline values, bki,k = 1,...,4 are random effects, and £ y represents within individual errors. The responses contain 6.5% missing values. Thus, we also need to assume a model for the dropout mechanism in order to make valid likelihood inference. Note that although dropout models are not verifiable based on observed data, subject-area knowledge and sensitivity analyses based on plausible models may still lead to rea- sonable models. Since, in this thesis, we focus on random-effects based nonignorable missing data, we propose the following missing response model. / f a l y i . b i . Z i . v ^ ) = f[P{Rij = 110,^)^(1 - P{Rii = l l ^ . b , ) ) 1 - ^ 1 O g ( i - P ^ y = l | 0 b ) ) = ^O + 01&li + 02&2i + 03&3i + 04&4i, (6.13) where = ( r ^ , . . . , r * n i ) T is a vector of missing data indicators for individual i such that Vij — 1 if yij is missing and 0 otherwise. 64 Table 6.6: Estimates for dynamic model parameters in Model (6.11) and (6.12). (Example 2) Exact (Nonig.) Exact (Ignor.) Approx. (Nonig.) Comp. Case Est S E J S E Est S E J S E Est S E . J S E Est S E J S E Pi 12.23 0.002 0.004 13.33 0.003 0.004 11.45 0.002 0.005 12.27 0.23 0.01 P2 60.13 0.15 0.27 66.07 1.12 0.35 55.78 0.08 0.25 58.53 2.27 0.59 P3 8.60 0.05 0.07 9.55 0.11 0.08 9.01 0.004 0.10 8.56 0.23 0.19 P* 3.96 0.13 0.20 4.16 0.14 0.22 2.88 0.076 0.25 3.80 0.30 0.39 a 0.05 0.08 0.52 0.08 0.32 0.02 0.20 0.17 6.3.3 Analysis and Results We consider estimating the population parameters (3 = (Pi,..., Pi)T using four meth- ods: the complete-case method, the exact method assuming ignorable missingness, the exact method with nonignorable missingness, and the approximate method with nonignorable missingness. The results are shown in Table 6.6. We see that the results are somewhat different under different methods. For the most important parameter p2, the initial decay rate, the exact method with non- ignorable missingness gives a smaller estimate than the exact method with ignorable missingness, the approximate method gives the smallest estimate, and the complete case method gives a moderate estimate but with large estimation variance. This sug- gests that studies assuming ignorable missing data mechanism or discarding dropout patients may over-estimate the initial viral decay rate. Note that the approximate method gives smaller estimates and standard errors. For some parameters, the es- timate of standard errors in approximate method seems too small. It may indicate that the standard error estimate in Section 4.2.2 is not reliable. For the approximate method, the Jackknife standard errors may be more reliable. 65 Table 6.7: Estimations for dropout model parameters. (Example 2) Parameters 00 01 02 03 04 Estimate 2.95 0.611 -2.046 -0.562 0.1034 Standard Error 0.0346 0.0296 0.448 0.0465 0.125 p-value < 0.001 < 0.001 < 0.001 < 0.001 0.41 The estimates for parameters in the dropout model are summarized in Table 6.7. We find that, except for </>4, all other coefficients are statistically significant. This suggest that the missing mechanism may be non-ignorable, and the missingness may be related to the underlying unobservable individual characteristics (random effects). 6.3.4 Sensitivity Analysis From the previous section, we conclude that the dropout mechanism may be non- ignorable because most p-values associated with the coefficients cp in the dropout model are statistically significant. We still need to check whether different dropout models may affect the estimates of the parameters, because the validation of dropout models are not verifiable based on the observed data. Thus, it is important to check sensitivity of parameter estimates to various plausible dropout models. Subject-area knowledge may help us to determine alternative dropouts models. We consider the following two possible models: 00 + 01< + 02&2i + 03&3i + <t>ihu (6-14) 00 + 01&li + 02&2i + 03&3i, (6.15) 00 + 01&H + 02&2i (6.16) 66 log log log / > ( / 2 y = l|0,b4) 1 - PiRij = l|0,bi) P(Rii = l|0,bi) 1 - P ( i ^ = l|0,bi) P(Ru = l|0,bO \-P{Ra = 1\<PM) Table 6.8: Sensitivity analyses for dropout models. (Example 2) Model (6.14) Model (6.15) Model (6.16) Estimate S.E. J .S .E . Estimate S .E. J . S . E . Estimate S .E. J .S .E . 01 12.23 0.002 0.004 12.23 0.002 0.004 12.25 0.002 0.04 02 60.13 0.15 0.27 60.13 0.14 0.27 60.09 , 0.14 0.25 0z 8.60 0.05 0.07 8.59 0.06 0.07 8.66 0.06 0.09 04 3.96 0.13 0.20 3.99 0.13 0.19 4.02 0.11 0.25 a 0.50 0.08 0.49 0.08 0.44 0.08 The parameter estimates by exact method under different dropout models are summarized in Table 6.8. We find that the three model almost result in the same estimates, and therefore, we may conclude that the parameter estimates may be robust against different dropout models. 6.3.5 Conclusion From the above analyses, we conclude that, for this dataset, the dropout maybe nonignorable. Simply ignoring the dropouts may over-estimate the initial viral decay rate f32. The approximate method is much faster than the exact method. 6.4 Computation Issues S t a r t i n g values. For the E M algorithms in our examples, the starting values for f3 were obtained based on the complete-case methods. For example 1, the starting values for a. in the covariate model were obtained from linear regression models (6.4)-(6.6) using completely observed cases. The starting values for <p were set to be (0O, 01, 02, 03, 04, ) = (1,0,0,0,0). 67 S t o p p i n g rule . The stopping rule for the E M algorithms in our examples is that the relative change in the parameter estimates from successive iterations is smaller than a given tolerance level (e.g. 0.01). However, due to Monte Carlo errors induced by Gibbs sampler, it is difficult to converge for a extremely small tolerance level, otherwise it may converge very slowly. The actual tolerance level we used in our examples is 0.05. The E M is stopped when each of the new parameters estimates falls within 5% difference from the corresponding estimates from the last E M iteration for two consecutive iterations. R u n n i n g t i m e . For the data of Example 1, the algorithm for the exact method converged in about three hours on a S U N Sparc work-station (Ultra-60). For the data of Example 2, the algorithm for the exact method converged in about one hour, while the algorithm for the approximate method converged in about 15 minutes. Thus, the approximate method is computationally much more efficient. S a m p l i n g m e t h o d . In both examples, we use the multivariate rejection sampling method. Other sampling methods may also be applied and may be even more efficient. 68 Chapter 7 Simulation Study 7.1 Introduction In order to evaluate the performance of the two proposed methods: the exact method (EX) and the approximate method (AP) , we conduct a simulation study. In our simulations, we prepare E X and A P in terms of biases and mean-squared errors of their estimates under various situations. We also add the complete-data (CD) method in our comparisons. Section 7.2 describes the data generation models, and Section 7.3 compares the two methods of estimation in three different situations. We conclude our result in Section 7.4. 69 7.2 Design of the Simulation Study 7.2.1 Models We generate the responses variable from the following N L M E yu = l o g 1 0 ( P H e - A l i ^ + P 2 i e - A ^ ) + ^ , (7-1) togioCAi) = Pi+bu, A i i = /32 + 6 2« 1 logio(^2i) = Ih + bsi, A 2 i = /34 + 6 4 i , (7.2) where /3 = (Pi, /32, P3,/?4)T are the model parameters which are of interest, bj = (feii, 62j,b3i, bn)T are assumed to be i.i.d with a normal distribution N(0,D), and JD is a diagonal matrix with rank 4. The true values of f3 is taken as (10,40,8,4) T . The number of individuals is N = 48. The choice of tij,riij, and a2 will be reported in later sections with the results. To evaluate the proposed methods, we generate some missing values of re- sponses yij's as follows. The model for missing responses is log ( — ^ T r T T T T J = 0 ° + 0 i & i » + 02&2i + <i>3b3i + <pibu, (7.3) V I - P(Rij = l\<p,Oi)J where cp = (<f>0,<pi, <p2,<p3, (pi)T and is the missing responses indicator. The above missingness model suggest that the missingness of the responses depends on the ran- dom effects of each individual, and thus the responses is nonignorable missing. We generate missing responses based on model (7.3). We choose appropriate values of cp to mimic certain missing rate, and we generate binary data based on values of cp and bj. If — 1, then j/y is deleted, and if = 0, yij is considered to be observed. 70 7.2.2 Comparison Criteria We compare E X and A P in terms of biases and mean square errors (MSEs). Here, bias and M S E are assessed in terms of percent relative bias and percent relative root mean-squared error, as defined next. The bias for ft, the j t h component of j3, is defined as bias,- = ft-ft, (7.4) where ft is the estimate of ft. The mean-squared error for f3j is defined as MSE,- = bias 2 + s 2, (7.5) where Sj is the simulated standard error of ft. Then, the percent relative bias of ft is defined as Uasj/pj x 100%, (7.6) and the percent relative root M S E is v / M S E " / ! ^ ! x 100%. (7.7) To show the difference between E X and A P , we also calculate the mean and standard error of the absolute differences between the estimates from E X and A P . 7.3 Simulation Results 7.3.1 Comparison of Methods with Varying Missing Rates To check the impact of the missing rates on estimation by E X and A P , we estimate the parameters based on three missing rate respectively. A missing rate of 10% and 71 20%. Specifically, in our model (7.3), we set d> = (-2.6,0.1, -0 .1 , -0 .1 ,0 .1) T to get roughly an 10% missing rate, and set <b = (—1.4,0.1, —0.1,-0.1,0.1) T to get an average of 20% missing rate. Since the random-effects based probability of a missing observation is a constant for a particular subject, there are situations where a subject has a high missing probability and the responses are missing for A L L visits. To overcome this difficulty, we always keep the responses for the first two visits. That is, every subject has at least two observations. Specifically, we choose ij = (0.05,0.1,0.15,0.2,0.3,0.4,0.6,0.8,1.0), the relative scheduled time points, for each subject, and generate (j = 1, . . . ,rij = 9). If = 1, we manually remove the jth value from the generated responses yij (j = 1 , . . . , = 9) for the zth subject. We choose D = diag(4,25,4,1) and a — 0.5 in these simulations. To check the effect of each factor, we change a factor each time and compare the estimates with the original ones. Therefore, in the following tables, one half of each table is exactly the same. Table 7.1 shows average simulation results based on 150 simulations. We see that the exact method performs better than the approximate method in the sense that the exact method yields smaller relative M S E and smaller bias. C D may yield very biased estimates, especially for the most important parameter ft, the initial viral decay rate. A l l methods performs better when the missing rate is lower. The absolute differences between E X and A P are similar under different missing rates. 72 Table 7.1: Simulation results for varying missing rates. Missingness True rate (%) values %bias %VMSE | E X - A P | E X A P C D E X A P C D Mean SD 10 ft = : 10 3.5 3.2 2.9 4.7 12.11 20.10 0.05 0.02 ft = = 40 -3.1 -5.5 -2.5 9.5 27.82 21.77 0.97 0.64 ft = = 8 0.22 -0.5 0.76 4.1 5.64 11.76 0.06 0.03 04- = 4 0.33 -2.8 1.10 6.7 7.50 9.01 0.03 0.01 20 01 = = 10 4.71 3.66 -10.53 11.03 21.22 25.03 0.08 0.06 02 = = 40 6.11 -10.29 30.33 12.76 21.83 40.93 1.33 0.84 ft = 8 3.23 -3.01 -6.11 5.32 12.78 17.19 0.06 0.05 ft = 4 -2.77 5.75 -3.75 5.66 22.50 24.71 0.07 0.03 7.3.2 Comparison of Methods with Different Random Effects Covariances To see how the variability of bj affects the estimates from the three methods, we consider two variance-covariance matrices for bf. / 1 0 0 0 \ / 4 0 0 0 0 9 0 0 0 25 0 0 0 0 1 0 Var(bj) = and Var(bj) \ 0 0 0 1 J 0 0 4 0 \Q 0 0 1 J Table 7.2 shows average simulation results based on methods E X , A P , and C D . We choose U = (0.05,0.1,0.15,0.2,0.3,0.4,0.6,0.8,1.0), the relative scheduled time points, for each subject, and generate (j = 1, . . . ,nj = 9). If = 1, we manually remove the j t h value from the generated responses yij (j = 1,. . . ,nj = 9) for the ith subject. We choose <p = (—2.0,0.1, —0.1, —0.1,0.1)T and a = 0.5 in these simulations. The response missing rate is roughly 10%. We may conclude that, for most of the parameters, the exact method performs better than the approximate method in the 73 Table 7.2: Simulation results for different covariance matrices for random effects. Covariance True %bias % \ / M S E | E X - A P | Matrix values E X A P C D E X A P C D Mean SD Var(bi) A = 10 2.7 2.4 2.1 3.9 5.6 9.1 0.02 0.01 =diag(l ,9, l , l ) /32 = 40 2.3 -9.0 5.0 6.0 11.2 15.7 0.23 0.14 ft = 8 0.4 3.1 2.1 2.1 2.4 12.6 0.02 0.01 ft = 4 1.5 2.5 5.0 1.9 5.5 6.5 0.03 0.02 Var(bi) ft = 10 3.5 3.2 2.9 4.7 12.11 20.10 0.05 0.02 =diag(4,25,4,l) ft = 40 -3.1 -5.5 -2.5 9.5 27.82 21.77 0.97 0.64 ft = 8 0.22 -0.5 0.76 4.1 5.64 11.76 0.06 0.03 ft = 4 0.33 -2.8 1.10 6.7 7.50 9.01 0.03 0.01 The lower half of this table is identical to the upper half of Table (7.1) sense that the exact method yields smaller relative M S E s and smaller bias. Both methods perform better when the variances of the random effects are smaller. For ft, the estimate under approximate method has slightly smaller bias. The absolute differences between E X and A P have smaller mean and standard deviation when the variances of b are smaller. 7.3.3 Comparison of Methods with Varying Intra-individual Measurements To examine how the number of intra-individual measurements affect our estimates, we consider the two methods of estimation under two maximum number of measure- ments, n* = 9 and n, — 15. For the case in which maximum number of measurements is 9, we choose U = (0.05,0.1,0.15,0.2,0.3,0.4,0.6,0.8,1.0) as the relative scheduled time points, for each subject. For the case in which the maximum number of mea- surements is 5, we choose U = (0.01,0.02,0.04,0.07,0.1,0.15,0.2,0.3,0.4,0.5,0.6,0.7, 0.8,0.9,1.0) as the relative scheduled time points for each subject. In both cases, 74 Table 7.3: Simulation results for varying intra-individual measurements. # of obs. True %bias %VMSE | E X - A P | per patient values E X A P C D E X A P C D Mean SD 2 to 9 ft = 10 3.5 3.2 2.9 4.7 12.11 20.10 0.05 0.02 ft = 40 -3.1 -5.5 -2.5 9.5 27.82 21.77 0.97 0.64 ft = 8 0.22 -0.5 0.76 4.1 5.64 11.76 0.06 0.03 ft = 4 0.33 -2.8 1.10 6.7 7.50 9.01 0.03 0.01 2 to 15 ft = 10 2.1 3.0 2.8 3.5 10.9. 11.4 0.03 0.01 ft = 40 -1.9 -3.0 5.7 7.5 11.9 9.6 0.22 0.12 ft = 8 -0.10 0.1 1.3 2.7 4.9 9.5 0.03 0.02 ft = 4 0.30 -2.0 0.88 3.6 2.7 6.1 0.05 0.02 The upper half of this table is identical to the upper half of Table (7.1) we always keep the responses values for the first two visits of each patients, but the responses from the third visit may be missing. Therefore, the actual number of ob- servations for each patient may vary from 2 to 9 or 2 to 15 respectively. Additionally, We use the missing rate of 10% (roughly), Var(bj)=diag(4,25,4,1), and a = 0.5 in these simulations. The results of 100 simulations are summarized in Table 7.3. We see that, when there are more intra-individual measurements, all methods performs better. Again, the exact method performs better than the approximate method, no matter what the number of intra-individual measurements are. C D performs worse than E X and A P . The absolute differences between E X and A P have smaller mean and standard deviation when there are more observations for each subjects. 7.3.4 Comparison of Methods with Different Variances To investigate the impact of intra-individual variability on E X and A P , we estimate the parameters based on two data generation strategy, with a — 0.5 and 1 respectively. 75 Table 7.4: Simulation results for varying variances. a True %bias % y / M S E | E X - A P ] values E X A P C D E X A P C D Mean SD a = 0.5 0! = 10 3T5 O 2~9 4 7 12.11 20.10 O05 0.02 02 = 40 -3.1 -5.5 -2.5 9.5 27.82 21.77 0.97 0.64 03 = 8 0.22 -0.5 0.76 4.1 5.64 11.76 0.06 0.03 04 = 4 0.33 -2.8 1.10 6.7 . 7.50 9.01 0.03 0.01 a = 1 ft = 10 3.4 3.3 3.1 6.3 14.1 19.18 0.06 0.04 02 = 40 -2.7 -5.1 6.9 8.8 27.3 27.10 0.78 0.54 03 = 8 0.31 -0.6 1.2 6.35 9.8 9.28 0.05 0.08 04 = 4 0.23 -2.7 1.9 7.15 9.7 17.3 0.03 0.03 The lower half of this table is identical to the upper half of Table (7.1) Table 7.4 shows the average simulation results based on 150 simulations. In these simulations, we choose U = (0.05,0.1,0.15,0.2,0.3,0.4,0.6,0.8,1.0) as the rel- ative scheduled time points for each subject, and generate (j = 1 , . . . ,nj = 9). If Tij = 1, we manually remove the j t h value from the generated responses y^ (j = 1,. . . ,rij = 9) for the ith subject. We choose Var(bi)=diag(4,25,4,l) and the missing rate is 10% roughly. We find that, E X performs better than A P and yields less biased estimates. Complete-case methods tend to over estimate the initial decay rate, and yields large estimation variances. The absolute differences between E X and A P are relatively similar . 7.4 Conclusions Based on the simulation results in the preceding sections, we may conclude as follows. • E X and A P results are quite close relatively. 76 • For most cases, E X performs better than A P in the sense that E X yields smaller bias and smaller relative M S E s . However, in some cases, the estimates from A P may have smaller bias. C D often performs the worst in sense of large M S E . • Both methods perform better when the missing rates are lower. • Both methods perform better when the variances of random effects are smaller. • Both methods perform better when the number of observations for each subject is larger. • The Complete-case method tends to overestimate the initial decay rate and the corresponding estimates are positive biased, especially when the missing rate is large. Note that, in our simulation studies, A P is computationally more efficient than E X . The running time of A P is about 1/4 of that of the E X , and it does not have convergence problems. In practice, the E X is preferable when the computation load is not too heavy. When E X is too slow, A P is preferable. 77 Chapter 8 Conclusion and Discussion In this thesis, we have proposed two methods to estimate the parameters for N L M E s with random effects based informative dropout and missing covariates. The proposed methods include an exact method and an approximate method, both are implemented by a Monte Carlo E M algorithm. For the exact method, sampling the random ef- fects may offer potential computational difficulties such as slow or non-convergence, especially when the dimension of random effects is large. To overcome this difficulty, we proposed an approximate method which integrates out the random effects in the E-step and thus avoids sampling the random effects in the Monte Carlo E M . Pinheiro and W u (2001) show that convergence rate of the E M algorithm can be improved greatly by integrating out the random effects. We also conducted a simulation study to compare the performance of the exact method and the approximate method. In our simulations, the exact method gives somewhat more reliable results than the approximate method in the sense that it provides smaller mean squared errors on the parameter estimates. Our simulations also suggest that the proportion of missing values, the variances of random effects, the number of intra-individual measurements, and the intra-individual variabilities 78 may affect the performance of the exact method and the approximate method. The exact method was applied to an H I V dataset with missing covariates and informative dropouts. We find that patients who have higher baseline C D 4 cell counts and slower initial viral decay rates may be more likely to dropout. Thus, ignoring dropouts may lead to over-optimistic assessment of the antiviral treatment. We also applied the exact and the approximate method on the second H I V dataset. We obtain similar conclusions as the first dataset. We find that the approximate method may under-estimate the initial viral decay rate, and thus may be somewhat conservative in assessing the treatment effect. It may be caused by the lack of adequacy of our approximate model. We also notice that the estimation of standard errors in approx- imate method is not reliable, and Jackknife standard error is recommended. But, in practice, the approximate method is computationally much more efficient than the exact method. Since the parametric models for the missing responses are not testable based on the observed data, it is important to conduct sensitivity analyses. Based on our sensitivity analyses, we find that the estimates are robust to different dropout models. Finally, we give an outline for possible future work. (1) For simplicity, in our examples and simulations, we only include random effects in the dropout models. It is conceivable that the dropout probability may also depend on covariates and the responses. In the future, we may study dropout models which simultaneously consider random effects, covariates, and responses. (2) In our study, we only consider nonlinear mixed effect models for normal data. Generally, our proposed methods may be extended to other models, such as generalized linear mixed effects models ( G L M M s ) and generalized nonlinear 79 mixed effects models ( G N L M M s ) . (3) We have only considered baseline covariates. Our proposed methods may be extended to time-dependent covariates. (4) Multivariate rejection sampling methods were used in our analyses and simula- tion. In general, other sampling methods, such as adaptive rejection sampling methods and importance sampling methods, may also be used and may be even more efficient. (5) In our approximate method, we used first order Taylor expansion to linearize the model. In general, it is not necessarily a good approximation. In the future, we may investigate better approximation, such as higher order Taylor expansion, Laplace approximation, etc. (6) Investigation of accuracy of standard error estimation. In our work, the estima- tion of SEs maybe unreliable, especially in the approximate method. (7) In our work, we consider ej|/3i ~ iV(0, a21). In the future, we may consider more complicated relationships on the error terms. We may consider random effects model for longitudinal data with being an AR(1) time series. 80 References Booth, J . G . and Hobert, J . P. (1999). Maximizing generalized linear mixed models likelihoods with an automated Monte Carlo E M algorithm. Journal of the Royal Statistical Society, Ser. B. 61, 265-285. Davidian, M . , and Giltinan, D. M . (1995). Nonlinear Models for Repeated Measure- ments Data. Chapman & Hall. Dempster, A . P., Laird, N . M . , and Rubin, D . B . (1977) Maximum likelihood esti- mation from incomplete data via the E M algorithm (with discussion). Journal of the Royal Statistical Society, Ser. B. 39, 1-38. Diggle, P., Heagerty, P., Liang, K , and Zeger, S. (2002) Analysis of Longitudinal Data, 2nd Edition. Oxford. Diggle, P. and Kenward, M . G . (1994). Informative drop-out in longitudinal data analysis (with discussion). Applied Statistics 43, 49-93. Ding, A . and W u , H . (2001). Assessing antiviral potency of anti-HIV therapies in vivo by comparing viral decay rates in viral dynamic models. Biostatistics 2(1), 13-29. Fitzmaurice, G . M . , Laird, N . M . , and Zahner, G . E . P. (1996). Multivariate logistic 81 models for incomplete binary responses. Journal of the American Statistical Association 91, 99-108. Gelfand, A . E . and Smith, A . F . M . (1990). Sampling-based approaches to calcu- lating marginal densities. Journal of the American Statistical Association 85, 398-409. Gilks, W . R. and Wi ld , P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics 41, 337-348. Ho, D . D. , Neumann, A . U . , Perelson, A . S., Chen, W . , Leonard, J . M , and Markowitz, M . (1995). Rapid turnover of plasma virions and C D 4 lymphocytes in HIV-1 infection. Nature 373, 123-126. Ibrahim, J . G . (1990). Incomplete data in generalized linear models. Journal of the American Statistical Association 85, 765-769. Ibrahim, J . G . , Lipsitz, S. R., and Chen, M . H . (1999). Missing covariates in gener- alized linear models when the missing data mechanism is nonignorable. Journal of the Royal Statistical Society, Ser. B 61, 173-190. Ibrahim, J . G . , Lipsitz, S. R., and Chen, M . H . (2001). Missing responses in gen- eralized linear models when the missing data mechanism is nonignorable. Bio- metrics 88, 551-564. Lindstrom, M . J . and Bates, D . M . (1990). Nonlinear mixed effects models for repeated measures data. Biometrics 46, 673-687. Little, R. J . A . and Rubin, D. B. (1987). Statistical Analysis with Missing Data. New York: John Wiley. 82 Little, R. J . A . (1992). Regression with missing X ' s : a review. Journal of the American Statistical Association 87, 1227-1237. Little, R. J . A . (1995). Modeling the drop-out mechanism in repeated measures studies. Journal of the American Statistical Association 90, 1112-1121. Little, R. J . A . and Schlucher, M . D . (1985). Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika, 72, 497-512. L i u , C , Rubin, D. B . , and W u , Y . N . (1998), Parameter expansion for E M acceler- ation: the P X - E M algorithm, Biometrika 85, 755-770. Louis, T . A . (1982), Finding the observed information matrix when using the E M algorithm. Journal of the Royal Statistical Society, Ser. B. 44, 226-233. McLachlan, G . J . and Krishnan, T . (1997). The EM-algorithm and Extension. John Wiley & Sons. Meng, X . L . and van Dyk, D . A . (1997). The E M algorithm - an old folk-song sung to a fast new tune (with discussion). Journal of the Royal Statistical Society, Ser. B. 59, 511-567. Perelson, A . S., Neumann, A . U . , Markowitz, M . , Leonard, J . M . , and Ho, D . D . (1996). HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science 271, 1582-1586. Pinheiro, J . C . and Bates, D . M . (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects models. Journal of Computational and Graphical Statistics 4, 12-35. 83 Pinheiro, J . C . and W u , Y . (2001). Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate i-distribution. Journal of Computational and Graphical Statistics 10, 249-276. Robert, C . P. and Casella G . (1999). Monte Carlo Statistical Methods. Springer- Verlag, New York. Roy, J . and L i n , X . (2002). Analysis of multivariate longitudinal outcomes with nonignorable dropouts and missing covariates: changes in methadone treatment practices. Journal of the American Statistical Association 97, 40-52. Ten Have, T . R., Pulkstenis, E . , Kunselman, A . , and Landis, J . R. (1998). Mixed effects logistics regression models for longitudinal binary response data with informative dropout. Biometrics 54, 367-383. Vonesh, E . F . , and Chinchilli, V . M . (1996). Linear and Nonlinear Models for the Analysis of Repeated Measurements. Marcel Dekker, New York. Vonesh, E . F . , Wang, H . , Nie, L . , and Majumdar, D . (2002). Conditional second- order generalized estimating equations for generalized linear and nonlinear mixed- effects models. Journal of the American Statistical Association 97, 271-283. Wei, G . C . and Tanner, M . A . (1990). A Monte Carlo implementation of the E M algorithm and the poor man's data augmentation algorithm. Journal of the American Statistical Association 85, 699-704. Wolfinger, R. (1993). Laplace's approximation for nonlinear mixed models, Biometrika 80, 791-795. 84 Wu, M . C . and Carroll, R. J . (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics 55, 410-418. W u , H . and Ding, A . (1999). Population HIV-1 dynamics in vivo: applicable models and inferential tools for virological data from A I D S clinical trials. Biometrics 55, 410-418. W u , H . and W u , L . (2001). A multiple imputation method for missing covariates in nonlinear mixed-effect models, with application to H I V dynamics. Statistics in Medicine 20, 1755-1769. W u , K . (2003). Simultaneous inference for generalized linear mixed models with informative dropout and missing covariates. Master Thesis, the University of British Columbia. W u , L . (2002). A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to A I D S studies, Journal of the American Statistical Association 97, 955-964. W u , L . (2004, a). Exact and approximate inferences for nonlinear mixed-effects models with missing covariates, Journal of the American Statistical Association 99, 700-709. W u , L . (2004, b). Exact and approximate simultaneous inferences for nonlinear mixed-effects models with dropouts and missing covariates, Technical report, the Department of Statistics, the University of British Columbia. W u , L . and W u , H . (2002). Missing time-dependent covariates in H I V viral dynamic models. Journal of the Royal Statistical Society, Ser. C. 51, 297 - 318. 85
Cite
Citation Scheme:
Usage Statistics
Country | Views | Downloads |
---|---|---|
China | 2 | 0 |
Japan | 2 | 0 |
City | Views | Downloads |
---|---|---|
Beijing | 2 | 0 |
Tokyo | 2 | 0 |
{[{ mDataHeader[type] }]} | {[{ month[type] }]} | {[{ tData[type] }]} |
Share
Share to: