{"http:\/\/dx.doi.org\/10.14288\/1.0092156":{"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool":[{"value":"Science, Faculty of","type":"literal","lang":"en"},{"value":"Statistics, Department of","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider":[{"value":"DSpace","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#degreeCampus":[{"value":"UBCV","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/creator":[{"value":"Song, Shijun","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/issued":[{"value":"2009-12-15T21:31:29Z","type":"literal","lang":"en"},{"value":"2005","type":"literal","lang":"en"}],"http:\/\/vivoweb.org\/ontology\/core#relatedDegree":[{"value":"Master of Science - MSc","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#degreeGrantor":[{"value":"University of British Columbia","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/description":[{"value":"Nonlinear mixed effects models (NLMEs) are very popular in many longitudinal\r\nstudies such as HIV viral dynamic studies, pharmacokinetics analyses, and studies\r\nof growth and decay. In these studies, however, missing data problems often arise,\r\nwhich make some statistical analyses complicated. In this thesis, we proposed an\r\nexact method and an approximate method for NLMEs with random-effects based informative\r\ndropouts and missing covariates, and propose methods for simultaneous\r\ninference. Monte Carlo E M algorithms are used in both methods. The approximate\r\nmethod, which is based on a Taylor series expansion, avoids sampling the random\r\neffects in the E-step and thus reduces the computation burden substantially. To illustrate\r\nthe proposed methods, we analyze two real datasets. The exact method is\r\napplied to a dataset with covariates and a dataset without covariates. The approximate\r\nmethod is applied to the dataset without covariates. 
The results show that, for\r\nboth datasets, dropouts may be correlated with individual random effects. Ignoring\r\nthe missingness or assuming ignorable missingness may lead to unreliable inferences.\r\nA simulation study is performed to evaluate the two proposed methods under various\r\nsituations.","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO":[{"value":"https:\/\/circle.library.ubc.ca\/rest\/handle\/2429\/16692?expand=metadata","type":"literal","lang":"en"}],"http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note":[{"value":"Nonlinear Mixed Effects Models with Dropout and Missing Covariates When the Dropout Depends on the Random Effects by Shijun Song B.Sc., Peking University, 2003 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE STUDIES (Statistics) The University of British Columbia August 2005 \u00a9 Shijun Song, 2005 Abstract Nonlinear mixed effects models (NLMEs) are very popular in many longitudinal studies such as HIV viral dynamic studies, pharmacokinetics analyses, and studies of growth and decay. In these studies, however, missing data problems often arise, which make some statistical analyses complicated. In this thesis, we propose an exact method and an approximate method for NLMEs with random-effects-based informative dropouts and missing covariates, and we propose methods for simultaneous inference. Monte Carlo EM algorithms are used in both methods. The approximate method, which is based on a Taylor series expansion, avoids sampling the random effects in the E-step and thus reduces the computation burden substantially. To illustrate the proposed methods, we analyze two real datasets. The exact method is applied to a dataset with covariates and a dataset without covariates. The approximate method is applied to the dataset without covariates. 
The results show that, for both datasets, dropouts may be correlated with individual random effects. Ignoring the missingness or assuming ignorable missingness may lead to unreliable inferences. A simulation study is performed to evaluate the two proposed methods under various situations. Contents Abstract Contents List of Tables List of Figures Acknowledgements Dedication 1 Introduction 1.1 Longitudinal Data Analysis 1.1.1 Longitudinal Studies 1.1.2 Approaches to Longitudinal Data Analysis 1.2 Missing Data Problems 1.2.1 Missing Covariates and Responses 1.2.2 Classification of Missing Value Mechanisms 1.2.3 Literature on Missing Data Problems 1.3 Motivating Examples 1.4 Objectives and Outline 2 Nonlinear Mixed Effects Models 2.1 Introduction 2.2 Nonlinear Mixed Effects Models 2.3 Literature Review on NLME Models with Informative Missing Data 3 An Exact Method for NLME Models with Informative Dropout and Missing Covariates 3.1 Introduction 3.2 The Models 3.3 A Monte Carlo EM Method 3.3.1 E-step 3.3.2 M-step 3.4 Sampling Methods 3.4.1 Gibbs Sampler 3.4.2 Multivariate Rejection Algorithm 3.4.3 Importance Sampling 3.5 Convergence 4 An Approximate Method for NLME Models with Informative Dropout and Missing Covariates 4.1 Introduction 4.2 The Approximate Method 4.2.1 E-step 4.2.2 M-step 4.3 Monte Carlo Sampling 5 Covariate Models and Dropout Models 5.1 Introduction 5.2 Covariate Models 5.3 Dropout Models 5.4 Sensitivity Analyses 6 Data Analysis 6.1 Introduction 6.2 Example 1 6.2.1 Data Description 6.2.2 Models 6.2.3 Analysis and Results 6.2.4 Sensitivity Analysis 6.2.5 Conclusion
6.3 Example 2 6.3.1 Data Description 6.3.2 Models 6.3.3 Analysis and Results 6.3.4 Sensitivity Analysis 6.3.5 Conclusion 6.4 Computation Issues 7 Simulation Study 7.1 Introduction 7.2 Design of the Simulation Study 7.2.1 Models 7.2.2 Comparison Criteria 7.3 Simulation Results 7.3.1 Comparison of Methods with Varying Missing Rates 7.3.2 Comparison of Methods with Different Random Effects Covariances 7.3.3 Comparison of Methods with Varying Intra-individual Measurements 7.3.4 Comparison of Methods with Different Variances 7.4 Conclusions 8 Conclusion and Discussion References List of Tables 6.1 Data summary of Example 1 6.2 Estimates for response model parameters (Example 1) 6.3 Estimates for dropout model parameters (Example 1) 6.4 Sensitivity analyses for dropout models (Example 1) 6.5 Data summary of Example 2 6.6 Estimates for dynamic model parameters in Models (6.11) and (6.12) (Example 2) 6.7 Estimates for dropout model parameters (Example 2) 6.8 Sensitivity analyses for dropout models (Example 2) 7.1 Simulation results for varying missing rates 7.2 Simulation results for different covariance matrices for random effects 7.3 Simulation results for varying intra-individual measurements 7.4 Simulation results for varying variances List of Figures 1.1 Hypothetical data on the relationship between height and age 1.2 Viral loads of four randomly selected patients 6.1 Viral loads of four randomly selected patients (Example 1) 6.2 Q-Q plots for covariates (Example 1) 6.3 Viral loads of four randomly selected patients (Example 2) Acknowledgements First and foremost, I would like to thank my supervisor, Dr. Lang Wu, for his excellent guidance and immense help during my study at the University of British Columbia.
Without his support, expertise and patience, this thesis would not have been completed. Also, I would like to thank my co-supervisor and second reader, Dr. Harry Joe, for his invaluable comments and suggestions on this thesis. I would also like to thank Ms. Kunling Wu, a previous student of Dr. Lang Wu. Her master's thesis and personal advice benefited me greatly in the completion of this thesis. Furthermore, I would like to thank Dr. John Petkau for his invaluable advice on my consulting projects, which has benefited me greatly and will continue to do so. I thank all the faculty and staff in the Department of Statistics at the University of British Columbia for providing such a nice academic environment. I should also thank all the graduate students in the Department of Statistics for making my study so enjoyable. Most importantly, I would like to thank my parents for loving me and believing in me. Their love, constant support and encouragement push me to be the best at everything I do. SHIJUN SONG The University of British Columbia August 2005 To my parents. Chapter 1 Introduction 1.1 Longitudinal Data Analysis 1.1.1 Longitudinal Studies The key characteristic of a longitudinal study is that individuals are measured repeatedly over time. Longitudinal studies differ from cross-sectional studies, in which a single outcome is measured for each individual. In many studies, especially in clinical trials, longitudinal data are very common. Even when it is possible to address the same scientific questions in either a longitudinal or a cross-sectional study, there may be many advantages to addressing them in a longitudinal study. Figure 1.1 illustrates this idea. In Figure 1.1(a), height is plotted against age for a hypothetical cross-sectional study of boys. Height appears to be shorter among older boys. In Figure 1.1(b), we connect the data points from each individual. Now, it is clear that everyone's height increases with age.
This example shows that longitudinal studies can distinguish changes over time within individuals from differences among people in their baseline levels. Cross-sectional studies cannot. [Figure 1.1: Hypothetical data on the relationship between height and age; panel (a) pools all boys, panel (b) connects each boy's measurements.] Longitudinal data can be collected either prospectively, following subjects forward in time, or retrospectively, by extracting multiple measurements on each person from historical records. Longitudinal data require special statistical methods because the set of observations on one subject tends to be inter-correlated. This correlation must be taken into account to draw valid inferences. Correlation is also taken into account when analyzing a single long time series of measurements. In most time series studies, there is only one series available, and people usually try to find clues and draw conclusions from that series itself. Analysis of longitudinal data tends to be simpler because subjects are usually assumed independent. Valid inferences can be made by borrowing information across people. That is, the consistency of a pattern across subjects is the basis for substantive conclusions. For this reason, inferences from longitudinal studies can be made more robust to model assumptions than those from time series data, particularly to assumptions about the nature of the correlation. 1.1.2 Approaches to Longitudinal Data Analysis Let y_ij represent a response variable and x_ij represent a p \u00d7 1 vector of p explanatory variables observed at time point t_ij, for measurement j on subject i, j = 1, ..., n_i, i = 1, ..., N. The mean and variance of y_ij are represented by E(y_ij) = \u03bc_ij and Var(y_ij) = \u03c3\u00b2_ij. The set of repeated outcomes for subject i are collected into an n_i \u00d7 1 vector, y_i = (y_i1, ..., y_in_i)^T, with mean E(y_i) = \u03bc_i and n_i \u00d7 n_i covariance matrix Var(y_i) = V_i, where the (j, k) element of V_i is the covariance between y_ij and y_ik, denoted by Cov(y_ij, y_ik) = v_ijk.
The covariate matrix for the ith subject is denoted by X_i = (x_i1, ..., x_in_i)^T, an n_i \u00d7 p matrix. We use R_i for the n_i \u00d7 n_i correlation matrix of y_i. The responses for all subjects are denoted by y = (y_1^T, ..., y_N^T)^T, which is an m \u00d7 1 vector with m = \u03a3_{i=1}^N n_i. The covariates for all subjects are denoted by X = (X_1^T, ..., X_N^T)^T, which is an m \u00d7 p matrix. There are three approaches to longitudinal data analysis. The first approach, often called the marginal model approach, is to model univariate responses ignoring dependence. Marginal methods are mainly used for regression with dependent data when the main interest is inference about the regression parameters. For example, in a clinical trial the difference between control and treatment is most important, not the difference for any one individual. A second approach, the random effects model approach, assumes that correlation arises among repeated responses because the regression coefficients vary across individuals. Here, we model the response given the individual-specific coefficients by h(E(y_ij | \u03b2_i)) = x_ij^T \u03b2_i, (1.1) where h(\u00b7) is a link function. For normal responses, it may be the identity, and for binary responses, it may be the logit (log odds). Usually, there are too few data on a single person to estimate \u03b2_i from (y_i, X_i) alone. We further assume that the \u03b2_i's are independent realizations from some distribution with mean \u03b2. We can write \u03b2_i = \u03b2 + b_i, where \u03b2 is fixed and b_i is a vector of zero-mean random variables. Then the basic assumption can be restated in terms of the latent variables b_i. That is, there are unobserved factors, represented by b_i, that are common to all responses for a given individual but vary across individuals. Random effects models are particularly useful when inferences are to be made about individual trajectories, such as in AIDS studies. They focus on both the population parameters \u03b2 and the individual characteristics b_i.
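The two-stage structure \u03b2_i = \u03b2 + b_i can be made concrete with a small simulation. This sketch is not from the thesis; all parameter values and variable names (beta, D, sigma) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_i = 200, 5                       # subjects, measurements per subject
beta = np.array([2.0, 0.5])           # population intercept and slope
D = np.array([[1.0, 0.2],
              [0.2, 0.1]])            # random-effects covariance matrix
sigma = 0.3                           # within-subject error SD
t = np.arange(n_i, dtype=float)       # common measurement times

# Stage 2: individual-specific coefficients beta_i = beta + b_i, b_i ~ N(0, D)
b = rng.multivariate_normal(np.zeros(2), D, size=N)
beta_i = beta + b

# Stage 1: y_ij = beta_i0 + beta_i1 * t_j + e_ij, e_ij ~ N(0, sigma^2)
y = beta_i[:, [0]] + beta_i[:, [1]] * t + rng.normal(0, sigma, (N, n_i))

# Sharing b_i induces correlation among repeated measures on the same subject
r = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
print(f'within-subject correlation of first two measurements: {r:.2f}')
```

Repeated measures that share a b_i are strongly correlated while the subjects themselves remain independent, which is exactly the structure that the mixed-model machinery of later chapters exploits.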
The third approach is the transition model approach. This focuses on the conditional distribution of y_ij given the past outcomes y_i,j-1, ..., y_i1. Here, the data analyst specifies a regression model for h(E(y_ij | y_i,j-1, ..., y_i1, x_ij)) as an explicit function of x_ij and of the past responses. In each of the three approaches, we consider both the dependence of the responses on explanatory variables and the correlation among the responses. With cross-sectional data, only the dependence of the responses on the explanatory variables needs to be specified; there is no correlation of responses. In longitudinal studies, in which correlation usually exists among responses, there are at least two consequences of ignoring it: (1) incorrect inferences about the regression coefficients \u03b2; in particular, confidence intervals based on an assumption of independence are too short when in fact there is positive dependence; (2) the estimation of \u03b2 may be inefficient, that is, less precise than possible. 1.2 Missing Data Problems 1.2.1 Missing Covariates and Responses In many applications, especially in longitudinal studies, missing data are a serious problem. Ignoring missing data or using over-simplified methods to handle missing data often leads to invalid inferences. Thus, it is very important to find appropriate approaches to deal with missing data. Two kinds of missing data are common in longitudinal studies: (i) missing covariates; and (ii) missing responses due to dropout or missed visits. For example, individuals may not come to the study center for measurements at scheduled time points for various reasons, or they may even drop out permanently because of drug intolerance or death. Missing data make statistical analysis in longitudinal studies much more complicated, because standard methods, which are usually designed for complete data, are not directly applicable.
Commonly used naive methods for missing data include the complete-case method, which uses only the complete observations and deletes all incomplete observations; the mean imputation method, which replaces the missing values by the mean values of the observed data; and the last-value-carried-forward method, which imputes a missing value by the most recent previously observed value. 1.2.2 Classification of Missing Value Mechanisms In the presence of missing data, the missing data mechanism must be taken into account in order to obtain valid statistical inferences. Little and Rubin (1987) and Little (1995) give a general treatment of statistical analysis with missing values. Let y = (y^(o), y^(m)), with y^(o) denoting the measurements actually obtained and y^(m) denoting the measurements which would have been available had they not been missing. Let r denote a set of indicator random variables denoting which elements of y fall into y^(o) and which into y^(m). A probability model for the missing value mechanism then defines the probability distribution of r. Little and Rubin (1987) classify the missing value mechanisms as follows. \u2022 Missing data are missing completely at random (MCAR) if the probability of missingness is independent of both observed and unobserved data. When missing data are caused by features of the study design, rather than the behavior of the study subjects, the MCAR mechanism may be quite plausible. For example, some values may be missing for reasons irrelevant to the treatment, such as medical equipment breaking down on a certain day. So missingness is MCAR if r is independent of both y^(o) and y^(m). \u2022 Missing data are missing at random (MAR) if the probability of missingness depends only on observed data, but not on unobserved data. For example, a patient may fail to visit the clinic because he\/she is too old. In mathematical notation, missingness is MAR if r is independent of y^(m).
\u2022 Missing data are nonignorable or informative (NIM) if the probability of missingness depends on unobserved data. To be specific, NIM has two cases in the context of random effects models: (i) the missingness depends on unobserved responses; for example, a patient fails to visit the clinic because he\/she is too sick. We call the missingness outcome-based informative (Little, 1995) if r is dependent on y^(m). (ii) The probability of missingness depends on unknown random effects (i.e., individual characteristics such as individual decay rates), which may substantially affect the responses. We call the missingness random-effect-based informative (Little, 1995) if r is dependent on the random effects b_i. It turns out that, for likelihood-based inference, the crucial distinction is between random and informative missing values. Both the MCAR and MAR missing mechanisms are sometimes referred to, without distinction, as ignorable. Little and Rubin (1987) show that, when missing data are nonignorable, likelihood inference must incorporate the missing data mechanism. 1.2.3 Literature on Missing Data Problems Little (1992) reviewed methods of estimation in regression models with missing covariates. Six methods dealing with missing covariates are compared: complete-case methods, available-case methods, least squares on imputed data, maximum likelihood methods, Bayesian methods, and multiple imputation. He suggested that the maximum likelihood method, Bayesian methods, and the multiple imputation method perform well, with the maximum likelihood method preferred in large samples and Bayesian methods or multiple imputation preferred in small samples. Ibrahim (1990) considered missing covariates (MAR) in generalized linear models (GLMs) with discrete covariates, and applied the EM algorithm to obtain MLEs under the assumption that the missing covariates are from a discrete distribution.
Ibrahim, Lipsitz, and Chen (1999) proposed a Monte Carlo EM algorithm for GLMs with nonignorable missing covariates. Wu and Carroll (1988) considered linear mixed effects models (LMEs) with informative dropout under the assumption that the informative dropout can be modeled by a probit model which includes the random effects as its covariates. Diggle and Kenward (1994) considered general approaches to informative dropouts in multivariate data and longitudinal data. They showed that incorporating informative dropout mechanisms in the statistical inference reduces the bias caused by treating the informative dropout as merely MAR. Ten Have et al. (1998) discussed mixed effects logistic regression models for longitudinal binary responses with informative dropout. Roy and Lin (2002) considered multivariate longitudinal data with nonignorable dropouts and missing covariates. Little (1995) gives an excellent review on modeling the dropout mechanism in repeated-measures studies. Dropout models are classified into selection models and pattern-mixture models. The main difference between the two types of dropout models is that the form of the missing data mechanism needs to be specified in selection models but not in pattern-mixture models. He classified NIM into nonignorable outcome-based missing data, where the dropout depends on the missing values, and random-effect-based missing data, where the dropout depends on random effects. He also suggested examining the sensitivity of the results to the choice of missing data mechanism when little is known about that mechanism. Ibrahim, Chen and Lipsitz (2001) developed a Monte Carlo EM algorithm to obtain MLEs in GLMMs with informative dropouts. They proposed that the missing data mechanism may be modeled by a logistic regression and a sequence of one-dimensional conditional distributions, which may reduce the number of nuisance parameters.
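The MCAR, MAR, and NIM mechanisms of Section 1.2.2 can be illustrated with a small simulation showing how a naive complete-case summary behaves under each. This is an illustrative sketch; the response model and the missingness probabilities are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10_000
b = rng.normal(size=N)                 # unobserved individual random effect
x = rng.normal(size=N)                 # fully observed covariate
y = 1.0 + b + x + rng.normal(size=N)   # response

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

# MCAR: missingness independent of all data
r_mcar = rng.random(N) < 0.3
# MAR: missingness depends only on the observed covariate x
r_mar = rng.random(N) < expit(-1.0 + 1.5 * x)
# Random-effect-based NIM: missingness depends on the unobserved b
r_nim = rng.random(N) < expit(-1.0 + 1.5 * b)

# Complete-case mean of y: roughly unbiased under MCAR, biased otherwise
for name, r in [('MCAR', r_mcar), ('MAR', r_mar), ('NIM', r_nim)]:
    print(f'{name}: complete-case mean = {y[~r].mean():+.2f}, full mean = {y.mean():+.2f}')
```

Note that the complete-case mean is biased even under MAR; MAR is ignorable only for likelihood-based inference that properly conditions on the observed data, not for naive summaries, which is the point of the naive-methods warning in Section 1.2.1.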
1.3 Motivating Examples Our research is motivated by studies of HIV viral dynamics, which have received great attention in AIDS research in recent years (Ho et al., 1995; Perelson et al., 1996; Wu and Ding, 1999). These viral dynamic models provide a good understanding of the pathogenesis of HIV infection and help evaluate antiretroviral therapies, and the dynamic parameters may reflect the efficacy of the antiviral treatments (Ding and Wu, 2001). A common problem in these studies is that some subjects may drop out of the study or miss visits due to drug intolerance and other problems (although dropout patients may return to the study later), and covariates may contain missing data as well. It is important to evaluate how the dropout patients affect estimates of the viral decay rates, since the decay rates may reflect the efficacy of the antiviral treatments. The dataset which motivates our research consists of 48 HIV-infected patients who were treated with a potent antiviral regimen. The viral load is repeatedly measured after initiation of the treatment. After the antiviral treatment, a patient's viral load will often decay, and the decay rate may reflect the efficacy of the treatment. We only consider the viral load data before viral rebound and within the first three months, since data after three months are likely to be contaminated by long-term clinical factors. The number of measurements for each patient varies from 2 to 7. Fourteen patients have missing viral loads at scheduled time points due to dropout or other problems. The baseline covariates CD4 cell count, total complement level (CH50), and tumor necrosis factor (TNF) contain 3.7%, 12.3% and 16.4% missing data, respectively. Four patients are randomly selected and their viral loads are plotted in Figure 1.2. Visual inspection of the raw data suggests that dropout patients have slower viral decay rates, compared to the remaining patients.
Thus, the dropouts are likely to be informative or nonignorable. This dataset was analyzed previously, but dropout patients were discarded and the missing viral loads were assumed to be missing completely at random (Wu and Ding, 1999; Wu and Wu, 2001). Wu (2004) re-analyzed the dataset, proposing a missing mechanism based on the unobserved responses (viral loads). In this thesis, our objectives are to model the viral load, incorporating a nonignorable missing mechanism based on unknown random effects, and to check whether the estimates of the decay rates are different. [Figure 1.2: Viral loads of four randomly selected patients, plotted against days after treatment.] 1.4 Objectives and Outline In this thesis, we develop an exact inference method, implemented by a Monte Carlo EM algorithm, to make simultaneous inferences for NLMEs with informative dropout and missing covariates. To avoid computational difficulties when the dimension of the random effects is not small, we also propose an approximate inference method, which integrates out the random effects in the EM algorithm for more efficient computation. Our methods differ from Wu (2004) in that the proposed dropout mechanism depends on the random effects rather than on the unobserved responses. The remainder of this thesis is organized as follows. Chapter 2 introduces NLMEs. Chapter 3 discusses the exact inference method for estimation of NLMEs with informative dropout and missing covariates. The approximate inference method based on linearization is presented in Chapter 4. We discuss dropout models and covariate models in Chapter 5. In Chapter 6, we apply our methods to real datasets. Chapter 7 presents our simulation study. We conclude the thesis with a discussion in Chapter 8.
Chapter 2 Nonlinear Mixed Effects Models 2.1 Introduction Before we present our methods for estimating parameters in NLMEs with informative dropout and missing covariates, we give a brief introduction to NLMEs in this chapter. In Section 2.2, we introduce NLMEs for longitudinal data. Section 2.3 gives a literature review on NLMEs with informative dropout and missing covariates. 2.2 Nonlinear Mixed Effects Models Linear models, such as polynomials, are often empirical models based on the observed data. Therefore, they may be valid only within the observed range of the data. There is often no theoretical consideration about the underlying mechanism which generates the data. In many longitudinal studies, such as HIV viral dynamics, pharmacokinetics analyses, and studies of growth and decay, nonlinear modeling is often required for meaningful analyses. Nonlinear mixed effects models (NLMEs), or hierarchical nonlinear models, are popular in these studies for characterizing both the intra-individual variation and the inter-individual variation (Davidian and Giltinan, 1995; Vonesh and Chinchilli, 1996). As a generalization of linear models, nonlinear models have many advantages: (1) Nonlinear models are often mechanistic, that is, they are often based on the mechanism which produces the data, so the model parameters generally have a natural physical interpretation. (2) A nonlinear model generally uses fewer parameters than a competing linear model, such as a polynomial, offering a more parsimonious description of the data. (3) Nonlinear models often provide more reliable prediction for responses outside the observed data range. However, compared with linear models, nonlinear models usually do not have a closed-form expression for the marginal likelihood, and thus parameter estimation is more computationally intensive. For longitudinal data analysis, nonlinear mixed effects models are popular for inference.
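A concrete instance of advantage (1) is the bi-exponential HIV viral dynamic model underlying the motivating example of Chapter 1 (in the style of Wu and Ding, 1999): its rate parameters are biologically interpretable decay rates, unlike polynomial coefficients. The particular functional form and parameter values below are illustrative assumptions, not the thesis's exact specification:

```python
import numpy as np

def log10_viral_load(t, p1, lam1, p2, lam2):
    # V(t) = exp(p1 - lam1 * t) + exp(p2 - lam2 * t): lam1 and lam2 act as
    # decay rates of two infected-cell compartments, so each parameter has a
    # direct mechanistic interpretation
    return np.log10(np.exp(p1 - lam1 * t) + np.exp(p2 - lam2 * t))

t = np.linspace(0.0, 90.0, 10)                 # days after treatment
curve = log10_viral_load(t, p1=12.0, lam1=0.35, p2=8.0, lam2=0.03)
print(curve.round(2))                          # monotone viral decay
```

In an NLME version of this model, p1, lam1, p2, lam2 become individual-specific parameters of the form \u03b2_i = \u03b2 + b_i, which is exactly the two-stage structure of models (2.1) and (2.2).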
Suppose that there are N individuals, with individual i having n_i measurements at times t_i1, ..., t_in_i. Let y_ij be the response value for individual i at time t_ij, subject to informative dropout, i = 1, ..., N; j = 1, ..., n_i. Denote y_i = (y_i1, ..., y_in_i)^T. Let z_i = (z_i1, ..., z_ip)^T be a collection of incompletely observed baseline time-independent covariates for individual i. Let v_i = (v_i1, ..., v_iq)^T be a collection of completely observed baseline time-independent covariates for individual i. A general NLME model can be written as a hierarchical two-stage model as follows (Davidian and Giltinan, 1995): y_ij = g(t_ij, x_ij; \u03b2_i) + e_ij, e_i ~ N(0, \u03c3\u00b2 I), (2.1) \u03b2_i = d(z_i, v_i; \u03b2, b_i), b_i ~ N(0, D), j = 1, ..., n_i, i = 1, ..., N, (2.2) where g(\u00b7) is an arbitrary nonlinear function, z_i and v_i are respectively (p \u00d7 1) and (q \u00d7 1) vectors of covariates, e_i = (e_i1, ..., e_in_i)^T represents measurement errors, \u03b2_i = (\u03b2_i1, ..., \u03b2_is)^T is an (s \u00d7 1) vector of individual-specific regression parameters, \u03b2 = (\u03b2_1, ..., \u03b2_r)^T is an (r \u00d7 1) vector of population parameters (fixed effects), d(\u00b7) is an s-dimensional vector-valued function, b_i = (b_i1, ..., b_is)^T is the vector of random effects and is independent of e_i, \u03c3\u00b2 is the unknown within-individual variance, I is the identity matrix, and the (s \u00d7 s) matrix D quantifies the inter-individual covariance of the random effects. We write D = D(\u03b7), where \u03b7 denotes the collection of all distinct parameters in D. Let f(\u00b7) denote a generic density function and f(y|x) denote the conditional density function of y given x. After integrating out the unobserved random effects vector, the density of the responses is given by f(y_i | z_i, v_i; \u03b2, \u03c3\u00b2, D) = \u222b f(y_i | z_i, v_i; \u03b2, \u03c3\u00b2, b, D) f(b|D) db, (2.3) and the likelihood function is L(\u03b2, \u03c3\u00b2, D | y) = \u220f_{i=1}^N \u222b f(y_i | z_i, v_i; \u03b2, \u03c3\u00b2, b, D) f(b|D) db, (2.4) which generally does not have a closed-form expression.
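Because the integral in (2.3) has no closed form for nonlinear g, the simplest numerical option is plain Monte Carlo integration: draw b^(1), ..., b^(M) from N(0, D) and average the conditional density. A minimal sketch for a one-dimensional random effect, with an invented exponential-decay g and illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.array([0.0, 0.5, 1.0])          # measurement times

def cond_density(y_i, beta, b, sigma):
    # f(y_i | b; beta, sigma^2) for the illustrative model
    # y_ij = exp(beta + b) * exp(-t_j) + e_ij, e_ij ~ N(0, sigma^2)
    resid = y_i - np.exp(beta + b) * np.exp(-t)
    return np.prod(np.exp(-0.5 * (resid / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi)))

def marginal_likelihood(y_i, beta, sigma, D, M=5000):
    # Monte Carlo estimate of (2.3): average f(y_i | b^(m)) over b^(m) ~ N(0, D)
    draws = rng.normal(0.0, np.sqrt(D), size=M)
    return np.mean([cond_density(y_i, beta, b, sigma) for b in draws])

y_i = np.exp(1.0 - t)                  # data from a subject with beta + b_i = 1
print(marginal_likelihood(y_i, beta=1.0, sigma=0.5, D=0.04))
```

The number of draws M controls the Monte Carlo error, and the cost grows quickly with the dimension of b_i, which is one reason the approximate method of Chapter 4 avoids sampling the random effects altogether.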
Exact likelihood calculations therefore require numerical evaluation of an integral whose dimension equals the dimension of the random effects b_i. This is straightforward to do by direct numerical integration when the dimension of b_i is 1 or 2. However, when b_i has a dimension of 3 or more, alternative methods, such as Monte Carlo methods, need to be considered. (Here, for simplicity, we abuse mathematical notation by using f for many different densities; the particular density can be determined from its arguments.) Lindstrom and Bates (1990) proposed an approximate method based on first-order Taylor expansions about the random effects b_i. The resulting algorithm provides a computationally fast, albeit approximate, method for a wide class of nonlinear models. 2.3 Literature Review on NLME Models with Informative Missing Data Wu and Wu (2001) estimated parameters in nonlinear mixed effects models with missing covariates (MAR) by a three-step multiple imputation method. In the first step, they fitted a hierarchical model without covariates. In the second step, they imputed the missing covariates based on a multivariate linear model, implemented by a Gibbs sampler, and created B independent complete datasets. In the last step, they used the standard complete-data method to analyze each dataset and combined the B results to obtain the overall inference. Wu (2002) proposed a method for NLMEs with censored responses and covariates measured with error. Wu and Wu (2002) also proposed a method for analyzing NLMEs with missing time-dependent covariates. Later, Wu (2004a) proposed an exact and an approximate method for analyzing data with missing covariates in nonlinear mixed effects models. The exact method is implemented by a Monte Carlo EM algorithm; the approximate method linearizes the nonlinear model based on a Taylor expansion, which substantially reduces the computational load.
Wu (2004b) proposed a Monte Carlo EM method for estimating parameters in NLMEs with nonignorable missing covariates and dropout, with a dropout mechanism depending on the unobserved responses. However, no one has considered parameter estimation in NLMEs with informative dropout and missing covariates where the dropout mechanism depends on unknown random effects. In the following chapters, we focus on NLMEs with informative dropout and missing covariates, with a random-effects-based dropout mechanism. Since the random effects are shared by both the response model and the dropout model, this approach may also be referred to as a shared parameter model. Chapter 3 An Exact Method for NLME Models with Informative Dropout and Missing Covariates 3.1 Introduction In this chapter, we develop an exact inference method based on Monte Carlo methods to obtain MLEs for the parameters in NLMEs with informative dropout and missing covariates. The proposed exact method is implemented by a Monte Carlo EM algorithm. In Section 3.2, we give a description of NLMEs with informative dropout and missing covariates. Section 3.3 describes a Monte Carlo EM algorithm. A detailed description of our sampling methods is provided in Section 3.4. Computational issues regarding our algorithm are discussed in Section 3.5. 3.2 The Models We consider the models (2.1) and (2.2). Let r_i = (r_i1, ..., r_in_i)^T be a vector of missing data indicators for individual i such that r_ij = 1 if y_ij is missing and 0 otherwise. We write y_i = (y_mis,i^T, y_obs,i^T)^T, where y_mis,i corresponds to the missing components of y_i and y_obs,i contains the observed components of y_i. We write z_i = (z_mis,i^T, z_obs,i^T)^T, where z_mis,i corresponds to the missing components of the covariate vector z_i and z_obs,i contains the observed components of z_i.
We assume that the missing covariates are ignorable (or missing at random), i.e., the missing-covariate mechanism may depend on the observed data but not on the covariate values that are missing. The observed data are {(y_{obs,i}, z_{obs,i}, v_i, r_i), i = 1, ..., N}. Note that the dimensions of y_{obs,i} and z_{obs,i} depend on i.

To facilitate likelihood inference, we need to make a distributional assumption for the incompletely observed covariates z_i, conditional on the completely observed covariates v_i. We denote the covariate distribution by f(z_i | v_i; α), where the parameters α may be viewed as nuisance parameters. To allow for informative missing responses, we assume a distribution for r_i of the form

  f(r_i | y_i, z_i, v_i, b_i; φ) = f(r_i | b_i; φ).   (3.2)

In this dropout model, the missing probabilities of the responses depend only on the random effects of that patient. More complicated dropout models can be specified in a similar way. Note that the assumed models are not testable based on the observed data, so it is important to perform sensitivity analyses under various missing-data mechanisms. If the estimates of the main parameters β are quite insensitive to the assumed dropout model, we may be confident about the results. Otherwise, if the estimates are very sensitive to the assumed dropout model, we need to justify the dropout model first in order to obtain reasonable parameter estimates. The covariate model f(z_i | v_i) can be chosen in a similar way, and sensitivity analyses should also be performed (Ibrahim et al., 1999).

3.3 A Monte Carlo EM Method

The EM algorithm (Dempster, Laird, and Rubin, 1977) is a very useful and powerful algorithm for computing MLEs in a wide variety of situations, such as missing-data problems and random-effects models, although it does not directly provide an estimated covariance matrix for the MLEs. Each iteration of an EM algorithm consists of an E-step, which evaluates the expectation of the 'complete-data' log-likelihood conditional on the observed data and the previous parameter estimates, and an M-step, which updates the parameter estimates by maximizing this conditional expectation. Iterating between the E-step and the M-step until convergence yields the MLEs.

If we treat (y_{obs,i}, y_{mis,i}, z_{obs,i}, z_{mis,i}, v_i, r_i, b_i) = (y_i, z_i, v_i, r_i, b_i) as the 'complete' data, the complete-data density for individual i is given by

  f(y_i, z_i, v_i, r_i, b_i | α, β, σ², η, φ)
    = f(r_i | y_i, b_i, z_i, v_i; φ) f(y_i | b_i, z_i, v_i; β, σ²) f(b_i | D(η)) f(z_i | v_i; α).
(3.3)

This leads to the complete-data log-likelihood

  l_c(ψ) = Σ_{i=1}^N l_i(ψ)
    = Σ_{i=1}^N { log f(r_i | y_i, b_i, z_i, v_i; φ) + log f(y_i | b_i, z_i, v_i; β, σ²)
        + log f(b_i | D(η)) + log f(z_i | v_i; α) }.

Sampling from f(y_{mis,i}, z_{mis,i}, b_i | y_{obs,i}, z_{obs,i}, v_i, r_i; ψ^{(t)}) is an important step for implementing the E-step of the Monte Carlo EM algorithm. The Gibbs sampler (Gelfand and Smith, 1990) is a popular method for generating samples from a complicated multidimensional distribution, provided the distribution has a convenient representation via conditional distributions: one samples from each of the full conditional distributions in turn. Here, we use the Gibbs sampler to simulate the missing values as follows. Set initial values (y_{mis,i}^{(0)}, z_{mis,i}^{(0)}, b_i^{(0)}). Suppose the current generated values are (y_{mis,i}^{(k)}, z_{mis,i}^{(k)}, b_i^{(k)}); we obtain (y_{mis,i}^{(k+1)}, z_{mis,i}^{(k+1)}, b_i^{(k+1)}) as follows:

Step 1. Draw a sample y_{mis,i}^{(k+1)} for the missing responses from f(y_{mis,i} | z_{mis,i}^{(k)}, b_i^{(k)}, y_{obs,i}, z_{obs,i}, v_i, r_i; ψ^{(t)}).

Step 2. Draw a sample z_{mis,i}^{(k+1)} for the missing covariates from f(z_{mis,i} | y_{mis,i}^{(k+1)}, b_i^{(k)}, y_{obs,i}, z_{obs,i}, v_i, r_i; ψ^{(t)}).

Step 3. Draw a sample b_i^{(k+1)} for the "missing" random effects from f(b_i | y_{mis,i}^{(k+1)}, z_{mis,i}^{(k+1)}, y_{obs,i}, z_{obs,i}, v_i, r_i; ψ^{(t)}).

After a sufficiently large burn-in of d iterations, the sampled values will reach a steady state. Then {(y_{mis,i}^{(k)}, z_{mis,i}^{(k)}, b_i^{(k)}), k = d+1, ..., B} can be treated as samples from the multidimensional density f(y_{mis,i}, z_{mis,i}, b_i | y_{obs,i}, z_{obs,i}, v_i, r_i; ψ^{(t)}). And, if we choose a sufficiently large gap d' (usually smaller than d), we can treat the thinned series {(y_{mis,i}^{(k)}, z_{mis,i}^{(k)}, b_i^{(k)}), k = d + d', d + 2d', ...} as approximately independent samples.

...

  ... v_i, r_i; ψ^{(t)}),   (4.6)

where

  ỹ_{mis,i} ≡ y_{mis,i} − g_{mis,i}(z_{mis,i}, β̂, b̂_i) + X_{mis,i}(z_{mis,i}) β̂ + T_{mis,i}(z_{mis,i}) b̂_i,   (4.7)

and X_{mis,i} and T_{mis,i} are submatrices of X_i and T_i, respectively; g_{mis,i} is a sub-vector function of g_i defined similarly; and ỹ_i = (ỹ_{mis,i}^T, y_{obs,i}^T)^T. Under the LME model (4.2), it is straightforward to show that

  f(b_i | ỹ_i, z_i; ψ) = N(b̂_i, Σ̂_i),   (4.8)

where

  Σ̂_i = (σ̂^{-2} T_i^T T_i + D^{-1}(η̂))^{-1},   (4.9)

  b̂_i = Σ̂_i T_i^T (ỹ_i − X_i β̂) / σ̂².
(4.10)

4.2.1 E-step

We can integrate out b_i and obtain the following results:

  Q_i(ψ | ψ^{(t)}) = E[ l_i(ψ | y_i, z_i, v_i, r_i, b_i) | y_{obs,i}, z_{obs,i}, v_i, r_i; ψ^{(t)} ]
    = ∫∫∫ { log f(r_i | y_i, b_i, z_i, v_i; φ) + log f(y_i | b_i, z_i, v_i; β, σ²)
        + log f(z_i | v_i; α) + log f(b_i | D(η)) }
        × f(y_{mis,i}, z_{mis,i}, b_i | y_{obs,i}, z_{obs,i}, v_i, r_i; ψ^{(t)}) db_i dy_{mis,i} dz_{mis,i}
    = I_1 + I_2 + I_3 + I_4.   (4.11)

  I_1 = ∫∫∫ log f(r_i | y_i, b_i, z_i, v_i; φ) ...

... is positive and the estimate of