The Theory and Methods for Measurement Errors and Missing Data Problems in Semiparametric Nonlinear Mixed-effects Models by WEI LIU B . S c , Northeast Normal University, 1990 M.Sc., Northeast Normal University, 1993 M.Sc., Memorial University of Newfoundland, 2001 A T H E S I S S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F THE REQUIREMENTS FOR T H E DEGREE OF Doctor of Philosophy in T H E F A C U L T Y O F G R A D U A T E STUDIES (Statistics) The University of British Columbia October 2006 © W E I L I U , 2006 Abstract Semiparametric nonlinear mixed-effects ( N L M E ) models are flexible for modelling complex longitudinal data. Covariates are usually introduced in the models to partially explain inter-individual variations. Some covariates, however, may be measured with substantial errors. Moreover, the responses may be missing and the missingness may be nonignorable. In this thesis, we develop approximate maximum likelihood inference in the following three problems: (1). semiparametric N L M E models with measurement errors and missing data in time-varying covariates; (2). semiparametric N L M E models with covariate measurement errors and outcome-based informative missing responses; (3). semiparametric N L M E models with covariate measurement errors and random-effect-based informative missing re- sponses. Measurement errors, dropouts, and missing data are addressed simultaneously in a unified way. For each problem, we propose two joint model methods to simultaneously obtain approximate maximum likelihood estimates (MLEs) of all model parameters. Some asymptotic properties of the estimates are discussed. The proposed methods are illustrated in a H I V data example. Simulation results show that all proposed methods perform better than the commonly used two-step method and the naive method. L ii Contents Abstract Contents List of Tables . . . List of Figures . . Acknowledgements Dedication '.. 1 Introduction 1 •1.1 Longitudinal Studies 1 1.2 Parametric Nonlinear Mixed-effects Models 1.3 Semiparametric Nonlinear Mixed-effects Models 6 1.4 Measurement Errors and Dropouts 8 1.5 A Motivating Example 1.6 Research Objectives and Thesis Organization . 3 10 . . . 2 A Semiparametric Nonlinear Mixed-effects Model with Covariate Mea- iii 14 surement Errors • • 16 2.1 Introduction 16 2.2 A Semiparametric N L M E Model for the Response Process 17 2.2.1 A Semiparmetric N L M E Model with Mis-measured Covariates . . . . 17 2.2.2 A Basis-based Approach to Nonparametric Functions 18 2.2.3 Percentile-based K n o t Placing for Splines 21 2.2.4 Selection of Smoothing Parameters 21 2.2.5 Transformation of the Semiparametric N L M E M o d e l 23 2.2.6 Consistency of the Estimate of w(t) 23 2.3 3 '. Measurement Errors and Missing D a t a in Covariates 26 A Joint Model for Semiparametric N L M E Models with Covariate Measurement Errors and Missing Data 28 3.1 Introduction 28 3.2 A Two-step Method 29 3.3 A Joint Model Method fpr Likelihood Inference 34 3.3.1 The Likelihood for the Joint Model 34 3.3.2 A M C E M Method 36 3.3.3 Sampling Methods 40 3.3.4 Convergence 44 3.4 3.5 A Computationally More Efficient Approximate Method 46 3.4.1 The Need for an Alternative Method 46 3.4.2 Analytic Expressions of Estimates 48 3.4.3 Asymptotic Properties 51 Example and Simulation 51 iv 3.5.1 A n Application in A I D S Studies 51 3.5.2 A Simulation Study 58 3.6 Discussion 60 3.7 Appendix: Asymptotic Properties of 7 Based on the A P P R M e t h o d i n Section 3.4 61 3.7.1 Some Lemmas 61 3.7.2 Notation and Regularity Conditions 3.7.3 Estimating Equations 3.7.4 Consistency . . ; 67 3.7.5 Asymptotic Normality of 7 72 . 62 •. . . 65 S i m u l t a n e o u s Inference for S e m i p a r a m e t r i c N L M E M o d e l s w i t h C o v a r i a t e Measurement E r r o r s and Outcome-based Informative D r o p o u t s 74 4.1 Introduction 74 4.2 Models for Nonignorable Missing Responses 75 4.3 Likelihood Inference 4.4 4.5 . . . 77 4.3.1 The Likelihood Function 77 4.3.2 Approximate M L E s Based on a M C E M Method . . . 78 4.3.3 Monte Carlo Sampling 81 . A Computationally More Efficient Approximate Method 83 4.4.1 A M u c h Simpler E-step 84 4.4.2 T h e M-step 91 Example 95 4.5.1 Data Description 95 4.5.2 The Response and the Covariate Models 96 v 5 4.5.3 Dropout Models and Sensitivity Analysis . 4.5.4 Estimation Methods and Computation Issues 4.5.5 Analysis Results 98 . . 99 101 4.6 A Simulation Study 103 4.7 Conclusions and Discussion 108 Semiparametric N L M E Model with Random-effect-based Informative Dropouts and Covariate Measurement Errors 110 5.1 Introduction 110 5.2 Missing Response Models Ill 5.3 A Monte Carlo E M Method 113 5.3.1 The Likelihood Function 5.3.2 A M C E M Algorithm 5.3.3 Sampling Methods 117 5.4 A n Alternative Approximate Method 118 5.5 5.6 113 • 115 5.4.1 The Hierarchical Likelihood Method 118 5.4.2 Asymptotic Properties 121 Example and Simulation 122 5.5.1 Example 122 5.5.2 The Simulation Study 128 Discussion 130 5.7 Appendix: Asymptotic Properties of the Approximate M L E #HL in Section 5.4131 6 5.7.1 Consistency 5.7.2 Asymptotic Normality of f ? 131 H L Conclusions and Future Research 133 136 vi 6.1 Conclusions 6.2 Future Research Topics References 136 .' . 137 . 140 vii List of Tables 3.1 A I C and B I C values for the viral load model (3.14) and (3.15), with q < p = 1, 2, 3 . 54 3.2 A I C and B I C values for the linear, quadratic, and cubic C D 4 models 55 3.3 Parameter estimates (standard errors) for the H I V dataset 58 3.4 Simulation results for parameter estimates as well as (standard errors) and (simulated standard errors)* for the estimation methods Naive, Two-step, M C E M , and A P P R . 59 4.1 A I C and B I C values for the model (4.8) - (4.10), with q < p = 1, 2, 3. . . . 97 4.2 A I C and B I C values for the linear, quadratic, and cubic C D 4 models . . . . 98 4.3 Parameter estimates (standard errors) for the models in the example 4.4 Simulation results for the parameter estimates (standard errors) as well as 102 their biases and M S E s for the estimation methods P A R A , A P P R 1 , and APPR2.105 4.5 Simulation results for the parameter estimates (standard errors) for the three estimation methods N A I V E , A P P R 1 , and A P P R 2 with dropout models I and IV in (4.13) 107 viii 4.6 Simulation results for biases and M S E s of the parameter estimates for the three estimation methods N A I V E , A P P R 1 , and A P P R 2 with dropout models I and IV in (4.13) 108 5.1 Parameter estimates (standard errors) for the models in the example 127 5.2 Simulation results for the parameter estimates (standard errors) for the estimation methods A P and H L 5.3 129 Simulation results for bias and M S E of the parameter estimates for the estimation methods A P and H L 129 ix List of Figures 1.1 Viral loads (response) of six randomly selected H I V patients 1.2 C D 4 counts of six randomly selected H I V patients 12 3.1 The time series plot for b associated with patient 10. . . . 56 3.2 The autocorrelation function plot for b associated with patient 10 56 4.1 The time series plot for b associated with patient 14 100 4.2 The autocorrelation function plot for b associated with patient 14 100 5.1 The time series plot for b associated with patient 10 126 5.2 The autocorrelation function plot for b associated with patient 14 126 2 2 2 2 2 2 x 7 Acknowledgements First of all, I thank my supervisor, Dr. Lang W u , for his support, both academic and moral. I feel honored to have had the chance to work with him. Very special thanks are due the other members of my supervisory committee, Dr. Paul Gustafson and Dr. Matias SalibianBarrera, for their invaluable suggestions and encouragement. Also, I am very grateful to Dr. Nancy Heckmen, Dr. Harry Joe, and Dr. John Petkau for teaching me, academically and personally. I am also indebted to the office staff, Christine Graham, Elaine Salameh, and Rhoda Morgan, for all their help with administrative matters. It is a pleasure to acknowledge helpful discussions with my fellow graduate students: Juxin L i u , Guohua Y a n , Weiliang Qiu, Hui Shen, and all of the others. Finally, I would like to thank my family, my husband, and my son for their tireless love, patience, understanding, and support. WEI LIU The University September of British Columbia 2006 xi To my family xii Chapter 1 Introduction 1.1 Longitudinal Studies A longitudinal study is defined as a study in which the response for each individual in the study is observed on two or more occasions. Longitudinal studies are very common in health and life sciences, epidemiology, medical, and biomedical research. Longitudinal studies are. also common in other areas including education, psychology, social sciences, and econometrics. A major advantage of longitudinal studies over cross-sectional studies is that in longitudinal studies one can model the individual response trajectory over time while in cross-sectional studies one cannot. In longitudinal studies, covariates may be classified into two categories: time-varying covariates and time-independent covariates. Time-varying covariates represent variables which vary over time within individuals. Time itself may be viewed as a covariate in that often there is interest in testing whether there are any changes in the response variable over time. When one studies children's weight trajectories over time, the height may be a time-varying covariate which can change with time. Time-independent covariates, on the other hand, may represent baseline factors which do hot vary with time. Examples of timeindependent covariates might include an individual's gender and race. One of the goals in longitudinal research is to investigate the effects of important covariates on individual response trajectories over time. A defining feature of a longitudinal data set is repeated observations on a number of individuals. The repeated observations on the same individual tend to be correlated. It is important to explicitly recognize two sources of variability in a longitudinal data set: random variation among repeated measurements within a given individual and random variation between individuals. Moreover, the number of observations within individuals often varies from individual to individual (i.e., the data are often unbalanced). Therefore, longitudinal data require special statistical methods to draw valid statistical inferences. There are three approaches to a longitudinal data analysis. "The marginal model approach is to model the marginal expectation of a response as a function of covariates. The methods are designed to permit separate modelling of the regression of the response on covariates, and the association among repeated observations of the response for each individual. Marginal models are appropriate when inferences about the population average are the main interest. For example, in a clinical trial the average difference between control and treatment is most important, not the difference for any one individual. The random effects model approach assumes that the response is a function of covari- ates with regression coefficients varying from one individual to the next. A random effects model is a reasonable description if the set of coefficients for a set of individuals can be thought of as a sample from a distribution. In random effects models, correlation arises among repeated responses because the regression coefficients vary across individuals, and regression coefficients represent the effects of the covariates on an individual, which is in contrast to the marginal model coefficients which describe the effect of covariates on the population average. Random effects models are most useful when the objective is to make inference about individuals, such as in AIDS studies. They may focus on both population parameters and individuals characteristics. The transition model approach describes the conditional distribution of each response on an individual as an explicit function of his past responses and covariates. Under transition models, correlation among the response observations on one individual exists because the past response observations explicitly influence the present response observation. The past response observations are treated as additional covariates. In each of the three approaches, we consider both the dependence of the responses on covariates and the correlation among the responses. W i t h cross-sectional data, only the dependence of the responses on covariates needs to be specified since there is no correlation of responses. In longitudinal studies, in which correlation usually exists among responses, there are at least two consequences of ignoring it as follows. First, incorrect inferences about regression coefficients. In particular, confidence intervals are too short based on assumption of independence when in fact there is positive dependence. Secondly, the estimation method may be inefficient, that is, less precise than possible. 1.2 Parametric Nonlinear Mixed-effects Models Parametric nonlinear mixed-effects ( N L M E ) models, or hierarchical nonlinear models, have been widely used in many longitudinal studies such as human immunodeficiency virus (HIV) viral dynamics, pharmacokinetic analyses, and studies of growth and decay (Davidian and Giltinan 1995; Vonesh and Chinchilli 1997). In these studies, the intra-individual variation and the inter-individual variation are typically modelled by a two-stage hierarchical model. The first stage specifies the mean and covariance structure for a given individual, whereas 3 the second stage characterizes the inter-individual variation. Understanding the nature of inter-individual systematic and random variation at the second stage often receives more emphasis. This inter-individual variation may be partially explained by some baseline or time-varying covariates. Suppose that there are n individuals with measurements over time. Let yy and zy respectively be the response value and the i / x l covariate values for individual i at time Uj, i = 1 , . . . , n, j = 1 , . . . , rii. The covariates zy may incorporate variables such as time, dose, etc. A general parametric N L M E model can be written as a hierarchical two-stage model as follows (Davidian and Giltinan, 1995) = giz&PJ V i j = ft,- d(z + eij, eilfS, ^ 1 ft hi), ii; N(0,5 I), (1.1). 2 N(0,B), i = l,...,n, j = l,...,m, (1.2) where g{-) and d(-) are known (possibly nonlinear) functions, ft,- are individual-specific parameters, f3 are population parameters i — (eii, • • . , e ) e T ini (fixed effects), b; are random effects, ft — (/3^,..., f3jni)T, are within-individual random errors and are assumed to be independent' of bj, 8 is the unknown within-individual variance, / is the identity matrix, and B is an 2 unknown variance-covariance matrix. In AIDS studies, for example, viral loads (Plasma HIV-1 R N A copies) and various covariates such as C D 4 count are usually measured over time after initiation of treatments. The following parametric N L M E model has been widely used to fit short-term (the first three months after treatments) H I V viral dynamics (Wu, 2002; W u and Zhang, 2002) y y = log (P 1 0 l0g(Pu) =p\ - *^ + P A l l i e e - ^ ) + ey, X j = #2 + fcZij + hi, + b li} U l o g ( P » ) = PA + hi, X j = As + hi, 2 2i (1.3) X 2 i . (1-4) (1.5) where yy and zy are the logio-transformation of the viral load measurement and C D 4 count 4 for patient i at time Uj respectively, are baseline values, and A i ^ and X 2ij = (bu, b i, hi, bij) T 2 are random effects, Pu and P 2i are the first (initial) and the second phase viral decay rates respectively (they may be interpreted as the turnover rates of productively infected cells and long-lived and/or latently infected cells respectively). Although N L M E models are popular in practice, their use has been somewhat limited because of the. complexity of the likelihood function. Estimation of model parameters based on maximum likelihood can be challenging since these models are typically nonlinear with respect to the random effects and thus have no closed-form expressions for the marginal likelihood. This has led to the development of some widely used approximate methods based on Taylor expansions or Laplace approximation of the likelihood function (Lindstrom and Bates 1990; Wolfinger 1993; Vonesh, Wangs, Nie, and Majumdar 2002). These approximate methods are computationally efficient in the sense that they .may converge faster and have less computational problems than the "exact" likelihood method, which finds the maximum likelihood estimator ( M L E ) using numerical integration techniques or Monte Carlo methods. These approximate methods often perform well if the number of infra-individual measurements is not small, but their performance may be less satisfactory if the intra-individual data are sparse, especially when the inter-individual variability is large (Davidian and Giltinan 1995; Vonesh and Chinchilli 1997; Pinheiro and Bates 1995). Thus there is still a need for developing "exact" methods. "Exact" likelihood inference for generalized linear mixed models based on Monte Carlo E M algorithms has been investigated by McCulloch (1997) and Booth and Hobert (1999). 5 1.3 Semiparametric Nonlinear Mixed-effects Models Parametric N L M E models are powerful tools in many longitudinal analyses. In some cases, however, parametric N L M E models may not be flexible enough in modelling complex longitudinal processes, since the underlying mechanism which generates the data may be complicated in practice. In these cases, semiparametric or nonparametric models may be more flexible in modelling the complex longitudinal process (Ke and Wang, 2001; Rice and Wu, 2001). In particular, semiparametric N L M E models are very useful in characterizing both the intra-individual variation and the inter-individual variation, in which the intra-individual variation is modelled semiparametrically while the inter-individual variation is incorporated by random effects (Davidian and Giltinan, 1995; Ke and Wang, 2001; W u and Zhang, 2002). In A I D S studies, for instance, the parametric N L M E model (1.3) - (1.5) is appropriate only for fitting short-term H I V viral dynamics. Due to long-term clinical factors, drug resistance, and other complications, the viral load trajectories can be very complex after the initial phase viral decay (see Figure 1.1 for long-term viral load trajectories of six randomly selected H I V patients). Grossman et al. (1999) pointed out that viral decay rates after the initial period may be complicated and may vary over time since they may depend on some phenomenological parameters which hide considerable microscopic complexity and change over time. Therefore, a nonparametric smooth curve modelling for the second phase viral decay rate may be more appropriate than parametric modelling (Wu and Zhang, 2002). This leads to the following semiparametric N L M E model = log (Piie- «*« + P A l y i j 1 0 e~ ^) + e , (1.6) x 2 i y log(Pu) = (3i + b , X log(P i) = PA + hi, A ij = w(Uj) + hilUj), u Uj 2 2 = (3 + (3 2 3Zij + b, 2i (1.7) (1.8) where w(-) and hi(-) in (1.8) are unknown nonparametric smooth fixed- and random-effects 6 Figure 1.1: Viral loads (response) of six randomly selected H I V patients. functions used to describe the complicated second phase decay rate X ij2 Wu and Zhang (2002) introduced a class of semiparametric N L M E models for longitudinal data. The standard parametric N L M E models can be regarded as a special case of their models. Their models are more flexible than the semiparametric N L M E models proposed by K e and Wang (2001). Details of the semiparametric N L M E models proposed by W u and Zhang (2002) will be described in Chapter 2. 1.4 Measurement Errors and Dropouts In many longitudinal studies, the inter-individual variation may be large and this variation may be partially explained by time-varying covariates. Some covariates, however, may be measured with substantial errors and may contain missing values as well. Ignoring measurement errors and missing data in covariates may lead to biased results (Carroll et al. 1995; Higgins et al. 1997; W u , 2002). Moreover, some individuals may drop out of the study before the scheduled end for various reasons such as drug intolerance, which leads to missing data. Measurement errors and missing data make statistical analysis in longitudinal studies much more complicated, because standard complete-data methods are not directly applicable. Therefore, it is very important to find appropriate methods to deal with measurement errors and missing data. In AIDS studies, for example, it is well known that C D 4 counts, which may be used as covariates, are usually measured with substantial errors and are usually measured at time points different from the response (viral load) measurement schedule. In addition, it is very common that some patients drop out of the study early or miss scheduled visits due to drug intolerance or other problems. Visual inspection of the raw data seems to indicate that dropout patients may have slower viral decay, compared with the remaining patients. Thus, 8 the dropouts are likely to be informative or nonignorable. Commonly used measurement error models are reviewed in Carroll et al. (1995). For N L M E models with covariate measurement errors, Higgins, et al. (1997) proposed a two-step method and a bootstrap method, and W u (2002) considered censored response and covariate measurement errors based on a joint model. There is also extensive literature on dropouts in longitudinal studies (e.g., Diggle and Kenward, 1994; Little 1995; Ibrahim et al. 2001). However, there is little literature on addressing measurement errors, informative dropouts, and missing data in semiparametric N L M E models. In the presence of missing data, the missing data mechanism must be taken into account to obtain valid statistical inferences. Little and Rubin (1987) and Little (1995) discussed statistical analyses with missing values. Let repeated observations of a variable y on individual i. denoting the observed components of Let Ti = (ra,..., r i n i ) T = (yu,... Write = ,yi ) T ni be a vector of (y[°\ y^™'), with y-°' and y | ^ denoting the missing components of y,. m denote a set of indicator variables such that r y = 1 if y^ is missing and rij = 0 otherwise. The probability distribution of r; defines a probability model for the missing value mechanism. Little and Rubin (1987) classified the missing value mechanism as follows. • Missing data are missing completely at random ( M C A R ) if the probability of missing- ness is independent of both observed and unobserved data. When missing data are caused by features of the study design, rather than the behavior of the study subjects, the M C A R mechanism may be plausible. For example, some values are missing because of reasons irrelevant to the treatment (e.g., the medical equipment is broken down on a certain day). So missingness is M C A R if • Missing data are missing is independent of both y\°^ and y - ' m at random ( M A R ) if the probability of missingness depends 9 only on observed data, but not on unobserved data. For example, a patient may fail to visit the clinic because he/she is too old. In mathematical notation, missingness is M A R if Ti is independent of . • Missing data are nonignorable or informative (NIM) if the probability of missingness depends on unobserved data. For random effects models, we consider the following two nonignorable response missing mechanisms. First, the probability of the missingness depends on unobserved responses. For example, a patient fails to visit the clinic because he/she is too sick. We call the missingness outcome-based 1995) if rt is dependent on y\ \ m informative but not on the random effects bj. (Little, Secondly, the probability of missingness depends on unobservable random effects. For example, an AIDS patient may drop out if his/her individual-specific viral decay is too slow. We call the missingness random-effect-based informative (Little, 1995) if r-j is dependent on random effects bj, but not on y- ^. m Both M C A R and M A R missing mechanisms are sometimes referred to without distinction as ignorable. Little and Rubin (1987) showed that, when missing data are nonignorable, likelihood inference must incorporate the missing data mechanism to avoid biased results. 1.5 A Motivating Example Our research is motivated by H I V viral dynamic studies, which model the viral load trajectories after initiation of anti-HIV treatments. H I V viral dynamic models have received great attention in A I D S studies in recent years (Ho et al. 1995; Perelson et al. 1996; W u and Ding, 1999; W u , 2005). These viral dynamic models provide good understanding of the pathogenesis of H I V infection and evaluation of anti-HIV therapies. N L M E models have been popular in modelling the initial period of H I V viral dynamics and in characterizing 10 the large inter-patient variation. It is shown that the initial viral decay rate may reflect the efficacy of the anti-HIV therapy (Ding and W u , 2001). One of the major challenges in modelling long-term H I V viral dynamics is that, during late stages of an anti-HIV treatment, it is difficult to model the viral load trajectory parametrically, because drug resistance, noncompliance, and other long-term clinical factors may affect viral load trajectories. Therefore, semiparametric N L M E models may be more suitable for modelling H I V viral dynamics (Wu and Zhang, 2002). Understanding the large inter-patient variation in H I V viral dynamic studies often receives great attention, which may help to provide individualized treatments. It has been shown that covariates such as C D 4 cell count (see Figure 1.2) may partially explain the large inter-patient variation (Wu et al. 1999; W u , 2002). However, some covariates such as C D 4 cell count may be measured with substantial errors and may be measured at time points different from the response measurement schedule (which leads to missing data in covariates). Ignoring these measurement errors and missing data in covariates may lead to biased results (Wu, 2002). In addition, it is very common that some patients may drop out of the study early or miss scheduled visits due to drug resistance/intolerance and other problems (although dropout patients may return to study later). It appears that dropout patients may have slower viral decay rates, compared with the remaining patients (see Figure 1.1). Thus the dropouts are likely to be informative or nonignorable. Therefore, it is important to address measurement errors, informative dropouts, and missing data in semiparametric N L M E models in order to obtain reliable results, which may make significant contributions to H I V / A I D S studies. The following A I D S dataset motivates our research. A more detailed data description can be found in W u (2002). The dataset includes 53 H I V infected patients who were treated with a potent antiretroviral regimen. Viral loads (Plasma HIV-1 R N A copies) were measured 11 50 100 150 Days after treatment Figure 1.2: C D 4 counts of six randomly selected HIV patients. 12 200 on days 0, 2, 7, 10, 14, 21, 28 and weeks 8, 12, 24, and 48 after initiation of treatments. After the antiretroviral treatment, the patients' viral loads will decay, and the decay rates may reflect the efficacy of the treatment. Throughout the time course, the viral load may continue to decay, fluctuate, or even start to rise (rebound). The data at the late stage of study are likely to be contaminated by long-term clinical factors, which leads to complex longitudinal trajectories. Various covariates such as C D 4 count were also recorded throughout the study on similar schedules. It is well known that C D 4 counts are usually measured with substantial errors. The number of response (viral load) measurements for each individual varies from 6 to 10. Five patients dropped out of the study due to drug intolerance or other problems and sixteen patients have missing viral loads at scheduled time points. There were 104 out of 403 C D 4 measurements missing at viral load measurement times, due mainly to a somewhat different C D 4 measurement schedule. Six patients are randomly selected and their viral loads are plotted in Figure 1.1. In the presence of measurement errors in C D 4 count, we consider the following semiparametric N L M E model, which corresponds model (1.6) - (1.8), to fit the viral dynamics Va = \og {P e- ^ x w u + P e- ^) + x 2 i e i j , (1.9) log(Pi0 = Pl + hi, Xlij = $2 + Psz*j + b , (1.10) log(P t) = A (1.11) 2 2i + hi, 2 i i = w(tij) + hiiUj), where z*- is the unobservable true C D 4 count, reflecting the belief that actual, not possibly corrupted, C D 4 counts govern the initial phase viral decay rate XUJ. will be used in our data analyses in later chapters. Model (1.9) - (1.11) The C D 4 count trajectories for six randomly selected'patients are plotted in Figure 1.2. There exists large variability in C D 4 count between patients. Most C D 4 count trajectories appear to have roughly quadratic polynomial shapes. We will discuss covariate models in the next chapter. 13 1.6 Research Objectives and Thesis Organization In this thesis, we consider approximate maximum likelihood inference in the following three problems: (1). semiparametric N L M E models with measurement errors and missing data in time-varying covariates; (2). semiparametric N L M E models with covariate measurement errors and outcome-based informative missing responses; (3). semiparametric N L M E models with covariate measurement errors and random-effect-based informative missing responses. Measurement errors, dropouts, and missing data are addressed simultaneously in a unified way. Some asymptotic results are developed. For each problem, we propose two joint model methods to simultaneously obtain approximate maximum likelihood estimates (MLEs) of all model parameters. The first method, implemented by a Monte Carlo E M algorithm, is more accurate than the second method but it is computationally very intensive and may offer computational difficulties such as slow or non-convergence, especially when the dimensions of random effects are not small. The second method, which approximates joint log-likelihood functions, is always computationally feasible and is often computationally much more efficient, but it is usually less accurate than the first method. The second method may be used as a reasonable alternative when the first method has convergence problems or may be used to provide excellent parameter starting values for the first method. The remainder of this thesis is organized as follows. In Chapter 2, we introduce general semiparametric N L M E models with covariate measurement errors. Following Rice and W u (2001) and W u and Zhang (2002), we employ natural cubic spline bases with the percentilebased knots to transform semiparametric N L M E models into a parametric N L M E models. In Chapter 3, we address measurement errors and missing data in time-varying covariates in semiparametric N L M E models and propose two joint model methods, implemented by a Monte Carlo E M algorithm and by a first-order Taylor approximation to log-likelihood func- 14 tions, respectively. We also compare the two joint model methods with the two-step method suggested by Higgins, et al. (1997) and discuss the asymptotic properties of approximate M L E s . We finally apply the two joint model methods to a H I V dataset. In Chapter 4, we address outcome-based informative dropouts and covariate measurement errors in semipara- metric N L M E models and propose two joint model methods, implemented by Monte Carlo E M algorithms. We illustrate our proposed methods in a H I V dataset and evaluate their performance via simulation studies. In Chapter 5, we consider random-effect-based informative missing responses in semiparametric N L M E models with covariate measurement errors. We propose two joint model methods, implemented by a Monte Carlo E M algorithm and by a first-order Laplace approximation to log-likelihood functions respectively, to simultaneously obtain approximate M L E s of all model parameters. We also discuss some asymptotic properties of the approximate M L E s . We illustrate our methods in a H I V dataset and evaluate their performance by simulation studies. We conclude this thesis with some discussion and possible future work in Chapter 6. 15 Chapter 2 A Semiparametric Nonlinear Mixed-effects Model with Covariate Measurement Errors 2.1 Introduction In this chapter we present the general form for semiparametric N L M E models with covariate measurement errors. In Section 2.2, we describe a general semiparametric N L M E model for the response process and incorporate possibly mis-measured time-varying covariates. We approximate the proposed semiparametric N L M E model by a parametric N L M E model, using linear combinations of natural cubic splines with percentile-based knots. Consistency of the estimates is discussed. In Section 2.3, the covariate process is modelled using a mixed-effects model to address measurement errors and missing data. 16 2.2 A Semiparametric N L M E Model for the Response Process 2.2.1 A Semiparmetric N L M E Model with Mis-measured Covariates We describe a semiparametric N L M E model in general form. Let for individual i at time Uj, i = l , . . . , n , and j = l,...,nj. be the response value Let ZM be the observed value let z* be the unobservable "true" value of covariate k for individual i at time uu, kl i = 1 , . . . , n, k = 1 , . . . , v, 1 = 1,..., m-i. For simplicity, we focus on the case where z* kl is the current true covariate value, but our method can be extended to the case where z* kl is a summary of the true covariate values up to time uu. Note that for each individual, we allow the covariate measurement times uu to differ from the response measurement times Uj. In other words, we allow missing data in the covariates. Let z* = (zfi> • • • >zLj » T where z = (z , • . . ,z i) , T u iU iu l = — (yu,... ,yi ) T ni and l,...,mi. For the response process, we consider a general semiparametric N L M E model similar to W u and Zhang (2002), but incorporate possibly mis-measured time-varying covariates Vij = ft, = d*(^, [3*, b*), r (t) t = 9(Uj, rifcjfi + eij, (2.1) (2.2) v(w(t), hi(t)), i = 1,... ,n, j = 1,... ,n u (2.3) where g(-), d*(-), and v(-) are known (possible nonlinear) functions, w(t) and hi(t) are unknown nonparametric smooth fixed-effects and random-effects functions respectively, /3*. are individual-specific parameters, (3* are population parameters, random error, and b* are random effects. Let e; = (en,... 17 is the within-individual ,ei ) . T ni We assume that ~ N(0, b* 5 1), 2 where 5 is the unknown within-individual variance and I is the identity matrix, 2 7Y(0, £ * ) , /ij(i)'s are identical and independent realizations of a zero-mean stochastic process h(t), and b* and hi(t) are independent of e i . We can rewrite the semiparametric N L M E models (2.1) - (2.3) in a compact way Vij = g(Uj, d*(z*., [3*, b*), v(w(tij), htiUj))) + eij. (2.4) Note that in (2.1) or (2.4), we assume that the individual-specific parameters /3*- depend on the true but unobservable covariates z*,- rather than the observed covariates zy, which are measured with error. Because of the nonparametric parts (i.e., w(t) and hi(t)) in the model, the semiparametric N L M E model (2.4) is more flexible than parametric N L M E models for modelling longitudinal data, and it reduces to a parametric N L M E model when the nonparametric parts w(t) and hi(t) are constants. Following W u and Zhang (2002), model (2.4) is also more flexible than other semiparametric N L M E models that have appeared in the literature, such as Ke and Wang (2001). The semiparametric N L M E models in K e and Wang (2001) can be considered as a special case of model (2.4). In particular, their model only put the random effects in [3*^ as in (2.2) and considered (£*,•) = W(UJ; /?*•) in (2.3). Therefore, model (2.4) is a very general and flexible semiparametric N L M E model. 2.2.2 A Basis-based A p p r o a c h to N o n p a r a m e t r i c F u n c t i o n s To do statistical inference for the semiparametric N L M E model (2.4), a main difficulty is how to fit the nonparametric smooth fixed-effects function w(t) and random-effects function hi(t). Following Rice and W u (2001) and W u and Zhang (2002), we use a basis-based approach which transforms a general semiparametric N L M E model into a set of parametric N L M E models indexed by a smoothing parameter (the number of basis functions). 18 We use the fixed-effects function w(t) to illustrate the basis-based approach. Let x be the support of t and L (x) be the inner product space of all square integrable 2 functions with norm || • || and inner product < •, • >, where for any ipi, ip2 £ L {x)> 2 Jx a subspace of L (x)2 w w W^(x) define e Jx Assume that w(t) is an element of a smooth function space S (x), example of S (x) w An is the Sobolev space = {ir>\il>,ip',...,i) ~ {m absolutely continuous, ^ 1] Denote a complete orthonormal basis of S (x) ( m e ) L ( )}. 2 X by \I>(£) — [ipo(t), tpi(t), tp2(t), • • • ] T w where ipo(t) = 1. Then w(t) can be expanded as oo fc=0 where the coefficients Hk= w(t)ip (t)dt =< w,ip > k k . JX Let * (t) p = [ipo(t), ipi(t), • • • ,ip -i(t)] T P and fj, — (//o, A*i, • • • ,^ ) T p P Since w(t) are square integrable, the truncations of w(t) at term p p-i fc=0 -2 will converge to w(t) in L -norm as p tends to infinity. It follows that when p is large enough, w (t) p can approximate w(t) very well, i.e., w(t) ~ w (t). p Similarly, if we assume that hi(i) is an element of a smooth function space 5/ (x)(c l L (x)) 2 = [(f>o{t), <Pi(t), 02 (0> • • - ] with a complete orthonormal basis T the truncations of hi(t) at term q <?-i k=0 19 where (j>o(t) = 1, will converge to hi(t) in L - n o r m as q tends to infinity, where <& (£) = 2 g (f) -i(t)] and £ , = ( £ > . . . , £ ; ) . It follows that when q is large enough, h (t) T T q [<f>o(t), <f>i(t), • • •, iq i 0 iq g imate hi(t) very well, i.e., hi(i) ~ can approx- hi (t). q The function w (t) can> be considered as the projection of w(t) on the linear space p S(x, * ) p = {i> | i> = ty (t) p and the function h (t) iq S{x, T p /ii(i)'s G i? } p p iiq^ iiq € p and g increasing, io (i) and p fi p C S (x), w spanned by basis functions $? (t), p can be considered as the projection of hi(t) on the linear space = {0 I 0 = ^P^T / x and fi , /ii (t) g C SUx), spanned by basis functions approach to w(t) and respectively. With Parameters are unknown vectors of fixed- and random-effects coefficients, respectively. Since are assumed to be identical and independent realizations of a zero-mean stochastic process, we can regard £ i q as identical and independent realizations of a zero-mean random vector with unknown covariance matrix K. There are many bases available in the literature for curve fitting. Among global bases are Legendre polynomials and Fourier series, and among local bases are regression splines (Eubank, 1988), B-splines (de Boor, 1978) and natural splines (Green and Silverman, 1994). A B-spline of degree d on x with knots to < ti < ' ' ' < < ^M+I is a piecewise polynomial with polynomial pieces of degree d joining together smoothly at the interior knots * i < • • • < * M while satisfying some boundary conditions. In other words, a B-spline is a polynomial of degree d within each of the intervals [tk, tfc+i), 0 < k < M — 1, and [XM, I M + I ] , which globally has (d — l)-continuous derivatives. A l l such B-splines form a linear space with M + d + 1 basis functions which are mainly determined by three factors: the degree d; the location of the knots, and the number of interior knots M. corresponding B-splines are called cubic splines. When the cubic splines have zero second and third derivatives at the two extreme knots XQ and XM+I, splines. When the degree d = 3, the they are called natural cubic Without loss of generality, throughout this thesis, we assume that the nonparametric 20 fixed- and random-effects functions w(t) and hi(t) are elements of the Sobolev space VK (x) 2 2 and we use natural cubic spline bases (Green and Silverman, 1994) due to their many good properties, for example, easy construction, good smoothness, and flexibility to model the underlying curves of various shapes (de Boor, 1978). 2.2.3 Percentile-based Knot P l a c i n g for Splines The placing of knots is an important issue for splines in which we attempt to use a few knots to represent a sample of design time points. We use sample percentiles of the design time points as knots so that there are more (fewer) knots in the area where more (fewer) design time points are available, as suggested by W u and Zhang (2002). They indicated that the percentile-based knot placing rule should work better for longitudinal data than the equally-spaced knot placing rule used by Rice and W u (2001), since the design time points of longitudinal data are usually sparse and often not uniformly spaced. Moreover, the percentile-based knot placing rule guarantees that the locations of the knots (and also the resulting basis functions) are sample-dependent and design-adaptive. These properties are not shared by the equally-spaced knot placing rule. After the degree d and the knot placing rule are determined, we need to choose the numbers of the interior knots, or equivalently to choose the numbers p and q of the basis functions, which are called smoothing 2.2.4 Selection of S m o o t h i n g parameters. Parameters Using natural cubic spline bases with percentile-based knots to fit the nonparametric fixedand random-effects functions w(t) and hi(t), we can transform the semiparametric N L M E model into a parametric N L M E model. To assess how well the resulting parametric N L M E model approximates the original semiparametric N L M E model, we need to consider two factors: the goodness-of-fit and the model complexity. Goodness-of-fit usually indicates how 21 well the model fits the data (or how small the biases of the associated estimators are). It can be improved by increasing p and and Sh(x,®q{t))- q or equivalently, enlarging the linear spaces S (x,^ {t)) w p However, the model complexity represents how complex the model is (or how large the variances of the associated estimators are). The model usually becomes more complicated with increasing p and q. Thus, there is a trade-off between the goodness-offit and the model complexity. To balance the two components, it is natural to employ some model selection rules such as the Akaike Information Criterion (AIC) or the Schwarz's Bayesian Information Criterion (BIC) (Davidian and Giltinan, 1995). This is because the transformed parametric N L M E models are indexed by p and q and choosing different p and q is equivalent to choosing different parametric N L M E models. Let <p be the number of independent parameters in a parametric N L M E model, say, model (1.1) and (1.2). Then the A I C and the B I C are defined as A I C = - 2 L o g l i k + 2tp, B I C = - 2 L o g l i k + [log ( £ m)] cp, where Loglik is the log-likelihood of the fitted the parametric N L M E model (see Davidian and Giltinan, 1995, pl56). Since a parametric N L M E model with a larger number of parameters will always produce a larger value for the log-likelihood (a smaller value for -2 Loglik), the n penalty terms 2<p in A I C and log(]Tnj) L i=l ip in B I C are needed to offset this advantage. Since J the penalty term in B I C is usually much larger than that in A I C , B I C is a conservative rule and generally favors a parsimonious model. Since both the A I C and the B I C of a parametric N L M E model are defined as twice the negative log-likelihood of the model (representing the goodness-of-fit) plus a. penalty term related to the number of parameters used in the model (representing the model complexity), we will choose BIC are minimized over a series of ^ (t) p and ^ (t) p $ (t), g and $> (t) q so that the A I C or the which leads to the best approximate parametric N L M E model to the original semiparametric N L M E model in terms of the A I C 22 and B I C criteria. Liang et al. (2003) noted that the model obtained this way often provides good approximation in practice. We will evaluate the performance of the A I C and the B I C in the current setting (see Section 4.6). 2.2.5 Transformation of the Semiparametric N L M E Model After determining the smoothing parameters p and q via the A I C and the B I C criteria, we replace w(t) and hi(t) in the nonparametric function r;(£) in (2.3) by their approximations w (t) p and h {t). iq Thus, we obtain an approximation to the nonparametric function r^t), and approximate the semiparametric N L M E model (2.4) as follows V i j « gfc, d*(zT., ft, b*), v^! {t) ^ = giUj, d(zy, ft, bi)) + ey T p $ (t) £ )) + T q iq ^ (2.5) where (3 = ( f t , /j, ) are fixed effects, b; = (b*, £ ) are random effects, and d(-) is a known i ? but possible nonlinear function. Then, we can approximate the semiparametric N L M E model (2.1) - (2.3) by the following parametric N L M E model ft.-Hey, Va = </(/,;• ft,. = d ( z , ft bi), e i |ft ^ / V ( 0 , 5 I), 2 b > A f JV(0,/3), y (2.6) (2.7) where B is an unstructured covariance matrix. Note that e; and bj are independent of each other. Approximate statistical inference can then be based on the approximate model (2.6) and (2.7), as shown in Chapters 3 - 5 . 2.2.6 Consistency of the After we obtain estimates ji p and £ Estimate of i q w{t) based on the parametric N L M E model (2.6) and (2.7), we can then estimate the nonparametric functions w(t) 23 and hi(t) in the semiparametric N L M E model (2.1) - (2.3) as follows w(t) =• w (t) = * (i) A , T p p P k(t) = k (t) = $ (t) i , T g g iq Therefore, the consistency of the estimates in the semiparametric N L M E model (2.1) - (2.3) is strongly related to the consistency of estimates in the parametric N L M E model (2.6) and (2.7). Under some mild conditions, the following Theorem 2.1 guarantees that we can obtain a consistent estimate w(t) of the nonparametric fixed-effects function w(t) in the semiparametric N L M E model (2.1) - (2.3) if we can find i/n-consistent estimates -ft of the fixed-effects coefficients pi . p Following W u and Zhang (2002), we prove the consistency of the estimate w{t) of the nonparametric fixed-effects function w(t) in the semiparametric N L M E model (2.1) - (2.3) based on the following conditions: (a) . $(t) is a complete orthonormal basis of S(x), subspace of a L (x)2 oo (b) . The nonparametric fixed-effects function w(t) E S(x) s o that w(t) = £ l^ki>k{t)k=Q (c) . The design time points {i^, i = l,...,n, j = l , . . . , n j } are identically and independently distributed such that when the number n of individuals tends to infinity, the number of distinct time points will tend to infinity. In this case, we can truncate w(t) in the semiparametric N L M E model (2.1) - (2.3) in such a way that w (t) p = Y^k^o^k^kif) = ty {t) /x so that p —> oo, p/n —> 0 as n —> oo. T p p (d) . For any fixed p, we assume that we can obtain A/n-consistent estimates fx of the p fixed-effects coefficients pL so that as n —• oo, E(p\ ) p p semidefinite positive matrix S —> fi and Cov(\/nfi ) p p —• E for some p with p t r ( E ) bounded. _ 1 p p T h e o r e m 2.1. Under Conditions (a) - (d), as n —* oo, we have \\w — w\\ —> 0 in probability. 24 Proof. First we consider E\\£ip — [ip "p f"pl E\\p, -» \\ J p -p-1 P-I _fc=0 fc=i = E 2 p p-i = ^E[/x -E(/i ) + ^(Afc)-^] fc=i p-i 2 - £ fc=i A**] + f c f c - W*)] + 2 - 2[A/t - E(p: )][E(fi ) 2 k - k v ]} k P-I = £{E[£ *;=i p—i - £(£fc)] + [£(£*) - ^] } 2 f e 2 p—i f = ^Var(/i ) + I][^(A fc=i fc=i - tr [Cov(A )] + | | £ ; ( A p ) - M ] ) - f e f c )-^] 2 2 p P Under Conditions (a) and (b), and C o v ( n / i ) —> S in Condition (d), we have / v E\\w - w\\ = E\\w - w\\ 2 2 p p p < 2{E\\w - w \\ + \\w - w\\ } = 2{E\\p, -v \\ = 2{tr[Cov(ji )] = 2 { n - t r [ C o v ( v ^ / i ) ] + \\E(p. ) - M | | + \\w - w\\ } = 2Jn- tr[S + = 2(n- [tr(S )+po(l)] 2 p 2 p 2 p P p p + \\wp-w\\ } 2 + \\E(j± )-ti \\ + 2 p p \\w -w\\ } 2 p 1 2 p 1 p 1 p p 2 p p o(l)] + ||£;(A)-^|| + ^ 2 p + ||/i;(Ap)-/Xp|| 2 + 2 f c | f;^} fc=p = 2^ k=p ) Under Conditions (b)-(d), it is easy to show that the three terms in parentheses {•} of the right-hand side tend to 0 as n —> oo. Under Condition (d), as n —> oo, E\\w — w\\ —> 0 2 implies \\w — w\\ —* 0 in probability. • ' 25 2.3 Measurement Errors and Missing Data in Covariates At the presence of measurement errors and missing data in the time-varying covariates u z = (zni,... ,z i) , T il/ we need to model the covariate processes. We consider the following multivariate linear mixed-effects ( L M E ) model (Shah et al., 1997) to empirically describe the covariate process z it = Uua + Vu a; + eu (= z* + e) t it where Uu and Vu are design matrices, a and i = 1 , . . . , n, I = 1 , . . . , m-i (2.8) are unknown population (fixed-effects) and individual-specific (random-effects) parameter vectors, and eu are the random measurement errors for individual i at time uu- For example, we may model the covariate processes parametrically based on empirical polynomial models with random coefficients, as in Higgins et al. (1997) and W u (2002). Alternatively, we may model the covariate processes nonparametrically, and approximate the nonparametric fixed- and random-effects functions by linear combination of some basis functions, as in Section 2.2. For either parametric or nonparametric covariate models, we may convert the covariate models to the L M E model (2.8). Note that the covariate model (2.8) incorporates both the correlation of the repeated measurements on each individual and the correlation among different covariates. Note that the parameters in the covariate model (2.8) may be viewed as nuisance parameters because they are often not of main interest. We assume that the true (unobservable) covariate values are z* = U a u We also assume that a; i.i.d.N{0, A), u e a + Vu sit. i.i.d.7V(0, R), and a^ and e = (e^,... t independent, where A is an unrestricted covariance matrix and R is an unknown within26 individual covariance matrix. We further assume that and a, are independent of ej and b j . Models (2.8) may be interpreted as a covariate measurement error model (Carroll et al., 1995; Higgins et al., 1997). To allow for missing data in the time-varying covariates (or different measurement schedules for the time-varying covariates), we recast model (2.8) in continuous time: Zi(t) = U (i)cx + V (t)a + e (t), i where Zj(£), i i i i = l,...,n, Ui(t), Vi(t), and e»(t) are the covariate values, design matrices, and measurement errors at time t respectively. A t the response measurement time Uj, which may be different from the covariate measurement times uu, the possibly unobserved "true" covariate values can be viewed as z*j = [7^ cx + a^, where Uij = Ui(Uj) and = Vi(Uj). In other words, missing covariates at time Uj may be imputed by their estimated true values z*-. 27 Chapter 3 A Joint Model for Semiparametric N L M E Models with Covariate Measurement Errors and Missing Data 3.1 Introduction In this chapter, we address measurement errors and missing data in time-varying covariates for semiparametric N L M E models. In Section 3.2, we review the two-step method proposed by Higgins et al. (1997). We derive some analytic and asymptotic results for the two- step estimates for parametric N L M E models with mis-measured covariates, and analytically show that the variances of the main parameter estimates based on the two-step method are underestimated. To address measurement errors and missing data in time-varying covariates in semi- 28 parametric N L M E models based on models (2.6) - (2.8), in Sections 3.3 and 3.4 we propose two joint model methods, implemented by a Monte Carlo E M algorithm and by a first-order Taylor approximation to the log-likelihood function respectively, to find approximate M L E s of model parameters. We also discuss asymptotic properties of these approximate M L E s . In Section 3.5, we apply the two joint model methods to a real dataset. We evaluate the proposed methods and compare them with the two-step method via simulation studies. We conclude this chapter with some discussion in Section 3.6. Proofs of the asymptotic properties of approximate M L E s are presented in Section 3.7. 3.2 A Two-step Method For covariate measurement error problems, a commonly used method is the so-called twostep method (Higgins, et al., 1997; Liang, et al., 2003): in the first step the "true" covariate values are estimated based on an assumed covariate model, and then in the second step, the possibly mis-measured covariates in the response model are simply replaced by the estimated covariates from the first step. The estimation of the main parameters in the response model proceeds as if the estimated covariate values are the true covariate values without measurement error. Intuitively, the resulting estimates of the main parameters may be approximately unbiased if the covariate estimates from the first step are unbiased, but the variances of the main parameter estimates may be underestimated because the variability of the covariate estimation in the first step is ignored in the estimation of the main parameters in the second step. Higgins et al. (1997) and Ogden and Tarpey (2005) realized this problem and proposed bootstrap methods which incorporate the variability from estimating the covariates in the first step. W u (2002) considered an approximate joint model approach which also incorporates the variability in the covariate estimation. In this section, we derive some 29 analytic results for the two-step estimates for parametric N L M E models with mis-measured covariates such as the models (2.6) - (2.8), analytically show that the variances of the main parameter estimates based on the two-step method are underestimated, and derive some asymptotic results for the two-step estimates. Let y = (yf, • • • , y ^ ) , and define z, z*, and e similarly. If z* is known, an estimate T (3 of (3 can be expressed as (3 = s(y, z*), where s is a vector function. Since z is recorded with errors and z* is unobservable, we assume z = z* + e, where E(e) = 0. Let z be an unbiased estimate of the true covariate value z* based on the observed covariate value z (i.e., E(z) = z*). The two-step method estimates (3 by f3 = s(y, z ) , which depends on realizations of two random variables y and z. If we assume that E[s(y, z*)] « (3, and y and z are roughly independent so that E ds(y,z) (z-z*)} dz E ds(y,z) dz £(z-z*), and that the function s(y, z) is well approximated by a first-order Taylor series expansion around z*, then we show next that the estimate f3 is also approximately unbiased, following Ogden and Tarpey (2005). Taking a first-order Taylor expansion of s(y, z) around the "true" covariate value z*, we have E0) =£(s(y,z))^£{s( ,z*) + y (3 + E ds(y,i) ds(y,z) | dz Iz=z* (z-z*)} E(z - z*) = f3, provided that the expectations exist (the expectation operator in the above expression is to be taken as the expectation with respect to both y and z). However, the variances of the two-step estimates will be underestimated, as shown below. When z is plugged in the estimation of j3, the resulting variance-covariance matrix is actually an estimate of Cov(/3|z). By the well-known variance decomposition formula Cav0) = Cov[£(/3|z)] + £[Cov(/3|z)], 30 we know that Cov(P) — E[Cov(/3\z)] is nonnegative definite, since Cov[E(/3|z)] is nonneg- ative definite. Thus, on average, the two-step approach underestimates the true variancecovariance matrix of f3. Following the general approach taken by Amemiya (1983), under the suitable regularity conditions on the log-likelihood function of P (e.g., the third order derivatives exist and are continuous in an open neighborhood about (3), we derive the asymptotic distribution for the M L E of f3 when the unobservable "true" covariates z* are imputed by their estimated values z^. Suppose that z^ are ^/m^-consistent estimates of z* and that yfrriiizi where - z*) N(0, fi ), i = Zi l,...,n, is the number of observations for covariates on individual i. mi = 0(m) We assume that uniformly for i — 1,. . . , n, where m — min^m,), and that m,j and n go to infinity at the same rate with n/rrti —• Cj, where 0 < Cj < oo. Let n i=l be the log-likelihood function of f3 based on the observed data y and z* with the unobservable "true" covariates z* are imputed by their estimated values z. T h e M L E J3 of /3 satisfies a set of equations f-' dp dp 1=1 where dk{P;yi, Z j ) dp = dlj(P;yi, z ) P=P dp f Taking a first-order Taylor expansion of dl((3;y, z)/d/3 around the "true" covariates z*, we 31 have z*) n ° - Edli(P;yi, dp +E K) = E dli((3;yi, +E 8(3 d h(p- , z*) dli(P;yi, z*) + E *-<>+£>(£) =E dpdzf * dli(f3;y z*) 8 k(P;yi, z*) • = E dp +E dpdzf n d k(P; , z*) dk(P;yi, z | ) =E +E dpdzf z*) d li(P;yi, z*) = E 8h(P;yi, + E dp dpdzf r&li&yi, L z* •(ii - z') + 0 { | | , - z-|| ) 2 Z d/33zf i=i n i=i i=l i=l n n r c ^ f t v i , z* L <9/3<9zf 2 yi a/3 i=i n n 1=1 2 u Z j - z*) + O I m a x Y — ) p i=i n L i \ mj / . 2 yi Zi-z*) + O 5/3 i=i i =n l n p (minm^. 2 z» - z*) + O p ( m ) , _ 1 2=1 since z» are 9k(P; of »=i y 77vconsistent / yi, estimates of z*. Next, carrying out a first-order Taylor expansion z*)/dp in the above expression around /3, we can obtain i=l d/3d/3 i=i J d lj(P;yj, z*) . ( z i dpdzf 2 i=l E t=i d h(p; , dh(P;yu zj) z*) 2 yi a/3 dpdp T i=i Z i ) + Op(m (P-P) _ x ) = 0 + O (n- ) 1 p , ^ d li(P;yi, z*) *s,n, -u + 2i =^l — a / 3 5 z ( z i - Z i ) + Q (^ ) = o, 2 n r P since /3 is the M L E of /3 and thus it is i/^-consistent under the necessary regularity conditions on the log-likelihood function of p. The above expression can be written as 1 ^ d h(P; , z*) n ^ dpdp i=i 2 yi 1 v d k(p; ,z*) dpdz\ 2 yi i=l 32 t=i dkjp-^z*) dp Assume that the following limits exist lim - f > ( / 3 ) = /(/3), n — * o o 7T, * — • • ' =1 where ii(/3) = —E[d li(f3; yi, z*)/d(3d0 ] is the Fisher information matrix for individual i. 2 r It follows from Lemma 3.2 in Section 3.7 that 1 A Note that -\/n( i z — z \) ~ - &kfay z?) u \/Ci m i{ i z — z d h(i3; , z*) 2 n yi p *) for large n and m^. Using the asymptotic normality of Z j , we know that for large m*, £'[y rf^'(zi — z*)] « 0 and Cov[y m^(z — z*)] « fl .. / / i z By Lindeberg's central limit theorem, we have It follows from Slutsky's theorem that 1^ d l 0-y , 2 i i z*) l^U r d k(J3;yi, z*) 2 d Under the necessary regularity conditions on the log-likelihood function of (3, based on the standard arguments for showing asymptotic normality of M L E s , we can obtain and 1 A d li(f3;yi, z*) 2 ^ d(3df3 T P I m Since y and z are roughly independent, the two limit random variables with distributions N(0, Q ) and N(0, I((3)) are roughly independent. Note that n/m = 0(1). Putting these z pieces together, we have asymptotic normality of y/n(/3 — (3) as follows: • ^cp-ft^N^rm-'+m-^jipy 1]. 33 (3.1) Note that the variance-covariance matrix of (3 based on the asymptotic distribution (3.1) has two components. T h e first component I(f3)~ is the "naive" estimate of the l variance-covariance matrix of /3 in the two-step method, in which the variability of z» is neglected. The second component /(/3) _1 O z /(/3) _1 arises from the variability in estimating Z j and summarizes the extra uncertainty of (3 due to estimation of Z j . component I{[3)~ fl 1 z 7(/3) _1 Since the second is nonnegative definite, the variances of the main parameter estimates based on the two-step method are underestimated. 3.3 A Joint Model Method for Likelihood Inference 3.3.1 The Likelihood for the Joint Model We consider likelihood inference for semiparametric N L M E models with measurement errors and missing data in time-varying covariates, based on the approximate parametric N L M E models (2.6) - (2.8). ( a , (3, 5 , 2 The observed data are {(yi, z*), i = l,...,n}. Let 6 = R, A, B) be the collection of all unknown parameters in models (2.6) - (2.8). We assume that the parameters a , (3, 5 , R, A, and B are distinct. Let /(•) be a generic 2 density function, and let [X\Y] denote a conditional distribution of X given Y. T h e approximate log-likelihood for the observed data {(yi, Z j ) , i = 1,... ,n} can be written as i=i J 34 where /y(yi|zi, = TJpli M y i j l y > *> *; a, (3, 5 ) z 2 a A <^) 2 b = n;ii ( 2 ^ ) - ^ e x p { - [ y - g(t , d(v%a + v^ai, /3, b,))] /2<5 }, 2 y l3 / (zi|ai; a, i?) = T J ^ i /z(ifcl *; a, i?) z a z = ]ir=i M " / 1 x(z - U i ifc f c 2 exp{-(zi - u fc Q 2 1 i ) fl) = |2^fi|- /2 p{-bf/3- bi/2}. 1 i; R~ l ik fc l /(b cx - w Vi ai)/2}, - /(a,; A)" = | 2 7 r A | - / e x p { - a M - a / 2 } 1 ifc 1 eX This approximate log-likelihood function generally does not have a closed-form expression since the functions in the integral can be nonlinear in the random effects ai and bj. Exact likelihood calculations therefore require numerical evaluation of an integral whose dimension is equal to the dimension of the random effects (a i( b;). This is straightforward to do by direct numerical integration such as Gaussian quadrature when the dimension of (ai, bi) is very small (say, 1 or 2). However, when (a^, bi) has a dimension of 3 or more as is often the case in practice, one needs to consider alternative methods such as computationally intensive Monte Carlo methods. Laird and Ware (1982) obtained M L E s in L M E models using the E M algorithm. Here we use a Monte Carlo E M ( M C E M ) algorithm to find the approximate M L E s of all parameters 6. B y treating the unobservable random effects ai and bi as additional "missing" data, we have "complete data" { ( y , , Z j , ai, bi), i = l,...,n}. The complete-data log- likelihood function for all individuals can be expressed as c{Q) l = E c(0) l = £ { l o g /y(yj| i, z i=i ai, b a, (3, 5 ) + log 2 i; i=i where z z a, R) (3.2) + log/(a ;yl) + log/(b ;B)} i f { i\^\ i ) is the complete-data log-likelihood for individual i. 35 2 3.3.2 A M C E M Method The E M algorithm (Dempster, Laird, and Rubin, 1977) is a very useful and powerful algorithm to compute M L E s in a wide variety of situations, such as missing data and randomeffects models. The E M algorithm iterates between an E-step, which computes the conditional expectation of the complete-data log-likelihood given the observed data and previous parameter estimates, and a M-step, which maximizes this conditional expectation to update parameter estimates. The computation iterates between the E-step and M-step until convergence leads to the M L E s (or local maximizers). When there are several modes in the conditional expectation, the M L E s can be determined by trying different parameter starting values. For our models, the E-step is quite intractable due to nonlinearity, so we use Monte Carlo methods to approximate the intractable conditional expectations. In the M-step, we use standard complete-data optimization procedures to update parameter estimates. Let 6® be the parameter estimates from the i-th E M iteration. The E-step for individual i at the (t + l)th E M iteration can be written as Qi(9\0W) = E(l®(0)\ , z; flW) yi f log /y(y;|zi, ai, bi; a, (3, S ) + log / (zj|aj; a, R) 2 z + log /(ai; A) + log /(bi; B) x /(ai, b;|v;, z ; 0(*>) db { = { J « ( a , (3,5 ) + / W ( a , R) + I®(A) + I®(B). (3.3) 2 The above integral generally does not have a closed form, and evaluation of'the integral by numerical quadrature is usually infeasible, except for simple cases. However, note that expression (3.3) is an expectation with respect to the conditional distribution /(ai, bi|yi, z*; 0®), and it may be evaluated using the M C E M algorithm of Wei and Tanner (1990), as in Ibrahim et al. (1999, 2001). Specifically, for individual i, let random sample of size k generated from t {(af \ bf >),..., (af , bf )} denote a [a;, bj|yj, z*; 0^]. 36 () Note that each t} ( a f b f ^) de- pends on the E M iteration number t, which is suppressed throughout. Then we approximate the conditional expectation Qi(0\0®) in the E-step by its empirical mean, with missing data replaced by simulated values, as follows = IT E log fviyiK (5, 6>) + i £ log / ( * | a ? > ; a , J2) a ? \ bf> «, z ; k=l fc=l + i E l o g /(af U ) + £ E log /(bi ;B). F C ) fc=i fc=i + fc _i/c, t — 1, 2 , 3 , . . . , for some positive We may choose ko as a large number and k = k -\ t t t constant c, in the t-th iteration. Increasing k with each E M iteration may speed up the E M t convergence (Booth and Hobert, 1999). The E-step at the (t + l ) t h E M iteration can then be expressed as Q(o\o®) » E = ± QMeW) i=i = E E i=i \r i £log M v t K E ^ ( 0 ; vu ~^t\ b ? ) } } t fe=i j (3, <5) + ± a f , b?\a, i=lfc=l £ log / ^ a f ; a , i?) i=lfc=l +E E 1=1 E 2 i log /(af ; A) + E E 5 fc=l 2=1 fc=l i log /(b! f c ) ; B) = Q ^ a , /3, r5|6>W) + Q ( ) ( a , i?|6»W) + < 2 ( ) ( A | 0 « ) + 2 2 Q^O^M). 3 To generate independent samples from [a bj| i( y i 0^], we use the Gibbs sampler , z; { (Gelfand and Smith, 1990) by sampling from the two full conditionals [ a j | , Zj, b ^ 0^] and yi f? ] as follows. (t) [ b i | y j , Zi, /(ailyi, z i ; b i ; 0<*)) oc / ( a i , y ^ Z j , b i ; 0(*>) = / ( a ^ , b = /(ai|zi;6>W)-/ (yi|zi, y / ( b i | y i , Zi, ai; 0 ) (t) a i i ; 0<*>) • f (yi\zi, 0 « ) • / ^ ( Z i | a i ; 0W) • M y ^ , oc / ( b i , y i a,; 0 ) i ; b^ 0(*>) ai, b , ; 0 « ) , = / ( b i | z i , a^ 0 ) W a , bi;6»W) oc / ( a , ; |zi, Y W • / y ( v i | z i , a*, (3.4) fy; 0W) = /(b ;0W)-/y(y |z iai,b ;0(*)), i i i i where a i and b i are independent each other. Monte Carlo samples from the above full conditionals can be generated using rejection sampling methods, as in W u (2004). Alternatively, 37 integral (3.3) may be evaluated using the importance sampling method. Details of these sampling methods and convergence issues will be investigated in the next section. The M-step then maximizes Q(0\0^) to produce an updated estimate #( t+1 ' at the (t + l)-th iteration. Note that the parameters cx, (3, 6 , R, A, and B are all different, so we 2 can update the parameters ( a , (3, 5 , R), A, and B by maximizing + Q^ \ 2 and 2 separately in the M-step. (3 \ The maximizer (a<- \ {t+l t+1 R ^ ) for 5^ \ 2 t+l + may be computed via iteratively re-weighted least squares where the random effects are replaced by their simulated values {(af \ ( (*+D a ) /3 (t+D b f >)}: j < J 2 ( t + l ) j = a r g m a {Q(D( x CX,j3,S ,R Q)/ 3 52| ( ) )( 0 t ) + g(2) ( a ) j R | (t) 0 ) } 2 { 5I5Ir 6 + j k n t EEr i=i fc=i l o K lo M Y i N , a f \ b f ^ a , /3, <5) g^(^l^ 2 f e ) ;«^)j- t ( - ) 3 5 J In general, the function in (3.5) is nonlinear in parameters and thus, the maximizers have no closed-form expressions. The maximizers could be obtained via standard optimization procedures for complete-data nonlinear models, such as the Newton-Raphson method. Note that optimization procedures for nonlinear models may be iterative as well. We can use the following Lemma to obtain analytic expressions of the maximizer A< > for t+1 and the maximizer B ^ for L e m m a 3.1. (Seber, 1984). Consider the matrix function h(E) = l o g | E | + t r [ E n ] . _1 If Q is positive definite, then, subject to positive definite E , /i(E) is minimized uniquely at E = a 38 By Lemma 3.1, = the maximizer A^ ^ y A n kt ^ argmax^^-log/CafM) i=i k=i " =' arg max £ £ - fe=l ^ fc=l argnunlnloglAI [af >f ^"^ [af >]} + i^^alY^- !^]} argmmjnlogMI + argminjn [fi 4 1 i = l fe=l i^^trltaf)]^- ^])} 1 i = l fc=l 4 = (*)]} A * 4 = 1 { - ± log | 2 T T A | - | [ g W p -i arg mm £ £ I { log \A\ + i=l = * i fct i=l = can be written as argmax{Q^(A|6>W)} • = for t+1 log | A | + ± ^ j S r ^ T [af] 1 [aj ] )} fc) r 2=1 fc=l 4 ^-trL" £ | > f ] [af'f }} 1 = a r g m i n j n log | A| + I 4 i=l J k=l • = rgmm{log|4+tr{/l-' i-^E^* ]N* ] '}} ) 7 2=1 K = l - Similarly, ) ; a C i=l k=l the maximizer 73 fl (i+1) = (4+1 for ) argmax{Q >(#|0( ))} (4 B = can be obtained by 4 n kt i=i fc=i -. argnwx££-log/(bi f c ) ;i?) 4 = ^EE^'iibfr. 4 1 = 1fc= l To obtain the asymptotic variance-covariance matrix of the M L E 9, we can use the formula of Louis (1982), which involves evaluating the second-order derivative of the complete39 data log-likelihood function. Alternatively, we may consider the following approximate formula (McLachlan and Krishnan, 1997). Let Sc \ = dlc^/dO, where li^ is the complete-data log-likelihood for individual i. Then an approximate formula for the variance-covariance matrix of 0 is -l Cov(0) = $>(s«| V l , z l 5 0) E{sf\ , Vi z i ; 9f where the expectations can be approximated by Monte-Carlo empirical means, as above. In summary, the foregoing M C E M algorithm proceeds as follows. Step 1. Obtain an initial estimate of 9 = 0 ^ based on a naive method such as the two-step method, and set a f = 0 and bj ^ = 0. 0 Step 2. A t the (t+ l)th (t > 0) iteration, obtain Monte Carlo samples of the "missing data" (a,, bj) using the Gibbs sampler along with rejection sampling methods by sampling from the full conditionals [a^y;, z;, b,; 0^] and [b^y^, Z j , a^; 0 ^ ] , or using importance sampling methods, to approximate the conditional expectation in the E-step. Step 3. Obtain updated estimates 9^ \ +1 using standard complete-data optimization procedures. Step 4. Iterate between Step 2 and Step 3 until convergence. 3.3.3 Sampling Methods Gibbs Sampler For the proposed Monte Carlo E M algorithm, we can see that generating samples from the conditional distribution [a*, bj|yj, Zj'; 0^] is an important step for implementing the E-step of the Monte Carlo E M algorithm. The Gibbs sampler (Gelfand and Smith, 1990) is a popular method to generate samples from a complicated multi-dimensional distribution by sampling from full conditionals in turn,.until convergence after a burn-in period. Here, we 40 use the Gibbs sampler to simulate the "missing" random effects a^ and bv Set initial values ( a | ° \ b- ^). If the current generated values are (a\ \ 0 k b f ) , we can obtain ( a f + 1 \ h \ k a s follows: Step 1. Draw a sample for the "missing" random effects a\ from h the full conditional z , b| ; 0 W ) . fc) t ' Step 2. Draw a sample for the "missing" random effects /(tx| y > i h\ ^ from the full conditional k+1 af ; 0<*>). +1) We assess the convergence of the Gibbs sampler by examining time series plots and sample autocorrelation function plots. After a sufficiently large burn-in of r iterations, the sampled values will achieve a steady state as reflected by the time series plots. Then, {(af, b| ')} can be treated as a sample from the multidimensional density function fc /(ai, bi|yi, 0(*>). If we choose a reasonably large gap r' (say r' = { ( a f ° , b\ ),k k) = r + r',r + 2r',...} 10), we can treat the sample series as an independent sample from the multidimensional density function. T h e simplest choice for initial values (a\°\ bf^) is (0, 0). Multivariate Rejection Algorithm Sampling from the two full conditionals can be accomplished by rejection sampling methods as follows. If the density functions are log-concave in the appropriate parameters, the adaptive rejection algorithm of Gilks and Wild (1992) may be used, as in Ibrahim et al. (1999). However, for arbitrary N L M E models, some densities may not be log-concave. In such cases, the multivariate rejection sampling method (see Section 3.2 in Geweke, 1996) may be used to obtain the desirable samples. 41 Booth and Hobert (1999) discussed such a method in the context of complete-data generalized linear models, which can be extended to our models. For example, consider sampling from / ( a j | y j , z / ( z j | a j ; 0 ^ ) - / ( y j | z j , a^, b i ; 0^) i ; bjj 0®) in (3.4). Let /*(a,) = and c = sup{/*(u)}. We assume c < oo. A random sample u from / ( a j | y i , Z i , b^ 0^) can then be obtained as follows by multivariate rejection sampling: Step 1. Sample a* from / ( a ^ 0^), and independently, sample w from the uniform (0, 1) distribution. Step 2. If w < /*(a*)/<r, then accept a*, otherwise, go back to step 1. Samples from / ( b j | y j , Zj, a*; 0^) can be obtained in a similar way. Therefore, the Gibbs sampler in conjunction with the multivariate rejection sampling can be used to obtain samples from [ a i , bj|yj, z*; 0^]. Booth and Hobert (1999) noted that, when it is easy to simulate from the assumed densities, the multivariate rejection sampling method can be very fast even if the acceptance rate is quite low. Importance Sampling When the dimensions of a^ or bj are not small, however, the foregoing rejection sampling methods may be slow. In this case, we may consider importance sampling methods where the importance function can be chosen to be a multivariate Student t density whose mean and variance match the mode and curvature of /(a*, bi| y i , Z i ; 0^). Note that a mul- tivariate t distribution, which has heavier tails than a multivariate normal distribution, will produce a more robust approximation since underestimating the tails can have serious consequences such as unstable behavior that may be difficult to diagnose. Booth and Hobert (1999) discussed an importance sampling method for complete-data generalized linear models. Here, we may extend their method to our models and use importance sampling methods to approximate the integral in the E-step. Specifically, we write /(a,, b i | y i ; Z i ; 0^) = 5exp[/i(ai, b i ) ] , 42 where s is an unknown normalizing constant. Let h(a^, bj) and h(a^, bj) be the first and second derivatives of h(a^, bj) respectively, and let (a*, b*) be the solution of /i(aj, bj) = 0, which is the maximizer of h(ai, bj). variance of Then, the Laplace approximations of the mean and /(aj,bj|yj, z$; 0®) are (a*, b*) and — (h(a*, b*)) _1 respectively. Suppose that {(a*' ', b*' ^),..., (a*^'', b*^)} is a random sample of size k generated from an importance 1 1 t function /i*(aj, bj), which is assumed to have the same support as /(aj, bj|vj, ZJ; 0^). Then we have where it) /(a; ,b; W (fc) |y ,z t i; QW)- h*(af\b*M) are importance weights. Other sampling methods have also been proposed (e.g. McCulloch, 1997). For the above sampling methods, the adaptive rejection method is applicable only when the appropriate densities are log-concave, while the multivariate rejection sampling method and the importance sampling method are applicable in general. Adaptive and multivariate rejection sampling methods may be efficient when the dimensions of the random effects and the sample sizes are small. When the dimension of the integral in the E-step is high, however, rejection sampling methods can be inefficient due to low acceptance rate. If the sample size is not small, importance sampling methods may be more efficient than rejection sampling methods since in this case the importance function may closely resemble the true conditional distribution. 43 3.3.4 Convergence For Monte Carlo E M algorithms, the incomplete-data log-likelihood is not guaranteed to increase at each iteration due to the Monte Carlo error at the E-step. However, under suitable regularity conditions, Monte Carlo E M algorithms still converge to the M L E s (Chan and Ledolter, 1995). When applying the Monte Carlo E M algorithm, Monte Carlo samples for the "missing" random effects are drawn at each E M iteration. Consequently, Monte Carlo errors are introduced. The Monte Carlo errors are affected by the Monte Carlo sample size. It is obvious that larger values of the Monte Carlo sample size fc will result in more t precise but slower computation. A common strategy is to increase k as the number t of E M t iterations increases (Booth and Hobert, 1999). For sufficiently large values of k , the Monte t Carlo E M algorithm would inherit the properties of the exact versions, such as the likelihood increasing properties of E M , but this would substantially increase the computational work load. Thus, we usually use a relatively small k at initial iterations, and then increase k t t with the iteration number t. If the Monte Carlo error associated with 0 ( t+1 ) is large, the (t + l ) t h iteration of the Monte Carlo E M algorithm is wasted because the E M step is swamped by the Monte Carlo error. Booth and Hobert (1999) proposed an automated method for choosing k in the t context of complete-data generalized linear models. Their method can be extended to our case in a straightforward way as follows. Let Q ( 1 ) 0Q(fl|flC>) (0|0«) dd d Q(0|0W) 2 Q >(0|0 ) (2 (t) and let 0*( ) be the solution to Q^(G\6^) <+1 dddO T = 0. When the simulated samples are indepen- dent, it can be seen that the conditional distribution of [0^ ^|0W] is approximately normal +1 44 with mean 0*( ) and a covariance matrix that can be estimated by t+1 C o v ( 0 ( ) | 0 « ) = Q(2)(0*(*+D|0(i))-i Cov(Q (0* i+1 (1) (t+1) |0W)) Q(2)( >*( )|0(*))- , t+1 1 6 where C o v ^ V ^ I ^ ) ) = £ E | ^log/(yi,z (a[ \ k i > ai f c \b^^* + 1 ))| }, b f ) are simulated samples, and Wik are the importance weights when the importance sampling is used and are all set to 1 when rejection sampling methods are used. After the (t + l)th iteration, we may construct an approximate 100(1 — a)% confidence ellipsoid for 0*(*+) 1 Dasec } o n the above normal approximation. The E M step is swamped by the Monte Carlo error if the previous value 0^ lies in the confidence ellipsoid, and in that case we need to increase k . For example, we may set kt to be & _ i + h-i/c for some positive constant t t c and appropriate k . Increasing k with each iteration may speed up the E M convergence 0 t (Booth and Hobert, 1999). Note that this method of choosing k is completely automated. t The proposed Monte Carlo E M algorithm often works well for models with a small dimension of random effects. When the dimension of random effects is not small, however, the proposed M C E M algorithm and Gibbs sampler may converge very slowly or even may not converge. Therefore, in the next section, we propose an alternative approximate inference method which may avoid these convergence difficulties and may be more efficient in computation. 45 3.4 3.4.1 A Computationally More Efficient Approximate Method The Need for a n A l t e r n a t i v e M e t h o d The Monte Carlo E M method in the previous section may be computationally very intensive and may offer potential computational problems such as slow or non-convergence, especially when the dimensions of the random effects a; and b; are not small. When the dimensions of the random effects are not small, sampling the random effects in the E-step may lead to inefficient and computationally unstable Gibbs sampler, and may lead to a high degree of auto-correlation and lack of convergence. To overcome these difficulties, in this section we propose an alternative approximate method by iteratively using a first-order Taylor approximation to the nonlinear models. The proposed method avoids sampling the random effects and provides analytic expressions for parameter estimates at each iteration, So it may be preferable when the Monte Carlo E M method exhibits computational difficulties. Alternatively, the proposed method in this section can be used to obtain excellent parameter starting values for the Monte Carlo E M method. For complete-data N L M E models, approximate methods have been widely used, and these approximate methods perform reasonably well in most cases (Lindstrom and Bates, 1990; Pinheiro and Bates, 1995; Vonesh et al., 2002). These approximate methods are typically obtained via Taylor expansions or Laplace approximations to the nonlinear models. One particularly popular approximate method for complete-data N L M E models is that of Lindstrom and Bates (1990), which is equivalent to iteratively lihood based on certain L M E models (Wolfinger, 1993). carrying out maximum like- Following Lindstrom and Bates (1990), we propose to further approximate model (2.5) by taking a first-order Taylor expansion around the current parameter and random effects estimates, which leads to a L M E response model. For the resulting L M E response model, with the covariate model (2.8), we 46 update parameter estimates based on the distribution of observations and an E M algorithm. In each iteration, analytic expressions for parameter estimates are available. Therefore, the proposed approximate method may provide substantial computational advantages over the Monte Carlo E M method. We rewrite the N L M E model (2.6) and (2.7) as a single equation Vij = 9ij( , P, i, i ) + e^; a where a i = 1 , . . . , n, j = 1 , . . . , n b (3.6) i} gij(-) is a nonlinear function. Let gi = (gn,... ,g ) . T iTH Denote the current estimates of (6, ai, bi) by (6, sn, bi). Taking a first-order Taylor expansion of g^ around the current parameter estimates a. and (3 and random effects estimates a* and bj, we obtain the following L M E response model y = W oc + X 3 + Hi&i + Tihi + ei, i i (3.7) ir where Wi = (wji,. . . ,W i „ ( x i i x ) i ) i with xv = da dgn = dp Xi = Hi = (hji,..., h ) with h = - d*i Ti = T T i n i ; j T ini 9i = with x i:j tj iJ (tii, • • • , Um) with tij = - 7 ^ - Yi - g i ( a t , P, i , bi) + WiCx + XiJ3 + HiSH + Tibi, a with all the partial derivatives being evaluated at (a, (3, a bi). i ; The proposed method in this section is to iteratively carry out maximum likelihood based on the L M E response model (3.7) and the covariate model (2.8). be the mean parameters and A = (5 ,R,A,B) 2 Let 7 = (a, (3) be the variance-covariance parameters. T h e algorithm consists of alternately obtaining approximate estimates of 7 given the current 47 estimates of variance-covariance parameters a based on the distribution of fj = (yf, z[) T and then updating the variance-covariance parameter estimates via an E M step using the posterior curvatures of (e;, e^, a;, bi), as in Laird and Ware (1982). 3.4.2 A n a l y t i c Expressions of Estimates We can combine the L M E response model (3.7) with the covariate model (2.8) to form a unified L M E model r% = QiJ where r % (yf, z f f , = = + ZitJi + V j , (af, bf) , v T 4 i = = l,...,n, (ef, f) , U T e t (3.8) = {1%,. U T V V Qi = Ui 0 Vi J 0 with O's being appropriate zero matrices. For the unified L M E model (3.8), by standard arguments, we have [fi\ui; 7, a] ~ where Ej = Z DZj + A», { N{Qi*f + ZiUii, Ai), / r2 T 2 si Ai = V 0 r> 0 [T\; \ D I®R 7, a] IA ~ iV(Qi7, Si), (3.9) 0 ^ B 0 and the Kronecker product I ® R is a z^mj x um-i matrix with the u x v submatrix R on the diagonals and zeros elsewhere. Using known results for L M E models (e.g., Vonesh and Chinchilli, 1997), we can update the estimate of 7 given the current estimates a by 7 = Etfsr ^ Eo^r * 1 1 .1=1 48 (3.10) Based on the unified L M E model (3.8), we can also obtain the joint distribution of (fj, u>i) at 7 = 7 and A = A W") Qa N 5 7i A IV ) 0 J V given the observed data r , at 7 = 7 from which we obtain the conditional distribution of and A = A: [wi|f i; 7, A] ~ JV[D Zf S _1 4 ( r i — Q i 7), D — D Zf S r 1 Z i £>]. (3.11) A n estimate of the random effects u>j is thus given by £> = DZ ^(fi-Qn). 1 i i (3.12) Finally, following Laird and Ware (1982), we can update the variance-covariance parameter estimates as follows. Note that if we were to observe a*, b j , e,, and e i} to y i in addition and Z j , we would have the following estimates ^ = Ebibf/n, * = £efe,/£n , 2 i-1 n where ^ = E E €ij i=ij=i £ F i ' S £ Jj ij> e e i=l rrii € i=lj=l A, and fl, respectively. e i=l t 1=1 n 4/ £ ™ » X^ajaf, i=l i=i and A = aj af/n, are the "sufficient" statistics for 6 , 2 £ i b f i=l b Since a i , b j , e^, and £ i=l R, are unobservable, we can "estimate" them by their conditional expectations given the observed data y i and z , at the current parameter estimates 7 = 7 and A = A. Based on standard results for multivariate normal distributions, we know that 7 = 7, A = A Z/i N \ 49 0 l LT PI /J and 1 ; 7 = 7, N A= A \ a J € Ej Qii^ V Rij / 0 matrix with the first n; x nj submatrix 5 I and zeros elsewhere, where L j is a (nj + i/m^) x 2 is a (rii + i/m*) x u matrix consisting of the first x v submatrix 0 and the remaining mi (u x v) square submatrices with the j t h square submatrix R and zeros elsewhere. B y the definition of conditional distributions, it can be shown that 5 1 ~ Lj E " LJ], Nf^T.Aj-iVtLfEr^-Q^), [e |f y 7, i ; A] ~ N[Rl Er^fi 2 - Qa), 1 R - R% E r 1 i^]. Using the expectation and covariance properties for multivariate random variables and some matrix algebra, we update the estimates of the variance-covariance parameters (<5, R, A, fl) 2 as follows: P = E E(ef ei\ril 7, A ) / £ rii = X X C o v ^ l r , ; 7, A)) + ^ e ^ ; R n mi = £££(e e£|f y i=i j=i n i ; = E E[Cov(e^|f »=i i=i i=l 7, _ mi i; 7, A f l ^ e ^ ; 7, A ) ] / £ n,- i=i 7, A ) + £ ( 6 ^ ; 7, A ) f l ( C y |r i ; 7, n A = E^( i fl^; a n Af]/£m , i=i ' t (3.13) 7,'A)/n a i=i n = £ [ C o v ( a j | f y , 7, A) + E{sLi\fi; fl 7, A) £ ( a j | f 7, A ) ] / n , • T i ; i=l =E£?(b bf|f ; ,A)//i i=l i i = ElCovCbilfij 7 7 ) A) + £ ( b j | f y , 7, A) E(b \ri; % 7, A) ]/n. T i=l The foregoing results show the closed-form expressions of the parameter estimates at each iteration for L M E model (3.8). Iteratively solving L M E model (3.8) until convergence leads to an approximate M L E 0 = (7, A) of 0. 50 3.4.3 Asymptotic Properties Following Vonesh et al. (2002), we show in Section 3.7 that, under fairly mild regularity conditions, 7 satisfies the following properties 7 = 7 M L £ + 0 {(minJVi) = 7 j P 1/2-1 where Ni = ni + m,i, and 7 0 + O / m a x n~ ^ , ^min N^j 1 0 2 p and ^MLE a r e ^ n e ^ r u e value and exact M L E of 7 , respectively. Thus the approximate M L E 7 is not only consistent but also asymptotically equivalent to ' the exact M L E . The rate of convergence is shown to depend on both the number 71$ of observations per individual and the number n of individuals. Moreover, the estimate 7 asymptotically follows a normal distribution: v ^ ( 7 - 7o) ± N(0, ft( ))., 7o where the asymptotic variance-covariance matrix, Q,(-y ), can be consistently estimated by 1 j 0 i=l 0=0 The proofs of the above results are given in Section 3.7. 3.5 3.5.1 Example and Simulation An Application in AIDS Studies We apply the foregoing proposed methods to a H I V dataset for illustration. We also compare the proposed methods with the commonly-used naive method which ignores covariate measurement errors and the two-step method, which is described in Section 3.2. 51 Data Description The study consists of 46 H I V infected patients who were treated with a potent antiretroviral regimen consisting of protease inhibitor and reverse transcriptase inhibitor drugs. Viral loads (Plasma HIV-1 R N A copies) were measured on days 0, 2, 7, 10, 14, 21, 28 and weeks 8, 12, 24, and 48 after initiation of treatments. After the antiretroviral treatment, the patients' viral loads will typically decay, and the decay rates may reflect the efficacy of the treatment. Throughout the time course, the viral load may continue to decay, fluctuate, or even start to rise (rebound). T h e data at the late stage of study are likely to be contaminated by long-term clinical factors, which leads to complex longitudinal trajectories. Various covariates such as C D 4 count were also recorded throughout the study on similar schedules. The viral load has a detectable limit of 100 R N A copies/mL. For simplicity, we imputed the censored viral loads, which are below the detection limit, by half the detection limit 50, as in Wu and Zhang (2002). The number of measurements for each individual varies from 4 to 10. There were 72 out of 361 C D 4 measurements missing at viral load measurement times, due mainly to a somewhat different C D 4 measurement schedule. T h e detailed data description can be found in W u and Ding (1999) and W u (2002). The Response and Covariate Models Modelling H I V viral dynamics after anti-HIV treatments has received a great deal of attention in recent years (Ho et al., 1995; Perelson et al., 1996; W u and Ding, 1999; Wu, 2005). The following H I V viral dynamic model (first stage model) has been widely used (Wu 2002; W u and Zhang, 2002) V i j = log 1 0 (P l i e - A l ^ + P e - ^ ) + ey, X 2 i (3.14) where yy is the logio-transformation of the viral load measurement for patient i at time tij, Pu and P i are baseline values, and X^j and A y are the first (initial) and the second 2 2 phase viral decay rates, respectively, and they may be interpreted as the turnover rates of 52 productively infected cells and long-lived and/or latently infected cells respectively. The logio-transformation of the viral load is used to make the data more normally distributed and to stabilize the variance. Wu (2002) noted that variation in the dynamic parameters such as the first phase decay rate XUJ may be partially associated with variation in C D 4 counts. In AIDS studies, it is known that covariates such as C D 4 count are often measured with substantial errors. Thus we assume that the dynamic parameters are related to the true covariate values, reflecting the belief that actual, not possibly corrupted, covariate values govern the model parameters, as in Higgins et al. (1997) and W u (2002). Due to long-term clinical factors, drug resistance, and other complications, the viral load trajectories can be very complex after the initial phase viral decay (see Figure 1.1). Grossman et al. (1999) pointed out that the viral decay rate after the initial period may be complicated and may vary over time since they may depend on some phenomenological parameters which hide considerable microscopic complexity and change over time. Therefore, a nonparametric smooth curve modelling for the second phase viral decay rate may be more appropriate than parametric modelling (Wu and Zhang, 2002). Based on the reasons noted above, we consider the following second stage model, which corresponds to the first stage model (3.14), log(Pu) = A + hi (3.15) l o g ( P i ) = PA + hi 2 XUJ = w(Uj) +hi(tij), where z*- is the true (but unobserved) C D 4 count, and w(tij) and hi(Uj) are nonparametric smooth fixed- and random-effects functions defined in Section 2.1. To avoid very small (large) estimates, which may be unstable, we standardize the C D 4 counts and rescale the original time t (in days) so that the new time scale is between 0 and 1. As discussed in Section 2.1, we employ the linear combinations of natural cubic splines 53 Table 3.1: A I C and B I C values for the viral load model (3.14) and (3.15), with q < p = 1, 2, 3. p=l,g=l p=2,q=2 p=2,g=l p=3,g=3 p=3,q=2 p=3,g=l AIC Model 615.96 583.54 585.39 577.37 586.45 576.43 BIC 678.18 669.09 656.43 670.71 665.90 651.50 with the percentile-based knots to approximate the nonparametric smooth functions w(t) and hi(t). Following W u and Zhang (2002), we take the same natural cubic splines with q < p in order to decrease the dimension of the random effects bj, i.e., more basis functions used to approximate the fixed-effects function than the random-effects functions. A I C and BIC criteria are used to determine the values of p and q. We use the observed C D 4 counts for the unobservable true C D 4 counts in the response model (3.14) and (3.15), and use S P L U S functions nlmeQ and anovaQ to obtain the values of A I C and B I C . Table 3.1 displays A I C and B I C values for various plausible models. Based on these A I C and B I C values, the model with p = 3 and q = 1, i.e., A i « A + /?e#i(*tf)+ft V>2(*«)+•&«, 2 i (3.16) seems to be the best, and thus it is selected for our analysis. For the C D 4 process, in the absence of a theoretical rationale, we consider empirical polynomial L M E models, and choose the best fitted model based again on A I C / B I C values for each possible model. This is done based on the observed C D 4 values, and is done separately from the response model for simplicity. Specifically, since the inter-patient variation is large, we consider model (2.8) with Uu = Vu = (1, uu,..., u ~) l u and linear (a = 2), quadratic (a = 3), and cubic (a = 4) polynomials. Table 3.2 presents A I C and B I C values for these three models. The following quadratic polynomial L M E model best fits the observed C D 4 54 Table 3.2: A I C and B I C values for the linear, quadratic, and cubic C D 4 models. Model a=2 a=3 a=4 AIC 796.17 703.19 742.12 BIC 819.50 761.52 781.01 process: + 0 2 ) uu + (&3 + 0.3) u\ + e CD4i( = (ai + ai) + (a 2 u (3.17) where uu is-the. time and oc — (ai, a , c*3) are the population parameters and a, T 2 (an, a , i2 a) T i3 = are the random effects. E s t i m a t i o n M e t h o d s a n d C o m p u t a t i o n Issues We estimate the parameters in the response and covariate models using the naive method which ignores measurement errors, the two-step method in Section 3.2, and the two proposed "joint" model methods discussed in Sections 3.3 and 3.4. We denote the method in Section 3.3 by M C E M and the method in Section 3.4 by A P P R . The two proposed joint model methods need starting values for model parameters. We respectively use the parameter estimates obtained by the naive method and by the two-step method as parameter starting values for the two joint model methods. For the naive method and the two-step method, we use S P L U S functions lme() and nlme() to obtain parameter estimates and their default standard errors. For the M C E M , method, we assess the convergence of the Gibbs sampler by examining time series plots and sample autocorrelation function plots. For example, Figures 3.1 and 3.2 show the time series and the autocorrelation function plots for b associated with patient 10. From these 2 figures, we notice that the Gibbs sampler converges quickly andthe autocorrelations between successive generated samples are negligible after lag 15. 55 Time series and autocorrelation Figure 3.1: The time series plot for 6 associated with patient 10. 2 Series : -I I I I b2 I I I i i i •' •'- Figure 3.2: The autocorrelation function plot for b associated with patient 10. 2 56 function plots for other random effects show similar behaviors. Therefore, we discard the first 500 samples as the burn-in, and then we take one sample from every 20 simulated samples to obtain "independent" samples (see sampling methods in Section 3.3.3). For the Monte-Carlo E M algorithm, we start with k = 500 Monte-Carlo samples, and 0 increase the Monte-Carlo sample size as the number of iteration t increases: k \ t+ with c = 4. = k + k /c t t Convergence criterion for the iterative methods in our examples is that the relative change in the parameter estimates from successively iterations is smaller than a given tolerance level (e.g., 0.01). However, due to Monte Carlo errors induced by Gibbs sampler, it is difficult to converge for an extremely small tolerance level, otherwise it may converge very slowly. The actual tolerance level we used in our example for the two proposed joint model methods is 0.05. Convergence of the algorithms are considered to be achieved when the maximum relative change of all estimates is less than 5% in two consecutive iterations. We use the multivariate rejection sampling method for the M C E M method. Other sampling methods may also be applied. On a S U N Sparc work-station, the M C E M method took about 90 minutes to converge while the A P P R method took only 3 minutes to converge. This shows that the A P P R method offers quite a substantial reduction in computing time, and is thus computationally much more efficient than the M C E M method. Analysis Results and Conclusions Table 3.3 presents the resulting parameter estimates and standard errors. We see that, except for the naive method, the other three methods give similar point estimates for the parameters, especially for the covariate model parameters ex. However, for the parameters /3 of main interest, the two-step method gives smaller standard errors than the two joint model methods ( M C E M and A P P R ) . This is because the two-step method ignores the variability due to estimating the parameters cx in the covariate model. Thus, these results are consistent 57 Table 3.3: Parameter estimates (standard errors) for the H I V dataset. Method Naive a - Two-step -.42 MCEM APPR a 3 2 - - - - 4.17 -3.74 (.5) (•1) (.5) -.41. 4.02 -3.54 (.6) (•1) (.5) -.42 4.17 -3.74 (.6) (•1) (.6) Pi P5 As Pr 5 6.82 -3.01 9.27 -1.67 .35 (3.3) (.6) (5.6) (8.8) (3.6) 1.53 6.84 -2.86 9.04 -1.75 (4.7) (.6) (5.5) (8.9) (3.5) 1.55 (5.2) 6.85 -2.78 8.91 (9.0) -1.79 (3.6) .35 .53 8.99 -1.77 .35 .52 (5.0) (5.9) (9.1) (3.6) Ps ft 11.73 65.41 .41 (3.9) (•2) 11.73 65.78 (4.1) (•2) 11.74 66.60 (4.7) (•2) 11.74 65.72 1.33 (•2) (4.5) Pi (•7) (6.0) 6.85 -2.82 (•7) R .35 .52 Note: A and B are unstructured covariance matrices, but we only report the estimates of their diagonal elements here. Diag(A) = (.52, 4.06,1.98) for Two-step, Diag(A) = (.53, 2.55,1.25) for MCEM, and Diag(A) = (.52,4.06,1.99) for APPR. Diag{B) = (1.11, 69.94,2.02,24.86) for Naive, Diag(B) = (1.10,69.58,2.02,25.04) for Two-step,. Diag(B) = (1.11,69.96,2.05,25.47) for MCEM, and Diag(B) = (1.10,69.78,2.01,24.85) for APPR. with the analytical results about the two-step method in Section 3.2. We also see that the naive method may severely under-estimate the effect of the covariate C D 4 (which is measured by the parameter p ). 3 The estimates and standard errors based on the two joint model methods are similar and may be more reliable. The commonly used two-step method and the naive method may give misleading results, and the two proposed joint model methods may be more reliable. We will confirm this conclusion via simulations in next section. 3.5.2 A Simulation Study In this section, we conduct a simulation study to evaluate the proposed methods ( M C E M and A P P R ) , and compare them with the commonly used two-step method and the naive method by the mean-square-error (MSE). The models and the measurement schedules used in the simulation are the same as those in the real H I V dataset in the previous section (i.e., models (3.14) - (3.17)). In the simulations, the true values of a and (3 are shown in 58 Table 3.4: Simulation results for parameter estimates as well as (standard errors) and (simulated standard errors)* for the estimation methods Naive, Two-step, M C E M , and A P P R . Parameter True Value Naive Method Two-step MCEM APPR Oil -.5 a 4.0 Pi P2 Pz P4 p5 (% Pi -4.0 12.0 66.0 1.5 7.0 -3.0 9.0 -2.0 - - - 11.98 65.50 0.94 6.92 -3.74 10.13 -1.62 - - - (-1) (1.1) (.3) (1.7) (2.6) (.9) - - - (.2)* -.51 4.05 -4.02 11.98 (1.1) (1.2)* 65.66 (1.0)* 1.53 (.3)* 6.92 (1.6)* -3.74 (2.8)* 10.13 (1.0)* -1.62 (•1) (.1)* (.3) (•2) (.1)* (1.2) (1.4) (1.7) (2.6) (.9) (1.2)* (1.4)* (•3) (.2)* (1.6)* 11.98 65.85 1.48 6.98 -3.11 (2.5)* 9.15 (.9)* -1.95 (•3) (.3)* (2.0) (3.0) (1.1) (1.9)* -3.82 (2.8)* 10.29 (1.1)* (3.1) (2.8)* 2 a 3 -.51 (.3)* 4.04 (•4) (.3)* -4.02 (•1) (.1)* (•4) (.4)* (•4) (.4)* (•1) (.2)* (1.5) (1.8) (1.5)* -.51 4.05 -4.03 11.98 65.46 (1.8)* 1.52 (•1) (.1)* (•3) • (-4) (.3)* (.3)* (•2) (.2)* (1.4) (1.8) (.3) (2.0) (1.4)* (1.7)* (.3)* (1.9)* 6.93 -1.56 (1.1) (1.0)* Table 3.4, and the other true parameter values are 5 = A, R = .3, A = diag(.5, 2, 1), and B = diag(l, 9, 2, 4). We simulated 100 data sets and calculated averages of the resulting estimates and their standard errors as well as simulated standard errors based on each of the four estimation methods. Since M C E M method sometimes offers computational problems, such as slow or non-convergence, the 100 sets of parameter estimates are obtained from 116 data sets. The simulation results are shown in Table 3.4. From Table 3.4, we see that the naive method can severely under-estimate the covariate effect /?3. The two-step method produces similar point estimates as the two joint model methods ( M C E M and APPR),"but it gives smaller standard errors than the M C E M and the A P P R methods, since the two-step method fails to incorporate variability in estimating the covariate model. These results are consistent with the analytical results in Section 3.2. Note that the estimates for the covariate model parameters a are similar for the three methods (Two-step, M C E M , and A P P R ) , but the parameters a are usually treated as nuisance pa- 59 rameters and are not of primary interest. The two proposed joint model methods (MCEM and A P P R ) perform better than the two-step method and the naive method, and the M C E M method is the best among all four methods in terms of bias. In the computation, the A P P R method converges much faster than the M C E M method. These simulation results confirm that the naive method and the two-step method may give misleading results and that the two proposed methods are more reliable. 3.6 Discussion We have proposed two approximate likelihood methods for semiparametric N L M E models with covariate measurement errors and missing data. The first method, implemented by a Monte Carlo E M algorithm combined with Gibbs sampler, may be more accurate but may be computationally intensive and sometimes may offer computational problems such as slow or non-convergence. The second method, implemented by an iterative algorithm without Monte Carlo approximation, is computationally much more efficient, but it may be less accurate than the first method since it uses an additional approximation. Alternatively, the second method may provide excellent parameter starting values for the first method. Simulation results show that both methods perform better than the commonly used two-step method and a naive method. In particular, the commonly used two-step method may under-estimate standard errors, and the naive method may under-estimate covariate effects. For semiparametric N L M E models with covariate measurement errors and missing data, the models can be very complex. Thus, if there is not sufficient information in the data, the models can be non-identifiable. To our knowledge, there seem no existing general necessary and sufficient conditions for model identifiability, and the identifiability problem needs to be considered on a case-by-case basis. In practice, we can check model identifiability 60 by examining the convergence of iterative algorithms. If the model is non-identifiable, the iterative algorithms may diverge quickly. For the models considered here, we find that the iterative algorithms converged without problems, so the models seem identifiable. The methods proposed here may be extended to semiparametric generalized linear mixed models and nonparametric mixed-effects models with covariate measurement errors and missing data. T h e results will be reported in the near future. 3.7 Appendix: Asymptotic Properties of 7 Based on the A P P R Method in Section 3.4 In this section, we show the asymptotic properties of 7 obtained by the A P P R method in Section 3.4. We first state some Lemmas, which are used in the proofs, in Section 3.7.1. Then we describe some regularity conditions under which the asymptotic properties hold in Section 3.7.2. In Section 3.7.3, we obtain some estimating equations, which are used for showing asymptotic properties of 7 . Consistency and asymptotic normality of 7 are shown in Sections 3.7.4 and 3.7.5, respectively. The results are extensions of Vonesh et al. (2002). 3.7.1 Some Lemmas The following four lemmas will be used for showing asymptotic properties of 7 . L e m m a 3.2. (Vonesh and Chinchilli, 1997). Let 7 „ be a sequence of random variables satisfying Y n = c+O (a ) p n where a n = o(l). If f(x) is a function with r continuous derivatives at x = c, then f(Y ) n = /(c) + fU(c)(Y - c) + • • • + [l/(r - l)!]/( " )(c)(y - c ) r n 1 n 61 1 + O «), p where / ^ ( c ) is the fcth derivative of / evaluated at c. In particular, f(Y ) n = f(c) + O (a ). p n This result holds when O (-) is replaced everywhere by o (-) or when Y p p n and c are replaced by a vector/matrix random variable Y „ and vector/matrix constant c. L e m m a 3 . 3 . (Aitchison and Silvey, 1958). If / is a continuous function mapping R s itself with the property that, for every 9 such that ||#|| = 1, 9 f(6) into < 0, then there exists a T point 0 such that ||6>|| < 1 and f(8) = 0. L e m m a 3.4 (The B o u n d e d Convergence Theorem). Let {f (x)} be a sequence of n measurable functions defined on a set of E of finite measure, and suppose there is a real number M such that |/ (a;)| < M for all n and all x. If f(x) n = lim _ n > o 0 f (x) for each x in n E, then L e m m a 3 . 5 . Let A and B be u x u symmetric matrices with eigenvalues fJ-i(A) > / ^ ( A ) > ••• > A*„(A) and Hi(B) > fj-2{B)-> ••• > Hv{B), respectively. If A — B are nonnegative definite, denoted by A — B > 0 or A > B, then we have fJ-i(A) > Pi(B), 3.7.2 i = 1 , . . . , v. Notation and Regularity Conditions Let the r-dimensional vector 7 = ( a , (3) € T and k ( 7 , oj{) = li(ct, 0, a NiLih, i 5 y ^ Zj) = l o g / y ( j / i | z j , c ^ i ; 7 ) + l o g / z ( z i | w i ; 7 ) , Wi) = ^ ( 7 , Wi) + l o g / ( a i ) + l o g / ( b i ) , 62 ' where N; = rii + m-i. Let d w =u; (7)) i i d 2 »\w,w (7, ^ ( 7 ) ) = t w =u; (7)» dujidujf i i i, (7> ^ ( 7 ) ) = ^ ; ^ ( 7 , Wi) w =a; (7)' 7 i i " 7 7 ^ ^(-Y)) = d^ ^' li " 7 ^ ( 7 , W i ( ) ) = Q^fhh, 7 W l ) u>i=u;i(7)i etc. Wi) a; i =a> i (7). Similarly, we can define the corresponding derivatives for L i ( 7 , c*>i(7)) and 3 2 -h(i, Vi), etc. Also, we denote convergence in probability as iVj —> 00 by o (ljv ), convergence in probability p as n —> o p 00 by ( l A ^ i , n ) - o p ( l „ ) , i and convergence in probability as both Ni —> 00 and n —• 00 by We show consistency and asymptotic normality under the following regularity conditions. A n outline of the proof when some of these conditions are relaxed is provided at the end. R l . /Vj = O(N) uniformly for i = 1 , . . . , n, where N = min; R2. The variance-covariance parameters A = (5 , R, A, B) are'fixed and known, and 2 the true parameter 7 0 is in the interior of V. Qi, A i , Zi, E j , and D are evaluated at 6 and Wj. When A is unknown, we can simply replace it by its consistent estimate (e.g., in (3.13)). R3. The density functions fyiVijl^i, <*V)7) and fz(zij\tJi; 7 ) satisfy the necessary regularity conditions (e.g., Bradley and-Gart, 1962) such that, for fixed 7 , the M L E of u>j is v^-consistent for Wj as Ni —> 00. In addition, the necessary regularity conditions are assumed (e.g., Serfling, 1980, p. 27, Theorem C) such that, by the Law of Large Numbers, 63 the following hold: ~K -J^k{l^ ) oyd'y l l 1 -N~ -^-- k{ ^ ) l Y 1 -N- ^ + o {l ) as N* - oo, = N- ZjK- Z + o {l ) as N - oo, + o (l ) as T l i 1 l l l i kh,u l = N-'Q K- Q a a Finally, the matrices p ) = N- ZTA; Q 1 l T J i where, under models Z(2.6) - (2.8) p i p Ni Ni Ni { - oo, 2 2 N~ Qj'h~ Qi l l and N^ ZjK^Zi l are both assumed to be positive definite with finite determinants such that, for example, the smallest eigenvalue of N~ QfA~ Qi 1 1 exceeds Ao for some Ao > 0. R4. For all 7 G T and all the 6-dimensional u>j € R , b the function L;(7, cJj) is six times differentiable and continuous in 7 and Wj for all y^- and z^-, and L i (7, W j ) satisfies the necessary regularity conditions needed to change the order of integration and differentiation, as indicated in the proof. R5. For any 7 e F, there exist di > 0 and Ai > 0 such that a. For all 7* € 5^(7), where #^(7) is the r-dimensional sphere centered at radius d\, the following holds: ~ E ^>7^> i=l ^(7)) l7=T = " ( 7 T ' 64 1 + Op(ln), as n —> 00, 7 with where ^(7*) 1 is positive definite with minimum eigenvalue greater than A i and ^ , 7 ( 7 , Wi(7)) | =T = in {l\ 7 ni Wi(7*)) + # y W | ( 7 * , «i(7*)) x K i w « ( 7 * , Wi(7*)) + D - ] - ^ 7 ( 7 * 1 w*(7*))}. 1 ! b. The first, second, and third derivatives of y/NiL^-y, Wj) with respect to CJ^ are uniformly bounded in 5 ^ ( 7 ) . R6. A t the true value 7 , the following hold true: 0 = c^j(7 ) exists for all i = l , . . . , r a , Eu,(QjE~ Qi) l 0 n lim n " V Covo; ( Q f E ^ Q , ) = 0, 2 n—>oo ' ^ i=l and lim n n—>oo where Qi, Z it _ 1 YV(7o) ' and E ; are evaluated at 7 = (7o)~\ fi J 0 and o»j and f 2 ( 7 ) 0 R7. The marginal densities, Jexp{iVjL;(7, is positive definite. u>i)} duii, satisfy the necessary regularity conditions such that the M L E of 7 exists and satisfies ('JMLE 3.7.3 _ 1 ~ 7o) = O {n~ l ). l 2 p Estimating Equations In Section 3.4, we obtain an approximate M L E 6 = (7, A) of 6 and the predictor u>j of <jj. It is obvious from (3.10) and (3.12) that the estimate 7 is a function of A and the final estimates Wj of u>j are functions of both 7 and A. For fixed A, it can be shown that 7 and n Cji = uJi(j) maximize the complete-data log-likelihood function (3.2), i.e., 2~2 - ^ - ^ ( 7 , <*>i)- 65 In fact, we can write if if = - = if (a, (3, aj,bj, A) in (3.2) as Hi | log(27r<5 l O g ^ T T C T)j -- ^ — 2 I27T.RI -- \ ll oogg | 2 ^ | - \ log log |27Tfl| T Yi' Si AT yi - gi 1 -UiOL- yzi-UiOL-Viai J T ( » \ \ -) b (1 Taking the first derivatives of £ ii with respect to 7 = ( a , (3) and i=i = (af, b f ) r and. set- ting these first derivatives to appropriate zero vectors, we can obtain the following estimating equations i=l ^ Zi\ Zi - ^ -1 Ui OL - Vi yi - gi &i ^ J = 0, - D~ Zi — UiCX — ViZn j V (3.18) V b t i = l,...,n, / which is equivalent to £ <9* A i _ 1 ^ 7 i=l +E Q = £ Qi i=l A" ^ 1 l i=l K l n, ^ Zi A " Qi 7 + [Zj A " Zi + fl-" ]^ = ^ A ^ r - i , ' 1 where 1 i = 1 . . . , n, 1 ( V Si. V>. dbj da] eft (3.19) Zi \ J The solution to the estimating equations (3.19) can be obtained by iteratively solving the following equations £ QI A - Qn 1 i=i + tQT A " ZiUi 1 i=i Z f A" Q 1 l + 1 f it i=i (ZT A " Z i + b- )ui 1 l = ± Qf A,- 1 66 (3.20) = Zjk. r , 1 i i = 1...,n, where f; = ( y f , z[) is defined in (3.7), and Qi, A T Z it and D are Qi, Ai, Zi, and D it evaluated at 7 = 7 and Wj = Wj, respectively. The solution to the equations (3.20) is given in (3.10) and (3.12). Therefore, for fixed A , the final estimate 7 and a V = ^ ( 7 ) satisfy the estimating equations (3.18) and maximize the complete-data log-likelihood function (3.2). These facts will be used to show the following asymptotic properties of 7 . 3.7.4 Consistency We first note that, for fixed A , the M L E of 7 will satisfy the set of estimating equations ° ^ i=i i=i 7 J Under R4, we have "" / {j2 Q~ ^' Ni ^)}exp{^7V,L,(7, L = 53 V ^ ' / • • • / (VNiQ-Hl, wO) exp { 7 i=i = ^ V i=i 7 ^ 7 A ^ ( , w^W • • • dw n 7 j=i 7 / (\/^^, (7, ^ w^dwr-.dw, w<)) e x p { / V L ( 7 , W f ) } ^ i i x 7 exp{iV L ( , j^ii i 7 a;,-)}** 7 Now we examine the term £ - ^ ( 7 , u>j) in the above expression. Since the y^s conditionally independent each other given and Zj/s are u> by the Lindeberg Central Limit Theorem, it iy follows that conditional on u>,-. ( = \ Yi ~ gi NrWA, 1 Ui a — Vi a, = O (N~ ) 1/2 p (3.21) Furthermore, under R3 it can be shown that the estimate ^(7)= v O (N- ). 1/2 i + p 67 (3.22) Combining the results in (3.21) and (3.22) and applying Lemma 3.2 to £ ^ ( 7 , ^ ( 7 ) ) , we can show that • 4 (7,A(7)) = L ; ( u ; ) + O (iV - = O {N~ ) .= O (N~ ). 7 i 7 7 ) i 112 p p + 1 / 2 i ) O (N~ ) 1/2 p (3.23)' 1/2 p Then, by direct application of the Laplace approximation to integrals of the form J and / q(x) exp{kp(x)}dx, exp{kp(x)}dx where q(x) and p(x) are smooth functions in x with p(x) having a unique maximum at some point x (see, e.g., Barndorff-Nielson and Cox, 1989), it can be shown that (l + ocivr)) 1 J exp{/ViLi(7, Ui)}dwi = exp{A^ L (7, 1 ^ ( 7 ) ) } ^ \NiL"\ i i and / ( ^ ^ , 7 ( 7 , ";)) e x p { / V L ( 7 , i = exp{ML ( , l 7 «i( ))} x i w^jdui + W )), «i(7)) ( ^ - 7 ( 7 , ^ 7 where L - ' ^ ^ = £"u;,u>(7 0 ^ ( 7 ) ) . Because (3.21) implies y/N~L' (-y, ; irf 1 0)1(7)) = O ( l ) , it p follows from R l that 1L ( ^ ^ ( 7 , ^ ( 7 ) ) + O^N- )) x 1 J](l + O^Nf )) 1 Hence we have VWJ, (7, w<( )) = 7 i i (y/NiL'^in, 7 i=l 1 &&)) + OiNr ) 6/2 n 2TT i i (I + CXAT"1)) I exp{jV L ( , «i(7))} JJ 7 n 7 ) + ^(AT ). 6/2 2?r exp{/V L (7, W i ( ) ) } ir( 7 w( )) 7 E ^ , 7 ( 7 . ^( ))+nO (iV- / ) 1 7 p i=l 68 2 1 where 7 7 P 7 i=l for all 7 € r, the M L E -y MLE J where (Vl^w K( , u>( )) = e x { £ A ^ ( , ^ ( 7 ) ) } ft ^(7,^(7)) | n / 2 - Since ^ ( 7 , 4 ( ) ) ^ 0 7 +O (n7V- / ) 1 7 = 7 M L £ 2 = 0, p (3,24) n = E ^7(7, = ]H ^ - ^ 7 ( 7 , ^ ( 7 ) ) 7 6 of 7 satisfies ^ ) | 7 = 7 ^ = ° Ji( , £ ( 7 ) ) J ) i=l i=l ^ 1 ( 7 ) ) is the s e t of estimating equa- i=l tions for 7 conditional on fixed A , as given in (3.21). B y taking a first-order Taylor series expansion of Ji(7, £ ( 7 ) ) about y O (nN~ ), a n noting that, from (3.24), J\{I E, d M L E ^{IMLE)) ML = we have 1/2 p •• Ml, w ( 7 ) ) = O (nN-^ ) + 2 p J{( *, 7 w(7*))(7 - (3.25) 7MLE)> where JiW, « ( 7 * ) ) 7j~r<M7, « ( 7 ) ) = | =7* 7 = <9^ f E ^ ' 17=7* 2 7 t=i and 7* is on the line segment joining 7 to •JMLE- B y applying the chain rule, we have, for any 7 G T, d 2 •/I(7,*(7)) = E { ^ T ^ ( 7 ; W + ) i j=i = E ^77(7, Wi(7)) + ^ w ( 7 , *i(7)) a i=l NiLi(-y, u>i) I ^(7) (3.26) 3 7 Note that £ 1 ( 7 ) maximizes NiL^-y, u>) and satisfies the second set of equations in (3.18), i.e. ' '• W 7 , ^ ( 7 ) ) - D^Cain) = 0 I ^(7) « = ^ ^ ( 7 , w<(7)). Applying the chain rule once again, we have 0Wf(7) 3 :{l>^(7, 57' d = ^ : W 7 ( 7 ^(7)) a 2 h{l, , W j ) uj =u> (7) i i « < ( 7 ) ) + ^ w ( 7 , 2 dujidujf «i(7)) 69' lift, W i ) u>=<I;(7) i i ^(7) a r 7 Solving the above equation for [dLj { ^)/d^f ], i 0Wj(7) <9 / T = {- r we have Wi(7)) + 7 Substituting this expression of [du^)/d~f ) in (3.26), it follows from R l , R3, and R5 that T as n —> oo - l j { ( 7 * , *<(7-)) = -^E^, (7,«i.(7))l7^ i=l Q(7*)- + o (l„), 7 = which implies ~J[(rf*, _1 P fl(7*) - Wi(7*)) Taking e = | Q ( 7 * ) (3.27) 1 -1 in the definition of convergence in probability, from (3.27), we have P (-^(TT <*=> 1 < -\J'Al\ «*(7*)) - ^(T*)" < ^ ( T * ) " ) -> 1, 1 pQn(7T <~^(7*,«i(7*))<|n(7')- )-»l, 1 as n - oo 1 asn^oo, 1 which implies ^ (^(TT 1 < - ^ 1 ( 7 * . « i ( 7 * ) ) ) - 1, as n ^ oo. Therefore, for sufficiently large n, , - ^ i ( 7 * , «i(7*)) > ^ ( T T 1 >^ J (3.28) with probability 1, where Ax is defined in R5. Since, in (3.25), O (A^- / ) = N' ' O (l) A 0 1 p 2 1 2 p as N —» oo, similar arguments as for —\J'i{l*, £j(7*)) lead to with probability 1, for any 0 < K < di, \\O (N^ )\\ < 2 p 70 ^ , (3.29) For any 7 such that H 7 — •JMLEW where d\ is defined in R5. = > K w e know that H7 — — 1) which satisfies the condition of Lemma 3.3. We can regard J i ( 7 , ^ ( 7 ) ) in JMLE\\/ K (3.25) as a function of ^ ( 7 — JMLE) o the unit sphere in R and using the results in (3.28) T n and (3.29), we have :(7 - - J i ( 7 , ,#(7)) IMLE? n ( 7 - 1MLE) O {N- ' ) T + 1 2 P < " 7ML ||||O (^- s --- -- °- E 1 / 2 P < « 7(7-7MLB) ~(7-7Mi£) - 4^4||7 - 7 M L I | )II 2 < Therefore, conditional on u , we can show Pu { < 0 ,w(7)) -J\{l, n - { I - I M L E ? n —> 1 "as 00, N —> 0 0 . Thus, since 7 satisfies J i ( 7 , ^ ( 7 ) ) = 0, Lemma 3.3 implies that lim Puj{\\l - < « } = 1. IMLEW n,N—*oo Hence, using Lemma 3.4, we have lim P{|| 7 - n,N—>oo 7 M i £ | | < K} = Eu; { lim P {\ft - u 7 m l e | | I n,N—>oo Since ^ ( 7 , u>( )) = 0 and J i ( l 7 M L E , v(^ )) < « } ) = 1. (3.30) I = O ( n i V - / ) by (3.24), it follows from 1 MLE 2 p (3.25) that ~ ^ ( 7 * * , w(7**))(7 - W ) where 7 * * is on the line segment joining 7 to -y = 0 (iVP the interior. It follows from R5 and inequality (3.28) that <*K7**)) > ^z ( 7 * * ) 71 ), (3.31) From (3.30), we know that 7 * * is in . MLE --J'i(l**> n 1 / 2 _ 1 > z where Q(7**) _1 is positive definite. Lemma 3.5 implies that all eigenvalues of \—\J[{l**, ^ ( 7 * [—\J[{l**, £ ( 7 * * ) ) ] ~ \ . are greater than | A i . Therefore, all eigenvalues of which are equal to the reciprocal of all eigenvalues of [—^1(7**, ^(7**))], are less than [ - ^ ( 7 * * , ^( **))] < £ / , i.e., _ 1 7 [-±J{(Y*, ^(Y*))]' = 0 (1). 1 P So we have Hence, from (3.31), we have (7 IMLE) — -U( **, 0,(7**)) n O (iV-V2) p 7 = 0^1)0^-1/2) (N- / ), 1 2 = 0p from which it follows (given R7) that 1 3.7.5 + O {N- / ) X 2 = iMLE = 7o + O^n- ' ) + = 7o + O j max n~ p 1 2 p O (N-V ) 2 p 1 / 2 , -1/2 (min Asymptotic Normality of 7 The asymptotic normality of 7 will be shown based on the estimating equations (3.10). Let n *(7) = E ^ r ( r - Q i 7 ) , 1 l i=l where Qi, and Noting that are defined in (3.20). The estimator 7 satisfies $ ( 7 ) = 0 at convergence. d<&(7)/dy T = — E Qj^H Qi l constant for 7 , we take a Taylor series expansion xs of ^ ( 7 ) around the true parameter 7 : 0 0 = *( ) 7 = $(7 ) + 0 ^ M ( dy 7 1 which implies V^(7 - 7o) = d${n) n dy 1 T 72 _ 7 o ) ) Since E[Qjtr (ri - l Q )] = 0 and Cov[Qf S r I7O 1 ^ Q )] = Qf^Qu - I7O by the Lindeberg Central Limit Theorem, we have n . - £ Qjt^Qi I -1/2 / - = J2 Qf^ih J i=l Noting that 7 = 7 + o (ljv,n) 0 X n a n P = and using Lemma 3.2 and (3.22), we have £ ; ( 7 ) + o (ljv ) 0 p in = W i = W i .+ O (AT- = N(0, I). / d Ni = O(N), Wi(7) - - Q,7o) i=l \ * + O (AT 1 / 2 p 1 / 2 p ) + 0 p (l , ) w n ) + o (l , ) p w n u>i + o (lAr, ). p n Hence, it follows by the Law of Large Numbers, Lemma 3.2, and R6 that -1/2 -1/2 u —> oo i=l 7=7,0;=u> (7) -1/2 A —> oo, n —> oo n P. J 7=7 ,u>=u; i=i 0 rr~>/.. \ l l1 / 2 N —> oo, n —> oo, [^(To)] where u is the number of iteration. Using Slutsky's theorem, we can show that vM7-7o) = i E d 1 - va T J L i E Q?V& QWQi i=l -1/2 n i=l N(O, n ( ) ) . 7o We can extend the foregoing proofs to the case where A is unknown by replacing A with its consistent estimate. Note that at the estimate 7 , the estimate A given in (3.13) can be shown to be consistent for A as n —>• 0 0 and N —> 0 0 (see, e.g., Demidenko, 2004). 73 Chapter 4 Simultaneous Inference for Semiparametric N L M E Models with Covariate Measurement Errors and Outcome-based Informative Dropouts 4.1 Introduction In this chapter, we develop two approximate likelihood methods to simultaneously address covariate measurement errors and outcome-based informative dropouts in semiparametric N L M E models. In Section 4.2, we propose models for this complicated problem. We obtain approximate M L E s of all model parameters, using a Monte-Carlo E M ( M C E M ) algorithm along with Gibbs sampler methods, in Sections 4.3. In Section 4.4, we propose an alternative and computationally much more efficient approximate method. The proposed methods are illustrated in a H I V dataset in Section 4.5 and are evaluated via simulation in Section 4.6. 74 We summarize this chapter in Section 4.7 with some discussion. 4.2 Models for Nonignorable Missing Responses In the presence of nonignorable or informative dropouts in the semiparameteric N L M E model (2.6) and (2.7) with the covariate process (2.8), we can write y bs,;) = (y is,i, 0 m for indi- vidual i, where y i i collects the missing components of y, and y bs,i collects the observed m S > 0 components of y^. Here, the missing y^-'s are intermittent, i.e., we allow dropout individuals to possibly return to the study at a later time. Let r; = (rn,..., r ) T ing response indicators for individual i such that r^ = 1 if Note that {(y b ,i, 0 s be a vector of miss- irH is missing and 0 otherwise. = 1 does not necessarily imply that r^+i = 1. We have the observed data Zi, Ti), i= l,...,n}. To allow for a nonignorable missing mechanism in the response, we need models for the missing response indicators r-j, which are called dropout models. T h e parameters in the dropout models are treated as nuisance parameters and are usually not of inference interest. Thus, we try to reduce the number of nuisance parameters to make the estimation of main parameters more efficient. Moreover, too many nuisance parameters may even make the response and the covariate models non-identifiable. Therefore, we should be very cautious about adding extra nuisance parameters. In general, the probability that is missing may depend on many factors, such as responses, covariates, and individual random effects, etc. Since the missing response indicators rj are binary, a simple model for them is a logistic regression model as follows. We will assume that r^-'s are independent for simplicity (to reduce the number of nuisance parameters) m /(r^, Zi, a^, bi; 77) = J J j P ^ - = l| y i ) z h ai, b ^ i=l 75 r])} ^[l r - P(r = l| y y i , z a i} bi; r / ) ] ^ , 1 - it with i logit[P(r - = 0 l|yi, ai, b i; 77)] = log ^ P — Z " b * ' ^ r\Jij — J-lYi, Z i , ai, [1 Di, h(y z a^ bij.ry), = u h 77JJ where 77 are the unknown nuisance parameters and h(-) is often chosen to be a linear function of yi, Z j , a^, and bi. More complicated models can also be considered, but they may introduce more parameters and increase the computational burden. Note that the missingness of may depend on the (unobserved) true covariates z* rather than the observed error-prone covariates Z j . In this case, a method similar to the one described below can be developed and will be discussed in the next chapter. The density function /(r^y*, z i; a i; b^ 77) is a general expression of the missing re- sponse mechanism. Little (1995) pointed out two ways to incorporate informative missingness: • outcome-based informative if /(rj|yj, Zj, a,, bi; 77) = /(rj|yj, z^; 77). That is, the prob- ability that the current response is missing depends on the possibly unobserved response yi and covariates Z j but not on the random effects ai and bi. For example, a patient does not show up because he is too sick to go to the clinic. • random-effect-based informative if / ( r j | y j , Z j , ai, bi; 77) = /(r^a,, bi; 77). That is, the probability that the current response is missing depends on the underlying unobservable random effects ai and bi but not on yi and z». For example, a patient may be more likely to drop out if his initial viral decay is slower than other patients. In this chapter, we focus on the outcome-based informative missing mechanism. Diggle and Kenward (1994), Little (1995), and Ibrahim et al. (2001) discussed various specifications of the outcome-based informative missing mechanism. We will assume that, for example, are independent with logit[P(rij = l|yi, Z i ; 77)] = 771 + 772^1,- + 76 r] z 3 i2j + • • • + r] z v+1 ivi + r?„ yij. +2 More complicated dropout models can be specified in a similar way. Note that the assumed dropout models are not testable based on the observed data, so it is important to carry out sensitivity analysis based on various plausible dropout models. If the main parameter estimates are not sensitive to the assumed dropout model, we may be confident about the results. Otherwise, if the estimates are sensitive to the assumed dropout model, we need further investigation of possible missing mechanisms. 4.3 4.3.1 Likelihood Inference The Likelihood Function We consider likelihood inference for semiparametric N L M E models with outcome-based in- formative dropouts and measurement errors and missing data in time-varying covariates, based on the approximate models (2.6) - (2.8). Let 0 = ( a , /3, 5 , the collection of all unknown model parameters. We assume that the parameters a, (3, 2 R, A, B, 77) be 5 , R, A, B, and 77 are all distinct. The approximate log-likelihood for the observed data 2 {(y bs,i, Z i , T i ) , i = 1 , . . . ,n} can be written as 0 1(0) = 5> /// n g f(W,B)f(Ti\yi, /y(yi|z» Zi; )fz{^W,a,R)f(&i;A) V) 77 dy daidbi miSti where /y(y;|z;, b^, a, (3, 8 ) 2 = U^U fY{yij\zij,ai, = n;ii / ( z i | a i ; a , i?) z The i ; (2TT<5 )- /2 e x p { - [ y y 2 z , 1 d ( u £ a + v £ a , /3, b ))] /2<5 }, 2 t l a IE=i M " x(zi /(b 2 = n £ = i /z( ;fcl *; a , i2) = /(a,; A) W, a, (3, 5 ) 1 7 2 exp{-(z i f c - ink a - v, a i ) i?" f c T 1 - UifcQ: - V i a i ) / 2 } , f c fc =|27r^|- /2 xp{-af^- ai/2}, 1 1 e ^yrfll-^expf-bffl-ibi^}. B) . = observed-data log-likelihood 1(0) can be quite intractable, so we use a M C E M algorithm, which is similar to Section 3.3.2, to find the approximate M L E s of parameters 0. B y treating the unobservable random effects a ; and b i as additional "missing" data, we have "complete data" { ( y i , z i ; r^, a i , b i ) , i = l , . . . , n } . The complete-data log-likelihood for all individuals can then be expressed as n n i=l i=l + log / ( a A) + log / ( b B) + log f(rA , i ; i ; yi z 77)}. i ; (4.1) (i) where l is the complete-data log-likelihood for individual i. 4.3.2 Approximate M L E s Based on a M C E M Method c The E M algorithm iterates between an E-step, which computes the conditional expectation of the complete-data log-likelihood given the observed data, and a M-step, which maximizes this conditional expectation to update parameter estimates. For our models, the E-step is quite intractable due to nonlinearity, so we use Monte Carlo methods to approximate the intractable conditional expectation. In the M-step, we use standard complete-data optimization procedures to update parameter estimates. 78 2 Let &® be the parameter estimates from the i-th E M iteration. The E-step for individual i at the (t + l)th E M iteration can be expressed as Qi(0\OV) = E(lU(e)\y , ,z ,T ;0®) oba i i = J J J i [log fyiy^, + log /z(zi|ai; oc, R) + log /(a^; A) + log /(fy; -(-log f(ri\yi, = A (a, 0, (i) 77) x / ( y i i , a m O.+ / i ° ( a . S i i ; a,, b<; oc, 3, 2 B) b^y^i, z 0 ) dy (t) u + 4 V ) + /J°(£) 8) m i S j i ^ dbi (4.2) + /«(t|). Since the expression (4.2) is an expectation with respect to f(y i ,i, ai, bi|y h j, Zj, i"*; 0^), m S 0 S) it can be evaluated using the M C E M algorithm (Wei and Tanner, 1990). Specifically, we may use the Gibbs sampler to generate samples from [y j j, a^, b ^ y ^ i , Zj, ^ m sampling from the full conditionals Zi, r [y is,i\y bs,i, m ai, bi; 0W], [ai|yi, i , r i , bi; 0 ], and (t) i} 0 0W] by iteratively S) Z [bi|yi, Zi, Ti, a;; 0W] as follows. f(y is,i\yobs,i, m Zi, r, { ai, bi; 0W) oc ri, a /y(yi Z i , bi; 0W) i ; /y(y< Z i , ai, bi; 0W) • / ( r i | y i , <• / ( a » | y i , Zi, r i , b = /y(y< i ; 0W) oc = « Z i 0W)-/(ri|yi, i 5 , a b<; 0 ) Z i ; 0«), (i) i ; ai|zi, r i , bi; 0W) f (yi Y fv(yi , ai, b Z i Z i , r i , a;, bi; 0W) • /(a^z*, r», bi; 0 ) (4) /y(yt Z i , ai, b 0W) • / ( r i | y i , i ; Z i ) ai, b i 5 0«) •/ ( a ^ ; 0(*>) ( CK / ( a i ; 0W) / ( z i | a i ; 0 « ) • / y ( y ^ , ai, b,; 0<«)), 2 / ( b i | y i , Z i , r i , a*; 0 ) (t) oc / y ( y i , b;|z;, r i , a ; 0 ) (t) ; = /y(Yi| i> * , a*. i ! z ex / ( b i ; r b 0 ( t ) ) •/(bi|zi, r 0(*)) • / y ( y i | z i , ai, b = / ( b i ; 0W) • / y ( y i | z i , a i ; < ; i ; ^ ; 0<*>) 0«) •/ ( r ^ , z i } ai, b<; 0W) b<; 0 ^ ) • / ( r ^ , z,; 0W) o c / ( b i ; 0 W ) - / y ( y i | z i , ai, b,; 0W). Monte Carlo samples from each of the full conditionals can be generated using rejection 79 sampling methods, as in W u (2004). Alternatively, the integral (4.2) may be evaluated using the importance sampling method (see Section 3.3.3). For individual sample of size {(y^., let i, k generated from t af> b f ) , . . . , (yf ,, a f \ b f ) } denote a random 0 a*, b;|y [y i ,i, m S ofcSii , z u 0 r- 0^}. Note that each ( y f , a f , bf si depends on the E M iteration number t, which is suppressed throughout. The E-step at the (t + l)th E M iteration can then be expressed as « E U Q(0|0<*>) = ± QMeV) . i=l i=l E ^(tf; (3$.,,, y ^ ' ) , af, bf)) fc=l L J = E E iipg /y((yi2.,i,yo6.,i)|zi, a f , b f / a , * ) 2 i=lfc=l + E E T log / z N a f ; a , R) + £ £ & log / ( a f ; A ) t=i fe=i • i=i fe=i t + E E £ log / ( b f ; B) + ± t £ log / ( r | ( y f y ^ ) , i=i fe=i t=i it=i t 0 = Q {cx, /3, <5 |0W) + . Q P ) ( a , R\0®) + Q^{A\0^) w + 2 The M-step then maximizes z „) i; Q( >(£|0(*>) + 4 Q^{r)\0V). Q(0\0^) to produce an updated estimate 0^ \ so it t+1 is like a complete-data maximization. Since the parameters in + Q^ \ Q^ \ 2 are distinct, the M-step can be implemented by maximizing and 3 + Q^ \ 4 and separately using standard optimization procedures for the corresponding complete-data models, as in Section 3.3.2. As in Section 3.3.2, we use the approximate formula suggested by McLachlan and Krishnan (1997) to obtain the variance-covariance matrix of the approximate M L E 0. Let sf = dlf/dO, where if is the complete-data log-likelihood for individual i. Then an approximate formula for the variance-covariance matrix of 0 is Cov(0) = [^£(sf|y o b S i i , Zi, Vi; 0) E(sf\y , obSti z u r i; Of] , (4.5) i=i where the expectations can again be approximated by Monte Carlo empirical means, as in (4.4). 80 In summary, the foregoing M C E M algorithm proceeds as follows. Step 1. A^°\ B^) Obtain an initial estimate of ( a , (3, 5 , R, A, B) = ( a ' ' , / 3 , 5 -°\ 2 0 (0) 2( R®\ based on a naive method, which ignores covariate measurement errors and missing data, and an initial estimate of r/ = T / ° ) based on the dropout model by filling in the average of 0 and b f Step 2. (yrms.i, a i, Set a f = y b ,i0 s y m j S) j with 0. = A t the i-th iteration, obtain Monte Carlo samples of the "missing data" using the Gibbs sampler along with rejection sampling methods by sampling [ymiS,i|y 6S,i, from the full conditionals r-j, a i j 0^], 0 fy; 0^], Z i , ^ , ^ , [a^y*, z r bf, 0 ] and (i) i} iy [b^y*, z,, or using importance sampling methods to approximate the conditional expecta- tion in the E-step. Step 3. Update estimates 0( ) using standard complete-data optimization procei+1 dures. Step 4. Iterate between Step 2 and Step 3 until convergence. 4.3.3 Monte Carlo Sampling Gibbs Sampler As in Section 3.3.3, we can again use the Gibbs sampler to draw the desired samples (yfsi, af \ as follows. Set initial values are (ySs,i> a f\ b f ). } w e c a n o b t a i n bf). (9mi£i, f a Suppose that the current generated values + 1 ) , bf+1)) as Step 1. Draw a sample for the "missing" response f(y is,i\yob ,i, m s Z i , n, af', bf ; } follows. yfitV fr°m 0W). Step 2. Draw a sample for the "missing" random effects a f /( *l(yfS\ a r yofc.,0, 81 i ; bf *<*>). + 1 ' from Step 3. Draw a sample for the "missing" random effects h\ f\ H^)> k = r ' from After a sufficiently large burn-in of r iterations of Steps 1 - 3 , the sampled values will achieve a steady state. Then, {(y^l.ii a + I,... ,r + k } t can be treated as samples from the multidimensional density function f(,y-mis,i> a.j, b j | y b j , Z j , F j , 0 Q()). S ) A n d , if we choose a sufficiently large gap r' (say r' = 10), we can treat the sample series bj '), { ( y m i s , i > d-i \ fe k k = r+r', r + 2r',...} as independent samples from the multidimensional density function. There are several ways to get the initial values ( y ^ , - , s i way is to set y ^ ] S ) i to the average of y bs,i, and (af \ Q b- 0 ) ) to (0, a\°\ b- ^). 0 A simple 0). Multivariate Rejection Algorithm Sampling from the three full conditionals can be accomplished by the multivariate rejection sampling method. rj, aj, b j ; 6>W) in (4.3). sup{/*(u)}. For example, consider sampling from / ( y j i | y & , i , Z j , m Let / * ( y We assume c < oo. s c= A random sample from / ( y j , i | y & , i , Z j , rj, aj, b j j 0 ) ) / ( y ( , , ; K »i. = 0 s h {t) m /(rj|yj, z S 0 <; 0®) 0 and m i S i i u 0 ) S ) s {t) u can then be obtained as follows by multivariate rejection sampling: Step 1. Sample y ^ j s i from / y ( y j j | z j , aj, b ^ ; m S > 0 ), and independently, sample (t) w from the uniform (0, 1) distribution. Step 2. If w < / * ( y „ j j ) / c , then accept y ^ j , otherwise, go back to step 1. S ] Samples from / ( a j | y j , Z j , s i r iy bj; 6^) and / ( b j | y j , z r a^;-O®) it it can be obtained in a similar way. Therefore, the Gibbs sampler in conjunction with the multivariate rejection sampling can be used'to obtain samples from [y i ,i, m S 82 bj|y (, j, Z j , 0 S 4.4 A Computationally More Efficient Approximate Method The estimation method described in the previous section may be computationally very intensive and may even offer potential computational problems such as slow or non-convergence, since the method involves sampling the random effects ( a j , b j ) , which may have high dimension (see the detailed discussion in Section 3.4.1). To overcome these difficulties, in this section we propose,an alternative approximate method which further approximates model (2.5) by taking a first-order Taylor expansion around the current parameter and random effects estimates, which leads to a L M E response model. For the resulting L M E response model with the covariate model (2.8) and the dropout model, we can obtain approximate M L E s of model parameters by using a computationally more efficient M C E M algorithm. The random effects a j and b j can be integrated out in the E-step of the E M algorithm and thus sampling the random effects in the E-step is no longer needed. Therefore, the proposed approximate method may provide substantial computational advantages over the M C E M method in the previous section. Moreover, the proposed method can be used to obtain good parameter starting values for the M C E M method in the previous section. Denote the current estimates of (6, a j , b j ) by (6, a j , b j ) , where a j = E(&i\y b i, 0 Si Z j , TJ; 0) and b j = £ ' ( b j | y h j , Z j , r^; 6), suppressing the E M iteration number. Taking a first-order 0 S ; Taylor expansion of in (3.6) around the current parameter estimates a and J3 and random effects estimates a j and b j , we obtain the following L M E response model Yi = Ei + Wia + Xip + HiSLi + Tibi + ei, 83 (4.6) where Wi = (wji,..., w .) with xv Xi = (xii,..., x i r i i //i = (hji,..., h i7lj Ti = (t ...,t J witht0- = ^ gi = g i ( d , y9, ai, bi) - WiQ: - A"i3 - TfiOi - T i b i , i n ) r = i:j with Xjj = ) u with h do. dp = i:j O&i T il; in with all the partial derivatives being evaluated at ( d , P, aj, bj). Our proposed approximate method is to iteratively solve L M E response model (4.6). For the L M E response model (4.6) with the covariate model (2.8) and the dropout model, the M L E s of the model parameters can be obtained by a M C E M algorithm, in which the random effects ai and b i can be integrated out in the E-step, as shown below. Thus, the E-step only involves sampling y i i rather than ( y i , i , a*, bj) as in Section 4.2. Moreover, m S ) m S some analytic expressions for the M-step can be obtained. These result in a substantial improvement in computational efficiency 4.4.1 A M u c h Simpler E-step In this section, we show that, based on the approximate L M E response model (4.6) and the covariate measurement error model (2.8) along with the dropout model, the random effects (aj, bi) in the E-step in Section 4.2 can be integrated out. Based on standard results for multivariate normal distributions, the distribution of 84 ( y i , Z j , a j , b j ) for individual i is given by Hi A Hf + TiB Tj + 5 I Hi A V? 2 \0 • N ViAHf 0 AHT 0 Since H A Hf + T B t ViAV? + I®R AV? BT? V , HiA TiB ViA 0 A 0 0 O Tf + S 1 and V A Vf + I ® R are positive definite, they are symmetric 2 t and invertible. B y the inverse operation of partition matrices, we can write Hf + 1 l ^ Hi rii A j\n; - t - 1TiB ; a Tj it + 5 o~ riiHiAVf A vt 2 V VAHT \ T/. 4 T / T . VAV/ +I<S)R J I V Ui + ft —ft tii \ (4.7) -EiF 1 Ei J where d = {HiA'Hj Fi = Ei = + TiBT? + 5 2 GiHiAVf, [(V^Vf+ / ® i ? ) - VjAFf Gj/fjAVf]- . 1 Because the random effects aj and b j are conditionally independent of r j given ( y j , Z j ) , it is straightforward to obtain the conditional distribution of yii Z j , I*j, 6 N 85 [aj, b j | y j , Z j , r j ] when 6 = 6 B where u = ai [AHTtGi + FiEiF^-AVfEiFfiiyi-gi-Wia-XiP) + u i = BT^Gi = A-iAHjiGi E = B — B T?(Gi a i b t - (E ) =[AVfE Fl-AHf(G hi E a i bi S [AVfE -AHTF E }(z -U a)\ Q i l i l e= + FiEiFT^-^-Wia-Xi^-BTTFiEiixi-Uia)^^, + FiEiF^-AV^EiF^HiA-lAV^Ei-AHTFiEilViA^ + Fi Ei Fj) T* B \ Q Q , = + T biai 1 i l F E Fn}T B\ . i i i e=B By the expectation and covariance properties for multivariate random variables, we have £(aj|yj, ZiYi, E(^aJ\yi, 0) = z^ n\ 0) = £ £(b;|yi, z n; 0) = u , E{hihJ\yi, Zi,Yi- u E(aihf a i + i/J, b i + u bi 0) = E \yi, z ^ ; 0) = S a i b i v?., hi + u &i Since f(y is,i, m ^i, b i | y o6S) j, Zi, Yi, 0) = /(a^, bi|yi, z», r»; 0 ) / ( y i , t | y 6 i , i , i ' , z m 86 S 0 Sl r - if the conditional expectations in the E-step, which correspond to I?\a, (3, 5 ) 2 = J £ , a i [ - ^ log(27r5 ) - ^ ( y < - - W oc - Xi (3 - H, a, - % 2 b i g i i = Zi, Ti, log(27r5 ) - ^ J 2 x/(y = Zi,r i j 0 - W oc - Xi (3) r E^lHi&i Zi, Tj, g i t + Tih^yi, z r 0] u [ ( ^ i a i + r i b i ) ( i / i a i + ri.bi)|yi, z ^ ; mis,i\yobs,ii i ; {(yi - gi - Wi oc - X i /3) (yi - T a i ) b i T { 0)dy ^{yi-^-WiOc-Xidf +E b) t x ( y i - gi - Wi oc - X i (3 - Hi ^ - Ti b ) | y ^ f (ymis,i\yobs,ij in (4.2), become 1 . < ; 0]} 0)dy -^log(27r5 )-^y'{(y -g -l^a-X ^)% -&-Wia-X /3) 2 i i i -2(yi-gi-Wia-Xi/3) [^iE(ai|yi, z i T i ; r i ; i 0) + Ti £ ( b i | i , Y + ^ , b j a f i f f / f i a i + 2bf T f t f i ^ + b f i f l - b ^ , a i x/(y , r Z i Z i ) r < ; 0)] 0]} i ; mis,i\^/obsji j ^i? ^"ij 0)dy = -^log(27r5 )-^ y { ( ^ - ^ - ^ ^ - ^ ^ ( y i - g i - W i « - X i / 3 ) 2 -2(yi-gi-Wia-Xi/3) (Fi B[ai|yi, z T J +tr[Hi E(eii a f |yi, z h r<; +tr[7;£;(bibf| i, Zi.Ti; y = -|log(27r<5 )-^ | 2 0) Hj) a i +tr[Tj ( £ b i + V + i/ a i b i r t u z i , r < ; i a i + i / J ) tff ] + 2tr[tf; ( E n; u 0) i f ] 7} i / a i b i b T (y -&-W;a-X /3) i i J + ^ if] !/£.) Z f ] | x i\yobs,i, 87 z x i i/ 0} + T E[bi\y bf|yi, {(yi-&-W a-X /3) T i ; + 2tr[#i 0)Zf]} - 2 ( y i - gi - VKi a - X , /3) +tr[Hi ( S i ; Z i , Ti', 6)dy i i, m St 0]) 1 _ '_ m E,a , , b j - ^ log\2*R\ - - - Uv oc - V Ri) RT 1 tj i=i UijOt x(ZJJ Vij a j ) | y j , z^r^; 0 x / ( y i i | y ( , i , m lOEg |27ri?| - - y { ^ ( z ^ - ^ a) T iT S] 0 (z 1 - ^ y Zj, S] d)dy i i iy m S oc) i mi -2 ^ ( z - Un ocf R- V E^ailyi, z n- 0) 1 y i3 u 3=1 _ • mi + ^2E [^V^R- 'Vija \yi, ^, r»; 0]\ x / ( y 1 aiibi i f r 1 m i s 0)dy , |y 6 , , Zi, r ; l 0 s l n 4 m i log |27ri?| - - J { £ ( z < , - - tf ocf R- (zy - £ ^ a ) ^ 1 y m i -2 ^ ( z ^ - - Un ocf R' Vij E(<a\ , z i y 0) 1 yi + i=i J^tr[V [i?- K £;(a af|y , r 1 i j m 1f m log |27ri?| - - { ^ i i 0)]} x , iy z», rii 6>) d y „ / ( y ^ y ^ , * ~ Vij ocf R- (zij - Uij oc) 1 3=1 -2Y(z j-U ocf l Z i u R- VijU l l3 ai 3=1 + Y ( ij tr v R" 1 H V ( a i s + x /(y ^ a i ^ ) ) m i s ,i|y 6 ,i, Zi, 0 S d)dy ,ij, r ; mia t 3=1 r E aubi 1 r l [ ~ 2 §\ 2 7 t A \ - ~ A l *l y i tr[A~ E(aisf\yi, l / ' Z i ' r i ' 0 /(ymis,i|y 6s,i, x i , z 0 i"*; 9) dy miSii z h n; 0)} x f{y i ,i\yobs,i, z m S ; , i\; 9)dy miSii 1 f -2 - E, log / [ ^ - ho? ( a B + - 4I^ log |2vr5| a i . b i I27T. 1 a 1f - log |2TTV4| - - 1 ~i l l o t r _ 1 E ; 1 / (0 y mx^ /y( y, ^i ,, i Z b i |^ y i ), ]z x r»; | yi ,&n; , i , 6) z i} m S 0 s i ; dy r<; i0) d y m Sii> m i 5 i 1f -log|27r5| - - / tr[5 E(bibf|yi, _ 1 1 1 f - log |2TT£| - - / t r [ 5 _1 (E + u BI hi z i} iy 0)} x f{y isAyobs,i, i, z m 9)dy i\; mis>i ~ ul.)\ x / ( y i , i | y , i , z i y 6) dy m S o6s u , mis>i i I^iw) = I E z^, log/(ri|yi, aiM x /(y i ,i|y 77) m S l o g / ( r j | y i , Z J ; 77) x / ( y « , i | y b , i , z m 0 s i ; o b s > i, z r ; 0) dy 4 i } rij 0) dy n Thus, compared with the intractable high-dimensional integrals in (4.2), the above integrals I{ - II if - if have much lower dimension, i.e., the E-step is substantially simplified. Note that are expectations with respect to / ( y i i | y 6 s , i , Z j , ry, m S l 0 0), so they may be evaluated using common numerical integration methods such as Gaussian quadrature if the dimension of y i s , i is small. If the dimension of y i i is not small, we can use the rejection sampling m m S ) methods to generate samples from [y is,i\y bs,i, m f{y is,i\yobs,i, m z 0 i , u 0) r %, &\ based'on the conditional density z oc / ( y i , Zj,ry, 0) = / ( Y i , Z i ; 0)/( i|y*> Z i ; 0), r where the distribution of ( y j , Zj) is yi N / gi + WiCt + Xi(3^ UjCt / HiAHf + T{BTj + 6 I 2 V AHf J t H AVf t ViAVf + I^R For individual i, let y ^ ] j , • • •., y f i i . i denote a random sample of size k generated from s [ymis,i|yobs,i! i, z ry, • Note that each y ^ \ is suppressed throughout. t s i depends on the E M iteration number t, which Then we approximate the conditional expectations if 89 - if in the E-step by its empirical mean, with missing data replaced by simulated values, as follows. * fe=i >(fc) r- E l(y Xi> 2 2= - & - w « - * /3p { 1 x[^£;(a |(yS , y ,), z n; 0) + % E(bA(y^ ., ^ fc + r - E { [CovC^KySi.i. y^O. i , » ; 0) * fc=i J Sil obs u y ), ofeM s Z i j r i; 9)} t t r H i +^( i|(y2l,i> a z yobs,i), r ry, 0) ^(aiKyJJ^, Zi, y bs,i), 0 zu ~6) )Hj} z n; ~9) \T?} T kt ^ + 71 7 E 2 t r fc=i {^[ +E(^\(y Xi, { C o v y ( ,i), o b s l(ySl,i' bi ' a i yobs,i), Zi, z vf, e)E(b \(y X , { u l Vi; y l o b s 9) ,i), T u k t E < < [Cov(b,|(yS ^ yobs,i), Zi, v , fc=i +£?(b |(y^ , y ,i), , r ~9) £(bi|(y<2flii, tr +771 T i Sl i i Z i o b s ~9) f y<*a,i), z r u i ; ~0) ] T i ; if}}, 1 f _ '_ m j f ( a , iJ) « - ^ i log [2jriJ| - J 2 - E 2 77 E ( j=i k=i _. TUi z «- n ) u a t r l o g | 2 y r A Zi,Vi, o) ai s 0 i ; * fc=l | _ 1 ry, 9) Zi, s „ _1 yob**), V Ba (j4) £N(y™U ^ T tj W ^ H [Cov( |(yS ., y ^), Zj, r 9) +-(*l(yS2,i. yobs,i), f) 1 tj kt + E J7 E j=l £ ( z „ - U af R' {z - U„ a) W - ^{A- 1 £?(ai|(y2ii, yo6«,i), 8i [Cov(a,|(yS , yob.,0, Zi, r>; S|i r;; 0 ) ] } T 0) * fc=i • +£( il(y2L, a yc*,i). s 90 Zi, r i; 9) £(a;|(yf., y s o6s ,i), Z i , ri; 9) }}, T I^\B) - 1 - - log\2KB\ l » \ £ I [Cov^Ky JS, ^ V +^( i|(ySi,i yobs,i), ^ , r b ; i ; y ^ ) ,z 0 0) £ ; ( b i | ( y S , y s i o b S i i ; i), r i 5 0) z r h 0) ]}, T i ; kt 4°fa) 7 r £ g / ( i l ( y 2 l , i . y<* ,i), *; l o ~ r z 8 * fc=i 4.4.2 The M-step In this section, we derive some closed-form expressions for the M-step, so we avoid some iterative algorithms, which may be computationally inefficient. The M-step of the E M algorithm maximizes Q{6\9) = n Y$\<*i A + f V, ^ R) + WW + f fa)] } + W(B) i=l to produce an updated estimate of 6. Note that if we were to observe (ei, €i, a i , b i ) , in addition to (y fc ,i, Z i , r i ) , we would use the following estimates 0 s n n ^ = £ ^ /£ *> e ei i=l i—1 n i W where J2 I «> i=l e and 5 , respectively. m i=l h h n i=l 2~2 Yl ij Ij> i=lj'=l e e R=J2 i I/ > n n e Y/)2 iJ l/J2 i' i=l j=l = J]aiaf/n, n mi e n n i=l n n m; R== € 2~3 i fi a a i=l Since ei, a n d E D i b f a r "sufficient" statistics for <5, e 2 R, A, i=l a^, and b i are unobservable, we can "estimate" them by their conditional expectations given the observed data (y f, i, Z i , r ^ , as in Laird and Ware 0 M S) f 1 (1982). Based on results multivariate Hi A we Hf have + TiB Tj gi in + Wia + XiP ^analysis, Zi V <) e ~N Vi A Hf Uioc \ 0 J \ 91 5 2 1 +51 2 HiAV? ViAVf + I^R 0 5 I^ 2 0 51 j 2 and Ei + Wicx + N V Hi A Hf + TiB 7 f + 8 I ViAVT 0 ^ J V \ HiAV? 2 UiC* e where Xip + I®R Ml 0 Mij R is a vrrii x v matrix with the j t h v y. v submatrix R and zeros elsewhere. Since are conditionally independent of r-j given y j and Z j , we have ej and /(ej|yj, Z j , ry, 0) /(ei|yi, Zi, ry = / ( e j | y j , z»; 0) = f(ei\yi, ZJ; 0), 0). Using the above results, the inverse of partition matrix in (4.7), and the definition of conditional distributions, it can be shown that [ej|yj, Z j , ry 0] ~ N(u , Bi A J, e where [5 (Gi + Fi Ei i f ) ( 2 - g j - Wi cc - Xi Q) - 6 F E^i - U cx)} 2 y < t 0=0 and [cij|yi, Zj, Ti\ 0] ~ N(u , €iJ A ), £ y where [R^E )*{v -U ci)-R{E FT)i(y -&-W OL-X B)} 6=6 = i k=l A .. e = [R-R{Ei)"R] 6=6 92 ik ik i i I i with the j t h v x rn submatrix (Ei FfY submatrix (E^ E(e[ei\y z obStU i ij € and the j t h row and the kth column v x v of Ei, j, k = 1 , . . . , m^. Note that k E of Ei Ff iy u i, z 0 u f 0) T ri; 0) E(hihJ|y ,i, o6s G i | y i , Z i , Vi', 0) j'(y is,i\yobs,i, E(&i ru 0) <Hj\yobs,i,Zi, E(aiaJ\y bs,i, 0) m ij z r t i 0) ^Ymis.t) E(€-i €j | y i , Z i , F i , 0)/(yjnis,i|yo6s,i; Z i , T i , 0) ai |yi, Z i , rf, 0)f(y Ay E( mis ai E(bibf| y i dy i i, m s v ;6)dymis,11 obs i , Z i , Ti; 0)f(ym ,i\yobs,i, ia z i , r»; 0) d y 7 1 Using the expectation and covariance properties for multivariate random variables and the properties of the matrix trace operation, we can update the estimates of (5 , R, A, B) in 2 the M-step as follows. 5 = E tv[E(eief\y , 2 z , iy 0)}/E obSti i=l n kt - E E i = l k=l n mi R = E [ (ei J\(y l,i, E mi yob ,i), e m S z * , r»; 0)]/ E ^ _ E n i=l . tv i n t e /ct ~ EEE n 5ly fe ,i; *> i ; ^)/ E 0 z S i=l r m i m Efej CyKyJn-B.i, Yobs.i), Z i , n r 0)/ E i; i = l j=l k=l i=l = E ^(ajaf | y A . o 6 S i i z iy , i; 0)/n i=l . ~ 5 = E E ^ ( i f l ( y m i , i > yobs,i), z<, v ; 0)//c n, i=l fc=l a a 5 E - ^ ( b i b f |y i=l - Ei = i Efc=i ^ ( b o 6 S ] i , " Z i , v^, 0)/n y«*».0. »> » ; 0 ) A ^ - i FI(y i ,i' z b m S r Plugging 5 and .R into E r = i [ - ^ ( > ^ , <5) + ^ ( i 2 t { 2 a a ^ ) ] and setting its first derivative with respect to (oc, 3) equal to 0 , we can get a set of equations for 6c and 3. i-1 j=l v n E i=l xf Wi 7 a i-l E xf x ( E^ i=l \ { i=i 93 l ^ ^ + ^Ec/^- ^.} 1 j=l ZXfv , t=i yi J where u = E(y \y , yi i = ahati Z i ij — , ry, 0) - u obSti { £(bi|y o h S i i, z i ; r- 0), Zi, ry 0). E(&i\y i, z z r^; 0) - T Hi E(ai\y , gi- obs> The foregoing conditional expectations can be approximated as follows. E(yi\y bs,i, Zi, 0 ry E^ ™l,i' kt « 0) y Yobs,i)/ t, k k = i kt E(ai\y , Zi, ry obSii 0) « ^ ^ M ^ m l , ; , y<**.0> E(bi|(yf , y i, z ry 0)/fc , t k = i kt 0) Zi, n; E(bi\y ,i, obs « s i o f e s > i r^; ), 0)/fc . t fc=i Using the inverse operation of partition matrices, we obtain closed-form estimates of a and 3 in the M-step: n « = r where r n -M r £ w^ i=l n i=l n 1 + 5 izuljR-^ j=l J L j=l J T 12 = F 21 = -Q L~\ n i=l " + r Zij n L = n L Q, T ^ 2 yi + Q L" nii „ and T £x> 1 2 y i , i=ln i=l = L ' , with 1 2 2 mi i=l j=l ExfXi-(E^^)M(E:^Xi), i=l i=l i=l n Q = M ( E ^ , ) . i=l We finally maximize E"=i ^i 1!) to obtain an updated estimate of 77, which can be done by standard optimization procedures such as the Newton-Raphson method. Iterating between the above E-step and M-step until convergence, we obtain approximate M L E s of 0. The asymptotic variance-covariance matrix of the approximate M L E 0 of t 94 0 can again be obtained using the approximate formula given in (4.5). T h e only difference is that the density function / y ( y j | z j , a;, by a, 3, 5 ) in if 2 1 in (4.1) is based on the L M E response model (4.6). We see that, for this approximate method, both the E-step and the M-step are computationally much less intensive than those in Section 4.3. The performance of this approximate method will be evaluated in Section 4.6. 4.5 Example We illustrate our proposed methods in this chapter using a H I V dataset. We also analyze this dataset using the commonly-used naive method which ignores measurement errors and missing data for comparison. 4.5.1 Data Description The dataset includes 53 H I V infected patients who were treated with a potent antiretroviral regimen. Viral loads (Plasma HIV-1 R N A copies) were measured on days 0, 2, 7, 10, 14, 21, 28 and weeks 8, 12, 24, and 48 after initiation of treatments. After the antiretroviral treatment, the patients' viral loads will decay, and the decay rates may reflect the efficacy of the treatment. Throughout the time course, the viral load may continue to decay, fluctuate, or even start to rise (rebound). The data at the late stage of study are likely to be contaminated by long-term clinical factors. Various covariates such as C D 4 count were also recorded throughout the study on similar schedules. It is well known that C D 4 counts are usually measured with substantial errors. The number of response measurements for each individual varies from 6 to 10. Five patients dropped out of the study due to drug intolerance and other problems and sixteen patients have missing viral loads at scheduled time points. There were 104 out of 403 C D 4 measurements missing at viral load measurement times, due mainly to 95 a somewhat different C D 4 measurement schedule. A detailed data description can be found in W u and Ding (1999). Six patients are randomly selected and their viral loads are plotted in Figure 1.1. Due to long-term clinical factors, drug resistance, and other complications, the viral load trajectories can be very complex after the initial phase viral decay. Visual inspection of the raw data seems to indicate that dropout patients appear to have slower viral decay, compared with the remaining patients. Thus, the dropouts are likely to be informative or nonignorable. The C D 4 count trajectories for six randomly selected patients are plotted in Figure 1.2. There exists large variability in C D 4 count between patients. The population C D 4 count trajectory appears to have a quadratic polynomial shape. 4.5.2 The Response and the Covariate Models Based on W u (2002) and W u and Zhang (2002), we consider the following H I V viral dynamic model (see Section 3.5 for the details) j = \og (P e- log(Pii) = 0i + b , log(P ) = 04 + Vi 2i w + e, (4.8) \UJ = fa + fcz-j + b , (4.9) X i i u u ^+ P e- J i) X2i ti 2i ij 2i hi, X2ij = w(t ) + h {U ), ij i j (4.10) where y - is the logi -transform of the viral load measurement for patient i at time Uj, Pu and i3 P 2i 0 are baseline values, A^- and A j are viral decay rates, z*j is the true (but unobservable) 2i C D 4 count, and w(Uj) and hi(Uj) are nonparametric fixed- and random-effects functions (see Section 2.1). To avoid very small (large) estimates, which may be unstable, we standardize the C D 4 counts and rescale the original time t (in days) so that the new time scale is between 0 and 1. As discussed in Section 2.1, we employ the linear combinations of natural cubic splines 96 Table 4.1: A I C and B I C values for the model (4.8) - (4.10), with q < p = 1, 2, 3. Model p=l,q= 1 p=2,q=2 p=2,q=l AIC 622.92 590.82 591.37 584.51 p=3,q=3 p=3,g= 2 593.32 p=3,q= 1 583.19 BIC 685.37 676.68 662.95 677.16 671.76 657.72 with percentile-based knots to approximate w(t) and hi(t). Following W u and Zhang (2002), we take the same natural cubic splines with q < p in order to decrease the dimension of random effects. A I C and B I C criteria are used to determine the values of p and q. Table 4.1 displays A I C and B I C values for various plausible models. Based on these A I C and B I C values, the model with p = 3 and q = 1, i.e., A ij ~ ft + fa ^ i ( i y ) + ft foiUj) + b 2 (4.11) 4i seems to be the best, and thus it~is selected for our analysis. For the C D 4 process, in the absence of a theoretical rationale, we consider empirical polynomial L M E models, and choose the best fitted model based again on A I C / B I C values for each possible model. This is done based on the observed C D 4 values, and is done separately from the response model for simplicity. Specifically, since the inter-patient variation is large, we consider model (2.8) with Uu = Vu = (1, uu,..., v^f ) and linear (a = 2), quadratic 1 (a = 3), and cubic (a = 4) polynomials. Table 4.2 presents A I C and B I C values for these three models. T h e following quadratic polynomial L M E model best fits the observed C D 4 process: CD4 i ( = ( « i + ai) + ( « + a ) u where uu is the time and ot = (flii, i2, 0'i ) T fl 3 2 2 are the population parameters and a are the random effects. 97 3 t u (4.12) T 3 3 + e a (a\, a , a ) 2 + (a + a ) u 2 t = Table 4.2: A I C and B I C values for the linear, quadratic, and cubic C D 4 models Model 4.5.3 a=2 a=3 a=4 AIC 806.58 715.80 752.15 BIC 830.00 774.34 791.18 D r o p o u t M o d e l s and Sensitivity Analysis In this study, dropout patients appear to have slower viral decay, compared with the remaining patients. Thus, dropouts are likely to be informative or nonignorable. So we need to assume a model for the dropout process in order to make valid likelihood inference. Although dropout models are not verifiable based on observed data, subject-area knowledge and sensitivity analysis based on plausible models may still lead to reasonable models. Note that we should avoid building a too complicated dropout model since a complicated model may become non-identifiable (Fitzmaurice et al. 1996). Subject-area knowledge suggests that dropout may be related to current or previous viral load and C D 4 measurements. Thus, we consider the following five plausible dropout models for sensitivity analysis Model I: Model II : Model III : Model IV : Model V : where y^ (k < j) = l|yi,i'i V)]= Vi + V2CDAij + rj yij l o g i t [ P ( r = i|yi, i i V)] = vi + V2Vi,j-i + myij, logit[P(rjj = i|y<,i\ V)]= Vi+ mCD^j + rj yi logit[F(r = i|y»,i ' , *)] = m + V2yik, k<j, logit[F(ry = i|y<,i> V)] = +r CD4* , logit[F(r ij z 3 z y z 3 tj z ij z in Model IV Vl l2 j is the last observed response and CD A*- in Model V is the estimated true C D 4 value for individual i. nonignorable missing response models, Model IV Thus Models I - III represent possible represents a possible ignorable missing response model, and Model V relates dropouts to (estimated) true C D 4 values. 98 We also considered the following ignorable missing response models: logit[P(r;j = l | y i , zfT])] = r) X +r) yn 2 logit [P(r^ = l|y*, zf 77)] = 771 + 7 7 2 ^ , but the resulting estimates are similar to those for Model IV, so we only present results for Model IV in Table 4.3. We assume independence of the r^-'s to simplify the model. 4.5.4 Estimation Methods and Computation Issues We estimate the model parameters using the naive method which ignores measurement errors and missing data and the two proposed "joint" model methods discussed in Sections 4.3 and 4.4. We denote the method in Section 4.3 by A P P R 1 and the method in Section 4.4 by A P P R 2 . The two proposed joint model methods need starting values for model parameters since they are implemented by M C E M algorithms. We use the parameter estimates obtained by the naive method as parameter starting values for the two joint model methods. For the naive method, we use the S P L U S function nlme() and ImeQ to obtain param- eter estimates and their default standard errors. For the two proposed joint model methods, we assess the convergence of the Gibbs sampler by examining time series plots and sample autocorrelation function plots. For example, Figures 4.1 and 4.2 show the time series and the autocorrelation function plots for b associated with patient 14. From these figures, we 2 notice that the Gibbs sampler converges quickly and the autocorrelations between successive generated samples are negligible after lag 17. Time series and autocorrelation function plots for other random effects and missing responses show similar behaviors. Therefore, we discard the first 500 samples as the burn-in, and then we take one sample from every 20 simulated samples to obtain independent samples (see sampling methods in Section 4.3.3). We start with ko = 500 Monte Carlo samples, and increase the Monte Carlo sample size as the number 99 O 200 400 GOO 800 10OO Iteration Figure 4.1: The time series plot for b associated with patient 14. 2 Series : b2 t L T Figure 4.2: The autocorrelation function plot for b associated with patient 14. 2 100 t of iteration increases: k t+i = k + k /c with c = 4 (Booth and Hobert, 1999). Convert t gence criterion for these two joint model methods is that the maximum relative change in the parameter estimates from successively iterations is smaller than 0.05. Convergence of the algorithms are considered to be achieved when the maximum percentage change of all estimates is less than 5% in two consecutive iterations. We use the multivariate rejection sampling method for the two proposed joint model method. O n a S U N Sparc work-station, the A P P R 1 method took about 140 minutes to converge while the A P P R 2 method took only 12 minutes to converge. This shows that A P P R 2 offers a big reduction in computing time, and thus is computationally much more efficient than A P P R 1 . 4.5.5 Analysis Results We estimate the model parameters using the naive method and the two proposed joint model methods A P P R 1 and A P P R 2 . We use the parameter estimates obtained by the naive method as the parameter starting values for the A P P R 1 and the A P P R 2 methods. We also tried several other parameter starting values for the proposed methods. Different parameter starting values appear to lead to roughly the same parameter estimates for both the A P P R 1 and the A P P R 2 methods. Table 4.3 presents the resulting parameter estimates and standard errors based on models / , IV, and V in (4.13). We find that the two joint model methods provide similar parameter estimates. We also find that the naive method may severely under-estimate the covariate C D 4 effect (i.e., /? ) and may poorly estimate some other parameters as well (this 3 will be confirmed by simulation). For the different dropout models in (4.13), we find that the resulting estimates based on the three nonignorable models (Models I, II, III) are all similar, which indicates that the estimation may be robust against the nonignorable dropout 101 Table 4.3: Parameter estimates (standard errors) for the models in the example. Model Model I method Oil Oil NAIVE - - a Pi - 11.72 65.71 PT -1.90 — (3.8) (3.2) (.6) (5.5) (•2) 11.72 67.08 1.52 6.97 -1.83 (5.2) (6.2) .(•7) (5.8) (•2) 11.70 66.97 1.50 6.96 -1.90 (4.4) (5.8) (.6) (5.5) (•2) (8.9) (3.1) 7.86 -2.63 (7.9) (3.0) .33 .50 11.73 66.52 1.37 8.83 -1.92 .35 .51 (•6) (•1) (.5) A P P R 2 -.43 4.21 -3.78 (.6) (•1) (.6) Model IV A P P R 1 -.43 4.18 -3.75 (•6) (•1) (•5) A P P R 1 -.43 4.21 -3.80 (•1) (.6) R /V 8.66 A P P R 1 -.42 4.15 -3.75 Model V 6 PA P5 6.87 -2.58 3 (.6) P2 ft 0.84 6.89 -2.62 (5.0) (6.0) (•7) (5.9) (•2) 11.74 66.79 1.44 6.89 -2.50 (•2) (4.9) (6.1) (-7) (5.9) 7.75 -2.54 (8.8) (8.9) .35 .51 (3.5) (3.1) 8.60 -1.98 (8.9) .35 .35 .50 (3.1) Note: A and B are unstructured covariance matrices, but we only report the estimates of their diagonal elements here. Diag(A) = (.50,3.65,1.61) for APPR1, Diag(A) = (.52,3.80,1.66) for APPR2. Diag(B) = (1.08,77.12,2.03,24.98) for Naive, Diag(B) = (1.10,75.50,2.01,26.51) for APPR1, and Diag(B) = (1.09, 75.24,1.83, 22.37) for APPR2. models. The estimates based on the ignorable models (Models IV and V ) , however, appear, to be somewhat different, especially for the parameters associated with the decay rates XUJ and A ij. 2 This suggests that the missing responses (dropouts) may be nonignorable, and reliable likelihood estimation must incorporate a reasonable nonignorable missing response model. Although some estimates in Table 4.3 are not statistically significant, the values of the estimates may still provide useful information about viral load and C D 4 trajectories. The estimates of parameters 772 and 773 in dropout model I based on A P P R 1 method (or A P P R 2 method) are -.05 and .97 (-.04 and 1.06) respectively, with both p-values less than 0.001, which also indicates that the dropouts may be nonignorable (or informative) since the missingness may depend on the missing values. The estimates of 772 and 773 indicate that dropout patients seem to have lower C D 4 counts and higher viral loads than the remaining patients. 102 4.6 A Simulation Study In this section, we conduct a simulation, study to i) evaluate the performances of the semiparametric modelling and of the A I C / B I C knots selection method, ii) assess the two proposed methods ( A P P R 1 and A P P R 2 ) and compare them to the naive method ( N A I V E ) , and iii) evaluate the impact of specification of the missing response mechanisms on parameter estimation. We generate 100 datasets from the following model, which corresponds to the model (4.8) - (4.10), = log u = A + 2ij = - 2 . 2 + (5.3 + 0.164*) sin(0.04 + 3ty), y i j log(P ) X 1 0 (Piie- A l ^ + P e- A a M Xiij = « * « ) + eii, fez?, + b, 2i (4.14) log(P ) = A i + hi, 2i (4.15) (4.16) where the nonparametric model (4.16) is carefully chosen to closely mimic the viral load trajectory at later stages in the example of the previous section. The covariate model and the measurement time points used in the simulation are the same as those in the example of the previous section. T h e true values of model parameters are similar to those in the example. The true values of (5 , /3 , /3 ) and a are presented in Table 4.4, and the other 2 3 4 true parameter values are 5 = .2, R = .4, A = diag(.5, 3, 2), and B = diag(l, 9, 2, 4). Note that b» = (bn,bi ,bi ,bii) 2 3 ~ N(0,B), where b& is incorporated in the nonparametric model (4.16). There are 147 viral loads after the rescaled time 0.25. We regard the viral loads out of 147 greater than the 45th percentile as missing responses. Thus, we delete 20% largest response values at the last few time points to mimic an informative missing response mechanism (i.e., missingness depends.on the values being missing). T h e structure of the data generated by the simulation study is similar to that in the example in Section 4.5. We calculate averages of the resulting estimates and their standard errors based on 103 each method, and compare the methods by comparing their biases and mean-square-errors (MSEs). Here, bias and M S E are assessed in terms of percent relative bias and percent relative root mean-squared error, as defined next. For instance, the bias for ft, the j t h component of 3, is defined as biaSj = J3j- — fij, where ft is the estimate of ft. The mean-squared error for ft is defined as MSE.,- = bias + s), 2 where Sj is the simulated standard error of ft. Then, the percent relative bias of ft is defined as bias^/lftl x 100%, and the percent relative root M S E is •^MSEj/l/3,-1 x 100%. First, to evaluate the nonparametric modelling, we study the performance of the A I C and B I C criteria in selecting the numbers of knots (p and q), since these numbers represent the degrees of smoothness of nonparametric functions (too large/small values may result in overfit/underfit). For the 100 datasets simulated from the semiparametric N L M E model (4.14) - (4.16), we find that all B I C values and 97% of A I C values lead to the model (4.11) (i.e., p = 3, q = 1). To further evaluate the A I C and B I C methods, we also generate data from models (4.8) - (4.12) with (/3 , 0 , (3 ) = (-2.0, 8.0, -3.0) (so the true number of knots 5 6 7 are known), and use the A I C and B I C methods to select the best model. The performance of the A I C and B I C methods is similar. These results show that the A I C and B I C criteria perform well in the current setting. 104 Table 4.4: Simulation results for the parameter estimates (standard errors) as well as their biases and M S E s for the estimation methods P A R A , A P P R 1 , and A P P R 2 . Parameter ai a a 02 03 04 True Value -0.4 4.0 -4.0 12.0 67.0 1.5 7.0 - - - 11.94 60.91 -0.62 6.48 0.20 (•1) 12.00 (1.5) (1.8) 66.87 (1.3) 1.49 (•2) 7.11 (•1) -1.89 (1.6) (.3) 66.25 1.59 (1.4) PARA APPR1 APPR2 -0.39 (•1) -0.39 (•1) PARA APPR1 APPR2 2 3 4.05 -3.99 (.3) . (.3) (•1) 12.00 4.06 -4.01 (.3) (•3) (•2) Bias (1.1) - - - -.48 -9.09 1.28 1.73 1.27 .17 .01 -.19 1.55 -.18 .03 -1.07 -.60 6.99 0s 06 07 (2.0) 9.65 (3.1) -1.66 (1.1) 6.90 -3.11 10.18 -1.47 (•3) (1.7) (2.6) (1.0) -141.03 -7.43 1.29 -1.36 MSE - - - 1.71 9.38 230.93 8.18 APPR1 21.33 10.02 9.87 2.48 98.67 4.12 APPR2 25.40 10.49 10.56 1.12 1.34 2.73 100.94 5.06 PARA Note: Bias = Percent bias = 100 x b\as /\P \; J J MSE = Percent V M S E = 100 x , / M S E " / \ P J \ . 105 To investigate the effect of semiparametric modelling on the estimation of the fixedeffects parameters ft - ft, we consider the two proposed methods ( A P P R 1 and A P P R 2 ) for the semiparametric N L M E model (4.14) - (4.16), along with the covariate model (4.12) and a nonignorable dropout model (Model I in (4.13)). We also use a parametric N L M E model, where \ ij 2 = ft + b^ and the other parts are the same as in the semiparametric N L M E model (4.14) - (4.16), to fit the simulated datasets. To emphasize the difference between the nonparametric and the parametric modelling for X j, 2i we consider an ideal case for the parametric N L M E model fitting, in which there is no covariate measurement errors and dropouts. Thus, we do not need the covariate measurement error model and the dropout model in this ideal case. We use S P L U S function nlmeQ to obtain parameter estimates and their default standard errors, denoted by the P A R A method. We calculate averages of the resulting estimates and their standard errors based on each method. Since A P P R 1 method sometimes offers computational problems, such as slow or non-convergence, the 100 sets of parameter estimates are obtained from 137 data sets. in Table 4.4. The simulation results are shown We find that estimates for the fixed-effects parameters ft - ft obtained by A P P R 1 and A P P R 2 are very close to their true values, and both methods perform better than the P A R A method in terms of bias and M S E criteria. These results show that the semiparametric modelling based on A I C / B I C for knots selection performs well and better than the parametric modelling in the current setting. To study the effect of missing data mechanisms, we assess the proposed methods ( A P P R 1 and A P P R 2 ) based on a nonignorable model (Model /) and an ignorable model (Model IV), and compare them with the naive method (which ignores measurement errors and missing data). To investigate the performance of the estimate of X ij, 2 we generate 100 datasets from the true models (4.8) - (4.12). In the simulations, the true values of model parameters 0 and a are shown in Table 4.5, and the other true parameter values, the missing 106 Table 4.5: Simulation results for the parameter estimates (standard errors) for the three estimation methods N A I V E , A P P R 1 , and A P P R 2 with dropout models I and IV in (4.13). Dropout Parameter Oil "2 a Model True Value -.4 4.0 -4.0 NAIVE - - - - - APPR1 -.40 4.08 -4.00 (•1) 11.99 (•1) -.40 (.3) APPR2 4.06 (•3) -4.01 (•1) 11.99 (.3) (.3) APPR1 (•1) -.40 4.06 -3.99 (•2) 12.00 (.3) APPR2 (•1) -.39 4.06 (.3) -4.00 (•1) 11.99 (•1) (.3) (.3) Model I Model IV • 3 - ft ft ft ft ft ft ft 12.0 67.0 1.5 7.0 -2.0 8.0 -3.0 11.98 66.87 (1.2) 0.92 6.98 -2.41 9.64 -2.03 (1.1) (0.3) 7.01 (1.9) -2.04 (2.9) 8.11 (1.0) -2.92 (•2) 66.93 (1.4) 1.48 (1.6) (.3) (1.9) (3.1) (1.2) 66.86 1.53 7.03 8.17 -2.87 (1.3) (1.5) (.3) -2.ii (1.8) (2.8) (1.0) 67.15 1.45 7.01 -1.81 8.97 -2.22 (1.3) (1.6) 1.42 (.3) (3-2) 9.08 (1.2) 6.96 (2-1) -1.78 -2.28 (1.5) (.3) (1.8) (2.8) (1.0) 66.80 (1.2) mechanism and the missing rate are the same as above. We calculate averages of the resulting estimates and their standard errors based on each of the three methods and each of the two dropout models. We compare the methods by comparing their biases and mean-square-errors (MSEs). Since A P P R 1 method sometimes offers computational problems, such as slow or non-convergence, the 100 sets of parameter estimates are obtained from 130 data sets. From the simulation results in Tables 4.5 and 4.6, we see that, when measurement errors and reasonable missing data mechanisms (Model /) are taken into account, the two proposed joint model methods ( A P P R 1 and A P P R 2 ) perform well in terms of both bias and M S E criteria. A P P R 1 performs better than A P P R 2 as expected, but A P P R 2 also performs reasonably well and is computationally much more efficient. When the missing data mechanism is ignored (Model IV), however, the two methods may not perform well. The naive method, which ignores measurement errors and missing data, may lead to severely biased estimates and large M S E s (e.g., the covariate effect ft can be severely under-estimated). 107 Table 4.6: Simulation results for biases and M S E s of the parameter estimates for the three estimation methods N A I V E , A P P R 1 , and A P P R 2 with dropout models I and IV in (4.13). Dropout Model Parameter Oil OL2 QJ Pi ft ft ft ft ft ft True Value -.4 4.0 -4.0 12.0 67.0 1.5 7.0 -2.0 8.0 -3.0 NAIVE - - - -.10 -.25 -38.42 -.61 -20.53 20.45 32.33 APPR1 -.78 1.95 .01 -.08 -.11 -1.59 .18 -2.21 1.40 2.62 APPR2 -.78 1.95 -.01 -.09 -.21 -1.78 .50 -5.70 2.08 4.30 APPR1 -.97 2.02 .16 -.08 .22 -3.63 .19 9.26 12.10 23.10 APPR2 1.40 2.07 -.09 -.09 -.29 -6.91 -.55 10.66 13.57 26.02 3 Bias Model I Model IV MSE Model I Model IV NAIVE - - - 1.89 2.97 148.06 6.96 168.66 68.32 70.14 APPR1 20.00 7.28 7.00 1.17 1.84 96.01 3.43 80.02 35.15 40.42 APPR2 25.00 7.65 8.00 1.25 2.23 98.69 3.74 112.64 39.06 43.21 APPR1 30.51 9.90 9.84 1.45 2.49 120.12 4.84 122.80 56.26 58.90 APPR2 36.50 10.46 10.31 1.74 2.58 131.24 5.73 146.04 62.14 66.07 Note: Bias = Percent bias = 100 x biaSj/lPj ; MSE = Percent 4.7 V M S E = 100 x V M S E J V I / 3 , - 1 . Conclusions and Discussion We have proposed two approximate likelihood methods for semiparametric N L M E models with outcome-based informative dropouts and covariate measurement errors and missing data, implemented by Monte Carlo E M algorithms combined with Gibbs sampler. The first method is more accurate than the second method but it may be computationally very intensive and sometimes may offer computational difficulties such as slow or non-convergence, especially when the dimensions of random effects are not small. T h e second method is computationally much more efficient, but it is less accurate than the first method. The second method may be used as a reasonable alternative when the first method has convergence problems or it may be used to provide excellent parameter starting values for the first method. Simulation studies indicate that the proposed methods, which incorporate measurement 108 errors and dropout mechanisms, produce satisfactory results, but methods ignoring measurement errors and/or ignoring dropout mechanisms may perform poorly. Moreover, the A I C and B I C criteria perform well in the current setting. ^ We have assumed that the dropout models depend on the observed or unobserved responses and covariates. Alternatively, we may consider dropout models which share the random-effects parameters in response and covariate processes. Such models may be appropriate if the dropout mechanism is related to the true but unobservable response/covariate values or summaries of response and covariate processes such as unobservable true viral decay rates. The methods in this chapter may be extended to such models. Finally, for Monte Carlo E M algorithms, Booth and Hobert (1999) proposed a nice automated method for choosing the number of Monte Carlo samples, which can be extended in our case as well. 109 Chapter 5 Semiparametric N L M E M o d e l with Random-effect-based Informative Dropouts and Covariate Measurement Errors 5.1 Introduction In this chapter, we develop two likelihood methods to simultaneously address covariate measurement errors and random-effect-based models. informative dropouts in semiparametric N L M E The major difference in the models in this chapter and the models in Chapter 4 is the difference in the assumed missing response (or dropout) models. The response and covariate models remain the same. In Section 5.2, we discuss the models for this problem. We obtain approximate M L E s of all model parameters, using a Monte Carlo E M ( M C E M ) algorithm along with Gibbs sampler methods, in Sections 5.3. 110 To avoid potential compu- tational problems in the method discussed in Section 5.3, we also propose an alternative approximate method by using a first-order Laplace approximation to the log-likelihood function in Section 5.4. Some asymptotic properties of the resulting estimates are also discussed. The two proposed methods are illustrated in a H I V dataset and are evaluated via simulation in Section 5.5. We conclude this chapter in Section 5.6 with some discussion. Asymptotic properties presented in Section 5-4 are proved in Section 5.7. 5.2 Missing Response Models The dropout is random-effect-based informative if the missing probability of the current response depends only on the underlying unobservable random effects rj) = /(rj|aj, bj; and aj bj, i.e., /(rj|yj, Z j, rj). In the presence of random-effect-based informative dropouts in the N L M E model (2.6) and (2.7) with the covariate process (2.8), we can again write v* = (ymis,i, as before. Let y<jbs,i) 2/ij's are again intermittent, n be the number of components in i obSt gobs,i) with gohS)i and respectively. Let r j = (rn,..., i such that imply that = 1 if r i j + i = 1. 0 s Here, the missing i.e., we allow dropout individuals to possibly return to the study at a later time. For the vector g, = (gn,... gi = (grms.i, y b ,i- ri ) T ni g m i S : i ,gi ), ni where ^ are defined in (3.6), we write being the conditional expectation of y o b S i i and y i ,i, m S be a vector of missing response indicators for individual is missing and 0 otherwise. Note that We have the observed data {(y 0 6s,i, Zj, = 1 does not necessarily i"i), i = 1,... ,n}. To allow for an informative missing mechanism in the response, we need to assume a distribution for the missing response indicator r j . For reasons discussed in Sections 4.2 and 4.7, in this chapter we consider the random-effect-based informative dropout mechanism /(rj|aj, bj; rj) for i y where rj are the unknown nuisance parameters. For such a missing data mechanism, the missingness of y - share the random effects a j and b j in the response id 111 and covariate models, suggesting that the dropout may be related to the true but unobservable response/covariate values or summaries of individual-specific response and covariate trajectories such as unobservable true viral decay rates. For example, such missing data models may be appropriate if a patient is more likely to dropout early because his true (but unobservable) viral decay rate is slower than other patients. In this chapter, we focus on the random-effect-based informative missing mechanism / ( r j | a j , bf 77). Such models are related to the shared-parameter models in the literature (e.g., W u and Carroll, 1988; Little, 1995; Ten Have et al., 1998). Although the relationship of the missingness with the random effects may be complex, a simple logistic regression model may provide a reasonable approximation. We will assume that r^'s are independent, i.e., fin]*, b<; 77) = IJ[P(r = IK b 77)] = log — y i5 = I K b 7 7 ) p [ l - P(rij » )] - «, 1 i; (5.1) r 7 with logit[P(rij = I K bf l — r\Tij where 77 = (770, r/J, r] ) T 2 r = 770 + r) —— + 77 b;, 1 — l | a i , Oi, 2 . rj) are the unknown nuisance parameters. For example, we may assume that the missingness of response is related to the first decay rate, say XUJ = ft + b i2 in Section 4.5., i.e., logit[F(ry = I K h i , 17)] = Vo + Vi UJ = V* + rjl(p = (Vo + V*iP2) + = r]o + x 2 + b) i2 rilb i2 ]ibi . r 2 Note that we should avoid building too complicated a dropout model since the model parameters may become non-identifiable. As the assumed dropout models are not testable 112 based on the observed data, it is important to carry out sensitivity analysis based on various dropout models. The random-effect-based informative dropout models in this chapter and the outcome-based informative dropout models in Chapter 4 can all be used for sensitivity analysis. 5.3 5.3.1 A Monte Carlo E M Method T h e Likelihood Function We consider likelihood inference for semiparametric N L M E models with random-effect-based informative dropouts and measurement errors and missing data in time-varying covariates, based on approximate models (2.6) - (2.8). Let 6 = (ct, 0, 5 , R, A,B, rjj) be the collection 2 of all unknown model parameters. We assume that the parameters ct, 0, 5 , R, A, B, and 2 rj are distinct. The approximate log-likelihood for the observed data {(y bs,i, 0 i> z r i), i = 1 , . . . , n} can be written as 1(d) = n ) fz(zi\&i;ct,R) x/(b ;B)/(r |a i n £ l o g i=l II i by rf) dy i Z dfy dbi miSti /V(y 6s,i| i> i a 0 b a , 0, 5 ) f ( i\*i\ct, 2 h x / ( b ; ; B) / ( r ; | a ; , b^; rj) da^dbi 113 f(sa;A) z z R) / ( a * ; A) (5.2) where a*, bj; ex, (3, 5 ) /y(y b ,i|zi, 0 2 s = n"=i'' fy(yobs,ij\ ij, z a * > » ; <*> b P> = nST"' (27r 5 )- /2exp{-[y , 2 /z(zi|aij a, i?) obs = n r = i fzfak\ai\ = Ilfcii |27r/?| x(z /(a*; A) f(h ; t B) _1/2 ij geb Ai «, exp{-(z ifc - Uj'fe a- v ifc a;) r i?" 1 - v a;)/2}, - ua ik iA; . ]*/2P}, 1 ( ifc =\27rA\-^ ex {- jA-^/2}, 2 P = a |27r B|- /2 xp{-bfB- b,/2}, 1 1 J e and y 6s,ij is the observed yij. Note that unlike 1(6) in Section 4.3, the missing responses 0 y i ,i m a r e S integrated out in (5.2). The observed-data log-likelihood function 1(6) generally does not have a closed-form expression since the functions in the integral can be nonlinear in the random effects &i and bj. So we use a Monte C a r l o - E M ( M C E M ) algorithm to find the approximate M L E s of parameters 6. B y treating the unobservable random effects a; and b, as "missing" data, we have "complete data" {(y bs,i, i, *i, i, z bj), i = l , . . . , n } . a 0 T h e complete-data log- likelihood function for all individuals can be expressed as n n d) l G = YI i=l - Z c°(0) = XX log fy(yobs,i\ i, z a(, hi, ex, 0, 5 ) + log /(zi|ai; 2 z + log/(a ; 4)+log/(b,; B) + log/(r |a , h l where if ex, R) i=l J J is the complete-data log-likelihood for individual i. 114 t i %] r,)} (5.3) 5.3.2 A M C E M Algorithm Let be the parameter estimates from the t-th E M iteration. T h e E-step for individual i at the (t + l)th E M iteration can be expressed as Qi(0|0«) = E{lf{9)\y , , r 0 « ) = J J J [log /y(y ,i|zj, a*, b a , 3, 5 ) 2 obs u Z i i5 obs + log /z(zi|ai; a, i?) + log /(a i; i; A) + log /(bi; 5 ) + log /(ri|ai, bi; 17) x /(ai, bi|y ,i, Z i , n; 0 ) da dbi (t) o6s = / W ( a , /3, <5) + / « ( « , 2 J?) + 4V) t + + iffa). Since the expression (5.4) is an expectation with respect to /(ai, bj|y0j,Sjj, (5.4) Zj, i y 0^), it may be evaluated using the M C E M algorithm. Specifically, we may use the Gibbs sampler to generate samples from [ai, b ^ y ^ i , i y 0^] by iteratively sampling from the full conditionals Zj, [ai|y 6 ,i, Z j , r i , bi; 0W] and [bi|v ,i, z rj, a*; 0W] as follows. 0 s ofcs i; /(ai|y ,i, Z i , rj, b»; 0<*)) a f(y , o6s obSti a ^ , r , bij 0W) t =/(y fc ,i|zi, ri, ai,bi; 0^') •/(ai|zi, ri, bi; 0W) 0 s oc /(y 6s,i|zi, ri, ai,bi; 0W) • /(ri, a^z^ bij 0 ) (t) 0 = /(y bs,i|zi, ai,bi; 0W) • /(ri|zi, ai,bi; 6 ) • / ( a ^ z * , ^ 0 ) {t) (t) 0 = /(YcUzi, ai, b i; 00>) • /(r-i|ai, b i5 9®) • / ( a ^ ; 0 « ) oc /(y ,i|zi, ai,bi; 0(*>) • / ( r ^ a ^ h i ; 0W) • / ( z obs ai; 0(*>) i ? = /(ai; 0(*)) • / ( y ^ l z i , ai, b ; 0W) . / ( z ^ ; 0(0) . / ( . | a , , b,; 0W) t r /(bi|y ,i, Z i , ri, ai; 0(*)) oc f{y b ,i, b^Zj, ri, ai; 0(*>) obs 0 s = /(y b5,i|zi, ri,.ai,bi; 0 ) • /(bi|zi, r aj; 0 ) (i) W 0 i; OC /(yofcs,i| i, T i , ai,bij 0W) • / ( z J*i, b i | Z i , a i; 0W) = /(y h5,i|zi, ai,bi; 0W) • / ( r i | , a,,^; 0<*>) • /(b^z^a*; 0 ) (t) Zi 0 = /(b i5 0 « ) • /(y ,i|zi, a,, b,; 0(«>) • /(ri|ai, b 0 « ) . ofcs 115 i; Monte Carlo samples from each of the full conditionals can be generated using rejection sampling methods.. Alternatively, integral (5.4) may be evaluated using the importance sampling method (see Section 3.3.3). We will briefly discuss the sampling methods in the next section. Note that, unlike the M C E M method in Section 4.3.2, here we do not need to sample Ymis,i, which reduces the computational burden. For individual i, let { ( a f , b f ) , . . . , ( a f \ k t generated from [a*, bi\y o b S i i Zj, , ry, b f ' ) } denote a random sample of size Note that each ( a f , b f ) depends on the E M 0^}. iteration number t, which is suppressed throughout. The E-step at the ( £ + l ) t h E M iteration can then be expressed as Q(6\0M) = ± Qi{9\9®) i=i « ±h i=i I E = E E 17 log fy(yo ,iK i=i fc=i y ,u lf(0; obs z fe=i iy a f , b f ) } h J a f , b f ; a , 3, 5 ) 2 bs + E E i log / z f e l a f ; a , R) + £ £ £ log / ( a f ; A) i=lfc=l + E E i=lfc=l i log / ( b f ; B) + £ E £ log / W a f , b f ; r,) i=lfc=l t=lfc=l = Q ( 1 ) ( « , /3, <5 |0«) + Q< )(a, J?|0W) + Q ( ) ( A | 0 « ) + 2 The M-step then maximizes 2 Q(0\0^) 3 + Q^{B\0^) to produce an updated estimate is like a complete-data maximization. Since the parameters in are distinct, the M-step can be implemented by maximizing t+l 2 2 so it 0^ \ + Q < \ Q^, + Q^ \ Q^{rj\0W). Q^ \ 3 Q< ), and 4 and separately using standard optimization procedures for the corresponding complete-data models. As in Section 3.3.2, we use the approximate formula suggested by McLachlan and Krishnan (1997) to obtain the variance-covariance matrix of the approximate M L E 0: Let sf = dlf /80, where if is the complete-data log-likelihood for individual i. 116 Then an approximate formula for the variance-covariance matrix of 9 is n Z i , rf, 9) E{sf\y ^, Cov(0) = [J2 ( c \yobs,i, E s ] obs z it r 9) i ; where the expectations can be approximated by Monte Carlo empirical means, as in (5.6). In summary, the foregoing M C E M algorithm can be implemented as follows. Step 1. Obtain an initial estimate of ( a , 0, S , R, A, B) = ( a ' ' , 2 B^) and an initial value of (aj, bj) = (af\ 0 /3 ( 0 ) , 6 ^°\ Bl°\ A < ° \ 2 b- ^) based on a naive method; then we ob0 tain an initial estimate of 77 = rj^ based on the dropout model with the random effects (a b) = (aS°\bW). i> i Step 2. A t the t-th iteration, obtain Monte Carlo samples of the random effects (aj, bj) using the Gibbs sampler along with rejection sampling methods, or using importance sampling methods to approximate the conditional expectation in the E-step. Step 3. Obtain updated estimates #( ) using standard complete-data optimization t+1 procedures. Step 4. Iterate between Step 2 and Step 3 until convergence. 5.3.3 Sampling Methods Gibbs Sampler As in Section 3.3.3, we can again use the Gibbs sampler to draw the desired samples as follows. {a\ \ k Set initial values (&f\ bf^). bf^), we can obtain (a[ \ k+1 b\ ^) k+l Suppose that the current generated values are as follows. Step 1. Draw a sample for the "missing" random effects a f ^ from /(aj|y h j, z 0 S] i: Step 2. Draw a sample for the "missing" random effects b\ ^ f r o m / ( b j | y t j , z k+1 0 Si iy i y b^; 9^). Ti, a| ^; 9^). After a sufficiently large burn-in of r iterations, the sampled values will achieve a steady state. Then, {(a| \ bf^), k = r + 1 , . . . , r + k } can be treated as samples from the fe t 117 fe+1 multidimensional density function /(aj, bj|y (, j, Zj, 0 S ) large gap r' (say r' = 10), we can treat { ( a f , samples from (af, /(aj, bj|y j, j, Zj,rj; 0 S ) ry, 0 ^ ) . A n d , if we choose a sufficiently b f ) , k = r + r',r + 2r',... } as independent 0®). There are several ways to get the initial values b f ) . A simple way is to obtain ( a f , b f ) based on a naive method. Multivariate Rejection Algorithm Sampling from the two full conditionals can be accomplished by the multivariate rejection sampling method. For example, we consider sampling from 9®) in (5.5). Let /*(aj) = /(y o b s ,j|zj, aj, / ( a j | y ( , j , Zj, b,; 0 W ) / ( z | a ; 0 ( « > ) / ( r ^ , i sup{/*(u)}. We assume <; < oo. A random sample from i /(aj|y o h s j, Z i , iy 0 b i ; bjj S 9®) ry bj; and ? = 0^) can then u be obtained as follows Step 1. Sample a* from / ( a j ; 0^), and independently, sample w from the uniform (0, 1) distribution. Step 2. If w < /*(a*)/<r, then accept a*, otherwise, go back to step 1. Samples from /(bj|y (, j, 0 S i Z j , r-j, a j ; 0^) can be obtained in a similar way. Therefore, the Gibbs sampler in conjunction with the multivariate rejection sampling can be used to obtain samples from 5.4 5.4.1 [aj, b j | y 6 0 S i j , Zj, A n Alternative Approximate Method The Hierarchical Likelihood Method The approximate maximum likelihood inference using a Monte Carlo E M method in the previous section may be computationally intensive and sometimes may offer potential computational problems such as slow or non-convergence, especially when the dimensions of the random effects a j and b j are not small (see the detailed discussion in Section 3.4.1). To 118 overcome these difficulties, in this section we consider an alternative method called the hierarchical likelihood method (Lee and Nelder, 1996, 2001) for approximate inference. T h e hierarchical likelihood method avoids Monte Carlo approximation and thus is always computationally feasible, and it can also used to obtain good parameter starting values for the M C E M method in the previous section. Let £ denote general "nuisance parameters". Lee and Nelder (1996) considered a function pi (I) defined by £4' where D(l, £ ) = -d l/d£ , 2 and £ solves dl/d£ 2 = 0. Following Lee and Lelder (1996), the complete-data log-likelihood function l (&) in (5.3) may also be called the hierarchical logc likelihood function since it combines the two stages of mixed-effects models. Let UJ = {u>; = (a^, bi), i = 1,... ,n} be the collection of random effects. The function p&(l (Q)) c can be written as i o™- — v _ c L 1=1 Z ^~ " ~ 1 1 , . I Z7T IJ ' (5-7) U>i=UJi 1=1 We can show that, for unobservable UJ, the use of the function PQj(l {9)) is equivalent to c integrating UJ out using the first-order Laplace approximation. Thus, P(j(l {Q)) c is the first- order Laplace approximation to the marginal log-likelihood 1(6) in (5.2) using the hierarchical log-likelihood function l (d). c In fact, let N = n b i + 0 be the number of the response and covariate observations S: for individual i and let b be the dimension of w . Assume that Ni — O(N) uniformly for ; i = 1,... , n, where'TV = mini TV*. Taking k = N, kp(x) = lf(6), 7 = 6, and x = a?* in the following Laplace approximation Id p(x) I dx 2 e fc (x) P d x = ( 2 7 r //c)^ 2 2 119 + 0(k~ ), l where x is a 7-dimensional parameter vector and x maximizes kp(x), we can approximate the ith individual's contribution U(0) to the overall log-likelihood 1(0) as h(0) j = log J e * du = log = log l ){e) i (^) b/2 = \og J |D(p( W i e duji Nip{U}i) )i «i) \^r e ^ /2 N + O (iVr^)| p -1/2 6/2 27rY ^ ^+O (TV- ) ) 1 P Ni) -1/2 = 4°(0) log U > = C J = log[ex {p^(^)(0))} + O ( i V - ) ] 1 P = Pc2,;(^ W) + O W - ) . i ) (5.8) 1 in which the last step holds by Lemma 3.2 in Section 3.7. Hence, the log-likelihood 1(0) can be approximated as 1= = 1 Pc (U0)) + Y2o(N- ) i J 1=1 = As (5.9) Po;(a0))+nO(iV- ). 1 N — miriiNi grows faster than n, the function p&(l (Q)) c approaches the marginal log- likelihood function 1(0), and hence an estimate of 0, which maximizes pjj(l (0)), also maxi- mizes 1(0). This lead to the following algorithm to obtain an approximate M L E of 0 called c OHL'- Step 1. Obtain an initial estimate of (ct, A(°\ 3, 5 , R, 2 A, B, w) =' ( a ' ' , 0 3 , {0) 5 -°\ 2( Rp\ B(°\ U/°)) based on a naive method, and an initial estimate of r) = rj^ based on the dropout model with the random effects u / ° ) . 120 Step 2. Given the current parameter estimates t U +V> update random-effects estimates with respect to a;*, i = 1 , . . . , n . by maximizing lc\0^) Step 3. Given the random-effects estimates u>f \ update the parameter estimates +1 #( ) by maximizing p ,(t+i)(Z (^)) with respect to 0. t+1 a c Step 4. Iterate between Step 2 and Step 3 until convergence. We can use Fisher information to obtain the following approximate formula for the variance-covariance matrix of the approximate M L E OHL C o v ( ^ ) =[[ - ^80dO ('f 8 v T ) r "' 0=0 HL Many optimization procedures evaluate the matrix [—d p^j(l (0))/dOd6 ] 2 T c 1T1 at 6 = BHL (called Hessian matrix), from which it is easy to obtain Cov(0 HL)- 5.4.2 Asymptotic Properties Under suitable regularity conditions on 1(0), g(-), and d(-), we extend Vonesh (1996) to show in Section 5.7 that (OHL — 00) = where 6 0 O p max | n 2, ^min N^j | is the true value of 6. Thus, the approximate M L E OHL will be consistent only as both n and (min; Ni) —* 0 0 . Intuitively, the theory while the (mini Ni)^ 1 term comes from standard asymptotic term comes from the Laplace approximation. Note that the accuracy of the first-order Laplace approximation to the log-likelihood function is 0 { n / ( m i n ; Ni)}, or, equivalently, o(l) provided (min;TV;) grows faster than n. In this case, (OHL ~ Oo) = O (n~^) with OHL being asymptotically equivalent to the "exact" p M L E . This reflects the fact that, as the accuracy of the Laplace approximation to the loglikelihood increases, the approximate M L E OHL will behave more and more like the "exact" 121 M L E . However, we can decrease the growth rate of (min^ N) of OHL- In particular, as (min, N ) t of OHL will still be O {rT^) v for the asymptotic normality grows at a rate greater than n%, the rate of consistency and the resulting estimate will be asymptotically equivalent to the "exact" M L E in the sense that it has the same asymptotic distribution as the "exact" M L E (see Section 5.7). We correct the claim by Vonesh (1996) that, as (minj/Vj) grows at a rate greater than nh but less than or equal to n, the rate of consistency will still be O (n~%) p but the resulting estimate will no longer be asymptotically equivalent to the "exact" M L E . The proofs of the above arguments are given in Section 5.7. 5.5 5.5.1 Example and Simulation Example We use the same H I V dataset in Section 4.5 to illustrate our proposed methods in this chapter, but we use the random-effect-based the outcome-based informative informative dropout model here rather than dropout model in Section 4.5. models may be used for sensitivity analysis. These informative dropout We use the commonly-used naive method, which ignores measurement errors and missing data, for parameter starting values in the two proposed methods. See the data description in Section 4.5.1. The Response and the Covariate Models We consider the same H I V viral dynamic and C D 4 measurement error models in Section 4.5.2. For completeness, we describe these models again here. 122 For the viral load process, we consider the following model Vij = \og {P - ^+ log(Pii) = Pi + bu, log(P ) = 2t P X W ft LIE X Uj + hi, e- ^) x 2 i = ft + ft4 + , (5.10) + 6 , (5.11) e i j 2i Mij = w(Uj) + hi(tij), (5.12) where y,j is the logi -transform of the viral load measurement for patient i at time t^, P^-and 0 P j are baseline values, A^- and A ^ are viral decay rates, z*j is the true (but unobservable) 2 2 CD4 count, and iw(£y) and h^tij) are nonparametric fixed- and random-effects functions (see Section 2.1). In order to reduce the number of nuisance parameters, we assume that the variance-covariance matrices A and B of the random effects are both diagonal matrices. To avoid very small (large) estimates, which may be unstable, we standardize the C D 4 counts • and rescale the original time t (in days) so that the new time scale is between 0 and 1. As discussed in Section 2.1, we employ the linear combinations of natural cubic splines with percentile-based knots to approximate w(t) and hi(t). Following W u and Zhang (2002), we take the same natural cubic splines with q < p in order to decrease the dimension of random effects. A I C and B I C criteria are used to determine the values of p and q, which leads to the following model for A r , in (5.12) (see Table 4.1), with p = 3 and q — 1, 2 A 2 i j « ft + ft MUj) + ft MUj) + hi- (5.13) For the C D 4 process, we consider empirical polynomial L M E models, and choose the best fitted model based again on A I C / B I C values for each possible model. This is done based on the observed C D 4 values, and is done separately from the response model for simplicity. The following quadratic polynomial L M E model best fits the C D 4 trajectory (see Table 4.2): CD4*/ = (ai + ai) + ( a + a ) u 2 2 123 a + ( a + a ) u\ + e 3 3 ih (5-14) where uu is the time and a (an, aj2, a^) T = (ai, a , a^) T 2 are the population parameters and a; = are the random effects. Random-effect-based Informative Dropout Models In this study, dropout patients appear to have slower viral decay, compared with the remaining patients. Thus, dropouts are likely to be informative or nonignorable. So we need to assume a model for the dropout process in order to make valid likelihood inference. Note that we should avoid building too complicated a dropout model since a complicated model may become non-identifiable. Subject-area knowledge and preliminary checks suggest that dropout may be related to the random-effects components an, a , i2 and b , so we consider i2 the following dropout model logit[P(rjj = l | a i , b^ T?)] =r] 1 + rj an + ri a 2 3 i2 + 774^2- (5.15) We assume independence of the r^'s to simplify the model. The dropout model (5.15) along with the dropout models in Section 4.5 can be used for sensitivity analysis. Estimation M e t h o d s and C o m p u t a t i o n Issues We estimate the model parameters using the two proposed "joint" model methods discussed in Sections 5.3 and 5.4. We denote the method in Section 5.3 by A P and the method in Section 5.4 by H L . The two proposed joint model methods need starting values for model parameters since they are implemented by a M C E M algorithm or by an iterative Laplace approximation to the log-likelihood function. We use the parameter estimates obtained by the naive method, which ignores measurement errors and missing data, as parameter starting values for the two joint model methods. For the naive method, we use the S P L U S function nlme() and lme() to obtain param- eter estimates and their default standard errors. For the proposed A P method, we assess the 124 convergence of the Gibbs sampler by examining time series plots and sample autocorrelation function plots. For example, Figures 5.1 and 5.2 show the time series and the autocorrelation function plots for b associated with patient 14. From these figures, we notice that the Gibbs 2 sampler converges quickly and the autocorrelations between successive generated samples are negligible after lag 15. Time series and autocorrelation function plots for other random effects show similar behaviors. Therefore, we discard the first 500 samples as the burn-in, and then we take one sample from every 20 simulated samples to obtain independent samples (see sampling methods in Section 5.3.3). We start with k 0 = 500 Monte Carlo samples, and increase the Monte-Carlo sample size as the number t of E M iteration increases: k +i = k + k /c t t t with c — 4 (Booth and Hobert, 1999). Convergence criterion for these two joint model methods in our examples is that the relative change in the parameter estimates from successively iterations is smaller than 0.05. Convergence of the algorithms are considered to be achieved when the maximum percentage change of all estimates is less than 5% in two consecutive iterations. We use the multivariate rejection sampling method for the A P method. Other sampling methods may also be applied and may be even more efficient. O n a S U N Sparc work-station, the A P method took about 135 minutes to converge while the H L method took about 150 minutes to converge. The H L method took more time than the A P method mainly because all model parameters appear in the nonlinear function p<jj(l (9)) and no sepc aration of the parameters is possible. However, the H L method is always computationally feasible while the A P method sometimes may have convergence problems. Moreover, the H L method can be used to obtain good parameter starting values for the A P method. Analysis Results We estimate the model parameters using the two proposed joint model methods A P 125 Series : b2 Figure 5.2: The autocorrelation function plot for b associated with patient 14. 2 126 Table 5.1: Parameter estimates (standard errors) for the models in the example. c Method ai a 2 AP -.43 HL (•1) (-4) ft «3 ft ft ft ft ft 11.72 66.65 1.53 6.93 -1.86 7.49 -2.36 (.2) (.5)' (7.8) 4.32 -3.93 (4.3) (4.6) (•2) 11.64 66.44 1.58 (.7) -.41 6.89 -1.92 (.6) (.5) (•1) (3.4) (2.8) (.6) (4.9) (4.8) S ft 4.29 -3.90 R .36 .51 (2.7) 7.46 -2.29 (7.5) (2.7) .35 .50 Note: the estimated covariance matrices are A = diag(.62,4.70, 4.41) for AP, A = diag(.51, 4.74, 4.53) for HL. B = diag(1.45,91.62,1.94,20.16) for AP, and B = diag(1.42,91.91,1.58,19.96) for HL. and H L . We use the parameter estimates obtained by the naive method as the parameter starting values for the A P and the H L methods. We also tried several other parameter starting values for the proposed joint model methods. Different parameter starting values lead to roughly same parameter estimates in both the A P and the H L methods. Table 5.1 presents the resulting parameter estimates and standard errors based on the random-effect-based informative model (5.15). We find that the two joint model methods provide similar parameter estimates. Comparing the random-effect-based informative dropout model with the outcome-based informative dropout model I in (4.13), we find that the resulting estimates are similar. This indicates again that the estimation may be robust against the nonignorable dropout models. Although some estimates in Table 5.1 are not statistically significant, the values of the estimates may still provide useful information about viral load and C D 4 trajectories. The estimates of parameters rj based on the A P method (or the H L method) are -2.32, .31, -.05,, and - . 0 7 (or -2.4.1, .27, -.04, and -.08) respectively, with all p-values less than .00001, which also indicates that the dropouts may be nonignorable (or informative) since the missingness may depend on the unobservable random effects. The estimates of rj , n , and 77 indicate that dropout patients seems to have higher baseline 2 3 4 CD4 count, decrease in C D 4 count faster over time, and have slower first decay rate than the remaining patients. 127 5.5.2 The Simulation Study We evaluate the proposed methods ( A P and H L ) for the random-effect-based informative model (5.15) via simulation. The response and covariate models, the random-effect-based informative dropout model, and the measurement time points used in the simulation are all the same as those in the example in the previous section (i.e., (5.10) - (5.14)). We choose appropriate values of rj to mimic certain missing rate, and we use the S P L U S function sampleQ to generate binary data r y based on the values of parameters 77 and the random effects 3n and by If ?y = 1, then y - is deleted, and if ry = 0, id is considered to be observed. In the simulations, the true values of model parameters 3 and oc are shown in Table 5.2, and the other true parameter values are 5 = .2, R = .4, A = diag(.5, 3, 2), and B = diag(l, 9, 2, 4). We set 77 = (—1.4,0.1, —0.1, —0.1) to get an average missing rate of r 20%. We always regard the first two responses on each individual as observed, i.e., every individual has at least two observed responses. We simulated 100 data sets and calculated averages of the resulting estimates and their standard errors based on each of the two methods. We compare the methods by comparing their biases and mean-square-errors (MSEs). Here, bias and M S E are assessed in terms of percent relative bias and percent relative root mean-squared error, as defined in Section 4.6. Since A P method sometimes offers computational problems, such as slow or non-convergence, the 100 sets of parameter estimates are obtained from 128 data sets. From the simulation results in Tables 5.2 and 5.3, we see that the two proposed joint model methods ( A P and H L ) perform well. The A P method performs better than the H L method in the sense that the A P yields smaller relative M S E and bias than the H L method. The H L method also performs reasonably well and it is always computationally feasible. Therefore, the H L method may be a good alternative method when the A P method exhibits computational difficulties, and the H L method can also be used to obtain good parameter 128 Table 5.2: Simulation results for the parameter estimates (standard errors) for the estimation methods A P and H L . Parameter True Value Oil a 2 «3 ft A ft ft ft ft ft -.4 4.0 -4.0 12.0 67.0 1.5 7.0 -2.0 8.0 -3.0 AP -.40 4.01 -3.98 11.99 66.95 1.49 6.99 -2.01 8.04 -2.97 (•2) -.38 (.5) (.6) (1.4) (1.6) (.3) (2.1) (3.2) (1.1) 3.96 -3.93 (•1) (•4) (.5) HL (•1) 11.99 67.10 1.56 6.95 -2.08 8.09 -2.89 (•1) (1.0) (1.3) (.3) (1.8) (2.9) (1.0) Table 5.3: Simulation results for bias and M S E of the parameter estimates for the estimation methods A P and H L . Parameter True Value a\ -A <%2 4.0 a 3 -4.0 ft ft ft ft ft ft 67.0 1.5 7.0 -2.0 8.0 -3.0 -.1 .22 -.44 -.13 -.26 -.53 -3.78 .46 .62 .25 1.98 4.24 71.84 28.95 22.26 98.75 37.30 31.71 ft- 12.0 Bias AP HL 1.59 .15 2.17 -1.02 .95 1.31 -.07 -.08 1.08 .93 MSE AP 7.93 6.46 7.39 1.66 1.74 66.69 HL 9.07 8.33 . 9.31 1.69 2.43 93.67 Note: Bias = Percent bias = 100 x biaSj/|/?j|; MSE = Percent ^MSE = 100 129 x ^/MSE^/I^I. starting values for the A P method. 5.6 Discussion We have proposed two approximate likelihood methods for semiparametric N L M E models with random-effect-based informative dropouts and covariate measurement errors and missing data, implemented by a Monte Carlo E M algorithm combined with Gibbs sampler or by an iterative Laplace approximation to the log-likelihood function respectively. The first method may be more accurate than the second method but it sometimes may offer computational difficulties such as slow or non-convergence, especially when the dimensions of random effects are not small. The second method is always computationally feasible but may be less accurate than the first method. The second method may be used as a reasonable alternative when the first method has convergence problems or it may be used to provide excellent parameter starting values for the first method. Simulation studies indicate that both methods produce satisfactory results. Although it does not need to generate Monte Carlo samples for random effects, the second method may not be computationally more efficient than the first method. A possible reason is that there are too many model parameters appear in the nonlinear functions in optimization procedures and no separation of the parameters is possible. A possible solution is to use Bayesian method to address this problem. Specifically, we may assume the known hyperprior distributions for the model parameters. In many longitudinal data sets, dropouts, censoring, measurement errors, and missing covariates are all present simultaneously. To our knowledge, there are almost no unified methods in the literature which address these problems simultaneously. W u (2002) pro- posed a unified method to address censoring and measurement errors simultaneously and 130 showed that the proposed method offered significant improvement over existing methods currently in use. T h e ideas in W u (2002) and in this chapter can be extended to address dropouts, censoring, measurement errors, and missing covariate simultaneously in semiparametric/nonparametric N L M E models. 5.7 Appendix: Asymptotic Properties of the Approximate M L E #HL in Section 5 . 4 5.7.1 Consistency We will show that the following result (OHL — Oo) = O m a x j n s^m.inA^ V j holds under the usual regularity conditions on 1(0), g(-) and d(-), where 6 0 is the true value of 0. Proof. Let u); maximize lc\0) Suppose that Ni = O(N) with respect to u>; for fixed 0. Denote Ni = n b i + rrii. 0 uniformly for i = 1 , . . . , n, where ./V = mim Ni. s< Based on (5.8) in Section 5.4.3, the zth individual's contribution k(0) to the overall log-likelihood may be approximated as m = POMW) + ^r ) = P& {W(o)).+ O(N- ). . 1 1 T Hence, the log-likelihood 1(0) can be .written as (see (5.9)) l(0) = l*(0) + O{nN~ }, 1 where l*(0) = p^(l (0)) c = E H i P u ^ W ) - L e t *( ) u e . = dl*(0)/d0 (5.16) and let 0 HL be the approximate maximum likelihood estimate satisfying U*(0HL) — 0- Under suitable regularity 131 conditions on 1(0) and assuming QHL is an interior point in a neighborhood containing 0 , 0 Taylor's theorem tells us that there exists a vector 0 on the line segment between 0 0 and OHL such that n- u(0 ) = n- u(0 ) l where u(0) — dl(0)/dO + - M(0)(0 x HL and M(0) - l Q n HL = 8 l(0)/d0d0 2 0 ), (5.17) 0 are the first and second order derivatives T of the true but intractable marginal log-likelihood 1(0). The. first term n u ( 0 ) on the right - 1 o of (5.17) is iu(« , = I«2> 0=0 0 n n = ly dk(0) 0=00 80 O oO Given sufficient regularity conditions on 1(0), we know from the Lindeberg Central Limit Theorem that (5.18) ^u(0 )-^(0,7(00)), o where the matrix 7(0) = l i m ^ o o ^ £ h(0) and h(0) is the information matrix for individual " i=l i. That implies - ^ u ( 0 ) = O (l) o The matrix n~ M(0) l O (n- ' ). 1 2 o p on the right of (5.17) is -M(0) = n ' 1 l v ^u(0 ) = p ndOdO T 0=0 A d\(0) n Z-u ^ dOdO 0=0 1 n (5.19) -1(0), by the Law of Large Numbers. Since 1(0) is positive definite for all 0, the probability that the matrix n _ 1 M ( 0 ) is invertible tends to 1. B y writing n' M(0) l = —1(6) + o (l) p and applying Lemma 3.2 in Section 3.7 to the inverse function, we have [n- M(0)Y l l = -IijO)- 1 132 + o (\). p (5.20) Given suitable regularity conditions ong(-) and d(-), for example that third order derivatives exist and are continuous in an open neighborhood about 0 , application of Lemma 3.2 in O Section 3.7 to the partial derivative function in the expression (5.16) leads to n- u(6 ) 1 HL From (5.17), we have -1 1 H L - 6o) = n - y f l i / L ) - n^u^o) 0 ) = [ - M(d)]- [n' vi{e ) (OHL - l 0 l - l n (5.21). 1 HL Note that u*(0HL) = 0 and n u ( 0 o ) = Op(n~i). n- M(0)(0 + 0(N- ). = n- u*{d ) 1 HL n " 1 ^ ) ] (OHL - Oo) = ( - / ( 0 ) " + o ( l ) ) [ n - u ( 0 ) - n " ^ ) ] 1 1 1 p (OHL - Oo) = (-1(G)- 1 0 ) = -I(O)- (OHL - H i + o {l)){n- u\0 L) l O maxjra^, 1 0 + H p AT }'] 1 p m a x j n .v^nrin/V^ (by (5.20)) O ^ " ) + O (n^)\ + o 1 p (by (5.21)) m a x j n - ^ , iV >}] _1 p }J . Finally, let OML denote the "exact" maximum likelihood estimate with U(0ML) Let min; N = 0(n ) T = 0- for r > 1 so that the accuracy of the Laplace approximation to the marginal log-likelihood is approximately 0(n ~ ) l = o(l) from the formula (5.16). T Then, under the same regularity conditions as before, by multiplying n on the both sides of the equation (5.21) and noting that U(0ML) = 0, we have U(0HL) = U*(0 L) + O (1) = O + O (1) = U(0 ) + O (1). H Thus U(9HL) ~ (0ML) u = o (l) p p p ml p and hence OHL is asymptotically equivalent to the "exact" maximum likelihood estimate OML- 5.7.2 d Asymptotic Normality of #HL In this section, we will show that as N grows at a rate greater than n%, i.e., N = 0(n ) T for r > | , the approximate M L E OHL and the "exact" M L E OML have the same asymptotic 133 distribution. Proof. Noting that the approximate M L E OHL satisfies a set of equations U*(OHL) = , U take a first-order Taylor series expansion of U*(0 L) H = O = u*(0 ) HL around the true parameter 0 U*(0 ) + W E O ^-^-(0HL-0 ), 0 0 dO where 0* is oh the line segment joining 0 0 V^{0HL-0 ) 0 to OHL, which implies 1 du*{0*) "i - i = n 80 1 1 i=l (5.22) 80 dOdO Now we consider the two product terms on the right of (5.22). Applying Lemma 3.2 in Section 3.7 to the first and second partial derivative functions in the expression in (5.16), we know that, for any fixed 0, 4=u*(6>) = - ^ u ( 0 ) + 0(n 2 A " ) l 1 dO dO + U ^ )> N (5.23) and n dO T n dO T dOdO n^dOdO T T i=i Assume that N = 0(n ), T 1 (5.24) ' i=i o(l). From where r > \ . Then ~0(n^ N' ) = 1 0(n^- ) T (5.23) and (5.24), we have _ £%>^h 1 l ™ 1 dlj(Oo) aO A rpp^f)) « £ a0a0 _ r ~ 134 l i m i f » (5.25) *«e-) " Note that OHL — Oo = O [ma,x{n~%, N^ }] = O (n~^), i.e., OHL is a -^-consistent estimate 1 p p of 6o- Since 9* is on the line segment joining 0 O and OHL, A- 0 O as n —> oo. Under the same regularity conditions as before, it follows from (5.18) and (5.19) that i= n „. (- ) 5 26 Combining the results in (5.25) and (5.26) and using Slutsky's theorem, we can show that V^(0HL-0O) which implies that when N = 0(n ) T A/V(0, Wo)' ), 1 for r > | , the approximate M L E OHL M L E OML have the same asymptotic distribution. 135 a n d the "exact" • Chapter 6 Conclusions and Future Research 6.1 Conclusions In this thesis, we have developed approximate maximum likelihood inference in the following three problems: (1). semiparametric N L M E models with measurement errors and missing data in time-varying covariates; (2). semiparametric N L M E models with covariate measurement errors and outcome-based informative missing responses; (3). semiparametric N L M E models with covariate measurement errors and random-effect-based informative missing re- sponses. Measurement errors, dropouts, and missing data are addressed simultaneously in a unified way. For each problem, we have proposed two joint model methods to simultaneously obtain approximate maximum likelihood estimates (MLEs) of all model parameters. The first method, implemented by a Monte Carlo E M algorithm, may be more accurate than the second method but it may be computationally very intensive and sometimes may offer computational difficulties such as slow or non-convergence, especially when the dimensions of random effects are not small. The second method, which approximates joint log-likelihood functions by using a first-order Taylor expansion or by using a first-order Laplace approxima- 136 tion, is computationally more appealing, but it may be less accurate than the first method. The performance of the second method may need further investigation. We have showed some asymptotic results for the estimates based on'the second method. T h e second method may be used as a reasonable alternative when the first method has convergence problems or be used to provide excellent parameter starting values for the first method. Simulation results have shown that all proposed methods perform better than the commonly used two-step method and the naive method which ignores measurement errors, in the sense that the proposed methods yield smaller bias and M S E . In particular, the commonly used two-step method may under-estimate standard errors, which is consistent with analytic results, and the naive method may under-estimate covariate effects and poorly estimate other parameters. 6.2 Future Research Topics Finally, we discuss possible future work relevant to this thesis as follows. 1. In many longitudinal studies such as H I V viral dynamics, another common problem is that the response measurements may be subject to left censoring due to a detection limit. Censored responses in practice were often substituted by the detection limit or half the detection limit (Wu and Ding, 1999; W u and W u , 2001), which may lead to substantial biases in the results (Wu, 2002). In the presence of both dropouts and censoring, unified approaches which address these problems simultaneously in semiparametric/nonparametric N L M E models are needed in order to make reliable statistical inference. 2. In many longitudinal datasets, dropouts, censoring, measurement errors, and missing covariates are all present simultaneously. To our knowledge, there are almost no 137 unified methods in the literature which address these problems simultaneously. Wu (2002) proposed a unified method to address censoring and measurement errors simultaneously and showed that the proposed method offered significant improvement over existing methods currently in use. The ideas in W u (2002) and in this thesis can be extended to address dropouts, censoring, measurement errors, and missing covariates simultaneously in semiparametric/nonparametric N L M E models. 3. For the response process, we only consider semiparametric nonlinear mixed-effects models with independent and normal distributed error terms e;. In the future, we may consider more complicated covariance structure for e; such as an AR(1) structure. 4. In our study, we only consider semiparametric nonlinear mixed-effects normal data. models for Generally, our proposed methods may be extended to other models, such as semiparametric/nonparametric generalized linear mixed-effects models and semiparametric/nonparametric generalized nonlinear mixed-effects models. 5. Computational efficiency is an important issue in our study. Multivariate rejection sampling methods have been used in our data analyses and simulation. In general, other sampling methods, such as adaptive rejection sampling methods and importance sampling methods, may also be used and may be even more efficient. We plan to compare computational efficiency among several sampling methods in our current setting. 6. In our alternative methods, we have approximated log-likelihood functions by using a first-order Taylor expansion or by using a first-order Laplace approximation. Sometimes, these are not necessarily accurate approximations. In the future, we may investigate better approximations, such as higher order Taylor expansions and Laplace approximations. 138 7. One problem in our models is that there are too many parameters. If the data are not rich enough, the proposed methods may have convergence problems and identifiability problems. We plan to develop Bayesian methods for our problems. 139 References Aitchison, J . and Silvey, S. D . (1958). Maximum-likelihood estimation of parameters subject to restraints. The Annals of Mathematical Statistics, 29, 813-828. Amemiya, T . (1983). Nonlinear regression models. In Handbook of Econometrics, Volume I, Z. Griliches and M . D . Intriligator, eds., pp.333-389. Noth Holland, Amsterdam. Barndorff-Nielsen, O. E . and Cox, D . R. (1989). Asymptotic Techniques for Use in Statis- tics. New York: Chapman and Hall. Booth, J . G . and Hobert, J . P. (1999). Maximizing generalized linear mixed models likelihoods with an automated Monte Carlo E M algorithm. Journal Society, of the Royal Statistical Ser. B, 61, 265-285. Bradley, R. A . , and Gart, J . J . (1962). The asymptotic properties of M L estimators when sampling from associated populations. Biometrika, 49, 205-214. Carroll, R. J . , Ruppert, D . , and Stefanski, L . A . (1995). Measurement Models. Error in Nonlinear London: Chapman and Hall. Chan, K . S. and Ledolter, J . (1995). Monte Carlo E M estimation for time series models involving counts. Journal of the American 140 Statistical Association, 90, 242-252. Davidian, M . and Giltinan, D. M . (1995). Nonlinear Data. Models for Repeated Measurements Chapman & Hall. de Boor, C . (1978). A Practical Guide to Splines. Demidenko, E . (2004). Mixed Models Springer-Verlag, New York. Theory and Applications. John Wiley & Sons. Dempster, A . P., Laird, N . M . , and Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the E M algorithm (with discussion). Statistical Society, Ser. B, 39, Journal of the Royal 1-38. Diggle, P. and Kenward, M . G . (1994). Informative drop-out in longitudinal data analysis (with Discussion). Applied Statistics, 43, 49-93. Ding, A . and W u , H . (2001). Assessing antiviral potency of anti-HIV therapies in vivo by comparing viral decay rates in viral dynamic models. Biostatistics, Euband, R. L . (1988). Spline smoothing and Nonparametric Regression. 2(1), 13-29. New York: Marcel Dekker. Fitzmaurice, G . M . , Laird, N . M . , and Zahner, G . E . P. (1996). Multivariate logistic models for incomplete binary responses. Journal of-the American Statistical Association, 91, 99-108. Gelfand, A . E . and Smith, A . F . M . (1990). marginal densities. Journal Geweke, J . (1996). Handbook of the American of Computational Sampling-based approaches to calculating Statistical Economics. Association, Amsterdam: 85, 398-409. North-Holland. Gilks, W . R. and W i l d , P. (1992). Adaptive rejection sampling for Gibbs sampling. Statistics, 41, 337-348. 141 Applied Green, P. J . and Solverman, B . W . (1994). Linear Models. Nonparametric Regression and Generalized Chapman and Hall, London. Grossman, Z . , Polis, M . , Feinberg, M . B . , Grossman, Z . , Levi, I., Jankelevich, S., Yarchoan, R., Boon, J . , De Wolf, F . , Lange, J . M . A . , Goudsmit, J . , Dimitrov, D . S., and Paul, W . E . (1999). Ongoing H I V dissemination during H A A R T . Nature Medicine, 5 , 1099-1103. Higgins, D. M . , Davidian, M . , and Giltinan, D. M . (1997). A two-step approach to measurement error in time-dependent covariates in nonlinear mixed-effects models, with application to IGF-I pharmacokinetics. Journal of the American Statistical association, 92, 436-448. Ho, D . D . , Neumann, A . U . , Perelson, A . S., Chen, W . , Leonard, J . M . , and Markowitz, M . (1995). Rapid turnover of plasma virions and C D 4 lymphocytes in HIV-1 infection. Nature, 373, 123-126. Ibrahim, J . G . , Chen, M . H . , and Lipsitz, S. R. (2001). Missing responses in generalized linear mixed models when the missing data mechanism is nonignorable. Biometrika, 88, 551-564. Ibrahim, J . G . , Lipsitz, S. R., and Chen, M . H . (1999). Missing covariates in generalized linear models when the missing data mechanism is nonignorable. Journal Statistical Society, of the Royal Ser. B, 61, 173-190. Ke, C . and Wang, Y . (2001). Semiparametric nonlinear mixed-effects models and their applications (with discussions). Journal of the American Statistical Association, 96, 1272-1298. Laird, N . M . and Ware, J . H . (1982). Random-effects models for longitudinal data. metrics, 38, 963-974. Bio- Lee, Y . and Nelder, J . A . (1996). Hierarchical generalized linear models. Royal Statistical Society, Journal of the Ser. B, 58, 619-678. Lee, Y . and Nelder, J . A . (2001). Hierarchical generalised linear models: A synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika, 88, 987-1006. Liang, H . , W u , H . , and Carroll, R. (2003). The relationship between virologic and immunologic responses in A I D S clinical research using mixed-effects varying-coefncient models with measurement errors. Bio statistics, 4, 297-312. Lindstrom, M . J . and Bates, D . M . (1990). Nonlinear mixed effects models for repeated measures data. Biometrics, 46, 673-687. Little, R. J . A . (1995). Modeling the drop-out mechanism in repeated measures studies. Journal of the American Statistical 90, 1112-1121. Association, Little, R. J . A . and Rubin, D . B . (1987). Statistical Analysis with Missing Data. New"York: John Wiley. Louis", T . A . (1982). Finding the observed information matrix when using the E M algorithm. Journal of the Royal Statistical McCulloch, C . E . (1997). models. Journal Society, Ser. B, 44, 226-233. Maximum likelihood algorithms for generalized linear mixed of the American Statistical Association, McLachlan, G . J . and Krishnan, T . (1997). The EM-Algorithm 92, 162-170. and Extension. New York, Wiley. Ogden, R. T . and Tarpey, T . (2006). estimated parameters. Bio statistics, Estimation in Regression models with externally 7, 115-129. 143 Perelson, A . S.,.Neumann, A . U . , Markowitz/-M., Leonard, J . M . , and Ho, D. D. (1996). HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science, 271, 1582-1586. Pinheiro, J . C . and Bates, D . M . (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics, 4, 12-35. Rice, J . A . and W u , C . O. (2001). Nonparametric mixed-effects models for unequally sampled noisy curves. Biometrics, Seber, G . A . F . (1984). Multivariate Serfling, R. J . (1980). 57, 253-259. Observations. Approximation Theorems New York: Wiley. of Mathematical Statistics. New York: Wiley. Shah, A . , Laird, N . , and Schoenfeld, D. (1997). A random-effects model for multiple characteristics with possibly missing data. Journal of the American Statistical Association, 92, 775-779. Ten Have, T . R., Pulkstenis, E . , Kunselman, A . , and Landis, J . R. (1998). Mixed ef- fects logistics regression models for longitudinal binary response data with informative dropout. Biometrics, 54, 367-383. Vonesh, E . F . (1996). A note on the use of Laplace's approximation for nonlinear mixedeffects models. Biometrika, 83, 447-452. Vonesh, E . F . and Chinchilli, V . M . (1997). Linear of Repeated Measurements. and Nonlinear Marcel Dekker, New York. 144 Models for the Analysis Vonesh, E . F . , Wang, H . , Nie, L . , and Majumdar, D . (2002). Conditional second-order generalized estimating equations for generalized linear and nonlinear mixed-effects models. Journal of the American Statistical Association, 97, 271-283. Wei, G . C . and Tanner, M . A . (1990). A Monte Carlo implementation of the E M algorithm and the poor man's data augmentation algorithm. Journal Association, of the American Statistical 85, 699-704. Wolfinger, R. (1993). Laplace's approximation for nonlinear mixed models. Biometrika, 80, 791-795. Wu, H . , Kuritzkes, D . R., McClernon, D. R., Kessler, FL, Connick, E . , Landay, A . , Spear, G . , Heath-Chiozzi, M . , Rousseau, F . , Fox, L . , Spritzler, J . , Leonard, J . M'., and Lederman, M . M . (1999). Characterization of viral dynamics in human immuno-deficiency virus type 1-infected patients treated with combination antiretroviral therapy: relationships to host factors, cellular restoration and virological endpoints. Infectious Diseases, 179, Journal of 799-807. Wu, H . and Ding, A . (1999). Population HIV-1 dynamics in vivo: application models and inferential tools for virological data from AIDS clinical trials. Biometrics, Wu, H . (2005). Statistical 55, 410-418. Statistical Methods for H I V Dynamic Studies in A I D S Clinical Trials. Methods in Medical Research, 14, 171-192. Wu, H . and Zhang, J . (2002). The study of long-term H I V dynamics using semi-parametric non-linear mixed-effects models. Statistics in Medicine, 21, 3655-3675. Wu, L . (2002). A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies. Journal Statistical Association, 97, 955-964. • 145 of the American Wu, L . (2004). Exact and approximation inferences for nonlinear mixed-effects models with missing covariates. Journal of the American Statistical Association, 9 9 , 700-709. Wu, L . and W u , H . (2001). A multiple imputation method for missing covariates in nonlinear mixed-effect models, with application to H I V dynamics. Statistics in Medicine, 20, 1755-1769. Wu, M . C . and Carroll, R. J . (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics, 188. 146 44, 175-
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- The theory and methods for measurement errors and missing...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
The theory and methods for measurement errors and missing data problems in semiparametric nonlinear mixed-effects… Liu, Wei 2006
pdf
Page Metadata
Item Metadata
Title | The theory and methods for measurement errors and missing data problems in semiparametric nonlinear mixed-effects models |
Creator |
Liu, Wei |
Date Issued | 2006 |
Description | Semiparametric nonlinear mixed-effects (NLME) models are flexible for modelling complex longitudinal data. Covariates are usually introduced in the models to partially explain inter-individual variations. Some covariates, however, may be measured with substantial errors. Moreover, the responses may be missing and the missingness may be nonignorable. In this thesis, we develop approximate maximum likelihood inference in the following three problems: (1) semiparametric NLME models with measurement errors and missing data in time-varying covariates; (2) semiparametric NLME models with covariate measurement errors and outcome-based informative missing responses; (3) semiparametric NLME models with covariate measurement errors and random-effect-based informative missing responses. Measurement errors, dropouts, and missing data are addressed simultaneously in a unified way. For each problem, we propose two joint model methods to simultaneously obtain approximate maximum likelihood estimates (MLEs) of all model parameters. Some asymptotic properties of the estimates are discussed. The proposed methods are illustrated in a HIV data example. Simulation results show that all proposed methods perform better than the commonly used two-step method and the naive method. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-01-16 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0092843 |
URI | http://hdl.handle.net/2429/18520 |
Degree |
Doctor of Philosophy - PhD |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2006-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_2006-200308.pdf [ 6.12MB ]
- Metadata
- JSON: 831-1.0092843.json
- JSON-LD: 831-1.0092843-ld.json
- RDF/XML (Pretty): 831-1.0092843-rdf.xml
- RDF/JSON: 831-1.0092843-rdf.json
- Turtle: 831-1.0092843-turtle.txt
- N-Triples: 831-1.0092843-rdf-ntriples.txt
- Original Record: 831-1.0092843-source.json
- Full Text
- 831-1.0092843-fulltext.txt
- Citation
- 831-1.0092843.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0092843/manifest