The Theory and Methods for Measurement Errors and Missing Data Problems in Semiparametric Nonlinear Mixed-effects Models

by

WEI LIU

B.Sc., Northeast Normal University, 1990
M.Sc., Northeast Normal University, 1993
M.Sc., Memorial University of Newfoundland, 2001

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE STUDIES (Statistics)

The University of British Columbia
October 2006
© Wei Liu, 2006

Abstract

Semiparametric nonlinear mixed-effects (NLME) models are flexible for modelling complex longitudinal data. Covariates are usually introduced in the models to partially explain inter-individual variations. Some covariates, however, may be measured with substantial errors. Moreover, the responses may be missing and the missingness may be nonignorable. In this thesis, we develop approximate maximum likelihood inference for the following three problems: (1) semiparametric NLME models with measurement errors and missing data in time-varying covariates; (2) semiparametric NLME models with covariate measurement errors and outcome-based informative missing responses; (3) semiparametric NLME models with covariate measurement errors and random-effect-based informative missing responses. Measurement errors, dropouts, and missing data are addressed simultaneously in a unified way. For each problem, we propose two joint model methods to simultaneously obtain approximate maximum likelihood estimates (MLEs) of all model parameters. Some asymptotic properties of the estimates are discussed. The proposed methods are illustrated in an HIV data example. Simulation results show that all proposed methods perform better than the commonly used two-step method and the naive method.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
Dedication

1 Introduction
  1.1 Longitudinal Studies
  1.2 Parametric Nonlinear Mixed-effects Models
  1.3 Semiparametric Nonlinear Mixed-effects Models
  1.4 Measurement Errors and Dropouts
  1.5 A Motivating Example
  1.6 Research Objectives and Thesis Organization

2 A Semiparametric Nonlinear Mixed-effects Model with Covariate Measurement Errors
  2.1 Introduction
  2.2 A Semiparametric NLME Model for the Response Process
    2.2.1 A Semiparametric NLME Model with Mis-measured Covariates
    2.2.2 A Basis-based Approach to Nonparametric Functions
    2.2.3 Percentile-based Knot Placing for Splines
    2.2.4 Selection of Smoothing Parameters
    2.2.5 Transformation of the Semiparametric NLME Model
    2.2.6 Consistency of the Estimate of w(t)
  2.3 Measurement Errors and Missing Data in Covariates

3 A Joint Model for Semiparametric NLME Models with Covariate Measurement Errors and Missing Data
  3.1 Introduction
  3.2 A Two-step Method
  3.3 A Joint Model Method for Likelihood Inference
    3.3.1 The Likelihood for the Joint Model
    3.3.2 A MCEM Method
    3.3.3 Sampling Methods
    3.3.4 Convergence
  3.4 A Computationally More Efficient Approximate Method
    3.4.1 The Need for an Alternative Method
    3.4.2 Analytic Expressions of Estimates
    3.4.3 Asymptotic Properties
  3.5 Example and Simulation
    3.5.1 An Application in AIDS Studies
    3.5.2 A Simulation Study
  3.6 Discussion
  3.7 Appendix: Asymptotic Properties of the Estimates Based on the APPR Method in Section 3.4
    3.7.1 Some Lemmas
    3.7.2 Notation and Regularity Conditions
    3.7.3 Estimating Equations
    3.7.4 Consistency
    3.7.5 Asymptotic Normality

4 Simultaneous Inference for Semiparametric NLME Models with Covariate Measurement Errors and Outcome-based Informative Dropouts
  4.1 Introduction
  4.2 Models for Nonignorable Missing Responses
  4.3 Likelihood Inference
    4.3.1 The Likelihood Function
    4.3.2 Approximate MLEs Based on a MCEM Method
    4.3.3 Monte Carlo Sampling
  4.4 A Computationally More Efficient Approximate Method
    4.4.1 A Much Simpler E-step
    4.4.2 The M-step
  4.5 Example
    4.5.1 Data Description
    4.5.2 The Response and the Covariate Models
    4.5.3 Dropout Models and Sensitivity Analysis
    4.5.4 Estimation Methods and Computation Issues
    4.5.5 Analysis Results
  4.6 A Simulation Study
  4.7 Conclusions and Discussion

5 Semiparametric NLME Model with Random-effect-based Informative Dropouts and Covariate Measurement Errors
  5.1 Introduction
  5.2 Missing Response Models
  5.3 A Monte Carlo EM Method
    5.3.1 The Likelihood Function
    5.3.2 A MCEM Algorithm
    5.3.3 Sampling Methods
  5.4 An Alternative Approximate Method
    5.4.1 The Hierarchical Likelihood Method
    5.4.2 Asymptotic Properties
  5.5 Example and Simulation
    5.5.1 Example
    5.5.2 The Simulation Study
  5.6 Discussion
  5.7 Appendix: Asymptotic Properties of the Approximate MLE in Section 5.4
    5.7.1 Consistency
    5.7.2 Asymptotic Normality

6 Conclusions and Future Research
  6.1 Conclusions
  6.2 Future Research Topics

References

List of Tables

3.1 AIC and BIC values for the viral load model (3.14) and (3.15), with q ≤ p = 1, 2, 3
3.2 AIC and BIC values for the linear, quadratic, and cubic CD4 models
3.3 Parameter estimates (standard errors) for the HIV dataset
3.4 Simulation results for parameter estimates as well as (standard errors) and (simulated standard errors) for the estimation methods Naive, Two-step, MCEM, and APPR
4.1 AIC and BIC values for the model (4.8)-(4.10), with q ≤ p = 1, 2, 3
4.2 AIC and BIC values for the linear, quadratic, and cubic CD4 models
4.3 Parameter estimates (standard errors) for the models in the example
4.4 Simulation results for the parameter estimates (standard errors) as well as their biases and MSEs for the estimation methods PARA, APPR1, and APPR2
4.5 Simulation results for the parameter estimates (standard errors) for the three estimation methods NAIVE, APPR1, and APPR2 with dropout models I and IV in (4.13)
4.6 Simulation results for biases and MSEs of the parameter estimates for the three estimation methods NAIVE, APPR1, and APPR2 with dropout models I and IV in (4.13)
5.1 Parameter estimates (standard errors) for the models in the example
5.2 Simulation results for the parameter estimates (standard errors) for the estimation methods AP and HL
5.3 Simulation results for bias and MSE of the parameter estimates for the estimation methods AP and HL

List of Figures

1.1 Viral loads (response) of six randomly selected HIV patients
1.2 CD4 counts of six randomly selected HIV patients
3.1 The time series plot for $b_2$ associated with patient 10
3.2 The autocorrelation function plot for $b_2$ associated with patient 10
4.1 The time series plot for $b_2$ associated with patient 14
4.2 The autocorrelation function plot for $b_2$ associated with patient 14
5.1 The time series plot for $b_2$ associated with patient 10
5.2 The autocorrelation function plot for $b_2$ associated with patient 14

Acknowledgements

First of all, I thank my supervisor, Dr. Lang Wu, for his support, both academic and moral. I feel honored to have had the chance to work with him. Very special thanks are due to the other members of my supervisory committee, Dr. Paul Gustafson and Dr. Matias Salibian-Barrera, for their invaluable suggestions and encouragement. Also, I am very grateful to Dr. Nancy Heckman, Dr. Harry Joe, and Dr. John Petkau for teaching me, academically and personally. I am also indebted to the office staff, Christine Graham, Elaine Salameh, and Rhoda Morgan, for all their help with administrative matters. It is a pleasure to acknowledge helpful discussions with my fellow graduate students: Juxin Liu, Guohua Yan, Weiliang Qiu, Hui Shen, and all of the others. Finally, I would like to thank my family, my husband, and my son for their tireless love, patience, understanding, and support.

WEI LIU
The University of British Columbia
September 2006

To my family

Chapter 1
Introduction

1.1 Longitudinal Studies

A longitudinal study is a study in which the response for each individual is observed on two or more occasions. Longitudinal studies are very common in health and life sciences, epidemiology, and medical and biomedical research. They are also common in other areas, including education, psychology, the social sciences, and econometrics. A major advantage of longitudinal studies over cross-sectional studies is that in longitudinal studies one can model the individual response trajectory over time, while in cross-sectional studies one cannot.

In longitudinal studies, covariates may be classified into two categories: time-varying covariates and time-independent covariates. Time-varying covariates represent variables which vary over time within individuals. Time itself may be viewed as a covariate, since there is often interest in testing whether the response variable changes over time.
For example, when one studies children's weight trajectories over time, height is a time-varying covariate. Time-independent covariates, on the other hand, represent baseline factors which do not vary with time, such as an individual's gender and race. One of the goals in longitudinal research is to investigate the effects of important covariates on individual response trajectories over time.

A defining feature of a longitudinal data set is repeated observations on a number of individuals. Repeated observations on the same individual tend to be correlated, and it is important to explicitly recognize two sources of variability: random variation among repeated measurements within a given individual, and random variation between individuals. Moreover, the number of observations often varies from individual to individual (i.e., the data are often unbalanced). Therefore, longitudinal data require special statistical methods to draw valid statistical inferences.

There are three broad approaches to longitudinal data analysis. The marginal model approach models the marginal expectation of the response as a function of covariates. The methods are designed to permit separate modelling of the regression of the response on covariates and of the association among repeated observations of the response for each individual. Marginal models are appropriate when inferences about the population average are of main interest. For example, in a clinical trial the average difference between control and treatment is most important, not the difference for any one individual.

The random effects model approach assumes that the response is a function of covariates with regression coefficients varying from one individual to the next. A random effects model is a reasonable description if the set of coefficients for a set of individuals can be thought of as a sample from a distribution. In random effects models, correlation arises among repeated responses because the regression coefficients vary across individuals. These coefficients represent the effects of the covariates on an individual, in contrast to the marginal model coefficients, which describe the effects of covariates on the population average. Random effects models are most useful when the objective is to make inferences about individuals, such as in AIDS studies; they may address both population parameters and individual characteristics.

The transition model approach describes the conditional distribution of each response on an individual as an explicit function of his or her past responses and covariates. Under transition models, correlation among the responses on one individual exists because past responses explicitly influence the present response; the past responses are treated as additional covariates.

In each of the three approaches, we consider both the dependence of the responses on covariates and the correlation among the responses. With cross-sectional data, only the dependence of the responses on covariates needs to be specified, since there is no correlation among responses. In longitudinal studies, in which correlation usually exists among responses, ignoring this correlation has at least two consequences. First, it leads to incorrect inferences about regression coefficients.
In particular, confidence intervals based on the assumption of independence are too short when there is in fact positive dependence. Second, the estimation method may be inefficient, that is, less precise than possible.

1.2 Parametric Nonlinear Mixed-effects Models

Parametric nonlinear mixed-effects (NLME) models, or hierarchical nonlinear models, have been widely used in many longitudinal studies such as human immunodeficiency virus (HIV) viral dynamics, pharmacokinetic analyses, and studies of growth and decay (Davidian and Giltinan 1995; Vonesh and Chinchilli 1997). In these studies, the intra-individual variation and the inter-individual variation are typically modelled by a two-stage hierarchical model. The first stage specifies the mean and covariance structure for a given individual, whereas the second stage characterizes the inter-individual variation. Understanding the nature of inter-individual systematic and random variation at the second stage often receives more emphasis. This inter-individual variation may be partially explained by some baseline or time-varying covariates.

Suppose that there are $n$ individuals with measurements over time. Let $y_{ij}$ and $\mathbf{z}_{ij}$ respectively be the response value and the $v \times 1$ covariate values for individual $i$ at time $t_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, n_i$. The covariates $\mathbf{z}_{ij}$ may incorporate variables such as time, dose, etc. A general parametric NLME model can be written as a hierarchical two-stage model as follows (Davidian and Giltinan, 1995):

$$y_{ij} = g(\mathbf{z}_{ij}, \boldsymbol{\beta}_{ij}) + e_{ij}, \qquad \mathbf{e}_i \,|\, \boldsymbol{\beta}, \mathbf{b}_i \sim N(0, \delta^2 I), \qquad (1.1)$$
$$\boldsymbol{\beta}_{ij} = d(\mathbf{z}_{ij}, \boldsymbol{\beta}, \mathbf{b}_i), \qquad \mathbf{b}_i \sim N(0, B), \qquad i = 1, \ldots, n, \; j = 1, \ldots, n_i, \qquad (1.2)$$

where $g(\cdot)$ and $d(\cdot)$ are known (possibly nonlinear) functions, $\boldsymbol{\beta}_{ij}$ are individual-specific parameters, $\boldsymbol{\beta}$ are population parameters (fixed effects), $\mathbf{b}_i$ are random effects, $\mathbf{e}_i = (e_{i1}, \ldots, e_{in_i})^T$ are within-individual random errors assumed independent of $\mathbf{b}_i$, $\delta^2$ is the unknown within-individual variance, $I$ is the identity matrix, and $B$ is an unknown variance-covariance matrix.

In AIDS studies, for example, viral loads (plasma HIV-1 RNA copies) and various covariates such as CD4 count are usually measured over time after initiation of treatment. The following parametric NLME model has been widely used to fit short-term (the first three months of treatment) HIV viral dynamics (Wu, 2002; Wu and Zhang, 2002):

$$y_{ij} = \log_{10}\!\big(P_{1i}\,e^{-\lambda_{1ij} t_{ij}} + P_{2i}\,e^{-\lambda_{2ij} t_{ij}}\big) + e_{ij}, \qquad (1.3)$$
$$\log(P_{1i}) = \beta_1 + b_{1i}, \qquad \lambda_{1ij} = \beta_2 + \beta_3 z_{ij} + b_{2i}, \qquad (1.4)$$
$$\log(P_{2i}) = \beta_4 + b_{3i}, \qquad \lambda_{2ij} = \beta_5 + b_{4i}, \qquad (1.5)$$

where $y_{ij}$ and $z_{ij}$ are the $\log_{10}$-transformed viral load measurement and the CD4 count for patient $i$ at time $t_{ij}$ respectively, $P_{1i}$ and $P_{2i}$ are baseline values, $\lambda_{1ij}$ and $\lambda_{2ij}$ are the first (initial) and second phase viral decay rates respectively (they may be interpreted as the turnover rates of productively infected cells and of long-lived and/or latently infected cells respectively), and $\mathbf{b}_i = (b_{1i}, b_{2i}, b_{3i}, b_{4i})^T$ are random effects.

Although NLME models are popular in practice, their use has been somewhat limited by the complexity of the likelihood function. Estimation of model parameters by maximum likelihood can be challenging, since these models are typically nonlinear in the random effects and thus have no closed-form expressions for the marginal likelihood.
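To make the source of this difficulty concrete, the sketch below (Python; an illustration only, not an estimation method used in this thesis) evaluates the mean trajectory of model (1.3)-(1.5) and approximates one subject's marginal likelihood contribution by simple Monte Carlo integration over the random effects $\mathbf{b}_i$ — the integral that has no closed form.

```python
import numpy as np

rng = np.random.default_rng(1)

def viral_load_mean(t, z, beta, b):
    # Mean log10 viral load of model (1.3)-(1.5):
    # log10(P1i * exp(-lam1ij * t) + P2i * exp(-lam2ij * t)).
    P1 = np.exp(beta[0] + b[0])              # log(P1i) = beta1 + b1i
    lam1 = beta[1] + beta[2] * z + b[1]      # lam1ij = beta2 + beta3*z_ij + b2i
    P2 = np.exp(beta[3] + b[2])              # log(P2i) = beta4 + b3i
    lam2 = beta[4] + b[3]                    # lam2ij = beta5 + b4i
    return np.log10(P1 * np.exp(-lam1 * t) + P2 * np.exp(-lam2 * t))

def marginal_loglik_i(y, t, z, beta, delta2, B, n_draws=5000):
    # Monte Carlo approximation of subject i's marginal likelihood:
    # average the N(mean, delta2) density of y_i over b_i ~ N(0, B).
    bs = rng.multivariate_normal(np.zeros(4), B, size=n_draws)
    norm_const = (2.0 * np.pi * delta2) ** (-len(y) / 2.0)
    dens = np.empty(n_draws)
    for k, b in enumerate(bs):
        resid = y - viral_load_mean(t, z, beta, b)
        dens[k] = norm_const * np.exp(-0.5 * np.sum(resid ** 2) / delta2)
    return np.log(dens.mean())
```

With four correlated random effects the integrand is genuinely four-dimensional, so many draws are needed for a stable answer; this computational cost is what the approximate methods discussed next try to avoid.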
This has led to the development of some widely used approximate methods based on Taylor expansions or Laplace approximations of the likelihood function (Lindstrom and Bates 1990; Wolfinger 1993; Vonesh, Wang, Nie, and Majumdar 2002). These approximate methods are computationally efficient in the sense that they may converge faster and have fewer computational problems than the "exact" likelihood method, which finds the maximum likelihood estimator (MLE) using numerical integration techniques or Monte Carlo methods. The approximate methods often perform well if the number of intra-individual measurements is not small, but their performance may be less satisfactory if the intra-individual data are sparse, especially when the inter-individual variability is large (Davidian and Giltinan 1995; Vonesh and Chinchilli 1997; Pinheiro and Bates 1995). Thus there is still a need for "exact" methods. "Exact" likelihood inference for generalized linear mixed models based on Monte Carlo EM algorithms has been investigated by McCulloch (1997) and Booth and Hobert (1999).

1.3 Semiparametric Nonlinear Mixed-effects Models

Parametric NLME models are powerful tools in many longitudinal analyses. In some cases, however, they may not be flexible enough to model complex longitudinal processes, since the underlying mechanism which generates the data may be complicated in practice. In these cases, semiparametric or nonparametric models may be more flexible (Ke and Wang, 2001; Rice and Wu, 2001). In particular, semiparametric NLME models are very useful for characterizing both the intra-individual and the inter-individual variation: the intra-individual variation is modelled semiparametrically, while the inter-individual variation is incorporated by random effects (Davidian and Giltinan, 1995; Ke and Wang, 2001; Wu and Zhang, 2002).

In AIDS studies, for instance, the parametric NLME model (1.3)-(1.5) is appropriate only for fitting short-term HIV viral dynamics. Due to long-term clinical factors, drug resistance, and other complications, the viral load trajectories can be very complex after the initial phase of viral decay (see Figure 1.1 for long-term viral load trajectories of six randomly selected HIV patients). Grossman et al. (1999) pointed out that viral decay rates after the initial period may be complicated and may vary over time, since they may depend on phenomenological parameters which hide considerable microscopic complexity and change over time. Therefore, modelling the second phase viral decay rate by a nonparametric smooth curve may be more appropriate than parametric modelling (Wu and Zhang, 2002). This leads to the following semiparametric NLME model:

$$y_{ij} = \log_{10}\!\big(P_{1i}\,e^{-\lambda_{1ij} t_{ij}} + P_{2i}\,e^{-\lambda_{2ij} t_{ij}}\big) + e_{ij}, \qquad (1.6)$$
$$\log(P_{1i}) = \beta_1 + b_{1i}, \qquad \lambda_{1ij} = \beta_2 + \beta_3 z_{ij} + b_{2i}, \qquad (1.7)$$
$$\log(P_{2i}) = \beta_4 + b_{3i}, \qquad \lambda_{2ij} = w(t_{ij}) + h_i(t_{ij}), \qquad (1.8)$$

where $w(\cdot)$ and $h_i(\cdot)$ in (1.8) are unknown nonparametric smooth fixed-effects and random-effects functions used to describe the complicated second phase decay rate $\lambda_{2ij}$.

[Figure 1.1: Viral loads (response) of six randomly selected HIV patients.]

Wu and Zhang (2002) introduced a class of semiparametric NLME models for longitudinal data. The standard parametric NLME models can be regarded as a special case of their models.
Their models are also more flexible than the semiparametric NLME models proposed by Ke and Wang (2001). Details of the semiparametric NLME models proposed by Wu and Zhang (2002) are given in Chapter 2.

1.4 Measurement Errors and Dropouts

In many longitudinal studies, the inter-individual variation may be large, and this variation may be partially explained by time-varying covariates. Some covariates, however, may be measured with substantial errors and may contain missing values as well. Ignoring measurement errors and missing data in covariates may lead to biased results (Carroll et al. 1995; Higgins et al. 1997; Wu, 2002). Moreover, some individuals may drop out of the study before the scheduled end for various reasons such as drug intolerance, which leads to missing data. Measurement errors and missing data make statistical analysis in longitudinal studies much more complicated, because standard complete-data methods are not directly applicable. Therefore, it is very important to find appropriate methods to deal with measurement errors and missing data.

In AIDS studies, for example, it is well known that CD4 counts, which may be used as covariates, are usually measured with substantial errors and on a schedule different from that of the response (viral load) measurements. In addition, it is very common that some patients drop out of the study early or miss scheduled visits due to drug intolerance or other problems. Visual inspection of the raw data seems to indicate that dropout patients may have slower viral decay than the remaining patients. Thus, the dropouts are likely to be informative or nonignorable.

Commonly used measurement error models are reviewed in Carroll et al. (1995). For NLME models with covariate measurement errors, Higgins et al. (1997) proposed a two-step method and a bootstrap method, and Wu (2002) considered censored responses and covariate measurement errors based on a joint model. There is also extensive literature on dropouts in longitudinal studies (e.g., Diggle and Kenward, 1994; Little 1995; Ibrahim et al. 2001). However, there is little literature on addressing measurement errors, informative dropouts, and missing data in semiparametric NLME models.

In the presence of missing data, the missing data mechanism must be taken into account to obtain valid statistical inferences. Little and Rubin (1987) and Little (1995) discussed statistical analyses with missing values. Let $\mathbf{y}_i = (y_{i1}, \ldots, y_{in_i})^T$ be a vector of repeated observations of a variable $y$ on individual $i$. Write $\mathbf{y}_i = (\mathbf{y}_i^{(o)}, \mathbf{y}_i^{(m)})$, with $\mathbf{y}_i^{(o)}$ denoting the observed components and $\mathbf{y}_i^{(m)}$ the missing components of $\mathbf{y}_i$. Let $\mathbf{r}_i = (r_{i1}, \ldots, r_{in_i})^T$ denote a set of indicator variables such that $r_{ij} = 1$ if $y_{ij}$ is missing and $r_{ij} = 0$ otherwise. The probability distribution of $\mathbf{r}_i$ defines a probability model for the missing value mechanism. Little and Rubin (1987) classified missing value mechanisms as follows.

• Missing data are missing completely at random (MCAR) if the probability of missingness is independent of both observed and unobserved data. When missing data are caused by features of the study design, rather than the behavior of the study subjects, the MCAR mechanism may be plausible. For example, some values may be missing for reasons irrelevant to the treatment (e.g., the medical equipment breaks down on a certain day).
In mathematical notation, missingness is MCAR if $\mathbf{r}_i$ is independent of both $\mathbf{y}_i^{(o)}$ and $\mathbf{y}_i^{(m)}$.

• Missing data are missing at random (MAR) if the probability of missingness depends only on observed data, but not on unobserved data. For example, a patient may fail to visit the clinic because he/she is too old. In mathematical notation, missingness is MAR if $\mathbf{r}_i$ is independent of $\mathbf{y}_i^{(m)}$.

• Missing data are nonignorable or informative (NIM) if the probability of missingness depends on unobserved data. For random effects models, we consider the following two nonignorable missing mechanisms for the response. First, the probability of missingness may depend on unobserved responses; for example, a patient fails to visit the clinic because he/she is too sick. We call the missingness outcome-based informative (Little, 1995) if $\mathbf{r}_i$ depends on $\mathbf{y}_i^{(m)}$ but not on the random effects $\mathbf{b}_i$. Second, the probability of missingness may depend on unobservable random effects; for example, an AIDS patient may drop out if his/her individual-specific viral decay is too slow. We call the missingness random-effect-based informative (Little, 1995) if $\mathbf{r}_i$ depends on the random effects $\mathbf{b}_i$ but not on $\mathbf{y}_i^{(m)}$.

Both the MCAR and MAR missing mechanisms are sometimes referred to without distinction as ignorable. Little and Rubin (1987) showed that, when missing data are nonignorable, likelihood inference must incorporate the missing data mechanism to avoid biased results.
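The distinctions above are easiest to see as explicit missingness models. The sketch below (Python; the logistic form and the parameters $\phi$ are illustrative assumptions, not the dropout models fitted later in Chapters 4 and 5) writes $P(r_{ij} = 1)$ under each mechanism.

```python
import numpy as np

def miss_prob(y_prev_obs, y_curr, b_i, phi, mechanism):
    # Illustrative logistic models for P(r_ij = 1) under each mechanism;
    # phi = (phi0, phi1) are hypothetical missingness parameters.
    if mechanism == "MCAR":                 # depends on no data at all
        eta = phi[0]
    elif mechanism == "MAR":                # depends only on observed data
        eta = phi[0] + phi[1] * y_prev_obs
    elif mechanism == "NIM-outcome":        # depends on the possibly missing y_ij
        eta = phi[0] + phi[1] * y_curr
    elif mechanism == "NIM-random-effect":  # depends on the random effects b_i
        eta = phi[0] + phi[1] * b_i
    else:
        raise ValueError(mechanism)
    return 1.0 / (1.0 + np.exp(-eta))
```

Only the first two mechanisms can be ignored in likelihood inference; under either informative mechanism the missingness model must enter the likelihood.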
1.5 A Motivating Example

Our research is motivated by HIV viral dynamic studies, which model viral load trajectories after the initiation of anti-HIV treatments. HIV viral dynamic models have received great attention in AIDS studies in recent years (Ho et al. 1995; Perelson et al. 1996; Wu and Ding, 1999; Wu, 2005). These models provide a good understanding of the pathogenesis of HIV infection and support the evaluation of anti-HIV therapies. NLME models have been popular for modelling the initial period of HIV viral dynamics and for characterizing the large inter-patient variation. It has been shown that the initial viral decay rate may reflect the efficacy of an anti-HIV therapy (Ding and Wu, 2001). A major challenge in modelling long-term HIV viral dynamics is that, during the late stages of an anti-HIV treatment, it is difficult to model the viral load trajectory parametrically, because drug resistance, noncompliance, and other long-term clinical factors may affect the trajectories. Therefore, semiparametric NLME models may be more suitable for modelling HIV viral dynamics (Wu and Zhang, 2002).

Understanding the large inter-patient variation in HIV viral dynamic studies often receives great attention, since it may help to provide individualized treatments. It has been shown that covariates such as CD4 cell count (see Figure 1.2) may partially explain the large inter-patient variation (Wu et al. 1999; Wu, 2002). However, covariates such as CD4 cell count may be measured with substantial errors and may be measured at time points different from the response measurement schedule (which leads to missing data in covariates). Ignoring these measurement errors and missing data may lead to biased results (Wu, 2002). In addition, it is very common that some patients drop out of the study early or miss scheduled visits due to drug resistance/intolerance and other problems (although dropout patients may return to the study later). It appears that dropout patients may have slower viral decay rates than the remaining patients (see Figure 1.1), so the dropouts are likely to be informative or nonignorable. Therefore, it is important to address measurement errors, informative dropouts, and missing data in semiparametric NLME models in order to obtain reliable results, which may make significant contributions to HIV/AIDS studies.

The following AIDS dataset motivates our research; a more detailed description can be found in Wu (2002). The dataset includes 53 HIV infected patients who were treated with a potent antiretroviral regimen. Viral loads (plasma HIV-1 RNA copies) were measured on days 0, 2, 7, 10, 14, 21, and 28 and weeks 8, 12, 24, and 48 after initiation of treatment. After the antiretroviral treatment, the patients' viral loads decay, and the decay rates may reflect the efficacy of the treatment. Throughout the time course, the viral load may continue to decay, fluctuate, or even start to rise (rebound). The data at the late stage of the study are likely to be contaminated by long-term clinical factors, which leads to complex longitudinal trajectories. Various covariates such as CD4 count were also recorded throughout the study on similar schedules; it is well known that CD4 counts are usually measured with substantial errors. The number of response (viral load) measurements per individual varies from 6 to 10. Five patients dropped out of the study due to drug intolerance or other problems, and sixteen patients have missing viral loads at scheduled time points. There were 104 out of 403 CD4 measurements missing at viral load measurement times, due mainly to a somewhat different CD4 measurement schedule. The viral loads of six randomly selected patients are plotted in Figure 1.1.

In the presence of measurement errors in CD4 count, we consider the following semiparametric NLME model, corresponding to model (1.6)-(1.8), to fit the viral dynamics:

$$y_{ij} = \log_{10}\!\big(P_{1i}\,e^{-\lambda_{1ij} t_{ij}} + P_{2i}\,e^{-\lambda_{2ij} t_{ij}}\big) + e_{ij}, \qquad (1.9)$$
$$\log(P_{1i}) = \beta_1 + b_{1i}, \qquad \lambda_{1ij} = \beta_2 + \beta_3 z_{ij}^* + b_{2i}, \qquad (1.10)$$
$$\log(P_{2i}) = \beta_4 + b_{3i}, \qquad \lambda_{2ij} = w(t_{ij}) + h_i(t_{ij}), \qquad (1.11)$$

where $z_{ij}^*$ is the unobservable true CD4 count, reflecting the belief that the actual, not the possibly corrupted, CD4 counts govern the initial phase viral decay rate $\lambda_{1ij}$. Model (1.9)-(1.11) will be used in our data analyses in later chapters. The CD4 count trajectories of six randomly selected patients are plotted in Figure 1.2. There is large between-patient variability in CD4 count, and most CD4 trajectories appear to have roughly quadratic polynomial shapes. We discuss covariate models in the next chapter.

[Figure 1.2: CD4 counts of six randomly selected HIV patients.]

1.6 Research Objectives and Thesis Organization

In this thesis, we consider approximate maximum likelihood inference for the following three problems: (1) semiparametric NLME models with measurement errors and missing data in time-varying covariates; (2) semiparametric NLME models with covariate measurement errors and outcome-based informative missing responses; (3) semiparametric NLME models with covariate measurement errors and random-effect-based informative missing responses. Measurement errors, dropouts, and missing data are addressed simultaneously in a unified way. Some asymptotic results are developed.
For each problem, we propose two joint model methods to simultaneously obtain approximate maximum likelihood estimates (MLEs) of all model parameters. The first method, implemented by a Monte Carlo EM algorithm, is more accurate than the second, but it is computationally very intensive and may encounter computational difficulties such as slow convergence or non-convergence, especially when the dimension of the random effects is not small. The second method, which approximates the joint log-likelihood function, is always computationally feasible and is often computationally much more efficient, but it is usually less accurate than the first method. The second method may be used as a reasonable alternative when the first method has convergence problems, or to provide excellent parameter starting values for the first method.

The remainder of this thesis is organized as follows. In Chapter 2, we introduce general semiparametric NLME models with covariate measurement errors. Following Rice and Wu (2001) and Wu and Zhang (2002), we employ natural cubic spline bases with percentile-based knots to transform semiparametric NLME models into parametric NLME models. In Chapter 3, we address measurement errors and missing data in time-varying covariates in semiparametric NLME models and propose two joint model methods, implemented by a Monte Carlo EM algorithm and by a first-order Taylor approximation to the log-likelihood function, respectively. We also compare the two joint model methods with the two-step method suggested by Higgins et al. (1997) and discuss the asymptotic properties of the approximate MLEs, and we apply the two joint model methods to an HIV dataset. In Chapter 4, we address outcome-based informative dropouts and covariate measurement errors in semiparametric NLME models and propose two joint model methods, implemented by Monte Carlo EM algorithms; we illustrate the proposed methods on an HIV dataset and evaluate their performance via simulation studies. In Chapter 5, we consider random-effect-based informative missing responses in semiparametric NLME models with covariate measurement errors. We propose two joint model methods, implemented by a Monte Carlo EM algorithm and by a first-order Laplace approximation to the log-likelihood function respectively, to simultaneously obtain approximate MLEs of all model parameters, and we discuss some asymptotic properties of these estimates; the methods are again illustrated on an HIV dataset and evaluated by simulation. We conclude the thesis with some discussion and possible future work in Chapter 6.

Chapter 2
A Semiparametric Nonlinear Mixed-effects Model with Covariate Measurement Errors

2.1 Introduction

In this chapter we present the general form of semiparametric NLME models with covariate measurement errors. In Section 2.2, we describe a general semiparametric NLME model for the response process and incorporate possibly mis-measured time-varying covariates. We approximate the proposed semiparametric NLME model by a parametric NLME model, using linear combinations of natural cubic splines with percentile-based knots, and we discuss consistency of the resulting estimates. In Section 2.3, the covariate process is modelled by a mixed-effects model to address measurement errors and missing data.
2.2 A Semiparametric NLME Model for the Response Process

2.2.1 A Semiparametric NLME Model with Mis-measured Covariates

We describe a semiparametric NLME model in general form. Let $y_{ij}$ be the response value for individual $i$ at time $t_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, n_i$. Let $z_{ikl}$ be the observed value and $z_{ikl}^*$ the unobservable "true" value of covariate $k$ for individual $i$ at time $u_{il}$, $i = 1, \ldots, n$, $k = 1, \ldots, v$, $l = 1, \ldots, m_i$. For simplicity, we focus on the case where $z_{ikl}^*$ is the current true covariate value, but our method can be extended to the case where $z_{ikl}^*$ is a summary of the true covariate values up to time $u_{il}$. Note that for each individual we allow the covariate measurement times $u_{il}$ to differ from the response measurement times $t_{ij}$; in other words, we allow missing data in the covariates. Let $\mathbf{y}_i = (y_{i1}, \ldots, y_{in_i})^T$ and $\mathbf{z}_i = (\mathbf{z}_{i1}^T, \ldots, \mathbf{z}_{im_i}^T)^T$, where $\mathbf{z}_{il} = (z_{i1l}, \ldots, z_{ivl})^T$, $l = 1, \ldots, m_i$, and define $\mathbf{z}_i^*$ analogously.

For the response process, we consider a general semiparametric NLME model similar to that of Wu and Zhang (2002), but incorporating possibly mis-measured time-varying covariates:

$$y_{ij} = g\big(t_{ij}, \boldsymbol{\beta}_{ij}^*, r_i(t_{ij})\big) + e_{ij}, \qquad (2.1)$$
$$\boldsymbol{\beta}_{ij}^* = d^*(\mathbf{z}_{ij}^*, \boldsymbol{\beta}^*, \mathbf{b}_i^*), \qquad (2.2)$$
$$r_i(t) = v\big(w(t), h_i(t)\big), \qquad i = 1, \ldots, n, \; j = 1, \ldots, n_i, \qquad (2.3)$$

where $g(\cdot)$, $d^*(\cdot)$, and $v(\cdot)$ are known (possibly nonlinear) functions, $w(t)$ and $h_i(t)$ are unknown nonparametric smooth fixed-effects and random-effects functions respectively, $\boldsymbol{\beta}_{ij}^*$ are individual-specific parameters, $\boldsymbol{\beta}^*$ are population parameters, $e_{ij}$ is the within-individual random error, and $\mathbf{b}_i^*$ are random effects. Let $\mathbf{e}_i = (e_{i1}, \ldots, e_{in_i})^T$. We assume that $\mathbf{e}_i \sim N(0, \delta^2 I)$, where $\delta^2$ is the unknown within-individual variance and $I$ is the identity matrix; that $\mathbf{b}_i^* \sim N(0, B^*)$; that the $h_i(t)$ are independent and identical realizations of a zero-mean stochastic process $h(t)$; and that $\mathbf{b}_i^*$ and $h_i(t)$ are independent of $\mathbf{e}_i$.

We can rewrite the semiparametric NLME model (2.1)-(2.3) compactly as

$$y_{ij} = g\big(t_{ij}, d^*(\mathbf{z}_{ij}^*, \boldsymbol{\beta}^*, \mathbf{b}_i^*), v(w(t_{ij}), h_i(t_{ij}))\big) + e_{ij}. \qquad (2.4)$$

Note that in (2.1) or (2.4) we assume that the individual-specific parameters $\boldsymbol{\beta}_{ij}^*$ depend on the true but unobservable covariates $\mathbf{z}_{ij}^*$, rather than on the observed covariates $\mathbf{z}_{ij}$, which are measured with error. Because of the nonparametric parts $w(t)$ and $h_i(t)$, the semiparametric NLME model (2.4) is more flexible than parametric NLME models for modelling longitudinal data, and it reduces to a parametric NLME model when $w(t)$ and $h_i(t)$ are constants. Following Wu and Zhang (2002), model (2.4) is also more flexible than other semiparametric NLME models in the literature, such as that of Ke and Wang (2001), whose models can be considered a special case of model (2.4): their model puts the random effects only in $\boldsymbol{\beta}_{ij}^*$ as in (2.2) and takes $r_i(t_{ij}) = w(t_{ij}; \boldsymbol{\beta}_{ij}^*)$ in (2.3). Therefore, model (2.4) is a very general and flexible semiparametric NLME model.
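As a concrete reading of this notation, the sketch below (Python; illustrative only) writes the HIV model (1.9)-(1.11) in the form (2.1)-(2.3), with $d^*$ collecting the individual-specific parameters, $v(w, h) = w + h$, and $g$ the bi-exponential mean function.

```python
import numpy as np

# The HIV model (1.9)-(1.11) as an instance of (2.1)-(2.3):
# beta*_ij = d*(z*_ij, beta*, b*_i) collects (log P1i, lam1ij, log P2i),
# r_i(t) = v(w(t), h_i(t)) = w(t) + h_i(t) is the second-phase decay rate,
# and g combines them into the mean response.

def d_star(z_star_ij, beta, b_star):
    return np.array([beta[0] + b_star[0],                       # log P_1i
                     beta[1] + beta[2] * z_star_ij + b_star[1], # lam_1ij
                     beta[3] + b_star[2]])                      # log P_2i

def v(w_t, h_t):
    return w_t + h_t                                            # lam_2ij

def g(t, beta_star_ij, r_t):
    log_p1, lam1, log_p2 = beta_star_ij
    return np.log10(np.exp(log_p1 - lam1 * t) + np.exp(log_p2 - r_t * t))
```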
2.2.2 A Basis-based Approach to Nonparametric Functions

To do statistical inference for the semiparametric NLME model (2.4), a main difficulty is how to fit the nonparametric smooth fixed-effects function $w(t)$ and random-effects function $h_i(t)$. Following Rice and Wu (2001) and Wu and Zhang (2002), we use a basis-based approach, which transforms a general semiparametric NLME model into a set of parametric NLME models indexed by a smoothing parameter (the number of basis functions). We use the fixed-effects function $w(t)$ to illustrate the approach.

Let $\chi$ be the support of $t$ and let $L^2(\chi)$ be the inner product space of all square integrable functions on $\chi$, with norm $\|\cdot\|$ and inner product $\langle \cdot, \cdot \rangle$ defined by

$$\langle \psi_1, \psi_2 \rangle = \int_\chi \psi_1(t)\,\psi_2(t)\,dt, \qquad \|\psi\|^2 = \langle \psi, \psi \rangle, \qquad \psi_1, \psi_2, \psi \in L^2(\chi).$$

Assume that $w(t)$ is an element of a smooth function space $S_w(\chi)$, a subspace of $L^2(\chi)$. An example of $S_w(\chi)$ is the Sobolev space

$$W_2^m(\chi) = \big\{\psi \,\big|\, \psi, \psi', \ldots, \psi^{(m-1)} \text{ absolutely continuous},\; \psi^{(m)} \in L^2(\chi)\big\}.$$

Denote a complete orthonormal basis of $S_w(\chi)$ by $\Psi(t) = [\psi_0(t), \psi_1(t), \psi_2(t), \ldots]^T$, where $\psi_0(t) = 1$. Then $w(t)$ can be expanded as

$$w(t) = \sum_{k=0}^{\infty} \mu_k \psi_k(t), \qquad \text{where } \mu_k = \int_\chi w(t)\psi_k(t)\,dt = \langle w, \psi_k \rangle.$$

Let $\Psi_p(t) = [\psi_0(t), \psi_1(t), \ldots, \psi_{p-1}(t)]^T$ and $\boldsymbol{\mu}_p = (\mu_0, \mu_1, \ldots, \mu_{p-1})^T$. Since $w(t)$ is square integrable, the truncation of $w(t)$ at term $p$,

$$w_p(t) = \sum_{k=0}^{p-1} \mu_k \psi_k(t) = \Psi_p(t)^T \boldsymbol{\mu}_p,$$

converges to $w(t)$ in $L^2$-norm as $p$ tends to infinity. It follows that when $p$ is large enough, $w_p(t)$ approximates $w(t)$ very well, i.e., $w(t) \approx w_p(t)$.

Similarly, if we assume that $h_i(t)$ is an element of a smooth function space $S_h(\chi) \subset L^2(\chi)$ with a complete orthonormal basis $\Phi(t) = [\phi_0(t), \phi_1(t), \phi_2(t), \ldots]^T$, where $\phi_0(t) = 1$, the truncation of $h_i(t)$ at term $q$,

$$h_{iq}(t) = \sum_{k=0}^{q-1} \xi_{ik} \phi_k(t) = \Phi_q(t)^T \boldsymbol{\xi}_{iq},$$

converges to $h_i(t)$ in $L^2$-norm as $q$ tends to infinity, where $\Phi_q(t) = [\phi_0(t), \phi_1(t), \ldots, \phi_{q-1}(t)]^T$ and $\boldsymbol{\xi}_{iq} = (\xi_{i0}, \ldots, \xi_{i,q-1})^T$. Thus, when $q$ is large enough, $h_{iq}(t)$ approximates $h_i(t)$ very well, i.e., $h_i(t) \approx h_{iq}(t)$.

The function $w_p(t)$ can be considered as the projection of $w(t)$ onto the linear space $S(\chi, \Psi_p) = \{\psi \mid \psi = \Psi_p(t)^T \boldsymbol{\mu}_p,\; \boldsymbol{\mu}_p \in R^p\} \subset S_w(\chi)$ spanned by the basis functions $\Psi_p(t)$, and the function $h_{iq}(t)$ can be considered as the projection of $h_i(t)$ onto the linear space $S(\chi, \Phi_q) = \{\phi \mid \phi = \Phi_q(t)^T \boldsymbol{\xi}_{iq},\; \boldsymbol{\xi}_{iq} \in R^q\} \subset S_h(\chi)$ spanned by the basis functions $\Phi_q(t)$. Here $\boldsymbol{\mu}_p$ and $\boldsymbol{\xi}_{iq}$ are unknown vectors of fixed-effects and random-effects coefficients, respectively. Since the $h_i(t)$ are assumed to be independent and identical realizations of a zero-mean stochastic process, we can regard the $\boldsymbol{\xi}_{iq}$ as independent and identical realizations of a zero-mean random vector with unknown covariance matrix $K$.
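To illustrate the truncation step numerically, the sketch below (Python) projects a smooth $w(t)$ onto the first $p$ orthonormal basis functions and shows the $L^2$ error shrinking as $p$ grows. The choice of shifted Legendre polynomials on $\chi = [0, 1]$ and the target function are assumptions made purely for illustration; the thesis itself uses the natural cubic spline bases introduced next.

```python
import numpy as np
from numpy.polynomial import legendre

def psi(p, t):
    # Orthonormal shifted Legendre basis on [0, 1]:
    # psi_k(t) = sqrt(2k + 1) * P_k(2t - 1), with psi_0(t) = 1.
    return np.stack([np.sqrt(2 * k + 1) * legendre.legval(2 * t - 1, [0] * k + [1])
                     for k in range(p)])

t = np.linspace(0.0, 1.0, 2001)
w = np.exp(-2 * t) * np.sin(4 * t)           # a hypothetical smooth w(t)

for p in (2, 4, 8):
    basis = psi(p, t)                        # p x len(t) matrix
    mu_p = np.trapz(basis * w, t, axis=1)    # mu_k = <w, psi_k> by quadrature
    w_p = basis.T @ mu_p                     # truncation w_p(t) = Psi_p(t)' mu_p
    err = np.sqrt(np.trapz((w - w_p) ** 2, t))
    print(p, err)                            # L2 error decreases as p grows
```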
There are many bases available in the literature for curve fitting. Among global bases are Legendre polynomials and Fourier series; among local bases are regression splines (Eubank, 1988), B-splines (de Boor, 1978), and natural splines (Green and Silverman, 1994). A B-spline of degree $d$ on $\chi$ with knots $t_0 < t_1 < \cdots < t_M < t_{M+1}$ is a piecewise polynomial with polynomial pieces of degree $d$ joining together smoothly at the interior knots $t_1 < \cdots < t_M$ while satisfying some boundary conditions. In other words, a B-spline is a polynomial of degree $d$ within each of the intervals $[t_k, t_{k+1})$, $0 \le k \le M-1$, and $[t_M, t_{M+1}]$, and globally has $d-1$ continuous derivatives. All such B-splines form a linear space with $M + d + 1$ basis functions, which are mainly determined by three factors: the degree $d$, the location of the knots, and the number of interior knots $M$. When the degree $d = 3$, the corresponding B-splines are called cubic splines. When cubic splines have zero second and third derivatives at the two extreme knots $t_0$ and $t_{M+1}$, they are called natural cubic splines. Without loss of generality, throughout this thesis we assume that the nonparametric fixed- and random-effects functions $w(t)$ and $h_i(t)$ are elements of the Sobolev space $W_2^2(\chi)$, and we use natural cubic spline bases (Green and Silverman, 1994) because of their many good properties, for example easy construction, good smoothness, and the flexibility to model underlying curves of various shapes (de Boor, 1978).

2.2.3 Percentile-based Knot Placing for Splines

The placing of knots is an important issue for splines, in which we attempt to use a few knots to represent a sample of design time points. We use sample percentiles of the design time points as knots, so that there are more (fewer) knots in areas where more (fewer) design time points are available, as suggested by Wu and Zhang (2002). They indicated that the percentile-based knot placing rule should work better for longitudinal data than the equally-spaced knot placing rule used by Rice and Wu (2001), since the design time points of longitudinal data are usually sparse and often not uniformly spaced. Moreover, the percentile-based rule guarantees that the locations of the knots (and hence the resulting basis functions) are sample-dependent and design-adaptive; these properties are not shared by the equally-spaced rule. After the degree $d$ and the knot placing rule are determined, we need to choose the number of interior knots, or equivalently the numbers $p$ and $q$ of basis functions, which are called smoothing parameters.
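A minimal construction of such a basis is sketched below (Python; the truncated-power formula for natural cubic splines follows standard references, and the interior knots are placed at equally spaced sample percentiles — an illustration, not the thesis's own code).

```python
import numpy as np

def percentile_knots(times, M):
    # M interior knots at equally spaced sample percentiles of the pooled
    # design time points, as suggested by Wu and Zhang (2002).
    probs = np.linspace(0, 100, M + 2)[1:-1]
    return np.percentile(times, probs)

def natural_cubic_basis(t, knots):
    # Truncated-power representation of the natural cubic spline basis;
    # `knots` includes the boundary knots, and K knots give K basis functions.
    t = np.asarray(t, dtype=float)
    K = len(knots)
    def d(k):
        return ((np.maximum(t - knots[k], 0.0) ** 3
                 - np.maximum(t - knots[K - 1], 0.0) ** 3)
                / (knots[K - 1] - knots[k]))
    cols = [np.ones_like(t), t] + [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)

# Example with the viral-load measurement schedule of Section 1.5 (in days):
times = np.array([0, 2, 7, 10, 14, 21, 28, 56, 84, 168, 336], dtype=float)
knots = np.concatenate(([times.min()], percentile_knots(times, 3), [times.max()]))
Psi = natural_cubic_basis(times, knots)   # 11 x 5 basis matrix
```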
2.2.4 Selection of Smoothing Parameters

Using natural cubic spline bases with percentile-based knots to fit the nonparametric fixed- and random-effects functions $w(t)$ and $h_i(t)$, we can transform the semiparametric NLME model into a parametric NLME model. To assess how well the resulting parametric NLME model approximates the original semiparametric NLME model, we need to consider two factors: goodness-of-fit and model complexity. Goodness-of-fit indicates how well the model fits the data (or how small the biases of the associated estimators are); it can be improved by increasing $p$ and $q$, or equivalently by enlarging the linear spaces $S(\chi, \Psi_p)$ and $S(\chi, \Phi_q)$. Model complexity represents how complex the model is (or how large the variances of the associated estimators are); the model usually becomes more complicated as $p$ and $q$ increase. Thus, there is a trade-off between goodness-of-fit and model complexity. To balance the two components, it is natural to employ model selection rules such as the Akaike Information Criterion (AIC) or Schwarz's Bayesian Information Criterion (BIC) (Davidian and Giltinan, 1995), because the transformed parametric NLME models are indexed by $p$ and $q$, and choosing different $p$ and $q$ is equivalent to choosing different parametric NLME models.

Let $\varphi$ be the number of independent parameters in a parametric NLME model, say model (1.1) and (1.2). Then the AIC and the BIC are defined as

$$\mathrm{AIC} = -2\,\mathrm{Loglik} + 2\varphi, \qquad \mathrm{BIC} = -2\,\mathrm{Loglik} + \Big[\log\Big(\sum_{i=1}^n n_i\Big)\Big]\varphi,$$

where Loglik is the log-likelihood of the fitted parametric NLME model (see Davidian and Giltinan, 1995, p. 156). Since a parametric NLME model with a larger number of parameters always produces a larger log-likelihood (a smaller value of $-2\,\mathrm{Loglik}$), the penalty terms $2\varphi$ in AIC and $[\log(\sum_{i=1}^n n_i)]\varphi$ in BIC are needed to offset this advantage. Since the penalty term in BIC is usually much larger than that in AIC, BIC is a conservative rule and generally favors a parsimonious model. Since both the AIC and the BIC of a parametric NLME model are defined as twice the negative log-likelihood of the model (representing goodness-of-fit) plus a penalty term related to the number of parameters used in the model (representing model complexity), we choose $\Psi_p(t)$ and $\Phi_q(t)$ so that the AIC or the BIC is minimized over a series of candidate bases $\Psi_p(t)$ and $\Phi_q(t)$; this yields the parametric NLME model that best approximates the original semiparametric NLME model in terms of the AIC and BIC criteria (see the sketch at the end of Section 2.2.5). Liang et al. (2003) noted that the model obtained this way often provides a good approximation in practice. We evaluate the performance of the AIC and the BIC in the current setting in Section 4.6.

2.2.5 Transformation of the Semiparametric NLME Model

After determining the smoothing parameters $p$ and $q$ via the AIC and BIC criteria, we replace $w(t)$ and $h_i(t)$ in the nonparametric function $r_i(t)$ in (2.3) by their approximations $w_p(t)$ and $h_{iq}(t)$. We thereby obtain an approximation to the nonparametric function $r_i(t)$, and we approximate the semiparametric NLME model (2.4) as follows:

$$y_{ij} \approx g\big(t_{ij}, d^*(\mathbf{z}_{ij}^*, \boldsymbol{\beta}^*, \mathbf{b}_i^*), v(\Psi_p(t_{ij})^T \boldsymbol{\mu}_p, \Phi_q(t_{ij})^T \boldsymbol{\xi}_{iq})\big) + e_{ij} \equiv g\big(t_{ij}, d(\mathbf{z}_{ij}^*, \boldsymbol{\beta}, \mathbf{b}_i)\big) + e_{ij}, \qquad (2.5)$$

where $\boldsymbol{\beta} = (\boldsymbol{\beta}^{*T}, \boldsymbol{\mu}_p^T)^T$ are fixed effects, $\mathbf{b}_i = (\mathbf{b}_i^{*T}, \boldsymbol{\xi}_{iq}^T)^T$ are random effects, and $d(\cdot)$ is a known but possibly nonlinear function. Then we can approximate the semiparametric NLME model (2.1)-(2.3) by the following parametric NLME model:

$$y_{ij} = g\big(t_{ij}, d(\mathbf{z}_{ij}^*, \boldsymbol{\beta}, \mathbf{b}_i)\big) + e_{ij}, \qquad \mathbf{e}_i \,|\, \mathbf{b}_i \sim N(0, \delta^2 I), \qquad (2.6)$$
$$\mathbf{b}_i \sim N(0, B), \qquad (2.7)$$

where $B$ is an unstructured covariance matrix. Note that $\mathbf{e}_i$ and $\mathbf{b}_i$ are independent of each other. Approximate statistical inference can then be based on the approximate model (2.6) and (2.7), as shown in Chapters 3-5.
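Combining Sections 2.2.4 and 2.2.5, selecting the smoothing parameters amounts to fitting the transformed parametric NLME model over a grid of $(p, q)$ and comparing the criteria. A minimal sketch follows (Python; `fit_parametric_nlme` is a hypothetical routine standing in for whatever NLME fitting software is used, assumed to return the maximized log-likelihood and the number of independent parameters).

```python
import itertools
import numpy as np

def select_pq(y, t, z, p_grid, q_grid, fit_parametric_nlme, n_total):
    # Grid search over the smoothing parameters (p, q).  The hypothetical
    # routine fit_parametric_nlme(y, t, z, p, q) fits the transformed
    # parametric NLME model (2.6)-(2.7) with p fixed-effect and q
    # random-effect basis functions and returns (loglik, n_params).
    best = None
    for p, q in itertools.product(p_grid, q_grid):
        if q > p:                   # the data analyses later consider q <= p
            continue
        loglik, n_par = fit_parametric_nlme(y, t, z, p, q)
        aic = -2.0 * loglik + 2.0 * n_par
        bic = -2.0 * loglik + np.log(n_total) * n_par   # n_total = sum_i n_i
        if best is None or bic < best["bic"]:
            best = {"p": p, "q": q, "aic": aic, "bic": bic}
    return best
```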
2.2.6 Consistency of the Estimate of w(t)

After we obtain estimates $\hat{\boldsymbol{\mu}}_p$ and $\hat{\boldsymbol{\xi}}_{iq}$ based on the parametric NLME model (2.6) and (2.7), we can estimate the nonparametric functions $w(t)$ and $h_i(t)$ in the semiparametric NLME model (2.1)-(2.3) by

$$\hat{w}(t) = \hat{w}_p(t) = \Psi_p(t)^T \hat{\boldsymbol{\mu}}_p, \qquad \hat{h}_i(t) = \hat{h}_{iq}(t) = \Phi_q(t)^T \hat{\boldsymbol{\xi}}_{iq}.$$

Therefore, the consistency of the estimates in the semiparametric NLME model (2.1)-(2.3) is closely tied to the consistency of the estimates in the parametric NLME model (2.6) and (2.7). Under mild conditions, Theorem 2.1 below guarantees a consistent estimate $\hat{w}(t)$ of the nonparametric fixed-effects function $w(t)$ whenever we can find $\sqrt{n}$-consistent estimates $\hat{\boldsymbol{\mu}}_p$ of the fixed-effects coefficients $\boldsymbol{\mu}_p$.

Following Wu and Zhang (2002), we prove the consistency of the estimate $\hat{w}(t)$ under the following conditions:

(a) $\Psi(t)$ is a complete orthonormal basis of $S(\chi)$, a subspace of $L^2(\chi)$.

(b) The nonparametric fixed-effects function $w(t) \in S(\chi)$, so that $w(t) = \sum_{k=0}^{\infty} \mu_k \psi_k(t)$.

(c) The design time points $\{t_{ij},\ i = 1, \ldots, n,\ j = 1, \ldots, n_i\}$ are identically and independently distributed, so that as the number $n$ of individuals tends to infinity, the number of distinct time points tends to infinity. In this case, we can truncate $w(t)$ in the semiparametric NLME model (2.1)-(2.3) as $w_p(t) = \sum_{k=0}^{p-1} \mu_k \psi_k(t) = \Psi_p(t)^T \boldsymbol{\mu}_p$, with $p \to \infty$ and $p/n \to 0$ as $n \to \infty$.

(d) For any fixed $p$, we can obtain $\sqrt{n}$-consistent estimates $\hat{\boldsymbol{\mu}}_p$ of the fixed-effects coefficients $\boldsymbol{\mu}_p$, so that as $n \to \infty$, $E(\hat{\boldsymbol{\mu}}_p) \to \boldsymbol{\mu}_p$ and $\mathrm{Cov}(\sqrt{n}\,\hat{\boldsymbol{\mu}}_p) \to \Sigma_p$ for some semidefinite positive matrix $\Sigma_p$ with $p^{-1}\mathrm{tr}(\Sigma_p)$ bounded.

Theorem 2.1. Under Conditions (a)-(d), as $n \to \infty$, we have $\|\hat{w} - w\| \to 0$ in probability.

Proof. First we consider $E\|\hat{\boldsymbol{\mu}}_p - \boldsymbol{\mu}_p\|^2$:

$$E\|\hat{\boldsymbol{\mu}}_p - \boldsymbol{\mu}_p\|^2 = E\Big[\sum_{k=0}^{p-1} (\hat{\mu}_k - \mu_k)^2\Big] = \sum_{k=0}^{p-1} E\big[\hat{\mu}_k - E(\hat{\mu}_k) + E(\hat{\mu}_k) - \mu_k\big]^2$$
$$= \sum_{k=0}^{p-1} \Big\{E\big[\hat{\mu}_k - E(\hat{\mu}_k)\big]^2 + \big[E(\hat{\mu}_k) - \mu_k\big]^2\Big\} = \sum_{k=0}^{p-1} \mathrm{Var}(\hat{\mu}_k) + \sum_{k=0}^{p-1} \big[E(\hat{\mu}_k) - \mu_k\big]^2$$
$$= \mathrm{tr}\big[\mathrm{Cov}(\hat{\boldsymbol{\mu}}_p)\big] + \big\|E(\hat{\boldsymbol{\mu}}_p) - \boldsymbol{\mu}_p\big\|^2,$$

where the cross terms vanish because $E\{[\hat{\mu}_k - E(\hat{\mu}_k)][E(\hat{\mu}_k) - \mu_k]\} = 0$. Under Conditions (a) and (b), and since $\mathrm{Cov}(\sqrt{n}\,\hat{\boldsymbol{\mu}}_p) \to \Sigma_p$ by Condition (d), we have

$$E\|\hat{w} - w\|^2 = E\|\hat{w} - w_p + w_p - w\|^2 \le 2\big\{E\|\hat{w} - w_p\|^2 + \|w_p - w\|^2\big\}$$
$$= 2\big\{E\|\hat{\boldsymbol{\mu}}_p - \boldsymbol{\mu}_p\|^2 + \|w_p - w\|^2\big\} = 2\big\{\mathrm{tr}[\mathrm{Cov}(\hat{\boldsymbol{\mu}}_p)] + \|E(\hat{\boldsymbol{\mu}}_p) - \boldsymbol{\mu}_p\|^2 + \|w_p - w\|^2\big\}$$
$$= 2\Big\{n^{-1}\,\mathrm{tr}\big[\mathrm{Cov}(\sqrt{n}\,\hat{\boldsymbol{\mu}}_p)\big] + \|E(\hat{\boldsymbol{\mu}}_p) - \boldsymbol{\mu}_p\|^2 + \|w_p - w\|^2\Big\}$$
$$= 2\Big\{n^{-1}\big[\mathrm{tr}(\Sigma_p) + p\,o(1)\big] + \|E(\hat{\boldsymbol{\mu}}_p) - \boldsymbol{\mu}_p\|^2 + \sum_{k=p}^{\infty} \mu_k^2\Big\},$$

using the orthonormality of the basis, which gives $\|\hat{w} - w_p\| = \|\hat{\boldsymbol{\mu}}_p - \boldsymbol{\mu}_p\|$ and $\|w_p - w\|^2 = \sum_{k=p}^{\infty} \mu_k^2$. Under Conditions (b)-(d), each of the three terms in braces on the right-hand side tends to 0 as $n \to \infty$: the first because $p/n \to 0$ and $p^{-1}\mathrm{tr}(\Sigma_p)$ is bounded, the second by the asymptotic unbiasedness of $\hat{\boldsymbol{\mu}}_p$, and the third because it is the tail of a convergent series with $p \to \infty$. Hence $E\|\hat{w} - w\|^2 \to 0$ as $n \to \infty$, which implies $\|\hat{w} - w\| \to 0$ in probability (by Markov's inequality). □

2.3 Measurement Errors and Missing Data in Covariates

In the presence of measurement errors and missing data in the time-varying covariates $\mathbf{z}_{il} = (z_{i1l}, \ldots, z_{ivl})^T$, we need to model the covariate processes. We consider the following multivariate linear mixed-effects (LME) model (Shah et al., 1997) to empirically describe the covariate process:

$$\mathbf{z}_{il} = U_{il}\boldsymbol{\alpha} + V_{il}\mathbf{a}_i + \boldsymbol{\epsilon}_{il} \;(= \mathbf{z}_{il}^* + \boldsymbol{\epsilon}_{il}), \qquad i = 1, \ldots, n, \; l = 1, \ldots, m_i, \qquad (2.8)$$

where $U_{il}$ and $V_{il}$ are design matrices, $\boldsymbol{\alpha}$ and $\mathbf{a}_i$ are unknown population (fixed-effects) and individual-specific (random-effects) parameter vectors, and $\boldsymbol{\epsilon}_{il}$ are the random measurement errors for individual $i$ at time $u_{il}$. For example, we may model the covariate processes parametrically based on empirical polynomial models with random coefficients, as in Higgins et al. (1997) and Wu (2002). Alternatively, we may model the covariate processes nonparametrically and approximate the nonparametric fixed- and random-effects functions by linear combinations of basis functions, as in Section 2.2. For either parametric or nonparametric covariate models, we may convert the covariate models to the LME model (2.8). Note that the covariate model (2.8) incorporates both the correlation of the repeated measurements on each individual and the correlation among different covariates. Note also that the parameters in the covariate model (2.8) may be viewed as nuisance parameters, because they are often not of main interest.
We assume that the true (unobservable) covariate values are

$$\mathbf{z}_{il}^* = U_{il}\boldsymbol{\alpha} + V_{il}\mathbf{a}_i.$$

We also assume that $\mathbf{a}_i \overset{i.i.d.}{\sim} N(0, A)$, that $\boldsymbol{\epsilon}_{il} \overset{i.i.d.}{\sim} N(0, R)$, and that $\mathbf{a}_i$ and $\boldsymbol{\epsilon}_i = (\boldsymbol{\epsilon}_{i1}^T, \ldots, \boldsymbol{\epsilon}_{im_i}^T)^T$ are independent, where $A$ is an unrestricted covariance matrix and $R$ is an unknown within-individual covariance matrix. We further assume that $\boldsymbol{\epsilon}_{il}$ and $\mathbf{a}_i$ are independent of $\mathbf{e}_i$ and $\mathbf{b}_i$. Model (2.8) may be interpreted as a covariate measurement error model (Carroll et al., 1995; Higgins et al., 1997).

To allow for missing data in the time-varying covariates (or different measurement schedules for the time-varying covariates), we recast model (2.8) in continuous time:

$$\mathbf{z}_i(t) = U_i(t)\boldsymbol{\alpha} + V_i(t)\mathbf{a}_i + \boldsymbol{\epsilon}_i(t), \qquad i = 1, \ldots, n,$$

where $\mathbf{z}_i(t)$, $U_i(t)$, $V_i(t)$, and $\boldsymbol{\epsilon}_i(t)$ are the covariate values, design matrices, and measurement errors at time $t$, respectively. At the response measurement time $t_{ij}$, which may be different from the covariate measurement times $u_{il}$, the possibly unobserved "true" covariate values can be viewed as $\mathbf{z}_{ij}^* = U_{ij}\boldsymbol{\alpha} + V_{ij}\mathbf{a}_i$, where $U_{ij} = U_i(t_{ij})$ and $V_{ij} = V_i(t_{ij})$. In other words, missing covariates at time $t_{ij}$ may be imputed by their estimated true values $\hat{\mathbf{z}}_{ij}^*$.
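As an illustration of how (2.8) supplies covariate values at the response times, the sketch below fits a quadratic random-coefficients model to CD4 counts and evaluates $\hat{\mathbf{z}}_i^*(t)$ at an arbitrary time $t$. It uses Python with statsmodels; the data frame `cd4`, its column names, and the quadratic form are assumptions consistent with the roughly quadratic CD4 trajectories noted in Section 1.5.

```python
import numpy as np
import statsmodels.formula.api as smf

# `cd4` is assumed to be a long-format pandas DataFrame with columns
# "patient", "time", and "cd4" (suitably standardized CD4 counts).
fit = smf.mixedlm("cd4 ~ time + I(time**2)", cd4,
                  groups=cd4["patient"],
                  re_formula="~time + I(time**2)").fit()

def true_cd4(patient, t):
    # Estimated true covariate z*_i(t) = U_i(t) alpha-hat + V_i(t) a-hat_i,
    # usable at any t, including the viral-load measurement times t_ij.
    U = np.array([1.0, t, t ** 2])
    return (U @ fit.fe_params.to_numpy()
            + U @ fit.random_effects[patient].to_numpy())
```

The per-subject random effects returned by the fit are empirical Bayes (shrinkage) estimates, so the imputed trajectory borrows strength across subjects rather than interpolating each subject's noisy measurements.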
Chapter 3
A Joint Model for Semiparametric NLME Models with Covariate Measurement Errors and Missing Data

3.1 Introduction

In this chapter, we address measurement errors and missing data in time-varying covariates for semiparametric NLME models. In Section 3.2, we review the two-step method proposed by Higgins et al. (1997). We derive some analytic and asymptotic results for the two-step estimates for parametric NLME models with mis-measured covariates, and we show analytically that the variances of the main parameter estimates based on the two-step method are underestimated.

To address measurement errors and missing data in time-varying covariates in semiparametric NLME models based on models (2.6)-(2.8), in Sections 3.3 and 3.4 we propose two joint model methods, implemented by a Monte Carlo EM algorithm and by a first-order Taylor approximation to the log-likelihood function respectively, to find approximate MLEs of the model parameters. We also discuss asymptotic properties of these approximate MLEs. In Section 3.5, we apply the two joint model methods to a real dataset, and we evaluate the proposed methods and compare them with the two-step method via simulation studies. We conclude this chapter with some discussion in Section 3.6. Proofs of the asymptotic properties of the approximate MLEs are presented in Section 3.7.

3.2 A Two-step Method

For covariate measurement error problems, a commonly used method is the so-called two-step method (Higgins et al., 1997; Liang et al., 2003): in the first step, the "true" covariate values are estimated based on an assumed covariate model; in the second step, the possibly mis-measured covariates in the response model are simply replaced by the estimated covariates from the first step, and estimation of the main parameters in the response model proceeds as if the estimated covariate values were the true covariate values, free of measurement error. Intuitively, the resulting estimates of the main parameters may be approximately unbiased if the covariate estimates from the first step are unbiased, but the variances of the main parameter estimates may be underestimated, because the variability of the covariate estimation in the first step is ignored in the estimation of the main parameters in the second step. Higgins et al. (1997) and Ogden and Tarpey (2005) recognized this problem and proposed bootstrap methods which incorporate the variability from estimating the covariates in the first step, and Wu (2002) considered an approximate joint model approach which also incorporates this variability. In this section, we derive some analytic results for the two-step estimates for parametric NLME models with mis-measured covariates, such as models (2.6)-(2.8); we show analytically that the variances of the main parameter estimates based on the two-step method are underestimated, and we derive some asymptotic results for the two-step estimates.
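Schematically, the two-step method and its bootstrap variance correction look as follows (Python; `fit_covariate_model` and `fit_response_nlme` are hypothetical fitting routines standing in for the LME and NLME fits, and `data.resample_subjects` is a hypothetical helper that resamples whole subjects with replacement).

```python
import numpy as np

def two_step(data, fit_covariate_model, fit_response_nlme):
    # Step 1: fit the covariate model (2.8) and estimate each subject's
    # true covariate trajectory, evaluated at the response times t_ij.
    cov_fit = fit_covariate_model(data)
    z_hat = {i: cov_fit.predict_true_covariate(i, data.t[i]) for i in data.ids}
    # Step 2: fit the response NLME model with z_hat treated as error-free.
    return fit_response_nlme(data.y, data.t, z_hat)

def bootstrap_se(data, n_boot, rng, **fits):
    # Subject-level bootstrap (in the spirit of Higgins et al., 1997):
    # refitting BOTH steps on each resample restores the step-1 variability
    # that the naive two-step standard errors ignore.
    ests = [two_step(data.resample_subjects(rng), **fits)
            for _ in range(n_boot)]
    return np.std(np.asarray(ests), axis=0, ddof=1)
```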
T h e M L E J3 of /3 satisfies a set of equations  f-'  dp  dp  1=1  where  dk{P;yi, Z j ) dp  =  dlj(P;yi, z ) P=P dp f  Taking a first-order Taylor expansion of dl((3;y, z)/d/3 around the "true" covariates z*, we  31  have  z*) n ° - Edli(P;yi, dp +E K) = E dli((3;yi, +E 8(3 d h(p- , z*) dli(P;yi, z*) + E *-<>+£>(£) =E dpdzf * dli(f3;y z*) 8 k(P;yi, z*) • = E dp +E dpdzf n d k(P; , z*) dk(P;yi, z | ) =E +E dpdzf z*) d li(P;yi, z*) = E 8h(P;yi, + E dp dpdzf r&li&yi, L  z*  •(ii - z') + 0 { | | , - z-|| ) 2  Z  d/33zf  i=i n  i=i  i=l  i=l  n  n  r c ^ f t v i , z* L  <9/3<9zf  2  yi  a/3  i=i  n  n  1=1  2  u  Z j - z*) + O I m a x Y — ) p  i=i n  L i  \ mj  / .  2  yi  Zi-z*) + O  5/3  i=i  i =n l  n  p  (minm^.  2  z» - z*) + O p ( m ) , _ 1  2=1  since z» are  9k(P;  of  »=i  y 77vconsistent /  yi,  estimates of z*. Next, carrying out a first-order Taylor expansion  z*)/dp in the above expression around /3, we can obtain  i=l  d/3d/3  i=i  J  d lj(P;yj, z*) . ( z i dpdzf 2  i=l  E t=i  d h(p; ,  dh(P;yu zj)  z*)  2  yi  a/3  dpdp  T  i=i  Z  i  )  +  Op(m  (P-P)  _  x  ) =  0  + O (n- ) 1  p  , ^ d li(P;yi, z*) *s,n, -u + 2i =^l — a / 3 5 z ( z i - Z i ) + Q (^ ) = o, 2  n  r  P  since /3 is the M L E of /3 and thus it is i/^-consistent under the necessary regularity conditions on the log-likelihood function of p. The above expression can be written as  1 ^ d h(P; , z*) n ^ dpdp i=i 2  yi  1  v  d k(p; ,z*) dpdz\ 2  yi  i=l  32  t=i  dkjp-^z*) dp  Assume that the following limits exist  lim - f > ( / 3 ) = /(/3), n — * o o  7T, * — • • ' =1  where ii(/3) = —E[d li(f3; yi, z*)/d(3d0 ] is the Fisher information matrix for individual i. 2  r  It follows from Lemma 3.2 in Section 3.7 that  1 A  Note that -\/n( i z  —  z  \)  ~  - &kfay  z?)  u  \/Ci  m  i{ i z  —  z  d h(i3; ,  z*)  2  n  yi  p  *) for large n and m^. Using the asymptotic normality  of Z j , we know that for large m*, £'[y rf^'(zi — z*)] « 0 and Cov[y m^(z — z*)] « fl .. /  /  i  z  By  Lindeberg's central limit theorem, we have  It follows from Slutsky's theorem that  1^  d l 0-y , 2  i  i  z*)  l^U  r  d k(J3;yi, z*) 2  d  Under the necessary regularity conditions on the log-likelihood function of (3, based on the standard arguments for showing asymptotic normality of M L E s , we can obtain  and  1 A d li(f3;yi, z*) 2  ^  d(3df3  T  P  I  m  Since y and z are roughly independent, the two limit random variables with distributions  N(0, Q ) and N(0, I((3)) are roughly independent. Note that n/m = 0(1). Putting these z  pieces together, we have asymptotic normality of y/n(/3 — (3) as follows:  • ^cp-ft^N^rm-'+m-^jipy 1]. 33  (3.1)  Note that the variance-covariance matrix of (3 based on the asymptotic distribution (3.1) has two components. T h e first component I(f3)~ is the "naive" estimate of the l  variance-covariance matrix of /3 in the two-step method, in which the variability of z» is neglected. The second component  /(/3)  _1  O  z  /(/3)  _1  arises from the variability in estimating  Z j and summarizes the extra uncertainty of (3 due to estimation of Z j . component I{[3)~ fl 1  z  7(/3)  _1  Since the second  is nonnegative definite, the variances of the main parameter  estimates based on the two-step method are underestimated.  
3.3  A Joint Model Method for Likelihood Inference  3.3.1  The Likelihood for the Joint Model  We consider likelihood inference for semiparametric N L M E models with measurement errors and missing data in time-varying covariates, based on the approximate parametric N L M E models (2.6) - (2.8). ( a , (3, 5 , 2  The observed data are {(yi,  z*),  i =  l,...,n}.  Let 6 =  R, A, B) be the collection of all unknown parameters in models (2.6) - (2.8).  We assume that the parameters a , (3, 5 , R, A, and B are distinct. Let /(•) be a generic 2  density function, and let [X\Y] denote a conditional distribution of X given Y. T h e approximate log-likelihood for the observed data {(yi, Z j ) , i = 1,... ,n} can be written as  i=i  J  34  where  /y(yi|zi,  = TJpli M y i j l y > *> *;  a, (3, 5 )  z  2  a  A <^) 2  b  = n;ii ( 2 ^ ) - ^ e x p { - [ y - g(t , d(v%a + v^ai, /3, b,))] /2<5 }, 2  y  l3  / (zi|ai; a, i?) = T J ^ i /z(ifcl *; a, i?) z  a  z  = ]ir=i M " / 1  x(z -  U i  ifc  f  c  2  exp{-(zi - u fc  Q  2  1  i  )  fl) = |2^fi|- /2 p{-bf/3- bi/2}. 1  i;  R~  l  ik  fc  l  /(b  cx - w  Vi ai)/2},  -  /(a,; A)" = | 2 7 r A | - / e x p { - a M - a / 2 } 1  ifc  1  eX  This approximate log-likelihood function generally does not have a closed-form expression since the functions in the integral can be nonlinear in the random effects ai and bj. Exact likelihood calculations therefore require numerical evaluation of an integral whose dimension is equal to the dimension of the random effects (a  i(  b;).  This is straightforward to do by  direct numerical integration such as Gaussian quadrature when the dimension of (ai, bi) is very small (say, 1 or 2). However, when (a^, bi) has a dimension of 3 or more as is often the case in practice, one needs to consider alternative methods such as computationally intensive Monte Carlo methods. Laird and Ware (1982) obtained M L E s in L M E models using the E M algorithm. Here we use a Monte Carlo E M ( M C E M ) algorithm to find the approximate M L E s of all parameters 6. B y treating the unobservable random effects ai and bi as additional "missing" data, we have "complete data" { ( y , , Z j , ai, bi),  i = l,...,n}.  The complete-data log-  likelihood function for all individuals can be expressed as  c{Q)  l  = E c(0) l  = £ { l o g /y(yj| i, z  i=i  ai, b a, (3, 5 ) + log 2  i;  i=i  where  z  z  a, R) (3.2)  + log/(a ;yl) + log/(b ;B)} i  f { i\^\  i  )  is the complete-data log-likelihood for individual i.  35  2  3.3.2  A  M C E M  Method  The E M algorithm (Dempster, Laird, and Rubin, 1977) is a very useful and powerful algorithm to compute M L E s in a wide variety of situations, such as missing data and randomeffects models. The E M algorithm iterates between an E-step, which computes the conditional expectation of the complete-data log-likelihood given the observed data and previous parameter estimates, and a M-step, which maximizes this conditional expectation to update parameter estimates. The computation iterates between the E-step and M-step until convergence leads to the M L E s (or local maximizers).  When there are several modes in the  conditional expectation, the M L E s can be determined by trying different parameter starting values. For our models, the E-step is quite intractable due to nonlinearity, so we use Monte Carlo methods to approximate the intractable conditional expectations.  In the M-step, we  use standard complete-data optimization procedures to update parameter estimates. 
Let 6® be the parameter estimates from the i-th E M iteration.  The E-step for  individual i at the (t + l)th E M iteration can be written as  Qi(9\0W) =  E(l®(0)\ ,  z; flW)  yi  f  log /y(y;|zi, ai, bi; a, (3, S ) + log / (zj|aj; a, R) 2  z  + log /(ai; A) + log /(bi; B) x /(ai, b;|v;, z ; 0(*>)  db  {  =  {  J « ( a , (3,5 ) + / W ( a , R) + I®(A) + I®(B).  (3.3)  2  The above integral generally does not have a closed form, and evaluation of'the integral by numerical quadrature is usually infeasible, except for simple cases. However, note that expression (3.3) is an expectation with respect to the conditional distribution  /(ai, bi|yi, z*;  0®),  and it may be evaluated using the M C E M algorithm of Wei and Tanner (1990), as in Ibrahim et al. (1999, 2001). Specifically, for individual i, let random sample of size k generated from t  {(af \ bf >),..., (af , bf )} denote a  [a;, bj|yj, z*; 0^]. 36  ()  Note that each  t}  ( a f b f ^) de-  pends on the E M iteration number t, which is suppressed throughout. Then we approximate the conditional expectation Qi(0\0®)  in the E-step by its empirical mean, with missing data  replaced by simulated values, as follows  = IT E log fviyiK  (5, 6>) + i £ log / ( * | a ? > ; a , J2)  a ? \ bf> «,  z  ;  k=l  fc=l  + i E l o g /(af U ) + £ E log  /(bi  ;B).  F C )  fc=i fc=i + fc _i/c, t — 1, 2 , 3 , . . . , for some positive  We may choose ko as a large number and k = k -\ t  t  t  constant c, in the t-th iteration. Increasing k with each E M iteration may speed up the E M t  convergence (Booth and Hobert, 1999). The E-step at the (t + l ) t h E M iteration can then be expressed as  Q(o\o®)  » E  = ± QMeW) i=i  = E  E  i=i  \r i  £log M v t K  E ^ ( 0 ; vu ~^t\ b ? ) } }  t  fe=i  j  (3, <5) + ±  a f , b?\a,  i=lfc=l  £ log / ^ a f ;  a , i?)  i=lfc=l  +E E 1=1  E  2  i log /(af ; A) +  E E  5  fc=l  2=1 fc=l  i log  /(b!  f c )  ; B)  = Q ^ a , /3, r5|6>W) + Q ( ) ( a , i?|6»W) + < 2 ( ) ( A | 0 « ) + 2  2  Q^O^M).  3  To generate independent samples from [a  bj|  i(  y i  0^], we use the Gibbs sampler  , z; {  (Gelfand and Smith, 1990) by sampling from the two full conditionals [ a j | , Zj, b ^ 0^] and yi  f? ] as follows. (t)  [ b i | y j , Zi,  /(ailyi, z  i ;  b  i ;  0<*))  oc / ( a i , y ^ Z j , b  i ;  0(*>) = / ( a ^ , b  = /(ai|zi;6>W)-/ (yi|zi, y  / ( b i | y i , Zi, ai; 0 ) (t)  a i  i ;  0<*>) • f (yi\zi,  0 « ) • / ^ ( Z i | a i ; 0W) • M y ^ ,  oc / ( b i ,  y i  a,; 0 )  i ;  b^  0(*>)  ai, b , ; 0 « ) ,  = / ( b i | z i , a^ 0 )  W  a  , bi;6»W)  oc / ( a , ;  |zi,  Y  W  • / y ( v i | z i , a*,  (3.4) fy;  0W)  = /(b ;0W)-/y(y |z iai,b ;0(*)), i  i  i  i  where a i and b i are independent each other. Monte Carlo samples from the above full conditionals can be generated using rejection sampling methods, as in W u (2004). Alternatively, 37  integral (3.3) may be evaluated using the importance sampling method.  Details of these  sampling methods and convergence issues will be investigated in the next section. The M-step then maximizes Q(0\0^) to produce an updated estimate #(  t+1  ' at the  (t + l)-th iteration. Note that the parameters cx, (3, 6 , R, A, and B are all different, so we 2  can update the parameters ( a , (3, 5 , R), A, and B by maximizing  + Q^ \  2  and  2  separately in the M-step.  
(3 \  The maximizer (a<- \  {t+l  t+1  R ^ ) for  5^ \ 2  t+l  +  may be computed via  iteratively re-weighted least squares where the random effects are replaced by their simulated values {(af \ ( (*+D a  )  /3  (t+D  b f >)}:  j  <  J  2 (  t  +  l )  j  =  a  r  g  m  a  {Q(D(  x  CX,j3,S ,R  Q)/  3 52| ( ) )(  0  t  ) +  g(2)  ( a ) j R  | (t) 0  ) }  2  { 5I5Ir 6  +  j  k  n  t  EEr i=i fc=i  l o  K  lo  M Y i N , a f \ b f ^ a , /3, <5)  g^(^l^  2  f e )  ;«^)j-  t  ( - ) 3  5  J  In general, the function in (3.5) is nonlinear in parameters and thus, the maximizers have no closed-form expressions.  The maximizers could be obtained via standard optimization  procedures for complete-data nonlinear models, such as the Newton-Raphson method. Note that optimization procedures for nonlinear models may be iterative as well. We can use the following Lemma to obtain analytic expressions of the maximizer A< > for t+1  and the maximizer B  ^ for  L e m m a 3.1. (Seber, 1984). Consider the matrix function  h(E) = l o g | E | + t r [ E n ] . _1  If Q is positive definite, then, subject to positive definite E , /i(E) is minimized uniquely at E =  a 38  By Lemma 3.1,  =  the maximizer A^ ^  y  A n  kt  ^  argmax^^-log/CafM) i=i k=i "  ='  arg max £  £  -  fe=l  ^  fc=l  argnunlnloglAI  [af >f ^"^ [af >]}  + i^^alY^- !^]}  argmmjnlogMI + argminjn  [fi  4  1  i = l fe=l  i^^trltaf)]^- ^])} 1  i = l fc=l  4  =  (*)]}  A  *  4  =  1  { - ± log | 2 T T A | - | [ g W p -i  arg mm £ £ I { log \A\ + i=l  =  * i  fct  i=l  =  can be written as  argmax{Q^(A|6>W)} •  =  for  t+1  log | A | +  ± ^ j S r ^ T [af] 1  [aj ] )} fc)  r  2=1 fc=l  4  ^-trL" £ | > f ] [af'f }} 1  =  a r g m i n j n log | A| +  I  4  i=l  J  k=l  • = rgmm{log|4+tr{/l-' i-^E^* ]N* ] '}} )  7  2=1 K = l  -  Similarly,  )  ;  a  C  i=l  k=l  the maximizer 73 fl  (i+1)  =  (4+1  for  )  argmax{Q >(#|0( ))} (4  B =  can be obtained by 4  n  kt  i=i  fc=i  -.  argnwx££-log/(bi  f  c  )  ;i?)  4  = ^EE^'iibfr. 4  1 = 1fc= l  To obtain the asymptotic variance-covariance matrix of the M L E 9, we can use the formula of Louis (1982), which involves evaluating the second-order derivative of the complete39  data log-likelihood function. Alternatively, we may consider the following approximate formula (McLachlan and Krishnan, 1997). Let Sc \ = dlc^/dO, where li^ is the complete-data log-likelihood for individual i.  Then an approximate formula for the variance-covariance  matrix of 0 is  -l Cov(0) =  $>(s«|  V l  , z  l 5  0) E{sf\ , Vi  z  i ;  9f  where the expectations can be approximated by Monte-Carlo empirical means, as above. In summary, the foregoing M C E M algorithm proceeds as follows. Step 1. Obtain an initial estimate of 9 = 0 ^ based on a naive method such as the two-step method, and set a f  = 0 and bj ^ = 0. 0  Step 2. A t the (t+ l)th (t > 0) iteration, obtain Monte Carlo samples of the "missing data" (a,, bj) using the Gibbs sampler along with rejection sampling methods by sampling from the full conditionals [a^y;, z;, b,; 0^] and [b^y^, Z j , a^; 0 ^ ] , or using importance sampling methods, to approximate the conditional expectation in the E-step. Step 3. Obtain updated estimates 9^ \ +1  using standard complete-data optimization  procedures. Step 4. Iterate between Step 2 and Step 3 until convergence.  
3.3.3  Sampling  Methods  Gibbs Sampler For the proposed Monte Carlo E M algorithm, we can see that generating samples from the conditional distribution [a*, bj|yj, Zj'; 0^] is an important step for implementing the E-step of the Monte Carlo E M algorithm. The Gibbs sampler (Gelfand and Smith, 1990) is a popular method to generate samples from a complicated multi-dimensional distribution by sampling from full conditionals in turn,.until convergence after a burn-in period. Here, we  40  use the Gibbs sampler to simulate the "missing" random effects a^ and bv Set initial values ( a | ° \ b- ^). If the current generated values are (a\ \ 0  k  b f ) , we can obtain ( a f  + 1  \  h \  k  a s  follows: Step 1. Draw a sample for the "missing" random effects a\ from h  the full conditional  z , b| ; 0 W ) . fc)  t  '  Step 2. Draw a sample for the "missing" random effects  /(tx|  y > i  h\ ^ from the full conditional k+1  af ; 0<*>). +1)  We assess the convergence of the Gibbs sampler by examining time series plots and sample autocorrelation function plots.  After a sufficiently large burn-in of r iterations,  the sampled values will achieve a steady state as reflected by the time series plots. Then, {(af,  b| ')} can be treated as a sample from the multidimensional density function fc  /(ai,  bi|yi,  0(*>).  If we choose a reasonably large gap r' (say r' = { ( a f ° , b\ ),k k)  = r + r',r + 2r',...}  10), we can treat the sample series  as an independent sample from the multidimensional  density function. T h e simplest choice for initial values (a\°\ bf^) is (0,  0).  Multivariate Rejection Algorithm Sampling from the two full conditionals can be accomplished by rejection sampling methods as follows. If the density functions are log-concave in the appropriate parameters, the adaptive rejection algorithm of Gilks and Wild (1992) may be used, as in Ibrahim et al. (1999). However, for arbitrary N L M E models, some densities may not be log-concave. In such cases, the multivariate rejection sampling method (see Section 3.2 in Geweke, 1996) may  be used to obtain the desirable samples. 41  Booth and Hobert (1999) discussed such a  method in the context of complete-data generalized linear models, which can be extended to our models. For example, consider sampling from / ( a j | y j , z / ( z j | a j ; 0 ^ ) - / ( y j | z j , a^, b i ; 0^)  i ;  bjj 0®) in (3.4). Let /*(a,) =  and c = sup{/*(u)}. We assume c < oo. A random sample u  from / ( a j | y i ,  Z i  , b^ 0^)  can then be obtained as follows by multivariate rejection sampling:  Step 1. Sample a* from / ( a ^ 0^),  and independently, sample w from the uniform  (0, 1) distribution. Step 2. If w < /*(a*)/<r, then accept a*, otherwise, go back to step 1. Samples from  / ( b j | y j , Zj,  a*;  0^) can be obtained in a similar way. Therefore, the  Gibbs sampler in conjunction with the multivariate rejection sampling can be used to obtain samples from [ a i , bj|yj, z*; 0^].  Booth and Hobert (1999) noted that, when it is easy to  simulate from the assumed densities, the multivariate rejection sampling method can be very fast even if the acceptance rate is quite low.  Importance Sampling When the dimensions of a^ or bj are not small, however, the foregoing rejection sampling methods may be slow. In this case, we may consider importance sampling methods where the importance function can be chosen to be a multivariate Student t density whose mean and variance match the mode and curvature of /(a*,  bi|  y i  ,  Z  i  ;  0^). 
Note that a mul-  tivariate t distribution, which has heavier tails than a multivariate normal distribution, will produce a more robust approximation since underestimating the tails can have serious consequences such as unstable behavior that may be difficult to diagnose.  Booth and Hobert  (1999) discussed an importance sampling method for complete-data generalized linear models. Here, we may extend their method to our models and use importance sampling methods to approximate the integral in the E-step. Specifically, we write  /(a,, b i | y  i ;  Z i ; 0^) = 5exp[/i(ai, b i ) ] , 42  where s is an unknown normalizing constant.  Let h(a^, bj) and h(a^, bj) be the first and  second derivatives of h(a^, bj) respectively, and let (a*, b*) be the solution of /i(aj, bj) = 0, which is the maximizer of h(ai, bj). variance of  Then, the Laplace approximations of the mean and  /(aj,bj|yj, z$; 0®) are (a*, b*) and — (h(a*, b*))  _1  respectively.  Suppose that  {(a*' ', b*' ^),..., (a*^'', b*^)} is a random sample of size k generated from an importance 1  1  t  function  /i*(aj, bj), which is assumed to have the same support as /(aj, bj|vj,  ZJ;  0^). Then  we have  where  it)  /(a; ,b; W  (fc)  |y ,z t  i;  QW)-  h*(af\b*M) are importance weights. Other sampling methods have also been proposed (e.g. McCulloch, 1997). For the above sampling methods, the adaptive rejection method is applicable only when the appropriate densities are log-concave, while the multivariate rejection sampling method and the importance sampling method are applicable in general. Adaptive and multivariate rejection sampling methods may be efficient when the dimensions of the random effects and the sample sizes are small. When the dimension of the integral in the E-step is high, however, rejection sampling methods can be inefficient due to low acceptance rate. If the sample size is not small, importance sampling methods may be more efficient than rejection sampling methods since in this case the importance function may closely resemble the true conditional distribution.  43  3.3.4  Convergence  For Monte Carlo E M algorithms, the incomplete-data log-likelihood is not guaranteed to increase at each iteration due to the Monte Carlo error at the E-step.  However, under  suitable regularity conditions, Monte Carlo E M algorithms still converge to the M L E s (Chan and Ledolter, 1995). When applying the Monte Carlo E M algorithm, Monte Carlo samples for the "missing" random effects are drawn at each E M iteration.  Consequently, Monte  Carlo errors are introduced. The Monte Carlo errors are affected by the Monte Carlo sample size. It is obvious that larger values of the Monte Carlo sample size fc will result in more t  precise but slower computation. A common strategy is to increase k as the number t of E M t  iterations increases (Booth and Hobert, 1999). For sufficiently large values of k , the Monte t  Carlo E M algorithm would inherit the properties of the exact versions, such as the likelihood increasing properties of E M , but this would substantially increase the computational work load.  Thus, we usually use a relatively small k at initial iterations, and then increase k t  t  with the iteration number t. If the Monte Carlo error associated with 0 (  t+1  ) is large, the (t + l ) t h iteration of  the Monte Carlo E M algorithm is wasted because the E M step is swamped by the Monte Carlo error. Booth and Hobert (1999) proposed an automated method for choosing k in the t  context of complete-data generalized linear models. 
Their method can be extended to our case in a straightforward way as follows. Let  Q  ( 1 )  0Q(fl|flC>)  (0|0«)  dd d Q(0|0W) 2  Q >(0|0 ) (2  (t)  and let 0*( ) be the solution to Q^(G\6^) <+1  dddO  T  = 0. When the simulated samples are indepen-  dent, it can be seen that the conditional distribution of [0^ ^|0W] is approximately normal +1  44  with mean 0*( ) and a covariance matrix that can be estimated by t+1  C o v ( 0 ( ) | 0 « ) = Q(2)(0*(*+D|0(i))-i Cov(Q (0* i+1  (1)  (t+1)  |0W)) Q(2)( >*( )|0(*))- , t+1  1  6  where  C o v ^ V ^ I ^ ) ) = £ E | ^log/(yi,z  (a[ \ k  i >  ai  f c  \b^^*  + 1  ))|  },  b f ) are simulated samples, and Wik are the importance weights when the importance  sampling is used and are all set to 1 when rejection sampling methods are used. After the (t + l)th iteration, we may construct an approximate 100(1 — a)% confidence ellipsoid for 0*(*+) 1  Dasec  }  o  n  the above normal approximation. The E M step is swamped by the Monte  Carlo error if the previous value 0^ lies in the confidence ellipsoid, and in that case we need to increase k . For example, we may set kt to be & _ i + h-i/c for some positive constant t  t  c and appropriate k . Increasing k with each iteration may speed up the E M convergence 0  t  (Booth and Hobert, 1999). Note that this method of choosing k is completely automated. t  The proposed Monte Carlo E M algorithm often works well for models with a small dimension of random effects. When the dimension of random effects is not small, however, the proposed M C E M algorithm and Gibbs sampler may converge very slowly or even may not converge. Therefore, in the next section, we propose an alternative approximate inference method which may avoid these convergence difficulties and may be more efficient in computation.  45  3.4 3.4.1  A Computationally More Efficient Approximate Method The Need  for a n A l t e r n a t i v e M e t h o d  The Monte Carlo E M method in the previous section may be computationally very intensive and may offer potential computational problems such as slow or non-convergence, especially when the dimensions of the random effects a; and b; are not small. When the dimensions of the random effects are not small, sampling the random effects in the E-step may lead to inefficient and computationally unstable Gibbs sampler, and may lead to a high degree of auto-correlation and lack of convergence.  To overcome these difficulties, in this section  we propose an alternative approximate method by iteratively using a first-order Taylor approximation to the nonlinear models.  The proposed method avoids sampling the random  effects and provides analytic expressions for parameter estimates at each iteration, So it may be preferable when the Monte Carlo E M method exhibits computational difficulties. Alternatively, the proposed method in this section can be used to obtain excellent parameter starting values for the Monte Carlo E M method. For complete-data N L M E models, approximate methods have been widely used, and these approximate methods perform reasonably well in most cases (Lindstrom and Bates, 1990; Pinheiro and Bates, 1995; Vonesh et al., 2002). These approximate methods are typically obtained via Taylor expansions or Laplace approximations to the nonlinear models. One particularly popular approximate method for complete-data N L M E models is that of Lindstrom and Bates (1990), which is equivalent to iteratively lihood based on certain L M E models (Wolfinger, 1993).  
carrying out maximum like-  Following Lindstrom and Bates  (1990), we propose to further approximate model (2.5) by taking a first-order Taylor expansion around the current parameter and random effects estimates, which leads to a L M E response model. For the resulting L M E response model, with the covariate model (2.8), we  46  update parameter estimates based on the distribution of observations and an E M algorithm. In each iteration, analytic expressions for parameter estimates are available. Therefore, the proposed approximate method may provide substantial computational advantages over the Monte Carlo E M method. We rewrite the N L M E model (2.6) and (2.7) as a single equation  Vij = 9ij( , P, i, i ) + e^; a  where  a  i = 1 , . . . , n, j = 1 , . . . , n  b  (3.6)  i}  gij(-) is a nonlinear function. Let gi = (gn,... ,g ) . T  iTH  Denote the current estimates  of (6, ai, bi) by (6, sn, bi). Taking a first-order Taylor expansion of g^ around the current parameter estimates a. and (3 and random effects estimates a* and bj, we obtain the following L M E response model  y = W oc + X 3 + Hi&i + Tihi + ei, i  i  (3.7)  ir  where  Wi =  (wji,.  . . ,W i „  ( x i i x  )  i  )  i  with  xv =  da  dgn = dp  Xi  =  Hi  = (hji,..., h  ) with h = - d*i  Ti  =  T  T  i  n  i  ; j  T  ini  9i =  with x  i:j  tj iJ  (tii, • • • , Um)  with tij = - 7 ^ -  Yi - g i ( a t , P, i , bi) + WiCx + XiJ3 + HiSH + Tibi, a  with all the partial derivatives being evaluated at (a, (3, a bi). i ;  The proposed method in this section is to iteratively carry out maximum likelihood based on the L M E response model (3.7) and the covariate model (2.8). be the mean parameters and A =  (5 ,R,A,B) 2  Let 7 = (a, (3)  be the variance-covariance parameters. T h e  algorithm consists of alternately obtaining approximate estimates of 7 given the current 47  estimates of variance-covariance parameters a based on the distribution of fj = (yf,  z[)  T  and then updating the variance-covariance parameter estimates via an E M step using the posterior curvatures of (e;, e^, a;, bi), as in Laird and Ware (1982).  3.4.2  A n a l y t i c Expressions of  Estimates  We can combine the L M E response model (3.7) with the covariate model (2.8) to form a unified L M E model r% = QiJ where r  %  (yf, z f f ,  =  =  + ZitJi + V j , (af, bf) ,  v  T  4  i = =  l,...,n,  (ef, f) ,  U  T  e  t  (3.8) =  {1%,.  U  T  V  V  Qi = Ui  0  Vi  J  0  with O's being appropriate zero matrices. For the unified L M E model (3.8), by standard arguments, we have  [fi\ui;  7, a] ~  where Ej = Z DZj  + A»,  {  N{Qi*f + ZiUii, Ai),  /  r2 T 2  si  Ai =  V  0  r>  0  [T\;  \  D  I®R  7, a]  IA  ~ iV(Qi7, Si),  (3.9)  0 ^ B  0  and the Kronecker product I ® R is a z^mj x um-i matrix with the u x v submatrix R on the diagonals and zeros elsewhere. Using known results for L M E models (e.g., Vonesh and Chinchilli, 1997), we can update the estimate of 7 given the current estimates a by  7 = Etfsr ^  Eo^r *  1  1  .1=1 48  (3.10)  Based on the unified L M E model (3.8), we can also obtain the joint distribution of (fj, u>i) at 7 = 7 and A = A  W")  Qa  N  5 7i A  IV ) 0  J  V given the observed data r , at 7 = 7  from which we obtain the conditional distribution of and A = A:  [wi|f  i;  7, A] ~ JV[D Zf S  _1 4  ( r i — Q i 7), D — D Zf S r  1  Z i £>].  (3.11)  A n estimate of the random effects u>j is thus given by  £> = DZ  ^(fi-Qn).  
1  i  i  (3.12)  Finally, following Laird and Ware (1982), we can update the variance-covariance parameter estimates as follows. Note that if we were to observe a*, b j , e,, and e  i}  to  y  i  in addition  and Z j , we would have the following estimates  ^ = Ebibf/n,  * = £efe,/£n , 2  i-1 n  where  ^ = E E €ij i=ij=i  £ F i ' S £ Jj ij> e  e  i=l  rrii  €  i=lj=l  A, and fl, respectively.  e  i=l  t  1=1  n  4/ £ ™ »  X^ajaf, i=l  i=i  and  A =  aj  af/n,  are the "sufficient" statistics for 6 , 2  £ i b f i=l b  Since a i , b j , e^, and  £  i=l  R,  are unobservable, we can "estimate" them  by their conditional expectations given the observed data y i and z , at the current parameter estimates 7 = 7 and A = A. Based on standard results for multivariate normal distributions, we know that 7 = 7, A = A  Z/i  N  \ 49  0  l  LT PI  /J  and 1  ; 7 = 7,  N  A= A  \ a J €  Ej  Qii^  V  Rij  /  0  matrix with the first n; x nj submatrix 5 I and zeros elsewhere,  where L j is a (nj + i/m^) x  2  is a (rii + i/m*) x u matrix consisting of the first  x v submatrix 0 and the remaining  mi (u x v) square submatrices with the j t h square submatrix R and zeros elsewhere. B y the definition of conditional distributions, it can be shown that 5 1 ~ Lj E " LJ],  Nf^T.Aj-iVtLfEr^-Q^), [e |f y  7,  i ;  A] ~  N[Rl  Er^fi  2  - Qa),  1  R -  R% E r  1  i^].  Using the expectation and covariance properties for multivariate random variables and some matrix algebra, we update the estimates of the variance-covariance parameters (<5, R, A, fl) 2  as follows:  P  = E E(ef ei\ril 7, A ) / £  rii  = X X C o v ^ l r , ; 7, A)) + ^ e ^ ;  R  n  mi  = £££(e e£|f y  i=i j=i n  i ;  = E E[Cov(e^|f  »=i i=i  i=l  7, _  mi  i;  7, A f l ^ e ^ ; 7, A ) ] / £ n,-  i=i  7, A ) + £ ( 6 ^ ;  7, A ) f l (  C y  |r  i ;  7,  n  A  = E^( i fl^; a  n  Af]/£m ,  i=i  '  t  (3.13)  7,'A)/n  a  i=i n  = £ [ C o v ( a j | f y , 7, A) + E{sLi\fi;  fl  7, A) £ ( a j | f  7, A ) ] / n , • T  i ;  i=l =E£?(b bf|f ; ,A)//i i=l i  i  = ElCovCbilfij  7  7  )  A) + £ ( b j | f y ,  7,  A) E(b \ri; %  7,  A) ]/n. T  i=l The foregoing results show the closed-form expressions of the parameter estimates at each iteration for L M E model (3.8). Iteratively solving L M E model (3.8) until convergence leads to an approximate M L E 0 = (7, A) of 0. 50  3.4.3  Asymptotic Properties  Following Vonesh et al. (2002), we show in Section 3.7 that, under fairly mild regularity conditions, 7 satisfies the following properties  7  =  7 M L £ + 0 {(minJVi)  =  7  j  P  1/2-1  where Ni = ni + m,i, and 7  0  + O / m a x n~ ^ , ^min N^j 1  0  2  p  and ^MLE  a  r  e  ^  n e  ^  r u e  value and exact M L E of 7 , respectively.  Thus the approximate M L E 7 is not only consistent but also asymptotically equivalent to ' the exact M L E . The rate of convergence is shown to depend on both the number 71$ of observations per individual and the number n of individuals. Moreover, the estimate 7 asymptotically follows a normal distribution:  v ^ ( 7 - 7o) ± N(0,  ft( ))., 7o  where the asymptotic variance-covariance matrix, Q,(-y ), can be consistently estimated by 1 j 0  i=l  0=0 The proofs of the above results are given in Section 3.7.  3.5 3.5.1  Example and Simulation An Application in AIDS Studies  We apply the foregoing proposed methods to a H I V dataset for illustration. We also compare the proposed methods with the commonly-used naive method which ignores covariate measurement errors and the two-step method, which is described in Section 3.2.  
51  Data Description The study consists of 46 H I V infected patients who were treated with a potent antiretroviral regimen consisting of protease inhibitor and reverse transcriptase inhibitor drugs. Viral loads (Plasma HIV-1 R N A copies) were measured on days 0, 2, 7, 10, 14, 21, 28 and weeks 8, 12, 24, and 48 after initiation of treatments. After the antiretroviral treatment, the patients' viral loads will typically decay, and the decay rates may reflect the efficacy of the treatment. Throughout the time course, the viral load may continue to decay, fluctuate, or even start to rise (rebound). T h e data at the late stage of study are likely to be contaminated by long-term clinical factors, which leads to complex longitudinal trajectories. Various covariates such as C D 4 count were also recorded throughout the study on similar schedules. The viral load has a detectable limit of 100 R N A copies/mL. For simplicity, we imputed the censored viral loads, which are below the detection limit, by half the detection limit 50, as in Wu and Zhang (2002). The number of measurements for each individual varies from 4 to 10. There were 72 out of 361 C D 4 measurements missing at viral load measurement times, due mainly to a somewhat different C D 4 measurement schedule. T h e detailed data description can be found in W u and Ding (1999) and W u (2002).  The Response and Covariate Models Modelling H I V viral dynamics after anti-HIV treatments has received a great deal of attention in recent years (Ho et al., 1995; Perelson et al., 1996; W u and Ding, 1999; Wu, 2005). The following H I V viral dynamic model (first stage model) has been widely used (Wu 2002; W u and Zhang, 2002)  V i j  = log  1 0  (P  l i e  -  A l  ^ + P  e - ^ ) + ey, X  2 i  (3.14)  where yy is the logio-transformation of the viral load measurement for patient i at time tij, Pu and P i are baseline values, and X^j and A y are the first (initial) and the second 2  2  phase viral decay rates, respectively, and they may be interpreted as the turnover rates of 52  productively infected cells and long-lived and/or latently infected cells respectively.  The  logio-transformation of the viral load is used to make the data more normally distributed and to stabilize the variance. Wu (2002) noted that variation in the dynamic parameters such as the first phase decay rate XUJ may be partially associated with variation in C D 4 counts. In AIDS studies, it is known that covariates such as C D 4 count are often measured with substantial errors. Thus we assume that the dynamic parameters are related to the true covariate values, reflecting the belief that actual, not possibly corrupted, covariate values govern the model parameters, as in Higgins et al. (1997) and W u (2002). Due to long-term clinical factors, drug resistance, and other complications, the viral load trajectories can be very complex after the initial phase viral decay (see Figure 1.1). Grossman et al. (1999) pointed out that the viral decay rate after the initial period may be complicated and may vary over time since they may depend on some phenomenological parameters which hide considerable microscopic complexity and change over time. Therefore, a nonparametric smooth curve modelling for the second phase viral decay rate may be more appropriate than parametric modelling (Wu and Zhang, 2002). 
Based on the reasons noted above, we consider the following second stage model, which corresponds to the first stage model (3.14),  log(Pu) = A  +  hi  (3.15) l o g ( P i ) = PA + hi 2  XUJ = w(Uj)  +hi(tij),  where z*- is the true (but unobserved) C D 4 count, and w(tij) and hi(Uj) are nonparametric smooth fixed- and random-effects functions defined in Section 2.1. To avoid very small (large) estimates, which may be unstable, we standardize the C D 4 counts and rescale the original time t (in days) so that the new time scale is between 0 and 1. As discussed in Section 2.1, we employ the linear combinations of natural cubic splines 53  Table 3.1: A I C and B I C values for the viral load model (3.14) and (3.15), with q < p = 1, 2, 3. p=l,g=l  p=2,q=2  p=2,g=l  p=3,g=3  p=3,q=2  p=3,g=l  AIC  Model  615.96  583.54  585.39  577.37  586.45  576.43  BIC  678.18  669.09  656.43  670.71  665.90  651.50  with the percentile-based knots to approximate the nonparametric smooth functions w(t) and hi(t). Following W u and Zhang (2002), we take the same natural cubic splines with q < p in order to decrease the dimension of the random effects bj, i.e., more basis functions used to approximate the fixed-effects function than the random-effects functions. A I C and BIC criteria are used to determine the values of p and q. We use the observed C D 4 counts for the unobservable true C D 4 counts in the response model (3.14) and (3.15), and use S P L U S functions nlmeQ and anovaQ to obtain the values of A I C and B I C . Table 3.1 displays A I C and B I C values for various plausible models. Based on these A I C and B I C values, the model with p = 3 and q = 1, i.e.,  A i « A + /?e#i(*tf)+ft V>2(*«)+•&«, 2 i  (3.16)  seems to be the best, and thus it is selected for our analysis. For the C D 4 process, in the absence of a theoretical rationale, we consider empirical polynomial L M E models, and choose the best fitted model based again on A I C / B I C values for each possible model. This is done based on the observed C D 4 values, and is done separately from the response model for simplicity. Specifically, since the inter-patient variation is large, we consider model (2.8) with Uu = Vu = (1, uu,...,  u ~) l  u  and linear (a = 2), quadratic  (a = 3), and cubic (a = 4) polynomials. Table 3.2 presents A I C and B I C values for these three models. The following quadratic polynomial L M E model best fits the observed C D 4  54  Table 3.2: A I C and B I C values for the linear, quadratic, and cubic C D 4 models. Model  a=2  a=3  a=4  AIC  796.17  703.19  742.12  BIC  819.50  761.52  781.01  process:  + 0 2 ) uu + (&3 + 0.3) u\ + e  CD4i( = (ai + ai) + (a  2  u  (3.17)  where uu is-the. time and oc — (ai, a , c*3) are the population parameters and a, T  2  (an, a , i2  a)  T  i3  =  are the random effects.  E s t i m a t i o n M e t h o d s a n d C o m p u t a t i o n Issues We estimate the parameters in the response and covariate models using the naive method which ignores measurement errors, the two-step method in Section 3.2, and the two proposed "joint" model methods discussed in Sections 3.3 and 3.4. We denote the method in Section 3.3 by M C E M and the method in Section 3.4 by A P P R . The two proposed joint model methods need starting values for model parameters. We respectively use the parameter estimates obtained by the naive method and by the two-step method as parameter starting values for the two joint model methods. 
For the naive method and the two-step method, we use S P L U S functions lme() and nlme() to obtain parameter estimates and their default standard errors. For the M C E M , method, we assess the convergence of the Gibbs sampler by examining time series plots and sample autocorrelation function plots. For example, Figures 3.1 and 3.2 show the time series and the autocorrelation function plots for b associated with patient 10. From these 2  figures, we notice that the Gibbs sampler converges quickly andthe autocorrelations between successive generated samples are negligible after lag 15. 55  Time series and autocorrelation  Figure 3.1: The time series plot for 6 associated with patient 10. 2  Series :  -I I I I  b2  I  I  I  i  i  i  •' •'-  Figure 3.2: The autocorrelation function plot for b associated with patient 10. 2  56  function plots for other random effects show similar behaviors. Therefore, we discard the first 500 samples as the burn-in, and then we take one sample from every 20 simulated samples to obtain "independent" samples (see sampling methods in Section 3.3.3). For the Monte-Carlo E M algorithm, we start with k = 500 Monte-Carlo samples, and 0  increase the Monte-Carlo sample size as the number of iteration t increases: k \ t+  with c = 4.  = k + k /c t  t  Convergence criterion for the iterative methods in our examples is that the  relative change in the parameter estimates from successively iterations is smaller than a given tolerance level (e.g., 0.01). However, due to Monte Carlo errors induced by Gibbs sampler, it is difficult to converge for an extremely small tolerance level, otherwise it may converge very slowly. The actual tolerance level we used in our example for the two proposed joint model methods is 0.05. Convergence of the algorithms are considered to be achieved when the maximum relative change of all estimates is less than 5% in two consecutive iterations. We use the multivariate rejection sampling method for the M C E M method. Other sampling methods may also be applied. On a S U N Sparc work-station, the M C E M method took about 90 minutes to converge while the A P P R method took only 3 minutes to converge. This shows that the A P P R method offers quite a substantial reduction in computing time, and is thus computationally much more efficient than the M C E M method. Analysis Results and Conclusions Table 3.3 presents the resulting parameter estimates and standard errors. We see that, except for the naive method, the other three methods give similar point estimates for the parameters, especially for the covariate model parameters ex. However, for the parameters /3 of main interest, the two-step method gives  smaller standard errors than the two joint model  methods ( M C E M and A P P R ) . This is because the two-step method ignores the variability due to estimating the parameters cx in the covariate model. Thus, these results are consistent  57  Table 3.3: Parameter estimates (standard errors) for the H I V dataset.  Method Naive  a -  Two-step -.42 MCEM APPR  a  3  2  -  -  -  -  4.17 -3.74  (.5) (•1) (.5) -.41. 
4.02 -3.54 (.6) (•1) (.5) -.42 4.17 -3.74 (.6)  (•1)  (.6)  Pi  P5  As  Pr  5  6.82 -3.01  9.27  -1.67  .35  (3.3)  (.6)  (5.6)  (8.8)  (3.6)  1.53  6.84 -2.86  9.04  -1.75  (4.7)  (.6)  (5.5)  (8.9)  (3.5)  1.55 (5.2)  6.85 -2.78  8.91 (9.0)  -1.79 (3.6)  .35 .53  8.99  -1.77  .35 .52  (5.0)  (5.9)  (9.1)  (3.6)  Ps  ft  11.73 65.41  .41  (3.9) (•2) 11.73 65.78 (4.1) (•2) 11.74 66.60  (4.7) (•2) 11.74 65.72 1.33 (•2)  (4.5)  Pi  (•7) (6.0) 6.85 -2.82 (•7)  R  .35 .52  Note: A and B are unstructured covariance matrices, but we only report the estimates of their diagonal elements here. Diag(A) = (.52, 4.06,1.98) for Two-step, Diag(A) = (.53, 2.55,1.25) for MCEM, and Diag(A) = (.52,4.06,1.99) for APPR. Diag{B) = (1.11, 69.94,2.02,24.86) for Naive, Diag(B) = (1.10,69.58,2.02,25.04) for Two-step,. Diag(B) = (1.11,69.96,2.05,25.47) for MCEM, and Diag(B) = (1.10,69.78,2.01,24.85) for APPR. with the analytical results about the two-step method in Section 3.2. We also see that the naive method may severely under-estimate the effect of the covariate C D 4 (which is measured by the parameter p ). 3  The estimates and standard errors based on the two joint model  methods are similar and may be more reliable. The commonly used two-step method and the naive method may give misleading results, and the two proposed joint model methods may be more reliable. We will confirm this conclusion via simulations in next section.  3.5.2  A  Simulation  Study  In this section, we conduct a simulation study to evaluate the proposed methods ( M C E M and A P P R ) , and compare them with the commonly used two-step method and the naive method by the mean-square-error (MSE). The models and the measurement schedules used in the simulation are the same as those in the real H I V dataset in the previous section (i.e., models (3.14) - (3.17)). In the simulations, the true values of a and (3 are shown in  58  Table 3.4: Simulation results for parameter estimates as well as (standard errors) and (simulated standard errors)* for the estimation methods Naive, Two-step, M C E M , and A P P R .  Parameter True Value Naive Method  Two-step  MCEM  APPR  Oil -.5  a 4.0  Pi  P2  Pz  P4  p5  (%  Pi  -4.0  12.0  66.0  1.5  7.0  -3.0  9.0  -2.0  -  -  -  11.98  65.50  0.94  6.92  -3.74  10.13  -1.62  -  -  -  (-1)  (1.1)  (.3)  (1.7)  (2.6)  (.9)  -  -  -  (.2)*  -.51  4.05  -4.02  11.98  (1.1) (1.2)* 65.66  (1.0)* 1.53  (.3)* 6.92  (1.6)* -3.74  (2.8)* 10.13  (1.0)* -1.62  (•1) (.1)*  (.3)  (•2) (.1)*  (1.2)  (1.4)  (1.7)  (2.6)  (.9)  (1.2)*  (1.4)*  (•3) (.2)*  (1.6)*  11.98  65.85  1.48  6.98  -3.11  (2.5)* 9.15  (.9)* -1.95  (•3) (.3)*  (2.0)  (3.0)  (1.1)  (1.9)* -3.82  (2.8)* 10.29  (1.1)*  (3.1) (2.8)*  2  a  3  -.51  (.3)* 4.04  (•4) (.3)* -4.02  (•1) (.1)*  (•4) (.4)*  (•4) (.4)*  (•1) (.2)*  (1.5)  (1.8)  (1.5)*  -.51  4.05  -4.03  11.98  65.46  (1.8)* 1.52  (•1) (.1)*  (•3) • (-4) (.3)* (.3)*  (•2) (.2)*  (1.4)  (1.8)  (.3)  (2.0)  (1.4)*  (1.7)*  (.3)*  (1.9)*  6.93  -1.56 (1.1) (1.0)*  Table 3.4, and the other true parameter values are 5 = A, R = .3, A = diag(.5, 2, 1), and B = diag(l, 9, 2, 4). We simulated 100 data sets and calculated averages of the resulting estimates and their standard errors as well as simulated standard errors based on each of the four estimation methods. Since M C E M method sometimes offers computational problems, such as slow or non-convergence, the 100 sets of parameter estimates are obtained from 116 data sets. The simulation results are shown in Table 3.4. 
From Table 3.4, we see that the naive method can severely under-estimate the covariate effect /?3. The two-step method produces similar point estimates as the two joint model methods ( M C E M and APPR),"but it gives smaller  standard  errors than the M C E M and the  A P P R methods, since the two-step method fails to incorporate variability in estimating the covariate model. These results are consistent with the analytical results in Section 3.2. Note that the estimates for the covariate model parameters a are similar for the three methods (Two-step, M C E M , and A P P R ) , but the parameters a are usually treated as nuisance pa-  59  rameters and are not of primary interest. The two proposed joint model methods  (MCEM  and A P P R ) perform better than the two-step method and the naive method, and the M C E M method is the best among all four methods in terms of bias. In the computation, the A P P R method converges much faster than the M C E M method. These simulation results confirm that the naive method and the two-step method may give misleading results and that the two proposed methods are more reliable.  3.6  Discussion  We have proposed two approximate likelihood methods for semiparametric N L M E models with covariate measurement errors and missing data. The first method, implemented by a Monte Carlo E M algorithm combined with Gibbs sampler, may be more accurate but may be computationally intensive and sometimes may offer computational problems such as slow or non-convergence. The second method, implemented by an iterative algorithm without Monte Carlo approximation, is computationally much more efficient, but it may be less accurate than the first method since it uses an additional approximation. Alternatively, the second method may provide excellent parameter starting values for the first method. Simulation results show that both methods perform better than the commonly used two-step method and a naive method. In particular, the commonly used two-step method may under-estimate standard errors, and the naive method may under-estimate covariate effects. For semiparametric N L M E models with covariate measurement errors and missing data, the models can be very complex.  Thus, if there is not sufficient information in the  data, the models can be non-identifiable. To our knowledge, there seem no existing general necessary and sufficient conditions for model identifiability, and the identifiability problem needs to be considered on a case-by-case basis. In practice, we can check model identifiability  60  by examining the convergence of iterative algorithms. If the model is non-identifiable, the iterative algorithms may diverge quickly. For the models considered here, we find that the iterative algorithms converged without problems, so the models seem identifiable. The methods proposed here may be extended to semiparametric generalized linear mixed models and nonparametric mixed-effects models with covariate measurement errors and missing data. T h e results will be reported in the near future.  3.7  Appendix: Asymptotic Properties of 7 Based on the A P P R Method in Section 3.4  In this section, we show the asymptotic properties of 7 obtained by the A P P R method in Section 3.4.  We first state some Lemmas, which are used in the proofs, in Section 3.7.1.  Then we describe some regularity conditions under which the asymptotic properties hold in Section 3.7.2. In Section 3.7.3, we obtain some estimating equations, which are used for showing asymptotic properties of 7 . 
Consistency and asymptotic normality of 7 are shown in Sections 3.7.4 and 3.7.5, respectively. The results are extensions of Vonesh et al. (2002).  3.7.1  Some  Lemmas  The following four lemmas will be used for showing asymptotic properties of 7 .  L e m m a 3.2. (Vonesh and Chinchilli, 1997). Let 7 „ be a sequence of random variables satisfying Y  n  = c+O (a ) p  n  where a  n  = o(l). If f(x)  is a function with r continuous derivatives  at x = c, then  f(Y ) n  = /(c) + fU(c)(Y  - c) + • • • + [l/(r - l)!]/( " )(c)(y - c ) r  n  1  n  61  1  + O «), p  where / ^ ( c ) is the fcth derivative of / evaluated at c. In particular,  f(Y ) n  = f(c) +  O (a ). p  n  This result holds when O (-) is replaced everywhere by o (-) or when Y p  p  n  and c are replaced  by a vector/matrix random variable Y „ and vector/matrix constant c.  L e m m a 3 . 3 . (Aitchison and Silvey, 1958). If / is a continuous function mapping R  s  itself with the property that, for every 9 such that ||#|| = 1, 9 f(6)  into  < 0, then there exists a  T  point 0 such that ||6>|| < 1 and f(8) = 0.  L e m m a 3.4 (The B o u n d e d Convergence Theorem).  Let {f (x)}  be a sequence of  n  measurable functions defined on a set of E of finite measure, and suppose there is a real number M such that |/ (a;)| < M for all n and all x. If f(x) n  = lim _ n  > o 0  f (x)  for each x in  n  E, then  L e m m a 3 . 5 . Let A and B be u x u symmetric matrices with eigenvalues fJ-i(A) > / ^ ( A ) > ••• > A*„(A) and Hi(B)  > fj-2{B)->  ••• > Hv{B),  respectively.  If A — B are nonnegative  definite, denoted by A — B > 0 or A > B, then we have fJ-i(A) > Pi(B),  3.7.2  i = 1 , . . . , v.  Notation and Regularity Conditions  Let the r-dimensional vector 7 = ( a , (3) € T and  k ( 7 , oj{) = li(ct, 0, a NiLih,  i 5  y ^ Zj) = l o g / y ( j / i | z j , c ^ i ; 7 ) + l o g / z ( z i | w i ; 7 ) ,  Wi) = ^ ( 7 , Wi) + l o g / ( a i ) + l o g / ( b i ) , 62  '  where N; = rii + m-i. Let  d w =u; (7)) i  i  d  2  »\w,w (7, ^ ( 7 ) ) = t  w =u; (7)»  dujidujf  i  i  i, (7> ^ ( 7 ) ) = ^ ; ^ ( 7 , Wi) w =a; (7)' 7  i  i  " 7 7 ^ ^(-Y)) = d^ ^' li  " 7 ^ ( 7 , W i ( ) ) = Q^fhh, 7  W l )  u>i=u;i(7)i etc.  Wi) a; i =a> i (7).  Similarly, we can define the corresponding derivatives for L i ( 7 , c*>i(7)) and  3  2  -h(i,  Vi),  etc.  Also, we denote convergence in probability as iVj —> 00 by o (ljv ), convergence in probability p  as n —> o  p  00 by  ( l A ^ i , n ) -  o  p  ( l „ ) ,  i  and convergence in probability as both Ni  —> 00 and n —• 00 by  We show consistency and asymptotic normality under the following regularity  conditions. A n outline of the proof when some of these conditions are relaxed is provided at the end. R l . /Vj = O(N)  uniformly for i = 1 , . . . , n, where N = min;  R2. The variance-covariance parameters A = (5 , R, A, B) are'fixed and known, and 2  the true parameter 7  0  is in the interior of V. Qi, A i , Zi, E j , and D are evaluated at 6 and  Wj. When A is unknown, we can simply replace it by its consistent estimate (e.g., in (3.13)). R3.  The density functions fyiVijl^i,  <*V)7) and fz(zij\tJi;  7 ) satisfy the necessary  regularity conditions (e.g., Bradley and-Gart, 1962) such that, for fixed 7 , the M L E of u>j is v^-consistent for Wj as Ni —> 00.  In addition, the necessary regularity conditions are  assumed (e.g., Serfling, 1980, p. 
27, Theorem C) such that, by the Law of Large Numbers,  63  the following hold:  ~K -J^k{l^ ) oyd'y l  l  1  -N~ -^-- k{ ^ ) l  Y  1  -N- ^  + o {l )  as  N* - oo,  = N- ZjK- Z  + o {l )  as  N - oo,  + o (l )  as  T  l  i  1  l  l  l  i  kh,u  l  = N-'Q K- Q  a a  Finally, the matrices  p  ) = N- ZTA; Q 1  l  T J i where, under models Z(2.6) - (2.8)  p  i  p  Ni  Ni  Ni  {  -  oo,  2  2  N~ Qj'h~ Qi l  l  and  N^ ZjK^Zi l  are both assumed to be positive definite  with finite determinants such that, for example, the smallest eigenvalue of  N~ QfA~ Qi 1  1  exceeds Ao for some Ao > 0. R4.  For all 7 G T and all the 6-dimensional u>j € R , b  the function L;(7,  cJj) is six  times differentiable and continuous in 7 and Wj for all y^- and z^-, and L i (7, W j ) satisfies the necessary regularity conditions needed to change the order of integration and differentiation, as indicated in the proof. R5. For any 7 e F, there exist di > 0 and Ai > 0 such that a. For all  7* € 5^(7), where #^(7) is the  r-dimensional sphere centered at  radius d\, the following holds:  ~ E ^>7^> i=l  ^(7))  l7=T = " ( 7 T  '  64  1  +  Op(ln),  as  n —> 00,  7 with  where ^(7*)  1  is positive definite with minimum eigenvalue greater than A i and  ^ , 7 ( 7 , Wi(7))  | =T = in {l\ 7  ni  Wi(7*)) + # y  W |  ( 7 * , «i(7*))  x K i w « ( 7 * , Wi(7*)) + D - ] - ^ 7 ( 7 * 1  w*(7*))}.  1  !  b. The first, second, and third derivatives of y/NiL^-y, Wj) with respect to CJ^ are uniformly bounded in 5 ^ ( 7 ) . R6. A t the true value 7 , the following hold true: 0  = c^j(7 ) exists for all i = l , . . . , r a ,  Eu,(QjE~ Qi) l  0  n  lim n " V  Covo; ( Q f E ^ Q , ) = 0,  2  n—>oo  '  ^  i=l  and  lim n n—>oo  where Qi, Z  it  _ 1  YV(7o) '  and E ; are evaluated at 7  =  (7o)~\  fi  J  0  and o»j and f 2 ( 7 ) 0  R7. The marginal densities, Jexp{iVjL;(7,  is positive definite.  u>i)} duii, satisfy the necessary regularity  conditions such that the M L E of 7 exists and satisfies ('JMLE  3.7.3  _ 1  ~  7o)  =  O {n~ l ). l 2  p  Estimating Equations  In Section 3.4, we obtain an approximate M L E 6 = (7, A) of 6 and the predictor u>j of <jj. It is obvious from (3.10) and (3.12) that the estimate 7 is a function of A and the final estimates Wj of u>j are functions of both 7 and A. For fixed A, it can be shown that 7 and n  Cji = uJi(j) maximize the complete-data log-likelihood function (3.2), i.e., 2~2 - ^ - ^ ( 7 , <*>i)-  65  In fact, we can write if  if  =  -  = if (a, (3, aj,bj, A) in (3.2) as  Hi | log(27r<5 l O g ^ T T C T)j --  ^ —  2  I27T.RI -- \  ll oogg  | 2 ^ | - \ log  log  |27Tfl|  T Yi'  Si  AT  yi - gi  1  -UiOL-  yzi-UiOL-Viai  J  T  ( » \  \ -) b  (1 Taking the first derivatives of £ ii with respect to 7 = ( a , (3) and i=i  = (af, b f )  r  and. set-  ting these first derivatives to appropriate zero vectors, we can obtain the following estimating equations  i=l  ^  Zi\  Zi -  ^  -1  Ui OL - Vi  yi - gi  &i  ^  J = 0,  - D~  Zi — UiCX — ViZn j  V  (3.18)  V  b  t  i =  l,...,n,  /  which is equivalent to  £ <9*  A  i  _ 1  ^ 7  i=l  +E Q  = £ Qi i=l  A" ^ 1  l  i=l  K  l  n,  ^ Zi A " Qi 7 + [Zj A " Zi + fl-" ]^ = ^ A ^ r - i , ' 1  where  1  i = 1 . . . , n,  1  ( V  Si. V>. dbj  da]  eft  (3.19)  Zi  \  J  The solution to the estimating equations (3.19) can be obtained by iteratively solving the following equations £  QI A - Qn 1  i=i  + tQT  A " ZiUi 1  i=i  Z f A" Q 1  l  +  1  f  it  i=i  (ZT A " Z i + b- )ui 1  l  = ± Qf A,-  1  66  (3.20)  = Zjk. 
r , 1  i  i = 1...,n,  where f; = ( y f , z[)  is defined in (3.7), and Qi, A  T  Z  it  and D are Qi, Ai, Zi, and D  it  evaluated at 7 = 7 and Wj = Wj, respectively. The solution to the equations (3.20) is given in (3.10) and (3.12). Therefore, for fixed A , the final estimate 7 and a V = ^ ( 7 ) satisfy the estimating equations (3.18) and maximize the complete-data log-likelihood function (3.2). These facts will be used to show the following asymptotic properties of 7 .  3.7.4  Consistency  We first note that, for fixed A , the M L E of 7 will satisfy the set of estimating equations  ° ^ i=i  i=i  7  J  Under R4, we have  "" / {j2 Q~ ^' Ni  ^)}exp{^7V,L,(7,  L  = 53 V ^ ' / • • • / (VNiQ-Hl,  wO) exp {  7  i=i  =  ^  V i=i  7  ^  7  A ^ ( , w^W • • • dw  n  7  j=i  7  / (\/^^, (7, ^  w^dwr-.dw,  w<)) e x p { / V L ( 7 , W f ) } ^ i  i  x 7 exp{iV L ( , j^ii  i  7  a;,-)}**  7  Now we examine the term £ - ^ ( 7 , u>j) in the above expression. Since the y^s conditionally independent each other given  and Zj/s are  u> by the Lindeberg Central Limit Theorem, it iy  follows that conditional on u>,-.  ( =  \  Yi ~ gi  NrWA,  1  Ui  a  —  Vi  a,  =  O (N~ ) 1/2  p  (3.21)  Furthermore, under R3 it can be shown that the estimate  ^(7)= v  O (N- ). 1/2  i  +  p  67  (3.22)  Combining the results in (3.21) and (3.22) and applying Lemma 3.2 to £ ^ ( 7 , ^ ( 7 ) ) , we can show that • 4 (7,A(7))  =  L ; ( u ; ) + O (iV -  =  O {N~ )  .=  O (N~ ).  7  i 7  7 )  i  112  p  p  +  1 / 2  i  )  O (N~ ) 1/2  p  (3.23)'  1/2  p  Then, by direct application of the Laplace approximation to integrals of the form J and / q(x) exp{kp(x)}dx,  exp{kp(x)}dx  where q(x) and p(x) are smooth functions in x with p(x) having  a unique maximum at some point x (see, e.g., Barndorff-Nielson and Cox, 1989), it can be shown that  (l + ocivr)) 1  J exp{/ViLi(7, Ui)}dwi = exp{A^ L (7, 1 ^ ( 7 ) ) } ^ \NiL"\ i  i  and / ( ^ ^ , 7 ( 7 , ";)) e x p { / V L ( 7 , i  =  exp{ML ( , l  7  «i( ))} x  i  w^jdui  + W )),  «i(7))  ( ^ - 7 ( 7 ,  ^  7  where L - ' ^ ^ = £"u;,u>(7 0 ^ ( 7 ) ) . Because (3.21) implies y/N~L' (-y, ;  irf  1  0)1(7)) = O ( l ) , it p  follows from R l that 1L  ( ^ ^ ( 7 ,  ^ ( 7 ) ) + O^N- ))  x  1  J](l + O^Nf )) 1  Hence we have  VWJ, (7, w<( ))  =  7  i  i  (y/NiL'^in,  7  i=l  1  &&)) + OiNr )  6/2  n  2TT i  i  (I + CXAT"1))  I  exp{jV L ( , «i(7))}  JJ  7  n 7 )  + ^(AT ).  6/2  2?r  exp{/V L (7, W i ( ) ) }  ir(  7  w( )) 7  E ^ , 7 ( 7 .  ^( ))+nO (iV- / ) 1  7  p  i=l 68  2  1  where  7  7  P  7  i=l  for all 7 € r, the M L E -y  MLE  J  where  (Vl^w  K( , u>( )) = e x { £ A ^ ( , ^ ( 7 ) ) } ft  ^(7,^(7)) |  n  /  2  -  Since ^ ( 7 , 4 ( ) ) ^ 0 7  +O (n7V- / ) 1  7 = 7 M L £  2  = 0,  p  (3,24)  n  = E ^7(7,  = ]H ^ - ^ 7 ( 7 , ^ ( 7 ) )  7  6  of 7 satisfies  ^ ) | 7 = 7 ^ = °  Ji( , £ ( 7 ) )  J )  i=l  i=l  ^ 1 ( 7 ) ) is the  s e  t of estimating equa-  i=l  tions for 7 conditional on fixed A , as given in (3.21). 
By taking a first-order Taylor series expansion of $J_1(\gamma,\hat\omega(\gamma))$ about $\hat\gamma_{MLE}$, and noting that, from (3.24), $J_1(\hat\gamma_{MLE},\hat\omega(\hat\gamma_{MLE})) = O_p(nN_i^{-1/2})$, we have

\[
J_1(\gamma,\hat\omega(\gamma)) = O_p(nN_i^{-1/2}) + J'_1(\gamma^*,\hat\omega(\gamma^*))\,(\gamma - \hat\gamma_{MLE}), \tag{3.25}
\]

where

\[
J'_1(\gamma^*,\hat\omega(\gamma^*)) = \frac{\partial}{\partial\gamma^T}J_1(\gamma,\hat\omega(\gamma))\Big|_{\gamma=\gamma^*}
= \frac{\partial^2}{\partial\gamma\,\partial\gamma^T}\Big[\sum_{i=1}^n N_iL_i(\gamma,\hat\omega_i(\gamma))\Big]\Big|_{\gamma=\gamma^*}
\]

and $\gamma^*$ is on the line segment joining $\gamma$ to $\hat\gamma_{MLE}$. By applying the chain rule, we have, for any $\gamma\in\Gamma$,

\[
J'_1(\gamma,\hat\omega(\gamma)) = \sum_{i=1}^n N_i\Big\{L''_{i,\gamma\gamma}(\gamma,\hat\omega_i(\gamma)) + L''_{i,\gamma\omega}(\gamma,\hat\omega_i(\gamma))\,\frac{\partial\hat\omega_i(\gamma)}{\partial\gamma^T}\Big\}. \tag{3.26}
\]

Note that $\hat\omega_i(\gamma)$ maximizes $N_iL_i(\gamma,\omega_i)$ and satisfies the second set of equations in (3.18), i.e.,

\[
L'_{i,\omega}(\gamma,\hat\omega_i(\gamma)) - D^{-1}\hat\omega_i(\gamma) = \mathbf 0.
\]

Applying the chain rule once again, we have

\[
\mathbf 0 = \frac{\partial}{\partial\gamma^T}\Big[L'_{i,\omega}(\gamma,\hat\omega_i(\gamma)) - D^{-1}\hat\omega_i(\gamma)\Big]
= L''_{i,\omega\gamma}(\gamma,\hat\omega_i(\gamma)) + \big[L''_{i,\omega\omega}(\gamma,\hat\omega_i(\gamma)) - D^{-1}\big]\frac{\partial\hat\omega_i(\gamma)}{\partial\gamma^T}.
\]

Solving the above equation for $\partial\hat\omega_i(\gamma)/\partial\gamma^T$, we have

\[
\frac{\partial\hat\omega_i(\gamma)}{\partial\gamma^T} = \big[D^{-1} - L''_{i,\omega\omega}(\gamma,\hat\omega_i(\gamma))\big]^{-1}L''_{i,\omega\gamma}(\gamma,\hat\omega_i(\gamma)).
\]

Substituting this expression for $\partial\hat\omega_i(\gamma)/\partial\gamma^T$ in (3.26), it follows from R1, R3, and R5 that, as $n\to\infty$,

\[
-\frac1n J'_1(\gamma^*,\hat\omega(\gamma^*)) = -\frac1n\sum_{i=1}^n\frac{\partial^2}{\partial\gamma\,\partial\gamma^T}\,l_i(\gamma,\hat\omega_i(\gamma))\Big|_{\gamma=\gamma^*}
= \Omega(\gamma^*)^{-1} + o_p(1), \tag{3.27}
\]

which implies $-\frac1n J'_1(\gamma^*,\hat\omega(\gamma^*)) \xrightarrow{p} \Omega(\gamma^*)^{-1}$. Taking $\epsilon = \frac12\Omega(\gamma^*)^{-1}$ in the definition of convergence in probability, from (3.27) we have

\[
P\Big(-\tfrac12\,\Omega(\gamma^*)^{-1} < -\tfrac1n J'_1(\gamma^*,\hat\omega(\gamma^*)) - \Omega(\gamma^*)^{-1} < \tfrac12\,\Omega(\gamma^*)^{-1}\Big) \to 1 \quad\text{as } n\to\infty,
\]

which is equivalent to

\[
P\Big(\tfrac12\,\Omega(\gamma^*)^{-1} < -\tfrac1n J'_1(\gamma^*,\hat\omega(\gamma^*)) < \tfrac32\,\Omega(\gamma^*)^{-1}\Big) \to 1 \quad\text{as } n\to\infty,
\]

which implies

\[
P\Big(\tfrac12\,\Omega(\gamma^*)^{-1} < -\tfrac1n J'_1(\gamma^*,\hat\omega(\gamma^*))\Big) \to 1 \quad\text{as } n\to\infty.
\]

Therefore, for sufficiently large $n$,

\[
-\frac1n J'_1(\gamma^*,\hat\omega(\gamma^*)) > \frac12\,\Omega(\gamma^*)^{-1} > \frac{\lambda_1}{2}I \tag{3.28}
\]

with probability 1, where $\lambda_1$ is defined in R5. Since, in (3.25), $O_p(N_i^{-1/2}) = N_i^{-1/2}O_p(1) \to 0$ as $N_i\to\infty$, similar arguments as for $-\frac1n J'_1(\gamma^*,\hat\omega(\gamma^*))$ lead to, with probability 1, for any $0 < \kappa < d_1$,

\[
\|O_p(N_i^{-1/2})\| < \frac{\lambda_1\kappa}{4}. \tag{3.29}
\]

For any $\gamma$ such that $\|\gamma - \hat\gamma_{MLE}\| = \kappa$, we know that $(\gamma - \hat\gamma_{MLE})/\kappa$ lies on the unit sphere in $R^r$, which satisfies the condition of Lemma 3.3. Regarding $J_1(\gamma,\hat\omega(\gamma))$ in (3.25) as a function of $(\gamma - \hat\gamma_{MLE})$ and using the results in (3.28) and (3.29), we have

\[
\frac{(\gamma - \hat\gamma_{MLE})^T J_1(\gamma,\hat\omega(\gamma))}{n}
= (\gamma - \hat\gamma_{MLE})^T O_p(N_i^{-1/2}) - (\gamma - \hat\gamma_{MLE})^T\Big[-\frac{J'_1}{n}\Big](\gamma - \hat\gamma_{MLE})
\le \|\gamma - \hat\gamma_{MLE}\|\,\|O_p(N_i^{-1/2})\| - \frac{\lambda_1}{2}\|\gamma - \hat\gamma_{MLE}\|^2
< \frac{\lambda_1\kappa^2}{4} - \frac{\lambda_1\kappa^2}{2} = -\frac{\lambda_1}{4}\|\gamma - \hat\gamma_{MLE}\|^2 < 0.
\]

Therefore, conditional on $\omega$, we can show

\[
P_\omega\Big\{\frac{(\gamma - \hat\gamma_{MLE})^T J_1(\gamma,\hat\omega(\gamma))}{n} < 0\Big\} \to 1 \quad\text{as } n\to\infty,\ N\to\infty.
\]

Thus, since $\hat\gamma$ satisfies $J_1(\hat\gamma,\hat\omega(\hat\gamma)) = \mathbf 0$, Lemma 3.3 implies that

\[
\lim_{n,N\to\infty} P_\omega\{\|\hat\gamma - \hat\gamma_{MLE}\| < \kappa\} = 1.
\]

Hence, using Lemma 3.4, we have

\[
\lim_{n,N\to\infty} P\{\|\hat\gamma - \hat\gamma_{MLE}\| < \kappa\}
= E_\omega\Big\{\lim_{n,N\to\infty} P_\omega\{\|\hat\gamma - \hat\gamma_{MLE}\| < \kappa\}\Big\} = 1. \tag{3.30}
\]

Since $J_1(\hat\gamma,\hat\omega(\hat\gamma)) = \mathbf 0$ and $J_1(\hat\gamma_{MLE},\hat\omega(\hat\gamma_{MLE})) = O_p(nN_i^{-1/2})$ by (3.24), it follows from (3.25) that

\[
-\frac1n J'_1(\gamma^{**},\hat\omega(\gamma^{**}))\,(\hat\gamma - \hat\gamma_{MLE}) = O_p(N_i^{-1/2}), \tag{3.31}
\]

where $\gamma^{**}$ is on the line segment joining $\hat\gamma$ to $\hat\gamma_{MLE}$. From (3.30), we know that $\gamma^{**}$ is in the interior. It follows from R5 and inequality (3.28) that

\[
-\frac1n J'_1(\gamma^{**},\hat\omega(\gamma^{**})) > \frac12\,\Omega(\gamma^{**})^{-1},
\]

where $\Omega(\gamma^{**})^{-1}$ is positive definite. Lemma 3.5 implies that all eigenvalues of $-\frac1n J'_1(\gamma^{**},\hat\omega(\gamma^{**}))$ are greater than $\lambda_1/2$.
Therefore, all eigenvalues of $\big[-\frac1n J'_1(\gamma^{**},\hat\omega(\gamma^{**}))\big]^{-1}$, which are equal to the reciprocals of the eigenvalues of $\big[-\frac1n J'_1(\gamma^{**},\hat\omega(\gamma^{**}))\big]$, are less than $2/\lambda_1$, i.e.,

\[
\Big[-\frac1n J'_1(\gamma^{**},\hat\omega(\gamma^{**}))\Big]^{-1} < \frac{2}{\lambda_1}I,
\qquad\text{i.e.,}\qquad
\Big[-\frac1n J'_1(\gamma^{**},\hat\omega(\gamma^{**}))\Big]^{-1} = O_p(1).
\]

Hence, from (3.31), we have

\[
(\hat\gamma - \hat\gamma_{MLE}) = \Big[-\frac1n J'_1(\gamma^{**},\hat\omega(\gamma^{**}))\Big]^{-1}O_p(N_i^{-1/2})
= O_p(1)\,O_p(N_i^{-1/2}) = O_p(N_i^{-1/2}),
\]

from which it follows (given R7) that

\[
\hat\gamma = \hat\gamma_{MLE} + O_p(N_i^{-1/2}) = \gamma_0 + O_p(n^{-1/2}) + O_p(N_i^{-1/2})
= \gamma_0 + O_p\Big(\max\Big(n^{-1/2},\ \big(\min_i N_i\big)^{-1/2}\Big)\Big).
\]

3.7.5 Asymptotic Normality of $\hat\gamma$

The asymptotic normality of $\hat\gamma$ will be shown based on the estimating equations (3.10). Let

\[
\Phi(\gamma) = \sum_{i=1}^n \hat Q_i^T\hat\Sigma_i^{-1}\big(\tilde{\mathbf r}_i - \hat Q_i\gamma\big),
\]

where $\hat Q_i$, $\hat\Sigma_i$, and $\tilde{\mathbf r}_i$ are defined in (3.20). The estimator $\hat\gamma$ satisfies $\Phi(\hat\gamma) = \mathbf 0$ at convergence. Noting that $\partial\Phi(\gamma)/\partial\gamma^T = -\sum_{i=1}^n \hat Q_i^T\hat\Sigma_i^{-1}\hat Q_i$ is constant in $\gamma$, we take a Taylor series expansion of $\Phi(\hat\gamma)$ around the true parameter $\gamma_0$:

\[
\mathbf 0 = \Phi(\hat\gamma) = \Phi(\gamma_0) + \frac{\partial\Phi(\gamma)}{\partial\gamma^T}\,(\hat\gamma - \gamma_0),
\]

which implies

\[
\sqrt n\,(\hat\gamma - \gamma_0) = \Big[\frac1n\sum_{i=1}^n \hat Q_i^T\hat\Sigma_i^{-1}\hat Q_i\Big]^{-1}\frac{1}{\sqrt n}\,\Phi(\gamma_0).
\]

Since $E\big[Q_i^T\Sigma_i^{-1}(\tilde{\mathbf r}_i - Q_i\gamma_0)\big] = \mathbf 0$ and $\mathrm{Cov}\big[Q_i^T\Sigma_i^{-1}(\tilde{\mathbf r}_i - Q_i\gamma_0)\big] = Q_i^T\Sigma_i^{-1}Q_i$, by the Lindeberg Central Limit Theorem we have

\[
\Big(\frac1n\sum_{i=1}^n Q_i^T\Sigma_i^{-1}Q_i\Big)^{-1/2}\frac{1}{\sqrt n}\sum_{i=1}^n Q_i^T\Sigma_i^{-1}\big(\tilde{\mathbf r}_i - Q_i\gamma_0\big) \xrightarrow{d} N(\mathbf 0, I).
\]

Noting that $\hat\gamma = \gamma_0 + o_p(1)$ and $N_i = O(N)$, and using Lemma 3.2 and (3.22), we have

\[
\hat\omega_i(\hat\gamma) = \hat\omega_i(\gamma_0) + o_p(1) = \omega_i + O_p(N^{-1/2}) + o_p(1) = \omega_i + o_p(1) \quad\text{as } N\to\infty,\ n\to\infty.
\]

Hence, it follows by the Law of Large Numbers, Lemma 3.2, and R6 that

\[
\Big(\frac1n\sum_{i=1}^n \hat Q_i^T\hat\Sigma_i^{-1}\hat Q_i\Big)^{-1/2}
\xrightarrow{p} \big[\Omega(\gamma_0)\big]^{1/2} \quad\text{as } N\to\infty,\ n\to\infty.
\]

Using Slutsky's theorem, we can show that

\[
\sqrt n\,(\hat\gamma - \gamma_0) = \Big[\frac1n\sum_{i=1}^n \hat Q_i^T\hat\Sigma_i^{-1}\hat Q_i\Big]^{-1}\Big[\frac{1}{\sqrt n}\sum_{i=1}^n \hat Q_i^T\hat\Sigma_i^{-1}\big(\tilde{\mathbf r}_i - \hat Q_i\gamma_0\big)\Big]
\xrightarrow{d} N\big(\mathbf 0,\ \Omega(\gamma_0)\big).
\]

We can extend the foregoing proofs to the case where $\lambda$ is unknown by replacing $\lambda$ with its consistent estimate. Note that at the estimate $\hat\gamma$, the estimate $\hat\lambda$ given in (3.13) can be shown to be consistent for $\lambda$ as $n\to\infty$ and $N\to\infty$ (see, e.g., Demidenko, 2004).

Chapter 4

Simultaneous Inference for Semiparametric NLME Models with Covariate Measurement Errors and Outcome-based Informative Dropouts

4.1 Introduction

In this chapter, we develop two approximate likelihood methods to simultaneously address covariate measurement errors and outcome-based informative dropouts in semiparametric NLME models. In Section 4.2, we propose models for this complicated problem. In Section 4.3, we obtain approximate MLEs of all model parameters, using a Monte Carlo EM (MCEM) algorithm along with Gibbs sampler methods. In Section 4.4, we propose an alternative and computationally much more efficient approximate method. The proposed methods are illustrated in a HIV dataset in Section 4.5 and are evaluated via simulation in Section 4.6. We summarize this chapter in Section 4.7 with some discussion.

4.2 Models for Nonignorable Missing Responses

In the presence of nonignorable or informative dropouts in the semiparametric NLME model (2.6) and (2.7) with the covariate process (2.8), we can write $\mathbf y_i = (\mathbf y_{mis,i}, \mathbf y_{obs,i})$ for individual $i$, where $\mathbf y_{mis,i}$ collects the missing components of $\mathbf y_i$ and $\mathbf y_{obs,i}$ collects the observed components of $\mathbf y_i$. Here, the missing $y_{ij}$'s are intermittent, i.e., we allow dropout individuals to possibly return to the study at a later time.
Let $\mathbf r_i = (r_{i1},\ldots,r_{in_i})^T$ be a vector of missing response indicators for individual $i$ such that $r_{ij} = 1$ if $y_{ij}$ is missing and 0 otherwise. Note that $r_{ij} = 1$ does not necessarily imply that $r_{i,j+1} = 1$. We have the observed data $\{(\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i),\ i = 1,\ldots,n\}$.

To allow for a nonignorable missing mechanism in the response, we need models for the missing response indicators $\mathbf r_i$, which are called dropout models. The parameters in the dropout models are treated as nuisance parameters and are usually not of inference interest. Thus, we try to reduce the number of nuisance parameters to make the estimation of the main parameters more efficient. Moreover, too many nuisance parameters may even make the response and covariate models non-identifiable. Therefore, we should be very cautious about adding extra nuisance parameters.

In general, the probability that $y_{ij}$ is missing may depend on many factors, such as responses, covariates, and individual random effects. Since the missing response indicators $r_{ij}$ are binary, a simple model for them is a logistic regression model, as follows. We will assume that the $r_{ij}$'s are independent for simplicity (to reduce the number of nuisance parameters):

\[
f(\mathbf r_i\,|\,\mathbf y_i, \mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\eta)
= \prod_{j=1}^{n_i}\big[P(r_{ij}=1\,|\,\mathbf y_i, \mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\eta)\big]^{r_{ij}}\big[1 - P(r_{ij}=1\,|\,\mathbf y_i, \mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\eta)\big]^{1-r_{ij}},
\]

with

\[
\mathrm{logit}\big[P(r_{ij}=1\,|\,\mathbf y_i, \mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\eta)\big]
= \log\frac{P(r_{ij}=1\,|\,\mathbf y_i, \mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\eta)}{1 - P(r_{ij}=1\,|\,\mathbf y_i, \mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\eta)}
= h(\mathbf y_i, \mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\eta),
\]

where $\boldsymbol\eta$ are the unknown nuisance parameters and $h(\cdot)$ is often chosen to be a linear function of $\mathbf y_i$, $\mathbf z_i$, $\mathbf a_i$, and $\mathbf b_i$. More complicated models can also be considered, but they may introduce more parameters and increase the computational burden. Note that the missingness of $y_{ij}$ may depend on the (unobserved) true covariates $\mathbf z^*_i$ rather than the observed error-prone covariates $\mathbf z_i$. In this case, a method similar to the one described below can be developed and will be discussed in the next chapter.

The density function $f(\mathbf r_i|\mathbf y_i, \mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\eta)$ is a general expression of the missing response mechanism. Little (1995) pointed out two ways to incorporate informative missingness:

- outcome-based informative if $f(\mathbf r_i|\mathbf y_i, \mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\eta) = f(\mathbf r_i|\mathbf y_i, \mathbf z_i; \boldsymbol\eta)$. That is, the probability that the current response is missing depends on the possibly unobserved response $\mathbf y_i$ and covariates $\mathbf z_i$ but not on the random effects $\mathbf a_i$ and $\mathbf b_i$. For example, a patient does not show up because he is too sick to go to the clinic.

- random-effect-based informative if $f(\mathbf r_i|\mathbf y_i, \mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\eta) = f(\mathbf r_i|\mathbf a_i, \mathbf b_i; \boldsymbol\eta)$. That is, the probability that the current response is missing depends on the underlying unobservable random effects $\mathbf a_i$ and $\mathbf b_i$ but not on $\mathbf y_i$ and $\mathbf z_i$. For example, a patient may be more likely to drop out if his initial viral decay is slower than that of other patients.

In this chapter, we focus on the outcome-based informative missing mechanism. Diggle and Kenward (1994), Little (1995), and Ibrahim et al. (2001) discussed various specifications of the outcome-based informative missing mechanism. We will assume, for example, that the $r_{ij}$'s are independent with

\[
\mathrm{logit}\big[P(r_{ij}=1\,|\,\mathbf y_i, \mathbf z_i; \boldsymbol\eta)\big]
= \eta_1 + \eta_2 z_{i1j} + \eta_3 z_{i2j} + \cdots + \eta_{v+1} z_{ivj} + \eta_{v+2}\,y_{ij};
\]

a simulation sketch of such a model is given below. More complicated dropout models can be specified in a similar way.
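To make the form of the dropout model concrete, the following is a minimal sketch (ours, not the thesis software) of simulating $r_{ij}$ from the logistic model above with a single covariate ($v = 1$); the values of $\boldsymbol\eta$ and the response trajectory are purely illustrative.

```r
# Minimal sketch (illustrative values only): simulating outcome-based
# informative dropout indicators r_ij from the logistic model
#   logit P(r_ij = 1 | y_i, z_i) = eta1 + eta2 * z_ij + eta3 * y_ij.
set.seed(1)
eta <- c(-2, -0.05, 0.97)          # (eta1, eta2, eta3); hypothetical values
n.i <- 10                          # number of scheduled measurements
z   <- rnorm(n.i)                  # observed (standardized) covariate values
y   <- 5 - 3 * (1:n.i) / n.i + rnorm(n.i, sd = 0.3)  # hypothetical responses

p.mis <- plogis(eta[1] + eta[2] * z + eta[3] * y)    # P(r_ij = 1)
r     <- rbinom(n.i, size = 1, prob = p.mis)         # missing indicators
y.obs <- ifelse(r == 1, NA, y)     # responses actually observed
cbind(y, p.mis, r, y.obs)
```

Note that, with $\eta_3 > 0$, a larger $y_{ij}$ increases its own missingness probability, so missingness depends on values that may themselves be missing -- the defining feature of a nonignorable mechanism.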
Note that the assumed dropout models are not testable based on the observed data, so it is important to carry out sensitivity analysis based on various plausible dropout models. If the main parameter estimates are not sensitive to the assumed dropout model, we may be confident about the results. Otherwise, if the estimates are sensitive to the assumed dropout model, we need further investigation of possible missing mechanisms.

4.3 Likelihood Inference

4.3.1 The Likelihood Function

We consider likelihood inference for semiparametric NLME models with outcome-based informative dropouts and measurement errors and missing data in time-varying covariates, based on the approximate models (2.6) - (2.8). Let $\boldsymbol\theta = (\boldsymbol\alpha, \boldsymbol\beta, \delta^2, R, A, B, \boldsymbol\eta)$ be the collection of all unknown model parameters. We assume that the parameters $\boldsymbol\alpha$, $\boldsymbol\beta$, $\delta^2$, $R$, $A$, $B$, and $\boldsymbol\eta$ are all distinct. The approximate log-likelihood for the observed data $\{(\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i),\ i = 1,\ldots,n\}$ can be written as

\[
l(\boldsymbol\theta) = \sum_{i=1}^n \log\int\!\!\int\!\!\int
f_Y(\mathbf y_i|\mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\alpha, \boldsymbol\beta, \delta^2)\,
f_Z(\mathbf z_i|\mathbf a_i; \boldsymbol\alpha, R)\,f(\mathbf a_i; A)\,f(\mathbf b_i; B)\,
f(\mathbf r_i|\mathbf y_i, \mathbf z_i; \boldsymbol\eta)\;
d\mathbf y_{mis,i}\,d\mathbf a_i\,d\mathbf b_i,
\]

where

\[
f_Y(\mathbf y_i|\mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\alpha, \boldsymbol\beta, \delta^2)
= \prod_{j=1}^{n_i}(2\pi\delta^2)^{-1/2}\exp\big\{-\big[y_{ij} - g_{ij}\big(\mathbf u_{ij}^T\boldsymbol\alpha + \mathbf v_{ij}^T\mathbf a_i,\ \boldsymbol\beta,\ \mathbf b_i\big)\big]^2/2\delta^2\big\},
\]
\[
f_Z(\mathbf z_i|\mathbf a_i; \boldsymbol\alpha, R)
= \prod_{k=1}^{m_i}|2\pi R|^{-1/2}\exp\big\{-(\mathbf z_{ik} - U_{ik}\boldsymbol\alpha - V_{ik}\mathbf a_i)^TR^{-1}(\mathbf z_{ik} - U_{ik}\boldsymbol\alpha - V_{ik}\mathbf a_i)/2\big\},
\]
\[
f(\mathbf a_i; A) = |2\pi A|^{-1/2}\exp\{-\mathbf a_i^TA^{-1}\mathbf a_i/2\},
\qquad
f(\mathbf b_i; B) = |2\pi B|^{-1/2}\exp\{-\mathbf b_i^TB^{-1}\mathbf b_i/2\}.
\]

The observed-data log-likelihood $l(\boldsymbol\theta)$ can be quite intractable, so we use a MCEM algorithm, similar to that in Section 3.3.2, to find the approximate MLEs of the parameters $\boldsymbol\theta$. By treating the unobservable random effects $\mathbf a_i$ and $\mathbf b_i$ as additional "missing" data, we have "complete data" $\{(\mathbf y_i, \mathbf z_i, \mathbf r_i, \mathbf a_i, \mathbf b_i),\ i = 1,\ldots,n\}$. The complete-data log-likelihood for all individuals can then be expressed as

\[
l_c(\boldsymbol\theta) = \sum_{i=1}^n l_c^{(i)}(\boldsymbol\theta)
= \sum_{i=1}^n\big\{\log f_Y(\mathbf y_i|\mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\alpha, \boldsymbol\beta, \delta^2)
+ \log f_Z(\mathbf z_i|\mathbf a_i; \boldsymbol\alpha, R)
+ \log f(\mathbf a_i; A) + \log f(\mathbf b_i; B)
+ \log f(\mathbf r_i|\mathbf y_i, \mathbf z_i; \boldsymbol\eta)\big\}, \tag{4.1}
\]

where $l_c^{(i)}$ is the complete-data log-likelihood for individual $i$.

4.3.2 Approximate MLEs Based on a MCEM Method

The EM algorithm iterates between an E-step, which computes the conditional expectation of the complete-data log-likelihood given the observed data, and an M-step, which maximizes this conditional expectation to update parameter estimates. For our models, the E-step is quite intractable due to nonlinearity, so we use Monte Carlo methods to approximate the intractable conditional expectation. In the M-step, we use standard complete-data optimization procedures to update parameter estimates.

Let $\boldsymbol\theta^{(t)}$ be the parameter estimates from the $t$-th EM iteration. The E-step for individual $i$ at the $(t+1)$-th EM iteration can be expressed as

\[
Q_i(\boldsymbol\theta|\boldsymbol\theta^{(t)})
= E\big(l_c^{(i)}(\boldsymbol\theta)\,\big|\,\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \boldsymbol\theta^{(t)}\big)
= \int\!\!\int\!\!\int\big[\log f_Y(\mathbf y_i|\mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\alpha, \boldsymbol\beta, \delta^2)
+ \log f_Z(\mathbf z_i|\mathbf a_i; \boldsymbol\alpha, R)
\]
\[
+ \log f(\mathbf a_i; A) + \log f(\mathbf b_i; B)
+ \log f(\mathbf r_i|\mathbf y_i, \mathbf z_i; \boldsymbol\eta)\big]
\times f(\mathbf y_{mis,i}, \mathbf a_i, \mathbf b_i\,|\,\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \boldsymbol\theta^{(t)})\,
d\mathbf y_{mis,i}\,d\mathbf a_i\,d\mathbf b_i
\]
\[
\equiv I_i^{(1)}(\boldsymbol\alpha, \boldsymbol\beta, \delta^2) + I_i^{(2)}(\boldsymbol\alpha, R) + I_i^{(3)}(A) + I_i^{(4)}(B) + I_i^{(5)}(\boldsymbol\eta). \tag{4.2}
\]

Since the expression (4.2) is an expectation with respect to $f(\mathbf y_{mis,i}, \mathbf a_i, \mathbf b_i|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \boldsymbol\theta^{(t)})$, it can be evaluated using the MCEM algorithm (Wei and Tanner, 1990). Specifically, we may use the Gibbs sampler to generate samples from $[\mathbf y_{mis,i}, \mathbf a_i, \mathbf b_i|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \boldsymbol\theta^{(t)}]$ by iteratively sampling from the full conditionals $[\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i, \mathbf a_i, \mathbf b_i; \boldsymbol\theta^{(t)}]$, $[\mathbf a_i|\mathbf y_i, \mathbf z_i, \mathbf r_i, \mathbf b_i; \boldsymbol\theta^{(t)}]$, and $[\mathbf b_i|\mathbf y_i, \mathbf z_i, \mathbf r_i, \mathbf a_i; \boldsymbol\theta^{(t)}]$ as follows.
Since $f(\mathbf r_i|\mathbf y_i, \mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\theta^{(t)}) = f(\mathbf r_i|\mathbf y_i, \mathbf z_i; \boldsymbol\theta^{(t)})$ under the outcome-based mechanism, the full conditionals satisfy

\[
f(\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i, \mathbf a_i, \mathbf b_i; \boldsymbol\theta^{(t)})
\;\propto\; f_Y(\mathbf y_i|\mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\theta^{(t)})\cdot f(\mathbf r_i|\mathbf y_i, \mathbf z_i; \boldsymbol\theta^{(t)}),
\]
\[
f(\mathbf a_i|\mathbf y_i, \mathbf z_i, \mathbf r_i, \mathbf b_i; \boldsymbol\theta^{(t)})
\;\propto\; f(\mathbf a_i; \boldsymbol\theta^{(t)})\,f_Z(\mathbf z_i|\mathbf a_i; \boldsymbol\theta^{(t)})\,f_Y(\mathbf y_i|\mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\theta^{(t)}),
\]
\[
f(\mathbf b_i|\mathbf y_i, \mathbf z_i, \mathbf r_i, \mathbf a_i; \boldsymbol\theta^{(t)})
\;\propto\; f(\mathbf b_i; \boldsymbol\theta^{(t)})\,f_Y(\mathbf y_i|\mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\theta^{(t)}). \tag{4.3}
\]

Monte Carlo samples from each of the full conditionals can be generated using rejection sampling methods, as in Wu (2004). Alternatively, the integral (4.2) may be evaluated using the importance sampling method (see Section 3.3.3).

For individual $i$, let $\{(\tilde{\mathbf y}_{mis,i}^{(1)}, \tilde{\mathbf a}_i^{(1)}, \tilde{\mathbf b}_i^{(1)}),\ldots,(\tilde{\mathbf y}_{mis,i}^{(k_t)}, \tilde{\mathbf a}_i^{(k_t)}, \tilde{\mathbf b}_i^{(k_t)})\}$ denote a random sample of size $k_t$ generated from $[\mathbf y_{mis,i}, \mathbf a_i, \mathbf b_i|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \boldsymbol\theta^{(t)}]$. Note that each $(\tilde{\mathbf y}_{mis,i}^{(k)}, \tilde{\mathbf a}_i^{(k)}, \tilde{\mathbf b}_i^{(k)})$ depends on the EM iteration number $t$, which is suppressed throughout. The E-step at the $(t+1)$-th EM iteration can then be expressed as

\[
Q(\boldsymbol\theta|\boldsymbol\theta^{(t)}) = \sum_{i=1}^n Q_i(\boldsymbol\theta|\boldsymbol\theta^{(t)})
\approx \sum_{i=1}^n \frac{1}{k_t}\sum_{k=1}^{k_t} l_c^{(i)}\big(\boldsymbol\theta;\ (\tilde{\mathbf y}_{mis,i}^{(k)}, \mathbf y_{obs,i}),\ \tilde{\mathbf a}_i^{(k)},\ \tilde{\mathbf b}_i^{(k)}\big)
\]
\[
= \sum_{i=1}^n\frac{1}{k_t}\sum_{k=1}^{k_t}\log f_Y\big((\tilde{\mathbf y}_{mis,i}^{(k)}, \mathbf y_{obs,i})|\mathbf z_i, \tilde{\mathbf a}_i^{(k)}, \tilde{\mathbf b}_i^{(k)}; \boldsymbol\alpha, \boldsymbol\beta, \delta^2\big)
+ \sum_{i=1}^n\frac{1}{k_t}\sum_{k=1}^{k_t}\log f_Z(\mathbf z_i|\tilde{\mathbf a}_i^{(k)}; \boldsymbol\alpha, R)
+ \sum_{i=1}^n\frac{1}{k_t}\sum_{k=1}^{k_t}\log f(\tilde{\mathbf a}_i^{(k)}; A)
\]
\[
+ \sum_{i=1}^n\frac{1}{k_t}\sum_{k=1}^{k_t}\log f(\tilde{\mathbf b}_i^{(k)}; B)
+ \sum_{i=1}^n\frac{1}{k_t}\sum_{k=1}^{k_t}\log f\big(\mathbf r_i|(\tilde{\mathbf y}_{mis,i}^{(k)}, \mathbf y_{obs,i}), \mathbf z_i; \boldsymbol\eta\big)
\]
\[
\equiv Q^{(1)}(\boldsymbol\alpha, \boldsymbol\beta, \delta^2|\boldsymbol\theta^{(t)}) + Q^{(2)}(\boldsymbol\alpha, R|\boldsymbol\theta^{(t)}) + Q^{(3)}(A|\boldsymbol\theta^{(t)}) + Q^{(4)}(B|\boldsymbol\theta^{(t)}) + Q^{(5)}(\boldsymbol\eta|\boldsymbol\theta^{(t)}). \tag{4.4}
\]

The M-step then maximizes $Q(\boldsymbol\theta|\boldsymbol\theta^{(t)})$ to produce an updated estimate $\boldsymbol\theta^{(t+1)}$, so it is like a complete-data maximization. Since the parameters in $Q^{(1)} + Q^{(2)}$, $Q^{(3)}$, $Q^{(4)}$, and $Q^{(5)}$ are distinct, the M-step can be implemented by maximizing $Q^{(1)} + Q^{(2)}$ (which share $\boldsymbol\alpha$), $Q^{(3)}$, $Q^{(4)}$, and $Q^{(5)}$ separately, using standard optimization procedures for the corresponding complete-data models, as in Section 3.3.2.

As in Section 3.3.2, we use the approximate formula suggested by McLachlan and Krishnan (1997) to obtain the variance-covariance matrix of the approximate MLE $\hat{\boldsymbol\theta}$. Let $\mathbf s_c^{(i)} = \partial l_c^{(i)}/\partial\boldsymbol\theta$, where $l_c^{(i)}$ is the complete-data log-likelihood for individual $i$. Then an approximate formula for the variance-covariance matrix of $\hat{\boldsymbol\theta}$ is

\[
\widehat{\mathrm{Cov}}(\hat{\boldsymbol\theta}) = \Big[\sum_{i=1}^n E\big(\mathbf s_c^{(i)}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}\big)\,E\big(\mathbf s_c^{(i)}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}\big)^T\Big]^{-1}, \tag{4.5}
\]

where the expectations can again be approximated by Monte Carlo empirical means, as in (4.4).

In summary, the foregoing MCEM algorithm proceeds as follows.

Step 1. Obtain an initial estimate of $(\boldsymbol\alpha, \boldsymbol\beta, \delta^2, R, A, B) = (\boldsymbol\alpha^{(0)}, \boldsymbol\beta^{(0)}, \delta^{2(0)}, R^{(0)}, A^{(0)}, B^{(0)})$ based on a naive method, which ignores covariate measurement errors and missing data, and an initial estimate of $\boldsymbol\eta = \boldsymbol\eta^{(0)}$ based on the dropout model, filling in $\mathbf y_{mis,i}$ with the average of $\mathbf y_{obs,i}$. Set $\tilde{\mathbf a}_i^{(0)} = \mathbf 0$ and $\tilde{\mathbf b}_i^{(0)} = \mathbf 0$.
Step 2. At the $t$-th iteration, obtain Monte Carlo samples of the "missing data" $(\mathbf y_{mis,i}, \mathbf a_i, \mathbf b_i)$ using the Gibbs sampler along with rejection sampling methods, by sampling from the full conditionals $[\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i, \mathbf a_i, \mathbf b_i; \boldsymbol\theta^{(t)}]$, $[\mathbf a_i|\mathbf y_i, \mathbf z_i, \mathbf r_i, \mathbf b_i; \boldsymbol\theta^{(t)}]$, and $[\mathbf b_i|\mathbf y_i, \mathbf z_i, \mathbf r_i, \mathbf a_i; \boldsymbol\theta^{(t)}]$, or use importance sampling methods to approximate the conditional expectation in the E-step.

Step 3. Update the estimates $\boldsymbol\theta^{(t+1)}$ using standard complete-data optimization procedures.

Step 4. Iterate between Step 2 and Step 3 until convergence.

4.3.3 Monte Carlo Sampling

Gibbs Sampler

As in Section 3.3.3, we can again use the Gibbs sampler to draw the desired samples $(\tilde{\mathbf y}_{mis,i}^{(k)}, \tilde{\mathbf a}_i^{(k)}, \tilde{\mathbf b}_i^{(k)})$ as follows. Set initial values $(\tilde{\mathbf y}_{mis,i}^{(0)}, \tilde{\mathbf a}_i^{(0)}, \tilde{\mathbf b}_i^{(0)})$. Suppose that the current generated values are $(\tilde{\mathbf y}_{mis,i}^{(k)}, \tilde{\mathbf a}_i^{(k)}, \tilde{\mathbf b}_i^{(k)})$; we can obtain $(\tilde{\mathbf y}_{mis,i}^{(k+1)}, \tilde{\mathbf a}_i^{(k+1)}, \tilde{\mathbf b}_i^{(k+1)})$ as follows.

Step 1. Draw a sample for the "missing" response, $\tilde{\mathbf y}_{mis,i}^{(k+1)}$, from $f(\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i, \tilde{\mathbf a}_i^{(k)}, \tilde{\mathbf b}_i^{(k)}; \boldsymbol\theta^{(t)})$.

Step 2. Draw a sample for the "missing" random effects, $\tilde{\mathbf a}_i^{(k+1)}$, from $f(\mathbf a_i|(\tilde{\mathbf y}_{mis,i}^{(k+1)}, \mathbf y_{obs,i}), \mathbf z_i, \mathbf r_i, \tilde{\mathbf b}_i^{(k)}; \boldsymbol\theta^{(t)})$.

Step 3. Draw a sample for the "missing" random effects, $\tilde{\mathbf b}_i^{(k+1)}$, from $f(\mathbf b_i|(\tilde{\mathbf y}_{mis,i}^{(k+1)}, \mathbf y_{obs,i}), \mathbf z_i, \mathbf r_i, \tilde{\mathbf a}_i^{(k+1)}; \boldsymbol\theta^{(t)})$.

After a sufficiently large burn-in of $r$ iterations of Steps 1 - 3, the sampled values will achieve a steady state. Then $\{(\tilde{\mathbf y}_{mis,i}^{(k)}, \tilde{\mathbf a}_i^{(k)}, \tilde{\mathbf b}_i^{(k)}),\ k = r+1,\ldots,r+k_t\}$ can be treated as samples from the multidimensional density function $f(\mathbf y_{mis,i}, \mathbf a_i, \mathbf b_i|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \boldsymbol\theta^{(t)})$. And, if we choose a sufficiently large gap $r'$ (say $r' = 10$), we can treat the sample series $\{(\tilde{\mathbf y}_{mis,i}^{(k)}, \tilde{\mathbf a}_i^{(k)}, \tilde{\mathbf b}_i^{(k)}),\ k = r+r', r+2r',\ldots\}$ as independent samples from the multidimensional density function. There are several ways to get the initial values $(\tilde{\mathbf y}_{mis,i}^{(0)}, \tilde{\mathbf a}_i^{(0)}, \tilde{\mathbf b}_i^{(0)})$. A simple way is to set $\tilde{\mathbf y}_{mis,i}^{(0)}$ to the average of $\mathbf y_{obs,i}$, and $(\tilde{\mathbf a}_i^{(0)}, \tilde{\mathbf b}_i^{(0)})$ to $(\mathbf 0, \mathbf 0)$.

Multivariate Rejection Algorithm

Sampling from the three full conditionals can be accomplished by the multivariate rejection sampling method. For example, consider sampling from $f(\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i, \mathbf a_i, \mathbf b_i; \boldsymbol\theta^{(t)})$ in (4.3). Let $f^*(\mathbf y_{mis,i}) = f(\mathbf r_i|\mathbf y_i, \mathbf z_i; \boldsymbol\theta^{(t)})$ and $c = \sup_{\mathbf u}\{f^*(\mathbf u)\}$. We assume $c < \infty$. A random sample from $f(\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i, \mathbf a_i, \mathbf b_i; \boldsymbol\theta^{(t)})$ can then be obtained as follows by multivariate rejection sampling:

Step 1. Sample $\mathbf y^*_{mis,i}$ from $f_Y(\mathbf y_{mis,i}|\mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\theta^{(t)})$, and independently, sample $w$ from the uniform $(0, 1)$ distribution.

Step 2. If $w \le f^*(\mathbf y^*_{mis,i})/c$, then accept $\mathbf y^*_{mis,i}$; otherwise, go back to Step 1.

Samples from $f(\mathbf a_i|\mathbf y_i, \mathbf z_i, \mathbf r_i, \mathbf b_i; \boldsymbol\theta^{(t)})$ and $f(\mathbf b_i|\mathbf y_i, \mathbf z_i, \mathbf r_i, \mathbf a_i; \boldsymbol\theta^{(t)})$ can be obtained in a similar way. Therefore, the Gibbs sampler in conjunction with multivariate rejection sampling can be used to obtain samples from $[\mathbf y_{mis,i}, \mathbf a_i, \mathbf b_i|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \boldsymbol\theta^{(t)}]$.
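As a concrete illustration of the rejection step above, here is a minimal sketch (ours; the densities and dropout weight are simplified scalar stand-ins, with hypothetical parameter values) that samples one missing response from a density proportional to $f_Y(y|\cdot)\times f^*(y)$, with a normal proposal and a logistic dropout weight:

```r
# Minimal sketch of the rejection step for one scalar missing y_ij.
# Target density proportional to f_Y(y) * f.star(y), where
#   f_Y(y)    : N(mu, sd^2)  -- the proposal, easy to sample from
#   f.star(y) : P(r_ij = 1 | y) from the logistic dropout model
mu <- 2.5; sd <- 0.5                    # assumed conditional mean/sd under f_Y
eta <- c(-2, 0.97)                      # assumed: logit P = eta1 + eta2 * y
f.star <- function(y) plogis(eta[1] + eta[2] * y)
c.bound <- 1                            # f.star is a probability, so sup <= 1

rejection.sample <- function() {
  repeat {
    y <- rnorm(1, mu, sd)               # Step 1: draw from the proposal f_Y
    w <- runif(1)
    if (w <= f.star(y) / c.bound) return(y)  # Step 2: accept w.p. f.star(y)/c
  }
}
ymis <- replicate(1000, rejection.sample()) # accepted draws from the target
```

Because $f^*(\cdot)$ is a probability, the envelope constant $c = 1$ always works, although a tighter bound improves the acceptance rate.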
4.4 A Computationally More Efficient Approximate Method

The estimation method described in the previous section may be computationally very intensive and may even present computational problems such as slow or non-convergence, since the method involves sampling the random effects $(\mathbf a_i, \mathbf b_i)$, which may be high-dimensional (see the detailed discussion in Section 3.4.1). To overcome these difficulties, in this section we propose an alternative approximate method, which further approximates model (2.5) by taking a first-order Taylor expansion around the current parameter and random-effects estimates; this leads to a LME response model. For the resulting LME response model with the covariate model (2.8) and the dropout model, we can obtain approximate MLEs of the model parameters by using a computationally more efficient MCEM algorithm. The random effects $\mathbf a_i$ and $\mathbf b_i$ can be integrated out in the E-step of the EM algorithm, so sampling the random effects in the E-step is no longer needed. Therefore, the proposed approximate method may provide substantial computational advantages over the MCEM method of the previous section. Moreover, the proposed method can be used to obtain good parameter starting values for the MCEM method of the previous section.

Denote the current estimates of $(\boldsymbol\theta, \mathbf a_i, \mathbf b_i)$ by $(\hat{\boldsymbol\theta}, \hat{\mathbf a}_i, \hat{\mathbf b}_i)$, where $\hat{\mathbf a}_i = E(\mathbf a_i|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})$ and $\hat{\mathbf b}_i = E(\mathbf b_i|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})$, suppressing the EM iteration number. Taking a first-order Taylor expansion of $g_{ij}$ in (3.6) around the current parameter estimates $\hat{\boldsymbol\alpha}$ and $\hat{\boldsymbol\beta}$ and random-effects estimates $\hat{\mathbf a}_i$ and $\hat{\mathbf b}_i$, we obtain the following LME response model

\[
\mathbf y_i = \tilde{\mathbf g}_i + W_i\boldsymbol\alpha + X_i\boldsymbol\beta + H_i\mathbf a_i + T_i\mathbf b_i + \mathbf e_i, \tag{4.6}
\]

where

\[
W_i = (\mathbf w_{i1},\ldots,\mathbf w_{in_i})^T \ \text{with}\ \mathbf w_{ij} = \frac{\partial g_{ij}}{\partial\boldsymbol\alpha},\qquad
X_i = (\mathbf x_{i1},\ldots,\mathbf x_{in_i})^T \ \text{with}\ \mathbf x_{ij} = \frac{\partial g_{ij}}{\partial\boldsymbol\beta},
\]
\[
H_i = (\mathbf h_{i1},\ldots,\mathbf h_{in_i})^T \ \text{with}\ \mathbf h_{ij} = \frac{\partial g_{ij}}{\partial\mathbf a_i},\qquad
T_i = (\mathbf t_{i1},\ldots,\mathbf t_{in_i})^T \ \text{with}\ \mathbf t_{ij} = \frac{\partial g_{ij}}{\partial\mathbf b_i},
\]
\[
\tilde{\mathbf g}_i = \mathbf g_i(\hat{\boldsymbol\alpha}, \hat{\boldsymbol\beta}, \hat{\mathbf a}_i, \hat{\mathbf b}_i) - W_i\hat{\boldsymbol\alpha} - X_i\hat{\boldsymbol\beta} - H_i\hat{\mathbf a}_i - T_i\hat{\mathbf b}_i,
\]

with all the partial derivatives evaluated at $(\hat{\boldsymbol\alpha}, \hat{\boldsymbol\beta}, \hat{\mathbf a}_i, \hat{\mathbf b}_i)$. Our proposed approximate method is to iteratively solve the LME response model (4.6). For the LME response model (4.6) with the covariate model (2.8) and the dropout model, the MLEs of the model parameters can be obtained by a MCEM algorithm in which the random effects $\mathbf a_i$ and $\mathbf b_i$ are integrated out in the E-step, as shown below. Thus, the E-step only involves sampling $\mathbf y_{mis,i}$, rather than $(\mathbf y_{mis,i}, \mathbf a_i, \mathbf b_i)$ as in Section 4.3. Moreover, some analytic expressions for the M-step can be obtained. These result in a substantial improvement in computational efficiency.
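The design matrices in (4.6) are formed by differentiating the mean function; the following is a minimal sketch (ours, with a hypothetical two-phase mean function standing in for the viral dynamic mean of (4.8) - (4.10)) of building one row of $W_i$, $X_i$, $H_i$, and $T_i$ by numerical differentiation:

```r
# Minimal sketch: one linearization step for model (4.6), using a hypothetical
# mean function g(alpha, beta, a, b, t) and central-difference derivatives.
g <- function(alpha, beta, a, b, t)
  log10(exp(beta[1] + b[1] - (beta[2] + beta[3] * (alpha[1] + a[1]) + b[2]) * t) +
        exp(beta[4] + b[3] - (beta[5] + b[4]) * t))

num.grad <- function(f, x, eps = 1e-6) {   # central differences, coordinate-wise
  sapply(seq_along(x), function(j) {
    xp <- x; xm <- x
    xp[j] <- xp[j] + eps; xm[j] <- xm[j] - eps
    (f(xp) - f(xm)) / (2 * eps)
  })
}

alpha.hat <- c(0.5); beta.hat <- c(12, 67, 1.5, 7, 2)   # illustrative values
a.hat <- c(0); b.hat <- rep(0, 4); tij <- 0.3

w.ij <- num.grad(function(aa) g(aa, beta.hat, a.hat, b.hat, tij), alpha.hat)
x.ij <- num.grad(function(bb) g(alpha.hat, bb, a.hat, b.hat, tij), beta.hat)
h.ij <- num.grad(function(aa) g(alpha.hat, beta.hat, aa, b.hat, tij), a.hat)
t.ij <- num.grad(function(bb) g(alpha.hat, beta.hat, a.hat, bb, tij), b.hat)
g.tilde.ij <- g(alpha.hat, beta.hat, a.hat, b.hat, tij) -
  sum(w.ij * alpha.hat) - sum(x.ij * beta.hat) -
  sum(h.ij * a.hat) - sum(t.ij * b.hat)
```

Stacking these rows over $j$ gives $W_i$, $X_i$, $H_i$, $T_i$, and $\tilde{\mathbf g}_i$ in (4.6); closed-form derivatives can of course replace the numerical ones when available.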
4.4.1 A Much Simpler E-step

In this section, we show that, based on the approximate LME response model (4.6) and the covariate measurement error model (2.8) along with the dropout model, the random effects $(\mathbf a_i, \mathbf b_i)$ in the E-step of Section 4.3 can be integrated out. Based on standard results for multivariate normal distributions, the distribution of $(\mathbf y_i, \mathbf z_i, \mathbf a_i, \mathbf b_i)$ for individual $i$ is given by

\[
\begin{pmatrix}\mathbf y_i\\ \mathbf z_i\\ \mathbf a_i\\ \mathbf b_i\end{pmatrix}
\sim N\left(
\begin{pmatrix}\tilde{\mathbf g}_i + W_i\boldsymbol\alpha + X_i\boldsymbol\beta\\ U_i\boldsymbol\alpha\\ \mathbf 0\\ \mathbf 0\end{pmatrix},\;
\begin{pmatrix}
H_iAH_i^T + T_iBT_i^T + \delta^2I & H_iAV_i^T & H_iA & T_iB\\
V_iAH_i^T & V_iAV_i^T + I\otimes R & V_iA & 0\\
AH_i^T & AV_i^T & A & 0\\
BT_i^T & 0 & 0 & B
\end{pmatrix}\right).
\]

Since $H_iAH_i^T + T_iBT_i^T + \delta^2I$ and $V_iAV_i^T + I\otimes R$ are positive definite, they are symmetric and invertible. By the inverse operation of partitioned matrices, we can write

\[
\begin{pmatrix}
H_iAH_i^T + T_iBT_i^T + \delta^2I & H_iAV_i^T\\
V_iAH_i^T & V_iAV_i^T + I\otimes R
\end{pmatrix}^{-1}
= \begin{pmatrix}
G_i + F_iE_iF_i^T & -F_iE_i\\
-E_iF_i^T & E_i
\end{pmatrix}, \tag{4.7}
\]

where

\[
G_i = \big(H_iAH_i^T + T_iBT_i^T + \delta^2I\big)^{-1},\qquad
F_i = G_iH_iAV_i^T,\qquad
E_i = \big[(V_iAV_i^T + I\otimes R) - V_iAH_i^T\,G_i\,H_iAV_i^T\big]^{-1}.
\]

Because the random effects $\mathbf a_i$ and $\mathbf b_i$ are conditionally independent of $\mathbf r_i$ given $(\mathbf y_i, \mathbf z_i)$, it is straightforward to obtain the conditional distribution of $[\mathbf a_i, \mathbf b_i|\mathbf y_i, \mathbf z_i, \mathbf r_i]$ when $\boldsymbol\theta = \hat{\boldsymbol\theta}$:

\[
\begin{pmatrix}\mathbf a_i\\ \mathbf b_i\end{pmatrix}\bigg|\,\mathbf y_i, \mathbf z_i, \mathbf r_i
\;\sim\; N\left(\begin{pmatrix}\boldsymbol\nu_{a_i}\\ \boldsymbol\nu_{b_i}\end{pmatrix},\;
\begin{pmatrix}\Sigma_{a_i} & \Sigma_{a_ib_i}\\ \Sigma_{b_ia_i} & \Sigma_{b_i}\end{pmatrix}\right),
\]

where

\[
\boldsymbol\nu_{a_i} = \big[AH_i^T(G_i + F_iE_iF_i^T) - AV_i^TE_iF_i^T\big](\mathbf y_i - \tilde{\mathbf g}_i - W_i\boldsymbol\alpha - X_i\boldsymbol\beta)
+ \big[AV_i^TE_i - AH_i^TF_iE_i\big](\mathbf z_i - U_i\boldsymbol\alpha)\Big|_{\boldsymbol\theta=\hat{\boldsymbol\theta}},
\]
\[
\boldsymbol\nu_{b_i} = BT_i^T(G_i + F_iE_iF_i^T)(\mathbf y_i - \tilde{\mathbf g}_i - W_i\boldsymbol\alpha - X_i\boldsymbol\beta)
- BT_i^TF_iE_i(\mathbf z_i - U_i\boldsymbol\alpha)\Big|_{\boldsymbol\theta=\hat{\boldsymbol\theta}},
\]
\[
\Sigma_{a_i} = A - \big[AH_i^T(G_i + F_iE_iF_i^T) - AV_i^TE_iF_i^T\big]H_iA - \big[AV_i^TE_i - AH_i^TF_iE_i\big]V_iA\Big|_{\boldsymbol\theta=\hat{\boldsymbol\theta}},
\]
\[
\Sigma_{b_i} = B - BT_i^T(G_i + F_iE_iF_i^T)T_iB\Big|_{\boldsymbol\theta=\hat{\boldsymbol\theta}},
\qquad
\Sigma_{a_ib_i} = \big(\Sigma_{b_ia_i}\big)^T = \big[AV_i^TE_iF_i^T - AH_i^T(G_i + F_iE_iF_i^T)\big]T_iB\Big|_{\boldsymbol\theta=\hat{\boldsymbol\theta}}.
\]

By the expectation and covariance properties for multivariate random variables, we have

\[
E(\mathbf a_i|\mathbf y_i, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}) = \boldsymbol\nu_{a_i},\quad
E(\mathbf b_i|\cdot) = \boldsymbol\nu_{b_i},\quad
E(\mathbf a_i\mathbf a_i^T|\cdot) = \Sigma_{a_i} + \boldsymbol\nu_{a_i}\boldsymbol\nu_{a_i}^T,\quad
E(\mathbf b_i\mathbf b_i^T|\cdot) = \Sigma_{b_i} + \boldsymbol\nu_{b_i}\boldsymbol\nu_{b_i}^T,\quad
E(\mathbf a_i\mathbf b_i^T|\cdot) = \Sigma_{a_ib_i} + \boldsymbol\nu_{a_i}\boldsymbol\nu_{b_i}^T.
\]

Since

\[
f(\mathbf y_{mis,i}, \mathbf a_i, \mathbf b_i|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})
= f(\mathbf a_i, \mathbf b_i|\mathbf y_i, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})\,
f(\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}),
\]

the conditional expectations in the E-step, which correspond to $I_i^{(1)}$ - $I_i^{(5)}$ in (4.2), reduce to integrals over the conditional density $f(\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})$ alone: expanding each quadratic form and taking the inner expectations over $(\mathbf a_i, \mathbf b_i)$ term by term, using the closed-form conditional moments above, yields

\[
I_i^{(1)}(\boldsymbol\alpha, \boldsymbol\beta, \delta^2) = -\frac{n_i}{2}\log(2\pi\delta^2) - \frac{1}{2\delta^2}\int\Big\{
(\mathbf y_i - \tilde{\mathbf g}_i - W_i\boldsymbol\alpha - X_i\boldsymbol\beta)^T(\mathbf y_i - \tilde{\mathbf g}_i - W_i\boldsymbol\alpha - X_i\boldsymbol\beta)
\]
\[
- 2(\mathbf y_i - \tilde{\mathbf g}_i - W_i\boldsymbol\alpha - X_i\boldsymbol\beta)^T\big[H_i\,E(\mathbf a_i|\mathbf y_i, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}) + T_i\,E(\mathbf b_i|\mathbf y_i, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})\big]
\]
\[
+ \mathrm{tr}\big[H_i(\Sigma_{a_i} + \boldsymbol\nu_{a_i}\boldsymbol\nu_{a_i}^T)H_i^T\big]
+ 2\,\mathrm{tr}\big[H_i(\Sigma_{a_ib_i} + \boldsymbol\nu_{a_i}\boldsymbol\nu_{b_i}^T)T_i^T\big]
+ \mathrm{tr}\big[T_i(\Sigma_{b_i} + \boldsymbol\nu_{b_i}\boldsymbol\nu_{b_i}^T)T_i^T\big]\Big\}
f(\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})\,d\mathbf y_{mis,i},
\]
\[
I_i^{(2)}(\boldsymbol\alpha, R) = -\frac{m_i}{2}\log|2\pi R| - \frac12\int\Big\{
\sum_{j=1}^{m_i}(\mathbf z_{ij} - U_{ij}\boldsymbol\alpha)^TR^{-1}(\mathbf z_{ij} - U_{ij}\boldsymbol\alpha)
- 2\sum_{j=1}^{m_i}(\mathbf z_{ij} - U_{ij}\boldsymbol\alpha)^TR^{-1}V_{ij}\,E(\mathbf a_i|\mathbf y_i, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})
\]
\[
+ \sum_{j=1}^{m_i}\mathrm{tr}\big[V_{ij}^TR^{-1}V_{ij}(\Sigma_{a_i} + \boldsymbol\nu_{a_i}\boldsymbol\nu_{a_i}^T)\big]\Big\}
f(\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})\,d\mathbf y_{mis,i},
\]
\[
I_i^{(3)}(A) = -\frac12\log|2\pi A| - \frac12\int\mathrm{tr}\big[A^{-1}(\Sigma_{a_i} + \boldsymbol\nu_{a_i}\boldsymbol\nu_{a_i}^T)\big]\,
f(\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})\,d\mathbf y_{mis,i},
\]
\[
I_i^{(4)}(B) = -\frac12\log|2\pi B| - \frac12\int\mathrm{tr}\big[B^{-1}(\Sigma_{b_i} + \boldsymbol\nu_{b_i}\boldsymbol\nu_{b_i}^T)\big]\,
f(\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})\,d\mathbf y_{mis,i},
\]
\[
I_i^{(5)}(\boldsymbol\eta) = \int\log f(\mathbf r_i|\mathbf y_i, \mathbf z_i; \boldsymbol\eta)\,
f(\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})\,d\mathbf y_{mis,i}.
\]

Thus, compared with the intractable high-dimensional integrals in (4.2), the above integrals $I_i^{(1)}$ - $I_i^{(5)}$ have much lower dimension, i.e., the E-step is substantially simplified. Note that $I_i^{(1)}$ - $I_i^{(5)}$ are expectations with respect to $f(\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})$, so they may be evaluated using common numerical integration methods such as Gaussian quadrature if the dimension of $\mathbf y_{mis,i}$ is small. If the dimension of $\mathbf y_{mis,i}$ is not small, we can use rejection sampling methods to generate samples from $[\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}]$ based on the conditional density

\[
f(\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})
\;\propto\; f(\mathbf y_i, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})
= f(\mathbf y_i, \mathbf z_i; \hat{\boldsymbol\theta})\,f(\mathbf r_i|\mathbf y_i, \mathbf z_i; \hat{\boldsymbol\theta}),
\]

where the distribution of $(\mathbf y_i, \mathbf z_i)$ is

\[
\begin{pmatrix}\mathbf y_i\\ \mathbf z_i\end{pmatrix}
\sim N\left(\begin{pmatrix}\tilde{\mathbf g}_i + W_i\boldsymbol\alpha + X_i\boldsymbol\beta\\ U_i\boldsymbol\alpha\end{pmatrix},\;
\begin{pmatrix}H_iAH_i^T + T_iBT_i^T + \delta^2I & H_iAV_i^T\\ V_iAH_i^T & V_iAV_i^T + I\otimes R\end{pmatrix}\right).
\]

For individual $i$, let $\tilde{\mathbf y}_{mis,i}^{(1)},\ldots,\tilde{\mathbf y}_{mis,i}^{(k_t)}$ denote a random sample of size $k_t$ generated from $[\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}]$. Note that each $\tilde{\mathbf y}_{mis,i}^{(k)}$ depends on the EM iteration number $t$, which is suppressed throughout.
i , » ; 0) * fc=i J  Sil  obs  u  y ), ofeM  s  Z i j  r  i;  9)}  t  t r  H i  +^( i|(y2l,i> a  z  yobs,i),  r  ry, 0) ^(aiKyJJ^,  Zi,  y bs,i), 0  zu  ~6) )Hj}  z n;  ~9) \T?}  T  kt  ^  + 71 7 E  2 t r  fc=i  {^[  +E(^\(y Xi, {  C o v  y  (  ,i),  o b s  l(ySl,i'  bi  '  a i  yobs,i),  Zi,  z vf, e)E(b \(y X , {  u  l  Vi;  y  l  o b s  9)  ,i),  T  u  k  t  E < < [Cov(b,|(yS ^ yobs,i), Zi, v , fc=i +£?(b |(y^ , y ,i), , r ~9) £(bi|(y<2flii, tr  +771  T  i  Sl  i i  Z i  o b s  ~9)  f  y<*a,i), z r u  i ;  ~0) ] T  i ;  if}},  1 f _ '_ m  j f ( a , iJ) «  - ^ i log [2jriJ| - J 2  - E 2  77 E ( j=i k=i _.  TUi  z  «-  n )  u  a  t r  l  o  g  |  2 y r A  Zi,Vi,  o)  ai  s  0  i ;  * fc=l  | _ 1  ry, 9)  Zi,  s  „ _1  yob**),  V  Ba  (j4)  £N(y™U  ^  T  tj  W ^ H [Cov( |(yS ., y ^), Zj, r 9)  +-(*l(yS2,i. yobs,i), f)  1  tj  kt  + E J7 E j=l  £ ( z „ - U af R' {z - U„ a) W -  ^{A-  1  £?(ai|(y2ii, yo6«,i), 8i  [Cov(a,|(yS , yob.,0,  Zi,  r>;  S|i  r;; 0 ) ] } T  0)  * fc=i •  +£( il(y2L, a  yc*,i). s  90  Zi,  r  i;  9)  £(a;|(yf., y s  o6s  ,i),  Z i  , ri;  9) }}, T  I^\B)  - 1  - - log\2KB\ l  »  \  £  I  [Cov^Ky JS,  ^  V  +^( i|(ySi,i yobs,i), ^ , r b  ;  i ;  y ^ ) ,z 0  0) £ ; ( b i | ( y S , y s i  o b S i  i ;  i),  r  i 5  0)  z r h  0) ]}, T  i ;  kt  4°fa)  7 r £ g / ( i l ( y 2 l , i . y<* ,i), *; l o  ~  r  z  8  * fc=i  4.4.2  The M-step  In this section, we derive some closed-form expressions for the M-step, so we avoid some iterative algorithms, which may be computationally inefficient. The M-step of the E M algorithm maximizes  Q{6\9)  =  n  Y$\<*i  A  + f V,  ^  R) + WW  + f fa)] }  + W(B)  i=l  to produce an updated estimate of 6. Note that if we were to observe (ei, €i, a i , b i ) , in addition to (y fc ,i, Z i , r i ) , we would use the following estimates 0  s  n  n  ^ = £ ^ /£ *> e  ei  i=l  i—1 n  i  W  where J2 I «> i=l e  and 5 , respectively.  m  i=l h h  n  i=l  2~2 Yl ij Ij> i=lj'=l  e  e  R=J2 i I/ >  n  n e  Y/)2 iJ l/J2 i'  i=l j=l  = J]aiaf/n,  n mi e  n  n  i=l n  n m; R==  €  2~3 i fi a  a  i=l  Since ei,  a  n  d  E  D  i b f  a  r  "sufficient" statistics for <5,  e  2  R, A,  i=l a^, and b i are unobservable, we can "estimate" them by  their conditional expectations given the observed data (y f, i, Z i , r ^ , as in Laird and Ware 0  M  S)  f 1 (1982). Based on results multivariate Hi A we Hf have + TiB Tj gi in + Wia + XiP ^analysis,  Zi  V <) e  ~N  Vi A Hf  Uioc \  0  J  \  91  5  2  1  +51 2  HiAV? ViAVf  + I^R 0  5 I^ 2  0 51 j 2  and Ei + Wicx +  N  V  Hi A Hf + TiB 7 f + 8 I ViAVT  0  ^ J  V  \  HiAV?  2  UiC*  e  where  Xip  + I®R Ml  0  Mij R  is a vrrii x v matrix with the j t h v y. v submatrix R and zeros elsewhere. Since are conditionally independent of r-j given y j and Z j , we have  ej and  /(ej|yj,  Z j , ry, 0)  /(ei|yi,  Zi,  ry  =  / ( e j | y j , z»;  0) = f(ei\yi,  ZJ;  0),  0).  Using the above results, the inverse of partition matrix in (4.7), and the definition of conditional distributions, it can be shown that  [ej|yj, Z j ,  ry  0] ~ N(u , Bi  A J, e  where  [5 (Gi + Fi Ei i f ) ( 2  - g j - Wi cc - Xi Q) - 6 F E^i  - U cx)}  2  y <  t  0=0  and  [cij|yi, Zj, Ti\ 0] ~ N(u , €iJ  A ), £ y  where  [R^E )*{v -U ci)-R{E FT)i(y -&-W OL-X B)} 6=6  =  i  k=l  A .. e  =  [R-R{Ei)"R]  6=6  92  ik  ik  i  i  I  i  with the j t h v x rn submatrix (Ei FfY submatrix (E^  E(e[ei\y  z  obStU  i ij €  and the j t h row and the kth column v x v  of Ei, j, k = 1 , . . . , m^. 
Note that

\[
E(\mathbf e_i^T\mathbf e_i|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})
= \int E(\mathbf e_i^T\mathbf e_i|\mathbf y_i, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})\,
f(\mathbf y_{mis,i}|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})\,d\mathbf y_{mis,i},
\]

and similarly for $E(\boldsymbol\epsilon_{ij}\boldsymbol\epsilon_{ij}^T|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})$, $E(\mathbf a_i\mathbf a_i^T|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})$, and $E(\mathbf b_i\mathbf b_i^T|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})$. Using the expectation and covariance properties for multivariate random variables and the properties of the matrix trace operation, we can update the estimates of $(\delta^2, R, A, B)$ in the M-step as follows:

\[
\hat\delta^2 = \sum_{i=1}^n\mathrm{tr}\big[E(\mathbf e_i\mathbf e_i^T|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})\big]\Big/\sum_{i=1}^n n_i
\approx \sum_{i=1}^n\frac{1}{k_t}\sum_{k=1}^{k_t}\mathrm{tr}\big[E\big(\mathbf e_i\mathbf e_i^T|(\tilde{\mathbf y}_{mis,i}^{(k)}, \mathbf y_{obs,i}), \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}\big)\big]\Big/\sum_{i=1}^n n_i,
\]
\[
\hat R = \sum_{i=1}^n\sum_{j=1}^{m_i}E(\boldsymbol\epsilon_{ij}\boldsymbol\epsilon_{ij}^T|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})\Big/\sum_{i=1}^n m_i
\approx \sum_{i=1}^n\sum_{j=1}^{m_i}\frac{1}{k_t}\sum_{k=1}^{k_t}E\big(\boldsymbol\epsilon_{ij}\boldsymbol\epsilon_{ij}^T|(\tilde{\mathbf y}_{mis,i}^{(k)}, \mathbf y_{obs,i}), \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}\big)\Big/\sum_{i=1}^n m_i,
\]
\[
\hat A = \sum_{i=1}^n E(\mathbf a_i\mathbf a_i^T|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})/n
\approx \sum_{i=1}^n\frac{1}{k_t}\sum_{k=1}^{k_t}E\big(\mathbf a_i\mathbf a_i^T|(\tilde{\mathbf y}_{mis,i}^{(k)}, \mathbf y_{obs,i}), \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}\big)\Big/n,
\]
\[
\hat B = \sum_{i=1}^n E(\mathbf b_i\mathbf b_i^T|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta})/n
\approx \sum_{i=1}^n\frac{1}{k_t}\sum_{k=1}^{k_t}E\big(\mathbf b_i\mathbf b_i^T|(\tilde{\mathbf y}_{mis,i}^{(k)}, \mathbf y_{obs,i}), \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}\big)\Big/n.
\]

Plugging $\hat\delta^2$ and $\hat R$ into $\sum_{i=1}^n\big[\hat I_i^{(1)}(\boldsymbol\alpha, \boldsymbol\beta, \hat\delta^2) + \hat I_i^{(2)}(\boldsymbol\alpha, \hat R)\big]$ and setting its first derivative with respect to $(\boldsymbol\alpha, \boldsymbol\beta)$ equal to $\mathbf 0$, we obtain a set of linear equations for $\hat{\boldsymbol\alpha}$ and $\hat{\boldsymbol\beta}$:

\[
\Big(\sum_{i=1}^n W_i^TW_i + \hat\delta^2\sum_{i=1}^n\sum_{j=1}^{m_i}U_{ij}^T\hat R^{-1}U_{ij}\Big)\hat{\boldsymbol\alpha}
+ \Big(\sum_{i=1}^n W_i^TX_i\Big)\hat{\boldsymbol\beta}
= \sum_{i=1}^n W_i^T\boldsymbol\nu_{y_i} + \hat\delta^2\sum_{i=1}^n\sum_{j=1}^{m_i}U_{ij}^T\hat R^{-1}\boldsymbol\nu_{z_{ij}},
\]
\[
\Big(\sum_{i=1}^n X_i^TW_i\Big)\hat{\boldsymbol\alpha} + \Big(\sum_{i=1}^n X_i^TX_i\Big)\hat{\boldsymbol\beta}
= \sum_{i=1}^n X_i^T\boldsymbol\nu_{y_i},
\]

where

\[
\boldsymbol\nu_{y_i} = E(\mathbf y_i|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}) - \tilde{\mathbf g}_i - H_i\,E(\mathbf a_i|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}) - T_i\,E(\mathbf b_i|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}),
\qquad
\boldsymbol\nu_{z_{ij}} = \mathbf z_{ij} - V_{ij}\,E(\mathbf a_i|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}).
\]

The foregoing conditional expectations can be approximated by

\[
E(\mathbf y_i|\mathbf y_{obs,i}, \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}) \approx \frac{1}{k_t}\sum_{k=1}^{k_t}(\tilde{\mathbf y}_{mis,i}^{(k)}, \mathbf y_{obs,i}),
\quad
E(\mathbf a_i|\cdot) \approx \frac{1}{k_t}\sum_{k=1}^{k_t}E\big(\mathbf a_i|(\tilde{\mathbf y}_{mis,i}^{(k)}, \mathbf y_{obs,i}), \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}\big),
\quad
E(\mathbf b_i|\cdot) \approx \frac{1}{k_t}\sum_{k=1}^{k_t}E\big(\mathbf b_i|(\tilde{\mathbf y}_{mis,i}^{(k)}, \mathbf y_{obs,i}), \mathbf z_i, \mathbf r_i; \hat{\boldsymbol\theta}\big).
\]

Using the inverse operation of partitioned matrices, this linear system yields closed-form estimates of $\boldsymbol\alpha$ and $\boldsymbol\beta$ in the M-step, with the blocks of the inverted coefficient matrix playing the roles of $\Gamma^{11}$, $\Gamma^{12} = (\Gamma^{21})^T$, and $\Gamma^{22}$. We finally maximize $\sum_{i=1}^n\hat I_i^{(5)}(\boldsymbol\eta)$ to obtain an updated estimate of $\boldsymbol\eta$, which can be done by standard optimization procedures such as the Newton-Raphson method.

Iterating between the above E-step and M-step until convergence, we obtain approximate MLEs of $\boldsymbol\theta$. The asymptotic variance-covariance matrix of the approximate MLE $\hat{\boldsymbol\theta}$ of $\boldsymbol\theta$ can again be obtained using the approximate formula given in (4.5). The only difference is that the density function $f_Y(\mathbf y_i|\mathbf z_i, \mathbf a_i, \mathbf b_i; \boldsymbol\alpha, \boldsymbol\beta, \delta^2)$ in $l_c^{(i)}$ in (4.1) is now based on the LME response model (4.6). We see that, for this approximate method, both the E-step and the M-step are computationally much less intensive than those in Section 4.3. The performance of this approximate method will be evaluated in Section 4.6.
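To make the closed-form M-step concrete, the following is a minimal sketch (ours; the data structure and the precomputed conditional moments are hypothetical placeholders) of the variance-component updates, pooling conditional second moments across subjects and Monte Carlo samples:

```r
# Minimal sketch of the closed-form M-step updates for (delta^2, R, A, B).
# 'draws[[i]]' is assumed to hold, for subject i, a list of k_t Monte Carlo
# samples, each with precomputed conditional moments:
#   E.ee  : E(e_i e_i^T | ...)                  (n_i x n_i)
#   E.eps : list over j of E(eps_ij eps_ij^T | ...)   (v x v)
#   E.aa  : E(a_i a_i^T | ...);  E.bb : E(b_i b_i^T | ...)
# These would come from the normal conditional-moment formulas above.
mstep.vc <- function(draws, n.obs, m.obs) {
  n <- length(draws)
  del2 <- 0; Rhat <- 0; Ahat <- 0; Bhat <- 0
  for (i in 1:n) {
    kt <- length(draws[[i]])
    for (k in 1:kt) {
      mom  <- draws[[i]][[k]]
      del2 <- del2 + sum(diag(mom$E.ee)) / kt     # tr E(e e^T)
      Rhat <- Rhat + Reduce(`+`, mom$E.eps) / kt  # sum over j of E(eps eps^T)
      Ahat <- Ahat + mom$E.aa / kt
      Bhat <- Bhat + mom$E.bb / kt
    }
  }
  list(delta2 = del2 / sum(n.obs),  # divide by total number of y measurements
       R = Rhat / sum(m.obs),       # divide by total number of z measurements
       A = Ahat / n, B = Bhat / n)
}
```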
4.5 Example

We illustrate the methods proposed in this chapter using a HIV dataset. For comparison, we also analyze this dataset using the commonly used naive method, which ignores measurement errors and missing data.

4.5.1 Data Description

The dataset includes 53 HIV-infected patients who were treated with a potent antiretroviral regimen. Viral loads (plasma HIV-1 RNA copies) were measured on days 0, 2, 7, 10, 14, 21, 28 and weeks 8, 12, 24, and 48 after initiation of treatment. After the antiretroviral treatment, the patients' viral loads will decay, and the decay rates may reflect the efficacy of the treatment. Throughout the time course, the viral load may continue to decay, fluctuate, or even start to rise (rebound). The data at the late stage of the study are likely to be contaminated by long-term clinical factors. Various covariates such as CD4 count were also recorded throughout the study on similar schedules. It is well known that CD4 counts are usually measured with substantial errors. The number of response measurements for each individual varies from 6 to 10. Five patients dropped out of the study due to drug intolerance and other problems, and sixteen patients have missing viral loads at scheduled time points. There were 104 out of 403 CD4 measurements missing at viral load measurement times, due mainly to a somewhat different CD4 measurement schedule. A detailed data description can be found in Wu and Ding (1999).

Six patients were randomly selected and their viral loads are plotted in Figure 1.1. Due to long-term clinical factors, drug resistance, and other complications, the viral load trajectories can be very complex after the initial phase of viral decay. Visual inspection of the raw data seems to indicate that dropout patients have slower viral decay compared with the remaining patients. Thus, the dropouts are likely to be informative or nonignorable. The CD4 count trajectories for six randomly selected patients are plotted in Figure 1.2. There exists large between-patient variability in CD4 count. The population CD4 count trajectory appears to have a quadratic polynomial shape.

4.5.2 The Response and the Covariate Models

Based on Wu (2002) and Wu and Zhang (2002), we consider the following HIV viral dynamic model (see Section 3.5 for the details):

\[
y_{ij} = \log_{10}\big(P_{1i}e^{-\lambda_{1ij}t_{ij}} + P_{2i}e^{-\lambda_{2ij}t_{ij}}\big) + e_{ij}, \tag{4.8}
\]
\[
\log(P_{1i}) = \beta_1 + b_{1i},\qquad \lambda_{1ij} = \beta_2 + \beta_3 z^*_{ij} + b_{2i}, \tag{4.9}
\]
\[
\log(P_{2i}) = \beta_4 + b_{3i},\qquad \lambda_{2ij} = w(t_{ij}) + h_i(t_{ij}), \tag{4.10}
\]

where $y_{ij}$ is the $\log_{10}$-transformed viral load measurement for patient $i$ at time $t_{ij}$, $P_{1i}$ and $P_{2i}$ are baseline values, $\lambda_{1ij}$ and $\lambda_{2ij}$ are viral decay rates, $z^*_{ij}$ is the true (but unobservable) CD4 count, and $w(t_{ij})$ and $h_i(t_{ij})$ are nonparametric fixed- and random-effects functions (see Section 2.1). To avoid very small (or very large) estimates, which may be unstable, we standardize the CD4 counts and rescale the original time $t$ (in days) so that the new time scale is between 0 and 1.

As discussed in Section 2.1, we employ linear combinations of natural cubic splines with percentile-based knots to approximate $w(t)$ and $h_i(t)$. Following Wu and Zhang (2002), we take the same natural cubic splines with $q \le p$ in order to decrease the dimension of the random effects. The AIC and BIC criteria are used to determine the values of $p$ and $q$. Table 4.1 displays AIC and BIC values for various plausible models; a sketch of how such a comparison can be coded is given below.

Table 4.1: AIC and BIC values for the viral load model (4.8) - (4.10), with $q \le p = 1, 2, 3$.

  Model   p=1,q=1   p=2,q=2   p=2,q=1   p=3,q=3   p=3,q=2   p=3,q=1
  AIC     622.92    590.82    591.37    584.51    593.32    583.19
  BIC     685.37    676.68    662.95    677.16    671.76    657.72
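Here is a minimal sketch (ours; fit.semi.nlme is a hypothetical wrapper around the fitting routine, not thesis software) of the knot-selection loop behind Table 4.1:

```r
# Minimal sketch of selecting the numbers of basis functions (p, q), q <= p,
# by AIC/BIC. 'fit.semi.nlme' is a hypothetical wrapper that fits the
# semiparametric NLME model with p fixed-effect and q random-effect natural
# cubic spline bases and returns an object supporting AIC() and BIC().
candidates <- expand.grid(p = 1:3, q = 1:3)
candidates <- candidates[candidates$q <= candidates$p, ]

ic <- t(apply(candidates, 1, function(pq) {
  fit <- fit.semi.nlme(p = pq["p"], q = pq["q"])  # hypothetical fitting call
  c(AIC = AIC(fit), BIC = BIC(fit))               # smaller is better
}))
cbind(candidates, ic)                             # pick the row minimizing BIC
```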
Based on these AIC and BIC values, the model with $p = 3$ and $q = 1$, i.e.,

\[
\lambda_{2ij} = \beta_5 + \beta_6\,\psi_1(t_{ij}) + \beta_7\,\psi_2(t_{ij}) + b_{4i}, \tag{4.11}
\]

seems to be the best, and thus it is selected for our analysis.

For the CD4 process, in the absence of a theoretical rationale, we consider empirical polynomial LME models, and choose the best-fitting model based again on AIC/BIC values for each candidate model. This is done based on the observed CD4 values, and is done separately from the response model for simplicity. Specifically, since the inter-patient variation is large, we consider model (2.8) with $U_{il} = V_{il} = (1, u_{il}, \ldots, u_{il}^{a-1})$ and linear ($a = 2$), quadratic ($a = 3$), and cubic ($a = 4$) polynomials. Table 4.2 presents AIC and BIC values for these three models. The following quadratic polynomial LME model best fits the observed CD4 process:

\[
\mathrm{CD4}_{il} = (\alpha_1 + a_{1i}) + (\alpha_2 + a_{2i})\,u_{il} + (\alpha_3 + a_{3i})\,u_{il}^2 + \epsilon_{il}, \tag{4.12}
\]

where $u_{il}$ is the measurement time, $\boldsymbol\alpha = (\alpha_1, \alpha_2, \alpha_3)^T$ are the population parameters, and $\mathbf a_i = (a_{1i}, a_{2i}, a_{3i})^T$ are the random effects.

Table 4.2: AIC and BIC values for the linear, quadratic, and cubic CD4 models.

  Model   a=2      a=3      a=4
  AIC     806.58   715.80   752.15
  BIC     830.00   774.34   791.18

4.5.3 Dropout Models and Sensitivity Analysis

In this study, dropout patients appear to have slower viral decay compared with the remaining patients. Thus, dropouts are likely to be informative or nonignorable, and we need to assume a model for the dropout process in order to make valid likelihood inference. Although dropout models are not verifiable based on the observed data, subject-area knowledge and sensitivity analysis based on plausible models may still lead to reasonable models. Note that we should avoid building an overly complicated dropout model, since a complicated model may become non-identifiable (Fitzmaurice et al., 1996). Subject-area knowledge suggests that dropout may be related to the current or previous viral load and CD4 measurements. Thus, we consider the following five plausible dropout models for sensitivity analysis:

\[
\begin{aligned}
\text{Model I:}\quad & \mathrm{logit}[P(r_{ij}=1|\mathbf y_i, \mathbf z_i; \boldsymbol\eta)] = \eta_1 + \eta_2\,\mathrm{CD4}_{ij} + \eta_3\,y_{ij},\\
\text{Model II:}\quad & \mathrm{logit}[P(r_{ij}=1|\mathbf y_i, \mathbf z_i; \boldsymbol\eta)] = \eta_1 + \eta_2\,y_{i,j-1} + \eta_3\,y_{ij},\\
\text{Model III:}\quad & \mathrm{logit}[P(r_{ij}=1|\mathbf y_i, \mathbf z_i; \boldsymbol\eta)] = \eta_1 + \eta_2\,\mathrm{CD4}_{i,j-1} + \eta_3\,y_{ij},\\
\text{Model IV:}\quad & \mathrm{logit}[P(r_{ij}=1|\mathbf y_i, \mathbf z_i; \boldsymbol\eta)] = \eta_1 + \eta_2\,y_{ik},\quad k < j,\\
\text{Model V:}\quad & \mathrm{logit}[P(r_{ij}=1|\mathbf y_i, \mathbf z_i; \boldsymbol\eta)] = \eta_1 + \eta_2\,\mathrm{CD4}^*_{ij},
\end{aligned} \tag{4.13}
\]

where $y_{ik}$ ($k < j$) in Model IV is the last observed response and $\mathrm{CD4}^*_{ij}$ in Model V is the estimated true CD4 value for individual $i$. Thus, Models I - III represent possible nonignorable missing response models, Model IV represents a possible ignorable missing response model, and Model V relates dropouts to (estimated) true CD4 values.
The two proposed joint model methods need starting values for model parameters since they are implemented by M C E M algorithms. We use the parameter estimates obtained by the naive method as parameter starting values for the two joint model methods. For the naive method, we use the S P L U S function nlme()  and ImeQ to obtain param-  eter estimates and their default standard errors. For the two proposed joint model methods, we assess the convergence of the Gibbs sampler by examining time series plots and sample autocorrelation function plots. For example, Figures 4.1 and 4.2 show the time series and the autocorrelation function plots for b associated with patient 14. From these figures, we 2  notice that the Gibbs sampler converges quickly and the autocorrelations between successive generated samples are negligible after lag 17. Time series and autocorrelation function plots for other random effects and missing responses show similar behaviors. Therefore, we discard the first 500 samples as the burn-in, and then we take one sample from every 20 simulated samples to obtain independent samples (see sampling methods in Section 4.3.3). We start with ko = 500 Monte Carlo samples, and increase the Monte Carlo sample size as the number  99  O  200  400  GOO  800  10OO  Iteration  Figure 4.1: The time series plot for b associated with patient 14. 2  Series :  b2  t L T  Figure 4.2: The autocorrelation function plot for b associated with patient 14. 2  100  t of iteration increases: k  t+i  = k + k /c with c = 4 (Booth and Hobert, 1999). Convert  t  gence criterion for these two joint model methods is that the maximum relative change in the parameter estimates from successively iterations is smaller than 0.05.  Convergence of  the algorithms are considered to be achieved when the maximum percentage change of all estimates is less than 5% in two consecutive iterations. We use the multivariate rejection sampling method for the two proposed joint model method.  O n a S U N Sparc work-station, the A P P R 1 method took about 140 minutes to  converge while the A P P R 2 method took only 12 minutes to converge.  This shows that  A P P R 2 offers a big reduction in computing time, and thus is computationally much more efficient than A P P R 1 .  4.5.5  Analysis Results  We estimate the model parameters using the naive method and the two proposed joint model methods A P P R 1 and A P P R 2 . We use the parameter estimates obtained by the naive method as the parameter starting values for the A P P R 1 and the A P P R 2 methods. We also tried several other parameter starting values for the proposed methods. Different parameter starting values appear to lead to roughly the same parameter estimates for both the A P P R 1 and the A P P R 2 methods. Table 4.3 presents the resulting parameter estimates and standard errors based on models / , IV,  and V in (4.13). We find that the two joint model methods provide similar  parameter estimates. We also find that the naive method may severely under-estimate the covariate C D 4 effect (i.e., /? ) and may poorly estimate some other parameters as well (this 3  will be confirmed by simulation). For the different dropout models in (4.13), we find that the resulting estimates based on the three nonignorable models (Models I, II,  III)  are all  similar, which indicates that the estimation may be robust against the nonignorable dropout  101  Table 4.3: Parameter estimates (standard errors) for the models in the example.  
4.5.5 Analysis Results

We estimate the model parameters using the naive method and the two proposed joint model methods, APPR1 and APPR2, with the naive estimates as the parameter starting values for APPR1 and APPR2. We also tried several other parameter starting values for the proposed methods; different starting values appear to lead to roughly the same parameter estimates for both APPR1 and APPR2.

Table 4.3 presents the resulting parameter estimates and standard errors based on Models I, IV, and V in (4.13). We find that the two joint model methods provide similar parameter estimates. We also find that the naive method may severely under-estimate the covariate CD4 effect (i.e., $\beta_3$) and may poorly estimate some other parameters as well (this will be confirmed by simulation). For the different dropout models in (4.13), we find that the resulting estimates based on the three nonignorable models (Models I, II, III) are all similar, which indicates that the estimation may be robust against the choice of nonignorable dropout model.

Table 4.3: Parameter estimates (standard errors) for the models in the example. The table reports the estimates of $\alpha_1$ - $\alpha_3$, $\beta_1$ - $\beta_7$, $\delta$, and $R$ for the naive method and for APPR1 and APPR2 under dropout Models I, IV, and V; for example, the naive estimate of the CD4 effect is $\hat\beta_3 = 0.84$, compared with $\hat\beta_3 \approx 1.4$ - $1.5$ under the joint model methods.

Note: $A$ and $B$ are unstructured covariance matrices, but we only report the estimates of their diagonal elements here: Diag($\hat A$) = (.50, 3.65, 1.61) for APPR1 and (.52, 3.80, 1.66) for APPR2; Diag($\hat B$) = (1.08, 77.12, 2.03, 24.98) for the naive method, (1.10, 75.50, 2.01, 26.51) for APPR1, and (1.09, 75.24, 1.83, 22.37) for APPR2.

The estimates based on the ignorable models (Models IV and V), however, appear to be somewhat different, especially for the parameters associated with the decay rates $\lambda_{1ij}$ and $\lambda_{2ij}$. This suggests that the missing responses (dropouts) may be nonignorable, and reliable likelihood estimation must incorporate a reasonable nonignorable missing response model. Although some estimates in Table 4.3 are not statistically significant, the values of the estimates may still provide useful information about viral load and CD4 trajectories. The estimates of the parameters $\eta_2$ and $\eta_3$ in dropout Model I based on the APPR1 method (or the APPR2 method) are -.05 and .97 (-.04 and 1.06), respectively, with both p-values less than 0.001, which also indicates that the dropouts may be nonignorable (or informative), since the missingness may depend on the missing values. The estimates of $\eta_2$ and $\eta_3$ indicate that dropout patients tend to have lower CD4 counts and higher viral loads than the remaining patients.

4.6 A Simulation Study

In this section, we conduct a simulation study to (i) evaluate the performance of the semiparametric modelling and of the AIC/BIC knot-selection method, (ii) assess the two proposed methods (APPR1 and APPR2) and compare them to the naive method (NAIVE), and (iii) evaluate the impact of the specification of the missing response mechanism on parameter estimation.

We generate 100 datasets from the following model, which corresponds to the model (4.8) - (4.10):

\[
y_{ij} = \log_{10}\big(P_{1i}e^{-\lambda_{1ij}t_{ij}} + P_{2i}e^{-\lambda_{2ij}t_{ij}}\big) + e_{ij}, \tag{4.14}
\]
\[
\log(P_{1i}) = \beta_1 + b_{1i},\qquad \lambda_{1ij} = \beta_2 + \beta_3 z^*_{ij} + b_{2i},\qquad \log(P_{2i}) = \beta_4 + b_{3i}, \tag{4.15}
\]
\[
\lambda_{2ij} = -2.2 + (5.3 + 0.16\,b_{4i})\sin(0.04 + 3\,t_{ij}), \tag{4.16}
\]

where the nonparametric model (4.16) is carefully chosen to closely mimic the viral load trajectory at later stages in the example of the previous section. The covariate model and the measurement time points used in the simulation are the same as those in the example of the previous section. The true values of the model parameters are similar to those in the example: the true values of $\boldsymbol\beta$ and $\boldsymbol\alpha$ are presented in Table 4.4, and the other true parameter values are $\delta = .2$, $R = .4$, $A = \mathrm{diag}(.5, 3, 2)$, and $B = \mathrm{diag}(1, 9, 2, 4)$. Note that $\mathbf b_i = (b_{1i}, b_{2i}, b_{3i}, b_{4i})^T \sim N(\mathbf 0, B)$, where $b_{4i}$ is incorporated in the nonparametric model (4.16). There are 147 viral loads after the rescaled time 0.25.
We regard the viral loads (out of these 147) greater than their 45th percentile as missing responses; that is, we delete the 20% largest response values at the last few time points to mimic an informative missing response mechanism (i.e., missingness depends on the values being missing). The structure of the data generated in the simulation study is similar to that in the example in Section 4.5.

We calculate averages of the resulting estimates and their standard errors based on each method, and compare the methods by comparing their biases and mean-squared errors (MSEs). Here, bias and MSE are assessed in terms of percent relative bias and percent relative root mean-squared error, defined as follows. For instance, the bias for $\hat\beta_j$, the $j$-th component of $\hat{\boldsymbol\beta}$, is defined as

\[
\mathrm{bias}_j = \bar{\hat\beta}_j - \beta_j,
\]

where $\hat\beta_j$ is the estimate of $\beta_j$ and the bar denotes the average over simulated datasets. The mean-squared error for $\hat\beta_j$ is defined as

\[
\mathrm{MSE}_j = \mathrm{bias}_j^2 + s_j^2,
\]

where $s_j$ is the simulated standard error of $\hat\beta_j$. Then, the percent relative bias of $\hat\beta_j$ is defined as $\mathrm{bias}_j/|\beta_j|\times 100\%$, and the percent relative root MSE is $\sqrt{\mathrm{MSE}_j}/|\beta_j|\times 100\%$.
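A minimal sketch (ours) of computing these two simulation summaries for one parameter:

```r
# Minimal sketch: percent relative bias and percent relative root MSE of an
# estimator across simulated datasets. 'est' holds one estimate per dataset.
rel.bias.rmse <- function(est, truth) {
  bias <- mean(est) - truth                   # bias_j
  mse  <- bias^2 + var(est)                   # MSE_j = bias_j^2 + s_j^2
  c(pct.bias = 100 * bias / abs(truth),       # percent relative bias
    pct.rmse = 100 * sqrt(mse) / abs(truth))  # percent relative root MSE
}
set.seed(2)
est <- rnorm(100, mean = 1.45, sd = 0.2)      # e.g., 100 estimates of beta3
rel.bias.rmse(est, truth = 1.5)
```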
First, to evaluate the nonparametric modelling, we study the performance of the AIC and BIC criteria in selecting the numbers of knots ($p$ and $q$), since these numbers represent the degree of smoothness of the nonparametric functions (too large or too small values may result in overfitting or underfitting). For the 100 datasets simulated from the semiparametric NLME model (4.14) - (4.16), we find that all BIC values and 97% of AIC values lead to the model (4.11) (i.e., $p = 3$, $q = 1$). To further evaluate the AIC and BIC methods, we also generate data from models (4.8) - (4.12) with $(\beta_5, \beta_6, \beta_7) = (-2.0, 8.0, -3.0)$, so that the true numbers of knots are known, and use the AIC and BIC methods to select the best model. The performance of the AIC and BIC methods is similar. These results show that the AIC and BIC criteria perform well in the current setting.

To investigate the effect of semiparametric modelling on the estimation of the fixed-effects parameters $\beta_1$ - $\beta_4$, we consider the two proposed methods (APPR1 and APPR2) for the semiparametric NLME model (4.14) - (4.16), along with the covariate model (4.12) and a nonignorable dropout model (Model I in (4.13)). We also use a parametric NLME model, in which $\lambda_{2ij} = \beta_5 + b_{4i}$ and the other parts are the same as in the semiparametric NLME model (4.14) - (4.16), to fit the simulated datasets. To emphasize the difference between the nonparametric and the parametric modelling of $\lambda_{2ij}$, we consider an ideal case for the parametric NLME model fitting, in which there are no covariate measurement errors and no dropouts; thus, in this ideal case, neither the covariate measurement error model nor the dropout model is needed. We use the S-PLUS function nlme() to obtain parameter estimates and their default standard errors, and denote this by the PARA method. We calculate averages of the resulting estimates and their standard errors based on each method. Since the APPR1 method sometimes presents computational problems, such as slow or non-convergence, the 100 sets of parameter estimates are obtained from 137 datasets. The simulation results are shown in Table 4.4. We find that the estimates of the fixed-effects parameters obtained by APPR1 and APPR2 are very close to their true values, and both methods perform better than the PARA method in terms of the bias and MSE criteria. These results show that the semiparametric modelling, with AIC/BIC used for knot selection, performs well and better than the parametric modelling in the current setting.

Table 4.4: Simulation results for the parameter estimates (standard errors) as well as their biases and MSEs for the estimation methods PARA, APPR1, and APPR2; the columns give $\alpha_1$ - $\alpha_3$ (true values -0.4, 4.0, -4.0) and $\beta_1$ - $\beta_4$ (true values 12.0, 67.0, 1.5, 7.0), with no $\alpha$ estimates reported for PARA. Note: Bias = percent relative bias = $100\times\mathrm{bias}_j/|\beta_j|$; MSE = percent relative root MSE = $100\times\sqrt{\mathrm{MSE}_j}/|\beta_j|$.

To study the effect of the missing data mechanism, we assess the proposed methods (APPR1 and APPR2) based on a nonignorable model (Model I) and an ignorable model (Model IV), and compare them with the naive method (which ignores measurement errors and missing data). To investigate the performance of the estimate of $\lambda_{2ij}$, we generate 100 datasets from the true models (4.8) - (4.12). In the simulations, the true values of the model parameters $\boldsymbol\beta$ and $\boldsymbol\alpha$ are shown in Table 4.5, and the other true parameter values, the missing mechanism, and the missing rate are the same as above. We calculate averages of the resulting estimates and their standard errors based on each of the three methods and each of the two dropout models, and compare the methods by comparing their biases and mean-squared errors (MSEs). Since the APPR1 method sometimes presents computational problems, such as slow or non-convergence, the 100 sets of parameter estimates are obtained from 130 datasets.

Table 4.5: Simulation results for the parameter estimates (standard errors) for the three estimation methods NAIVE, APPR1, and APPR2 with dropout models I and IV in (4.13).

  Dropout    Method  alpha1    alpha2    alpha3     beta1       beta2        beta3       beta4      beta5       beta6      beta7
  True value         -.4       4.0       -4.0       12.0        67.0         1.5         7.0        -2.0        8.0        -3.0
  --         NAIVE   --        --        --         11.98(1.1)  66.87(1.2)   0.92(1.1)   6.98(0.3)  -2.41(1.9)  9.64(2.9)  -2.03(1.0)
  Model I    APPR1   -.40(.1)  4.08(.3)  -4.00(.3)  11.99(.1)   66.93(1.4)   1.48(1.6)   7.01(.3)   -2.04(1.9)  8.11(3.1)  -2.92(1.2)
  Model I    APPR2   -.40(.1)  4.06(.3)  -4.01(.3)  11.99(.2)   66.86(1.3)   1.53(1.5)   7.03(.3)   -2.11(1.8)  8.17(2.8)  -2.87(1.0)
  Model IV   APPR1   -.40(.1)  4.06(.3)  -3.99(.3)  12.00(.2)   67.15(1.3)   1.45(1.6)   7.01(.3)   -1.81(2.1)  8.97(3.2)  -2.22(1.2)
  Model IV   APPR2   -.39(.1)  4.06(.3)  -4.00(.3)  11.99(.1)   66.80(1.2)   1.42(1.5)   6.96(.3)   -1.78(1.8)  9.08(2.8)  -2.28(1.0)

From the simulation results in Tables 4.5 and 4.6, we see that, when measurement errors and a reasonable missing data mechanism (Model I) are taken into account, the two proposed joint model methods (APPR1 and APPR2) perform well in terms of both the bias and MSE criteria.
A P P R 1 performs better than A P P R 2 as expected, but A P P R 2 also performs reasonably well and is computationally much more efficient. When the missing data mechanism is ignored (Model IV),  however, the two methods may not perform well. The naive  method, which ignores measurement errors and missing data, may lead to severely biased estimates and large M S E s (e.g., the covariate effect ft can be severely under-estimated).  107  Table 4.6: Simulation results for biases and M S E s of the parameter estimates for the three estimation methods N A I V E , A P P R 1 , and A P P R 2 with dropout models I and IV in (4.13).  Dropout Model  Parameter  Oil  OL2  QJ  Pi  ft  ft  ft  ft  ft  ft  True Value  -.4  4.0  -4.0  12.0  67.0  1.5  7.0  -2.0  8.0  -3.0  NAIVE  -  -  -  -.10  -.25  -38.42  -.61  -20.53  20.45  32.33  APPR1  -.78  1.95  .01  -.08  -.11  -1.59  .18  -2.21  1.40  2.62  APPR2  -.78  1.95  -.01  -.09  -.21  -1.78  .50  -5.70  2.08  4.30  APPR1  -.97  2.02  .16  -.08  .22  -3.63  .19  9.26  12.10  23.10  APPR2  1.40  2.07  -.09  -.09  -.29  -6.91  -.55  10.66  13.57  26.02  3  Bias Model I Model IV  MSE Model I Model IV  NAIVE  -  -  -  1.89  2.97  148.06  6.96  168.66  68.32  70.14  APPR1  20.00  7.28  7.00  1.17  1.84  96.01  3.43  80.02  35.15  40.42  APPR2  25.00  7.65  8.00  1.25  2.23  98.69  3.74  112.64 39.06  43.21  APPR1  30.51  9.90  9.84  1.45  2.49  120.12  4.84  122.80 56.26  58.90  APPR2  36.50  10.46  10.31  1.74  2.58  131.24  5.73  146.04 62.14  66.07  Note: Bias = Percent bias = 100 x biaSj/lPj ; MSE = Percent  4.7  V M S E  = 100 x V M S E J V I / 3 , - 1 .  Conclusions and Discussion  We have proposed two approximate likelihood methods for semiparametric N L M E models with  outcome-based informative  dropouts and covariate measurement errors and missing  data, implemented by Monte Carlo E M algorithms combined with Gibbs sampler. The first method is more accurate than the second method but it may be computationally very intensive and sometimes may offer computational difficulties such as slow or non-convergence, especially when the dimensions of random effects are not small. T h e second method is computationally much more efficient, but it is less accurate than the first method. The second method may be used as a reasonable alternative when the first method has convergence problems or it may be used to provide excellent parameter starting values for the first method. Simulation studies indicate that the proposed methods, which incorporate measurement  108  errors and dropout mechanisms, produce satisfactory results, but methods ignoring measurement errors and/or ignoring dropout mechanisms may perform poorly. Moreover, the A I C and B I C criteria perform well in the current setting.  ^  We have assumed that the dropout models depend on the observed or unobserved responses and covariates. Alternatively, we may consider dropout models which share the random-effects parameters in response and covariate processes. Such models may be appropriate if the dropout mechanism is related to the true but unobservable response/covariate values or summaries of response and covariate processes such as unobservable true viral decay rates.  The methods in this chapter may be extended to such models.  Finally, for  Monte Carlo E M algorithms, Booth and Hobert (1999) proposed a nice automated method for choosing the number of Monte Carlo samples, which can be extended in our case as well.  
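A simple deterministic version of this idea is the rule we adopt in Section 5.5, where the Monte Carlo size grows as k_{t+1} = k_t + k_t/c with c = 4 and the EM iterations stop once the maximum relative change of the estimates in two consecutive iterations falls below 5%. The following R sketch shows the overall loop only; `estep()` and `mstep()` are hypothetical placeholders for the two steps of the algorithm.

```r
# Skeleton of an MCEM iteration with a deterministically increasing
# Monte Carlo sample size (a simple alternative to the automated rule
# of Booth and Hobert, 1999).  estep() and mstep() are placeholders.
mcem <- function(theta0, k0 = 500, c = 4, tol = 0.05, maxit = 50) {
  theta <- theta0; k <- k0
  for (t in 1:maxit) {
    Q         <- estep(theta, k)        # k Monte Carlo samples per subject
    theta.new <- mstep(Q)               # complete-data maximization
    change <- max(abs(theta.new - theta) / pmax(abs(theta), 1e-8))
    theta <- theta.new
    if (change < tol) break             # max relative change < 5%
    k <- k + floor(k / c)               # k_{t+1} = k_t + k_t / c
  }
  theta
}
```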
Chapter 5

Semiparametric NLME Model with Random-effect-based Informative Dropouts and Covariate Measurement Errors

5.1 Introduction

In this chapter, we develop two likelihood methods to simultaneously address covariate measurement errors and random-effect-based informative dropouts in semiparametric NLME models. The major difference between the models in this chapter and those in Chapter 4 lies in the assumed missing response (or dropout) models; the response and covariate models remain the same. In Section 5.2, we discuss the models for this problem. In Section 5.3, we obtain approximate MLEs of all model parameters using a Monte Carlo EM (MCEM) algorithm along with Gibbs sampler methods. To avoid potential computational problems with the method of Section 5.3, in Section 5.4 we propose an alternative approximate method based on a first-order Laplace approximation to the log-likelihood function; some asymptotic properties of the resulting estimates are also discussed. The two proposed methods are illustrated on an HIV dataset and evaluated via simulation in Section 5.5. We conclude the chapter in Section 5.6 with some discussion. The asymptotic properties presented in Section 5.4 are proved in Section 5.7.

5.2 Missing Response Models

The dropout is random-effect-based informative if the missing probability of the current response depends only on the underlying unobservable random effects $\mathbf{a}_i$ and $\mathbf{b}_i$, i.e., $f(\mathbf{r}_i|\mathbf{y}_i, \mathbf{z}_i, \mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta}) = f(\mathbf{r}_i|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta})$. In the presence of random-effect-based informative dropouts in the NLME model (2.6) and (2.7) with the covariate process (2.8), we can again write $\mathbf{y}_i = (\mathbf{y}_{mis,i}, \mathbf{y}_{obs,i})$ as before. Let $n_{obs,i}$ be the number of components in $\mathbf{y}_{obs,i}$. Here, the missing $y_{ij}$'s are again intermittent, i.e., we allow dropout individuals to possibly return to the study at a later time. For the vector $\mathbf{g}_i = (g_{i1}, \ldots, g_{in_i})^T$, where the $g_{ij}$ are defined in (3.6), we write $\mathbf{g}_i = (\mathbf{g}_{mis,i}, \mathbf{g}_{obs,i})$, with $\mathbf{g}_{obs,i}$ and $\mathbf{g}_{mis,i}$ being the conditional expectations of $\mathbf{y}_{obs,i}$ and $\mathbf{y}_{mis,i}$, respectively. Let $\mathbf{r}_i = (r_{i1}, \ldots, r_{in_i})^T$ be a vector of missing response indicators for individual $i$ such that $r_{ij} = 1$ if $y_{ij}$ is missing and 0 otherwise. Note that $r_{ij} = 1$ does not necessarily imply $r_{i,j+1} = 1$. We have the observed data $\{(\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i), i = 1, \ldots, n\}$.

To allow for an informative missing mechanism in the response, we need to assume a distribution for the missing response indicators $\mathbf{r}_i$. For the reasons discussed in Sections 4.2 and 4.7, in this chapter we consider the random-effect-based informative dropout mechanism $f(\mathbf{r}_i|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta})$, where $\boldsymbol{\eta}$ are unknown nuisance parameters. Under such a missing data mechanism, the missingness of $y_{ij}$ shares the random effects $\mathbf{a}_i$ and $\mathbf{b}_i$ with the response and covariate models, suggesting that the dropout may be related to the true but unobservable response/covariate values or to summaries of the individual-specific response and covariate trajectories, such as unobservable true viral decay rates. For example, such missing data models may be appropriate if a patient is more likely to drop out early because his true (but unobservable) viral decay rate is slower than that of other patients. In this chapter, we focus on the random-effect-based informative missing mechanism $f(\mathbf{r}_i|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta})$.
Such models are related to the shared-parameter models in the literature (e.g., Wu and Carroll, 1988; Little, 1995; Ten Have et al., 1998). Although the relationship of the missingness with the random effects may be complex, a simple logistic regression model may provide a reasonable approximation. We assume that the $r_{ij}$'s are conditionally independent given the random effects, i.e.,
$$f(\mathbf{r}_i|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta}) = \prod_{j=1}^{n_i} [P(r_{ij} = 1|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta})]^{r_{ij}}\, [1 - P(r_{ij} = 1|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta})]^{1-r_{ij}}, \qquad (5.1)$$
with
$$\mathrm{logit}[P(r_{ij} = 1|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta})] = \log \frac{P(r_{ij} = 1|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta})}{1 - P(r_{ij} = 1|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta})} = \eta_0 + \boldsymbol{\eta}_1^T \mathbf{a}_i + \boldsymbol{\eta}_2^T \mathbf{b}_i,$$
where $\boldsymbol{\eta} = (\eta_0, \boldsymbol{\eta}_1^T, \boldsymbol{\eta}_2^T)^T$ are the unknown nuisance parameters. For example, we may assume that the missingness of the response is related to the first decay rate, say $\lambda_{1ij} = \beta_2 + b_{i2}$ in Section 4.5, i.e.,
$$\mathrm{logit}[P(r_{ij} = 1|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta})] = \eta_0 + \eta_1 \lambda_{1ij} = \eta_0 + \eta_1(\beta_2 + b_{i2}) = (\eta_0 + \eta_1\beta_2) + \eta_1 b_{i2} \equiv \eta_0^* + \eta_1^* b_{i2}.$$
Note that we should avoid building too complicated a dropout model, since the model parameters may become non-identifiable. As the assumed dropout models are not testable based on the observed data, it is important to carry out sensitivity analyses based on various dropout models. The random-effect-based informative dropout models in this chapter and the outcome-based informative dropout models in Chapter 4 can all be used for sensitivity analysis.

5.3 A Monte Carlo EM Method

5.3.1 The Likelihood Function

We consider likelihood inference for semiparametric NLME models with random-effect-based informative dropouts and measurement errors and missing data in time-varying covariates, based on the approximate models (2.6) - (2.8). Let $\boldsymbol{\theta} = (\boldsymbol{\alpha}, \boldsymbol{\beta}, \delta^2, R, A, B, \boldsymbol{\eta})$ be the collection of all unknown model parameters. We assume that the parameters $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$, $\delta^2$, $R$, $A$, $B$, and $\boldsymbol{\eta}$ are distinct. The approximate log-likelihood for the observed data $\{(\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i), i = 1, \ldots, n\}$ can be written as
$$l(\boldsymbol{\theta}) = \sum_{i=1}^{n} \log \int\!\!\int\!\!\int f_y(\mathbf{y}_i|\mathbf{z}_i, \mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\alpha}, \boldsymbol{\beta}, \delta^2)\, f_z(\mathbf{z}_i|\mathbf{a}_i; \boldsymbol{\alpha}, R)\, f(\mathbf{a}_i; A)\, f(\mathbf{b}_i; B)\, f(\mathbf{r}_i|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta})\, d\mathbf{y}_{mis,i}\, d\mathbf{a}_i\, d\mathbf{b}_i$$
$$= \sum_{i=1}^{n} \log \int\!\!\int f_y(\mathbf{y}_{obs,i}|\mathbf{z}_i, \mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\alpha}, \boldsymbol{\beta}, \delta^2)\, f_z(\mathbf{z}_i|\mathbf{a}_i; \boldsymbol{\alpha}, R)\, f(\mathbf{a}_i; A)\, f(\mathbf{b}_i; B)\, f(\mathbf{r}_i|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta})\, d\mathbf{a}_i\, d\mathbf{b}_i, \qquad (5.2)$$
where
$$f_y(\mathbf{y}_{obs,i}|\mathbf{z}_i, \mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\alpha}, \boldsymbol{\beta}, \delta^2) = \prod_{j=1}^{n_{obs,i}} (2\pi\delta^2)^{-1/2} \exp\{-[y_{obs,ij} - g_{obs,ij}]^2/(2\delta^2)\},$$
$$f_z(\mathbf{z}_i|\mathbf{a}_i; \boldsymbol{\alpha}, R) = \prod_{k=1}^{m_i} |2\pi R|^{-1/2} \exp\{-(\mathbf{z}_{ik} - \mathbf{u}_{ik}\boldsymbol{\alpha} - \mathbf{v}_{ik}\mathbf{a}_i)^T R^{-1} (\mathbf{z}_{ik} - \mathbf{u}_{ik}\boldsymbol{\alpha} - \mathbf{v}_{ik}\mathbf{a}_i)/2\},$$
$$f(\mathbf{a}_i; A) = |2\pi A|^{-1/2} \exp\{-\mathbf{a}_i^T A^{-1} \mathbf{a}_i/2\}, \qquad f(\mathbf{b}_i; B) = |2\pi B|^{-1/2} \exp\{-\mathbf{b}_i^T B^{-1} \mathbf{b}_i/2\},$$
and $y_{obs,ij}$ is the observed $y_{ij}$. Note that, unlike $l(\boldsymbol{\theta})$ in Section 4.3, the missing responses $\mathbf{y}_{mis,i}$ are integrated out in (5.2).

The observed-data log-likelihood function $l(\boldsymbol{\theta})$ generally does not have a closed-form expression, since the functions in the integral can be nonlinear in the random effects $\mathbf{a}_i$ and $\mathbf{b}_i$. So we use a Monte Carlo EM (MCEM) algorithm to find the approximate MLEs of the parameters $\boldsymbol{\theta}$. By treating the unobservable random effects $\mathbf{a}_i$ and $\mathbf{b}_i$ as "missing" data, we have "complete data" $\{(\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i, \mathbf{a}_i, \mathbf{b}_i), i = 1, \ldots, n\}$. The complete-data log-likelihood function for all individuals can be expressed as
$$l_c(\boldsymbol{\theta}) = \sum_{i=1}^{n} l_c^{(i)}(\boldsymbol{\theta}) = \sum_{i=1}^{n} \big\{\log f_y(\mathbf{y}_{obs,i}|\mathbf{z}_i, \mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\alpha}, \boldsymbol{\beta}, \delta^2) + \log f_z(\mathbf{z}_i|\mathbf{a}_i; \boldsymbol{\alpha}, R) + \log f(\mathbf{a}_i; A) + \log f(\mathbf{b}_i; B) + \log f(\mathbf{r}_i|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta})\big\}, \qquad (5.3)$$
where $l_c^{(i)}(\boldsymbol{\theta})$ is the complete-data log-likelihood for individual $i$.
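To make (5.3) concrete, the following is a minimal R sketch of $l_c^{(i)}(\boldsymbol{\theta})$ for a single individual. It assumes, for illustration only, a univariate covariate measurement at each time (so $R$ reduces to a scalar variance); `g` and `mu.z` stand for the response means $g_{ij}$ and the covariate means $\mathbf{u}_{ik}\boldsymbol{\alpha} + \mathbf{v}_{ik}\mathbf{a}_i$ evaluated at the current random effects, and all argument names are hypothetical.

```r
# log of a zero-mean multivariate normal density with covariance Sigma
ldmvnorm0 <- function(x, Sigma) {
  as.numeric(-0.5 * (length(x) * log(2 * pi) +
                     determinant(Sigma, log = TRUE)$modulus +
                     sum(x * solve(Sigma, x))))
}

# Complete-data log-likelihood l_c^{(i)}(theta) in (5.3) for one subject:
# y.obs: observed responses;       g:    means g_ij at the current (a, b)
# z:     covariate measurements;   mu.z: covariate means u'alpha + v'a
# r:     missing-response indicators; eta: dropout parameters in (5.1),
#        of length 1 + length(a) + length(b)
lci <- function(y.obs, g, delta2, z, mu.z, R, a, A, b, B, r, eta) {
  l.y <- sum(dnorm(y.obs, mean = g,    sd = sqrt(delta2), log = TRUE))
  l.z <- sum(dnorm(z,     mean = mu.z, sd = sqrt(R),      log = TRUE))
  l.a <- ldmvnorm0(a, A)                       # f(a_i; A)
  l.b <- ldmvnorm0(b, B)                       # f(b_i; B)
  p   <- plogis(eta[1] + sum(eta[-1] * c(a, b)))  # logit model in (5.1)
  l.r <- sum(r * log(p) + (1 - r) * log(1 - p))
  l.y + l.z + l.a + l.b + l.r
}
```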
5.3.2 A MCEM Algorithm

Let $\boldsymbol{\theta}^{(t)}$ be the parameter estimates from the $t$-th EM iteration. The E-step for individual $i$ at the $(t+1)$th EM iteration can be expressed as
$$Q_i(\boldsymbol{\theta}|\boldsymbol{\theta}^{(t)}) = E\{l_c^{(i)}(\boldsymbol{\theta})|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i; \boldsymbol{\theta}^{(t)}\}$$
$$= \int\!\!\int \big[\log f_y(\mathbf{y}_{obs,i}|\mathbf{z}_i, \mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\alpha}, \boldsymbol{\beta}, \delta^2) + \log f_z(\mathbf{z}_i|\mathbf{a}_i; \boldsymbol{\alpha}, R) + \log f(\mathbf{a}_i; A) + \log f(\mathbf{b}_i; B) + \log f(\mathbf{r}_i|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\eta})\big] \times f(\mathbf{a}_i, \mathbf{b}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i; \boldsymbol{\theta}^{(t)})\, d\mathbf{a}_i\, d\mathbf{b}_i$$
$$\equiv I_1^{(i)}(\boldsymbol{\alpha}, \boldsymbol{\beta}, \delta^2) + I_2^{(i)}(\boldsymbol{\alpha}, R) + I_3^{(i)}(A) + I_4^{(i)}(B) + I_5^{(i)}(\boldsymbol{\eta}). \qquad (5.4)$$

Since the expression (5.4) is an expectation with respect to $f(\mathbf{a}_i, \mathbf{b}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i; \boldsymbol{\theta}^{(t)})$, it may be evaluated using the MCEM algorithm. Specifically, we may use the Gibbs sampler to generate samples from $[\mathbf{a}_i, \mathbf{b}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i; \boldsymbol{\theta}^{(t)}]$ by iteratively sampling from the full conditionals $[\mathbf{a}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i, \mathbf{b}_i; \boldsymbol{\theta}^{(t)}]$ and $[\mathbf{b}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i, \mathbf{a}_i; \boldsymbol{\theta}^{(t)}]$. Factoring the joint density and dropping factors that do not involve $\mathbf{a}_i$ (respectively $\mathbf{b}_i$), these full conditionals satisfy
$$f(\mathbf{a}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i, \mathbf{b}_i; \boldsymbol{\theta}^{(t)}) \propto f(\mathbf{a}_i; \boldsymbol{\theta}^{(t)})\, f(\mathbf{y}_{obs,i}|\mathbf{z}_i, \mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\theta}^{(t)})\, f(\mathbf{z}_i|\mathbf{a}_i; \boldsymbol{\theta}^{(t)})\, f(\mathbf{r}_i|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\theta}^{(t)}), \qquad (5.5)$$
$$f(\mathbf{b}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i, \mathbf{a}_i; \boldsymbol{\theta}^{(t)}) \propto f(\mathbf{b}_i; \boldsymbol{\theta}^{(t)})\, f(\mathbf{y}_{obs,i}|\mathbf{z}_i, \mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\theta}^{(t)})\, f(\mathbf{r}_i|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\theta}^{(t)}).$$
Monte Carlo samples from each of the full conditionals can be generated using rejection sampling methods. Alternatively, the integral (5.4) may be evaluated using the importance sampling method (see Section 3.3.3). We briefly discuss the sampling methods in the next section. Note that, unlike the MCEM method in Section 4.3.2, here we do not need to sample $\mathbf{y}_{mis,i}$, which reduces the computational burden.

For individual $i$, let $\{(\tilde{\mathbf{a}}_i^{(1)}, \tilde{\mathbf{b}}_i^{(1)}), \ldots, (\tilde{\mathbf{a}}_i^{(k_t)}, \tilde{\mathbf{b}}_i^{(k_t)})\}$ denote a random sample of size $k_t$ generated from $[\mathbf{a}_i, \mathbf{b}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i; \boldsymbol{\theta}^{(t)}]$. Note that each $(\tilde{\mathbf{a}}_i^{(k)}, \tilde{\mathbf{b}}_i^{(k)})$ depends on the EM iteration number $t$, which is suppressed throughout. The E-step at the $(t+1)$th EM iteration can then be expressed as
$$Q(\boldsymbol{\theta}|\boldsymbol{\theta}^{(t)}) = \sum_{i=1}^{n} Q_i(\boldsymbol{\theta}|\boldsymbol{\theta}^{(t)}) \approx \sum_{i=1}^{n} \frac{1}{k_t} \sum_{k=1}^{k_t} l_c^{(i)}(\boldsymbol{\theta};\, \mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i, \tilde{\mathbf{a}}_i^{(k)}, \tilde{\mathbf{b}}_i^{(k)})$$
$$= \sum_{i=1}^{n}\sum_{k=1}^{k_t} \frac{1}{k_t}\big[\log f_y(\mathbf{y}_{obs,i}|\mathbf{z}_i, \tilde{\mathbf{a}}_i^{(k)}, \tilde{\mathbf{b}}_i^{(k)}; \boldsymbol{\alpha}, \boldsymbol{\beta}, \delta^2) + \log f_z(\mathbf{z}_i|\tilde{\mathbf{a}}_i^{(k)}; \boldsymbol{\alpha}, R) + \log f(\tilde{\mathbf{a}}_i^{(k)}; A) + \log f(\tilde{\mathbf{b}}_i^{(k)}; B) + \log f(\mathbf{r}_i|\tilde{\mathbf{a}}_i^{(k)}, \tilde{\mathbf{b}}_i^{(k)}; \boldsymbol{\eta})\big]$$
$$\equiv Q^{(1)}(\boldsymbol{\alpha}, \boldsymbol{\beta}, \delta^2|\boldsymbol{\theta}^{(t)}) + Q^{(2)}(\boldsymbol{\alpha}, R|\boldsymbol{\theta}^{(t)}) + Q^{(3)}(A|\boldsymbol{\theta}^{(t)}) + Q^{(4)}(B|\boldsymbol{\theta}^{(t)}) + Q^{(5)}(\boldsymbol{\eta}|\boldsymbol{\theta}^{(t)}). \qquad (5.6)$$

The M-step then maximizes $Q(\boldsymbol{\theta}|\boldsymbol{\theta}^{(t)})$ to produce an updated estimate $\boldsymbol{\theta}^{(t+1)}$, so it is like a complete-data maximization. Since the parameters in the remaining pieces are distinct, the M-step can be implemented by maximizing $Q^{(1)} + Q^{(2)}$, $Q^{(3)}$, $Q^{(4)}$, and $Q^{(5)}$ separately, using standard optimization procedures for the corresponding complete-data models.

As in Section 3.3.2, we use the approximate formula suggested by McLachlan and Krishnan (1997) to obtain the variance-covariance matrix of the approximate MLE $\hat{\boldsymbol{\theta}}$. Let $\mathbf{s}_c^{(i)} = \partial l_c^{(i)}(\boldsymbol{\theta})/\partial\boldsymbol{\theta}$, where $l_c^{(i)}$ is the complete-data log-likelihood for individual $i$.
Then an approximate formula for the variance-covariance matrix of $\hat{\boldsymbol{\theta}}$ is
$$\mathrm{Cov}(\hat{\boldsymbol{\theta}}) = \Big[\sum_{i=1}^{n} E\big(\mathbf{s}_c^{(i)} \mathbf{s}_c^{(i)T}\,\big|\,\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i; \hat{\boldsymbol{\theta}}\big)\Big]^{-1},$$
where the expectations can be approximated by Monte Carlo empirical means, as in (5.6).

In summary, the foregoing MCEM algorithm can be implemented as follows.

Step 1. Obtain an initial estimate of $(\boldsymbol{\alpha}, \boldsymbol{\beta}, \delta^2, R, A, B) = (\boldsymbol{\alpha}^{(0)}, \boldsymbol{\beta}^{(0)}, \delta^{2(0)}, R^{(0)}, A^{(0)}, B^{(0)})$ and an initial value of $(\mathbf{a}_i, \mathbf{b}_i) = (\mathbf{a}_i^{(0)}, \mathbf{b}_i^{(0)})$ based on a naive method; then obtain an initial estimate $\boldsymbol{\eta} = \boldsymbol{\eta}^{(0)}$ based on the dropout model with the random effects $(\mathbf{a}_i, \mathbf{b}_i) = (\mathbf{a}_i^{(0)}, \mathbf{b}_i^{(0)})$.

Step 2. At the $t$-th iteration, obtain Monte Carlo samples of the random effects $(\mathbf{a}_i, \mathbf{b}_i)$ using the Gibbs sampler along with rejection sampling methods, or use importance sampling methods, to approximate the conditional expectation in the E-step.

Step 3. Obtain updated estimates $\boldsymbol{\theta}^{(t+1)}$ using standard complete-data optimization procedures.

Step 4. Iterate between Step 2 and Step 3 until convergence.

5.3.3 Sampling Methods

Gibbs Sampler

As in Section 3.3.3, we can again use the Gibbs sampler to draw the desired samples as follows. Set initial values $(\mathbf{a}_i^{(0)}, \mathbf{b}_i^{(0)})$. Suppose that the current generated values are $(\mathbf{a}_i^{(k)}, \mathbf{b}_i^{(k)})$; we can then obtain $(\mathbf{a}_i^{(k+1)}, \mathbf{b}_i^{(k+1)})$ as follows.

Step 1. Draw a sample of the "missing" random effects $\mathbf{a}_i^{(k+1)}$ from $f(\mathbf{a}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i, \mathbf{b}_i^{(k)}; \boldsymbol{\theta}^{(t)})$.

Step 2. Draw a sample of the "missing" random effects $\mathbf{b}_i^{(k+1)}$ from $f(\mathbf{b}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i, \mathbf{a}_i^{(k+1)}; \boldsymbol{\theta}^{(t)})$.

After a sufficiently large burn-in of $r$ iterations, the sampled values will achieve a steady state. Then $\{(\mathbf{a}_i^{(k)}, \mathbf{b}_i^{(k)}),\, k = r+1, \ldots, r+k_t\}$ can be treated as samples from the multidimensional density function $f(\mathbf{a}_i, \mathbf{b}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i; \boldsymbol{\theta}^{(t)})$. And, if we choose a sufficiently large gap $r'$ (say $r' = 10$), we can treat $\{(\mathbf{a}_i^{(k)}, \mathbf{b}_i^{(k)}),\, k = r+r', r+2r', \ldots\}$ as independent samples from $f(\mathbf{a}_i, \mathbf{b}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i; \boldsymbol{\theta}^{(t)})$. There are several ways to obtain the initial values $(\mathbf{a}_i^{(0)}, \mathbf{b}_i^{(0)})$; a simple way is to obtain them from a naive method.

Multivariate Rejection Algorithm

Sampling from the two full conditionals can be accomplished by the multivariate rejection sampling method. For example, consider sampling from $f(\mathbf{a}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i, \mathbf{b}_i; \boldsymbol{\theta}^{(t)})$ in (5.5). Let $f^*(\mathbf{a}_i) = f(\mathbf{y}_{obs,i}|\mathbf{z}_i, \mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\theta}^{(t)})\, f(\mathbf{z}_i|\mathbf{a}_i; \boldsymbol{\theta}^{(t)})\, f(\mathbf{r}_i|\mathbf{a}_i, \mathbf{b}_i; \boldsymbol{\theta}^{(t)})$ and $\varsigma = \sup_{\mathbf{u}}\{f^*(\mathbf{u})\}$; we assume $\varsigma < \infty$. A random sample from $f(\mathbf{a}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i, \mathbf{b}_i; \boldsymbol{\theta}^{(t)})$ can then be obtained as follows.

Step 1. Sample $\mathbf{a}_i^*$ from $f(\mathbf{a}_i; \boldsymbol{\theta}^{(t)})$ and, independently, sample $w$ from the uniform (0, 1) distribution.

Step 2. If $w \leq f^*(\mathbf{a}_i^*)/\varsigma$, accept $\mathbf{a}_i^*$; otherwise, go back to Step 1.

Samples from $f(\mathbf{b}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i, \mathbf{a}_i; \boldsymbol{\theta}^{(t)})$ can be obtained in a similar way. Therefore, the Gibbs sampler in conjunction with multivariate rejection sampling can be used to obtain samples from $[\mathbf{a}_i, \mathbf{b}_i|\mathbf{y}_{obs,i}, \mathbf{z}_i, \mathbf{r}_i; \boldsymbol{\theta}^{(t)}]$.

5.4 An Alternative Approximate Method

5.4.1 The Hierarchical Likelihood Method

The approximate maximum likelihood inference using a Monte Carlo EM method in the previous section may be computationally intensive and may sometimes pose computational problems, such as slow or non-convergence, especially when the dimensions of the random effects $\mathbf{a}_i$ and $\mathbf{b}_i$ are not small (see the detailed discussion in Section 3.4.1).
To overcome these difficulties, in this section we consider an alternative method, the hierarchical likelihood method (Lee and Nelder, 1996, 2001), for approximate inference. The hierarchical likelihood method avoids Monte Carlo approximation and thus is always computationally feasible; it can also be used to obtain good parameter starting values for the MCEM method of the previous section.

Let $\boldsymbol{\xi}$ denote general "nuisance parameters". Lee and Nelder (1996) considered a function $p_{\tilde{\boldsymbol{\xi}}}(l)$ defined by
$$p_{\tilde{\boldsymbol{\xi}}}(l) = \Big[l - \frac{1}{2}\log\det\{D(l, \boldsymbol{\xi})/(2\pi)\}\Big]\Big|_{\boldsymbol{\xi} = \tilde{\boldsymbol{\xi}}},$$
where $D(l, \boldsymbol{\xi}) = -\partial^2 l/\partial\boldsymbol{\xi}^2$ and $\tilde{\boldsymbol{\xi}}$ solves $\partial l/\partial\boldsymbol{\xi} = 0$. Following Lee and Nelder (1996), the complete-data log-likelihood function $l_c(\boldsymbol{\theta})$ in (5.3) may also be called the hierarchical log-likelihood function, since it combines the two stages of mixed-effects models. Let $\boldsymbol{\omega} = \{\boldsymbol{\omega}_i = (\mathbf{a}_i, \mathbf{b}_i),\, i = 1, \ldots, n\}$ be the collection of random effects, and let $\tilde{\boldsymbol{\omega}}_i$ maximize $l_c^{(i)}(\boldsymbol{\theta})$ with respect to $\boldsymbol{\omega}_i$ for fixed $\boldsymbol{\theta}$. The function $p_{\tilde{\boldsymbol{\omega}}}(l_c(\boldsymbol{\theta}))$ can be written as
$$p_{\tilde{\boldsymbol{\omega}}}(l_c(\boldsymbol{\theta})) = \sum_{i=1}^{n} \Big[l_c^{(i)}(\boldsymbol{\theta}) - \frac{1}{2}\log\det\{D(l_c^{(i)}(\boldsymbol{\theta}), \boldsymbol{\omega}_i)/(2\pi)\}\Big]\Big|_{\boldsymbol{\omega}_i = \tilde{\boldsymbol{\omega}}_i}. \qquad (5.7)$$

We can show that, for unobservable $\boldsymbol{\omega}$, the use of the function $p_{\tilde{\boldsymbol{\omega}}}(l_c(\boldsymbol{\theta}))$ is equivalent to integrating $\boldsymbol{\omega}$ out using the first-order Laplace approximation. Thus, $p_{\tilde{\boldsymbol{\omega}}}(l_c(\boldsymbol{\theta}))$ is the first-order Laplace approximation to the marginal log-likelihood $l(\boldsymbol{\theta})$ in (5.2) based on the hierarchical log-likelihood function $l_c(\boldsymbol{\theta})$.

In fact, let $N_i = n_{obs,i} + m_i$ be the number of response and covariate observations for individual $i$, and let $b$ be the dimension of $\boldsymbol{\omega}_i$. Assume that $N_i = O(N)$ uniformly for $i = 1, \ldots, n$, where $N = \min_i N_i$. Taking $k = N_i$, $k\,p(\mathbf{x}) = l_c^{(i)}(\boldsymbol{\theta})$, $\gamma = b$, and $\mathbf{x} = \boldsymbol{\omega}_i$ in the following Laplace approximation
$$\int e^{k\,p(\mathbf{x})}\, d\mathbf{x} = (2\pi/k)^{\gamma/2}\, \Big|-\frac{\partial^2 p(\tilde{\mathbf{x}})}{\partial\mathbf{x}\,\partial\mathbf{x}^T}\Big|^{-1/2} e^{k\,p(\tilde{\mathbf{x}})}\,\{1 + O(k^{-1})\},$$
where $\mathbf{x}$ is a $\gamma$-dimensional vector and $\tilde{\mathbf{x}}$ maximizes $k\,p(\mathbf{x})$, we can approximate the $i$th individual's contribution $l_i(\boldsymbol{\theta})$ to the overall log-likelihood $l(\boldsymbol{\theta})$ as
$$l_i(\boldsymbol{\theta}) = \log \int e^{l_c^{(i)}(\boldsymbol{\theta})}\, d\boldsymbol{\omega}_i = \log\Big[\exp\{p_{\tilde{\boldsymbol{\omega}}_i}(l_c^{(i)}(\boldsymbol{\theta}))\} + O_p(N_i^{-1})\Big] = p_{\tilde{\boldsymbol{\omega}}_i}(l_c^{(i)}(\boldsymbol{\theta})) + O(N_i^{-1}), \qquad (5.8)$$
in which the last step holds by Lemma 3.2 in Section 3.7. Hence, the log-likelihood $l(\boldsymbol{\theta})$ can be approximated as
$$l(\boldsymbol{\theta}) = \sum_{i=1}^{n} p_{\tilde{\boldsymbol{\omega}}_i}(l_c^{(i)}(\boldsymbol{\theta})) + \sum_{i=1}^{n} O(N_i^{-1}) = p_{\tilde{\boldsymbol{\omega}}}(l_c(\boldsymbol{\theta})) + n\,O(N^{-1}). \qquad (5.9)$$
As $N = \min_i N_i$ grows faster than $n$, the function $p_{\tilde{\boldsymbol{\omega}}}(l_c(\boldsymbol{\theta}))$ approaches the marginal log-likelihood function $l(\boldsymbol{\theta})$, and hence an estimate of $\boldsymbol{\theta}$ that maximizes $p_{\tilde{\boldsymbol{\omega}}}(l_c(\boldsymbol{\theta}))$ also approximately maximizes $l(\boldsymbol{\theta})$. This leads to the following algorithm for obtaining an approximate MLE of $\boldsymbol{\theta}$, called $\hat{\boldsymbol{\theta}}_{HL}$.

Step 1. Obtain an initial estimate of $(\boldsymbol{\alpha}, \boldsymbol{\beta}, \delta^2, R, A, B, \boldsymbol{\omega}) = (\boldsymbol{\alpha}^{(0)}, \boldsymbol{\beta}^{(0)}, \delta^{2(0)}, R^{(0)}, A^{(0)}, B^{(0)}, \boldsymbol{\omega}^{(0)})$ based on a naive method, and an initial estimate of $\boldsymbol{\eta} = \boldsymbol{\eta}^{(0)}$ based on the dropout model with the random effects $\boldsymbol{\omega}^{(0)}$.

Step 2. Given the current parameter estimates $\boldsymbol{\theta}^{(t)}$, update the random-effects estimates $\tilde{\boldsymbol{\omega}}^{(t+1)}$ by maximizing $l_c^{(i)}(\boldsymbol{\theta}^{(t)})$ with respect to $\boldsymbol{\omega}_i$, $i = 1, \ldots, n$.

Step 3. Given the random-effects estimates $\tilde{\boldsymbol{\omega}}^{(t+1)}$, update the parameter estimates $\boldsymbol{\theta}^{(t+1)}$ by maximizing $p_{\tilde{\boldsymbol{\omega}}^{(t+1)}}(l_c(\boldsymbol{\theta}))$ with respect to $\boldsymbol{\theta}$.

Step 4. Iterate between Step 2 and Step 3 until convergence.
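In practice, Steps 2 and 3 can be built around a single routine that, for a fixed $\boldsymbol{\theta}$, profiles out $\boldsymbol{\omega}_i$ and applies the log-determinant adjustment in (5.7). The R sketch below is illustrative only: it uses a generic numerical optimizer and a numerically evaluated Hessian in place of the analytic $D(l_c^{(i)}, \boldsymbol{\omega}_i)$, and the function names are hypothetical.

```r
# First-order Laplace / h-likelihood adjustment p_{omega~}(l_c) in (5.7)
# for one subject.  lc.i(omega, theta) returns l_c^{(i)}(theta) as a
# function of omega = (a, b); omega0 is a starting value (e.g., from a
# naive fit).
p.adj <- function(theta, lc.i, omega0) {
  inner <- optim(omega0, function(w) -lc.i(w, theta),
                 method = "BFGS", hessian = TRUE)
  # inner$hessian approximates D = -d^2 l_c / (d omega d omega^T) at omega~,
  # and -inner$value is l_c^{(i)}(theta) evaluated at omega~
  as.numeric(-inner$value -
             0.5 * determinant(inner$hessian / (2 * pi), log = TRUE)$modulus)
}
# Step 3 then maximizes the sum of p.adj over subjects with respect to theta,
# e.g., by another call to optim(), with the omega~'s refreshed each iteration.
```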
We can use Fisher information to obtain the following approximate formula for the variance-covariance matrix of the approximate MLE $\hat{\boldsymbol{\theta}}_{HL}$:
$$\mathrm{Cov}(\hat{\boldsymbol{\theta}}_{HL}) = \Big[-\frac{\partial^2 p_{\tilde{\boldsymbol{\omega}}}(l_c(\boldsymbol{\theta}))}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}^T}\Big]^{-1}\Big|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}_{HL}}.$$
Many optimization procedures evaluate the matrix $-\partial^2 p_{\tilde{\boldsymbol{\omega}}}(l_c(\boldsymbol{\theta}))/\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}^T$ at $\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}_{HL}$ (the Hessian matrix), from which it is easy to obtain $\mathrm{Cov}(\hat{\boldsymbol{\theta}}_{HL})$.

5.4.2 Asymptotic Properties

Under suitable regularity conditions on $l(\boldsymbol{\theta})$, $g(\cdot)$, and $d(\cdot)$, we extend Vonesh (1996) to show in Section 5.7 that
$$(\hat{\boldsymbol{\theta}}_{HL} - \boldsymbol{\theta}_0) = O_p\big(\max\{n^{-1/2},\, (\min_i N_i)^{-1}\}\big),$$
where $\boldsymbol{\theta}_0$ is the true value of $\boldsymbol{\theta}$. Thus, the approximate MLE $\hat{\boldsymbol{\theta}}_{HL}$ will be consistent only as both $n$ and $\min_i N_i \to \infty$. Intuitively, the $n^{-1/2}$ term comes from standard asymptotic theory, while the $(\min_i N_i)^{-1}$ term comes from the Laplace approximation.

Note that the accuracy of the first-order Laplace approximation to the log-likelihood function is $O\{n/(\min_i N_i)\}$, or, equivalently, $o(1)$ provided $\min_i N_i$ grows faster than $n$. In this case, $(\hat{\boldsymbol{\theta}}_{HL} - \boldsymbol{\theta}_0) = O_p(n^{-1/2})$, with $\hat{\boldsymbol{\theta}}_{HL}$ asymptotically equivalent to the "exact" MLE. This reflects the fact that, as the accuracy of the Laplace approximation to the log-likelihood increases, the approximate MLE $\hat{\boldsymbol{\theta}}_{HL}$ behaves more and more like the "exact" MLE. However, we can decrease the growth rate of $\min_i N_i$ required for the asymptotic normality of $\hat{\boldsymbol{\theta}}_{HL}$. In particular, as $\min_i N_i$ grows at a rate greater than $n^{1/2}$, the rate of consistency of $\hat{\boldsymbol{\theta}}_{HL}$ will still be $O_p(n^{-1/2})$, and the resulting estimate will be asymptotically equivalent to the "exact" MLE in the sense that it has the same asymptotic distribution as the "exact" MLE (see Section 5.7). We correct the claim by Vonesh (1996) that, as $\min_i N_i$ grows at a rate greater than $n^{1/2}$ but less than or equal to $n$, the rate of consistency will still be $O_p(n^{-1/2})$ but the resulting estimate will no longer be asymptotically equivalent to the "exact" MLE. The proofs of the above arguments are given in Section 5.7.

5.5 Example and Simulation

5.5.1 Example

We use the same HIV dataset as in Section 4.5 to illustrate the methods proposed in this chapter, but here we use a random-effect-based informative dropout model rather than the outcome-based informative dropout model of Section 4.5; these informative dropout models may be used for sensitivity analysis. We use the commonly-used naive method, which ignores measurement errors and missing data, to obtain parameter starting values for the two proposed methods. See the data description in Section 4.5.1.

The Response and the Covariate Models

We consider the same HIV viral dynamic and CD4 measurement error models as in Section 4.5.2. For completeness, we describe these models again here. For the viral load process, we consider the following model:
$$y_{ij} = \log_{10}\big(P_{1i}\, e^{-\lambda_{1ij} t_{ij}} + P_{2i}\, e^{-\lambda_{2ij} t_{ij}}\big) + e_{ij}, \qquad (5.10)$$
$$\log(P_{1i}) = \beta_1 + b_{i1}, \qquad \lambda_{1ij} = \beta_2 + \beta_3 z_{ij}^* + b_{i2}, \qquad \log(P_{2i}) = \beta_4 + b_{i3}, \qquad (5.11)$$
$$\lambda_{2ij} = w(t_{ij}) + h_i(t_{ij}), \qquad (5.12)$$
where $y_{ij}$ is the $\log_{10}$-transform of the viral load measurement for patient $i$ at time $t_{ij}$, $P_{1i}$ and $P_{2i}$ are baseline values, $\lambda_{1ij}$ and $\lambda_{2ij}$ are viral decay rates, $z_{ij}^*$ is the true (but unobservable) CD4 count, and $w(t_{ij})$ and $h_i(t_{ij})$ are nonparametric fixed- and random-effects functions (see Section 2.1). In order to reduce the number of nuisance parameters, we assume that the variance-covariance matrices $A$ and $B$ of the random effects are both diagonal matrices.
To avoid very small (large) estimates, which may be unstable, we standardize the C D 4 counts • and rescale the original time t (in days) so that the new time scale is between 0 and 1. As discussed in Section 2.1, we employ the linear combinations of natural cubic splines with percentile-based knots to approximate w(t) and hi(t). Following W u and Zhang (2002), we take the same natural cubic splines with q < p in order to decrease the dimension of random effects. A I C and B I C criteria are used to determine the values of p and q, which leads to the following model for A r , in (5.12) (see Table 4.1), with p = 3 and q — 1, 2  A  2 i j  « ft + ft MUj)  + ft MUj)  + hi-  (5.13)  For the C D 4 process, we consider empirical polynomial L M E models, and choose the best fitted model based again on A I C / B I C values for each possible model. This is done based on the observed C D 4 values, and is done separately from the response model for simplicity. The following quadratic polynomial L M E model best fits the C D 4 trajectory (see Table 4.2):  CD4*/ = (ai + ai) + ( a + a ) u 2  2  123  a  + ( a + a ) u\ + e 3  3  ih  (5-14)  where uu is the time and a (an, aj2, a^)  T  = (ai, a , a^)  T  2  are the population parameters and a;  =  are the random effects.  Random-effect-based Informative Dropout Models In this study, dropout patients appear to have slower viral decay, compared with the remaining patients. Thus, dropouts are likely to be informative or nonignorable. So we need to assume a model for the dropout process in order to make valid likelihood inference. Note that we should avoid building too complicated a dropout model since a complicated model may become non-identifiable. Subject-area knowledge and preliminary checks suggest that dropout may be related to the random-effects components an, a , i2  and b , so we consider i2  the following dropout model logit[P(rjj = l | a i , b^ T?)] =r]  1  + rj an + ri a 2  3  i2  + 774^2-  (5.15)  We assume independence of the r^'s to simplify the model. The dropout model (5.15) along with the dropout models in Section 4.5 can be used for sensitivity analysis.  Estimation M e t h o d s and C o m p u t a t i o n Issues We estimate the model parameters using the two proposed "joint" model methods discussed in Sections 5.3 and 5.4. We denote the method in Section 5.3 by A P and the method in Section 5.4 by H L . The two proposed joint model methods need starting values for model parameters since they are implemented by a M C E M algorithm or by an iterative Laplace approximation to the log-likelihood function. We use the parameter estimates obtained by the naive method, which ignores measurement errors and missing data, as parameter starting values for the two joint model methods. For the naive method, we use the S P L U S function nlme()  and lme() to obtain param-  eter estimates and their default standard errors. For the proposed A P method, we assess the 124  convergence of the Gibbs sampler by examining time series plots and sample autocorrelation function plots. For example, Figures 5.1 and 5.2 show the time series and the autocorrelation function plots for b associated with patient 14. From these figures, we notice that the Gibbs 2  sampler converges quickly and the autocorrelations between successive generated samples are negligible after lag 15. Time series and autocorrelation function plots for other random effects show similar behaviors. 
Therefore, we discard the first 500 samples as the burn-in, and then we take one sample from every 20 simulated samples to obtain independent samples (see sampling methods in Section 5.3.3). We start with k  0  = 500 Monte Carlo samples, and increase the Monte-Carlo sample  size as the number t of E M iteration increases:  k +i = k + k /c t  t  t  with c — 4 (Booth and  Hobert, 1999). Convergence criterion for these two joint model methods in our examples is that the relative change in the parameter estimates from successively iterations is smaller than 0.05. Convergence of the algorithms are considered to be achieved when the maximum percentage change of all estimates is less than 5% in two consecutive iterations. We use the multivariate rejection sampling method for the A P method. Other sampling methods may also be applied and may be even more efficient.  O n a S U N Sparc  work-station, the A P method took about 135 minutes to converge while the H L method took about 150 minutes to converge. The H L method took more time than the A P method mainly because all model parameters appear in the nonlinear function p<jj(l (9)) and no sepc  aration of the parameters is possible. However, the H L method is always computationally feasible while the A P method sometimes may have convergence problems. Moreover, the H L method can be used to obtain good parameter starting values for the A P method.  Analysis Results We estimate the model parameters using the two proposed joint model methods A P  125  Series :  b2  Figure 5.2: The autocorrelation function plot for b associated with patient 14. 2  126  Table 5.1: Parameter estimates (standard errors) for the models in the example.  c Method  ai  a  2  AP  -.43  HL  (•1)  (-4)  ft  «3  ft ft ft  ft  ft  11.72 66.65 1.53  6.93 -1.86  7.49 -2.36  (.2)  (.5)'  (7.8)  4.32 -3.93  (4.3) (4.6) (•2) 11.64 66.44 1.58  (.7)  -.41  6.89 -1.92  (.6) (.5)  (•1)  (3.4)  (2.8)  (.6)  (4.9) (4.8)  S  ft  4.29 -3.90  R  .36 .51  (2.7)  7.46 -2.29 (7.5) (2.7)  .35 .50  Note: the estimated covariance matrices are A = diag(.62,4.70, 4.41) for AP, A = diag(.51, 4.74, 4.53) for HL. B = diag(1.45,91.62,1.94,20.16) for AP, and B = diag(1.42,91.91,1.58,19.96) for HL.  and H L . We use the parameter estimates obtained by the naive method as the parameter starting values for the A P and the H L methods.  We also tried several other parameter  starting values for the proposed joint model methods.  Different parameter starting values  lead to roughly same parameter estimates in both the A P and the H L methods. Table 5.1 presents the resulting parameter estimates and standard errors based on the random-effect-based informative model (5.15). We find that the two joint model methods provide similar parameter estimates.  Comparing the random-effect-based informative  dropout model with the outcome-based informative dropout model I in (4.13), we find that the resulting estimates are similar. This indicates again that the estimation may be robust against the nonignorable dropout models. Although some estimates in Table 5.1 are not statistically significant, the values of the estimates may still provide useful information about viral load and C D 4 trajectories. The estimates of parameters rj based on the A P method (or the H L method) are -2.32, .31, -.05,, and - . 
0 7 (or -2.4.1, .27, -.04, and -.08) respectively, with all p-values less than .00001, which also indicates that the dropouts may be nonignorable (or informative) since the missingness may depend on the unobservable random effects. The estimates of rj , n , and 77 indicate that dropout patients seems to have higher baseline 2  3  4  CD4 count, decrease in C D 4 count faster over time, and have slower first decay rate than the remaining patients.  127  5.5.2  The Simulation Study  We evaluate the proposed methods ( A P and H L ) for the random-effect-based  informative  model (5.15) via simulation. The response and covariate models, the random-effect-based informative  dropout model, and the measurement time points used in the simulation are all  the same as those in the example in the previous section (i.e., (5.10) - (5.14)). We choose appropriate values of rj to mimic certain missing rate, and we use the S P L U S function sampleQ to generate binary data r y based on the values of parameters 77 and the random effects 3n and by If ?y = 1, then y - is deleted, and if ry = 0, id  is considered to be observed.  In the simulations, the true values of model parameters 3 and oc are shown in Table 5.2, and the other true parameter values are 5 =  .2,  R = .4,  A = diag(.5, 3, 2), and  B = diag(l, 9, 2, 4). We set 77 = (—1.4,0.1, —0.1, —0.1) to get an average missing rate of r  20%. We always regard the first two responses on each individual as observed, i.e., every individual has at least two observed responses. We simulated 100 data sets and calculated averages of the resulting estimates and their standard errors based on each of the two methods.  We compare the methods by  comparing their biases and mean-square-errors (MSEs). Here, bias and M S E are assessed in terms of percent relative bias and percent relative root mean-squared error, as defined in Section 4.6. Since A P method sometimes offers computational problems, such as slow or non-convergence, the 100 sets of parameter estimates are obtained from 128 data sets. From the simulation results in Tables 5.2 and 5.3, we see that the two proposed joint model methods ( A P and H L ) perform well. The A P method performs better than the H L method in the sense that the A P yields smaller relative M S E and bias than the H L method. The H L method also performs reasonably well and it is always computationally feasible. Therefore, the H L method may be a good alternative method when the A P method exhibits computational difficulties, and the H L method can also be used to obtain good parameter  128  Table 5.2: Simulation results for the parameter estimates (standard errors) for the estimation methods A P and H L . Parameter True Value  Oil  a  2  «3  ft  A  ft  ft  ft  ft  ft  -.4  4.0  -4.0  12.0  67.0  1.5  7.0  -2.0  8.0  -3.0  AP  -.40  4.01  -3.98  11.99 66.95  1.49  6.99  -2.01  8.04  -2.97  (•2) -.38  (.5)  (.6)  (1.4)  (1.6)  (.3)  (2.1)  (3.2)  (1.1)  3.96  -3.93  (•1)  (•4)  (.5)  HL  (•1) 11.99  67.10  1.56  6.95  -2.08  8.09  -2.89  (•1)  (1.0)  (1.3)  (.3)  (1.8)  (2.9)  (1.0)  Table 5.3: Simulation results for bias and M S E of the parameter estimates for the estimation methods A P and H L . 
Parameter True Value  a\ -A  <%2  4.0  a  3  -4.0  ft  ft  ft  ft  ft  ft  67.0  1.5  7.0  -2.0  8.0  -3.0  -.1 .22  -.44  -.13 -.26  -.53 -3.78  .46 .62  .25  1.98 4.24  71.84  28.95  22.26  98.75  37.30  31.71  ft-  12.0 Bias  AP HL  1.59 .15 2.17 -1.02  .95 1.31  -.07 -.08  1.08  .93  MSE AP  7.93  6.46  7.39  1.66  1.74  66.69  HL  9.07  8.33 . 9.31  1.69  2.43  93.67  Note: Bias = Percent bias = 100  x  biaSj/|/?j|;  MSE = Percent ^MSE = 100  129  x  ^/MSE^/I^I.  starting values for the A P method.  5.6  Discussion  We have proposed two approximate likelihood methods for semiparametric N L M E models with random-effect-based  informative  dropouts and covariate measurement errors and missing  data, implemented by a Monte Carlo E M algorithm combined with Gibbs sampler or by an iterative Laplace approximation to the log-likelihood function respectively. The first method may be more accurate than the second method but it sometimes may offer computational difficulties such as slow or non-convergence, especially when the dimensions of random effects are not small. The second method is always computationally feasible but may be less accurate than the first method. The second method may be used as a reasonable alternative when the first method has convergence problems or it may be used to provide excellent parameter starting values for the first method. Simulation studies indicate that both methods produce satisfactory results. Although it does not need to generate Monte Carlo samples for random effects, the second method may not be computationally more efficient than the first method. A possible reason is that there are too many model parameters appear in the nonlinear functions in optimization procedures and no separation of the parameters is possible. A possible solution is to use Bayesian method to address this problem. Specifically, we may assume the known hyperprior distributions for the model parameters. In many longitudinal data sets, dropouts, censoring, measurement errors, and missing covariates are all present simultaneously.  To our knowledge, there are almost no unified  methods in the literature which address these problems simultaneously.  W u (2002) pro-  posed a unified method to address censoring and measurement errors simultaneously and  130  showed that the proposed method offered significant improvement over existing methods currently in use. T h e ideas in W u (2002) and in this chapter can be extended to address dropouts, censoring, measurement errors, and missing covariate simultaneously in semiparametric/nonparametric N L M E models.  5.7  Appendix: Asymptotic Properties of the Approximate M L E #HL in Section 5 . 4  5.7.1  Consistency  We will show that the following result  (OHL  —  Oo) = O m a x j n  s^m.inA^  V  j  holds under the usual regularity conditions on 1(0), g(-) and d(-), where 6  0  is the true value  of 0. Proof. Let u); maximize lc\0) Suppose that Ni = O(N)  with respect to u>; for fixed 0.  Denote Ni = n b i + rrii. 0  uniformly for i = 1 , . . . , n, where ./V = mim Ni.  s<  Based on (5.8)  in Section 5.4.3, the zth individual's contribution k(0) to the overall log-likelihood may be approximated as  m  = POMW)  + ^r )  = P& {W(o)).+ O(N- ). .  1  1  T  Hence, the log-likelihood 1(0) can be .written as (see (5.9))  l(0) = l*(0) + O{nN~ }, 1  where l*(0) = p^(l (0)) c  = E H i P u ^ W ) -  L  e  t  *( )  u  e  .  
= dl*(0)/d0  (5.16)  and let 0  HL  be the  approximate maximum likelihood estimate satisfying U*(0HL) — 0- Under suitable regularity 131  conditions on 1(0) and assuming QHL is an interior point in a neighborhood containing 0 , 0  Taylor's theorem tells us that there exists a vector 0 on the line segment between 0  0  and  OHL such that  n- u(0 )  = n- u(0 )  l  where u(0) — dl(0)/dO  + - M(0)(0  x  HL  and M(0)  -  l  Q  n  HL  = 8 l(0)/d0d0 2  0 ),  (5.17)  0  are the first and second order derivatives  T  of the true but intractable marginal log-likelihood 1(0). The. first term n u ( 0 ) on the right - 1  o  of (5.17) is  iu(« , = I«2> 0=0 0  n  n  =  ly  dk(0) 0=00  80  O  oO  Given sufficient regularity conditions on 1(0), we know from the Lindeberg Central Limit Theorem that  (5.18)  ^u(0 )-^(0,7(00)), o  where the matrix 7(0) = l i m ^ o o ^ £ h(0) and h(0) is the information matrix for individual " i=l i. That implies  - ^ u ( 0 ) = O (l) o  The matrix n~ M(0) l  O (n- ' ). 1 2  o  p  on the right of (5.17) is  -M(0) = n '  1  l  v  ^u(0 ) =  p  ndOdO  T  0=0  A  d\(0)  n Z-u ^ dOdO  0=0  1  n  (5.19)  -1(0),  by the Law of Large Numbers. Since 1(0) is positive definite for all 0, the probability that the matrix n  _ 1  M ( 0 ) is invertible tends to 1. B y writing n' M(0) l  = —1(6) + o (l) p  and  applying Lemma 3.2 in Section 3.7 to the inverse function, we have  [n- M(0)Y l  l  = -IijO)-  1  132  +  o (\). p  (5.20)  Given suitable regularity conditions ong(-) and d(-), for example that third order derivatives exist and are continuous in an open neighborhood about 0 , application of Lemma 3.2 in O  Section 3.7 to the partial derivative function in the expression (5.16) leads to  n- u(6 )  1  HL  From (5.17), we have  -1  1  H L  - 6o) = n - y f l i / L ) -  n^u^o)  0 ) = [ - M(d)]- [n' vi{e )  (OHL -  l  0  l  -  l  n  (5.21).  1  HL  Note that u*(0HL) = 0 and n u ( 0 o ) = Op(n~i). n- M(0)(0  + 0(N- ).  = n- u*{d )  1  HL  n "  1  ^ ) ]  (OHL - Oo) = ( - / ( 0 ) " + o ( l ) ) [ n - u ( 0 ) - n " ^ ) ] 1  1  1  p  (OHL  - Oo) = (-1(G)-  1  0 ) = -I(O)-  (OHL -  H i  + o {l)){n- u\0 L) l  O maxjra^,  1  0  +  H  p  AT }'] 1  p  m a x j n .v^nrin/V^  (by (5.20))  O ^ " ) + O (n^)\  + o  1  p  (by (5.21))  m a x j n - ^ , iV >}] _1  p  }J .  Finally, let OML denote the "exact" maximum likelihood estimate with U(0ML) Let min; N  = 0(n ) T  =  0-  for r > 1 so that the accuracy of the Laplace approximation to the  marginal log-likelihood is approximately 0(n ~ ) l  = o(l) from the formula (5.16).  T  Then,  under the same regularity conditions as before, by multiplying n on the both sides of the equation (5.21) and noting that U(0ML) = 0, we have  U(0HL) = U*(0 L) + O (1) = O + O (1) = U(0 ) + O (1). H  Thus U(9HL) ~ (0ML) u  = o (l) p  p  p  ml  p  and hence OHL is asymptotically equivalent to the "exact"  maximum likelihood estimate OML-  5.7.2  d  Asymptotic Normality of #HL  In this section, we will show that as N grows at a rate greater than n%, i.e., N =  0(n ) T  for r > | , the approximate M L E OHL and the "exact" M L E OML have the same asymptotic 133  distribution. Proof. 
Noting that the approximate M L E OHL satisfies a set of equations U*(OHL) = , U  take a first-order Taylor series expansion of U*(0 L) H  =  O = u*(0 ) HL  around the true parameter 0  U*(0 ) +  W  E  O  ^-^-(0HL-0 ),  0  0  dO where 0* is oh the line segment joining 0  0  V^{0HL-0 ) 0  to OHL, which implies  1 du*{0*) "i - i  =  n  80  1  1  i=l  (5.22)  80  dOdO  Now we consider the two product terms on the right of (5.22).  Applying Lemma 3.2 in  Section 3.7 to the first and second partial derivative functions in the expression in (5.16), we know that, for any fixed 0,  4=u*(6>) = - ^ u ( 0 ) + 0(n 2 A " ) l  1  dO  dO  +  U  ^  )>  N  (5.23)  and  n  dO  T  n  dO  T  dOdO  n^dOdO  T  T  i=i  Assume that N = 0(n ), T  1  (5.24)  '  i=i  o(l). From  where r > \ . Then ~0(n^ N' )  =  1  0(n^- ) T  (5.23) and (5.24), we have _  £%>^h 1  l  ™  1  dlj(Oo)  aO  A rpp^f))  « £  a0a0  _ r  ~ 134  l i m  i f »  (5.25)  *«e-) "  Note that OHL — Oo = O [ma,x{n~%, N^ }] = O (n~^), i.e., OHL is a -^-consistent estimate 1  p  p  of 6o- Since 9* is on the line segment joining 0  O  and OHL,  A- 0  O  as n —> oo. Under the  same regularity conditions as before, it follows from (5.18) and (5.19) that  i=  n  „.  (- ) 5  26  Combining the results in (5.25) and (5.26) and using Slutsky's theorem, we can show that  V^(0HL-0O)  which implies that when N = 0(n ) T  A/V(0,  Wo)' ), 1  for r > | , the approximate M L E OHL  M L E OML have the same asymptotic distribution.  135  a  n  d the "exact" •  Chapter 6 Conclusions and Future Research 6.1  Conclusions  In this thesis, we have developed approximate maximum likelihood inference in the following three problems: (1).  semiparametric N L M E models with measurement errors and missing  data in time-varying covariates; (2). semiparametric N L M E models with covariate measurement errors and outcome-based  informative  missing responses; (3). semiparametric N L M E  models with covariate measurement errors and random-effect-based  informative  missing re-  sponses. Measurement errors, dropouts, and missing data are addressed simultaneously in a unified way. For each problem, we have proposed two joint model methods to simultaneously obtain approximate maximum likelihood estimates (MLEs) of all model parameters. The first method, implemented by a Monte Carlo E M algorithm, may be more accurate than the second method but it may be computationally very intensive and sometimes may offer computational difficulties such as slow or non-convergence, especially when the dimensions of random effects are not small. The second method, which approximates joint log-likelihood functions by using a first-order Taylor expansion or by using a first-order Laplace approxima-  136  tion, is computationally more appealing, but it may be less accurate than the first method. The performance of the second method may need further investigation.  We have showed  some asymptotic results for the estimates based on'the second method. T h e second method may be used as a reasonable alternative when the first method has convergence problems or be used to provide excellent parameter starting values for the first method. Simulation results have shown that all proposed methods perform better than the commonly used two-step method and the naive method which ignores measurement errors, in the sense that the proposed methods yield smaller bias and M S E . 
In particular, the commonly used two-step method may under-estimate standard errors, which is consistent with analytic results, and the naive method may under-estimate covariate effects and poorly estimate other parameters.  6.2  Future Research Topics  Finally, we discuss possible future work relevant to this thesis as follows. 1. In many longitudinal studies such as H I V viral dynamics, another common problem is that the response measurements may be subject to left censoring due to a detection limit.  Censored responses in practice were often substituted by the detection limit  or half the detection limit (Wu and Ding, 1999; W u and W u , 2001), which may lead to substantial biases in the results (Wu, 2002).  In the presence of both dropouts  and censoring, unified approaches which address these problems simultaneously in semiparametric/nonparametric N L M E models are needed in order to make reliable statistical inference.  2.  In many longitudinal datasets, dropouts, censoring, measurement errors, and missing covariates are all present simultaneously. To our knowledge, there are almost no 137  unified methods in the literature which address these problems simultaneously.  Wu  (2002) proposed a unified method to address censoring and measurement errors simultaneously and showed that the proposed method offered significant improvement over existing methods currently in use. The ideas in W u (2002) and in this thesis can be extended to address dropouts, censoring, measurement errors, and missing covariates simultaneously in semiparametric/nonparametric N L M E models.  3. For the response process, we only consider semiparametric nonlinear mixed-effects models with independent and normal distributed error terms e;. In the future, we may consider more complicated covariance structure for e; such as an AR(1) structure.  4.  In our study, we only consider semiparametric nonlinear mixed-effects normal data.  models for  Generally, our proposed methods may be extended to other models,  such as semiparametric/nonparametric generalized linear mixed-effects  models and  semiparametric/nonparametric generalized nonlinear mixed-effects models.  5. Computational efficiency is an important issue in our study. Multivariate rejection sampling methods have been used in our data analyses and simulation. In general, other sampling methods, such as adaptive rejection sampling methods and importance sampling methods, may also be used and may be even more efficient. We plan to compare computational efficiency among several sampling methods in our current setting.  6. In our alternative methods, we have approximated log-likelihood functions by using a first-order Taylor expansion or by using a first-order Laplace approximation. Sometimes, these are not necessarily accurate approximations. In the future, we may investigate better approximations, such as higher order Taylor expansions and Laplace approximations.  138  7. One problem in our models is that there are too many parameters. If the data are not rich enough, the proposed methods may have convergence problems and identifiability problems. We plan to develop Bayesian methods for our problems.  139  References Aitchison, J . and Silvey, S. D . (1958). Maximum-likelihood estimation of parameters subject to restraints. The Annals  of Mathematical  Statistics,  29, 813-828.  Amemiya, T . (1983). Nonlinear regression models. In Handbook  of Econometrics,  Volume  I, Z. Griliches and M . D . Intriligator, eds., pp.333-389. Noth Holland, Amsterdam.  
Barndorff-Nielsen, O. E . and Cox, D . R. (1989). Asymptotic  Techniques  for Use in  Statis-  tics. New York: Chapman and Hall.  Booth, J . G . and Hobert, J . P. (1999). Maximizing generalized linear mixed models likelihoods with an automated Monte Carlo E M algorithm. Journal Society,  of the Royal  Statistical  Ser. B, 61, 265-285.  Bradley, R. A . , and Gart, J . J . (1962). The asymptotic properties of M L estimators when sampling from associated populations. Biometrika,  49, 205-214.  Carroll, R. J . , Ruppert, D . , and Stefanski, L . A . (1995). Measurement Models.  Error  in  Nonlinear  London: Chapman and Hall.  Chan, K . S. and Ledolter, J . (1995). Monte Carlo E M estimation for time series models involving counts. Journal  of the American  140  Statistical  Association,  90, 242-252.  Davidian, M . and Giltinan, D. M . (1995). Nonlinear Data.  Models for Repeated  Measurements  Chapman & Hall.  de Boor, C . (1978). A Practical  Guide to Splines.  Demidenko, E . (2004). Mixed Models  Springer-Verlag, New York.  Theory and Applications.  John Wiley & Sons.  Dempster, A . P., Laird, N . M . , and Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the E M algorithm (with discussion). Statistical  Society,  Ser.  B, 39,  Journal  of the  Royal  1-38.  Diggle, P. and Kenward, M . G . (1994). Informative drop-out in longitudinal data analysis (with Discussion). Applied  Statistics,  43, 49-93.  Ding, A . and W u , H . (2001). Assessing antiviral potency of anti-HIV therapies in vivo by comparing viral decay rates in viral dynamic models. Biostatistics,  Euband, R. L . (1988). Spline smoothing  and Nonparametric  Regression.  2(1),  13-29.  New York: Marcel  Dekker.  Fitzmaurice, G . M . , Laird, N . M . , and Zahner, G . E . P. (1996). Multivariate logistic models for incomplete binary responses. Journal  of-the American  Statistical  Association,  91,  99-108.  Gelfand, A . E . and Smith, A . F . M . (1990). marginal densities. Journal  Geweke, J . (1996). Handbook  of the American  of Computational  Sampling-based approaches to calculating Statistical  Economics.  Association,  Amsterdam:  85, 398-409.  North-Holland.  Gilks, W . R. and W i l d , P. (1992). Adaptive rejection sampling for Gibbs sampling. Statistics,  41, 337-348.  141  Applied  Green, P. J . and Solverman, B . W . (1994). Linear  Models.  Nonparametric  Regression  and  Generalized  Chapman and Hall, London.  Grossman, Z . , Polis, M . , Feinberg, M . B . , Grossman, Z . , Levi, I., Jankelevich, S., Yarchoan, R., Boon, J . , De Wolf, F . , Lange, J . M . A . , Goudsmit, J . , Dimitrov, D . S., and Paul, W . E . (1999). Ongoing H I V dissemination during H A A R T . Nature Medicine,  5 , 1099-1103.  Higgins, D. M . , Davidian, M . , and Giltinan, D. M . (1997). A two-step approach to measurement error in time-dependent covariates in nonlinear mixed-effects models, with application to IGF-I pharmacokinetics. Journal  of the American  Statistical  association,  92, 436-448. Ho, D . D . , Neumann, A . U . , Perelson, A . S., Chen, W . , Leonard, J . M . , and Markowitz, M . (1995). Rapid turnover of plasma virions and C D 4 lymphocytes in HIV-1 infection. Nature,  373, 123-126.  Ibrahim, J . G . , Chen, M . H . , and Lipsitz, S. R. (2001). Missing responses in generalized linear mixed models when the missing data mechanism is nonignorable.  Biometrika,  88, 551-564. Ibrahim, J . G . , Lipsitz, S. R., and Chen, M . H . (1999). 
Missing covariates in generalized linear models when the missing data mechanism is nonignorable. Journal Statistical  Society,  of the Royal  Ser. B, 61, 173-190.  Ke, C . and Wang, Y . (2001).  Semiparametric nonlinear mixed-effects models and their  applications (with discussions). Journal  of the American  Statistical  Association,  96,  1272-1298. Laird, N . M . and Ware, J . H . (1982). Random-effects models for longitudinal data. metrics,  38, 963-974.  Bio-  Lee, Y . and Nelder, J . A . (1996). Hierarchical generalized linear models. Royal Statistical  Society,  Journal  of the  Ser. B, 58, 619-678.  Lee, Y . and Nelder, J . A . (2001). Hierarchical generalised linear models: A synthesis of generalised linear models, random-effect models and structured dispersions.  Biometrika,  88, 987-1006. Liang, H . , W u , H . , and Carroll, R. (2003). The relationship between virologic and immunologic responses in A I D S clinical research using mixed-effects varying-coefncient models with measurement errors. Bio statistics,  4, 297-312.  Lindstrom, M . J . and Bates, D . M . (1990). Nonlinear mixed effects models for repeated measures data. Biometrics,  46, 673-687.  Little, R. J . A . (1995). Modeling the drop-out mechanism in repeated measures studies. Journal  of the American  Statistical  90, 1112-1121.  Association,  Little, R. J . A . and Rubin, D . B . (1987). Statistical  Analysis  with Missing  Data.  New"York:  John Wiley.  Louis", T . A . (1982). Finding the observed information matrix when using the E M algorithm. Journal  of the Royal Statistical  McCulloch, C . E . (1997). models. Journal  Society, Ser. B, 44, 226-233.  Maximum likelihood algorithms for generalized linear mixed  of the American  Statistical  Association,  McLachlan, G . J . and Krishnan, T . (1997). The EM-Algorithm  92, 162-170. and Extension.  New York,  Wiley. Ogden, R. T . and Tarpey, T . (2006). estimated parameters. Bio statistics,  Estimation in Regression models with externally 7, 115-129. 143  Perelson, A . S.,.Neumann, A . U . , Markowitz/-M., Leonard, J . M . , and Ho, D. D. (1996). HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science, 271, 1582-1586.  Pinheiro, J . C . and Bates, D . M . (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal  of Computational  and Graphical  Statistics,  4, 12-35.  Rice, J . A . and W u , C . O. (2001).  Nonparametric mixed-effects models for unequally  sampled noisy curves. Biometrics,  Seber, G . A . F . (1984). Multivariate  Serfling, R. J . (1980).  57, 253-259.  Observations.  Approximation  Theorems  New York: Wiley.  of Mathematical  Statistics.  New York:  Wiley.  Shah, A . , Laird, N . , and Schoenfeld, D. (1997). A random-effects model for multiple characteristics with possibly missing data. Journal  of the American  Statistical  Association,  92, 775-779.  Ten Have, T . R., Pulkstenis, E . , Kunselman, A . , and Landis, J . R. (1998).  Mixed ef-  fects logistics regression models for longitudinal binary response data with informative dropout. Biometrics,  54, 367-383.  Vonesh, E . F . (1996). A note on the use of Laplace's approximation for nonlinear mixedeffects models. Biometrika,  83, 447-452.  Vonesh, E . F . and Chinchilli, V . M . (1997). Linear of Repeated Measurements.  and Nonlinear  Marcel Dekker, New York.  144  Models for the  Analysis  Vonesh, E . F . , Wang, H . , Nie, L . , and Majumdar, D . (2002). 
Conditional second-order generalized estimating equations for generalized linear and nonlinear mixed-effects models. Journal  of the American  Statistical  Association,  97, 271-283.  Wei, G . C . and Tanner, M . A . (1990). A Monte Carlo implementation of the E M algorithm and the poor man's data augmentation algorithm. Journal Association,  of the American  Statistical  85, 699-704.  Wolfinger, R. (1993). Laplace's approximation for nonlinear mixed models.  Biometrika,  80, 791-795. Wu, H . , Kuritzkes, D . R., McClernon, D. R., Kessler, FL, Connick, E . , Landay, A . , Spear, G . , Heath-Chiozzi, M . , Rousseau, F . , Fox, L . , Spritzler, J . , Leonard, J . M'., and Lederman, M . M . (1999). Characterization of viral dynamics in human immuno-deficiency virus type 1-infected patients treated with combination antiretroviral therapy: relationships to host factors, cellular restoration and virological endpoints. Infectious  Diseases, 179,  Journal  of  799-807.  Wu, H . and Ding, A . (1999). Population HIV-1 dynamics in vivo: application models and inferential tools for virological data from AIDS clinical trials. Biometrics, Wu, H . (2005). Statistical  55, 410-418.  Statistical Methods for H I V Dynamic Studies in A I D S Clinical Trials.  Methods in Medical  Research, 14, 171-192.  Wu, H . and Zhang, J . (2002). The study of long-term H I V dynamics using semi-parametric non-linear mixed-effects models. Statistics  in Medicine,  21, 3655-3675.  Wu, L . (2002). A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies. Journal Statistical  Association,  97, 955-964. • 145  of the  American  Wu, L . (2004). Exact and approximation inferences for nonlinear mixed-effects models with missing covariates. Journal  of the American  Statistical  Association,  9 9 , 700-709.  Wu, L . and W u , H . (2001). A multiple imputation method for missing covariates in nonlinear mixed-effect models, with application to H I V dynamics. Statistics  in  Medicine,  20, 1755-1769.  Wu, M . C . and Carroll, R. J . (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics, 188.  146  44, 175-  
