- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- The theory and methods for measurement errors and missing...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
The theory and methods for measurement errors and missing data problems in semiparametric nonlinear mixed-effects.. 2006
pdf
Page Metadata
Item Metadata
Title | The theory and methods for measurement errors and missing data problems in semiparametric nonlinear mixed-effects models |
Creator |
Liu, Wei |
Date Created | 2010-01-16T21:33:52Z |
Date Issued | 2010-01-16T21:33:52Z |
Date | 2006 |
Description | Semiparametric nonlinear mixed-effects (NLME) models are flexible for modelling complex longitudinal data. Covariates are usually introduced in the models to partially explain inter-individual variations. Some covariates, however, may be measured with substantial errors. Moreover, the responses may be missing and the missingness may be nonignorable. In this thesis, we develop approximate maximum likelihood inference in the following three problems: (1) semiparametric NLME models with measurement errors and missing data in time-varying covariates; (2) semiparametric NLME models with covariate measurement errors and outcome-based informative missing responses; (3) semiparametric NLME models with covariate measurement errors and random-effect-based informative missing responses. Measurement errors, dropouts, and missing data are addressed simultaneously in a unified way. For each problem, we propose two joint model methods to simultaneously obtain approximate maximum likelihood estimates (MLEs) of all model parameters. Some asymptotic properties of the estimates are discussed. The proposed methods are illustrated in a HIV data example. Simulation results show that all proposed methods perform better than the commonly used two-step method and the naive method. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | Eng |
Collection |
Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project [http://www.library.ubc.ca/archives/retro_theses/] |
Date Available | 2010-01-16T21:33:52Z |
DOI | 10.14288/1.0092843 |
Degree |
Doctor of Philosophy - PhD |
Program |
Statistics |
Affiliation |
Science, Faculty of |
Degree Grantor | University of British Columbia |
Graduation Date | 2006-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
URI | http://hdl.handle.net/2429/18520 |
Aggregated Source Repository | DSpace |
Digital Resource Original Record | https://open.library.ubc.ca/collections/831/items/1.0092843/source |
Download
- Media
- ubc_2006-200308.pdf [ 6.12MB ]
- Metadata
- JSON: 1.0092843.json
- JSON-LD: 1.0092843+ld.json
- RDF/XML (Pretty): 1.0092843.xml
- RDF/JSON: 1.0092843+rdf.json
- Turtle: 1.0092843+rdf-turtle.txt
- N-Triples: 1.0092843+rdf-ntriples.txt
- Citation
- 1.0092843.ris
Full Text
The Theory and Methods for Measurement Errors and Missing Data Problems in Semiparametric Nonlinear Mixed-effects Models by W E I LIU B .Sc , Northeast Normal University, 1990 M.Sc., Northeast Normal University, 1993 M.Sc., Memorial University of Newfoundland, 2001 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F D o c t o r o f P h i l o s o p h y in T H E F A C U L T Y O F G R A D U A T E S T U D I E S (Statistics) The University of British Columbia October 2006 © W E I LIU, 2006 Abstract Semiparametric nonlinear mixed-effects (NLME) models are flexible for modelling complex longitudinal data. Covariates are usually introduced in the models to partially ex- plain inter-individual variations. Some covariates, however, may be measured with substan- tial errors. Moreover, the responses may be missing and the missingness may be nonignor- able. In this thesis, we develop approximate maximum likelihood inference in the following three problems: (1). semiparametric N L M E models with measurement errors and missing data in time-varying covariates; (2). semiparametric N L M E models with covariate measure- ment errors and outcome-based informative missing responses; (3). semiparametric N L M E models with covariate measurement errors and random-effect-based informative missing re- sponses. Measurement errors, dropouts, and missing data are addressed simultaneously in a unified way. For each problem, we propose two joint model methods to simultaneously obtain approximate maximum likelihood estimates (MLEs) of all model parameters. Some asymptotic properties of the estimates are discussed. The proposed methods are illustrated in a HIV data example. Simulation results show that all proposed methods perform better than the commonly used two-step method and the naive method. L ii Contents Abstract Contents List of Tables . . . List of Figures . . Acknowledgements Dedication '.. 1 Introduction 1 •1.1 Longitudinal Studies 1 1.2 Parametric Nonlinear Mixed-effects Models . 3 1.3 Semiparametric Nonlinear Mixed-effects Models 6 1.4 Measurement Errors and Dropouts 8 1.5 A Motivating Example 10 1.6 Research Objectives and Thesis Organization . . . 14 2 A Semiparametric Nonlinear Mixed-effects Model with Covariate Mea- i i i surement Errors '. • • 16 2.1 Introduction 16 2.2 A Semiparametric N L M E Model for the Response Process 17 2.2.1 A Semiparmetric N L M E Model with Mis-measured Covariates . . . . 17 2.2.2 A Basis-based Approach to Nonparametric Functions 18 2.2.3 Percentile-based Knot Placing for Splines 21 2.2.4 Selection of Smoothing Parameters 21 2.2.5 Transformation of the Semiparametric N L M E Model 23 2.2.6 Consistency of the Estimate of w(t) 23 2.3 Measurement Errors and Missing Data in Covariates 26 3 A Joint Model for Semiparametric N L M E Models with Covariate Mea- surement Errors and Missing Data 28 3.1 Introduction 28 3.2 A Two-step Method 29 3.3 A Joint Model Method fpr Likelihood Inference 34 3.3.1 The Likelihood for the Joint Model 34 3.3.2 A M C E M Method 36 3.3.3 Sampling Methods 40 3.3.4 Convergence 44 3.4 A Computationally More Efficient Approximate Method 46 3.4.1 The Need for an Alternative Method 46 3.4.2 Analytic Expressions of Estimates 48 3.4.3 Asymptotic Properties 51 3.5 Example and Simulation 51 iv 3.5.1 A n Application in A I D S Studies 51 3.5.2 A Simulation Study 58 3.6 Discussion 60 3.7 Appendix: Asymptotic Properties of 7 Based on the A P P R Method in Section 3.4 61 3.7.1 Some Lemmas 61 3.7.2 Notation and Regularity Conditions . 62 3.7.3 Estimating Equations • . . . 65 3.7.4 Consistency . . ; 67 3.7.5 Asymptotic Normality of 7 72 Simultaneous Inference for Semiparametr ic N L M E M o d e l s w i t h Covariate Measurement E r r o r s and Outcome-based Informative D r o p o u t s 74 4.1 Introduction 74 4.2 Models for Nonignorable Missing Responses 75 4.3 Likelihood Inference . . . 77 4.3.1 The Likelihood Function 77 4.3.2 Approximate M L E s Based on a M C E M Method . . . 78 4.3.3 Monte Carlo Sampling . 81 4.4 A Computationally More Efficient Approximate Method 83 4.4.1 A Much Simpler E-step 84 4.4.2 The M-step 91 4.5 Example 95 4.5.1 Data Description 95 4.5.2 The Response and the Covariate Models 96 v 4.5.3 Dropout Models and Sensitivity Analysis . 98 4.5.4 Estimation Methods and Computation Issues . . 99 4.5.5 Analysis Results 101 4.6 A Simulation Study 103 4.7 Conclusions and Discussion 108 5 Semiparametric N L M E Model with Random-effect-based Informative Dropouts and Covariate Measurement Errors 110 5.1 Introduction 110 5.2 Missing Response Models I l l 5.3 A Monte Carlo E M Method 113 5.3.1 The Likelihood Function 113 5.3.2 A M C E M Algorithm • 115 5.3.3 Sampling Methods 117 5.4 An Alternative Approximate Method 118 5.4.1 The Hierarchical Likelihood Method 118 5.4.2 Asymptotic Properties 121 5.5 Example and Simulation 122 5.5.1 Example 122 5.5.2 The Simulation Study 128 5.6 Discussion 130 5.7 Appendix: Asymptotic Properties of the Approximate M L E #HL in Section 5.4131 5.7.1 Consistency 131 5.7.2 Asymptotic Normality of f ? H L 133 6 Conclusions and Future Research 136 vi 6.1 Conclusions 136 6.2 Future Research Topics .' 137 References . . 140 vii List of Tables 3.1 A I C and BIC values for the viral load model (3.14) and (3.15), with q < p = 1, 2, 3 . 54 3.2 A I C and BIC values for the linear, quadratic, and cubic CD4 models 55 3.3 Parameter estimates (standard errors) for the HIV dataset 58 3.4 Simulation results for parameter estimates as well as (standard errors) and (simulated standard errors)* for the estimation methods Naive, Two-step, M C E M , and A P P R . 59 4.1 A I C and BIC values for the model (4.8) - (4.10), with q < p = 1, 2, 3. . . . 97 4.2 A I C and BIC values for the linear, quadratic, and cubic CD4 models . . . . 98 4.3 Parameter estimates (standard errors) for the models in the example 102 4.4 Simulation results for the parameter estimates (standard errors) as well as their biases and MSEs for the estimation methods P A R A , A P P R 1 , and APPR2.105 4.5 Simulation results for the parameter estimates (standard errors) for the three estimation methods N A I V E , A P P R 1 , and A P P R 2 with dropout models I and IV in (4.13) 107 viii 4.6 Simulation results for biases and MSEs of the parameter estimates for the three estimation methods N A I V E , A P P R 1 , and A P P R 2 with dropout models I and IV in (4.13) 108 5.1 Parameter estimates (standard errors) for the models in the example 127 5.2 Simulation results for the parameter estimates (standard errors) for the esti- mation methods A P and H L 129 5.3 Simulation results for bias and M S E of the parameter estimates for the esti- mation methods A P and H L 129 ix List of Figures 1.1 Viral loads (response) of six randomly selected HIV patients 7 1.2 CD4 counts of six randomly selected HIV patients 12 3.1 The time series plot for b2 associated with patient 10. . . . 56 3.2 The autocorrelation function plot for b2 associated with patient 10 56 4.1 The time series plot for b2 associated with patient 14 100 4.2 The autocorrelation function plot for b2 associated with patient 14 100 5.1 The time series plot for b2 associated with patient 10 126 5.2 The autocorrelation function plot for b2 associated with patient 14 126 x Acknowledgements First of all, I thank my supervisor, Dr. Lang Wu, for his support, both academic and moral. I feel honored to have had the chance to work with him. Very special thanks are due the other members of my supervisory committee, Dr. Paul Gustafson and Dr. Matias Salibian- Barrera, for their invaluable suggestions and encouragement. Also, I am very grateful to Dr. Nancy Heckmen, Dr. Harry Joe, and Dr. John Petkau for teaching me, academically and personally. I am also indebted to the office staff, Christine Graham, Elaine Salameh, and Rhoda Morgan, for all their help with administrative matters. It is a pleasure to acknowledge helpful discussions with my fellow graduate students: Juxin Liu, Guohua Yan, Weiliang Qiu, Hui Shen, and all of the others. Finally, I would like to thank my family, my husband, and my son for their tireless love, patience, understanding, and support. W E I L I U The University of British Columbia September 2006 xi To my family xii Chapter 1 Introduction 1 . 1 Longitudinal Studies A longitudinal study is defined as a study in which the response for each individual in the study is observed on two or more occasions. Longitudinal studies are very common in health and life sciences, epidemiology, medical, and biomedical research. Longitudinal studies are. also common in other areas including education, psychology, social sciences, and econometrics. A major advantage of longitudinal studies over cross-sectional studies is that in longitudinal studies one can model the individual response trajectory over time while in cross-sectional studies one cannot. In longitudinal studies, covariates may be classified into two categories: time-varying covariates and time-independent covariates. Time-varying covariates represent variables which vary over time within individuals. Time itself may be viewed as a covariate in that often there is interest in testing whether there are any changes in the response variable over time. When one studies children's weight trajectories over time, the height may be a time-varying covariate which can change with time. Time-independent covariates, on the other hand, may represent baseline factors which do hot vary with time. Examples of time- independent covariates might include an individual's gender and race. One of the goals in longitudinal research is to investigate the effects of important covariates on individual response trajectories over time. A defining feature of a longitudinal data set is repeated observations on a number of individuals. The repeated observations on the same individual tend to be correlated. It is important to explicitly recognize two sources of variability in a longitudinal data set: random variation among repeated measurements within a given individual and random variation between individuals. Moreover, the number of observations within individuals often varies from individual to individual (i.e., the data are often unbalanced). Therefore, longitudinal data require special statistical methods to draw valid statistical inferences. There are three approaches to a longitudinal data analysis. "The marginal model approach is to model the marginal expectation of a response as a function of covariates. The methods are designed to permit separate modelling of the regression of the response on covariates, and the association among repeated observations of the response for each individual. Marginal models are appropriate when inferences about the population average are the main interest. For example, in a clinical trial the average difference between control and treatment is most important, not the difference for any one individual. The random effects model approach assumes that the response is a function of covari- ates with regression coefficients varying from one individual to the next. A random effects model is a reasonable description if the set of coefficients for a set of individuals can be thought of as a sample from a distribution. In random effects models, correlation arises among repeated responses because the regression coefficients vary across individuals, and regression coefficients represent the effects of the covariates on an individual, which is in contrast to the marginal model coefficients which describe the effect of covariates on the population average. Random effects models are most useful when the objective is to make inference about individuals, such as in AIDS studies. They may focus on both population parameters and individuals characteristics. The transition model approach describes the conditional distribution of each response on an individual as an explicit function of his past responses and covariates. Under transition models, correlation among the response observations on one individual exists because the past response observations explicitly influence the present response observation. The past response observations are treated as additional covariates. In each of the three approaches, we consider both the dependence of the responses on covariates and the correlation among the responses. With cross-sectional data, only the dependence of the responses on covariates needs to be specified since there is no correlation of responses. In longitudinal studies, in which correlation usually exists among responses, there are at least two consequences of ignoring it as follows. First, incorrect inferences about regression coefficients. In particular, confidence intervals are too short based on assumption of independence when in fact there is positive dependence. Secondly, the estimation method may be inefficient, that is, less precise than possible. 1.2 Parametric Nonlinear Mixed-effects Models Parametric nonlinear mixed-effects (NLME) models, or hierarchical nonlinear models, have been widely used in many longitudinal studies such as human immunodeficiency virus (HIV) viral dynamics, pharmacokinetic analyses, and studies of growth and decay (Davidian and Giltinan 1995; Vonesh and Chinchilli 1997). In these studies, the intra-individual variation and the inter-individual variation are typically modelled by a two-stage hierarchical model. The first stage specifies the mean and covariance structure for a given individual, whereas 3 the second stage characterizes the inter-individual variation. Understanding the nature of inter-individual systematic and random variation at the second stage often receives more emphasis. This inter-individual variation may be partially explained by some baseline or time-varying covariates. Suppose that there are n individuals with measurements over time. Let yy and zy respectively be the response value and the i / x l covariate values for individual i at time Uj, i = 1, . . . , n, j = 1, . . . , rii. The covariates zy may incorporate variables such as time, dose, etc. A general parametric N L M E model can be written as a hierarchical two-stage model as follows (Davidian and Giltinan, 1995) V i j = giz&PJ + eij, eilfS,1^ N(0,52I), (1.1). ft,- = d (z i i ; ft hi), N(0,B), i = l,...,n, j = l,...,m, (1.2) where g{-) and d(-) are known (possibly nonlinear) functions, ft,- are individual-specific pa- rameters, f3 are population parameters (fixed effects), b; are random effects, ft — (/3^,..., f3jni)T, ei — (eii, • • . , eini)T are within-individual random errors and are assumed to be independent' of bj, 82 is the unknown within-individual variance, / is the identity matrix, and B is an unknown variance-covariance matrix. In AIDS studies, for example, viral loads (Plasma HIV-1 R N A copies) and various covariates such as CD4 count are usually measured over time after initiation of treatments. The following parametric N L M E model has been widely used to fit short-term (the first three months after treatments) HIV viral dynamics (Wu, 2002; Wu and Zhang, 2002) y y = l o g 1 0 ( P l i e - A l * ^ + P 2 i e - X ^ ) + ey, (1.3) l0g(Pu) =p\ + bli} XUj = #2 + fcZij + hi, . (1-4) log(P 2 ») = PA + hi, X2ij = As + hi, (1.5) where yy and zy are the logio-transformation of the viral load measurement and CD4 count 4 for patient i at time Uj respectively, = (bu, b2i, hi, bij)T are random effects, Pu and P2i are baseline values, and A i ^ and X2ij are the first (initial) and the second phase viral decay rates respectively (they may be interpreted as the turnover rates of productively infected cells and long-lived and/or latently infected cells respectively). Although N L M E models are popular in practice, their use has been somewhat limited because of the. complexity of the likelihood function. Estimation of model parameters based on maximum likelihood can be challenging since these models are typically nonlinear with respect to the random effects and thus have no closed-form expressions for the marginal like- lihood. This has led to the development of some widely used approximate methods based on Taylor expansions or Laplace approximation of the likelihood function (Lindstrom and Bates 1990; Wolfinger 1993; Vonesh, Wangs, Nie, and Majumdar 2002). These approximate methods are computationally efficient in the sense that they .may converge faster and have less computational problems than the "exact" likelihood method, which finds the maximum likelihood estimator (MLE) using numerical integration techniques or Monte Carlo methods. These approximate methods often perform well if the number of infra-individual measure- ments is not small, but their performance may be less satisfactory if the intra-individual data are sparse, especially when the inter-individual variability is large (Davidian and Gilti- nan 1995; Vonesh and Chinchilli 1997; Pinheiro and Bates 1995). Thus there is still a need for developing "exact" methods. "Exact" likelihood inference for generalized linear mixed models based on Monte Carlo E M algorithms has been investigated by McCulloch (1997) and Booth and Hobert (1999). 5 1.3 Semiparametric Nonlinear Mixed-effects Models Parametric N L M E models are powerful tools in many longitudinal analyses. In some cases, however, parametric N L M E models may not be flexible enough in modelling complex lon- gitudinal processes, since the underlying mechanism which generates the data may be com- plicated in practice. In these cases, semiparametric or nonparametric models may be more flexible in modelling the complex longitudinal process (Ke and Wang, 2001; Rice and Wu, 2001). In particular, semiparametric N L M E models are very useful in characterizing both the intra-individual variation and the inter-individual variation, in which the intra-individual variation is modelled semiparametrically while the inter-individual variation is incorporated by random effects (Davidian and Giltinan, 1995; Ke and Wang, 2001; Wu and Zhang, 2002). In AIDS studies, for instance, the parametric N L M E model (1.3) - (1.5) is appropriate only for fitting short-term HIV viral dynamics. Due to long-term clinical factors, drug resistance, and other complications, the viral load trajectories can be very complex after the initial phase viral decay (see Figure 1.1 for long-term viral load trajectories of six randomly selected HIV patients). Grossman et al. (1999) pointed out that viral decay rates after the initial period may be complicated and may vary over time since they may depend on some phenomenological parameters which hide considerable microscopic complexity and change over time. Therefore, a nonparametric smooth curve modelling for the second phase viral decay rate may be more appropriate than parametric modelling (Wu and Zhang, 2002). This leads to the following semiparametric N L M E model y i j = l o g 1 0 ( P i i e - A l « * « + P 2 i e ~ x ^ ) + e y , (1.6) log(Pu) = (3i + bu, XUj = (32 + (33Zij + b2i, (1.7) log(P 2i) = PA + hi, A 2ij = w(Uj) + hilUj), (1.8) where w(-) and hi(-) in (1.8) are unknown nonparametric smooth fixed- and random-effects 6 Figure 1.1: Viral loads (response) of six randomly selected HIV patients. functions used to describe the complicated second phase decay rate X2ij- Wu and Zhang (2002) introduced a class of semiparametric N L M E models for lon- gitudinal data. The standard parametric N L M E models can be regarded as a special case of their models. Their models are more flexible than the semiparametric N L M E models proposed by Ke and Wang (2001). Details of the semiparametric N L M E models proposed by Wu and Zhang (2002) will be described in Chapter 2. 1 . 4 Measurement Errors and Dropouts In many longitudinal studies, the inter-individual variation may be large and this variation may be partially explained by time-varying covariates. Some covariates, however, may be measured with substantial errors and may contain missing values as well. Ignoring measure- ment errors and missing data in covariates may lead to biased results (Carroll et al. 1995; Higgins et al. 1997; Wu, 2002). Moreover, some individuals may drop out of the study before the scheduled end for various reasons such as drug intolerance, which leads to missing data. Measurement errors and missing data make statistical analysis in longitudinal studies much more complicated, because standard complete-data methods are not directly applica- ble. Therefore, it is very important to find appropriate methods to deal with measurement errors and missing data. In AIDS studies, for example, it is well known that CD4 counts, which may be used as covariates, are usually measured with substantial errors and are usually measured at time points different from the response (viral load) measurement schedule. In addition, it is very common that some patients drop out of the study early or miss scheduled visits due to drug intolerance or other problems. Visual inspection of the raw data seems to indicate that dropout patients may have slower viral decay, compared with the remaining patients. Thus, 8 the dropouts are likely to be informative or nonignorable. Commonly used measurement error models are reviewed in Carroll et al. (1995). For N L M E models with covariate measurement errors, Higgins, et al. (1997) proposed a two-step method and a bootstrap method, and Wu (2002) considered censored response and covariate measurement errors based on a joint model. There is also extensive literature on dropouts in longitudinal studies (e.g., Diggle and Kenward, 1994; Little 1995; Ibrahim et al. 2001). However, there is little literature on addressing measurement errors, informative dropouts, and missing data in semiparametric N L M E models. In the presence of missing data, the missing data mechanism must be taken into account to obtain valid statistical inferences. Little and Rubin (1987) and Little (1995) discussed statistical analyses with missing values. Let = (yu,... ,yini)T be a vector of repeated observations of a variable y on individual i. Write = (y[°\ y^™'), with y-°' denoting the observed components of and y | m ^ denoting the missing components of y,. Let Ti = (ra,..., r i n i ) T denote a set of indicator variables such that ry = 1 if y^ is missing and rij = 0 otherwise. The probability distribution of r; defines a probability model for the missing value mechanism. Little and Rubin (1987) classified the missing value mechanism as follows. • Missing data are missing completely at random ( M C A R ) if the probability of missing- ness is independent of both observed and unobserved data. When missing data are caused by features of the study design, rather than the behavior of the study subjects, the M C A R mechanism may be plausible. For example, some values are missing because of reasons irrelevant to the treatment (e.g., the medical equipment is broken down on a certain day). So missingness is M C A R if is independent of both y\°^ and y - m ' • Missing data are missing at random (MAR) if the probability of missingness depends 9 only on observed data, but not on unobserved data. For example, a patient may fail to visit the clinic because he/she is too old. In mathematical notation, missingness is M A R if Ti is independent of . • Missing data are nonignorable or informative (NIM) if the probability of missingness depends on unobserved data. For random effects models, we consider the following two nonignorable response missing mechanisms. First, the probability of the missing- ness depends on unobserved responses. For example, a patient fails to visit the clinic because he/she is too sick. We call the missingness outcome-based informative (Little, 1995) if rt is dependent on y\m\ but not on the random effects bj. Secondly, the probability of missingness depends on unobservable random effects. For example, an AIDS patient may drop out if his/her individual-specific viral decay is too slow. We call the missingness random-effect-based informative (Little, 1995) if r-j is dependent on random effects bj, but not on y-m^. Both M C A R and M A R missing mechanisms are sometimes referred to without distinc- tion as ignorable. Little and Rubin (1987) showed that, when missing data are nonignorable, likelihood inference must incorporate the missing data mechanism to avoid biased results. 1 . 5 A Motivating Example Our research is motivated by HIV viral dynamic studies, which model the viral load tra- jectories after initiation of anti-HIV treatments. HIV viral dynamic models have received great attention in AIDS studies in recent years (Ho et al. 1995; Perelson et al. 1996; Wu and Ding, 1999; Wu, 2005). These viral dynamic models provide good understanding of the pathogenesis of HIV infection and evaluation of anti-HIV therapies. N L M E models have been popular in modelling the initial period of HIV viral dynamics and in characterizing 10 the large inter-patient variation. It is shown that the initial viral decay rate may reflect the efficacy of the anti-HIV therapy (Ding and Wu, 2001). One of the major challenges in modelling long-term HIV viral dynamics is that, during late stages of an anti-HIV treatment, it is difficult to model the viral load trajectory parametrically, because drug resistance, non- compliance, and other long-term clinical factors may affect viral load trajectories. Therefore, semiparametric N L M E models may be more suitable for modelling HIV viral dynamics (Wu and Zhang, 2002). Understanding the large inter-patient variation in HIV viral dynamic studies often receives great attention, which may help to provide individualized treatments. It has been shown that covariates such as CD4 cell count (see Figure 1.2) may partially explain the large inter-patient variation (Wu et al. 1999; Wu, 2002). However, some covariates such as CD4 cell count may be measured with substantial errors and may be measured at time points different from the response measurement schedule (which leads to missing data in covariates). Ignoring these measurement errors and missing data in covariates may lead to biased results (Wu, 2002). In addition, it is very common that some patients may drop out of the study early or miss scheduled visits due to drug resistance/intolerance and other problems (although dropout patients may return to study later). It appears that dropout patients may have slower viral decay rates, compared with the remaining patients (see Figure 1.1). Thus the dropouts are likely to be informative or nonignorable. Therefore, it is important to address measurement errors, informative dropouts, and missing data in semiparametric N L M E models in order to obtain reliable results, which may make significant contributions to H I V / A I D S studies. The following AIDS dataset motivates our research. A more detailed data description can be found in Wu (2002). The dataset includes 53 HIV infected patients who were treated with a potent antiretroviral regimen. Viral loads (Plasma HIV-1 R N A copies) were measured 11 50 100 150 200 Days after treatment Figure 1.2: CD4 counts of six randomly selected HIV patients. 12 on days 0, 2, 7, 10, 14, 21, 28 and weeks 8, 12, 24, and 48 after initiation of treatments. After the antiretroviral treatment, the patients' viral loads will decay, and the decay rates may reflect the efficacy of the treatment. Throughout the time course, the viral load may continue to decay, fluctuate, or even start to rise (rebound). The data at the late stage of study are likely to be contaminated by long-term clinical factors, which leads to complex longitudinal trajectories. Various covariates such as CD4 count were also recorded throughout the study on similar schedules. It is well known that CD4 counts are usually measured with substantial errors. The number of response (viral load) measurements for each individual varies from 6 to 10. Five patients dropped out of the study due to drug intolerance or other problems and sixteen patients have missing viral loads at scheduled time points. There were 104 out of 403 CD4 measurements missing at viral load measurement times, due mainly to a somewhat different CD4 measurement schedule. Six patients are randomly selected and their viral loads are plotted in Figure 1.1. In the presence of measurement errors in CD4 count, we consider the following semi- parametric N L M E model, which corresponds model (1.6) - (1.8), to fit the viral dynamics Va = \ogw{Pue-x^ + P 2 i e - x ^ ) + e i j , (1.9) log(Pi0 = Pl + hi, Xlij = $2 + Psz*j + b2i, (1.10) log(P2t) = + hi, A 2 i i = w(tij) + hiiUj), (1.11) where z*- is the unobservable true CD4 count, reflecting the belief that actual, not possibly corrupted, CD4 counts govern the initial phase viral decay rate XUJ. Model (1.9) - (1.11) will be used in our data analyses in later chapters. The CD4 count trajectories for six randomly selected'patients are plotted in Figure 1.2. There exists large variability in CD4 count between patients. Most CD4 count trajectories appear to have roughly quadratic polynomial shapes. We will discuss covariate models in the next chapter. 13 1.6 Research Objectives and Thesis Organization In this thesis, we consider approximate maximum likelihood inference in the following three problems: (1). semiparametric N L M E models with measurement errors and missing data in time-varying covariates; (2). semiparametric N L M E models with covariate measurement errors and outcome-based informative missing responses; (3). semiparametric N L M E models with covariate measurement errors and random-effect-based informative missing responses. Measurement errors, dropouts, and missing data are addressed simultaneously in a unified way. Some asymptotic results are developed. For each problem, we propose two joint model methods to simultaneously obtain approximate maximum likelihood estimates (MLEs) of all model parameters. The first method, implemented by a Monte Carlo E M algorithm, is more accurate than the second method but it is computationally very intensive and may offer computational difficulties such as slow or non-convergence, especially when the dimensions of random effects are not small. The second method, which approximates joint log-likelihood functions, is always computationally feasible and is often computationally much more effi- cient, but it is usually less accurate than the first method. The second method may be used as a reasonable alternative when the first method has convergence problems or may be used to provide excellent parameter starting values for the first method. The remainder of this thesis is organized as follows. In Chapter 2, we introduce general semiparametric N L M E models with covariate measurement errors. Following Rice and Wu (2001) and Wu and Zhang (2002), we employ natural cubic spline bases with the percentile- based knots to transform semiparametric N L M E models into a parametric N L M E models. In Chapter 3, we address measurement errors and missing data in time-varying covariates in semiparametric N L M E models and propose two joint model methods, implemented by a Monte Carlo E M algorithm and by a first-order Taylor approximation to log-likelihood func- 14 tions, respectively. We also compare the two joint model methods with the two-step method suggested by Higgins, et al. (1997) and discuss the asymptotic properties of approximate M L E s . We finally apply the two joint model methods to a HIV dataset. In Chapter 4, we address outcome-based informative dropouts and covariate measurement errors in semipara- metric N L M E models and propose two joint model methods, implemented by Monte Carlo E M algorithms. We illustrate our proposed methods in a HIV dataset and evaluate their per- formance via simulation studies. In Chapter 5, we consider random-effect-based informative missing responses in semiparametric N L M E models with covariate measurement errors. We propose two joint model methods, implemented by a Monte Carlo E M algorithm and by a first-order Laplace approximation to log-likelihood functions respectively, to simultaneously obtain approximate M L E s of all model parameters. We also discuss some asymptotic prop- erties of the approximate M L E s . We illustrate our methods in a HIV dataset and evaluate their performance by simulation studies. We conclude this thesis with some discussion and possible future work in Chapter 6. 15 Chapter 2 A Semiparametric Nonlinear Mixed-effects Model with Covariate Measurement Errors 2.1 Introduction In this chapter we present the general form for semiparametric N L M E models with covariate measurement errors. In Section 2.2, we describe a general semiparametric N L M E model for the response process and incorporate possibly mis-measured time-varying covariates. We approximate the proposed semiparametric N L M E model by a parametric N L M E model, using linear combinations of natural cubic splines with percentile-based knots. Consistency of the estimates is discussed. In Section 2.3, the covariate process is modelled using a mixed-effects model to address measurement errors and missing data. 16 2.2 A Semiparametric N L M E Model for the Response Process 2.2.1 A Semiparmetric NLME Model with Mis-measured Covari- ates We describe a semiparametric N L M E model in general form. Let be the response value for individual i at time Uj, i = l , . . . , n , j = l , . . . , n j . Let ZM be the observed value and let z*kl be the unobservable "true" value of covariate k for individual i at time uu, i = 1, . . . , n, k = 1, . . . , v, 1 = 1,..., m-i. For simplicity, we focus on the case where z*kl is the current true covariate value, but our method can be extended to the case where z*kl is a summary of the true covariate values up to time uu. Note that for each individual, we allow the covariate measurement times uu to differ from the response measurement times Uj. In other words, we allow missing data in the covariates. Let — (yu,... ,yini)T and z* = (zfi> • • • >zLjT» where zu = (ziU, • . . ,ziui)T, l = l,...,mi. For the response process, we consider a general semiparametric N L M E model similar to Wu and Zhang (2002), but incorporate possibly mis-measured time-varying covariates Vij = 9(Uj, rifcjfi + eij, (2.1) ft, = d*(^, [3*, b*), (2.2) rt(t) = v(w(t), hi(t)), i = 1,... ,n, j = 1,... ,nu (2.3) where g(-), d*(-), and v(-) are known (possible nonlinear) functions, w(t) and hi(t) are unknown nonparametric smooth fixed-effects and random-effects functions respectively, /3*. are individual-specific parameters, (3* are population parameters, is the within-individual random error, and b* are random effects. Let e; = (en,... ,eini)T. We assume that ~ 17 N(0, 521), where 52 is the unknown within-individual variance and I is the identity matrix, b* 7Y(0, £ * ) , /ij(i)'s are identical and independent realizations of a zero-mean stochastic process h(t), and b* and hi(t) are independent of e i . We can rewrite the semiparametric N L M E models (2.1) - (2.3) in a compact way Vij = g(Uj, d*(z*., [3*, b*), v(w(tij), htiUj))) + eij. (2.4) Note that in (2.1) or (2.4), we assume that the individual-specific parameters /3*- depend on the true but unobservable covariates z*,- rather than the observed covariates zy, which are measured with error. Because of the nonparametric parts (i.e., w(t) and hi(t)) in the model, the semipara- metric N L M E model (2.4) is more flexible than parametric N L M E models for modelling longitudinal data, and it reduces to a parametric N L M E model when the nonparametric parts w(t) and hi(t) are constants. Following Wu and Zhang (2002), model (2.4) is also more flexible than other semiparametric N L M E models that have appeared in the literature, such as Ke and Wang (2001). The semiparametric N L M E models in Ke and Wang (2001) can be considered as a special case of model (2.4). In particular, their model only put the random effects in [3*^ as in (2.2) and considered (£*,•) = W(UJ; /?*•) in (2.3). Therefore, model (2.4) is a very general and flexible semiparametric N L M E model. 2.2.2 A B a s i s - b a s e d A p p r o a c h t o N o n p a r a m e t r i c F u n c t i o n s To do statistical inference for the semiparametric N L M E model (2.4), a main difficulty is how to fit the nonparametric smooth fixed-effects function w(t) and random-effects function hi(t). Following Rice and Wu (2001) and Wu and Zhang (2002), we use a basis-based approach which transforms a general semiparametric N L M E model into a set of parametric N L M E models indexed by a smoothing parameter (the number of basis functions). We use the 18 fixed-effects function w(t) to illustrate the basis-based approach. Let x be the support of t and L2(x) be the inner product space of all square integrable functions with norm || • || and inner product < •, • >, where for any ipi, ip2 £ L2{x)> w e define Jx Jx Assume that w(t) is an element of a smooth function space Sw(x), a subspace of L2(x)- A n example of Sw(x) is the Sobolev space W^(x) = {ir>\il>,ip',...,i){m~1] absolutely continuous, ^ ( m ) e L2(X)}. Denote a complete orthonormal basis of Sw(x) by \I>(£) — [ipo(t), tpi(t), tp2(t), • • • ]T where ipo(t) = 1. Then w(t) can be expanded as oo fc=0 where the coefficients Hk= w(t)ipk(t)dt =< w,ipk> . J X Let * p(t) = [ipo(t), ipi(t), • • • ,ipP-i(t)]T and fj,p — (//o, A*i, • • • ,^P)T- Since w(t) are square integrable, the truncations of w(t) at term p p - i fc=0 -2 will converge to w(t) in L -norm as p tends to infinity. It follows that when p is large enough, wp(t) can approximate w(t) very well, i.e., w(t) ~ wp(t). Similarly, if we assume that hi(i) is an element of a smooth function space 5/l(x)(c L2(x)) with a complete orthonormal basis = [(f>o{t), <Pi(t), 02 (0> • • - ]T where (j>o(t) = 1, the truncations of hi(t) at term q <?-i k=0 19 will converge to hi(t) in L 2 -norm as q tends to infinity, where <&g(£) = [<f>o(t), <f>i(t), • • •, (f)q-i(t)]T and £, i q = ( £ i 0 > . . . , £ ; g ) T . It follows that when q is large enough, hiq(t) can approx- imate hi(t) very well, i.e., hi(i) ~ hiq(t). The function wp(t) can> be considered as the projection of w(t) on the linear space S(x, *p) = {i> | i> = typ(t)T fip, fip G i?p} C Sw(x), spanned by basis functions $?p(t), and the function hiq(t) can be considered as the projection of hi(t) on the linear space S{x, = {0 I 0 = ^P^T iiq^ iiq € C SUx), spanned by basis functions With p and g increasing, io p(i) and /iig(t) approach to w(t) and respectively. Parameters /x p and are unknown vectors of fixed- and random-effects coefficients, respectively. Since /ii(i)'s are assumed to be identical and independent realizations of a zero-mean stochastic process, we can regard £ i q as identical and independent realizations of a zero-mean random vector with unknown covariance matrix K. There are many bases available in the literature for curve fitting. Among global bases are Legendre polynomials and Fourier series, and among local bases are regression splines (Eubank, 1988), B-splines (de Boor, 1978) and natural splines (Green and Silverman, 1994). A B-spline of degree d on x with knots to < ti < ' ' ' < < ^M+I is a piecewise polynomial with polynomial pieces of degree d joining together smoothly at the interior knots * i < • • • < * M while satisfying some boundary conditions. In other words, a B-spline is a polynomial of degree d within each of the intervals [tk, tfc+i), 0 < k < M — 1, and [XM, I M + I ] , which globally has (d — l)-continuous derivatives. Al l such B-splines form a linear space with M + d + 1 basis functions which are mainly determined by three factors: the degree d; the location of the knots, and the number of interior knots M. When the degree d = 3, the corresponding B-splines are called cubic splines. When the cubic splines have zero second and third derivatives at the two extreme knots XQ and XM+I, they are called natural cubic splines. Without loss of generality, throughout this thesis, we assume that the nonparametric 20 fixed- and random-effects functions w(t) and hi(t) are elements of the Sobolev space VK22(x) and we use natural cubic spline bases (Green and Silverman, 1994) due to their many good properties, for example, easy construction, good smoothness, and flexibility to model the underlying curves of various shapes (de Boor, 1978). 2 .2 .3 P e r c e n t i l e - b a s e d K n o t P l a c i n g f o r S p l i n e s The placing of knots is an important issue for splines in which we attempt to use a few knots to represent a sample of design time points. We use sample percentiles of the design time points as knots so that there are more (fewer) knots in the area where more (fewer) design time points are available, as suggested by Wu and Zhang (2002). They indicated that the percentile-based knot placing rule should work better for longitudinal data than the equally-spaced knot placing rule used by Rice and Wu (2001), since the design time points of longitudinal data are usually sparse and often not uniformly spaced. Moreover, the percentile-based knot placing rule guarantees that the locations of the knots (and also the resulting basis functions) are sample-dependent and design-adaptive. These properties are not shared by the equally-spaced knot placing rule. After the degree d and the knot placing rule are determined, we need to choose the numbers of the interior knots, or equivalently to choose the numbers p and q of the basis functions, which are called smoothing parameters. 2 .2 .4 S e l e c t i o n o f S m o o t h i n g P a r a m e t e r s Using natural cubic spline bases with percentile-based knots to fit the nonparametric fixed- and random-effects functions w(t) and hi(t), we can transform the semiparametric N L M E model into a parametric N L M E model. To assess how well the resulting parametric N L M E model approximates the original semiparametric N L M E model, we need to consider two factors: the goodness-of-fit and the model complexity. Goodness-of-fit usually indicates how 21 well the model fits the data (or how small the biases of the associated estimators are). It can be improved by increasing p and q or equivalently, enlarging the linear spaces Sw(x,^p{t)) and Sh(x,®q{t))- However, the model complexity represents how complex the model is (or how large the variances of the associated estimators are). The model usually becomes more complicated with increasing p and q. Thus, there is a trade-off between the goodness-of- fit and the model complexity. To balance the two components, it is natural to employ some model selection rules such as the Akaike Information Criterion (AIC) or the Schwarz's Bayesian Information Criterion (BIC) (Davidian and Giltinan, 1995). This is because the transformed parametric N L M E models are indexed by p and q and choosing different p and q is equivalent to choosing different parametric N L M E models. Let <p be the number of independent parameters in a parametric N L M E model, say, model (1.1) and (1.2). Then the A I C and the BIC are defined as A I C = -2Loglik + 2tp, BIC = -2Loglik + [log ( £ m)] cp, where Loglik is the log-likelihood of the fitted the parametric N L M E model (see Davidian and Giltinan, 1995, pl56). Since a parametric N L M E model with a larger number of parameters will always produce a larger value for the log-likelihood (a smaller value for -2 Loglik), the n penalty terms 2<p in A I C and log(]Tnj) ip in BIC are needed to offset this advantage. Since L i = l J the penalty term in BIC is usually much larger than that in A I C , BIC is a conservative rule and generally favors a parsimonious model. Since both the A I C and the BIC of a parametric N L M E model are defined as twice the negative log-likelihood of the model (representing the goodness-of-fit) plus a. penalty term related to the number of parameters used in the model (representing the model complexity), we will choose ^p(t) and $>q(t) so that the AIC or the BIC are minimized over a series of ^p(t) and $g(t), which leads to the best approximate parametric N L M E model to the original semiparametric N L M E model in terms of the AIC 22 and BIC criteria. Liang et al. (2003) noted that the model obtained this way often provides good approximation in practice. We will evaluate the performance of the A I C and the BIC in the current setting (see Section 4.6). 2 .2 .5 T r a n s f o r m a t i o n o f t h e S e m i p a r a m e t r i c N L M E M o d e l After determining the smoothing parameters p and q via the A I C and the BIC criteria, we replace w(t) and hi(t) in the nonparametric function r;(£) in (2.3) by their approximations wp(t) and hiq{t). Thus, we obtain an approximation to the nonparametric function r^t), and approximate the semiparametric N L M E model (2.4) as follows V i j « gfc, d*(zT., ft, b*), v^!p{t)T ^ $q(t)T£iq)) + ^ = giUj, d(zy, ft, bi)) + ey (2.5) where (3 = (ft , /j, ) are fixed effects, b; = (b*, £ i ? ) are random effects, and d(-) is a known but possible nonlinear function. Then, we can approximate the semiparametric N L M E model (2.1) - (2.3) by the following parametric N L M E model Va = </(/,;• ft.-Hey, e i | f t ^ / V ( 0 , 52I), (2.6) ft,. = d ( z y , ft bi), b > A f JV(0,/3), (2.7) where B is an unstructured covariance matrix. Note that e; and bj are independent of each other. Approximate statistical inference can then be based on the approximate model (2.6) and (2.7), as shown in Chapters 3 - 5 . 2 .2 .6 C o n s i s t e n c y o f t h e E s t i m a t e o f w{t) After we obtain estimates jip and £ i q based on the parametric N L M E model (2.6) and (2.7), we can then estimate the nonparametric functions w(t) and hi(t) in the semiparametric 23 N L M E model (2.1) - (2.3) as follows w(t) =• wp(t) = *p(i)TAP, k(t) = kg(t) = $g(t)Tiiq, Therefore, the consistency of the estimates in the semiparametric N L M E model (2.1) - (2.3) is strongly related to the consistency of estimates in the parametric N L M E model (2.6) and (2.7). Under some mild conditions, the following Theorem 2.1 guarantees that we can obtain a consistent estimate w(t) of the nonparametric fixed-effects function w(t) in the semiparametric N L M E model (2.1) - (2.3) if we can find i/n-consistent estimates -ft of the fixed-effects coefficients pip. Following Wu and Zhang (2002), we prove the consistency of the estimate w{t) of the nonparametric fixed-effects function w(t) in the semiparametric N L M E model (2.1) - (2.3) based on the following conditions: (a) . $(t) is a complete orthonormal basis of S(x), a subspace of L2(x)- oo (b) . The nonparametric fixed-effects function w(t) E S(x) s o that w(t) = £ l^ki>k{t)- k=Q (c) . The design time points {i^, i = l,...,n, j = l , . . . , n j } are identically and independently distributed such that when the number n of individuals tends to infinity, the number of distinct time points will tend to infinity. In this case, we can truncate w(t) in the semiparametric N L M E model (2.1) - (2.3) in such a way that wp(t) = Y^k^o^k^kif) = typ{t)T /xp so that p —> oo, p/n —> 0 as n —> oo. (d) . For any fixed p, we assume that we can obtain A/n-consistent estimates fxp of the fixed-effects coefficients pLp so that as n —• oo, E(p\p) — > fip and Cov(\/nfip) —• E p for some semidefinite positive matrix S p with p _ 1 t r ( E p ) bounded. T h e o r e m 2.1. Under Conditions (a) - (d), as n —* oo, we have \\w — w\\ —> 0 in probability. 24 "p f"pl - p - 1 J E\\p,p-»p\\2 = E p-i P-I fc=i Proof. First we consider E\\£ip — [ip _fc=0 = ^ E [ / x f c - E ( / i f c ) + ^ ( A f c ) - ^ ] 2 fc=i p-i - £ - W * ) ] 2 + - A**] 2 + 2[A/t - E(p:k)][E(fik) - vk]} fc=i P-I = £ { E [ £ f e - £(£fc)] 2 + [£(£*) - ^]2} *;=i p—i p—i f = ^ V a r ( / i f e ) + I ] [ ^ ( A f c ) - ^ ] 2 fc=i fc=i - tr [Cov(Ap)] + | |£; (Ap ) -M P ] ) 2 - Under Conditions (a) and (b), and C o v ( v / n / i p ) —> S p in Condition (d), we have E\\w - w\\2 = E\\wp - w\\2 < 2{E\\wp - wp\\2 + \\wp - w\\2} = 2{E\\p,p-vP\\2 + \\wp-w\\2} = 2{tr[Cov(jip)] + \\E(j±p)-tip\\2 + \\wp-w\\2} = 2{n- 1tr[Cov(v^/i p)] + \\E(p.p) - M p | | 2 + \\wp - w\\2} = 2 J n - 1 t r [ S p + o(l)] + ||£;(Ap)-̂ ||2 + ̂ 2 f c | = 2(n- 1[tr(Sp)+po(l)] + | | / i ; ( A p ) - / X p | | 2 + f;^} fc=p = 2^ k=p ) Under Conditions (b)-(d), it is easy to show that the three terms in parentheses {•} of the right-hand side tend to 0 as n —> oo. Under Condition (d), as n —> oo, E\\w — w\\2 —> 0 implies \\w — w\\ —* 0 in probability. • ' 25 2.3 Measurement Errors and Missing Data in Covari- ates At the presence of measurement errors and missing data in the time-varying covariates zu = (zni,... ,zil/i)T, we need to model the covariate processes. We consider the following multivariate linear mixed-effects (LME) model (Shah et al., 1997) to empirically describe the covariate process where Uu and Vu are design matrices, a and are unknown population (fixed-effects) and individual-specific (random-effects) parameter vectors, and eu are the random measurement errors for individual i at time uu- For example, we may model the covariate processes parametrically based on empirical polynomial models with random coefficients, as in Hig- gins et al. (1997) and Wu (2002). Alternatively, we may model the covariate processes nonparametrically, and approximate the nonparametric fixed- and random-effects functions by linear combination of some basis functions, as in Section 2.2. For either parametric or nonparametric covariate models, we may convert the covariate models to the L M E model (2.8). Note that the covariate model (2.8) incorporates both the correlation of the repeated measurements on each individual and the correlation among different covariates. Note that the parameters in the covariate model (2.8) may be viewed as nuisance pa- rameters because they are often not of main interest. We assume that the true (unobservable) covariate values are zit = Uua + Vu a; + eu (= z*t + eit) i = 1, . . . , n, I = 1, . . . , m-i (2.8) z*u = Uua + Vu sit. We also assume that a; i.i.d. N{0, A), ea i.i.d. 7V(0, R), and â and et = (e^,... independent, where A is an unrestricted covariance matrix and R is an unknown within- 26 individual covariance matrix. We further assume that and a, are independent of ej and b j . Models (2.8) may be interpreted as a covariate measurement error model (Carroll et al., 1995; Higgins et al., 1997). To allow for missing data in the time-varying covariates (or different measurement schedules for the time-varying covariates), we recast model (2.8) in continuous time: Zi(t) = Ui(i)cx + Vi(t)ai + ei(t), i = l,...,n, where Z j ( £ ) , Ui(t), Vi(t), and e»(t) are the covariate values, design matrices, and measurement errors at time t respectively. At the response measurement time Uj, which may be different from the covariate measurement times uu, the possibly unobserved "true" covariate values can be viewed as z*j = [7^ cx + a^, where Uij = Ui(Uj) and = Vi(Uj). In other words, missing covariates at time Uj may be imputed by their estimated true values z*-. 27 Chapter 3 A Joint Model for Semiparametric N L M E Models with Covariate Measurement Errors and Missing Data 3.1 Introduction In this chapter, we address measurement errors and missing data in time-varying covariates for semiparametric N L M E models. In Section 3.2, we review the two-step method proposed by Higgins et al. (1997). We derive some analytic and asymptotic results for the two- step estimates for parametric N L M E models with mis-measured covariates, and analytically show that the variances of the main parameter estimates based on the two-step method are underestimated. To address measurement errors and missing data in time-varying covariates in semi- 28 parametric N L M E models based on models (2.6) - (2.8), in Sections 3.3 and 3.4 we propose two joint model methods, implemented by a Monte Carlo E M algorithm and by a first-order Taylor approximation to the log-likelihood function respectively, to find approximate M L E s of model parameters. We also discuss asymptotic properties of these approximate M L E s . In Section 3.5, we apply the two joint model methods to a real dataset. We evaluate the proposed methods and compare them with the two-step method via simulation studies. We conclude this chapter with some discussion in Section 3.6. Proofs of the asymptotic properties of approximate M L E s are presented in Section 3.7. 3.2 A Two-step Method For covariate measurement error problems, a commonly used method is the so-called two- step method (Higgins, et al., 1997; Liang, et al., 2003): in the first step the "true" covariate values are estimated based on an assumed covariate model, and then in the second step, the possibly mis-measured covariates in the response model are simply replaced by the esti- mated covariates from the first step. The estimation of the main parameters in the response model proceeds as if the estimated covariate values are the true covariate values without measurement error. Intuitively, the resulting estimates of the main parameters may be ap- proximately unbiased if the covariate estimates from the first step are unbiased, but the variances of the main parameter estimates may be underestimated because the variability of the covariate estimation in the first step is ignored in the estimation of the main parameters in the second step. Higgins et al. (1997) and Ogden and Tarpey (2005) realized this problem and proposed bootstrap methods which incorporate the variability from estimating the co- variates in the first step. Wu (2002) considered an approximate joint model approach which also incorporates the variability in the covariate estimation. In this section, we derive some 29 analytic results for the two-step estimates for parametric N L M E models with mis-measured covariates such as the models (2.6) - (2.8), analytically show that the variances of the main parameter estimates based on the two-step method are underestimated, and derive some asymptotic results for the two-step estimates. Let y = (yf, • • • , y ^ ) T , and define z, z*, and e similarly. If z* is known, an estimate (3 of (3 can be expressed as (3 = s(y, z*), where s is a vector function. Since z is recorded with errors and z* is unobservable, we assume z = z* + e, where E(e) = 0. Let z be an unbiased estimate of the true covariate value z* based on the observed covariate value z (i.e., E(z) = z*). The two-step method estimates (3 by f3 = s(y, z), which depends on realizations of two random variables y and z. If we assume that E[s(y, z*)] « (3, and y and z are roughly independent so that E d s ( y , z ) dz ( z - z * ) } E d s ( y , z ) dz £ ( z - z * ) , and that the function s(y, z) is well approximated by a first-order Taylor series expansion around z*, then we show next that the estimate f3 is also approximately unbiased, following Ogden and Tarpey (2005). Taking a first-order Taylor expansion of s(y, z) around the "true" covariate value z*, we have E0) = £ ( s ( y , z ) ) ^ £ { s ( y , z * ) + ds(y,z) | dz Iz=z* ( z - z * ) } (3 + E ds(y,i) E(z - z*) = f3, provided that the expectations exist (the expectation operator in the above expression is to be taken as the expectation with respect to both y and z). However, the variances of the two-step estimates will be underestimated, as shown below. When z is plugged in the estimation of j3, the resulting variance-covariance matrix is actually an estimate of Cov(/3|z). By the well-known variance decomposition formula Cav0) = Cov[£(/3|z)] + £[Cov( /3|z)] , 30 we know that Cov(P) — E[Cov(/3\z)] is nonnegative definite, since Cov[E(/3|z)] is nonneg- ative definite. Thus, on average, the two-step approach underestimates the true variance- covariance matrix of f3. Following the general approach taken by Amemiya (1983), under the suitable regular- ity conditions on the log-likelihood function of P (e.g., the third order derivatives exist and are continuous in an open neighborhood about (3), we derive the asymptotic distribution for the M L E of f3 when the unobservable "true" covariates z* are imputed by their estimated values ẑ . Suppose that ẑ are ^/m^-consistent estimates of z* and that yfrriiizi - z*) N(0, fiZi), i = l,...,n, where is the number of observations for covariates on individual i. We assume that mi = 0(m) uniformly for i — 1,. . . , n, where m — min^m,), and that m,j and n go to infinity at the same rate with n/rrti —• Cj, where 0 < Cj < oo. Let n i=l be the log-likelihood function of f3 based on the observed data y and z* with the unobservable "true" covariates z* are imputed by their estimated values z. The M L E J3 of /3 satisfies a set of equations where dp f-' dp 1=1 dk{P;yi, Z j ) = dlj(P;yi, z f) dp dp P=P Taking a first-order Taylor expansion of dl((3;y, z)/d/3 around the "true" covariates z*, we 31 have ° - E i = i n = E i= l n = E n • = E i= i n = E i = i n = E 2=1 dli(P;yi, z*) n + E i = i r&li&yi, z* dp L d/33zf dli((3;yi, K) + E i= l r c ^ f t v i , z* 8(3 L <9/3<9zf dli(P;yi, z*) n + E i = i d2h(p-yi, z*) a/3 dpdzf dli(f3;yu z*) n +E 82k(P;yi, z*) dp dpdzf dk(P;yi, z | ) n +E i= l d2k(P;yi, z*) 5/3 dpdzf 8h(P;yi, z*) dp n + E » = i d2li(P;yi, z*) dpdzf •(ii - z') + 0{|| Z , - z-|| 2) * - < > + £ > ( £ ) 1=1 * Z j - z*) + Op I m a x Y — ) L i \ m j / . Z i - z * ) + O p ( m i n m ^ . z» - z*) + Op(m _ 1 ) , since z» are y / 7 7 v c o n s i s t e n t estimates of z*. Next, carrying out a first-order Taylor expansion of 9k(P; y i , z*)/dp in the above expression around /3, we can obtain i= l i = i d/3d/3J d2lj(P;yj, z*) . _x ( z i - Z i ) + Op(m ) = 0 i= l E t=i dh(P;yu zj) a/3 i = i dpdzf d2h(p;yi, z*) dpdpT (P-P) + Op(n-1) , ^ d2li(P;yi, z*) * s , n , -u n + 2^—a/35z r ( z i - Z i ) + QP(^ ) = o, i= l since /3 is the M L E of /3 and thus it is i/^-consistent under the necessary regularity conditions on the log-likelihood function of p. The above expression can be written as n ^ 1 ^ d2h(P;yi, z*) i= i dpdp1 v t= i dkjp-^z*) dp i=l d2k(p;yi,z*) dpdz\ 32 Assume that the following limits exist lim - f > ( / 3 ) = /(/3), n — * o o 7T, * — • • ' =1 where ii(/3) = —E[d2li(f3; yi, z*)/d(3d0r] is the Fisher information matrix for individual i. It follows from Lemma 3.2 in Section 3.7 that 1 A - &kfayu z? ) n d2h(i3;yi, z*) p Note that -\/n( zi — z \ ) ~ \ / C i m i { z i — z*) for large n and m .̂ Using the asymptotic normality of Z j , we know that for large m*, £'[y /rf^'(zi — z*)] « 0 and Cov[y /m^(z i — z*)] « flz.. By Lindeberg's central limit theorem, we have It follows from Slutsky's theorem that 1 ^ d2li0-yi, z*) r l ^ U d2k(J3;yi, z*) d Under the necessary regularity conditions on the log-likelihood function of (3, based on the standard arguments for showing asymptotic normality of M L E s , we can obtain and 1 A d2li(f3;yi, z*) P - ^ d(3df3T I m Since y and z are roughly independent, the two limit random variables with distributions N(0, Q z ) and N(0, I((3)) are roughly independent. Note that n/m = 0(1). Putting these pieces together, we have asymptotic normality of y/n(/3 — (3) as follows: • ^cp-ft^N^rm-'+m-^jipy 1]. (3.1) 33 Note that the variance-covariance matrix of (3 based on the asymptotic distribu- tion (3.1) has two components. The first component I(f3)~l is the "naive" estimate of the variance-covariance matrix of /3 in the two-step method, in which the variability of z» is neglected. The second component /(/3) _ 1 O z /(/3) _ 1 arises from the variability in estimating Z j and summarizes the extra uncertainty of (3 due to estimation of Z j . Since the second component I{[3)~1 flz 7(/3)_1 is nonnegative definite, the variances of the main parameter estimates based on the two-step method are underestimated. 3.3 A Joint Model Method for Likelihood Inference 3.3.1 The Likelihood for the Joint Model We consider likelihood inference for semiparametric N L M E models with measurement er- rors and missing data in time-varying covariates, based on the approximate parametric N L M E models (2.6) - (2.8). The observed data are {(yi, z*), i = l , . . . , n } . Let 6 = (a, (3, 52, R, A, B) be the collection of all unknown parameters in models (2.6) - (2.8). We assume that the parameters a , (3, 52, R, A, and B are distinct. Let /(•) be a generic density function, and let [X\Y] denote a conditional distribution of X given Y. The approx- imate log-likelihood for the observed data {(yi, Z j ) , i = 1,... ,n} can be written as i=i J 34 where /y(yi |z i , a, (3, 52) = TJpli My i j l z y> a*> b*; A <̂ 2) = n;ii ( 2 ^ ) - ^ exp{-[y y - g(tl3, d(v%a + v̂ ai, /3, b,))]2/2<52}, /z(zi|ai; a, i?) = T J ^ i /z(zifcla*; a, i?) = ]ir=i M " 1 / 2 exp{-(zifc - u i f c cx - wik R~l x(zifc - U i f c Q - Vi f cai)/2}, /(a,; A)" = | 2 7 r A | - 1 / 2 e x p { - a l M - 1 a i / 2 } ) /(bi; fl) = |2^fi|-1/2eXp{-bf/3-1bi/2}. This approximate log-likelihood function generally does not have a closed-form expression since the functions in the integral can be nonlinear in the random effects ai and bj. Exact likelihood calculations therefore require numerical evaluation of an integral whose dimension is equal to the dimension of the random effects (ai( b;). This is straightforward to do by direct numerical integration such as Gaussian quadrature when the dimension of (ai, bi) is very small (say, 1 or 2). However, when (â , bi) has a dimension of 3 or more as is often the case in practice, one needs to consider alternative methods such as computationally intensive Monte Carlo methods. Laird and Ware (1982) obtained M L E s in L M E models using the E M algorithm. Here we use a Monte Carlo E M ( M C E M ) algorithm to find the approximate M L E s of all parameters 6. By treating the unobservable random effects ai and bi as additional "missing" data, we have "complete data" { (y , , Z j , ai, bi), i = l , . . . , n } . The complete-data log- likelihood function for all individuals can be expressed as lc{Q) = E lc(0) = £ { l o g /y(yj | z i , ai, bi; a, (3, 52) + log fz{zi\^\ a, R) i=i i=i (3.2) + log / (a i ;y l ) + log/(b i ;B)} ) where is the complete-data log-likelihood for individual i. 35 3 .3 .2 A M C E M M e t h o d The E M algorithm (Dempster, Laird, and Rubin, 1977) is a very useful and powerful algo- rithm to compute M L E s in a wide variety of situations, such as missing data and random- effects models. The E M algorithm iterates between an E-step, which computes the condi- tional expectation of the complete-data log-likelihood given the observed data and previous parameter estimates, and a M-step, which maximizes this conditional expectation to update parameter estimates. The computation iterates between the E-step and M-step until con- vergence leads to the M L E s (or local maximizers). When there are several modes in the conditional expectation, the M L E s can be determined by trying different parameter starting values. For our models, the E-step is quite intractable due to nonlinearity, so we use Monte Carlo methods to approximate the intractable conditional expectations. In the M-step, we use standard complete-data optimization procedures to update parameter estimates. Let 6® be the parameter estimates from the i-th E M iteration. The E-step for individual i at the (t + l)th E M iteration can be written as Qi(9\0W) = E(l®(0)\yi, zf; flW) log /y(y;|zi, ai, bi; a, (3, S2) + log /z(zj|aj; a, R) + log /(ai; A) + log /(bi; B) x /(ai, b;|v;, z{; 0(*>) db{ = J « ( a , (3,52) + / W ( a , R) + I®(A) + I®(B). (3.3) The above integral generally does not have a closed form, and evaluation of'the integral by nu- merical quadrature is usually infeasible, except for simple cases. However, note that expres- sion (3.3) is an expectation with respect to the conditional distribution /(ai, bi|yi, z*; 0®), and it may be evaluated using the M C E M algorithm of Wei and Tanner (1990), as in Ibrahim et al. (1999, 2001). Specifically, for individual i, let {(af \ bf >),..., (af(), bft})} denote a random sample of size kt generated from [a;, bj|yj, z*; 0^]. Note that each (afbf )̂ de- 36 pends on the E M iteration number t, which is suppressed throughout. Then we approximate the conditional expectation Qi(0\0®) in the E-step by its empirical mean, with missing data replaced by simulated values, as follows = IT E log fviyiK a ? \ b f > ; « , (5, 6>) + i £ log / z ( * | a ? > ; a , J2) k=l fc=l + i E l o g /(af U ) + £ E log / ( b i F C ) ; B ) . fc=i fc=i We may choose ko as a large number and kt = kt-\ + fct_i/c, t — 1, 2 ,3 , . . . , for some positive constant c, in the t-th iteration. Increasing kt with each E M iteration may speed up the E M convergence (Booth and Hobert, 1999). The E-step at the (t + l)th E M iteration can then be expressed as Q(o\o®) = ± QMeW) » E \rt E ^ ( 0 ; vu ~^t\ b ? } ) } i=i i=i i fe=i j = E E £ l o g M v t K a f , b?\a, (3, <52) + ± E £ log / ^ a f ; a , i?) i=lfc=l i=lfc=l + E E i log /(af5; A) + E E i log / ( b ! f c ) ; B ) 1=1 fc=l 2=1 fc=l = Q ^ a , /3, r52|6>W) + Q( 2)(a, i?|6»W) + < 2 ( 3 ) ( A | 0 « ) + Q ^ O ^ M ) . To generate independent samples from [a i ( b j | y i , z{; 0^], we use the Gibbs sampler (Gelfand and Smith, 1990) by sampling from the two full conditionals [aj | y i , Zj, b ^ 0^] and [ b i | y j , Zi, f?(t)] as follows. / (ai lyi , z i ; b i ; 0<*)) oc / (a i , y ^ Z j , b i ; 0(*>) = / ( a ^ , b i ; 0<*>) • fY(yi\zi, a i ; b ^ 0(*>) = / ( a i | z i ; 6>W)- / y (y i | z i , a i , b i ;6»W) oc /(a,; 0 « ) • / ^ ( Z i | a i ; 0W) • M y ^ , ai, b , ; 0 « ) , (3.4) / ( b i | y i , Z i , a i ; 0 ( t ) ) oc / ( b i , y i | z i , a,; 0 W ) = / ( b i | z i , a^ 0 W ) • / y ( v i | z i , a*, fy; 0W) = / ( b i ; 0 W ) - / y ( y i | z i i a i , b i ; 0 ( * ) ) , where a i and b i are independent each other. Monte Carlo samples from the above full condi- tionals can be generated using rejection sampling methods, as in Wu (2004). Alternatively, 37 integral (3.3) may be evaluated using the importance sampling method. Details of these sampling methods and convergence issues will be investigated in the next section. The M-step then maximizes Q(0\0^) to produce an updated estimate #( t + 1' at the (t + l)-th iteration. Note that the parameters cx, (3, 62, R, A, and B are all different, so we can update the parameters (a, (3, 52, R), A, and B by maximizing + Q^2\ and separately in the M-step. The maximizer (a<-t+1\ (3{t+l\ 52^t+l\ R ^ ) for + may be computed via iteratively re-weighted least squares where the random effects are replaced by their simulated values {(af \ b f >)}: ( a(*+D ) / 3(t+D j < J 2 ( t + l ) j = a r g m a x {Q(D( Q ) / 3 ) ( 52| 0 ( t ) ) + g ( 2 ) ( a ) j R | 0 ( t ) ) } CX,j3,S2,R { n kt j 5I5Irlo6 M Y i N , a f \ b f ^ a , /3, <52) + E E r l o g ^ ( ^ l ^ f e ) ; « ^ ) j - ( 3- 5) i=i fc=i K t J In general, the function in (3.5) is nonlinear in parameters and thus, the maximizers have no closed-form expressions. The maximizers could be obtained via standard optimization procedures for complete-data nonlinear models, such as the Newton-Raphson method. Note that optimization procedures for nonlinear models may be iterative as well. We can use the following Lemma to obtain analytic expressions of the maximizer A<t+1> for and the maximizer B ^ for L e m m a 3.1. (Seber, 1984). Consider the matrix function h(E) = log |E | + tr[E _ 1 n] . If Q is positive definite, then, subject to positive definite E , /i(E) is minimized uniquely at E = a 38 By Lemma 3.1, the maximizer A^t+1^ for can be written as = argmax{Q^(A|6>W)} • A y n kt ^ = a r g m a x ^ ^ - l o g / C a f M ) i=i k=i * " fct i ^ 1 =' arg max £ £ - { - ± log | 2 T T A | - | [gWp A-i [fi(*)]} i = l fe=l * = arg mm £ £ I { log \A\ + [af >f ̂ "̂ [af >]} i = l fc=l 4 = a r g n u n l n l o g l A I + i^^alY^- 1!^]} 4 i = l fe=l = argmmjnlogMI + i ^ ^ t r l t a f ) ] ^ - 1 ^ ] ) } 4 i = l fc=l = argminjn log |A | + ± ^ j S r ^ T 1 [af] [aj f c )] r)} 4 2=1 fc=l = argminjn log | A | + ^- t rL" 1 £ | > f ] [af'f }} 4 I i=l k=l J • = argmm{log|4+tr{/l-';i-^E^*)]N*)]7'}} - C i = l k = l Similarly, the maximizer 73 ( 4 + 1) for can be obtained by fl(i+1) = argmax{Q(4>(#|0(4))} B n kt -. = a r g n w x £ £ - l o g / ( b i f c ) ; i ? ) i = i fc=i 4 = ^ E E ^ ' i i b f r . 2=1 K = l 4 1=1 fc = l To obtain the asymptotic variance-covariance matrix of the M L E 9, we can use the for- mula of Louis (1982), which involves evaluating the second-order derivative of the complete- 39 data log-likelihood function. Alternatively, we may consider the following approximate for- mula (McLachlan and Krishnan, 1997). Let Sc \ = dlc^/dO, where li^ is the complete-data log-likelihood for individual i. Then an approximate formula for the variance-covariance matrix of 0 is - l Cov(0) = $ > ( s « | V l , z l 5 0) E{sf\Vi, z i ; 9f where the expectations can be approximated by Monte-Carlo empirical means, as above. In summary, the foregoing M C E M algorithm proceeds as follows. Step 1. Obtain an initial estimate of 9 = 0 ^ based on a naive method such as the two-step method, and set a f = 0 and bj0^ = 0. Step 2. At the (t+ l)th (t > 0) iteration, obtain Monte Carlo samples of the "missing data" (a,, bj) using the Gibbs sampler along with rejection sampling methods by sampling from the full conditionals [a^y;, z;, b,; 0^] and [b^y^, Z j , a^; 0^ ] , or using importance sam- pling methods, to approximate the conditional expectation in the E-step. Step 3. Obtain updated estimates 9^+1\ using standard complete-data optimization procedures. Step 4. Iterate between Step 2 and Step 3 until convergence. 3 . 3 . 3 S a m p l i n g M e t h o d s G i b b s Sample r For the proposed Monte Carlo E M algorithm, we can see that generating samples from the conditional distribution [a*, bj|yj, Zj'; 0^] is an important step for implementing the E-step of the Monte Carlo E M algorithm. The Gibbs sampler (Gelfand and Smith, 1990) is a popular method to generate samples from a complicated multi-dimensional distribution by sampling from full conditionals in turn,.until convergence after a burn-in period. Here, we 40 use the Gibbs sampler to simulate the "missing" random effects â and bv Set initial values ( a | ° \ b-0^). If the current generated values are (a\k\ b f ) , we can obtain ( a f + 1 \ h \ k a s follows: Step 1. Draw a sample for the "missing" random effects a\hfrom the full conditional z t, b| f c ); 0 W ) . ' Step 2. Draw a sample for the "missing" random effects h\k+1^ from the full conditional /(tx|yi> af+1); 0<*>). We assess the convergence of the Gibbs sampler by examining time series plots and sample autocorrelation function plots. After a sufficiently large burn-in of r iterations, the sampled values will achieve a steady state as reflected by the time series plots. Then, { (a f , b|fc')} can be treated as a sample from the multidimensional density function / (a i , bi |yi , 0(*>). If we choose a reasonably large gap r' (say r' = 10), we can treat the sample series { ( a f ° , b\k)),k = r + r',r + 2r',...} as an independent sample from the multidimensional density function. The simplest choice for initial values (a\°\ bf^) is (0, 0). Mult ivar ia te Reject ion A l g o r i t h m Sampling from the two full conditionals can be accomplished by rejection sampling methods as follows. If the density functions are log-concave in the appropriate parameters, the adaptive rejection algorithm of Gilks and Wild (1992) may be used, as in Ibrahim et al. (1999). However, for arbitrary N L M E models, some densities may not be log-concave. In such cases, the multivariate rejection sampling method (see Section 3.2 in Geweke, 1996) may be used to obtain the desirable samples. Booth and Hobert (1999) discussed such a 41 method in the context of complete-data generalized linear models, which can be extended to our models. For example, consider sampling from /(aj |yj, z i ; bjj 0®) in (3.4). Let /*(a,) = / ( z j | a j ; 0^)- /(yj |zj , a ,̂ b i ; 0^) and c = sup{/*(u)}. We assume c < oo. A random sample u from / ( a j | y i , Z i , b^ 0^) can then be obtained as follows by multivariate rejection sampling: Step 1. Sample a* from / (a^ 0^), and independently, sample w from the uniform (0, 1) distribution. Step 2. If w < /*(a*)/<r, then accept a*, otherwise, go back to step 1. Samples from / ( b j | y j , Z j , a*; 0^) can be obtained in a similar way. Therefore, the Gibbs sampler in conjunction with the multivariate rejection sampling can be used to obtain samples from [ai , bj|yj, z*; 0^]. Booth and Hobert (1999) noted that, when it is easy to simulate from the assumed densities, the multivariate rejection sampling method can be very fast even if the acceptance rate is quite low. Importance Sampl ing When the dimensions of â or bj are not small, however, the foregoing rejection sam- pling methods may be slow. In this case, we may consider importance sampling methods where the importance function can be chosen to be a multivariate Student t density whose mean and variance match the mode and curvature of /(a*, b i | y i , Z i ; 0^). Note that a mul- tivariate t distribution, which has heavier tails than a multivariate normal distribution, will produce a more robust approximation since underestimating the tails can have serious con- sequences such as unstable behavior that may be difficult to diagnose. Booth and Hobert (1999) discussed an importance sampling method for complete-data generalized linear mod- els. Here, we may extend their method to our models and use importance sampling methods to approximate the integral in the E-step. Specifically, we write / (a , , b i | y i ; Z i ; 0^) = 5exp[/i(ai, b i ) ] , 42 where s is an unknown normalizing constant. Let h(a^, bj) and h(a^, bj) be the first and second derivatives of h(a^, bj) respectively, and let (a*, b*) be the solution of /i(aj, bj) = 0, which is the maximizer of h(ai, bj). Then, the Laplace approximations of the mean and variance of /(aj,bj|yj, z$; 0®) are (a*, b*) and — (h(a*, b*))_1 respectively. Suppose that {(a*'1', b*'1^),..., (a* '̂', b*^)} is a random sample of size kt generated from an importance function /i*(aj, bj), which is assumed to have the same support as /(aj, bj|vj, Z J ; 0^). Then we have where it) / (a; W ,b; ( f c ) |y t ,z i ; Q W ) - h*(af\b*M) are importance weights. Other sampling methods have also been proposed (e.g. McCulloch, 1997). For the above sampling methods, the adaptive rejection method is applicable only when the appropriate densities are log-concave, while the multivariate rejection sampling method and the importance sampling method are applicable in general. Adaptive and mul- tivariate rejection sampling methods may be efficient when the dimensions of the random effects and the sample sizes are small. When the dimension of the integral in the E-step is high, however, rejection sampling methods can be inefficient due to low acceptance rate. If the sample size is not small, importance sampling methods may be more efficient than rejection sampling methods since in this case the importance function may closely resemble the true conditional distribution. 43 3.3.4 C o n v e r g e n c e For Monte Carlo E M algorithms, the incomplete-data log-likelihood is not guaranteed to increase at each iteration due to the Monte Carlo error at the E-step. However, under suitable regularity conditions, Monte Carlo E M algorithms still converge to the M L E s (Chan and Ledolter, 1995). When applying the Monte Carlo E M algorithm, Monte Carlo samples for the "missing" random effects are drawn at each E M iteration. Consequently, Monte Carlo errors are introduced. The Monte Carlo errors are affected by the Monte Carlo sample size. It is obvious that larger values of the Monte Carlo sample size fct will result in more precise but slower computation. A common strategy is to increase kt as the number t of E M iterations increases (Booth and Hobert, 1999). For sufficiently large values of kt, the Monte Carlo E M algorithm would inherit the properties of the exact versions, such as the likelihood increasing properties of E M , but this would substantially increase the computational work load. Thus, we usually use a relatively small kt at initial iterations, and then increase kt with the iteration number t. If the Monte Carlo error associated with 0( t + 1) is large, the (t + l)th iteration of the Monte Carlo E M algorithm is wasted because the E M step is swamped by the Monte Carlo error. Booth and Hobert (1999) proposed an automated method for choosing kt in the context of complete-data generalized linear models. Their method can be extended to our case in a straightforward way as follows. and let 0*(<+1) be the solution to Q^(G\6^) = 0. When the simulated samples are indepen- Let Q ( 2>(0|0 ( t )) Q ( 1 ) ( 0 | 0 « ) 0Q(fl|flC>) dd d 2 Q(0|0W) dddOT dent, it can be seen that the conditional distribution of [0^+1^|0W] is approximately normal 44 with mean 0*(t+1) and a covariance matrix that can be estimated by C o v ( 0 ( i + 1 ) | 0 « ) = Q(2)(0*(*+D|0(i))-i Cov(Q ( 1 )(0* ( t + 1 ) |0W)) Q(2)(6>*(t+1)|0(*))-1, where C o v ^ V ^ I ^ ) ) = £ E | ^ l o g / ( y i , z i > a i f c \ b ^ ^ * + 1 ) ) | }, (a[k\ b f ) are simulated samples, and Wik are the importance weights when the importance sampling is used and are all set to 1 when rejection sampling methods are used. After the (t + l)th iteration, we may construct an approximate 100(1 — a)% confidence ellipsoid for 0*(*+1) D a s e c} o n the above normal approximation. The E M step is swamped by the Monte Carlo error if the previous value 0^ lies in the confidence ellipsoid, and in that case we need to increase kt. For example, we may set kt to be & t _ i + h-i/c for some positive constant c and appropriate k0. Increasing kt with each iteration may speed up the E M convergence (Booth and Hobert, 1999). Note that this method of choosing kt is completely automated. The proposed Monte Carlo E M algorithm often works well for models with a small dimension of random effects. When the dimension of random effects is not small, however, the proposed M C E M algorithm and Gibbs sampler may converge very slowly or even may not converge. Therefore, in the next section, we propose an alternative approximate infer- ence method which may avoid these convergence difficulties and may be more efficient in computation. 45 3.4 A Computationally More Efficient Approximate Method 3 .4 .1 T h e N e e d f o r a n A l t e r n a t i v e M e t h o d The Monte Carlo E M method in the previous section may be computationally very intensive and may offer potential computational problems such as slow or non-convergence, especially when the dimensions of the random effects a; and b; are not small. When the dimensions of the random effects are not small, sampling the random effects in the E-step may lead to inefficient and computationally unstable Gibbs sampler, and may lead to a high degree of auto-correlation and lack of convergence. To overcome these difficulties, in this section we propose an alternative approximate method by iteratively using a first-order Taylor ap- proximation to the nonlinear models. The proposed method avoids sampling the random effects and provides analytic expressions for parameter estimates at each iteration, So it may be preferable when the Monte Carlo E M method exhibits computational difficulties. Alternatively, the proposed method in this section can be used to obtain excellent parameter starting values for the Monte Carlo E M method. For complete-data N L M E models, approximate methods have been widely used, and these approximate methods perform reasonably well in most cases (Lindstrom and Bates, 1990; Pinheiro and Bates, 1995; Vonesh et al., 2002). These approximate methods are typ- ically obtained via Taylor expansions or Laplace approximations to the nonlinear models. One particularly popular approximate method for complete-data N L M E models is that of Lindstrom and Bates (1990), which is equivalent to iteratively carrying out maximum like- lihood based on certain L M E models (Wolfinger, 1993). Following Lindstrom and Bates (1990), we propose to further approximate model (2.5) by taking a first-order Taylor ex- pansion around the current parameter and random effects estimates, which leads to a L M E response model. For the resulting L M E response model, with the covariate model (2.8), we 46 update parameter estimates based on the distribution of observations and an E M algorithm. In each iteration, analytic expressions for parameter estimates are available. Therefore, the proposed approximate method may provide substantial computational advantages over the Monte Carlo E M method. We rewrite the N L M E model (2.6) and (2.7) as a single equation Vij = 9ij(a, P, ai, b i ) + e^; i = 1, . . . , n, j = 1, . . . , ni} (3.6) where gij(-) is a nonlinear function. Let gi = (gn,... ,giTH)T. Denote the current estimates of (6, ai, bi) by (6, sn, bi). Taking a first-order Taylor expansion of g^ around the current parameter estimates a. and (3 and random effects estimates a* and bj, we obtain the following L M E response model yi = Wioc + Xir3 + Hi&i + Tihi + ei, (3.7) where Wi = (wji,. . . , W i „ i ) i with xvi:j = Xi = ( x i i x i n i )T with x ; j = da dgn dp Hi = ( h j i , . . . , hini )T with htj = iJ - d*i Ti = (tii, • • • , Um)T with tij = -7^- 9i = Yi -g i (at , P, a i , bi) + WiCx + XiJ3 + HiSH + Tibi, with all the partial derivatives being evaluated at (a, (3, a i ; bi). The proposed method in this section is to iteratively carry out maximum likelihood based on the L M E response model (3.7) and the covariate model (2.8). Let 7 = (a, (3) be the mean parameters and A = (52,R,A,B) be the variance-covariance parameters. The algorithm consists of alternately obtaining approximate estimates of 7 given the current 47 estimates of variance-covariance parameters a based on the distribution of fj = (yf, z[)T and then updating the variance-covariance parameter estimates via an E M step using the posterior curvatures of (e;, e ,̂ a;, bi), as in Laird and Ware (1982). 3 .4 .2 A n a l y t i c E x p r e s s i o n s o f E s t i m a t e s We can combine the L M E response model (3.7) with the covariate model (2.8) to form a unified L M E model r% = QiJ + ZitJi + V j , i = l,...,n, where r% = (yf , z f f , = (af, bf)T, v 4 = ( e f , e f ) T , Ut = {1%,. Qi = (3.8) UT V V J Vi 0 Ui 0 with O's being appropriate zero matrices. For the unified L M E model (3.8), by standard arguments, we have [fi\ui; 7, a] ~ N{Qi*f + ZiUii, Ai), [T\; 7, a] ~ iV(Qi7, Si), (3.9) where Ej = Z{DZj + A», A i = / r 2 T r> \ V s2i 0 0 I®R D I A 0 ^ 0 B and the Kronecker product I ® R is a ẑ mj x um-i matrix with the u x v submatrix R on the diagonals and zeros elsewhere. Using known results for L M E models (e.g., Vonesh and Chinchilli, 1997), we can update the estimate of 7 given the current estimates a by 7 = Etfsr 1 ^ Eo^r 1 * (3.10) .1=1 48 Based on the unified L M E model (3.8), we can also obtain the joint distribution of (fj, u>i) at 7 = 7 and A = A 5 7i A N V Qa W") J IV 0 ) from which we obtain the conditional distribution of given the observed data r , at 7 = 7 and A = A: [wi|f i ; 7, A] ~ JV[D Zf S 4 _ 1 (r i — Qi 7), D — D Zf Sr 1 Zi £>]. A n estimate of the random effects u>j is thus given by (3.11) £>i = DZ1i ^(fi-Qn). (3.12) Finally, following Laird and Ware (1982), we can update the variance-covariance pa- rameter estimates as follows. Note that if we were to observe a*, b j , e,, and ei} in addition to y i and Z j , we would have the following estimates * 2 = £ e f e , / £ n t , i-1 i=l n rrii n ^ = E E €ij 4/ £ ™» i=ij=i i=i ^ = Ebibf/n, 1=1 A = £ aj a f /n , i=l where £ e F e i ' S £ €Jjeij> X ^ a j a f , and £ b i b f are the "sufficient" statistics for 62, R, i= l i = l j = l i = l i= l A, and fl, respectively. Since a i , b j , e ,̂ and are unobservable, we can "estimate" them by their conditional expectations given the observed data y i and z , at the current parameter estimates 7 = 7 and A = A. Based on standard results for multivariate normal distributions, we know that 7 = 7, A = A N \ 0 l Z/i LT PI / J 49 and \ €a J ; 7 = 7, A = A N 1 Q i i ^ V 0 / Ej Rij where Lj is a (nj + i/m^) x matrix with the first n; x nj submatrix 52I and zeros elsewhere, is a (rii + i/m*) x u matrix consisting of the first x v submatrix 0 and the remaining mi (u x v) square submatrices with the j th square submatrix R and zeros elsewhere. By the definition of conditional distributions, it can be shown that N f ^ T . A j - i V t L f E r ^ - Q ^ ) , 521 ~ Lj E " 1 L J ] , [e y | f i ; 7, A] ~ N[Rl E r ^ f i - Qa), R - R% E r 1 i ^ ] . Using the expectation and covariance properties for multivariate random variables and some matrix algebra, we update the estimates of the variance-covariance parameters (<52, R, A, fl) as follows: P = E E(ef ei\ril 7, A ) / £ r i i = X X C o v ^ l r , ; 7, A)) + ^ e ^ ; 7, A f l ^ e ^ ; 7, A)]/ £ n,- n m i i=l R = £ £ £ ( e y e £ | f i ; 7, i=i (3.13) i=i j=i n mi _ n ' = E E [Cov(e^ | f i ; 7, A ) + £ ( 6 ^ ; 7, A ) f l ( C y | r i ; 7, A f ] / £ m t , »=i i=i i=i n A = E ^ ( a i a f l ^ ; 7,'A)/n i=i n = £ [ C o v ( a j | f y , 7, A) + E{sLi\fi; 7, A) £ ( a j | f i ; 7, A ) T ] / n , • i=l fl = E £ ? ( b i b f | f i ; 7 , A ) / / i i=l = ElCovCbilfij 7 ) A) + £ ( b j | f y , 7, A) E(b%\ri; 7, A ) T ] / n . i=l The foregoing results show the closed-form expressions of the parameter estimates at each iteration for L M E model (3.8). Iteratively solving L M E model (3.8) until convergence leads to an approximate M L E 0 = (7, A) of 0. 50 3.4.3 Asymptotic Properties Following Vonesh et al. (2002), we show in Section 3.7 that, under fairly mild regularity conditions, 7 satisfies the following properties 7 = 7 M L £ + 0 P { (minJVi ) j = 7 0 + O p / m a x n~1^2, ^min N^j 1/2-1 where Ni = ni + m,i, and 7 0 and ^MLE a r e ^ n e ^ r u e value and exact M L E of 7 , respectively. Thus the approximate M L E 7 is not only consistent but also asymptotically equivalent to ' the exact M L E . The rate of convergence is shown to depend on both the number 71$ of observations per individual and the number n of individuals. Moreover, the estimate 7 asymptotically follows a normal distribution: v ^ ( 7 - 7o) ± N(0, ft(7o))., where the asymptotic variance-covariance matrix, Q,(-y0), can be consistently estimated by 1 j 0=0 i=l The proofs of the above results are given in Section 3.7. 3.5 Example and Simulation 3.5.1 An Application in AIDS Studies We apply the foregoing proposed methods to a HIV dataset for illustration. We also com- pare the proposed methods with the commonly-used naive method which ignores covariate measurement errors and the two-step method, which is described in Section 3.2. 51 D a t a D e s c r i p t i o n The study consists of 46 HIV infected patients who were treated with a potent an- tiretroviral regimen consisting of protease inhibitor and reverse transcriptase inhibitor drugs. Viral loads (Plasma HIV-1 R N A copies) were measured on days 0, 2, 7, 10, 14, 21, 28 and weeks 8, 12, 24, and 48 after initiation of treatments. After the antiretroviral treatment, the patients' viral loads will typically decay, and the decay rates may reflect the efficacy of the treatment. Throughout the time course, the viral load may continue to decay, fluctuate, or even start to rise (rebound). The data at the late stage of study are likely to be contami- nated by long-term clinical factors, which leads to complex longitudinal trajectories. Various covariates such as CD4 count were also recorded throughout the study on similar schedules. The viral load has a detectable limit of 100 R N A copies/mL. For simplicity, we imputed the censored viral loads, which are below the detection limit, by half the detection limit 50, as in Wu and Zhang (2002). The number of measurements for each individual varies from 4 to 10. There were 72 out of 361 CD4 measurements missing at viral load measurement times, due mainly to a somewhat different CD4 measurement schedule. The detailed data description can be found in Wu and Ding (1999) and Wu (2002). T h e Response a n d Cova r i a t e M o d e l s Modelling HIV viral dynamics after anti-HIV treatments has received a great deal of attention in recent years (Ho et al., 1995; Perelson et al., 1996; Wu and Ding, 1999; Wu, 2005). The following HIV viral dynamic model (first stage model) has been widely used (Wu 2002; Wu and Zhang, 2002) V i j = l o g 1 0 ( P l i e - A l ^ + P 2 i e - X ^ ) + ey, (3.14) where yy is the logio-transformation of the viral load measurement for patient i at time tij, Pu and P2i are baseline values, and X^j and A 2 y are the first (initial) and the second phase viral decay rates, respectively, and they may be interpreted as the turnover rates of 52 productively infected cells and long-lived and/or latently infected cells respectively. The logio-transformation of the viral load is used to make the data more normally distributed and to stabilize the variance. Wu (2002) noted that variation in the dynamic parameters such as the first phase decay rate XUJ may be partially associated with variation in CD4 counts. In AIDS studies, it is known that covariates such as CD4 count are often measured with substantial errors. Thus we assume that the dynamic parameters are related to the true covariate values, reflecting the belief that actual, not possibly corrupted, covariate values govern the model parameters, as in Higgins et al. (1997) and Wu (2002). Due to long-term clinical factors, drug resistance, and other complications, the viral load trajectories can be very complex after the initial phase viral decay (see Figure 1.1). Grossman et al. (1999) pointed out that the viral decay rate after the initial period may be complicated and may vary over time since they may depend on some phenomenological parameters which hide considerable microscopic complexity and change over time. Therefore, a nonparametric smooth curve modelling for the second phase viral decay rate may be more appropriate than parametric modelling (Wu and Zhang, 2002). Based on the reasons noted above, we consider the following second stage model, which corresponds to the first stage model (3.14), where z*- is the true (but unobserved) CD4 count, and w(tij) and hi(Uj) are nonparametric smooth fixed- and random-effects functions defined in Section 2.1. To avoid very small (large) estimates, which may be unstable, we standardize the CD4 counts and rescale the original time t (in days) so that the new time scale is between 0 and 1. As discussed in Section 2.1, we employ the linear combinations of natural cubic splines l o g ( P u ) = A + hi XUJ = w(Uj) +hi(tij), (3.15) l o g ( P 2 i ) = PA + hi 53 Table 3.1: A I C and BIC values for the viral load model (3.14) and (3.15), with q < p = 1, 2, 3. Model p=l ,g=l p=2,q=2 p=2,g=l p=3,g=3 p=3,q=2 p=3,g=l A I C 615.96 583.54 585.39 577.37 586.45 576.43 BIC 678.18 669.09 656.43 670.71 665.90 651.50 with the percentile-based knots to approximate the nonparametric smooth functions w(t) and hi(t). Following Wu and Zhang (2002), we take the same natural cubic splines with q < p in order to decrease the dimension of the random effects bj, i.e., more basis functions used to approximate the fixed-effects function than the random-effects functions. A I C and BIC criteria are used to determine the values of p and q. We use the observed CD4 counts for the unobservable true CD4 counts in the response model (3.14) and (3.15), and use SPLUS functions nlmeQ and anovaQ to obtain the values of A I C and BIC. Table 3.1 displays A I C and BIC values for various plausible models. Based on these A I C and BIC values, the model with p = 3 and q = 1, i.e., A 2 i i « A + /?e#i(*tf)+ft V>2(*«)+•&«, (3.16) seems to be the best, and thus it is selected for our analysis. For the CD4 process, in the absence of a theoretical rationale, we consider empirical polynomial L M E models, and choose the best fitted model based again on A I C / B I C values for each possible model. This is done based on the observed CD4 values, and is done separately from the response model for simplicity. Specifically, since the inter-patient variation is large, we consider model (2.8) with Uu = Vu = (1, uu,..., uu~l) and linear (a = 2), quadratic (a = 3), and cubic (a = 4) polynomials. Table 3.2 presents A I C and BIC values for these three models. The following quadratic polynomial L M E model best fits the observed CD4 54 Table 3.2: A I C and BIC values for the linear, quadratic, and cubic CD4 models. Model a=2 a=3 a=4 A I C BIC 796.17 703.19 742.12 819.50 761.52 781.01 process: CD4i( = (ai + ai) + (a2 + 0 2 ) uu + (&3 + 0.3) u\ + eu (3.17) where uu is-the. time and oc — (ai, a2, c*3)T are the population parameters and a, = (an, ai2, ai3)T are the random effects. Es t imat ion M e t h o d s a n d C o m p u t a t i o n Issues We estimate the parameters in the response and covariate models using the naive method which ignores measurement errors, the two-step method in Section 3.2, and the two proposed "joint" model methods discussed in Sections 3.3 and 3.4. We denote the method in Section 3.3 by M C E M and the method in Section 3.4 by A P P R . The two proposed joint model methods need starting values for model parameters. We respectively use the parameter estimates obtained by the naive method and by the two-step method as parameter starting values for the two joint model methods. For the naive method and the two-step method, we use SPLUS functions lme() and nlme() to obtain parameter estimates and their default standard errors. For the M C E M , method, we assess the convergence of the Gibbs sampler by examining time series plots and sample autocorrelation function plots. For example, Figures 3.1 and 3.2 show the time series and the autocorrelation function plots for b2 associated with patient 10. From these figures, we notice that the Gibbs sampler converges quickly andthe autocorrelations between successive generated samples are negligible after lag 15. Time series and autocorrelation 55 Figure 3.1: The time series plot for 62 associated with patient 10. Series : b2 I I i i i -I I I I I •' •'- Figure 3.2: The autocorrelation function plot for b2 associated with patient 10. 56 function plots for other random effects show similar behaviors. Therefore, we discard the first 500 samples as the burn-in, and then we take one sample from every 20 simulated samples to obtain "independent" samples (see sampling methods in Section 3.3.3). For the Monte-Carlo E M algorithm, we start with k0 = 500 Monte-Carlo samples, and increase the Monte-Carlo sample size as the number of iteration t increases: kt+\ = kt + kt/c with c = 4. Convergence criterion for the iterative methods in our examples is that the relative change in the parameter estimates from successively iterations is smaller than a given tolerance level (e.g., 0.01). However, due to Monte Carlo errors induced by Gibbs sampler, it is difficult to converge for an extremely small tolerance level, otherwise it may converge very slowly. The actual tolerance level we used in our example for the two proposed joint model methods is 0.05. Convergence of the algorithms are considered to be achieved when the maximum relative change of all estimates is less than 5% in two consecutive iterations. We use the multivariate rejection sampling method for the M C E M method. Other sampling methods may also be applied. On a S U N Sparc work-station, the M C E M method took about 90 minutes to converge while the A P P R method took only 3 minutes to converge. This shows that the A P P R method offers quite a substantial reduction in computing time, and is thus computationally much more efficient than the M C E M method. Analys i s Results and Conclus ions Table 3.3 presents the resulting parameter estimates and standard errors. We see that, except for the naive method, the other three methods give similar point estimates for the parameters, especially for the covariate model parameters ex. However, for the parameters /3 of main interest, the two-step method gives smaller standard errors than the two joint model methods ( M C E M and A P P R ) . This is because the two-step method ignores the variability due to estimating the parameters cx in the covariate model. Thus, these results are consistent 57 Table 3.3: Parameter estimates (standard errors) for the HIV dataset. Method a 2 a3 Pi ft Ps Pi P5 As Pr 5 R Naive - - - 11.73 65.41 .41 6.82 -3.01 9.27 -1.67 .35 - - - (•2) (3.9) (3.3) (.6) (5.6) (8.8) (3.6) Two-step -.42 4.17 -3.74 11.73 65.78 1.53 6.84 -2.86 9.04 -1.75 .35 .52 (•1) (.5) (.5) (•2) (4.1) (4.7) (.6) (5.5) (8.9) (3.5) M C E M -.41. 4.02 -3.54 11.74 66.60 1.55 6.85 -2.78 8.91 -1.79 .35 .53 (•1) (.5) (.6) (•2) (4.7) (5.2) (•7) (6.0) (9.0) (3.6) A P P R -.42 4.17 -3.74 11.74 65.72 1.33 6.85 -2.82 8.99 -1.77 .35 .52 (•1) (.6) (.6) (•2) (4.5) (5.0) (•7) (5.9) (9.1) (3.6) Note: A and B are unstructured covariance matrices, but we only report the estimates of their diagonal ele- ments here. Diag(A) = (.52, 4.06,1.98) for Two-step, Diag(A) = (.53, 2.55,1.25) for MCEM, and Diag(A) = (.52,4.06,1.99) for APPR. Diag{B) = (1.11, 69.94,2.02,24.86) for Naive, Diag(B) = (1.10,69.58,2.02,25.04) for Two-step,. Diag(B) = (1.11,69.96,2.05,25.47) for MCEM, and Diag(B) = (1.10,69.78,2.01,24.85) for APPR. with the analytical results about the two-step method in Section 3.2. We also see that the naive method may severely under-estimate the effect of the covariate CD4 (which is measured by the parameter p3). The estimates and standard errors based on the two joint model methods are similar and may be more reliable. The commonly used two-step method and the naive method may give misleading results, and the two proposed joint model methods may be more reliable. We will confirm this conclusion via simulations in next section. 3 .5 .2 A S i m u l a t i o n S t u d y In this section, we conduct a simulation study to evaluate the proposed methods ( M C E M and A P P R ) , and compare them with the commonly used two-step method and the naive method by the mean-square-error (MSE). The models and the measurement schedules used in the simulation are the same as those in the real HIV dataset in the previous section (i.e., models (3.14) - (3.17)). In the simulations, the true values of a and (3 are shown in 58 Table 3.4: Simulation results for parameter estimates as well as (standard errors) and (sim- ulated standard errors)* for the estimation methods Naive, Two-step, M C E M , and A P P R . Parameter Oil a2 a 3 Pi P2 Pz P4 p5 (% Pi True Value -.5 4.0 -4.0 12.0 66.0 1.5 7.0 -3.0 9.0 -2.0 Naive Method - - - 11.98 65.50 0.94 6.92 -3.74 10.13 -1.62 - - - (-1) (1.1) (1.1) (.3) (1.7) (2.6) (.9) - - - (.2)* (1.2)* (1.0)* (.3)* (1.6)* (2.8)* (1.0)* Two-step -.51 4.05 -4.02 11.98 65.66 1.53 6.92 -3.74 10.13 -1.62 (•1) (.3) (•4) (•2) (1.2) (1.4) (•3) (1.7) (2.6) (.9) (.1)* (.3)* (.3)* (.1)* (1.2)* (1.4)* (.2)* (1.6)* (2.5)* (.9)* M C E M -.51 4.04 -4.02 11.98 65.85 1.48 6.98 -3.11 9.15 -1.95 (•1) (•4) (•4) (•1) (1.5) (1.8) (•3) (2.0) (3.0) (1.1) (.1)* (.4)* (.4)* (.2)* (1.5)* (1.8)* (.3)* (1.9)* (2.8)* (1.1)* A P P R -.51 4.05 -4.03 11.98 65.46 1.52 6.93 -3.82 10.29 -1.56 (•1) (•3) • (-4) (•2) (1.4) (1.8) (.3) (2.0) (3.1) (1.1) (.1)* (.3)* (.3)* (.2)* (1.4)* (1.7)* (.3)* (1.9)* (2.8)* (1.0)* Table 3.4, and the other true parameter values are 5 = A, R = .3, A = diag(.5, 2, 1), and B = diag(l, 9, 2, 4). We simulated 100 data sets and calculated averages of the resulting estimates and their standard errors as well as simulated standard errors based on each of the four estimation methods. Since M C E M method sometimes offers computational problems, such as slow or non-convergence, the 100 sets of parameter estimates are obtained from 116 data sets. The simulation results are shown in Table 3.4. From Table 3.4, we see that the naive method can severely under-estimate the covari- ate effect /?3. The two-step method produces similar point estimates as the two joint model methods ( M C E M and APPR),"but it gives smaller standard errors than the M C E M and the A P P R methods, since the two-step method fails to incorporate variability in estimating the covariate model. These results are consistent with the analytical results in Section 3.2. Note that the estimates for the covariate model parameters a are similar for the three methods (Two-step, M C E M , and A P P R ) , but the parameters a are usually treated as nuisance pa- 59 rameters and are not of primary interest. The two proposed joint model methods ( M C E M and A P P R ) perform better than the two-step method and the naive method, and the M C E M method is the best among all four methods in terms of bias. In the computation, the A P P R method converges much faster than the M C E M method. These simulation results confirm that the naive method and the two-step method may give misleading results and that the two proposed methods are more reliable. 3.6 Discussion We have proposed two approximate likelihood methods for semiparametric N L M E models with covariate measurement errors and missing data. The first method, implemented by a Monte Carlo E M algorithm combined with Gibbs sampler, may be more accurate but may be computationally intensive and sometimes may offer computational problems such as slow or non-convergence. The second method, implemented by an iterative algorithm without Monte Carlo approximation, is computationally much more efficient, but it may be less accurate than the first method since it uses an additional approximation. Alternatively, the second method may provide excellent parameter starting values for the first method. Simulation results show that both methods perform better than the commonly used two-step method and a naive method. In particular, the commonly used two-step method may under-estimate standard errors, and the naive method may under-estimate covariate effects. For semiparametric N L M E models with covariate measurement errors and missing data, the models can be very complex. Thus, if there is not sufficient information in the data, the models can be non-identifiable. To our knowledge, there seem no existing general necessary and sufficient conditions for model identifiability, and the identifiability problem needs to be considered on a case-by-case basis. In practice, we can check model identifiability 60 by examining the convergence of iterative algorithms. If the model is non-identifiable, the iterative algorithms may diverge quickly. For the models considered here, we find that the iterative algorithms converged without problems, so the models seem identifiable. The methods proposed here may be extended to semiparametric generalized linear mixed models and nonparametric mixed-effects models with covariate measurement errors and missing data. The results will be reported in the near future. 3.7 Appendix: Asymptotic Properties of 7 Based on the A P P R Method in Section 3.4 In this section, we show the asymptotic properties of 7 obtained by the A P P R method in Section 3.4. We first state some Lemmas, which are used in the proofs, in Section 3.7.1. Then we describe some regularity conditions under which the asymptotic properties hold in Section 3.7.2. In Section 3.7.3, we obtain some estimating equations, which are used for showing asymptotic properties of 7 . Consistency and asymptotic normality of 7 are shown in Sections 3.7.4 and 3.7.5, respectively. The results are extensions of Vonesh et al. (2002). 3.7.1 S o m e L e m m a s The following four lemmas will be used for showing asymptotic properties of 7. Lemma 3.2. (Vonesh and Chinchilli, 1997). Let 7 „ be a sequence of random variables satisfying Yn = c+Op(an) where an = o(l). If f(x) is a function with r continuous derivatives at x = c, then f(Yn) = /(c) + fU(c)(Yn - c) + • • • + [l/(r - l)!]/( r" 1)(c)(y n - c ) - 1 + O p « ) , 61 where / ^ ( c ) is the fcth derivative of / evaluated at c. In particular, f(Yn) = f(c) + Op(an). This result holds when Op(-) is replaced everywhere by op(-) or when Yn and c are replaced by a vector/matrix random variable Y„ and vector/matrix constant c. L e m m a 3 . 3 . (Aitchison and Silvey, 1958). If / is a continuous function mapping Rs into itself with the property that, for every 9 such that ||#|| = 1, 9Tf(6) < 0, then there exists a L e m m a 3 . 4 (The B o u n d e d Convergence T h e o r e m ) . Let {fn(x)} be a sequence of measurable functions defined on a set of E of finite measure, and suppose there is a real number M such that |/n(a;)| < M for all n and all x. If f(x) = l i m n _ > o 0 fn(x) for each x in E, then L e m m a 3 . 5 . Let A and B be u x u symmetric matrices with eigenvalues fJ-i(A) > /^(A) > ••• > A*„(A) and Hi(B) > fj-2{B)-> ••• > Hv{B), respectively. If A — B are nonnegative definite, denoted by A — B > 0 or A > B, then we have fJ-i(A) > Pi(B), i = 1, . . . , v. 3.7.2 Notation and Regularity Conditions Let the r-dimensional vector 7 = (a , (3) € T and k (7, oj{) = li(ct, 0, a i 5 y^ Zj) = log/y(j / i |zj , c^i;7) + log/z(zi |wi; 7 ) , NiLih, Wi) = ^ ( 7 , Wi) + log / (ai) + log / (b i ) , ' 62 point 0 such that ||6>|| < 1 and f(8) = 0. where N; = rii + m-i. Let d w i=u; i(7)) d2 dujidujf w i=a; i(7)' »\w,w t(7, ^ ( 7 ) ) = i, 7(7> ^ ( 7 ) ) = ^ ; ^ ( 7 , Wi) " 7 7 ^ ^(-Y)) = d^li^' W l ) " 7 ^ ( 7 , W i ( 7 ) ) = Q^fhh, Wi) wi=u;i(7)» u>i=u;i(7)i a; i=a>i(7). etc. Similarly, we can define the corresponding derivatives for L i (7, c*>i(7)) and 3 2 -h(i, Vi), etc. Also, we denote convergence in probability as iVj —> 00 by o p(ljv i), convergence in probability as n —> 00 by o p ( l „ ) , and convergence in probability as both Ni —> 00 and n —• 00 by o p ( l A ^ i , n ) - We show consistency and asymptotic normality under the following regularity conditions. A n outline of the proof when some of these conditions are relaxed is provided at the end. R l . /Vj = O(N) uniformly for i = 1, . . . , n, where N = min; R2. The variance-covariance parameters A = (52, R, A, B) are'fixed and known, and the true parameter 7 0 is in the interior of V. Qi, Ai , Zi, E j , and D are evaluated at 6 and Wj. When A is unknown, we can simply replace it by its consistent estimate (e.g., in (3.13)). R3. The density functions fyiVijl^i, <*V)7) and fz(zij\tJi; 7 ) satisfy the necessary regularity conditions (e.g., Bradley and-Gart, 1962) such that, for fixed 7 , the M L E of u>j is v^-consistent for Wj as Ni —> 00. In addition, the necessary regularity conditions are assumed (e.g., Serfling, 1980, p. 27, Theorem C) such that, by the Law of Large Numbers, 63 the following hold: ~ K l - J ^ k { l ^ l ) = N-'QTiK-lQl + op{lNi) as N* - oo, oyd'y1 -N~l-^--Yk{1^l) = N-1ZjK-lZi + op{lNi) as N{ - oo, - N - l ^ Z T k h , u J i ) = N-1ZTA;lQi + op(lNi) as - o o , where, under models (2.6) - (2.8) a 2 a 2 Finally, the matrices N~lQj'h~lQi and N^lZjK^Zi are both assumed to be positive definite with finite determinants such that, for example, the smallest eigenvalue of N~1QfA~1Qi exceeds Ao for some Ao > 0. R4. For all 7 G T and all the 6-dimensional u>j € Rb, the function L;(7, cJj) is six times differentiable and continuous in 7 and Wj for all ŷ - and z -̂, and L i (7, Wj) satisfies the necessary regularity conditions needed to change the order of integration and differentiation, as indicated in the proof. R5. For any 7 e F, there exist di > 0 and Ai > 0 such that a. For all 7* € 5̂ (7), where #̂ (7) is the r-dimensional sphere centered at 7 with radius d\, the following holds: ~ E ̂ >7^> ^ ( 7 ) ) l7=T = " ( 7 T 1 + Op(ln), i = l ' as n —> 00, 64 where ^(7*) 1 is positive definite with minimum eigenvalue greater than A i and ^ , 7 ( 7 , Wi(7)) |7=T = inni{l\ Wi(7*)) + # y W | ( 7 * , « i (7* ) ) x K i w « ( 7 * , Wi(7*)) + D - 1 ] - 1 ^ 7 ( 7 * ! w*(7*))}. b. The first, second, and third derivatives of y/NiL^-y, Wj) with respect to CJ^ are uniformly bounded in 5 ^ ( 7 ) . R6. At the true value 7 0 , the following hold true: Eu,(QjE~lQi) = c^j(70) exists for all i = l , . . . , r a , n lim n" 2 V Covo; (Qf E ^ Q , ) = 0, n—>oo ' ^ i = l and lim n _ 1 YV (7o) = fi(7o)~\ n—>oo ' J where Qi, Zit and E ; are evaluated at 7 0 and o»j and f 2 ( 7 0 ) _ 1 is positive definite. R7. The marginal densities, Jexp{iVjL;(7, u>i)} duii, satisfy the necessary regularity conditions such that the M L E of 7 exists and satisfies ('JMLE ~ 7o) = Op{n~ll2). 3.7.3 E s t i m a t i n g E q u a t i o n s In Section 3.4, we obtain an approximate M L E 6 = (7, A) of 6 and the predictor u>j of <jj. It is obvious from (3.10) and (3.12) that the estimate 7 is a function of A and the final estimates Wj of u>j are functions of both 7 and A. For fixed A, it can be shown that 7 and n Cji = uJi(j) maximize the complete-data log-likelihood function (3.2), i.e., 2~2 -^ -^ (7 , <*>i)- 65 In fact, we can write if = if (a, (3, aj,bj, A) in (3.2) as Hi l O g ^ T T C T j - — log - T if = - | log(27r<52) - ^ l o g I27T.RI - \ l o g | 2 ^ | - \ l o g | 2 7 T f l | Yi' Si A T 1 -UiOL- T ( » \ \b-) yi - gi yzi-UiOL-Viai J (1 Taking the first derivatives of £ ii with respect to 7 = (a , (3) and = (af, b f ) r and. set- i=i ting these first derivatives to appropriate zero vectors, we can obtain the following estimating equations i=l Z i \ -1 ^ Zi - Ui OL - Vi &i J ^ yi - gi ^ V - D~ (3.18) V b t / = 0, i = l,...,n, Zi — UiCX — ViZn j which is equivalent to £ <9* A i _ 1 ^ 7 + E Ql A " 1 ^ = £ Qi K l n, i=l i=l i=l ^ Zi A " 1 Qi 7 + [Zj A " 1 Zi + fl-"1]^ = ^ A ^ r - i , ' i = 1 . . . , n, where (3.19) ( V Si . V>. \ eft Zi da] dbj J The solution to the estimating equations (3.19) can be obtained by iteratively solving the following equations £ QI A - 1 Qn + tQT A " 1 ZiUi = ± Qf A,-1 fit i=i i=i i=i (3.20) Z f A " 1 Q l l + (ZT A " 1 Zi + b-1)ui = Z j k . 1 r i , i = 1 . . . , n , 66 where f; = (yf , z[)T is defined in (3.7), and Qi, Ait Zit and D are Qi, Ai, Zi, and D evaluated at 7 = 7 and Wj = Wj, respectively. The solution to the equations (3.20) is given in (3.10) and (3.12). Therefore, for fixed A , the final estimate 7 and a V = ^ ( 7 ) satisfy the estimating equations (3.18) and maximize the complete-data log-likelihood function (3.2). These facts will be used to show the following asymptotic properties of 7 . 3.7.4 C o n s i s t e n c y We first note that, for fixed A , the M L E of 7 will satisfy the set of estimating equations ° ^ i=i 7 i=i J Under R4, we have "" / {j2NiQ~L^' ^ ) } e x p { ^ 7 V , L , ( 7 , ŵ dwr-.dw, = 53 V 7 ^ ' / • • • / (VNiQ-Hl, wO) exp { A ^ ( 7 , w^W • • • dwn i=i ^ 7 j=i = V 7 ^ / ( \ / ^ ^ , 7 ( 7 , w<) exp{/V i L i(7, W f ) } ^ x 7 e x p { i V i L i ( 7 , a;,-)}** i=i ^ j ^ i - 7 Now we examine the term £ - ^ ( 7 , u>j) in the above expression. Since the y^s and Zj/s are conditionally independent each other given u>iy by the Lindeberg Central Limit Theorem, it follows that conditional on u>,-. ( = NrWA,1 Yi ~ gi Ui a — Vi a, \ = Op(N~1/2) (3.21) Furthermore, under R3 it can be shown that the estimate ^ ( 7 ) = vi + Op(N-1/2). 67 (3.22) Combining the results in (3.21) and (3.22) and applying Lemma 3.2 to £ ^ ( 7 , ^ ( 7 ) ) , we can show that • 4 7 ( 7 , A ( 7 ) ) = L ; i 7 ( 7 ) u ; i ) + O p ( i V i - 1 / 2 ) = Op{N~112) + Op(N~1/2) .= Op(N~1/2). (3.23)' Then, by direct application of the Laplace approximation to integrals of the form J exp{kp(x)}dx and / q(x) exp{kp(x)}dx, where q(x) and p(x) are smooth functions in x with p(x) having a unique maximum at some point x (see, e.g., Barndorff-Nielson and Cox, 1989), it can be shown that \NiL"\ (l + ocivr 1)) J exp{/ViLi(7, Ui)}dwi = exp{A^ iL i(7, 1^(7))} ^ and / ( ^ ^ , 7 ( 7 , ";)) exp{/V i L i(7, w^jdui = e x p { M L l ( 7 , «i(7))} x ^ ( ^ - 7 ( 7 , « i(7)) + W 1 ) ) , where L - ' ^ ^ = £"u;,u>(7 ; 0^(7)). Because (3.21) implies y/N~L'irf(-y, 0)1(7)) = O p ( l ) , it follows from R l that 1L ( ^ ^ ( 7 , ^ ( 7 ) ) + O^N-1)) x J](l + O^Nf1)) = VWJ,7(7, w<(7)) + ^ ( A T 1 ) . Hence we have i=l e x p { / V i L i (7 , W i ( 7 ) ) } 2?r 6 / 2 (y/NiL'^in, &&)) + OiNr1) n J J e x p { j V i L i ( 7 , « i ( 7 ) ) } 2TT i r ( 7 ) w( 7 ) ) n E ^ , 7 ( 7 . ^ ( 7 ) ) + n O p ( i V - 1 / 2 ) 6 / 2 I (I + CXAT"1)) i=l 68 where K(7, u>(7)) = e x P { £ A ^ ( 7 , ^ ( 7 ) ) } ft ( V l ^ w J ) 6 / 2 - Since ^ ( 7 , 4 ( 7 ) ) ^ 0 i = l i = l for all 7 € r, the M L E -yMLE of 7 satisfies J ^ ) | 7 = 7 ^ = ° ^ ( 7 , ^ ( 7 ) ) | 7 = 7 M L £ + O p ( n 7 V - 1 / 2 ) = 0 , (3,24) n n where Ji(7, £ ( 7 ) ) = ]H ^ - ^ 7 ( 7 , ^ ( 7 ) ) = E ^ 7 ( 7 , ^1 ( 7 ) ) is the s e t of estimating equa- i = l i = l tions for 7 conditional on fixed A , as given in (3.21). By taking a first-order Taylor series expansion of Ji(7, £ ( 7 ) ) about y M L E a n d noting that, from (3.24), J\{IMLE, ^{IMLE)) = Op(nN~1/2), we have •• Ml, w (7)) = Op(nN-^2) + J { ( 7 * , w(7*))(7 - 7 M L E ) > where JiW, « ( 7 * ) ) = 7j~r<M7, « ( 7 ) ) | 7 =7* = ^ f E ^ ' 17=7* (3.25) <972 t=i and 7* is on the line segment joining 7 to •JMLE- By applying the chain rule, we have, for any 7 G T, d2 I ^ ( 7 ) • / I ( 7 , * ( 7 ) ) = E { ^ T ^ ( 7 ; W i ) j=i + i = l NiLi(-y, u>i) = E ^ 7 7 ( 7 , W i ( 7 ) ) + ^ w ( 7 , * i ( 7 ) ) a 7 3 (3.26) Note that £ 1 ( 7 ) maximizes NiL^-y, u>) and satisfies the second set of equations in (3.18), i.e. '• ' I W 7 , ^ ( 7 ) ) - D^Cain) = 0 « ^ ( 7 ) = ^ ^ ( 7 , w<(7) ) . Applying the chain rule once again, we have 0Wf(7) 3 5 7 ' : { l > ^ ( 7 , ^ ( 7 ) ) d 2 h{l, Wj) uji=u>i(7) a2 dujidujf lift, W i ) u>i=<I;i(7) ^ ( 7 ) a 7 r = ^ : W 7 ( 7 , « < ( 7 ) ) + ^ w ( 7 , « i ( 7 ) ) 69' Solving the above equation for [dLji{/^)/d^fT], we have 0Wj(7) = { - Wi(7)) + <9 7 r Substituting this expression of [du^)/d~fT) in (3.26), it follows from R l , R3, and R5 that as n —> oo - l j { ( 7 * , *<(7-)) = - ^ E ^ , 7 ( 7 , « i . ( 7 ) ) l 7 ^ i=l = Q(7*)- 1 + o P ( l„) , (3.27) which implies ~J[(rf*, Wi(7*)) fl(7*)-1- Taking e = |Q(7* ) _ 1 in the definition of convergence in probability, from (3.27), we have P ( - ^ ( T T 1 < -\J'Al\ «*(7*)) - ^(T*)" 1 < ^ ( T * ) " 1 ) -> 1, as n - oo <*=> p Q n ( 7 T 1 < ~ ^ ( 7 * , « i ( 7 * ) ) < | n ( 7 ' ) - 1 ) - » l , a s n ^ o o , which implies ^ ( ^ ( T T 1 < - ^ 1 ( 7 * . «i (7*))) - 1, as n ̂ oo. Therefore, for sufficiently large n, , -^ i ( 7 * , «i(7*)) > ̂ (TT 1 > ^ J (3.28) with probability 1, where Ax is defined in R5. Since, in (3.25), O p (A^- 1 / 2 ) = N'1'2 Op(l) A 0 as N —» oo, similar arguments as for —\J'i{l*, £j(7*)) lead to with probability 1, for any 0 < K < di, \\Op(N^2)\\ < ^ , (3.29) 70 where d\ is defined in R5. For any 7 such that H 7 — •JMLEW = K> w e know that H7 — JMLE\\/K — 1) which satisfies the condition of Lemma 3.3. We can regard J i (7, ̂ ( 7 ) ) in (3.25) as a function of ^ ( 7 — JMLE) o n the unit sphere in RT and using the results in (3.28) and (3.29), we have :(7 - IMLE? - J i ( 7 , ,#(7)) n ( 7 - 1MLE)TOP{N-1'2) + « 7 ( 7 - 7 M L B ) ~ ( 7 - 7 M i £ ) < " 7 M L E | | | | O P ( ^ - 1 / 2 ) I I - 4^4||7 - 7 M L I | 2 s - - - < - - < ° - Therefore, conditional on u , we can show Pu { - { I - I M L E ? -J\{l, ,w (7 ) ) n < 0 1 "as n —> 0 0 , N —> 0 0 . Thus, since 7 satisfies J i (7, ̂ ( 7 ) ) = 0, Lemma 3.3 implies that lim Puj{\\l - IMLEW < « } = 1. n,N—*oo Hence, using Lemma 3.4, we have lim P { | | 7 - 7 M i £ | | < K} = Eu; { lim Pu{\ft - 7 m l e | | < « } ) = 1. (3.30) n,N—>oo I n,N—>oo I Since ^ ( 7 , u>(7)) = 0 and J i ( l M L E , v(^MLE)) = O p ( n i V - 1 / 2 ) by (3.24), it follows from (3.25) that ~ ^ ( 7 * * , w ( 7 * * ) ) ( 7 - W ) = 0 P ( i V - 1 / 2 ) , (3.31) where 7 * * is on the line segment joining 7 to -yMLE. From (3.30), we know that 7 * * is in the interior. It follows from R5 and inequality (3.28) that --J'i(l**> <*K7**)) > ^ ( 7 * * ) _ 1 > n z z 71 where Q(7**) _ 1 is positive definite. Lemma 3.5 implies that all eigenvalues of \—\J[{l**, ^ (7* are greater than | A i . Therefore, all eigenvalues of [—\J[{l**, £(7**))] ~ \ . which are equal to the reciprocal of all eigenvalues of [—^1(7**, ^(7**))], are less than So we have [ - ^ ( 7 * * , ^( 7 **)) ] _ 1 < £ / , i.e., [-±J{(Y*, ^(Y*))]'1 = 0P(1). Hence, from (3.31), we have -U(7**, 0,(7**)) n (7 IMLE) — from which it follows (given R7) that Op(iV-V2) = 0^1 )0^-1 /2 ) = 0p(N-1/2), 1 = iMLE + Op{N-X/2) = 7o + O^n-1'2) + Op(N-V2) = 7o + Op j max n ~ 1 / 2 , (min -1/2 3.7.5 Asymptotic Normality of 7 The asymptotic normality of 7 will be shown based on the estimating equations (3.10). Let n * ( 7 ) = E ^ r 1 ( r l - Q i 7 ) , i=l where Qi, and are defined in (3.20). The estimator 7 satisfies $ ( 7 ) = 0 at convergence. Noting that d<&(7)/dyT = — E Qj^HlQixs constant for 7, we take a Taylor series expansion of ^ ( 7 ) around the true parameter 7 0 : 0 = * ( 7 ) = $ ( 7 0 ) + ^ M ( 7 _ 7 o ) ) which implies V^(7 - 7o) = 1 d${n) n dyT dy1 72 Since E[Qjtrl(ri - Q I 7 O)] = 0 and Cov[Qf S r 1 ^ - Q I 7 O)] = Qf^Qu by the Lindeberg Central Limit Theorem, we have n . I -1 /2 / n X - £ Qjt^Qi - = J2 Qf^ih - Q,7o) - N(0, I). i = l J \ * i = l / Noting that 7 = 7 0 + oP(ljv,n) a n d Ni = O(N), and using Lemma 3.2 and (3.22), we have Wi (7) = £ ; ( 7 0 ) + o p(ljv i n) = W i + O p ( A T 1 / 2 ) + 0 p ( l w , n ) = W i . + O p ( A T - 1 / 2 ) + o p ( l w , n ) = u>i + op(lAr,n). Hence, it follows by the Law of Large Numbers, Lemma 3.2, and R6 that i = l -1/2 n i = i P. rr~>/.. \ l l /2 -1/2 7=7,0;=u> (7) -1/2 u —> oo J 7=70,u>=u; [^(To)]1 A —> oo, n —> oo N —> oo, n —> oo, where u is the number of iteration. Using Slutsky's theorem, we can show that vM7 -7o ) = d 1 - va T n i E QWQi i = l -1/2 i E Q ? V & J L i=l N(O, n( 7 o )) . We can extend the foregoing proofs to the case where A is unknown by replacing A with its consistent estimate. Note that at the estimate 7 , the estimate A given in (3.13) can be shown to be consistent for A as n —>• 0 0 and N —> 0 0 (see, e.g., Demidenko, 2004). 73 Chapter 4 Simultaneous Inference for Semiparametric N L M E Models with Covariate Measurement Errors and Outcome-based Informative Dropouts 4 . 1 I n t r o d u c t i o n In this chapter, we develop two approximate likelihood methods to simultaneously address covariate measurement errors and outcome-based informative dropouts in semiparametric N L M E models. In Section 4.2, we propose models for this complicated problem. We obtain approximate M L E s of all model parameters, using a Monte-Carlo E M ( M C E M ) algorithm along with Gibbs sampler methods, in Sections 4.3. In Section 4.4, we propose an alternative and computationally much more efficient approximate method. The proposed methods are illustrated in a HIV dataset in Section 4.5 and are evaluated via simulation in Section 4.6. 74 We summarize this chapter in Section 4.7 with some discussion. 4 .2 Models for Nonignorable Missing Responses In the presence of nonignorable or informative dropouts in the semiparameteric N L M E model (2.6) and (2.7) with the covariate process (2.8), we can write = (ymis,i, y0bs,;) for indi- vidual i, where y m i S > i collects the missing components of y, and y0bs,i collects the observed components of y .̂ Here, the missing y -̂'s are intermittent, i.e., we allow dropout individuals to possibly return to the study at a later time. Let r; = (rn,..., rirH)T be a vector of miss- ing response indicators for individual i such that r^ = 1 if is missing and 0 otherwise. Note that = 1 does not necessarily imply that r^+i = 1. We have the observed data {(y0bs,i, Zi, Ti), i = l,...,n}. To allow for a nonignorable missing mechanism in the response, we need models for the missing response indicators r-j, which are called dropout models. The parameters in the dropout models are treated as nuisance parameters and are usually not of inference interest. Thus, we try to reduce the number of nuisance parameters to make the estimation of main parameters more efficient. Moreover, too many nuisance parameters may even make the response and the covariate models non-identifiable. Therefore, we should be very cautious about adding extra nuisance parameters. In general, the probability that is missing may depend on many factors, such as responses, covariates, and individual random effects, etc. Since the missing response indicators rj are binary, a simple model for them is a logistic regression model as follows. We will assume that r^-'s are independent for simplicity (to reduce the number of nuisance parameters) m / ( r ^ , Zi, a ,̂ bi; 77) = J J j P ^ - = l | y i ) zh ai, b^ r])}r^[l - P ( r y = l | y i , zi} ait bi; r / ) ] 1 - ^ , i = l 75 with i logit[P(r0- = l|yi, ai, b i ; 77)] = log P ^ — Z " b * ' ^ = h(yu zh a^ bij.ry), [ 1 r\Jij — J-lYi, Z i , ai, D i , 77JJ where 77 are the unknown nuisance parameters and h(-) is often chosen to be a linear function of yi, Z j , â , and bi. More complicated models can also be considered, but they may introduce more parameters and increase the computational burden. Note that the missingness of may depend on the (unobserved) true covariates z* rather than the observed error-prone covariates Z j . In this case, a method similar to the one described below can be developed and will be discussed in the next chapter. The density function /(r^y*, z i ; a i ; b^ 77) is a general expression of the missing re- sponse mechanism. Little (1995) pointed out two ways to incorporate informative missing- ness: • outcome-based informative if /(rj|yj, Z j , a,, bi; 77) = /(rj|yj, ẑ ; 77). That is, the prob- ability that the current response is missing depends on the possibly unobserved re- sponse yi and covariates Z j but not on the random effects ai and bi. For example, a patient does not show up because he is too sick to go to the clinic. • random-effect-based informative if / ( r j | y j , Z j , ai, bi; 77) = /(r^a,, bi; 77). That is, the probability that the current response is missing depends on the underlying unobservable random effects ai and bi but not on yi and z». For example, a patient may be more likely to drop out if his initial viral decay is slower than other patients. In this chapter, we focus on the outcome-based informative missing mechanism. Diggle and Kenward (1994), Little (1995), and Ibrahim et al. (2001) discussed various specifications of the outcome-based informative missing mechanism. We will assume that, for example, are independent with logit[P(rij = l|yi, Z i ; 77)] = 771 + 772^1,- + r]3zi2j + • • • + r]v+1zivi + r?„+2yij. 76 More complicated dropout models can be specified in a similar way. Note that the assumed dropout models are not testable based on the observed data, so it is important to carry out sensitivity analysis based on various plausible dropout models. If the main parameter estimates are not sensitive to the assumed dropout model, we may be confident about the results. Otherwise, if the estimates are sensitive to the assumed dropout model, we need further investigation of possible missing mechanisms. 4.3 Likelihood Inference 4.3.1 The Likelihood Function We consider likelihood inference for semiparametric N L M E models with outcome-based in- formative dropouts and measurement errors and missing data in time-varying covariates, based on the approximate models (2.6) - (2.8). Let 0 = (a , /3, 52, R, A, B, 77) be the collection of all unknown model parameters. We assume that the parameters a, (3, 52, R, A, B, and 77 are all distinct. The approximate log-likelihood for the observed data {(y0bs,i, Z i , T i ) , i = 1,... ,n} can be written as n / / / 1(0) = 5>g /y(yi|z» )fz{^W,a,R)f(&i;A) f(W,B)f(Ti\yi, Z i ; V) dymiStidaidbi 77 where / y ( y ; | z ; , b^, a, (3, 82) = U^U fY{yij\zij,ai, W, a, (3, 52) = n ; i i (2TT<5 2)- 1/2 e x p { - [ y y - , d (u£a + v £ a t , /3, bl))]2/2<52}, / z ( z i | a i ; a , i?) = n£=i /z( z;fcl a*; a , i2) = IE=i M " 1 7 2 exp { - ( z i f c - ink a - v , f c a i ) T i?" 1 x ( z i f c - UifcQ: - V i f c a i ) / 2 } , /(a,; A) = | 2 7 r ^ | - 1 / 2 e x p { - a f ^ - 1 a i / 2 } , / ( b i ; B) . = ^ y r f l l - ^ e x p f - b f f l - i b i ^ } . The observed-data log-likelihood 1(0) can be quite intractable, so we use a M C E M algorithm, which is similar to Section 3.3.2, to find the approximate M L E s of parameters 0. By treating the unobservable random effects a; and b i as additional "missing" data, we have "complete data" { ( y i , z i ; r̂ , a i , b i ) , i = l , . . . , n } . The complete-data log-likelihood for all individuals can then be expressed as n n i=l i=l + log / ( a i ; A) + log / ( b i ; B) + log f(rAyi, z i ; 77)}. (4.1) (i) where lc is the complete-data log-likelihood for individual i. 4.3.2 Approximate M L E s Based on a M C E M Method The E M algorithm iterates between an E-step, which computes the conditional expectation of the complete-data log-likelihood given the observed data, and a M-step, which maximizes this conditional expectation to update parameter estimates. For our models, the E-step is quite intractable due to nonlinearity, so we use Monte Carlo methods to approximate the intractable conditional expectation. In the M-step, we use standard complete-data optimiza- tion procedures to update parameter estimates. 78 Let &® be the parameter estimates from the i-th E M iteration. The E-step for individual i at the (t + l)th E M iteration can be expressed as Qi(0\OV) = E(lU(e)\yoba,i,zi,Ti;0®) = J J J [log fyiy^, a,, b<; oc, 3, 82) + log /z(zi |ai; oc, R) + log /(a^; A) + log /(fy; B) -(-log f(ri\yi, 77) x / ( y m i S i i , a i ; b ^ y ^ i , zu 0 ( t ) ) d y m i S j i ^ dbi = A(i)(a, 0, O.+ / i ° ( a . + 4 V ) + / J ° ( £ ) + / « ( t | ) . (4.2) Since the expression (4.2) is an expectation with respect to f(ymiS,i, ai, bi|y 0h S )j, Zj, i"*; 0^), it can be evaluated using the M C E M algorithm (Wei and Tanner, 1990). Specifically, we may use the Gibbs sampler to generate samples from [ymj S )j, â , b ^ y ^ i , Zj, ^ 0W] by iteratively sampling from the full conditionals [ymis,i\y0bs,i, Zi, ri} ai, bi; 0W], [ai|yi, Z i , r i , bi; 0 ( t )], and [bi|yi, Zi, Ti, a;; 0W] as follows. f(ymis,i\yobs,i, Zi, r{, ai, bi; 0W) oc /y (y i /y(y< <• = /y(y< / ( a » | y i , Zi, r i , b i ; 0W) oc f Y ( y i = fv(yi « /y(yt Z i , r i , a i ; bi; 0W) Z i , ai, bi; 0W) • / ( r i | y i , Z i , a i ; b<; 0 ( i )) Z i , ai, b i 5 0W ) - / ( r i | y i , Z i ; 0 « ) , ai|zi, r i , bi; 0W) Z i , r i , a;, bi; 0W) • /(a^z*, r», bi; 0 ( 4 )) Z i , ai, b i ; 0W) • / ( r i | y i , Z i ) ai, b i 5 0 « ) • / ( a ^ ; 0(*>) CK /(ai; 0W) / 2 ( z i | a i ; 0 « ) • / y ( y ^ , ai, b,; 0<«)), / (b i |y i , Z i , r i , a*; 0 ( t ) ) oc /y (y i , b;|z;, r i , a ;; 0 ( t )) = /y(Yi| zi> r *, a*. b i ! 0 ( t ) ) • / ( b i | z i , r i ; ^ ; 0<*>) ex / ( b i ; 0(*)) • /y(yi |z i , ai, b < ; 0 « ) • / ( r ^ , z i } ai, b<; 0W) = / (b i ; 0W) • /y(yi |z i , a i ; b<; 0^) • / ( r ^ , z,; 0W) oc/(bi; 0W ) - /y(yi |z i , ai, b,; 0W). Monte Carlo samples from each of the full conditionals can be generated using rejection ( 79 sampling methods, as in Wu (2004). Alternatively, the integral (4.2) may be evaluated using the importance sampling method (see Section 3.3.3). For individual i, let {(y^., af> b f ) , . . . , (yf 0 , , a f \ b f 0 ) } denote a random sample of size kt generated from [ymiS,i, a*, b;|yofcSii, zu r- 0^}. Note that each (yf s i , a f , bf depends on the E M iteration number t, which is suppressed throughout. The E-step at the (t + l)th E M iteration can then be expressed as Q(0|0<*>) = ± QMeV) « E U E ^(tf; (3$.,,, y ^ ' ) , a f , b f ) ) . i = l i = l L fc=l J = E E iipg /y((yi2.,i,yo6.,i)|zi, a f , b f / a , *2) i = l f c = l + E E Tt log / z N a f ; a , R) + £ £ & log / (af; A) t=i fe=i • i=i fe=i + E E £ log / (bf ; B) + ± t £ log / ( r t | ( y f y 0 ^ ) , z i ; „) i=i fe=i t=i it=i = Qw{cx, /3, <52|0W) +.QP)(a, R\0®) + Q^{A\0^) + Q(4>(£|0(*>) + Q^{r)\0V). The M-step then maximizes Q(0\0^) to produce an updated estimate 0^t+1\ so it is like a complete-data maximization. Since the parameters in + Q^2\ Q^3\ and are distinct, the M-step can be implemented by maximizing + Q^4\ and separately using standard optimization procedures for the corresponding complete-data models, as in Section 3.3.2. As in Section 3.3.2, we use the approximate formula suggested by McLachlan and Krishnan (1997) to obtain the variance-covariance matrix of the approximate M L E 0. Let sf = dlf/dO, where if is the complete-data log-likelihood for individual i. Then an approximate formula for the variance-covariance matrix of 0 is Cov(0) = [ ^ £ ( s f | y o b S i i , Zi, Vi; 0) E(sf\yobSti, zu r i ; Of] , (4.5) i=i where the expectations can again be approximated by Monte Carlo empirical means, as in (4.4). 80 In summary, the foregoing M C E M algorithm proceeds as follows. Step 1. Obtain an initial estimate of (a, (3, 52, R, A, B) = (a' 0 ' , /3 ( 0 ) , 52(-°\ R®\ A^°\ B^) based on a naive method, which ignores covariate measurement errors and missing data, and an initial estimate of r/ = T / ° ) based on the dropout model by filling in ymjS)j with the average of y0bs,i- Set a f = 0 and b f = 0. Step 2. At the i-th iteration, obtain Monte Carlo samples of the "missing data" (yrms.i, a i , using the Gibbs sampler along with rejection sampling methods by sampling from the full conditionals [ymiS,i|y06S,i, Z i , ^ , ^ , fy; 0^], [a^y*, zi} riy bf, 0(i)] and [b^y*, z,, r-j, a i j 0^], or using importance sampling methods to approximate the conditional expecta- tion in the E-step. Step 3. Update estimates 0(i+1) using standard complete-data optimization proce- dures. Step 4. Iterate between Step 2 and Step 3 until convergence. 4.3.3 M o n t e C a r l o S a m p l i n g G i b b s Sampler As in Section 3.3.3, we can again use the Gibbs sampler to draw the desired samples as follows. Set initial values (yfsi, af \ b f ) . Suppose that the current generated values a r e (ySs,i> a f \ b f } ) . w e c a n o b t a i n (9mi£i, a f + 1 ) , bf+1)) as follows. Step 1. Draw a sample for the "missing" response yfitV fr°m f(ymis,i\yobs,i, Z i , n, a f ' , b f } ; 0W). Step 2. Draw a sample for the "missing" random effects a f + 1 ' from /(a*l(yfS\ yofc.,0, r i ; b f *<*>). 81 Step 3. Draw a sample for the "missing" random effects h\ ' from After a sufficiently large burn-in of r iterations of Steps 1 - 3 , the sampled values will achieve a steady state. Then, { ( y ^ l . i i af \ H^)> k = r + I , . . . ,r + kt} can be treated as samples from the multidimensional density function f(,y-mis,i> a.j, b j | y 0 b S ) j , Z j , F j , Q()). And, if we choose a sufficiently large gap r' (say r' = 10), we can treat the sample series { ( y m i s , i > d-ik\ bj f e'), k = r+r', r + 2r',...} as independent samples from the multidimensional density function. There are several ways to get the initial values ( y ^ , - s i , a\°\ b - 0 ^ ) . A simple way is to set y ^ ] S ) i to the average of yQbs,i, and (af \ b - 0 ) ) to (0, 0). Mult ivar ia te Reject ion A l g o r i t h m Sampling from the three full conditionals can be accomplished by the multivari- ate rejection sampling method. For example, consider sampling from / ( y m j S ) i | y 0 & s , i , Z j , rj, aj, b j ; 6>W) in (4.3). Let / * ( y m i S i i ) = / ( y 0 ( , s , ;K »i . hu 0{t)) / ( r j | y j , z < ; 0®) and c = sup{/*(u)}. We assume c < oo. A random sample from / ( y m j S , i | y 0 & s , i , Z j , rj, aj, b j j 0{t)) u can then be obtained as follows by multivariate rejection sampling: Step 1. Sample y ^ j s i from / y ( y m j S > j | z j , aj, b^ ; 0 ( t )) , and independently, sample w from the uniform (0, 1) distribution. Step 2. If w < / * ( y „ j S ] j ) / c , then accept y ^ j s i , otherwise, go back to step 1. Samples from / (aj|y j , Z j , riy b j ; 6^) and / ( b j | y j , zit rit a ;̂-O®) can be obtained in a similar way. Therefore, the Gibbs sampler in conjunction with the multivariate rejection sampling can be used'to obtain samples from [ymiS,i, b j | y 0 ( , S j , Z j , 82 4 . 4 A Computationally More Efficient Approximate Method The estimation method described in the previous section may be computationally very inten- sive and may even offer potential computational problems such as slow or non-convergence, since the method involves sampling the random effects (a j , b j ) , which may have high di- mension (see the detailed discussion in Section 3.4.1). To overcome these difficulties, in this section we propose,an alternative approximate method which further approximates model (2.5) by taking a first-order Taylor expansion around the current parameter and random effects estimates, which leads to a L M E response model. For the resulting L M E response model with the covariate model (2.8) and the dropout model, we can obtain approximate M L E s of model parameters by using a computationally more efficient M C E M algorithm. The random effects aj and b j can be integrated out in the E-step of the E M algorithm and thus sampling the random effects in the E-step is no longer needed. Therefore, the proposed approximate method may provide substantial computational advantages over the M C E M method in the previous section. Moreover, the proposed method can be used to obtain good parameter starting values for the M C E M method in the previous section. Denote the current estimates of (6, a j , b j ) by (6, a j , b j ) , where a j = E(&i\y0bSii, Z j , TJ; 0) and b j = £ ' ( b j | y 0 h S ; j , Z j , r̂ ; 6), suppressing the E M iteration number. Taking a first-order Taylor expansion of in (3.6) around the current parameter estimates a and J3 and random effects estimates a j and b j , we obtain the following L M E response model Yi = Ei + Wia + Xip + HiSLi + Tibi + ei, (4.6) 83 where Wi = ( w j i , . . . , w i n . ) with xvi:j = do. Xi = ( x i i , . . . , x i r i i ) r with Xjj = u dp / / i = ( h j i , . . . , h i 7 l j ) with hi:j = O&i Ti = (t i l ;...,t i nJT witht0- = ^ gi = g i ( d , y9, ai , bi) - WiQ: - A"i3 - TfiOi - T i b i , with all the partial derivatives being evaluated at ( d , P, aj, bj). Our proposed approximate method is to iteratively solve L M E response model (4.6). For the L M E response model (4.6) with the covariate model (2.8) and the dropout model, the M L E s of the model parameters can be obtained by a M C E M algorithm, in which the random effects ai and b i can be integrated out in the E-step, as shown below. Thus, the E-step only involves sampling y m i S ) i rather than ( y m i S , i , a*, bj) as in Section 4.2. Moreover, some analytic expressions for the M-step can be obtained. These result in a substantial improvement in computational efficiency 4.4.1 A Much Simpler E-step In this section, we show that, based on the approximate L M E response model (4.6) and the covariate measurement error model (2.8) along with the dropout model, the random effects (aj, bi) in the E-step in Section 4.2 can be integrated out. Based on standard results for multivariate normal distributions, the distribution of 84 ( y i , Z j , a j , b j ) for individual i is given by \0 N 0 0 V Hi A Hf + TiB Tj + 52 I Hi A V? HiA TiB • ViAHf ViAV? + I®R ViA 0 AHT AV? , A 0 BT? 0 O B Since Ht A Hf + TtB Tf + S21 and V A Vf + I ® R are positive definite, they are symmetric and invertible. By the inverse operation of partition matrices, we can write ^ rii j\n; - t -1; a it + o~ l rii A vt \ I Ui + ft —ft tii \ (4.7) V where Hi A Hf + TiB Tj 521 Hi Vf VAHT T/. 4 T / T . VAV/ +I<S)R J V -EiF 1 Ei J d = {HiA'Hj + TiBT? + 52 Fi = GiHiAVf, Ei = [ ( V ^ V f + / ® i ? ) - V j A F f G j / f j A V f ] - 1 . Because the random effects aj and b j are conditionally independent of r j given ( y j , Z j ) , it is straightforward to obtain the conditional distribution of [aj, b j | y j , Z j , r j ] when 6 = 6 y i i Z j , I*j, 6 N 85 where uai = [AHTtGi + FiEiF^-AVfEiFfiiyi-gi-Wia-XiP) + [AVfEi-AHTFiEl}(zi-Ula)\e=Q1 uhi = BT^Gi + FiEiFT^-^-Wia-Xi^-BTTFiEiixi-Uia)^^, E a i = A-iAHjiGi + FiEiF^-AV^EiF^HiA-lAV^Ei-AHTFiEilViA^ Ebi = B — B T?(Gi + Fi Ei Fj) T* B \ Q = Q , S a i b t - (Ebiai)T=[AVfEiFl-AHf(Gl + FiEiFn}TiB\e=B. By the expectation and covariance properties for multivariate random variables, we have £(aj |yj, ZiYi, 0) = E(^aJ\yi, z^ n\ 0) = £ a i + i / J , £(b; |yi , zu n; 0) = u b i , E{hihJ\yi, Zi,Yi- 0) = E b i + uhi v?., E(aihf \yi, z ^ ; 0) = S a i b i + u&i Since f(ymis,i, ^i, b i |y o 6 S ) j , Zi, Yi, 0) = /(a^, bi |yi , z», r»; 0 ) / ( y m i S , t | y 0 6 S l i , z i , r i ' , 86 the conditional expectations in the E-step, which correspond to - if1 in (4.2), become I?\a, (3, 52) = J £ a i , b i [ - ^ log(27r52) - ^ ( y < - g i - Wt oc - Xi (3 - H, a, - % b{)T x(yi - gi - Wi oc - X i (3 - Hi ^ - Ti b i ) | y i j Z i , r i ; 0 ^ f (ymis,i\yobs,ij Z i , T i , 0)dy = log(27r52) - ^ J {(yi - gi - Wi oc - X i /3) r(yi - g i - Wt oc - Xi (3) ^{yi-^-WiOc-Xidf E^lHi&i + Tih^yi, zu r < ; 0] . + E a i ) b i [ ( ^ i a i + r i b i ) T ( i / i a i + ri.bi)|yi, z ^ ; 0]} x / ( y mis,i\yobs,ii Z i , T j , 0)dy = - ^ l o g ( 2 7 r 5 2 ) - ^ y ' { ( y i - g i - l ^ a - X i ^ ) % i - & - W i a - X i / 3 ) - 2 ( y i - g i - W i a - X i / 3 ) T [ ^ i E ( a i | y i , z i ; r i ; 0) + Ti £ ( b i | Y i , Z i ) r < ; 0)] + ^ a i , b j a f i f f / f i a i + 2bf T f t f i ^ + b f i f l - b ^ , Z i , r i ; 0]} x / ( y mis,i\^/obsji j ^i? "̂ij 0)dy = - ^ l o g ( 2 7 r 5 2 ) - ^ y { ( ^ - ^ - ^ ^ - ^ ^ ( y i - g i - W i « - X i / 3 ) - 2 ( y i - g i - W i a - X i / 3 ) T ( F i J B [ a i | y i , z i ; r i ; 0} + Tt E[bi\yu zu n; 0]) +tr[Hi E(eii a f |yi, zh r<; 0) Hj) + 2 tr [# i b f | y i , z i , r < ; 0) i f ] + t r [ 7 ; £ ; ( b i b f | y i , Z i . T i ; 0)Zf]} x = - | l o g ( 2 7 r < 5 2 ) - ^ | { ( y i - & - W i a - X i / 3 ) T ( y i - & - W ; a - X i / 3 ) -2 (y i - gi - VKi a - X , /3)T i / a i + 7} i / b J +tr[Hi ( S a i + V a i i / J ) tff ] + 2tr[tf; ( E a i b i + ^ if] +tr[Tj ( £ b i + i / b i !/£.) Z f ] | x i\yobs,i, Z i , Ti', 6)dymiSti, 87 E, a , , b j 1 _ m ' _ - ^ log\2*R\ - - - Uv oc - Vtj Ri)T R-1 i=i x(ZJJ UijOt Vij a j ) | y j , z^r^; 0 x / (y m i S ] i |y 0 ( , S ] i , Z j , i y d)dymiSi lOE g |27ri?| - - y { ^ ( z ^ - ^ a ) T i T 1 ( z y - ^ oc) mi i -2 ^ ( z y - Un ocf R-1 Vi3 E^ailyi, zu n- 0) 3 = 1 mi _ • + ^2Eaiibi[^V^R-1'Vijai\yi, ^, r»; 0]\ x / ( y m i s , l | y 0 6 s , l , Z i , r 4; 0)dyn 1 f r m i ^ log |27ri?| - - J { £ ( z < , - - tfy ocf R-1 (zy - £ ^ a ) m i -2 ^ ( z ^ - - Un ocf R'1 Vij E(<a\yi, zu i y 0) i=i + J ^ t r [ V r i [ i ? - 1 K j £ ; ( a i a f | y i , Z i , i y 0)]} x / ( y ^ y ^ , z», rii 6>) dy„ m 1 f m * ^ log |27ri?| - - { ~ Vij ocf R-1 (zij - Uij oc) 3=1 -2Y(zlj-Ul3ocf R-lVijUai 3 = 1 + Ytr(vij R"1 VH ( s a i + ^ a i ^ ) ) x / ( y m i s , i | y 0 6 S , i , Z i , r t ; d)dymia,ij, 3 = 1 r r l l ~ i E a u b i [ ~ 2 l o § \ 2 7 t A \ - A~l a * l y i ' Z i ' r i ' 0 x / ( y m i s , i | y 0 6 s , i , z i , i"*; 9) dymiSii 1 1 f - log |2TTV4| - - / tr[A~l E(aisf\yi, z h n; 0)} x f{ymiS,i\yobs,i, z ; , i \ ; 9)dymiSii 1 1 f - log I27T.4I - 2 / t r [ ^ _ 1 ( E a ; + ^ ) ] x / ( y m ^ y , ^ , Z i , n; 6) dymiSii> E, a i . b i - ^ log |2vr5| - ho? B 1 bi |yi , zi} r»; 0 x / (y m i S i |y 0 & s , i , z i ; r<; 0) d y m i 5 i i 1 1 f - log|27r5| - - / t r [ 5 _ 1 E ( b i b f | y i , zi} i y 0)} x f{ymisAyobs,i,zi, i \ ; 9)dymis>i 1 1 f ~ - log |2TT£| - - / tr[5 _ 1 (E B I + uhi ul.)\ x / ( y m i S , i | y o 6 s , i , zu i y 6) dymis>i, I^iw) = I EaiM log/ (r i |y i , z^, 77) x / ( y m i S , i | y o b s > i , z i } r4; 0) dy log/(rj |y i , Z J ; 77) x / ( y m « , i | y 0 b s , i , z i ; rij 0) dyn Thus, compared with the intractable high-dimensional integrals in (4.2), the above integrals I{ - II have much lower dimension, i.e., the E-step is substantially simplified. Note that if - if are expectations with respect to / ( y m i S l i | y 0 6 s , i , Z j , ry, 0), so they may be evaluated using common numerical integration methods such as Gaussian quadrature if the dimension of y m i s , i is small. If the dimension of y m i S ) i is not small, we can use the rejection sampling methods to generate samples from [ymis,i\y0bs,i, z%, &\ based'on the conditional density f{ymis,i\yobs,i, z i , r u 0) oc / ( y i , Zj,ry, 0) = / ( Y i , Z i ; 0)/(ri|y*> Z i ; 0), where the distribution of ( y j , Zj) is y i N / gi + WiCt + Xi(3^ / UjCt J HiAHf + T{BTj + 62I HtAVf VtAHf ViAVf + I^R For individual i, let y ^ ] s j , • • •., yf i i . i denote a random sample of size kt generated from [ymis,i|yobs,i! zi, ry, • Note that each y ^ \ s i depends on the E M iteration number t, which is suppressed throughout. Then we approximate the conditional expectations if - if in 89 the E-step by its empirical mean, with missing data replaced by simulated values, as follows. * fe=i >(fc) r- E 2l(y{Xi> - & - w « - * /3p 2 = 1 x[^£;(aJ|(ySSil, yobs,), zu n; 0) + % E(bA(y^s., yofeM), Z i j ri; 9)} ^ fct +r -E t r { H i [CovC^KySi.i. y^O. z i , r » ; 0) * fc=i +^(ai|(y2l,i> yobs,i), Zi, ry, 0) ̂ (aiKyJJ^, y0bs,i), zu ~6)T)Hj} ^ kt + 7 7 E 2 t r { ^ [ C o v ( a i ' bil(ySl,i' yobs,i), Zi, Vi; 9) 1 fc=i +E(^\(y{Xi, y o b s , i ) , zu vf, e)E(bl\(y{Xl, y o b s , i ) , zu n; ~9)T\T?} kt +77 Etr<T< [Cov(b,|(ySSl̂ yobs,i), Zi, vf, ~9) 1 fc=i + £ ? ( b i | ( y ^ i i , y o b s , i ) , Z i , r i ; ~9) £(bi|(y<2flii, y<*a,i), zu r i ; ~0)T] i f } } , 1 f _m'_ j f (a , iJ) « - ^ i log [2jriJ| - J £ ( z „ - Utj af R'1 {ztj - U„ a) 2 W - - 2 E 77 E ( z « - un a ) T ^ £N ( y ™ U yob**), Zi,Vi, o) j=i k=i TUi _. kt + E J7 E t r W ^ VH [Cov(ai|(ySs., y0^), Z j , r i ; 9) j=l * fc=l +-B(a*l(yS2s,i. yobs,i), Zi, ry, 9) £?(ai|(y2i8i , yo6«,i), Z i , r;; 0) T]} f ) ( j 4 ) „ _ 1 l o g | 2 y r A | _ 1 ^{A-1 [Cov(a,|(ySS|i, yob.,0, r>; 0) * fc=i • +£( a i l ( y 2 L , yc*s,i). Z i , r i ; 9) £(a;|(yfs., y o 6 s,i), Z i , r i ; 9)T}}, 90 I^\B) » -l- log\2KB\ - 1 £ \ I V ^ [Cov^Ky JS, y 0 ^ ) , z i ; r i 5 0) +^( bi|(ySi , i ; yobs,i), ^ , r i ; 0) £ ; ( b i | ( y S s i , y o b S i i ) , zh r i ; 0)T]}, kt 4 ° f a ) ~ 7 r £ l o g / ( r i l ( y 2 l , i . y<*8,i),z*; * fc=i 4.4.2 The M-step In this section, we derive some closed-form expressions for the M-step, so we avoid some iterative algorithms, which may be computationally inefficient. The M-step of the E M algorithm maximizes n Q{6\9) = Y$\<*i A ^ + f V , R) + WW + W(B) + f }fa)] i=l to produce an updated estimate of 6. Note that if we were to observe (ei, €i, ai , bi) , in addition to (y0fcs,i, Z i , r i ) , we would use the following estimates n n n m; n ^ = £ e^ ei/£ W*> R== Y/)2eiJel/J2mi' i—1 i=l i=l j=l i=l n n i = J ] a i a f / n , R=J2hihI/n> i=l i=l n n mi n n where J2eIe«> 2~2 Yleij€Ij> 2 ~ 3 a i a f i a n d E D i b f a r e "sufficient" statistics for <52, R, A, i=l i=lj'=l i=l i=l and 5 , respectively. Since ei, a^, and b i are unobservable, we can "estimate" them by their conditional expectations given the observed data (y 0f, S )i, Z i , r ^ , as in Laird and Ware (1982). Based on results in multivariate analysis, we have M 1 gi + Wia + XiP ^ f Hi A Hf + TiB Tj + 521 HiAV? 52I^ Zi ~ N Uioc Vi A Hf ViAVf + I^R 0 V e<) \ 0 J \ 5 2 1 0 521 j 91 and V e^ J N Ei + Wicx + Xip UiC* 0 V Hi A Hf + TiB 7 f + 82 I HiAV? \ 0 ViAVT + I®R Mij Ml R where is a vrrii x v matrix with the j th v y. v submatrix R and zeros elsewhere. Since ej and are conditionally independent of r-j given y j and Z j , we have / ( e j | y j , Z j , ry, 0) = / ( e j | y j , z»; 0), / ( e i | y i , Z i , ry 0) = f(ei\yi, Z J ; 0). Using the above results, the inverse of partition matrix in (4.7), and the definition of condi- tional distributions, it can be shown that [ e j | y j , Z j , ry 0] ~ N(uBi, A e J , where [52(Gi + Fi Ei i f ) ( y < - g j - Wi cc - Xi Q) - 62 F E^i - Ut cx)} 0=0 and [cij|yi, Z j , Ti\ 0] ~ N(u€iJ, A £ y ) , where = [R^Ei)*{vik-Uikci)-R{EiFT)i(yi-&-WIOL-XiB)} 6=6 k=l A e . . = [R-R{Ei)"R] 6=6 92 with the j th v x rn submatrix (Ei FfY of Ei Ff and the j th row and the kth column v x v submatrix (E^k of Ei, j, k = 1, . . . , m .̂ Note that E(e[ei\yobStU zu iy 0) Ei€ij <Hj\yobs,i,Zi, ru 0) E(aiaJ\y0bs,i, zi, Tu 0) E(hihJ|yo6s,i, ri; 0) E(&i G i | y i , Z i , Vi', 0) j'(ymis,i\yobs,i, zij r t i 0) ^Ymis.t) E(€-i €j | y i , Z i , F i , 0)/(yjnis,i|yo6s ,i; Z i , T i , 0) d y m i s i , f E(ai ai |yi, Z i , rf, 0)f(ymisAyobs vi;6)dy mis,11 E ( b i b f | y i , Z i , Ti; 0)f(ymia,i\yobs,i, z i , r»; 0) d y 7 1 Using the expectation and covariance properties for multivariate random variables and the properties of the matrix trace operation, we can update the estimates of (52, R, A, B) in the M-step as follows. 52 = E tv[E(eief\yobSti, zt, iy 0)}/E ni i = l i = l n kt . - E E tv[E(eieJ\(yml,i, yobS,i), z * , r»; 0)]/ E ^ . i = l k=l i = l n m i _ n R = E E e5ly0feS,i; z*> r i ; )̂/ E m i n m i /ct m n ~ E E E Efej CyKyJn-B . i , Yobs.i), Z i , r i ; 0)/ E i = l j=l k=l i=l A = E ̂ (ajaf | y o 6 S i i , zi; iy 0)/n i = l . ~ E E ^ ( a i a f l ( y m i 5 , i > yobs,i), z<, v{; 0)//c t n, i = l fc=l 5 = E - ^ ( b i b f | y o 6 S ] i , " Z i , v^, 0)/n i = l - E E ^ ( b i b F I ( y m i S , i ' y«*».0. z»> r » ; 0 )A^- i = i fc=i Plugging 52 and .R into E r = i [ - ^ ( a > ^ , <52) + ^ ( a i ^ ) ] and setting its first derivative with respect to (oc, 3) equal to 0 , we can get a set of equations for 6c and 3. ( ^ i-1 v j = l 7 i-l n E xf Wi i=l E xf x{ i=i a E l ^ ^ + ^ E c / ^ - 1 ^ . } i = l j=l J \ ZXfvyi, t=i 93 where u y i = E(yi\yahati, Z i , ry, 0) - gi- Hi E(ai\yobSti, zu r̂ ; 0) - T{ £ ( b i | y o h S i i , z i ; r - 0), = zij — E(&i\yobs>i, Zi, ry 0). The foregoing conditional expectations can be approximated as follows. kt E(yi\y0bs,i, Z i , ry 0) « E ^ y ™ l , i ' Yobs,i)/kt, k = i kt E(ai\yobSii, Zi, ry 0) « ^ ^ M ^ m l , ; , y<**.0> z i , ry 0)/fct, k = i kt E(bi\yobs,i, Zi, n; 0) « E ( b i | ( y f s i , y o f e s > i ) , r ;̂ 0)/fc t . fc=i Using the inverse operation of partition matrices, we obtain closed-form estimates of a and 3 in the M-step: n r „ nii ^ n " « = r n £ w^yi + 52izuljR-^Zij + r 1 2 £ x > y i , i=l L j=l J i=l n n i=l L j=l J i=l where r n -M + Q L"1 QT, T12 = F21 = -Q L~\ and T 2 2 = L ' 1 , with n n mi i=l i=l j=l L = E x f X i - ( E ^ ^ ) M ( E : ^ X i ) , i=l i=l i=l n Q = M ( E ^ , ) . i=l We finally maximize E"=i ^i 1!) to obtain an updated estimate of 77, which can be done by standard optimization procedures such as the Newton-Raphson method. Iterating between the above E-step and M-step until convergence, we obtain approxi- mate M L E s of 0. The asymptotic variance-covariance matrix of the approximate M L E 0 of t 94 0 can again be obtained using the approximate formula given in (4.5). The only difference is that the density function /y(yj|zj, a;, by a, 3, 52) in if1 in (4.1) is based on the L M E response model (4.6). We see that, for this approximate method, both the E-step and the M-step are computationally much less intensive than those in Section 4.3. The performance of this approximate method will be evaluated in Section 4.6. 4.5 Example We illustrate our proposed methods in this chapter using a HIV dataset. We also analyze this dataset using the commonly-used naive method which ignores measurement errors and missing data for comparison. 4.5 .1 Data Description The dataset includes 53 HIV infected patients who were treated with a potent antiretroviral regimen. Viral loads (Plasma HIV-1 R N A copies) were measured on days 0, 2, 7, 10, 14, 21, 28 and weeks 8, 12, 24, and 48 after initiation of treatments. After the antiretroviral treatment, the patients' viral loads will decay, and the decay rates may reflect the efficacy of the treatment. Throughout the time course, the viral load may continue to decay, fluctuate, or even start to rise (rebound). The data at the late stage of study are likely to be contami- nated by long-term clinical factors. Various covariates such as CD4 count were also recorded throughout the study on similar schedules. It is well known that CD4 counts are usually measured with substantial errors. The number of response measurements for each individual varies from 6 to 10. Five patients dropped out of the study due to drug intolerance and other problems and sixteen patients have missing viral loads at scheduled time points. There were 104 out of 403 CD4 measurements missing at viral load measurement times, due mainly to 95 a somewhat different CD4 measurement schedule. A detailed data description can be found in Wu and Ding (1999). Six patients are randomly selected and their viral loads are plotted in Figure 1.1. Due to long-term clinical factors, drug resistance, and other complications, the viral load trajectories can be very complex after the initial phase viral decay. Visual inspection of the raw data seems to indicate that dropout patients appear to have slower viral decay, compared with the remaining patients. Thus, the dropouts are likely to be informative or nonignorable. The CD4 count trajectories for six randomly selected patients are plotted in Figure 1.2. There exists large variability in CD4 count between patients. The population CD4 count trajectory appears to have a quadratic polynomial shape. 4.5.2 The Response and the Covariate Models Based on Wu (2002) and Wu and Zhang (2002), we consider the following HIV viral dynamic model (see Section 3.5 for the details) Vij = \ o g w ( P u e - X i i ^ + P2ie-X2iJtii) + eij, (4.8) log(Pii) = 0i + bu, \UJ = fa + fcz-j + b2i, (4.9) log(P 2 i) = 04 + hi, X2ij = w(tij) + hi{Uj), (4.10) where yi3- is the logi0-transform of the viral load measurement for patient i at time Uj, Pu and P2i are baseline values, A^- and A 2 i j are viral decay rates, z*j is the true (but unobservable) CD4 count, and w(Uj) and hi(Uj) are nonparametric fixed- and random-effects functions (see Section 2.1). To avoid very small (large) estimates, which may be unstable, we standardize the CD4 counts and rescale the original time t (in days) so that the new time scale is between 0 and 1. As discussed in Section 2.1, we employ the linear combinations of natural cubic splines 96 Table 4.1: A I C and BIC values for the model (4.8) - (4.10), with q < p = 1, 2, 3. Model p=l,q= 1 p=2,q=2 p=2,q=l p=3,q= 3 p=3,g= 2 p=3,q= 1 A I C 622.92 BIC 685.37 590.82 591.37 584.51 676.68 662.95 677.16 593.32 671.76 583.19 657.72 with percentile-based knots to approximate w(t) and hi(t). Following Wu and Zhang (2002), we take the same natural cubic splines with q < p in order to decrease the dimension of random effects. A I C and BIC criteria are used to determine the values of p and q. Table 4.1 displays A I C and BIC values for various plausible models. Based on these A I C and BIC values, the model with p = 3 and q = 1, i.e., seems to be the best, and thus it~is selected for our analysis. For the CD4 process, in the absence of a theoretical rationale, we consider empirical polynomial L M E models, and choose the best fitted model based again on A I C / B I C values for each possible model. This is done based on the observed CD4 values, and is done separately from the response model for simplicity. Specifically, since the inter-patient variation is large, we consider model (2.8) with Uu = Vu = (1, uu,..., v^f1) and linear (a = 2), quadratic (a = 3), and cubic (a = 4) polynomials. Table 4.2 presents A I C and BIC values for these three models. The following quadratic polynomial L M E model best fits the observed CD4 process: where uu is the time and ot = (a\, a 2 , a3)T are the population parameters and at = (flii, fli2, 0'i3)T are the random effects. A 2ij ~ ft + fa ^ i ( iy) + ft foiUj) + b4i (4.11) C D 4 i ( = ( « i + ai) + ( « 2 + a 2) ua + (a 3 + a 3) u2t + eu (4.12) 97 Table 4.2: A I C and BIC values for the linear, quadratic, and cubic CD4 models Model a=2 a=3 a=4 A I C BIC 806.58 715.80 752.15 830.00 774.34 791.18 4.5.3 D r o p o u t M o d e l s a n d S e n s i t i v i t y A n a l y s i s In this study, dropout patients appear to have slower viral decay, compared with the remain- ing patients. Thus, dropouts are likely to be informative or nonignorable. So we need to assume a model for the dropout process in order to make valid likelihood inference. Although dropout models are not verifiable based on observed data, subject-area knowledge and sen- sitivity analysis based on plausible models may still lead to reasonable models. Note that we should avoid building a too complicated dropout model since a complicated model may become non-identifiable (Fitzmaurice et al. 1996). Subject-area knowledge suggests that dropout may be related to current or previous viral load and CD4 measurements. Thus, we consider the following five plausible dropout models for sensitivity analysis Model I : logit[F(r i j = l|yi, zi'i V)] = Vi + V2CDAij + rj3yij Model II : logit[P(r y = i|yi, z i i V)] = vi + V2Vi,j-i + myij, Model III : logit[P(rjj = i|y<, zi\ V)] = Vi+ mCD^j + rj3yitj Model IV : logit[F(r i j = i|y», z i ' , *)] = m + V2yik, k<j, Model V : logit[F(ry = i|y<, zi> V)] = Vl+rl2CD4*j, where y^ (k < j) in Model IV is the last observed response and CD A*- in Model V is the estimated true CD4 value for individual i. Thus Models I - III represent possible nonignorable missing response models, Model IV represents a possible ignorable missing response model, and Model V relates dropouts to (estimated) true CD4 values. We also 98 considered the following ignorable missing response models: logit[P(r;j = l | y i , zfT])] = r)X +r)2yn logit [P(r^ = l|y*, zf 77)] = 771 + 7 7 2 ^ , but the resulting estimates are similar to those for Model IV, so we only present results for Model IV in Table 4.3. We assume independence of the r^-'s to simplify the model. 4.5.4 Estimation Methods and Computation Issues We estimate the model parameters using the naive method which ignores measurement errors and missing data and the two proposed "joint" model methods discussed in Sections 4.3 and 4.4. We denote the method in Section 4.3 by A P P R 1 and the method in Section 4.4 by A P P R 2 . The two proposed joint model methods need starting values for model parameters since they are implemented by M C E M algorithms. We use the parameter estimates obtained by the naive method as parameter starting values for the two joint model methods. For the naive method, we use the SPLUS function nlme() and ImeQ to obtain param- eter estimates and their default standard errors. For the two proposed joint model methods, we assess the convergence of the Gibbs sampler by examining time series plots and sample autocorrelation function plots. For example, Figures 4.1 and 4.2 show the time series and the autocorrelation function plots for b2 associated with patient 14. From these figures, we notice that the Gibbs sampler converges quickly and the autocorrelations between successive generated samples are negligible after lag 17. Time series and autocorrelation function plots for other random effects and missing responses show similar behaviors. Therefore, we discard the first 500 samples as the burn-in, and then we take one sample from every 20 simulated samples to obtain independent samples (see sampling methods in Section 4.3.3). We start with ko = 500 Monte Carlo samples, and increase the Monte Carlo sample size as the number 99 O 2 0 0 4 0 0 G O O 8 0 0 1 0 O O Iteration Figure 4.1: The time series plot for b2 associated with patient 14. Series : b2 t L T Figure 4.2: The autocorrelation function plot for b2 associated with patient 14. 100 t of iteration increases: kt+i = kt + kt/c with c = 4 (Booth and Hobert, 1999). Conver- gence criterion for these two joint model methods is that the maximum relative change in the parameter estimates from successively iterations is smaller than 0.05. Convergence of the algorithms are considered to be achieved when the maximum percentage change of all estimates is less than 5% in two consecutive iterations. We use the multivariate rejection sampling method for the two proposed joint model method. On a S U N Sparc work-station, the A P P R 1 method took about 140 minutes to converge while the A P P R 2 method took only 12 minutes to converge. This shows that A P P R 2 offers a big reduction in computing time, and thus is computationally much more efficient than A P P R 1 . 4.5.5 Analysis Results We estimate the model parameters using the naive method and the two proposed joint model methods A P P R 1 and A P P R 2 . We use the parameter estimates obtained by the naive method as the parameter starting values for the A P P R 1 and the A P P R 2 methods. We also tried several other parameter starting values for the proposed methods. Different parameter starting values appear to lead to roughly the same parameter estimates for both the A P P R 1 and the A P P R 2 methods. Table 4.3 presents the resulting parameter estimates and standard errors based on models / , IV, and V in (4.13). We find that the two joint model methods provide similar parameter estimates. We also find that the naive method may severely under-estimate the covariate CD4 effect (i.e., /?3) and may poorly estimate some other parameters as well (this will be confirmed by simulation). For the different dropout models in (4.13), we find that the resulting estimates based on the three nonignorable models (Models I, II, III) are all similar, which indicates that the estimation may be robust against the nonignorable dropout 101 Table 4.3: Parameter estimates (standard errors) for the models in the example. Model method Oil Oil a3 Pi P2 ft PA P5 / V PT 6 R N A I V E - - - 11.72 65.71 0.84 6.87 -2.58 8.66 -1.90 .35 - - — (•2) (3.8) (3.2) (.6) (5.5) (8.9) (3.1) Model I A P P R 1 -.42 4.15 -3.75 11.72 67.08 1.52 6.97 -1.83 7.75 -2.54 .35 .51 (•1) (.5) (•6) (•2) (5.2) (6.2) .(•7) (5.8) (8.8) (3.5) A P P R 2 -.43 4.21 -3.78 11.70 66.97 1.50 6.96 -1.90 7.86 -2.63 .33 .50 (•1) (.6) (.6) (•2) (4.4) (5.8) (.6) (5.5) (7.9) (3.0) Model IV A P P R 1 -.43 4.18 -3.75 11.73 66.52 1.37 6.89 -2.62 8.83 -1.92 .35 .51 (•1) (•5) (•6) (•2) (5.0) (6.0) (•7) (5.9) (8.9) (3.1) Model V A P P R 1 -.43 4.21 -3.80 11.74 66.79 1.44 6.89 -2.50 8.60 -1.98 .35 .50 (•1) (.6) (.6) (•2) (4.9) (6.1) (-7) (5.9) (8.9) (3.1) Note: A and B are unstructured covariance matrices, but we only report the estimates of their diag- onal elements here. Diag(A) = (.50,3.65,1.61) for APPR1, Diag(A) = (.52,3.80,1.66) for APPR2. Diag(B) = (1.08,77.12,2.03,24.98) for Naive, Diag(B) = (1.10,75.50,2.01,26.51) for APPR1, and Diag(B) = (1.09, 75.24,1.83, 22.37) for APPR2. models. The estimates based on the ignorable models (Models IV and V ) , however, appear, to be somewhat different, especially for the parameters associated with the decay rates XUJ and A 2 ij . This suggests that the missing responses (dropouts) may be nonignorable, and reliable likelihood estimation must incorporate a reasonable nonignorable missing response model. Although some estimates in Table 4.3 are not statistically significant, the values of the estimates may still provide useful information about viral load and CD4 trajectories. The estimates of parameters 772 and 773 in dropout model I based on A P P R 1 method (or A P P R 2 method) are -.05 and .97 (-.04 and 1.06) respectively, with both p-values less than 0.001, which also indicates that the dropouts may be nonignorable (or informative) since the missingness may depend on the missing values. The estimates of 772 and 773 indicate that dropout patients seem to have lower CD4 counts and higher viral loads than the remaining patients. 102 4 . 6 A Simulation Study In this section, we conduct a simulation, study to i) evaluate the performances of the semi- parametric modelling and of the A I C / B I C knots selection method, ii) assess the two proposed methods (APPR1 and APPR2) and compare them to the naive method (NAIVE), and iii) evaluate the impact of specification of the missing response mechanisms on parameter esti- mation. We generate 100 datasets from the following model, which corresponds to the model (4.8) - (4.10), y i j = l o g 1 0 ( P i i e - A l ^ + P M e - A a « * « ) + eii, (4.14) log(P u ) = A + Xiij = fez?, + b2i, log(P 2 i) = Ai + hi, (4.15) X2ij = -2.2 + (5.3 + 0.164*) sin(0.04 + 3ty), (4.16) where the nonparametric model (4.16) is carefully chosen to closely mimic the viral load trajectory at later stages in the example of the previous section. The covariate model and the measurement time points used in the simulation are the same as those in the example of the previous section. The true values of model parameters are similar to those in the example. The true values of (52, /33, /34) and a are presented in Table 4.4, and the other true parameter values are 5 = .2, R = .4, A = diag(.5, 3, 2), and B = diag(l, 9, 2, 4). Note that b» = (bn,bi2,bi3,bii) ~ N(0,B), where b& is incorporated in the nonparametric model (4.16). There are 147 viral loads after the rescaled time 0.25. We regard the viral loads out of 147 greater than the 45th percentile as missing responses. Thus, we delete 20% largest response values at the last few time points to mimic an informative missing response mechanism (i.e., missingness depends.on the values being missing). The structure of the data generated by the simulation study is similar to that in the example in Section 4.5. We calculate averages of the resulting estimates and their standard errors based on 103 each method, and compare the methods by comparing their biases and mean-square-errors (MSEs). Here, bias and M S E are assessed in terms of percent relative bias and percent relative root mean-squared error, as defined next. For instance, the bias for ft, the j th component of 3, is defined as biaSj = J3j- — fij, where ft is the estimate of ft. The mean-squared error for ft is defined as MSE.,- = bias2 + s), where Sj is the simulated standard error of ft. Then, the percent relative bias of ft is defined as bias^/lftl x 100%, and the percent relative root M S E is •^MSEj/l/3,-1 x 100%. First, to evaluate the nonparametric modelling, we study the performance of the AIC and BIC criteria in selecting the numbers of knots (p and q), since these numbers represent the degrees of smoothness of nonparametric functions (too large/small values may result in overfit/underfit). For the 100 datasets simulated from the semiparametric N L M E model (4.14) - (4.16), we find that all BIC values and 97% of A I C values lead to the model (4.11) (i.e., p = 3, q = 1). To further evaluate the A I C and BIC methods, we also generate data from models (4.8) - (4.12) with (/35, 06, (37) = (-2.0, 8.0, -3.0) (so the true number of knots are known), and use the A I C and BIC methods to select the best model. The performance of the A I C and BIC methods is similar. These results show that the A I C and BIC criteria perform well in the current setting. 104 Table 4.4: Simulation results for the parameter estimates (standard errors) as well as their biases and MSEs for the estimation methods P A R A , A P P R 1 , and A P P R 2 . Parameter ai a2 a 3 02 03 04 0s 06 07 True Value -0.4 4.0 -4.0 12.0 67.0 1.5 7.0 P A R A - - - 11.94 60.91 -0.62 6.48 0.20 - - - (•1) (1.5) (1.8) (•2) (•1) A P P R 1 -0.39 4.05 -3.99 12.00 66.87 1.49 7.11 -1.89 9.65 -1.66 (•1) (.3) . (.3) (•1) (1.3) (1.6) (.3) (2.0) (3.1) (1.1) A P P R 2 -0.39 4.06 -4.01 12.00 66.25 1.59 6.90 -3.11 10.18 -1.47 (•1) (.3) (•3) (•2) (1.1) (1.4) (•3) (1.7) (2.6) (1.0) Bias P A R A - - - -.48 -9.09 -141.03 -7.43 A P P R 1 1.28 1.27 .17 .01 -.19 -.60 1.29 A P P R 2 1.73 1.55 -.18 .03 -1.07 6.99 -1.36 M S E P A R A - - - 1.71 9.38 230.93 8.18 A P P R 1 21.33 10.02 9.87 1.12 2.48 98.67 4.12 A P P R 2 25.40 10.49 10.56 1.34 2.73 100.94 5.06 Note: Bias = Percent bias = 100 x b\asJ/\PJ\; MSE = Percent V M S E = 100 x , / M S E " / \ P J \ . 105 To investigate the effect of semiparametric modelling on the estimation of the fixed- effects parameters ft - ft, we consider the two proposed methods (APPR1 and APPR2) for the semiparametric N L M E model (4.14) - (4.16), along with the covariate model (4.12) and a nonignorable dropout model (Model I in (4.13)). We also use a parametric N L M E model, where \2ij = ft + b^ and the other parts are the same as in the semiparametric N L M E model (4.14) - (4.16), to fit the simulated datasets. To emphasize the difference between the nonparametric and the parametric modelling for X2ij, we consider an ideal case for the parametric N L M E model fitting, in which there is no covariate measurement errors and dropouts. Thus, we do not need the covariate measurement error model and the dropout model in this ideal case. We use SPLUS function nlmeQ to obtain parameter estimates and their default standard errors, denoted by the P A R A method. We calculate averages of the resulting estimates and their standard errors based on each method. Since A P P R 1 method sometimes offers computational problems, such as slow or non-convergence, the 100 sets of parameter estimates are obtained from 137 data sets. The simulation results are shown in Table 4.4. We find that estimates for the fixed-effects parameters ft - ft obtained by A P P R 1 and A P P R 2 are very close to their true values, and both methods perform better than the P A R A method in terms of bias and M S E criteria. These results show that the semiparametric modelling based on A I C / B I C for knots selection performs well and better than the parametric modelling in the current setting. To study the effect of missing data mechanisms, we assess the proposed methods (APPR1 and APPR2) based on a nonignorable model (Model /) and an ignorable model (Model IV), and compare them with the naive method (which ignores measurement errors and missing data). To investigate the performance of the estimate of X2ij, we generate 100 datasets from the true models (4.8) - (4.12). In the simulations, the true values of model parameters 0 and a are shown in Table 4.5, and the other true parameter values, the missing 106 Table 4.5: Simulation results for the parameter estimates (standard errors) for the three estimation methods N A I V E , A P P R 1 , and A P P R 2 with dropout models I and IV in (4.13). Dropout Parameter Oil "2 a 3 ft ft ft ft ft ft ft Model True Value -.4 4.0 -4.0 12.0 67.0 1.5 7.0 -2.0 8.0 -3.0 N A I V E - - • - 11.98 66.87 0.92 6.98 -2.41 9.64 -2.03 - - - (•1) (1.2) (1.1) (0.3) (1.9) (2.9) (1.0) Model I A P P R 1 -.40 4.08 -4.00 11.99 66.93 1.48 7.01 -2.04 8.11 -2.92 (•1) (.3) (•3) (•1) (1.4) (1.6) (.3) (1.9) (3.1) (1.2) A P P R 2 -.40 4.06 -4.01 11.99 66.86 1.53 7.03 - 2 . i i 8.17 -2.87 (•1) (.3) (.3) (•2) (1.3) (1.5) (.3) (1.8) (2.8) (1.0) Model IV A P P R 1 -.40 4.06 -3.99 12.00 67.15 1.45 7.01 -1.81 8.97 -2.22 (•1) (.3) (.3) (•1) (1.3) (1.6) (.3) (2-1) (3-2) (1.2) A P P R 2 -.39 4.06 -4.00 11.99 66.80 1.42 6.96 -1.78 9.08 -2.28 (•1) (.3) (.3) (•2) (1.2) (1.5) (.3) (1.8) (2.8) (1.0) mechanism and the missing rate are the same as above. We calculate averages of the resulting estimates and their standard errors based on each of the three methods and each of the two dropout models. We compare the methods by comparing their biases and mean-square-errors (MSEs). Since A P P R 1 method sometimes offers computational problems, such as slow or non-convergence, the 100 sets of parameter estimates are obtained from 130 data sets. From the simulation results in Tables 4.5 and 4.6, we see that, when measurement errors and reasonable missing data mechanisms (Model /) are taken into account, the two proposed joint model methods (APPR1 and APPR2) perform well in terms of both bias and M S E criteria. A P P R 1 performs better than A P P R 2 as expected, but A P P R 2 also performs reasonably well and is computationally much more efficient. When the missing data mech- anism is ignored (Model IV), however, the two methods may not perform well. The naive method, which ignores measurement errors and missing data, may lead to severely biased estimates and large MSEs (e.g., the covariate effect ft can be severely under-estimated). 107 Table 4.6: Simulation results for biases and MSEs of the parameter estimates for the three estimation methods N A I V E , A P P R 1 , and A P P R 2 with dropout models I and IV in (4.13). Dropout Parameter Oil OL2 QJ3 Pi ft ft ft ft ft ft Model True Value -.4 4.0 -4.0 12.0 67.0 1.5 7.0 -2.0 8.0 -3.0 Bias N A I V E - - - -.10 -.25 -38.42 -.61 -20.53 20.45 32.33 Model I A P P R 1 -.78 1.95 .01 -.08 -.11 -1.59 .18 -2.21 1.40 2.62 A P P R 2 -.78 1.95 -.01 -.09 -.21 -1.78 .50 -5.70 2.08 4.30 Model IV A P P R 1 -.97 2.02 .16 -.08 .22 -3.63 .19 9.26 12.10 23.10 A P P R 2 1.40 2.07 -.09 -.09 -.29 -6.91 -.55 10.66 13.57 26.02 M S E N A I V E - - - 1.89 2.97 148.06 6.96 168.66 68.32 70.14 Model I A P P R 1 20.00 7.28 7.00 1.17 1.84 96.01 3.43 80.02 35.15 40.42 A P P R 2 25.00 7.65 8.00 1.25 2.23 98.69 3.74 112.64 39.06 43.21 Model IV A P P R 1 30.51 9.90 9.84 1.45 2.49 120.12 4.84 122.80 56.26 58.90 A P P R 2 36.50 10.46 10.31 1.74 2.58 131.24 5.73 146.04 62.14 66.07 Note: Bias = Percent bias = 100 x biaSj/lPj ; MSE = Percent V M S E = 100 x V M S E J V I / 3 , - 1 . 4 . 7 Conclusions and Discussion We have proposed two approximate likelihood methods for semiparametric N L M E mod- els with outcome-based informative dropouts and covariate measurement errors and missing data, implemented by Monte Carlo E M algorithms combined with Gibbs sampler. The first method is more accurate than the second method but it may be computationally very in- tensive and sometimes may offer computational difficulties such as slow or non-convergence, especially when the dimensions of random effects are not small. The second method is com- putationally much more efficient, but it is less accurate than the first method. The second method may be used as a reasonable alternative when the first method has convergence prob- lems or it may be used to provide excellent parameter starting values for the first method. Simulation studies indicate that the proposed methods, which incorporate measurement 108 errors and dropout mechanisms, produce satisfactory results, but methods ignoring mea- surement errors and/or ignoring dropout mechanisms may perform poorly. Moreover, the A I C and BIC criteria perform well in the current setting. ^ We have assumed that the dropout models depend on the observed or unobserved responses and covariates. Alternatively, we may consider dropout models which share the random-effects parameters in response and covariate processes. Such models may be appro- priate if the dropout mechanism is related to the true but unobservable response/covariate values or summaries of response and covariate processes such as unobservable true viral decay rates. The methods in this chapter may be extended to such models. Finally, for Monte Carlo E M algorithms, Booth and Hobert (1999) proposed a nice automated method for choosing the number of Monte Carlo samples, which can be extended in our case as well. 109 Chapter 5 Semiparametric N L M E Model with Random-effect-based Informative Dropouts and Covariate Measurement Errors 5.1 Introduction In this chapter, we develop two likelihood methods to simultaneously address covariate mea- surement errors and random-effect-based informative dropouts in semiparametric N L M E models. The major difference in the models in this chapter and the models in Chapter 4 is the difference in the assumed missing response (or dropout) models. The response and covariate models remain the same. In Section 5.2, we discuss the models for this problem. We obtain approximate M L E s of all model parameters, using a Monte Carlo E M ( M C E M ) algorithm along with Gibbs sampler methods, in Sections 5.3. To avoid potential compu- 110 tational problems in the method discussed in Section 5.3, we also propose an alternative approximate method by using a first-order Laplace approximation to the log-likelihood func- tion in Section 5.4. Some asymptotic properties of the resulting estimates are also discussed. The two proposed methods are illustrated in a HIV dataset and are evaluated via simulation in Section 5.5. We conclude this chapter in Section 5.6 with some discussion. Asymptotic properties presented in Section 5-4 are proved in Section 5.7. 5.2 Missing Response Models The dropout is random-effect-based informative if the missing probability of the current re- sponse depends only on the underlying unobservable random effects a j and b j , i.e., / ( r j | y j , Z j , rj) = / ( r j | a j , b j ; rj). In the presence of random-effect-based informative dropouts in the N L M E model (2.6) and (2.7) with the covariate process (2.8), we can again write v* = (ymis,i, y<jbs,i) as before. Let nobSti be the number of components in y0bs,i- Here, the missing 2/ij's are again intermittent, i.e., we allow dropout individuals to possibly return to the study at a later time. For the vector g, = (gn,... ,gini), where ^ are defined in (3.6), we write gi = (grms.i, gobs,i) with gohS)i and g m i S : i being the conditional expectation of y o b S i i and y m i S , i , respectively. Let r j = (rn,..., rini)T be a vector of missing response indicators for individual i such that = 1 if is missing and 0 otherwise. Note that = 1 does not necessarily imply that r i j + i = 1. We have the observed data {(y 0 6s , i , Z j , i"i), i = 1,.. . ,n}. To allow for an informative missing mechanism in the response, we need to assume a distribution for the missing response indicator r j . For reasons discussed in Sections 4.2 and 4.7, in this chapter we consider the random-effect-based informative dropout mechanism / ( r j | a j , b j ; rj) for i y where rj are the unknown nuisance parameters. For such a missing data mechanism, the missingness of yid- share the random effects a j and b j in the response 111 and covariate models, suggesting that the dropout may be related to the true but unobserv- able response/covariate values or summaries of individual-specific response and covariate trajectories such as unobservable true viral decay rates. For example, such missing data models may be appropriate if a patient is more likely to dropout early because his true (but unobservable) viral decay rate is slower than other patients. In this chapter, we focus on the random-effect-based informative missing mechanism /(rj |aj, bf 77). Such models are related to the shared-parameter models in the literature (e.g., Wu and Carroll, 1988; Little, 1995; Ten Have et al., 1998). Although the relationship of the missingness with the random effects may be complex, a simple logistic regression model may provide a reasonable approximation. We will assume that r^'s are independent, i.e., fin]*, b<; 77) = IJ[P(r y = I K b i 5 7 7 ) p [ l - P(rij = I K b i ; » 7 ) ] 1 - r «, (5.1) with logit[P(rij = I K bf 77)] = log — —— r = 770 + r)1 + 772 b;, . l — r\Tij — l | a i , Oi, rj) where 77 = (770, r/J, r]2)T are the unknown nuisance parameters. For example, we may assume that the missingness of response is related to the first decay rate, say XUJ = ft + bi2 in Section 4.5., i.e., logit[F(ry = I K h i , 17)] = Vo + Vi xUJ = V* + rjl(p2 + bi2) = (Vo + V*iP2) + rilbi2 = r]o + r]ibi2. Note that we should avoid building too complicated a dropout model since the model parameters may become non-identifiable. As the assumed dropout models are not testable 112 based on the observed data, it is important to carry out sensitivity analysis based on various the outcome-based informative dropout models in Chapter 4 can all be used for sensitivity analysis. We consider likelihood inference for semiparametric N L M E models with random-effect-based informative dropouts and measurement errors and missing data in time-varying covariates, based on approximate models (2.6) - (2.8). Let 6 = (ct, 0, 52, R, A,B, rjj) be the collection of all unknown model parameters. We assume that the parameters ct, 0, 52, R, A, B, and rj are distinct. The approximate log-likelihood for the observed data {(y0bs,i, zi> r i ) , i = 1,. . . , n} can be written as dropout models. The random-effect-based informative dropout models in this chapter and 5.3 A Monte Carlo E M Method 5 .3 .1 T h e L i k e l i h o o d F u n c t i o n n 1(d) = ) fz(zi\&i;ct,R) f(sa;A) x / ( b i ; B ) / ( r i | a i by rf) dymiSti dfy dbi n II £ l o g /V(y06s,i|Zi> ai bh a , 0, 52) fz(zi\*i\ct, R) / ( a * ; A) i=l x/(b;; B) / (r; |a;, b ;̂ rj) da^dbi (5.2) 113 where /y(y 0b s,i|zi, a*, bj; ex, (3, 52) = n"=i'' fy(yobs,ij\zij, a*> b » ; <*> P> = nST"' (27r (52)-1/2exp{-[yobs, i j - geb.Ai]*/2P}, /z(zi|aij a, i?) = n r=i fzfak\ai\ «, = Ilfcii |27r/?|_1/2 exp{-(z i f c - Uj'fe a - v i f c a;)r i?" 1 x(z i A ; - uika - vifca;)/2}, /(a*; A) =\27rA\-^2exP{-ajA-^/2}, f(ht; B) = |27rJB|-1/2 exp{-bfB-1b,/2}, and y06s,ij is the observed yij. Note that unlike 1(6) in Section 4.3, the missing responses ymiS,i a r e integrated out in (5.2). The observed-data log-likelihood function 1(6) generally does not have a closed-form expression since the functions in the integral can be nonlinear in the random effects &i and bj. So we use a Monte C a r l o - E M ( M C E M ) algorithm to find the approximate M L E s of parameters 6. By treating the unobservable random effects a; and b, as "missing" data, we have "complete data" {(y0bs,i, zi, *i, ai, bj), i = l , . . . , n } . The complete-data log- likelihood function for all individuals can be expressed as - n n ldG) = YI Zc°(0) = XX l o g fy(yobs,i\zi, a(, hi, ex, 0, 52) + log /z(zi|ai; ex, R) i=l i=l + log / (a l ; J 4)+log / (b , ; J B) + l o g / ( r t | a i , h%] r,)} (5.3) where if is the complete-data log-likelihood for individual i. 114 5.3.2 A M C E M Algorithm Let be the parameter estimates from the t-th E M iteration. The E-step for individual i at the (t + l)th E M iteration can be expressed as Q i ( 0 | 0 « ) = E{lf{9)\yobs,u Z i , r i 5 0 « ) = J J J [log /y(yo b s,i|zj, a*, b i ; a , 3, 52) + log /z(zi|ai; a, i?) + log /(a i ; A) + log /(bi; 5 ) + log /(ri|ai, bi; 17) x /(ai, bi|yo 6 s,i, Z i , n; 0 ( t ) ) dat dbi = /W(a, /3, <52) + /« (« , J?) + 4V) + + i f f a ) . (5.4) Since the expression (5.4) is an expectation with respect to /(ai, bj|y0j,Sjj, Z j , i y 0^), it may be evaluated using the M C E M algorithm. Specifically, we may use the Gibbs sampler to gen- erate samples from [ai, b^y^i, Z j , i y 0^] by iteratively sampling from the full conditionals [ai|y06s,i, Z j , r i , bi; 0W] and [bi|vofcs,i, z i ; rj, a*; 0W] as follows. /(ai|yo 6 s,i, Z i , rj, b»; 0<*)) a f(yobSti, a ^ , rt, bij 0W) =/(y0fcs,i|zi, ri, ai,bi; 0^') •/(ai|zi, ri, bi; 0W) oc /(y06s,i|zi, ri, ai,bi; 0W) • /(ri, a ẑ̂ bij 0 ( t ) ) = /(y0bs,i|zi, ai,bi; 0W) • /(ri|zi, ai,bi; 6{t)) • / ( a^z* ,^ 0 ( t ) ) = /(YcUzi, ai, b i ; 00>) • /(r-i|ai, b i 5 9®) • / ( a ^ ; 0 « ) oc /(yo b s,i|zi, ai,bi; 0(*>) • / ( r ^ a ^ h i ; 0W) • / ( z i ? ai; 0(*>) = /(ai; 0(*)) • / ( y ^ l z i , ai, bt; 0W) . / ( z ^ ; 0(0) . / ( r . |a , , b,; 0W) /(bi|yo b s,i, Z i , ri, ai; 0(*)) oc f{y0bs,i, b^Zj, ri, ai; 0(*>) = /(y 0b5,i|zi, ri,.ai,bi; 0 ( i ) ) • /(bi|zi, r i ; aj; 0 W ) OC /(yofcs,i|zi, T i , ai,bij 0W) • / ( J*i, bi |Zi, a i ; 0W) = /(y 0h5,i|zi, ai,bi; 0W) • / (ri | Z i , a,,^; 0<*>) • /(b^z^a*; 0 ( t ) ) = /(b i 5 0 « ) • /(yofcs,i|zi, a,, b,; 0(«>) • /(ri|ai, b i ; 0 « ) . 115 Monte Carlo samples from each of the full conditionals can be generated using rejection sampling methods.. Alternatively, integral (5.4) may be evaluated using the importance sampling method (see Section 3.3.3). We will briefly discuss the sampling methods in the next section. Note that, unlike the M C E M method in Section 4.3.2, here we do not need to sample Ymis,i, which reduces the computational burden. For individual i, let { ( a f , b f ) , . . . , ( a f \ b f ' ) } denote a random sample of size kt generated from [a*, b i \ y o b S i i , Z j , ry, 0^}. Note that each ( a f , b f ) depends on the E M iteration number t, which is suppressed throughout. The E-step at the ( £ + l ) t h E M iteration can then be expressed as Q(6\0M) = ± Qi{9\9®) « ± h E lf(0; yobs,u zh iy a f , b f ) } i=i i=i I fe=i J = E E 17 log fy(yobs,iK a f , b f ;a, 3, 52) i=i fc=i + E E i log / z f e l a f ; a , R) + £ £ £ log / ( a f ; A) i=lfc=l i=lfc=l + E E i log / ( b f ; B) + £ E £ log / W a f , b f ; r,) t=lfc=l i=lfc=l = Q ( 1 ) ( « , /3, <52|0«) + Q<2)(a, J?|0W) + Q( 3 ) (A | 0« ) + Q^{B\0^) + Q^{rj\0W). The M-step then maximizes Q(0\0^) to produce an updated estimate 0^t+l\ so it is like a complete-data maximization. Since the parameters in + Q< 2 \ Q^, Q<4), and are distinct, the M-step can be implemented by maximizing + Q^2\ Q^3\ and separately using standard optimization procedures for the corresponding complete-data models. As in Section 3.3.2, we use the approximate formula suggested by McLachlan and Krishnan (1997) to obtain the variance-covariance matrix of the approximate M L E 0: Let sf = dlf /80, where if is the complete-data log-likelihood for individual i. Then an 116 approximate formula for the variance-covariance matrix of 9 is n Cov(0) = [J2E(sc]\yobs,i, Z i , rf, 9) E{sf\yobs^, zit r i ; 9) where the expectations can be approximated by Monte Carlo empirical means, as in (5.6). In summary, the foregoing M C E M algorithm can be implemented as follows. Step 1. Obtain an initial estimate of (a, 0, S2, R, A, B) = (a ' 0 ' , / 3 ( 0 ) , 62^°\ Bl°\ A<°\ B^) and an initial value of (aj, bj) = (af\ b-0^) based on a naive method; then we ob- tain an initial estimate of 77 = rj^ based on the dropout model with the random effects (ai>bi) = (aS°\bW). Step 2. At the t-th iteration, obtain Monte Carlo samples of the random effects (aj, bj) using the Gibbs sampler along with rejection sampling methods, or using importance sampling methods to approximate the conditional expectation in the E-step. Step 3. Obtain updated estimates #( t + 1) using standard complete-data optimization procedures. Step 4. Iterate between Step 2 and Step 3 until convergence. 5.3.3 S a m p l i n g M e t h o d s G i b b s Sampler As in Section 3.3.3, we can again use the Gibbs sampler to draw the desired samples as follows. Set initial values (&f\ bf^). Suppose that the current generated values are {a\k\ bf^), we can obtain (a[k+1\ b\k+l^) as follows. Step 1. Draw a sample for the "missing" random effects a f ^ from /(aj |y 0 h S ] j , z i : i y b^; 9^). Step 2. Draw a sample for the "missing" random effects b\k+1^ from/(bj |y 0 t S i j , ziy Ti, a| f e + 1^; 9^). After a sufficiently large burn-in of r iterations, the sampled values will achieve a steady state. Then, {(a| f e\ bf^), k = r + 1, . . . , r + kt} can be treated as samples from the 117 multidimensional density function / ( a j , b j | y 0 ( , S ) j , Z j , ry, 0 ^ ) . And, if we choose a sufficiently large gap r' (say r' = 10), we can treat { (a f , b f ) , k = r + r',r + 2r',... } as independent samples from / ( a j , b j | y 0 j , S ) j , Z j , r j ; 0®). There are several ways to get the initial values ( a f , b f ) . A simple way is to obtain ( a f , b f ) based on a naive method. Mul t ivar ia te Reject ion A l g o r i t h m Sampling from the two full conditionals can be accomplished by the multivariate rejection sampling method. For example, we consider sampling from / ( a j | y 0 ( , S j , Z j , ry b j ; 9®) in (5.5). Let / * ( a j ) = / ( y o b s , j | z j , a j , b,; 0W ) / ( z i | a i ; 0 ( « > ) / ( r ^ , b i ; 9®) and ? = sup{/*(u)}. We assume <; < oo. A random sample from / ( a j | y o h s j , Z i , i y b j j 0^) can then u be obtained as follows Step 1. Sample a* from / ( a j ; 0^), and independently, sample w from the uniform (0, 1) distribution. Step 2. If w < /*(a*)/<r, then accept a*, otherwise, go back to step 1. Samples from / ( b j | y 0 ( , S i j , Z j , r-j, a j ; 0^) can be obtained in a similar way. Therefore, the Gibbs sampler in conjunction with the multivariate rejection sampling can be used to obtain samples from [aj, b j | y 0 6 S i j , Z j , 5.4 A n Alternative Approximate Method 5.4.1 The Hierarchical Likelihood Method The approximate maximum likelihood inference using a Monte Carlo E M method in the previous section may be computationally intensive and sometimes may offer potential com- putational problems such as slow or non-convergence, especially when the dimensions of the random effects a j and b j are not small (see the detailed discussion in Section 3.4.1). To 118 overcome these difficulties, in this section we consider an alternative method called the hi- erarchical likelihood method (Lee and Nelder, 1996, 2001) for approximate inference. The hierarchical likelihood method avoids Monte Carlo approximation and thus is always com- putationally feasible, and it can also used to obtain good parameter starting values for the M C E M method in the previous section. Let £ denote general "nuisance parameters". Lee and Nelder (1996) considered a function pi (I) defined by £4' where D(l, £ ) = -d2l/d£2, and £ solves dl/d£ = 0. Following Lee and Lelder (1996), the complete-data log-likelihood function lc(&) in (5.3) may also be called the hierarchical log- likelihood function since it combines the two stages of mixed-effects models. Let UJ = {u>; = (a ,̂ bi), i = 1,... ,n} be the collection of random effects. The function p&(lc(Q)) can be written as i o™-— v _ c ~̂ " ~ 1 1 , . ' (5-7) L Z I Z7T IJ U>i=UJi 1=1 1=1 We can show that, for unobservable UJ, the use of the function PQj(lc{9)) is equivalent to integrating UJ out using the first-order Laplace approximation. Thus, P(j(lc{Q)) is the first- order Laplace approximation to the marginal log-likelihood 1(6) in (5.2) using the hierarchical log-likelihood function lc(d). In fact, let N = n0bS:i + be the number of the response and covariate observations for individual i and let b be the dimension of w ; . Assume that Ni — O(N) uniformly for i = 1,... , n, where'TV = mini TV*. Taking k = N, kp(x) = lf(6), 7 = 6, and x = a?* in the following Laplace approximation Id2p(x) I e f c P ( x ) d x = ( 2 7 r / / c ) ^ 2 dx2 119 + 0(k~l), where x is a 7-dimensional parameter vector and x maximizes kp(x), we can approximate the ith individual's contribution U(0) to the overall log-likelihood 1(0) as h(0) = log J el*){e)dui = \og J eNip{U}i)duji = log j (^)b/2 | D ( p ( W i ) i « i ) \^r/2eN^ + O p ( i V r ^ ) | 6/2 = log = log 2 7 r Y Ni) - 1 / 2 ^ ) ^+O P (TV- 1 ) U > = C J -1/2 4°(0) = log[exP{p^(^)(0))} + O(iV- 1 ) ] = P c 2 , ; ( ^ i ) W ) + O W - 1 ) . (5.8) in which the last step holds by Lemma 3.2 in Section 3.7. Hence, the log-likelihood 1(0) can be approximated as 1 = 1 = PcJ(U0)) + Y2o(N-i) 1=1 = Po ; (a0 ) )+nO( iV- 1 ) . (5.9) As N — miriiNi grows faster than n, the function p&(lc(Q)) approaches the marginal log- likelihood function 1(0), and hence an estimate of 0, which maximizes pjj(lc(0)), also maxi- mizes 1(0). This lead to the following algorithm to obtain an approximate M L E of 0 called OHL'- Step 1. Obtain an initial estimate of (ct, 3, 52, R, A, B, w) =' (a ' 0 ' , 3{0), 52(-°\ Rp\ A(°\ B(°\ U/°)) based on a naive method, and an initial estimate of r) = rj^ based on the dropout model with the random effects u /°) . 120 Step 2. Given the current parameter estimates update random-effects estimates Ut+V> by maximizing lc\0^) with respect to a;*, i = 1,. . . ,n. Step 3. Given the random-effects estimates u>f+1\ update the parameter estimates #( t+1) by maximizing pa,(t+i)(Zc(^)) with respect to 0. Step 4. Iterate between Step 2 and Step 3 until convergence. We can use Fisher information to obtain the following approximate formula for the variance-covariance matrix of the approximate M L E OHL C o v ( ^ ) = [ - 8 ^ ( ' f ) r " ' v [ 80dOT Many optimization procedures evaluate the matrix [—d2p^j(lc(0))/dOd6T] at 6 = BHL (called Hessian matrix), from which it is easy to obtain Cov(0 HL)- 5.4.2 Asymptotic Properties Under suitable regularity conditions on 1(0), g(-), and d(-), we extend Vonesh (1996) to show in Section 5.7 that 0=0 HL 1T1 (OHL — 00) = Op max | n 2, ^min N^j | where 60 is the true value of 6. Thus, the approximate M L E OHL will be consistent only as both n and (min; Ni) —* 0 0 . Intuitively, the term comes from standard asymptotic theory while the (mini Ni)^1 term comes from the Laplace approximation. Note that the accuracy of the first-order Laplace approximation to the log-likelihood function is 0{n/(min; Ni)}, or, equivalently, o(l) provided (min;TV;) grows faster than n. In this case, (OHL ~ Oo) = Op(n~^) with OHL being asymptotically equivalent to the "exact" M L E . This reflects the fact that, as the accuracy of the Laplace approximation to the log- likelihood increases, the approximate M L E OHL will behave more and more like the "exact" 121 M L E . However, we can decrease the growth rate of (min^ N) for the asymptotic normality of OHL- In particular, as (min, Nt) grows at a rate greater than n%, the rate of consistency of OHL will still be Ov{rT^) and the resulting estimate will be asymptotically equivalent to the "exact" M L E in the sense that it has the same asymptotic distribution as the "exact" M L E (see Section 5.7). We correct the claim by Vonesh (1996) that, as (minj/Vj) grows at a rate greater than nh but less than or equal to n, the rate of consistency will still be Op(n~%) but the resulting estimate will no longer be asymptotically equivalent to the "exact" M L E . The proofs of the above arguments are given in Section 5.7. 5.5 Example and Simulation 5.5.1 Example We use the same HIV dataset in Section 4.5 to illustrate our proposed methods in this chapter, but we use the random-effect-based informative dropout model here rather than the outcome-based informative dropout model in Section 4.5. These informative dropout models may be used for sensitivity analysis. We use the commonly-used naive method, which ignores measurement errors and missing data, for parameter starting values in the two proposed methods. See the data description in Section 4.5.1. The Response and the Covariate Models We consider the same HIV viral dynamic and CD4 measurement error models in Section 4.5.2. For completeness, we describe these models again here. For the viral load 122 process, we consider the following model Vij log(Pii) log(P 2 t) where y,j is the logi0-transform of the viral load measurement for patient i at time t^, P^-and P 2 j are baseline values, A^- and A 2 ^ are viral decay rates, z*j is the true (but unobservable) CD4 count, and iw(£y) and h^tij) are nonparametric fixed- and random-effects functions (see Section 2.1). In order to reduce the number of nuisance parameters, we assume that the variance-covariance matrices A and B of the random effects are both diagonal matrices. To avoid very small (large) estimates, which may be unstable, we standardize the CD4 counts • and rescale the original time t (in days) so that the new time scale is between 0 and 1. As discussed in Section 2.1, we employ the linear combinations of natural cubic splines with percentile-based knots to approximate w(t) and hi(t). Following Wu and Zhang (2002), we take the same natural cubic splines with q < p in order to decrease the dimension of random effects. A I C and BIC criteria are used to determine the values of p and q, which leads to the following model for A 2 r , in (5.12) (see Table 4.1), with p = 3 and q — 1, A 2 i j « ft + ft MUj) + ft MUj) + hi- (5.13) For the CD4 process, we consider empirical polynomial L M E models, and choose the best fitted model based again on A I C / B I C values for each possible model. This is done based on the observed CD4 values, and is done separately from the response model for simplicity. The following quadratic polynomial L M E model best fits the CD4 trajectory (see Table 4.2): CD4*/ = (ai + ai) + (a 2 + a 2) ua + (a 3 + a 3) u\ + eih (5-14) = \ogW{PLIE-X^+ P 2 i e - x ^ ) + e i j , (5.10) = Pi + bu, XUj = ft + ft4 + 6 2 i, (5.11) = ft + hi, Mij = w(Uj) + hi(tij), (5.12) 123 where uu is the time and a = (ai, a2, a^)T are the population parameters and a; = (an, aj2, a^)T are the random effects. Random-effect-based Informative Dropout Models In this study, dropout patients appear to have slower viral decay, compared with the remaining patients. Thus, dropouts are likely to be informative or nonignorable. So we need to assume a model for the dropout process in order to make valid likelihood inference. Note that we should avoid building too complicated a dropout model since a complicated model may become non-identifiable. Subject-area knowledge and preliminary checks suggest that dropout may be related to the random-effects components an, ai2, and bi2, so we consider the following dropout model logit[P(rjj = l | a i , b^ T?)] =r]1 + rj2an + ri3ai2 + 774^2- (5.15) We assume independence of the r^'s to simplify the model. The dropout model (5.15) along with the dropout models in Section 4.5 can be used for sensitivity analysis. Estimation Methods and Computat ion Issues We estimate the model parameters using the two proposed "joint" model methods discussed in Sections 5.3 and 5.4. We denote the method in Section 5.3 by A P and the method in Section 5.4 by H L . The two proposed joint model methods need starting values for model parameters since they are implemented by a M C E M algorithm or by an iterative Laplace approximation to the log-likelihood function. We use the parameter estimates obtained by the naive method, which ignores measurement errors and missing data, as parameter starting values for the two joint model methods. For the naive method, we use the SPLUS function nlme() and lme() to obtain param- eter estimates and their default standard errors. For the proposed A P method, we assess the 124 convergence of the Gibbs sampler by examining time series plots and sample autocorrelation function plots. For example, Figures 5.1 and 5.2 show the time series and the autocorrelation function plots for b2 associated with patient 14. From these figures, we notice that the Gibbs sampler converges quickly and the autocorrelations between successive generated samples are negligible after lag 15. Time series and autocorrelation function plots for other random effects show similar behaviors. Therefore, we discard the first 500 samples as the burn-in, and then we take one sample from every 20 simulated samples to obtain independent samples (see sampling methods in Section 5.3.3). We start with k0 = 500 Monte Carlo samples, and increase the Monte-Carlo sample size as the number t of E M iteration increases: kt+i = kt + kt/c with c — 4 (Booth and Hobert, 1999). Convergence criterion for these two joint model methods in our examples is that the relative change in the parameter estimates from successively iterations is smaller than 0.05. Convergence of the algorithms are considered to be achieved when the maximum percentage change of all estimates is less than 5% in two consecutive iterations. We use the multivariate rejection sampling method for the A P method. Other sam- pling methods may also be applied and may be even more efficient. On a S U N Sparc work-station, the A P method took about 135 minutes to converge while the H L method took about 150 minutes to converge. The H L method took more time than the A P method mainly because all model parameters appear in the nonlinear function p<jj(lc(9)) and no sep- aration of the parameters is possible. However, the H L method is always computationally feasible while the A P method sometimes may have convergence problems. Moreover, the H L method can be used to obtain good parameter starting values for the A P method. A n a l y s i s Resu l t s We estimate the model parameters using the two proposed joint model methods A P 125 Series : b2 Figure 5.2: The autocorrelation function plot for b2 associated with patient 14. 126 Table 5.1: Parameter estimates (standard errors) for the models in the example. c Method a i a 2 « 3 ft ft ft ft ft ft ft S R A P -.43 4.29 -3.90 11.72 66.65 1.53 6.93 -1.86 7.49 -2.36 .36 .51 (.2) (.5)' (.6) (•2) (4.3) (4.6) (.7) (4.9) (7.8) (2.7) H L -.41 4.32 -3.93 11.64 66.44 1.58 6.89 -1.92 7.46 -2.29 .35 .50 (•1) (-4) (.5) (•1) (3.4) (2.8) (.6) (4.8) (7.5) (2.7) Note: the estimated covariance matrices are A = diag(.62,4.70, 4.41) for AP, A = diag(.51, 4.74, 4.53) for HL. B = diag(1.45,91.62,1.94,20.16) for AP, and B = diag(1.42,91.91,1.58,19.96) for HL. and H L . We use the parameter estimates obtained by the naive method as the parameter starting values for the A P and the H L methods. We also tried several other parameter starting values for the proposed joint model methods. Different parameter starting values lead to roughly same parameter estimates in both the A P and the H L methods. Table 5.1 presents the resulting parameter estimates and standard errors based on the random-effect-based informative model (5.15). We find that the two joint model meth- ods provide similar parameter estimates. Comparing the random-effect-based informative dropout model with the outcome-based informative dropout model I in (4.13), we find that the resulting estimates are similar. This indicates again that the estimation may be robust against the nonignorable dropout models. Although some estimates in Table 5.1 are not sta- tistically significant, the values of the estimates may still provide useful information about viral load and CD4 trajectories. The estimates of parameters rj based on the A P method (or the H L method) are -2.32, .31, -.05,, and - .07 (or -2.4.1, .27, -.04, and -.08) respectively, with all p-values less than .00001, which also indicates that the dropouts may be nonignor- able (or informative) since the missingness may depend on the unobservable random effects. The estimates of rj2, n3, and 774 indicate that dropout patients seems to have higher baseline CD4 count, decrease in CD4 count faster over time, and have slower first decay rate than the remaining patients. 127 5.5.2 The Simulation Study We evaluate the proposed methods (AP and HL) for the random-effect-based informative model (5.15) via simulation. The response and covariate models, the random-effect-based informative dropout model, and the measurement time points used in the simulation are all the same as those in the example in the previous section (i.e., (5.10) - (5.14)). We choose appropriate values of rj to mimic certain missing rate, and we use the SPLUS function sampleQ to generate binary data ry based on the values of parameters 77 and the random effects 3n and by If ?y = 1, then yid- is deleted, and if ry = 0, is considered to be observed. In the simulations, the true values of model parameters 3 and oc are shown in Table 5.2, and the other true parameter values are 5 = .2, R = .4, A = diag(.5, 3, 2), and B = diag(l, 9, 2, 4). We set 77 = (—1.4,0.1, —0.1, —0.1)r to get an average missing rate of 20%. We always regard the first two responses on each individual as observed, i.e., every individual has at least two observed responses. We simulated 100 data sets and calculated averages of the resulting estimates and their standard errors based on each of the two methods. We compare the methods by comparing their biases and mean-square-errors (MSEs). Here, bias and M S E are assessed in terms of percent relative bias and percent relative root mean-squared error, as defined in Section 4.6. Since A P method sometimes offers computational problems, such as slow or non-convergence, the 100 sets of parameter estimates are obtained from 128 data sets. From the simulation results in Tables 5.2 and 5.3, we see that the two proposed joint model methods (AP and HL) perform well. The A P method performs better than the H L method in the sense that the A P yields smaller relative M S E and bias than the H L method. The H L method also performs reasonably well and it is always computationally feasible. Therefore, the H L method may be a good alternative method when the A P method exhibits computational difficulties, and the H L method can also be used to obtain good parameter 128 Table 5.2: Simulation results for the parameter estimates (standard errors) for the estimation methods A P and H L . Parameter Oil a2 « 3 ft A ft ft ft ft ft True Value -.4 4.0 -4.0 12.0 67.0 1.5 7.0 -2.0 8.0 -3.0 A P -.40 4.01 -3.98 11.99 66.95 1.49 6.99 -2.01 8.04 -2.97 (•2) (.5) (.6) (•1) (1.4) (1.6) (.3) (2.1) (3.2) (1.1) H L -.38 3.96 -3.93 11.99 67.10 1.56 6.95 -2.08 8.09 -2.89 (•1) (•4) (.5) (•1) (1.0) (1.3) (.3) (1.8) (2.9) (1.0) Table 5.3: Simulation results for bias and M S E of the parameter estimates for the estimation methods A P and H L . Parameter a\ <%2 a3 ft- ft ft ft ft ft ft True Value -A 4.0 -4.0 12.0 67.0 1.5 7.0 -2.0 8.0 -3.0 Bias A P 1.59 .15 .95 -.07 -.1 -.44 -.13 -.53 .46 .25 H L 2.17 -1.02 1.31 -.08 .22 1.08 -.26 -3.78 .62 .93 M S E A P 7.93 6.46 7.39 1.66 1.74 66.69 1.98 71.84 28.95 22.26 H L 9.07 8.33 . 9.31 1.69 2.43 93.67 4.24 98.75 37.30 31.71 Note: Bias = Percent bias = 100 x b i a S j / | / ? j | ; MSE = Percent ^MSE = 100 x ^/MSE^/I^I. 129 starting values for the A P method. 5.6 Discussion We have proposed two approximate likelihood methods for semiparametric N L M E models with random-effect-based informative dropouts and covariate measurement errors and missing data, implemented by a Monte Carlo E M algorithm combined with Gibbs sampler or by an iterative Laplace approximation to the log-likelihood function respectively. The first method may be more accurate than the second method but it sometimes may offer computational difficulties such as slow or non-convergence, especially when the dimensions of random effects are not small. The second method is always computationally feasible but may be less accurate than the first method. The second method may be used as a reasonable alternative when the first method has convergence problems or it may be used to provide excellent parameter starting values for the first method. Simulation studies indicate that both methods produce satisfactory results. Although it does not need to generate Monte Carlo samples for random effects, the second method may not be computationally more efficient than the first method. A possible reason is that there are too many model parameters appear in the nonlinear functions in optimization procedures and no separation of the parameters is possible. A possible solution is to use Bayesian method to address this problem. Specifically, we may assume the known hyperprior distributions for the model parameters. In many longitudinal data sets, dropouts, censoring, measurement errors, and missing covariates are all present simultaneously. To our knowledge, there are almost no unified methods in the literature which address these problems simultaneously. Wu (2002) pro- posed a unified method to address censoring and measurement errors simultaneously and 130 showed that the proposed method offered significant improvement over existing methods currently in use. The ideas in Wu (2002) and in this chapter can be extended to address dropouts, censoring, measurement errors, and missing covariate simultaneously in semipara- metric/nonparametric N L M E models. 5 . 7 Appendix: Asymptotic Properties of the Approx- imate M L E #HL in Section 5 . 4 5.7.1 Consistency We will show that the following result (OHL — Oo) = OV holds under the usual regularity conditions on 1(0), g(-) and d(-), where 60 is the true value of 0. Proof. Let u); maximize lc\0) with respect to u>; for fixed 0. Denote Ni = n0bs<i + rrii. Suppose that Ni = O(N) uniformly for i = 1, . . . , n, where ./V = mim Ni. Based on (5.8) in Section 5.4.3, the zth individual's contribution k(0) to the overall log-likelihood may be approximated as m = POMW) + ^ r 1 ) = P&T{W(o)).+ O(N-1). . Hence, the log-likelihood 1(0) can be .written as (see (5.9)) l(0) = l*(0) + O{nN~1}, . (5.16) where l*(0) = p^(lc(0)) = E H i P u ^ W ) - L e t u*(e) = dl*(0)/d0 and let 0HL be the approximate maximum likelihood estimate satisfying U*(0HL) — 0- Under suitable regularity 131 m a x j n s ^ m . i n A ^ j conditions on 1(0) and assuming QHL is an interior point in a neighborhood containing 00, Taylor's theorem tells us that there exists a vector 0 on the line segment between 00 and OHL such that n-lu(0HL) = n-xu(0Q) + n-lM(0)(0HL - 00), (5.17) where u(0) — dl(0)/dO and M(0) = 82l(0)/d0d0T are the first and second order derivatives of the true but intractable marginal log-likelihood 1(0). The. first term n - 1 u ( 0 o ) on the right of (5.17) is = ly dk(0) iu(«0, = I«2> n n oO 0=0O 80 0=00 Given sufficient regularity conditions on 1(0), we know from the Lindeberg Central Limit Theorem that ^ u ( 0 o ) - ^ ( 0 , 7 ( 0 0 ) ) , (5.18) where the matrix 7(0) = l i m ^ o o ^ £ h(0) and h(0) is the information matrix for individual " i = l i. That implies - ^ u ( 0 o ) = O p ( l ) ^u (0 o ) = Op(n-1'2). The matrix n~lM(0) on the right of (5.17) is 1 A d\(0) l-M(0) = n v ' ndOdOT 0=0 n Z-u n ^ dOdO1 0=0 -1(0), (5.19) by the Law of Large Numbers. Since 1(0) is positive definite for all 0, the probability that the matrix n _ 1 M ( 0 ) is invertible tends to 1. By writing n'lM(0) = —1(6) + op(l) and applying Lemma 3.2 in Section 3.7 to the inverse function, we have [n-lM(0)Yl = -IijO)-1 + op(\). 132 (5.20) Given suitable regularity conditions ong(-) and d(-), for example that third order derivatives exist and are continuous in an open neighborhood about 0O, application of Lemma 3.2 in Section 3.7 to the partial derivative function in the expression (5.16) leads to n-1u(6HL) = n-1u*{dHL) + 0(N-1). (5.21). Note that u*(0HL) = 0 and n - 1 u(0o) = Op(n~i). From (5.17), we have n - 1 M ( 0 ) ( 0 H L - 6o) = n - y f l i / L ) - n ^ u ^ o ) (OHL - 00) = [n-lM(d)]-l[n'lvi{eHL) - n " 1 ^ ) ] (OHL - Oo) = ( - / (0)" 1 + o p ( l ) ) [n - 1 u(0 H i ) - n " 1 ^ ) ] (by (5.20)) (OHL - Oo) = (-1(G)-1 + op{l)){n-lu\0HL) + O ^ " 1 ) + Op(n^)\ (by (5.21)) (OHL - 00) = -I(O)-1 Op maxjra^ , A T 1 } ' ] + o p m a x j n - ^ , iV_1>}] m a x j n . v^nr in /V^ }J . Finally, let OML denote the "exact" maximum likelihood estimate with U(0ML) = 0- Let min; N = 0(nT) for r > 1 so that the accuracy of the Laplace approximation to the marginal log-likelihood is approximately 0(nl~T) = o(l) from the formula (5.16). Then, under the same regularity conditions as before, by multiplying n on the both sides of the equation (5.21) and noting that U(0ML) = 0, we have U(0HL) = U*(0HL) + Op(1) = O + Op(1) = U(0ml) + Op(1). Thus U(9HL) ~ u(0ML) = op(l) and hence OHL is asymptotically equivalent to the "exact" maximum likelihood estimate OML- d 5.7.2 Asymptotic Normality of #HL In this section, we will show that as N grows at a rate greater than n%, i.e., N = 0(nT) for r > | , the approximate M L E OHL and the "exact" M L E OML have the same asymptotic 133 distribution. Proof. Noting that the approximate M L E OHL satisfies a set of equations U*(OHL) = U, W E take a first-order Taylor series expansion of U*(0HL) around the true parameter 0O O = u*(0HL) = U*(00) + ^-^-(0HL-00), dO where 0* is oh the line segment joining 00 to OHL, which implies V^{0HL-00) = 1 du*{0*) n 801 "i - i i=l dOdO1 80 (5.22) Now we consider the two product terms on the right of (5.22). Applying Lemma 3.2 in Section 3.7 to the first and second partial derivative functions in the expression in (5.16), we know that, for any fixed 0, and 4=u*(6>) = - ^ u ( 0 ) + 0(nl2 A " 1 ) dO dO + U ^ N )> (5.23) n dOT n dOT dOdOT n^dOdOT 1 ' i=i i=i Assume that N = 0(nT), where r > \ . Then ~0(n^ N'1) = 0(n^-T) (5.23) and (5.24), we have (5.24) o(l). From £%>^h aO _ l ™ 1 dlj(Oo) 1 A rpp^f)) _ l i m i f *«e-) « £ a0a0 r ~ » " (5.25) 134 Note that OHL — Oo = Op[ma,x{n~%, N^1}] = Op(n~^), i.e., OHL is a -^-consistent estimate of 6o- Since 9* is on the line segment joining 0O and OHL, A- 0O as n —> oo. Under the same regularity conditions as before, it follows from (5.18) and (5.19) that i=n „. (5-26) Combining the results in (5.25) and (5.26) and using Slutsky's theorem, we can show that V^(0HL-0O) A / V ( 0 , Wo)'1), which implies that when N = 0(nT) for r > | , the approximate M L E OHL a n d the "exact" M L E OML have the same asymptotic distribution. • 135 Chapter 6 Conclusions and Future Research 6.1 Conclusions In this thesis, we have developed approximate maximum likelihood inference in the following three problems: (1). semiparametric N L M E models with measurement errors and missing data in time-varying covariates; (2). semiparametric N L M E models with covariate measure- ment errors and outcome-based informative missing responses; (3). semiparametric N L M E models with covariate measurement errors and random-effect-based informative missing re- sponses. Measurement errors, dropouts, and missing data are addressed simultaneously in a unified way. For each problem, we have proposed two joint model methods to simultaneously obtain approximate maximum likelihood estimates (MLEs) of all model parameters. The first method, implemented by a Monte Carlo E M algorithm, may be more accurate than the second method but it may be computationally very intensive and sometimes may offer computational difficulties such as slow or non-convergence, especially when the dimensions of random effects are not small. The second method, which approximates joint log-likelihood functions by using a first-order Taylor expansion or by using a first-order Laplace approxima- 136 tion, is computationally more appealing, but it may be less accurate than the first method. The performance of the second method may need further investigation. We have showed some asymptotic results for the estimates based on'the second method. The second method may be used as a reasonable alternative when the first method has convergence problems or be used to provide excellent parameter starting values for the first method. Simulation results have shown that all proposed methods perform better than the commonly used two-step method and the naive method which ignores measurement errors, in the sense that the proposed methods yield smaller bias and M S E . In particular, the commonly used two-step method may under-estimate standard errors, which is consistent with analytic results, and the naive method may under-estimate covariate effects and poorly estimate other parameters. 6.2 Future Research Topics Finally, we discuss possible future work relevant to this thesis as follows. 1. In many longitudinal studies such as HIV viral dynamics, another common problem is that the response measurements may be subject to left censoring due to a detection limit. Censored responses in practice were often substituted by the detection limit or half the detection limit (Wu and Ding, 1999; Wu and Wu, 2001), which may lead to substantial biases in the results (Wu, 2002). In the presence of both dropouts and censoring, unified approaches which address these problems simultaneously in semiparametric/nonparametric N L M E models are needed in order to make reliable statistical inference. 2. In many longitudinal datasets, dropouts, censoring, measurement errors, and miss- ing covariates are all present simultaneously. To our knowledge, there are almost no 137 unified methods in the literature which address these problems simultaneously. Wu (2002) proposed a unified method to address censoring and measurement errors simul- taneously and showed that the proposed method offered significant improvement over existing methods currently in use. The ideas in Wu (2002) and in this thesis can be extended to address dropouts, censoring, measurement errors, and missing covariates simultaneously in semiparametric/nonparametric N L M E models. 3. For the response process, we only consider semiparametric nonlinear mixed-effects models with independent and normal distributed error terms e;. In the future, we may consider more complicated covariance structure for e; such as an AR(1) structure. 4. In our study, we only consider semiparametric nonlinear mixed-effects models for normal data. Generally, our proposed methods may be extended to other models, such as semiparametric/nonparametric generalized linear mixed-effects models and semiparametric/nonparametric generalized nonlinear mixed-effects models. 5. Computational efficiency is an important issue in our study. Multivariate rejection sam- pling methods have been used in our data analyses and simulation. In general, other sampling methods, such as adaptive rejection sampling methods and importance sam- pling methods, may also be used and may be even more efficient. We plan to compare computational efficiency among several sampling methods in our current setting. 6. In our alternative methods, we have approximated log-likelihood functions by using a first-order Taylor expansion or by using a first-order Laplace approximation. Some- times, these are not necessarily accurate approximations. In the future, we may in- vestigate better approximations, such as higher order Taylor expansions and Laplace approximations. 138 7. One problem in our models is that there are too many parameters. If the data are not rich enough, the proposed methods may have convergence problems and identifiability problems. We plan to develop Bayesian methods for our problems. 139 References Aitchison, J. and Silvey, S. D. (1958). Maximum-likelihood estimation of parameters sub- ject to restraints. The Annals of Mathematical Statistics, 29, 813-828. Amemiya, T . (1983). Nonlinear regression models. In Handbook of Econometrics, Volume I, Z. Griliches and M . D. Intriligator, eds., pp.333-389. Noth Holland, Amsterdam. Barndorff-Nielsen, O. E . and Cox, D. R. (1989). Asymptotic Techniques for Use in Statis- tics. New York: Chapman and Hall. Booth, J. G . and Hobert, J . P. (1999). Maximizing generalized linear mixed models likeli- hoods with an automated Monte Carlo E M algorithm. Journal of the Royal Statistical Society, Ser. B, 61, 265-285. Bradley, R. A . , and Gart, J . J . (1962). The asymptotic properties of M L estimators when sampling from associated populations. Biometrika, 49, 205-214. Carroll, R. J . , Ruppert, D. , and Stefanski, L . A . (1995). Measurement Error in Nonlinear Models. London: Chapman and Hall. Chan, K. S. and Ledolter, J . (1995). Monte Carlo E M estimation for time series models involving counts. Journal of the American Statistical Association, 90, 242-252. 140 Davidian, M . and Giltinan, D. M . (1995). Nonlinear Models for Repeated Measurements Data. Chapman & Hall. de Boor, C. (1978). A Practical Guide to Splines. Springer-Verlag, New York. Demidenko, E . (2004). Mixed Models Theory and Applications. John Wiley & Sons. Dempster, A . P., Laird, N. M . , and Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the E M algorithm (with discussion). Journal of the Royal Statistical Society, Ser. B, 39, 1-38. Diggle, P. and Kenward, M . G . (1994). Informative drop-out in longitudinal data analysis (with Discussion). Applied Statistics, 43, 49-93. Ding, A . and Wu, H . (2001). Assessing antiviral potency of anti-HIV therapies in vivo by comparing viral decay rates in viral dynamic models. Biostatistics, 2(1), 13-29. Euband, R. L . (1988). Spline smoothing and Nonparametric Regression. New York: Marcel Dekker. Fitzmaurice, G . M . , Laird, N. M . , and Zahner, G . E . P. (1996). Multivariate logistic models for incomplete binary responses. Journal of-the American Statistical Association, 91, 99-108. Gelfand, A . E . and Smith, A . F . M . (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409. Geweke, J. (1996). Handbook of Computational Economics. Amsterdam: North-Holland. Gilks, W . R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337-348. 141 Green, P. J . and Solverman, B. W . (1994). Nonparametric Regression and Generalized Linear Models. Chapman and Hall, London. Grossman, Z., Polis, M . , Feinberg, M . B., Grossman, Z., Levi, I., Jankelevich, S., Yarchoan, R., Boon, J . , De Wolf, F . , Lange, J . M . A . , Goudsmit, J . , Dimitrov, D. S., and Paul, W. E . (1999). Ongoing HIV dissemination during H A A R T . Nature Medicine, 5 , 1099-1103. Higgins, D. M . , Davidian, M . , and Giltinan, D. M . (1997). A two-step approach to mea- surement error in time-dependent covariates in nonlinear mixed-effects models, with application to IGF-I pharmacokinetics. Journal of the American Statistical association, 92, 436-448. Ho, D. D. , Neumann, A . U . , Perelson, A . S., Chen, W. , Leonard, J. M . , and Markowitz, M . (1995). Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection. Nature, 373, 123-126. Ibrahim, J. G . , Chen, M . H . , and Lipsitz, S. R. (2001). Missing responses in generalized linear mixed models when the missing data mechanism is nonignorable. Biometrika, 88, 551-564. Ibrahim, J. G . , Lipsitz, S. R., and Chen, M . H. (1999). Missing covariates in generalized linear models when the missing data mechanism is nonignorable. Journal of the Royal Statistical Society, Ser. B, 61, 173-190. Ke, C. and Wang, Y . (2001). Semiparametric nonlinear mixed-effects models and their applications (with discussions). Journal of the American Statistical Association, 96, 1272-1298. Laird, N. M . and Ware, J . H . (1982). Random-effects models for longitudinal data. Bio- metrics, 38, 963-974. Lee, Y . and Nelder, J. A . (1996). Hierarchical generalized linear models. Journal of the Royal Statistical Society, Ser. B, 58, 619-678. Lee, Y . and Nelder, J . A . (2001). Hierarchical generalised linear models: A synthesis of gen- eralised linear models, random-effect models and structured dispersions. Biometrika, 88, 987-1006. Liang, H . , Wu, H . , and Carroll, R. (2003). The relationship between virologic and immuno- logic responses in AIDS clinical research using mixed-effects varying-coefncient models with measurement errors. Bio statistics, 4, 297-312. Lindstrom, M . J. and Bates, D. M . (1990). Nonlinear mixed effects models for repeated measures data. Biometrics, 46, 673-687. Little, R. J . A . (1995). Modeling the drop-out mechanism in repeated measures studies. Journal of the American Statistical Association, 90, 1112-1121. Little, R. J. A . and Rubin, D. B. (1987). Statistical Analysis with Missing Data. New"York: John Wiley. Louis", T . A . (1982). Finding the observed information matrix when using the E M algorithm. Journal of the Royal Statistical Society, Ser. B, 44, 226-233. McCulloch, C. E . (1997). Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association, 92, 162-170. McLachlan, G. J. and Krishnan, T . (1997). The EM-Algorithm and Extension. New York, Wiley. Ogden, R. T . and Tarpey, T . (2006). Estimation in Regression models with externally estimated parameters. Bio statistics, 7, 115-129. 143 Perelson, A . S.,.Neumann, A . U . , Markowitz/-M., Leonard, J. M . , and Ho, D. D. (1996). HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral gener- ation time. Science, 271, 1582-1586. Pinheiro, J. C. and Bates, D. M . (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics, 4, 12-35. Rice, J . A . and Wu, C . O. (2001). Nonparametric mixed-effects models for unequally sampled noisy curves. Biometrics, 57, 253-259. Seber, G. A . F . (1984). Multivariate Observations. New York: Wiley. Serfling, R. J . (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley. Shah, A . , Laird, N . , and Schoenfeld, D. (1997). A random-effects model for multiple char- acteristics with possibly missing data. Journal of the American Statistical Association, 92, 775-779. Ten Have, T . R., Pulkstenis, E . , Kunselman, A . , and Landis, J . R. (1998). Mixed ef- fects logistics regression models for longitudinal binary response data with informative dropout. Biometrics, 54, 367-383. Vonesh, E . F . (1996). A note on the use of Laplace's approximation for nonlinear mixed- effects models. Biometrika, 83, 447-452. Vonesh, E . F . and Chinchilli, V . M . (1997). Linear and Nonlinear Models for the Analysis of Repeated Measurements. Marcel Dekker, New York. 144 Vonesh, E . F . , Wang, H . , Nie, L . , and Majumdar, D. (2002). Conditional second-order gen- eralized estimating equations for generalized linear and nonlinear mixed-effects models. Journal of the American Statistical Association, 97, 271-283. Wei, G. C. and Tanner, M . A . (1990). A Monte Carlo implementation of the E M algorithm and the poor man's data augmentation algorithm. Journal of the American Statistical Association, 85, 699-704. Wolfinger, R. (1993). Laplace's approximation for nonlinear mixed models. Biometrika, 80, 791-795. Wu, H . , Kuritzkes, D. R., McClernon, D. R., Kessler, FL, Connick, E . , Landay, A . , Spear, G . , Heath-Chiozzi, M . , Rousseau, F . , Fox, L . , Spritzler, J . , Leonard, J . M'., and Leder- man, M . M . (1999). Characterization of viral dynamics in human immuno-deficiency virus type 1-infected patients treated with combination antiretroviral therapy: rela- tionships to host factors, cellular restoration and virological endpoints. Journal of Infectious Diseases, 179, 799-807. Wu, H . and Ding, A . (1999). Population HIV-1 dynamics in vivo: application models and inferential tools for virological data from AIDS clinical trials. Biometrics, 55, 410-418. Wu, H. (2005). Statistical Methods for HIV Dynamic Studies in AIDS Clinical Trials. Statistical Methods in Medical Research, 14, 171-192. Wu, H. and Zhang, J. (2002). The study of long-term HIV dynamics using semi-parametric non-linear mixed-effects models. Statistics in Medicine, 21, 3655-3675. Wu, L . (2002). A joint model for nonlinear mixed-effects models with censoring and covari- ates measured with error, with application to AIDS studies. Journal of the American Statistical Association, 97, 955-964. • 145 Wu, L . (2004). Exact and approximation inferences for nonlinear mixed-effects models with missing covariates. Journal of the American Statistical Association, 99, 700-709. Wu, L. and Wu, H . (2001). A multiple imputation method for missing covariates in non- linear mixed-effect models, with application to HIV dynamics. Statistics in Medicine, 20, 1755-1769. Wu, M . C. and Carroll, R. J . (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics, 44, 175- 188. 146
Cite
Citation Scheme:
Usage Statistics
Country | Views | Downloads |
---|---|---|
Japan | 24 | 0 |
United States | 1 | 0 |
China | 1 | 30 |
City | Views | Downloads |
---|---|---|
Tokyo | 24 | 0 |
Unknown | 1 | 0 |
Beijing | 1 | 0 |
{[{ mDataHeader[type] }]} | {[{ month[type] }]} | {[{ tData[type] }]} |
Share
Share to: