Approximate Methods for Joint Models in Longitudinal Studies

by

Libo Lu

B.Sc., Shandong University, 2002
M.Sc., University of Science and Technology of China, 2005

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August 2010
© Libo Lu 2010

Abstract

Longitudinal studies often involve several statistical issues, such as a longitudinal process and a time-to-event process, whose association calls for a joint modeling strategy. We first review recent research on the joint modeling topic. After that, four popular inference methods are introduced for jointly analyzing longitudinal data and time-to-event data based on a combination of typical parametric models. However, some of these methods may suffer from non-ignorable bias in the estimators, while others may be computationally intensive or even lead to convergence problems.

In this thesis, we propose an approximate likelihood-based simultaneous inference method for jointly modeling a longitudinal process and a time-to-event process with covariate measurement errors. By linearizing the joint model, we design a strategy for updating the random effects that connect the two processes, and propose two algorithm frameworks for different scenarios of the joint likelihood function. Both frameworks approximate the multidimensional integral in the observed-data joint likelihood by analytic expressions, greatly reducing the computational intensity of the complex joint modeling problem.

We apply this new method to a real dataset along with some available methods. The inference result provided by our new method agrees with those from other popular methods, and makes sensible biological interpretation. We also conduct a simulation study to compare these methods. Our new method looks promising in terms of estimation precision, as well as computational efficiency, especially when more subjects are given. Conclusions and discussions of future research are given at the end.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgments

1 Introduction
  1.1 Some Issues in Longitudinal Studies
    1.1.1 Longitudinal Data
    1.1.2 Incomplete Data
    1.1.3 Time-to-Event Data
  1.2 Joint Modeling Longitudinal Data and Time-to-Event Data
    1.2.1 Motivation
    1.2.2 Statistical Models
    1.2.3 Inference Approaches
  1.3 Motivating Example
    1.3.1 Brief Biological Background
    1.3.2 Statistical Question
  1.4 Objective and Outline

2 Statistical Models
  2.1 Nonlinear Mixed Effects (NLME) Model for Longitudinal Process
  2.2 Nonparametric Model for Covariate Measurement Errors Problem
  2.3 AFT Model for Time-to-Event Data
  2.4 Models for Nonignorable Missing Data

3 Simultaneous Inference
  3.1 Two-Step Method
    3.1.1 Naive Two-Step Method
    3.1.2 Modified Two-Step Method
  3.2 EM Algorithm
    3.2.1 E-Step
    3.2.2 M-Step
  3.3 Laplace Approximation
  3.4 Linearization Method
    3.4.1 Framework I
    3.4.2 Framework II

4 Data Analysis
  4.1 Data Description
  4.2 Model Specification
  4.3 Analysis and Results

5 Simulation
  5.1 Simulation Design
  5.2 Simulation Results
    5.2.1 Number of Subjects
    5.2.2 Numbers of Repeated Measurements
    5.2.3 Between-Subject Variance
    5.2.4 Within-Subject Variance

6 Conclusion and Discussion

Bibliography

List of Tables

4.1 Estimation Results
5.1 Simulation Results: n = 60
5.2 Simulation Results: n = 120
5.3 Simulation Results: ni = 12
5.4 Simulation Results: A2, B2
5.5 Simulation Results: 2, 2, 2

List of Figures

1.1 Longitudinal Data of 6 Subjects
1.2 Plots of Longitudinal Data of Subject 10
1.3 Plots of Longitudinal Data of Subject 18
1.4 Plots of Longitudinal Data of Subject 33
4.1 Longitudinal Data Trajectories
4.2 Estimate of Survivor Function
4.3 Residuals Plots I of NLME Model
4.4 Residuals Plots II of NLME Model
4.5 Residuals Plots I of Covariate Measurement Errors Model
4.6 Residuals Plots II of Covariate Measurement Errors Model

Acknowledgments

First and foremost, I would like to express my gratitude to my supervisors, Dr. Lang Wu and Dr. Jiahua Chen, for their support, encouragement and patience throughout my study at the University of British Columbia. Their world-leading expertise and excellent guidance defined a role model for my career. This thesis would not have been completed without their inspiring comments and constructive suggestions.

I would also like to thank everyone in the Department of Statistics at the University of British Columbia. In the past two years, they have kindly left me invaluable knowledge, along with wonderful memories. I had a great time in their company.

Many, many thanks to my parents and Miss Marie Li for their unconditional love and support, and to my lifelong friends for their lifelong loyalty. And finally, I must thank Buddha for always being with me no matter what happens, dispelling my fear, healing my pain, comforting my soul and guiding my life forward. I am lucky to be his follower.

Chapter 1
Introduction

1.1 Some Issues in Longitudinal Studies

With the development of modern statistical methodology, longitudinal studies play an increasingly important role in health science, medical research, social science and economics.
This type of study differs from classical cross-sectional studies in the timing of the measurements of the response: in a longitudinal study, some variables are measured more than once, at different time points, for each subject. Longitudinal studies, which enable researchers to separate cohort and time effects, are valuable because the effect of time can be a serious confounding factor for other covariates of interest.

1.1.1 Longitudinal Data

Suppose n subjects are included in a longitudinal study. The value of the response variable y for subject i at time t_ij is denoted by y_ij, where i = 1, ..., n and j = 1, ..., n_i. This type of data is called longitudinal data.

The graphical display of longitudinal data requires attention. A sample plot of longitudinal data for 6 subjects is shown in the top panel of Figure 1.1, from which it is quite difficult to discover growth patterns. It is often helpful to distinguish subjects with different types of points and to connect the observations of each subject with lines, as in the bottom panel of Figure 1.1.

[Figure 1.1: Longitudinal Data of 6 Subjects — two panels of response versus time.]

Compared with data obtained from a cross-sectional study, the data collected from a longitudinal study are naturally correlated. This feature requires statistical methodology that takes the correlation into account. In a longitudinal data analysis, a primary interest is to estimate the population mean and its relationship with covariates. A popular method for this purpose is the marginal model, which separately models the mean structure and the variance-covariance structure of the response variable.
Its underlying idea is close to the quasi-likelihood approach, which only requires the first and second moments of the response variable without a specific distributional assumption. Another approach is the transitional model, which assumes a Markov structure for the longitudinal process to model the correlation among the repeated measurements. This approach is appealing if the current observation strongly depends on previous ones.

Marginal models and transitional models consider the between-subject variation and within-subject variation separately. A mixed effects model considers the two sources of variation simultaneously by incorporating random effects to represent the between-subject variation and within-subject correlations. Laird and Ware (1982) proposed a general linear mixed effects (LME) model:

    y_i = X_i β + Z_i b_i + e_i,                          (1.1)
    b_i ~ N(0, D),   e_i | b_i ~ N(0, R_i),               (1.2)

where β are the fixed effects (population parameters), b_i = (b_i1, ..., b_iq)^T are the random effects (subject-specific parameters), X_i is an n_i × (p + 1) design matrix containing the covariates of subject i, Z_i is an n_i × q design matrix for the random effects (Z_i is often a submatrix of X_i), e_i = (e_i1, ..., e_in_i)^T are the random errors of the within-subject measurements, R_i is an n_i × n_i matrix characterizing the variance-covariance of the within-subject measurements, and D is the variance-covariance matrix of the random effects.

The inference procedure for mixed effects models is a natural extension of classical regression methods for cross-sectional models, and most ideas and results, such as maximum likelihood estimation (MLE) and asymptotic normality, also apply to the estimation of LME models. Diggle et al. (2002) provided a comprehensive overview of the above three methods. Verbeke and Molenberghs (2001) and Davidian and Giltinan (1995) gave more discussions on linear and nonlinear mixed effects models.
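As a concrete illustration, the following minimal sketch simulates data from a random-intercept special case of model (1.1)–(1.2), with scalar random effect b_i and independent errors. All parameter values here (beta0, beta1, d, sigma2) are hypothetical, chosen only to show how the shared random effect induces correlation among repeated measurements on the same subject.

```python
import random

random.seed(1)

# Hypothetical parameters for a random-intercept LME:
#   y_ij = beta0 + beta1 * t_ij + b_i + e_ij,
#   b_i ~ N(0, d),  e_ij ~ N(0, sigma2)
beta0, beta1 = 2.0, 0.5
d, sigma2 = 1.0, 0.25          # between- and within-subject variances
n, n_i = 200, 6                # subjects and measurements per subject

data = []
for i in range(n):
    b_i = random.gauss(0, d ** 0.5)            # subject-specific random effect
    for j in range(n_i):
        t = j                                   # measurement times 0, ..., 5
        e = random.gauss(0, sigma2 ** 0.5)      # within-subject error
        data.append((i, t, beta0 + beta1 * t + b_i + e))

# Because b_i is shared by all measurements on subject i, repeated
# measurements are correlated; the implied intra-class correlation is
# d / (d + sigma2).
icc = d / (d + sigma2)
print(round(icc, 2))
```

In practice such a model would be fitted by (restricted) maximum likelihood rather than simulated; the point here is only the variance decomposition that motivates the mixed effects formulation.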
In practice, the longitudinal trajectories can be so complex that parametric models are not flexible enough. For those cases, Wu and Zhang (2006) introduced some semiparametric and nonparametric models.

1.1.2 Incomplete Data

The data observed in longitudinal studies are very likely to be incomplete, for example due to measurement errors and missing data. In the following, we briefly introduce these problems.

Measurement Errors Problem

Ideally, we could obtain precise values of the selected variables, such as gender and age. In practice, however, the measurements of some continuous variables, such as blood pressure and CD4 count, are often imprecise for various reasons. We call this the measurement error problem; its distinguishing feature is that we can only observe a variable z* that is related to the true value z. A major goal of measurement error modeling is to obtain nearly unbiased estimates of the parameters in the main model by fitting a sub-model based on z*. This objective requires careful analysis, because simply substituting z* for z without proper adjustment may lead to biased estimates in some cases. Carroll et al. (2006) reviewed general methods for measurement error problems.

Missing Data Problem

Missing data problems mainly encompass missing responses, missing covariates, and dropouts. A main reason is that not all subjects are available at every measurement time throughout the entire study, especially if the study lasts for a long period. In the cases of missing responses, missing covariates, and dropouts, part of the data are completely missing. The missingness can be classified into three mechanisms:

- Missing completely at random (MCAR): missingness depends neither on the observed data nor on the unobserved data.
- Missing at random (MAR): missingness may depend on the observed data but not on the unobserved data.
- Missing not at random (MNAR): missingness may depend on both the observed data and the unobserved data.

For instance, suppose a student has not submitted his assignment in class on the due day, so his grade for that assignment is missing. If he left his assignment at home, the missing grade is MCAR. If he missed that class, the missing grade is MAR, since the missingness is determined by the observed data "attendance".
If he has not finished his assignment, the missing grade is MNAR.

There is an enormous and valuable literature on this topic, since missing data are often informative and nonignorable, and statistical inference that ignores this issue may result in biased estimation. Little and Rubin (2002) provided a systematic introduction to major missing data methods. Schafer (1997) gave a general introduction to incomplete multivariate data analysis. Wu (2009) offered an overview of analyzing incomplete data problems with mixed effects models.

1.1.3 Time-to-Event Data

In many longitudinal studies, the times at which certain events happen, such as the time of a patient's death, are of great interest. It is conventional to speak of survival data and survival analysis, regardless of the nature of the events. Often, statistical models are built to study the relationship between the event time and other covariates.

The distinguishing feature of time-to-event data is censoring, which means that the event time may not be observed exactly. For example, suppose that a subject is lost to follow-up after 5 years of observation. Censoring issues can be classified into three types:

- Right-censoring: the time of the event is not observed because it happened after the 5th year.
- Left-censoring: the subject had the event before the 5th year, but the exact time of the event is unknown.
- Interval-censoring: the subject had the event between the 4th and 5th years, but the exact time of the event is unknown.

Here are some possible reasons why censoring may occur: a subject does not experience the event before the study ends; a person is lost to follow-up during the study period; a person withdraws from the study because of death (if death is not the event of interest) or some other reason.
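The mechanics of right-censoring can be sketched by simulation: for each subject we record only the earlier of the true event time and the censoring time, together with an indicator of which one we saw. The exponential distributions, rates, and 5-year study end below are hypothetical choices for illustration only.

```python
import random

random.seed(2)

study_end = 5.0   # administrative censoring at year 5 (hypothetical)

observed, status = [], []
for _ in range(1000):
    event = random.expovariate(0.2)        # true event time (mean 5 years)
    lost = random.expovariate(0.05)        # time of loss to follow-up
    censor = min(lost, study_end)
    observed.append(min(event, censor))    # we see whichever comes first
    status.append(1 if event <= censor else 0)  # 1 = event, 0 = right-censored

print(sum(status), "events observed out of", len(status))
```

Survival methods such as the Cox and AFT models mentioned below are built precisely to use the pairs (observed time, status) without discarding the censored records.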
The censoring feature, which is sometimes regarded as a special missing data issue, means that specific statistical methodology is necessary. Common approaches in time-to-event data analysis include the Cox proportional hazards model (Cox model), which assumes that the effect of a covariate is to multiply the hazard by some constant, and the accelerated failure time model (AFT model), which assumes that the effect of a covariate is to multiply the predicted event time by some constant. Time-to-event data have been extensively studied; Lawless (2003) provided a comprehensive introduction to the relevant statistical models and methods.

1.2 Joint Modeling Longitudinal Data and Time-to-Event Data

1.2.1 Motivation

The statistical issues mentioned in Section 1.1 often occur simultaneously but used to be analyzed separately in longitudinal studies. Previous research has reported that the intrinsic relationships among the longitudinal process, the time-to-event process, and the incomplete data mechanism can be greatly influential to statistical inference for diverse scientific objectives.

Wu and Carroll (1988) reported that analyzing longitudinal data without modeling informative censoring might lead to biased results. Their work was generalized by DeGruttola and Tu (1994). Little (1995) pointed out the possible estimation bias in analyzing longitudinal data due to ignoring missing data problems. Wu (2002) and Liu and Wu (2007) revealed the effects of measurement errors and missing data problems on inference for the longitudinal process. This research implies that classical statistical methods for longitudinal data analysis may need to be adjusted to incorporate certain critical events in longitudinal studies, such as outcome-dependent drop-out.

Faucett and Thomas (1996) and Wulfsohn and Tsiatis (1997) modeled time-to-event data with time-varying covariates measured with errors, where the joint modeling approach and the separate modeling approach provided different estimates. This research indicated that valid modeling of time-to-event data might be conditional on relevant longitudinal processes, for instance the time-varying covariates. Their research was generalized, respectively, by Xu and Zeger (2001a,b) and Song et al. (2002a,b).

Henderson et al. (2000) jointly analyzed longitudinal data and time-to-event data, with equal interest in both, using likelihood-based approaches. Wang and Taylor (2001) and Brown and Ibrahim (2003) further generalized the modeling assumptions on the longitudinal process. Joint modeling of both the longitudinal process and the time-to-event process is particularly essential if the association between the event times and the growth trends of the longitudinal process is of interest.

Generally speaking, joint modeling links the longitudinal process and the time-to-event process via a common set of random effects.
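This linkage through shared random effects can be sketched by generating both processes from the same b_i. The functional forms, the AFT-type link for the event time, and all coefficients below (including the association parameter alpha) are hypothetical, chosen only to show the dependence that joint models are designed to capture.

```python
import math
import random

random.seed(3)

# Hypothetical joint model: a shared random intercept b_i shifts both the
# longitudinal trajectory and the log event time (an AFT-type link).
n = 500
alpha = -0.7          # association parameter: how b_i shifts log event time

records = []
for i in range(n):
    b_i = random.gauss(0, 1.0)                 # shared random effect
    # longitudinal process: y_ij = 1.0 + 0.3 t + b_i + noise
    traj = [1.0 + 0.3 * t + b_i + random.gauss(0, 0.5) for t in range(4)]
    # time-to-event process: log T_i = 1.5 + alpha * b_i + noise
    log_t = 1.5 + alpha * b_i + random.gauss(0, 0.3)
    records.append((traj, math.exp(log_t)))

# Subjects with higher trajectories (larger b_i) tend to have shorter event
# times because alpha < 0; separate models for each process ignore exactly
# this dependence.
high = [t for traj, t in records if traj[0] > 1.0]
low = [t for traj, t in records if traj[0] <= 1.0]
mean = lambda xs: sum(xs) / len(xs)
print(mean(high) < mean(low))
```

Estimating alpha (and testing whether it is zero) is one way to quantify the association between the two processes under such a model.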
Under the joint modeling framework, the main focus could be on the longitudinal process alone, on the time-to-event process alone, or on both processes. Since the joint modeling approach contains at least two processes, there are usually more parameters to estimate than in a single-process inference. If only one specific process is of interest, the other process is best modeled as simply as possible to reduce the number of nuisance parameters. Tsiatis and Davidian (2004) provided an excellent overview of the motivations for joint modeling and the relevant preceding literature. The rest of this section briefly reviews the statistical models and inference approaches that have been adopted in recent research.

1.2.2 Statistical Models

Models for Longitudinal Data

Much research adopts mixed effects models for longitudinal data. One typical choice, adopted by Ratcliffe et al. (2004), Zeng and Cai (2005a,b), Li et al. (2009) and Huang et al. (2009), is the parametric linear mixed effects model with covariates as predictors. Other researchers, such as Brown et al. (2005), Ding and Wang (2008), Ye et al. (2008) and Zhang et al. (2010), used nonparametric models for more flexible descriptions of the longitudinal processes. Vonesh et al. (2006) and Yu et al. (2008) adopted nonlinear mixed effects models according to the corresponding scientific backgrounds. Rizopoulos et al. (2008), Liu (2009) and Li et al. (2010) established their approaches on generalized mixed effects models for analyzing categorical longitudinal data.

Models for Time-to-Event Data

For modeling time-to-event data, some joint modeling studies, including Ratcliffe et al. (2004), Song and Huang (2006), Ding and Wang (2008), Li et al. (2009) and Zhang et al. (2010), were based on the Cox proportional hazards model, partially because the Cox model is widely used in practice. Recently, according to the scientific background of the data, more and more attention has turned to the accelerated failure time model, as in Tseng et al. (2005) and Rizopoulos et al. (2008). In particular, Vonesh et al. (2006) and Rizopoulos et al. (2010) adopted the Weibull model, which bears the characteristics of both the Cox model and the AFT model.

1.2.3 Inference Approaches

Two-stage Approach

The idea of the two-stage approach is to first estimate the shared parameters b_i from either the longitudinal process or the time-to-event process, as in Zhang et al. (2008) and Ye et al. (2008), with the corresponding parameters of the first model being estimated. Then the estimated shared parameters b̂_i are used in the estimation of the parameters of the second model. Although the two-stage approach is easy to implement, it has several drawbacks (Wulfsohn and Tsiatis, 1997). For example, the estimated parameters of one process from the first step are regarded as fixed in the modeling of the other process in the second step, which does not propagate uncertainty from step one to step two.

Likelihood-Based Approach

A more unified alternative for statistical inference is based on the joint likelihood function given both the longitudinal data and the time-to-event data.
The joint modeling approaches simultaneously estimate the parameters that describe the longitudinal process as well as those that describe the time-to-event process. Besides enhancing the efficiency of the inference, joint modeling is also expected to provide more precise estimation of the relationship between the two processes.

Tseng et al. (2005) considered an LME model for the longitudinal data along with a general AFT model for the time-to-event data, and developed an EM algorithm to maximize the resulting log-likelihood, which involves intractable integrals over the distribution of the random effects. Similar EM algorithms have been applied in many other studies based on Cox models, such as Ratcliffe et al. (2004), Wu et al. (2008), Rizopoulos et al. (2009) and Li et al. (2010). Vonesh et al. (2006) and Rizopoulos et al. (2009) effectively reduced the computational difficulties in the E-step by using Laplace approximations.

Zeng and Cai (2005a) adopted an LME model and a Cox model for the longitudinal process and the time-to-event process, and discussed the asymptotic behavior of the MLE obtained from the EM algorithm. Hsieh et al. (2006) reinforced the merit of the joint modeling approach in Wulfsohn and Tsiatis (1997) by providing a theoretical explanation of the robustness features observed in the literature. They suggested that the likelihood-based procedure with normal random effects can be very efficient and robust as long as there is rich enough information available from the longitudinal data. However, if the longitudinal data are too sparse or carry too large measurement errors, the efficiency loss of joint modeling can be quite substantial. Hsieh et al. (2006) also recommended the bootstrap method to obtain more reliable estimates of the standard errors of the MLEs.

Brown et al. (2005), Yu et al. (2008) and Zhang et al.
(2010) took Bayesian approaches to jointly model the two processes and developed Markov chain Monte Carlo (MCMC) implementations, which were also based on the likelihood. Ibrahim et al. (2001b, chap. 7) provided a detailed discussion of joint modeling from a Bayesian perspective. Yu et al. (2004) compared the inference results of a Bayesian approach and an EM approach, both of which rely on the specification of likelihood functions.

Semiparametric Approach

Song and Huang (2006) and Song and Wang (2008) focused on estimating the fixed effects in joint models of the longitudinal process and the time-to-event process by developing sets of unbiased estimating equations (conditional and corrected score approaches), which yield consistent and asymptotically normal estimators with no assumptions on the random effects. Their approaches reduce reliance on parametric modeling assumptions.

1.3 Motivating Example

1.3.1 Brief Biological Background

Human Immunodeficiency Virus (HIV) is a virus that directly attacks certain human organs, such as the brain, heart, and kidneys, as well as the human immune system. Infection with HIV occurs by the transfer of blood, semen, vaginal fluid, pre-ejaculate, or breast milk. Within these bodily fluids, HIV is present as both free virus particles and virus within infected immune cells. The four major routes of transmission are unsafe sex, contaminated needles, breast milk, and transmission from an infected mother to her baby at birth (vertical transmission). Screening of blood products for HIV has largely eliminated transmission through blood transfusions or infected blood products in the developed world.

The primary cells attacked by HIV are the CD4 lymphocytes, which help direct immune function in the body. HIV infection leads to low levels of CD4 cells through three main mechanisms: direct viral killing of infected cells; increased rates of apoptosis in infected cells; and killing of infected CD4 cells by CD8 cytotoxic lymphocytes that recognize infected cells. Since CD4 cells are required for proper immune system function, when enough CD4 lymphocytes have been destroyed by HIV, the immune system barely works. Many problems experienced by people infected with HIV result from a failure of the immune system to protect them from certain opportunistic infections (OIs) and cancers.

Most people infected with HIV eventually develop Acquired Immunodeficiency Syndrome (AIDS). These individuals mostly die from opportunistic infections or malignancies associated with the progressive failure of the immune system. HIV progresses to AIDS at a variable rate affected by viral, host, and environmental factors; most individuals progress to AIDS within 10 years of HIV infection. HIV treatment with anti-retrovirals increases the life expectancy of people infected with HIV. Even after HIV has progressed to diagnosable AIDS, the average survival time with antiretroviral therapy was estimated to be more than 5 years as of 2005. Without antiretroviral therapy, someone who has AIDS typically dies within a year.
1.3.2 Statistical Question

According to clinical experience, HIV infection often occurs along with variation in CD4 cells, CD8 cells and their ratio. Recently, the CD4/CD8 ratio has become a new biomarker for assessing the relative condition of HIV subjects and, furthermore, for predicting the progression from HIV to AIDS.

After certain anti-HIV treatments, the viral loads of HIV-infected subjects are expected to decline, roughly with biphasic exponential decay. Meanwhile, we might observe a corresponding increase of CD4 and a decrease of CD8, and consequently an increase of the CD4/CD8 ratio. However, the viral load might rebound after a period of treatment for various reasons. The relationship between HIV viral suppression and immune restoration receives great attention in AIDS research.

[Figure 1.2: Plots of Longitudinal Data of Subject 10 — pairwise trajectories of log10(RNA), CD4, and CD4/CD8 over time.]

Our research was motivated by an AIDS Clinical Trials Group (ACTG) study (Wu and Ding, 1999) demonstrating that the initial viral decay rate reflects the efficacy of an anti-HIV treatment.

[Figure 1.3: Plots of Longitudinal Data of Subject 18 — pairwise trajectories of log10(RNA), CD4, and CD4/CD8 over time.]

[Figure 1.4: Plots of Longitudinal Data of Subject 33 — pairwise trajectories of log10(RNA), CD4, and CD4/CD8 over time.]

Figures 1.2, 1.3 and 1.4 show the trajectories of HIV viral load, CD4 count, and CD4/CD8 ratio for three randomly selected subjects. HIV viral load appears to be negatively correlated with both CD4 count and CD4/CD8 ratio. This thesis studies the potential underlying association between the viral load decay and the CD4/CD8 time trend by jointly modeling the HIV viral load dynamics and the time of the first decrease of CD4/CD8.

1.4 Objective and Outline

In this thesis, we develop a new approximate statistical inference method for jointly modeling a longitudinal process and a time-to-event process with covariate measurement errors. Compared with other existing inference methods, the method we propose is easy to implement and much more computationally efficient.

This thesis is organized as follows. Chapter 2 introduces the statistical models that will be included in our joint modeling. Chapter 3 discusses currently available inference methods and proposes our new methods with procedural details.
These methods are applied to a real dataset from the motivating example in Chapter 4. The simulation results are displayed in Chapter 5. We finally draw conclusions and discuss future research in Chapter 6.

Chapter 2
Statistical Models

The joint modeling of a longitudinal process and a time-to-event process is basically composed of several simpler models, each representing a different specific process. In this chapter, we introduce the statistical models for each process in our longitudinal study.

2.1 Nonlinear Mixed Effects (NLME) Model for Longitudinal Process

The general LME model (1.1) has achieved numerous successes in both theoretical and applied research for its simplicity in modeling, computation and interpretation. However, a linear structure only provides a local linear approximation to the growth trend of the response variable and thus limits our vision to the range of the observed data. In many longitudinal studies, the development of the response variables has an underlying deterministic mechanism, which can often be represented by a nonlinear mathematical formulation with physically meaningful parameters. The nonlinear formulation, if correctly specified, supports more precise estimation for the statistical model, both inside and outside the range of the observed data, with fewer parameters than linear models. Therefore, nonlinear mixed effects (NLME) models are becoming increasingly favorable for representing longitudinal processes with known developing mechanisms.

Suppose n subjects are included in a longitudinal study. Let y_ij be the response value of subject i at time t_ij, and let z_ij be the corresponding covariate value.
We can write the NLME model as the following two-stage hierarchical nonlinear model:

y_{ij} = g(t_{ij}, \theta_i) + e_{ij},   (2.1)
\theta_i = h(z_i, \beta, b_i),   (2.2)
b_i \sim N(0, B), \quad e_i | b_i \sim N(0, R_i),   (2.3)

where g(\cdot) and h(\cdot) are given nonlinear functions, \theta_i are the subject-specific parameters and \beta are the fixed effects (population parameters), z_i contains the covariates of subject i, b_i = (b_{i1}, \ldots, b_{iq})^T are the random effects distinguishing subjects, e_i = (e_{i1}, \ldots, e_{in_i})^T are the random errors of the within-subject measurements, B is the variance-covariance matrix of the random effects, and R_i is an n_i \times n_i matrix characterizing the variance-covariance of the within-subject measurements. In practice, the function g(\cdot) is determined by the scientific problem and the data background. The function h(\cdot) is often a linear combination of the fixed effects and the random effects, such as

\theta_i = A_i \beta + B_i b_i,   (2.4)

where A_i is a design matrix depending on the elements of z_i, and B_i is a design matrix, typically involving only zeros and ones, allowing some elements of \theta_i to have no associated random effects. The random effects b_i represent between-subject variation that is not explained by the covariates z_i. Thus, the variance-covariance matrix B is usually unstructured. The error terms e_i represent the lack of fit of the model and possible measurement errors in the data. The choice of R_i should be guided by the practical background, on which Davidian and Giltinan (2003) gave detailed discussions. A widely applicable approach for estimating the parameters of an NLME model is based on likelihood.
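Evaluating such a likelihood requires integrating over the unobservable random effects b_i. As a toy illustration, the integral for one subject can be approximated by simple Monte Carlo averaging; the exponential-decay model, parameter values, and error scale below are hypothetical choices for illustration, not the thesis's model:

```python
import numpy as np

def marginal_loglik_mc(y, t, beta, sigma, B_sd, n_mc=4000, seed=0):
    """Monte Carlo approximation of an NLME marginal log-likelihood for one
    subject, under an illustrative model y_ij = exp(beta + b_i)*exp(-t_ij) + e_ij
    with b_i ~ N(0, B_sd^2) and e_ij ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, B_sd, size=n_mc)              # draws from f(b_i | B)
    g = np.exp(beta + b)[:, None] * np.exp(-t)[None, :]
    resid = y[None, :] - g
    # conditional log-density log f(y_i | b_i) for each draw
    log_f = -0.5 * np.sum((resid / sigma) ** 2 + np.log(2 * np.pi * sigma**2), axis=1)
    # averaging over draws approximates the integral (log-sum-exp for stability)
    m = log_f.max()
    return m + np.log(np.mean(np.exp(log_f - m)))

t = np.array([0.0, 0.5, 1.0])
y = np.exp(0.3) * np.exp(-t)                          # noise-free toy data
ll = marginal_loglik_mc(y, t, beta=0.3, sigma=0.2, B_sd=0.1)
```

The same averaging idea underlies the Monte Carlo methods discussed below; in practice importance sampling or quadrature is used when the random effects are high-dimensional.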
Let \theta = (\beta, \gamma, B) denote all parameters, where \gamma is the collection of parameters in R_i, i = 1, \ldots, n. The marginal distribution of the response y_i is given by

f(y_i | \theta) = \int f(y_i | z_i, \beta, b_i) f(b_i | B) \, db_i,   (2.5)

and the likelihood is

L(\theta | y) = \prod_{i=1}^{n} L(\theta | y_i) = \prod_{i=1}^{n} \int f(y_i | z_i, \beta, b_i) f(b_i | B) \, db_i.   (2.6)

Statistical inference is then based on the maximum likelihood estimation (MLE) approach, which means that the classic large-sample asymptotic results are expected to hold in this situation. In general, the marginal distribution (2.5) or the likelihood function (2.6) has no closed-form expression, which greatly distinguishes the NLME model from the LME model. This characteristic implies that a major challenge of the likelihood-based approach for NLME models is the maximization of the likelihood function (2.6). Commonly applied methods are:

- numerical or Monte Carlo methods, which use different integration techniques to approximate the integral in (2.6) and then maximize the approximate likelihood;
- the EM algorithm, which iteratively maximizes the likelihood function (2.6);
- the Laplace approximation method, which approximates the integral in (2.6) by a simple form and then maximizes the approximate likelihood;
- the Taylor expansion method, which linearizes the NLME model into a sequence of LME models and approximates the maximizer of the likelihood function (2.6).

Davidian and Giltinan (1995) and Wu (2009) provided comprehensive discussions of the asymptotic results and implementation details.

2.2 Nonparametric Model for Covariate Measurement Errors Problem

In longitudinal studies, the covariates are usually recorded and analyzed to partially explain the between-subject variance. However, the covariates may be measured with substantial errors. It is well known that ignoring covariate measurement errors may lead to severe bias in statistical inference (see, e.g., Carroll et al. (2006)).
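This bias can be seen in a minimal simulation; the linear model and variances below are arbitrary illustrative choices. Regressing an outcome on an error-contaminated covariate attenuates the slope toward zero:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
z_true = rng.normal(0.0, 1.0, size=n)             # true covariate values
y = 1.5 * z_true + rng.normal(0.0, 0.5, size=n)   # outcome depends on the truth
z_obs = z_true + rng.normal(0.0, 1.0, size=n)     # observed with measurement error

slope_true = np.polyfit(z_true, y, 1)[0]          # close to the true slope 1.5
slope_naive = np.polyfit(z_obs, y, 1)[0]          # attenuated: roughly 1.5 * 1/2
```

With error variance equal to the covariate variance, the attenuation factor is var(z)/(var(z)+var(error)) = 1/2, so the naive slope is biased toward roughly half its true value.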
Therefore, the covariate measurement errors problem should be incorporated into the analysis of longitudinal data. Generally, the choice of an appropriate analysis method for a measurement errors problem depends on the following two considerations.

- The choice of statistical model depends on the type and amount of data available for analysis. A sophisticated error model may be less reliable without sufficient data.
- Measurement error models can mainly be classified into functional models and structural models. Comparatively speaking, functional models treat z as a sequence of unknown fixed constants or as a random variable with minimal distributional assumptions. This leads to estimation procedures that are robust to misspecification of the distribution of z. On the other hand, structural models assume z follows a specific parametric distribution, and the resulting estimates are usually more efficient if the parametric distribution is correctly specified.

From the viewpoint of functional data analysis, measurement errors problems in longitudinal studies are closely related to fitting several smooth curves. Thanks to the great developments in functional data analysis and nonparametric statistical analysis in recent years, researchers now have more flexible techniques to handle complex data with high precision. Rice and Wu (2001) proposed a linear mixed effects regression spline model for making subject-specific inference for covariates measured with error and for distinguishing the between-subject and within-subject variance. Liu and Muller (2009) developed a general and model-free dynamic equation for obtaining the derivatives of growth curves for sparse data. This empirical differential equation has the potential to provide valuable insights into the time-dynamics of random trajectories. In our study, we adopt the approach of Rice and Wu (2001) for modeling the time-varying covariate process to incorporate the measurement errors. Since the covariate process could be quite complex, a flexible empirical nonparametric linear mixed effects model is considered:

z_i(t) = r(t) + h_i(t) + \epsilon_i(t) \equiv z_i^*(t) + \epsilon_i(t), \quad i = 1, \ldots, n,   (2.7)

where z_i^*(t_{ik}) = r(t_{ik}) + h_i(t_{ik}) are the true but unobserved covariate values, r(t_{ik}) and h_i(t_{ik}) are unknown nonparametric smooth functions representing the population mean and the subject-specific variation respectively, and \epsilon_i(t_{ik}) are error terms following N(0, \delta^2). The random smooth function h_i(\cdot) is introduced to incorporate the large between-subject variation in the covariate process, while the fixed smooth function r(\cdot) represents the population average of the covariate process. We assume that h_i(\cdot) is the realization of a zero-mean stochastic process. As in Rice and Wu (2001), we can approximate the nonparametric functions r(t) and h_i(t) by linear combinations of basis functions \psi_p(t) = [\psi_0(t), \psi_1(t), \ldots, \psi_{p-1}(t)]^T and \phi_q(t) = [\phi_0(t), \phi_1(t), \ldots, \phi_{q-1}(t)]^T as follows:

r(t) \approx r_p(t) = \sum_{k=0}^{p-1} \alpha_k \psi_k(t) = \psi_p(t)^T \alpha,   (2.8)
h_i(t) \approx h_{iq}(t) = \sum_{k=0}^{q-1} a_{ik} \phi_k(t) = \phi_q(t)^T a_i, \quad i = 1, \ldots, n,   (2.9)

where \alpha = (\alpha_0, \ldots, \alpha_{p-1})^T is a p \times 1 vector of fixed effects, a_i = (a_{i0}, \ldots, a_{i,q-1})^T is a q \times 1 vector of random effects, and we assume the a_i are i.i.d. N(0, A). The functions \psi(\cdot) and \phi(\cdot) can in principle be taken as any basis functions, including various types of splines, such as B-splines (de Boor, 1978), smoothing splines (Chambers, 1991) and P-splines (Ramsay and Silverman, 2005), or local polynomials (Fan and Gijbels, 1996). The above approximations of r(t) and h_i(t) can be made arbitrarily accurate as the values of p and q increase. However, values of p and q that are too large may lead to overfitting. Appropriate values of p and q can be determined by standard model selection criteria such as AIC or BIC (Rice and Wu, 2001).
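A minimal sketch of the truncated expansion (2.8) follows, using a plain polynomial basis as a stand-in for the spline bases discussed above (the target curve and basis are illustrative choices). The approximation error shrinks as p grows, which is the trade-off that AIC or BIC resolves:

```python
import numpy as np

def basis_fit(t, r_vals, p):
    """Least-squares fit of r(t) by a truncated basis expansion as in (2.8),
    with the simple polynomial basis psi_k(t) = t^k (illustrative only)."""
    Psi = np.vander(t, p, increasing=True)        # n x p basis matrix
    alpha, *_ = np.linalg.lstsq(Psi, r_vals, rcond=None)
    return Psi @ alpha

t = np.linspace(0.0, 1.0, 50)
r_vals = np.sin(2 * np.pi * t)                    # a smooth "population curve"
err = {p: np.max(np.abs(basis_fit(t, r_vals, p) - r_vals)) for p in (2, 4, 8)}
```

Here `err` records the maximum approximation error for p = 2, 4, 8 basis functions; increasing p past what the data support would instead begin fitting noise.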
By approximating r(t) and h_i(t) with r_p(t) and h_{iq}(t) respectively, the covariate model (2.7) can be approximated by the following LME model

z_i = z_i^* + \epsilon_i = \Psi_i \alpha + \Phi_i a_i + \epsilon_i,   (2.10)
a_i \sim N(0, A), \quad \epsilon_i | a_i \sim N(0, \delta^2 I_i),   (2.11)

where \Psi_i and \Phi_i are the design matrices with rows \psi_p(t_{ik})^T and \phi_q(t_{ik})^T. Model (2.10) can be used to model covariate measurement errors problems (Carroll et al., 2006). For multiple covariates, we can model each covariate separately using (2.10). However, a more reasonable approach is to consider a multivariate version of model (2.10), such as a multivariate LME model (Shah et al., 1997).

2.3 AFT Model for Time-to-Event Data

In the analysis of time-to-event data, the survivor function and the hazard function, which depend on time, are of particular interest. Denote by f(t) the probability density function (p.d.f.) of the event time T. The survivor function S(t) is defined as the probability of surviving to time t,

S(t) = \Pr(T \ge t) = \int_t^{\infty} f(x) \, dx.   (2.12)

The graph of S(t) against t is called the survival curve. The hazard function h(t) is defined as the event rate at time t conditional on no event occurring until time t or later:

h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)} = -\frac{d}{dt} \log S(t).   (2.13)

Many statistical methods for analyzing time-to-event data, whether parametric, semiparametric or nonparametric, are closely related to the likelihood function given the data, which is determined by the existence and specific type of censoring. Consider a simple right-censoring case for example. Let T_i be the event time for subject i, i = 1, \ldots, N, and let C_i be the censoring time for subject i. If no censoring occurs, T_i is observed and is no larger than the potential censoring time (T_i \le C_i). If censoring occurs, we only know that T_i > C_i. In general, we denote

\tau_i = \min(T_i, C_i), \quad \delta_i = I(T_i \le C_i).   (2.14)

Both \tau_i and \delta_i are random variables, and their joint p.d.f.
is

f(\tau_i)^{\delta_i} S(\tau_i)^{1-\delta_i}.   (2.15)

Supposing T_1, \ldots, T_N are independent, we may obtain the likelihood function given the right-censored data as

L = \prod_{i=1}^{N} f(\tau_i)^{\delta_i} S(\tau_i)^{1-\delta_i}.   (2.16)

The Kaplan-Meier method, the nonparametric MLE of the survivor function S(t), can be used to estimate this curve from the observed survival times without distributional assumptions. In many cases, however, the survivor function S(t) is assumed to follow a certain distribution. For instance, if S(t) can be written as

S(T | u, \sigma) = S_0\left(\frac{\log T - u}{\sigma}\right),   (2.17)

where u \in \mathbb{R} is a location parameter and \sigma > 0 is a scale parameter, we say T has a log-location-scale distribution, a widely used parametric family for time-to-event data. If we generalize u to u(z), we may take covariates z into account, that is,

S(T | z, u, \sigma) = S_0\left(\frac{\log T - u(z)}{\sigma}\right).   (2.18)

Suppose \epsilon is a random variable with survivor function S_0(\cdot) that is independent of z; then (2.18) can be rewritten as

\log T = u(z) + \sigma\epsilon.   (2.19)

Here the survivor function is represented in a regression setting. Formula (2.19) is a general semiparametric AFT model. In practice, u(\cdot) can be taken as a linear function, giving the following parametric AFT model

\log T = \beta_0 + \beta_1^T z + \sigma\epsilon,   (2.20)

where \epsilon is independent of z. In this thesis, we consider the random effects a_i and b_i as covariates and build the following AFT frailty model for the time-to-event data:

\log T_i = \beta_0 + \beta_1^T a_i + \beta_2^T b_i + \sigma\epsilon_i, \quad i = 1, \ldots, n,   (2.21)

where \beta = (\beta_0, \beta_1, \beta_2) are the unknown parameters and the \epsilon_i are i.i.d. random variables. Note that the NLME longitudinal model (2.1) and the time-to-event model (2.21) are connected through the random effects b_i. There are three major parametric AFT models, based on different distributions of \epsilon.

- The Weibull AFT model. Assume the event times follow a Weibull distribution with scale parameter \exp(\beta_0 + \beta_1^T a_i + \beta_2^T b_i) and shape parameter 1/\sigma; the survivor function is

S(t) = \exp\left[-\exp\left(\frac{\log t - \beta_0 - \beta_1^T a_i - \beta_2^T b_i}{\sigma}\right)\right].   (2.22)

Because the Weibull distribution has both the proportional hazards property and the accelerated failure time property, this model is particularly attractive.

- The log-logistic AFT model. Assume the event times follow a log-logistic distribution with parameters \beta_0 + \beta_1^T a_i + \beta_2^T b_i and 1/\sigma; the survivor function is

S(t) = \left[1 + \exp\left(\frac{\log t - \beta_0 - \beta_1^T a_i - \beta_2^T b_i}{\sigma}\right)\right]^{-1}.   (2.23)

- The log-normal AFT model. Assume the event times follow a log-normal distribution with parameters \beta_0 + \beta_1^T a_i + \beta_2^T b_i and \sigma; the survivor function is

S(t) = 1 - \Phi\left(\frac{\log t - \beta_0 - \beta_1^T a_i - \beta_2^T b_i}{\sigma}\right).   (2.24)

Once a distribution is assumed, one may find the corresponding likelihood function by specifying f(t) and S(t) in (2.16). Suppose the survival time in the AFT model follows a Weibull distribution; the p.d.f. and survivor function of \epsilon are

f(\epsilon) = \exp(\epsilon - e^{\epsilon}), \quad S(\epsilon) = \exp(-e^{\epsilon}),   (2.25)

so that, writing \epsilon_i = \sigma^{-1}(\log t - \beta_0 - \beta_1^T a_i - \beta_2^T b_i),

f(t) \propto \exp\left[\sigma^{-1}(\log t - \beta_0 - \beta_1^T a_i - \beta_2^T b_i)\right] \exp\left\{-\exp\left[\sigma^{-1}(\log t - \beta_0 - \beta_1^T a_i - \beta_2^T b_i)\right]\right\},   (2.26)

and

S(t) = \exp\left\{-\exp\left[\sigma^{-1}(\log t - \beta_0 - \beta_1^T a_i - \beta_2^T b_i)\right]\right\}.   (2.28)

The log-likelihood of the AFT model given the random effects is then, up to an additive constant,

\ell(\beta, \sigma | a_i, b_i) = \sum_{i=1}^{n} (\delta_i \epsilon_i - e^{\epsilon_i}) = \sum_{i=1}^{n} \left\{\delta_i \sigma^{-1}(\log T_i - \beta_0 - \beta_1^T a_i - \beta_2^T b_i) - \exp\left[\sigma^{-1}(\log T_i - \beta_0 - \beta_1^T a_i - \beta_2^T b_i)\right]\right\}.

By integrating out the random effects a_i, b_i, we can obtain the log-likelihood as a function of (\beta, \sigma):

\ell(\beta, \sigma) = \sum_{i=1}^{n} \iint \ell_i(\beta, \sigma | a_i, b_i) f(a_i | A) f(b_i | B) \, da_i \, db_i   (2.29)
= \frac{1}{\sigma} \sum_{i=1}^{n} \delta_i \left[\log T_i - \beta_0 - \beta_1^T E(a_i) - \beta_2^T E(b_i)\right] - \sum_{i=1}^{n} \exp\left(\frac{\log T_i - \beta_0}{\sigma}\right) \int \exp\left(-\frac{\beta_1^T a_i}{\sigma}\right) f(a_i | A) \, da_i \int \exp\left(-\frac{\beta_2^T b_i}{\sigma}\right) f(b_i | B) \, db_i
= \frac{1}{\sigma} \sum_{i=1}^{n} \delta_i \left[\log T_i - \beta_0 - \beta_1^T E(a_i) - \beta_2^T E(b_i)\right] - \sum_{i=1}^{n} \exp\left(\frac{\log T_i - \beta_0}{\sigma}\right) \exp\left(-\frac{\beta_1^T E(a_i)}{\sigma} + \frac{\beta_1^T \Sigma_{a_i} \beta_1}{2\sigma^2}\right) \exp\left(-\frac{\beta_2^T E(b_i)}{\sigma} + \frac{\beta_2^T \Sigma_{b_i} \beta_2}{2\sigma^2}\right),

where \Sigma_{a_i} and \Sigma_{b_i} denote the covariance matrices of a_i and b_i. Statistical inference can then be carried out based on the log-likelihood function (2.29), and MLE theory applies.

2.4 Models for Nonignorable Missing Data

Many generally applicable statistical inference methods for incomplete data are likelihood-based; the idea is illustrated by a missing-response case in the following. Suppose some values of the response variable y are missing. Let r_i = 1 if y_i is missing and r_i = 0 if y_i is observed. Denote the latent variable carrying the information about the missingness by \omega; the joint distribution of the response y and the missingness indicator r, conditional on covariates x, is given by \Pr(y, r | x, \omega). Since this distribution can be rewritten as

\Pr(y, r | x, \omega) = \Pr(y | x, \theta(\omega)) \Pr(r | y, x, \phi(\omega)),   (2.30)

where \theta(\omega) and \phi(\omega) are subsets or transformations of \omega, the likelihood function can be obtained if the full-data response model \Pr(y | x, \theta(\omega)) and the missing-data mechanism \Pr(r | y, x, \phi(\omega)) are specified. Statistical inference can therefore be carried out by maximum likelihood theory. Little (1992) reviewed statistical methods of estimation in regression models with missing covariates. Six methods dealing with missing covariates are compared: complete-case methods, available-case methods, least squares on imputed data, maximum likelihood methods, Bayesian methods and multiple imputation. He suggested that the maximum likelihood method, Bayesian methods, and multiple imputation perform well, that the maximum likelihood method is preferred in large samples, and that Bayesian methods or multiple imputation are preferred in small samples. In the specification of the NLME model (2.1), all covariate values should be available, either observed or estimated, at the response measurement times t_{ij}. However, in practice this may not be the case, since covariates may be measured at different time points or may be missing at t_{ij}.
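A small simulation illustrates why a nonignorable mechanism must be modeled rather than ignored; the mechanism and constants below are toy assumptions, not the thesis's model. When the probability of missingness depends on the value itself, the complete-case estimate is badly biased:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
y = rng.normal(0.0, 1.0, size=n)            # true responses, population mean 0

# nonignorable mechanism: larger y is more likely to be missing,
# i.e. logit(Pr(r = 1)) = 2 * y (an arbitrary illustrative choice)
p_miss = 1.0 / (1.0 + np.exp(-2.0 * y))
r = rng.uniform(size=n) < p_miss            # r = 1 indicates missing
cc_mean = y[~r].mean()                      # complete-case estimate of E(y)
```

Because large values of y are preferentially dropped, `cc_mean` falls well below the true mean 0; a likelihood that includes a model for Pr(r | y) can correct this, while complete-case analysis cannot.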
Model (2.10) automatically incorporates missing covariates when the missing-data mechanism is ignorable (e.g., MAR or MCAR), since model (2.10) is fitted to the observed covariates, which may be measured at different time points than the responses. When the missing covariates are nonignorable (i.e., the missingness may be related to the missing values), the missing-data mechanism should be modeled. Little (1995) gave an overview of modeling dropout mechanisms in longitudinal studies. For example, we may assume that the missingness is related to the true but unobserved covariate value and the previous missing status. That is, we may assume that

\mathrm{logit}(\Pr(r_{ij} = 1)) = \gamma_0 + \gamma_1 z_i^*(t_{ij}) + \gamma_2 r_{i,j-1},   (2.31)

where z_i^*(t_{ij}) = \psi_p(t_{ij})^T \alpha + \phi_q(t_{ij})^T a_i. We can then incorporate the missing-data model into the joint likelihood for inference. Alternative missing-data models may also be considered. The statistical analysis in this thesis is mainly based on likelihood. When the missing covariates are nonignorable, we only need to add a model for the missing-data mechanism to the joint likelihood and then proceed with the analysis. For simplicity of presentation, we focus on ignorable missing covariates in the following sections.

Chapter 3 Simultaneous Inference

Having introduced several models in the previous chapter for the statistical problems discussed in Section 1.1, we now turn our attention to how to analyze the data based on these models. As we emphasized in Section 1.2.1, the longitudinal data and the survival data are associated and the models share the same random effects. Thus, separate analyses of the longitudinal data and the survival data may lead to biased results. This essential feature calls for simultaneous inference for these statistical models. In this chapter, we first introduce a simple method that can be implemented easily with standard software.
Then we discuss two likelihood-based simultaneous inference methods, the EM algorithm and the Laplace approximation, which both have their own advantages and are widely applicable. Finally, we linearize the joint models and propose a new likelihood-based approximate simultaneous inference approach. The purpose of the Laplace approximation and the linearization method is to reduce the computational burden, since the EM algorithm is computationally intensive. A brief review below reminds the reader of the statistical models adopted in this thesis. Suppose n subjects are included in a longitudinal study. Let y_{ij} be the response value for subject i at time t_{ij}, z_{ij} be the corresponding covariate, and T_i be the event time of subject i, i = 1, \ldots, n. The statistical models introduced in Chapter 2 are reviewed as follows.

- Parametric NLME model for the longitudinal process:

y_{ij} = g(t_{ij}, \theta_i) + e_{ij}, \quad i = 1, \ldots, n, \; j = 1, \ldots, n_i,   (3.1)
\theta_i = h(z_i, \beta, b_i),   (3.2)
b_i \sim N(0, B), \quad e_i | b_i \sim N(0, \sigma^2 I_i),   (3.3)

where g(\cdot) and h(\cdot) are given nonlinear functions, \theta_i are the subject-specific parameters and \beta are the fixed effects (population parameters), z_i contains the covariates for subject i, b_i = (b_{i1}, \ldots, b_{iq})^T are the between-subject random effects, e_i = (e_{i1}, \ldots, e_{in_i})^T are the random errors of the within-subject measurements, B is the variance-covariance matrix of the random effects, and \sigma characterizes the variance of the within-subject measurements.

- Semiparametric LME model for the covariate measurement errors problem:

z_i = z_i^* + \epsilon_i = \Psi_i \alpha + \Phi_i a_i + \epsilon_i, \quad i = 1, \ldots, n,   (3.4)
a_i \sim N(0, A), \quad \epsilon_i | a_i \sim N(0, \delta^2 I_i),   (3.5)

where \psi(\cdot) and \phi(\cdot) are basis functions, \alpha are the fixed effects and a_i are the between-subject random effects, \epsilon_i = (\epsilon_{i1}, \ldots, \epsilon_{in_i})^T are the random errors of the within-subject measurements, A is the variance-covariance matrix of the random effects, and \delta characterizes the variance of the within-subject measurements.

- Parametric AFT model for the time-to-event process:

\log T_i =
\beta_0 + \beta_1^T a_i + \beta_2^T b_i + \sigma_T \epsilon_i, \quad i = 1, \ldots, n,   (3.6)

where \epsilon_i follows a standard extreme value distribution and \sigma_T is a scale parameter.

3.1 Two-Step Method

Most joint models of a longitudinal process and a time-to-event process are based on the assumption that the two processes are linked through some shared parameters, which are usually the random effects representing variation between subjects. A simple statistical inference method is to make independent inference for one process in the first step to estimate the shared parameters, and then incorporate the estimated parameters into the inference for the other process in the second step. We call this the two-step method.

3.1.1 Naive Two-Step Method

Since both the longitudinal process and the time-to-event process depend on the covariates z, which were measured with errors, we need to address the measurement errors problem (3.4) at the beginning. Laird and Ware (1982) gave the LME model formulation and proposed an EM algorithm to fit the model by treating the random effects as missing data. Lindstrom and Bates (1988) employed the linear structure of the LME model and obtained the first two marginal moments of y_i. The fixed effects can be estimated by generalized least squares, and the variance components can be estimated by normal-theory pseudo-likelihood or restricted maximum pseudo-likelihood using a general framework based on the Newton-Raphson method. Pinheiro and Bates (2002) described covariance parametrizations of A, which could be more diverse and complex than in our case. By fitting the LME model (3.4), we obtain the "true" covariates z_i^* for modeling the longitudinal process and the estimates of the random effects a_i for modeling the time-to-event process. For the NLME model (3.1), Lindstrom and Bates (1990) alternated between a penalized nonlinear least squares (PNLS) step and a linear mixed effects (LME) step.
In the PNLS step, the current estimate of the precision factor is held fixed, and the conditional estimates of the random effects b_i and the fixed effects are obtained by minimizing a penalized nonlinear least squares objective function. The LME step then updates the estimate of the precision factor based on a first-order Taylor expansion of the likelihood function around the current estimates of the fixed effects and b_i. Using this method to fit the longitudinal model (3.1), we can obtain the estimates of the fixed effects and the random effects b_i for fitting the time-to-event model. The time-to-event process is analyzed with the AFT regression model. Having estimated the random effects a_i and b_i from the above two models, we can fit the AFT regression model with standard statistical packages. Thus, the naive two-step approach can be summarized as follows.

- Covariate model inference. For a time-varying covariate, assume a flexible and robust model to estimate the true covariate values and the between-subject random effects.
- Nonlinear mixed effects model inference. Estimate the fixed effects, the random effects and the variance components, with the error-prone covariate values replaced by the estimated true values.
- Time-to-event model inference. Fit the time-to-event model based on the random effects estimated in the above two steps.

3.1.2 Modified Two-Step Method

The naive two-step method is easy to understand and to execute using standard statistical software. However, as pointed out by Ye et al. (2008) and Albert and Shih (2009), the naive two-step method may lead to two types of bias.

- The covariate trajectories may differ according to the censoring status of the event times. Thus, applying a single covariate model to all covariate data may lead to biased estimation in the first model.
- Inference for the second model that ignores the estimation uncertainty in the first model may also lead to biased results (e.g., under-estimation of the standard errors).

The first bias, called bias from informative dropouts, may depend on the strength of the association between the longitudinal process and the time-to-event process. The second bias may depend on the magnitude of the measurement errors in the covariates. Therefore, we need to modify the naive two-step method to reduce these biases. To incorporate the estimation uncertainty from the first step, we can adjust the standard errors of the parameter estimates using a bootstrap method, resampling subjects while keeping the repeated measurements of the resampled subjects. After repeating the above procedure B times, the standard errors of the fixed-parameter estimates can be estimated by the sample variance-covariance matrix across the B bootstrap datasets, which is expected to be more reliable than the naive two-step results if the assumed models are correct.
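The subject-level bootstrap just described can be sketched as follows; the toy data and the grand-mean "estimator" below are placeholders for an actual two-step model fit:

```python
import numpy as np

def bootstrap_se(data_by_subject, estimator, B=200, seed=0):
    """Subject-level bootstrap standard error: resample whole subjects
    (keeping each subject's repeated measurements together) and recompute
    the estimate B times, as in the modified two-step method."""
    rng = np.random.default_rng(seed)
    n = len(data_by_subject)
    stats = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)              # resample subjects
        boot = [data_by_subject[i] for i in idx]
        stats.append(estimator(boot))
    return np.std(stats, ddof=1)

# toy illustration: each "subject" is a vector of repeated measurements
subjects = [np.array([0.9, 1.1]), np.array([1.8, 2.2]), np.array([2.9, 3.1]),
            np.array([4.1, 3.9]), np.array([5.0, 5.2])]
grand_mean = lambda d: np.mean(np.concatenate(d))
se = bootstrap_se(subjects, grand_mean, B=200)
```

Resampling at the subject level, rather than at the observation level, is what preserves the within-subject correlation that the standard error must reflect.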
However, the modified two-step method cannot completely remove the biases of the naive two-step method, because it still treats the two processes separately. To make reliable inference, simultaneous inference based on a joint model may be more appropriate: treating the two processes simultaneously is the first and most fundamental requirement for a fully satisfactory inference method. In the following sections, we consider unified approaches based on the joint likelihood function given all observed longitudinal data and time-to-event data.

3.2 EM Algorithm

The two-step methods introduced in the previous section, naive or modified, fit the models separately with a simple "plug-in" strategy. The EM algorithm can be used to estimate all parameters of the longitudinal model, the covariate model and the time-to-event model simultaneously, based on the joint likelihood, thereby avoiding the bias of the naive two-step method. The joint likelihood approach is quite general and can be extended to statistical inference for more than two related models. Such a joint likelihood method provides efficient estimation if the assumed models are correct. Specifically, let \theta = (\alpha, \beta, \beta_0, \beta_1, \beta_2, \sigma, \delta, \sigma_T, A, B) be the collection of parameters of the above three models, and let f(\cdot) denote generic density functions and F(\cdot) the corresponding cumulative distribution functions. The joint log-likelihood given all the observed data can be written as

\ell_o(\theta | y, z, \tau, \delta) = \sum_{i=1}^{n} \ell_o^{(i)}(\theta | y_i, z_i, \tau_i, \delta_i)
= \sum_{i=1}^{n} \log \iint f(y_i | a_i, b_i, \beta, \sigma) f(b_i | B) f(z_i | a_i, \alpha, \delta) f(a_i | A) f(\tau_i | a_i, b_i)^{\delta_i} \left[1 - F(\tau_i | a_i, b_i)\right]^{1-\delta_i} da_i \, db_i,   (3.7)

where the event-time density and distribution function depend on (\beta_0, \beta_1, \beta_2, \sigma_T), suppressed for brevity. The EM algorithm (Dempster et al., 1977) is a powerful iterative algorithm for computing MLEs in a wide variety of situations, including missing data problems and random effects models. To obtain the MLE of \theta in our case, we may consider an EM algorithm that treats the unobservable random effects (a_i, b_i) as "missing data". The EM algorithm consists of an E-step, which computes the conditional expectation of the "complete data" log-likelihood given the observed data and current parameter estimates, and an M-step, which maximizes this conditional expectation to update the parameter estimates.

3.2.1 E-Step

The "complete data" are \{y_i, z_i, \tau_i, \delta_i, a_i, b_i\}, i = 1, 2, \ldots, n, and the parameter estimate at the t-th iteration is denoted \theta^{(t)}. The E-step at the (t+1)-th iteration can be written as

Q(\theta | \theta^{(t)}) = \sum_{i=1}^{n} \iint \big[\log f(b_i | B) + \log f(y_i | a_i, b_i, \beta, \sigma) + \log f(a_i | A) + \log f(z_i | a_i, \alpha, \delta) + (1 - \delta_i) \log(1 - F(\tau_i | a_i, b_i)) + \delta_i \log f(\tau_i | a_i, b_i)\big] f(a_i, b_i | y_i, z_i, \tau_i, \delta_i, \theta^{(t)}) \, da_i \, db_i.   (3.8)

The computation of the conditional expectation Q(\theta | \theta^{(t)}) in the E-step is usually non-trivial because of the high-dimensional integrals. However, since the integral is an expectation with respect to f(a_i, b_i | y_i, z_i, \tau_i, \delta_i, \theta^{(t)}), it can be evaluated by the Monte Carlo EM of Wei and Tanner (1990) and Ibrahim et al. (2001a): generate m_t samples from f(a_i, b_i | y_i, z_i, \tau_i, \delta_i, \theta^{(t)}) and approximate Q(\theta | \theta^{(t)}) by its empirical mean, with the random effects a_i, b_i replaced by the simulated values a_i^{(j)}, b_i^{(j)}, j = 1, \ldots, m_t:

Q(\theta | \theta^{(t)}) \approx \frac{1}{m_t} \sum_{j=1}^{m_t} \sum_{i=1}^{n} \big[\log f(y_i | a_i^{(j)}, b_i^{(j)}, \beta, \sigma) + \log f(b_i^{(j)} | B) + \log f(z_i | a_i^{(j)}, \alpha, \delta) + \log f(a_i^{(j)} | A) + \delta_i \log f(\tau_i | a_i^{(j)}, b_i^{(j)}) + (1 - \delta_i) \log(1 - F(\tau_i | a_i^{(j)}, b_i^{(j)}))\big]
= Q^{(1)}(\beta, \sigma) + Q^{(2)}(B) + Q^{(3)}(\alpha, \delta) + Q^{(4)}(A) + Q^{(5)}(\beta_0, \beta_1, \beta_2, \sigma_T).   (3.9)

We may choose m_0 as a large number and set m_t = m_{t-1} + m_{t-1}/k (k \ge 1) at the t-th iteration; increasing m_t at each iteration may speed up the algorithm's convergence (Booth and Hobert, 1999). To generate independent samples from f(a_i, b_i | y_i, z_i, \tau_i, \delta_i, \theta^{(t)}), we may use the Gibbs sampler (Gelfand and Smith, 1990), sampling iteratively from the following full conditionals:

f(a_i | y_i, z_i, \tau_i, \delta_i, b_i, \theta^{(t)}) \propto f(a_i; \theta^{(t)}) f(y_i | a_i, b_i, \theta^{(t)}) f(z_i | a_i, \theta^{(t)}) f(\tau_i, \delta_i | a_i, b_i, \theta^{(t)}),
f(b_i | y_i, z_i, \tau_i, \delta_i, a_i, \theta^{(t)}) \propto f(b_i; \theta^{(t)}) f(y_i | a_i, b_i, \theta^{(t)}) f(\tau_i, \delta_i | a_i, b_i, \theta^{(t)}),

where f(\tau_i, \delta_i | a_i, b_i, \theta^{(t)}) = f(\tau_i | a_i, b_i, \theta^{(t)})^{\delta_i} (1 - F(\tau_i | a_i, b_i, \theta^{(t)}))^{1-\delta_i}. After a burn-in period, we obtain a sample from the conditional distribution f(a_i, b_i | y_i, z_i, \tau_i, \delta_i, \theta^{(t)}).

3.2.2 M-Step

The parameter estimate at the (t+1)-th iteration is obtained by maximizing Q(\theta | \theta^{(t)}). Generally, Q(\theta | \theta^{(t)}) is a nonlinear function and there is no closed-form expression for the estimate \hat\theta^{(t+1)}. The maximizer can be obtained via standard numerical optimization procedures for complete data, such as the Newton-Raphson method or more advanced algorithms (Fletcher, 1987; Nocedal and Wright, 2006). The EM algorithm iterates between the two steps until convergence, and the final estimate of \theta is regarded as a local maximizer. The asymptotic variance-covariance matrix of \hat\theta can be obtained using well-known complete-data formulas (Louis, 1982), where the expectations in those formulas are approximated by Monte Carlo means. Specifically, note that the observed information matrix equals the complete information minus the missing information,

I_{obs}(\hat\theta) = I_{com}(\hat\theta) - I_{mis}(\hat\theta).

Define

\dot{Q}(\hat\theta) = \sum_{i=1}^{n} \dot{Q}_i(\hat\theta) = \frac{1}{m_t} \sum_{j=1}^{m_t} \sum_{i=1}^{n} S_{ij}(\hat\theta), \qquad \ddot{Q}(\hat\theta) = -\sum_{i=1}^{n} \frac{\partial^2 Q_i(\hat\theta)}{\partial\theta \, \partial\theta^T} = -\frac{1}{m_t} \sum_{j=1}^{m_t} \sum_{i=1}^{n} \frac{\partial S_{ij}(\theta)}{\partial\theta}.

Since the parameters are all distinct, the above matrices are block diagonal.
Then the asymptotic observed information matrix is

I_{obs}(\hat\theta) = \ddot{Q}(\hat\theta) - \Big[\frac{1}{m_t} \sum_{j=1}^{m_t} \sum_{i=1}^{n} S_{ij}(\hat\theta) S_{ij}^T(\hat\theta) - \frac{1}{m_t} \sum_{i=1}^{n} \dot{Q}_i(\hat\theta) \dot{Q}_i^T(\hat\theta)\Big].   (3.10)

The asymptotic variance-covariance matrix of \hat\theta can then be approximated by \mathrm{Cov}(\hat\theta) = I_{obs}^{-1}(\hat\theta).

3.3 Laplace Approximation

The convergence of the foregoing Monte Carlo EM algorithm depends on the dimension of the random effects (a_i, b_i). If the dimension of the random effects is not small, say larger than 2, the computation of the Monte Carlo EM method can be extremely intensive, since the method involves simulating large samples at each iteration. Moreover, the algorithm may fail to converge in some cases. Therefore, it is valuable to approximate the observed-data joint log-likelihood \ell_o(\theta | y, z, \tau, \delta) by a first-order Laplace approximation, which completely avoids integration and speeds up the entire computation significantly. The "complete data" log-likelihood can be written as

\ell_c(\theta | a, b) = \sum_{i=1}^{n} \ell_c^{(i)}(\theta | y_i, z_i, \tau_i, \delta_i)
= \sum_{i=1}^{n} \log\big[f(y_i | a_i, b_i, \beta, \sigma) f(b_i | B) f(z_i | a_i, \alpha, \delta) f(a_i | A) f(\tau_i | a_i, b_i)^{\delta_i} (1 - F(\tau_i | a_i, b_i))^{1-\delta_i}\big].   (3.11)

Vonesh et al. (2002) and Lee et al. (2006) showed that the observed-data log-likelihood \ell_o(\theta | y, z, \tau, \delta) can be approximated by its first-order Laplace approximation

\tilde\ell_o(\theta | \tilde a, \tilde b) = \ell_c(\theta | \tilde a, \tilde b) - \frac{1}{2} \log\left| -\frac{1}{2\pi} \frac{\partial^2 \ell_c(\theta | a, b)}{\partial(a, b)^2} \right|_{(a,b)=(\tilde a, \tilde b)},   (3.12)

where (\tilde a, \tilde b) = \{(\tilde a_i, \tilde b_i), i = 1, 2, \ldots, n\} solve the equations

\partial \ell_c^{(i)}(\theta | a_i, b_i) / \partial(a_i, b_i) = 0, \quad i = 1, 2, \ldots, n.   (3.13)

Thus, an approximate estimate of \theta can be obtained by solving the equation

\partial \tilde\ell_o(\theta | \tilde a_i, \tilde b_i) / \partial\theta = 0.   (3.14)

Given starting values \theta^{(0)} and (a^{(0)}, b^{(0)}), we iterate the above procedures until convergence and obtain the approximate estimate \hat\theta of the MLE. Specifically, we carry out the following two steps at the t-th iteration:

- estimate the random effects (\tilde a^{(t)}, \tilde b^{(t)}) by solving equations (3.13);
- update the parameters \theta^{(t+1)} by solving equation (3.14).

The Laplace approximation method completely avoids integration and is thus computationally attractive. The estimated random effects obtained this way can be interpreted as empirical Bayes estimates. The standard errors of the approximate estimates can be calculated from the approximate formula for their variance-covariance matrix, obtained in a similar way by eliminating the mean parameters and the random effects via a similar Laplace approximation. Specifically,

\mathrm{Cov}(\hat\theta) = \left[-\frac{\partial^2 \ell_o(\theta | \tilde a, \tilde b)}{\partial\theta \, \partial\theta^T}\right]^{-1}_{\theta = \hat\theta},   (3.15)

where \hat\theta are the estimated parameters, and \tilde a, \tilde b are the estimated random effects at the final iteration. Vonesh et al. (2002) showed that the approximate estimate \hat\theta is consistent and asymptotically normally distributed when both the sample size and the number of measurements within each individual go to infinity:

\hat\theta - \theta = O_p\big(\max\{n^{-1/2}, (\min_i n_i)^{-1}\}\big),   (3.16)

and

\sqrt{n}\,(\hat\theta - \theta) \to N(0, I(\theta)^{-1}),   (3.17)

where I(\theta) = \lim_n \sum_i I_i(\theta)/n and I_i(\theta) is the information matrix of subject i.
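The quality of a first-order Laplace approximation like (3.12) is easy to check in one dimension. The sketch below compares the Laplace approximation of an integral \int \exp\{\ell(b)\}\,db against numerical quadrature; the log-integrand is an arbitrary toy example resembling a random-effect posterior kernel, not the thesis's joint likelihood:

```python
import numpy as np

def ell(b):
    # toy log-integrand: normal prior term plus an extreme-value-style term
    return b - np.exp(0.5 * b) - 0.5 * b**2

grid = np.linspace(-8.0, 8.0, 20001)
vals = np.exp(ell(grid))
# trapezoid-rule quadrature as "ground truth"
exact = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid))

# first-order Laplace approximation: expand ell around its mode b_hat
b_hat = grid[np.argmax(ell(grid))]
h = 1e-4                                           # finite-difference curvature
curv = (ell(b_hat + h) - 2 * ell(b_hat) + ell(b_hat - h)) / h**2
laplace = np.exp(ell(b_hat)) * np.sqrt(2 * np.pi / (-curv))
rel_err = abs(laplace - exact) / exact
```

For this near-Gaussian integrand the relative error is well under a few percent; the approximation degrades when the integrand is strongly skewed or multimodal, which is why the accuracy of (3.12) improves with the within-subject sample size.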
3.4 Linearization Method

Inspired by a popular approximate method for solving NLME models, we propose a linearization method for our joint models. Following an idea similar to that of Lindstrom and Bates (1990), we first linearize the NLME model (3.1). Given current estimates \tilde\alpha, \tilde\beta, \tilde a_i, \tilde b_i, a first-order Taylor expansion of g_{ij} = g(t_{ij}, \theta_i) yields the LME model

\tilde y_i = W_i \alpha + X_i \beta + U_i a_i + V_i b_i + e_i,   (3.18)

where

W_i = (w_{i1}, \ldots, w_{in_i})^T, \; w_{ij} = \partial g_{ij}/\partial\alpha; \quad X_i = (x_{i1}, \ldots, x_{in_i})^T, \; x_{ij} = \partial g_{ij}/\partial\beta;
U_i = (u_{i1}, \ldots, u_{in_i})^T, \; u_{ij} = \partial g_{ij}/\partial a_i; \quad V_i = (v_{i1}, \ldots, v_{in_i})^T, \; v_{ij} = \partial g_{ij}/\partial b_i;
\tilde y_i = y_i - g_i(\tilde\alpha, \tilde\beta, \tilde a_i, \tilde b_i) + W_i \tilde\alpha + X_i \tilde\beta + U_i \tilde a_i + V_i \tilde b_i,

with all partial derivatives evaluated at (\tilde\alpha, \tilde\beta, \tilde a_i, \tilde b_i). Now we have two LME models and a time-to-event model, and we adopt the EM-type strategy to estimate the parameters. To compute (3.8), we first need to estimate the random effects a_i, b_i. The general EM approach samples a_i, b_i from the conditional distribution of the random effects f(a_i, b_i | y_i, z_i, \tau_i, \delta_i, \theta^{(t)}) by iteratively updating their full conditionals:

f(a_i | y_i, z_i, \tau_i, \delta_i, b_i, \theta^{(t)}) \propto f(a_i; \theta^{(t)}) f(y_i | a_i, b_i, \theta^{(t)}) f(z_i | a_i, \theta^{(t)}) f(\tau_i, \delta_i | a_i, b_i, \theta^{(t)}),
f(b_i | y_i, z_i, \tau_i, \delta_i, a_i, \theta^{(t)}) \propto f(b_i; \theta^{(t)}) f(y_i | a_i, b_i, \theta^{(t)}) f(\tau_i, \delta_i | a_i, b_i, \theta^{(t)}),

where f(\tau_i, \delta_i | a_i, b_i, \theta^{(t)}) = f(\tau_i | a_i, b_i, \theta^{(t)})^{\delta_i} (1 - F(\tau_i | a_i, b_i, \theta^{(t)}))^{1-\delta_i}.
Since we have

f(a_i; θ^(t)) ∝ |Σ_{a_it}|^{−1/2} exp{ −(1/2)(a_i − μ_{a_it})ᵀ Σ_{a_it}^{−1} (a_i − μ_{a_it}) }
  ∝ exp{ −(1/2) a_iᵀ Σ_{a_it}^{−1} a_i + μ_{a_it}ᵀ Σ_{a_it}^{−1} a_i − (1/2) μ_{a_it}ᵀ Σ_{a_it}^{−1} μ_{a_it} },

f(b_i; θ^(t)) ∝ |Σ_{b_it}|^{−1/2} exp{ −(1/2)(b_i − μ_{b_it})ᵀ Σ_{b_it}^{−1} (b_i − μ_{b_it}) }
  ∝ exp{ −(1/2) b_iᵀ Σ_{b_it}^{−1} b_i + μ_{b_it}ᵀ Σ_{b_it}^{−1} b_i − (1/2) μ_{b_it}ᵀ Σ_{b_it}^{−1} μ_{b_it} },

f(z_i | a_i; θ^(t)) ∝ δ_it^{−n_i} exp{ −(1/(2δ_it²)) (z_i − Φ_i α_t − Ψ_i a_i)ᵀ (z_i − Φ_i α_t − Ψ_i a_i) }
  ∝ δ_it^{−n_i} exp{ −(1/(2δ_it²)) a_iᵀ Ψ_iᵀ Ψ_i a_i + (1/δ_it²)(z_i − Φ_i α_t)ᵀ Ψ_i a_i − (1/(2δ_it²))(z_i − Φ_i α_t)ᵀ(z_i − Φ_i α_t) },

f(y_i | a_i, b_i; θ^(t)) ∝ σ_it^{−n_i} exp{ −(1/(2σ_it²)) (ỹ_it − W_it α_t − X_it β_t − U_it a_i − V_it b_i)ᵀ (ỹ_it − W_it α_t − X_it β_t − U_it a_i − V_it b_i) }
  ∝ σ_it^{−n_i} exp{ −(1/(2σ_it²)) a_iᵀ U_itᵀ U_it a_i − (1/(2σ_it²)) b_iᵀ V_itᵀ V_it b_i
     + (1/σ_it²)(ỹ_it − W_it α_t − X_it β_t − V_it b_i)ᵀ U_it a_i
     + (1/σ_it²)(ỹ_it − W_it α_t − X_it β_t − U_it a_i)ᵀ V_it b_i
     − (1/(2σ_it²))(ỹ_it − W_it α_t − X_it β_t)ᵀ(ỹ_it − W_it α_t − X_it β_t) },

the likelihood function given the event time data is linearized as follows:

f(T_i, δ_i | a_i, b_i; θ^(t)) ∝ exp{ −(1/(2σ_t²)) a_iᵀ (γ_1t γ_1tᵀ) a_i + (1/σ_t²)(log T_i − γ_0t − γ_2tᵀ b_i) γ_1tᵀ a_i
     − (1/(2σ_t²)) b_iᵀ (γ_2t γ_2tᵀ) b_i + (1/σ_t²)(log T_i − γ_0t − γ_1tᵀ a_i) γ_2tᵀ b_i
     − (1/(2σ_t²))(log T_i − γ_0t)² }.

Then we may obtain an approximate explicit form of the distribution of the random effects, which follows a multivariate normal distribution. Therefore, the full conditionals of the random effects can be approximated by the following multivariate normal distributions:

f(a_i | y_i, z_i, T_i, δ_i, b_i; θ^(t)) ∝ f(a_i; θ^(t)) f(y_i | a_i, b_i; θ^(t)) f(z_i | a_i; θ^(t)) f(T_i, δ_i | a_i, b_i; θ^(t))
  ∝ exp{ −(1/2)(a_i − μ_{a_i}^(t+1))ᵀ Σ_{a_i}^{(t+1)−1} (a_i − μ_{a_i}^(t+1)) } ∼ MVN(μ_{a_i}^(t+1), Σ_{a_i}^(t+1)),   (3.19)

f(b_i | y_i, z_i, T_i, δ_i, a_i; θ^(t)) ∝ f(b_i; θ^(t)) f(y_i | a_i, b_i; θ^(t)) f(T_i, δ_i | a_i, b_i; θ^(t))
  ∝ exp{ −(1/2)(b_i − μ_{b_i}^(t+1))ᵀ Σ_{b_i}^{(t+1)−1} (b_i − μ_{b_i}^(t+1)) } ∼ MVN(μ_{b_i}^(t+1), Σ_{b_i}^(t+1)),   (3.20)

where

Σ_{a_i}^{(t+1)−1} = Σ_{a_it}^{−1} + Ψ_iᵀΨ_i/δ_it² + U_itᵀU_it/σ_it² + γ_1t γ_1tᵀ/σ_t²,

μ_{a_i}^{(t+1)ᵀ} Σ_{a_i}^{(t+1)−1} = μ_{a_it}ᵀ Σ_{a_it}^{−1} + (1/δ_it²)(z_i − Φ_i α_t)ᵀ Ψ_i + (1/σ_t²)(log T_i − γ_0t − γ_2tᵀ b_it) γ_1tᵀ + (1/σ_it²)(ỹ_it − W_it α_t − X_it β_t − V_it b_it)ᵀ U_it,

Σ_{b_i}^{(t+1)−1} = Σ_{b_it}^{−1} + V_itᵀV_it/σ_it² + γ_2t γ_2tᵀ/σ_t²,

μ_{b_i}^{(t+1)ᵀ} Σ_{b_i}^{(t+1)−1} = μ_{b_it}ᵀ Σ_{b_it}^{−1} + (1/σ_t²)(log T_i − γ_0t − γ_1tᵀ a_it) γ_2tᵀ + (1/σ_it²)(ỹ_it − W_it α_t − X_it β_t − U_it a_it)ᵀ V_it.

Thus, we have an estimate of the distribution of the random effects a_i and b_i at the (t+1)-th iteration.

3.4.1 Framework I

With the estimated distributions of a_i and b_i, we may return to Section 3.2.1 and work on the log-likelihood function (3.8). If the random effects a_i and b_i can be integrated out, we may simply adopt the EM framework to complete the linearization method. To be specific, at the t-th iteration we need to:

- Linearize the joint model at the current iterates ã_i^(t), b̃_i^(t) and θ̂^(t);
- Estimate the random effects (ã_i^(t+1), b̃_i^(t+1)) as in (3.19) and (3.20);
- Obtain the expectation of the joint log-likelihood function;
- Update θ^(t+1) by maximizing the expectation of the observed-data joint log-likelihood.

We repeat the above steps until the algorithm converges. The standard errors of the MLE of θ can be calculated based on the asymptotic observed information matrix (3.10).

3.4.2 Framework II

If we cannot obtain an explicit form of the integral (3.8), we may simply take the means μ_{a_i}^(t+1) and μ_{b_i}^(t+1) of the estimated distributions of the random effects a_i and b_i as their empirical Bayes estimates. The fixed effects can then be updated in the same way as in the Laplace approximation method. That is to say, we carry out the following steps at the t-th iteration:

- Linearize the joint model at the current iterates ã_i^(t), b̃_i^(t) and θ̂^(t);
- Estimate the random effects (ã_i^(t+1), b̃_i^(t+1)) by (3.19) and (3.20);
- Update θ^(t+1) by solving equation (3.14), with μ_{a_i}^(t+1) and μ_{b_i}^(t+1) used as the estimates of ã_i^(t) and b̃_i^(t).

Similar to the Laplace approximation method, the standard errors of the approximate MLE of θ can be approximated by (3.15).

Chapter 4 Data Analysis

4.1 Data Description

We consider an AIDS clinical trial, which included 46 HIV-infected patients being treated with an anti-HIV treatment.
The viral loads were measured on days 0, 2, 7, 10, 14, 21, and 28, and weeks 8, 12, 24, and 48 after initiation of the treatment. The CD4 and CD8 cell counts and other variables were also recorded throughout the study. The lower detection limit of the viral load assay is 100 copies/ml, and 40 of the 361 viral load measurements (11%) are below this limit. For simplicity, we imputed the viral loads below the detection limit with half of the limit. The number of viral load measurements per individual varies from 4 to 10. Figure 4.1 shows the trajectories of viral load, CD4, and CD4/CD8 for six randomly selected patients. One of the primary objectives of this study is to assess the antiviral activity of the treatment in the first 12 weeks. Meanwhile, the association between the viral load and the time to the first decline of the CD4/CD8 ratio is of interest.

4.2 Model Specification

Based on previous research (Wu and Ding, 1999; Wu, 2002; Wu et al., 2010), we consider the following HIV viral load dynamic model:

y_ij = log10(P_1i e^{−λ_1i t_ij} + P_2i e^{−λ_2ij t_ij}) + e_ij,   (4.1)
log(P_1i) = β1 + b_i1,  λ_1i = β2 + b_i2,
log(P_2i) = β3 + b_i3,  λ_2ij = β4 + β5 z*_ij + b_i4,

where y_ij is the log10-transformed viral load of patient i at time t_ij, P_1i and P_2i are subject-specific viral load baselines, λ_1i and λ_2ij are subject-specific first-phase and second-phase viral decay rates, and z*_ij is the "true" (but unobserved) CD4 count. It is known that CD4 counts are usually measured with substantial error. Thus we assume that λ_2ij is related to the true CD4 value z*_ij rather than the observed CD4 value z_ij. To avoid very small (or very large) estimates, which may be unstable, we standardize the CD4 counts and rescale the original time t (in days) so that the new time scale lies between 0 and 1.
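A minimal sketch of the mean function of model (4.1) makes the biexponential decay structure concrete. The parameter values used below are illustrative, not fitted values.

```python
import numpy as np

def viral_load_mean(t, beta, b, z_star):
    """log10 viral load mean of model (4.1):
    log10(P1 * exp(-lambda1 * t) + P2 * exp(-lambda2 * t)),
    with log(P1) = beta1 + b1, lambda1 = beta2 + b2,
    log(P2) = beta3 + b3, lambda2 = beta4 + beta5 * z_star + b4."""
    p1 = np.exp(beta[0] + b[0])
    lam1 = beta[1] + b[1]
    p2 = np.exp(beta[2] + b[2])
    lam2 = beta[3] + beta[4] * z_star + b[3]
    return np.log10(p1 * np.exp(-lam1 * t) + p2 * np.exp(-lam2 * t))
```

At t = 0 the mean is log10(P_1i + P_2i), and for positive decay rates the curve declines over the rescaled time interval [0, 1]; a larger standardized CD4 value z* increases λ_2ij and hence steepens the second-phase decay.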
Figure 4.1: Longitudinal data trajectories (viral load, CD4, and CD4/CD8 ratio versus time).

As we can see from Figure 4.1, the CD4 trajectories are complicated and the between-patient variation in CD4 trajectories is quite large, so we consider a nonparametric linear mixed effects model to empirically describe the CD4 trajectories. We use linear combinations of natural cubic splines with percentile-based knots to approximate r(t) and h_i(t). We set ψ0(t) = φ0(t) = 1 and take the same natural cubic splines in the approximations, with q ≤ p, in order to decrease the dimension of the random effects. The AIC and BIC criteria are used to determine p = q = 2, which leads to the following model for the CD4 process:

z_ij = (α1 + a_i1) ψ0(t_ij) + (α2 + a_i2) ψ1(t_ij) + (α3 + a_i3) ψ2(t_ij) + ε_ij,   (4.2)

where z_ij is the observed CD4 value at time t_ij, ψ1(·) and ψ2(·) are the basis functions, α = (α1, α2, α3)ᵀ are the fixed parameters, and a_i = (a_i1, a_i2, a_i3)ᵀ are the random effects.

Let T_i be the time to the first decline of the CD4/CD8 ratio of subject i.

Figure 4.2: Estimate of the survivor function of the time of first decline of CD4:CD8.

In AIDS studies, it is of great interest to see whether the time of the first decline of CD4/CD8 is related to subject characteristics of the viral load and the CD4 trajectories, represented by the random effects in the viral load model and the CD4 model. Therefore, we consider the following time-to-event model for T_i:

log T_i = γ0 + γ1 a_i1 + γ2 a_i2 + γ3 b_i2 + γ4 b_i4 + ε_i,   (4.3)

where ε_i follows the standard extreme value distribution. The random effects b_i2 and b_i4, which represent between-subject variation in viral decay rates, might be predictive of event times. The random effects a_i1 and a_i2, which carry the main features of between-subject variation in CD4 trajectories, are also included in the parametric AFT model (4.3).

4.3 Analysis and Results

We estimate the models' parameters using the two-step method (TS), the modified two-step method (MTS), the Laplace approximation method (LAP), and the linearization method (LIN). TS and MTS are based on standard R algorithms in library(nlme) and library(survival). The estimation results of TS and MTS are listed along with those of LAP and LIN in Table 4.1. Figures 4.3 and 4.4 check the normality assumptions for the residuals and random effects of the NLME model; Figures 4.5 and 4.6 check the normality assumptions for the residuals and random effects of the measurement error model. It seems that no assumption is severely violated. LAP and LIN are iterative algorithms, with convergence declared when the maximum percentage change of all estimates is less than 5% over two consecutive iterations. The parameter estimates of the TS method are taken as the initial values for the two iterative algorithms LAP and LIN.

Figure 4.3: Residual plots I of the NLME model (standardized residuals versus fitted values; normal Q-Q plot).
Figure 4.4: Residual plots II of the NLME model (normal Q-Q plots of the random effects).

Based on the analysis results listed in Table 4.1, we have the following observations.

- The four inference methods provide similar parameter estimates for the NLME model of the viral load data.
- LAP and LIN agree that all parameters of β are significant for the viral load, while TS and MTS suggest that β5 might be insignificant.

Figure 4.5: Residual plots I of the covariate measurement error model (standardized residuals versus fitted values; normal Q-Q plot).
Figure 4.6: Residual plots II of the covariate measurement error model (normal Q-Q plots of the random effects).

- The estimation results for the time-to-event process obtained from the four methods differ from each other. Generally speaking, TS and MTS suggest that no random effect is significant for the event time, while the two simultaneous approaches indicate that b_i4 might be significant. This discovery distinguishes the simultaneous approaches from the TS-based methods.

Table 4.1: Estimation Results

Parameter   TS             MTS            LAP            LIN
            Est    S.E.    Est    S.E.    Est    S.E.    Est    S.E.
β1          11.75  0.19    11.76  0.19    11.78  0.09    11.80  0.06
β2          29.84  1.47    30.22  1.69    29.25  1.31    30.03  0.87
β3          7.48   0.29    7.43   0.32    7.22   0.15    7.40   0.10
β4          1.34   0.51    1.31   0.64    1.43   0.23    1.18   0.15
β5          0.63   0.46    0.52   0.70    0.42   0.16    0.55   0.10
σ           0.29   NA      0.29   NA      0.29   NA      0.22   NA
γ0          -0.81  0.12    -0.82  0.01    -0.97  0.11    -0.71  0.20
γ1          0.19   0.24    0.16   0.52    0.55   0.37    0.22   0.19
γ2          -0.03  0.21    -0.01  0.52    -1.03  0.52    0.01   0.13
γ3          -0.03  0.05    -0.02  0.11    -0.05  0.02    -0.03  0.04
γ4          0.08   0.05    0.04   0.10    0.12   0.03    0.05   0.02
σ_e         0.75   NA      0.76   NA      0.31   NA      0.75   NA

- MTS provides parameter estimates very close to those of TS for both the NLME model and the measurement error model, but with greater standard errors. However, by fitting the three models simultaneously, LAP and LIN provide parameter estimates with smaller standard errors.

Based on this data analysis, we may conclude the following.

- For the analysis of the longitudinal process based on the NLME model, the results obtained from the two simultaneous inference methods agree with previous research (Ding and Wu, 2001; Wu et al., 2008, 2010) in terms of the significance conclusions for the parameters. However, some estimated values in our analysis differ from previous research, since we only considered the observations in the first three months of the study.

- The two simultaneous inference methods suggest that CD4 is significant for the viral load, while TS and MTS miss this reasonable and important property. According to the conclusions from LAP and LIN, the higher the CD4 is, the lower the subject's viral load might be.

- The two simultaneous inference methods also indicate that the viral decay rate is significant for the first decline time of the CD4/CD8 ratio: the lower the viral load is, the later the CD4/CD8 ratio may first decline. The TS-based methods again fail to detect this important feature.

The discrepancies between the data analysis results obtained from the different methods are important in terms of both biological interpretation and statistical methodology development. To study whether the simultaneous inference methods outperform the TS-based methods when the longitudinal process and the time-to-event process are truly connected, we need a simulation study to compare these methods.
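Such a comparison follows a standard recipe: simulate event times from an AFT model of the form (4.3) with standard extreme-value errors, impose random censoring, and summarize each method's replicated estimates by bias, SE, and MSE. A self-contained sketch of that recipe (function names, parameter values, and the 20% censoring mechanism here are illustrative assumptions, not the thesis's R code):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_event_times(gamma0, gamma, re, sigma_e, cens_rate=0.2):
    """log T = gamma0 + gamma' re + sigma_e * eps, with eps following the
    standard (minimum) extreme value distribution; a randomly chosen
    fraction of subjects is censored at a time before the event."""
    n = re.shape[0]
    # minimum extreme value errors via the inverse CDF: log(-log U)
    eps = np.log(-np.log(rng.uniform(size=n)))
    t = np.exp(gamma0 + re @ gamma + sigma_e * eps)
    delta = np.ones(n, dtype=int)          # 1 = event observed
    cens = rng.choice(n, size=int(cens_rate * n), replace=False)
    delta[cens] = 0
    t[cens] *= rng.uniform(size=cens.size)  # censoring time before the event
    return t, delta

def summarize(estimates, truth):
    """Bias, empirical SE, and MSE of replicated estimates of one parameter."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - truth
    se = est.std(ddof=1)
    return bias, se, bias ** 2 + se ** 2
```

The MSE here is computed as bias² plus the empirical variance, the usual decomposition used when tabulating simulation results such as those in the next chapter.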
Chapter 5 Simulation

A simulation study is conducted to evaluate the performance of three methods: the two-step method (TS), the Laplace approximation method (LAP), and the linearization method (LIN). The modified two-step method (MTS) is omitted since it performs quite similarly to TS, except that MTS generally yields greater variances of the estimates.

5.1 Simulation Design

We simulate the longitudinal data and covariate data based on models (4.1) and (4.2). The event times are simulated based on (4.3), with 20% of the subjects randomly censored. Holding the parameters β, α, and γ fixed, we compare these inference methods based on their estimation performance in four scenarios with different settings of:

- number of subjects,
- number of repeated measurements,
- between-subject variance,
- within-subject variance.

The simulations were implemented on a Dell R900 server with 16 CPUs and 16-32 GB RAM. The estimation performance includes the standard errors of the parameter estimates and the corresponding bias and MSE, along with the average running time. Since we are more interested in the estimation of the NLME model and the time-to-event model, the estimation results for the covariate model are omitted in the following section.

5.2 Simulation Results

5.2.1 Number of Subjects

To examine how these estimation methods perform with different numbers of subjects, we simulate datasets with n = 60 and n = 120. The estimation results based on n = 60 are reported in Table 5.1, and those based on n = 120 in Table 5.2.

Table 5.1: Simulation Results: n = 60

Parameter    TS                     LAP                    LIN
             Bias    SE     MSE     Bias    SE     MSE     Bias    SE     MSE
β1 = 12      0.02    0.18   0.03    -0.04   0.20   0.04    -0.09   0.06   0.01
β2 = 30      -1.13   2.16   5.95    -1.39   1.69   4.79    -0.93   1.25   2.41
β3 = 7       -0.02   0.20   0.04    0.12    0.23   0.07    -0.14   0.17   0.05
β4 = 2       -0.05   0.26   0.07    0.15    0.37   0.16    -0.01   0.23   0.05
β5 = 3       -0.00   0.13   0.02    -0.30   0.21   0.13    -0.48   0.08   0.23
σ = 0.3      -0.00   0.01   0.00    0.12    0.04   0.02    -0.01   0.02   0.00
γ0 = 1       2.19    0.40   4.95    0.20    0.38   0.19    0.15    0.10   0.03
γ1 = 1       0.01    0.43   0.18    -0.80   1.53   2.97    -0.91   0.18   0.87
γ2 = 1       -0.01   0.48   0.23    0.52    1.10   1.47    -0.88   0.24   0.84
γ3 = 1       -1.00   0.28   1.08    0.14    0.36   0.15    -0.07   0.21   0.05
γ4 = 1       -0.02   0.40   0.16    -0.08   0.36   0.14    -0.10   0.21   0.05
σ_e = 0.2    2.19    0.30   4.88    0.06    0.23   0.06    0.08    0.03   0.01
Time (Sec)   680.42                 180.47                 13.39

For Table 5.1, we have the following observations.

- All three methods provided generally decent estimates of β, except that the estimate of β5 provided by LIN seems to be biased.

- Though TS estimated
γ̂1, γ̂2, and γ̂4 with smaller biases, its performance in the estimation of γ0, γ3, and σ_e is poor. Comparatively, LAP and LIN are more likely to provide better estimates of γ, at least for γ0.

The results in Table 5.1 are regarded as the baseline for the following comparisons.

Table 5.2: Simulation Results: n = 120

Parameter    TS                     LAP                    LIN
             Bias    SE     MSE     Bias    SE     MSE     Bias    SE     MSE
β1 = 12      0.05    0.14   0.02    0.00    0.11   0.01    -0.10   0.04   0.01
β2 = 30      -0.94   1.55   3.28    -1.14   0.96   2.20    -1.09   0.92   2.03
β3 = 7       -0.01   0.18   0.03    0.14    0.14   0.04    -0.12   0.16   0.04
β4 = 2       -0.04   0.19   0.04    0.11    0.26   0.08    0.00    0.22   0.05
β5 = 3       0.00    0.11   0.01    -0.19   0.13   0.05    -0.36   0.07   0.13
σ = 0.3      0.00    0.01   0.00    0.09    0.03   0.01    -0.00   0.01   0.00
γ0 = 1       2.01    0.29   4.13    -0.07   0.06   0.01    -0.05   0.09   0.01
γ1 = 1       0.04    0.37   0.14    0.00    0.14   0.02    -0.01   0.19   0.04
γ2 = 1       0.00    0.42   0.18    -0.00   0.11   0.01    -0.06   0.25   0.07
γ3 = 1       -0.97   0.31   1.03    0.03    0.04   0.00    -0.15   0.24   0.08
γ4 = 1       -0.01   0.30   0.09    -0.01   0.04   0.00    -0.15   0.23   0.08
σ_e = 0.2    2.26    0.25   5.18    -0.06   0.15   0.03    0.02    0.03   0.00
Time (Sec)   1146.46                360.46                 28.83

Similarly, the observations for Table 5.2, which is based on more subjects, are listed below.

- Overall, if more subjects are available, the estimation results of all three methods are effectively improved in terms of smaller SE, bias, and MSE.

- The estimation results offered by LAP are overall better than those of the other two methods.

- LIN gave satisfactory estimates of β and γ
, except for β5. This method might be even more attractive for its extraordinarily fast speed.

- TS still provided severely biased estimates of γ0, γ3, and σ_e.

5.2.2 Number of Repeated Measurements

To examine how these estimation methods perform with different numbers of repeated measurements, we simulate datasets in which each subject is repeatedly measured 6 times or 12 times. Table 5.1 contains the simulation results based on n_i = 6; here we list the estimation results for n_i = 12 in Table 5.3.

Table 5.3: Simulation Results: n_i = 12

Parameter    TS                     LAP                    LIN
             Bias    SE     MSE     Bias    SE     MSE     Bias    SE     MSE
β1 = 12      0.02    0.20   0.04    -0.02   0.15   0.02    -0.15   0.06   0.03
β2 = 30      -0.52   1.38   2.18    -1.76   1.07   4.24    -0.94   1.23   2.39
β3 = 7       -0.01   0.19   0.04    0.18    0.19   0.07    -0.08   0.13   0.02
β4 = 2       0.01    0.24   0.06    0.53    0.33   0.38    0.05    0.17   0.03
β5 = 3       -0.00   0.10   0.01    -0.45   0.22   0.25    -0.28   0.07   0.08
σ = 0.3      -0.00   0.01   0.00    0.09    0.03   0.01    0.01    0.01   0.00
γ0 = 1       2.16    0.43   4.87    -0.05   0.20   0.04    -0.06   0.12   0.02
γ1 = 1       0.00    0.47   0.22    0.16    0.58   0.36    -0.96   0.19   0.95
γ2 = 1       0.03    0.47   0.22    -0.16   0.53   0.31    -0.94   0.23   0.93
γ3 = 1       -0.85   0.46   0.93    0.12    0.21   0.06    -0.05   0.21   0.05
γ4 = 1       0.04    0.45   0.20    0.01    0.16   0.03    -0.09   0.21   0.05
σ_e = 0.2    2.20    0.31   4.92    -0.02   0.22   0.05    0.08    0.05   0.01
Time (Sec)   183.73                 187.58                 15.60

The observations for the results based on more measurements (Table 5.3) are as follows.

- Overall, it might be hard to reduce the bias or SE of the estimates by taking more measurements on each subject.

- Specifically, LAP led the performance in estimating β and γ
, though the biases of the estimates are greater than in Table 5.2.

- LIN also provided good estimates of β and γ, except for γ1 and γ2, which are associated with the random effects of the measurement error model.

- With more repeated measurements, the computation time of TS is greatly reduced. However, TS still provided severely biased estimates of γ0, γ3, and σ_e.

5.2.3 Between-Subject Variance

To examine how these estimation methods perform with different variance-covariance matrices, we simulate datasets with the following settings of A and B:

A1 = diag(1, 1, 1),  B1 = diag(2, 4, 2, 3),

and

A2 = diag(2, 2, 2),  B2 = diag(4, 8, 4, 6).

Table 5.2 is based on A1 and B1; Table 5.4 is based on A2 and B2. For Table 5.4, we observe the following.

- Surprisingly, the estimation results that LAP offered in the greater between-subject-variance scenario are better than in the smaller between-subject-variance scenario (Table 5.1), and almost as good as in the larger-sample-size scenario (Table 5.2).

- LIN is not as fortunate, at least in terms of the bias of the estimates of β5 and γ2. The SEs of the estimates are, reasonably, greater than under the previous setting of A1 and B1.

Table 5.4: Simulation Results: A2, B2

Parameter    TS                     LAP                    LIN
             Bias    SE     MSE     Bias    SE     MSE     Bias    SE     MSE
β1 = 12      0.10    0.25   0.07    -0.03   0.16   0.03    -0.15   0.07   0.03
β2 = 30      -0.87   2.18   5.52    -1.17   0.93   2.23    -0.32   2.14   4.69
β3 = 7       -0.07   0.29   0.09    0.10    0.23   0.06    -0.24   0.23   0.11
β4 = 2       -0.02   0.34   0.11    0.15    0.39   0.18    -0.12   0.31   0.11
β5 = 3       0.00    0.11   0.01    -0.17   0.10   0.04    -0.56   0.10   0.32
σ = 0.3      -0.00   0.01   0.00    0.14    0.04   0.02    0.03    0.05   0.00
γ0 = 1       2.61    0.51   7.10    -0.02   0.08   0.01    -0.14   0.37   0.15
γ1 = 1       0.06    0.36   0.14    -0.05   0.25   0.06    -0.03   0.32   0.10
γ2 = 1       -0.01   0.38   0.15    0.04    0.21   0.05    -0.22   0.41   0.22
γ3 = 1       -0.91   0.34   0.94    0.01    0.02   0.00    -0.09   0.36   0.13
γ4 = 1       0.02    0.31   0.09    0.00    0.03   0.00    -0.09   0.35   0.13
σ_e = 0.2    2.84    0.37   8.19    -0.02   0.18   0.03    0.80    5.52   31.07
Time (Sec)   908.11                 172.65                 14.22

- TS still provided severely biased estimates of
γ0, γ3, and σ_e, while taking much longer on computation.

5.2.4 Within-Subject Variance

To examine how these estimation methods perform with different within-subject variances, we simulate datasets with the following settings of σ, σ_e, and δ:

σ1 = 0.3, σ_e1 = 0.2, δ1 = 0.2,  and  σ2 = 0.6, σ_e2 = 0.5, δ2 = 0.4.

Table 5.1 is based on σ1, σ_e1, and δ1; Table 5.5 is based on σ2, σ_e2, and δ2.

Table 5.5: Simulation Results: σ2, σ_e2, δ2

Parameter    TS                     LAP                    LIN
             Bias    SE     MSE     Bias    SE     MSE     Bias    SE     MSE
β1 = 12      0.05    0.20   0.04    0.04    0.23   0.06    -0.05   0.11   0.01
β2 = 30      -2.68   2.43   13.09   -1.97   2.36   9.43    -1.31   2.15   6.31
β3 = 7       -0.12   0.24   0.07    0.25    0.28   0.14    -0.19   0.34   0.16
β4 = 2       -0.08   0.31   0.10    0.69    0.47   0.70    -0.11   0.51   0.28
β5 = 3       0.02    0.16   0.03    -0.22   0.22   0.10    -1.20   0.15   1.47
σ = 0.6      0.20    0.02   0.04    0.35    0.08   0.13    0.14    0.07   0.02
γ0 = 1       2.28    0.41   5.38    0.36    0.85   0.86    -0.28   0.35   0.20
γ1 = 1       0.01    0.52   0.27    -2.29   4.54   25.84   -0.97   0.23   1.00
γ2 = 1       0.05    0.51   0.26    1.83    4.35   22.30   -0.94   0.21   0.92
γ3 = 1       -0.99   0.26   1.05    1.78    3.05   12.47   -0.10   0.22   0.06
γ4 = 1       0.05    0.67   0.46    -0.11   1.49   2.24    -0.10   0.23   0.06
σ_e = 0.5    2.25    0.32   5.15    0.92    1.16   2.21    0.34    0.13   0.13
Time (Sec)   1644.74                299.28                 16.48

- When the within-subject variances become greater, the estimation results become worse in terms of both SE and bias.

- No method provides satisfactory estimates for the simulated time-to-event data with greater within-subject variance, indicating that both the TS-based method and the approximate methods may yield greatly biased estimates when the variance is too high. In this situation, some "exact" method, such as the EM algorithm, might be necessary.

In general, we may conclude the following.

- TS is risky for unbiased estimation in the joint modeling of a longitudinal process and a time-to-event process, especially when strong association exists between them.

- The approximate simultaneous inference methods are promising for providing satisfactory estimates for both the longitudinal and time-to-event processes when the measurement error problem or the model misspecification problem is tolerable.

- LAP generally performs more stably than LIN. In fact, when more subjects are given, or the between-subject variance falls in an appropriate range, LAP is likely to provide quite reliable estimates.

- LIN can be considered a valuable alternative to LAP when more subjects are involved, especially considering its fast speed. However, this algorithm requires further study before being widely promoted, since its estimates of some parameters, for example β5, might be biased.

Chapter 6 Conclusion and Discussion

The computation associated with the joint modeling of a longitudinal process and a survival process can be extremely intensive and may lead to convergence problems (Tsiatis and Davidian, 2004; Wu et al., 2008). The situation can be even more severe when several nonlinear and semiparametric/nonparametric models are involved in dealing with complex data, for example when measurement errors or missing data are present. Previous research has mainly used the EM algorithm or Laplace approximation for making inference about several simultaneously evolving processes.
Beyond the technical details of these two popular methods, what interests us most is their two distinct approaches to handling the shared parameters in longitudinal studies.
We also compared these inference methods in a simulation study. Generally speaking:

- It is risky to use the TS method for unbiased estimation in the joint modeling of a longitudinal process and a time-to-event process.

- LAP is generally more reliable than the other two methods in providing valid estimates for the joint models.

- LIN might become a valuable alternative to LAP in terms of both estimation results and computation time, especially when more subjects are given.

Linearization methods have achieved numerous successes in solving nonlinear problems in real-world applications, which leads us to believe that the linearization strategy may also help in jointly modeling longitudinal data and time-to-event data. Our initial work in this area seems promising, and we believe that the following research directions might be interesting in the future.

- Our proposed method is based on a model linearization and a likelihood function approximation. The asymptotic properties of the estimator are naturally of interest; that is to say, we need to search for theoretical justifications for our approach.

- The model that we adopted for the time-to-event process, the Weibull model, is a specific choice for event time analysis. As we introduced in Section 1.1.3, Cox models and general AFT models are also extremely popular in practice. Thus, one might be interested in generalizing the linearization strategy to situations where Cox models or general AFT models are involved in the joint modeling.

- In many practical longitudinal studies, the response variables are recorded as various types of categories. Thus, the NLME model in our approach might need to be replaced by a generalized linear mixed effects model (GLMM). Generalizing the linearization strategy to joint modeling with a GLMM may be valuable in practical applications.

- Without a doubt, the missing data problem is always one of the core issues of longitudinal studies.
Our approach is based on likelihood, which determines that the missing data problems could be naturally incorporated into our framework. However, as more statistical problems are taken into account, the joint modeling itself is getting more and more complex in terms of both theoretical justication and computation. These could be the focuses of our future research. 63 Bibliography Albert, P. S. and Shih, J. H. (2009). On estimating the relationship between longitudinal measurements and time-to-event data using a simple two-stage procedure. Biometrics, pages DOI 10.1111/j.1541{0420.2009.01324.x. Booth, J. G. and Hobert, J. P. (1999). Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society, Ser: B, 61:265 { 285. Brown, E. R. and Ibrahim, J. G. (2003). A bayesian semiparametric joint hierar- chical model for longitudinal and survival data. Biometrics, 59:221 { 228. Brown, E. R., Ibrahim, J. G., and DeGruttola, V. (2005). A
exible b-spline model for multiple longitudinal biomarkers and survival. Biometrics, 61:64 { 73. Carroll, R. J., Ruppert, D., and Stefanski, L. A. (2006). Measurement Error in Nonlinear Models. Chapman & Hall/CRC, London, 2nd edition edition. Chambers, J. M. (1991). Statistical Models in S. Duxbury Press. Davidian, M. and Giltinan, D. M. (1995). Nonlinear Models for Repeated Mea- surements Data. Chapman & Hall/CRC, London. Davidian, M. and Giltinan, D. M. (2003). Nonlinear models for repeated mea- surements data: an overview and update. Journal of Agricultural, Biological and Environmental Statistics, 8:387 { 419. 64 Bibliography de Boor, C. (1978). A Practical Guide to Splines. Springer-Verlag, New York. DeGruttola, V. and Tu, X. M. (1994). Modeling progression of cd-4 lymphocyte count and its relationship to survival time. Biometrics, 50:1003 { 1014. Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the em algorithm. Journal of the Royal Statistical Society, Ser: B, 39:1 { 38. Diggle, P., Heagerty, P., Liang, K. Y., and Zeger, S. (2002). Analysis of Longitu- dinal Data. Oxford University Press, 2nd edition. Ding, A. and Wu, H. (2001). Assessing antiviral potency of anti-hiv therapies in vivo by comparing viral decay rates in viral dynamic models. Biostatistics, 2:13 { 29. Ding, J. M. and Wang, J. L. (2008). Modeling longitudinal data with nonparamet- ric multiplicative random eects jointly with survival data. Biometrics, 64:546 { 556. Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall/CRC. Faucett, C. J. and Thomas, D. C. (1996). Simultaneously modeling censored survival data and repeatedly measured covariates: A gibbs sampling approach. Statistics in Medicine, 15:1663 { 1685. Fletcher, R. (1987). Practical Methods of Optimization. John Wiley & Sons, 2nd edition. Gelfand, A. E. and Smith, A. F. M. (1990). 
Sampling-based approaches to calcu- lating marginal desities. Journal of the American Statistical Association, 85:398 { 409. 65 Bibliography Henderson, R., Diggle, P., and Dobson, A. (2000). Joint modelling of longitudinal measurements and event time data. Biostatistics, 1, 4:465 { 480. Hsieh, F. S., Tseng, Y. K., and Wang, J. L. (2006). Joint modeling of survival and longitudinal data: likelihood approach revisited. Biometrics, 62:1037 { 1043. Huang, X. Z., Stefanski, L. A., and Davidian, M. (2009). Latent-model robustness in joint models for a primary endpoint and a longitudinal process. Biometrics, 65:719 { 727. Ibrahim, J. G., Chen, M. H., and Lipsitz, S. R. (2001a). Missing responses in gen- eralized linear mixed models when the missing data mechanism is nonignorable. Biometrika, 88:551 { 564. Ibrahim, J. G., Chen, M. H., and Sinha, D. (2001b). Bayesian Survival Analysis. Springer, New York. Laird, N. M. and Ware, J. H. (1982). Random-eects models for longitudinal data. Biometrics, 38:963 { 974. Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data. Wiley Series in Probability and Statistics. Wiley Interscience, 2nd edition. Lee, Y., Nelder, J. A., and Pawitan, Y. (2006). Generalized Linear Models with Random Eects: Unied Analysis via H-likelihood. Chapman & Hall/CRC. Li, L., Hu, B., and Greene, T. (2009). A semiparametric joint model for longi- tudinal and survival data with application to hemodialysis study. Biometrics, 65:737 { 745. Li, N., Elasho, R. M., Li, G., and Saverc, J. (2010). Joint modeling of longitu- dinal ordinal data and competing risks survival times and analysis of the ninds rt-pa stroke trial. Statistics in Medicine, 29:546 { 557. 66 Bibliography Lindstrom, M. J. and Bates, D. M. (1988). Newton-raphson and em algorithms for linear mixed-eects models for repeated-measures data. Journal of the Ameri- can Statistical Association, 83:1014 { 1022. Lindstrom, M. J. and Bates, D. M. (1990). 
Nonlinear mixed effects models for repeated measures data. Biometrics, 46:673–687.
Little, R. J. A. (1992). Regression with missing X's: a review. Journal of the American Statistical Association, 87:1227–1237.
Little, R. J. A. (1995). Modeling the drop-out mechanism in repeated measures studies. Journal of the American Statistical Association, 90:1112–1121.
Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data. Wiley, New York, 2nd edition.
Liu, B. and Müller, H. G. (2009). Estimating derivatives for samples of sparsely observed functions, with application to online auction dynamics. Journal of the American Statistical Association, 104:704–717.
Liu, L. (2009). Joint modeling longitudinal semi-continuous data and survival, with application to longitudinal medical cost data. Statistics in Medicine, 28:972–986.
Liu, W. and Wu, L. (2007). Simultaneous inference for semiparametric nonlinear mixed-effects models with covariate measurement errors and missing responses. Biometrics, 63:342–350.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44:226–233.
Nocedal, J. and Wright, S. J. (2006). Numerical Optimization. Springer, New York, 2nd edition.
Pinheiro, J. C. and Bates, D. M. (2002). Mixed-Effects Models in S and S-PLUS. Springer, New York.
Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. Springer, New York, 2nd edition.
Ratcliffe, S. J., Guo, W. S., and Ten Have, T. R. (2004). Joint modeling of longitudinal and survival data via a common frailty. Biometrics, 60:892–899.
Rice, J. A. and Wu, C. O. (2001). Nonparametric mixed-effects models for unequally sampled noisy curves. Biometrics, 57:253–259.
Rizopoulos, D., Verbeke, G., and Lesaffre, E. (2009). Fully exponential Laplace approximations for the joint modelling of survival and longitudinal data.
Journal of the Royal Statistical Society, Series B, 71:637–654.
Rizopoulos, D., Verbeke, G., Lesaffre, E., and Vanrenterghem, Y. (2008). A two-part joint model for the analysis of survival and longitudinal binary data with excess zeros. Biometrics, 64:611–619.
Rizopoulos, D., Verbeke, G., and Molenberghs, G. (2010). Multiple-imputation-based residuals and diagnostic plots for joint models of longitudinal and survival outcomes. Biometrics, 66:20–29.
Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data. Chapman & Hall/CRC.
Shah, A., Laird, N., and Schoenfeld, D. (1997). A random-effects model for multiple characteristics with possibly missing data. Journal of the American Statistical Association, 92:775–779.
Song, X., Davidian, M., and Tsiatis, A. A. (2002a). An estimator for the proportional hazards model with multiple longitudinal covariates measured with error. Biostatistics, 3(4):511–528.
Song, X., Davidian, M., and Tsiatis, A. A. (2002b). A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data. Biometrics, 58:742–753.
Song, X. and Huang, Y. J. (2006). A corrected pseudo-score approach for additive hazards model with longitudinal covariates measured with error. Lifetime Data Analysis, 12:97–110.
Song, X. and Wang, C. Y. (2008). Semiparametric approaches for joint modeling of longitudinal and survival data with time-varying coefficients. Biometrics, 64:557–566.
Tseng, Y. K., Hsieh, F. S., and Wang, J. L. (2005). Joint modelling of accelerated failure time and longitudinal data. Biometrika, 92:587–603.
Tsiatis, A. A. and Davidian, M. (2004). An overview of joint modeling of longitudinal and time-to-event data. Statistica Sinica, 14:793–818.
Verbeke, G. and Molenberghs, G. (2001). Linear Mixed Models for Longitudinal Data. Springer, New York.
Vonesh, E. F., Greene, T., and Schluchter, M. D. (2006). Shared parameter models for the joint analysis of longitudinal data and event times.
Statistics in Medicine, 25:143–163.
Vonesh, E. F., Wang, H., Nie, L., and Majumdar, D. (2002). Conditional second-order generalized estimating equations for generalized linear and nonlinear mixed-effects models. Journal of the American Statistical Association, 97:271–283.
Wang, Y. and Taylor, J. M. G. (2001). Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. Journal of the American Statistical Association, 96:895–905.
Wei, G. C. and Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithm. Journal of the American Statistical Association, 85:699–704.
Wu, H. and Ding, A. (1999). Population HIV-1 dynamics in vivo: applicable models and inferential tools for virological data from AIDS clinical trials. Biometrics, 55:410–418.
Wu, H. and Zhang, J. (2006). Nonparametric Regression Methods for Longitudinal Data Analysis: Mixed-Effects Modeling Approaches. Wiley-Interscience.
Wu, L. (2002). A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies. Journal of the American Statistical Association, 97:955–964.
Wu, L. (2009). Mixed Effects Models for Complex Data. Chapman & Hall/CRC.
Wu, L., Hu, X. J., and Wu, H. (2008). Joint inference for nonlinear mixed-effects models and time-to-event at the presence of missing data. Biostatistics, 9:308–320.
Wu, L., Liu, W., and Hu, X. J. (2010). Joint inference on HIV viral dynamics and immune suppression in presence of measurement errors. Biometrics, 66:327–335.
Wu, M. C. and Carroll, R. J. (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics, 44:175–188.
Wulfsohn, M. S. and Tsiatis, A. A. (1997). A joint model for survival and longitudinal data measured with error. Biometrics, 53:300–309.
Xu, J. and Zeger, S. L. (2001a).
The evaluation of multiple surrogate endpoints. Biometrics, 57:81–87.
Xu, J. and Zeger, S. L. (2001b). Joint analysis of longitudinal data comprising repeated measures and times to events. Applied Statistics, 50:375–387.
Ye, W., Lin, X., and Taylor, J. M. G. (2008). Semiparametric modeling of longitudinal measurements and time-to-event data: A two-stage regression calibration approach. Biometrics.
Yu, M. G., Law, N. J., Taylor, J. M. G., and Sandler, H. M. (2004). Joint longitudinal-survival-cure models and their application to prostate cancer. Statistica Sinica, 14:835–862.
Yu, M. G., Taylor, J. M. G., and Sandler, H. M. (2008). Individual prediction in prostate cancer studies using a joint longitudinal-survival-cure model. Journal of the American Statistical Association, 103:178–187.
Zeng, D. L. and Cai, J. W. (2005a). Asymptotic results for maximum likelihood estimators in joint analysis of repeated measurements and survival time. Annals of Statistics, 33:2132–2163.
Zeng, D. L. and Cai, J. W. (2005b). Simultaneous modelling of survival and longitudinal data with an application to repeated quality of life measures. Lifetime Data Analysis, 11:151–174.
Zhang, H. P., Ye, Y. Q., Diggle, P., and Shi, J. (2008). Joint modeling of time series measures and recurrent events and analysis of the effects of air quality on respiratory symptoms. Journal of the American Statistical Association, 103:48–60.
Zhang, S., Müller, P., and Do, K. A. (2010). A Bayesian semiparametric survival model with longitudinal markers. Biometrics, 66:719–727.