Approximate Methods for Joint Models in Longitudinal Studies

by Libo Lu
B.Sc., Shandong University, 2002
M.Sc., University of Science and Technology of China, 2005

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August 2010

© Libo Lu 2010

Abstract

Longitudinal studies often involve several statistical processes, such as a longitudinal process and a time-to-event process, whose association calls for a joint modeling strategy. We first review recent research on joint modeling. We then introduce four popular inference methods for jointly analyzing longitudinal data and time-to-event data under a combination of typical parametric models. Some of these methods may suffer from non-ignorable bias in the estimators; others may be computationally intensive or even fail to converge. In this thesis, we propose an approximate likelihood-based simultaneous inference method for jointly modeling a longitudinal process and a time-to-event process in the presence of covariate measurement errors. By linearizing the joint model, we design a strategy for updating the random effects that connect the two processes, and propose two algorithmic frameworks for different forms of the joint likelihood function. Both frameworks approximate the multidimensional integral in the observed-data joint likelihood by analytic expressions, which greatly reduces the computational burden of this complex joint modeling problem. We apply the new method, along with several available methods, to a real dataset. The inference results provided by our method agree with those from other popular methods and admit a sensible biological interpretation. We also conduct a simulation study to compare these methods.
Our new method looks promising in terms of estimation precision as well as computational efficiency, especially when more subjects are available. Conclusions and directions for future research are given at the end.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
  1.1 Some Issues in Longitudinal Studies
    1.1.1 Longitudinal Data
    1.1.2 Incomplete Data
    1.1.3 Time-to-Event Data
  1.2 Joint Modeling Longitudinal Data and Time-to-Event Data
    1.2.1 Motivation
    1.2.2 Statistical Models
    1.2.3 Inference Approaches
  1.3 Motivating Example
    1.3.1 Brief Biological Background
    1.3.2 Statistical Question
  1.4 Objective and Outline
2 Statistical Models
  2.1 Nonlinear Mixed Effects (NLME) Model for Longitudinal Process
  2.2 Nonparametric Model for Covariate Measurement Errors Problem
  2.3 AFT Model for Time-to-Event Data
  2.4 Models for Nonignorable Missing Data
3 Simultaneous Inference
  3.1 Two-Step Method
    3.1.1 Naive Two-Step Method
    3.1.2 Modified Two-Step Method
  3.2 EM Algorithm
    3.2.1 E-Step
    3.2.2 M-Step
  3.3 Laplace Approximation
  3.4 Linearization Method
    3.4.1 Framework I
    3.4.2 Framework II
4 Data Analysis
  4.1 Data Description
  4.2 Model Specification
  4.3 Analysis and Results
5 Simulation
  5.1 Simulation Design
  5.2 Simulation Results
    5.2.1 Number of Subjects
    5.2.2 Numbers of Repeated Measurements
    5.2.3 Between-Subject Variance
    5.2.4 Within-Subject Variance
6 Conclusion and Discussion
Bibliography

List of Tables
4.1 Estimation Results
5.1 Simulation Results: n = 60
5.2 Simulation Results: n = 120
5.3 Simulation Results: ni = 12
5.4 Simulation Results: A2, B2
5.5 Simulation Results: µ2, λ2, σ2

List of Figures
1.1 Longitudinal Data of 6 Subjects
1.2 Plots of Longitudinal Data of Subject 10
1.3 Plots of Longitudinal Data of Subject 18
1.4 Plots of Longitudinal Data of Subject 33
4.1 Longitudinal Data Trajectories
4.2 Estimate of Survivor Function
4.3 Residual Plots I of NLME Model
4.4 Residual Plots II of NLME Model
4.5 Residual Plots I of Covariate Measurement Errors Model
4.6 Residual Plots II of Covariate Measurement Errors Model

Acknowledgments

First and foremost, I would like to express my gratitude to my supervisors, Dr. Lang Wu and Dr. Jiahua Chen, for their support, encouragement and patience throughout my study at the University of British Columbia. Their world-leading expertise and excellent guidance defined a role model for my career. This thesis would not have been completed without their inspiring comments and constructive suggestions.

I would also like to thank everyone in the Department of Statistics at the University of British Columbia. Over the past two years, they have left me invaluable knowledge, along with wonderful memories. I had a great time in their company.

Many, many thanks to my parents and Miss Marie Li for their unconditional love and support, and to my lifelong friends for their lifelong loyalty.
And finally, I must thank Buddha for always being with me no matter what happens, dispelling my fear, healing my pain, comforting my soul and guiding my life forward. I am lucky to be his follower.

Chapter 1
Introduction

1.1 Some Issues in Longitudinal Studies

With the development of modern statistical methodology, longitudinal studies play an increasingly important role in health science, medical research, social science and economics. This type of study differs from the classical cross-sectional study in the timing of the measurements of the response: in a longitudinal study, some variables are measured more than once, at different time points, for each subject. Longitudinal studies, which enable researchers to separate cohort and time effects, are valuable because the effect of time can be a serious confounding factor for other covariates of interest.

1.1.1 Longitudinal Data

Suppose n subjects are included in a longitudinal study. The value of the response variable y for subject i at time t_ij is denoted by y_ij, where i = 1, ..., n, j = 1, ..., n_i. This type of data is called longitudinal data.

The graphical display of longitudinal data requires attention. A sample plot of the longitudinal data of 6 subjects is shown in the top panel of Figure 1.1, from which it is quite difficult to discover growth patterns. It is often helpful to distinguish subjects with different types of points and to connect the observations of each subject with lines, as in the bottom panel of Figure 1.1.

[Figure 1.1: Longitudinal Data of 6 Subjects. Two panels of Response versus Time.]

Compared to the data obtained from a cross-sectional study, the data collected from a longitudinal study are naturally correlated. This feature requires a statistical methodology that takes the correlation into account.
For a longitudinal data analysis, a primary interest is to estimate the population mean and its relationship with covariates. A popular method for this purpose is the marginal model, which models the mean structure and the variance-covariance structure of the response variable separately. Its underlying idea is close to the quasi-likelihood approach, which requires only the first- and second-moment information of the response variable, without a specific distributional assumption. Another approach is the transitional model, which assumes a Markov structure for the longitudinal process to model the correlation among the repeated measurements. This approach is appealing if the current observation strongly depends on previous ones.

Marginal models and transitional models consider the between-subject variation and within-subject variation separately. A mixed effects model considers the two sources of variation simultaneously, incorporating random effects to represent the between-subject variation and within-subject correlations. Laird and Ware (1982) proposed a general linear mixed effects (LME) model:

    y_i = X_i β + Z_i b_i + e_i,   (1.1)
    b_i ~ N(0, D),   e_i | b_i ~ N(0, R_i),   (1.2)

where β are the fixed effects (population parameters), b_i = (b_{i1}, ..., b_{iq})^T are the random effects (subject-specific parameters), X_i is an n_i × (p + 1) design matrix containing the covariates of subject i, Z_i is an n_i × q design matrix (often a submatrix of X_i) for the random effects, e_i = (e_{i1}, ..., e_{i n_i})^T are the random errors of the within-subject measurements, R_i is an n_i × n_i matrix characterizing the variance-covariance of the within-subject measurements, and D is the variance-covariance matrix of the random effects.
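A quick simulation sketch of model (1.1)-(1.2) in the simplest random-intercept case may make the two sources of variation concrete. The parameter values below are purely illustrative, and the moment-based (ANOVA-type) estimates are only a sketch, not the estimation method used in this thesis:

```python
import random
import statistics

random.seed(1)

# Random-intercept LME: y_ij = beta0 + b_i + e_ij,
# with b_i ~ N(0, d) between subjects and e_ij ~ N(0, r) within subjects.
beta0, d, r = 5.0, 4.0, 1.0          # illustrative values only
n_subjects, n_obs = 200, 8

data = []
for _ in range(n_subjects):
    b_i = random.gauss(0.0, d ** 0.5)            # subject-specific effect
    data.append([beta0 + b_i + random.gauss(0.0, r ** 0.5)
                 for _ in range(n_obs)])

# Moment-based variance-component estimates:
subject_means = [statistics.mean(rows) for rows in data]
within_var = statistics.mean(statistics.variance(rows) for rows in data)
between_var = statistics.variance(subject_means) - within_var / n_obs

print(round(within_var, 2), round(between_var, 2))   # estimates of r and d
```

The subject means vary far more than the within-subject noise alone would allow, which is exactly the correlation structure that mixed effects models capture.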
The inference procedure for mixed effects models is a natural extension of classical regression methods for cross-sectional models, and most ideas and results, such as maximum likelihood estimation (MLE) and asymptotic normality, also apply to the estimation of LME models.

Diggle et al. (2002) provided a comprehensive overview of the above three methods. Verbeke and Molenberghs (2001) and Davidian and Giltinan (1995) gave further discussions of linear and nonlinear mixed effects models. In practice, the longitudinal trajectories may be so complex that parametric models are not flexible enough; for such cases, Wu and Zhang (2006) introduced semiparametric and nonparametric models.

1.1.2 Incomplete Data

The data observed in longitudinal studies are very likely to be incomplete, with problems such as measurement errors and missing data. In the following, we briefly introduce these problems.

Measurement Errors Problem

Ideally, we can obtain precise values of the selected variables, such as gender and age. In practice, however, the measurements of some continuous variables, such as blood pressure and CD4 count, are often imprecise for various reasons. We call this the measurement errors problem; its distinguishing feature is that we can only observe a variable z that is related to the true value z*. A major goal of measurement error modeling is to obtain nearly unbiased estimates of the parameters in the main model by fitting a sub-model based on z. This objective requires careful analysis, because simply substituting z for z* without proper adjustment may lead to biased estimates. Carroll et al. (2006) reviewed general methods for measurement errors problems.

Missing Data Problem

Missing data problems mainly encompass missing responses, missing covariates, and dropouts. A main reason is that not all subjects are available at every measurement time throughout the entire study, especially if the study lasts for a long period.
In the cases of missing responses, missing covariates, and dropouts, part of the data are completely missing. The missingness can be classified into three mechanisms:

• Missing completely at random (MCAR): missingness depends neither on the observed data nor on the unobserved data.
• Missing at random (MAR): missingness may depend on the observed data but not on the unobserved data.
• Missing not at random (MNAR): missingness may depend on both the observed data and the unobserved data.

For instance, suppose a student did not submit his assignment in class on the due day, so his grade for that assignment is missing. If he left his assignment at home, the missing grade is MCAR. If he missed that class, the missing grade is MAR, since the missingness is determined by the observed data "attendance". If he has not finished his assignment, the missing grade is MNAR.

There is an enormous and valuable literature on this topic, since missing data are often informative and nonignorable, and statistical inference that ignores this issue may result in biased estimation. Little and Rubin (2002) provided a systematic introduction to major missing data methods. Schafer (1997) gave a general introduction to incomplete multivariate data analysis. Wu (2009) offered an overview of analyzing incomplete data problems with mixed effects models.

1.1.3 Time-to-Event Data

In many longitudinal studies, the times at which certain events happen, such as the time of a patient's death, are of great interest. It is conventional to speak of survival data and survival analysis, regardless of the nature of the events. Often, statistical models are built to study the relationship between the event time and other covariates.

The distinguishing feature of time-to-event data is censoring, which means that the event time may not be observed exactly. For example, suppose that a subject is lost to follow-up after 5 years of observation.
Censoring can then be classified into three types:

• Right-censoring: the time of the event is not observed because it occurred to the right of the 5th year.
• Left-censoring: suppose a subject had the event before the 5th year but the exact time of the event is unknown.
• Interval-censoring: suppose a subject had the event between the 4th year and the 5th year but the exact time of the event is unknown.

Here are some possible reasons why censoring may occur:

• a subject does not experience the event before the study ends;
• a subject is lost to follow-up during the study period;
• a subject withdraws from the study because of death (if death is not the event of interest) or some other reason.

The censoring feature, which is sometimes referred to as a special missing data issue, means that specific statistical methodology is necessary. Common approaches in time-to-event data analysis include the Cox proportional hazards model (Cox model), which assumes that the effect of a covariate is to multiply the hazard by some constant, and the accelerated failure time model (AFT model), which assumes that the effect of a covariate is to multiply the predicted event time by some constant. Time-to-event data have been studied extensively; Lawless (2003) provided a comprehensive introduction to the relevant statistical models and methods.

1.2 Joint Modeling Longitudinal Data and Time-to-Event Data

1.2.1 Motivation

The statistical issues mentioned in Section 1.1 often occur simultaneously in longitudinal studies, but they used to be analyzed separately. Previous research has reported that the intrinsic relationships among the longitudinal process, the time-to-event process and the incomplete data mechanism can greatly influence statistical inference for diverse scientific objectives.
• Wu and Carroll (1988) reported that analyzing longitudinal data without modeling informative censoring might lead to biased results. Their work was generalized by DeGruttola and Tu (1994). Little (1995) pointed out the possible estimation bias in analyzing longitudinal data when ignoring missing data problems. Wu (2002) and Liu and Wu (2007) revealed the effects of measurement errors and missing data on the inference for the longitudinal process. These studies imply that classical statistical methods for longitudinal data analysis may need to be adjusted to incorporate certain critical events in longitudinal studies, such as outcome-dependent drop-out.

• Faucett and Thomas (1996) and Wulfshon and Tsiatis (1997) modeled time-to-event data with time-varying covariates measured with errors, where the joint modeling approach and the separate modeling approach provided different estimates. These studies indicate that valid modeling of time-to-event data may be conditional on relevant longitudinal processes, for instance the time-varying covariates. Their work was generalized by Xu and Zeger (2001a,b) and Song et al. (2002a,b), respectively.

• Henderson et al. (2000) jointly analyzed longitudinal data and time-to-event data, with equal interest in both, using likelihood-based approaches. Wang and Taylor (2001) and Brown and Ibrahim (2003) further generalized the modeling assumptions on the longitudinal process.

Joint modeling of both the longitudinal process and the time-to-event process is particularly essential if the association between the event times and the growth trends of the longitudinal process is of interest. Generally speaking, joint modeling links the longitudinal process and the time-to-event process via a common set of random effects.
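The linkage through shared random effects can be sketched with an entirely hypothetical example (not a model from this thesis): a subject-specific slope b_i drives the longitudinal trajectory and also shifts the log event time, so estimating the per-subject slopes reveals the association between the two processes:

```python
import random

random.seed(4)

def ols_slope(x, y):
    # Simple least-squares slope, used per subject.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

times = [0, 1, 2, 3, 4]
b_hat, log_T = [], []
for _ in range(200):
    b_i = random.gauss(-1.0, 0.3)                    # shared random slope
    y_i = [10 + b_i * t + random.gauss(0, 0.2) for t in times]
    # The same b_i enters the (hypothetical) event-time model:
    log_T.append(1.0 - 2.0 * b_i + random.gauss(0, 0.1))
    b_hat.append(ols_slope(times, y_i))              # per-subject estimate

# Regressing log event time on the estimated slopes roughly recovers the
# true association (-2 here), though the estimated slopes carry noise.
gamma1 = ols_slope(b_hat, log_T)
print(round(gamma1, 2))
```

Because b_i is unobserved and only estimated, how the two processes share it, and how its uncertainty is propagated, is precisely what distinguishes the inference approaches reviewed below.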
Under the joint modeling framework, the main focus could be on the longitudinal process alone, on the time-to-event process alone, or on both processes. Since the joint modeling approach contains at least two processes, there are usually more parameters to estimate than in a single-process inference. If only one specific process is of interest, the other process is better modeled as simply as possible to reduce the number of nuisance parameters.

Tsiatis and Davidian (2004) provided an excellent overview of the motivations for joint modeling and the relevant preceding literature. The rest of this section briefly reviews the statistical models and inference approaches that have been adopted in recent research.

1.2.2 Statistical Models

Models for Longitudinal Data

Many studies adopt mixed effects models for longitudinal data. One typical choice, adopted by Ratcliffe et al. (2004), Zeng and Cai (2005a,b), Li et al. (2009) and Huang et al. (2009), is the parametric linear mixed effects model with covariates as predictors. Other researchers, such as Brown et al. (2005), Ding and Wang (2008), Ye et al. (2008) and Zhang et al. (2010), used nonparametric models for more flexible descriptions of the longitudinal processes. Vonesh et al. (2006) and Yu et al. (2008) adopted nonlinear mixed effects models suggested by the corresponding scientific backgrounds. Rizopoulos et al. (2008), Liu (2009) and Li et al. (2010) based their approaches on generalized mixed effects models for analyzing categorical longitudinal data.

Models for Time-to-Event Data

For modeling time-to-event data, some joint modeling studies, including Ratcliffe et al. (2004), Song and Huang (2006), Ding and Wang (2008), Li et al. (2009) and Zhang et al. (2010), were based on the Cox proportional hazards model, partially because the Cox model is widely used in practice.
Recently, guided by the scientific background of the data, more and more attention has turned to the accelerated failure time model, as in Tseng et al. (2005) and Rizopoulos et al. (2008). In particular, Vonesh et al. (2006) and Rizopoulos et al. (2010) adopted the Weibull model, which bears the characteristics of both the Cox model and the AFT model.

1.2.3 Inference Approaches

Two-Stage Approach

The idea of the two-stage approach is to first estimate the shared parameters b_i from either the longitudinal process or the time-to-event process, as in Zhang et al. (2008) and Ye et al. (2008), with the corresponding parameters of the first model being estimated at the same time. The estimated shared parameters b̂_i are then used in the estimation of the parameters of the second model. Although the two-stage approach is easy to implement, it has several drawbacks (Wulfshon and Tsiatis, 1997). For example, the estimated parameters of one process from the first step are regarded as fixed when modeling the other process in the second step, which does not propagate uncertainty from step one to step two.

Likelihood-Based Approach

A more unified alternative is statistical inference based on the joint likelihood function given both the longitudinal data and the time-to-event data. Joint modeling approaches simultaneously estimate the parameters that describe the longitudinal process and those that describe the time-to-event process. Besides enhancing the efficiency of the inference, joint modeling is also expected to provide more precise estimation of the relationship between the two processes.

Tseng et al. (2005) considered an LME model for the longitudinal data along with a general AFT model for the time-to-event data, and developed an EM algorithm to maximize the resulting log-likelihood, which involves intractable integrals over the distribution of the random effects.
Similar EM algorithms have been applied in many other studies based on Cox models, such as Ratcliffe et al. (2004), Wu et al. (2008), Rizopoulos et al. (2009) and Li et al. (2010). Vonesh et al. (2006) and Rizopoulos et al. (2009) effectively reduced the computational difficulty of the E-step by using the Laplace approximation. Zeng and Cai (2005a) adopted an LME model and a Cox model for the longitudinal process and the time-to-event process, and discussed the asymptotic behavior of the MLE obtained from the EM algorithm.

Hsieh et al. (2006) reinforced the merit of the joint modeling approach of Wulfshon and Tsiatis (1997) by providing a theoretical explanation of the robustness features observed in the literature. They suggested that the likelihood-based procedure with normal random effects can be very efficient and robust as long as the longitudinal data carry rich enough information. However, if the longitudinal data are too sparse or carry too large measurement errors, the efficiency loss of joint modeling can be substantial. Hsieh et al. (2006) also recommended the bootstrap method for more reliable estimates of the standard errors of the MLEs.

Brown et al. (2005), Yu et al. (2008) and Zhang et al. (2010) took Bayesian approaches to jointly model the two processes and developed Markov chain Monte Carlo (MCMC) implementations, which are also based on the likelihood. Ibrahim et al. (2001b, chap. 7) provided a detailed discussion of joint modeling from a Bayesian perspective. Yu et al. (2004) compared the inference results of the Bayesian approach and the EM approach, both of which rely on the specification of likelihood functions.
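The Laplace approximation mentioned above replaces an integral of the form ∫ exp(−h(b)) db by a Gaussian integral obtained from a second-order expansion of h around its minimum b0, giving exp(−h(b0)) √(2π / h″(b0)). A one-dimensional sketch with a toy h (unrelated to any model in this thesis), checked against brute-force numerical integration:

```python
import math

# Laplace approximation to I = ∫ exp(-h(b)) db with h(b) = cosh(b),
# which is minimized at b0 = 0 and has h''(0) = cosh(0) = 1.
b0 = 0.0
laplace = math.exp(-math.cosh(b0)) * math.sqrt(2 * math.pi / math.cosh(b0))

# Brute-force Riemann sum on a wide grid for comparison.
step = 0.001
numeric = sum(math.exp(-math.cosh(-10.0 + step * k)) * step
              for k in range(20001))

print(round(laplace, 3), round(numeric, 3))
```

The one-shot approximation is within about 10% of the true value here; its accuracy generally improves as the integrand becomes more sharply peaked, which is why it works well when each subject contributes many observations.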
Semiparametric Approach

Song and Huang (2006) and Song and Wang (2008) focused on estimating the fixed effects in joint models of the longitudinal process and the time-to-event process by developing sets of unbiased estimating equations (conditional and corrected score approaches), which yield consistent and asymptotically normal estimators without assumptions on the random effects. Their approaches reduce reliance on parametric modeling assumptions.

1.3 Motivating Example

1.3.1 Brief Biological Background

The Human Immunodeficiency Virus (HIV) is a virus that directly attacks certain human organs, such as the brain, heart, and kidneys, as well as the human immune system. Infection with HIV occurs by the transfer of blood, semen, vaginal fluid, pre-ejaculate, or breast milk. Within these bodily fluids, HIV is present as both free virus particles and virus within infected immune cells. The four major routes of transmission are unsafe sex, contaminated needles, breast milk, and transmission from an infected mother to her baby at birth (vertical transmission). Screening of blood products for HIV has largely eliminated transmission through blood transfusions or infected blood products in the developed world.

The primary cells attacked by HIV are the CD4 lymphocytes, which help direct immune function in the body. HIV infection leads to low levels of CD4 cells through three main mechanisms: direct viral killing of infected cells; increased rates of apoptosis in infected cells; and killing of infected CD4 cells by CD8 cytotoxic lymphocytes that recognize infected cells. Since CD4 cells are required for proper immune system function, when enough CD4 lymphocytes have been destroyed by HIV, the immune system barely works. Many problems experienced by people infected with HIV result from a failure of the immune system to protect them from certain opportunistic infections (OIs) and cancers.
Most people infected with HIV eventually develop Acquired Immunodeficiency Syndrome (AIDS). These individuals mostly die from opportunistic infections or malignancies associated with the progressive failure of the immune system. HIV progresses to AIDS at a variable rate affected by viral, host, and environmental factors; most individuals progress to AIDS within 10 years of HIV infection. HIV treatment with anti-retrovirals increases the life expectancy of people infected with HIV. Even after HIV has progressed to diagnosable AIDS, the average survival time with antiretroviral therapy was estimated to be more than 5 years as of 2005. Without antiretroviral therapy, someone who has AIDS typically dies within a year.

1.3.2 Statistical Question

According to clinical experience, HIV infection often occurs along with variation in CD4 cells, CD8 cells and their ratio. Recently, the CD4/CD8 ratio has become a new biomarker for assessing the relative condition of HIV subjects and, furthermore, for predicting the progression from HIV to AIDS. After certain anti-HIV treatments, the viral loads of HIV-infected subjects are expected to decline, roughly with biphasic exponential decay. Meanwhile, we might observe a corresponding increase in CD4 and a decrease in CD8, and consequently an increase in the CD4/CD8 ratio. However, the viral load might rebound after a period of treatment for various reasons. The relationship between HIV viral suppression and immune restoration receives great attention in AIDS research.
[Figure 1.2: Plots of Longitudinal Data of Subject 10. Three panels showing log10(RNA), CD4 and CD4/CD8 against time.]

[Figure 1.3: Plots of Longitudinal Data of Subject 18. Three panels showing log10(RNA), CD4 and CD4/CD8 against time.]

[Figure 1.4: Plots of Longitudinal Data of Subject 33. Three panels showing log10(RNA), CD4 and CD4/CD8 against time.]

Our research was motivated by an AIDS Clinical Trials Group (ACTG) study (Wu and Ding, 1999) demonstrating that the initial viral decay rate reflects the efficacy of an anti-HIV treatment. Figures 1.2, 1.3 and 1.4 show the trajectories of HIV viral load, CD4 count, and CD4/CD8 ratio for three randomly selected subjects. HIV viral load appears to be negatively correlated with both CD4 count and CD4/CD8 ratio. This thesis studies the potential underlying association between the viral load decay and the CD4/CD8 time trend by jointly modeling the HIV viral load dynamics and the time of the first decrease of CD4/CD8.

1.4 Objective and Outline

In this thesis, we develop a new approximate statistical inference method for jointly modeling the longitudinal process and the time-to-event process with covariate
measurement errors. Compared to other existing inference methods, the methods we propose are easy to implement and much more computationally efficient.

This thesis is organized as follows. Chapter 2 introduces the statistical models involved in our joint modeling. Chapter 3 discusses currently available inference methods and proposes our new methods in procedural detail. These methods are applied to a real dataset from the motivating example in Chapter 4. The simulation results are displayed in Chapter 5. Finally, we present conclusions and a discussion of future research in Chapter 6.

Chapter 2
Statistical Models

The joint modeling of a longitudinal process and a time-to-event process is composed of several simpler models, each representing a different specific process. In this chapter, we introduce the statistical models for each process in our longitudinal study.

2.1 Nonlinear Mixed Effects (NLME) Model for Longitudinal Process

The general LME model (1.1) has achieved numerous successes in both theoretical and applied research, owing to its simplicity in modeling, computation and interpretation. However, a linear structure only provides a local linear approximation to the growth trend of the response variable and thus limits our vision to the range of the observed data. In many longitudinal studies, the development of the response variable has a certain deterministic mechanism underlying it, which can often be represented by a nonlinear mathematical formulation with physically meaningful parameters. The nonlinear formulation, if correctly specified, supports more precise estimation for the statistical model, both inside and outside the range of the observed data, with fewer parameters than linear models. Therefore, nonlinear mixed effects (NLME) models are becoming increasingly favored for representing longitudinal processes with known development mechanisms.
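For instance, after effective anti-HIV treatment, viral load is often described by a biphasic exponential decay curve. The sketch below (with purely illustrative parameter values, not estimates from this thesis) generates one subject's log10 viral-load trajectory from such a mechanism, placing random effects on the two intercepts only, so that the decay rates are shared across subjects:

```python
import math
import random

random.seed(5)

# Illustrative biphasic decay: the mean log10 viral load at time t is
#   g(t) = log10(exp(p1 - l1 * t) + exp(p2 - l2 * t)).
def g(t, p1, l1, p2, l2):
    return math.log10(math.exp(p1 - l1 * t) + math.exp(p2 - l2 * t))

# Population (fixed-effect) values, chosen only for illustration:
p1, l1, p2, l2 = 12.0, 0.35, 7.0, 0.04

# Subject-specific random effects on the intercepts p1 and p2 only:
b1, b2 = random.gauss(0.0, 0.3), random.gauss(0.0, 0.3)

days = [0, 2, 7, 14, 28, 56]
y_i = [g(t, p1 + b1, l1, p2 + b2, l2) + random.gauss(0.0, 0.1)
       for t in days]                     # within-subject error e_ij
print([round(v, 2) for v in y_i])         # declining log10 viral load
```

The curve drops steeply at first (the fast first phase) and then levels into a slower second-phase decay, which is the qualitative pattern seen in the motivating example of Section 1.3.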
Suppose n subjects are included in a longitudinal study. Let yij be the response value of subject i at time tij, and let zij be the corresponding covariate value. We can write the NLME model as the following two-stage hierarchical nonlinear model:

    yij = g(tij, βi) + eij,                                      (2.1)
    βi = h(zi, β, bi),                                           (2.2)
    bi ~ N(0, B),   ei | bi ~ N(0, Ri),                          (2.3)

where g(·) and h(·) are given nonlinear functions, βi are the subject-specific parameters and β are the fixed effects (population parameters), zi contains the covariates of subject i, bi = (bi1, ..., biq)^T are the random effects distinguishing subjects, ei = (ei1, ..., ei,ni)^T are the random errors of the within-subject measurements, B is the variance-covariance matrix of the random effects, and Ri is an ni × ni matrix characterizing the variance-covariance of the within-subject measurements.

In practice, the function g(·) is determined by the scientific problem and the data background. The function h(·) is often a linear combination of the fixed effects and the random effects, such as

    βi = Ai β + Bi bi,                                           (2.4)

where Ai is a design matrix depending on the elements of zi, and Bi is a design matrix, typically involving only zeros and ones, allowing some elements of βi to have no associated random effects. The random effects bi represent between-subject variation that is not explained by the covariates zi. Thus, the variance-covariance matrix B is usually unstructured. The error terms ei represent the lack of fit of the model and possible measurement errors in the data. The choice of Ri should be guided by the practical background, on which Davidian and Giltinan (2003) gave detailed discussions.

A widely applicable approach for estimating the parameters of an NLME model is based on the likelihood. Let θ = (β, κ, B) denote all parameters, where κ is the collection of parameters in Ri, i = 1, ..., n.
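As a concrete illustration, the two-stage model (2.1)-(2.3) can be simulated directly. The decay function g, the parameter values, and the choice h(zi, β, bi) = β + bi below are hypothetical, chosen only to show the structure, and are not the models fitted later in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, ni = 50, 8                        # subjects and measurements per subject
beta = np.array([4.0, 0.3])          # fixed effects beta (population parameters)
B = np.diag([0.25, 0.01])            # random-effects covariance matrix B
sigma = 0.1                          # within-subject SD, i.e. R_i = sigma^2 I
t = np.linspace(0.0, 10.0, ni)

def g(t, beta_i):
    # a hypothetical subject-specific nonlinear mean: exponential decay
    return beta_i[0] * np.exp(-beta_i[1] * t)

b = rng.multivariate_normal(np.zeros(2), B, size=n)   # b_i ~ N(0, B)
beta_i = beta + b                                     # stage 2: h(z_i, beta, b_i) = beta + b_i
y = np.vstack([g(t, beta_i[i]) + sigma * rng.standard_normal(ni)
               for i in range(n)])                    # stage 1: y_ij = g(t_ij, beta_i) + e_ij
print(y.shape)  # (50, 8)
```

Each row of `y` is one subject's trajectory; the between-subject spread comes entirely from `b`, the within-subject noise from `e`.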
The marginal distribution of the response yi is given by

    f(yi | θ) = ∫ f(yi | zi, θ, bi) f(bi | B) dbi,               (2.5)

and the likelihood is

    L(θ | y) = ∏_{i=1}^n L(θ | yi) = ∏_{i=1}^n ∫ f(yi | zi, θ, bi) f(bi | B) dbi.   (2.6)

Statistical inference is then based on the maximum likelihood estimation (MLE) approach, so the classic large-sample asymptotic results can be expected to hold. In general, the marginal distribution (2.5) or the likelihood function (2.6) has no closed-form expression, which greatly distinguishes the NLME model from the LME model. This implies that a major challenge of the likelihood-based approach for NLME models is the maximization of the likelihood function (2.6). Commonly used methods are:

• Numerical or Monte Carlo methods, which use different integration techniques to approximate the integral in (2.6) and then maximize the approximate likelihood.
• The EM algorithm, which iteratively maximizes the likelihood function (2.6).
• The Laplace approximation method, which approximates the integral in (2.6) by a simple form and then maximizes the approximate likelihood.
• Taylor expansion methods, which linearize the NLME model into a sequence of LMEs and approximate the maximizer of the likelihood function (2.6).

Davidian and Giltinan (1995) and Wu (2009) provided comprehensive discussions of the asymptotic results and implementation details.

2.2 Nonparametric Model for Covariate Measurement Errors Problem

In longitudinal studies, covariates are usually recorded and analyzed to partially explain the between-subject variation. However, the covariates may be measured with substantial errors. It is well known that ignoring covariate measurement errors may lead to severe bias in statistical inference (see, e.g., Carroll et al. (2006)). Therefore, the covariate measurement errors problem should be incorporated into the analysis of longitudinal data.
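The bias from ignoring measurement error can be seen in a minimal simulation: in a simple linear regression with a mismeasured covariate, the naive slope is attenuated toward zero by the factor var(z*)/(var(z*) + var(error)). All parameter values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
z_star = rng.normal(0.0, 1.0, n)           # true covariate z*
z_obs = z_star + rng.normal(0.0, 1.0, n)   # observed with error (error variance 1)
y = 2.0 * z_star + rng.normal(0.0, 0.5, n) # response depends on the TRUE covariate

slope_true = np.polyfit(z_star, y, 1)[0]   # regression on z*: recovers 2.0
slope_naive = np.polyfit(z_obs, y, 1)[0]   # regression on z_obs: attenuated
print(round(slope_true, 1), round(slope_naive, 1))  # prints 2.0 1.0
```

Here the attenuation factor is 1/(1 + 1) = 0.5, so the naive analysis halves the slope, illustrating why the measurement-error model below is needed.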
Generally, the choice of an appropriate analysis method for the measurement errors problem depends on the following two considerations.

• The choice of statistical model depends on the type and amount of data available for analysis. A sophisticated error model may be unreliable without sufficient data.
• Measurement error models can mainly be classified into functional models and structural models. Functional models treat z* as a sequence of unknown fixed constants or as a random variable with minimal distributional assumptions, leading to estimation procedures that are robust to mis-specification of the distribution of z*. Structural models, on the other hand, assume z* follows a specific parametric distribution; the resulting estimates are usually more efficient if the parametric distribution is correctly specified.

From the viewpoint of functional data analysis, measurement errors problems in longitudinal studies are closely related to fitting several smooth curves. Thanks to the great developments in functional data analysis and nonparametric statistics in recent years, researchers now have more flexible techniques for handling complex data with high precision. Rice and Wu (2001) proposed a linear mixed effects regression spline model for making subject-specific inference for covariates measured with errors and for distinguishing the between-subject and within-subject variation. Liu and Muller (2009) developed a general and model-free dynamic equation for obtaining the derivatives of growth curves from sparse data. This empirical differential equation has the potential to provide valuable insights into the time-dynamics of random trajectories. In our study, we adopt the approach of Rice and Wu (2001) for modeling the time-varying covariate process to incorporate the measurement errors.
Since the covariate process could be quite complex, a flexible empirical nonparametric linear mixed effects model is considered:

    zi(t) = r(t) + hi(t) + ξi(t) ≡ zi*(t) + ξi(t),   i = 1, ..., n,   (2.7)

where zi*(tik) = r(tik) + hi(tik) are the true but unobserved covariate values, r(tik) and hi(tik) are unknown nonparametric smooth functions representing the population mean and the subject-specific variation respectively, and ξi(tik) are the error terms following N(0, λ²). The random smooth function hi(·) is introduced to incorporate the large between-subject variation in the covariate process, while the fixed smooth function r(·) represents the population average of the covariate process. We assume that hi(·) is the realization of a zero-mean stochastic process.

As in Rice and Wu (2001), we can approximate the nonparametric functions r(t) and hi(t) by linear combinations of basis functions Ψp(t) = [ψ0(t), ψ1(t), ..., ψ_{p−1}(t)]^T and Φq(t) = [φ0(t), φ1(t), ..., φ_{q−1}(t)]^T as follows:

    r(t) ≈ rp(t) = Σ_{k=0}^{p−1} αk ψk(t) = Ψp(t)^T α,                      (2.8)
    hi(t) ≈ hiq(t) = Σ_{k=0}^{q−1} aik φk(t) = Φq(t)^T ai,   i = 1, ..., n,   (2.9)

where α = (α0, ..., α_{p−1})^T is a p × 1 vector of fixed effects, ai = (ai0, ..., a_{i,q−1})^T is a q × 1 vector of random effects, and we assume the ai are i.i.d. N(0, A). The functions Ψ(·) and Φ(·) can in principle be any basis functions, including various types of splines, such as B-splines (de Boor, 1978), smoothing splines (Chambers, 1991) and P-splines (Ramsay and Silverman, 2005), or local polynomials (Fan and Gijbels, 1996). The approximations of r(t) and hi(t) can be made arbitrarily accurate by increasing p and q; however, values of p and q that are too large may lead to overfitting. Appropriate values of p and q can be determined by standard model selection criteria such as AIC or BIC (Rice and Wu, 2001).
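As a small sketch of the basis approximation (2.8), the snippet below fits a smooth curve with a truncated power basis for a cubic spline; the knot locations and the test curve are arbitrary choices for illustration (B-splines or the other bases listed above would be used the same way).

```python
import numpy as np

# One hedged choice of basis Psi_p(t): truncated power basis for a cubic
# spline with four interior knots on [0, 1].
knots = np.array([0.2, 0.4, 0.6, 0.8])

def basis(t):
    cols = [np.ones_like(t), t, t ** 2, t ** 3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)          # p = 4 + number of knots columns

t = np.linspace(0.0, 1.0, 200)
r = np.sin(2 * np.pi * t)                 # a smooth "population mean" curve r(t)
alpha, *_ = np.linalg.lstsq(basis(t), r, rcond=None)   # fixed effects alpha
max_err = np.max(np.abs(basis(t) @ alpha - r))
print(max_err < 0.05)                     # the low-dimensional basis tracks r(t) closely
```

With only p = 8 coefficients the spline reproduces the curve to well within plotting accuracy; adding knots (larger p) reduces the error further, at the risk of overfitting noisy data.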
By approximating r(t) and hi(t) with rp(t) and hiq(t) respectively, the covariate model (2.7) can be approximated by the following LME model:

    zi = zi* + ξi = Ψp^T α + Φq^T ai + ξi,                       (2.10)
    ai ~ N(0, A),   ξi | ai ~ N(0, λ² Ii).                       (2.11)

Model (2.10) can be used to model covariate measurement errors problems (Carroll et al., 2006). For multiple covariates, we can model each covariate separately using (2.10). However, a more reasonable approach is to consider a multivariate version of model (2.10), such as a multivariate LME model (Shah et al., 1997).

2.3 AFT Model for Time-to-Event Data

In the analysis of time-to-event data, the survivor function and the hazard function, both functions of time, are of particular interest. Let f(t) denote the probability density function (p.d.f.) of the event time T. The survivor function S(t) is defined as the probability of surviving to time t:

    S(t) = Pr(T ≥ t) = ∫_t^∞ f(x) dx.                            (2.12)

The graph of S(t) against t is called the survival curve. The hazard function h(t) is defined as the event rate at time t conditional on no event having occurred before time t:

    h(t) = lim_{Δt→0} Pr(t ≤ T < t + Δt | T ≥ t) / Δt = f(t)/S(t) = −(d/dt) log S(t).   (2.13)

Many statistical methods for analyzing time-to-event data, whether parametric, semiparametric or nonparametric, are closely related to the likelihood function given the data, which is determined by the existence and specific type of censoring. Consider a simple right-censoring case as an example. Let Ti be the event time and Ci the censoring time for subject i, i = 1, ..., n. If no censoring occurs, Ti is observed and is no later than the potential censoring time (Ti ≤ Ci). If censoring occurs, we only know that Ti > Ci. In general, we denote

    ρi = min(Ti, Ci),   δi = I(Ti ≤ Ci).                         (2.14)

Both ρi and δi are random variables, and their joint p.d.f.
is

    f(ρi)^{δi} S(ρi)^{1−δi}.                                     (2.15)

Assuming T1, ..., Tn are independent, we obtain the likelihood function given the right-censored data as

    L = ∏_{i=1}^n f(ρi)^{δi} S(ρi)^{1−δi}.                       (2.16)

The Kaplan-Meier method, the nonparametric MLE of the survivor function S(t), can be used to estimate this curve from the observed survival times without any distributional assumption. In many cases, however, the survivor function S(t) is assumed to follow a certain distribution. For instance, if S(t) can be written as

    S(t | u, σ) = S0((log t − u)/σ),                             (2.17)

where u ∈ R is a location parameter and σ > 0 is a scale parameter, we say T has a log-location-scale distribution, which is widely used as a parametric family for time-to-event data. If we generalize u to u(z), we may take covariates z into account, that is,

    S(t | z, u, σ) = S0((log t − u(z))/σ).                       (2.18)

Suppose ε is a random variable with survivor function S0(ε) that is independent of z; then (2.18) can be rewritten as

    log T = u(z) + σε.                                           (2.19)

Here the survivor function is represented in a regression setting. Formula (2.19) is a general semiparametric AFT model. In practice, u(·) can be taken as a linear function, giving the following parametric AFT model:

    log T = γ0 + γ1^T z + σε,                                    (2.20)

where ε is independent of z. In this thesis, we treat the random effects ai and bi as covariates and build the following AFT frailty model for the time-to-event data:

    log Ti = γ0 + γ1^T ai + γ2^T bi + σεi,   i = 1, ..., n,      (2.21)

where γ = (γ0, γ1, γ2) are the unknown parameters and the εi are i.i.d. random variables. Note that the NLME longitudinal model (2.1) and the time-to-event model (2.21) are connected through the random effects bi. There are three major parametric AFT models, based on different distributions of ε.

• Weibull AFT model.
Assume the event times follow a Weibull distribution with scale parameter exp(−(γ0 + γ1^T ai + γ2^T bi)/σ) and shape parameter 1/σ; the survivor function is

    S(t) = exp{ −exp( (log t − γ0 − γ1^T ai − γ2^T bi)/σ ) }.    (2.22)

Because the Weibull distribution has both the proportional hazards property and the accelerated failure time property, this model is particularly attractive.

• The log-logistic AFT model. Assume the event times follow a log-logistic distribution with parameters (γ0 + γ1^T ai + γ2^T bi)/σ and 1/σ; the survivor function is

    S(t) = { 1 + exp( (log t − γ0 − γ1^T ai − γ2^T bi)/σ ) }^{−1}.   (2.23)

• The log-normal AFT model. Assume the event times follow a log-normal distribution with parameters γ0 + γ1^T ai + γ2^T bi and σ; the survivor function is

    S(t) = 1 − Φ( (log t − γ0 − γ1^T ai − γ2^T bi)/σ ).          (2.24)

Once a distribution is assumed, one may find the corresponding likelihood function by specifying f(t) and S(t) in (2.16). Suppose the error term εi in the AFT model follows the standard extreme value distribution, so that Ti follows a Weibull distribution. The p.d.f. and survivor function of εi are

    f(ε) = exp(ε − e^ε),   S(ε) = exp(−e^ε),                     (2.25)

so that

    f(t) = exp[ σ^{−1}(log t − γ0 − γ1^T ai − γ2^T bi) − exp( σ^{−1}(log t − γ0 − γ1^T ai − γ2^T bi) ) ],   (2.26)-(2.27)

and

    S(t) = exp[ −exp( σ^{−1}(log t − γ0 − γ1^T ai − γ2^T bi) ) ].   (2.28)

The log-likelihood function of the AFT model, given the random effects, is

    ℓ(γ | a, b) = Σ_{i=1}^n (δi εi − e^{εi})
               = Σ_{i=1}^n [ δi σ^{−1}(log Ti − γ0 − γ1^T ai − γ2^T bi) − exp( σ^{−1}(log Ti − γ0 − γ1^T ai − γ2^T bi) ) ].

By integrating out the random effects ai, bi, we can obtain the log-likelihood function of γ.
    ℓ(γ) = Σ_{i=1}^n ∫∫ ℓ^{(i)}(γ | ai, bi) f(ai | A) f(bi | B) dai dbi   (2.29)
         = Σ_{i=1}^n ∫∫ δi σ^{−1}(log Ti − γ0 − γ1^T ai − γ2^T bi) f(ai | A) f(bi | B) dai dbi
           − Σ_{i=1}^n ∫∫ exp( σ^{−1}(log Ti − γ0 − γ1^T ai − γ2^T bi) ) f(ai | A) f(bi | B) dai dbi
         = σ^{−1} Σ_{i=1}^n δi [ log Ti − γ0 − γ1^T E(ai) − γ2^T E(bi) ]
           − Σ_{i=1}^n exp( σ^{−1}(log Ti − γ0) ) ∫ exp( −σ^{−1} γ1^T ai ) f(ai | A) dai × ∫ exp( −σ^{−1} γ2^T bi ) f(bi | B) dbi
         = σ^{−1} Σ_{i=1}^n δi [ log Ti − γ0 − γ1^T E(ai) − γ2^T E(bi) ]
           − Σ_{i=1}^n exp( (log Ti − γ0)/σ ) exp( −γ1^T E(ai)/σ + γ1^T Σ(ai) γ1/(2σ²) − γ2^T E(bi)/σ + γ2^T Σ(bi) γ2/(2σ²) ),

where the last step uses the moment generating function of the normal distribution. Statistical inference can then be carried out based on the log-likelihood function (2.29), and the MLE theory applies.

2.4 Models for Nonignorable Missing Data

Many generally applicable statistical inference methods for incomplete data are likelihood-based; the idea is illustrated below with a missing-response case. Suppose some values of the response variable y are missing. Let ri = 1 if yi is missing and ri = 0 if yi is observed. Denote the latent variable carrying the information about the missingness by ω. The joint distribution of the response y and the missingness indicator r, conditional on covariates x, is given by Pr(y, r | x, ω). Since this distribution can be factored as

    Pr(y, r | x, ω) = Pr(y | x, θ(ω)) Pr(r | y, x, ψ(ω)),        (2.30)

where θ(ω) and ψ(ω) are subsets or transformations of ω, the likelihood function can be obtained once the full-data response model Pr(y | x, θ(ω)) and the missingness mechanism Pr(r | y, x, ψ(ω)) are specified. Statistical inference can therefore be carried out by maximum likelihood theory. Little (1992) reviewed statistical methods of estimation in regression models with missing covariates.
Six methods for dealing with missing covariates are compared: complete-case methods, available-case methods, least squares on imputed data, maximum likelihood methods, Bayesian methods and multiple imputation. He suggested that maximum likelihood, Bayesian methods, and multiple imputation perform well; maximum likelihood is preferred in large samples, while Bayesian methods or multiple imputation are preferred in small samples.

In the specification of the NLME model (2.1), all covariate values should be available, either observed or estimated, at the response measurement times tij. In practice this may not be the case, since covariates may be measured at different time points or may be missing at tij. Model (2.10) automatically accommodates missing covariates when the missing data mechanism is ignorable (e.g., MAR or MCAR), since model (2.10) is fitted to the observed covariates, which may be measured at different time points than the responses. When the missing covariates are nonignorable (i.e., the missingness may be related to the missing values), the missing data mechanism should be modeled. Little (1995) gave an overview of modeling dropout mechanisms in longitudinal studies. For example, we may assume that the missingness is related to the true but unobserved covariate value and the previous missingness status; that is, we may assume

    logit(Pr(rij = 1)) = η0 + η1 zi*(tij) + η2 ri,j−1,           (2.31)

where zi*(tij) = Ψp(tij)^T α + Φq(tij)^T ai. We can then incorporate the missing data model into the joint likelihood for inference. Alternative missing data models may also be considered. The statistical analysis in this thesis is mainly based on the likelihood. When the missing covariates are nonignorable, we only need to add a model for the missingness mechanism to the joint likelihood and then proceed with the analysis.
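A mechanism of the form (2.31) can be sketched by simulation. The values of (η0, η1, η2) and the covariate process below are hypothetical; the point is only that missingness depends on the (possibly unobserved) covariate and on the previous missingness status.

```python
import numpy as np

rng = np.random.default_rng(2)
eta0, eta1, eta2 = -2.0, 0.5, 1.0            # hypothetical missingness parameters
n, m = 2000, 10                              # subjects and time points

z_true = rng.normal(0.0, 1.0, size=(n, m))   # stand-in for the true z_i*(t_ij)
r = np.zeros((n, m), dtype=int)
for j in range(m):
    prev = r[:, j - 1] if j > 0 else np.zeros(n, dtype=int)
    logit = eta0 + eta1 * z_true[:, j] + eta2 * prev   # model (2.31)
    p = 1.0 / (1.0 + np.exp(-logit))
    r[:, j] = rng.random(n) < p                        # draw missingness indicators
print(r.mean())   # overall missingness rate
```

Because eta2 > 0, a missing value raises the odds of the next value being missing, the kind of dependence that makes the mechanism nonignorable and forces it into the joint likelihood.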
For simplicity of presentation, we focus on ignorable missing covariates in the following sections.

Chapter 3 Simultaneous Inference

Having introduced models for the statistical problems discussed in Section 1.1 in the previous chapter, we now turn our attention to how to analyze the data based on these models. As emphasized in Section 1.2.1, the longitudinal data and the survival data are associated, and the models share the same random effects. Thus, separate analyses of the longitudinal data and the survival data may lead to biased results. This essential feature calls for simultaneous inference for these statistical models. In this chapter, we first introduce a simple method that can be implemented easily with standard software. Then we discuss two likelihood-based simultaneous inference methods, the EM algorithm and the Laplace approximation, which both have their own advantages and are widely applicable. Finally, we linearize the joint models and propose a new likelihood-based approximate simultaneous inference approach. The purpose of the Laplace approximation and the linearization method is to reduce the computational burden, since the EM algorithm is computationally intensive.

A brief review below reminds the reader of the statistical models adopted in this thesis. Suppose n subjects are included in a longitudinal study. Let yij be the response value for subject i at time tij, zij be the corresponding covariate, and Ti be the event time of subject i, i = 1, ..., n. The statistical models introduced in Chapter 2 are reviewed as follows:

• Parametric NLME model for the longitudinal process:

    yij = g(tij, βi) + eij,   i = 1, ..., n,   j = 1, ...
, ni,                                                            (3.1)
    βi = h(zi, β, bi),                                           (3.2)
    bi ~ N(0, B),   ei | bi ~ N(0, µ² Ii),                       (3.3)

where g(·) and h(·) are given nonlinear functions, βi are the subject-specific parameters and β are the fixed effects (population parameters), zi contains the covariates for subject i, bi = (bi1, ..., biq)^T are the between-subject random effects, ei = (ei1, ..., ei,ni)^T are the random errors of the within-subject measurements, B is the variance-covariance matrix of the random effects, and µ characterizes the variance of the within-subject measurements.

• Semiparametric LME model for the covariate measurement errors problem:

    zi = zi* + ξi = Ψp^T α + Φq^T ai + ξi,   i = 1, ..., n,      (3.4)
    ai ~ N(0, A),   ξi | ai ~ N(0, λ² Ii),                       (3.5)

where Ψ(·) and Φ(·) are basis functions, α are the fixed effects and ai are the between-subject random effects, ξi = (ξi1, ..., ξi,ni)^T are the random errors of the within-subject measurements, A is the variance-covariance matrix of the random effects, and λ characterizes the variance of the within-subject measurements.

• Parametric AFT model for the time-to-event process:

    log Ti = γ0 + γ1^T ai + γ2^T bi + σεi,   i = 1, ..., n,      (3.6)

where εi follows a standard extreme value distribution.

3.1 Two-Step Method

Most joint models of a longitudinal process and a time-to-event process are based on the assumption that the two processes are linked through shared parameters, usually the random effects representing between-subject variation. A simple statistical inference method is to make independent inference for one process in the first step to estimate the shared parameters, and then incorporate the estimated parameters into the inference for the other process in the second step. We call this method the two-step method.

3.1.1 Naive Two-Step Method

Since both the longitudinal process and the time-to-event process depend on the covariates z, which are measured with errors, we need to solve the measurement errors problem (3.4) first.
Laird and Ware (1982) gave the LME model formulation and proposed an EM algorithm to fit the model by treating the random effects as missing data. Lindstrom and Bates (1988) exploited the linear structure of the LME model and obtained the first two marginal moments of yi. The fixed effects β can be estimated by generalized least squares, and the variance components can be estimated by normal-theory pseudo-likelihood or restricted maximum pseudo-likelihood within a general framework based on the Newton-Raphson method. Pinheiro and Bates (2002) described covariance parametrizations of A, which could be more diverse and complex than in our case. By fitting the LME model (3.4), we obtain the "true" covariates zi* for modeling the longitudinal process and estimates of the random effects ai for modeling the time-to-event process.

For the NLME model (3.1), Lindstrom and Bates (1990) alternated between a penalized nonlinear least squares (PNLS) step and a linear mixed effects (LME) step. In the PNLS step, the current estimate of the precision factor is held fixed, and the conditional estimates of the random effects bi and the fixed effects β are obtained by minimizing a penalized nonlinear least squares objective function. The LME step then updates the estimate of the precision factor based on a first-order Taylor expansion of the likelihood function around the current estimates of β and bi. Using this method to fit the longitudinal model (3.1), we obtain estimates of the fixed effects β and the random effects bi for fitting the time-to-event model. The time-to-event process is analyzed by the AFT regression model. With the random effects ai and bi estimated from the above two models, we can fit the AFT regression model with standard statistical packages. Thus, the naive two-step approach can be summarized as follows.

• Covariate model inference.
For a time-varying covariate, assume a flexible and robust model to estimate the true covariate values and the between-subject random effects.
• Nonlinear mixed effects model inference. Estimate the fixed effects, the random effects and the variance components, with the mismeasured covariate values replaced by the estimated true values.
• Time-to-event model inference. Fit the time-to-event model based on the random effects estimated in the above two steps.

3.1.2 Modified Two-Step Method

The naive two-step method is easy to understand and can be executed with standard statistical software. However, as pointed out by Ye et al. (2008) and Albert and Shih (2009), the naive two-step method may lead to two types of bias.

• The covariate trajectories may differ according to the censoring status of the event times. Thus, applying a single covariate model to all covariate data may lead to biased estimation in the first model.
• Inference for the second model that ignores the estimation uncertainty in the first model may also lead to biased results (e.g., under-estimation of the standard errors).

The first bias, called bias from informative dropouts, may depend on the strength of the association between the longitudinal process and the time-to-event process. The second bias may depend on the magnitude of the measurement errors in the covariates. Therefore, we need to modify the naive two-step method to reduce these biases. To incorporate the estimation uncertainty from the first step, we can adjust the standard errors of the parameter estimates with a bootstrap method, resampling subjects while keeping the repeated measurements of the resampled subjects. After repeating the above procedure B times, the standard errors of the fixed-parameter estimates can be estimated by the sample variance-covariance matrix across the B bootstrap datasets, which is expected to be more reliable than the naive two-step results if the assumed models are correct.
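The subject-level bootstrap described above can be sketched as follows. To stay self-contained, the "estimate" being bootstrapped is a placeholder grand mean rather than the full two-step fit; the resampling pattern (resample subjects, keep each subject's repeated measurements intact) is the part that carries over.

```python
import numpy as np

rng = np.random.default_rng(3)
n, ni = 100, 5
subject_effect = rng.normal(0.0, 1.0, size=n)                     # between-subject variation
data = subject_effect[:, None] + rng.normal(0.0, 0.5, size=(n, ni))  # repeated measures

def estimate(d):
    # placeholder for the full two-step fit; here just the grand mean
    return d.mean()

B = 500
boot = np.empty(B)
for k in range(B):
    idx = rng.integers(0, n, size=n)      # resample SUBJECTS with replacement,
    boot[k] = estimate(data[idx])         # keeping their repeated measurements together
se_boot = boot.std(ddof=1)
print(se_boot)
```

Resampling whole subjects (rather than individual measurements) is what preserves the within-subject correlation, so `se_boot` reflects the correct between-subject uncertainty.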
However, the modified two-step method cannot completely remove the biases of the naive two-step method, because it still treats the two processes separately. To make reliable inference, simultaneous inference based on a joint model is more appropriate: the first and most fundamental requirement of a sound inference method here is that it considers the two processes simultaneously. In the following sections, we consider unified approaches based on the joint likelihood function given all observed longitudinal data and time-to-event data.

3.2 EM Algorithm

The two-step methods introduced in the previous section, naive or modified, fit the models separately with a simple "plug-in" strategy. The EM algorithm can be used to estimate all parameters in the longitudinal model, the covariate model and the time-to-event model simultaneously based on the joint likelihood, avoiding the bias of the naive two-step method. The joint likelihood approach is quite general and can be extended to statistical inference for more than two related models. Such a joint likelihood method provides efficient estimation if the assumed models are correct. Specifically, let θ = (α, β, γ, µ, λ, σ, A, B) be the collection of parameters of the above three models, and let f(·) denote generic density functions and F(·) the corresponding cumulative distribution functions. The joint log-likelihood given all the observed data can be written as

    ℓo(θ | y, z, ρ, δ) = Σ_{i=1}^n ℓo^{(i)}(θ | yi, zi, ρi, δi)
      = Σ_{i=1}^n log ∫∫ f(yi | ai, bi; α, β, µ) f(bi | B) f(zi | ai; α, λ) f(ai | A)
                        × f(ρi | ai, bi; γ, σ)^{δi} (1 − F(ρi | ai, bi; γ, σ))^{1−δi} dai dbi.   (3.7)

The EM algorithm (Dempster et al., 1977) is a powerful iterative algorithm for computing MLEs in a wide variety of situations, including missing data problems and random effects models.
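Before applying EM to the joint model, its mechanics can be seen on a toy problem of our own construction (not part of the thesis's joint model): estimating the mean of a N(µ, 1) sample that is right-censored at a known cutoff. The E-step fills each censored value with its conditional expectation, via the inverse Mills ratio, and the M-step takes the complete-data mean.

```python
import math
import numpy as np

rng = np.random.default_rng(4)
mu_true = 2.0
x = rng.normal(mu_true, 1.0, 500)
c = 2.5
obs = np.minimum(x, c)                  # right-censored observations
delta = x <= c                          # 1 = fully observed, 0 = censored at c

phi = lambda z: math.exp(-z * z / 2) / math.sqrt(2 * math.pi)   # N(0,1) pdf
Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))          # N(0,1) cdf

mu = 0.0                                # starting value
for _ in range(200):
    # E-step: E[X | X > c] = mu + phi(c-mu)/(1-Phi(c-mu)) for unit variance
    mills = phi(c - mu) / (1 - Phi(c - mu))
    filled = np.where(delta, obs, mu + mills)
    # M-step: complete-data MLE of mu is the sample mean of the filled data
    mu = filled.mean()
print(round(mu, 2))
```

The iteration converges to the MLE of the censored-data likelihood, close to the true mean 2.0; the joint-model EM below has the same E/M structure, with the random effects (ai, bi) playing the role of the censored values.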
To obtain the MLE of θ in our case, we may use an EM algorithm that treats the unobservable random effects (ai, bi) as "missing data". The EM algorithm consists of an E-step, which computes the conditional expectation of the "complete-data" log-likelihood given the observed data and current parameter estimates, and an M-step, which maximizes this conditional expectation to update the parameter estimates.

3.2.1 E-Step

The "complete data" are {yi, zi, ρi, δi, ai, bi}, i = 1, 2, ..., n, and the parameter estimates at the t-th iteration are denoted by θ^(t). The E-step at the (t+1)-th iteration can be written as

    Q(θ | θ^(t)) = Σ_{i=1}^n ∫∫ [ log f(yi | ai, bi; α, β, µ) + log f(bi | B) + log f(zi | ai; α, λ) + log f(ai | A)
                   + δi log f(ρi | ai, bi; γ, σ) + (1 − δi) log(1 − F(ρi | ai, bi; γ, σ)) ]
                   × f(ai, bi | yi, zi, ρi, δi; θ^(t)) dai dbi.   (3.8)

The computation of the conditional expectation Q(θ | θ^(t)) in the E-step is usually non-trivial because of the high-dimensional integrals. However, since the integral is an expectation with respect to f(ai, bi | yi, zi, ρi, δi; θ^(t)), it can be evaluated by the Monte Carlo EM of Wei and Tanner (1990) and Ibrahim et al. (2001a): generate mt samples from f(ai, bi | yi, zi, ρi, δi; θ^(t)) and approximate the expectation Q(θ | θ^(t)) by its empirical mean, with the random effects ai, bi replaced by the simulated values ai^(j), bi^(j), j = 1, ..., mt:

    Q(θ | θ^(t)) ≈ (1/mt) Σ_{j=1}^{mt} Σ_{i=1}^n [ log f(yi | ai^(j), bi^(j); α, β, µ) + log f(bi^(j) | B)
                   + log f(zi | ai^(j); α, λ) + log f(ai^(j) | A) + δi log f(ρi | ai^(j), bi^(j); γ, σ)
                   + (1 − δi) log(1 − F(ρi | ai^(j), bi^(j); γ, σ)) ]
                 = Q^(1)(α, β, µ) + Q^(2)(B) + Q^(3)(α, λ) + Q^(4)(A) + Q^(5)(γ, σ).   (3.9)

We may choose m0 to be a large number and set mt = m_{t−1} + m_{t−1}/k (k ≥ 1) at the t-th iteration. Increasing mt at each iteration may speed up the algorithm's convergence (Booth and Hobert, 1999).
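The Monte Carlo E-step requires draws from the joint conditional distribution of the random effects. As a toy illustration of sampling a joint distribution through its full conditionals, the Gibbs sampler below targets a bivariate normal with correlation ρ, a stand-in for the posterior of (ai, bi); the target and ρ = 0.8 are illustrative choices, not the full conditionals derived later.

```python
import numpy as np

rng = np.random.default_rng(5)
rho = 0.8                       # target: standard bivariate normal, corr rho
n_iter, burn = 20_000, 1_000
a = b = 0.0
draws = np.empty((n_iter, 2))
for t in range(n_iter):
    # full conditionals of a bivariate normal are univariate normals:
    # a | b ~ N(rho*b, 1-rho^2)  and  b | a ~ N(rho*a, 1-rho^2)
    a = rng.normal(rho * b, np.sqrt(1 - rho ** 2))
    b = rng.normal(rho * a, np.sqrt(1 - rho ** 2))
    draws[t] = a, b
draws = draws[burn:]            # discard the burn-in period
corr = np.corrcoef(draws.T)[0, 1]
print(round(corr, 2))
```

After burn-in, the retained draws have the target correlation, which is exactly the property the joint-model sampler relies on when alternating between the full conditionals of ai and bi.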
To generate independent samples from f(ai, bi | yi, zi, ρi, δi; θ^(t)), we may use the Gibbs sampler (Gelfand and Smith, 1990) by sampling iteratively from the following full conditionals:

    f(ai | yi, zi, ρi, δi, bi; θ^(t)) ∝ f(ai; θ^(t)) f(yi | ai, bi; θ^(t)) f(zi | ai; θ^(t)) f*(ρi, δi | ai, bi; θ^(t)),
    f(bi | yi, zi, ρi, δi, ai; θ^(t)) ∝ f(bi; θ^(t)) f(yi | ai, bi; θ^(t)) f*(ρi, δi | ai, bi; θ^(t)),

where f*(ρi, δi | ai, bi; θ^(t)) = f(ρi | ai, bi; θ^(t))^{δi} (1 − F(ρi | ai, bi; θ^(t)))^{1−δi}. After a burn-in period, we obtain a sample from the conditional distribution f(ai, bi | yi, zi, ρi, δi; θ^(t)).

3.2.2 M-Step

The parameter θ at the (t+1)-th iteration is estimated by maximizing Q(θ | θ^(t)). Generally, Q(θ | θ^(t)) is a nonlinear function and there is no closed-form expression for the estimate θ̂^(t+1). The maximizers can be obtained via standard numerical optimization procedures for complete data, such as the Newton-Raphson method or more advanced algorithms (Fletcher, 1987; Nocedal and Wright, 2006). The EM algorithm iterates between the two steps until convergence, and the final estimate of θ is regarded as a local maximizer. The asymptotic variance-covariance matrix of θ̂ can be obtained using well-known complete-data formulas (Louis, 1982), where the expectations in those formulas are approximated by Monte Carlo means. Specifically, note that the observed information matrix equals the complete information minus the missing information:

    Iobs(θ̂) = Icom(θ̂) − Imis(θ̂).

Define

    Q̇(θ | θ̂) = Σ_{i=1}^n Q̇i(θ | θ̂) = (1/mt) Σ_{j=1}^{mt} Σ_{i=1}^n Sij(θ),
    Q̈(θ | θ̂) = Σ_{i=1}^n ∂²Qi(θ | θ̂)/∂θ∂θ^T = (1/mt) Σ_{j=1}^{mt} Σ_{i=1}^n ∂Sij(θ)/∂θ.

Since the parameters are all distinct, the above matrices are block diagonal.
Then the approximate observed information matrix is

    Iobs(θ̂) = −Q̈(θ̂ | θ̂) − (1/mt) Σ_{j=1}^{mt} Σ_{i=1}^n Sij(θ̂) Sij^T(θ̂) + Σ_{i=1}^n Q̇i(θ̂ | θ̂) Q̇i^T(θ̂ | θ̂).   (3.10)

The asymptotic variance-covariance matrix of θ̂ can then be approximated by Cov(θ̂) = Iobs^{−1}(θ̂).

3.3 Laplace Approximation

The convergence of the foregoing Monte Carlo EM algorithm depends on the dimension of the random effects (ai, bi). If the dimension of (ai, bi) is not small, say larger than 2, the computation of the Monte Carlo EM method can be extremely intensive, since it involves simulating large samples at each iteration. Moreover, the algorithm may fail to converge in some cases. Therefore, it is valuable to approximate the observed-data joint log-likelihood ℓo(θ | y, z, ρ, δ) by a first-order Laplace approximation, which avoids integration entirely and speeds up the computation significantly.

The "complete-data" log-likelihood can be written as

    ℓc(θ | a, b) = Σ_{i=1}^n ℓc^{(i)}(θ | yi, zi, ρi, δi)
      = Σ_{i=1}^n log[ f(yi | ai, bi; α, β, µ) f(bi | B) f(zi | ai; α, λ) f(ai | A)
                      × f(ρi | ai, bi; γ, σ)^{δi} (1 − F(ρi | ai, bi; γ, σ))^{1−δi} ].   (3.11)

Vonesh et al. (2002) and Lee et al. (2006) showed that the observed-data log-likelihood ℓo(θ | y, z, ρ, δ) can be approximated by its first-order Laplace approximation

    ℓ̃o(θ | ã, b̃) = ℓc(θ | ã, b̃) − (1/2) log | −(1/2π) ∂²ℓc(θ | a, b)/∂(a, b)² |_{(a,b)=(ã,b̃)},   (3.12)

where (ã, b̃) = {(ãi, b̃i), i = 1, 2, ..., n} solve the equations

    ∂ℓc^{(i)}(θ | ai, bi)/∂(ai, bi) = 0,   i = 1, 2, ..., n.     (3.13)

Thus, an approximate estimate of θ can be obtained by solving the equation

    ∂ℓ̃o(θ | ãi, b̃i)/∂θ = 0.                                     (3.14)

Given starting values θ^(0) and (a^(0), b^(0)), we iterate the above procedures until convergence and obtain the approximate estimate θ̂ of the MLE. To be specific, we carry out the following two steps at the t-th iteration.
To be specific, we carry out the following two steps at the $t$th iteration:
• Estimate the random effects $(\tilde{a}^{(t)}, \tilde{b}^{(t)})$ by solving equations (3.13);
• Update the fixed effects $\theta^{(t+1)}$ by solving equation (3.14).

The Laplace approximation method avoids integration entirely and is thus computationally attractive. The estimated random effects obtained this way can be interpreted as empirical Bayes estimates. The standard errors of the approximate estimates can be calculated from an approximate formula for their variance-covariance matrix, obtained in a similar way by eliminating the mean parameters and the random effects with a similar Laplace approximation. Specifically,
\[
Cov(\hat{\theta}) = \left[ -\frac{\partial^2 \tilde{\ell}_o(\theta \mid \tilde{a}^*, \tilde{b}^*)}{\partial \theta\, \partial \theta^{T}} \right]^{-1}_{\theta = \hat{\theta}}, \tag{3.15}
\]
where $\hat{\theta}$ are the estimated fixed effects and $\tilde{a}^*, \tilde{b}^*$ are the estimated random effects at the final iteration.

Vonesh et al. (2002) showed that the approximate estimate $\hat{\theta}$ is consistent and asymptotically normally distributed when both the sample size and the number of measurements within each individual go to infinity:
\[
(\hat{\theta} - \theta) = O_p\Bigl( \max\bigl\{ n^{-1/2},\ (\min_i n_i)^{-1} \bigr\} \Bigr), \tag{3.16}
\]
and
\[
\sqrt{n}\,(\hat{\theta} - \theta) \to N\bigl(0,\ \bar{I}(\theta)^{-1}\bigr), \tag{3.17}
\]
where $\bar{I}(\theta) = \lim_{n \to \infty} \sum_i I_i(\theta)/n$ and $I_i(\theta)$ is the information matrix of subject $i$.

3.4 Linearization Method

Inspired by a popular approximate method for solving NLME models, we propose a linearization method for our joint models. Following an idea similar to that of Lindstrom and Bates (1990), we first linearize the NLME model (3.1). Given the current estimates $\tilde{\alpha}, \tilde{\beta}, \tilde{a}_i, \tilde{b}_i$ of the parameters and random effects, the first-order Taylor expansion of $g_{ij} = g(t_{ij}, \beta_i)$ yields the LME model
\[
\tilde{y}_i = W_i \alpha + X_i \beta + U_i a_i + V_i b_i + e_i, \tag{3.18}
\]
where
\[
W_i = (w_{i1}, \cdots, w_{in_i})^{T},\ w_{ij} = \partial g_{ij}/\partial \alpha; \qquad
X_i = (x_{i1}, \cdots, x_{in_i})^{T},\ x_{ij} = \partial g_{ij}/\partial \beta;
\]
\[
U_i = (u_{i1}, \cdots, u_{in_i})^{T},\ u_{ij} = \partial g_{ij}/\partial a_i; \qquad
V_i = (v_{i1}, \cdots, v_{in_i})^{T},\ v_{ij} = \partial g_{ij}/\partial b_i;
\]
\[
\tilde{y}_i = y_i - g_i(\tilde{\alpha}, \tilde{\beta}, \tilde{a}_i, \tilde{b}_i) + W_i \tilde{\alpha} + X_i \tilde{\beta} + U_i \tilde{a}_i + V_i \tilde{b}_i,
\]
with all the partial derivatives evaluated at $(\tilde{\alpha}, \tilde{\beta}, \tilde{a}_i, \tilde{b}_i)$.
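The design matrices in (3.18) are Jacobians of the mean function at the current estimates. A minimal numeric sketch (the mean function `g` and its two argument blocks are hypothetical stand-ins for $g_{ij}$'s fixed and random effect arguments, and forward differences replace analytic derivatives):

```python
def linearize(g, beta, b, eps=1e-6):
    """Numeric version of the Taylor expansion (3.18).

    Returns g evaluated at the current estimates together with
    forward-difference Jacobians X = dg/dbeta and V = dg/db.
    g takes two parameter lists and returns one value per time point.
    """
    args = [list(beta), list(b)]
    g0 = g(*args)

    def jac(idx):
        cols = []
        for k in range(len(args[idx])):     # perturb one coordinate at a time
            pert = [list(a) for a in args]
            pert[idx][k] += eps
            gk = g(*pert)
            cols.append([(gk[j] - g0[j]) / eps for j in range(len(g0))])
        return [list(r) for r in zip(*cols)]  # rows: observations, cols: parameters

    return g0, jac(0), jac(1)
```

For example, for a linear mean function $g = \beta_1 + b_1 t$ the recovered columns are simply $1$ and $t$; for the bi-exponential viral load model of Chapter 4 the same routine yields the nonconstant Jacobians needed at each iteration.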
Now we have two LMEs and a time-to-event model, and we adopt the strategy of the EM algorithm to estimate the parameters. To calculate (3.8), we first need to estimate the random effects $a_i, b_i$. The general EM approach samples $a_i, b_i$ from the conditional distribution $f(a_i, b_i \mid y_i, z_i, \rho_i, \delta_i; \theta^{(t)})$ by iteratively updating their full conditionals:
\[
f(a_i \mid y_i, z_i, \rho_i, \delta_i, b_i; \theta^{(t)}) \propto f(a_i; \theta^{(t)})\, f(y_i \mid a_i, b_i; \theta^{(t)})\, f(z_i \mid a_i; \theta^{(t)})\, f^*(\rho_i, \delta_i \mid a_i, b_i; \theta^{(t)}),
\]
\[
f(b_i \mid y_i, z_i, \rho_i, \delta_i, a_i; \theta^{(t)}) \propto f(b_i; \theta^{(t)})\, f(y_i \mid a_i, b_i; \theta^{(t)})\, f^*(\rho_i, \delta_i \mid a_i, b_i; \theta^{(t)}),
\]
where $f^*(\rho_i, \delta_i \mid a_i, b_i; \theta^{(t)}) = f(\rho_i \mid a_i, b_i; \theta^{(t)})^{\delta_i} (1 - F(\rho_i \mid a_i, b_i; \theta^{(t)}))^{1-\delta_i}$. For the linearized model we have
\begin{align*}
f(a_i; \theta^{(t)}) &\propto |\Sigma_{a_i t}|^{-\frac{1}{2}} \exp\Bigl[ -\tfrac{1}{2} (a_i - \nu_{a_i t})^{T} \Sigma_{a_i t}^{-1} (a_i - \nu_{a_i t}) \Bigr] \\
&\propto |\Sigma_{a_i t}|^{-\frac{1}{2}} \exp\Bigl[ -\tfrac{1}{2} a_i^{T} \Sigma_{a_i t}^{-1} a_i + \nu_{a_i t}^{T} \Sigma_{a_i t}^{-1} a_i - \tfrac{1}{2} \nu_{a_i t}^{T} \Sigma_{a_i t}^{-1} \nu_{a_i t} \Bigr], \\
f(b_i; \theta^{(t)}) &\propto |\Sigma_{b_i t}|^{-\frac{1}{2}} \exp\Bigl[ -\tfrac{1}{2} (b_i - \nu_{b_i t})^{T} \Sigma_{b_i t}^{-1} (b_i - \nu_{b_i t}) \Bigr] \\
&\propto |\Sigma_{b_i t}|^{-\frac{1}{2}} \exp\Bigl[ -\tfrac{1}{2} b_i^{T} \Sigma_{b_i t}^{-1} b_i + \nu_{b_i t}^{T} \Sigma_{b_i t}^{-1} b_i - \tfrac{1}{2} \nu_{b_i t}^{T} \Sigma_{b_i t}^{-1} \nu_{b_i t} \Bigr], \\
f(z_i \mid a_i; \theta^{(t)}) &\propto \lambda_{it}^{-n_i} \exp\Bigl[ -\tfrac{1}{2\lambda_{it}^2} (z_i - \psi_i \alpha_t - \phi_i a_i)^{T} (z_i - \psi_i \alpha_t - \phi_i a_i) \Bigr] \\
&\propto \lambda_{it}^{-n_i} \exp\Bigl[ -\tfrac{1}{2\lambda_{it}^2} a_i^{T} \phi_i^{T} \phi_i a_i + \tfrac{1}{\lambda_{it}^2} (z_i - \psi_i \alpha_t)^{T} \phi_i a_i - \tfrac{1}{2\lambda_{it}^2} (z_i - \psi_i \alpha_t)^{T} (z_i - \psi_i \alpha_t) \Bigr], \\
f(y_i \mid a_i, b_i; \theta^{(t)}) &\propto \mu_{it}^{-n_i} \exp\Bigl[ -\tfrac{1}{2\mu_{it}^2} (\tilde{y}_{it} - W_{it}\alpha_t - X_{it}\beta_t - U_{it}a_i - V_{it}b_i)^{T} (\tilde{y}_{it} - W_{it}\alpha_t - X_{it}\beta_t - U_{it}a_i - V_{it}b_i) \Bigr] \\
&\propto \mu_{it}^{-n_i} \exp\Bigl[ -\tfrac{1}{2\mu_{it}^2} a_i^{T} U_{it}^{T} U_{it} a_i - \tfrac{1}{2\mu_{it}^2} b_i^{T} V_{it}^{T} V_{it} b_i + \tfrac{1}{\mu_{it}^2} (\tilde{y}_{it} - W_{it}\alpha_t - X_{it}\beta_t - V_{it}b_i)^{T} U_{it} a_i \\
&\qquad + \tfrac{1}{\mu_{it}^2} (\tilde{y}_{it} - W_{it}\alpha_t - X_{it}\beta_t - U_{it}a_i)^{T} V_{it} b_i - \tfrac{1}{2\mu_{it}^2} (\tilde{y}_{it} - W_{it}\alpha_t - X_{it}\beta_t)^{T} (\tilde{y}_{it} - W_{it}\alpha_t - X_{it}\beta_t) \Bigr].
\end{align*}
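Each full conditional above is a product of Gaussian kernels in the random effect, and such products combine by adding precisions and precision-weighted means — the mechanism behind the $\Sigma^{-1}$ and $\nu$ updates in (3.19)-(3.20). A scalar sketch of this combination rule (illustrative only; the joint model uses the matrix version):

```python
def combine_normals(terms):
    """Combine independent Gaussian contributions for a scalar random effect.

    Each term (m, v) contributes precision 1/v and precision-weighted mean m/v;
    the matrix analogue adds one precision term per data source, as in the
    Sigma^{-1} updates of the linearization method.
    """
    prec = sum(1.0 / v for _, v in terms)            # total precision
    mean = sum(m / v for m, v in terms) / prec       # precision-weighted mean
    return mean, 1.0 / prec                          # posterior mean, variance
```

For instance, combining a prior $N(0, 1)$ with a likelihood contribution centered at $2$ with unit variance gives mean $1$ and variance $1/2$.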
The likelihood function given the event time data is linearized as
\[
f^*(\rho_i, \delta_i \mid a_i, b_i; \theta^{(t)}) \approx \exp\Bigl[ -\tfrac{1}{2\sigma_t^2} a_i^{T} (\gamma_{1t}\gamma_{1t}^{T}) a_i + \tfrac{1}{\sigma_t^2} (\log T_i - \gamma_{0t} + \delta_i)\, \gamma_{1t}^{T} a_i
- \tfrac{1}{2\sigma_t^2} b_i^{T} (\gamma_{2t}\gamma_{2t}^{T}) b_i + \tfrac{1}{\sigma_t^2} (\log T_i - \gamma_{0t} + \delta_i)\, \gamma_{2t}^{T} b_i
- \tfrac{1}{2\sigma_t^2} (\log T_i - \gamma_{0t} + \delta_i)^2 \Bigr].
\]
Then we may obtain an approximate explicit form of the distribution of the random effects, which is multivariate normal. The full conditionals of the random effects can therefore be approximated by the following multivariate normal distributions:
\begin{align}
f(a_i \mid y_i, z_i, \rho_i, \delta_i, b_i; \theta^{(t)}) &\propto f(a_i; \theta^{(t)})\, f(y_i \mid a_i, b_i; \theta^{(t)})\, f(z_i \mid a_i; \theta^{(t)})\, f^*(\rho_i, \delta_i \mid a_i, b_i; \theta^{(t)}) \nonumber \\
&\propto \exp\Bigl[ -\tfrac{1}{2} (a_i - \nu_{a_i(t+1)})^{T} \Sigma_{a_i(t+1)}^{-1} (a_i - \nu_{a_i(t+1)}) \Bigr] \sim \mathrm{MVN}(\nu_{a_i(t+1)}, \Sigma_{a_i(t+1)}), \tag{3.19}
\end{align}
\begin{align}
f(b_i \mid y_i, z_i, \rho_i, \delta_i, a_i; \theta^{(t)}) &\propto f(b_i; \theta^{(t)})\, f(y_i \mid a_i, b_i; \theta^{(t)})\, f^*(\rho_i, \delta_i \mid a_i, b_i; \theta^{(t)}) \nonumber \\
&\propto \exp\Bigl[ -\tfrac{1}{2} (b_i - \nu_{b_i(t+1)})^{T} \Sigma_{b_i(t+1)}^{-1} (b_i - \nu_{b_i(t+1)}) \Bigr] \sim \mathrm{MVN}(\nu_{b_i(t+1)}, \Sigma_{b_i(t+1)}), \tag{3.20}
\end{align}
where
\begin{align*}
\Sigma_{a_i(t+1)}^{-1} &= \Sigma_{a_i t}^{-1} + \frac{\phi_i^{T}\phi_i}{\lambda_{it}^2} + \frac{U_{it}^{T}U_{it}}{\mu_{it}^2} + \frac{\gamma_{1t}\gamma_{1t}^{T}}{\sigma_t^2}, \\
\nu_{a_i(t+1)}^{T} \Sigma_{a_i(t+1)}^{-1} &= \nu_{a_i t}^{T} \Sigma_{a_i t}^{-1} + \frac{1}{\lambda_{it}^2} (z_i - \psi_i \alpha_t)^{T} \phi_i + \frac{1}{\sigma_t^2} (\log T_i - \gamma_{0t} + \delta_i)\, \gamma_{1t}^{T} + \frac{1}{\mu_{it}^2} (\tilde{y}_{it} - W_{it}\alpha_t - X_{it}\beta_t - V_{it}\nu_{b_i t})^{T} U_{it}, \\
\Sigma_{b_i(t+1)}^{-1} &= \Sigma_{b_i t}^{-1} + \frac{V_{it}^{T}V_{it}}{\mu_{it}^2} + \frac{\gamma_{2t}\gamma_{2t}^{T}}{\sigma_t^2}, \\
\nu_{b_i(t+1)}^{T} \Sigma_{b_i(t+1)}^{-1} &= \nu_{b_i t}^{T} \Sigma_{b_i t}^{-1} + \frac{1}{\sigma_t^2} (\log T_i - \gamma_{0t} + \delta_i)\, \gamma_{2t}^{T} + \frac{1}{\mu_{it}^2} (\tilde{y}_{it} - W_{it}\alpha_t - X_{it}\beta_t - U_{it}\nu_{a_i t})^{T} V_{it}.
\end{align*}
Thus, we obtain an estimate of the distribution of the random effects $a_i$ and $b_i$ at the $(t+1)$th iteration.

3.4.1 Framework I

With the estimated distributions of $a_i$ and $b_i$, we may return to Section 3.2.1 and work on the log-likelihood function (3.8). If the random effects $a_i$ and $b_i$ can be integrated out, we may simply adopt the EM framework to complete the linearization method.
To be specific, at the $t$th iteration, we need to:
• Linearize the joint model at the current estimates $\tilde{a}_i^{(t)}, \tilde{b}_i^{(t)}$ and $\hat{\theta}^{(t)}$;
• Estimate the random effects $(\tilde{a}_i^{(t+1)}, \tilde{b}_i^{(t+1)})$ as in (3.19) and (3.20);
• Obtain the expectation of the joint log-likelihood function;
• Update $\theta^{(t+1)}$ by maximizing the expectation of the observed-data joint log-likelihood.

We repeat the above steps until the algorithm converges. The standard errors of the MLE of $\theta$ can be calculated based on the asymptotic observed information matrix (3.10).

3.4.2 Framework II

If we cannot obtain an explicit form of the integral (3.8), we may simply take the means of the estimated distributions of the random effects $a_i$ and $b_i$, namely $\nu_{a_i(t+1)}$ and $\nu_{b_i(t+1)}$, as their empirical Bayes estimates. The fixed effects can then be updated in the same way as in the Laplace approximation method. That is, we carry out the following steps at the $t$th iteration:
• Linearize the joint model at the current iterates $\tilde{a}_i^{(t)}, \tilde{b}_i^{(t)}$ and $\hat{\theta}^{(t)}$;
• Estimate the random effects $(\tilde{a}_i^{(t+1)}, \tilde{b}_i^{(t+1)})$ by (3.19) and (3.20);
• Update $\theta^{(t+1)}$ by solving equations (3.14), with $\nu_{a_i(t+1)}$ and $\nu_{b_i(t+1)}$ as the estimates of $\tilde{a}_i^{(t)}$ and $\tilde{b}_i^{(t)}$.

Similar to the Laplace approximation method, the standard errors of the approximate MLE of $\theta$ can be approximated by (3.15).

Chapter 4 Data Analysis

4.1 Data Description

We consider an AIDS clinical trial that included 46 HIV-infected patients treated with an anti-HIV treatment. The viral loads were measured on days 0, 2, 7, 10, 14, 21, 28 and weeks 8, 12, 24, and 48 after initiation of the treatment. The CD4 and CD8 cell counts and other variables were also recorded throughout the study. The lower detection limit of the viral load assay is 100 copies/ml, and 40 of the 361 viral load measurements (11%) fall below this limit. For simplicity, we imputed viral loads below the assay's lower detection limit of 100 copies/ml with half of the limit.
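The half-detection-limit imputation just described is straightforward; a small sketch (the limit value is the 100 copies/ml assay limit stated above):

```python
def impute_below_limit(viral_loads, limit=100.0):
    """Replace viral load measurements below the assay's lower detection
    limit with half the limit, as done for the 40 censored measurements
    in this dataset."""
    return [v if v >= limit else limit / 2.0 for v in viral_loads]
```

More refined treatments of the left-censored measurements (e.g., likelihood-based censoring models) are possible, but this simple substitution is what is used here.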
The number of viral load measurements per individual varies from 4 to 10. Figure 4.1 shows the trajectories of viral load, CD4, and CD4/CD8 for six randomly selected patients. One of the primary objectives of this study is to assess the antiviral activity of the treatment in the first 12 weeks. The association between the viral load and the time to the first decline of the CD4/CD8 ratio is also of interest.

4.2 Model Specification

Based on previous research (Wu and Ding, 1999; Wu, 2002; Wu et al., 2010), we consider the following HIV viral load dynamic model:
\[
y_{ij} = \log_{10}\bigl(P_{1i} e^{-\lambda_{1i} t_{ij}} + P_{2i} e^{-\lambda_{2ij} t_{ij}}\bigr) + e_{ij}, \tag{4.1}
\]
\[
\log(P_{1i}) = \beta_1 + b_{i1}, \quad \lambda_{1i} = \beta_2 + b_{i2}, \quad \log(P_{2i}) = \beta_3 + b_{i3}, \quad \lambda_{2ij} = \beta_4 + \beta_5 z_{ij}^* + b_{i4},
\]
where $y_{ij}$ is the $\log_{10}$-transformed viral load of patient $i$ at time $t_{ij}$, $P_{1i}$ and $P_{2i}$ are subject-specific viral load baselines, $\lambda_{1i}$ and $\lambda_{2ij}$ are subject-specific first-phase and second-phase viral decay rates, and $z_{ij}^*$ is the "true" (but unobserved) CD4 count. It is known that CD4 counts are usually measured with substantial errors, so we assume that $\lambda_{2ij}$ is related to the true CD4 value $z_{ij}^*$ rather than the observed value $z_{ij}$. To avoid very small (or very large) estimates, which may be unstable, we standardize the CD4 counts and rescale the original time $t$ (in days) so that the new time scale lies between 0 and 1.

[Figure 4.1: Longitudinal Data Trajectories — viral load, CD4, and CD4/CD8 plotted against time.]

As Figure 4.1 shows, the CD4 trajectories are complicated and the between-patient variation in them is large, so we consider a nonparametric linear mixed effects model to empirically describe the CD4 trajectories. We use linear combinations of natural cubic splines with percentile-based knots to approximate $r(t)$ and $h_i(t)$.
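Returning briefly to the viral load model (4.1): its bi-exponential mean function on the $\log_{10}$ scale can be evaluated directly. A sketch with illustrative (not estimated) parameter values:

```python
import math

def viral_load_mean(t, p1, lam1, p2, lam2):
    """Bi-exponential viral load mean on the log10 scale, as in model (4.1):
    y = log10(P1 * exp(-lambda1 * t) + P2 * exp(-lambda2 * t))."""
    return math.log10(p1 * math.exp(-lam1 * t) + p2 * math.exp(-lam2 * t))

# Illustrative values only: a fast first-phase and a slow second-phase decay.
baseline = viral_load_mean(0.0, 1e5, 30.0, 1e3, 2.0)  # log10(P1 + P2) at t = 0
```

The first term dominates early and decays quickly, while the second term governs the slower long-term decline, which is why the second-phase rate $\lambda_{2ij}$ is the natural place to let the CD4 covariate enter.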
We set $\psi_0(t) = \phi_0(t) = 1$ and take the same natural cubic splines in the approximations, with $q \le p$, in order to reduce the dimension of the random effects. The AIC and BIC criteria select $p = q = 2$, which leads to the following model for the CD4 process:
\[
z_{ij} = (\alpha_1 + a_{i1})\phi_0(t_{ij}) + (\alpha_2 + a_{i2})\phi_1(t_{ij}) + (\alpha_3 + a_{i3})\phi_2(t_{ij}) + \xi_{ij}, \tag{4.2}
\]
where $z_{ij}$ is the observed CD4 value at time $t_{ij}$, $\phi_1(\cdot)$ and $\phi_2(\cdot)$ are the basis functions, $\alpha = (\alpha_1, \alpha_2, \alpha_3)^T$ are the fixed parameters, and $a_i = (a_{i1}, a_{i2}, a_{i3})^T$ are the random effects.

Let $T_i$ be the time to the first decline of the CD4/CD8 ratio of subject $i$.

[Figure 4.2: Estimate of Survivor Function — time of first decline of CD4:CD8.]

In AIDS studies, it is of great interest to see whether the time of the first decline of CD4/CD8 is related to subject-specific characteristics of the viral load and CD4 trajectories, represented by the random effects in the viral load model and the CD4 model. Therefore, we consider the following time-to-event model for $T_i$:
\[
\log T_i = \gamma_0 + \gamma_1 a_{i1} + \gamma_2 a_{i2} + \gamma_3 b_{i2} + \gamma_4 b_{i4} + \sigma \epsilon_i, \tag{4.3}
\]
where $\epsilon_i$ follows the standard extreme value distribution. The random effects $b_{i2}$ and $b_{i4}$, which represent between-subject variation in the viral decay rates, might be predictive of the event times. The random effects $a_{i1}$ and $a_{i2}$, which carry the main features of between-subject variation in the CD4 trajectories, are also included in the parametric AFT model (4.3).

4.3 Analysis and Results

We estimate the model parameters using the two-step method (TS), the modified two-step method (MTS), the Laplace approximation method (LAP), and the linearization method (LIN). TS and MTS are based on standard R algorithms in library(nlme) and library(survival). The estimation results of TS and MTS are listed along with those of LAP and LIN in Table 4.1. Figures 4.3 and 4.4 check the normality assumptions on the residuals and random effects of the NLME model.
Figures 4.5 and 4.6 check the normality assumptions on the residuals and random effects of the measurement error model. No assumption appears to be severely violated.

[Figure 4.3: Residual Plots I of the NLME Model. Figure 4.4: Residual Plots II of the NLME Model. Figure 4.5: Residual Plots I of the Covariate Measurement Errors Model. Figure 4.6: Residual Plots II of the Covariate Measurement Errors Model.]

LAP and LIN are iterative algorithms; convergence is declared when the maximum percentage change of all estimates between two consecutive iterations is less than 5%. The parameter estimates from the TS method are taken as the initial values for the two iterative algorithms LAP and LIN. Based on the analysis results listed in Table 4.1, we have the following observations.

• The four inference methods provide similar parameter estimates for the NLME model of the viral load data. LAP and LIN agree that all parameters in β are significant for the viral load, while TS and MTS suggest that β5 might be insignificant.

• The estimation results for the time-to-event process differ across the four methods. Generally speaking, TS and MTS suggest that no random effect is significant for the event time, while the two simultaneous
approaches indicate that $b_{i4}$ might be significant. This finding distinguishes the simultaneous approaches from the TS-based methods.

• MTS provides parameter estimates very close to those of TS for both the NLME model and the measurement errors model, but with larger standard errors. By fitting the three models simultaneously, however, LAP and LIN provide parameter estimates with smaller standard errors.

Table 4.1: Estimation Results

                TS             MTS            LAP            LIN
Parameter   Est     S.E.   Est     S.E.   Est     S.E.   Est     S.E.
β1          11.75   0.19   11.76   0.19   11.78   0.09   11.80   0.06
β2          29.84   1.47   30.22   1.69   29.25   1.31   30.03   0.87
β3           7.48   0.29    7.43   0.32    7.22   0.15    7.40   0.10
β4           1.34   0.51    1.31   0.64    1.43   0.23    1.18   0.15
β5           0.63   0.46    0.52   0.70    0.42   0.16    0.55   0.10
µ            0.29   NA      0.29   NA      0.29   NA      0.22   NA
γ0          -0.81   0.12   -0.82   0.01   -0.97   0.11   -0.71   0.20
γ1           0.19   0.24    0.16   0.52    0.55   0.37    0.22   0.19
γ2          -0.03   0.21   -0.01   0.52   -1.03   0.52    0.01   0.13
γ3          -0.03   0.05   -0.02   0.11   -0.05   0.02   -0.03   0.04
γ4           0.08   0.05    0.04   0.10    0.12   0.03    0.05   0.02
σ            0.75   NA      0.76   NA      0.31   NA      0.75   NA

Based on these data analysis results, we may conclude the following.

• For the analysis of the longitudinal process based on the NLME model, the results from the two simultaneous inference methods agree with previous research (Ding and Wu, 2001; Wu et al., 2008, 2010) in terms of the significance conclusions for the parameters. However, some estimated values in our analysis differ from previous research, since we only considered the observations from the first three months of the study.

• The two simultaneous inference methods suggest that CD4 is significantly associated with the viral load, while TS and MTS miss this reasonable and important feature. According to the conclusions from LAP and LIN, the higher the CD4 count, the lower the viral load the subject might have.

• The two simultaneous inference methods also indicate that the viral decay rate is significantly associated with the time of the first decline of the CD4/CD8 ratio.
The lower the viral load, the later the CD4/CD8 ratio may first decline. The TS-based methods again fail to detect this important feature.

The discrepancies between the data analysis results obtained from different methods are important in terms of both biological interpretation and statistical methodology development. To study whether the simultaneous inference methods outperform the TS-based methods when the longitudinal process and the time-to-event process are truly connected, we conduct a simulation study to compare these methods.

Chapter 5 Simulation

A simulation study is conducted to evaluate the performance of three methods: the two-step method (TS), the Laplace approximation method (LAP), and the linearization method (LIN). The modified two-step method (MTS) is omitted since it performs quite similarly to TS, except that it generally yields larger variances of the estimates.

5.1 Simulation Design

We simulate the longitudinal data and covariate data based on models (4.1) and (4.2). The event times are simulated based on (4.3), with 20% of the subjects randomly censored. With the parameters β, α and γ fixed, we compare these inference methods based on their estimation performance in four scenarios with different settings of:
• number of subjects,
• number of repeated measurements,
• between-subject variance,
• within-subject variance.

The simulations were implemented on a Dell R900 server with 16 CPUs and 16-32 GB RAM. The performance measures include the standard errors of the parameter estimates and the corresponding bias and MSE, along with the average running time. Since we are more interested in the estimation of the NLME model and the time-to-event model, the estimation results for the covariate model are omitted in the following section.

5.2 Simulation Results

5.2.1 Number of Subjects

To examine how these estimation methods perform with different numbers of subjects, we simulate datasets with n = 60 and n = 120.
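The event times in the design above follow the AFT model (4.3); with a standard extreme value error, $T$ is Weibull and can be simulated by inverting the survivor function. A sketch (the censoring mechanism here — independent random flags at the target 20% rate — is a simplification of the actual design, and the random-effect terms of (4.3) are folded into the intercept):

```python
import math
import random

def simulate_event_times(n, gamma0, sigma, cens_rate=0.2, seed=7):
    """Simulate event times from log T = gamma0 + sigma * eps, with eps
    standard extreme value (so T is Weibull), plus random censoring flags.
    Returns a list of (time, delta) pairs; delta = 0 marks a censored subject.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u = rng.random()                              # u in [0, 1)
        eps = math.log(-math.log(1.0 - u))            # inverse extreme-value CDF
        t = math.exp(gamma0 + sigma * eps)
        delta = 0 if rng.random() < cens_rate else 1
        data.append((t, delta))
    return data
```

In the actual simulation, the linear predictor $\gamma_0 + \gamma_1 a_{i1} + \cdots + \gamma_4 b_{i4}$ replaces the fixed intercept, tying the event times to the subject-level random effects.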
The estimation results based on n = 60 are reported in Table 5.1, and those based on n = 120 in Table 5.2.

Table 5.1: Simulation Results: n = 60

                     TS                     LAP                    LIN
Parameter    Bias    SE     MSE     Bias    SE     MSE     Bias    SE     MSE
β1 = 12      0.02    0.18   0.03   -0.04    0.20   0.04   -0.09    0.06   0.01
β2 = 30     -1.13    2.16   5.95   -1.39    1.69   4.79   -0.93    1.25   2.41
β3 = 7      -0.02    0.20   0.04    0.12    0.23   0.07   -0.14    0.17   0.05
β4 = -2     -0.05    0.26   0.07    0.15    0.37   0.16   -0.01    0.23   0.05
β5 = 3      -0.00    0.13   0.02   -0.30    0.21   0.13   -0.48    0.08   0.23
µ = 0.3     -0.00    0.01   0.00    0.12    0.04   0.02   -0.01    0.02   0.00
γ0 = -1      2.19    0.40   4.95    0.20    0.38   0.19    0.15    0.10   0.03
γ1 = 1       0.01    0.43   0.18   -0.80    1.53   2.97   -0.91    0.18   0.87
γ2 = 1      -0.01    0.48   0.23    0.52    1.10   1.47   -0.88    0.24   0.84
γ3 = 1      -1.00    0.28   1.08    0.14    0.36   0.15   -0.07    0.21   0.05
γ4 = 1      -0.02    0.40   0.16   -0.08    0.36   0.14   -0.10    0.21   0.05
σ = 0.2      2.19    0.30   4.88    0.06    0.23   0.06    0.08    0.03   0.01
Time (sec)           680.42                 180.47                 13.39

For Table 5.1, we have the following observations.

• The three methods provide generally decent estimates of β, except that the estimate of β5 provided by LIN appears biased.

• Although TS estimates γ1, γ2 and γ4 with smaller biases, its estimation of γ0, γ3 and σ is poor. Comparatively, LAP and LIN are more likely to provide better estimates of γ, at least for γ0.

The results in Table 5.1 are regarded as the baseline for the following comparisons.
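The Bias, SE and MSE columns in these tables are the usual Monte Carlo summaries of the replicate estimates; a sketch of their computation:

```python
def summarize(estimates, true_value):
    """Monte Carlo summaries for one parameter across replicate estimates:
    bias of the mean, empirical standard error, and mean squared error."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    var = sum((e - mean) ** 2 for e in estimates) / (n - 1)   # sample variance
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, var ** 0.5, mse
```

Applying this per parameter and per method over all simulation replicates produces one row of each table.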
Table 5.2: Simulation Results: n = 120

                     TS                     LAP                    LIN
Parameter    Bias    SE     MSE     Bias    SE     MSE     Bias    SE     MSE
β1 = 12      0.05    0.14   0.02    0.00    0.11   0.01   -0.10    0.04   0.01
β2 = 30     -0.94    1.55   3.28   -1.14    0.96   2.20   -1.09    0.92   2.03
β3 = 7      -0.01    0.18   0.03    0.14    0.14   0.04   -0.12    0.16   0.04
β4 = -2     -0.04    0.19   0.04    0.11    0.26   0.08    0.00    0.22   0.05
β5 = 3       0.00    0.11   0.01   -0.19    0.13   0.05   -0.36    0.07   0.13
µ = 0.3      0.00    0.01   0.00    0.09    0.03   0.01   -0.00    0.01   0.00
γ0 = -1      2.01    0.29   4.13   -0.07    0.06   0.01   -0.05    0.09   0.01
γ1 = 1       0.04    0.37   0.14    0.00    0.14   0.02   -0.01    0.19   0.04
γ2 = 1       0.00    0.42   0.18   -0.00    0.11   0.01   -0.06    0.25   0.07
γ3 = 1      -0.97    0.31   1.03    0.03    0.04   0.00   -0.15    0.24   0.08
γ4 = 1      -0.01    0.30   0.09   -0.01    0.04   0.00   -0.15    0.23   0.08
σ = 0.2      2.26    0.25   5.18   -0.06    0.15   0.03    0.02    0.03   0.00
Time (sec)           1146.46                360.46                 28.83

Similarly, the observations for Table 5.2, which is based on more subjects, are listed below.

• Overall, when more subjects are available, the estimation results of all three methods are effectively improved in terms of smaller SE, bias and MSE.

• The estimation results of LAP are overall better than those of the other two methods.

• LIN gives satisfactory estimates of β and γ, except for β5. This method might be even more attractive for its extraordinarily fast speed.

• TS still provides severely biased estimates of γ0, γ3 and σ.

5.2.2 Numbers of Repeated Measurements

To examine how these estimation methods perform with different numbers of repeated measurements, we simulate datasets with each subject measured repeatedly 6 times and 12 times. Table 5.1 contains the simulation results based on ni = 6; here we list the estimation results for ni = 12 in Table 5.3.
Table 5.3: Simulation Results: ni = 12

                     TS                     LAP                    LIN
Parameter    Bias    SE     MSE     Bias    SE     MSE     Bias    SE     MSE
β1 = 12      0.02    0.20   0.04   -0.02    0.15   0.02   -0.15    0.06   0.03
β2 = 30     -0.52    1.38   2.18   -1.76    1.07   4.24   -0.94    1.23   2.39
β3 = 7      -0.01    0.19   0.04    0.18    0.19   0.07   -0.08    0.13   0.02
β4 = -2      0.01    0.24   0.06    0.53    0.33   0.38    0.05    0.17   0.03
β5 = 3      -0.00    0.10   0.01   -0.45    0.22   0.25   -0.28    0.07   0.08
µ = 0.3     -0.00    0.01   0.00    0.09    0.03   0.01    0.01    0.01   0.00
γ0 = -1      2.16    0.43   4.87   -0.05    0.20   0.04   -0.06    0.12   0.02
γ1 = 1       0.00    0.47   0.22    0.16    0.58   0.36   -0.96    0.19   0.95
γ2 = 1       0.03    0.47   0.22   -0.16    0.53   0.31   -0.94    0.23   0.93
γ3 = 1      -0.85    0.46   0.93    0.12    0.21   0.06   -0.05    0.21   0.05
γ4 = 1       0.04    0.45   0.20    0.01    0.16   0.03   -0.09    0.21   0.05
σ = 0.2      2.20    0.31   4.92   -0.02    0.22   0.05    0.08    0.05   0.01
Time (sec)           183.73                 187.58                 15.60

The observations for Table 5.3, based on more measurements per subject, are:

• Overall, it appears hard to reduce the bias or SE of the estimates simply by taking more measurements on each subject.

• Specifically, LAP leads in the estimation of β and γ, though the biases of its estimates are larger than in Table 5.2.

• LIN also provides good estimates of β and γ, except for γ1 and γ2, which are associated with the random effects of the measurement error model.

• With more repeated measurements, the computation time of TS is greatly reduced. However, TS still provides severely biased estimates of γ0, γ3 and σ.

5.2.3 Between-Subject Variance

To examine how these estimation methods perform with different variance-covariance matrices, we simulate datasets with the following settings of A and B:
\[
A_1 = \mathrm{diag}(2, 1, 1), \quad B_1 = \mathrm{diag}(4, 2, 3),
\]
and
\[
A_2 = \mathrm{diag}(4, 2, 2), \quad B_2 = \mathrm{diag}(8, 4, 6).
\]
Table 5.2 is based on A1 and B1; Table 5.4 is based on A2 and B2.
Table 5.4: Simulation Results: A2, B2

                     TS                     LAP                    LIN
Parameter    Bias    SE     MSE     Bias    SE     MSE     Bias    SE     MSE
β1 = 12      0.10    0.25   0.07   -0.03    0.16   0.03   -0.15    0.07   0.03
β2 = 30     -0.87    2.18   5.52   -1.17    0.93   2.23   -0.32    2.14   4.69
β3 = 7      -0.07    0.29   0.09    0.10    0.23   0.06   -0.24    0.23   0.11
β4 = -2     -0.02    0.34   0.11    0.15    0.39   0.18   -0.12    0.31   0.11
β5 = 3       0.00    0.11   0.01   -0.17    0.10   0.04   -0.56    0.10   0.32
µ = 0.3     -0.00    0.01   0.00    0.14    0.04   0.02    0.03    0.05   0.00
γ0 = -1      2.61    0.51   7.10   -0.02    0.08   0.01   -0.14    0.37   0.15
γ1 = 1       0.06    0.36   0.14   -0.05    0.25   0.06   -0.03    0.32   0.10
γ2 = 1      -0.01    0.38   0.15    0.04    0.21   0.05   -0.22    0.41   0.22
γ3 = 1      -0.91    0.34   0.94    0.01    0.02   0.00   -0.09    0.36   0.13
γ4 = 1       0.02    0.31   0.09    0.00    0.03   0.00   -0.09    0.35   0.13
σ = 0.2      2.84    0.37   8.19   -0.02    0.18   0.03    0.80    5.52   31.07
Time (sec)           908.11                 172.65                 14.22

For Table 5.4, we observe the following.

• Surprisingly, the estimation results that LAP offers in the larger between-subject variance scenario are better than in the smaller between-subject variance scenario (Table 5.1), and almost as good as in the larger sample size scenario (Table 5.2). LIN is not as fortunate, at least in terms of the bias of the estimates of β5 and γ2.

• The SEs of the estimates are, reasonably, larger than under the previous setting of A1 and B1.

• TS still provides severely biased estimates of γ0, γ3 and σ, while taking much longer to compute.

5.2.4 Within-Subject Variance

To examine how these estimation methods perform with different within-subject variances, we simulate datasets with the following settings of µ, λ and σ:
\[
\mu_1 = 0.3, \quad \lambda_1 = 0.2, \quad \sigma_1 = 0.2, \qquad \text{and} \qquad \mu_2 = 0.6, \quad \lambda_2 = 0.5, \quad \sigma_2 = 0.4.
\]
Table 5.1 is based on µ1, λ1 and σ1; Table 5.5 is based on µ2, λ2 and σ2.

• When the within-subject variances become larger, the estimation results deteriorate in terms of both SE and bias.

• No method provides satisfactory estimates for the simulated time-to-event data with the larger within-subject variance, indicating that both the TS-based
method and the approximate methods may yield severely biased estimates when the variance is too high. In this situation, an "exact" method, such as the Monte Carlo EM algorithm, might be necessary.

Table 5.5: Simulation Results: µ2, λ2, σ2

                     TS                     LAP                    LIN
Parameter    Bias    SE     MSE     Bias    SE     MSE     Bias    SE     MSE
β1 = 12      0.05    0.20   0.04    0.04    0.23   0.06   -0.05    0.11   0.01
β2 = 30     -2.68    2.43   13.09  -1.97    2.36   9.43   -1.31    2.15   6.31
β3 = 7      -0.12    0.24   0.07    0.25    0.28   0.14   -0.19    0.34   0.16
β4 = -2     -0.08    0.31   0.10    0.69    0.47   0.70   -0.11    0.51   0.28
β5 = 3       0.02    0.16   0.03   -0.22    0.22   0.10   -1.20    0.15   1.47
µ = 0.6      0.20    0.02   0.04    0.35    0.08   0.13    0.14    0.07   0.02
γ0 = -1      2.28    0.41   5.38    0.36    0.85   0.86   -0.28    0.35   0.20
γ1 = 1       0.01    0.52   0.27   -2.29    4.54   25.84  -0.97    0.23   1.00
γ2 = 1       0.05    0.51   0.26    1.83    4.35   22.30  -0.94    0.21   0.92
γ3 = 1      -0.99    0.26   1.05    1.78    3.05   12.47  -0.10    0.22   0.06
γ4 = 1       0.05    0.67   0.46   -0.11    1.49   2.24   -0.10    0.23   0.06
σ = 0.5      2.25    0.32   5.15    0.92    1.16   2.21    0.34    0.13   0.13
Time (sec)           1644.74                299.28                 16.48

In general, we may conclude the following.

• TS is risky for unbiased estimation in the joint modeling of a longitudinal process and a time-to-event process, especially when a strong association exists between them.

• The approximate simultaneous inference methods are promising: they can provide satisfactory estimates for both the longitudinal and the time-to-event processes when the measurement errors or model mis-specification are tolerable.

• LAP generally performs more stably than LIN. In fact, when more subjects are given, or the between-subject variance falls in an appropriate range, LAP is likely to provide quite reliable estimates.

• LIN can be considered a valuable alternative to LAP when more subjects are involved, especially considering its fast speed. However, this algorithm requires further study before being widely promoted, since its estimates of some parameters, for example β5, might be biased.
Chapter 6 Conclusion and Discussion

The computation associated with the joint modeling of a longitudinal process and a survival process can be extremely intensive and may lead to convergence problems (Tsiatis and Davidian, 2004; Wu et al., 2008). The situation can be even more severe when several nonlinear and semiparametric/nonparametric models are involved in dealing with complex data, such as when measurement errors or missing data are present. Previous research has mainly used the EM algorithm or the Laplace approximation for inference on several simultaneously evolving processes. Beyond the technical details of these two popular methods, we were most struck by their two different approaches to handling the shared parameters of longitudinal studies.

In this thesis, following those two distinct computational frameworks, we proposed an approximate statistical inference method for jointly modeling a longitudinal process and a time-to-event process based on an NLME model and a parametric AFT model. By linearizing the joint model, we designed a new strategy for updating the random effects that connect the two processes, and proposed two approaches for different scenarios of the likelihood function. Roughly speaking, one approach is widely applicable and easily implementable when the likelihood function is comparatively complex. The other utilizes more information about the estimated random effects when the likelihood function is comparatively simple, and is therefore expected to be more efficient. Both approaches approximate the multidimensional integral in the observed-data joint likelihood by an analytic expression, which greatly reduces the computational intensity of the complex joint modeling problem.

The proposed inference method, LIN, was applied to an HIV study along with three other methods: TS, MTS and LAP. The inference results obtained from the different methods led to significantly different conclusions about the HIV study.
Specifically:

• The two simultaneous inference methods suggested that CD4 is significantly associated with the viral load, which was missed by the TS-based methods (TS and MTS). Based on the simultaneous inference results, the lower the CD4 count, the higher the viral load the patient might have.

• The two simultaneous inference methods also indicated that the viral decay rate is significantly associated with the time of the first decline of the CD4/CD8 ratio. The lower the viral load, the later the CD4/CD8 ratio may first decline. The TS-based methods again failed to detect this important feature.

We also compared these inference methods by simulation. Generally speaking:

• It is risky to use the TS method for unbiased estimation in the joint modeling of a longitudinal process and a time-to-event process.

• LAP is generally more reliable than the other two methods in providing valid estimates for the joint models.

• LIN can become a valuable alternative to LAP in terms of estimation results and computation time, especially when more subjects are given.

Linearization methods have achieved numerous successes in solving nonlinear problems in real-world applications, which leads us to believe that the linearization strategy may also help in jointly modeling longitudinal data and time-to-event data. Our initial work in this area seems promising, and we believe the following research directions might be interesting in the future.

• Our proposed method is based on a model linearization and a likelihood function approximation. The asymptotic properties of the estimator are naturally of interest; that is, we need to search for theoretical justifications of our approach.

• The model we adopted for the time-to-event process, the Weibull model, is a specific choice for event time analysis. As introduced in Section 1.1.3, Cox models and general AFT models are also extremely popular in practice.
Thus, one might be interested in generalizing the linearization strategy to situations where Cox models or general AFT models are involved in the joint modeling.

• In many practical longitudinal studies, the response variables are recorded as various types of categories. The NLME model in our framework might then need to be replaced by a generalized linear mixed effects model (GLMM). Generalizing the linearization strategy to joint modeling with GLMMs may be valuable for practical applications.

• Without a doubt, the missing data problem remains one of the core issues of longitudinal studies. Our approach is likelihood-based, so missing data problems can be naturally incorporated into our framework. However, as more statistical problems are taken into account, the joint modeling itself becomes more and more complex in terms of both theoretical justification and computation. These could be the focuses of our future research.

Bibliography

Albert, P. S. and Shih, J. H. (2009). On estimating the relationship between longitudinal measurements and time-to-event data using a simple two-stage procedure. Biometrics, DOI: 10.1111/j.1541-0420.2009.01324.x.

Booth, J. G. and Hobert, J. P. (1999). Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society, Series B, 61:265 - 285.

Brown, E. R. and Ibrahim, J. G. (2003). A Bayesian semiparametric joint hierarchical model for longitudinal and survival data. Biometrics, 59:221 - 228.

Brown, E. R., Ibrahim, J. G., and DeGruttola, V. (2005). A flexible B-spline model for multiple longitudinal biomarkers and survival. Biometrics, 61:64 - 73.

Carroll, R. J., Ruppert, D., and Stefanski, L. A. (2006). Measurement Error in Nonlinear Models. Chapman & Hall/CRC, London, 2nd edition.

Chambers, J. M. (1991). Statistical Models in S. Duxbury Press.

Davidian, M. and Giltinan, D. M. (1995). Nonlinear Models for Repeated Measurements Data.
Chapman & Hall/CRC, London. Davidian, M. and Giltinan, D. M. (2003). Nonlinear models for repeated measurements data: an overview and update. Journal of Agricultural, Biological and Environmental Statistics, 8:387 – 419. 64 Bibliography de Boor, C. (1978). A Practical Guide to Splines. Springer-Verlag, New York. DeGruttola, V. and Tu, X. M. (1994). Modeling progression of cd-4 lymphocyte count and its relationship to survival time. Biometrics, 50:1003 – 1014. Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the em algorithm. Journal of the Royal Statistical Society, Ser: B, 39:1 – 38. Diggle, P., Heagerty, P., Liang, K. Y., and Zeger, S. (2002). Analysis of Longitudinal Data. Oxford University Press, 2nd edition. Ding, A. and Wu, H. (2001). Assessing antiviral potency of anti-hiv therapies in vivo by comparing viral decay rates in viral dynamic models. Biostatistics, 2:13 – 29. Ding, J. M. and Wang, J. L. (2008). Modeling longitudinal data with nonparametric multiplicative random eﬀects jointly with survival data. Biometrics, 64:546 – 556. Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall/CRC. Faucett, C. J. and Thomas, D. C. (1996). Simultaneously modeling censored survival data and repeatedly measured covariates: A gibbs sampling approach. Statistics in Medicine, 15:1663 – 1685. Fletcher, R. (1987). Practical Methods of Optimization. John Wiley & Sons, 2nd edition. Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal desities. Journal of the American Statistical Association, 85:398 – 409. 65 Bibliography Henderson, R., Diggle, P., and Dobson, A. (2000). Joint modelling of longitudinal measurements and event time data. Biostatistics, 1, 4:465 – 480. Hsieh, F. S., Tseng, Y. K., and Wang, J. L. (2006). Joint modeling of survival and longitudinal data: likelihood approach revisited. Biometrics, 62:1037 – 1043. Huang, X. 
Z., Stefanski, L. A., and Davidian, M. (2009). Latent-model robustness in joint models for a primary endpoint and a longitudinal process. Biometrics, 65:719 – 727. Ibrahim, J. G., Chen, M. H., and Lipsitz, S. R. (2001a). Missing responses in generalized linear mixed models when the missing data mechanism is nonignorable. Biometrika, 88:551 – 564. Ibrahim, J. G., Chen, M. H., and Sinha, D. (2001b). Bayesian Survival Analysis. Springer, New York. Laird, N. M. and Ware, J. H. (1982). Random-eﬀects models for longitudinal data. Biometrics, 38:963 – 974. Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data. Wiley Series in Probability and Statistics. Wiley Interscience, 2nd edition. Lee, Y., Nelder, J. A., and Pawitan, Y. (2006). Generalized Linear Models with Random Eﬀects: Uniﬁed Analysis via H-likelihood. Chapman & Hall/CRC. Li, L., Hu, B., and Greene, T. (2009). A semiparametric joint model for longitudinal and survival data with application to hemodialysis study. Biometrics, 65:737 – 745. Li, N., Elashoﬀ, R. M., Li, G., and Saverc, J. (2010). Joint modeling of longitudinal ordinal data and competing risks survival times and analysis of the ninds rt-pa stroke trial. Statistics in Medicine, 29:546 – 557. 66 Bibliography Lindstrom, M. J. and Bates, D. M. (1988). Newton-raphson and em algorithms for linear mixed-eﬀects models for repeated-measures data. Journal of the American Statistical Association, 83:1014 – 1022. Lindstrom, M. J. and Bates, D. M. (1990). Nonlinear mixed eﬀects models for repeated measures data. Biometrics, 46:673 – 687. Little, R. J. A. (1992). Regression with missing x’s: a review. Journal of the American Statistical Association, 87:1227 – 1237. Little, R. J. A. (1995). Modeling the drop-out mechanism in repeated measures studies. Journal of the American Statistical Association, 90:1112 – 1121. Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Values. Wiley, New York, 2nd edition. Liu, B. and Muller, H. 
G. (2009). Estimating derivatives for samples of sparsely observed functions, with application to online auction dynamics. Journal of the American Statistical Association, 104:704 – 717. Liu, L. (2009). Joint modeling longitudinal semi-continuous data and survival, with application to longitudinal medical cost data. Statistics in Medicine, 28:972 – 986. Liu, W. and Wu, L. (2007). Simultaneous inference for semiparametric nonlinear mixed-eﬀects models with covariate measurement errors and missing responses. Biometrics, 63:342 – 350. Louis, T. A. (1982). Finding the observed information matrix when using the em algorithm. Journal of the Royal Statistical Society, Series: B, 44:226 – 233. Nocedal, J. and Wright, S. J. (2006). Numerical Optimization. Springer, New York, 2nd edition. 67 Bibliography Pinheiro, J. C. and Bates, D. M. (2002). Mixed-Eﬀects Models in S and SPLUS. Springer, New York. Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. Springer, New York, 2nd edition. Ratcliﬀe, S. J., Guo, W. S., and Have, T. R. T. (2004). Joint modeling of longitudinal and survival data via a common frailty. Biometrics, 60:892 – 899. Rice, J. A. and Wu, C. O. (2001). Nonparametric mixed-eﬀects models for unequally sampled noisy curves. Biometrics, 57:253 – 259. Rizopoulos, D., Verbeke, G., and Lesaﬀre, E. (2009). Fully exponential laplace approximations for the joint modelling of survival and longitudinal data. Journal of the Royal Statistical Society, Series: B, 71:637 – 654. Rizopoulos, D., Verbeke, G., Lesaﬀre, E., and Vanrenterghem, Y. (2008). A twopart joint model for the analysis of survival and longitudinal binary data with excess zeros. Biometrics, 64:611C619. Rizopoulos, D., Verbeke, G., and Molenberghs, G. (2010). Multiple-imputationbased residuals and diagnostic plots for joint models of longitudinal and survival outcomes. Biometrics, 66:20 – 729. Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data. Chapman & Hall/CRC. 
Shah, A., Laird, N., and Schoenfeld, D. (1997). A random-eﬀects model for multiple characteristics with possibly missing data. Journal of the American Statistical Association, 92:775 – 779. Song, X., Davidian, M., and Tsiatis, A. A. (2002a). An estimator for the pro- 68 Bibliography portional hazards model with multiple longitudinal covariates measured with error. Biostatistics, 3, 4:511 – 528. Song, X., Davidian, M., and Tsiatis, A. A. (2002b). A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data. Biometrics, 58:742 – 753. Song, X. and Huang, Y. J. (2006). A corrected pseudo-score approach for additive hazards model with longitudinal covariates measured with error. Lifetime Data Analysis, 12:97 – 110. Song, X. and Wang, C. Y. (2008). Semiparametric approaches for joint modeling of longitudinal and survival data with time-varying coeﬃcients. Biometrics, 64:557 – 566. Tseng, Y. K., Hsieh, F. S., and Wang, J. L. (2005). Joint modelling of accelerated failure time and longitudinal data. Biometrika, 92:587 – 603. Tsiatis, A. A. and Davidian, M. (2004). An overview of joint modeling of longitudinal and time-to-event data. Statistica Sinica, 14:793 – 818. Verbeke, G. and Molenberghs, G. (2001). Linear Mixed Models for Longitudinal Data. Springer, New York. Vonesh, E. F., Greene, T., and Schluchter, M. D. (2006). Shared parameter models for the joint analysis of longitudinal data and event times. Statistics in Medicine, 25:143 – 163. Vonesh, E. F., Wang, H., Nie, L., and Majumdar, D. (2002). Conditional second order generalized estimating equations for generalized linear and nonlinear mixed-eﬀects models. Journal of the American Statistical Association, 97:271 – 283. 69 Bibliography Wang, Y. and Taylor, J. M. G. (2001). Jointly modeling longitudinal and event time data with application to acquired immunodeﬁciency syndrome. Journal of the American Statistical Association, 96:895 – 905. Wei, G. C. and Tanner, M. A. (1990). 
A monte-carlo implementation of the em algorithm and the poor man’s data augmentation algorithm. Journal of the American Statistical Association, 85:699 – 704. Wu, H. and Ding, A. (1999). Population hiv-1 dynamics in vivo: applicable models and inferential tools for virological data from aids clinical trials. Biometrics, 55:410 – 418. Wu, H. and Zhang, J. (2006). Nonparametric Regression Methods for Longitudinal Data Analysis: Mixed-Eﬀects Modeling Approaches. Wiley-Interscience. Wu, L. (2002). A joint model for nonlinear mixed-eﬀects models with censoring and covariates measured with error, with application to aids studies. Journal of the American Statistical Association, 97:955 – 964. Wu, L. (2009). Mixed Eﬀects Models for Complex Data. Chapman & Hall/CRC. Wu, L., Hu, X. J., and Wu, H. (2008). Joint inference for nonlinear mixed-eﬀects models and time-to-event at the presence of missing data. Biostatistics, 9:308 – 320. Wu, L., Liu, W., and Hu, X. J. (2010). Joint inference on hiv viral dynamics and immune suppression in presence of measurement errors. Biometrics, 66:327 – 335. Wu, M. C. and Carroll, R. J. (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics, 44:175 – 188. 70 Bibliography Wulfshon, M. S. and Tsiatis, A. A. (1997). A joint model for survival and longitudinal data measured with error. Biometrics, 53:300 – 309. Xu, J. and Zeger, S. L. (2001a). The evaluation of multiple surrogate endpoints. Biometrics, 57:81 – 87. Xu, J. and Zeger, S. L. (2001b). Joint analysis of longitudinal data comprising repeated measures and times to events. Applied Statistics, 50:375 – 387. Ye, W., Lin, X., and Taylor, J. M. G. (2008). Semiparametric modeling of longitudinal measurements and time-to-event data: A two stage regression calibration approach. Biometrics. Yu, M. G., Law, N. J., Taylor, J. M. G., and Sandler, H. M. (2004). 
Joint longitudinal-survival-cure models and their application to prostate cancer. Statistica Sinica, 14:835 – 862. Yu, M. G., Taylor, J. M. G., and Sandler, H. M. (2008). Individual prediction in prostate cancer studies using a joint longitudinal survivalccure model. Journal of the American Statistical Association, 103:178 – 187. Zeng, D. L. and Cai, J. W. (2005a). Asymptotic results for maximum likelihood estimators in joint analysis of repeated measurements and survival time. Annals of Statistics, 33:2132 – 2163. Zeng, D. L. and Cai, J. W. (2005b). Simultaneous modelling of survival and longitudinal data with an application to repeated quality of life measures. Lifetime Data Analysis, 11:151 – 174. Zhang, H. P., Ye, Y. Q., Diggle, P., and Shi, J. (2008). Joint modeling of time series measures and recurrent events and analysis of the eﬀects of air quality on respiratory symptoms. Journal of the American Statistical Association, 103:48 – 60. 71 Bibliography Zhang, S., Muller, P., and Do, K. A. (2010). A bayesian semiparametric survival model with longitudinal markers. Biometrics, 66:719 – 727. 72
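Both the LIN and LAP frameworks discussed above replace the intractable multidimensional integral over the random effects in the observed-data joint likelihood with an analytic expression. As a rough, self-contained sketch (not the thesis's actual implementation — the function names and the finite-difference Hessian are illustrative assumptions), the following Python code applies a basic Laplace approximation, log ∫ exp{−h(b)} db ≈ −h(b̂) + (q/2) log 2π − (1/2) log |H(b̂)|, where b̂ minimizes h and H is the Hessian of h at the mode:

```python
import numpy as np
from scipy.optimize import minimize

def laplace_log_integral(neg_h, b0, eps=1e-4):
    """Laplace approximation to log \\int exp(-neg_h(b)) db:
    -neg_h(b_hat) + (q/2) log(2*pi) - 0.5 log|H(b_hat)|,
    where b_hat is the mode and H the Hessian of neg_h there."""
    b_hat = minimize(neg_h, b0).x               # mode of the integrand
    q = b_hat.size
    H = np.empty((q, q))                        # central-difference Hessian
    for i in range(q):
        for j in range(q):
            b = [b_hat.copy() for _ in range(4)]
            b[0][i] += eps; b[0][j] += eps
            b[1][i] += eps; b[1][j] -= eps
            b[2][i] -= eps; b[2][j] += eps
            b[3][i] -= eps; b[3][j] -= eps
            H[i, j] = (neg_h(b[0]) - neg_h(b[1])
                       - neg_h(b[2]) + neg_h(b[3])) / (4.0 * eps ** 2)
    _, logdet = np.linalg.slogdet(H)
    return -neg_h(b_hat) + 0.5 * q * np.log(2.0 * np.pi) - 0.5 * logdet

# Sanity check on a standard bivariate Gaussian kernel, for which the
# Laplace approximation is exact: log \int exp(-b'b/2) db = log(2*pi).
approx = laplace_log_integral(lambda b: 0.5 * float(b @ b), np.zeros(2))
exact = np.log(2.0 * np.pi)
```

In the joint-model setting, neg_h(b) would be the negative complete-data log-likelihood for one subject as a function of its random effects, and the approximation is applied subject by subject within each iteration of the estimation algorithm.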