ANALYSIS OF LONGITUDINAL DATA OF MIXED TYPES USING A STATE SPACE MODEL APPROACH

By Billy K. S. Ching
B.Sc., University of Hong Kong

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
in
THE FACULTY OF GRADUATE STUDIES
DEPARTMENT OF STATISTICS

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
June 1997
© Billy K. S. Ching, 1997

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Statistics
The University of British Columbia
Vancouver, Canada

Abstract

A new method for multivariate regression analysis of longitudinal data of mixed types is applied to data from a sub-study of the Betaseron multicenter clinical trial in relapsing-remitting multiple sclerosis (MS) (The IFNB Multiple Sclerosis Study Group, 1993). The sub-study followed a cohort of 52 patients at one center (University of British Columbia) who underwent frequent magnetic resonance imaging (MRI) scans to assess disease activity over the first two years of the trial (Paty, Li, the UBC MS/MRI Study Group and the IFNB Multiple Sclerosis Study Group, 1993). We consider a bivariate response vector whose two components are of different data types: the first is a positive continuous variable and the second is a count variable. We use a state space model approach based on the Tweedie class of exponential dispersion models, assuming that the two components are conditionally independent given a latent gamma Markov process.
The latent process is interpreted as the underlying severity of the disease, whereas the observations reflect the symptoms. One advantage of the new method is that it enables the examination of patterns over time: it can identify not only the presence of a treatment effect but also the nature of that effect. It has been well established, in a properly controlled clinical trial, that Betaseron substantially alters the natural history of MS (The IFNB Multiple Sclerosis Study Group, 1993). The main objective of this thesis is to illustrate the use of the new method on this data set and to extract additional information from the data.

Table of Contents
Abstract
List of Tables
List of Figures
Acknowledgments
1 Introduction
2 Review of literature
2.1 Marginal models
2.2 Random effects models
2.3 Transition models
2.4 State space models
3 Tweedie state space models
3.1 The model
3.2 The Kalman filter and smoother
3.3 Parameter estimation
3.3.1 The Kalman estimating equation
3.3.2 Estimation of dispersion parameters
3.4 Residual analysis
4 Analysis of Betaseron data
4.1 Background
4.1.1 Description of the experiment
4.1.2 Description of the data
4.2 Initial data analysis
4.2.1 Burden of disease
4.2.2 Active lesions
4.2.3 Covariates
4.3 Model identification
4.4 Model checking
4.4.1 Full data set
4.4.2 Data set with 446* removed
4.4.3 Data set with 446* and 545* removed
4.5 The final model
5 Discussion
Bibliography
Appendix A Exponential dispersion models
Appendix B Godambe information matrix
B.1 Moment calculations
B.2 Derivatives of Kalman filter and smoother
B.3 Variability matrix
B.4 Sensitivity matrix
Appendix C Residuals plots for each patient in the placebo group
Appendix D Residuals plots for each patient in the low dose group
Appendix E Residuals plots for each patient in the high dose group

List of Tables
4.1 Descriptive statistics for average relative burden values
4.2 Descriptive statistics for average active lesions values
4.3 Descriptive statistics for age, duration, EDSS and log area
4.4 Counts for origin and gender
4.5 Estimates and standard errors for baseline covariate effects
4.6 Estimates and standard errors for the long-term covariate (time) effects for patients in the placebo group
4.7 Estimates and standard errors for the long-term covariate (time) effects for patients in the low dose group
4.8 Estimates and standard errors for the long-term covariate (time) effects for patients in the high dose group
4.9 Empirical variance-covariance matrix for standardized residuals
4.10 Estimates for the long-term covariate effects
4.11 Estimates for the baseline covariate effects
4.12 Estimates and standard errors for the time effects for each patient: placebo (PL), low dose (LD) and high dose (HD)
4.13 Estimates and standard errors for baseline covariate effects
A.14 Summary of the Tweedie exponential dispersion models

List of Figures
4.1 Boxplots of average relative burden values by treatment group
4.2 Boxplots of relative burden values for each patient
4.3 Plots of relative burden values versus time for each patient
4.4 Boxplots of average active lesions values across treatment groups
4.5 Plots of active lesions values versus time
4.6 Boxplots of covariate values across treatment groups
4.7 Residuals plots for patient #446 in the placebo group
4.8 Residuals plots for patient #545 in the placebo group
4.9 Residuals plots for patient #502 in the low dose group
4.10 Residuals plots for patient #500 in the high dose group
4.11 Residuals against log-linear predictors for the two categories
4.12 Residuals against log-linear predictors for the latent process
4.13 Residuals against lag 1 residuals for the two categories
4.14 Residuals against lag 1 residuals for the latent process
4.15 Residuals plots for patient #446 in the placebo group
4.16 Residuals plots for patient #545 in the placebo group
4.17 Residuals plots for patient #502 in the low dose group
4.18 Residuals plots for patient #500 in the high dose group
4.19 Residuals against log-linear predictors for the two categories
4.20 Residuals against log-linear predictors for the latent process
4.21 Residuals against lag 1 residuals for the two categories
4.22 Residuals against lag 1 residuals for the latent process
4.23 Residuals against time for the two categories
4.24 Residuals against time for the latent process
4.25 Smoother residuals against log-linear predictors for the random effects
4.26 Smoother residuals against baseline covariates
4.27 Autocorrelation functions for average filter residuals: (a) to (c) are the autocorrelation functions; (d) to (e) are the partial autocorrelation functions
4.28 Estimated latent processes for each patient
C.29 Residuals plots for patient #420 in the placebo group
C.30 Residuals plots for patient #421 in the placebo group
C.31 Residuals plots for patient #443 in the placebo group
C.32 Residuals plots for patient #449 in the placebo group
C.33 Residuals plots for patient #451 in the placebo group
C.34 Residuals plots for patient #498 in the placebo group
C.35 Residuals plots for patient #501 in the placebo group
C.36 Residuals plots for patient #504 in the placebo group
C.37 Residuals plots for patient #507 in the placebo group
C.38 Residuals plots for patient #522 in the placebo group
C.39 Residuals plots for patient #523 in the placebo group
C.40 Residuals plots for patient #539 in the placebo group
C.41 Residuals plots for patient #540 in the placebo group
C.42 Residuals plots for patient #550 in the placebo group
C.43 Residuals plots for patient #565 in the placebo group
D.44 Residuals plots for patient #419 in the low dose group
D.45 Residuals plots for patient #424 in the low dose group
D.46 Residuals plots for patient #448 in the low dose group
D.47 Residuals plots for patient #450 in the low dose group
D.48 Residuals plots for patient #452 in the low dose group
D.49 Residuals plots for patient #499 in the low dose group
D.50 Residuals plots for patient #508 in the low dose group
D.51 Residuals plots for patient #521 in the low dose group
D.52 Residuals plots for patient #525 in the low dose group
D.53 Residuals plots for patient #542 in the low dose group
D.54 Residuals plots for patient #544 in the low dose group
D.55 Residuals plots for patient #547 in the low dose group
D.56 Residuals plots for patient #548 in the low dose group
D.57 Residuals plots for patient #564 in the low dose group
D.58 Residuals plots for patient #568 in the low dose group
E.59 Residuals plots for patient #422 in the high dose group
E.60 Residuals plots for patient #444 in the high dose group
E.61 Residuals plots for patient #445 in the high dose group
E.62 Residuals plots for patient #453 in the high dose group
E.63 Residuals plots for patient #454 in the high dose group
E.64 Residuals plots for patient #497 in the high dose group
E.65 Residuals plots for patient #503 in the high dose group
E.66 Residuals plots for patient #506 in the high dose group
E.67 Residuals plots for patient #524 in the high dose group
E.68 Residuals plots for patient #526 in the high dose group
E.69 Residuals plots for patient #541 in the high dose group
E.70 Residuals plots for patient #543 in the high dose group
E.71 Residuals plots for patient #546 in the high dose group
E.72 Residuals plots for patient #549 in the high dose group
E.73 Residuals plots for patient #566 in the high dose group

Acknowledgments

I would like to thank Dr.
Bent Jørgensen for his guidance and supervision throughout the development of this thesis. The data used in this thesis were collected at the UBC MS Clinic as part of the UBC Frequent MRI Sub-study of the Betaseron Clinical Trial sponsored by Berlex Laboratories, Richmond, California. I would like to thank Berlex Laboratories for making the data available to us. I am grateful to Dr. John Petkau for introducing us to the data, for his careful reading of the manuscript and for his helpful suggestions. The assistance of Dr. Peter X.K. Song with computing issues is also gratefully acknowledged.

Chapter 1 Introduction

This thesis focuses on the Tweedie state space models proposed by Jørgensen, Lundbye-Christensen, Song and Sun (1995c) and on the application of these models to a multivariate longitudinal data set from the UBC 6-weekly frequent Magnetic Resonance Imaging (MRI) sub-study of the Betaseron clinical trial in relapsing-remitting multiple sclerosis (MS) (Paty, Li, the UBC MS/MRI Study Group and the IFNB Multiple Sclerosis Study Group, 1993). The data will be referred to simply as the Betaseron data.

Models for analyzing longitudinal data have been developed extensively in recent years. These models can be classified as marginal, random effects, transition and state space models. Marginal models focus on inferences about the population average; a key feature of these models is the estimation of regression parameters by the generalized estimating equation method. Examples include the models proposed by Liang and Zeger (1986) and Liang, Zeger and Qaqish (1992). Random effects models consider situations in which the regression coefficients vary among individuals; examples are Laird and Ware (1982) and Zeger and Karim (1991). Transition models are particularly useful when the correlation structure is investigated.
The models developed by Korn and Whittemore (1979) and Zeger, Liang and Self (1985) are specific examples of transition models for binary data.

State space models originated in aerospace engineering and have been widely applied in modern systems theory. These models are well developed for the Gaussian distribution, and the concept has been extended by several authors to non-Gaussian distributions. In particular, Zeger (1988) and Harvey and Fernandes (1989) have proposed state space models for Poisson counts. A key feature of these models is the use of the Kalman smoother to predict the latent process. The models for longitudinal data mentioned above are discussed in more detail in Chapter 2.

The new method proposed by Jørgensen et al. (1995c) is based on distributions in the Tweedie class of exponential dispersion models (Jørgensen, 1987), which includes the Gaussian, Poisson, compound Poisson and gamma distributions, among others. The method can therefore handle a wide variety of data types, including continuous, count, mixed and positive continuous data. It was developed for regression analysis of multivariate longitudinal data of mixed types: the components of the response vector (called categories) can have different distributions within the Tweedie class. All categories are assumed to reflect the same underlying latent process; in particular, an order 1 Markov process is used in the analysis in Chapter 4. The observations are assumed to be conditionally independent given the latent process, both across categories and over time. Because the latent process is shared, this produces serial correlation within each category as well as correlation between the categories, giving a realistic correlation structure of the kind found in many types of data.
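To make this data-generating mechanism concrete, the following is a minimal simulation sketch in Python, written under stated assumptions rather than taken from Jørgensen et al. (1995c). It mimics the bivariate structure analysed in Chapter 4: a positive continuous (gamma) category and a count (Poisson) category, both driven by a gamma Markov latent process. The latent transition is assumed to have mean $b\theta_{t-1}$ and variance $\sigma^2\theta_{t-1}$, and the gamma category is assumed to have variance $\nu^2 a^2\theta_t$ given $\theta_t$; the function names and all parameter values are hypothetical. The point is only that conditional independence given a shared latent process still yields positively correlated categories.

```python
import math
import random


def poisson_draw(rng, lam):
    """Poisson variate by Knuth's multiplication method (adequate for small means)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_series(rng, n, g, omega2, sigma2, b, a_cont, nu2, a_count):
    """One series of (positive continuous, count) pairs sharing a latent process.

    rng.gammavariate(shape, scale) is used for every gamma draw:
      theta_0           mean g,               variance omega2 * g**2
      theta_t | prev    mean b * prev,        variance sigma2 * prev  (assumed form)
      Y_cont | theta_t  mean a_cont * theta,  variance nu2 * a_cont**2 * theta
      Y_count | theta_t Poisson with mean a_count * theta
    """
    theta = rng.gammavariate(1.0 / omega2, g * omega2)
    series = []
    for _ in range(n):
        theta = rng.gammavariate(b * b * theta / sigma2, sigma2 / b)
        y_cont = rng.gammavariate(theta / nu2, a_cont * nu2)
        y_count = poisson_draw(rng, a_count * theta)
        series.append((y_cont, y_count))
    return series


def correlation(xs, ys):
    """Sample Pearson correlation."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# 5000 independent series, looking only at the first time point.
rng = random.Random(1997)
first = [simulate_series(rng, 1, 5.0, 0.2, 0.5, 1.0, 2.0, 0.5, 1.0)[0]
         for _ in range(5000)]
r = correlation([yc for yc, _ in first], [yn for _, yn in first])
print(f"correlation between the two categories at t = 1: {r:.2f}")
```

Under the same assumptions, the moment formulas of Chapter 3 give a theoretical correlation of about 0.67 for these parameter values, so the printed sample value should be close to that.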
The state space model approach provides an intuitively appealing conceptual framework for longitudinal data analysis and a more practical interpretation of the data-generating mechanism. A new feature of the method is that time-varying covariates can enter the model either via the latent process or via the observation model. Covariates that enter via the observation model are termed short-term covariates: they have immediate but short-lived effects on the categories. Covariates that enter via the latent process are termed long-term covariates: they affect the underlying trend of the observations, and their effects carry over to the next few data points. Residual analysis may help to determine whether a covariate is long-term or short-term; this issue is discussed in detail in Chapter 3. The latent process may also be interpreted as a measure of the severity of the disease, and the response variables as the symptoms caused by that severity. This provides a convenient framework for interpreting the model and the statistical analysis.

This thesis has five chapters. A literature review is presented in Chapter 2, and the new model is introduced in Chapter 3. Chapter 4 discusses the analysis of the Betaseron data. The conclusions of the data analysis and comparisons with previous analyses of the data are presented in Chapter 5.

Chapter 2 Review of literature

This chapter surveys the current literature on approaches to analyzing discrete and continuous longitudinal data. We present marginal, random effects, transition and state space models. Following Diggle, Liang and Zeger (1994), the marginal, random effects and transition models are extensions of generalized linear models (GLM).
Traditional maximum likelihood or maximum conditional likelihood methods are normally used to estimate the parameters of random effects and transition models. In marginal models, the marginal means and the covariance structure of the observations are modelled separately, and the joint likelihood of the data need not be specified. Liang and Zeger (1986) and Zeger and Liang (1986) proposed the generalized estimating equations (GEE) approach for estimating the regression parameters in these models; in this approach, a "working likelihood" is used instead of the actual likelihood to generate estimating equations for the regression parameters. These three extensions of GLMs are confined to univariate responses, and the choice among them depends on the objective of the scientific research. Specifically, marginal models are suitable when the population average is the primary focus, whereas random effects models are appropriate when subject-specific effects are to be studied. Transition models are particularly useful for investigating the dependence of the response on both the explanatory variables and the previous responses.

The key idea underlying state space models is that the (observed) longitudinal data are related to an (unobserved) latent process through assumptions on the conditional distribution of the former given the value of the latter. In addition, a transition model is assumed for the latent process. One advantage of this approach over the three GLM extensions is its ability to handle a multivariate response; it also gives an appealing intuition regarding the data-generating mechanism.

We present some key features and ideas of these models in the following sections; for more detailed descriptions, we refer to the original papers.
We denote the longitudinal data by $Y_{jt}$, $j = 1, \ldots, h$, $t = 1, \ldots, n_j$, where the subject is indexed by $j$ and $n_j$ is the number of observations from subject $j$. This notation is used throughout the rest of this chapter.

2.1 Marginal models

In a marginal model, the regression of the response on the explanatory variables is modelled separately from the within-subject correlation. In the regression, the average response over a sub-population (such as the patients in a treatment group) is modelled as a function of the explanatory variables. Liang and Zeger (1986) and Zeger and Liang (1986) proposed one such approach based on GLMs; these marginal models address scientific questions about the population average. In such a marginal model for longitudinal data $Y_{jt}$, the marginal expectation $E(Y_{jt}) = \mu_{jt}$ is related to the explanatory variables $x_{jt}$ through a known link function $h$. The marginal variance $\mathrm{Var}(Y_{jt})$ is related to the marginal mean through a known variance function $V$ and an unknown dispersion parameter $\sigma^2$. Furthermore, the correlation between $Y_{jt}$ and $Y_{jk}$ is a known function $\rho$ of the marginal means and, perhaps, of an additional parameter $\alpha$. That is,

• $h(\mu_{jt}) = x_{jt}^T\beta$;
• $\mathrm{Var}(Y_{jt}) = \sigma^2 V(\mu_{jt})$;
• $\mathrm{Corr}(Y_{jt}, Y_{jk}) = \rho(\mu_{jt}, \mu_{jk}; \alpha)$.

The regression parameter $\beta$ is assumed to be the same across subjects and represents the overall effects of the explanatory variables on the response. One advantage of marginal models is their simplicity of interpretation: they are the natural analogues, for correlated data, of generalized linear models for independent data. Note that assumptions are made only on the first two moments and the likelihood function is not specified. On the one hand, this makes marginal models suitable for a wide range of discrete and continuous data; on the other hand, traditional maximum likelihood methods cannot be used for parameter estimation.
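The three bullet assumptions above fix only the first two moments, which is all that the GEE approach needs. As a minimal Python sketch, assuming a log link $h = \log$, the Poisson-type variance function $V(\mu) = \mu$ and an exchangeable correlation $\rho(\mu_{jt}, \mu_{jk}; \alpha) = \alpha$ (illustrative choices, not ones made in this thesis), the marginal means, variances and covariance matrix of one subject's observations can be computed as follows; the function name `marginal_moments` is hypothetical.

```python
import math


def marginal_moments(X, beta, sigma2, alpha):
    """Marginal means, variances and covariance matrix for one subject.

    Illustrative assumptions: log link h(mu) = log(mu), variance function
    V(mu) = mu and exchangeable correlation rho(mu_t, mu_k; alpha) = alpha.
    """
    mu = [math.exp(sum(x_i * b_i for x_i, b_i in zip(x, beta))) for x in X]
    var = [sigma2 * m for m in mu]                    # sigma^2 V(mu_t)
    n = len(mu)
    cov = [[var[t] if t == k else alpha * math.sqrt(var[t] * var[k])
            for k in range(n)] for t in range(n)]     # Corr times standard deviations
    return mu, var, cov


# Three visits for one subject: intercept and time as explanatory variables.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
mu, var, cov = marginal_moments(X, beta=[0.5, -0.1], sigma2=1.5, alpha=0.3)
```

No likelihood appears anywhere in the sketch: the covariance matrix is built entirely from the assumed mean, variance and correlation functions, which is the sense in which marginal models avoid specifying the joint distribution.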
The GEE approach was proposed to estimate the regression parameters, and the resulting estimators were found to be robust to misspecification of the correlation structure. For details of the GEE approach, see Liang and Zeger (1986) and Zeger and Liang (1986).

2.2 Random effects models

When the scientific objective concerns the individual subjects, and not only the population average, random effects models are extremely useful. In a random effects model, the regression coefficients can vary across subjects, and this variability accounts for the natural heterogeneity among subjects. One way of specifying a random effects model is as follows:

• the responses $Y_{j1}, \ldots, Y_{jn_j}$ are conditionally independent given the random effects $U_j$, $j = 1, \ldots, h$, with conditional densities of the form
$$f(y_{jt} \mid U_j) = \exp\left\{\frac{y_{jt}\theta_{jt} - b(\theta_{jt})}{\sigma^2} + c(y_{jt}, \sigma^2)\right\},$$
where $b$ and $c$ are known real functions and $\theta_{jt}$ and $\sigma^2$ are unknown natural and dispersion parameters, respectively;
• the conditional mean and variance satisfy $h(E(Y_{jt} \mid U_j)) = x_{jt}^T\beta + d_{jt}^T U_j$ and $\mathrm{Var}(Y_{jt} \mid U_j) = \sigma^2 V(E(Y_{jt} \mid U_j))$, where $h$ and $V$ are known link and variance functions, respectively, $x_{jt}$ is the $t$th row of the design matrix for the fixed population effects, $\beta$ is the vector of unknown regression parameters, $d_{jt}$ is the $t$th row of the design matrix for the random individual effects and $U_j$ is the vector of unknown individual effects;
• the $U_j$'s are mutually independent.

Note that the heterogeneity among subjects is represented by a probability distribution, and the serial correlation between observations from the same subject is assumed to be caused by their sharing the same $U_j$. Moreover, the inclusion of $d_{jt}$ in the model permits some explanatory variables to have random coefficients while the remaining ones do not, which allows greater flexibility in applications of these models.
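The claim that a shared $U_j$ induces serial correlation can be checked by simulation. The Python sketch below assumes the simplest special case of the specification above, a Gaussian response with identity link and a random intercept only ($d_{jt} = 1$), so that $Y_{jt} = \beta_0 + U_j + e_{jt}$ with $U_j \sim N(0, \tau^2)$ and $e_{jt} \sim N(0, \sigma^2)$; in that case the within-subject correlation is $\tau^2/(\tau^2 + \sigma^2)$. The names `simulate_pairs` and `within_subject_correlation` are hypothetical.

```python
import math
import random


def simulate_pairs(h, beta0, tau2, sigma2, seed=0):
    """Two observations per subject from a Gaussian random-intercept model:
    Y_jt = beta0 + U_j + e_jt, with U_j ~ N(0, tau2) and e_jt ~ N(0, sigma2)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(h):
        u = rng.gauss(0.0, math.sqrt(tau2))            # shared subject effect U_j
        y1 = beta0 + u + rng.gauss(0.0, math.sqrt(sigma2))
        y2 = beta0 + u + rng.gauss(0.0, math.sqrt(sigma2))
        pairs.append((y1, y2))
    return pairs


def within_subject_correlation(pairs):
    """Sample correlation between the first and second observation of each subject."""
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    s12 = sum((y1 - m1) * (y2 - m2) for y1, y2 in pairs)
    s11 = sum((y1 - m1) ** 2 for y1, _ in pairs)
    s22 = sum((y2 - m2) ** 2 for _, y2 in pairs)
    return s12 / math.sqrt(s11 * s22)


r = within_subject_correlation(simulate_pairs(20000, 1.0, tau2=1.0, sigma2=1.0))
```

With $\tau^2 = \sigma^2 = 1$, `r` should be close to the theoretical value 0.5, even though the two observations are independent given $U_j$.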
Since the likelihood in these models is explicitly stated, the regression parameters are usually estimated by maximum likelihood or restricted maximum likelihood. For more details, see Laird and Ware (1982) and Zeger and Karim (1991).

2.3 Transition models

Transition models are very useful when the dependence of the response on both the explanatory variables and the past responses is investigated. In a transition model, the conditional distribution of $Y_{jt}$ is assumed to depend on both the explanatory variables and the past responses, $H_{jt} = \{y_{jk}, k = 1, \ldots, t-1\}$. A transition model is specified by:

• the conditional distribution of $Y_{jt}$ given $H_{jt}$, assumed to be of the form
$$f(y_{jt} \mid H_{jt}) = \exp\left\{\frac{y_{jt}\theta_{jt} - b(\theta_{jt})}{\sigma^2} + c(y_{jt}, \sigma^2)\right\},$$
where $b$ and $c$ are known real functions and $\theta_{jt}$ and $\sigma^2$ are unknown natural and dispersion parameters, respectively;
• the conditional mean and variance, which satisfy $h(E(Y_{jt} \mid H_{jt})) = x_{jt}^T\beta + \psi_t(H_{jt}, \beta, \alpha)$ and $\mathrm{Var}(Y_{jt} \mid H_{jt}) = \sigma^2 V(E(Y_{jt} \mid H_{jt}))$, where $\psi_t$ is a known function, $\beta$ is the regression parameter, $\alpha$ is an additional parameter relating $Y_{jt}$ to $H_{jt}$, and $h$ and $V$ are known link and variance functions, respectively.

A Markov chain assumption for the series of responses is commonly used. For example, in an order 1 Markov chain, the response $Y_{jt}$ is assumed to depend only on the immediately preceding response, $Y_{j,t-1}$. Transition models with a Markov assumption are routinely used to analyze binary, categorical and count data. Note that the regression parameter $\beta$ is assumed to be the same across subjects, so the natural heterogeneity among subjects must be explained by the past responses $H_{jt}$. A shortcoming of transition models is the possible confounding of the effects of the explanatory variables with those of the past responses. For specific examples, see Zeger and Qaqish (1988) and Kaufmann (1987).
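As an illustration of how $\psi_t$ enters the conditional mean, here is a Python sketch of one hypothetical order 1 Markov specification for counts: a log link, a scalar covariate, and $\psi_t = \alpha\log\{\max(y_{t-1}, c)\}$, where the floor $c$ (set to 0.5 here as an assumption) keeps the logarithm finite after a zero count. This particular $\psi_t$ is chosen for illustration only and is not claimed to be the form used in the papers cited above; the function name `transition_means` is also hypothetical.

```python
import math


def transition_means(y, x, beta, alpha, c=0.5):
    """Conditional means E(Y_t | H_t), t = 2, ..., n, for an order 1 Markov
    log-linear model with psi_t = alpha * log(max(y_{t-1}, c)).

    Only the immediately preceding response enters, as in an order 1 chain.
    """
    return [math.exp(x[t] * beta + alpha * math.log(max(y[t - 1], c)))
            for t in range(1, len(y))]


counts = [0, 2, 5, 3]
means = transition_means(counts, x=[1.0, 1.0, 1.0, 1.0], beta=0.2, alpha=0.5)
```

Because the covariate is constant in this example, all variation in `means` comes from the previous response, which shows directly how past responses can absorb effects that might otherwise be attributed to covariates.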
2.4 State space models

The state space models for longitudinal data proposed in the literature are diverse both in approach and in method of estimation. In these models, the response is related to an unobserved latent process, and assumptions are made on the conditional distribution of the response given the latent process. The Kalman filter and smoother are commonly used to predict the latent process; see Durbin (1990), Fahrmeir and Kaufmann (1991), Jones (1993) and Fahrmeir and Tutz (1994). Models for Gaussian data are well developed, and the idea has recently been extended to non-Gaussian data. For example, models for univariate Poisson counts have been proposed by Azzalini (1982), Zeger (1988), Harvey and Fernandes (1989) and Chan and Ledolter (1995), and Jørgensen, Lundbye-Christensen, Song and Sun (1995a) developed models for multivariate count data.

Most state space models do not explicitly specify the distribution of the latent process, only its first and second moments. The model proposed by West, Harrison and Migon (1985) is one such model, and in the model developed by Zeger (1988) for Poisson counts only second-moment assumptions were made for the latent process. When the probability model is not fully specified in this way, the quasi-likelihood method is used to estimate the regression parameters. One shortcoming of the quasi-likelihood approach is that the results are difficult to interpret in the absence of full distributional assumptions. Jørgensen et al. (1995a) overcame this by explicitly modelling both the latent process and the data; these models were further developed into our current approach for handling various types of multivariate longitudinal data.

Chapter 3 Tweedie state space models

These models are generalizations of the Poisson-gamma models considered by Jørgensen et al.
(1995a), in which the observations are multivariate Poisson counts and the latent process is assumed to follow a gamma distribution. The new method is based on the class of Tweedie exponential dispersion models and can be used for regression analysis of multivariate longitudinal data in which the components (called categories) of the response vector can be of different data types: discrete, continuous or mixed. For more details, see Jørgensen et al. (1995c).

3.1 The model

Consider the case where a $d$-dimensional vector of observations is recorded at equally spaced times $t$ for each of $h$ subjects. We then have $h$ independent series of observations, indexed by $j = 1, \ldots, h$, where the $j$th series has length $n_j$. Let $Y_{ijt}$ denote the observation for category $i$, series $j$ and time $t$; $i = 1, \ldots, d$, $j = 1, \ldots, h$ and $t = 1, \ldots, n_j$. The vector of observations and the value of the latent process at time $t$ for series $j$ are denoted by $Y_{jt}$ and $\theta_{jt}$, respectively. Let $\theta_{j0}$ denote the initial value of the latent process for series $j$; the variation among these initial values, the $\theta_{j0}$'s, is assumed to explain the natural heterogeneity across subjects. To summarize, the model consists of three parts: the observation model, the latent process and the initial values.

• The observation model: The categories of $Y_{jt}$ are assumed to be conditionally independent given $\theta_{jt}$, and the conditional distribution of the $i$th category of $Y_{jt}$ is assumed to follow a Tweedie exponential dispersion model,
$$Y_{ijt} \mid \theta_{jt} \sim \mathrm{Tw}_{p_i}\left(a_{ijt}\theta_{jt},\ \nu_i^2\theta_{jt}^{1-p_i}\right) \qquad (3.1)$$
with $a_{ijt} = \exp(x_{jt}^T\alpha_{ij})$, where $x_{jt} \in \mathbb{R}^k$ are the time-varying short-term covariates modifying the conditional mean of the observations given the latent process $\theta_{jt}$, $\alpha_{ij} \in \mathbb{R}^k$ are the regression parameters, which can vary across categories, $p_i$ is the known shape parameter, which determines the distributional form of the category, and $\nu_i^2$ is the dispersion parameter, which can differ between categories even if all the $p_i$'s are the same. Although the model allows the dispersion parameter to differ across both categories and patients, we consider the case where it depends only on the category $i$ and not on the patient $j$. The notation $\mathrm{Tw}_p(\mu, \delta^2)$ denotes the Tweedie model with mean $\mu$, variance $\delta^2\mu^p$ and known shape parameter $p$, where $p \ge 1$ or $p \le 0$. The case $p_i = 1$ with $\nu_i^2 = 1$ gives the Poisson distribution, and the case $p_i = 2$ corresponds to the gamma distribution; note that the dispersion parameter of the Poisson distribution is equal to 1. A summary of these exponential dispersion models is given in Appendix A. The conditional expectation and variance are
$$E(Y_{ijt} \mid \theta_{jt}) = a_{ijt}\theta_{jt} \quad\text{and}\quad \mathrm{Var}(Y_{ijt} \mid \theta_{jt}) = \nu_i^2 a_{ijt}^{p_i}\theta_{jt}, \qquad (3.2)$$
respectively. For example, in the Betaseron data, which we analyze in Chapter 4, the latent process $\theta_{jt}$ may be interpreted as the underlying severity of multiple sclerosis, and the observations $Y_{ijt}$ as the symptoms reflecting this underlying severity. We assume that the conditional distributions of the positive continuous category and of the count category are gamma and Poisson, respectively, and that the dispersion parameter of each category is the same across patients.

• The latent process: The latent process for each series is assumed to be an order 1 Markov chain with a gamma ($\mathrm{Tw}_2$) transition distribution (3.3), with $b_{jt} = \exp(\Delta z_{jt}^T\beta_j)$, where $z_{jt} \in \mathbb{R}^l$ are the time-varying long-term covariates with increments $\Delta z_{jt} = z_{jt} - z_{j,t-1}$, $\beta_j \in \mathbb{R}^l$ are the regression parameters and $\sigma^2$ is the dispersion parameter. Moreover, we assume $z_{j0} = 0$. The conditional expectation and variance are, respectively,
$$E(\theta_{jt} \mid \theta_{j,t-1}) = b_{jt}\theta_{j,t-1} \quad\text{and}\quad \mathrm{Var}(\theta_{jt} \mid \theta_{j,t-1}) = \sigma^2\theta_{j,t-1}. \qquad (3.4)$$

• The initial values: The marginal distribution of the initial values is assumed to be
$$\theta_{j0} \sim \mathrm{Tw}_2(g_j, \omega^2) \qquad (3.5)$$
with
$$g_j = \exp(w_j^T\gamma) = E(\theta_{j0}), \qquad (3.6)$$
where $w_j \in \mathbb{R}^m$ are the constant baseline covariates, which reflect systematic variation between series, $\gamma \in \mathbb{R}^m$ are the regression parameters and $\omega^2$ is the dispersion parameter reflecting the random variation between series.

For example, the treatment groups and the other baseline covariates in the Betaseron study are treated as constant baseline covariates. The correlation between the categories $Y_{ijt}$, $i = 1, \ldots, d$, is assumed to be caused by their shared $\theta_{jt}$. Short-term covariates have short-lived effects on the observations, whereas the effects of long-term covariates can carry over to the next few data points.

We now present the marginal expectations and variances for model (3.1) to (3.6). For the initial values, the variance of $\theta_{j0}$ is $\mathrm{Var}(\theta_{j0}) = \omega^2 g_j^2$. For the latent process, the marginal expectation of $\theta_{jt}$ is
$$E(\theta_{jt}) = \tau_{jt} = g_j b_{j1}\cdots b_{jt} = \exp(z_{jt}^T\beta_j + w_j^T\gamma), \qquad (3.7)$$
where the last equality holds because, with $z_{j0} = 0$, the product $b_{j1}\cdots b_{jt}$ telescopes to $\exp(z_{jt}^T\beta_j)$. The marginal variance of $\theta_{jt}$ is $\mathrm{Var}(\theta_{jt}) = \sigma^2\phi_{jt}\tau_{jt} + \omega^2\tau_{jt}^2$, where
$$\phi_{jt} = b_{jt}^{-1} + b_{jt}b_{j,t-1}^{-1} + \cdots + b_{jt}b_{j,t-1}\cdots b_{j2}b_{j1}^{-1}.$$

For example, in the Betaseron study, time is the only long-term covariate we consider. Since the mean of the latent process is log-linear in the covariates, as seen in (3.7), the time effect may be interpreted as the rate of change of the log of the underlying severity of the disease; a positive estimate of the time effect suggests that the disease is increasing in severity.

For the observation model, the marginal expectations of the observations are
$$E(Y_{ijt}) = a_{ijt}\tau_{jt} = \exp(x_{jt}^T\alpha_{ij} + z_{jt}^T\beta_j + w_j^T\gamma).$$
Note that the means of the observations, of the latent process and of the initial values are all log-linear in the covariates. Furthermore, the marginal variance of $Y_{jt}$ has the form
$$\mathrm{Var}(Y_{jt}) = \Lambda_{jt}\tau_{jt} + a_{jt}a_{jt}^T\,\mathrm{Var}(\theta_{jt}),$$
where $\Lambda_{jt} = \mathrm{diag}(\nu_1^2 a_{1jt}^{p_1}, \ldots, \nu_d^2 a_{djt}^{p_d})$ and $a_{jt} = (a_{1jt}, \ldots, a_{djt})^T$. This expression shows that, at time $t$, the categories are positively correlated: $\Lambda_{jt}$ is diagonal, while every off-diagonal element of the second term is positive.
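The marginal moments above can be evaluated recursively along a series. The following Python sketch is illustrative only: it assumes $\mathrm{Var}(\theta_t \mid \theta_{t-1}) = \sigma^2\theta_{t-1}$ (the form that matches $\phi_{jt}$ above), for which $\phi_t = b_t\phi_{t-1} + b_t^{-1}$ with $\phi_0 = 0$, and it takes the diagonal of $\Lambda_t$ as input. The function name and the parameter values are hypothetical.

```python
import math


def ssm_marginal_moments(g, omega2, sigma2, b, a, lam):
    """Marginal moments for one series of the Tweedie state space model.

    g, omega2 : mean and dispersion of theta_0, so Var(theta_0) = omega2 * g**2
    sigma2    : latent dispersion; Var(theta_t | theta_{t-1}) = sigma2 * theta_{t-1}
                is ASSUMED here (the form matching phi_t in the text)
    b         : [b_1, ..., b_n]
    a, lam    : per-time lists a_t = (a_1t, ..., a_dt) and the diagonal of
                Lambda_t, i.e. nu_i**2 * a_it**p_i
    Returns a list of (tau_t, Var(theta_t), E(Y_t), Var(Y_t)) for t = 1..n.
    """
    tau, phi = g, 0.0
    out = []
    for b_t, a_t, lam_t in zip(b, a, lam):
        tau *= b_t                      # tau_t = b_t * tau_{t-1}
        phi = b_t * phi + 1.0 / b_t     # phi_t = b_t * phi_{t-1} + 1 / b_t
        var_theta = sigma2 * phi * tau + omega2 * tau ** 2
        mean_y = [a_i * tau for a_i in a_t]
        d = len(a_t)
        # Var(Y_t) = Lambda_t tau_t + a_t a_t^T Var(theta_t)
        var_y = [[(lam_t[i] * tau if i == k else 0.0) + a_t[i] * a_t[k] * var_theta
                  for k in range(d)] for i in range(d)]
        out.append((tau, var_theta, mean_y, var_y))
    return out


# Hypothetical values: gamma category (p = 2, nu^2 = 0.5) and Poisson category (p = 1).
b = [math.exp(0.1), math.exp(-0.2), math.exp(0.05)]
a = [[2.0, 1.0]] * 3
lam = [[0.5 * 2.0 ** 2, 1.0 * 1.0 ** 1]] * 3
moments = ssm_marginal_moments(5.0, 0.2, 0.5, b, a, lam)
```

Unrolling the recursion for $\phi_t$ reproduces the sum $b_t^{-1} + b_tb_{t-1}^{-1} + \cdots$ in the text, and iterating $\mathrm{Var}(\theta_t) = b_t^2\mathrm{Var}(\theta_{t-1}) + \sigma^2\tau_{t-1}$ directly gives the same values, which is a convenient numerical check.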
Finally, for observation vectors separated by a lag $s > 0$, the covariance of $Y_{jt}$ and $Y_{j,t+s}$ is
$$\mathrm{Cov}(Y_{jt}, Y_{j,t+s}) = a_{jt}a_{j,t+s}^T\, b_{j,t+1}\cdots b_{j,t+s}\,\mathrm{Var}(\theta_{jt}).$$

3.2 The Kalman filter and smoother

For each series $j$, the Kalman filter predicts the future observation $Y_{j,t+1}$ and the current value of the latent process $\theta_{jt}$ from the past and current observations $Y_{j1}, \ldots, Y_{jt}$, whereas the Kalman smoother predicts the latent process from all the observations in the series, $Y_{j1}, \ldots, Y_{jn_j}$. These filtered and smoothed values are used to estimate, or more precisely to predict, the values of $\theta_{jt}$ in model (3.1) to (3.6).

From now on, we consider filtering and smoothing for one series at a time, and the index $j$ is suppressed for notational convenience. We define the innovations for the latent process by
$$\xi_t = \theta_t - E(\theta_t \mid \theta^{t-1}) = \theta_t - b_t\theta_{t-1}$$
and for the observation model by
$$\varepsilon_t = Y_t - E(Y_t \mid \theta^t) = Y_t - a_t\theta_t,$$
where $\theta^t = (\theta_0, \ldots, \theta_t)^T$, $t = 1, \ldots, n$. The model can then be expressed in the additive form
$$Y_t = a_t\theta_t + \varepsilon_t, \qquad \theta_t = b_t\theta_{t-1} + \xi_t,$$
where $\varepsilon_t$ and $\theta_t$ are uncorrelated, and so are $\xi_t$ and $\theta_{t-1}$. These $2n$ innovations are mutually uncorrelated, so the model has the same correlation structure as an ordinary Gaussian state space model. The variances of the innovations are
$$\mathrm{Var}(\xi_t) = E(\xi_t^2) = \sigma^2\tau_{t-1} \quad\text{and}\quad \mathrm{Var}(\varepsilon_t) = E(\varepsilon_t\varepsilon_t^T) = \Lambda_t\tau_t.$$

The forward recursion for the filtered values $m_t$ and their mean square errors $C_t$ for the latent process is, for $t = 1, \ldots, n$,
$$m_t = b_t\left\{m_{t-1} + D_t a_t^T Q_t^{-1}(Y_t - f_t)\right\} \qquad (3.8)$$
and
$$C_t = b_t D_t\left(1 - b_t D_t a_t^T Q_t^{-1} a_t\right), \qquad (3.9)$$
where
$$D_t = b_t C_{t-1} + \sigma^2 b_t^{-1}\tau_{t-1}, \qquad f_t = a_t b_t m_{t-1} \qquad\text{and}\qquad Q_t = b_t D_t a_t a_t^T + \Lambda_t\tau_t.$$
The recursion starts with $m_0 = g$ and $C_0 = \omega^2 g^2$.

When all $n$ observations are available, the smoothed values $m_t^*$ and their mean square errors $C_t^*$ for the latent process are obtained by the backward recursion, for $t = n-1, \ldots, 1, 0$, starting with $m_n^* = m_n$ and $C_n^* = C_n$:
$$m_t^* = m_t + \frac{C_t}{D_{t+1}}\left(m_{t+1}^* - b_{t+1}m_t\right) \qquad (3.10)$$
and

$$C_t^* = \sigma^2\frac{C_t}{D_{t+1}}\,b_{t+1}^{q-1}\tau_t + \left(\frac{C_t}{D_{t+1}}\right)^2 C_{t+1}^*. \qquad (3.11)$$

For detailed arguments, see Harvey (1981, Chapter 4) or Jørgensen et al. (1995a).

3.3 Parameter estimation

Let the full data vector of the $N = n_1 + \cdots + n_h$ observations $Y_{jt}$ be

$$Y = (Y_{11}^T, \ldots, Y_{1n_1}^T, \ldots, Y_{h1}^T, \ldots, Y_{hn_h}^T)^T$$

and the vector of the smoothed values be

$$m^* = (m_{10}^*, \ldots, m_{1n_1}^*, \ldots, m_{h0}^*, \ldots, m_{hn_h}^*)^T.$$

The two regression parameter vectors are defined by

$$\alpha = (\alpha_{11}^T, \ldots, \alpha_{d1}^T, \ldots, \alpha_{1h}^T, \ldots, \alpha_{dh}^T)^T, \qquad \beta = (\beta_1^T, \ldots, \beta_h^T)^T.$$

The Kalman estimating equation method is used to estimate the regression parameters $\eta = (\alpha^T, \beta^T, \gamma^T)^T$. The dispersion parameters $\nu_i^2$, $\sigma^2$ and $\omega^2$ are estimated by adjusted Pearson estimates.

3.3.1 The Kalman estimating equation

The Kalman estimating equation method corresponds to replacing the E-step of the EM algorithm by the Kalman smoother; however, instead of the EM algorithm, the Newton scoring algorithm, defined below, is used. When the values of the dispersion parameters $\nu_i^2$, $\sigma^2$ and $\omega^2$ are known, the regression parameter $\eta$ is estimated by $\hat\eta$, the solution to the Kalman estimating equation $\psi(\eta) = 0$, where the vector $\psi$ has components

$$\psi_1(\eta) = X^T K_\alpha^{-1}(Y - Am^*), \qquad (3.12)$$

$$\psi_2(\eta) = \Delta Z^T K_\beta^{-1}Bm^*, \qquad (3.13)$$

and

$$\psi_3(\eta) = W^T K_\gamma^{-1}(m_0^* - g), \qquad (3.14)$$

where $A$ and $B$ are suitably defined matrices such that the elements of $Y - Am^*$ are $Y_{jt} - a_{jt}m_{jt}^*$ and those of $Bm^*$ are $m_{jt}^* - b_{jt}m_{j,t-1}^*$; the matrices $X$, $\Delta Z$ and $W$ are the design matrices of the generalized linear models (3.1)–(3.2), (3.3)–(3.4) and (3.5)–(3.6), respectively;

$$K_\alpha = \mathrm{diag}(\nu_1^2 a_{111}^{p_1-1}, \ldots, \nu_d^2 a_{dhn_h}^{p_d-1}),$$
$$K_\beta = \mathrm{diag}(b_{11}^{q-1}, \ldots, b_{1n_1}^{q-1}, \ldots, b_{h1}^{q-1}, \ldots, b_{hn_h}^{q-1}),$$
$$K_\gamma = \mathrm{diag}(g_1^{r-1}, \ldots, g_h^{r-1}),$$

$g = (g_1, \ldots, g_h)^T$ and $m_0^* = (m_{10}^*, \ldots, m_{h0}^*)^T$.

The estimating equation $\psi(\eta) = 0$ is unbiased. The estimator $\hat\eta$ is asymptotically normal, and its standard errors can be obtained from the inverse of the Godambe information matrix

$$J(\eta) = S^T(\eta)V^{-1}(\eta)S(\eta),$$

where $S(\eta) = E\{\partial\psi(\eta)/\partial\eta^T\}$ is the sensitivity matrix and $V(\eta) = E\{\psi(\eta)\psi^T(\eta)\}$ is the variability matrix. The matrices $S(\eta)$ and $V(\eta)$ are calculated in Appendix B.
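The filter and smoother recursions (3.8)–(3.11) of Section 3.2, on which the estimating equation relies, can be sketched for one series as follows. This is a minimal NumPy illustration, not the thesis's own code: it assumes that $a_t$, $b_t$ and all dispersion and power parameters are known numbers (in practice they come from the current parameter values in the iterations), and the function name and argument layout are hypothetical.

```python
import numpy as np

def kalman_filter_smoother(Y, a, b, nu2, p, sigma2, q, g, omega2, r):
    """Kalman filter (3.8)-(3.9) and smoother (3.10)-(3.11) for one series.

    Y, a   : (n, d) arrays with rows Y_t and a_t, t = 1, ..., n
    b      : (n,) array with b_t, t = 1, ..., n (so b[t] holds b_{t+1})
    nu2, p : (d,) dispersion parameters nu_i^2 and Tweedie powers p_i
    Returns m, C, D, m_star, C_star, each of length n + 1 (index t = 0..n; D[0] unused).
    """
    Y, a, b = (np.asarray(v, dtype=float) for v in (Y, a, b))
    nu2, p = np.asarray(nu2, dtype=float), np.asarray(p, dtype=float)
    n = len(b)
    tau = g * np.concatenate(([1.0], np.cumprod(b)))      # tau_t = g b_1 ... b_t
    m, C, D = np.empty(n + 1), np.empty(n + 1), np.full(n + 1, np.nan)
    m[0], C[0] = g, omega2 * g**r                           # m_0 = g, C_0 = omega^2 g^r
    for t in range(1, n + 1):
        at, bt = a[t - 1], b[t - 1]
        D[t] = bt * C[t - 1] + sigma2 * bt**(q - 1) * tau[t - 1]
        P = bt * D[t]                                       # MSE of predicting theta_t
        f = at * bt * m[t - 1]                              # f_t = a_t b_t m_{t-1}
        Q = P * np.outer(at, at) + np.diag(nu2 * at**p) * tau[t]   # Q_t
        Qinv_a = np.linalg.solve(Q, at)                     # Q_t^{-1} a_t
        m[t] = bt * m[t - 1] + P * (Qinv_a @ (Y[t - 1] - f))       # (3.8)
        C[t] = P * (1.0 - P * (at @ Qinv_a))                        # (3.9)
    m_star, C_star = m.copy(), C.copy()                     # m*_n = m_n, C*_n = C_n
    for t in range(n - 1, -1, -1):
        J = C[t] / D[t + 1]
        m_star[t] = m[t] + J * (m_star[t + 1] - b[t] * m[t])               # (3.10)
        C_star[t] = sigma2 * J * b[t]**(q - 1) * tau[t] + J**2 * C_star[t + 1]  # (3.11)
    return m, C, D, m_star, C_star
```

Because the model has the second-order structure of a linear state space model, the smoothed values from this sketch should coincide with a brute-force best linear prediction of $\theta_t$ from $(Y_1, \ldots, Y_n)$ built from the marginal moments; at $t = n$ the smoothed and filtered values are equal by construction.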
The joint significance of a subset of regression parameters is tested using the statistic

$$W = \hat\eta_1^T\,[J^{11}(\hat\eta)]^{-1}\,\hat\eta_1,$$

where $\eta_1$ is the subvector whose significance we want to test and $J^{11}(\hat\eta)$ is the corresponding block of the asymptotic covariance matrix $J^{-1}(\hat\eta)$ of $\hat\eta$. The test statistic $W$ is an analogue of Wald's statistic and asymptotically follows a $\chi^2(u)$ distribution, where $u$ is the dimension of the subvector $\eta_1$.

The Newton scoring algorithm, a parallel to Fisher's scoring method, is defined as the Newton algorithm applied to the equation $\psi(\eta) = 0$, but with the derivative of $\psi$ replaced by its expectation $S(\eta)$. This algorithm gives the updated value $\eta^*$ for $\eta$ according to

$$\eta^* = \eta - S^{-1}(\eta)\,\psi(\eta). \qquad (3.15)$$

An advantage of this algorithm is that $S(\eta)$ is calculated recursively, in parallel with the Kalman smoother; cf. Appendix B.

The initial parameter estimates are obtained by fitting the generalized linear models (3.1)–(3.6) with the $\theta_{jt}$ replaced by initial estimates, which can be found by averaging the $d$ components of $Y_{jt}$. If $d = 1$, some smoothing is required, for example a moving average of the series. The algorithm is then started by calculating the smoothed values, and so on.

When the dispersion parameters are unknown, they are estimated by the unbiased estimating equations (3.16), (3.17) and (3.18) in Section 3.3.2. These three equations, together with the equation $\psi(\eta) = 0$, form a set of unbiased estimating equations. Hence, the estimator obtained by solving this set of equations is asymptotically unbiased. The Newton scoring algorithm obtains updated values for the regression parameters $\eta$ and the dispersion parameters $\sigma^2$, $\nu_i^2$ and $\omega^2$ in two stages: first, it updates $\eta$ via (3.15); second, it updates $\sigma^2$, $\nu_i^2$ and $\omega^2$ via (3.16), (3.17) and (3.18). This extended algorithm is also referred to as the Newton scoring algorithm.
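The Wald-type statistic $W$ and one Newton scoring update (3.15) can be sketched generically in NumPy. The function names are hypothetical, and the estimating function $\psi$ and sensitivity $S$ are passed in as callables; in the thesis they come from the Kalman smoother and Appendix B, which this sketch does not reproduce.

```python
import numpy as np

def wald_statistic(eta_hat, cov, idx):
    """W = eta_1^T [J^{11}]^{-1} eta_1 for the subvector eta_hat[idx].

    cov is the asymptotic covariance matrix J^{-1}(eta_hat); its (idx, idx)
    block is J^{11}. Returns W and its chi-square degrees of freedom u."""
    eta_hat = np.asarray(eta_hat, dtype=float)
    idx = np.asarray(idx)
    e1 = eta_hat[idx]
    block = np.asarray(cov, dtype=float)[np.ix_(idx, idx)]
    return float(e1 @ np.linalg.solve(block, e1)), len(idx)

def newton_scoring_step(eta, psi, S):
    """One update eta* = eta - S(eta)^{-1} psi(eta), as in (3.15)."""
    eta = np.asarray(eta, dtype=float)
    return eta - np.linalg.solve(S(eta), psi(eta))
```

As a sanity check of the update rule, for the linear estimating function $\psi(\eta) = X^T(y - X\eta)$ the sensitivity is $S = -X^TX$, and a single step from any starting value lands on the least-squares solution.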
The correct asymptotic standard errors for the estimator $\hat\eta$ could be calculated from the Godambe information matrix for the full set of estimating equations, but this involves third and fourth moments of the Kalman smoother, which we have not calculated. Instead, we calculate the asymptotic standard errors from the Godambe information matrix for the estimating equation $\psi(\eta) = 0$, as outlined earlier in this section. If the vector $\psi$ depends little on $\nu_i^2$, $\sigma^2$ and $\omega^2$, this method gives approximately correct asymptotic standard errors. How the dependence of $\psi$ on $\nu_i^2$, $\sigma^2$ and $\omega^2$ affects the asymptotic standard errors of the estimator remains to be investigated.

3.3.2 Estimation of dispersion parameters

Because the smoothed value $m_{jt}^*$ is substituted for $\theta_{jt}$, the simple Pearson estimates of the dispersion parameters are biased downward. Following Jørgensen et al. (1995a), the adjusted Pearson estimates for the dispersion parameters $\sigma^2$, $\nu_i^2$ and $\omega^2$ are used instead:

$$\hat\sigma^2 = \frac{1}{N}\sum_{j=1}^{h}\sum_{t=1}^{n_j}\frac{(m_{jt}^* - b_{jt}m_{j,t-1}^*)^2}{b_{jt}^q m_{j,t-1}^*} + \frac{1}{N}\sum_{j=1}^{h}\sum_{t=1}^{n_j}\frac{C_{jt}^* + b_{jt}^2 C_{j,t-1}^* - 2b_{jt}(C_{j,t-1}/D_{jt})\,C_{jt}^*}{b_{jt}^q m_{j,t-1}^*}, \qquad (3.16)$$

$$\hat\nu_i^2 = \frac{1}{N}\sum_{j=1}^{h}\sum_{t=1}^{n_j}\frac{(Y_{ijt} - a_{ijt}m_{jt}^*)^2}{a_{ijt}^{p_i}m_{jt}^*} + \frac{1}{N}\sum_{j=1}^{h}\sum_{t=1}^{n_j}\frac{a_{ijt}^2 C_{jt}^*}{a_{ijt}^{p_i}m_{jt}^*} \qquad (3.17)$$

and

$$\hat\omega^2 = \frac{1}{h}\sum_{j=1}^{h}\frac{(m_{j0}^* - g_j)^2}{g_j^r} + \frac{1}{h}\sum_{j=1}^{h}\frac{C_{j0}^*}{g_j^r}. \qquad (3.18)$$

The equations for $\sigma^2$, $\nu_i^2$ and $\omega^2$ are unbiased. The second term in (3.16) corrects the bias of the simple Pearson estimate, and so do those in (3.17) and (3.18). In most cases the dispersion parameters are unknown; hence, the Newton scoring algorithm updates $\eta$ via (3.15) and then $\sigma^2$, $\nu_i^2$ and $\omega^2$ via (3.16), (3.17) and (3.18), respectively.

3.4 Residual analysis

The main model assumptions for both the observed and unobserved parts of the model are checked by plots of standardized residuals (that is, residuals divided by their standard deviations). Techniques commonly used for generalized linear models are applied to check the assumptions on the distribution as well as on the regression part of the model.
The assumptions on the correlation structure of the data are examined by methods from time series analysis, in particular the autocorrelation function (ACF) and the partial autocorrelation function (PACF). When we check assumptions about the observations from one patient (such as the correlation structure), we consider only the residuals from that series. The distributional assumptions of the observation model and the latent process, however, are checked by plots of the residuals from all $h$ series.

The two main types of residuals considered are the filter and smoother residuals. First, the filter residuals are defined as the prediction errors based on the Kalman filter and are given by

$$e_{jt} = Y_{jt} - f_{jt} \quad\text{and}\quad \xi_{jt} = m_{jt} - b_{jt}m_{j,t-1}.$$

Their variances are

$$\mathrm{Var}(e_{jt}) = Q_{jt} \quad\text{and}\quad \mathrm{Var}(\xi_{jt}) = b_{jt}D_{jt} - C_{jt}.$$

The $e_{jt}$ are mutually uncorrelated over time, and so are the $\xi_{jt}$. Moreover, $e_{jt}$ is uncorrelated with $f_{jt}$, and $\xi_{jt}$ is uncorrelated with $b_{jt}m_{j,t-1}$. These properties make the $e_{jt}$ and $\xi_{jt}$ very useful in residual analysis, in particular for checking the correlation structure of the model. For example, an ACF plot of the filter residuals from a given series showing significant serial correlation may suggest a violation of the assumption that the observations are conditionally independent across time. However, these residuals may have relatively large variances for the first few observations of each series.

Second, the smoother residuals are given by

$$e_{jt}^* = Y_{jt} - a_{jt}m_{jt}^* \quad\text{and}\quad \xi_{jt}^* = m_{jt}^* - b_{jt}m_{j,t-1}^*.$$

Their variances are

$$\mathrm{Var}(e_{jt}^*) = A_{jt}\tau_{jt} - a_{jt}a_{jt}^T C_{jt}^*$$

and

$$\mathrm{Var}(\xi_{jt}^*) = \sigma^2 b_{jt}^q\tau_{j,t-1} - C_{jt}^* - b_{jt}^2 C_{j,t-1}^* + 2b_{jt}\frac{C_{j,t-1}}{D_{jt}}C_{jt}^*.$$

Unlike the filter residuals, the smoother residuals are correlated over time. In general, they have smaller variances than the corresponding filter residuals. All these residuals have mean 0 but different variances.
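The standardized latent filter residuals and the sample ACF used in these checks can be sketched in NumPy as follows. The function names are hypothetical; the inputs are assumed to be the filter output $m_t$, $C_t$ ($t = 0, \ldots, n$) and $D_t$, $b_t$ ($t = 1, \ldots, n$) for one series, and the rough $\pm 2/\sqrt{n}$ band is the usual large-sample guide for a white-noise ACF.

```python
import numpy as np

def standardized_latent_filter_residuals(m, C, D, b):
    """r_t = (m_t - b_t m_{t-1}) / sqrt(b_t D_t - C_t), t = 1, ..., n.

    m, C have length n + 1 (t = 0, ..., n); D and b have length n (t = 1, ..., n)."""
    m, C, D, b = (np.asarray(v, dtype=float) for v in (m, C, D, b))
    xi = m[1:] - b * m[:-1]                 # filter residuals xi_t
    return xi / np.sqrt(b * D - C[1:])      # divide by sqrt(Var(xi_t))

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag; values well outside about
    +/- 2/sqrt(len(x)) suggest serial correlation in the residuals."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    c0 = x @ x
    return np.array([x[:-k] @ x[k:] / c0 for k in range(1, max_lag + 1)])
```

The same `sample_acf` can be applied to each category's standardized observation residuals $R_{ijt}$ within a series.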
Note that the properties mentioned in this section do not take into account the effects of substituting estimates for the regression parameters.

Standardized filter and smoother residuals (having unit variance) from the latent process are denoted by $r_{jt}$ and $r_{jt}^*$, respectively. The residuals $e_{jt}$ are standardized componentwise; for each category $i$, we use $R_{ijt}$ and $R_{ijt}^*$ to denote the standardized filter and smoother residuals, respectively.

For the assumptions on the observation model, we may plot $R_{ijt}$ against $R_{ij,t-1}$ to check the conditional independence of the observations over time. For each category $i$ of the $j$th series, ACF and PACF plots of $R_{ijt}$ show whether the assumed correlation structure is appropriate. We can check the distributional assumptions (specifically, the form of the variance functions) by plotting the standardized filter residuals against the log fitted values, $\log(a_{ijt}m_{jt}^*)$. Any pattern in this plot would indicate that the chosen variance function is incorrect; for example, a "megaphone" shape would indicate that the variance of the residuals increases with the log fitted values. Because the observation means are log-linear in the covariates, the log link assumption can be checked by plotting $\log Y_{ijt}$ against $\log(a_{ijt}m_{jt}^*)$; if the log link assumption is incorrect, this plot should show curvature or an unusual shape. The linearity of the regression model can be checked by plotting the smoother residuals $R_{ijt}^*$ against each covariate.

The empirical variance–covariance matrix of the vectors of standardized filter residuals of series $j$,

$$H = \frac{1}{n_j}\sum_{t=1}^{n_j} Q_{jt}^{-1/2}e_{jt}e_{jt}^T Q_{jt}^{-T/2},$$

is used to check the assumption that the categories are conditionally independent given the latent process. Since the expectation of $H$ is the $d \times d$ identity matrix, the observed $H$ should not be very different from the identity matrix.
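The matrix $H$ can be sketched in NumPy as below. As an assumption, the lower Cholesky factor of $Q_{jt}$ is used as the square root $Q_{jt}^{1/2}$ (any other choice changes the standardization), and the off-diagonal elements are compared with the asymptotic standard error $n_j^{-1/2}$. Function names are hypothetical.

```python
import numpy as np

def h_matrix(e, Q):
    """H = (1/n) sum_t R_t R_t^T with R_t = L_t^{-1} e_t, where Q_t = L_t L_t^T.

    e : (n, d) filter residuals e_t = Y_t - f_t for one series
    Q : (n, d, d) their covariance matrices Q_t"""
    e, Q = np.asarray(e, dtype=float), np.asarray(Q, dtype=float)
    R = np.array([np.linalg.solve(np.linalg.cholesky(Qt), et) for et, Qt in zip(e, Q)])
    return R.T @ R / len(R)

def large_off_diagonals(H, n):
    """Pairs (i, k), i < k, whose |H_ik| exceeds the asymptotic s.e. 1/sqrt(n)."""
    se = 1.0 / np.sqrt(n)
    d = H.shape[0]
    return [(i, k) for i in range(d) for k in range(i + 1, d) if abs(H[i, k]) > se]
```

An empty list from `large_off_diagonals` gives no evidence against conditional independence of the categories for that series.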
One criterion that can be used is that the off-diagonal elements should not exceed their asymptotic standard errors. Note that the off-diagonal elements have asymptotic standard errors $n_j^{-1/2}$, and that the standardization depends on the version of the square-root matrix $Q_{jt}^{1/2}$ chosen.

Regarding the latent process, we can check the Markov assumption by plotting $r_{jt}$ against $r_{j,t-1}$. ACF and PACF plots of $r_{jt}$ are also useful for checking the assumptions on the correlation structure of the latent process. The distributional assumptions can be checked by plotting $r_{jt}$ against the log fitted values, $\log(b_{jt}m_{j,t-1}^*)$. This plot indicates whether the variance of the residuals depends on the log fitted values; the absence of any particular pattern suggests no evidence against the chosen variance function. Non-linearity of the model can be detected by plotting $r_{jt}^*$ against each covariate. Moreover, we can check the log link assumption by plotting $\log(m_{jt}^*)$ against $\log(b_{jt}m_{j,t-1}^*)$; a pattern in this plot may suggest that the log link assumption is inappropriate.

We now consider the assumptions on the initial values. Similarly, we can plot the standardized smoother residuals $m_{j0}^* - g_j$ against the log fitted values $\log(g_j)$ to check the form of the random effects distribution. Plots of these standardized smoother residuals against each covariate are useful for detecting nonlinearity of the model, and the log link assumption can be checked by plotting $\log(m_{j0}^*)$ against $\log(g_j)$.

An additional feature of this model is that residual plots can help to determine whether a covariate is long-term or short-term. The basic idea is that a short-term covariate would show an association with the observation residuals ($R_{ijt}$ or $R_{ijt}^*$) if it had been incorrectly fitted as a long-term covariate.
Similarly, a long-term covariate would show an association with the latent process residuals ($r_{jt}$ or $r_{jt}^*$) if it had been incorrectly fitted as a short-term covariate.

Chapter 4 Analysis of Betaseron data

4.1 Background

4.1.1 Description of the experiment

A multicenter, randomized, double-blind, placebo-controlled trial of interferon beta-1b in 372 ambulatory patients with relapsing-remitting multiple sclerosis (MS) established that interferon beta-1b substantially altered the natural history of MS (The IFNB Multiple Sclerosis Study Group, 1993). The interferon beta-1b was manufactured by Chiron Corporation, Emeryville, CA, and supplied to doctors as Betaseron by Berlex Laboratories, Richmond, CA. As a sub-study of this trial, a cohort of 52 patients at the University of British Columbia also had magnetic resonance imaging (MRI) scans every 6 weeks during the first two years of the trial (Paty, Li, the UBC MS/MRI Study Group and the IFNB Multiple Sclerosis Study Group, 1993). The main purpose of this frequent MRI sub-study was to learn more about the nature of MS lesions.

We consider data from this UBC 6-weekly frequent MRI sub-study of the Betaseron clinical trial in relapsing-remitting MS, hereafter referred to as the Betaseron data. In this sub-study, the response variables and the time-varying covariates for each patient were scheduled to be observed at baseline (the first day of treatment) and every 6 weeks thereafter during the first two years of the trial. Although patients were scheduled to have MRI scans (on which the responses were based) every 6 weeks, the actual times of these scans differed somewhat from this schedule. For a variety of reasons, several patients missed one or more of their scheduled MRI scans.
Two of the 52 patients dropped out very early and contributed very little data to this MRI sub-study. These patients were withheld from the analysis, so all analyses reported here are based on 50 patients. Moreover, several patients missed one or two isolated MRI scans; these missed scans are denoted by NA.

The only time-varying covariate we investigate in this thesis is time itself. Each patient was scheduled to have 17 MRI scans in addition to the baseline scan. These scans were critically interpreted, and the observed values of the responses were based on these interpretations. Time was arbitrarily set to 0 at baseline and increased by 1 for each subsequent scan; that is, time equals 1 for the first subsequent scan, 2 for the second subsequent scan, and so on. Several baseline covariates were also recorded. For more detailed descriptions of this study, we refer to Paty, Li, the UBC MS/MRI Study Group and the IFNB Multiple Sclerosis Study Group (1993); see also Petkau and White (1995).

The main objective of this thesis is to illustrate in detail the use of the Tweedie state space models described in Chapter 3 for analysing multivariate longitudinal data. We also aim to extract additional information from the data using these new models. In particular, the relationship between Betaseron and two of the response variables is considered simultaneously, and the patterns of these response variables over time are investigated using the Tweedie state space models.

4.1.2 Description of the data

The two response variables considered were active lesions (lesions) and burden of disease (burden). For each patient, each subsequent scan was compared with the immediately preceding one to determine the number of lesions that were new, recurrent or enlarging.
The value of active lesions is the sum of the numbers of new, recurrent and enlarging lesions at each scan. For each subsequent scan, the areas of MS lesions on all slices of the MRI scan were summed; this total area (in mm²) is defined as the burden of disease. Therefore, the response studied is a bivariate vector with two components (called categories): the first category, active lesions, is a count variable, whereas the second category, burden of disease, is a continuous variable. The data consist of 50 time series of response vectors, one for each patient.

As mentioned in Section 4.1.1, there were deviations from the target dates of the 6-weekly schedule; the practical difficulties in maintaining this schedule are understandable. The most notable of these deviations was for the final scan, which was delayed by up to about two weeks for most patients. Three patients had even longer delays to their final scans (approximately 3, 4 and 5 weeks, respectively), and two patients had their final scans approximately 2 weeks before the target date of the 6-weekly schedule. Most other deviations from the schedule were minor. Because of methodological constraints, in the analyses presented here we consider each scan to be taken at the target date for which it was intended, irrespective of the deviation of its actual date from that target date.

Because of early drop-outs, the length of these time series varies across patients. For example, one patient in the placebo group dropped out after time 14; the length of this time series is 14 instead of the targeted 17. Patients with early drop-outs do not complicate the analysis, since the new method allows different lengths across patients. However, the isolated missing scans give rise to some difficulties. There are two possible approaches to handle them.
Either we can withhold these NAs from the analyses, or we can replace them with imputed data (for example, the smoothed values of the latent process). For convenience, the former approach is used in the analyses that follow. There are altogether 5 isolated missing scans, a small number compared with the total number of scans taken in the sub-study. It is worth noting that when an isolated missing scan is withheld, the two scans originally adjacent to it are treated as consecutive scans. For example, one patient in the placebo group missed the scan at time 7; when this NA is withheld, the scans at times 6 and 8 are treated as consecutive, and the length of the time series becomes 16.

The treatment has three levels: placebo, low dose and high dose, and patients were randomized to one of these three groups. The baseline covariates are:
• age (in years),
• duration of disease (in years),
• initial EDSS score,
• origin (B.C. or Washington State),
• gender (male or female),
• initial burden of disease (in mm²).

Since the values of the initial burden of disease range from 14 to 12300, and because the baseline covariate effects are log-linear in our model, the log of the initial burden of disease is used as a covariate in the analyses that follow. For convenience, duration of disease, initial EDSS score and the log of the initial burden of disease are hereafter referred to simply as duration, EDSS and log area, respectively. Time, the only time-varying covariate, is included in the model as a long-term covariate.

The Betaseron data were previously analysed by Petkau and White (1995), who used the generalized estimating equation (GEE) approach developed by Liang and Zeger (1986) and investigated one variable at a time. Among the variables that they analysed was the burden of disease.
D'yachkova (1997) analysed the whole data set from the Betaseron clinical trial; among the responses, she investigated active lesions using the GEE approach. Later, the results of Petkau and White (1995) and D'yachkova (1997) will be compared with those obtained from our approach.

4.2 Initial data analysis

4.2.1 Burden of disease

Following Petkau and White (1995), the response was transformed into the variable relative burden (the ratio of the value at each subsequent scan to that at baseline). Patient #505 was removed from the data set for further analysis, because the transformation is not defined for this patient, who had zero values at baseline and at all subsequent scans except at times 3 and 4. Further analysis is based on the transformed variable, relative burden.

Table 4.1: Descriptive statistics for average relative burden values

Statistic            Placebo   Low dose   High dose
Number of patients      17        16         16
Minimum              0.793     0.629      0.814
Maximum              3.625     1.569      1.246
Median               1.153     1.108      1.028
Mean                 1.474     1.167      1.023
Standard deviation   0.760     0.255      0.138

The descriptive statistics for the average of the relative burden values for each patient are summarized in Table 4.1. Boxplots of the average relative burden values by treatment group, presented in Figure 4.1, do not suggest a strong relationship with treatment group. However, the average relative burden values appear to decrease as the dosage of Betaseron increases. Three isolated extreme values (relative to the values of the same treatment group) are seen:
• patient #446 in the placebo group: average relative burden value 3.625,
• patient #449 in the placebo group: average relative burden value 2.627,
• patient #545 in the placebo group: average relative burden value 2.623.
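The transformation to relative burden and the per-patient summaries behind Table 4.1 can be sketched as follows. This is a minimal NumPy illustration with hypothetical function names: it withholds isolated missing scans (NaN) as described in Section 4.1.2, returns None when the baseline burden is zero (as for patient #505), and assumes the sample standard deviation (ddof=1), since the text does not state which version was used.

```python
import numpy as np

def relative_burden(burden):
    """Relative burden for one patient: burden at scans t = 1, 2, ... divided by
    the baseline value burden[0]. Missing scans (NaN) are withheld, so the
    remaining scans are treated as consecutive. Returns None if the baseline is 0."""
    burden = np.asarray(burden, dtype=float)
    if burden[0] == 0:
        return None                      # ratio undefined
    rel = burden[1:] / burden[0]
    return rel[~np.isnan(rel)]

def group_summary(averages):
    """Summary statistics of per-patient average relative burden values."""
    x = np.asarray(averages, dtype=float)
    return {"n": x.size, "min": x.min(), "max": x.max(), "median": np.median(x),
            "mean": x.mean(), "sd": x.std(ddof=1)}
```

Applying `relative_burden` to each patient, averaging each result, and passing the averages of one treatment group to `group_summary` gives one column of a table like Table 4.1.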
Boxplots of the relative burden values for each patient, shown in Figure 4.2, indicate that three patients (#446, #449 and #545) have higher values than the other patients in the placebo group. Two very extreme values (relative to the values of the same patient) are seen in Figure 4.2:
• patient #446 at time 17: relative burden value 18.214,
• patient #545 at time 6: relative burden value 8.179.

[Figure 4.1: Boxplots of average relative burden values by treatment group (Placebo, Low Dose, High Dose).]

These two extreme values will be referred to as 446* and 545*, respectively. The extreme average relative burden values of the three patients #446, #449 and #545 are not solely due to one or two isolated values; instead, Figure 4.2 indicates that the values of these three patients are generally larger. In addition, the two extreme relative burden values 446* and 545* occur in patients #446 and #545, respectively, whose values are already larger than those of the other patients in the placebo group. Special attention will be paid to investigating the influence of these two extreme values, 446* and 545*, on the analysis.

[Figure 4.2: Boxplots of relative burden values for each patient (Placebo, Low Dose, High Dose).]

Plots of the relative burden values versus time for each patient, presented in Figure 4.3, show that three patients (#446, #449 and #545) have larger values than the other patients in the placebo group. It also appears that the values and their variation over time tend to decrease as the dosage of Betaseron increases.
[Figure 4.3: Plots of relative burden values versus time for each patient.]

[Figure 4.4: Boxplots of average active lesions values across treatment groups (Placebo, Low Dose, High Dose).]

Table 4.2: Descriptive statistics for average active lesions values

Statistic            Placebo   Low dose   High dose
Number of patients      17        16         16
Minimum              0         0          0
Maximum              2.471     0.765      1.294
Median               0.375     0.147      0.059
Mean                 0.579     0.225      0.225
Standard deviation   0.618     0.217      0.342

4.2.2 Active lesions

As an initial step, the average of the active lesions values was calculated for each patient; descriptive statistics for these averages are presented in Table 4.2. Table 4.2 suggests a relationship with treatment group: the average value in the placebo group is higher than those in the low and high dose groups. Boxplots of the average active lesions values by treatment group, shown in Figure 4.4, also indicate a relationship with treatment group. Two patients have extreme average active lesions values (relative to the values of their own treatment group):
• patient #545 in the placebo group: average active lesions value 2.471,
• patient #497 in the high dose group: average active lesions value 1.294.

These two values stand out because most patients have low successive counts (mostly 0 or 1): as most patients have average active lesions values close to 0, patients with a few high counts stand out readily.

[Figure 4.5: Plots of active lesions values versus time (Placebo, Low Dose, High Dose).]
Table 4.3: Descriptive statistics for age, duration, EDSS and log area

Variable   Statistic            Placebo   Low dose   High dose
age        Median                 35.0      36.0       36.0
           Mean                   34.6      37.5       37.2
           Standard deviation      4.5       8.8        8.5
duration   Median                  6.9       6.1       10.1
           Mean                    7.4       9.5       11.0
           Standard deviation      5.0       7.2        6.4
EDSS       Median                  1.5       2.0        2.5
           Mean                    1.9       2.1        2.5
           Standard deviation      0.9       1.2        1.1
log area   Median                  7.5       7.0        7.5
           Mean                    6.9       6.9        7.5
           Standard deviation      1.6       1.6        0.8

Plots of active lesions values versus time for each patient are presented in Figure 4.5. Patient #545 in the placebo group appears to have generally larger values of active lesions than the other patients in the placebo group. Figure 4.5 also indicates that patient #497 has a larger number of active lesions only in the first 6 MRI scans; the number of active lesions becomes smaller from time 7 onwards. In general, the number of active lesions seems to decrease as the dosage of Betaseron increases.

4.2.3 Covariates

The descriptive statistics for age, duration, EDSS and log area are shown in Table 4.3, and these covariates are also summarized by the boxplots in Figure 4.6. The average age seems to be roughly the same across the treatment groups; however, the variability of age in the placebo group appears to be smaller than in the low and high dose groups.

[Figure 4.6: Boxplots of covariate values across treatment groups (panels: Age, Duration, EDSS, Log area; each by Placebo, Low Dose, High Dose).]

Table 4.4: Counts for origin and gender

Variable   Category     Placebo   Low dose   High dose
origin     B.C.            15        10         14
           Washington       2         6          2
gender     Male             7         2          3
           Female          10        14         13

The patients in the high dose group appear to have a longer duration of disease than those in the other groups, with roughly the same variability as those in the placebo group. The patients in the low dose group seem to have the highest variability in duration of disease. It is worth noting that EDSS is an ordinal variable; for convenience, we treat it as a continuous variable in the analyses reported here. However, this should be kept in mind, and including EDSS as an ordinal variable could be explored in future work. The initial EDSS scores appear to be roughly comparable in mean and variability across the treatment groups. The averages of the log area values seem to be roughly equal across the treatment groups, but the high dose patients appear to have the smallest spread of log area values of the three groups.

The counts for origin and gender in each treatment group are presented in Table 4.4. The origin of patients is highly unbalanced in the placebo and high dose groups: only 2 of the 17 patients in the placebo group and only 2 of the 16 patients in the high dose group were from Washington. Imbalances in the gender distribution were also observed in the low and high dose groups, whose patients were mostly female (only 2 of the 16 patients in the low dose group and 3 of the 16 patients in the high dose group were male). This descriptive information indicates imbalances in covariates across treatment groups, which have to be taken into account in the analyses that follow.

4.3 Model identification

The only time-varying covariate that we consider in this thesis is time itself, which is used as a long-term covariate.
All the baseline covariates, together with the three treatment groups, are included as constant baseline covariates in the initial-values part of the model in Section 3.1. Hence, our preliminary model for the Betaseron data is as follows.

Long-term covariate: time.
Baseline covariates: intercept; low dose; high dose; age; EDSS; duration; origin; gender; log area.

Let $Y_{jt}$ be the bivariate response vector of observations at time $t$ for patient $j$, with components (called categories) $Y_{ijt}$, $i = 1$ (relative burden), $2$ (active lesions); the notation of Section 3.1 is used here. We assume relative burden to follow a gamma distribution, which corresponds to $p_1 = 2$. We assume active lesions to be Poisson counts, which corresponds to $p_2 = 1$ and $\nu_2^2 = 1$. The latent processes for all patients are assumed to follow a gamma distribution; that is, we choose $q = 2$. For the initial values, the random effects are assumed to follow a gamma distribution, which corresponds to $r = 2$. The dispersion parameter $\nu_1^2$ is assumed to be the same across patients, and all the latent processes are assumed to have the same dispersion parameter $\sigma^2$. Note that the dispersion parameter of a Poisson distribution is known and equal to 1. The remaining unknown dispersion parameters are $\nu_1^2$, $\sigma^2$ and $\omega^2$, and they are estimated from the data.

In the initial-values part of the model, the effect for the placebo group is set to zero for identifiability. In this way, the intercept represents the placebo level, and the effects for low dose and high dose represent effects relative to placebo. The fitted model for the initial values becomes

$$\log g_{kj} = c + \tau_k + \gamma_1\,\mathrm{age} + \gamma_2\,\mathrm{EDSS} + \gamma_3\,\mathrm{duration} + \gamma_4\,\mathrm{origin} + \gamma_5\,\mathrm{gender} + \gamma_6\,\mathrm{log\ area},$$

where $g_{kj}$ is the expected initial value of the latent process for the $j$th patient in the $k$th treatment group, and $\tau_k$ is the treatment effect for the $k$th treatment group, $k = 0$ (placebo), $1$ (low dose), $2$ (high dose).
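Because both the initial-value model and the latent mean $\tau_{jt} = g_j b_{j1}\cdots b_{jt}$ are log-linear, effects act multiplicatively. The NumPy sketch below (hypothetical names) builds the design row with placebo as the reference level, so that two patients with identical baseline covariates in the low dose and placebo groups have $g_{1j}/g_{0j} = \exp(\tau_1)$. It also evaluates $\tau_{jt} = g_j\exp(\beta_j t)$, assuming time ($z_{jt} = t$) is the only long-term covariate, so that $b_{jt} = \exp(\beta_j)$. The 0/1 coding of origin and gender is an assumption.

```python
import math
import numpy as np

TREATMENTS = ("placebo", "low", "high")   # k = 0, 1, 2; placebo is the reference (tau_0 = 0)

def initial_value_row(treatment, age, edss, duration, origin, gender, log_area):
    """Design row (intercept, low dose, high dose, age, EDSS, duration, origin,
    gender, log area); origin and gender are assumed to be 0/1 indicators."""
    if treatment not in TREATMENTS:
        raise ValueError(f"unknown treatment: {treatment!r}")
    return np.array([1.0, float(treatment == "low"), float(treatment == "high"),
                     age, edss, duration, origin, gender, log_area])

def expected_initial_value(row, coef):
    """g = exp(c + tau_k + gamma_1 age + ... + gamma_6 log area)."""
    return float(np.exp(row @ np.asarray(coef, dtype=float)))

def latent_mean(g, beta_j, t):
    """tau_jt = g_j exp(beta_j t) when z_jt = t is the only long-term covariate."""
    return g * math.exp(beta_j * t)
```

Under this reading, a time effect of 0.057 (patient #545 with 446* removed, Table 4.6) multiplies the expected underlying severity by $\exp(0.057) \approx 1.059$ per 6-week period, an increase of roughly 6% per period.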
Note that the effect for the placebo group is $\tau_0 = 0$.

As mentioned in Section 4.2.1, there are two extreme values, 446* and 545*, in the variable relative burden. To investigate their effects on the analysis, we fit the model separately to the full data set, to the data set with 446* removed, and to the data set with both 446* and 545* removed. The estimates and their asymptotic standard errors (in brackets) for the baseline covariates are reported in Table 4.5, and those for the long-term covariate are presented in Tables 4.6, 4.7 and 4.8.

Our model allows the long-term effect to differ between patients, so there are altogether 49 effects for time, one for each patient. The regression parameter for time can be interpreted as the rate of change of the log of the underlying severity per 6-week period. That is, each patient is allowed a different development of the disease. This is reasonable, as there is natural heterogeneity among patients; moreover, patients may react differently to the drug. A positive estimate for the time effect suggests an increasing trend in severity over time. As the observations reflect the underlying severity, their patterns over time can be addressed by investigating the trends of the latent processes.

The baseline covariates influence the initial values of the latent process, with time taken into account. The main focus is on the effects of the treatment groups. As explained at the beginning of this section, the estimate for low dose represents the effect of the low dose group relative to the placebo group. A negative estimate suggests a lowering of the initial values, hinting at a possible improvement in the disease. The estimate for high dose has a similar interpretation. The two extreme values have a large influence on the estimates of the effects of intercept, low dose, high dose and log area.
For example, the estimate for the effect of the low dose group changes from 0.081 (full data set) to -0.118 (with 446* removed) and then to -0.067 (with 446* and 545* removed). Similarly, the estimate for the effect of the high dose group changes from -0.078 (full data set) to -0.173 (with 446* removed) and then to -0.131 (with 446* and 545* removed). The dominant effects of these extreme values on the estimates are not unexpected.

Table 4.5: Estimates and standard errors for baseline covariate effects

Parameter   full data set     with 446* removed   with 446* and 545* removed
            estimate (s.e.)   estimate (s.e.)     estimate (s.e.)
intercept   -0.131 (0.280)     1.048 (0.253)       0.924 (0.276)
low dose     0.081 (0.075)    -0.118 (0.066)      -0.067 (0.073)
high dose   -0.078 (0.081)    -0.173 (0.072)      -0.131 (0.078)
age         -0.006 (0.004)    -0.009 (0.004)      -0.006 (0.004)
EDSS         0.066 (0.032)     0.048 (0.029)       0.055 (0.031)
duration    -0.009 (0.005)    -0.007 (0.005)      -0.009 (0.005)
origin      -0.057 (0.080)    -0.082 (0.071)      -0.087 (0.078)
gender       0.051 (0.078)     0.053 (0.071)       0.043 (0.076)
log area     0.033 (0.021)    -0.095 (0.019)      -0.091 (0.021)

Although it is not desirable for the results of the analysis to be strongly influenced by a few extreme values, further investigation is required before removing them from the data set. Provided that the model gives a reasonable fit to the data, we will retain as many data points as possible to preserve valuable information. In the following section, we will see that the extreme value 446* causes some violations of the model assumptions. As it is the last data point of the time series for patient #446, it causes a strong upward trend of the latent process from time 7 onwards. When 446* is removed from the data set, the model fits reasonably well. The second extreme value, 545*, does not have much effect on the adequacy of the model, possibly because it lies in the middle of the time series.
However, it stands out as an outlier in the residual plots. On the one hand, we need to keep in mind the effects of these extreme values on model fitting. On the other hand, as both extreme values are found in placebo patients, they may be evidence of a treatment effect.

Table 4.6: Estimates and standard errors for the long-term covariate (time) effects for patients in the placebo group

Patient #   full data set     with 446* removed   with 446* and 545* removed
            estimate (s.e.)   estimate (s.e.)     estimate (s.e.)
420          0.011 (0.032)     0.010 (0.011)       0.011 (0.012)
421          0.031 (0.029)     0.017 (0.010)       0.022 (0.010)
443          0.017 (0.030)     0.013 (0.010)       0.020 (0.011)
446          0.935 (0.000)     0.007 (0.010)      -0.005 (0.011)
449          0.053 (0.026)     0.051 (0.009)       0.047 (0.009)
451          0.007 (0.033)     0.006 (0.012)       0.010 (0.012)
498          0.016 (0.029)     0.017 (0.011)       0.022 (0.011)
501          0.020 (0.031)     0.001 (0.011)       0.012 (0.011)
504          0.010 (0.034)    -0.020 (0.012)      -0.001 (0.012)
507          0.046 (0.028)     0.038 (0.010)       0.041 (0.010)
522          0.020 (0.029)     0.018 (0.011)       0.014 (0.011)
523          0.005 (0.031)     0.005 (0.011)       0.008 (0.011)
539          0.030 (0.033)     0.025 (0.013)       0.025 (0.013)
540          0.030 (0.029)    -0.012 (0.010)       0.005 (0.010)
545          0.056 (0.027)     0.057 (0.008)       0.068 (0.008)
550          0.000 (0.032)     0.002 (0.012)       0.006 (0.012)
565          0.025 (0.031)     0.028 (0.011)       0.026 (0.012)

Table 4.7: Estimates and standard errors for the long-term covariate (time) effects for patients in the low dose group

Patient #   full data set     with 446* removed   with 446* and 545* removed
            estimate (s.e.)   estimate (s.e.)     estimate (s.e.)
419          0.022 (0.030)     0.029 (0.011)       0.026 (0.012)
424          0.030 (0.028)    -0.002 (0.010)       0.007 (0.010)
448          0.008 (0.031)     0.015 (0.013)       0.010 (0.013)
450         -0.001 (0.033)     0.010 (0.013)       0.010 (0.013)
452          0.009 (0.033)     0.003 (0.012)       0.001 (0.012)
499          0.006 (0.029)     0.025 (0.011)       0.026 (0.012)
502         -0.022 (0.039)    -0.037 (0.014)      -0.027 (0.014)
508          0.009 (0.029)     0.011 (0.012)       0.007 (0.011)
521         -0.005 (0.030)     0.008 (0.011)       0.008 (0.011)
525          0.021 (0.029)     0.002 (0.010)      -0.002 (0.011)
542          0.043 (0.028)     0.030 (0.010)       0.030 (0.010)
544         -0.002 (0.037)     0.022 (0.016)       0.023 (0.016)
547         -0.009 (0.033)    -0.018 (0.012)      -0.009 (0.011)
548          0.017 (0.032)     0.008 (0.011)       0.013 (0.011)
564          0.013 (0.028)     0.014 (0.010)       0.014 (0.010)
568         -0.005 (0.036)     0.019 (0.016)       0.011 (0.016)

Table 4.8: Estimates and standard errors for the long-term covariate (time) effects for patients in the high dose group

Patient #   full data set     with 446* removed   with 446* and 545* removed
            estimate (s.e.)   estimate (s.e.)     estimate (s.e.)
422          0.020 (0.032)     0.015 (0.011)       0.018 (0.012)
444          0.006 (0.036)    -0.004 (0.013)       0.002 (0.013)
445          0.026 (0.032)     0.030 (0.011)       0.026 (0.012)
453          0.011 (0.034)     0.008 (0.012)       0.012 (0.012)
454          0.035 (0.030)     0.042 (0.011)       0.040 (0.011)
497         -0.031 (0.033)    -0.026 (0.013)      -0.025 (0.013)
500          0.015 (0.033)     0.015 (0.011)       0.012 (0.012)
503          0.009 (0.031)     0.009 (0.011)       0.013 (0.011)
506         -0.038 (0.042)    -0.029 (0.019)      -0.021 (0.018)
524          0.010 (0.031)     0.020 (0.012)       0.018 (0.012)
526          0.014 (0.034)     0.014 (0.012)       0.011 (0.012)
541         -0.003 (0.034)    -0.011 (0.012)      -0.011 (0.012)
543          0.025 (0.035)     0.008 (0.013)       0.012 (0.013)
546          0.005 (0.041)     0.020 (0.018)       0.021 (0.018)
549          0.023 (0.031)     0.024 (0.011)       0.020 (0.012)
566          0.017 (0.032)     0.008 (0.011)       0.012 (0.011)
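A rough way to gauge how much the two extreme values move the treatment effects is to compute approximate Wald statistics and 95% intervals from Table 4.5. The sketch below assumes asymptotic normality of the estimates, treats the reported standard errors as exact, and does not account for the possible under-estimation of standard errors discussed in Section 4.4.2; the helper `wald_summary` is hypothetical.

```python
import math

# (estimate, s.e.) of the treatment effects relative to placebo, from Table 4.5.
fits = {
    "full data set": {"low dose": (0.081, 0.075), "high dose": (-0.078, 0.081)},
    "446* removed": {"low dose": (-0.118, 0.066), "high dose": (-0.173, 0.072)},
    "446* and 545* removed": {"low dose": (-0.067, 0.073), "high dose": (-0.131, 0.078)},
}


def wald_summary(est, se, z_crit=1.96):
    """Wald z statistic and a 95% interval for exp(effect), assuming normality."""
    z = est / se
    lower, upper = est - z_crit * se, est + z_crit * se
    # Exponentiating gives the multiplicative effect on the initial value g_kj.
    return z, math.exp(lower), math.exp(upper)


for fit, effects in fits.items():
    for group, (est, se) in effects.items():
        z, lo, hi = wald_summary(est, se)
        print(f"{fit:22s} {group:9s} z = {z:5.2f}  "
              f"exp-scale 95% CI = ({lo:.2f}, {hi:.2f})")
```

Under these assumptions, only the high dose effect in the fit with 446* removed has an interval that excludes 1, which illustrates how strongly the conclusions about treatment depend on the extreme values.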
4.4 Model checking

Following the notation of Section 3.4, we denote the standardized filter residuals by $R_{1t}$ (from the active lesion process), $R_{2t}$ (from the relative burden process) and $r_t$ (from the latent process). The corresponding standardized smoother residuals are denoted by $R^*_{it}$ and $r^*_t$. For notational convenience, the index $j$ for a given series is suppressed. We now analyse the residuals from the models fitted to the full data set, to the data set with 446* removed and to the data set with both 446* and 545* removed.

4.4.1 Full data set

For the model fitted to the full data set, we first analyse the residuals from each patient one at a time and then consider the residuals from the whole data set to investigate their overall pattern. Among the 49 patients in this study, we discuss our findings for only four: two from the placebo group, one from the low dose group and one from the high dose group. Patients with extreme values are of particular importance, so we present our residual analysis for patients #446 and #545 in the placebo group. Patients #502 and #500, from the low dose and high dose groups respectively, are also selected. The residual plots for patient #497, who has extreme average active lesions, are available in Appendix E for reference.

Plots of the residuals from patient #446 are shown in Figure 4.7. The autocorrelation function (ACF) plot for $R_{1t}$ in Figure 4.7(a) clearly indicates that the $R_{1t}$ are correlated, as the autocorrelations at lags 1 and 2 fall outside the asymptotic 95% confidence bands; this violates the conditional independence assumption of the active lesions values over time.

Figure 4.7: Residual plots for patient #446 in the placebo group. (a) to (c): ACF of the active lesions, relative burden and latent process residual series; (d) to (f): residuals against lag 1 residuals; (g) to (i): residuals against time; (j) to (l): residuals against log fitted values.

Figure 4.8: Residual plots for patient #545 in the placebo group (panels arranged as in Figure 4.7).

Moreover, both the ACF plot for $R_{2t}$ (Figure 4.7(b)) and that for $r_t$ (Figure 4.7(c)) have lag 1 autocorrelations outside the asymptotic 95% confidence bands. The plots of $R_{1t}$ against $R_{1,t-1}$ (Figure 4.7(d)) and of $R_{2t}$ against $R_{2,t-1}$ (Figure 4.7(e)) show a linear pattern in the rightmost part of each plot. This gives further evidence against the assumption that values from the two observation processes are conditionally independent over time, given the latent process. Evidence against the Markov assumption of the latent process is seen in the plot of $r_t$ against $r_{t-1}$ (Figure 4.7(f)), which shows a linear relationship in the right part of the plot. Curved patterns are observed in the plots of $R_{1t}$ against time (Figure 4.7(g)) and $R_{2t}$ against time (Figure 4.7(h)). This means that the values of the observation processes are not conditionally independent over time, given the latent process.
The same pattern is also visible in the plot of $r_t$ against time (Figure 4.7(i)), which confirms the inadequacy of the Markov assumption for the latent process. The plots of the residuals against the log-linear predictor in Figure 4.7(j), (k) and (l) all have a peculiar structure that should not be expected, so the distributional assumptions for the observation and latent processes can be considered inappropriate. Note that an outlier corresponding to the observation at time 17 (the extreme value 446* mentioned in Section 4.2.1) is visible in Figure 4.7(e), (f), (h), (i), (k) and (l). When the model assumptions are wrong, this manifests itself in various ways in the residual plots, and here all the residual plots suggest inadequacy of the model. The residual analysis for patient #446 gives convincing evidence that the model assumptions are inappropriate. We suspect that the extreme value 446* causes serious violations of the main model assumptions. Although this is not entirely obvious from the plots in Figure 4.7, 446*, which occurs at the last MRI scan (time 17) for that patient, may pull the estimated latent process upwards and give rise to some of the patterns observed in those plots.

Figure 4.9: Residual plots for patient #502 in the low dose group (panels arranged as in Figure 4.7).
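The ACF checks used above compare the sample autocorrelations of a residual series with asymptotic 95% bands of ±1.96/√n, where n is the length of the series. A minimal Python sketch of that comparison follows; the function names are hypothetical, the bands are the usual white-noise approximation (assumed here to match the bands drawn in the plots), and the alternating series is made-up data chosen only to produce a flagged lag.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag (denominator: lag-0 sum of squares)."""
    n = len(x)
    xbar = sum(x) / n
    dev = [v - xbar for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def lags_outside_band(x, max_lag=5, z=1.96):
    """Lags whose sample autocorrelation lies outside +/- z / sqrt(n)."""
    band = z / math.sqrt(len(x))
    return [k for k, r in enumerate(sample_acf(x, max_lag), start=1) if abs(r) > band]


# Made-up residual series of length 18 that alternates in sign.
alternating = [1.0, -1.0] * 9
print(sample_acf(alternating, 2))
print(lags_outside_band(alternating))
```

Because the band width shrinks like 1/√n, a single patient's short series (at most 18 points here) needs a fairly large autocorrelation before it is flagged.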
Figure 4.10: Residual plots for patient #500 in the high dose group (panels arranged as in Figure 4.7).

Figure 4.8 shows the residual plots for patient #545. The ACF plot for $R_{1t}$ (Figure 4.8(a)) does not show significant autocorrelation, and neither does the ACF plot for $R_{2t}$ (Figure 4.8(b)). These give no evidence against the conditional independence assumption of the observation processes. No significant autocorrelation is seen in the ACF plot for $r_t$ (Figure 4.8(c)), giving no evidence against the Markov assumption of the latent process. No specific structure is observed in the plot of $R_{1t}$ against $R_{1,t-1}$ in Figure 4.8(d). However, the plot of $R_{2t}$ against $R_{2,t-1}$ in Figure 4.8(e) shows a weak linear pattern with two outliers. These outliers are due to the extreme value at time 6, that is, 545* (mentioned in Section 4.2.1). Hence, Figure 4.8(e) suggests a mild violation of the conditional independence assumption of the observation process. A similar weak linear pattern is also apparent in the plot of $r_t$ against $r_{t-1}$ in Figure 4.8(f), giving some evidence against the Markov assumption of the latent process; two outliers due to 545* are also seen there. An outlier corresponding to 545* is observed in both Figure 4.8(h) and Figure 4.8(i). Besides that, nothing unusual is seen.
No particular pattern is observed in the plot of $R_{1t}$ against the log-linear predictor (Figure 4.8(j)), nor in those of $R_{2t}$ against the log-linear predictor (Figure 4.8(k)) and $r_t$ against the log fitted values (Figure 4.8(l)), apart from an outlier. The outliers in Figure 4.8(k) and Figure 4.8(l) are caused by the extreme value 545*. From these plots, 545* is confirmed to be an outlier, and mild violations of the conditional independence assumptions are evident.

Plots of the residuals from patient #502 in the low dose group are presented in Figure 4.9. None of the ACF plots for $R_{1t}$, $R_{2t}$ and $r_t$ have autocorrelations outside the 95% confidence bands. There are two possible outliers in the plot of $R_{1t}$ against $R_{1,t-1}$ in Figure 4.9(d). A closer look reveals that this feature is due to the discreteness of the data: the values of active lesions are all 0 except at time 11, when the number of active lesions is 1. The point at the far right corresponds to a data value of 0 immediately preceded by a value of 1; the reverse order gives the point in the upper left corner. The cluster of residuals in the lower left corner corresponds to successive data values of 0. Although the residual at time 11 stands out from the rest, its value is 0.41, which is not large by any means. The residual from time 11 is perceived as an outlier in both Figure 4.9(g) and (j). This feature, caused by the discreteness of the data, is also observed in other plots. No particular pattern is observed in the plot of $R_{2t}$ against $R_{2,t-1}$ in Figure 4.9(e), and the Markov assumption of the latent process appears to be acceptable, as the plot of $r_t$ against $r_{t-1}$ in Figure 4.9(f) reveals no pattern. Again, nothing specific is observed in Figure 4.9(g), (h), (i), (j), (k) and (l).
In Figure 4.9(g), note that the residuals corresponding to data values of 0 lie in the lower band, and the isolated residual at the top corresponds to the data value of 1. The same explanation applies to Figure 4.9(j).

Plots of the residuals from patient #500 in the high dose group are presented in Figure 4.10. None of these plots gives evidence against the main model assumptions. We now comment on some features of these plots that are caused by the discrete nature of the active lesion counts. In Figure 4.10(g), where $R_{1t}$ is plotted against time, the residuals align themselves in three bands. The residuals in the lowest band correspond to counts of 0, residuals from counts of 1 are in the second lowest band, and the residual in the upper left corner comes from the observation with 2 active lesions. This feature is also observed in Figure 4.10(j). The plot of $R_{1t}$ against $R_{1,t-1}$ in Figure 4.10(d) also reveals an interesting feature due to the discreteness of the data: the residuals appear in clusters. The cluster in the lower left corner corresponds to two consecutive observations of 0. The cluster in the center-left portion is due to a data value of 1 immediately preceded by a data value of 0, whereas a count of 2 immediately preceded by a count of 0 gives the isolated point in the upper left corner. The features caused by the discrete nature of the observations are highly visible in plots where the residuals from the entire data set are considered together.

We now consider residual plots using the entire data set. Plots of $R_{it}$ against the log predicted values of the observation processes based on the Kalman filter are shown in Figure 4.11.

Figure 4.11: Residuals against log-linear predictors for the two categories: (a) active lesions; (b) relative burden.
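The banding caused by small counts can be reproduced with plain Poisson Pearson residuals, (y - μ)/√μ. This is a simplification: the residuals in this chapter are standardized Kalman filter residuals, whose mean and variance also involve the latent process, but the sketch shows the same qualitative effect. Each count value traces its own decreasing curve as the expected count grows, and a count of 0 always has residual -√μ, so large negative residuals are impossible when the fitted value is small. The function name `pearson_residual` is illustrative.

```python
import math


def pearson_residual(y, mu):
    """Standardized Poisson residual (y - mu) / sqrt(mu)."""
    return (y - mu) / math.sqrt(mu)


# Each count value traces its own decreasing curve in mu: one band per count.
for log_mu in (-1.0, -0.5, 0.0, 0.5, 1.0, 2.0):
    mu = math.exp(log_mu)
    band = [round(pearson_residual(y, mu), 2) for y in (0, 1, 2)]
    print(f"log mu = {log_mu:4.1f}  residuals for y = 0, 1, 2: {band}")

# For y = 0 the residual is -sqrt(mu), so a large negative residual needs a large mu.
```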
The discreteness of the active lesions values, coupled with the low expected counts, accounts for the curved bands with negative slope in Figure 4.11(a). The lowest band consists of the filter residuals for data values of 0, the next lowest band of those for data values of 1, and so on. In particular, it is impossible to have large negative residuals for small fitted values. This feature, together with the negative slope of the curves on which the residuals lie, gives the plot an apparent downward trend. These bands dominate the plot for log expected counts less than 1 but seem to disappear above this value, where the Poisson residuals become approximately normal. Taking this into account, Figure 4.11(a) is compatible with the Poisson assumption; in particular, it gives no evidence against the use of the Poisson variance function. The downward trend in Figure 4.11(a) would be apparent in any plot of Poisson residuals against fitted values, which makes some plots difficult to interpret.

Figure 4.12: Residuals against log-linear predictors for the latent process.

In Figure 4.11(b), one outlier (near the top), corresponding to the extreme value 545*, stands out. Nearly all the residuals are clustered in the lower left corner, with a few scattered around. The residuals with a peculiar pattern in the lower right corner are from patient #446. This plot does not give clear evidence of a serious violation of the gamma assumption for the relative burden process; however, the peculiar pattern of the residuals from patient #446 should be noted. The plot of $r_t$ against the log fitted values, $\log(b_t m_{t-1})$, in Figure 4.12 also has an outlier corresponding to 545*. Moreover, there is a set of residuals at the far right with a peculiar pattern; these residuals are from patient #446. When these residuals are not taken into consideration, the gamma assumption for the latent process seems to be appropriate. So far, the residual plots all suggest that patient #446 gives somewhat peculiar residuals. The strange pattern of these residuals is observed in various plots and is believed to be caused by the extreme value 446*.

Figure 4.13: Residuals against lag 1 residuals for the two categories: (a) active lesions; (b) relative burden.

Figure 4.13 shows the plots of $R_{1t}$ against $R_{1,t-1}$ and of $R_{2t}$ against $R_{2,t-1}$. A band-like structure is detected in Figure 4.13(a). All the bands run from the lower left corner to the upper right corner and become less obvious away from the center of the plot. These bands are caused by the discreteness of the data. The residuals in the center-most band correspond to two consecutive zero observations. The two bands adjacent to the center-most band correspond either to data values of 0 immediately preceded by 1 or to data values of 1 immediately preceded by 0. Figure 4.13(a) gives no evidence against the conditional independence assumption of the active lesions values over time, given the latent process.

Figure 4.14: Residuals against lag 1 residuals for the latent process.

There are two outliers in Figure 4.13(b), which are due to the extreme value at time 6 of patient #545, 545*. There appears to be a very weak relationship between $R_{2t}$ and $R_{2,t-1}$, which can be treated as negligible: a large portion of the residuals form a cloud, and the perceived weak relationship is caused by a few residuals in the lower left corner of the plot. Not surprisingly, these residuals are from patients #446 and #545. A similar very weak linear relationship between $r_t$ and $r_{t-1}$ is also found in Figure 4.14.
This weak linear relationship is considered negligible by the same arguments as above. Moreover, two outliers corresponding to 545* are seen in Figure 4.14.

In conclusion, the model fitted to the full data set is inadequate. There are serious violations of some of the main model assumptions; for example, the conditional independence assumption of the relative burden values over time and the Markov assumption of the latent process are clearly inappropriate. These violations are mainly caused by the extreme observation of relative burden at time 17 for patient #446.

4.4.2 Data set with 446* removed

We now consider the residual analysis for the model fitted to the data set with the extreme value 446* removed. As in the previous subsection, we discuss the residual plots from the same four patients as well as those from the entire data set, and we will confirm that the fitted model is reasonable. Residual plots for the remaining patients are provided in Appendices C, D and E for reference.

Plots of the residuals from patient #446 are presented in Figure 4.15. None of the ACF plots in Figure 4.15(a), (b) and (c) has autocorrelations outside the asymptotic 95% confidence bands. No pattern is observed in Figure 4.15(d), (e) and (f). In Figure 4.15(g), the slight downward trend may be of concern, as it suggests that the active lesions values may not be dependent via the latent process only. However, the band structure of $R_{1t}$ (discussed in the previous subsection) may explain part of this feature: as all the residuals corresponding to data values of 0 lie in the lower band, any slight relationship between $R_{1t}$ and time is magnified. We address this issue later by plotting the residuals from all patients against time to see whether such a pattern persists.
Figure 4.15(h) and (i) show no particular pattern, supporting the conditional independence assumption of the relative burden and the Markov assumption of the latent process. None of Figure 4.15(j), (k) and (l) reveals any unexpected pattern; hence, they give no evidence against the distributional assumptions for the observation and latent processes. It comes as no surprise that the fit for patient #446 is greatly improved: when the extreme value 446* is removed, the estimated latent process for patient #446 is no longer forced to go up from time 7 onwards, as it was in the model fitted to the full data set.

Figure 4.15: Residual plots for patient #446 in the placebo group (panels arranged as in Figure 4.7).
Figure 4.16: Residual plots for patient #545 in the placebo group (panels arranged as in Figure 4.7).
Figure 4.17: Residual plots for patient #502 in the low dose group (panels arranged as in Figure 4.7).
Figure 4.18: Residual plots for patient #500 in the high dose group (panels arranged as in Figure 4.7).

Plots of the residuals from patient #545 are shown in Figure 4.16. Two outliers are seen in Figure 4.16(e) and (f); they are due to the extreme value 545*. The outliers found in Figure 4.16(h), (i), (k) and (l) are also due to this extreme value. It is not surprising to see outliers due to 545* appear frequently in various plots. Following the procedures used for the plots in Figure 4.8, we find no evidence against the model assumptions.

Figure 4.17 and Figure 4.18 show the residual plots for patients #502 and #500, respectively. The features observed in Figure 4.17(d), (g) and (j) have been discussed in our analysis of Figure 4.9.
For the features seen in Figure 4.18(d), (g) and (j), we follow arguments similar to those used for Figure 4.10. Again, no evidence against the main model assumptions is found.

We now analyse the residuals from all the patients. Plots of $R_{it}$ against the log fitted values are shown in Figure 4.19. The downward banded structure in Figure 4.19(a) has been explained previously, and the Poisson assumption for the active lesions values appears to be acceptable. In Figure 4.19(b), there is an outlier corresponding to the extreme value 545* in the top right corner. A slight "megaphone" shape is also observed in Figure 4.19(b); the shape is caused by a very few points at the far right. As most of the residuals form a cloud in the center of the plot, the gamma variance function assumption seems justified. When the outlier is withheld, the new plot (not shown here) has a similar shape to Figure 4.19(b), with only a very few points at the right causing a slight "megaphone" shape. There is no clear evidence against the gamma assumption.

Figure 4.19: Residuals against log-linear predictors for the two categories: (a) active lesions; (b) relative burden.
Figure 4.20: Residuals against log-linear predictors for the latent process.
Figure 4.21: Residuals against lag 1 residuals for the two categories: (a) active lesions; (b) relative burden.

Figure 4.20 shows the plot of $r_t$ against the log fitted values, $\log(b_t m_{t-1})$. There is an outlier in the upper part, corresponding to 545*. A slight "megaphone" shape caused by a small percentage of the residuals is also observed. The new plot (not shown here) with the outlier removed gives a similar shape, and only a very small portion of the points causes the perceived shape.
Note that there are 804 observation vectors in the data. Following the argument used for Figure 4.19(b), we consider the gamma variance function assumption for the latent process acceptable. The extreme value 545* consistently causes an outlier in some of the plots in this subsection and is expected to cause outliers in some of the following plots as well. The band-like structure of $R_{1t}$ against $R_{1,t-1}$ in Figure 4.21(a) has been explained previously; it is due to the discreteness of the data. The plot of $R_{2t}$ against $R_{2,t-1}$ in Figure 4.21(b) shows no particular structure except for the presence of two outliers. The plot of $r_t$ against $r_{t-1}$ in Figure 4.22 reveals no specific structure; the two outliers are as expected.

Figure 4.22: Residuals against lag 1 residuals for the latent process.
Figure 4.23: Residuals against time for the two categories: (a) active lesions; (b) relative burden.
Figure 4.24: Residuals against time for the latent process.

Plots of $R_{it}$ against time are presented in Figure 4.23. Both plots show a cloud structure, as expected. If time were incorrectly classified as a long-term covariate, the residuals $R_{it}$ should show some relationship with time. Since these two plots have no particular pattern, there is no evidence against time being a long-term covariate. Nothing unexpected is observed in the plot of $r_t$ against time in Figure 4.24. The outlier found in both Figure 4.23(b) and Figure 4.24 corresponds to 545*.

We now check the model assumptions for the initial state values. The plot of the smoother residuals ($m_{j0} - g_j$) against the log fitted values ($\log g_j$) in Figure 4.25 does not reveal any particular pattern. Therefore, the gamma variance function assumption for the random effects distribution is acceptable.
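The reasoning behind the "megaphone" checks can be illustrated by simulation. Under the gamma variance function V(μ) = μ², residuals standardized by the square root of ν²V(μ) have roughly constant spread whatever the mean, whereas standardizing the same data with a wrong variance function, here V(μ) = μ, makes the spread grow with μ and produces a megaphone shape. The Python sketch below uses a hypothetical dispersion ν² = 0.25 and simulated data; it is not the residual computation used in this thesis, and the helper names are illustrative.

```python
import random
import statistics


def simulate_gamma(mu, nu2, n, rng):
    """n gamma draws with mean mu and variance nu2 * mu**2 (Tweedie power 2)."""
    shape, scale = 1.0 / nu2, mu * nu2
    return [rng.gammavariate(shape, scale) for _ in range(n)]


def residual_sd(ys, mu, nu2, power):
    """SD of (y - mu) / sqrt(nu2 * mu**power), i.e. residuals under variance mu**power."""
    denom = (nu2 * mu ** power) ** 0.5
    return statistics.pstdev([(y - mu) / denom for y in ys])


rng = random.Random(1997)
nu2 = 0.25  # hypothetical dispersion, chosen for illustration
results = {}
for mu in (0.5, 5.0):
    ys = simulate_gamma(mu, nu2, 20000, rng)
    # (spread with the correct mu**2 variance, spread with a wrong mu**1 variance)
    results[mu] = (residual_sd(ys, mu, nu2, 2), residual_sd(ys, mu, nu2, 1))
    print(mu, [round(s, 2) for s in results[mu]])
```

With the correct variance function both spreads stay near 1; with the wrong one the spread at μ = 5 is roughly √10 times that at μ = 0.5, which is the fanning pattern the plots are checked for.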
The linearity assumption is checked by plotting the smoother residuals ($m_{j0} - g_j$) against the covariates, as presented in Figure 4.26. Nothing unexpected is seen in Figure 4.26(a), (b), (c) and (f). Note that, as mentioned in Section 4.2.3, EDSS is an ordinal variable but is treated as continuous in this analysis for computational convenience. Because there are considerably more patients from BC, it is not clear that the greater range of the residuals for the BC patients in Figure 4.26(d) indicates larger variability than for the Washington patients. Similarly, because there are substantially more female patients in the study, the greater range of the residuals for the female patients in Figure 4.26(e) does not clearly indicate greater variability than for male patients. The plots in Figure 4.26 do not suggest violations of the model assumptions.

Figure 4.25: Smoother residuals against log-linear predictors for the random effects.
Figure 4.26: Smoother residuals against baseline covariates: (a) age; (b) EDSS; (c) duration; (d) origin; (e) gender; (f) log area.

Finally, we check the correlation between the two categories by calculating an empirical variance-covariance matrix. This matrix is obtained by first standardizing the vector of filter residuals $(R_{1t}, R_{2t})$ by the square root of its variance-covariance matrix, and then calculating the empirical variance-covariance matrix of the standardized vectors. The results are given in Table 4.9. Ideally, if the two categories are uncorrelated, an identity matrix is expected.

Table 4.9: Empirical variance-covariance matrix for standardized residuals

Category       1         2
1          0.993    -0.152
2         -0.152     0.685
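The standardization step can be sketched for one pair of residuals per observation vector. The Python code below is a minimal illustration under stated assumptions: the model covariance matrix of each residual vector is taken as known (a single hypothetical matrix here), the empirical matrix is the uncentered average of z zᵀ, and the data are simulated so that the model is correct. It computes the matrix with two different square roots, the Cholesky factor and the symmetric square root, to show that both give approximately the identity when the model holds, and it reproduces the approximate standard error 1/√804 for the off-diagonal elements. All function names are hypothetical.

```python
import math
import random


def chol_2x2(c):
    """Lower-triangular L with L L^T = c, for a 2x2 positive definite c."""
    a = math.sqrt(c[0][0])
    b = c[1][0] / a
    return [[a, 0.0], [b, math.sqrt(c[1][1] - b * b)]]


def sym_sqrt_2x2(c):
    """Symmetric S with S S = c (closed form for 2x2 positive definite c)."""
    s = math.sqrt(c[0][0] * c[1][1] - c[0][1] * c[1][0])
    t = math.sqrt(c[0][0] + c[1][1] + 2 * s)
    return [[(c[0][0] + s) / t, c[0][1] / t], [c[1][0] / t, (c[1][1] + s) / t]]


def solve_2x2(m, r):
    """Solve m z = r for z."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * r[0] - m[0][1] * r[1]) / det,
            (m[0][0] * r[1] - m[1][0] * r[0]) / det]


def empirical_cov(residuals, covs, root):
    """Average of z z^T over standardized vectors z = root(C)^{-1} r (uncentered)."""
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for r, c in zip(residuals, covs):
        z = solve_2x2(root(c), r)
        for i in range(2):
            for j in range(2):
                acc[i][j] += z[i] * z[j] / len(residuals)
    return acc


# Simulated residual vectors whose model covariance is correct (illustrative only).
rng = random.Random(0)
n = 804
cov = [[1.0, 0.3], [0.3, 2.0]]
L = chol_2x2(cov)
residuals = []
for _ in range(n):
    u = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    residuals.append([L[0][0] * u[0], L[1][0] * u[0] + L[1][1] * u[1]])
covs = [cov] * n

for name, root in (("Cholesky", chol_2x2), ("symmetric", sym_sqrt_2x2)):
    m = empirical_cov(residuals, covs, root)
    print(name, [[round(v, 3) for v in row] for row in m])
print("approx. s.e. of an off-diagonal element:", round(1 / math.sqrt(n), 3))
```

When the model covariance is wrong, the two square roots generally give different empirical matrices, which is why the interpretation of such a table depends on the choice of root.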
The interpretation of Table 4.9 is complicated by the fact that it depends on the choice of the square-root matrix used for the standardization. The matrix in Table 4.9 differs considerably from the identity matrix, although such a deviation may not indicate a serious violation of the conditional independence assumption of the two categories. If we take the asymptotic standard error of the off-diagonal elements to be $1/\sqrt{804} = 0.035$, the off-diagonal elements (-0.152, more than four standard errors from zero) are much larger in magnitude than predicted by the model, and the empirical variance of 0.685 for the second category is much lower than it should be. Although there is evidence that the two categories are slightly conditionally correlated, this does not seem to indicate any serious shortcoming of our model. However, we should keep in mind the effect of this possible correlation on the asymptotic standard errors of the regression parameter estimates: these standard errors will be under-estimated if the categories are correlated rather than independent, as assumed in our model. None of the ACF and PACF plots for the average filter residuals presented in Figure 4.27 indicates significant correlation. In conclusion, the main model assumptions for the model fitted to the data set with 446* removed seem to be appropriate; no serious violations are evident, and the model gives a very reasonable fit to this data set.

Figure 4.27: Autocorrelation functions for average filter residuals: (a) to (c) are the autocorrelation functions and (d) to (f) the partial autocorrelation functions for active lesions, relative burden and the latent process.

4.4.3 Data set with 446* and 545* removed

The residual analyses for the model fitted to the data set with both 446* and 545* removed have also been carried out.
As expected, all the main assumptions for the fitted model are confirmed to be appropriate. Due to space limitations, these plots are not included in this thesis.

4.5 The final model

We have now confirmed that the model fitted to the data set with the extreme value 446* removed is reasonable. As mentioned in Section 4.3, we retain as many data points as possible for the analysis, provided that the fit is acceptable. Therefore, we start our search for the final model with the model fitted to the data set with 446* removed.

Since the latent process may be interpreted as the underlying severity of the disease, the effect of time, the long-term covariate, is discussed first. The regression parameter for time may be interpreted as the rate of increase of the log of the underlying severity ($\log\theta_t$) per 6-week period. A positive estimate means that the underlying severity of MS is getting worse over the period of this study. Our model allows each patient to have a different regression parameter, so there are altogether 49 estimates of the long-term covariate effect, one for each patient. These estimates are summarized in Table 4.6 on p. 44, Table 4.7 on p. 45 and Table 4.8 on p. 46, with asymptotic standard errors of the estimates shown in brackets.

In our model, the latent process is estimated by the Kalman smoother. The estimated latent processes for each patient, presented in Figure 4.28, can be used to guide our discussion of the pattern of the response over time. These estimated processes do not appear to be very variable: the estimate of the dispersion parameter ($\sigma^2 = 0.00038$) shows that the stochastic variation of the latent morbidity process $\theta_t$ itself is small. Among the three treatment groups, patients in the placebo group seem to have the greatest upward trend. The estimated latent processes of three patients (#449, #507 and #545) are very different from those of the other patients in the placebo group.
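Because the time parameter is a slope on the log scale, it translates into a multiplicative change in the underlying severity. The sketch below makes that arithmetic concrete; it is an illustration only, assuming a deterministic linear trend in $\log\theta_t$ (the small latent noise, $\sigma^2 = 0.00038$, is ignored), and the function names are hypothetical. The value 0.021 is the common placebo time effect reported in Table 4.10.

```python
import math


def severity_ratio(beta, periods=1.0):
    """theta_{t+k} / theta_t implied by a linear trend with slope `beta` in
    log theta_t over k = `periods` 6-week periods (latent noise ignored)."""
    return math.exp(beta * periods)


def percent_change(beta, periods=1.0):
    """Percentage change in the underlying severity over `periods` periods."""
    return 100.0 * (severity_ratio(beta, periods) - 1.0)


print(round(percent_change(0.021), 2))             # one 6-week period: 2.12
print(round(percent_change(0.021, periods=8), 1))  # eight periods (48 weeks)
```

For small slopes the percentage change per period is close to 100 times the slope, which is why the estimates can be read directly as approximate percentage increases per 6-week period.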
For the low dose group, nearly all the estimated processes are roughly horizontal, except that for patient #502, whose estimated latent process has a relatively large negative slope.

Since one of the objectives of this study is to examine the pattern of the response over time, it is desirable to see whether patients in a given treatment group share the same regression parameter and then to compare these common effects across treatment groups. The test that patients in the placebo group have the same regression parameter has Wald statistic (as described in Section 3.3.1) W = 27.31, which gives a p-value of 0.038 when compared with a $\chi^2(16)$ distribution. That is, there is weak evidence that the long-term covariate effects are not the same. The same test for the low dose group gives W = 12.64 with 15 degrees of freedom and a p-value of 0.630. Therefore, there is no evidence against the assumption that patients in the low dose group share the same regression parameter. The test applied to the patients in the high dose group has W = 13.68 with 15 degrees of freedom and a p-value of 0.550, suggesting that there are no significant differences in the long-term covariate effects among patients in the high dose group.

Figure 4.28: Estimated latent processes for each patient

We now discuss the effects of the baseline covariates. The estimates of the regression parameters are summarized in Table 4.5 on p. 43. The Wald test for the joint significance of low dose and high dose has W = 6.27 with 2 degrees of freedom, which gives a p-value of 0.043. Hence, the data give weak evidence that the treatment group effects are significant. Since there are significant treatment effects, it is appropriate to compare the treatment groups.
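The equality tests above compare K patient-specific estimates through K - 1 contrasts. A minimal sketch of such a Wald test follows, assuming the estimates and their estimated asymptotic covariance matrix are already available (in the thesis they come from the Godambe information; here they are plain inputs). The chi-square tail probability uses an exact recurrence for the regularized incomplete gamma function, valid for integer degrees of freedom; all function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), integer df >= 1.

    Uses Q(df/2, x/2), with the recurrence Q(a + 1, z) = Q(a, z) + z**a exp(-z) / Gamma(a + 1)
    started from Q(1/2, z) = erfc(sqrt(z)) for odd df or Q(1, z) = exp(-z) for even df.
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    a, q = (0.5, math.erfc(math.sqrt(z))) if df % 2 else (1.0, math.exp(-z))
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def wald_equality(beta, cov):
    """Wald test of H0: beta_1 = ... = beta_K using the contrasts beta_k - beta_K.

    Returns (W, df, p-value), where W = d^T (C cov C^T)^{-1} d and d = C beta.
    """
    k = len(beta)
    d = [beta[i] - beta[-1] for i in range(k - 1)]
    v = [[cov[i][j] - cov[i][-1] - cov[-1][j] + cov[-1][-1] for j in range(k - 1)]
         for i in range(k - 1)]
    w = sum(di * xi for di, xi in zip(d, solve(v, d)))
    return w, k - 1, chi2_sf(w, k - 1)


print(round(chi2_sf(27.31, 16), 3))  # placebo group statistic
print(round(chi2_sf(12.64, 15), 3))  # low dose group statistic
print(round(chi2_sf(6.27, 2), 3))    # joint treatment effect
```

Applied to the statistics quoted above, `chi2_sf` gives approximately 0.038, 0.63 and 0.044, in line with the reported p-values up to rounding. The contrast form is one of several equivalent parameterizations; any full-rank set of K - 1 contrasts yields the same W.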
The effect for the low dose group is mildly significant (p-value of 0.073) and that for the high dose group is significant (p-value of 0.016). No significant difference between the low dose and the high dose groups (p-value of 0.430) is found. The estimates indicate that the log of the expected initial value for patients in the placebo group is 0.118 higher than that for patients in the low dose group and 0.173 higher than that for patients in the high dose group. Although the estimated effect for the high dose group is 0.055 lower than that for the low dose group, this difference should not be taken seriously as it is not statistically significant. The other baseline covariates found to have significant effects are age (p-value of 0.012) and log area (p-value of 0.000). For every unit increase in age and in log area, the log of the expected initial value is reduced by 0.009 and 0.095, respectively. That is, the analysis suggests that older patients, and patients with a larger initial burden of disease, tend to have smaller initial values of the latent process.

We now consider the case where patients in a given treatment group share the same regression parameter for the long-term covariate. Although the data give some evidence that patients in the placebo group do not share a common time effect, it is interesting to discuss the results under this assumption, since one of the objectives of this MRI sub-study is to learn more about the "average" effects of Betaseron on patients. These common time effects may be interpreted as the "average" time effects in a treatment group, and they thus address this objective of the sub-study. The estimates of the effects are summarized in Table 4.10 and Table 4.11.

Table 4.10: Estimates for the long-term covariate effects

Parameter      estimate   standard error   z
$\beta_{pl}$   0.021      0.0055           3.76
$\beta_{ld}$   0.011      0.0063           1.67
$\beta_{hd}$   0.011      0.0068           1.65

Table 4.11: Estimates for the baseline covariate effects

Parameter    estimate   standard error   z
intercept     0.861      0.299            2.88
low dose     -0.075      0.085           -0.88
high dose    -0.163      0.091           -1.79
age          -0.008      0.005           -1.60
EDSS          0.049      0.033            1.48
duration     -0.005      0.006           -0.83
origin       -0.049      0.084           -0.58
gender        0.041      0.083            0.49
log area     -0.080      0.022           -3.64

We denote the common regression parameters by $\beta_{pl}$ (placebo group), $\beta_{ld}$ (low dose group) and $\beta_{hd}$ (high dose group). The test for equality of $\beta_{pl}$, $\beta_{ld}$ and $\beta_{hd}$ gives Wald statistic W = 1.79 with 2 degrees of freedom (p-value of 0.410), suggesting that the time effects are the same across the treatment groups. Note that neither $\beta_{ld}$ nor $\beta_{hd}$ is significantly different from zero. Moreover, only two baseline covariates are found to have significant effects: the effect of high dose is marginally significant (p-value of 0.073) and that of log area is highly significant (p-value of 0.000). The effects of low dose and age are no longer significant. Note that, in general, the magnitudes of the estimates are smaller and the standard errors are larger than those for the full model. For example, the estimate of the effect for the high dose group is -0.163 versus -0.173 in the full model.

The Tweedie models possess the advantage that subject-specific effects can be investigated, and this advantage is used here to extract more information from the data set. First, the time effects are allowed to differ between patients. By investigating these time effects, information on how individual patients respond to the treatment can be extracted. There is some evidence that the time effects for patients in the placebo group differ.
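The baseline covariates act on the log of the expected initial value, $g = \exp(w^T\gamma)$, so differences in coefficients become ratios of expected initial values, and each coefficient can be tested with a z-statistic. The sketch below illustrates both calculations; it is not the thesis's code, the function names are hypothetical, and the two-sided p-value assumes an approximately normal estimator. The inputs are the full-model treatment effects quoted above (placebo as reference) and the high dose row of Table 4.11.

```python
import math


def initial_value_ratio(gamma, w_a, w_b):
    """Ratio g_a / g_b of expected initial values, where g = exp(w^T gamma)."""
    return math.exp(sum(c * (a - b) for c, a, b in zip(gamma, w_a, w_b)))


def wald_z(estimate, se):
    """z = estimate / se and its two-sided normal p-value."""
    z = estimate / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Full-model treatment effects quoted in the text; placebo is the reference group.
gamma = [-0.118, -0.173]  # [low dose, high dose]
placebo, low, high = [0, 0], [1, 0], [0, 1]
print(initial_value_ratio(gamma, low, placebo))   # about 0.889
print(initial_value_ratio(gamma, high, placebo))  # about 0.841

# High dose effect in Table 4.11: estimate -0.163, standard error 0.091.
print(wald_z(-0.163, 0.091))  # z about -1.79, p about 0.073
```

On this scale, the full-model estimates correspond to expected initial values roughly 11% (low dose) and 16% (high dose) below that of the placebo group.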
If the objective is to study the "average" time effects across the treatment groups, the analysis can be carried out under the assumption that patients in a treatment group share a common time effect, which may be interpreted as the "average" effect for patients in that treatment group.

To conclude, we start with the full model and then reduce it to a more parsimonious one. The final model identified is:

Long-term covariate: time.
Baseline covariates: low dose; high dose; age; log area.

The estimates for this final model are presented in Table 4.12 and Table 4.13.

Table 4.12: Estimates and standard errors for the time effects for each patient; placebo (PL), low dose (LD) and high dose (HD).

PL     estimate (s.e.)    LD     estimate (s.e.)    HD     estimate (s.e.)
#420    0.010 (0.011)     #419    0.021 (0.011)     #422    0.011 (0.011)
#421    0.014 (0.010)     #424    0.001 (0.010)     #444   -0.007 (0.012)
#443    0.013 (0.010)     #448    0.015 (0.012)     #445    0.029 (0.011)
#446    0.003 (0.010)     #450    0.010 (0.012)     #453    0.005 (0.011)
#449    0.054 (0.008)     #452   -0.003 (0.011)     #454    0.040 (0.011)
#451    0.002 (0.011)     #499    0.021 (0.011)     #497   -0.021 (0.013)
#498    0.021 (0.010)     #502   -0.041 (0.013)     #500    0.012 (0.011)
#501    0.001 (0.010)     #508    0.012 (0.011)     #503    0.013 (0.011)
#504   -0.021 (0.011)     #521    0.013 (0.011)     #506   -0.026 (0.019)
#507    0.039 (0.010)     #525    0.005 (0.010)     #524    0.023 (0.012)
#522    0.022 (0.010)     #542    0.027 (0.010)     #526    0.012 (0.012)
#523    0.007 (0.011)     #544    0.016 (0.015)     #541   -0.011 (0.012)
#539    0.025 (0.013)     #547   -0.011 (0.011)     #543    0.002 (0.012)
#540   -0.009 (0.010)     #548    0.010 (0.011)     #546    0.030 (0.017)
#545    0.053 (0.008)     #546    0.017 (0.010)     #549    0.024 (0.011)
#550    0.005 (0.011)     #568    0.023 (0.016)     #566    0.013 (0.011)
#565    0.024 (0.011)

Table 4.13: Estimates and standard errors for baseline covariate effects

Parameter    estimate (s.e.)
intercept     1.123 (0.190)
low dose     -0.074 (0.063)
high dose    -0.146 (0.068)
age          -0.010 (0.004)
log area     -0.097 (0.019)

The Wald test for common time effects for patients in a treatment group is carried out. For the placebo group, the test statistic is W = 74.71, which gives a p-value of 0.000 when compared with a $\chi^2(16)$ distribution. For patients in the low dose group, the p-value is 0.036 (W = 26.20 with 15 degrees of freedom). For the high dose group, the test gives a p-value of 0.011 (W = 30.14 with 15 degrees of freedom). Hence, there is clear evidence that the time effects differ among patients within a treatment group. These tests all have smaller p-values than the same tests in the full model, possibly because of the smaller standard errors of the time effects in the final model; in the full model, the same tests give W = 27.31 (placebo group), W = 12.64 (low dose group) and W = 13.68 (high dose group). Finally, since the treatment effects are marginally significant (p-value of 0.089), it is reasonable to compare the treatment groups. As seen in Table 4.13, only the high dose group significantly (p-value of 0.032) influences the initial value of the underlying severity; the effect of the low dose group is not significant (p-value of 0.240).

Chapter 5 Discussion

It is helpful to use the results from the previous analyses of the data by Petkau and White (1995) and D'yachkova (1997) for reference, although a direct comparison with the results obtained in this thesis may not be appropriate because the details of the previous analyses are quite different. Petkau and White (1995) used a univariate approach and considered a similar transformation of the response variable burden of disease, namely the log of relative burden. They found significant treatment effects and differences among the linear time effects for the different treatment groups.
D'yachkova (1997) used a univariate approach and looked at active lesions. She found significant treatment effects and no differences among the linear time effects for the different treatment groups. Both Petkau and White (1995) and D'yachkova (1997) used the GEE approach in their analyses and focused on sub-population (such as treatment group) average time effects rather than subject-specific effects. We use a multivariate approach in our analysis and consider a bivariate response vector with relative burden and active lesions as its components. The possible effects of the treatment are addressed by investigating the time effects for patients across treatment groups. Since the latent process reflects the underlying severity of the disease, a large positive time effect for a patient suggests that the disease is getting worse over time, whereas a time effect close to zero suggests that the disease does not change much during the sub-study, which may indicate a benefit of the treatment. The constant covariate effects influence the initial value of the time series for a given patient, thus accounting for the natural heterogeneity of the patients; these effects do not have the same interpretation as in the previous analyses. Regarding the pattern of the response over time, our model allows the time effects to differ across patients, and the equality of these effects within a given treatment group can then be tested. There is clear evidence that patients in a treatment group do not share a common time effect. The estimated latent processes shown in Figure 4.28 indicate that patients in the placebo group have increasing severity of disease, while those in the treatment groups show little change in disease severity.
When the sub-population "average" time effects are the focus, as seen in Table 4.10, the time effects for the treatment groups are smaller than that for the placebo group, perhaps even zero, suggesting a benefit of the treatment, although the test for equality of these common time effects is not significant (p-value of 0.410).

In the analysis presented in this thesis, we avoid a potential computational complication by treating an ordinal variable (EDSS) simply as a continuous variable. Moreover, only one long-term covariate (time) is considered here. It may be interesting to include other time-varying covariates present in the MRI sub-study in future analyses, for example, the EDSS scores at the times of subsequent scans. Our models can currently handle only longitudinal data collected at equally spaced time intervals. Because of this methodological constraint, we consider each scan to have been taken at its intended target date, irrespective of any deviation of the actual scan date from that target date. Further development of the new method is needed in order to handle categorical (binary or ordinal) response data and to accommodate longitudinal data collected at unequally spaced intervals.

Despite these methodological constraints, the new method possesses various attractive features for analyzing multivariate longitudinal data. The Tweedie state space models provide a unified approach to the analysis of multivariate longitudinal data of mixed types. Since the Tweedie class of exponential dispersion models covers a wide range of discrete and continuous distributions, the new method can handle a variety of response vectors with components of different data types. It offers an intuitively appealing conceptual framework for the data-generating mechanism of longitudinal data, and a realistic correlation structure is achieved under the model assumptions.
In contrast to the quasi-likelihood approach, the method is based on the specification of the joint likelihood of the longitudinal data, thereby facilitating a wide variety of inferences and diagnostic procedures. The new method is flexible and easy to use. It has the extra feature that the covariates can enter the model via the short-term observation process or via the long-term latent process. At present, the new methodology cannot handle binary or categorical data, and further development is needed to accommodate these types of data. Another limitation is that, being a new methodology, its justification is based on asymptotics and its properties are not yet fully understood. It would be desirable to provide a better understanding of the new methodology through further theoretical investigations and simulation studies.

Bibliography

Azzalini, A. (1982). Approximate filtering of parameter driven processes. Journal of Time Series Analysis, 3, 38-44.
Chan, K.S. and Ledolter, J. (1995). Monte Carlo EM estimation for time series models involving counts. Journal of the American Statistical Association, 90, 242-252.
D'yachkova, Y. (1997). Analysis of longitudinal data from the Betaseron multiple sclerosis clinical trial. M.Sc. Thesis, Department of Statistics, University of British Columbia.
Diggle, P.J., Liang, K.-Y. and Zeger, S.L. (1994). Analysis of Longitudinal Data. Oxford: Oxford University Press.
Durbin, J. (1990). Extensions of Kalman modelling to non-Gaussian observations. Quaderni di Statistica e Matematica Applicata alle Scienze Economico-Sociali, 12, 3-12.
Fahrmeir, L. and Kaufmann, H. (1991). On Kalman filtering, posterior mode estimation and Fisher scoring in dynamic exponential family regression. Metrika, 38, 37-60.
Fahrmeir, L. and Tutz, G. (1994). Multivariate Statistical Modelling Based on Generalized Linear Models. New York: Springer-Verlag.
Harvey, A.C. (1981). Time Series Models. Oxford: Allan.
Harvey, A.C. and Fernandes, C. (1989). Time series models for counts or qualitative observations (with discussion). Journal of Business and Economic Statistics, 7, 407-422.
The IFNB Multiple Sclerosis Study Group (1993). Interferon beta-1b is effective in relapsing-remitting multiple sclerosis: clinical results of a multicenter, randomized, double-blind, placebo-controlled trial. Neurology, 43, 655-661.
Jones, R.H. (1993). Longitudinal Data with Serial Correlation: A State-Space Approach. London: Chapman and Hall.
Jørgensen, B. (1986). Some properties of exponential dispersion models. Scandinavian Journal of Statistics, 13, 187-197.
Jørgensen, B. (1987). Exponential dispersion models (with discussion). Journal of the Royal Statistical Society, Series B, 49, 127-162.
Jørgensen, B., Lundbye-Christensen, S., Song, X.-K. and Sun, L. (1995a). A state space model for multivariate longitudinal count data. Technical Report #148, Department of Statistics, University of British Columbia.
Jørgensen, B., Lundbye-Christensen, S., Song, X.-K. and Sun, L. (1995b). A longitudinal study of emergency room visits and air pollution for Prince George, British Columbia. Biostatistics Research Report #6, Biostatistics Research Group, University of British Columbia. To appear in Statistics in Medicine.
Jørgensen, B., Lundbye-Christensen, S., Song, X.-K. and Sun, L. (1995c). State space models for multivariate longitudinal data of mixed types. Canadian Journal of Statistics, 24, 385-402.
Jørgensen, B., Martinez, J.R. and Tsao, M. (1994). Asymptotic behaviour of the variance function. Scandinavian Journal of Statistics, 21, 223-243.
Kaufmann, H. (1987). Regression models for nonstationary categorical time series: asymptotic estimation theory. Annals of Statistics, 15, 863-871.
Korn, E.L. and Whittemore, A.S. (1979). Methods for analyzing panel studies of acute health effects of air pollution. Biometrics, 35, 795-802.
Laird, N.M. and Ware, J.H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963-974.
Liang, K.-Y. and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13-22.
Liang, K.-Y., Zeger, S.L. and Qaqish, B. (1992). Multivariate regression analyses for categorical data (with discussion). Journal of the Royal Statistical Society, Series B, 54, 3-40.
Paty, D.W., Li, D.K.B., the UBC MS/MRI Study Group, and the IFNB Multiple Sclerosis Study Group (1993). Interferon beta-1b is effective in relapsing-remitting multiple sclerosis. II. MRI analysis results of a multicenter, randomized, double-blind, placebo-controlled trial. Neurology, 43, 662-667.
Petkau, J. and White, R. (1995). Longitudinal analyses for the UBC 6-weekly frequent MRI sub-study of the Betaseron multiple sclerosis clinical trial. Biostatistics Research Report #10, Biostatistics Research Group, University of British Columbia.
Shephard, N. (1994). Partial non-Gaussian state space. Biometrika, 81, 115-131.
West, M., Harrison, P.J. and Migon, H.S. (1985). Dynamic generalized linear models and Bayesian forecasting. Journal of the American Statistical Association, 80, 73-79.
Zeger, S.L., Liang, K.-Y. and Self, S.G. (1985). The analysis of binary longitudinal data with time-independent covariates. Biometrika, 72, 31-38.
Zeger, S.L. and Karim, M.R. (1991). Generalized linear models with random effects: a Gibbs sampling approach. Journal of the American Statistical Association, 86, 79-86.
Zeger, S.L. (1988). A regression model for time series of counts. Biometrika, 75, 621-629.
Zeger, S.L. and Liang, K.-Y. (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics, 42, 121-130.
Zeger, S.L. and Qaqish, B. (1988). Markov regression models for time series: a quasi-likelihood approach. Biometrics, 44, 1019-1031.

Appendix A Exponential dispersion models

The following is a review of the basic properties of exponential dispersion models from Jørgensen (1986, 1987).
An exponential dispersion model for a random variable Y is given by the probability density function

$$p(y; \theta, \lambda) = c^*(y; \lambda)\exp\{\theta y - \lambda\kappa(\theta)\} \qquad (A.19)$$

for $y \in \mathbb{R}$, with respect to a suitable measure, and for appropriate functions $c^*$ and $\kappa$. The distribution (A.19) is denoted by $Y \sim ED^*(\theta, \lambda)$; $\theta$ is called the canonical parameter and $\lambda$ the index parameter. The maximum possible domain for $\theta$, denoted by $\Theta \subseteq \mathbb{R}$, is called the canonical parameter domain. The domain for $\lambda$, called the index set, is denoted by $\Lambda$ and is a subset of $\mathbb{R}_+$. We have $\Lambda = \mathbb{R}_+$ if and only if (A.19) is infinitely divisible (Jørgensen 1986). The function $\kappa$ is called the cumulant generator of the model. The mean and variance of Y are

$$E(Y) = \mu = \lambda\tau(\theta) \quad\text{and}\quad \mathrm{Var}(Y) = \lambda V(\mu/\lambda), \qquad (A.20)$$

where $\tau(\theta) = \kappa'(\theta)$ is the mean-value mapping and $V(\mu) = \tau'\{\tau^{-1}(\mu)\}$ is the unit variance function on $\Omega = \tau(\mathrm{int}\,\Theta)$. An exponential dispersion model satisfies the convolution formula

$$ED^*(\theta, \lambda_1) * ED^*(\theta, \lambda_2) = ED^*(\theta, \lambda_1 + \lambda_2). \qquad (A.21)$$

The Tweedie class is an important class of univariate exponential dispersion models, defined as the special case of the distribution (A.19) with unit variance functions of the form $V(\mu) = \mu^p$ for some $p \in (-\infty, 0] \cup [1, \infty)$. It is denoted by $Tw_p(\mu, \sigma^2)$, where $\sigma^2 = \lambda^{1-p}$ is called the dispersion parameter; the variance is $\sigma^2\mu^p$. The Tweedie models may be characterized as the only exponential dispersion models that satisfy the following scale transformation property:

$$c\,Tw_p(\mu, \sigma^2) \sim Tw_p(c\mu, c^{2-p}\sigma^2) \quad \text{for all } c > 0. \qquad (A.22)$$

Table A.14 shows a summary of the Tweedie class.

Table A.14: Summary of the Tweedie exponential dispersion models

Distribution        p            S      Ω      Θ
Extreme stable      p < 0        ℝ      ℝ+     ℝ0
Normal              p = 0        ℝ      ℝ      ℝ
(Do not exist)      0 < p < 1    -      -      -
Poisson             p = 1        ℕ0     ℝ+     ℝ
Compound Poisson    1 < p < 2    ℝ0     ℝ+     ℝ-
Gamma               p = 2        ℝ+     ℝ+     ℝ-
Positive stable     2 < p < 3    ℝ+     ℝ+     -ℝ0
Inverse Gaussian    p = 3        ℝ+     ℝ+     -ℝ0
Positive stable     p > 3        ℝ+     ℝ+     -ℝ0

Notation: S = support, Ω = domain of the mean, Θ = canonical parameter domain, ℝ0 = [0, ∞), -ℝ0 = (-∞, 0].
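Two of the properties above can be checked numerically. The sketch below is illustrative only and makes the following assumptions: the convolution formula (A.21) is checked for the Poisson case written in $ED^*$ form, $c^*(y;\lambda) = \lambda^y/y!$ and $\kappa(\theta) = e^\theta$ (chosen because it is the simplest member); and the unit variance function $V(\mu) = \mu^p$ is checked by differentiating the Tweedie mean-value mapping $\tau_p$ numerically, using the closed forms for $\kappa_p$ and $\tau_p$ given with the Tweedie density (A.23), which exclude $p = 1$ and $p = 2$. Function names are hypothetical.

```python
import math


def poisson_ed_pmf(y, theta, lam):
    """Poisson as ED*(theta, lam): c*(y; lam) = lam**y / y!, kappa(theta) = exp(theta)."""
    return math.exp(y * math.log(lam) - math.lgamma(y + 1) + theta * y - lam * math.exp(theta))


def convolution_pmf(y, theta, lam1, lam2):
    """P(Y1 + Y2 = y) for independent Y1 ~ ED*(theta, lam1) and Y2 ~ ED*(theta, lam2)."""
    return sum(poisson_ed_pmf(k, theta, lam1) * poisson_ed_pmf(y - k, theta, lam2)
               for k in range(y + 1))


def tweedie_alpha(p):
    if p in (1, 2):
        raise ValueError("the closed forms below exclude p = 1 and p = 2")
    return (p - 2.0) / (p - 1.0)


def kappa_p(theta, p):
    """Cumulant generator alpha^{-1} (alpha - 1) {theta / (alpha - 1)}^alpha."""
    a = tweedie_alpha(p)
    return (a - 1.0) / a * (theta / (a - 1.0)) ** a


def tau_p(theta, p):
    """Mean-value mapping {theta / (alpha - 1)}^(alpha - 1), the derivative of kappa_p."""
    a = tweedie_alpha(p)
    return (theta / (a - 1.0)) ** (a - 1.0)


def derivative(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# (A.21), Poisson case: ED*(theta, 1.5) * ED*(theta, 2.0) = ED*(theta, 3.5)
theta = -0.3
print(all(abs(convolution_pmf(y, theta, 1.5, 2.0) - poisson_ed_pmf(y, theta, 3.5)) < 1e-12
          for y in range(10)))

# Tweedie: tau_p'(theta) should equal mu**p, where mu = tau_p(theta)
for p, th in [(1.5, -1.0), (3.0, -0.5), (-1.0, 0.8)]:
    mu = tau_p(th, p)
    print(p, mu, derivative(lambda x: tau_p(x, p), th), mu ** p)
```

The values of $\theta$ are chosen inside the canonical parameter domains of Table A.14 (negative for $1 < p < 2$ and $p \geq 3$, positive for $p < 0$), where $\theta/(\alpha - 1)$ is positive and the fractional powers are real.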
The case $p = 1$, $\sigma^2 = 1$ gives the Poisson distribution, and $1 < p < 2$ gives the compound Poisson distributions [in the terminology of, for example, Feller (1971)], which have a positive probability mass at zero and are continuous for $y > 0$. The cases $p = 0, 2, 3$ correspond to the normal, gamma and inverse Gaussian distributions, respectively. For $p > 2$ or $p < 0$, the Tweedie models correspond to the exponential families generated by the positive stable distributions and the extreme stable distributions, respectively. We refer to these models as the stable Tweedie models. These models are not stable as such, but all the Tweedie models appear as limiting distributions for exponential dispersion models according to the convergence theorem of Jørgensen, Martinez and Tsao (1994), which extends the stable generalization of the central limit theorem. Many exponential dispersion models have variance functions that are asymptotically of the Tweedie form, and may be approximated by a member of the Tweedie family by means of this result. For this reason, the Tweedie models occupy a central position among exponential dispersion models. The density function of the Tweedie class has the following form:

$$f(y; \mu, \sigma^2, p) = c_p(y; \sigma^2)\exp\left[\frac{1}{\sigma^2}\left\{y\,\tau_p^{-1}(\mu) - \kappa_p\{\tau_p^{-1}(\mu)\}\right\}\right], \qquad (A.23)$$

where $\kappa_p(\theta) = \alpha^{-1}(\alpha - 1)\{\theta/(\alpha - 1)\}^{\alpha}$ and $\tau_p(\theta) = \{\theta/(\alpha - 1)\}^{\alpha - 1}$, with $\alpha = (p - 2)/(p - 1)$.

Appendix B Godambe information matrix

In this appendix, the calculation of the sensitivity matrix S and the variability matrix V mentioned in Section 3.3.1 is briefly outlined. These matrices enter the Godambe information matrix $J = S^TV^{-1}S$ and the Newton scoring algorithm. The first two sections contain some intermediate results, and the final results are given in the last two sections.

B.1 Moment calculations

The variance-covariance matrices of $e^*$, $\xi^*$ and $m^*_0$, needed for calculating the variability matrix V, are obtained as follows.
We consider one series, indexed by j, at a time; for convenience, we suppress the index j in the notation. Let the matrix $C^*$ be defined by

$$C^* = E\{(\theta - m^*)(\theta - m^*)^T\},$$

with diagonal elements $C^*_t$ and off-diagonal elements $C^*_{t,t+s}$ given by a product formula in $C^*_{t+s}$ and the filter variances $C_{t+i-1}$ [formula illegible].

Since $e^*_t = e_t - a_t(m^*_t - \theta_t)$, where $a_t\theta_t$ is the conditional expectation of $Y_t$ given $\theta_t$, we have, for all $s, t > 0$,

$$\mathrm{Cov}(e_t, e_s) = \mathrm{Cov}(e^*_t, e^*_s) + a_ta_s^TC^*_{ts}.$$

According to the results in Section 3.2, the $e_t$'s are mutually uncorrelated, and hence we obtain

$$\mathrm{Cov}(e^*_t, e^*_s) = \begin{cases} \mathrm{Var}(e_t) - a_ta_t^TC^*_t & \text{for } t = s, \\ -a_ta_s^TC^*_{ts} & \text{for } t \neq s. \end{cases}$$

A special case of this result is $\mathrm{Cov}(e^*_t, m^*_0) = a_tC^*_{0t}$. From Section 3.2, we have $\mathrm{Cov}(e_t, \theta) = 0$, and using $\theta_t = m^*_t - (m^*_t - \theta_t)$, we obtain

$$\mathrm{Cov}(e^*_t, \xi^*_s) = a_t\,\mathrm{Cov}\{m^*_t - \theta_t,\; m^*_s - \theta_s - b_s(m^*_{s-1} - \theta_{s-1})\}.$$

To obtain the covariance matrix of $\xi^* = (\xi^*_1, \ldots, \xi^*_n)$, note that the innovation $\xi_t$ may be written as a sum of two uncorrelated terms,

$$\xi_t = \xi^*_t + \{(\theta_t - m^*_t) - b_t(\theta_{t-1} - m^*_{t-1})\}.$$

The innovations are mutually uncorrelated, with variance $\mathrm{Var}(\xi_t)$ proportional to $\sigma^2$, and since $\mathrm{Var}(\theta - m^*) = C^*$, we have

$$E(\xi^*_t\xi^*_{t+s}) = \begin{cases} \mathrm{Var}(\xi_t) - C^*_t + 2b_tC^*_{t-1,t} - b_t^2C^*_{t-1} & \text{for } s = 0, \\ -C^*_{t,t+s} + b_tC^*_{t-1,t+s} + b_{t+s}C^*_{t,t+s-1} - b_tb_{t+s}C^*_{t-1,t+s-1} & \text{for } s > 0. \end{cases}$$

A special case of this formula is

$$\mathrm{Cov}(m^*_0, \xi^*_t) = -C^*_{0t} + b_tC^*_{0,t-1}.$$

Finally, since $\theta_0 - m^*_0$ is uncorrelated with $m^*_0$, the variance of $m^*_0$ is given by

$$\mathrm{Var}(m^*_0) = \mathrm{Var}(\theta_0) - C^*_0.$$

B.2 Derivatives of Kalman filter and smoother

The expected values of the partial derivatives of $m_t$ and $m^*_t$ with respect to $\alpha$ are, respectively, given by

$$D_\alpha(t) = E\left(\frac{\partial m_t}{\partial\alpha^T}\right) \quad\text{and}\quad D^*_\alpha(t) = E\left(\frac{\partial m^*_t}{\partial\alpha^T}\right).$$

The expected values of the partial derivatives of $m_t$ and $m^*_t$ with respect to $\beta$ and $\gamma$ are defined similarly. As mentioned in Section 3.2, for the Kalman filter we have $m_0 = g = \exp(w^T\gamma)$ and

$$m_t = b_t\{m_{t-1} + D_ta_t^TQ_t^{-1}(Y_t - f_t)\},$$

where $a_t$ is a function of $\alpha$, $b_t$ is a function of $\beta$, and $D_t$ and $Q_t$ are functions of $\alpha$ and $\beta$. Since $E(Y_t - f_t) = 0$, we obtain a recursion for $D_\alpha(t)$, $t = 1, \ldots, n$ [formula illegible], starting with $D_\alpha(0) = 0$.
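Note that

$$\frac{\partial a_t}{\partial\alpha^T} = \mathrm{blockdiag}(x_t^Ta_{1t}, \ldots, x_t^Ta_{dt}).$$

Analogously, we obtain a recursion for $D_\beta(t)$ [formula illegible], starting with $D_\beta(0) = 0$, and a recursion expressing $D_\gamma(t)$ as a multiple of $D_\gamma(t-1)$ [factor illegible], starting with $D_\gamma(0) = w^Tg$.

For the smoother, we immediately obtain the following backward recursions, for $t = n-1, \ldots, 0$:

$$D^*_\alpha(t) = D_\alpha(t) + G_t\{D^*_\alpha(t+1) - b_{t+1}D_\alpha(t)\},$$
$$D^*_\beta(t) = D_\beta(t) + G_t\{D^*_\beta(t+1) - b_{t+1}D_\beta(t) - \Delta z_{t+1}^T\tau_{t+1}\},$$
$$D^*_\gamma(t) = D_\gamma(t) + G_t\{D^*_\gamma(t+1) - b_{t+1}D_\gamma(t)\},$$

starting with $D^*_\alpha(n) = D_\alpha(n)$, $D^*_\beta(n) = D_\beta(n)$ and $D^*_\gamma(n) = D_\gamma(n)$. Here $G_t$ denotes the smoother gain factor (symbol introduced here; its explicit expression is illegible).

B.3 Variability matrix

By definition, the variability matrix for the estimating function $\psi$ is

$$V = \begin{pmatrix} V_{11} & V_{12} & V_{13} \\ V_{21} & V_{22} & V_{23} \\ V_{31} & V_{32} & V_{33} \end{pmatrix},$$

where $V_{ij} = E(\psi_i\psi_j^T)$. From Section B.1, we have the variance-covariance structure of $Y - Am^*$, $Bm^*$ and $m^*_0$. Straightforward calculations give

$$V_{11} = X^TK_a^{-1}\mathrm{Var}(Y - Am^*)K_a^{-1}X,$$
$$V_{12} = X^TK_a^{-1}\mathrm{Cov}(Y - Am^*, Bm^*)K_b^{-1}\Delta Z,$$
$$V_{13} = X^TK_a^{-1}\mathrm{Cov}(Y - Am^*, m^*_0)K_0^{-1}W,$$
$$V_{22} = \Delta Z^TK_b^{-1}\mathrm{Var}(Bm^*)K_b^{-1}\Delta Z,$$
$$V_{23} = \Delta Z^TK_b^{-1}\mathrm{Cov}(Bm^*, m^*_0)K_0^{-1}W,$$
$$V_{33} = W^TK_0^{-1}\mathrm{Var}(m^*_0)K_0^{-1}W.$$

The variances and covariances involved in these expressions may be calculated from the results in Section B.1 by noting that $Y - Am^*$ is the vector of the predicted innovations $e^*_t$ and $Bm^*$ is the vector of the predicted innovations $\xi^*_t$.

B.4 Sensitivity matrix

As in the previous section, we may write the sensitivity matrix for $\psi$ in the form

$$S = \begin{pmatrix} S_{11} & S_{12} & S_{13} \\ S_{21} & S_{22} & S_{23} \\ S_{31} & S_{32} & S_{33} \end{pmatrix},$$

where $S_{ij} = E(\partial\psi_i/\partial\eta_j^T)$, with $j = 1, 2, 3$ referring to $\alpha$, $\beta$ and $\gamma$, respectively. Using the derivatives of the Kalman smoother from Section B.2, we obtain $S_{11}$ [formula illegible], where $D^*_\alpha$ denotes the vectors $D^*_\alpha(t)$ stacked in the appropriate order. The remaining blocks of S are, using a similar notation,

$$S_{12} = -X^TK_a^{-1}AD^*_\beta, \qquad S_{13} = -X^TK_a^{-1}AD^*_\gamma,$$
$$S_{21} = -\Delta Z^TK_b^{-1}BD^*_\alpha, \qquad S_{22} = \Delta Z^TK_b^{-1}([\text{illegible}] + BD^*_\beta), \qquad S_{23} = -\Delta Z^TK_b^{-1}BD^*_\gamma,$$
$$S_{31} = W^TK_0^{-1}D^*_{0\alpha}, \qquad S_{32} = W^TK_0^{-1}D^*_{0\beta}, \qquad S_{33} = W^TK_0^{-1}(D^*_{0\gamma} - [\text{illegible}]).$$

Here the subscript 0 refers to differentiation of $m^*_0$.

Appendix C Residuals plots for each patient in the placebo group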
Figure C.29: Residuals plots for patient #420 in the placebo group (panels (a)-(c): autocorrelation functions for Active Lesions, Relative Burden and Latent Process; (d)-(f): plots against lag 1 residuals; (g)-(i): plots against time; (j)-(l): plots against log fitted values)

Figure C.30: Residuals plots for patient #421 in the placebo group

Figure C.31: Residuals plots for patient #443 in the placebo group
Figure C.32: Residuals plots for patient #449 in the placebo group

Figure C.33: Residuals plots for patient #451 in the placebo group

Figure C.34: Residuals plots for patient #498 in the placebo group
Residuals plots for each patient in the placebo group 103 Figure C.35: Residuals plots for patient #501 in the placebo group <a> Series: Active Lesions ") Series: Relative Burden il11 Lag (d) Active Lesions Lag (e) Relative Burden 0.0 0.5 1.0 lag 1 residuals (g) Active Lesions (h) Relative Burden « o 2 6 10 time (j) Active Lesions 5 10 15 time (k) Relative Burden <c> Series: Latent Process _u_ TT 0.0 0.05 0.10 log fitted values Lag (f) Latent Process •1.0 -0.5 0.0 0.5 1.0 1.5 lag 1 residuals (i) Latent Process Latent Process •0.1 0.0 0.1 log fitted values Appendix C. Residuals plots for each patient in the placebo group 104 Figure C.36: Residuals plots for patient #504 in the placebo group <a> Series: Active Lesions JJL TT Lag (d) Active Lesions 3 a o 1.0 -0.8 -0.6 -0.4 -0.2 0.0 lag 1 residuals (g) Active Lesions « o 2 o 5 10 15 ume (j) Active Lesions <b> Series: Relative Burden Lag (e) Relative Burden 0.5 0.0 0.5 1.0 lag 1 residuals (h) Relative Burden 5 10 lime (k) Relative Burden •0.3 -0.2 -0.1 0.0 log fitted values -0.3 -0.2 -0.1 0.0 log fitted values w Series: Latent Process Lag (f) Latent Process (i) Latent Process 10 time Latent Process -0.6 -0.4 -0.2 log fitted values -1.0 -0.5 0.0 0.5 1.0 1.5 lag 1 residuals Appendix C. Residuals plots for each patient in the placebo group 105 Figure C.37: Residuals plots for patient #507 in the placebo group <a> Series: Active Lesions 5 10 Lag (d) Active Lesions Active Lesions 5 10 15 tune (j) Active Lesions 0.0 0.1 0.2 0.3 0.4 0.5 log fitted values o» Series: Relative Burden 1 r 5 o Lag (e) Relative Burden J.5 0.0 0.5 lag 1 residuals (h) Relative Burden 5 10 time (k) Relative Burden 0.1 0.2 0.3 0.4 log Bed values <c> Series: Latent Process • 2 o ~1 T~ (f) Latent Process -1.0 -0.5 0.0 0.5 1.0 lag 1 residuals (i) Latent Process Latent Process 0.4 0.6 0.8 1.0 log fitted values Appendix C. 
Residuals plots for each patient in the placebo group 106 Figure C.38: Residuals plots for patient #522 in the placebo group <a> Series: Active Lesions <"> Series: Relative Burden <c> Series: Latent Process TT TTT I • . TTTH" i i I TT (d) Active Lesions (g) Active Lesions 10 time (j) Active Lesions 0.2 0.3 log fitted values Lag (e) Relative Burden (h) Relative Burden 5 10 15 time (k) Relative Burden 0.2 0.3 0.4 log fitted values (f) Latent Process -0.5 0.0 0.5 (i) Latent Process 5 10 15 time (I) Latent Process 0.4 0.6 0.8 log fitted values Appendix C. Residuals plots for each patient in the placebo group 107 Figure C.39: Residuals plots for patient #523 in the placebo group <a) Series: Active Lesions TT Lag (d) Active Lesions •0.8 -0.6 -0.4 lag 1 residuals (g) Active Lesions ci Series: Relative Burden TT J_L Lag (e) Relative Burden o.o 0.2 (h) Relative Burden 3 2 o 2 o <c) Series: Latent Process _L_L 5 10 Lag (f) Latent Process (i) Latent Process (j) Active Lesions lime (k) Relative Burden •0.04 -0.02 0.0 0.02 tog fitted values •0.02 0.0 0.02 tog fitted values Latent Process -0.05 0.0 0.05 tog fitted values Appendix C. Residuals plots for each patient in the placebo group 108 Figure C.40: Residuals plots for patient #539 in the placebo group <a' Series: Active Lesions i r TT 0 2 4 6 (d) Active Lesions -1.0 -0.5 0.0 lag 1 residuals (g) Active Lesions 10 12 10 12 14 <bSeries: Relative Burden 0 2 4 6 (e) Relative Burden 0.0 0.2 0.4 lag 1 residuals (h) Relative Burden 10 12 14 ^Series: Latent Process 0 2 4 Lag (f) Latent Process 10 12 14 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 tag 1 residuals (i) Latent Process 10 12 0) Active Lesions 0.2 0.3 tog fitted values (k) Relative Burden 0.2 0.3 tog fitted values Latent Process 0.4 0.6 log fitted values Appendix C. 
Residuals plots for each patient in the placebo group 109 Figure C.41: Residuals plots for patient #540 in the placebo group <a' Series: Active Lesions M Series: Relative Burden <c> Series: Latent Process TT Lag (d) Active Lesions •1.0 -0.8 -0.6 lag 1 residuals (g) Active Lesions 10 lima (j) Active Lesions 0.0 0.1 0.2 0.3 log fitted values TTTT TT TT 5 10 Lag (e) Relative Burden •2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 lag 1 residuals (h) Relative Burden 5 10 time (k) Relative Burden Lag (f) Latent Process -2.0 -1.5 -1.0 -0.5 0.0 0.5 tag 1 residuals (i) Latent Process Latent Process 0.0 0.1 0.2 log fitted values 0.0 0.2 0.4 log fitted values Appendix C. Residuals plots for each patient in the placebo group 110 Figure C.42: Residuals plots for patient #550 in the placebo group <a' Series: Active Lesions Series: Relative Burden <c> Series: Latent Process TT i M I I 1 i I i i l I I I ' i I 1 Lag (d) Active Lesions (e) Relative Burden -1.0 -0.8 -0.6 lag 1 residuals (g) Active Lesions •2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 jag 1 residuals (h) Relative Burden 10 time 0) Active Lesions (k) Relative Burden 5 10 Lag (f) Latent Process -2.0 -1.5 -1.0 -0.5 0.0 0.5 tag 1 residuals (i) Latent Process Latent Process 0.0 0.1 0.2 log fitted values 0.0 0.1 0.2 log fitted values -0.2 0.0 0.2 0.4 log fitted values Appendix C. Residuals plots for each patient in the placebo group 111 Figure C.43: Residuals plots for patient #565 in the placebo group (a) Series: Active Lesions *> Series: Relative Burden <c> Series: Latent Process _U_ | i 1 ' i I 5 10 Lag (d) Active Lesions (g) Active Lesions 5 10 time (j) Active Lesions Lag (e) Relative Burden (h) Relative Burden s 10 time (k) Relative Burden 10 15 (f) Latent Process 0.0 0.5 1.0 lag 1 residuals (i) Latent Process 5 10 15 time (I) Latent Process 0.0 0.1 0.2 0.3 0.4 0.0 0.1 0.2 0.3 0.4 -0.2 0.0 0.2 0.4 0.6 0.8 tog fitted values tog fitted values tog fitted values Appendix D Residuals plots for each patient in the low dose group 112 Appendix D. 
Residuals plots for each patient in the low dose group 113 Figure D.44: Residuals plots for patient #419 in the low dose group <a) Series: Active Lesions Lag (d) Active Lesions (g) Active Lesions 10 time (j) Active Lesions 0.0 0.1 0.2 log fitted values <") Series: Relative Burden TT Lag (e) Relative Burden •0.2 0.0 0.2 0.4 0.6 0.8 lag 1 residuals (h) Relative Burden 5 10 time (k) Relative Burden o.o 0.1 0.2 tog fitted values <c> Series: Latent Process J_L Lag (f) Latent Process •0.2 0.0 0.2 lag 1 residuals (i) Latent Process time Latent Process •0.2 0.0 0.2 0.4 log fitted values Appendix D. Residuals plots for each patient in the low dose group 114 Figure D.45: Residuals plots for patient #424 in the low dose group <a' Series: Active Lesions I"' Series: Relative Burden <c> Series: Latent Process 1 I 1 I I I 1 1 _LL T~TT | I 1 I 1 Lag ) Active Lesions (e) Relative Burden (f) Latent Process •1.1 -1.0 rag 1 residuals (g) Active Lesions -1 o 1 lag 1 residuals (h) Relative Burden (i) Latent Process ) Active Lesions 5 10 time (k) Relative Burden Latent Process 0.1 0.2 0.3 0.4 0.1 0.2 0.3 0.4 0.2 0.4 0.6 0.8 log lilted values log lilted values log Bed values Appendix D. Residuals plots for each patient in the low dose group 115 Figure D.46: Residuals plots for patient #448 in the low dose group <a> Series: Active Lesions ^Series: Relative Burden TT 1 ' i (d) Active Lesions | | 1 1 1 " ' i 0 10 15 Lag (e) Relative Burden •1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 (g) Active Lesions 10 line (j) Active Lesions -0.2 0.0 0.2 0.4 0.6 lag 1 residuals (h) Relative Burden 5 10 tune (k) Relative Burden 'c' Series: Latent Process (f) Latent Process -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0 (i) Latent Process 10 time Latent Process •0.10 -0.05 0.0 log fitted values -0.2 -0.1 0.0 log fitted values Appendix D. 
Residuals plots for each patient in the low dose group 116 Figure D.47: Residuals plots for patient #450 in the low dose group <a> Series: Active Lesions w Series: Relative Burden 1°) Series: Latent Process i I . 3 2! « o (d) Active Lesions -0.96 -0.94 -0.92 -0.90 (g) Active Lesions 10 time 0) Active Lesions 3 M 2 o O O (e) Relative Burden •0.86 -0.4 -0.2 0.0 0.2 0.4 0.6 lag 1 residuals (h) Relative Burden 5 10 time (k) Relative Burden (f) Latent Process •0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 (i) Latent Process Latent Process -0.20 -0.15 -0.10 log fitted values -0.05 -0.20 -0.15 -0.10 log fitted values -0.05 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 tog fitted values Appendix D. Residuals plots for each patient in the low dose group 117 Figure D.48: Residuals plots for patient #452 in the low dose group <a' Series: Active Lesions Series: Relative Burden 1°) Series: Latent Process I M|| III I I 1 I M Lag (d) Active Lesions •0.5 0.0 0.5 lag 1 residuals (g) Active Lesions Lag (e) Relative Burden 0.0 0.5 1.0 lag 1 residuals (h) Relative Burden g in Lag (f) Latent Process •0.5 0.0 0.5 1.0 1.5 (i) Latent Process 10 time (j) Active Lesions 10 15 (k) Relative Burden ffl Ul -o d 5 10 15 tune (I) Latent Process •0.06 -0.04 log fitted values •0.06 -0.04 -0.02 log fitted values •0.15 -0.10 -0.05 Fog fitted values Appendix D. Residuals plots for each patient in the low dose group 118 Figure D.49: Residuals plots for patient #499 in the low dose group <a) Series: Active Lesions TT Lag (d) Active Lesions (g) Active Lesions (j) Active Lesions <b) Series: Relative Burden (e) Relative Burden -0.2 0.0 0.2 0.4 lag 1 residuals (h) Relative Burden 5 10 time (k) Relative Burden <°> Series: Latent Process 5 10 15 (f) Latent Process •0.2 0.0 0.2 0.4 (i) Latent Process 5 10 15 time (I) Latent Process -0.2 0.0 0.2 log litted values Appendix D. 
Residuals plots for each patient in the low dose group 119 Figure D.50: Residuals plots for patient #508 in the low dose group <a' Series: Active Lesions i i i i i u Lag (d) Active Lesions •1.0 -0.5 0.0 lag 1 residuals (g) Active Lesions time (j) Active Lesions 0.10 0.15 0.20 0.25 log fitted values <") Series: Relative Burden i r i i Lag (e) Relative Burden 0 12 3 lag 1 residuals (h) Relative Burden 5 10 time (k) Relative Burden 0.10 0.15 0.20 0.25 log fitted values <°i Series: Latent Process TT Lag (f) Latent Process lag 1 residuals (i) Latent Process 10 time Latent Process 0.3 0.4 0.5 log fitted values Appendix D. Residuals plots for each patient in the low dose group 120 Figure D.51: Residuals plots for patient #521 in the low dose group <a) Series: Active Lesions o) Series: Relative Burden <c> Series: Latent Process TTT (d) Active Lesions •0.5 o.o lag 1 residuals (g) Active Lesions (j) Active Lesions •0.2 0.02 0.04 0.06 log fitted values (e) Relative Burden 0.0 0.2 0.4 tag 1 residuals (h) Relative Burden 5 10 tine (k) Relative Burden 0.02 0.04 0.06 log fitted values 0.08 0.10 S o IS ° Lag (f) Latent Process (i) Latent Process Latent Process -0.05 o.o 0.05 0.10 log fitted values Appendix D. Residuals plots for each patient in the low dose group 121 Figure D.52: Residuals plots for patient #525 in the low dose group <a' Series: Active Lesions <"> Series: Relative Burden <°> Series: Latent Process I I I I i 1 Lag (d) Active Lesions (e) Relative Burden (g) Active Lesions 2 q 10 time 0) Active Lesions 5 10 time (k) Relative Burden (f) Latent Process •10 12 3 -2-1012 lag 1 residuals lag 1 residuals (h) Relative Burden (i) Latent Process Latent Process 0.25 0.30 0.35 0.40 0.45 log fitted values 0.25 0.30 0.35 0.40 tog fitted values 0.6 0.7 log fitted values Appendix D. 
Residuals plots for each patient in the low dose group 122 Figure D.53: Residuals plots for patient #542 in the low dose group (a> Series: Active Lesions i i I i ci Series: Relative Burden TV 1 r w Series: Latent Process (d) Active Lesions -1.0 -0.8 -0.6 lag 1 residuals (g) Active Lesions (j) Active Lesions 0.3 0.4 bg fitted values 0.2 (e) Relative Burden lag 1 residuals (h) Relative Burden 5 10 time (k) Relative Burden 0.3 0.4 0.5 bg titled values o ° 3 2 o (f) Latent Process (i) Latent Process 10 time Latent Process 0.4 0.6 0.8 1.0 log fitted values Appendix D. Residuals plots for each patient in the low dose group 123 Figure D.54: Residuals plots for patient #544 in the low dose group <a>Series: Active Lesions 10 12 (d) Active Lesions (g) Active Lesions 2 4 6 10 12 time 0) Active Lesions ("Series: Relative Burden i_L 1 1 | i 0 2 4 8 10 12 Lag (e) Relative Burden 0.0 0.2 lag 1 residuals (h) Relative Burden 10 time (k) Relative Burden ('Series: Latent Process i l 1 I i (f) Latent Process -0.2 o.o lag 1 residuals (i) Latent Process time Latent Process 3 ° -0.15 -0.10 -0.05 0.0 0.05 0.10 -0.15 -0.10 -0.05 0.0 0.05 0.10 -0.3 -0.2 -0.1 0.0 0.1 0.2 log lilted values log fitted values log fitted values Appendix D. Residuals plots for each patient in the low dose group 124 Figure D.55: Residuals plots for patient #547 in the low dose group (a) Series: Active Lesions Lag (d) Active Lesions (g) Active Lesions (j) Active Lesions -o.i o.o log fitted values <"> Series: Relative Burden I i 1 I | i (e) Relative Burden (h) Relative Burden 5 10 time (k) Relative Burden -0.1 0.0 bg fitted values <c> Series: Latent Process _LL_L TT 5 10 Lag (f) Latent Process o.o 0.5 lag 1 residuals (i) Latent Process Latent Process -0.2 0.0 02 bg fitted values Appendix D. 
Residuals plots for each patient in the low dose group 125 Figure D.56: Residuals plots for patient #548 in the low dose group <a> Series: Active Lesions (d) Active Lesions (g) Active Lesions 10 time (j) Active Lesions 0.04 0.06 0.08 0.10 0.12 0.14 log fitted values <") Series: Relative Burden T1-Lag (e) Relative Burden (h) Relative Burden 5 10 time (k) Relative Burden 0.0 0.02 0.04 0.06 0.08 0.10 0.12 0.14 log fitted values <c) Series: Latent Process i i (f) Latent Process lag 1 residuals (i) Latent Process 10 time Latent Process 0.10 0.15 log fitted values Appendix D. Residuals plots for each patient in the low dose group 126 Figure D.57: Residuals plots for patient #564 in the low dose group (a> Series: Active Lesions T T~r~ Ug (d) Active Lesions •0.5 o.o lag 1 residuals ) Active Lesions 5 10 15 time (j) Active Lesions 0.2 0.3 0.4 log fitted values <•>> Series: Relative Burden "TT" Lag (e) Relative Burden 0.0 0.5 1.0 lag 1 residuals (h) Relative Burden 1.5 2.0 5 10 15 time (k) Relative Burden 0.2 0.3 0.4 log fitted values (°) Series: Latent Process TT TT Lag (f) Latent Process o 1 lag 1 residuals (i) Latent Process Latent Process 0.3 0.4 0.5 0.6 0.7 0.8 log fitted values Appendix D. Residuals plots for each patient in the low dose group 127 Figure D.58: Residuals plots for patient #568 in the low dose group <a>Series: Active Lesions i i i i i Lag (d) Active Lesions 10 12 •1.0 -0.8 -0.6 -0.4 -0.2 0.0 lag 1 residuals (g) Active Lesions 2 4 6 lime (j) Active Lesions -0.10 -0.05 0.0 0.05 0.10 bg titled values (•Series: Relative Burden TT (e) Relative Burden 0.4 0.6 0.8 lag 1 residuals (h) Relative Burden time (k) Relative Burden •0.10 0.0 0.05 0.10 log fitted values (°'Series: Latent Process j r_ TT Lag (f) Latent Process •0.2 0.0 0.2 0.4 0.6 0.8 1.0 lag 1 residuals (i) Latent Process 10 12 time Latent Process •0.1 0.0 0.1 bg fitted values 0.2 Appendix E Residuals plots for each patient in the high dose group 128 Appendix E. 
Residuals plots for each patient in the high dose group 129 Figure E.59: Residuals plots for patient #422 in the high dose group <a> Series: Active Lesions _L_L 0 in 1 o 3 O (d) Active Lesions Active Lesions 10 time 0) Active Lesions •0.10 -0.05 0.0 log fitted values 0.05 0.10 i"' Series: Relative Burden (e) Relative Burden (h) Relative Burden 5 10 time (k) Relative Burden <c> Series: Latent Process •0.05 0.0 log fitted values I II I 0 10 15 Lag f) Latent Process 5 0.0 0.5 lag 1 residuals (i) Latent Process 10 time Latent Process •0.1 0.0 log fitted values Appendix E. Residuals plots for each patient in the high dose group 130 Figure E.60: Residuals plots for patient #444 in the high dose group <a> Series: Active Lesions Series: Relative Burden Lag (d) Active Lesions -0.6 -0.4 -0.2 Lag 1 residuals (g) Active Lesions time (j) Active Lesions "n-r 'MM (e) Relative Burden lag 1 residuals (h) Relative Burden (k) Relative Burden <c> Series: Latent Process TTT TT -0.32 -0.30 -0.28 -0.26 -0.24 -0.22 -0.20 -0.18 bg fitted values -0.32 -0.30 -0.28 -0.26 -0.24 -0.22 -0.20 -0.18 bg fitted values Lag (f) Latent Process 0.0 0.5 lag 1 residuals (i) Latent Process 10 time Latent Process •0.60 -0.55 -0.50 -0.45 -0.40 -0.35 bg fitted values Appendix E. Residuals plots for each patient in the high dose group 131 Figure E.61: Residuals plots for patient #445 in the high dose group <a) Series: Active Lesions <b> Series: Relative Burden •1.0 -0.8 -0.6 -0.4 -0.2 0.0 lag 1 residuals (g) Active Lesions •0.2 0.0 0.2 0.4 0.6 0.8 1.0 lag 1 residuals (h) Relative Burden 10 time 0) Active Lesions 5 10 time (k) Relative Burden 0.0 0.1 log fitted values 0.2 (°> Series: Latent Process o w d U. ill o ,11,. 1 ' 1 II o 1 1 10 1 5 10 15 d 0 5 10 15 Lag Lag (d) Active Lesions (e) Relative Burden q luals 0.6 O CM • d CM • d • TT Ml,. Lag (f) Latent Process •0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 lag 1 residuals (i) Latent Process 10 time Latent Process -0.2 0.0 0.2 bg fitted values Appendix E. 
Residuals plots for each patient in the high dose group 132 Figure E.62: Residuals plots for patient #453 in the high dose group 'a> Series: Active Lesions <b> Series: Relative Burden <c> Series: Latent Process 1 I ' 1 ' I 1 | 1 r (d) Active Lesions (g) Active Lesions Lag (e) Relative Burden ).0 0.2 0.4 lag 1 residuals (h) Relative Burden 2 °-n o Lag (f) Latent Process (i) Latent Process •0.18 -0.16 -0.14 -0.12 -0.10 -0.08 -0.06 log fitted values •0.18 -0.16 -0.14 -0.12 -0.10 -0.08 -0.06 tog fitted values -0.30 -0.25 -0.20 log fitted values •0.15 Appendix E. Residuals plots for each patient in the high dose group 133 Figure E.63: Residuals plots for patient #454 in the high dose group <a' Series: Active Lesions (d) Active Lesions (g) Active Lesions 10 time (j) Active Lesions ("i Series: Relative Burden i i TT Lao (e) Relative Burden 0.0 0.2 0.4 0.6 0.B lag 1 residuals (h) Relative Burden 1.0 1.2 (k) Relative Burden <c) Series: Latent Process •0.2 Lag (f) Latent Process 0.0 0.2 0.4 0.6 lag 1 residuals (i) Latent Process Latent Process TT •0.2 -0.1 0.0 0.1 0.2 0.3 0.4 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.6 log fitted values bg fitted values bg fitted values Appendix E. Residuals plots for each patient in the high dose group 134 Figure E.64: Residuals plots for patient #497 in the high dose group <a> Series: Active Lesions JLL (d) Active Lesions -1.02 -1.00 (g) Active Lesions 10 time (j) Active Lesions -o.io -0.05 o.o log fitted values 0.05 0.10 i"i Series: Relative Burden 5 10 Lag (e) Relative Burden lag 1 residuals (h) Relative Burden 5 10 time (k) Relative Burden <c> Series: Latent Process -0.10 -0.05 0.0 log fitted values 0.05 0.10 -0.3 5 10 Lag (f) Latent Process (i) Latent Process Latent Process -0.1 .0.0 log Med values Appendix E. 
Residuals plots for each patient in the high dose group 135 Figure E.65: Residuals plots for patient #503 in the high dose group <a> Series: Active Lesions "TT Lag (d) Active Lesions Active Lesions 10 time (j) Active Lesions "»Series: Relative Burden s •D » 9 S o 0 s 10 15 Lag (e) Relative Burden 0.0 0.5 lag 1 residuals (h) Relative Burden time (k) Relative Burden 'c> Series: Latent Process I I I 0 5 10 15 Lag (t) Latent Process 0.0 0.5 lag 1 residuals (i) Latent Process 10 time Latent Process •0.05 0.0 0.05 0.10 -0.05 0.0 0.05 0.10 -0.1 0.0 0.1 0.2 log fitted values log fitted values log fitted values Appendix E. Residuals plots for each patient in the high dose group 136 Figure E.66: Residuals plots for patient #506 in the high dose group ("Series: Active Lesions (d) Active Lesions 0.0 0.5 tag 1 residuals (g) Active Lesions lime (j) Active Lesions •Series: Relative Burden "Series: Latent Process ACF .0 0.5 1.0 . I , i ACF .0 0.5 1.0 I I I I ' ' ' -0.5 0. I I 1 0 2 4 6 8 10 12 Lag (e) Relative Burden 0 2 4 6 8 10 12 Lag (f) Latent Process -0.6 -0.4 -0.2 0.0 0.2 (h) Relative Burden 2 « 9 time (k) Relative Burden -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 lag 1 residuals (i) Latent Process 10 12 time (I) Latent Process -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 log fitted values •0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 log fitted values -0.6 -0.4 -0.2 log fitted values 0.0 Appendix E. Residuals plots for each patient in the high dose group 137 Figure E.67: Residuals plots for patient #524 in the high dose group <a> Series: Active Lesions ""Series: Relative Burden ACF .0 0.5 1.0 1 , . 
1 i ACF .0 0.5 1.0 i 1 1 I III o in 1 | 1 1 0 5 10 15 Lag (d) Active Lesions 0 5 10 15 Lag (e) Relative Burden •0.5 0.0 0.5 lag 1 residuals (g) Active Lesions 10 time (j) Active Lesions 0.0 0.1 0.2 0.3 0.4 lag 1 residuals (h) Relative Burden 5 10 time (k) Relative Burden •0.15 -0.10 -0.05 0.0 0.05 0.10 0.15 0.20 tog fitted values -0.15 -0.10 -0.05 <c' Series: Latent Process (f) Latent Process 0.0 0.2 0.4 lag 1 residuals (i) Latent Process 10 time Latent Process •0.2 -0.1 0.0 0.1 tog fitted values 0.3 Appendix E. Residuals plots for each patient in the high dose group 138 Figure E.68: Residuals plots for patient #526 in the high dose group o) Series: Active Lesions *) Series: Relative Burden <c> Series: Latent Process (d) Active Lesions -1.05 -1.00 -0.95 lag 1 residuals (g) Active Lesions 10 lime 0) Active Lesions TT Ug (e) Relative Burden 0.0 0.5 lag 1 residuals (h) Relative Burden 5 10 time (k) Relative Burden A o 5 10 15 Lag (f) Latent Process -0.5 0.0 0.5 lag 1 residuals (i) Latent Process 5 10 15 time (I) Latent Process 3d TJ O 1 -0.20 -0.15 -0.10 -0.05 0.0 0.05 log fitted values -0.20 -0.15 -0.10 -0.05 0.0 0.05 log fitted values -0.4 -0.3 -0.2 -0.1 0.0 0.1 log fitted values Appendix E. Residuals plots for each patient in the high dose group 139 Figure E.69: Residuals plots for patient #541 in the high dose group <a' Series: Active Lesions i i ' (d) Active Lesions i -0.96 -0.94 -0.92 -O.90 lag 1 residuals (g) Active Lesions 10 tine (j) Active Lesions c» Series: Relative Burden ^T (e) Relative Burden •0.4 -0.2 0.0 0.2 0.4 0.6 lag 1 residuals (h) Relative Burden 5 10 time (k) Relative Burden <c> Series: Latent Process Lag (f) Latent Process -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.6 lag 1 residuals (i) Latent Process 10 time Latent Process -0.20 -0.15 -0.10 -0.05 log fitted values •0.20 -0.15 -0.10 -0.05 log fitted values -0.3 -0.2 -0.1 bg fitted values Appendix E. 
Residuals plots for each patient in the high dose group 140 Figure E.70: Residuals plots for patient #543 in the high dose group <a> Series: Active Lesions Lag (d) Active Lesions 5 "? •0.92 -0.90 -0.88 -0.81 lag 1 residuals (g) Active Lesions 10 lime (j) Active Lesions <s o >- O) 9 -O20 -0.15 log fitted values -0.10 w Series: Relative Burden <°> Series: Latent Process ACF .0 0.5 1.0 I I I I 1 1 ACF .0 0.5 1.0 I . ill, 111 111 II O III I 1 0 5 10 15 Lag (e) Relative Burden d t 0 5 10 15 Lag (f) Latent Process 0.0 0.5 lag 1 residuals (h) Relative Burden 5 10 time (k) Relative Burden -0.25 •OiO -0.15 log lilted values 0.0 lag 1 residuals (i) Latent Process 10 time Latent Process •0.45 -0.40 -0.35 log fitted values -0.30 Appendix E. Residuals plots for each patient in the high dose group 141 Figure E.71: Residuals plots for patient #546 in the high dose group <aSeries: Active Lesions Series: Relative Burden '•Series: Latent Process 6 Lag (d) Active Lesions 1.00 -0.96 -0.96 -0.94 -0.92 lag 1 residuals (g) Active Lesions 6 Lag (e) Relative Burden (h) Relative Burden _L_L 6 Lag (f) Latent Process (i) Latent Process (j) Active Lesions (k) Relative Burden -0.10 -0.05 0.0 log fitted values Latent Process -0.3 -0.2 -0.1 0.0 log fitted values Appendix E. Residuals plots for each patient in the high dose group 142 Figure E.72: Residuals plots for patient #549 in the high dose group <a) Series: Active Lesions Lag (d) Active Lesions -1.0 -0.8 -0.6 -0.4 -0.2 0.0 lag 1 residuals (g) Active Lesions (j) Active Lesions 0.1 0.2 log lilted values w Series: Relative Burden -0.1 Lag (e) Relative Burden 0.0 0.2 0.4 0.6 0.8 lag 1 residuals (h) Relative Burden (k) Relative Burden o.o 0.1 0.2 log fitted values <c' Series: Latent Process J_LL Lag (f) Latent Process « o 2 « (0 O 0.0 0.2 0.4 lag 1 residuals (i) Latent Process 0.6 0.8 10 time Latent Process 0.0 0.2 log fitted values Appendix E. 
Residuals plots for each patient in the high dose group 143 Figure E.73: Residuals plots for patient #566 in the high dose group <a> Series: Active Lesions ""TT Lag (d) Active Lesions •1.00 -0.95 lag 1 residuals (g) Active Lesions time (j) Active Lesions 0.05 0.10 log fitted values "> Series: Relative Burden <c> Series: Latent Process ACF o o.5 1 .a rill ACF o o.5 1 .a 1 11 111 o | 1 Ml 0 5 10 15 Lag (e) Relative Burden d 0 5 10 15 Lag (f) Latent Process lag 1 reskjuals (h) Relative Burden 5 10 time (k) Relative Burden 0.05 0.10 log fitted values 0.15 0.0 •0.5 0.0 0.5 lag 1 residuals (i) Latent Process 10 time Latent Process 0.1 0.2 log fitted values
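The residual diagnostics in the figures of Appendices C to E can be reproduced for any residual series with a few lines of code. The sketch below is a minimal Python illustration, not the software used for the thesis: it assumes the residuals for one series (Active Lesions, Relative Burden or Latent Process) of one patient are already available as a list in time order, and computes the sample autocorrelations of the kind shown in panels (a) to (c), the lag 1 pairs of the kind shown in panels (d) to (f), and the usual approximate 95% white-noise band. The function names and the example residuals are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sample_acf(resid: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_0, ..., r_max_lag of one residual series.

    Uses the usual estimator
        r_k = sum_{t=0}^{n-k-1} (e_t - m)(e_{t+k} - m) / sum_t (e_t - m)^2,
    where m is the series mean, so r_0 = 1.
    """
    n = len(resid)
    if n < 2:
        raise ValueError("need at least two residuals")
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be in [0, n - 1]")
    m = sum(resid) / n
    dev = [e - m for e in resid]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("residual series is constant")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def lag1_pairs(resid: Sequence[float]) -> List[Tuple[float, float]]:
    """Pairs (e_{t-1}, e_t) for a scatter of residuals against lag 1 residuals."""
    return list(zip(resid[:-1], resid[1:]))


def acf_bound(n: int) -> float:
    """Approximate 95% band +/- 1.96/sqrt(n) for white-noise autocorrelations."""
    return 1.96 / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical residuals for one patient's series (not data from the thesis).
    e = [0.3, -0.1, 0.4, -0.5, 0.2, 0.0, -0.2, 0.1]
    print([round(x, 3) for x in sample_acf(e, 3)])
    print(lag1_pairs(e)[:2])
    print(round(acf_bound(len(e)), 3))
```

If a fitted model has captured the serial dependence, the autocorrelations at nonzero lags should mostly fall inside the band, and the lag 1 scatter should show no clear trend.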
Item Metadata
Title | Analysis of longitudinal data of mixed types using a state space model approach |
Creator | Ching, Billy K. S. |
Date | 1997 |
Date Issued | 2009-03-24 |
Extent | 5167121 bytes |
Genre | Thesis/Dissertation |
Type | Text |
File Format | application/pdf |
Language | Eng |
Collection | Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project |
Date Available | 2009-03-24 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0087894 |
Degree | Master of Science - MSc |
Program | Statistics |
Affiliation | Faculty of Science; Department of Statistics |
Degree Grantor | University of British Columbia |
Graduation Date | 1997-11 |
Campus | UBCV |
Scholarly Level | Graduate |
URI | http://hdl.handle.net/2429/6389 |
Aggregated Source Repository | DSpace |