Magnetic Resonance Imaging Lesion Count as a Surrogate Endpoint in Relapsing-Remitting Multiple Sclerosis Clinical Trials

by

Lang Qin

B.Sc., Jinan University, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

The Faculty of Graduate Studies (Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August 2011

© Lang Qin, 2011

Abstract

The count of active lesions based on magnetic resonance imaging (MRI) is often used as a potential surrogate endpoint in phase 2 clinical trials for relapsing-remitting multiple sclerosis (RRMS) patients. However, this surrogacy relationship has not been completely validated. In this report, we study whether, at the trial level, the MRI lesion count is a good surrogate endpoint for the relapse rate, the usual clinical endpoint for RRMS clinical trials. Two different approaches to assess this surrogacy relationship are applied to the dataset used by Sormani et al. [1] (SBRCMB), which contains the summary results from 23 randomized, placebo-controlled clinical trials in RRMS. The SBRCMB approach uses simple linear regression with weighted least squares estimation, while our more comprehensive approach develops a detailed model for the endpoints and the treatment effects to take into account estimation errors and the correlated contrasts. Both approaches are based only on the summary results from each clinical trial. The shortcomings of the SBRCMB approach are discussed and the results from the two approaches are compared. Both approaches show that the MRI lesion count is a good surrogate endpoint, while our more comprehensive approach shows a nearly perfect surrogacy relationship. When the estimated surrogacy relationship is used to predict the true treatment effect on the clinical endpoint for the trials in the SBRCMB dataset, the approaches give similar point predictions, but the approximate 95% prediction intervals from the comprehensive approach are generally shorter.
In practice, the estimated surrogacy relationship based on the comprehensive approach can give a precise prediction for the true treatment effect on the clinical endpoint if the treatment displays a large effect on the surrogate endpoint, but may otherwise lead to an inconclusive result.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgments
Dedication
1 Introduction
    1.1 What is a Surrogate Endpoint?
    1.2 Surrogate Endpoints in Multiple Sclerosis
    1.3 Outline of the Report
2 Literature Review: Validation of Surrogate Endpoints
    2.1 Importance of Validating a Potential Surrogate Endpoint
    2.2 Methods of Validating Surrogate Endpoints
        2.2.1 The Prentice Operational Criteria for Validation
        2.2.2 Validation in a Single Clinical Trial
        2.2.3 Validation in Multiple Clinical Trials
    2.3 Validation in Multiple Clinical Trials with Individual Data Unavailable
        2.3.1 Review of Daniels and Hughes [2]
        2.3.2 Review of Korn et al. [3]
        2.3.3 Comparison of These Two Approaches
3 Lesion Counts as a Surrogate Endpoint in RRMS: the SBRCMB Approach
    3.1 Introduction and the SBRCMB Dataset
    3.2 The SBRCMB approach
    3.3 Critique of the SBRCMB Approach
        3.3.1 The Appropriateness of the Weights
        3.3.2 Correlation of the Contrasts
        3.3.3 Influence of Estimation Errors
4 Lesion counts as a Surrogate Endpoint in RRMS: A More Comprehensive Approach
    4.1 Model for the Single-contrast Clinical Trials
        4.1.1 Model for the True Treatment Effects
        4.1.2 Model for the Observed Annualized Relapse Rate and MRI Lesion Count Per Patient Per Scan
        4.1.3 Model for the Estimated Treatment Effects
    4.2 Model for the Multiple-contrast Clinical Trials
    4.3 Parameter Estimation
    4.4 Comparison between the Comprehensive Approach and the SBRCMB Approach
    4.5 Assessment of the Estimated Surrogacy Relationship in Practice
5 Conclusions and Discussion
Bibliography
Appendices
    A The SBRCMB Dataset
    B Partial Derivatives of $E(Y_0^{\text{true}} \mid X_0 = x_0)$

List of Tables

Table 3.1 Results of the Sensitivity Study
Table 3.2 Results of the Interaction Study
Table 4.1 Results of the Model Fit
Table 4.2 Comparison of the Approximate 95% Prediction Intervals for $\exp(Y_0^{\text{true}}(x_0))$ for the SBRCMB and Comprehensive Approaches
Table 4.3 Influence of the Sample Size $N_0$ and the Magnitude of the Estimated Treatment Effect on the Surrogate Endpoint on the 95% Prediction Intervals for the True Treatment Effect on the Clinical Endpoint for Trials with $K_0 = 6$ Scans per Patient. The Entries are the Point Predictions and Approximate 95% Prediction Intervals for $\exp(Y_0^{\text{true}}(x_0))$.

List of Figures

Figure 2.1 Scenarios of Perfect (a) and Imperfect (b,c,d) Surrogates
Figure 3.1 Scatter Plot of Estimated Treatment Effects
Figure 3.2 Results of the Validation Study
Figure 3.3 Scatter Plot of $(c, 1/w)$
Figure 3.4 Scatter Plot of $(c, 1/w')$
Figure 4.1 Regression Prediction Lines: the SBRCMB Approach ($y = -0.02 + 0.55x$) and the Comprehensive Approach with $K_0 = 6$ and $N_{a0} = N_{c0} = 50$ ($y = 0.50x$).
Figure 4.2 Regression Prediction Lines: the SBRCMB Approach ($y = -0.02 + 0.55x$) and the Comprehensive Approach with $K_0 = 6$ and $N_{a0} = N_{c0} = \infty$ ($y = 0.08 + 0.62x$).
Figure 4.3 Point Predictions for the 40 Contrasts
Figure 4.4 Comparison of Point Predictions for the 40 Contrasts
Figure 4.5 Comparison of the Approximate 95% Prediction Intervals for $\exp(Y_0^{\text{true}}(x_0))$ for the SBRCMB and Comprehensive Approaches
Figure 4.6 Threshold Value of $\exp(X_0)$ versus Sample Size $N_0$ when a Beneficial Treatment Effect is Observed on the Surrogate Endpoint
Figure 4.7 Threshold Value of $\exp(X_0)$ versus Sample Size $N_0$ when a Negative Treatment Effect is Observed on the Surrogate Endpoint
Acknowledgments

I would like to express my sincerest gratitude to my supervisor, Professor John Petkau. I could never have finished my thesis without his insightful guidance and constant encouragement. I am also grateful for his enlightening teaching in STAT550 and STAT551, from which I started to learn how to think as a statistician. I would like to thank Professor Lang Wu for being my second reader and providing valuable comments. I would like to thank my best friend Guannan Li, to whom I can always express my sadness when I am depressed. I would like to thank my statistics colleague Yumi Kondo, who always discussed statistics with me and made my days at the department interesting and cheerful. I would also like to thank Jun Chen for being a very nice roommate who bore with my irregular working schedule. Finally, I would like to express my deepest gratitude to my beloved parents. Without their love, I could never have completed my graduate study.

To my family.

Chapter 1

Introduction

1.1 What is a Surrogate Endpoint?

In clinical trials, a clinical endpoint generally refers to the occurrence of a disease, a symptom, a sign or a laboratory abnormality that constitutes one of the target outcomes of the trial. It directly measures how a patient feels, functions or survives and thus is used to determine whether the treatment being studied is beneficial. A surrogate endpoint is an outcome which can be used as a substitute for a clinical endpoint. When assessing the treatment effect, a surrogate endpoint can be used to generate reliable conclusions instead of using the corresponding clinical endpoint directly.
Examples of potential surrogate endpoints include CD4 cell count for HIV-related disease progression in clinical trials of anti-HIV treatments, progression-free survival time for survival time in clinical trials of treatments for advanced ovarian cancer, and serum cholesterol levels for survival in clinical trials of treatments for cardiovascular disease. More examples of potential surrogate endpoints can be found in Burzykowski et al. [4].

Why are surrogate endpoints required? The principal reason is that in many clinical trials, it is difficult to use the desired clinical endpoints directly. The clinical endpoint may be rare, so a large number of patients would be required for a trial with adequate power (e.g. short-term mortality in patients with suspected acute myocardial infarction). The clinical endpoint may need a very long follow-up time to be detected (e.g. survival of patients in early-stage cancers), but too many patients might then be lost to follow-up. The clinical endpoint may also be difficult or costly to measure. In contrast, surrogate endpoints are outcomes that occur more often or are easier to measure. The motivation for the use of a surrogate endpoint is therefore the possibility of a reduction in the number of required patients or in the required trial duration.

In order to effectively substitute for a formal clinical endpoint, a surrogate endpoint must have the potential to yield unambiguous information about differential treatment effects on a clinical endpoint. The formal definition of a surrogate endpoint is given by Prentice [5] as “a response variable for which a test of the null hypothesis of no relationship to the treatment groups under comparison is also a valid test of the corresponding null hypothesis based on the clinical endpoint”. The Prentice definition means that if a treatment has an effect on a clinical endpoint, then the treatment also has an effect on the surrogate endpoint, and the converse is also true.
Mathematically, if S and C denote the surrogate endpoint and the clinical endpoint respectively, and Z denotes the treatment, then the Prentice definition can be written as:

\[
f(S \mid Z) = f(S) \iff f(C \mid Z) = f(C), \tag{1.1}
\]

where $f(X)$ denotes the probability distribution of the random variable X and $f(X \mid Z)$ denotes the probability distribution of X conditional on the value of Z.

1.2 Surrogate Endpoints in Multiple Sclerosis

Multiple sclerosis (MS) is a chronic and often disabling disease of the central nervous system. MS affects the ability of nerve cells in the brain and spinal cord to communicate with each other. Nerve cells communicate by sending electrical signals called action potentials down long fibers called axons, which are wrapped in an insulating substance called myelin. In MS, the body’s own immune system attacks and damages the myelin. When myelin is lost, the axons can no longer effectively conduct signals. The name multiple sclerosis refers to scars, particularly in the white matter of the brain and spinal cord, which is mainly composed of myelin. MS results in symptoms including difficulties in moving and coordination, deterioration of sensory functions, and problems in bowel and bladder functions, among many others. MS onset usually occurs in young adults, and it is more common in women. Although much is known about the mechanisms involved in the disease process, the cause remains unknown, and there is no known cure for the disease to date.

There are several types of MS characterized by disease progression in terms of severity of disability. Relapsing-remitting MS (RRMS), the most common type, is characterized by unpredictable relapses followed by periods of months to years of relative quiet (remission) with no new signs of disease activity.
Until now, the only accepted primary endpoints for pivotal clinical trials of new treatments for RRMS are clinical outcomes, including relapse rate and accumulation of permanent disability, usually measured by the Expanded Disability Status Scale (EDSS). There is no fully validated surrogate endpoint for RRMS yet. In RRMS clinical trials, magnetic resonance imaging (MRI) scans of the brain are often utilized to help monitor patients’ health and the progression of their disease. McFarland et al. [6] argue that changes in MS brain lesion patterns determined by MRI scans, which reflect the underlying disease pathology, may be the best candidate for a surrogate endpoint in RRMS.

1.3 Outline of the Report

The objective of our study is to address the question: Are changes in brain lesion patterns determined by MRI a good surrogate endpoint for the relapse rate, the clinical endpoint in RRMS clinical trials?

This chapter has provided some background information about surrogate endpoints and MS. Chapter 2 provides a general review of how to validate a potential surrogate endpoint. We first discuss the importance of validation and then review different approaches, first for data from a single clinical trial and then for data from multiple clinical trials. In the situation of multiple clinical trials, we focus on the scenario where only summary statistics for each trial are available. We review the methods adopted in Daniels and Hughes [2] and Korn et al. [3] in detail.

Chapter 3 considers validation in the RRMS setting. The specific potential surrogate endpoint we will focus on is the MRI lesion count, and the corresponding clinical endpoint is the annualized relapse rate. Information is presented on the dataset of Sormani et al. [1] (hereafter referred to as SBRCMB) used to assess the surrogacy relationship. The methodology of SBRCMB is discussed in detail, as well as the potential drawbacks of their approach.
In Chapter 4, we develop a related but different model to assess the surrogacy relationship. We focus on the measurement error that arises in estimating the surrogate endpoint and the clinical endpoint, and on the setting where data is available from several clinical trials, including some having more than two arms. We compare the results from the SBRCMB model and from our model. We also evaluate the prediction ability of the estimated surrogacy relationship to determine whether the surrogate endpoint is useful in practice. Chapter 5 summarizes the overall findings and discusses problems that remain to be investigated.

Chapter 2

Literature Review: Validation of Surrogate Endpoints

2.1 Importance of Validating a Potential Surrogate Endpoint

It is essential to validate a potential surrogate endpoint before using it as the primary outcome in a clinical trial. A surrogate endpoint should be able to assess the treatment effect in a clinical trial, and the result obtained from the surrogate endpoint should be consistent with that obtained from the corresponding clinical endpoint. Inconsistent results will lead to an incorrect conclusion about the treatment effect, and thus to misuse of the treatment in future, which may be ineffective or even harmful for patients. For example, in some clinical trials of cardiovascular disorders, blood pressure is used as a surrogate endpoint for actual survival of a patient. However, some treatments that are useful in lowering a patient’s blood pressure have been shown to have no effect in reducing the risk of death from myocardial infarction. More examples of misuse of potential surrogate endpoints can be found in Fleming and DeMets [7].

Most potential surrogate endpoints are prognostic biomarkers, which means there is a strong association between the biomarker and the clinical endpoint at the level of the individual patient.
Such an association reflects a potential biological relationship between the biomarker and the clinical endpoint. However, as many studies have shown, a strong association is not enough. Surrogate endpoints are about assessing treatment effects. This means that, at the trial level, the treatment effect obtained from a surrogate endpoint must reliably predict the treatment effect obtained from the clinical endpoint. Examples of the misuse of prognostic biomarkers as surrogate endpoints can be found in Fleming and DeMets [7].

Focusing on the Prentice definition of a surrogate endpoint (1.1), we require that if a treatment has an effect on a surrogate endpoint, then it also has an effect on the clinical endpoint. However, we also require that if a treatment doesn’t have an effect on the surrogate endpoint, then it doesn’t have an effect on the clinical endpoint either. Biologically, this implies the surrogate endpoint is on the sole causal pathway of the disease process to the clinical endpoint.

Figure 2.1 illustrates the perfect scenario for a surrogate as well as some imperfect scenarios: D, S and C stand for the disease, the surrogate endpoint and the clinical endpoint in a clinical trial respectively, while Z stands for the treatment applied in this clinical trial. Panel (a) shows the situation of a perfect surrogate endpoint, in which S is on the sole causal pathway from D to C. So the entire effect of Z on S will extend to C, and Z cannot affect C without affecting S. Panels (b), (c) and (d) show some situations of imperfect surrogate endpoints. Note that, in all three of these situations, S is associated with C since they both are influenced by the same disease process D. However, in panel (b), S is not on the causal pathway from D to C. In the case illustrated, Z could affect S but not C, so S is not a surrogate endpoint for C. In panel (c), there are two pathways from D to C, and S is on one of them.
If Z affects C only through X on the second pathway, then S is not a surrogate endpoint for C; if Z can affect C through both S and X, then S is an imperfect surrogate endpoint for C. In such a case, an effect of Z on S could imply an effect of Z on C. However, since Z can bypass S and still influence C through X, it is possible that there is an effect on C but no effect on S. On the other hand, the effect of Z on S and the effect of Z on X may counteract each other, leading to no net effect of Z on C. In panel (d), it is possible that the effect of Z on S doesn’t extend to C, but to X instead. In this case, if there is no treatment effect on S, then there is no treatment effect on C, but the converse is not always true.

[Figure 2.1: Scenarios of Perfect (a) and Imperfect (b,c,d) Surrogates]

2.2 Methods of Validating Surrogate Endpoints

2.2.1 The Prentice Operational Criteria for Validation

Prentice [5] proposed 4 operational criteria to validate a potential surrogate endpoint. Recalling his definition of a surrogate endpoint (1.1):

\[
f(S \mid Z) = f(S) \iff f(C \mid Z) = f(C),
\]

and using the same notation, we can express the Prentice operational criteria as:

\[
f(S \mid Z) \neq f(S) \tag{2.1}
\]
\[
f(C \mid Z) \neq f(C) \tag{2.2}
\]
\[
f(C \mid S) \neq f(C) \tag{2.3}
\]
\[
f(C \mid S, Z) = f(C \mid S) \tag{2.4}
\]

Essentially, (2.1) requires that the treatment has an effect on the surrogate endpoint, (2.2) requires that the treatment has an effect on the clinical endpoint, (2.3) requires that different values of the surrogate endpoint result in different values of the clinical endpoint, which means the surrogate endpoint is a prognostic biomarker, and (2.4) requires that the surrogate endpoint should completely capture the dependence of the clinical endpoint on the treatment. In practice, (2.1) and (2.2) are considered as necessary conditions for an outcome to be a surrogate endpoint, but not “actual” validation criteria.
Note that (1.1) is equivalent to $f(S \mid Z) \neq f(S) \iff f(C \mid Z) \neq f(C)$, so (2.1) and (2.2) must either both hold or both fail. Criteria (2.3) and (2.4) are the “actual” validation criteria. Usually, (2.3) is examined before (2.4), because a surrogate endpoint is expected to be a good prognostic biomarker. Criterion (2.4) is the essential part of the Prentice operational criteria. It means the treatment effect on the clinical endpoint can be entirely captured by the surrogate endpoint. A common way to examine (2.4) is to assume a regression model of the form $C = \alpha + \beta Z + \gamma S + \varepsilon$ and to check whether the estimated regression coefficient for S is significantly different from 0 and that for Z is not. For this approach to be valid, one has to believe that the regression model describes the true relationship among C, S and Z.

Buyse and Molenberghs [8] show that (2.3) and (2.4) are necessary and sufficient conditions to establish (1.1) when the surrogate endpoint of interest is a binary outcome. When the surrogate endpoint is not binary, the criteria are only sufficient but not necessary; that is, if (2.3) and (2.4) are satisfied, then a treatment effect on the clinical endpoint ensures a treatment effect on the surrogate endpoint, but a treatment effect on the surrogate endpoint may not imply a treatment effect on the clinical endpoint. In terms of Figure 2.1, (2.3) and (2.4) exclude the situations (b) and (c), but not (d). (In (d), (2.3) holds because both S and C are influenced by D, and (2.4) holds because Z cannot affect C without affecting S.) Some counterexamples are given in Buyse and Molenberghs [8] and Berger [9].

2.2.2 Validation in a Single Clinical Trial

To check criterion (2.4), one needs to show that the statistical test for the treatment effect on the clinical endpoint is nonsignificant after adjustment for the surrogate endpoint.
However, this requirement raises a conceptual difficulty in validation since a nonsignificant result may simply be due to insufficient power of the statistical test. Hence, (2.4) is useful in rejecting a poor surrogate endpoint (the statistical test leads to a significant result), but is inadequate to validate a good surrogate endpoint.

To overcome this difficulty, Freedman and Graubard [10] proposed a quantity called “proportion of the treatment effect explained by the surrogate” (PE) to measure the quality of a potential surrogate. Let $\beta$ and $\beta_s$ be the parameters representing the treatment effect on the clinical endpoint C without and with adjustment for the surrogate endpoint S respectively. Then PE is defined as:

\[
PE = \frac{\beta - \beta_s}{\beta} = 1 - \frac{\beta_s}{\beta}. \tag{2.5}
\]

It is expected that $\beta_s = 0$ when the surrogate is perfect; in this case, $PE = 1$. Naturally, PE being closer to 1 implies the surrogate endpoint explains more of the treatment effect on the clinical endpoint. In practice, $\beta$ and $\beta_s$ are replaced by their estimates, and the 2-sided 95% confidence interval for PE is constructed. Freedman and Graubard [10] suggested the lower limit of the interval should be greater than a critical value, say 0.5, for the surrogate endpoint to be considered useful.

For example, in a clinical trial, let $\alpha$ and $\beta$ denote the treatment effect on the surrogate endpoint and the clinical endpoint respectively, and let $S_j$, $C_j$ and $Z_j$ denote the surrogate endpoint, the clinical endpoint and the treatment received for the jth patient. Here, $Z_j$ is an indicator variable, which can be either 1 (the jth patient is in the active arm) or 0 (the jth patient is in the control arm). We often refer to the combination of an active arm and a control arm as a “contrast”. So, $\alpha$ and $\beta$ are the treatment effects obtained from the contrast in this clinical trial (i.e., by comparing the active arm and the control arm).
Assume the model:

\[
\begin{aligned}
S_j &= \mu_s + \alpha Z_j + \varepsilon_{sj}, \\
C_j &= \mu_c + \beta Z_j + \varepsilon_{cj},
\end{aligned} \tag{2.6}
\]

where the error terms ($\varepsilon_{sj}$ and $\varepsilon_{cj}$) have a bivariate normal distribution with mean 0 and variance-covariance matrix

\[
\Sigma = \begin{pmatrix} \sigma_{ss} & \sigma_{sc} \\ \sigma_{sc} & \sigma_{cc} \end{pmatrix}. \tag{2.7}
\]

Then, one can obtain the conditional distribution of $C_j$ given $S_j$, which is parameterized as:

\[
C_j \mid S_j = \mu + \beta_s Z_j + \gamma S_j + \varepsilon_j, \tag{2.8}
\]

where $\beta_s = \beta - \frac{\sigma_{sc}}{\sigma_{ss}} \alpha$. In this model, PE is given by:

\[
PE = 1 - \frac{\beta_s}{\beta} = \frac{\sigma_{sc}}{\sigma_{ss}} \cdot \frac{\alpha}{\beta}. \tag{2.9}
\]

Despite PE’s description as the “proportion” of the treatment effect explained by the surrogate endpoint, it is not actually a “proportion”. Molenberghs et al. [11] point out that the range of PE is not between 0 and 1 and discuss the interpretation problems of PE. For instance, the PE defined by (2.9) can take any value on the real line, because the range of $\alpha / \beta$ is unrestricted.

Buyse and Molenberghs [8] propose two quantities to replace PE in validating a potential surrogate endpoint. The first is the “adjusted association” $\rho_A$, a measure of the association between the surrogate endpoint and the clinical endpoint after adjustment for the treatment. In terms of model (2.6), $\rho_A$ can be expressed as:

\[
\rho_A = \frac{\sigma_{sc}}{\sqrt{\sigma_{ss} \sigma_{cc}}}. \tag{2.10}
\]

The adjusted association $\rho_A$ measures how well a surrogate endpoint performs at the level of the individual patient. In the above model, if $\rho_A = 1$, then the variance of $\varepsilon_j$ in (2.8) is 0. So, $C_j$ becomes a linear function of $S_j$, which means that given the value of $S_j$, one can estimate the value of $C_j$ without error. In this case, the surrogate endpoint and the clinical endpoint contain equivalent information about the treatment, hence one can determine the treatment effect on the clinical endpoint exactly from the treatment effect on the surrogate endpoint, and the Prentice definition (1.1) is satisfied [12].
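As a hedged illustration of the quantities defined under model (2.6), the sketch below evaluates $\beta_s$, PE and $\rho_A$ from given parameter values using (2.8) to (2.10). The function name `single_trial_measures` and the numerical inputs are hypothetical, chosen only for illustration; they are not taken from any dataset discussed in this report, and in practice the parameters would be replaced by estimates.

```python
import math


def single_trial_measures(alpha, beta, sigma_ss, sigma_sc, sigma_cc):
    """Return (beta_s, PE, rho_A) implied by model (2.6).

    alpha, beta: treatment effects on the surrogate and clinical endpoints.
    sigma_ss, sigma_sc, sigma_cc: entries of the error covariance matrix (2.7).
    """
    if beta == 0:
        raise ValueError("PE is undefined when beta is 0")
    gamma = sigma_sc / sigma_ss      # coefficient of S_j in (2.8)
    beta_s = beta - gamma * alpha    # treatment effect adjusted for S, as in (2.8)
    pe = 1.0 - beta_s / beta         # proportion explained, (2.9)
    rho_a = sigma_sc / math.sqrt(sigma_ss * sigma_cc)  # adjusted association, (2.10)
    return beta_s, pe, rho_a


# Hypothetical parameter values, for illustration only.
beta_s, pe, rho_a = single_trial_measures(
    alpha=2.0, beta=1.0, sigma_ss=1.0, sigma_sc=0.8, sigma_cc=1.0
)
print(beta_s, pe, rho_a)
```

With these hypothetical inputs PE is about 1.6 while $\rho_A$ is 0.8, which is one concrete instance of PE falling outside the interval from 0 to 1.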
The second quantity Buyse and Molenberghs [8] propose is the “relative effect” (RE), which is defined as the ratio of the treatment effect on the clinical endpoint to the treatment effect on the surrogate endpoint. In terms of model (2.6), RE is defined as:

\[
RE = \frac{\beta}{\alpha}. \tag{2.11}
\]

The relative effect RE is useful in predicting the treatment effect on the clinical endpoint from that on the surrogate endpoint. In practice, $\alpha$ and $\beta$ are replaced by their estimates and a confidence interval for RE is constructed. A narrow confidence interval results in a good prediction of the treatment effect on the clinical endpoint. For example, based on the data from the current trial, one can obtain $\widehat{RE} = \hat{\beta} / \hat{\alpha}$. For a future trial, one can estimate its treatment effect on the surrogate endpoint as $\hat{\alpha}_0$. Then, the treatment effect on the clinical endpoint from that future trial can be estimated as $\hat{\beta}_0 = \hat{\alpha}_0 \cdot \widehat{RE}$. However, to make use of RE for such predictions, it is necessary to assume that the relationship (2.11) also holds in the future trial. This assumption may not be correct and cannot be checked in a single clinical trial.

2.2.3 Validation in Multiple Clinical Trials

When multiple clinical trials study the efficacy of the same treatment or treatments with a similar mechanism on the same disease, the validation procedure can use the information from these multiple trials. In this section, we review the methods used when the individual patient level data is available from each trial. In the next section, we discuss the methods used when only summary information from each trial is available.

Buyse et al. [13] consider the situation where individual patient level data is available and the surrogate endpoint and the clinical endpoint are both continuous and normally distributed. Let $S_{ij}$, $C_{ij}$ and $Z_{ij}$ denote the surrogate endpoint, the clinical endpoint and the treatment received for the jth patient from the ith trial.
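Assume the model:

\[
\begin{aligned}
S_{ij} &= \mu_s + \mu_{si} + \alpha Z_{ij} + \alpha_i Z_{ij} + \varepsilon_{sij}, \\
C_{ij} &= \mu_c + \mu_{ci} + \beta Z_{ij} + \beta_i Z_{ij} + \varepsilon_{cij},
\end{aligned} \tag{2.12}
\]

where $\mu_s$ and $\mu_c$ are fixed intercepts, $\alpha$ and $\beta$ are the fixed effects of treatment on the surrogate endpoint and the clinical endpoint, $\mu_{si}$ and $\mu_{ci}$ are random intercepts and $\alpha_i$ and $\beta_i$ are the random effects of treatment on the endpoints in trial i. The error terms $\varepsilon_{sij}$ and $\varepsilon_{cij}$ are assumed to follow the joint normal distribution with mean 0 and variance-covariance matrix given by (2.7), and the random effects $(\mu_{si}, \mu_{ci}, \alpha_i, \beta_i)^T$ are assumed to follow a joint normal distribution with mean 0 and variance-covariance matrix D given by:

\[
D = \begin{pmatrix}
d_{ss} & d_{sc} & d_{s\alpha} & d_{s\beta} \\
d_{sc} & d_{cc} & d_{c\alpha} & d_{c\beta} \\
d_{s\alpha} & d_{c\alpha} & d_{\alpha\alpha} & d_{\alpha\beta} \\
d_{s\beta} & d_{c\beta} & d_{\alpha\beta} & d_{\beta\beta}
\end{pmatrix}. \tag{2.13}
\]

Buyse et al. [13] suggest evaluating the surrogate endpoint at two different levels: the trial level and the individual patient level.

At the trial level, the surrogacy relationship is assessed by the conditional variance of $\beta + \beta_i$ given $\mu_{si}$ and $\alpha_i$. From (2.12) and (2.13), the conditional variance is given by:

\[
\mathrm{Var}(\beta + \beta_i \mid \mu_{si}, \alpha_i) = d_{\beta\beta} -
\begin{pmatrix} d_{s\beta} \\ d_{\alpha\beta} \end{pmatrix}^T
\begin{pmatrix} d_{ss} & d_{s\alpha} \\ d_{s\alpha} & d_{\alpha\alpha} \end{pmatrix}^{-1}
\begin{pmatrix} d_{s\beta} \\ d_{\alpha\beta} \end{pmatrix}. \tag{2.14}
\]

This conditional variance describes how precisely one can predict the treatment effect on the clinical outcome given the treatment effect on the surrogate outcome in a certain trial. Equivalently, a proportion type measure of “trial level” surrogacy is defined as:

\[
R^2_{\text{trial}} = \frac{
\begin{pmatrix} d_{s\beta} \\ d_{\alpha\beta} \end{pmatrix}^T
\begin{pmatrix} d_{ss} & d_{s\alpha} \\ d_{s\alpha} & d_{\alpha\alpha} \end{pmatrix}^{-1}
\begin{pmatrix} d_{s\beta} \\ d_{\alpha\beta} \end{pmatrix}
}{d_{\beta\beta}}. \tag{2.15}
\]

Moreover, one can quantify the relationship between the treatment effects on the surrogate endpoint and on the clinical endpoint by using the conditional expectation of $\beta + \beta_i$ given $\mu_{si}$ and $\alpha_i$, which is:

\[
E(\beta + \beta_i \mid \mu_{si}, \alpha_i) = \beta +
\begin{pmatrix} d_{s\beta} \\ d_{\alpha\beta} \end{pmatrix}^T
\begin{pmatrix} d_{ss} & d_{s\alpha} \\ d_{s\alpha} & d_{\alpha\alpha} \end{pmatrix}^{-1}
\begin{pmatrix} \mu_{si} \\ \alpha_i \end{pmatrix}. \tag{2.16}
\]

Equation (2.16) characterizes how the treatment effect on the clinical endpoint changes with the treatment effect on the surrogate endpoint.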
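The trial-level quantities (2.14) to (2.16) can be sketched numerically as follows. This is a minimal illustration that assumes the relevant entries of D are known; the function names (`solve_2x2`, `trial_level_surrogacy`, `predict_effect`) and all numerical values are hypothetical and are not taken from Buyse et al. [13] or from any dataset in this report.

```python
def solve_2x2(a, b, c, d, u, v):
    """Solve [[a, b], [c, d]] (x, y)^T = (u, v)^T by Cramer's rule."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance block is singular")
    return (d * u - b * v) / det, (a * v - c * u) / det


def trial_level_surrogacy(d_ss, d_sa, d_aa, d_sb, d_ab, d_bb):
    """Return the conditional variance (2.14) and R^2_trial (2.15).

    Arguments are entries of D in (2.13), with 'a' standing for alpha
    and 'b' for beta (e.g. d_sa is d_{s alpha}).
    """
    # w = [[d_ss, d_sa], [d_sa, d_aa]]^{-1} (d_sb, d_ab)^T
    w1, w2 = solve_2x2(d_ss, d_sa, d_sa, d_aa, d_sb, d_ab)
    explained = d_sb * w1 + d_ab * w2
    return d_bb - explained, explained / d_bb


def predict_effect(beta, d_ss, d_sa, d_aa, d_sb, d_ab, mu_si, alpha_i):
    """Return the conditional expectation (2.16) given (mu_si, alpha_i)."""
    w1, w2 = solve_2x2(d_ss, d_sa, d_sa, d_aa, d_sb, d_ab)
    return beta + w1 * mu_si + w2 * alpha_i


# Hypothetical entries of D, for illustration only.
cond_var, r2_trial = trial_level_surrogacy(
    d_ss=1.0, d_sa=0.0, d_aa=4.0, d_sb=0.5, d_ab=1.6, d_bb=1.0
)
print(cond_var, r2_trial)
```

Because the 2 by 2 block is symmetric, the weight vector computed once by `solve_2x2` serves both the quadratic form in (2.14) and (2.15) and the linear predictor in (2.16).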
Given a new trial, after estimating the surrogate-endpoint quantities $\hat{\mu}_{si}$ and $\hat{\alpha}_i$, one can predict the expected treatment effect on the clinical endpoint through (2.16). Note that if we only have one trial, then we are not able to characterize this relationship.

At the individual patient level, the surrogacy relationship is evaluated using the adjusted association $\rho_A$ used in the single trial situation. The conditional variance of $C_{ij}$ given $S_{ij}$ and the random effects is $\sigma_{cc} - \sigma_{sc}^2 \sigma_{ss}^{-1}$. Thus, a proportion type measure of “individual level” surrogacy is defined as:

\[
R^2_{\text{ind}} = \rho_A^2 = \frac{\sigma_{sc}^2}{\sigma_{ss} \sigma_{cc}}. \tag{2.17}
\]

A surrogate endpoint is considered to be perfect when both $R^2_{\text{trial}}$ and $R^2_{\text{ind}}$ are equal to 1. Large values of $R^2_{\text{trial}}$ imply precise prediction of the treatment effect on the clinical endpoint, while large values of $R^2_{\text{ind}}$ imply strong association between the surrogate endpoint and the clinical endpoint, which is useful in patient management. It is possible that $R^2_{\text{trial}}$ is large and $R^2_{\text{ind}}$ is small, or vice versa.

2.3 Validation in Multiple Clinical Trials with Individual Data Unavailable

In some contexts, only summary data of each trial, not the individual patient data, is available. For example, only results about the estimated treatment effect on the endpoints and the corresponding estimated standard errors may be available, not the outcomes of each patient. Then, the surrogacy relationship can only be evaluated at the trial level. Since we don’t know the outcomes of each patient, we cannot evaluate the strength of the association between the surrogate endpoint and the clinical endpoint at the individual patient level (e.g., calculate $R^2_{\text{ind}}$ in (2.17)). However, we are still able to assess the relationship between the treatment effect on the clinical endpoint and on the surrogate endpoint.
When only summary results from each trial are available, caution must be taken in the validation procedure because these summary results are only “estimates”, which are different from the “true” quantities. For example, an estimated treatment effect on the endpoint from one trial is different from the true treatment effect on the endpoint from this trial. The true treatment effect is the effect that would be obtained if the clinical trial included an infinite number of patients. In practice, due to the limited number of patients, there are always non-negligible estimation errors between the estimated and the true effects. Modelling these estimation errors appropriately is important in assessing surrogacy relationships at the trial level.

In the following subsections, we review papers by Daniels and Hughes [2] (DH, hereafter) and Korn et al. [3] (KAM, hereafter), in which models are constructed to evaluate surrogacy relationships in multiple clinical trials for the situation when individual patient level data are unavailable.

2.3.1 Review of Daniels and Hughes [2]

Suppose $N$ trials are used to analyze the performance of the surrogate endpoint of interest. In the $i$th trial, denote the true treatment effects on the surrogate endpoint and on the clinical endpoint as $X_i^{\mathrm{true}}$ and $Y_i^{\mathrm{true}}$ respectively. Correspondingly, let $X_i$ and $Y_i$ denote their estimates, i.e. the summary results obtained from the $i$th trial. Generally, unless the number of patients in the $i$th trial is very large, $X_i$ and $Y_i$ are different from $X_i^{\mathrm{true}}$ and $Y_i^{\mathrm{true}}$. Given the $i$th trial, $X_i$ is assumed to be normally distributed with mean $X_i^{\mathrm{true}}$ and variance $\delta_i^2$, and $Y_i$ is assumed to be normally distributed with mean $Y_i^{\mathrm{true}}$ and variance $\sigma_i^2$. Furthermore, the correlation between $X_i$ and $Y_i$ is assumed to be $\rho_i$. Here, $\delta_i^2$ and $\sigma_i^2$ represent the effect of estimation error in the $i$th trial, and $\rho_i$ represents the correlation between the estimation errors on $X_i^{\mathrm{true}}$ and $Y_i^{\mathrm{true}}$.
In mathematical form:

\[
\begin{pmatrix} Y_i \\ X_i \end{pmatrix} \Bigg|
\begin{pmatrix} Y_i^{\mathrm{true}} \\ X_i^{\mathrm{true}} \end{pmatrix}
\sim N_2\left(
\begin{pmatrix} Y_i^{\mathrm{true}} \\ X_i^{\mathrm{true}} \end{pmatrix},
\begin{pmatrix} \sigma_i^2 & \rho_i\sigma_i\delta_i \\ \rho_i\sigma_i\delta_i & \delta_i^2 \end{pmatrix}
\right).
\tag{2.18}
\]

The surrogacy relationship of interest is the relationship between the true treatment effects $X_i^{\mathrm{true}}$ and $Y_i^{\mathrm{true}}$. DH assume the following structure:

\[
Y_i^{\mathrm{true}} \mid X_i^{\mathrm{true}} \sim N(\alpha + \beta X_i^{\mathrm{true}}, \tau^2).
\tag{2.19}
\]

Here, $\beta$ measures the association between the true treatment effects on the clinical and the surrogate endpoint. If $\beta = 0$, then there is actually no such surrogacy relationship. When $\beta \neq 0$, a perfect surrogacy relationship also requires that $\alpha = 0$, so that a treatment having no effect on the surrogate endpoint suggests no effect on the clinical endpoint. Having $\alpha \neq 0$ implies that there is a treatment effect on the clinical endpoint unexplained by the surrogate endpoint. The variance $\tau^2$ represents the uncertainty of using $X_i^{\mathrm{true}}$ to predict $Y_i^{\mathrm{true}}$. If $\tau^2 = 0$, then $Y_i^{\mathrm{true}}$ is perfectly determined when $X_i^{\mathrm{true}}$ is given.

At this stage, DH assume the $X_i^{\mathrm{true}}$s are fixed quantities. They treat the $X_i^{\mathrm{true}}$s as fixed rather than random to avoid having to propose specific distributions for them, which they think may not be appropriate. (Later, though, they put very flat prior distributions on the $X_i^{\mathrm{true}}$s when estimating the model parameters in the Bayesian framework.) Combining (2.18) and (2.19), we obtain the bivariate normal model for $Y_i$ and $X_i$:

\[
\begin{pmatrix} Y_i \\ X_i \end{pmatrix} \Bigg| X_i^{\mathrm{true}}
\sim N_2\left(
\begin{pmatrix} \alpha + \beta X_i^{\mathrm{true}} \\ X_i^{\mathrm{true}} \end{pmatrix},
\begin{pmatrix} \sigma_i^2 + \tau^2 & \rho_i\sigma_i\delta_i \\ \rho_i\sigma_i\delta_i & \delta_i^2 \end{pmatrix}
\right).
\tag{2.20}
\]

In some clinical trials, there may be more than one active arm in addition to the control arm. A common situation is that different patients receive different dosage levels of a treatment. For example, if a treatment is applied at 2 dosage levels, then the clinical trial consists of 3 arms. Patients on the first arm receive treatment at dosage level one, patients on the second arm receive treatment at dosage level two, and patients on the third arm receive control.
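Since the combination of any active arm and a control arm yields a contrast, this clinical trial consists of 2 contrasts. From a clinical trial with multiple contrasts, we obtain multiple estimated treatment effects on both endpoints.

Suppose there are 3 arms in the $i$th trial, which generate 2 contrasts. Let $Y_{i1}$ and $X_{i1}$ be the estimated treatment effects on the clinical and surrogate endpoints from the first contrast, and $Y_{i2}$ and $X_{i2}$ be those from the second contrast. Correspondingly, let $Y_{i1}^{\mathrm{true}}$, $X_{i1}^{\mathrm{true}}$, $Y_{i2}^{\mathrm{true}}$ and $X_{i2}^{\mathrm{true}}$ be the true treatment effects. Then model (2.18) can be generalized to:

\[
\begin{pmatrix} Y_{i1} \\ X_{i1} \\ Y_{i2} \\ X_{i2} \end{pmatrix} \Bigg|
\begin{pmatrix} Y_{i1}^{\mathrm{true}} \\ X_{i1}^{\mathrm{true}} \\ Y_{i2}^{\mathrm{true}} \\ X_{i2}^{\mathrm{true}} \end{pmatrix}
\sim N_4\left(
\begin{pmatrix} Y_{i1}^{\mathrm{true}} \\ X_{i1}^{\mathrm{true}} \\ Y_{i2}^{\mathrm{true}} \\ X_{i2}^{\mathrm{true}} \end{pmatrix},
\begin{pmatrix}
\sigma_{i1}^2 & \rho_{i11}\sigma_{i1}\delta_{i1} & \rho_{iy}\sigma_{i1}\sigma_{i2} & \rho_{i12}\sigma_{i1}\delta_{i2} \\
\rho_{i11}\sigma_{i1}\delta_{i1} & \delta_{i1}^2 & \rho_{i21}\delta_{i1}\sigma_{i2} & \rho_{ix}\delta_{i1}\delta_{i2} \\
\rho_{iy}\sigma_{i1}\sigma_{i2} & \rho_{i21}\delta_{i1}\sigma_{i2} & \sigma_{i2}^2 & \rho_{i22}\sigma_{i2}\delta_{i2} \\
\rho_{i12}\sigma_{i1}\delta_{i2} & \rho_{ix}\delta_{i1}\delta_{i2} & \rho_{i22}\sigma_{i2}\delta_{i2} & \delta_{i2}^2
\end{pmatrix}
\right).
\tag{2.21}
\]

The off-diagonal blocks of covariance terms in (2.21) are allowed to be non-zero, reflecting the possibility of correlations between the two pairs of estimated treatment effects, which arise because both pairs involve comparisons to the same control arm. Also, assuming $X_{i1}^{\mathrm{true}}$ and $X_{i2}^{\mathrm{true}}$ are fixed, (2.19) is generalized (M. J. Daniels, personal communication) as:

\[
\begin{pmatrix} Y_{i1}^{\mathrm{true}} \\ Y_{i2}^{\mathrm{true}} \end{pmatrix} \Bigg|
\begin{pmatrix} X_{i1}^{\mathrm{true}} \\ X_{i2}^{\mathrm{true}} \end{pmatrix}
\sim N_2\left(
\begin{pmatrix} \alpha + \beta X_{i1}^{\mathrm{true}} \\ \alpha + \beta X_{i2}^{\mathrm{true}} \end{pmatrix},
\begin{pmatrix} \tau^2 & 0 \\ 0 & \tau^2 \end{pmatrix}
\right).
\tag{2.22}
\]

From (2.22), we can see that the marginal distributions of $Y_{i1}^{\mathrm{true}}$ and $Y_{i2}^{\mathrm{true}}$ have the same form. This is because all the treatments included in the analysis have similar mechanisms of action; whether two contrasts are from one trial or from different trials, they should reflect the same surrogacy relationship. DH assume the covariance between $Y_{i1}^{\mathrm{true}}$ and $Y_{i2}^{\mathrm{true}}$ is 0 in (2.22). In principle, this covariance could be non-zero, and (2.22) could be modified by replacing the 0 with a non-zero parameter.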
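To make the roles of (2.21) and (2.22) concrete, the sketch below assembles the covariance of $(Y_{i1}, X_{i1}, Y_{i2}, X_{i2})$ given fixed $X_{i1}^{\mathrm{true}}$ and $X_{i2}^{\mathrm{true}}$ by the law of total covariance: the within-trial covariance from (2.21) plus the covariance of the conditional means implied by (2.22). The function name, the argument `cov_true` (a hypothetical parameter for a possibly non-zero covariance between $Y_{i1}^{\mathrm{true}}$ and $Y_{i2}^{\mathrm{true}}$) and all numerical inputs are assumptions made for illustration.

```python
def dh_marginal_cov(sigma, delta, rho_yx, rho_y, rho_x, tau2, cov_true=0.0):
    """Covariance of (Y_i1, X_i1, Y_i2, X_i2) given fixed true X effects.

    sigma, delta : (sigma_i1, sigma_i2) and (delta_i1, delta_i2)
    rho_yx[j][k] : correlation between Y_i(j+1) and X_i(k+1), i.e. rho_{i,j+1,k+1}
    rho_y, rho_x : rho_iy and rho_ix in (2.21)
    cov_true     : Cov(Y_i1^true, Y_i2^true); 0 corresponds to (2.22)
    """
    s1, s2 = sigma
    d1, d2 = delta
    r11, r12 = rho_yx[0]
    r21, r22 = rho_yx[1]
    # Within-trial (estimation error) covariance, as in (2.21).
    within = [
        [s1 * s1, r11 * s1 * d1, rho_y * s1 * s2, r12 * s1 * d2],
        [r11 * s1 * d1, d1 * d1, r21 * d1 * s2, rho_x * d1 * d2],
        [rho_y * s1 * s2, r21 * d1 * s2, s2 * s2, r22 * s2 * d2],
        [r12 * s1 * d2, rho_x * d1 * d2, r22 * s2 * d2, d2 * d2],
    ]
    # Covariance of the conditional means: only the Y components vary.
    between = [
        [tau2, 0.0, cov_true, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [cov_true, 0.0, tau2, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    return [[within[r][c] + between[r][c] for c in range(4)] for r in range(4)]


# Hypothetical standard errors and correlations (assumed values).
cov = dh_marginal_cov(
    sigma=(0.2, 0.25), delta=(0.3, 0.35),
    rho_yx=((0.1, 0.05), (0.05, 0.1)), rho_y=0.5, rho_x=0.5, tau2=0.01,
)
```

With `cov_true=0.0`, $\tau^2$ enters only the two diagonal entries for $Y_{i1}$ and $Y_{i2}$; a non-zero `cov_true` would also change the covariance between $Y_{i1}$ and $Y_{i2}$.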
Combining (2.21) and (2.22), we get:

\[
\begin{pmatrix} Y_{i1} \\ X_{i1} \\ Y_{i2} \\ X_{i2} \end{pmatrix} \Bigg|
\begin{pmatrix} X_{i1}^{\mathrm{true}} \\ X_{i2}^{\mathrm{true}} \end{pmatrix}
\sim N_4\left(
\begin{pmatrix} \alpha + \beta X_{i1}^{\mathrm{true}} \\ X_{i1}^{\mathrm{true}} \\ \alpha + \beta X_{i2}^{\mathrm{true}} \\ X_{i2}^{\mathrm{true}} \end{pmatrix},
\begin{pmatrix}
\sigma_{i1}^2 + \tau^2 & \rho_{i11}\sigma_{i1}\delta_{i1} & \rho_{iy}\sigma_{i1}\sigma_{i2} & \rho_{i12}\sigma_{i1}\delta_{i2} \\
\rho_{i11}\sigma_{i1}\delta_{i1} & \delta_{i1}^2 & \rho_{i21}\delta_{i1}\sigma_{i2} & \rho_{ix}\delta_{i1}\delta_{i2} \\
\rho_{iy}\sigma_{i1}\sigma_{i2} & \rho_{i21}\delta_{i1}\sigma_{i2} & \sigma_{i2}^2 + \tau^2 & \rho_{i22}\sigma_{i2}\delta_{i2} \\
\rho_{i12}\sigma_{i1}\delta_{i2} & \rho_{ix}\delta_{i1}\delta_{i2} & \rho_{i22}\sigma_{i2}\delta_{i2} & \delta_{i2}^2
\end{pmatrix}
\right).
\tag{2.23}
\]

Because the covariance between $Y_{i1}^{\mathrm{true}}$ and $Y_{i2}^{\mathrm{true}}$ is 0 in (2.22), $\tau^2$ enters only the two diagonal entries for $Y_{i1}$ and $Y_{i2}$. For a clinical trial with 3 or more contrasts, a similar extension can be applied.

DH assume a joint normal structure for the summary results (estimated treatment effects) from each trial included in the study. To estimate the model parameters, they adopt a Bayesian approach. In the estimation procedure, all the within trial variances and correlations are assumed known and replaced by their estimates. The variance estimates for each trial are obtained from the summary results of that trial. For the correlation estimates, if the individual patient level data for a trial are available, the correlation estimates for this trial are calculated from the individual patient data. Otherwise, the correlation estimates for that trial are set to the average value of the correlation estimates from the trials for which individual patient level data are available. In the Bayesian procedure, priors are then placed on $\alpha$, $\beta$, $\tau^2$ and all the true treatment effects on the surrogate endpoint (i.e. $X_i^{\mathrm{true}}$ in single contrast trials and the $X_{ij}^{\mathrm{true}}$s in multiple contrast trials).

To assess the surrogacy relationship, DH propose examining whether the 95% credible intervals for $\alpha$, $\beta$ and $\tau^2$ exclude 0. DH also suggest computing Bayes factors [14] to test whether $\alpha$, $\beta$ and $\tau^2$ are 0. If the tests reject the null hypothesis $\beta = 0$ and do not reject the null hypotheses $\alpha = 0$ and $\tau^2 = 0$, then the surrogacy relationship is considered to be validated.

2.3.2 Review of Korn et al. [3]

KAM discuss different models to assess the surrogacy relationship for two different types of clinical trials. One type of trial involves unordered treatment arms (i.e.
there is no control arm in the trial), and the other type of trial involves ordered treatment arms (i.e. there is one control arm in the trial). Since the dataset we will use in the next chapter consists of only ordered trials, and also to make KAM’s model comparable with DH’s model, we only discuss their model for ordered trials.

In contrast to DH, KAM start their model at the arm level instead of at the contrast level. In the $i$th clinical trial, let $C_{ij}$ and $S_{ij}$ be the observed clinical endpoint and the observed surrogate endpoint from the $j$th arm, where $j = 0, 1, 2, \ldots$ ($j = 0$ represents the control arm in the trial). Similarly, let $C_{ij}^{\mathrm{true}}$ and $S_{ij}^{\mathrm{true}}$ be the true clinical and surrogate endpoints. KAM’s model begins by describing the estimation errors in estimating the endpoints. Let $e_{ij}$ and $f_{ij}$ denote the estimation errors in the surrogate endpoint and the clinical endpoint respectively. Then:

\[
\begin{aligned}
S_{ij} &= S_{ij}^{\mathrm{true}} + e_{ij}, \\
C_{ij} &= C_{ij}^{\mathrm{true}} + f_{ij}.
\end{aligned}
\tag{2.24}
\]

Since the estimation errors arise in different arms with different patients, they are assumed to be independent across arms. KAM further assume that $e_{ij} \sim N(0, \delta_{ij}^2)$ and $f_{ij} \sim N(0, \sigma_{ij}^2)$, and that $e_{ij}$ and $f_{ij}$ are independent. (As in the DH model, $\delta^2$ denotes an estimation error variance for the surrogate endpoint and $\sigma^2$ one for the clinical endpoint.)

As a next step, KAM model $S_{ij}^{\mathrm{true}}$ and $C_{ij}^{\mathrm{true}}$. Let $\mu_i$ represent the expected level of the true surrogate endpoint on the control arm, and $\mu_S$ represent the expected difference in the true surrogate endpoint between the active and control arms. Let $m_{ij}$ be the random effect representing the uncertainty in the true surrogate endpoint for each arm. KAM express $S_{i0}^{\mathrm{true}}$ and $S_{ij}^{\mathrm{true}}$ as:

\[
S_{i0}^{\mathrm{true}} = \mu_i + m_{i0}
\quad\text{and}\quad
S_{ij}^{\mathrm{true}} = \mu_i + \mu_S + m_{ij}, \text{ for } j \neq 0.
\tag{2.25}
\]

KAM assume $m_{i0} \sim N(0, \lambda_0^2)$, $m_{ij} \sim N(0, \lambda^2)$ ($j \neq 0$), and that all the $m_{ij}$s are independent. Note that although the $m_{ij}$s ($j \neq 0$) are the random effects for different active arms, they are assumed to have the same distribution. Similarly, all the $m_{i0}$s are assumed to have the same distribution.
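Furthermore, $m_{ij}$ is assumed to be independent of $e_{ij}$ and $f_{ij}$, which means the estimation errors are not affected by the value of the true endpoints.

KAM assume there is a linear relationship between $C_{ij}^{\mathrm{true}}$ and $S_{ij}^{\mathrm{true}}$, specifically:

\[
C_{i0}^{\mathrm{true}} = \alpha_i + \beta S_{i0}^{\mathrm{true}} + g_{i0}
\quad\text{and}\quad
C_{ij}^{\mathrm{true}} = \alpha_i + \mu_C + \beta S_{ij}^{\mathrm{true}} + g_{ij}, \text{ for } j \neq 0,
\tag{2.26}
\]

where $\beta$ represents the linear relationship between $C_{ij}^{\mathrm{true}}$ and $S_{ij}^{\mathrm{true}}$, and $\alpha_i$ and $\alpha_i + \mu_C$ are the intercepts in the control arms and the active arms respectively. Here, $\mu_C$ represents the expected difference in the clinical endpoint between the active and control arms that cannot be explained by the influence of the true surrogate endpoint on the true clinical endpoint. The random effects $g_{ij}$ account for the fact that $C_{ij}^{\mathrm{true}}$ and $S_{ij}^{\mathrm{true}}$ are not perfectly linearly related, and are assumed to be independent and normally distributed with mean 0 and variance $\tau^2/2$. Note that all the $g_{ij}$s are assumed to have the same distribution although they are from different arms. Since the $g_{ij}$s are not estimation errors, $g_{ij}$, $e_{ij}$ and $f_{ij}$ are assumed to be independent.

The treatment effect is estimated as the difference between the endpoints from the active arm and from the control arm. Let $X_{ij} = S_{ij} - S_{i0}$ and $Y_{ij} = C_{ij} - C_{i0}$ denote the estimated treatment effects on the surrogate and on the clinical endpoints respectively ($j \neq 0$). Correspondingly, let $X_{ij}^{\mathrm{true}} = S_{ij}^{\mathrm{true}} - S_{i0}^{\mathrm{true}}$ and $Y_{ij}^{\mathrm{true}} = C_{ij}^{\mathrm{true}} - C_{i0}^{\mathrm{true}}$ denote the true treatment effects. From (2.24), (2.25) and (2.26), we have:

\[
\begin{aligned}
X_{ij} &= X_{ij}^{\mathrm{true}} + (e_{ij} - e_{i0}), \\
Y_{ij} &= Y_{ij}^{\mathrm{true}} + (f_{ij} - f_{i0}),
\end{aligned}
\quad\text{where}\quad
\begin{aligned}
X_{ij}^{\mathrm{true}} &= \mu_S + (m_{ij} - m_{i0}), \\
Y_{ij}^{\mathrm{true}} &= \mu_C + \beta X_{ij}^{\mathrm{true}} + (g_{ij} - g_{i0}).
\end{aligned}
\tag{2.27}
\]

From (2.27), we obtain:

\[
\begin{aligned}
E(Y_{ij}^{\mathrm{true}} \mid X_{ij}^{\mathrm{true}}) &= \mu_C + \beta X_{ij}^{\mathrm{true}}, \\
\mathrm{Var}(Y_{ij}^{\mathrm{true}} \mid X_{ij}^{\mathrm{true}}) &= \tau^2,
\end{aligned}
\tag{2.28}
\]

which describes the surrogacy relationship between the true treatment effects on the clinical endpoint and on the surrogate endpoint. We now see that the interpretations of $\mu_C$, $\beta$ and $\tau^2$ in the KAM model are the same as the interpretations of $\alpha$, $\beta$ and $\tau^2$ in the DH model. As before, $\beta$ measures the association between the true treatment effects on the clinical endpoint and on the surrogate endpoint.

For a trial with a single contrast, from (2.27) we can obtain the joint distribution of the estimated treatment effects:

\[
\begin{pmatrix} Y_{i1} \\ X_{i1} \end{pmatrix}
\sim N_2\left(
\begin{pmatrix} \mu_C + \beta\mu_S \\ \mu_S \end{pmatrix},
\begin{pmatrix}
\beta^2(\lambda_0^2 + \lambda^2) + \tau^2 + (\sigma_{i1}^2 + \sigma_{i0}^2) & \beta(\lambda_0^2 + \lambda^2) \\
\beta(\lambda_0^2 + \lambda^2) & (\lambda_0^2 + \lambda^2) + (\delta_{i1}^2 + \delta_{i0}^2)
\end{pmatrix}
\right).
\tag{2.29}
\]

For a trial with 2 contrasts, a similar calculation based on (2.27) gives:

\[
(Y_{i1}, X_{i1}, Y_{i2}, X_{i2})^T \sim N_4\left(
\begin{pmatrix} \mu \\ \mu \end{pmatrix},
\begin{pmatrix} \Sigma_{1i} & \Sigma_{3i} \\ \Sigma_{3i} & \Sigma_{2i} \end{pmatrix}
\right),
\tag{2.30}
\]

where

\[
\mu = \begin{pmatrix} \mu_C + \beta\mu_S \\ \mu_S \end{pmatrix}
\]

and

\[
\begin{aligned}
\Sigma_{1i} &= \begin{pmatrix}
\beta^2(\lambda_0^2 + \lambda^2) + \tau^2 + (\sigma_{i1}^2 + \sigma_{i0}^2) & \beta(\lambda_0^2 + \lambda^2) \\
\beta(\lambda_0^2 + \lambda^2) & (\lambda_0^2 + \lambda^2) + (\delta_{i1}^2 + \delta_{i0}^2)
\end{pmatrix}, \\
\Sigma_{2i} &= \begin{pmatrix}
\beta^2(\lambda_0^2 + \lambda^2) + \tau^2 + (\sigma_{i2}^2 + \sigma_{i0}^2) & \beta(\lambda_0^2 + \lambda^2) \\
\beta(\lambda_0^2 + \lambda^2) & (\lambda_0^2 + \lambda^2) + (\delta_{i2}^2 + \delta_{i0}^2)
\end{pmatrix}, \\
\Sigma_{3i} &= \begin{pmatrix}
\beta^2\lambda_0^2 + \frac{\tau^2}{2} + \sigma_{i0}^2 & \beta\lambda_0^2 \\
\beta\lambda_0^2 & \lambda_0^2 + \delta_{i0}^2
\end{pmatrix}.
\end{aligned}
\]

The off-diagonal block $\Sigma_{3i}$ contains only the terms shared by the two contrasts, namely those coming from the common control arm ($m_{i0}$, $g_{i0}$, $e_{i0}$ and $f_{i0}$). For a trial with 3 or more contrasts, a similar extension can be applied.

When fitting their model, KAM use the maximum likelihood estimators obtained from the joint normal distributions (2.29) and (2.30). The estimation error variances $\sigma_{ij}^2$ and $\delta_{ij}^2$ are assumed known and replaced by their estimates when fitting the model.

To assess the surrogacy relationship, in addition to evaluating the estimates and the confidence intervals for $\mu_C$, $\beta$ and $\tau^2$, KAM suggest that one can use an $R^2$-type measure. From (2.27), we know that $\mathrm{Var}(Y_{ij}^{\mathrm{true}}) = \beta^2(\lambda_0^2 + \lambda^2) + \tau^2$ and, from (2.28), that $\mathrm{Var}(Y_{ij}^{\mathrm{true}} \mid X_{ij}^{\mathrm{true}}) = \tau^2$. So, the $R^2$-type measure is defined as:

\[
R^2_{\mathrm{trial}} = \frac{\beta^2(\lambda_0^2 + \lambda^2)}{\beta^2(\lambda_0^2 + \lambda^2) + \tau^2}.
\tag{2.31}
\]

This quantity is analogous to $R^2_{\mathrm{trial}}$ in (2.15). Large values of $R^2_{\mathrm{trial}}$ indicate a good surrogacy relationship.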
Furthermore, to evaluate how a surrogate endpoint performs in practice, KAM suggest estimating the parameter $E(Y_{ij}^{\mathrm{true}} \mid X_{ij})$, which is useful in predicting the true treatment effect on the clinical endpoint given the estimated treatment effect on the surrogate endpoint. This parameter is analogous to $E(\beta + \beta_i \mid \mu_{si}, \alpha_i)$ in (2.16). However, KAM suggest conditioning $Y_{ij}^{\mathrm{true}}$ on $X_{ij}$, rather than on $X_{ij}^{\mathrm{true}}$. From (2.27), the parameter of interest is:

\[
\Delta = E(Y_{ij}^{\mathrm{true}} \mid X_{ij}) = (\mu_C + \beta\mu_S) +
\left( \frac{\beta(\lambda_0^2 + \lambda^2)}{(\lambda_0^2 + \lambda^2) + (\delta_{ij}^2 + \delta_{i0}^2)} \right)(X_{ij} - \mu_S).
\tag{2.32}
\]

To estimate $\Delta$, KAM plug in the estimates for $(\beta, \mu_S, \mu_C, \lambda_0^2, \lambda^2)$ and the observed value of $X_{ij}$ from a new trial, and replace $\delta_{ij}^2$ and $\delta_{i0}^2$ by their estimates from that trial.

2.3.3 Comparison of These Two Approaches

The first difference between DH and KAM is that their models start from different levels. DH start directly from the treatment effects (contrast level, since treatment effects are obtained from contrasts), where they build the model for $(Y_i, X_i)$ given $(Y_i^{\mathrm{true}}, X_i^{\mathrm{true}})$ and for $Y_i^{\mathrm{true}}$ given $X_i^{\mathrm{true}}$. In contrast, KAM start from the endpoints (arm level, since the endpoint values are obtained from the arms), where they first specify the joint distribution for $(S_{ij}, C_{ij}, S_{ij}^{\mathrm{true}}, C_{ij}^{\mathrm{true}})$, and then take differences to obtain the joint distribution for $Y_{ij}$ and $X_{ij}$. Building the model from the arm level requires a more detailed specification. However, in (2.26), KAM assume the same coefficient $\beta$ for control arms and active arms, which implies that the relationship between the true surrogate endpoint and the true clinical endpoint is the same regardless of the arm. This assumption may not be very realistic in situations where a treatment substantially influences the association between the two endpoints; it may then be more reasonable to assume different $\beta$s for control and active arms.
In contrast, DH do not make assumptions about the relationship between the endpoints but model the surrogacy relationship directly in (2.19). We think the DH approach is more reasonable from this perspective.

Both papers deal with the estimation errors in the same way, in the sense that the estimation errors are assumed to be independent of the true treatment effects. In (2.18), DH assume $\sigma_i^2$ and $\delta_i^2$, the within trial estimation error variances, do not depend on $Y_i^{\mathrm{true}}$ and $X_i^{\mathrm{true}}$. This means the estimation errors on the treatment effects are not affected by the true treatment effects. Similarly, in (2.25), KAM assume the $m_{ij}$ are independent of $e_{ij}$ and $f_{ij}$, which means the estimation errors on the endpoints are not affected by the true endpoints. This assumption implies that $(m_{ij} - m_{i0})$ is independent of $(e_{ij} - e_{i0})$ and $(f_{ij} - f_{i0})$, which also means the estimation errors on the treatment effects are not affected by the true treatment effects. However, it is possible that a large true treatment effect is associated with a large estimation error, while a small treatment effect is associated with a small estimation error. Thus, this independence assumption may not hold in some clinical trials.

To compare how these models differ in characterizing the treatment effects, we can compare the joint distributions for the treatment effects. For example, we can compare (2.20) with (2.29). Alternatively, from (2.27), we obtain:

\[
\begin{pmatrix} Y_{i1} \\ X_{i1} \end{pmatrix} \Bigg|
\begin{pmatrix} Y_{i1}^{\mathrm{true}} \\ X_{i1}^{\mathrm{true}} \end{pmatrix}
\sim N_2\left(
\begin{pmatrix} Y_{i1}^{\mathrm{true}} \\ X_{i1}^{\mathrm{true}} \end{pmatrix},
\begin{pmatrix} \sigma_{i1}^2 + \sigma_{i0}^2 & 0 \\ 0 & \delta_{i1}^2 + \delta_{i0}^2 \end{pmatrix}
\right),
\tag{2.33}
\]

and

\[
Y_{i1}^{\mathrm{true}} \mid X_{i1}^{\mathrm{true}} \sim N(\mu_C + \beta X_{i1}^{\mathrm{true}}, \tau^2).
\tag{2.34}
\]

Comparing (2.33) and (2.34) with (2.18) and (2.19), it is evident that the DH model and the KAM model are essentially the same. One difference is that $X_{i1}^{\mathrm{true}}$ follows a normal distribution with mean $\mu_S$ and variance $\lambda_0^2 + \lambda^2$ in the KAM model, while DH treat $X_i^{\mathrm{true}}$ as fixed when specifying their model but then give it a prior distribution when carrying out the estimation.
The prior is chosen to be normal with mean 0 and a very large variance, meaning it is “non-informative”. Besides this, the conditional covariance in (2.33) is 0, while the conditional covariance in (2.18) is allowed to be non-zero. This is because KAM assume the within trial estimation errors $e_{ij}$ and $f_{ij}$ are independent, whereas DH allow a correlation $\rho_i$. It is likely that the two estimation errors are correlated in general. However, without individual patient level data, it is difficult to estimate this correlation.

In the following chapters, we will discuss validation of the surrogate endpoint in the MS context. Our dataset consists of multiple clinical trials, and only summary results from these trials are available. We will discuss two approaches to validate the surrogate endpoint of interest: the SBRCMB approach and a more comprehensive approach. The comprehensive approach is similar in spirit to the DH and KAM models.

Chapter 3

Lesion Counts as a Surrogate Endpoint in RRMS: the SBRCMB Approach

3.1 Introduction and the SBRCMB Dataset

MRI measures of brain lesion counts in RRMS patients have recently been widely used in clinical trials as a potential surrogate endpoint. One important clinical endpoint in RRMS clinical trials is the annualized relapse rate. A relapse is defined as the appearance of a new symptom or the worsening of an existing symptom, attributable to MS and accompanied by an appropriate new neurologic abnormality. However, the surrogacy relationship between such MRI measures and this clinical endpoint has remained incompletely validated. Petkau et al. [15] show that the correlation between MRI lesion counts and the annualized relapse rate at the individual level is weak. The low degree of correlation at the individual level indicates that MRI measures would be unreliable predictors of the annualized relapse rate for an individual patient.
However, this result does not exclude the possibility that the treatment effects on MRI measures and on the annualized relapse rate are highly associated, which means that MRI measures may still be useful for assessing the treatment effect at the trial level.

To evaluate whether MRI measures are useful in assessing treatment effects, SBRCMB collected summary information from multiple MS clinical trials. The SBRCMB dataset includes 23 randomized, double-blind, placebo-controlled trials. The treatments in the trials are believed to have similar mechanisms of action. There are 2 trials including both secondary progressive multiple sclerosis patients and RRMS patients. The remaining 21 trials include only RRMS patients. Among the 23 trials, there are 9 trials of 2 arms, 12 trials of 3 arms, 1 trial of 4 arms and 1 trial of 5 arms. Each trial has only 1 control arm but 1 to 4 active arms. In total, there are 63 arms, 40 contrasts and 6591 patients. The detailed SBRCMB dataset is included in Appendix A.

[Figure 3.1: Scatter Plot of Estimated Treatment Effects. Horizontal axis: Estimated Treatment Effect on the Surrogate Endpoint; vertical axis: Estimated Treatment Effect on the Clinical Endpoint.]

The SBRCMB dataset contains no individual patient level data, only the summary results from each clinical trial. The observed clinical endpoint for an arm is defined as the observed annualized relapse rate for this arm (it is assumed that all the patients in the same trial have the same follow-up time), and the observed surrogate endpoint for an arm is defined as the observed MRI lesion count per patient per scan from this arm (all the patients in the same trial are assumed to have the same scan times). The estimated treatment effect on the clinical endpoint is then defined as the log ratio between the observed clinical endpoints in the active and control arms.
Similarly, the estimated treatment effect on the surrogate endpoint is defined as the log ratio between the observed surrogate endpoints in the active and control arms. Since one contrast is formed by comparing one active arm and one control arm, we can obtain one estimated treatment effect on the clinical endpoint and one estimated treatment effect on the surrogate endpoint from each contrast. In total, we have 40 pairs of estimated treatment effects. Figure 3.1 shows the scatter plot of these pairs of estimated treatment effects. Note that the observed endpoints are not equal to the true endpoints (unless the arm includes an infinite number of patients), and thus the estimated treatment effects are not equal to the true treatment effects. The task is to assess the surrogacy relationship between the true treatment effects, which are not observable, based on the estimated treatment effects.

3.2 The SBRCMB approach

SBRCMB adopt a simple linear regression model and use weighted least squares (WLS) to assess the surrogacy relationship. The explanatory variable is the estimated treatment effect on the surrogate endpoint and the response variable is the estimated treatment effect on the clinical endpoint. In order to account for the influence of differences in trial size and trial duration among the contrasts, different weights are given to different contrasts. Specifically, let $w_i$ denote the weight given to the $i$th contrast, where $i = 1, 2, \ldots, 40$. Then:

\[
w_i = N_{\mathrm{complete},i} \cdot \sqrt{\frac{\text{follow-up (months)}_i}{12}},
\tag{3.1}
\]

where follow-up (months)$_i$ is the duration of the MRI follow-up in months of the patients in the $i$th contrast, and $N_{\mathrm{complete},i}$ is a number which SBRCMB choose to represent the total number of patients in this contrast. For a contrast from a trial with only 2 arms, $N_{\mathrm{complete},i}$ is equal to the total number of patients in these two arms.
For a contrast from a trial with more than 2 arms, $N_{\mathrm{complete},i}$ is obtained by equally dividing the number of placebo patients between the treatment arms. For example, for a trial with 20 patients on each of 3 arms, 2 contrasts are created with $N_{\mathrm{complete},i} = 20 + \frac{20}{2} = 30$ for both contrasts.

Let $Y_i$ and $X_i$ represent the estimated treatment effects on the clinical endpoint and the surrogate endpoint from the $i$th contrast. SBRCMB assume the following regression model to describe the surrogacy relationship:

\[
E(Y_i) = \alpha + \beta X_i,
\tag{3.2}
\]

and estimate the regression coefficients by WLS; that is:

\[
\min_{\alpha, \beta} \sum_i w_i (Y_i - \alpha - \beta X_i)^2.
\tag{3.3}
\]

SBRCMB also carry out a sensitivity study, an interaction study and a validation study. The sensitivity study aims to check whether the regression coefficients are sensitive to the choice of the weights, or to the choice of the contrasts included in the analysis. To check the sensitivity with respect to the choice of the weights, SBRCMB refit the regression line with 2 other weights $w'_i$ and $w''_i$, where $w'_i$ gives more weight to the duration of the contrast:

\[
w'_i = N_{\mathrm{complete},i} \cdot \frac{\text{follow-up (months)}_i}{12},
\tag{3.4}
\]

and $w''_i$ is a constant weight (i.e. $w''_i \equiv 1$). To check the sensitivity with respect to the choice of the contrasts, SBRCMB divide the whole dataset into different subsets with different features, and fit regression lines based on those subsets separately, all using the weights in (3.1). The first subset is a “highest contrasts” subset, which includes only data from the “active arm with the highest dose level versus control arm” contrast of each trial. The second subset is a “RRMS contrasts” subset, which includes only data from trials with only RRMS patients. The third subset is a “large effect contrasts” subset, which includes only data from the contrasts with an estimated treatment effect on the clinical endpoint greater than 20%. Table 3.1 shows the results we reproduced for the sensitivity study; these are almost the same as those from SBRCMB.
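The weights in (3.1) and (3.4) and the WLS criterion (3.3) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names are ours, the closed-form solution is the standard weighted simple linear regression solution, and the four data points are hypothetical values placed exactly on a line so that the fit can be checked by eye. They are not contrasts from the SBRCMB dataset.

```python
def n_complete(n_active, n_placebo, n_active_arms):
    """Active-arm patients plus an equal share of the placebo patients."""
    return n_active + n_placebo / n_active_arms


def weight(n_comp, followup_months, power=0.5):
    """power=0.5 gives w_i in (3.1); power=1.0 gives w'_i in (3.4)."""
    return n_comp * (followup_months / 12) ** power


def wls_fit(x, y, w):
    """Minimize sum_i w_i (y_i - a - b x_i)^2 and return (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical contrasts (assumed values), lying exactly on y = 0.1 + 0.5 x.
x = [-2.0, -1.0, -0.5, 0.0]
y = [0.1 + 0.5 * xi for xi in x]
w = [
    weight(n_complete(20, 20, 2), 6),  # 3-arm trial from the text: N_complete = 30
    weight(200, 12),
    weight(150, 24),
    weight(90, 9),
]
intercept, slope = wls_fit(x, y, w)
print(round(intercept, 6), round(slope, 6))
```

Because the hypothetical points lie exactly on a line, any positive weights return the same intercept and slope; with real data the three weighting schemes would generally give different estimates, which is what the sensitivity study examines.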
Table 3.1: Results of the Sensitivity Study

Analysis        No. of trials   No. of contrasts   $\hat{\alpha}$*   $\hat{\beta}$*   $R^2$
$w_i$           23              40                 -0.02 (0.05)      0.55 (0.04)      0.80
$w'_i$          23              40                 -0.02 (0.05)      0.58 (0.04)      0.84
$w''_i \equiv 1$  23            40                  0.12 (0.07)      0.50 (0.06)      0.65
highest         23              23                 -0.06 (0.08)      0.53 (0.06)      0.77
RRMS            21              36                 -0.03 (0.05)      0.56 (0.05)      0.80
large effect    18              25                 -0.01 (0.10)      0.58 (0.07)      0.75

* estimate (estimated standard error)

The values in the $R^2$ column are the weighted coefficients of determination:

\[
R^2 = \frac{\sum_i w_i (\hat{y}_i - \bar{y})^2}{\sum_i w_i (y_i - \bar{y})^2},
\tag{3.5}
\]

where $\hat{y}_i$ is the fitted value and $w_i$ can be replaced by $w'_i$ when (3.4) is used.

In the sensitivity study, none of the $\hat{\alpha}$s are significantly different from 0, but all the $\hat{\beta}$s are. Furthermore, SBRCMB claim that all the estimates of $\beta$ are close (all between 0.50 and 0.58) and all the $R^2$s are close (between 0.65 and 0.84). They interpret these findings as indicating that the fitted regression line is not sensitive to the choice of weights or to the choice of contrasts involved.

The SBRCMB interaction study aims to check whether the regression coefficients depend on the characteristics of the trials. For example, let $I_i$ be an indicator variable which takes the value 1 if the $i$th contrast is from a trial conducted after year 2000 and 0 otherwise. Then SBRCMB fit the following regression model with weight $w_i$:

\[
E(Y_i) = \alpha + \beta_1 X_i + \beta_2 I_i + \beta_3 I_i \cdot X_i.
\tag{3.6}
\]

By assessing $\beta_2$ and $\beta_3$, one can see whether there is a difference in the regression coefficients between the contrasts before year 2000 and after year 2000. In addition to this “time period” factor, SBRCMB also examine the factors “drug class” (whether a contrast is from a trial whose treatment is an interferon) and “annualized relapse rate” (whether the observed annualized relapse rate in the placebo arm of a contrast is larger than 1). The reproduced results of the interaction study are shown in Table 3.2.
The “P-value” column shows the P-values for testing whether the coefficient of the interaction term is 0 (e.g. testing whether $\beta_3 = 0$ in (3.6)).

Table 3.2: Results of the Interaction Study

Indicator variable         Class             No. of contrasts   P-value
time period                > 2000            15                 0.30
                           < 2000            25
drug class                 with interferon   12                 0.20
                           not interferon    28
annualized relapse rate    > 1                9                 0.36
                           < 1               31

In the interaction study, as all these P-values are greater than 0.05, SBRCMB claim that there is no indication of differences in the slope of the fitted line for contrasts with different characteristics, though SBRCMB also note that the power of this test is quite low due to the limited sample size.

Finally, SBRCMB carry out a validation study, in which 4 new clinical trials are introduced, resulting in 4 new contrasts (each of these trials has only 2 arms). Their estimated treatment effects on the clinical endpoint are compared with the predicted counterparts obtained from the regression model with weight $w_i$. The reproduced results of the validation study are shown in Figure 3.2, where the hollow points represent the 40 actual contrasts used in the regression model, the solid line is the estimated regression line with weight $w_i$, the solid points represent the 4 new contrasts, and the bars are the 95% prediction intervals for the estimated treatment effects on the clinical endpoint for the 4 new contrasts. The prediction intervals are calculated by the standard regression approach: the $X_i$s are assumed to be fixed, and the weights $w_i$ are assumed to be proportional to the inverse of the variance of the $Y_i$s. It can be seen that all the solid points lie within the prediction intervals (the 2nd one from the left lies at the very edge of its prediction interval). SBRCMB claim that the estimated regression model is able to give satisfactory predictions. However, these 4 new trials use active control arms rather than placebo control arms.
So, these 4 trials have different designs from the 23 trials in SBRCMB’s dataset, and may not tell us whether the estimated regression equation can produce satisfactory predictions.

Based on all of these results, SBRCMB conclude that in RRMS, the treatment effect on MRI lesion count can be used to predict the treatment effect on the annualized relapse rate. They state that these results support the use of MRI lesion count as a surrogate endpoint in RRMS clinical trials with treatments of analogous mechanism.

[Figure 3.2: Results of the Validation Study. Horizontal axis: Estimated Treatment Effect on the Surrogate Endpoint; vertical axis: Estimated Treatment Effect on the Clinical Endpoint.]

3.3 Critique of the SBRCMB Approach

In this section, we discuss shortcomings of the SBRCMB approach in assessing the surrogacy relationship. The fundamental issue is that the WLS estimates may not be appropriate for the dataset. There are several reasons.

First, the explanatory variable $X_i$ used in the SBRCMB model is defined as the log ratio between the observed MRI lesion counts per patient per scan in the active and the control arms, and the response variable $Y_i$ is defined as the log ratio between the observed annualized relapse rates in the active and the control arms. Since the observed endpoints are not equal to the true endpoints, $X_i$ and $Y_i$ are just estimates of the true treatment effects. If $X_i^{\mathrm{true}}$ and $Y_i^{\mathrm{true}}$ denote the corresponding true treatment effects, then the surrogacy relationship is the relationship between $X_i^{\mathrm{true}}$ and $Y_i^{\mathrm{true}}$, not that between $X_i$ and $Y_i$. The SBRCMB approach does not take into account the influence of estimation errors in both $X_i$ and $Y_i$, which may lead to a biased result.

Second, 14 of the 23 trials have more than 2 arms, which leads to correlated contrasts since the contrasts from the same trial share the same control arm.
Therefore, even if we believe the estimation errors are negligible, so that the relationship between $Y_i$ and $X_i$ should be an excellent approximation to the relationship between $Y_i^{\mathrm{true}}$ and $X_i^{\mathrm{true}}$, the WLS approach is still not appropriate because some of the $Y_i$s are correlated.

Third, the rationale for the weights SBRCMB use in the WLS estimation is unclear. SBRCMB simply state that the weights are chosen because they reflect the information conveyed by each trial. Suppose that there is no estimation error, and all the $Y_i$s are independent, so that it is reasonable to use the WLS approach. Are these weights then appropriate?

In the following subsections, we discuss each of these potential problems. We start with the appropriateness of the weights under the assumption that the WLS approach is reasonable. Then we discuss the correlation issue. Finally, we discuss the more fundamental issue of the influence of estimation errors in estimating the surrogacy relationship.

3.3.1 The Appropriateness of the Weights

In this section, we focus on the relationship between $Y_i$ and $X_i$, and assume that all the $Y_i$s are independent. Furthermore, we assume that all the $X_i$s are fixed. We assume the following regression model:

\[
Y_i = \alpha + \beta x_i + \varepsilon_i,
\tag{3.7}
\]

where $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \tau_i^2$ and all the $\varepsilon_i$s are independent. Then theoretically, the weight $w_i$ for $Y_i$ should be proportional to the inverse of the variance of $\varepsilon_i$, i.e. $w_i \propto \tau_i^{-2}$. We use $x_i$ instead of $X_i$ here because the $x_i$s are assumed to be fixed.

In the following text, we omit the subscript $i$ on every quantity to simplify notation. Let $R_a$ and $R_c$ be the observed annualized relapse rates in the active and the control arms respectively from a certain trial. Let $R_a^{\mathrm{true}}$ and $R_c^{\mathrm{true}}$ be the corresponding true annualized relapse rates. Then $Y = \log\frac{R_a}{R_c}$ and $Y^{\mathrm{true}} = \log\frac{R_a^{\mathrm{true}}}{R_c^{\mathrm{true}}}$. Note that since $R_a$ and $R_c$ are from different arms with different patients, it is natural to consider them to be independent.
Similarly, we consider $R_a^{\text{true}}$ and $R_c^{\text{true}}$ also to be independent.

Suppose that there are $N_a$ and $N_c$ patients in the active and control arms respectively, and assume that all $N_a + N_c$ patients have the same number of years of follow-up for the relapse data, namely $T$. Let $F_j$ denote the total number of relapses of the $j$th patient in the active arm. We assume:

\[ E(F_j \mid R_a^{\text{true}}) = T R_a^{\text{true}}, \qquad \mathrm{Var}(F_j \mid R_a^{\text{true}}) = \phi \cdot T R_a^{\text{true}}, \tag{3.8} \]

where $\phi$ is a dispersion parameter describing how the variance of the number of relapses is related to its expectation. If $\phi = 1$, the variance equals the expectation, which corresponds to a Poisson assumption. We assume that $\phi$ is the same for all the patients in all the trials. Thus, $\phi$ has neither subscript $j$ nor subscript $i$. Then $F_j / T$ is this patient's annualized relapse rate, and from the above assumption we have:

\[ E\!\left(\frac{F_j}{T} \,\Big|\, R_a^{\text{true}}\right) = R_a^{\text{true}}, \qquad \mathrm{Var}\!\left(\frac{F_j}{T} \,\Big|\, R_a^{\text{true}}\right) = \phi \cdot \frac{R_a^{\text{true}}}{T}. \tag{3.9} \]

By definition, the observed annualized relapse rate in the active arm is:

\[ R_a = \frac{F_1 + F_2 + \dots + F_{N_a}}{T N_a}. \tag{3.10} \]

Then, by the delta method and the Central Limit Theorem, we obtain the following approximation to the conditional distribution of $\log R_a$:

\[ \log R_a \mid R_a^{\text{true}} \approx N\!\left(\log R_a^{\text{true}},\ \frac{\phi}{T N_a R_a^{\text{true}}}\right). \tag{3.11} \]

Similarly, for the control arm, we have:

\[ \log R_c \mid R_c^{\text{true}} \approx N\!\left(\log R_c^{\text{true}},\ \frac{\phi}{T N_c R_c^{\text{true}}}\right). \tag{3.12} \]

Unconditionally, we have:

\[ \mathrm{Var}(\log R_a) = \mathrm{Var}\big(E(\log R_a \mid R_a^{\text{true}})\big) + E\big(\mathrm{Var}(\log R_a \mid R_a^{\text{true}})\big) \approx \mathrm{Var}(\log R_a^{\text{true}}) + \frac{\phi}{T N_a} E\!\left(\frac{1}{R_a^{\text{true}}}\right), \tag{3.13} \]

and similarly, for the control arm, we have:

\[ \mathrm{Var}(\log R_c) = \mathrm{Var}\big(E(\log R_c \mid R_c^{\text{true}})\big) + E\big(\mathrm{Var}(\log R_c \mid R_c^{\text{true}})\big) \approx \mathrm{Var}(\log R_c^{\text{true}}) + \frac{\phi}{T N_c} E\!\left(\frac{1}{R_c^{\text{true}}}\right). \tag{3.14} \]

The independence assumption for $R_a$ and $R_c$ leads to:

\[ \mathrm{Var}(Y) = \mathrm{Var}\!\left(\log \frac{R_a}{R_c}\right) = \mathrm{Var}(\log R_a) + \mathrm{Var}(\log R_c) \approx \mathrm{Var}(\log R_a^{\text{true}}) + \mathrm{Var}(\log R_c^{\text{true}}) + \frac{\phi}{T}\left(\frac{1}{N_a} E\!\left(\frac{1}{R_a^{\text{true}}}\right) + \frac{1}{N_c} E\!\left(\frac{1}{R_c^{\text{true}}}\right)\right). \tag{3.15} \]

From the above formula, we can see that the variance of $Y$ depends on the distribution of $R_a^{\text{true}}$ and $R_c^{\text{true}}$ as well as on the unknown parameter $\phi$.

If we include the subscript $i$, (3.15) is actually

\[ \mathrm{Var}(Y_i) = \mathrm{Var}(\log R_{ai}^{\text{true}}) + \mathrm{Var}(\log R_{ci}^{\text{true}}) + \frac{\phi}{T_i}\left(\frac{1}{N_{ai}} E\!\left(\frac{1}{R_{ai}^{\text{true}}}\right) + \frac{1}{N_{ci}} E\!\left(\frac{1}{R_{ci}^{\text{true}}}\right)\right), \quad i = 1, 2, \dots, 40. \]

Now we assume all the $R_{ai}^{\text{true}}$s are identically distributed. We think all the treatments included in the SBRCMB dataset have similar mechanisms of action, so the distribution of the $R_{ai}^{\text{true}}$ describes how the true clinical endpoint varies across contrasts. Similarly, we assume all the $R_{ci}^{\text{true}}$s are identically distributed. As a result, the variances and the expectations in (3.15) are constant across trials.

One way to estimate $E(1/R_a^{\text{true}})$ and $E(1/R_c^{\text{true}})$ is to average all the $R_a$s and all the $R_c$s across the contrasts and take their inverse. For the SBRCMB dataset, we obtain $\hat{E}(1/R_a^{\text{true}}) \approx 1.43$ and $\hat{E}(1/R_c^{\text{true}}) \approx 1.10$.

Let $\theta$ denote $\mathrm{Var}(\log R_a^{\text{true}}) + \mathrm{Var}(\log R_c^{\text{true}})$. Then the variance of $Y$ can be written as:

\[ \tau^2 = \mathrm{Var}(Y) = \theta + \phi\left(\frac{1.43}{T N_a} + \frac{1.10}{T N_c}\right). \tag{3.16} \]

The values of $T$, $N_a$ and $N_c$ all depend on the contrast leading to $Y$. If we let $c = \frac{1.43}{T N_a} + \frac{1.10}{T N_c}$ and include the subscript $i$, we have:

\[ \tau_i^2 = \theta + \phi c_i. \tag{3.17} \]

Based on (3.17), we can examine the appropriateness of the weights used in the SBRCMB approach. If $w_i = N_{\text{complete}_i} \cdot \sqrt{\frac{\text{follow-up (months)}_i}{12}}$ is appropriate, then $w_i$ should be proportional to the inverse of the variance of the estimated clinical outcome; that is:

\[ w_i = \frac{a}{\tau_i^2} = \frac{a}{\theta + \phi c_i} \quad \Rightarrow \quad \frac{1}{w_i} = \frac{\theta}{a} + \frac{\phi}{a} c_i, \tag{3.18} \]

where $a$ is an arbitrary proportionality constant. The above result implies that if we draw the scatter plot of $(c_i, 1/w_i)$, the points should gather around a straight line.
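The linearity check implied by (3.18) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the computation behind the thesis figures: the contrasts (follow-up time, arm sizes) and the values of θ, ϕ and a are made up, and the helper names `c_value` and `fit_line` are hypothetical. The sketch computes c_i as in (3.17), using the plug-in values 1.43 and 1.10 from (3.16), and confirms that weights exactly proportional to 1/τ_i² put the points (c_i, 1/w_i) on a line with intercept θ/a and slope ϕ/a.

```python
def c_value(t_years, n_active, n_control, inv_ra=1.43, inv_rc=1.10):
    """c_i from (3.17): inv_ra / (T * Na) + inv_rc / (T * Nc)."""
    return inv_ra / (t_years * n_active) + inv_rc / (t_years * n_control)


def fit_line(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical contrasts (NOT the SBRCMB data): (T in years, Na, Nc).
contrasts = [(0.5, 20, 20), (1.0, 50, 48), (2.0, 120, 125), (0.75, 30, 15)]
theta, phi, a = 0.05, 1.2, 3.0  # illustrative values only

c_values = [c_value(t, na, nc) for t, na, nc in contrasts]
# Weights that are exactly proportional to 1 / tau_i^2, as in (3.18).
inverse_weights = [(theta + phi * c) / a for c in c_values]

intercept, slope = fit_line(c_values, inverse_weights)
print(intercept, slope)  # close to theta / a and phi / a
```

With real weights the points would only scatter around a line; how tightly they do so is what the scatter plots of (c, 1/w) are meant to show.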
[Figure 3.3: Scatter Plot of (c, 1/w). Horizontal axis: $c_i$. Vertical axis: $1/w_i$.]

[Figure 3.4: Scatter Plot of (c, 1/w′). Horizontal axis: $c_i$. Vertical axis: $1/w_i'$.]

Figure 3.3 compares the $w_i$s to the $c_i$s and Figure 3.4 compares the $w_i'$s to the $c_i$s. In both scatter plots, the points approximately gather around a straight line. This suggests that, if the assumptions we made in this section are reasonable, then both weights used by SBRCMB also seem reasonable. From these two plots, we would expect the $w_i$s and $w_i'$s to perform similarly.

3.3.2 Correlation of the Contrasts

The WLS approach is appropriate when the response variables are independent. However, this is not the case for the SBRCMB data. As mentioned before, 14 of the 23 trials have more than 2 arms. So, if two contrasts are from the same trial, then the estimated treatment effects on the clinical endpoint from these two contrasts are correlated, because the two contrasts share the same control arm. For example, let $Y_1$ and $Y_2$ be two estimated treatment effects on the clinical endpoint from the same three-arm trial. Then $Y_1 = \log(R_{a1}/R_c)$ and $Y_2 = \log(R_{a2}/R_c)$, where $R_{a1}$, $R_{a2}$ and $R_c$ are the observed annualized relapse rates in the first active arm, the second active arm and the control arm respectively. Because $R_{a1}$ and $R_{a2}$ are from different arms with different patients, we assume they are independent (and, as in Section 3.3.1, independent of $R_c$). Then:

\[ \mathrm{Cov}(Y_1, Y_2) = \mathrm{Cov}\!\left(\log \frac{R_{a1}}{R_c},\ \log \frac{R_{a2}}{R_c}\right) = \mathrm{Var}(\log R_c). \tag{3.19} \]

Now, it is clear that $Y_1$ and $Y_2$ are correlated. An immediate way to address this correlation in the regression model is to use generalized least squares. However, from the last section, we know that:

\[ \mathrm{Var}(\log R_c) \approx \mathrm{Var}(\log R_c^{\text{true}}) + \frac{\phi}{T N_c} E\!\left(\frac{1}{R_c^{\text{true}}}\right) \approx \mathrm{Var}(\log R_c^{\text{true}}) + \frac{1.10\,\phi}{T N_c}. \tag{3.20} \]

To make use of generalized least squares, we need to estimate the covariance between any two correlated $Y_i$ and $Y_j$. But the estimate of that covariance requires an estimate of $\mathrm{Var}(\log R_c^{\text{true}})$, the variance of the logarithm of the true annualized relapse rate across all the trials, and the unknown parameter $\phi$. These two quantities are difficult to estimate without assuming a more complicated model. We will address this question in the next chapter by developing a more comprehensive model.

3.3.3 Influence of Estimation Errors

As mentioned at the beginning of this chapter, the relationship of real interest is between $Y_i^{\text{true}}$ and $X_i^{\text{true}}$. However, we cannot observe $Y_i^{\text{true}}$ and $X_i^{\text{true}}$ directly, but can only use $Y_i$ and $X_i$ to estimate them. Suppose the true surrogacy relationship is:

\[ E(Y_i^{\text{true}} \mid X_i^{\text{true}}) = \alpha + \beta X_i^{\text{true}}. \tag{3.21} \]

Then, the question is: when we use $Y_i$ and $X_i$ in place of $Y_i^{\text{true}}$ and $X_i^{\text{true}}$ to estimate $\alpha$ and $\beta$, as was done by SBRCMB, how good are these estimators?

In this section, we consider $X_i^{\text{true}}$ as random rather than fixed. We think this is a reasonable assumption for the SBRCMB dataset. Since all the patients included in the study received treatments that are considered to be of the same type, it is natural to think of all the true treatment effects from the different trials as coming from a single probability distribution. To simplify the discussion, we assume $Y_i^{\text{true}}$ and $X_i^{\text{true}}$ are bivariate normally distributed. The conditional expectation of $Y_i^{\text{true}}$ given $X_i^{\text{true}}$ is given in (3.21), and the conditional variance of $Y_i^{\text{true}}$ given $X_i^{\text{true}}$ is denoted as $\tau^2$. Also, let $\mu_X$ and $\sigma_X^2$ represent the expectation and the variance of $X_i^{\text{true}}$. Then, the bivariate normal distribution of $Y_i^{\text{true}}$ and $X_i^{\text{true}}$ is:

\[ \begin{pmatrix} Y_i^{\text{true}} \\ X_i^{\text{true}} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \alpha + \beta \mu_X \\ \mu_X \end{pmatrix},\ \begin{pmatrix} \beta^2 \sigma_X^2 + \tau^2 & \beta \sigma_X^2 \\ \beta \sigma_X^2 & \sigma_X^2 \end{pmatrix} \right). \tag{3.22} \]

If we could observe $Y_i^{\text{true}}$ and $X_i^{\text{true}}$, then the OLS estimators based on $Y_i^{\text{true}}$ and $X_i^{\text{true}}$ for $\beta$ and $\alpha$ are unbiased and consistent.
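Because:

\[ \hat{\beta} = \frac{\sum (X_i^{\text{true}} - \bar{X}^{\text{true}})(Y_i^{\text{true}} - \bar{Y}^{\text{true}})}{\sum (X_i^{\text{true}} - \bar{X}^{\text{true}})^2}, \tag{3.23} \]

where $\bar{X}^{\text{true}}$ and $\bar{Y}^{\text{true}}$ are the averages of all $X_i^{\text{true}}$s and $Y_i^{\text{true}}$s included in the study, then:

\[ E(\hat{\beta}) = E\big(E(\hat{\beta} \mid X^{\text{true}})\big) = E(\beta) = \beta, \tag{3.24} \]

where $X^{\text{true}}$ represents the collection of all $X_i^{\text{true}}$s. For consistency, note that $\sum (X_i^{\text{true}} - \bar{X}^{\text{true}})^2 / n \xrightarrow{p} \mathrm{Var}(X_i^{\text{true}}) = \sigma_X^2$ and $\sum (X_i^{\text{true}} - \bar{X}^{\text{true}})(Y_i^{\text{true}} - \bar{Y}^{\text{true}}) / n \xrightarrow{p} \mathrm{Cov}(Y_i^{\text{true}}, X_i^{\text{true}}) = \beta \sigma_X^2$, where $n$ is the number of contrasts. So, $\hat{\beta} \xrightarrow{p} \beta \sigma_X^2 / \sigma_X^2 = \beta$.

A similar argument can be made for $\hat{\alpha}$. Note that:

\[ \hat{\alpha} = \bar{Y}^{\text{true}} - \hat{\beta} \bar{X}^{\text{true}}. \tag{3.25} \]

Then it is clear that $E(\hat{\alpha}) = E\big(E(\hat{\alpha} \mid X^{\text{true}})\big) = E(\alpha + \beta \bar{X}^{\text{true}} - \beta \bar{X}^{\text{true}}) = E(\alpha) = \alpha$, and $\hat{\alpha} \xrightarrow{p} (\alpha + \beta \mu_X) - \beta \mu_X = \alpha$.

However, if we can only observe $Y_i$ and $X_i$, then the OLS estimator for $\beta$ becomes:

\[ \tilde{\beta} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}, \tag{3.26} \]

where $\bar{X}$ and $\bar{Y}$ are the averages of all $X_i$s and $Y_i$s included in the study. Consequently $\tilde{\alpha} = \bar{Y} - \tilde{\beta} \bar{X}$. Are these estimators still unbiased and consistent?

Consider the following simple model. Let $e_i$ and $f_i$ represent the estimation errors on $X_i^{\text{true}}$ and $Y_i^{\text{true}}$ respectively:

\[ X_i = X_i^{\text{true}} + e_i \quad \text{and} \quad Y_i = Y_i^{\text{true}} + f_i. \tag{3.27} \]

We assume $e_i \overset{\text{iid}}{\sim} N(0, \delta^2)$ and $f_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$. Furthermore, we assume that $e_i$ and $f_i$ are independent and are independent of $X_i^{\text{true}}$ and $Y_i^{\text{true}}$ for all $i$. As a result, we obtain the joint distribution for the estimated treatment effects:

\[ \begin{pmatrix} Y_i \\ X_i \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \alpha + \beta \mu_X \\ \mu_X \end{pmatrix},\ \begin{pmatrix} \beta^2 \sigma_X^2 + \tau^2 + \sigma^2 & \beta \sigma_X^2 \\ \beta \sigma_X^2 & \sigma_X^2 + \delta^2 \end{pmatrix} \right). \tag{3.28} \]

It follows that:

\[ E(Y_i \mid X_i) = \alpha + \beta \mu_X + \frac{\beta \sigma_X^2}{\sigma_X^2 + \delta^2}(X_i - \mu_X) = \left(\alpha + \beta \mu_X - \frac{\beta \sigma_X^2}{\sigma_X^2 + \delta^2}\,\mu_X\right) + \left(\beta \cdot \frac{\sigma_X^2}{\sigma_X^2 + \delta^2}\right) X_i. \tag{3.29} \]

Analogous to (3.23) and (3.24), we now have $E(\tilde{\beta}) = \beta \cdot \frac{\sigma_X^2}{\sigma_X^2 + \delta^2}$, which means $\tilde{\beta}$ is not an unbiased estimator of $\beta$. For consistency, it is also clear that $\tilde{\beta} = S_{xy}/S_{xx} \xrightarrow{p} \beta \frac{\sigma_X^2}{\sigma_X^2 + \delta^2}$. So, $\tilde{\beta}$ is also not a consistent estimator of $\beta$. Similar conclusions hold for $\tilde{\alpha}$.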
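The bias of the slope under the measurement-error model (3.27) can be illustrated with a small simulation. The following is a minimal sketch using only the Python standard library; all parameter values are made up for illustration (they are not estimates from the SBRCMB data), and the function names `ols_slope` and `simulate_slopes` are hypothetical. It draws true effects from the bivariate normal model, adds independent normal estimation errors, and compares the OLS slope on the true effects with the OLS slope on the error-contaminated effects.

```python
import random


def ols_slope(xs, ys):
    """OLS slope of ys on xs, as in (3.23) and (3.26)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_slopes(n, alpha, beta, mu_x, sigma_x, tau, delta, sigma, seed=0):
    """Return (slope on true effects, slope on observed effects)."""
    rng = random.Random(seed)
    x_true, y_true, x_obs, y_obs = [], [], [], []
    for _ in range(n):
        xt = rng.gauss(mu_x, sigma_x)
        yt = alpha + beta * xt + rng.gauss(0.0, tau)
        x_true.append(xt)
        y_true.append(yt)
        x_obs.append(xt + rng.gauss(0.0, delta))  # error e_i in the regressor
        y_obs.append(yt + rng.gauss(0.0, sigma))  # error f_i in the response
    return ols_slope(x_true, y_true), ols_slope(x_obs, y_obs)


# Illustrative values only: sigma_X^2 = 0.5 and delta^2 = 0.25.
beta, sigma_x, delta = 0.6, 0.5 ** 0.5, 0.5
slope_true, slope_observed = simulate_slopes(
    20000, 0.0, beta, -0.7, sigma_x, 0.1, delta, 0.2
)
# Limit of the observed-data slope: beta * sigma_X^2 / (sigma_X^2 + delta^2).
attenuated = beta * sigma_x ** 2 / (sigma_x ** 2 + delta ** 2)
print(slope_true, slope_observed, attenuated)
```

With a large number of simulated contrasts, the slope from the true effects stays near β while the slope from the observed effects settles near the smaller limit, which is the lack of consistency described above. Changing `sigma` (the error in the response) leaves that limit unchanged.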
Note that the coefficient $\frac{\sigma_X^2}{\sigma_X^2 + \delta^2}$ is always less than 1 unless $\delta^2 = 0$. Hence, under this model, when there exist estimation errors in the regressor, the expectation of the OLS estimator is always smaller in absolute value than the true value. This is called the attenuation effect in regression. As demonstrated, this effect does not disappear even when the sample size goes to infinity. So, when the estimation error is not negligible (i.e. $\delta^2$ is not very small relative to $\sigma_X^2$), the OLS estimator is not a good estimator. On the other hand, we see that the estimation errors in the response variable do not affect the unbiasedness and consistency of the OLS estimator.

For more complex situations, such as when the estimation errors are not identically distributed or when $X_i^{\text{true}}$ is fixed rather than random, it can be shown that the OLS estimator is still biased and inconsistent. The WLS estimator can also be shown to be biased and inconsistent when there exist estimation errors in the regressor, no matter what kind of weights are applied to the data. For the SBRCMB dataset, since some trials included only a modest number of patients, non-negligible estimation errors must exist in the estimated treatment effects from those trials. Therefore, the OLS (WLS) estimator will tend to underestimate the true regression coefficient.

Furthermore, using simple linear regression may lead to incorrect assessment of the surrogacy relationship. For example, in the above model, if no estimation errors exist, then the coefficient of determination $R^2$ is the square of the sample correlation coefficient between $Y_i^{\text{true}}$ and $X_i^{\text{true}}$. From (3.22), we have:

\[ R^2 = \frac{\left[\sum (X_i^{\text{true}} - \bar{X}^{\text{true}})(Y_i^{\text{true}} - \bar{Y}^{\text{true}})\right]^2}{\sum (X_i^{\text{true}} - \bar{X}^{\text{true}})^2 \sum (Y_i^{\text{true}} - \bar{Y}^{\text{true}})^2} \xrightarrow{p} \frac{\beta^2 \sigma_X^4}{\sigma_X^2 (\beta^2 \sigma_X^2 + \tau^2)}. \tag{3.30} \]

However, if estimation errors exist, and (3.28) is assumed, the coefficient of determination becomes

\[ \tilde{R}^2 = \frac{\left[\sum (X_i - \bar{X})(Y_i - \bar{Y})\right]^2}{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2} \xrightarrow{p} \frac{\beta^2 \sigma_X^4}{(\sigma_X^2 + \delta^2)(\beta^2 \sigma_X^2 + \tau^2 + \sigma^2)}. \tag{3.31} \]

When estimation errors exist, $\sigma^2$ and $\delta^2$ are always larger than 0, so the coefficient of determination tends to underestimate the square of the correlation coefficient between $Y_i^{\text{true}}$ and $X_i^{\text{true}}$, which may lead to a false conclusion about the surrogacy relationship. The coefficient of determination is 65% from SBRCMB with $w_i'' \equiv 1$. However, the correlation between the true treatment effects on the clinical endpoint and on the surrogate endpoint may be higher, which means a better surrogacy relationship.

In the next chapter, we will re-analyze the surrogacy relationship with a more comprehensive approach to take into account the existence of estimation errors and the correlated contrasts in the SBRCMB dataset.

Chapter 4

Lesion Counts as a Surrogate Endpoint in RRMS: A More Comprehensive Approach

In this chapter, we use the SBRCMB dataset to re-analyze the surrogacy relationship between the MRI lesion count and the annualized relapse rate at the trial level. We start with modeling the true treatment effects (the surrogacy relationship) in the single-contrast clinical trials and develop the conditional distribution of the observed endpoints given the true endpoints to account for the estimation errors. Similar models are then generalized to the multiple-contrast trials to address the issue of the correlated contrasts. Once all components of the model are constructed, the model parameters are estimated based on “normal estimating equations”. The results are then compared with those obtained from the SBRCMB approach, and the estimated surrogacy relationship and its usefulness in practice are evaluated.

In each arm, we define the true clinical endpoint as the true annualized relapse rate, which is the expected value of the observed annualized relapse rate.
In fact, every patient in the same arm has his/her own observed annualized relapse rate, and we assume they all have the same probability distribution whose expectation is the true annualized relapse rate (as defined in Section 3.3.1). Similarly, we define the true surrogate endpoint as the true MRI lesion count per scan per patient, which is the expected value of the observed MRI lesion count per scan per patient. So, corresponding to the estimated treatment effects defined through the observed endpoints, we define the true treatment effects on the endpoints as the log ratio between the true endpoints in the active arm and in the control arm. We aim to assess the relationship between these true treatment effects.

4.1 Model for the Single-contrast Clinical Trials

4.1.1 Model for the True Treatment Effects

In the SBRCMB dataset, there are 9 single-contrast trials. For each of these 9 trials, let $R_a^{\text{true}}$ and $R_c^{\text{true}}$ denote the true annualized relapse rates in the active arm and in the control arm, and let $M_a^{\text{true}}$ and $M_c^{\text{true}}$ denote the true MRI lesion counts per scan per patient in the active arm and in the control arm. Then the true treatment effect on the clinical endpoint is defined as $Y^{\text{true}} = \log(R_a^{\text{true}} / R_c^{\text{true}})$ and the true treatment effect on the surrogate endpoint is defined as $X^{\text{true}} = \log(M_a^{\text{true}} / M_c^{\text{true}})$. We assume the following bivariate normal model for these two true treatment effects:

\[ \begin{pmatrix} Y^{\text{true}} \\ X^{\text{true}} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \mu_Y \\ \mu_X \end{pmatrix},\ \begin{pmatrix} \sigma_Y^2 & \sigma_{YX} \\ \sigma_{YX} & \sigma_X^2 \end{pmatrix} \right). \tag{4.1} \]

Since different trials consist of different patients, we assume that the true treatment effects are independent across trials. The model (4.1) is assumed to be true for all the contrasts from all the single-contrast trials.
This is reasonable because all the trials in the dataset are included to examine the effects of treatments with similar mechanisms of action, and therefore we hope to see a similar relationship between the true treatment effect on the clinical endpoint and on the surrogate endpoint across all the trials with this type of treatment. We omit the subscript $i$ for the $i$th trial in our notation throughout the following development.

The distribution in (4.1) is specified in an unstructured form. To express the surrogacy relationship, we represent the moments of the conditional distribution of $Y^{\text{true}}$ given $X^{\text{true}}$ as:

\[ E(Y^{\text{true}} \mid X^{\text{true}}) = \alpha + \beta X^{\text{true}} \quad \text{and} \quad \mathrm{Var}(Y^{\text{true}} \mid X^{\text{true}}) = \tau^2. \tag{4.2} \]

The parameter $\beta$ is our primary interest, as it measures the strength of the surrogacy relationship. If $\beta$ is 0, then the MRI lesion count is not a surrogate for the annualized relapse rate for this type of treatment at the trial level, since knowledge of the true treatment effect on the MRI lesion count does not help to predict the true treatment effect on the annualized relapse rate. The parameter $\alpha$ is also of interest and we expect it to be small. If $\alpha$ is not 0, there is a part of the true treatment effect on the annualized relapse rate that is unexplained by the true treatment effect on the MRI lesion count per patient per scan. The parameter $\tau^2$ represents the precision of this linear relationship; that is, how precisely we can predict the true treatment effect on the annualized relapse rate given the true treatment effect on the MRI lesion count.

The Prentice definition (1.1) describes a perfect surrogacy relationship: no treatment effect on the surrogate endpoint implies no treatment effect on the clinical endpoint and vice versa. In our context, (1.1) requires both $\alpha$ and $\tau^2$ to be 0, while $\beta$ must not be 0; that is, the relationship between $Y^{\text{true}}$ and $X^{\text{true}}$ is deterministic and multiplicative: $Y^{\text{true}} = \beta X^{\text{true}}$.
However, such a perfect surrogacy relationship will seldom be realized in practice.

Using the parametrization specified in (4.2), we can rewrite (4.1) as:

\[ \begin{pmatrix} Y^{\text{true}} \\ X^{\text{true}} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \alpha + \beta \mu_X \\ \mu_X \end{pmatrix},\ \begin{pmatrix} \beta^2 \sigma_X^2 + \tau^2 & \beta \sigma_X^2 \\ \beta \sigma_X^2 & \sigma_X^2 \end{pmatrix} \right). \tag{4.3} \]

4.1.2 Model for the Observed Annualized Relapse Rate and MRI Lesion Count Per Patient Per Scan

Let $R_a$, $R_c$ and $M_a$, $M_c$ denote the observed annualized relapse rates and the observed MRI lesion counts per patient per scan on the active and control arms. To derive the probability distribution of $R_a$ and $R_c$, we use the same assumptions used in Section 3.3.1 and follow the notation used there (except we now use $\phi_1$ instead of $\phi$). As a result, we have:

\[ \log R_a \mid R_a^{\text{true}} \approx N\!\left(\log R_a^{\text{true}},\ \frac{\phi_1}{T N_a R_a^{\text{true}}}\right), \tag{4.4} \]

\[ \log R_c \mid R_c^{\text{true}} \approx N\!\left(\log R_c^{\text{true}},\ \frac{\phi_1}{T N_c R_c^{\text{true}}}\right). \tag{4.5} \]

Similarly, for the observed MRI lesion count, let $G_j$ denote the cumulative number of MRI lesions of the $j$th patient from the active arm on the $K$ scans obtained for this patient during the follow-up time $T$. (As in SBRCMB, we assume the follow-up time for the MRI data is the same as the follow-up time for the relapse data, all the patients in a trial have the same follow-up time $T$, and all the patients in a trial have the same number of scans $K$.) We then assume:

\[ E(G_j \mid M_a^{\text{true}}) = K M_a^{\text{true}}, \qquad \mathrm{Var}(G_j \mid M_a^{\text{true}}) = \phi_2 \cdot K M_a^{\text{true}}, \tag{4.6} \]

where $\phi_2$ is a dispersion parameter describing how the variance of the MRI lesion count is related to its expectation. As for $\phi_1$, we assume that $\phi_2$ is the same for all the patients in all the trials. Thus, $\phi_2$ has neither subscript $j$ nor subscript $i$. Then:

\[ E\!\left(\frac{G_j}{K} \,\Big|\, M_a^{\text{true}}\right) = M_a^{\text{true}}, \qquad \mathrm{Var}\!\left(\frac{G_j}{K} \,\Big|\, M_a^{\text{true}}\right) = \phi_2 \cdot \frac{M_a^{\text{true}}}{K}. \tag{4.7} \]

By definition, the observed MRI lesion count per patient per scan is:

\[ M_a = \frac{G_1 + G_2 + \dots + G_{N_a}}{K N_a}. \tag{4.8} \]

Then, by the delta method and the Central Limit Theorem, we obtain the following approximation to the conditional distribution of $\log M_a$:

\[ \log M_a \mid M_a^{\text{true}} \approx N\!\left(\log M_a^{\text{true}},\ \frac{\phi_2}{K N_a M_a^{\text{true}}}\right). \tag{4.9} \]

Similarly, for the control arm, we have:

\[ \log M_c \mid M_c^{\text{true}} \approx N\!\left(\log M_c^{\text{true}},\ \frac{\phi_2}{K N_c M_c^{\text{true}}}\right). \tag{4.10} \]

4.1.3 Model for the Estimated Treatment Effects

From (4.4), it is clear that $R_a$ and $R_a^{\text{true}}$ are not independent, which is reasonable since the observed clinical endpoint should depend on the true clinical endpoint. Now, we assume that given $R_a^{\text{true}}$, the conditional distribution of $\log R_a$ is independent of $R_c^{\text{true}}$, $M_a^{\text{true}}$ and $M_c^{\text{true}}$; that is, if we already know $R_a^{\text{true}}$, the additional information of $R_c^{\text{true}}$, $M_a^{\text{true}}$ and $M_c^{\text{true}}$ does not help to predict $\log R_a$.

It is natural to think that $R_c^{\text{true}}$ and $M_c^{\text{true}}$ affect neither $R_a$ nor $R_a^{\text{true}}$. The patients in the active arm and in the control arm are distinct, and the patients in the active arm received the treatment while the patients in the control arm did not, so it seems obvious that the behavior of the patients in the control arm should not affect the behavior of the patients in the active arm. For $M_a^{\text{true}}$, we could think that if it affects $R_a$, that effect would be only through $R_a^{\text{true}}$. Therefore, instead of (4.4), we make the stronger assumption that:

\[ \log R_a \mid U^{\text{true}} = \log R_a \mid R_a^{\text{true}} \approx N\!\left(\log R_a^{\text{true}},\ \frac{\phi_1}{T N_a R_a^{\text{true}}}\right), \tag{4.11} \]

where $U^{\text{true}} = (R_a^{\text{true}}, R_c^{\text{true}}, M_a^{\text{true}}, M_c^{\text{true}})^T$. The same argument leads to the corresponding results for $\log R_c \mid U^{\text{true}}$, $\log M_a \mid U^{\text{true}}$ and $\log M_c \mid U^{\text{true}}$.

Furthermore, we make the additional model assumption that $\log R_a$, $\log R_c$, $\log M_a$ and $\log M_c$ are conditionally independent, given $U^{\text{true}}$. The motivation for this assumption is the intuitive notion that each observed quantity is only affected by the corresponding true quantity. So if all the true quantities are given, the observed quantities are supposed to not affect each other. Then, if $U = (R_a, R_c, M_a, M_c)^T$, we have:

\[ \log U \mid U^{\text{true}} \approx N_4\!\left( \begin{pmatrix} \log R_a^{\text{true}} \\ \log R_c^{\text{true}} \\ \log M_a^{\text{true}} \\ \log M_c^{\text{true}} \end{pmatrix},\ \begin{pmatrix} \frac{\phi_1}{T N_a R_a^{\text{true}}} & 0 & 0 & 0 \\ 0 & \frac{\phi_1}{T N_c R_c^{\text{true}}} & 0 & 0 \\ 0 & 0 & \frac{\phi_2}{K N_a M_a^{\text{true}}} & 0 \\ 0 & 0 & 0 & \frac{\phi_2}{K N_c M_c^{\text{true}}} \end{pmatrix} \right). \tag{4.12} \]

Let $Y = \log(R_a / R_c)$ and $X = \log(M_a / M_c)$ denote the estimated treatment effects on the clinical outcome and on the surrogate outcome respectively. We can express $Y$ and $X$ in terms of $U$:

\[ \begin{pmatrix} Y \\ X \end{pmatrix} = A \log U, \quad \text{where } A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}. \tag{4.13} \]

Combining (4.3) and (4.12), we obtain the approximations to the first two moments of the estimated treatment effects:

\[ E\begin{pmatrix} Y \\ X \end{pmatrix} = E(A \log U) = E\big(E(A \log U \mid U^{\text{true}})\big) \approx E(A \log U^{\text{true}}) = \begin{pmatrix} \alpha + \beta \mu_X \\ \mu_X \end{pmatrix}, \tag{4.14} \]

\[ \mathrm{Var}\begin{pmatrix} Y \\ X \end{pmatrix} = \mathrm{Var}(A \log U) = \mathrm{Var}\big(E(A \log U \mid U^{\text{true}})\big) + E\big(\mathrm{Var}(A \log U \mid U^{\text{true}})\big) \approx \begin{pmatrix} (\beta^2 \sigma_X^2 + \tau^2) + \frac{\phi_1}{T N_a} E\!\left(\frac{1}{R_a^{\text{true}}}\right) + \frac{\phi_1}{T N_c} E\!\left(\frac{1}{R_c^{\text{true}}}\right) & \beta \sigma_X^2 \\ \beta \sigma_X^2 & \sigma_X^2 + \frac{\phi_2}{K N_a} E\!\left(\frac{1}{M_a^{\text{true}}}\right) + \frac{\phi_2}{K N_c} E\!\left(\frac{1}{M_c^{\text{true}}}\right) \end{pmatrix}. \tag{4.15} \]

Unlike these marginal moments, the marginal distribution of the estimated treatment effects is difficult to derive. In fact, to obtain the marginal distribution of $(Y, X)^T$, we need to make additional distributional assumptions about $U^{\text{true}}$. On the other hand, as $N_a$ and $N_c$, the numbers of patients in the active arm and in the control arm, increase, the influence of the estimation errors becomes small. As a result, the observed endpoints approach the true endpoints and the estimated treatment effects approach the true treatment effects. Since in (4.1) we assume that the true treatment effects follow a joint normal distribution, we may think the normal distribution with moments given by (4.14) and (4.15) is a reasonable approximation to the true distribution of $(Y, X)^T$ for large $N_a$ and $N_c$.

4.2 Model for the Multiple-contrast Clinical Trials

Besides the 9 single-contrast trials, there are 12 two-contrast trials, 1 three-contrast trial, and 1 four-contrast trial. In each of the two-contrast trials, there is a control arm, a high dose arm and a low dose arm.
For each of the 12 two-contrast trials, let $R_{a1}^{\text{true}}$ and $R_{a2}^{\text{true}}$ represent the true annualized relapse rate in the high dose arm and in the low dose arm respectively, and let $M_{a1}^{\text{true}}$ and $M_{a2}^{\text{true}}$ represent the true MRI lesion count per patient per scan in the high dose arm and in the low dose arm respectively. Then, the true treatment effects from the high dose versus control contrast can be expressed as $Y_1^{\text{true}} = \log(R_{a1}^{\text{true}} / R_c^{\text{true}})$ and $X_1^{\text{true}} = \log(M_{a1}^{\text{true}} / M_c^{\text{true}})$, and the true treatment effects from the low dose versus control contrast can be expressed as $Y_2^{\text{true}} = \log(R_{a2}^{\text{true}} / R_c^{\text{true}})$ and $X_2^{\text{true}} = \log(M_{a2}^{\text{true}} / M_c^{\text{true}})$. Here, we also omit the subscript $i$ for the $i$th trial.

To take into account the fact that these two pairs of true treatment effects, $(Y_1^{\text{true}}, X_1^{\text{true}})$ and $(Y_2^{\text{true}}, X_2^{\text{true}})$, are correlated, we assume a joint normal distribution for them. Focusing on $(Y_1^{\text{true}}, X_1^{\text{true}})$ or $(Y_2^{\text{true}}, X_2^{\text{true}})$ individually, the marginal distributions of both of these pairs should be the bivariate normal distribution (4.3). This is because we are examining the effects of treatments with similar mechanisms of action; whether two contrasts are from one trial or from different trials, they should reflect the same surrogacy relationship. However, to determine the joint distribution of these four quantities, we also need to specify the covariance structure between $(Y_1^{\text{true}}, X_1^{\text{true}})$ and $(Y_2^{\text{true}}, X_2^{\text{true}})$.
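Assuming independence among the true endpoints from different arms, we have:

\[ \mathrm{Cov}(Y_1^{\text{true}}, Y_2^{\text{true}}) = \mathrm{Cov}\!\left(\log \frac{R_{a1}^{\text{true}}}{R_c^{\text{true}}},\ \log \frac{R_{a2}^{\text{true}}}{R_c^{\text{true}}}\right) = \mathrm{Var}(\log R_c^{\text{true}}), \tag{4.16} \]

\[ \mathrm{Cov}(X_1^{\text{true}}, X_2^{\text{true}}) = \mathrm{Cov}\!\left(\log \frac{M_{a1}^{\text{true}}}{M_c^{\text{true}}},\ \log \frac{M_{a2}^{\text{true}}}{M_c^{\text{true}}}\right) = \mathrm{Var}(\log M_c^{\text{true}}), \tag{4.17} \]

\[ \mathrm{Cov}(Y_1^{\text{true}}, X_2^{\text{true}}) = \mathrm{Cov}\!\left(\log \frac{R_{a1}^{\text{true}}}{R_c^{\text{true}}},\ \log \frac{M_{a2}^{\text{true}}}{M_c^{\text{true}}}\right) = \mathrm{Cov}(\log R_c^{\text{true}}, \log M_c^{\text{true}}), \tag{4.18} \]

\[ \mathrm{Cov}(Y_2^{\text{true}}, X_1^{\text{true}}) = \mathrm{Cov}\!\left(\log \frac{R_{a2}^{\text{true}}}{R_c^{\text{true}}},\ \log \frac{M_{a1}^{\text{true}}}{M_c^{\text{true}}}\right) = \mathrm{Cov}(\log R_c^{\text{true}}, \log M_c^{\text{true}}). \tag{4.19} \]

In principle, these covariances represent 3 new parameters in the joint distribution of $(Y_1^{\text{true}}, X_1^{\text{true}}, Y_2^{\text{true}}, X_2^{\text{true}})^T$ in addition to the parameters $\alpha$, $\beta$, $\mu_X$, $\sigma_X^2$, $\tau^2$ that appear in (4.3). However, note that $\mathrm{Var}(Y_1^{\text{true}}) = \mathrm{Var}(\log R_{a1}^{\text{true}}) + \mathrm{Var}(\log R_c^{\text{true}})$, where $\mathrm{Var}(\log R_{a1}^{\text{true}})$ represents the variability of the log of the true annualized relapse rate in the high dose arm across trials and $\mathrm{Var}(\log R_c^{\text{true}})$ represents the variability of the log of the true annualized relapse rate in the control arm across trials. So, even though in a given trial $\log R_{a1}^{\text{true}}$ and $\log R_c^{\text{true}}$ may be quite different due to the treatment effect, the two variabilities across trials may not differ too much. To simplify our model, we assume $\mathrm{Var}(\log R_{a1}^{\text{true}}) = \mathrm{Var}(\log R_c^{\text{true}})$. Under this assumption, from (4.3), we obtain:

\[ \mathrm{Cov}(Y_1^{\text{true}}, Y_2^{\text{true}}) = \mathrm{Var}(\log R_c^{\text{true}}) = \tfrac{1}{2} \mathrm{Var}(Y_1^{\text{true}}) = \tfrac{1}{2}(\beta^2 \sigma_X^2 + \tau^2). \tag{4.20} \]

The assumption that $\mathrm{Var}(\log M_{a1}^{\text{true}}) = \mathrm{Var}(\log M_c^{\text{true}})$ similarly leads to:

\[ \mathrm{Cov}(X_1^{\text{true}}, X_2^{\text{true}}) = \mathrm{Var}(\log M_c^{\text{true}}) = \tfrac{1}{2} \mathrm{Var}(X_1^{\text{true}}) = \tfrac{1}{2} \sigma_X^2. \tag{4.21} \]

At the same time, note that $\mathrm{Cov}(Y_1^{\text{true}}, X_1^{\text{true}}) = \mathrm{Cov}\!\left(\log \frac{R_{a1}^{\text{true}}}{R_c^{\text{true}}}, \log \frac{M_{a1}^{\text{true}}}{M_c^{\text{true}}}\right) = \mathrm{Cov}(\log R_{a1}^{\text{true}}, \log M_{a1}^{\text{true}}) + \mathrm{Cov}(\log R_c^{\text{true}}, \log M_c^{\text{true}})$, where $\mathrm{Cov}(\log R_{a1}^{\text{true}}, \log M_{a1}^{\text{true}})$ measures how closely the two true endpoints on the high dose arm are related across trials, and $\mathrm{Cov}(\log R_c^{\text{true}}, \log M_c^{\text{true}})$ measures how closely the two true endpoints on the control arm are related across trials. Even though the true relationship between the two endpoints on the high dose arm may be quite different from that on the control arm, the two measures of closeness may not differ too much. Thus, to simplify our model, we assume $\mathrm{Cov}(\log R_{a1}^{\text{true}}, \log M_{a1}^{\text{true}}) = \mathrm{Cov}(\log R_c^{\text{true}}, \log M_c^{\text{true}})$. Under this assumption, from (4.3), we obtain:

\[ \mathrm{Cov}(Y_1^{\text{true}}, X_2^{\text{true}}) = \mathrm{Cov}(Y_2^{\text{true}}, X_1^{\text{true}}) = \mathrm{Cov}(\log R_c^{\text{true}}, \log M_c^{\text{true}}) = \tfrac{1}{2} \mathrm{Cov}(Y_1^{\text{true}}, X_1^{\text{true}}) = \tfrac{1}{2} \beta \sigma_X^2. \tag{4.22} \]

All these assumptions lead to the joint distribution of the true treatment effects in a two-contrast trial:

\[ \begin{pmatrix} Y_1^{\text{true}} \\ X_1^{\text{true}} \\ Y_2^{\text{true}} \\ X_2^{\text{true}} \end{pmatrix} \sim N_4\!\left( \begin{pmatrix} \alpha + \beta \mu_X \\ \mu_X \\ \alpha + \beta \mu_X \\ \mu_X \end{pmatrix},\ \begin{pmatrix} \beta^2 \sigma_X^2 + \tau^2 & \beta \sigma_X^2 & \tfrac{1}{2}(\beta^2 \sigma_X^2 + \tau^2) & \tfrac{1}{2} \beta \sigma_X^2 \\ \beta \sigma_X^2 & \sigma_X^2 & \tfrac{1}{2} \beta \sigma_X^2 & \tfrac{1}{2} \sigma_X^2 \\ \tfrac{1}{2}(\beta^2 \sigma_X^2 + \tau^2) & \tfrac{1}{2} \beta \sigma_X^2 & \beta^2 \sigma_X^2 + \tau^2 & \beta \sigma_X^2 \\ \tfrac{1}{2} \beta \sigma_X^2 & \tfrac{1}{2} \sigma_X^2 & \beta \sigma_X^2 & \sigma_X^2 \end{pmatrix} \right). \tag{4.23} \]

To derive the probabilistic structure of the estimated treatment effects in a two-contrast trial, we first focus on the conditional distribution of the observed endpoints given the true endpoints. Let $\tilde{U} = (R_{a1}, R_{a2}, R_c, M_{a1}, M_{a2}, M_c)^T$ and $\tilde{U}^{\text{true}} = (R_{a1}^{\text{true}}, R_{a2}^{\text{true}}, R_c^{\text{true}}, M_{a1}^{\text{true}}, M_{a2}^{\text{true}}, M_c^{\text{true}})^T$ represent the observed and true endpoints respectively. We assume that $\log \tilde{U} \mid \tilde{U}^{\text{true}}$ has the same stochastic behavior as $\log U \mid U^{\text{true}}$ in the single-contrast trials. Then, as in (4.12), we have:

\[ \log \tilde{U} \mid \tilde{U}^{\text{true}} \approx N_6\!\left( \log \tilde{U}^{\text{true}},\ \mathrm{diag}\!\left( \frac{\phi_1}{T N_{a1} R_{a1}^{\text{true}}},\ \frac{\phi_1}{T N_{a2} R_{a2}^{\text{true}}},\ \frac{\phi_1}{T N_c R_c^{\text{true}}},\ \frac{\phi_2}{K N_{a1} M_{a1}^{\text{true}}},\ \frac{\phi_2}{K N_{a2} M_{a2}^{\text{true}}},\ \frac{\phi_2}{K N_c M_c^{\text{true}}} \right) \right), \tag{4.24} \]

where “diag” indicates a diagonal matrix.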
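The block structure of the covariance matrix in (4.23) can be made concrete with a short sketch. The Python function below is a hypothetical helper written for illustration, with made-up parameter values; it assembles the 4 by 4 covariance matrix of $(Y_1^{\text{true}}, X_1^{\text{true}}, Y_2^{\text{true}}, X_2^{\text{true}})^T$ from β, σ_X² and τ², using the fact that, under the simplifying assumptions above, the between-contrast block is exactly one half of the within-contrast block.

```python
def true_effect_cov(beta, sigma_x2, tau2):
    """Covariance matrix of (Y1, X1, Y2, X2) true effects, as in (4.23)."""
    var_y = beta ** 2 * sigma_x2 + tau2
    cov_yx = beta * sigma_x2
    # Within-contrast block: the bivariate covariance matrix from (4.3).
    within = [[var_y, cov_yx], [cov_yx, sigma_x2]]
    # Between-contrast block: half of the within block, by (4.20)-(4.22).
    between = [[0.5 * value for value in row] for row in within]
    upper = [within[r] + between[r] for r in range(2)]
    lower = [between[r] + within[r] for r in range(2)]
    return upper + lower


# Illustrative parameter values only (not the fitted values).
cov = true_effect_cov(beta=0.6, sigma_x2=0.5, tau2=0.01)
for row in cov:
    print(["%.3f" % value for value in row])
```

One consequence of this structure is that a two-contrast trial introduces no parameters beyond those of (4.3): the whole matrix is determined by β, σ_X² and τ².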
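Then, combining (4.23) and (4.24), the estimated treatment effects, $Y_1 = \log(R_{a1}/R_c)$, $Y_2 = \log(R_{a2}/R_c)$, $X_1 = \log(M_{a1}/M_c)$ and $X_2 = \log(M_{a2}/M_c)$, have the following approximations to their first two moments:

\[ (Y_1, X_1, Y_2, X_2)^T \approx \left( \begin{pmatrix} \mu \\ \mu \end{pmatrix},\ \begin{pmatrix} \Sigma_1 & \Sigma_3 \\ \Sigma_3 & \Sigma_2 \end{pmatrix} \right), \tag{4.25} \]

where

\[ \mu = \begin{pmatrix} \alpha + \beta \mu_X \\ \mu_X \end{pmatrix}, \]

\[ \Sigma_1 = \begin{pmatrix} (\beta^2 \sigma_X^2 + \tau^2) + \frac{\phi_1}{T N_{a1}} E\!\left(\frac{1}{R_{a1}^{\text{true}}}\right) + \frac{\phi_1}{T N_c} E\!\left(\frac{1}{R_c^{\text{true}}}\right) & \beta \sigma_X^2 \\ \beta \sigma_X^2 & \sigma_X^2 + \frac{\phi_2}{K N_{a1}} E\!\left(\frac{1}{M_{a1}^{\text{true}}}\right) + \frac{\phi_2}{K N_c} E\!\left(\frac{1}{M_c^{\text{true}}}\right) \end{pmatrix}, \]

\[ \Sigma_2 = \begin{pmatrix} (\beta^2 \sigma_X^2 + \tau^2) + \frac{\phi_1}{T N_{a2}} E\!\left(\frac{1}{R_{a2}^{\text{true}}}\right) + \frac{\phi_1}{T N_c} E\!\left(\frac{1}{R_c^{\text{true}}}\right) & \beta \sigma_X^2 \\ \beta \sigma_X^2 & \sigma_X^2 + \frac{\phi_2}{K N_{a2}} E\!\left(\frac{1}{M_{a2}^{\text{true}}}\right) + \frac{\phi_2}{K N_c} E\!\left(\frac{1}{M_c^{\text{true}}}\right) \end{pmatrix}, \]

and

\[ \Sigma_3 = \begin{pmatrix} \tfrac{1}{2}(\beta^2 \sigma_X^2 + \tau^2) + \frac{\phi_1}{T N_c} E\!\left(\frac{1}{R_c^{\text{true}}}\right) & \tfrac{1}{2} \beta \sigma_X^2 \\ \tfrac{1}{2} \beta \sigma_X^2 & \tfrac{1}{2} \sigma_X^2 + \frac{\phi_2}{K N_c} E\!\left(\frac{1}{M_c^{\text{true}}}\right) \end{pmatrix}. \]

As in the single-contrast trial, the marginal distribution of the estimated treatment effects is difficult to derive, since we need to make additional distributional assumptions about $\tilde{U}^{\text{true}}$. As before, we may think the normal distribution with moments given by (4.25) is a reasonable approximation to the true distribution of $(Y_1, X_1, Y_2, X_2)^T$ for large $N_{a1}$, $N_{a2}$ and $N_c$.

For the single three-contrast trial we have 6 estimated treatment effects, and for the single four-contrast trial we have 8 estimated treatment effects. Deriving the first two moments of those 6 and 8 estimated treatment effects proceeds analogously to the above development for the 4 estimated treatment effects in the two-contrast trial.

4.3 Parameter Estimation

From (4.25), we have approximations to the first two moments of the estimated treatment effects. In order to estimate the model parameters, we use the normal estimating equations: that is, we pretend the estimated treatment effects are multivariate normally distributed with the mean vector and variance-covariance matrix given by (4.25). Then maximum likelihood estimates (MLE) of the model parameters are obtained by maximizing the “normal likelihood”.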
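To make the idea of a “normal likelihood” concrete, the following Python sketch evaluates the contribution of one single-contrast trial, using the moments in (4.14) and (4.15) and the closed form of the bivariate normal density. It is a simplified illustration rather than the fitting code used for this report: the function names are hypothetical, the example parameter values and trial characteristics are made up, the expectations of the reciprocal true endpoints are treated as known constants (defaulting to the plug-in values 1.43, 1.10, 0.57 and 0.41 reported in Section 4.3), and the multiple-contrast trials, which require the larger covariance matrix in (4.25), are not handled.

```python
import math


def single_contrast_moments(params, t_years, k_scans, n_a, n_c,
                            inv=(1.43, 1.10, 0.57, 0.41)):
    """Approximate mean and covariance of (Y, X), as in (4.14)-(4.15)."""
    alpha, beta, mu_x, sigma_x2, tau2, phi1, phi2 = params
    inv_ra, inv_rc, inv_ma, inv_mc = inv
    mean = (alpha + beta * mu_x, mu_x)
    var_y = (beta ** 2 * sigma_x2 + tau2
             + phi1 / t_years * (inv_ra / n_a + inv_rc / n_c))
    var_x = sigma_x2 + phi2 / k_scans * (inv_ma / n_a + inv_mc / n_c)
    cov_yx = beta * sigma_x2
    return mean, (var_y, cov_yx, var_x)


def neg_log_lik(params, y, x, t_years, k_scans, n_a, n_c):
    """Negative log bivariate normal density of one observed (Y, X)."""
    (mean_y, mean_x), (var_y, cov_yx, var_x) = single_contrast_moments(
        params, t_years, k_scans, n_a, n_c)
    det = var_y * var_x - cov_yx ** 2
    dy, dx = y - mean_y, x - mean_x
    quad = (var_x * dy * dy - 2.0 * cov_yx * dy * dx + var_y * dx * dx) / det
    return math.log(2.0 * math.pi) + 0.5 * math.log(det) + 0.5 * quad


# Made-up parameter values: alpha, beta, mu_X, sigma_X^2, tau^2, phi1, phi2.
example_params = (0.0, 0.6, -0.7, 0.5, 0.01, 1.0, 2.0)
print(neg_log_lik(example_params, y=-0.5, x=-0.8,
                  t_years=1.0, k_scans=6, n_a=50, n_c=50))
```

Summing such terms over all trials (with the appropriate 4-, 6- or 8-dimensional densities for the multiple-contrast trials) would give the objective function that is minimized over the parameters.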
In addition to the parameters of primary interest, $\alpha$, $\beta$, $\mu_X$, $\sigma_X^2$, $\tau^2$, $\phi_1$ and $\phi_2$, there are several nuisance parameters in the covariance matrices that appear in this “likelihood” function, namely the expectations of the reciprocals of the true relapse rates and lesion counts, such as $E(1/R_c^{\text{true}})$ and $E(1/M_c^{\text{true}})$ in (4.25). When fitting the model, to avoid having too many parameters to estimate in the maximization procedure, we treat these terms as known and replace them by estimates.

As mentioned in Section 3.3.1, we assume that all the $R_a^{\text{true}}$s in different contrasts have the same distribution and all the $R_c^{\text{true}}$s in different contrasts also have the same distribution. As a result:

\[ E(R_a^{\text{true}}) = E(R_{a1}^{\text{true}}) = E(R_{a2}^{\text{true}}), \quad \text{for all the contrasts.} \tag{4.26} \]

Also, from (3.8), (3.9) and (3.10), we know that:

\[ E(R_a) = E\big(E(R_a \mid R_a^{\text{true}})\big) = E(R_a^{\text{true}}). \tag{4.27} \]

From the delta method, we have the rough approximation:

\[ E\!\left(\frac{1}{R_a^{\text{true}}}\right) \approx \frac{1}{E(R_a^{\text{true}})} = \frac{1}{E(R_a)}. \tag{4.28} \]

This means that we can use the observed annualized relapse rates to estimate the nuisance parameter $E(1/R_a^{\text{true}})$. From the total of 40 contrasts, we estimate $E(1/R_a^{\text{true}})$ by the inverse of the average value of the 40 observed annualized relapse rates on the active arms. We estimate $E(1/R_c^{\text{true}})$ similarly using the observed annualized relapse rates on the 23 control arms. By the same argument, we estimate $E(1/M_a^{\text{true}})$ and $E(1/M_c^{\text{true}})$ by using the observed MRI lesion counts per patient per scan from the 40 active arms and the 23 control arms respectively. As a result, we have $\hat{E}(1/R_a^{\text{true}}) \approx 1.43$, $\hat{E}(1/R_c^{\text{true}}) \approx 1.10$, $\hat{E}(1/M_a^{\text{true}}) \approx 0.57$ and $\hat{E}(1/M_c^{\text{true}}) \approx 0.41$.

To maximize the “normal likelihood”, we use the R function optim. The maximization procedure is based on the Nelder-Mead method [16].
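The plug-in estimate of the nuisance terms described in this section, the reciprocal of the average observed rate as in (4.28), can be sketched as follows. The rates below are made-up numbers, not the SBRCMB data, and the function names are hypothetical. For comparison, the sketch also computes the average of the reciprocals of the same values. By Jensen's inequality, $E(1/R) \geq 1/E(R)$ for any positive random variable $R$, so the approximation in (4.28) can only err on the low side, which is one reason to regard it as rough.

```python
def plug_in_inverse_mean(rates):
    """Estimate E(1/R_true) by 1 / mean(observed rates), as in (4.28)."""
    return len(rates) / sum(rates)


def mean_of_inverses(rates):
    """Average of the reciprocals, shown only for comparison."""
    return sum(1.0 / r for r in rates) / len(rates)


# Hypothetical observed annualized relapse rates on three active arms.
hypothetical_rates = [0.5, 1.0, 0.6]
plug_in = plug_in_inverse_mean(hypothetical_rates)
direct = mean_of_inverses(hypothetical_rates)
print(plug_in, direct)  # the plug-in value is never larger than the direct one
```

The gap between the two numbers grows with the spread of the rates, so the plug-in approach is most defensible when the rates do not vary much across arms.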
The optimization process is “two-staged”: after obtaining the optimized parameter estimates from each initial value, we set these as a new initial value and run the optimization again to obtain a final result. The reason for using two stages is that the first stage often stops at a local minimum. To avoid negative estimates for $\sigma_X$ and $\tau$ in the optimization, we re-parameterize them as $\eta_X = \log(\sigma_X)$ and $\eta = \log(\tau)$.

The first set of initial values for $\hat{\alpha}$, $\hat{\beta}$, $\hat{\mu}_X$, $\hat{\eta}_X$, $\hat{\eta}$, $\hat{\phi}_1$ and $\hat{\phi}_2$ were -0.02, 0.55, -0.69, -0.04, -1.21, 1.5 and 1.5. The values for $\hat{\alpha}$ and $\hat{\beta}$ are from the SBRCMB result, the values for $\hat{\mu}_X$, $\hat{\eta}_X$ and $\hat{\eta}$ are based on the method of moments, and the values for $\hat{\phi}_1$ and $\hat{\phi}_2$ are chosen somewhat arbitrarily. We then tried 999 different sets of random initial values, generating these initial values from independent uniform distributions. Specifically, we generate initial values for $\hat{\alpha}$, $\hat{\beta}$, $\hat{\mu}_X$, $\hat{\eta}_X$, $\hat{\eta}$, $\hat{\phi}_1$ and $\hat{\phi}_2$ uniformly on $(-0.5, 0.5)$, $(0, 1)$, $(-2, 0)$, $(-4.5, 0.5)$, $(-5, 0)$, $(0.01, 10)$ and $(0.01, 20)$ respectively. Nearly all of these initial values led to convergence to a very similar optimization result. We choose the estimate which returned the smallest negative log “likelihood” as the final solution.

To calculate the standard errors of the parameter estimates based on the asymptotic normality of the MLE, we invert the negative Hessian matrix of the log “likelihood” function evaluated at the parameter estimates. We also calculate standard errors for the parameter estimates based on the jackknife method, where we consider the 23 clinical trials as units and estimate the parameters after “leaving one out”. We generate 23 different subsets of the original 23 clinical trials; the $i$th subset is without the $i$th clinical trial. If the estimate of $\beta$ from the $i$th subset is $\hat{\beta}_{(i)}$, then the jackknife estimate of the standard error of $\hat{\beta}$ is given by $\left[\frac{22}{23} \sum_{i=1}^{23} (\hat{\beta}_{(i)} - \hat{\beta}_{(\cdot)})^2\right]^{0.5}$, where $\hat{\beta}_{(\cdot)}$
is the average of all the $\hat{\beta}_{(i)}$s [17]. Strictly speaking, this is not an appropriate application of the jackknife method, since different trials have different numbers of patients and different numbers of arms, which causes the estimation errors in different trials to be not identically distributed. So the resulting estimated standard errors should be viewed as only “rough and ready” approximations. The parameter estimates and the corresponding estimated standard errors are shown in Table 4.1, and the estimated asymptotic correlation matrix of $\hat{\alpha}$, $\hat{\beta}$, $\hat{\mu}_X$, $\hat{\sigma}_X^2$, $\hat{\tau}^2$, $\hat{\phi}_1$ and $\hat{\phi}_2$ based on the MLE method is:

$$\hat{R} = \begin{pmatrix}
1.000 & 0.776 & -0.056 & -0.394 & -0.002 & -0.442 & 0.468 \\
0.776 & 1.000 & 0.108 & -0.479 & -0.007 & -0.414 & 0.444 \\
-0.056 & 0.108 & 1.000 & -0.106 & -0.002 & -0.158 & 0.194 \\
-0.394 & -0.479 & -0.106 & 1.000 & 0.003 & 0.215 & -0.410 \\
-0.002 & -0.007 & -0.002 & 0.003 & 1.000 & -0.001 & -0.004 \\
-0.442 & -0.414 & -0.158 & 0.215 & -0.001 & 1.000 & -0.557 \\
0.468 & 0.444 & 0.194 & -0.410 & -0.004 & -0.557 & 1.000
\end{pmatrix} \tag{4.29}$$

Table 4.1: Results of the Model Fit

                $\hat{\alpha}$   $\hat{\beta}$   $\hat{\mu}_X$   $\hat{\sigma}_X^2$   $\hat{\tau}^2$   $\hat{\phi}_1$   $\hat{\phi}_2$
Value           0.081    0.622    -0.713   0.521    < 0.001   0.825    37.427
Normal SE       0.084    0.074    0.156    0.167    < 0.001   0.383    19.932
Jackknife SE    0.105    0.150    0.179    0.198    0.003     0.498    33.496

Although all the jackknife standard errors are larger than the corresponding MLE standard errors, the results of the statistical tests for significance of the estimates are consistent between these two methods (except for $\hat{\phi}_1$).

Recall that $Y^{\mathrm{true}} = \log(R_a^{\mathrm{true}} / R_c^{\mathrm{true}})$ and $X^{\mathrm{true}} = \log(M_a^{\mathrm{true}} / M_c^{\mathrm{true}})$. When a treatment has a beneficial effect, we expect a lower MRI lesion count and a smaller relapse rate, which means $Y^{\mathrm{true}} < 0$ and $X^{\mathrm{true}} < 0$. Therefore, an increase in the true treatment effect corresponds to a decrease in $Y^{\mathrm{true}}$ and in $X^{\mathrm{true}}$. So, $\hat{\beta} = 0.622$ means that on average, a one unit increase in the true treatment effect on the MRI lesion count per patient per scan is associated with a 0.622 unit increase in the true treatment effect on the annualized relapse rate.
Note that this value is larger than the $\hat{\beta} = 0.55$ obtained with the SBRCMB approach (see Table 3.1). As the SBRCMB approach didn’t take into account the estimation errors, their regression coefficient of 0.55 may underestimate the association between the true treatment effects due to the attenuation effect. Although the value for $\hat{\alpha}$ of 0.081 is larger than the $\hat{\alpha} = -0.02$ from the SBRCMB approach, its approximate 95% confidence interval still covers 0. The estimate $\hat{\alpha}$ being not significantly different from 0 is consistent with a good surrogacy relationship, since there is no strong indication of part of the true treatment effect on the annualized relapse rate being unexplained by the true treatment effect on the MRI lesion count per patient per scan. Finally, the value for $\hat{\tau}^2$ is almost 0, which suggests a nearly perfect linear relationship between the true treatment effects. One can predict the true treatment effect on the annualized relapse rate almost without error based on the true treatment effect on the MRI lesion count per patient per scan. As mentioned at the end of Section 4.1.1, the Prentice definition requires that $\alpha = 0$ and $\tau^2 = 0$. So, under our model assumptions, the MRI lesion count per patient per scan appears to be a very good surrogate endpoint.

Buyse et al. [13] suggest using $R^2_{\mathrm{trial}}$ to evaluate the true surrogacy relationship. Analogous to (2.14) and (2.15), $\beta^2 \sigma_X^2 + \tau^2$ represents the uncertainty of predicting the true treatment effect on the clinical endpoint without the information of the surrogate endpoint, and $\tau^2$ represents the uncertainty with the information of the surrogate endpoint. Thus, the difference $\beta^2 \sigma_X^2$ represents how much we gain from using the surrogate. From Table 4.1, we have

$$\hat{R}^2_{\mathrm{trial}} = \frac{\hat{\beta}^2 \hat{\sigma}_X^2}{\hat{\beta}^2 \hat{\sigma}_X^2 + \hat{\tau}^2} \approx 1. \tag{4.30}$$

The estimate of $R^2_{\mathrm{trial}}$ of almost 1 suggests a very good surrogacy relationship.
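As a quick numerical check of (4.30), the following minimal Python sketch plugs in the point estimates from Table 4.1. Since the table only reports $\hat{\tau}^2 < 0.001$, the value 0.001 is used here as an assumed upper bound, so the result is a lower bound on $\hat{R}^2_{\mathrm{trial}}$ rather than the thesis's exact figure. The function name `r2_trial` is illustrative and not part of the original analysis, which was carried out in R.

```python
# Point estimates from Table 4.1.
BETA_HAT = 0.622
SIGMA2_X_HAT = 0.521
# Assumption: Table 4.1 only reports tau^2 < 0.001, so 0.001 is an upper bound.
TAU2_UPPER = 0.001


def r2_trial(beta, sigma2_x, tau2):
    """Trial-level R^2 as in (4.30): explained variance over total variance."""
    explained = beta ** 2 * sigma2_x
    return explained / (explained + tau2)


# With tau^2 at its upper bound, this is a lower bound on the estimated R^2_trial.
r2_lower = r2_trial(BETA_HAT, SIGMA2_X_HAT, TAU2_UPPER)
print(f"R^2_trial is at least {r2_lower:.3f}")
```

Any smaller value of $\hat{\tau}^2$ pushes the ratio closer to 1, which is consistent with the "almost 1" reported in the text.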
As a result, we can say that, at the trial level, the MRI lesion count per patient per scan has been validated as a surrogate endpoint for the annualized relapse rate in RRMS. However, the estimate of $\tau^2$ being almost 0 or the estimate of $R^2_{\mathrm{trial}}$ being almost 1 may not guarantee a high precision in predicting the true treatment effect on the annualized relapse rate in a new trial. In Section 4.5, we will assess this using the estimated surrogacy relationship to make such predictions.

As noted earlier, the jackknife method may not be very appropriate since the 23 trials which we treat as units cannot be considered as a random sample. Of course, the standard errors calculated by the MLE method are also approximate, because we don’t have the true likelihood. In the following sections, we use the standard errors based on the asymptotic normality of the MLE to develop our results.

4.4 Comparison between the Comprehensive Approach and the SBRCMB Approach

In a contrast from a new clinical trial (we use the subscript “0” to denote this new contrast), if we know the true treatment effect on the MRI lesion count per patient per scan, $X_0^{\mathrm{true}}$, we can use it to predict the true treatment effect on the annualized relapse rate, $Y_0^{\mathrm{true}}$. In practice, however, there are only a limited number of patients included in any trial and we only have the estimated treatment effect $X_0$. So, we need to use $X_0$ instead of $X_0^{\mathrm{true}}$ to predict $Y_0^{\mathrm{true}}$; that is, we want to use the surrogacy relationship to predict the treatment effect on the clinical endpoint based on the estimated treatment effect on the surrogate endpoint.

To identify the relationship between $Y_0^{\mathrm{true}}$ and $X_0$, first note that $\mathrm{Cov}(Y_0^{\mathrm{true}}, X_0) = E(Y_0^{\mathrm{true}} X_0) - E(Y_0^{\mathrm{true}}) E(X_0)$. We assume this new trial has similar inclusion criteria and involves the same type of treatment as the 23 trials included in the SBRCMB dataset. So, from (4.3) and (4.14), we have $E(X_0) \approx E(X_0^{\mathrm{true}})$.
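Let $U_0^{\mathrm{true}} = (R_{a0}^{\mathrm{true}}, R_{c0}^{\mathrm{true}}, M_{a0}^{\mathrm{true}}, M_{c0}^{\mathrm{true}})^T$ denote the true endpoints from the new contrast. Then, from (4.12), we have $E(Y_0^{\mathrm{true}} X_0) = E(E(Y_0^{\mathrm{true}} X_0 \mid U_0^{\mathrm{true}})) \approx E(Y_0^{\mathrm{true}} X_0^{\mathrm{true}})$. Therefore:

$$\mathrm{Cov}(Y_0^{\mathrm{true}}, X_0) \approx E(Y_0^{\mathrm{true}} X_0^{\mathrm{true}}) - E(Y_0^{\mathrm{true}}) E(X_0^{\mathrm{true}}) = \mathrm{Cov}(Y_0^{\mathrm{true}}, X_0^{\mathrm{true}}). \tag{4.31}$$

As a result, we have the following approximation to the moment structure for $Y_0^{\mathrm{true}}$ and $X_0$:

$$\begin{pmatrix} Y_0^{\mathrm{true}} \\ X_0 \end{pmatrix} \approx \left( \begin{pmatrix} \alpha + \beta \mu_X \\ \mu_X \end{pmatrix}, \begin{pmatrix} \beta^2 \sigma_X^2 + \tau^2 & \beta \sigma_X^2 \\ \beta \sigma_X^2 & \sigma_X^2 + \frac{\phi_2}{K_0 N_{a0}} E\left(\frac{1}{M_{a0}^{\mathrm{true}}}\right) + \frac{\phi_2}{K_0 N_{c0}} E\left(\frac{1}{M_{c0}^{\mathrm{true}}}\right) \end{pmatrix} \right), \tag{4.32}$$

where $K_0$ is the total number of scans on each patient in the new trial, and $N_{a0}$, $N_{c0}$ are the numbers of patients in the active and control arms in the new trial respectively.

The point prediction for $Y_0^{\mathrm{true}}$ can be based on $E(Y_0^{\mathrm{true}} \mid X_0)$, but determination of a prediction interval for $Y_0^{\mathrm{true}}$ requires information on the conditional distribution of $Y_0^{\mathrm{true}}$ given $X_0$. To derive this distribution, we use the normal distribution with moments given by (4.32) as an approximation to the joint distribution of $Y_0^{\mathrm{true}}$ and $X_0$. The joint distribution is unknown, but as $N_{a0}$ and $N_{c0}$, the numbers of patients included in this new trial, become larger, the estimation error in the estimated treatment effect $X_0$ becomes smaller, and the estimated treatment effect approaches the true treatment effect $X_0^{\mathrm{true}}$. We may think the bivariate normal distribution is a reasonable approximation for large $N_{a0}$ and $N_{c0}$.

Under this bivariate normal approximation, we have:

$$E(Y_0^{\mathrm{true}} \mid X_0) = \alpha + \beta \mu_X \left(1 - \frac{\sigma_X^2}{\sigma_X^2 + H_0}\right) + \frac{\beta \sigma_X^2}{\sigma_X^2 + H_0} X_0, \tag{4.33}$$

$$\mathrm{Var}(Y_0^{\mathrm{true}} \mid X_0) = \beta^2 \sigma_X^2 \left(1 - \frac{\sigma_X^2}{\sigma_X^2 + H_0}\right) + \tau^2, \tag{4.34}$$

where $H_0 = \frac{\phi_2}{K_0 N_{a0}} E\left(\frac{1}{M_{a0}^{\mathrm{true}}}\right) + \frac{\phi_2}{K_0 N_{c0}} E\left(\frac{1}{M_{c0}^{\mathrm{true}}}\right)$. So, the point prediction of $Y_0^{\mathrm{true}}$ from a future contrast, given the value of $X_0 = x_0$ from that contrast, is:

$$\hat{Y}_0^{\mathrm{true}}(x_0) = \hat{E}(Y_0^{\mathrm{true}} \mid X_0 = x_0) = \hat{\alpha} + \hat{\beta} \hat{\mu}_X \left(1 - \frac{\hat{\sigma}_X^2}{\hat{\sigma}_X^2 + \hat{H}_0}\right) + \frac{\hat{\beta} \hat{\sigma}_X^2}{\hat{\sigma}_X^2 + \hat{H}_0} x_0, \tag{4.35}$$

where $\hat{H}_0 = \frac{\hat{\phi}_2}{K_0 N_{a0}} E\left(\frac{1}{M_{a0}^{\mathrm{true}}}\right) + \frac{\hat{\phi}_2}{K_0 N_{c0}} E\left(\frac{1}{M_{c0}^{\mathrm{true}}}\right)$.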
As earlier, $E(1/M_{a0}^{\mathrm{true}})$ and $E(1/M_{c0}^{\mathrm{true}})$ will be treated as known and replaced by the inverse of the average value of the 40 and 23 MRI lesion counts per patient per scan from the active and control arms in the SBRCMB dataset.

The prediction interval for $Y_0^{\mathrm{true}}$ given $X_0 = x_0$ can be based on the random variable:

$$W_0 = Y_0^{\mathrm{true}}(x_0) - \hat{Y}_0^{\mathrm{true}}(x_0). \tag{4.36}$$

Note that given $X_0 = x_0$, $Y_0^{\mathrm{true}}(x_0)$ and $\hat{Y}_0^{\mathrm{true}}(x_0)$ are independent, so $\mathrm{Var}(W_0) = \mathrm{Var}(Y_0^{\mathrm{true}}(x_0)) + \mathrm{Var}(\hat{Y}_0^{\mathrm{true}}(x_0))$. From (4.34), we know that $\mathrm{Var}(Y_0^{\mathrm{true}}(x_0)) = \beta^2 \sigma_X^2 \left(1 - \frac{\sigma_X^2}{\sigma_X^2 + H_0}\right) + \tau^2$. Furthermore, the delta method can be used to approximate $\mathrm{Var}(\hat{Y}_0^{\mathrm{true}}(x_0))$. Specifically, let $\Sigma_W$ denote the asymptotic covariance matrix of $\hat{\alpha}$, $\hat{\beta}$, $\hat{\mu}_X$, $\hat{\sigma}_X^2$ and $\hat{\phi}_2$, and let $g$ denote the partial derivatives of $E(Y_0^{\mathrm{true}} \mid X_0 = x_0)$ with respect to $\alpha$, $\beta$, $\mu_X$, $\sigma_X^2$ and $\phi_2$ (see Appendix B). Then:

$$\mathrm{Var}(\hat{Y}_0^{\mathrm{true}}(x_0)) \approx g^T \cdot \Sigma_W \cdot g. \tag{4.37}$$

As a result,

$$\mathrm{Var}(W_0) \approx \beta^2 \sigma_X^2 \left(1 - \frac{\sigma_X^2}{\sigma_X^2 + H_0}\right) + \tau^2 + g^T \cdot \Sigma_W \cdot g. \tag{4.38}$$

Note that $W_0$ is asymptotically normally distributed, so the approximate 95% prediction interval for $Y_0^{\mathrm{true}}(x_0)$ can be given by:

$$\hat{Y}_0^{\mathrm{true}}(x_0) \pm 1.96 \sqrt{\widehat{\mathrm{Var}}(W_0)}, \tag{4.39}$$

where $\widehat{\mathrm{Var}}(W_0) = \hat{\beta}^2 \hat{\sigma}_X^2 \left(1 - \frac{\hat{\sigma}_X^2}{\hat{\sigma}_X^2 + \hat{H}_0}\right) + \hat{\tau}^2 + \hat{g}^T \cdot \hat{\Sigma}_W \cdot \hat{g}$, and $\hat{g}$, $\hat{\Sigma}_W$ are the partial derivatives and the asymptotic variance-covariance matrix of the parameter estimators evaluated at their estimated values.

Figure 4.1 shows the comparison between the SBRCMB results and the comprehensive results in predicting $Y_0^{\mathrm{true}}$ from $X_0$. Although the regression relationship modeled in the SBRCMB approach is between the two estimated treatment effects, for this purpose, we pretend it is between the true treatment effect on the clinical endpoint and the estimated treatment effect on the surrogate endpoint. The SBRCMB prediction line is $y = -0.02 + 0.55x$ while the prediction line for the comprehensive model is given by (4.35).
To allow a specific illustration in the figure, we fixed $K_0$ at 6 (the median number of total scans among the 40 contrasts in the SBRCMB dataset) and $N_{a0}$, $N_{c0}$ at 50 (the median number of patients among the 23 placebo and 40 active arms in the SBRCMB dataset); for these values, $\hat{\alpha} + \hat{\beta} \hat{\mu}_X \left(1 - \frac{\hat{\sigma}_X^2}{\hat{\sigma}_X^2 + \hat{H}_0}\right) \approx 0$ and $\frac{\hat{\beta} \hat{\sigma}_X^2}{\hat{\sigma}_X^2 + \hat{H}_0} \approx 0.50$, so (4.35) becomes $y = 0.50x$. The points represent the 40 pairs of estimated treatment effects from the SBRCMB dataset.

[Figure 4.1 plot: x-axis, Estimated Treatment Effect on the Surrogate Endpoint (−4 to 1); y-axis, True Treatment Effect on the Clinical Endpoint (−2.0 to 1.5); legend: SBRCMB, Comprehensive.]
Figure 4.1: Regression Prediction Lines: the SBRCMB Approach ($y = -0.02 + 0.55x$) and the Comprehensive Approach with $K_0 = 6$ and $N_{a0} = N_{c0} = 50$ ($y = 0.50x$).

From Figure 4.1, we can see that for $X$ between -4 and 1 (the range of $X$ in the SBRCMB dataset), the two prediction lines don’t differ much: the point predictions for $Y_0^{\mathrm{true}}$ based on $X_0$ from these two approaches are close. However, when $X < 0$, the prediction line from the comprehensive approach is above that from the SBRCMB approach. Note that, when $X_0 < 0$, the treatment in the new trial shows a beneficial effect on the surrogate endpoint. When $Y_0^{\mathrm{true}} < 0$, the true treatment effect on the clinical endpoint is beneficial, and more negative $Y_0^{\mathrm{true}}$ values represent greater beneficial effects. So, Figure 4.1 implies that for a future trial with moderate sample size (50 patients in each arm, for example) and a total of 6 scans, if the treatment shows a beneficial effect on the surrogate endpoint, the true treatment effect on the clinical endpoint predicted by the SBRCMB approach is always slightly greater than that predicted by the comprehensive approach.
This means that when prediction of the true treatment effect on the clinical endpoint is based on the estimated treatment effect on the surrogate endpoint (on which estimation errors exist), the SBRCMB approach may slightly overestimate the true treatment effect on the clinical endpoint.

Figure 4.2 shows another comparison between the SBRCMB results and the comprehensive results in predicting $Y_0^{\mathrm{true}}$ from $X_0^{\mathrm{true}}$. We pretend that the SBRCMB approach models the regression relationship between the two true treatment effects; the prediction line is $y = -0.02 + 0.55x$. The prediction line from the comprehensive model is also given by (4.35), but now we choose $N_{a0}$ and $N_{c0}$ to be infinity, to reflect the case that the future trial includes a sufficient number of patients so that the observed treatment effect on the surrogate endpoint estimates the true treatment effect with negligible error. When $N_{a0}$ and $N_{c0}$ are infinity, (4.35) becomes $y = 0.08 + 0.62x$.

[Figure 4.2 plot: x-axis, True Treatment Effect on the Surrogate Endpoint (−4 to 1); y-axis, True Treatment Effect on the Clinical Endpoint (−2.0 to 1.5); legend: SBRCMB, Comprehensive: no estimation error.]
Figure 4.2: Regression Prediction Lines: the SBRCMB Approach ($y = -0.02 + 0.55x$) and the Comprehensive Approach with $K_0 = 6$ and $N_{a0} = N_{c0} = \infty$ ($y = 0.08 + 0.62x$).

From Figure 4.2, we see the two prediction lines intersecting at $X_0^{\mathrm{true}} = -1.39$. Note that $\exp(X_0^{\mathrm{true}}) = M_{a0}^{\mathrm{true}} / M_{c0}^{\mathrm{true}}$ and $\exp(-1.39) = 0.25$. So $X_0^{\mathrm{true}} = -1.39$ means the treatment leads to a 75% reduction in MRI lesion count per patient per scan in the new trial, which is a large beneficial effect. Therefore, when the true treatment effect on the surrogate endpoint is available, the SBRCMB approach may underestimate/overestimate the true treatment effect on the clinical endpoint if the true treatment effect on the surrogate endpoint is larger/smaller than this value.
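The two comprehensive prediction lines quoted above (for $N_{a0} = N_{c0} = 50$ and for infinite sample size) can be reproduced approximately from (4.35). The sketch below is a minimal Python illustration, not the thesis's own R code; it assumes the rounded point estimates of Table 4.1 and the plug-in values 0.57 and 0.41 for $E(1/M_{a0}^{\mathrm{true}})$ and $E(1/M_{c0}^{\mathrm{true}})$, and the function names are hypothetical.

```python
import math

# Rounded point estimates from Table 4.1.
ALPHA, BETA, MU_X, SIGMA2_X, PHI2 = 0.081, 0.622, -0.713, 0.521, 37.427
# Plug-in values for E(1/M_a^true) and E(1/M_c^true) reported in the text.
E_INV_MA, E_INV_MC = 0.57, 0.41


def h0(k0, n_a0, n_c0):
    """Estimation-error variance term H_0 defined below (4.34)."""
    return PHI2 / (k0 * n_a0) * E_INV_MA + PHI2 / (k0 * n_c0) * E_INV_MC


def prediction_line(k0, n_a0, n_c0):
    """Return (intercept, slope) of the point prediction (4.35) as a function of x0."""
    shrink = SIGMA2_X / (SIGMA2_X + h0(k0, n_a0, n_c0))
    intercept = ALPHA + BETA * MU_X * (1.0 - shrink)
    slope = BETA * shrink
    return intercept, slope


# Moderate trial (Figure 4.1) and error-free surrogate measurement (Figure 4.2).
finite_line = prediction_line(6, 50, 50)
limit_line = prediction_line(6, math.inf, math.inf)
print("N = 50:  y = %.2f + %.2f x" % finite_line)
print("N = inf: y = %.2f + %.2f x" % limit_line)
```

The slope shrinks from $\hat{\beta}$ toward 0 as $\hat{H}_0$ grows, which is the attenuation that the comprehensive approach accounts for when $X_0$ is measured with error.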
We can also compare the point predictions of the two approaches for the 40 contrasts included in the SBRCMB dataset. The SBRCMB approach still uses the prediction line y = −0.02 + 0.55x to predict all of the Y0true s. But since each contrast has a different total number of scans and different numbers of patients, the comprehensive approach yields point predictions of the Y0true s that are no longer on a straight line. Figure 4.3 and Figure 4.4 show the comparison between the SBRCMB results and the comprehensive results in predicting Y0true from X0 , for the 40 contrasts in the SBRCMB dataset. In Figure 4.3, the solid points represent the point predictions for the 40 contrasts from the comprehensive approach, and the transparent points represent the pairs of estimated treatment effects. In Figure 4.4, the point predictions from the comprehensive approach are plotted against the corresponding predictions from the SBRCMB approach. From Figure 4.3 and 4.4, we can see that the point predictions for the true treatment effect on the clinical endpoints from the two approaches are generally very close. However, when X0 < 0, all the predictions from the comprehensive approach are larger than the corresponding predictions from the SBRCMB approach. So, for those contrasts where the treatments show beneficial effects on the surrogate endpoint, the SBRCMB approach may overestimate the true treatment effects on the clinical endpoint. 
Again, this is because none of those trials 65 1.5 ● SMRCMB Prediction Comprehensive Prediction 1.0 0.5 ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● −0.5 −1.0 ● ● A ● ● ● ● ● −1.5 True Treatment Effect on the Clinial Endpoint ● ● ● −2.0 ● −4 −3 −2 −1 0 1 Estimated Treatment Effect on the Surrogate Endpoint 0.5 Figure 4.3: Point Predictions for the 40 Contrasts ● −0.5 ● ● ● ●● ● ● ● ● ● ● ● ● y=x ● ● −1.0 Comprehensive Prediction 0.0 ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● A ● ● −1.5 ● −2.5 −2.0 −1.5 −1.0 −0.5 0.0 0.5 SMRCMB Prediction Figure 4.4: Comparison of Point Predictions for the 40 Contrasts 66 include infinite number of patients, so estimation error exists in the measurement of the treatment effect on the surrogate endpoint. The SBRCMB prediction may be a little more liberal due to its failure to take into account the estimation error. The point A in Figure 4.3 and 4.4 shows the effect of estimation error on predicting the true treatment effect on the clinical endpoint clearly. Note that this point deviates substantially from the remaining points. This point represents the singlecontrast clinical trial which has only 10 patients in each arm. So the estimation error in the measurement of the treatment effect on the surrogate endpoint is very βˆ σˆ 2 large. From (4.35), we know that when Na0 and Nc0 are very small, σˆ 2 +XHˆ is 0 X much smaller than βˆ . This is why the point A deviates substantially from the rest of the points in the y direction. This means, with a large estimation error in the measurement of the treatment effect on the surrogate endpoint, a large estimated treatment effect on the surrogate endpoint may not be associated with a large true treatment effect on the clinical endpoint. We can also compare the prediction intervals of the two approaches. 
The prediction interval for $Y_0^{\mathrm{true}}(x_0)$ from the comprehensive approach can be calculated from (4.39), and the prediction interval from the SBRCMB approach can be calculated from the standard regression method. (To do so, we pretend the SBRCMB approach models the regression relationship between the true treatment effect on the clinical endpoint and the estimated treatment effect on the surrogate endpoint.) Table 4.2 shows the approximate 95% prediction intervals of $\exp(Y_0^{\mathrm{true}}(x_0))$ for the 40 contrasts included in the SBRCMB dataset. Note that $\exp(Y_0^{\mathrm{true}}) = R_{a0}^{\mathrm{true}} / R_{c0}^{\mathrm{true}}$, which represents the true treatment effect on the annualized relapse rate in a future contrast, expressed as a percentage. Table 4.2 is ordered based on the magnitude of $\exp(X_0) = M_{a0} / M_{c0}$, the estimated percentage treatment effect on the surrogate endpoint. The first column is the ID of the contrast in the SBRCMB dataset (see Appendix A).

Table 4.2: Comparison of the Approximate 95% Prediction Intervals for $\exp(Y_0^{\mathrm{true}}(x_0))$ for the SBRCMB and Comprehensive Approaches

                        SBRCMB                  Comprehensive
Contrast ID  exp(X0)    Point  Interval         Point  Interval
3            0.02       0.12   (0.02, 0.60)     0.29   (0.12, 0.72)
29           0.04       0.17   (0.08, 0.35)     0.23   (0.13, 0.42)
20           0.08       0.24   (0.12, 0.49)     0.27   (0.18, 0.40)
21           0.11       0.29   (0.15, 0.58)     0.32   (0.22, 0.47)
28           0.17       0.37   (0.30, 0.45)     0.38   (0.29, 0.50)
15           0.19       0.39   (0.14, 1.09)     0.45   (0.28, 0.74)
25           0.30       0.50   (0.26, 0.95)     0.53   (0.38, 0.74)
4            0.32       0.52   (0.14, 1.95)     0.60   (0.33, 1.12)
14           0.34       0.54   (0.19, 1.53)     0.59   (0.37, 0.95)
8            0.35       0.55   (0.33, 0.91)     0.58   (0.43, 0.78)
40           0.36       0.55   (0.18, 1.68)     0.62   (0.34, 1.12)
1            0.37       0.56   (0.21, 1.48)     0.63   (0.35, 1.12)
26           0.39       0.58   (0.30, 1.12)     0.61   (0.43, 0.87)
27           0.40       0.58   (0.30, 1.14)     0.62   (0.44, 0.88)
2            0.41       0.59   (0.22, 1.59)     0.65   (0.36, 1.16)
36           0.44       0.62   (0.25, 1.51)     0.66   (0.44, 0.99)
10           0.47       0.64   (0.38, 1.09)     0.68   (0.49, 0.94)
24           0.47       0.64   (0.34, 1.21)     0.68   (0.49, 0.94)
6            0.48       0.65   (0.30, 1.40)     0.69   (0.34, 1.41)
38           0.51       0.67   (0.27, 1.63)     0.71   (0.47, 1.05)
7            0.58       0.72   (0.43, 1.20)     0.77   (0.57, 1.03)
5            0.67       0.78   (0.53, 1.16)     0.80   (0.49, 1.30)
33           0.67       0.78   (0.47, 1.30)     0.82   (0.58, 1.16)
18           0.69       0.79   (0.52, 1.20)     0.85   (0.66, 1.08)
9            0.76       0.84   (0.49, 1.44)     0.89   (0.64, 1.23)
19           0.82       0.88   (0.38, 2.04)     0.89   (0.56, 1.40)
30           0.88       0.91   (0.55, 1.51)     0.97   (0.73, 1.30)
39           0.91       0.93   (0.38, 2.26)     0.95   (0.63, 1.42)
11           0.92       0.93   (0.37, 2.35)     0.95   (0.62, 1.45)
23           0.96       0.95   (0.75, 1.21)     1.00   (0.71, 1.41)
32           1.04       1.00   (0.59, 1.70)     1.04   (0.72, 1.49)
16           1.06       1.01   (0.17, 5.97)     0.90   (0.48, 1.68)
37           1.11       1.03   (0.42, 2.53)     1.05   (0.70, 1.58)
22           1.16       1.06   (0.84, 1.35)     1.11   (0.78, 1.57)
17           1.27       1.11   (0.19, 6.60)     0.95   (0.51, 1.79)
13           1.35       1.15   (0.45, 2.98)     1.14   (0.73, 1.77)
31           1.47       1.21   (0.71, 2.07)     1.29   (0.94, 1.77)
34           1.61       1.27   (0.61, 2.68)     1.18   (0.71, 1.96)
35           1.69       1.31   (0.62, 2.79)     1.20   (0.71, 2.01)
12           1.74       1.33   (0.51, 3.45)     1.29   (0.82, 2.02)

From Table 4.2, we find that the prediction intervals from the comprehensive approach are generally shorter than those obtained from the SBRCMB approach (34 out of 40 are shorter), which indicates that the comprehensive approach gives more precise predictions. This can be explained by the existence of estimation error in the measurement of the treatment effect on the clinical endpoint. Although we pretend that the SBRCMB approach can be used to predict $Y_0^{\mathrm{true}}$, it actually predicts $Y_0$. Since in general $Y_0$ is more variable than $Y_0^{\mathrm{true}}$, it may not be surprising that the SBRCMB prediction intervals tend to be wider. Figure 4.5 illustrates this information. The solid points and the solid lines represent the point predictions and the 95% prediction intervals from the SBRCMB approach, while the hollow points and the dashed lines represent those from the comprehensive approach.
It is clear from the figure that most of the prediction intervals from the comprehensive approach are shorter than those from the SBRCMB approach.

The second column of Table 4.2 is the estimated percentage treatment effect on the surrogate endpoint. If $X_0 < 0$ or, equivalently, $\exp(X_0) = M_{a0} / M_{c0} < 1$, then the treatment showed a beneficial effect on the surrogate endpoint in the contrast. Among the 40 contrasts, there are 30 contrasts where $M_{a0} / M_{c0} < 1$. For those contrasts, we expect to see beneficial true treatment effects on the clinical endpoint; that is, $\exp(Y_0^{\mathrm{true}}) = R_{a0}^{\mathrm{true}} / R_{c0}^{\mathrm{true}} < 1$. However, based on the comprehensive approach, among those 30 contrasts, only 14 have 95% prediction intervals that don’t contain 1. So for the other 16 contrasts, we get inconclusive prediction results for the true treatment effect on the clinical endpoint. The SBRCMB results are less definitive; only 7 contrasts have 95% prediction intervals that don’t contain 1. In the next section, we will study how the magnitude of the estimated treatment effect on the surrogate endpoint and the number of patients influence the prediction interval of the true treatment effect on the clinical endpoint.

[Figure 4.5 plot: x-axis, exp(X0) (0 to 1.5); y-axis from 0 to 7; legend: SBRCMB Point Prediction, Comprehensive Point Prediction, SBRCMB 95% Prediction Interval, Comprehensive 95% Prediction Interval.]
Figure 4.5: Comparison of the Approximate 95% Prediction Intervals for $\exp(Y_0^{\mathrm{true}}(x_0))$ for the SBRCMB and Comprehensive Approaches

4.5 Assessment of the Estimated Surrogacy Relationship in Practice

For the MRI lesion count per patient per scan to be a useful surrogate endpoint in practice, it must provide precise enough information on the true treatment effect on the annualized relapse rate.
Table 4.3 investigates the influence of the magnitude of $X_0$ (or $\exp(X_0)$) and the sample sizes $N_{a0}$, $N_{c0}$ of the future contrast on the prediction interval for $Y_0^{\mathrm{true}}(x_0)$ (or $\exp(Y_0^{\mathrm{true}}(x_0))$) calculated from the comprehensive approach. When calculating the prediction intervals, we fix $K_0 = 6$. We set $N_{a0} = N_{c0} = N_0$ and vary $N_0$ from 10 to 600 (the numbers of patients in the arms in the SBRCMB dataset range from 8 to 627). We also vary $\exp(X_0)$ from 0.02 to 1.8 (the values of $\exp(X_0)$ in the SBRCMB dataset range from 0.024 to 1.742). The entries in Table 4.3 are the point predictions and approximate 95% prediction intervals for $\exp(Y_0^{\mathrm{true}}(x_0))$.

From Table 4.3, first we note that, within each column (i.e., given the value of the estimated treatment effect on the surrogate endpoint), the approximate 95% prediction interval for the true treatment effect on the clinical endpoint becomes shorter as $N_0$ increases. This is expected, since a larger $N_0$ represents more information on the new contrast, and the prediction will be more precise. The last row in Table 4.3 represents the situation when a new trial includes an infinite number of patients. In such a case, the estimation error in the measurement of the treatment effect on the surrogate endpoint becomes negligible. However, we see that the prediction interval for $\exp(Y_0^{\mathrm{true}}(x_0))$ doesn’t shrink to a point: even if we know the true treatment effect on the surrogate endpoint, we still cannot predict the true treatment effect on the clinical endpoint without error. From Table 4.1, we know that $\hat{\tau}^2 \approx 0$, which suggests a nearly perfect linear relationship between the true treatment effects. Therefore, the uncertainty in the last row of Table 4.3 is due to the fact that the surrogacy relationship is not estimated precisely enough (other parameters such as $\alpha$ and $\beta$ are not estimated precisely enough).
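The prediction intervals discussed here follow from (4.35), (4.38) and (4.39). The Python sketch below is an illustration under stated assumptions rather than the thesis's R implementation: $\Sigma_W$ is rebuilt from the rounded normal-theory standard errors in Table 4.1 and the corresponding rows and columns of the correlation matrix (4.29); $\hat{\tau}^2$ is set to 0 because only "< 0.001" is reported; and the gradient $g$ is derived by hand from (4.33), since Appendix B is not reproduced in this section. The function name `prediction_interval` is hypothetical.

```python
import math

# Estimates and normal-theory SEs from Table 4.1, in the order
# alpha, beta, mu_X, sigma_X^2, phi_2.
EST = [0.081, 0.622, -0.713, 0.521, 37.427]
SE = [0.084, 0.074, 0.156, 0.167, 19.932]
# Rows/columns 1, 2, 3, 4 and 7 of the correlation matrix (4.29).
CORR = [
    [1.000, 0.776, -0.056, -0.394, 0.468],
    [0.776, 1.000, 0.108, -0.479, 0.444],
    [-0.056, 0.108, 1.000, -0.106, 0.194],
    [-0.394, -0.479, -0.106, 1.000, -0.410],
    [0.468, 0.444, 0.194, -0.410, 1.000],
]
TAU2 = 0.0  # assumption: Table 4.1 only reports tau^2 < 0.001
E_INV_MA, E_INV_MC = 0.57, 0.41  # plug-in E(1/M_a^true), E(1/M_c^true)


def prediction_interval(exp_x0, k0, n_a0, n_c0, z=1.96):
    """Return (point, lower, upper) for exp(Y0_true) given exp(X0)."""
    alpha, beta, mu_x, sigma2_x, phi2 = EST
    x0 = math.log(exp_x0)
    c = (E_INV_MA / n_a0 + E_INV_MC / n_c0) / k0  # H0 = phi2 * c
    h0 = phi2 * c
    s = sigma2_x / (sigma2_x + h0)  # shrinkage factor in (4.33)
    point = alpha + beta * mu_x * (1.0 - s) + beta * s * x0  # (4.35)

    # Hand-derived gradient of (4.33) w.r.t. alpha, beta, mu_X, sigma_X^2, phi_2.
    denom = (sigma2_x + h0) ** 2
    g = [
        1.0,
        mu_x * (1.0 - s) + s * x0,
        beta * (1.0 - s),
        beta * (x0 - mu_x) * h0 / denom,
        -beta * (x0 - mu_x) * sigma2_x * c / denom,
    ]
    # Delta-method variance g' Sigma_W g with Sigma_W = diag(SE) CORR diag(SE).
    var_fit = sum(
        g[i] * SE[i] * CORR[i][j] * SE[j] * g[j]
        for i in range(5)
        for j in range(5)
    )
    var_w = beta ** 2 * sigma2_x * (1.0 - s) + TAU2 + var_fit  # (4.38)
    half = z * math.sqrt(var_w)  # half-width in (4.39)
    return math.exp(point), math.exp(point - half), math.exp(point + half)


# Example: exp(X0) = 0.5 with K0 = 6 scans and 50 patients per arm.
print(prediction_interval(0.5, 6, 50, 50))
```

Because the inputs are rounded, the output should be expected to agree with the tabulated intervals only to about two decimal places. Passing `math.inf` for both sample sizes sets $\hat{H}_0 = 0$, so only the parameter uncertainty term $g^T \Sigma_W g$ remains, which is the situation described for the last row of Table 4.3.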
Table 4.3: Influence of the Sample Size $N_0$ and the Magnitude of the Estimated Treatment Effect on the Surrogate Endpoint on the 95% Prediction Intervals for the True Treatment Effect on the Clinical Endpoint for Trials with $K_0 = 6$ Scans per Patient. The Entries are the Point Predictions and Approximate 95% Prediction Intervals for $\exp(Y_0^{\mathrm{true}}(x_0))$.

       exp(X0)
N0     0.02          0.1           0.2           0.5           0.8           0.9           1.0           1.5           1.8
10     0.28          0.44          0.54          0.70          0.80          0.83          0.85          0.96          1.01
       (0.11, 0.70)  (0.21, 0.93)  (0.27, 1.07)  (0.36, 1.35)  (0.41, 1.55)  (0.43, 1.61)  (0.44, 1.66)  (0.49, 1.89)  (0.51, 2.02)
20     0.20          0.37          0.49          0.70          0.84          0.88          0.92          1.08          1.16
       (0.09, 0.44)  (0.20, 0.70)  (0.27, 0.87)  (0.41, 1.21)  (0.49, 1.46)  (0.51, 1.53)  (0.53, 1.60)  (0.61, 1.91)  (0.65, 2.07)
50     0.14          0.31          0.44          0.70          0.89          0.94          1.00          1.22          1.34
       (0.08, 0.24)  (0.20, 0.50)  (0.29, 0.67)  (0.47, 1.05)  (0.60, 1.33)  (0.63, 1.41)  (0.66, 1.49)  (0.81, 1.86)  (0.88, 2.05)
100    0.12          0.29          0.42          0.70          0.91          0.98          1.03          1.30          1.44
       (0.07, 0.19)  (0.20, 0.41)  (0.31, 0.58)  (0.52, 0.95)  (0.67, 1.25)  (0.71, 1.33)  (0.76, 1.42)  (0.93, 1.80)  (1.02, 2.02)
200    0.11          0.27          0.41          0.70          0.93          1.00          1.06          1.34          1.50
       (0.07, 0.16)  (0.20, 0.36)  (0.32, 0.53)  (0.56, 0.89)  (0.73, 1.18)  (0.78, 1.27)  (0.82, 1.36)  (1.02, 1.77)  (1.12, 2.00)
600    0.10          0.26          0.40          0.70          0.94          1.01          1.07          1.38          1.54
       (0.06, 0.15)  (0.20, 0.34)  (0.33, 0.49)  (0.60, 0.83)  (0.78, 1.13)  (0.83, 1.22)  (0.88, 1.31)  (1.09, 1.74)  (1.19, 1.99)
∞      0.10          0.26          0.40          0.70          0.94          1.02          1.08          1.40          1.56
       (0.06, 0.15)  (0.20, 0.33)  (0.34, 0.46)  (0.63, 0.78)  (0.82, 1.09)  (0.87, 1.18)  (0.92, 1.23)  (1.13, 1.73)  (1.23, 1.98)

Recall that $\exp(X_0) = M_{a0} / M_{c0}$ and $\exp(Y_0^{\mathrm{true}}) = R_{a0}^{\mathrm{true}} / R_{c0}^{\mathrm{true}}$. So, when a new treatment is efficacious, we hope to observe $\exp(X_0) < 1$ and expect $\exp(Y_0^{\mathrm{true}}) < 1$ (i.e., the upper bound of the approximate 95% prediction interval to be less than 1).
On the other hand, when a new treatment has a negative effect, we hope to observe $\exp(X_0) > 1$ and expect $\exp(Y_0^{\mathrm{true}}) > 1$ (i.e., the lower bound of the approximate 95% prediction interval to be larger than 1).

The last two columns of Table 4.3 represent the situation when the treatment shows medium or large negative effects on the surrogate endpoint (the treatment is 50% or 80% worse than the control in terms of the observed surrogate endpoint), so we hope to see the lower bound of the prediction interval larger than 1. This only happens when $N_0 \geq 200$ for $\exp(X_0) = 1.5$ and when $N_0 \geq 100$ for $\exp(X_0) = 1.8$. So for negative observed treatment effects on the surrogate endpoint to imply negative true treatment effects on the clinical endpoint, a new contrast needs to include a large number of patients. For those contrasts with a medium or small number of patients or with a less extreme observed treatment effect on the surrogate endpoint, conclusive predictions for the true treatment effect on the clinical endpoint will not be possible.

The 6th and 7th columns of Table 4.3 represent the situation when $\exp(X_0)$ is close to 1; that is, the estimated treatment effect on the surrogate endpoint is beneficial but the magnitude is small. We see that all the prediction intervals within these two columns contain 1 even when $N_0$ is infinite. This suggests that when a new treatment shows only a small beneficial effect on the surrogate endpoint, we will not be able to determine if this treatment really has an effect on the clinical endpoint based on the estimated surrogacy relationship. In other words, the estimated surrogacy relationship is not very helpful in such a situation.

The 5th column of Table 4.3 shows the situation when $\exp(X_0) = 0.5$, which represents a medium beneficial estimated treatment effect on the surrogate endpoint (50% reduction in the observed surrogate endpoint). However, when $N_0 < 100$, the prediction intervals all contain 1.
So, when a new treatment shows a medium beneficial effect on the surrogate endpoint, we will only be able to conclude this treatment has an effect on the clinical endpoint if the new trial includes sufficient patients.

The first three $\exp(X_0)$ columns of Table 4.3 (0.02, 0.1 and 0.2) represent the situation when $\exp(X_0)$ is close to 0; that is, the estimated treatment effect on the surrogate endpoint is beneficial and the magnitude is very large. When $N_0 \geq 20$, all the prediction intervals exclude 1. This means we are 95% sure that an observed beneficial treatment effect on the surrogate endpoint corresponds to a true beneficial treatment effect on the clinical endpoint. On the other hand, how precisely we can determine the magnitude of the true treatment effect on the clinical endpoint is also of interest. This precision is indicated by the length of the prediction interval. Note that when $N_0 \leq 50$, the lengths of all the prediction intervals are no less than 0.3 except for the case when $N_0 = 50$ and $\exp(X_0) = 0.02$. As $N_0 = 50$ is a typical size for a phase 2 clinical trial in RRMS, this suggests the prediction of the true treatment effect on the clinical endpoint may not be very precise for a phase 2 clinical trial of small or medium size. On the other hand, when $N_0 \geq 100$, all the lengths of the prediction intervals are smaller than 0.25 except for the case when $N_0 = 100$ and $\exp(X_0) = 0.2$. This indicates the prediction is relatively precise when a trial has a large number of patients.

We also investigate the relationship between $N_0$ and the value of $\exp(X_0)$ for which the prediction interval for $\exp(Y_0^{\mathrm{true}})$ excludes 1 (we fix $K_0 = 6$). Burzykowski and Buyse [18] introduced a similar concept called the “surrogate threshold effect”. This value represents the least extreme value of the estimated treatment effect on the surrogate endpoint from which we can obtain a conclusive prediction for the true treatment effect on the clinical endpoint.
In Figure 4.6 and Figure 4.7, we plot the “threshold value” of $\exp(X_0)$ against $N_0$. Figure 4.6 shows the result when a treatment shows a beneficial effect on the surrogate endpoint ($X_0 < 0$), and Figure 4.7 shows the result when a treatment shows a negative effect on the surrogate endpoint ($X_0 > 0$).

From Figure 4.6, we see that when the treatment shows a beneficial effect on the surrogate endpoint, the threshold value increases as $N_0$ increases. A larger threshold value represents a smaller estimated treatment effect on the surrogate endpoint. So, for a contrast with a large number of patients, even though we observe only a relatively small treatment effect on the surrogate endpoint, we can still conclude that the treatment has a beneficial effect on the clinical endpoint. The threshold value for $N_0 = 50$ is $\exp(X_0) = 0.46$, which means that in order to conclude that a new treatment has a beneficial effect on the clinical endpoint for a contrast with 50 patients in each arm, this treatment has to be observed to be at least 100% − 46% = 54% better than the control on the surrogate endpoint. Similarly, for $N_0$ = 10, 20, 100, 200 and 600, the threshold values are 0.14, 0.30, 0.55, 0.61 and 0.67. Note that the asymptote for the curve is 0.71, which is the threshold value obtained when $N_0 = \infty$. So, when we try to predict the true treatment effect on the clinical endpoint based on the estimated surrogacy relationship, we require the new treatment to be at least 29% better than the control on the surrogate endpoint in order to conclude that there is a true beneficial treatment effect on the clinical endpoint.

From Figure 4.7, we see that when the treatment shows a negative effect on the surrogate endpoint, the threshold value decreases as $N_0$ increases. We can interpret Figure 4.7 in a similar way as Figure 4.6.
For example, here the threshold value for $N_0 = 50$ is 2.39, which means that in order to conclude that a new treatment has a negative effect on the clinical endpoint for a contrast with 50 patients in each arm, this treatment has to be observed to be at least 139% worse than the control on the surrogate endpoint. Note that the asymptote here is 1.19. So, when we try to predict the true treatment effect on the clinical endpoint based on the estimated surrogacy relationship, we require the new treatment to be at least 19% worse than the control on the surrogate endpoint in order to conclude that there is a true negative treatment effect on the clinical endpoint.

Figure 4.6: Threshold Value of $\exp(X_0)$ versus Sample Size $N_0$ when a Beneficial Treatment Effect is Observed on the Surrogate Endpoint. (Horizontal axis: Number of Patients in the Placebo/Active arm; vertical axis: Minimum Estimated Treatment Effect on MRI; asymptote at $\exp(X_0) = 0.71$.)

Figure 4.7: Threshold Value of $\exp(X_0)$ versus Sample Size $N_0$ when a Negative Treatment Effect is Observed on the Surrogate Endpoint. (Horizontal axis: Number of Patients in the Placebo/Active arm; vertical axis: Minimum Estimated Treatment Effect on MRI; asymptote at $\exp(X_0) = 1.19$.)

In conclusion, the estimated surrogacy relationship is useful in predicting the true treatment effect on the clinical endpoint when the treatment shows a large effect on the surrogate endpoint and the number of patients in the contrast is large (e.g. $\exp(X_0) = 0.1$ and $N_0 = 100$). However, when the treatment shows a moderate beneficial effect on the surrogate endpoint (e.g. $\exp(X_0) = 0.5$), the prediction is not very precise (the prediction interval is wide). When the treatment shows only a small beneficial effect on the surrogate endpoint ($\exp(X_0) > 0.71$), using the estimated surrogacy relationship will lead to an inconclusive result for the true treatment effect on the clinical endpoint.
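The conversion from a threshold value of exp(X0) to a "percent better" or "percent worse" statement, as in 100% − 46% = 54%, can be sketched in a few lines. The threshold values below are the ones quoted in the text for K0 = 6; the function and variable names are illustrative only and do not come from the thesis.

```python
# Threshold values of exp(X0) quoted in the text (K0 = 6 scans per patient).
# Keys are N0, the number of patients per arm; float("inf") marks the asymptote.
BENEFICIAL_THRESHOLDS = {10: 0.14, 20: 0.30, 50: 0.46, 100: 0.55,
                         200: 0.61, 600: 0.67, float("inf"): 0.71}
NEGATIVE_THRESHOLDS = {50: 2.39, float("inf"): 1.19}


def percent_better(ratio):
    """Percent reduction relative to control implied by a ratio below 1."""
    return round((1.0 - ratio) * 100)


def percent_worse(ratio):
    """Percent increase relative to control implied by a ratio above 1."""
    return round((ratio - 1.0) * 100)


for n0, ratio in BENEFICIAL_THRESHOLDS.items():
    print(f"N0 = {n0}: at least {percent_better(ratio)}% better than control")
for n0, ratio in NEGATIVE_THRESHOLDS.items():
    print(f"N0 = {n0}: at least {percent_worse(ratio)}% worse than control")
```

Read this way, the required observed effect on the surrogate endpoint falls from 86% better at N0 = 10 to 33% better at N0 = 600, with a floor of 29% given by the asymptote.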
From (4.30), we know that the true surrogacy relationship may be very good or nearly perfect. Nevertheless, the surrogate endpoint may not be very useful in predicting the true treatment effect on the clinical endpoint unless the treatment shows a large effect on the surrogate endpoint. Furthermore, even if a new trial includes a sufficient number of patients so that we can measure the treatment effect on the surrogate endpoint perfectly, we still cannot predict the true treatment effect on the clinical endpoint without error. These limitations may be explained by the limited number of trials included in the SBRCMB dataset. Since we only have 23 trials, we may not estimate the true surrogacy relationship precisely. So, use of the estimated surrogacy relationship may not result in a very precise prediction.

Chapter 5 Conclusions and Discussion

In a clinical trial, a surrogate endpoint is used as a substitute for the clinical endpoint to assess the treatment effect. Using a surrogate endpoint instead of the clinical endpoint can shorten the period of a clinical trial, or reduce the number of patients needed in a clinical trial, and therefore reduce the cost. However, before a potential surrogate endpoint can be formally employed in practice, it must be validated. Use of an unvalidated surrogate endpoint can lead to an incorrect conclusion about the treatment effect, and subsequent use of the treatment may then be ineffective or even harmful for patients.

A potential surrogate endpoint can be validated in a single clinical trial or in multiple clinical trials if the multiple trials study the same or similar treatments. When the validation is carried out in multiple trials, the validation process can be based on the summary information of each trial or on the individual patient data, depending on whether the individual patient level data is available.
When individual patient level data is not available, we lose the possibility of examining how closely a surrogate is related to the clinical endpoint in individual patients, but retain the ability to evaluate the relationship between the treatment effects on the surrogate and the clinical endpoints.

In RRMS clinical trials, changes in MS brain lesion patterns determined by MRI reflect the underlying MS disease pathology and hence may be the best candidate for a surrogate endpoint. In this report, we studied whether the MRI lesion count per patient per scan can serve as a surrogate endpoint for the annualized relapse rate, which is the most commonly used clinical endpoint for RRMS clinical trials. The SBRCMB dataset only includes summary information from 23 clinical trials. Two different approaches (the SBRCMB approach and the comprehensive approach) are applied to the SBRCMB dataset to assess this potential surrogacy relationship.

The SBRCMB approach discussed in Chapter 3 uses simple linear regression with weighted least squares estimation, where the response and the explanatory variables are the estimated treatment effects on the clinical and the surrogate endpoints from each contrast, and the weights are chosen to account for the influence of different numbers of patients and different durations of contrasts. However, this approach treats the estimated treatment effects as the true treatment effects (that is, it does not take into account the estimation errors) and ignores the correlation structure among contrasts from the same trial.

The comprehensive approach discussed in Chapter 4 assumes a multivariate normal distribution for the true treatment effects to take into account the correlation structure among the contrasts from the same trial, and develops the conditional distribution of the estimated treatment effects given the true endpoints. The approximate marginal moments of the estimated treatment effects are then determined.
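As a rough illustration of the weighted least squares regression that the SBRCMB approach relies on, the sketch below fits a weighted line to six contrasts taken from the table in Appendix A. Several things here are assumptions made for illustration rather than details confirmed by this section: each treatment effect is taken to be the log of the active-to-control ratio, the SBRCMB weight column is used directly as the regression weight, and only contrasts 1 to 6 are included for brevity. The output is therefore not expected to reproduce the estimates reported in the thesis.

```python
import math

# (SBRCMB weight, MRI control, MRI active, ARR control, ARR active)
# for contrasts 1-6 of the Appendix A table; a subset chosen only for brevity.
CONTRASTS = [
    (37, 0.82, 0.30, 1.27, 1.17),
    (36, 0.82, 0.33, 1.27, 0.84),
    (14, 3.37, 0.08, 2.00, 0.34),
    (20, 4.22, 1.37, 1.29, 0.57),
    (233, 2.40, 1.60, 0.82, 0.67),
    (59, 3.65, 1.75, 1.31, 0.45),
]


def log_ratio(active, control):
    """Assumed treatment-effect scale: log(active / control)."""
    return math.log(active / control)


def weighted_least_squares(x, y, w):
    """Return (intercept, slope) minimising sum_i w_i * (y_i - a - b * x_i)**2."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


weights = [row[0] for row in CONTRASTS]
x = [log_ratio(row[2], row[1]) for row in CONTRASTS]  # effect on MRI lesion count
y = [log_ratio(row[4], row[3]) for row in CONTRASTS]  # effect on relapse rate
alpha_hat, beta_hat = weighted_least_squares(x, y, weights)
print(f"intercept = {alpha_hat:.3f}, slope = {beta_hat:.3f}")
```

The fit treats each pair of estimated effects as if it were observed without error and as if all contrasts were independent, which are exactly the two simplifications the comprehensive approach is designed to relax.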
To estimate the parameters related to the surrogacy relationship, we use the normal estimating equations.

The $\hat{\beta}$ from the comprehensive approach is 0.62, which is larger than the 0.55 from the SBRCMB approach. So, the SBRCMB approach may underestimate the association between the true treatment effects. Neither of the $\hat{\alpha}$ estimates from the two approaches is significantly different from 0, which is consistent with a good surrogacy relationship, since there is no strong indication that part of the true treatment effect on the annualized relapse rate remains unexplained by the true treatment effect on the MRI lesion count per patient per scan. The SBRCMB approach obtains a weighted $R^2 = 0.80$, and the comprehensive approach obtains $\hat{R}^2_{\mathrm{trial}} \approx 1$. Both indicate a good surrogacy relationship. For the comprehensive approach, $\hat{R}^2_{\mathrm{trial}} \approx 1$ is equivalent to $\hat{\tau}^2 \approx 0$, which indicates a negligible estimated conditional variance of the true treatment effect on the annualized relapse rate given the true treatment effect on the MRI lesion count per patient per scan. Under the assumptions of the comprehensive approach, the Prentice definition of a surrogate endpoint requires that $\alpha = 0$ and $\tau = 0$. So, the MRI lesion count per patient per scan appears to be a very good surrogate endpoint for the annualized relapse rate.

To assess how good this estimated surrogacy relationship is in practice, we predict the true treatment effect on the clinical endpoint for the 40 contrasts included in the SBRCMB dataset. The point predictions from the two approaches are very close, but those from the comprehensive approach are slightly larger than those from the SBRCMB approach for most contrasts. On this scale a smaller value corresponds to a larger beneficial effect, so for those trials which showed beneficial treatment effects on the surrogate endpoint, the SBRCMB approach tends to predict slightly larger treatment effects than the comprehensive approach. The interval predictions from the two approaches, however, are quite different.
The prediction interval from the comprehensive approach is generally shorter (34 out of 40 are shorter), which indicates that the comprehensive approach gives more precise predictions.

For the comprehensive approach, we also study how the number of patients per arm and the value of the estimated treatment effect on the surrogate endpoint affect the prediction interval for the true treatment effect on the clinical endpoint. For a new contrast with an infinite number of patients in each arm (i.e. the estimation error in the measurement of the treatment effect on the surrogate endpoint is negligible), we require the treatment to be observed to be at least 29% better or 19% worse than the control on the surrogate endpoint in order to avoid an inconclusive prediction for the true treatment effect on the clinical endpoint. For a new contrast with a limited number of patients in each arm, we require the treatment to show more extreme effects. For a typical phase 2 clinical trial in RRMS with 50 patients in each arm and with 6 scans for each patient, we require the treatment to be at least 54% better or 139% worse. Among the 30 contrasts included in the SBRCMB dataset where the treatments show beneficial effects on the surrogate endpoint, 20 show treatment effects greater than 54%, while among the 10 contrasts where the treatments show negative effects on the surrogate endpoint, only 4 treatments are at least 139% worse than the control. So, the estimated surrogacy relationship could be useful in prediction when a treatment shows a beneficial effect on the surrogate endpoint, but may not be useful in the contrary case. In addition, when the number of patients per arm is around 50, the prediction interval is wide and does not yield a precise prediction, unless the treatment shows a very large effect on the surrogate endpoint (e.g. ≥ 90%).

In conclusion, the comprehensive approach shows that the underlying surrogacy relationship may be very good.
In a typical phase 2 trial with around 50 patients in each arm and with 6 scans for each patient, the estimated surrogacy relationship can give a precise prediction for the true treatment effect on the clinical endpoint when the treatment displays a large effect on the surrogate endpoint. However, when the treatment displays only a modest or a small effect on the surrogate endpoint, the prediction may be inconclusive or not precise enough. The reason for this may be the limited number of trials included in the SBRCMB dataset: the parameters related to the surrogacy relationship may not be estimated precisely enough, which leads to a relatively wide prediction interval. To employ the surrogacy relationship to make predictions in practice, we may need information from more trials to estimate the surrogacy relationship more precisely.

The comprehensive approach we developed is in the spirit of Daniels and Hughes [2] (DH) and Korn et al. [3] (KAM). Both construct models to assess surrogacy relationships using summary results from multiple clinical trials. Both DH and KAM use multivariate normal distributions for the true treatment effects in their models to allow for correlated contrasts. However, DH starts with assumptions about the surrogacy relationship between the true treatment effects directly, while KAM starts with assumptions about the true endpoints, where the influence of the true surrogate endpoint on the true clinical endpoint is assumed to be the same regardless of the presence of the treatment. Building the model from endpoints requires a more detailed specification, and we think the KAM assumptions may not be very appropriate in practice, so we started with assumptions about the true treatment effects. On the other hand, both papers assume that the errors in estimating the true treatment effects are independent of the true treatment effects.
In contrast, we assume they are dependent, with large true treatment effects associated with small estimation errors. We think this dependence assumption is more reasonable in practice. However, not making assumptions about the true endpoints, combined with the dependence of the estimation errors on the true treatment effects, makes it difficult to obtain the marginal distribution of the estimated treatment effects in our model. If one can find a reasonable assumption on the distribution of the true endpoints, then the marginal distribution can be obtained, and the surrogacy relationship could be re-estimated using the actual likelihood rather than the “approximated” likelihood. Furthermore, DH adopt a Bayesian approach to estimate the surrogacy relationship. By choosing appropriate priors for the parameters, we could also use a Bayesian approach to estimate the surrogacy relationship and compare the results to those obtained in this study.

The SBRCMB dataset only contains summary information from each trial, not the individual patient information. If the individual patient information is available, one can re-analyze the surrogacy relationship using the individual patient level data and compare the results for the estimated surrogacy relationship with those from this study. In principle, the estimated surrogacy relationship from the model with individual patient level data should be more precisely determined, since this model includes more information. However, if the two results are close, one may favor the model based on summary results. This is because it is much easier to collect the summary results of each trial than to collect the individual patient data from each trial, and the estimation process of the model with only summary results may be much less computationally intensive. Despite this, if the individual patient information is available, one can assess how closely the surrogate endpoint is related to the clinical endpoint in individual patients (e.g. $R_{\mathrm{ind}}$ from Buyse et al.
[13]), which is useful for patient management.

Bibliography

[1] M. P. Sormani, L. Bonzano, L. Roccatagliata, G. R. Cutter, G. L. Mancardi, and P. Bruzzi. Magnetic resonance imaging as a potential surrogate for relapses in multiple sclerosis: A meta-analytic approach. Annals of Neurology, 65:268–275, 2009.
[2] M. J. Daniels and M. D. Hughes. Meta-analysis for the evaluation of potential surrogate markers. Statistics in Medicine, 16:1965–1982, 1997.
[3] E. L. Korn, P. S. Albert, and L. M. McShane. Assessing surrogates as trial endpoints using mixed models. Statistics in Medicine, 24:163–182, 2005.
[4] T. Burzykowski, G. Molenberghs, and M. Buyse. The Evaluation of Surrogate Endpoints. Springer, New York, New York, 2005.
[5] R. L. Prentice. Surrogate endpoints in clinical trials: Definition and operational criteria. Statistics in Medicine, 8:431–440, 1989.
[6] H. F. McFarland, F. Barkhof, J. Antel, and D. H. Miller. The role of MRI as a surrogate outcome measure in multiple sclerosis. Multiple Sclerosis, 8:40–51, 2002.
[7] T. R. Fleming and D. L. DeMets. Surrogate endpoints in clinical trials: Are we being misled? Annals of Internal Medicine, 125:605–613, 1996.
[8] M. Buyse and G. Molenberghs. Criteria for the validation of surrogate endpoints in randomized experiments. Biometrics, 54:1014–1029, 1998.
[9] V. W. Berger. Does the Prentice criterion validate surrogate endpoints? Statistics in Medicine, 23:1571–1578, 2004.
[10] L. S. Freedman and B. I. Graubard. Statistical validation of intermediate endpoints for chronic diseases. Statistics in Medicine, 11:167–178, 1992.
[11] G. Molenberghs, M. Buyse, H. Geys, D. Renard, T. Burzykowski, and A. Alonso. Statistical challenges in the evaluation of surrogate endpoints in randomized trials. Controlled Clinical Trials, 23:607–625, 2002.
[12] A. Alonso, G. Molenberghs, T. Burzykowski, D. Renard, H. Geys, Z. Shkedy, F. Tibaldi, J. C. Abrahantes, and M. Buyse. Prentice’s approach and the meta-analytic paradigm: A reflection on the role of statistics in the evaluation of surrogate endpoints. Biometrics, 60:724–728, 2004.
[13] M. Buyse, G. Molenberghs, T. Burzykowski, D. Renard, and H. Geys. The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics, 1:49–67, 2000.
[14] R. E. Kass and A. E. Raftery. Bayes factors. Journal of the American Statistical Association, 90:773–795, 1995.
[15] A. J. Petkau, S. C. Reingold, U. Held, G. R. Cutter, T. R. Fleming, M. D. Hughes, D. H. Miller, H. F. McFarland, and J. S. Wolinsky. Magnetic resonance imaging as a surrogate outcome for multiple sclerosis relapses. Multiple Sclerosis, 14:770–778, 2008.
[16] J. A. Nelder and R. Mead. A simplex method for function minimization. Computer Journal, 7:308–313, 1965.
[17] F. Mosteller and J. W. Tukey. Data Analysis and Regression, a Second Course in Statistics. Addison-Wesley, Reading, Massachusetts, 1977.
[18] T. Burzykowski and M. Buyse. Surrogate threshold effect: an alternative measure for meta-analytic surrogate endpoint validation. Pharmaceutical Statistics, 5:173–186, 2006.

Appendix A The SBRCMB Dataset

In the table that follows, the last four columns represent the observed endpoints from each contrast: MRI = MRI lesion count per patient per scan; ARR = annualized relapse rate. The symbol “C” means “control arm” and the symbol “A” means “active arm”. Unless otherwise noted, entries in columns 1, 2, 3, 4, 5, 11 and 12 are copied from the supplementary table accompanying the SBRCMB paper. Entries in the remaining columns are extracted or calculated from the original papers where the results of the corresponding clinical trials are reported.
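| Trial | Contrast ID | MRI Outcome | SBRCMB Weight | Follow-up (months) | # of Scans | # of Patients (C) | # of Patients (A) | MRI (C) | MRI (A) | ARR (C) | ARR (A) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Active T2 (a) | 37 | 24 | 6 | 17 | 17 | 0.82 | 0.30 | 1.27 | 1.17 |
| 1 | 2 | Active T2 (a) | 36 | 24 | 6 | 17 | 17 | 0.82 | 0.33 | 1.27 | 0.84 |
| 2 | 3 | Active T2 (b) | 14 | 6 | 6 | 10 | 10 | 3.37 | 0.08 | 2.00 | 0.34 |
| 3 | 4 | Active T2 (b) | 20 | 6 | 6 | 14 | 14 | 4.22 | 1.37 | 1.29 | 0.57 |
| 4 | 5 | Active T2 (b) | 233 | 24 | 2 | 82 | 83 | 2.40 | 1.60 | 0.82 | 0.67 |
| 5 | 6 | New T2 | 59 | 24 | 2 | 19 | 23 | 3.65 | 1.75 | 1.31 | 0.45 |
| 6 | 7 | CUA (c) | 138 | 24 | 10 | 66 | 64 | 1.55 | 0.90 | 1.28 | 0.91 |
| 6 | 8 | CUA (c) | 140 | 24 | 10 | 66 | 68 | 1.55 | 0.55 | 1.28 | 0.87 |
| 7 | 9 | CUA (c) | 123 | 12 | 6 | 97 | 87 | 1.70 | 1.30 | 1.08 | 1.08 |
| 7 | 10 | CUA (c) | 124 | 12 | 6 | 97 | 85 | 1.70 | 0.80 | 1.08 | 0.81 |
| 8 | 11 | CUA (c) | 41 | 6 | 6 | 43 | 44 | 1.48 | 1.37 | 0.98 | 1.00 |
| 8 | 12 | CUA (c) | 39 | 6 | 6 | 43 | 40 | 1.48 | 2.58 | 0.98 | 1.64 |
| 8 | 13 | CUA (c) | 39 | 6 | 6 | 43 | 40 | 1.48 | 2.00 | 0.98 | 1.47 |
| 9 | 14 | New Gd | 32 | 6 | 6 | 33 | 32 | 1.22 | 0.42 | 0.88 | 0.90 |
| 9 | 15 | New Gd | 33 | 6 | 6 | 33 | 32 | 1.22 | 0.23 | 0.88 | 1.07 |
| 10 | 16 | New Gd | 11 | 9 | 9 | 10 | 8 | 3.00 | 3.18 | 0.27 | 0.48 |
| 10 | 17 | New Gd | 11 | 9 | 9 | 10 | 8 | 3.00 | 3.80 | 0.27 | 0.88 |
| 11 | 18 | New T2 | 207 | 9 | 9 | 120 | 119 | 1.52 | 1.04 | 1.21 | 0.81 |
| 12 | 19 | CUA (c) | 49 | 6 | 6 | 34 | 36 | 2.42 | 1.98 | 1.29 | 1.50 |
| 13 | 20 | CUA (c) | 74 | 6 | 6 | 71 | 68 | 1.62 | 0.13 | 0.51 | 0.09 |
| 13 | 21 | CUA (c) | 77 | 6 | 6 | 71 | 74 | 1.62 | 0.18 | 0.51 | 0.22 |
| 14 | 22 | New T2 | 758 | 14 | 1 | 467 | 471 | 6.80 | 7.90 | 0.61 | 0.60 |
| 14 | 23 | New T2 | 751 | 14 | 1 | 467 | 462 | 6.80 | 6.50 | 0.61 | 0.54 |
| 15 | 24 | New T2 | 87 | 6 | 6 | 81 | 83 | 1.07 | 0.50 | 0.77 | 0.35 |
| 15 | 25 | New T2 | 84 | 6 | 6 | 81 | 77 | 1.07 | 0.32 | 0.77 | 0.36 |
| 16 | 26 | CUA (c) | 79 | 9 | 7 | 61 | 61 | 2.68 | 1.04 | 0.81 | 0.58 |
| 16 | 27 | CUA (c) | 77 | 9 | 7 | 61 | 57 | 2.68 | 1.06 | 0.81 | 0.55 |
| 17 | 28 | Active T2 (b) | 1332 | 24 | 2 | 315 | 627 | 5.50 | 0.95 | 0.73 | 0.23 |
| 18 | 29 | New Gd | 74 | 6 | 4 | 35 | 69 | 1.12 | 0.05 | 0.84 | 0.37 |
| 19 | 30 | New Gd | 140 (d) | 12 | 8 | 84 | 96 | 0.72 | 0.64 | 0.44 | 0.46 |
| 19 | 31 | New Gd | 128 (d) | 12 | 8 | 84 | 87 | 0.72 | 1.06 | 0.44 | 0.60 |
| 20 | 32 | New T2 | 129 | 9 | 4 | 102 | 98 | 2.40 | 2.50 | 0.77 | 0.76 |
| 20 | 33 | New T2 | 136 | 9 | 4 | 102 | 106 | 2.40 | 1.60 | 0.77 | 0.52 |
| 21 | 34 | CUA (c) | 65 | 12 | 4 | 41 | 44 | 4.50 | 7.25 | 0.50 | 1.00 |
| 21 | 35 | CUA (c) | 63 | 12 | 4 | 41 | 42 | 4.50 | 7.62 | 0.50 | 0.88 |
| 22 | 36 | New Gd | 44 (d) | 6 | 6 | 49 | 50 | 1.73 | 0.77 | 0.53 | 0.44 |
| 22 | 37 | New Gd | 44 (d) | 6 | 6 | 49 | 50 | 1.73 | 1.91 | 0.53 | 0.52 |
| 22 | 38 | New Gd | 44 (d) | 6 | 6 | 49 | 50 | 1.73 | 0.88 | 0.53 | 0.56 |
| 22 | 39 | New Gd | 44 (d) | 6 | 6 | 49 | 50 | 1.73 | 1.57 | 0.53 | 0.44 |
| 23 | 40 | New Gd | 28 | 6 | 5 | 19 | 19 | 1.03 | 0.37 | 0.63 | 0.42 |

(a) new, recurrent and enlarging T2 lesions
(b) new and enlarging T2 lesions
(c) combined uniquely active lesions = recurrent and enlarging T2 lesions and new Gd enhancing lesions, avoiding double counting
(d) calculated from the original papers; these differ from those in the SBRCMB paper

Appendix B Partial Derivatives of $E(Y_0^{\mathrm{true}} \mid X_0 = x_0)$

From (4.34), we have:

$$E(Y_0^{\mathrm{true}} \mid X_0 = x_0) = \alpha + \beta \mu_X \left(1 - \frac{\sigma_X^2}{\sigma_X^2 + H_0}\right) + \beta \frac{\sigma_X^2}{\sigma_X^2 + H_0}\, x_0,$$

where

$$H_0 = \phi^2 \left[ \frac{1}{K_0 N_{a0}} E\!\left(\frac{1}{M_{a0}^{\mathrm{true}}}\right) + \frac{1}{K_0 N_{c0}} E\!\left(\frac{1}{M_{c0}^{\mathrm{true}}}\right) \right] = \phi^2 c_0, \text{ say.}$$

Let $L_0 = \dfrac{\sigma_X^2}{\sigma_X^2 + H_0}$, so that

$$E(Y_0^{\mathrm{true}} \mid X_0 = x_0) = \alpha + \beta \mu_X (1 - L_0) + \beta L_0 x_0.$$

So:

$$\frac{\partial E}{\partial \alpha} = 1, \qquad \frac{\partial E}{\partial \beta} = \mu_X (1 - L_0) + L_0 x_0, \qquad \frac{\partial E}{\partial L_0} = -\beta \mu_X + \beta x_0,$$

$$\frac{\partial E}{\partial \mu_X} = \beta (1 - L_0), \qquad \frac{\partial L_0}{\partial \sigma_X^2} = \frac{H_0}{(\sigma_X^2 + H_0)^2}.$$

Then:

$$\frac{\partial E}{\partial \sigma_X^2} = \frac{\partial E}{\partial L_0} \frac{\partial L_0}{\partial \sigma_X^2} = (-\beta \mu_X + \beta x_0) \frac{H_0}{(\sigma_X^2 + H_0)^2},$$

$$\frac{\partial L_0}{\partial H_0} = \frac{-\sigma_X^2}{(\sigma_X^2 + H_0)^2}, \qquad \frac{\partial H_0}{\partial \phi^2} = c_0,$$

$$\frac{\partial E}{\partial \phi^2} = \frac{\partial E}{\partial L_0} \frac{\partial L_0}{\partial H_0} \frac{\partial H_0}{\partial \phi^2} = (-\beta \mu_X + \beta x_0) \frac{-\sigma_X^2}{(\sigma_X^2 + H_0)^2}\, c_0.$$

The vector of partial derivatives $g$ is then given by:

$$g = \left( \frac{\partial E}{\partial \alpha}, \frac{\partial E}{\partial \beta}, \frac{\partial E}{\partial \mu_X}, \frac{\partial E}{\partial \sigma_X^2}, \frac{\partial E}{\partial \phi^2} \right)^T.$$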
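The partial derivatives in Appendix B can be checked numerically by comparing them with central finite differences of E(Y0true | X0 = x0). The sketch below does this in plain Python. Apart from β = 0.62, which is the estimate reported in Chapter 5, every parameter value (α, μX, σX², ϕ², x0 and c0) is a made-up number chosen only to exercise the formulas, and the function names are illustrative rather than taken from the thesis.

```python
def expected_y(alpha, beta, mu_x, sigma2_x, phi2, x0, c0):
    """E(Y0_true | X0 = x0) written in terms of L0, as in Appendix B."""
    h0 = phi2 * c0
    l0 = sigma2_x / (sigma2_x + h0)
    return alpha + beta * mu_x * (1.0 - l0) + beta * l0 * x0


def analytic_gradient(alpha, beta, mu_x, sigma2_x, phi2, x0, c0):
    """Partial derivatives with respect to (alpha, beta, mu_X, sigma_X^2, phi^2)."""
    h0 = phi2 * c0
    denom = (sigma2_x + h0) ** 2
    l0 = sigma2_x / (sigma2_x + h0)
    d_e_d_l0 = beta * (x0 - mu_x)  # equals -beta*mu_X + beta*x0
    return [
        1.0,
        mu_x * (1.0 - l0) + l0 * x0,
        beta * (1.0 - l0),
        d_e_d_l0 * h0 / denom,
        d_e_d_l0 * (-sigma2_x / denom) * c0,
    ]


def numerical_gradient(params, x0, c0, step=1e-6):
    """Central finite differences of expected_y in each of the five parameters."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += step
        down[i] -= step
        grad.append((expected_y(*up, x0, c0) - expected_y(*down, x0, c0)) / (2 * step))
    return grad


# Hypothetical values, except beta = 0.62 (the estimate quoted in Chapter 5).
PARAMS = [0.05, 0.62, -0.5, 0.4, 2.0]  # alpha, beta, mu_X, sigma_X^2, phi^2
X0 = -0.78  # hypothetical observed effect on the surrogate endpoint
C0 = 0.01   # hypothetical value of the constant c0

analytic = analytic_gradient(*PARAMS, X0, C0)
numeric = numerical_gradient(PARAMS, X0, C0)
names = ["alpha", "beta", "mu_X", "sigma_X^2", "phi^2"]
for name, a, n in zip(names, analytic, numeric):
    print(f"dE/d{name}: analytic = {a:.6f}, numerical = {n:.6f}")
```

Agreement between the two columns to several decimal places is a quick sanity check that the chain-rule steps through L0 and H0 have been carried through consistently.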
Item Metadata
Title | Magnetic resonance imaging lesion count as a surrogate endpoint in relapsing-remitting multiple sclerosis clinical trials |
Creator |
Qin, Lang |
Publisher | University of British Columbia |
Date Issued | 2011 |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2011-09-01 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0072082 |
URI | http://hdl.handle.net/2429/37092 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2011-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Aggregated Source Repository | DSpace |