UBC Theses and Dissertations
Magnetic resonance imaging lesion count as a surrogate endpoint in relapsing-remitting multiple sclerosis clinical trials (2011)
Item Metadata
Title | Magnetic resonance imaging lesion count as a surrogate endpoint in relapsing-remitting multiple sclerosis clinical trials |
Creator | Qin, Lang |
Publisher | University of British Columbia |
Date Created | 2011-09-01T20:56:41Z |
Date Issued | 2011-09-01 |
Date | 2011 |
Description | The count of active lesions based on magnetic resonance imaging (MRI) is often used as a potential surrogate endpoint in phase 2 clinical trials for relapsing-remitting multiple sclerosis (RRMS) patients. However, this surrogacy relationship has not been completely validated. In this report, we study whether at the trial level, the MRI lesion count is a good surrogate endpoint for the relapse rate, the usual clinical endpoint for RRMS clinical trials. Two different approaches to assess this surrogacy relationship are applied to the dataset used by Sormani et al. [1] (SBRCMB), which contains the summary results from 23 randomized, placebo-controlled clinical trials in RRMS. The SBRCMB approach uses simple linear regression with weighted least squares estimation, while our more comprehensive approach develops a detailed model for the endpoints and the treatment effects to take into account estimation errors and the correlated contrasts. Both approaches are based only on the summary results from each clinical trial. The shortcomings of the SBRCMB approach are discussed and the results from the two approaches are compared. Both approaches show that the MRI lesion count is a good surrogate endpoint, while our more comprehensive approach shows a nearly perfect surrogacy relationship. When the estimated surrogacy relationship is used to predict the true treatment effect on the clinical endpoint for the trials in the SBRCMB dataset, the approaches give similar point predictions, but the approximate 95% prediction intervals from the comprehensive approach are generally shorter. In practice, the estimated surrogacy relationship based on the comprehensive approach can give a precise prediction for the true treatment effect on the clinical endpoint if the treatment displays a large effect on the surrogate endpoint, but may otherwise lead to an inconclusive result. |
Genre | Thesis/Dissertation |
Type | Text |
Language | Eng |
Collection | Electronic Theses and Dissertations (ETDs) 2008+ |
Date Available | 2011-09-01T20:56:41Z |
DOI | 10.14288/1.0072082 |
Degree | Master of Science - MSc |
Program | Statistics |
Affiliation | Science, Faculty of |
Degree Grantor | University of British Columbia |
Graduation Date | 2011-11 |
Campus | UBCV |
Scholarly Level | Graduate |
URI | http://hdl.handle.net/2429/37092 |
Aggregated Source Repository | DSpace |
Digital Resource Original Record | https://open.library.ubc.ca/collections/24/items/1.0072082/source |
Full Text
Magnetic Resonance Imaging Lesion Count as a Surrogate Endpoint in Relapsing-Remitting Multiple Sclerosis Clinical Trials

by

Lang Qin

B.Sc., Jinan University, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in The Faculty of Graduate Studies (Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August 2011

© Lang Qin, 2011

Abstract

The count of active lesions based on magnetic resonance imaging (MRI) is often used as a potential surrogate endpoint in phase 2 clinical trials for relapsing-remitting multiple sclerosis (RRMS) patients. However, this surrogacy relationship has not been completely validated. In this report, we study whether at the trial level, the MRI lesion count is a good surrogate endpoint for the relapse rate, the usual clinical endpoint for RRMS clinical trials. Two different approaches to assess this surrogacy relationship are applied to the dataset used by Sormani et al. [1] (SBRCMB), which contains the summary results from 23 randomized, placebo-controlled clinical trials in RRMS. The SBRCMB approach uses simple linear regression with weighted least squares estimation, while our more comprehensive approach develops a detailed model for the endpoints and the treatment effects to take into account estimation errors and the correlated contrasts. Both approaches are based only on the summary results from each clinical trial.

The shortcomings of the SBRCMB approach are discussed and the results from the two approaches are compared. Both approaches show that the MRI lesion count is a good surrogate endpoint, while our more comprehensive approach shows a nearly perfect surrogacy relationship. When the estimated surrogacy relationship is used to predict the true treatment effect on the clinical endpoint for the trials in the SBRCMB dataset, the approaches give similar point predictions, but the approximate 95% prediction intervals from the comprehensive approach are generally shorter. In practice, the estimated surrogacy relationship based on the comprehensive approach can give a precise prediction for the true treatment effect on the clinical endpoint if the treatment displays a large effect on the surrogate endpoint, but may otherwise lead to an inconclusive result.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgments
Dedication
1 Introduction
  1.1 What is a Surrogate Endpoint?
  1.2 Surrogate Endpoints in Multiple Sclerosis
  1.3 Outline of the Report
2 Literature Review: Validation of Surrogate Endpoints
  2.1 Importance of Validating a Potential Surrogate Endpoint
  2.2 Methods of Validating Surrogate Endpoints
    2.2.1 The Prentice Operational Criteria for Validation
    2.2.2 Validation in a Single Clinical Trial
    2.2.3 Validation in Multiple Clinical Trials
  2.3 Validation in Multiple Clinical Trials with Individual Data Unavailable
    2.3.1 Review of Daniels and Hughes [2]
    2.3.2 Review of Korn et al. [3]
    2.3.3 Comparison of These Two Approaches
3 Lesion Counts as a Surrogate Endpoint in RRMS: the SBRCMB Approach
  3.1 Introduction and the SBRCMB Dataset
  3.2 The SBRCMB Approach
  3.3 Critique of the SBRCMB Approach
    3.3.1 The Appropriateness of the Weights
    3.3.2 Correlation of the Contrasts
    3.3.3 Influence of Estimation Errors
4 Lesion Counts as a Surrogate Endpoint in RRMS: A More Comprehensive Approach
  4.1 Model for the Single-contrast Clinical Trials
    4.1.1 Model for the True Treatment Effects
    4.1.2 Model for the Observed Annualized Relapse Rate and MRI Lesion Count Per Patient Per Scan
    4.1.3 Model for the Estimated Treatment Effects
  4.2 Model for the Multiple-contrast Clinical Trials
  4.3 Parameter Estimation
  4.4 Comparison between the Comprehensive Approach and the SBRCMB Approach
  4.5 Assessment of the Estimated Surrogacy Relationship in Practice
5 Conclusions and Discussion
Bibliography
Appendices
A The SBRCMB Dataset
B Partial Derivatives of $E(Y_0^{\text{true}} \mid X_0 = x_0)$
List of Tables

Table 3.1 Results of the Sensitivity Study
Table 3.2 Results of the Interaction Study
Table 4.1 Results of the Model Fit
Table 4.2 Comparison of the Approximate 95% Prediction Intervals for $\exp(Y_0^{\text{true}}(x_0))$ for the SBRCMB and Comprehensive Approaches
Table 4.3 Influence of the Sample Size $N_0$ and the Magnitude of the Estimated Treatment Effect on the Surrogate Endpoint on the 95% Prediction Intervals for the True Treatment Effect on the Clinical Endpoint for Trials with $K_0 = 6$ Scans per Patient. The Entries are the Point Predictions and Approximate 95% Prediction Intervals for $\exp(Y_0^{\text{true}}(x_0))$.

List of Figures

Figure 2.1 Scenarios of Perfect (a) and Imperfect (b, c, d) Surrogates
Figure 3.1 Scatter Plot of Estimated Treatment Effects
Figure 3.2 Results of the Validation Study
Figure 3.3 Scatter Plot of (c, 1/w)
Figure 3.4 Scatter Plot of (c, 1/w0)
Figure 4.1 Regression Prediction Lines: the SBRCMB Approach ($y = 0.02 + 0.55x$) and the Comprehensive Approach with $K_0 = 6$ and $N_{a0} = N_{c0} = 50$ ($y = 0.50x$)
Figure 4.2 Regression Prediction Lines: the SBRCMB Approach ($y = 0.02 + 0.55x$) and the Comprehensive Approach with $K_0 = 6$ and $N_{a0} = N_{c0} = \infty$ ($y = 0.08 + 0.62x$)
Figure 4.3 Point Predictions for the 40 Contrasts
Figure 4.4 Comparison of Point Predictions for the 40 Contrasts
Figure 4.5 Comparison of the Approximate 95% Prediction Intervals for $\exp(Y_0^{\text{true}}(x_0))$ for the SBRCMB and Comprehensive Approaches
Figure 4.6 Threshold Value of $\exp(X_0)$ versus Sample Size $N_0$ when a Beneficial Treatment Effect is Observed on the Surrogate Endpoint
Figure 4.7 Threshold Value of $\exp(X_0)$ versus Sample Size $N_0$ when a Negative Treatment Effect is Observed on the Surrogate Endpoint

Acknowledgments

I would like to express my sincerest gratitude to my supervisor, Professor John Petkau. I could never have finished my thesis without his insightful guidance and constant encouragement. I am also grateful for his enlightening teaching in STAT550 and STAT551, from which I started to learn how to think as a statistician. I would like to thank Professor Lang Wu for being my second reader and providing valuable comments.

I would like to thank my best friend Guannan Li, to whom I can always express my sadness when I am depressed. I would like to thank my statistics colleague Yumi Kondo, who always discussed statistics with me and made my days at the department interesting and cheerful. I would also like to thank Jun Chen for being a very nice roommate who put up with my irregular working schedule. Finally, I would like to express my deepest gratitude to my beloved parents. Without their love, I could never have completed my graduate study.

To my family.

Chapter 1 Introduction

1.1 What is a Surrogate Endpoint?

In clinical trials, a clinical endpoint generally refers to the occurrence of a disease, a symptom, a sign or a laboratory abnormality that constitutes one of the target outcomes of the trial. It directly measures how a patient feels, functions or survives and thus is used to determine whether the treatment being studied is beneficial. A surrogate endpoint is an outcome which can be used as a substitute for a clinical endpoint.
When assessing the treatment effect, a surrogate endpoint can be used to generate reliable conclusions instead of using the corresponding clinical endpoint directly. Examples of potential surrogate endpoints include CD4 cell count for HIV-related disease progression in clinical trials of anti-HIV treatments, progression-free survival time for survival time in clinical trials of treatments for advanced ovarian cancer, and serum cholesterol levels for survival in clinical trials of treatments for cardiovascular disease. More examples of potential surrogate endpoints can be found in Burzykowski et al. [4].

Why are surrogate endpoints required? The principal reason is that in many clinical trials, it is difficult to use the desired clinical endpoints directly. The clinical endpoint may be rare, so a large number of patients would be required for a trial with adequate power (e.g., short-term mortality in patients with suspected acute myocardial infarction). The clinical endpoint may need a very long follow-up time to be detected (e.g., survival of patients with early-stage cancers), but too many patients might then be lost to follow-up. The clinical endpoint may also be difficult or costly to measure. In contrast, surrogate endpoints are outcomes that occur more often or are easier to measure. The motivation for using a surrogate endpoint is therefore the possibility of reducing the number of required patients or the required trial duration.

In order to effectively substitute for a formal clinical endpoint, a surrogate endpoint must have the potential to yield unambiguous information about differential treatment effects on a clinical endpoint. The formal definition of a surrogate endpoint is given by Prentice [5] as “a response variable for which a test of the null hypothesis of no relationship to the treatment groups under comparison is also a valid test of the corresponding null hypothesis based on the clinical endpoint”.
The Prentice definition means that if a treatment has an effect on the clinical endpoint, then the treatment also has an effect on the surrogate endpoint, and the converse is also true. Mathematically, if $S$ and $C$ denote the surrogate endpoint and the clinical endpoint respectively, and $Z$ denotes the treatment, then the Prentice definition can be written as:

$$f(S \mid Z) = f(S) \iff f(C \mid Z) = f(C), \qquad (1.1)$$

where $f(X)$ denotes the probability distribution of the random variable $X$ and $f(X \mid Z)$ denotes the probability distribution of $X$ conditional on the value of $Z$.

1.2 Surrogate Endpoints in Multiple Sclerosis

Multiple sclerosis (MS) is a chronic and often disabling disease of the central nervous system. MS affects the ability of nerve cells in the brain and spinal cord to communicate with each other. Nerve cells communicate by sending electrical signals called action potentials down long fibers called axons, which are wrapped in an insulating substance called myelin. In MS, the body's own immune system attacks and damages the myelin. When myelin is lost, the axons can no longer effectively conduct signals. The name multiple sclerosis refers to scars, particularly in the white matter of the brain and spinal cord, which is mainly composed of myelin.

MS results in symptoms including difficulties in movement and coordination, deterioration of sensory functions, and problems with bowel and bladder function, among many others. MS onset usually occurs in young adults, and it is more common in women. Although much is known about the mechanisms involved in the disease process, the cause remains unknown, and there is no known cure for the disease to date. There are several types of MS, characterized by disease progression in terms of severity of disability. Relapsing-remitting MS (RRMS), the most common type, is characterized by unpredictable relapses followed by periods of months to years of relative quiet (remission) with no new signs of disease activity.
Until now, the only accepted primary endpoints for pivotal clinical trials of new treatments for RRMS have been clinical outcomes, including relapse rate and accumulation of permanent disability, usually measured by the Expanded Disability Status Scale (EDSS). There is no fully validated surrogate endpoint for RRMS yet. In RRMS clinical trials, magnetic resonance imaging (MRI) scans of the brain are often used to help monitor patients' health and the progression of their disease. McFarland et al. [6] argue that changes in MS brain lesion patterns determined by MRI scans, which reflect the underlying disease pathology, may be the best candidate for a surrogate endpoint in RRMS.

1.3 Outline of the Report

The objective of our study is to address the question: are changes in brain lesion patterns determined by MRI a good surrogate endpoint for the relapse rate, the clinical endpoint in RRMS clinical trials?

This chapter has provided some background information about surrogate endpoints and MS. Chapter 2 provides a general review of how to validate a potential surrogate endpoint. We first discuss the importance of validation and then review different approaches for the situations where data come from a single clinical trial and where data come from multiple clinical trials. In the situation of multiple clinical trials, we focus on the scenario where only summary statistics for each trial are available. We review the methods adopted in Daniels and Hughes [2] and Korn et al. [3] in detail.

Chapter 3 considers validation in the RRMS setting. The specific potential surrogate endpoint we will focus on is the MRI lesion count, and the corresponding clinical endpoint is the annualized relapse rate. Information is presented on the dataset of Sormani et al. [1] (hereafter referred to as SBRCMB) used to assess the surrogacy relationship. The methodology of SBRCMB is discussed in detail, as well as the potential drawbacks of their approach.
In Chapter 4, we develop a related but different model to assess the surrogacy relationship. We focus on dealing with the measurement error in estimating the surrogate endpoint and the clinical endpoint, and on the context where data are available from several clinical trials, including some having more than two arms. We compare the results from the SBRCMB model and from our model. We also evaluate the prediction ability of the estimated surrogacy relationship to determine whether the surrogate endpoint is useful in practice. Chapter 5 summarizes the overall findings and discusses problems that remain to be investigated.

Chapter 2 Literature Review: Validation of Surrogate Endpoints

2.1 Importance of Validating a Potential Surrogate Endpoint

It is essential to validate a potential surrogate endpoint before using it as the primary outcome in a clinical trial. A surrogate endpoint should be able to assess the treatment effect in a clinical trial, and the result obtained from the surrogate endpoint should be consistent with that obtained from the corresponding clinical endpoint. Inconsistent results will lead to an incorrect conclusion about the treatment effect, and thus to misuse of the treatment in the future, which may be ineffective for, or even harmful to, patients. For example, in some clinical trials for cardiac disorders, blood pressure is used as a surrogate endpoint for the actual survival of a patient. However, some treatments that are useful in lowering a patient's blood pressure have been shown to have no effect in reducing the risk of death from myocardial infarction. More examples of misuse of potential surrogate endpoints can be found in Fleming and DeMets [7].

Most potential surrogate endpoints are prognostic biomarkers, which means there is a strong association between the biomarker and the clinical endpoint at the level of the individual patient.
Such an association reflects a potential biological relationship between the biomarker and the clinical endpoint. However, as many studies have shown, a strong association is not enough. Surrogate endpoints are about assessing treatment effects. This means that, at the trial level, the treatment effect obtained from a surrogate endpoint must reliably predict the treatment effect obtained from the clinical endpoint. Examples of the misuse of prognostic biomarkers as surrogate endpoints can be found in Fleming and DeMets [7].

Focusing on the Prentice definition of a surrogate endpoint (1.1), we require that if a treatment has an effect on a surrogate endpoint, then it also has an effect on the clinical endpoint. Moreover, we also require that if a treatment does not have an effect on the surrogate endpoint, then it does not have an effect on the clinical endpoint either. Biologically, this implies that the surrogate endpoint is on the sole causal pathway of the disease process to the clinical endpoint.

Figure 2.1 illustrates the perfect scenario for a surrogate as well as some imperfect scenarios: $D$, $S$ and $C$ stand for the disease, the surrogate endpoint and the clinical endpoint in a clinical trial respectively, while $Z$ stands for the treatment applied in this clinical trial. Panel (a) shows the situation of a perfect surrogate endpoint, in which $S$ is on the sole causal pathway from $D$ to $C$. So the entire effect of $Z$ on $S$ will extend to $C$, and $Z$ cannot affect $C$ without affecting $S$. Panels (b), (c) and (d) show some situations of imperfect surrogate endpoints. Note that in all three situations, $S$ is associated with $C$ since both are influenced by the same disease process $D$. However, in panel (b), $S$ is not on the causal pathway from $D$ to $C$. In the case illustrated, $Z$ could affect $S$ but not $C$, so $S$ is not a surrogate endpoint for $C$. In panel (c), there are two pathways from $D$ to $C$, and $S$ is on one of them.
If $Z$ affects $C$ only through $X$ on the second pathway, then $S$ is not a surrogate endpoint for $C$; if $Z$ can affect $C$ through both $S$ and $X$, then $S$ is an imperfect surrogate endpoint for $C$. In such a case, an effect of $Z$ on $S$ could imply an effect of $Z$ on $C$. However, since $Z$ can bypass $S$ and still influence $C$ through $X$, it is possible that there is an effect on $C$ but no effect on $S$. On the other hand, the effect of $Z$ on $S$ and the effect of $Z$ on $X$ may counteract each other, leading to no net effect of $Z$ on $C$. In panel (d), it is possible that the effect of $Z$ on $S$ does not extend to $C$, but to $X$ instead. In this case, if there is no treatment effect on $S$, then there is no treatment effect on $C$, but the converse is not always true.

[Figure 2.1: Scenarios of Perfect (a) and Imperfect (b, c, d) Surrogates]

2.2 Methods of Validating Surrogate Endpoints

2.2.1 The Prentice Operational Criteria for Validation

Prentice [5] proposed four operational criteria to validate a potential surrogate endpoint. Recalling his definition of a surrogate endpoint (1.1), $f(S \mid Z) = f(S) \iff f(C \mid Z) = f(C)$, and using the same notation, we can express the Prentice operational criteria as:

$$f(S \mid Z) \neq f(S), \qquad (2.1)$$
$$f(C \mid Z) \neq f(C), \qquad (2.2)$$
$$f(C \mid S) \neq f(C), \qquad (2.3)$$
$$f(C \mid S, Z) = f(C \mid S). \qquad (2.4)$$

Essentially, (2.1) requires that the treatment has an effect on the surrogate endpoint, (2.2) requires that the treatment has an effect on the clinical endpoint, (2.3) requires that different values of the surrogate endpoint result in different values of the clinical endpoint, which means that the surrogate endpoint is a prognostic biomarker, and (2.4) requires that the surrogate endpoint completely captures the dependence of the clinical endpoint on the treatment.

In practice, (2.1) and (2.2) are considered necessary conditions for an outcome to be a surrogate endpoint, but not "actual" validation criteria.
Note that (1.1) is equivalent to $f(S \mid Z) \neq f(S) \iff f(C \mid Z) \neq f(C)$, so (2.1) and (2.2) either hold together or fail together. Criteria (2.3) and (2.4) are the "actual" validation criteria. Usually, (2.3) is examined before (2.4), because a surrogate endpoint is expected to be a good prognostic biomarker. Criterion (2.4) is the essential part of the Prentice operational criteria. It means that the treatment effect on the clinical endpoint can be entirely captured by the surrogate endpoint. A common way to examine (2.4) is to assume a regression model of the form $C = \alpha + \beta Z + \gamma S + \varepsilon$ and to check whether the estimated regression coefficient for $S$ is significantly different from 0 while that for $Z$ is not. For this approach to be valid, one has to believe that the regression model describes the true relationship among $C$, $S$ and $Z$.

Buyse and Molenberghs [8] show that (2.3) and (2.4) are necessary and sufficient conditions to establish (1.1) when the surrogate endpoint of interest is a binary outcome. When the surrogate endpoint is not binary, the criteria are only sufficient but not necessary; that is, if (2.3) and (2.4) are satisfied, then a treatment effect on the clinical endpoint ensures a treatment effect on the surrogate endpoint, but a treatment effect on the surrogate endpoint may not imply a treatment effect on the clinical endpoint. In terms of Figure 2.1, (2.3) and (2.4) exclude situations (b) and (c), but not (d). (In (d), (2.3) holds because both $S$ and $C$ are influenced by $D$, and (2.4) holds because $Z$ cannot affect $C$ without affecting $S$.) Some counterexamples are given in Buyse and Molenberghs [8] and Berger [9].

2.2.2 Validation in a Single Clinical Trial

To check criterion (2.4), one needs to show that the statistical test for the treatment effect on the clinical endpoint is nonsignificant after adjustment for the surrogate endpoint.
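The regression check of (2.4) can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not part of the thesis analysis: it generates one hypothetical two-arm trial with normal errors, uses a hypothetical parameter `direct` for an effect of $Z$ on $C$ that bypasses $S$ (as in panel (c) of Figure 2.1), and fits $C = \alpha + \beta Z + \gamma S + \varepsilon$ by ordinary least squares with NumPy.

```python
import numpy as np


def simulate_trial(n, alpha, gamma, direct, seed=0):
    """Simulate one hypothetical two-arm trial.

    Z is a 0/1 treatment indicator, S = alpha*Z + noise and
    C = gamma*S + direct*Z + noise. With direct = 0 the treatment reaches C
    only through S (panel (a) of Figure 2.1); direct != 0 adds a pathway
    that bypasses S (as in panel (c)).
    """
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n).astype(float)
    s = alpha * z + rng.normal(size=n)
    c = gamma * s + direct * z + rng.normal(size=n)
    return z, s, c


def fit_prentice_regression(z, s, c):
    """OLS fit of C = alpha + beta*Z + gamma*S + e.

    Returns the coefficients (alpha, beta, gamma) and their standard
    errors under the usual homoscedastic normal-error assumption.
    """
    X = np.column_stack([np.ones_like(z), z, s])
    coef, *_ = np.linalg.lstsq(X, c, rcond=None)
    resid = c - X @ coef
    sigma2 = resid @ resid / (len(c) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se


for direct in (0.0, 1.0):
    z, s, c = simulate_trial(n=2000, alpha=1.0, gamma=0.8, direct=direct)
    coef, se = fit_prentice_regression(z, s, c)
    print(f"direct={direct}: beta_Z={coef[1]:.3f} (t={coef[1] / se[1]:.1f}), "
          f"gamma_S={coef[2]:.3f}")
```

With `direct=0` the coefficient for $Z$ should be close to 0 while that for $S$ is clearly nonzero, consistent with (2.4); with `direct=1` the coefficient for $Z$ remains large, so (2.4) is rejected. In a small trial a nonsignificant coefficient for $Z$ could also arise from low power, so the check can reject a poor surrogate more convincingly than it can confirm a good one.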
However, this requirement raises a conceptual difficulty in validation, since a nonsignificant result may simply be due to insufficient power of the statistical test. Hence, (2.4) is useful in rejecting a poor surrogate endpoint (when the statistical test leads to a significant result), but is inadequate to validate a good surrogate endpoint.

To overcome this difficulty, Freedman and Graubard [10] proposed a quantity called the "proportion of the treatment effect explained by the surrogate" (PE) to measure the quality of a potential surrogate. Let $\beta$ and $\beta_S$ be the parameters representing the treatment effect on the clinical endpoint $C$ without and with adjustment for the surrogate endpoint $S$, respectively. Then PE is defined as:

$$PE = \frac{\beta - \beta_S}{\beta} = 1 - \frac{\beta_S}{\beta}. \qquad (2.5)$$

It is expected that $\beta_S = 0$ when the surrogate is perfect; in this case, $PE = 1$. Naturally, PE being closer to 1 implies that the surrogate endpoint explains more of the treatment effect on the clinical endpoint. In practice, $\beta$ and $\beta_S$ are replaced by their estimates, and a two-sided 95% confidence interval for PE is constructed. Freedman and Graubard [10] suggested that the lower limit of the interval should be greater than a critical value, say 0.5, for the surrogate endpoint to be considered useful.

For example, in a clinical trial, let $\alpha$ and $\beta$ denote the treatment effects on the surrogate endpoint and the clinical endpoint respectively, and let $S_j$, $C_j$ and $Z_j$ denote the surrogate endpoint, the clinical endpoint and the treatment received for the $j$th patient. Here, $Z_j$ is an indicator variable, which is either 1 (the $j$th patient is in the active arm) or 0 (the $j$th patient is in the control arm). We often refer to the combination of an active arm and a control arm as a "contrast". So, $\alpha$ and $\beta$ are the treatment effects obtained from the contrast in this clinical trial (i.e., by comparing the active arm and the control arm).
Assume the model:

$$S_j = \mu_S + \alpha Z_j + \varepsilon_{Sj}, \qquad C_j = \mu_C + \beta Z_j + \varepsilon_{Cj}, \qquad (2.6)$$

where the error terms $\varepsilon_{Sj}$ and $\varepsilon_{Cj}$ have a bivariate normal distribution with mean 0 and variance-covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_{SS} & \sigma_{SC} \\ \sigma_{SC} & \sigma_{CC} \end{pmatrix}. \qquad (2.7)$$

Then one can obtain the conditional distribution of $C_j$ given $S_j$, which is parameterized as:

$$C_j \mid S_j = \mu + \beta_S Z_j + \gamma S_j + \varepsilon_j, \qquad (2.8)$$

where $\beta_S = \beta - \frac{\sigma_{SC}}{\sigma_{SS}}\alpha$. In this model, PE is given by:

$$PE = 1 - \frac{\beta_S}{\beta} = \frac{\sigma_{SC}}{\sigma_{SS}} \cdot \frac{\alpha}{\beta}. \qquad (2.9)$$

Despite its description as the "proportion" of the treatment effect explained by the surrogate endpoint, PE is not actually a proportion. Molenberghs et al. [11] point out that PE is not restricted to the range between 0 and 1 and discuss the problems in interpreting PE. For instance, the PE defined by (2.9) can take any value on the real line, because the range of $\alpha/\beta$ is unrestricted.

Buyse and Molenberghs [8] propose two quantities to replace PE in validating a potential surrogate endpoint. The first is the "adjusted association" $\rho_A$, a measure of the association between the surrogate endpoint and the clinical endpoint after adjustment for the treatment. In terms of model (2.6), $\rho_A$ can be expressed as:

$$\rho_A = \frac{\sigma_{SC}}{\sqrt{\sigma_{SS}\,\sigma_{CC}}}. \qquad (2.10)$$

The adjusted association $\rho_A$ measures how well a surrogate endpoint performs at the level of the individual patient. In the above model, if $\rho_A = 1$, then the variance of $\varepsilon_j$ in (2.8) is 0. So $C_j$ becomes a linear function of $S_j$, which means that given the value of $S_j$, one can determine the value of $C_j$ without error. In this case, the surrogate endpoint and the clinical endpoint contain equivalent information about the treatment; hence one can determine the treatment effect on the clinical endpoint exactly from the treatment effect on the surrogate endpoint, and the Prentice definition (1.1) is satisfied [12].
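The closed-form expressions (2.8) to (2.10) are simple enough to evaluate directly. The following minimal sketch (the function names are hypothetical, chosen only for this illustration) computes $\beta_S$, PE, $\rho_A$ and the variance of $\varepsilon_j$ in (2.8), which equals $\sigma_{CC} - \sigma_{SC}^2/\sigma_{SS}$ under model (2.6). The parameter values are arbitrary; the second call shows PE exceeding 1, in line with the point made by Molenberghs et al. [11].

```python
import math


def beta_adjusted(alpha, beta, s_ss, s_sc):
    """Treatment coefficient beta_S after adjusting for S, as in (2.8)."""
    return beta - (s_sc / s_ss) * alpha


def proportion_explained(alpha, beta, s_ss, s_sc):
    """PE = 1 - beta_S / beta, equal to (s_sc / s_ss) * (alpha / beta) by (2.9)."""
    return 1.0 - beta_adjusted(alpha, beta, s_ss, s_sc) / beta


def adjusted_association(s_ss, s_sc, s_cc):
    """Adjusted association rho_A from (2.10)."""
    return s_sc / math.sqrt(s_ss * s_cc)


def residual_variance(s_ss, s_sc, s_cc):
    """Variance of e_j in (2.8): s_cc - s_sc**2 / s_ss = s_cc * (1 - rho_A**2)."""
    return s_cc - s_sc ** 2 / s_ss


# Arbitrary illustrative parameters (not estimates from any trial).
print(proportion_explained(alpha=1.0, beta=0.5, s_ss=1.0, s_sc=0.5))  # 1.0
print(adjusted_association(s_ss=1.0, s_sc=0.5, s_cc=1.0))            # 0.5
print(proportion_explained(alpha=2.0, beta=0.5, s_ss=1.0, s_sc=0.5))  # 2.0
print(residual_variance(s_ss=1.0, s_sc=1.0, s_cc=1.0))               # 0.0 (rho_A = 1)
```

The first parameter set gives $PE = 1$ even though $\rho_A$ is only 0.5, which illustrates that PE (a treatment-effect quantity) and $\rho_A$ (an individual-level association) measure different things.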
The second quantity Buyse and Molenberghs [8] propose is the "relative effect" (RE), which is defined as the ratio of the treatment effect on the clinical endpoint to the treatment effect on the surrogate endpoint. In terms of model (2.6), RE is defined as:

$$RE = \frac{\beta}{\alpha}. \qquad (2.11)$$

The relative effect RE is useful in predicting the treatment effect on the clinical endpoint from that on the surrogate endpoint. In practice, $\alpha$ and $\beta$ are replaced by their estimates and a confidence interval for RE is constructed. A narrow confidence interval results in a good prediction of the treatment effect on the clinical endpoint. For example, based on the data from the current trial, one can obtain $\widehat{RE} = \hat\beta / \hat\alpha$. For a future trial, one can estimate its treatment effect on the surrogate endpoint as $\hat\alpha_0$. Then, the treatment effect on the clinical endpoint in that future trial can be estimated as $\hat\beta_0 = \hat\alpha_0 \widehat{RE}$. However, to make use of RE for such predictions, it is necessary to assume that the relationship (2.11) also holds in the future trial. This assumption may not be correct and cannot be checked in a single clinical trial.

2.2.3 Validation in Multiple Clinical Trials

When multiple clinical trials study the efficacy of the same treatment, or of treatments with a similar mechanism, on the same disease, the validation procedure can use the information from these multiple trials. In this section, we review the methods used when individual patient level data are available from each trial. In the next section, we discuss the methods used when only summary information from each trial is available.

Buyse et al. [13] consider the situation where individual patient level data are available and the surrogate endpoint and the clinical endpoint are both continuous and normally distributed. Let $S_{ij}$, $C_{ij}$ and $Z_{ij}$ denote the surrogate endpoint, the clinical endpoint and the treatment received for the $j$th patient from the $i$th trial.
Assume the model:

$$S_{ij} = \mu_S + m_{Si} + \alpha Z_{ij} + a_i Z_{ij} + \varepsilon_{Sij}, \qquad C_{ij} = \mu_C + m_{Ci} + \beta Z_{ij} + b_i Z_{ij} + \varepsilon_{Cij}, \tag{2.12}$$

where $\mu_S$ and $\mu_C$ are fixed intercepts, $\alpha$ and $\beta$ are the fixed effects of treatment on the surrogate endpoint and the clinical endpoint, $m_{Si}$ and $m_{Ci}$ are random intercepts, and $a_i$ and $b_i$ are the random effects of treatment on the endpoints in trial $i$. The error terms $\varepsilon_{Sij}$ and $\varepsilon_{Cij}$ are assumed to follow a joint normal distribution with mean 0 and variance-covariance matrix given by (2.7), and the random effects $(m_{Si}, m_{Ci}, a_i, b_i)^T$ are assumed to follow a joint normal distribution with mean 0 and variance-covariance matrix $D$ given by:

$$D = \begin{pmatrix} d_{SS} & d_{SC} & d_{Sa} & d_{Sb} \\ d_{SC} & d_{CC} & d_{Ca} & d_{Cb} \\ d_{Sa} & d_{Ca} & d_{aa} & d_{ab} \\ d_{Sb} & d_{Cb} & d_{ab} & d_{bb} \end{pmatrix}. \tag{2.13}$$

Buyse et al. [13] suggest evaluating the surrogate endpoint at two different levels: the trial level and the individual patient level. At the trial level, the surrogacy relationship is assessed by the conditional variance of $\beta + b_i$ given $m_{Si}$ and $a_i$. From (2.12) and (2.13), the conditional variance is given by:

$$\operatorname{Var}(\beta + b_i \mid m_{Si}, a_i) = d_{bb} - \begin{pmatrix} d_{Sb} \\ d_{ab} \end{pmatrix}^T \begin{pmatrix} d_{SS} & d_{Sa} \\ d_{Sa} & d_{aa} \end{pmatrix}^{-1} \begin{pmatrix} d_{Sb} \\ d_{ab} \end{pmatrix}. \tag{2.14}$$

This conditional variance describes how precisely one can predict the treatment effect on the clinical endpoint given the treatment effect on the surrogate endpoint in a given trial. Correspondingly, a proportion-type measure of “trial level” surrogacy is defined as:

$$R^2_{\text{trial}} = \frac{\begin{pmatrix} d_{Sb} \\ d_{ab} \end{pmatrix}^T \begin{pmatrix} d_{SS} & d_{Sa} \\ d_{Sa} & d_{aa} \end{pmatrix}^{-1} \begin{pmatrix} d_{Sb} \\ d_{ab} \end{pmatrix}}{d_{bb}}. \tag{2.15}$$

Moreover, one can quantify the relationship between the treatment effects on the surrogate endpoint and on the clinical endpoint by using the conditional expectation of $\beta + b_i$ given $m_{Si}$ and $a_i$, which is:

$$E(\beta + b_i \mid m_{Si}, a_i) = \beta + \begin{pmatrix} d_{Sb} \\ d_{ab} \end{pmatrix}^T \begin{pmatrix} d_{SS} & d_{Sa} \\ d_{Sa} & d_{aa} \end{pmatrix}^{-1} \begin{pmatrix} m_{Si} \\ a_i \end{pmatrix}. \tag{2.16}$$

Equation (2.16) characterizes how the treatment effect on the clinical endpoint changes with the treatment effect on the surrogate endpoint.
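Equations (2.14) to (2.16) share one quadratic form in the blocks of $D$. The Python sketch below evaluates it for a hypothetical $D$ (invented for illustration) in which $b_i$ is correlated only with $a_i$, with correlation 0.8, so that $R^2_{\text{trial}} = 0.8^2 = 0.64$. It uses `numpy.linalg.solve` rather than forming the explicit inverse.

```python
import numpy as np

# Positions of (m_Si, m_Ci, a_i, b_i) in the random-effects vector of (2.13).
M_S, M_C, A, B = 0, 1, 2, 3

def _blocks(D):
    """Extract the pieces of D used in (2.14)-(2.16)."""
    D = np.asarray(D, dtype=float)
    cond = [M_S, A]                 # conditioning effects: m_Si and a_i
    d = D[cond, B]                  # (d_Sb, d_ab)
    K = D[np.ix_(cond, cond)]       # [[d_SS, d_Sa], [d_Sa, d_aa]]
    return D, d, K

def trial_level_summary(D):
    """Return (conditional variance (2.14), R^2_trial (2.15))."""
    D, d, K = _blocks(D)
    explained = d @ np.linalg.solve(K, d)
    return D[B, B] - explained, explained / D[B, B]

def predict_clinical_effect(beta, D, m_si, a_i):
    """Conditional expectation (2.16) of beta + b_i given (m_Si, a_i)."""
    _, d, K = _blocks(D)
    return beta + d @ np.linalg.solve(K, np.array([m_si, a_i], dtype=float))

# Hypothetical random-effects covariance: b_i correlated with a_i only.
D_example = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
])
cond_var, r2_trial = trial_level_summary(D_example)
pred = predict_clinical_effect(beta=-0.5, D=D_example, m_si=0.3, a_i=-1.0)
```

With these values the conditional variance is $1 - 0.64 = 0.36$ and the predicted clinical effect is $-0.5 + 0.8 \times (-1) = -1.3$; the random intercept $m_{Si}$ contributes nothing because $d_{Sb} = 0$ here.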
Given a new trial, after estimating $\hat m_{Si}$ and $\hat a_i$ from its surrogate endpoint data, one can predict the expected treatment effect on the clinical endpoint through (2.16). Note that if we have only one trial, we are not able to characterize this relationship.

At the individual patient level, the surrogacy relationship is evaluated using the adjusted association $\rho_A$ from the single-trial situation. The conditional variance of $C_{ij}$ given $S_{ij}$ and the random effects is $\sigma_{CC} - \sigma_{SC}^2\sigma_{SS}^{-1}$. Thus, a proportion-type measure of “individual level” surrogacy is defined as:

$$R^2_{\text{ind}} = \rho_A^2 = \frac{\sigma_{SC}^2}{\sigma_{SS}\,\sigma_{CC}}. \tag{2.17}$$

A surrogate endpoint is considered perfect when both $R^2_{\text{trial}}$ and $R^2_{\text{ind}}$ equal 1. Large values of $R^2_{\text{trial}}$ imply precise prediction of the treatment effect on the clinical endpoint, while large values of $R^2_{\text{ind}}$ imply a strong association between the surrogate endpoint and the clinical endpoint, which is useful in patient management. It is possible for $R^2_{\text{trial}}$ to be large while $R^2_{\text{ind}}$ is small, or vice versa.

2.3 Validation in Multiple Clinical Trials with Individual Data Unavailable

In some contexts, only summary data from each trial, not the individual patient data, is available. For example, only the estimated treatment effects on the endpoints and the corresponding estimated standard errors may be reported, not the outcomes of each patient. Then the surrogacy relationship can only be evaluated at the trial level. Since we don’t know the outcomes of each patient, we cannot evaluate the strength of the association between the surrogate endpoint and the clinical endpoint at the individual patient level (e.g., calculate $R^2_{\text{ind}}$ in (2.17)). However, we are still able to assess the relationship between the treatment effect on the clinical endpoint and that on the surrogate endpoint.
When only summary results from each trial are available, caution must be taken in the validation procedure because these summary results are only “estimates”, which differ from the “true” quantities. For example, the estimated treatment effect on an endpoint from one trial differs from the true treatment effect on that endpoint in this trial. The true treatment effect is the effect that would be obtained if the clinical trial included an infinite number of patients. In practice, because the number of patients is limited, there are always non-negligible estimation errors, that is, differences between the estimated and the true effects. Appropriately modelling these estimation errors is important in assessing surrogacy relationships at the trial level. In the following subsections, we review papers by Daniels and Hughes [2] (DH, hereafter) and Korn et al. [3] (KAM, hereafter), in which models are constructed to evaluate surrogacy relationships in multiple clinical trials when individual patient level data is unavailable.

2.3.1 Review of Daniels and Hughes [2]

Suppose $N$ trials are used to analyze the performance of the surrogate endpoint of interest. In the $i$th trial, denote the true treatment effects on the surrogate endpoint and on the clinical endpoint as $X_i^{\text{true}}$ and $Y_i^{\text{true}}$ respectively. Correspondingly, let $X_i$ and $Y_i$ denote their estimates, i.e. the summary results obtained from the $i$th trial. Generally, unless the number of patients in the $i$th trial is very large, $X_i$ and $Y_i$ differ from $X_i^{\text{true}}$ and $Y_i^{\text{true}}$. Given the $i$th trial, $X_i$ is assumed to be normally distributed with mean $X_i^{\text{true}}$ and variance $\delta_i^2$, and $Y_i$ is assumed to be normally distributed with mean $Y_i^{\text{true}}$ and variance $\sigma_i^2$. Furthermore, the correlation between $X_i$ and $Y_i$ is assumed to be $\rho_i$. Here, $\delta_i^2$ and $\sigma_i^2$ represent the magnitude of the estimation errors in the $i$th trial, and $\rho_i$ represents the correlation between the estimation errors in $X_i$ and $Y_i$.
In mathematical form:

$$\begin{pmatrix} Y_i \\ X_i \end{pmatrix} \Bigg| \begin{pmatrix} Y_i^{\text{true}} \\ X_i^{\text{true}} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} Y_i^{\text{true}} \\ X_i^{\text{true}} \end{pmatrix}, \begin{pmatrix} \sigma_i^2 & \rho_i\sigma_i\delta_i \\ \rho_i\sigma_i\delta_i & \delta_i^2 \end{pmatrix} \right). \tag{2.18}$$

The surrogacy relationship of interest is the relationship between the true treatment effects $X_i^{\text{true}}$ and $Y_i^{\text{true}}$. DH assume the following structure:

$$Y_i^{\text{true}} \mid X_i^{\text{true}} \sim N(\alpha + \beta X_i^{\text{true}}, \tau^2). \tag{2.19}$$

Here, $\beta$ measures the association between the true treatment effects on the clinical and the surrogate endpoints. If $\beta = 0$, then there is no surrogacy relationship. When $\beta \neq 0$, a perfect surrogacy relationship also requires that $\alpha = 0$, so that a treatment having no effect on the surrogate endpoint implies no effect on the clinical endpoint. Having $\alpha \neq 0$ implies that there is a treatment effect on the clinical endpoint unexplained by the surrogate endpoint. The variance $\tau^2$ represents the uncertainty in using $X_i^{\text{true}}$ to predict $Y_i^{\text{true}}$. If $\tau^2 = 0$, then $Y_i^{\text{true}}$ is perfectly determined once $X_i^{\text{true}}$ is given.

At this stage, DH assume the $X_i^{\text{true}}$s are fixed quantities. They treat the $X_i^{\text{true}}$s as fixed rather than random to avoid having to propose specific distributions for them, which they think may not be appropriate. (Later, though, they put very flat prior distributions on the $X_i^{\text{true}}$s when estimating the model parameters in the Bayesian framework.) Then, combining (2.18) and (2.19), we obtain the bivariate normal model for $Y_i$ and $X_i$:

$$\begin{pmatrix} Y_i \\ X_i \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \alpha + \beta X_i^{\text{true}} \\ X_i^{\text{true}} \end{pmatrix}, \begin{pmatrix} \sigma_i^2 + \tau^2 & \rho_i\sigma_i\delta_i \\ \rho_i\sigma_i\delta_i & \delta_i^2 \end{pmatrix} \right). \tag{2.20}$$

In some clinical trials, there may be more than one active arm in addition to the control arm. A common situation is that different patients receive different dosage levels of a treatment. For example, if a treatment is applied at 2 dosage levels, then the clinical trial consists of 3 arms: patients on the first arm receive the treatment at dosage level one, patients on the second arm receive the treatment at dosage level two, and patients on the third arm receive the control.
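Since the combination of any active arm with the control arm yields a contrast, such a clinical trial consists of 2 contrasts. From a clinical trial with multiple contrasts, we obtain multiple estimated treatment effects on both endpoints. Suppose there are 3 arms in the $i$th trial, which generate 2 contrasts. Let $Y_{i1}$ and $X_{i1}$ be the estimated treatment effects on the clinical and surrogate endpoints from the first contrast, and $Y_{i2}$ and $X_{i2}$ be those from the second contrast. Correspondingly, let $Y_{i1}^{\text{true}}$, $X_{i1}^{\text{true}}$, $Y_{i2}^{\text{true}}$ and $X_{i2}^{\text{true}}$ be the true treatment effects. Then model (2.18) can be generalized to:

$$\begin{pmatrix} Y_{i1} \\ X_{i1} \\ Y_{i2} \\ X_{i2} \end{pmatrix} \Bigg| \begin{pmatrix} Y_{i1}^{\text{true}} \\ X_{i1}^{\text{true}} \\ Y_{i2}^{\text{true}} \\ X_{i2}^{\text{true}} \end{pmatrix} \sim N_4\!\left( \begin{pmatrix} Y_{i1}^{\text{true}} \\ X_{i1}^{\text{true}} \\ Y_{i2}^{\text{true}} \\ X_{i2}^{\text{true}} \end{pmatrix}, \begin{pmatrix} \sigma_{i1}^2 & \rho_{i11}\sigma_{i1}\delta_{i1} & \rho_{iy}\sigma_{i1}\sigma_{i2} & \rho_{i12}\sigma_{i1}\delta_{i2} \\ \rho_{i11}\sigma_{i1}\delta_{i1} & \delta_{i1}^2 & \rho_{i21}\delta_{i1}\sigma_{i2} & \rho_{ix}\delta_{i1}\delta_{i2} \\ \rho_{iy}\sigma_{i1}\sigma_{i2} & \rho_{i21}\delta_{i1}\sigma_{i2} & \sigma_{i2}^2 & \rho_{i22}\sigma_{i2}\delta_{i2} \\ \rho_{i12}\sigma_{i1}\delta_{i2} & \rho_{ix}\delta_{i1}\delta_{i2} & \rho_{i22}\sigma_{i2}\delta_{i2} & \delta_{i2}^2 \end{pmatrix} \right). \tag{2.21}$$

The off-diagonal blocks of covariance terms in (2.21) are allowed to be non-zero, reflecting possible correlations between the two pairs of estimated treatment effects, which arise because both pairs involve comparisons with the same control arm. Also, assuming $X_{i1}^{\text{true}}$ and $X_{i2}^{\text{true}}$ are fixed, (2.19) is generalized (M. J. Daniels, personal communication) as:

$$\begin{pmatrix} Y_{i1}^{\text{true}} \\ Y_{i2}^{\text{true}} \end{pmatrix} \Bigg| \begin{pmatrix} X_{i1}^{\text{true}} \\ X_{i2}^{\text{true}} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \alpha + \beta X_{i1}^{\text{true}} \\ \alpha + \beta X_{i2}^{\text{true}} \end{pmatrix}, \begin{pmatrix} \tau^2 & 0 \\ 0 & \tau^2 \end{pmatrix} \right). \tag{2.22}$$

From (2.22), we can see that the marginal distributions of $Y_{i1}^{\text{true}}$ and $Y_{i2}^{\text{true}}$ have the same form. This is because all the treatments included in the analysis have a similar mechanism of action; whether two contrasts come from one trial or from different trials, they should reflect the same surrogacy relationship. DH assume the covariance between $Y_{i1}^{\text{true}}$ and $Y_{i2}^{\text{true}}$ is 0 in (2.22). In principle, this covariance could be non-zero, in which case the 0 in (2.22) would be replaced by a non-zero parameter.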
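Integrating out $Y_{i1}^{\text{true}}$ and $Y_{i2}^{\text{true}}$ under (2.22) adds $\tau^2$ to the variance of each $Y_{ij}$ in (2.21), while the zero covariance assumed in (2.22) adds nothing to the $(Y_{i1}, Y_{i2})$ cross term. The Python sketch below builds the resulting mean and covariance for hypothetical values of all within-trial variances and correlations (the function names are ours), and the checks confirm that each diagonal $2 \times 2$ block reproduces the single-contrast model (2.20).

```python
import numpy as np

def dh_within_cov(s1, d1, s2, d2, r11, r22, r12, r21, ry, rx):
    """Within-trial covariance (2.21) of (Y_i1, X_i1, Y_i2, X_i2).

    s1, s2 are sigma_i1, sigma_i2 (clinical); d1, d2 are delta_i1, delta_i2 (surrogate).
    """
    return np.array([
        [s1**2,         r11 * s1 * d1, ry * s1 * s2,  r12 * s1 * d2],
        [r11 * s1 * d1, d1**2,         r21 * d1 * s2, rx * d1 * d2],
        [ry * s1 * s2,  r21 * d1 * s2, s2**2,         r22 * s2 * d2],
        [r12 * s1 * d2, rx * d1 * d2,  r22 * s2 * d2, d2**2],
    ])

def dh_marginal_moments(alpha, beta, tau_sq, x1_true, x2_true, within):
    """Mean and covariance of (Y_i1, X_i1, Y_i2, X_i2) after integrating out the
    true clinical effects with (2.22): each Y gets an independent tau^2 term."""
    mean = np.array([alpha + beta * x1_true, x1_true,
                     alpha + beta * x2_true, x2_true])
    cov = within + np.diag([tau_sq, 0.0, tau_sq, 0.0])
    return mean, cov

# Hypothetical within-trial standard deviations and correlations.
s1, d1, s2, d2 = 0.30, 0.40, 0.35, 0.45
corr = dict(r11=0.3, r22=0.3, r12=0.1, r21=0.1, ry=0.4, rx=0.4)
within = dh_within_cov(s1, d1, s2, d2, **corr)
mean, cov = dh_marginal_moments(alpha=0.0, beta=0.5, tau_sq=0.05,
                                x1_true=-1.0, x2_true=-2.0, within=within)
```

A similar pattern extends to trials with 3 or more contrasts: one $\tau^2$ on each clinical-effect diagonal entry.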
Combining (2.21) and (2.22), we get:

$$\begin{pmatrix} Y_{i1} \\ X_{i1} \\ Y_{i2} \\ X_{i2} \end{pmatrix} \sim N_4\!\left( \begin{pmatrix} \alpha + \beta X_{i1}^{\text{true}} \\ X_{i1}^{\text{true}} \\ \alpha + \beta X_{i2}^{\text{true}} \\ X_{i2}^{\text{true}} \end{pmatrix}, \begin{pmatrix} \sigma_{i1}^2 + \tau^2 & \rho_{i11}\sigma_{i1}\delta_{i1} & \rho_{iy}\sigma_{i1}\sigma_{i2} & \rho_{i12}\sigma_{i1}\delta_{i2} \\ \rho_{i11}\sigma_{i1}\delta_{i1} & \delta_{i1}^2 & \rho_{i21}\delta_{i1}\sigma_{i2} & \rho_{ix}\delta_{i1}\delta_{i2} \\ \rho_{iy}\sigma_{i1}\sigma_{i2} & \rho_{i21}\delta_{i1}\sigma_{i2} & \sigma_{i2}^2 + \tau^2 & \rho_{i22}\sigma_{i2}\delta_{i2} \\ \rho_{i12}\sigma_{i1}\delta_{i2} & \rho_{ix}\delta_{i1}\delta_{i2} & \rho_{i22}\sigma_{i2}\delta_{i2} & \delta_{i2}^2 \end{pmatrix} \right). \tag{2.23}$$

For a clinical trial with 3 or more contrasts, a similar extension can be applied.

DH assume a joint normal structure for the summary results (estimated treatment effects) from each trial included in the study. To estimate the model parameters, they adopt a Bayesian approach. In the estimation procedure, all the within-trial variances and correlations are assumed known and replaced by their estimates. The variance estimates for each trial are obtained from the summary results of that trial. For the correlation estimates, if the individual patient level data for a trial is available, the correlation estimates for this trial are calculated from the individual patient data. Otherwise, the correlation estimates for that trial are set to the average of the correlation estimates from the trials for which individual patient level data is available. In the Bayesian procedure, priors are then placed on $\alpha$, $\beta$, $\tau^2$ and all the true treatment effects on the surrogate endpoint (i.e. $X_i^{\text{true}}$ in single-contrast trials and the $X_{ij}^{\text{true}}$s in multiple-contrast trials).

To assess the surrogacy relationship, DH propose examining whether the 95% credible intervals for $\alpha$, $\beta$ and $\tau^2$ exclude 0. DH also suggest computing Bayes factors [14] to test whether $\alpha$, $\beta$ and $\tau^2$ are 0. If the tests reject the null hypothesis $\beta = 0$ and do not reject the null hypotheses $\alpha = 0$ and $\tau^2 = 0$, then the surrogacy relationship is considered to be validated.

2.3.2 Review of Korn et al. [3]

KAM discuss different models to assess the surrogacy relationship for two different types of clinical trials. One type of trial involves unordered treatment arms (i.e.
there is no control arm in the trial), and the other type of trial involves ordered treatment arms (i.e. there is one control arm in the trial). Since the dataset we use in the next chapter consists only of ordered trials, and also to make KAM’s model comparable with DH’s model, we only discuss their model for ordered trials.

In contrast to DH, KAM start their model at the arm level instead of at the contrast level. In the $i$th clinical trial, let $C_{ij}$ and $S_{ij}$ be the observed clinical endpoint and the observed surrogate endpoint from the $j$th arm, where $j = 0, 1, 2, \ldots$ ($j = 0$ represents the control arm in the trial). Similarly, let $C_{ij}^{\text{true}}$ and $S_{ij}^{\text{true}}$ be the true clinical and surrogate endpoints. KAM’s model begins by describing the errors in estimating the endpoints. Let $e_{ij}$ and $f_{ij}$ denote the estimation errors in the surrogate endpoint and the clinical endpoint respectively. Then:

$$S_{ij} = S_{ij}^{\text{true}} + e_{ij}, \qquad C_{ij} = C_{ij}^{\text{true}} + f_{ij}. \tag{2.24}$$

Since the estimation errors arise in different arms with different patients, they are assumed to be independent. KAM further assume that $e_{ij} \sim N(0, \delta_{ij}^2)$ and $f_{ij} \sim N(0, \sigma_{ij}^2)$, all independent; in particular, $e_{ij}$ and $f_{ij}$ are independent. (This assignment, with $\delta^2$ for the surrogate endpoint and $\sigma^2$ for the clinical endpoint, matches the DH notation and the variances appearing in (2.29) and (2.32).)

As a next step, KAM model $S_{ij}^{\text{true}}$ and $C_{ij}^{\text{true}}$. Let $\mu_i$ represent the expected level of the true surrogate endpoint on the control arm, and $\mu_S$ the expected difference in the true surrogate endpoint between the active and control arms. Let $m_{ij}$ be the random effect representing the uncertainty in the true surrogate endpoint for each arm. KAM express $S_{i0}^{\text{true}}$ and $S_{ij}^{\text{true}}$ as:

$$S_{i0}^{\text{true}} = \mu_i + m_{i0} \quad\text{and}\quad S_{ij}^{\text{true}} = \mu_i + \mu_S + m_{ij}, \quad j \neq 0. \tag{2.25}$$

KAM assume $m_{i0} \sim N(0, \lambda_0^2)$, $m_{ij} \sim N(0, \lambda^2)$ ($j \neq 0$), and that all the $m_{ij}$s are independent. Note that although the $m_{ij}$s ($j \neq 0$) are the random effects for different active arms, they are assumed to have the same distribution. Similarly, all the $m_{i0}$s are assumed to have the same distribution.
Furthermore, $m_{ij}$ is assumed to be independent of $e_{ij}$ and $f_{ij}$, which means the estimation errors are not affected by the values of the true endpoints.

KAM assume a linear relationship between $C_{ij}^{\text{true}}$ and $S_{ij}^{\text{true}}$; specifically:

$$C_{i0}^{\text{true}} = \alpha_i + \beta S_{i0}^{\text{true}} + \gamma_{i0} \quad\text{and}\quad C_{ij}^{\text{true}} = \alpha_i + \mu_C + \beta S_{ij}^{\text{true}} + \gamma_{ij}, \quad j \neq 0, \tag{2.26}$$

where $\beta$ represents the linear relationship between $C_{ij}^{\text{true}}$ and $S_{ij}^{\text{true}}$, and $\alpha_i$ and $\alpha_i + \mu_C$ are the intercepts in the control arm and the active arms respectively. Here, $\mu_C$ represents the expected difference in the clinical endpoint between the active and control arms that cannot be explained by the influence of the true surrogate endpoint on the true clinical endpoint. The random effects $\gamma_{ij}$ account for the fact that $C_{ij}^{\text{true}}$ and $S_{ij}^{\text{true}}$ are not perfectly linearly related, and are assumed to be independent and normally distributed with mean 0 and variance $\tau^2/2$. Note that all the $\gamma_{ij}$s are assumed to have the same distribution even though they come from different arms. Since the $\gamma_{ij}$s are not estimation errors, $\gamma_{ij}$, $e_{ij}$ and $f_{ij}$ are assumed to be independent.

The treatment effect is estimated as the difference between the endpoints from the active arm and from the control arm. Let $X_{ij} = S_{ij} - S_{i0}$ and $Y_{ij} = C_{ij} - C_{i0}$ denote the estimated treatment effects on the surrogate and on the clinical endpoints respectively ($j \neq 0$). Correspondingly, let $X_{ij}^{\text{true}} = S_{ij}^{\text{true}} - S_{i0}^{\text{true}}$ and $Y_{ij}^{\text{true}} = C_{ij}^{\text{true}} - C_{i0}^{\text{true}}$ denote the true treatment effects. From (2.24), (2.25) and (2.26), we have:

$$\begin{cases} X_{ij} = X_{ij}^{\text{true}} + (e_{ij} - e_{i0}) \\ Y_{ij} = Y_{ij}^{\text{true}} + (f_{ij} - f_{i0}) \end{cases} \quad\text{where}\quad \begin{cases} X_{ij}^{\text{true}} = \mu_S + (m_{ij} - m_{i0}) \\ Y_{ij}^{\text{true}} = \mu_C + \beta X_{ij}^{\text{true}} + (\gamma_{ij} - \gamma_{i0}) \end{cases} \tag{2.27}$$

From (2.27), we obtain:

$$E(Y_{ij}^{\text{true}} \mid X_{ij}^{\text{true}}) = \mu_C + \beta X_{ij}^{\text{true}}, \qquad \operatorname{Var}(Y_{ij}^{\text{true}} \mid X_{ij}^{\text{true}}) = \tau^2, \tag{2.28}$$

which describes the surrogacy relationship between the true treatment effects on the clinical endpoint and on the surrogate endpoint.
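The step from (2.27) to (2.28) relies on $\gamma_{ij}$ and $\gamma_{i0}$ each contributing $\tau^2/2$. A quick Monte Carlo check in Python, with hypothetical parameter values and only the true effects simulated (no estimation error), recovers $\beta$, $\mu_C$ and $\tau^2$ by ordinary least squares of $Y^{\text{true}}$ on $X^{\text{true}}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000                                   # simulated active-vs-control contrasts
mu_S, mu_C, beta = -1.0, 0.1, 0.5             # hypothetical fixed effects
lam0_sq, lam_sq, tau_sq = 0.1, 0.2, 0.04      # hypothetical variance components

m_i0 = rng.normal(0.0, np.sqrt(lam0_sq), n)     # control-arm random effect
m_i1 = rng.normal(0.0, np.sqrt(lam_sq), n)      # active-arm random effect
g_i0 = rng.normal(0.0, np.sqrt(tau_sq / 2), n)  # gamma_i0 with variance tau^2 / 2
g_i1 = rng.normal(0.0, np.sqrt(tau_sq / 2), n)  # gamma_i1 with variance tau^2 / 2

x_true = mu_S + (m_i1 - m_i0)                   # X_ij^true from (2.27)
y_true = mu_C + beta * x_true + (g_i1 - g_i0)   # Y_ij^true from (2.27)

# Least-squares line of Y^true on X^true; np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x_true, y_true, 1)
resid_var = np.var(y_true - (intercept + slope * x_true))
```

The residual variance settles near $\tau^2$ rather than $\tau^2/2$, because the contrast differences two independent $\gamma$ terms.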
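The interpretations of $\mu_C$, $\beta$ and $\tau^2$ in the KAM model are thus the same as those of $\alpha$, $\beta$ and $\tau^2$ in the DH model. As before, $\beta$ measures the association between the true treatment effect on the clinical endpoint and that on the surrogate endpoint.

For a trial with a single contrast, from (2.27) we can obtain the joint distribution of the estimated treatment effects:

$$\begin{pmatrix} Y_{i1} \\ X_{i1} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \mu_C + \beta\mu_S \\ \mu_S \end{pmatrix}, \begin{pmatrix} \beta^2(\lambda_0^2 + \lambda^2) + \tau^2 + (\sigma_{i1}^2 + \sigma_{i0}^2) & \beta(\lambda_0^2 + \lambda^2) \\ \beta(\lambda_0^2 + \lambda^2) & (\lambda_0^2 + \lambda^2) + (\delta_{i1}^2 + \delta_{i0}^2) \end{pmatrix} \right). \tag{2.29}$$

For a trial with 2 contrasts, a similar calculation from (2.27) gives:

$$(Y_{i1}, X_{i1}, Y_{i2}, X_{i2})^T \sim N_4\!\left( \begin{pmatrix} \boldsymbol{\mu} \\ \boldsymbol{\mu} \end{pmatrix}, \begin{pmatrix} \Sigma_{1i} & \Sigma_{3i} \\ \Sigma_{3i} & \Sigma_{2i} \end{pmatrix} \right), \tag{2.30}$$

where

$$\boldsymbol{\mu} = \begin{pmatrix} \mu_C + \beta\mu_S \\ \mu_S \end{pmatrix}, \qquad \Sigma_{1i} = \begin{pmatrix} \beta^2(\lambda_0^2 + \lambda^2) + \tau^2 + (\sigma_{i1}^2 + \sigma_{i0}^2) & \beta(\lambda_0^2 + \lambda^2) \\ \beta(\lambda_0^2 + \lambda^2) & (\lambda_0^2 + \lambda^2) + (\delta_{i1}^2 + \delta_{i0}^2) \end{pmatrix},$$

$$\Sigma_{2i} = \begin{pmatrix} \beta^2(\lambda_0^2 + \lambda^2) + \tau^2 + (\sigma_{i2}^2 + \sigma_{i0}^2) & \beta(\lambda_0^2 + \lambda^2) \\ \beta(\lambda_0^2 + \lambda^2) & (\lambda_0^2 + \lambda^2) + (\delta_{i2}^2 + \delta_{i0}^2) \end{pmatrix}, \qquad \Sigma_{3i} = \begin{pmatrix} \beta^2\lambda_0^2 + \frac{\tau^2}{2} + \sigma_{i0}^2 & \beta\lambda_0^2 \\ \beta\lambda_0^2 & \lambda_0^2 + \delta_{i0}^2 \end{pmatrix}.$$

For a trial with 3 or more contrasts, a similar extension can be applied.

When fitting their model, KAM use the maximum likelihood estimators obtained from the joint normal distributions (2.29) and (2.30). The estimation error variances $\sigma_{ij}^2$ and $\delta_{ij}^2$ are assumed known and replaced by their estimates when fitting the model. To assess the surrogacy relationship, in addition to evaluating the estimates and confidence intervals for $\mu_C$, $\beta$ and $\tau^2$, KAM suggest using an $R^2$-type measure. From (2.27) and (2.29), we know $\operatorname{Var}(Y_{ij}^{\text{true}}) = \beta^2(\lambda_0^2 + \lambda^2) + \tau^2$ and $\operatorname{Var}(Y_{ij}^{\text{true}} \mid X_{ij}^{\text{true}}) = \tau^2$. So the $R^2$-type measure is defined as:

$$R^2_{\text{trial}} = \frac{\beta^2(\lambda_0^2 + \lambda^2)}{\beta^2(\lambda_0^2 + \lambda^2) + \tau^2}. \tag{2.31}$$

This quantity is analogous to $R^2_{\text{trial}}$ in (2.15). Large values of $R^2_{\text{trial}}$ indicate a good surrogacy relationship.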
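The covariance blocks of (2.29) and (2.30) follow mechanically from (2.24) to (2.27), so they can be checked by simulation. The Python sketch below assumes, as in (2.29), that surrogate estimation errors have variances $\delta_{ij}^2$ and clinical ones $\sigma_{ij}^2$; all parameter values and function names are hypothetical. It simulates a 3-arm trial at the arm level, forms the two contrasts, and compares the empirical $4 \times 4$ covariance with the closed form. It also evaluates $R^2_{\text{trial}}$ from (2.31).

```python
import numpy as np

def kam_block(beta, lam0_sq, lam_sq, tau_sq, sig_sq_j, sig_sq_0, del_sq_j, del_sq_0):
    """Sigma_1i or Sigma_2i: covariance of (Y_ij, X_ij), as in (2.29)."""
    v = lam0_sq + lam_sq
    return np.array([[beta**2 * v + tau_sq + sig_sq_j + sig_sq_0, beta * v],
                     [beta * v, v + del_sq_j + del_sq_0]])

def kam_cross_block(beta, lam0_sq, tau_sq, sig_sq_0, del_sq_0):
    """Sigma_3i: covariance between two contrasts sharing the control arm."""
    return np.array([[beta**2 * lam0_sq + tau_sq / 2 + sig_sq_0, beta * lam0_sq],
                     [beta * lam0_sq, lam0_sq + del_sq_0]])

def kam_two_contrast_cov(beta, lam0_sq, lam_sq, tau_sq, sig_sq, del_sq):
    """4x4 covariance of (Y_i1, X_i1, Y_i2, X_i2); sig_sq, del_sq indexed by arm 0, 1, 2."""
    s1 = kam_block(beta, lam0_sq, lam_sq, tau_sq, sig_sq[1], sig_sq[0], del_sq[1], del_sq[0])
    s2 = kam_block(beta, lam0_sq, lam_sq, tau_sq, sig_sq[2], sig_sq[0], del_sq[2], del_sq[0])
    s3 = kam_cross_block(beta, lam0_sq, tau_sq, sig_sq[0], del_sq[0])
    return np.block([[s1, s3], [s3, s2]])

def kam_r2_trial(beta, lam0_sq, lam_sq, tau_sq):
    """R^2_trial as in (2.31)."""
    signal = beta**2 * (lam0_sq + lam_sq)
    return signal / (signal + tau_sq)

def simulate_contrasts(n, mu_S, mu_C, beta, lam0_sq, lam_sq, tau_sq, sig_sq, del_sq, seed=0):
    """Draw (Y_i1, X_i1, Y_i2, X_i2) from the arm-level model (2.24)-(2.26), 3 arms."""
    rng = np.random.default_rng(seed)
    mu_i, alpha_i = 0.3, -0.2          # arbitrary trial levels; they cancel in contrasts
    m = np.column_stack([rng.normal(0.0, np.sqrt(lam0_sq), n),
                         rng.normal(0.0, np.sqrt(lam_sq), (n, 2))])
    g = rng.normal(0.0, np.sqrt(tau_sq / 2), (n, 3))
    e = rng.normal(0.0, np.sqrt(del_sq), (n, 3))   # surrogate errors, variance delta^2_ij
    f = rng.normal(0.0, np.sqrt(sig_sq), (n, 3))   # clinical errors, variance sigma^2_ij
    s_true = mu_i + np.array([0.0, mu_S, mu_S]) + m
    c_true = alpha_i + np.array([0.0, mu_C, mu_C]) + beta * s_true + g
    s_obs, c_obs = s_true + e, c_true + f
    x = s_obs[:, 1:] - s_obs[:, [0]]               # X_i1, X_i2
    y = c_obs[:, 1:] - c_obs[:, [0]]               # Y_i1, Y_i2
    return np.column_stack([y[:, 0], x[:, 0], y[:, 1], x[:, 1]])

params = dict(beta=0.5, lam0_sq=0.1, lam_sq=0.2, tau_sq=0.02)
sig_sq = np.array([0.03, 0.04, 0.05])   # hypothetical sigma^2_i0, sigma^2_i1, sigma^2_i2
del_sq = np.array([0.06, 0.05, 0.07])   # hypothetical delta^2_i0, delta^2_i1, delta^2_i2
theory = kam_two_contrast_cov(sig_sq=sig_sq, del_sq=del_sq, **params)
sims = simulate_contrasts(200_000, mu_S=-1.0, mu_C=0.05, sig_sq=sig_sq, del_sq=del_sq, **params)
empirical = np.cov(sims, rowvar=False)
r2 = kam_r2_trial(**params)
```

The shared control arm is what makes $\Sigma_{3i}$ non-zero: it carries $m_{i0}$, $\gamma_{i0}$ and both control-arm estimation errors into both contrasts.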
Furthermore, to evaluate how a surrogate endpoint performs in practice, KAM suggest estimating the parameter $E(Y_{ij}^{\text{true}} \mid X_{ij})$, which is useful in predicting the true treatment effect on the clinical endpoint given the estimated treatment effect on the surrogate endpoint. This parameter is analogous to $E(\beta + b_i \mid m_{Si}, a_i)$ in (2.16). However, KAM condition $Y_{ij}^{\text{true}}$ on $X_{ij}$ rather than on $X_{ij}^{\text{true}}$. From (2.27), the parameter of interest is:

$$\Delta = E(Y_{ij}^{\text{true}} \mid X_{ij}) = (\mu_C + \beta\mu_S) + \frac{\beta(\lambda_0^2 + \lambda^2)}{(\lambda_0^2 + \lambda^2) + (\delta_{ij}^2 + \delta_{i0}^2)}(X_{ij} - \mu_S). \tag{2.32}$$

To estimate $\Delta$, KAM plug in the estimates of $(\beta, \mu_S, \mu_C, \lambda_0^2, \lambda^2)$ and the observed value of $X_{ij}$ from a new trial, and replace $\delta_{ij}^2$ and $\delta_{i0}^2$ by their estimates from that trial.

2.3.3 Comparison of These Two Approaches

The first difference between DH and KAM is that their models start from different levels. DH start directly from the treatment effects (the contrast level, since treatment effects are obtained from contrasts), where they build the model for $(Y_i, X_i)$ given $(Y_i^{\text{true}}, X_i^{\text{true}})$ and for $Y_i^{\text{true}}$ given $X_i^{\text{true}}$. In contrast, KAM start from the endpoints (the arm level, since the endpoint values are obtained from the arms), where they first specify the joint distribution of $(S_{ij}, C_{ij}, S_{ij}^{\text{true}}, C_{ij}^{\text{true}})$ and then take differences to obtain the joint distribution of $Y_{ij}$ and $X_{ij}$. Building the model from the arm level requires a more detailed specification. However, in (2.26), KAM assume the same coefficient $\beta$ for control arms and active arms, which implies that the relationship between the true surrogate endpoint and the true clinical endpoint is the same regardless of the arm. This assumption may not be realistic in situations where a treatment substantially influences the association between the two endpoints; it may then be more reasonable to assume different $\beta$s for control and active arms.
In contrast, DH make no assumptions about the relationship between the endpoints but model the surrogacy relationship directly in (2.19). We think the DH approach is more reasonable from this perspective.

Both papers deal with estimation errors in the same way, in the sense that the estimation errors are assumed to be independent of the true treatment effects. In (2.18), DH assume that $\sigma_i^2$ and $\delta_i^2$, the within-trial estimation error variances, do not depend on $Y_i^{\text{true}}$ and $X_i^{\text{true}}$. This means the estimation errors in the treatment effects are not affected by the true treatment effects. Similarly, following (2.25), KAM assume that the $m_{ij}$ are independent of $e_{ij}$ and $f_{ij}$, which means the estimation errors in the endpoints are not affected by the true endpoints. This assumption implies that $(m_{ij} - m_{i0})$ is independent of $(e_{ij} - e_{i0})$ and $(f_{ij} - f_{i0})$, which again means the estimation errors in the treatment effects are not affected by the true treatment effects. However, a large true treatment effect may be associated with a large estimation error, and a small treatment effect with a small estimation error. Thus, this independence assumption may not hold in some clinical trials.

To compare how these models characterize the treatment effects, we can compare the joint distributions of the treatment effects, for example (2.20) with (2.29). Alternatively, from (2.27), we obtain:

$$\begin{pmatrix} Y_{i1} \\ X_{i1} \end{pmatrix} \Bigg| (Y_{i1}^{\text{true}}, X_{i1}^{\text{true}}) \sim N_2\!\left( \begin{pmatrix} Y_{i1}^{\text{true}} \\ X_{i1}^{\text{true}} \end{pmatrix}, \begin{pmatrix} \sigma_{i1}^2 + \sigma_{i0}^2 & 0 \\ 0 & \delta_{i1}^2 + \delta_{i0}^2 \end{pmatrix} \right) \tag{2.33}$$

and

$$Y_{i1}^{\text{true}} \mid X_{i1}^{\text{true}} \sim N(\mu_C + \beta X_{i1}^{\text{true}}, \tau^2). \tag{2.34}$$

Comparing (2.33) and (2.34) with (2.18) and (2.19), it is evident that the DH model and the KAM model are essentially the same. One difference is that $X_{i1}^{\text{true}}$ follows a normal distribution with mean $\mu_S$ and variance $\lambda_0^2 + \lambda^2$ in the KAM model, while DH treat $X_i^{\text{true}}$ as fixed when specifying their model but then give it a prior distribution when carrying out the estimation.
The prior is chosen to be normal with mean 0 and a very large variance, making it essentially “non-informative”. Besides this, the conditional covariance in (2.33) is 0, while the conditional covariance in (2.18) is allowed to be non-zero. This is because KAM assume the within-trial estimation errors $e_{ij}$ and $f_{ij}$ are independent, whereas DH allow a correlation $\rho_i$. The two estimation errors are likely to be correlated in general; however, without individual patient level data, it is difficult to estimate this correlation.

In the following chapters, we discuss validation of the surrogate endpoint in the MS context. Our dataset consists of multiple clinical trials, and only summary results from these trials are available. We discuss two approaches to validating the surrogate endpoint of interest: the SBRCMB approach and a more comprehensive approach. The comprehensive approach is similar in spirit to the DH and KAM models.

Chapter 3 Lesion Counts as a Surrogate Endpoint in RRMS: the SBRCMB Approach

3.1 Introduction and the SBRCMB Dataset

MRI measures of brain lesion counts in RRMS patients have recently come into wide use in clinical trials as potential surrogate endpoints. One important clinical endpoint in RRMS clinical trials is the annualized relapse rate. A relapse is defined as the appearance of a new symptom or the worsening of an existing symptom, attributable to MS and accompanied by an appropriate new neurologic abnormality. However, the surrogacy relationship between such MRI measures and this clinical endpoint has not been completely validated. Petkau et al. [15] show that the correlation between MRI lesion counts and the annualized relapse rate at the individual level is weak. This low degree of correlation at the individual level indicates that MRI measures would be unreliable predictors of the annualized relapse rate for an individual patient.
However, this result does not exclude the possibility that the treatment effects on MRI measures and on the annualized relapse rate are highly associated, which means that MRI measures may still be useful for assessing the treatment effect at the trial level.

To evaluate whether MRI measures are useful in assessing treatment effects, SBRCMB collected summary information from multiple MS clinical trials. The SBRCMB dataset includes 23 randomized, double-blind, placebo-controlled trials. The treatments in the trials are believed to have similar mechanisms of action. There are 2 trials including both secondary progressive multiple sclerosis patients and RRMS patients; the remaining 21 trials include only RRMS patients. Among the 23 trials, there are 9 trials with 2 arms, 12 trials with 3 arms, 1 trial with 4 arms and 1 trial with 5 arms. Each trial has only 1 control arm but 1 to 4 active arms. In total, there are 63 arms, 40 contrasts and 6591 patients. The detailed SBRCMB dataset is included in Appendix A.

Figure 3.1: Scatter Plot of Estimated Treatment Effects (horizontal axis: estimated treatment effect on the surrogate endpoint; vertical axis: estimated treatment effect on the clinical endpoint).

The SBRCMB dataset contains no individual patient level data, only the summary results from each clinical trial. The observed clinical endpoint for an arm is defined as the observed annualized relapse rate for this arm (it is assumed that all the patients in the same trial have the same follow-up time), and the observed surrogate endpoint for an arm is defined as the observed MRI lesion count per patient per scan in this arm (all the patients in the same trial are assumed to have the same scan times). The estimated treatment effect on the clinical endpoint is then defined as the log ratio between the observed clinical endpoints in the active and control arms.
Similarly, the estimated treatment effect on the surrogate endpoint is defined as the log ratio between the observed surrogate endpoints in the active and control arms. Since one contrast is formed by comparing one active arm with the control arm, we obtain one estimated treatment effect on the clinical endpoint and one estimated treatment effect on the surrogate endpoint from each contrast. In total, we have 40 pairs of estimated treatment effects. Figure 3.1 shows the scatter plot of these pairs. Note that the observed endpoints are not equal to the true endpoints (unless the arm includes an infinite number of patients), and thus the estimated treatment effects are not equal to the true treatment effects. The task is to assess the surrogacy relationship between the true treatment effects, which are not observable, based on the estimated treatment effects.

3.2 The SBRCMB approach

SBRCMB adopt a simple linear regression model and use weighted least squares (WLS) to assess the surrogacy relationship. The explanatory variable is the estimated treatment effect on the surrogate endpoint and the response variable is the estimated treatment effect on the clinical endpoint. To account for differences in trial size and trial duration among the contrasts, different weights are given to different contrasts. Specifically, let $w_i$ denote the weight given to the $i$th contrast, where $i = 1, 2, \ldots, 40$. Then:

$$w_i = N_i^{\text{complete}} \sqrt{\frac{\text{follow-up (months)}_i}{12}}, \tag{3.1}$$

where $\text{follow-up (months)}_i$ is the duration in months of the MRI follow-up of the patients in the $i$th contrast, and $N_i^{\text{complete}}$ is a number that SBRCMB choose to represent the total number of patients in this contrast. For a contrast from a trial with only 2 arms, $N_i^{\text{complete}}$ equals the total number of patients in these two arms.
For a contrast from a trial with more than 2 arms, $N_i^{\text{complete}}$ is obtained by dividing the placebo patients equally among the treatment arms. For example, for a trial with 20 patients in each of its 3 arms, 2 contrasts are created, each with $N_i^{\text{complete}} = 20 + 20/2 = 30$.

Let $Y_i$ and $X_i$ represent the estimated treatment effects on the clinical endpoint and the surrogate endpoint from the $i$th contrast. SBRCMB assume the following regression model to describe the surrogacy relationship:

$$E(Y_i) = \alpha + \beta X_i, \tag{3.2}$$

and estimate the regression coefficients by WLS; that is, they minimize

$$\sum_i w_i (Y_i - \alpha - \beta X_i)^2. \tag{3.3}$$

SBRCMB also carry out a sensitivity study, an interaction study and a validation study. The sensitivity study aims to check whether the regression coefficients are sensitive to the choice of the weights, or to the choice of the contrasts included in the analysis. To check the sensitivity with respect to the choice of weights, SBRCMB refit the regression line with 2 other weights, $w_i'$ and $w_i''$, where $w_i'$ gives more weight to the duration of the contrast:

$$w_i' = N_i^{\text{complete}} \, \frac{\text{follow-up (months)}_i}{12}, \tag{3.4}$$

and $w_i''$ is a constant weight (i.e. $w_i'' \equiv 1$). To check the sensitivity with respect to the choice of contrasts, SBRCMB divide the whole dataset into subsets with different features and fit regression lines to those subsets separately, all using the weights in (3.1). The first is a “highest contrasts” subset, which includes only the contrast of the active arm with the highest dose level versus the control arm from each trial. The second is an “RRMS contrasts” subset, which includes data only from trials with only RRMS patients. The third is a “large effect contrasts” subset, which includes only the contrasts with an estimated treatment effect on the clinical endpoint greater than 20%. Table 3.1 shows the results we reproduced for the sensitivity study; these are almost the same as those from SBRCMB.
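The bookkeeping behind (3.1) to (3.4) can be sketched in Python as below. The arm sizes, follow-up, relapse rates and lesion counts are hypothetical, and the helper names are ours rather than SBRCMB’s; `wls_line` uses the closed-form weighted means solution of (3.3).

```python
import math
import numpy as np

def log_ratio(active, control):
    """Estimated treatment effect for one contrast: log(active / control)."""
    return math.log(active / control)

def n_complete(n_active, n_control, n_active_arms):
    """N^complete: the active arm plus an equal share of the control arm."""
    return n_active + n_control / n_active_arms

def weight_31(n_comp, followup_months):
    """Weight (3.1): N^complete times the square root of follow-up in years."""
    return n_comp * math.sqrt(followup_months / 12)

def weight_34(n_comp, followup_months):
    """Weight (3.4): N^complete times follow-up in years."""
    return n_comp * followup_months / 12

def wls_line(x, y, w):
    """Minimize sum_i w_i (y_i - alpha - beta x_i)^2 as in (3.3); return (alpha, beta)."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    xbar, ybar = np.average(x, weights=w), np.average(y, weights=w)
    beta = np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w * (x - xbar) ** 2)
    return ybar - beta * xbar, beta

# Hypothetical 3-arm trial: placebo plus two doses, 20 patients per arm, 12 months of MRI.
relapse = {"placebo": 1.0, "low": 0.7, "high": 0.5}   # annualized relapse rates
lesions = {"placebo": 2.0, "low": 0.8, "high": 0.4}   # lesions per patient per scan
contrasts = [(log_ratio(lesions[d], lesions["placebo"]),      # (X, Y) per contrast
              log_ratio(relapse[d], relapse["placebo"])) for d in ("low", "high")]
n_comp = n_complete(n_active=20, n_control=20, n_active_arms=2)  # 30.0, as in the text
w = weight_31(n_comp, followup_months=12)

# Hypothetical contrasts lying exactly on a line, to check the WLS solver.
x = [-2.0, -1.5, -0.9, -0.4, 0.1]
y = [0.1 + 0.5 * xi for xi in x]
alpha_hat, beta_hat = wls_line(x, y, [30.0, 45.0, 60.0, 25.0, 80.0])
```

With equal weights, `wls_line` reduces to ordinary least squares, which corresponds to the constant-weight row $w_i'' \equiv 1$ of the sensitivity study.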
Table 3.1: Results of the Sensitivity Study

| Analysis | No. of trials | No. of contrasts | $\hat\alpha$ | $\hat\beta$ | $R^2$ |
|---|---|---|---|---|---|
| $w_i$ | 23 | 40 | -0.02 (0.05) | 0.55 (0.04) | 0.80 |
| $w_i'$ | 23 | 40 | -0.02 (0.05) | 0.58 (0.04) | 0.84 |
| $w_i'' \equiv 1$ | 23 | 40 | 0.12 (0.07) | 0.50 (0.06) | 0.65 |
| highest | 23 | 23 | -0.06 (0.08) | 0.53 (0.06) | 0.77 |
| RRMS | 21 | 36 | -0.03 (0.05) | 0.56 (0.05) | 0.80 |
| large effect | 18 | 25 | -0.01 (0.10) | 0.58 (0.07) | 0.75 |

\* estimate (estimated standard error)

The values in the $R^2$ column are weighted coefficients of determination:

$$R^2 = \frac{\sum_i w_i (\hat y_i - \bar y)^2}{\sum_i w_i (y_i - \bar y)^2}, \tag{3.5}$$

where $\hat y_i$ is the fitted value, and $w_i$ is replaced by $w_i'$ when (3.4) is used.

In the sensitivity study, none of the $\hat\alpha$s is significantly different from 0, but all the $\hat\beta$s are. Furthermore, SBRCMB claim that all the estimates of $\beta$ are close (all between 0.50 and 0.58) and all the $R^2$ values are close (between 0.65 and 0.84). They interpret these findings as indicating that the fitted regression line is not sensitive to the choice of weights or to the choice of contrasts included.

The SBRCMB interaction study aims to check whether the regression coefficients depend on the characteristics of the trials. For example, let $I_i$ be an indicator variable that takes the value 1 if the $i$th contrast is from a trial conducted after the year 2000 and 0 otherwise. SBRCMB then fit the following regression model with weights $w_i$:

$$E(Y_i) = \alpha + \beta_1 X_i + \beta_2 I_i + \beta_3 I_i X_i. \tag{3.6}$$

By assessing $\beta_2$ and $\beta_3$, one can see whether the regression coefficients differ between the contrasts from before and after 2000. In addition to this “time period” factor, SBRCMB also examine the factors “drug class” (whether a contrast is from a trial whose treatment is an interferon) and “annualized relapse rate” (whether the observed annualized relapse rate in the placebo arm of a contrast is larger than 1). The reproduced results of the interaction study are shown in Table 3.2. The “P-value” column shows the P-values for testing whether the coefficient of the interaction term is 0 (e.g.
testing whether $\beta_3 = 0$ in (3.6)).

Table 3.2: Results of the Interaction Study

| Indicator variable | Class | No. of contrasts | P-value |
|---|---|---|---|
| Time period | > 2000 | 15 | 0.30 |
| | < 2000 | 25 | |
| Drug class | with interferon | 12 | 0.20 |
| | not interferon | 28 | |
| Annualized relapse rate | > 1 | 9 | 0.36 |
| | < 1 | 31 | |

In the interaction study, since all these P-values are greater than 0.05, SBRCMB claim that there is no indication of differences in the slope of the fitted line for contrasts with different characteristics, though SBRCMB also note that the power of these tests is quite low due to the limited sample size.

Finally, SBRCMB carry out a validation study in which 4 new clinical trials are introduced, yielding 4 new contrasts (each of these trials has only 2 arms). Their estimated treatment effects on the clinical endpoint are compared with the predicted counterparts obtained from the regression model with weights $w_i$. The reproduced results of the validation study are shown in Figure 3.2, where the hollow points represent the 40 contrasts used in the regression model, the solid line is the estimated regression line with weights $w_i$, the solid points represent the 4 new contrasts, and the bars are the 95% prediction intervals for the estimated treatment effects on the clinical endpoint for the 4 new contrasts. The prediction intervals are calculated by the standard regression approach: the $X_i$s are assumed to be fixed, and the weights $w_i$ are assumed to be proportional to the inverse of the variance of the $Y_i$s.

It can be seen that all the solid points lie within the prediction intervals (except for the 2nd one from the left, which is at the very edge of its prediction interval). SBRCMB claim that the estimated regression model is able to give satisfactory predictions. However, these 4 new trials use active control arms rather than placebo control arms.
So these 4 trials differ in design from the 23 trials in SBRCMB's dataset, and may not tell us whether the estimated regression equation can produce satisfactory predictions.

Based on all of these results, SBRCMB conclude that in RRMS, the treatment effect on MRI lesion count can be used to predict the treatment effect on the annualized relapse rate. They state that these results support the use of MRI lesion count as a surrogate endpoint in RRMS clinical trials of treatments with an analogous mechanism of action.

[Figure 3.2: Results of the Validation Study (horizontal axis: Estimated Treatment Effect on the Surrogate Endpoint; vertical axis: Estimated Treatment Effect on the Clinical Endpoint)]

3.3 Critique of the SBRCMB Approach

In this section, we discuss shortcomings of the SBRCMB approach to assessing the surrogacy relationship. The fundamental issue is that the WLS estimates may not be appropriate for this dataset, for several reasons.

First, the explanatory variable $X_i$ in the SBRCMB model is defined as the log ratio between the observed MRI lesion counts per patient per scan in the active and the control arms, and the response variable $Y_i$ is defined as the log ratio between the observed annualized relapse rates in the active and the control arms. Since the observed endpoints are not equal to the true endpoints, $X_i$ and $Y_i$ are only estimates of the true treatment effects. If $X_i^{\text{true}}$ and $Y_i^{\text{true}}$ denote the corresponding true treatment effects, then the surrogacy relationship is the relationship between $X_i^{\text{true}}$ and $Y_i^{\text{true}}$, not that between $X_i$ and $Y_i$. The SBRCMB approach does not take into account the estimation errors in $X_i$ and $Y_i$, which may lead to biased results.

Second, 14 of the 23 trials have more than 2 arms, which leads to correlated contrasts, since the contrasts from the same trial share the same control arm.
Therefore, even if the estimation errors were negligible, so that the relationship between $Y_i$ and $X_i$ were an excellent approximation to the relationship between $Y_i^{\text{true}}$ and $X_i^{\text{true}}$, the WLS approach would still not be appropriate, because some of the $Y_i$s are correlated.

Third, the rationale for the SBRCMB choice of weights in the WLS estimation is unclear. SBRCMB simply state that the weights were chosen because they reflect the information conveyed by each trial. Suppose that there is no estimation error and all the $Y_i$s are independent, so that the WLS approach is reasonable. Are these weights then appropriate?

In the following subsections, we discuss each of these potential problems. We start with the appropriateness of the weights under the assumption that the WLS approach is reasonable. We then discuss the correlation issue. Finally, we discuss the more fundamental issue of how estimation errors influence the estimated surrogacy relationship.

3.3.1 The Appropriateness of the Weights

In this section, we focus on the relationship between $Y_i$ and $X_i$, and assume that all the $Y_i$s are independent. Furthermore, we assume that all the $X_i$s are fixed. We assume the following regression model:
$$Y_i = \alpha + \beta x_i + \varepsilon_i, \tag{3.7}$$
where $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \tau_i^2$, and all the $\varepsilon_i$s are independent. Theoretically, the weight $w_i$ for $Y_i$ should then be proportional to the inverse of the variance of $\varepsilon_i$, i.e. $w_i \propto \tau_i^{-2}$. We write $x_i$ instead of $X_i$ here because the $x_i$s are assumed to be fixed. In the following text, we omit the subscript $i$ on every quantity to simplify notation.

Let $R_a$ and $R_c$ be the observed annualized relapse rates in the active and the control arms, respectively, of a given trial, and let $R_a^{\text{true}}$ and $R_c^{\text{true}}$ be the corresponding true annualized relapse rates. Then $Y = \log(R_a/R_c)$ and $Y^{\text{true}} = \log(R_a^{\text{true}}/R_c^{\text{true}})$. Since $R_a$ and $R_c$ come from different arms with different patients, it is natural to consider them independent.
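Similarly, we consider $R_a^{\text{true}}$ and $R_c^{\text{true}}$ to be independent.

Suppose that there are $N_a$ and $N_c$ patients in the active and control arms, respectively, and that all $N_a + N_c$ patients have the same follow-up time for the relapse data, namely $T$ years. Let $F_j$ denote the total number of relapses of the $j$th patient in the active arm. We assume:
$$E(F_j \mid R_a^{\text{true}}) = T R_a^{\text{true}}, \qquad \mathrm{Var}(F_j \mid R_a^{\text{true}}) = \phi\, T R_a^{\text{true}}, \tag{3.8}$$
where $\phi$ is a dispersion parameter describing how the variance of the number of relapses is related to its expectation; $\phi = 1$ corresponds to a Poisson assumption. We assume that $\phi$ is the same for all the patients in all the trials, so $\phi$ carries neither the subscript $j$ nor the subscript $i$. Then $F_j/T$ is this patient's annualized relapse rate, and from the above assumption we have:
$$E\!\left(\frac{F_j}{T} \,\middle|\, R_a^{\text{true}}\right) = R_a^{\text{true}}, \qquad \mathrm{Var}\!\left(\frac{F_j}{T} \,\middle|\, R_a^{\text{true}}\right) = \frac{\phi R_a^{\text{true}}}{T}. \tag{3.9}$$
By definition, the observed annualized relapse rate in the active arm is:
$$R_a = \frac{F_1 + F_2 + \cdots + F_{N_a}}{T N_a}. \tag{3.10}$$
Then, by the delta method and the Central Limit Theorem, we obtain the following approximation to the conditional distribution of $\log R_a$:
$$\log R_a \mid R_a^{\text{true}} \;\dot\sim\; N\!\left(\log R_a^{\text{true}},\; \frac{\phi}{T N_a R_a^{\text{true}}}\right). \tag{3.11}$$
Similarly, for the control arm, we have:
$$\log R_c \mid R_c^{\text{true}} \;\dot\sim\; N\!\left(\log R_c^{\text{true}},\; \frac{\phi}{T N_c R_c^{\text{true}}}\right). \tag{3.12}$$
Unconditionally, we have:
$$\mathrm{Var}(\log R_a) = \mathrm{Var}\big(E(\log R_a \mid R_a^{\text{true}})\big) + E\big(\mathrm{Var}(\log R_a \mid R_a^{\text{true}})\big) \approx \mathrm{Var}(\log R_a^{\text{true}}) + \frac{\phi}{T N_a} E\!\left(\frac{1}{R_a^{\text{true}}}\right), \tag{3.13}$$
and similarly, for the control arm:
$$\mathrm{Var}(\log R_c) = \mathrm{Var}\big(E(\log R_c \mid R_c^{\text{true}})\big) + E\big(\mathrm{Var}(\log R_c \mid R_c^{\text{true}})\big) \approx \mathrm{Var}(\log R_c^{\text{true}}) + \frac{\phi}{T N_c} E\!\left(\frac{1}{R_c^{\text{true}}}\right). \tag{3.14}$$
The independence assumption for $R_a$ and $R_c$ leads to:
$$\mathrm{Var}(Y) = \mathrm{Var}\!\left(\log\frac{R_a}{R_c}\right) = \mathrm{Var}(\log R_a) + \mathrm{Var}(\log R_c) \approx \mathrm{Var}(\log R_a^{\text{true}}) + \mathrm{Var}(\log R_c^{\text{true}}) + \frac{\phi}{T}\left(\frac{1}{N_a} E\!\left(\frac{1}{R_a^{\text{true}}}\right) + \frac{1}{N_c} E\!\left(\frac{1}{R_c^{\text{true}}}\right)\right). \tag{3.15}$$
From this formula, we see that the variance of $Y$ depends on the distributions of $R_a^{\text{true}}$ and $R_c^{\text{true}}$ as well as on the unknown parameter $\phi$.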
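As a rough numerical check of the delta-method approximation (3.11), the Python sketch below simulates the observed annualized relapse rate of one arm many times with $R_a^{\text{true}}$ held fixed, and compares the empirical variance of $\log R_a$ with $\phi/(T N_a R_a^{\text{true}})$. The model above only specifies the first two moments in (3.8); the choice of negative binomial counts for $\phi > 1$ (Poisson for $\phi = 1$) is our assumption for the sketch, and the values of $R_a^{\text{true}}$, $N_a$, $T$ and $\phi$ are hypothetical.

```python
import numpy as np


def simulate_log_rate(r_true, n_patients, t_years, phi, n_sims, rng):
    """Simulate log R_a for one arm with true annualized relapse rate r_true.

    Each patient's relapse count has mean T * r_true and variance
    phi * T * r_true, matching the first two moments in (3.8).
    Assumption: counts are Poisson when phi == 1 and negative binomial
    when phi > 1 (under-dispersion, phi < 1, is not handled here).
    """
    if phi < 1.0:
        raise ValueError("this sketch only handles phi >= 1")
    mean = t_years * r_true
    size = (n_sims, n_patients)
    if phi == 1.0:
        counts = rng.poisson(mean, size=size)
    else:
        p = 1.0 / phi               # negative binomial variance/mean ratio is 1/p
        n = mean * p / (1.0 - p)    # chosen so that the count mean equals T * r_true
        counts = rng.negative_binomial(n, p, size=size)
    rates = counts.sum(axis=1) / (t_years * n_patients)   # observed rate, (3.10)
    return np.log(rates)


rng = np.random.default_rng(2011)
r_true, n_a, t_years, phi = 0.7, 150, 2.0, 1.5   # hypothetical values
log_ra = simulate_log_rate(r_true, n_a, t_years, phi, n_sims=20_000, rng=rng)

approx_var = phi / (t_years * n_a * r_true)       # variance in (3.11)
print(f"empirical Var(log R_a) = {log_ra.var():.5f}")
print(f"delta-method variance  = {approx_var:.5f}")
```

With an expected arm total of a few hundred relapses, the two variances should agree to within a few percent; the agreement deteriorates for small arms or very low relapse rates, where the normal approximation in (3.11) is less accurate.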
If we include the subscript $i$, (3.15) reads
$$\mathrm{Var}(Y_i) \approx \mathrm{Var}(\log R_{ai}^{\text{true}}) + \mathrm{Var}(\log R_{ci}^{\text{true}}) + \frac{\phi}{T_i}\left(\frac{1}{N_{ai}} E\!\left(\frac{1}{R_{ai}^{\text{true}}}\right) + \frac{1}{N_{ci}} E\!\left(\frac{1}{R_{ci}^{\text{true}}}\right)\right), \quad i = 1, 2, \ldots, 40.$$
Now we assume that all the $R_{ai}^{\text{true}}$s are identically distributed. We believe all the treatments included in the SBRCMB dataset have a similar mechanism of action, so the distribution of $R_{ai}^{\text{true}}$ describes how the true clinical endpoint varies across contrasts. Similarly, we assume that all the $R_{ci}^{\text{true}}$s are identically distributed. As a result, the variances and the expectations in (3.15) are constant across trials.

One way to estimate $E(1/R_a^{\text{true}})$ and $E(1/R_c^{\text{true}})$ is to average the $R_a$s and the $R_c$s across the contrasts and take the inverse of each average. For the SBRCMB dataset, we obtain $\hat E(1/R_a^{\text{true}}) \approx 1.43$ and $\hat E(1/R_c^{\text{true}}) \approx 1.10$. Let $\theta$ denote $\mathrm{Var}(\log R_a^{\text{true}}) + \mathrm{Var}(\log R_c^{\text{true}})$. Then the variance of $Y$ can be written as:
$$\tau^2 = \mathrm{Var}(Y) \approx \theta + \phi\left(\frac{1.43}{T N_a} + \frac{1.10}{T N_c}\right). \tag{3.16}$$
The values of $T$, $N_a$ and $N_c$ all depend on the contrast leading to $Y$. If we let $c = 1.43/(T N_a) + 1.10/(T N_c)$ and include the subscript $i$, we have:
$$\tau_i^2 = \theta + \phi c_i. \tag{3.17}$$
Based on (3.17), we can examine the appropriateness of the weights used in the SBRCMB approach. If $w_i = N_i^{\text{complete}} \sqrt{\text{follow-up (months)}_i / 12}$ is appropriate, then $w_i$ should be proportional to the inverse of the variance of the estimated clinical outcome; that is:
$$w_i = \frac{a}{\tau_i^2} = \frac{a}{\theta + \phi c_i} \;\Longrightarrow\; \frac{1}{w_i} = \frac{\theta}{a} + \frac{\phi}{a} c_i, \tag{3.18}$$
where $a$ is an arbitrary proportionality constant. This implies that in a scatter plot of $(c_i, 1/w_i)$, the points should lie approximately on a straight line.

[Figure 3.3: Scatter Plot of $(c, 1/w)$ (horizontal axis: $c_i$; vertical axis: $1/w_i$)]

[Figure 3.4: Scatter Plot of $(c, 1/w')$ (horizontal axis: $c_i$; vertical axis: $1/w'_i$)]

Figure 3.3 compares the $w_i$s with the $c_i$s, and Figure 3.4 compares the $w'_i$s with the $c_i$s.
In both scatter plots, the points lie approximately on a straight line. This suggests that, if the assumptions made in this section are reasonable, both sets of weights used by SBRCMB are also reasonable. From these two plots, we would expect the $w_i$s and the $w'_i$s to perform similarly.

3.3.2 Correlation of the Contrasts

The WLS approach is appropriate when the response variables are independent. However, this is not the case for the SBRCMB data. As mentioned before, 14 of the 23 trials have more than 2 arms. If two contrasts come from the same trial, the estimated treatment effects on the clinical endpoint from these two contrasts are correlated, because the two contrasts share the same control arm. For example, let $Y_1$ and $Y_2$ be two estimated treatment effects on the clinical endpoint from the same three-arm trial. Then $Y_1 = \log(R_{a1}/R_c)$ and $Y_2 = \log(R_{a2}/R_c)$, where $R_{a1}$, $R_{a2}$ and $R_c$ are the observed annualized relapse rates in the first active arm, the second active arm and the control arm, respectively. Because these three rates come from different arms with different patients, we assume they are mutually independent. Then:
$$\mathrm{Cov}(Y_1, Y_2) = \mathrm{Cov}\!\left(\log\frac{R_{a1}}{R_c}, \log\frac{R_{a2}}{R_c}\right) = \mathrm{Var}(\log R_c). \tag{3.19}$$
Hence $Y_1$ and $Y_2$ are correlated. An immediate way to address this correlation in the regression model is to use generalized least squares. However, from Section 3.3.1, we know that:
$$\mathrm{Var}(\log R_c) \approx \mathrm{Var}(\log R_c^{\text{true}}) + \frac{\phi}{T N_c} E\!\left(\frac{1}{R_c^{\text{true}}}\right) \approx \mathrm{Var}(\log R_c^{\text{true}}) + \frac{1.10\,\phi}{T N_c}. \tag{3.20}$$
To use generalized least squares, we need to estimate the covariance between any two correlated $Y_i$ and $Y_j$. But estimating this covariance requires estimates of $\mathrm{Var}(\log R_c^{\text{true}})$, the variance of the logarithm of the true annualized relapse rate across all the trials, and of the unknown parameter $\phi$. These two quantities are difficult to estimate without assuming a more complicated model. We address this question in the next chapter by developing a more comprehensive model.
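The covariance (3.19) can be seen directly by simulation. The Python sketch below uses hypothetical arm sizes and true rates, Poisson relapse counts ($\phi = 1$), and holds the true rates fixed, so the $\mathrm{Var}(\log R_c^{\text{true}})$ term in (3.20) plays no role and the covariance should be close to $1/(T N_c R_c^{\text{true}})$. It does not implement generalized least squares; it only illustrates why two contrasts from the same trial cannot be treated as independent.

```python
import numpy as np

rng = np.random.default_rng(7)
n_sims, t_years = 50_000, 2.0
n_a1, n_a2, n_c = 120, 120, 120      # hypothetical arm sizes
r_a1, r_a2, r_c = 0.45, 0.55, 0.80   # hypothetical true annualized relapse rates


def observed_log_rate(rate, n_patients):
    """log of the observed annualized rate for one arm, with Poisson counts (phi = 1).

    The arm total of N independent Poisson(T * rate) counts is Poisson(T * N * rate).
    """
    total = rng.poisson(t_years * n_patients * rate, size=n_sims)
    return np.log(total / (t_years * n_patients))


log_ra1 = observed_log_rate(r_a1, n_a1)
log_ra2 = observed_log_rate(r_a2, n_a2)
log_rc = observed_log_rate(r_c, n_c)     # one control arm shared by both contrasts

y1 = log_ra1 - log_rc                    # Y_1 = log(R_a1 / R_c)
y2 = log_ra2 - log_rc                    # Y_2 = log(R_a2 / R_c)

cov_y1_y2 = np.cov(y1, y2)[0, 1]
var_log_rc = log_rc.var(ddof=1)
delta_var = 1.0 / (t_years * n_c * r_c)  # phi / (T N_c R_c^true) with phi = 1
print(f"Cov(Y1, Y2)   = {cov_y1_y2:.5f}")
print(f"Var(log R_c)  = {var_log_rc:.5f}")
print(f"delta approx. = {delta_var:.5f}")
print(f"Corr(Y1, Y2)  = {np.corrcoef(y1, y2)[0, 1]:.2f}")
```

With these values the correlation between $Y_1$ and $Y_2$ is roughly 0.4, which is far from negligible for a WLS fit that assumes independent responses.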
3.3.3 Influence of Estimation Errors

As mentioned earlier, the relationship of real interest is between $Y_i^{\text{true}}$ and $X_i^{\text{true}}$. However, we cannot observe $Y_i^{\text{true}}$ and $X_i^{\text{true}}$ directly; we can only use $Y_i$ and $X_i$ to estimate them. Suppose the true surrogacy relationship is:
$$E(Y_i^{\text{true}} \mid X_i^{\text{true}}) = \alpha + \beta X_i^{\text{true}}. \tag{3.21}$$
The question is then: when we use $Y_i$ and $X_i$ in place of $Y_i^{\text{true}}$ and $X_i^{\text{true}}$ to estimate $\alpha$ and $\beta$, as SBRCMB do, how good are the resulting estimators?

In this section, we treat $X_i^{\text{true}}$ as random rather than fixed. We think this is a reasonable assumption for the SBRCMB dataset: since all the patients included in the study received treatments considered to be of the same type, it is natural to think of the true treatment effects from the different trials as coming from a single probability distribution. To simplify the discussion, we assume that $Y_i^{\text{true}}$ and $X_i^{\text{true}}$ are bivariate normally distributed. The conditional expectation of $Y_i^{\text{true}}$ given $X_i^{\text{true}}$ is given in (3.21), and the conditional variance of $Y_i^{\text{true}}$ given $X_i^{\text{true}}$ is denoted by $\tau^2$. Also, let $\mu_X$ and $\sigma_X^2$ denote the expectation and the variance of $X_i^{\text{true}}$. Then the bivariate normal distribution of $Y_i^{\text{true}}$ and $X_i^{\text{true}}$ is:
$$\begin{pmatrix} Y_i^{\text{true}} \\ X_i^{\text{true}} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \alpha + \beta\mu_X \\ \mu_X \end{pmatrix}, \begin{pmatrix} \beta^2\sigma_X^2 + \tau^2 & \beta\sigma_X^2 \\ \beta\sigma_X^2 & \sigma_X^2 \end{pmatrix} \right). \tag{3.22}$$
If we could observe $Y_i^{\text{true}}$ and $X_i^{\text{true}}$, the OLS estimators of $\beta$ and $\alpha$ based on them would be unbiased and consistent. The OLS estimator of $\beta$ is
$$\hat\beta = \frac{\sum_i (X_i^{\text{true}} - \bar X^{\text{true}})(Y_i^{\text{true}} - \bar Y^{\text{true}})}{\sum_i (X_i^{\text{true}} - \bar X^{\text{true}})^2}, \tag{3.23}$$
where $\bar X^{\text{true}}$ and $\bar Y^{\text{true}}$ are the averages of all the $X_i^{\text{true}}$s and $Y_i^{\text{true}}$s included in the study, so that
$$E(\hat\beta) = E\big(E(\hat\beta \mid \mathbf{X}^{\text{true}})\big) = E(\beta) = \beta, \tag{3.24}$$
where $\mathbf{X}^{\text{true}}$ denotes the collection of all the $X_i^{\text{true}}$s. For consistency, note that $\sum_i (X_i^{\text{true}} - \bar X^{\text{true}})^2/n \xrightarrow{p} \mathrm{Var}(X_i^{\text{true}}) = \sigma_X^2$ and $\sum_i (X_i^{\text{true}} - \bar X^{\text{true}})(Y_i^{\text{true}} - \bar Y^{\text{true}})/n \xrightarrow{p} \mathrm{Cov}(Y_i^{\text{true}}, X_i^{\text{true}}) = \beta\sigma_X^2$, where $n$ is the number of contrasts. Hence $\hat\beta \xrightarrow{p} \beta\sigma_X^2/\sigma_X^2 = \beta$.
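A similar argument can be made for $\hat\alpha$. Note that:
$$\hat\alpha = \bar Y^{\text{true}} - \hat\beta \bar X^{\text{true}}. \tag{3.25}$$
Then $E(\hat\alpha) = E\big(E(\hat\alpha \mid \mathbf{X}^{\text{true}})\big) = E(\alpha + \beta\bar X^{\text{true}} - \beta\bar X^{\text{true}}) = \alpha$, and $\hat\alpha \xrightarrow{p} (\alpha + \beta\mu_X) - \beta\mu_X = \alpha$.

However, if we can only observe $Y_i$ and $X_i$, the OLS estimator of $\beta$ becomes:
$$\tilde\beta = \frac{\sum_i (X_i - \bar X)(Y_i - \bar Y)}{\sum_i (X_i - \bar X)^2}, \tag{3.26}$$
where $\bar X$ and $\bar Y$ are the averages of all the $X_i$s and $Y_i$s included in the study; correspondingly, $\tilde\alpha = \bar Y - \tilde\beta\bar X$. Are these estimators still unbiased and consistent? Consider the following simple model, where $e_i$ and $f_i$ represent the estimation errors in $X_i^{\text{true}}$ and $Y_i^{\text{true}}$, respectively:
$$X_i = X_i^{\text{true}} + e_i \quad\text{and}\quad Y_i = Y_i^{\text{true}} + f_i. \tag{3.27}$$
We assume $e_i \overset{\text{iid}}{\sim} N(0, \delta^2)$ and $f_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$. Furthermore, we assume that $e_i$ and $f_i$ are independent of each other and of $X_i^{\text{true}}$ and $Y_i^{\text{true}}$ for all $i$. As a result, the joint distribution of the estimated treatment effects is:
$$\begin{pmatrix} Y_i \\ X_i \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \alpha + \beta\mu_X \\ \mu_X \end{pmatrix}, \begin{pmatrix} \beta^2\sigma_X^2 + \tau^2 + \sigma^2 & \beta\sigma_X^2 \\ \beta\sigma_X^2 & \sigma_X^2 + \delta^2 \end{pmatrix} \right). \tag{3.28}$$
It follows that:
$$E(Y_i \mid X_i) = \alpha + \beta\mu_X + \frac{\beta\sigma_X^2}{\sigma_X^2 + \delta^2}(X_i - \mu_X) = \alpha + \beta\mu_X - \frac{\beta\sigma_X^2}{\sigma_X^2 + \delta^2}\mu_X + \beta\frac{\sigma_X^2}{\sigma_X^2 + \delta^2} X_i. \tag{3.29}$$
Analogously to (3.23) and (3.24), we now have $E(\tilde\beta) = \beta\sigma_X^2/(\sigma_X^2 + \delta^2)$, so $\tilde\beta$ is not an unbiased estimator of $\beta$. Likewise, writing $S_{xy}$ and $S_{xx}$ for the numerator and denominator of (3.26), $\tilde\beta = S_{xy}/S_{xx} \xrightarrow{p} \beta\sigma_X^2/(\sigma_X^2 + \delta^2)$, so $\tilde\beta$ is not a consistent estimator of $\beta$ either. Similar conclusions hold for $\tilde\alpha$.

Note that the factor $\sigma_X^2/(\sigma_X^2 + \delta^2)$ is always less than 1 unless $\delta^2 = 0$. Hence, under this model, when the regressor contains estimation errors, the expectation of the OLS slope estimator is always closer to zero than the true slope. This is called the attenuation effect in regression. As shown above, this effect does not disappear even as the sample size goes to infinity. So, when the estimation error is not negligible (i.e. $\delta^2$ is not small relative to $\sigma_X^2$), the OLS estimator is not a good estimator.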
On the other hand, estimation errors in the response variable do not affect the unbiasedness or consistency of the OLS estimator.

In more complex situations, such as when the estimation errors are not identically distributed or when $X_i^{\text{true}}$ is fixed rather than random, it can be shown that the OLS estimator is still biased and inconsistent. The WLS estimator can also be shown to be biased and inconsistent when the regressor contains estimation errors, whatever weights are applied to the data.

For the SBRCMB dataset, since some trials included only a modest number of patients, the estimated treatment effects from those trials must contain non-negligible estimation errors. Therefore, the OLS (WLS) estimator will tend to underestimate the magnitude of the true regression coefficient.

Furthermore, simple linear regression may lead to an incorrect assessment of the surrogacy relationship. For example, in the above model, if there are no estimation errors, the coefficient of determination $R^2$ is the square of the sample correlation coefficient between $Y_i^{\text{true}}$ and $X_i^{\text{true}}$. From (3.22), we have:
$$R^2 = \frac{\left[\sum_i (X_i^{\text{true}} - \bar X^{\text{true}})(Y_i^{\text{true}} - \bar Y^{\text{true}})\right]^2}{\sum_i (X_i^{\text{true}} - \bar X^{\text{true}})^2 \sum_i (Y_i^{\text{true}} - \bar Y^{\text{true}})^2} \xrightarrow{p} \frac{\beta^2\sigma_X^4}{\sigma_X^2(\beta^2\sigma_X^2 + \tau^2)}. \tag{3.30}$$
However, if estimation errors exist and (3.28) holds, the coefficient of determination becomes
$$\tilde R^2 = \frac{\left[\sum_i (X_i - \bar X)(Y_i - \bar Y)\right]^2}{\sum_i (X_i - \bar X)^2 \sum_i (Y_i - \bar Y)^2} \xrightarrow{p} \frac{\beta^2\sigma_X^4}{(\sigma_X^2 + \delta^2)(\beta^2\sigma_X^2 + \tau^2 + \sigma^2)}. \tag{3.31}$$
When estimation errors exist, $\sigma^2$ and $\delta^2$ are larger than 0, so the coefficient of determination tends to underestimate the squared correlation coefficient between $Y_i^{\text{true}}$ and $X_i^{\text{true}}$, which may lead to a false conclusion about the surrogacy relationship. The coefficient of determination obtained by SBRCMB with $w''_i \equiv 1$ is 65%. However, the correlation between the true treatment effects on the clinical endpoint and on the surrogate endpoint may be higher, which would indicate a better surrogacy relationship.
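The attenuation in (3.29) and the shrinkage of $R^2$ in (3.31) can be illustrated with a quick simulation. The Python sketch below draws a large number of hypothetical contrasts from (3.22) and (3.27), with parameter values chosen purely for illustration, and compares the OLS slope and $R^2$ computed from the true effects, from effects with error in the response only, and from effects with errors in both, against the probability limits derived above.

```python
import numpy as np

rng = np.random.default_rng(43)
n = 200_000                          # many contrasts, to approximate the probability limits
alpha, beta, tau = 0.0, 0.6, 0.1     # hypothetical surrogacy parameters in (3.21)-(3.22)
mu_x, sigma_x = -0.7, 0.8            # hypothetical mean and sd of X_true
delta, sigma = 0.5, 0.3              # hypothetical sds of the errors e_i and f_i in (3.27)

x_true = rng.normal(mu_x, sigma_x, n)
y_true = alpha + beta * x_true + rng.normal(0.0, tau, n)
x_obs = x_true + rng.normal(0.0, delta, n)
y_obs = y_true + rng.normal(0.0, sigma, n)


def ols_slope_and_r2(x, y):
    """OLS slope, as in (3.23)/(3.26), and R^2, as in (3.30)/(3.31)."""
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    sxx = np.sum((x - x.mean()) ** 2)
    syy = np.sum((y - y.mean()) ** 2)
    return sxy / sxx, sxy ** 2 / (sxx * syy)


b_true, r2_true = ols_slope_and_r2(x_true, y_true)   # true effects observed
b_yerr, _ = ols_slope_and_r2(x_true, y_obs)          # error in the response only
b_obs, r2_obs = ols_slope_and_r2(x_obs, y_obs)       # errors in both, as in practice

s2x = sigma_x ** 2
limit_b_obs = beta * s2x / (s2x + delta ** 2)
limit_r2_true = beta ** 2 * s2x ** 2 / (s2x * (beta ** 2 * s2x + tau ** 2))
limit_r2_obs = beta ** 2 * s2x ** 2 / (
    (s2x + delta ** 2) * (beta ** 2 * s2x + tau ** 2 + sigma ** 2))

print(f"slope: true X,Y {b_true:.3f} | error in Y only {b_yerr:.3f} | "
      f"errors in X,Y {b_obs:.3f} (limit {limit_b_obs:.3f})")
print(f"R^2:   true X,Y {r2_true:.3f} (limit {limit_r2_true:.3f}) | "
      f"errors in X,Y {r2_obs:.3f} (limit {limit_r2_obs:.3f})")
```

With these illustrative values, the limiting slope under errors in both variables is about 0.43 instead of 0.6, and the limiting $R^2$ drops from about 0.96 to about 0.50, while adding error to the response alone leaves the slope essentially unchanged.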
In the next chapter, we re-analyze the surrogacy relationship with a more comprehensive approach that takes into account the estimation errors and the correlated contrasts in the SBRCMB dataset.

Chapter 4 Lesion Counts as a Surrogate Endpoint in RRMS: A More Comprehensive Approach

In this chapter, we use the SBRCMB dataset to re-analyze the surrogacy relationship between the MRI lesion count and the annualized relapse rate at the trial level. We start by modeling the true treatment effects (the surrogacy relationship) in the single-contrast clinical trials and develop the conditional distribution of the observed endpoints given the true endpoints to account for the estimation errors. Similar models are then generalized to the multiple-contrast trials to address the issue of the correlated contrasts. Once all components of the model are constructed, the model parameters are estimated based on “normal estimating equations”. The results are then compared with those obtained from the SBRCMB approach, and the estimated surrogacy relationship and its usefulness in practice are evaluated.

In each arm, we define the true clinical endpoint as the true annualized relapse rate, which is the expected value of the observed annualized relapse rate. In fact, every patient in an arm has his or her own observed annualized relapse rate, and we assume that these all have the same probability distribution, whose expectation is the true annualized relapse rate (as defined in Section 3.3.1). Similarly, we define the true surrogate endpoint as the true MRI lesion count per patient per scan, which is the expected value of the observed MRI lesion count per patient per scan. So, corresponding to the estimated treatment effects defined through the observed endpoints, we define the true treatment effects as the log ratios between the true endpoints in the active arm and in the control arm.
We aim to assess the relationship between these true treatment effects.

4.1 Model for the Single-contrast Clinical Trials

4.1.1 Model for the True Treatment Effects

In the SBRCMB dataset, there are 9 single-contrast trials. For each of these 9 trials, let $R_a^{\text{true}}$ and $R_c^{\text{true}}$ denote the true annualized relapse rates in the active arm and in the control arm, and let $M_a^{\text{true}}$ and $M_c^{\text{true}}$ denote the true MRI lesion counts per patient per scan in the active arm and in the control arm. The true treatment effect on the clinical endpoint is then defined as $Y^{\text{true}} = \log(R_a^{\text{true}}/R_c^{\text{true}})$, and the true treatment effect on the surrogate endpoint as $X^{\text{true}} = \log(M_a^{\text{true}}/M_c^{\text{true}})$. We assume the following bivariate normal model for these two true treatment effects:
$$\begin{pmatrix} Y^{\text{true}} \\ X^{\text{true}} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \mu_Y \\ \mu_X \end{pmatrix}, \begin{pmatrix} \sigma_Y^2 & \sigma_{YX} \\ \sigma_{YX} & \sigma_X^2 \end{pmatrix} \right). \tag{4.1}$$
Since different trials consist of different patients, we assume that the true treatment effects are independent across trials. Model (4.1) is assumed to hold for all the contrasts from all the single-contrast trials. This is reasonable because all the trials in the dataset were included to examine the effects of treatments with similar mechanisms of action, and we therefore expect to see a similar relationship between the true treatment effects on the clinical endpoint and on the surrogate endpoint across all the trials with this type of treatment. We omit the subscript $i$ for the $i$th trial throughout the following development.

The distribution in (4.1) is specified in an unstructured form. To express the surrogacy relationship, we represent the moments of the conditional distribution of $Y^{\text{true}}$ given $X^{\text{true}}$ as:
$$E(Y^{\text{true}} \mid X^{\text{true}}) = \alpha + \beta X^{\text{true}} \quad\text{and}\quad \mathrm{Var}(Y^{\text{true}} \mid X^{\text{true}}) = \tau^2. \tag{4.2}$$
The parameter $\beta$ is of primary interest, as it measures the strength of the surrogacy relationship.
If $\beta$ is 0, then the MRI lesion count is not a surrogate for the annualized relapse rate for this type of treatment at the trial level, since knowledge of the true treatment effect on the MRI lesion count does not help to predict the true treatment effect on the annualized relapse rate. The parameter $\alpha$ is also of interest, and we expect it to be small: if $\alpha$ is not 0, part of the true treatment effect on the annualized relapse rate is not explained by the true treatment effect on the MRI lesion count per patient per scan. The parameter $\tau^2$ governs the precision of this linear relationship, that is, how precisely we can predict the true treatment effect on the annualized relapse rate given the true treatment effect on the MRI lesion count: the smaller $\tau^2$, the more precise the prediction.

The Prentice definition (1.1) describes a perfect surrogate relationship: no treatment effect on the surrogate endpoint implies no treatment effect on the clinical endpoint, and vice versa. In our context, (1.1) requires both $\alpha$ and $\tau^2$ to be 0, while $\beta$ must not be 0; that is, the relationship between $Y^{\text{true}}$ and $X^{\text{true}}$ is deterministic and proportional: $Y^{\text{true}} = \beta X^{\text{true}}$. However, such a perfect surrogacy relationship will seldom be realized in practice.

Using the parametrization in (4.2), we can rewrite (4.1) as:
$$\begin{pmatrix} Y^{\text{true}} \\ X^{\text{true}} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \alpha + \beta\mu_X \\ \mu_X \end{pmatrix}, \begin{pmatrix} \beta^2\sigma_X^2 + \tau^2 & \beta\sigma_X^2 \\ \beta\sigma_X^2 & \sigma_X^2 \end{pmatrix} \right). \tag{4.3}$$

4.1.2 Model for the Observed Annualized Relapse Rate and MRI Lesion Count per Patient per Scan

Let $R_a$, $R_c$ and $M_a$, $M_c$ denote the observed annualized relapse rates and the observed MRI lesion counts per patient per scan in the active and control arms. To derive the probability distributions of $R_a$ and $R_c$, we use the same assumptions as in Section 3.3.1 and follow the notation used there (except that we now write $\phi_1$ instead of $\phi$).
As a result, we have:
$$\log R_a \mid R_a^{\text{true}} \;\dot\sim\; N\!\left(\log R_a^{\text{true}},\; \frac{\phi_1}{T N_a R_a^{\text{true}}}\right), \tag{4.4}$$
$$\log R_c \mid R_c^{\text{true}} \;\dot\sim\; N\!\left(\log R_c^{\text{true}},\; \frac{\phi_1}{T N_c R_c^{\text{true}}}\right). \tag{4.5}$$
Similarly, for the observed MRI lesion count, let $G_j$ denote the cumulative number of MRI lesions of the $j$th patient in the active arm over the $K$ scans obtained for this patient during the follow-up time $T$. (As in SBRCMB, we assume that the follow-up time for the MRI data is the same as that for the relapse data, that all the patients in a trial have the same follow-up time $T$, and that all the patients in a trial have the same number of scans $K$.) We then assume:
$$E(G_j \mid M_a^{\text{true}}) = K M_a^{\text{true}}, \qquad \mathrm{Var}(G_j \mid M_a^{\text{true}}) = \phi_2 K M_a^{\text{true}}, \tag{4.6}$$
where $\phi_2$ is a dispersion parameter describing how the variance of the MRI lesion count is related to its expectation. As for $\phi_1$, we assume that $\phi_2$ is the same for all the patients in all the trials, so it carries neither the subscript $j$ nor the subscript $i$. Then:
$$E\!\left(\frac{G_j}{K} \,\middle|\, M_a^{\text{true}}\right) = M_a^{\text{true}}, \qquad \mathrm{Var}\!\left(\frac{G_j}{K} \,\middle|\, M_a^{\text{true}}\right) = \frac{\phi_2 M_a^{\text{true}}}{K}. \tag{4.7}$$
By definition, the observed MRI lesion count per patient per scan is:
$$M_a = \frac{G_1 + G_2 + \cdots + G_{N_a}}{K N_a}. \tag{4.8}$$
Then, by the delta method and the Central Limit Theorem, we obtain the following approximation to the conditional distribution of $\log M_a$:
$$\log M_a \mid M_a^{\text{true}} \;\dot\sim\; N\!\left(\log M_a^{\text{true}},\; \frac{\phi_2}{K N_a M_a^{\text{true}}}\right). \tag{4.9}$$
Similarly, for the control arm, we have:
$$\log M_c \mid M_c^{\text{true}} \;\dot\sim\; N\!\left(\log M_c^{\text{true}},\; \frac{\phi_2}{K N_c M_c^{\text{true}}}\right). \tag{4.10}$$

4.1.3 Model for the Estimated Treatment Effects

From (4.4), it is clear that $R_a$ and $R_a^{\text{true}}$ are not independent, which is reasonable since the observed clinical endpoint should depend on the true clinical endpoint. We now assume that, given $R_a^{\text{true}}$, the conditional distribution of $\log R_a$ does not depend on $R_c^{\text{true}}$, $M_a^{\text{true}}$ or $M_c^{\text{true}}$; that is, if we already know $R_a^{\text{true}}$, the additional information in $R_c^{\text{true}}$, $M_a^{\text{true}}$ and $M_c^{\text{true}}$ does not help to predict $\log R_a$. It is natural to think that $R_c^{\text{true}}$ and $M_c^{\text{true}}$ affect neither $R_a$ nor $R_a^{\text{true}}$.
The patients in the active arm and in the control arm are distinct, and the patients in the active arm received the treatment while those in the control arm did not, so the behavior of the patients in the control arm should not affect the behavior of the patients in the active arm. As for $M_a^{\text{true}}$, we may suppose that any effect it has on $R_a$ operates only through $R_a^{\text{true}}$. Therefore, instead of (4.4), we make the stronger assumption that:
$$\log R_a \mid U^{\text{true}} \;\overset{d}{=}\; \log R_a \mid R_a^{\text{true}} \;\dot\sim\; N\!\left(\log R_a^{\text{true}},\; \frac{\phi_1}{T N_a R_a^{\text{true}}}\right), \tag{4.11}$$
where $U^{\text{true}} = (R_a^{\text{true}}, R_c^{\text{true}}, M_a^{\text{true}}, M_c^{\text{true}})^T$. The same argument leads to the corresponding results for $\log R_c \mid U^{\text{true}}$, $\log M_a \mid U^{\text{true}}$ and $\log M_c \mid U^{\text{true}}$.

Furthermore, we make the additional model assumption that $\log R_a$, $\log R_c$, $\log M_a$ and $\log M_c$ are conditionally independent given $U^{\text{true}}$. The motivation for this assumption is the intuitive notion that each observed quantity is affected only by the corresponding true quantity, so once all the true quantities are given, the observed quantities should not affect each other. Then, writing $U = (R_a, R_c, M_a, M_c)^T$ (with the logarithm applied componentwise), we have:
$$\log U \mid U^{\text{true}} \;\dot\sim\; N_4\!\left( \begin{pmatrix} \log R_a^{\text{true}} \\ \log R_c^{\text{true}} \\ \log M_a^{\text{true}} \\ \log M_c^{\text{true}} \end{pmatrix}, \begin{pmatrix} \frac{\phi_1}{T N_a R_a^{\text{true}}} & 0 & 0 & 0 \\ 0 & \frac{\phi_1}{T N_c R_c^{\text{true}}} & 0 & 0 \\ 0 & 0 & \frac{\phi_2}{K N_a M_a^{\text{true}}} & 0 \\ 0 & 0 & 0 & \frac{\phi_2}{K N_c M_c^{\text{true}}} \end{pmatrix} \right). \tag{4.12}$$
Let $Y = \log(R_a/R_c)$ and $X = \log(M_a/M_c)$ denote the estimated treatment effects on the clinical outcome and on the surrogate outcome, respectively. We can express $Y$ and $X$ in terms of $U$:
$$\begin{pmatrix} Y \\ X \end{pmatrix} = A \log U, \quad\text{where}\quad A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}. \tag{4.13}$$
Combining (4.3) and (4.12), we obtain approximations to the first two moments of the estimated treatment effects:
$$E\begin{pmatrix} Y \\ X \end{pmatrix} = E(A\log U) = E\big(E(A\log U \mid U^{\text{true}})\big) \approx E(A\log U^{\text{true}}) = \begin{pmatrix} \alpha + \beta\mu_X \\ \mu_X \end{pmatrix}, \tag{4.14}$$
$$\mathrm{Var}\begin{pmatrix} Y \\ X \end{pmatrix} = \mathrm{Var}\big(E(A\log U \mid U^{\text{true}})\big) + E\big(\mathrm{Var}(A\log U \mid U^{\text{true}})\big) \approx \begin{pmatrix} (\beta^2\sigma_X^2 + \tau^2) + \frac{\phi_1}{T N_a} E\!\left(\frac{1}{R_a^{\text{true}}}\right) + \frac{\phi_1}{T N_c} E\!\left(\frac{1}{R_c^{\text{true}}}\right) & \beta\sigma_X^2 \\ \beta\sigma_X^2 & \sigma_X^2 + \frac{\phi_2}{K N_a} E\!\left(\frac{1}{M_a^{\text{true}}}\right) + \frac{\phi_2}{K N_c} E\!\left(\frac{1}{M_c^{\text{true}}}\right) \end{pmatrix}. \tag{4.15}$$
Unlike these moments, the marginal distribution of the estimated treatment effects is difficult to derive: obtaining the marginal distribution of $(Y, X)^T$ would require additional distributional assumptions about $U^{\text{true}}$. On the other hand, as $N_a$ and $N_c$, the numbers of patients in the active and control arms, increase, the influence of the estimation errors becomes small. As a result, the observed endpoints approach the true endpoints, and the estimated treatment effects approach the true treatment effects. Since (4.1) assumes that the true treatment effects follow a joint normal distribution, a normal distribution with moments given by (4.14) and (4.15) may be a reasonable approximation to the distribution of $(Y, X)^T$ for large $N_a$ and $N_c$.

4.2 Model for the Multiple-contrast Clinical Trials

Besides the 9 single-contrast trials, there are 12 two-contrast trials, 1 three-contrast trial and 1 four-contrast trial. Each two-contrast trial has a control arm, a high-dose arm and a low-dose arm. For each of the 12 two-contrast trials, let $R_{a1}^{\text{true}}$ and $R_{a2}^{\text{true}}$ represent the true annualized relapse rates in the high-dose arm and in the low-dose arm, respectively, and let $M_{a1}^{\text{true}}$ and $M_{a2}^{\text{true}}$ represent the true MRI lesion counts per patient per scan in the high-dose and low-dose arms, respectively. Then the true treatment effects for the high-dose versus control contrast can be expressed as $Y_1^{\text{true}} = \log(R_{a1}^{\text{true}}/R_c^{\text{true}})$ and $X_1^{\text{true}} = \log(M_{a1}^{\text{true}}/M_c^{\text{true}})$, and those for the low-dose versus control contrast as $Y_2^{\text{true}} = \log(R_{a2}^{\text{true}}/R_c^{\text{true}})$ and $X_2^{\text{true}} = \log(M_{a2}^{\text{true}}/M_c^{\text{true}})$. Here, too, we omit the subscript $i$ for the $i$th trial.

To take into account the fact that the two pairs of true treatment effects, $(Y_1^{\text{true}}, X_1^{\text{true}})$ and $(Y_2^{\text{true}}, X_2^{\text{true}})$, are correlated, we assume a joint normal distribution for them.
Considered individually, each of the pairs $(Y_1^{\text{true}}, X_1^{\text{true}})$ and $(Y_2^{\text{true}}, X_2^{\text{true}})$ should have the bivariate normal distribution (4.3), because we are examining the effects of treatments with similar mechanisms of action: whether two contrasts come from one trial or from different trials, they should reflect the same surrogacy relationship. However, to determine the joint distribution of these four quantities, we also need to specify the covariance structure between $(Y_1^{\text{true}}, X_1^{\text{true}})$ and $(Y_2^{\text{true}}, X_2^{\text{true}})$. Assuming independence among the true endpoints from different arms, we have:
$$\mathrm{Cov}(Y_1^{\text{true}}, Y_2^{\text{true}}) = \mathrm{Cov}\!\left(\log\frac{R_{a1}^{\text{true}}}{R_c^{\text{true}}}, \log\frac{R_{a2}^{\text{true}}}{R_c^{\text{true}}}\right) = \mathrm{Var}(\log R_c^{\text{true}}), \tag{4.16}$$
$$\mathrm{Cov}(X_1^{\text{true}}, X_2^{\text{true}}) = \mathrm{Cov}\!\left(\log\frac{M_{a1}^{\text{true}}}{M_c^{\text{true}}}, \log\frac{M_{a2}^{\text{true}}}{M_c^{\text{true}}}\right) = \mathrm{Var}(\log M_c^{\text{true}}), \tag{4.17}$$
$$\mathrm{Cov}(Y_1^{\text{true}}, X_2^{\text{true}}) = \mathrm{Cov}\!\left(\log\frac{R_{a1}^{\text{true}}}{R_c^{\text{true}}}, \log\frac{M_{a2}^{\text{true}}}{M_c^{\text{true}}}\right) = \mathrm{Cov}(\log R_c^{\text{true}}, \log M_c^{\text{true}}), \tag{4.18}$$
$$\mathrm{Cov}(Y_2^{\text{true}}, X_1^{\text{true}}) = \mathrm{Cov}\!\left(\log\frac{R_{a2}^{\text{true}}}{R_c^{\text{true}}}, \log\frac{M_{a1}^{\text{true}}}{M_c^{\text{true}}}\right) = \mathrm{Cov}(\log R_c^{\text{true}}, \log M_c^{\text{true}}). \tag{4.19}$$
In principle, these covariances introduce 3 new parameters into the joint distribution of $(Y_1^{\text{true}}, X_1^{\text{true}}, Y_2^{\text{true}}, X_2^{\text{true}})^T$, in addition to the parameters $\alpha, \beta, \mu_X, \sigma_X^2, \tau^2$ that appear in (4.3). However, note that $\mathrm{Var}(Y_1^{\text{true}}) = \mathrm{Var}(\log R_{a1}^{\text{true}}) + \mathrm{Var}(\log R_c^{\text{true}})$, where $\mathrm{Var}(\log R_{a1}^{\text{true}})$ represents the variability across trials of the log true annualized relapse rate in the high-dose arm, and $\mathrm{Var}(\log R_c^{\text{true}})$ represents the variability across trials of the log true annualized relapse rate in the control arm. So, even though $\log R_{a1}^{\text{true}}$ and $\log R_c^{\text{true}}$ may be quite different within a given trial because of the treatment effect, the two variabilities across trials may not differ much. To simplify our model, we assume $\mathrm{Var}(\log R_{a1}^{\text{true}}) = \mathrm{Var}(\log R_c^{\text{true}})$.
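Under this assumption, from (4.3), we obtain:
$$\mathrm{Cov}(Y_1^{\text{true}}, Y_2^{\text{true}}) = \mathrm{Var}(\log R_c^{\text{true}}) = \tfrac12\mathrm{Var}(Y_1^{\text{true}}) = \tfrac12(\beta^2\sigma_X^2 + \tau^2). \tag{4.20}$$
The assumption that $\mathrm{Var}(\log M_{a1}^{\text{true}}) = \mathrm{Var}(\log M_c^{\text{true}})$ similarly leads to:
$$\mathrm{Cov}(X_1^{\text{true}}, X_2^{\text{true}}) = \mathrm{Var}(\log M_c^{\text{true}}) = \tfrac12\mathrm{Var}(X_1^{\text{true}}) = \tfrac12\sigma_X^2. \tag{4.21}$$
At the same time, note that $\mathrm{Cov}(Y_1^{\text{true}}, X_1^{\text{true}}) = \mathrm{Cov}\big(\log(R_{a1}^{\text{true}}/R_c^{\text{true}}), \log(M_{a1}^{\text{true}}/M_c^{\text{true}})\big) = \mathrm{Cov}(\log R_{a1}^{\text{true}}, \log M_{a1}^{\text{true}}) + \mathrm{Cov}(\log R_c^{\text{true}}, \log M_c^{\text{true}})$, where $\mathrm{Cov}(\log R_{a1}^{\text{true}}, \log M_{a1}^{\text{true}})$ measures how closely the two true endpoints in the high-dose arm are related across trials, and $\mathrm{Cov}(\log R_c^{\text{true}}, \log M_c^{\text{true}})$ measures how closely the two true endpoints in the control arm are related across trials. Even though the true relationship between the two endpoints in the high-dose arm may be quite different from that in the control arm, the two measures of closeness may not differ much. Thus, to simplify our model, we assume $\mathrm{Cov}(\log R_{a1}^{\text{true}}, \log M_{a1}^{\text{true}}) = \mathrm{Cov}(\log R_c^{\text{true}}, \log M_c^{\text{true}})$. Under this assumption, from (4.3), we obtain:
$$\mathrm{Cov}(Y_1^{\text{true}}, X_2^{\text{true}}) = \mathrm{Cov}(Y_2^{\text{true}}, X_1^{\text{true}}) = \mathrm{Cov}(\log R_c^{\text{true}}, \log M_c^{\text{true}}) = \tfrac12\mathrm{Cov}(Y_1^{\text{true}}, X_1^{\text{true}}) = \tfrac12\beta\sigma_X^2. \tag{4.22}$$
All these assumptions lead to the following joint distribution of the true treatment effects in a two-contrast trial:
$$\begin{pmatrix} Y_1^{\text{true}} \\ X_1^{\text{true}} \\ Y_2^{\text{true}} \\ X_2^{\text{true}} \end{pmatrix} \sim N_4\!\left( \begin{pmatrix} \alpha + \beta\mu_X \\ \mu_X \\ \alpha + \beta\mu_X \\ \mu_X \end{pmatrix}, \begin{pmatrix} \beta^2\sigma_X^2 + \tau^2 & \beta\sigma_X^2 & \tfrac12(\beta^2\sigma_X^2 + \tau^2) & \tfrac12\beta\sigma_X^2 \\ \beta\sigma_X^2 & \sigma_X^2 & \tfrac12\beta\sigma_X^2 & \tfrac12\sigma_X^2 \\ \tfrac12(\beta^2\sigma_X^2 + \tau^2) & \tfrac12\beta\sigma_X^2 & \beta^2\sigma_X^2 + \tau^2 & \beta\sigma_X^2 \\ \tfrac12\beta\sigma_X^2 & \tfrac12\sigma_X^2 & \beta\sigma_X^2 & \sigma_X^2 \end{pmatrix} \right). \tag{4.23}$$
To derive the probabilistic structure of the estimated treatment effects in a two-contrast trial, we first focus on the conditional distribution of the observed endpoints given the true endpoints. Let $\tilde U = (R_{a1}, R_{a2}, R_c, M_{a1}, M_{a2}, M_c)^T$ and $\tilde U^{\text{true}} = (R_{a1}^{\text{true}}, R_{a2}^{\text{true}}, R_c^{\text{true}}, M_{a1}^{\text{true}}, M_{a2}^{\text{true}}, M_c^{\text{true}})^T$ represent the observed and true endpoints, respectively.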
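We assume that $\log\tilde U \mid \tilde U^{\text{true}}$ has the same stochastic behavior as $\log U \mid U^{\text{true}}$ in the single-contrast trials. Then, as in (4.12), we have:
$$\log\tilde U \mid \tilde U^{\text{true}} \;\dot\sim\; N_6\!\left(\log\tilde U^{\text{true}},\; \mathrm{diag}\!\left(\frac{\phi_1}{T N_{a1} R_{a1}^{\text{true}}}, \frac{\phi_1}{T N_{a2} R_{a2}^{\text{true}}}, \frac{\phi_1}{T N_c R_c^{\text{true}}}, \frac{\phi_2}{K N_{a1} M_{a1}^{\text{true}}}, \frac{\phi_2}{K N_{a2} M_{a2}^{\text{true}}}, \frac{\phi_2}{K N_c M_c^{\text{true}}}\right)\right), \tag{4.24}$$
where “diag” denotes a diagonal matrix. Then, combining (4.23) and (4.24), the estimated treatment effects $Y_1 = \log(R_{a1}/R_c)$, $Y_2 = \log(R_{a2}/R_c)$, $X_1 = \log(M_{a1}/M_c)$ and $X_2 = \log(M_{a2}/M_c)$ have the following approximate first two moments:
$$E\begin{pmatrix} Y_1 \\ X_1 \\ Y_2 \\ X_2 \end{pmatrix} \approx \begin{pmatrix} \boldsymbol\mu \\ \boldsymbol\mu \end{pmatrix}, \qquad \mathrm{Var}\begin{pmatrix} Y_1 \\ X_1 \\ Y_2 \\ X_2 \end{pmatrix} \approx \begin{pmatrix} \Sigma_1 & \Sigma_3 \\ \Sigma_3 & \Sigma_2 \end{pmatrix}, \tag{4.25}$$
where $\boldsymbol\mu = (\alpha + \beta\mu_X, \mu_X)^T$ and
$$\Sigma_1 = \begin{pmatrix} (\beta^2\sigma_X^2 + \tau^2) + \frac{\phi_1}{T N_{a1}} E\!\left(\frac{1}{R_{a1}^{\text{true}}}\right) + \frac{\phi_1}{T N_c} E\!\left(\frac{1}{R_c^{\text{true}}}\right) & \beta\sigma_X^2 \\ \beta\sigma_X^2 & \sigma_X^2 + \frac{\phi_2}{K N_{a1}} E\!\left(\frac{1}{M_{a1}^{\text{true}}}\right) + \frac{\phi_2}{K N_c} E\!\left(\frac{1}{M_c^{\text{true}}}\right) \end{pmatrix},$$
$$\Sigma_2 = \begin{pmatrix} (\beta^2\sigma_X^2 + \tau^2) + \frac{\phi_1}{T N_{a2}} E\!\left(\frac{1}{R_{a2}^{\text{true}}}\right) + \frac{\phi_1}{T N_c} E\!\left(\frac{1}{R_c^{\text{true}}}\right) & \beta\sigma_X^2 \\ \beta\sigma_X^2 & \sigma_X^2 + \frac{\phi_2}{K N_{a2}} E\!\left(\frac{1}{M_{a2}^{\text{true}}}\right) + \frac{\phi_2}{K N_c} E\!\left(\frac{1}{M_c^{\text{true}}}\right) \end{pmatrix},$$
$$\Sigma_3 = \begin{pmatrix} \tfrac12(\beta^2\sigma_X^2 + \tau^2) + \frac{\phi_1}{T N_c} E\!\left(\frac{1}{R_c^{\text{true}}}\right) & \tfrac12\beta\sigma_X^2 \\ \tfrac12\beta\sigma_X^2 & \tfrac12\sigma_X^2 + \frac{\phi_2}{K N_c} E\!\left(\frac{1}{M_c^{\text{true}}}\right) \end{pmatrix}.$$
As in the single-contrast trials, the marginal distribution of the estimated treatment effects is difficult to derive, since it requires additional distributional assumptions about $\tilde U^{\text{true}}$. As before, a normal distribution with the moments given by (4.25) may be a reasonable approximation to the distribution of $(Y_1, X_1, Y_2, X_2)^T$ for large $N_{a1}$, $N_{a2}$ and $N_c$.

For the single three-contrast trial, there are 6 estimated treatment effects, and for the single four-contrast trial, there are 8. Deriving the first two moments of these 6 and 8 estimated treatment effects proceeds analogously to the above development for the 4 estimated treatment effects in a two-contrast trial.

4.3 Parameter Estimation

From (4.25), we have approximations to the first two moments of the estimated treatment effects.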
In order to estimate the model parameters, we use normal estimating equations: that is, we pretend the estimated treatment effects are multivariate normally distributed with the mean vector and variance-covariance matrix given by (4.25). Maximum likelihood estimates (MLE) of the model parameters are then obtained by maximizing this "normal likelihood". In addition to the parameters of primary interest, $\alpha$, $\beta$, $\mu_X$, $\sigma_X^2$, $\tau^2$, $\phi_1$ and $\phi_2$, several nuisance parameters appear in the covariance matrices of this "likelihood", namely the expectations of the reciprocals of the true relapse rates and lesion counts, such as $E(1/R_c^{\mathrm{true}})$ and $E(1/M_c^{\mathrm{true}})$ in (4.25). To avoid estimating too many parameters in the maximization procedure, we treat these terms as known and replace them by estimates.

As mentioned in Section 3.3.1, we assume that all the $R_a^{\mathrm{true}}$s in different contrasts have the same distribution, and that all the $R_c^{\mathrm{true}}$s in different contrasts also have the same distribution. As a result:

$$E(R_a^{\mathrm{true}}) = E(R_{a1}^{\mathrm{true}}) = E(R_{a2}^{\mathrm{true}}) \quad \text{for all the contrasts.} \tag{4.26}$$

Also, from (3.8), (3.9) and (3.10), we know that:

$$E(R_a) = E\{E(R_a \mid R_a^{\mathrm{true}})\} = E(R_a^{\mathrm{true}}). \tag{4.27}$$

From the delta method, we have the rough approximation:

$$E\left(\frac{1}{R_a^{\mathrm{true}}}\right) \approx \frac{1}{E(R_a^{\mathrm{true}})} = \frac{1}{E(R_a)}. \tag{4.28}$$

This means that we can use the observed annualized relapse rates to estimate the nuisance parameter $E(1/R_a^{\mathrm{true}})$. From the total of 40 contrasts, we estimate $E(1/R_a^{\mathrm{true}})$ by the inverse of the average of the 40 observed annualized relapse rates on the active arms. We estimate $E(1/R_c^{\mathrm{true}})$ similarly, using the observed annualized relapse rates on the 23 control arms. By the same argument, we estimate $E(1/M_a^{\mathrm{true}})$ and $E(1/M_c^{\mathrm{true}})$ using the observed MRI lesion counts per patient per scan from the 40 active arms and the 23 control arms, respectively. As a result, we have $\hat E(1/R_a^{\mathrm{true}}) \approx 1.43$, $\hat E(1/R_c^{\mathrm{true}}) \approx 1.10$, $\hat E(1/M_a^{\mathrm{true}}) \approx 0.57$ and $\hat E(1/M_c^{\mathrm{true}}) \approx 0.41$.
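As a minimal sketch of how these plug-in values enter the model, the Python below estimates a reciprocal moment by one over the average observed value, as in (4.28), and assembles the approximate covariance matrix (4.25) of $(Y_1, X_1, Y_2, X_2)^T$ from the true-effect covariance (4.23) plus the arm-level error variances of (4.24). The function names, the follow-up time `T = 2`, the arm sizes and the choice `tau2 = 0.001` (the reported bound) are hypothetical illustration values, not quantities from the SBRCMB dataset.

```python
def plug_in_reciprocal(observed):
    """Estimate E(1/true value) by 1 / mean(observed values), as in (4.28)."""
    return len(observed) / sum(observed)


def two_contrast_cov(beta, sigma2_x, tau2, phi1, phi2, T, K, n_arm, inv_r, inv_m):
    """Approximate covariance of (Y1, X1, Y2, X2) from (4.23)-(4.25).

    n_arm, inv_r and inv_m map each arm label ("a1", "a2", "c") to its number
    of patients and to the plug-in values of E(1/R^true) and E(1/M^true).
    """
    # Estimation-error variances of log R and log M on each arm, from (4.24).
    err_r = {arm: phi1 / (T * n_arm[arm]) * inv_r[arm] for arm in n_arm}
    err_m = {arm: phi2 / (K * n_arm[arm]) * inv_m[arm] for arm in n_arm}

    vy = beta**2 * sigma2_x + tau2   # Var(Y_k^true)
    cyx = beta * sigma2_x            # Cov(Y_k^true, X_k^true)
    vx = sigma2_x                    # Var(X_k^true)

    # True-effect covariance (4.23): the shared control arm gives the
    # between-contrast blocks half the within-contrast values.
    cov = [
        [vy,      cyx,     vy / 2,  cyx / 2],
        [cyx,     vx,      cyx / 2, vx / 2],
        [vy / 2,  cyx / 2, vy,      cyx],
        [cyx / 2, vx / 2,  cyx,     vx],
    ]
    # Each contrast adds the error of its own active arm plus the control arm;
    # the control-arm error also links Y1 with Y2 and X1 with X2 (Sigma_3).
    cov[0][0] += err_r["a1"] + err_r["c"]
    cov[1][1] += err_m["a1"] + err_m["c"]
    cov[2][2] += err_r["a2"] + err_r["c"]
    cov[3][3] += err_m["a2"] + err_m["c"]
    for i, j, e in ((0, 2, err_r["c"]), (1, 3, err_m["c"])):
        cov[i][j] += e
        cov[j][i] += e
    return cov


# Illustrative inputs: Table 4.1 point estimates, the rounded plug-in values
# quoted above, and a hypothetical design (T = 2, K = 6, 50 patients per arm).
inv_r = {"a1": 1.43, "a2": 1.43, "c": 1.10}
inv_m = {"a1": 0.57, "a2": 0.57, "c": 0.41}
n_arm = {"a1": 50, "a2": 50, "c": 50}
cov = two_contrast_cov(beta=0.622, sigma2_x=0.521, tau2=0.001,
                       phi1=0.825, phi2=37.427, T=2.0, K=6,
                       n_arm=n_arm, inv_r=inv_r, inv_m=inv_m)
```

The off-diagonal block only gains the control-arm error, which is why $\Sigma_3$ carries a single $\phi/(TN_c)$ or $\phi/(KN_c)$ term per diagonal entry.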
To maximize the "normal likelihood", we use the R function optim with the Nelder-Mead method [16]. The optimization is run in two stages: after obtaining the optimized parameter estimates from a given initial value, we use them as a new initial value and run the optimization again to obtain the final result. We use two stages because the first stage often converges to a local minimum of the negative log "likelihood". To avoid negative estimates of $\sigma_X$ and $\tau$ in the optimization, we re-parameterize them as $\eta_X = \log(\sigma_X)$ and $\eta = \log(\tau)$.

The first set of initial values for $\hat\alpha$, $\hat\beta$, $\hat\mu_X$, $\hat\eta_X$, $\hat\eta$, $\hat\phi_1$ and $\hat\phi_2$ was -0.02, 0.55, -0.69, -0.04, -1.21, 1.5 and 1.5. The values for $\hat\alpha$ and $\hat\beta$ are taken from the SBRCMB result, those for $\hat\mu_X$, $\hat\eta_X$ and $\hat\eta$ are based on the method of moments, and those for $\hat\phi_1$ and $\hat\phi_2$ were chosen somewhat arbitrarily. We then tried 999 further sets of initial values, generated from independent uniform distributions: the initial values for $\hat\alpha$, $\hat\beta$, $\hat\mu_X$, $\hat\eta_X$, $\hat\eta$, $\hat\phi_1$ and $\hat\phi_2$ were drawn uniformly on (-0.5, 0.5), (0, 1), (-2, 0), (-4.5, 0.5), (-5, 0), (0.01, 10) and (0.01, 20), respectively. Nearly all of these initial values converged to very similar optimization results. We chose the estimate with the smallest negative log "likelihood" as the final solution.

To calculate standard errors of the parameter estimates based on the asymptotic normality of the MLE, we evaluate the Hessian matrix of the log "likelihood" at the parameter estimates and invert its negative. We also calculate jackknife standard errors, treating the 23 clinical trials as units and re-estimating the parameters after "leaving one out": we generate 23 subsets of the original 23 clinical trials, where the $i$th subset omits the $i$th clinical trial.
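The thesis uses R's optim; purely as an illustration in Python, the sketch below hand-codes a basic Nelder-Mead simplex, runs it twice from each starting point (the two-stage restart), keeps the best result over several random starts, and then computes the leave-one-out jackknife standard error $[(n-1)/n \sum_i (\hat\theta_{(i)} - \hat\theta_{(\cdot)})^2]^{1/2}$. To stay short, the objective is a toy normal likelihood for a mean and a log standard deviation (mirroring the $\eta = \log\tau$ re-parameterization), not the "normal likelihood" of (4.25); the data, uniform boxes and tolerances are hypothetical.

```python
import math
import random


def nelder_mead(f, x0, step=0.1, tol=1e-10, max_iter=5000):
    """Minimise f starting from x0 with the basic Nelder-Mead simplex moves."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x + (step if j == i else 0.0) for j, x in enumerate(x0)] for i in range(n)
    ]
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < values[0]:                      # try expanding further
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:                   # accept the reflection
            simplex[-1], values[-1] = reflected, f_r
        else:                                    # contract towards the centroid
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:                                # shrink towards the best vertex
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    k = min(range(n + 1), key=lambda k: values[k])
    return simplex[k], values[k]


def two_stage_fit(f, starts):
    """Run Nelder-Mead twice from each start (restarting at the stage-1
    optimum) and keep the solution with the smallest objective value."""
    best_x, best_val = None, math.inf
    for x0 in starts:
        stage1, _ = nelder_mead(f, x0)
        stage2, val = nelder_mead(f, stage1)
        if val < best_val:
            best_x, best_val = stage2, val
    return best_x, best_val


def jackknife_se(estimates):
    """Leave-one-out SE: sqrt((n - 1) / n * sum((est_i - mean)^2))."""
    n = len(estimates)
    mean = sum(estimates) / n
    return math.sqrt((n - 1) / n * sum((e - mean) ** 2 for e in estimates))


def neg_loglik(theta, data):
    """Toy normal negative log-likelihood; eta = log(sd) keeps sd positive."""
    mu, eta = theta
    return len(data) * eta + sum((v - mu) ** 2 for v in data) / (2 * math.exp(2 * eta))


rng = random.Random(1)
# Random initial values from fixed uniform boxes (hypothetical box limits).
starts = [[rng.uniform(-0.5, 0.5), rng.uniform(-2.0, 0.5)] for _ in range(5)]

y = [0.3, -0.1, 0.8, 0.5, 1.2, 0.0, 0.6, 0.9]          # hypothetical data
theta_hat, _ = two_stage_fit(lambda t: neg_loglik(t, y), starts)
mu_hat = theta_hat[0]

# The i-th leave-one-out subset drops the i-th unit before refitting.
loo = [two_stage_fit(lambda t, d=y[:i] + y[i + 1:]: neg_loglik(t, d), starts)[0][0]
       for i in range(len(y))]
se_mu = jackknife_se(loo)
```

For a sample mean the jackknife standard error equals $s/\sqrt{n}$ exactly, which gives a convenient check that the refits converge.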
If the estimate of $\beta$ from the $i$th subset is $\hat\beta_{(i)}$, then the jackknife estimate of the standard error of $\hat\beta$ is

$$\left[\frac{22}{23}\sum_{i=1}^{23}\left(\hat\beta_{(i)} - \hat\beta_{(\cdot)}\right)^2\right]^{1/2},$$

where $\hat\beta_{(\cdot)}$ is the average of the $\hat\beta_{(i)}$s [17]. Strictly speaking, this is not an appropriate application of the jackknife method, since different trials have different numbers of patients and different numbers of arms, so the estimation errors in different trials are not identically distributed. The resulting standard errors should therefore be viewed only as "rough and ready" approximations.

The parameter estimates and the corresponding estimated standard errors are shown in Table 4.1, and the estimated asymptotic correlation matrix of $\hat\alpha$, $\hat\beta$, $\hat\mu_X$, $\hat\sigma_X^2$, $\hat\tau^2$, $\hat\phi_1$ and $\hat\phi_2$ based on the MLE method is:

$$\hat R = \begin{pmatrix}
1.000 & 0.776 & 0.056 & 0.394 & 0.002 & 0.442 & 0.468\\
0.776 & 1.000 & 0.108 & 0.479 & 0.007 & 0.414 & 0.444\\
0.056 & 0.108 & 1.000 & 0.106 & 0.002 & 0.158 & 0.194\\
0.394 & 0.479 & 0.106 & 1.000 & 0.003 & 0.215 & 0.410\\
0.002 & 0.007 & 0.002 & 0.003 & 1.000 & 0.001 & 0.004\\
0.442 & 0.414 & 0.158 & 0.215 & 0.001 & 1.000 & 0.557\\
0.468 & 0.444 & 0.194 & 0.410 & 0.004 & 0.557 & 1.000
\end{pmatrix} \tag{4.29}$$

Table 4.1: Results of the Model Fit

| | $\hat\alpha$ | $\hat\beta$ | $\hat\mu_X$ | $\hat\sigma_X^2$ | $\hat\tau^2$ | $\hat\phi_1$ | $\hat\phi_2$ |
|---|---|---|---|---|---|---|---|
| Value | 0.081 | 0.622 | -0.713 | 0.521 | < 0.001 | 0.825 | 37.427 |
| Normal SE | 0.084 | 0.074 | 0.156 | 0.167 | < 0.001 | 0.383 | 19.932 |
| Jackknife SE | 0.105 | 0.150 | 0.179 | 0.198 | 0.003 | 0.498 | 33.496 |

Although all the jackknife standard errors are larger than the corresponding MLE standard errors, the two methods lead to the same conclusions in tests of significance of the estimates (except for $\hat\phi_1$).

Recall that $Y^{\mathrm{true}} = \log(R_a^{\mathrm{true}}/R_c^{\mathrm{true}})$ and $X^{\mathrm{true}} = \log(M_a^{\mathrm{true}}/M_c^{\mathrm{true}})$. When a treatment has a beneficial effect, we expect a lower MRI lesion count and a lower relapse rate, which means $Y^{\mathrm{true}} < 0$ and $X^{\mathrm{true}} < 0$. Therefore, an increase in the true treatment effect corresponds to a decrease in $Y^{\mathrm{true}}$ and in $X^{\mathrm{true}}$.
So, $\hat\beta = 0.622$ means that, on average, a one-unit increase in the true treatment effect on the MRI lesion count per patient per scan is associated with a 0.622-unit increase in the true treatment effect on the annualized relapse rate. This value is larger than the $\hat\beta = 0.55$ obtained with the SBRCMB approach (see Table 3.1). Because the SBRCMB approach does not take estimation errors into account, its regression coefficient of 0.55 may underestimate the association between the true treatment effects owing to the attenuation effect.

Although the estimate $\hat\alpha = 0.081$ is larger than the $\hat\alpha = -0.02$ from the SBRCMB approach, its approximate 95% confidence interval still covers 0. An estimate of $\alpha$ not significantly different from 0 is consistent with a good surrogacy relationship, since there is no strong indication that part of the true treatment effect on the annualized relapse rate is unexplained by the true treatment effect on the MRI lesion count per patient per scan. Finally, $\hat\tau^2$ is almost 0, which suggests a nearly perfect linear relationship between the true treatment effects: one can predict the true treatment effect on the annualized relapse rate almost without error from the true treatment effect on the MRI lesion count per patient per scan. As mentioned at the end of Section 4.1.1, the Prentice definition requires that $\alpha = 0$ and $\tau^2 = 0$. So, under our model assumptions, the MRI lesion count per patient per scan appears to be a very good surrogate endpoint.

Buyse et al. [13] suggest using $R^2_{\mathrm{trial}}$ to evaluate the surrogacy relationship at the trial level. Analogous to (2.14) and (2.15), $\beta^2\sigma_X^2 + \tau^2$ represents the uncertainty in predicting the true treatment effect on the clinical endpoint without information on the surrogate endpoint, and $\tau^2$ represents the uncertainty with that information. Thus, the difference $\beta^2\sigma_X^2$ represents how much we gain from using the surrogate.
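The trial-level $R^2$ just defined can be computed directly from the Table 4.1 estimates. In this small sketch (the function name is ours), $\hat\tau^2$ is reported only as $< 0.001$, and since $R^2_{\mathrm{trial}}$ decreases as $\tau^2$ grows, plugging in 0.001 gives a lower bound.

```python
def r2_trial(beta, sigma2_x, tau2):
    """Trial-level R^2 = beta^2 sigma_X^2 / (beta^2 sigma_X^2 + tau^2)."""
    explained = beta**2 * sigma2_x        # variance removed by knowing X^true
    return explained / (explained + tau2)


# Table 4.1 reports tau2 only as "< 0.001", so 0.001 gives a lower bound.
r2_lower = r2_trial(0.622, 0.521, 0.001)   # about 0.995
r2_at_zero = r2_trial(0.622, 0.521, 0.0)   # exactly 1.0
```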
From Table 4.1, we have

$$\hat R^2_{\mathrm{trial}} = \frac{\hat\beta^2\hat\sigma_X^2}{\hat\beta^2\hat\sigma_X^2 + \hat\tau^2} \approx 1. \tag{4.30}$$

An estimate of $R^2_{\mathrm{trial}}$ of almost 1 suggests a very good surrogacy relationship. As a result, we can say that, at the trial level, the MRI lesion count per patient per scan has been validated as a surrogate endpoint for the annualized relapse rate in RRMS. However, an estimate of $\tau^2$ of almost 0, or of $R^2_{\mathrm{trial}}$ of almost 1, does not guarantee high precision in predicting the true treatment effect on the annualized relapse rate in a new trial. In Section 4.5, we assess this by using the estimated surrogacy relationship to make such predictions.

As noted earlier, the jackknife method may not be very appropriate, since the 23 trials that we treat as units cannot be considered a random sample. Of course, the standard errors calculated by the MLE method are also approximate, because we do not have the true likelihood. In the following sections, we use the standard errors based on the asymptotic normality of the MLE.

4.4 Comparison between the Comprehensive Approach and the SBRCMB Approach

In a contrast from a new clinical trial (we use the subscript "0" to denote this new contrast), if we knew the true treatment effect on the MRI lesion count per patient per scan, $X_0^{\mathrm{true}}$, we could use it to predict the true treatment effect on the annualized relapse rate, $Y_0^{\mathrm{true}}$. In practice, however, only a limited number of patients are included in any trial, and we only have the estimated treatment effect $X_0$. So we need to use $X_0$ instead of $X_0^{\mathrm{true}}$ to predict $Y_0^{\mathrm{true}}$; that is, we want to use the surrogacy relationship to predict the treatment effect on the clinical endpoint from the estimated treatment effect on the surrogate endpoint.

To identify the relationship between $Y_0^{\mathrm{true}}$ and $X_0$, first note that $\mathrm{Cov}(Y_0^{\mathrm{true}}, X_0) = E(Y_0^{\mathrm{true}}X_0) - E(Y_0^{\mathrm{true}})E(X_0)$.
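We assume this new trial has similar inclusion criteria and involves the same type of treatment as the 23 trials included in the SBRCMB dataset. So, from (4.3) and (4.14), we have $E(X_0) \approx E(X_0^{\mathrm{true}})$. Let $U_0^{\mathrm{true}} = (R_{a0}^{\mathrm{true}}, R_{c0}^{\mathrm{true}}, M_{a0}^{\mathrm{true}}, M_{c0}^{\mathrm{true}})^T$ denote the true endpoints from the new contrast. Then, from (4.12), we have $E(Y_0^{\mathrm{true}}X_0) = E\{E(Y_0^{\mathrm{true}}X_0 \mid U_0^{\mathrm{true}})\} \approx E(Y_0^{\mathrm{true}}X_0^{\mathrm{true}})$. Therefore:

$$\mathrm{Cov}(Y_0^{\mathrm{true}}, X_0) \approx E(Y_0^{\mathrm{true}}X_0^{\mathrm{true}}) - E(Y_0^{\mathrm{true}})E(X_0^{\mathrm{true}}) = \mathrm{Cov}(Y_0^{\mathrm{true}}, X_0^{\mathrm{true}}). \tag{4.31}$$

As a result, we have the following approximation to the moment structure of $Y_0^{\mathrm{true}}$ and $X_0$:

$$\begin{pmatrix}Y_0^{\mathrm{true}}\\ X_0\end{pmatrix} \;\dot\sim\; \left(\begin{pmatrix}\alpha+\beta\mu_X\\ \mu_X\end{pmatrix}, \begin{pmatrix}\beta^2\sigma_X^2 + \tau^2 & \beta\sigma_X^2\\ \beta\sigma_X^2 & \sigma_X^2 + \frac{\phi_2}{K_0N_{a0}}E\left(\frac{1}{M_{a0}^{\mathrm{true}}}\right) + \frac{\phi_2}{K_0N_{c0}}E\left(\frac{1}{M_{c0}^{\mathrm{true}}}\right)\end{pmatrix}\right), \tag{4.32}$$

where $K_0$ is the total number of scans on each patient in the new trial, and $N_{a0}$ and $N_{c0}$ are the numbers of patients in the active and control arms of the new trial, respectively.

The point prediction for $Y_0^{\mathrm{true}}$ can be based on $E(Y_0^{\mathrm{true}} \mid X_0)$, but determining a prediction interval for $Y_0^{\mathrm{true}}$ requires information on the conditional distribution of $Y_0^{\mathrm{true}}$ given $X_0$. To derive this distribution, we approximate the joint distribution of $Y_0^{\mathrm{true}}$ and $X_0$ by the normal distribution with moments given by (4.32). The joint distribution is unknown, but as the numbers of patients in the new trial, $N_{a0}$ and $N_{c0}$, become larger, the estimation error in the estimated treatment effect $X_0$ becomes smaller, and the estimated treatment effect approaches the true treatment effect $X_0^{\mathrm{true}}$. We may therefore regard the bivariate normal distribution as a reasonable approximation for large $N_{a0}$ and $N_{c0}$.

Under this bivariate normal approximation, we have:

$$E(Y_0^{\mathrm{true}} \mid X_0) = \alpha + \beta\mu_X\left(1 - \frac{\sigma_X^2}{\sigma_X^2 + H_0}\right) + \frac{\beta\sigma_X^2}{\sigma_X^2 + H_0}X_0, \tag{4.33}$$

$$\mathrm{Var}(Y_0^{\mathrm{true}} \mid X_0) = \beta^2\sigma_X^2\left(1 - \frac{\sigma_X^2}{\sigma_X^2 + H_0}\right) + \tau^2, \tag{4.34}$$

where $H_0 = \frac{\phi_2}{K_0N_{a0}}E(1/M_{a0}^{\mathrm{true}}) + \frac{\phi_2}{K_0N_{c0}}E(1/M_{c0}^{\mathrm{true}})$.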
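A short sketch of (4.33) and (4.34), together with the definition of $H_0$, makes the shrinkage concrete. It plugs in the Table 4.1 estimates, sets $\tau^2 = 0$ (reported as $< 0.001$), and uses the rounded reciprocal-moment estimates 0.57 and 0.41 from Section 4.3 for $E(1/M_{a0}^{\mathrm{true}})$ and $E(1/M_{c0}^{\mathrm{true}})$; the helper names are hypothetical.

```python
def h0(phi2, k0, n_a0, n_c0, inv_m_a, inv_m_c):
    """Estimation-error variance H0 of X0 in a new contrast (defined below (4.34))."""
    return phi2 / (k0 * n_a0) * inv_m_a + phi2 / (k0 * n_c0) * inv_m_c


def conditional_moments(alpha, beta, mu_x, sigma2_x, tau2, H):
    """Intercept, slope and variance of Y0^true given X0, from (4.33)-(4.34)."""
    r = sigma2_x / (sigma2_x + H)     # shrinkage factor; r = 1 when H = 0
    intercept = alpha + beta * mu_x * (1 - r)
    slope = beta * r
    variance = beta**2 * sigma2_x * (1 - r) + tau2
    return intercept, slope, variance


# Table 4.1 estimates; tau2 is taken as 0 because it is reported as < 0.001.
est = dict(alpha=0.081, beta=0.622, mu_x=-0.713, sigma2_x=0.521, tau2=0.0)

H_50 = h0(37.427, 6, 50, 50, 0.57, 0.41)               # K0 = 6, 50 patients per arm
b0_50, b1_50, var_50 = conditional_moments(H=H_50, **est)
b0_inf, b1_inf, _ = conditional_moments(H=0.0, **est)  # no estimation error in X0
H_10 = h0(37.427, 6, 10, 10, 0.57, 0.41)               # hypothetical: 10 per arm
_, b1_10, _ = conditional_moments(H=H_10, **est)
```

With 50 patients per arm this gives an intercept of about -0.003 and a slope of about 0.50, in line with the line $y = 0.50x$ used for Figure 4.1, while $H_0 = 0$ returns the unshrunk line $\hat\alpha + \hat\beta x$. With 10 patients per arm (and, as an assumption, 6 scans) the slope falls to roughly 0.29.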
So, the point prediction of $Y_0^{\mathrm{true}}$ for a future contrast, given the value $X_0 = x_0$ from that contrast, is:

$$\hat Y_0^{\mathrm{true}}(x_0) = \hat E(Y_0^{\mathrm{true}} \mid X_0 = x_0) = \hat\alpha + \hat\beta\hat\mu_X\left(1 - \frac{\hat\sigma_X^2}{\hat\sigma_X^2 + \hat H_0}\right) + \frac{\hat\beta\hat\sigma_X^2}{\hat\sigma_X^2 + \hat H_0}x_0, \tag{4.35}$$

where $\hat H_0 = \frac{\hat\phi_2}{K_0N_{a0}}E(1/M_{a0}^{\mathrm{true}}) + \frac{\hat\phi_2}{K_0N_{c0}}E(1/M_{c0}^{\mathrm{true}})$. As before, $E(1/M_{a0}^{\mathrm{true}})$ and $E(1/M_{c0}^{\mathrm{true}})$ are treated as known and replaced by the inverses of the average MRI lesion counts per patient per scan on the 40 active arms and the 23 control arms, respectively, in the SBRCMB dataset.

The prediction interval for $Y_0^{\mathrm{true}}$ given $X_0 = x_0$ can be based on the random variable:

$$W_0 = Y_0^{\mathrm{true}}(x_0) - \hat Y_0^{\mathrm{true}}(x_0). \tag{4.36}$$

Note that, given $X_0 = x_0$, $Y_0^{\mathrm{true}}(x_0)$ and $\hat Y_0^{\mathrm{true}}(x_0)$ are independent, so $\mathrm{Var}(W_0) = \mathrm{Var}(Y_0^{\mathrm{true}}(x_0)) + \mathrm{Var}(\hat Y_0^{\mathrm{true}}(x_0))$. From (4.34), we know that $\mathrm{Var}(Y_0^{\mathrm{true}}(x_0)) = \beta^2\sigma_X^2\left(1 - \sigma_X^2/(\sigma_X^2 + H_0)\right) + \tau^2$. Furthermore, the delta method can be used to approximate $\mathrm{Var}(\hat Y_0^{\mathrm{true}}(x_0))$. Specifically, let $\Sigma_W$ denote the asymptotic covariance matrix of $\hat\alpha$, $\hat\beta$, $\hat\mu_X$, $\hat\sigma_X^2$ and $\hat\phi_2$, and let $\gamma$ denote the vector of partial derivatives of $E(Y_0^{\mathrm{true}} \mid X_0 = x_0)$ with respect to $\alpha$, $\beta$, $\mu_X$, $\sigma_X^2$ and $\phi_2$ (see Appendix B). Then:

$$\mathrm{Var}(\hat Y_0^{\mathrm{true}}(x_0)) \approx \gamma^T\Sigma_W\gamma. \tag{4.37}$$

As a result,

$$\mathrm{Var}(W_0) \approx \beta^2\sigma_X^2\left(1 - \frac{\sigma_X^2}{\sigma_X^2 + H_0}\right) + \tau^2 + \gamma^T\Sigma_W\gamma. \tag{4.38}$$

Since $W_0$ is asymptotically normally distributed, an approximate 95% prediction interval for $Y_0^{\mathrm{true}}(x_0)$ is given by:

$$\hat Y_0^{\mathrm{true}}(x_0) \pm 1.96\sqrt{\widehat{\mathrm{Var}}(W_0)}, \tag{4.39}$$

where $\widehat{\mathrm{Var}}(W_0) = \hat\beta^2\hat\sigma_X^2\left(1 - \hat\sigma_X^2/(\hat\sigma_X^2 + \hat H_0)\right) + \hat\tau^2 + \hat\gamma^T\hat\Sigma_W\hat\gamma$, and $\hat\gamma$ and $\hat\Sigma_W$ are the partial derivatives and the asymptotic variance-covariance matrix of the parameter estimators, evaluated at the estimated values.

Figure 4.1 compares the SBRCMB results and the comprehensive results in predicting $Y_0^{\mathrm{true}}$ from $X_0$. Although the regression relationship modeled in the SBRCMB approach is between the two estimated treatment effects, for this purpose we pretend it is between the true treatment effect on the clinical endpoint and the estimated treatment effect on the surrogate endpoint.
The SBRCMB prediction line is $y = -0.02 + 0.55x$, while the prediction line for the comprehensive model is given by (4.35). For a specific illustration in the figure, we fixed $K_0$ at 6 (the median total number of scans among the 40 contrasts in the SBRCMB dataset) and $N_{a0}$ and $N_{c0}$ at 50 (the median number of patients among the 23 placebo and 40 active arms in the SBRCMB dataset). For these values, $\hat\alpha + \hat\beta\hat\mu_X\left(1 - \hat\sigma_X^2/(\hat\sigma_X^2 + \hat H_0)\right) \approx 0$ and $\hat\beta\hat\sigma_X^2/(\hat\sigma_X^2 + \hat H_0) \approx 0.50$, so (4.35) becomes $y = 0.50x$. The points represent the 40 pairs of estimated treatment effects from the SBRCMB dataset.

From Figure 4.1, we can see that for $X$ between -4 and 1 (the range of $X$ in the SBRCMB dataset), the two prediction lines do not differ much: the point predictions of $Y_0^{\mathrm{true}}$ from $X_0$ under the two approaches are close. However, when $X < 0$, the prediction line from the comprehensive approach lies above that from the SBRCMB approach.

Figure 4.1: Regression Prediction Lines: the SBRCMB Approach ($y = -0.02 + 0.55x$) and the Comprehensive Approach with $K_0 = 6$ and $N_{a0} = N_{c0} = 50$ ($y = 0.50x$). [Horizontal axis: estimated treatment effect on the surrogate endpoint; vertical axis: true treatment effect on the clinical endpoint.]

Note that when $X_0 < 0$, the treatment in the new trial shows a beneficial effect on the surrogate endpoint. When $Y_0^{\mathrm{true}} < 0$, the true treatment effect on the clinical endpoint is beneficial, and more negative values of $Y_0^{\mathrm{true}}$ represent greater beneficial effects. So Figure 4.1 implies that, for a future trial with a moderate sample size (for example, 50 patients in each arm) and a total of 6 scans, if the treatment shows a beneficial effect on the surrogate endpoint, the true treatment effect on the clinical endpoint predicted by the SBRCMB approach is always slightly greater than that predicted by the comprehensive approach.
This means that when the true treatment effect on the clinical endpoint is predicted from the estimated treatment effect on the surrogate endpoint (which is subject to estimation error), the SBRCMB approach may slightly overestimate the true treatment effect on the clinical endpoint.

Figure 4.2 shows another comparison between the SBRCMB results and the comprehensive results, this time in predicting $Y_0^{\mathrm{true}}$ from $X_0^{\mathrm{true}}$. We pretend that the SBRCMB approach models the regression relationship between the two true treatment effects; its prediction line is $y = -0.02 + 0.55x$. The prediction line from the comprehensive model is again given by (4.35), but now we let $N_{a0}$ and $N_{c0}$ be infinite, to reflect a future trial that includes enough patients for the observed treatment effect on the surrogate endpoint to estimate the true treatment effect with negligible error. When $N_{a0}$ and $N_{c0}$ are infinite, $\hat H_0 = 0$ and (4.35) becomes $y = 0.08 + 0.62x$.

Figure 4.2: Regression Prediction Lines: the SBRCMB Approach ($y = -0.02 + 0.55x$) and the Comprehensive Approach with $K_0 = 6$ and $N_{a0} = N_{c0} = \infty$ ($y = 0.08 + 0.62x$). [Horizontal axis: true treatment effect on the surrogate endpoint; vertical axis: true treatment effect on the clinical endpoint.]

From Figure 4.2, we see that the two prediction lines intersect at $X_0^{\mathrm{true}} = -1.39$. Note that $\exp(X_0^{\mathrm{true}}) = M_{a0}^{\mathrm{true}}/M_{c0}^{\mathrm{true}}$ and $\exp(-1.39) \approx 0.25$. So $X_0^{\mathrm{true}} = -1.39$ means that the treatment leads to a 75% reduction in the MRI lesion count per patient per scan in the new trial, which is a large beneficial effect. Therefore, when the true treatment effect on the surrogate endpoint is available, the SBRCMB approach may underestimate (overestimate) the true treatment effect on the clinical endpoint if the true treatment effect on the surrogate endpoint is larger (smaller) than this value.
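The crossing point follows from equating the two lines, $a_1 + b_1x = a_2 + b_2x$, so $x = (a_2 - a_1)/(b_1 - b_2)$. A small sketch of that arithmetic: with the unrounded $\hat\alpha = 0.081$ and $\hat\beta = 0.622$ it gives about -1.40 (a reduction of about 75%), close to the -1.39 reported; the small gap presumably reflects rounding of the coefficients, which is our assumption.

```python
import math


def intersection(a1, b1, a2, b2):
    """x at which the lines y = a1 + b1*x and y = a2 + b2*x cross."""
    if b1 == b2:
        raise ValueError("parallel lines do not intersect")
    return (a2 - a1) / (b1 - b2)


# SBRCMB line versus the comprehensive line with N0 = infinity (Table 4.1 values).
x_star = intersection(-0.02, 0.55, 0.081, 0.622)
ratio = math.exp(x_star)          # M_a0^true / M_c0^true at the crossing
reduction = 1.0 - ratio           # proportional reduction in lesion count
```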
We can also compare the point predictions of the two approaches for the 40 contrasts included in the SBRCMB dataset. The SBRCMB approach still uses the prediction line $y = -0.02 + 0.55x$ to predict all of the $Y_0^{\mathrm{true}}$s. But since each contrast has a different total number of scans and different numbers of patients, the point predictions of the $Y_0^{\mathrm{true}}$s from the comprehensive approach no longer lie on a straight line.

Figures 4.3 and 4.4 compare the SBRCMB results and the comprehensive results in predicting $Y_0^{\mathrm{true}}$ from $X_0$ for the 40 contrasts in the SBRCMB dataset. In Figure 4.3, the solid points represent the point predictions for the 40 contrasts from the comprehensive approach, and the transparent points represent the pairs of estimated treatment effects. In Figure 4.4, the point predictions from the comprehensive approach are plotted against the corresponding predictions from the SBRCMB approach.

From Figures 4.3 and 4.4, we can see that the point predictions of the true treatment effect on the clinical endpoint from the two approaches are generally very close. However, when $X_0 < 0$, all the predictions from the comprehensive approach are larger than the corresponding predictions from the SBRCMB approach. So, for those contrasts where the treatments show beneficial effects on the surrogate endpoint, the SBRCMB approach may overestimate the true treatment effects on the clinical endpoint.
Again, this is because none of those trials includes an infinite number of patients, so estimation error exists in the measurement of the treatment effect on the surrogate endpoint. The SBRCMB prediction may be a little more liberal because it fails to take this estimation error into account.

Figure 4.3: Point Predictions for the 40 Contrasts. [Horizontal axis: estimated treatment effect on the surrogate endpoint; vertical axis: true treatment effect on the clinical endpoint; legend: SBRCMB prediction, comprehensive prediction; point A marked.]

Figure 4.4: Comparison of Point Predictions for the 40 Contrasts. [Horizontal axis: SBRCMB prediction; vertical axis: comprehensive prediction; reference line y = x; point A marked.]

Point A in Figures 4.3 and 4.4 clearly shows the effect of estimation error on predicting the true treatment effect on the clinical endpoint. This point deviates substantially from the remaining points. It represents the single-contrast clinical trial with only 10 patients in each arm, so the estimation error in the measured treatment effect on the surrogate endpoint is very large. From (4.35), when $N_{a0}$ and $N_{c0}$ are very small, $\hat H_0$ is large and $\hat\beta\hat\sigma_X^2/(\hat\sigma_X^2 + \hat H_0)$ is much smaller than $\hat\beta$. This is why point A deviates substantially from the rest of the points in the $y$ direction. It means that, with a large estimation error in the measured treatment effect on the surrogate endpoint, a large estimated treatment effect on the surrogate endpoint may not be associated with a large true treatment effect on the clinical endpoint.

We can also compare the prediction intervals of the two approaches.
The prediction interval for $Y_0^{\mathrm{true}}(x_0)$ from the comprehensive approach can be calculated from (4.39), and the prediction interval from the SBRCMB approach can be calculated by the standard regression method. (To do so, we pretend that the SBRCMB approach models the regression relationship between the true treatment effect on the clinical endpoint and the estimated treatment effect on the surrogate endpoint.) Table 4.2 shows the approximate 95% prediction intervals for $\exp(Y_0^{\mathrm{true}}(x_0))$ for the 40 contrasts included in the SBRCMB dataset. Note that $\exp(Y_0^{\mathrm{true}}) = R_{a0}^{\mathrm{true}}/R_{c0}^{\mathrm{true}}$, which represents the true treatment effect on the annualized relapse rate in a future contrast, expressed as the ratio of the active-arm rate to the control-arm rate. Table 4.2 is ordered by the magnitude of $\exp(X_0) = M_{a0}/M_{c0}$, the estimated treatment effect on the surrogate endpoint expressed as a ratio. The first column is the ID of the contrast in the SBRCMB dataset (see Appendix A).

Table 4.2: Comparison of the Approximate 95% Prediction Intervals for $\exp(Y_0^{\mathrm{true}}(x_0))$ for the SBRCMB and Comprehensive Approaches

| Contrast ID | $\exp(X_0)$ | SBRCMB point | SBRCMB interval | Comprehensive point | Comprehensive interval |
|---|---|---|---|---|---|
| 3 | 0.02 | 0.12 | (0.02, 0.60) | 0.29 | (0.12, 0.72) |
| 29 | 0.04 | 0.17 | (0.08, 0.35) | 0.23 | (0.13, 0.42) |
| 20 | 0.08 | 0.24 | (0.12, 0.49) | 0.27 | (0.18, 0.40) |
| 21 | 0.11 | 0.29 | (0.15, 0.58) | 0.32 | (0.22, 0.47) |
| 28 | 0.17 | 0.37 | (0.30, 0.45) | 0.38 | (0.29, 0.50) |
| 15 | 0.19 | 0.39 | (0.14, 1.09) | 0.45 | (0.28, 0.74) |
| 25 | 0.30 | 0.50 | (0.26, 0.95) | 0.53 | (0.38, 0.74) |
| 4 | 0.32 | 0.52 | (0.14, 1.95) | 0.60 | (0.33, 1.12) |
| 14 | 0.34 | 0.54 | (0.19, 1.53) | 0.59 | (0.37, 0.95) |
| 8 | 0.35 | 0.55 | (0.33, 0.91) | 0.58 | (0.43, 0.78) |
| 40 | 0.36 | 0.55 | (0.18, 1.68) | 0.62 | (0.34, 1.12) |
| 1 | 0.37 | 0.56 | (0.21, 1.48) | 0.63 | (0.35, 1.12) |
| 26 | 0.39 | 0.58 | (0.30, 1.12) | 0.61 | (0.43, 0.87) |
| 27 | 0.40 | 0.58 | (0.30, 1.14) | 0.62 | (0.44, 0.88) |
| 2 | 0.41 | 0.59 | (0.22, 1.59) | 0.65 | (0.36, 1.16) |
| 36 | 0.44 | 0.62 | (0.25, 1.51) | 0.66 | (0.44, 0.99) |
| 10 | 0.47 | 0.64 | (0.38, 1.09) | 0.68 | (0.49, 0.94) |
| 24 | 0.47 | 0.64 | (0.34, 1.21) | 0.68 | (0.49, 0.94) |
| 6 | 0.48 | 0.65 | (0.30, 1.40) | 0.69 | (0.34, 1.41) |
| 38 | 0.51 | 0.67 | (0.27, 1.63) | 0.71 | (0.47, 1.05) |
| 7 | 0.58 | 0.72 | (0.43, 1.20) | 0.77 | (0.57, 1.03) |
| 5 | 0.67 | 0.78 | (0.53, 1.16) | 0.80 | (0.49, 1.30) |
| 33 | 0.67 | 0.78 | (0.47, 1.30) | 0.82 | (0.58, 1.16) |
| 18 | 0.69 | 0.79 | (0.52, 1.20) | 0.85 | (0.66, 1.08) |
| 9 | 0.76 | 0.84 | (0.49, 1.44) | 0.89 | (0.64, 1.23) |
| 19 | 0.82 | 0.88 | (0.38, 2.04) | 0.89 | (0.56, 1.40) |
| 30 | 0.88 | 0.91 | (0.55, 1.51) | 0.97 | (0.73, 1.30) |
| 39 | 0.91 | 0.93 | (0.38, 2.26) | 0.95 | (0.63, 1.42) |
| 11 | 0.92 | 0.93 | (0.37, 2.35) | 0.95 | (0.62, 1.45) |
| 23 | 0.96 | 0.95 | (0.75, 1.21) | 1.00 | (0.71, 1.41) |
| 32 | 1.04 | 1.00 | (0.59, 1.70) | 1.04 | (0.72, 1.49) |
| 16 | 1.06 | 1.01 | (0.17, 5.97) | 0.90 | (0.48, 1.68) |
| 37 | 1.11 | 1.03 | (0.42, 2.53) | 1.05 | (0.70, 1.58) |
| 22 | 1.16 | 1.06 | (0.84, 1.35) | 1.11 | (0.78, 1.57) |
| 17 | 1.27 | 1.11 | (0.19, 6.60) | 0.95 | (0.51, 1.79) |
| 13 | 1.35 | 1.15 | (0.45, 2.98) | 1.14 | (0.73, 1.77) |
| 31 | 1.47 | 1.21 | (0.71, 2.07) | 1.29 | (0.94, 1.77) |
| 34 | 1.61 | 1.27 | (0.61, 2.68) | 1.18 | (0.71, 1.96) |
| 35 | 1.69 | 1.31 | (0.62, 2.79) | 1.20 | (0.71, 2.01) |
| 12 | 1.74 | 1.33 | (0.51, 3.45) | 1.29 | (0.82, 2.02) |

From Table 4.2, we find that the prediction intervals from the comprehensive approach are generally shorter than those from the SBRCMB approach (34 out of 40 are shorter), which indicates that the comprehensive approach gives more precise predictions. This can be explained by the estimation error in the measured treatment effect on the clinical endpoint: although we pretend that the SBRCMB approach can be used to predict $Y_0^{\mathrm{true}}$, it actually predicts $Y_0$. Since $Y_0$ is in general more variable than $Y_0^{\mathrm{true}}$, it is not surprising that the SBRCMB prediction intervals tend to be wider.

Figure 4.5 illustrates this. The solid points and solid lines represent the point predictions and the 95% prediction intervals from the SBRCMB approach, while the hollow points and dashed lines represent those from the comprehensive approach.
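The counts quoted for Table 4.2 in the text (34 of the 40 comprehensive intervals shorter; among the 30 contrasts with $\exp(X_0) < 1$, 14 comprehensive versus 7 SBRCMB intervals lying entirely below 1) can be re-derived mechanically from the tabulated entries. A minimal sketch with the table transcribed:

```python
# (contrast ID, exp(X0), SBRCMB interval, comprehensive interval), from Table 4.2
ROWS = [
    (3, 0.02, (0.02, 0.60), (0.12, 0.72)), (29, 0.04, (0.08, 0.35), (0.13, 0.42)),
    (20, 0.08, (0.12, 0.49), (0.18, 0.40)), (21, 0.11, (0.15, 0.58), (0.22, 0.47)),
    (28, 0.17, (0.30, 0.45), (0.29, 0.50)), (15, 0.19, (0.14, 1.09), (0.28, 0.74)),
    (25, 0.30, (0.26, 0.95), (0.38, 0.74)), (4, 0.32, (0.14, 1.95), (0.33, 1.12)),
    (14, 0.34, (0.19, 1.53), (0.37, 0.95)), (8, 0.35, (0.33, 0.91), (0.43, 0.78)),
    (40, 0.36, (0.18, 1.68), (0.34, 1.12)), (1, 0.37, (0.21, 1.48), (0.35, 1.12)),
    (26, 0.39, (0.30, 1.12), (0.43, 0.87)), (27, 0.40, (0.30, 1.14), (0.44, 0.88)),
    (2, 0.41, (0.22, 1.59), (0.36, 1.16)), (36, 0.44, (0.25, 1.51), (0.44, 0.99)),
    (10, 0.47, (0.38, 1.09), (0.49, 0.94)), (24, 0.47, (0.34, 1.21), (0.49, 0.94)),
    (6, 0.48, (0.30, 1.40), (0.34, 1.41)), (38, 0.51, (0.27, 1.63), (0.47, 1.05)),
    (7, 0.58, (0.43, 1.20), (0.57, 1.03)), (5, 0.67, (0.53, 1.16), (0.49, 1.30)),
    (33, 0.67, (0.47, 1.30), (0.58, 1.16)), (18, 0.69, (0.52, 1.20), (0.66, 1.08)),
    (9, 0.76, (0.49, 1.44), (0.64, 1.23)), (19, 0.82, (0.38, 2.04), (0.56, 1.40)),
    (30, 0.88, (0.55, 1.51), (0.73, 1.30)), (39, 0.91, (0.38, 2.26), (0.63, 1.42)),
    (11, 0.92, (0.37, 2.35), (0.62, 1.45)), (23, 0.96, (0.75, 1.21), (0.71, 1.41)),
    (32, 1.04, (0.59, 1.70), (0.72, 1.49)), (16, 1.06, (0.17, 5.97), (0.48, 1.68)),
    (37, 1.11, (0.42, 2.53), (0.70, 1.58)), (22, 1.16, (0.84, 1.35), (0.78, 1.57)),
    (17, 1.27, (0.19, 6.60), (0.51, 1.79)), (13, 1.35, (0.45, 2.98), (0.73, 1.77)),
    (31, 1.47, (0.71, 2.07), (0.94, 1.77)), (34, 1.61, (0.61, 2.68), (0.71, 1.96)),
    (35, 1.69, (0.62, 2.79), (0.71, 2.01)), (12, 1.74, (0.51, 3.45), (0.82, 2.02)),
]


def length(interval):
    """Length of a (lower, upper) interval."""
    lo, hi = interval
    return hi - lo


# Number of contrasts whose comprehensive interval is the shorter one.
shorter = sum(length(comp) < length(sb) for _, _, sb, comp in ROWS)

# Beneficial observed effect on the surrogate: exp(X0) < 1.
beneficial = [row for row in ROWS if row[1] < 1]
# Conclusive prediction of benefit: the whole interval lies below 1.
comp_conclusive = sum(comp[1] < 1 for _, _, _, comp in beneficial)
sb_conclusive = sum(sb[1] < 1 for _, _, sb, _ in beneficial)
```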
It is clear from Figure 4.5 that most of the prediction intervals from the comprehensive approach are shorter than those from the SBRCMB approach.

The second column of Table 4.2 gives the estimated treatment effect on the surrogate endpoint as a ratio. If $X_0 < 0$, or equivalently $\exp(X_0) = M_{a0}/M_{c0} < 1$, the treatment showed a beneficial effect on the surrogate endpoint in the contrast. Among the 40 contrasts, there are 30 with $M_{a0}/M_{c0} < 1$. For those contrasts, we expect to see beneficial true treatment effects on the clinical endpoint; that is, $\exp(Y_0^{\mathrm{true}}) = R_{a0}^{\mathrm{true}}/R_{c0}^{\mathrm{true}} < 1$. However, based on the comprehensive approach, only 14 of those 30 contrasts have 95% prediction intervals that exclude 1, so for the other 16 contrasts the prediction of the true treatment effect on the clinical endpoint is inconclusive. The SBRCMB results are even less definitive: only 7 contrasts have 95% prediction intervals that exclude 1. In the next section, we study how the magnitude of the estimated treatment effect on the surrogate endpoint and the number of patients influence the prediction interval for the true treatment effect on the clinical endpoint.

Figure 4.5: Comparison of the Approximate 95% Prediction Intervals for $\exp(Y_0^{\mathrm{true}}(x_0))$ for the SBRCMB and Comprehensive Approaches. [Horizontal axis: $\exp(X_0)$; legend: SBRCMB point prediction, comprehensive point prediction, SBRCMB 95% prediction interval, comprehensive 95% prediction interval.]

4.5 Assessment of the Estimated Surrogacy Relationship in Practice

For the MRI lesion count per patient per scan to be a useful surrogate endpoint in practice, it must provide sufficiently precise information on the true treatment effect on the annualized relapse rate.
Table 4.3 investigates how the magnitude of $X_0$ (or $\exp(X_0)$) and the sample sizes $N_{a0}$ and $N_{c0}$ of the future contrast influence the prediction interval for $Y_0^{\mathrm{true}}(x_0)$ (or $\exp(Y_0^{\mathrm{true}}(x_0))$) calculated from the comprehensive approach. When calculating the prediction intervals, we fix $K_0 = 6$. We set $N_{a0} = N_{c0} = N_0$ and vary $N_0$ from 10 to 600 (the number of patients per arm in the SBRCMB dataset ranges from 8 to 627). We also vary $\exp(X_0)$ from 0.02 to 1.8 (the values of $\exp(X_0)$ in the SBRCMB dataset range from 0.024 to 1.742). The entries in Table 4.3 are the point predictions and approximate 95% prediction intervals for $\exp(Y_0^{\mathrm{true}}(x_0))$.

From Table 4.3, we first note that within each column (i.e., for a given value of the estimated treatment effect on the surrogate endpoint), the approximate 95% prediction interval for the true treatment effect on the clinical endpoint becomes shorter as $N_0$ increases. This is expected, since a larger $N_0$ means more information from the new contrast and hence a more precise prediction. The last row of Table 4.3 represents a new trial with an infinite number of patients, in which case the estimation error in the measured treatment effect on the surrogate endpoint becomes negligible. However, the prediction interval for $\exp(Y_0^{\mathrm{true}}(x_0))$ does not shrink to a point: even if we knew the true treatment effect on the surrogate endpoint, we still could not predict the true treatment effect on the clinical endpoint without error. From Table 4.1, $\hat\tau^2 \approx 0$, which suggests a nearly perfect linear relationship between the true treatment effects; with $H_0 = 0$ as well, (4.38) reduces essentially to $\hat\gamma^T\hat\Sigma_W\hat\gamma$. Therefore, the uncertainty in the last row of Table 4.3 arises because the surrogacy relationship itself is not estimated precisely enough (parameters such as $\alpha$ and $\beta$ are not estimated precisely enough).
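A sketch of the interval (4.36) to (4.39) makes this point visible: with $H_0 = 0$ and $\tau^2 = 0$, the conditional-variance term in (4.38) vanishes and the width comes entirely from $\gamma^T\Sigma_W\gamma$. Writing $H_0 = \phi_2 c$, with $c$ collecting the known design factors, the gradient below is our own derivation of $\gamma$ (Appendix B is not reproduced here). The $\Sigma_W$ in the usage lines is hypothetical: it places the squared normal-theory standard errors from Table 4.1 on the diagonal and ignores the correlations in (4.29), so its output is not expected to match Table 4.3.

```python
import math


def conditional_mean(theta, x0, c):
    """E(Y0^true | X0 = x0) from (4.33), writing H0 = phi2 * c."""
    alpha, beta, mu_x, sigma2_x, phi2 = theta
    r = sigma2_x / (sigma2_x + phi2 * c)
    return alpha + beta * mu_x * (1 - r) + beta * r * x0


def mean_gradient(theta, x0, c):
    """Partial derivatives of conditional_mean w.r.t. (alpha, beta, mu_x, sigma2_x, phi2)."""
    alpha, beta, mu_x, sigma2_x, phi2 = theta
    H = phi2 * c
    r = sigma2_x / (sigma2_x + H)
    d = x0 - mu_x                 # the mean equals alpha + beta*mu_x + beta*r*d
    denom = (sigma2_x + H) ** 2
    return [1.0,
            mu_x + r * d,
            beta * (1 - r),
            beta * d * H / denom,
            -beta * d * sigma2_x * c / denom]


def prediction_interval(theta, tau2, x0, c, sigma_w, z=1.96):
    """Point prediction and approximate 95% interval (4.39) for Y0^true(x0)."""
    _, beta, _, sigma2_x, phi2 = theta
    r = sigma2_x / (sigma2_x + phi2 * c)
    g = mean_gradient(theta, x0, c)
    # Delta-method variance of the point prediction, g^T Sigma_W g, as in (4.37).
    param_var = sum(g[i] * sigma_w[i][j] * g[j] for i in range(5) for j in range(5))
    var_w = beta**2 * sigma2_x * (1 - r) + tau2 + param_var      # (4.38)
    point = conditional_mean(theta, x0, c)
    half = z * math.sqrt(var_w)
    return point, point - half, point + half


theta_hat = (0.081, 0.622, -0.713, 0.521, 37.427)   # Table 4.1 estimates
ses = (0.084, 0.074, 0.156, 0.167, 19.932)          # Table 4.1 normal SEs
# Hypothetical Sigma_W: squared SEs on the diagonal, correlations (4.29) ignored.
sigma_w = [[ses[i] ** 2 if i == j else 0.0 for j in range(5)] for i in range(5)]

x0 = math.log(0.5)        # a 50% observed reduction on the surrogate endpoint
# c = 0 mimics N0 = infinity; tau2 = 0 mimics tau2_hat ~ 0.
pt, lo, hi = prediction_interval(theta_hat, 0.0, x0, 0.0, sigma_w)
ratio_interval = (math.exp(lo), math.exp(hi))   # back on the exp(Y0^true) scale
```

Even here the interval has positive width, purely from the parameter-estimation term; a full $\hat\Sigma_W$ with its off-diagonal entries would change the width but not this qualitative point.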
Table 4.3: Influence of the Sample Size $N_0$ and the Magnitude of the Estimated Treatment Effect on the Surrogate Endpoint on the 95% Prediction Intervals for the True Treatment Effect on the Clinical Endpoint for Trials with $K_0 = 6$ Scans per Patient. The Entries are the Point Predictions and Approximate 95% Prediction Intervals for $\exp(Y_0^{\mathrm{true}}(x_0))$.

| $N_0$ (rows) / $\exp(X_0)$ (columns) | 0.02 | 0.1 | 0.2 | 0.5 | 0.8 | 0.9 | 1.0 | 1.5 | 1.8 |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 0.28 (0.11, 0.70) | 0.44 (0.21, 0.93) | 0.54 (0.27, 1.07) | 0.70 (0.36, 1.35) | 0.80 (0.41, 1.55) | 0.83 (0.43, 1.61) | 0.85 (0.44, 1.66) | 0.96 (0.49, 1.89) | 1.01 (0.51, 2.02) |
| 20 | 0.20 (0.09, 0.44) | 0.37 (0.20, 0.70) | 0.49 (0.27, 0.87) | 0.70 (0.41, 1.21) | 0.84 (0.49, 1.46) | 0.88 (0.51, 1.53) | 0.92 (0.53, 1.60) | 1.08 (0.61, 1.91) | 1.16 (0.65, 2.07) |
| 50 | 0.14 (0.08, 0.24) | 0.31 (0.20, 0.50) | 0.44 (0.29, 0.67) | 0.70 (0.47, 1.05) | 0.89 (0.60, 1.33) | 0.94 (0.63, 1.41) | 1.00 (0.66, 1.49) | 1.22 (0.81, 1.86) | 1.34 (0.88, 2.05) |
| 100 | 0.12 (0.07, 0.19) | 0.29 (0.20, 0.41) | 0.42 (0.31, 0.58) | 0.70 (0.52, 0.95) | 0.91 (0.67, 1.25) | 0.98 (0.71, 1.33) | 1.03 (0.76, 1.42) | 1.30 (0.93, 1.80) | 1.44 (1.02, 2.02) |
| 200 | 0.11 (0.07, 0.16) | 0.27 (0.20, 0.36) | 0.41 (0.32, 0.53) | 0.70 (0.56, 0.89) | 0.93 (0.73, 1.18) | 1.00 (0.78, 1.27) | 1.06 (0.82, 1.36) | 1.34 (1.02, 1.77) | 1.50 (1.12, 2.00) |
| 600 | 0.10 (0.06, 0.15) | 0.26 (0.20, 0.34) | 0.40 (0.33, 0.49) | 0.70 (0.60, 0.83) | 0.94 (0.78, 1.13) | 1.01 (0.83, 1.22) | 1.07 (0.88, 1.31) | 1.38 (1.09, 1.74) | 1.54 (1.19, 1.99) |
| ∞ | 0.10 (0.06, 0.15) | 0.26 (0.20, 0.33) | 0.40 (0.34, 0.46) | 0.70 (0.63, 0.78) | 0.94 (0.82, 1.09) | 1.02 (0.87, 1.18) | 1.08 (0.92, 1.23) | 1.40 (1.13, 1.73) | 1.56 (1.23, 1.98) |

Recall that $\exp(X_0) = M_{a0}/M_{c0}$ and $\exp(Y_0^{\mathrm{true}}) = R_{a0}^{\mathrm{true}}/R_{c0}^{\mathrm{true}}$. So, when a new treatment is efficacious, we hope to observe $\exp(X_0) < 1$ and expect $\exp(Y_0^{\mathrm{true}}) < 1$ (i.e., the upper bound of the approximate 95% prediction interval to be less than 1).
On the other hand, when a new treatment has a harmful effect, we hope to observe $\exp(X_0) > 1$ and expect $\exp(Y_0^{\mathrm{true}}) > 1$ (i.e., the lower bound of the approximate 95% prediction interval to be greater than 1).

The last two columns of Table 4.3 ($\exp(X_0) = 1.5$ and 1.8) represent the situation in which the treatment shows a medium or large harmful effect on the surrogate endpoint (the treatment is 50% or 80% worse than the control in terms of the observed surrogate endpoint), so we hope to see a lower bound greater than 1. This happens only when $N_0 \ge 200$ for $\exp(X_0) = 1.5$ and when $N_0 \ge 100$ for $\exp(X_0) = 1.8$. So, for a harmful observed treatment effect on the surrogate endpoint to imply a harmful true treatment effect on the clinical endpoint, a new contrast needs to include a large number of patients. For contrasts with a medium or small number of patients, or with a less extreme observed treatment effect on the surrogate endpoint, conclusive predictions of the true treatment effect on the clinical endpoint will not be possible.

The columns with $\exp(X_0) = 0.8$ and 0.9 represent the situation in which $\exp(X_0)$ is close to 1; that is, the estimated treatment effect on the surrogate endpoint is beneficial but small. All the prediction intervals in these two columns contain 1, even when $N_0$ is infinite. This suggests that when a new treatment shows only a small beneficial effect on the surrogate endpoint, we cannot determine from the estimated surrogacy relationship whether the treatment really has an effect on the clinical endpoint. In other words, the estimated surrogacy relationship is not very helpful in this situation.

The column with $\exp(X_0) = 0.5$ represents a medium beneficial estimated treatment effect on the surrogate endpoint (a 50% reduction in the observed surrogate endpoint). However, when $N_0 < 100$, the prediction intervals all contain 1.
Hence, a new treatment showing a medium beneficial effect on the surrogate endpoint can be concluded to have an effect on the clinical endpoint only if the new trial includes enough patients.

The first three columns of Table 4.3 ($\exp(X_0)$ = 0.02, 0.1 and 0.2) represent the situation when $\exp(X_0)$ is close to 0; that is, the estimated treatment effect on the surrogate endpoint is beneficial and very large. When $N_0 \ge 20$, all the prediction intervals in these columns exclude 1. This means we can be approximately 95% confident that an observed beneficial treatment effect of this size on the surrogate endpoint corresponds to a true beneficial treatment effect on the clinical endpoint.

How precisely we can determine the magnitude of the true treatment effect on the clinical endpoint is also of interest; this precision is indicated by the length of the prediction interval. Within these three columns, when $N_0 \le 50$, all the prediction intervals have length at least 0.3, except when $N_0 = 50$ and $\exp(X_0) = 0.02$. As $N_0 = 50$ is a typical size for a phase 2 clinical trial in RRMS, this suggests that the prediction of the true treatment effect on the clinical endpoint may not be very precise for a phase 2 clinical trial of small or medium size. On the other hand, when $N_0 \ge 100$, all the prediction intervals in these columns are shorter than 0.25, except when $N_0 = 100$ and $\exp(X_0) = 0.2$. This indicates that the prediction is relatively precise when a trial has a large number of patients.

We also investigate the relationship between $N_0$ and the value of $\exp(X_0)$ at which the prediction interval for $\exp(Y^{true}_0)$ starts to exclude 1 (with $K_0 = 6$ fixed). Burzykowski and Buyse [18] introduced a similar concept called the “surrogate threshold effect”. This value is the least extreme value of the estimated treatment effect on the surrogate endpoint from which we can obtain a conclusive prediction for the true treatment effect on the clinical endpoint.
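
The threshold just described can be approximated crudely from Table 4.3 itself. The sketch below is our own illustration, not the procedure behind Figures 4.6 and 4.7: it assumes linear interpolation on the ratio scale between neighbouring tabulated columns and finds where the upper bound (beneficial side) or the lower bound (harmful side) of the prediction interval crosses 1. The names `crossing`, `percent_better` and `percent_worse` are hypothetical.

```python
import math

# Interval bounds transcribed from Table 4.3 (K0 = 6). Upper bounds decide the
# beneficial side, lower bounds the harmful side; both increase with exp(X0).
INF = math.inf
BENEFIT_X = (0.02, 0.1, 0.2, 0.5, 0.8, 0.9, 1.0)
UPPER = {
    10:  (0.70, 0.93, 1.07, 1.35, 1.55, 1.61, 1.66),
    20:  (0.44, 0.70, 0.87, 1.21, 1.46, 1.53, 1.60),
    50:  (0.24, 0.50, 0.67, 1.05, 1.33, 1.41, 1.49),
    100: (0.19, 0.41, 0.58, 0.95, 1.25, 1.33, 1.42),
    200: (0.16, 0.36, 0.53, 0.89, 1.18, 1.27, 1.36),
    600: (0.15, 0.34, 0.49, 0.83, 1.13, 1.22, 1.31),
    INF: (0.15, 0.33, 0.46, 0.78, 1.09, 1.18, 1.23),
}
HARM_X = (1.0, 1.5, 1.8)
LOWER = {
    10:  (0.44, 0.49, 0.51),
    20:  (0.53, 0.61, 0.65),
    50:  (0.66, 0.81, 0.88),
    100: (0.76, 0.93, 1.02),
    200: (0.82, 1.02, 1.12),
    600: (0.88, 1.09, 1.19),
    INF: (0.92, 1.13, 1.23),
}


def crossing(xs, bounds):
    """exp(X0) at which an increasing interval bound reaches 1, by linear
    interpolation between neighbouring columns; None if outside the table."""
    for j in range(1, len(xs)):
        lo, hi = bounds[j - 1], bounds[j]
        if lo < 1 <= hi:
            return xs[j - 1] + (1 - lo) * (xs[j] - xs[j - 1]) / (hi - lo)
    return None


def percent_better(threshold):
    """Required improvement over control in %, e.g. 0.46 -> about 54."""
    return 100 * (1 - threshold)


def percent_worse(threshold):
    """Required worsening relative to control in %, e.g. 2.39 -> about 139."""
    return 100 * (threshold - 1)


def fmt(value):
    return "beyond table" if value is None else f"{value:.2f}"


for n0 in UPPER:
    print(n0, fmt(crossing(BENEFIT_X, UPPER[n0])), fmt(crossing(HARM_X, LOWER[n0])))
```

For $N_0$ = 50, 100, 200, 600 and ∞ the interpolated beneficial thresholds (about 0.46, 0.55, 0.61, 0.67 and 0.71) agree to two decimals with the values quoted in the discussion of Figure 4.6, and the harmful-side value at $N_0 = \infty$ (about 1.19) matches the asymptote of Figure 4.7. For $N_0$ = 10 and 20 the interpolation gives about 0.15 and 0.31 rather than 0.14 and 0.30, and for $N_0 \le 50$ the harmful-side threshold lies beyond the last tabulated column, so the model itself is needed there.
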
In Figure 4.6 and Figure 4.7, we plot the “threshold value” of $\exp(X_0)$ against $N_0$. Figure 4.6 shows the result when a treatment shows a beneficial effect on the surrogate endpoint ($X_0 < 0$), and Figure 4.7 shows the result when a treatment shows a negative effect on the surrogate endpoint ($X_0 > 0$).

From Figure 4.6, we see that when the treatment shows a beneficial effect on the surrogate endpoint, the threshold value increases as $N_0$ increases. A larger threshold value represents a smaller estimated treatment effect on the surrogate endpoint. So, for a contrast with a large number of patients, even if we observe only a relatively small treatment effect on the surrogate endpoint, we can still conclude that the treatment has a beneficial effect on the clinical endpoint. The threshold value for $N_0 = 50$ is $\exp(X_0) = 0.46$, which means that, to conclude that a new treatment has a beneficial effect on the clinical endpoint for a contrast with 50 patients in each arm, the treatment has to be observed to be at least $100\% - 46\% = 54\%$ better than the control on the surrogate endpoint. Similarly, for $N_0$ = 10, 20, 100, 200 and 600, the threshold values are 0.14, 0.30, 0.55, 0.61 and 0.67. The asymptote of the curve is 0.71, the threshold value obtained when $N_0 = \infty$. So, when we predict the true treatment effect on the clinical endpoint from the estimated surrogacy relationship, we require the new treatment to be at least 29% better than the control on the surrogate endpoint in order to conclude that there is a true beneficial treatment effect on the clinical endpoint.

From Figure 4.7, we see that when the treatment shows a negative effect on the surrogate endpoint, the threshold value decreases as $N_0$ increases. Figure 4.7 can be interpreted in the same way as Figure 4.6.
For example, in Figure 4.7 the threshold value for $N_0 = 50$ is 2.39, which means that, to conclude that a new treatment has a negative effect on the clinical endpoint for a contrast with 50 patients in each arm, the treatment has to be observed to be at least 139% worse than the control on the surrogate endpoint. The asymptote here is 1.19. So, when we predict the true treatment effect on the clinical endpoint from the estimated surrogacy relationship, we require the new treatment to be at least 19% worse than the control on the surrogate endpoint in order to conclude that there is a true negative treatment effect on the clinical endpoint.

Figure 4.6: Threshold Value of $\exp(X_0)$ versus Sample Size $N_0$ when a Beneficial Treatment Effect is Observed on the Surrogate Endpoint. [Plot of the minimum estimated treatment effect on MRI (y-axis) against the number of patients in the placebo/active arm (x-axis, 0 to 600), with a reference line at $\exp(X_0) = 0.71$.]

Figure 4.7: Threshold Value of $\exp(X_0)$ versus Sample Size $N_0$ when a Negative Treatment Effect is Observed on the Surrogate Endpoint. [Plot of the minimum estimated treatment effect on MRI (y-axis) against the number of patients in the placebo/active arm (x-axis, 0 to 600), with a reference line at $\exp(X_0) = 1.19$.]

In conclusion, the estimated surrogacy relationship is useful in predicting the true treatment effect on the clinical endpoint when the treatment shows a large effect on the surrogate endpoint and the number of patients in the contrast is large (e.g., $\exp(X_0) = 0.1$ and $N_0 = 100$). However, when the treatment shows a moderate beneficial effect on the surrogate endpoint (e.g., $\exp(X_0) = 0.5$), the prediction is not very precise (the prediction interval is wide). When the treatment shows only a small beneficial effect on the surrogate endpoint ($\exp(X_0) > 0.71$), using the estimated surrogacy relationship will lead to an inconclusive result for the true treatment effect on the clinical endpoint.
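
The precision statements made earlier about the three large-effect columns of Table 4.3, and the contrast drawn in this conclusion between $\exp(X_0) = 0.1$ and $\exp(X_0) = 0.5$ at $N_0 = 100$, both come down to interval lengths. A minimal sketch that recomputes them from transcribed table entries (the function name `lengths` is hypothetical, and only the columns needed here are copied):

```python
import math

# Approximate 95% prediction intervals for exp(Y_true_0) from Table 4.3 (K0 = 6),
# for exp(X0) = 0.02, 0.1, 0.2 and 0.5. Keys are N0.
INF = math.inf
COLUMNS = (0.02, 0.1, 0.2, 0.5)
INTERVALS = {
    10:  [(0.11, 0.70), (0.21, 0.93), (0.27, 1.07), (0.36, 1.35)],
    20:  [(0.09, 0.44), (0.20, 0.70), (0.27, 0.87), (0.41, 1.21)],
    50:  [(0.08, 0.24), (0.20, 0.50), (0.29, 0.67), (0.47, 1.05)],
    100: [(0.07, 0.19), (0.20, 0.41), (0.31, 0.58), (0.52, 0.95)],
    200: [(0.07, 0.16), (0.20, 0.36), (0.32, 0.53), (0.56, 0.89)],
    600: [(0.06, 0.15), (0.20, 0.34), (0.33, 0.49), (0.60, 0.83)],
    INF: [(0.06, 0.15), (0.20, 0.33), (0.34, 0.46), (0.63, 0.78)],
}


def lengths(n0):
    """Interval lengths (upper - lower) for one N0, rounded to the table's 2 decimals."""
    return {x: round(hi - lo, 2) for x, (lo, hi) in zip(COLUMNS, INTERVALS[n0])}


large_effect = COLUMNS[:3]  # exp(X0) = 0.02, 0.1, 0.2
# Cells that break "length >= 0.3 when N0 <= 50" in the large-effect columns.
wide_exceptions = [(n0, x) for n0 in (10, 20, 50) for x in large_effect
                   if lengths(n0)[x] < 0.3]
# Cells that break "length < 0.25 when N0 >= 100" in the large-effect columns.
narrow_exceptions = [(n0, x) for n0 in (100, 200, 600, INF) for x in large_effect
                     if lengths(n0)[x] >= 0.25]

print(wide_exceptions, narrow_exceptions)
print(lengths(100)[0.1], lengths(100)[0.5])
```

It finds only the two exceptions named in the text, $(N_0, \exp(X_0)) = (50, 0.02)$ and $(100, 0.2)$, and gives lengths of 0.21 and 0.43 for $\exp(X_0) = 0.1$ and $0.5$ at $N_0 = 100$, which is the sense in which the second prediction is "not very precise".
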
From (4.30), we know that the true surrogacy relationship may be very good or nearly perfect. Nevertheless, the surrogate endpoint may not be very useful in predicting the true treatment effect on the clinical endpoint unless the treatment shows a large effect on the surrogate endpoint. Furthermore, even if a new trial includes enough patients for the treatment effect on the surrogate endpoint to be measured perfectly, we still cannot predict the true treatment effect on the clinical endpoint without error. This may be explained by the limited number of trials in the SBRCMB dataset: with only 23 trials, the true surrogacy relationship may not be estimated precisely, so predictions based on the estimated relationship may not be very precise.

Chapter 5 Conclusions and Discussion

In a clinical trial, a surrogate endpoint is used as a substitute for the clinical endpoint to assess the treatment effect. Using a surrogate endpoint instead of the clinical endpoint can shorten a clinical trial or reduce the number of patients it needs, and therefore reduce its cost. However, before a potential surrogate endpoint can be formally employed in practice, it must be validated. Use of an unvalidated surrogate endpoint can lead to an incorrect conclusion about the treatment effect, so that future use of the treatment may be ineffective or even harmful to patients.

A potential surrogate endpoint can be validated in a single clinical trial, or in multiple clinical trials if those trials study the same or similar treatments. When the validation is carried out in multiple trials, it can be based on the summary information of each trial or on the individual patient data, depending on whether individual patient level data is available.
When individual patient level data is not available, we lose the possibility of examining how closely a surrogate is related to the clinical endpoint in individual patients, but retain the ability to evaluate the relationship between the treatment effects on the surrogate and the clinical endpoints.

In RRMS clinical trials, changes in MS brain lesion patterns determined by MRI reflect the underlying MS disease pathology and hence may be the best candidate for a surrogate endpoint. In this report, we studied whether the MRI lesion count per patient per scan can serve as a surrogate endpoint for the annualized relapse rate, which is the most commonly used clinical endpoint for RRMS clinical trials. The SBRCMB dataset only includes summary information from 23 clinical trials. Two different approaches (the SBRCMB approach and the comprehensive approach) are applied to the SBRCMB dataset to assess this potential surrogacy relationship.

The SBRCMB approach discussed in Chapter 3 uses simple linear regression with weighted least squares estimation, where the response and the explanatory variables are the estimated treatment effects on the clinical and the surrogate endpoints from each contrast, and the weights are chosen to account for the influence of different numbers of patients and different durations of contrasts. However, this approach treats the estimated treatment effects as the true treatment effects (it does not take the estimation errors into account) and ignores the correlation structure among contrasts from the same trial.

The comprehensive approach discussed in Chapter 4 assumes a multivariate normal distribution for the true treatment effects to take into account the correlation structure among the contrasts from the same trial, and develops the conditional distribution of the estimated treatment effects given the true endpoints. The approximate marginal moments of the estimated treatment effects are then determined.
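
The SBRCMB fit described above is an ordinary weighted least-squares line through the per-contrast estimated effects. A minimal sketch, assuming the effects are log ratios of active to control values (consistent with $\exp(X_0) = M_{a0}/M_{c0}$ in Chapter 4) and that the weights are supplied by the user, for example the SBRCMB weight column of Appendix A. The names `wls_fit` and `log_ratio` are hypothetical, and the sketch inherits the limitations just noted: estimated effects are treated as true, and within-trial correlation is ignored.

```python
import math


def log_ratio(active, control):
    """Estimated treatment effect on the log scale, e.g. X = log(M_a / M_c)."""
    return math.log(active / control)


def wls_fit(x, y, w):
    """Weighted least-squares fit of y = a + b * x.

    Returns (a, b, r2), where r2 is the weighted R^2, i.e. the squared
    weighted correlation between x and y.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b, sxy ** 2 / (sxx * syy)


# Contrasts 20, 21 and 28 of Appendix A (active vs control MRI and ARR, with the
# SBRCMB weights). Three points only illustrate the call; they are not a fit.
x = [log_ratio(0.13, 1.62), log_ratio(0.18, 1.62), log_ratio(0.95, 5.50)]
y = [log_ratio(0.09, 0.51), log_ratio(0.22, 0.51), log_ratio(0.23, 0.73)]
w = [74, 77, 1332]
print(wls_fit(x, y, w))
```

For a weighted fit with an intercept, the squared weighted correlation equals $1 - SSE_w/SST_w$, which is why it is returned as the weighted $R^2$. Whether feeding all 40 contrasts of Appendix A reproduces the reported slope of 0.55 and weighted $R^2$ of 0.80 exactly depends on the details of the SBRCMB weighting and is not guaranteed by this sketch.
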
To estimate the parameters related to the surrogacy relationship, we use the normal estimating equations. The $\hat\beta$ from the comprehensive approach is 0.62, which is larger than the 0.55 from the SBRCMB approach, so the SBRCMB approach may underestimate the association between the true treatment effects. Neither of the $\hat\alpha$s from the two approaches is significantly different from 0. This is consistent with a good surrogacy relationship, since there is no strong indication that part of the true treatment effect on the annualized relapse rate remains unexplained by the true treatment effect on the MRI lesion count per patient per scan. The SBRCMB approach obtains a weighted $R^2 = 0.80$, and the comprehensive approach obtains $\hat{R}^2_{trial} \approx 1$. Both indicate a good surrogacy relationship. For the comprehensive approach, $\hat{R}^2_{trial} \approx 1$ is equivalent to $\hat\tau^2 \approx 0$ (for $\beta \ne 0$, $R^2_{trial} = \beta^2\sigma^2_X/(\beta^2\sigma^2_X + \tau^2)$ tends to 1 exactly as $\tau^2$ tends to 0), which indicates a negligible estimated conditional variance of the true treatment effect on the annualized relapse rate given the true treatment effect on the MRI lesion count per patient per scan. Under the assumptions of the comprehensive approach, the Prentice definition of a surrogate endpoint requires that $\alpha = 0$ and $\tau = 0$. So, the MRI lesion count per patient per scan appears to be a very good surrogate endpoint for the annualized relapse rate.

To assess how good this estimated surrogacy relationship is in practice, we predict the true treatment effect on the clinical endpoint for the 40 contrasts included in the SBRCMB dataset. The point predictions from the two approaches are very close, but those from the comprehensive approach are slightly larger than those from the SBRCMB approach for most contrasts. So, for trials that showed beneficial treatment effects on the surrogate endpoint, the SBRCMB approach tends to predict slightly larger treatment effects than the comprehensive approach. The interval predictions from the two approaches, however, are quite different.
The prediction interval from the comprehensive approach is generally shorter (34 of the 40 intervals are shorter), which indicates that the comprehensive approach gives more precise predictions.

For the comprehensive approach, we also study how the number of patients per arm and the value of the estimated treatment effect on the surrogate endpoint affect the prediction interval for the true treatment effect on the clinical endpoint. For a new contrast with an infinite number of patients in each arm (i.e., the estimation error in the measurement of the treatment effect on the surrogate endpoint is negligible), we require the treatment to be observed to be at least 29% better or 19% worse than the control on the surrogate endpoint in order to avoid an inconclusive prediction for the true treatment effect on the clinical endpoint. For a new contrast with a limited number of patients in each arm, we require the treatment to show more extreme effects. For a typical phase 2 clinical trial in RRMS with 50 patients in each arm and 6 scans for each patient, we require the treatment to be at least 54% better or 139% worse. Among the 30 contrasts in the SBRCMB dataset where the treatments show beneficial effects on the surrogate endpoint, 20 show treatment effects greater than 54%, while among the 10 contrasts where the treatments show negative effects on the surrogate endpoint, only 4 treatments are 139% or more worse than the control. So, the estimated surrogacy relationship could be useful for prediction when a treatment shows a beneficial effect on the surrogate endpoint, but may not be useful in the opposite case. In addition, when the number of patients per arm is around 50, the prediction interval is wide and does not yield a precise prediction unless the treatment shows a very large effect on the surrogate endpoint (e.g., 90%).

In conclusion, the comprehensive approach shows that the underlying surrogacy relationship may be very good.
In a typical phase 2 trial with around 50 patients in each arm and 6 scans for each patient, the estimated surrogacy relationship can give a precise prediction of the true treatment effect on the clinical endpoint when the treatment displays a large effect on the surrogate endpoint. However, when the treatment displays only a modest or small effect on the surrogate endpoint, the prediction may be inconclusive or not precise enough. The reason for this may be the limited number of trials in the SBRCMB dataset: the parameters of the surrogacy relationship may not be estimated precisely enough, which leads to relatively wide prediction intervals. To use the surrogacy relationship for prediction in practice, information from more trials may be needed to estimate it more precisely.

The comprehensive approach we developed is in the spirit of Daniels and Hughes [2] (DH) and Korn et al. [3] (KAM). Both construct models to assess surrogacy relationships using summary results from multiple clinical trials, and both use multivariate normal distributions for the true treatment effects to allow for correlated contrasts. However, DH starts with assumptions about the surrogacy relationship between the true treatment effects directly, while KAM starts with assumptions about the true endpoints, where the influence of the true surrogate endpoint on the true clinical endpoint is assumed to be the same regardless of the presence of the treatment. Building the model from the endpoints requires a more detailed specification, and we think the KAM assumptions may not be very appropriate in practice, so we started with assumptions about the true treatment effects. On the other hand, both papers assume that the errors in estimating the true treatment effects are independent of the true treatment effects.
In contrast, we assume they are dependent, with large true treatment effects associated with small estimation errors. We think this dependence assumption is more reasonable in practice. However, because we make no assumptions about the true endpoints and allow the estimation errors to be dependent, it is difficult to obtain the marginal distribution of the estimated treatment effects in our model. If a reasonable assumption on the distribution of the true endpoints can be found, the marginal distribution can be obtained, and the surrogacy relationship could be re-estimated using the actual likelihood rather than the “approximated” likelihood. Furthermore, DH adopt a Bayesian approach to estimate the surrogacy relationship. By choosing appropriate priors for the parameters, we could also use a Bayesian approach to estimate the surrogacy relationship and compare the results with those obtained in this study.

The SBRCMB dataset contains only summary information from each trial, not individual patient information. If individual patient information were available, one could re-analyze the surrogacy relationship using the individual patient level data and compare the resulting estimated surrogacy relationship with the one from this study. In principle, the surrogacy relationship estimated from a model with individual patient level data should be more precisely determined, since that model uses more information. However, if the two results are close, one may favor the model based on summary results, because summary results are much easier to collect than individual patient data from each trial, and estimating a model with only summary results may be much less computationally intensive. Nevertheless, individual patient information, when available, allows one to assess how closely the surrogate endpoint is related to the clinical endpoint (e.g., $R_{ind}$ from Buyse et al.
[13]), which is useful for patient management.

Bibliography

[1] M. P. Sormani, L. Bonzano, L. Roccatagliata, G. R. Cutter, G. L. Mancardi, and P. Bruzzi. Magnetic resonance imaging as a potential surrogate for relapses in multiple sclerosis: A meta-analytic approach. Annals of Neurology, 65:268–275, 2009.
[2] M. J. Daniels and M. D. Hughes. Meta-analysis for the evaluation of potential surrogate markers. Statistics in Medicine, 16:1965–1982, 1997.
[3] E. L. Korn, P. S. Albert, and L. M. McShane. Assessing surrogates as trial endpoints using mixed models. Statistics in Medicine, 24:163–182, 2005.
[4] T. Burzykowski, G. Molenberghs, and M. Buyse. The Evaluation of Surrogate Endpoints. Springer, New York, 2005.
[5] R. L. Prentice. Surrogate endpoints in clinical trials: Definition and operational criteria. Statistics in Medicine, 8:431–440, 1989.
[6] H. F. McFarland, F. Barkhof, J. Antel, and D. H. Miller. The role of MRI as a surrogate outcome measure in multiple sclerosis. Multiple Sclerosis, 8:40–51, 2002.
[7] T. R. Fleming and D. L. DeMets. Surrogate endpoints in clinical trials: Are we being misled? Annals of Internal Medicine, 125:605–613, 1996.
[8] M. Buyse and G. Molenberghs. Criteria for the validation of surrogate endpoints in randomized experiments. Biometrics, 54:1014–1029, 1998.
[9] V. W. Berger. Does the Prentice criterion validate surrogate endpoints? Statistics in Medicine, 23:1571–1578, 2004.
[10] L. S. Freedman and B. I. Graubard. Statistical validation of intermediate endpoints for chronic diseases. Statistics in Medicine, 11:167–178, 1992.
[11] G. Molenberghs, M. Buyse, H. Geys, D. Renard, T. Burzykowski, and A. Alonso. Statistical challenges in the evaluation of surrogate endpoints in randomized trials. Controlled Clinical Trials, 23:607–625, 2002.
[12] A. Alonso, G. Molenberghs, T. Burzykowski, D. Renard, H. Geys, Z. Shkedy, F. Tibaldi, J. C. Abrahantes, and M. Buyse. Prentice’s approach and the meta-analytic paradigm: A reflection on the role of statistics in the evaluation of surrogate endpoints. Biometrics, 60:724–728, 2004.
[13] M. Buyse, G. Molenberghs, T. Burzykowski, D. Renard, and H. Geys. The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics, 1:49–67, 2000.
[14] R. E. Kass and A. E. Raftery. Bayes factors. Journal of the American Statistical Association, 90:773–795, 1995.
[15] A. J. Petkau, S. C. Reingold, U. Held, G. R. Cutter, T. R. Fleming, M. D. Hughes, D. H. Miller, H. F. McFarland, and J. S. Wolinsky. Magnetic resonance imaging as a surrogate outcome for multiple sclerosis relapses. Multiple Sclerosis, 14:770–778, 2008.
[16] J. A. Nelder and R. Mead. A simplex method for function minimization. Computer Journal, 7:308–313, 1965.
[17] F. Mosteller and J. W. Tukey. Data Analysis and Regression, a Second Course in Statistics. Addison-Wesley, Reading, Massachusetts, 1977.
[18] T. Burzykowski and M. Buyse. Surrogate threshold effect: an alternative measure for meta-analytic surrogate endpoint validation. Pharmaceutical Statistics, 5:173–186, 2006.

Appendix A The SBRCMB Dataset

In the table that follows, the last four columns give the observed endpoints from each contrast: MRI = MRI lesion count per patient per scan; ARR = annualized relapse rate. “C” denotes the control arm and “A” the active arm. Unless otherwise noted, entries in columns 1, 2, 3, 4, 5, 11 and 12 are copied from the supplementary table accompanying the SBRCMB paper. Entries in the remaining columns are extracted or calculated from the original papers where the results of the corresponding clinical trials are reported.
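| Trial ID | Contrast ID | MRI Outcome | SBRCMB Weight | Follow-up (months) | # of Scans | # of Patients C | # of Patients A | MRI C | MRI A | ARR C | ARR A |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Active T2 (a) | 37 | 24 | 6 | 17 | 17 | 0.82 | 0.30 | 1.27 | 1.17 |
| 1 | 2 | Active T2 (a) | 36 | 24 | 6 | 17 | 17 | 0.82 | 0.33 | 1.27 | 0.84 |
| 2 | 3 | Active T2 (b) | 14 | 6 | 6 | 10 | 10 | 3.37 | 0.08 | 2.00 | 0.34 |
| 3 | 4 | Active T2 (b) | 20 | 6 | 6 | 14 | 14 | 4.22 | 1.37 | 1.29 | 0.57 |
| 4 | 5 | Active T2 (b) | 233 | 24 | 2 | 82 | 83 | 2.40 | 1.60 | 0.82 | 0.67 |
| 5 | 6 | New T2 | 59 | 24 | 2 | 19 | 23 | 3.65 | 1.75 | 1.31 | 0.45 |
| 6 | 7 | CUA (c) | 138 | 24 | 10 | 66 | 64 | 1.55 | 0.90 | 1.28 | 0.91 |
| 6 | 8 | CUA (c) | 140 | 24 | 10 | 66 | 68 | 1.55 | 0.55 | 1.28 | 0.87 |
| 7 | 9 | CUA (c) | 123 | 12 | 6 | 97 | 87 | 1.70 | 1.30 | 1.08 | 1.08 |
| 7 | 10 | CUA (c) | 124 | 12 | 6 | 97 | 85 | 1.70 | 0.80 | 1.08 | 0.81 |
| 8 | 11 | CUA (c) | 41 | 6 | 6 | 43 | 44 | 1.48 | 1.37 | 0.98 | 1.00 |
| 8 | 12 | CUA (c) | 39 | 6 | 6 | 43 | 40 | 1.48 | 2.58 | 0.98 | 1.64 |
| 8 | 13 | CUA (c) | 39 | 6 | 6 | 43 | 40 | 1.48 | 2.00 | 0.98 | 1.47 |
| 9 | 14 | New Gd | 32 | 6 | 6 | 33 | 32 | 1.22 | 0.42 | 0.88 | 0.90 |
| 9 | 15 | New Gd | 33 | 6 | 6 | 33 | 32 | 1.22 | 0.23 | 0.88 | 1.07 |
| 10 | 16 | New Gd | 11 | 9 | 9 | 10 | 8 | 3.00 | 3.18 | 0.27 | 0.48 |
| 10 | 17 | New Gd | 11 | 9 | 9 | 10 | 8 | 3.00 | 3.80 | 0.27 | 0.88 |
| 11 | 18 | New T2 | 207 | 9 | 9 | 120 | 119 | 1.52 | 1.04 | 1.21 | 0.81 |
| 12 | 19 | CUA (c) | 49 | 6 | 6 | 34 | 36 | 2.42 | 1.98 | 1.29 | 1.50 |
| 13 | 20 | CUA (c) | 74 | 6 | 6 | 71 | 68 | 1.62 | 0.13 | 0.51 | 0.09 |
| 13 | 21 | CUA (c) | 77 | 6 | 6 | 71 | 74 | 1.62 | 0.18 | 0.51 | 0.22 |
| 14 | 22 | New T2 | 758 | 14 | 1 | 467 | 471 | 6.80 | 7.90 | 0.61 | 0.60 |
| 14 | 23 | New T2 | 751 | 14 | 1 | 467 | 462 | 6.80 | 6.50 | 0.61 | 0.54 |
| 15 | 24 | New T2 | 87 | 6 | 6 | 81 | 83 | 1.07 | 0.50 | 0.77 | 0.35 |
| 15 | 25 | New T2 | 84 | 6 | 6 | 81 | 77 | 1.07 | 0.32 | 0.77 | 0.36 |
| 16 | 26 | CUA (c) | 79 | 9 | 7 | 61 | 61 | 2.68 | 1.04 | 0.81 | 0.58 |
| 16 | 27 | CUA (c) | 77 | 9 | 7 | 61 | 57 | 2.68 | 1.06 | 0.81 | 0.55 |
| 17 | 28 | Active T2 (b) | 1332 | 24 | 2 | 315 | 627 | 5.50 | 0.95 | 0.73 | 0.23 |
| 18 | 29 | New Gd | 74 | 6 | 4 | 35 | 69 | 1.12 | 0.05 | 0.84 | 0.37 |
| 19 | 30 | New Gd | 140 (d) | 12 | 8 | 84 | 96 | 0.72 | 0.64 | 0.44 | 0.46 |
| 19 | 31 | New Gd | 128 (d) | 12 | 8 | 84 | 87 | 0.72 | 1.06 | 0.44 | 0.60 |
| 20 | 32 | New T2 | 129 | 9 | 4 | 102 | 98 | 2.40 | 2.50 | 0.77 | 0.76 |
| 20 | 33 | New T2 | 136 | 9 | 4 | 102 | 106 | 2.40 | 1.60 | 0.77 | 0.52 |
| 21 | 34 | CUA (c) | 65 | 12 | 4 | 41 | 44 | 4.50 | 7.25 | 0.50 | 1.00 |
| 21 | 35 | CUA (c) | 63 | 12 | 4 | 41 | 42 | 4.50 | 7.62 | 0.50 | 0.88 |
| 22 | 36 | New Gd | 44 (d) | 6 | 6 | 49 | 50 | 1.73 | 0.77 | 0.53 | 0.44 |
| 22 | 37 | New Gd | 44 (d) | 6 | 6 | 49 | 50 | 1.73 | 1.91 | 0.53 | 0.52 |
| 22 | 38 | New Gd | 44 (d) | 6 | 6 | 49 | 50 | 1.73 | 0.88 | 0.53 | 0.56 |
| 22 | 39 | New Gd | 44 (d) | 6 | 6 | 49 | 50 | 1.73 | 1.57 | 0.53 | 0.44 |
| 23 | 40 | New Gd | 28 | 6 | 5 | 19 | 19 | 1.03 | 0.37 | 0.63 | 0.42 |

(a) new, recurrent and enlarging T2 lesions
(b) new and enlarging T2 lesions
(c) combined uniquely active lesions = recurrent and enlarging T2 lesions and new Gd-enhancing lesions, avoiding double counting
(d) calculated from the original papers; these differ from those in the SBRCMB paper

Appendix B Partial Derivatives of $E(Y^{true}_0 \mid X_0 = x_0)$

From (4.34), we have:

$$E(Y^{true}_0 \mid X_0 = x_0) = \alpha + \beta\mu_X\left(1 - \frac{\sigma^2_X}{\sigma^2_X + H_0}\right) + \beta\,\frac{\sigma^2_X}{\sigma^2_X + H_0}\,x_0,$$

where

$$H_0 = \phi^2\left[\frac{1}{K_0 N_{a0}}\,E\!\left(\frac{1}{M^{true}_{a0}}\right) + \frac{1}{K_0 N_{c0}}\,E\!\left(\frac{1}{M^{true}_{c0}}\right)\right] = \phi^2 c_0, \text{ say.}$$

Let $L_0 = \sigma^2_X/(\sigma^2_X + H_0)$. Then:

$$E(Y^{true}_0 \mid X_0 = x_0) = \alpha + \beta\mu_X(1 - L_0) + \beta L_0 x_0.$$

So:

$$\frac{\partial E}{\partial \alpha} = 1, \qquad \frac{\partial E}{\partial \beta} = \mu_X(1 - L_0) + L_0 x_0, \qquad \frac{\partial E}{\partial \mu_X} = \beta(1 - L_0), \qquad \frac{\partial E}{\partial L_0} = -\beta\mu_X + \beta x_0,$$

$$\frac{\partial L_0}{\partial \sigma^2_X} = \frac{H_0}{(\sigma^2_X + H_0)^2}, \qquad \frac{\partial E}{\partial \sigma^2_X} = \frac{\partial E}{\partial L_0}\,\frac{\partial L_0}{\partial \sigma^2_X} = (-\beta\mu_X + \beta x_0)\,\frac{H_0}{(\sigma^2_X + H_0)^2},$$

$$\frac{\partial L_0}{\partial H_0} = -\frac{\sigma^2_X}{(\sigma^2_X + H_0)^2}, \qquad \frac{\partial H_0}{\partial \phi^2} = c_0,$$

$$\frac{\partial E}{\partial \phi^2} = \frac{\partial E}{\partial L_0}\,\frac{\partial L_0}{\partial H_0}\,\frac{\partial H_0}{\partial \phi^2} = -(-\beta\mu_X + \beta x_0)\,\frac{\sigma^2_X}{(\sigma^2_X + H_0)^2}\,c_0.$$

The vector of partial derivatives $g$ is then given by:

$$g = \left(\frac{\partial E}{\partial \alpha},\ \frac{\partial E}{\partial \beta},\ \frac{\partial E}{\partial \mu_X},\ \frac{\partial E}{\partial \sigma^2_X},\ \frac{\partial E}{\partial \phi^2}\right)^T.$$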
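
Because derivatives like these are easy to get wrong by a sign, a quick numerical check is useful. The sketch below codes $E(Y^{true}_0 \mid X_0 = x_0)$ and the vector $g$ as written in this appendix, treating $c_0$ as a known constant, and compares $g$ with central finite differences. The parameter values are hypothetical, chosen only to exercise the formulas, and the last helper (the delta-method variance $g^T \Sigma g$ for an estimated parameter covariance matrix $\Sigma$) is an assumption about how such a gradient is typically used, not something this appendix states. All function names are hypothetical.

```python
def conditional_mean(theta, x0, c0):
    """E(Y_true_0 | X_0 = x0) with theta = (alpha, beta, mu_x, sigma2_x, phi2)
    and H_0 = phi2 * c0."""
    alpha, beta, mu_x, sigma2_x, phi2 = theta
    h0 = phi2 * c0
    l0 = sigma2_x / (sigma2_x + h0)
    return alpha + beta * mu_x * (1 - l0) + beta * l0 * x0


def gradient(theta, x0, c0):
    """Analytic g, ordered as (alpha, beta, mu_X, sigma^2_X, phi^2)."""
    _alpha, beta, mu_x, sigma2_x, phi2 = theta
    h0 = phi2 * c0
    l0 = sigma2_x / (sigma2_x + h0)
    de_dl0 = -beta * mu_x + beta * x0          # dE/dL0
    denom = (sigma2_x + h0) ** 2
    return [
        1.0,                                   # dE/dalpha
        mu_x * (1 - l0) + l0 * x0,             # dE/dbeta
        beta * (1 - l0),                       # dE/dmu_X
        de_dl0 * h0 / denom,                   # dE/dsigma^2_X
        -de_dl0 * sigma2_x / denom * c0,       # dE/dphi^2 (chain rule via H0)
    ]


def numeric_gradient(theta, x0, c0, eps=1e-6):
    """Central finite differences, used only to check gradient()."""
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += eps
        down[k] -= eps
        diff = conditional_mean(up, x0, c0) - conditional_mean(down, x0, c0)
        grad.append(diff / (2 * eps))
    return grad


def delta_method_variance(g, cov):
    """g^T Sigma g for a covariance matrix given as a list of rows."""
    n = len(g)
    return sum(g[i] * cov[i][j] * g[j] for i in range(n) for j in range(n))


theta = (0.1, 0.6, -0.8, 0.5, 1.2)   # hypothetical (alpha, beta, mu_X, sigma^2_X, phi^2)
x0, c0 = -1.5, 0.05                  # hypothetical observed effect and constant c0
for analytic, numeric in zip(gradient(theta, x0, c0), numeric_gradient(theta, x0, c0)):
    print(f"{analytic:+.6f}  {numeric:+.6f}")
```

The two columns printed should agree to about six decimal places; a mismatch in the last two entries would point to a sign error in the chain-rule steps through $L_0$ and $H_0$.
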