Analyses of Longitudinal and Time-to-Event Data in a Randomized Clinical Trial in the Presence of a Lag Time in the Stabilization of Treatment

By Eugenia Yu Hoi Yin
B.Sc., University of British Columbia, 2000

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF STATISTICS

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
February 2003
© Eugenia Yu Hoi Yin, 2003

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Statistics
The University of British Columbia
6356 Agricultural Road
Vancouver, Canada V6T 1Z2

Date: April 2003

Abstract

Randomized controlled clinical trials (RCTs) are generally considered to be the best experimental setting for assessing new medical therapies. In medical research, the evaluation of RCTs is often based on two approaches: the commonly recommended intent-to-treat (ITT) analysis, and the more controversial per-protocol (PP) approach, which respectively attempt to assess the clinical effectiveness and the efficacy of a therapy. In the presence of a variable lag time in treatment stabilization following randomization, the two approaches may differ not only in their patient inclusion and exclusion criteria, but also in their definitions of the baseline time, from which follow-up is to be measured.
In this work, ITT and PP analyses are applied to the evaluation of an eye pressure lowering therapy, in data from the Collaborative Normal Tension Glaucoma Study. In this study, the therapeutic intervention consisted of achieving a 30% reduction in intra-ocular pressure, and necessitated a lag time before the lowered pressure level became stable. This thesis includes longitudinal and survival analyses, based on measurements taken on some of the main variables in this study. In this case, the PP approach defines baseline time in the treated group as the time at which treatment stabilization has been achieved. It thus loses some of the advantages of randomization, and may suffer potential bias in parameter estimation as well as diminished statistical power in testing the treatment effect. We investigate these potential problems through some simulation work. While the ITT and the PP approaches fail to account for the delay in treatment stabilization, we also develop a multistate model for survival analysis and a piecewise linear mixed effects (LME) model for longitudinal analysis, both of which address the lag time problem in assessing the effectiveness of the therapy. Finally, we consider a baseline-adjustment approach to match the control group to the delayed treatment group for an efficacy assessment of the therapy. These methods that account for the lag time are compared to the ITT and the PP approaches, and recommendations based on their performance in our study and their general applicability are given.
Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgements
1 Introduction
2 The Collaborative Normal Tension Glaucoma Study
  2.1 Normal Tension Glaucoma
    2.1.1 Introduction
    2.1.2 Epidemiology
    2.1.3 Diagnosis and Management
    2.1.4 Visual Field Measurements
  2.2 Motivation and Design of the Study
  2.3 Description of the Data
    2.3.1 Definition of the Progression End Point
3 Evaluation of Randomized Clinical Trials
  3.1 Intent-to-Treat versus Per-Protocol Principles
  3.2 Intent-to-Treat and Per-Protocol Analyses of the Normal Tension Glaucoma Data
4 Linear Mixed Effects Models for the Longitudinal Mean Defect Data
  4.1 The Linear Mixed Effects Model
  4.2 Application to the Mean Defect Data
5 Multistate Models for the Time to Event Data
  5.1 Stochastic Multistate Models for Survival Time Data
    5.1.1 The Disability Model
    5.1.2 Application to the Normal Tension Glaucoma Data
  5.2 Baseline-Adjustment Analysis of the Time to Progression Data
6 Results
  6.1 Analysis of the Mean Defect Data
    6.1.1 Piecewise Linear Modelling Approach
  6.2 Analysis of the Time to Event Data
    6.2.1 Analysis of the Time to IOP Stabilization Data
    6.2.2 Analysis of the Time to Progression Data
    6.2.3 Analysis Using the Disability Model
    6.2.4 The Baseline-Adjustment Analysis
7 Simulation
  7.1 The Objectives
  7.2 Generating Mean Defect Data
    7.2.1 Generation of the Mean Structure Specified by the Fixed Effects
    7.2.2 Generation of the Random Effects and the Within-Patient Errors
    7.2.3 Different Combinations of the Decay Rates of the Mean Defect
  7.3 Simulation Procedures
  7.4 Results
    7.4.1 Comparison of the ITT and the PP Approaches
    7.4.2 Comparison of the PP and the PCLIN Approaches
8 Discussion
9 Conclusions
10 Future Research
Bibliography

List of Tables

6.1 REML coefficient estimates of the fixed effects in the LME model for the ITT and PP approaches.
6.2 REML coefficient estimates of the fixed effects in the LME model for the three different approaches (ITT, PP, PCLIN).
6.3 Table of AICs, BICs, observed log likelihoods and within-patient mean square errors (σ̂²) from fitting the LME models for the three different approaches.
6.4 Estimates of the log relative hazards ratio (β̂) and relative hazards ratio (exp(β̂)) for the gender and treatment type covariates from the Cox regression analysis of the time to IOP stabilization data.
6.5 Estimates of the log relative hazards ratio (β̂) and relative hazards ratio (exp(β̂)) for the group covariate from the gender-specific Cox regression analysis of the time to progression data (M for males, F for females).
6.6 Estimates of the log relative hazards ratio (β̂) and relative hazards ratio (exp(β̂)) from the stratified (by gender) Cox regression analysis of the time to progression data.
6.7 Estimates of the log relative hazards ratio (β̂) and relative hazards ratio (exp(β̂)) for the time-dependent IOP stabilization covariate within the treated patients from the gender-specific time-dependent Cox regression analysis (M for males, F for females).
6.8 Estimates of the log relative hazards ratio (β̂) and relative hazards ratio (exp(β̂)) for the group covariate from the gender-specific Cox regression analysis under the baseline-adjustment approach (M for males, F for females).
7.9 The different decay rates of the MD for the control and treated groups used in the simulation.
7.10 Proportions of rejecting the null hypothesis of no time by group interaction effect in the ITT and the PP analyses, for an MD decay rate of the control group = -0.001721 and different sets of MD decay rates of the treated group.
7.11 Proportions of rejecting the null hypothesis of no time by group interaction effect in the ITT and the PP analyses, for an MD decay rate of the control group = -0.003442 and different sets of MD decay rates of the treated group.
7.12 Proportions of rejecting the null hypothesis of no difference in the MD decay rates between the two treatment groups after IOP stabilization in the PP and the PCLIN analyses, for an MD decay rate of the control group = -0.001721 and different sets of MD decay rates of the treated group.
7.13 Proportions of rejecting the null hypothesis of no difference in the MD decay rates between the two treatment groups after IOP stabilization in the PP and the PCLIN analyses, for an MD decay rate of the control group = -0.003442 and different sets of MD decay rates of the treated group.
7.14 Sample mean of the 500 estimated regression coefficients and standard errors in the PP and the PCLIN analyses, for an MD decay rate of the control group = -0.001721 and different sets of MD decay rates of the treated group.
7.15 Sample mean of the 500 estimated regression coefficients and standard errors in the PP and the PCLIN analyses, for an MD decay rate of the control group = -0.003442 and different sets of MD decay rates of the treated group.

List of Figures

4.1 The observed MD trajectories over time from randomization (in days) for the 53 control patients in the data set.
4.2 The observed MD trajectories over time from randomization (in days) for the 44 treated patients in the data set.
5.3 The Disability Model.
5.4 The Disability Model for the Glaucoma Data.
6.5 The average MD level observed over time from randomization for the control and the treated groups.
6.6 Individual observed MD trajectories and the mean fitted trajectory from the PCLIN approach from time of randomization for the control group.
6.7 Individual observed MD trajectories and the mean fitted trajectory from the PCLIN approach from time of randomization for the treatment group.
6.8 A random sample of 24 individual fitted MD trajectories from the PCLIN approach from time of randomization.
6.9 (a) Estimated Kaplan-Meier survivor functions for the time to IOP stabilization of the four gender by treatment groups. (b) Estimated Kaplan-Meier survivor functions for the time to IOP stabilization of the surgical and non-surgical groups.
6.10 (a) Estimated Kaplan-Meier survivor functions for the time to progression of the four treatment-gender groups based on the intent-to-treat (ITT) approach. (b) Estimated Kaplan-Meier survivor functions for the time to progression of the four treatment-gender groups based on the per-protocol (PP) approach.
6.11 (a) Estimated Kaplan-Meier survivor functions for the time to progression of the two treatment groups based on the intent-to-treat (ITT) approach. (b) Estimated Kaplan-Meier survivor functions for the time to progression of the two treatment groups based on the per-protocol (PP) approach.
6.12 (a) Estimated Kaplan-Meier survivor functions for the time to progression of the four gender by treatment groups based on the baseline-adjustment approach. (b) Estimated Kaplan-Meier survivor functions for the time to progression of the treated and the control groups based on the baseline-adjustment approach.
7.13 Histograms of the estimated coefficients β̂_time×group (from the PP analysis) and β̂₄ + β̂₅ (from the PCLIN analysis) for randomly selected sets of (β_C, β_T1, β_T2) used in the simulation. (a) β_C = -0.001721, β_T1 = -0.004232, β_T2 = -0.0015; (b) β_C = -0.001721, β_T1 = -0.002977, β_T2 = -0.005953; (c) β_C = -0.003442, β_T1 = -0.005953, β_T2 = -0.0025; (d) β_C = -0.003442, β_T1 = -0.004698, β_T2 = -0.003442.

Acknowledgements

I would like to thank my supervisor Dr. Michael Schulzer for his continuous guidance and patience, without which the development of this thesis would not have been possible. His expertise in both the statistical and medical fields has enlightened me and motivated my interest in biostatistics. He has inspired me greatly in biostatistical research, and broadened my knowledge beyond textbooks and examples. Also, thank you to Dr. Jim Zidek for his support and confidence in me throughout my stay at UBC as both an undergraduate and a graduate student. My decision to pursue further studies in statistics was undoubtedly due to Jim's enthusiasm and encouragement. As well, I am grateful to Dr. Nancy Heckman and Dr. Richard Mathias (from the Department of Health Care and Epidemiology) for being my second readers and for their precious advice on my thesis. Needless to say, thanks should also go to all the graduate students and faculty of the department, and to all my friends. Vivien and Lisa: many thanks for walking with me through these years, especially for all the mental and intellectual support which bonds us closely as the millennium Head-Line Team!! Mandy and Stephanie: thanks for lending an ear to me during my hard times!!
Last but not least, I am most thankful for the love and support of my parents and my aunts, who are always beside me.

EUGENIA YU
The University of British Columbia
April 2003

Chapter 1
Introduction

Randomized controlled clinical trials (RCTs) are generally recognized as the best experimental setting for assessing new medical therapies. Randomization of patients to different treatments promotes comparability of the treatment groups and minimizes potential selection biases with respect to unmeasured characteristics of the patients. Differences in the response between the control and the treatment groups may then be attributed to the treatment itself rather than to some confounding factors. When there is a lag time in the stabilization of treatment following randomization, the definition of the baseline from which patients are to be followed is particularly crucial in the clinical comparisons between treatment groups. Randomization alone may not be sufficient to validate the results of treatment comparisons if the baseline is defined inappropriately. In general, when a method of comparison is inappropriate, or when the assumptions underlying a correct approach are not satisfied, the results will be erroneous and may lead to type I or type II errors. Finding an appropriate method of comparison in the presence of a lag time in the stabilization of treatment following randomization is often difficult. The possibility of a lag time in full treatment effect was first noted by Halperin et al. [1]. Such a lag time arises, for example, when the treatment under study does not take immediate effect upon its administration to treated patients at the time of randomization. There is a delay before the intended treatment effect is achieved, and this makes comparisons between the control and the treatment groups difficult. Although randomization of patients guards
against bias in the treatment assignment and in subsequent data analyses, this lag time is likely to affect the results of comparisons, but often cannot be specified precisely before the trials are conducted. The effect of such a lag time on statistical comparison procedures has received considerable attention from the medical and statistical communities in the past few decades. A lag time in the treatment is seen mostly in long-term treatments. Well-known examples include the Lipid Research Clinics Coronary Primary Prevention Trial (CPPT) [2], the Women's Health Trial [3] and the Physicians' Health Study [4]. In the CPPT, the treatment was a cholesterol lowering therapy. The therapy was expected to gradually reduce the amount of plaque in blood vessels and hence the risk of coronary disease. In the Women's Health Trial, a randomized controlled trial was initiated to determine if a low fat diet effectively reduced the incidence of breast cancer among a high-risk group of women. The cholesterol lowering therapy and the diet intervention introduced a linear lag phase, in which the effect of the treatment gradually increased with time. On the other hand, a different model for the lag time was used in the Physicians' Health Study, where the effect of beta-carotene on cancer incidence was investigated. The experimenters believed that the drug did not affect pre-existing tumors, and that time was needed for new tumors to develop and become detectable. Hence a threshold lag time was assumed: the effect of the drug was not associated with tumors detected within the first two years after administration at the time of patient randomization. In all the above cases, the treatment effect was not immediate and thus introduced a lag time from the time of randomization before the treatment reached its full effect. Several authors have proposed new statistical procedures to take the lag time into account in analyzing survival data.
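The linear and threshold lag shapes just described can be made concrete as weight functions of time since randomization. This is only an illustrative sketch; the function names and parameterization below are mine, not drawn from the cited trials:

```python
def linear_lag_weight(t, full_effect_time):
    """Linear lag: the treatment effect ramps from 0 at randomization
    (t = 0) up to full effect (weight 1) at `full_effect_time`."""
    return min(max(t / full_effect_time, 0.0), 1.0)

def threshold_lag_weight(t, threshold):
    """Threshold lag: no treatment effect before `threshold`
    (e.g. two years in the Physicians' Health Study), full effect after."""
    return 1.0 if t >= threshold else 0.0
```

A weighted comparison procedure can downweight early follow-up, where the treatment has not yet reached full effect, using such a function.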
Zucker and Lakatos [5] considered a linear and a threshold lag model, and presented two weighted log-rank type statistics for comparing survival curves in a non-parametric setting. Luo [6] extended their ideas to the Cox proportional hazards regression model to include lagged effects of some of the covariates. Nevertheless, little discussion in the literature has been devoted to the consequences of applying ordinary comparison procedures without a careful adjustment for the delay. The lag time present in non-survival data has not been widely addressed either. It is important to make clinical practitioners and designers of clinical studies aware of the problem, and this motivated this study of the statistical issues related to treatment lag times. For this work, I have focused on investigating and discussing the effect posed by a delay in treatment stabilization on the results of treatment comparisons. The comparison procedures are based on the intent-to-treat and the per-protocol principles, which are commonly adopted in the evaluation of clinical trials. Longitudinal and survival data collected from the Collaborative Normal Tension Glaucoma Study [7]-[10] are used for analyses throughout this work. Approaches that take into account the lag time present in our data have been considered, in addition to some classical methods of comparison for longitudinal and survival analyses. Furthermore, a demonstration of the potential problems, including bias and diminished statistical power, in applying the per-protocol approach will be given through some simulation work. The remainder of this thesis is organized as follows. The details of the Collaborative Normal Tension Glaucoma Study and an introduction to normal tension glaucoma are first provided in Chapter 2. Chapter 3 gives a full description of the intent-to-treat and per-protocol principles and an application to the evaluation of the glaucoma study.
Methodologies for modelling the longitudinal and survival data from the glaucoma study are detailed in Chapters 4 and 5, and the results of the analyses are presented in Chapter 6. Simulation of longitudinal data for demonstrating the performance of the intent-to-treat and per-protocol approaches follows in Chapter 7. A general discussion and conclusions are given in Chapters 8 and 9. The thesis ends with recommendations and suggestions for future work in Chapter 10.

Chapter 2
The Collaborative Normal Tension Glaucoma Study

The Collaborative Normal Tension Glaucoma Study (CNTGS) [7]-[10] is a prospective multi-center study investigating the effects of intra-ocular pressure (IOP) reduction on disease progression in normal tension glaucoma (NTG). Before giving the details of the design of the study in Section 2.2 and a description of the data in Section 2.3, we familiarize the readers with some basic information on the nature of the disease and its diagnosis and management, to enhance their understanding of the purpose of the trial and, subsequently, of the methodologies and analyses presented in this thesis.

2.1 Normal Tension Glaucoma

2.1.1 Introduction

Glaucoma has been one of the leading causes of blindness among adults, and the elderly in particular. Its definition varies across the ophthalmic community, but the disease can be described as a chronic ophthalmic condition characterized by optic nerve head damage, a characteristic loss of visual field and an elevated IOP. In 1857, Von Graefe [11] described a group of patients having cupping of the optic nerve head and visual field defects, but with IOP levels that remained within the statistically normal range. The term "normal tension glaucoma" was then coined to describe this particular group of glaucomatous conditions.

2.1.2 Epidemiology

The prevalence of normal tension glaucoma has been estimated in a number of studies.
The estimates range from 0.3% to 4% among patients in their mid-60s [12]. The wide range is the result of the many different definitions of NTG being employed. Similar to Open-Angle Glaucoma, which comprises the largest group of patients suffering from glaucoma with elevated IOP, normal tension glaucoma is mostly asymptomatic at the early stage. There is no noticeable visual field loss at that stage, and therefore most patients are unaware of the disease. When untreated, NTG patients will gradually lose their peripheral vision and eventually may suffer total blindness. The Glaucoma Research Foundation [13] reported a rate of blindness from glaucoma of between 93 and 126 per 100,000 people over the age of 40. In particular, Open-Angle Glaucoma accounts for 19% of all blindness among African-Americans, compared to 6% in Caucasians. As discussed by Sassani [14], results from previous research on the risk factors for normal tension glaucoma showed that age, gender, race, diseases including migraine and diabetes, and genetic factors are associated with the development of the disease. More specifically, the prevalence increases with age; females, Asians and African-Americans, and people with migraine, diabetes or a family history of glaucoma are more susceptible to developing normal tension glaucoma. The most recent study of the natural history of the disease, conducted by the Collaborative Normal Tension Glaucoma Study Group, investigated the risk factors for the progression of visual field abnormalities in NTG [10]. It was found that female gender, the presence of migraine and disk hemorrhage each contribute separately to a higher risk of progression. Asian patients have a slower rate of progression despite a high prevalence of NTG within the race, while black patients show a faster rate of progression. Moreover, age, the untreated level of IOP and self-declared family history of glaucoma were found to have no effect on the progression rate in this study.
2.1.3 Diagnosis and Management

The diagnosis of normal tension glaucoma is often made by exclusion: the determination of the nature of the disease is based upon the elimination of other diseases that share similar symptoms and characteristics. All other causes leading to damage of the optic nerve and visual field loss, for example cardiovascular abnormalities, must be eliminated, and the IOP level has to be shown repeatedly not to exceed the normal statistical upper bound (21 mm Hg) before normal tension glaucoma can be diagnosed [12]. To measure intra-ocular pressure, tonometry is used, but it has rather poor sensitivity and specificity in detecting glaucoma if used alone [15]. Therefore, in practice it is used in combination with ophthalmoscopy, which examines the appearance of the optic nerve, for early detection of the disease. Moreover, gonioscopy helps to examine the structure of the anterior chamber angle, for determining whether a patient suffers from Open-Angle or Angle-Closure Glaucoma. Normal tension glaucoma shares many clinical features with Open-Angle Glaucoma, but we do not plan to discuss their similarities here. Readers can refer to the literature on glaucoma for more details; Sassani [14] provides a comprehensive reference on the subject. Although an IOP outside the normal range has not been documented, patients with normal tension glaucoma tend to have a wider diurnal IOP fluctuation, which might account for the glaucomatous features in the absence of a consistently elevated IOP level [12]. Furthermore, studies have shown that asymmetric normal tension glaucoma is associated with an asymmetric IOP [16]-[18], so IOP is believed to play a role in the underlying mechanism causing the disease. With this belief, treatments for normal tension glaucoma aim at reducing IOP levels. Patients diagnosed with an early stage of the disease are usually treated with medications.
When patients show progression on medication or experience considerable visual field loss, they are treated with laser surgery. Filtering or incisional surgery is applied upon failure of the previously mentioned treatments or upon persistence of the progression. The medical and surgical treatments all attempt to lower IOP, in the hope of preventing further progressive damage.

2.1.4 Visual Field Measurements

To monitor disease progression, both the optic nerve head and the visual field need to be assessed regularly. Nowadays, automated static perimetry is used to quantify visual field loss, based on the linear relationship between visual perception and the change in stimulus intensity, which is measured on the logarithmic scale of decibels (dB) [15]. The Humphrey Field Analyzer (HFA), which was used in the CNTGS, is one of the most commonly used automatic perimeters. In essence, perimetry based on the HFA entails estimating threshold values at each test location in the central 30 degrees of the visual field, where a threshold can be described as the minimum brightness of a stimulus that a patient perceives at a particular test location. To estimate the threshold value at each location tested, stimuli are presented at that location and their intensities are decreased in 4dB steps until a reversal occurs, i.e., from perceived to not perceived or vice versa. The test process then reverses, and the intensities increase in 2dB steps until the second reversal occurs, at which time the threshold determination is stopped. The last seen stimulus intensity is used as the threshold estimate. A detailed description of the HFA and the threshold estimation procedure can be found in [19].
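The staircase procedure just described can be sketched in a few lines. This is a simplified illustration, not the CNTGS software: the deterministic `perceives` oracle and the abstract intensity scale are my assumptions (a real patient responds probabilistically, and HFA intensities are stimulus attenuations in dB).

```python
def staircase_threshold(perceives, start, coarse=4, fine=2):
    """Estimate a sensitivity threshold with a 4-2 staircase: step by
    `coarse` units until the first reversal (perceived -> not perceived,
    or vice versa), then step by `fine` units in the opposite direction
    until the second reversal; return the last-seen intensity."""
    intensity = start
    first_response = perceives(intensity)
    last_seen = intensity if first_response else None
    step = -coarse if first_response else coarse   # move toward a reversal
    while True:                                    # phase 1: first reversal
        intensity += step
        response = perceives(intensity)
        if response:
            last_seen = intensity
        if response != first_response:
            break
    step = fine if step < 0 else -fine             # reverse, finer steps
    phase2_start = response
    while True:                                    # phase 2: second reversal
        intensity += step
        response = perceives(intensity)
        if response:
            last_seen = intensity
        if response != phase2_start:
            break
    return last_seen

# A hypothetical observer who sees any stimulus at or above 17 units,
# tested from a starting intensity of 30: the estimate lands within one
# fine step of the true threshold.
estimate = staircase_threshold(lambda x: x >= 17, start=30)  # estimate == 18
```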
From these threshold values at the different locations, four global indices that quantify visual field loss are computed: the mean deviation or mean defect (MD), the pattern standard deviation (PSD), the short-term fluctuation (SF) and the corrected pattern standard deviation (CPSD). The mean defect, which is an important outcome variable in our data, is a variance-weighted average departure of each test location from the age-corrected value, where a threshold of stimulus intensity is observed at every test location in the visual field:

MD = [ Σ_{i=1}^{n} (y_i − v_i) / s_i² ] / [ Σ_{i=1}^{n} 1 / s_i² ]

where y_i is the observed threshold, v_i is the normal age-corrected reference threshold, and s_i² is the between-patient variance of normal field measurements at the ith of the n test locations.

The MD measures the overall sensitivity of the retina to light. A large negative MD is suggestive of a serious overall abnormality of the visual field. On the other hand, PSD and CPSD are more effective indices for quantifying localized visual field defects. In the presence of a cataract, MD tends to have reduced specificity, because cataracts are characterized by a generalized depression of thresholds over the entire field, thus leading to a decreased MD level. Monitoring the rate of decay of MD is useful for assessing the rate of progression of normal tension glaucoma, as MD is reflective of the overall degree of visual field defects.

2.2 Motivation and Design of the Study

For all forms of glaucoma that are associated with an elevated intra-ocular pressure (IOP), treatments always involve the lowering of the IOP, and a reduced IOP has a known beneficial effect on the natural history of the disease. However, for NTG patients, whose IOP stays inside the statistically normal range, the usefulness of an IOP reduction is unknown. Clinical findings suggest that asymmetric NTG is often associated with asymmetric IOP levels [16]-[18].
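As a concrete illustration of the mean defect index defined in Section 2.1.4, the variance-weighted average can be computed directly. The values below are purely illustrative, not study data:

```python
def mean_defect(observed, reference, variances):
    """Variance-weighted mean defect: average of the departures
    (y_i - v_i) of observed thresholds from the age-corrected norms,
    weighted by 1 / s_i^2 (the between-patient variances)."""
    weights = [1.0 / s2 for s2 in variances]
    departures = [y - v for y, v in zip(observed, reference)]
    return sum(w * d for w, d in zip(weights, departures)) / sum(weights)

# Three hypothetical test locations, all with an age-corrected norm of
# 30 dB; the noisier third location (larger s_i^2) gets less weight.
md = mean_defect([28, 26, 24], [30, 30, 30], [1.0, 1.0, 2.0])  # -3.6 dB
```

The negative value reflects thresholds below the age-corrected norms, i.e., an overall depression of retinal sensitivity.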
One of the main objectives of the CNTGS was to ascertain the role IOP reduction plays in normal tension glaucoma. The CNTGS Group compared the time-to-progression experience of an untreated group of NTG patients to that of a treated group, in which patients received medical, laser and/or surgical treatment(s) to achieve a 30% reduction from the mean of the last three prerandomization pressure readings. The effectiveness and the efficacy of the IOP lowering strategy were assessed by using an intent-to-treat approach and a per-protocol approach, respectively. The principles underlying the two approaches are discussed in Chapter 3. Two hundred and thirty patients from twenty-four centers were enrolled in the study. To be eligible for the study, the patients all had unilateral or bilateral normal tension glaucoma and other ophthalmic characteristics which met the criteria described in [7]. Upon entry into the study, patients remained unrandomized until a fixation threat or progression of the study eye(s) occurred. A fixation threat can be described as having visual field defects at the point of fixation, which is the area of maximum visual acuity in the human visual field. The eligible eye of each patient was then randomized to either the control group, in which the eye remained untreated, or the treatment group, in which a 30% reduction in IOP was achieved by means of medical, laser and/or surgical interventions. Most treated patients were first placed on topical medication or laser treatment. When either or both failed to reduce the IOP to the desired level, patients underwent filtering surgery. There were also cases where treated patients were given the surgical treatment immediately after randomization.
Once stabilization of the treatment effect was achieved, the patients were followed regularly until their study eyes reached the progression end point (which is defined in Section 2.3.1) or until their lifetime in the study was censored. Meanwhile, patients had their mean defect (MD) measured repeatedly and regularly, at each of their clinical visits. The time to IOP stabilization after a 30% reduction for the treated patients, the time to the progression end point and the mean defect values comprised the three main outcomes of the study. Covariate information on demographics, medical history and baseline ophthalmic characteristics was also collected.

2.3 Description of the Data

Due to confidentiality concerns, a subset of the data analyzed by the CNTGS Group was used for this thesis. Hereafter, I will refer to this subset as the data, unless stated otherwise. The data were obtained by sampling at random 97 patients from the 145 who enrolled in the study and whose study eyes met the criteria for randomization. Among the selected 97 patients, 44 were in the treated group and 53 were in the control group. Longitudinal data on the mean defect measurements (in decibels) and survival data on the time to IOP stabilization (in days) and the time to progression (in years) were available for analyses. Besides the group variable, the effects of gender, the type of therapy that treated patients received, as well as age, IOP and MD levels at baseline were studied in my thesis.

2.3.1 Definition of the Progression End Point

The Collaborative Normal Tension Glaucoma Study adopted two definitions of the progression end point: the protocol definition and a definition based on the so-called four-of-five criteria. The former ensured identification of minimal visual field alterations, to minimize any risk to the eyes of untreated patients in the study; the details of this definition can be found in [7].
The latter was used for the purpose of analyzing study outcomes. A computer algorithm was developed for the identification of the progression end point. In essence, progression was considered confirmed when four of five consecutive follow-up fields showed progression in a cluster of test locations relative to the baseline visual fields, with at least one non-peripheral progression point (test location) common to all four fields. A progression relative to the baseline fields was defined as having two or more adjacent points (which could not all be peripheral) whose MD values decreased by at least 10dB relative to the average of the baseline values at each of these points taken at the time of randomization. A complete description of the four-of-five criteria can be found in [7].

Chapter 3
Evaluation of Randomized Clinical Trials

3.1 Intent-to-Treat versus Per-Protocol Principles

In conducting clinical trials, treatment assignment is ideally done through randomization, because randomization tends to give an unbiased comparison of the different treatment groups. In practice, however, clinicians often encounter problems of patient drop-outs, non-compliance and missing observations. Some patients do not ultimately receive the treatments to which they are preassigned. This leads to concern about how one should analyze clinical trials in order to have a proper comparison between the treatment groups. Two principles are adopted in the evaluation of clinical trials: the intent-to-treat (ITT) principle and the per-protocol (PP) principle. The former is based on the idea that all patients who are randomized should be included in the final analysis of the trial, irrespective of the presence of drop-outs, cross-overs and non-compliance.
Patients are assumed to remain in the treatment groups to which they have been randomized, even if they switch to another treatment during the period they are followed. According to Lachin [20], the intent-to-treat principle refers to a set of criteria for the evaluation of the benefits and risks of a new therapy that essentially calls for the complete inclusion of all data from all randomized patients in the final analyses. The intent-to-treat principle is contrasted with the per-protocol principle, in which the main purpose of the analysis lies in the assessment of the efficacy of a treatment. With this principle, the evaluation of a clinical trial is based only on patients who actually adhere to the treatment preassigned. Observations on dropped-out and non-compliant patients are excluded from the analysis. Lachin [20] described the principle as a strategy to select and to examine the experience of a subset of patients that meet the desired efficacy criteria for inclusion in the analysis.

It is important to distinguish between efficacy and effectiveness because their assessment entails different strategies and clinical implications. Efficacy refers to the effects of an intervention, such as a medication, under ideal conditions, while effectiveness refers to how successful an intervention is in clinical practice, whose conditions often deviate from the controlled conditions of efficacy studies. As described by Indrayan and Sarmukaddam [21], efficacy is evaluated under controlled conditions, whereas effectiveness is determined not only by efficacy but also by coverage, compliance, provider performance, and so on. There has been intense controversy over which of the two principles should be employed in making treatment comparisons.
Advocates of the intent-to-treat approach argue that it not only provides a means of treatment comparison in an unbiased fashion, but also realistically assesses the usefulness of a treatment in clinical practice, where it is infeasible to track how patients are receiving their prescribed treatments. Advocates of the per-protocol approach argue that an intent-to-treat analysis is less powerful in detecting the presence of a treatment effect, because a treatment effect is possibly diluted by the inclusion of patients who do not adhere to their treatments. Those who follow the per-protocol principle believe that by studying the subset of patients who do receive the treatments exactly as described in the protocol, the treatment effect can be truly assessed and estimated. But they may have overlooked some potential problems of this approach.

Lachin [20] described some of the statistical considerations in the intent-to-treat design and in the analyses of clinical trials. He also discussed potential bias and statistical power issues related to the per-protocol analysis, which he referred to as the efficacy subset analysis. The dropped-out and non-compliant patients possibly comprise a group with demographic characteristics and health conditions substantially different from those of the patients who are included in the per-protocol analysis. The subset of patients being included is not identified at the time of randomization, and hence randomization of patients does not help ensure an unbiased comparison. The validity of the results from the per-protocol analysis thus becomes questionable; even if the results are valid, they may not be applicable to the study population in general. In many recent studies, clinical trials are evaluated using both approaches, and results from both are reported and compared.
There seems to be a belief that when a per-protocol analysis identifies a significant treatment effect, the same result obtained in the intent-to-treat analysis further demonstrates the usefulness of the treatment. For this reason, the intent-to-treat analysis tends to serve as a confirmation tool for a treatment assessment, rather than as a means of unbiased treatment assessment. It is unclear how the results obtained from the two approaches are related and can be compared, as they involve essentially two different sets of patients (one set is a subset of the other). Regardless, one should be aware of the potential problems of the per-protocol approach. Moreover, because its clinical interpretation and implications differ from those of the intent-to-treat approach, one needs to be cautious when drawing conclusions from either approach.

3.2 Intent-to-Treat and Per-Protocol Analyses of the Normal Tension Glaucoma Data

In the evaluation of the Collaborative Normal Tension Glaucoma Study, both the intent-to-treat and the per-protocol approaches were used for treatment comparisons. In particular, the survival experience of the treated patients who had their IOP levels reduced by 30% from the prerandomization values was compared to that of the patients in the control group, who remained untreated, until they reached the progression end point or their follow-up times in the trial were censored. In the intent-to-treat analysis, all the patients who were randomized were included for treatment comparison, regardless of problems such as inability to achieve the desired reduced level of IOP, progression prior to IOP stabilization, treatment complications that affected visual acuity of the treated patients, and non-compliance. In contrast, patients having any of the above problems were excluded from the per-protocol analysis.
As the two approaches differ in the inclusion of patients for analyses, a substantial difference in sample size is expected when the degree of drop-outs and non-compliance is high. However, among the 145 patients who were randomized in the Collaborative Normal Tension Glaucoma Study, only five treated patients withdrew from the study before the stabilization of IOPs took place and thus did not meet the efficacy requirement to be included in the per-protocol analysis. In defining the intent-to-treat and the per-protocol approaches that were adopted by the CNTGS Group, not only were the original criteria for inclusion of patients used, but two different baselines were also defined. In the intent-to-treat analysis, the baseline was taken to be the time of randomization for all the patients. Even though a 30% reduction in IOP was not immediately achieved upon medical, laser or surgical intervention for the treated patients, measuring the patients starting from the time of randomization gave a reasonable assessment of the overall clinical effectiveness of the treatment, as the treated patients began the IOP-lowering therapy at randomization. On the other hand, in the per-protocol analysis, treated patients had their follow-up times measured from the baseline time at which their IOPs stabilized after a 30% reduction. Equivalently, the baseline for the control group remained at the time of randomization, while that for the treatment group was shifted to the patients' individual times of IOP stabilization. This new baseline was chosen for the treatment group because a 30% reduction in IOP was the desired treatment criterion: the treatment was supposed to have taken its full effect at the time of IOP stabilization. Due to the small number of patients who withdrew from the study, the ITT and PP approaches did not differ much in terms of sample size.
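As a small illustration of the two baseline definitions, the sketch below computes a treated patient's follow-up time under each approach. The patient and all day counts are invented for this example and are not taken from the study data.

```python
# Hypothetical illustration of the ITT and PP baseline definitions.
# All day counts below are invented for this sketch, not study data.

def follow_up_time(event_day, randomization_day, stabilization_day, per_protocol):
    """Follow-up measured from randomization (ITT) or, for a treated
    patient with a recorded stabilization day, from IOP stabilization (PP)."""
    if per_protocol and stabilization_day is not None:
        return event_day - stabilization_day
    return event_day - randomization_day

# A treated patient randomized on day 0, whose IOP stabilized after a
# 30% reduction on day 90, and who reached the progression end point
# on day 500:
itt_time = follow_up_time(500, 0, 90, per_protocol=False)  # 500 days of follow-up
pp_time = follow_up_time(500, 0, 90, per_protocol=True)    # 410 days of follow-up
```

Shifting the baseline in this way shortens the measured follow-up times for treated patients only, which is one source of the potential bias and power loss that the thesis investigates through simulation.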
Rather, the baseline definition was the distinguishing factor between the two approaches. Throughout this thesis, we will follow the same principles defining the intent-to-treat and the per-protocol approaches as were adopted in the evaluation of the Collaborative Normal Tension Glaucoma Study. The results of our analyses based on the two approaches will be presented in Chapter 6, and the reliability of the results from the two approaches will be discussed in Chapter 8.

Chapter 4

Linear Mixed Effects Models for the Longitudinal Mean Defect Data

To model repeated measurements collected over time, the multivariate normal (MVN) distribution, generalized estimating equations (GEE) and the linear mixed effects (LME) model are amongst the popular choices. The MVN approach may not be applicable to our case because it generally works best only when the subjects have observations taken at a common set of times. Moreover, its application puts quite a strong distributional assumption on the data. In order to have more relaxed assumptions to work with, and to incorporate irregular follow-up measurements for patients enrolled in the Collaborative Normal Tension Glaucoma Study, the GEE [22] and the LME model were considered. Fitting the mean defect data using the two methods gave similar results marginally, but the LME model was chosen for further analysis of the data because it allows for random effects for covariates which vary substantially between subjects. Also, the LME model automatically takes care of problems of missing responses and has the flexibility of fitting a wide variety of correlation structures for the within-patient errors and for the random effects. In particular, the data show a large between-patient variation in the baseline mean defect (MD) and in the decay pattern of MD over time. Using the LME model, we can better model the MD data by allowing different intercepts and decay rates for different patients.
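To make the idea of patient-specific intercepts and decay rates concrete, the sketch below simulates MD-like trajectories under a random-intercept, random-slope model. All parameter values are illustrative and are not estimates from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative population parameters (not estimates from the study):
beta0, beta1 = -8.0, -0.001      # mean baseline MD (dB) and mean decay rate (dB/day)
sd_b0, sd_b1 = 3.0, 0.0008       # between-patient SDs of intercept and slope
sd_e = 0.5                       # within-patient error SD
times = np.arange(0.0, 1500.0, 100.0)  # visit days since randomization

def simulate_patient():
    # Each patient gets their own intercept and decay rate, as in an
    # LME model with a random intercept and a random slope.
    b0 = rng.normal(0.0, sd_b0)
    b1 = rng.normal(0.0, sd_b1)
    return (beta0 + b0) + (beta1 + b1) * times + rng.normal(0.0, sd_e, times.size)

trajectories = np.array([simulate_patient() for _ in range(97)])
```

The simulated curves share a common mean trend but differ in starting level and slope, mimicking the large between-patient variation described above.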
4.1 The Linear Mixed Effects Model

The linear mixed effects model that was fitted to the data was described in Laird and Ware [23]. For individual i, the model has the form

    Y_i = X_i β + Z_i b_i + e_i,    (4.1)

where Y_i is the vector of responses; β is the vector of fixed effects, which are constant across subjects; b_i is the vector of random effects, and is independent of b_j for i ≠ j; X_i and Z_i are the design matrices for the fixed and random effects, respectively; and e_i is the vector of within-subject errors, and is independent of e_j for i ≠ j. The model in Equation (4.1) has two major components: the mean structure X_i β and the covariance structure, defined in terms of the distribution of the random effects and the within-subject errors. The vectors of random effects, the b_i's, are each assumed to be normally distributed with mean 0 and covariance matrix D, and are mutually independent of the e_i's. The vector of within-subject errors, e_i, also follows the normal distribution with mean 0 and covariance matrix R_i of dimension n_i × n_i, where n_i is the number of observations for individual i. The unknown parameters in R_i do not depend upon i. It can be shown that marginally, Y_i is independently normally distributed with mean X_i β and covariance matrix V(Y_i) = R_i + Z_i D Z_i^T. The random effects introduce an extra component of variation, Z_i D Z_i^T, to the response variable. Furthermore, even in the simplest case where R_i = σ² I_{n_i}, the components of Y_i are correlated in the presence of random effects. Estimation of the fixed and random effects, and of the parameters of the covariance structures, can be based on least squares and maximum likelihood methods, or on an empirical Bayes methodology. Details of the estimation procedure were discussed in Laird and Ware [23]. The maximum likelihood (ML) and the restricted maximum likelihood (REML) methods are by far the most popular choices of estimation procedures.
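The marginal covariance V(Y_i) = R_i + Z_i D Z_i^T can be illustrated numerically. The sketch below uses an invented subject with three visits and a random intercept and slope; all numbers are illustrative.

```python
import numpy as np

# Numerical sketch of V(Y_i) = R_i + Z_i D Z_i^T for one subject with
# n_i = 3 visits and a random intercept and slope (illustrative numbers).
t = np.array([0.0, 1.0, 2.0])              # visit times
Z = np.column_stack([np.ones_like(t), t])  # design for (intercept, slope)
D = np.array([[1.0, 0.2],                  # covariance of the random effects
              [0.2, 0.5]])
R = 0.3 * np.eye(3)                        # simplest case: sigma^2 * I

V = R + Z @ D @ Z.T
# Although R is diagonal (independent within-subject errors), V has
# nonzero off-diagonal entries: the shared random effects correlate a
# subject's repeated measurements.
```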
REML estimates are often more efficient, as REML estimation adjusts for the degrees of freedom used in the estimation of the parameters. However, when the total number of observations, Σ_{i=1}^{m} n_i (where m is the number of subjects), is much larger than the number of unknown parameters to be estimated, the ML and the REML methods give very close parameter estimates.

4.2 Application to the Mean Defect Data

One of the major study questions which will be addressed in this thesis is whether a 30% reduction of IOP from the baseline value successfully slows down the rate of generalized visual field loss, as measured by the rate of decay of the mean defect. Moreover, the mean defect levels of the control and the treated groups are also of interest, because a more negative level is indicative of a higher severity of normal tension glaucoma.

Preliminary plots of the individual patients' mean defect trajectories for the control and treated groups separately (see Figures 4.1 and 4.2) suggest a general trend of depression of the MD over time. Although the MD level within each patient tends to show moderate fluctuation, it does not seem too unreasonable to assume a linear model for the MD data of each of the individuals. Furthermore, we observed a large between-patient variation in the baseline MD readings at randomization and in the rate of change of MD over time. To take into account this large variation across patients, we fitted the Laird and Ware linear mixed effects (LME) model given in Equation (4.1) to the mean defect data, and included as random effects the intercept and the time-slope covariate. The response variable was the MD measured over time.

[Figure 4.1: The observed MD trajectories over time from randomization (in days) for the 53 control patients in the data set.]

[Figure 4.2: The observed MD trajectories over time from randomization (in days) for the 44 treated patients in the data set.]
Covariates that comprised the fixed effects were

• the times of repeated MD measurements
• gender
• group membership
• age at baseline
• baseline MD
• baseline IOP

In both the intent-to-treat and the per-protocol analyses, the interaction between time and group was also included in order to assess any difference in the decay rate of the MD between the two treatment groups. A significant time × group interaction effect would imply a significantly different rate of change in MD, and hence a different rate of disease progression, between the control and the treated groups.

Various covariance structures, including the independence model, the first-order continuous autoregressive model (CAR1, which is equivalent to the exponential model) and the Gaussian model, were fitted to the within-patient errors in the preliminary analysis of the data. The first-order continuous autoregressive model was found to give the best fit. For the random effects, an unstructured covariance matrix was assumed.

Moreover, there is a lower limit below which the MD seldom falls in normal tension glaucoma. Patients whose MD levels at baseline are close to the lower limit might show a different trend of depression, and in particular a slower decay rate, than those who have less negative MD at baseline, because the closer the MD is to the lower limit, the less room is left for depression of the generalized visual field. This phenomenon is referred to as the "floor effect". If the treated patients and the control patients in our study began with rather different initial MD levels, a difference between the mean MD decay rates of the two groups might be the result of a potential floor effect rather than of a significant treatment effect.
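Returning briefly to the within-patient error structure: the CAR1 (exponential) model sets the correlation between two errors to a parameter phi raised to the absolute time lag between the corresponding visits. The sketch below, with invented visit days and an invented phi, shows how this correlation decays for irregularly spaced follow-up.

```python
import numpy as np

def car1_corr(times, phi):
    """CAR1 correlation matrix: corr(e_j, e_k) = phi ** |t_j - t_k|,
    for 0 < phi < 1, defined at arbitrary (irregular) visit times."""
    times = np.asarray(times, dtype=float)
    lags = np.abs(times[:, None] - times[None, :])
    return phi ** lags

# Invented visit days and phi, for illustration only:
visit_days = [0.0, 30.0, 90.0, 400.0]
C = car1_corr(visit_days, phi=0.99)
# Correlation shrinks toward zero as the gap between visits grows,
# which is why CAR1 suits irregularly timed repeated measurements.
```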
We checked for the presence of such a floor effect in our data by including an additional covariate in the LME model in both the ITT and the PP analyses. The covariate was an indicator variable for whether a baseline MD level was above or below -12 dB. This particular value was chosen because a patient with an MD less than -12