Open Collections

UBC Theses and Dissertations

Causal inference approaches for dealing with time-dependent confounding in longitudinal studies, with… Karim, Mohammad Ehsanul 2015

Causal Inference Approaches for Dealing with Time-dependent Confounding in Longitudinal Studies, with Applications to Multiple Sclerosis Research

by

Mohammad Ehsanul Karim

B.Sc., University of Dhaka, 2004
M.S., University of Dhaka, 2005
M.Sc., The University of British Columbia, Vancouver, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate and Postdoctoral Studies
(Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

January 2015

© Mohammad Ehsanul Karim 2015

Abstract

Marginal structural Cox models (MSCMs) have gained popularity in analyzing longitudinal data in the presence of ‘time-dependent confounding’, primarily in the context of HIV/AIDS and related conditions. This thesis is motivated by issues arising in connection with dealing with time-dependent confounding while assessing the effects of beta-interferon drug exposure on disease progression in relapsing-remitting multiple sclerosis (MS) patients in the real-world clinical practice setting. In the context of this chronic, yet fluctuating disease, MSCMs were used to adjust for the time-varying confounders, such as MS relapses, as well as baseline characteristics, through the use of inverse probability weighting (IPW). Using a large cohort of 1,697 relapsing-remitting MS patients in British Columbia, Canada (1995-2008), no strong association between beta-interferon exposure and the hazard of disability progression was found (hazard ratio 1.36, 95% confidence interval 0.95, 1.94). We also investigated whether it is possible to improve the MSCM weight estimation techniques by using statistical learning methods, such as bagging, boosting and support vector machines. Statistical learning methods require fewer assumptions and have been found to estimate propensity scores with better covariate balance. As propensity scores and IPWs in MSCM are functionally related, we also studied the usefulness of statistical learning methods via a series of simulation studies. The IPWs estimated from the boosting approach were associated with less bias and better coverage compared to the IPWs estimated from the conventional logistic regression approach. Additionally, two alternative approaches, prescription time-distribution matching (PTDM) and the sequential Cox approach, proposed in the literature to deal with immortal time bias and time-dependent confounding respectively, were compared via a series of simulations. The PTDM approach was found to be not as effective as the Cox model (with treatment considered as a time-dependent exposure) in minimizing immortal time bias. The sequential Cox approach was, however, found to be an effective method to minimize immortal time bias, but not as effective as a MSCM, in the presence of time-dependent confounding. These methods were used to re-analyze the MS dataset to show their applicability. The findings from the simulation studies were also used to guide the data analyses.

Preface

I wrote this dissertation with direction and input from Drs. P. Gustafson, J. Petkau and H. Tremlett. These studies were approved by the University of British Columbia’s Clinical Research Ethics board (study number: H08-01544).

Chapter 2 is a version of the pre-copy-editing, author-produced PDF of an article accepted for publication in ‘American Journal of Epidemiology’ following peer review. The definitive publisher-authenticated version [Karim M. E., Gustafson P., Petkau J., Zhao Y., Shirani A., Kingwell E., Evans C., van der Kop M., Oger J., and Tremlett H. Marginal Structural Cox Models for Estimating the Association Between β-Interferon Exposure and Disease Progression in a Multiple Sclerosis Cohort. American Journal of Epidemiology, 180(2):160-171, 2014, Oxford University Press] is available online at: http://aje.oxfordjournals.org/cgi/content/abstract/kwu125. As part of my copyright agreement with Oxford University Press I have retained the right, after publication, to include this article in full or in part in my thesis or dissertation, provided that this is not published commercially. I was the lead investigator, responsible for concept formation, statistical analyses and interpretations of the data, as well as drafting of the manuscript. P. Gustafson, J. Petkau and H. Tremlett were supervising this project and were involved throughout the project in formation of the study concept and design and manuscript composition. H. Tremlett, P. Gustafson, E. Kingwell, J. Petkau, Y. Zhao and M. van der Kop obtained funding, and A. Shirani, C. Evans, E. Kingwell, J. Oger, and H. Tremlett provided administrative, technical, or material support. A. Shirani, E. Kingwell, M. van der Kop, J. Oger, and H. Tremlett were involved in data acquisition and P. Gustafson, J. Petkau and Y. Zhao contributed in guiding the statistical analyses. For this manuscript, I was responsible for all of the research analysis and writing the initial draft, but all co-authors were involved in the improvement of the manuscript via a number of critical revisions.

I was the lead investigator for the projects described in Chapters 3 and 4. I was responsible for all major areas of concept formation, design of the studies and analyses, as well as the manuscript composition. P. Gustafson, J. Petkau and H. Tremlett were the supervisors on these projects and were involved throughout the project in concept formation and manuscript edits.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
  1.1 A Brief Overview of Causal Inference Frameworks
    1.1.1 Potential Outcomes Framework
    1.1.2 Assumptions
    1.1.3 Models for Longitudinal Settings
    1.1.4 Inverse Probability of Treatment Weights
    1.1.5 Role of Causal Diagrams
    1.1.6 Time-dependent Confounders
  1.2 Models to Estimate the Causal Effect
    1.2.1 In the Presence of Time-dependent Confounders
    1.2.2 In the Absence of a Time-dependent Confounder
    1.2.3 In the Presence of Immortal Time Bias
  1.3 Organization of the Dissertation
2 Marginal Structural Cox Models for Estimating the Effect of Beta-interferon Exposure in Delaying Disease Progression in a Multiple Sclerosis Cohort
  2.1 Introduction
  2.2 Materials and Methods
    2.2.1 Study Population and Measurements
    2.2.2 Statistical Methods
  2.3 Results
    2.3.1 Time-dependent Weights
    2.3.2 The Causal Effect of β-IFN
    2.3.3 IPTC Weighting for Estimation of Survival Curves
  2.4 Discussion
3 The Performance of Statistical Learning Approaches to Construct Inverse Probability Weights in Marginal Structural Cox Models: A Simulation-based Comparison
  3.1 Introduction
  3.2 Marginal Structural Cox Model (MSCM)
    3.2.1 Estimation of ψ1 from MSCM
    3.2.2 Estimation Methods of IPWs
    3.2.3 IPW schemes
    3.2.4 Fitting Weight Models to Estimate IPW
  3.3 Design of Simulations
    3.3.1 Simulation Specifications
    3.3.2 Performance Metrics
  3.4 Simulation Results
    3.4.1 IPW Summary
    3.4.2 Comparing IPW Estimation Approaches
    3.4.3 Properties From Smaller Samples
    3.4.4 When More Events are Available
    3.4.5 Computational Time
  3.5 Empirical Multiple Sclerosis Application
  3.6 Discussion
4 Comparison of Statistical Approaches Dealing with Immortal Time Bias in Drug Effectiveness Studies
  4.1 Introduction
  4.2 Methods
    4.2.1 Notation
    4.2.2 Analysis Approaches
    4.2.3 Design of Simulation
    4.2.4 Simulation Specifications
    4.2.5 Analytic Models Used
    4.2.6 Performance Metrics
  4.3 Application in Multiple Sclerosis
    4.3.1 Analytic Models Used
  4.4 Simulation Results
    4.4.1 Description of the Simulated Data
    4.4.2 Rare Event Condition
    4.4.3 When More Events are Available
  4.5 Results from Multiple Sclerosis Data Analysis
  4.6 Discussion
5 Conclusion
  5.1 Summary of the Main Results
  5.2 Implications
  5.3 Future Research
Bibliography
Appendices
A Appendix for Chapter 2
  A.1 Rationale Behind Hypothesizing that Cumulative Relapses are Lying on the Causal Path of β-IFN and Disability Progression
  A.2 Rationale Behind Using Marginal Structural Cox Model (MSCM) Instead of a Cox Model
  A.3 Approximation of the Marginal Structural Cox Model
  A.4 Weight Models Used in the Data Analysis
  A.5 MSCM fitting in R
  A.6 Exclusion Criteria and Summary of Selected Cohorts
  A.7 Sensitivity Analyses
    A.7.1 Sensitivity Analysis: Impact of Weight Trimming
    A.7.2 Sensitivity Analysis: Impact of More Restrictive Eligibility Criteria
    A.7.3 Sensitivity Analysis: Impact of the Cumulative Exposure to β-IFN
    A.7.4 Sensitivity Analysis: Impact of the Cumulative Number of Relapses in the Last Year
B Appendix for Chapter 3
  B.1 Propensity Scores
  B.2 Model Specification in MSCM
  B.3 Model Specifications for Estimating the Weights
  B.4 Implementation of the Statistical Learning Approaches in R
  B.5 Post-estimation Weight Variability Reduction Techniques
  B.6 Pseudocode for MSCM Data Simulation
  B.7 Describing the Characteristics of the Weights in a Simulated Population
  B.8 Additional Simulation Results
    B.8.1 Results from Smaller Samples n = 300
    B.8.2 Results from the Scenario When More Events are Available for n = 2,500
  B.9 Supporting Results from the Empirical MS Application
C Appendix for Chapter 4
  C.1 Bias Due to Incorrect Handling of Immortal Time
    C.1.1 Misclassifying Immortal Time
    C.1.2 Excluding Immortal Time
  C.2 Illustration of the Prescription Time-distribution Matching Approach
  C.3 Constructing a Mini-trial in the Sequential Cox Approach
  C.4 Implementation of the Sequential Cox Approach in R
  C.5 Survival Data Simulation via Permutation Algorithm
  C.6 Additional Simulation Results
    C.6.1 When More Events are Available
  C.7 Additional MS Data Analysis
    C.7.1 Prescription Time-distribution Matching
    C.7.2 Sequential Cox Approach

List of Tables

1.1 An illustration of defining treatment effect in terms of potential outcomes
1.2 An illustration of defining treatment effect in terms of observed outcomes
1.3 Outcomes after stratum specific averages are imputed in the cells where the outcomes are missing
2.1 Different versions of the IPTC weights and the corresponding causal effect of β-IFN on the hazard of reaching sustained EDSS 6 for MS patients from BC (1995-2008).
2.2 The marginal structural Cox model (MSCM) fit with the normalized stabilized IPTC weights sw(n) for time to sustained EDSS 6 to estimate the causal effect of β-IFN treatment for multiple sclerosis (MS) patients from British Columbia, Canada (1995-2008). The model was also adjusted for the baseline covariates EDSS, age, disease duration and sex.
2.3 Estimates of effect of β-IFN treatment on time to sustained EDSS 6 for MS patients from British Columbia, Canada (1995-2008) using different analytical approaches.
2.4 Sensitivity analysis to assess the impact of EDSS as an additional time-varying confounder: The MSCM fit with the normalized stabilized IPTC weights sw(n) for time to sustained EDSS 6 to estimate the causal association between β-IFN treatment for patients with relapsing-onset MS, British Columbia, Canada (1995-2008)
2.5 The impact of truncation of the w(n) on the estimated causal effect of β-IFN on reaching sustained EDSS 6 for MS patients from British Columbia, Canada (1995-2008).
3.1 Summaries of the (untruncated) weights estimated by different methods (l = logistic, b = bagging, svm = SVM, gbm = boosting) under different weighting schemes (w = unstabilized, w(n) = unstabilized normalized, sw = stabilized, sw(n) = stabilized normalized) from the simulation study with a large (25,000) number of subjects, each with up to 10 visits, under the rare event condition.
3.2 Time required to compute IPWs using various approaches
4.1 Description of the analytic methods.
4.2 Three simulation settings under consideration.
4.3 Characteristics of three simulation settings under consideration.
4.4 Comparison of the analytical approaches to adjust for immortal time bias from simulation-I (one baseline covariate and time-dependent treatment exposure) of 1,000 datasets, each containing 2,500 subjects followed for up to 10 time-intervals.
4.5 Comparison of the analytical approaches to adjust for immortal time bias from simulation-II (one baseline covariate, one time-dependent covariate and time-dependent treatment exposure) of 1,000 datasets, each containing 2,500 subjects followed for up to 10 time-intervals.
4.6 Comparison of the analytical approaches to adjust for immortal time bias from simulation-III (one time-dependent confounder and time-dependent treatment exposure) of 1,000 datasets, each containing 2,500 subjects followed for up to 10 time-intervals.
4.7 Summary of the estimated parameters from the relapsing-onset multiple sclerosis (MS) patients’ data from British Columbia, Canada (1995-2008).
A.1 Estimated coefficients from the treatment model (denominator of sw^T_it) for patients with relapsing-onset multiple sclerosis (MS), British Columbia, Canada (1995-2008)
A.2 Characteristics of the selected cohort of patients with relapsing-onset multiple sclerosis (MS), British Columbia, Canada (1995-2008).
A.3 The marginal structural Cox model (MSCM) fit with the normalized stabilized IPTC weights sw(n) for time to sustained EDSS 6 to estimate the causal effect of β-IFN treatment for patients with relapsing-onset multiple sclerosis (MS), British Columbia, Canada (1995-2008) selected by more restrictive eligibility criteria. The model was also adjusted for baseline covariates EDSS, age, disease duration and sex.
A.4 The marginal structural Cox model (MSCM) fit with the normalized stabilized IPTC weights sw(n) for time to sustained EDSS 6 to estimate the causal effect of cumulative exposure to β-IFN over the last two years for patients with relapsing-onset multiple sclerosis (MS), British Columbia, Canada (1995-2008). The model was also adjusted for baseline covariates EDSS, age, disease duration and sex.
A.5 The marginal structural Cox model (MSCM) fit with the normalized stabilized IPTC weights sw(n) for time to sustained EDSS 6 to estimate the causal effect of cumulative exposure to β-IFN over the last two years for patients with relapsing-onset multiple sclerosis (MS), British Columbia, Canada (1995-2008) while considering the cumulative number of relapses in the last year as the time-varying confounder. The model was also adjusted for baseline covariates EDSS, age, disease duration and sex.
B.1 Summaries of the truncated weights estimated by logistic regression (l = logistic) under different weighting schemes (w = unstabilized, w(n) = unstabilized normalized, sw = stabilized, sw(n) = stabilized normalized) from the simulation study with a large (25,000) number of subjects, each with up to 10 visits, under the rare event condition.
B.2 Summaries of the truncated weights estimated by bagging approach (b = bagging) under different weighting schemes (w = unstabilized, w(n) = unstabilized normalized, sw = stabilized, sw(n) = stabilized normalized) from the simulation study with a large (25,000) number of subjects, each with up to 10 visits, under the rare event condition.
B.3 Summaries of the truncated weights estimated by SVM approach (svm = SVM) under different weighting schemes (w = unstabilized, w(n) = unstabilized normalized, sw = stabilized, sw(n) = stabilized normalized) from the simulation study with a large (25,000) number of subjects, each with up to 10 visits, under the rare event condition.
B.4 Summaries of the truncated weights estimated by boosting approach (gbm = boosting) under different weighting schemes (w = unstabilized, w(n) = unstabilized normalized, sw = stabilized, sw(n) = stabilized normalized) from the simulation study with a large (25,000) number of subjects, each with up to 10 visits, under the rare event condition.
B.5 The impact of truncation of the sw(n) generated via logistic regression on the estimated causal effect of β-IFN on the hazard of reaching sustained EDSS 6 for BC MS patients (1995-2008).
B.6 The impact of truncation of the sw(n) generated via bagging on the estimated causal effect of β-IFN on the hazard of reaching sustained EDSS 6 for BC MS patients (1995-2008).
B.7 The impact of truncation of the sw(n) generated via SVM on the estimated causal effect of β-IFN on the hazard of reaching sustained EDSS 6 for BC MS patients (1995-2008).
B.8 The impact of truncation of the sw(n) generated via boosting on the estimated causal effect of β-IFN on the hazard of reaching sustained EDSS 6 for BC MS patients (1995-2008).
C.1 Comparison of the analytical approaches to adjust for immortal time bias from simulation-I (one baseline covariate and time-dependent treatment exposure) of 1,000 datasets, each containing 2,500 subjects followed for up to 10 time-intervals (frequent event case λ0 = 0.10).
C.2 Comparison of the analytical approaches to adjust for immortal time bias from simulation-II (one baseline covariate, one time-dependent covariate and time-dependent treatment exposure) of 1,000 datasets, each containing 2,500 subjects followed for up to 10 time-intervals (frequent event case).
C.3 Comparison of the analytical approaches to adjust for immortal time bias from simulation-III (one time-dependent confounder and time-dependent treatment exposure) of 1,000 datasets, each containing 2,500 subjects followed for up to 10 time-intervals (frequent event case).
C.4 Mean (SD) of the estimated parameters using PTDM from the MS example with 1,000 different starting seed values.
C.5 Estimated hazard ratio using the sequential Cox approach to estimate the causal effect of β-IFN on time to sustained EDSS 6 for patients with relapsing-onset multiple sclerosis (MS), British Columbia, Canada (1995-2008), when IPCWs are calculated from the combined dataset of all mini-trials.

List of Figures

1.1 An illustration of defining treatment effect in terms of potential outcomes and observations
1.2 Illustration of point-treatment and two time point treatments situation
1.3 Relationships among exposure E, outcome variable D and a covariate C in a directed acyclic graph
1.4 An illustration of immortal time, i.e., a delay or wait period that may exist before a subject begins to receive a treatment in an observational drug effectiveness study
2.1 Representation of the hypothesized causal relationships in the treatment of MS with three time points j = 0, 1, 2.
2.2 Number of patients at risk of reaching sustained EDSS 6 during the first month of each follow-up year after baseline. Failure to continue to the next risk set results from either censoring or reaching sustained EDSS 6. Analyses were performed by month, but the plot is drawn by year for simplicity.
2.3 Distribution of various IPTC weighting schemes for each year of follow-up (instead of month for better visual display). The means are indicated by ∗ in each boxplot. Note that the plots do not have identical scales on the vertical axes.
2.4 IPTC weight adjusted Kaplan-Meier-type survival curves for the effect of β-IFN on time to reaching sustained EDSS 6 for multiple sclerosis (MS) patients from British Columbia, Canada (1995-2008). The truncated weights are derived from the normalized unstabilized IPTC weights (w(n)) so that the survival probabilities and HRs are marginal estimates with causal interpretation.
3.1 Causal diagram depicting the dependencies in the marginal structural Cox model (MSCM) data generation algorithm.
3.2 Bias of MSCM estimate ψ̂1 under different IPW estimation approaches when the large weights are truncated with increased levels in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times.
3.3 Empirical standard deviation of MSCM estimate ψ̂1 under different IPW estimation approaches when the large weights are truncated with increased levels in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times.
3.4 Average model-based standard error of MSCM estimate ψ̂1 under different IPW estimation approaches when the large weights are truncated with increased levels in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times.
3.5 Mean squared error of MSCM estimate ψ̂1 under different IPW estimation approaches when the large weights are truncated with increased levels in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times.
3.6 The coverage probability (cp) of model-based nominal 95% confidence intervals based on the MSCM estimate ψ̂1 under different IPW estimation approaches when the large weights are truncated with increased levels in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times.
3.7 Performance of stabilized normalized weights estimated from different IPW estimation approaches for MSCM analysis in a multiple sclerosis study.
4.1 Matched wait periods (in years) from prescription time-distribution matching approach in the relapsing-onset multiple sclerosis (MS) cohort from British Columbia, Canada (1995-2008).
B.1 Bias of MSCM estimate ψ̂1 under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 300 subjects observed at most 10 times under the rare event condition.
B.2 Empirical standard deviation of MSCM estimate ψ̂1 under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 300 subjects observed at most 10 times under the rare event condition.
B.3 Average model-based standard error of MSCM estimate ψ̂1 under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 300 subjects observed at most 10 times under the rare event condition.
B.4 Mean squared error of MSCM estimate ψ̂1 under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 300 subjects observed at most 10 times under the rare event condition.
B.5 The coverage probability (cp) of model-based nominal 95% confidence intervals based on the MSCM estimate ψ̂1 under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 300 subjects observed at most 10 times under the rare event condition.
B.6 Bias of MSCM estimate ψ̂1 under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times when the event rate is more frequent.
B.7 Empirical standard deviation of MSCM estimate ψ̂1 under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times when the event rate is more frequent.
B.8 Average model-based standard error of MSCM estimate ψ̂1 under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times when the event rate is more frequent.
B.9 Mean squared error of MSCM estimate ψ̂1 under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times when the event rate is more frequent.
B.10 The coverage probability (cp) of model-based nominal 95% confidence intervals based on the MSCM estimate under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times when the event rate is more frequent.
B.11 Performance of stabilized normalized weights generated by different statistical learning approaches for MSCM analysis to estimate log-hazard ψ1 in a multiple sclerosis study.
C.1 Risk ratios of misclassified immortal time (RR′), excluding immortal time (RR′′) and PTDM (RR′′) methods compared to that of a time-dependent analysis RR in terms of various fraction of immortal time f and ratio of person-times under no treatment versus under treatment r under the assumption of constant hazard.
C.2 An illustration of prescription time-distribution matching
C.3 An illustration of the sequential Cox approach
C.4 Estimated hazard ratio from the PTDM method to estimate the causal effect of β-IFN on time to sustained EDSS 6 for patients with relapsing-onset multiple sclerosis (MS), British Columbia, Canada (1995-2008)
C.5 Density plots of the estimated IPC weights from the MS data (estimated from each mini-trial separately) in all the reference intervals using the sequential Cox approach
C.6 Density plots of the estimated IPC weights from the MS data (estimated from the aggregated data of all mini-trials) in all the reference intervals using the sequential Cox approach

Acknowledgements

It gives me great pleasure to express my sincere gratitude and deepest appreciation to my supervisors: Professors Paul Gustafson and John Petkau. Their mentorship, valuable suggestions regarding my research, financial support, prompt review of my writing and constant inspiration greatly helped to advance my research training and to complete this dissertation.
It trulyhas been an honor and a privilege to work with both of them.I would like to thank my supervisory committee member, Associate Pro-fessor Helen Tremlett (Neurology), for giving me the opportunity to workon the BeAMS (Benefits and Adverse Effects of Beta-interferon for Mul-tiple Sclerosis) project and for including me in her family of impassionedcollaborators. I am also grateful for her invaluable research feedback andcareful draft revisions. I acknowledge and thank my research collaboratorsDrs. Yinshan Zhao, Afsaneh Shirani, Elaine Kingwell, Charity Evans, JoelOger and Mia van der Kop for their support and patience.Many thanks to my PhD comprehensive committee members, ProfessorsRollin Brant and Lang Wu, for their excellent feedback. I would like toexpress my sincere gratitude and appreciation to Professor Erica Moodie,external examiner from McGill University (Department of Epidemiology,Biostatistics and Occupational Health), for her valuable comments. I wouldlike to thank Professors Hubert Wong (Health Care and Epidemiology) andRollin Brant (Statistics) for serving as the university examiners. I wouldalso like to thank everyone in the Department of Statistics, from faculty tostaff and fellow graduate students, for making my PhD program such anenriching and pleasant experience.I would like to acknowledge the Multiple Sclerosis (MS) Society of CanadaxxiAcknowledgementsfor the PhD Research Studentship as well as travel awards to attend confer-ences, the endMS Network for travel awards to attend various conferencesand summer schools, the University of British Columbia (UBC) for the Ph.D.Tuition Fee Award, Graduate Student Travel Award and Faculty of ScienceGraduate Award and the Department of Statistics for its Graduate TeachingAssistant Award. 
I am also grateful for the travel grants from the Pacific Institute for the Mathematical Sciences (PIMS).

Many thanks to the MS patients for their participation in research and the BCMS neurologists who contributed to the study through patient examination and data collection (current members listed here by primary clinic): UBC MS Clinic: A. Traboulsee, MD, FRCPC (UBC Hospital MS Clinic Director and Head of the UBC MS Programs); A-L. Sayao, MD, FRCPC; V. Devonshire, MD, FRCPC; S. Hashimoto, MD, FRCPC (UBC and Victoria MS Clinics); J. Hooge, MD, FRCPC (UBC and Prince George MS Clinics); L. Kastrukoff, MD, FRCPC (UBC and Prince George MS Clinics); J. Oger, MD, FRCPC. Kelowna MS Clinic: D. Adams, MD, FRCPC; D. Craig, MD, FRCPC; S. Meckling, MD, FRCPC. Prince George MS Clinic: L. Daly, MD, FRCPC. Victoria MS Clinic: O. Hrebicek, MD, FRCPC; D. Parton, MD, FRCPC; K. Pope, MD, FRCPC. We also thank P. Rieckmann, MD (Sozialstiftung Bamberg Hospital, Germany) for helpful revisions of the original CIHR grant. The views expressed in this dissertation do not necessarily reflect the views of each neurologist acknowledged.

On a personal note, I am eternally grateful to my parents who have provided me with abundance of freedom and opportunities all my life, to my sister and brother for their words of wisdom and to my wife Suborna for her love.

Vancouver, Canada
Mohammad Ehsanul Karim
January 19, 2015

Dedication

To my parents,
and to my wife, Suborna.

Chapter 1
Introduction

In most scientific research, establishing causation is the ultimate goal. Researchers usually view predictive models with a causal interpretation. Without a sense of causality in the researcher’s mind, the statistical measures are merely measures of association among various variables under consideration. Simple association resulting from a poorly-designed study may sometimes be misleading or inadequate in assessing the causal relationship between variables.
This is especially true in the field of epidemiology, when evaluating disease-exposure relationships from observational data. Finding the cause of a health related outcome is usually the focus. Statistical association is merely an intermediate step in the process. This led researchers to redefine various statistical and epidemiologic concepts in terms of causal mechanisms.

Multiple sclerosis (MS) is a chronic disease that affects the central nervous system, affecting an estimated 2.3 million people worldwide [1]. Beta-interferons (β-IFNs) are the most commonly prescribed immunomodulatory drugs for treating relapsing-onset MS patients. The drugs were primarily licensed or approved for use in MS based on demonstrated, but partial, effects from key short-term clinical trials [2–6]. Further, a number of side effects are associated with the use of these drugs. Since these β-IFN drugs are expensive and patients may be on the drugs for many years, long-term effectiveness of β-IFN is of great interest. However, MS is a life-long disease and appropriately following these patients for such a long time in order to assess drug effectiveness in an ‘exposed’ versus ‘non-exposed’ group of individuals is not practical, from either the ethical or cost perspective. This study has access to one of the largest population-based MS databases in the world. Utilizing this retrospective cohort of British Columbia (BC) MS patients provides the opportunity to investigate long-term effectiveness of β-IFN under the ‘real-world’ clinical practice setting.

A causal interpretation of a treatment effect estimate obtained from observational data requires additional considerations and assumptions.
Conventional statistical analysis tools often fail to produce unbiased estimates in the absence of randomization and the presence of time-varying confounders. In this dissertation, an MS research question motivates the improvement of analysis methodologies for the observational study data from a much broader perspective. The overarching goal of this dissertation is to assess, improve and compare the statistical tools that deal with the time-dependent confounding while estimating the causal effect of a treatment in the context of observational longitudinal drug-effectiveness studies. However, it is important to understand the assumptions behind these causal inference tools. In the following sections, we briefly illustrate the basic framework and the key assumptions that facilitate causal inference.

1.1 A Brief Overview of Causal Inference Frameworks

1.1.1 Potential Outcomes Framework

The ideas of causality date back to 1748 in the work of the philosopher Hume [7]. He defined causality in plain English as follows: “Cause is an event followed by another (effect)”, and “Without the first event (cause), the second (effect) would never happen”, which formed the foundation for the sufficient and necessary conditions for causality. These intuitive causal definitions (especially the second) were translated into statistical language by Neyman et al.
[8] using the ‘potential outcome’ notion to define ‘causal effect’ (Neyman’s framework).

Table 1.1: An illustration of defining treatment effect in terms of potential outcomes

| Subject $i$ | Covariate $L$ | Treatment ($A = 0, 1$) | Outcome $Y_{A=1}$ | Outcome $Y_{A=0}$ | Causal effect $Y_{A=1} - Y_{A=0}$ |
|---|---|---|---|---|---|
| 1 | $l_1$ | Both† | $Y_{A=1,1}$ | $Y_{A=0,1}$ | $Y_{A=1,1} - Y_{A=0,1}$ |
| 2 | $l_1$ | Both | $Y_{A=1,2}$ | $Y_{A=0,2}$ | $Y_{A=1,2} - Y_{A=0,2}$ |
| 3 | $l_1$ | Both | $Y_{A=1,3}$ | $Y_{A=0,3}$ | $Y_{A=1,3} - Y_{A=0,3}$ |
| 4 | $l_1$ | Both | $Y_{A=1,4}$ | $Y_{A=0,4}$ | $Y_{A=1,4} - Y_{A=0,4}$ |
| 5 | $l_1$ | Both | $Y_{A=1,5}$ | $Y_{A=0,5}$ | $Y_{A=1,5} - Y_{A=0,5}$ |
| Conditional summary | | | $E(Y_{A=1} \mid L = l_1)$ | $E(Y_{A=0} \mid L = l_1)$ | $E(Y_{A=1} - Y_{A=0} \mid L = l_1)$ |
| 6 | $l_2$ | Both | $Y_{A=1,6}$ | $Y_{A=0,6}$ | $Y_{A=1,6} - Y_{A=0,6}$ |
| 7 | $l_2$ | Both | $Y_{A=1,7}$ | $Y_{A=0,7}$ | $Y_{A=1,7} - Y_{A=0,7}$ |
| 8 | $l_2$ | Both | $Y_{A=1,8}$ | $Y_{A=0,8}$ | $Y_{A=1,8} - Y_{A=0,8}$ |
| 9 | $l_2$ | Both | $Y_{A=1,9}$ | $Y_{A=0,9}$ | $Y_{A=1,9} - Y_{A=0,9}$ |
| 10 | $l_2$ | Both | $Y_{A=1,10}$ | $Y_{A=0,10}$ | $Y_{A=1,10} - Y_{A=0,10}$ |
| Conditional summary | | | $E(Y_{A=1} \mid L = l_2)$ | $E(Y_{A=0} \mid L = l_2)$ | $E(Y_{A=1} - Y_{A=0} \mid L = l_2)$ |
| Marginal summary | | | $E(Y_{A=1})$ | $E(Y_{A=0})$ | $E(Y_{A=1}) - E(Y_{A=0})$ |

† Both treatments $A = 0$ and $A = 1$ are applied on each of the subjects.

In Neyman’s framework, the causal effect is defined as the comparison of potential outcomes $Y_A$ under treatment ($A = 1$) and no treatment ($A = 0$) conditions, i.e., $Y_{A=1}$ versus $Y_{A=0}$ for a given unit $i$. This comparison can be measured in the form of a difference (additive scale), a ratio (multiplicative scale) or some other generalized contrast of a function of the potential outcomes, such as the simple average, median, hazard or cumulative distribution function. For example, the causal risk ratio can be defined as the ratio (contrast) of the means (function of potential outcome), $E(Y_{A=1}) / E(Y_{A=0})$. Deviation from the null value, zero for the difference and one for the ratio, would imply that there is a causal effect. In the hypothetical example shown in Table 1.1, each row corresponding to a given subject produces a causal effect. A conditional summary (say, average) causal effect can be calculated for each particular value of the covariate $L$.
A marginal summary (say, average) causal effect can be calculated unconditionally.

Fisher recognized the value of randomization in estimating the treatment effect from an experiment [9, 10]. He introduced Fisher’s sharp null hypothesis $H_0$ that under all treatment assignments ($A = 0, 1$), every unit $i$ would produce the same outcome, i.e., $Y_{A=1,i} = Y_{A=0,i}$, for completely randomized experiments. If this hypothesis $H_0$ is true, then there is no treatment effect in unit $i$ (Fisher’s framework).

In evaluating causation or causal effects, randomized experiments are the best choice. Their strength stems from the principle of randomization or lack of bias towards any covariate levels, as noted by Fisher. However, experimenting with medical treatments is not always feasible. Hence researchers may have to depend on observational studies to estimate the effect of a treatment despite the fact that observational studies are more prone to various kinds of biases. Providing a causal interpretation of an estimate obtained from observational data requires additional assumptions or conditions, under which we could imagine some form of chance mechanism was involved in the process of data collection. To make causal inference using nonrandomized data in a point-treatment situation (treatment is not time-dependent), Rubin extended the potential outcome notion (Neyman-Rubin framework) in a series of works [11–16]. If $A_i = 0, 1$ is the treatment assignment on unit $i$, the observed outcome $Y_i$ for unit $i$ can be expressed as $Y_i = A_i Y_{A=1,i} + (1 - A_i) Y_{A=0,i}$.

To estimate a causal effect for subject $i$, one needs the two outcomes $(Y_{A=1}, Y_{A=0})$. The obvious problem with using this hypothetical definition is that we can only observe a patient under one treatment at any given time, i.e., we can observe either $Y_{A=1}$ or $Y_{A=0}$ in Table 1.1.
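To make this notation concrete, the following minimal Python sketch simulates both potential outcomes for every unit, reveals only one of them through $Y_i = A_i Y_{A=1,i} + (1 - A_i) Y_{A=0,i}$, and compares the difference in observed group means with the average causal effect under complete randomization. The outcome probabilities, sample size, seed and function name are hypothetical choices made purely for illustration; they are not taken from the MS data.

```python
import random

def simulate_randomized_study(n=40_000, seed=1):
    """Simulate binary potential outcomes and a completely randomized treatment.

    All probabilities below are illustrative assumptions.
    """
    rng = random.Random(seed)
    sum_y1 = sum_y0 = 0          # totals of the (normally unobservable) potential outcomes
    treated, untreated = [], []  # observed outcomes by treatment arm
    for _ in range(n):
        covariate = rng.random() < 0.5                        # binary covariate L
        y1 = int(rng.random() < (0.5 if covariate else 0.3))  # potential outcome under A = 1
        y0 = int(rng.random() < (0.3 if covariate else 0.2))  # potential outcome under A = 0
        a = int(rng.random() < 0.5)                           # randomized, independent of (y1, y0)
        y = a * y1 + (1 - a) * y0                             # only one potential outcome is observed
        (treated if a else untreated).append(y)
        sum_y1 += y1
        sum_y0 += y0
    true_ace = (sum_y1 - sum_y0) / n                          # E(Y_{A=1}) - E(Y_{A=0}) in this sample
    estimate = sum(treated) / len(treated) - sum(untreated) / len(untreated)
    return true_ace, estimate

true_ace, estimate = simulate_randomized_study()
print(f"average causal effect: {true_ace:.3f}, difference in observed means: {estimate:.3f}")
```

In a real study the two `sum_y1` and `sum_y0` totals could never be computed, since each unit contributes only one of its two potential outcomes; the simulation keeps them only to show that randomization lets the observed contrast approximate the causal one.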
Crossover randomized experiments may provide a solution to the problem of observing only one potential outcome per unit under some conditions, but it is not possible to conduct such experiments for irreversible health outcomes. The need to deal with unobservable quantities (i.e., missingness of half of the outcomes) is considered as the fatal flaw of the potential outcome model [17] or the fundamental problem of causal inference [18].

Even though $Y_{A=1,i} - Y_{A=0,i}$, the causal effect for a particular unit $i$, cannot be identified due to missing information, the average causal effect, $E(Y_{A=1}) - E(Y_{A=0})$ from the two mutually exclusive groups $A = 1$ and $0$, can be estimated by $E(Y \mid A = 1) - E(Y \mid A = 0)$ if these two groups are similar in characteristics (as shown in Table 1.2 as an illustration). That is, for a binary outcome, the causal risk ratio defined by the unconditional or marginal expression $P(Y_{A=1} = 1) / P(Y_{A=0} = 1)$ can be estimated from the conditional expression or association measure $P(Y = 1 \mid A = 1) / P(Y = 1 \mid A = 0)$. Here the marginal ratio is based on the idea that both treatments are applied to the whole population. The conditional ratio is based on the idea that the treatment is applied to a part of the population and a mutually exclusive part of that population did not receive the treatment, as shown in Figure 1.1. To ensure these two groups are comparable, certain assumptions need to be made.

Table 1.2: An illustration of defining treatment effect in terms of observed outcomes

| Subject $i$ | Covariate $L$ | Treatment $A = 0$ or $1$ | Outcome $Y_{A=1}$ | Outcome $Y_{A=0}$ | Causal effect $Y_{A=1} - Y_{A=0}$ |
|---|---|---|---|---|---|
| 1 | $l_1$ | 1 | $Y_{A=1,1}$ | | |
| 2 | $l_1$ | 1 | $Y_{A=1,2}$ | | |
| 3 | $l_1$ | 0 | | $Y_{A=0,3}$ | |
| 4 | $l_1$ | 0 | | $Y_{A=0,4}$ | |
| 5 | $l_1$ | 0 | | $Y_{A=0,5}$ | |
| Conditional summary | | | $E(Y \mid A = 1, L = l_1)$ | $E(Y \mid A = 0, L = l_1)$ | $E(Y \mid A = 1, L = l_1) - E(Y \mid A = 0, L = l_1)$ |
| 6 | $l_2$ | 0 | | $Y_{A=0,6}$ | |
| 7 | $l_2$ | 0 | | $Y_{A=0,7}$ | |
| 8 | $l_2$ | 1 | $Y_{A=1,8}$ | | |
| 9 | $l_2$ | 1 | $Y_{A=1,9}$ | | |
| 10 | $l_2$ | 1 | $Y_{A=1,10}$ | | |
| Conditional summary | | | $E(Y \mid A = 1, L = l_2)$ | $E(Y \mid A = 0, L = l_2)$ | $E(Y \mid A = 1, L = l_2) - E(Y \mid A = 0, L = l_2)$ |
| Marginal summary | | | $E(Y \mid A = 1)$ | $E(Y \mid A = 0)$ | $E(Y \mid A = 1) - E(Y \mid A = 0)$ |

Figure 1.1: An illustration of defining treatment effect in terms of potential outcomes and observations

1.1.2 Assumptions

To be able to estimate the causal effect, the assumption of ignorability [19] or unconfoundedness [20] is required. This assumption states that $(Y_{A=1}, Y_{A=0}) \perp A$, which means the treatment assignment $A$ and the potential outcomes are not associated. A common source of confusion would be to interpret this assumption as observed $A$ and observed $Y$ being independent, which is not true if there is a treatment effect. To make a connection between this assumption and randomization, we can say that ignorability ensures the treatment assignment $A$ and the potential outcomes $Y_{A=1}, Y_{A=0}$ are unassociated, whereas randomization assures that the treatment assignment $A$ is unassociated with any variables, not just the joint distribution of the potential outcomes. Let us define the ‘sufficient set of covariates’ as a set of covariates that contains the complete information about the exposure-outcome association. Then within the levels of this set of covariates, the association measure of the exposure-outcome relationship is unconfounded. If $L$ is a sufficient set of covariates, then $(Y_{A=1}, Y_{A=0}) \perp A \mid L$ is called conditional ignorability within levels of $L$, i.e., the causal effect can be estimated within the strata matched by $L$ or corresponding to the levels of $L$.

Some authors view ignorability as the combination of the assumptions of exchangeability and positivity [21, ch.3]. “Exchangeability” is denoted by $Y_a \perp A$ or $P(Y_a \mid A = 1) = P(Y_a \mid A = 0)$. If this condition is satisfied, reversal or exchange of treatment status of all the patients does not change the magnitude or direction of the treatment effect. Under this assumption, excluding the effect of treatment, sub-groups under consideration are assumed to be equivalent in all respects.
Therefore, the risk in the exposed group would be the same as the risk in the unexposed group had patients in the exposed group not received the treatment or exposure. Exchangeability is also known as exogeneity [22], which is related to the concept of an exogenous variable. A variable that does not receive any causal input from any other variable in the system is called an exogenous variable. In the econometric literature, such a concept is also useful in detecting confounding, i.e., a discrepancy between causal and associational measures. “Positivity” is denoted by $P(A = a \mid L = l) > 0$ $\forall a$. This assumption requires the existence of at least one individual in each stratum of $L$ in each of the exposure groups so that a comparable pair of subjects exists in each stratum of $L$ in the target population, i.e., a positive probability of being assigned to each of the treatment levels in each stratum.

Another assumption required to make causal inference from nonrandomized data is popularly known as the ‘stable unit treatment value assumption’ (SUTVA) [23]. This assumption states that the potential outcome observation $Y$ on one unit $i$ should be unaffected by the particular assignment of treatments $A = 0, 1$ to the other units $j \neq i$. In other words, treatment choice for one unit does not affect the outcome of any other unit. This is similar to the assumption of no interaction between units [24], but more general because SUTVA also assumes that there are no different versions of treatment, i.e., treatment does not vary in effectiveness from unit to unit. The latter part of the assumption is also known as consistency [25, 26], denoted by $Y_a = Y$ whenever $A = a$, $\forall a$.

In the literature, this consistency assumption is variously known as ‘no versions of treatment’ [19] or ‘treatment-variation irrelevance’ [27] or ‘well defined interventions’ [28].
That is, treatment ($A = 1$) needs to be one particular dose of a particular drug and no treatment ($A = 0$) should also mean no other treatment. For subjects with the same covariate history, the observed outcome due to this treatment $A = 1$ should be the same. As an extreme example, if it is reasonable to assume that various drugs would have the same effect (counterfactual outcome) on a given patient, then all these drugs could be listed as treatment $A = 1$ with no versions (since the outcome does not vary). In experiments, keeping the same version of the treatment is usually expected due to adherence to a precise protocol. In observational studies, however, this may be achievable only if the prescription is unambiguously one particular dose of a given drug. Otherwise, if multiple treatments $A_1, A_2, \ldots, A_R$ (with possibly different effects on outcome) are being prescribed to treat similar patients, then consistency is violated due to the existence of $R$ versions of the drug. The causal effect of such a ‘treatment with multiple versions’ may not have any practical value. A possible remedy for such a situation is restriction, say, restricting analysis to $A_i$ if multiple versions are separately documented in the data. Even if there is only one version of a treatment in a given observational study, the effect may vary due to noncompliance (e.g., taking pills irregularly) and thus result in a violation of the consistency assumption. The resulting average treatment effect will depend on the distribution of patients receiving various versions of the treatment in the sample. If this differs from the population distribution of patients who received various versions of the treatment in a real world setting, then the corresponding result may be biased.

The three assumptions, exchangeability, positivity and consistency, are often referred to as the ‘identifiability conditions’ [29].
Making a causal statement or interpretation requires that the observational study emulates a randomized experiment where all the covariates are equally distributed between the treated and the untreated groups. However, such balance between the treated and untreated groups is not usually seen in observational studies. Under these identifiability conditions, observational studies can be viewed as conditionally randomized experiments, i.e., treatment assignment can be assumed random conditional on measured covariates. Then it is possible to make causal inferences with the hope that the untestable assumptions are approximately true. Unfortunately, without subject-area knowledge or use of additional information to justify the assumptions, such inferences cannot be validated.

Rosenbaum and Rubin proposed propensity score methodology based on this framework [20, 30–32]. They defined the propensity score $p = P(A = 1 \mid L)$ as the conditional probability of receiving treatment given the measured background variables $L$. They also extended the assumption of conditional ignorability. The propensity score is often used for matching when there are multiple or possibly high-dimensional attributes of $L$.

1.1.3 Models for Longitudinal Settings

For longitudinal data structures, more sophisticated methodology needs to be adopted to account for the complexity in the data. Robins showed that, under some conditions, time-varying treatments will not have a causal interpretation even if the usual identifiability conditions hold [33, 34]. He extended the point-treatment theory further to apply in longitudinal settings where treatment may be time-varying (multiple time point treatments). Figure 1.2 (b) illustrates this extension for two time points.
Here, treatment ($A$) assignment or choice may change in the second time point ($t = 1$) compared to that of the first time point ($t = 0$), whereas for the point-treatment situation (Figure 1.2 (a)) treatment $A$ is assigned only once. Under this framework, potential outcomes are often denoted by the term ‘counterfactuals’ [35] (or ‘possible worlds’ [25]), while others have reservations about this nomenclature [16].

Figure 1.2: Illustration of point-treatment and two time point treatments situation. Panels: (a) Point-treatment; (b) Two time point treatments.

A methodology to estimate causal parameters from longitudinal counterfactual models was proposed under the so-called sequential randomization assumption (SRA) [33]. Let us define $\bar{A}(t) = (A(1), A(2), \ldots, A(t))$ as the treatment history up to time $t$ from baseline, $\bar{a}(t)$ as the observed treatment history up to time $t$ from baseline and $Y_{\bar{a}(t)}$ or shortly, $Y_{\bar{a}}$ as the corresponding vector of counterfactuals. Then $Y_{\bar{a}} \perp A(t) \mid \bar{A}(t-1)$ is known as SRA, which is an extension of the ignorability condition of the point-treatment theory: $(Y_{A=1}, Y_{A=0}) \perp A$. This assumption basically states that the joint distribution of the counterfactuals $Y_{\bar{a}}$ is independent of the current treatment given the treatment history $\bar{A}(t-1)$. Similarly, the conditional ignorability assumption $(Y_{A=1}, Y_{A=0}) \perp A \mid L$ of the point-treatment theory can be extended to $Y_{\bar{a}} \perp A(t) \mid \bar{A}(t-1), \bar{L}(t)$ where $\bar{L}(t)$ is the time-dependent covariate history. This assumption basically states that the counterfactuals $Y_{\bar{a}}$ are independent of treatment $A(t)$ at time $t$, conditional on the treatment and measured covariate history ($\bar{A}(t-1)$ and $\bar{L}(t)$, with the assumption that covariates of a given time $t$ are measured before treatment assignment) up to time $t$.

Estimating the Causal Effect of Treatment in the Presence of Time-dependent Confounders

Models based on the SRA, such as marginal structural models (MSM) [36–42], provide a way to identify the causal effect of time-dependent treatment from longitudinal data. To provide a causal interpretation of the coefficient associated with $A(t)$ on the outcome in a regression model, we need to remove any confounding effect of time-dependent covariates $\bar{L}(t)$ up to time $t$. One way this could happen is if $A(t)$ is an exogenous variable (the covariate history $\bar{L}$ is not causing treatment), i.e., if $\bar{L}(t) \perp \bar{A}(t)$ or $\bar{L}(t) \perp A(t) \mid \bar{A}(t-1)$. In that case, the association measure will estimate the causal effect. Now, let $P(a(t) \mid \bar{a}(t-1))$ denote the probability of choosing treatment $A(t) = a(t)$ among the subjects with treatment history $\bar{A}(t-1) = \bar{a}(t-1)$. Similarly, let $P(a(t) \mid \bar{a}(t-1), \bar{l}(t))$ denote the probability of choosing treatment $A(t) = a(t)$ among the subjects with treatment history $\bar{A}(t-1) = \bar{a}(t-1)$ and covariate history $\bar{L}(t) = \bar{l}(t)$. Then

\[
\omega(t) = \prod_{j=0}^{t} \frac{P(A(j) \mid \bar{A}(j-1), \bar{L}(j))}{P(A(j) \mid \bar{A}(j-1))}
\]

indicates the degree to which the treatment process $A(t)$ deviates from exogeneity; “exogeneity” can be expressed as $\omega(t) \equiv 1$. For the subjects who make the predictable choices, in the event that the covariate history $\bar{L}(t)$ is a strong predictor of treatment choice $A(t)$ at time $t$, $\omega(t)$ will be larger, but if the covariate history does not cause or predict treatment choice $A(t)$ at time $t$ (treatment assignment truly being an exogenous variable), then $\omega(t)$ will be 1. Suppose the exposure-outcome association is being estimated from a regression model. Then, even if $A(t)$ is not exogenous, weighting the regression model by the weight $\omega^{-1}(t)$ (generally known as the inverse probability of treatment weights, discussed in §1.1.4) will provide an estimate of the causal effect.
That is, an effect measure obtained from the weighted regression of the mean of observed outcome $Y$ on the treatment history $\bar{A}(t)$ will have a causal interpretation. Such a marginal model for the response (that averages over covariates instead of conditioning on the covariates) is popularly known as an MSM.

To explain the product operator in the formula for $\omega(t)$, let us consider the point treatment situation. Then,

\[
\omega = \frac{P(A \mid L)}{P(A)}
\]

indicates the degree to which the treatment assignment $A$ deviates from exogeneity. Similarly, for a two-time point treatment situation ($t = 0, 1$), let $A(0)$ denote the binary treatment status at time 0 and $A(1)$ denote the binary treatment status at time 1. Also let $L(0)$ denote the binary covariate status at time 0 (baseline) and $L(1)$ denote the binary covariate status at time 1. Here $L(1)$ can possibly be affected by $A(0)$, but not the other way around since we are not dealing with retrocausality. With the convention that $A(-1) = 0$:

\[
\begin{aligned}
\omega(1) &= \frac{P(A(1), A(0) \mid L(1), L(0))}{P(A(1), A(0))} \\
&= \frac{P(A(1) \mid A(0), L(1), L(0)) \, P(A(0) \mid L(1), L(0))}{P(A(1) \mid A(0)) \, P(A(0))} \\
&= \frac{P(A(1) \mid A(0), L(1), L(0))}{P(A(1) \mid A(0))} \times \frac{P(A(0) \mid L(0))}{P(A(0))} \\
&= \prod_{j=0}^{1} \frac{P(A(j) \mid \bar{A}(j-1), \bar{L}(j))}{P(A(j) \mid \bar{A}(j-1))},
\end{aligned}
\]

which now indicates the degree to which the treatment assignments $\bar{A}(1) = (A(0), A(1))$ deviate from exogeneity. This expression can be generalized to $t$ time points and this leads to the formula above for $\omega(t)$.

1.1.4 Inverse Probability of Treatment Weights

Weights are usually known while analyzing the data from a randomized clinical trial. While dealing with observational studies, weights need to be estimated from the observed data. To derive the weights, treatment history is assumed to be predicted by the covariate history so that an appropriate adjustment can be made. The weight $\omega^{-1}(t)$ is known as inverse probability of treatment weight (IPTW).
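The product form of $\omega(t)$ and its inverse can be sketched in a few lines of Python. The function name and the probabilities for the example subject are hypothetical; in practice both sets of probabilities would have to be estimated from fitted treatment models rather than supplied by hand.

```python
from math import isclose, prod

def omega(p_given_history_and_covariates, p_given_history):
    """Return omega(t): the product over j = 0..t of
    P(A(j) | A-bar(j-1), L-bar(j)) / P(A(j) | A-bar(j-1)),
    each evaluated at the treatment the subject actually received."""
    if len(p_given_history_and_covariates) != len(p_given_history):
        raise ValueError("need one numerator and one denominator per time point")
    return prod(num / den
                for num, den in zip(p_given_history_and_covariates, p_given_history))

# Hypothetical subject followed over t = 0, 1 whose covariate history
# strongly predicts the treatment actually received.
w = omega([0.8, 0.9], [0.5, 0.6])     # (0.8 / 0.5) * (0.9 / 0.6)
iptw = 1.0 / w                        # the weight omega^{-1}(t)

# Under exogeneity the covariates add nothing, so every ratio equals 1.
w_exogenous = omega([0.5, 0.6], [0.5, 0.6])
assert isclose(w_exogenous, 1.0)
print(f"omega(1) = {w:.2f}, IPTW = {iptw:.3f}")
```

Multiplying one ratio per time point mirrors the factorization derived for $\omega(1)$, and a subject whose treatment is highly predictable from covariates receives a weight below one.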
Note that the IPTW is a generalization of the propensity score $p = P(A = 1 \mid L)$ and is functionally related to it, i.e., IPTW $= A/p + (1 - A)/(1 - p)$ [43, 44]. For the point treatment context, such weighting is equivalent to adding $\omega^{-1} - 1$ copies of corresponding subjects which will constitute a pseudo-population, where the unconfounded effect estimate can be obtained by the use of simple measures of association (say, risk ratio or risk difference). This estimate is equivalent to that of standardization methods [45, 46] where the causal risk ratio can be estimated by the standardized risk ratio for the total population,

\[
\frac{P(Y_{A=1} = 1)}{P(Y_{A=0} = 1)} = \frac{\sum_{l} P(Y = 1 \mid A = 1, L = l) \, P(L = l)}{\sum_{l} P(Y = 1 \mid A = 0, L = l) \, P(L = l)}.
\]

This quantity estimates the risk for ‘all’ the subjects in the population that are treated versus ‘all’ the subjects in the population that are untreated, computed from the observed quantities of $Y$, $A$ and $L$. This is a ratio of weighted averages of the stratum $L$-specific risks that offers a causal interpretation (complete exposure versus complete nonexposure) under the conditional ignorability assumption. MSM generalizes the standardization methods in longitudinal settings [47–49].

MSM, therefore, treats the unobserved counterfactual potential outcomes as missing values and tries to impute stratum specific average values or, equivalently, re-weights the observed values to adjust for those missing in order to rebuild the pseudo dataset (as shown in Table 1.3 as an illustration; compare to Table 1.2).

The mention of the MSM approach in the literature dates back to the 1990s, but use of this approach increased after the publication of two landmark papers in 2000 [42, 50]. These papers outlined a simple method to implement this approach using off-the-shelf software routines of logistic regression to estimate the weights.
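As a concrete check of the point-treatment equivalence between inverse probability weighting and standardization described in this section, the following Python sketch computes the crude, standardized and IPTW risk ratios from a small table of counts. The counts and function names are hypothetical and chosen only so that the arithmetic is easy to follow; the propensity score is computed nonparametrically from the stratum proportions rather than from a logistic regression.

```python
from math import isclose

# Hypothetical counts: (L, A) -> (number of subjects, number with Y = 1)
cells = {
    (0, 1): (20, 4),
    (0, 0): (80, 8),
    (1, 1): (60, 36),
    (1, 0): (40, 12),
}
strata = (0, 1)

def stratum_size(l):
    return cells[(l, 0)][0] + cells[(l, 1)][0]

def crude_risk(a):
    """P(Y = 1 | A = a), ignoring L."""
    n = sum(cells[(l, a)][0] for l in strata)
    events = sum(cells[(l, a)][1] for l in strata)
    return events / n

def standardized_risk(a):
    """Sum over l of P(Y = 1 | A = a, L = l) P(L = l)."""
    total = sum(stratum_size(l) for l in strata)
    return sum((cells[(l, a)][1] / cells[(l, a)][0]) * (stratum_size(l) / total)
               for l in strata)

def iptw_risk(a):
    """Risk of Y = 1 among subjects with A = a in the weighted pseudo-population."""
    weighted_events = weighted_n = 0.0
    for l in strata:
        p = cells[(l, 1)][0] / stratum_size(l)       # propensity score P(A = 1 | L = l)
        weight = 1 / p if a == 1 else 1 / (1 - p)    # A/p + (1 - A)/(1 - p)
        n_cell, events = cells[(l, a)]
        weighted_events += weight * events
        weighted_n += weight * n_cell
    return weighted_events / weighted_n

crude_rr = crude_risk(1) / crude_risk(0)
standardized_rr = standardized_risk(1) / standardized_risk(0)
iptw_rr = iptw_risk(1) / iptw_risk(0)
assert isclose(standardized_rr, iptw_rr)
print(f"crude {crude_rr:.2f}, standardized {standardized_rr:.2f}, IPTW {iptw_rr:.2f}")
```

With these counts the crude risk ratio is 3, whereas the standardized and IPTW risk ratios both equal 2, because each treatment group is re-weighted to the covariate distribution of the whole population.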
As IPTW estimation is central to the MSM approach, assessment of the assumptions of the corresponding logistic regression fits is crucial, even though it is rarely seen in the MSM literature [51, 52]. Alternative IPTW modelling strategies, such as statistical learning methods, require fewer assumptions and may be worth investigating.

Analysts and researchers are increasingly using the MSM approach to deal with time-dependent confounding. In observational settings, many are skeptical about weight-based estimators in general. The debate is not about the foundation of the IPTW estimators [39], but mostly about proper implementation techniques. The major criticism of this method stems from the fact that the assumptions of the MSM approach are restrictive and mostly untestable from a given dataset. There exists substantial literature about various implementation techniques [53–57].

Marginal Structural Cox Model: With event-time outcomes in the longitudinal context, censoring is usually another feature of these studies. MSMs are tailored for survival data by use of inverse probability of censoring weights (IPCW) in addition to IPTW, and these models are popularly known as marginal structural Cox models (MSCMs) [50, 58].
MSCMs will be further discussed in Chapter 2.

Table 1.3: Outcomes after stratum specific averages are imputed in the cells where the outcomes are missing

| Subject $i$ | Covariate $L$ | Treatment† $A = 0$ or $1$ | Outcome $Y_{A=1}$ | Outcome $Y_{A=0}$ | Causal effect $Y_{A=1} - Y_{A=0}$ |
|---|---|---|---|---|---|
| 1 | $l_1$ | 1 | $Y_{A=1,1}$ | $E(Y \mid A = 0, L = l_1)$ | |
| 2 | $l_1$ | 1 | $Y_{A=1,2}$ | $E(Y \mid A = 0, L = l_1)$ | |
| 3 | $l_1$ | 0 | $E(Y \mid A = 1, L = l_1)$ | $Y_{A=0,3}$ | |
| 4 | $l_1$ | 0 | $E(Y \mid A = 1, L = l_1)$ | $Y_{A=0,4}$ | |
| 5 | $l_1$ | 0 | $E(Y \mid A = 1, L = l_1)$ | $Y_{A=0,5}$ | |
| Conditional summary | | | $E(Y \mid A = 1, L = l_1)$†† | $E(Y \mid A = 0, L = l_1)$ | $E(Y \mid A = 1, L = l_1) - E(Y \mid A = 0, L = l_1)$ |
| 6 | $l_2$ | 0 | $E(Y \mid A = 1, L = l_2)$ | $Y_{A=0,6}$ | |
| 7 | $l_2$ | 0 | $E(Y \mid A = 1, L = l_2)$ | $Y_{A=0,7}$ | |
| 8 | $l_2$ | 1 | $Y_{A=1,8}$ | $E(Y \mid A = 0, L = l_2)$ | |
| 9 | $l_2$ | 1 | $Y_{A=1,9}$ | $E(Y \mid A = 0, L = l_2)$ | |
| 10 | $l_2$ | 1 | $Y_{A=1,10}$ | $E(Y \mid A = 0, L = l_2)$ | |
| Conditional summary | | | $E(Y \mid A = 1, L = l_2)$ | $E(Y \mid A = 0, L = l_2)$ | $E(Y \mid A = 1, L = l_2) - E(Y \mid A = 0, L = l_2)$ |
| Marginal summary | | | $E(Y_{\text{full}} \mid A = 1)$††† | $E(Y_{\text{full}} \mid A = 0)$ | $E(Y_{\text{full}} \mid A = 1) - E(Y_{\text{full}} \mid A = 0)$ |

† This simplistic illustrative example is for the point-treatment situation. However, the MSM is applicable for a more generalized longitudinal setting where treatment $A$ values may change more than once over time.
†† $E(Y \mid A = 1, L = \cdot)$ values are computed from stratum $L$ specific observed $Y_{A=\cdot,i}$ values. If none of the values in a particular stratum $L = x$ are observed, then this method fails.
††† $E(Y_{\text{full}} \mid A = \cdot)$ is computed after the missing value imputation by stratum $L$ specific averages.

1.1.5 Role of Causal Diagrams

The existence of common causes of treatment and outcome, i.e., the idea of confounding, is an important issue in any epidemiologic study since this distorts the relationship between the treatment exposure variable and the outcome variable. Even if the exposure and outcome variables are not causally associated, their shared relationship with this common cause may lead association measures to indicate that they are associated.
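The distortion produced by a common cause can be seen in a short exact calculation. In the Python sketch below, a binary variable C causes both the exposure E and the outcome D, while E has no effect on D at all; every probability and name is a hypothetical value chosen for illustration.

```python
from math import isclose

# Hypothetical probabilities for a binary common cause C of exposure E and outcome D.
p_c = 0.5                        # P(C = 1)
p_e_given_c = {1: 0.8, 0: 0.2}   # P(E = 1 | C = c)
p_d_given_c = {1: 0.6, 0: 0.1}   # P(D = 1 | C = c)

def risk_given_exposure_and_cause(e, c):
    """P(D = 1 | E = e, C = c); by construction it does not depend on e."""
    return p_d_given_c[c]

def risk_given_exposure(e):
    """P(D = 1 | E = e), averaging over the distribution of C given E = e."""
    joint = {}
    for c in (0, 1):
        prob_c = p_c if c == 1 else 1 - p_c
        prob_e = p_e_given_c[c] if e == 1 else 1 - p_e_given_c[c]
        joint[c] = prob_c * prob_e                  # P(C = c, E = e)
    p_e = sum(joint.values())                       # P(E = e)
    return sum(risk_given_exposure_and_cause(e, c) * joint[c] / p_e for c in (0, 1))

crude_rr = risk_given_exposure(1) / risk_given_exposure(0)
stratum_rr = {c: risk_given_exposure_and_cause(1, c) / risk_given_exposure_and_cause(0, c)
              for c in (0, 1)}
print(f"crude risk ratio: {crude_rr:.2f}, within-stratum risk ratios: {stratum_rr}")
```

The crude risk ratio is 2.5 although the exposure has no effect, while the risk ratio within each level of C equals 1, which is the sense in which stratifying on a sufficient set of covariates removes the confounding.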
Conditioning on common effects may also distort the exposure-outcome variable relationship and such bias is popularly known as collider bias.

For a longitudinal setting, when treatment status is a time-dependent variable and other time-dependent variables are affected by the previous treatment status, the relationship of the treatment variable with other variables can be complicated. Such complications may lead to additional bias which can be hard to detect and control. A set of graphical tools called ‘directed acyclic graphs’ (DAGs) or causal diagrams were developed [59–61] to define causality [62], confounder [63], selection bias [64], effect-modification [65], non-collapsibility [66] and over-adjustment [67].

These graphs are more intuitive in explaining the causal concepts compared to other methods or definitions even in complicated structures. DAGs do not deal with cyclic variables which can cause themselves, nor are they suitable for assessing claims of retrocausality [68]. Using DAGs, it is possible to explain epidemiologic concepts in an intuitive way [35], a unified structural approach for detecting confounding, mediation and selection bias can be outlined [64] and a simple 6-step approach can be used [62] to identify confounding using the backdoor criterion [60].

In DAG notation, if $C$ is a variable that is causing exposure variable $E$ and outcome variable $D$, then even if $E$ and $D$ are not associated with each other, a statistical association measure may find these two variables to be associated, which does not imply or reflect the true causal relationship [63]. A path between two variables corresponds to statistical association and a block in the path means statistical association between these two variables is nullified. The backdoor criterion is a graphical tool to determine the existence of an unblocked non-directional path between exposure and outcome as shown in Figure 1.3.
In the exposure $E$ - outcome $D$ association, a set of variables $C$ fulfills this criterion if (i) no variable in $C$ is caused by exposure $E$ and (ii) $C$ blocks every path between exposure $E$ and outcome $D$ that contains an arrow pointing into $E$, i.e., every path through a cause of the exposure choice. This backdoor criterion helps researchers identify whether there is confounding present in a situation, whether such confounding can be eliminated, and what particular variables are necessary to control for the confounding.

Figure 1.3: Relationships among exposure E, outcome variable D and a covariate C in a directed acyclic graph. Panels:
(a) C is common cause or confounding: an unblocked non-directional path exists between E and D, i.e., E ← C → D.
(b) C is intermediate value or mediator: an unblocked directional path exists between E and D, i.e., E → C → D and E → D.
(c) C is common effect or collider: a blocked non-directional path exists between E and D, i.e., E → C ← D.

When a confounder is affected by the previous exposure status, it is known as an intermediate variable. In a regression adjustment, to get a valid assessment of the exposure effect, Cox [24] suggested not to control for an intermediate variable. But this suggestion was not based on any proof or simulation. Using DAG theory, it was later shown why standard methods of estimation of treatment effects in longitudinal studies fail to produce unbiased estimates in the presence of a time-dependent risk-factor that is also a predictor of subsequent exposure [67].

1.1.6 Time-dependent Confounders

If a confounder $C$ is affected by the previous treatment exposure, then it is also acting as an intermediate variable between treatment $E$ and outcome $D$. As $C$ is in the causal pathway between the current treatment and future outcome, it is associated with both of them ($E$ and $D$). If $C$ is impacted by the previous treatment exposure and subsequently influences the current treatment decision, it is known as a time-dependent confounder.
MSM and MSCM approaches are useful tools to deal with time-dependent confounding [64]. Throughout this thesis, we define a covariate as a "time-dependent confounder" [50, 69] if it

1. is itself affected by the previous treatment exposure and
2. predicts the future treatment decision and future outcome conditional on the past treatment exposure.

1.2 Models to Estimate the Causal Effect

1.2.1 In the Presence of Time-dependent Confounders

MSCMs are useful tools to estimate the causal effect of treatment in the presence of time-dependent confounders in the longitudinal context with event-time outcomes. This is especially true when the time-dependent confounders also act as mediators of the exposure-outcome association. However, MSCMs may require strong and untestable assumptions. Alternative methods such as structural nested models [38, 70, 71], the sequential stratification approach [72] and the sequential Cox approach [73] can be used to deal with time-dependent confounders. But those methods can be very computationally intensive compared to standard statistical tools. Among these, the sequential Cox approach is relatively new, and deserves more attention due to its simplicity.

Figure 1.4: An illustration of immortal time, i.e., a delay or wait period that may exist before a subject begins to receive a treatment in an observational drug effectiveness study

1.2.2 In the Absence of a Time-dependent Confounder

A recurrent theme in this research is to find causal effects from observational data in the presence of time-dependent exposure. So far we have dealt with the complicated situation when time-dependent confounders are present. In simpler settings, when time-dependent covariates do not interact with or influence future treatment, estimating the effect of time-dependent treatment is less troublesome.
Under the assumptions of conditional exchangeability, consistency, correct model specification, and positivity, hazard ratios estimated from a time-dependent Cox proportional hazards model [74] will have a causal interpretation [75].

1.2.3 In the Presence of Immortal Time Bias

In some observational studies, after entering into the study or reaching eligibility, there might be a wait period before receiving treatment. A treated patient with such a wait period contributes to both treated and untreated time. The waiting time of the patients who 'survived' until treatment initiation needs to be properly accounted for in the analysis. As shown in Figure 1.4, there may be patients of three kinds: (a) those who were treated from the beginning till the end of the study, (b) those who were untreated from the beginning till the end of the study, and (c) those who begin the study as untreated patients, but after some waiting time switch onto treatment. If this waiting time is classified as exposed to treatment instead of unexposed, it offers an artificially enhanced survival advantage for the treated subjects; this phenomenon is sometimes referred to as 'immortal time bias' [76–80]. Note that immortal time bias can occur with or without time-dependent confounding.

Time-dependent exposure modelling is one way to adjust for this immortal time bias, i.e., for survival outcomes, the time-dependent Cox proportional hazards model [74] is a suggested solution. This approach achieves the best statistical efficiency compared to the alternatives [81, ch.33]. Instead of comparing the treated versus untreated groups, this approach compares time under treatment to time not under treatment. A time-distribution matching approach was suggested [82], which offers a way to use the usual Cox model for the treatment group comparison [83]. This approach assigns new baselines to achieve balance in the follow-up time distribution of the exposed and unexposed.
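To make the immortal time bias described above concrete, the following minimal sketch compares crude event rates per person-year when the waiting time of a type (c) patient is counted as exposed versus unexposed. The four patients and the function `rate_ratio` are hypothetical, and a crude rate ratio is used here only as a simple stand-in for the hazard ratio of a Cox model.

```python
# Hypothetical patients (times in years). treat_start is None for patients
# who never start treatment; every patient here has the event at follow_up.
patients = [
    {"follow_up": 5.0, "treat_start": 0.0},   # (a) treated throughout
    {"follow_up": 4.0, "treat_start": None},  # (b) never treated
    {"follow_up": 2.0, "treat_start": None},  # (b) never treated
    {"follow_up": 5.0, "treat_start": 3.0},   # (c) waits 3 years, then treated
]


def rate_ratio(patients, split_waiting_time):
    """Event rate (events per person-year) under treatment divided by the
    rate without treatment. If split_waiting_time is False, all follow-up of
    ever-treated patients is counted as exposed (the misclassification)."""
    time = {"exposed": 0.0, "unexposed": 0.0}
    events = {"exposed": 0, "unexposed": 0}
    for p in patients:
        start = p["treat_start"]
        if start is None:
            time["unexposed"] += p["follow_up"]
            events["unexposed"] += 1
            continue
        waiting = start if split_waiting_time else 0.0
        time["unexposed"] += waiting
        time["exposed"] += p["follow_up"] - waiting
        events["exposed"] += 1  # the event occurs while on treatment
    return (events["exposed"] / time["exposed"]) / (
        events["unexposed"] / time["unexposed"]
    )


naive = rate_ratio(patients, split_waiting_time=False)
time_dependent = rate_ratio(patients, split_waiting_time=True)
print(round(naive, 2), round(time_dependent, 2))  # 0.6 1.29
```

With these made-up numbers, counting the three waiting years as exposed makes treatment look protective (ratio 0.6), whereas assigning them to unexposed person-time reverses the direction (ratio about 1.29).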
The time-distribution matching approach is cited frequently in the recent biomedical literature [84–88]. However, it is currently unknown how well this method works in a general setting compared to a time-dependent Cox model. One of the proposed directions of this research is to assess the performance of this approach by means of simulations and theoretical calculations. We will also assess the suitability of using the sequential Cox approach instead of MSCM when a time-dependent confounder is present. We will further discuss these approaches in Chapter 4.

1.3 Organization of the Dissertation

So far we have portrayed the assumptions and the key concepts of the causal inference framework in a very general way. This framework allows us to identify and control for the time-dependent confounders. We consider three general problems related to adjustment of analyses for the possible influences of time-dependent confounders in this dissertation.

We will describe the motivating MS research problem that inspired this work in Chapter 2. MSCMs allow adjustment for time-varying confounders, as well as baseline characteristics. Most of the MSCM analyses performed in the published literature are specific to HIV/AIDS. MS is a chronic disease with features of its own. Different subjects may come to medical attention (e.g., an MS clinic) at different times. Therefore, subjects included in an observational MS study may have different baselines or cohort entry times and different drug initiation times, and may use different treatments or switch treatments over time depending on their health conditions (e.g., relapse frequency and disease course). A carryover effect of the previous treatment may exist even after drug discontinuation. β-IFN has been found to be effective in some short-term MS clinical trials (3-5 years). The effect of this treatment on longer-term outcomes such as irreversible disability is of great interest.
In this study, one of the main objectives was to assess the association between β-IFN drug exposure and disease progression in relapsing-remitting MS patients in the 'real-world' clinical practice setting. In the presence of time-varying confounders, such as MS relapses, MSCMs can be a valuable tool to analyze longitudinal observational survival data. In that chapter, we set out to assess the suitability of MSCMs to analyze data from a large cohort of relapsing-remitting MS patients in BC, Canada (1995-2008).

Our data analyses and previous literature indicate that the properties of the inverse probability weights (IPWs) can influence the estimated effects from MSCM and their accuracy. Logistic regressions are generally used to model the IPWs. Statistical learning algorithms such as bagging, support vector machines, and boosting have proved to be useful in estimating propensity scores with better covariate balance. As propensity scores and IPWs are functionally related, whether the lessons learnt from propensity scores can be translated and generalized to IPW estimation is of great interest. In Chapter 3, we will assess the performance of these proposed methods via simulated survival data that mimic a context in which both treatment status and a confounder are time-dependent. These statistical learning approaches are also applied to estimate IPWs to investigate the impact of beta-interferon treatment in delaying disability progression in the British Columbia Multiple Sclerosis (BCMS) cohort.

Prescription time-distribution matching is an approach proposed in the literature to avoid a time-dependent Cox analysis. In longitudinal survival studies, in the presence of time-dependent confounding, MSCMs are usually used to deal with such confounding. The sequential Cox approach is suggested as an alternative approach.
Both the prescription time-distribution matching and the sequential Cox approaches make the interpretation of the results much more accessible to a wider audience. In Chapter 4, we assess the suitability of both of these approaches for analyzing data in the absence and presence of time-dependent confounding. These methods are also utilized to investigate the impact of beta-interferon treatment in delaying disability progression in subjects from the BCMS database. Finally, Chapter 5 briefly summarizes this research, and suggests possible directions for future research.

Chapter 2

Marginal Structural Cox Models for Estimating the Effect of Beta-interferon Exposure in Delaying Disease Progression in a Multiple Sclerosis Cohort

2.1 Introduction

Multiple sclerosis (MS) is a disease associated with damage to the myelin and nerve fibers in the brain and spinal cord. It is a life-long disease, typically manifesting in early adulthood, affecting an estimated 2 to 2.5 million people worldwide [89]. A relapsing-remitting course is the most common presenting MS phenotype; these patients can experience periods of acute worsening, known as an attack or relapse, followed by relapse-free periods with partial or full recovery. Disability may gradually worsen over time, ultimately becoming irreversible. As evident from various clinical trials, immunomodulatory drugs, such as beta-interferon (β-IFN), may reduce the risk of an MS relapse and increase the duration of relapse-free periods over the short-term [2–6]. However, their impact on longer-term outcomes such as irreversible disability is unclear.

There is a real need to determine whether the β-IFNs positively influence the MS disease course over the long-term, particularly in the 'real-world' clinical practice setting. Observational studies are the most pragmatic means of addressing this need. However, findings from recent observational studies have been contradictory with respect to the impact of β-IFN [90–92].
Possible explanations for these inconsistencies include selection bias, informative censoring, immortal time bias, and inappropriate use of analytic tools [93, 94]. Hence the association between β-IFN and the progression of disability in clinical practice remains undetermined.

Recently researchers assessed the association of β-IFN with the time to irreversible disability outcomes among relapsing-remitting MS patients treated in the real world clinical practice setting of British Columbia, Canada, using a Cox model with time-dependent treatment exposure after adjusting for a number of important baseline confounders [92]. They were also able to compare β-IFN treated patients with two separate control cohorts: a 'historical' cohort (patients who first became β-IFN eligible prior to the approval of β-IFN in Canada in 1995) and a 'contemporary' cohort (patients who first became β-IFN eligible after the approval of β-IFN, but remained unexposed to β-IFN). While this approach represented a considerable improvement over previous studies [95], concern remained about the potential for indication bias when the contemporary control cohort was considered [92]. Despite adjustment for a number of baseline characteristics, there were also concerns raised about the inability to adjust for subsequent (post-baseline) treatment decisions [92, 96–99]. Furthermore, since disease activity, such as relapses, can drive decision-making with respect to starting or stopping β-IFN treatment [100], and might also be associated with the outcome [101], relapses could be considered a potential time-dependent confounder. Simply incorporating such confounders as covariates in a time-dependent Cox model may be inadequate to adjust for selection bias and confounding [50].

Marginal structural Cox models (MSCMs) allow estimation of the causal effects of treatment exposure on survival responses (e.g., time to disability) in the presence of time-dependent confounding, selection bias and informative censoring [50, 58]. These models depend on model-based estimates of the inverse probability of the observed treatment and censoring status of each patient to achieve causal interpretation of the findings. Simulation studies with short-term follow-up have repeatedly shown that MSCMs are advantageous in terms of obtaining consistent estimates of the effect of time-varying treatment exposures [56, 57, 102, 103]. When studying MS, a chronic disease, extended observational periods are needed, which may contribute to the construction of highly variable weights [104], and subsequently may lead to an inefficient estimate of the causal effect. Furthermore, how robust these models are when follow-up lengths differ for individual patients, as is the case in clinical practice, is largely unknown. To assess and address these practical challenges, we explored the use of different weighting approaches in MSCMs to estimate the causal effect of β-IFN on the time to irreversible disability in a cohort of relapsing-remitting MS patients from British Columbia, Canada.

2.2 Materials and Methods

2.2.1 Study Population and Measurements

This cohort study included data that were collected prospectively from MS patients who were registered at a British Columbia (BC) MS clinic and who were eligible to receive β-IFN (all preparations of β-IFN were considered as one therapeutic class). In Canada, the first β-IFN was licensed in July 1995. Therefore, patients who became eligible for β-IFN treatment for the first time between July 1995 and December 2004 were included (only the contemporary control cohort was considered). Broad eligibility criteria for receiving β-IFN treatment were adapted from the BC government's reimbursement scheme, i.e., adults (≥ 18 years old) who had a diagnosis of definite MS with a relapsing-onset course and were able to walk (Expanded Disability Status Scale or EDSS ≤ 6.5).
The first MS clinic visit at which a patient met the β-IFN eligibility criteria was considered the patient's baseline date (time = 0). The end of follow-up was December 2008. The study was approved by the University of British Columbia's Clinical Research Ethics Board.

The study outcome (irreversible disability progression) was based on the EDSS [105], a standardized rating system to measure neurological impairment and disability, which ranges from 0 (indicating no disability) to 10 (death from MS). The EDSS has been widely used to describe a patient's clinical status, to quantify disability progression and to evaluate treatment response in intervention studies. Our outcome was time to reaching sustained EDSS 6. An EDSS score of 6 indicates that the patient requires intermittent or unilateral constant assistance (cane, crutch or brace) to walk about 100 meters with or without resting. Since it is possible to move back and forth along the EDSS scale, sustained EDSS 6 (i.e., confirmed after at least 150 days, with all subsequent scores being EDSS 6 or greater) was adopted in this study as an indicator of irreversible disability progression [92, 106, 107].

Since a patient's β-IFN exposure status might change during follow-up, this was considered as a time-dependent variable. β-IFN exposure was defined as 'any vs. none' on a monthly basis. This could be considered an improvement on the previous study design [92] in which only one treatment initiation and one termination date were considered for each treated patient. Potential confounders included: age at baseline, sex, disease duration at baseline, EDSS score at baseline and relapses.

The relapse variable was selected to be included in the model as a time-varying factor for the following reasons. Firstly, relapses may be associated with the outcome (disability progression).
Studies have shown that early relapses may have a significant impact on later disability progression, even though the strength of this association may diminish with time [107]. Secondly, the β-IFNs have been shown to reduce relapse rates [2–6]; therefore, a patient's relapse status may be affected by prior β-IFN treatment. Thirdly, the presence (or absence) of relapses might influence treatment decisions, i.e., determine whether to start or stop a β-IFN. Finally, the risk of a relapse is not constant over time; it typically decreases as the patient's disease duration and age increase [108]. Therefore, only considering those relapses that occurred prior to a patient's baseline date may be insufficient. Instead, we considered the cumulative number of relapses in the last two years (hereafter 'cumulative relapses') as a time-dependent confounder.

Figure 2.1: Representation of the hypothesized causal relationships in the treatment of MS with three time points j = 0, 1, 2.

The cumulative number of relapses could be an intermediate variable between treatment exposure and disability progression; a simplified version of this hypothesized causal relationship is outlined in Figure 2.1. In this figure, $E_j$ denotes the binary β-IFN exposure variable, which is measured immediately after the time-dependent confounder $R_j$ (cumulative relapses) and $D_j$, the disability progression index (i.e., the EDSS score) of the j-th time period. The time-dependent confounder $R_j$ at time j is affected by prior treatment $E_{j-1}$.
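The time-dependent confounder defined above (the cumulative number of relapses in the last two years) could be derived from monthly relapse counts along the following lines. This is only a sketch: the function name `cumulative_relapses`, the example data, and the choice to include the current month in the 24-month window are assumptions, not details reported in the thesis.

```python
def cumulative_relapses(monthly_relapses, window=24):
    """For each month, count the relapses in the trailing `window` months,
    current month included (an assumption of this sketch)."""
    counts = []
    running = 0
    for month, n_relapses in enumerate(monthly_relapses):
        running += n_relapses
        if month >= window:
            # Drop the month that has just fallen out of the window.
            running -= monthly_relapses[month - window]
        counts.append(running)
    return counts


# Hypothetical patient followed for 36 months with relapses in months 0, 10, 30.
relapses = [0] * 36
for month in (0, 10, 30):
    relapses[month] = 1

counts = cumulative_relapses(relapses)
print(counts[23], counts[24], counts[30], counts[35])  # 2 1 2 1
```

The count rises and falls over follow-up, which is why a single pre-baseline relapse count would not capture the information that drives later treatment decisions.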
According to the causal diagram, $R_0$ imposes confounding for the $E_0$-D relationship (as relapse frequency may dictate the subsequent treatment choice and residual disability left by frequent relapses may accumulate over time leading to irreversible disability), but $R_1$ is an intermediate variable for the same relationship [35, 109] (as the prior β-IFN treatment may reduce relapse frequency which may allow more time to recover from residual disability left by past relapses and may contribute to slower progression of disability over time). A more detailed discussion of the rationale can be found in Appendix §A.1. We also examined whether cumulative relapses were an important predictor of subsequent treatment choices.

2.2.2 Statistical Methods

Conventional Cox model. We defined the model notation as follows: if patient i was followed from the time of β-IFN eligibility (t = 0) to time $T_i$ with treatment exposure at time t described by $A_{it}$ (1 = under treatment, 0 = not under treatment), then $a_{it}$ was the realization of $A_{it}$; $\bar{a}_{it} = (a_{i1}, a_{i2}, \ldots, a_{it})$ described the observed treatment status up to time t. The patient's baseline covariates were recorded in the vector $L_{i0}$ consisting of baseline EDSS score, disease duration, age and sex. If $\lambda_i(t \mid L_{i0})$ was the hazard of reaching sustained EDSS 6 at time t for patient i with baseline covariates $L_{i0}$, one way to model such data was with the time-dependent Cox proportional hazards model:

\[
\lambda_i(t \mid L_{i0}) = \lambda_0(t) \exp(\beta_1 A_{it} + \beta_2 L_{i0}), \tag{2.1}
\]

where $\lambda_0(t)$ was the unspecified baseline hazard, $\beta_2$ was the vector of log hazard ratios (HRs) for the baseline covariates and $\beta_1$ was the log HR of the current β-IFN status ($A_{it}$). Adding cumulative relapse ($L_{it}$) as a covariate in this model may have failed to adjust for this time-dependent confounder (discussed in detail in Appendix §A.2). Hence, the MSCM approach [42, 50] was applied instead.

Marginal Structural Cox model (MSCM).
Within a counterfactual framework, in the pseudo-population, MSCMs enabled the conceptual comparison of the hazard functions for those who never received β-IFN (complete non-exposure during follow-up) with those who received β-IFN continuously (complete exposure). To accomplish this, the partial likelihood function of the Cox model (or its approximations; see Appendix §A.3) was modified such that the contribution of patient i to the risk set at time t was weighted by the inverse probability of treatment and censoring (IPTC) weight $w_i$ to remove the possible confounding effects of both time-varying and baseline confounders [50].

Weighting schemes. The stabilized version of the IPT weight for patient i at time t was given by:

\[
sw^{T}_{it} = \prod_{j=0}^{t} \frac{\mathrm{pr}(A_{ij} = a_{ij} \mid \bar{A}_{i,j-1} = \bar{a}_{i,j-1}, L_{i0} = l_{i0})}{\mathrm{pr}(A_{ij} = a_{ij} \mid \bar{A}_{i,j-1} = \bar{a}_{i,j-1}, L_{i0} = l_{i0}, \bar{L}_{ij} = \bar{l}_{ij})}, \tag{2.2}
\]

where $\bar{A}_{ij} = \bar{a}_{ij}$ and $\bar{L}_{ij} = \bar{l}_{ij}$ were the observed treatment history and time-varying confounder history respectively from baseline to time j. The stabilized IPT weights were inversely related to a function of the time-varying confounder cumulative relapse, since this variable appeared only in the denominator of the weights, whereas the baseline covariates were included in both the numerator and the denominator, as shown in equation (2.2). The weights $sw^{T}_{it}$ down-weighted the person-time contributions when cumulative relapses were a strong predictor of the treatment status in the subsequent time periods, after controlling for the baseline covariates. Assuming that the denominators of the weight models were correctly specified, these weights created a pseudo-population in which cumulative relapses no longer predicted the subsequent β-IFN treatment status [41]. The β-IFN treatment effect in this pseudo-population would be the same as in the original target population [22].

Generally, when the numerator in equation (2.2) is replaced by 1, these weights become the unstabilized IPT weights, $w^{T}_{it}$ [50].
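The cumulative products in equation (2.2) can be sketched as follows for a single patient. The function `ipt_weights` and the probabilities are hypothetical; in practice the probabilities would be predictions from fitted numerator and denominator models for the treatment actually received in each month.

```python
def ipt_weights(p_numerator, p_denominator):
    """Return (stabilized, unstabilized) IPT weights at each time point.

    p_numerator[j] and p_denominator[j] are the predicted probabilities of
    the treatment actually received at time j, from the model without and
    with the time-varying confounder history, respectively.
    """
    stabilized, unstabilized = [], []
    sw_t = w_t = 1.0
    for p_num, p_den in zip(p_numerator, p_denominator):
        sw_t *= p_num / p_den   # running product of ratios, as in (2.2)
        w_t *= 1.0 / p_den      # numerator replaced by 1
        stabilized.append(sw_t)
        unstabilized.append(w_t)
    return stabilized, unstabilized


# Hypothetical predictions for one patient over three months.
p_num = [0.5, 0.5, 0.5]
p_den = [0.5, 0.25, 0.5]
sw, w = ipt_weights(p_num, p_den)
print(sw)  # [1.0, 2.0, 2.0]
print(w)   # [2.0, 8.0, 16.0]
```

Even in this tiny example the unstabilized weight grows with every additional month, while the stabilized weight changes only in the month where the confounder history shifts the treatment probability.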
The unstabilizedweights simultaneously controls for time-varying and baseline covariates.Unlike MSCMs using stabilized versions of the weights, MSCM analyses us-ing unstabilized versions of the weights do not need further adjustment forthe baseline covariates [22]. Use of the unstabilized weights also yields con-sistent causal estimates but these estimates are associated with substantialvariability [41].Consistent estimation of β1 from censored data can be achieved by in-302.2. Materials and Methodscorporating IPC weights in the analysis [110]. Using similar logic to thatleading to the IPT weights for uncensored patients, the stabilized version ofthe IPC weight for patient i at time t is obtained as:swCit =t∏j=0pr(Cij = 0|C¯i,j−1 = 0, A¯i,j−1 = a¯i,j−1, Li0 = li0)pr(Cij = 0|C¯i,j−1 = 0, A¯i,j−1 = a¯i,j−1, Li0 = li0, L¯i,j−1 = l¯i,j−1),(2.3)where Cij denoted the binary censoring status taking the value of 1 if thei-th patient was censored in the j-th time and 0 otherwise and C¯ij = c¯ij wasthe observed censoring history up to time j. The overall stabilized IPTCweights swit are obtained by multiplying swTit by swCit [22, 42].Since the weights were unknown, they were estimated from the data.Logistic regression models were applied to estimate the conditional proba-bilities appearing in equations (2.2) - (2.3) (see Appendix §A.4).The normalized version of the IPTC weights were calculated where eachweight was normalized by the mean weight of the corresponding risk set [57]:w(n)it =witNt∑i Yitwit, sw(n)it =switNt∑i Yitswit, (2.4)where Yit indicated whether patient i belonged to the risk set at time t andNt =∑Ni Yit was the total number of patients in the risk set at time t. Wecritically assessed the performance of all of the above mentioned weightingschemes.To take within-subject correlation [111] into account, robust SEs are usu-ally evaluated, which may be asymptotically conservative [42, 112]. 
Therefore, the 95% CIs for the causal estimate based on 500 nonparametric bootstrap samples were calculated [73, 113, 114].

IPTC weighted survival estimates. IPTC weight adjusted Kaplan-Meier survival curves did not require assumptions related to parametric survival or the Cox model. We used unstabilized IPTC weights ($w$ or $w^{(n)}$) to adjust the survival curves. This had the added advantage of yielding marginal estimates that provided direct causal interpretations without first requiring a fit of the MSCM [115]; hence, constructing such curves served as a sensitivity analysis. However, these weights can be highly variable compared to $sw^{(n)}$ and the adjusted survival curves are prone to distortion in the presence of extreme weights. Truncation of extreme weights was applied as one ad-hoc solution to assuage the problem of extreme weights [55].

Sample code and practical guidance on implementing the weights in such direct and approximate MSCM approaches via various R [116] packages are included in Appendix §A.5.

2.3 Results

Of 1,697 patients included in the study, 1,297 patients were female (76%). The mean age at baseline was 39.7 years (SD = 9.7), the mean disease duration from symptom onset was 7 years (SD = 7.7) and the median EDSS score was 2 (IQR = 1).

The mean follow-up time was 4 years (IQR = 6.0 − 1.7 = 4.3), and the maximum was 12.7 years. In total there were 6,890 person-years of follow-up and 2,530 person-years of β-IFN exposure. In all, 829 patients remained untreated during follow-up. Patients at risk of reaching the outcome at the beginning of each year are shown in Figure 2.2. Overall, 138 patients reached the outcome of sustained EDSS 6. Further description of the data is provided in Appendix §A.6.

Figure 2.2: Number of patients at risk of reaching sustained EDSS 6 during the first month of each follow-up year after baseline. Failure to continue to the next risk set results from either censoring or reaching sustained EDSS 6.
Analyses were performed by month, but the plot is drawn by year for simplicity.

2.3.1 Time-dependent Weights

We found the cumulative relapse variable to be a good predictor of subsequent treatment choices as evidenced by the significance in the model for the IPT weights (two-sided P < 0.001; see Appendix-Table A.1) and also for the IPC weights (two-sided P = 0.03).

The IPTC weights varied not only from patient to patient, but also by time. As the number of patients at risk decreased monotonically over time, the variation of the IPTC weights increased with follow-up time. As seen in Figure 2.3, in addition to such increasing variability, a clear upward trend over time was evident in the unstabilized weights $w$. The means at successive time points were much closer to one after stabilization ($sw$). However, an upward trend of the mean weights was still apparent as follow-up progressed. As expected, this trend was eliminated when the stabilized weights were normalized ($sw^{(n)}$). When the unstabilized weights were normalized ($w^{(n)}$), even though the mean weight at each time point was one, the distributions of the weights were highly variable and skewed.

[Panels: $w$, $w^{(n)}$, $sw$, $sw^{(n)}$]
Figure 2.3: Distribution of various IPTC weighting schemes for each year of follow-up (instead of month for better visual display). The means are indicated by ∗ in each boxplot. Note that the plots do not have identical scales on the vertical axes.

The mean and SD of the unstabilized, unnormalized weights ($w$) were much larger than those of the other weights (Table 2.1), and the resulting causal effect estimate was further removed from the null, with a much wider confidence interval (CI). Normalization resulted in a mean weight of one and a markedly reduced SD. Stabilization of the weights had an even greater impact on reducing the SE of the causal estimate.

A smaller range is an indication of well-behaved weights [55] that generally leads to a smaller CI for the effect estimate.
In terms of this desirable property, $sw^{(n)}$ behaved better than the other schemes: these weights had a smaller range. This supported the use of $sw^{(n)}$ in this application. Also, a necessary condition for correct model specification is that the mean of the stabilized weights is one [49, 55], ideally at each time period rather than just overall. Although $sw^{(n)}$ depend on the same specifications of the treatment and censoring models as $sw$, we observed that there was no tendency for the mean to deviate from one even after long follow-up (see Figure 2.3).

Table 2.1: Different versions of the IPTC weights and the corresponding causal effect of β-IFN on the hazard of reaching sustained EDSS 6 for MS patients from BC (1995-2008).

Scheme∗       Stabilized   Normalized   Estimated weights                  Causal estimates
                                        Mean (log-SD)    Min - Max         HR      95% CI
$w$           No           No           28.17 (6.44)     1 - 43,985.38     1.54    0.09, 26.38§
$w^{(n)}$     No           Yes          1 (2.45)         0.01 - 753.47     1.36    0.18, 10.40§
$sw$          Yes          No           0.99 (-2.12)     0.30 - 1.95       1.36    0.95, 1.94§
$sw^{(n)}$    Yes          Yes          1 (-2.18)        0.32 - 1.71       1.36    0.95, 1.94§#

log-SD, log of standard deviation; Min, minimum; Max, maximum; CI, confidence interval.
∗ The IPT numerator model included the baseline covariates EDSS, age, disease duration and sex, treatment status at previous time interval and restricted cubic spline [117] of the follow-up month number. The denominator model included the covariates considered in the numerator model and the time-dependent covariate cumulative relapses for last two years, as well as its interaction with treatment status at the previous time interval. The same model specifications were used to generate the IPC weights.
With the stabilized versions of the weights, the hazard ratio model of the MSCM must include adjustment for the baseline covariates, but this is not necessary with the unstabilized versions of the weights.
§ Based on 500 nonparametric bootstrap samples with patients as sampling units.
# The CI of the causal effect estimate obtained using $sw^{(n)}$ was the smallest, although equal to that obtained using $sw$ when displayed to 2 decimal places.

2.3.2 The Causal Effect of β-IFN

Since the $sw^{(n)}$ had better properties, we relied on the corresponding MSCM estimates (see Table 2.2). The estimated HR failed to suggest a beneficial effect of the treatment, and the evidence of an association between the current β-IFN exposure and the hazard of reaching sustained EDSS 6 was inconclusive.

Table 2.2: The marginal structural Cox model (MSCM) fit with the normalized stabilized IPTC weights $sw^{(n)}$ for time to sustained EDSS 6 to estimate the causal effect of β-IFN treatment for multiple sclerosis (MS) patients from British Columbia, Canada (1995-2008). The model was also adjusted for the baseline covariates EDSS, age, disease duration and sex.

Covariate            Estimate∗   HR†     95% bootstrap CI‡
β-IFN                 0.31       1.36    0.95 - 1.94
EDSS                  0.54       1.72    1.54 - 1.92§
Disease duration#    -0.19       0.83    0.66 - 1.05
Age#                  0.28       1.32    1.08 - 1.62§
Sex¶                 -0.22       0.80    0.55 - 1.17

HR, hazard ratio; CI, confidence interval; EDSS, expanded disability status scale.
∗ Estimated log HR.
† HR, indicating the instantaneous risk of reaching sustained and confirmed EDSS 6.
‡ Based on 500 nonparametric bootstrap samples.
§ 95% CI that does not include 1.
# Expressed in decades.
¶ Reference level: Male.

To verify the results, we also obtained the estimates from several approaches that approximate the MSCM (see Table 2.3). All the estimates from the models based on $sw^{(n)}$ were consistent. The conclusion concerning the causal effect of β-IFN on time to sustained EDSS 6 did not change with the modelling choices.

Table 2.3: Estimates of effect of β-IFN treatment on time to sustained EDSS 6 for MS patients from British Columbia, Canada (1995-2008) using different analytical approaches.

Model             Adjustment                  Measure of effect   95% CI
Cox               Unweighted†                 1.29§               0.91 - 1.82‡
                  Weighted by $sw^{(n)}$      1.36§               0.95 - 1.94¶
Pooled logistic   Unweighted†                 1.29#               0.91 - 1.82‡
                  Weighted by $sw^{(n)}$      1.36#               0.96 - 1.95¶
Poisson           Weighted by $sw^{(n)}$      1.36#               0.96 - 1.95¶
C-log-log         Weighted by $sw^{(n)}$      1.37#               0.96 - 1.95¶

† Based on time-dependent β-IFN treatment exposure status and covariates measured at baseline: EDSS, age, disease duration, sex. This estimate does not have a causal interpretation and is shown for comparison purposes.
‡ 95% CIs calculated based on robust SEs.
¶ 95% CIs obtained from 500 nonparametric bootstrap samples.
§ HR is the measure of effect obtained from a Cox model.
# HR from Cox model was approximated by the odds ratio (OR) of the pooled logistic model [118, 119] (see Appendix §A.3) or, under the infrequent event assumption, by the standardized mortality ratio (SMR) from Poisson regression or by the OR from complementary log-log regression respectively. The weighted Cox [50, 57] model was approximated by weighted versions of these models. Software specifications of these analyses are reported in Appendix §A.5.

In a complementary analysis we considered longitudinal EDSS values as an additional time-varying confounder, instead of treating EDSS as a baseline covariate (see Table 2.4). Additionally, the impact of weight trimming [120] was evaluated to assess the sensitivity of the findings to the positivity assumption (see Appendix §A.7.1). The analysis was also repeated after selecting patients via more restricted eligibility criteria (see Appendix §A.7.2). Further analyses were conducted to check the impact of the cumulative exposure to β-IFN over the last two years on the same outcome (see Appendix §A.7.3).
We also assessed the impact of including cumulative relapses in the last year, rather than the last two years (see Appendix §A.7.4). None of these sensitivity analyses resulted in statistical evidence for an effect of treatment.

Table 2.4: Sensitivity analysis to assess the impact of EDSS as an additional time-varying confounder: The MSCM fit with the normalized stabilized IPTC weights sw(n) for time to sustained EDSS 6 to estimate the causal effect of β-IFN treatment for patients with relapsing-onset MS, British Columbia, Canada (1995-2008).

Covariate            Estimate   HR†     95% CI‡
β-IFN∗                 0.12     1.13    0.76 - 1.68
Disease duration#     -0.02     0.98    0.82 - 1.22
Age#                   0.32     1.37    1.10 - 1.63§
Sex¶                  -0.36     0.70    0.47 - 1.02

HR, Hazard ratio; CI, confidence interval; EDSS, expanded disability status scale.
∗ The model was adjusted for cumulative relapse and EDSS as time-varying confounders and baseline covariates age, disease duration and sex. Considering EDSS as a time-varying confounder rather than a baseline covariate in the analysis does not contradict the causal diagram (Figure 2.1). All missing EDSS values were imputed via the last-value-carried-forward approach.
‡ Based on 500 nonparametric bootstrap sample estimates.
§ 95% CI that does not include 1.
# Expressed in decades.
¶ Reference level: Male.

2.3.3 IPTC Weighting for Estimation of Survival Curves

We plotted Kaplan-Meier survival curves adjusted by the IPTC weights w(n). However, the large drops in the survival plot in Figure 2.4 (b) were driven by only a few large weights. Therefore, we investigated the sensitivity of these adjusted Kaplan-Meier curves after progressively truncating w(n).

Table 2.5: The impact of truncation of the w(n) on the estimated causal effect of β-IFN on reaching sustained EDSS 6 for MS patients from British Columbia, Canada (1995-2008).

Truncation      Estimated weights                 Treatment effect estimate
percentiles‡    Mean (log-SD)    Min - Max        HR      SE†     95% CI†
None            1 (2.45)         0.01 - 753.47    1.36    1.41    0.18 - 10.4
(5, 95)         0.31 (-1.24)     0.04 - 0.93      1.11    0.32    0.64 - 1.95
(10, 90)        0.3 (-1.29)      0.05 - 0.83      1.13    0.31    0.66 - 1.95
(25, 75)        0.21 (-2.2)      0.09 - 0.35      1.17    0.25    0.77 - 1.76
Median§         0.19 (-Inf)      0.19 - 0.19      1.29    0.23    0.91 - 1.82

log-SD, logarithmic transformation of standard deviation; Min, minimum; Max, maximum; CI, confidence interval; HR, Hazard ratio.
† Based on 500 nonparametric bootstrap samples.
‡ Truncation means the extreme weights (determined by the selected percentile range) are replaced by the nearest percentile weight value.
§ Weighting by the median of the weights gives the same estimate and CI as obtained from the simple baseline covariate adjusted Cox model (see Table 2.3).

As can be seen from Figure 2.4 (c), truncation of the 5% smallest and largest of the w(n) freed the curve from the excess influence of a few extreme weights (following the convention in [55]). In this application, the adjusted survival curves did not change dramatically with greater truncation (see Figure 2.4 (d)-(f)). Note that some studies do not truncate the smaller weights as truncating such weights generally does not lead to substantial changes in the effect estimates [121].

The magnitude of variability in the weights w(n) affected not only the adjusted survival curve, but also the CI for the causal effect obtained from the w(n) weighted MSCM. The CI (95% bootstrap CI 0.18 - 10.4; see Table 2.1) was wider than that obtained with sw(n), even though the two causal effect estimates were the same (HR 1.36). As before, truncation of the extreme weights was examined as another sensitivity analysis to increase the precision of the causal estimate [55].
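The truncation rule described in the footnote to Table 2.5 (extreme weights replaced by the nearest percentile value) can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the thesis: the function names are hypothetical, the percentile is computed by linear interpolation between order statistics (an assumed convention, since the text does not state one), and the example weights are made up.

```python
def percentile(sorted_vals, pct):
    """Percentile by linear interpolation between order statistics.

    The interpolation convention is an assumption; other software may
    use a slightly different definition.
    """
    if not sorted_vals:
        raise ValueError("need at least one weight")
    pos = (len(sorted_vals) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def truncate_weights(weights, lower_pct, upper_pct):
    """Replace weights outside [lower_pct, upper_pct] by the nearest cut-off."""
    ordered = sorted(weights)
    lo_cut = percentile(ordered, lower_pct)
    hi_cut = percentile(ordered, upper_pct)
    return [min(max(w, lo_cut), hi_cut) for w in weights]


# Hypothetical weights with one extreme value.
example = [0.05, 0.2, 0.3, 0.4, 50.0]
print(truncate_weights(example, 25, 75))   # extremes pulled in to the quartiles
print(truncate_weights(example, 50, 50))   # every weight equals the median
```

Truncating at (50, 50) sets every weight to the median, which is why the "Median" row of Table 2.5 has identical minimum and maximum weights.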
Truncating the 5% smallest and largest of the w(n) had a substantial impact in this application: the CI shrank to 0.64 - 1.95 (see Table 2.5). Table 2.5 shows that despite improving the precision of the estimate of the β-IFN treatment effects, this ad-hoc truncation approach did not alter the conclusion concerning the causal effect of β-IFN on time to sustained EDSS 6.

2.4 Discussion

Using an IPTC-weighted MSCM approach to explore the impact of β-IFN on MS disability progression in the ‘real-world’ clinical practice setting, we did not find a significant association between β-IFN exposure and disability progression.

The possibility that the cumulative number of (prior) relapses may represent a time-dependent confounder lying on the causal path between β-IFN and disability progression led us to propose this MSCM approach [122]. From the analysis, it was evident that the cumulative relapse count in the previous two years was an important factor in the weight models. This highlights the importance of controlling for this type of time-dependent confounder and justifies the additional complexity of the MSCM approach. A further advantage of using such models is the ability to adjust for potentially informative censoring.

Even though an extended follow-up period is essential to adequately capture the potential effects of treatment on disease progression for chronic diseases such as MS, the duration of follow-up may vary considerably from patient to patient in observational settings. This feature of the data poses considerable challenges when applying the MSCM approach, especially when trying to obtain suitable weights. Over time, treatment exposure as well as other patient characteristics (e.g., age, disease duration, occurrence of relapses) change, further contributing to the complexity of the study design. To account for these changes, the weights at a given time point need to be obtained by combining weights for each previous time period in a multiplicative manner. For patients with an extended follow-up, this may cause estimated weights for later periods to increase dramatically and the overall mean weights for these periods to deviate far from one. Also, as follow-up progresses, the decreasing number of patients ‘at risk’ may further contribute to high variability in the weights. Deviation from a mean of one (for the stabilized versions of the weights) at any time point is an indication of possible weight model misspecification, whereas highly variable weights may decrease the precision of the causal effect estimate [55]. Furthermore, in the presence of very large weights, near nonpositivity may result in a biased and imprecise estimate of the treatment effect [110, 123]. The large variability in follow-up periods of the MS patients prompted us to investigate the choice of appropriate weighting schemes for MSCMs.

Stabilization of the weights is generally advocated to decrease weight variation, and hence increase the precision of MSCM estimates [50]. However, the performance of these weights in the chronic disease context has not been well studied. Here we noted that as the observation period increased, so did the upward trend of the weights. Even though the normalized weights (sw(n)) generally possess desirable properties irrespective of the follow-up period length [57], we could find no application of these newly proposed weights to the chronic disease context in the published literature. Application of sw(n) completely eradicated the upward trends, in turn producing an effect estimate with slightly higher precision compared to the other weighting schemes, suggesting the potential utility of such weights in studies with longer follow-up.

Adjusting for the time-dependent confounder ‘cumulative relapses’ via IPTC weighting (sw(n)) moved the estimated effect of β-IFN treatment (HR 1.36) away from the null compared to the unweighted Cox model (HR 1.29). The corresponding 95% bootstrap CIs from the MSCM analyses were wider than the 95% robust CIs of the unweighted Cox model, appropriately reflecting more uncertainty as a consequence of using estimated weights. The effect estimates were consistent for the various approximations of MSCM models that we considered; none provided evidence of a significant benefit of β-IFN exposure on disease progression.

We also explored the application of other weighting schemes, such as the normalized unstabilized weights w(n). Using these weights, we constructed IPTC weighted adjusted survival curves. These curves serve as sensitivity analyses as their results are independent of fitting any MSCM. However, unstable survival estimates were produced as a result of a few very large weights. Moreover, as expected, use of the unstabilized weights resulted in larger SEs of the MSCM estimators than those obtained from the stabilized versions. The ad-hoc strategy of truncating extreme weights produced more stable survival curves and increased the precision of the MSCM estimate based on w(n). Truncation at the 5% level was enough to produce quite stable and smooth survival curves, as well as w(n) based MSCM estimated SEs comparable to those based on sw(n).

This study has limitations. In order to make a causal interpretation from the MSCM results, identifiability conditions such as positivity, consistency, conditional exchangeability and correct MSCM model specification are required [55], most of which are untestable assumptions.
In addition, assuming the IPTC weight models were correctly specified, truncation of the most extreme weights might have introduced bias into the β-IFN effect estimates, reflecting the fundamental ‘bias-variance trade-off’ [55]. Our assessment of disease progression was based on the EDSS which has recognized limitations [124] and may not be able to tease out differences due to natural aging versus MS disability. Also, one could consider EDSS as another time-dependent confounder. Our sensitivity analysis implementing this (based on imputed missing EDSS values) substantially moved the estimated HR towards the null (HR 1.13; 95% CI 0.76 - 1.68), considerably weakening the suggestion from the main analysis of an adverse effect of treatment. The near-significant point estimate (HR 1.36, 95% CI 0.95 - 1.94) from the main results may therefore be due to residual confounding. Although we considered important confounders, residual confounding due to unmeasured covariates (both baseline and time-dependent) is still possible. Potential limitations of the observational study design to assess the association between β-IFN and disease progression are similar to those described elsewhere [92].

In summary, use of the Cox model alone may be inadequate to handle the challenges of analyzing longitudinal observational data. The use of such tools may partly explain the seemingly inconsistent findings regarding the effectiveness of β-IFN on disability progression in the ‘real-world’ MS clinical practice setting [91, 92]. Here, we carefully implemented the MSCM analysis to adjust for potential indication bias and related changes in patient characteristics which might influence the subsequent treatment decisions. Our analyses did not find any association between β-IFN exposure and the time to developing sustained EDSS 6 over the follow-up. Even though different approaches were used here, our conclusions are consistent with those of other studies [92, 125].
Furthermore, none of the sensitivity analyses in the current study changed our conclusion regarding the causal effect of β-IFN on disease progression. The consistency of the results from all of our MSCM analyses strengthens our confidence in the findings. The methods implemented here are adaptable to chronic disease settings beyond MS.

Figure 2.4: IPTC weight adjusted Kaplan-Meier-type survival curves for the effect of β-IFN on time to reaching sustained EDSS 6 for multiple sclerosis (MS) patients from British Columbia, Canada (1995-2008). The truncated weights are derived from the normalized unstabilized IPTC weights (w(n)) so that the survival probabilities and HRs are marginal estimates with causal interpretation. Panels: (a) Unweighted; (b) Adjusted by w(n) (untruncated); (c) Adjusted by 5% truncated w(n); (d) Adjusted by 10% truncated w(n); (e) Adjusted by 25% truncated w(n); (f) Adjusted by 50% truncated w(n).

Chapter 3

The Performance of Statistical Learning Approaches to Construct Inverse Probability Weights in Marginal Structural Cox Models: A Simulation-based Comparison

3.1 Introduction

Marginal structural Cox models (MSCMs) [42, 50, 58] provide a popular approach to estimate the causal effect of time-dependent treatment from non-experimental survival data in the presence of time-dependent confounders. As discussed in Chapter 1, these models are based on the potential outcome notion of causality. In Chapter 2, we have seen that inverse probability weights (IPWs) play a key role in the MSCM approach. As with survey sampling weighting, IPWs redistribute the population by creating a pseudo-population so that the biasing effect of time-dependent confounding variables that influence the future treatment decision is removed and the association between outcome and treatment becomes unconfounded. The validity of MSCM results based on non-experimental data depends on identifiability conditions, such as exchangeability, positivity, consistency and all models being correctly specified [55]. If these identifiability conditions hold, the resulting treatment-outcome association measures from the MSCM analysis possess a causal interpretation.

For randomized experiments, weights are usually known. Since the weights are not generally known in observational studies, we need to estimate them from the observed data. Estimation of IPWs is central to MSCM. As observed in point-treatment studies (treatment intervention occurring at a single time point in the study), MSCM estimates are highly sensitive to weight model misspecification [54]. Similar patterns are evident in longitudinal studies with moderate numbers of time periods (measurements during up to three time periods) [126, 127]. Hence, for the correct estimation of the causal parameter, the weights need to be estimated as accurately as possible. As the weights are calculated based on the product of propensity score-based estimates at each time period, longer follow-up makes weight estimation more challenging [128]. The search for techniques capable of robust estimation of IPWs is of considerable current interest [127, 129–131].

The use of logistic regression to model the exposure status is the most popular IPW estimation approach. General guidelines for IPW estimation via logistic regression are stated in the MSCM literature [55]. As estimated propensity scores (see Appendix §B.1) are subsequently utilized to create the weights for the MSCM [42, Appendix.1], findings in the propensity score literature could be valuable in the MSCM context. Propensity scores estimated from statistical learning techniques have sometimes been found to improve covariate balance compared to those estimated from logistic regression models [132, 133]. These methods provide predictions based on a relationship obtained from learning algorithms such as bootstrap aggregation (bagging), support vector machine (SVM) and boosting [134–139]. These statistical learning methods seem promising in estimating IPWs with better properties in longitudinal settings, as hypothesized by some researchers [135, 140, 141]. However, as the implementation, mechanism and interpretation of the propensity scores and MSCM approaches differ considerably, it is not immediately clear if the lessons from the propensity score literature will directly apply in the MSCM context [126].

The performance of the proposed statistical learning algorithms (bagging, SVM, boosting) has not been investigated in the context of MSCMs in the longitudinal setting. As the true weights cannot be known from observational data analysis, we need to resort to simulation to evaluate the utility of the various IPW estimation methods. Young et al. [56, 142] suggested a simulation scheme that satisfies the sufficient conditions of inverse probability weighting of the MSCM [39] and other similar methods [37, 143]. Their scheme was used and further described by subsequent simulation studies [57, 114] and later discussed elsewhere [103, 144]. Using their data generation procedure, we compare the performance of MSCMs using these proposed IPW estimation methods.

This chapter is organized as follows. In the next section, we describe the notation and the IPW estimation methods used to obtain estimates from the MSCM. The following section describes the design of the simulation study, the corresponding model specification and the metrics used for evaluating the performance of the various IPW estimation approaches. We then summarize and compare the resulting MSCM estimates. After that, we present MSCM analyses on a retrospective cohort of multiple sclerosis (MS) subjects [92] (also see Chapter 2).
The chapter concludes with a discussion of the results, and the strengths and weaknesses of the current study.

3.2 Marginal Structural Cox Model (MSCM)

In §2.2.2, we described the notation of the MSCM in terms of time $t$. In this section, we consider a fixed set of time intervals. Consider a hypothetical longitudinal study where the measurements are taken at intervals $m = 0, 1, 2, \ldots, K$. Let $t_0 = 0$ be the time of the baseline visit and $L_0$ be the covariates measured at baseline. Suppose that follow-up continues to the exact failure time $T$. During the $m$-th time interval $[t_m, t_{m+1})$, binary treatment status $A_m$ is measured immediately after recording the value of a binary covariate ($L_m$) in the same interval. Here, $A_m = 1$ if the subject is treated in the $m$-th interval and $A_m = 0$ otherwise. Similarly, $L_m = 1$ if the covariate is present for the subject in the $m$-th interval and $L_m = 0$ otherwise. We let $\bar{A}_m = (A_0, A_1, \ldots, A_m)$ and $\bar{L}_m = (L_0, L_1, \ldots, L_m)$ be the observed treatment history and covariate history respectively through the end of interval $m$ and set $A_{-1} = L_{-1} = 0$. Consequently, $\bar{a}_m = (a_0, a_1, \ldots, a_m)$ and $\bar{l}_m = (l_0, l_1, \ldots, l_m)$ are the realizations of $\bar{A}_m$ and $\bar{L}_m$ respectively. After observing covariate histories until the $m$-th interval, we define $Y_{m+1} = I(T \le t_{m+1})$, the indicator of failure by $t_{m+1}$, and $\bar{Y}_{m+1} = (Y_0, Y_1, \ldots, Y_{m+1})$, the failure history through the end of interval $m+1$. By definition, subjects must be at risk at baseline, i.e., $Y_0 = 0$.

We denote a treatment regime by $\bar{a}_K = (a_0, a_1, \ldots, a_m, \ldots, a_K)$, a possible realization of $\bar{A}_K$. There are $2^{K+1}$ possible treatment regimes for a binary treatment, including $\bar{0}_K = (0, \ldots, 0)$ (never treated), $\bar{1}_K = (1, \ldots, 1)$ (always treated) and $(0, \ldots, 0, 1, \ldots, 1)$ (partly treated). Let the counterfactual failure time be $T_{\bar{a}_K}$ had a subject followed a (hypothetical) regime $\bar{a}_K$.
Then the counterfactual outcome history under the treatment regime is denoted by $\bar{Y}^{\bar{a}_K}_{K+1}$. Therefore, for each regime $\bar{a}_m$, we can define a MSCM as follows:

\[ \lambda_{\bar{a}_m}(m) = \lambda_{\bar{0}_m}(m) \exp\big(\gamma(m, \bar{a}_m, \psi)\big), \tag{3.1} \]

where the (causal) effect is indicated by a parameter vector $\psi = (\psi_1, \psi_2)$, $\gamma$ is a known function, and $\lambda_{\bar{a}_m}(m)$ and $\lambda_{\bar{0}_m}(m)$ are hazard functions for the counterfactuals $T_{\bar{a}_m}$ and $T_{\bar{0}_m}$ at time $t_m$. For the treatment regime $\bar{a}_m$, the causal hazard ratio relative to $\bar{0}_m$ is defined as $\lambda_{\bar{a}_m}(m)/\lambda_{\bar{0}_m}(m)$. A causal effect is present ($\psi_1 \neq 0$) if for any $\bar{a}_m$ ($m = 0, 1, \ldots, K$), $\lambda_{\bar{a}_m}(m) \neq \lambda_{\bar{0}_m}(m)$. The equality of the hazard functions for all $K + 1$ intervals is indicative of the absence of causal effect ($\psi_1 = 0$). We specify

\[ \gamma(m, \bar{a}_m, \psi) = \psi_1 A_m + \psi_2 L_0, \tag{3.2} \]

based on current treatment exposure [50]. See Appendix §B.2 for an extended definition of MSCM.

3.2.1 Estimation of $\psi_1$ from MSCM

MSCM is based on counterfactual theory. This requires creating a pseudo-population where the confounding due to the time-dependent confounder is removed from the relationship between outcome and treatment exposure. If the time-dependent confounder is a strong predictor of the treatment exposure for a patient in a given time-period, then that person-time contribution is down-weighted using IPWs. As discussed in §2.2.2, the stabilized version of the inverse probability of treatment weights is

\[ sw_{im} = \prod_{j=0}^{m} \frac{\mathrm{pr}(A_{ij} = a_{ij} \mid \bar{A}_{i,j-1} = \bar{a}_{i,j-1}, L_{i0} = l_{i0})}{\mathrm{pr}(A_{ij} = a_{ij} \mid \bar{A}_{i,j-1} = \bar{a}_{i,j-1}, L_{i0} = l_{i0}, \bar{L}_{ij} = \bar{l}_{ij})}; \]

see Appendix §B.3 for further details.

Simulation studies have shown that a MSCM fitted directly using the IPW weighted Cox proportional hazards model considerably reduces the variability of the estimated treatment effect [57] compared to approximate MSCM approaches, such as the IPW weighted pooled logistic regression approximation [50]. This is true even when both the direct and approximate MSCM approaches use the same weights.
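The stabilized weight defined above is a running product, over intervals, of the ratio of two fitted probabilities of the treatment actually received. A minimal Python sketch follows, assuming those per-interval probabilities have already been estimated by some weight model; the function names and the example numbers are hypothetical and are not taken from the thesis software.

```python
def prob_of_observed(p_treated, a):
    """Probability of the treatment actually received, given Pr(A = 1)."""
    return p_treated if a == 1 else 1.0 - p_treated


def stabilized_weights(num_probs, den_probs):
    """Cumulative product of numerator/denominator ratios for one subject.

    num_probs[j]: fitted probability of the observed treatment in interval j
                  given treatment history and baseline covariates.
    den_probs[j]: the same probability, additionally conditioning on the
                  time-dependent covariate history.
    Returns one weight per interval (sw_i0, sw_i1, ...).
    """
    if len(num_probs) != len(den_probs):
        raise ValueError("numerator and denominator must have equal length")
    weights = []
    running = 1.0
    for p_num, p_den in zip(num_probs, den_probs):
        if not 0.0 < p_den <= 1.0:
            raise ValueError("denominator probability must lie in (0, 1]")
        running *= p_num / p_den
        weights.append(running)
    return weights


# Hypothetical fitted probabilities over three intervals.
print(stabilized_weights([0.5, 0.5, 0.5], [0.25, 0.5, 1.0]))
# Setting every numerator to 1 gives the unstabilized weights.
print(stabilized_weights([1.0, 1.0, 1.0], [0.25, 0.5, 1.0]))
```

Because the weights multiply across intervals, a single small denominator early in follow-up inflates every later weight for that subject, which is the mechanism behind the extreme weights discussed in Chapter 2.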
It was also shown that when the event rate is more frequent, the weighted pooled logistic regression approximation leads to biased estimates [56]. Therefore, we fit the MSCM directly using the Cox model with IPWs to estimate $\psi_1$, as was done in Chapter 2. We use the robust sandwich standard error (calculated based on residuals and weights) [145, 146] to estimate the variance of the MSCM estimators [50, 57, 104].

3.2.2 Estimation Methods of IPWs

The following methods are used to estimate the IPWs:

Logistic regression. To estimate the IPWs, treatment status at each time point is modelled with respect to the covariates associated with the treatment decision. The predicted values from this logistic regression model are the most commonly used to generate treatment weights. Logistic regression is easy to understand and interpret, but violation of its assumptions leads to invalid inference.

We use the IPWs estimated from this approach as a baseline to compare the properties of IPWs estimated from other methods.

Bagging. Bagging is a statistical learning algorithm intended to increase the power of a predictive model [147, 148]. B bootstrap samples from the original dataset are used as B training sets. Predictive classification trees are grown on these training sets to make predictions for the original dataset. If the constructed trees are not pruned, predictions are generally associated with high variability but low bias due to possible over-fitting. The resulting B predictions are then aggregated and majority vote assigns a final predicted value for each treatment status (0 or 1). These predictions are generally associated with less variability and more accuracy than those obtained from using logistic regression [149, 150]. This is especially true when logistic regression provides unstable predictions, which is sometimes the case when estimating treatment weights [73].

In our simulations, we use B = 100 bootstrap replications.
As suggested in the propensity scores literature, 10-fold cross-validation is used in order to obtain better predictions [151].

Support vector machines. SVM is a highly flexible statistical learning algorithm that finds the optimal separating hyperplane (say, a straight line in 2 dimensions) giving the best separation between the treatment classes. The resulting separation rule places the binary classes (treated versus untreated) as far as possible from the hyperplane [149, 150, 152]. To facilitate finding a hyperplane that maximally separates the binary classes, SVM maps (transforms) the covariates into a higher dimensional space by using a kernel function. Even with noisy data, SVMs generally perform well in classification problems.

In our simulations, we use the polynomial kernel to fit the SVM. Available software routines enable us to obtain probability estimates or predictions from the SVM fit via internal cross-validation procedures [153].

Boosting. Boosting is a general approach for improving predictions, which can be applied in many regression and classification problems [134, 149, 150, 154]. Bagging and boosting work in a similar iterative way with one exception. Bagging uses bootstrap samples whereas boosting sequentially uses a modified version of the original data in each iteration. These modifications are based on the information obtained from the previous iterations. That is, in the boosting fitting procedure, in each successive iteration b = 1, 2, . . . , B, the algorithm places more weight on those treatment statuses that were misclassified in the previous iterations. Then the final predicted values of treatment status are obtained from an average (weighted by a shrinkage parameter, not a simple average as for the bagging approach) of the B predictions.
When classification trees are used in the process, this method inherits their flexible properties, while being able to capture complex interactions among covariates and nonlinear effects [155].

In this chapter, B = 1,000 trees are used in each fit, with maximum two-way interaction (d = 2) of all the covariates under consideration and shrinkage parameter set to 0.01.

3.2.3 IPW schemes

Having a smaller standard deviation (SD) is a desirable property for the weights [49, 55]. Even though the unstabilized weights w (see Appendix §B.3 equation (B.3)) produce consistent MSCM estimates, these weights are notorious for producing extreme values that can ultimately lead to inefficient estimates and impractical confidence interval widths [55, 57]. Generally, unstabilized weights w are associated with high variability. The variability of the weights is an important factor to consider because more variable weights may lead to more variability of the MSCM estimates. Stabilization (see Appendix §B.3; equation (B.6)) reduces the variability of the weights, while not affecting the consistency of the estimate of the MSCM parameter [50, 156]. Even after stabilization, we may observe a few extreme weights.

Several other ad-hoc suggestions have been proposed to further reduce the variability of the weights. Although they are practically useful, they do not have much theoretical justification (see Appendix §B.5). Weight truncation [55, 104, 157, 158] reduces excess variability in the weights. Normalization (see Appendix §B.3; equation (B.8)) is a popular survey sampling technique that found its way into the MSCM literature [57].
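Normalization can be illustrated with a short Python sketch. The exact formula is given in Appendix §B.3 (equation (B.8)), which is not reproduced here, so the version below rests on an assumption: each weight is divided by the mean weight of all subjects still at risk in the same interval, so that the normalized weights average to one within every interval. The function name and the record layout are hypothetical.

```python
from collections import defaultdict


def normalize_by_interval(records):
    """Divide each weight by the mean weight of its interval's risk set.

    records: list of (subject_id, interval, weight) tuples, one per
             person-interval in which the subject is still at risk.
    Returns a list of (subject_id, interval, normalized_weight).
    Assumed form of normalization; see the lead-in for the caveat.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, interval, weight in records:
        totals[interval] += weight
        counts[interval] += 1
    return [
        (subject, interval, weight * counts[interval] / totals[interval])
        for subject, interval, weight in records
    ]


# Hypothetical weights for two subjects over two intervals.
rows = [(1, 0, 1.0), (2, 0, 3.0), (1, 1, 2.0), (2, 1, 2.0)]
for row in normalize_by_interval(rows):
    print(row)
```

Under this assumed form the same function applies to unstabilized and stabilized weights alike, giving w(n) and sw(n) respectively.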
In this study, we compare various unstabilized, stabilized and normalized versions of IPWs. We also assess their characteristics under increased levels of truncation.

3.2.4 Fitting Weight Models to Estimate IPW

For generating unstabilized, stabilized and normalized versions of the inverse probability of treatment weights using logistic regression, we used the IPW generating formulas (equations 2, 5, 7 in Appendix §B.3). The denominator model for the unstabilized and stabilized weights included the lagged value of treatment status $A_{m-1}$, the follow-up month index $m$, the time-dependent covariate $L_m$, its lagged value $L_{m-1}$ and its interaction with lagged treatment status ($A_{m-1} \times L_m$) (for both equations 2 and 5 in Appendix §B.3). The numerator model for the stabilized weights included the lagged value of treatment status $A_{m-1}$ and the follow-up month index $m$ (for equation 5 in Appendix §B.3). We get normalized unstabilized and stabilized versions of the weights by normalizing the unstabilized and stabilized weights respectively (equation 7 in Appendix §B.3). The same list of covariates is used to generate the IPW using the bagging, SVM and boosting approaches. As the second order interaction depth of the covariates (d = 2) is selected for the boosting approach, explicitly specifying the interaction ($A_{m-1} \times L_m$) was not necessary. The software implementation details are described in Appendix §B.4.

3.3 Design of Simulations

MSCM is a popular tool to estimate the causal effects of time-varying treatments in the presence of time-varying confounders that are affected by previous treatment exposure. To study the properties of this method, we need to be able to simulate data from a MSM with specified parameter values so that we can evaluate how well MSCM performs. Several studies have simulated data from MSCMs under different conditions [56, 57, 102, 103, 114, 142, 144, 159, 160].
We apply a data generation process due to Young et al. [56] where both treatment status and confounder values are generated based on their lagged values. We briefly describe the data generation procedure here.

For this longitudinal follow-up study, let $i = 1, 2, \ldots, n$ be the subject index. At each time interval $[t_m, t_{m+1})$, values of a single time-dependent confounder $L_{im}$ and time-dependent treatment $A_{im}$ are sampled from a Bernoulli distribution with probabilities $p_L$ and $p_A$ respectively, where $p_L$ and $p_A$ are defined as follows:

\[ \mathrm{logit}(p_L) = \mathrm{logit}\,\Pr(L_m = 1 \mid A_{m-1}, L_{m-1}, Y_m = 0; \beta) = \beta_0 + \beta_1 I(T_{\bar{0}} < c) + \beta_2 A_{m-1} + \beta_3 L_{m-1}, \tag{3.3} \]

\[ \mathrm{logit}(p_A) = \mathrm{logit}\,\Pr(A_m = 1 \mid L_m, A_{m-1}, L_{m-1}, Y_m = 0; \alpha) = \alpha_0 + \alpha_1 A_{m-1} + \alpha_2 L_m + \alpha_3 L_{m-1} + \alpha_4 L_m \times A_{m-1}, \tag{3.4} \]

where $\alpha = (\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4)$ and $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)$. Here, $T_{\bar{0}}$ is the untreated counterfactual survival time and $c$ is an arbitrary cut-point used to generate the binary variable $I(T_{\bar{0}} < c)$. The sampling distributions of $L_m$ and $A_m$ both depend on their previous lagged values, i.e., $l_{m-1}$ and $a_{m-1}$. In particular, past treatment exposure status $A_{m-1}$ is a predictor of $L_m$, which then predicts future treatment exposure $A_m$. The confounding in the exposure-outcome relationship arises via the following path: $Y_{m+1} \leftarrow T_{\bar{0}} \rightarrow L_m \rightarrow A_m$ (see Figure 3.1). While generating treatment status in the next interval, we also include an interaction term between past treatment status $A_{m-1}$ and current confounder status $L_m$. This interaction $A_{m-1} \times L_m$ mimics the commonly occurring situation that both of these factors influence future treatment decisions.

Figure 3.1: Causal diagram depicting the dependencies in the marginal structural Cox model (MSCM) data generation algorithm.

The untreated counterfactual survival time (for the never-treated regime $\bar{0}_K \equiv \bar{0}$), $T_{i\bar{0}}$ for each person is sampled from an exponential distribution with constant hazard $\lambda_{T_{\bar{0}}}$.
The counterfactual survival time under a given regime $\bar{a}_m$, $T_{i\bar{a}_m}$, is calculated from the cumulative hazard $\int_0^{m+1} \lambda_{\bar{a}_j}(j)\,dj \equiv \sum_{j=0}^{m} \lambda_{\bar{a}_j}(j)$. At each step of the data generation procedure, this cumulative hazard $\int_0^{m+1} \lambda_{\bar{a}_j}(j)\,dj$ is updated based on the new $a_m$ value, accumulating the risk for the regime $\bar{A}_m = \bar{a}_m$. The counterfactual survival times $T_{\bar{0}}$ and $T_{\bar{A}_m}$ follow the same distribution if either $\psi_1 = 0$ or $A_m = 0$ for all $m$ [142, 160]. Therefore, the sampled $T_{i\bar{0}}$ (with hazard $\lambda_{i\bar{0}}$ for the never-treated regime $\bar{0}$) is compared with the calculated $T_{i\bar{a}_m}$ (with cumulative hazard $\int_0^{m+1} \lambda_{\bar{a}_j}(j)\,dj$ for the simulated regime $\bar{a}_m$) to determine whether the subject fails in the next interval, i.e., if the sampled $T_{i\bar{0}}$ is greater than the calculated $T_{i\bar{A}_m}$, then the failure indicator $Y_{i,m+1} = 0$; otherwise $Y_{i,m+1} = 1$.

3.3.1 Simulation Specifications

In observational epidemiologic studies of drug effectiveness, confounding by indication is a common problem. This is a specific type of confounding encountered when the allocation of the treatment $A$ is not random, and the physician's decision to assign a treatment to a particular subject is affected by factors such as the severity of disease, concurrent therapies, concomitant medical conditions $L_m$ (say, disease activity), or combinations of these conditions (e.g., interactions). To mimic this confounding by indication in the simulation, the treatment status at each stage $A_m$ is generated by the following factors: the previous therapy, $A_{m-1}$, the current and past medical conditions or symptoms, $L_m$ and $L_{m-1}$ respectively, and the interaction $A_{m-1} \times L_m$. We assume that being treated in the previous time-period ($A_{m-1} = 1$) positively stimulates ($\alpha_1 = 1/2 = 0.5$) a subject to continue treatment in the current period ($A_m = 1$), whereas occurrence of current and past symptoms ($L_m = 1$ and $L_{m-1} = 1$ respectively) positively encourages ($\alpha_2 = 1/2 = 0.5$, $\alpha_3 = \log(4) = 1.39$) a subject to take the treatment in the current period ($A_m = 1$) (in equation (3.4)). If a subject was under treatment in the previous period and the subject is currently suffering from symptoms (i.e., $A_{m-1} = 1$ and $L_m = 1$), both of these factors influence the subject positively ($\alpha_4 = \log(6/5) = 0.18$) to continue treatment in the current period ($A_m = 1$). In the absence of previous treatment ($A_{m-1} = 0$) and current or previous symptoms ($L_m = 0$ and $L_{m-1} = 0$), the subject is less likely ($\alpha_0 = \log(2/7) = -1.25$) to take the treatment in the current time-period ($A_m = 1$) and more likely to discontinue ($A_m = 0$). Therefore, the associated parameter vector in equation (3.4) is $\alpha = (\log(2/7), 1/2, 1/2, \log(4), \log(6/5))$.

In our simulations, the time-dependent confounder, $L_m$, is similarly generated by the previous treatment status $A_{m-1}$, the lagged time-dependent confounder $L_{m-1}$ and a binary confounder $I(T_{\bar{0}} \le c)$ associated with the counterfactual outcome $T_{\bar{0}}$. We assume that having a survival $T_{\bar{0}}$ shorter than a cut-point $c$ (i.e., $I(T_{\bar{0}} \le c) = 1$) puts a subject under an increased risk ($\beta_1 = 2$) of developing a symptom ($L_m = 1$) (in equation (3.3)). The cut-point is set to $c = 30$. Being treated in the previous time interval ($A_{m-1} = 1$) reduces a subject's risk ($\beta_2 = \log(1/2) = -0.69$) of developing a new symptom ($L_m = 1$). A subject who experienced a symptom in the previous period ($L_{m-1} = 1$) is also more likely ($\beta_3 = \log(3/2) = 0.41$) to develop a new symptom in the current time-period ($L_m = 1$). In the absence of previous treatment ($A_{m-1} = 0$) and previous symptoms ($L_{m-1} = 0$), a subject is less likely ($\beta_0 = \log(3/7) = -0.85$) to develop a new symptom ($L_m = 1$). Therefore, the associated parameter vector in equation (3.3) is $\beta = (\log(3/7), 2, \log(1/2), \log(3/2))$.
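To give a sense of scale, the coefficients above can be converted to probabilities with the inverse-logit function. The derivation below uses only the values of α and β stated in the text; the rounded probabilities are arithmetic added here for illustration and are not figures reported in the thesis.

```latex
\begin{align*}
\Pr(A_m = 1 \mid A_{m-1}=0, L_m=0, L_{m-1}=0)
  &= \frac{e^{\alpha_0}}{1+e^{\alpha_0}} = \frac{2/7}{1+2/7} = \frac{2}{9} \approx 0.22, \\
\Pr(A_m = 1 \mid A_{m-1}=1, L_m=1, L_{m-1}=1)
  &= \frac{o_A}{1+o_A}, \quad
  o_A = \tfrac{2}{7}\, e^{1/2}\, e^{1/2}\cdot 4 \cdot \tfrac{6}{5} = \tfrac{48}{35}\,e \approx 3.73,
  \quad\text{so the probability is} \approx 0.79, \\
\Pr(L_m = 1 \mid I(T_{\bar{0}} < c)=0, A_{m-1}=0, L_{m-1}=0)
  &= \frac{3/7}{1+3/7} = \frac{3}{10} = 0.30, \\
\Pr(L_m = 1 \mid I(T_{\bar{0}} < c)=1, A_{m-1}=0, L_{m-1}=0)
  &= \frac{(3/7)\,e^{2}}{1+(3/7)\,e^{2}} \approx 0.76 .
\end{align*}
```

The contrast between the last two lines shows how strongly a short untreated survival time drives the confounder, which in turn raises the chance of treatment.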
The true causal effect parameter in equations (3.1) and (3.2) is set such that the treatment is less hazardous (ψ1 = −0.5 on the log-hazard scale) to the subjects and therefore has a beneficial effect.

Note that the same set of covariates is used to generate the treatment in the data-generation algorithm (see equation (3.4)) and in the weight model fitting process, except for the follow-up index m. Using this follow-up or visit index variable m in the weight model allows us to estimate a separate intercept for each visit (say, month) [69]. Other flexible modelling choices (say, smoothing this index m) are also possible [50], but were not used in our weight model fitting.

To study the properties of the weight estimation procedures, we generate a large dataset with n = 25,000 subjects. In the simulations, we generate datasets with n = 2,500 subjects, each followed for up to m = 10 subsequent monthly visits as in previous studies [56, 57, 114]. To assess the small sample properties, these simulations are repeated for a smaller sample size (n = 300). The $T_{i\bar 0}$ values were sampled from an exponential distribution with a constant rate of λ0 = 0.01 events per month throughout the follow-up to mimic the rare disease condition. To mimic a more frequent event scenario, the rate is increased to a constant λ0 = 0.10 events per month throughout the follow-up. The Monte Carlo study consists of N = 1,000 generated datasets for each setting under consideration.
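The data-generating steps described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the thesis's own implementation: it assumes logistic models for Lm and Am with the parameter order given in the text, starting values A−1 = L−1 = 0, monthly intervals of unit length, and a failure rule applied on the cumulative hazard scale (the subject fails once the accumulated hazard under the simulated regime reaches λ0 times the sampled never-treated time). The function and variable names are hypothetical.

```python
import math
import random

# Parameter values quoted in the text; the model forms are assumptions.
ALPHA = (math.log(2 / 7), 0.5, 0.5, math.log(4), math.log(6 / 5))  # treatment model
BETA = (math.log(3 / 7), 2.0, math.log(1 / 2), math.log(3 / 2))    # confounder model
PSI1 = -0.5       # true log-hazard ratio of treatment
CUTPOINT = 30.0   # c in I(T0 <= c)


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_subject(rng, lambda0=0.01, n_visits=10):
    """Return person-month records (m, L_m, A_m, Y_{m+1}) for one subject."""
    t0 = rng.expovariate(lambda0)              # never-treated survival time
    short_survivor = 1 if t0 <= CUTPOINT else 0
    a_prev, l_prev = 0, 0                      # assumed starting history
    cum_hazard = 0.0
    rows = []
    for m in range(n_visits):
        p_l = expit(BETA[0] + BETA[1] * short_survivor
                    + BETA[2] * a_prev + BETA[3] * l_prev)
        l_cur = 1 if rng.random() < p_l else 0
        p_a = expit(ALPHA[0] + ALPHA[1] * a_prev + ALPHA[2] * l_cur
                    + ALPHA[3] * l_prev + ALPHA[4] * a_prev * l_cur)
        a_cur = 1 if rng.random() < p_a else 0
        # Accumulate the hazard for this month under the simulated regime.
        cum_hazard += lambda0 * math.exp(PSI1 * a_cur)
        # Failure in the next interval once the accumulated hazard
        # reaches the never-treated cumulative hazard at t0.
        y_next = 1 if lambda0 * t0 <= cum_hazard else 0
        rows.append((m, l_cur, a_cur, y_next))
        if y_next:
            break
        a_prev, l_prev = a_cur, l_cur
    return rows


def simulate_dataset(n, seed, lambda0=0.01, n_visits=10):
    """One simulated dataset: a list of per-subject record lists."""
    rng = random.Random(seed)
    return [simulate_subject(rng, lambda0, n_visits) for _ in range(n)]
```

Calling `simulate_dataset(2500, seed=1)` gives one dataset of the size used in the main simulation; repeating the call over 1,000 seeds would mirror the Monte Carlo structure described above.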
The pseudocode for our simulation design is provided in Appendix §B.6.

3.3.2 Performance Metrics

We assessed the performance of the various weighting schemes by the following measures:

• $\text{Bias} = \sum_{i=1}^{N} (\hat\psi_{1i} - \psi_1)/N$: the average difference between the N = 1,000 estimated parameters (log hazard ratios) from the MSCM and the true parameter.

• $\text{SD} = \sqrt{\sum_{i=1}^{N} (\hat\psi_{1i} - \psi_1')^2/(N - 1)}$, where $\psi_1' = \sum_{i=1}^{N} \hat\psi_{1i}/N$.

• $\text{MSE} = \sqrt{\sum_{i=1}^{N} (\hat\psi_{1i} - \psi_1)^2/N}$.

• Model-based SE: the average of the N = 1,000 estimated standard errors of the estimated causal effect from the MSCM.

• Coverage probabilities of model-based nominal 95% CIs: the proportion of the N = 1,000 datasets for which the true parameter is contained in the 95% CI.

3.4 Simulation Results

3.4.1 IPW Summary

Summaries of the (untruncated) weights calculated from the different approaches from one large simulated dataset with 25,000 subjects, each with up to 10 visits, are presented in Table 3.1. As expected, for each fitting approach, the mean and standard deviation are noticeably larger for the unstabilized weights w. Normalization is effective in reducing the variability of the unstabilized weights (i.e., w(n)), but stabilization (i.e., sw) is even better. Normalization of the stabilized weights (i.e., sw(n)) has little impact on the variability. Bagging results in a reduction in variability and SVM reduces the variability even further. With boosting, the variabilities of the unstabilized weights (w and w(n)) increase slightly compared to those of the corresponding weights from SVM. Surprisingly, the variabilities of the boosting stabilized weights (i.e., sw and sw(n)) increase more than 5-fold compared to those from SVM.

As expected, the effect of increased levels of truncation is monotone in reducing the variability of IPWs generated from all approaches (see Appendix §B.7: Appendix Tables B.1-B.4).
When the weight variability is already small, truncation has less of an effect on variability reduction.

These data are generated under the rare disease condition (λ0 = 0.01 on a monthly scale; the corresponding event rate is 0.010 in this dataset). The event rate becomes as frequent as 0.075 when the parameter λ0 is increased to 0.10 on a monthly scale under the same data generating conditions.

Table 3.1: Summaries of the (untruncated) weights estimated by different methods (l = logistic, b = bagging, svm = SVM, gbm = boosting) under different weighting schemes (w = unstabilized, w(n) = unstabilized normalized, sw = stabilized, sw(n) = stabilized normalized) from the simulation study with a large (25,000) number of subjects, each with up to 10 visits, under the rare event condition.

Weight        Min.    Q1  Median    Mean      Q3      Max.      sd   p >20  p >100
l − w         1.21  3.96   17.82  189.70   98.09  12780.00  666.15   90.38   47.23
l − w(n)      0.01  0.22    0.58    1.00    1.22     13.99    1.37    0.00    0.00
l − sw        0.33  0.79    0.94    1.00    1.19      2.54    0.34    0.00    0.00
l − sw(n)     0.32  0.78    0.94    1.00    1.18      2.48    0.33    0.00    0.00
b − w         1.28  4.08   18.62  195.80  101.50   8990.00  641.09   96.09   49.35
b − w(n)      0.01  0.26    0.66    1.00    1.26     12.56    1.31    0.00    0.00
b − sw        0.36  0.92    0.98    1.00    1.06      1.99    0.19    0.00    0.00
b − sw(n)     0.35  0.92    0.98    1.00    1.06      1.95    0.19    0.00    0.00
svm − w       1.35  4.50   20.25  161.40  100.70   6568.00  466.04   80.87   40.52
svm − w(n)    0.03  0.34    0.72    1.00    1.31      8.33    1.10    0.00    0.00
svm − sw      0.76  0.95    1.01    1.00    1.06      1.22    0.08    0.00    0.00
svm − sw(n)   0.76  0.95    1.01    1.00    1.05      1.22    0.08    0.00    0.00
gbm − w       1.24  3.56   16.09  163.60   90.74   6441.00  477.65   75.26   38.22
gbm − w(n)    0.01  0.23    0.60    1.00    1.22      8.86    1.29    0.00    0.00
gbm − sw      0.21  0.77    0.93    0.99    1.10      3.41    0.42    0.00    0.00
gbm − sw(n)   0.21  0.77    0.94    1.00    1.11      3.45    0.42    0.00    0.00
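The four weighting schemes summarized in Table 3.1 are defined in Appendix §B.3, which is not reproduced in this section. The sketch below therefore rests on assumed, commonly used forms: the unstabilized weight is the cumulative product of inverse probabilities of the treatment actually received, the stabilized weight replaces the numerator 1 by a probability that conditions on treatment history only, normalization divides each weight by the mean weight among subjects still at risk at the same visit, and truncation resets weights outside the p-th and (100 − p)-th percentiles to those percentiles. All function names are hypothetical, and the fitted probabilities are taken as given inputs.

```python
from statistics import mean


def cumulative_weights(p_denominator, p_numerator=None):
    """Cumulative weights over one subject's visits.

    p_denominator[j]: fitted probability of the treatment actually received
    at visit j given treatment and confounder history.
    p_numerator[j]: the same probability given treatment history only;
    None gives the unstabilized weight (numerator 1).
    """
    weights, w = [], 1.0
    for j, p_den in enumerate(p_denominator):
        p_num = 1.0 if p_numerator is None else p_numerator[j]
        w *= p_num / p_den
        weights.append(w)
    return weights


def normalize_by_visit(weights_by_subject):
    """Divide each weight by the mean weight of the risk set at that visit."""
    n_visits = max(len(w) for w in weights_by_subject)
    visit_means = [mean(w[m] for w in weights_by_subject if len(w) > m)
                   for m in range(n_visits)]
    return [[w[m] / visit_means[m] for m in range(len(w))]
            for w in weights_by_subject]


def percentile(sorted_values, q):
    """q-th percentile (0 to 100) by linear interpolation."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def truncate(weights, pct):
    """Reset weights below the pct-th or above the (100 - pct)-th percentile."""
    ordered = sorted(weights)
    lower, upper = percentile(ordered, pct), percentile(ordered, 100 - pct)
    return [min(max(w, lower), upper) for w in weights]
```

Under this symmetric truncation rule, 50% truncation sets every weight to the median, so the weighted analysis no longer adjusts for the time-dependent confounder through the weights.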
3.4.2 Comparing IPW Estimation Approaches

Results of the simulation using 1,000 datasets (each with n = 2,500) under the rare event condition (λ0 = 0.01 on a monthly scale) are shown in Figures 3.2 - 3.6.

Figure 3.2 shows the bias pattern in estimating the MSCM parameter ψ1 = −0.5 when the IPWs are estimated using the four different approaches. The untruncated weights generated using logistic regression and boosting successfully estimate the parameter ψ1, while bagging introduces some bias in estimating ψ1, and SVM yields even more bias. This description is valid for all the weighting schemes (w, w(n), sw and sw(n)). Under increased levels of truncation, a clear pattern is visible: as expected, increasing the level of truncation increases the bias. SVM clearly does worse than logistic regression in terms of bias. Bagging does better than SVM, but clearly not as well as logistic regression. Boosted regression performs as well as logistic regression and, for the stabilized cases (sw, sw(n)), slightly better. In general, the biases of all IPW estimation approaches agree at 50% truncation. As theory suggests, this common value corresponds to the bias obtained from a baseline-adjusted analysis [55].

Figure 3.3 shows the pattern of variability (SD) of the causal effect estimates from the MSCM. This figure shows that the SDs of ψ̂1 for each set of weights under consideration are very similar for the same level of truncation. As expected, the unstabilized IPWs from all methods were associated with higher SDs. Normalizing the unstabilized IPWs reduced this variability. Stabilization was also effective in reducing variability. Normalization of the stabilized IPWs had little further effect. Figure 3.4 shows that, except for SVM, the average model-based standard errors (SE) of ψ̂1 were similar to the empirical SDs of ψ̂1.

Figure 3.5 summarizes the MSE patterns of the MSCM estimates. As the SDs from the different approaches were similar, differences in MSE were mainly due to differences in bias. The sharp decrease in MSE at low levels of truncation in most of the curves suggests that a low level of truncation might yield better estimates (compared to no truncation) in terms of MSE.

The coverage probabilities of model-based nominal 95% CIs for ψ1 are shown in Figure 3.6. The untruncated IPWs show good coverage when computed from the boosting or logistic regression approaches. As the bias increases and the SE decreases under increased levels of truncation, it is not surprising that the coverage probability decreases sharply as the level of truncation increases. IPWs calculated from boosting yield coverage as good as those from logistic regression. With stabilization, boosting yields slightly better coverage than logistic regression. The bagging approach does not perform well in terms of coverage probability and the performance of SVM is even worse.

3.4.3 Properties From Smaller Samples

The corresponding results from the simulation for n = 300 appear in Appendix Figures B.1-B.5 in Appendix §B.8.1. The bias is slightly larger with this smaller sample size, though the patterns are similar to those in the n = 2,500 case (Appendix Figure B.1). As expected, the SDs of ψ̂1 are much higher in all settings compared to the n = 2,500 case (Appendix Figure B.2). Except for SVM, the patterns of average SE are similar to the SDs of ψ̂1 (Appendix Figure B.3). In this smaller sample case, bias and variance are both larger, affecting the MSE (Appendix Figure B.4), and the patterns of the MSE curves differ from the n = 2,500 case. Except for the unstabilized weights in the smaller sample case, MSE increases with higher levels of truncation. For n = 2,500 we observe a sharp drop below 5% truncation and then an upward trend, whereas for n = 300 the drop continues to around 10% truncation. This suggests that levels of truncation up to 10% might be beneficial in obtaining MSCM estimates from such smaller samples.
The coverage probabilities of the model-based nominal 95% CIs are always less than 90% for the unstabilized weights (Appendix Figure B.5). However, when IPWs are either normalized or stabilized, or both, the coverage probabilities at all low levels of truncation are at least 90%, except for the SVM approach. However, the coverage probabilities of even these weights never reach the nominal 95% level.

3.4.4 When More Events are Available

When this simulation is repeated with n = 2,500 but with λ0 = 0.10 instead of λ0 = 0.01 on a monthly scale, the level of bias is substantially reduced (Appendix Figure B.6 in Appendix §B.8.2 compared to Figure B.1). Bagging results in more bias than logistic regression and SVM is still worse. Boosting performs as well as logistic regression. The SDs of ψ̂1 are much lower in all settings compared to the rare event scenario (Appendix Figure B.7). The patterns of average SE are similar to the SDs of ψ̂1 (Appendix Figure B.8, compared with Appendix Figure B.7). In the rare event scenario, for both the n = 2,500 and n = 300 cases, the average SEs are generally larger than the SDs of ψ̂1 when using SVM. However, this discrepancy is not as severe in the frequent event scenario. The almost constant MSEs with increased levels of truncation in this scenario suggest that truncation may not be very helpful in improving MSCM estimates in terms of MSE, except for the unstabilized weights (Appendix Figure B.9). However, normalization, stabilization or both are still effective in obtaining estimates with lower MSE. The coverage probabilities of model-based nominal 95% CIs obtained from untruncated weights in this scenario are not very different from those in the rare event scenario (Appendix Figure B.10). However, the coverage probabilities do not decrease nearly as quickly with higher levels of truncation. Even with 50% truncation, the coverage probabilities are close to 75% (as opposed to close to 0% in the rare event setting).
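The comparisons in §3.4.2 to §3.4.4 rest on the performance measures listed in §3.3.2. A minimal sketch of how those measures could be computed from the N Monte Carlo estimates is given below. It keeps the square root in the MSE as written in §3.3.2, and it assumes Wald-type intervals of the form ψ̂1 ± 1.96 × SE for the coverage calculation; the function and argument names are hypothetical.

```python
import math


def performance(estimates, std_errors, true_value, z=1.96):
    """Summarize Monte Carlo estimates of a log hazard ratio."""
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    bias = sum(e - true_value for e in estimates) / n
    # Empirical SD around the Monte Carlo mean, with divisor N - 1.
    sd = math.sqrt(sum((e - mean_estimate) ** 2 for e in estimates) / (n - 1))
    # MSE around the true value, on the root scale as in the text.
    mse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    model_se = sum(std_errors) / n
    # Share of assumed Wald intervals that contain the true value.
    covered = sum(1 for e, s in zip(estimates, std_errors)
                  if e - z * s <= true_value <= e + z * s)
    return {"bias": bias, "sd": sd, "mse": mse,
            "model_se": model_se, "coverage": covered / n}
```

For example, `performance(estimates, std_errors, true_value=-0.5)` would summarize one weighting scheme at one truncation level, given the 1,000 estimates and robust standard errors from that setting.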
3.4.5 Computational Time

The computational time for running the R process for each IPW generating approach (for estimating the unstabilized weights) using a dataset with 300 subjects having up to 10 visits is reported in Table 3.2 for a Windows 7 64 bit machine.

Table 3.2: Time required to compute IPWs using various approaches

IPW estimation approach   Time (in seconds)
Logistic regression                    0.02
Bagging                                5.12
SVM                                    3.05
Boosting                              50.50

[Figure 3.2: Bias of MSCM estimate ψ̂1 under different IPW estimation approaches when the large weights are truncated with increased levels in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times. Panels: Unstabilized, Normalized unstabilized, Stabilized, Normalized stabilized. x-axis: weight truncation percentile; y-axis: bias. Series: logist, bag, svm, gbm.]

[Figure 3.3: Empirical standard deviation of MSCM estimate ψ̂1 under different IPW estimation approaches when the large weights are truncated with increased levels in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times. Panels and series as in Figure 3.2. x-axis: weight truncation percentile; y-axis: sd.]

[Figure 3.4: Average model-based standard error of MSCM estimate ψ̂1 under different IPW estimation approaches when the large weights are truncated with increased levels in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times. Panels and series as in Figure 3.2. x-axis: weight truncation percentile; y-axis: model.sd.]

[Figure 3.5: Mean squared error of MSCM estimate ψ̂1 under different IPW estimation approaches when the large weights are truncated with increased levels in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times. Panels and series as in Figure 3.2. x-axis: weight truncation percentile; y-axis: mse.]

[Figure 3.6: The coverage probability (cp) of model-based nominal 95% confidence intervals based on the MSCM estimate ψ̂1 under different IPW estimation approaches when the large weights are truncated with increased levels in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times. Panels and series as in Figure 3.2. x-axis: weight truncation percentile; y-axis: cp.]

3.5 Empirical Multiple Sclerosis Application

We apply the methodologies described in this chapter to the British Columbia MS cohort (1995-2008) described in §2.2.1. The β-IFN exposure is defined as a time-dependent variable Am, measured on a monthly basis. We assess the impact of β-IFN on time to reach irreversible disability progression (survival outcome) in the real-world clinical practice setting. From the follow-up between July 1995 and December 2004, 1,697 patients are included in the study, 829 of whom never receive the β-IFN treatment. Among the 6,890 person-years of follow-up, 2,530 person-years are β-IFN exposed. Ultimately, 138 subjects reached irreversible disability, measured by sustained EDSS 6. Appendix §A.6 describes the baseline characteristics.

As discussed in Chapter 2, MSCMs are an appropriate choice of model to adjust for the time-dependent confounder Lm (cumulative relapses) and the baseline confounders L0 (age, sex, disease duration and EDSS score). IPWs are estimated using the following methods: logistic regression, bagging, SVM and boosting. The resulting estimates of exp(ψ1) and the corresponding robust standard errors are compared.

As the stabilized normalized weights sw(n) performed well in the simulation, we used sw(n) generated from the different IPW estimation approaches as the MSCM weights in our analyses (equation (B.8) in Appendix §B.3). We first calculate treatment weights swT using the general inverse probability of treatment weight model (equation (B.6) in Appendix §B.3). In all IPW estimation methods, the numerator model included the baseline covariates L0 (EDSS score, age, disease duration, sex), the lagged treatment status Am−1, and the follow-up month index m (equation (B.7) in Appendix §B.3). The denominator model included the numerator model covariates as well as the time-dependent covariate Lm ('cumulative number of relapses for last 2 years') and its interaction with the lagged treatment status (Am−1 × Lm) (equation (B.5) in Appendix §B.3). Since this was an observational study and artificial or non-random censoring may be present, we also need to calculate censoring weights swC [55]. Setting censoring status as the dependent variable, the same numerator and denominator covariate specifications as in the treatment weight model were used to generate the inverse probability of censoring weights. Multiplying the treatment and censoring weights yields the IPWs sw = (swT × swC), and we normalize sw to get sw(n) (equation (B.8) in Appendix §B.3). The sw(n)-weighted MSCM further adjusts for the baseline covariates (equation (B.2) in Appendix §B.2). We also assessed the impact of increased levels of weight truncation.

[Figure 3.7: Performance of stabilized normalized weights estimated from different IPW estimation approaches for MSCM analysis in a multiple sclerosis study. Panels: Hazard Ratio and se(Hazard Ratio), each plotted against the weight truncation percentile. Series: logist, bag, svm, gbm.]

Figure 3.7 shows the estimated hazard ratio and corresponding robust standard error from the fitted MSCMs.
IPWs generated using SVM and boosting methods show fairly similar results in terms of the estimated hazard ratio HR = exp(ψ̂1) and its robust standard errors. HR estimates based on logistic IPWs are associated with higher robust standard errors. The results from bagging do not change much under increased levels of truncation. In fitting weight models using the bagging approach, we used B = 100. As B = 100 may be an inadequate number of bootstrap replicates to stabilize the misclassification error rates in the bagging approach, we repeated the analysis with B = 1,000. However, this did not have much impact on the MSCM estimates (data not shown). Appendix Tables B.5-B.8 in Appendix §B.9 summarize the stabilized normalized weights generated from the four approaches, the corresponding hazard ratio estimates and the confidence intervals in more detail. As shown in those tables, the IPWs are well-behaved (mean approximately 1 and small SD) and none of the analyses yield strong evidence of an association between β-IFN exposure and time to reaching a sustained EDSS 6. Except for bagging, standard errors decrease with higher levels of truncation. For the bagging approach, estimates are close to those obtained from the baseline-adjusted analysis under all truncation levels. The performance of the IPW estimation methods in estimating ψ1 is also provided on the log-hazard scale (Appendix Figure B.11 in Appendix §B.9). The patterns of ψ1 in this figure are similar to those of HR in Figure 3.7. The differences in the SEs of ψ̂1 from the different approaches are very small.

Based on the simulation results, we know that when IPWs are estimated from the boosting approach, the corresponding MSCM estimates are better than, or at least similar to, those from the logistic regression approach.
Therefore, based on our simulation results, the use of boosting as an IPW estimation method serves as an excellent sensitivity analysis.

3.6 Discussion

The MSCM approach is built on counterfactual theory, where a pseudo-population is built based on IPWs. The confounding due to the time-dependent confounder is removed from the relationship between outcome and treatment exposure in this pseudo-population. Estimation of IPWs is essential in the process.

The probability of receiving treatment given the covariates is known as the propensity score. IPWs can be thought of as an extension of propensity scores to longitudinal studies in which treatment is time-dependent [161]. Typically, as with propensity scores, the IPWs are estimated using logistic regression models. Assessment of the assumptions of the corresponding logistic regression fits is rarely seen in the MSCM literature. In fact, many analyses involving MSCMs do not report important IPW summaries adequately [51, 52]. Alternative modelling strategies, such as statistical learning methods, have been explored in the propensity score literature. These strategies achieve the same goal of obtaining covariate balance but require fewer assumptions. As propensity scores and MSCM estimation are quite different, we investigated whether the success of these alternative methods in propensity score modelling generalizes to the MSCM context.

Various MSCM data generating algorithms have been proposed in the literature. We used one of them [56] to assess the performance of the MSCMs. To make the data generating process more realistic for many disease settings, we included an interaction term between previous treatment status and current state of the confounder in deciding the next period treatment assignment. This data generating process was used to assess three settings: (i) large and (ii) small sample sizes in the rare event scenario and (iii) large sample sizes when the event rate is more frequent.
We estimated the causal effect parameter ψ1 with an MSCM. As we know the values of the parameters generating the data, we can assess the performance of the IPW estimating methods under consideration.

We also evaluated the performance of the ad-hoc variability reduction techniques for IPWs (such as stabilization, normalization, truncation and their combinations). We found that normalized weights perform better than unnormalized weights. When applied to unstabilized weights, the improvement is noticeable in terms of bias, SD, MSE and confidence interval coverage. However, application of normalization to stabilized weights did not have much impact. This is the case even when truncation is applied. Depending on how many risk-sets are present in the study, the increased computational burden due to use of normalization might not be justified given the small gain. However, when application of unstabilized weights is desired [115, 162], normalization might be useful. A small level of truncation might also be helpful in such scenarios [128].

Among the methods under consideration, boosting estimates were associated with the least bias under the assumption of rare events. In the rare event scenario, stabilized IPWs estimated from boosting marginally outperformed the stabilized IPWs generated from logistic regression in terms of bias, MSE and coverage probability of ψ1 estimation. However, when the event rate is more frequent, no such advantage is apparent. Compared to the other methods under consideration, MSCM estimates computed using the boosting approach were always closest to those from logistic regression, but the computational burden associated with the boosting approach is much higher.

Bagging and SVM did not perform very well in our simulation context. One reason could be that estimation of the weights generally requires estimation of the probability of class membership (i.e., estimating the probability of being treated versus not). Bagging may not estimate the probability of class membership well and may result in boundary probabilities (i.e., close to 0 or 1) more often than expected [163]. SVM approaches attempt to find the optimal dividing hyperplane [164] instead of explicitly modelling the probability of getting treatment. Such properties might make these statistical learning approaches less desirable for estimating IPWs [135].

We implemented all these IPW estimation methods in an MS dataset to show the applicability of the proposed methods in practice. We estimated the effect of β-IFN on irreversible disability progression using an MSCM with stabilized normalized IPWs. Except for the bagging approach, hazard ratio estimates from the different IPW-adjusted analyses are fairly similar. As expected from our simulation results, HR estimates using IPWs estimated from the boosting approach were more precisely determined than those from logistic regression. Interestingly, SVM performed similarly to the boosting method. In this application, bagging failed to account adequately for the time-dependent confounder, as the resulting estimates were similar to those obtained from the baseline-adjusted analysis (where time-dependent confounding was not controlled). This was the situation under all the increased levels of truncation considered. The results from logistic regression, SVM and boosting are consistent with the previously reported estimates [128]. None of the methods resulted in a significant effect estimate.

This study has several limitations. In this simulation, the data generation algorithm used linearity in the logit specification, and the logistic regression model was correctly specified during the simulated data analysis. For these reasons, it is not surprising that the logistic regression model performs well. However, despite these advantages favoring logistic regression, the better performance of the boosting approach in the rare disease settings shows the utility of using this method.
Statistical learning methods generally work well with high dimensional data. In this simulation, the number of covariates considered for adjustment is limited. Baseline covariates are not used in the simulation and the time-dependent confounder under consideration is a binary variable. The treatment variable under consideration is binary. When we conducted the data analysis where baseline confounders were present, the results from SVM seem to be close to those from logistic regression, whereas bagging did not perform as well as the simulation suggests, especially at the untruncated and lower truncation levels. Further studies are required to assess the behaviour of these methods in the presence of baseline covariates. The better covariate balance that statistical learning approaches achieve when estimating propensity scores in point-treatment studies motivated us to use them for estimating IPWs in the MSCM context. However, it is not well established in the published literature how to generalize such balancing criteria when there are many time points in a longitudinal study and when multiple time-dependent covariates are present [129]. Future studies could investigate this issue. To reduce the computational burden, we utilized robust standard errors for ψ̂1 [57]. However, resampling methods, such as the bootstrap [165], may provide more reliable estimates of the standard error [55, 166]. The performance of these methods may be further enhanced by fine tuning the statistical learning parameters to obtain better fits. In this study, we mostly relied on the default settings offered by off-the-shelf statistical software packages which are freely available to general researchers and epidemiologists.
Future research could explore other statistical learning methods, such as neural networks and random forests [150].

Chapter 4

Comparison of Statistical Approaches Dealing with Immortal Time Bias in Drug Effectiveness Studies

4.1 Introduction

A goal of causal inference is to design an analysis plan for observational study data to emulate the conditions of a randomized clinical trial. In the absence of time-dependent covariates, this means making the subjects from different treatment groups comparable at baseline (i.e., the start of follow-up or the cohort entry time point). When dealing with survival data, the survival outcome can be modelled by treatment groups, conditional on the baseline covariates, via the Cox proportional hazards model. Under the identifiability conditions (conditional exchangeability, positivity and consistency), the resulting hazard ratio estimate can be given a causal interpretation [55, 75].

In many observational drug effectiveness studies, there may be a delay or wait period before a subject begins to receive a treatment. Therefore, a subject's treatment status recorded at baseline may not be accurate for the entire duration of follow-up. Epidemiologists refer to this wait period, during which the survival outcome cannot occur, in part due to the study design, as 'immortal time'. If the subject develops the event shortly after baseline, he may not get the opportunity to initiate the treatment and, by design, he is assigned to the untreated group. Failure to adjust for the change in treatment status, therefore, results in a spurious survival advantage (protective association) in favour of the treated group. This bias is popularly known as immortal (or 'immune') time bias [78, 167], time-dependent bias [168] or survival bias [82].
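The mechanism behind this spurious advantage can be illustrated with a small simulation. The sketch below is a toy example under stated assumptions, not an analysis from this thesis: event times and planned treatment initiation times are independent exponentials, there is no censoring and no covariates, and treatment has no effect at all. It compares a naive group-based event rate ratio, which counts the pre-treatment wait time of eventually treated subjects as treated time, with a rate ratio that classifies person-time by current treatment status. With constant hazards, these rate ratios play the role of hazard ratios. All names are hypothetical.

```python
import random


def rate_ratios(n=20000, event_rate=0.1, initiation_rate=0.1, seed=1):
    """Return (naive, time_dependent) rate ratios under a null treatment effect."""
    rng = random.Random(seed)
    # Each entry holds [number of events, person-time].
    naive = {"treated": [0, 0.0], "untreated": [0, 0.0]}
    by_time = {"treated": [0, 0.0], "untreated": [0, 0.0]}
    for _ in range(n):
        t_event = rng.expovariate(event_rate)       # unaffected by treatment
        t_start = rng.expovariate(initiation_rate)  # planned initiation time
        if t_start < t_event:
            # Subject survived long enough to start treatment.
            naive["treated"][0] += 1
            naive["treated"][1] += t_event           # wait time counted as treated
            by_time["untreated"][1] += t_start       # wait time counted as untreated
            by_time["treated"][0] += 1
            by_time["treated"][1] += t_event - t_start
        else:
            # Event occurred before treatment could begin.
            naive["untreated"][0] += 1
            naive["untreated"][1] += t_event
            by_time["untreated"][0] += 1
            by_time["untreated"][1] += t_event

    def ratio(groups):
        treated = groups["treated"][0] / groups["treated"][1]
        untreated = groups["untreated"][0] / groups["untreated"][1]
        return treated / untreated

    return ratio(naive), ratio(by_time)
```

In this setting the naive ratio falls well below 1 even though treatment does nothing, while the ratio based on time-dependent classification of person-time stays close to 1.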
This bias can considerably distort the underlying hazard ratio if a large number of failures occur before the initiation of treatment or if the length of immortal time is large [169].

Although this bias was first identified in the 1970s [170], many pharmacoepidemiology studies still fail to account for this source of bias [79, 168, 171]. Recently, this issue of immortal time bias has resurfaced in pharmacoepidemiology studies while trying to implement newly popularized causal inference tools such as propensity scores (see §B.1 for a brief description). While deriving propensity scores, treatment group memberships need to be defined at baseline [172]. Some studies choose to address or at least acknowledge this immortal time bias problem by using simplified techniques, such as selecting an alternative baseline unique for all subjects that makes sense clinically or excluding the subjects with wait times [173–175], or by rather complicated techniques such as risk-set matching [176–178], while others choose to ignore it completely [91, 93]. While some argue that the propensity score is an acceptable tool to deal with immortal time bias [179, 180], others are skeptical about this claim [93, 181]. To account for the immortal time bias, statisticians generally recommend adopting a proper treatment exposure definition via time-dependent analyses [78], such as the use of time-dependent Cox proportional hazards models. However, as the findings from these models are expressed in terms of person-time under treatment exposure rather than in terms of treatment groups, interpretations are often not as intuitive as for a group-based comparison. Also, assumptions related to these analyses, such as treatment initiation being unrelated to the risk of subsequent failure, may be unrealistic for some situations [182]. Several other approaches have been proposed in the literature which modify the data so as to retain the treatment group-level interpretation.
Prescription time-distribution matching (PTDM) [82] is one suggested approach to adjust for immortal time bias that is cited frequently in the recent literature [84–88] due to its simplicity.

In longitudinal studies, treatment may not be the only influential variable that may change after baseline. It is natural to have regular measurements of clinical symptoms and disease activity, and the values of these covariates may change over time. Since the predictive ability of baseline covariates may decrease over the follow-up time, considering the full covariate history of these time-dependent covariates, rather than just the baseline covariates, may be preferable [183]. If these covariates do not interact with treatment exposure, a time-dependent Cox model may still be adequate to obtain an unbiased estimate of the treatment effect. However, if these covariates are affected by previous treatment (such covariates are popularly known as time-dependent confounders), the estimated hazard ratio may be biased if the time-dependent confounders are included as covariates in the time-dependent Cox model analysis [50]. In the presence of time-dependent confounding and immortal time, marginal structural Cox models (MSCMs) are frequently used to estimate the causal effect of a time-dependent treatment exposure [42, 181]. The MSCM is essentially an extension of the propensity score methods that appropriately accommodates treatments of a time-varying nature in observational longitudinal studies with a survival outcome [161] (see Chapter 2). The sequential Cox approach [73] has been proposed as an alternative to the MSCM approach. Both approaches are intended to deal with immortal time and the initiation of treatment after baseline.

Several studies have quantified the amount and direction of bias due to misclassifying or ignoring immortal time by means of simulation [184–187] or theoretically [188–190].
Via simulation of various disease contexts, it was shown repeatedly that overly optimistic estimates of treatment effects are obtained when the time-dependent nature of treatment is ignored [184, 186, 187]. In the intensive care unit context, a further simulation study showed that time-fixed analytic approaches are not generally equipped to deal with time-dependent covariates [185]. In particular, the landmark method, a time-fixed analytic approach, was shown to adequately control for immortal time bias only when the outcome occurs soon after initiating the treatment [188]. The nature of time-dependent bias was investigated mathematically for survival models in some studies [189, 190]. Such bias was theoretically quantified under some parametric assumptions depending on various cohort definitions [80]. To the best of our knowledge, no attempt has been made to explore the appropriateness of the PTDM or sequential Cox approaches in minimizing immortal time bias.

The focus of this chapter is to assess the performance of these proposed methods for dealing with immortal time bias. To do this, we quantify the bias due to PTDM based on the expressions derived in Suissa [80] for two other naive approaches. We also simulate survival data with time-dependent treatment exposure. Three different conditions are considered for simulation: (1) one baseline covariate present, (2) one time-dependent covariate present along with a baseline covariate and (3) one time-dependent confounder present. To assess the suitability of these methods in an application, we apply all these methods to investigate the impact of time-varying beta-interferon treatment in delaying disability progression in subjects from the British Columbia Multiple Sclerosis (MS) database (1995–2008) [92, 128].

The remainder of the chapter is organized as follows.
In the next section, we describe the notation and design of the simulation study, the methods used to address immortal time bias, and the metrics used to evaluate their performance. Then we summarize the simulation and the MS data analysis results. The chapter concludes with a discussion of the results, and the implications and limitations of the current study.

4.2 Methods

4.2.1 Notation

Consider a hypothetical longitudinal study consisting of $n$ subjects ($i = 1, 2, \ldots, n$). Let $t_0 = 0$ be the start of follow-up, i.e., the time of the baseline visit. Baseline covariates $L_0$ (binary or continuous) are recorded at baseline. Follow-up continues until the time of failure $T$ or the time of censoring $T^C$. Regular measurements of the binary treatment status $A_m$ ($= 1$ for treated and $0$ otherwise) are recorded at intervals $m = 0, 1, 2, \ldots, K$. As this study focuses mainly on the implications of immortal time, we assume that subjects may initiate treatment at most once and continue taking the treatment thereafter until the study ends. Let treatment initiation occur at time $T_A$.

Let $N_m$ be the number of failures occurring up to and including the $m$-th interval $[t_m, t_{m+1})$. Also, let $C_m$ ($= 1$ for censoring due to dropout or artificial censoring and $0$ otherwise) be the binary indicator of censoring during the $m$-th interval. Finally, let $r_m$ be the risk set consisting of subjects who are at risk of failure during the $m$-th interval $[t_m, t_{m+1})$. Let $\bar{a}_m = (a_0, a_1, \ldots, a_m)$ be the observed realization of the treatment history $\bar{A}_m$ up to interval $m$ and, similarly, let $\bar{l}_m$ and $\bar{c}_m$ be the observed realizations of the covariate history $\bar{L}_m$ and the censoring history $\bar{C}_m$ up to interval $m$, respectively.
The binary indicator of failure by time $t_{m+1}$ is defined as $Y_{m+1} = I(T \le t_{m+1})$.

4.2.2 Analysis Approaches

In a simplified drug-effectiveness analysis, we can divide the subjects into two groups: the ‘ever-treatment exposed group’, consisting of the subjects who were exposed to the treatment at some point during their follow-up, and the ‘never-treatment exposed group’, consisting of the subjects who were never exposed to the treatment during their follow-up.

For comparison purposes, we include two naive Cox models with time-independent treatment definitions: (1) unexposed time is misclassified as exposed time for the subjects in the ever-treatment exposed group, and (2) unexposed time is excluded from the follow-up of the ever-treatment exposed group and the treatment initiation time is treated as time zero for these subjects. To address immortal time, we then apply the following approaches: (3) time-dependent Cox model with time-dependent treatment and time-dependent covariates, (4) time-dependent Cox model with time-dependent treatment and baseline values of the covariates, (5) MSCM, (6) PTDM and (7) the sequential Cox approach.

Brief characteristics of the analysis approaches are shown in Table 4.1. We describe these methods in detail in the following sections using the notation defined above.

Naive Cox Model with Time-independent Treatment Definition

To demonstrate the impact of misclassifying treatment exposure by ignoring immortal time, two naive Cox analyses with time-independent treatment definitions are used to estimate the log-hazard ratio. In the first approach, subjects in the ever-treatment exposed group are classified as treated for their whole duration of follow-up. This is similar to the intention-to-treat principle [191], where subjects are assumed to be exposed to treatment immediately at the beginning of follow-up. We call this approach ‘include immortal time’ hereafter.
Then we fit a time-invariant Cox model while adjusting for the potential baseline confounders.

In the second approach, the immortal time, i.e., the time from cohort entry to the initiation of treatment, is excluded from the follow-up of the ever-treatment exposed subjects, and time zero for these subjects is taken to be the time of treatment initiation $T_A$. However, the follow-up period for the never-treatment exposed subjects remains the same, i.e., time zero is the time of cohort entry $t_0 = 0$. We call this approach ‘exclude immortal time’ hereafter. We fit a time-invariant Cox model while adjusting for the potential confounders measured at the original baseline. More details about these two approaches are available in Appendix §C.1.

Time-dependent Cox Model with Both Treatment and Covariates being Time-dependent

To avoid the difficulties related to immortal time, statisticians frequently suggest using the time-dependent Cox model incorporating the entire treatment history [80, 82, 184, 192]. If we only consider the baseline covariates $L_0$, the hazard function can be expressed as the following time-dependent Cox model:

\[
\lambda_T(m \mid L_0) = \lambda_0(m) \exp(\psi_1 A_m + \psi_2 L_0), \tag{4.1}
\]

where $m$ is the visit index, $\lambda_0(m)$ is the unspecified baseline hazard function, $\psi_1$ is the log HR for the time-dependent treatment status ($A_m$) and $\psi_2$ is the vector of log HRs for the baseline covariates $L_0$.

To increase the accuracy of the results, researchers also suggest incorporating the entire history of the relevant time-dependent covariates in the analysis [193]. In the presence of time-dependent covariates $L_m$ (binary or continuous), equation (4.1) can be modified to:

\[
\lambda_T(m \mid L_0, L_m) = \lambda_0(m) \exp(\psi_1 A_m + \psi_2 L_0 + \psi_3 L_m), \tag{4.2}
\]

where $\psi_3$ is the vector of log HRs for the time-dependent covariates $L_m$.

Time-dependent Cox Model with Time-dependent Treatment and Baseline Covariates

We also include another time-dependent Cox analysis based on a time-dependent treatment definition.
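To make the difference between the two naive exposure definitions and the time-dependent definition concrete, the following minimal sketch lays out the analysis rows that each would produce for a single subject. The function name `exposure_rows`, the row layout `(start, stop, treated, event)` and the example values are illustrative assumptions, not part of the software used in this thesis.

```python
def exposure_rows(t_a, t_end, event, approach):
    """Return analysis rows (start, stop, treated, event) for one subject.

    t_a      : treatment initiation time T_A, or None if never treated
    t_end    : time of failure or censoring, measured from cohort entry
    event    : 1 if follow-up ended in failure, 0 if censored
    approach : 'include_it', 'exclude_it' or 'time_dependent'
    (All names are hypothetical; this is only a sketch of the data layout.)
    """
    if approach not in ("include_it", "exclude_it", "time_dependent"):
        raise ValueError(f"unknown approach: {approach}")
    if t_a is None or t_a >= t_end:
        # Never exposed during follow-up: one untreated row in every approach.
        return [(0.0, t_end, 0, event)]
    if t_a <= 0:
        # Treated from cohort entry: no immortal time to handle.
        return [(0.0, t_end, 1, event)]
    if approach == "include_it":
        # Immortal time [0, T_A) is misclassified as treated time.
        return [(0.0, t_end, 1, event)]
    if approach == "exclude_it":
        # Immortal time is dropped and time zero is reset to T_A.
        return [(0.0, t_end - t_a, 1, event)]
    # Time-dependent definition: split follow-up at T_A, so A_m goes 0 -> 1.
    return [(0.0, t_a, 0, 0), (t_a, t_end, 1, event)]
```

For a subject who starts treatment at month 4 and fails at month 10, the 'include immortal time' coding credits all 10 months to treatment, the 'exclude immortal time' coding keeps only the last 6 months, and the time-dependent coding attributes the first 4 months to the untreated state.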
However, the history of the time-dependent covariates is restricted to the baseline values only (denoted as $L'_0$, which excludes the post-baseline values of $L_m$). This is to quantify the impact of ignoring post-baseline changes in covariates. We call this the ‘full cohort (base)’ analysis. The hazard function is modelled as:

\[
\lambda_T(m \mid L_0, L'_0) = \lambda_0(m) \exp(\psi_1 A_m + \psi_2 L_0 + \psi'_3 L'_0),
\]

where $\psi'_3$ is the vector of log HRs for the values of the time-dependent covariate at baseline, $L'_0$.

Marginal Structural Cox Model

We have already discussed this model in §2.2.2 and §3.2, so we include only a brief description here. If the time-dependent covariate $L_m$ is influenced by past exposure, i.e., if $L_m$ is a time-dependent confounder, playing a dual role as a confounder and an intermediate variable in the causal pathway between treatment and outcome, $\psi_1$ estimated from equation (4.2) may be biased [50]. Researchers need to be cautious about which variables they include as covariates in the regression equation [194]. Instead of using $L_m$ as a covariate in a Cox model, $L_m$ is used to calculate the inverse probability weights (IPWs), which are person-time specific measures of the degree to which $L_m$ confounds the treatment selection process. These IPWs are then used to create a pseudo-population that is free from the confounding effects of $L_m$. The MSCM enables the conceptual comparison of the hazard functions for those subjects who were never exposed to treatment with those who were continuously exposed.

The stabilized IPW, $sw_m$, can be obtained by multiplying the stabilized inverse probability of treatment weights (IPTW), $sw^T_m$, by the stabilized inverse probability of censoring weights (IPCW), $sw^C_m$ [42], where

\[
sw^T_m = \prod_{j=0}^{m} \frac{\mathrm{pr}(A_j = a_j \mid \bar{A}_{j-1} = \bar{a}_{j-1}, L_0 = l_0)}{\mathrm{pr}(A_j = a_j \mid \bar{A}_{j-1} = \bar{a}_{j-1}, L_0 = l_0, \bar{L}_j = \bar{l}_j)}, \tag{4.3}
\]

and

\[
sw^C_m = \prod_{j=0}^{m} \frac{\mathrm{pr}(C_j = 0 \mid \bar{C}_{j-1} = 0, \bar{A}_{j-1} = \bar{a}_{j-1}, L_0 = l_0)}{\mathrm{pr}(C_j = 0 \mid \bar{C}_{j-1} = 0, \bar{A}_{j-1} = \bar{a}_{j-1}, L_0 = l_0, \bar{L}_{j-1} = \bar{l}_{j-1})}. \tag{4.4}
\]

The weights $sw_m$ are used in the time-dependent Cox model with hazard function modelled as in equation (4.1) to weight the contribution of each person-time observation so that the confounding due to $L_m$ is removed. Note that IPCW is used only if non-random censoring is present. As discussed in §2.2.2, when the numerators in equations (4.3) and (4.4) are replaced by 1, these become the unstabilized IPW, $w_m$.

Prescription Time-Distribution Matching Approach

Although time-dependent Cox models are suitable tools for dealing with immortal time bias, these models do not offer treatment group-based interpretations as does the standard Cox model with fixed exposures. Researchers often resort to even simpler methodologies to deal with immortal time, such as the PTDM approach [82].

The essence of this approach is to redefine time zero in both the ever-treatment exposed and the never-treatment exposed groups. This is done by shifting the start of follow-up to the time of treatment initiation $T_A$ (the end of the immortal time period) for the ever-treatment exposed subjects. The immortal time (wait) periods of the ever-treatment exposed subjects are randomly assigned, with replacement, to the never-treatment exposed subjects. The never-treatment exposed subjects who failed within their assigned wait period are excluded from further analysis. The analysis is performed based on the new time zeros, i.e., the newly defined baseline after excluding the observed or assigned immortal time from the follow-up for the ever and never-treatment exposed groups, respectively. This eliminates imbalance in the excluded time distribution between the two treatment groups. Note that the random assignment of immortal time to the never-treatment exposed subjects, and the subsequent exclusion of the subjects if they fail within the assigned immortal period, makes the data restructuring process random, and the hazard ratio obtained from a different random assignment may be different. Further illustration and theoretical assessment of this approach are provided in Appendix §C.2.

Sequential Cox Approach

Let $[t_m, t_{m+1})$ denote the $m$-th interval where at least one subject initiates treatment. We want to mimic a clinical trial setting (e.g., either on treatment or off treatment during the entire duration of the follow-up) for each of these intervals where subjects initiate treatment. Based on the treatment initiation at the $m$-th interval, the $m$-th mini-trial is created as follows: only subjects who have not received any treatment before the $m$-th interval are considered. Among the subjects at risk at $t_m$ (i.e., those who have not failed or been lost to follow-up by the beginning of the $m$-th interval, $t_m$), the subjects initiating treatment during the $m$-th interval ($t_m < T_A \le t_{m+1}$) are considered as the treated group, while the remaining subjects are considered as the control group. These control subjects are artificially censored at the time of later treatment initiation ($T_A > t_{m+1}$) to avoid confounding due to treatment. As these subjects are artificially censored, the analysis needs to be adjusted using IPCW. Note that if we consider ‘month’ as the interval unit for follow-up, there may be some intervals (i.e., months) during follow-up when no subjects initiate treatment.

In this mimicked trial, a subject is either on treatment or off treatment during the entire duration of the follow-up. Therefore, this manipulated subset of the data mimics a randomized clinical trial. A Cox proportional hazards model can be used to compare the survival experiences of these two groups. The relevant time-dependent covariate information is updated depending on the interval.
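The construction of the $m$-th mini-trial described above can be sketched as follows. This is a minimal illustration only: the record keys (`id`, `t_a`, `t_end`, `event`), the function name `build_mini_trial` and the use of unit-length intervals $(m, m+1]$ are assumptions made for the example, and covariates and censoring weights are left out.

```python
def build_mini_trial(subjects, m):
    """Build the rows of the m-th mini-trial (sketch; key names are hypothetical).

    Each subject is a dict with:
      'id'    : subject identifier
      't_a'   : treatment initiation time T_A, or None if never treated
      't_end' : time of failure or censoring
      'event' : 1 if the subject failed at t_end, 0 if censored
    The m-th interval is taken to be (m, m + 1] in the time unit of the data.
    """
    t_m, t_next = m, m + 1
    rows = []
    for s in subjects:
        t_end, t_a = s["t_end"], s["t_a"]
        if t_a is not None and t_a >= t_end:
            t_a = None  # initiation would fall after follow-up: never treated
        if t_end <= t_m:
            continue  # failed or censored before t_m: not at risk
        if t_a is not None and t_a <= t_m:
            continue  # treated before the m-th interval: not eligible
        row = {"id": s["id"], "trial": m, "start": t_m, "treated": 0,
               "stop": t_end, "event": s["event"], "art_cens": 0}
        if t_a is not None and t_a <= t_next:
            row["treated"] = 1  # initiates during the interval: treated group
        elif t_a is not None:
            # Control who initiates later: artificially censored at T_A.
            row.update(stop=t_a, event=0, art_cens=1)
        rows.append(row)
    return rows
```

Calling this function for every interval in which at least one subject initiates treatment, and stacking the results, gives a combined dataset in which the same control subject can appear in several mini-trials.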
In the analysis, we adjust for the baseline confounders $L_0$ measured at inclusion or baseline, the time-dependent covariates $L_m$ measured at the start of the $m$-th interval and the lagged covariates $L_{m-1}$ consisting of the lagged value from the previous interval; this will help us reduce bias in the estimation of the treatment effect from the $m$-th mini-trial data [73]. Let us denote $\tilde{L}_m = (L_0, L_{m-1}, L_m)$.

After treatment initiation, time-dependent covariate values after the $m$-th interval may be affected by the treatment, and hence those covariate values are not used in the analysis of the data for the $m$-th mini-trial [73]. If $L_m$ are time-dependent confounding covariates, they are not included in $\tilde{L}_m$, as they are affected by the treatment [195, p.23].

We assume that the different mini-trials may have different baseline hazard functions, but all subjects in the same mini-trial have the same baseline hazard function. Under this assumption, a stratified Cox model is appropriate. Therefore, the hazard function for the $m$-th mini-trial can be written as [73]:

\[
\lambda^m_T(m \mid L_0, \tilde{L}_m) = \lambda_{0m}(m) \exp(\psi_1 A_m + \psi'_2 \tilde{L}_m), \tag{4.5}
\]

where $\lambda_{0m}(m)$ is the unspecified baseline hazard function for stratum $m$ and $\psi'_2$ is the vector of log HRs for the time-dependent covariates $\tilde{L}_m$. This hazard function should be weighted by the IPCW given in equation (4.4). Pooled logistic regression [42, 50] or Aalen’s additive regression model can be used to estimate the IPCW [73, 196]. The resulting estimate will bear a causal interpretation under the assumptions of no unmeasured confounders and correct model specification for the hazard ratio and the censoring weights.

We can fit a stratified Cox model on the combined data of all mini-trials (pseudo-data), stratified by the treatment initiation time.
Alternatively, a simple Cox model weighted by IPCW can be run for each of the successive mini-trials to obtain separate estimates of the treatment effect for each mini-trial, leading to the name, the sequential Cox approach. An overall estimate of the treatment effect is obtained by simply averaging the treatment effect estimates from the separate mini-trials. The overall estimate requires two additional assumptions for causal interpretation: (1) the treatment effect is the same in all the mini-trials and (2) the treatment effect is unchanged for all covariate histories before the $m$-th interval, given the covariates at the $m$-th interval. However, if one is willing to interpret the overall effect estimate as an aggregated (averaged) effect over all the mini-trials, then the first assumption can be relaxed [73, 75].

The IPCW-adjusted stratified Cox model used in the sequential Cox approach is easy to implement using standard software packages. The IPTW, the potentially unstable part of the IPW, are not used in the sequential Cox approach [73]. As the data associated with a given mini-trial can be extracted and separated quite easily from the combined mini-trial data (pseudo-data), it is also straightforward to compare the effects of early versus late treatment initiation. However, the combined mini-trial (pseudo) dataset can become large due to repeated use of the same control subjects. Although including the same subject more than once may increase event rates, the SE obtained from the stratified weighted Cox analysis is invalid. Time-consuming resampling methods, such as the jackknife [73] or the bootstrap [196], are required to obtain a correct SE.
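As a rough illustration of the resampling idea, the sketch below computes a delete-one-subject jackknife SE. It assumes, as a choice made for this example, that resampling is done at the subject level so that all mini-trial rows of a subject are removed together. The `estimator` argument is a hypothetical placeholder for whatever procedure refits the weighted stratified Cox model and returns the treatment effect estimate; it is not a function from any particular package.

```python
import math


def jackknife_se(subject_ids, estimator):
    """Delete-one-subject jackknife standard error (illustrative sketch).

    subject_ids : list of subject identifiers
    estimator   : callable taking a list of subject ids and returning the
                  point estimate computed from those subjects only
                  (placeholder for refitting the analysis model)
    """
    n = len(subject_ids)
    if n < 2:
        raise ValueError("at least two subjects are needed")
    # Re-estimate with each subject left out in turn.
    leave_one_out = [
        estimator(subject_ids[:i] + subject_ids[i + 1:]) for i in range(n)
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((e - mean_loo) ** 2 for e in leave_one_out)
    return math.sqrt(variance)
```

When the estimator is a simple sample mean, this formula reduces to the usual standard error of the mean, which provides a quick sanity check of the implementation. Each call to `estimator` corresponds to a full refit, which is why such methods are time-consuming for large pseudo-datasets.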
An illustrative data construction example is provided in Appendix §C.3 and the corresponding software implementation details are provided in Appendix §C.4.

Table 4.1: Description of the analytic methods.

| Data-modify method | Method | Time-dependent | Stratified Cox | Covariate history | Weight adjusted |
|---|---|---|---|---|---|
| (1) Include IT | Cox PH | No | No | Baseline | No |
| (2) Exclude IT | Cox PH | No | No | Baseline | No |
| (3) Full cohort | Cox PH | Yes | No | Full | No |
| (4) Full cohort (Base) | Cox PH | Yes | No | Baseline | No |
| (5) MSCM | Cox PH | Yes | No | Full | Yes, IPTC |
| (6) PTDM | Cox PH | No | No | Baseline | No |
| (7) Sequential Cox | Cox PH | No | Yes | Multiple† | Yes, IPC |

IT, Immortal time; PTDM, Prescription time distribution matching; MSCM, Marginal structural Cox model; IPT, Inverse probability of treatment; IPC, Inverse probability of censoring; IPTC, Inverse probability of treatment and censoring.
† For the sequential Cox approach, covariate values are collected at three time points for each mini-trial: at baseline, at the period of treatment start and the lagged value at treatment start.

4.2.3 Design of Simulation

A number of schemes for simulating survival data for Cox models are available in the literature. Some generate survival times for Cox models with time-invariant covariates by inverting cumulative hazard functions from commonly used survival distributions [197, 198]. A scheme for generating survival times for more complicated distributions, such as the truncated piecewise exponential, is also available [199]. These schemes have been extended to the situation with one [200–202] or more time-varying covariates [203].

To simulate survival times with or without time-dependent covariates, we adapt the permutation algorithm [204]. This algorithm simulates survival data following specified distributions of survival time conditional on any number of fixed or time-dependent covariates. In this algorithm, a permutation probability law based on the Cox model partial likelihood [83] is used as a basis for performing matching as follows.
If a subject with a given set of covariates remains at risk until interval $m$, then the probability of that subject reaching the outcome at interval $m$ is proportional to the subject’s current hazard. This algorithm has been validated for generating survival times conditional on time-dependent treatment [205] and also when time-dependent covariates are present [183]. It has been used in several other studies dealing with generating survival data with time-dependent covariates (see, for example, [206–210]). A brief description of the algorithm is presented in Appendix §C.5.

A number of different simulation schemes are available in the literature to simulate survival times in the presence of a time-dependent confounder [56, 57, 102, 103, 144, 159, 160]. We adopt the data generation process of Young et al. [56] (also used in Chapter 3: see §3.3), where both treatment status and confounder are time-dependent. Data generated from this algorithm are popularly used to assess the ability of MSCMs to handle time-dependent confounders [57, 114].

4.2.4 Simulation Specifications

In our Monte Carlo study, we generate $N = 1{,}000$ datasets with $n = 2{,}500$ subjects, each followed for up to $m = 10$ subsequent visits, for each setting under consideration. To mimic the rare disease condition, we set $\lambda_0 = 0.01$ (on a monthly scale). To mimic a more frequent disease condition, we set $\lambda_0 = 0.10$ (on a monthly scale) and repeat the Monte Carlo study. Below we discuss the simulation schemes under consideration and the specifications that were used.

Simulation - I

We assume an exponential distribution for generating failure times $T$ with a constant rate of $\lambda_0 = 0.01$ monthly events throughout the follow-up. The exponential distribution is the simplest of all commonly used survival time distributions. However, despite its simplicity, it is often considered useful in biomedical research and is the basis for various frequently used approaches, such as the Poisson model [186].
Therefore, we choose the exponential as the marginal distribution for generating event times. A uniform distribution $U(1, 60)$ (in months) is assumed to generate censoring times $T^C$; i.e., administrative censoring is set at 5 years of follow-up. This marginal distribution of censoring time is independent of treatment exposure, as well as of the failure times that were generated earlier. The accuracy of the chosen algorithm decreases with increasing rates of censoring [183], and hence we chose 60 months as the administrative censoring time-point, i.e., a high upper limit for the uniform distribution. Treatment initiation time $T_A$ is generated from a uniform distribution $U(0, 10)$ (in months). We assume treatment to be a binary variable for all subjects: a subject either receives the treatment, which has a constant impact on the hazard (multiple versions of the treatment are not allowed), or receives no treatment. Also, to focus on the immortal time issue, we assume that there are no discontinuations or interruptions for those who initiate treatment. Additionally, we consider sex as a baseline confounder in these data. Subject’s sex is generated from a Bernoulli distribution where the probability of being male is 0.3. This covariate is also generated independently of time-dependent treatment exposure.

After generating values for the survival time $T_i$, the censoring time $T^C_i$, and the treatment and covariate matrix $X_{im} = (A_{im}, L_{i0})$ for each subject $i = 1, 2, \ldots, n$ for up to $m = 10$ time periods, the permutation algorithm [204] is used to generate survival data where treatment $A_m$ is time-dependent but the confounder $L_0$ is fixed at its baseline value. The effect parameters for treatment and sex on the survival outcome are set such that the treatment has a harmful effect (a log-hazard ratio of $\psi_1 = 0.5$) and males are at a lower risk than females (a log-hazard ratio of $\psi_2 = -0.7$).
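The marginal ingredients listed above can be drawn with a few lines of code. The sketch below only generates the inputs (failure time, censoring time, treatment initiation time and sex) under the stated distributions; it does not implement the permutation algorithm that subsequently pairs event times with covariate histories. The function name and dictionary keys are hypothetical.

```python
import random


def draw_simulation_one_inputs(n, lambda0=0.01, seed=None):
    """Draw the marginal inputs for simulation-I (sketch; names are hypothetical).

    lambda0 is the constant monthly event rate. The permutation algorithm
    that matches these draws to covariate histories is NOT implemented here.
    """
    rng = random.Random(seed)
    subjects = []
    for i in range(n):
        subjects.append({
            "id": i,
            "T": rng.expovariate(lambda0),    # exponential failure time (months)
            "T_C": rng.uniform(1.0, 60.0),    # censoring time, U(1, 60) months
            "T_A": rng.uniform(0.0, 10.0),    # treatment initiation, U(0, 10) months
            "male": 1 if rng.random() < 0.3 else 0,  # Bernoulli(0.3) for male
        })
    return subjects
```

With $\lambda_0 = 0.01$ the mean of the exponential failure time is 100 months, well beyond the 60-month censoring limit, which is consistent with the rare event condition this setting is meant to mimic.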
Here, the treatment having a harmful effect means that a subject’s survival time is shorter when she is treated compared to her survival time when she is untreated.

Simulation - II

To generate the survival times, we use exactly the same specification as in simulation-I, with the exception that we now add one time-dependent covariate, say cumulative disease activity $L_m$, such that higher cumulative disease activity carries a higher risk (a log-hazard ratio of $\psi_3 = \log(1.5)$). This time-dependent covariate $L_m$ is generated from a Bernoulli distribution with the probability of a disease activity increment being 0.75, accumulating the disease activity over at most $m = 10$ periods of time. As before, sex is a baseline confounder ($L_0$).

Simulation - III

We use the algorithm for simulating survival times in the presence of a time-dependent confounder [56]. In this simulation, counterfactual failure times $T_{i\bar{0}}$ are sampled from an exponential distribution, with a constant rate of $\lambda_0 = 0.01$ monthly events throughout the follow-up, as discussed in §3.3. The binary time-dependent confounder, $L_m$, is modelled by the following covariates: a binary covariate $I(T_{\bar{0}} \le c)$, previous treatment status $A_{m-1}$, and the lagged variable $L_{m-1}$:

\[
\mathrm{logit}(p_L) = \mathrm{logit}\,\Pr(L_m = 1 \mid A_{m-1}, L_{m-1}, Y_m = 0; \beta) = \beta_0 + \beta_1 I(T_{\bar{0}} < c) + \beta_2 A_{m-1} + \beta_3 L_{m-1},
\]

with associated parameters $\beta = (\beta_0, \beta_1, \beta_2, \beta_3) = (\log(3/7), 2, \log(1/2), \log(3/2))$, $c = 30$ and $Y_m = I(T \le t_m)$ (as defined in §4.2.1).

We model treatment status at each stage, $A_m$, with the factors symptom or current medical condition $L_m$, past symptom $L_{m-1}$, and previous treatment status $A_{m-1}$ as

\[
\mathrm{logit}(p_A) = \mathrm{logit}\,\Pr(A_m = 1 \mid L_m, A_{m-1}, L_{m-1}, Y_m = 0; \alpha) = \alpha_0 + \alpha_1 L_m + \alpha_2 L_{m-1} + \alpha_3 A_{m-1},
\]

with associated parameters $\alpha = (\alpha_0, \alpha_1, \alpha_2, \alpha_3) = (\log(2/7), 1/2, 1/2, 10)$. Current treatment status $A_m$ is made heavily dependent on the previous treatment status $A_{m-1}$ by setting a high parameter value ($\alpha_3 = 10$).
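The two logistic models above can be turned into a small generator of covariate and treatment histories. The sketch below is a simplified illustration: it draws $L_m$ and $A_m$ for visits at which the subject is still event-free and omits the step of Young et al. [56] that converts the counterfactual time into an observed failure time. The starting values $A_{-1} = L_{-1} = 0$ and all function and variable names are assumptions made for this example.

```python
import math
import random

# Parameter values as stated in the text.
BETA = (math.log(3 / 7), 2.0, math.log(1 / 2), math.log(3 / 2))
ALPHA = (math.log(2 / 7), 0.5, 0.5, 10.0)


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_histories(t0, n_visits=10, c=30.0, rng=None):
    """Draw (L_m, A_m) histories for one subject who remains event-free.

    t0 : counterfactual failure time under no treatment
    Assumption of this sketch: A_{-1} = L_{-1} = 0 before the first visit.
    """
    rng = rng or random.Random()
    short_lived = 1 if t0 < c else 0
    a_prev, l_prev = 0, 0
    l_hist, a_hist = [], []
    for _ in range(n_visits):
        p_l = expit(BETA[0] + BETA[1] * short_lived
                    + BETA[2] * a_prev + BETA[3] * l_prev)
        l_m = 1 if rng.random() < p_l else 0
        p_a = expit(ALPHA[0] + ALPHA[1] * l_m
                    + ALPHA[2] * l_prev + ALPHA[3] * a_prev)
        a_m = 1 if rng.random() < p_a else 0
        l_hist.append(l_m)
        a_hist.append(a_m)
        a_prev, l_prev = a_m, l_m
    return l_hist, a_hist
```

Under these parameter values, a previously treated subject remains on treatment at the next visit with probability of at least `expit(ALPHA[0] + ALPHA[3])`, which exceeds 0.999, so discontinuation is possible but very unlikely.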
Setting $\alpha_3$ this high emulates the situation where subjects switch to treatment at most once and keep using the treatment without much interruption or discontinuation. The true causal effect parameter is set to be $\psi_1 = 0.5$. A brief description of the three simulations under consideration is provided in Table 4.2.

To study the properties of these simulated data populations, we generated datasets with $n = 25{,}000$ subjects, each followed for up to $m = 10$ subsequent visits, based on each simulation setting. As mentioned before, in each of the Monte Carlo studies, we generated datasets with $n = 2{,}500$ subjects, each followed for up to $m = 10$ subsequent visits.

Table 4.2: Three simulation settings under consideration.

| | Simulation-I | Simulation-II | Simulation-III |
|---|---|---|---|
| Algorithm | Abrahamowicz et al. [204] | Abrahamowicz et al. [204] | Young et al. [56] |
| Time-varying treatment | Yes | Yes | Yes |
| Baseline covariate | Yes | Yes | No |
| Time-varying covariate | No | Yes | No |
| Time-varying confounder | No | No | Yes |

4.2.5 Analytic Models Used

Simulation-I Models

In simulation setting-I, when estimating the treatment effect, the baseline covariate $L_0$ is included in all the models under consideration. The Cox model is used in all these approaches. In the ‘include IT (immortal time)’ approach, immortal time is mislabelled as treated, and in the ‘exclude IT’ approach, immortal time is excluded from the analysis. PTDM excludes the observed and assigned wait times. In the sequential Cox approach, a stratified Cox model weighted by IPCW is used, adjusting for treatment status $A_m$ and baseline $L_0$. In the absence of a time-dependent covariate, stabilized IPWs are not useful. Therefore, the corresponding unstabilized IPCW model is fitted using pooled logistic regression, adjusting for $A_m$ and $L_0$, to predict future censoring status.

Simulation-II Models

In simulation setting-II, when estimating the treatment effect, the baseline covariate $L_0$ and the time-dependent covariate $L_m$ are included in all the models under consideration.
The Cox model is used in all these approaches. In the ‘full cohort’ and MSCM approaches, all post-baseline values of $L_m$ are used. In all the other approaches (except sequential Cox), only the baseline values of $L_m$ (i.e., $L'_0$) are used. In the sequential Cox approach, the baseline covariate $L_0$ and three values of $L_m$ are used: one at cohort entry, another at mini-trial entry and the lagged value before the mini-trial entry (as discussed in §4.2.2). To create the IPCWs, pooled logistic models are used. In the stabilized IPCW model (equation 4.4), the numerator model adjusts for $A_m$ and $L_0$, while the denominator model adjusts for $A_m$, $L_0$ and $L_m$. For the MSCM, the model adjusts for only $L_0$ to obtain the effect of $A_m$. The corresponding IPTW (equation 4.3) is modelled via a pooled logistic model. For the stabilized IPTWs, the numerator model adjusts for the time index, $L_0$ and lagged values of $A_m$, while the denominator model adjusts for $L_m$, $L_0$, and lagged values of $L_m$ and $A_m$ to predict future treatment status.

Simulation-III Models

In simulation setting-III, all the approaches under consideration include the time-dependent confounder $L_m$ when estimating the treatment effect. The Cox model is used in all these approaches. In the ‘full cohort’ and MSCM approaches, all post-baseline values of $L_m$ are used. In all the other approaches (except sequential Cox), only values of $L_m$ at cohort entry are used. As discussed in §4.2.2, in the sequential Cox approach, we discard the time-dependent confounder $L_m$. The unstabilized IPCW model adjusts for only $A_m$ in the absence of a time-dependent confounder. As a sensitivity analysis, we discarded the IPCW. In another sensitivity analysis, three values of $L_m$ are used in the sequential Cox approach: one at cohort entry, another at mini-trial entry and the lagged value before the mini-trial entry (as discussed in §4.2.2 and as in simulation-II). To create the IPCWs (equation 4.4), a pooled logistic model is used.
For the stabilized IPCWs, the numerator model adjusts for $A_m$, while the denominator model adjusts for $A_m$ and $L_m$. The IPTWs (equation 4.3) for fitting the MSCM are modelled via a pooled logistic model. The stabilized IPTW numerator model adjusts for the time index and lagged values of $A_m$, while the denominator model adjusts for $L_m$ and lagged values of $L_m$ and $A_m$ to predict future treatment status.

Among the approaches used here, the sequential Cox approach and the MSCM rely on IPWs, but the other approaches do not use any propensity scores or other weight-based approach; rather, they rely only on the regression-based estimation approach. The PTDM and the included and excluded immortal time approaches are not suitable for propensity score or weight adjustment, as they either use the future treatment status of the subjects to define the treatment groups or lack a baseline that is common to all the subjects. More details about these approaches are available in Appendices §C.1 and §C.2.

4.2.6 Performance Metrics

We assessed the performance of the various approaches by the following measures:

• Bias $= \sum_{i=1}^{N} (\hat{\psi}_{1i} - \psi_1)/N$: the average difference between the $N = 1{,}000$ estimated parameters (log-hazard ratios) and the true parameter.
• SD $= \sqrt{\sum_{i=1}^{N} (\hat{\psi}_{1i} - \psi'_1)^2/(N - 1)}$, where $\psi'_1 = \sum_{i=1}^{N} \hat{\psi}_{1i}/N$.
• Model-based SE: the average of the $N = 1{,}000$ estimated standard errors of the estimated causal effect.
• Coverage probabilities of model-based nominal 95% CIs: the proportion of the $N = 1{,}000$ datasets in which the true parameter is contained in the nominal 95% CI.
• Power: for a level $\alpha = 0.05$ test of $H_0 : \psi_1 = 0$, the estimated power is the proportion of nominal p-values that are less than $\alpha = 0.05$.

4.3 Application in Multiple Sclerosis

We apply the methodologies described in this chapter to the British Columbia (BC) MS cohort study (1995–2008) described in §2.2.1. The dataset was used in previous studies [92, 101] (also in Chapters 2 and 3) to estimate the effect of β-IFN on irreversible disease progression. As before, irreversible progression of disability is measured by sustained EDSS 6, which is confirmed after at least 150 days, with all subsequent EDSS scores being 6 or greater. The treatment definition is changed in this study to allow us to demonstrate the impact of the various immortal or immune time adjustment methods. Here, once the subjects initiate β-IFN, we assume they continue taking the drug without any discontinuation until they develop the survival outcome (time to irreversible progression of disability) or become censored.

4.3.1 Analytic Models Used

Potential baseline confounders $L_0$ include age, sex, disease duration and EDSS score. Also, we consider the cumulative number of relapses in the last 2 years (hereafter called ‘cumulative relapses’) as a time-dependent confounder $L_m$ (justified in [128], Chapter 2). All the models under consideration adjust for the baseline confounders $L_0$ (age, sex, disease duration and EDSS score) and the time-dependent confounder $L_m$ (cumulative relapses) when estimating the treatment effect. The Cox model is used in all these approaches. All post-baseline values of cumulative relapses ($L_m$) are used only in the ‘full cohort’ and MSCM approaches. In all the other approaches (except sequential Cox), the value of cumulative relapses at cohort entry is used. In the sequential Cox approach, three values of cumulative relapses are used: one at cohort entry, another at mini-trial entry and the lagged value before the mini-trial entry (discussed in §4.2.2). To create the IPCW (equation 4.4), pooled logistic models are used. The stabilized IPCW numerator model adjusts for $A_m$ and the baseline confounders $L_0$, while the denominator model adjusts for $A_m$, $L_0$ and $L_m$. For the MSCM, we estimated the effect of $A_m$ after adjusting for the potential baseline confounders. The corresponding IPTWs (equation 4.3) are modelled via a pooled logistic model.
The stabilized IPTW numerator model adjusts for a restricted cubic spline of the follow-up time-index, baseline confounders $L_0$ and lagged values of $A_m$ to predict future treatment status. The denominator model additionally adjusts for the current and lagged values of cumulative relapses ($L_m$).

4.4 Simulation Results

4.4.1 Description of the Simulated Data

To describe the data generated from the three simulation settings, we generated datasets with a larger number of subjects (25,000) with up to 10 subsequent visits from each simulation algorithm. For our purposes, we need to generate data such that subjects generally switch from the ‘not treated’ state to the ‘treated’ state at most once. For simulation-I and II, there are no exceptions. However, the way simulation-III is generated allows a few exceptions (19 out of 25,000) where there are discontinuations. The proportion of discontinuation in the simulation-III dataset is negligible (0.00076) and we do not expect any noticeable impact on the results from this small number of exceptions. The characteristics of the treated, untreated and partially treated groups, their failure rates and average number of visits are listed in Table 4.3. Simulation-I and II are very similar with respect to the characteristics listed here.

Table 4.3: Characteristics of three simulation settings under consideration.

Rates               Simulation-I   Simulation-II   Simulation-III
Failure             0.084          0.084           0.143
Always treated      0.051          0.051           0.261
Never treated       0.152          0.150           0.046
Partially treated   0.797          0.799           0.692
Discontinuation     -              -               0.001
Mean visits         8.949          8.943           9.367

4.4.2 Rare Event Condition

We present the results from the rare event condition ($\lambda_0 = 0.01$ in a monthly time-scale) in the three simulation settings. When a time-dependent covariate or confounder is present, PTDM is not an appropriate analysis method. This method is only appropriate for analyzing simulation setting I.
We still show the results from this analysis in the other simulation settings for comparison purposes.

Results From Simulation-I

Results from simulation-I are reported in Table 4.4. The time-dependent Cox model with treatment status ($A_m$) and the baseline covariate ($L_0$) is fitted to assess the accuracy of the survival generating permutation algorithm. The level of bias of $\hat{\psi}_1$ is negligible (0.005), the average coverage probability of model-based nominal 95% CIs is 0.946 and the corresponding power is 0.879. We consider these results as the standard for comparison purposes for this simulation setting.

When the immortal time is misclassified as exposed time, we see a substantial downward bias (−2.799). The situation improves slightly when immortal time is excluded from the analysis (bias −2.214). Applying PTDM, the bias is further reduced (−1.837), but the estimate is still substantially off the target. Also, the variability of the estimator (SD($\hat{\psi}_1$)) for PTDM is substantially larger than for the time-dependent Cox results.

When the sequential Cox approach is applied, the amount of bias is negligible (0.007) and the average coverage probability of the model-based nominal 95% CIs is 0.949; both are comparable to the time-dependent Cox results. However, the power from this approach (0.755) is slightly lower and the SD of $\hat{\psi}_1$ (0.196) is higher than that of the time-dependent Cox approach. As there is no time-dependent covariate in this simulation, the MSCM is not fitted.

Table 4.4: Comparison of the analytical approaches to adjust for immortal time bias from simulation-I (one baseline covariate and time-dependent treatment exposure) of 1,000 datasets, each containing 2,500 subjects followed for up to 10 time-intervals.

Approach         Bias     SD($\hat{\psi}_1$)   se($\hat{\psi}_1$)   CP      Power
Full cohort      0.005    0.167                0.162                0.946   0.879
Included IT      -2.799   0.143                0.141                0.000   1.000
Excluded IT      -2.214   0.143                0.142                0.000   1.000
PTDM             -1.837   0.198                0.198                0.000   0.999
Sequential Cox   0.007    0.196                0.187                0.949   0.755
MSCM             -        -                    -                    -       -

PTDM, Prescription time distribution matching; IT, Immortal time; MSCM, Marginal structural Cox model.

In all these approaches, the empirical standard errors SD($\hat{\psi}_1$) (SD of the estimated parameters) are reasonably close to the average model-based standard error (se($\hat{\psi}_1$)). A slight discrepancy is, however, apparent with the sequential Cox approach (0.196 and 0.187 respectively, reported in Table 4.4). Here the average standard error se($\hat{\psi}_1$) = 0.187 is obtained from the approximate jackknife approach (see Appendix C.4). We resort to the nonparametric bootstrap method to determine more reliable estimates of the standard error. For the 1,000 datasets under consideration, the bootstrap standard error of the sequential Cox approach based on 100 nonparametric bootstrap samples is 0.189. Since the bootstrap method requires a substantial amount of computing time and estimates from both of the methods (bootstrap and approximate jackknife approach) are close, we simply report the approximate jackknife standard error estimate from now on.

Results From Simulation-II

Results from simulation-II are reported in Table 4.5. The time-dependent Cox model with treatment status ($A_m$), baseline covariate ($L_0$) and time-dependent covariate $L_m$ is again fitted to validate the survival generating permutation algorithm.
The level of bias is negligible (0.000), the average coverage probability of the model-based nominal 95% CIs is 0.952 and the corresponding power is 0.878. These results are again considered as the standard for comparison purposes for this simulation setting.

To examine the implications of not using the post-baseline changes in the time-dependent covariate, we use only the baseline values of the time-dependent covariate, while keeping the definition of time-dependent treatment unchanged from the previous analysis. This results in some bias (−0.179). However, when we simplify the treatment definition by misclassifying the immortal time as treated time, the bias is again substantial (−2.305). Excluding the immortal time or using PTDM results in little or no improvement in terms of bias (bias −2.321 and −1.952 respectively).

When the sequential Cox approach is used in this simulation setting, we observe some bias (0.268). Although this bias is much smaller than with the PTDM method, it is comparable in magnitude to that obtained from the time-dependent Cox analysis that incorporates only the baseline covariate information (−0.179).

We apply MSCM with $L_m$ handled as a time-dependent confounder, even though it is not. The corresponding bias is negligible (−0.001), the average coverage probability of the model-based nominal 95% CIs is 0.952, and the power is 0.880. These results are very similar to those for the time-dependent Cox analysis using the full cohort.

Table 4.5: Comparison of the analytical approaches to adjust for immortal time bias from simulation-II (one baseline covariate, one time-dependent covariate and time-dependent treatment exposure) of 1,000 datasets, each containing 2,500 subjects followed for up to 10 time-intervals.

Approach             Bias     SD($\hat{\psi}_1$)   se($\hat{\psi}_1$)   CP      Power
Full cohort          0.000    0.164                0.162                0.952   0.878
Full cohort (Base)   -0.179   0.189                0.189                0.842   0.394
Included IT          -2.305   0.183                0.180                0.000   1.000
Excluded IT          -2.321   0.187                0.184                0.000   1.000
PTDM                 -1.952   0.233                0.233                0.000   0.999
Sequential Cox       0.268    0.190                0.185                0.696   0.978
MSCM                 -0.001   0.163                0.162                0.952   0.880

PTDM, Prescription time distribution matching; IT, Immortal time; MSCM, Marginal structural Cox model.

Results From Simulation-III

Results from simulation-III are reported in Table 4.6. MSCM with treatment status ($A_m$) is fitted to validate the survival generating permutation algorithm. The corresponding stabilized weights are generated based on the relationship between treatment status ($A_m$) and the time-dependent confounder $L_m$. The level of bias is negligible (0.029), the average coverage probability of the model-based nominal 95% CIs is 0.942, and the corresponding power is 0.734. These results are now considered as the standard for comparison purposes for this simulation setting.

The time-dependent Cox models, both using full and baseline covariate information, result in biased estimates in the presence of this time-dependent confounder (0.438 and 0.188 respectively). The average coverage probability of the model-based nominal 95% CIs for the method using the full covariate history is very low (0.251). When the immortal time is misclassified or excluded, or PTDM is used to analyze the data, we still see substantial bias (−2.190, −1.917 and −1.553 respectively).

We apply the sequential Cox approach in three different ways. First, we do the analysis excluding the time-dependent confounder $L_m$. The amount of bias (0.721) is smaller in magnitude than with PTDM, but higher than with the time-dependent Cox analysis, as was seen in [195]. The second analysis, a sensitivity analysis that does not use IPCW, leads to similar bias (0.720). Finally, we perform the analysis by including $L_m$ in the stratified IPCW weighted Cox model.
The bias is reduced (0.474), but comparable to the bias in the time-dependent Cox analysis with full covariate information (0.438).

Table 4.6: Comparison of the analytical approaches to adjust for immortal time bias from simulation-III (one time-dependent confounder and time-dependent treatment exposure) of 1,000 datasets, each containing 2,500 subjects followed for up to 10 time-intervals.

Approach             Bias     SD($\hat{\psi}_1$)   se($\hat{\psi}_1$)   CP      Power
Full cohort          0.438    0.168                0.169                0.251   1.000
Full cohort (Base)   0.188    0.177                0.180                0.841   0.982
Included IT          -2.190   0.199                0.198                0.000   1.000
Excluded IT          -1.917   0.194                0.193                0.000   1.000
PTDM                 -1.553   0.249                0.223                0.001   0.978
Sequential Cox#      0.721    0.266                0.257                0.188   0.999
Sequential Cox†      0.720    0.266                0.256                0.185   0.999
Sequential Cox§      0.474    0.272                0.263                0.578   0.969
MSCM                 0.029    0.201                0.205                0.942   0.734

PTDM, Prescription time distribution matching; IT, Immortal time; MSCM, Marginal structural Cox model.
# As described in §4.2.2.
† Sequential Cox not adjusting for either the time-dependent confounder or for informative censoring.
§ Sequential Cox adjusting for both the time-dependent confounder in the regression for estimating $\psi_1$ and for informative censoring via IPCW.

4.4.3 When More Events are Available

Results from the more frequent event condition are presented in Tables C.1-C.3 in Appendix §C.6. The trends in the bias are similar to those in the rare event condition. In general, in all simulation settings, the standard errors are much smaller than in the corresponding analyses when failures are rare. Bias is slightly lower in some cases. One noticeable difference is observed in simulation setting III: in the presence of the time-dependent confounder, when failures are more frequent, the time-dependent Cox and MSCM approaches yield minimal bias (0.044 and 0.000 respectively). It is not clear why this is the case.
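For readers who wish to reproduce summaries of the kind shown in Tables 4.4-4.6, the following Python sketch computes the performance metrics of §4.2.6 from a set of replicate estimates. It is a minimal illustration rather than the code used in this study: the function name `performance_metrics` is hypothetical, and it assumes Wald-type confidence intervals and two-sided Wald tests based on the normal approximation, which may differ in detail from the model-based intervals and p-values reported here.

```python
from statistics import NormalDist, stdev


def performance_metrics(estimates, std_errors, true_value, alpha=0.05):
    """Summarize replicate log-hazard estimates (illustrative sketch).

    estimates  : one estimated log-hazard per simulated dataset
    std_errors : the matching model-based standard errors
    true_value : the true log-hazard used to generate the data
    """
    n = len(estimates)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05

    bias = sum(e - true_value for e in estimates) / n
    sd = stdev(estimates)  # empirical SD with the N - 1 divisor
    mean_se = sum(std_errors) / n

    # Coverage: share of datasets whose nominal CI contains the truth.
    covered = sum(
        1 for e, s in zip(estimates, std_errors)
        if e - z * s <= true_value <= e + z * s
    )
    # Power: share of datasets rejecting H0: psi_1 = 0 at level alpha.
    rejected = sum(1 for e, s in zip(estimates, std_errors) if abs(e / s) > z)

    return {
        "bias": bias,
        "sd": sd,
        "mean_se": mean_se,
        "coverage": covered / n,
        "power": rejected / n,
    }


# Four hypothetical replicates with a true log-hazard of 0.5.
print(performance_metrics([0.4, 0.5, 0.6, 0.9], [0.1] * 4, 0.5))
```

Comparing `sd` with `mean_se` mirrors the comparison of SD($\hat{\psi}_1$) and se($\hat{\psi}_1$) in the tables: a large gap between the two suggests that the model-based standard errors are unreliable.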
As expected, the average coverage probability of the model-based nominal 95% CIs from the time-dependent Cox approach is smaller (0.888) than that of MSCM.

4.5 Results from Multiple Sclerosis Data Analysis

To focus on the impact of immortal time in this application, we assume that the subjects remain on β-IFN treatment once they initiate the treatment, as is assumed in previous pharmacoepidemiologic studies [73, 196, 211]. Appendix §A.6 describes the baseline characteristics of the MS cohort under consideration. As justified in our previous study [128] (see Chapter 2), we consider MSCM estimates to be ideal in this context. Results are reported in Table 4.7.

Table 4.7: Summary of the estimated parameters from the relapsing-onset multiple sclerosis (MS) patients’ data from British Columbia, Canada (1995-2008).

Approach             $\widehat{HR}$   se($\widehat{HR}$)   95% CI        Weights: average (log-SD)   Weights: range
Full cohort          1.29             0.23                 0.91 - 1.83
Full cohort (Base)   1.25             0.23                 0.87 - 1.79
Included IT          1.05             0.20                 0.72 - 1.52
Excluded IT          1.53             0.30                 1.05 - 2.24
PTDM                 1.26             0.24                 0.86 - 1.85
Sequential Cox       1.14             0.29                 0.69 - 1.89   1.00 (-4.06)                0.63 - 2.20
MSCM                 1.31             0.23                 0.92 - 1.84   1.00 (-2.86)                0.37 - 1.60

PTDM, Prescription time distribution matching.
The HR for the treatment is reported. The analyses are adjusted for baseline covariates sex, EDSS score, age and disease duration, and for the time-dependent confounder ‘cumulative relapses’.

Figure 4.1: Matched wait periods (in years) from prescription time-distribution matching approach in the relapsing-onset multiple sclerosis (MS) cohort from British Columbia, Canada (1995-2008).

Wait periods (assigned for never-treatment exposed subjects and observed for ever-treatment exposed subjects) from the PTDM method are shown in Figure 4.1.
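The random assignment of wait periods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of Appendix C.1: it assumes that each never-treated subject receives a wait period drawn at random, with replacement, from the observed wait periods of the ever-treated subjects, and that a never-treated subject whose follow-up ends before the assigned time zero is dropped. The function name `assign_wait_periods` and the toy numbers are hypothetical.

```python
import random


def assign_wait_periods(treated_waits, untreated_followup, seed=None):
    """Assign PTDM-style wait periods to never-treated subjects (sketch).

    treated_waits      : observed times from cohort entry to treatment start
    untreated_followup : total follow-up times of never-treated subjects
    Returns, per never-treated subject, a tuple (wait, remaining_followup),
    or None if the subject's follow-up ends before the assigned time zero.
    """
    rng = random.Random(seed)
    assigned = []
    for followup in untreated_followup:
        # Draw a wait period from the treated subjects' distribution.
        wait = rng.choice(treated_waits)
        if wait < followup:
            # New time zero is 'wait'; follow-up is measured from there.
            assigned.append((wait, followup - wait))
        else:
            # Assumed exclusion rule: no follow-up left after time zero.
            assigned.append(None)
    return assigned


# Toy example: wait periods and follow-up times in years.
print(assign_wait_periods([1, 2, 3], [2, 5, 1, 10], seed=1))
```

Because the draw is random, repeated calls with different seeds yield different analysis datasets from the same cohort, which is the source of the run-to-run variability in the PTDM estimates.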
As the PTDM approach produces different estimates from the same data based on random sampling of the immortal times, we estimate the HR from the MS data 1,000 times and report the mean and SD of the estimated HR in Appendix Table C.4 (Appendix §C.7), as is done in other studies involving random estimates [212]. The distribution of the estimated HR, depicted in Appendix Figure C.4 in Appendix §C.7.1, is moderately symmetric and the estimated HR is always above the null value of 1 but below 2.

The IPCW in the sequential Cox approach are less variable (SD = 0.02) than the IPW in MSCM (0.06). IPCW are estimated separately at each mini-trial data construction [73]. When they are instead estimated from the aggregated dataset [75], the HR estimate (1.11) is very close to the estimate (1.14) shown in Table 4.7 (see Appendix §C.7.2). Note that no matter how they are constructed, the IPCWs from the mini-trials are well-behaved, i.e., the averages are close to one and they have low variability (most are within the range 0.9 to 1.1 and the distributions are unimodal and symmetric; see Appendix Figures C.5 and C.6).

4.6 Discussion

Due to various practical considerations, researchers use observational survival studies to assess the impact of treatments. In such studies, in contrast to randomized clinical trials, subjects may not be exposed to the treatment at the beginning of follow-up. In longitudinal observational studies, treatment exposure in addition to other patient characteristics may change over time. Ignoring these time-varying characteristics may lead to inaccurate estimates, or possibly even to wrong conclusions being drawn. Statistical procedures, such as the time-dependent Cox model, are known to deal with time-dependent treatment and time-dependent covariate information.
However, in the medical literature, it is not uncommon to see the time-independent Cox model based on only the baseline characteristics (i.e., treatment and covariate information measured at baseline) used in such circumstances, likely for the convenience of model fitting and group-based interpretation [193].

Perhaps of even greater concern, some studies employ future treatment status (who initiates treatment later in the follow-up) to classify subjects into the treatment groups [186, 189]. Comparisons between two such misclassified groups are prone to bias related to immortal time, due to the incorrect specification of risk-sets.

The Cox model assumes that treatment initiation is unrelated to the risk of subsequent failure. Such assumptions underlying the time-dependent Cox model may be untestable or difficult to assess in an epidemiological context [82]. The Cox model is sometimes considered an oversimplified method to capture the observed process [182]. Alternative survival analysis methods, such as Poisson regression and pooled logistic regression, also suffer from the same bias when the definition of time zero for building risk-sets is not the same for all subjects [169]. Therefore, there is a need for methods that are capable of handling the time-dependent nature of longitudinal data, as well as helping us better understand the treatment-outcome mechanism so that the interpretations of the results become more appropriate.

To this end, we assess two methods that are proposed for the situation when treatment initiation occurs later than cohort entry: PTDM and sequential Cox. The appropriateness of these methods has not been assessed in the literature. We design three increasingly difficult simulation settings to highlight the importance of accounting for time-dependent covariates in longitudinal studies. The first setting (simulation-I) is the simplest: only treatment initiation may be delayed and the covariate under consideration is time-fixed.
In the second setting (simulation-II), we add a covariate that is time-dependent. The last setting (simulation-III) deals with the situation where a time-dependent confounder is present. PTDM and sequential Cox approaches are claimed to be appropriate analysis techniques for datasets generated from simulation settings I and III respectively. As the time-dependent Cox model is appropriate for simulation settings I and II, we use these results as the standard for comparison. For setting III, where a time-dependent confounder is present, a MSCM is the most popular and appropriate method and hence results from this method are used as the standard for comparison in this simulation setting.

Downward bias (indicated by a negative sign in the log-hazard estimates) in the analyses ignoring immortal time (‘exclude IT’ approach) is consistent with previous simulation studies [184]. This indicates that immortal time bias makes the treatment look more protective than it actually is. Even though the bias associated with exclusion of immortal time is generally less severe than, or as severe as, the bias from misclassifying it (‘include IT’ approach), the bias is not negligible.

A widely accepted alternative to the time-dependent Cox model is PTDM [82]. Here, treatment exposure is converted into a time-independent variable so that a simple Cox model for treatment-group comparison can be applied. This conversion of time-dependent exposure into a time-independent exposure requires restructuring of the data using the PTDM approach. In this approach, new time zeros are defined after excluding the observed and assigned immortal times in the ever and never-treatment exposed groups. The excluded times (wait-periods) for the ever-treatment exposed and never-treatment exposed groups follow the same distribution. However, the baselines for all subjects are not exactly the same as in landmark analyses [213, 214]. It is not clear whether assigning random baselines will adequately address the immortal time bias.
Ambiguous and inconsistent definition of the baseline time for different subjects in observational studies makes it hard to obtain an unbiased estimate of the treatment-outcome association due to entanglement of various sources of bias [215]. From the results of our simulation (simulation-I), we can see the bias is slightly less than when misclassifying or excluding immortal time. However, the bias is still substantial in this analysis (also see Appendix §C.2 for theoretical assessment), highlighting the value of setting a well-defined time zero or baseline.

The sequential Cox approach is an alternative method for estimating the treatment effect from more complex observational data settings where the treatment is time-dependent and censoring may be non-random [73]. In simulation-I in particular, in the absence of a time-varying covariate or confounder, this method works very well in comparison to the time-dependent Cox model. However, results from our simulation settings with a time-varying covariate or confounder are not as promising as claimed in the original paper [73]. MSCMs are generally more popular in dealing with time-dependent confounders. As MSCMs are extensions of time-dependent Cox models, they are also used in addressing the immortal time bias [216]. Although the mechanisms and interpretations behind the sequential Cox approach and MSCM are different, both claim to have the same goal of estimating the causal effect of treatment in the presence of time-dependent confounders. However, from our simulation, we do not find the sequential Cox approach to be as effective as MSCM in the presence of a time-dependent confounder (simulation-III) or even when a time-dependent covariate which is not a time-dependent confounder (simulation-II) is present.

In simulation-III, we performed a sensitivity analysis of the sequential Cox approach without using IPC weights.
This sensitivity analysis assesses the impact of artificial censoring induced in the analysis by censoring later treatment initiation cases. This analysis yielded very similar results (bias 0.720 compared to 0.721 in the original analysis). Another sensitivity analysis was performed that adjusts for the time-dependent confounder. This approach yields less bias (0.474), indicating the importance of adjusting for baseline values of the time-dependent confounder. As the time-dependent confounder values after the treatment initiation are discarded in the sequential Cox approach, it makes sense to control for the time-dependent confounder in the analysis. Even after such adjustment, the sequential Cox approach does not seem to remove the effects of time-dependent confounding, as was recently noted elsewhere [217]. Instead of using the full covariate history of the time-dependent covariate ($\bar{L}_m$), this approach only adjusts for a few values of the time-dependent covariates ($\tilde{L}_m = (L_0, L_{m-1}, L_m)$) as defined in equation (4.5). This may limit the ability of this method to obtain unbiased estimates. Additionally, this approach cannot handle treatment discontinuation or more than one treatment initiation [75].

On the other hand, in contrast to PTDM, the sequential Cox approach effectively removes the immortal time bias. The approach utilizes all subjects and is therefore more efficient than PTDM. It also handles different baselines properly by performing a stratified analysis. The focus is on recreating the covariate process at each treatment start using the mini-trial approach. Such focused and detailed scrutiny could yield insights about the data which may be hard to extract using a MSCM approach. Interpreting MSCM results remains a hurdle and an alternate view of the data may be helpful. Although IPTWs are avoided in the sequential Cox approach, we still need to use IPCWs.
These weights are less variable and more stable than IPTW [73, 217] and appropriately handle artificial censoring caused by the censoring at later treatment start dates. However, similar to other artificial censoring correction methods, IPCW may not be an effective method in the presence of strong confounding and small sample size [218].

We apply the methods under consideration to estimate the effect of β-IFN on disease progression. The sequential Cox approach seems to have a downward bias (HR = 1.14; 95% CI 0.69 - 1.89) compared to MSCM (HR = 1.31; 95% CI 0.92 - 1.84). The PTDM results (HR = 1.26; 95% CI 0.86 - 1.85) look closer to MSCM. However, the PTDM approach involves random assignment of wait periods for the never-treatment exposed subjects. When we repeat the analysis 1,000 times and average the results, the estimated treatment effect (HR = 1.44; 95% CI 0.97 - 2.11) looks too high, with a wider CI, compared to that obtained from a MSCM. The random nature of the results may be an undesirable feature of this analysis. However, use of statistical methods that produce variable results is not uncommon in the epidemiologic literature [212, 219].

Similar to other simulation studies, we only investigate a handful of possible scenarios. However, the assumptions underlying the data simulation are consistent with patterns typical in epidemiologic observational survival studies where treatment initiation may happen later for some subjects and associated covariates are measured regularly. Furthermore, our assumption of no discontinuations or interruptions in the treatment is restrictive and may not be suitable in some disease scenarios where subjects may choose different treatment strategies over the course of time. Other bias related to time-dependency, such as time-modified confounding [220], is not considered in this study. Substantial immortal time bias is induced by group-based Cox model analysis in the scenarios investigated.
However, even in these extreme scenarios, methods are available that estimate the target parameter adequately with minimal bias.

No single approach is likely to be the most suitable for analyzing all kinds of survival data. However, based on this study, we have gained considerable understanding about which approaches should be used depending on the nature of the disease mechanism. If assumptions behind the Cox model with time-dependent treatment (such as treatment assignment occurring at random times) are not reasonable in a given disease scenario, the sequential Cox approach can be used as a good alternative. However, when we need to consider post-baseline values of the time-dependent covariate to adequately model a disease process, then the sequential Cox approach is not the best alternative. In the presence of time-dependent confounders, MSCM is the best method to adjust for this type of confounding. Future research could focus on enhancing the sequential Cox approach to allow appropriate adjustments for time-dependent covariates and confounders. Future studies could also assess the impact of model misspecifications and measurement error under the same scheme used in this study.

Chapter 5

Conclusion

5.1 Summary of the Main Results

Estimating the treatment effect from drug effectiveness observational studies is challenging due to the existence of various kinds of biases. The idea behind causal inference is to design the statistical analysis of observational studies to mimic the conditions of a hypothetical randomized experiment. Such conditions will allow us to properly investigate well-formulated causal questions. This requires not only knowledge about the subject area (e.g., the condition or disease under study and associated drug exposure), but also familiarity with the statistical tools and techniques appropriate for such analyses.

Research on chronic diseases deals with multiple measurements of affected subjects over an extended follow-up period.
During this time, key patient characteristics may change, including initiation or cessation of drug treatments. This means that straightforward adjustment for baseline confounders may not be adequate to answer a question about the effectiveness of a drug, where the exposure might occur months or years after ‘baseline’. For example, longitudinal observational data are required to assess the impact of beta-interferon drug exposure on disease progression in relapsing-remitting multiple sclerosis (MS) patients in the ‘real-world’ clinical practice setting. Most commonly used causal inference tools, such as propensity scores, are not generally well-suited to deal with complex longitudinal patterns of such data, i.e., in the presence of immortal time bias and time-dependent confounding [21, ch.15]. Marginal structural Cox models (MSCMs) can be thought of as an extension of the propensity score tool that gained popularity over the last decade. MSCMs provide distinct advantages over traditional approaches by allowing adjustment for time-varying confounders, such as relapses (‘attacks’) in our MS application, as well as baseline characteristics, through the use of inverse probability weighting (IPW). As MSCMs are extensions of the Cox model with time-dependent treatment exposure, they also allow adjustment for immortal time bias.

We assessed the suitability of MSCMs to analyze data from a large cohort of 1,697 relapsing-remitting MS patients in British Columbia, Canada (1995-2008) in Chapter 2. In the context of this observational study spanning over a decade and involving patients with a chronic, yet fluctuating disease, the recently proposed normalized stabilized weights were found to be the most appropriate choice of weights.
Using these weights, no association was found between beta-interferon exposure and the hazard of disability progression (hazard ratio 1.36, 95% confidence interval 0.95, 1.94). Additionally, findings did not change when truncated normalized unstabilized weights were used in further MSCMs and to construct IPW adjusted survival curves. Qualitatively similar conclusions from approximation approaches to the weighted Cox model (i.e., MSCM) extend confidence in the findings.

IPWs are at the heart of MSCMs. The properties of IPWs influence the estimated effects from MSCM and their accuracy. Logistic regressions are popularly used to model the IPWs. Statistical learning algorithms such as bagging, support vector machines and boosting have proved useful in generating well-balanced propensity scores. As propensity scores are used in the intermediate steps to construct IPWs, it is natural to investigate the utility of these approaches for modelling IPWs. We compared the performance of these proposed approaches in Chapter 3 using simulated survival data that mimicked a context in which both treatment status and a confounder were time-dependent. Proposed approaches are compared with respect to bias, standard error, MSE, and coverage probabilities of model-based nominal 95% confidence intervals under various weight variability reduction techniques, such as normalization and increased levels of truncation. Under a rare event condition, the weights generated from boosting were found to be associated with less MSE and better coverage. Bagging and support vector machines did not perform well in this MSCM context. The study was repeated for the situation when events are more frequent and also with smaller numbers of subjects to observe the impact. In the smaller sample case, bias, variance and subsequently MSE were larger.
When events are more frequent, MSCM estimates computed using the boosting approach were similar to those from logistic regression.

The findings from this simulation study guide an application of the MSCM to investigate the impact of beta-interferon treatment in delaying disability progression in subjects from the British Columbia Multiple Sclerosis database (1995-2008). When boosting is used to model the IPWs, MSCM estimates were similar to those obtained when IPWs are estimated from the logistic regression approach as in Chapter 2. Although the confidence interval was narrower, the conclusion remains the same (hazard ratio 1.32, 95% confidence interval 0.94, 1.86).

In observational drug effectiveness survival studies, misclassification or exclusion of the period between cohort entry and first treatment exposure during the follow-up period may result in immortal time bias. This bias can be minimized by acknowledging a change in treatment exposure status with time-dependent analyses, such as fitting a time-dependent Cox model. Accounting for time-dependent variables in the analyses may be complex and the corresponding interpretations may not be intuitive. Furthermore, the assumptions of such an approach, such as treatment initiation being unrelated to the risk of subsequent failure, may be untestable or difficult to assess. Prescription time-distribution matching is an approach proposed in the literature to avoid the need for a time-dependent Cox analysis. In this method, the treatment initiation time distribution for the treated subjects is matched with a newly assigned baseline or time zero for untreated subjects, so that both treatment groups have a comparable time zero.

In longitudinal studies with a sequence of measurements, both treatment and the covariates under consideration may be time-dependent. Furthermore, the time-dependent covariates may be affected by the change of treatment status, i.e., time-dependent confounding may be present.
MSCMs are usually used to deal with such confounding. However, these models are extensions of time-dependent Cox models and therefore fitting and interpretation of these models is also not straightforward. The sequential Cox approach is suggested as an alternative approach. This approach creates small cohorts based on each possible treatment start time and the overall treatment effect is estimated by averaging the estimated effects from all the created cohorts. Both the prescription time-distribution matching and the sequential Cox approaches break the time-dependent nature of the problem down into smaller pieces such that the findings potentially become accessible to a wider audience. In Chapter 4, we assess the suitability of both approaches for analyzing data in the absence and presence of a time-dependent confounder.

These approaches are applied to investigate the impact of beta-interferon treatment in delaying disability progression in the British Columbia Multiple Sclerosis cohort (1995-2008). Under the assumption that there were no treatment discontinuations, we found no convincing evidence that β-IFN reduces the hazard of disability progression with either approach (hazard ratio 1.26, 95% confidence interval 0.86, 1.85 from the PTDM and hazard ratio 1.14, 95% confidence interval 0.69, 1.89 from the sequential Cox approach).

5.2 Implications

Most of the MSCM analyses reported in the literature aim to model disease conditions specific to HIV/AIDS. We applied and adapted advances made in this field to better study another chronic disease, MS. In this work, we identified a time-dependent confounder using a causal diagram by incorporating subject-specific knowledge of how β-IFN treatment potentially impacts on the disease process.
We then translated the MS disease features using the MSCM framework to adjust for the time-dependent confounder.

Randomized clinical trials are not feasible ethically or practically over the long observation periods of chronic diseases such as MS. Such studies may also fail to reflect the ‘real-world’ clinical practice setting. Therefore, observational studies may provide invaluable information in the MS context. It is of considerable clinical importance to establish whether β-IFN has an effect in delaying long-term progression of the disease. Therefore, the aim was to estimate the effect of β-IFN treatment on the longer-term outcome of irreversible disability. Although this question was studied previously in various observational studies, results were seemingly inconsistent and contradictory. In this study, we show how the analysis should be appropriately done in the presence of a time-dependent confounder, such as MS relapses.

The present study is the first to examine the impact of beta-interferon drug exposure on disease progression in an MS cohort using an MSCM approach. This study took into account the causal dynamics of the MS disease process over a long follow-up period and has made a notable contribution to the available evidence. The implications of this study go far beyond the MS disease setting, as it shows that normalized stabilized weights are useful for fitting MSCMs in order to study chronic disease conditions with an extended follow-up period. A large number of sensitivity analyses were carefully planned and carried out to check various assumptions, to validate the results in a restricted sub-population, and to assess the impact of covariate definitions used in the data analysis.

It has long been hypothesized that use of statistical learning techniques might improve the properties of IPWs as they were shown to improve the balance of propensity scores [135, 140, 141].
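To make the weight adjustments referred to in this section more concrete, the sketch below applies two common variance-reduction steps to a set of stabilized IPWs: truncation at empirical quantiles and normalization to a mean of one. The exact definitions used in the thesis are not reproduced in this section, so the helper names, the crude quantile rule, the single risk set, and the example weights are all assumptions made for illustration.

```python
def truncate_weights(weights, lower_q=0.01, upper_q=0.99):
    """Clip weights to crude empirical quantiles (floor-index rule, an assumption)."""
    ordered = sorted(weights)
    last = len(ordered) - 1
    low = ordered[int(lower_q * last)]
    high = ordered[int(upper_q * last)]
    return [min(max(w, low), high) for w in weights]


def normalize_weights(weights):
    """Rescale weights so that their mean is one within a single risk set."""
    mean_weight = sum(weights) / len(weights)
    return [w / mean_weight for w in weights]


# Hypothetical stabilized weights for one risk set, with one extreme value.
stabilized = [0.2, 0.8, 1.0, 1.1, 1.3, 25.0]
truncated = truncate_weights(stabilized)
normalized = normalize_weights(stabilized)
```

In a longitudinal analysis these steps would typically be applied separately within each time interval's risk set; here a single set is used to keep the example short. Truncation pulls the extreme weight of 25.0 back to the upper cut-off, while normalization leaves the relative sizes of the weights unchanged.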
Using a simulation study and data analysis, this study is the first to investigate the utility of statistical learning methods such as bagging, SVM and boosting in creating weights. In particular, weights created using the boosting approach were shown to improve the behaviour of IPWs and consequently provided a better estimate of the effect from the MSCM with a narrower confidence interval. The in-depth analysis also considered the impact of various weight variability reduction techniques, such as truncation and normalization.

Pharmacoepidemiological studies often suggest alternative techniques for ease of analysis and interpretation of the results. For example, ‘PTDM’ was suggested as an alternative to the Cox model with time-dependent exposures to minimize immortal time bias. One set of data analyses found this method to be effective in controlling immortal time bias [82], in the sense that it provided results similar to those from fitting a Cox model with time-dependent exposures.

The MSCM approach is usually used to estimate a time-dependent treatment effect in the presence of time-dependent confounding. To avoid some of the problems of fitting an MSCM, such as potential instability of the IPW estimates, an alternative, the sequential Cox approach, was adopted to study drug exposure in analyzing an HIV cohort [73]. Although the estimation mechanisms are different, the target parameter, the causal effect of treatment, is the same in both approaches, and both approaches also adjust for time-dependent confounding.

To date, neither of these alternative approaches has been investigated in the literature in generalized settings as alternatives to the Cox model with time-dependent exposures and the MSCM, respectively. To the best of our knowledge, ours is the first study to investigate the generalizability of these approaches using simulation settings suitable for the contexts of these methods.
Findings from our study guide the appropriate choice of analysis tool based on information such as whether time-dependent covariates are present or not and whether a covariate interacts with the treatment (exposure possibly delayed for some subjects). The simulation studies revealed that the sequential Cox approach is more useful than the PTDM approach in addressing immortal time bias. On the other hand, our results indicate that this approach is not as effective as other studies have suggested in the presence of a time-dependent confounder.

5.3 Future Research

The main aim of this dissertation was to assess, validate and refine the causal inference tools to estimate the causal effect of a treatment in a realistic epidemiological context involving time-dependent confounders. We used these tools to answer an MS research question that is of great importance to MS patients: is β-IFN treatment beneficial in reducing the hazard of longer-term irreversible disability milestones? Other approaches, such as structural nested models [38, 70, 71], the sequential stratification approach [72], the nonparametric g-computation approach [221–226], tree-based g-computation [227] and parametric g-computation [228–233] may also be useful in estimating treatment effects in MS in the presence of time-dependent confounders. Further research could make use of the dynamic MSCMs [234–236] and the dynamic random g-formula [237, 238] to answer questions regarding the optimal time to start β-IFN treatment. Future research could address more specific questions regarding the direct, indirect and mediated effects of β-IFN using the g-computation approach and extensions of the sequential Cox approach [196, 229].

Bibliography

[1] Evans C., Zhu F., Kingwell E., Shirani A., van der Kop M., Petkau J., Gustafson P., Zhao Y., Oger J., and Tremlett H. Association between beta-interferon exposure and hospital events in multiple sclerosis. Pharmacoepidemiology and Drug Safety, 2014.
doi: 10.1002/pds.3667. URL http://dx.doi.org/10.1002/pds.3667.
[2] IFNB Multiple Sclerosis Study Group. Interferon beta-1b is effective in relapsing-remitting multiple sclerosis. I. Clinical results of a multicenter, randomized, double-blind, placebo-controlled trial. Neurology, 43(4):655–661, 1993.
[3] IFNB Multiple Sclerosis Study Group and the University of British Columbia MS/MRI Analysis Group. Interferon beta-1b in the treatment of multiple sclerosis: final outcome of the randomized controlled trial. Neurology, 45(7):1277–1285, 1995.
[4] Jacobs L.D., Cookfair D.L., Rudick R.A., Herndon R.M., Richert J.R., Salazar A.M., Fischer J.S., Goodkin D.E., Granger C.V., Simon J.H., et al. Intramuscular interferon beta-1a for disease progression in relapsing multiple sclerosis. Annals of Neurology, 39(3):285–294, 1996.
[5] Ebers G.C. and PRISMS (Prevention of Relapses and Disability by Interferon beta-1a Subcutaneously in Multiple Sclerosis) study group. Randomised double-blind placebo-controlled study of interferon beta-1a in relapsing/remitting multiple sclerosis. The Lancet, 352(9139):1498–1504, 1998.
[6] Freedman M. and the OWIMS Study Group. Evidence of interferon beta-1a dose response in relapsing–remitting MS: the OWIMS Study. Neurology, 53(4):679–686, 1999.
[7] Hume D. An enquiry concerning human understanding. P.F. Collier & Son, 1748.
[8] Neyman J., Dabrowska D.M., and Speed T.P. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. (translated). Roczniki Nauk Rolniczych Tom X [in Polish, translated in] Statistical Science, 5:465–472, 1923.
[9] Fisher R.A. Statistical methods for research workers. Edinburgh, 1925.
[10] Fisher R.A. et al. The arrangement of field experiments. Journal of the Ministry of Agriculture of Great Britain, 33:503–513, 1926.
[11] Rubin D.B. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688, 1974.
[12] Rubin D.B.
Assignment to treatment group on the basis of a covariate. Journal of Educational and Behavioral Statistics, 2(1):1, 1977.
[13] Rubin D.B. Bayesian inference for causal effects: The role of randomization. The Annals of Statistics, 6(1):34–58, 1978.
[14] Rubin D.B. Estimation in parallel randomized experiments. Journal of Educational and Behavioral Statistics, 6(4):377, 1981.
[15] Rubin D.B. Formal mode of statistical inference for causal effects. Journal of Statistical Planning and Inference, 25(3):279–292, 1990.
[16] Rubin D.B. Causal inference using potential outcomes. Journal of the American Statistical Association, 100(469):322–331, 2005.
[17] Dawid A.P. Causal inference without counterfactuals. Journal of the American Statistical Association, 95(450):407–424, 2000.
[18] Holland P.W. Statistics and causal inference. Journal of the American Statistical Association, 81(396):945–960, 1986.
[19] Rubin D.B. Statistics and causal inference: comment: which ifs have causal answers. Journal of the American Statistical Association, 81(396):961–962, 1986.
[20] Rosenbaum P.R. and Rubin D.B. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.
[21] Hernán M.A. and Robins J.M. Causal inference. Chapman & Hall/CRC, 2015. Forthcoming. URL: http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/ Last accessed: Oct-05, 2014.
[22] Robins J.M. Association, causation, and marginal structural models. Synthese, 121(1):151–179, 1999.
[23] Rubin D.B. Randomization analysis of experimental data: The Fisher randomization test. Journal of the American Statistical Association, 75(371):591–593, 1980.
[24] Cox D.R. Planning of experiments. Wiley, 1958.
[25] Lewis D.K. Counterfactuals. Wiley-Blackwell, 1973.
[26] VanderWeele T.J. and Hernán M.A. Causal inference under multiple versions of treatment. Journal of Causal Inference, 1(1):1–20, 2013.
[27] VanderWeele T.J. and Hernán M.A.
From counterfactuals to sufficient component causes and vice versa. European Journal of Epidemiology, 21(12):855–858, 2006.
[28] Hernán M.A. and Taubman S.L. Does obesity shorten life? The importance of well-defined interventions to answer causal questions. International Journal of Obesity, 32:S8–S14, 2008.
[29] Cole S.R. and Frangakis C.E. The consistency statement in causal inference: a definition or an assumption? Epidemiology, 20(1):3–5, 2009.
[30] Rosenbaum P.R. and Rubin D.B. Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79(387):516–524, 1984.
[31] Rosenbaum P.R. and Rubin D.B. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. American Statistician, 39(1):33–38, 1985.
[32] Rubin D.B. Estimating causal effects from large data sets using propensity scores. Annals of Internal Medicine, 127(8 Part 2):757–763, 1997.
[33] Robins J.M. A new approach to causal inference in mortality studies with a sustained exposure period - application to control of the healthy worker survivor effect. Mathematical Modelling, 7(9-12):1393–1512, 1986.
[34] Robins J.M. Addendum to “a new approach to causal inference in mortality studies with a sustained exposure period - application to control of the healthy worker survivor effect”. Computers & Mathematics with Applications, 14(9-12):923–945, 1987.
[35] Greenland S., Pearl J., and Robins J.M. Causal diagrams for epidemiologic research. Epidemiology, 10(1):37–48, 1999.
[36] Robins J.M. The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. Health Service Research Methodology: a Focus on AIDS, 113:113–159, 1989.
[37] Robins J. Estimation of the time-dependent accelerated failure time model in the presence of confounding factors. Biometrika, 79(2):321–334, 1992.
[38] Robins J.M.
Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics - Theory and Methods, 23(8):2379–2412, 1994.
[39] Robins J.M. Marginal structural models. In Proceedings of the American Statistical Association, Section on Bayesian Statistical Science, pages 1–10. American Statistical Association, 1997.
[40] Robins J.M. Correction for non-compliance in equivalence trials. Statistics in Medicine, 17(3):269–302, 1998.
[41] Robins J.M. Marginal structural models versus structural nested models as tools for causal inference. Statistical Models in Epidemiology, the Environment and Clinical Trials, 116:95–134, 1999.
[42] Robins J.M., Hernán M.A., and Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5):550–560, 2000.
[43] Månsson R., Joffe M.M., Sun W., and Hennessy S. On the estimation and use of propensity scores in case-control and case-cohort studies. American Journal of Epidemiology, 166(3):332, 2007.
[44] Austin P.C. The performance of different propensity-score methods for estimating differences in proportions (risk differences or absolute risk reductions) in observational studies. Statistics in Medicine, 29:2137–2148, 2010.
[45] Miettinen O.S. Components of the crude risk ratio. American Journal of Epidemiology, 96(2):168–172, 1972.
[46] Miettinen O.S. and Cook E. Confounding: essence and detection. American Journal of Epidemiology, 114(4):593–603, 1981.
[47] Sato T. and Matsuyama Y. Marginal structural models as a tool for standardization. Epidemiology, 14(6):680–686, 2003.
[48] Newman S.C. Causal analysis of case-control data. Epidemiologic Perspectives & Innovations, 3(1):2–7, 2006.
[49] Hernán M.A. and Robins J.M. Estimating causal effects from epidemiological data. Journal of Epidemiology and Community Health, 60(7):578–586, 2006.
[50] Hernán M.A., Brumback B., and Robins J.M.
Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology, 11(5):561–570, 2000.
[51] Suarez D., Borras R., and Basagana X. Differences between marginal structural models and conventional models in their exposure effect estimates: A systematic review. Epidemiology, 22(4):586–588, 2011.
[52] Yang S., Eaton C.B., Lu J., and Lapane K.L. Application of marginal structural models in pharmacoepidemiologic studies: a systematic review. Pharmacoepidemiology and Drug Safety, 23(6):560–571, 2014.
[53] van der Laan M.J. and Robins J.M. Unified methods for censored longitudinal data and causality. Springer Verlag, 2003.
[54] Mortimer K.M., Neugebauer R., Van der Laan M., and Tager I.B. An application of model-fitting procedures for marginal structural models. American Journal of Epidemiology, 162(4):382–388, 2005.
[55] Cole S.R. and Hernán M.A. Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology, 168(6):656–664, 2008.
[56] Young J.G., Hernán M.A., Picciotto S., and Robins J.M. Relation between three classes of structural models for the effect of a time-varying exposure on survival. Lifetime Data Analysis, 16(1):71–84, 2010.
[57] Xiao Y., Abrahamowicz M., and Moodie E.E.M. Accuracy of conventional and marginal structural Cox model estimators: A simulation study. The International Journal of Biostatistics, 6(2):1–28, 2010.
[58] Hernán M.A., Brumback B., and Robins J.M. Marginal structural models to estimate the joint causal effect of nonrandomized treatments. Journal of the American Statistical Association, 96(454):440–448, 2001.
[59] Pearl J. Causal diagrams for empirical research. Biometrika, pages 669–688, 1995.
[60] Pearl J. Causality: models, reasoning and inference. Cambridge Univ Press, 2000.
[61] Spirtes P., Glymour C.N., and Scheines R. Causation, prediction, and search, volume 81. The MIT Press, 2000.
[62] Shrier I. and Platt R.
Reducing bias through directed acyclic graphs. BMC Medical Research Methodology, 8(1):70, 2008.
[63] Hernández-Díaz S., Schisterman E.F., and Hernán M.A. The birth weight paradox uncovered? American Journal of Epidemiology, 164(11):1115, 2006.
[64] Hernán M.A., Hernández-Díaz S., and Robins J.M. A structural approach to selection bias. Epidemiology, 15(5):615–625, 2004.
[65] VanderWeele T.J. and Robins J.M. Four types of effect modification: A classification based on directed acyclic graphs. Epidemiology, 18(5):561, 2007.
[66] Hernán M.A., Clayton D., and Keiding N. The Simpson’s paradox unraveled. International Journal of Epidemiology, 2011.
[67] Schisterman E.F., Cole S.R., and Platt R.W. Overadjustment bias and unnecessary adjustment in epidemiologic studies. Epidemiology, 20(4):488, 2009.
[68] Leibovici L. Effects of remote, retroactive intercessory prayer on outcomes in patients with bloodstream infection: randomised controlled trial. British Medical Journal, 323(7327):1450, 2001.
[69] Fewell Z., Hernán M.A., Wolfe F., Tilling K., Choi H., and Sterne J.A. Controlling for time-dependent confounding using marginal structural models. The Stata Journal, 4(4):402–420, 2004.
[70] Robins J. A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. Journal of Chronic Diseases, 40:139–161, 1987.
[71] Almirall D., Ten Have T., and Murphy S.A. Structural nested mean models for assessing time-varying effect moderation. Biometrics, 66(1):131–139, 2010.
[72] Schaubel D.E., Wolfe R.A., Sima C.S., and Merion R.M. Estimating the effect of a time-dependent treatment by levels of an internal time-dependent covariate: application to the contrast between liver wait-list and posttransplant mortality. Journal of the American Statistical Association, 104(485):49–59, 2009.
[73] Gran J.M., Røysland K., Wolbers M., Didelez V., Sterne J.A.C., Ledergerber B., Furrer H., von Wyl V., and Aalen O.O.
A sequential Cox approach for estimating the causal effect of treatment in the presence of time-dependent confounding applied to data from the Swiss HIV Cohort Study. Statistics in Medicine, 29(26):2757–2768, 2010.
[74] Andersen P.K. and Gill R.D. Cox’s regression model for counting processes: a large sample study. The Annals of Statistics, 10(4):1100–1120, 1982.
[75] Lange T. and Rod N.H. Causal models. In Handbook of survival analysis, pages 135–151. CRC Press, 2013.
[76] Messmer B.J., Leachman R.D., Nora J.J., and Cooley D.A. Survival-times after cardiac allografts. The Lancet, 293(7602):954–956, 1969.
[77] Clark D.A., Stinson E.B., Griepp R.B., Schroeder J.S., Shumway N.E., and Harrison D.C. Cardiac transplantation in man. Annals of Internal Medicine, 75(1):15, 1971.
[78] Suissa S. Effectiveness of inhaled corticosteroids in chronic obstructive pulmonary disease: immortal time bias in observational studies. American Journal of Respiratory and Critical Care Medicine, 168(1):49, 2003.
[79] Suissa S. Immortal time bias in observational studies of drug effects. Pharmacoepidemiology and Drug Safety, 16(3):241–249, 2007.
[80] Suissa S. Immortal time bias in pharmacoepidemiology. American Journal of Epidemiology, 167(4):492, 2008.
[81] Clayton D. and Hills M. Statistical models in epidemiology. Oxford University Press, 1993.
[82] Zhou Z., Rahme E., Abrahamowicz M., and Pilote L. Survival bias associated with time-to-treatment initiation in drug effectiveness evaluation: a comparison of methods. American Journal of Epidemiology, 162(10):1016–1023, 2005.
[83] Cox D.R. Regression models and life-tables (with discussion). Journal of the Royal Statistical Society. Series B (Methodological), pages 187–220, 1972.
[84] Sylvestre M.P., Huszti E., and Hanley J.A. Do OSCAR winners live longer than less successful peers? A reanalysis of the evidence. Annals of Internal Medicine, 145(5):361–363, 2006.
[85] Ho P.M., Fihn S.D., Wang L., Bryson C.L., Lowy E., Maynard C., Magid D.J., et al.
Clopidogrel and long-term outcomes after stent implantation for acute coronary syndrome. American Heart Journal, 154(5):846–851, 2007.
[86] Karp I., Behlouli H., LeLorier J., and Pilote L. Statins and cancer risk. The American Journal of Medicine, 121(4):302–309, 2008.
[87] Ho P.M., Maddox T.M., Wang L., Fihn S.D., Jesse R.L., Peterson E.D., and Rumsfeld J.S. Risk of adverse outcomes associated with concomitant use of clopidogrel and proton pump inhibitors following acute coronary syndrome. Journal of American Medical Association, 301(9):937–944, 2009.
[88] Snyder C.W., Weinberg J.A., McGwin Jr G., Melton S.M., George R.L., Reiff D.A., Cross J.M., Hubbard-Brown J., Rue III L.W., and Kerby J.D. The relationship of blood product ratio to mortality: survival benefit or survival bias? The Journal of Trauma, 66(2):358–362, 2009.
[89] World Health Organization: Multiple Sclerosis International Federation. Multiple Sclerosis Resources in the World. Geneva, Switzerland: World Health Organization, 2008.
[90] Brown M.G., Kirby S., Skedgel C., Fisk J.D., Murray T.J., Bhan V., and Sketris I.S. How effective are disease-modifying drugs in delaying progression in relapsing-onset MS? Neurology, 69(15):1498, 2007.
[91] Trojano M., Pellegrini F., Fuiani A., Paolicelli D., Zipoli V., Zimatore G.B., Di Monte E., Portaccio E., Lepore V., Livrea P., and Amato M.P. New natural history of interferon-β–treated relapsing multiple sclerosis. Annals of Neurology, 61(4):300–306, 2007.
[92] Shirani A., Zhao Y., Karim M.E., Evans C., Kingwell E., van der Kop M., Oger J., Gustafson P., Petkau J., and Tremlett H. Association between use of interferon beta and progression of disability in patients with relapsing-remitting multiple sclerosis. Journal of American Medical Association, 308(3):247–256, 2012.
[93] Renoux C. and Suissa S. Immortal time bias in the study of effectiveness of interferon-β in multiple sclerosis.
Annals of Neurology, 64(1):109–110, 2008.
[94] Koch M., Mostert J., De Keyser J., Tremlett H., and Filippini G. Interferon-beta treatment and the natural history of relapsing-remitting multiple sclerosis. Annals of Neurology, 63(1):125–126, 2008.
[95] Derfuss T. and Kappos L. Evaluating the potential benefit of interferon treatment in multiple sclerosis. Journal of American Medical Association, 308(3):290–291, 2012.
[96] Goodin D.S., Reder A.T., and Cutter G. Treatment with interferon beta for multiple sclerosis. The Journal of the American Medical Association, 308(16):1627–1628, 2012.
[97] Shirani A., Petkau J., and Tremlett H. Treatment with interferon beta for multiple sclerosis-reply. Journal of American Medical Association, 308(16):1627–1628, 2012.
[98] Greenberg B.M., Balcer L., Calabresi P.A., Cree B., Cross A., Frohman T., Gold R., Havrdova E., Hemmer B., Kieseier B.C., Lisak R., Miller M.K. A. Racke, Steinman L., Stuve O., Wiendl H., and Frohman E. Interferon beta use and disability prevention in relapsing-remitting multiple sclerosis. Journal of American Medical Association Neurology, 70(2):248–251, 2013.
[99] Shirani A., Zhao Y., Karim M.E., Evans C., Kingwell E., van der Kop M., Oger J., Gustafson P., Petkau J., and Tremlett H. Interferon beta and long-term disability in multiple sclerosis. JAMA Neurology, 70(5):651–653, 2013.
[100] Coles A. Multiple sclerosis: The bare essentials. Neurology in Practice, 9(2):118–126, 2009.
[101] Shirani A., Zhao Y., Karim M.E., Evans C., Kingwell E., van der Kop M., Oger J., Gustafson P., Petkau J., and Tremlett H. Investigation of heterogeneity in the association between interferon beta and disability progression in multiple sclerosis: an observational study. European Journal of Neurology, 21(6):835–844, 2014.
[102] Westreich D., Cole S.R., Schisterman E.F., and Platt R.W. A simulation study of finite-sample properties of marginal structural Cox proportional hazards models.
Statistics in Medicine, 31(19):2098–2109, 2012.
[103] Havercroft W.G. and Didelez V. Simulating from marginal structural models with time-dependent confounding. Statistics in Medicine, 31(30):4190–4206, 2012.
[104] Xiao Y., Moodie E.E.M., and Abrahamowicz M. Comparison of approaches to weight truncation for marginal structural Cox models. Epidemiologic Methods, 2(1):1–20, 2012.
[105] Kurtzke J.F. Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS). Neurology, 33(11):1444–1452, 1983.
[106] Tremlett H., Paty D., and Devonshire V. Disability progression in multiple sclerosis is slower than previously reported. Neurology, 66(2):172–177, 2006.
[107] Tremlett H., Yousefi M., Devonshire V., Rieckmann P., and Zhao Y. Impact of multiple sclerosis relapses on progression diminishes with time. Neurology, 73(20):1616–1623, 2009.
[108] Tremlett H., Zhao Y., Joseph J., and Devonshire V. Relapses in multiple sclerosis are age- and time-dependent. Journal of Neurology, Neurosurgery & Psychiatry, 79(12):1368–1374, 2008.
[109] Glymour M.M. Using causal diagrams to understand common problems in social epidemiology. In Oakes J.M. and Kaufman J.S., editors, Methods in social epidemiology. Jossey-Bass, 2006.
[110] Robins J.M., Greenland S., and Hu F.C. Estimation of the causal effect of a time-varying exposure on the marginal mean of a repeated binary outcome. Journal of the American Statistical Association, 94(447):687–700, 1999.
[111] Cook N.R., Cole S.R., and Hennekens C.H. Use of a marginal structural model to determine the effect of aspirin on cardiovascular mortality in the Physicians’ Health Study. American Journal of Epidemiology, 155(11):1045–1053, 2002.
[112] Cole S.R., Hernán M.A., Robins J.M., Anastos K., Chmiel J., Detels R., Ervin C., Feldman J., Greenblatt R., Kingsley L., Lai S., Young M., Cohen M., and Muñoz A.
Effect of highly active antiretroviral therapy on time to acquired immunodeficiency syndrome or death using marginal structural models. American Journal of Epidemiology, 158(7):687–694, 2003.
[113] McCulloch M., Broffman M., van der Laan M., Hubbard A., Kushi L., Kramer A., Gao J., and Colford J.M. Lung cancer survival with herbal medicine and vitamins in a whole-systems approach: ten-year follow-up data analyzed with marginal structural models and propensity score methods. Integrative Cancer Therapies, 10(3):260–279, 2011.
[114] Ali R.A., Ali M.A., and Wei Z. On computing standard errors for marginal structural Cox models. Lifetime Data Analysis, 20(1):106–131, 2014.
[115] Westreich D., Cole S.R., Tien P.C., Chmiel J.S., Kingsley L., Funk M.J., Anastos K., and Jacobson L.P. Time scale and adjusted survival curves for marginal structural Cox models. American Journal of Epidemiology, 171(6):691–700, 2010.
[116] R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2012. URL http://www.R-project.org/. ISBN 3-900051-07-0.
[117] Harrell F.E. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Springer, 2001.
[118] Thompson Jr W.A. On the treatment of grouped observations in life studies. Biometrics, 33(3):463–470, 1977.
[119] D’Agostino R.B., Lee M.L., Belanger A.J., Cupples L.A., Anderson K., and Kannel W.B. Relation of pooled logistic regression to time dependent Cox regression analysis: the Framingham Heart Study. Statistics in Medicine, 9(12):1501–1515, 1990.
[120] Platt R.W., Delaney J.A.C., and Suissa S. The positivity assumption and marginal structural models: the example of warfarin use and risk of bleeding. European Journal of Epidemiology, 27(2):77–83, 2012.
[121] Lee B.K., Lessler J., and Stuart E.A. Weight trimming and propensity score weighting.
PLoS ONE, 6(3):e18174, 2011.
[122] Van der Wal W.M., Noordzij M., Dekker F.W., Boeschoten E.W., Krediet R.T., Korevaar J.C., and Geskus R.B. Comparing mortality in renal patients on hemodialysis versus peritoneal dialysis using a marginal structural model. The International Journal of Biostatistics, 6(1):1–19, 2010.
[123] Robins J., Orellana L., and Rotnitzky A. Estimation and extrapolation of optimal treatment and testing strategies. Statistics in Medicine, 27(23):4678–4721, 2008.
[124] Willoughby E.W. and Paty D.W. Scales for rating impairment in multiple sclerosis: a critique. Neurology, 38(11):1793–1793, 1988.
[125] Ebers G.C., Traboulsee A., Li D., Langdon D., Reder A.T., Goodin D.S., Bogumil T., Beckmann K., Wolf C., Konieczny A., and the Investigators of the 16-year Long-Term Follow-Up Study. Analysis of clinical outcomes according to original treatment groups 16 years after the pivotal IFNB-1b trial. Journal of Neurology, Neurosurgery & Psychiatry, 81(8):907–912, 2010.
[126] Lefebvre G., Delaney J.A., and Platt R.W. Impact of mis-specification of the treatment model on estimates from a marginal structural model. Statistics in Medicine, 27(18):3629–3642, 2008.
[127] Imai K. and Ratkovic M. Robust estimation of inverse probability weights for marginal structural models, 2014. URL http://imai.princeton.edu/research/MSM.html. Technical Report, Last accessed: July 20, 2014.
[128] Karim M.E., Gustafson P., Petkau J., Zhao Y., Shirani A., Kingwell E., Evans C., van der Kop M., Oger J., and Tremlett H. Marginal Structural Cox Models for Estimating the Association Between β-Interferon Exposure and Disease Progression in a Multiple Sclerosis Cohort. American Journal of Epidemiology, 180(2):160–171, 2014.
[129] Gruber S., Logan R.W., Jarrín I., Monge S., and Hernán M.A. Ensemble learning of inverse probability weights for marginal structural modeling in large observational datasets. Statistics in Medicine, 2014. doi: 10.1002/sim.6322.
URL http://dx.doi.org/10.1002/sim.6322.
[130] Fong C. and Imai K. Covariate balancing propensity score for general treatment regimes, 2014. URL http://imai.princeton.edu/research/CBGPS.html. Technical Report, Last accessed: Sept 20, 2014.
[131] Wyss R., Ellis A.R., Brookhart M.A., Girman C.J., Funk M.J., LoCasale R., and Stürmer T. The role of prediction modeling in propensity score estimation: An evaluation of logistic regression, bCART, and the covariate-balancing propensity score. American Journal of Epidemiology, 180(6):645–655, 2014.
[132] Lee B.K., Lessler J., and Stuart E.A. Improving propensity score weighting using machine learning. Statistics in Medicine, 29(3):337–346, 2010.
[133] Austin P.C. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3):399–424, 2011.
[134] McCaffrey D.F., Ridgeway G., and Morral A.R. Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods, 9(4):403–425, 2004.
[135] Westreich D., Lessler J., and Funk M.J. Propensity score estimation: machine learning and classification methods as alternatives to logistic regression. Journal of Clinical Epidemiology, 63(8):826, 2010.
[136] Li L., Shen C., Wu A.C., and Li X. Propensity score-based sensitivity analysis method for uncontrolled confounding. American Journal of Epidemiology, 174(3):345–353, 2011.
[137] Zhu Y., Ghosh D., Mukherjee B., and Mitra N. A data-adaptive strategy for inverse weighted estimation of causal effects, 2013. URL http://works.bepress.com/debashis_ghosh/58. Technical Report, Collection of Biostatistics Research Archive, Last accessed: June-05-2014.
[138] Keller B.S., Kim J., and Steiner P.M. Data mining alternatives to logistic regression for propensity score estimation: Neural networks and support vector machines.
Multivariate Behavioral Research, 48(1):164–164, 2013.
[139] Watkins S., Jonsson-Funk M., Brookhart M.A., Rosenberg S.A., O’Shea T.M., and Daniels J. An empirical comparison of tree-based methods for propensity score estimation. Health Services Research, 48(5):1798–1817, 2013.
[140] Regier M.D., Moodie E.E.M., and Platt R.W. The effect of error-in-confounders on the estimation of the causal parameter when using marginal structural models and inverse probability-of-treatment weights: A simulation study. The International Journal of Biostatistics, 10(1):1–15, 2014.
[141] Coffman D.L. and Zhong W. Assessing mediation using marginal structural models in the presence of confounding and moderation. Psychological Methods, 17(4):642–664, 2012.
[142] Young J.G., Hernán M.A., Picciotto S., and Robins J.M. Simulation from structural survival models under complex time-varying data structures. In JSM Proceedings, Section on Statistics in Epidemiology, pages 1–6. American Statistical Association, 2008.
[143] Picciotto S., Young J., and Hernán M.A. G-estimation of structural nested cumulative failure time models. American Journal of Epidemiology, 67:139, 2008.
[144] Young J.G. and Tchetgen Tchetgen E.J. Simulation from a known Cox MSM using standard parametric models for the g-formula. Statistics in Medicine, 33(6):1001–1014, 2014.
[145] Lin D.Y. and Wei L. The robust inference for the Cox proportional hazards model. Journal of the American Statistical Association, 84(408):1074–1078, 1989.
[146] Binder D.A. Fitting Cox’s proportional hazards models from survey data. Biometrika, 79(1):139–147, 1992.
[147] Breiman L. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
[148] Breiman L. Arcing classifier (with discussion and a rejoinder by the author). The Annals of Statistics, 26(3):801–849, 1998.
[149] Kuhn M. and Johnson K. Applied predictive modeling. Springer, 2013.
[150] James G., Witten D., Hastie T., and Tibshirani R. An introduction to statistical learning.
Springer, 2013.
[151] Luellen J.K., Shadish W.R., and Clark M.H. Propensity scores: an introduction and experimental test. Evaluation Review, 29(6):530–558, 2005.
[152] Fan R., Chen P., and Lin C. Working set selection using second order information for training support vector machines. The Journal of Machine Learning Research, 6:1889–1918, 2005.
[153] Chang C. and Lin C. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):1–27, 2011.
[154] Ridgeway G. The state of boosting. Computing Science and Statistics, pages 172–181, 1999.
[155] Guo S. and Fraser M.W. Propensity score analysis: statistical methods and applications. Sage Publications, 2009.
[156] Robins J.M. and Hernán M.A. Estimation of the causal effects of time-varying exposures. In Longitudinal data analysis, pages 553–599. CRC Press, 2009.
[157] Wang Y., Petersen M.L., Bangsberg D., and van der Laan M.J. Diagnosing bias in the inverse probability of treatment weighted estimator resulting from violation of experimental treatment assignment, 2006. URL http://biostats.bepress.com/ucbbiostat/paper211/. Technical Report, Last accessed: July 20, 2014.
[158] Bembom O. and van der Laan M.J. Data-adaptive selection of the truncation level for inverse-probability-of-treatment-weighted estimators, 2008. URL http://biostats.bepress.com/ucbbiostat/paper230/. Technical Report, Last accessed: July 20, 2014.
[159] Bryan J., Yu Z., and van der Laan M.J. Analysis of longitudinal marginal structural models. Biostatistics, 5(3):361–380, 2004.
[160] Moodie E.E.M., Stephens D.A., and Klein M.B. A marginal structural model for multiple-outcome survival data: assessing the impact of injection drug use on several causes of death in the Canadian co-infection cohort. Statistics in Medicine, 33(8):1409–1425, 2014.
[161] Marcus S.M., Siddique J., Ten Have T.R., Gibbons R.D., Stuart E., and Normand S.T. Balancing treatment comparisons in longitudinal studies.
Psychiatric Annals, 38(12):805, 2008.
[162] Cole S.R. and Hernán M.A. Adjusted survival curves with inverse probability weights. Computer Methods and Programs in Biomedicine, 75(1):45–49, 2004.
[163] Langford J. and Zadrozny B. Estimating class membership probabilities using classifier learners. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, pages 198–205, 2005.
[164] Zhu J. and Hastie T. Kernel logistic regression and the import vector machine. In Advances in Neural Information Processing Systems, pages 1081–1088, 2001.
[165] Efron B. and Tibshirani R.J. An introduction to the bootstrap. CRC Press, 1994.
[166] Brumback B.A., Hernán M.A., Haneuse S.J.P.A., and Robins J.M. Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures. Statistics in Medicine, 23(5):749–767, 2004.
[167] Lash T.L. and Cole S.R. Immortal person-time in studies of cancer outcomes. Journal of Clinical Oncology, 27(23):e55–e56, 2009.
[168] van Walraven C., Davis D., Forster A.J., and Wells G.A. Time-dependent bias was common in survival analyses published in leading clinical journals. Journal of Clinical Epidemiology, 57(7):672–682, 2004.
[169] Wolkewitz M., Allignol A., Harbarth S., de Angelis G., Schumacher M., and Beyersmann J. Time-dependent study entries and exposures in cohort studies can easily be sources of different and avoidable types of bias. Journal of Clinical Epidemiology, 65(11):1171–1180, 2012.
[170] Gail M.H. Does cardiac transplantation prolong life? A reassessment. Annals of Internal Medicine, 76(5):815–817, 1972.
[171] Lévesque L.E., Hanley J.A., Kezouh A., and Suissa S. Problem of immortal time bias in cohort studies: example using statins for preventing progression of diabetes. British Medical Journal, 340, 2010.
[172] Austin P.C.
Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Statistics in Medicine, 28(25):3083–3107, 2009.
[173] Ravi B., Croxford R., Austin P.C., Lipscombe L., Bierman A.S., Harvey P.J., and Hawker G.A. The relation between total joint arthroplasty and risk for serious cardiovascular events in patients with moderate-severe osteoarthritis: propensity score matched landmark analysis. British Medical Journal, 347:f6187, 2013.
[174] Kiri V.A., Pride N.B., Soriano J.B., and Vestbo J. Inhaled corticosteroids in chronic obstructive pulmonary disease: results from two observational designs free of immortal time bias. American Journal of Respiratory and Critical Care Medicine, 172(4):460–464, 2005.
[175] Karim M.E. Can joint replacement reduce cardiovascular risk? British Medical Journal, 347:f6651, 2013.
[176] Li Y.P., Propert K.J., and Rosenbaum P.R. Balanced risk set matching. Journal of the American Statistical Association, 96(455):870–882, 2001.
[177] Lu B. Propensity score matching with time-dependent covariates. Biometrics, 61(3):721–728, 2005.
[178] Li Y., Schaubel D.E., and He K. Matching methods for obtaining survival functions to estimate the effect of a time-dependent treatment. Statistics in Biosciences, 6(1):105–126, 2014.
[179] Trojano M. and Pellegrini F. Reply. Annals of Neurology, 64(1):110–110, 2008.
[180] Tleyjeh I.M., Ghomrawi H.M.K., Steckelberg J.M., Montori V.M., Hoskin T.L., Enders F., Huskins W.C., Mookadam F., Wilson W.R., Zimmerman V., and Baddour L.M. Propensity score analysis with a time-dependent intervention is an acceptable although not an optimal analytical approach when treatment selection bias and survivor bias coexist. Journal of Clinical Epidemiology, 63(2):139–140, 2010.
[181] Austin P.C. and Platt R.W. Survivor treatment bias, treatment selection bias, and propensity scores in observational research.
Journal of Clinical Epidemiology, 63(2):136–138, 2010.
[182] Kiri V.A. and MacKenzie G. Re: “immortal time bias in pharmacoepidemiology”. American Journal of Epidemiology, 170(5):667–668, 2009.
[183] Sylvestre M.P. and Abrahamowicz M. Comparison of algorithms to generate event times conditional on time-dependent covariates. Statistics in Medicine, 27(14):2618–2634, 2008.
[184] Austin P.C., Mamdani M.M., Van Walraven C., and Tu J.V. Quantifying the impact of survivor treatment bias in observational studies. Journal of Evaluation in Clinical Practice, 12(6):601–612, 2006.
[185] Shintani A.K., Girard T.D., Arbogast P.G., Moons K.G.M., and Ely E.W. Immortal time bias in critical care research: application of time-varying Cox regression for observational cohort studies. Critical Care Medicine, 37(11):2939, 2009.
[186] Liu J., Weinhandl E.D., Gilbertson D.T., Collins A.J., and St Peter W.L. Issues regarding ‘immortal time’ in the analysis of the treatment effects in observational studies. Kidney International, 81(4):341–350, 2011.
[187] Ho A., Dion P.W., Yeung J.H.H., Joynt G.M., Lee A., Ng C.S.H., Chang A., So F.L., and Cheung C.W. Simulation of survivorship bias in observational studies on plasma to red blood cell ratios in massive transfusion for trauma. British Journal of Surgery, 99(S1):132–139, 2012.
[188] Buyse M. and Piedbois P. On the relationship between response to treatment and survival time. Statistics in Medicine, 15(24):2797–2812, 1996.
[189] Beyersmann J., Gastmeier P., Wolkewitz M., and Schumacher M. An easy mathematical proof showed that time-dependent bias inevitably leads to biased effect estimation. Journal of Clinical Epidemiology, 61(12):1216–1221, 2008.
[190] Beyersmann J., Wolkewitz M., and Schumacher M. The impact of time-dependent bias in proportional hazards modelling. Statistics in Medicine, 27(30):6439–6454, 2008.
[191] Gupta S.K. Intention-to-treat concept: A review.
Perspectives in Clinical Research, 2(3):109, 2011.
[192] Wolkewitz M., Allignol A., Schumacher M., and Beyersmann J. Two pitfalls in survival analyses of time-dependent exposure: a case study in a cohort of Oscar nominees. The American Statistician, 64(3):205–211, 2010.
[193] Leffondré K., Abrahamowicz M., and Siemiatycki J. Evaluation of Cox’s model and logistic regression for matched case-control data with time-dependent covariates: a simulation study. Statistics in Medicine, 22(24):3781–3794, 2003.
[194] Cole S.R., Platt R.W., Schisterman E.F., Chu H., Westreich D., Richardson D., and Poole C. Illustrating bias due to conditioning on a collider. International Journal of Epidemiology, 39(2):417–420, 2010.
[195] Gran J.M. Infectious disease modelling and causal inference, 2011. Ph.D. Thesis, University of Oslo.
[196] Røysland K., Gran J.M., Ledergerber B., Wyl V., Young J., and Aalen O.O. Analyzing direct and indirect effects of treatment using dynamic path analysis applied to data from the Swiss HIV cohort study. Statistics in Medicine, 30(24):2947–2958, 2011.
[197] Leemis L.M. Technical note-variate generation for accelerated life and proportional hazards models. Operations Research, 35(6):892–894, 1987.
[198] Bender R., Augustin T., and Blettner M. Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine, 24(11):1713–1723, 2005.
[199] Zhou M. Understanding the Cox regression models with time-change covariates. The American Statistician, 55(2):153–155, 2001.
[200] Leemis L.M., Shih L., and Reynertson K. Variate generation for accelerated life and proportional hazards models with time dependent covariates. Statistics & Probability Letters, 10(4):335–339, 1990.
[201] Shih L. and Leemis L.M. Variate generation for a nonhomogeneous Poisson process with time dependent covariates. Journal of Statistical Computation and Simulation, 44(3-4):165–186, 1993.
[202] Austin P.C.
Generating survival times to simulate Cox proportional hazards models with time-varying covariates. Statistics in Medicine, 31(29):3946–3958, 2012.
[203] Hendry D.J. Data generation for the Cox proportional hazards model with time-dependent covariates: a method for medical researchers. Statistics in Medicine, 33(3):436–454, 2014.
[204] Abrahamowicz M., Mackenzie T., and Esdaile J.M. Time-dependent hazard ratio: modeling and hypothesis testing with application in lupus nephritis. Journal of the American Statistical Association, 91(436):1432–1439, 1996.
[205] Mackenzie T. and Abrahamowicz M. Marginal and hazard ratio specific random data generation: Applications to semi-parametric bootstrapping. Statistics and Computing, 12(3):245–252, 2002.
[206] Abrahamowicz M. and MacKenzie T.A. Joint estimation of time-dependent and non-linear effects of continuous covariates on survival. Statistics in Medicine, 26(2):392–408, 2007.
[207] Sylvestre M. and Abrahamowicz M. Flexible modeling of the cumulative effects of time-dependent exposures on the hazard. Statistics in Medicine, 28(27):3437–3453, 2009.
[208] Mahboubi A., Abrahamowicz M., Giorgi R., Binquet C., Bonithon-Kopp C., and Quantin C. Flexible modeling of the effects of continuous prognostic factors in relative survival. Statistics in Medicine, 30(12):1351–1365, 2011.
[209] Abrahamowicz M., Beauchamp M., and Sylvestre M. Comparison of alternative models for linking drug exposure with adverse effects. Statistics in Medicine, 31(11-12):1014–1030, 2012.
[210] Gauvin H., Lacourt A., and Leffondré K. On the proportional hazards model for occupational and environmental case-control analyses. BMC Medical Research Methodology, 13(1):18, 2013.
[211] Sterne J.A.C., Hernán M.A., Ledergerber B., Tilling K., Weber R., Sendi P., Rickenbach M., Robins J.M., and Egger M. Long-term effectiveness of potent antiretroviral therapy in preventing AIDS and death: a prospective cohort study.
The Lancet, 366(9483):378–384, 2005.
[212] Essebag V., Platt R.W., Abrahamowicz M., and Pilote L. Comparison of nested case-control and survival analysis methodologies for analysis of time-dependent exposure. BMC Medical Research Methodology, 5(1):5, 2005.
[213] Dafni U. Landmark analysis at the 25-year landmark point. Circulation: Cardiovascular Quality and Outcomes, 4(3):363–371, 2011.
[214] Giobbie-Hurder A., Gelber R.D., and Regan M.M. Challenges of guarantee-time bias. Journal of Clinical Oncology, 31(23):2963–2969, 2013.
[215] Austin P.C. and Platt R.W. Author’s response: the design of observational studies-defining baseline time. Journal of Clinical Epidemiology, 63(2):141, 2010.
[216] Wang O., Kilpatrick R.D., Critchlow C.W., Ling X., Bradbury B.D., Gilbertson D.T., Collins A.J., Rothman K.J., and Acquavella J.F. Relationship between epoetin alfa dose and mortality: findings from a marginal structural model. Clinical Journal of the American Society of Nephrology, 5(2):182–188, 2010.
[217] Aalen O.O. Armitage lecture 2010: understanding treatment effects: the value of integrating longitudinal data and survival analysis. Statistics in Medicine, 31(18):1903–1917, 2012.
[218] Howe C.J., Cole S.R., Chmiel J.S., and Muñoz A. Limitation of inverse probability-of-censoring weights in estimating survival in the presence of strong selection bias. American Journal of Epidemiology, 173(5):569–577, 2011.
[219] Wolkewitz M., Beyersmann J., Gastmeier P., and Schumacher M. Efficient risk set sampling when a time-dependent exposure is present. Methods of Information in Medicine, 48:438–43, 2009.
[220] Platt R.W., Schisterman E.F., and Cole S.R. Time-modified confounding. American Journal of Epidemiology, 170(6):687–694, 2009.
[221] Robins J.M. A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect.
Mathematical Modelling, 7(9):1393–1512, 1986.
[222] Diggle P., Heagerty P., Liang K., and Zeger S. Time-dependent covariates. In Analysis of longitudinal data, pages 245–281. Oxford University Press, 2002.
[223] Bembom O. and van der Laan M.J. Statistical methods for analyzing sequentially randomized trials. Journal of the National Cancer Institute, 99(21):1577–1582, 2007.
[224] Van der Wal W.M., Prins M., Lumbreras B., and Geskus R.B. A simple g-computation algorithm to quantify the causal effect of a secondary illness on the progression of a chronic disease. Statistics in Medicine, 28(18):2325–2337, 2009.
[225] Snowden J.M., Rose S., and Mortimer K.M. Implementation of g-computation on a simulated data set: demonstration of a causal inference technique. American Journal of Epidemiology, 2011.
[226] Daniel R.M., Cousens S.N., De Stavola B.L., Kenward M.G., and Sterne J.A.C. Methods for dealing with time-dependent confounding. Statistics in Medicine, 32(9):1584–1618, 2013.
[227] Austin P.C. Using ensemble-based methods for directly estimating causal effects: an investigation of tree-based g-computation. Multivariate Behavioral Research, 47(1):115–135, 2012.
[228] Taubman S.L., Robins J.M., Mittleman M.A., and Hernán M.A. Intervening on risk factors for coronary heart disease: an application of the parametric g-formula. International Journal of Epidemiology, 38(6):1599–1611, 2009.
[229] Daniel R.M., De Stavola B.L., and Cousens S.N. gformula: Estimating causal effects in the presence of time-varying confounding or mediation using the g-computation formula. The Stata Journal, 11(4):479, 2011.
[230] Daniel R.M., De Stavola B.L., and Cousens S.N. Time-varying confounding: some practical considerations in a likelihood framework. In Causality: statistical perspectives and applications, pages 234–252. John Wiley & Sons, 2012.
[231] Westreich D., Cole S.R., Young J.G., Palella F., Tien P.C., Kingsley L., Gange S.J., and Hernán M.A.
The parametric g-formula to estimate the effect of highly active antiretroviral therapy on incident AIDS or death. Statistics in Medicine, 31(18):2000–2009, 2012.
[232] Cole S.R., Richardson D.B., Chu H., and Naimi A.I. Analysis of occupational asbestos exposure and lung cancer mortality using the g formula. American Journal of Epidemiology, 177(9):989–996, 2013.
[233] Garcia-Aymerich J., Varraso R., Danaei G., Camargo C.A., and Hernán M.A. Incidence of adult-onset asthma after hypothetical interventions on body mass index and physical activity: An application of the parametric g-formula. American Journal of Epidemiology, 179(1):20–26, 2014.
[234] Cain L.E., Robins J.M., Lanoy E., Logan R., Costagliola D., and Hernán M.A. When to start treatment? A systematic approach to the comparison of dynamic regimes using observational data. The International Journal of Biostatistics, 6(2), 2010.
[235] Cain L.E., Logan R., Robins J.M., Sterne J.A., Sabin C., Bansi L., Justice A., Goulet J., van Sighem A., de Wolf F., et al. When to initiate combined antiretroviral therapy to reduce rates of mortality and AIDS in HIV-infected individuals in developed countries. Annals of Internal Medicine, 154(8):509–515, 2011.
[236] Ewings F.M., Ford D., Walker A.S., Carpenter J., and Copas A. Optimal CD4 Count for Initiating HIV Treatment: Impact of CD4 Observation Frequency and Grace Periods, and Performance of Dynamic Marginal Structural Models. Epidemiology, 25(2):194–202, 2014.
[237] Young J.G., Cain L.E., Robins J.M., O’Reilly E.J., and Hernán M.A. Comparative effectiveness of dynamic treatment regimes: an application of the parametric g-formula. Statistics in Biosciences, 3(1):119–143, 2011.
[238] Schomaker M., Egger M., Ndirangu J., Phiri S., Moultrie H., Technau K., Cox V., Giddy J., Chimbetete C., and Wood R.
When to start antiretroviral therapy in children aged 2–5 years: A collaborative causal modelling analysis of cohort studies from southern Africa. PLoS Medicine, 10(11):e1001555, 2013.
[239] Simon J.H., Jacobs L.D., Campion M., Wende K., Simonian N., Cookfair D.L., Rudick R., Herndon R., Richert J., and Salazar A. The Multiple Sclerosis Collaborative Research Group. Magnetic resonance studies of intramuscular interferon beta-1a for relapsing multiple sclerosis. Annals of Neurology, 43(1):79–87, 1998.
[240] Gill R.D. Understanding Cox’s regression model: a martingale approach. Journal of the American Statistical Association, 79(386):441–447, 1984.
[241] Therneau T.M. Extending the Cox Model. Technical report, Section of Biostatistics, Mayo Clinic, Rochester, 1998. URL http://mayoresearch.mayo.edu/mayo/research/biostat/upload/58.pdf, Last accessed: June-05-2014.
[242] Cole S.R., Hudgens M.G., Tien P.C., Anastos K., Kingsley L., Chmiel J.S., and Jacobson L.P. Marginal structural models for case-cohort study designs to estimate the association of antiretroviral therapy initiation with incident AIDS or death. American Journal of Epidemiology, 175(5):381–390, 2012.
[243] Howe C.J., Cole S.R., Mehta S.H., and Kirk G.D. Estimating the effects of multiple time-varying exposures using joint marginal structural models: alcohol consumption, injection drug use, and HIV acquisition. Epidemiology, 23(4):574–582, 2012.
[244] Cole S.R., Jacobson L.P., Tien P.C., Kingsley L., Chmiel J.S., and Anastos K. Using marginal structural measurement-error models to estimate the long-term effect of antiretroviral therapy on incident AIDS or death. American Journal of Epidemiology, 171(1):113–122, 2010.
[245] Choi H.K., Hernán M.A., Seeger J.D., Robins J.M., and Wolfe F. Methotrexate and mortality in patients with rheumatoid arthritis: a prospective study. The Lancet, 359(9313):1173–1177, 2002.
[246] Horvitz D.G. and Thompson D.J.
A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663–685, 1952.
[247] Coffman D.L., Caldwell L.L., and Smith E.A. Introducing the at-risk average causal effect with application to HealthWise South Africa. Prevention Science, 13(4):437–447, 2012.
[248] Lumley T. survey: Analysis of complex survey samples, 2011. R package version 3.26.
[249] Therneau T. A Package for Survival Analysis in S, 2014. URL http://CRAN.R-project.org/package=survival. R package version 2.37-7, Last accessed: Sep-15, 2014.
[250] Curtis L.H., Hammill B.G., Eisenstein E.L., Kramer J.M., and Anstrom K.J. Using inverse probability-weighted estimators in comparative effectiveness analyses with observational databases. Medical Care, 45(10):S103–S107, 2007.
[251] Cole S.R., Hernán M.A., Margolick J.B., Cohen M.H., and Robins J.M. Marginal structural models for estimating the effect of highly active antiretroviral therapy initiation on CD4 cell count. American Journal of Epidemiology, 162(5):471–478, 2005.
[252] Rothman K.J. and Suissa S. Exclusion of immortal person-time. Pharmacoepidemiology and Drug Safety, 17(10):1036–1036, 2008.
[253] Therneau T.M. Modeling survival data: extending the Cox model. Springer, 2000.
[254] Sylvestre M., Edens T., MacKenzie T., and Abrahamowicz M. Package ‘PermAlgo’, 2010. URL http://cran.r-project.org/web/packages/PermAlgo/. Last accessed: Sep-10-2014.

Appendix A

Appendix for Chapter 2

A.1 Rationale Behind Hypothesizing that Cumulative Relapses are Lying on the Causal Path of β-IFN and Disability Progression

The exact mechanism of action of the β-IFN drugs in MS has never been fully established, which is one reason why estimating the effect of these drugs in MS is not straightforward.
In the absence of randomization, establishing a causal link between drug exposure and outcome requires subject-specific knowledge and careful implementation of that knowledge in the analysis. Suggesting a plausible causal path is the first step.

Relapsing-remitting patients experience relapses followed by periods of remission in which partial or complete recovery occurs. Based on the results from randomized, double-blind, placebo-controlled studies, β-IFN treatments reduced the severity and frequency of relapses [2, 4–6, 239] and hence increased the period between relapses [5]. Consequently, a patient has more time to recover from the residual disability left by the past relapse. This extended period of relapse-free time due to β-IFN exposure may eventually contribute to a slower progression of disability [4, 5, 239]. However, it should be noted that while most natural history studies indicate that in the long term there is minimal or no association between relapse rates and disability progression, a specific window of opportunity for relapses to contribute to disease progression may exist [107, 108].

Therefore, we hypothesized that within a short time interval the cumulative relapses act as an intermediate variable for the treatment and disability progression relationship, i.e., the relapse frequency is influenced by prior β-IFN treatment and a greater (lesser) relapse frequency will result in faster (slower) disability progression. Also, we assume that the cumulative relapse count in the previous time period is a confounder that may dictate the treatment choice in subsequent time periods. Furthermore, experiencing an increased number of cumulative relapses after initiating treatment will increase the probability of discontinuing treatment [100].
Hence, in this relationship, cumulative relapse is treated both as an intermediate variable and a confounder.

The causal path described above could be considered rather simplistic. It is possible that cumulative relapse and disability progression have an unmeasured common cause (for example, low serum vitamin D levels). Should these data be available, we would add that variable to the causal path between cumulative relapse and EDSS. Cumulative relapse would still be a time-dependent confounder and would need to be adjusted for accordingly.

A.2 Rationale Behind Using Marginal Structural Cox Model (MSCM) Instead of a Cox Model

For a longitudinal study with $N$ patients, let $i = 1, 2, \ldots, N$ be the patient index, $t = 0, 1, \ldots, T_i$ months be the follow-up time index, $A_{it}$ be the binary treatment status at month $t$ (1 = treated, 0 = untreated), and $L_{i0}$ be the baseline covariates of patient $i$. One possible model would express the hazard function of the time-dependent Cox model as follows:

$$\lambda_i(t \mid L_{i0}) = \lambda_{0t} \exp(\beta_1 A_{it} + \beta_2 L_{i0}), \tag{A.1}$$

where $\lambda_{0t}$ is the unspecified baseline hazard function, $\beta_2$ is the vector of log hazard ratios (HRs) for the baseline covariates and $\beta_1$ is the log HR of the current β-IFN status ($A_{it}$).

Assuming no tied event times, we estimate $\beta = (\beta_1, \beta_2)$ by maximizing the partial likelihood [240]:

$$PL(\beta) = \prod_{i=1}^{N} \prod_{t=0}^{T_i} \left( \frac{Y_{it} \exp(\beta_1 A_{it} + \beta_2 L_{i0})}{\sum_{k=1}^{N} Y_{kt} \exp(\beta_1 A_{kt} + \beta_2 L_{k0})} \right)^{dN_{it}},$$

where $Y_{it}$ denotes whether patient $i$ belongs to the risk set at time $t$, $N_{it}$ is the number of events in the interval $[0, t]$ and $dN_{it}$ denotes the number of new events for patient $i$ at month $t$ (increment from month $t - 1$, if any). This setting is more general than our case, where $N_{it} \le 1$ and $dN_{it} = 1$ for at most 1 month.

However, ignoring the time-dependent confounder $L_{it}$ (i.e., an intermediate variable lying in the causal pathway of the treatment and the outcome) may lead to a biased estimate of $\beta$.
Simply including this variable in the Cox model as a covariate,

$$\lambda_i(t \mid L_{i0}, L_{it}) = \lambda_{0t} \exp(\beta_1 A_{it} + \beta_2 L_{i0} + \beta_3 L_{it}), \tag{A.2}$$

may still produce a biased estimate if $L_{it}$ is influenced by past exposure [50].

Inverse probability of treatment and censoring weights (IPTC; say $w$, $sw$, $w^{(n)}$, $sw^{(n)}$) are person-time specific measures of the degree to which a time-dependent variable confounds the treatment selection and censoring processes. These are used in the time-dependent Cox model to weight the contribution of each person-time observation so that confounding due to $L_{it}$ is removed without changing the target parameter. In this way, MSCM facilitates correction for time-dependent confounding. In the MSCM, these IPTC weights are inserted in the partial likelihood function as follows [241–243]:

$$PL_w(\beta) = \prod_{i=1}^{N} \prod_{t=0}^{T_i} \left( \frac{Y_{it} \exp(\beta_1 A_{it} + \beta_2 L_{i0})}{\sum_{k=1}^{N} Y_{kt} w_{kt} \exp(\beta_1 A_{kt} + \beta_2 L_{k0})} \right)^{dN_{it} \times w_{it}}.$$

The gradient with respect to the parameter vector $\beta$ of the log of the weighted partial likelihood $PL_w(\beta)$ yields the score function $U_w(\beta)$. Equating $U_w(\beta)$ to zero yields a set of estimating equations that can be solved using an iterative method such as the Newton-Raphson algorithm or a penalized partial likelihood approach.

A.3 Approximation of the Marginal Structural Cox Model

Let $D_t$ be an indicator of reaching EDSS 6 for the first time between the months $t - 1$ and $t$. The data for patients who did not reach sustained EDSS 6 and remained uncensored until follow-up month $t$ can be modelled using the pooled logistic regression (logistic regression pooled over persons and times):

$$\operatorname{logit} \Pr(D_{i,t} = 1 \mid D_{i,t-1} = 0, A_{it}, L_{i0}) = \gamma_0(t) + \gamma_1 A_{it} + \gamma_2 L_{i0}. \tag{A.3}$$

Here $\gamma_0(t)$ is a smooth function of the month index $t$, represented as a restricted cubic spline, which is often used to reduce weight variability. Just as for cubic polynomial regression, use of a restricted cubic spline forces the relationship to be smooth even on the edges [117, chapter 6]; see the R code in the Appendix §A.5.
The log OR of the current β-IFN status in this pooled logistic regression, $\gamma_1$, is generally a good approximation of the corresponding log hazard ratio obtained from the time-dependent Cox model ($\beta_1$), provided that censoring is ignorable [244] and relatively short intervals are chosen so that the probability of outcome occurrence in each time interval is small [118, 119]. The corresponding likelihood function can be expressed as:

$$L(\gamma) = \prod_{i=1}^{N} \prod_{t=0}^{T_i} p_{it}^{D_{it}} (1 - p_{it})^{(1 - D_{it})},$$

where $\gamma = (\gamma_0, \gamma_1, \gamma_2)$ and $\operatorname{logit}(p_{it}) = \gamma_0(t) + \gamma_1 A_{it} + \gamma_2 L_{i0}$.

Hernán et al. [50] suggested use of weighted pooled logistic regression to approximate MSCM (IPTC weighted time-dependent Cox model) estimates of treatment effect ($\beta_1$), and others have followed this suggestion [69, 112, 159, 211, 245]. The weighted likelihood function is then written as [244]:

$$L_w(\gamma) = \prod_{i=1}^{N} \prod_{t=0}^{T_i} \left( p_{it}^{D_{it}} (1 - p_{it})^{(1 - D_{it})} \right)^{w_{it}}.$$

This approximate approach was suggested mainly because software available at that time was unable to handle patient-specific time-varying weights in a Cox model. It has been noted that this approximation approach is inadequate when the event is not rare [56]. Subsequently Xiao et al. [57] suggested the direct use of the Cox model weighted by IPTC weights to overcome this limitation. Through simulation, these authors also showed that direct use of the Cox model weighted by IPTC weights instead of any approximate MSCM approach [50] considerably reduced the variability of the estimated treatment effect, even when both methods use the same weights.

A.4 Weight Models Used in the Data Analysis

The stabilized IPT weights for patient $i$ at month $t$ are expressed as:

$$sw^T_{it} = \prod_{j=0}^{t} \frac{\Pr(A_{ij} = a_{ij} \mid \bar{A}_{i,j-1} = \bar{a}_{i,j-1}, L_{i0} = l_{i0})}{\Pr(A_{ij} = a_{ij} \mid \bar{A}_{i,j-1} = \bar{a}_{i,j-1}, L_{i0} = l_{i0}, \bar{L}_{ij} = \bar{l}_{ij})}. \tag{A.4}$$

The probability appearing in the numerator of $sw^T$ is modeled using a pooled logistic model as follows:

$$\operatorname{logit} \Pr(A_{ij} = 1 \mid \bar{A}_{i,j-1}, L_{i0}) = \alpha_0(j) + \alpha_1 A_{i,j-1} + \alpha_2 L_{i0}, \tag{A.5}$$

where treatment status at the previous time interval ($A_{j-1}$; $A_{-1} = 0$ for all patients), the baseline covariates ($L_0$; in our application, EDSS, age, disease duration, sex) and a restricted cubic spline of the follow-up month index are included as predictors. These covariates, as well as the time-varying confounder cumulative relapse ($L_{ij}$) and its interaction with prior treatment status, are included in the denominator model:

$$\operatorname{logit} \Pr(A_{ij} = 1 \mid \bar{A}_{i,j-1}, L_{i0}, \bar{L}_{ij}) = \alpha_0(j) + \alpha_1 A_{i,j-1} + \alpha_2 L_{i0} + \alpha_3 L_{ij} + \alpha_{13} A_{i,j-1} L_{ij}. \tag{A.6}$$

The output of this fit is reported in Appendix-Table A.1.

Table A.1: Estimated coefficients from the treatment model (denominator of $sw^T_{it}$) for patients with relapsing-onset multiple sclerosis (MS), British Columbia, Canada (1995-2008)

                                     Estimate   z-value   p-value
β-IFN_{j−1}                              9.78    102.92   < 0.001
EDSS†                                    0.12      4.31   < 0.001
Age†‡                                   -0.07     -1.70      0.09
Disease duration†‡                      -0.17     -3.17   < 0.001
Sex†                                    -0.07     -0.96      0.34
Cumulative relapse                       0.34      7.70   < 0.001
Cumulative relapse:β-IFN_{j−1}          -0.55    -10.83   < 0.001

EDSS, expanded disability status scale.
∗ Time index is also fitted with restricted cubic spline, but the corresponding coefficients are not reported in the table.
† Baseline covariates ($L_0$).
‡ Expressed in decades.

The predicted value from the (denominator) model (A.6) yields the estimated probability of the patient’s treatment status in that month $t$. Since the exposure status may vary from one time point to another, first we estimate the probability of the observed treatment status at each time point, and then obtain the probability of the observed exposure sequence of a given patient by multiplying the corresponding probabilities. The numerator of $sw^T_{it}$ is estimated in a similar fashion from model (A.5), where $\bar{L}_{ij}$ is not included as a predictor.
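As a rough illustration of models (A.5) and (A.6), the following R sketch builds stabilized and normalized IPT weights on simulated person-month data. The data-generating step, all object and column names (d, L0, Aprev, L, sw, sw.n) and the use of splines::ns as the spline basis are assumptions made for this example; it is not the analysis code behind Table A.1.

```r
# Sketch with simulated data; every name below is illustrative, not from the thesis.
library(splines)  # ns(): natural (restricted) cubic spline basis, shipped with R

set.seed(1)
n.id <- 200
n.month <- 12
# One row per person-month ('long' format); month varies fastest within id.
d <- expand.grid(month = 0:(n.month - 1), id = 1:n.id)
d$L0 <- rnorm(n.id)[d$id]   # baseline covariate L_i0
d$A <- 0                    # treatment status A_ij
d$Aprev <- 0                # lagged treatment A_i,j-1 (A_i,-1 = 0)
d$L <- 0                    # cumulative relapse count L_ij

for (i in 1:n.id) {
  a.prev <- 0
  l.cum <- 0
  for (r in which(d$id == i)) {
    # Relapses are less likely after treatment; treatment depends on relapses.
    l.cum <- l.cum + rbinom(1, 1, plogis(-2 - 0.5 * a.prev))
    a <- rbinom(1, 1, plogis(-2 + 3 * a.prev + 0.3 * d$L0[r] + 0.4 * l.cum))
    d$L[r] <- l.cum
    d$Aprev[r] <- a.prev
    d$A[r] <- a
    a.prev <- a
  }
}

# Numerator model (A.5) and denominator model (A.6), pooled over person-months.
num.fit <- glm(A ~ Aprev + L0 + ns(month, 3), family = binomial, data = d)
den.fit <- glm(A ~ Aprev * L + L0 + ns(month, 3), family = binomial, data = d)

# Probability of the treatment status actually observed in each person-month.
p.num <- ifelse(d$A == 1, fitted(num.fit), 1 - fitted(num.fit))
p.den <- ifelse(d$A == 1, fitted(den.fit), 1 - fitted(den.fit))

# Stabilized IPT weight (A.4): cumulative product over months within patient.
d$sw <- ave(p.num / p.den, d$id, FUN = cumprod)

# Normalized weight: divide by the mean weight of that month's risk set
# (here every patient is at risk in every month, since nobody is censored).
d$sw.n <- d$sw / ave(d$sw, d$month)
```

Censoring weights could be built in the same way by replacing the treatment indicator with an indicator of remaining uncensored, and the two sets of weights multiplied row by row.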
Dividing the numerator model probabilities of the patient’s observed treatment status $a_{ij}$ (either 0 or 1) by the corresponding denominator model probabilities yields the estimated IPT weights $sw^T_{it}$ that account for the confounding due to $\bar{L}_{ij}$, provided the required assumptions are met.

To estimate the IPTC weights $sw_{it} = sw^T_{it} \times sw^C_{it}$, the inverse probability of censoring (IPC) weights $sw^C_{it}$ are estimated in the same fashion. In order to produce the normalized IPTC weights $sw^{(n)}$, each weight $sw$ is divided by its risk set’s mean weight.

A.5 MSCM fitting in R

For time-dependent survival analysis, all person-time observations are pooled to make an augmented dataset. Short intervals, such as months, are chosen so that the most recently observed changes of the time-varying variables can be updated in a new row in the dataset to reflect the patient’s time-varying status with respect to covariates, censoring and response. In the longitudinal analysis literature, this is referred to as the ‘long’ format.

Guidelines regarding IPTC weight calculations in R are available in the literature [159]. These IPTC weights can be viewed as a generalization of the Horvitz-Thompson estimator [141, 246, 247]. Recently, due to the availability of packages for the analysis of complex surveys in standard software (SAS, Stata and R), it is possible to fit the time-dependent IPTC weighted Cox model directly or via approximation, say, using the weighted pooled logistic model. In all the model choices, reliable SEs can be obtained from a reasonable number of patient-specific bootstrap samples.

• Most MSCM analyses in the literature use weighted pooled logistic regression to approximate the IPTC weighted Cox model fit. In R, performing weighted pooled logistic regression using the glm function from base R (with logit link) is straightforward [159].

• Similarly, the svyglm function from the survey package can be used to implement the (weighted) pooled logistic model [247].

• With data organized in person-month format, to perform survival analysis using the weighted Cox model, we used the Andersen-Gill counting process approach as implemented in the svycoxph function from the R package survey [248], with the weights supplied through the survey design. Approximation via complementary-log-log and Poisson models can also be implemented using the same package. A sample code follows:

    library(survey)   # svydesign, svycoxph, svyglm
    library(rms)      # rcs: restricted cubic spline
    # long.format: one row per person-month;
    # covariate.list stands for the adjustment covariates
    (weighted.design <- svydesign(ids = ~ID, data = long.format,
                                  weights = ~normalized.stabilized.weight))
    # IPTC weighted Cox model (counting process format)
    svycoxph(Surv(start, stop, event) ~ drug + covariate.list,
             design = weighted.design)
    # weighted pooled logistic approximation
    svyglm(event ~ drug + rcs(Time) + covariate.list,
           family = binomial(link = logit), design = weighted.design)
    # complementary log-log approximation
    svyglm(event ~ drug + rcs(Time) + covariate.list,
           family = binomial(link = cloglog), design = weighted.design)
    # Poisson approximation with a person-time offset
    svyglm(event ~ offset(log(stop - start)) + drug + rcs(Time) +
           covariate.list, family = poisson(), design = weighted.design)

• Alternatively, the coxph function from the survival package [249] can be used to fit the weighted Cox model [57]. To handle correlated observations, the cluster option must be specified to identify the person-month observations from the same patient. Robust SEs are obtained by specifying the option robust = TRUE.
Exclusion Criteria and Summary of Selected CohortsTable A.2: Characteristics of the selected cohort of patients with relapsing-onset multiple sclerosis (MS), British Columbia, Canada (1995-2008).Characteristics β-IFN β-IFNexposed patients unexposed patientsFrequency 868 829Women, n (%) 660 (76.0) 637 (76.8)Disease duration (at baseline) 5.8†( 6.6‡) 8.3†( 8.5‡)Age (at baseline) 38.1†( 9.2‡) 41.3†( 10.0‡)EDSS score (at baseline) 2.0§( 0-6.5¶) 2.0§( 0-6.5¶)Relapse rate / year (over the2 years prior to baseline)#0.5§( 0-1.2#) 0.5§( 0-1.0#)Person-years exposed to β-IFN treatment2,530 0Person-years not exposed toβ-IFN treatment1,400 2,960† Mean.‡ Standard deviation.§ Median.¶ Range.# IQR.A.6 Exclusion Criteria and Summary of SelectedCohortsIn total, 2,671 patients met the eligibility criteria to receive β-IFN treat-ment between July 1995 and December 2004 [92]. Of these, patients whowere exposed to a non-β-IFN immunomodulatory drug, a cytotoxic immuno-suppressant for MS (n = 172), or an MS clinical trial (n = 21) prior tobaseline were excluded from the analysis. If the exposure occurred afterbaseline, data were censored at at the start of the exposure of the non--IFN treatment. Other exclusion criteria included unknown MS onset date(n = 10), insufficient EDSS measurements (n = 436), reaching of the out-come (n = 218) or the secondary progressive stage before the eligibility date(n = 217). Some patients met multiple exclusion criteria. As a result, 1, 697155A.7. Sensitivity Analysespatients were selected. A summary of their characteristics are reported inTable A.2.A.7 Sensitivity AnalysesA.7.1 Sensitivity Analysis: Impact of Weight TrimmingIf the weights contain extreme values, one should be concerned about thepositivity assumption. The MSCM approach is built on the counterfactualframework and it is necessary to assume patients could choose treatmentexposure or non-exposure at any time point. 
If a group of patients with similar covariate history rarely or never receive treatment, then the estimated probability of being treated would be close to zero. Conversely, if a group of patients with similar covariate history almost always or always receive treatment, then the estimated probability of being treated would be close to one. The corresponding fitted probabilities will then be close to zero or one, resulting in very large or very small inverse probability weights, respectively. This may produce unstable estimates from the MSCM.

As a sensitivity analysis, one could restrict the analysis to the subset of patients whose probabilities of treatment and censoring are reasonably removed from 0 and 1 at every time point. This procedure is known as trimming [120]. As with truncation of the weights, systematically excluding such patients may produce a biased estimate. Also, the results may lack generalizability due to this restriction. However, since the patients with extreme weights are removed, a relatively stable point estimate with a narrower CI is expected.

After estimating the fitted probabilities from the weight models, if the probabilities are such that a few person-time observations contribute too much to the pseudo-population, the estimate of the causal effect may become unstable. In our sensitivity analysis, we removed the patients with at least one fitted value either greater than 0.95 or less than 0.05 (represented more than 20 times in the pseudo-population). This left 1,603 patients, with 133 reaching the outcome. The MSCM using $sw^{(n)}$ led to an HR estimate of 1.33 with a 95% bootstrap CI of 0.94−1.89.
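The trimming rule described above can be sketched in R as follows. This is only an illustration on a toy data frame: the object `long` and the column names `id` and `p.fit` are assumptions made here and are not taken from the thesis analysis.

```r
# Hypothetical person-month data: 'id' identifies the patient and 'p.fit'
# is the fitted probability from a weight model (names are assumptions).
long <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3),
  p.fit = c(0.40, 0.55, 0.60, 0.97, 0.80, 0.10, 0.03, 0.20)
)

# Flag patients with at least one fitted value below 0.05 or above 0.95.
extreme  <- with(long, tapply(p.fit < 0.05 | p.fit > 0.95, id, any))
drop.ids <- names(extreme)[extreme]

# Trimmed dataset: every person-month of a flagged patient is removed.
trimmed <- long[!(long$id %in% drop.ids), ]
```

Note that the whole patient is dropped, not just the offending person-month, which matches the description of removing "patients with at least one fitted value" outside the chosen bounds.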
The conclusion regarding the treatment effect of β-IFN on time to sustained EDSS 6 from these results remained the same.

A.7.2 Sensitivity Analysis: Impact of More Restrictive Eligibility Criteria

As another sensitivity analysis, a more restricted study sample was selected by defining active disease (two or more documented relapses during the two years prior to baseline) as part of the eligibility criteria, while also including all the previous criteria. This left 747 patients in the study with 3,028 person-years of follow-up and 1,460 person-years of β-IFN exposure. Only 52 of these patients reached the irreversible disease outcome.

Table A.3: The marginal structural Cox model (MSCM) fit with the normalized stabilized IPTC weights $sw^{(n)}$ for time to sustained EDSS 6 to estimate the causal effect of β-IFN treatment for patients with relapsing-onset multiple sclerosis (MS), British Columbia, Canada (1995-2008) selected by more restrictive eligibility criteria. The model was also adjusted for baseline covariates EDSS, age, disease duration and sex.

| Covariate | Estimate∗ | HR† | 95% CI‡ |
|---|---|---|---|
| β-IFN | 0.18 | 1.19 | 0.68 - 2.11 |
| EDSS | 0.40 | 1.48 | 1.24 - 1.77§ |
| Disease duration# | -0.14 | 0.87 | 0.55 - 1.37 |
| Age# | 0.45 | 1.57 | 1.14 - 2.18§ |
| Sex¶ | -0.33 | 0.72 | 0.38 - 1.35 |

HR, Hazard ratio; CI, confidence interval; EDSS, expanded disability status scale.
∗ Estimated log HR: negative value is indicative of a beneficial effect and positive value is indicative of a harmful effect.
† HR, indicating the instantaneous risk of reaching sustained and confirmed EDSS 6.
‡ Based on 500 nonparametric bootstrap sample estimates.
§ 95% CI that does not include 1.
# Expressed in decades.
¶ Reference level: Male.

The model fit is reported in Appendix-Table A.3. The regression coefficients and HR estimates were qualitatively similar to those reported in Table 2. The CIs from this restricted dataset were wider due to the smaller sample size.
Still, the conclusion regarding the treatment effect of β-IFN on time to sustained EDSS 6 remained the same as before.

A.7.3 Sensitivity Analysis: Impact of the Cumulative Exposure to β-IFN

Table A.4: The marginal structural Cox model (MSCM) fit with the normalized stabilized IPTC weights $sw^{(n)}$ for time to sustained EDSS 6 to estimate the causal effect of cumulative exposure to β-IFN over the last two years for patients with relapsing-onset multiple sclerosis (MS), British Columbia, Canada (1995-2008). The model was also adjusted for baseline covariates EDSS, age, disease duration and sex.

| Covariate | Estimate | HR† | 95% CI‡ |
|---|---|---|---|
| Cumulative β-IFN∗ | 0.53 | 1.70 | 0.64 - 4.53 |
| EDSS | 0.54 | 1.71 | 1.53 - 1.91§ |
| Disease duration# | -0.20 | 0.82 | 0.66 - 1.10 |
| Age# | 0.30 | 1.34 | 1.10 - 1.63§ |
| Sex¶ | -0.23 | 0.79 | 0.55 - 1.15 |

HR, Hazard ratio; CI, confidence interval; EDSS, expanded disability status scale.
∗ Expressed as proportion of months exposed over last two years.
† HR, indicating the instantaneous risk of reaching sustained and confirmed EDSS 6.
‡ Based on 500 nonparametric bootstrap sample estimates.
§ 95% CI that does not include 1.
# Expressed in decades.
¶ Reference level: Male.

We also assessed the impact of the cumulative exposure to β-IFN (proportion of months exposed) over the last two years on time to sustained EDSS 6. The model fit is reported in Appendix-Table A.4. This analysis also failed to detect a significant association between the cumulative exposure to β-IFN and the hazard of reaching sustained EDSS 6.
A similar finding was observed when the cumulative exposure was restricted to the past one year only (data not shown).

A.7.4 Sensitivity Analysis: Impact of the Cumulative Number of Relapses in the Last Year

Table A.5: The marginal structural Cox model (MSCM) fit with the normalized stabilized IPTC weights $sw^{(n)}$ for time to sustained EDSS 6 to estimate the causal effect of cumulative exposure to β-IFN over the last two years for patients with relapsing-onset multiple sclerosis (MS), British Columbia, Canada (1995-2008) while considering the cumulative number of relapses in the last year as the time-varying confounder. The model was also adjusted for baseline covariates EDSS, age, disease duration and sex.

| Covariate | Estimate | HR† | 95% CI‡ |
|---|---|---|---|
| β-IFN∗ | 0.31 | 1.36 | 0.96 - 1.92 |
| EDSS | 0.54 | 1.72 | 1.54 - 1.92§ |
| Disease duration# | -0.18 | 0.82 | 0.66 - 1.04 |
| Age# | 0.28 | 1.32 | 1.10 - 1.60§ |
| Sex¶ | -0.22 | 0.80 | 0.55 - 1.16 |

HR, Hazard ratio; CI, confidence interval; EDSS, expanded disability status scale.
∗ Expressed as proportion of months exposed over last two years.
† HR, indicating the instantaneous risk of reaching sustained and confirmed EDSS 6.
‡ Based on 500 nonparametric bootstrap sample estimates.
§ 95% CI that does not include 1.
# Expressed in decades.
¶ Reference level: Male.

We also assessed the impact of the exposure to β-IFN on time to sustained EDSS 6 while considering the cumulative number of relapses in the last year (instead of the last two years) as the time-varying confounder. The model fit is reported in Appendix-Table A.5. This analysis also failed to detect a significant association between the exposure to β-IFN and the hazard of reaching sustained EDSS 6.

Appendix B

Appendix for Chapter 3

B.1 Propensity Scores

A confounder is a factor that affects both the treatment decision and the study outcome.
Propensity score techniques facilitate simultaneous adjustment for multiple confounders in observational or non-experimental settings. The propensity score $p_i$ is defined as a subject's probability of receiving a treatment ($A_{i0} = 1$) conditional on a number of covariates ($L_{i0} = l_{i0}$) present at baseline [20]. Under the assumption of no unmeasured confounding, the treated and untreated subjects with the same propensity score will have identical distributions of baseline confounders. To balance the covariate distribution, treated and untreated subjects are selected by matching on the estimated propensity scores or stratifying on the basis of the estimated propensity score quantiles. If covariate balance is lacking in the original sample, excluding the subjects without overlapping propensity scores can restore the balance. That is why the propensity score is known as a balancing score.

Propensity scores can be used to determine the inverse probability weights (IPW) that are inversely proportional to the probability of the observed exposure status, conditional on confounders. These weights can incorporate not only the baseline covariates, but also the covariates that include post-baseline values. That is, if $p_i$ is the propensity score, then $1/p_i$ is the weight for the exposed subject and $1/(1 - p_i)$ is the weight for the unexposed subject. IPW-based estimators can be generalized to multiple exposure categories, to accommodate survival or censored data and to incorporate time-dependent exposure and covariates [42, 250]. However, the ability to deal with complex problems comes at a price. The effect estimates from IPW methods can be unstable and the estimated variance needs to account for the weighted (pseudo) data.
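As an illustration of the weights $1/p_i$ and $1/(1 - p_i)$ just described, the following minimal R sketch simulates a single binary baseline confounder and checks covariate balance in the weighted pseudo-population. The simulated data, the coefficient values and all object names (`L0`, `A0`, `ps.fit`, `ipw`, `bal`) are assumptions chosen for illustration only.

```r
set.seed(1)
n  <- 500
L0 <- rbinom(n, 1, 0.5)                      # simulated baseline confounder
A0 <- rbinom(n, 1, plogis(-0.5 + 1.0 * L0))  # treatment depends on L0

# Propensity score Pr(A0 = 1 | L0), estimated by logistic regression.
ps.fit <- glm(A0 ~ L0, family = binomial(link = "logit"))
p.hat  <- fitted(ps.fit)

# Inverse probability weights: 1/p for exposed, 1/(1 - p) for unexposed.
ipw <- ifelse(A0 == 1, 1 / p.hat, 1 / (1 - p.hat))

# Weighted mean of L0 within each treatment arm; after weighting, both
# arms should reproduce the overall mean of L0 (covariate balance).
bal <- tapply(ipw * L0, A0, sum) / tapply(ipw, A0, sum)
```

Because the propensity model here is saturated in the single binary covariate, the weighted arm-specific means of `L0` coincide with `mean(L0)` up to numerical tolerance; with richer covariate sets, balance would hold only approximately and would need to be checked.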
Methods exist for stabilizing the weights (see Appendix B.3) and robust variance estimation methods can be used to account for weighted data [50].

B.2 Model Specification in MSCM

Based on the notation described in the text (see §3.2), in the presence of baseline covariates $L_{i0}$, the hazard function can be expressed as the following time-dependent Cox model:

\[
\lambda_{i,\bar{A}_m}(m \mid L_{i0}) = \lambda_{\bar{0}}(m) \exp\big(\gamma(m, \bar{A}_m, \psi)\big), \tag{B.1}
\]

where $m$ is the visit index, $\lambda_{\bar{0}}(m)$ is the unspecified baseline hazard function, $\psi = (\psi_1, \psi_2)$, $\psi_1$ is the log HR of the current treatment status ($A_{im}$), and $\psi_2$ is the vector of log hazard ratios (HRs) for the baseline covariates. Specifying the model for $\gamma(m, \bar{A}_m, \psi)$ yields:

\[
\lambda_{i,\bar{A}_m}(m \mid L_{i0}) = \lambda_{\bar{0}}(m) \exp(\psi_1 A_{im} + \psi_2 L_{i0}), \tag{B.2}
\]

where the impact of treatment is modelled based on only the current exposure $A_m$ (i.e., the dependence on $\bar{A}_m$ is modelled only through the current exposure $A_m$) [50].

In the presence of a time-dependent confounder $L_{im}$, we may be tempted to expand the above Cox model to:

\[
\lambda_{i,\bar{A}_m}(m \mid L_{i0}, L_{im}) = \lambda_{\bar{0}}(m) \exp(\psi_1 A_{im} + \psi_2 L_{i0} + \psi_3 L_{im}),
\]

which could still produce a biased estimate of $\psi_1$ if $L_{im}$ is influenced by past exposure [50]. Nonetheless, as $L_{im}$ is a confounder, we still need to adjust for the confounding due to $L_{im}$ in some way. IPWs are person-time specific measures of the degree to which $L_{im}$ confounds the treatment selection process.
Therefore, in MSCM, IPWs are used in the time-dependent Cox model formulation (equation (B.2)) to weight the contribution of each person-time observation so that the confounding due to $L_{im}$ is removed.

B.3 Model Specifications for Estimating the Weights

The unstabilized IPWs for subject $i$ at month $m$ are expressed as:

\[
w_{im} = \prod_{j=0}^{m} \frac{1}{\Pr(A_{ij} = a_{ij} \mid \bar{A}_{i,j-1} = \bar{a}_{i,j-1}, L_{i0} = l_{i0}, \bar{L}_{ij} = \bar{l}_{ij})}. \tag{B.3}
\]

As discussed in § A.4, we can estimate the probabilities in equation (B.3) by building a pooled logistic regression model for current treatment status ($A_j$) with the following covariates: treatment status at the previous time interval ($A_{j-1}$), the baseline covariates ($L_0$), the follow-up time index, and the time-varying confounder ($L_{ij}$) as follows:

\[
\operatorname{logit} \Pr(A_{ij} = 1 \mid \bar{A}_{i,j-1}, L_{i0}, \bar{L}_{ij}, \alpha) = \alpha_0(j) + \alpha_1 A_{i,j-1} + \alpha_2 L_{i0} + \alpha_3 L_{ij}, \tag{B.4}
\]

where $\alpha = (\alpha_0, \alpha_1, \alpha_2, \alpha_3)$.

Adding interaction terms to equation (B.4) enables us to capture the realistic scenario that the status of the confounder at time $j$ (i.e., $L_{ij} = 0$ or 1) can potentially influence a switch onto treatment ($A_{i,j-1} = 0$, $A_{i,j} = 1$) differently than a switch off treatment ($A_{i,j-1} = 1$, $A_{i,j} = 0$), depending on the treatment status at the previous time period (i.e., $A_{i,j-1}$). In our implementation, the denominator terms are estimated from:

\[
\operatorname{logit} \Pr(A_{ij} = 1 \mid \bar{A}_{i,j-1}, L_{i0}, \bar{L}_{ij}, \alpha) = \alpha_0(j) + \alpha_1 A_{i,j-1} + \alpha_2 L_{i0} + \alpha_3 L_{ij} + \alpha_{13} A_{i,j-1} L_{ij}, \tag{B.5}
\]

where $\alpha$ now includes $\alpha_{13}$ as well. Since we are using only the last value of the treatment history ($\bar{A}_{i,j-1} = A_{i,j-1}$) in equation (B.5), it is possible to simplify this equation by considering treatment status at the $(j-1)$-th time, i.e., whether the patient was treated ($A_{i,j-1} = 1$) or not ($A_{i,j-1} = 0$) as follows:

\[
\operatorname{logit} \Pr(A_{ij} = 1 \mid A_{i,j-1} = 1, L_{i0}, L_{ij}) = \{\alpha_0(j) + \alpha_1\} + \alpha_2 L_{i0} + (\alpha_3 + \alpha_{13}) L_{ij},
\]
\[
\operatorname{logit} \Pr(A_{ij} = 1 \mid A_{i,j-1} = 0, L_{i0}, L_{ij}) = \alpha_0(j) + \alpha_2 L_{i0} + \alpha_3 L_{ij}.
\]

The predicted probabilities from equation (B.5) yield the estimated probability of the subject's treatment status at time $j$.
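As a brief worked restatement of this step, write the fitted value from equation (B.5) as $\hat{p}_{ij}$ (a shorthand introduced here for illustration only; it is not part of the original notation). Under that assumption, the estimated probability of the observed status and the resulting estimate of the unstabilized weight in equation (B.3) would read:

```latex
\[
\hat{p}_{ij} = \widehat{\Pr}(A_{ij} = 1 \mid A_{i,j-1}, L_{i0}, L_{ij}), \qquad
\widehat{\Pr}(A_{ij} = a_{ij} \mid A_{i,j-1}, L_{i0}, L_{ij})
  = \hat{p}_{ij}^{\,a_{ij}} \, (1 - \hat{p}_{ij})^{1 - a_{ij}},
\]
\[
\hat{w}_{im} = \prod_{j=0}^{m} \frac{1}{\hat{p}_{ij}^{\,a_{ij}} \, (1 - \hat{p}_{ij})^{1 - a_{ij}}}.
\]
```

For instance, a subject with fitted values 0.8 and 0.9 and observed statuses 1 and 0 over two intervals would have an estimated sequence probability of $0.8 \times 0.1 = 0.08$ and an unstabilized weight of $1/0.08 = 12.5$.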
Subsequently, we obtain the probability of the observed exposure sequence over $m$ time periods of a given subject by multiplying the corresponding probabilities.

To stabilize this IPW, we use the following general formula:

\[
sw_{im} = \prod_{j=0}^{m} \frac{\Pr(A_{ij} = a_{ij} \mid \bar{A}_{i,j-1} = \bar{a}_{i,j-1}, L_{i0} = l_{i0})}{\Pr(A_{ij} = a_{ij} \mid \bar{A}_{i,j-1} = \bar{a}_{i,j-1}, L_{i0} = l_{i0}, \bar{L}_{ij} = \bar{l}_{ij})}. \tag{B.6}
\]

In our implementation, the numerator terms are estimated from:

\[
\operatorname{logit} \Pr(A_{ij} = 1 \mid \bar{A}_{i,j-1}, L_{i0}) = \alpha'_0(j) + \alpha'_1 A_{i,j-1} + \alpha'_2 L_{i0}, \tag{B.7}
\]

where no element of $\bar{L}_{ij}$ is included as a predictor.

Dividing the estimated numerator probabilities of the subject's observed treatment status $a_{ij}$ (either 0 or 1) by the corresponding estimated denominator probabilities yields the estimated IPWs $sw_{im}$ that account for the confounding due to $\bar{L}_{im}$.

The formulas for the normalized versions of the IPW are:

\[
w^{(n)}_{im} = \frac{w_{im} \, n_m}{\sum_{i \in r_m} w_{im}}, \qquad
sw^{(n)}_{im} = \frac{sw_{im} \, n_m}{\sum_{i \in r_m} sw_{im}}, \tag{B.8}
\]

where $r_m$ denotes the risk-set at time $m$, $n_m$ denotes the total number of subjects in the risk-set, and $w$ and $sw$ are the unstabilized and stabilized IPWs.

B.4 Implementation of the Statistical Learning Approaches in R

To estimate the weights, the following functions can be used in R:

• We fitted the logistic regressions using the glm function from the stats package (part of base R; with logit link).
Sample code is as follows:

    # Denominator (ww) and numerator (ww0) models
    ww  <- glm(A ~ m + L + A.lag + L.lag + A.lag*L,
               family = binomial(logit), data = dataset)
    ww0 <- glm(A ~ m + A.lag, family = binomial(logit), data = dataset)

    # Probability of the observed exposure status at each person-time
    dataset$wwp  <- with(dataset, ifelse(A == 0,
                         1 - fitted(ww), fitted(ww)))
    dataset$wwp0 <- with(dataset, ifelse(A == 0,
                         1 - fitted(ww0), fitted(ww0)))

    # Unstabilized and stabilized weights: cumulative products within id
    # (assumes the rows are sorted by id and, within id, by time)
    dataset$w  <- unlist(tapply(1/dataset$wwp, dataset$id, cumprod))
    dataset$sw <- unlist(tapply(dataset$wwp0/dataset$wwp,
                                dataset$id, cumprod))

The lrm function from the rms package can do the same.

• We performed bootstrap aggregation or bagging using the bagging function from the ipred package. Sample code is as follows:

    library(rpart)
    library(ipred)

    # Denominator (ww) and numerator (ww0) models
    ww  <- bagging(A ~ m + L + A.lag + L.lag + A.lag*L,
                   data = dataset, nbagg = 100,
                   control = rpart.control(xval = 10))
    ww0 <- bagging(A ~ m + A.lag, data = dataset, nbagg = 100,
                   control = rpart.control(xval = 10))

    # Probability of the observed exposure status at each person-time
    dataset$wwp  <- with(dataset, ifelse(A == 0,
                         1 - predict(ww, type = "prob"),
                         predict(ww, type = "prob")))
    dataset$wwp0 <- with(dataset, ifelse(A == 0,
                         1 - predict(ww0, type = "prob"),
                         predict(ww0, type = "prob")))

    # Unstabilized and stabilized weights (rows sorted by id and time)
    dataset$w  <- unlist(tapply(1/dataset$wwp, dataset$id, cumprod))
    dataset$sw <- unlist(tapply(dataset$wwp0/dataset$wwp,
                                dataset$id, cumprod))

• LIBSVM is a popular implementation of the support vector machines algorithm [153]. We can make use of this implementation via the e1071 package in R.
We fit the SVM using the svm function from this package. Sample code is as follows:

    library(e1071)

    # Denominator (ww) and numerator (ww0) models
    ww  <- svm(as.factor(A) ~ m + as.factor(L) + as.factor(A.lag) +
                 as.factor(L.lag) + as.factor(A.lag*L),
               data = dataset, probability = TRUE, kernel = "polynomial")
    ww0 <- svm(as.factor(A) ~ m + as.factor(A.lag),
               data = dataset, probability = TRUE, kernel = "polynomial")

    # Predicted class probabilities; column 1 is treated below as Pr(A = 0)
    newdf <- data.frame(m = dataset$m, L = dataset$L,
                        A.lag = dataset$A.lag, L.lag = dataset$L.lag)
    (predw <- predict(ww, newdf, probability = TRUE))
    pr.predw <- attr(predw, "probabilities")[,1]
    newdf <- data.frame(m = dataset$m, A.lag = dataset$A.lag)
    (predw0 <- predict(ww0, newdf, probability = TRUE))
    pr.predw0 <- attr(predw0, "probabilities")[,1]

    # Probability of the observed exposure status at each person-time
    dataset$wwp  <- with(dataset, ifelse(A == 0,
                         pr.predw, 1 - pr.predw))
    dataset$wwp0 <- with(dataset, ifelse(A == 0,
                         pr.predw0, 1 - pr.predw0))

    # Unstabilized and stabilized weights (rows sorted by id and time)
    dataset$w  <- unlist(tapply(1/dataset$wwp, dataset$id, cumprod))
    dataset$sw <- unlist(tapply(dataset$wwp0/dataset$wwp,
                                dataset$id, cumprod))

The ksvm function in the kernlab package, the svmlight function in the klaR package, or the svmpath function in the svmpath package can also be used to fit an SVM.

• We performed boosting using the ps function from the twang package, which utilizes the gbm function from the gbm package. Sample code is as follows:

    library(gbm)
    library(twang)

    # Denominator (ww) and numerator (ww0) models
    ww  <- ps(A ~ m + L + A.lag + L.lag, data = dataset,
              interaction.depth = 2,
              stop.method = "ks.mean", print.level = 0,
              verbose = FALSE)
    # The original listing had 'A ~ m + A'; A.lag is used here to match
    # the numerator models of the other approaches.
    ww0 <- ps(A ~ m + A.lag, data = dataset, interaction.depth = 2,
              stop.method = "ks.mean", print.level = 0,
              verbose = FALSE)

    # Probability of the observed exposure status at each person-time
    dataset$wwp  <- with(dataset, ifelse(A == 0,
                         1 - ww$ps$ks.mean.ATE, ww$ps$ks.mean.ATE))
    dataset$wwp0 <- with(dataset, ifelse(A == 0,
                         1 - ww0$ps$ks.mean.ATE, ww0$ps$ks.mean.ATE))

    # Unstabilized and stabilized weights (rows sorted by id and time)
    dataset$w  <- unlist(tapply(1/dataset$wwp, dataset$id, cumprod))
    dataset$sw <- unlist(tapply(dataset$wwp0/dataset$wwp,
                                dataset$id, cumprod))

The coxph function from the survival package can be used to fit the MSCM, with the cluster option specifying the patient identifier and the option robust = TRUE requesting robust SEs.

B.5 Post-estimation Weight Variability Reduction Techniques

Normalization. Normalization (discussed in Appendix B.3) is a relatively new proposal to change the weights in such a way that, in each risk-set, the variability is reduced while also assuring that the mean weight for each risk-set equals one [57]. Such characteristics of the weights in turn contribute to reducing the sampling variability of the estimates of the causal effect. This technique can be applied to both unstabilized and stabilized weights (see equation (B.8) in Appendix B.3). The usefulness of this approach was shown in a simulation setting [57].

Truncation. Weight truncation refers to reducing weights larger than some specified value $w_u$ to $w_u$ and increasing weights smaller than some specified value $w_l$ to $w_l$. The truncation points $(w_l, w_u)$ are usually selected according to specified weight quantiles (say, 5% and 95%). Truncation generally reduces the variability of the causal effect estimate. When the distribution of the weights is symmetric, higher levels of truncation usually lead the effect estimate to move towards the baseline-adjusted estimate. At 50% truncation, a median weight is assigned, which leads to an estimate similar to the baseline-adjusted estimate [251]. Selecting a suitable level of truncation involves the 'variance-bias trade-off'. Selection of this level generally involves data-adaptive methods [104, 158].
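The truncation rule described above can be sketched in a few lines of R. The helper name `truncate.weights` and the simulated weights are assumptions introduced only for illustration; the percentile convention (lower bound at the p-th percentile, upper bound at the (100 − p)-th) follows the description in the text.

```r
# Truncate weights at the p-th and (100 - p)-th percentiles:
# values below the lower bound are raised to it, values above the
# upper bound are lowered to it.
truncate.weights <- function(w, p) {
  bounds <- quantile(w, probs = c(p, 100 - p) / 100, names = FALSE)
  pmin(pmax(w, bounds[1]), bounds[2])
}

set.seed(3)
sw <- exp(rnorm(1000, sd = 0.5))    # hypothetical right-skewed weights

sw5  <- truncate.weights(sw, 5)     # 5% truncation
sw50 <- truncate.weights(sw, 50)    # 50% truncation: all weights = median
```

With this convention, 50% truncation collapses both bounds onto the median, so every observation receives the same weight, which is consistent with the statement that the resulting estimate resembles the baseline-adjusted one.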
We denoted untruncated, 1%, 5%, 25%, 35% and 50% truncated unstabilized weights as $w, w_1, w_5, w_{25}, w_{35}, w_{50}$ respectively. This notational convention holds for all other weights as well. Many researchers use the terms trimming and truncation interchangeably [120, 121, 251], but we will maintain the definition of 'truncation' above.

B.6 Pseudocode for MSCM Data Simulation

The algorithm proposed by Young et al. [56, 142] generates data that satisfy the conditions of the following three models simultaneously: an MSM, a structural nested accelerated failure time model and a structural nested cumulative failure time model. The steps of this algorithm are also described elsewhere [56, 57, 114, 160]. We slightly modified the treatment generating models to include an interaction term ($A_{m-1} \times L_m$) in the treatment generation stage to make it more realistic for many disease settings.

    GET
        n ← 2500 (large sample) or 300 (small sample);
        K ← 10 (maximum follow-up);
        λ_0 ← 0.01 (rare events) or 0.10 (frequent events);
        β ← [log(3/7), 2, log(1/2), log(3/2)] (parameter vector for generating L);
        α ← [log(2/7), (1/2), (1/2), log(4), log(6/5)] (parameter vector for generating A);
        ψ_1 ← −0.5 (true log-hazard value of the treatment effect)
    COMPUTE
        FOR ID = 1 to n
            INIT: L_{−1} ← 0; A_{−1} ← 0; Y_0 ← 0; H_m ← 0; c ← 30
            T_0̄ ∼ Exponential(λ_0)
            FOR m = 1 to K
                logit p_L ← logit Pr(L_m = 1 | L_{m−1}, A_{m−1}, Y_m = 0; β)
                          ← β_0 + β_1 I(T_0̄ < c) + β_2 A_{m−1} + β_3 L_{m−1}
                L_m ∼ Bernoulli(p_L)
                logit p_A ← logit Pr(A_m = 1 | L_m, L_{m−1}, A_{m−1}, Y_m = 0; α)
                          ← α_0 + α_1 L_m + α_2 A_{m−1} + α_3 L_{m−1} + α_4 A_{m−1} × L_m
                A_m ∼ Bernoulli(p_A)
                H_m ← ∫_0^{m+1} λ_{ā_j}(j) dj
                    ← H_m + exp(ψ_1 A_m)
                IF T_0̄ ≥ H_m
                    Y_{m+1} ← 0
                ELSE
                    Y_{m+1} ← 1
                    T_{Ā_m} ← m + (T_0̄ − H_m) × exp(−ψ_1 A_m)
                END IF
            ENDFOR m
        ENDFOR ID
    PRINT
        ID, m, Y_{m+1}, A_m, L_m, A_{m−1}, L_{m−1}

B.7 Describing the Characteristics of the Weights in a Simulated Population

The truncated weights estimated from various approaches are summarized in Appendix Tables B.1 - B.4.
Data is generated for a very large numberof subjects (n = 25, 000), each with up to 10 visits, under the rare eventcondition. These tables are described in § 3.4.1.170B.7.DescribingtheCharacteristicsoftheWeightsinaSimulatedPopulationTable B.1: Summaries of the truncated weights estimated by logistic regression (l = logistic) under different weightingschemes (w = unstabilized, w(n) = unstabilized normalized, sw = stabilized, sw(n) = stabilized normalized) from thesimulation study with a large (25, 000) number of subjects, each with up to 10 visits, under the rare event condition.Min. Q1 Median Mean Q3 Max. sd p >20 p >100l − w 1.21 3.96 17.82 189.70 98.09 12780.00 666.15 0.48 0.25l − w1 1.21 3.96 17.82 165.30 98.09 2728.00 425.81 0.48 0.25l − w5 1.31 3.96 17.82 117.00 98.09 807.30 214.13 0.48 0.25l − w10 1.47 3.96 17.82 89.15 98.09 414.70 137.36 0.48 0.25l − w25 3.96 3.96 17.82 38.89 98.05 98.09 39.10 0.48 0.00l − w35 6.93 6.93 17.82 24.30 44.89 44.89 17.04 0.48 0.00l − w50 17.82 17.82 17.82 17.82 17.82 17.82 0.00 0.00 0.00l − w(n) 0.01 0.22 0.58 1.00 1.22 13.99 1.37 0.00 0.00l − w(n)1 0.02 0.22 0.58 0.97 1.22 6.42 1.21 0.00 0.00l − w(n)5 0.06 0.22 0.58 0.89 1.22 3.34 0.91 0.00 0.00l − w(n)10 0.10 0.22 0.58 0.82 1.22 2.38 0.75 0.00 0.00l − w(n)25 0.22 0.22 0.58 0.65 1.22 1.22 0.40 0.00 0.00l − w(n)35 0.35 0.35 0.58 0.56 0.79 0.79 0.20 0.00 0.00l − w(n)50 0.58 0.58 0.58 0.58 0.58 0.58 0.00 0.00 0.00l − sw 0.33 0.79 0.94 1.00 1.19 2.54 0.34 0.00 0.00l − sw1 0.39 0.79 0.94 1.00 1.19 2.08 0.33 0.00 0.00l − sw5 0.55 0.79 0.94 1.00 1.19 1.64 0.29 0.00 0.00l − sw10 0.65 0.79 0.94 0.98 1.19 1.41 0.25 0.00 0.00l − sw25 0.79 0.79 0.94 0.97 1.19 1.19 0.16 0.00 0.00l − sw35 0.86 0.86 0.94 0.95 1.05 1.05 0.08 0.00 0.00l − sw50 0.94 0.94 0.94 0.94 0.94 0.94 0.00 0.00 0.00l − sw(n) 0.32 0.78 0.94 1.00 1.18 2.48 0.33 0.00 0.00l − sw(n)1 0.39 0.78 0.94 1.00 1.18 2.08 0.33 0.00 0.00l − sw(n)5 0.55 0.78 0.94 0.99 1.18 1.64 0.29 0.00 0.00l − sw(n)10 0.65 0.78 0.94 0.98 1.18 1.42 0.25 
0.00 0.00l − sw(n)25 0.78 0.78 0.94 0.97 1.18 1.18 0.16 0.00 0.00l − sw(n)35 0.85 0.85 0.94 0.95 1.04 1.04 0.09 0.00 0.00l − sw(n)50 0.94 0.94 0.94 0.94 0.94 0.94 0.00 0.00 0.00171B.7.DescribingtheCharacteristicsoftheWeightsinaSimulatedPopulationTable B.2: Summaries of the truncated weights estimated by bagging approach (b = bagging) under different weightingschemes (w = unstabilized, w(n) = unstabilized normalized, sw = stabilized, sw(n) = stabilized normalized) from thesimulation study with a large (25, 000) number of subjects, each with up to 10 visits, under the rare event condition.Min. Q1 Median Mean Q3 Max. sd p >20 p >100b− w 1.28 4.08 18.62 195.80 101.50 8990.00 641.09 0.49 0.25b− w1 1.29 4.08 18.62 174.00 101.50 2986.00 450.31 0.49 0.25b− w5 1.30 4.08 18.62 121.70 101.50 812.10 219.95 0.49 0.25b− w10 1.67 4.08 18.62 97.09 101.50 466.20 152.30 0.49 0.25b− w25 4.08 4.08 18.62 40.68 101.30 101.50 40.54 0.49 0.25b− w35 7.70 7.70 18.62 25.84 47.23 47.23 17.84 0.49 0.00b− w50 18.62 18.62 18.62 18.62 18.62 18.62 0.00 0.00 0.00b− w(n) 0.01 0.26 0.66 1.00 1.26 12.56 1.31 0.00 0.00b− w(n)1 0.02 0.26 0.66 0.98 1.26 6.87 1.18 0.00 0.00b− w(n)5 0.07 0.26 0.66 0.90 1.26 3.25 0.87 0.00 0.00b− w(n)10 0.12 0.26 0.66 0.83 1.26 2.29 0.72 0.00 0.00b− w(n)25 0.26 0.26 0.66 0.69 1.26 1.26 0.40 0.00 0.00b− w(n)35 0.40 0.40 0.66 0.61 0.84 0.84 0.20 0.00 0.00b− w(n)50 0.66 0.66 0.66 0.66 0.66 0.66 0.00 0.00 0.00b− sw 0.36 0.92 0.98 1.00 1.06 1.99 0.19 0.00 0.00b− sw1 0.49 0.92 0.98 1.00 1.06 1.63 0.18 0.00 0.00b− sw5 0.76 0.92 0.98 1.00 1.06 1.38 0.15 0.00 0.00b− sw10 0.83 0.92 0.98 1.00 1.06 1.24 0.12 0.00 0.00b− sw25 0.92 0.92 0.98 0.99 1.06 1.06 0.06 0.00 0.00b− sw35 0.95 0.95 0.98 0.98 1.02 1.02 0.03 0.00 0.00b− sw50 0.98 0.98 0.98 0.98 0.98 0.98 0.00 0.00 0.00b− sw(n) 0.35 0.92 0.98 1.00 1.06 1.95 0.19 0.00 0.00b− sw(n)1 0.48 0.92 0.98 1.00 1.06 1.62 0.18 0.00 0.00b− sw(n)5 0.75 0.92 0.98 1.00 1.06 1.38 0.15 0.00 0.00b− sw(n)10 0.82 0.92 0.98 0.99 1.06 1.24 0.12 0.00 0.00b− 
sw(n)25 0.92 0.92 0.98 0.98 1.06 1.06 0.06 0.00 0.00b− sw(n)35 0.94 0.94 0.98 0.98 1.01 1.01 0.03 0.00 0.00b− sw(n)50 0.98 0.98 0.98 0.98 0.98 0.98 0.00 0.00 0.00172B.7.DescribingtheCharacteristicsoftheWeightsinaSimulatedPopulationTable B.3: Summaries of the truncated weights estimated by SVM approach (svm = SVM) under different weightingschemes (w = unstabilized, w(n) = unstabilized normalized, sw = stabilized, sw(n) = stabilized normalized) from thesimulation study with a large (25, 000) number of subjects, each with up to 10 visits, under the rare event condition.Min. Q1 Median Mean Q3 Max. sd p >20 p >100svm− w 1.35 4.50 20.25 161.40 100.70 6568.00 466.04 0.50 0.25svm− w1 1.35 4.50 20.25 149.50 100.70 2438.00 366.19 0.50 0.25svm− w5 1.35 4.50 20.25 111.30 100.70 731.10 196.67 0.50 0.25svm− w10 1.82 4.50 20.25 88.21 100.70 408.90 133.50 0.50 0.25svm− w25 4.50 4.50 20.25 40.93 100.60 100.70 40.18 0.50 0.25svm− w35 8.22 8.22 20.25 27.22 50.19 50.19 18.95 0.50 0.00svm− w50 20.25 20.25 20.25 20.25 20.25 20.25 0.00 1.00 0.00svm− w(n) 0.03 0.34 0.72 1.00 1.31 8.33 1.10 0.00 0.00svm− w(n)1 0.04 0.34 0.72 0.99 1.31 5.59 1.03 0.00 0.00svm− w(n)5 0.11 0.34 0.72 0.93 1.31 3.13 0.81 0.00 0.00svm− w(n)10 0.17 0.34 0.72 0.86 1.31 2.12 0.64 0.00 0.00svm− w(n)25 0.34 0.34 0.72 0.76 1.31 1.31 0.40 0.00 0.00svm− w(n)35 0.48 0.48 0.72 0.70 0.95 0.95 0.21 0.00 0.00svm− w(n)50 0.72 0.72 0.72 0.72 0.72 0.72 0.00 0.00 0.00svm− sw 0.76 0.95 1.01 1.00 1.06 1.22 0.08 0.00 0.00svm− sw1 0.81 0.95 1.01 1.00 1.06 1.19 0.08 0.00 0.00svm− sw5 0.87 0.95 1.01 1.00 1.06 1.14 0.07 0.00 0.00svm− sw10 0.90 0.95 1.01 1.00 1.06 1.10 0.06 0.00 0.00svm− sw25 0.95 0.95 1.01 1.01 1.06 1.06 0.04 0.00 0.00svm− sw35 0.98 0.98 1.01 1.01 1.04 1.04 0.03 0.00 0.00svm− sw50 1.01 1.01 1.01 1.01 1.01 1.01 0.00 0.00 0.00svm− sw(n) 0.76 0.95 1.01 1.00 1.05 1.22 0.08 0.00 0.00svm− sw(n)1 0.80 0.95 1.01 1.00 1.05 1.19 0.08 0.00 0.00svm− sw(n)5 0.86 0.95 1.01 1.00 1.05 1.14 0.07 0.00 0.00svm− sw(n)10 0.89 0.95 1.01 1.00 
1.05 1.09 0.06 0.00 0.00
svm-sw(n)25      0.95    0.95    1.01    1.00    1.05    1.05    0.04    0.00    0.00
svm-sw(n)35      0.97    0.97    1.01    1.00    1.03    1.03    0.03    0.00    0.00
svm-sw(n)50      1.01    1.01    1.01    1.01    1.01    1.01    0.00    0.00    0.00

Table B.4: Summaries of the truncated weights estimated by boosting approach (gbm = boosting) under different weighting schemes (w = unstabilized, w(n) = unstabilized normalized, sw = stabilized, sw(n) = stabilized normalized) from the simulation study with a large (25,000) number of subjects, each with up to 10 visits, under the rare event condition.

Weights          Min.      Q1  Median    Mean      Q3    Max.      sd    p>20   p>100
gbm-w            1.24    3.56   16.09  163.60   90.74 6441.00  477.65    0.46    0.23
gbm-w1           1.24    3.56   16.09  151.20   90.74 2415.00  376.62    0.46    0.23
gbm-w5           1.30    3.56   16.09  117.20   90.74  878.10  226.41    0.46    0.23
gbm-w10          1.54    3.56   16.09   84.72   90.74  408.30  133.56    0.46    0.23
gbm-w25          3.56    3.56   16.09   35.96   90.54   90.74   36.33    0.46    0.00
gbm-w35          6.55    6.55   16.09   22.38   41.31   41.31   15.65    0.46    0.00
gbm-w50         16.09   16.09   16.09   16.09   16.09   16.09    0.00    0.00    0.00
gbm-w(n)         0.01    0.23    0.60    1.00    1.22    8.86    1.29    0.00    0.00
gbm-w(n)1        0.02    0.23    0.60    0.99    1.22    6.75    1.23    0.00    0.00
gbm-w(n)5        0.06    0.23    0.60    0.90    1.22    3.40    0.94    0.00    0.00
gbm-w(n)10       0.09    0.23    0.60    0.84    1.22    2.41    0.77    0.00    0.00
gbm-w(n)25       0.23    0.23    0.60    0.66    1.22    1.22    0.40    0.00    0.00
gbm-w(n)35       0.35    0.35    0.60    0.57    0.81    0.81    0.21    0.00    0.00
gbm-w(n)50       0.60    0.60    0.60    0.60    0.60    0.60    0.00    0.00    0.00
gbm-sw           0.21    0.77    0.93    0.99    1.10    3.41    0.42    0.00    0.00
gbm-sw1          0.30    0.77    0.93    0.99    1.10    2.55    0.40    0.00    0.00
gbm-sw5          0.45    0.77    0.93    0.98    1.10    1.82    0.34    0.00    0.00
gbm-sw10         0.56    0.77    0.93    0.96    1.10    1.44    0.26    0.00    0.00
gbm-sw25         0.77    0.77    0.93    0.93    1.10    1.10    0.13    0.00    0.00
gbm-sw35         0.85    0.85    0.93    0.94    1.03    1.03    0.08    0.00    0.00
gbm-sw50         0.93    0.93    0.93    0.93    0.93    0.93    0.00    0.00    0.00
gbm-sw(n)        0.21    0.77    0.94    1.00    1.11    3.45    0.42    0.00    0.00
gbm-sw(n)1       0.30    0.77    0.94    1.00    1.11    2.59    0.40    0.00    0.00
gbm-sw(n)5       0.45    0.77    0.94    0.98    1.11    1.83    0.34    0.00    0.00
gbm-sw(n)10      0.57    0.77    0.94    0.96    1.11    1.45    0.26    0.00    0.00
gbm-sw(n)25      0.77    0.77    0.94    0.94    1.11    1.11    0.13    0.00    0.00
gbm-sw(n)35      0.86    0.86    0.94    0.94    1.03    1.03    0.08    0.00    0.00
gbm-sw(n)50      0.94    0.94    0.94    0.94    0.94    0.94    0.00    0.00    0.00

B.8 Additional Simulation Results

B.8.1 Results from Smaller Samples n = 300

[Figure: four panels (Unstabilized, Normalized unstabilized, Stabilized, Normalized stabilized); x-axis: weight truncation percentile; y-axis: bias; legend: logist, bag, svm, gbm]
Figure B.1: Bias of MSCM estimate $\hat{\psi}_1$ under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 300 subjects observed at most 10 times under the rare event condition.

[Figure: four panels (Unstabilized, Normalized unstabilized, Stabilized, Normalized stabilized); x-axis: weight truncation percentile; y-axis: sd; legend: logist, bag, svm, gbm]
Figure B.2: Empirical standard deviation of MSCM estimate $\hat{\psi}_1$ under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 300 subjects observed at most 10 times under the rare event condition.
[Figure: four panels (Unstabilized, Normalized unstabilized, Stabilized, Normalized stabilized); x-axis: weight truncation percentile; y-axis: model.sd; legend: logist, bag, svm, gbm]
Figure B.3: Average model-based standard error of MSCM estimate $\hat{\psi}_1$ under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 300 subjects observed at most 10 times under the rare event condition.

[Figure: four panels (Unstabilized, Normalized unstabilized, Stabilized, Normalized stabilized); x-axis: weight truncation percentile; y-axis: mse; legend: logist, bag, svm, gbm]
Figure B.4: Mean squared error of MSCM estimate $\hat{\psi}_1$ under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 300 subjects observed at most 10 times under the rare event condition.

[Figure: four panels (Unstabilized, Normalized unstabilized, Stabilized, Normalized stabilized); x-axis: weight truncation percentile; y-axis: cp; legend: logist, bag, svm, gbm]
Figure B.5: The coverage probability (cp) of model-based nominal 95% confidence intervals based on the MSCM estimate $\hat{\psi}_1$ under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 300 subjects observed at most 10 times under the rare event condition.
B.8.2 Results from the Scenario When More Events are Available for n = 2,500

[Figure: four panels (Unstabilized, Normalized unstabilized, Stabilized, Normalized stabilized); x-axis: weight truncation percentile; y-axis: bias; legend: logist, bag, svm, gbm]
Figure B.6: Bias of MSCM estimate $\hat{\psi}_1$ under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times when the event rate is more frequent.

[Figure: four panels (Unstabilized, Normalized unstabilized, Stabilized, Normalized stabilized); x-axis: weight truncation percentile; y-axis: sd; legend: logist, bag, svm, gbm]
Figure B.7: Empirical standard deviation of MSCM estimate $\hat{\psi}_1$ under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times when the event rate is more frequent.
[Figure: four panels (Unstabilized, Normalized unstabilized, Stabilized, Normalized stabilized); x-axis: weight truncation percentile; y-axis: model.sd; legend: logist, bag, svm, gbm]
Figure B.8: Average model-based standard error of MSCM estimate $\hat{\psi}_1$ under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times when the event rate is more frequent.

[Figure: four panels (Unstabilized, Normalized unstabilized, Stabilized, Normalized stabilized); x-axis: weight truncation percentile; y-axis: mse; legend: logist, bag, svm, gbm]
Figure B.9: Mean squared error of MSCM estimate $\hat{\psi}_1$ under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times when the event rate is more frequent.
[Figure: four panels (Unstabilized, Normalized unstabilized, Stabilized, Normalized stabilized); x-axis: weight truncation percentile; y-axis: cp; legend: logist, bag, svm, gbm]
Figure B.10: The coverage probability (cp) of model-based nominal 95% confidence intervals based on the MSCM estimate under different IPW generation approaches when the large weights are progressively truncated in a simulation study of 1,000 datasets with 2,500 subjects observed at most 10 times when the event rate is more frequent.

B.9 Supporting Results from the Empirical MS Application

[Figure: two panels; x-axis: weight truncation percentile; y-axes: log(hazard ratio) and SE of log(hazard ratio); legend: logist, bag, svm, gbm]
Figure B.11: Performance of stabilized normalized weights generated by different statistical learning approaches for MSCM analysis to estimate log-hazard $\psi_1$ in a multiple sclerosis study.
Table B.5: The impact of truncation of the sw(n) generated via logistic regression on the estimated causal effect of β-IFN on the hazard of reaching sustained EDSS 6 for BC MS patients (1995-2008).

Truncation     Estimated weights                Treatment effect estimate
percentiles    Mean (log-SD)    Min-Max         HR      SE†     95% CI†
None           1.000 (-2.179)   0.317 - 1.713   1.360   0.239   0.964 - 1.919
(1, 99)        1.000 (-2.246)   0.638 - 1.275   1.358   0.239   0.963 - 1.916
(5, 95)        1.001 (-2.411)   0.812 - 1.164   1.352   0.237   0.959 - 1.905
(10, 90)       1.003 (-2.572)   0.875 - 1.125   1.341   0.235   0.952 - 1.890
(25, 75)       1.005 (-3.120)   0.950 - 1.062   1.313   0.229   0.932 - 1.849
(35, 65)       1.003 (-3.768)   0.978 - 1.030   1.299   0.227   0.922 - 1.830
Median         1.001 (-Inf)     1.001 - 1.001   1.288   0.225   0.914 - 1.815

log-SD, logarithmic transformation of standard deviation; Min, minimum; Max, maximum; CI, confidence interval; HR, Hazard ratio; SE, standard error.
† Based on robust standard error.

Table B.6: The impact of truncation of the sw(n) generated via bagging on the estimated causal effect of β-IFN on the hazard of reaching sustained EDSS 6 for BC MS patients (1995-2008).

Truncation     Estimated weights                Treatment effect estimate
percentiles    Mean (log-SD)    Min-Max         HR      SE†     95% CI†
None           1.000 (-4.882)   0.967 - 1.122   1.286   0.225   0.913 - 1.813
(1, 99)        1.000 (-5.784)   0.988 - 1.020   1.287   0.225   0.913 - 1.814
(5, 95)        1.000 (-6.821)   0.997 - 1.002   1.288   0.225   0.914 - 1.815
(10, 90)       1.000 (-7.221)   0.998 - 1.001   1.288   0.225   0.914 - 1.815
(25, 75)       1.000 (-7.848)   0.999 - 1.000   1.288   0.225   0.914 - 1.815
(35, 65)       1.000 (-8.378)   0.999 - 1.000   1.288   0.225   0.914 - 1.815
Median         1.000 (-Inf)     1.000 - 1.000   1.288   0.225   0.914 - 1.815

log-SD, logarithmic transformation of standard deviation; Min, minimum; Max, maximum; CI, confidence interval; HR, Hazard ratio; SE, standard error.
† Based on robust standard error.
Table B.7: The impact of truncation of the sw(n) generated via SVM on the estimated causal effect of β-IFN on the hazard of reaching sustained EDSS 6 for BC MS patients (1995-2008).

Truncation     Estimated weights                Treatment effect estimate
percentiles    Mean (log-SD)    Min-Max         HR      SE†     95% CI†
None           1.000 (-3.036)   0.420 - 1.755   1.305   0.229   0.926 - 1.841
(1, 99)        1.000 (-3.400)   0.866 - 1.102   1.305   0.229   0.926 - 1.840
(5, 95)        1.000 (-3.667)   0.937 - 1.039   1.301   0.228   0.923 - 1.833
(10, 90)       1.001 (-3.879)   0.963 - 1.030   1.299   0.227   0.922 - 1.830
(25, 75)       1.003 (-4.437)   0.988 - 1.017   1.294   0.226   0.918 - 1.823
(35, 65)       1.004 (-4.961)   0.996 - 1.012   1.291   0.226   0.916 - 1.820
Median         1.005 (-Inf)     1.005 - 1.005   1.288   0.225   0.914 - 1.815

log-SD, logarithmic transformation of standard deviation; Min, minimum; Max, maximum; CI, confidence interval; HR, Hazard ratio; SE, standard error.
† Based on robust standard error.

Table B.8: The impact of truncation of the sw(n) generated via boosting on the estimated causal effect of β-IFN on the hazard of reaching sustained EDSS 6 for BC MS patients (1995-2008).

Truncation     Estimated weights                Treatment effect estimate
percentiles    Mean (log-SD)    Min-Max         HR      SE†     95% CI†
None           1.000 (-2.834)   0.348 - 1.749   1.321   0.231   0.938 - 1.861
(1, 99)        1.002 (-3.269)   0.790 - 1.108   1.316   0.230   0.935 - 1.854
(5, 95)        1.004 (-3.754)   0.945 - 1.051   1.305   0.228   0.926 - 1.838
(10, 90)       1.004 (-4.095)   0.973 - 1.032   1.300   0.227   0.923 - 1.832
(25, 75)       1.005 (-4.971)   0.997 - 1.014   1.293   0.226   0.918 - 1.822
(35, 65)       1.005 (-5.599)   1.001 - 1.010   1.291   0.226   0.916 - 1.819
Median         1.004 (-Inf)     1.004 - 1.004   1.288   0.225   0.914 - 1.815

log-SD, logarithmic transformation of standard deviation; Min, minimum; Max, maximum; CI, confidence interval; HR, Hazard ratio; SE, standard error.
† Based on robust standard error.

Appendix C

Appendix for Chapter 4

C.1 Bias Due to Incorrect Handling of Immortal Time

For simplicity, treatment exposure is often defined improperly. For example, subjects may be assumed to be on treatment immediately after joining a study cohort when, in reality, some of them initiate treatment only after a delay. Not properly accounting for this delay period causes immortal time bias.

Let us define the notation needed to investigate the bias associated with immortal time. Suppose $i = 1$ indicates the ever-treatment exposed group, whereas $i = 0$ indicates the never-treatment exposed group. Further, let $N_i$ and $T_i$ ($i = 0, 1$) indicate the observed number of failures and the follow-up person-time in these groups.

Let $r = T_0/T_1$ be the ratio of the observed person-times in the never-treatment exposed and ever-treatment exposed subjects. Denote by $T_{IT}$ the observed immortal time, i.e., the aggregated follow-up time not under treatment in the ever-treatment exposed group, and set $f = T_{IT}/T_1$. Let $N_{IT}$ be the number of failures observed during the immortal time. Obviously, $N_{IT} = 0$. Also, let $T_1' = T_1 - T_{IT} = (1 - f) \times T_1$ denote the person-time under treatment in the ever-treatment exposed group. The total person-time not under treatment is $T_0' = T_0 + T_{IT} = r \times T_1 + f \times T_1 = T_1 (r + f)$, where $T_0$ and $T_{IT}$ are contributed by the never-treatment exposed and ever-treatment exposed subjects, respectively. Under the assumption of constant hazard of failure, the failure rate is calculated as the number of failures divided by the corresponding follow-up person-time. Thus, the failure rate under treatment is $N_1/T_1'$, the failure rate not under treatment is $N_0/T_0'$, and the failure rate ratio obtained from a time-dependent analysis is:

\[
RR = \frac{N_1/T_1'}{N_0/T_0'} = \frac{N_1/(T_1 - T_{IT})}{N_0/(T_0 + T_{IT})}. \tag{C.1}
\]

C.1.1 Misclassifying Immortal Time

Misclassifying the observed immortal time $T_{IT}$ as treated time leads to a failure rate of $N_1/T_1$ for the ever-treatment exposed subjects, a failure rate of $N_0/T_0$ for the never-treatment exposed subjects, and the failure rate ratio

\[
RR' = \frac{N_1/T_1}{N_0/T_0}.
\]

Comparing $RR'$ with the correct rate ratio $RR$ yields [80]:

\[
\begin{aligned}
\frac{RR'}{RR}
 &= \frac{(N_1/T_1)/(N_0/T_0)}{\{N_1/(T_1 - T_{IT})\}/\{N_0/(T_0 + T_{IT})\}}
  = \frac{T_0}{T_1} \times \frac{T_1 - T_{IT}}{T_0 + T_{IT}} \\
 &= r \times \frac{T_1 - f \times T_1}{T_0 + f \times T_1}
  = r \times \frac{T_1 (1 - f)}{r \times T_1 + f \times T_1} \\
 &= r \times \frac{1 - f}{r + f}
  = (1 - f) \times \frac{r}{r + f}.
\end{aligned}
\tag{C.2}
\]

Under the assumption of constant hazard, this approach, therefore, always underestimates the correct failure rate ratio. As a lower rate ratio is indicative of less hazard or risk, this approach always overestimates (inflates) the treatment effect. Varying the $r$ and $f$ parameters in equation (C.2) yields the Appendix Figure C.1 (upper panel 1). We can see a larger downward bias (in $RR'/RR$) for increasing values of $f$, the fraction of the immortal person-time in the ever-treatment exposed subjects. For different ratios $r = T_0/T_1$ ($r = 0.25, 0.5, 1, 2, 4, 8$), the pattern of $RR'/RR$ looks similar. The higher values of $r$ yield slightly less bias (in $RR'/RR$).

Figure C.1: Risk ratios of misclassified immortal time ($RR'$), excluding immortal time ($RR''$) and PTDM ($RR'''$) methods compared to that of a time-dependent analysis $RR$ in terms of various fraction of immortal time $f$ and ratio of person-times under no treatment versus under treatment $r$ under the assumption of constant hazard.
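The closed form in equation (C.2) is easy to evaluate numerically. The following is a minimal Python sketch, not code from the thesis (whose analyses were carried out in R); the function name `misclassified_bias_ratio` and the grid of $f$ values are illustrative assumptions, while the $r$ values are those listed above.

```python
def misclassified_bias_ratio(r: float, f: float) -> float:
    """Return RR'/RR from equation (C.2): (1 - f) * r / (r + f).

    r: ratio of person-times T0/T1 (never-treated over ever-treated).
    f: fraction of ever-treated person-time that is immortal, TIT/T1.
    """
    if r <= 0:
        raise ValueError("r must be positive")
    if not 0 <= f < 1:
        raise ValueError("f must lie in [0, 1)")
    return (1.0 - f) * r / (r + f)


# Tabulate the ratio for the r values quoted in the text and a few
# illustrative (assumed) fractions of immortal time.
for r in (0.25, 0.5, 1, 2, 4, 8):
    cells = [f"f={f:.1f}: {misclassified_bias_ratio(r, f):.3f}" for f in (0.1, 0.3, 0.5)]
    print(f"r={r}: " + ", ".join(cells))
```

A ratio below 1 means the misclassified analysis understates the rate ratio; under this formula the ratio equals 1 only when $f = 0$.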
C.1.2 Excluding Immortal Time

Excluding the immortal time yields a failure rate under treatment of $N_1/T_1'$, because $N_{IT}$, the number of failures during the immortal time $T_{IT}$, is zero. The immortal time $T_{IT}$ is not included in the calculation of the failure rate for the untreated group, leading to a failure rate of $N_0/T_0$ and the failure rate ratio

\[
RR'' = \frac{N_1/T_1'}{N_0/T_0}.
\]

Comparing $RR''$ to the correct rate ratio $RR$ yields [80]:

\[
\frac{RR''}{RR}
 = \frac{(N_1/T_1')/(N_0/T_0)}{(N_1/T_1')/(N_0/T_0')}
 = \frac{T_0}{T_0'}
 = \frac{T_0}{T_0 + T_{IT}}
 = \frac{r \times T_1}{r \times T_1 + f \times T_1}
 = \frac{r}{r + f}. \tag{C.3}
\]

As in the previous situation, this approach, therefore, always underestimates the correct failure rate ratio, overestimating the effect of treatment. Varying the $r$ and $f$ parameters in equation (C.3) yields the Appendix Figure C.1 (upper panel 2). This also shows a downward bias (in $RR''/RR$) for increasing values of $f$, the fraction of the immortal person-time in the ever-treatment exposed subjects. However, the bias (in $RR''/RR$) is substantially reduced for the higher values of $r$, the ratio of the person-times in the never-treatment exposed and ever-treatment exposed subjects. If the never-treatment exposed cohort is much larger than the ever-treatment exposed cohort, the bias from this approach may be negligible, even for large fractions of immortal time $f$. Therefore, use of this approach may be reasonable in some settings [252].

C.2 Illustration of the Prescription Time-distribution Matching Approach

The PTDM approach can be illustrated as follows. Let us consider the time of eligibility to receive treatment as the baseline. The length of time from the first eligibility date $t_0 = 0$ to the treatment initiation $T_A$ for the treated subjects is the wait-period or immortal time $T_{IT}$.

Figure C.2: An illustration of prescription time-distribution matching

To apply this method, first, the wait-periods $T_{j,IT}$ for each of the treated subjects $j$ are listed.
To achieve balance in both treatment groups, the distribution of these wait-periods $T_{j,IT}$ for the treated subjects needs to be matched with a similar part of the follow-up time for the untreated subjects. To achieve this, for each untreated subject $j'$, a wait-period $T_{j,IT}$ is selected at random from the created list of wait-periods for the treated subjects and is assigned to the untreated subject $j'$. If this wait-period $T_{j,IT}$ is longer than the event time $T_{j'}$ or the censoring time $T^C_{j'}$ of this untreated subject $j'$, the untreated subject $j'$ gets excluded from further analysis. For simplicity, let us first assume that both groups have the same number of subjects, i.e., $N_0 = N_1$. Then this process should match the time-distribution of $T_{j,IT}$ for the treated subjects to the assigned wait-period distribution of $T_{j',IT}$ of the untreated subjects (as shown in Figure C.2). The wait-periods for the treated subjects and the matched contributions for the untreated subjects are deleted together.

The immortal time is $T_{IT} \equiv \sum_{j=1}^{N_1} T_{j,IT}$, and we denote $T_{IT}' \equiv \sum_{j'=1}^{N_0} T_{j',IT}$. After excluding the observed and assigned wait-times from both groups, the unexposed time under consideration is $T_0'' = T_0 - T_{IT}'$ and the exposed time under consideration is $T_1' = T_1 - T_{IT}$. As $T_{j,IT}$ and $T_{j',IT}$ follow the same distribution, assuming $N_1 = N_0$ we have $T_{IT} \approx T_{IT}'$, and the balance should be restored due to the elimination of the similar wait-periods from both groups. Subjects in both groups are now followed from their new baselines until reaching outcome or censoring. Under the assumption of constant hazard, the failure rate ratio is calculated as

\[
RR''' = \frac{N_1/T_1'}{N_0/T_0''}.
\]

Let us first derive the formula of the rate ratio $RR'''$ for the PTDM approach under two further simplifying assumptions: $T_{IT}' = T_{IT}$ and the number of subjects discarded from the never-treatment exposed group, $N_{IT}' = 0$. We will derive the formula for more general settings later.
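Before turning to the derivation, the assignment step of the PTDM approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation: the function name `ptdm_assign` is hypothetical, wait-periods are sampled with replacement, and a subject whose follow-up exactly equals the assigned wait-period is retained (the text only says that subjects with a longer wait-period are excluded).

```python
import random


def ptdm_assign(treated_waits, untreated_followup, seed=0):
    """Assign wait-periods drawn from the treated group to untreated subjects.

    treated_waits: wait-periods T_{j,IT} observed for the treated subjects.
    untreated_followup: event or censoring time of each untreated subject,
        measured from the eligibility date.
    Returns (index, assigned_wait, remaining_followup) for every untreated
    subject that is retained; subjects whose assigned wait-period exceeds
    their follow-up time are excluded.
    """
    rng = random.Random(seed)
    retained = []
    for idx, followup in enumerate(untreated_followup):
        wait = rng.choice(treated_waits)  # assumption: sampling with replacement
        if wait > followup:
            continue  # event or censoring occurred before the assigned start
        # Follow-up is re-based so that time zero is the end of the wait-period.
        retained.append((idx, wait, followup - wait))
    return retained


# Illustrative (made-up) inputs: two treated wait-periods, three untreated subjects.
print(ptdm_assign([2.0, 5.0], [1.0, 10.0, 3.0], seed=1))
```

In this toy input the first untreated subject is always excluded, because its follow-up time is shorter than every wait-period on the list, while the second is always retained.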
Comparing $RR'''$ with the correct rate ratio $RR$ in equation (C.1) yields:

\[
\frac{RR'''}{RR}
 = \frac{(N_1/T_1')/(N_0/T_0'')}{(N_1/T_1')/(N_0/T_0')}
 = \frac{T_0''}{T_0'}
 = \frac{T_0 - T_{IT}}{T_0 + T_{IT}}
 = \frac{r \times T_1 - f \times T_1}{r \times T_1 + f \times T_1}
 = \frac{r - f}{r + f}. \tag{C.4}
\]

We see that $RR'''/RR$ can also be expressed as a function of $r$ and $f$. Varying the $r$ and $f$ parameters yields Appendix Figure C.1 (lower panel). Unfortunately, this approach also shows a downward bias for increasing values of $f$, the fraction of the immortal person-time in the ever-treatment exposed subjects. As for the exclusion method, the bias (in $RR'''/RR$) is substantially reduced for high values of $r$, the ratio of the person-times in the never-treatment exposed and ever-treatment exposed subjects. However, small values of $r$ and large values of $f$ have much more detrimental effects on the bias compared to the misclassification and exclusion approaches.

To obtain equation (C.4), we assumed that the number of failures in the assigned wait-period $T_{IT}'$ for the untreated patients is zero; i.e., $N_{IT}' = 0$. In general, for $N_{IT}' \ge 0$, the total untreated person-time $T_x$ of the $N_{IT}'$ patients who had failures within the assigned wait-times is excluded and the formula becomes (set $x = T_x/T_1 \ge 0$):

\[
\begin{aligned}
\frac{RR'''}{RR}
 &= \frac{\{N_1/(T_1 - T_{IT})\}/\{(N_0 - N_{IT}')/(T_0 - T_{IT} - T_x)\}}{\{N_1/(T_1 - T_{IT})\}/\{N_0/(T_0 + T_{IT})\}}
  = \left(\frac{N_0}{N_0 - N_{IT}'}\right) \frac{T_0 - T_{IT} - T_x}{T_0 + T_{IT}} \\
 &= \left(\frac{N_0}{N_0 - N_{IT}'}\right) \times \frac{r - f - x}{r + f}
  \ge \frac{r - f}{r + f}.
\end{aligned}
\tag{C.5}
\]

Therefore, equation (C.4) is actually a lower bound for the bias in $RR'''/RR$. Also, if $N_{IT}' = 0$, then $T_x = 0$ as well.

Now, let us relax the assumption that $T_{IT}' = T_{IT}$. Set $T_{IT}' = q \times T_{IT}$; that is, $q$ is the ratio of assigned and observed wait-periods.
Here, $q > 1$ for the setting where there are more subjects in the never-treatment exposed group than the ever-treatment exposed group, and otherwise $0 < q < 1$. Then, $T_0'' = T_0 - T_{IT}' = T_0 - q \times T_{IT}$ and $T_1' = T_1 - T_{IT}$, and the derivation leading to (C.5) is modified as follows:

\[
\begin{aligned}
\frac{RR'''}{RR}
 &= \frac{\{N_1/(T_1 - T_{IT})\}/\{(N_0 - N_{IT}')/(T_0 - T_{IT}' - T_x)\}}{\{N_1/(T_1 - T_{IT})\}/\{N_0/(T_0 + T_{IT})\}}
  = \left(\frac{N_0}{N_0 - N_{IT}'}\right) \frac{T_0 - T_{IT}' - T_x}{T_0 + T_{IT}} \\
 &= \left(\frac{N_0}{N_0 - N_{IT}'}\right) \frac{T_0 - q \times T_{IT} - x \times T_1}{T_0 + T_{IT}}
  = \left(\frac{N_0}{N_0 - N_{IT}'}\right) \frac{r - q \times f - x}{r + f}.
\end{aligned}
\tag{C.6}
\]

The equations (C.4) - (C.6) and Appendix Figure C.1 show the general pattern of bias and allow general statements about the approaches under consideration. However, in our simulation studies, we took into account additional specific details of a more realistic epidemiological setting, such as censoring, different rates of failures, covariate under consideration, etc.

C.3 Constructing a Mini-trial in the Sequential Cox Approach

Figure C.3: An illustration of the sequential Cox approach

To illustrate the method, consider Appendix Figure C.3, where the follow-up times for 11 subjects are outlined. Patient 1 was not under treatment when she entered the study. She started taking the treatment in the $m = 4$th month and was censored during the 5th month. Similarly, subject 5, who was never under treatment, was censored during the 6th month. Now, suppose we want to create the mimicked trial considering the 4th month as the reference interval. We eliminate the subjects who received treatment before the 4th month, i.e., the 3rd, 7th and 11th subjects will be discarded. Then for the subjects who started treatment after the 4th month, we censor them at the time of treatment start, i.e., the 6th and 10th subjects are censored at the 5th and 6th months respectively.
Then, under the assumption that treatment status remains the same for the entire month, subjects 1, 4 and 9 will be considered the treated group and subjects 2, 5, 6, 8 and 10 will be considered the control group, for the mimicked trial starting at the beginning of the 4th month.

Similarly, we can identify the subjects for the treatment and control groups in the mimicked trials starting at the beginning of other months. This yields multiple mimicked RCTs, one for each of the time intervals (say, months) of treatment start. The treatment effect can be estimated separately from each mimicked trial data and then aggregated (i.e., averaged) to estimate the average treatment effect.

C.4 Implementation of the Sequential Cox Approach in R

The coxph function in the survival package [249] is used to fit both time-independent and time-dependent Cox PH models. While preparing the data for the mini-trials of the sequential Cox approach, we can code it in either long or wide form; both will produce the same result:

• In the long form, each row of the data can represent the smallest time interval to be used (such as month) and the multiple rows per subject specify the start and stop of all the intervals. Rows without any change in covariate values can be merged (one row starting at the baseline, one for starting at the m-th month and another for the lagged values) [73]. Then the counting process formulation of the coxph can be applied specifying the start and end time of the intervals and the corresponding event status.

• In the wide form, each subject in the m-th mini-trial will produce only one row in the mini-trial data containing all the corresponding information at baseline, the m-th interval and the lagged data of m-th interval as separate covariates [75].
  Then the standard coxph can be applied specifying the follow-up time and event status.

In the coxph function, the option strata is set to fit a stratified Cox model for the sequential Cox approach. Also, the options such as cluster and robust = TRUE are set to obtain the robust (sandwich) variance estimate. This is an approximate grouped jackknife variance estimate [253, p.170] when multiple observations per subject are present. To obtain bootstrap estimates [165], the lapply function is used on each bootstrap sample to estimate the corresponding IPCWs and subsequently the HR from a Cox PH model. In this chapter we fitted pooled logistic regression using the glm function in the stats package to estimate the IPCWs. Alternatively, Aalen's additive regression can be fitted using the aalen function in the timereg package for the same purpose [75].

C.5 Survival Data Simulation via Permutation Algorithm

The algorithm has the following steps:

1. For each subject $i = 1, 2, \ldots, n$, we generate the survival time $T_i$ using a specified distribution.

2. For each subject $i$, we generate the censoring time $T^C_i$ using a specified distribution.

3. We find the observed survival time $T^*_i = \min(T_i, T^C_i)$ and the binary censoring indicator $C_i = I(T_i \ge T^C_i)$, which is 1 if censored and 0 otherwise.

4. Repeat steps 1-3 $n$ times and sort the survival status tuples $(T^*_i, C_i)$ with respect to $T^*_i$ in increasing order.

5. We generate $n$ covariate matrices $X_i = (A_{im}, L_{i0}, L_{im})$ with dimensions $(m \times p)$, where the $m = 0, 1, \ldots, K$ rows correspond to the different time intervals or visits when measurements are taken and the $p$ columns correspond to the predictor variables, including treatment ($A_m$), time-fixed and/or time-varying covariates ($L_0$ and/or $L_m$). For subject $i$, $X_{im}$, the $m$-th row of $X_i$, is a vector of variable values at time $m$.

6. According to the ordered $T^*_i$ listed in step 4, we begin assigning the survival status tuple $(T^*_i, C_i)$ to covariate values from $X_{im}$ as follows. At time $T^*_i$, variable values (treatment and covariate) are sampled with probabilities $p_{im}$ defined below based on the Cox model's partial likelihood:

\[
p_{im} =
\begin{cases}
\dfrac{\exp(\psi X_{im})}{\sum_{j \in r_i} \exp(\psi X_{jm})}, & \text{if } C_i = 0, \\[1ex]
\dfrac{1}{\sum_{j \in r_i} I(j \in r_i)}, & \text{if } C_i = 1,
\end{cases}
\]

   where $\psi$ is the vector of log-hazards for the corresponding variables and $I(j \in r_i)$ indicates whether a subject is within a given risk set $r_i$ for time $T^*_i$.

7. The subject $i$ with the covariate values $X_{im}$ is assigned the observed time $T^*_i$. The selected $X_{im}$ is removed from further calculation.

The permutation algorithm is implemented in the PermAlgo package in R [254].

C.6 Additional Simulation Results

C.6.1 When More Events are Available

Table C.1: Comparison of the analytical approaches to adjust for immortal time bias from simulation-I (one baseline covariate and time-dependent treatment exposure) of 1,000 datasets, each containing 2,500 subjects followed for up to 10 time-intervals (frequent event case $\lambda_0 = 0.10$).

Approach          Bias     SD($\hat{\psi}_1$)   se($\hat{\psi}_1$)   CP      Power
Full cohort        0.000   0.061                0.060                0.951   1.000
Included IT       -2.149   0.062                0.059                0.000   1.000
Excluded IT       -1.220   0.055                0.051                0.000   1.000
PTDM              -1.284   0.073                0.070                0.000   1.000
Sequential Cox    -0.038   0.071                0.070                0.899   1.000
MSCM               -       -                    -                    -       -

PTDM, Prescription time distribution matching; IT, Immortal time; MSCM, Marginal structural Cox model.
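Returning to the permutation algorithm of Section C.5, its steps can be sketched in Python as follows. This is a hedged illustration, not the PermAlgo implementation: the function names `simulate_times` and `permute`, the exponential event and censoring distributions, the rule mapping an observed time to a covariate row, and all numeric inputs are assumptions made for the example.

```python
import math
import random


def simulate_times(n, event_rate, censor_rate, rng):
    """Steps 1-4: draw event and censoring times, then sort by observed time.

    Exponential distributions are an illustrative assumption. Returns a list of
    (observed_time, censored) tuples in increasing order of observed time,
    where censored is 1 if T_i >= T_i^C and 0 otherwise.
    """
    tuples = []
    for _ in range(n):
        t_event = rng.expovariate(event_rate)
        t_censor = rng.expovariate(censor_rate)
        tuples.append((min(t_event, t_censor), int(t_event >= t_censor)))
    return sorted(tuples)


def permute(status, covariates, psi, rng):
    """Steps 6-7: match each sorted (time, censored) tuple to a covariate matrix.

    covariates[j][m] is the covariate vector of subject j in interval m.
    Returns a list of (subject, observed_time, censored).
    """
    n_intervals = len(covariates[0])
    risk_set = list(range(len(covariates)))
    matched = []
    for t_obs, censored in status:
        # Assumed mapping from a continuous time to a row of the covariate matrix.
        m = min(int(t_obs), n_intervals - 1)
        if censored:
            weights = [1.0] * len(risk_set)  # uniform over the risk set
        else:
            # Partial-likelihood weights exp(psi * X_jm) over the risk set.
            weights = [
                math.exp(sum(p * x for p, x in zip(psi, covariates[j][m])))
                for j in risk_set
            ]
        chosen = rng.choices(risk_set, weights=weights, k=1)[0]
        risk_set.remove(chosen)  # the selected matrix leaves further calculation
        matched.append((chosen, t_obs, censored))
    return matched


rng = random.Random(2015)
status = simulate_times(6, event_rate=0.3, censor_rate=0.1, rng=rng)
# Step 5 with made-up values: treatment indicator and one covariate per interval.
covs = [[[rng.randint(0, 1), rng.random()] for _ in range(4)] for _ in range(6)]
print(permute(status, covs, psi=[0.5, -0.2], rng=rng))
```

Each covariate matrix is used exactly once, so the marginal distribution of the observed times is preserved while their pairing with covariate histories follows the partial-likelihood weights.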
Table C.2: Comparison of the analytical approaches to adjust for immortal time bias from simulation-II (one baseline covariate, one time-dependent covariate and time-dependent treatment exposure) of 1,000 datasets, each containing 2,500 subjects followed for up to 10 time-intervals (frequent event case).

Approach              Bias     SD($\hat{\psi}_1$)   se($\hat{\psi}_1$)   CP      Power
Full cohort           -0.002   0.059                0.060                0.960   1.000
Full cohort (Base)    -0.208   0.067                0.070                0.130   0.990
Included IT           -1.638   0.076                0.076                0.000   1.000
Excluded IT           -1.411   0.069                0.069                0.000   1.000
PTDM                  -1.440   0.085                0.084                0.000   1.000
Sequential Cox         0.174   0.066                0.068                0.273   1.000
MSCM                  -0.014   0.058                0.060                0.952   1.000

PTDM, Prescription time distribution matching; IT, Immortal time; MSCM, Marginal structural Cox model.

Table C.3: Comparison of the analytical approaches to adjust for immortal time bias from simulation-III (one time-dependent confounder and time-dependent treatment exposure) of 1,000 datasets, each containing 2,500 subjects followed for up to 10 time-intervals (frequent event case).

Approach              Bias     SD($\hat{\psi}_1$)   se($\hat{\psi}_1$)   CP      Power
Full cohort            0.044   0.067                0.065                0.888   1.000
Full cohort (Base)     0.007   0.068                0.066                0.942   1.000
Included IT           -2.095   0.090                0.084                0.000   1.000
Excluded IT           -1.629   0.071                0.068                0.000   1.000
PTDM                  -1.575   0.090                0.074                0.000   1.000
Sequential Cox         0.202   0.099                0.099                0.464   1.000
Sequential Cox †       0.201   0.096                0.096                0.433   1.000
Sequential Cox §       0.181   0.096                0.096                0.522   1.000
MSCM                   0.000   0.069                0.068                0.942   1.000

PTDM, Prescription time distribution matching; IT, Immortal time; MSCM, Marginal structural Cox model.
† Sequential Cox not adjusting for either time-dependent confounder or informative censoring.
§ Sequential Cox adjusting for both time-dependent confounder in the regression for estimating β and informative censoring via IPCW.
C.7 Additional MS Data Analysis

C.7.1 Prescription Time-distribution Matching

Figure C.4: Estimated hazard ratio from the PTDM method to estimate the causal effect of β-IFN on time to sustained EDSS 6 for patients with relapsing-onset multiple sclerosis (MS), British Columbia, Canada (1995-2008)

Table C.4: Mean (SD) of the estimated parameters using PTDM from the MS example with 1,000 different starting seed values.

HR             se($\widehat{HR}$)   Average 95% CI
1.44 (0.09)    0.28 (0.02)          0.97 - 2.11

PTDM, Prescription time distribution matching.
The analyses are adjusted for baseline covariates: gender, EDSS score, age, disease duration and time-dependent confounder 'cumulative relapse'.

C.7.2 Sequential Cox Approach

Table C.5: Estimated hazard ratio using the sequential Cox approach to estimate the causal effect of β-IFN on time to sustained EDSS 6 for patients with relapsing-onset multiple sclerosis (MS), British Columbia, Canada (1995-2008), when IPCWs are calculated from the combined dataset of all mini-trials.

                                                       Weights
Approach          HR     se($\widehat{HR}$)   95% CI        Average (log-SD)   range
Sequential Cox    1.11   0.29                 0.66 - 1.85   1.00 (-4.15)       0.64 - 1.40

The HR for the treatment is reported. The analyses are adjusted for baseline covariates: sex, EDSS score, age, disease duration and time-dependent confounder 'cumulative relapse' measured at baseline, treatment initiation month and its lagged value.

Figure C.5: Density plots of the estimated IPC weights from the MS data (estimated from each mini-trial separately) in all the reference intervals using the sequential Cox approach

Figure C.6: Density plots of the estimated IPC weights from the MS data (estimated from the aggregated data of all mini-trials) in all the reference intervals using the sequential Cox approach
