Comparative efficacy and safety of first-line treatments for HIV patients for clinical guideline development… Kanters, Steve 2019


COMPARATIVE EFFICACY AND SAFETY OF FIRST-LINE TREATMENTS FOR HIV PATIENTS FOR CLINICAL GUIDELINE DEVELOPMENT AND THE IMPACT OF INDIVIDUAL PATIENT DATA

by

STEVE KANTERS

BSc, McGill University, 2004
MSc, The University of British Columbia, 2006

A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Population and Public Health)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

September 2019

© Steve Kanters, 2019

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled: Comparative efficacy and safety of first-line treatments for HIV patients for clinical guideline development and the impact of individual patient data, submitted by Steve Kanters in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Population and Public Health.

Examining Committee:
Nick Bansback, Population and Public Health, Supervisor
Mohammad Ehsanul Karim, Population and Public Health, Supervisory Committee Member
Aslam Anis, Population and Public Health, Supervisory Committee Member
Hubert Wong, Population and Public Health, University Examiner
Michael John Milloy, AIDS, University Examiner

Additional Supervisory Committee Members:
Kristian Thorlund, Health Research Methods, Supervisory Committee Member

Abstract

Since 2008, efavirenz+tenofovir+emtricitabine (EFV+TDF+XTC) has been the preferred first-line antiretroviral therapy (ART) regimen for treating HIV in most countries. With an expanding choice of ART, should a newer treatment be preferred? The therapeutic landscape was assessed for efficacy, safety and tolerability through a systematic literature review (SLR) and network meta-analysis (NMA). Data were analyzed using aggregate data (AgD) from publications for each population of interest. Ninety eligible trials were identified in the principal SLR and 65 were included in analyses. There was high certainty that dolutegravir (DTG) was superior to EFV with respect to viral suppression, change in CD4 cell counts, discontinuation, and adverse events. DTG and EFV were comparable among TB-HIV co-infected patients. Among pregnant women initiating DTG, there appeared to be fewer adverse events than with EFV.

To determine whether the inclusion of individual patient data (IPD) would impact decision-making and to explore the impact of integrating IPD in varying ways, the SLR and NMA were expanded through the addition of IPD obtained for three critical trials: SINGLE, FLAMINGO and SPRING-2. Use of IPD did not alter the conclusions. In the few cases where IPD-based analyses were selected, the change in estimates did not meaningfully affect their utility for the development of clinical guidelines.

A simulation study was conducted to determine how network size, density, proportion of IPD, and nature of effect-modification could predict the impact of IPD on NMA results. The inclusion of IPD may be most impactful among small and/or sparse networks of evidence. Having a higher proportion of treatment comparisons with IPD also improves the NMA estimates, particularly among larger networks of evidence. Similarly, these simulations suggested that while inclusion of IPD led to improvements with respect to both bias and precision of estimates, these improvements decreased within larger and denser networks, such as those used in the HIV analyses.

In conclusion, the findings support the use of DTG+TDF+XTC as the preferred first-line regimen, supporting the change in HIV guidelines by the World Health Organization in late 2018. The analyses provide important insights into the types of networks where IPD would influence results of NMA.
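As a hedged illustration of the indirect-comparison idea that network meta-analysis generalizes, the sketch below applies the standard anchored (Bucher) adjusted indirect comparison on the log odds ratio scale. It is not the Bayesian NMA model used in this dissertation. The function names and the numeric inputs are hypothetical and chosen only for illustration; they are not taken from any trial in the evidence base.

```python
import math


def indirect_log_or(log_or_ab, se_ab, log_or_ac, se_ac):
    """Anchored indirect comparison of C versus B through common comparator A.

    log_or_ab: log odds ratio of B versus A (from AB trials)
    log_or_ac: log odds ratio of C versus A (from AC trials)
    Returns the log odds ratio of C versus B and its standard error.
    """
    log_or_cb = log_or_ac - log_or_ab
    # The AB and AC trials are independent, so their variances add.
    se_cb = math.sqrt(se_ab ** 2 + se_ac ** 2)
    return log_or_cb, se_cb


def odds_ratio_with_ci(log_or, se, z=1.96):
    """Back-transform a log odds ratio to an odds ratio with a 95% interval."""
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    log_or_cb, se_cb = indirect_log_or(0.2, 0.3, 0.5, 0.4)
    estimate, lower, upper = odds_ratio_with_ci(log_or_cb, se_cb)
    print(f"OR (C vs B): {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The indirect estimate is always less precise than either direct input because the variances add, which is one reason sparse networks leave more room for additional data to change the estimates.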
Lay Summary

Clinical guidelines are used by policy-makers, healthcare providers and patients to inform health-related decision-making, including choice of treatment. Atripla, a multi-drug pill, has been the most widely recommended treatment for previously untreated HIV patients since 2008. This research identified and assessed the evidence to determine whether a newer treatment should be preferred for previously untreated HIV patients on the basis of efficacy, safety and tolerability. This research also explored whether including patient-level data from select trials, rather than only summary results for each trial, improved the evidence base. Computer simulations were then used to show that while including patient data will often improve the evidence, adding data from relatively few patients to a large number of trials will not.

Overall, the results suggest that a dolutegravir-based treatment should be preferred to Atripla. This work supported the World Health Organization’s decision to change the recommendation in December 2018.

Preface

I began working in HIV as a statistician at the British Columbia Centre for Excellence in HIV/AIDS in 2009, where I developed an interest in HIV research. In 2011, I moved to work as a statistician at what would later become Redwood Outcomes, where I expanded my interest to network meta-analyses. This work led me to begin my PhD part time in 2012. I originally began my PhD with a different supervisor on a different topic; however, in my second year, my real research interests emerged, namely HIV and network meta-analysis. On this basis, I developed a program of research to determine whether any of the newer 3-drug antiretroviral therapy regimens should be favoured over efavirenz/emtricitabine/tenofovir-disoproxil-fumarate as the preferred first-line regimen in a public health framework and to expand upon the network meta-analysis methods to include non-comparative studies.
During this time, I met with Nick Bansback, who has expertise in network meta-analysis, and in late 2013 he agreed to become my supervisor. With his support, I was awarded a CIHR Doctoral Research award for the above program of research. In 2014, I planned to update the systematic literature review I began in 2013 and use it to conduct a network meta-analysis. In late 2014, the World Health Organization (WHO) put out a call to conduct evidence syntheses to inform an update to the 2013 WHO Consolidated Guidelines on the use of ARVs for Treating and Preventing HIV Infection. I proposed that the WHO use a network meta-analysis approach rather than the collection of pairwise meta-analyses they had used to inform all of their previous HIV guidelines. In 2015, I conducted the systematic literature review and network meta-analyses to help determine which ART regimen should be the preferred first-line regimen (i.e., the HIV knowledge gap previously established for my dissertation research program). This work was used to inform the 2016 WHO Consolidated HIV Guidelines and it was published in:

Kanters S, Vitoria M, Doherty M, Socias ME, Ford N, Forrest JI, Popoff E, Bansback N, Thorlund K, Mills EJ. Comparative efficacy and safety of first-line antiretroviral therapy for the treatment of HIV infection: a systematic review and network meta-analysis. Lancet HIV. Nov 2016;3(11):510-520.

I conceived the study concept and design and received support from Nick Bansback, Kristian Thorlund, Edward Mills, Marco Vitoria, Meg Doherty, Eugenia Socias, and Nathan Ford. I had planned to use this work as my Chapter 2, but for reasons explained below, I updated the review and analyses in 2018, and the update was the basis of Chapter 2. On the basis of this work, I wrote an editorial:

Kanters S, Ford N, Druyts E, Thorlund K, Mills EJ, Bansback N. Use of network meta-analysis in clinical guidelines. Bull World Health Organ. Oct 1 2016;94(10):782-784.
I conceived the idea for this paper and further refined it with Nick Bansback and Nathan Ford. I wrote the first draft and all authors contributed to critical feedback and writing of subsequent drafts. This has become part of Chapter 1.

While conducting the review and analyses, I became curious as to whether applying individual patient-level data would have an impact on the results of the evidence synthesis and on the development of the guidelines. HIV trials are particularly heterogeneous and challenging with respect to evidence synthesis. In August of 2016, I also wanted to compare methods of integrating patient data with traditional network meta-analyses. I submitted a request for the de-identified individual patient data of three pivotal clinical trials to ClinicalStudyDataRequest.com. The submission was entirely written and prepared by myself, though the idea was discussed with my committee. The data were received in the summer of 2017 following the signature of an agreement in April 2017. Given that many other trials had been published since 2015, the 2015 SLR was updated to ensure that the results remained pertinent. The expanded traditional analyses formed the basis of Chapter 2 and the analyses integrating and exploring the patient data formed the basis of Chapter 3 of this dissertation.

Just as Chapter 3 was born of the work conducted in Chapter 2, so was Chapter 4 born of the work in Chapter 3. Following the limited impact of individual patient data on the analyses, I hypothesized that the reason for this may have been the small number of patients included in the analysis relative to the total population in the evidence base. I designed a simulation study to better understand how many patients are needed and whether other extrinsic factors could help predict when patient data could be expected to impact an analysis. I designed this research for Chapter 4 under the supervision of my full committee. I conducted all analyses.
Ethics approval was obtained for this thesis (certificate number: H16-02646).

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Symbols
List of Abbreviations
Glossary
Acknowledgements
Dedication
Chapter 1: Introduction
  1.1 Human immunodeficiency virus
    1.1.1 History and treatment
    1.1.2 Current guidelines and new developments
  1.2 Current evidence synthesis methods
    1.2.1 Conventional meta-analysis
    1.2.2 Network meta-analysis
    1.2.3 The rationale for using NMA for the development of clinical guidelines
  1.3 Individual patient data in evidence synthesis
  1.4 Summary of knowledge gaps
  1.5 Dissertation objectives
  1.6 Dissertation outline
Chapter 2: Systematic literature review and network meta-analysis of first-line ART
  2.1 Synopsis
  2.2 Introduction
  2.3 Objectives
  2.4 Methodology
    2.4.1 Systematic literature review
      2.4.1.1 Sources
      2.4.1.2 Search strategy
      2.4.1.3 Study selection
      2.4.1.4 Study quality
      2.4.1.5 Data extraction
    2.4.2 Statistical analyses
      2.4.2.1 Network meta-analyses
      2.4.2.2 Node definitions and backbone adjustments
      2.4.2.3 Models
      2.4.2.4 Adjusted analysis
      2.4.2.5 Evaluation of consistency between direct and indirect comparisons
      2.4.2.6 Model selection
      2.4.2.7 Software
  2.5 Results: Adults and adolescents
    2.5.1 Systematic literature review study selection
    2.5.2 Analysis set study selection
    2.5.3 Network meta-analysis results
      2.5.3.1 Viral suppression
      2.5.3.2 Increase in CD4 cell counts
      2.5.3.3 Mortality
      2.5.3.4 AIDS defining illnesses
      2.5.3.5 Discontinuations
      2.5.3.6 Discontinuations due to adverse events
      2.5.3.7 Treatment-related and emergent adverse events
      2.5.3.8 Treatment-related and treatment-emergent serious adverse events
      2.5.3.9 Regimen substitutions
    2.5.4 Remarks relative to the 2015 analyses
  2.6 Results for the TB co-infected subpopulation
    2.6.1 The INSPIRING trial
    2.6.2 Systematic literature review study selection
    2.6.3 Network meta-analysis results
      2.6.3.1 Efficacy
      2.6.3.2 Tolerability
      2.6.3.3 Safety
  2.7 Results for the pregnant and breastfeeding women subpopulation
    2.7.1 Systematic literature review study selection
    2.7.2 Summary of the evidence base
  2.8 Results: Children and adolescents
  2.9 Discussion
Chapter 3: Investigation into the benefits of using IPD for the systematic literature review and network meta-analysis of first-line ART
  3.1 Synopsis
  3.2 Introduction
  3.3 Objective
  3.4 Methodology
    3.4.1 Systematic literature review
      3.4.1.1 Eligibility criteria
      3.4.1.2 Data sources and search strategy
      3.4.1.3 Study selection and data extraction
      3.4.1.4 Study quality
    3.4.2 Preparation of the individual patient data
    3.4.3 Statistical analyses
      3.4.3.1 Statistical models
      3.4.3.2 Measures of model comparison
      3.4.3.3 Additional considerations
      3.4.3.4 Software
    3.4.4 Funding
  3.5 Results
    3.5.1 Evidence base
    3.5.2 Stage 1 – Comparing meta-regression adjustments
      3.5.2.1 Viral suppression at 48 weeks
      3.5.2.2 Change from baseline in CD4 at 48 weeks
      3.5.2.3 Discontinuations
      3.5.2.4 Discontinuations due to adverse events
    3.5.3 Stage 2 – Comparative efficacy and safety
  3.6 Discussion
Chapter 4: When does use of individual patient data make a difference?
  4.1 Synopsis
  4.2 Introduction
  4.3 Objective
    4.3.1 Hypotheses
  4.4 Methodology
    4.4.1 Model parameters
    4.4.2 Data generation
      4.4.2.1 Constructing a network
      4.4.2.2 Generating observations
      4.4.2.3 Number of simulations and computation time
      4.4.2.4 Data generation for AgD can lead to almost unrealistically small residual variance
    4.4.3 Data analysis
    4.4.4 Data collection and measures of comparison
    4.4.5 Software
  4.5 Results
    4.5.1 Treatment-effect estimation
      4.5.1.1 Bias and mean-squared error
      4.5.1.2 Estimation precision
      4.5.1.3 Proportion of patients with IPD
    4.5.2 Regression coefficient estimation
    4.5.3 Model diagnostics
    4.5.4 When did IPD help?
  4.6 Discussion
Chapter 5: Discussion
  5.1 Key findings
    5.1.1 Dolutegravir as the preferred anchor treatment
      5.1.1.1 Neural tube defects
      5.1.1.2 Guideline recommendations for first-line ART regimens
      5.1.1.3 NAMSAL trial
    5.1.2 Use of individual patient data in network meta-analysis
    5.1.3 Use of network meta-analysis for clinical guideline development
  5.2 Strengths
  5.3 Limitations
  5.4 Future research
  5.5 Knowledge translation
  5.6 Conclusion
Bibliography
Appendices
  Appendix A
    A.1 Model code
    A.2 Risk of bias
    A.3 Data availability
    A.4 Verification of inconsistency
    A.5 Cross tables summarizing results of all analyses
  Appendix B
    B.1 PRISMA-IPD Checklist
    B.2 Model code
    B.3 Results of individual studies
    B.4 Risk of bias across studies
    B.5 Additional results from Stage 1 analyses
    B.6 Results from Stage 2 analyses
  Appendix C
    C.1 Simulation code R code
    C.2 Additional results

List of Tables

Table 1-1: Scope of the dissertation research program
Table 2-1: List of antiviral agents, their drug class, year of approval and how they are currently used and perceived clinically
Table 2-2: Scope of the systematic literature review in PICOS form
Table 2-3: Cross table of odds ratios with 95% credible intervals comparing the relative efficacy of ARVs for viral suppression at 48 weeks from the fixed-effects network meta-analyses
Table 2-4: Cross table of odds ratios with 95% credible intervals comparing the relative efficacy of ARVs for viral suppression at 24 weeks from the fixed-effects network meta-analyses in HIV-TB co-infected patients
Table 2-5: Cross table of odds ratios with 95% credible intervals comparing the relative efficacy of ARVs for mean change in CD4 cell counts at 24 weeks from the fixed-effects network meta-analyses in HIV-TB co-infected patients
Table 2-6: Data for treatment comparisons of interest for discontinuations due to adverse events outcome in HIV-TB co-infected patients
Table 2-7: Data for treatment comparisons of interest for discontinuations due to adverse events outcome in HIV-TB co-infected patients
Table 2-8: Data for treatment comparisons of interest for the treatment-emergent serious adverse events
Table 2-9: Summary of the Tsepamo study of DTG/TDF/FTC vs EFV/TDF/FTC in pregnant and breastfeeding women initiated on first-line ART
Table 2-10: Summary of evidence among pregnant and breastfeeding women on first-line ART
Table 3-1: Scope of the literature review in PICOS form
Table 3-2: Comparison of model selection and fit for viral suppression at 48 weeks
Table 3-3: Comparison of surface under the cumulative ranking (SUCRA) for viral suppression at 48 weeks
Table 3-4: Comparison of comparative treatment estimates for viral suppression at 48 weeks
Table 3-5: Coefficient estimates across the IPD-AgD NMA for viral suppression at 48 weeks
Table 3-6: Comparison of surface under the cumulative ranking (SUCRA) for change in CD4 at 48 weeks
Table 3-7: Comparison of comparative treatment estimates for change in CD4 at 48 weeks
Table 3-8: Comparison of comparative treatment estimates for discontinuations
Table 3-9: Comparison of model selection and fit for discontinuations due to adverse events
Table 3-10: Comparison of comparative treatment estimates for discontinuations due to adverse events
Table 3-11: Comparison of model selection and fit for viral suppression at 96 weeks
Table 3-12: Comparison of comparative treatment estimates for viral suppression at 96 weeks
Table 3-13: Comparison of comparative treatment estimates for change in CD4 at 96 weeks
Table 4-1: List of the parameters explored through simulation with descriptions
Table 4-2: Make of network according to density and number of nodes
Table 4-3: Summary statistics of the mean squared error of treatment-effect for the two meta-regression adjusted NMA models
Table 4-4: Summary statistics for the multivariate PSRF assessing convergence of simulation models

List of Figures

Figure 1-1: HIV replication cycle and mechanisms of the four classes of currently available treatments
Figure 1-2: General schematic of viral load and CD4 cell counts over the course of HIV
Figure 1-3: Example network of 11 treatments (A-L) and placebo
Figure 1-4: Network of evidence reflecting anchored indirect treatment comparison of AB trials with AC trials³⁹
Figure 1-5: Network of evidence reflecting direct and indirect treatment comparison of AB, AC and BC trials³⁹
Figure 2-1: Flow diagram for principal systematic literature review on the choice of first-line ART among adults and adolescents
31 Figure 2-2: Network of all studies included in the evaluation of first-line ART for the treatment of HIV among adults and adolescents .............................................................................................................................. 33 Figure 2-3: Forest plot of select ARVs comparisons with respect to viral suppression at A. 48 weeks; and B. 96 weeks according to fixed-effects network meta-analysis ................................................................. 35 Figure 2-4: Inconsistency plot comparing direct and indirect evidence for viral suppression at 48 weeks 37 Figure 2-5: Forest plot of select ARVs comparisons with respect to mean change in CD4 cell counts at A. 48 weeks and B. 96 weeks according to fixed-effects network meta-analysis .......................................... 38 Figure 2-6: Forest plot of select ARVs comparisons with respect to mortality according to fixed-effects network meta-analysis ............................................................................................................................... 39 Figure 2-7: Forest plot of select ARVs comparisons with respect to the proportion of patients developing AIDS defining illnesses according to fixed-effects network meta-analysis ................................................ 40 Figure 2-8: Forest plot comparing pair-wise and NMA estimated relative effects of different ARVs with respect discontinuations (all cause) ........................................................................................................... 40 Figure 2-9: Forest plot comparing pair-wise and NMA estimated relative effects of select ARVs with respect discontinuations due to adverse events ..................................................................................................... 41 Figure 2-10: Forest plot of select ARVs comparisons with respect to A. treatment related adverse events and B. 
treatment emergent adverse events according to fixed-effects and random-effects network meta-analysis ...................................................................................................................................................... 41  xvii Figure 2-11: Forest plot of select ARVs comparisons with respect to A. treatment related serious adverse events and B. treatment emergent serious adverse events according to fixed-effects network meta-analysis ................................................................................................................................................................... 42 Figure 2-12: Forest plot comparing pair-wise and NMA estimated relative effects of different ARVs with respect regimen substitution (48 weeks) ................................................................................................... 43 Figure 2-13: Modified FDA snapshot analysis of the percentage of participants (95% CI) with HIV-1 RNA <50 copies/mL ........................................................................................................................................... 44 Figure 2-14: Flow diagram for principal literature review on TB co-infected individuals and first-line ART regimens .................................................................................................................................................... 45 Figure 2-15: Complete network of evidence for patients with HIV-TB co-infection .................................... 46 Figure 2-16: Flow diagram for principal systematic literature review on pregnant and breastfeeding women and first line ART regimens ....................................................................................................................... 50 Figure 3-1: Pictorial demonstration of basic statistical definitions useful for network meta-analysis ......... 
62 Figure 3-2: PRISMA-IPD flow diagram ...................................................................................................... 76 Figure 3-3: Network of all studies included in the evidence base .............................................................. 77 Figure 4-1: Illustration of the different types of effect-modification .......................................................... 101 Figure 4-2: Illustration of the networks constructed using the number of nodes and network density .... 103 Figure 4-3: Density plots summarizing treatment-effect estimates from simulations separated by the number of nodes in the network ............................................................................................................................ 109 Figure 4-4: Density plots summarizing treatment-effect estimates from simulations separated by network density...................................................................................................................................................... 110 Figure 4-5: Density plots summarizing treatment-effect estimates from simulations separated by the proportion of network edges with individual patient data ......................................................................... 111 Figure 4-6: Density plots summarizing treatment-effect estimates from simulations separated by the effect-modification .............................................................................................................................................. 112 Figure 4-7: Density plots summarizing treatment-effect estimates from simulations separated by the trial sizes among individual patient data ......................................................................................................... 
Figure 4-8: Average mean and median mean-squared errors of treatment-effect estimates within specific scenarios
Figure 4-9: Density of bias of treatment
Figure 4-10: Mean treatment-effect standard errors using IPD-NMA and AgD-NMA-MR
Figure 4-11: Relationship between the proportion of patients with IPD and the impact of IPD on mean-squared error
Figure 4-12: Mean regression coefficient standard errors using IPD-NMA and AgD-NMA-MR
Figure 5-1: Evidence hierarchy proposed by Leucht et al336

List of Symbols

!" The overall treatment-by-covariate interaction term for the #$% trial-specific covariate  dAB or d Treatment-effect of intervention B relative to intervention A &(. ) Link function *+,- Response for patient i, on treatment k in study j .,- Number of experimental units in arm k of trial j /,- Proportion of respondents in arm k of trial j 0,- Number of respondents in arm k of trial j 12,-3  Observed variance in response value of arm k of trial j 4,- Observed mean response value arm k of trial j 5", Trial-specific covariate value for the #$% trial-specific covariate in the 6$% treatment  5",- Arm-specific covariate value for the #$% trial-specific covariate in the 6$% treatment-arm in the 7$% trial  5.
8&&", Trial-specific covariate value for the #$% trial-specific covariate in the 6$% treatment in the case of aggregate data in an IPD-AgD model s Standard deviation 9"- Treatment-by-covariate interaction term for the #$% trial-specific covariate in the 6$% treatment 9:", Study-specific effect of subject-level covariate 5+,. 9;"<- Interaction effects of covariate 5+, for treatment k relative to control treatment b =,- Unknown parameter within a general  δj Random-effects estimate of treatment differences in study j (Pairwise meta-analysis) @,A- Random-effects estimate of treatment-effect of treatment k relative to treatment b differences in study j B,- The ‘underlying’ outcome for treatment k in study j and the link function to transform this outcome to a normally distributed scale (such as logit). By underlying, we mean that it represents the parameter stemming from the likelihood function and the observed data. C,A Baseline effect of treat b in the jth trial. µ or l  Study effects within the NMA model equation D,A or D, Baseline effect of treat b in the jth trial. Can also be described as the treatment effect for the reference treatment in the jth trial. q or h Link-function transformed outcome [e.g. logit(probability of success)] E,- The ‘underlying’ outcome for treatment k in study j and the link function to transform this outcome to a normally distributed scale (such as logit). By underlying, we mean that it represents the parameter stemming from the likelihood function and the observed data.  xx E+,- link-function-transformed parameter from the likelihood function of interest for the ith individual, in the jth trial, treated with treatment k. 
σ2 Between-study variance

List of Abbreviations

3TC Lamivudine
AASLD American Association for the Study of Liver Diseases
ABC Abacavir
ACTG AIDS Clinical Trials Group
ADI AIDS-defining illness
AE Adverse events
AgD Aggregate data
AgD-NMA Aggregate data network meta-analysis
AgD-NMA-MR Aggregate data network meta-analysis with meta-regression adjustments
AIDS Acquired immunodeficiency syndrome
ART Antiretroviral therapy
ARV Antiretroviral agent/drug
APA American Psychiatric Association
ASH American Society of Hematology
ATV Atazanavir
AZT Zidovudine
BIC Bictegravir
bid Twice-daily
CD4 Cluster of differentiation 4
CDC Centre for Disease Control
CDISC Clinical Data Interchange Standards Consortium
CENTRAL Cochrane Central Register of Controlled Trials
CFB Change from baseline
CIHR Canadian Institutes of Health Research
CLARITY Clinical Advances through Research and Information Translation
CPU Central processing unit
CrI Credible interval
CROI Conference on Retroviruses and Opportunistic Infections
CSDR ClinicalStudyDataRequest.com
d4T Stavudine
DAA Direct-acting antiviral
ddI Didanosine
DHHS Department of Health and Human Services
DIC Deviance information criterion
DNA Deoxyribonucleic acid
DOR Doravirine
DRV Darunavir
DSU Decision Support Unit
DTG Dolutegravir
EACS European AIDS Clinical Society
EASL European Association for the Study of the Liver
EFV Efavirenz
ETR Etravirine
EVG Elvitegravir
FAS Full analysis sets
FDA Food and Drug Administration
FTC Emtricitabine
GDG Guideline Development Group
GHz Gigahertz
GRID Gay-related immune deficiency
HBV Hepatitis B virus
HCV Hepatitis C virus
HIV Human immunodeficiency virus
HMR Hierarchical meta-regression
IAS International AIDS Society
IDU Injection drug user
IDV Indinavir
iKT Integrated knowledge translation
INSTI Integrase strand transfer inhibitors
IPD Individual patient data
IPD-NMA Individual patient data network meta-analysis
ITC Indirect treatment comparisons
ITT Intention to treat
JAGS Just Another Gibbs Sampler
KT Knowledge translation
LMICs Low- and middle-income countries
LPV Lopinavir
MAIC Matching-adjusted indirect comparisons
MCMC Markov Chain Monte Carlo
MD Mean difference
MSE Mean squared error
MSM Men who have sex with men
MTC Mixed treatment comparisons
NFV Nelfinavir
NICE National Institute for Health and Care Excellence
NMA Network meta-analysis
NNRTI Nonnucleoside reverse transcriptase inhibitor
NRTI Nucleoside reverse transcriptase inhibitors
NTD Neural tube defect
NVP Nevirapine
OR Odds ratio
PAIC Population-adjusted indirect comparisons
PIb Boosted protease inhibitor
PICOS Population, interventions, comparisons, outcomes, study design
PLWH People living with HIV
PRISMA Preferred reporting items for systematic reviews and meta-analyses
PRISMA-IPD Preferred reporting items for systematic review and meta-analysis of individual participant data
PSRF Potential scale reduction factor
qd Once-daily
RAL Raltegravir
RAM Random-access memory
RCT Randomized-controlled trial
RNA Ribonucleic acid
RPV Rilpivirine
RR Relative risk
SAE Serious adverse events
SD Standard deviation
SLR Systematic literature review
SUCRA Surface under the cumulative ranking curve
STC Simulated treatment comparisons
TAF Tenofovir alafenamide
TB Tuberculosis
TDF Tenofovir disoproxil fumarate
TLD Tenofovir disoproxil fumarate + Lamivudine + Dolutegravir
TSD Technical support document
WHO World Health Organization
XTC Lamivudine or Emtricitabine
YODA Yale University Open Data Access

Glossary

Aggregate data: Summary statistics calculated over a group of observations, typically patients.
Anchor treatment: Antiretroviral therapy typically consists of three treatments: two backbone antiretrovirals and a third antiretroviral that is referred to as the anchor treatment. The backbone is typically made up of two treatments of the nucleoside reverse transcriptase inhibitor class and the anchor treatment is of a separate class.
Antiretroviral therapy: A combination of multiple antiretroviral drugs provided to HIV patients in order to control their disease. In the late 1990s and early 2000s, these were often referred to as highly active antiretroviral therapies.
Backbone: Most antiretroviral therapies include two drugs of the nucleoside analogue class and a third drug of another class. These two nucleoside analogues can be referred to as the backbone of the combination.
Clinical guidelines: Recommendations for clinicians about the care of patients with specific conditions. They should be based upon the best available research evidence and practice experience.
Consistency: The agreement of direct and indirect evidence with respect to a pairwise treatment difference. Consistency is a key condition required to conduct network meta-analysis.
Direct evidence: Evidence from a head-to-head trial comparing two treatments.
Edges (treatment comparisons): The links between nodes (treatments) in a network indicating the existence of one or more trials comparing them (see nodes below).
Effect-modifier: A covariate that alters the effect of treatment on outcomes, so that the treatment is more or less effective in different subgroups formed by levels of the effect-modifier. Effect-modifiers are not necessarily also prognostic variables. Effect-modifier status is specific to a given scale: the positive status of a covariate as an effect-modifier on one scale does not necessarily imply either positive or negative effect-modifier status on another scale; however, a covariate that is not an effect-modifier on one scale is guaranteed to be an effect-modifier on another.
Endonodal: A trial that reports on a single node in a network. Typically, trials report on one or more comparisons, but endonodal trials are the special cases that do not.
Evidence-based medicine: An approach to medical practice intended to optimize decision-making by emphasizing the use of evidence from well-designed and well-conducted research.
Evidence synthesis: The process of combining multiple sources of quantitative evidence into a single, coherent result.
Fixed-effects meta-analysis: A model that assumes that there is one true effect size shared by all included studies (see random-effects below).
FDA snapshot algorithm: A patient is virally suppressed if and only if they have a viral load below the pre-determined cut-off (typically <40 copies/mL). This implies that patients failing to provide a measurement due to loss to follow-up are considered failures (that is, non-suppressed).
Heterogeneity: Between-study heterogeneity is the true variation in treatment-effects among different studies comparing the same interventions caused by systematic differences in known and unknown study design and patient-related effect-modifiers across studies.
Human immunodeficiency virus: A retrovirus that when left untreated leads to AIDS. Retroviruses are single-stranded RNA viruses that enter host cells through the cytoplasm and alter DNA using their own reverse transcriptase enzyme and RNA.
Indirect evidence: Evidence from multiple trials comparing treatments through one or more common comparators. For example, a trial comparing treatment A to treatment B and a trial comparing treatment A to treatment C provide indirect evidence for treatments B and C through the common comparator treatment A.
Individual patient data: Data for each patient, as opposed to aggregate data, which are only available at the treatment arm or trial level.
Inconsistency: The disagreement between direct and indirect evidence in a network. A fundamental assumption of network meta-analysis is that direct and indirect evidence are equivalent (so-called transitivity).
Meta-regression: An expansion of meta-analysis methods, whether pairwise or network, by which the relative treatment-effect of each study is a function of not only the treatment comparison of that study but also an effect-modifier. Mathematically, this is a weighted regression of the relationship between potential effect-modifier and outcome, with the strength of evidence dictating the weight of each observation.
Network meta-analysis: An expansion of pairwise meta-analysis (see below) by which a connected network of evidence is analyzed simultaneously. Network meta-analysis is used as an umbrella term to include indirect treatment comparisons (networks with no closed loops) and mixed treatment comparisons (networks with at least one closed loop).
Neural tube defect: Birth defects of the brain, spine or spinal cord. They happen in the first month of pregnancy, often before a woman even knows that she is pregnant.
Node: The individual treatments that make up the network. Nodes are connected through edges (see above).
Outcome regression: A method for adjusting the outcomes observed in a sample population to those that would have been seen in a target population with a different covariate distribution. A statistical model is created to describe the outcome in terms of the covariates, and then applied to predict outcomes for the target population.
Pairwise meta-analysis: Methods that combine quantitative evidence from multiple scientific studies comparing two treatments of interest. Results from pairwise meta-analyses provide a succinct result regarding the comparative effect under consideration between two treatments.
Prognostic factor: A covariate that affects (or is prognostic of) the outcome. There is a distinction between prognostic variables and effect-modifiers; effect-modifiers are not necessarily also prognostic variables.
Public health approach: Keeping guidance as simple as possible in order to facilitate task-shifting and other strategies within resource-limited settings.
Random-effects meta-analysis: A model that allows the true effect to vary from study to study. Here, differences in study settings are acknowledged to lead to differences in the true effect. For example, studies with older patients may have a slightly larger effect than those with younger patients.
Randomized controlled trials: A type of scientific experiment that aims to reduce selection bias when testing treatments by randomizing patients to treatments.
Simulation study: Computer experiments involving the creation and analysis of random datasets over a number of replications and varying factors so as to understand trends in the results according to changes in the factors that are controlled for.
Systematic literature review: A type of literature review that uses systematic methods to collect secondary data, critically appraise research studies, and synthesize studies. The systematic approach should be transparent so as to be reproducible.

Acknowledgements

I am grateful to my PhD supervisor, Nick Bansback, and to the members of my committee (Kristian Thorlund, Ehsan Karim, and Aslam Anis) for their continual support and their patience in what turned out to be a rather long process. They provided the knowledge and feedback required to persist through the research program.

A special thank you to my former director and mentor Edward Mills, who pushed me towards doing this PhD and provided moral and technical support early on. I wouldn’t have accomplished this without him. I am similarly thankful for the support provided by Jeroen Jansen. Thank you for being the technical guru that you are and for your friendship. I am also thankful for other support within the greater research community.
To Ross Maclean and Bob Hogg, I am grateful for your support through my professional career and for helping me push through despite competing interests. Thank you to Nathan Ford, Marco Vitario, Meg Doherty, Martina Penazzato, Lynne Mofenson, Rebecca Zash, Francois Venter, Alexandra Calmy, and Sabin Nsanzimana for your HIV expertise and your support, particularly for Chapter 2. Gratitude goes out to the Canadian Institutes of Health Research (CIHR) for their support of my work through their Doctoral Research Award. I am equally grateful for the Clinical Studies Data Request program that provided the de-identified individual patient data that allowed some of this research to take place.

To my eternal cheerleaders, my friends and family, I am forever grateful. A special thanks to my father Ron, my mother Renée, my brother David, and my godmother Louise. Jamie Forrest, Katie Muldoon, Eric Druyts, and Shannon Cope, thank you for the useful conversations and emotional support when venting was needed. And finally, last but by no means least, thank you to every one of my friends and colleagues who have supported me along the way over the past seven years. Thanks for all your encouragement!

Dedication

This dissertation and all of my academic achievements are dedicated to my godmother Louise Kanters, who passed away suddenly before the completion of my dissertation.

Chapter 1: Introduction

The fight against the HIV/AIDS epidemic started in the early 1980s and continues to this day.1 Key milestones have been the introduction of antiretroviral drugs (ARVs)2 in the 1980s, the introduction of antiretroviral therapy (ART) regimens combining three ARVs in the mid-1990s,3 and the ART scale-up in low- and middle-income countries (LMICs) in the early 2000s.4 The World Health Organization (WHO) provides clinical guidelines aimed at policy-makers and clinical experts.
Its first guidelines for adults and adolescents were published in 2002 and were then updated in 2003, 2006 and 2010.5 Over the same period of time, numerous guidelines were also released for other populations of interest.5 In June 2013, WHO released consolidated guidelines on the use of antiretrovirals for the treatment and prevention of HIV in adults, children and pregnant women.6 These guidelines use a public health approach by which guidance is kept as simple as possible in order to facilitate task-shifting and other strategies within resource-limited settings.7 A more personalized approach is used in high-resource settings.

In order to properly discuss the knowledge gaps that this dissertation aims to address, Sections 1.1-1.3 provide some background on HIV (the disease, the therapeutic landscape, and current treatment guidelines) and on evidence synthesis methods (network meta-analysis and individual patient data analyses).

1.1 Human immunodeficiency virus

The human immunodeficiency virus (HIV) is a retrovirus. It infects specific cells within the immune system, most notably CD4+ T cells. Over time, HIV left untreated leads to acquired immunodeficiency syndrome (AIDS) and eventually death.8 Retroviruses are single-stranded RNA viruses that enter host cells through the cytoplasm and alter DNA using their own reverse transcriptase enzyme and RNA.9 They are referred to as retroviruses because this order of operation (RNA to DNA) is the reverse of the usual pattern (DNA to RNA). The new DNA is then incorporated into the cell’s genes, leading to the production of proteins to assemble new copies of the virus.

Figure 1-1 displays the HIV replication cycle, which is central to its classification as a retrovirus.

Two important variables in understanding the progress of HIV within an infected person are CD4 cell counts and viral load. Through the HIV replication cycle highlighted above, the infected CD4 cells are eventually killed.
Moreover, as the CD4 cell count decreases, the infected individual becomes increasingly immunocompromised, making them more and more susceptible to opportunistic infections. Viral loads are measured as the number of HIV RNA copies per mL of plasma. Higher viral loads are not only harmful to the individual, but also indicate that an individual is more infectious, thus increasing the probability of transmission associated with certain behaviours. The average course of untreated HIV can be described using these two variables, as shown in Figure 1-2. The schematic also shows the three phases of disease progression, namely acute infection shortly after the primary infection occurs, clinical latency during which viral loads are relatively low, and the culmination in the development of AIDS.

Figure 1-1: HIV replication cycle and mechanisms of the four classes of currently available treatments
Note: Schematic description of the mechanism of the four classes of currently available antiviral drugs against HIV: fusion inhibitors (interfere with the binding, fusion or entry of an HIV virion), reverse-transcriptase inhibitors (interfere with the reverse transcription of viral RNA into DNA), integrase inhibitors (block the viral enzyme integrase, which inserts the viral genome into the DNA of the host cell), and protease inhibitors (block proteolytic cleavage of protein precursors that are necessary for the production of infectious viral particles). Credit Thomas Splettstoesser (www.scistyle.com). Creative Commons License 3.0.

Figure 1-2: General schematic of viral load and CD4 cell counts over the course of HIV
Note: Individual cases may vary considerably relative to this schematic. Figure has no copyrights.

There are multiple modes of transmission of HIV. Viral copies are present in both genital fluids and blood plasma. Sexual intercourse is the most common mode of transmission, but there are numerous others.
Exposure to infected blood, such as through sharing of needles or through tainted blood transfusions, is another important mode of transmission. Finally, HIV may be transmitted from mother to child when a pregnant woman is HIV-positive. As presented in the next section, there are ways of preventing most of these transmissions (the major exception is a transfusion of tainted blood).

1.1.1 History and treatment

The exact origins of HIV remain a contested topic. The most commonly accepted theory is that HIV was transferred from primates to humans in the early 20th century in Western Africa.10 Evidence suggests that acquisition occurred through the handling of bushmeat by hunters and butchers. The earliest documented case of HIV in a human being dates to 1959 in the Congo.11 However, it was only in 1981 that the first case of AIDS was clinically observed and in 1983 that HIV was identified as the virus leading to AIDS.12 AIDS was principally observed in homosexual men at first, which led to the term GRID, for gay-related immune deficiency. As more cases presented themselves, this was changed to the 4H disease, for homosexuals, heroin users, hemophiliacs and Haitians. Finally, once it was understood that this disease was not isolated to these groups, the term AIDS was adopted. A full account of the history of HIV can be found in numerous sources.13,14

As of 2017, there were approximately 36.9 million people living with HIV.15 Roughly 70% of cases are in sub-Saharan Africa, but the epidemic is a global one.15 Although prevalence continues to increase (global prevalence in 2010 was approximately 33.2 million), the number of deaths has decreased over time.15 This is principally due to the development of effective treatments that render the disease chronic rather than necessarily deadly and the scale-up of treatment to developing nations.
Indeed, the life expectancy of people living with HIV, even in developing nations, has returned to close to normal.16 As previously mentioned, a major turning point in the fight against AIDS was the development of highly active ART, the results of which were first presented at the 11th International AIDS Conference in Vancouver in 1996. Highly active ART is a combination of three ARVs. It is a slightly antiquated term that is seldom used today, having been replaced by the simpler term ART. The typical ART regimen is composed of a backbone and a third agent, often referred to as an anchor treatment. Most backbones are composed of two nucleoside reverse transcriptase inhibitors (NRTIs) and the anchor treatment comes from another class.

In addition to the HIV replication cycle, Figure 1-1 provides an overview of the mechanisms by which the different treatment classes act to counter HIV. In addition to NRTIs, a separate class, non-nucleoside reverse transcriptase inhibitors (NNRTIs), also interrupts the replication cycle in the first step following entry into the cell, namely the alteration of the DNA using the RNA strands and reverse transcriptase enzyme. NRTIs were the first class of ARVs to be developed and were often used one at a time in their early use. NNRTIs and protease inhibitors (PIs) were the next two classes to be developed and were the two drug classes first used as anchor treatments within ART. Protease inhibitors interrupt the replication cycle near the end by interrupting the process of assembling new copies of the virus through proteins produced by the altered DNA. Of note, PIs are usually boosted with a pharmacokinetic enhancer, specifically ritonavir (itself a PI) or cobicistat, and are referred to as PIb to denote the boosting. Two other drug classes have been developed since then.
The first is integrase strand transfer inhibitors (INSTIs), which interrupt the integration of viral DNA into the cell genome; the second is fusion inhibitors, or entry inhibitors, which interrupt entry into the cell at the very beginning of the replication cycle. Once regimens combined multiple treatments that interrupt multiple stages of the replication cycle, the efficacy of ART improved dramatically.17 Patients taking these treatments became virally suppressed, that is to say that blood tests could no longer detect viral loads. As a result, there were dramatic improvements in patient health, with immune system recovery and reductions in morbidity and mortality. Over the years, treatments have improved such that they have become more effective, more tolerable, safer and easier to use.18 Early ART required many pills to be ingested multiple times a day, but today there are many options that require a single fixed-dose combination (FDC) pill once a day. With time, an additional benefit of ART became apparent in that sustaining viral suppression led to a dramatic reduction of transmission rates. This led to the development of treatment as prevention, which was seminal in the reduction of ART costs and the scale-up of ART to sub-Saharan Africa and other developing nations.4

1.1.2 Current guidelines and new developments

With the increasing number of emerging treatments, and of trials comparing these treatments to each other, guidelines emerged to help synthesize the evidence and promote best practice. Multiple HIV treatment guidelines exist. They differ in terms of scope, target audience, jurisdiction and decision-making parameters (e.g., costs, access to genetic testing, etc.). Most countries have their own set of clinical guidelines. 
In Canada, guidelines are provided at the provincial level by bodies such as the BC Centre for Excellence in HIV/AIDS for the province of British Columbia.19 Guidelines can also be developed by agencies, whether they be unilateral, such as the International Antiviral Society-USA (IAS-USA) and the US Department of Health and Human Services (DHHS), or multilateral, such as the WHO and the European AIDS Clinical Society (EACS). Perhaps the two most influential guidelines are those by the IAS-USA and those by the WHO. As previously mentioned, WHO guidelines are used by most LMICs. Similarly, the IAS-USA guidelines20 are used by many high-income countries. For example, the above-mentioned BC guidelines are aligned with the IAS-USA guidelines. The 2016 IAS-USA guidelines recommend that all HIV-infected patients with a detectable viral load be treated. For patients first initiating ART, the recommended regimen is an INSTI + 2 NRTIs.20 Both NNRTI + 2 NRTIs and PIb + 2 NRTIs are considered other effective regimens. While the IAS-USA guidelines do not differentiate between ARVs within the INSTI class for first-line regimens, they do specifically recommend efavirenz (EFV) and rilpivirine (RPV) for NNRTIs (i.e., nevirapine is not recommended) and ritonavir-boosted darunavir (DRV/r) for PIb. The choice of backbone varies between abacavir/lamivudine (ABC+3TC), tenofovir alafenamide/emtricitabine (TAF+FTC) and tenofovir disoproxil fumarate (TDF)+FTC, with TDF contra-indicated for those with or at risk of kidney and bone disease. The guidelines note that HLA-B*5701 testing should be performed prior to abacavir use. 
Box 2 of the IAS-USA guidelines provides a long list of additional guidance regarding sub-populations and choices to be made on the basis of kidney function as measured by creatinine clearance and other measures.20  When this PhD research was initiated, the 2013 WHO Consolidated Guidelines on the use of ARVs for Treating and Preventing HIV Infection were the most recent WHO guidelines.6 Since then, the 2016 Update  was released.21 Clinical guidelines are developed through multi-step processes that ensure that they are feasible within the current clinical environment and that they are evidence-based. A key step involves evidence synthesis whereby all of the evidence is collected and analysed so as to provide an overview of the therapeutic landscape. In 2015, the initial research for this PhD program included evidence synthesis that was used to update the 2013 Consolidated guidelines on the use of antiretrovirals for treating and preventing HIV.22-24 At the time, the hypothesis was that INSTI based regimens or low-dose efavirenz (EFV400) based regimens would challenge the preferred recommended first-line regimen. The 2013 guidelines recommended, for adults and adolescents, a first-line ART consisting of an NNRTI + 2 NRTIs with EFV + TDF + XTC as the preferred option as the first-line regimen.6 Results of the 2015 systematic literature review (SLR) and network meta-analyses (NMA) revealed improved tolerability and efficacy with INSTIs and EFV400, with dolutegravir (DTG) having the highest estimated tolerability and safety.24 Despite this evidence, DTG and EFV400 were recommended as alternative first-line regimens rather than the preferred treatment.21 This was due to numerous limiting factors, such as uncertainty around efficacy and safety in sub-populations and prohibitive costs.  1.2 Current evidence synthesis methods It is important to note that the evidence synthesis methods used to inform the different guidelines can vary. 
Evidence synthesis is the process of combining multiple sources of quantitative evidence into a single, coherent result. Evidence syntheses often form the “evidence base” in evidence-based decision-making, such as evidence-based medicine and evidence-based politics. As the latter practices have grown in recent years, so has the frequency of use of evidence syntheses.25 Evidence synthesis involves using systematic, transparent methodology stemming from the formulation of a research question through to the identification of eligible studies, the appraisal of their quality, the analysis of the relevant data and the determination of the strength of the evidence relative to the posed research question.

Study selection is usually conducted through a systematic literature review. SLRs are literature reviews that are conducted according to predefined methods that have been specified in order to ensure that all steps are taken to maximize the sensitivity, and perhaps specificity, of the study selection algorithms. They provide improved rigour because they help ensure best practices are used and they provide a framework through which the work that is carried out can be evaluated. The statistical methods used to combine results across a collection of studies are collectively referred to as meta-analysis. Although the concepts of meta-analysis can be traced back to the Age of Enlightenment,26 formal methods for doing so are usually traced back to the early 20th century when Karl Pearson reported upon typhoid inoculation across multiple clinical studies.27 As the frequency of evidence syntheses has increased over the past 20-30 years,28,29 so has the range of methods used for analyzing data. Key among these developments has been the expansion of conventional meta-analysis to network meta-analysis. Overviews of conventional meta-analysis and NMA are provided in Subsections 1.2.1 and 1.2.2. 
As will be explained in these sections, the choice of analytical method often matters, which is why it is interesting to note that some HIV guidelines have been developed on the basis of traditional pairwise meta-analyses and others have been developed on the basis of NMA. 1.2.1 Conventional meta-analysis Conventional meta-analyses consist of methods used to combine treatment-effect estimates comparing two treatments, referred to as pairwise meta-analysis, as well as methods to combine single measures, such as proportions, obtained across multiple scientific studies. There are many variants in the methods, each varying on the basis of the underlying concepts, the measures being combined, data structure, improvements in estimation stability, statistical framework (e.g., Bayesian or Frequentist) and more. Numerous resources provide comprehensive details of these methods.30,31 As such, only the basic details are reviewed below. These are: heterogeneity, fixed and random-effects models, and the basic ideas behind the methods for estimation of an overall effect. To discuss heterogeneity, it helps to understand the difference between effect-modifiers and prognostic factors. Prognostic factors are any patient or trial characteristics that influence a patient’s outcome. For example, age is a prognostic factor to survival given that, on average, younger patients are at a lower risk of death than older ones. An effect-modifier is a trial or patient characteristic that influences the relative efficacy of two or more treatments. For example, two treatments might be equivalent in patients with a mild form of a disease, but only one is effective in patients with the severe form.  Most meta-analyses include studies that are clinically and methodologically diverse. Although study similarity is a condition required to conduct meta-analysis, studies will always differ in some respect. 
Traditionally, meta-analyses were restricted to randomized controlled trials (RCTs) but are now conducted using a variety of study designs. Among RCTs, differences in study populations and prognostic factors may also affect the observed outcomes being measured. Between-study heterogeneity is the true variation in treatment-effects among different studies comparing the same interventions caused by systematic differences in known and unknown study design and patient-related effect-modifiers across studies. For example, in a meta-analysis of global trends in antiretroviral resistance in treatment-naïve individuals with HIV,32 a variety of sources of heterogeneity were identified and addressed through meta-regression. Sources of heterogeneity included continents, the start time of the ART roll-out for each country, and proportions of patients by risk group, among others (e.g., trends in pre-treatment acquired resistance were different in countries with larger proportions of acquisitions through injection drug-use than through sexual activities). Meta-analyses can be performed with fixed- or random-effects models. A fixed-effects model assumes that there is no heterogeneity and that all included studies are providing estimates of the same treatment-effect. This implies that there are no effect-modifiers, or that they have the same distribution across all studies in the meta-analysis. On the other hand, a random-effects model assumes that each study has its own true treatment-effect and that these true effects follow a distribution around an overall mean (the meta-analysis mean; the parameter of interest in fixed-effects meta-analysis), with a variance (between-study heterogeneity) that reflects how different the true treatment-effects are from one another. In general, the assumptions of random-effects models are much more plausible than those of the fixed-effects model. 
Nonetheless, when there is little evidence of between-study heterogeneity, one should use a fixed-effects model given that it is simpler in nature.33

As mentioned above, there are many variants on the equations used to conduct these analyses, but fixed-effects meta-analyses all consist of weighted averages, with the inverse variance of each estimate being the most common weight used.31 The specific mathematical formulas are presented in Chapter 2. The most commonly used random-effects model, referred to as the DerSimonian-Laird method,34 simply involves adding the estimated between-study variance $\tau^2$ to the within-study variance of each study. As a result, the weights used for the weighted average involve both the within-study variance and the between-study variance. All of these concepts apply to the expansion of meta-analyses to NMA.

1.2.2 Network meta-analysis

A collection of RCTs informing on several treatments (3 or more) constitutes a network of evidence, in which each RCT directly compares a subset, but not necessarily all, of the treatments. Such a network involving treatments compared directly, indirectly (through a pathway of common comparators), or both, can be synthesized by means of NMA.35-37 Figure 1-3 presents an example of such a network. Note that methods have been developed to conduct NMA among non-RCT studies; however, uptake has been low thus far. In a traditional meta-analysis, all included studies compare the same intervention with the same comparator, as described in the previous section. NMA extends this concept by including multiple pairwise comparisons across a range of interventions and provides estimates of relative treatment-effects on multiple treatment comparisons for comparative effectiveness purposes based on direct and/or indirect evidence.

In its simplest form, a network is composed of three treatments with only two edges (treatment comparisons; connecting lines representing the existence of trials making the depicted direct comparison). 
The method used for this specific scenario is also referred to as adjusted indirect treatment comparison (ITC). This method was introduced by Bucher et al. in 1997 and for the first time allowed for a valid comparison of two treatments that had not been compared in a head-to-head fashion.38 From ITC, multiple treatment comparisons and mixed treatment comparisons (both referred to as MTC) were derived, by which more treatments are compared simultaneously and/or closed loops are present in the network. The details below help better understand these differences; however, NMA as a set of methods captures both ITC and MTC.

Figure 1-3: Example network of 11 treatments (A-L) and placebo
Legend: Each circle represents a treatment and each line represents head-to-head evidence between the treatments it connects. Thicker lines correspond to stronger evidence (more trials and larger sample sizes) between the treatments.

To help understand the basics of these methods, consider scenarios in which there are three treatments: EFV, DTG, and raltegravir (RAL). For simplicity of notation, consider EFV to be drug A, DTG to be drug B and RAL to be drug C. RCTs comparing DTG with EFV (AB trials) provide a direct estimate of the true relative effect of DTG versus EFV ($d_{AB}$). Similarly, AC trials provide a direct estimate of the true relative effect of RAL versus EFV ($d_{AC}$). If the included AB and AC trials are sufficiently similar, then the true relative effects of the different types of comparisons are mathematically related, as illustrated in Figure 1-4.

In the absence of head-to-head evidence comparing DTG and RAL, an unbiased indirect estimate for the relative treatment-effect of RAL versus DTG ($d_{BC}$) can be obtained from the true effect $d_{AB}$ and from the true effect $d_{AC}$ under the assumption that there are no systematic differences between AB and AC studies regarding study and patient characteristics that affect the treatment-effect of DTG versus EFV and RAL versus EFV (i.e., 
no imbalance in effect-modifiers between the different direct comparisons). If a network has a closed loop, there is both direct and indirect evidence for some treatment contrasts (Figure 1-5). For example, in an EFV-DTG-RAL network that consists of AB trials, AC trials, and BC trials, direct evidence for the DTG-RAL contrast is provided by the BC trials and indirect evidence for the BC contrast by the indirect comparison of AC and AB trials. If there are no systematic differences in treatment-effect-modifiers across the different direct comparisons that form the loop, then there will be no systematic differences between the direct and indirect estimates for each of the contrasts that are part of the loop. In that case, combining direct estimates with indirect estimates is valid and the pooled (i.e., mixed) result will reflect a greater evidence base and one with increased precision regarding relative treatment-effects.

Figure 1-4: Network of evidence reflecting anchored indirect treatment comparison of AB trials with AC trials.39

Figure 1-5: Network of evidence reflecting direct and indirect treatment comparison of AB, AC and BC trials.39

In general, network meta-analysis relies on the transitivity assumption, $d_{BC} = d_{AC} - d_{AB}$, and is used to obtain relative effect estimates for comparisons not studied in a head-to-head fashion, as well as to combine direct comparisons with indirect comparisons to increase precision. NMA does not only rely upon transitivity conceptually. Transitivity is central to the underlying mathematical formulae. The entire network has $\binom{K}{2}$ contrasts, but is solved using $K - 1$ parameters, where $K$ is the number of treatments included in the network. Specifically, only the treatment-effects relative to the reference treatment $A$ (such as $d_{AB}$ and $d_{AC}$) are estimated. All other contrasts are calculated using the estimated treatment-effects and the transitivity assumption.
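As a rough illustration of the relation d_BC = d_AC − d_AB described above, the following Python sketch pools hypothetical AB and AC trial estimates with fixed-effects inverse-variance weights and then forms the anchored indirect estimate of C versus B, in the spirit of the adjusted indirect treatment comparison. The effect sizes, variances and function names are assumptions made purely for illustration and are not taken from the evidence base of this dissertation.

```python
import math


def pool_fixed_effect(estimates, variances):
    """Fixed-effects inverse-variance pooling of trial-level estimates."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return pooled, 1.0 / sum(weights)


def indirect_estimate(d_ab, var_ab, d_ac, var_ac):
    """Anchored indirect comparison of C versus B through comparator A.

    Relies on transitivity: d_BC = d_AC - d_AB. The AB and AC trials are
    independent, so the variances of the two pooled estimates add.
    """
    return d_ac - d_ab, var_ab + var_ac


# Hypothetical log odds ratios versus A; not taken from any real trial.
ab_effects, ab_variances = [0.30, 0.50], [0.04, 0.04]
ac_effects, ac_variances = [0.20, 0.40, 0.30], [0.05, 0.05, 0.05]

d_ab, var_ab = pool_fixed_effect(ab_effects, ab_variances)
d_ac, var_ac = pool_fixed_effect(ac_effects, ac_variances)
d_bc, var_bc = indirect_estimate(d_ab, var_ab, d_ac, var_ac)

# Approximate 95% interval on the log odds ratio scale.
se_bc = math.sqrt(var_bc)
ci_low, ci_high = d_bc - 1.96 * se_bc, d_bc + 1.96 * se_bc
print(f"d_BC = {d_bc:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

Because the variance of the indirect estimate is the sum of the two pooled variances, it is less precise than either direct comparison, which is one reason why combining it with direct BC evidence in a closed loop can increase precision.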
In the presence of a closed loop, violations of the transitivity assumption will show as systematic differences between direct and indirect estimates for comparisons of interventions that are part of the loop. Combining these may be inappropriate. In order to help identify inconsistency, it is informative to perform a meta-analysis of the relative treatment-effects based on only the direct evidence, as well as a synthesis of only indirect evidence, before performing the network meta-analysis where direct and indirect evidence are combined.  Inconsistency can be overcome by either avoiding the NMA altogether when issues are detected, removing studies that are the source of inconsistency, or using meta-regression to account for trial differences that lead to the inconsistency. Meta-regression analyses are performed by modeling the relative treatment-effect of each study as a function of not only a treatment comparison of that study but also an effect-modifier. In other words, with a meta-regression model, the pooled relative treatment-effect is estimated for a certain comparison based on the available studies, adjusted for differences in the level of the effect-modifier between studies. Meta-regression analysis can help explain between-study heterogeneity and minimize inconsistency due to transitivity violations.40 Finally, note that NMA can be performed within a frequentist or Bayesian framework. The analyses for the current research will follow the Bayesian approach. Bayesian methods involve a formal combination of a prior probability distribution (that reflects a prior belief of the possible values of the model parameters) with a (likelihood) distribution based on the observed data to obtain a posterior probability distribution of model parameters.41 The likelihood informs us about the extent to which different values for the parameter of interest are supported by the data. 
A major advantage of the Bayesian approach is that the method naturally leads to a decision framework.41-43 The posterior distribution can be interpreted in terms of probabilities (e.g., “There is an x% probability that treatment A results in a greater response than treatment B”); frequentist approaches do not allow such an interpretation.44 For an NMA, an additional advantage of a Bayesian approach is that it allows calculation of rank-probabilities.44 There are several important advantages to NMA relative to other, related methods. These are discussed in the next section, which also argues for why NMA should be used in the development of clinical guidelines.45

1.2.3 The rationale for using NMA for the development of clinical guidelines

There has been a marked uptake of NMA methods across a variety of fields. Chief among these are health technology assessment (HTA) and academic research. NMA methods have also recently been adopted by the Cochrane Collaboration, with 10% (23/230) of systematic reviews published by the Cochrane Collaboration using NMA. The NMA methodology has numerous qualities that lend themselves to decision-making processes such as clinical guideline development.

Clinical guidelines are developed through multi-step processes that ensure that they are feasible within the current clinical environment and that they are based on the best available evidence. Although it is often acknowledged that having the most up-to-date evidence is critical to the development of clinical guidelines, it is equally critical that optimal analytical methods are used to appraise the evidence. Because NMA methods comprise the simultaneous analysis of all potential treatment options, they make full use of the available evidence. This offers several advantages to the process of developing guidelines. One advantage is the ability to quantitatively contrast interventions that have not been directly compared in studies. 
This is pivotal to guideline development because, in the absence of head-to-head evidence, guideline development groups will rely more strongly on expert opinion and may, perhaps even subconsciously, make naïve comparisons that have not adequately accounted for potential biases in study designs, intervention characteristics, and study populations. Indirect comparisons connect treatments via a ‘common control’ (e.g., placebo or current standard of care) to establish comparative effects between treatments that have not been compared head-to-head in RCTs. The connection through the common comparator supports the adjustment for bias that would otherwise be introduced by differences in prognostic factors if the individual treatment arms were compared naively.  A second advantage is that, by analyzing both direct and indirect evidence collectively, the evidence base is strengthened - often to an extent that could mean the difference between grading the strength of evidence as low versus moderate or high. And a third advantage of NMAs to guideline development is the ability to analyse a complete network of interventions within a single analysis. Doing so provides a more concise assessment of the clinical landscape that in turn lends itself better to decision-making. This was particularly important for the development of the HCV guidelines given the rapidly changing treatment landscape with the approval of six new treatments since 2013. Finally, while NMAs have traditionally been used to assess the comparative effectiveness of drugs, the approach can be applied more broadly.  
It is for these reasons that the National Institute for Health and Clinical Excellence (NICE) in the United Kingdom has begun to include NMA within its clinical guidelines manual;46 however, uptake continues to be low in this field, with only 8% of NICE clinical guidelines making use of NMA.47

1.3 Individual patient data in evidence synthesis

There are numerous challenges to conducting evidence synthesis in this therapeutic area (HIV).48 Perhaps the easiest challenge to overcome is the lack of head-to-head evidence for key treatments of interest. For example, up until very recently, there were no head-to-head trials comparing DTG and EFV400 that had reported results. While traditional pairwise meta-analyses would not allow for inference to be drawn on such comparisons, use of NMA would easily overcome this issue.

The two most important challenges to conducting evidence synthesis in HIV are defining the network nodes and addressing the high degree of heterogeneity among HIV patients and settings. The difficulty with node definitions centers around the drug combination nature of ART. Defining nodes according to the exact combinations of drugs would lead to too large a network, such that results could not be parsed into a meaningful conclusion. Defining nodes by drug class would simplify the network considerably, but would fail to answer the research question of which drug is preferable. Using the anchor treatment as the node definition and adjusting for differences in backbones would allow for a sensible network to be formed and results to be meaningful. Such an adjustment would require arm-based meta-regression, which is seldom used in NMA.

Addressing heterogeneity, the second challenge highlighted above, is a mainstay in evidence synthesis more generally. In HIV, trials conducted principally in resource-rich settings will differ in many ways relative to trials conducted in LMICs. 
For example, trials in the US are mostly composed of affluent, white men who have sex with men, while trials in Sub-Saharan Africa tend to be among women who have higher viral loads, lower CD4 counts and are at higher risk of opportunistic infections such as TB. Evidence synthesis is typically conducted using aggregate data (AgD). This is primarily based on data availability. In doing so, methods to adjust for heterogeneity include trial-based meta-regression adjustments and restricted analyses. As the data are aggregated, they often lack the granularity to make a meaningful difference. As the Cochrane handbook explains,49 individual patient data (IPD) meta-analysis has many important advantages to AgD meta-analysis. While IPD meta-analysis requires more involvement from the statistician, it allows for more and better adjustments. More because IPD tends to have more reported variables and because certain types of adjustments are simply not feasible with AgD only. For example, when using IPD patients can be selected individually to meet a target population (i.e., restrictions can be used as an adjustment mechanism). Better because regression in AgD may fall prey to the ecological fallacy50 and adjustments using IPD can be made on multiple variables simultaneously. It is for these reasons, among others, that IPD meta-analysis is considered the gold standard of meta-analyses. However, given that trialists tend to be protective of their data, it is much more common to have IPD for one or few trials and AgD for the remaining trials. A 2007 systematic review of meta-analyses using IPD and AgD found 33 such publications and that 27 of the 33 publications used a two-stage approach to their analyses. 
A two-stage approach involves first converting the IPD into AgD and then analyzing the AgD using traditional methods.51 This may be viewed as an over-simplification of data that may fail to make full use of the information at hand, but it is unclear whether more complicated methods would lead to substantially different results. Sutton and colleagues developed methods for meta-analyses, referred to as a one-stage approach, involving a mixture of IPD and AgD using Bayesian hierarchical models to analyze IPD in their native form all while combining them with AgD.52 Finally, Jackson et al. proposed a method of hierarchical regression, in the context of survey data, through which IPD meta-regression is conducted with AgD and is used to minimize the ecological fallacy over the AgD.53

These ideas can be extended to NMA. Jansen extended the methods by Sutton et al. and Jackson et al. to be applied to NMA.52-54 There has been a flurry of developments in the use of mixed AgD and IPD in NMA lately, but much of it has been focussed on population-based adjusted indirect comparisons (PAIC); namely, matching adjusted indirect comparisons (MAIC) and simulated treatment comparisons (STC).55 These are methods that aim to “map” the treatment-effects observed in one population into the effects that would be observed in another population by weighting individual patients, or by modelling their outcomes, to match against a population that is only available in AgD. These two methods are akin to propensity score weighting and outcomes regression methods, respectively.55,56 The most important utility for these methods is to allow for adjustments to occur in sparse and/or disconnected networks.57 A down-side to PAIC is that it mostly applies to small evidence bases. MAIC does not allow for multiple treatment comparisons. 
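To make the weighting idea behind MAIC more concrete, the following Python sketch re-weights hypothetical IPD on a single covariate so that its weighted mean matches a mean reported only in AgD. It assumes the commonly described method-of-moments formulation, with weights of the form exp(alpha × (x − target mean)) and alpha found by Newton's method; the ages, the target mean and the function name are illustrative assumptions, and a real analysis would typically match several effect-modifiers at once.

```python
import math


def maic_weights(ipd_values, target_mean, iterations=50):
    """Method-of-moments weights for a single covariate.

    Minimizes sum(exp(alpha * (x_i - target_mean))), which is convex in alpha.
    At the minimum, the weighted mean of the IPD covariate equals the target.
    Assumes the target mean lies strictly inside the range of the IPD values.
    """
    centered = [x - target_mean for x in ipd_values]
    alpha = 0.0
    for _ in range(iterations):
        exps = [math.exp(alpha * c) for c in centered]
        gradient = sum(c * e for c, e in zip(centered, exps))
        hessian = sum(c * c * e for c, e in zip(centered, exps))
        alpha -= gradient / hessian  # Newton step
    return [math.exp(alpha * c) for c in centered]


# Hypothetical patient ages from an IPD trial (unweighted mean 42.5).
ages = [30.0, 35.0, 40.0, 45.0, 50.0, 55.0]
# Hypothetical mean age reported by a trial available only as AgD.
agd_mean_age = 40.0

weights = maic_weights(ages, agd_mean_age)
weighted_mean = sum(w * a for w, a in zip(weights, ages)) / sum(weights)

# Effective sample size: how much information remains after weighting.
ess = sum(weights) ** 2 / sum(w * w for w in weights)
print(f"weighted mean age = {weighted_mean:.2f}, ESS = {ess:.2f} of {len(ages)}")
```

The effective sample size falls below the number of patients whenever the weights are unequal, which illustrates why such adjustments become costly when the IPD and AgD populations differ substantially.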
There has been much less research conducted in the use of network meta-regression with limited IPD, particularly around the Jansen-Jackson methods of hierarchical meta-regression that can be described as population-adjusted individual-level indirect comparisons rather than population adjusted study-level indirect comparisons. What is clear is that these adjustment methods that leverage information from IPD allow for better adjustments that ultimately lead to less biased estimates. Given that the HIV population is heterogeneous to begin with – across a wide socio-economic range, races, risk groups, ages and more – using IPD within NMA for ART RCTs is likely to improve the statistics that are obtained from such evidence synthesis. 1.4 Summary of knowledge gaps Since 2006, the combination of EFV, FTC, and TDF has been available as a once-daily fixed-dose combination under the name of Atripla. With its high efficacy, reduction of pill burden and simplification of treatment, it has been recommended as the preferred first-line of treatment by the WHO since 2008. Since it has been established as the preferred first-line treatment, many ARVs have been developed, including drugs within new classes, such as INSTIs. Given a large number of treatment options, it was unclear whether one of the newer ART regimens (regimens developed after Atripla) meets the requirements to be the preferred first-line regimen in clinical guidelines that use a public health framework (knowledge gap 1).  Conventional NMA uses aggregate data from RCTs and requires that all treatments included in the analysis be connected within a single network.58 Lately, one area of focus with respect to new methodology within this field of research has been the expansion of methods to combine aggregate data with individual patient data. 
The latest technical support document (TSD18) released by the National Institute for Health and Clinical Excellence (NICE) Decision Support Unit (DSU) provides an overview of two methods that allow for population-based adjustments when combining IPD and aggregate data; namely, MAIC and STC.55 The motivation for these methods principally stems from allowing for better adjustments of effect-modifiers and for allowing for unanchored networks. The latter has become crucial in the wake of increased fast-tracking by the FDA.59 Despite guidance on how to combine IPD and aggregate data, it remained unclear whether the use of IPD could lead to improved statistics for the HIV evidence base (knowledge gap 2).

The use of IPD within evidence synthesis is gaining popularity and there is well-established guidance, such as the aforementioned TSD18, on how to use these data in the context of anchored and unanchored networks.55 Nonetheless, there remain many areas that require further research to optimize the use of IPD within NMA. STC and MAIC tend to be restricted to use in small networks, often with only three nodes. However, IPD-based meta-regression, as discussed in knowledge gap 2, can be used in networks of any size. It remains to be determined when the proportion of data from IPD is too small relative to the overall network to impact the NMA results. Could it be known beforehand, based on network characteristics, whether the inclusion of IPD will be of limited utility? (knowledge gap 3).

1.5 Dissertation objectives

The overarching aim of this dissertation is to use, and improve upon, network meta-analyses to compare the clinical efficacy, tolerability and safety of competing HIV ART regimens to help inform clinical guidelines. Chapter-specific aims and research objectives are outlined in Table 1-1. 
Table 1-1: Scope of the dissertation research program

Chapter 2
Title: Systematic literature review and network meta-analysis of first-line ART
Aim: To determine whether an ARV can replace EFV as the anchor treatment in the preferred first-line ART regimen
Research objectives:
a) To identify and obtain the evidence base of RCTs pertaining to currently used ART for the treatment of treatment-naïve HIV patients;
b) To estimate the comparative efficacy, tolerability and safety of currently used ART for the treatment of treatment-naïve HIV patients;
c) To use the above two steps to answer the following two research questions:
• Should DTG be recommended as the preferred first-line antiretroviral agent in combination with an age-appropriate backbone (TDF + XTC for adults and adolescents) for the treatment of HIV?
• Should EFV400 be preferred over EFV (standard-dose) for the first-line antiretroviral agent in combination with an age-appropriate backbone for the treatment of HIV?

Chapter 3
Title: Investigation into the benefits of using IPD for the network meta-analysis of first-line ART
Aim: To determine the benefits of using IPD in the evidence synthesis of first-line ART
Research objectives:
a) To examine the change in outputs in the evidence synthesis of ART among first-line HIV patients, with a particular focus on the relative efficacy, safety and tolerability of DTG relative to other anchor treatments, when including IPD; and
b) To further compare the extent of this impact using different established IPD-based methods for meta-regression adjustments using a mixture of IPD and AgD.

Chapter 4
Title: When does use of individual patient data make a difference?
Aim: To determine whether dilution of IPD within a network (i.e., reduction of the relative proportion of data coming from IPD) reduces its utility
Research objectives:
a) To identify, through use of simulations, statistics that can help identify when the inclusion of IPD might not be useful;
b) To apply the IPD within a smaller first-line ART network that meets the criteria, identified in the simulation studies, under which IPD are likely to be impactful.

1.6 Dissertation outline

This dissertation is structured using five chapters. This introductory chapter has provided the program’s foundational background on the disease area, concepts and methods. It has introduced the knowledge gaps and, from these, has stated the dissertation objectives.

Chapter 2 describes the methods and results of the SLR and NMA conducted among first-line ART regimens for the treatment of HIV. Each of the methods and results sections first describes the SLR, which was conducted in two stages (2015 and 2018). The subsequent construction of the analysis set from the evidence base is also provided. Results of analyses conducted using the analysis set are then presented for adults and adolescents, followed by results within each of the key subpopulations.

Chapter 3 builds upon Chapter 2 by expanding upon its analyses through the inclusion of individual patient data obtained through ClinicalStudyDataRequest.com. It compares a variety of methods for including individual patient data with respect to the estimates that were obtained. The chapter also discusses some of the challenges that do not appear to have been discussed previously, such as how to properly construct the statistics required for model selection.

Similarly to Chapter 3, Chapter 4 builds upon the previous chapter. It was motivated by the results of Chapter 3 and uses simulations to determine when IPD are relatively too few to have a meaningful impact on NMA results. 
It first uses simulations to construct data under a variety of conditions and then compares the results of the corresponding analyses across networks of ever-growing size (both in terms of treatments and in terms of trials).

Lastly, Chapter 5 concludes by discussing this program of research, contextualizing it within related developments over the course of its development, and identifying strengths, limitations and areas for future research. Of particular note, this chapter discusses the impact that using and not using NMA may have had on the clinical guidelines from various authorities, as well as the implications of the results of the Tsepamo study on this program of research.60,61 Upon completion of Chapters 2 and 3, results from the Tsepamo study were released that provided a signal of potentially higher rates of neural tube defects among women with pre-conception exposure to DTG.62 While the question of pre-conception exposure to DTG differs from the issue of first-line therapy among pregnant and breastfeeding women, these results did send a shockwave through the HIV clinical community. The chapter discusses results of the Tsepamo study as complementary research and how they nuance the results of the evidence synthesis undertaken in this doctoral research.

Chapter 2: Systematic literature review and network meta-analysis of first-line ART

2.1 Synopsis

Background: The 2016 WHO HIV clinical guidelines recommend EFV+XTC+TDF as the preferred first-line therapy, despite the SLR and NMA informing them revealing improved tolerability and efficacy with DTG and, to a lesser extent, EFV400. Rather, DTG and EFV400 are recommended as alternative first-line regimens due to uncertainty around sub-populations. With more sub-population data available and the price of manufacturing decreasing, we sought to evaluate the comparative effectiveness, tolerability and safety of first-line ART.
Objective: The research questions for this chapter were: 1) Should DTG be recommended as the preferred first-line anchor antiretroviral agent in ART for the treatment of HIV? 2) Should EFV400 be preferred over EFV (standard-dose) for the first-line anchor antiretroviral agent in combination with XTC + TDF (or an age-appropriate alternative for the children sub-population) for the treatment of HIV? Methods: Systematic database searches, for the SLR, were conducted on 12 February 2018 to identify publications reporting on relevant RCTs in: MEDLINE, EMBASE, and CENTRAL through Ovid. The outcomes included: viral suppression, mean change in CD4 cell counts, mortality, AIDS defining illnesses, retention, discontinuation due to adverse events, adverse events and regimen switching. This process was repeated for TB-HIV co-infection, pregnant and breastfeeding women, and children sub-populations. The network meta-analyses were performed in a Bayesian framework. Given that the research questions focus on the anchor agents of ART with a specific backbone (XTC+TDF), network nodes were defined in terms of specific ARVs. The analyses used an arm-based meta-regression adjustment to account for differences in backbones and trial-based meta-regression to adjust for differences in baseline CD4, HIV RNA and proportion of males. Results: A total of 2,815 citations were identified through database searches for the SLR update. Altogether, 163 publications describing 90 trials were selected for the principal population. Of the 17 new trials added since the previous guidelines, 4 included DTG, 3 included DOR, 2 included BIC, and 3 were endonodal on EVG/c comparing TAF to TDF. No evidence was available to describe the use of EFV400 in any of the sub-populations, so only the comparison of DTG vs EFV was considered in the sub-populations. 
The analysis set was constructed from the SLR by removing trials of older treatments that are no longer used (e.g., indinavir), a disconnected trial and a trial with an unapproved dose of BIC. There was high certainty of improved viral suppression (odds ratio [OR]: 1.93; 95% credible interval [CrI]: 1.52, 2.47 at 96 weeks), discontinuations (OR: 0.49; 95% CrI: 0.44, 0.62) and discontinuations due to AEs (OR: 0.30; 95% CrI: 0.19, 0.47) for DTG relative to EFV. This was supported by moderate certainty of improvements in CD4 cell counts (mean difference [md]: 22.87; 95% CrI: 8.29, 37.40), and both treatment-related (OR: 0.33; 95% CrI: 0.25, 0.44) and treatment-emergent AEs (OR: 0.63; 95% CrI: 0.38, 1.11). Due to low numbers of events, imprecise estimates and some risk of bias, there was low to very-low certainty of improved efficacy at 144 weeks, mortality and ADIs, SAEs and regimen substitutions. For EFV400 relative to standard-dose EFV, there was high certainty of improvements in discontinuations due to AEs (OR: 0.42; 95% CrI: 0.22, 0.77). Otherwise, efficacy and safety tended to have moderate quality evidence due to imprecision.

For the treatment of patients with HIV-TB co-infection, the systematic literature review identified an interim analysis from the ongoing INSPIRING trial, a Phase III open-label RCT comparing twice-daily DTG 50 mg to once-daily EFV 600 mg. This was an important improvement over previous SLRs. DTG and EFV were not statistically differentiable with respect to viral suppression (OR: 0.54; 95% CrI: 0.19, 1.57), but DTG led to larger increases in CD4 (md: 52.52 cells/mm³; 95% CrI: 14.93, 89.61).

Two studies among pregnant and breastfeeding women were identified: the DolPHIN-1 trial and the Tsepamo study. The Tsepamo study was a large cohort study of 1,729 pregnant women initiating DTG+TDF+XTC and 4,593 women initiating EFV+TDF+XTC in Botswana.
The proportions of pregnancies with any adverse birth outcome were similar across treatment arms with 33.2% of DTG-managed pregnancies and 35.0% of EFV-managed pregnancies. Similarly, severe birth outcomes were reported in 10.7% of DTG-managed and 11.3% of EFV-managed pregnancies. For a variety of safety outcomes, due to the risk of bias associated with an observational study there was moderate certainty that DTG was as safe or safer than EFV. Finally, for children, there was a real lack of evidence with respect to DTG, which was only reported in a single study among treatment-experienced (i.e., not eligible) patients.  Discussion: This extensive systematic literature review and network meta-analysis to evaluate the comparative efficacy, tolerability and safety of these and other ART regimens drew strong conclusions about the improved efficacy and tolerability of DTG relative to EFV. Moreover, the evidence synthesis supports the use of DTG among sub-populations, which was not the case when the previous guidelines were formed. This study has numerous strengths and limitations. The use of NMA allowed for analytic adjustments to account for differences in backbones. However, the evidence for the comparisons of interest continued to be somewhat limited in sub-populations for EFV400 and was missing among children for DTG. DTG+XTC+TDF is an effective, safe and tolerable ART regimen. Across a variety of outcomes, evidence strongly suggests that it is superior to the current efavirenz-based preferred first-line ART regimen. With a new affordable generic fixed-dose combination and comparable outcomes among sub-populations, the evidence supports the choice of a DTG based preferred first-line regimen. Conclusions regarding EFV400 are unchanged since the 2015 reviews. EFV400 appears to be more tolerable than standard-dose EFV, but with lack of evidence in sub-populations, it is likely best to be considered an alternative first-line regimen.  
2.2 Introduction

The efficacy and safety of initial HIV ART have improved over the years and currently more than 17 million people living with HIV (PLWH) are receiving life-saving ART.63 Improvements in potency, tolerability, simplicity and availability of first-line ART have resulted in increased life expectancy and quality of life for PLWH, when treatments are accessed in a timely and consistent manner.64 Hence, the selection of first-line ART has important public health and programmatic implications. With the effectiveness and safety of regimens as key considerations, many ART programs, particularly in LMICs, are influenced by the WHO ART guidelines.65

Clinical guidelines are developed through multi-step processes that ensure that they are feasible within the current clinical environment and that they are evidence-based.66 A key step involves evidence synthesis, whereby all of the evidence is collected and analyzed so as to provide an overview of the therapeutic landscape. Guideline development is a particularly iterative scientific process. The evidence is re-evaluated every few years and recommendations are updated accordingly. The WHO consolidated HIV guidelines, for instance, are updated every 2-3 years. Because the process is iterative, the natural starting point of an update is the set of conclusions from the evidence synthesis of the previous update.

In 2015, the WHO conducted evidence synthesis to update the 2013 Consolidated guidelines on the use of antiretrovirals for treating and preventing HIV.24 At the time, the hypothesis was that INSTI-based regimens or low-dose efavirenz (EFV400)-based regimens would challenge the preferred recommended first-line regimen.
The 2013 guidelines recommended, for adults and adolescents, a first-line ART consisting of two NRTIs and an NNRTI.6 In particular, the preferred first-line therapy was the combination of EFV + TDF + XTC (whose fixed dose combination is called Atripla).6 Results of the 2015 SLR and NMA revealed improved tolerability and efficacy relative to standard-dose EFV with INSTIs and EFV400, with DTG having the highest estimated efficacy and tolerability.24 Despite this evidence, DTG and EFV400 were recommended as anchor treatments to alternative first-line regimens rather than the preferred treatment.21 This recommendation was due to uncertainty around sub-populations and a high price that made it difficult to recommend for LMICs. As mentioned in Chapter 1, the importance of the sub-populations centers around the public health approach adopted by the WHO, by which guidance is kept as simple as possible in order to facilitate task-shifting and other strategies within resource-limited settings.7 Explicitly, while personalized care, involving genetic testing for example, is provided in resource-rich settings, it cannot be provided in LMICs. Personalized care certainly leads to optimal treatment outcomes for the individual patients and should be favoured when feasible. In LMICs, the primary objective is providing treatment to all patients who need it, as part of the ongoing treatment scale-up effort. Having simplified treatment algorithms helps optimize the supply chain, reducing the risk of stock-outs and other potentially detrimental issues, and allows for task shifting, which is critical where there are shortages in trained health care providers.
Vitoria and colleagues discussed the existing barriers and opportunities for the introduction of new antiretrovirals into national treatment programs in low-income and middle-income countries to support further treatment scale-up.67 Their conclusions were that additional efficacy and safety data among key sub-populations were needed.

The therapeutic landscape includes a large collection of antiviral agents. Table 2-1 provides background on various agents. ARVs not listed below are either of a class that was not of interest here (e.g., fusion inhibitors, primarily used as part of a salvage therapy) or have been out of use for an extended period of time.

Table 2-1: List of antiviral agents, their drug class, year of approval and how they are currently used and perceived clinically

Antiretroviral name | Abbreviation | Class | Year of FDA approval | Use today
--- | --- | --- | --- | ---
Abacavir | ABC | NRTI | 1998 | Still commonly used; however, use is restricted to countries with access to genetic testing given the drug’s immunopathogenesis
Atazanavir | ATV | PI | 2003 | Used today and part of one of the preferred second-line regimens
Zidovudine | AZT | NRTI | 1995 | Still used today, particularly for prevention of mother to child transmission and in lower-income settings
Bictegravir | BIC | INSTI | 2018 | As a newer drug, it is only beginning to be prescribed in high-income settings
Stavudine | D4T | NRTI | 1994 | Less commonly used today due to high rate of adverse events
Didanosine | ddI | NRTI | 1991 | Less commonly used today due to high rate of adverse events
Doravirine | DOR | NNRTI | 2018 | As a newer drug, it is only beginning to be prescribed in high-income settings
Darunavir | DRV | PI | 2006 | Still commonly used today. Viewed as a drug that has high tolerability, but limited efficacy among patients with high viral loads
Dolutegravir | DTG | INSTI | 2013 | Commonly used today in high-income settings and some low-income settings (e.g. Botswana)
Efavirenz | EFV | NNRTI | 1998 | Remains one of the most commonly used first-line ARVs
Low dose efavirenz | EFV400 | NNRTI | 2018 | Viewed as more tolerable than, yet as efficacious as, full-dose EFV; its use is mostly constrained to experimental settings for now
Elvitegravir | EVG | INSTI | 2012 | Usually combined with cobicistat and used in high-resource settings
Emtricitabine | FTC | NRTI | 2003 | Among the most commonly used ARVs today
Indinavir | IDV | PI | 1996 | Seldom used due to limited efficacy
Lamivudine | 3TC | NRTI | 1995 | Still commonly used. Considered interchangeable with emtricitabine.
Lopinavir | LPV | PI | 2000 | Used today and part of one of the preferred second-line regimens
Nelfinavir | NFV | PI | 1997 | Seldom used due to limited efficacy
Nevirapine | NVP | NNRTI | 1996 | Still used today, particularly in LMICs, and is among the more commonly used for TB-coinfected patients
Raltegravir | RAL | INSTI | 2007 | Commonly used across a variety of settings. Twice daily use rather than once daily.
Rilpivirine | RPV | NNRTI | 2011 | Used today and seen as a favourable treatment for high viral load patients
Saquinavir | SQV | PI | 1997 | Seldom used today due to lower efficacy
Tenofovir alafenamide | TAF | NRTI | 2016 | Being used increasingly given its favourable safety profile
Tenofovir disoproxil | TDF | NRTI | 1996 | Remains one of the most commonly used first-line ARVs. May lead to bone mass density issues

With numerous changes, including the availability of generic fixed-dose combinations of DTG + TDF + XTC, the current SLR and NMA aimed to determine the efficacy and safety of DTG and EFV400 relative to standard-dose EFV, including within the sub-populations discussed by Vitoria et al.67 The focus on these treatments is based on the SLR and NMA used to form the latest guidelines.24

The updated SLR and aggregate data NMA are described in this chapter.
Of note, the evidence base used within this chapter, and the methods used to obtain it, are provided in detail in the publicly available statistical analysis plan for Chapter 3.68 Critical details are provided here, and the reader is referred to the statistical analysis plan for details deemed of minor importance.

2.3 Objectives

The objective of this chapter is to compare the efficacy and safety of first-line ART regimens. To do so, the aims are:
a) To identify and obtain the evidence base of RCTs pertaining to modern ART for the treatment of treatment-naïve HIV patients;
b) To estimate the comparative efficacy, tolerability and safety of modern ART for the treatment of treatment-naïve HIV patients;
c) Given the knowledge accumulated through previous guidelines and the current literature, to use the above two steps to answer the following research questions:
• Should DTG be recommended as the preferred first-line antiretroviral agent in combination with an age-appropriate backbone (TDF + XTC for adults and adolescents) for the treatment of HIV?
• Should EFV400 be preferred over EFV (standard-dose) for the first-line antiretroviral agent in combination with an age-appropriate backbone for the treatment of HIV?

2.4 Methodology

As explained in the Introduction, the work conducted for the previous WHO guidelines helped formulate the current research questions, and, as explained in the Preface, the work on the previous WHO guidelines was part of this doctoral research. The 2015 SLR and NMA have been published,24 and informed the 2016 guidelines.21 Given these two aspects of the relationship (the informing and the work having been part of this doctoral research program), the methods and results do reference the work that was previously done. Doing so is also critical when viewing guideline development as an iterative, ongoing process rather than as a collection of independent investigations.
2.4.1 Systematic literature review

Table 2-2 describes the PICOS (population, interventions, comparator, outcomes, study design) criteria used to guide the selection of studies that were included in this systematic literature review. Note that both research questions are captured by this single PICOS. Moreover, no restrictions were put on calendar time, but given that development of these interventions began in the late 80s and early 90s, the intervention and comparator terms naturally limited the results from a temporal perspective (i.e., these removed all results from the 70s and 80s).

Table 2-2: Scope of the systematic literature review in PICOS form

Criteria: Population
Definition:
Inclusion criteria:
• Treatment-naïve adults and adolescents (12 years and above) living with HIV
Subgroups of interest:
• Children
• Pregnant and breastfeeding women
• TB co-infected patients

Criteria: Interventions
Definition:
• DTG + 2 NRTI
• EFV400 + 2 NRTI
• Raltegravir (RAL) + 2 NRTI
• Elvitegravir boosted with cobicistat (EVG/c) + 2 NRTI
• Bictegravir (BIC) + 2 NRTI
• Doravirine (DOR) + 2 NRTI
• Rilpivirine (RPV) + 2 NRTI
• Nevirapine (NVP) + 2 NRTI
• Darunavir boosted with ritonavir (DRV/r) + 2 NRTI
• Atazanavir boosted with ritonavir (ATV/r) + 2 NRTI
• Lopinavir boosted with ritonavir (LPV/r) + 2 NRTI

Criteria: Comparator
Definition:
• EFV600 + 2 NRTI

Criteria: Outcomes
Definition:
• Viral suppression at 48 and 96 weeks
• Change from baseline CD4 at 48 and 96 weeks
• Mortality
• Retention
• Discontinuations due to adverse events
• Treatment emergent adverse events
• Severe adverse events
• Development of drug resistance

Criteria: Study design
Definition:
Inclusion criteria:
• Randomized controlled trials (RCTs)
Additionally, for subgroups:
• Single-arm non-randomized controlled trials
• Prospective and retrospective cohort studies
• Case-control studies
• Controlled and uncontrolled longitudinal studies (cohorts or case series)

Criteria: Language
Definition: Only studies published in English

Criteria: Time
Definition: Minimum follow-up time of 24 weeks

*Note: Except for DTG, EFV400 and EFV600, treatments are required to provide indirect evidence

The population listed above, treatment-naïve adults and adolescents, represents the population for the principal analysis of this evidence synthesis research. Although the principal inclusion criteria described above were broad enough to capture the sub-populations, less restrictive criteria on study design were required to obtain meaningful evidence on them. Thus, additional searches were conducted for each sub-population.

Treatments were differentiated according to the specific drugs, doses, and frequencies of administration. The only drugs that were considered interchangeable were lamivudine (3TC) and emtricitabine (FTC) due to their molecular likeness, referred to here as XTC. Non-standard doses were not considered as a reason for exclusion at the study selection process; however, non-standard doses that did not serve as connectors (i.e. were not compared to two or more treatments of interest) were excluded in the final selection stage (following full-text selection). ART regimens with a single antiviral agent and those with two agents that included one or more NRTI were not considered eligible. Similarly, with the exception of boosted regimens, ART regimens with four or more agents were not eligible (e.g. NNRTI+PI+2NRTI). Trials that had mixed backbones were included if the backbones were equally distributed across arms. Trials in which backbones were selected prior to randomization were considered eligible. Trials failing to report on backbone distribution or reporting imbalanced backbone distributions were excluded. Further details on how regimens were defined for analytical purposes are provided below (Section 2.4.2.2). The eligibility criteria for inclusion within the SLR remained generally unchanged relative to the 2015 SLR; however, two new treatments, BIC and DOR, were added to the network.
The motivation for adding them is that they might provide additional indirect evidence for the comparisons of interest and that there may be some secondary utility to understanding their efficacy and safety relative to the treatments of interest. Additionally, the use of tenofovir alafenamide (TAF) as a backbone was now permitted given that the evidence base for this treatment has grown substantially since 2015.

2.4.1.1 Sources

A comprehensive systematic search of the literature was conducted using the following databases: Medical Literature Analysis and Retrieval System Online (MEDLINE), Excerpta Medica database (EMBASE), and Cochrane Central Register of Controlled Trials (CENTRAL). The current systematic review is an update of a review completed in May 2015.24 Therefore, searches were restricted to the period from 01 January 2015 to the search date, 12 February 2018.

Further manual searches of the 2016, 2017 and 2018 Conference on Retroviruses and Opportunistic Infections (CROI), the 2016 AIDS conference, and the 2017 International AIDS Society (IAS) conference were conducted. Conference abstracts identified through the EMBASE search were eligible for inclusion. Additional studies were identified through a review of clinical trial registries and the reference lists of identified publications.

2.4.1.2 Search strategy

The general search strategy involved identifying papers according to the population of interest, the inclusion of interventions and comparators of interest, and the restriction to randomized controlled trials. The population was identified as having HIV or AIDS and not being treatment experienced or failing treatment. The searches further excluded publication types that were not of interest (i.e., newsletters and reviews).
The specific search strategies are presented in the statistical analysis plan.68

2.4.1.3 Study selection

Two investigators (Steve Kanters and Michael Zoratti), working independently, scanned all titles and abstracts identified in the literature search. The same two investigators independently reviewed potentially relevant records in full text. If any discrepancies occurred between the studies selected by the two investigators, a third investigator provided arbitration. The full-text screening was conducted for each specific question. The same approach was used for data extraction. Data extraction for the original studies was conducted by Steve Kanters and Evan Popoff.

2.4.1.4 Study quality

The validity of individual randomized controlled trials was assessed using the Risk of Bias instrument, endorsed by the Cochrane Collaboration.49 This instrument is used to evaluate 7 key domains: sequence generation; allocation concealment; blinding of participants and personnel; blinding of outcome assessors; incomplete outcome data; selective outcome reporting; and other sources of bias.

The validity of non-randomized studies, including single-arm trials, cohort studies, and observational studies, was evaluated using the Tool to Assess the Risk of Bias in Cohort Studies, developed by the Clinical Advances through Research and Information Translation (CLARITY) group at McMaster University (Hamilton, Canada). This 8-item instrument is used to evaluate various aspects of the research design and study execution, including the selection of patients, differences in patient characteristics, and the assessment of outcomes.

2.4.1.5 Data extraction

Two investigators, working independently, extracted data on study characteristics, interventions, patient characteristics at baseline, and outcomes for the study populations of interest for the final list of selected eligible studies.
Any discrepancies observed between the data extracted by the two data extractors were resolved by involving a third reviewer and coming to a consensus. Data are provided in a Microsoft Excel workbook with sheets corresponding to the different information categories. Further details on data extraction, including assessment of study quality, are provided in the statistical analysis plan.68

2.4.2 Statistical analyses

2.4.2.1 Network meta-analyses

Network meta-analyses (NMAs) were used to analyse all outcomes. These were conducted within the Bayesian framework using Bayesian hierarchical models. Under the assumption of consistency, the NMA model relates the data from the individual studies to basic parameters reflecting the (pooled) relative treatment-effect and safety profiles between interventions. Based on these parameters, the relative treatment-effects between each of the contrasts in the network were obtained. Below, the models for the NMAs that were used to evaluate the evidence base are presented. First consider the fixed- and random-effects models for a meta-analysis of the treatment B versus A comparison. This can then be built upon for multiple treatment comparisons. Note that, where possible, pairwise meta-analyses, such as those described in Equations 2-1 and 2-2, were conducted.

Fixed and random-effects meta-analysis for AB trials

The fixed effects meta-analysis model for RCTs is presented in Equation 2-1. In this equation, $\theta_{jk}$ reflects the ‘underlying’ outcome for treatment k in study j, transformed by a link function to a normally distributed scale (such as the logit): $\theta_{jk} = g(\gamma_{jk})$, where $g(\cdot)$ is the link function and $\gamma_{jk}$ are the unknown parameters of the likelihood function. In order to keep Equation 2-1 generalizable, the likelihood function is not presented as part of the equation because this would be data specific. For the sake of specificity, in the case of continuous data and a Normal likelihood, the likelihood equation would be $y_{jk} \sim N(\theta_{jk}, se_{jk}^2)$, where $y_{jk}$ is the observed mean response for treatment k in study j and $se_{jk}$ is its corresponding observed standard error. Similarly, for a dichotomous outcome with a binomial likelihood, the likelihood equation would be $r_{jk} \sim \mathrm{Bin}(p_{jk}, n_{jk})$, where $r_{jk}$ is the number of responders for treatment k in study j and $n_{jk}$ is the corresponding sample size. In this case, $\theta_{jk} = \mathrm{logit}(p_{jk})$.

\[
\theta_{jk} =
\begin{cases}
\mu_j & k = A \\
\mu_j + d & k = B
\end{cases}
\tag{Equation 2-1}
\]

Furthermore, $\mu_j$ represents this (transformed) outcome in trial j with comparator treatment A. $d$ is the underlying treatment-effect of B versus A on a normal scale that is the same for each study j.

With the random-effects meta-analysis model, $\delta_j$ is the trial-specific relative treatment-effect of B relative to A, as shown in Equation 2-2. These trial-specific relative effects are drawn from a random-effects distribution $\delta_j \sim \mathrm{Normal}(d, \sigma^2)$.

\[
\theta_{jk} =
\begin{cases}
\mu_j & k = A \\
\mu_j + \delta_j & k = B
\end{cases}
\qquad
\delta_j \sim \mathrm{Normal}(d, \sigma^2)
\tag{Equation 2-2}
\]

Otherwise, $\theta_{jk}$ represents the parameter stemming from the likelihood function and the observed data, as seen in Equation 2-1, and $\mu_j$ represents the baseline effect of the jth trial, again as seen in Equation 2-1. To be more precise, if the reference treatment in the jth trial is placebo, then the baseline effect can be described as the study effect: the impact of all prognostic factors, including placebo-effect, on the outcome of interest. Otherwise, the baseline effect is the combination of the study effect and the treatment effect of the reference treatment. The $\mu_j$ parameters are fixed intercepts and it is only the relative treatment effects $\delta_j$ that are drawn from a random distribution across studies.

Fixed effects network meta-analysis model

When the available evidence consists of a network of multiple pairwise comparisons (i.e. AB-trials, AC-trials, BC-trials, etc.) the standard fixed effects model for NMA can be specified as follows:

\[
\theta_{jk} =
\begin{cases}
\mu_{jb} & \text{if } k = b \\
\mu_{jb} + d_{Ak} - d_{Ab} & \text{if } k \succ b
\end{cases}
\qquad
d_{AA} = 0, \quad d_{Ak} \sim \mathrm{Normal}(0, 10\,000)
\tag{Equation 2-3}
\]

There are k treatments labelled as A, B, C, etc., and treatment A is taken to be the reference treatment for the analysis. The order of treatments follows the alphabet and the greater than operator ($\succ$) corresponds to a letter further down in the alphabet. $\mu_{jb}$ is the (transformed) outcome in study j on ‘baseline’ treatment b, which will vary across studies. $d_{bk}$ is the fixed effect of treatment k relative to ‘baseline treatment’ b. The $d_{bk}$ are identified by expressing them in terms of the reference treatment A: $d_{bk} = d_{Ak} - d_{Ab}$ with $d_{AA} = 0$.

Random-effects network meta-analysis model

\[
\theta_{jk} =
\begin{cases}
\mu_{jb} & \text{if } k = b \\
\mu_{jb} + \delta_{jbk} & \text{if } k \succ b
\end{cases}
\]
\[
\delta_{jbk} \sim \mathrm{Normal}(d_{bk}, \sigma^2) = \mathrm{Normal}(d_{Ak} - d_{Ab}, \sigma^2),
\qquad
d_{AA} = 0, \quad d_{Ak} \sim \mathrm{Normal}(0, 1000)
\tag{Equation 2-4}
\]

$\delta_{jbk}$ is the trial-specific treatment-effect of k relative to treatment b. These trial-specific effects are drawn from a random-effects distribution: $\delta_{jbk} \sim N(d_{bk}, \sigma^2)$. Again, the pooled effects, $d_{bk}$, are identified by expressing them in terms of the reference treatment A. The heterogeneity $\sigma^2$ is assumed constant for all treatment comparisons. (A fixed effect model is obtained if $\sigma^2$ equals zero.)

There are two important additional considerations regarding Equation 2-4. First, it makes the assumption that the between-study variance parameter $\sigma^2$ is shared for all network edges (treatment comparisons); and, second, it treats multiple-arm trials (>2 treatments) without taking account of the correlations between the trial-specific $\delta$s that they estimate. With respect to the first point, the more general model would include $\delta_{jbk} \sim \mathrm{Normal}(d_{bk}, \sigma_{bk}^2)$ rather than $\delta_{jbk} \sim \mathrm{Normal}(d_{bk}, \sigma^2)$.
The assumption that the between-trial variances are all equal has been suggested by others,69,70 and is supported by the NICE DSU TSD 2.58 Admittedly, the motivation for making this assumption is primarily the resulting conceptual and technical simplicity of the model. Additionally, particular to this evidence base (presented in Section 2.5), there were numerous edges with a single trial, for which solving for an edge-specific between-study heterogeneity is not feasible. Among these edges is the critical comparison between EFV and EFV400. By assuming a shared heterogeneity parameter, the model can account for the heterogeneity that would be anticipated on this and other edges were further trials available. As pointed out by the NICE DSU TSD 2, for those wishing to use a model that accounts for multiple between-study variances, Lu and Ades provide a model that overcomes some of the technical difficulties that arise from the restrictions that the consistency condition of NMA imposes on the multiple between-study variances.71 For the aforementioned reasons, this model was not used here.
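One consequence of a shared between-study variance is that, within a multi-arm trial, relative effects measured against a common baseline arm have covariance $\sigma^2/2$ (a correlation of 0.5), which is the structure that the multi-arm decomposition in Equations 2-5 and 2-6 relies on. As a minimal sketch, and not the code used for the analyses in this chapter, the Python below builds that covariance matrix with exact fractions and derives the conditional weights and variance for the i-th relative effect given the previous i-1. The function names `solve` and `conditional_params` are illustrative assumptions.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def conditional_params(i, sigma2=Fraction(1)):
    """Weights and variance of relative effect i given relative effects 1..i-1.

    Assumes variance sigma2 on the diagonal and sigma2 / 2 off the diagonal.
    """
    prev = i - 1
    if prev == 0:
        return [], sigma2
    half = sigma2 / 2
    cov_prev = [[sigma2 if r == c else half for c in range(prev)] for r in range(prev)]
    cross = [half] * prev
    weights = solve(cov_prev, cross)  # regression weights on the previous effects
    variance = sigma2 - sum(w * c for w, c in zip(weights, cross))
    return weights, variance


for i in range(1, 6):
    weights, variance = conditional_params(i)
    # Closed forms implied by the compound-symmetry structure
    assert all(w == Fraction(1, i) for w in weights)
    assert variance == Fraction(i + 1, 2 * i)
```

Under these assumptions each previous deviation receives weight 1/i and the conditional variance is (i+1)/(2i) times $\sigma^2$, so a three-arm trial (two relative effects) has conditional variance $3\sigma^2/4$ for its second effect.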
With respect to the second consideration, Bayesian random-effects models with a heterogeneity parameter for $d_{Ak}$ can be easily extended to fit trials with 3 or more treatment arms by decomposing a multivariate normal distribution as a series of conditional univariate distributions.72 The compound symmetry in the correlations (all are 0.5) stems from the consistency equations and the homogeneity of between-study variances.58

$$\begin{pmatrix} \delta_{jbk_1} \\ \vdots \\ \delta_{jbk_p} \end{pmatrix} \sim \text{normal}\left( \begin{pmatrix} d_{bk_1} \\ \vdots \\ d_{bk_p} \end{pmatrix}, \begin{pmatrix} \sigma^2 & \cdots & \sigma^2/2 \\ \vdots & \ddots & \vdots \\ \sigma^2/2 & \cdots & \sigma^2 \end{pmatrix} \right)$$

Equation 2-5

Then the conditional univariate distributions for arm i given the previous arms 1, …, (i−1) are:

$$\delta_{jbk_i} \,\Big|\, \begin{pmatrix} \delta_{jbk_1} \\ \vdots \\ \delta_{jbk_{i-1}} \end{pmatrix} \sim \text{normal}\left( d_{bk_i} + \frac{1}{i} \sum_{t=1}^{i-1} \left( \delta_{jbk_t} - d_{bk_t} \right),\ \frac{i+1}{2i}\,\sigma^2 \right)$$

Equation 2-6

Random-effects network meta-analysis model with trial-specific covariate interaction term

If there are systematic differences in effect-modifiers across studies causing heterogeneity or inconsistency, attempting to explain this by extending the models with covariates may be useful and reduce bias.

$$\theta_{jk} = \begin{cases} \mu_{jb} & \text{if } k = b \\ \mu_{jb} + \delta_{jbk} & \text{if } k \succ b \end{cases}$$

$$\delta_{jbk} \sim \begin{cases} \text{normal}\left( d_{Ak} - d_{Ab} + \sum_{l} (\beta_{lk} - \beta_{lA})\, x_{lj},\ \sigma^2 \right) & \text{if } b = A \\ \text{normal}(d_{Ak} - d_{Ab},\ \sigma^2) & \text{if } b \neq A \end{cases}$$

$$d_{AA} = 0, \quad d_{Ak} \sim \text{normal}(0,\ 1000), \quad \beta_{lk} = b_l, \quad b_l \sim \text{normal}(0,\ 1000)$$

Equation 2-7

$x_{lj}$ is the $l$th trial-specific covariate value. $\beta_{lk}$ is the corresponding treatment-by-covariate interaction term. This implies an exchangeable parametrization of the covariate term, as suggested by the NICE DSU TSD 3 document.40 The motivation for using a fixed-effects parametrization for the covariate adjustment is primarily a question of estimation power. In practice, estimates of the covariates tend to have very large credible intervals due to having a small number of trials from which to estimate them.
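Returning briefly to the multi-arm correction in Equations 2-5 and 2-6, the conditional decomposition can be checked numerically. The Python sketch below compares a closed form for the conditional mean and variance, namely a weight of 1/i on the earlier deviations and a variance of (i+1)/(2i) times the between-study variance, against standard multivariate normal conditioning on the compound-symmetric covariance matrix. All numerical inputs and function names are hypothetical and serve only as an illustration; this is not the JAGS implementation used for the analyses.

```python
import numpy as np

def conditional_closed_form(delta_prev, d, sigma2):
    """Closed-form mean and variance of the i-th effect given the first i-1 effects."""
    delta_prev = np.asarray(delta_prev, dtype=float)
    d = np.asarray(d, dtype=float)
    i = len(delta_prev) + 1
    mean = d[i - 1] + np.sum(delta_prev - d[: i - 1]) / i
    var = (i + 1) / (2 * i) * sigma2
    return mean, var

def conditional_from_mvn(delta_prev, d, sigma2):
    """Same quantities obtained by conditioning the multivariate normal directly."""
    delta_prev = np.asarray(delta_prev, dtype=float)
    d = np.asarray(d, dtype=float)
    p = len(d)
    i = len(delta_prev) + 1
    # Compound symmetry: sigma2 on the diagonal, sigma2 / 2 elsewhere.
    cov = sigma2 * (0.5 * np.eye(p) + 0.5 * np.ones((p, p)))
    s11 = cov[: i - 1, : i - 1]
    s21 = cov[i - 1, : i - 1]
    weights = np.linalg.solve(s11, s21)
    mean = d[i - 1] + weights @ (delta_prev - d[: i - 1])
    var = cov[i - 1, i - 1] - weights @ s21
    return mean, var

# Hypothetical four-arm trial: three effects relative to the baseline arm.
d = [0.4, 0.1, -0.2]
sigma2 = 0.09

second_closed = conditional_closed_form([0.55], d, sigma2)
second_mvn = conditional_from_mvn([0.55], d, sigma2)
third_closed = conditional_closed_form([0.55, 0.0], d, sigma2)
third_mvn = conditional_from_mvn([0.55, 0.0], d, sigma2)
```

The two routes agree for the second and third effects, which is what allows a sampler to draw each trial-specific effect in turn from a univariate normal distribution rather than from the joint distribution.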
Furthermore, it seems reasonable to assume that there would be a larger heterogeneity across trials with respect to the relative treatment-effects ($\delta_{jbk}$) rather than the manner in which covariates act as effect-modifiers to these treatment-effects.

Random-effects network meta-analysis model with treatment-specific covariate interaction term

While meta-regression as described above is common, arm-based meta-regression adjustments are seldom used. This is because between-arm differences are nullified through randomization in RCTs; however, as will be explained below, there are valid reasons to use this approach here.

$$\theta_{jk} = \begin{cases} \mu_{jb} & \text{if } k = b \\ \mu_{jb} + \delta_{jbk} + \sum_{l} \beta_{l} \left( x_{ljk} - x_{ljb} \right) & \text{if } k \succ b \end{cases}$$

$$\delta_{jbk} \sim \text{normal}(d_{bk},\ \sigma^2) = \text{normal}(d_{Ak} - d_{Ab},\ \sigma^2) \qquad d_{AA} = 0, \quad d_{Ak} \sim \text{normal}(0,\ 1000), \quad \beta_{l} \sim \text{normal}(0,\ 1000)$$

Equation 2-8

Equations for the independent means models are presented in the statistical analysis plan.68

2.4.2.2 Node definitions and backbone adjustments

Given that the research questions for this chapter concern anchor treatments (i.e., non-backbone antivirals; also referred to as third agents), the nodes were defined in terms of specific antivirals rather than specific ART regimens. Treatments with multiple standard doses or frequencies of administration were not differentiated on this basis. For example, nevirapine 200 mg twice daily (bid) was considered equivalent to nevirapine 400 mg once daily (qd). However, most treatments only had a single posology. The only treatment whose doses were distinguished in the analysis was efavirenz: standard dose efavirenz (600 mg qd) and low dose efavirenz (400 mg qd). Defining nodes according to a single ARV rather than the full regimen greatly simplified the modelling and the interpretation of results. Nonetheless, it is important to account for differences in backbone therapies.
RCTs that use the same backbone in all trial arms do not require any adjustment in terms of backbones; however, RCTs employing different backbones require adjustments in order to properly measure the effect attributable to the antiviral agent comparison being estimated. Two approaches were used to address differences in backbone regimens. First, backbone regimens were categorized as TDF + XTC (the reference category), TAF + XTC, abacavir (ABC) + XTC, zidovudine (AZT) + XTC, and as other. The other category included treatments such as stavudine (d4T) and didanosine (ddI) as well as the agents contained in the previous categories. Arm-based meta-regression was used to adjust estimates for differences in backbones according to these categories. The alternative approach was to simply reduce the evidence base to trials that did not differ with respect to backbones. By choosing the backbone of the preferred first-line regimen as the reference category, the results represent those expected had all trials used the backbone of the preferred first-line ART regimen. This meta-regression adjustment approach assumed that the backbone is not an effect modifier of the effect of the anchor treatment.

The most notable trial to differ in backbones was the SINGLE trial comparing EFV to DTG,73 which is central to the research questions. Otherwise, trials that differed in backbones tended to be older or to be endonodal. Endonodal trials are those that compare a node to itself. Indeed, some trials were only included to improve the estimation of the meta-regression adjustments. The adjusted model served as the primary analysis; however, in outcomes where there were too few differences in backbones, the restricted model was used instead.

2.4.2.3 Models

All outcomes were either binary or continuous.
Viral suppression and CD4 outcomes were frequently reported at multiple time points and were analysed separately for each of the three time points of interest: 24 weeks, 48 weeks, and 96 weeks. The remaining outcomes tended to be reported at a single time point, which varied and typically coincided with trial duration. During the feasibility assessment stage, the relationship between follow-up time and outcomes was explored. The figures considering trends in both proportions and odds ratios across time are shown in Appendix H of the statistical analysis plan.68 Results suggest that the odds ratios tend to be stable over time or include an equal amount of downward and upward trends. On this basis, the relative treatment-effects were modelled on all remaining variables using the outcomes combined across multiple time points. For studies reporting one of these outcomes at multiple time points, the values at the longest follow-up were used. For binary outcomes (mortality, AIDS-defining illnesses, viral suppression, loss to follow-up, serious adverse events, and regimen substitutions) a logistic regression model with the logit link function and a binomial likelihood was used (avoiding the need for approximate methods).74 Results are presented as odds ratios (OR) for these models to avoid the ceiling effect that limits relative risks (RR) for outcomes with proportions around 0.8 to 0.95. To test for the presence of heterogeneity, both the fixed-effects and random-effects models were employed. For the random-effects model, the conventional non-informative prior, a uniform distribution between 0 and 2, was applied to the between-trial standard deviation.58 For continuous outcomes (increase in CD4 count) linear regression models with an identity link and normal likelihood were used. The data were arm-based, and the differences in change from baseline between all informed treatment comparisons were modelled.
Estimates of comparative efficacy were represented as mean differences.

2.4.2.4 Adjusted analysis

Adjusted analyses took two forms. First, meta-regression adjustments were conducted to evaluate whether differences in baseline CD4, baseline log-transformed viral load, and proportion of males impacted relative efficacy and safety estimates. Second, sensitivity analyses were conducted. For viral suppression, the intention-to-treat (ITT) outcomes were used for the primary analysis and per-protocol outcomes were considered as a sensitivity analysis. Additionally, multiple cut-off values were reported for the definition of viral suppression. Newer trials tend to use a cut-off of <50 copies/mL, but some trials used higher cut-off values, <200 and <400 copies/mL, due to the limited sensitivity of older assays (i.e., the tools used to measure the level of viremia in plasma). While the cut-off does affect the absolute count, no evidence was found to suggest that it alters relative treatment-effects. Thus, for the primary analysis, all trials were included regardless of the cut-off used, and as a sensitivity analysis only trials using the <50 copies/mL cut-off were included. In trials where multiple cut-off values were reported, <50 copies/mL was favoured over <200 copies/mL, which was favoured over <400 copies/mL.

2.4.2.5 Evaluation of consistency between direct and indirect comparisons

Prior to the NMA, the consistency between direct and indirect comparisons was evaluated for networks that consisted of closed loops. For each of the comparisons (i.e. contrasts) that were part of a closed loop made up of more than 1 RCT, the available trials were split into direct and indirect information. For each contrast in question, two (pooled) relative treatment-effect estimates were obtained, one with independent-means (or independent-effects) models using only the trials providing direct comparisons, and one based on an NMA of the remaining trials providing only indirect evidence.
This iterative technique is called edge-splitting.75 The difference in estimates generated by the two sets of evidence was evaluated with the Bucher test for inconsistency.38

2.4.2.6 Model selection

For each outcome and subgroup of interest, fixed or random-effects models were applied (except for certain outcomes where random-effects were not feasible). Model selection was conducted using the deviance information criterion (DIC) according to NICE conventions.58 The DIC provides a measure of model fit that penalizes for model complexity. Through the use of the DIC, the fixed effects models were often favoured. Model fit was also assessed using leverage plots and any outliers identified in this fashion were investigated further.58 The model with the best fit was chosen as the primary analysis model. In situations with very limited and sparse data, a narrative review was used as an alternative to quantitative analysis. These situations were restricted to the sub-population analyses.

2.4.2.7 Software

The parameters of the different models were estimated using a Markov Chain Monte Carlo method implemented in the JAGS software package. A first series of 50,000 iterations from the JAGS sampler were discarded as ‘burn-in’, and the inferences were based on an additional 50,000 iterations using two chains. All analyses were performed using R version 3.4.4 (http://www.r-project.org/) and JAGS version 4.3. The code used to conduct the analyses is presented in Appendix A1.

2.5 Results: Adults and adolescents

2.5.1 Systematic literature review study selection

Following the PICOS outlined in Table 2-2 and employing the search strategies specified in Appendix A of the statistical analysis plan,68 a total of 2,815 citations were identified through database searches for the SLR update; of these 204 were selected for full-text review. The flow diagram for study selection is presented in Figure 2-1. As previously explained, this is an expansion of the SLR conducted in 2015.
Given that the 2015 review was part of this doctoral research program, there was no need to start from scratch. Rather, the additional search was conducted from the start of 2015 and the results of the two study selection processes were combined. This two-step approach is reflected in Figure 2-1. Overall, 171 publications describing 99 trials were identified and included in the systematic literature review.73,76-237 These trials are restricted to those of the primary population. Trials pertaining to subpopulations, such as TB, have been excluded from this study selection. Further details on sub-populations are in Sections 2.6, 2.7 and 2.8.

Of the 16 new trials added to the evidence base, 4 included DTG (ARIA, GS-US-380-1489, GS-US-380-1490, and SSAT066),91,112,181,205 3 included DOR (DRIVE AHEAD, DRIVE FORWARD, and 1439-007 Study),115,116,156,174 2 included BIC,112,205 and 3 were endonodal on EVG/c comparing TAF to TDF.81,204,206 With respect to the DTG trials, ARIA compared DTG to ATV/r among a sample of women only; SSAT066 compared DTG to RAL and EVG/c; while the remaining two trials were comparisons to BIC. Standard dose efavirenz continued to be central as it was included in 5 trials; however, there were no new trials comparing EFV to DTG or to EFV400. Just as in 2015, ENCORE1 was the only trial that included EFV400, pitting it against standard dose EFV. The NAMSAL trial (NCT02777229) was ongoing at the time of the search and aimed to compare EFV400 to DTG; results are expected later this year.
Figure 2-1: Flow diagram for principal systematic literature review on the choice of first-line ART among adults and adolescents

As such, despite the large number of trials in the evidence base as a whole, there are three key trials that very much inform the comparisons of interest: SINGLE (DTG + ABC + XTC vs EFV + TDF + XTC), SPRING-1 (DTG + 2 NRTIs vs EFV + 2 NRTIs) and ENCORE1 (EFV400 + TDF + XTC vs EFV + TDF + XTC).73,78,79,121,216,223,227,228 These are the trials involved in the head-to-head comparisons of interest. They were all multinational, double-blind, placebo-controlled randomized trials. SPRING-1 was a Phase II trial, while the others were Phase III trials. SPRING-1 randomized 205 patients, SINGLE randomized 833 patients and ENCORE1 randomized 636 patients. The ENCORE1 study investigators concluded that EFV400 was non-inferior to EFV over 96 weeks with fewer treatment-related adverse events. While positive conclusions were also drawn for the SINGLE trial, there was confounding due to the difference in backbones. This is why the adjustments for differences in backbones were critical to this evidence synthesis. Note that no evidence was available to describe the use of EFV400 in any of the sub-populations and as such only the comparison of DTG vs EFV was considered in the sub-populations. The full list of included trials, including the studies included in the various sub-population reviews, is provided in the statistical analysis plan.68

2.5.2 Analysis set study selection

Feasibility, applicability, and relevance considerations led to the removal of several studies or study arms from the analysis set. Following the 2015 analysis, it became clear that older treatments that are no longer used in most settings had very little connectivity to the network and therefore provided negligible additional information regarding the research questions.
As shown in previous research, nodes that are not of direct interest may be removed unless they are very well-connected in the network, which was not the case here.238 In fact, at times they displayed evidence of loop inconsistency (disagreement between direct and indirect evidence). As such, studies of indinavir, fosamprenavir, unboosted atazanavir, saquinavir, nelfinavir, and triple NRTIs were removed from the analysis set. To ensure the removal of these nodes was acceptable, sensitivity analyses including the older treatments were conducted and results were compared to those of the primary analyses. Differences were negligible. Five trials were excluded for having a RAL backbone, which could not be handled in the model. From the review update, GS-US-141-1475 and GS-US-299-0102 were excluded: GS-US-141-1475 used a non-FDA approved dose of BIC; and GS-US-299-0102 was an endonodal trial that did not connect to the overall network (cobicistat-boosted DRV). Figure 2-2 presents the analysis set represented as a network diagram. The network was well-connected, with EFV serving as the most well-connected node. Overall, the principal analysis set of studies included 65 trials in which 33,148 patients were randomized to 151 treatment arms (12 treatments). A single study compared EFV to EFV400, with no indirect evidence identified. A combination of direct and indirect evidence was available for all other treatment comparisons except BIC and RPV. Both were kept so as to have comparisons of these with the remainder of the therapeutic landscape, but their inclusion provided no additional information to the comparisons of interest.

Summaries of trial characteristics, patient characteristics, and critical appraisal quality assessments are presented in Appendices D, E, and F of the statistical analysis plan,68 respectively. With respect to trial characteristics, trials ranged from Phase II to IV, but most were Phase III, double-blinded and multinational.
With respect to baseline characteristics, there were a number of notable differences across trials. Most notably, sex varied from all females to all males. Mean CD4 varied from as low as 102 cells/mm3 (PHIDISA II) to 576.5 cells/mm3 (GS-US-236-0140). Similarly, baseline HIV RNA varied from 4.28 (Epzicom-Truvada) to 5.48 (ADVANZ; a new trial). There were also notable differences with respect to risk groups and other markers of disease severity, but age, sex, CD4 and viral load were the best reported and the ones that were explored further through meta-regression. As shown in Appendix A.2, overall study quality was generally high (i.e., low risk of bias). The areas where high risk of bias was identified were mostly restricted to open-label trials having a high risk of bias due to the lack of blinding. Moreover, some of the more recent trials for which only conference posters were available had insufficient information to determine with certainty that the risk of bias was either low or high.

Figure 2-2: Network of all studies included in the evaluation of first-line ART for the treatment of HIV among adults and adolescents

Legend: Circles (nodes) in the diagrams represent individual treatments, lines between circles represent availability of head-to-head evidence between two treatments, and the numbers on the lines are the number of RCTs informing each head-to-head comparison. Blue: NNRTIs; Green: Protease inhibitors; Orange: Integrase inhibitors. ATV/r: ritonavir-boosted atazanavir; DRV/r: ritonavir-boosted darunavir; DTG: dolutegravir; EFV: efavirenz; EFV400: efavirenz 400; EVG/c: elvitegravir/cobicistat; LPV/r: ritonavir-boosted lopinavir; NVP: nevirapine; RAL: raltegravir; RPV: rilpivirine; BIC: bictegravir; DOR: doravirine

2.5.3 Network meta-analysis results

Prior to providing details regarding the analysis of each outcome individually, consider a few general remarks.
With respect to meta-regression adjustments, there was one analysis that did require an adjustment for imbalances in the proportion of males; namely, the analysis for discontinuations. For all other analyses, the unadjusted model was favoured. The network diagrams for each analysis are provided in the statistical analysis plan.68 To further help the reader identify which trials were used for the analyses of each outcome, Appendix A3 lists the trials and indicates which outcome analysis each trial was included in. All analyses appeared to meet the consistency assumption for NMA. There were very few exceptions and these were restricted to specific loops in the treatment-related SAEs. These did not justify a need to use an alternative analysis to the NMA. Section 2.5.3.1 presents an instance of the evidence used to make these judgements and explains how to interpret the results.

2.5.3.1 Viral suppression

Viral suppression was among the best-reported outcomes. The definition of viral suppression was a composite of the various thresholds reported in the included studies (i.e. <20-50 copies/mL; <200 copies/mL; <400 copies/mL). The <50 copies/mL threshold was favoured and was by far the most commonly reported threshold. Sensitivity analyses restricting the evidence base to only trials using the <50 copies/mL threshold yielded similar results, thus supporting the composite approach (results not shown). Additionally, many trials used the Food and Drug Administration Snapshot algorithm, whereby discontinuations are considered failures. This approach was used throughout the evidence base, even in trials that did not explicitly use this approach. A consequence of this approach is that differences in viral load suppression can either be driven by differences in efficacy (i.e., improved ability of the drug to suppress the virus) or differences in tolerability (i.e., an increased propensity to stay on the drug) or both.
As such, while viral suppression is considered the primary efficacy outcome, in actuality it is difficult to disaggregate efficacy from tolerability and this should be kept in mind while interpreting the results. The reported analyses are restricted to ‘intention to treat’ (ITT) results, but per-protocol results were also analysed (results not shown). The per-protocol results did not resolve the issue around determining whether differences are due to efficacy or tolerability. Walmsley et al. note that a large difference between DTG and EFV in the SINGLE trial was due to tolerability.239 Notably, evidence on EFV400 was only available at the 48- and 96-week timepoints. For the analysis of viral suppression at 48 weeks, evidence was derived from 53 trials of 115 treatment arms including 26,410 patients. The results of the fixed-effects NMA for the comparisons of interest are presented in panel A of Figure 2-3, below, and all comparisons in Table 2-3. The forest plot (Figure 2-3) has arrows at the bottom indicating the direction of effect for the first treatment listed (for the first comparison, it is better for DTG if results are high, or to the right). The cross-table (Table 2-3), sometimes referred to as a league table, provides an overview by presenting the comparative efficacy of all comparisons in the network. The results refer to the comparison of the row treatment relative to the column treatment. As such, the results below the diagonal are the inverse of those above the diagonal. The key results for the research questions of interest are highlighted in the forest plots. The cross-table is given for the purpose of a complete picture and for reviewing other results that may be of interest to the reader. For example, in Table 2-3 the row pertaining to DTG shows that DTG had higher odds of viral suppression at 48 weeks than any other treatment. To be clear, there is no need to review each and every single estimate.
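The reading rules for the cross-table can be illustrated with a few values taken from Table 2-3. The Python sketch below shows, first, that reversing a comparison amounts to taking reciprocals of the odds ratio and swapping the credible interval bounds and, second, that under the consistency assumption the point estimate for DTG versus EFV400 is approximately the ratio of the two estimates against EFV. The helper name `invert` is hypothetical, and the arithmetic on point estimates is only a rough check: the NMA works with full posterior distributions, so small rounding differences are expected and credible intervals cannot be recovered this way.

```python
def invert(odds_ratio, lower, upper):
    """Reverse the direction of a comparison: reciprocals, with interval bounds swapped."""
    return 1 / odds_ratio, 1 / upper, 1 / lower

# Values reported in Table 2-3 (row treatment relative to column treatment).
efv400_vs_efv = (1.16, 0.74, 1.79)
dtg_vs_efv = (1.86, 1.44, 2.40)
dtg_vs_efv400_reported = 1.61

# Entry mirrored across the diagonal: EFV relative to EFV400.
efv_vs_efv400 = tuple(round(value, 2) for value in invert(*efv400_vs_efv))

# Consistency on the point estimates: OR(DTG vs EFV400) = OR(DTG vs EFV) / OR(EFV400 vs EFV).
dtg_vs_efv400_implied = dtg_vs_efv[0] / efv400_vs_efv[0]

print(efv_vs_efv400)
print(round(dtg_vs_efv400_implied, 2), dtg_vs_efv400_reported)
```

The mirrored entry reproduces the value shown above the diagonal in Table 2-3, and the implied DTG versus EFV400 odds ratio is close to, but not identical with, the reported estimate.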
Based on the available evidence, DTG was statistically significantly more effective than standard dose EFV in achieving viral suppression at 48 weeks (OR: 1.86; 95% CrI: 1.44, 2.40). In fact, it was statistically superior to all other treatments except EFV400 (OR 1.61, 95% CrI 0.97, 2.70), against which its advantage fell just short of statistical significance. The comparison between DTG and EFV400 was based only on indirect evidence. EFV400 was not statistically distinguishable from standard dose EFV. For the remainder of the outcomes, cross-tables providing results for all comparisons in each of the included analyses are presented in Appendix A5. Figure 2-4 compares the estimates obtained through direct evidence only (x-axis) and through indirect evidence (y-axis). Points falling along the y = x line indicate perfect agreement between the two sources. In this example, there is a single point that falls further away from the agreement line. The loop with disagreement was the LPV/r-ATV/r-EFV loop, which is not likely to be influential to the results of interest. Moreover, the Bucher test suggests no evidence of disagreement (p = 0.085). For the remainder of the outcomes, the comparisons of direct and indirect evidence, as well as the p-values testing for loop inconsistency, are presented in Appendix A4.

Figure 2-3: Forest plot of select ARV comparisons with respect to viral suppression at A. 48 weeks; and B.
96 weeks according to fixed-effects network meta-analysis

Table 2-3: Cross table of odds ratios with 95% credible intervals comparing the relative efficacy of ARVs for viral suppression at 48 weeks from the fixed-effects network meta-analyses

EFV | 0.86 (0.56, 1.35) | 0.54 (0.42, 0.69) | 0.78 (0.59, 1.01) | 0.86 (0.53, 1.42) | 0.73 (0.57, 0.94) | 1.14 (0.94, 1.39) | 0.84 (0.67, 1.04) | 0.88 (0.67, 1.15) | 1.18 (1.02, 1.37) | 1.17 (0.93, 1.48) | 1.46 (1.16, 1.85)
1.16 (0.74, 1.79) | EFV400 | 0.62 (0.37, 1.04) | 0.90 (0.53, 1.50) | 0.99 (0.51, 1.91) | 0.85 (0.51, 1.40) | 1.32 (0.81, 2.15) | 0.97 (0.59, 1.58) | 1.02 (0.61, 1.70) | 1.36 (0.85, 2.17) | 1.35 (0.82, 2.22) | 1.69 (1.03, 2.78)
1.86 (1.44, 2.40) | 1.61 (0.97, 2.70) | DTG | 1.44 (1.02, 2.04) | 1.59 (1.03, 2.48) | 1.37 (1.03, 1.80) | 2.12 (1.56, 2.89) | 1.56 (1.12, 2.17) | 1.64 (1.16, 2.31) | 2.19 (1.70, 2.83) | 2.17 (1.64, 2.89) | 2.71 (1.98, 3.72)
1.29 (0.99, 1.69) | 1.12 (0.67, 1.89) | 0.69 (0.49, 0.98) | EVG/c | 1.10 (0.64, 1.94) | 0.95 (0.66, 1.34) | 1.47 (1.07, 2.02) | 1.08 (0.77, 1.53) | 1.14 (0.79, 1.65) | 1.52 (1.17, 1.97) | 1.51 (1.07, 2.11) | 1.88 (1.35, 2.64)
1.16 (0.71, 1.90) | 1.01 (0.52, 1.95) | 0.63 (0.40, 0.97) | 0.91 (0.51, 1.57) | BIC | 0.86 (0.51, 1.42) | 1.34 (0.78, 2.23) | 0.98 (0.56, 1.66) | 1.03 (0.59, 1.76) | 1.37 (0.83, 2.24) | 1.36 (0.81, 2.27) | 1.70 (0.99, 2.87)
1.36 (1.06, 1.76) | 1.18 (0.71, 1.97) | 0.73 (0.55, 0.97) | 1.05 (0.75, 1.51) | 1.17 (0.70, 1.96) | RAL | 1.55 (1.15, 2.12) | 1.14 (0.82, 1.59) | 1.20 (0.85, 1.70) | 1.60 (1.25, 2.08) | 1.59 (1.20, 2.12) | 1.99 (1.45, 2.73)
0.88 (0.72, 1.06) | 0.76 (0.46, 1.23) | 0.47 (0.35, 0.64) | 0.68 (0.50, 0.93) | 0.75 (0.45, 1.28) | 0.64 (0.47, 0.87) | NVP | 0.73 (0.55, 0.99) | 0.77 (0.56, 1.06) | 1.03 (0.84, 1.28) | 1.02 (0.77, 1.36) | 1.28 (0.96, 1.70)
1.19 (0.96, 1.49) | 1.03 (0.63, 1.69) | 0.64 (0.46, 0.90) | 0.93 (0.65, 1.31) | 1.02 (0.60, 1.77) | 0.88 (0.63, 1.22) | 1.36 (1.01, 1.83) | RPV | 1.05 (0.75, 1.48) | 1.41 (1.08, 1.83) | 1.39 (1.02, 1.92) | 1.74 (1.27, 2.41)
1.13 (0.87, 1.48) | 0.98 (0.59, 1.64) | 0.61 (0.43, 0.86) | 0.88 (0.61, 1.27) | 0.97 (0.57, 1.68) | 0.83 (0.59, 1.17) | 1.30 (0.94, 1.79) | 0.95 (0.67, 1.34) | DOR | 1.34 (1.01, 1.78) | 1.33 (1.01, 1.76) | 1.66 (1.19, 2.30)
0.85 (0.73, 0.98) | 0.73 (0.46, 1.18) | 0.46 (0.35, 0.59) | 0.66 (0.51, 0.85) | 0.73 (0.45, 1.20) | 0.62 (0.48, 0.80) | 0.97 (0.78, 1.20) | 0.71 (0.55, 0.92) | 0.75 (0.56, 0.99) | ATV/r | 0.99 (0.79, 1.24) | 1.24 (0.99, 1.55)
0.85 (0.68, 1.08) | 0.74 (0.45, 1.22) | 0.46 (0.35, 0.61) | 0.66 (0.47, 0.93) | 0.73 (0.44, 1.23) | 0.63 (0.47, 0.83) | 0.98 (0.73, 1.31) | 0.72 (0.52, 0.98) | 0.75 (0.57, 0.99) | 1.01 (0.80, 1.27) | DRV/r | 1.25 (0.97, 1.61)
0.68 (0.54, 0.87) | 0.59 (0.36, 0.98) | 0.37 (0.27, 0.50) | 0.53 (0.38, 0.74) | 0.59 (0.35, 1.01) | 0.50 (0.37, 0.69) | 0.78 (0.59, 1.04) | 0.57 (0.41, 0.79) | 0.60 (0.43, 0.84) | 0.81 (0.65, 1.01) | 0.80 (0.62, 1.04) | LPV/r

Values represent the effect of the treatment lower on the diagonal relative to the one higher on it. Bold values indicate comparisons that are statistically significant. Odds ratios above 1 indicate higher efficacy in viral suppression. ATV/r: ritonavir-boosted atazanavir; DRV/r: ritonavir-boosted darunavir; DTG: dolutegravir; EFV: efavirenz; EFV400: efavirenz 400; EVG/c: elvitegravir/cobicistat; LPV/r: ritonavir-boosted lopinavir; NVP: nevirapine; RAL: raltegravir; RPV: rilpivirine; BIC: bictegravir; DOR: doravirine

Figure 2-4: Inconsistency plot comparing direct and indirect evidence for viral suppression at 48 weeks

Note: The smallest p-value for the Bucher test was 0.085. The marginal evidence of inconsistency was with LPV/r.

For the analysis of viral suppression at 96 weeks, evidence was derived from 28 trials of 63 treatment arms including 16,495 patients. There was no evidence of inconsistency (Figure A-1 in Appendix A). The results of the fixed-effects NMA for the comparisons of interest are also presented in panel B of Figure 2-3, and all comparisons in Table A-4 of Appendix A.
Treatment with DTG was associated with a higher proportion of patients achieving viral suppression compared to all other treatments, though the comparison to RPV was not statistically significant (OR 1.44, 95% CrI 0.98, 2.10). At this time point, treatment with DTG was associated with a statistically significantly higher proportion of patients achieving viral suppression compared to EFV400 (OR 2.00, 95% CrI 1.20, 3.37). As at 48 weeks, EFV400 was not statistically distinguishable from standard dose EFV. The analysis of viral suppression at 144 weeks was based on evidence from 6 trials of 13 treatment arms enrolling 5,274 patients. There was no evidence of inconsistency given that this was a star network (all evidence was either direct or indirect, but never both). Given that EFV400 was not available at this timepoint, results of the fixed-effects NMA for viral suppression at 144 weeks are only presented in Table A-5 of Appendix A. DTG continued to demonstrate superior viral suppression, although only half the comparisons were statistically significant. This was likely due to the smaller number of trials and patients. Nonetheless, among the significant comparisons was that against EFV (OR: 1.44; 95% CrI: 1.08, 1.94).

2.5.3.2 Increase in CD4 cell counts

The network of evidence for change in CD4 cell count at 48 weeks is based on 44 trials comprising 94 treatment arms including 23,789 patients. There was no evidence of inconsistency (Figures A-2 and A-3 in Appendix A). The results of the fixed-effects NMA for the comparisons of interest are presented in panel A of Figure 2-5, below, and all comparisons in Table A-6 of Appendix A. Based on the available evidence, DTG was statistically significantly more effective than standard dose EFV in increasing CD4 at 48 weeks (MD: 22.87; 95% CrI: 8.29, 37.40), as was EFV400 (MD: 25.43; 95% CrI: 6.93, 43.97). As a result, both treatments were comparable to one another.
Both treatments had higher estimated increases in CD4 than almost all other treatments, albeit not statistically significantly so. The network of evidence for change in CD4 cell count at 96 weeks is based on 22 trials comprising 47 treatment arms including 15,134 patients. The results of the fixed-effects NMA for the comparisons of interest are also presented in panel B of Figure 2-5, below, and all comparisons in Table A-7 of Appendix A. Results at this timepoint were very similar to those at 48 weeks, suggesting an improvement over the first year that is sustained in the second.

Figure 2-5: Forest plot of select ARV comparisons with respect to mean change in CD4 cell counts at A. 48 weeks and B. 96 weeks according to fixed-effects network meta-analysis

The network of evidence for change in CD4 cell count at 144 weeks is based on 7 trials comprising 15 treatment arms including 7,019 patients. Given that EFV400 was not available at this timepoint, results of the fixed-effects NMA for change in CD4 cell count at 144 weeks are only presented in Table A-8 of Appendix A. The improvement in CD4 for DTG relative to EFV increased to 49.44 cells/mm3 (95% CrI: 19.51, 79.39); however, the network was much sparser than at earlier time points.

2.5.3.3 Mortality

The evidence base for mortality in first-line treatment consisted of 21,604 patients enrolled in 29 trials consisting of 62 treatment arms. Although mortality was considered a very important outcome by the guideline development group, the trials were underpowered for this outcome. Mortality across trials was low, with the exception of PHIDISA II which compared EFV (106/872, 12.2%) to LPV/r (102/873, 11.7%). In all, there were 391 deaths in the evidence base, but many comparisons with 0 events can render estimates unreliable. Consider that there were only 5 deaths in each of DTG and EFV400 in the evidence base and, as a result, a few comparisons included zero counts.
Given the small number of events, there are limitations to synthesizing the evidence through an NMA which may produce wide credible intervals. However, select comparisons are presented in Figure 2-6. There was no statistically significant difference between mortality outcomes in patients treated with DTG, EFV, and EFV400. An inconsistency plot is provided (Figure A-4) despite these limitations, but no cross-table is provided.

Figure 2-6: Forest plot of select ARV comparisons with respect to mortality according to fixed-effects network meta-analysis

2.5.3.4 AIDS defining illnesses

The evidence base for AIDS defining illnesses (ADIs) in first-line treatment consisted of 9,722 patients enrolled in 18 trials consisting of 40 treatment arms. Similar issues to those seen in mortality were present in the ADI analysis. For example, 12.3% (14/144) of patients enrolled in the EFV arm of the Altair trial reported ADIs while no patients (0%) reported ADIs in the EFV arm of GS-US-236-0102 (0/352), SPRING-1 (0/50), and Protocol 004 (0/38) trials. Few ADIs were reported for patients treated with DTG (SPRING-1, 2/51, 3.9%) and for patients treated with EFV400 (ENCORE1, 14/321, 4.4%). Given the small number of events, there are limitations to synthesizing the evidence through an NMA which may produce wide and non-meaningful CrIs. However, select comparisons are presented in Figure 2-7. There was no statistically significant difference in ADI outcomes between patients treated with DTG, EFV, and EFV400. There was a single closed loop that provided no evidence of disagreement between direct and indirect evidence (Figure A-5).

Figure 2-7: Forest plot of select ARV comparisons with respect to the proportion of patients developing AIDS defining illnesses according to fixed-effects network meta-analysis

2.5.3.5 Discontinuations

The evidence base for all-cause discontinuations (retention) was based on 26,399 patients enrolled across 120 treatment arms in 54 trials.
There was no evidence of inconsistency (Figure A-6 in Appendix A). The results of the fixed-effects NMA for the comparisons of interest are presented in Figure 2-8, below, and all comparisons in Table A-8 of Appendix A. Recall that this was the outcome that required an adjustment for the proportion of males in the trials. Based on the available evidence, DTG was statistically significantly more effective than standard dose EFV in preventing discontinuations (OR: 0.49; 95% CrI: 0.44, 0.62). In fact, it was statistically superior to all other treatments except EVG/c, BIC, and DOR, against which it still had a lower estimate of discontinuation. The comparison between DTG and EFV400 was based only on indirect evidence. EFV400 was not statistically distinguishable from standard dose EFV (OR: 0.91; 95% CrI: 0.50, 2.08); however, DTG did appear to be superior to EFV400.

Figure 2-8: Forest plot comparing pair-wise and NMA estimated relative effects of different ARVs with respect to discontinuations (all cause)

2.5.3.6 Discontinuations due to adverse events

The evidence base for discontinuations due to adverse events was based on 54 trials of 26,165 patients enrolled in 118 treatment arms. There was no evidence of inconsistency (Figure A-7 in Appendix A). The results of the fixed-effects NMA for the comparisons of interest are presented in Figure 2-9, below, and all comparisons in Table A-9 of Appendix A.

Figure 2-9: Forest plot comparing pair-wise and NMA estimated relative effects of select ARVs with respect to discontinuations due to adverse events

It is for this outcome that both DTG and EFV400 shine, with both having the lowest odds of discontinuation due to adverse events and both being superior to standard dose EFV. This time EFV400 and DTG were not statistically differentiable.
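The DTG versus EFV400 comparisons in Sections 2.5.3.5 and 2.5.3.6 rest on indirect evidence through standard dose EFV. The simplest form of such a comparison is the adjusted indirect comparison (often attributed to Bucher), sketched below in Python. This is an illustration only: the analyses in this chapter used Bayesian NMA with meta-regression adjustments, which this sketch does not reproduce. The function names and the input values in the usage lines are hypothetical, and the standard errors are approximated from the interval widths.

```python
import math


def ci_to_se(lower, upper, z=1.96):
    """Approximate the log-scale standard error from a 95% interval (assumes symmetry on the log scale)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def bucher_indirect(or_ac, ci_ac, or_bc, ci_bc, z=1.96):
    """Indirect odds ratio of A vs B through a common comparator C.

    or_ac, ci_ac: odds ratio and (lower, upper) interval for A vs C
    or_bc, ci_bc: odds ratio and (lower, upper) interval for B vs C
    """
    # On the log scale the indirect effect is a difference of the two direct effects
    log_or = math.log(or_ac) - math.log(or_bc)
    # The variances add because the two estimates come from independent trials
    se = math.sqrt(ci_to_se(*ci_ac, z) ** 2 + ci_to_se(*ci_bc, z) ** 2)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical inputs, not taken from the thesis results
estimate, lower, upper = bucher_indirect(0.5, (0.25, 1.0), 1.0, (0.5, 2.0))
print(f"indirect OR {estimate:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

Because the variances add, an indirect estimate is always less precise than either of the direct estimates that feed it, which is one reason indirect comparisons can fail to reach statistical significance even when the point estimate favours one treatment.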
2.5.3.7 Treatment-related and emergent adverse events

The evidence for treatment-related adverse events included 15,599 patients enrolled in 61 treatment arms across 27 trials and that for emergent adverse events included 18,915 patients in 70 treatment arms across 32 trials. There was no evidence of inconsistency (Figures A-8 and A-9 in Appendix A). Key comparisons are presented in Figure 2-10.

Figure 2-10: Forest plot of select ARVs comparisons with respect to A. treatment related adverse events and B. treatment emergent adverse events according to fixed-effects and random-effects network meta-analysis

While none of the treatments were distinguishable with respect to treatment emergent AEs, both DTG and EFV400 had lower odds of leading to a treatment-related AE. Moreover, DTG had lower odds than EFV400. Overall, BIC had the lowest odds of treatment-related AEs, followed by DOR, DRV/r and then DTG. Treatments were generally less distinguishable with respect to emergent AEs.

2.5.3.8 Treatment-related and treatment-emergent serious adverse events

The evidence for treatment-related SAEs was based on 8,041 patients enrolled in 31 trial arms across 15 trials and for treatment-emergent SAEs was based on 26,706 patients enrolled in 98 trial arms across 45 trials. With only 81 treatment related SAEs reported across the evidence base, there were too few events to obtain reliable estimates. This also complicated the inconsistency figures (Figures A-10 and A-11). Results of the analysis are nonetheless presented in Figure 2-11. Treatment emergent SAEs did not have the same limitation. Full results are presented in Table A-11. Again, neither DTG nor EFV400 distinguished themselves from EFV, though the estimates were lower.

Figure 2-11: Forest plot of select ARVs comparisons with respect to A. treatment related serious adverse events and B.
treatment emergent serious adverse events according to fixed-effects network meta-analysis

2.5.3.9 Regimen substitutions

The evidence base for regimen substitutions by 48 weeks was based on 9,263 patients enrolled in 18 trials across 41 treatment arms. Figure 2-12 displays some of the key comparisons. Results of this analysis do not match what is seen in practice. That is to say that in practice, regimen substitutions are less common with DTG than with EFV. It is important to note that there was no direct evidence supporting the DTG to EFV comparison and that there were very few observed regimen substitutions in the trials that did.

Figure 2-12: Forest plot comparing pair-wise and NMA estimated relative effects of different ARVs with respect to regimen substitution (48 weeks)

2.5.4 Remarks relative to the 2015 analyses

The results of the updated analyses were very similar to those of the 2015 review. A notable difference is that the fixed-effects models were more often favoured in these analyses, while the random-effects models were more commonly used in the 2015 analyses. This suggests a reduction in heterogeneity, which may be due to the removal of the older treatment nodes.

2.6 Results for the TB co-infected subpopulation

The subpopulations described here and in Sections 2.7 and 2.8 have been reported and analyzed completely separately. That is to say that the studies among these subpopulations were not included within the more general population reviewed in Section 2.5. That is because treatment among these subpopulations is fundamentally different. For TB-HIV co-infected patients, treatment needs to be given in combination with TB drugs, which can lead to complications due to drug-to-drug interactions.
For patients with HIV-TB co-infection, the SLR identified an interim analysis from the ongoing INSPIRING trial.240 Given the direct relevance and impact the findings of this trial have in relation to the research question, an overall review of the interim findings is presented first and this is supplemented with an NMA of all available evidence to provide an overview of the evidence landscape.

2.6.1 The INSPIRING trial

In this review, the evidence for HIV-TB co-infected patients treated with DTG is based on a 24-week interim analysis from the INSPIRING trial which was presented at the CROI 2018 conference.240 INSPIRING (NCT02178592) is a Phase III, open-label randomized controlled trial enrolling HIV-TB co-infected adult patients for treatment with twice-daily DTG 50 mg or once-daily EFV 600 mg. Patients were receiving rifampin-based TB therapy.

Treatment with DTG led to relative increases in CD4 cell counts but was not distinguishable from EFV with respect to viral suppression (Figure 2-13). By 24 weeks, DTG was well tolerated, though the evidence on safety and tolerability was based on very few events. As discussed in the previous section, viral suppression was defined using the FDA Snapshot algorithm and differences here were driven by differences in discontinuations. The difference in discontinuations was not due to AEs.

Figure 2-13: Modified FDA snapshot analysis of the percentage of participants (95% CI) with HIV-1 RNA <50 copies/mL
Adapted from Dooley et al 2018 (CROI 2018)

2.6.2 Systematic literature review study selection

A separate systematic literature review was conducted to describe and synthesize the evidence for the first-line treatment of HIV-TB co-infected patients. While the primary search strategy did not exclude patients with TB co-infection, this search was supplemented with a more sensitive, targeted strategy.68 The flow diagram for the study selection criterion is presented in Figure 2-14.
Overall, 11 publications on 10 studies were identified.240-250 As with the principal analysis, some studies were excluded from the analysis set. In this case, three trials were non-comparative and could not be used in the analysis. These were the trials named ANRS 129 BKVIR, HIV-TB Pharmagene and TB-HAART.241,244,245 This was determined at the feasibility stage and could not be determined at the SLR stage. Also, the Sinha et al, 2013 study was a prelude to Sinha et al, 2017 and was therefore not used in the analyses.248,249

Figure 2-14: Flow diagram for principal literature review on TB co-infected individuals and first-line ART regimens

The complete network of evidence for the analysis set of the HIV-TB co-infected sub-population is presented in Figure 2-15. The evidence base consisted of 1378 patients enrolled in 13 treatment arms across 6 RCTs. The evidence was limited to 5 treatments: NVP, DTG, EFV, and RAL (400 mg; 800 mg). No evidence was identified for patients treated with EFV400.

Figure 2-15: Complete network of evidence for patients with HIV-TB co-infection
Legend: Circles (nodes) in the diagrams represent individual treatments, lines between circles represent availability of head-to-head evidence between two treatments, and the numbers on the lines are the number of RCTs informing each head-to-head comparison. Blue: NNRTIs; Orange: Integrase inhibitors.

2.6.3 Network meta-analysis results

As compared with the principal analysis, there were fewer analytical adjustments used here. There were no adjustments for differences in backbones and the network was too sparse to allow for meta-regression adjustments for baseline characteristics. Given that the only closed loop came from a single trial, no tests for consistency were needed (i.e., there are no inconsistency plots for this subpopulation because they are not required). All network diagrams for specific analyses are provided in the statistical analysis plan.
As only one comparison of interest, namely DTG vs EFV, is present in this network of evidence, forest plots were not used for the sake of presentation. Moreover, given that the evidence base was relatively small, the cross-tables are presented instead. In the following subsections, results for analyses that involved DTG are presented. For all other outcomes, cross-tables are provided in Appendix A as a reference.

2.6.3.1 Efficacy

For the analysis of viral suppression in HIV-TB co-infected patients at 24 weeks, 3 trials including 382 patients across 7 trial arms informed the network of evidence. Results of the fixed-effects NMA are presented in Table 2-4. There was no statistically significant difference between DTG and EFV or between RAL400 and RAL800; however, the estimate suggests lower odds of suppression with DTG than with EFV (in accordance with the FDA Snapshot algorithm). As previously mentioned, this difference appears to be driven by the larger number of discontinuations among the DTG arm of the INSPIRING trial. All treatments were associated with a higher proportion of patients achieving viral suppression compared to NVP, though this difference was not statistically significant for DTG (OR 1.64).

Table 2-4: Cross table of odds ratios with 95% credible intervals comparing the relative efficacy of ARVs for viral suppression at 24 weeks from the fixed-effects network meta-analyses in HIV-TB co-infected patients

EFV | 1.86 (0.64, 6.33) | 0.51 (0.21, 1.19) | 0.46 (0.19, 1.09) | 3.09 (1.28, 7.98)
0.54 (0.16, 1.57) | DTG | 0.27 (0.06, 1.10) | 0.24 (0.05, 0.98) | 1.64 (0.36, 6.88)
1.95 (0.84, 4.74) | 3.67 (0.91, 16.51) | RAL400 | 0.89 (0.35, 2.27) | 6.07 (1.77, 22.19)
2.19 (0.92, 5.39) | 4.14 (1.02, 18.72) | 1.12 (0.44, 2.89) | RAL800 | 6.80 (2.00, 25.13)
0.32 (0.13, 0.78) | 0.61 (0.15, 2.76) | 0.16 (0.05, 0.57) | 0.15 (0.04, 0.50) | NVP

Values represent the effect of the treatment lower on the diagonal to the one higher on it.
Bold values indicate comparisons that are statistically significant. Odds ratios above 1 indicate higher efficacy in viral suppression. DTG: dolutegravir; EFV: efavirenz; NVP: nevirapine; RAL400: raltegravir 400; RAL800: raltegravir 800

The network of evidence for change in CD4 count at 24 weeks in HIV-TB co-infected patients was based on 3 trials of 8 treatment arms consisting of 371 patients. The networks of evidence at the 24-, 48-, and 96-week timepoints are presented in Appendix A. Results of the fixed-effects NMA are presented in Table 2-5. Treatment with DTG was associated with statistically significant increases in CD4 cell count compared to all other treatments in the network. No other comparisons were statistically significant.

Table 2-5: Cross table of mean differences with 95% credible intervals comparing the relative efficacy of ARVs for mean change in CD4 cell counts at 24 weeks from the fixed-effects network meta-analyses in HIV-TB co-infected patients

EFV | -52.52 (-89.61, -14.93) | 5.76 (-23.84, 35.48)
52.52 (14.93, 89.61) | DTG | 58.28 (10.72, 106.04)
-5.76 (-35.48, 23.84) | -58.28 (-106.04, -10.72) | NVP

Values represent the effect of the row treatment relative to the column treatment. Bold values indicate comparisons that are statistically differentiable. DTG: dolutegravir; EFV: efavirenz; NVP: nevirapine

2.6.3.2 Tolerability

The evidence base for discontinuations due to adverse events was based on 4 trials of 524 patients enrolled in 9 treatment arms. A summary of the evidence, arranged by treatment and trial, is presented in Table 2-6. The proportion of patients with a discontinuation due to adverse events varied across treatments. There are limitations to synthesizing evidence by NMA when there are a small number of events and analyses may generate non-meaningfully wide CrIs. With no such events observed in the DTG arm, no NMA was conducted.
Table 2-6: Data for treatment comparisons of interest for discontinuations due to adverse events outcome in HIV-TB co-infected patients

Trials | EFV | DTG | RAL400 | RAL800 | NVP
INSPIRING | 1/44 (2.3%) | 0/69 (0%) | | |
Swaminathan et al, 2011 | 1/59 (1.7%) | | | | 2/57 (3.5%)
ANRS 12 180 Reflate TB trial | 3/51 (5.9%) | | 0/51 (0%) | 3/51 (5.9%) |
N2R | 3/71 (4.2%) | | | | 4/71 (5.6%)

EFV: efavirenz; NVP: nevirapine; RAL400: raltegravir 400; RAL800: raltegravir 800; DTG: dolutegravir

The evidence base for all-cause discontinuations (retention) was based on 2,839 patients enrolled across 13 treatment arms in 6 trials. A summary of the evidence, arranged by treatment and trial, is presented in Table 2-7. The number of patients who discontinued treatment varied across treatments. There are limitations to synthesizing evidence by NMA when there are a small number of events and analyses may generate non-meaningfully wide CrIs.

Table 2-7: Data for treatment comparisons of interest for all-cause discontinuations outcome in HIV-TB co-infected patients

Trials | EFV | DTG | RAL400 | RAL800 | NVP | LPV/r
INSPIRING | 2/44 (4.5%) | 5/69 (7.2%) | | | |
Swaminathan et al, 2011 | 4/59 (6.8%) | | | | 10/57 (17.5%) |
ANRS 12 180 Reflate TB trial | 6/51 (11.8%) | | 5/51 (9.8%) | 9/51 (17.6%) | |
CARINEMO | 52/285 (18.2%) | | | | 43/285 (15.1%) |
N2R | 9/71 (12.7%) | | | | 16/71 (22.5%) |

EFV: efavirenz; NVP: nevirapine; RAL400: raltegravir 400; RAL800: raltegravir 800; DTG: dolutegravir; LPV/r: ritonavir-boosted lopinavir

2.6.3.3 Safety

This analysis was based on 2,726 patients enrolled in 5 trials consisting of 11 treatment arms. Data collected from trials are presented in Table 2-8. Again, small sample sizes were a major limiting factor. The odds of experiencing an SAE while on DTG were 0.48 times those of experiencing an SAE while on EFV (95% CrI: 0.12, 1.90). There were more events when it came to overall AEs and here, DTG was found to be safer than EFV (OR: 0.26; 95% CrI: 0.08, 0.84).
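As a rough cross-check on the SAE estimate, a crude odds ratio with a Wald interval can be computed directly from the INSPIRING counts reported in Table 2-8 (4/69 on DTG versus 5/44 on EFV). The Python sketch below is a frequentist approximation, not the Bayesian NMA used in this chapter, and the function name is hypothetical. Since DTG appears only in INSPIRING within this network, the crude figure would be expected to sit close to the NMA estimate. The guard against zero cells also illustrates why no estimate was attempted for discontinuations due to adverse events, where the DTG arm had no events.

```python
import math


def crude_odds_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Crude odds ratio of arm A vs arm B with a Wald interval on the log scale."""
    a, b = events_a, n_a - events_a  # events / non-events in arm A
    c, d = events_b, n_b - events_b  # events / non-events in arm B
    if 0 in (a, b, c, d):
        # A zero cell makes the log odds ratio or its standard error undefined
        raise ValueError("zero cell: the crude odds ratio is not estimable")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# INSPIRING treatment-emergent SAEs (Table 2-8): DTG 4/69 vs EFV 5/44
estimate, lower, upper = crude_odds_ratio(4, 69, 5, 44)
print(f"OR {estimate:.2f} (95% CI {lower:.2f}, {upper:.2f})")  # OR 0.48 (95% CI 0.12, 1.90)
```

Calling the same function with the Table 2-6 counts for INSPIRING (0/69 versus 1/44) raises the zero-cell error instead of returning an estimate.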
Table 2-8: Data for treatment comparisons of interest for the treatment-emergent serious adverse events

Trials | EFV | DTG | RAL400 | RAL800 | NVP | LPV/r
INSPIRING | 5/44 (11.4%) | 4/69 (5.8%) | | | |
Swaminathan et al, 2011 | 4/59 (6.8%) | | | | 5/57 (8.8%) |
ANRS 12 180 Reflate TB trial | 19/51 (37.3%) | | 17/51 (33.3%) | 17/51 (33.3%) | |
CARINEMO | 70/288 (24.3%) | | | | 74/285 (26%) |

EFV: efavirenz; NVP: nevirapine; RAL400: raltegravir 400; RAL800: raltegravir 800; DTG: dolutegravir; LPV/r: ritonavir-boosted lopinavir

2.7 Results for the pregnant and breastfeeding women subpopulation

2.7.1 Systematic literature review study selection

A subgroup systematic literature review was conducted to describe and synthesize the evidence for the first-line treatment of pregnant and breastfeeding women. While the primary search strategy did not exclude pregnant and breastfeeding women, this search was supplemented with a more sensitive, targeted strategy.68 The flow of information diagram is presented in Figure 2-16. Overall, 29 publications reporting on 15 studies were identified.60,61,251-277

Figure 2-16: Flow diagram for principal systematic literature review on pregnant and breastfeeding women and first line ART regimens

2.7.2 Summary of the evidence base

In the previous review (2015), 9 studies in treatment-naïve pregnant and breastfeeding women were identified. However, no closed networks of evidence could be established (i.e. the evidence could not be synthesized through an NMA). Several studies were identified through the current update, but most were excluded from the analysis as they could only be synthesized through a descriptive summary.
With respect to the research question, which is focused on the efficacy and safety of DTG relative to EFV, two studies of relevance were identified: the DolPHIN 1 trial and the Tsepamo study.60,61,277 The DolPHIN 1 trial is an open-label, phase II/III randomized controlled pilot study comparing DTG/TDF/XTC to EFV/TDF/XTC (standard of care).277 This is primarily a pharmacokinetics study with limited clinical outcomes and a very small sample size, with 8 patients in each treatment arm in the current interim analysis (recruitment was ongoing). A larger, phase III trial is also underway (DolPHIN-2, NCT03249181). This open-label randomized controlled trial was planned to begin in January 2018 and is anticipated to be completed in March 2021 with an estimated enrolment of 250 adult patients in South Africa and Uganda.

The Tsepamo study was a large cohort study of pregnant women initiating DTG/TDF/XTC or EFV/TDF/XTC across 8 government hospitals in Botswana.60,61 A large sample of patients was enrolled, with 1,729 patients treated with DTG and 4,593 treated with EFV. It is noteworthy that Botswana was the first country to recommend DTG/TDF/XTC for initiation in pregnancy. The proportion of pregnancies with any adverse birth outcome was similar across treatment arms with 33.2% of DTG-managed pregnancies and 35.0% of EFV-managed pregnancies resulting in an adverse outcome. Similarly, severe birth outcomes were reported in 10.7% of DTG-managed and 11.3% of EFV-managed pregnancies. A summary of the outcomes from the Tsepamo study is presented in Table 2-9.
Table 2-9: Summary of the Tsepamo study of DTG/TDF/FTC vs EFV/TDF/FTC in pregnant and breastfeeding women initiated on first-line ART

Outcome | Dolutegravir/TDF/FTC (N=1729) | Efavirenz/TDF/FTC (N=4593) | Unadjusted RR (95% CI) | Adjusted RR (95% CI)
Any Adverse Birth Outcome | 576 (33.3%) | 1611 (35.0%) | 0.95 (0.88, 1.03) | 0.94 (0.87, 1.02)
Any Severe Adverse Birth Outcome | 186 (10.8%) | 520 (11.3%) | 0.95 (0.81, 1.11) | 0.93 (0.79, 1.11)
Preterm birth (<37 weeks) | 309 (18.0%) | 844 (18.5%) | 0.97 (0.87, 1.10) | 0.98 (0.87, 1.11)
Very preterm birth (<32 weeks) | 66 (3.8%) | 160 (3.5%) | 1.10 (0.83, 1.45) | 1.09 (0.82, 1.45)
Small for Gestational Age (<10th %tile weight-for-gestational age) | 297 (17.4%) | 838 (18.5%) | 0.94 (0.83, 1.06) | 0.94 (0.83, 1.06)
Very small for Gestational Age (<3rd %tile weight-for-gestational age) | 104 (6.1%) | 302 (6.7%) | 0.91 (0.74, 1.13) | 0.91 (0.74, 1.13)
Stillbirth | 39 (2.3%) | 105 (2.3%) | 0.99 (0.69, 1.42) | 0.99 (0.69, 1.42)
Neonatal death (<28 days) | 21 (1.2%) | 60 (1.3%) | 0.93 (0.57, 1.53) | 0.96 (0.58, 1.57)

RR: Relative risk; 95% CI: 95% confidence interval

Two additional studies of relevance were identified (IMPAACT 1026s and EPPICC/PANNA) though they were not restricted to the treatment of first-line patients.252,266,274-276 In both studies, pregnant women were initiated on DTG-based regimens, with 29 pregnancies in IMPAACT 1026s and 84 pregnancies in EPPICC/PANNA. These studies were included in the evidence base to inform safety outcomes of interest. A summary of the evidence for select outcomes of interest in patients treated with DTG and EFV is presented in Table 2-10. It shows similar estimates to those seen in the Tsepamo study and suggests that the results of the Tsepamo study may be generalizable to high and middle-income settings.
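The unadjusted relative risks in Table 2-9 follow directly from the reported counts. As a worked check, the Python sketch below computes a relative risk with a standard log-scale Wald interval. The function name is hypothetical and the interval method is an assumption, since the source does not state how the Tsepamo investigators derived their confidence intervals.

```python
import math


def relative_risk(events_a, n_a, events_b, n_b, z=1.96):
    """Relative risk of arm A vs arm B with a Wald interval on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


# Any adverse birth outcome (Table 2-9): DTG 576/1729 vs EFV 1611/4593
rr, lower, upper = relative_risk(576, 1729, 1611, 4593)
print(f"RR {rr:.2f} (95% CI {lower:.2f}, {upper:.2f})")  # RR 0.95 (95% CI 0.88, 1.03)
```

The adjusted relative risks in the table cannot be reproduced this way, because they depend on covariates that are not reported here.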
Table 2-10: Summary of evidence among pregnant and breastfeeding women on first-line ART

Outcome | DTG/TDF/FTC | EFV/TDF/FTC | Unadjusted OR (95% CI) | Study
Viral suppression < 50 copies/ml (2 weeks post-partum) | 5 (62.5%) | 4 (50%) | 1.25 (0.52, 3.00) | DolPHIN 1
Still births | 1 (12.5%) | 0 (0%) | -- | DolPHIN 1
Still births | 1 (1.2%) | -- | -- | EPPICC/PANNA
Congenital abnormalities | 4 (4.9%) | -- | -- | EPPICC/PANNA
Pre-term birth | 4 (13.8%) | -- | -- | IMPAACT 1026s
Pre-term birth | 11 (13.8%) | -- | -- | EPPICC/PANNA
Low birth weight (<2.5 kg) | 4 (13.8%) | -- | -- | IMPAACT 1026s
Low birth weight (<2.5 kg) | 13 (16.9%) | -- | -- | EPPICC/PANNA
Very low birth weight (<1.5 kg) | 1 (3.4%) | -- | -- | IMPAACT 1026s
Very low birth weight (<1.5 kg) | 0 (0%) | -- | -- | EPPICC/PANNA
HIV transmissions | 0 (0%) | -- | -- | IMPAACT 1026s

OR: Odds ratio; 95% CI: 95% confidence intervals

2.8 Results: Children and adolescents

No evidence meeting the inclusion criteria was found for children and adolescents through the course of the systematic literature review. While there was evidence on the use of PIb and NNRTIs in children, there was no evidence on DTG and EFV400, which were the interventions of interest for these analyses. Given that there was no evidence base to work with, no analyses were conducted.

2.9 Discussion

The purpose of this study was to determine whether the evidence supported that DTG and/or EFV400, each with an XTC + TDF backbone, should be the preferred first-line ART regimen for clinical guidelines using a public health framework rather than their current designation of alternative first-line ART regimen. Ultimately, this work was intended to also support the 2018 update to the WHO consolidated guidelines on the use of antiretrovirals for treating and preventing HIV with respect to the choice of first-line ART. This extensive systematic literature review and network meta-analysis to evaluate the comparative efficacy and safety of these and other ART regimens drew strong conclusions about the improved efficacy and tolerability of DTG relative to EFV.
Moreover, the evidence synthesis supports the use of DTG among sub-populations (with the exception of children), which was not the case in 2015. Specifically, the results of this study suggest comparable safety among pregnant women initiating treatment and results were not demonstrably worse among TB-HIV co-infected individuals. Unfortunately, evidence was lacking among subpopulations for EFV400 and the evidence had not changed since the previous guideline developments. Overall, the evidence more strongly supports the choice of DTG as the anchor treatment for the preferred first-line regimen and continues to support EFV400 as an alternative first-line anchor treatment.

Contextualizing results

The analyses separated ART regimens into two components: the backbone and the anchor treatment. The focus of the analyses was on the anchor treatment given that there have been more recent developments in anchor treatments and that preferred backbones are rather well established. Nonetheless, there are a few worthwhile remarks pertaining to backbones. TAF was the latest NRTI to be FDA approved (2016) and represents the only true new development in backbone ARVs. Previous to this, the most recently approved NRTI was FTC (2003). As such, the TDF+XTC backbone is generally accepted as a preferred choice within a public health approach.21 Note that in places where genotyping is readily available ABC+XTC is also viewed as a preferred choice given that it has similar efficacy and safety among patients that are HLA B*5701 negative.278 The 5% of patients that are HLA B*5701 positive experience increased adverse events using this backbone. As a result, these analyses used TDF+XTC as the reference backbone. While the backbone was not the focus of this research, the meta-regression estimates did confirm that TDF+XTC was safer and more effective than older treatments (e.g. d4T, ddI and to a lesser extent AZT).
Differences with ABC+XTC were not statistically significant, but were more favourable to TDF+XTC. Importantly, TAF+XTC did appear to be somewhat favourable relative to TDF+XTC. In general, these differences were not statistically significant, with the exception of changes in CD4 and rates of discontinuations due to adverse events. These results support the more extensive work conducted by Andrew Hill and colleagues that focused on this particular comparison.279 Their work found that meaningful differences between the two treatments were only found when treatments were boosted with cobicistat or ritonavir. Further research will be needed to confirm the advantages of TAF vs TDF, but it appears reasonable that TAF does provide some advantages. At the moment, both are recommended as part of preferred first-line regimens in resource-rich settings;278 however, TAF will not be part of a preferred first-line regimen in LMICs until it is available in a generic form. At the moment, it is not;280 however, funding agencies do see TAF as a potentially cheaper alternative in the near future.

Having discussed the backbone, the remainder of this discussion focuses on anchor treatments, first DTG and then EFV400. Despite strong evidence of improved efficacy and tolerability of DTG relative to EFV in the 2015 SLR and NMA,24 other factors prevented its recommendation as the preferred first-line regimen.21 These factors were: the unavailability of a fixed-dose combination with TDF+XTC, the high price (i.e., unavailability as a generic), and the uncertainty around its use in key sub-populations. All of these factors have now been overcome. A generic fixed-dose combination with TDF and lamivudine, referred to as TLD,281 is now available and studies among sub-populations have begun to report results. It is in this context that this updated study was undertaken.
For adults and adolescents, the principal population, there were only a few new trials with the treatments of interest and no new trials for the comparisons of interest. It is therefore not at all surprising that the conclusions were similar to the previous set of analyses. There continues to be a high certainty of improved viral suppression, discontinuations and discontinuations due to AEs for DTG relative to EFV. Across all outcomes, the results of DTG were favourable, with the only exceptions being in outcomes that are plagued by a very low number of events. While this study demonstrated improved safety with DTG relative to EFV in terms of overall AEs, it is understood that the specific adverse events experienced with each differ, where DTG is more likely to lead to headaches and EFV has a higher propensity for neuropsychiatric adverse events.282

The evidence on EFV400 compared to standard dose EFV (600mg qd) comes entirely from the ENCORE1 trial.78 Evidence from this trial suggests that low dose EFV is non-inferior to the standard dose, with respect to efficacy and safety. It also suggests improved retention compared to standard dose EFV. This trial also showed fewer discontinuations due to adverse events with EFV400. Despite this difference, comparisons of low dose EFV to DTG using indirect evidence showed that DTG tended to have better retention and lower discontinuations due to adverse events. Moreover, these showed that DTG was more effective with respect to viral suppression. As previously mentioned, the availability of data among sub-populations served as an important motivation for this update; however, it was unfortunate that two key studies currently underway were not yet available. These are the NAMSAL trial, comparing EFV400 to DTG, and the ADVANCE trial comparing EFV to DTG with TDF and TAF.67 The addition of these trials could be influential on the current assessment of treatment comparability.
An update following their release will be of high interest.

Critical to the public health approach that is favoured by the WHO,7 is the ability to prescribe treatment regardless of TB co-infection, pregnancy and ideally to children as well. Having the simplest treatment algorithm allows task shifting and non-centralized care to continue in low-income settings, which has been so critical to the success of the fight against HIV/AIDS. No eligible studies were identified with respect to EFV400 among sub-populations and thus the second research question could not be tackled outside of the principal population. Only a handful of eligible studies providing insights on the first research question (DTG vs EFV) were identified. None were identified for children; however, it may not be too much of a stretch to believe the efficacy observed in adults would translate into children aged 3 years or more. Although a trial was available comparing DTG to EFV among TB-HIV co-infected patients, it must be recognized that the trial was small and that only 24-week results were available.240 The small sample size is the greater concern with respect to the certainty of evidence given that in practice, TB treatment tends to be completed within six months of its initiation. Evidence appears to suggest a negligible difference between treatments. Pregnancy was the sub-population with the richest evidence base. Pregnancy was also a focal point of discussion during the Guideline Development Group meetings, but not exactly in line with this research. While the evidence base was convincing with respect to the use of DTG as a first-line regimen among pregnant women, there is a signal that DTG in preconception may be problematic.283 That issue falls outside the scope of the current study and is discussed at length in Chapter 5.

The reasoning behind focusing on DTG and EFV400, as provided in the introduction, is based on the results of the previous SLR/NMA.
Nonetheless, it is worth discussing some of the other competing treatments. Relative to standard dose EFV, all of the INSTIs have demonstrated improved efficacy and safety. RAL, the first approved INSTI, has a well-established research program that has demonstrated that it is an effective and safe option among the key sub-populations: HIV-TB co-infected individuals, children, and pregnant and breastfeeding women. RAL is certainly a valid choice as the cornerstone to a first-line regimen. Yet, the two principal reasons for not selecting raltegravir as the preferred anchor treatment are: increased pill burden (it is taken twice a day) and lesser efficacy and tolerability relative to DTG. To address pill burden, Merck is currently developing a once-daily version of RAL. For both EVG/c and BIC, research among sub-populations is lacking; however, there is a noted lower barrier to resistance for EVG/c.284 Protease inhibitors can be used as a first-line anchor treatment, but tend to be used as a preferred second-line treatment.23 DRV/r represents an interesting option due to its favourable tolerability, but its efficacy is much lower among patients with high viral loads (>100,000 copies/mL).157 Finally, both RPV and DOR represent potentially better options within the NNRTI class, but neither provides efficacy that is comparable to the INSTIs.

Use of NMA for guideline development

The theoretical grounds for which NMA should be used more often in guideline development were discussed in Chapter 1. The experience of using this study to support the 2015 and 2018 guidelines serves as a useful case study that further supports the use of NMA in guideline development. The importance of being able to make indirect comparisons within a network of evidence manifested itself when the Guideline Development Group (GDG) reviewed the evidence pertaining to the comparison between DTG and EFV400.
Both had been compared to standard-dose EFV, which served as a common comparator for indirect comparisons. Published trial results suggest that both dolutegravir and EFV400 are comparable to EFV in terms of efficacy, but have more favourable tolerability. Using these NMA, DTG was shown to have lower levels of discontinuations due to adverse events than EFV400 and to lead to higher probabilities of viral suppression. Thus, by considering both alternatives simultaneously, the GDG (composed of clinical experts, policy-makers, and community members) was able to draw stronger conclusions on the choice of first-line regimens.

Strengths and limitations

This study has numerous strengths and limitations. First, the use of NMA allowed for analytic adjustments to account for differences in backbones and provide an unbiased estimate of the comparison between DTG and EFV despite the critical trial having different backbones. Second, by combining direct and indirect evidence, some of the findings can be seen as having stronger evidence than previously perceived when strong findings are supported by both sources of evidence (e.g., some results that were not statistically differentiable on the basis of direct evidence became so when combining direct and indirect evidence).

With respect to limitations, first, the evidence for the comparisons of interest continued to be somewhat limited in sub-populations. For EFV400, it was completely missing. Most notably for DTG, there was an absence of evidence within children. This was also the case in people pre-exposed to ARVs, though that was expected at the outset of the study. Even in pregnancy and TB, much of the evidence is still to come. Second, the Tsepamo study released results that impact this research.60,61 Specifically, its results provided a signal of potentially higher rates of neural tube defects among women with pre-conception exposure to DTG.62 Issues of congenital malformations must be taken seriously.
As previously discussed, clinical guideline recommendations are made on the basis of many considerations, with efficacy and safety being among them. This study aimed to assess the relative efficacy and safety of treatments of interest. While the question of pre-conception exposure to DTG differs from the issue of first-line therapy among pregnant and breastfeeding women, these results certainly speak to the safety of DTG as a first-line regimen. As such, the results of the current study need to be assessed in conjunction with the results of the Tsepamo study when discussing guideline recommendations. This is further complicated by the premature status of the Tsepamo study. As this was an unplanned analysis, the analysis was underpowered. It is important to realize that its signal is based on a mere four events and that further results will be critical in properly assessing this safety concern. This is further discussed in Chapter 5. It is listed here as a limitation because while the results were quite clear with respect to the improved outcomes using DTG, they are now more difficult to interpret and further research will be needed. Third, some significant outcomes were limited by a very low number of events, including mortality, regimen substitutions, serious adverse events, and ADIs. This influenced the precision of the estimates with respect to these outcomes and, in some cases, precluded the conduct of evidence synthesis through NMA. Fourth, treatment-related adverse events were both inconsistently defined and inconsistently reported. This limitation was mitigated by considering both treatment-related and treatment-emergent adverse events. Additionally, studies of shorter duration are, by their nature, less likely to identify adverse events than longer-term trials.
Despite this, the evidence was collected through a rigorous systematic review process in accordance with the practices and recommendations set forth by the Cochrane Collaboration, including both broad and targeted searches of the literature, critical appraisal of the identified studies, and consultation with subject matter experts.

Conclusion

Dolutegravir in combination with lamivudine/emtricitabine and tenofovir disoproxil fumarate is an effective, safe and tolerable ART regimen. Across a variety of outcomes, evidence strongly suggests that it is superior to the current efavirenz-based preferred first-line ART regimen. With a new affordable generic fixed-dose combination and comparable outcomes among sub-populations, the evidence supports the choice of a dolutegravir-based preferred first-line regimen. Conclusions regarding low-dose efavirenz are unchanged since 2015. Low-dose efavirenz appears to be more tolerable, but with a lack of evidence in sub-populations it is likely best considered an alternative first-line regimen.

Chapter 3: Investigation into the benefits of using IPD for the systematic literature review and network meta-analysis of first-line ART

3.1 Synopsis

Background: There is a growing recognition that individual patient data (IPD) can be used to improve evidence synthesis by allowing for better adjustments for covariates. IPD for three dolutegravir trials were obtained through ClinicalStudyDataRequest.com (CSDR).

Objectives: To compare various methods for meta-regression adjustments using a mixture of IPD and aggregate data (AgD) and to determine whether the inclusion of IPD would impact decision-making on the basis of relative efficacy, safety and tolerability of DTG relative to other anchor treatments in first-line HIV patients.

Methods: The SLR matched that of Chapter 2, but was expanded upon with a search for IPD. On 15 August 2016, IPD from three RCTs (FLAMINGO, SINGLE, SPRING-2) available through CSDR were formally requested.
Access to the data was granted on 06 June 2017. Following the identification of baseline characteristics and outcomes of interest, data were verified to ensure that published results for each trial could be obtained from the IPD. Initial differences were resolved. The statistical analyses were conducted in two stages. Stage 1 involved comparing the various statistical models of interest for conducting meta-regression adjustments with IPD and AgD on select outcomes. Stage 2 involved selecting and applying an adjustment method as explored in Stage 1. The primary outcomes were: viral suppression and change in CD4 at 48 weeks, discontinuations, and discontinuations due to AEs. The motivation for choosing these variables is that DTG and EFV400 are treatments that are viewed to have as good or better efficacy and improved tolerability. The covariates that were adjusted for were baseline CD4 cell counts, baseline HIV RNA, the proportion of males and their combinations. Meta-regressions were centered at the average values for EFV, which was the reference treatment. Seven modelling approaches were applied and compared: 1) Unadjusted AgD-NMA; 2) AgD-NMA with meta-regression; 3) Two-stage IPD-AgD NMA; 4) Unadjusted one-stage IPD-AgD NMA; 5) Two-stage IPD-AgD NMA with empirical-priors (empirical-priors approach); 6) One-stage IPD-AgD NMA with meta-regression (one-stage approach); 7) Hierarchical meta-regression IPD-AgD NMA (HMR approach). The first two were the models used in the previous chapter. The two-stage approach consisted of calculating the AgD after regression adjustment on the IPD and analyzing the resulting AgD. The empirical-priors approach involved using the IPD to calculate distributions for the coefficient adjustments and using these for the entire network. Finally, HMR and the one-stage approaches involved modeling both IPD and AgD in a single Bayesian analysis. 
Models were compared with respect to effect estimates, changes in the effect estimates, coefficient estimates, DIC and model fit, rankings and between-study heterogeneity.

Results: Individual patient level data were available from 2,160 patients across three trials, representing 6.5% of the total evidence base (2,160/33,148), and the three trials with IPD cover a total of 3 of 24 edges. Overall, the use of IPD often impacted the coefficient estimates, but seldom impacted the results such that it would change the subsequent decision-making. For viral suppression at 48 weeks, the IPD models did not have the lowest DIC and had no impact on rankings; however, they did have a large impact on the absolute effects, with mean changes in proportions of 2-3%. For CD4, again none of the IPD models had the lowest DIC, but the impact on estimates for DTG relative to EFV was large. Effects varied from approximately no change to an increase of 50 cells/mm3 and many of the covariate estimates were statistically significant. Only discontinuations due to adverse events had an IPD model with the lowest DIC. The selected model adjusted for the proportion of males using the empirical-priors approach. The selected model shifted the principal comparison of interest from an OR of 0.28 (95% CrI: 0.17, 0.44) to 0.37 (95% CrI: 0.23, 0.58), but this would have little impact on decision making. Finally, similar observations were made with the secondary outcomes, where only viral suppression at 96 weeks had an alternative model selected and the difference in the estimate did not change the interpretation of the therapeutic landscape. For each outcome, the use of IPD impacted an aspect of the results (e.g., DIC, rankings or covariate estimates), but the aspects affected were generally different for each of the analysed outcomes.

Discussion: This study does provide important insights into these methods of adjustment.
First, the choice of method matters given that in limited cases results differed remarkably between the two-stage approach (non-empirical-prior) and other approaches under consideration. Second, the hierarchical meta-regression tended to lead to the largest changes in effect estimates, but did so with reduced precision. Third, there was also a remarkable difference in the coefficient estimates obtained through IPD methods and traditional AgD methods, suggesting that when adjustments are needed, IPD is much more appropriate to use. For comparing the therapeutic landscape of first-line ART for the treatment of HIV, conclusions reached through the evidence synthesis supplemented by the IPD did not lead to changes in interpretation. There were some limitations to this study. First, there were very few trials for which IPD were obtained, which was exacerbated by the missed opportunity to get IPD for SPRING-1. Second, to account for differences in backbone regimens, an arm-based meta-regression was used in addition to the more traditional regression adjustments, and it is unclear whether the multiple forms of meta-regression interfered with one another. Third, numerous other potential effect-modifiers were poorly reported and hence not adjusted for. These principally included ethnicity and acquisition risk groups. Finally, due to low event counts and data unavailability, not all outcomes were available for re-analysis using IPD.  There are many ways in which IPD can be integrated with AgD for the purpose of NMA. Choosing the method by which to integrate these data does have an impact on the results. In most cases, the one-stage approach is recommended; however, in situations with fewer edges that have IPD, the two-stage empirical-priors approach is a viable alternative. Even with the revised analyses, DTG continues to demonstrate improved efficacy and tolerability over other anchor treatments.   
3.2 Introduction

Meta-analyses typically consist of combining results obtained through extracting aggregate data (AgD) from study publications and reports, as was done in Chapter 2. A less common form of meta-analysis consists of obtaining data on individual patients, referred to as individual patient data (IPD) meta-analysis.285 Traditionally, this requires obtaining the original research data directly from the researchers that conducted the studies of interest.286 Obtaining proprietary data from researchers has been, and remains, a major barrier to conducting IPD meta-analyses. Moreover, obtaining multiple IPD study sets may require collaborative groups that ensure the inclusion of stakeholders across research groups. Finally, the analyses themselves require much more involvement on the part of the analyst given that the richer data provide more analytical options.

Despite their additional complications, IPD meta-analyses offer numerous advantages over traditional AgD meta-analyses.287 IPD provide richer evidence bases that allow for more precise estimates. The richer data provide an opportunity to make further analytical adjustments, including a variety of adjustments that would not be available were the data to only be available in aggregate form. For example, imbalances in baseline characteristics can only be addressed through meta-regression on trial-level values or through the exclusion of whole trials in an AgD meta-analysis. In contrast, in IPD meta-analysis a subset of patients not meeting study inclusion criteria can be removed rather than removing the entire trial when some patients do not meet inclusion criteria or are deemed too different from the remainder of the evidence base.
Additionally, conducting meta-regression at the patient level rather than using trial-level aggregate values provides more precise adjustments and avoids such risks as the ecological fallacy (when trends at the trial-level do not match trends at the individual level, and adjustments exacerbate the estimation bias).288 Finally, IPD also allow for the adjustments of multiple variables at a time while AgD seldom allow for adjustments of more than one variable at a time.289

Two important steps have been taken to overcome the challenge of data availability. Firstly, methods have been developed to use a combination of IPD and AgD. This allows for some of the IPD meta-analysis advantages described above to be harnessed, all while reducing complications around obtaining data. Secondly, access to IPD from clinical trials has improved through collaborative programs that allow for such data to be accessed. There are multiple types of programs providing access to IPD. The simplest form of such programs is the open access program, such as Project Genomics Evidence Neoplasia Information Exchange (GENIE). Open access projects are characterized by not needing a proposal or approval to access the data. These tend to either be programs that have a narrow scope or that provide access to much older trials. A less accessible model is the gatekeeper-federated model, in which data are provided contingent on a variety of conditions, usually directly by the study sponsor. As a result, these sources are more likely to have studies that are highly pertinent to a given research question, but combining IPD from multiple sponsors may be difficult at best. Notable programs include ClinicalStudyDataRequest.com (CSDR), Supporting Open Access for Researchers (SOAR) programs and the Yale University Open Data Access (YODA) Project, among others.290 The CSDR is a consortium of clinical study data providers.
Clinical trials have become an increasingly expensive endeavour and it is critical that the data obtained from them be used fully. By providing a platform through which to request and access these data, CSDR provides an opportunity to further scientific innovation and help advance medical science through further research. Many study sponsors have committed to it. Given that NMA expands the scope of traditional meta-analyses, the logistical challenges of conducting IPD meta-analysis are only exacerbated when conducting NMA. As a result, the most common manner in which IPD are used in NMA is in analyses that include both IPD and AgD.290 These methods have been developed for both pairwise meta-analysis and NMA. A variety of ways have been suggested for using IPD within an evidence base that has a mixture of both IPD and AgD. The simplest manner to use such data is to create adjusted AgD using the IPD, referred to as a two-stage approach.286 This can either mean using regression or data restriction. With regression, an outcome estimate that would be expected if the IPD population matched the target population on selected baseline characteristics is obtained (e.g., obtaining the estimated cure rate if only 20% of the population were treatment experienced, rather than the 35% observed in the IPD trials). With restriction, the AgD are obtained after removing all undesired patients (e.g., obtaining the estimated cure rate after removal of all treatment-experienced patients).
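The regression and restriction variants of the two-stage approach described above can be illustrated with a minimal sketch. It is not the analysis code used in this thesis: the data are synthetic, the function names (`adjusted_arm_means`, `restricted_arm_means`, `make_arm`) are hypothetical, and a saturated linear probability model fitted by least squares stands in for the regression models actually used. The example mirrors the cure-rate illustration, with 35% of patients treatment experienced in the IPD and a target population with 20%.

```python
import numpy as np

def adjusted_arm_means(y, treat, x, x_target):
    """Regression variant: fit y ~ 1 + treat + x + treat*x by least squares
    and predict each arm's mean outcome at the target covariate value."""
    X = np.column_stack([np.ones_like(x), treat, x, treat * x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    control = beta[0] + beta[2] * x_target
    treated = control + beta[1] + beta[3] * x_target
    return control, treated

def restricted_arm_means(y, treat, keep):
    """Restriction variant: recompute arm means after dropping patients."""
    return y[keep & (treat == 0)].mean(), y[keep & (treat == 1)].mean()

def make_arm(arm, n_naive, cured_naive, n_exp, cured_exp):
    """Synthetic IPD for one arm (assumed counts, for illustration only)."""
    y = np.r_[np.ones(cured_naive), np.zeros(n_naive - cured_naive),
              np.ones(cured_exp), np.zeros(n_exp - cured_exp)]
    x = np.r_[np.zeros(n_naive), np.ones(n_exp)]  # 1 = treatment experienced
    return y, np.full(y.size, float(arm)), x

# 100 patients per arm, 35% treatment experienced in each arm.
y0, t0, x0 = make_arm(0, 65, 52, 35, 21)
y1, t1, x1 = make_arm(1, 65, 60, 35, 28)
y, treat, x = np.r_[y0, y1], np.r_[t0, t1], np.r_[x0, x1]

# Adjusted AgD: cure rates expected if 20% were treatment experienced.
adj_control, adj_treated = adjusted_arm_means(y, treat, x, x_target=0.20)
# Restricted AgD: cure rates among treatment-naive patients only.
res_control, res_treated = restricted_arm_means(y, treat, keep=(x == 0))
print(adj_control, adj_treated, res_control, res_treated)
```

Either pair of arm-level values would then be carried forward as AgD into the second-stage NMA. Because the model is saturated in a binary covariate, the regression prediction here is simply a re-weighting of the observed subgroup cure rates; with continuous covariates the result depends on the assumed functional form.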
Use of IPD in meta-analyses was limited before the late 1990s.286 It was only in the early 2000s that a series of papers was published presenting methods to integrate IPD and AgD using hierarchical models, referred to as a one-stage approach.291-293 Development of NMA methods for AgD was only starting at the time.37,70 Methods to integrate IPD and AgD within NMA soon followed with similar one-stage approaches.294 Similarly, Jackson et al developed an expanded hierarchical method that improved IPD-AgD meta-analysis by reducing the risk of ecological fallacy,53 and this was then expanded to the case of NMA by Jansen.54 Finally, there is a family of methods, referred to as population-adjusted indirect comparisons, which expand beyond traditional NMA, allowing for disconnected networks. Methods for unanchored networks are referred to as population-adjusted indirect comparisons and are the principal topic of the National Institute for Health and Care Excellence (NICE) Technical Support Document (TSD) 18.55 In the case of NMA, an important consideration for IPD-AgD analyses is whether the network is anchored or unanchored (connected or not).55 In the case of an anchored network, the goal is to adjust for imbalances in effect-modifiers. In an unanchored network, the goal is to adjust for both effect-modifiers and prognostic factors.55 These terms are demonstrated in Figure 3-1. Effect-modifiers, as the name suggests, are factors that will influence the magnitude of the treatment-effect. As such, imbalances in these factors across different treatment comparisons (network edges) will lead to biased estimates when unadjusted for. Prognostic factors are those that influence the study effect. Put another way, they influence patient outcomes, but do not influence the treatment-effect.
In a connected network, the treatment-effect can be isolated from the study effect and as such factors that only influence the study effect (i.e., prognostic factors) are inconsequential. Hence, only effect-modifiers need to be adjusted for. When a network is unanchored, it is no longer possible to separate the treatment-effect from the study effect. In such a scenario, the prognostic factors must be accounted for. The aim is to make all study effects roughly the same, therefore allowing for valid comparisons of treatment-effects. The principal methods discussed in TSD 18 are matching-adjusted indirect comparisons and simulated treatment comparisons.56 These are limited to indirect comparisons and are therefore not of interest for this chapter. Rather, this chapter focuses on the use of IPD to improve meta-regression adjustments within NMA. A full description of the various methods that can be used is provided in the Methods section.

Following the completion of the SLR and NMA described in Chapter 2, and following the completion of the WHO clinical guidelines, IPD were sought to determine their impact on the evidence synthesis. CSDR granted access to the IPD for three separate DTG trials, namely FLAMINGO, SINGLE and SPRING-2.94,235,295 Given the particular interest around DTG as a potential replacement to EFV as the cornerstone of the preferred first-line ART regimen, it is worth noting that all three of these trials include DTG, albeit with different comparators for each trial. In the analyses from Chapter 2, DTG was a viable option as a first-line ART anchor treatment as it demonstrated improvements in efficacy, safety and tolerability. Moreover, within the INSTI class, DTG was consistently favored. Nonetheless, there were many instances where the improvements were not statistically significant relative to either EFV or other INSTIs.
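The distinction drawn earlier in this section between prognostic factors and effect-modifiers, and why it matters for anchored versus unanchored comparisons, can be made concrete with a small numerical sketch. Everything in it is an illustrative assumption rather than part of the analyses in this thesis: the additive outcome model, the coefficient values, the trial characteristics and the function names (`expected_outcome`, `within_trial_contrast`) are all hypothetical.

```python
import math

def expected_outcome(treated, study_effect, prognostic, modifier,
                     beta_prog=2.0, effect=1.0, beta_mod=0.5):
    """Assumed additive model: the prognostic factor shifts both arms,
    while the effect-modifier changes the size of the treatment effect."""
    return (study_effect + beta_prog * prognostic
            + treated * (effect + beta_mod * modifier))

def within_trial_contrast(**trial):
    """Anchored (randomized) comparison: treated arm minus control arm."""
    return expected_outcome(1, **trial) - expected_outcome(0, **trial)

# Trials A and B differ in study effect and prognostic factor only.
trial_a = dict(study_effect=10.0, prognostic=0.2, modifier=0.3)
trial_b = dict(study_effect=12.0, prognostic=0.8, modifier=0.3)
# Trial C differs from A in the effect-modifier only.
trial_c = dict(study_effect=10.0, prognostic=0.2, modifier=0.9)

contrast_a = within_trial_contrast(**trial_a)
contrast_b = within_trial_contrast(**trial_b)
contrast_c = within_trial_contrast(**trial_c)

# Unanchored comparison: treated arm of A against control arm of B.
unanchored_ab = expected_outcome(1, **trial_a) - expected_outcome(0, **trial_b)

print(contrast_a, contrast_b, contrast_c, unanchored_ab)
```

Under these assumptions the within-trial contrasts for A and B coincide even though their study effects and prognostic factors differ, whereas the contrast for C differs because of the effect-modifier. The unanchored arm-to-arm comparison is distorted by both the study effect and the prognostic factor, which is why both must be adjusted for when the network is not connected.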
Figure 3-1: Pictorial demonstration of basic statistical definitions useful for network meta-analysis

In 2016, IAS-USA released its own clinical guidelines that suggested that all INSTIs were equivalent.278,296 Evidence synthesis for these guidelines did not use NMA and assumed equivalency within the drug class. It remains to be seen whether the inclusion of IPD within the evidence synthesis would help shed light on this assumption and whether the use of IPD would improve statistical results, leading to improved decision making. Recall that guideline development uses the GRADE system, forcing the decisions to align with the statistical results of the evidence synthesis.

3.3 Objective

The objective of this chapter was to examine the change in outputs in the evidence synthesis of ART among first-line HIV patients – with a particular focus on the relative efficacy, safety and tolerability of DTG relative to other anchor treatments – when including IPD and to further compare the extent of this impact using different established IPD-based methods for meta-regression adjustments using a mixture of IPD and AgD.

3.4 Methodology

The statistical analysis plan for this chapter was made publicly available prior to conducting the analyses and can be found on ResearchGate.com.68 The data used for this study are the combination of the SLR data used in Chapter 2, as well as the IPD provided by CSDR. Methods regarding the study selection, data extraction and data manipulations for the SLR are provided in both the statistical analysis plan and Chapter 2.68 Key details of the eligibility criteria, study selection and data extraction are provided here, but further details can be found in Section 2.4.1.

3.4.1 Systematic literature review

3.4.1.1 Eligibility criteria

The inclusion criteria were the very same as those for the principal analysis in Chapter 2 (i.e., the sub-populations were not further explored in this chapter).
Table 3-1 describes the PICOS (population, interventions, comparator, outcomes, study design) criteria used to guide the selection of studies that were included in this systematic literature review. There were no limits on the time of publication; however, given that ART in the sense described in the interventions was first presented in 1996, eligible publications were effectively only available starting in the 1990s. The inclusion criteria were applied to both the study and individual level. Specifically, at the individual level, eligible patients were those included in the full analysis set in the publications reporting on these studies.

Table 3-1: Scope of the literature review in PICOS form

Criteria        Definition
Population      Inclusion criteria:
                • Treatment-naïve adults and adolescents (12 years and above) living with HIV
Interventions   • DTG + 2 NRTI
                • EFV400 + 2 NRTI
                • Raltegravir (RAL) + 2 NRTI
                • Elvitegravir boosted with cobicistat (EVG/c) + 2 NRTI
                • Bictegravir (BIC) + 2 NRTI
                • Doravirine (DOR) + 2 NRTI
                • Rilpivirine (RPV) + 2 NRTI
                • Nevirapine (NVP) + 2 NRTI
                • Darunavir boosted with ritonavir (DRV/r) + 2 NRTI
                • Atazanavir boosted with ritonavir (ATV/r) + 2 NRTI
                • Lopinavir boosted with ritonavir (LPV/r) + 2 NRTI
Comparator      • EFV600 + 2 NRTI
Outcomes        • Viral suppression at 48 and 96 weeks
                • Change from baseline CD4 at 48 and 96 weeks
                • Mortality
                • Retention
                • Discontinuations due to adverse events
                • Treatment-emergent adverse events
                • Severe adverse events
                • Development of drug resistance
Study design    Inclusion criteria:
                • Randomized controlled trials (RCTs)
Language        Only studies published in English will be included
Time            Minimum follow-up time of 24 weeks
*Note: Except for DTG, EFV400 and EFV600 treatments are required to provide indirect evidence

3.4.1.2 Data sources and search strategy

A comprehensive systematic search of the literature was conducted using the following databases: MEDLINE, EMBASE, and CENTRAL.
The latest search of these sources was conducted on 12 February 2018. Further manual searches of the 2016, 2017 and 2018 Conference on Retroviruses and Opportunistic Infections (CROI), the 2016 AIDS conference, and the 2017 International AIDS Society (IAS) conference were conducted. Conference abstracts identified through the EMBASE search were eligible for inclusion. Additional studies were identified through a review of clinical trial registries and the reference lists of identified publications.

The general search strategy involved identifying papers according to the population of interest, the inclusion of interventions and comparators of interest, and the restriction to randomized controlled trials. The population was identified as having HIV or AIDS and not being treatment experienced or failing treatment. The searches were further restricted to exclude publication types that were not of interest (i.e., newsletters and reviews). The specific search strategies are presented in the statistical analysis plan.68

On 15 August 2016, IPD from three RCTs available through CSDR were formally requested. These were FLAMINGO (DRV/r + 2 NRTIs vs DTG + 2 NRTIs),94,157 SINGLE (DTG + ABC + XTC vs EFV + TDF + XTC),73,227-229 and SPRING-2 (DTG + 2 NRTIs vs RAL + 2 NRTIs).187,188 Access to the data was granted on 06 June 2017. In hindsight, there was one more eligible trial that was available at the time through this service, namely SPRING-1;216,223 however, it had not been identified at the time. SPRING-1 was a Phase II trial comparing DTG to EFV. IPD are also available through the YODA Project. This service potentially could have provided access to ARTEMIS,80,109,136,153,172,176,217 ECHO,160 and THRIVE;98 however, to the best of our knowledge, neither YODA nor CSDR would allow for data from their database to be combined with data from the other.
It is important to note that despite not having the IPD for these four additional trials, they were still included in the evidence base and lack of IPD does not delegitimize the planned analysis.

3.4.1.3 Study selection and data extraction

Two investigators, working independently, scanned all titles and abstracts identified in the literature search. The same two investigators independently reviewed potentially relevant records in full text. If any discrepancies occurred between the studies selected by the two investigators, a third investigator provided arbitration. The same approach was used for data extraction. Further details on data extraction, including assessment of study quality, are provided in the statistical analysis plan.68

3.4.1.4 Study quality

As described in Chapter 2, the validity of individual randomized controlled trials was assessed using the Risk of Bias instrument, endorsed by the Cochrane Collaboration.49 This instrument is used to evaluate 7 key domains: sequence generation; allocation concealment; blinding of participants and personnel; blinding of outcome assessors; incomplete outcome data; selective outcome reporting; and other sources of bias.

The validity of non-randomized studies, including single-arm trials, cohort studies, and observational studies, was evaluated using the Tool to Assess the Risk of Bias in Cohort Studies, developed by the Clinical Advances through Research and Information Translation (CLARITY) group at McMaster University (Hamilton, Canada). This 8-item instrument is used to evaluate various aspects of the research design and study execution, including the selection of patients, differences in patient characteristics, and the assessment of outcomes.

3.4.2 Preparation of the individual patient data

IPD were provided in a series of lengthwise tables following the Clinical Data Interchange Standards Consortium (CDISC) standards.
Using these tables, an amalgamated IPD dataset combining all three studies was prepared. The patients were restricted to the full analysis sets (FAS) as was done in each of the publications for the respective trials.94,235,295 Using the ITT set was considered, but data, as provided, did not allow for this to be done. Data were prepared in a wide format with a row per unique patient.

The following baseline characteristics were obtained: age, sex, race (Caucasian, Black, Asian, other), ethnicity (Hispanic or not), viral RNA, CD4 cell count, AIDS-defining illnesses, Hepatitis B and C, and whether the patient was MSM. Note that disease duration was sought out, but unavailable from the raw trial data. In any event, disease duration was seldom reported throughout the evidence base obtained through the SLR as well. Furthermore, for the purpose of analyses, some variables were re-parameterized. Specifically, baseline CD4 was divided by 100 and viral RNA was on the log base 10 scale so as to allow for more interpretable regression coefficients. Moreover, given that IPD categorical variables were coded as 0 or 1, percentages (0-100) were converted to proportions (0-1) in the SLR data in order to ensure a shared scale of measurement; the same transformations of baseline CD4 and viral RNA were likewise applied to the SLR data. The following outcomes were obtained: viral suppression at 24, 48 and 96 weeks; change from baseline in CD4 cell counts at 24, 48 and 96 weeks; discontinuations, discontinuations due to adverse events, and serious adverse events. Viral suppression was defined using the FDA Snapshot algorithm, meaning missing values are set to failures. As discussed in Chapter 1, this implies that viral suppression captures a combination of efficacy and tolerability. Two outcomes were not available through the raw clinical trials data tables, namely AIDS-defining illnesses and regimen substitutions.
Furthermore, given the few mortality events within the evidence base, mortality was not included within the updated analyses. By definition, there were no missing values for viral suppression. Similarly, there were no missing values for safety and tolerability outcomes. For CD4, there were missing values; analyses were conducted only on patients for whom an observation was available. Following the identification of baseline characteristics and outcomes of interest, data were further verified to ensure that published results for each trial could be obtained from the IPD. Initial differences were resolved. For improved transparency, reporting is in accordance with the preferred reporting items for systematic review and meta-analysis of individual participant data (PRISMA-IPD) guidelines. Please see Appendix B.1 for the completed PRISMA-IPD checklist.297

3.4.3 Statistical analyses

The statistical analyses were conducted in two stages. Stage 1 involved comparing the various statistical models of interest for conducting meta-regression adjustments with IPD and AgD on select outcomes. Stage 2 involved applying the preferred adjustment method as identified in Stage 1.

Given that the analyses for this chapter were an extension of the analyses in Chapter 2, there are numerous aspects of the methods from Chapter 2 that apply here as well. All analyses were conducted in the Bayesian framework, as described in more detail in the previous chapter. Non-informative priors were used throughout all analyses. The only exception to this was the use of empirical priors for meta-regression covariates in a set of analyses that are discussed in detail in Section 3.4.3.1. Node definitions also aligned with those used in the analyses of Chapter 2.
In brief, nodes were defined in terms of specific anchor antivirals rather than specific ART regimens and backbones were treated as a covariate to be adjusted for through arm-based meta-regression, rather than as a defining characteristic of the node itself.

The following outcomes were used for Stage 1:
• Viral suppression at 48 weeks (+/- 4 weeks)
• Change from baseline in CD4 cell counts at 48 weeks (+/- 4 weeks)
• Discontinuations
• Discontinuations due to adverse events

The motivation for choosing these variables is that DTG and EFV400 are treatments that are viewed to have as good or better efficacy and improved tolerability.67 Outcomes for Stage 2 were simply the remaining outcomes from Chapter 2 that were available in the IPD, namely viral suppression and change from baseline in CD4 cell count at 96 weeks and treatment-emergent AEs and SAEs. Note that AEs and SAEs are reported as such in publications and are not derived through the combination of reporting of individual adverse events. Given that this research involves multiple classes of drugs and that safety profiles with respect to specific adverse events differ by class, it was deemed to not be as relevant to look at specific adverse events. As stated above, the interest lay in using various forms of meta-regression to reduce heterogeneity across the network and improve treatment-effect estimation. The target population was set to be the average population amongst EFV patients. The reasoning here is that EFV was the recommended preferred first-line regimen at the time and also the most connected node in the network. A few factors were considered when deciding which variables to adjust for as covariates. First, they needed to be baseline characteristics that were potential effect-modifiers (i.e., it needed to be plausible that they would impact the treatment-effect). Second, they needed to be variables that were imbalanced in the network.
And third, they needed to be variables that were well described in the AgD. The following variables were considered for covariate adjustments:
• Baseline CD4 cell counts
• Baseline viral RNA (log transformed)
• Proportion of males

The three trials for which IPD were available tended to include healthier patients (higher baseline CD4 and lower baseline HIV RNA) and more males than the average EFV trial. Although MSM was identified early on as a covariate of interest, it was too poorly reported in the AgD. The proportion of males was viewed as a proxy to the proportion of MSM. The variables listed above, along with age, were the best-reported variables. There were some differences in other variables, such as race and HCV, but the differences appeared to be more negligible. Most other variables are under-reported in the literature, rendering adjustment on these difficult or untenable. The analyses were centered near the mean values among EFV trials, specifically 250 cells/mm3 for baseline CD4, 4.8 log copies/mL for HIV RNA and 78% males. Finally, the covariates were parametrized as follows. HIV RNA was left unchanged; CD4 was divided by 100 to make the covariate estimates larger and easier to interpret; and finally, the proportion of males was parameterized as a proportion (between 0 and 1) rather than a percentage. The latter was to ensure coherence with the 0/1 parameterization of males in the IPD.

3.4.3.1 Statistical models

For each outcome, a large collection of analyses was conducted using the various models of interest and the different combinations of covariates on which to adjust. Analyses consisted of comparing a variety of modeling approaches. The baseline model was the unadjusted NMA using AgD only. This model was used throughout most of the analyses in Chapter 2. In general, models fit into one of three categories: AgD NMA models, two-stage IPD-AgD models, and one-stage IPD-AgD models.
Two-stage IPD-AgD models are those in which the IPD are used to estimate the AgD results that would have been observed if the population was centered at specific values of patient characteristics; the second stage consists of using the revised AgD in a typical NMA model. One-stage IPD-AgD models are those that conduct the NMA using both IPD and AgD in a single model, with the regression adjustments made within the model. In 2005, Simmonds et al. reported that 28/44 (63%) published IPD meta-analyses used the two-stage approach to IPD-AgD NMA. In a more recent 2015 review, the same researchers report roughly even use of one- and two-stage approaches, though outside of survival outcomes, the use of one-stage IPD-AgD NMA has become more popular.286

Each model is defined below. Note that the first three sets of models (AgD NMA, AgD NMA with meta-regression, and Two-stage IPD-AgD NMA) are described through Equations 2-3, 2-4, and 2-7 of Chapter 2. All analyses include an arm-based meta-regression adjustment, as described in Equation 2-8, but for simplicity, these are not included in any of the model equations below.

AgD NMA: Re-run the analyses conducted in 2015 with the updated data from 2018. This served as the “baseline” results from which to draw comparisons.

AgD NMA with meta-regression: Traditional meta-regression for NMA as described in the NICE Technical Support Document 3.40 See Equation 2-7 and the surrounding text for further details.

Two-stage IPD-AgD NMA: For these analyses, aggregate values for the DTG trials were calculated using the IPD. Specifically, mixed linear regression among the IPD was used to model the effects of the candidate covariates on each outcome. These estimated effects were then used to determine the aggregate value that would have been observed if the IPD trials were observed within the target population. The advantage of these methods is that they are much simpler to implement and can be conducted much faster. 
The disadvantage is that the analyses are disjointed and some of the information from the IPD may not be fully utilized. This may lead to sub-optimal results.

One-stage IPD-AgD NMA without adjustments: Both IPD and AgD were analyzed simultaneously, but no meta-regression adjustments were used. This was simply to help determine if the IPD results were as expected. These results were meant to line up with the AgD NMA, and as such were used as a form of barometer. The model is shown in Equation 3-1, where $\theta_{ijk}$ is the link-function-transformed parameter from the likelihood function of interest for the ith individual, in the jth trial, treated with treatment k. Similarly, $\eta_{jk}$ is the link-function-transformed parameter from the likelihood function of interest for treatment k in the jth trial for the AgD. $\mu_{jb}$ and $\nu_{jb}$ are the study effects for the IPD and AgD, respectively, and reflect the mean value for the reference treatment. The remainder of the model notation follows from that described in Chapter 2. Identity link functions with Normal likelihoods were used for continuous outcomes. In the case of the IPD likelihood function, $y_{ijk} \sim \mathrm{Normal}(\theta_{ijk}, se_{ijk}^{2})$ was used, where $y_{ijk}$ is the observed response for patient i, using treatment k in study j and $se_{ijk}$ is a standard error generated using a random uniform distribution. For dichotomous outcomes, logit link functions were used. A Bernoulli likelihood was used in the IPD stage and a binomial distribution was used in the AgD stage. For the IPD, the likelihood equation was $m_{ijk} \sim \mathrm{Bernoulli}(p_{ijk})$, where $m_{ijk}$ is the dichotomous value for patient i using treatment k in study j. These choices of link functions and likelihoods apply for all one-stage IPD-AgD NMA models.

$$\text{IPD:}\quad \theta_{ijk} = \begin{cases} \mu_{jb} & \text{if } k = b \\ \mu_{jb} + \delta_{jbk} & \text{if } k \succ b \end{cases}$$
$$\text{AgD:}\quad \eta_{jk} = \begin{cases} \nu_{jb} & \text{if } k = b \\ \nu_{jb} + \delta_{jbk} & \text{if } k \succ b \end{cases}$$
$$\delta_{jbk} \sim \mathrm{Normal}(d_{bk}, \sigma^{2}) = \mathrm{Normal}(d_{Ak} - d_{Ab}, \sigma^{2})$$
$$d_{AA} = 0, \qquad d_{Ak} \sim \mathrm{Normal}(0, 1000)$$
Equation 3-1

Similar to Chapter 2, the order of treatments follows the alphabet and the greater than operator (≻) corresponds to a letter further down in the alphabet. Moreover, the assumption of equal between-study variance is used for these models as well.

One-stage IPD-AgD NMA with adjustments: IPD and AgD were combined, along with meta-regression, in a single model. This has the advantage of being a single model using all data and is likely to be the preferred model. An advantage that both this and the next models have is that they can more easily allow for the adjustment of multiple variables. Meta-regression with multiple variables using AgD only is seldom, if ever, feasible. The model is shown in Equation 3-2. It shares the same notation as Equation 3-1. Additional terms are those specific to the meta-regression adjustments. For the IPD, $\beta_{0j}$ is a study-specific effect of subject-level covariate $x_{ij}$. $\beta_{1Ak} - \beta_{1Ab}$ reflects the interaction effects of covariate $x_{ij}$ for treatment k relative to control treatment b. k-1 different regression coefficients $\beta_{1Ak}$ will be estimated by the model. Parameters of primary interest from analyses are the pooled estimates of $d_{Ak}$, the estimates for the heterogeneity, and treatment-by-covariate interaction effects $\beta_{1Ak}$.

$$\text{IPD:}\quad \theta_{ijk} = \begin{cases} \mu_{jb} + \sum_{l} \beta_{0lj}\, x_{lij} & \text{if } k = b \\ \mu_{jb} + \delta_{jbk} + \sum_{l} \beta_{0lj}\, x_{lij} + \sum_{l} (\beta_{1lAk} - \beta_{1lAb})\, x_{lij} & \text{if } k \succ b \end{cases}$$
$$\text{AgD:}\quad \eta_{jk} = \begin{cases} \nu_{jb} & \text{if } k = b \\ \nu_{jb} + \delta_{jbk} + \sum_{l} (\beta_{1lAk} - \beta_{1lAb})\, \mathrm{x.agg}_{lj} & \text{if } k \succ b \end{cases}$$
$$\delta_{jbk} \sim \mathrm{Normal}(d_{bk}, \sigma^{2}) = \mathrm{Normal}(d_{Ak} - d_{Ab}, \sigma^{2})$$
$$d_{AA} = 0, \quad \beta_{1AA} = 0, \qquad d_{Ak} \sim \mathrm{Normal}(0, 1000), \quad \beta_{lk} = b_{l}, \quad b_{l} \sim \mathrm{Normal}(0, 1000)$$
Equation 3-2

Two-stage IPD-AgD NMA with empirical-priors: These models were the same as described in Equation 3-2, except that the regression coefficients were provided with an empirical prior that was informed by the IPD. Rather than start with the non-informative prior for $\beta_{1Ak}$, the IPD were first used to estimate meta-regression coefficients using mixed-effects linear regression. The estimates and standard errors of the meta-regression were used to construct an empirical prior: $\beta_{1Ak} \sim \mathrm{Normal}(\hat{\beta}, \mathrm{prec}_{\hat{\beta}})$. The idea here is two-fold: first, to ensure that the IPD principally inform the meta-regression (potentially avoiding some ecological fallacy bias), and second, to simplify the computational process.

One-stage IPD-AgD NMA with hierarchical meta-regression: The final model that was considered was an expansion of one-stage IPD-AgD NMA that applies the hierarchical meta-regression adjustments first described by Jackson et al. and developed for NMA by Jansen et al.53,54 Unfortunately, these methods have not been developed for all scenarios of interest. Most notably, they have only been developed for binomial outcomes. Development of methods for continuous outcomes is complete, but these have not yet been published. The model is shown in Equation 3-3. It shares the same notation as Equations 3-1 and 3-2.

$$\text{IPD:}\quad m_{ijk} \sim \mathrm{Bernoulli}(p_{ijk})$$
$$\mathrm{logit}(p_{ijk}) = \begin{cases} \mu_{jb} + \beta_{0}\, x_{ij} & \text{if } k = b \\ \mu_{jb} + \delta_{jbk} + \beta_{0}\, x_{ij} + (\beta_{1Ak} - \beta_{1Ab})\, x_{ij} & \text{if } k \succ b \end{cases}$$
$$\text{AgD:}\quad r_{jk} \sim \mathrm{Binomial}(\pi_{jk}, n_{jk})$$
$$\pi_{jk} = \pi^{0}_{jk}\,(1 - \mathrm{x.agg}_{j}) + \pi^{1}_{jk}\, \mathrm{x.agg}_{j}$$
$$\mathrm{logit}(\pi^{0}_{jk}) = \begin{cases} \nu_{jb} & \text{if } k = b \\ \nu_{jb} + \delta_{jbk} & \text{if } k \succ b \end{cases}$$
$$\mathrm{logit}(\pi^{1}_{jk}) = \begin{cases} \nu_{jb} + \beta_{0} & \text{if } k = b \\ \nu_{jb} + \delta_{jbk} + \beta_{0} + (\beta_{1Ak} - \beta_{1Ab}) & \text{if } k \succ b \end{cases}$$
$$\delta_{jbk} \sim \mathrm{Normal}(d_{bk}, \sigma^{2}) = \mathrm{Normal}(d_{Ak} - d_{Ab}, \sigma^{2})$$
$$d_{AA} = 0, \quad \beta_{1AA} = 0, \qquad d_{Ak} \sim \mathrm{Normal}(0, 0.001), \quad \beta_{lk} = b_{l}, \quad b_{l} \sim \mathrm{Normal}(0, 1000)$$
Equation 3-3

The IPD part of this model is the same as that of the one-stage IPD-AgD NMA with adjustments, with the exception that the coefficient related to the covariate effect, $\beta_{0}$, is not study specific but fixed across studies. This is required because this parameter is now also used in the AgD part of the model (which reflects different studies). Alternatively, $\beta_{0}$ can be defined as specific to treatment b or as exchangeable across studies (or a mixture of both). For the AgD part of the model, the number of events $r_{jk}$ in study j for treatment k is assumed to be binomially distributed with probability $\pi_{jk}$ and sample size $n_{jk}$. $\pi_{jk}$ can be considered the average probability of the response of interest for an individual in study j treated with intervention k. $\pi_{jk}$ is determined by integrating the individual-level model over the joint within-study distribution of the binary covariate x. For a single binary covariate, $\pi_{jk}$ equals the sum of the proportion of subjects with covariate x=1 in each AgD study multiplied by $\pi^{1}_{jk}$ and the proportion of subjects with covariate x=0 multiplied by $\pi^{0}_{jk}$. $\pi^{1}_{jk}$ represents the marginal probability of response with treatment k for a subject with the covariate x=1 in study j. Similarly, $\pi^{0}_{jk}$ is the equivalent for a subject with x=0. $\nu_{jb}$ is the log odds of a response with comparator treatment b in study j for subjects with x=0. $\delta_{jbk}$ is the study-specific log OR of treatment k relative to b for these subjects, and $d_{bk}$ is the pooled effect. $d_{bk} + (\beta_{1Ak} - \beta_{1Ab})$ reflects the pooled log OR of treatment k relative to b for a subject with covariate x=1.

The covariate adjustment values $\beta_{1Ak}$ are distinct from those used in previous equations (Chapters 2 and 3) in that they are patient-level effects rather than trial-level effects. 
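As a numerical illustration of how the AgD arm probability in Equation 3-3 is built from the two latent subgroup probabilities, the following Python sketch evaluates the weighted sum for a single trial arm. The function and argument names are hypothetical and chosen only for illustration; this is not the JAGS code from Appendix B.2, and it assumes a single binary covariate as in the text above.

```python
import math


def inv_logit(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def arm_probability(study_effect, delta, beta0, beta1_diff, prop_x1,
                    is_reference=False):
    """Average response probability for one AgD arm (single binary covariate x).

    study_effect: log odds for reference treatment b among subjects with x = 0
    delta:        study-specific log OR of treatment k vs b for subjects with x = 0
    beta0:        covariate main effect (shared with the IPD part of the model)
    beta1_diff:   treatment-by-covariate interaction of k relative to b
    prop_x1:      reported proportion of subjects with x = 1 in the study
    """
    if is_reference:
        # The reference arm has no treatment effect and no interaction term.
        delta, beta1_diff = 0.0, 0.0
    p_x0 = inv_logit(study_effect + delta)                       # subjects with x = 0
    p_x1 = inv_logit(study_effect + delta + beta0 + beta1_diff)  # subjects with x = 1
    # Weight the two latent probabilities by the covariate distribution.
    return p_x0 * (1.0 - prop_x1) + p_x1 * prop_x1
```

Because only the weighted sum enters the binomial likelihood, the AgD alone cannot separate the two subgroup probabilities, which is why the covariate coefficients in this model are informed by the IPD.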
Even in the other IPD models, the effects are trial-level because they are estimated by both IPD and AgD. In Equation 3-3, the values $\pi^{0}_{jk}$ and $\pi^{1}_{jk}$ are latent probabilities and, as a result, it is not possible to point identify $\beta_{0}$ and $\beta_{1Ak}$ from AgD only, which in turn implies that these are solely estimated through IPD. This removes the possibility of the ecological fallacy bias entirely, rather than the partial approach from the two-stage IPD-AgD NMA with empirical-priors.

3.4.3.2 Measures of model comparison
To assess the different models, the following measures were compared:
• Treatment-effect estimates and posterior distributions of key comparisons
• Coefficient estimates and posterior distributions
• DIC value comparisons across models, as well as pD and deviance
• Between-study heterogeneity (between-study variance of modelled outcome, e.g., log OR)
• The proportion of points falling outside the lines c=3 and c=4 within leverage plots (the curves are of the form $x^2 + y = c$). Points outside of the lines with c=3 can generally be identified as contributing to the model’s poor fit (see TSD2).58 Although the figures were reviewed, they are not presented.
• Change in SUCRA (surface under the cumulative ranking curve) scores

The posterior distributions for treatment-effect estimates are the outputs of greatest interest when conducting a Bayesian NMA. These are the outputs that are subsequently used to draw inference and for decision-making. As such, the greatest impact that can be observed from using different modeling strategies is a large shift in treatment-effect estimates. For this reason, this was a primary measure of modeling impact. There were no specific hypotheses beforehand regarding how these would be affected. For comparisons in treatment-effect, the absolute effect was used because it is the most interpretable. 
For example, a difference of 5% in the proportion of viral suppression is more interpretable than a difference of 1.5 in the logarithm of the odds ratio. For the dichotomous variables, a difference of 1% was chosen as the threshold of minimal clinically important difference; however, one could argue that any improvement would be meaningful given the size of the epidemic. For a change in CD4, a difference of 10 cells/mm3 was chosen to be the minimal clinically important difference to align with the values that were used in the WHO reviews.

The coefficient estimate is usually a measure of secondary importance in that most applications of NMA are not used to assess the degree to which a given covariate impacts the outcome of interest. As such, it provides a secondary measure by which to assess the impact of the adjustment, as well as a measure by which to assess the existence and extent to which a chosen covariate influences the outcome of interest. The hypothesis was that the estimates would be more precise, because more points are used to estimate them, and that they might be larger due to having better estimates of standard deviation.

The deviance information criterion (DIC) was used to compare the goodness-of-fit of competing models.298 DIC provides a measure of model fit that penalizes model complexity according to $DIC = \bar{D} + pD$, $pD = \bar{D} - \hat{D}$. $\bar{D}$ (“Dbar”) is the posterior mean residual deviance, pD is the effective number of parameters, and $\hat{D}$ is the deviance evaluated at the posterior mean of the model parameters. In general, a more complex model will result in a better fit to the data, demonstrating a smaller residual deviance. The model with the better trade-off between fit and parsimony has a lower DIC. As suggested by Burnham and Anderson (1998), models with a DIC within 1–2 of the ‘best’ deserve consideration, and those within 3–7 have considerably less support; these rules of thumb appear to work reasonably well for DIC. 
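The DIC decomposition just described can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, the inputs are assumed to be deviance values already extracted from the MCMC output, and the default threshold of 3 is the cut-off for a meaningful difference stated in the statistical analysis plan.

```python
def dic_summary(deviance_samples, deviance_at_posterior_mean):
    """Return Dbar, pD and DIC from sampled deviances.

    deviance_samples:           residual deviance at each retained MCMC iteration
    deviance_at_posterior_mean: deviance evaluated at the posterior mean (Dhat)
    """
    dbar = sum(deviance_samples) / len(deviance_samples)  # posterior mean deviance
    p_d = dbar - deviance_at_posterior_mean                # effective number of parameters
    return {"Dbar": dbar, "pD": p_d, "DIC": dbar + p_d}


def meaningfully_better(dic_candidate, dic_baseline, threshold=3.0):
    """True when the candidate model's DIC is lower by at least `threshold`."""
    return (dic_baseline - dic_candidate) >= threshold
```

For instance, the unadjusted AgD NMA for viral suppression at 48 weeks reports a deviance of 118.31 and a pD of 68.38, which sum to the reported DIC of 186.69; the unadjusted one-stage IPD-AgD NMA has a DIC of 185.11, a difference of 1.58 that falls short of the threshold.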
Certainly we would like to ensure that differences are not due to Monte Carlo error: although this is straightforward for $\bar{D}$ (i.e., mean deviance), Zhu and Carlin (2000) have explored the difficulty of assessing the Monte Carlo error on DIC. As described in the statistical analysis plan, a difference of 3 or more in the DIC was considered to be meaningful.299

The fifth listed measure, the proportion of points above the third and fourth parabola, corresponds to points falling outside the acceptable range in leverage plots. Leverage plots are a useful tool to simultaneously assess an observation's deviance (“how well the model fits a particular data point”) and leverage (“how influential an observation is with respect to model estimation”). Points outside these parabolas are either too influential, too poor a fit, or too influential given their poor fit. Finally, the SUCRA is a measure of treatment rankings that is viewed as being robust;300 however, much caution should be used in working with rankings as they can be affected by large variances for some treatments. This measure is included out of curiosity and because it is used by some.

3.4.3.3 Additional considerations
Inconsistency was assessed through the results of the inconsistency checks conducted in Chapter 2. Each network of evidence considered in this chapter was also modeled in Chapter 2. In Chapter 2, results from independent means models were compared to results obtained through edge-splitting in order to determine whether the condition of consistency was met in each network of evidence. Analyses in this chapter are extensions of the same networks of evidence that aim at reducing the impact of differences between trials. As such, these models are less likely to suffer from inconsistency, and the results of the inconsistency tests in Chapter 2 were considered sufficient for verifying the consistency in the analyses for this chapter. 
Note that, as shown in Appendix A.4, there was no evidence of inconsistency in these networks of evidence.

While previous literature has been clear on statistical models, it has been less clear on the development of some of the peripheral measures and considerations used in these models. Among these is the calculation of residual deviances in the IPD step of the model. As a reminder, the overall residual deviance, $\bar{D}_{res}$, is an absolute measure of model fit, which is critical to such steps as model selection and evaluation. As shown above, it is a key component of the calculation of the DIC. Deviance for the AgD section of the model followed the methods outlined in TSD 2.58 Specifically, when using a binomial likelihood to model dichotomous outcomes, the equation for the total residual deviance is:

$$\sum_{i}\sum_{k} 2\left[ r_{ik}\log\left(\frac{r_{ik}}{\hat{r}_{ik}}\right) + (n_{ik} - r_{ik})\log\left(\frac{n_{ik} - r_{ik}}{n_{ik} - \hat{r}_{ik}}\right) \right]$$
Equation 3-4

where $i$ is the index for the trials and $k$ is the index for the trial arms. $r_{ik}$ is the observed number of responders in the kth arm of the ith trial, $\hat{r}_{ik}$ is the model-predicted number of responders in the kth arm of the ith trial, and $n_{ik}$ is the observed number of patients in the kth arm of the ith trial. In an AgD NMA model, the components of Equation 3-4 are measured in each iteration of the MCMC simulations. The challenge lay in calculating these trial-arm based measurements when the simulations were over specific patients for trials reporting IPD. To do so, vector multiplication and the sum() function were used to calculate each of the measures used in the AgD calculation at the study level on the basis of individual-level measures. Using JAGS was critical here as it allowed for vector multiplication in a manner akin to R. While most of these measures could be calculated outside of the JAGS program, $\hat{r}_{ik}$ (rhat in JAGS code) could not and therefore, further thought would be required to program this in OpenBUGS or WinBUGS. See the top of page 198 in Appendix B.2 for full details. 
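The idea of rebuilding arm-level quantities from patient-level values can be illustrated outside of JAGS. The Python sketch below is not the code from Appendix B.2: the function names are hypothetical, it handles one trial arm and a single set of fitted probabilities (whereas the analyses repeat this within every MCMC iteration), and it assumes the usual convention that a term with zero observed count contributes zero.

```python
import math


def arm_summaries(responses, fitted_probs):
    """Collapse patient-level data for one trial arm to (nn, rr, rhat).

    responses:    0/1 outcome for each patient in the arm
    fitted_probs: model probability of response for each patient
    """
    nn = len(responses)       # number of patients in the arm
    rr = sum(responses)       # observed number of responders
    rhat = sum(fitted_probs)  # modeled number of responders
    return nn, rr, rhat


def binomial_residual_deviance(nn, rr, rhat):
    """Contribution of one arm to the binomial residual deviance."""
    def term(observed, fitted):
        # Assumed convention: 0 * log(0) is taken as 0.
        return 0.0 if observed == 0 else observed * math.log(observed / fitted)

    return 2.0 * (term(rr, rhat) + term(nn - rr, nn - rhat))
```

Summing `binomial_residual_deviance` over all arms of the IPD trials gives a quantity on the same scale as the AgD contributions, so the two parts of the model can be combined into a single residual deviance.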
In the JAGS code, ns_ipd is the number of studies reporting IPD, nn[i,j] represents the number of patients in the jth arm of the ith study, pp[i,j] represents the proportion of responders in the jth arm of the ith study, rr[i,j] represents the number of responders in the jth arm of the ith study, and rhat[i,j] represents the modeled number of responders in the jth arm of the ith study. In doing so, we re-create the individual elements of the AgD equation at the trial level for each IPD-reporting trial.

3.4.3.4 Software
The parameters of the different models were estimated using a Markov Chain Monte Carlo method implemented in the JAGS software package. The first series of 30,000 iterations from the JAGS sampler were discarded as ‘burn-in’, and the inferences were based on an additional 50,000 iterations using two chains. For all analyses, model convergence was assessed through trace plots, density plots and Gelman-Rubin-Brooks (shrink factor) plots.301 All analyses were performed using R version 3.4.4 (http://www.r-project.org/) and JAGS version 4.3. Code used to conduct the analyses specific to this chapter is presented in Appendix B.2.

3.4.4 Funding
This work was supported by a CIHR (Canadian Institutes of Health Research) Doctoral Research Award. The IPD were provided by GlaxoSmithKline through the ClinicalStudyDataRequest.com programme. Neither agency played any role in the development and execution of the SLR and the analyses.

3.5 Results
3.5.1 Evidence base
Study and patient selection are presented in the PRISMA-IPD297 flow diagram in Figure 3-2. The search was conducted in three phases: the first search of AgD was conducted in May 2015, a search for IPD was conducted on 15 August 2016, and then an updated search of AgD was conducted on 12 February 2018. The IPD search in 2016 involved both YODA and CSDR; however, data were only obtained through CSDR. 
These included 2160 patients from FLAMINGO (DRV/r + 2 NRTIs vs DTG + 2 NRTIs),94,157 SINGLE (DTG + ABC + XTC vs EFV + TDF + XTC),73,227-229 and SPRING-2 (DTG + 2 NRTIs vs RAL + 2 NRTIs).187,188 As shown in Figure 3-2, the 2160 patients for which individual patient level data were available represent 6.5% of the total evidence base (2160/33,148), and as shown in Figure 3-3, the three trials cover a total of 3 of 24 edges (12.5%; shown in red) with trials providing head-to-head evidence. Given that the data matched those used to produce the results reported in the publications, as was confirmed through the data checks, the patient characteristics match those described in the statistical analysis plan that was used for both Chapters 2 and 3. The results of the individual studies are provided in Appendix B.3.

As shown in Appendix A.2, overall study quality was generally high (i.e., low risk of bias). Exceptions were restricted to open-label trials, which had a high risk of bias due to blinding, and some of the more recent trials that were only reported in posters, which had insufficient information to determine with certainty whether the risk of bias was low or high. While there was a low risk of bias within studies, there was a fairly large variability in baseline characteristics across studies and, importantly, across comparisons. As shown in Appendix B.4, the covariates selected for adjustments in this study were both the best reported and had a high degree of variability. This was especially apparent in the baseline CD4, but note that because HIV RNA is on a log scale, its variability appears misleadingly muted. Beyond the variables for which adjustments were made, there was also important variability in ethnicity, co-infections and baseline risk groups; however, data were too sparsely reported to adjust for these through meta-regression. As such, there is some risk of bias here. 
Nonetheless, the variability in these baseline variables is known to be highly correlated with the variables for which we did adjust.

Figure 3-2: PRISMA-IPD flow diagram

Figure 3-3: Network of all studies included in the evidence base
Legend: Circles (nodes) in the diagrams represent individual treatments, lines between circles represent availability of head-to-head evidence between two treatments, and the numbers on the lines are the number of RCTs informing each head-to-head comparison. Blue: NNRTIs; Green: Protease inhibitors; Orange: Integrase inhibitors. ATV/r: ritonavir-boosted atazanavir; DRV/r: ritonavir-boosted darunavir; DTG: dolutegravir; EFV: efavirenz; EFV400: efavirenz 400; EVG/c: elvitegravir/cobicistat; LPV/r: ritonavir-boosted lopinavir; NVP: nevirapine; RAL: raltegravir; RPV: rilpivirine; BIC: bictegravir; DOR: doravirine

3.5.2 Stage 1 – Comparing meta-regression adjustments
Overall, the use of IPD appeared to have a negligible impact on the results. In each outcome, the use of IPD impacted an aspect of the results (say DIC, rankings or covariate estimates), but the aspect affected changed from one outcome to the next and tended to not be meaningful. The full set of results is shown for viral suppression at 48 weeks. For the remaining primary outcomes, only noteworthy results are discussed, and the remaining Stage 1 results are presented in Appendix B.5 for completeness.

3.5.2.1 Viral suppression at 48 weeks
Table 3-2 presents the model fit for the various models of interest for viral suppression at 48 weeks. The lowest DIC was for the unadjusted one-stage IPD-AgD NMA; however, the difference between it and the base model was not meaningful. The fits using the one-stage IPD-AgD NMA were considerably better than those using informative priors based on external analyses (two-stage empirical-priors approach). 
Moreover, they tended to have a better fit than those using IPD-adjusted AgD NMA, but not all of these differences in DIC were meaningful. The use of IPD appeared to have minimal impact on the heterogeneity parameter estimate for this outcome. This is hardly surprising given that the selected models tended to be fixed-effects rather than random-effects. Finally, with respect to Table 3-2, the proportion of observations above the third and fourth parabola in the leverage vs deviance plot tended to be stable. Nonetheless, the trend was towards having more outliers among the two-stage AgD NMA.

Table 3-2: Comparison of model selection and fit for viral suppression at 48 weeks

| Analyses | Model | DIC | pD | Deviance | Between study heterogeneity | prop3 | prop4 |
|---|---|---|---|---|---|---|---|
| AgD NMA – Unadjusted | Fixed | 186.69 | 68.38 | 118.31 | 0.07 (0.003, 0.199) | 2/116 | 2/116 |
| AgD NMA meta-regression – CD4 | Fixed | 187.26 | 69.25 | 118.01 | 0.082 (0.006, 0.211) | 2/116 | 2/116 |
| AgD NMA meta-regression – HIV RNA | Fixed | 188.79 | 69.38 | 119.41 | 0.07 (0.005, 0.194) | 2/116 | 2/116 |
| AgD NMA meta-regression – Male | Fixed | 188.98 | 69.42 | 119.56 | 0.07 (0.003, 0.195) | 2/116 | 2/116 |
| Two-stage AgD NMA – CD4 + HIV RNA + Male | Fixed | 189.14 | 68.37 | 120.77 | 0.072 (0.003, 0.204) | 2/116 | 2/116 |
| Two-stage AgD NMA – CD4 + Male | Fixed | 188.21 | 68.38 | 119.83 | 0.076 (0.003, 0.208) | 2/116 | 2/116 |
| Two-stage AgD NMA – CD4 + HIV RNA | Fixed | 193.10 | 68.42 | 124.68 | 0.088 (0.005, 0.228) | 3/116 | 2/116 |
| Two-stage AgD NMA – HIV RNA + Male | Fixed | 190.15 | 68.44 | 121.71 | 0.078 (0.005, 0.215) | 2/116 | 2/116 |
| Two-stage AgD NMA – HIV RNA | Fixed | 191.23 | 67.67 | 123.56 | 0.087 (0.005, 0.225) | 3/116 | 2/116 |
| Two-stage AgD NMA – CD4 | Fixed | 189.06 | 67.70 | 121.36 | 0.082 (0.005, 0.222) | 2/116 | 2/116 |
| Two-stage AgD NMA – Male | Random | 205.19 | 82.51 | 122.68 | 0.167 (0.017, 0.306) | 4/116 | 3/116 |
| One-stage IPD-AgD NMA – unadjusted | Fixed | 185.11 | 68.45 | 116.66 | 0.072 (0.003, 0.2) | 2/116 | 2/116 |
| One-stage IPD-AgD NMA – CD4 + HIV RNA + Male | Fixed | 188.65 | 71.41 | 117.24 | 0.066 (0.003, 0.204) | 2/116 | 2/116 |
| One-stage IPD-AgD NMA – CD4 + Male | Fixed | 188.35 | 70.21 | 118.14 | 0.074 (0.002, 0.206) | 2/116 | 2/116 |
| One-stage IPD-AgD NMA – CD4 + HIV RNA | Fixed | 187.19 | 70.54 | 116.65 | 0.073 (0.004, 0.204) | 2/116 | 2/116 |
| One-stage IPD-AgD NMA – HIV RNA + Male | Fixed | 188.90 | 70.58 | 118.32 | 0.074 (0.004, 0.207) | 2/116 | 2/116 |
| One-stage IPD-AgD NMA – CD4 | Fixed | 186.82 | 69.31 | 117.51 | 0.067 (0.001, 0.206) | 2/116 | 2/116 |
| One-stage IPD-AgD NMA – HIV RNA | Fixed | 187.76 | 69.36 | 118.4 | 0.074 (0.003, 0.203) | 2/116 | 2/116 |
| One-stage IPD-AgD NMA – Male | Fixed | 186.40 | 69.48 | 116.92 | 0.068 (0.004, 0.199) | 2/116 | 2/116 |
| Two-stage empirical-priors – CD4 + HIV RNA + Male | Fixed | 198.56 | 77.47 | 121.09 | 0.104 (0.012, 0.244) | 2/116 | 2/116 |
| Two-stage empirical-priors – CD4 + Male | Fixed | 197.69 | 74.18 | 123.51 | 0.108 (0.006, 0.253) | 2/116 | 2/116 |
| Two-stage empirical-priors – CD4 + HIV RNA | Fixed | 192.22 | 74.75 | 117.47 | 0.071 (0.003, 0.221) | 2/116 | 2/116 |
| Two-stage empirical-priors – HIV RNA + Male | Fixed | 195.30 | 74.40 | 120.90 | 0.091 (0.004, 0.23) | 2/116 | 2/116 |
| Two-stage empirical-priors – CD4 | Fixed | 191.28 | 71.49 | 119.79 | 0.082 (0.008, 0.215) | 2/116 | 2/116 |
| Two-stage empirical-priors – HIV RNA | Fixed | 193.05 | 71.16 | 121.89 | 0.09 (0.004, 0.228) | 3/116 | 2/116 |
| Two-stage empirical-priors – Male | Fixed | 188.59 | 71.81 | 116.78 | 0.075 (0.004, 0.208) | 2/116 | 2/116 |
| HMR IPD-AgD NMA – CD4 | Fixed | 185.25 | 70.07 | 115.18 | 0.061 (0.005, 0.181) | 2/116 | 2/116 |
| HMR IPD-AgD NMA – HIV RNA | Fixed | 187.21 | 70.45 | 116.76 | 0.066 (0.001, 0.192) | 2/116 | 2/116 |
| HMR IPD-AgD NMA – Male | Fixed | 186.34 | 70.44 | 115.90 | 0.072 (0.003, 0.197) | 2/116 | 2/116 |

AgD: Aggregate data; IPD: Individual patient data; NMA: Network meta-analysis; DIC: Deviance information criterion; pD: Effective number of parameters; prop3: Proportion of observations above deviance² + leverage = 3; prop4: Proportion of observations above deviance² + leverage = 4

Table 3-3 displays effects on rankings. The top-ranked treatments were all INSTIs. In most cases, there was no change in rankings. Changes in rankings tended to happen in the models with the highest DICs and hence those that were not at risk of being favoured. Changes in the top three rankings tended to be limited to a re-ordering of the same treatments, with DTG usually remaining on top. There were no notable trends among changes in rankings for this outcome.

Table 3-3: Comparison of surface under the cumulative ranking (SUCRA) for viral suppression at 48 weeks

| Analyses | Model | EFV: Rank (SUCRA) | DTG: Rank (SUCRA) | EFV400: Rank (SUCRA) | Change in top 3 ranked |
|---|---|---|---|---|---|
| AgD NMA – Unadjusted | Fixed | 8 (0.43) | 1 (0.99) | 5 (0.59) | DTG, RAL, EVG/c |
| AgD NMA meta-regression – CD4 | Fixed | 8 (0.42) | 1 (0.99) | 5 (0.60) | None |
| AgD NMA meta-regression – HIV RNA | Fixed | 8 (0.43) | 1 (0.99) | 5 (0.59) | None |
| AgD NMA meta-regression – Male | Fixed | 8 (0.42) | 1 (0.99) | 5 (0.60) | None |
| Two-stage AgD NMA – CD4 + HIV RNA + Male | Fixed | 7 (0.47) | 1 (0.92) | 5 (0.65) | None |
| Two-stage AgD NMA – CD4 + Male | Fixed | 7 (0.47) | 1 (0.94) | 5 (0.64) | None |
| Two-stage AgD NMA – CD4 + HIV RNA | Fixed | 7 (0.47) | 1 (0.97) | 5 (0.64) | None |
| Two-stage AgD NMA – HIV RNA + Male | Fixed | 7 (0.47) | 1 (0.94) | 5 (0.65) | DTG, EVG/c, RAL |
| Two-stage AgD NMA – HIV RNA | Fixed | 7 (0.48) | 2 (0.86) | 5 (0.66) | RAL, DTG, EVG/c |
| Two-stage AgD NMA – CD4 | Fixed | 7 (0.48) | 1 (0.93) | 5 (0.66) | DTG, EVG/c, RAL |
| Two-stage AgD NMA – Male | Random | 7 (0.47) | 1 (0.88) | 5 (0.62) | None |
| One-stage IPD-AgD NMA – unadjusted | Fixed | 7 (0.46) | 1 (0.97) | 5 (0.63) | None |
| One-stage IPD-AgD NMA – CD4 + HIV RNA + Male | Fixed | 6 (0.49) | 1 (0.97) | 5 (0.59) | None |
| One-stage IPD-AgD NMA – CD4 + Male | Fixed | 6 (0.50) | 1 (0.97) | 5 (0.63) | None |
| One-stage IPD-AgD NMA – CD4 + HIV RNA | Fixed | 6 (0.49) | 1 (0.97) | 5 (0.59) | None |
| One-stage IPD-AgD NMA – HIV RNA + Male | Fixed | 7 (0.48) | 1 (0.97) | 5 (0.61) | None |
| One-stage IPD-AgD NMA – CD4 | Fixed | 6 (0.49) | 1 (0.97) | 5 (0.63) | None |
| One-stage IPD-AgD NMA – HIV RNA | Fixed | 7 (0.48) | 1 (0.97) | 5 (0.62) | None |
| One-stage IPD-AgD NMA – Male | Fixed | 8 (0.46) | 1 (0.97) | 5 (0.63) | None |
| Two-stage empirical-priors – CD4 + HIV RNA + Male | Fixed | 6 (0.55) | 1 (0.97) | 5 (0.62) | DTG, EVG/c, RPV |
| Two-stage empirical-priors – CD4 + Male | Fixed | 6 (0.54) | 1 (0.98) | 5 (0.63) | DTG, EVG/c, RAL |
| Two-stage empirical-priors – CD4 + HIV RNA | Fixed | 6 (0.53) | 1 (0.98) | 5 (0.62) | None |
| Two-stage empirical-priors – HIV RNA + Male | Fixed | 7 (0.49) | 1 (0.97) | 6 (0.62) | DTG, EVG/c, RPV |
| Two-stage empirical-priors – CD4 | Fixed | 6 (0.52) | 1 (0.98) | 5 (0.63) | None |
| Two-stage empirical-priors – HIV RNA | Fixed | 7 (0.49) | 1 (0.97) | 5 (0.62) | DTG, EVG/c, RPV |
| Two-stage empirical-priors – Male | Fixed | 7 (0.46) | 1 (0.97) | 6 (0.62) | None |
| HMR IPD-AgD NMA – CD4 | Fixed | 3 (0.68) | 1 (0.98) | 6 (0.60) | DTG, RAL, EFV |
| HMR IPD-AgD NMA – HIV RNA | Fixed | 8 (0.28) | 1 (0.98) | 5 (0.61) | None |
| HMR IPD-AgD NMA – Male | Fixed | 11 (0.20) | 1 (0.97) | 6 (0.59) | DTG, EVG/c, RAL |

AgD: Aggregate data; IPD: Individual patient data; NMA: Network meta-analysis; EFV: Efavirenz; DTG: Dolutegravir; EFV400: Low-dose efavirenz; RAL: Raltegravir; EVG/c: Cobicistat-boosted elvitegravir; RPV: Rilpivirine; DOR: Doravirine.

Table 3-4: Comparison of comparative treatment estimates for viral suppression at 48 weeks

| Analyses | Model | DTG vs. EFV OR (95% CrI) | EFV400 vs. EFV OR (95% CrI) | DTG vs. EFV400 OR (95% CrI) | Mean change in log-odds | Maximum change in log-odds | Mean change in proportion | Maximum change in proportion |
|---|---|---|---|---|---|---|---|---|
| AgD NMA – Unadjusted | Fixed | 1.85 (1.44, 2.38) | 1.15 (0.75, 1.80) | 1.61 (0.96, 2.68) | -- | -- | -- | -- |
| AgD NMA meta-regression – CD4 | Fixed | 1.87 (1.49, 2.39) | 1.16 (0.74, 1.97) | 1.62 (0.92, 2.59) | 0.033 | 0.089 | 0.005 | 0.015 |
| AgD NMA meta-regression – HIV RNA | Fixed | 1.85 (1.45, 2.38) | 1.15 (0.75, 1.80) | 1.61 (0.97, 2.67) | 0.001 | 0.004 | 0 | 0 |
| AgD NMA meta-regression – Male | Fixed | 1.85 (1.45, 2.38) | 1.17 (0.73, 1.86) | 1.59 (0.93, 2.73) | 0.003 | 0.013 | 0 | 0.002 |
| Two-stage AgD NMA – CD4 + HIV RNA + Male | Fixed | 1.44 (1.12, 1.85) | 1.16 (0.75, 1.80) | 1.24 (0.75, 2.05) | 0.063 | 0.266 | 0.01 | 0.042 |
| Two-stage AgD NMA – CD4 + Male | Fixed | 1.50 (1.16, 1.93) | 1.15 (0.74, 1.81) | 1.30 (0.78, 2.16) | 0.057 | 0.221 | 0.01 | 0.035 |
| Two-stage AgD NMA – CD4 + HIV RNA | Fixed | 1.59 (1.23, 2.04) | 1.16 (0.74, 1.80) | 1.37 (0.82, 2.28) | 0.056 | 0.183 | 0.009 | 0.029 |
| Two-stage AgD NMA – HIV RNA + Male | Fixed | 1.49 (1.16, 1.93) | 1.15 (0.74, 1.79) | 1.29 (0.78, 2.15) | 0.065 | 0.238 | 0.009 | 0.036 |
| Two-stage AgD NMA – HIV RNA | Fixed | 1.34 (1.04, 1.74) | 1.16 (0.74, 1.80) | 1.16 (0.70, 1.94) | 0.073 | 0.32 | 0.012 | 0.052 |
| Two-stage AgD NMA – CD4 | Fixed | 1.46 (1.12, 1.91) | 1.15 (0.75, 1.79) | 1.27 (0.75, 2.11) | 0.072 | 0.253 | 0.011 | 0.04 |
| Two-stage AgD NMA – Male | Random | 1.44 (1.05, 1.99) | 1.15 (0.66, 2.04) | 1.24 (0.65, 2.39) | 0.059 | 0.299 | 0.008 | 0.044 |
| One-stage IPD-AgD NMA – unadjusted | Fixed | 1.60 (1.26, 2.02) | 1.16 (0.75, 1.80) | 1.38 (0.84, 2.27) | 0.046 | 0.155 | 0.014 | 0.03 |
| One-stage IPD-AgD NMA – CD4 + HIV RNA + Male | Fixed | 1.57 (1.23, 2.00) | 1.10 (0.70, 1.72) | 1.43 (0.86, 2.38) | 0.076 | 0.167 | 0.052 | 0.068 |
| One-stage IPD-AgD NMA – CD4 + Male | Fixed | 1.54 (1.21, 1.95) | 1.12 (0.72, 1.75) | 1.37 (0.83, 2.26) | 0.085 | 0.187 | 0.031 | 0.05 |
| One-stage IPD-AgD NMA – CD4 + HIV RNA | Fixed | 1.57 (1.24, 2.00) | 1.10 (0.70, 1.72) | 1.43 (0.85, 2.39) | 0.076 | 0.166 | 0.03 | 0.048 |
| One-stage IPD-AgD NMA – HIV RNA + Male | Fixed | 1.58 (1.24, 2.01) | 1.11 (0.71, 1.73) | 1.41 (0.85, 2.36) | 0.06 | 0.161 | 0.054 | 0.07 |
| One-stage IPD-AgD NMA – CD4 | Fixed | 1.54 (1.22, 1.95) | 1.12 (0.72, 1.75) | 1.38 (0.84, 2.26) | 0.083 | 0.181 | 0.02 | 0.037 |
| One-stage IPD-AgD NMA – HIV RNA | Fixed | 1.56 (1.23, 1.97) | 1.12 (0.72, 1.75) | 1.39 (0.85, 2.27) | 0.066 | 0.174 | 0.022 | 0.039 |
| One-stage IPD-AgD NMA – Male | Fixed | 1.61 (1.27, 2.04) | 1.15 (0.74, 1.80) | 1.39 (0.83, 2.33) | 0.042 | 0.142 | 0.008 | 0.026 |
| Two-stage empirical-priors – CD4 + HIV RNA + Male | Fixed | 1.60 (1.21, 2.13) | 1.08 (0.68, 1.70) | 1.48 (0.87, 2.54) | 0.102 | 0.271 | 0.022 | 0.049 |
| Two-stage empirical-priors – CD4 + Male | Fixed | 1.62 (1.24, 2.12) | 1.10 (0.70, 1.71) | 1.48 (0.88, 2.49) | 0.106 | 0.245 | 0.023 | 0.047 |
| Two-stage empirical-priors – CD4 + HIV RNA | Fixed | 1.65 (1.25, 2.17) | 1.09 (0.70, 1.71) | 1.50 (0.88, 2.54) | 0.087 | 0.233 | 0.016 | 0.042 |
| Two-stage empirical-priors – HIV RNA + Male | Fixed | 1.58 (1.21, 2.05) | 1.10 (0.70, 1.74) | 1.43 (0.85, 2.40) | 0.092 | 0.194 | 0.02 | 0.039 |
| Two-stage empirical-priors – CD4 | Fixed | 1.67 (1.28, 2.17) | 1.11 (0.71, 1.73) | 1.50 (0.90, 2.52) | 0.092 | 0.211 | 0.019 | 0.041 |
| Two-stage empirical-priors – HIV RNA | Fixed | 1.58 (1.24, 2.03) | 1.11 (0.71, 1.74) | 1.42 (0.85, 2.37) | 0.083 | 0.178 | 0.021 | 0.04 |
| Two-stage empirical-priors – Male | Fixed | 1.58 (1.22, 2.04) | 1.14 (0.73, 1.79) | 1.38 (0.83, 2.32) | 0.056 | 0.16 | 0.011 | 0.028 |
| HMR IPD-AgD NMA – CD4 | Fixed | 1.34 (0.96, 1.88) | 0.94 (0.57, 1.58) | 1.42 (0.87, 2.31) | 0.226 | 0.321 | 0.131 | 0.148 |
| HMR IPD-AgD NMA – HIV RNA | Fixed | 2.03 (0.92, 3.96) | 1.36 (0.67, 2.57) | 1.48 (0.89, 2.46) | 0.16 | 0.221 | 0.120 | 0.170 |
| HMR IPD-AgD NMA – Male | Fixed | 2.20 (1.25, 4.02) | 1.48 (0.81, 2.80) | 1.48 (0.88, 2.48) | 0.258 | 0.309 | 0.036 | 0.050 |

AgD: Aggregate data; IPD: Individual patient data; NMA: Network meta-analysis; EFV: Efavirenz; DTG: Dolutegravir; EFV400: Low-dose efavirenz; OR: Odds ratio; CrI: Credible interval.
Table 3-5: Coefficient estimates across the IPD-AgD NMA for viral suppression at 48 weeks

| Analyses | Model | β1,1 Median (95% CrI) | β1,2 Median (95% CrI) | β1,3 Median (95% CrI) | β0,1 Median (95% CrI) | β0,2 Median (95% CrI) | β0,3 Median (95% CrI) |
|---|---|---|---|---|---|---|---|
| AgD NMA meta-regression – CD4 | Fixed | 0.096 (-0.039, 0.216) | -- | -- | -- | -- | -- |
| AgD NMA meta-regression – HIV RNA | Fixed | 0.009 (-0.368, 0.377) | -- | -- | -- | -- | -- |
| AgD NMA meta-regression – Male | Fixed | 0.041 (-0.947, 1.035) | -- | -- | -- | -- | -- |
| One-stage IPD-AgD NMA – CD4 + HIV RNA + Male | Fixed | 0.144 (0.015, 0.272) | -0.031 (-0.363, 0.296) | -0.237 (-0.957, 0.429) | -0.779 (-3.481, 1.729) | 0.256 (-9.87, 7.315) | -7.108 (-20.194, 17.658) |
| One-stage IPD-AgD NMA – CD4 + Male | Fixed | 0.135 (0.011, 0.264) | -0.019 (-0.338, 0.303) | -- | 0.174 (-2.918, 2.76) | -1.272 (-6.477, 3.456) | -- |
| One-stage IPD-AgD NMA – CD4 + HIV RNA | Fixed | 0.149 (0.035, 0.269) | -0.229 (-0.945, 0.427) | -- | -0.413 (-2.778, 1.985) | -2.913 (-10.851, 4.943) | -- |
| One-stage IPD-AgD NMA – HIV RNA + Male | Fixed | -0.195 (-0.486, 0.097) | -0.105 (-0.791, 0.539) | -- | 8.999 (-0.415, 15.519) | -19.637 (-32.557, -6.818) | -- |
| One-stage IPD-AgD NMA – CD4 | Fixed | 0.139 (0.028, 0.251) | -- | -- | 0.351 (-1.398, 2.058) | -- | -- |
| One-stage IPD-AgD NMA – HIV RNA | Fixed | -0.182 (-0.468, 0.104) | -- | -- | -0.500 (-4.586, 3.167) | -- | -- |
| One-stage IPD-AgD NMA – Male | Fixed | -0.014 (-0.674, 0.607) | -- | -- | 1.347 (-10.007, 6.357) | -- | -- |

Table 3-4 presents the estimated effects for the comparisons of primary interest (DTG, EFV400 and EFV). Meta-regression adjustments based on IPD tended to lower the estimated efficacy of DTG, but almost never rendered it non-significant. The exception was the use of hierarchical meta-regression, which was limited to single variable adjustments. Two observations were made with this approach. First, these analyses included much wider credible intervals than other analyses.
This aligns with results previously presented by Jansen.54 Second, the analysis adjusting for baseline CD4 led to the largest downward shift, while the analyses adjusting for HIV RNA and for the proportion of males led to the largest upward shifts. Given that none of these analyses led to improved fit, there is minimal concern here, but these methods are noted for reducing ecological fallacy (i.e., results of these analyses may have improved validity). Mean changes in the log odds were quite large across all analyses and the maximum changes were extremely large. These differences were considerably larger in the hierarchical meta-regression, which may help explain the wider credible intervals.

Table 3-5 presents the estimated coefficients across the analyses. When comparing the meta-regression coefficients, the coefficient for CD4 was statistically significant in each of the IPD analyses that included it as a covariate. Moreover, its estimated effect size was consistent across the models using IPD. The coefficient estimates were notably different across AgD and IPD models, with the largest difference seen for HIV RNA. This suggests that the AgD NMA meta-regression may have suffered from the ecological fallacy.

3.5.2.2 Change from baseline in CD4 at 48 weeks

One could argue that model fit is the most important measure to review, given that without a meaningful improvement in DIC the alternative models would never be chosen to begin with. For change from baseline in CD4, no models led to a meaningfully lower DIC than the unadjusted AgD NMA (Appendix B.5). Contrary to viral suppression, here it was the two-stage models that appeared to have the best fit among the IPD-adjusted models. Moreover, the two-stage analyses also reduced the number of points outside the fourth parabola in the leverage plots, suggesting an overall better fit to the data.
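The leverage-plot counts referred to here (and reported as prop3 and prop4 in the model selection tables) can be illustrated with a short sketch. It assumes that a deviance residual and a leverage value are available for each data point and counts the points lying outside the parabola x² + y = c, where x is the deviance residual and y the leverage. The function name and the toy inputs are hypothetical and are not taken from the thesis analyses.

```python
from typing import Sequence


def count_outside_parabola(dev_resid: Sequence[float],
                           leverage: Sequence[float],
                           c: float) -> int:
    """Count data points with dev_resid**2 + leverage > c.

    dev_resid: signed deviance residual of each data point (x-axis of the plot)
    leverage:  leverage of each data point (y-axis of the plot)
    c:         constant defining the parabola x**2 + y = c (e.g. 3 or 4)
    """
    if len(dev_resid) != len(leverage):
        raise ValueError("dev_resid and leverage must have the same length")
    return sum(1 for x, y in zip(dev_resid, leverage) if x * x + y > c)


# Toy values for illustration only (hypothetical, not thesis data).
dev_resid = [0.2, -1.1, 1.8, -0.4, 2.1]
leverage = [0.5, 0.9, 0.7, 0.3, 0.6]

n_out3 = count_outside_parabola(dev_resid, leverage, 3.0)
n_out4 = count_outside_parabola(dev_resid, leverage, 4.0)
print(f"outside third parabola:  {n_out3}/{len(dev_resid)}")
print(f"outside fourth parabola: {n_out4}/{len(dev_resid)}")
```

A model that reduces these counts relative to the unadjusted analysis has fewer poorly fitted data points, which is the sense in which the two-stage analyses are described as fitting better.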
Curiously, the choice between random and fixed effects varied, with the one-stage IPD-AgD NMA using a random-effects model rather than a fixed-effects model. This is also reflected in the larger estimated between-study heterogeneity. In this case, it is worth asking whether this implies a better fit using the AgD NMA and two-stage IPD-AgD NMA, or whether it reflects these analyses being underpowered relative to the one-stage IPD-AgD NMA. It is also worth noting that the one-stage models had considerably lower deviances, suggesting that these models better fit the data.

Table 3-6 displays effects on rankings. The rankings were the most affected measure across the various models. DTG was ranked first in the base case and in the IPD-AgD NMA, but EFV400 was ranked first when using AgD meta-regression and two-stage IPD-AgD NMA. DTG remained the favoured treatment in the one-stage and two-stage empirical-priors analyses. With respect to the research question at hand, using a two-stage approach would indeed impact how data were interpreted given the change in rankings, particularly with DTG becoming a mid-ranked treatment and EFV400 becoming the top-ranked treatment.

Table 3-7 shows the estimated comparative treatment-effects of interest as well as the mean and maximum changes. In alignment with the ranking results, changes were observed going in opposite directions, with some models leading to much smaller treatment-effects for DTG and others leading to much larger ones. What is worth noting is that across all modeling approaches, the average change in estimated effect was between approximately 5 and 10 cells/ml (absolute value). The maximum changes were as large as 25 cells/ml. These are large changes that would lead to a very different interpretation of the data. In reviewing these results, one can see that the effect of EFV400 relative to EFV changes considerably with these adjustments despite being estimated by a single trial, ENCORE-1.
While the results were centered around the average for EFV trials, ENCORE-1 was a bit of an outlier in that it had higher baseline HIV-RNA and lower baseline CD4 than the average EFV trial. Thus adjustments on these factors did impact the analytical results   83 Table 3-6: Comparison of surface under the cumulative ranking (SUCRA) for change in CD4 at 48 weeks Analyses Model SUCRA for EFV Rank  SUCRA for DTG Rank  SUCRA for EFV400 Rank  Change in top 3 ranked AgD NMA – Unadjusted Fixed 12 (0.06) 1 (0.81) 10 (0.18) DTG, RAL, LPV/r AgD NMA meta-regression – CD4  Fixed 12 (0.07) 3 (0.73) 2 (0.79) LPV/r, EFV400, DTG AgD NMA meta-regression – HIV RNA  Fixed 12 (0.06) 2 (0.77) 1 (0.82) EFV400, DTG, EVG/c AgD NMA meta-regression – Male  Fixed 12 (0.06) 3 (0.73) 1 (0.90) EFV400, LPV/r, DTG Two-stage AgD NMA – CD4 + HIV RNA + Male Fixed 11 (0.19) 9 (0.25) 1 (0.90) EFV400, LPV/r, EVG/c Two-stage AgD NMA – CD4 + Male Fixed 11 (0.19) 9 (0.25) 1 (0.90) EFV400, LPV/r, EVG/c Two-stage AgD NMA – CD4 + HIV RNA Fixed 11 (0.19) 9 (0.26) 1 (0.90) EFV400, LPV/r, EVG/c Two-stage AgD NMA – HIV RNA + Male Fixed 10 (0.19) 9 (0.29) 1 (0.90) EFV400, LPV/r, EVG/c Two-stage AgD NMA – HIV RNA Fixed 11 (0.19) 9 (0.26) 1 (0.90) EFV400, LPV/r, EVG/c Two-stage AgD NMA – CD4 Fixed 11 (0.20) 9 (0.25) 1 (0.91) EFV400, LPV/r, EVG/c Two-stage AgD NMA – Male Fixed 10 (0.19) 9 (0.28) 1 (0.90) EFV400, LPV/r, EVG/c One-stage IPD-AgD NMA – unadjusted Random 12 (0.09) 1 (0.86) 5 (0.65) DTG, RAL, BIC One-stage IPD-AgD NMA – CD4 + HIV RNA + Male Random 12 (0.06) 1 (0.89) 4 (0.68) DTG, RAL, EVG/c One-stage IPD-AgD NMA – CD4 + Male Random 12 (0.06) 1 (0.89) 4 (0.68) DTG, RAL, EVG/c One-stage IPD-AgD NMA – CD4 + HIV RNA Random 12 (0.09) 1 (0.87) 4 (0.65) DTG, RAL, EVG/c One-stage IPD-AgD NMA – HIV RNA + Male Random 12 (0.07) 1 (0.89) 4 (0.67) DTG, RAL, EVG/c One-stage IPD-AgD NMA – CD4 Random 12 (0.09) 1 (0.87) 4 (0.65) DTG, RAL, EVG/c One-stage IPD-AgD NMA – HIV RNA Random 12 (0.07) 1 (0.89) 4 (0.67) DTG, RAL, 
EVG/c One-stage IPD-AgD NMA – Male Random 12 (0.09) 1 (0.87) 4 (0.65) DTG, RAL, EVG/c Two-stage empirical-priors – CD4 + HIV RNA + Male Random 12 (0.11) 1 (0.91) 6 (0.59) DTG, RAL, DRV/r Two-stage empirical-priors – CD4 + Male Random 12 (0.10) 1 (0.90) 5 (0.60) None Two-stage empirical-priors – CD4 + HIV RNA Random 12 (0.12) 1 (0.93) 5 (0.60) None Two-stage empirical-priors – HIV RNA + Male Random 12 (0.09) 1 (0.87) 5 (0.61) DTG, RAL, EVG/c Two-stage empirical-priors – CD4 Random 12 (0.11) 1 (0.92) 5 (0.62) None Two-stage empirical-priors – HIV RNA Random 12 (0.08) 1 (0.87) 4 (0.63) DTG, RAL, EVG/c Two-stage empirical-priors – Male Random 12 (0.10) 1 (0.88) 5 (0.63) DTG, RAL, EVG/c AgD: Aggregate data; IPD: Individual patient data; NMA: Network meta-analysis; EFV: Efavirenz; DTG: Dolutegravir; EFV400: Low-dose efavirenz; RAL: Raltegravir; EVG/c: Cobicistat-boosted elvitegravir; RPV: Rilpivirine; DOR: Doravirine.   84 Table 3-7: Comparison of comparative treatment estimates for change in CD4 at 48 weeks Analyses Model DTG vs. EFV MD (95% CrI) EFV400 vs. EFV MD (95% CrI) DTG vs. 
EFV400 MD (95% CrI) Mean change in mean change Maximum change in mean change AgD NMA – Unadjusted Fixed 25.40 (7.11, 43.74) 6.41 (-7.93, 20.67) 19.03 (-4.38, 42.38) -- -- AgD NMA meta-regression – CD4  Fixed 22.13 (7.24, 36.76) 25.83 (7.52, 44.13) -3.66 (-27.40, 20.03) 5.621 19.424 AgD NMA meta-regression – HIV RNA  Fixed 22.32 (7.70, 36.76) 25.59 (7.31, 43.87) -3.31 (-26.62, 20.13) 4.235 19.182 AgD NMA meta-regression – Male  Fixed 20.72 (6.25, 35.51) 31.03 (10.94, 50.92) -10.28 (-35.77, 15.29) 5.453 24.623 Two-stage AgD NMA – CD4 + HIV RNA + Male Fixed 0.48 (-14.18, 15.19) 25.50 (6.94, 43.61) -24.91 (-48.20, -1.13) 9.392 24.915 Two-stage AgD NMA – CD4 + Male Fixed 0.77 (-13.94, 15.63) 25.47 (7.35, 44.01) -24.85 (-48.38, -1.38) 9.318 24.627 Two-stage AgD NMA – CD4 + HIV RNA Fixed 0.92 (-13.85, 15.51) 25.44 (7.32, 43.76) -24.57 (-48.09, -1.49) 9.315 24.478 Two-stage AgD NMA – HIV RNA + Male Fixed 1.74 (-10.75, 13.75) 25.58 (6.98, 43.93) -23.92 (-45.83, -1.84) 9.269 23.652 Two-stage AgD NMA – HIV RNA Fixed 0.80 (-14.01, 15.32) 25.35 (6.99, 43.48) -24.64 (-48.26, -1.19) 9.332 24.601 Two-stage AgD NMA – CD4 Fixed 0.29 (-11.96, 12.08) 25.46 (7.41, 43.74) -25.22 (-46.91, -3.49) 9.556 25.107 Two-stage AgD NMA – Male Fixed 1.10 (-11.20, 13.07) 25.57 (7.37, 43.69) -24.42 (-46.33, -2.70) 9.386 24.292 One-stage IPD-AgD NMA – unadjusted Random 34.20 (15.99, 51.87) 25.13 (-6.26, 56.17) 9.15 (-26.81, 44.90) 6.985 18.717 One-stage IPD-AgD NMA – CD4 + HIV RNA + Male Random 41.03 (21.92, 60.52) 31.25 (-1.89, 63.95) 9.78 (-27.94, 48.39) 10.423 24.841 One-stage IPD-AgD NMA – CD4 + Male Random 40.90 (21.81, 60.06) 31.50 (-1.53, 64.27) 9.36 (-28.18, 47.70) 10.325 25.094 One-stage IPD-AgD NMA – CD4 + HIV RNA Random 34.93 (16.41, 53.59) 24.99 (-6.51, 56.45) 9.97 (-26.64, 46.52) 6.91 18.582 One-stage IPD-AgD NMA – HIV RNA + Male Random 42.15 (22.25, 62.44) 32.06 (-2.75, 66.57) 10.11 (-29.44, 50.44) 10.983 25.647 One-stage IPD-AgD NMA – CD4 Random 34.83 (16.36, 53.34) 25.16 (-6.63, 56.73) 
9.71 (-26.75, 46.13) 6.837 18.753 One-stage IPD-AgD NMA – HIV RNA Random 42.01 (22.23, 61.92) 31.53 (-2.72, 65.44) 10.45 (-28.87, 50.29) 10.847 25.125 One-stage IPD-AgD NMA – Male Random 34.56 (16.45, 52.64) 24.95 (-6.54, 56.07) 9.61 (-26.27, 46.10) 6.772 18.537 Two-stage empirical-priors – CD4 + HIV RNA + Male Random 44.73 (24.74, 64.68) 26.29 (-5.56, 58.20) 18.40 (-19.22, 55.58) 9.283 19.881 Two-stage empirical-priors – CD4 + Male Random 43.76 (25.08, 62.60) 26.96 (-3.88, 57.76) 16.83 (-19.18, 52.98) 9.175 20.547 Two-stage empirical-priors – CD4 + HIV RNA Random 41.04 (22.05, 59.22) 22.57 (-7.51, 52.24) 18.44 (-16.67, 53.23) 7.353 16.163 Two-stage empirical-priors – HIV RNA + Male Random 40.48 (19.85, 61.29) 27.36 (-7.52, 61.79) 13.17 (-26.77, 53.69) 8.332 20.949 Two-stage empirical-priors – CD4 Random 39.63 (21.49, 57.21) 23.49 (-5.67, 52.61) 16.13 (-18.02, 49.88) 7.074 17.081 Two-stage empirical-priors – HIV RNA Random 39.35 (19.62, 59.28) 27.80 (-6.06, 61.44) 11.57 (-27.48, 51.09) 7.71 21.393 Two-stage empirical-priors – Male Random 36.48 (17.21, 55.01) 24.23 (-7.24, 56.09) 12.11 (-24.56, 48.57) 7.471 17.82 AgD: Aggregate data; IPD: Individual patient data; NMA: Network meta-analysis; EFV: Efavirenz; DTG: Dolutegravir; EFV400: Low-dose efavirenz; OR: Odds ratio; CrI: Credible interval.

Finally, with respect to CD4, most regression coefficients were not statistically significant (Appendix B.5), but as with the viral suppression at 48 weeks analysis, the estimated coefficients using IPD were substantially different from those obtained through AgD meta-regression. For example, the effect of baseline HIV RNA went from 2.507 (95% CrI: -21.233, 26.667) to 45.509 (95% CrI: 31.324, 59.931).
That is to say, the AgD meta-regression estimated that, on average, a trial with a baseline HIV RNA that was one log unit higher had a relative change in CD4 that was 2.507 cells/ml higher, whereas the one-stage IPD-AgD NMA estimated an average increase that was 45.509 cells/ml higher. Moreover, the latter was significant while the former was not. In this example, it is important to keep in mind that trials did not differ by a full log unit of HIV RNA.

Overall, for change in CD4 at 48 weeks, how model selection is conducted can be very impactful. On the basis of the DIC, one could select the unadjusted AgD NMA, as no models had a meaningfully lower DIC. Choosing the lowest DIC would imply selecting the two-stage model adjusting for baseline HIV RNA and the proportion of males. This model would completely change the interpretation of the therapeutic landscape. Choosing on the basis of lowest deviance would lead to selecting the one-stage model adjusting for the proportion of males, and choosing on the basis of significant covariates would lead to a one-stage model with adjustments for baseline HIV RNA. Both of these models would strengthen the position of DTG as the most effective ARV for this outcome, but would not change the decision-making with respect to clinical guideline development.

3.5.2.3 Discontinuations

None of the models were meaningfully different from the base AgD NMA. Although some of the adjusted AgD NMA models had lower DICs, none met the threshold difference of three required to be meaningful (Appendix B.5). Of note, the lowest DIC was that of the HMR IPD-AgD NMA with adjustments for the proportion of males. As with CD4, the choice of random and fixed effects varied from one model to the next, making comparison with respect to the leverage plots a little more complicated. Rankings varied a little, but did not have any impact on decision making; rankings of the key treatments remained unchanged.
Table 3-8 presents the estimated effects for the comparisons of primary interest. For these comparisons, the change in estimates tended to be minimal across models. Certainly, the changes would not impact the interpretation of the therapeutic landscape. Interestingly, the exception to this was the HMR IPD-AgD NMA with adjustments for the proportion of males, the very model with the lowest DIC. In this model, both DTG and EFV400 were considerably more tolerable relative to EFV than in the unadjusted model. When looking at the average and maximum changes, meta-regression adjustments tended to have a small impact. Finally, with respect to regression coefficients, none were statistically significant.

Table 3-8: Comparison of comparative treatment estimates for discontinuations
Analyses Model DTG vs. EFV OR (95% CrI) EFV400 vs. EFV OR (95% CrI) DTG vs. EFV400 OR (95% CrI) Mean change in log-odds Maximum change in log-odds Mean change in proportion Maximum change in proportion AgD NMA – Unadjusted Random 0.52 (0.39, 0.70) 0.91 (0.46, 1.84) 0.57 (0.27, 1.23) -- -- -- -- AgD NMA meta-regression – CD4  Random 0.55 (0.40, 0.74) 0.91 (0.46, 1.83) 0.60 (0.28, 1.29) 0.029 0.055 0.004 0.007 AgD NMA meta-regression – HIV RNA  Fixed 0.51 (0.40, 0.64) 0.86 (0.47, 1.60) 0.59 (0.31, 1.14) 0.047 0.101 0.005 0.016 AgD NMA meta-regression – Male  Random 0.54 (0.40, 0.71) 1.06 (0.53, 2.13) 0.50 (0.24, 1.04) 0.061 0.16 0.009 0.033 Two-stage AgD NMA – CD4 + HIV RNA + Male Random 0.59 (0.40, 0.85) 0.92 (0.42, 1.99) 0.64 (0.27, 1.52) 0.05 0.189 0.011 0.028 Two-stage AgD NMA – CD4 + Male Random 0.57 (0.39, 0.83) 0.92 (0.42, 2.00) 0.62 (0.26, 1.49) 0.059 0.182 0.012 0.027 Two-stage AgD NMA – CD4 + HIV RNA Random 0.59 (0.40, 0.88) 0.92 (0.41, 2.04) 0.65 (0.27, 1.61) 0.059 0.215 0.011 0.032 Two-stage AgD NMA – HIV RNA + Male Random 0.56 (0.39, 0.81) 0.92 (0.43, 1.96) 0.61 (0.26, 1.43) 0.046 0.186 0.012 0.028 Two-stage AgD NMA – HIV RNA Random 0.54 (0.37, 0.78) 0.91 (0.43, 
1.96) 0.59 (0.25, 1.38) 0.048 0.168 0.011 0.025 Two-stage AgD NMA – CD4 Random 0.53 (0.36, 0.77) 0.92 (0.42, 1.97) 0.57 (0.24, 1.37) 0.045 0.176 0.011 0.026 Two-stage AgD NMA – Male Random 0.57 (0.38, 0.85) 0.91 (0.40, 2.10) 0.63 (0.24, 1.59) 0.064 0.244 0.014 0.039 One-stage IPD-AgD NMA – unadjusted Random 0.62 (0.46, 0.83) 0.91 (0.45, 1.81) 0.69 (0.32, 1.44) 0.048 0.175 0.026 0.044 One-stage IPD-AgD NMA – CD4+HIV RNA + Male Fixed 0.63 (0.50, 0.80) 0.92 (0.50, 1.70) 0.69 (0.36, 1.32) 0.052 0.191 0.011 0.028 One-stage IPD-AgD NMA – CD4 + Male Random 0.63 (0.46, 0.87) 0.93 (0.45, 1.93) 0.68 (0.31, 1.52) 0.059 0.194 0.009 0.015 One-stage IPD-AgD NMA – CD4 + HIV RNA Random 0.64 (0.46, 0.87) 0.92 (0.45, 1.93) 0.69 (0.31, 1.52) 0.061 0.198 0.006 0.02 One-stage IPD-AgD NMA – HIV RNA + Male Fixed 0.63 (0.50, 0.80) 0.91 (0.49, 1.70) 0.69 (0.36, 1.34) 0.05 0.189 0.014 0.03 One-stage IPD-AgD NMA – CD4 Random 0.63 (0.46, 0.86) 0.92 (0.45, 1.90) 0.69 (0.31, 1.51) 0.057 0.193 0.012 0.023 One-stage IPD-AgD NMA – HIV RNA Fixed 0.63 (0.50, 0.79) 0.91 (0.49, 1.69) 0.69 (0.36, 1.33) 0.049 0.186 0.007 0.016 One-stage IPD-AgD NMA – Male Random 0.62 (0.46, 0.83) 0.90 (0.44, 1.82) 0.68 (0.32, 1.47) 0.048 0.169 0.009 0.014 Two-stage empirical-priors – CD4+HIV RNA+Male Fixed 0.67 (0.51, 0.88) 0.92 (0.49, 1.69) 0.73 (0.37, 1.45) 0.123 0.252 0.017 0.052 Two-stage empirical-priors – CD4 + Male Random 0.66 (0.48, 0.92) 0.90 (0.44, 1.85) 0.74 (0.33, 1.62) 0.117 0.242 0.016 0.038 Two-stage empirical-priors – CD4 + HIV RNA Random 0.66 (0.47, 0.92) 0.93 (0.46, 1.93) 0.71 (0.31, 1.54) 0.105 0.252 0.016 0.033 Two-stage empirical-priors – HIV RNA + Male Fixed 0.63 (0.49, 0.81) 0.90 (0.48, 1.66) 0.70 (0.36, 1.38) 0.057 0.188 0.017 0.041 Two-stage empirical-priors – CD4 Random 0.65 (0.46, 0.91) 0.93 (0.44, 1.96) 0.70 (0.30, 1.56) 0.103 0.265 0.014 0.029 Two-stage empirical-priors – HIV RNA Fixed 0.64 (0.50, 0.80) 0.89 (0.48, 1.64) 0.71 (0.37, 1.38) 0.054 0.198 0.013 0.036 Two-stage empirical-priors – 
Male Fixed 0.64 (0.50, 0.81) 0.92 (0.50, 1.71) 0.69 (0.36, 1.33) 0.057 0.199 0.015 0.032 HMR IPD-AgD NMA – CD4 Random 0.62 (0.40, 0.94) 0.92 (0.43, 1.97) 0.67 (0.32, 1.44) 0.043 0.171 0.026 0.039 HMR IPD-AgD NMA – HIV RNA Random 0.45 (0.17, 3.80) 0.73 (0.25, 6.32) 0.64 (0.29, 1.37) 0.239 0.294 0.088 0.147 HMR IPD-AgD NMA – Male Fixed 0.36 (0.22, 0.57) 0.61 (0.30, 1.23) 0.59 (0.30, 1.15) 0.491 0.58 0.028 0.046 AgD: Aggregate data; IPD: Individual patient data; NMA: Network meta-analysis; EFV: Efavirenz; DTG: Dolutegravir; EFV400: Low-dose efavirenz; OR: Odds ratio; CrI: Credible interval.  87 3.5.2.4 Discontinuations due to adverse events Out of all the primary outcomes, only discontinuations due to adverse events had a model other than the unadjusted AgD NMA selected through a meaningfully lower DIC. In this case, it was the two-stage empirical priors approach with adjustments for the proportion of males that was selected (Table 3-9).  Table 3-9: Comparison of model selection and fit for discontinuations due to adverse events Analyses Model DIC pD Deviance Between study  prop3 prop4 AgD NMA – Unadjusted Fixed 205.79 66.95 138.84 0.213 (0.019, 0.503) 9/116 3/116 AgD NMA meta-regression – CD4  Fixed 205.27 67.89 137.38 0.21 (0.007, 0.498) 8/116 4/116 AgD NMA meta-regression – HIV RNA  Fixed 207.20 67.89 139.31 0.214 (0.013, 0.498) 7/116 3/116 AgD NMA meta-regression – Male  Fixed 205.34 67.98 137.36 0.209 (0.016, 0.497) 8/116 3/116 Two-stage AgD NMA – CD4 + HIV RNA + Male Fixed 205.85 66.95 138.90 0.217 (0.022, 0.502) 9/116 3/116 Two-stage AgD NMA – CD4 + Male Fixed 205.89 67.00 138.89 0.223 (0.021, 0.515) 9/116 3/116 Two-stage AgD NMA – CD4 + HIV RNA Fixed 205.98 66.89 139.09 0.228 (0.017, 0.506) 10/116 3/116 Two-stage AgD NMA – HIV RNA + Male Fixed 205.65 66.82 138.83 0.22 (0.017, 0.502) 9/116 3/116 Two-stage AgD NMA – HIV RNA Fixed 205.79 66.85 138.94 0.219 (0.013, 0.501) 10/116 3/116 Two-stage AgD NMA – CD4 Fixed 206.12 66.98 139.14 0.227 (0.015, 0.509) 9/116 
3/116 Two-stage AgD NMA – Male Fixed 205.88 66.87 139.01 0.225 (0.013, 0.505) 9/116 3/116 One-stage IPD-AgD NMA – unadjusted Fixed 204.22 66.95 137.27 0.171 (0.004, 0.474) 9/116 3/116 One-stage IPD-AgD NMA – CD4 + HIV RNA + Male Fixed 204.52 69.98 134.54 0.156 (0.006, 0.445) 7/116 4/116 One-stage IPD-AgD NMA – CD4 + Male Fixed 204.33 68.95 135.38 0.177 (0.007, 0.444) 6/116 4/116 One-stage IPD-AgD NMA – CD4 + HIV RNA Fixed 204.25 68.86 135.39 0.194 (0.017, 0.47) 9/116 3/116 One-stage IPD-AgD NMA – HIV RNA + Male Fixed 204.30 68.74 135.56 0.178 (0.01, 0.452) 6/116 3/116 One-stage IPD-AgD NMA – CD4 Fixed 204.35 67.93 136.42 0.199 (0.01, 0.476) 10/116 3/116 One-stage IPD-AgD NMA – HIV RNA Fixed 205.02 67.87 137.15 0.189 (0.014, 0.477) 8/116 3/116 One-stage IPD-AgD NMA – Male Fixed 203.41 67.88 135.53 0.177 (0.007, 0.469) 9/116 3/116 Two-stage empirical-priors – CD4 + HIV RNA + Male Fixed 203.76 75.28 128.48 0.130 (0.018, 0.413) 6/116 3/116 Two-stage empirical-priors – CD4 + Male Fixed 203.37 71.32 132.05 0.173 (0.008, 0.463) 6/116 3/116 Two-stage empirical-priors – CD4 + HIV RNA Fixed 202.68 72.54 130.14 0.142 (0.006, 0.43) 7/116 3/116 Two-stage empirical-priors –HIV RNA+ Male Fixed 205.83 71.82 134.01 0.147 (0.01, 0.446) 6/116 3/116 Two-stage empirical-priors – CD4 Fixed 202.79 69.53 133.26 0.183 (0.015, 0.459) 9/116 4/116 Two-stage empirical-priors – HIV RNA Fixed 204.90 68.79 136.11 0.183 (0.005, 0.471) 8/116 3/116 Two-stage empirical-priors – Male Fixed 204.43 70.16 134.27 0.172 (0.007, 0.455) 9/116 3/116 HMR IPD-AgD NMA – CD4 Fixed 205.23 68.36 136.87 0.197 (0.009, 0.477) 9/116 3/116 HMR IPD-AgD NMA – HIV RNA Fixed 204.72 67.52 137.20 0.214 (0.014, 0.516) 8/116 3/116 HMR IPD-AgD NMA – Male Fixed 203.34 69.42 133.92 0.172 (0.014, 0.45) 9/116 3/116 AgD: Aggregate data; IPD: Individual patient data; NMA: Network meta-analysis; DIC: Deviance information criterion; pD: Effective number of parameters; prop3: Proportion of observations above deviance2 + leverage = 3; 
prop4: Proportion of observations above deviance2 + leverage = 4

The one-stage analyses and two-stage empirical-priors analyses also led to a lower estimate of the between-study heterogeneity, suggesting that the adjustments helped account for between-study differences as well. The rankings stayed generally stable across models (Appendix B.5); however, with the selected model, DTG changed from being ranked 1st to being ranked 2nd.

Table 3-10 presents the estimated effects for the comparisons of primary interest. The selected model shifted the principal comparison of interest from an OR of 0.28 (95% CrI: 0.17, 0.44) to 0.37 (95% CrI: 0.23, 0.58), but this would have little impact on decision making. Two of the three HMR models led to non-significant relative effects, which would be quite impactful. Despite all of this, it is important to note that with respect to absolute effects, most model adjustments led to very small differences. This aligns well with the fact that none of the covariates were found to be statistically significant.

Table 3-10: Comparison of comparative treatment estimates for discontinuations due to adverse events
Analyses Model DTG vs. EFV OR (95% CrI) EFV400 vs. EFV OR (95% CrI) DTG vs. 
EFV400 OR (95% CrI) Mean change in log-odds Maximum change in log-odds Mean change in proportion Maximum change in proportion AgD NMA – Unadjusted Fixed 0.28 (0.17, 0.44) 0.42 (0.22, 0.77) 0.67 (0.30, 1.45) -- -- -- -- AgD NMA meta-regression – CD4  Fixed 0.30 (0.19, 0.48) 0.40 (0.21, 0.74) 0.76 (0.35, 1.65) 0.084 0.215 0.002 0.007 AgD NMA meta-regression – HIV RNA  Fixed 0.30 (0.19, 0.47) 0.42 (0.22, 0.77) 0.72 (0.33, 1.55) 0.083 0.308 0.003 0.007 AgD NMA meta-regression – Male  Fixed 0.27 (0.17, 0.44) 0.58 (0.27, 1.20) 0.47 (0.19, 1.24) 0.107 0.321 0.004 0.011 Two-stage AgD NMA – CD4 + HIV RNA + Male Fixed 0.28 (0.17, 0.45) 0.42 (0.22, 0.76) 0.67 (0.31, 1.47) 0.004 0.008 0 0 Two-stage AgD NMA – CD4 + Male Fixed 0.28 (0.17, 0.44) 0.42 (0.22, 0.77) 0.66 (0.30, 1.47) 0.003 0.01 0 0 Two-stage AgD NMA – CD4 + HIV RNA Fixed 0.28 (0.17, 0.44) 0.42 (0.22, 0.77) 0.66 (0.30, 1.47) 0.005 0.012 0 0.001 Two-stage AgD NMA – HIV RNA + Male Fixed 0.28 (0.17, 0.45) 0.42 (0.22, 0.76) 0.66 (0.31, 1.50) 0.004 0.009 0 0.001 Two-stage AgD NMA – HIV RNA Fixed 0.28 (0.17, 0.44) 0.42 (0.22, 0.77) 0.67 (0.31, 1.47) 0.004 0.008 0 0 Two-stage AgD NMA – CD4 Fixed 0.28 (0.17, 0.44) 0.42 (0.22, 0.77) 0.66 (0.31, 1.47) 0.003 0.009 0 0 Two-stage AgD NMA – Male Fixed 0.28 (0.17, 0.44) 0.42 (0.22, 0.77) 0.66 (0.30, 1.47) 0.004 0.008 0 0.001 One-stage IPD-AgD NMA – unadjusted Fixed 0.35 (0.22, 0.52) 0.42 (0.22, 0.77) 0.82 (0.39, 1.79) 0.094 0.24 0.011 0.027 One-stage IPD-AgD NMA – CD4+ HIV RNA+ Male Fixed 0.34 (0.21, 0.54) 0.38 (0.20, 0.70) 0.91 (0.42, 1.98) 0.141 0.238 0.003 0.007 One-stage IPD-AgD NMA – CD4 + Male Fixed 0.33 (0.21, 0.53) 0.40 (0.21, 0.74) 0.83 (0.39, 1.79) 0.086 0.206 0.005 0.013 One-stage IPD-AgD NMA – CD4 + HIV RNA Fixed 0.38 (0.24, 0.58) 0.40 (0.21, 0.74) 0.94 (0.44, 2.04) 0.193 0.338 0.002 0.005 One-stage IPD-AgD NMA – HIV RNA + Male Fixed 0.33 (0.21, 0.52) 0.37 (0.19, 0.70) 0.89 (0.41, 1.93) 0.109 0.213 0.003 0.008 One-stage IPD-AgD NMA – CD4 Fixed 0.37 (0.23, 0.57) 0.43 
(0.22, 0.79) 0.85 (0.41, 1.85) 0.144 0.305 0.006 0.017 One-stage IPD-AgD NMA – HIV RNA Fixed 0.32 (0.20, 0.49) 0.40 (0.21, 0.74) 0.80 (0.37, 1.72) 0.056 0.165 0.007 0.019 One-stage IPD-AgD NMA – Male Fixed 0.36 (0.23, 0.55) 0.39 (0.20, 0.72) 0.92 (0.43, 2.00) 0.159 0.298 0.002 0.005 Two-stage empirical-priors – CD4+HIV RNA+Male Fixed 0.38 (0.24, 0.66) 0.43 (0.22, 0.77) 0.90 (0.42, 2.15) 0.167 0.544 0.01 0.03 Two-stage empirical-priors – CD4 + Male Fixed 0.35 (0.22, 0.57) 0.43 (0.22, 0.79) 0.83 (0.38, 1.87) 0.167 0.582 0.01 0.028 Two-stage empirical-priors – CD4 + HIV RNA Fixed 0.40 (0.25, 0.66) 0.43 (0.23, 0.80) 0.93 (0.42, 2.11) 0.199 0.648 0.011 0.029 Two-stage empirical-priors – HIV RNA + Male Fixed 0.34 (0.21, 0.56) 0.41 (0.21, 0.75) 0.85 (0.38, 1.90) 0.105 0.264 0.01 0.029 Two-stage empirical-priors – CD4 Fixed 0.37 (0.23, 0.58) 0.44 (0.23, 0.81) 0.84 (0.39, 1.85) 0.205 0.588 0.01 0.028 Two-stage empirical-priors – HIV RNA Fixed 0.32 (0.21, 0.50) 0.41 (0.22, 0.76) 0.79 (0.37, 1.74) 0.076 0.232 0.009 0.025 Two-stage empirical-priors – Male Fixed 0.37 (0.23, 0.58) 0.41 (0.22, 0.76) 0.90 (0.42, 1.97) 0.097 0.288 0.01 0.028 HMR IPD-AgD NMA – CD4 Fixed 0.42 (0.20, 0.79) 0.50 (0.22, 1.05) 0.84 (0.40, 1.80) 0.288 0.423 0.004 0.01 HMR IPD-AgD NMA – HIV RNA Fixed 0.48 (0.11, 1.51) 0.57 (0.15, 1.94) 0.82 (0.39, 1.78) 0.395 0.542 0.011 0.029 HMR IPD-AgD NMA – Male Fixed 0.76 (0.28, 1.75) 0.71 (0.30, 1.59) 1.05 (0.47, 2.42) 0.866 1.035 0.04 0.092 AgD: Aggregate data; IPD: Individual patient data; NMA: Network meta-analysis; EFV: Efavirenz; DTG: Dolutegravir; EFV400: Low-dose efavirenz; OR: Odds ratio; CrI: Credible interval.  90 3.5.3 Stage 2 – Comparative efficacy and safety Upon further investigation of the remaining outcomes, analyses were not required for all of the planned outcomes. Specifically, with only 4 deaths among the 2,160 patients for which IPD were available, it was deemed that despite having IPD, using meta-regression would not change the results. 
Under-reporting and low event counts were also issues with ADI and treatment-related SAEs and AEs, but these variables were not included in the planned analyses for Stage 2 (see Methods and the Statistical Analysis Plan). Four tables have been prepared for each outcome: a model selection table, a ranking table, a comparison of effect sizes table and a regression covariates table. Only select tables are presented in this section. All others are provided in Appendix B.6.

By and large, the Stage 2 analyses led to impacts similar to those observed in Stage 1. With respect to model selection, the unadjusted AgD NMA was favoured for CD4 at 96 weeks, SAEs and AEs; however, in the case of viral suppression at 96 weeks, the model adjusted for baseline HIV RNA was selected. As shown in Table 3-11, the DIC for the selected model was more than 12 units smaller than that of the AgD NMA. The table also shows that other adjustments led to similar DICs, but in this case the model with the smallest DIC was selected. Although in the case of CD4 one of the adjustments led to a lower DIC, the difference was less than 3, so the AgD NMA model was favoured for its simplicity.
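The selection rule described above can be sketched in a few lines of Python. The sketch assumes the rule is "take the lowest-DIC model only if it improves on the unadjusted model by at least three DIC units, otherwise keep the unadjusted model"; the function name and the second (toy) example are hypothetical. The DIC values in the first example are those reported in Table 3-11.

```python
from typing import Dict


def select_by_dic(dic: Dict[str, float], base: str, threshold: float = 3.0) -> str:
    """Return the lowest-DIC model if it beats `base` by at least `threshold`
    DIC units; otherwise keep the simpler base model."""
    best = min(dic, key=dic.get)
    return best if dic[base] - dic[best] >= threshold else base


# DIC values as reported in Table 3-11 (viral suppression at 96 weeks).
dic_vs96 = {
    "AgD NMA - Unadjusted": 111.78,
    "One-stage IPD-AgD NMA - CD4 + HIV RNA + Male": 102.45,
    "One-stage IPD-AgD NMA - CD4 + Male": 101.17,
    "One-stage IPD-AgD NMA - CD4 + HIV RNA": 103.91,
    "One-stage IPD-AgD NMA - HIV RNA + Male": 100.66,
    "One-stage IPD-AgD NMA - CD4": 103.22,
    "One-stage IPD-AgD NMA - HIV RNA": 99.43,
    "One-stage IPD-AgD NMA - Male": 101.22,
}
selected = select_by_dic(dic_vs96, base="AgD NMA - Unadjusted")
improvement = dic_vs96["AgD NMA - Unadjusted"] - dic_vs96[selected]
print(selected, round(improvement, 2))

# Hypothetical toy case: an improvement below 3 units keeps the base model.
toy = {"base": 100.0, "adjusted": 98.5}
kept = select_by_dic(toy, base="base")
print(kept)
```

Under these assumptions the first call picks the model adjusted for baseline HIV RNA, and the toy case retains the base model because the improvement is smaller than the threshold.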
Table 3-11: Comparison of model selection and fit for viral suppression at 96 weeks

Analyses | Model | DIC | pD | Deviance | Between study | prop3 | prop4
AgD NMA – Unadjusted | Fixed | 111.78 | 40.95 | 70.83 | 0.114 (0.005, 0.325) | 6/96 | 6/96
One-stage IPD-AgD NMA – CD4 + HIV RNA + Male | Fixed | 102.45 | 43.99 | 58.46 | 0.077 (0.003, 0.254) | 4/96 | 3/96
One-stage IPD-AgD NMA – CD4 + Male | Fixed | 101.17 | 42.98 | 58.19 | 0.085 (0.006, 0.263) | 4/96 | 3/96
One-stage IPD-AgD NMA – CD4 + HIV RNA | Fixed | 103.91 | 43.03 | 60.88 | 0.094 (0.005, 0.297) | 5/96 | 3/96
One-stage IPD-AgD NMA – HIV RNA + Male | Fixed | 100.66 | 42.97 | 57.69 | 0.072 (0.003, 0.238) | 4/96 | 3/96
One-stage IPD-AgD NMA – CD4 | Fixed | 103.22 | 41.91 | 61.31 | 0.101 (0.007, 0.296) | 4/96 | 3/96
One-stage IPD-AgD NMA – HIV RNA | Fixed | 99.43 | 42.04 | 57.39 | 0.074 (0.004, 0.241) | 4/96 | 3/96
One-stage IPD-AgD NMA – Male | Fixed | 101.22 | 42.02 | 59.20 | 0.088 (0.006, 0.275) | 4/96 | 3/96

AgD: Aggregate data; IPD: Individual patient data; NMA: Network meta-analysis; DIC: Deviance information criterion; pD: Effective number of parameters; prop3: Proportion of observations above deviance² + leverage = 3; prop4: Proportion of observations above deviance² + leverage = 4

There was no meaningful impact on rankings across outcomes. Change in CD4 had the most changes in the ordering of the top-ranked treatments, but this is likely because the top few treatments were very similar to one another. The impact of adjustments with IPD on the actual estimates was noticeable, particularly for viral suppression and change in CD4 cell counts at 96 weeks, as shown in Table 3-12 and Table 3-13, respectively.

Table 3-12: Comparison of comparative treatment estimates for viral suppression at 96 weeks

Analyses | Model | DTG vs. EFV, OR (95% CrI) | EFV400 vs. EFV, OR (95% CrI) | DTG vs. EFV400, OR (95% CrI) | Mean change in log-odds | Maximum change in log-odds | Mean change in proportion | Maximum change in proportion
AgD NMA – Unadjusted | Fixed | 1.94 (1.52, 2.48) | 0.96 (0.61, 1.52) | 2.02 (1.49, 2.76) | -- | -- | -- | --
One-stage IPD-AgD NMA – CD4 + HIV RNA + Male | Fixed | 1.53 (1.18, 1.98) | 0.90 (0.57, 1.44) | 1.71 (1.32, 2.64) | 0.147 | 0.24 | 0.081 | 0.108
One-stage IPD-AgD NMA – CD4 + Male | Fixed | 1.56 (1.21, 2.01) | 0.88 (0.55, 1.41) | 1.76 (1.40, 2.60) | 0.119 | 0.217 | 0.046 | 0.063
One-stage IPD-AgD NMA – CD4 + HIV RNA | Fixed | 1.60 (1.24, 2.07) | 0.97 (0.61, 1.55) | 1.83 (1.43, 2.80) | 0.095 | 0.191 | 0.086 | 0.104
One-stage IPD-AgD NMA – HIV RNA + Male | Fixed | 1.54 (1.19, 1.99) | 0.91 (0.57, 1.45) | 1.80 (1.41, 2.73) | 0.141 | 0.231 | 0.042 | 0.061
One-stage IPD-AgD NMA – CD4 | Fixed | 1.64 (1.27, 2.11) | 0.94 (0.60, 1.49) | 1.78 (1.41, 2.63) | 0.064 | 0.169 | 0.022 | 0.031
One-stage IPD-AgD NMA – HIV RNA | Fixed | 1.58 (1.23, 2.03) | 0.89 (0.56, 1.41) | 1.75 (1.39, 2.59) | 0.109 | 0.206 | 0.041 | 0.057
One-stage IPD-AgD NMA – Male | Fixed | 1.65 (1.29, 2.12) | 0.99 (0.62, 1.57) | 1.82 (1.42, 2.76) | 0.076 | 0.16 | 0.012 | 0.021

AgD: Aggregate data; IPD: Individual patient data; NMA: Network meta-analysis; EFV: Efavirenz; DTG: Dolutegravir; EFV400: Low-dose efavirenz; OR: Odds ratio; CrI: Credible interval.

Table 3-13: Comparison of comparative treatment estimates for change in CD4 at 96 weeks

Analyses | Model | DTG vs. EFV, MD (95% CrI) | EFV400 vs. EFV, MD (95% CrI) | DTG vs. EFV400, MD (95% CrI) | Mean change | Maximum change
AgD NMA – Unadjusted | Fixed | 24.22 (5.39, 43.45) | 26.98 (3.96, 49.74) | -2.66 (-32.37, 27.00) | -- | --
One-stage IPD-AgD NMA – CD4 + HIV RNA + Male | Fixed | 47.23 (33.54, 61.10) | 34.73 (11.75, 57.81) | 12.54 (-14.40, 39.32) | 13.58 | 32.183
One-stage IPD-AgD NMA – CD4 + Male | Fixed | 46.43 (32.79, 60.11) | 35.65 (12.56, 58.72) | 10.85 (-15.74, 37.44) | 12.709 | 27.908
One-stage IPD-AgD NMA – CD4 + HIV RNA | Fixed | 39.06 (25.46, 52.76) | 26.33 (3.50, 49.21) | 12.73 (-13.83, 39.43) | 5.229 | 14.839
One-stage IPD-AgD NMA – HIV RNA + Male | Random | 49.90 (25.65, 76.04) | 35.22 (-3.60, 73.82) | 14.58 (-30.27, 61.87) | 14.541 | 37.494
One-stage IPD-AgD NMA – CD4 | Fixed | 38.33 (24.64, 51.85) | 27.08 (4.07, 49.91) | 11.30 (-15.30, 37.77) | 4.306 | 14.11
One-stage IPD-AgD NMA – HIV RNA | Random | 48.76 (25.12, 74.16) | 36.03 (-1.88, 73.08) | 12.78 (-30.82, 58.20) | 13.386 | 32.242
One-stage IPD-AgD NMA – Male | Fixed | 38.81 (25.13, 52.29) | 26.11 (3.26, 49.04) | 12.72 (-14.28, 39.50) | 5.155 | 14.588

AgD: Aggregate data; IPD: Individual patient data; NMA: Network meta-analysis; EFV: Efavirenz; DTG: Dolutegravir; EFV400: Low-dose efavirenz; MD: Mean difference; CrI: Credible interval.

In the case of viral suppression, the efficacy of DTG was reduced relative to both EFV and EFV400. In the selected model, the odds ratio relative to EFV decreased from 1.94 (95% CrI: 1.52, 2.48) to 1.58 (95% CrI: 1.23, 2.03), with a similar change relative to EFV400. Importantly, however, none of the effects that were significant ceased to be so following the change, and the revised numbers would lead to similar decisions. Nonetheless, the average change in modeled proportions was rather large, with a mean shift of 4.1% in the selected model. With respect to CD4, the change was in the opposite direction: the mean change in CD4 for DTG relative to both EFV and EFV400 increased considerably after IPD adjustment. Keep in mind that none of the adjusted models were selected for this outcome.
Notwithstanding, mean changes in relative treatment effects ranged from 4.3 to 14.5 cells/µL, which can be considered large. Finally, in three of the four secondary outcomes, some of the models had statistically significant coefficients. For CD4, this was the case in every model. Across outcomes, the coefficients adjusting the treatment effect (i.e., those reducing effect modification) were more often significant than the coefficients adjusting the study effect. Particular to this evidence base, HIV RNA was often statistically significant.

3.6 Discussion

This study examined how the outputs of the evidence synthesis of ART among first-line HIV patients change when IPD are included, and compared the extent of this impact across established methods of meta-regression adjustment that use a mixture of IPD and AgD. The four methods of adjusting for covariate imbalances using individual patient data that were compared were a two-stage approach, a two-stage approach with empirical priors, a one-stage approach, and hierarchical meta-regression. None of the four methods stood out as clearly superior solely on the basis of the numerical results. Nonetheless, this study provides important insights into these methods of adjustment. First, while the four strategies were in general agreement in most analyses, there were situations where the results differed markedly between the two-stage approach and the other approaches, and thus the choice of method matters. Second, hierarchical meta-regression tended to lead to the largest changes in effect estimates, but did so at the steep cost of reduced precision. Third, there was a marked difference between the coefficient estimates obtained through individual patient data methods and those obtained through more traditional meta-regression using aggregate data only, suggesting that when adjustments are needed, individual patient data are much more appropriate to use.
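The distinction between the two-stage and one-stage strategies can be sketched on simulated data. The example below is a deliberately simplified frequentist analogue and not the Bayesian IPD-AgD NMA models used in this study: it assumes a continuous outcome, a single covariate, three simulated two-arm trials that all provide IPD, ordinary least squares, and inverse-variance pooling. All function names and parameter values (a true treatment effect of 20 and a covariate slope of 5) are hypothetical.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Return least-squares coefficients and their sampling variances."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - p)
    # The j-th diagonal element of (X'X)^-1 is obtained by solving X'X z = e_j.
    var = [sigma2 * solve(XtX, [float(i == j) for i in range(p)])[j]
           for j in range(p)]
    return beta, var


def simulate_trial(rng, n, intercept, effect=20.0, slope=5.0):
    """Simulate (treatment, covariate, outcome) rows for one two-arm trial."""
    rows = []
    for _ in range(n):
        t, x = rng.randint(0, 1), rng.gauss(0, 1)
        rows.append((t, x, intercept + effect * t + slope * x + rng.gauss(0, 10)))
    return rows


def two_stage(trials):
    """Stage 1: adjust within each trial separately. Stage 2: pool the effects."""
    num = den = 0.0
    for rows in trials:
        beta, var = ols([[1.0, t, x] for t, x, _ in rows], [r[2] for r in rows])
        num += beta[1] / var[1]  # inverse-variance weighted treatment effect
        den += 1.0 / var[1]
    return num / den


def one_stage(trials):
    """Fit one model: trial-specific intercepts, shared effect and slope."""
    k = len(trials)
    X, y = [], []
    for i, rows in enumerate(trials):
        for t, x, outcome in rows:
            X.append([float(i == j) for j in range(k)] + [float(t), x])
            y.append(outcome)
    beta, _ = ols(X, y)
    return beta[k]  # coefficient of the treatment indicator


rng = random.Random(2019)
trials = [simulate_trial(rng, 300, a) for a in (150.0, 180.0, 210.0)]
two_stage_estimate = two_stage(trials)
one_stage_estimate = one_stage(trials)
print(f"two-stage: {two_stage_estimate:.2f}, one-stage: {one_stage_estimate:.2f}")
```

In this sketch the two-stage version estimates the covariate slope separately in every trial, whereas the one-stage version estimates a single slope from all patients at once. With well-behaved simulated data the two estimates are nearly identical; the sketch is not meant to reproduce the divergences observed in this study, only to show where the two strategies differ structurally.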
This study also aimed to understand the potential impact of including individual patient data in the particular application of comparing the therapeutic landscape of anchor treatments in first-line ART for the treatment of HIV. To this end, it was reassuring to find that supplementing the evidence synthesis with the individual patient data from SINGLE, FLAMINGO and SPRING-2 did not lead to changes that would have affected the WHO change in guidelines that took place in December 2018 on the basis of the analyses presented in Chapter 2 [302]. Nonetheless, two analyses did change, namely discontinuations due to adverse events and viral suppression at 96 weeks, and there is merit in having the least biased estimates available to assist in decision-making, including the formulation of clinical guidelines.

Comparisons and implications

Despite the limited impact of IPD on the interpretation of the therapeutic landscape, a number of advantages to the use of IPD were observed, and these have been discussed previously [286]. First, IPD more easily allow for the simultaneous adjustment of multiple covariates because they provide many more degrees of freedom. With AgD alone, only edges informed by multiple trials with differing covariate values allow the effect of the covariate of interest to be estimated. Second, the results of this study suggest that where traditional AgD meta-regression was feasible, it was underpowered, as demonstrated by smaller estimated coefficients that were less often statistically significant, and often inaccurate. Under the assumption that the IPD estimates based on 2,160 data points are more accurate than meta-regression adjustments based on trends among a small number of aggregate data points, the large differences seen in the estimates suggest inaccuracy in the AgD meta-regression.
This is best seen in the CD4 analyses, where the AgD meta-regression suggests that men gain on average 38.5 cells/µL more than women, while the IPD-AgD meta-regression suggests that men gain on average 4.0 cells/µL less than women. Similarly large differences were observed in the other outcomes. On the other hand, the IPD-AgD NMA has two disadvantages relative to traditional methods: traditional methods require far less data management and can be conducted much more quickly. In the grander scheme of things, it is better to do things right than to do them quickly, and thus IPD should be integrated when possible.

There is a clear trend towards improved access to IPD and increased use of IPD [290,303,304]. The most popular methods have the distinct advantage of being able to adjust for unanchored networks, but these require very strong assumptions (no unobserved prognostic factors and effect-modifiers) and are usually limited to indirect comparisons [55,289]. As the use of IPD increases, we can expect increased use of IPD-AgD NMA methods such as those compared in this study. In terms of meta-analyses and network meta-analyses, there has been a shift from the predominant use of a two-stage approach to a one-stage approach [286]. As Simmonds et al. explain in their review, the likely causes of this trend are a growing familiarity with the methods, improvements in computing and the recognition that regression models offer the greatest flexibility for IPD analysis [286]. However, the two-stage analyses used in this study included regression in the creation of the AgD from the IPD, unlike many of the two-stage analyses published in the past, in which regression adjustments have tended not to be used [286]. To the best of our knowledge, no study has compared the results of one-stage and two-stage IPD-AgD NMA directly.
In most analyses, there were no meaningful differences between the results of the two approaches; however, this was not the case in the 48-week CD4 analyses. Adjustments went in opposite directions for this outcome, which is likely a result of the regression adjustments for the IPD being done independently for each trial in the two-stage approach, rather than collectively. In the absence of differences, the two-stage approach had the advantage of being computationally less intensive and easier to code. Conversely, the other approaches had the advantage of being analyzed in a single step and of having more easily interpretable regression coefficients. Given these advantages and the fact that the choice appeared to matter for some analyses, the recommendation would be not to use the two-stage approach. The choice between one-stage IPD-AgD NMA and two-stage IPD-AgD NMA with empirical priors is less straightforward, and is ultimately dependent on the evidence base at hand. The empirical-priors method does not appear to have been used previously. As described in the methods, the motivation for its use was to isolate the coefficient estimation to the IPD (i.e., to reduce the influence of the AgD on the estimation of the regression adjustments). The difference between these two approaches was much more subtle. Inspection of the DTG vs. EFV estimates, for which there was an IPD trial, reveals general agreement between the two modeling approaches (when keeping the same covariates). The difference was notable in the EFV400 vs. EFV comparisons, for which no IPD were available. Here, the one-stage IPD-AgD NMA appeared to have a lesser impact than the empirical-priors approach. This is not to say that the one-stage approach led to no adjustments.
As a result, in situations where there is an abundant number of trials and treatment comparisons that have IPD, such as in the Donegan et al. example [294], the one-stage approach, which is already well adopted, would be recommended. On the other hand, for networks that have few IPD trials and many treatment comparisons with AgD only, the empirical-priors approach is a way by which the use of the IPD may be maximized across the network.

Although hierarchical meta-regression has shown some promising results, it appears that more research is needed on these methods. The age-old question in statistics is whether to choose improved validity over improved precision. Simulation work has suggested that these methods reduce bias [54], which is the more prized attribute. Nonetheless, the loss of precision was not negligible in these applications. Moreover, it was difficult to use these methods with multiple variables at a time, and the methods for use with continuous outcomes have not yet been published. Once these methods have been developed further, it will be worthwhile revisiting the comparison with one-stage analyses. As discussed above, the implications for first-line ART regimens are minimal. The evidence continues to support DTG as the more efficacious and tolerable choice of treatment. At the outset, it was expected that, with the more precise estimates afforded by IPD [68], DTG might become distinguishable within its class of treatment, as some comparisons appeared to be marginally significant [305]. However, the adjusted analyses tended not to be selected on the basis of improved fit. In instances where models were selected due to significant meta-regression coefficients, the differences between treatments tended to be less pronounced, although DTG continued to perform best with respect to viral suppression, change in CD4 and tolerability.

Limitations

There are several limitations to this study. First and foremost, there were very few trials for which IPD were obtained.
These represented a small fraction of the trials and patients, which may explain why the impact on model estimates appeared to be somewhat muted. This limitation is exacerbated by the missed opportunity to obtain IPD for the SPRING-1 trial. The oversight was identified too far along in the process and thus could not be corrected in time. Given that this was a small Phase 2 trial that would have added a small fraction of patients to an already small sample of IPD, the impact of including or excluding its IPD is very likely to be negligible (particularly given the results of Chapter 4). Moreover, despite not having the IPD, the SPRING-1 trial was still included in the analyses. Second, it is unclear whether the multiple forms of meta-regression interfered with one another. To account for differences in backbone regimens, an arm-based meta-regression was used in addition to the more traditional trial/patient-based regression adjustments, and this may have been a nuisance to the modeling process. Third, the trials for which IPD were available were principally conducted in high-income countries, which may limit the ability to make the adjustments needed in studies conducted in LMICs. Nonetheless, there tended to be a wide range of values for the covariates of interest, so this is unlikely to have been an issue. Fourth, there were numerous other potential effect-modifiers that were more poorly reported and hence not adjusted for. These principally included ethnicity and acquisition risk groups. Early on, men who have sex with men was a variable that was intended to be included, but this was deemed not to be feasible during the development of the statistical analysis plan. This represents a limitation with respect to the applied work, but is not a limiting factor for the comparison of meta-regression adjustment methods. Finally, due to low event counts and data unavailability, not all outcomes were available for re-analysis using IPD.
For example, there were only 5 deaths among the 2,160 patients in the three trials for which IPD were available. With so few events, it was known beforehand that using IPD adjustments would have no impact on the mortality outcome and, as a result, this analysis was not conducted. For this reason, a full complementary set of analyses to Chapter 2 was not possible.

Conclusion

There are many ways in which IPD can be integrated with AgD for the purpose of NMA. Choosing the method by which to integrate these data does have an impact on the results. In most cases, the one-stage approach is recommended; however, in situations with fewer treatment comparisons that have IPD, the empirical-priors approach is a viable alternative. Even with the revised analyses, DTG continues to demonstrate improved efficacy and tolerability over other anchor treatments.

Chapter 4: When does use of individual patient data make a difference?

4.1 Synopsis

Background: The use of individual patient data (IPD)