UBC Theses and Dissertations

Clinical Implications of inter-tumour, intra-tumour, and tumour microenvironment heterogeneity in B-cell… Chan, Fong Chun 2017

Clinical Implications of Inter-tumour, Intra-tumour, and Tumour Microenvironment Heterogeneity in B-cell Lymphomas

by

Fong Chun Chan

MSc, The University of British Columbia, 2012
BSc, Simon Fraser University, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Bioinformatics)

The University of British Columbia
(Vancouver)

March 2017

© Fong Chun Chan, 2017

Abstract

B-cell lymphomas are lymphoid neoplasms derived from mature B lymphocytes at various stages of B-cell development. Advances in sequencing have contributed to decoding the genomic landscapes underlying many subtypes of B-cell lymphomas. However, it remains unclear why some B-cell lymphoma patients suffer from disease progression.

A major factor contributing to disease progression is tumour heterogeneity, a consequence of branched evolutionary processes, and microenvironment heterogeneity leading to variation in the composition and properties of non-malignant cells infiltrating and surrounding the cancer. A thorough characterization of these forms of diversity in B-cell lymphomas and their association with disease progression has not been undertaken. As such, the overarching hypothesis of this thesis is that uncharacterized inter-tumour, intra-tumour, and tumour microenvironment heterogeneity impacts disease progression in B-cell lymphomas. In particular, this thesis is focused on studying these types of heterogeneity in three subtypes of B-cell lymphomas and their implications on disease progression.

First, I explored inter-tumour heterogeneity in primary specimens of diffuse large B-cell lymphoma patients. I identified novel RCOR1 deletions and their corresponding transcriptional signature in a subset of patients that stratified patients into good and poor outcome following first-line treatment.
Secondly, I explored intra-tumour heterogeneity in histologically transformed and early progressed follicular lymphoma patients using serial samples of their primary and transformed/progressed specimens. Through the inference of clonal dynamic patterns, I revealed divergent evolution patterns and identified novel genes underlying these distinct clinical end points. Thirdly, I explored tumour microenvironment (TME) heterogeneity in classical Hodgkin lymphoma relapse patients through serial sampling of primary pretreatment and relapse specimens. I demonstrated how specific TME dynamic patterns can inform on treatment failure. Moreover, I derived a novel, clinically applicable prognostic model (RHL30), based on the TME composition at relapse that predicts response to second-line treatment.

Collectively, the work in this thesis constitutes a step forward in our characterization of tumour and microenvironment heterogeneity in B-cell lymphomas and its association with disease progression. The results presented here will aid in the determination of precise therapeutic approaches for individual lymphoma patients.

Preface

In conjunction with my co-supervisors, Dr. Sohrab Shah and Dr. Christian Steidl, I was involved in the conceiving and designing of the research that is part of this thesis. Specifically, my main responsibilities were to perform the experimental research, interpret the results, and present the results through figures, tables, and text in this thesis. However, this thesis also includes contributions from collaborators and close colleagues and this is described below.

Chapter 2 is a modified version of material published in “Chan, FC. et al. An RCOR1 loss-associated gene expression signature identifies a prognostically significant DLBCL subgroup. Blood (2015).” I led this project under the co-supervision of Dr. Sohrab Shah and Dr. Christian Steidl. I performed the bioinformatics data analysis, interpreted the results, and co-authored the paper along with Dr. Sohrab Shah and Dr. Christian Steidl. Adele Telenius and Dr. Shannon Healy performed in vitro RCOR1 shRNA knockdown experiments, analyzed the results, generated Figure 2.17B, and contributed to the writing of the methods (Section 2.2.15). Susana Ben-Neriah performed FISH experiments, analyzed the results, generated figures (Figure 2.4C-E), and contributed to the writing of the methods (Section 2.2.8). Dr. Anja Mottok performed IHC experiments, analyzed the results, generated Figure 2.12, and contributed to the writing of the methods (Section 2.2.12). All other co-authors assisted with data collection, generation and/or interpretation of the results.

Chapter 3 is a modified version of material published in “Kridel, R*., Chan, FC*., et al. Histological Transformation and Progression in Follicular Lymphoma: a Clonal Evolution Study. PLOS Medicine (2016). *Equal contribution”. This project was co-led by myself and Dr. Robert Kridel and was supervised by Dr. Sohrab Shah. I was primarily responsible for the bioinformatics data analysis which included analysis of the whole genome sequencing data, deep amplicon sequencing data, capture sequencing data, inference of clonal dynamics, and tumour evolution modelling. Dr. Robert Kridel was primarily responsible for data generation and wetlab experiments which included cohort assembly, extraction of DNA and RNA, flow sorting, scoring of FISH assays, primer design, performing deep amplicon sequencing, digital droplet PCR and Lymph2Cx assays. Dr. Robert Kridel also generated the following figures: Figure 3.1, Figure 3.2, Figure 3.18, Figure 3.19, Figure 3.23, Figure 3.25. Together we interpreted the results, generated figures (Figure 3.24 and Figure 3.26), and co-authored the paper along with Dr. Sohrab Shah. Anja Mottok performed IHC staining, analyzed the results, generated Figure 3.24C, and contributed to the writing of the methods (Section 3.2.2). Susana Ben-Neriah performed FISH experiments and analyzed the results. Maia Smith and Dr. Cydney Nielsen assisted in visualization of the results. All other co-authors assisted with data collection, generation, and/or interpretation of the results.

Chapter 4 is a modified version of material that is currently under review “Chan, FC*., Mottok, A*., et al. A Novel Prognostic Model to Predict Post-Autologous Stem Cell Transplantation Outcomes in Classical Hodgkin Lymphoma. Under Review. *Equal contribution”. I co-led this project with Dr. Anja Mottok under the supervision of Dr. Christian Steidl. I performed the bioinformatics data analysis and interpreted the results. Dr. Anja Mottok generated the NanoString assays, performed all the pathology work, generated figures (Figure 4.8C, Figure 4.9G,H, Figure 4.10), and interpreted these and data analysis results. I co-authored the paper along with Dr. Anja Mottok and Dr. Christian Steidl. All other co-authors assisted with data collection, generation, and/or interpretation of the results.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
Dedication
1 Introduction
1.1 Inter-tumour Heterogeneity
1.2 Intra-tumour Heterogeneity
1.3 Tumour Microenvironment Heterogeneity
1.4 B-cell Lymphomas
1.4.1 B-cell Lymphomagenesis
1.4.2 Diffuse Large B-cell Lymphoma
1.4.3 Follicular Lymphoma
1.4.4 Hodgkin Lymphoma
1.5 Problem Statement and Thesis Overview
2 An RCOR1 Loss-Associated Gene Expression Signature Identifies a Prognostically Significant DLBCL Subgroup
2.1 Introduction
2.2 Materials and Methods
2.2.1 Patient Cohorts
2.2.2 Affymetrix SNP 6.0 Quality Control
2.2.3 Processing of Affymetrix SNP 6.0 Data
2.2.4 Generation of Copy Number Segments
2.2.5 Generation of Gene-centric Copy Number States and LogR Values
2.2.6 Filtering of Copy Number Polymorphisms
2.2.7 GISTIC Analysis
2.2.8 Fluorescence in Situ Hybridization Experiments
2.2.9 RNA-seq Alignment, Filtering, and Gene Expression Analysis
2.2.10 Cis/Trans Correlation Analysis
2.2.11 xseq Analysis
2.2.12 Immunohistochemistry on Primary Lymphoma and Reactive Tonsil Samples
2.2.13 Mutual Exclusivity and Co-occurrence Analyses
2.2.14 Cell Line Selection for In Vitro Knockdown
2.2.15 Virus Production, Transduction and Transcript expression
2.2.16 Gene Expression Analysis of In Vitro Knockdown Cells
2.2.17 RCOR1 Loss-Associated Gene Signature Analysis
2.2.18 Correlating Deletions with the RCOR1 Loss-Associated Gene Signature
2.2.19 Determining the Expression Status of RCOR1, LCOR, and NCOR1
2.2.20 Quality Control on the Monti Cohort
2.2.21 Survival Analysis
2.2.22 Pathway Enrichment Analysis
2.3 Results
2.3.1 High-Resolution Copy Number Analysis of DLBCL
2.3.2 Dysregulated Transcriptional Networks Identified by Integration of Copy Number and Gene Expression Data
2.3.3 RCOR1 Deletions Define a Subgroup of DLBCL Patients with Unfavourable Survival in a Homogenously R-CHOP-Treated Cohort
2.3.4 Recurrent Deletions in Members of the Corepressor Family
2.3.5 An RCOR1 Loss-Associated Gene Expression Signature Derived by In vitro Knockdown
2.3.6 The RCOR1 Loss-Associated Gene Expression Signature is Associated with Unfavourable Outcome
2.4 Discussion
3 Clonal Dynamics Underlying Histological Transformation and Progression in Follicular Lymphoma
3.1 Introduction
3.2 Materials and Methods
3.2.1 Cohort Descriptions
3.2.2 Pathology
3.2.3 Whole Genome Sequencing Data Analysis
3.2.4 Targeted Deep Amplicon Sequencing Data Analysis
3.2.5 Capture Sequencing Data Analysis
3.2.6 Digital Droplet Polymerase Chain Reaction
3.3 Results
3.3.1 Transformed/Progressed Follicular Lymphoma Samples Exhibit Higher Mutational Burden than Diagnostic Samples
3.3.2 Histological Transformation Emerges from the Expansion of a Rare Subclone in Diagnostic Samples
3.3.3 Clones Dominant in Progressed Samples were Prevalent in Diagnostic Samples
3.3.4 TFL Clonal Dynamics are Inconsistent with Neutral Evolution
3.3.5 Contribution of Individual Gene Mutations to Transformation
3.3.6 Gene Mutations in Early Progressers
3.4 Discussion
4 A Novel Prognostic Model to Predict Post-Autologous Stem Cell Transplantation Outcomes in Classical Hodgkin Lymphoma
4.1 Introduction
4.2 Materials and Methods
4.2.1 BCCA Study Cohort and Clinical Characteristics
4.2.2 Gene Expression Analysis
4.2.3 Pathology and Immunohistochemical Analysis
4.2.4 Prognostic Power Comparison Between Primary and Relapse Samples
4.2.5 Bayesian Test of Proportions
4.2.6 RHL30 Prognostic Model/Assay
4.3 Results
4.3.1 Comparative Analysis of Paired Primary-Relapse Specimens Reveals Biological Differences
4.3.2 Relapse Biopsies are Superior for Predicting Post-Autologous Stem Cell Transplantation Outcomes
4.3.3 A Novel Prognostic Model (RHL30) using Relapse Specimens Predicts Post-Autologous Stem Cell Transplantation Outcomes
4.4 Discussion
5 Conclusions and Future Directions
5.1 Future Directions
5.1.1 RCOR1 as a Prognostic and Therapeutic Target
5.1.2 Extent of Spatial Heterogeneity in B-cell Lymphomas
5.1.3 Prediction of Early Progression in Follicular Lymphoma
5.1.4 Deconvolution of the TME
5.1.5 Establishing the RHL30 as a Prognostic and Predictive Biomarker
5.2 Concluding Remarks
Bibliography
A Supporting Materials
A.1 Tables
A.1.1 Chapter 2
A.1.2 Chapter 4

List of Tables

Table 2.1 Clinical Characteristics of the R-CHOP-treated DLBCL Study Cohort.
Table 2.2 Clinical Characteristics of the DLBCL SNP 6.0 Study Cohort.
Table 2.3 BAC probes used for FISH Validations.
Table 2.4 Pairwise Multivariate Analyses of RCOR1 Deletions and RCOR1-low vs. Established Prognostic Markers (COO and IPI).
Table 3.1 Clinical Characteristics of TFL Cases.
Table 3.2 Clinical Characteristics of Early vs. Late Progressers.
Table 3.3 TITAN Final Selected Parameters.
Table 3.4 TITAN Copy Number State Overview.
Table 3.5 Capture Sequencing Gene Panel (n = 86 genes).
Table 3.6 Regions Selected for Investigating Somatic Hypermutation (n = 20 genes).
Table 4.1 Clinical Characteristics of Study Cohorts.
Table 4.2 RHL30 Model Coefficients.
Table A.1 Frequently Deleted Regions Predicted by GISTIC.
Table A.2 Frequently Gained Regions Predicted by GISTIC.
Table A.3 Distribution of Normal/Tumour Content.
Table A.4 List of Genes in the RCOR1 Loss-associated Gene Expression Signature (n = 233).
Table A.5 Table of Up/Down-regulated Enriched Biological processes (FDR < 0.05) that are Associated with the RCOR1 Loss-associated Gene Signature (n = 233).
Table A.6 RHL30 vs. Reported Prognostic Markers for Post-BMT-OS.
Table A.7 Primary vs. Relapse Specimens Gene Expression Signature Difference Correlations.
Table A.8 RHL30 vs. Reported Prognostic Markers for Post-BMT-FFS.
Table A.9 RHL30 vs. Reported Prognostic Markers for Post-BMT-OS.

List of Figures

Figure 1.1 Clonal Evolution in Cancer.
Figure 1.2 Different Forms of Heterogeneity Covered in this Thesis.
Figure 1.3 Decrease in Genomic Diversity due to an Evolutionary Bottleneck.
Figure 1.4 The Germinal-Centre Reaction and Putative COO of Different Lymphoma Subtypes.
Figure 1.5 Clinical Prognosis of DLBCL patients.
Figure 1.6 Clinical Prognosis of FL patients.
Figure 1.7 cHL Microenvironment.
Figure 1.8 Molecular Microscope Approach for Profiling the cHL Microenvironment.
Figure 1.9 Clinical Prognosis of cHL patients.
Figure 2.1 Overview of the Cohorts.
Figure 2.2 OncoSNP Segmentation Methodology.
Figure 2.3 RCOR1 Loss-Associated Gene Signature Clustering Methodology.
Figure 2.4 Genome-wide Copy Number Architecture of 148 DLBCL Patients.
Figure 2.5 Copy Number Aberration Distributions.
Figure 2.6 Genome-wide Copy Number Architecture per Subtype.
Figure 2.7 GISTIC Deletion and Gain Score Plots.
Figure 2.8 Top 20 Candidate Driver Genes Selected by xseq.
Figure 2.9 Association of CDKN2A, RCOR1, TRAF3, and TNFAIP2 Deletions with PFS, DSS, and OS.
Figure 2.10 Correlations of RCOR1 Deletions with Gene Expression.
Figure 2.11 RCOR1 IHC Correlations with Copy Number Data.
Figure 2.12 RCOR1 IHC Stains.
Figure 2.13 RCOR1 IHC Correlations with Survival.
Figure 2.14 Focal View of Raw Copy Number Values.
Figure 2.15 Focal View of Raw Copy Number Values for RCOR1, LCOR, and NCOR1 Across All Deleted Samples.
Figure 2.16 Integration of Copy Number and Single Point Mutational Data.
Figure 2.17 RCOR1 knockdown qRT-PCR and Western Blots.
Figure 2.18 Venn Diagrams Comparing the Consistency of Dysregulated Pathways between RCOR1 Knockdowns in the cell-lines KM-H2 and Raji.
Figure 2.19 RCOR1 Loss-Associated Gene Signature Derivation Workflow.
Figure 2.20 RCOR1 Loss-Associated Signature is Associated with Unfavourable Outcome.
Figure 2.21 Expression Distribution of RCOR1, LCOR and NCOR1.
Figure 2.22 RCOR1 Loss-associated Gene Expression Signature Clustering in the Lenz Rediscovery Cohort.
Figure 2.23 RCOR1 Loss-associated Gene Expression Signature Clustering in the Monti Rediscovery Cohort.
Figure 2.24 Prognostic Significance of the RCOR1 Loss-associated Gene Expression Signature within the IPI Low Group.
Figure 3.1 Sample Overview and Timeline of Whole-Genome Sequencing Cohort.
Figure 3.2 Sample Overview.
Figure 3.3 WGS Sequencing Statistics.
Figure 3.4 Bioinformatics Workflow for Predicting sSNVs from WGS data.
Figure 3.5 Tumour Content Estimation in TFL Patients.
Figure 3.6 Tumour Content Estimation in PFL Patients.
Figure 3.7 Tumour Content Estimation in NPFL patients.
Figure 3.8 Bioinformatics Workflow for Predicting sCNAs from WGS data.
Figure 3.9 Selection of sSNV Positions for Deep Sequencing Validation.
Figure 3.10 Inference of Clonal Phylogenies from Sequencing Data Workflow.
Figure 3.11 High-Level WGS Analysis Overview.
Figure 3.12 Comparative Analysis Between Clinical Groups.
Figure 3.13 Correlation of Mutation Load Difference with Time between Samples.
Figure 3.14 WGS VAF Distribution Across all TFL and PFL Patients.
Figure 3.15 Clonal Phylogenies of TFL patients (Part I).
Figure 3.16 Clonal Phylogenies of TFL patients (Part II).
Figure 3.17 Clonal Phylogenies of TFL patients (Part III).
Figure 3.18 Ultra-sensitive Detection of Low Prevalence Clones in T1 Samples (Part I).
Figure 3.19 Ultra-sensitive Detection of Low Prevalence Clones in T1 samples (Part II).
Figure 3.20 Clonal Phylogenies of PFL Patients.
Figure 3.21 Power Law Distribution in TFL and PFL Patients.
Figure 3.22 Wright-Fisher Modelling in TFL and PFL Patients.
Figure 3.23 Mutational Load in Coding Sequence of 86 genes.
Figure 3.24 Results from Targeted Sequencing of 86 genes in Samples from 159 TFL Patients (128 T1 and 149 T2 Samples).
Figure 3.25 Progression-free and Overall Survival in Early and Late Progressers.
Figure 3.26 Results from Targeted Sequencing of 86 genes in 41 Early and 84 Late Progresser Patients.
Figure 3.27 Schematic Models of Evolutionary Progression in TFL.
Figure 3.28 Schematic Models of Evolutionary Progression in PFL.
Figure 4.1 Patient/Specimen Overview of the BCCA Cohort.
Figure 4.2 BCCA Cohort Quality Control Methodology.
Figure 4.3 Effects of Different Normalization Strategies.
Figure 4.4 Correlation of IHC and NanoString Expression.
Figure 4.5 Penalized Multivariate Cox Regression Method for Directly Comparing Primary vs. Relapse Specimens.
Figure 4.6 Primary vs. Relapse Samples Bootstrap Aggregation Method.
Figure 4.7 RHL30 Prognostic Model Methodology.
Figure 4.8 Primary vs. Relapse Histological Subtype Transitions.
Figure 4.9 Primary vs. Relapse Gene Expression Differences.
Figure 4.10 CD163 IHC vs. CD20 IHC.
Figure 4.11 Superior Post-ASCT Prognostic Properties of Relapse Specimens Compared to Primary Specimens.
Figure 4.12 Penalized Multivariate Cox Regression for Directly Comparing Primary vs. Relapse Specimens.
Figure 4.13 Univariate Cox Regression on Post-BMT-FFS using Relapse Specimens.
Figure 4.14 Univariate Cox Regression on Post-BMT-OS using Relapse Specimens.
Figure 4.15 RHL30 Predicts Response to ASCT.
Figure 4.16 RHL30 Log-rank P-values and Hazard Ratios of Different Thresholds in the Study Cohort.
Figure 4.17 RHL30 Model Applied to UMCG Validation Cohort.
Figure 4.18 RHL30 Model Applied to AUH Validation Cohort.
Figure 4.19 RHL30 vs. Reported Prognostic Markers Forest Plot.

Glossary

ABC Activated B-cell Like. One of two major molecular subtypes of diffuse large B-cell lymphoma.
ABVD Doxorubicin, Bleomycin, Vinblastine and Dacarbazine. The current standard first-line chemotherapy regimen used for treating classical Hodgkin lymphoma patients.
ASCT Autologous Stem Cell Transplantation. A procedure that transplants a patient’s stem cells back into their bloodstream following high-dose chemotherapy. This is currently the standard of care for patients with relapsed disease.
BAF B-allele Frequency. A measure of the allelic ratio of two alleles (A and B).
cHL Classical Hodgkin Lymphoma. The more common form of Hodgkin lymphoma.
CNP Copy Number Polymorphism. Germline copy number variants found in normal tissue.
CQC Contrast Quality Control.
COO Cell of Origin. The putative stage in the B-cell development process at which a lymphoma subtype arose.
CSR Class Switch Recombination. The third of three major B-cell development mechanisms that results in genetic diversity of the immunoglobulin loci.
ddPCR Digital Droplet Polymerase Chain Reaction.
DHIT Double Hit Lymphomas. The molecular subtype of lymphomas characterized by concurrent IgH-MYC, IgH-BCL2 translocations and/or IgH-BCL6 translocations. If all 3 translocations are present in a case, this is called a “triple hit”.
DLBCL Diffuse Large B-Cell Lymphoma. The most common form of non-Hodgkin lymphoma.
DSS Disease Specific Survival. A clinical endpoint that measures patient death due to the disease. Death from unrelated causes is excluded.
EBV Epstein-Barr Virus. A common virus in humans that is known to be associated with Hodgkin lymphoma.
FDR False Discovery Rate.
FF Fresh Frozen. A procedure for collecting specimens, usually for research purposes.
FISH Fluorescence in Situ Hybridization.
FL Follicular Lymphoma. The second most common form of non-Hodgkin lymphoma and the most common form of indolent non-Hodgkin lymphoma.
FLIPI Follicular Lymphoma International Prognostic Index. The most widely used prognostic factor for follicular lymphoma.
FFPE Formalin-Fixed Paraffin-Embedded.
GCB Germinal Centre B-cell Like. One of two major molecular subtypes of diffuse large B-cell lymphoma.
HDCT High-Dose Chemotherapy. An umbrella term to describe an escalated chemotherapy regimen. This is used in conjunction with autologous stem cell transplantation to treat relapsed classical Hodgkin lymphoma.
HL Hodgkin Lymphoma. The less common of the two major types of lymphoma.
IHC Immunohistochemistry.
Ig Immunoglobulin Loci. An umbrella term encompassing the immunoglobulin heavy and light chain loci.
IgH Immunoglobulin Heavy Chain. The heavy chain component of an immunoglobulin that is genomically located on 14q32.
IgL Immunoglobulin Light Chain. The light chain component of an immunoglobulin that is genomically located on chromosome 2 (κ locus) and 22 (λ locus).
IPI International Prognostic Index. The most widely used prognostic factor in diffuse large B-cell lymphoma.
logR Log2 ratio. A measure of the total copy number.
m7-FLIPI A clinicogenetic prognostic model for follicular lymphoma that integrates the Follicular Lymphoma International Prognostic Index and mutation status of 7 genes: EZH2, ARID1A, MEF2B, EP300, FOXO1, CREBBP, and CARD11.
NHL Non-Hodgkin Lymphoma. The more common of the two major types of lymphoma.
OS Overall Survival. A clinical endpoint that measures time from initial diagnosis to death from any cause.
PFS Progression Free Survival. A clinical endpoint that measures time from initial diagnosis to patient relapse or death from any cause.
R-CHOP Rituximab, Cyclophosphamide, Doxorubicin Hydrochloride, Vincristine sulfate, and Prednisone. Current first-line therapy for diffuse large B-cell lymphoma.
RPKM Reads/Kilobase of Transcript/Million Mapped Reads.
sCNA Somatic Copy Number Alteration.
SHM Somatic Hypermutation. The second of three major B-cell development mechanisms that results in genetic diversity of the immunoglobulin loci.
sIndel Somatic Small Insertion/Deletion.
SNP Single Nucleotide Polymorphism.
sSNV Somatic Single Point Nucleotide Variant.
TMA Tissue Microarray.
TME Tumour Microenvironment.
WGS Whole Genome Sequencing.
VAF Variant Allele Fraction. The fraction of sequencing reads supporting the variant allele at a given position.
VBBMM Variational Bayes Binomial Mixture Model Clustering.
V(D)J V, D, J Somatic Recombination. The first of three major B-cell development mechanisms that results in genetic diversity of the immunoglobulin loci.

Acknowledgments

I would like to extend my sincerest gratitude to my co-supervisors Dr. Christian Steidl and Dr. Sohrab Shah. I truly was a beneficiary of a unique co-supervisory structure that allowed me to be fully integrated into both labs and experience the wisdom and knowledge of both supervisors. They have provided me constant mentorship, motivation, and support for my graduate studies. Most importantly, thank you for believing in me when I was just starting out and for giving me countless career opportunities. These opportunities have afforded me the ability to develop as a scientist and I am forever indebted as these opportunities have set me up for the next stage of my career.

Throughout my PhD, I have also had the privilege to interact and work with many talented researchers and professionals who have helped me along the way. I would like to thank Dr. Marco Marra, Dr. Kerry Savage, and Dr. Randy Gascoyne for their commitment to being on my thesis committee and providing me invaluable advice on my research. Thank you Carolyn Lui, Barbara Yuen, and Lori-Ann Wilson for your incredible administrative support during my PhD.
Thank you to all members of the Steidl and Shah labs for your camaraderie and support during my PhD. Specifically, I want to thank my fellow bioinformaticians from the Steidl lab, Dr. Chris Hother, Dr. Hennady Shulha, Lauren Chong, Rebecca Johnston, and Dr. Stacy Hung for giving me endless encouragement, friendship, and fruitful discussions. Thank you for giving me the opportunity to make an impact on your lives, and also for teaching me so much about how to be a good and effective leader. Thank you to my close colleague Dr. Robert Kridel for giving me the opportunity to co-lead Chapter 3 with you. It was truly an extraordinary experience working with someone so intelligent, wise, patient, and appreciative of the bioinformatics contributions to the project. You made a huge effort to treat me as an equal in the project and I am extremely thankful for that. I have also had the privilege to work closely with Dr. Andrew McPherson and Dr. Andrew Roth. Their willingness to share their expertise in machine learning, statistics, and algorithmic development was critical to the understanding and usage of various bioinformatic tools. Big thanks to Dr. Anja Mottok for your incredible pathology contributions to my thesis and your constant willingness to discuss key clinical and scientific concepts with me. I also want to thank Dr. David Scott for teaching me so much about the translational aspect of research and constantly helping me understand my contributions in the clinical context.

I am greatly appreciative of the Terry Fox Research Institute for their financial contributions to Chapters 2 and 3, and the Canadian Institutes of Health Research for funding Chapter 4. Additionally, I have also been privileged to receive funding support for my graduate studies from the University of British Columbia in the form of Faculty of Science graduate awards and travel awards.

Finally, I want to sincerely thank my family - my mom, dad, sister, and my wife Emilia - for everything they have provided me in life.
To my parents, the sacrifice you two had to make to bring our family to Canada to have a better life will never be lost on me. The values and morals you have taught me and the plethora of opportunities you afforded me while growing up, many of which you never had, have shaped me into the man I am today. I am so glad to have such loving parents who will unconditionally drop everything to come to my aid. You are the best parents that I could ask for and I am so proud to be able to call you mom and dad. To my big sister, I have always looked up to you in everything in life. Growing up you spent a lot of time “nagging” me about a lot of things, but I am truly thankful that you did, because I now understand you were teaching and mentoring me on how to become a better person. And to Emilia, my wonderful wife, no words can describe how thankful I am to have you in my life. Knowing that you were in my corner provided me with the confidence and energy to persist through my research. Thank you for all the countless hours you spent helping scrutinize, read, and discuss my research. You have been instrumental in helping me develop as a scientist and there is no way I could have completed my graduate studies without you by my side.

Dedication

To my loving Lord, extended family, sister, parents, and my wife.

Chapter 1
Introduction

Cancer is a genetic disease and is a significant public health problem as it results in 1 in 4 deaths worldwide [1]. Cancer development occurs through an evolutionary process known as clonal evolution [2, 3]. The clonal evolution model has been linked to Darwin’s theory of evolution - in that genomically distinct groups of cancer cells (clones) are the equivalent of asexually reproducing species that live and evolve within an ecosystem.
In that light, there are two constituent processes that underlie cancer development: the first being the emergence of clones due to the acquisition of mutations, which could be mediated by genome instability [4], and the second being the perpetuation of fit clones driven by positive selection (Figure 1.1). The consequence of the concert of these 2 processes is tumour heterogeneity: a term that broadly encapsulates the notion that tumours are mixtures of genomically distinct clones that can dynamically vary between patients (inter-tumour) and within a patient, either over disease course or between spatially separated specimens of the same tumour (intra-tumour). Additionally, a large body of evidence demonstrates that cancer progression is enabled through tumour cell interactions with the non-malignant elements that form the tumour microenvironment [5]. This tumour microenvironment, similar to tumour cells, can exhibit variation between patients and over a patient’s disease course. This phenomenon is collectively referred to as tumour microenvironment heterogeneity. In Sections 1.1, 1.2, and 1.3, I review these forms of heterogeneity and discuss their clinical implications on cancer treatment. Section 1.4 provides a review on B-cell lymphomas and the current unmet clinical needs. Finally, Section 1.5 outlines the aim of this thesis - characterizing inter-tumour, intra-tumour, and tumour microenvironment heterogeneity and their association with disease progression in B-cell lymphoma.

Figure 1.1: Clonal Evolution in Cancer. Each circle is an individual cell with the color denoting the clone it represents. An ecosystem is represented by the beige boxes that encapsulate a set of clones. The vertical lines represent imposition of selective pressures (e.g. Tx - therapy). This is Figure 2a reprinted by permission from Macmillan Publishers Ltd: Nature (Greaves, M., & Maley, C. C [3]), copyright (2012).

1.1 Inter-tumour Heterogeneity
Clonal evolution occurs independently in each patient’s tumour, resulting in a myriad of genotypes in cells within a given tumour. However, when considering a cohort of patients, convergent phenotypes often arise, leading to tumours sharing common sets of phenotypic and histologic hallmarks. The consequence of this process is observed across patient cohorts where primary specimens that are assigned the same histological subtype have diverse genomic profiles. This is referred to as “inter-tumour heterogeneity” (Figure 1.2A). For instance, recent high-throughput sequencing studies [6–9] have comprehensively surveyed the mutational landscape of diffuse large B-cell lymphoma (DLBCL) and demonstrated substantial diversity in the mutational profiles of patients. The absence of a single defining genomic profile of DLBCL suggests that a variety of genotypes can converge on the same pathogenesis of the disease.

Inter-tumour heterogeneity is responsible, in part, for why a standard chemo/radiotherapy regimen does not work for all patients. For instance, patients who suffer from treatment failure might have tumours harbouring genomic alterations that confer resistance to the standard treatment regimen. As such, by characterizing inter-tumour heterogeneity, we have the opportunity to address an unmet clinical need: the identification of biomarkers that distinguish patients with a high likelihood of being cured with standard treatment from those who would experience disease progression. The latter group of patients may ultimately benefit from alternative therapeutic approaches such as targeted therapy or immunotherapy.

Figure 1.2: Different Forms of Heterogeneity Covered in this Thesis. This figure provides an overview of the various types of heterogeneity that will be discussed in this thesis. Panel A: Inter-tumour heterogeneity. This form of heterogeneity is the topic of Chapter 2 and pertains to genomic variations between tumours of the same histological type. This is Figure 1 adapted by permission from Macmillan Publishers Ltd: Nature (Burrell, R. A., et al. [4]), copyright (2013). Panel B: Intra-tumour heterogeneity. This form of heterogeneity is the topic of Chapter 3 and pertains to genomic differences between cells of the same tumour and their change over time. Panel C: Tumour microenvironment heterogeneity. This form of heterogeneity is the topic of Chapter 4 and pertains to variations in the tumour microenvironment and its change over time. This is Figure 1 adapted by permission from Macmillan Publishers Ltd: Nat Rev Cancer (Scott, D. W., & Gascoyne, R. G. [5]), copyright (2014).

1.2 Intra-tumour Heterogeneity
As clonal evolution occurs over the course of the development of an individual tumour, each tumour can be thought of as an ecosystem comprised of a population of evolving clones (clonal composition) that fluctuates as a function of genetic drift and selective pressures (intrinsic and extrinsic). In this light, clonal evolution results in “intra-tumour heterogeneity” (Figure 1.2B) within an individual specimen [10]. This is typically observed between spatially separated specimens of the same tumour [11–13], and evolves over disease course [14–16].

The evolving nature of each tumour has been implicated in disease progression. Most notably, we often observe the emergence of progressive disease following standard-of-care chemotherapy treatment, which acts as an extrinsic selective pressure. As clones harbour discordant genomic profiles, the extent to which each clone is sensitive to drug treatment may vary.
When drug treatment is introduced, it may create an evolutionary bottleneck, which eradicates treatment sensitive clones, and in turn results in a significant decrease in the genomic variability of the tumour [11] (Figure 1.3).

Figure 1.3: Decrease in Genomic Diversity due to an Evolutionary Bottleneck. This is Figure 2A reprinted from Swanton, C. [13], with permission from AACR.

The remaining treatment resistant clones then undergo further evolution, re-establishing the genomic heterogeneity and leading to the re-manifestation of the disease. For instance, Landau et al. [16] observed treatment resistant minor clones at diagnosis that emerged as aggressive major clones at relapse in 18 chronic lymphocytic leukemia patients. Therefore, it is pertinent to comprehensively characterize the shifts in clonal composition (clonal dynamics) between paired primary and relapse/metastatic tumours to identify clones, and corresponding genotypes, that are associated with the disease progression phenotype. By identifying such clones, we can begin to address an unsolved clinical problem: prevention of disease progression and relapse within a single patient.

1.3 Tumour Microenvironment Heterogeneity
The tumour microenvironment (TME) is a crucial participant in tumourigenesis [5]. The TME composition is dependent on numerous factors, such as the extent of cross-talk (mediated through cytokines and chemokines) between tumour and non-malignant cells, host-specific factors (e.g. antitumour inflammatory response), and genomic alterations in tumour cells [17]. Furthermore, the existence of inter-tumour (Section 1.1) and intra-tumour heterogeneity (Section 1.2) may result in TME heterogeneity by eliciting differences in TME composition between patient tumours as well as over disease course (Figure 1.2C).

The recent identification of therapeutic resistance mechanisms conferred by non-malignant neighboring cells points to a previously underappreciated role of the TME in disease progression [18, 19].
For instance, while a large proportion of tumour cells may be eradicated by therapy, the TME may interact with a subset of tumour cells and provide protection from therapy via soluble factors and cell adhesion-mediated drug resistance mechanisms [20]. Another example of TME involvement is the treatment resistance to anti-angiogenic therapy, where tumour-associated macrophages secrete factors that counteract the loss of VEGF in tumour cells to allow for angiogenesis [21]. These results demonstrate that the TME can provide tumour cells with an extrinsic therapeutic resistance mechanism. Moreover, the TME also provides refuge to tumour cells under insult from therapy while simultaneously affording tumour cells the time to evolve a treatment resistant phenotype.

The TME’s critical role in facilitating the survival of a tumour suggests it may be an attractive therapeutic target [19]. Recent approaches to targeting the TME have demonstrated this [22, 23]. However, the fact that a proportion of patients do not respond to such therapy or experience a re-emergence of their disease suggests that TME heterogeneity is partially responsible for this treatment failure. As such, the contribution of TME heterogeneity to disease progression needs to be characterized to address the aforementioned clinical problems in Sections 1.1 and 1.2. In addition, by characterizing TME heterogeneity in conjunction with the tumour cell diversity, we can identify patients who may benefit from combination therapy that targets both the malignant and non-malignant compartments of a cancer. Such therapy may lead to durable patient responses.

1.4 B-cell Lymphomas
B-cell lymphomas are a heterogeneous class of lymphoid neoplasms derived from B lymphocytes at various stages of B-cell development [24]. The two main types of B-cell lymphomas are Hodgkin lymphoma (HL) and non-Hodgkin lymphoma (NHL).
Of these two types, NHL is more common and represents a diverse collection of at least 14 different subtypes [25].

1.4.1 B-cell Lymphomagenesis

1.4.1.1 Chromosomal Translocations
One defining feature of B-cell lymphomas is the prevalence of chromosomal translocations involving the immunoglobulin (Ig) locus that arise during the process of B-cell development [26, 27]. One site of origin of these translocations is in the bone marrow, where B-cell progenitors are located. Here, B-cells undergo somatic recombination of their V, D and J gene segments (V(D)J recombination) to form the immunoglobulin heavy-chain (IgH) and immunoglobulin light-chain (IgL, which lacks the D segments) genes that comprise the subunits of an antibody [28]. This process requires the breaking of double-stranded DNA, mediated through the recombinase-activating genes RAG1 and RAG2, followed by DNA repair of these breaks. This process results in extensive Ig diversity and in the creation of naive mature B-cells. An unintended consequence of such a mechanism is the somatic acquisition of reciprocal chromosomal translocations that place the expression of proto-oncogenes under the control of enhancers from an Ig locus (most commonly the IgH locus). This then leads to massive and unregulated expression of the oncogene.

A second site of origin of these translocations is in the germinal centre. Following an encounter with an antigen, naive mature B-cells will migrate to the germinal centre of lymph nodes (Figure 1.4) [24]. Inside the germinal centre, a complex regulatory network allows for further B-cell development via the mechanisms of somatic hypermutation (SHM) and class-switch recombination (CSR). These processes are mediated by the activation-induced cytidine deaminase enzyme (AID), and result in further genetic diversification of the Ig locus.
Like the somatic V(D)J mechanism, a consequence of SHM and CSR is the introduction of DNA damage that may lead to chromosomal translocations. The presence of translocations within or adjacent to rearranged V(D)J in conjunction with the presence of somatically mutated V segments is suggestive of these translocations arising during SHM [29]. Additionally, translocations with breakpoints in the switch regions of Ig suggest that these may have arisen during the CSR process [29].

The process of B-cell development harbours multiple mechanisms for somatic chromosomal translocations to occur and hence these mutations are often the initiating mutations in lymphomagenesis. This notion is supported by the fact that nearly all lymphoma subtypes are characterized by hallmark translocations involving the IgH locus (e.g. IgH-BCL2 in follicular lymphoma (FL) and DLBCL [30], IgH-CCND1 in mantle cell lymphoma [31], and IgH-MYC in Burkitt lymphoma [32]) that are found in a large proportion of patients. These translocations are almost always present in all tumour cells, suggesting that they are early and distinguishing genetic alterations in B-cell lymphomas.

Despite the high incidence of chromosomal translocations, they do not occur in 100% of patients, and translocations are also present in B-cells of healthy individuals [33]. This suggests that chromosomal translocations alone are neither necessary nor sufficient for lymphomagenesis [34]. Thus additional/alternative dysregulated mechanisms must exist through different genomic/epigenomic alterations. These will be discussed below in the separate lymphoma subtype Sections 1.4.2, 1.4.3, and 1.4.4.

1.4.1.2 Cell of Origin
As B-cell development occurs in a stepwise fashion, it follows that the transformation of a benign B-cell to a malignant state can occur at any step of the development process. As a corollary, a lymphoma subtype’s cell of origin (COO) refers to the stage in B-cell development from which it arose.
As the stage at which a B-cell becomes malignant is difficult to determine, COO is putatively derived by gene expression profiling of malignant B-cells and comparing these profiles to those derived from a range of different normal B-cells. Through this methodology, the putative COO of many different lymphoma subtypes has been assigned (Figure 1.4). In this thesis, 3 types of B-cell lymphomas are studied: 1) classical Hodgkin lymphoma (cHL, the more common form of HL), 2) DLBCL (the most common form of NHL), and 3) FL (the most common form of indolent NHL).

Figure 1.4: The Germinal-Centre Reaction and Putative COO of Different Lymphoma Subtypes. Depicted is a naive mature B-cell, after undergoing somatic V(D)J recombination in the bone marrow, encountering an antigen and migrating to the germinal centre of a lymph node. Once inside the germinal centre, the germinal-centre regulatory program is activated to facilitate the further development of the B-cell. Indicated are the putative COO of the different lymphoma subtypes. This is Figure 1 adapted with permission from Lenz, G., & Staudt, L. M. [24], Copyright Massachusetts Medical Society.

1.4.2 Diffuse Large B-cell Lymphoma
DLBCL is the most common form of aggressive NHL, accounting for 30% to 40% of all new lymphoma patients [25]. DLBCL commonly arises from normal B-cells, but can also arise through histologic transformation from other lymphoma subtypes (e.g. FL, which is the subject of Chapter 3) [35].

1.4.2.1 Pathogenesis of Diffuse Large B-Cell Lymphoma
Gene expression profiling has identified that DLBCL is comprised of at least 2 major subtypes, activated B-cell like (ABC) and germinal centre B-cell like (GCB), that reflect their hypothetical COO [36]. The GCB subtype exhibits a transcriptional signature analogous to that of a germinal centre B-cell, with elevated expression levels of the transcriptional repressor BCL6 along with hypermutated Ig genes - indicative of somatic hypermutation.
In contrast, the ABC subtype exhibits a signature characterized by a down-regulation of the germinal centre reaction, up-regulation of the BCR and NF-κB pathways that includes elevated expression of the genes IRF4 and PRDM1, and lack of ongoing SHM. This is consistent with the profile of a B-cell leaving the germinal centre and entering the stage of plasma cell differentiation. Importantly, COO is prognostically significant, with the GCB subtype having better outcomes than ABC [24].

Recent genomic studies [6–9] have also demonstrated that DLBCL has a high degree of genomic complexity, where each patient harbours between 50 and >100 coding genomic alterations (point mutations, copy number alterations, and translocations) [37]. These studies collectively implicate disruption of the cancer epigenome through frequent mutations in chromatin modifying genes (e.g. KMT2D, CREBBP, EZH2, EP300, and MEF2B) as a defining feature of DLBCL. Additionally, mutations in B2M, HLA, and CD58 allow DLBCL cells to escape from immune surveillance [37].

Intriguingly, certain genomic alterations are enriched in one particular subtype. For example, Ig translocations involving MYC and BCL2 along with gain-of-function mutations in the EZH2 gene are nearly exclusive to the GCB subtype. In contrast, frequent mutations in CD79B, CARD11, MYD88, and TNFAIP3 are enriched in ABC, pointing to constitutive activation of the B-cell receptor and NF-κB signalling pathways as being a predominantly ABC oncogenic pathway.

Additionally, there has been considerable interest in “double-hit” lymphomas (DHIT; concurrent IgH-MYC, IgH-BCL2, and/or IgH-BCL6 translocations), which comprise approximately 5% of DLBCL patients [38]. In one study of 303 DLBCL patients, 14% of patients had a MYC translocation, of which 83% had a concurrent BCL2 and/or BCL6 translocation [39]. Patients with this type of biology are often of the GCB subtype.
While these translocations lead to overexpression of the oncogenes MYC, BCL2 and BCL6, there is also evidence of the overexpression of these genes in the absence of a translocation [40]. Such patients are referred to as dual-expressors and interestingly are often of the ABC subtype. A major reason why DHIT and dual-expressor lymphomas have garnered a lot of attention is their association with inferior survival [39, 41] (as discussed in Section 1.4.2.2). As such, these types of lymphomas are now recognized as a distinct molecular subtype of DLBCL, as evidenced by their recent inclusion into the World Health Organization classification [42].

1.4.2.2 Clinical Prognosis and Prognostic Factors
The current standard of care for DLBCL is a multi-agent immuno-chemotherapy regimen that uses rituximab, an anti-CD20 antibody, in combination with cyclophosphamide, doxorubicin hydrochloride, vincristine sulfate, and prednisone (R-CHOP). The incorporation of rituximab into treatment of DLBCL has led to dramatic improvements in DLBCL patient outcomes compared to CHOP alone [43, 44]. Despite this improvement, approximately one third of patients still suffer from primary refractory or relapsed disease (Figure 1.5). Such patients will receive high-dose chemotherapy (HDCT) followed by autologous stem cell transplantation (ASCT) as the standard second-line treatment, which cures approximately 50% of patients [45]. Patients who are not eligible for ASCT due to lack of response to HDCT typically have a dismal outlook and rapidly succumb to the disease.

Figure 1.5: Clinical Prognosis of DLBCL patients. [Flow diagram: DLBCL diagnosis; first-line treatment (R-CHOP); 60-70% cured, 30-40% relapse; second-line treatment (HDCT-ASCT); 50% cured, 50% relapse/death.]

The most widely used prognostic factor is the International Prognostic Index (IPI) that encompasses 5 clinical factors (age, stage, performance status, serum lactate dehydrogenase and the number of extranodal sites) [46].
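As an illustration of how the five IPI factors combine into a single score, the following is a minimal sketch rather than a clinical tool. The cut-offs used (age over 60, Ann Arbor stage III or IV, ECOG performance status of 2 or more, elevated serum lactate dehydrogenase, more than one extranodal site) and the four risk groups follow the commonly published IPI definition and are assumptions not stated in this thesis; the Patient class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical container for the five IPI clinical factors."""
    age_years: int
    ann_arbor_stage: int          # 1 to 4
    ecog_performance_status: int  # 0 to 4
    ldh_elevated: bool            # serum LDH above the upper limit of normal
    extranodal_sites: int


def ipi_score(p: Patient) -> int:
    """Count adverse factors (0-5); cut-offs are assumed from the standard IPI."""
    return sum([
        p.age_years > 60,
        p.ann_arbor_stage >= 3,
        p.ecog_performance_status >= 2,
        p.ldh_elevated,
        p.extranodal_sites > 1,
    ])


def ipi_risk_group(score: int) -> str:
    """Map a score to the four conventional IPI risk groups (assumed grouping)."""
    if score <= 1:
        return "low"
    if score == 2:
        return "low-intermediate"
    if score == 3:
        return "high-intermediate"
    return "high"


# Example: a 72-year-old with stage IV disease, good performance status,
# elevated LDH, and no extranodal involvement has three adverse factors.
example = Patient(72, 4, 1, True, 0)
print(ipi_score(example), ipi_risk_group(ipi_score(example)))
```

Because every factor contributes one point regardless of its biological basis, a score of this kind summarises clinical risk without describing the underlying tumour biology.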
While the IPI has shown robustness as a prognostic factor, it is a surrogate for the extensive biological heterogeneity in DLBCL that still remains incompletely described. A first step towards a clinically useful biomarker is likely through COO phenotyping, as this identifies the ABC subtype as having inferior survival compared to GCB. Moreover, the decoding of the genomic architecture of DLBCL has led to the development of novel targeted therapies. As some of these mutations are subtype specific, this opens the avenue for subtype specific targeted therapies (e.g. EZH2 inhibitors, kinase inhibitors for B-cell receptor signalling) and the potential for COO to become a predictive biomarker.

However, the IPI and COO are still unable to fully account for the variability in response to standard therapy. One relatively newly recognized factor is DHIT and dual-expressor status, as these patients have dismal outcomes with R-CHOP. These patients represent a clinical extreme, requiring specialized clinical management, and thus clinical trials will be needed to determine the optimal way to treat such patients.

With the advent of genomic technologies, novel prognostic factors have been uncovered that harbour clinical potential (e.g. somatic mutations in FOXO1 [47] and deletions in CDKN2A [48]). Large cohorts of DLBCL patients will be required to uncover additional prognostic factors.

1.4.3 Follicular Lymphoma
Follicular lymphoma (FL) is the most common form of indolent lymphoma and is characterized by a slow progression with a median age of onset at 60 years [35]. FL is the second most common form of NHL, after DLBCL, accounting for approximately 25% of all new NHL diagnoses [25].

1.4.3.1 Pathogenesis of Follicular Lymphoma
Overexpression of anti-apoptotic BCL2 caused by the translocation of IgH-BCL2 is recognized as the molecular hallmark of FL.
The translocation occurs in 85-90% of all patients and is hypothesized to be one of the first genomic alterations to occur in the oncogenic cascade leading to the initiation of FL [49]. However, it has been shown that healthy individuals may have B-cells harbouring the translocation [33], suggesting that the translocation alone is insufficient or not required (10-15% of FL patients do not harbour it) for lymphomagenesis.

To this end, additional studies have revealed tumour heterogeneity in FL beyond the IgH-BCL2 translocation. For instance, TNFRSF14 (in the 1p36 region) is frequently mutated and is a target of heterozygous deletions and copy-neutral loss of heterozygosity [50]. Similarly, frequent deletions in the tumour suppressor TNFAIP3 [51] have been implicated in FL pathogenesis. Comprehensive high-throughput sequencing studies [6, 7] have also implicated mutations in the chromatin modifying genes KMT2D, CREBBP, EZH2, EP300, and MEF2B as a defining hallmark in FL.

1.4.3.2 Clinical Prognosis and Prognostic Factors
Due to the indolent nature of FL, not all patients will demonstrate symptoms of the disease. As such, the recommended treatment is dependent upon the aggressiveness of the patient’s symptoms [35]. Typically, 1) limited stage (stage I or II) patients will be treated with radiotherapy with a curative intent; 2) advanced stage (stage III or IV) asymptomatic, low-tumour burden patients are placed under a watch and wait cycle and sometimes given 4 rounds of rituximab for disease control; 3) advanced stage, high-tumour burden patients are treated with rituximab and chemotherapy (e.g. rituximab and bendamustine in British Columbia) (Figure 1.6). However, although modern therapies can produce a median overall survival of more than 10 years, FL remains a significant clinical burden as it is an incurable disease and most patients will eventually suffer from disease progression [52]. In particular,
In particular,11the prognosis of FL patients is often marked by two clinical extreme events.FL Diagnosis Median Survival of 10 years Transformation     Radiotherapy Advanced stage (III or IV) and high-tumour-burden “Watch and Wait” Limited Stage  (I or II) Advanced stage (III or IV) and low-tumour-burden Rituximab + Chemotherapy (e.g. Rituximab + Bendustime)   Early Progression 45% of patients;  2-3% of patients per year 20% of patients within 2 years of treatment   Late Progression Figure 1.6: Clinical Prognosis of FL patients.Firstly, patients may experience a histological transformation of their original FL into ahigh-grade, aggressive lymphoma subtype (most commonly DLBCL and Burkitt lymphoma)[53]. This transformation event occurs in up to 45% of patients (2-3% of patients per year)and is associated with poor clinical outcomes, with patients surviving only 1.2-1.7 yearspost-transformation [54]. The second clinical extreme event is early disease progressionwhere patients will experience treatment resistance within 2 years of receiving therapy[55]. This occurs in approximately 20% of patients with such patients only achieving a 50%overall survival rate at 5 years.The most widely used prognostic risk factor is the Follicular Lymphoma InternationalPrognostic Index (FLIPI) which encompasses 5 clinical factors (age, stage, hemoglobin,number of nodal sites, serum lactate dehydrogenase levels). More recently, the integrationof genomic data (mutational status of EZH2, ARID1A, MEF2B, EP300, FOXO1, CREBBP,and CARD11) has been integrated with the FLIPI to generate a clinicogenetic risk modelthat demonstrated superior prognostic capability over just the FLIPI alone (m7-FLIPI) [56].Additionally, a recent study has suggested that the m7-FLIPI is also able to predict earlyprogression [57].Whether it is possible to predict histological transformation still remains an openquestion. 
Reported predictive factors include advanced stage (III, IV, or I/II with B symptoms or bulky disease) [58], incomplete remission, low serum albumin, high beta-2 microglobulin level [59], high-risk FLIPI, and IPI [60]. Recent genomic studies have contributed to a better understanding of the genetic underpinnings of transformation, but have failed to identify any consistent genetic factors that can definitively predict transformation. As a result, prediction of histological transformation still remains a challenge.

1.4.4 Hodgkin Lymphoma
Hodgkin lymphoma (HL) is the most common form of lymphoma affecting individuals under the age of 30 in the western world and accounts for 11% of all malignant lymphomas [61]. The most common form of HL is classical Hodgkin lymphoma (cHL), which accounts for ~95% of HL cases and is characterized by malignant mononucleated Hodgkin and multinucleated Reed-Sternberg cells (HRS) [62]. The other, less common form of HL is nodular lymphocyte predominant Hodgkin lymphoma (NLPHL), comprising ~5% of HL cases, which is characterized by the presence of lymphocyte-predominant (LP) cells.

In both of these forms, a defining feature, in comparison to other lymphoid cancers, is the low prevalence of malignant HRS or LP cells, which represent only a minor portion (~1%) of the tumour mass. As such, the bulk of the tumour mass (~99%) is comprised of a spectrum of infiltrating immune cells (e.g. eosinophils, lymphocytes, macrophages) that form an extensive pro-tumour immune microenvironment [63]. The microenvironment composition along with the prevalence of HRS cells forms the basis of a classification system in which cHL is further subclassified into 4 histological subtypes: 1) nodular sclerosis, 2) mixed cellularity, 3) lymphocyte rich, and 4) lymphocyte depleted. For the remainder of this thesis, when HL is discussed, it is the cHL subtype that is being referenced.

1.4.4.1 Pathogenesis of Classical Hodgkin Lymphoma
Whereas large scale sequencing efforts (e.g.
The Cancer Genome Atlas [64], International Cancer Genome Consortium [65]) have comprehensively profiled the genomic landscapes of various types of cancers, the genomic architecture of HRS cells remains largely unknown due to the paucity of these cells in the tumour masses.

To date, most studies have focused on cell lines and laser capture microdissection of HRS cells. These studies collectively find the majority of genomic alterations to affect two major signalling pathways [62]: 1) nuclear factor-κB (NF-κB), and 2) JAK/STAT. Genomic alterations affecting the NF-κB pathway predominantly include frequent amplifications of REL [66], and inactivating somatic mutations of its inhibitors NFKBIA, NFKBIE, and TNFAIP3 [62], which result in constitutive activation of this pathway. The JAK/STAT signalling pathway is also constitutively activated through genomic amplifications of JAK2 [67] and inactivating mutations of its inhibitors SOCS1 [68] and PTPN1 [69].

Additionally, there have been studies reporting genomic alterations (amplifications and translocations) in the 9p24.1 locus that lead to the overexpression of two key programmed death ligands (PD-L1 and PD-L2) [70, 71]. These ligands serve as key immune checkpoint regulators for mediating immune responses. Specifically, their overexpression leads to T-cell anergy, implicating immune evasion as a novel mechanism in cHL.

Recently, through the combination of flow sorting and low input exome sequencing, Reichel et al. [72] produced the first ever exome sequencing libraries of primary HRS cells. This study identified frequent (70%, 7 out of 10 patients) inactivating mutations in B2M, supporting previous findings in cell lines [73], along with other mutations not previously implicated in cHL.
This suggests that, beyond the two aforementioned constitutively activated pathways, there remains undiscovered genomic heterogeneity that could further contribute to the pathogenesis of cHL.

Finally, latent Epstein-Barr virus (EBV) infection of HRS cells occurs in ~40% of cHL cases, predominantly in the mixed cellularity and lymphocyte-depleted subtypes [74]. This infection results in all HRS cells harbouring the virus, suggesting that EBV infection is an early event and likely plays a role in the pathogenesis of cHL. Evidence in support of this notion includes the expression of the viral LMP1 and LMP2 membrane proteins, which leads to active TNF receptor and BCR signalling respectively, providing two key signals for the survival of B-cells [74]. Additionally, all samples that contain destructive IgV mutations are EBV+, suggesting that EBV provides a mechanism to circumvent cell death [75]. These results strongly suggest that EBV infection plays a key role, at least in a subset of patients, in the pathogenesis of cHL.

1.4.4.2 The Classical Hodgkin Lymphoma Microenvironment

As the tumour mass is largely comprised of an immune infiltrate, it is increasingly recognized that these cells are critical partners to the HRS cells in aiding the pathogenesis of cHL. Through immunohistochemistry (IHC) staining and flow cytometry, it is now established that the microenvironment is largely comprised of CD4+ and CD8+ T-cells, TH1 and TH2 cells, TFH cells, FOXP3+ T cells, TFR cells, B-cells, plasma cells, eosinophils, macrophages, and mast cells (Figure 1.7) [5].

Figure 1.7: cHL Microenvironment. This is Figure 1 adapted by permission from Macmillan Publishers Ltd: Nat Rev Cancer (Scott, D. W., & Gascoyne, R. G. [5]), copyright (2014).

The composition of this microenvironment is facilitated through a wide array of cytokines and chemokines secreted by HRS cells (e.g.
CCL5, CCL17, Galectin-1, CX3CL1) along with the various receptors on non-neoplastic cells, allowing for crosstalk between these intrinsic and extrinsic elements of the tumour [76]. This means that genetic alterations in immuno-modulating genes may have a drastic effect on the shaping of the microenvironment, although this remains to be fully proven.

Traditionally, the deconvolution of the constituent elements of the microenvironment has been done using IHC or flow cytometry. The advent of high-throughput transcriptome-wide expression assays (e.g. Affymetrix microarrays) has afforded researchers the ability to perform expression profiling on whole tissue biopsies [77–79]. This approach effectively uses gene expression as a surrogate for IHC, acting as a “molecular microscope” to interrogate the microenvironment (Figure 1.8). Given the increasing recognition of the importance of the microenvironment, more work will be needed to establish microenvironment biomarkers and opportunities for targeting them.

Figure 1.8: Molecular Microscope Approach for Profiling the cHL Microenvironment. [Heatmap: gene expression profiling of whole tissue biopsies, with cell-type signatures for B cell, HRS, T-cell, NK, macrophage, and neutrophil.] The image on the right is Figure 1 adapted by permission from Macmillan Publishers Ltd: Nat Rev Cancer (Scott, D. W., & Gascoyne, R. G.
[5]), copyright (2014).

1.4.4.3 Clinical Prognosis and Prognostic Factors

cHL is widely regarded as a model of success for cancer treatment, with combined modality and multi-agent chemotherapy resulting in > 80% of patients being cured [80] (Figure 1.9). In particular, patients with limited stage disease (non-bulky stage IA and stage II), accounting for 30-35% of all new cHL diagnoses, have cure rates of 90-95%. For advanced stage patients (stage IB and stage IIB bulky disease or stage III and IV), before the advent of multi-agent chemotherapy, the disease was fatal for nearly all patients within 5 years [81]. With the chemotherapy regimen ABVD (adriamycin, bleomycin, vinblastine, dacarbazine) as the current standard of care for advanced stage disease, these patients now have durable remission rates of 60% to 80% [80]. In such advanced stage patients, the most widely used prognostic marker for guiding treatment decisions is the International Prognostic Score (IPS), which assigns patients a score between 0 and 7, with > 3 being classified as high-risk [82]. In addition, one promising new biomarker is macrophage content in the tumour microenvironment (TME), with a higher amount being associated with inferior prognosis [79]. This finding suggests that the microenvironment can indicate patient prognosis. Further efforts have attempted to expand on this, such as a 26-gene expression-based prognostic model [83] that encapsulates macrophage, cytotoxic T-cell, and natural killer cell signals.

Despite these improvements, a proportion of advanced stage patients either remain refractory to first-line treatment (10%) or relapse following first-line treatment (20-30%) [84]. The current standard of care for these patients (second-line treatment) is high-dose chemotherapy (HDCT) followed by autologous stem-cell transplantation (ASCT) to re-establish a patient’s bone marrow with their own healthy stem cells.
This therapy cures only 50% of patients [85], with the remaining patients suffering from disease recurrence, usually within the first year after treatment [86]. Such patients are left with limited therapeutic options and eventually succumb to the disease (median survival of 2.4 years).

Figure 1.9: Clinical Prognosis of cHL patients. [Flowchart: HL diagnosis; first-line treatment (ABVD): 65-70% cured, 30-35% relapse; second-line treatment (HDCT-ASCT): 50% cured, 50% relapse/death.]

1.4.4.4 Novel Therapies

Novel therapies for the treatment of primary and refractory/relapsed cHL are emerging, with brentuximab vedotin (BV) and PD-1 blockade leading the way [87].

BV is an antibody-drug conjugate comprising an anti-CD30 antibody conjugated to a microtubule toxin, monomethyl auristatin E (MMAE) [88]. It is designed to selectively bind to CD30+ cells, resulting in its internalization and the release of MMAE via proteolytic cleavage. MMAE disrupts the microtubule network, resulting in cell death [89]. BV represents a promising targeted therapy in cHL because HRS cells express the surface membrane antigen CD30. Chen et al. demonstrated that BV treatment of a cohort of refractory/relapsed cHL patients who had failed ASCT resulted in 13 out of 34 (38%) patients achieving complete responses and remaining in remission for over 5 years [90]. BV has also been considered in combination with ABVD for first-line treatment, but further clinical trials are required to determine its appropriate use in this setting [91]. Given the history of dismal outcomes for patients with relapsed cHL, these results demonstrate that these patients still have the potential of being cured.

Another emerging novel therapy is immune checkpoint inhibition, and PD-1 blockade in particular. The PD-1 receptor on T-cells engages with its ligands, PD-L1 and PD-L2, leading to T-cell anergy.
In cHL, it is known that overexpression of these ligands, whose genes are located in the 9p24.1 locus, is the result of frequent copy number amplifications [70, 92] and genomic rearrangements [71] in HRS cells. The overexpression of these ligands on the cell surface leads to immune evasion, but also suggests that the disease may be susceptible to immune reactivation via blocking of the PD-1 receptor. In this light, a large amount of drug development has been focused on designing antibodies directed against PD-1, two of which are Nivolumab and Pembrolizumab. Ansell et al. tested the efficacy of Nivolumab in a group of 23 refractory/relapsed cHL patients and demonstrated responses in 20 out of 23 (87%) patients [93]. Most of these patients had received ABVD as their first-line therapy, but then failed BV and/or ASCT as their second-line treatment. Another recent study showed similar results, in which 53 out of 80 patients achieved an objective response with Nivolumab following failure of both ASCT and BV [94]. Similarly, for Pembrolizumab, Armand et al. demonstrated an overall response rate of 65% in a cohort of 31 patients who had failed ASCT and/or BV [95]. Therefore, given the limited number of therapeutic options for patients who fail traditional second-line therapy, PD-1 blockade serves as a novel and promising option.

1.5 Problem Statement and Thesis Overview

Despite improvements in the standard of care, a subset of B-cell lymphoma patients suffer from disease progression. For instance, in DLBCL, 30-40% of patients remain incurable following first-line treatment [24]. In FL, nearly all patients suffer from disease progression, with up to 45% of patients experiencing histological transformation of their original lymphoma to a more aggressive form of lymphoma (typically DLBCL). Moreover, 20% of patients experience early treatment failure (within 2 years of treatment) [35].
In cHL, 10% of patients remain refractory to first-line treatment and 20-30% relapse following first-line treatment [84]. Additionally, 50% of patients with refractory/relapsed disease experience failure of second-line treatment and ultimately succumb to the disease. As such, there is still a need to obtain a refined understanding of the pathogenesis and disease progression of B-cell lymphomas.

While comprehensive landscape studies of B-cell lymphoma patient samples have uncovered a vast degree of inter-tumour heterogeneity [6, 7, 72], few studies have correlated these patterns and associated genomic alterations with treatment failure. Additionally, the majority of studies have principally focused on primary specimens, with little to no ascertainment of intra-tumour heterogeneity and how it may change over time. What is needed is a fundamental understanding of the relationship between intra-tumour heterogeneity and disease progression, which will allow us to identify genetic events and patterns that could be markers of progressive disease. Finally, the role of the tumour microenvironment in disease progression is relatively neglected in comparison to that of the malignant cells. How microenvironment heterogeneity, both at the inter-tumour and intra-tumour levels, correlates with progressive disease remains poorly understood. Ultimately, a richer understanding of the association of these forms of heterogeneity with disease progression will aid in the determination of precise therapeutic approaches for each patient.

In this thesis work, I aimed to uncover novel genomic and phenotypic features across a spectrum of B-cell lymphomas, with the overarching hypothesis that previously uncharacterized inter-tumour, intra-tumour, and tumour microenvironment heterogeneity is associated with disease progression and treatment outcome.
This hypothesis is investigated and addressed in three major data chapters of this thesis (2, 3, and 4), each of which is focused on studying different aspects of heterogeneity and their association with progressive disease in B-cell lymphomas. In particular, Chapter 2 reports on inter-tumour heterogeneity in DLBCL primary specimens. Chapters 3 and 4 are focused on studying intra-tumour and tumour microenvironment heterogeneity, and how these change over time, in the context of FL and cHL respectively.

Chapter 2

An RCOR1 Loss-Associated Gene Expression Signature Identifies a Prognostically Significant DLBCL Subgroup¹

2.1 Introduction

Diffuse large B-cell lymphoma (DLBCL), the most common type of non-Hodgkin lymphoma (NHL), accounts for approximately 30-40% of all new lymphoma cases. DLBCL comprises at least two major molecular subtypes reflecting the phenotype of the hypothetical cell of origin (COO): (1) activated B-cell like (ABC), derived from B cells exiting (or poised to exit) the germinal centre, and (2) germinal centre B cell-like (GCB), derived from B cells found in the germinal centre [24]. The advent of high-throughput sequencing has led to the discovery of various somatic genomic mutations linked to the pathogenesis of DLBCL. These include somatic copy number alterations (sCNAs) such as arm-length deletions of chromosome 6q, amplifications of chromosomes 3 and 18q [96, 97], focal deletions of CDKN2A [48], and amplifications of REL [98].
Recurrent somatic point mutations in CREBBP, EP300, EZH2, and KMT2D [6–8] have also been identified, collectively implicating disruption of chromatin modification as a defining feature.

Despite the increasing knowledge base of genomic and transcriptomic abnormalities, a major clinical challenge persists: approximately 40% of DLBCL patients receiving standard therapy of rituximab plus cyclophosphamide, doxorubicin, vincristine, and prednisolone (R-CHOP) are not cured [99]. The biological correlates of treatment failure are not well understood, and the variability in clinical outcomes cannot be explained by the current knowledge of the mutational landscape. Thus, additional biologic heterogeneity and related associations with treatment outcomes remain to be discovered.

¹In this chapter, I describe a study of inter-tumour heterogeneity and its implications on disease progression in a cohort of diffuse large B-cell lymphoma patients. This chapter is a modified version of material published in “Chan, FC. et al. An RCOR1 loss-associated gene expression signature identifies a prognostically significant DLBCL subgroup. Blood (2015).”

I set out to uncover prognostically significant molecular subgroups of DLBCL through simultaneous interrogation of the genomic and transcriptomic dimensions of tumours from a uniformly treated population of patients for whom comprehensive clinical follow-up data were available. I sought to identify sCNAs with measurable impact on gene expression of transcriptional networks as primary candidates for functional and prognostic alterations [100]. While similar datasets have been generated by other groups [96, 97, 101–103], most studies have been restricted by small cohorts, low resolution data, and/or lack of association with clinical outcomes in the current era of R-CHOP therapy. The most recent integrative study [104] presented the largest and highest resolution dataset to date, and focused on the synergy between aberrations in p53 and cell cycle components.
However, associations between gene-centric sCNAs, clinical outcomes, and gene expression patterns were not described. I identified novel focal and recurrent sCNAs in RCOR1 that are associated with a prognostically significant gene expression signature. This RCOR1 loss-associated gene signature identifies a subgroup of DLBCL patients with unfavourable overall survival (OS), both in the discovery cohort and in an independent cohort [105]. Taken together, I have identified RCOR1 as a critically dysregulated gene in DLBCL pathogenesis and a consequent gene expression signature as a novel risk-associated molecular profile in R-CHOP treated patients.

2.2 Materials and Methods

2.2.1 Patient Cohorts

For this study, a cohort of 151 primary treatment-naive DLBCL samples (Figure 2.1) was selected from the tissue repository of the Centre for Lymphoid Cancer at the British Columbia Cancer Agency (“BCCA study cohort”) according to the following criteria: 1) availability of a fresh-frozen tissue specimen at initial diagnosis (pre-treatment specimen), 2) confirmed de novo DLBCL diagnosis upon pathology review, 3) HIV negativity, and 4) age ≥ 16 years.

High-resolution copy number microarrays (Affymetrix SNP 6.0) were available for all BCCA study cohort samples, of which 148 passed quality control (see Section 2.2.2). A subset, comprising 139 of these patients, were treated with combined immuno-chemotherapy (R-CHOP / R-CHOP-like therapy, “the R-CHOP cohort”) and were used for outcome correlation analysis. Analysis of clinical characteristics (age, stage, performance status, extra nodal sites, lactate dehydrogenase, and the derived IPI score) indicated that the R-CHOP cohort (n = 139) and BCCA study cohort (n = 148) were representative of the DLBCL R-CHOP-treated population at the BCCA (Table 2.1, Table 2.2). Matching previously published gene expression profiles on 91 samples were ascertained by RNA-seq analysis [6], of which 85 samples were treated with R-CHOP / CHOP-like therapy.

Figure 2.1: Overview of the Cohorts. [Flowchart of the BCCA study cohort, the R-CHOP cohort, the rediscovery cohort (Lenz et al. 2008), and rediscovery cohort 2 (Monti et al. 2012), with the number of samples retained at each step.]

A rediscovery cohort of 233 R-CHOP treated patients from Lenz et al. [105] was used. A total of 38 samples from the BCCA study cohort overlapped with the rediscovery cohort and were removed for outcome correlation analysis. A secondary independent rediscovery cohort of 91 patients from Monti et al. [104] was also used. Of these 91 patients, 90 passed quality control (Section 2.2.20), of whom 67 were R-CHOP treated and used for outcome correlation analysis.

2.2.2 Affymetrix SNP 6.0 Quality Control

A total of 151 Affymetrix SNP 6.0 microarrays were generated according to the manufacturer’s instructions (Affymetrix, Santa Clara, CA). The Affymetrix Power Tools package (v1.15.1) was used to check whether the SNP 6.0 microarrays met quality control criteria as suggested by the Affymetrix Genotyping Console v4.1.3. 150 out of 151 SNP 6.0 microarrays (99.3%) passed quality control by having a contrast quality control (CQC) score of > 0.4 and a quality control rate of > 0.86. In addition, the batch passed quality control, as the mean CQC for samples with CQC > 0.4 was 2.17 (< 1.7 indicates a problematic batch).

Table 2.1: Clinical Characteristics of the R-CHOP-treated DLBCL Study Cohort. Abbreviations: BCCA, British Columbia Cancer Agency; LDH, lactate dehydrogenase; IPI, International Prognostic Index; OS, Overall Survival; PFS, Progression-Free Survival; DSS, Disease-Specific Survival; R-CHOP, Rituximab plus Cyclophosphamide, Doxorubicin, Vincristine, and Prednisolone. ᵃData were unavailable for 6 patients. ᵇData were unavailable for 9 patients. ᶜData were unavailable for 10 patients. ᵈData were unavailable for 27 patients. ᵉData were unavailable for 27 patients.

Clinical Variable         DLBCL Study R-CHOP         DLBCL BCCA R-CHOP              P
                          Treated Cohort (n = 139)   Treated Population (n = 554)
Males                     86 (62%)                   356 (64%)                      0.671
Age > 60                  83 (60%)                   324 (59%)                      0.868
Stage III/IV              75 (54%)ᵃ                  325 (59%)                      0.704
Performance Status > 1    41 (32%)ᵇ                  185 (33%)                      0.763
Extra Nodal Sites > 1     17 (13%)ᶜ                  151 (27%)                      < 0.001
LDH High                  56 (50%)ᵈ                  269 (49%)                      0.861
IPI ≥ 3 (high-risk)       46 (41%)ᵉ                  236 (43%)                      0.765
Follow-up, years
  Median                  5.11                       7.69
  Range                   0.09 - 11.05               0.1 - 13.21
Outcome, 5 years
  OS                      72%                        69%                            0.630
  PFS                     68%                        63%                            0.243
  DSS                     75%                        73%                            0.899

2.2.3 Processing of Affymetrix SNP 6.0 Data

The PennCNV-affy [106] protocol was used to pre-process the Affymetrix SNP 6.0 data, returning log2 ratio (logR) and B-allelic frequency (BAF) values for each CNV and SNP probe in each sample. The logR and BAF values for only the SNP probes were used as input into OncoSNP [107] (v1.3), which was run using the following command:

    ./oncosnp --gcdir quantisnp2/b37 --sampleid $(SAMPLEID) \
        --tumour-file $(INPUT_SAMPLE_FILE) --hgtables hgTables_b37.txt \
        --paramsfile hyperparameters-affy.dat --levelsfile levels-affy.dat \
        --trainingstatesfile trainingStates.dat \
        --tumourstatesfile tumourStatesCLL.data --subsample 10 --emiters 10 \
        --fulloutput --stromal --intratumour

The hgTables_b37.txt, *.dat, and tumourStatesCLL.data files were provided in the original download of the OncoSNP software. The quantisnp2/b37 data were downloaded from ftp://ftp.stats.ox.ac.uk/pub/yau/quantisnp2/download/ as suggested by the OncoSNP website.
The $(INPUT_SAMPLE_FILE) is the output from the PennCNV-affy protocol.

Table 2.2: Clinical Characteristics of the DLBCL SNP 6.0 Study Cohort. Abbreviations: BCCA, British Columbia Cancer Agency; LDH, lactate dehydrogenase; IPI, International Prognostic Index; R-CHOP, Rituximab plus Cyclophosphamide, Doxorubicin, Vincristine, and Prednisolone. ᵃData were unavailable for 6 patients. ᵇData were unavailable for 9 patients. ᶜData were unavailable for 10 patients. ᵈData were unavailable for 27 patients. ᵉData were unavailable for 27 patients.

Clinical Variable         Discovery Study       DLBCL BCCA R-CHOP              P
                          Cohort (n = 148)      Treated Population (n = 554)
Males                     93 (63%)              356 (64%)                      0.823
Age > 60                  89 (60%)              324 (59%)                      0.788
Stage ≥ 3                 80 (56%)ᵃ             325 (59%)                      0.685
Performance Status > 1    45 (32%)ᵇ             185 (33%)                      0.899
Extra Nodal Sites > 1     20 (15%)ᶜ             151 (27%)                      0.003
LDH High                  61 (50%)ᵈ             269 (49%)                      0.787
IPI ≥ 3 (high-risk)       51 (42%)ᵉ             236 (43%)                      0.922

Samples predicted to have an overall normal content ≥ 50% were re-run using the exact same parameters but without the “--stromal” option. The rationale for re-running without this option was that such a sample could either be high in normal contamination or have a quiescent copy number genome. To distinguish between these two scenarios, hematoxylin and eosin staining (H&E; according to routine pathology practices) was performed to assess the tumour content of any samples with available tissue. Two additional samples were excluded due to having high normal content (> 50%), as predicted by OncoSNP, and < 10% tumour content, as assessed by H&E, resulting in a final study cohort of 148 samples. The OncoSNP predicted tumour content with corresponding H&E, when applicable, can be found in Table A.3.

For each sample, OncoSNP performs the copy number analyses at two different baseline ploidy configurations, returning results for each configuration.
The predicted results from the ploidy configuration with the higher probability (designated as PloidyNo1) were taken for each sample, with the exception of 02-16987, for which the results from the lower probability configuration (designated as PloidyNo2) were chosen. The karyotype of this sample was confirmed to be tetraploid, which the PloidyNo2 configuration supported.

2.2.4 Generation of Copy Number Segments

OncoSNP segments the copy number data into regions (defined as segments) that are predicted to be in one of 11 possible copy number states:

1. homozygous deletion (HOMD)
2. heterozygous/hemizygous deletion (HETD)
3. neutral (NEU)
4. 3n duplication (3N_GAIN)
5. 4n monoallelic duplication (4N_MONO_GAIN)
6. 2n somatic loss of heterozygosity (2N_SOMLOH)
7. 3n somatic LOH (3N_SOMLOH)
8. 4n somatic LOH (4N_SOMLOH)
9. 2n germline LOH (2N_GERMLOH)
10. 3n germline LOH (3N_GERMLOH)
11. 4n germline LOH (4N_GERMLOH)

Each segment is also associated with a rank (1-5) indicating the granularity of the predicted region, with 5 being the most granular (i.e. detection of focal copy number alterations). By default, segments may overlap if a more granular segment is encapsulated in a less granular segment (e.g. a segment of rank 5 may be found inside a segment of rank 2). To resolve overlaps, overlapping segments are subtracted from each other to form non-overlapping segments, with the highest-ranking segment taking precedence. The result is a continuous segment file with non-overlapping segments (Figure 2.2).

Figure 2.2: OncoSNP Segmentation Methodology. Panel A: An example of potential OncoSNP segmented results. Panel B: Following segmentation subtraction, segments no longer overlap.

2.2.5 Generation of Gene-centric Copy Number States and LogR Values

Gene-centric copy number calls (i.e. assignment of a copy number status for each gene) were generated by finding all segments overlapping a gene and assigning the copy number state of the segment that had the highest rank. If multiple overlapping segments shared the same rank, then the most severe copy number state was assigned. The severity of states was ranked in this order (from least to most severe):

1. NEU
2. 2N_GERMLOH
3. 3N_GERMLOH
4. 4N_GERMLOH
5. 2N_SOMLOH
6. 3N_DUP
7. 3N_SOMLOH
8. 4N_MONODUP
9. 4N_SOMLOH
10. HETD
11. HOMD

The gene-specific copy number calls across all samples were merged together to form a gene call matrix. For downstream analyses, deletion states were considered to be HETD and HOMD. Neutral states were considered to be NEU, 2N_SOMLOH, 2N_GERMLOH, 3N_GERMLOH, and 4N_GERMLOH. Finally, gain states were considered to be 3N_GAIN, 4N_MONO_GAIN, 3N_SOMLOH, and 4N_SOMLOH. The states 4N_MONO_GAIN and 4N_SOMLOH were considered to be high-level amplifications (HLAMP).

Gene logR values were generated by taking the mean logR value of all the SNP probes overlapping the gene after adjusting for the OncoSNP logR shift. The gene logR values across all samples were merged together to form a gene logR matrix.

For both the gene copy number and gene logR calculations, Ensembl (v72) gene models were used as a scaffold for genomic coordinates.

2.2.6 Filtering of Copy Number Polymorphisms

Potential copy number polymorphisms (CNPs), defined as germline copy number variants, were filtered out first by using a normal breast tissue CNP mask [108] that contained high-resolution segments of CNPs. Segments that overlapped in genomic location with any CNP segment by at least 25% were considered CNP segments and treated as neutral. A second CNP mask was generated using peripheral blood lymphocytes [70], and the same criteria were used to filter out CNP segments. Finally, all focal and recurrent aberrations were manually inspected in the Integrative Genomics Viewer. Genes with highly recurrent aberration segments occurring in the exact same genomic loci (i.e.
same genomic start and end coordinates) were classified as CNPs.

2.2.7 GISTIC Analysis

GISTIC (v2.0.12) [109] was run on the OncoSNP segmented data to identify minimal commonly deleted regions. To convert the OncoSNP segmentation data into a format suitable for GISTIC, all deleted segments were set to a logR value of -0.11 and all gained segments were set to a logR value of 0.11. Additionally, all segments were modified to have start and end coordinates that were SNP 6.0 probe positions. This modified segment file was then used as the “-cnv” file input to GISTIC using the default parameters, with the “deletions_threshold” and “amplifications_threshold” parameters set at -0.1 and 0.1 respectively.

2.2.8 Fluorescence in Situ Hybridization Experiments

Fluorescence in situ hybridization (FISH) analysis was performed on fixed cell suspensions (methanol/acetic acid) or nuclei extracted from formalin-fixed paraffin-embedded (FFPE) samples according to standard protocols as described elsewhere, using in-house bacterial artificial chromosome (BAC) clones [110]. BAC clones were selected according to human genome (hg19) alignment and directly labeled by nick translation according to a standard protocol (Abbott Molecular Nick translation kit) (Table 2.3). All BAC clones had previously been identified and verified by hybridization to normal metaphases to confirm the expected site of chromosomal localization. For the purpose of this study, the cut-off value was set at > 5% for cell suspensions and > 30% for extracted nuclei, scoring a minimum of 200 interphase cells/case. Slides were analyzed using a Carl Zeiss Axio Imager Z2 microscope equipped with a Plan Apochromat 100x/1.4 oil objective. The images were acquired using a CCD camera and Metasystems software (v5.5.1).

Table 2.3: BAC probes used for FISH Validations. ¹Target genes of interest. ²Control genes for target genes of interest.
Start and end are genomic coordinates for genome build hg19.

Gene Name   Location   BAC           Start         End
PTEN²       10q24.1                  89,623,195    89,728,532
                       CTD-2553L21   89,628,330    89,802,495
LCOR¹       10q23.31                 98,592,017    98,724,198
                       CTD-2328D7    98,611,168    98,744,963
SOCS4²      14q22.1                  55,493,844    55,516,206
                       CTD-2059C4    55,485,070    55,614,826
RCOR1¹      14q32.32                 103,058,996   103,196,913
                       CTD-2351D3    103,061,203   103,204,380
TP53²       17p13.1                  7,571,720     7,590,863
                       RP11-89D11    7,495,729     7,663,042
NCOR1¹      17p11.2                  15,933,408    16,118,874
                       RP11-459E6    15,829,328    15,988,876

2.2.9 RNA-seq Alignment, Filtering, and Gene Expression Analysis

RNA-seq libraries were aligned using the split-read aware aligner GSNAP (v2012-07-20) [111]. The default parameters were used, together with the novelsplicing parameter, the SNP 135 database, and the list of splicing involving known sites or known introns in hg19. All aligned GSNAP libraries were then sorted and filtered using SAMtools [112] to remove reads that were non-primary alignments or that failed platform/vendor quality checks. Reads were summarized over Ensembl gene models using the GenomicFeatures R package [113].

Gene expression values were generated using the metric reads per kilobase of transcript per million mapped reads (RPKM) and then combined across all samples to form a gene expression matrix. For downstream analysis, the gene expression matrix was log2 transformed and then quantile normalized.

To generate the list of genes that were co-expressed with RCOR1, I first removed genes with RPKM values < log2(5) in > 30% of samples. Then I performed a pairwise Spearman correlation test and adjusted for multiple testing by controlling the FDR.
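The gene filter, Spearman correlation, and FDR adjustment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: the filter is assumed to be applied to log2-transformed values, the FDR procedure is assumed to be Benjamini-Hochberg (the text only says "FDR"), the p-values for each correlation are assumed to come from a statistics library, and all function names are hypothetical.

```python
import math


def passes_expression_filter(log2_rpkm, floor=math.log2(5), max_low_fraction=0.30):
    """Keep a gene unless more than 30% of samples fall below log2(5)."""
    low = sum(1 for value in log2_rpkm if value < floor)
    return low / len(log2_rpkm) <= max_low_fraction


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation of the ranks (inputs must not be constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for offset, idx in enumerate(reversed(order)):
        rank = m - offset
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In use, each gene that passes the filter would be correlated against the RCOR1 expression vector, and the resulting p-values would be adjusted together before the FDR cut-off is applied.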
Genes were considered to be co-expressed with RCOR1 if the FDR was < 0.0001.

2.2.10 Cis/Trans Correlation Analysis

Gene-centric cis-correlations, defined as copy number aberrations that correlate with gene expression, were identified by 1) performing a pairwise Spearman rank correlation test on the gene copy number logR values and gene expression values, and by 2) classifying the samples into deletions, neutral, and amplifications based on the gene copy number states (as defined previously) and performing a Kruskal-Wallis test on the gene expression values. All p-values were then corrected for multiple testing using FDR, and genes were classified as cis-correlated if they were significant (FDR < 0.1 and Spearman rho > 0) in either 1) or 2). Pathway enrichment was done on genes that were cis-correlated and had either 5% deletion and > 1 HOMD or 5% gain and > 1 high-level amplification.

2.2.11 xseq Analysis

For the xseq analysis [100], the gene-centric copy number state and expression matrices were used as input. The copy number state matrix was converted into a binary matrix by encoding highly significant aberrations (i.e. HOMD and HLAMP) as 1 and all other events as 0.

2.2.12 Immunohistochemistry on Primary Lymphoma and Reactive Tonsil Samples

Immunohistochemistry (IHC) was performed on formalin-fixed paraffin-embedded (FFPE) samples of 68 DLBCL cases with matching copy number data, of which 40 had matching gene expression data. A reactive tonsil specimen was used to assess the staining pattern in the germinal centre. Four µm sections of tissue microarrays (TMA) or whole tissue sections were stained with an anti-RCOR1 antibody (clone S72-8, LSBio, Seattle, Washington; dilution 1:500) using routine protocols for automated procedures on the Ventana Benchmark XT (Ventana Medical Systems, Tucson, Arizona).

Scoring was performed by an experienced hematopathologist (Dr.
Anja Mottok), and the percentage of tumour cells positive for RCOR1 was recorded in 10% increments. The kernel density of the percentage of tumour cells positive for RCOR1 was plotted, and a threshold of 30% was defined based on the lowest point of the density. Patients were classified as RCOR1 IHC negative if < 30% and RCOR1 IHC positive if ≥ 30%. IHC correlation with deletion status was performed using a Fisher’s exact test comparing the odds of deletion versus neutral in the RCOR1 IHC negative group to those in the RCOR1 IHC positive group. IHC was correlated to gene logR and gene expression using a Wilcoxon test.

2.2.13 Mutual Exclusivity and Co-occurrence Analyses

To test for the mutual exclusivity or co-occurrence of mutations between two genes, a one-sided Fisher’s exact test was used twice: once to test whether the two genes were mutually exclusive (odds ratio < 1), and once to test whether their mutations co-occurred (odds ratio > 1). Multiple test correction, using FDR, was performed on all the p-values within each one-sided Fisher’s exact test. Results were considered significant at FDR < 0.1.

2.2.14 Cell Line Selection for In Vitro Knockdown

A total of 12 different B-cell lymphoma cell lines were considered for shRNA-mediated RCOR1 knockdown. Of these 12 cell lines, KM-H2 (classical Hodgkin lymphoma-derived) and Raji (Burkitt lymphoma-derived) showed the highest knockdown of RCOR1 at both the gene expression (qRT-PCR) and protein (western blot) levels. Therefore, cells from these two cell lines were carried forward for gene expression analysis.

2.2.15 Virus Production, Transduction and Transcript Expression

Lentivirus particles from different sources were used to transduce two suspension cell lines, KM-H2 and Raji.
For RCOR1 mRNA interference in both cell lines the Open Biosystems clones V3LHS_373654 172_0829-G-4 (Cl2) and V2LHS_87301 172_0185-E-3 (Cl5) were used; shRNAmir constructs for RCOR1 and pGIPZ non-silencing lentiviral shRNA control virus particles were generated from glycerol stocks and co-transfected with an optimized packaging mix according to the manufacturer’s protocol (Trans-Lentiviral Packaging Mix, Open Biosystems, containing pTLA1-Pak, pTLA1-Enz, pTLA1-Env, pTLA1-Rev and pTLA1-TOFF) by lipid-mediated transfection (Arrest-In, Open Biosystems, Thermo Scientific, Nepean, Ontario, Canada). Viral titres were assayed using the Clontech Lenti-X p24 Rapid Titre kit (Clontech, Mountain View, CA, USA).

The end result was two RCOR1 knockdowns for KM-H2 (KM-H2 Cl2 KD, KM-H2 Cl5 KD) with a matching non-silencing control (KM-H2 NS). For Raji, a single RCOR1 knockdown (Raji OB KD) was available, using Cl2, with a matching non-silencing control (Raji OB NS). In Raji, the shRNAmir Cl5 did not generate sufficient RCOR1 knockdown to proceed with further downstream analysis. Therefore, Sigma’s MISSION shRNA lentiviral transduction particles (TRCN0000147106) and non-target shRNA control transduction particles were used for RCOR1 knockdown in the Raji cell line. This resulted in a MISSION shRNA RCOR1 knockdown (Raji M KD) with a matching non-silencing control (Raji M NS). In total, 4 biological RCOR1 knockdown replicates (KM-H2 Cl2 KD, KM-H2 Cl5 KD, Raji OB KD and Raji M KD) along with 3 matching non-silencing controls (KM-H2 NS, Raji OB NS, Raji M NS) were available for downstream analyses.

Transductions were carried out following standard protocols using an MOI of 10, and predetermined puromycin selection was added 3 days after transduction. Transduced cells were cultured using standard techniques.
One week after selection was applied, matched control and RCOR1 transduced cells from all cell lines and transduction systems were collected, and RNA and protein were extracted using the Qiagen AllPrep DNA/RNA kit (Qiagen) and by M-PER extraction (Thermo Scientific), respectively. Knockdown efficiency was evaluated by measuring residual expression of the transcript by quantitative RT-PCR. Protein concentration was determined using the BCA assay (Thermo Scientific). Western blots were generated by wet transfer of 50 µg cell lysate following electrophoresis on a 4-12% Bis-Tris gradient gel (Life Technologies). RCOR1 was labeled using a rabbit monoclonal antibody (Abcam, ab183711), including B-actin detection as a loading control, and detected using ECL western blotting reagents (GE Healthcare).

2.2.16 Gene Expression Analysis of In Vitro Knockdown Cells

RNA-seq libraries were generated, as per previous protocols [6], for each of the in vitro RCOR1 knockdown clones (KM-H2 Cl2 KD, KM-H2 Cl5 KD, Raji OB KD, Raji M KD) and their matching non-silencing controls (KM-H2 NS, Raji OB NS, Raji M NS). Gene expression values were generated using the metric RPKM and then combined across all samples to form a gene expression matrix. The expression matrix was then log2 transformed and quantile normalized.

Differential expression was first performed within each B-cell line model. In Raji, I compared Raji OB KD and Raji M KD vs. Raji OB NS and Raji M NS. In KM-H2, I compared KM-H2 Cl2 KD and KM-H2 Cl5 KD vs. KM-H2 NS. Differential expression values were generated using the fold change difference metric. Specifically, for each gene the average expression in the non-silencing libraries was subtracted from the average expression in the knockdown libraries. A positive fold change difference would mean a higher expression in the knockdown libraries compared to the non-silencing libraries, and vice versa for a negative fold change difference.
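The fold change difference described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the thesis: the function name, sample labels, and expression values are hypothetical, and the sign convention (knockdown minus non-silencing) is taken from the statement that a positive value means higher expression in the knockdown libraries.

```python
from statistics import mean

def fold_change_difference(log2_expr, kd_samples, ns_samples):
    """Return, per gene, mean(KD) - mean(NS) on log2-scale expression values."""
    return {
        gene: mean(values[s] for s in kd_samples) - mean(values[s] for s in ns_samples)
        for gene, values in log2_expr.items()
    }

# Toy log2 expression values (hypothetical numbers, not thesis data).
log2_expr = {
    "GENE_A": {"KD1": 5.0, "KD2": 5.4, "NS1": 4.2},
    "GENE_B": {"KD1": 3.0, "KD2": 3.2, "NS1": 3.9},
}
fc_diff = fold_change_difference(log2_expr, ["KD1", "KD2"], ["NS1"])
# GENE_A comes out positive (higher in knockdown), GENE_B negative.
```

Because the inputs are already on the log2 scale, a difference of means corresponds to a log ratio, which is why a symmetric threshold around zero can be applied to the result.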
In addition, any genes that had an expression value below log2(5) in more than 30% of the samples were considered lowly expressed and removed from the analysis.

Genes were considered significantly differentially expressed if their fold change difference was ≥ 0.3 or ≤ -0.3. Following differential expression analyses within each B-cell line model, the fold change differences in Raji and KM-H2 were correlated using a Spearman rank test. Using only genes that were concordant in their fold change directionality, a list of differentially expressed genes was generated for Raji (n = 1209) and KM-H2 (n = 875). Pathway analyses were then performed on just the up-regulated genes (i.e. positive fold change difference) and just the down-regulated genes (i.e. negative fold change difference) separately for Raji and KM-H2. The lists of enriched pathways were then compared using Venn diagrams to show the consistency in the dysregulated pathways between the two B-cell line systems. Given the consistency in the fold change difference correlations along with the large overlap in the enriched pathways between Raji and KM-H2, I treated all the knockdowns in both Raji and KM-H2 as one group (n = 4) and their non-silencing controls as another group (n = 3). Fold change differences were re-calculated and a final list of differentially expressed genes (n = 1588) was generated using the fold change threshold of 0.3.

2.2.17 RCOR1 Loss-Associated Gene Signature Analysis

Genes classified as being differentially expressed in the in vitro RCOR1 knockdown experiments and genes co-expressed with RCOR1 were intersected, followed by filtering to only include genes which were concordant in their up- and down-directionality, resulting in the RCOR1 loss-associated gene signature (Table A.4). Next, unsupervised hierarchical clustering was performed on the BCCA RNA-seq cohort using the RCOR1 loss-associated gene signature. The Ward criteria were used for linking the clusters.
The sample cluster dendrogram was cut to form 3 distinct clusters (Figure 2.3). The cluster with the lowest average RCOR1 expression was defined as the RCOR1-low cluster. The cluster with the highest average RCOR1 expression was defined as the RCOR1-high cluster. The remaining cluster was defined as the unclassified cluster.

For the analysis of external cohorts, only the genes in the RCOR1 loss-associated signature were carried over; this strategy was therefore defined as “rediscovery” rather than validation. These genes were used to perform a de novo clustering in these cohorts, and the clusters were then labelled using the same methodology as in the study cohort (i.e. the cluster with the lowest average RCOR1 expression was defined as the RCOR1-low cluster).

Figure 2.3: RCOR1 Loss-Associated Gene Signature Clustering Methodology. This figure visualizes how the signature is used to derive the different expression clusters that go on for downstream analysis. (Workflow shown: the RCOR1 loss-associated gene signature (n = 233) is applied to the 91 RNA-seq DLBCL samples; the sample clustering dendrogram is cut; and the clusters with the lowest, medium, and highest average RCOR1 expression are labelled RCOR1-low, unclassified, and RCOR1-high, respectively.)

2.2.18 Correlating Deletions with the RCOR1 Loss-Associated Gene Signature

A one-sided Fisher’s exact test was performed to test for the enrichment of deletions in the RCOR1-low cluster compared with the RCOR1-high cluster, and also in the RCOR1-low and unclassified clusters compared with the RCOR1-high cluster. Samples that were not cis-correlated (i.e.
deletion did not induce a change in RCOR1 expression and indicated in black for the annotation track in Figure 2.20) were excluded from the analysis.

2.2.19 Determining the Expression Status of RCOR1, LCOR, and NCOR1

For the annotation tracks in Figure 2.20, Figure 2.22, and Figure 2.23, patients were dichotomized into RCOR1, LCOR and NCOR1 high and low expression using Gaussian mixture models. Specifically, in the BCCA cohort, due to RCOR1 and NCOR1 being cis-correlated, the parameters for a two-component Gaussian mixture model were empirically derived from deleted and neutral cases, corresponding to a low and high expression component, then the posterior was calculated for each sample. Samples were assigned to the low and high expression cluster based on a posterior threshold of 0.15. This threshold was set to include the majority of the deleted cases in the low expression cluster. Since LCOR was not cis-correlated, the copy number data were not used to derive any parameters. Instead, the data were fit to a two-component Gaussian mixture model using the expectation maximization algorithm. Samples were assigned to a low and high expression cluster based on a posterior threshold of 0.5.

2.2.20 Quality Control on the Monti Cohort

Gene expression profiling on the Monti et al. cohort [104] revealed the sample GSM844275 to be consistently clustering by itself. The normalized unscaled standard error metric for this sample at a median of 1.29 was much higher than the Affymetrix suggested threshold of 1.05. This sample was excluded from downstream analysis.

2.2.21 Survival Analysis

Survival analyses were performed at the gene-centric copy number level by dichotomizing samples into deletions vs. copy number neutral. Copy number neutral loss of heterozygosity samples were excluded from outcome correlation analyses. The log rank test was used to test if outcomes were different between groups using OS, DSS, and PFS as endpoints. OS was defined as death from any cause.
DSS was defined as death specifically from lymphoma. PFS was defined as the time from initial diagnosis to disease progression, lymphoma relapse or death from any cause.

Similarly for the RCOR1 loss-associated gene signature, I used the log rank test comparing OS of patients in the RCOR1-low vs. RCOR1-high expression clusters. To test for the prognostic independence of the RCOR1 aberrations and gene signature from the IPI and COO, I performed pairwise multivariate Cox regression.

2.2.22 Pathway Enrichment Analysis

Pathway enrichment analysis was performed using the Cytoscape Reactome FI plugin (v2013) [114].

2.3 Results

2.3.1 High-Resolution Copy Number Analysis of DLBCL

Figure 2.4: Genome-wide Copy Number Architecture of 148 DLBCL Patients. Panel A: Genome-wide linear representation of the sCNA profile, summarized at the gene level, across all 148 samples, with gains in red, and deletions in blue. Selected genes with recurrent somatic genomic aberrations are annotated with arrows. CNPs have been subtracted from this plot.
Panel B: Stacked horizontal-bar plot indicating the absolute number of gains (on right) and deletions (on left) of the selected genes from panel A. Different colors represent the aberration distribution according to COO classification, with asterisks indicating whether there is an enrichment of the aberration in a particular subtype. Panels C-E: Two-color FISH assays, with red probe interrogating a target gene of interest (LCOR, RCOR1, and NCOR1) and green probe interrogating an established tumour suppressor on the same chromosomal arm (PTEN, SOCS4, and TP53) as a reference.

Genomic gains and losses were profiled in the 148 sample BCCA study cohort (Figure 2.4A). On average, 16.8% of the genome affecting 3106 (15.3%) protein coding genes was aberrant per sample (full distributions in Figure 2.5). I confirmed previously reported, highly recurrent, large-scale chromosome alterations, including amplifications of the entire chromosome 7, COO-specific deletions of 6q, and amplifications of chromosome 3 and 18q (enriched in ABC-DLBCL; Figure 2.6) [96, 97].

Figure 2.5: Copy Number Aberration Distributions. Samples are ordered by percentage of genome affected (high to low) within each subtype. Panel A: A stacked barplot indicating the percentage of the genome affected per sample. Panel B: A stacked barplot indicating the absolute number of copy number altered protein coding genes per sample.

Figure 2.6: Genome-wide Copy Number Architecture per Subtype. Frequency of gains (above 0) and deletions (below 0) plotted across the genome. Significantly enriched alterations in one of the subtypes are highlighted.
Panel A: Copy Number Architecture of 38 ABC cases. Panel B: Copy Number Architecture of 53 GCB cases.

Amplifications in REL (n = 34, 22.9% of patients) [98] were enriched in GCB samples (P < 0.001, Figure 2.4B) while amplifications in FOXP1 (n = 27, 18.2%, P = 0.03) and NFKBIZ (n = 31, 21%, P < 0.001) [102] were enriched in ABC samples. I also observed deletions in CDKN2A/MTAP (n = 33/29; 22.3%/19.6%, P = 0.002 / P = 0.003 enriched in ABC) [48], TNFAIP3 (n = 43, 29.1%) [7], CD58 (n = 18, 12.2%), B2M (n = 26, 17.6%) [104], FHIT (n = 15, 10.1%) [115], and PRDM1 (n = 43, 29.1%) [116], corroborating previous reports. Commonly affected regions can be found in Figure 2.7, Table A.1 and Table A.2.

2.3.2 Dysregulated Transcriptional Networks Identified by Integration of Copy Number and Gene Expression Data

I integrated the gene expression profiles of 91 matching RNA-seq libraries to investigate sCNA alterations impacting gene expression profiles. I estimated 22.1% of protein-coding genes to be cis-correlated (Spearman’s rank correlation test and Kruskal-Wallis test; FDR < 0.1). These cis-correlated genes were enriched for the biological processes neurotrophin signalling pathway, B cell receptor signalling pathway, signalling events mediated by HDAC class I, and class I MHC mediated antigen processing & presentation. To pinpoint candidate functional aberrations, I performed an xseq analysis [100], which links sCNA and gene expression data through known transcriptional networks (trans correlations).

Figure 2.7: GISTIC Deletion and Gain Score Plots. False discovery rates (FDR) are on the x-axes across the 22 autosomes on the y-axes. Selected genes are highlighted for both deletions and gains. The green line indicates a FDR cut-off of 0.1. Panel A: GISTIC deletion score plot. Panel B: GISTIC gain score plot.
The top 10 candidate deleted genes and the top 10 candidate gained genes that were predicted to significantly impact the expression of their cognate genes are listed in Figure 2.8A. These genes included the known tumour suppressor CDKN2A as well as genomic loci that harboured multiple candidate genes: 1q22-q24.2 (CD247, SSR2), 3q21-q29 (TFRC, CSTA, RAB7A, ITGB5), 11q13-q21 (NUMA1, RSF1), 14q32 (RCOR1, TRAF3, TNFAIP2) and 17p13 (DVL2, VAMP2).

Figure 2.8: Top 20 Candidate Driver Genes Selected by xseq. Panel A: Matrix showing the top 20 xseq genes in rows (NDRG1, MIF, RCOR1, TNFAIP2, TRAF3, LRRC47, FLNB, VAMP2, DVL2, FOXO3, RSF1, NUMA1, CDKN2A, ITGB5, RAB7A, CSTA, TFRC, SSR2, CD247, PTPN12) and samples containing genomic aberrations in any one of these genes in the columns. To simplify the visualization, any gains affecting deleted genes and deletions affecting gained genes are removed. Panels B and C: Kaplan-Meier analyses for CDKN2A and RCOR1 deletions, respectively, demonstrate an association with poor progression-free survival.

2.3.3 RCOR1 Deletions Define a Subgroup of DLBCL Patients with Unfavourable Survival in a Homogeneously R-CHOP-Treated Cohort

I investigated the prognostic ability of the xseq identified aberrations, using clinical outcome data available for the 139 R-CHOP-treated DLBCL patients. For genes with > 5% aberration frequency, CDKN2A deletions were associated with unfavourable PFS (5 years - 54.5% deleted vs. 76.3% neutral, P = 0.016, Figure 2.8B), DSS (P = 0.006, Figure 2.9E) and OS (P = 0.016, Figure 2.9I) as previously reported [48], and the set of genes located in the 14q32 locus were associated with PFS: RCOR1 (5 years - 22.5% deleted vs. 71.1% neutral, P = 0.001, Figure 2.8C), TRAF3 (5 years - 18.3% deleted vs. 72% neutral, P < 0.001, Figure 2.9C) and TNFAIP2 (5 years - 18.3% deleted vs. 72% neutral, P < 0.001, Figure 2.9D).
Similar statistical trends were found using the endpoints DSS and OS (Figure 2.9): RCOR1 (DSS P = 0.037, OS P = 0.093), TRAF3 (DSS P = 0.003, OS P = 0.01), and TNFAIP2 (DSS P = 0.003, OS P = 0.01).

Figure 2.9: Association of CDKN2A, RCOR1, TRAF3, and TNFAIP2 Deletions with PFS, DSS, and OS. (Kaplan-Meier curves; columns: CDKN2A (deletion n = 33, neutral n = 88), RCOR1 (deletion n = 10, neutral n = 120), TRAF3 (deletion n = 12, neutral n = 119), and TNFAIP2 (deletion n = 12, neutral n = 119); rows: PFS, DSS, and OS.)

Given the
known role of RCOR1 in chromatin modification, I pursued this gene in this study. Further analysis revealed an association between deletion of the transcriptional co-repressor and PFS that was independent of standard prognostic risk factors included in the IPI (P = 0.005) and COO phenotyping (P = 0.005) using pairwise multivariate Cox regression (Table 2.4). RCOR1 deletions were significantly correlated with gene expression (Figure 2.10), and protein expression by IHC (Figure 2.11C,D) with representative RCOR1 IHC staining cases shown in Figure 2.12.

Additionally, protein expression was correlated with gene expression (Figure 2.11E) and low protein expression was associated with unfavourable survival (Figure 2.13). Finally, IHC on a benign reactive tonsil revealed strong nuclear staining of the germinal centre cells compared to the mantle zone cells that were negative (Figure 2.12D), suggesting that the RCOR1 deletions would have a pathogenic consequence as their normal counterparts express RCOR1.

Table 2.4: Pairwise Multivariate Analyses of RCOR1 Deletions and RCOR1-low vs. Established Prognostic Markers (COO and IPI). Pairwise analyses 1-2 compare the RCOR1 deletions to COO and IPI in the BCCA study cohort. Pairwise analyses 3-4 compare the RCOR1-low clustering to COO and IPI in the BCCA study cohort. Pairwise analysis 5 compares the RCOR1-low/high clustering in the IPI low group to COO in the Lenz rediscovery cohort. †Within the IPI low group.

                        Patients             Pairwise Multivariate Analysis
Variable                No.       %          HR       95% CI            P
Pairwise analysis 1
  RCOR1 deletion        8/92      8.70%      3.891    1.511 - 10.020    0.005
  COO (GCB)             45/92     48.90%     0.525    0.221 - 1.246     0.144
  COO (U)               13/92     14.10%     0.587    0.167 - 2.060     0.405
Pairwise analysis 2
  RCOR1 deletion        9/105     8.60%      3.411    1.446 - 8.047     0.005
  IPI ≥ 3               41/105    39.00%     2.587    1.248 - 5.365     0.011
Pairwise analysis 3
  RCOR1-low Cluster     18/63     28.60%     3.229    1.170 - 8.910     0.024
  COO (GCB)             32/63     50.80%     1.062    0.365 - 3.089     0.913
  COO (U)               8/63      12.70%     0.64     0.126 - 3.237     0.589
Pairwise analysis 4
  RCOR1-low Cluster     16/56     28.60%     2.437    0.840 - 7.074     0.101
  IPI ≥ 3               14/56     25.00%     4.469    1.517 - 13.170    0.007
Pairwise analysis 5†
  RCOR1-low Cluster     17/65     26.20%     3.68     1.066 - 12.701    0.039
  COO (GCB)             31/65     47.70%     0.253    0.065 - 0.990     0.048
  COO (U)               12/65     18.50%     0.279    0.029 - 2.706     0.271

Figure 2.10: Correlations of RCOR1 Deletions with Gene Expression. Panel A: Scatterplot of RCOR1 gene expression vs. gene LogR (Spearman rank P = 0.00053). Panel B: Boxplot comparing the expression of RCOR1 between RCOR1 deleted cases and RCOR1 neutral cases (Wilcoxon P = 0.00017).

Figure 2.11: RCOR1 IHC Correlations with Copy Number Data. (P values shown on the panels: Fisher P = 0.009, Wilcoxon P = 0.009, and Wilcoxon P = 0.041.) Panel A: Distribution of proportion of RCOR1 positive tumour cells. Vertical red dotted line shows the cut-off to determine RCOR1 IHC negative and RCOR1 IHC positive class.
Panels B-C: RCOR1 copy number status distribution between the negative and positive class with panel B showing absolute numbers and panel C showing relative proportion within each class. Panel D: Boxplot showing the RCOR1 gene logR distribution between the negative and positive class. Panel E: Boxplot showing the RCOR1 gene expression distribution between negative and positive class.

Figure 2.12: RCOR1 IHC Stains. Panels A-C: RCOR1 staining of a homozygous deleted case (05-19287), heterozygous deleted case (05-24904), and a neutral case (05-23110). Panel D: RCOR1 staining of a reactive tonsil. B cells within the germinal centres of the reactive tonsil demonstrate a uniformly strong nuclear staining whereas the mantle cells were negative or showed a comparatively weak expression.

Figure 2.13: RCOR1 IHC Correlations with Survival. Kaplan-Meier curves comparing the RCOR1 IHC negative (n = 4) vs. positive (n = 63) class in terms of survival for OS (panel A), PFS (panel B), and DSS (panel C). The three panels carry log rank P values of 0.008, 0.007, and 0.004.

2.3.4 Recurrent Deletions in Members of the Corepressor Family

In addition to deletions in RCOR1 (n = 11, 7.5%), I also observed recurrent deletions in members of the corepressor gene family LCOR (n = 13, 8.8%) and NCOR1 (n = 21, 14.2%). I also analyzed copy number data from an independent DLBCL cohort (n = 77) [7], identifying recurrent deletions (RCOR1, n = 6, 7.8%; LCOR, n = 2, 2.6%; NCOR1, n = 15, 19.5%) in these 3 genes.
Focal examples are shown in Figure 2.14 and Figure 2.15.

Figure 2.14: Focal View of Raw Copy Number Values. Each panel contains raw copy number values (log2 copy number ratio vs. genomic position) from two samples: the top sample from the BCCA study cohort and the bottom sample from the Pasqualucci et al. cohort. Panel A: Focal view of chr14:102,058,998-104,377,837 demonstrating predicted RCOR1 deletions in samples 05-19287 and 00003883_Columbia_GW6.0. Panel B: Focal view of chr10:97,592,017-99,740,800 demonstrating predicted LCOR deletions in samples 99-25549 and 00003861_Columbia_GW6.0.
Panel C: Focal view of chr17:14,934,718-17,119,010 demonstrating predicted NCOR1 deletions in samples 03-26969 and 00003861_Columbia_GW6.0.

In selected index cases, deletions in LCOR (n = 6) and NCOR1 (n = 7), as well as all RCOR1 deletions (n = 11), were validated using FISH with a reference probe interrogating a known tumour suppressor on the same chromosomal arm (PTEN, TP53, SOCS4). Figure 2.4 shows three specific examples illustrating the focal nature of the observed deletions: a homozygous LCOR deletion (10q24.1; 13 kb; case 99-25549) with PTEN neutral (10q23.31) (Figure 2.4C); a homozygous RCOR1 deletion (14q32.31; 143 kb; case 05-19287) with a hemizygous SOCS4 deletion (14q22.3) (Figure 2.4D); and a hemizygous NCOR1 deletion (17p12; 159 kb; case 01-19969) with TP53 neutral (17p13.1) (Figure 2.4E). This suggests that the genomic deletions may have been selected for independently of known proximal tumour suppressors in at least a subset of cases.

Although RCOR1, LCOR and NCOR1 were all found to be the focal target of hemizygous or homozygous deletion events, none of these 3 genes were affected by single point mutations in the RNA-seq cohort as previously reported [6]. When analyzing mutational patterns including RCOR1, LCOR and NCOR1 deletions (Figure 2.16A), I found that LCOR and NCOR1 deletions significantly co-occurred with other somatic mutations and copy number aberrations, such as TP53 mutations (correlated with NCOR1) and FAS mutations (correlated with LCOR) (Figure 2.16B).
However, RCOR1 deletions did not significantly co-occur with any other somatic mutations.

Figure 2.15: Focal View of Raw Copy Number Values for RCOR1, LCOR, and NCOR1 Across All Deleted Samples. The red bar indicates minimal commonly deleted regions that include the genes of interest.

Figure 2.16: Integration of Copy Number and Single Point Mutational Data. Panel A: Patient Genomic Aberration Matrix. Previously reported single point mutation data is integrated with copy number data from RCOR1, NCOR1 and LCOR. Only patients with matching SNP 6.0 and RNA-seq data are presented since there are copy number and single point mutational data for these cases. Panel B: Matrix showing the genes (in columns) that significantly co-mutate with the genes RCOR1, NCOR1, and LCOR. Red cells indicate genes that co-mutate significantly (using Fisher exact test). No genes were found to mutate exclusively of RCOR1, NCOR1, or LCOR.

2.3.5 An RCOR1 Loss-Associated Gene Expression Signature Derived by In Vitro Knockdown

Having established an association between outcomes and RCOR1 deletions, in vitro knockdown of RCOR1 was performed in two B-cell lymphoma lines (KM-H2 and Raji) to investigate the effects of RCOR1 loss at the transcriptional level.
Quantitative reverse47transcriptase PCR confirmed reduction of RCOR1 in KM-H2 cells to 21.5% ± 0.1% and inRaji cells to 25.5% ± 0.02% compared to the non-silenced controls (Figure 2.17A).0.000.250.500.751.00KM-H2 Cl2 KM-H2 Cl5 Raji Mission Raji OpenBioGAPDHnormalisedtorespectivenon-silencingcontrolsABRCOR1B-actinNS KDRajiMissionRajiOpenBioNS KDNS KD KDKM-H2Cl2 Cl5Figure 2.17: RCOR1 knockdown qRT-PCR and Western Blots. Panel A: qRT-PCR plotdemonstrating the knockdown of RCOR1 at the mRNA level in the B-cell lines. Panel B:Western blots demonstrating the knockdown of RCOR1 at the protein level in the B celllines.Western blots also confirmed the reduction of RCOR1 at the protein level(Figure 2.17B). Differential expression analyses between KM-H2 and Raji revealed astrong correlation in fold change directionality (P < 0.001) as well as consistency in thedysregulated pathways (Figure 2.18).4821 851Raji KM-H230 2588KM-H2RajiA BFigure 2.18: Venn Diagrams Comparing the Consistency of Dysregulated Pathwaysbetween RCOR1 Knockdowns in the cell-lines KM-H2 and Raji. Panel A: Venn diagramcomparing up-regulated pathways. Panel B: Venn diagram comparing down-regulatedpathways.This provided confidence to combine the results of the two in vitro knockdown Bcell lines to produce a single list of differentially expressed genes (n = 1588). Thisset of genes was significantly overlapping (P < 0.001) with the genes co-expressedwith RCOR1 in the RNA-seq cohort (n = 1639) (Figure 2.19). 
I defined the list of the overlapping genes (n = 233) as the RCOR1 loss-associated gene signature (Table A.4). This gene signature was enriched for biological processes that included up-regulation of the proteasome, processing of capped intron-containing pre-mRNA, and down-regulation of signalling events mediated by HDAC class II (Table A.5).

[Figure 2.19 image: workflow diagram. In vitro RCOR1 knockdowns (Raji Mission, Raji OpenBio, KM-H2 Cl2, KM-H2 Cl5) are compared with their non-silencing controls to give the differentially expressed genes (n = 1588; fold change |FcDiff| ≥ 0.3). The 91 RNA-seq DLBCL cases of the BCCA study cohort give the RCOR1 co-expressed genes (n = 1639; Spearman rank, FDR < 0.0001). Genes that overlap and are concordant in directionality (FcDiff ≤ -0.3 & FDR < 0.0001 & rho > 0, or FcDiff ≥ 0.3 & FDR < 0.0001 & rho < 0) form the RCOR1 loss-associated gene signature (n = 233). KD: knockdown; NS: non-silencing; FcDiff: fold change difference (KD - NS).]

Figure 2.19: RCOR1 Loss-Associated Gene Signature Derivation Workflow.

2.3.6 The RCOR1 Loss-Associated Gene Expression Signature is Associated with Unfavourable Outcome

I next investigated whether the RCOR1 loss-associated gene signature correlated with outcomes following R-CHOP chemotherapy in the BCCA study cohort. Using the RNA-seq derived expression measurements (from 91 patients) of the RCOR1 loss genes as features, I performed hierarchical clustering and found three distinct subgroups (Figure 2.20A, Figure 2.3). One group of patients (n = 20, 22%) exhibited low RCOR1 expression (defined as the RCOR1-low group) and a differential gene expression profile distinct from that of another group of patients (n = 49, 53.8%) exhibiting high RCOR1 expression (defined as the RCOR1-high group; P < 0.001, Figure 2.21A) as well as high LCOR and NCOR1 expression (Figure 2.21B-C). A third group of patients (n = 22, 24.2%) demonstrated a mixture of the expression profiles of the RCOR1-low and RCOR1-high groups, and I defined this as the unclassified group.
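The overlap-and-concordance rule summarised in Figure 2.19 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the dictionary inputs, and the gene identifiers are hypothetical, and only the thresholds (|FcDiff| ≥ 0.3, FDR < 0.0001, opposite signs of FcDiff and rho) come from the figure.

```python
def derive_signature(fc_diff, rho, fdr, fc_cut=0.3, fdr_cut=1e-4):
    """Return genes present in both analyses that agree in direction.

    fc_diff: gene -> fold change difference (KD - NS) from the knockdowns
    rho:     gene -> Spearman correlation with RCOR1 expression in patients
    fdr:     gene -> FDR-adjusted P value of that correlation
    """
    signature = {}
    for gene in sorted(set(fc_diff) & set(rho) & set(fdr)):
        if fdr[gene] >= fdr_cut:
            continue  # correlation with RCOR1 not significant
        if fc_diff[gene] <= -fc_cut and rho[gene] > 0:
            # Falls on knockdown and tracks RCOR1 in patients.
            signature[gene] = "down with RCOR1 loss"
        elif fc_diff[gene] >= fc_cut and rho[gene] < 0:
            # Rises on knockdown and is anti-correlated with RCOR1.
            signature[gene] = "up with RCOR1 loss"
    return signature


# Hypothetical genes and values, for illustration only.
toy_fc = {"GENE_A": -0.5, "GENE_B": 0.4, "GENE_C": 0.6, "GENE_D": -0.1}
toy_rho = {"GENE_A": 0.6, "GENE_B": -0.5, "GENE_C": 0.7, "GENE_D": 0.8}
toy_fdr = {"GENE_A": 1e-6, "GENE_B": 1e-5, "GENE_C": 1e-6, "GENE_D": 1e-6}
toy_signature = derive_signature(toy_fc, toy_rho, toy_fdr)
print(toy_signature)
```

In the toy input, GENE_C is excluded because its knockdown and patient directions disagree, and GENE_D is excluded because its fold change difference is below the cut-off.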
When placing the RCOR1 deletions in the context of the gene signature groups, I found that RCOR1 deletions trended towards clustering in the RCOR1-low group (P = 0.079). When the unclassified group was also considered, the RCOR1 deletions clustered into either the RCOR1-low or unclassified group (P = 0.039). In the subgroup of R-CHOP-treated patients (n = 63), the RCOR1-low expression cluster showed unfavourable OS (Figure 2.20B) relative to the RCOR1-high expression cluster (5-year OS: 55.6% RCOR1-low vs. 83.4% RCOR1-high; P = 0.023).

[Figure 2.20 image: panel A is a heatmap of the BCCA study cohort (scale -4 to 4) with annotation tracks for cluster (RCOR1-high, RCOR1-low, unclassified), subtype (activated B-cell like, germinal centre B-cell like, unclassified), LCOR, NCOR1, and RCOR1 expression, IPI (high ≥ 3, low < 3, N/A), PFS, OS, and RCOR1 copy number status (gain, deletion, neutral, deletion with no cis-correlation). Panel B is a Kaplan-Meier plot of overall survival (years) in the BCCA study cohort: RCOR1-high (n = 45) vs. RCOR1-low (n = 18), log rank P = 0.023. Panel C is a Kaplan-Meier plot of overall survival (years) in the Lenz (2008) rediscovery cohort: RCOR1-high (n = 115) vs. RCOR1-low (n = 36), log rank P = 0.039.]

Figure 2.20: RCOR1 Loss-Associated Signature is Associated with Unfavourable Outcome. Panel A: The heatmap produced from unsupervised clustering on the BCCA cohort using the RCOR1 loss-associated gene expression signature (n = 233). Panel B: Kaplan-Meier analyses performed on the RCOR1-low vs. RCOR1-high expression clusters in the BCCA study cohort. Panel C: Kaplan-Meier analyses performed on the RCOR1-low vs. RCOR1-high expression clusters in the Lenz rediscovery cohort.

The prognostic association of the RCOR1 loss-associated gene signature was reproducible in an independent cohort of R-CHOP-treated patients from Lenz et al. [105]. This cohort included 233 samples (rediscovery cohort) with microarray-derived gene expression data and clinical data including OS, but not PFS. The set of genes from the gene signature was carried over to this rediscovery cohort and used to perform de novo clustering. As per the BCCA study cohort, this rediscovery cohort was stratified into RCOR1-low (n = 53), RCOR1-high (n = 128) and unclassified (n = 52) gene expression clusters (Figure 2.22). After removing 38 samples that overlapped between the study cohort and the rediscovery cohort for outcome analysis, the RCOR1-low expression patients had unfavourable OS (Figure 2.20C) relative to patients in the RCOR1-high and unclassified expression clusters (5-year OS: 55.8% RCOR1-low vs. 72.2% RCOR1-high; P = 0.039).

[Figure 2.21 image: panels A-C show gene expression (log2 RPKM) of RCOR1, LCOR, and NCOR1 in the RCOR1-low and RCOR1-high groups.]

Figure 2.21: Expression Distribution of RCOR1, LCOR and NCOR1. Panels A-C: Expression of RCOR1, LCOR, and NCOR1 dichotomized into two groups, RCOR1-high and RCOR1-low.

To further test the prognostic value of the gene signature, it was applied to a second independent rediscovery cohort from Monti et al. (n = 90; Figure 2.23A), which showed a trend towards prognostic significance that did not reach statistical significance (5-year OS: 47.1% RCOR1-low vs. 72.6% RCOR1-high; P = 0.187, Figure 2.23B).

[Figure 2.22 image: heatmap of the Lenz rediscovery cohort (scale -4 to 4) with annotation tracks for cluster (RCOR1-high, RCOR1-low, unclassified), subtype (activated B-cell like, germinal centre B-cell like, unclassified), LCOR, NCOR1, and RCOR1 expression, IPI (high ≥ 3, low < 3, N/A), and OS (alive, dead).]

Figure 2.22: RCOR1 Loss-associated Gene Expression Signature Clustering in the Lenz Rediscovery Cohort.

[Figure 2.23 image: heatmap of the Monti (2012) rediscovery cohort (scale -4 to 4) with annotation tracks for cluster, LCOR, NCOR1, and RCOR1 expression, IPI, and OS (event, non-event), together with a Kaplan-Meier plot of overall survival (years) labelled B: RCOR1-high (n = 23) vs. RCOR1-low (n = 17), log rank P = 0.187.]

Figure 2.23: RCOR1 Loss-associated Gene Expression Signature Clustering in the Monti Rediscovery Cohort.

I next tested the prognostic value of the gene signature with respect to known prognostic markers such as COO and IPI using a multivariate Cox regression analysis. The prognostic value was independent of COO in the study cohort, but was linked to IPI (Table 2.4). I investigated this further and observed that the gene signature adds prognostic value in the IPI low-risk group (P = 0.043; Figure 2.24A). In the Lenz cohort, I again observed an enhancement of the prognostic value within the IPI low-risk group (P < 0.001; Figure 2.24B) that was independent of COO (Table 2.4). Taken together, the gene expression data from an orthogonal platform derived from two non-overlapping cohorts of similarly treated patients confirm the prognostic nature of the RCOR1 loss-associated gene expression signature.

[Figure 2.24 image: two Kaplan-Meier plots of overall survival (years) against cumulative survival (%) with numbers at risk. Group sizes shown in the legends: RCOR1-high (n = 27) vs. RCOR1-low (n = 9) in one panel, and RCOR1-high (n = 48) vs. RCOR1-low (n = 17) in the other.]

Figure 2.24: Prognostic Significance of the RCOR1 Loss-associated Gene Expression Signature within the IPI Low Group. Panel A: Kaplan-Meier Analyses in the low IPI group in the BCCA study cohort. Panel B: Kaplan-Meier Analyses in the low IPI group in the Lenz rediscovery cohort.

2.4 Discussion

Using integrative analysis of high-resolution copy number and RNA-seq data in a large cohort of DLBCL patients, I identified novel focal and recurrent deletions in the transcriptional regulator RCOR1 and established a prognostic signature of 233 genes that stratified patients into a distinct subgroup associated with reduced RCOR1 expression. My methodology focused on identification of copy number alterations that: (1) affected the mRNA expression of genes harboured within the regions of chromosomal imbalance, (2) led to genome-wide changes in transcriptional networks, and (3) were associated with clinical outcomes.
While this study focused primarily on copy number alterations, additional candidate functional alterations could be revealed through integration with somatic point mutations and will be an important aspect in future studies.

From the inventory of copy number alterations affecting gene expression, RCOR1 deletions stood out from other identified gene loci as these deletions were associated with a pronounced effect on key cellular pathways (xseq analysis) and unfavourable survival. Moreover, the RCOR1 loss-associated gene signature proved to be a prognostic indicator of survival that added prognostic value within the IPI low-risk group independent of COO.

Based on the validation work using FISH, the selected deletions were demonstrated to be independent of other known tumour suppressor gene loci in close proximity. The concept of synergistic tumourigenic effects of co-deleted or co-amplified genes on the same or different chromosomes has been widely assessed in lymphoma and other cancers [108, 117, 118]. Indeed, RCOR1 deletions were associated with deletions of TRAF3 (located in close vicinity). TRAF3 is a key molecule in TNF-alpha and Toll-like receptor signalling, acting as a negative regulator of NF-kB inducing kinase [98, 119]. TRAF3 has also been identified as a target of somatic mutations and deletions in a number of cancers [7, 120–122]. When RCOR1 and TRAF3 are co-deleted, the combination of transcriptional pattern changes mediated by RCOR1 loss and the downstream effects on alternative NF-kB signalling may cooperate and contribute to the malignant phenotype.

RCOR1 encodes a co-repressor of the RE1-silencing transcription factor REST that binds to RE1 neuron-restrictive silencer elements to repress gene expression in non-neuronal cells [123, 124].
RCOR1 is part of the BRAF35-Histone Deacetylase (BHC) complex where it associates with the C-terminal domain of REST, the Histone Deacetylases 1 and 2 (HDAC1/2), and KDM1A, and regulates gene expression through chromatin remodelling [125–127]. Further, NCOR1, a paralog of RCOR1 that I found deleted and cis-correlated, binds to the N-terminus of REST, further recruiting HDAC1/2 to RE1/NRSE DNA-binding sites [128]. Intersecting the in vitro RCOR1 knockdown signature with RCOR1 co-regulated genes in clinical samples to define the gene expression signature revealed gene enrichment in pathways associated with HDAC class II signalling events and processing of capped intron-containing pre-mRNA. Thus, global deregulation of gene expression is a likely consequence of RCOR1 loss.

Taken together, the outcome correlations of genomic RCOR1 deletions and of the RCOR1 loss-associated gene signature, identified in two separate, independent cohorts, suggest that these findings may be valuable as novel prognostic biomarkers in DLBCL patients. The RCOR1 loss-associated gene signature as a biomarker might be the best representation of the initial finding of RCOR1 deletions as it is stable (based on multiple gene features), reproducible (two independent cohorts) and biologically meaningful (defined by RCOR1 in vitro knockdown). This biologically defined RCOR1 loss-associated gene expression signature identified an RCOR1-low cluster that had unfavourable overall survival in both the BCCA and the rediscovery cohorts. While the deletions tended to cluster with the RCOR1-low cluster, several cases in this cluster had low expression but no RCOR1 deletion.
Alternative molecular mechanisms such as promoter methylation along with other epigenetic modifications are possible explanations for the lack of expression in these cases and would need to be explored in future studies.

The RCOR1 loss-associated gene signature had added prognostic value in the IPI low group in both the BCCA study and the Lenz cohorts. Additionally, the added prognostic value in the IPI low group was also prognostically independent of COO classification. Thus RCOR1 loss-related biology is likely to add prognostic value to COO identification in DLBCL patients. In conjunction with other emerging biomarkers such as COO subtype, a signature of RCOR1 loss could be combined in gene expression analyses to improve risk stratification. Finally, targeting the biology associated with RCOR1 loss provides a roadmap for improving therapeutic intervention in a poor outcome subgroup of DLBCL patients.

Chapter 3

Clonal Dynamics Underlying Histological Transformation and Progression in Follicular Lymphoma¹

3.1 Introduction

Follicular lymphoma (FL) is the second most common subtype of non-Hodgkin lymphoma (NHL) and the most frequent indolent lymphoma, accounting for 22-32% of all new NHL diagnoses in Western countries [129, 130]. Patient outcomes are favourable, with median overall survival extending well beyond 10 years [131–133]. However, FL remains an incurable malignancy as most patients eventually experience progressive disease. A subset of patients are at risk of early lymphoma-related mortality due to early progression after immuno-chemotherapy or histological transformation to aggressive lymphoma (2-3% of patients per year), each of which leads to shortened survival [54, 55, 57, 58, 60, 134–136].
Hence, mutational profiling of FL specimens at the temporal boundaries of clinical inflection points represents a compelling opportunity to study the evolutionary dynamics underpinning FL disease progression.

¹ In this chapter, I describe a study of intra-tumour heterogeneity in clinically extreme follicular lymphoma patients (histologically transformed and early progressers) that involves serial sampling of primary and transformed/progressed samples. This chapter is a modified version of material published in “Kridel, R*., Chan, FC*., et al. Histological Transformation and Progression in Follicular Lymphoma: a Clonal Evolution Study. PLOS Medicine (2016). *Equal contribution”.

To infer evolutionary properties, deconvolution of malignant tissues into constituent clones is required. Clonal decomposition is accomplished through analysis of allelic measurements, under the assumption that the prevalence of specific alleles in a DNA mixture extracted from a tumour quantitatively represents its clonal population abundance. In this study, targeted amplicon sequencing in conjunction with ultra-sensitive digital droplet PCR were used to measure the changing prevalence of alleles at unprecedented resolution over FL disease progression. With precise measurements of alleles, computational inference can then determine clonal composition and phylogenetic topology of clones, yielding insight into temporal mutation acquisition and genotypes giving rise to clonal expansions over time. With this approach, longitudinal comparison of the clonal composition of tumours sampled at different time points in a patient’s clinical history can be performed, deciphering which constituent populations were present at diagnosis, and which populations constituted the relapse.
Thus, the degree to which a tumour is evolving and the contributions of specific clones to the evolutionary process (collectively termed clonal dynamics) can be quantitatively assessed.

To varying levels of resolution, related approaches have been applied to a variety of progression scenarios in haematologic and solid malignancies [15, 16, 137, 138]. For example, secondary acute myeloid leukemia from underlying myelodysplastic syndrome and Richter syndrome from chronic lymphocytic leukemia arise without significant branched evolution [139, 140]. By contrast, transformation of FL has most commonly been described as divergent, branched evolution from a common progenitor [141, 142]. The nature of the clonal trajectories leading to transformation or early progression is poorly understood; it is unknown whether similar or contrasting modes of selection underpin these clinical end points.

Discrete transformation-associated genetic alterations have been described involving CDKN2A, MYC, TP53, CD58 or B2M [141–150]. However, these events alone cannot explain the majority of transformed cases, leaving a discovery gap for genetic drivers of transformation. Similarly, progression has been described to occur more frequently in the presence of selected, recurrent cytogenetic aberrations or single gene mutations [151–157]. Recently, a clinicogenetic risk model (m7-FLIPI), including mutational status of seven genes, the Follicular Lymphoma International Prognostic Index (FLIPI) and performance status, was shown to improve outcome prediction for patients requiring immuno-chemotherapy [56]. Nonetheless, the m7-FLIPI imperfectly captures determinants of early progression [57]. A newer prognostic model, named POD24-PI, was developed using the original m7-FLIPI data to specifically predict early progression.
The POD24-PI has better sensitivity but lower specificity for the prediction of early progression [57], raising the question of whether progressive disease might be attributed to genetic lesions that are not captured by either prognostic model. Furthermore, the mechanisms underlying resistance to immuno-chemotherapy remain elusive; genetic profiling of early progressive cases has potential to uncover novel genetic lesions in molecular pathways leading to treatment resistance.

To address these questions, I set out to compare the clonal dynamics of tumours leading to transformation and those associated with early progression by using high resolution, genome-wide profiling of mutant alleles. I found dramatic clonal expansions in transformed disease, whereby dominant clones in transformation samples emerged from extremely low prevalence clones or from clones that were not detected in the diagnostic samples. In contrast, the dynamics of disease progression during treatment in the absence of transformation showed markedly different characteristics, with much of the clonal architecture preserved from diagnostic to relapse specimens. Finally, using a large extension targeted sequencing cohort, I established genetic variants associated with transformation and early progression in the broader patient population. Taken together, these results illuminate previously undescribed patterns of clonal expansion underpinning FL clinical histories, suggesting that contrasting management strategies will be necessary across the FL patient population.

3.2 Materials and Methods

3.2.1 Cohort Descriptions

A discovery cohort of tumour and normal specimens from 41 patients was selected for whole-genome sequencing (WGS) (Figure 3.1 and Figure 3.2A). Samples were acquired from the British Columbia Cancer Agency (BCCA) lymphoma tumour bank and patients were grouped according to three clinical endpoints:

1. patients diagnosed with FL and subsequent or concomitant (patient FL1014) transformation to large cell lymphoma (TFL, n = 15; these cases were selected irrespective of type of treatment received),
2. patients whose disease progressed without evidence of transformation (PFL, n = 6); 5 out of these 6 patients progressed within 2.5 years after starting first-line immuno-chemotherapy with R-CVP (rituximab, cyclophosphamide, vincristine and prednisone),
3. patients whose lymphoma displayed no evidence of transformation or progression for more than 5 years from initial diagnosis (NPFL, n = 20).

All cases were selected irrespective of clinical stage, grade and IgH-BCL2 translocation status in order for the cohort to be reflective of the clinical and pathological heterogeneity that is inherent to FL. Samples with a tumour content of less than 50% and available frozen single cell suspensions were flow-sorted to purify tumour (CD19+ kappa or lambda+ CD3-) and germline cells (CD19- kappa or lambda- CD3+). Germline DNA was obtained from flow-sorted CD3+ lymphocytes or from peripheral blood cells.
All germline samples were confirmed to be free of tumour contamination by the absence of PCR-amplifiable patient-specific IgH-BCL2 and/or V(D)J rearrangements.

[Figure 3.1 image: per-patient timelines (time since diagnosis, 0-18 years) for the transformed, progressed, and long-term non-progresser groups. Legend: follicular lymphoma (genome sequencing), transformed lymphoma (genome sequencing), follicular lymphoma, alive, dead.]

Figure 3.1: Sample Overview and Timeline of Whole-Genome Sequencing Cohort.

The capture sequencing cohort refers to the samples from 277 patients (39 patients overlapping with the WGS cohort), in which germline DNA was available for 84 patients (Figure 3.2B). These patients were divided into three groups:

1. 159 TFL patients (sample at primary FL timepoint available in 128 cases, sample at transformed FL timepoint available in 149 cases and samples from both timepoints available in 118 cases).
2. 41 PFL patients who presented with early progressive disease within 2.5 years after starting immuno-chemotherapy with R-CVP. Progression was defined as radiological evidence of progressive disease and requirement for initiation of second-line therapy. For 7 of these patients, the T2 sample at progression (FL histology) was also sequenced.
3. 84 NPFL patients without progression for at least 5 years after either observation or R-CVP.

[Figure 3.2 image: cohort schematic along a T1 to T2 time axis. Panel A (whole genome sequencing): transformed cohort, 15 patients, FL and TFL samples, all paired; progressed cohort, 6 patients, FL and progressed FL samples, all paired; non-progressed cohort, 20 patients, FL samples, no pairs. Panel B (capture sequencing): transformed FL cohort, 159 patients and 277 samples (FL, n = 128; TFL, n = 149; 118 pairs; any treatment); clinical extremes cohort, 125 patients and 132 samples (early progresser FL, n = 41, R-chemo; late progresser FL, n = 84, R-chemo or observed); overlap N = 7. Overlap between whole genome and capture sequencing: 39 patients, 58 samples.]

Figure 3.2: Sample Overview. Whole-genome (panel A) and capture sequencing (panel B) cohorts, as well as the distribution of patients and samples into clinical groups.

Timepoint specificity is indicated for each sample by the suffixes -T1 for the primary timepoint (by definition FL) or -T2 for the secondary timepoint in the TFL or the PFL cohorts (transformed or treatment-resistant FL). Clinical characteristics of these cohorts can be found in Table 3.1 and Table 3.2.

Table 3.1: Clinical Characteristics of TFL Cases. (a) Data were unavailable for 4 patients. (b) Data were unavailable for 12 patients. (c) Data were unavailable for 5 patients.
FL, follicular lymphoma; TFL, transformed follicular lymphoma; NA, not available; DLBCL, diffuse large B-cell lymphoma; COM, composite histology; BCLU, unclassifiable B-cell lymphoma with features intermediate between DLBCL and Burkitt lymphoma.

Clinical Variable | Category | TFL (n = 159)
Year of FL diagnosis | < 2003 | 99 (62%)
 | ≥ 2003 | 60 (38%)
Year of TFL diagnosis | < 2003 | 55 (35%)
 | ≥ 2003 | 104 (65%)
Antecedent FL grade (a) | Grade 1 or 2 | 131 (85%)
 | Grade 3A | 24 (15%)
Time to transformation | ≤ 5 years | 104 (65%)
 | 5-10 years | 40 (25%)
 | > 10 years | 15 (9%)
First line treatment of FL (b) | Single-agent chemo | 28 (19%)
 | Multi-agent chemo | 24 (16%)
 | Multi-agent chemo + rituximab | 16 (11%)
 | Radiation or surgery | 20 (14%)
 | Observation | 54 (37%)
 | Other | 5 (3%)
Initial treatment after transformation (b) | Multi-agent chemo | 41 (28%)
 | Multi-agent chemo + rituximab | 87 (59%)
 | None | 1 (1%)
 | Other | 18 (12%)
Rituximab before transformation (c) | No | 116 (75%)
 | Yes | 38 (25%)
Pathology | DLBCL | 124 (78%)
 | COM | 25 (16%)
 | BCLU | 10 (6%)

Table 3.2: Clinical Characteristics of Early vs. Late Progressers. (a) Data were unavailable for 1 early and 2 late patients. (b) Data were unavailable for 2 early and 5 late patients. (c) Data were unavailable for 5 late patients. (d) Data were unavailable for 3 early and 4 late patients. (e) Data were unavailable for 2 early and 7 late patients.

Clinical Variable | Category | Early (n = 41) | Late (n = 84) | P
Age | ≤ 60 years | 25 (61%) | 46 (55%) |
 | > 60 years | 16 (39%) | 38 (45%) | 0.567
Gender | Male | 23 (56%) | 41 (49%) |
 | Female | 18 (44%) | 43 (51%) | 0.454
B symptoms (a) | Absent | 28 (70%) | 71 (87%) |
 | Present | 12 (30%) | 11 (13%) | 0.046
ECOG (b) | 0-1 | 31 (79%) | 76 (96%) |
 | 2-4 | 8 (21%) | 3 (4%) | 0.006
Stage | I-II | 5 (12%) | 21 (25%) |
 | III-IV | 36 (88%) | 63 (75%) | 0.108
Extranodal sites | 0-1 | 38 (93%) | 78 (90%) |
 | > 1 | 3 (7%) | 8 (10%) | 1.000
Nodal areas (c) | ≤ 4 | 11 (27%) | 39 (49%) |
 | > 4 | 31 (73%) | 40 (51%) | 0.020
Tumour mass (d) | < 7 cm | 15 (39%) | 59 (74%) |
 | ≥ 7 cm | 23 (61%) | 21 (26%) | < 0.001
LDH (e) | ≤ ULN | 22 (56%) | 73 (95%) |
 | > ULN | 17 (44%) | 4 (5%) | < 0.001
Hemoglobin (a) | ≥ 12 g/dL | 31 (78%) | 78 (95%) |
 | < 12 g/dL | 9 (22%) | 4 (5%) | 0.009
FLIPI (a) | Low/intermediate | 15 (38%) | 62 (76%) |
 | High | 25 (62%) | 20 (24%) | < 0.001
FL grade | 1-2 | 29 (71%) | 77 (92%) |
 | 3A | 12 (29%) | 7 (8%) | 0.004
Treatment by ITT | R-CVP & R maintenance | 33 (80%) | 44 (52%) |
 | Other R-chemo | 8 (20%) | 1 (1%) |
 | Observed | 0 | 37 (44%) |
 | Other | 0 | 2 (2%) | N/A
Progression during | R maintenance | 17 (41%) | 0 |
 | R-chemotherapy | 24 (59%) | 0 | N/A

3.2.2 Pathology

All samples were centrally reviewed by expert hematopathologists at the BCCA (Drs. Anja Mottok, Pedro Farinha, King Tan, and Randy Gascoyne), and the following histopathologic co-variates were recorded: FL grade (1, 2 or 3A), histological diagnosis at transformation (DLBCL, composite or B-cell lymphoma not otherwise classifiable (BCLU)), cell of origin for all TFL cases with a DLBCL histology, and immunohistochemistry for expression of TP53, IRF4, CD8, and B2M. Composite histology was defined as any evidence of underlying low grade lymphoma in a sample that concomitantly harboured large cell lymphoma. The Lymph2Cx assay was performed as previously described [158, 159], with the exception that it was applied in 4 cases to RNA extracted from fresh-frozen blocks, using 100 ng as input.
Immunohistochemically stained slides for the T cell marker CD8 (antibody clone C8/144B, Dako, catalogue number M7103) were scanned with an Aperio ScanScope XT at 20x magnification. Analysis was performed using the Aperio ImageScope viewer (v12.1.0; Aperio Technologies). Only cores and areas containing tumour were scored by applying the Positive Pixel Count algorithm with an optimized color saturation threshold. Any staining was considered positive and the number of positive pixels was divided by the total pixel count. Scores from both cores were subsequently averaged and multiplied by 100 to obtain the percentage of positive pixels. Images from tissue cores stained for CD8 and B2M (rabbit polyclonal antibody, Dako, catalogue number A0072) were taken using a Nikon Eclipse 80i microscope equipped with a Nikon DS-Ri1 camera and NIS Elements Imaging Software, D3.10.

3.2.3 Whole Genome Sequencing Data Analysis

The following method sections pertain to the WGS data.

3.2.3.1 Library Construction, Alignment, and Filtering

WGS libraries were constructed from genomic DNA using PCR-free library construction protocols, with the exception of libraries from cases FL1001, FL1002, FL2001, and FL2002 that were constructed during an earlier phase of the project using PCR-containing protocols. Libraries were sequenced on Illumina HiSeq 2500 instruments, generating on average 1,822,261,237 paired-end sequence reads of 100-125 bases in length per library. The BWA (v0.5.7) aligner [160] was used to align the paired-end reads to the human reference genome GRCh37.
PCR duplicates were marked using MarkDuplicates from Picard tools (v1.126); sequencing statistics are shown in Figure 3.3.

[Figure 3.3 image: two facets covering the tumour (T1, T2) and normal (N) libraries of all WGS cases. Top: number of aligned and non-aligned reads per sample (axis 0 to 3 × 10^9). Bottom: number of reads per base per sample (axis 25 to 125).]

Figure 3.3: WGS Sequencing Statistics. Top facet shows the distribution of aligned vs. non-aligned reads per sample. Bottom facet shows the mean (dot) and standard deviation of coverage per sample.

While assessing the number and type of somatic single nucleotide variant (sSNV) substitutions across these samples, it became apparent that samples FL1005T1, FL1009T1, FL1009T2, FL1012T1, FL1012T2, FL1014T1, FL1014T2, FL2005T1, FL2005T2 and FL3014T1 had an overwhelming representation of C to A substitutions at low variant allele fractions (VAFs) (between 10-15%). The cases appeared to be randomly affected.
The low allelic ratio C to A substitutions were documented to be artifactual by amplicon sequencing in sample FL1007T1 and deemed to be compatible with oxidative damage during DNA shearing, as described by Costello et al. [161]. These positions were filtered out by applying allelic ratio filters for C to A substitutions in the affected samples (filter set at 0.20 for FL1005T1, FL1009T1, FL1009T2, FL1012T2, FL1014T1, FL3014T1 and at 0.25 for FL1012T1, FL1014T2, FL2005T1, FL2005T2).

3.2.3.2 Somatic Single Nucleotide Variant and Small Insertion-Deletion Predictions

The full bioinformatics workflow for predicting sSNVs from the WGS data is shown in Figure 3.4. MutationSeq [162] (v4.1.0) was used to predict sSNVs for each tumour-normal pair using the parameters “-m model_v4.0.2.npz”. To increase the sensitivity for sSNVs, Strelka [163] (v1.0.13) was also used to predict sSNVs for each tumour-normal pair using the default parameters. Mutations reported by MutationSeq and/or Strelka (i.e. appeared in the passed.somatic.snvs.vcf file) were aggregated to form a timepoint-specific list of candidate sSNV positions. For TFL and PFL patients, a patient-centric candidate list of sSNV positions was generated by aggregating the candidate sSNV positions across both timepoints (i.e. T1 + T2 candidate sSNVs).
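The aggregation step described above amounts to two set unions: across callers within a timepoint, then across timepoints within a patient. The sketch below is illustrative only; the function name, the input structure, and the toy positions are assumptions, and it does not reproduce how either caller's output files are parsed.

```python
def patient_candidates(calls_by_timepoint):
    """Union candidate sSNV positions across callers, then across timepoints.

    calls_by_timepoint maps a timepoint label (e.g. "T1") to a dict of
    caller name -> set of (chromosome, position) tuples.
    """
    per_timepoint = {
        timepoint: set().union(*callers.values())
        for timepoint, callers in calls_by_timepoint.items()
    }
    # Patient-centric list: every position called at any timepoint.
    patient_list = set().union(*per_timepoint.values())
    return per_timepoint, patient_list


# Toy positions, for illustration only.
toy_calls = {
    "T1": {"mutationseq": {("1", 100), ("2", 200)}, "strelka": {("2", 200)}},
    "T2": {"mutationseq": {("2", 200)}, "strelka": {("3", 300)}},
}
toy_per_timepoint, toy_patient_list = patient_candidates(toy_calls)
print(sorted(toy_patient_list))
```

For a patient with a single timepoint, the patient-centric list returned here is simply that timepoint's candidate list.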
For NPFL, the patient-centric candidate list is equivalent to the T1 candidate list as there is only one timepoint sample.

[Figure 3.4 image: workflow diagram. For each timepoint (T1, T2), the tumour sample and the normal sample are run through MutationSeq and Strelka (prediction of sSNVs); the union of their sSNVs gives the T1-specific and T2-specific sSNVs; the union of candidate sSNVs across timepoints gives the patient-specific sSNVs; MutationSeq is re-run on each timepoint with the patient-specific sSNV list; filters (see “Methods”) yield the T1 filtered, T2 filtered, and patient filtered sSNVs.]

Figure 3.4: Bioinformatics Workflow for Predicting sSNVs from WGS data. TFL and PFL patients follow the complete workflow as depicted. For NPFL patients, which had only the T1 timepoint sample, only the left side of the workflow is followed.

MutationSeq was then re-run specifically interrogating the patient-centric sSNV candidate list across both timepoints (for TFL and PFL patients) and the single timepoint for NPFL patients. This effectively retrieves sSNV information at candidate positions predicted by MutationSeq and Strelka in a consistent output format, and also across timepoints. Final putative timepoint-centric sSNV lists were constructed based on the following criteria: 1) the sSNV had a MutationSeq probability ≥ 0.9 and MutationSeq filter field = “PASS”, 2) the sSNV was predicted by Strelka and MutationSeq filter field = “PASS”, or 3) the sSNV was predicted by Strelka, MutationSeq filter = “INDL” and MutationSeq probability ≥ 0.9.

SnpEff (v3.5) was then used to annotate each sSNV with respect to the canonical transcript with the parameters “-canon -no-downstream -no-intergenic -no-upstream” using the GRCh37.72 SnpEff database. In the scenario where a position may have multiple SnpEff effects, the effect with the most impact was chosen.
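The acceptance criteria and the "most impactful effect" rule can be sketched in a few lines of Python. This is a minimal illustration, not the thesis pipeline: the function names and argument layout are hypothetical, and the rank table simply encodes the lowest-to-highest impact ordering that this section lists.

```python
# Impact order from lowest to highest, as listed in this section.
EFFECT_ORDER = [
    "intragenic", "intron", "exon", "utr 3 prime", "utr 5 prime",
    "splice site region", "synonymous coding", "synonymous stop",
    "non synonymous start", "splice site donor", "splice site acceptor",
    "non synonymous coding", "stop lost", "stop gained", "start lost",
]
EFFECT_RANK = {name: rank for rank, name in enumerate(EFFECT_ORDER)}


def passes_final_filter(museq_prob, museq_filter, in_strelka):
    """Hypothetical helper applying the three acceptance criteria.

    museq_prob   -- MutationSeq probability for the position
    museq_filter -- MutationSeq filter field ("PASS", "INDL", ...)
    in_strelka   -- True if Strelka also predicted the sSNV
    """
    if museq_filter == "PASS":
        # Criteria 1 and 2: high probability, or supported by Strelka.
        return museq_prob >= 0.9 or in_strelka
    if museq_filter == "INDL":
        # Criterion 3: Strelka support and high probability are both needed.
        return in_strelka and museq_prob >= 0.9
    return False


def most_impactful(effects):
    """Return the effect with the highest rank in EFFECT_ORDER."""
    return max(effects, key=EFFECT_RANK.__getitem__)
```

For example, `most_impactful(["intron", "stop gained", "exon"])` returns `"stop gained"`.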
The effect impact was ordered as follows from lowest to highest: 1) intragenic, 2) intron, 3) exon, 4) utr 3 prime, 5) utr 5 prime, 6) splice site region, 7) synonymous coding, 8) synonymous stop, 9) non synonymous start, 10) splice site donor, 11) splice site acceptor, 12) non synonymous coding, 13) stop lost, 14) stop gained, 15) start lost. For downstream analyses, coding sSNV effects were considered to be: 1) non synonymous coding, 2) stop gained, 3) start lost, 4) stop lost, 5) splice site donor, 6) splice site acceptor, 7) non synonymous start, and 8) splice site region. Truncating effects were considered to be stop gained and start lost.

The same runs of Strelka for predicting sSNVs also predicted small somatic insertions and deletions (sIndels) with the passing results placed into the passed.somatic.indels.vcf file. SnpEff was used to annotate each sIndel with the same parameters as used in the sSNV annotations. For downstream analyses, coding sIndel effects were considered to be: 1) codon change plus codon deletion, 2) codon change plus codon insertion, 3) codon deletion, 4) codon insertion, 5) frame shift, 6) splice site acceptor, 7) splice site donor, and 8) splice site region.

3.2.3.3 Tumour Content Estimation

To estimate the tumour content of each tumour-normal pair, I used the sSNV results from the WGS data. More specifically, I applied a variational Bayes binomial mixture model clustering (VBBMM) on the sSNV allele count data of each patient. For TFL and PFL patients, I clustered in 2 dimensions (T1 and T2, Figure 3.5 and Figure 3.6). For NPFL patients, I clustered in 1 dimension (T1 only, Figure 3.7). Next, I identified the cluster most representative of the clonally dominant diploid heterozygous mutations in each patient. To calculate the sample-specific tumour content, I took the mean VAF of that cluster in that sample and multiplied by 2. These tumour/normal content estimation values were used as input into PyClone [164] and TITAN [165].
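The doubling step follows from the fact that a heterozygous mutation in a diploid region, present in every tumour cell, is expected at a VAF of half the tumour content. A minimal sketch is below; the function name is hypothetical, and the cap at 100% is an assumption on my part (it matches figure panels in which a cluster mean VAF slightly above 0.5 is reported as 100%). The input values are the starred cluster means from the FL1001 panel of Figure 3.5.

```python
def tumour_content(clonal_het_mean_vaf):
    """Tumour content from the mean VAF of the clonal, diploid,
    heterozygous sSNV cluster (VAF = content / 2, so content = 2 * VAF).

    The cap at 1.0 is an assumption: sampling noise can push the
    cluster mean slightly above 0.5.
    """
    return min(2.0 * clonal_het_mean_vaf, 1.0)


# FL1001, starred cluster: mean VAF 0.129 at T1 and 0.208 at T2.
t1 = tumour_content(0.129)
t2 = tumour_content(0.208)
print(f"{t1:.0%}; {t2:.0%}")  # prints: 26%; 42%
```

The printed values agree with the tumour content annotated on that panel (26%; 42%).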
For TitanCNA, there were a few exceptions where the tumour/normal content estimations from sSNV data were not used. See Section 3.2.3.4 for specific details on these samples.

[Figure 3.5 image: one scatterplot panel per TFL patient (FL1001 to FL1020). x-axis: T1 VAF; y-axis: T2 VAF; legend: Cluster (n; T1 VAF; T2 VAF). Tumour content annotated per panel (T1; T2): FL1001 26%; 42%. FL1004 68%; 73%. FL1005 86%; 99%. FL1006 59%; 94%. FL1007 94%; 88%. FL1008 71%; 81%. FL1009 52%; 89%. FL1012 41%; 52%. FL1013 93%; 88%. FL1014 83%; 72%. FL1016 92%; 73%. FL1017 70%; 57%. FL1018 93%; 99%. FL1019 87%; 61%. FL1020 95%; 71%.]

Figure 3.5: Tumour Content Estimation in TFL Patients. A scatterplot of the mean T2 vs. T1 VAF of each cluster (identified from a VBBMM) in each TFL patient is shown with the size of the cluster representing the number of sSNVs in the cluster. The cluster most representative of clonally dominant diploid heterozygous sSNVs in each patient is indicated by an * in the patient legend. The predicted tumour content is listed in the bottom right hand corner of each patient plot (T1; T2).

[Figure 3.6 image: one scatterplot panel per PFL patient (FL2001 to FL2008). x-axis: T1 VAF; y-axis: T2 VAF; legend: Cluster (n; T1 VAF; T2 VAF). Tumour content annotated per panel (T1; T2): FL2001 46%; 67%. FL2002 92%; 83%. FL2005 95%; 100%. FL2006 100%; 82%. FL2007 94%; 100%. FL2008 58%; 100%.]

Figure 3.6: Tumour Content Estimation in PFL Patients. A scatterplot of the mean T2 vs. T1 VAF of each cluster (identified from a VBBMM) in each PFL patient is shown with the size of the cluster representing the number of sSNVs in the cluster. The cluster most representative of clonally dominant diploid heterozygous sSNVs in each patient is indicated by an * in the patient legend. The resulting predicted tumour content is listed in the bottom right hand corner of each patient plot (T1; T2).

[Figure 3.7 image: one density plot panel per NPFL patient (FL3001 to FL3020). x-axis: T1 VAF; y-axis: Density; legend: Cluster (n; T1 VAF). Tumour content annotated per panel: FL3001 41%. FL3002 73%. FL3003 71%. FL3004 47%. FL3005 60%. FL3006 65%. FL3007 74%. FL3008 68%. FL3009 62%. FL3010 58%. FL3011 98%. FL3012 54%. FL3013 73%. FL3014 70%. FL3015 50%. FL3016 98%. FL3017 98%. FL3018 86%. FL3019 95%. FL3020 97%.]

Figure 3.7: Tumour Content Estimation in NPFL patients. T1 VAF density plots of each cluster (identified from a VBBMM).
The cluster most representative of the diploid heterozygous sSNVs in each patient is indicated by an * in the patient legend.

3.2.3.4 Somatic Copy Number Alteration Prediction

The full bioinformatics workflow for predicting somatic copy number alterations (sCNA) from the WGS data is shown in Figure 3.8. Germline heterozygous SNPs in the normal sample were first predicted with samtools (v0.1.18) and bcftools (v0.1.18). Specifically, the normal bam file was first filtered for PCR duplicates and non-unique mapping reads using “samtools view -F 1024”. This filtered bam was used as input into “samtools mpileup -u -I -f GRCh37.fa” and then “bcftools view -vcg”. The positions were then filtered to only keep SNPs reported in dbSNP (v137) using “SnpSift filter 'isHet(GEN[0])'” (v3.5, available as part of the SnpEff package). Read count data for the reference and variant alleles were then retrieved for all these SNP positions in both the tumour and normal bam files. Reads considered PCR duplicates and non-unique were filtered with “samtools view -F 1024” and not considered in the allele read counting.

[Figure 3.8 image: workflow diagram. Heterozygous germline positions are predicted in the normal sample (bcftools; use only dbSNP137 positions), and read count data are retrieved at these positions to give the tumour and normal allele counts. HMMCopy readCounter gives the tumour and normal coverage. From the GRCh37 genome, HMMCopy gcCounter gives the genome GC content, and HMMCopy a) generateMap.pl and b) mapCounter give the genome mappability. These data, with a) the normal content estimation from sSNV data and b) numberClonalClusters (1-5), are input into TitanCNA. The TitanCNA results are converted to segment results with createTITANsegmentfiles.pl, masked for CNV, and summarized as gene-centric sCNA.]

Figure 3.8: Bioinformatics Workflow for Predicting sCNAs from WGS data.

HMMcopy [166] (v0.1.1) was used to generate coverage wig files for the tumour and normal samples using a window size of 1000 base pairs (“readCounter -w 1000”). Additionally, HMMcopy was also used to calculate GC content of the GRCh37 genome (“gcCounter -w 1000”).
Finally, a GRCh37 mappability file was generated by first running HMMcopy’s “generateMap -w 35” to generate a BigWig file, which was used as input into HMMcopy’s “mapCounter -w 1000” to generate a final mappability wig file.

The tumour-normal pair’s read count, coverage data, and normal content estimations (see Section 3.2.3.3 for details on estimation) along with the GRCh37 GC content and mappability data were used as input into the TITAN [165] Bioconductor R package (named TitanCNA, v1.5.7) to predict sCNAs. The TitanCNA “loadDefaultParameters” function was used with the parameters “copyNumber = 4, numberClonalClusters = 1...5, symmetric = TRUE”, followed by setting ploidy = 2 to get the initial parameters. The numberClonalClusters was set to a value between 1 and 5 (more details below). Data were filtered for low and high depth positions (“filterData” with parameters “minDepth = 10, maxDepth = 200”).

TITAN was then run with runClonalEM with the parameters “maxiter = 20, maxiterUpdate = 1500, txnExpLen = 1e9, txnZstrength = 1e9, useOutlierState = FALSE, normalEstimateMethod = fixed, estimateS = TRUE, estimatePloidy = TRUE”. For each tumour-normal pair, TITAN was run 5 times with each run differing by the numberClonalClusters parameter as mentioned earlier. The run with the minimum S_Dbw_validity_index was chosen as the optimal model for each tumour-normal pair. In several tumour-normal pairs (FL1001T2, FL1007T2, FL1008T2, FL1012T2, FL2008T1, FL3004T1, FL3006T1, FL3012T1, FL3015T1), the optimal model selection was not a pragmatic solution. In these pairs, I investigated the model for every numberClonalClusters value and selected the model with the most pragmatic results. For FL1001T1 and FL3012T1, the TITAN results were uninterpretable, indicating that TITAN was having difficulty converging on a pragmatic solution. For these samples, I ran TITAN across a range of normal content initialization values (i.e.
0.1 to 1) to determine if normal content was contributing to the non-convergence. I discovered that a normal content of 1 as an initialization value gave a pragmatic solution and chose to go forward with the results of this initialization value. The final parameter values selected are listed in Table 3.3.

Table 3.3: TITAN Final Selected Parameters.

Patient  Timepoint  Number of Clusters  Normal Content  Ploidy  S_Dbw_validity_index
FL1001   T1         2                   0               2.01    0.3285
FL1001   T2         2                   0.58            2.05    0.2573
FL1004   T1         2                   0.32            2.06    0.1457
FL1004   T2         1                   0.27            2.07    0.1027
FL1005   T1         2                   0.14            2.02    0.1456
FL1005   T2         3                   0.01            1.96    0.1542
FL1006   T1         1                   0.41            2.07    0.2435
FL1006   T2         1                   0.06            1.97    0.1394
FL1007   T1         2                   0.06            1.99    0.2104
FL1007   T2         1                   0.12            2.08    0.1877
FL1008   T1         2                   0.29            2.01    0.2083
FL1008   T2         1                   0.19            2.06    0.1902
FL1009   T1         2                   0.48            1.97    0.2444
FL1009   T2         2                   0.11            2       0.2712
FL1012   T1         1                   0.59            2.04    0.1975
FL1012   T2         2                   0.48            2.08    0.1609
FL1013   T1         2                   0.07            2.09    0.1629
FL1013   T2         2                   0.12            2.13    0.1624
FL1014   T1         2                   0.17            2.02    0.237
FL1014   T2         2                   0.28            1.9     0.1356
FL1016   T1         2                   0.08            2.02    0.1811
FL1016   T2         1                   0.27            2.04    0.0807
FL1017   T1         2                   0.3             2.04    0.1811
FL1017   T2         3                   0.43            2.03    0.1446
FL1018   T1         1                   0.07            2.01    0.2359
FL1018   T2         2                   0.01            2       0.1602
FL1019   T1         2                   0.13            2.28    0.1912
FL1019   T2         2                   0.39            2.26    0.1252
FL1020   T1         3                   0.05            2.04    0.1968
FL1020   T2         2                   0.29            2.06    0.2256
FL2001   T1         2                   0.54            2.04    0.2336
FL2001   T2         2                   0.33            2.02    0.1546
FL2002   T1         2                   0.08            2.1     0.2052
FL2002   T2         2                   0.17            2.08    0.2406
FL2005   T1         2                   0.05            1.99    0.2252
FL2005   T2         1                   0               1.97    0.2719
FL2006   T1         2                   0               2       0.2218
FL2006   T2         4                   0.18            2       0.2376
FL2007   T1         4                   0.06            2       0.3108
FL2007   T2         2                   0               2       0.196
FL2008   T1         2                   0.42            2.01    0.2512
FL2008   T2         3                   0               2       0.176

The output TITAN results were then converted into TITAN segments using the “createTITANsegmentfiles.pl” script with each segment taking on 1 of 25 possible copy number states, which were then collapsed into 1 of 10 possible summary states: 1) Homozygous Deletion, 2) Hemizygous Deletion, 3) Neutral, 4) 3N Gain, 5) 4N Gain, 6) 5N Gain, 7) 6N Gain, 8) 7N Gain, 9) 8N Gain, 10) Somatic LOH. These TITAN copy number states and summary state mappings are listed in Table 3.4.

Table 3.4: TITAN Copy Number State Overview. States from TITAN are collapsed into one of 10 different summary states listed in the “State Summary” column for analysis. HOMD, Homozygous deletion; DLOH, Deletion loss of heterozygosity (LOH); NLOH, Neutral LOH; HET, Diploid heterozygous; ALOH, Amplified LOH; ASCNA, allele-specific copy number (CN) amplification; BCNA, balanced CN amplification; UBCNA, unbalanced CN amplification.

State No.  State  Genotype            State Summary        Copy Number  Major  Minor
0          HOMD   NA                  Homozygous Deletion  0            0      0
1          DLOH   A, B                Hemizygous Deletion  1            1      0
2          NLOH   AA, BB              Somatic LOH          2            2      0
3          HET    AB                  Neutral              2            1      1
4          ALOH   AAA, BBB            Somatic LOH          3            3      0
5          GAIN   AAB, ABB            3N Gain              3            2      1
6          ALOH   AAAA, BBBB          Somatic LOH          4            4      0
7          ASCNA  AAAB, ABBB          4N Gain              4            3      1
8          BCNA   AABB                4N Gain              4            2      2
9          ALOH   AAAAA, BBBBB        Somatic LOH          5            5      0
10         ASCNA  AAAAB, ABBBB        5N Gain              5            4      1
11         UBCNA  AAABB, AABBB        5N Gain              5            3      2
12         ALOH   AAAAAA, BBBBBB      Somatic LOH          6            6      0
13         ASCNA  AAAAAB, ABBBBB      6N Gain              6            5      1
14         UBCNA  AAAABB, AABBBB      6N Gain              6            4      2
15         BCNA   AAABBB              6N Gain              6            3      3
16         ALOH   AAAAAAA, BBBBBBB    Somatic LOH          7            7      0
17         ASCNA  AAAAAAB, ABBBBBB    7N Gain              7            6      1
18         UBCNA  AAAAABB, AABBBBB    7N Gain              7            5      2
19         UBCNA  AAAABBB, AAABBBB    7N Gain              7            4      3
20         ALOH   AAAAAAAA, BBBBBBBB  Somatic LOH          8            8      0
21         ASCNA  AAAAAAAB, ABBBBBBB  8N Gain              8            7      1
22         UBCNA  AAAAAABB, AABBBBBB  8N Gain              8            6      2
23         UBCNA  AAAAABBB, AAABBBBB  8N Gain              8            5      3
24         BCNA   AAAABBBB            8N Gain              8            4      4

To increase the specificity for sCNAs, I masked the segment data with a copy number variant mask. This mask was constructed using the gold standard data from the Database of Genomic Variants (vJuly 2015), peripheral blood samples [70], and normal breast tissues (METABRIC [108]). TITAN segments were overlapped with the copy number variation mask and any segment that overlapped ≥ 25% of its length with any mask segment was labeled as a mask segment and assigned a neutral summary state.

To assign gene-centric copy number, gene coordinates (Ensembl v72) were overlapped with the TITAN segment coordinates. Genes were considered overlapping with a TITAN segment if ≥ 50% of the gene overlapped with the segment and would then be assigned the state of the overlapping TITAN segment.

3.2.3.5 Somatic Structural Rearrangement Prediction

Destruct [167] (v0.2.0) was used to predict structural rearrangements for each patient. For TFL and PFL patients, the normal, T1 and T2 samples were used as input into Destruct to simultaneously predict rearrangements across all samples.
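Returning briefly to the two overlap rules in Section 3.2.3.4 (a segment is masked at ≥ 25% overlap with a mask segment, and a gene takes the state of a segment covering ≥ 50% of it), both reduce to one overlap-fraction computation. The sketch below is illustrative only: the function names are hypothetical, coordinates are assumed to be half-open intervals on a single chromosome, and each mask segment is assumed to be tested individually.

```python
def overlap_fraction(start, end, other_start, other_end):
    """Fraction of the interval [start, end) covered by [other_start, other_end)."""
    overlap = min(end, other_end) - max(start, other_start)
    return max(overlap, 0) / (end - start)


def is_masked(segment, mask_segments, threshold=0.25):
    """True if any single mask segment covers >= threshold of the segment."""
    seg_start, seg_end = segment
    return any(
        overlap_fraction(seg_start, seg_end, m_start, m_end) >= threshold
        for m_start, m_end in mask_segments
    )


def gene_state(gene, titan_segments, threshold=0.5):
    """Summary state of the first segment covering >= threshold of the gene.

    titan_segments is a list of (start, end, state) tuples. Returns None
    when no segment reaches the threshold.
    """
    gene_start, gene_end = gene
    for seg_start, seg_end, state in titan_segments:
        if overlap_fraction(gene_start, gene_end, seg_start, seg_end) >= threshold:
            return state
    return None
```

With a threshold of exactly 50%, a gene split evenly across two segments matches both; this sketch returns the first, which is a design choice the text does not specify.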
For NPFL patients, the normal and T1 samples were used. The following filters were applied for Destruct predictions: 1) 0 reads in the matching normal sample, 2) distance to any other breakpoint is > 50 base pairs, 3) log likelihood > -20, 4) minimum template length > 120, 5) matescore ≤ 10, 6) both rearranged partners must be on an autosomal or sex chromosome, 7) not found as a variant in the Database of Genomic Variants, and 8) number of split reads > 0 if the rearrangement does involve the IgH locus. When considering rearrangements that may affect a gene, the breakpoint must be inside the gene itself and not upstream or downstream of the gene. A rearrangement was considered to be shared if there was ≥ 1 read in both timepoints and timepoint-specific if it contained 0 reads in one timepoint.

3.2.3.6 Tumour Evolution Modelling

Neutral evolution modelling was performed using the methodology described previously by Williams et al. [168]. The methodology assumes that tumour samples undergoing neutral evolution acquire mutations at a rate that follows a 1/VAF power law distribution. A goodness-of-fit measure, R², can then be used as a single measure of how much the sample is evolving under neutral evolutionary dynamics. VAFs from WGS data were used as input, with a minimum VAF of 0.05 and a maximum VAF of 0.25 as cut-offs for subclonal sSNVs, as per the methodology described previously by Williams et al. [168].

Genetic drift trajectory modelling was performed using the Wright-Fisher model [169]. Under this model, the following assumptions are made: 1) generations do not overlap, 2) each copy of an allele in a generation is independently drawn from the previous generation at random, and 3) the number of cells does not change between generations. Simulations were initialized with 10000 diploid cells and a starting mutant VAF of 0.01 for TFL and 0.5 for PFL patients. Each simulation was run for 10000 generations and a total of 1000 simulations were run.
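The three Wright-Fisher assumptions translate directly into a binomial resampling loop: every generation, each of the 2N allele copies is drawn independently from the previous generation's allele frequency. The following is a minimal pure-Python sketch, not the code used in this work. The function name and signature are hypothetical, and the demo uses far smaller values than the 10000 cells and 10000 generations stated above so that it runs quickly.

```python
import random


def wright_fisher(n_cells, start_vaf, generations, rng):
    """Simulate one genetic drift trajectory of a mutant allele.

    n_cells     -- constant number of diploid cells (2 * n_cells allele copies)
    start_vaf   -- starting mutant allele fraction
    generations -- number of non-overlapping generations to simulate
    rng         -- a random.Random instance, for reproducibility
    Returns the list of mutant allele fractions, one per generation,
    including the starting generation.
    """
    n_alleles = 2 * n_cells
    count = round(start_vaf * n_alleles)
    trajectory = [count / n_alleles]
    for _ in range(generations):
        p = count / n_alleles
        # Each allele copy is an independent draw from the previous generation.
        count = sum(1 for _ in range(n_alleles) if rng.random() < p)
        trajectory.append(count / n_alleles)
    return trajectory


# Small demo: 100 cells, starting VAF 0.5, 50 generations.
demo = wright_fisher(100, 0.5, 50, random.Random(1))
print(len(demo), demo[0])
```

Once the mutant allele is lost (fraction 0) or fixed (fraction 1), the trajectory stays there, which is the absorbing behaviour expected under drift alone.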
K-means clustering, ranging from 2 to 10 clusters (k), was used to cluster together simulations that exhibit similar genetic drift trajectories. To decide on the optimal number of clusters, the total within-cluster sum of squares was plotted against k and the optimal number of clusters was selected based on the elbow of the curve. This resulted in 5 and 4 optimal clusters for TFL and PFL patients respectively.

After identifying the cluster with the genetic drift trajectory most similar to the observed patterns in TFL patients, the proportion of simulations of that trajectory was used as the background trajectory rate. This rate was then used as input into the binomial exact test to quantify the probability of 13 out of 15 TFL patients following the observed patterns.

3.2.4 Targeted Deep Amplicon Sequencing Data Analysis

The following method sections pertain to the targeted deep amplicon sequencing data.

3.2.4.1 Selection of Positions for Deep Amplicon Sequencing

For each patient, at least 192 predicted sSNVs, across both timepoints, were selected to be taken forward for targeted deep amplicon sequencing. These mutations included all non-synonymous (Figure 3.9B) and synonymous coding sSNVs (Figure 3.9C) as well as coding sIndels. As this sum typically fell short of 192, I backfilled the list of positions for deep amplicon sequencing to a total of 192 by also including non-coding sSNVs that were proportionally selected from mutational clusters generated from each patient’s sSNV VAF (Figure 3.9D). These clusters are the same clusters identified through the VBBMM used for tumour content estimation (Figure 3.9A, Section 3.2.3.3).

3.2.4.2 Somatic Single Nucleotide Variant Validation

Reads were filtered out if they either 1) aligned > 10 base pairs away from an amplicon’s start or end position, 2) had > 5 mismatched bases, or 3) had a mapping quality < 30. Read counts supporting the reference and variant base were then extracted for each predicted position from WGS.
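The three read-level filters just listed can be expressed as a single predicate. This is a hedged sketch rather than the actual implementation: the `AlignedRead` record and `keep_read` function are hypothetical, and criterion 1 is interpreted here as the read's start lying more than 10 bp from the amplicon start, or its end more than 10 bp from the amplicon end, which is one plausible reading of the text.

```python
from dataclasses import dataclass


@dataclass
class AlignedRead:
    """Hypothetical minimal read record used only for this illustration."""
    start: int       # leftmost aligned reference position
    end: int         # rightmost aligned reference position
    mismatches: int  # number of mismatched bases
    mapq: int        # mapping quality


def keep_read(read, amp_start, amp_end,
              max_offset=10, max_mismatches=5, min_mapq=30):
    """Return True if the read survives all three filters."""
    # 1) Aligned more than max_offset bp away from the amplicon start or end
    #    (interpretation assumed; see the lead-in).
    if abs(read.start - amp_start) > max_offset:
        return False
    if abs(read.end - amp_end) > max_offset:
        return False
    # 2) More than max_mismatches mismatched bases.
    if read.mismatches > max_mismatches:
        return False
    # 3) Mapping quality below min_mapq.
    if read.mapq < min_mapq:
        return False
    return True
```

The thresholds are exposed as keyword arguments so that the boundary cases (exactly 10 bp, exactly 5 mismatches, mapping quality exactly 30) are kept, matching the strict inequalities in the text.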
Only reads that had a base quality > 30 were considered in the counting. After filtering, the mean ± standard deviation of targeted positions was 10733.37 ± 9025.205 for each sample.

Figure 3.9: Selection of sSNV Positions for Deep Sequencing Validation. At least 192 positions were selected for deep sequencing validation. This includes all coding sIndels, non-synonymous coding sSNVs (panel B), and synonymous coding sSNVs (panel C). To backfill positions to meet the 192 position requirement, I then proportionally sampled sSNVs from the different clusters (panel D) identified by VBBMM (panel A) on the T1 and T2 VAFs.

For each targeted position, a background error rate was calculated by interrogating the 30 base pairs up- and downstream of the targeted position and calculating the allelic ratio of the most frequent base. The mean VAF of these positions was considered the background error rate after ignoring germline and somatic mutation positions. A binomial exact test was then used to test if the predicted variant allele was present using a p-value threshold of < 0.000001. A position was considered somatic if the variant allele was present in the tumour and absent in the matching normal.

In some normal samples, there was evidence of contaminating tumour DNA leading to the presence of the variant allele in the matching normal. To deal with these situations, I performed a one-tailed Fisher’s exact test on the tumour and normal read counts, testing if the allelic ratio was higher in the tumour. A significance level of < 0.05 was set to specify if a position was somatic. The mean ± standard deviation validation rate (precision) was 96.3% ± 5.4%.

3.2.4.3 Estimation of Mutational Cellular Prevalence

For each TFL and PFL patient, I inferred the mutational cellular prevalence of each validated sSNV (i.e. the
proportion of cancer cells with the sSNV) for both the T1 and T2 sample. Specifically, any mutation that was validated in a T1 and/or T2 sample (Section 3.2.4.2, Figure 3.10A,B) was used as an input into PyClone (v0.12.7). PyClone requires for each sSNV: 1) read count of the variant allele, 2) read count for the reference allele, and 3) major and minor copy number. For 1) and 2), these data were retrieved from the targeted deep amplicon sequencing (Section 3.2.4). The major and minor copy number data of each feature was taken from the TITAN segments (Section 3.2.3.4). In addition to these inputs for each sSNV, the tumour content estimated from the sSNV data (Section 3.2.3.3) was also used. Any sSNVs without matching copy number data or in a homozygous deleted region were not considered for PyClone analysis.

[Figure 3.10 image: panels A to F. Axis labels include T1 WGS VAF, T2 WGS VAF, T1 DeepSeq VAF, and Clonal Prevalence (T1, T2); phylogeny nodes are numbered 1 to 5.]

Figure 3.10: Inference of Clonal Phylogenies from Sequencing Data Workflow. sSNVs, predicted from the T1 (x-axis) and/or T2 (y-axis) samples from WGS (panel A), were selected for targeted deep sequencing validation (panel B). Validated positions were used as input into PyClone to determine the mutational cellular prevalence of each sSNV (panel C). sSNVs with similar mutational cellular prevalences are clustered with PyClone with each cluster’s cellular prevalence being represented by the mean cellular prevalence of all sSNVs in the cluster (panel D). These cluster cellular prevalences were used as input into Citup to construct clonal phylogenies (panel E) and clonal prevalences for the T1 and T2 samples (panel F).
sSNVs in each node are propagated down to their children nodes.

The PyClone “build_mutations_file” with the parameters “--var_prior parental_copy_number” was used to construct sample-centric mutation yaml files. Patient configuration yaml files were constructed manually with sample-specific tumour content estimations and “error_rate = 0.001” along with the following parameters:

    density: pyclone_beta_binomial
    num_iters: 100000
    base_measure_param:
      alpha: 1.0
      beta: 1.0
    concentration:
      value: 1.0
      prior:
        shape: 1.0
        rate: 0.001
    beta_binomial_precision_params:
      value: 1000.0
      prior:
        shape: 1.0
        rate: 0.0001
      proposal:
        precision: 0.01

These yaml files were then used as input into the “analyze” function of PyClone with the parameters “--seed 1000”. The “build_table” function with the parameter “burnin 50000” was then used to retrieve the estimated mutational cellular prevalence of each sSNV (Figure 3.10C). As some sSNVs had large uncertainty in their estimated mutational cellular prevalence, I performed a credible interval analysis using the “CI_filter.py” script from PostPy (v0.1) using the parameters “-b 50000 -i 90 -r 0”. Any sSNV with a 90% credible interval larger than 0.3 in any sample of a patient was filtered out. In addition, positions in the IGH location with low coverage were also filtered out. PyClone was then re-run without the filtered sSNVs.

3.2.5 Capture Sequencing Data Analysis

The following method sections pertain to the capture sequencing data.

3.2.5.1 Capture Panel and Library Construction

A total of 86 genes (Table 3.5) were selected for capture-based targeted sequencing. These genes were selected based on the following 6 criteria:

1. recurrence in FL of > 5% [170, 171],
2. recurrence in DLBCL > 5% (in lab data) or reported to be consistently mutated in Burkitt lymphoma across studies [172–174],
3. genes significantly mutated (q < 0.05) in the compiled dataset from this study and others [141, 142] after applying the MutSigCV algorithm [175] (v1.4),
4. T1 gene mutations associated with early transformation/progression (< 5 years) vs. no progression (for at least 5 years) based on the study and external cohorts [141, 142],
5. genes that were found to be T2-specific in at least 3 cases from this study and external cohorts [141, 142], and
6. IL4R, PTPN1, NOTCH2 and RFX5.

Table 3.5: Capture Sequencing Gene Panel (n = 86 genes). Genes in which the coding sequence was sequenced and analyzed in the capture sequencing cohort, and criteria by which they were selected. Criterion 1 - Known recurrence in FL > 5%. Criterion 2 - Known recurrence in DLBCL > 5%, or recurrently mutated in Burkitt lymphoma. Criterion 3 - Genes significantly mutated using MutSigCV in the compiled dataset of the whole-genome or whole-exome sequencing cases from this study and from the studies by Okosun et al. [141] and Pasqualucci et al. [142]. Criterion 4 - Genes that, when mutated, are associated with early progression or transformation in the studies mentioned in Criterion 3. Criterion 5 - Genes found to be T2-specific in at least 3 cases from this study and studies mentioned in Criterion 3. Other - IL4R, PTPN1, NOTCH2, RFX5.
Other - IL4R, PTPN1, NOTCH2, RFX5.

Gene Name  Criterion1  Criterion2  Criterion3  Criterion4  Criterion5  Other
ACTB       FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
ARHGEF1    FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
ARID1A     TRUE        FALSE       TRUE        TRUE        FALSE       FALSE
ARID1B     TRUE        FALSE       FALSE       FALSE       FALSE       FALSE
ATP6AP1    TRUE        FALSE       FALSE       FALSE       FALSE       FALSE
ATP6V1B2   TRUE        FALSE       TRUE        FALSE       FALSE       FALSE
B2M        TRUE        TRUE        TRUE        TRUE        TRUE        FALSE
BCL10      FALSE       FALSE       TRUE        FALSE       FALSE       FALSE
BCL2       TRUE        TRUE        FALSE       TRUE        FALSE       FALSE
BCL6       TRUE        TRUE        TRUE        TRUE        FALSE       FALSE
BCL7A      TRUE        FALSE       TRUE        FALSE       FALSE       FALSE
BCR        FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
BTG1       TRUE        TRUE        FALSE       FALSE       FALSE       FALSE
BTG2       FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
CARD11     TRUE        TRUE        TRUE        FALSE       FALSE       FALSE
CCL23      FALSE       FALSE       TRUE        FALSE       FALSE       FALSE
CCND3      TRUE        TRUE        TRUE        TRUE        FALSE       FALSE
CD58       FALSE       TRUE        TRUE        FALSE       FALSE       FALSE
CD70       FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
CD79B      FALSE       TRUE        TRUE        FALSE       FALSE       FALSE
CD83       FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
CHD8       FALSE       FALSE       FALSE       TRUE        FALSE       FALSE
CIITA      FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
CREBBP     TRUE        TRUE        TRUE        FALSE       TRUE        FALSE
CTSS       TRUE        FALSE       FALSE       FALSE       FALSE       FALSE
DNAH9      FALSE       FALSE       FALSE       TRUE        FALSE       FALSE
DTX1       TRUE        FALSE       TRUE        FALSE       FALSE       FALSE
EBF1       FALSE       FALSE       TRUE        TRUE        FALSE       FALSE
EBF3       FALSE       FALSE       FALSE       FALSE       TRUE        FALSE
EEF1A1     TRUE        FALSE       FALSE       FALSE       FALSE       FALSE
EP300      TRUE        TRUE        FALSE       FALSE       FALSE       FALSE
EVI2A      FALSE       FALSE       TRUE        FALSE       FALSE       FALSE
EZH2       TRUE        TRUE        TRUE        TRUE        FALSE       FALSE
FAS        FALSE       FALSE       TRUE        FALSE       TRUE        FALSE
FAT4       FALSE       FALSE       FALSE       TRUE        FALSE       FALSE
FOXO1      TRUE        TRUE        TRUE        TRUE        FALSE       FALSE
GNA13      TRUE        TRUE        TRUE        FALSE       FALSE       FALSE
GNAI2      TRUE        FALSE       TRUE        FALSE       FALSE       FALSE
HIST1H1B   FALSE       FALSE       TRUE        FALSE       FALSE       FALSE
HIST1H1C   FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
HIST1H1E   TRUE        TRUE        TRUE        TRUE        TRUE        FALSE
HIST1H2AM  FALSE       FALSE       FALSE       FALSE       TRUE        FALSE
HLA-DMB    FALSE       FALSE       TRUE        FALSE       FALSE       FALSE
HVCN1      FALSE       FALSE       TRUE        FALSE       FALSE       FALSE
ID3        FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
IKZF3      TRUE        FALSE       FALSE       FALSE       FALSE       FALSE
IL4R       FALSE       FALSE       FALSE       FALSE       FALSE       TRUE
IRF4       FALSE       FALSE       FALSE       TRUE        FALSE       FALSE
IRF8       TRUE        TRUE        TRUE        FALSE       TRUE        FALSE
ITPKB      FALSE       FALSE       FALSE       TRUE        FALSE       FALSE
KLHL6      FALSE       TRUE        TRUE        FALSE       FALSE       FALSE
KMT2C      TRUE        FALSE       FALSE       FALSE       FALSE       FALSE
KMT2D      TRUE        TRUE        TRUE        TRUE        FALSE       FALSE
LRRC7      FALSE       FALSE       FALSE       TRUE        FALSE       FALSE
MEF2B      TRUE        TRUE        TRUE        TRUE        FALSE       FALSE
MEF2C      FALSE       FALSE       TRUE        FALSE       FALSE       FALSE
MKI67      FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
MYC        FALSE       TRUE        TRUE        TRUE        FALSE       FALSE
MYD88      FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
MYOM2      FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
NLRC5      FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
NOTCH1     FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
NOTCH2     FALSE       FALSE       FALSE       FALSE       FALSE       TRUE
P2RY8      FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
PIM1       TRUE        TRUE        FALSE       TRUE        FALSE       FALSE
POU2AF1    FALSE       FALSE       TRUE        FALSE       FALSE       FALSE
PTPN1      FALSE       FALSE       FALSE       FALSE       FALSE       TRUE
RFX5       FALSE       FALSE       FALSE       FALSE       FALSE       TRUE
RHOA       FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
RRAGC      FALSE       FALSE       TRUE        FALSE       TRUE        FALSE
S1PR2      FALSE       FALSE       TRUE        FALSE       FALSE       FALSE
SGK1       FALSE       TRUE        TRUE        TRUE        FALSE       FALSE
SMARCA4    TRUE        TRUE        FALSE       FALSE       FALSE       FALSE
SOCS1      FALSE       FALSE       TRUE        FALSE       FALSE       FALSE
STAT3      FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
STAT6      TRUE        FALSE       TRUE        FALSE       FALSE       FALSE
TCF3       FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
TLR2       FALSE       FALSE       FALSE       TRUE        FALSE       FALSE
TMEM30A    FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
TNFAIP3    TRUE        TRUE        TRUE        FALSE       FALSE       FALSE
TNFRSF14   TRUE        TRUE        TRUE        TRUE        FALSE       FALSE
TP53       TRUE        TRUE        TRUE        TRUE        FALSE       FALSE
UNC5C      FALSE       TRUE        FALSE       FALSE       FALSE       FALSE
UPF1       FALSE       FALSE       TRUE        FALSE       FALSE       FALSE
XBP1       FALSE       FALSE       TRUE        FALSE       FALSE       FALSE
ZFP36L1    FALSE       TRUE        FALSE       FALSE       FALSE       FALSE

An additional 20 genes were selected, 12 of which overlapped with the 86 above-mentioned genes, to assess mutations in targets of somatic hypermutation (Table 3.6). Libraries were constructed from either 500 ng of fresh-frozen genomic DNA or 200 ng of FFPE-derived genomic DNA, and captured using custom SureSelectXT2 baits (Agilent). Captured libraries were pooled to a maximum of 46 libraries per pool and each pool was sequenced on one Illumina HiSeq lane, generating 125 bp indexed reads (V4 chemistry). In total, the capture space was 452129 base pairs.

Table 3.6: Regions Selected for Investigating Somatic Hypermutation (n = 20 genes). Genes in which the 5’ sequences were analyzed for coding and non-coding mutations. FDR value refers to the result of a Fisher’s exact test assessing enrichment of mutations observed in the WGS cohort in RGYW and WRCY motifs.
TSS, Transcription start site; Chr, Chromosome.

Gene Name  Chr  TSS        Strand  Targeted Region            FDR
BACH2      6    91006628   -       chr6:91002628-91007628     4.23E-06
BCL2       18   60987362   -       chr18:60983362-60988362    3.30E-44
BCL6       3    187463516  -       chr3:187459516-187464516   1.37E-10
BCL7A      12   122457328  +       chr12:122456328-122461328  3.16E-09
BCR        22   23521891   +       chr22:23520891-23525891    1.89E-02
BPTF       17   65821640   +       chr17:65820640-65825640    8.33E-02
BTG1       12   92539674   -       chr12:92535674-92540674    1.36E-02
BTG2       1    203274619  +       chr1:203273619-203278619   4.72E-02
CD83       6    14117872   +       chr6:14116872-14121872     1.94E-06
CIITA      16   10971055   +       chr16:10970055-10975055    7.66E-07
CXCR4      2    136875736  -       chr2:136871736-136876736   1.57E-04
IRF8       16   85932409   +       chr16:85931409-85936409    9.63E-02
LTB        6    31550300   -       chr6:31546300-31551300     4.18E-03
MYC        8    128747680  +       chr8:128746680-128751680   7.60E-01
PAX5       9    37027120   -       chr9:37023120-37028120     3.07E-19
PIM1       6    37137979   +       chr6:37136979-37141979     2.25E-03
RHOH       4    40198526   +       chr4:40197526-40202526     1.30E-06
SOCS1      16   11350037   -       chr16:11346037-11351037    1.45E-01
TCL1A      14   96180534   -       chr14:96176534-96181534    1.69E-04
TMSB4X     X    12995227   +       chrX:12994227-12999227     2.71E-04

3.2.5.2 Somatic Single Point Mutations Analysis

MutationSeq (v4.3.8) [162] was used to predict sSNVs in deep sequencing mode in single sample and in tumour-normal pair mode (when applicable). The models “model_deep_single_v0.1.npz” and “model_deep_v0.2.npz” were used for single and paired mode respectively. For FF samples, predictions were filtered based on the following criteria: mutation probability > 0.9 and coverage ≥ 100, or mutation probability > 0.9 and coverage < 100. For FFPE samples, predictions were filtered as follows: mutation probability > 0.7 and coverage ≥ 50, or mutation probability > 0.9 and coverage < 50. For cases with available germline DNA sequenced (n = 84), SNVs were filtered out if they were present in the germline samples.
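The probability and coverage thresholds stated for the MutationSeq predictions can be written as a small predicate. This is only an illustrative sketch: the function name and argument names are hypothetical and are not part of MutationSeq. The FF rule is transcribed literally from the text; as written, both of its branches share the same probability cutoff, so coverage does not change the outcome for FF samples.

```python
def passes_mutationseq_filter(probability, coverage, sample_type):
    """Return True if a predicted sSNV passes the thresholds stated in the text.

    Hypothetical helper for illustration; not part of MutationSeq itself.
    """
    if sample_type == "FF":
        # Literal transcription of the stated FF criteria.
        return (probability > 0.9 and coverage >= 100) or (
            probability > 0.9 and coverage < 100
        )
    if sample_type == "FFPE":
        # Well-covered FFPE positions accept a lower probability cutoff.
        return (probability > 0.7 and coverage >= 50) or (
            probability > 0.9 and coverage < 50
        )
    raise ValueError(f"unknown sample type: {sample_type!r}")
```

For example, an FFPE call with probability 0.8 passes at 60x coverage but not at 40x coverage.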
For cases without available germline DNA sequenced, SNVs were filtered out if they were found in at least 2 normal samples or in dbSNP (v147). For 8 cases without germline DNA sequenced by targeted capture sequencing, SNPs were filtered out using matching germline whole-genome sequencing libraries. For samples with matching germline, indels were called using Strelka (v1.0.13) and the output was not filtered. For samples without matching germline, indels were called using VarScan and germline variants were filtered out using dbSNP (v137) and the 1000 genomes. SNVs and indels were then combined. Artifacts were defined as variants that occurred more than 5 times in either fresh-frozen or FFPE samples, with a mean variant allele frequency of < 0.15 or > 0.9; or variants that occurred in more than 10% of either fresh-frozen or FFPE samples. Any SNV calls that occurred within a distance of < 10 base pairs of an indel were filtered out. Variants were considered to be putative SNPs if they occurred more than 4 times and had a mean variant allele frequency > 0.45 and < 0.55, with a standard deviation of < 0.1.

sSNVs were annotated with SnpEFF and if multiple effects were called for a single position, the effect with the putatively greatest impact was chosen as described above for SNV calling in the WGS cases. Multiple, distinct mutation calls in a given gene were collapsed into a single call using the same effect filter, and only coding mutations were taken forward for analysis.

3.2.6 Digital Droplet Polymerase Chain Reaction

Digital droplet polymerase chain reaction (ddPCR) was performed on selected pairs of T1 and T2 biopsies. As controls, 200 ng of genomic DNA were used from 3 reactive lymph node samples from unrelated individuals and the data from these samples were pooled. For the test samples, a minimum of 600 ng and 200 ng of genomic DNA were used for T1 and T2 samples, respectively. Primers and FAM- or HEX-tagged probes were designed by Integrated DNA Technologies (IDT).
PCR was performed in droplets generated using the Biorad QX200 droplet generator and each 20 µL reaction mix contained 900 nM forward and reverse primers, 250 nM FAM and HEX probes and 5 units of HindIII, in addition to 1x ddPCR Supermix for Probes (No dUTP) (Biorad), genomic DNA and water. For each pair of probes, the optimal annealing temperature was determined using a temperature gradient. PCR conditions were as follows: 95 °C for 10 minutes, (94 °C for 30 seconds, optimal annealing temperature for 90 seconds) x 39, 98 °C for 10 minutes. Droplets were assessed for FAM or HEX fluorescence using the Biorad QX200 droplet reader. In all instances, no signal for either wild-type or mutant DNA could be detected in the non-template control wells. Results from T2 samples were clustered using Gaussian mixture models and the mclust R package (v5.1), setting the number of mixture components to 4 and initializing hierarchical clustering on a random sample of 1000 datapoints. If clustering using this approach did not reveal 4 distinct clusters corresponding to empty droplets, wild-type only droplets, mutant only droplets or double positive droplets, clustering was repeated, with different random seeds, until an appropriate solution was found. The seed of the first successful clustering was then set for all other samples corresponding to the same case. As DNA degradation with time potentially competes with the detection of rare alleles, only single, mutant signal-positive droplets were considered to be positive events, as described in Wong et al. [176].

3.3 Results

3.3.1 Transformed/Progressed Follicular Lymphoma Samples Exhibit Higher Mutational Burden than Diagnostic Samples

An average of 7133.29 ± 3107.02 sSNVs (ranging from 2184 to 21802), 512.63 ± 296.67 sIndels (ranging from 70 to 1801) and 26.24 ± 21.28 rearrangements (ranging from 4 to 112) were observed across all tumour samples (Figure 3.11).

[Figure 3.11 image labels: Transformed (TFL), Progressed (PFL), Nonprogressor (NPFL); Coding Single Nucleotide Variants; Insertions/Deletions Type; Copy Number State; Somatic LOH; Structural Rearrangement Type; IHC (Negative, Positive); Grade; Tumor Content; Cell of Origin; Fraction of Genome with Altered Copy Number Status]

Figure 3.11: High-Level WGS Analysis Overview. Number of genomic alterations by sample and by clinical group. For the TFL and PFL patients, the T1 and T2 samples are placed beside each other. Each panel represents a different mutation type (sSNV, sIndel, sCNA, structural rearrangement) with the number of mutations on the y axes. Different colors represent the different categories of mutations within each mutation type. For sCNAs, the fraction of genome with altered copy number status is plotted and copy number states are mutually exclusive. The somatic LOH class encapsulates all LOH events regardless of their absolute sCNA state (e.g. 3N, 4N, etc). Refer to Table 3.4 for more details on copy number state annotations.

Temporal analysis revealed that the mutational burden was significantly higher in T2 than in T1 samples for all mutation types (in both TFL and PFL patients) (Figure 3.12A) and that this difference was independent of the time interval between sampling (Figure 3.13). Notably, baseline mutation rates at diagnosis were no different between the three groups of T1 samples (TFL, PFL and NPFL) in sSNVs, sIndels and proportion of the genome altered by sCNAs (Figure 3.12B), suggesting the increase in mutation rate for TFL and PFL was acquired after diagnosis.
However, the number of structural rearrangements was higher in TFL T1 samples (31.33 ± 23.29) than in PFL (17 ± 8.88) and NPFL (16.9 ± 13.76) (Kruskal-Wallis P = 0.026, Figure 3.12B), consistent with TFL cases at diagnosis harbouring an increased propensity to accumulate translocations. Comparison of TFL and PFL T2 samples revealed higher sIndels (one-tailed Wilcoxon P < 0.001), proportion of genome altered by sCNAs (one-tailed Wilcoxon P = 0.018) and number of rearrangements in TFL (one-tailed Wilcoxon P = 0.028) (Figure 3.12B), suggesting histological transformation is associated with higher mutational rates in the structural genome relative to samples progressed on therapy. These results indicated a higher mutational burden in T2 samples relative to diagnostic samples with a more pronounced effect in TFL cases.

Figure 3.12: Comparative Analysis Between Clinical Groups. Panel A: Genomic alteration load in T1 vs. T2 samples (for TFL and PFL patients). A one-tailed Wilcoxon test was used to assess for a higher mutational burden in T2 samples compared to T1 samples. Panel B: Number of genetic alterations by timepoint and by clinical group for sSNVs, sIndels, and rearrangements. Fraction of genome that is copy number altered for sCNAs. A Kruskal-Wallis test was used to assess for differences in mutational burden between T1 samples of TFL, PFL, and NPFL clinical groups.
A one-tailed Wilcoxon test was used to assess for differences in mutational burden between T2 samples of TFL and PFL clinical groups.

[Figure 3.13 axes: x, Time between T2 and T1 Samples (Days); y, Difference in T2 and T1 Mutation Load; colour, Patient Type (TFL, PFL)]

Figure 3.13: Correlation of Mutation Load Difference with Time between Samples.

3.3.2 Histological Transformation Emerges from the Expansion of a Rare Subclone in Diagnostic Samples

To investigate whether mutational burden differences could be attributed to shifts in the clonal architecture between T1 and T2 samples, I computed the distribution of VAFs of the union set of sSNVs predicted in both T1 and T2 samples for each patient (Figure 3.4). All distributions showed evidence of shared clonal ancestry between T1 and T2 samples accompanied by substantial numbers of T1-specific (0.175 ± 0.105 [0.038-0.431]) and T2-specific mutations (0.366 ± 0.166 [0.063-0.664]) (Figure 3.14, contour density on T1 and T2 axes).

To infer clonal dynamic patterns, ∼192 sSNVs per patient were selected from these distributions for deep targeted amplicon sequencing to provide precise allelic measurements. Validated sSNVs, inferred from ∼10733-fold mean coverage, were used as markers of clonal lineage for each patient, and clonal phylogenies were then constructed to provide a life history of each patient’s tumour (Figure 3.10).

Figure 3.14: WGS VAF Distribution Across all TFL and PFL Patients. The x-axes represent the T1 samples while the y-axes represent the T2 samples.

In 13 of 15 TFL patients (87%), I observed dramatic clonal dynamics, characteristic of T2 samples dominated by clones (or phylogenetic lineages) that were extremely rare in T1 samples (Figure 3.15, Figure 3.16 and Figure 3.17). Data were consistent with a mode of evolution for TFL patients whereby clade-specific clones that are undetectable or rare in T1 expand to dominate T2 disease.
This suggests diagnostic samples are not likely to possess reliable predictors of transformation in the majority of cases, and that the clonal dynamics occurring after diagnosis likely underpin histological change. This pattern was independent of time to transformation. For example, the T2 sample from FL1007 (transformed after 14.57 years, Figure 3.15), characterized by FOXO1 and BCL6 mutations in the ancestral clone (cluster 1), was entirely composed of a clonal lineage that harboured B2M and CCND3 mutations (clusters 2 and 3) that were near 0 prevalence levels in T1 samples. Notably, these clones were mutually exclusive with the clonal lineage dominating the T1 samples (clusters 4-7-6, and 5). FL1017 (transformed after 0.42 years, Figure 3.15), characterized by CREBBP and KMT2D mutations in its ancestral clone, harboured a T2-specific lineage containing EZH2 and FOXO1 mutations (cluster 2 and 1) exhibiting a similar distribution of clones to FL1007. This pattern of clonal dynamics was independent of treatment regimen and was found in untreated cases (observation alone) (FL1007, FL1006, FL1012, FL1014, FL1019) and those cases treated with rituximab or combination therapy (FL1001, FL1005, FL1013, FL1016, FL1004, FL1008, FL1017). The pattern of expansion from undetectable or extremely rare clones was validated using orthogonal ddPCR technology (Section 3.2.6) in 3/3 TFL cases attempted, confirming that a clone as rare as 2 out of approximately 10^5 cells at diagnosis came to dominate the transformed specimen (Figure 3.18A,B, Figure 3.19A-C). Two cases (13%) exhibited clonal dynamics that contrasted with the dominant pattern. In these cases (FL1009 and FL1020, both untreated and both with relatively short times to transformation of 0.39 years and 0.78 years, respectively), the dynamic properties showed conserved clonal architecture (FL1009) or only modest dynamics (FL1020). Thus, a small minority of cases may already contain the properties driving transformation at time of diagnosis.
Together, these results reveal a striking pattern of clonal dynamics underpinning histological transformation in the majority of TFL cases, independent of time to transformation and treatment regimen.

[Figure 3.15 panel columns: sSNV Cellular Prevalences; Mean Cluster Cellular Prevalences; Clonal Phylogenies; Clonal Prevalences (T1, T2). Each patient row is annotated with time to transformation and treatment between samples.]

Figure 3.15: Clonal Phylogenies of TFL patients (Part I). Progression from mutation cellular prevalences to clonal phylogenies and clonal prevalences for each TFL patient. For each given patient, the leftmost plot shows the PyClone cellular prevalence of each validated sSNV (i.e. somatic in either T1 and/or T2) in T1 (x-axes) and T2 (y-axes) with each mutation colored by the cluster they belong to. The following plot to the right represents the cluster cellular prevalence (mean cellular prevalence of all mutations in the cluster) with the size of the dot representing the number of mutations in the cluster. This is followed by a clonal phylogeny and then a stacked bar plot representing the clonal prevalence of each clone in the T1 and T2 sample.
Colors of the clusters have no meaning across patients.

[Figure 3.16 panel columns: sSNV Cellular Prevalences; Mean Cluster Cellular Prevalences; Clonal Phylogenies; Clonal Prevalences (T1, T2)]

Figure 3.16: Clonal Phylogenies of TFL patients (Part II). Same legend as Figure 3.15.

[Figure 3.17 panel columns: sSNV Cellular Prevalences; Mean Cluster Cellular Prevalences; Clonal Phylogenies; Clonal Prevalences (T1, T2)]

Figure 3.17: Clonal Phylogenies of TFL patients (Part III). Same legend as Figure 3.15.

[Figure 3.18 panels: A, FL1012: IRF4 chr6:394908G>T; B, FL1019: CCND3 chr6:41903688A>T; C, FL2001: ARID1A chr1:27106880G>A. Plot columns: Deep amplicon sequencing; Digital droplet PCR (T2, T1, Control)]

Figure 3.18: Ultra-sensitive Detection of Low Prevalence Clones in T1 Samples (Part I).
Shown are 3 mutations (panels A to C) in 3 patients (FL1012, FL1019 and FL2001) in which PyClone suggested that the expanded T2-dominant mutation clusters are present at near zero prevalence in T1. No evidence of read support, when compared to background, was found for T2-associated mutations in T1 for case FL1012, in contrast to cases FL1019 and FL2001 (leftmost plots). Background refers to VAFs of all possible single nucleotide changes in the vicinity of the mutation of interest (defined as up to 50 base pairs upstream and up to 50 base pairs downstream). The results are confirmed by digital droplet PCR (rightmost plots). Color coding in the ddPCR plots is as follows: grey = empty droplets, blue = single-positive droplets for wild-type allele, purple = double-positive droplets, red = single-positive droplets for mutant allele.

[Figure 3.19 panels: A, FL1004: CD58 chr1:117086940A>C; B, FL1012: LRP1B chr2:141108417G>T; C, FL1019: CD83 chr6:14118192G>A; D, FL2001: ATP6V1B2 chr8:20074768G>A; E, FL2001: IL11RA chr9:34660330C>T. Plot columns: Deep amplicon sequencing; Digital droplet PCR (T2, T1, Control)]

Figure 3.19: Ultra-sensitive Detection of Low Prevalence Clones in T1 samples (Part II). Shown are 5 mutations (panels A to E) in 4 patients (FL1004, FL1012, FL1019 and FL2001) in which PyClone suggested that the expanded T2-dominant mutation clusters are present at near zero prevalence in T1. Legend is the same as Figure 3.18.

3.3.3 Clones Dominant in Progressed Samples were Prevalent in Diagnostic Samples

Progressed samples exhibited markedly different patterns of clonal dynamics relative to transformed cases (Figure 3.20). Four cases (FL2002, FL2005, FL2007, and FL2008) harboured readily detectable clones at T1, which expanded to full clonal prevalence during treatment with immuno-chemotherapy. This suggests that clones harbouring treatment-resistance properties were already present at diagnosis, and that symptomatic disease progression may be attributed to selection of clones that were major constituents of the diagnostic sample.
This mode of progression is reminiscent of clonal evolution described in chronic lymphocytic leukemia, another mature, incurable, and typically relapsing lymphoid malignancy [16, 137]. FL2006 showed a slightly different pattern whereby the ancestral clone dominated T1 and T2 samples but was accompanied by modest dynamics, including expansion of a low prevalence clone (cluster 2) in the T2 sample. An exceptional case (FL2001) in the PFL group exhibited dynamics similar to TFL cases (validated with ddPCR; Figure 3.18C, Figure 3.19D,E), with a T2-specific lineage with ARID1A mutation (cluster 2, 3) coming to dominate the relapse sample with no evidence of the T1 clones (cluster 4, 5). This patient initially presented with indolent FL, received single agent rituximab and presented 4 years after diagnosis with symptomatic, progressive lymphoma unresponsive to three lines of systemic therapy, leading to the patient’s death. In this case, the phylogenetic structure was analogous to the TFL pattern, yet the biopsy from T2 showed no evidence of large cell transformation. Thus, treatment resistance patterns accompanied by significant clonal dynamics can occur in FL in the absence of overt transformation. PFL clonal dynamics suggest that progression on therapy is driven by a starkly different mode of evolution than what was observed for TFL.
These two clinical end points are likely underpinned by non-overlapping evolutionary mechanisms, with PFL harbouring intrinsically resistant properties at diagnosis and TFL generally acquiring the dominant transformation phenotype after diagnosis.

[Figure 3.20 panel columns: sSNV Cellular Prevalences; Mean Cluster Cellular Prevalences; Clonal Phylogenies; Clonal Prevalences (T1, T2). Each patient row is annotated with time to progression and treatment between samples.]

Figure 3.20: Clonal Phylogenies of PFL Patients. Progression from mutation cellular prevalences to clonal phylogenies and clonal prevalences for each PFL patient.

3.3.4 TFL Clonal Dynamics are Inconsistent with Neutral Evolution

To determine whether the observed patterns of evolution in TFL and PFL patients could be explained by genetic drift and not positive selection, I considered the observed patterns under neutral evolutionary dynamics. Using the methodology described by Williams et al. [168] to model neutral evolution, I explored for each specimen how closely its mutational burden and clonal architecture followed a power law distribution, with a closer fit (assessed through R²) taken as greater evidence of neutral evolution (Figure 3.21). For TFL patients, 8 of 11 T2 samples (73%) showed a decrease in their R² (0.794 ± 0.060) in comparison to their T1 sample (0.884 ± 0.089). For PFL patients, in contrast, the R² values were higher in T2 samples (0.932 ± 0.053) compared to T1 samples (0.902 ± 0.046).
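The R² statistic used in this analysis can be illustrated with a short sketch. Under the neutral model of Williams et al., the cumulative number of mutations with allele frequency at least f grows linearly in 1/f, that is M(f) ∝ (1/f − 1/f_max), so R² of a straight-line fit measures how closely a sample follows the neutral expectation. Everything specific below is an assumption made for illustration and is not taken from the thesis: the function names, the VAF window of 0.12 to 0.24, and the use of an ordinary least-squares fit with an intercept.

```python
import numpy as np


def neutral_r2(vafs, f_min=0.12, f_max=0.24):
    """R^2 of a linear fit of cumulative mutation count against 1/f.

    f_min and f_max are assumed window bounds, not values from the thesis.
    """
    vafs = np.asarray(vafs, dtype=float)
    # Keep mutations inside the window, ordered from high to low VAF.
    window = np.sort(vafs[(vafs >= f_min) & (vafs <= f_max)])[::-1]
    if window.size < 3:
        raise ValueError("need at least 3 mutations inside the VAF window")
    inv_f = 1.0 / window - 1.0 / f_max
    # M(f): number of mutations with VAF at least as large as each entry.
    cumulative = np.arange(1, window.size + 1, dtype=float)
    slope, intercept = np.polyfit(inv_f, cumulative, 1)
    residuals = cumulative - (slope * inv_f + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((cumulative - cumulative.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


def ideal_neutral_vafs(n, rate, f_max=0.24):
    """Synthetic VAFs lying exactly on the neutral 1/f line (test data only)."""
    k = np.arange(1, n + 1)
    return 1.0 / (k / rate + 1.0 / f_max)
```

Synthetic VAFs generated by `ideal_neutral_vafs` lie exactly on the neutral line, so `neutral_r2` returns a value very close to 1 for them, whereas a sample dominated by a subclonal peak would be expected to fit less well.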
These results suggest that changes in the clonal architecture between T1 and T2 for PFL patients may be attributed to genetic drift whereas in TFL patients positive selection may be the driving force.

Figure 3.21: Power Law Distribution in TFL and PFL Patients. Panel A: Paired line plots, faceted by patient type, connecting the neutral evolution estimate at T1 to T2 for each patient. Panels B-C: Density plot of the neutral evolution estimate comparing T1 and T2 specimens for TFL (panel B) and PFL patients (panel C). Patients with C to A substitution bias are excluded from this analysis as they may confound the neutral evolution analysis.

I next sought to quantify the probability of observing the clonal expansion of an extremely rare clone at T1 (< 1%) into a dominant clone at T2 (> 50%) under neutral evolutionary dynamics for TFL patients. Under the Wright-Fisher process, I modelled drift in 1000 independent simulations. I observed that the majority (88.1%) of the simulations resulted in a loss of the mutant allele (cluster 1 of Figure 3.22A). Conversely, only 6 (0.6%; cluster 2) of the simulations exhibited a trajectory similar to the clonal expansion patterns observed in TFL patients. As such, observing this clonal expansion pattern in 13 out of 15 TFL patients is statistically unlikely (binomial exact test P < 0.001) when assuming this as the background trajectory rate.
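The drift simulation and the binomial calculation described above can be sketched as follows. This is a minimal illustration and not the code used in the thesis: the function names and the population size are assumptions (the text does not state a population size), and the one-sided upper tail is an assumed reading of the binomial exact test.

```python
import math

import numpy as np


def simulate_drift(n_sims, initial_freq, pop_size, generations, seed=0):
    """Final allele frequencies of independent Wright-Fisher populations.

    Pure drift: each generation resamples pop_size alleles binomially
    from the previous generation's frequency (no selection, no mutation).
    """
    rng = np.random.default_rng(seed)
    freqs = np.full(n_sims, initial_freq, dtype=float)
    for _ in range(generations):
        freqs = rng.binomial(pop_size, freqs) / pop_size
    return freqs


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1)
    )
```

With the background rate of 0.6% reported above, `binomial_upper_tail(13, 15, 0.006)` is many orders of magnitude below 0.001. The fraction of rare-clone simulations that end above 50% can be estimated with, for example, `(simulate_drift(1000, 0.01, 1000, 10000) > 0.5).mean()`, where the population size of 1000 is a placeholder.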
When modelling drift in PFL starting with a dominant clone at T1, the simulations demonstrate trajectories that are consistent with the observed patterns of evolution in PFL patients (Figure 3.22B).

[Figure 3.22 plot labels. Panel A: Cluster 1 (n = 881), Cluster 3 (n = 75), Cluster 4 (n = 19), Cluster 5 (n = 19), Cluster 2 (n = 6). Panel B: Cluster 1 (n = 284), Cluster 2 (n = 278), Cluster 3 (n = 226), Cluster 4 (n = 212). Axes: Number of Simulations by Cluster; VAF by Generation (0 to 10000)]

Figure 3.22: Wright-Fisher Modelling in TFL and PFL Patients. Genetic drift modelling in TFL (panel A) and PFL (panel B) patients with an initial VAF of 1% and 50% respectively. Far left barplot indicates the number of simulations that follow a specific genetic drift trajectory (shown on the right).

Collectively, these results strongly support the notion that histological transformation is driven through positive selection. In contrast, the clonal dynamic patterns in early progressed patients can be statistically explained by neutral evolution models, suggesting that clones at T1 are fit under the current environment and further dynamics can be attributed to neutral drift.

3.3.5 Contribution of Individual Gene Mutations to Transformation

Evolutionary analysis suggested that the transformation and early progressed phenotypes are driven by distinct modes of selection. This prompted a need to identify the specific mutations associated with these aggressive phenotypes in a larger set of patients (395 genomic DNA samples (T1 and/or T2) from 277 patients) and assess their roles in transformation and early progression (Table 3.1 and Table 3.2).

A targeted panel of 94 genes was used for custom hybridization-based capture assay (cap-seq) (Table 3.5, selection criteria are described in Section 3.2.5.1).
In this assay, the entire coding sequence of 86 genes was sequenced, as well as the 5’ regions that are targets of somatic hypermutation in 20 genes (12 genes overlapping) (Table 3.6).

Mean target coverage was significantly higher in fresh-frozen samples when compared to FFPE samples (998.58 ± 206.95 vs. 171.34 ± 124.06, Student t P < 0.001), but the mean number of variants predicted did not significantly differ (74.13 ± 59.57 variants in target space of fresh-frozen cases vs. 83.41 ± 54.81 variants in target space of FFPE cases, Student t P = 0.115). Sixty-one samples from the capture sequencing cohort had previously been sequenced using WGS, and within the coding sequence of all genes in which the entire coding sequence was assessed, 343 variants out of 357 (96%) predicted from WGS data were also predicted by the pipeline for capture sequencing data, testifying to the sensitivity of the approach.

First, the T1 (n = 128) and T2 (n = 149) samples of transformed cases from 159 patients (118 paired biopsies) were compared. Similar to findings from WGS (Figure 3.11B), mutational load was higher in T2 than in T1 samples (mean number of mutated genes 12.38 ± 6.81 vs. 9.34 ± 5.73, Student t P < 0.001) (Figure 3.23). Mutation burden in the 5’ regions of 20 genes that are targets of somatic hypermutation did not significantly differ between T1 and T2 samples, with the exception of MYC.

[Figure 3.23 plot labels: P = 6.85e-05; x axis, Timepoint (T1, T2); y axis, Mutated Genes per Patient (n)]

Figure 3.23: Mutational Load in Coding Sequence of 86 genes. Non-synonymous sSNVs and sIndels were considered in this analysis.

Next, I sought to determine which genes had a higher likelihood of being mutated in T2 compared to T1 samples.
A Bayesian test of proportions was used to discover that 10 genes were more commonly altered in transformed lymphoma (Figure 3.24A). These included genes that had previously been associated with transformation, such as TP53, B2M, MYC, EBF1, as well as genes that have not yet been implicated as contributing to this process, for example EZH2, CCND3, PIM1 or ITPKB. B2M mutations were associated with a significantly reduced CD8+ T-cell infiltrate in transformed lymphoma biopsies (Figure 3.24B,C). Moreover, mutations in GNA13, S1PR2 and P2RY8, which have been implicated in dissemination of germinal centre B cells [177], were enriched in T2 samples. Taken together, these findings suggest that defective DNA damage response, increased proliferation, escape from immune surveillance and loss of confinement within the germinal centre represent key features that drive histological transformation from indolent to aggressive lymphoma.

Next, the mutation status was overlaid with detailed histological annotation. Composite morphology was associated with a lower prevalence of TP53 mutations (8% vs. 37%, Fisher P = 0.007) relative to DLBCL morphology. In addition, cell-of-origin classification was available for 108 cases with a DLBCL histology, 18 and 90 of which were of the ABC and GCB subtype, respectively (Figure 3.24D). Although the number of ABC-TFL cases was small, BCL10 (16% vs. 1%, Fisher P = 0.004), CD79B (22% vs. 4%, Fisher P = 0.005) and MYD88 mutations (28% vs.
9%, P = 0.006) were more frequently associated with ABC-TFL than with GCB-TFL, suggesting that B-cell receptor signalling and NF-κB signalling are important contributors to the ABC phenotype in TFL.

Figure 3.24: Results from Targeted Sequencing of 86 genes in Samples from 159 TFL Patients (128 T1 and 149 T2 Samples). Panel A: Credible intervals from Bayesian proportion test (top), genes are ranked by group difference and separated based on whether the probability of a given gene to be more commonly mutated in T2 is > 0.95 or not; Percentage of samples harbouring mutations in given genes (bottom). Panel B: CD8 positive pixel count by Aperio automated imaging of immunohistochemically stained tissue cores, by timepoint and B2M mutation status. Only cases with paired information on CD8+ T cell scoring and mutation status are shown. Panel C: Representative microscopy images taken at 200x or 800x magnification (CD8 and B2M, respectively).
Panel D: Proportion of mutated samples by cell-of-origin (ABC, activated B-cell-like; GCB, germinal centre B-cell-like); shown are only genes that are significantly associated with either subtype (Fisher P < 0.05).

3.3.6 Gene Mutations in Early Progressers

Next, I assessed the association of gene mutations with patient outcome contrasting early progressers (< 2.5 years after starting R-chemotherapy) (n = 41) and late or non-progressers (no progression for > 5 years) (n = 84). Median overall survival was extremely poor in early progressers (3.01 years vs. not reached in late progressers; log-rank P < 0.001, Figure 3.25), highlighting the critical need for better therapy.

Figure 3.25: Progression-free and Overall Survival in Early and Late Progressers.
[Figure content: two panels, progression-free survival and overall survival; probability vs. time (years, 0 to 10) by group (EARLY, LATE).]

Eleven genes were mutated more commonly in early progressers than in late progressers, including KMT2C, TP53, BTG1, MKI67, XBP1 and SOCS1 (Figure 3.26A). Only MEF2C was more commonly mutated in late progressers. Overall, 33 out of 41 early progressers had mutations in any of the 10 early progression-associated genes, and none of the early progression-associated genes were mutated at a frequency > 22% (Figure 3.26A,B). Thus, early progression appears to be related to relatively infrequent genetic alterations. Furthermore, none of the early progression-associated gene mutations form part of the m7-FLIPI outcome predictor and, in this cohort that was enriched for clinical extremes, the m7-FLIPI was similarly associated with early progression when compared with the FLIPI, but not superior, having better specificity (89% vs. 76%), but worse sensitivity (36% vs. 63%).
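The Bayesian comparison of mutation proportions used in this section (Figure 3.24A and Figure 3.26A) can be illustrated with a minimal sketch. It estimates the posterior probability that a gene is mutated in a larger proportion of one group than another and compares it with the 0.95 cut-off quoted in the figure captions. The uniform Beta(1, 1) priors, the Monte Carlo estimate, the function name `prob_more_mutated` and the example counts are all assumptions made for illustration; they are not details of the analysis reported here.

```python
import random


def prob_more_mutated(k_a, n_a, k_b, n_b, draws=20000, seed=0):
    """Estimate P(p_a > p_b) for two mutation proportions.

    k_a, k_b: number of mutated samples in groups A and B.
    n_a, n_b: total number of samples in groups A and B.
    Assumes independent uniform Beta(1, 1) priors (an illustrative choice),
    so each posterior is Beta(1 + k, 1 + n - k).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        p_a = rng.betavariate(1 + k_a, 1 + n_a - k_a)
        p_b = rng.betavariate(1 + k_b, 1 + n_b - k_b)
        if p_a > p_b:
            wins += 1
    return wins / draws


# Hypothetical counts: a gene mutated in 9 of 41 early progressers
# and in 4 of 84 late progressers.
posterior = prob_more_mutated(9, 41, 4, 84)
is_enriched_in_early = posterior > 0.95  # cut-off quoted in the captions
```

Sampling from the two Beta posteriors keeps the sketch dependency-free; a closed-form or numerical-integration estimate of the same probability would serve equally well.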
Taken together, these results identify early progression as a distinct clinicogenetic disease category that is imperfectly captured by traditional prognostic tools.

Figure 3.26: Results from Targeted Sequencing of 86 genes in 41 Early and 84 Late Progresser Patients. Panel A: Credible intervals from a Bayesian proportion test (top), genes are ranked by group difference and separated based on whether the probability of a given gene to be more commonly mutated in early progressers is > 0.95 or not; Percentage of samples harbouring mutations in given genes (bottom). Panel B: Oncoplot of genes significantly mutated in early progressers only, annotated with FLIPI and m7-FLIPI risk groups (yellow, low or low/intermediate; green, high risk).
[Figure content: genes shown in Panel B are XBP1, TP53, SOCS1, MYD88, MKI67, KMT2C, IKZF3, FAS, BTG1 and B2M.]

3.4 Discussion

Transformation and progression in FL are driven by disparate modes of evolutionary change. As shown schematically in Figure 3.27 and Figure 3.28, TFL is characterized by the emergence of clones that become dominant at T2 and that typically lie below the detection limit of even highly sensitive methods in the T1 timepoint (Figure 3.27), implying that the aggressive phenotypes emerge after diagnosis.
By contrast, early progression of FL commonly results from prevalent clones in T1 such that much of the clonal architecture is maintained despite treatment, implying that resistant properties are well-established at diagnosis (Figure 3.28). Gene content of mutations associated with transformation and early progression also differed. I found novel associations of gene mutations with transformation (including CCND3, GNA13, S1PR2 and P2RY8 mutations), and showed that TFL is molecularly heterogeneous with, for example, the ABC subtype of TFL being enriched for BCL10, CD79B and MYD88 mutations. Gene content of recurrent mutations associated with early progression included KMT2C, TP53, BTG1 and MKI67. Thus, transformation and progression can be attributed to disruption of partially different biological processes.

Figure 3.27: Schematic Models of Evolutionary Progression in TFL. The timesweep facet (top) shows the diagram of a prototypical transformed case in FL indicating dynamics of clonal composition from germline to diagnostic T1 specimen to histologically transformed T2 specimen. The clonal trajectory facet (bottom) presents an alternative view of the timesweep facet, without the clonal hierarchy, demonstrating the trajectory of the individual clones over time.
[Figure content: panel title "Transformation Properties Emerge Following Diagnosis"; clonal prevalence across T0, T1 and T2.]

These findings are of critical translational relevance. The divergent modes of evolution of PFL and TFL mirror distinct differences between the clinical presentation of these entities: transformation being uniquely associated with rapid onset of tumour growth and systemic symptoms, suggesting an underlying shift in tumour biology.
As the nature of expansion appears to correlate with rapid onset symptoms, more granular monitoring of these patients would be required to determine the exact timing of the evolutionary inflection point. Furthermore, the defining genetic features of transformation may remain elusive at diagnosis: at best, developing predictive assays will require ultra-sensitive detection techniques, and at worst these features may be essentially undetectable. Conversely, primary resistance to upfront combined modality therapy generally occurs by the selection of resistant clones readily found at diagnosis, suggesting that their detection may predict resistance to treatment. In that regard, samples from patients who experience early progression harbour relatively uncommon gene mutations that are associated with early progression (e.g. B2M, BTG1, XBP1), none of which had previously been described to predict progression.

Figure 3.28: Schematic Models of Evolutionary Progression in PFL. The timesweep facet (top) shows the diagram of a prototypical FL case progressed on treatment indicating dynamics of clonal composition from germline to diagnostic T1 specimen to progressed transformed T2 specimen with the trajectory of each clone presented in the clonal trajectory facet (bottom). The clonal trajectory facet (bottom) presents an alternative view of the timesweep facet, without the clonal hierarchy, demonstrating the trajectory of the individual clones over time.
[Figure content: panel title "Treatment Resistant Properties Established at Diagnosis"; clonal prevalence across T0, T1 and T2.]

My results have fundamental implications for the study of tumour evolution. Paradoxically, several patients who were managed solely with observation exhibited punctuated clonal dynamics, whereas PFL patients who were treated with multi-agent therapy exhibited relative stability in their clonal make-up.
This implies that the evolutionary processes driving FL may be independent of selective pressures imposed by treatment regimens. The association of known driver events (such as CCND3 mutations) with transformation suggests that such punctuated expansions typical of transformation are under positive selection and not neutral evolution. Rather, in transformation it is likely that specific alleles overcome offsetting interactions between beneficial and deleterious mutations acquired over time due to increased fitness. Indeed, three cases in the discovery cohort with CCND3 descendant mutations had widely varying time to transformation: 14.57 yrs, 5.05 yrs, and 2.56 yrs. These cases showed VAFs of 0, 0.002, and 0 respectively at diagnosis and thus emerged from extremely rare populations. Learning precisely how alleles such as CCND3 mutations exhibit epistatic interactions and modify the effect of founder events such as the t(14;18) translocation to confer higher fitness will be critical to elucidating the mechanism of histological transformation. The pattern is dramatically different in progression, as the expectation might be that clonal dynamics would change in the presence of a shifting fitness landscape induced by therapy.
Rather, clonal architecture at diagnosis remains relatively constant, suggesting that fitness could be attributed to non-genetic factors, or these tumours acquire resistance properties very early in their evolutionary histories and in the absence of therapeutic selective pressure.

These results place transformation and progression in FL at the extremes of clonal population dynamic spectra, at once informing future management strategies, and stimulating deeper questions on how follicular lymphomas mechanistically navigate varied fitness landscapes.

Chapter 4

A Novel Prognostic Model to Predict Post-Autologous Stem Cell Transplantation Outcomes in Classical Hodgkin Lymphoma [Footnote 1]

4.1 Introduction

Classical Hodgkin lymphoma (cHL) is the most common form of lymphoma affecting individuals under the age of 30 in the western world. Histologically, cHL is characterized by the presence of malignant Hodgkin and Reed-Sternberg cells (HRS), which, in contrast to most lymphoid cancers, represent only a minor portion (~1%) of the tumour mass. Thus, the vast majority of the cellular infiltrate (~99%) is comprised of a spectrum of different immune cells (e.g. eosinophils, lymphocytes, macrophages) forming a pro-tumour microenvironment (TME) [63]. The wide array of cytokines and chemokines secreted by HRS cells and TME, along with the various receptors on these cells, allows for an extensive immune-suppressing crosstalk [76].

cHL is widely regarded as a model of success for cancer treatment with chemotherapy having greatly improved patient survival. Despite these improvements, a proportion of advanced stage patients either harbour refractory lymphoma (10%) or relapse following first-line treatment (20%-30%) [84]. The current standard of care for young, fit patients who experience refractory/relapsed disease is salvage chemotherapy followed by high-dose chemotherapy and autologous stem-cell transplantation (ASCT) [85].
Approximately 50% of patients are not cured by such therapy and succumb to the disease.

Footnote 1: In this chapter, I describe a study of tumour microenvironment heterogeneity in a cohort of relapsed classical Hodgkin lymphoma patients. This chapter is a modified version of material that is currently under review: “Chan, FC*., Mottok, A*., et al. A Novel Prognostic Model to Predict Post-Autologous Stem Cell Transplantation Outcomes in Classical Hodgkin Lymphoma. Under Review. *Equal contribution”.

Several clinical, histologic, and biologic parameters that correlate with ASCT outcome have been reported [79, 178–183, 183–186]. However, with the exception of time to first relapse, there has been a lack of reproducible prognostic markers and there is a need for predictive tests to guide treatment decisions at the time point of first treatment failure, prior to initiation of salvage therapy. The need for such biomarkers, which could be translated into clinical-grade assays, is further demanded by the current development of novel therapeutic strategies in relapsed cHL, including PD-1 blockade and anti-CD30 based antibody drug conjugate therapy, which have emerged as alternatives to standard salvage therapy after ASCT failure [90, 93, 94] or are currently being integrated as consolidation therapy after ASCT [187] in high-risk patients.

To date, cHL research has been for the most part focused on primary specimens. Few studies have explored the biology of relapse biopsies due to lack of available tissue specimens along with the assumption that the biology at relapse is not significantly distinct from that at primary diagnosis. While not yet formally shown in cHL, genomic studies from solid and other hematological malignancies have demonstrated genetic and phenotypic divergence of tumour cells between initial diagnosis and relapse, in part explainable by the exposure to selective pressure exerted by chemotherapy [3, 4, 14, 15, 137].
As it is postulated that genomic alterations in HRS cells, in conjunction with host-specific immunity, are key determinants for the composition of the TME, the TME composition might act as a surrogate measure of genetic alterations in HRS cells [17]. Moreover, genomic divergence between primary and relapse specimens following therapy might be reflected in TME composition differences (“TME dynamics”).

Recent studies have demonstrated that gene expression signatures representing non-neoplastic cells of the TME can be associated with patient outcomes following therapy, implicating the TME’s potential role in treatment outcomes [76–79, 188–190]. This concept is reinforced by novel therapeutic approaches designed to target the various constituent elements of the TME [5, 76, 191–195]. Given the aforementioned concept of TME dynamics, there is a critical need to describe the TME composition following failure of first-line treatment to develop novel biomarkers for post-ASCT treatment outcome.

Here, I demonstrate how gene expression patterns, reflecting TME composition, differ significantly between matched primary and relapse specimens in a subset of cHL patients. Based on the superior predictive properties of gene expression measurements in relapse specimens, I developed and validated a novel, clinically applicable prognostic model/assay (RHL30), which identifies a subset of patients at high-risk of treatment failure following salvage therapy and ASCT. Taken together, these results establish the importance of gaining a better understanding of biological determinants underlying treatment failure following ASCT, and I demonstrate that this can be best achieved by analyzing biopsies obtained at relapse.

4.2 Materials and Methods

4.2.1 BCCA Study Cohort and Clinical Characteristics

The initial study cohort (“BCCA cohort”, Figure 4.1A) consisted of 245 formalin-fixed, paraffin-embedded (FFPE) specimens derived from 174 cHL patients treated at the British Columbia Cancer Agency (BCCA).
Patients were selected according to the following criteria: 1) patients received first-line treatment with doxorubicin, bleomycin, vinblastine and dacarbazine (ABVD) or ABVD-equivalent therapy with curative intent, 2) patients experienced cHL progression despite primary treatment (refractory or cHL relapse, as defined below), and 3) tissue derived from an excisional biopsy was available. Patients were classified as having primary refractory disease if their cHL progressed during ABVD treatment or within 3 months of finishing chemotherapy. Patients who had recurrence beyond 3 months of ending ABVD treatment were classified as having relapsed disease.

A total of 159 out of 174 patients went on to receive ASCT as previously described [186]. Table 4.1 summarizes the clinical characteristics of the BCCA cohort. Overall survival (OS) was defined as the time from primary pathologic diagnosis of cHL to death from any cause. Time to first relapse (FFS) was defined as the time from primary pathologic diagnosis to first cHL progression, or death from cHL. Post-ASCT-OS was defined as time from ASCT treatment to death from any cause. Post-ASCT-FFS was defined as time from ASCT treatment to cHL progression, or death from cHL. Regimen-related toxic deaths (n = 8, 5%) were censored for Post-ASCT-FFS. Clinical evaluation and/or diagnostic imaging were used to assess response to salvage therapy. Patients with complete or partial response were classified as chemosensitive. Patients with stable or progressive disease were classified as chemoresistant.
Herein, a relapse specimen refers to a second biopsy of either a primary refractory lymphoma or relapsed lymphoma following first cHL progression.

Figure 4.1: Patient/Specimen Overview of the BCCA Cohort. Panel A: All patients with single primary/relapse specimens or paired primary-relapse biopsies who passed quality control. Panel B: Specimen-centric view of the BCCA cohort. ASCT: Autologous stem cell transplantation. RRT: Regimen related toxicity from high-dosage chemotherapy and ASCT treatment. No-ASCT: Patient did not receive ASCT as second-line treatment.
[Figure content, Panel A:
  90 primary specimens: 42 primary refractory disease (3 RRT), 48 relapse disease (4 RRT); 90 eligible for post-ASCT analysis.
  13 relapse specimens: 2 primary refractory disease, 11 relapse disease; 13 eligible for post-ASCT analysis.
  Paired, 71 primary and 71 relapse specimens: 18 primary refractory disease (3 No-ASCT), 53 relapse disease (12 No-ASCT; 1 RRT); 56 eligible for post-ASCT analysis.
Panel B:
  84 relapse specimens: 20 primary refractory disease (3 No-ASCT), 64 relapse disease (12 No-ASCT; 1 RRT); 69 eligible for post-ASCT analysis.
  161 primary specimens: 60 primary refractory disease (3 No-ASCT), 101 relapse disease (12 No-ASCT; 8 RRT); 146 eligible for post-ASCT analysis.]

Table 4.1: Clinical Characteristics of Study Cohorts. All clinical variables are based at time of primary diagnosis. (a) Missing data for 18, 13, 9, and 11 patients in the BCCA, BCCA training, UMCG, and AUH cohorts, respectively. (b) Missing data for 20 patients in the BCCA cohort. (c) Missing data for 8, 2, 19, and 4 patients in the BCCA, BCCA training, UMCG, and AUH cohorts, respectively. (d) Missing data for 3 patients in the AUH cohort. (e) Missing data for 4, 2, and 3 patients in the BCCA, BCCA training, and AUH cohorts, respectively. (f) Missing data for 44, 11 and 5 patients in the BCCA, BCCA training, and AUH cohorts, respectively.
NS, Nodular sclerosis; MC, Mixed cellularity; LR, Lymphocyte-rich; LD, Lymphocyte-depleted; EN, Extranodal; NOS, Not otherwise specified.

Characteristic | BCCA Cohort (n = 174) | BCCA Training Cohort (n = 65) | UMCG Validation Cohort (n = 31) | AUH Validation Cohort (n = 27)
Age, years: Median | 32 | 29 | 29 | 33
Age, years: Range | 16-72 | 16-58 | 14-62 | 15-62
Male sex - n (%) | 92 (53%) | 39 (60%) | 15 (48%) | 17 (63%)
EBV+ HRS Cells - n (%) (a) | 18 (12%) | 4 (8%) | 3 (14%) | 2 (7%)
Subtype - n (%) (b): NS | 122 (79%) | 44 (85%) | 28 (90%) | 21 (78%)
Subtype: MC | 16 (10%) | 4 (8%) | 2 (7%) | 5 (19%)
Subtype: LR | 4 (3%) | 3 (6%) | 1 (3%) | 1 (4%)
Subtype: LD | 1 (1%) | 0 | 0 | 0
Subtype: EN | 2 (1%) | 1 (2%) | 0 | 0
Subtype: NOS | 9 (6%) | 0 | 0 | 0
Stage - n (%): I | 6 (3%) | 2 (3%) | 3 (10%) | 3 (12%)
Stage: II | 74 (43%) | 33 (51%) | 19 (61%) | 7 (28%)
Stage: III | 58 (33%) | 19 (29%) | 7 (23%) | 11 (41%)
Stage: IV | 36 (21%) | 11 (17%) | 2 (6%) | 6 (22%)
IPS ≥ 3 - n (%) (c) | 64 (39%) | 26 (41%) | 2 (17%) | 8 (35%)
B-symptoms - n (%) (d) | 100 (57%) | 36 (55%) | 12 (39%) | 11 (46%)
Mass - n (%) (e) | 71 (42%) | 20 (32%) | 10 (32%) | 7 (29%)
Primary Refractory Status - n (%) | 62 (36%) | 16 (25%) | 4 (13%) | 5 (19%)
Response to Salvage Therapy - n (%) (f) | 45 (35%) | 13 (24%) | 3 (10%) | 2 (9%)
Follow-up, years: Median | 10.3 | 10.7 | 9.9 | 11.6
Follow-up, years: Range | 1.4 - 28.1 | 1.4 - 24.1 | 2.0 - 26.2 | 1.2 - 16.8
Outcomes, 5 years: OS | 76% | 87% | 77% | 71%
Outcomes, 5 years: FFS | 9% | 14% | 10% | 7%
Outcomes, 5 years: Post-ASCT-OS | 71% | 74% | 62% | 65%
Outcomes, 5 years: Post-ASCT-FFS | 61% | 66% | 62% | 65%

4.2.2 Gene Expression Analysis

RNA was extracted from FFPE biopsies using the QIAGEN AllPrep DNA/RNA FFPE kit (QIAGEN GmbH, Germany). For each case, 10 µm scrolls were cut with the number of scrolls determined by the size of the tissue surface area (resulting in at least 1 cm²). The NanoString platform (NanoString Technologies, WA) was used to perform digital gene expression profiling on 200 ng total RNA. Gene expression profiles were obtained using a NanoString codeset (RHL800) comprising probes for 784 endogenous and 15 housekeeper genes (Table A.6).
These 784 endogenous genes were selected if they satisfied one of the following criteria: 1) biological signatures representative of cell types typically found in the microenvironment of HL, 2) previously reported to be associated with outcome in cHL, 3) signal of EBV infection (i.e. EBER1, EBER2, LMP1, and LMP2) [78, 79, 83, 190, 196]. For the BCCA cohort, all specimens were profiled with the RHL800 assay. A second NanoString 30 gene codeset (RHL30, defined below) was subsequently applied to all relapse specimens involved in prognostic model building and validation (see Section 4.2.6). All data generated on the NanoString platform were subjected to a gene-expression profiling quality control procedure (see Section 4.2.2.1).

4.2.2.1 Sample Quality Control

Of the 314 specimens before quality control, 29 specimens from non-progressor patients were included for quality control purposes. Eight specimens were first excluded due to failing wet lab quality control criteria (low RNA quality). The remaining 306 specimens underwent NanoString gene expression profiling using the RHL800 codeset and then bioinformatics quality control was performed using a similar procedure as described in Scott et al. [83]. Specifically, specimens were normalized by dividing the raw NanoString counts data of each specimen by its normalizer score, which was calculated using the geometric mean of 12 housekeeping genes (ACTB, ALAS1, CLTC, GAPDH, GUSB, PGK1, POLR2A, RPL19, RPLP0, SDHA, TBP, and TUBB). The 3 remaining housekeepers on the RHL800 codeset (G6PD, HMBS, and POLR1B) were excluded due to high expression variability across specimens. Since specimens with very low normalizer values often lead to very high normalized expression values (which is an indicator of a poor quality sample), a threshold of 35 was set to exclude these specimens. This resulted in the removal of 16 cases from the BCCA cohort.
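The normalizer-based quality control step described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in this study: it assumes raw counts are held in a dictionary keyed by gene symbol with strictly positive housekeeper counts, and the function names are hypothetical. The 12 housekeeping genes and the threshold of 35 are taken from the text.

```python
import math

# The 12 housekeeping genes named in the text.
HOUSEKEEPERS = ["ACTB", "ALAS1", "CLTC", "GAPDH", "GUSB", "PGK1",
                "POLR2A", "RPL19", "RPLP0", "SDHA", "TBP", "TUBB"]
NORMALIZER_THRESHOLD = 35  # threshold stated in the text


def normalizer_score(raw_counts):
    """Geometric mean of the raw housekeeper counts of one specimen.

    Assumes every housekeeper count is strictly positive.
    """
    logs = [math.log(raw_counts[gene]) for gene in HOUSEKEEPERS]
    return math.exp(sum(logs) / len(logs))


def normalize_specimen(raw_counts):
    """Divide every raw count by the specimen's normalizer score."""
    score = normalizer_score(raw_counts)
    return {gene: count / score for gene, count in raw_counts.items()}


def passes_quality_control(raw_counts):
    """Keep a specimen only if its normalizer is not below the threshold."""
    return normalizer_score(raw_counts) >= NORMALIZER_THRESHOLD
```

Computing the geometric mean through logarithms avoids overflow when multiplying twelve large counts together.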
Following the bioinformatics quality control, a number of the specimens were identified as being duplicate extractions from the same biopsy specimen. These existed because the first extractions were hypothesized, on the wet lab side, to be low in quality, so an additional extraction was performed as a backup. In the event that both duplicate extracted specimens passed quality control, the specimen with the higher normalizer was taken. This step excluded a total of 16 specimens. After removing the remaining specimens from non-progressor patients, this resulted in a final BCCA cohort of 245 specimens from 174 progressor patients reported in Section 4.2.1.

For the purpose of building an outcome predictor for post-ASCT outcome, a total of 69 relapse specimens, eligible for post-ASCT analysis, were considered for gene expression profiling using the RHL30 assay. Two specimens had to be excluded due to lack of remaining RNA after extraction and 2 additional specimens were excluded due to low normalizer scores (< 35), resulting in a BCCA training cohort of 65 relapse specimens.

4.2.2.2 Gene Expression Normalization and Transformation

The gene expression data were normalized by dividing each sample’s gene expression values by the sample’s geometric mean of 12 housekeepers and then scaled by a factor of 1000. Any expression values below 1 were set to 1, followed by a log2 transformation.

For paired primary-relapse sample gene expression analysis, I noticed in a few patients that the hybridization efficiency was significantly different between their primary and relapse samples (“Raw” facet of Figure 4.3). In such situations, there would be a skew in expression values towards one sample type, which would result in a majority of genes being classified as being overexpressed in that sample type. Despite applying the geometric mean normalization method, it was clear that this method was unable to fully account for the skew (1-12 housekeeper facets of Figure 4.3).
To address this issue, I used a quantilenormalization method which uses the entire set of genes on the codeset to normalize asample. This provided a robust way to correct hybridization efficiency differences betweenprimary and relapse samples (“Quantile Normalized” facet of Figure 4.3). As such, for allanalyses pertaining to Section 4.3.1, the expression data were quantile normalized.115314 Samples 306 Samples Wet Lab Quality Control 290 Samples Generation of Sample-Centric Normalizer Value  (12 reference genes) Determine Normalizer Threshold (35) Exclude 8 samples Exclude 16 samples •  Exclude multi-extracted samples.  •  Choose sample with the higher normalizer. 274 Samples Non−normalized Gene Expression Normalized Gene Expression●●●●● ●●●● ●●●●●● ●●● ●●●●●●●●●●● ●●●●●●● ●●●●●●●●●●●●● ●●●●●●● ●● ●●●●●●● ●●●●●●●●●● ●●●● ●●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●● ●●● ●●●●●●●●●●●●●●● ●●●●● ●●● ●●●●●● ●●●●●● ● ●●●●●●●● ●●●●●●●●●●● ●●●● ●●● ●●●●●●●●●● ●●●●●● ●●●●●●●●●●●●●●● ●●● ●●●●●●●●●●●●●●● ●●●●●● ●●●●●●●●●●●●●●●●● ●●●●●●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●● ●●●●●● ● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●● ●● ●● ●●●●●● ●●●●●●●●● ●●●● ●●●● ●● ●●●● ●● ●●●● ● ●● ●●● ●●●●●●●●●● ●●● ●● ● ●●● ● ●●●●● ●●● ●●●●●●●●●●●●●●●●●●●● ●●● ● ●●● ●●●● ●● ●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●● ●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●●●●●●●●●●●●●●● ●●●●●●●● ●●●●●●● ●●●●●●●●●●●●●● ●●● ●●●●● 
●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●● ●●● ●●●●●●● ●●●●●●●●● ●●●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●●●● ●●●●●● ●●●●●● ● ●●●●●●●● ●●●●●● ●●●●●●●● ●●● ●●●●●●●●●●●●● ●●●●●●● ●●●●● ●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●● ●● ●●●● ●●●●●●●●● ●● ●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●● ●●● ●●●●●●●●●●●●● ●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●● ●● ●●●●● ●●●●● ● ●●●●●● ●● ●●●●●●●●●●●●● ●●●● ●● ●●●●●● ●●● ● ●●●●●●●● ● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●● ● ●● ●●●● ●●●● ● ●●● ●●●● ●● ●●● ●●● ●●● ●● ●●● ●● ●● ●●● ● ● ●●●● ● ●●● ●●●●● ● ●● ●●●●●● ●●●●●● ● ●●●●●●●● ●● ●●●● ● ●●●●●●●●●● ● ● ●● ●●●●●●●●●●●●● ●●● ●●● ●●●● ●● ●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●● ●● ●●●● ●●●●●● ●● ●● ●● ●● ● ●●● ●●●●●●●● ●●●●●●● ●●● ●● ● ● ●● ●●●● ●● ● ●●● ●● ● ●●● ●● ●●● ● ●● ●●●●●●●●●●● ●●●●● ●● ●● ●●●●● ●●●●●●● ●●●●●●●●●●●●●● ●● ●●●●●● ●●●●●● ●●●● ●●●● ●●●●● ●● ●●●●●● ● ●●● ●●●● ● ●● ●●●●● ● ●● ● ●●● ●●●● ●●● ●●● ●●●● ●●● ●● ● ●● ●●●●● ● ● ●●●●● ●●● ●●●●●● ●●●●●●●●●●●●●● ●● ●●● ●● ●● ●●●●●●●●●● ●●●●● ●●●●●●● ●●●●●● ●●● ● ●● ●●● ●●●● ●●● ● ●● ●●●●●●●●●●●●●●●●●● ●●●●●● ● ●●●●●●● ●● ●● ●● ●● ● ●●● ●● ●●● ●●● ●●● ●●●●● ●●● ●● ●●● ●●●●●●● ● ●●●● ●●●●● ●● ●●●● ●●●● ●●●●●●●● ●● ● ●● ●●●●●●●●●● ●●●● ●●●●●●●●●● ●● ●●●●●●●●●● ● ●●●●●● ● ●● ●●● ●● ●●●● ● ●●●●● ●●●●● ●●●● ●●● ●●● ● ●●● ●● ●● ●●● ● ● ●●● ●●●●● ●● ●● ●● ●● ● ●●● ●●● ●●●●●● ●●●●●●● ● ●●●●● ●●●●●●●● ● ●●● ●● ● ● ●●● ●●●●●●●●● ●●● ●●●● ●● ●● ●● ● ●●●●●●●● ● ●●● ●●●●●●●●●●●●●●●●●●● ●●●● ●● ● ●● ● ●●●●●●●●●●●● ●●●●●●●●● ●●●●●●● ●●● ●●●● ●● ●●● ●● ●●● ●● ●●●●● ●●● ●●● ● ●●●●● ● ●● ●● ●● 
● ● ●●●● ● ●● ●●●● ● ●● ●● ● ●●● ●●●●● ●● ●● ● ●● ●●●● ● ●●●●●● ● ● ●●●●●● ●●●●●●●●●●●●●●●● ● ● ●● ●●●●●●●●● ●●●●●●● ●●●● ●●●● ● ●● ●●● ●●● ●●●●● ●●● ●●●●●●●●● ●●●●●●●●●●●● ●● ●●●●●●●●●●●●●● ● ●●●●●●●●●● ●●●● ●● ●●●● ●●●●●● ● ●●● ●●● ●●●●●●●●●●●●●●●●● ●● ●●●●●●●●●● ● ●●●●●●●●● ●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●● ● ●●●●●●● ● ●●●● ●● ●● ● ●● ●●● ●●●●●● ●●● ●●●●● ●●●●● ●● ●● ●●●● ● ● ●●● ●● ● ●●●● ●●●● ● ●● ●● ●●●●●●●●●●● ●● ● ●● ●●●●●●●● ● ●●●●●●●●●●●●●●●●●● ● ●● ●●●●●●●●●●●●●●● ● ●●●● ●●●● ●●●●● ●●●●●● ●● ●●●● ●●●●● ●●●●●● ●●●●●●●●●● ●●●●●●●●●●●● ● ● ●● ●●●●●●● ● ●●●●●●●● ●●● ●● ● ●● ● ●● ●●● ●●●●●●●● ●●●●●●●●●● ●●●●● ●● ●●●● ● ● ●●●●●● ● ● ●●●●●●● ●●●●● ● ●●● ●● ●●●● ● ●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●● ●●●●●●●● ●● ●●●●●●●●●●●●● ●●● ●●●● ●●●●●●●● ●●● ●●●●●●●●●●●●●●●● ●●● ●● ●● ●●●●● ● ●●● ●●●●●● ●●● ●●●● ● ● ●●●●●●●●● ● ●●●●● ● ●●● ●● ●● ●● ●●●●●●● ●●●●●●●●●● ●●●● ●●●●●●●●● ●●● ●●●●● ●●●● ●●●●●●● ●●●● ●●●● ●●●●●●● ●●●● ●●●●●●●●● ●●●●●●●●● ●●●●●●● ●●●●●●●●●● ●●●●●● ●●●●●●●●●●●●●●●●● ●●●● ●● ●●●●● ●●● ●●●●● ●● ●●●●●● ●● ● ●●●●●●●● ●●●●●●● ●●●●● ●●●● ●● ●● ●●●●●● ●●●● ●●●● ●● ●●●●●●●●●●●●●●● ●●●● ●● ● ●●●●●● ●●●●● ●●●●●●●●●●●● ● ●●● ●●●●●●●●●●●●● ● ●●●● ● ● ●●●●●●● ●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●● ●● ● ●● ●●● ●●● ●●●● ● ●● ●●● ●● ●● ● ●● ●●●●● ● ●●●●●● ● ●●● ●●● ●● ●●●● ● ● ●●●● ● ●●●●●● ● ●● ●●● ● ●●●●●●● ●● ●●●● ● ●● ●●●●●● ●● ● ● ●●● ●●●●●●●●●●● ● ●● ●● ●●● ●●●●●●●●●●●●●●● ●● ●● ●●●● ●● ●●●● ●● ●● ●● ●● ● ●●●● ●●●●●●● ●●●●●● ● ●●● ●●●●● ●●●●●● ● ●●●●● ●●●●●●●● ● ●●● ● ●●●●●● ●●●●● ●●●● ●● ●● ● ●●●●● ● ●●●●●●● ●●●●● ●● ●●● ●● ● ●●●●●● ●● ●●●●●●● ●● ●●●●● ●● ●● ● ●●●●0100020003000400002000040000600008000005001000150020000500100015002000010000200003000005001000150002000400060008000010002000300002000040000600000100002000030000400000250500750025050075010001250010002000300040005000All GenesACTBALAS1CLTCGAPDHGUSBPGK1POLR2ARPL19RPLP0SDHATBPTUBB0 1000 2000 3000 4000 0 1000 2000 3000 4000Normalizer (counts)Gene ExpressionA B Figure 4.2: BCCA 
Cohort Quality Control Methodology. Panel A: Quality control workflow. Panel B: Housekeeper (All) Gene(s) Expression vs. Normalizer Expression. The left facet shows the specific housekeeper (all) gene(s) raw expression vs. normalizer value for each specific sample. The right facet shows the specific housekeeper (all) gene(s) normalized expression vs. normalizer value for each specific sample. The dashed horizontal red line in each facet represents the mean + 2 standard deviations of the normalized expression for the respective housekeeper (all) gene(s). The dashed vertical red line in each facet represents the normalizer threshold of 35.

4.2.2.3 Unsupervised Clustering of Paired Primary-Relapse Samples

I performed two-component Gaussian mixture model clustering on the Spearman r² values of each patient (n = 71) with a paired primary-relapse sample. The expectation-maximization algorithm was used to fit the parameters of the Gaussian distributions and maximize the log-likelihood, with the convergence tolerance of the fit set to 1e−8. The normalmixEM function from the mixtools R package (v1.0.3) was used for this clustering. A posterior probability threshold of 0.5 was set to classify patients into the low or high correlation class.

[Figure 4.3 plot: scatterplots of relapse sample (HL1040) vs. primary sample (HL1028) expression, with one facet each for raw data, normalization to 1 through 12 housekeepers, and quantile normalization.]
Figure 4.3: Effects of Different Normalization Strategies. Each facet plots the gene expression of a matching relapse vs. primary specimen normalized to a different set of genes. The red diagonal line represents the line of perfect correlation.

4.2.2.4 Signature Analysis

Genes were assigned to various signatures (TME or cellular process) according to the literature.
A sample’s signature score was calculated by taking the average expression of its constituent genes. A patient’s signature difference was calculated by subtracting the primary sample signature score from the relapse sample signature score. As such, a positive signature difference meant a higher expression (content) in the relapse sample compared to its primary sample, and vice versa for a negative signature difference.

4.2.3 Pathology and Immunohistochemical Analysis

All cases included in the BCCA cohort were reviewed and subclassified by expert hematopathologists (Dr. Anja Mottok and Dr. Randy Gascoyne). Tissue microarrays (TMAs) were constructed to represent each case by two 1.5 mm cores. Immunohistochemical studies were performed on a Ventana Benchmark system using 4 µm sections and the following antibodies: anti-CD20 (clone L26, Dako, 1:500), anti-CD68 (clone KP1, Dako, 1:3000) and anti-CD163 (clone SP96, Abcam, 1:10).

TMA slides for CD20, CD68 and CD163 were scanned using an Aperio ScanScope XT at 20x magnification and subsequently analyzed with the Aperio ImageScope viewer software (Version 12.1.0; Aperio Technologies). Only cores and areas containing tumour were scored using the Positive Pixel Count algorithm with optimized color saturation thresholds. Areas with extensive sclerosis or fibrosis were excluded from the analysis. Any staining was considered positive and the number of positive pixels was divided by the total pixel count. Whenever applicable, scores from both cores were averaged and multiplied by 100 to obtain the percentage of positive pixels. IHC scoring and NanoString expression were well correlated (Figure 4.4).

4.2.4 Prognostic Power Comparison Between Primary and Relapse Samples

Three separate analyses were performed to compare the prognostic power of primary to relapse samples.
Firstly, univariate Cox regression was performed for each sample type individually using all genes, and the numbers of significant genes were directly compared. Secondly, for each patient with a paired primary and relapse sample, I combined the expression profiles of both samples to form a single patient expression profile (Figure 4.5). This resulted in each patient having a total of 1570 gene expression features, with 785 features from each of the primary and relapse samples. By using penalized multivariate Cox regression on these patient expression profiles, I accounted for feature selection competition between features from the same sample type (primary or relapse) and also between sample types.

Finally, for each sample type, bootstrap aggregating was performed by splitting the samples into 2/3 training and 1/3 testing a total of 1000 times, with each iteration being a different split (Figure 4.6). For each iteration, a multivariate ridge Cox regression (α = 0) was used to generate a prognostic model using the training samples and the gene features that were significant by univariate Cox regression. This prognostic model was then applied to the testing cohort, followed by the generation of a concordance statistic. These concordance statistics were then compared between the primary and relapse sample types using a two-sample T-test. Only patients with paired primary-relapse samples were used in this analysis.

[Figure 4.4 plot: scatterplots of IHC (y-axis, Aperio scoring) vs. NanoString expression (x-axis). Panel A, gene expression: CD163, Spearman R = 0.836; CD20, Spearman R = 0.76; CD68, Spearman R = 0.598. Panel B, signature expression: CD163, Spearman R = 0.743; CD20, Spearman R = 0.724; CD68, Spearman R = 0.398.]
Figure 4.4: Correlation of IHC and NanoString Expression. Panel A: IHC vs. Individual Gene Expression. Panel B: IHC vs.
Signature Expression.

4.2.5 Bayesian Test of Proportions

A Bayesian test of proportions was used to assess the correlation of subtype transition with primary refractory status. The full hierarchical model is as follows:

$$x_i \sim \mathrm{Binomial}(\theta_i, n_i)$$
$$\theta_i \sim \mathrm{Beta}(1, 1)$$

• $i$ is an index over the groups of patients who have either 1) experienced a subtype transition or 2) not experienced a subtype transition.
• $n_i$ is the number of patients in group $i$.
• $\theta_i$ is the frequency of patients with primary refractory status in group $i$ given $x_i$, the observed number of such patients.

Bayesian inference is then performed on $\theta_2 - \theta_1$ using Markov chain Monte Carlo (MCMC) sampling over 50000 iterations to generate a posterior distribution. A P value is then generated by measuring the proportion of times that $\theta_2 > \theta_1$ over all MCMC iterations.

[Figure 4.5 diagram: the primary and relapse sample gene expression matrices (71 patients x 785 genes each) are combined into a primary-relapse sample gene expression matrix (71 patients x 1570 genes) in which each gene is represented twice, once for primary and once for relapse (e.g. CD68_pri and CD68_rel). Other elements: exclusion of patients ineligible for outcome analysis (16 for post-BMT-FFS; 15 for post-BMT-OS); penalized multivariate Cox regression (post-BMT-FFS; post-BMT-OS) with α = {0.1, 0.2, …, 1.0}; parsimonious outcome predictor model with features from primary and features from relapse.]
Figure 4.5: Penalized Multivariate Cox Regression Method for Directly Comparing Primary vs. Relapse Specimens.

4.2.6 RHL30 Prognostic Model/Assay

Figure 4.7A provides an overview of how the RHL30 prognostic model was built and validated. For the purpose of building a prognostic model for post-ASCT treatment outcomes, 69 relapse specimens with gene expression data generated using the RHL800 codeset, all eligible for post-ASCT-FFS outcome analysis, were available for the model building process.
A pre-filtering step was applied to select features/genes based on the following criteria: 1) expression standard deviation > 0.3, 2) statistical significance in univariate Cox regression, and 3) not an EBV-related gene (i.e. EBER1, EBER2, LMP1, LMP2).

This resulted in a total of 87 genes being used for constructing the prognostic model. The glmnet R package (v1.9.8) was used to build penalized Cox regression models. The main parameter is α, which controls the type of penalization by mixing the ridge (α = 0) and lasso (α = 1) penalties. An α of 0.5, corresponding to elastic-net regularization, was used to choose discriminative features associated with post-ASCT-FFS response. This resulted in a total of 18 outcome-associated genes which, along with the 12 housekeeping genes used for normalization, comprised the RHL30 prognostic model.

[Figure 4.6 diagram: for each sample type (primary and relapse gene expression matrices, 71 patients x 785 genes each), patients ineligible for outcome analysis are excluded (16 for post-BMT-FFS; 15 for post-BMT-OS) and the samples are split into 2/3 training and 1/3 testing (N = 1000). Genes significant in univariate Cox regression (P < 0.05) enter a multivariate ridge Cox regression (α = 0) to give a parsimonious outcome predictor model, from which a concordance statistic is generated. The concordance statistics of the two sample types are compared with a two-sample T-test.]
Figure 4.6: Primary vs. Relapse Samples Bootstrap Aggregation Method

Once the genes and housekeeping genes for the RHL30 model were established, a specific NanoString codeset with probes corresponding to the genes in the model was ordered and assays were performed again on the relapse specimens. Two samples, HL1071 and HL1016, had to be excluded due to lack of RNA from extraction and two additional samples, HL1072 and HL1269, had to be excluded due to low normalizer values (< 35), resulting in a final “BCCA training cohort” of 65 relapse specimens (Table 4.1). Ridge multivariate Cox regression (α = 0) was then used to retrain the RHL30 model (Table 4.2) using the smaller codeset. The RHL30, herein, refers to both the prognostic model and its corresponding NanoString assay.

For the purpose of RHL30 model validation, two similarly treated independent cohorts (Table 4.1) of relapse specimens were available from the University Medical Centre Groningen (“UMCG validation cohort”, n = 31) and Aarhus University Hospital (“AUH validation cohort”, n = 27) (Figure 4.7B).

Survival curves were compared using the log-rank test and multivariate Cox regression was used to test for statistical independence of the RHL30 risk scores from FFS, response to salvage therapy, age ≥ 45, and stage IV.

[Figure 4.7 diagram. Panel A: 84 relapse specimens (20 primary refractory disease, 3 no auto; 64 relapse disease, 12 no auto, 1 RRT); exclusion of 3 and 12 specimens with respect to post-ASCT-FFS eligibility leaves 69 relapse specimens and a relapse specimen gene expression matrix (800 x 69). Filters: standard deviation > 0.25; univariate Cox regression P < 0.05; filter out housekeeper genes and EBER1, EBER2, LMP1, LMP2, giving a matrix of 87 x 69. Elastic-net multivariate Cox regression selects 18 genes; 12 housekeepers are added to form the RHL30 codeset. Exclusion of 2 specimens (HL1071 and HL1016) due to lack of RNA for extraction and of HL1072 and HL1269 due to low normalizer (< 35) gives the RHL30 retraining cohort and the RHL30 gene expression matrix (30 x 65, BCCA training cohort). Ridge multivariate Cox regression yields the RHL30 outcome predictor model, and a threshold (10.4) is established for low and high risk patients. Panel B: UMCG validation cohort, 33 relapse specimens on the RHL30 codeset, 2 specimens excluded (HL1516 and HL1534) due to low normalizer (< 35), matrix 30 x 31; AUH validation cohort, 29 relapse specimens on the RHL30 codeset, 2 specimens excluded (HL1617 and HL1627) due to low normalizer (< 35), matrix 30 x 27. The model is validated in both cohorts.]
Figure 4.7: RHL30 Prognostic Model Methodology. Panel A: The RHL30 model training procedure based on the BCCA training cohort. Panel B: The RHL30 model applied to the UMCG and AUH validation cohorts.

Table 4.2: RHL30 Model Coefficients. Housekeeper genes are used only to normalize the expression data.

Gene No.  Gene Name  Coefficient  Signature
1         BACH2      -0.1020536   B-cell
2         CR2        -0.0293100   B-cell
3         NGK2D      -0.0010277   NK
4         ABCG1       0.0072505   Drug-Resistance
5         IL3RA       0.0469836
6         SOD2        0.0484459   Neutrophil
7         IGSF3       0.0563299   HRS
8         CCL20       0.0605319   HRS, Macrophage
9         IL13RA1     0.0627200   Neutrophil
10        CX3CL1      0.0637130   HRS, Macrophage
11        LGMN        0.0741871
12        APOE        0.1039541   Macrophage
13        RNF144B     0.1048740
14        ABCA3       0.1171592   Drug-Resistance
15        SDC4        0.1223261
16        GCS         0.1518246   Drug-Resistance
17        TNFSF9      0.2130110   T-synapse
18        CSF1        0.2799929   HRS
19        ACTB        0.0000000   Housekeeper
20        ALAS1       0.0000000   Housekeeper
21        CLTC        0.0000000   Housekeeper
22        GAPDH       0.0000000   Housekeeper
23        GUSB        0.0000000   Housekeeper
24        PGK1        0.0000000   Housekeeper
25        POLR2A      0.0000000   Housekeeper
26        RPL19       0.0000000   Housekeeper
27        RPLP0       0.0000000   Housekeeper
28        SDHA        0.0000000   Housekeeper
29        TBP         0.0000000   Housekeeper
30        TUBB        0.0000000   Housekeeper

4.3 Results

4.3.1 Comparative Analysis of Paired Primary-Relapse Specimens Reveals Biological Differences

Using the 71 patients with paired primary-relapse (including refractory) specimens, I investigated biological differences at the histological and molecular level.

All
tissue specimens included in this study were subjected to pathology review. Histological subtypes were assigned according to the current WHO classification and revealed that, of the 71 patients with paired primary-relapse biopsies, the majority of the primary specimens were of the nodular sclerosis subtype (n = 55, 77%, top panel of Figure 4.8A). Subtype assignment for paired specimens was performed without knowledge of the result from the corresponding primary biopsy (bottom panel of Figure 4.8B). After exclusion of cases with extranodal disease or unclassifiable cHL, I observed a subtype transition in 16 out of 61 cases (26%) when comparing their matched primary and relapse biopsies (Figure 4.8B). The majority of these 16 patients (n = 14, 87.5%) were patients with relapsed rather than primary refractory disease (Bayesian test of proportions P = 0.087, 0.16 [-0.065 - 0.36]). The most common transition (37.5%, n = 6) was from mixed cellularity to nodular sclerosis (Figure 4.8C).

Next, I associated gene expression patterns with components of the TME (see Section 4.2.2.4) and described differences between paired primary-relapse specimens that are reflective of “TME dynamics”. I found a bimodal distribution of r² correlation values and identified that 17 out of 71 patients (24%, r² = 0.6 ± 0.13) exhibited low correlation between their primary and relapse specimens’ gene expression profiles, indicative of high TME dynamics. By contrast, the mean correlation (r²) of the highly correlated group of patients was 0.8 ± 0.14 (Figure 4.9A). Patients with low correlation pairs had an inferior post-ASCT-FFS (5-year: 38.5%) compared to patients with high correlation pairs (5-year: 68.4%; Log-rank P = 0.005; Figure 4.9B).
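The split into low and high correlation groups rests on the two-component Gaussian mixture described in Section 4.2.2.3. The thesis fitted it with normalmixEM from the mixtools R package; the block below is only a minimal pure-Python sketch of the same idea, not the original analysis. The function names, the initialisation on the lower and upper halves of the data, and the toy r² values are all assumptions made for illustration. Only the convergence tolerance (1e−8) and the posterior threshold (0.5) are taken from the text.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(values, tol=1e-8, max_iter=1000):
    """Fit a two-component 1D Gaussian mixture by expectation-maximization.

    Returns (weights, means, sds). Component 0 is initialised on the lower
    half of the data and component 1 on the upper half (an assumption of
    this sketch, not the mixtools default).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    means = [sum(ordered[:half]) / half,
             sum(ordered[half:]) / (len(ordered) - half)]
    grand_mean = sum(values) / len(values)
    pooled_sd = math.sqrt(sum((v - grand_mean) ** 2 for v in values) / len(values))
    sds = [pooled_sd, pooled_sd]
    weights = [0.5, 0.5]
    previous_loglik = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability that each value belongs to component 1.
        posteriors = []
        loglik = 0.0
        for v in values:
            p0 = weights[0] * normal_pdf(v, means[0], sds[0])
            p1 = weights[1] * normal_pdf(v, means[1], sds[1])
            loglik += math.log(p0 + p1)
            posteriors.append(p1 / (p0 + p1))
        # Stop once the log-likelihood changes by less than the tolerance.
        if abs(loglik - previous_loglik) < tol:
            break
        previous_loglik = loglik
        # M-step: re-estimate the weight, mean and sd of each component.
        for k in (0, 1):
            resp = [p if k == 1 else 1.0 - p for p in posteriors]
            total = sum(resp)
            weights[k] = total / len(values)
            means[k] = sum(r * v for r, v in zip(resp, values)) / total
            var = sum(r * (v - means[k]) ** 2 for r, v in zip(resp, values)) / total
            sds[k] = max(math.sqrt(var), 1e-6)  # floor guards against collapse
    return weights, means, sds


def posterior_high(value, weights, means, sds):
    """Posterior probability that a value belongs to component 1."""
    p0 = weights[0] * normal_pdf(value, means[0], sds[0])
    p1 = weights[1] * normal_pdf(value, means[1], sds[1])
    return p1 / (p0 + p1)


# Hypothetical r^2 values for illustration only; these are not thesis data.
toy_r2 = [0.52, 0.58, 0.61, 0.63, 0.66,
          0.80, 0.82, 0.84, 0.85, 0.86, 0.88, 0.90, 0.91]
weights, means, sds = fit_two_component_gmm(toy_r2)
classes = ["high" if posterior_high(v, weights, means, sds) > 0.5 else "low"
           for v in toy_r2]
print(list(zip(toy_r2, classes)))
```

A posterior threshold of 0.5 simply assigns each patient to whichever component is more probable, so no patient is left unclassified.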
The prognostic significance of the correlation group was independent of time to first relapse (P = 0.011), primary refractory status (P = 0.007), response to salvage therapy (P = 0.03), age ≥ 45 years (P = 0.008), and stage IV (P = 0.006) in pairwise Cox regression analyses.

The finding of significant biological changes between primary and relapse specimens at the histological and gene expression levels prompted me to further investigate specific differences in gene expression signatures reflective of TME composition and drug resistance mechanisms. This analysis revealed that those differences are attributable to dynamic shifts of multiple signatures (Figure 4.9C-D). Of all correlations (Table A.7), the most striking was an inverse correlation of relative changes in macrophage and B-cell signatures between primary and relapse specimens (Spearman r = -0.796, Spearman P < 0.001), which was evident in both correlation groups (Figure 4.9E). I validated these findings by immunohistochemistry (IHC) using antibodies directed against CD20 and CD163, confirming this inverse correlation (Spearman r = -0.645, Spearman P < 0.001) (Figure 4.9F). Specific patient examples of this macrophage / B-cell pattern are shown in Figure 4.9G,H and Figure 4.10.

Collectively, these results demonstrate that relapse specimens can be biologically distinct from their corresponding primary specimen and indicate the value of obtaining repeat biopsies at time of relapse to discover novel biomarkers for response to second-line therapies, including ASCT.

[Figure 4.8 plot: axis and legend labels are Primary Specimen, Relapse Specimen, Number of Transitions, and the subtypes NS, MC, LR, LD.]
Figure 4.8: Primary vs. Relapse Histological Subtype Transitions. Panel A: Distribution of histological subtypes across both primary and relapse specimens. Two primary and 8 relapse specimens were classified as extranodal or not otherwise specified and were excluded from this plot and further analyses. Panel B: Barplot demonstrating the number of patients with differences in the histological subtype between their primary and relapse specimen. Panel C: Paired line plot demonstrating the specific subtype transition (y-axis) between matching primary and relapse specimen (x-axis). Each line represents the histological subtype transition between a matched primary and relapse specimen. The width and the color of the line represent the number of patients with a given subtype transition. Only patients with WHO subtypes at both timepoints were considered for this analysis. NS, nodular sclerosis; MC, mixed cellularity; LR, lymphocyte-rich; LD, lymphocyte-depleted.

[Figure 4.9 plot: annotation tracks and legends are post-ASCT-OS and post-ASCT-FFS outcome status (event, no event), primary refractory status (relapse, primary refractory), histological subtype (NS, MC, LR, LD, NOS/extranodal), EBV status at primary diagnosis (EBV+, EBV-), response to salvage therapy (chemosensitive, chemoresistant), IPS (high risk, low risk), and a fold change scale from lower to higher expression in the relapse specimen; panels G and H show CD20 and CD163 staining of primary and relapse specimens.]
Figure 4.9: Primary vs. Relapse Gene Expression Differences. Panel A: Paired primary-relapse Spearman r² correlation distribution. A Gaussian mixture model is fitted to identify a set of low correlation pairs (dark grey distribution) and high correlation pairs (light grey distribution). Inserted scatterplots show gene expression data for representative patients from the two distributions. Panel B: The two correlation groups differ in their post-ASCT survival. Panel C: Fold change matrix of paired relapse vs. primary specimens. Each column represents a patient and each row represents a signature. Each cell represents the fold change in the expression of the genes included in a particular signature between the relapse - primary specimen pair. Panel D: Correlation matrix of the cumulative signature differences. Panel E: Macrophage signature vs. B-cell signature differences (relapse - primary) for each patient (represented as individual dots).
The area shaded in light grey represents the 95% confidence interval. Panel F: CD163 positive pixel count difference vs. CD20 positive pixel count difference (relapse - primary, as measured by IHC and automated image analysis) for each patient (presented as individual dots). The area shaded in light grey represents the 95% confidence interval. Panels G and H: Representative examples of two patients showing their paired biopsies with immunohistochemical staining for CD20 and CD163. The measurement bar equals 50 µm, original magnification: x200.

[Figure 4.10 image: panels A and B, CD20 and CD163 staining of primary and relapse specimens.]
Figure 4.10: CD163 IHC vs. CD20 IHC. Panels A and B show the inverse correlation of CD163 IHC difference vs. CD20 IHC difference in two representative patients (HL1003_HL1015; HL1053_HL1065). The measurement bar is 50 µm.

4.3.2 Relapse Biopsies are Superior for Predicting Post-Autologous Stem Cell Transplantation Outcomes

Given the significant biological differences between primary and relapse specimens in cHL, I next asked whether molecular characteristics in relapse specimens contained superior prognostic properties compared to primary biopsies. First, I performed a post-ASCT-FFS and post-ASCT-OS univariate Cox regression analysis using all primary (n = 146) and relapse specimens (n = 69, see Figure 4.1B) from patients who received an ASCT. This analysis revealed that gene expression measurements in relapse specimens (87 significant genes) were more frequently associated with post-ASCT-FFS compared to primary specimens (27 genes; Fisher P < 0.001) (Figure 4.11A,B). This observation was also statistically significant (Fisher P < 0.001) for post-ASCT-OS as an endpoint (121 significant genes in relapse vs. 38 significant genes in primary, Figure 4.11C,D).
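The Fisher comparison of significant gene counts can be reproduced in outline as follows. The thesis does not state the contingency table it used, so this sketch assumes that 785 genes were tested in each specimen type (the per-sample feature count given in Section 4.2.4) and that the test was two-sided. The function name is hypothetical and the implementation is a plain hypergeometric enumeration, not a call into any statistics package.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x "successes" in row 1 given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Assumed table: 785 genes tested per specimen type (not stated in the text).
GENES_TESTED = 785
relapse_significant, primary_significant = 87, 27  # post-ASCT-FFS counts
p_value = fisher_exact_two_sided(
    relapse_significant, GENES_TESTED - relapse_significant,
    primary_significant, GENES_TESTED - primary_significant,
)
print(f"Fisher exact P = {p_value:.3g}")
```

Under these assumptions the P value falls well below 0.001, in line with the reported result; a different number of tested genes would change the exact value.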
This finding suggests that relapse specimens contain more individual prognostic signals than primary specimens to predict post-ASCT outcomes.

To further validate the superior prognostic potential of gene expression measurements at relapse, I combined the expression measurements of primary and relapse specimens into a single profile and performed multivariate Cox regression-based feature selection. The resultant outcome prognostic models consisted of more gene expression features measured at relapse across a range of penalization terms (Figure 4.12).

Finally, concordance statistics to measure the collective prognostic power of gene expression measurements in primary and relapse specimens were generated through bootstrapping (1000 models). This analysis revealed that models generated using relapse specimens had a significantly higher mean concordance statistic than those generated using primary specimens for both post-ASCT-FFS (0.785 ± 0.073 vs. 0.594 ± 0.079, two-sample T-test P < 0.001) and post-ASCT-OS (0.786 ± 0.090 vs.
0.731 ± 0.107, two-sample T-test P < 0.001) (Figure 4.11E).

In aggregate, these results establish that gene expression measurements at relapse contain superior prognostic properties for predicting post-ASCT outcome compared to those measured in the primary pre-treatment specimen.

[Figure 4.11 plot: panels A and C, density of univariate Cox regression P values for post-ASCT-FFS and post-ASCT-OS; panels B and D, number of significant genes by specimen type (post-ASCT-FFS: primary 27, relapse 87, P < 0.001; post-ASCT-OS: primary 36, relapse 121, P < 0.001) for patients eligible for post-ASCT analysis (primary n = 146, relapse n = 69); panel E, concordance by specimen type for both endpoints (primary n = 56, relapse n = 56; P < 0.001 for each endpoint).]
Figure 4.11: Superior Post-ASCT Prognostic Properties of Relapse Specimens Compared to Primary Specimens. Panels A and C: Post-ASCT-FFS and post-ASCT-OS univariate Cox regression P value distribution for primary and relapse specimens. Dotted vertical red line indicates the P value of 0.05. Panels B and D: Number of significant genes from univariate Cox regression analysis for post-ASCT-FFS and post-ASCT-OS.
Panel E: Distribution of concordance statistics across 1000 models for each specimen type and post-ASCT endpoint.

[Figure 4.12 plot: number of selected genes (y-axis) vs. alpha (x-axis, 0.05 to 1 in steps of 0.05) for primary (n = 56) and relapse (n = 56) specimens from patients eligible for post-ASCT analysis.]
Figure 4.12: Penalized Multivariate Cox Regression for Directly Comparing Primary vs. Relapse Specimens. Number of genes selected from penalized Cox regression for each sample type across a range of alphas for the endpoints post-ASCT-FFS (panel A) and post-ASCT-OS (panel B).

4.3.3 A Novel Prognostic Model (RHL30) using Relapse Specimens Predicts Post-Autologous Stem Cell Transplantation Outcomes

Having established that relapse biopsies provide superior information for predicting response to ASCT, I next sought to construct a prognostic model for the endpoint post-ASCT-FFS.
Using genes associated with post-ASCT-FFS by univariate Cox regression analysis in relapse specimens (Figure 4.13 and Figure 4.14), a parsimonious prognostic model (Figure 4.7A) was constructed that encapsulates and leverages a multitude of biological features and prognostic signals associated with post-ASCT outcomes.

[Figure 4.13 plot: forest plot of log2 hazard ratios (x-axis) for individual genes (y-axis), grouped by signature (Drug-Resistance, B-cell, CTL, Eosinophil, HRS, Macrophage, MDSC, Neutrophil, NK, N/A) and marked as associated with good or poor post-ASCT-FFS outcome.]
Figure 4.13: Univariate Cox Regression on Post-BMT-FFS using Relapse Specimens.
[Figure 4.14 plot: forest plot of log2 hazard ratios (x-axis) for individual genes (y-axis), grouped by signature (Drug-Resistance, B-cell, CTL, Eosinophil, FDC, HRS, Macrophage, Mast, MDSC, Neutrophil, NK, Plasma, T-cell, N/A) and marked as associated with good or poor post-ASCT-OS outcome.]
Figure 4.14: Univariate Cox Regression on Post-BMT-OS using Relapse Specimens.

This model, consisting of 18 outcome-associated and 12 housekeeping genes, was called the RHL30 (Figure 4.15A, Table 4.2). The represented gene signatures in the RHL30 included B-cell, macrophage, HRS cell, neutrophil, and NK cell components, as well as a drug resistance component.
[Figure 4.15 plot: panel A, heatmap of RHL30 gene expression across the BCCA training specimens, with model coefficients, signature assignments (B-cell, Drug-Resistance, HRS, Macrophage, Neutrophil, NK) and annotation tracks for response to salvage therapy, primary refractory status, post-ASCT-OS and post-ASCT-FFS; panel B, RHL30 model score per specimen with the threshold of 10.4; panels C to H, Kaplan-Meier curves of survival probability vs. years for post-ASCT-FFS and post-ASCT-OS, with numbers at risk, in the BCCA training cohort (low-risk n = 50, high-risk n = 15), the UMCG validation cohort (low-risk n = 23, high-risk n = 8) and the AUH validation cohort (low-risk n = 20, high-risk n = 7).]
Figure 4.15: RHL30 Predicts Response to ASCT. Panel A shows the heatmap of the expression values (z-score normalized) of the genes in the RHL30. Specimens are ordered by the RHL30 model score (panel B) while genes are ordered by their coefficient value (panel next to the gene expression heatmap). The plot on the far right of panel A indicates the signature assignment of a particular gene. Horizontal dotted red line in panel B indicates the threshold used to dichotomize patients into low- and high-risk. Kaplan-Meier curves of the high- vs. low-risk groups (as identified by the RHL30) for post-ASCT-FFS and post-ASCT-OS in the BCCA training (panels C,D), as well as in the UMCG (panels E,F), and AUH (panels G,H) validation cohorts.

For the purpose of risk stratification, different model score thresholds were applied to stratify patients into high- vs. low-risk groups.
The hazard ratio was consistently above 1, indicating that the RHL30 predictive power is maintained over a broad range of potential thresholds (Figure 4.16B).

[Figure 4.16 plot: panel A, −log10(P-value) (y-axis) vs. split index (x-axis), with reference lines at P = −log10(0.05) and at the threshold of 10.4; remaining panels, hazard ratio (y-axis) vs. split index (x-axis), with a reference line at hazard ratio = 1.]
Figure 4.16: RHL30 Log-rank P-values and Hazard Ratios of Different Thresholds in the Study Cohort. Panel A: Log-rank P-values of RHL30 high- vs. low-risk patients at various threshold splits. The x-axis represents various splits of the BCCA training cohort into high- and low-risk with the threshold increasing from left to right. The y-axis represents the log-rank P-value (on a −log10 scale) for each specific split. Panels B-D: Hazard ratio of RHL30 high- vs. low-risk patients at various threshold splits in the BCCA training (panel B), UMCG (panel C), and AUH cohorts (panel D). The y-axis represents the hazard ratio for each specific split. A hazard ratio above 1 represents a higher hazard for patients in the high-risk group compared to the low-risk group. This figure demonstrates that at various thresholds the hazard ratio is consistently above 1, indicating a model that is able to identify a consistently high-risk group of patients.

Ultimately, a threshold of 10.4 was chosen to maximize the survival differences (based on the log-rank test P value) between the risk classes (Figure 4.16A,B) while identifying at least 20% of patients as high risk.
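The threshold choice described above amounts to a scan over candidate splits. The sketch below illustrates one way such a scan could be written; it is not the code used in the thesis. It assumes that patients with a score at or above the threshold are called high risk, that maximizing the log-rank chi-square statistic is equivalent to minimizing the log-rank P value (true for one degree of freedom), and it uses hypothetical toy scores, follow-up times and event indicators.

```python
import math


def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        n = sum(1 for x in times if x >= t)  # at risk, both groups
        n1 = sum(1 for x, h in zip(times, in_high) if h and x >= t)  # at risk, high
        d = sum(1 for x, e in zip(times, events) if e and x == t)  # events at t
        d1 = sum(1 for x, e, h in zip(times, events, in_high)
                 if e and h and x == t)  # events at t in the high-risk group
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance <= 0:
        return 0.0
    return observed_minus_expected ** 2 / variance


def logrank_p(chi2):
    """Upper tail of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def choose_threshold(scores, times, events, min_high_fraction=0.2):
    """Return (threshold, chi2) maximizing the log-rank statistic.

    Only splits that leave at least min_high_fraction of patients in the
    high-risk group, and at least one patient in the low-risk group, count.
    """
    best = None
    for cut in sorted(set(scores)):
        in_high = [s >= cut for s in scores]
        n_high = sum(in_high)
        if n_high == len(scores) or n_high < min_high_fraction * len(scores):
            continue
        chi2 = logrank_chi2(times, events, in_high)
        if best is None or chi2 > best[1]:
            best = (cut, chi2)
    return best


# Hypothetical toy cohort for illustration only; these are not thesis data.
toy_scores = [9.1, 9.4, 9.8, 10.0, 10.1, 10.3, 10.5, 10.8, 11.0, 11.4]
toy_times = [12, 10, 11, 9, 8, 10, 2, 3, 1, 2]   # years of follow-up
toy_events = [0, 0, 1, 0, 0, 0, 1, 1, 1, 1]      # 1 = failure, 0 = censored
threshold, chi2 = choose_threshold(toy_scores, toy_times, toy_events)
print(threshold, chi2, logrank_p(chi2))
```

Because the threshold is tuned on the same cohort it is evaluated in, the resulting P value is optimistic, which is why the locked-in threshold is then carried unchanged into independent validation cohorts.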
This schema produced a high-risk group of patients with significantly inferior post-ASCT-FFS compared to the low-risk group (5-year: 37.5% high-risk vs. 70.1% low-risk, Figure 4.15C) and also inferior post-ASCT-OS (5-year: 37.5% high-risk vs. 81.6% low-risk, Figure 4.15D).

To validate the prognostic power of RHL30, I applied the model to two separate validation cohorts of relapse specimens from UMCG (n = 31) and AUH (n = 27) (Figure 4.7B). According to the established and locked-in threshold (10.4) from the BCCA training cohort, the RHL30 dichotomized the patients of each validation cohort into high- and low-risk groups (Figure 4.17 and Figure 4.18) with proportions similar to those observed in the BCCA training cohort. In the UMCG validation cohort, high-risk patients displayed an unfavourable post-ASCT-FFS compared to low-risk patients (Figure 4.15E) (5-year post-ASCT-FFS: 37.5% high-risk vs. 70.1% low-risk, P = 0.017) and also an unfavourable post-ASCT-OS (Figure 4.15F) (5-year post-ASCT-OS: 37.5% high-risk vs. 71.6% low-risk, P = 0.006). Highly significant outcome correlations were also found in the AUH validation cohort for post-ASCT-FFS (Figure 4.15G) (5-year post-ASCT-FFS: 14.3% high-risk vs. 88.7% low-risk, P < 0.001) and post-ASCT-OS (Figure 4.15H) (5-year post-ASCT-OS: 28.6% high-risk vs. 85.7% low-risk, P < 0.001).

[Figure 4.17 plot: panel A, heatmap of RHL30 gene expression in the UMCG relapse specimens, with a scale from lower to higher expression in the relapse specimen and annotation tracks for response to salvage therapy (chemosensitive, chemoresistant), primary refractory status (relapse, primary refractory), post-ASCT-OS and post-ASCT-FFS (event, no event); panel B, RHL30 risk score per specimen with the threshold line.]
Figure 4.17: RHL30 Model Applied to UMCG Validation Cohort. Panel A shows the heatmap of the expression values (z-score normalized) of the genes in the RHL30 model (rows) in the different UMCG relapse specimens (columns).
Patients are ordered by the RHL30 model score (Panel B) while genes are ordered by their coefficient value. The horizontal dotted red line in Panel B indicates the threshold used to dichotomize patients into low- and high-risk groups.

[Figure 4.18 image. Panel A: heatmap of the RHL30 model genes (rows) across AUH relapse specimens (columns), with a colour scale from lower to higher expression in the relapse specimen. Gene rows as listed: CSF1, TNFSF9, GCS, SDC4, ABCA3, RNF144B, APOE, LGMN, CX3CL1, IL13RA1, CCL20, IGSF3, SOD2, IL3RA, ABCG1, ULBP3, CR2, BACH2. Genes are annotated with their RHL30 model coefficient and a category (B-cell, drug resistance, HRS, macrophage, neutrophil, NK). Annotation tracks: response to salvage therapy (chemosensitive/chemoresistant), primary refractory status (relapse/primary refractory), post-ASCT-OS, and post-ASCT-FFS (event/no event). Panel B: RHL30 risk score per patient, with the threshold at 10.4.]

Figure 4.18: RHL30 Model Applied to AUH Validation Cohort. Panel A shows the heatmap of the expression values (z-score normalized) of the genes in the RHL30 model (rows) in the different AUH relapse specimens (columns). Patients are ordered by the RHL30 model score (Panel B) while genes are ordered by their coefficient value. The horizontal dotted red line in Panel B indicates the threshold used to dichotomize patients into low- and high-risk groups.

I next examined whether RHL30 was an independent prognostic marker with respect to previously reported prognostic factors of post-ASCT outcomes. Information was available for the following prognostic markers: time to first relapse, response to salvage therapy, age ≥ 45 years, and stage IV at primary diagnosis.
Pairwise multivariate Cox regression analyses (Figure 4.19) (Table A.8 and Table A.9) demonstrated that the RHL30 was statistically independent (P < 0.05), or nearly so, of all these prognostic markers in all three cohorts.

Taken together, the RHL30 represents a novel gene expression-based prognostic model for response to ASCT, derived from relapse specimens, that validates in two independent cohorts and is independent of previously reported prognostic markers.

[Figure 4.19 image. Forest plots of hazard ratios (x-axis) for post-ASCT-FFS and post-ASCT-OS. Each pairwise model compares RHL30 high-risk with one of: time to first relapse, primary refractory status, chemoresistance to salvage therapy, age ≥ 45, and stage IV (stage IV is absent from Panel B).]

Figure 4.19: RHL30 vs. Reported Prognostic Markers Forest Plot. The RHL30 risk class is compared to other reported prognostic markers (y-axis) of post-ASCT outcomes using pairwise Cox regression. Each dot represents the hazard ratio (x-axis), with the bar representing the 95% confidence interval. Each facet on the y-axis is a different pairwise multivariate Cox regression. Panel A: BCCA training cohort. Panel B: UMCG validation cohort. Statistical independence from stage IV was not assessed due to the lack of stage IV patients. Panel C: AUH validation cohort.

4.4 Discussion

Through histological evaluation and whole tissue gene expression profiling, I provide evidence that primary and relapse specimens of cHL can be biologically divergent, reflecting, in part, differences in their TME composition.
I provide statistical evidence that the biology at relapse, compared to primary diagnosis, provides superior prognostic features for predicting treatment outcome after ASCT. Finally, I have developed a novel gene expression-based prognostic model of post-ASCT-FFS (RHL30) that was derived from relapse specimens and validated in two external cohorts of similarly treated patients.

While there exists a rich literature describing the biology of cHL [62, 76], there has been a disproportionate focus on treatment-naive primary specimens, leaving a paucity of literature on the biology of relapsed disease. Using histological subtyping and gene expression profiling of 71 paired primary and relapse specimens, I provide evidence for biological divergence, suggesting that chemotherapy, such as ABVD, can induce selective pressures resulting in tumour / TME evolution. The comparative analysis of prognostic properties between primary and relapse biopsies also indicates that accurate outcome prediction has to account for tumour evolution, and that prognostic features to predict outcome of ASCT are best derived from relapse specimens.

There is increasing evidence that the constituent elements of the TME are active contributors to the pathogenesis of cHL. Additionally, reports that high macrophage [79] and B-cell content [78, 79] at the time of primary diagnosis associate with poor and good patient outcomes, respectively, strongly support the notion that the TME composition affects the likelihood of first-line treatment failure and overall survival. Two previous studies have also suggested that macrophage [185] and T-cell content [184] at relapse might be predictive of ASCT outcomes. However, both of these studies relied on IHC, which can be imprecise due to inter-laboratory and inter-observer variability.
Moreover, the use of single IHC biomarkers likely fails to fully capture the predictive properties of multiple components in the TME.

In contrast, NanoString [197] digital gene expression profiling simultaneously measures multiple biological components that can be weighted and integrated into predictive models, employing a platform with proven reliability in quantifying RNA species from FFPE material [198] and a rapid assay turnaround time of only 36 hours. To develop a clinically useful tool for outcome prediction at the time point of relapse, I chose to train a prognostic model (RHL30) based on post-ASCT-FFS as the most disease-specific endpoint. Specifically, the model robustly captures the association of multiple biological components with outcome following ASCT and was validated in two independent external cohorts of relapse specimens.

The prognostic power of the RHL30 may have important translational relevance for future clinical trials and patient management. Firstly, it identifies a low-risk group of patients who have excellent survival rates when treated with the current standard of care, and a sizable group of patients who much more frequently experience second-line treatment failure. Such prognostic information can provide the foundation for informed clinical decision making, supporting the use of ASCT as a second-line regimen with its high likelihood of success in low-risk patients, or suggesting that alternative therapeutic approaches should be considered when there is a high probability of treatment failure, as is seen in high-risk patients. For these high-risk patients, the development of biomarkers at relapse, such as RHL30, comes at a very timely juncture in the field, as recent studies have demonstrated the efficacy of novel therapies such as brentuximab vedotin [90] and PD-1 blockade [93, 94].
Moreover, consolidation with brentuximab vedotin has been demonstrated to improve post-ASCT survival [187], but the subgroup of patients most likely to benefit from this consolidation approach still needs to be determined.

While the RHL30 assay represents a validated prognostic assay in relapsed, ABVD-pretreated patients who are subjected to ASCT-based second-line therapies, future studies will be needed that incorporate RHL30 into various clinical trial designs and investigate patients treated with brentuximab vedotin consolidation or different first-line regimens. The availability of biopsies taken at relapse will be critical in this process to provide patients and treating physicians access to the benefits of improved risk stratification and related clinical decision making. In summary, the RHL30 is an important step towards achieving precision medicine for patients experiencing relapse of cHL.

Chapter 5
Conclusions and Future Directions

Disease progression in B-cell lymphomas is a significant clinical burden. Inter-tumour, intra-tumour, and tumour microenvironment (TME) heterogeneity are major contributing factors to disease progression and treatment outcome. Inter-tumour heterogeneity partly explains why patients respond to treatment differently. Intra-tumour heterogeneity implies that multiple clones are under selection, where each clone could provide a distinct evolutionary trajectory towards a progressive disease phenotype (e.g. therapeutic resistance). In addition, the tumour microenvironment can provide malignant cells with an extrinsic therapeutic resistance mechanism. Thus, TME heterogeneity can contribute to differences in patient responses to treatment, similarly to inter-tumour heterogeneity.
Moreover, as TME heterogeneity is subject to change over time (TME dynamics), it can also contribute to disease progression through the provision of a novel therapeutic resistance mechanism to the malignant cells.

In this light, the primary goal of this thesis was to employ state-of-the-art bioinformatics approaches to characterize these forms of heterogeneity in B-cell lymphomas and their associations with disease progression. This would allow for improved patient risk stratification, earmarking of alternative therapies, and ultimately better-informed clinical management of B-cell lymphoma patients. To this end, this thesis includes three studies that characterize these forms of heterogeneity and their implications for disease progression across three different B-cell lymphoma subtypes. The results of this work have broad implications for the community. These are summarized below.

Identification of novel prognostic biomarkers in DLBCL and cHL. In Chapter 2, I identified novel deletions in RCOR1 that produced a downstream prognostic expression signature in DLBCL. This discovery has contributed to the characterization of inter-tumour heterogeneity in DLBCL and also led to the identification of a novel genomic-based prognostic factor that can aid in the risk stratification of DLBCL patients. In Chapter 4, I developed a novel clinically applicable prognostic model (RHL30), based on the TME composition at relapse, that can stratify patient responses to second-line treatment. The RHL30 identifies relapse patients who may benefit from experimental therapeutics without the need to first endure toxicity from a predictable ASCT treatment failure.

Deciphering intra-tumour heterogeneity and modelling tumour evolution in FL. In Chapter 3, I inferred clonal dynamic patterns using serial specimens and demonstrated how histological transformation and early progression in FL manifest through disparate modes of tumour evolution.
Specifically, my tumour evolution modelling showed that the clonal expansion patterns of known driver mutations typically reflect positive selection in the pathogenesis of transformation. In contrast, clonal dynamic patterns between diagnostic and early progressed specimens were consistent with neutral evolutionary dynamics. The distinct tumour evolution patterns underlying these two clinical extremes also highlight the need for different clinical management strategies.

Highlighting the importance of relapse biopsies and TME heterogeneity in cHL. In Chapter 4, I demonstrated for the first time, to the best of my knowledge, the existence of TME dynamics between paired primary and relapsed cHL specimens. Additionally, my analyses showed how differences in patient responses to treatment can be attributed to inter-tumour microenvironment heterogeneity. These results establish that primary and relapse cHL specimens can be molecularly divergent and that the TME composition can inform treatment outcome. Moreover, I demonstrate that the predictive power of the TME composition at relapse is superior to that at primary diagnosis for predicting patient response to second-line treatment. This highlights the clinical relevance of acquiring relapse biopsies for the discovery of novel and superior prognostic biomarkers for predicting second-line treatment response.

5.1 Future Directions

5.1.1 RCOR1 as a Prognostic and Therapeutic Target

While Chapter 2 identified RCOR1 deletions and the corresponding downstream gene signature as a novel prognostic factor, further work is needed to establish the methodology for clinically applying the RCOR1-loss gene signature to risk stratification.
A major limitation of the study was that the transcriptional profiles of the independent cohorts were assayed on a platform (microarrays) different from that of the discovery cohort (RNA-seq). As a consequence, the demonstration of the signature’s prognostic significance was restricted to a “rediscovery approach”, via unsupervised clustering. If all the transcriptional data had been generated on the same platform, a statistical model derived from the discovery cohort could have been established and applied to the independent cohorts. This would have been consistent with a validation approach and would also have been the first step towards a clinically applicable test. To overcome this limitation, the RCOR1-loss gene signature could be translated into a clinically applicable assay similar to the RHL30 assay (Chapter 4). Moreover, the genomics era has led to the discovery of other genome-based prognostic factors (e.g. CDKN2A [48], FOXO1 [47]). It remains to be determined if synergistic outcome patterns could be found through the integration of RCOR1 deletions and their downstream gene signature with other reported prognostic factors.

Additionally, shRNA knockdowns of RCOR1 provided a first glance at the potential functional consequences of an RCOR1 deletion. However, the exact functional role of RCOR1 deletions in lymphomagenesis still remains to be elucidated. More detailed insight could be gained through genomic deletion of RCOR1, via CRISPR-mediated genome editing in vitro or in vivo, followed by expression profiling. Such characterization could also aid in the design of therapeutic approaches that target the downstream effects of RCOR1 loss.

5.1.2 Extent of Spatial Heterogeneity in B-cell Lymphomas

In Chapter 3, the existence of mutations that were undetectable at T1 but highly prevalent at T2 suggested that these mutations might have arisen de novo during this time period in TFL patients.
However, another explanation for this finding could be spatial heterogeneity in B-cell lymphomas that was not accounted for in the experimental design. As B-cell lymphomas are derived from B lymphocytes that circulate freely within the lymphatic tissues and throughout the vascular system, the widely accepted notion is that B-cell lymphomas are disseminated throughout the body and thus spatial heterogeneity might be less prevalent. However, this has not been, to the best of my knowledge, conclusively proven in B-cell lymphomas. Additional studies to profile the spatial heterogeneity within the same lymph node, or between lymph nodes, could help address this question. If spatial heterogeneity does exist, then this could explain the inability to detect mutations at low prevalence and indicate that a single lymph node specimen is unable to fully capture the biological heterogeneity of a tumour. However, if the full extent of spatial heterogeneity could be accounted for, our ability to predict histological transformation could be further improved.

5.1.3 Prediction of Early Progression in Follicular Lymphoma

The PFL clonal dynamic patterns in Chapter 3 demonstrated that clones harbouring treatment-resistance properties were already present at diagnosis, and that disease progression may be attributed to selection of clones that were major constituents of the diagnostic sample. As such, prediction of early treatment resistance should be achievable through comprehensive characterization of the genetic and clonal composition at diagnosis. This thesis took the first steps towards this by identifying mutations that were preferentially associated with early progression in a cohort of 125 patients, comparing 41 early vs. 84 late progressors. Further work with larger cohorts is needed to fully characterize the collection of mutations that can predict early progression.
This could eventually culminate in a gene mutation-based model that could incorporate clinical factors to produce a clinicogenetic model akin to the m7-FLIPI [56].

5.1.4 Deconvolution of the TME

In Chapter 4, a non-probabilistic, supervised method was applied to deconvolute the composition of the TME from gene expression data. This method relied on annotation data from the literature to associate genes with certain cell types; thus it was unable to discover novel associations. Moreover, an assumption was made that all genes had equivalent contributions to the cell type they were assigned to. This assumption is likely invalid, as genes are often expressed by multiple cell types and thus their expression contribution is distributed across various cell types.

As such, a more sophisticated probabilistic machine learning approach should be more accurate at deconvoluting the TME. Application of the tool CIBERSORT [199] or an implementation of a latent Dirichlet allocation approach could be the next steps.

5.1.5 Establishing the RHL30 as a Prognostic and Predictive Biomarker

In Chapter 4, a key contribution was the development of RHL30, a novel prognostic model for ASCT derived from relapse specimens. While I validated this in two independent cohorts, the prognostic significance should be validated in additional cohorts and also in an independent lab. Moreover, the pipeline to generate RHL30 scores needs to be standardized and readily distributed to other groups and clinical labs. Finally, the current work has only shown that the RHL30 is a prognostic biomarker. Clinical trials will be needed to determine whether the inferior survival of high-risk patients can be overcome with novel therapeutics (e.g. immunotherapy, targeted therapy).
Such work would then establish the RHL30 as a predictive biomarker as well.

5.2 Concluding Remarks

The work presented in this thesis constitutes a step forward in our characterization of tumour heterogeneity and its association with disease progression in DLBCL, FL, and cHL. I attained my overall thesis goal of deciphering previously undescribed patterns of inter-tumour, intra-tumour and TME heterogeneity and their associations with disease progression and treatment outcome. Moreover, these studies demonstrate how the integration of high-resolution genomics data, state-of-the-art bioinformatic techniques, and clinical patient information can reveal previously unrecognized patterns associated with disease progression. As such, these results represent some of the newest insights into tumour biology and its implications for clinical management, and collectively represent a major advance towards precision medicine for individual lymphoma patients.

Bibliography

[1] Jemal, A. et al. Cancer Statistics, 2008. CA: A Cancer Journal for Clinicians 58, 71–96 (2008). → pages 1
[2] Nowell, P. The clonal evolution of tumor cell populations. Science 194, 23–28 (1976). → pages 1
[3] Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–313 (2012). → pages 1, 2, 110
[4] Burrell, R. A., McGranahan, N., Bartek, J. & Swanton, C. The causes and consequences of genetic heterogeneity in cancer evolution. Nature 501, 338–345 (2013). → pages 1, 3, 110
[5] Scott, D. W. & Gascoyne, R. D. The tumour microenvironment in B cell lymphomas. Nat Rev Cancer 14, 517–534 (2014). → pages 1, 3, 5, 14, 15, 16, 110
[6] Morin, R. D. et al. Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature 476, 298–303 (2011). → pages 2, 9, 11, 18, 20, 22, 31, 44
[7] Pasqualucci, L. et al. Analysis of the coding genome of diffuse large B-cell lymphoma. Nat Genet 43, 830–837 (2011). → pages 11, 18, 36, 44, 56
[8] Lohr, J. G. et al.
Discovery and prioritization of somatic mutations in diffuse large B-cell lymphoma (DLBCL) by whole-exome sequencing. Proc Natl Acad Sci U S A 109, 3879–3884 (2012). → pages 20
[9] Zhang, J. et al. Genetic heterogeneity of diffuse large B-cell lymphoma. Proc Natl Acad Sci U S A (2013). → pages 2, 9
[10] Shah, S. P. et al. The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature 486, 395–399 (2012). → pages 3
[11] Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med 366, 883–892 (2012). → pages 4
[12] Bashashati, A. et al. Distinct evolutionary trajectories of primary high-grade serous ovarian cancers revealed through spatial mutational profiling. The Journal of pathology 231, 21–34 (2013). → pages
[13] Swanton, C. Intratumor heterogeneity: evolution through space and time. Cancer Res 72, 4875–4882 (2012). → pages 4
[14] Shah, S. P. et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 461, 809–813 (2009). → pages 4, 110
[15] Ding, L. et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 481, 506–510 (2012). → pages 59, 110
[16] Landau, D. A. et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 152, 714–726 (2013). → pages 4, 59, 97
[17] Mottok, A. & Steidl, C. Genomic alterations underlying immune privilege in malignant lymphomas. Current Opinion in Hematology 22, 343–354 (2015). → pages 5, 110
[18] Fridman, W. H., Pagès, F., Sautès-Fridman, C. & Galon, J. The immune contexture in human tumours: impact on clinical outcome. Nat Rev Cancer 12, 298–306 (2012). → pages 5
[19] Junttila, M. R. & de Sauvage, F. J. Influence of tumour micro-environment heterogeneity on therapeutic response. Nature 501, 346–354 (2013). → pages 5
[20] Meads, M. B., Gatenby, R. A. & Dalton, W. S. Environment-mediated drug resistance: a major contributor to minimal residual disease.
Nat Rev Cancer 9, 665–674 (2009). → pages 5
[21] Shojaei, F. et al. Tumor refractoriness to anti-VEGF treatment is mediated by CD11b+Gr1+ myeloid cells. Nature biotechnology 25, 911–920 (2007). → pages 5
[22] Fang, H. & Declerck, Y. A. Targeting the tumor microenvironment: from understanding pathways to effective clinical trials. Cancer Res 73, 4965–4977 (2013). → pages 5
[23] Shi, Y., Du, L., Lin, L. & Wang, Y. Tumour-associated mesenchymal stem/stromal cells: emerging therapeutic targets. Nature reviews. Drug discovery (2016). → pages 5
[24] Lenz, G. & Staudt, L. M. Aggressive lymphomas. N Engl J Med 362, 1417–1429 (2010). → pages 6, 8, 9, 18, 20
[25] Swerdlow, S. H., International Agency for Research on Cancer & World Health Organization. WHO classification of tumours of haematopoietic and lymphoid tissues (World Health Organization, 2008), 4 edn. → pages 6, 8, 11
[26] Küppers, R. Mechanisms of B-cell lymphoma pathogenesis. Nat Rev Cancer 5, 251–262 (2005). → pages 6
[27] Lieber, M. R. Mechanisms of human lymphoid chromosomal translocations. Nat Rev Cancer 16, 387–398 (2016). → pages 6
[28] Jung, D., Giallourakis, C., Mostoslavsky, R. & Alt, F. W. Mechanism and control of V(D)J recombination at the immunoglobulin heavy chain locus. Annual review of immunology 24, 541–570 (2006). → pages 6
[29] Küppers, R. & Dalla-Favera, R. Mechanisms of chromosomal translocations in B cell lymphomas. Oncogene 20, – (2001). → pages 6
[30] Iqbal, J. et al. BCL2 translocation defines a unique tumor subset within the germinal center B-cell-like diffuse large B-cell lymphoma. The American journal of pathology 165, 159–166 (2004). → pages 7
[31] Royo, C. et al. The complex landscape of genetic alterations in mantle cell lymphoma. Seminars in cancer biology 21, 322–334 (2011). → pages 7
[32] Hecht, J. L. & Aster, J. C. Molecular Biology of Burkitt’s Lymphoma. jco.ascopubs.org. → pages 7
[33] Tellier, J. et al. Human t (14; 18) positive germinal center B cells: a new step in follicular lymphoma pathogenesis?
Blood 123, 3462–3465 (2014). → pages 7, 11
[34] Bakhshi, A. et al. Cloning the chromosomal breakpoint of t(14;18) human lymphomas: clustering around JH on chromosome 14 and near a transcriptional unit on 18. Cell 41, 899–906 (1985). → pages 7
[35] Kridel, R. R., Sehn, L. H. L. & Gascoyne, R. D. R. Pathogenesis of follicular lymphoma. Journal of Clinical Investigation 122, 3424–3431 (2012). → pages 9, 11, 18
[36] Alizadeh, A. A. et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503–511 (2000). → pages 9
[37] Pasqualucci, L. & Dalla-Favera, R. SnapShot: Diffuse Large B Cell Lymphoma. Cancer Cell 25, 132–132.e1 (2014). → pages 9
[38] Dunleavy, K. Double-hit lymphomas: current paradigms and novel treatment approaches. Hematology / the Education Program of the American Society of Hematology. American Society of Hematology. Education Program 2014, 107–112 (2014). → pages 9
[39] Barrans, S. et al. Rearrangement of MYC is associated with poor prognosis in patients with diffuse large B-cell lymphoma treated in the era of rituximab. J Clin Oncol 28, 3360–3365 (2010). → pages 9
[40] Green, T. M. et al. Immunohistochemical double-hit score is a strong predictor of outcome in patients with diffuse large B-cell lymphoma treated with rituximab plus cyclophosphamide, doxorubicin, vincristine, and prednisone. J Clin Oncol 30, 3460–3467 (2012). → pages 9
[41] Johnson, N. A. et al. Concurrent Expression of MYC and BCL2 in Diffuse Large B-Cell Lymphoma Treated With Rituximab Plus Cyclophosphamide, Doxorubicin, Vincristine, and Prednisone. J Clin Oncol – (2012). → pages 9
[42] Swerdlow, S. H. et al. The 2016 revision of the World Health Organization classification of lymphoid neoplasms. Blood 127, 2375–2390 (2016). → pages 10
[43] Sehn, L. H., Connors, J. M. & Gascoyne, R. Introduction of Combined CHOP Plus Rituximab Therapy Dramatically Improved Outcome of Diffuse Large B-Cell Lymphoma in British Columbia. Journal of Clinical Oncology 23, 5027–5033 (2005).
→ pages 10
[44] Feugier, P. Long-Term Results of the R-CHOP Study in the Treatment of Elderly Patients With Diffuse Large B-Cell Lymphoma: A Study by the Groupe d’Etude des Lymphomes de l’Adulte. Journal of Clinical Oncology 23, 4117–4126 (2005). → pages 10
[45] Friedberg, J. W. New strategies in diffuse large B-cell lymphoma: translating findings from gene expression analyses into clinical practice. Clinical cancer research : an official journal of the American Association for Cancer Research 17, 6112–6117 (2011). → pages 10
[46] A Predictive Model for Aggressive Non-Hodgkin’s Lymphoma. N Engl J Med 329, 987–994 (1993). → pages 10
[47] Trinh, D. L. et al. Analysis of FOXO1 mutations in diffuse large B-cell lymphoma. Blood (2013). → pages 11, 142
[48] Jardin, F. et al. Diffuse large B-cell lymphomas with CDKN2A deletion have a distinct gene expression signature and a poor prognosis under R-CHOP treatment: a GELA study. Blood 116, 1092–1104 (2010). → pages 11, 20, 36, 38, 142
[49] Green, M. R. et al. Hierarchy in somatic mutations arising during genomic evolution and progression of follicular lymphoma. Blood (2013). → pages 11
[50] Cheung, K.-J. J. et al. Acquired TNFRSF14 mutations in follicular lymphoma are associated with worse prognosis. Cancer Res 70, 9166–9174 (2010). → pages 11
[51] Oricchio, E. et al. The Eph-receptor A7 is a soluble tumor suppressor for follicular lymphoma. Cell 147, 554–564 (2011). → pages 11
[52] Tan, D. et al. Improvements in observed and relative survival in follicular grade 1-2 lymphoma during 4 decades: the Stanford University experience. Blood 122, 981–987 (2013). → pages 11
[53] Casulo, C., Burack, W. R. & Friedberg, J. W. Transformed follicular non-Hodgkin lymphoma. Blood 125, 40–47 (2015). → pages 12
[54] Wagner-Johnston, N. D. et al. Outcomes of transformed follicular lymphoma in the modern era: a report from the National LymphoCare Study (NLCS). Blood 126, 851–857 (2015). → pages 12, 58
[55] Casulo, C. et al.
Early Relapse of Follicular Lymphoma After Rituximab Plus Cyclophosphamide, Doxorubicin, Vincristine, and Prednisone Defines Patients at High Risk for Death: An Analysis From the National LymphoCare Study. Journal of Clinical Oncology 33, 2516–2522 (2015). → pages 12, 58
[56] Pastore, A. et al. Integration of gene mutations in risk prognostication for patients receiving first-line immunochemotherapy for follicular lymphoma: a retrospective analysis of a prospective clinical trial and validation in a population-based registry. The Lancet Oncology 16, 1111–1122 (2015). → pages 12, 59, 143
[57] Jurinovic, V. et al. Clinicogenetic risk models predict early progression of follicular lymphoma after first-line immunochemotherapy. Blood (2016). → pages 12, 58, 59
[58] Al-Tourah, A. J. et al. Population-based analysis of incidence and outcome of transformed non-Hodgkin’s lymphoma. J Clin Oncol 26, 5165–5169 (2008). → pages 12, 58
[59] Bastion, Y. et al. Incidence, predictive factors, and outcome of lymphoma transformation in follicular lymphoma patients. J Clin Oncol 15, 1587–1594 (1997). → pages 12
[60] Montoto, S. et al. Risk and clinical implications of transformation of follicular lymphoma to diffuse large B-cell lymphoma. Journal of Clinical Oncology 25, 2426–2433 (2007). → pages 12, 58
[61] Howlader, N. et al. SEER Cancer Statistics Review (CSR) 1975-2013. URL http://seer.cancer.gov/csr/1975_2013/. → pages 13
[62] Küppers, R. The biology of Hodgkin’s lymphoma. Nat Rev Cancer 9, 15–27 (2008). → pages 13, 138
[63] Pileri, S. A. et al. Hodgkin’s lymphoma: the pathologist’s viewpoint. Journal of clinical pathology 55, 162–176 (2002). → pages 13, 109
[64] The Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). → pages 13
[65] International Cancer Genome Consortium et al. International network of cancer genome projects. Nature 464, 993–998 (2010). → pages 13
[66] Joos, S. et al.
Classical Hodgkin lymphoma is characterized by recurrent copy number gains of the short arm of chromosome 2. Blood 99, 1381–1387 (2002). → pages 13
[67] Joos, S. et al. Genomic imbalances including amplification of the tyrosine kinase gene JAK2 in CD30+ Hodgkin cells. Cancer research 60, 549–552 (2000). → pages 13
[68] Weniger, M. A. et al. Mutations of the tumor suppressor gene SOCS-1 in classical Hodgkin lymphoma are frequent and associated with nuclear phospho-STAT5 accumulation. Oncogene 25, 2679–2684 (2006). → pages 13
[69] Gunawardana, J. et al. Recurrent somatic mutations of PTPN1 in primary mediastinal B cell lymphoma and Hodgkin lymphoma. Nat Genet (2014). → pages 13
[70] Green, M. R. et al. Integrative analysis reveals selective 9p24.1 amplification, increased PD-1 ligand expression, and further induction via JAK2 in nodular sclerosing Hodgkin lymphoma and primary mediastinal large B-cell lymphoma. Blood 116, 3268–3277 (2010). → pages 13, 17, 27, 76
[71] Steidl, C. et al. MHC class II transactivator CIITA is a recurrent gene fusion partner in lymphoid cancers. Nature 471, 377–381 (2011). → pages 13, 17
[72] Reichel, J. et al. Flow sorting and exome sequencing reveal the oncogenome of primary Hodgkin and Reed-Sternberg cells. Blood 125, 1061–1072 (2015). → pages 14, 18
[73] Liu, Y. et al. The mutational landscape of Hodgkin lymphoma cell lines determined by whole exome sequencing. Leukemia (2014). → pages 14
[74] Küppers, R. Molecular biology of Hodgkin lymphoma. Hematology / the Education Program of the American Society of Hematology. American Society of Hematology. Education Program 491–496 (2009). → pages 14
[75] Bechtel, D., Kurth, J., Unkel, C. & Küppers, R. Transformation of BCR-deficient germinal-center B cells by EBV supports a major role of the virus in the pathogenesis of Hodgkin and posttransplantation lymphomas. Blood 106, 4345–4350 (2005). → pages 14
[76] Steidl, C., Connors, J. M. & Gascoyne, R. D.
Molecular pathogenesis of Hodgkin’s lymphoma: increasing evidence of the importance of the microenvironment. J Clin Oncol 29, 1812–1826 (2011). → pages 15, 109, 110, 138
[77] Sánchez-Aguilera, A. et al. Tumor microenvironment and mitotic checkpoint are key factors in the outcome of classic Hodgkin lymphoma. Blood 108, 662–668 (2006). → pages 15
[78] Chetaille, B. et al. Molecular profiling of classical Hodgkin lymphoma tissues uncovers variations in the tumor microenvironment and correlations with EBV infection and outcome. Blood 113, 2765–3775 (2009). → pages 114, 139
[79] Steidl, C. et al. Tumor-associated macrophages and survival in classic Hodgkin’s lymphoma. N Engl J Med 362, 875–885 (2010). → pages 15, 16, 110, 114, 139
[80] Connors, J. M. State-of-the-art therapeutics: Hodgkin’s lymphoma. J Clin Oncol 23, 6400–6408 (2005). → pages 16
[81] Borchmann, P. & Engert, A. The past: what we have learned in the last decade. Hematology / the Education Program of the American Society of Hematology. American Society of Hematology. Education Program 2010, 101–107 (2010). → pages 16
[82] Hasenclever, D. et al. A Prognostic Score for Advanced Hodgkin’s Disease. N Engl J Med 339, 1506–1514 (1998). → pages 16
[83] Scott, D. W. et al. Gene expression-based model using formalin-fixed paraffin-embedded biopsies predicts overall survival in advanced-stage classical Hodgkin lymphoma. J Clin Oncol 31, 692–700 (2013). → pages 16, 114
[84] Kuruvilla, J., Keating, A. & Crump, M. How I treat relapsed and refractory Hodgkin lymphoma. Blood 117, 4208–4217 (2011). → pages 16, 18, 109
[85] Schmitz, N. et al. Aggressive conventional chemotherapy compared with high-dose chemotherapy with autologous haemopoietic stem-cell transplantation for relapsed chemosensitive Hodgkin’s disease: a randomised trial. Lancet (London, England) 359, 2065–2071 (2002). → pages 16, 109
[86] Crump, M. Management of Hodgkin Lymphoma in Relapse after Autologous Stem Cell Transplant.
Hematology / the Education Program of the American Society of150Hematology. American Society of Hematology. Education Program 2008, 326–333(2008). → pages 17[87] Alinari, L. & Blum, K. A. How I treat relapsed classical Hodgkin lymphoma afterautologous stem cell transplant. Blood 127, 287–295 (2016). → pages 17[88] Younes, A. et al. Brentuximab vedotin (sgn-35) for relapsed cd30-positivelymphomas. N Engl J Med 363, 1812–1821 (2010). → pages 17[89] Francisco, J. A. et al. cAC10-vcMMAE, an anti-CD30-monomethyl auristatin Econjugate with potent and selective antitumor activity. Blood 102, 1458–1465(2003). → pages 17[90] Chen, R. et al. Five-year survival and durability results of brentuximab vedotin inpatients with relapsed or refractory Hodgkin lymphoma. Blood (2016). → pages 17,110, 139[91] Younes, A. et al. Brentuximab vedotin combined with ABVD or AVD for patients withnewly diagnosed Hodgkin’s lymphoma: a phase 1, open-label, dose-escalationstudy. The Lancet Oncology 14, 1348–1356 (2013). → pages 17[92] Steidl, C. et al. Genome-wide copy number analysis of Hodgkin Reed-Sternbergcells identifies recurrent imbalances with correlations to treatment outcome. Blood116, 418–427 (2010). → pages 17[93] Ansell, S. M. et al. PD-1 Blockade with Nivolumab in Relapsed or RefractoryHodgkin’s Lymphoma. N Engl J Med 372, 311–319 (2015). → pages 17, 110, 139[94] Younes, A. et al. Nivolumab for classical Hodgkin’s lymphoma after failure of bothautologous stem-cell transplantation and brentuximab vedotin: a multicentre,multicohort, single-arm phase 2 trial. The Lancet Oncology (2016). → pages 18,110, 139[95] Armand, P. et al. Programmed Death-1 Blockade With Pembrolizumab in PatientsWith Classical Hodgkin Lymphoma After Brentuximab Vedotin Failure. J Clin Oncol(2016). → pages 18[96] Bea, S. et al. Diffuse large B-cell lymphoma subgroups have distinct geneticprofiles that influence tumor biology and improve gene-expression-based survivalprediction. Blood 106, 3183–3190 (2005). 
→ pages 20, 21, 35[97] Tagawa, H. et al. Comparison of genome profiles for identification of distinctsubgroups of diffuse large B-cell lymphoma. Blood 106, 1770–1777 (2005). →pages 20, 21, 35[98] Rosenwald, A. et al. The use of molecular profiling to predict survival afterchemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 346, 1937–1947(2002). → pages 20, 36, 56[99] Friedberg, J. W. Relapsed/refractory diffuse large B-cell lymphoma. Hematology /the Education Program of the American Society of Hematology. American Societyof Hematology. Education Program 2011, 498–505 (2011). → pages 21151[100] Ding, J. et al. Systematic analysis of somatic mutations impacting gene expressionin 12 tumour types. Nature communications 6, 8554 (2015). → pages 21, 29, 36[101] Chen, W. et al. Array comparative genomic hybridization reveals genomic copynumber changes associated with outcome in diffuse large B-cell lymphomas. Blood107, 2477–2485 (2006). → pages 21[102] Lenz, G. et al. Molecular subtypes of diffuse large B-cell lymphoma arise by distinctgenetic pathways. Proc Natl Acad Sci U S A 105, 13520–13525 (2008). → pages36[103] Kreisel, F. et al. High resolution array comparative genomic hybridization identifiescopy number alterations in diffuse large B-cell lymphoma that predict response toimmuno-chemotherapy. Cancer Genet 204, 129–137 (2011). → pages 21[104] Monti, S. et al. Integrative Analysis Reveals an Outcome-Associated andTargetable Pattern of p53 and Cell Cycle Deregulation in Diffuse Large B CellLymphoma. Cancer Cell 22, 359–372 (2012). → pages 21, 22, 33, 36[105] Lenz, G. et al. Stromal gene signatures in large-B-cell lymphomas. N Engl J Med359, 2313–2323 (2008). → pages 21, 22, 51[106] Wang, K. et al. PennCNV: an integrated hidden Markov model designed forhigh-resolution copy number variation detection in whole-genome SNP genotypingdata. Genome Res 17, 1665–1674 (2007). → pages 23[107] Yau, C. 
A statistical approach for detecting genomic aberrations in heterogeneoustumor samples from single nucleotide polymorphism genotyping data. Genome Biol11, R92 (2010). → pages 23[108] Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breasttumours reveals novel subgroups. Nature 486, 346–352 (2012). → pages 27, 56,76[109] Beroukhim, R. et al. Assessing the significance of chromosomal aberrations incancer: methodology and application to glioma. Proc Natl Acad Sci U S A 104,20007–20012 (2007). → pages 27[110] Paternoster, S. F. et al. A new method to extract nuclei from paraffin-embeddedtissue to study lymphomas using interphase fluorescence in situ hybridization. TheAmerican journal of pathology 160, 1967–1972 (2002). → pages 27[111] Wu, T. D. & Nacu, S. Fast and SNP-tolerant detection of complex variants andsplicing in short reads. Bioinformatics 26, 873–881 (2010). → pages 28[112] Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25,2078–2079 (2009). → pages 28[113] Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoSComputational Biology 9, e1003118 (2013). → pages 28[114] Wu, G., Feng, X. & Stein, L. A human functional protein interaction network and itsapplication to cancer data analysis. Genome Biol 11, R53 (2010). → pages 34152[115] Kameoka, Y. et al. Contig array CGH at 3p14.2 points to the FRA3B/FHIT commonfragile region as the target gene in diffuse large B-cell lymphoma. Oncogene 23,9148–9154 (2004). → pages 36[116] Pasqualucci, L. Inactivation of the PRDM1/BLIMP1 gene in diffuse large B celllymphoma. Journal of Experimental Medicine 203, 311–317 (2006). → pages 36[117] Rui, L. et al. Cooperative epigenetic modulation by cancer amplicon genes. CancerCell 18, 590–605 (2010). → pages 56[118] Scuoppo, C. et al. A tumour suppressor network relying on the polyamine-hypusineaxis. Nature 487, 244–248 (2012). → pages 56[119] Liao, G., Zhang, M., Harhaj, E. W. & Sun, S.-C. 
Regulation of theNF-kappaB-inducing kinase by tumor necrosis factor receptor-associated factor3-induced degradation. The Journal of biological chemistry 279, 26243–26250(2004). → pages 56[120] Keats, J. J. et al. Promiscuous mutations activate the noncanonical NF-kappaBpathway in multiple myeloma. Cancer Cell 12, 131–144 (2007). → pages 56[121] Nagel, S. et al. NK-like homeodomain proteins activate NOTCH3-signaling inleukemic T-cells. BMC cancer 9, 371 (2009). → pages[122] Braggio, E. et al. Identification of Copy Number Abnormalities and InactivatingMutations in Two Negative Regulators of Nuclear Factor- B Signaling Pathways inWaldenstrom’s Macroglobulinemia. Cancer Res 69, 3579–3588 (2009). → pages56[123] Chong, J. A. et al. REST: a mammalian silencer protein that restricts sodiumchannel gene expression to neurons. Cell 80, 949–957 (1995). → pages 56[124] Schoenherr, C. J. & Anderson, D. J. The neuron-restrictive silencer factor (NRSF):a coordinate repressor of multiple neuron-specific genes. Science 267, 1360–1363(1995). → pages 56[125] Ballas, N. et al. Regulation of neuronal traits by a novel transcriptional complex.Neuron 31, 353–365 (2001). → pages 56[126] You, A., Tong, J. K., Grozinger, C. M. & Schreiber, S. L. CoREST is an integralcomponent of the CoREST- human histone deacetylase complex. Proc Natl AcadSci U S A 98, 1454–1458 (2001). → pages[127] Hakimi, M.-A. et al. A core-BRAF35 complex containing histone deacetylasemediates repression of neuronal-specific genes. Proc Natl Acad Sci U S A 99,7420–7425 (2002). → pages 56[128] Huang, Y., Myers, S. J. & Dingledine, R. Transcriptional repression by REST:recruitment of Sin3A and histone deacetylase to neuronal genes. Natureneuroscience 2, 867–872 (1999). → pages 56[129] Anon. A clinical evaluation of the International Lymphoma Study Groupclassification of non-Hodgkin’s lymphoma. The Non-Hodgkin’s LymphomaClassification Project. Blood 89, 3909–18 (1997). → pages 58153[130] Anderson, J. R., Armitage, J. O. 
& Weisenburger, D. D. Epidemiology of thenon-Hodgkin’s lymphomas: distributions of the major subtypes differ by geographiclocations. Non-Hodgkin’s Lymphoma Classification Project. Ann Oncol 9, 717–20(1998). → pages 58[131] Bachy, E. et al. Long-term follow up of the FL2000 study comparingCHVP-interferon to CHVP-interferon plus rituximab in follicular lymphoma.Haematologica 98, 1107–14 (2013). → pages 58[132] Tan, D. et al. Improvements in observed and relative survival in follicular grade 1-2lymphoma during 4 decades: the Stanford University experience. Blood 122,981–987 (2013). → pages[133] Junlen, H. R. et al. Follicular lymphoma in Sweden: nationwide improved survival inthe rituximab era, particularly in elderly women - a Swedish Lymphoma Registrystudy. Leukemia 29, 668–676 (2014). → pages 58[134] Link, B. K. et al. Rates and outcomes of follicular lymphoma transformation in theimmunochemotherapy era: a report from the University of Iowa/MayoClinicSpecialized Program of Research Excellence Molecular Epidemiology Resource. JClin Oncol 31, 3272–8 (2013). → pages 58[135] Mozessohn, L. et al. Chemoimmunotherapy resistant follicular lymphoma:predictors of resistance, association with transformation and prognosis. LeukLymphoma 55, 2502–2507 (2014). → pages[136] Lerch, K. et al. Impact of prior treatment on outcome of transformed follicularlymphoma and relapsed de novo diffuse large B cell lymphoma: a retrospectivemulticentre analysis. Ann Hematol 94, 981–8 (2015). → pages 58[137] Landau, D. A. et al. Mutations driving CLL and their evolution in progression andrelapse. Nature (2015). → pages 59, 97, 110[138] Morrissy, A. S. et al. Divergent clonal selection dominates medulloblastoma atrecurrence. Nature (2016). → pages 59[139] Walter, M. J. et al. Clonal architecture of secondary acute myeloid leukemia. TheNew England journal of medicine 366, 1090–8 (2012). → pages 59[140] Fabbri, G. et al. 
Genetic lesions associated with chronic lymphocytic leukemiatransformation to Richter syndrome. The Journal of experimental medicine 210,2273–88 (2013). → pages 59[141] Okosun, J. et al. Integrated genomic analysis identifies recurrent mutations andevolution patterns driving the initiation and progression of follicular lymphoma. NatGenet 46, 176–81 (2014). → pages 59, 80, 81[142] Pasqualucci, L. et al. Genetics of Follicular Lymphoma Transformation. CellReports (2014). → pages 59, 80, 81[143] Yano, T., Jaffe, E. S., Longo, D. L. & Raffeld, M. MYC rearrangements inhistologically progressed follicular lymphomas. Blood 80, 758–767 (1992). →pages154[144] Lossos, I. S. et al. Transformation of follicular lymphoma to diffuse large-celllymphoma: Alternative patterns with increased or decreased expression of c-mycand its regulated genes. Proc Natl Acad Sci U S A 99, 8886–8891 (2002). → pages[145] Lo Coco, F. et al. p53 mutations are associated with histologic transformation offollicular lymphoma. Blood 82, 2289–2295 (1993). → pages[146] Sander, C. A. et al. p53 mutation is associated with progression in follicularlymphomas. Blood 82, 1994–2004 (1993). → pages[147] Fitzgibbon, J. et al. Genome-wide detection of recurring sites of uniparental disomyin follicular and transformed follicular lymphoma. Leukemia 21, 1514–1520 (2007).→ pages[148] Davies, A. J. et al. Transformation of follicular lymphoma to diffuse large B-celllymphoma proceeds by distinct oncogenic mechanisms. British journal ofhaematology 136, 286–293 (2007). → pages[149] Carlotti, E. et al. Transformation of follicular lymphoma to diffuse large B-celllymphoma may occur by divergent evolution from a common progenitor cell or bydirect evolution from the follicular lymphoma clone. Blood 113, 3553–3557 (2009).→ pages[150] O’Shea, D. et al. Regions of acquired uniparental disomy at diagnosis of follicularlymphoma are associated with both overall survival and risk of transformation.Blood 113, 2298–2301 (2009). 
→ pages 59[151] Yunis, J. J. et al. Multiple recurrent genomic defects in follicular lymphoma. Apossible model for cancer. New England Journal of Medicine 316, 79–84 (1987). →pages 59[152] Tilly, H. et al. Prognostic value of chromosomal abnormalities in follicularlymphoma. Blood 84, 1043–1049 (1994). → pages[153] Höglund, M. et al. Identification of cytogenetic subgroups and karyotypic pathwaysof clonal evolution in follicular lymphomas. Genes, chromosomes & cancer 39,195–204 (2004). → pages[154] O’Shea, D. et al. The presence of TP53 mutation at diagnosis of follicularlymphoma identifies a high-risk group of patients with shortened time to diseaseprogression and poorer overall survival. Blood 112, 3126–3129 (2008). → pages[155] d’Amore, F. et al. Clonal evolution in t(14;18)-positive follicular lymphoma, evidencefor multiple common pathways, and frequent parallel clonal evolution. Clinicalcancer research : an official journal of the American Association for CancerResearch 14, 7180–7187 (2008). → pages[156] Cheung, K.-J. J. et al. Genome-wide profiling of follicular lymphoma by arraycomparative genomic hybridization reveals prognostically significant DNA copynumber imbalances. Blood 113, 137–148 (2009). → pages155[157] Alhejaily, A., Day, A. G., Feilotter, H. E., Baetz, T. & Lebrun, D. P. Inactivation of theCDKN2A tumor-suppressor gene by deletion or methylation is common atdiagnosis in follicular lymphoma and associated with poor clinical outcome. Clinicalcancer research : an official journal of the American Association for CancerResearch 20, 1676–1686 (2014). → pages 59[158] Scott, D. W. et al. Determining cell-of-origin subtypes of diffuse large B-celllymphoma using gene expression in formalin-fixed paraffin-embedded tissue. Blood123, 1214–7 (2014). → pages 65[159] Kridel, R. et al. Cell of origin of transformed follicular lymphoma. Blood 126,2118–2128 (2015). → pages 65[160] Li, H. & Durbin, R. 
Fast and accurate short read alignment with Burrows-Wheelertransform. Bioinformatics 25, 1754–1760 (2009). → pages 65[161] Costello, M. et al. Discovery and characterization of artifactual mutations in deepcoverage targeted capture sequencing data due to oxidative DNA damage duringsample preparation. Nucleic Acids Res 41, e67 (2013). → pages 66[162] Ding, J. J. et al. Feature-based classifiers for somatic mutation detection intumour-normal paired sequencing data. Bioinformatics 28, 167–175 (2012). →pages 66, 84[163] Saunders, C. T. et al. Strelka: accurate somatic small-variant calling fromsequenced tumor-normal sample pairs. Bioinformatics 28, 1811–1817 (2012). →pages 66[164] Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer.Nature Methods 11, 396–398 (2014). → pages 68[165] Ha, G. et al. TITAN: inference of copy number architectures in clonal cellpopulations from tumor whole-genome sequence data. Genome Res 24,1881–1893 (2014). → pages 68, 72[166] Lai, D., Ha, G. & Shah, S. Hmmcopy: Copy number prediction with correction for gcand mappability bias for hts data (2012). R package version 1.8.0. → pages 72[167] McPherson, A. W. et al. nFuse: Discovery of complex genomic rearrangements incancer using high-throughput sequencing. Genome Res (2012). → pages 76[168] Williams, M. J., Werner, B., Barnes, C. P., Graham, T. A. & Sottoriva, A.Identification of neutral tumor evolution across cancer types. Nat Genet 48,238–244 (2016). → pages 76, 77, 99[169] Ewens, W. J. Mathematical Population Genetics, vol. 27 of Interdisciplinary AppliedMathematics (Springer New York, New York, NY, 2004). → pages 77[170] Green, M. R. et al. Mutations in early follicular lymphoma progenitors areassociated with suppressed antigen presentation. Proc Natl Acad Sci U S A201501199 (2015). → pages 80156[171] Pastore, A. et al. 
Integration of gene mutations in risk prognostication for patientsreceiving first-line immunochemotherapy for follicular lymphoma: a retrospectiveanalysis of a prospective clinical trial and validation in a population-based registry.Lancet Oncol 2045, 1–12 (2015). → pages 80[172] Schmitz, R. et al. Burkitt lymphoma pathogenesis and therapeutic targets fromstructural and functional genomics. Nature 490, 116–20 (2012). → pages 80[173] Love, C. et al. The genetic landscape of mutations in Burkitt lymphoma. Nat Genet44, 1321–5 (2012). → pages[174] Richter, J. et al. Recurrent mutation of the ID3 gene in Burkitt lymphoma identifiedby integrated genome, exome and transcriptome sequencing. Nat Genet 44,1316–20 (2012). → pages 80[175] Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for newcancer-associated genes. Nature 499, 214–8 (2013). → pages 80[176] Wong, T. N. et al. Role of TP53 mutations in the origin and evolution oftherapy-related acute myeloid leukaemia. Nature (2014). → pages 86[177] Muppidi, J. R. et al. Loss of signalling via Gα13 in germinal centre B-cell-derivedlymphoma. Nature 516, 254–258 (2014). → pages 102[178] Lohri, A. et al. Outcome of treatment of first relapse of Hodgkin’s disease afterprimary chemotherapy: identification of risk factors from the British Columbiaexperience 1970 to 1988. Blood 77, 2292–2298 (1991). → pages 110[179] Brice, P. et al. Analysis of prognostic factors after the first relapse of Hodgkin’sdisease in 187 patients. Cancer 78, 1293–1299 (1996). → pages[180] Josting, A. et al. Prognostic factors and treatment outcome in primary progressiveHodgkin lymphoma: a report from the German Hodgkin Lymphoma Study Group.Blood 96, 1280–1286 (2000). → pages[181] Josting, A. et al. New prognostic score based on treatment outcome of patientswith relapsed Hodgkin’s lymphoma registered in the database of the GermanHodgkin’s lymphoma study group. J Clin Oncol 20, 221–230 (2002). → pages[182] Bierman, P. J. et al. 
The International Prognostic Factors Project score foradvanced Hodgkin’s disease is useful for predicting outcome of autologoushematopoietic stem cell transplantation. Annals of oncology : official journal of theEuropean Society for Medical Oncology / ESMO 13, 1370–1377 (2002). → pages[183] Perz, J. B. et al. LACE-conditioned autologous stem cell transplantation forrelapsed or refractory Hodgkin’s lymphoma: treatment outcome and risk factoranalysis in 67 patients from a single centre. Bone marrow transplantation 39,41–47 (2007). → pages 110[184] Koreishi, A. F. et al. The role of cytotoxic and regulatory T cells inrelapsed/refractory Hodgkin lymphoma. Applied immunohistochemistry &molecular morphology : AIMM / official publication of the Society for AppliedImmunohistochemistry 18, 206–211 (2010). → pages 139157[185] Casulo, C. et al. Tumor associated macrophages in relapsed and refractoryHodgkin lymphoma. Leukemia research 37, 1178–1183 (2013). → pages 139[186] Gerrie, A. S. et al. Chemoresistance can be overcome with high-dosechemotherapy and autologous stem-cell transplantation for relapsed and refractoryHodgkin lymphoma. Annals of oncology : official journal of the European Societyfor Medical Oncology / ESMO 25, 2218–2223 (2014). → pages 110, 111[187] Moskowitz, C. H. et al. Brentuximab vedotin as consolidation therapy afterautologous stem-cell transplantation in patients with Hodgkin’s lymphoma at risk ofrelapse or progression (AETHERA): a randomised, double-blind,placebo-controlled, phase 3 trial. Lancet 385, 1853–1862 (2015). → pages 110,139[188] Alvaro, T. et al. Outcome in Hodgkin’s lymphoma can be predicted from thepresence of accompanying cytotoxic and regulatory T cells. Clinical CancerResearch 11, 1467–1473 (2005). → pages 110[189] Kelley, T. W., Pohlman, B., Elson, P. & Hsi, E. D. The ratio of FOXP3+ regulatory Tcells to granzyme B+ cytotoxic T/NK cells predicts prognosis in classical Hodgkinlymphoma and is independent of bcl-2 and MAL expression. 
American Journal ofClinical Pathology 128, 958–965 (2007). → pages[190] Sanchez-Espiridion, B. et al. A TaqMan low-density array to predict outcome inadvanced Hodgkin’s lymphoma using paraffin-embedded samples. Clinical CancerResearch 15, 1367–1375 (2009). → pages 110, 114[191] Bollard, C. M. et al. Cytotoxic T lymphocyte therapy for Epstein-Barr Virus+Hodgkin’s disease. Journal of Experimental Medicine 200, 1623–1633 (2004). →pages 110[192] Bollard, C. M. et al. Complete responses of relapsed lymphoma following geneticmodification of tumor-antigen presenting cells and T-lymphocyte transfer. Blood110, 2838–2845 (2007). → pages[193] Zhou, J. et al. CTLA-4 blockade following relapse of malignancy after allogeneicstem cell transplantation is associated with T cell activation but not with increasedlevels of T regulatory cells. Biology of Blood and Marrow Transplantation 17,682–692 (2011). → pages[194] Younes, A. et al. Phase 2 study of rituximab plus ABVD in patients with newlydiagnosed classical Hodgkin lymphoma. Blood 119, 4123–4128 (2012). → pages[195] Kasamon, Y. L. et al. Phase 2 study of rituximab-ABVD in classical Hodgkinlymphoma. Blood 119, 4129–4132 (2012). → pages 110[196] Steidl, C. et al. Gene expression profiling of microdissected HodgkinReed-Sternberg cells correlates with treatment outcome in classical Hodgkinlymphoma. Blood 120, 3530–3540 (2012). → pages 114[197] Geiss, G. K. et al. Direct multiplexed measurement of gene expression withcolor-coded probe pairs. Nat Biotechnol 26, 317–325 (2008). → pages 139158[198] Reis, P. P. et al. mRNA transcript quantification in archival samples usingmultiplexed, color-coded probes. BMC biotechnology 11, 46 (2011). → pages 139[199] Newman, A. M. et al. Robust enumeration of cell subsets from tissue expressionprofiles. Nature Methods 12, 453–457 (2015). 
Appendix A

Supporting Materials

A.1 Tables

A.1.1 Chapter 2

Table A.1: Frequently Deleted Regions Predicted by GISTIC.

Cytoband  Wide Peak Limits           Q-value  Number of Deletions (Frequency)
1p36.32   chr1:1-249250621           <0.001   23 (14.6%)
1p36.31   chr1:5921586-6475685       <0.001   30 (19.1%)
2q22.3    chr2:107000207-243199373   0.018    10 (6.4%)
3p26.1    chr3:4356986-4943651       0.008    6 (3.8%)
3p21.31   chr3:48649407-50598050     <0.001   14 (8.9%)
3p14.2    chr3:59031064-61549567     <0.001   9 (5.7%)
4p16.3    chr4:1-39458764            0.018    19 (12.1%)
4q13.3    chr4:70516672-72608791     <0.001   22 (14%)
4q35.1    chr4:164428046-191154276   0.004    15 (9.6%)
5p14.2    chr5:1-180915260           0.066    6 (3.8%)
5q22.2    chr5:111305879-113703272   0.007    7 (4.5%)
6p21.33   chr6:31143901-31432860     <0.001   15 (9.6%)
6p21.32   chr6:32412593-32781432     <0.001   16 (10.2%)
6q15      chr6:87799886-89322324     <0.001   39 (24.8%)
6q21      chr6:105845558-107352244   <0.001   43 (27.4%)
6q23.3    chr6:134823920-138411442   <0.001   47 (29.9%)
6q25.2    chr6:153449213-157115790   <0.001   35 (22.3%)
7q21.13   chr7:38269293-159138663    0.018    3 (1.9%)
8p23.1    chr8:4847250-6728387       <0.001   18 (11.5%)
8p22      chr8:15623718-16850497     <0.001   18 (11.5%)
8p21.3    chr8:1-146364022           <0.001   16 (10.2%)
8q12.1    chr8:59571288-61104746     <0.001   18 (11.5%)
8q23.3    chr8:110984073-116425859   0.008    6 (3.8%)
8q24.21   chr8:129161454-132919615   0.018    8 (5.1%)
9p21.3    chr9:21864553-22000769     <0.001   35 (22.3%)
10p14     chr10:1-135534747          0.066    9 (5.7%)
10q23.31  chr10:89309387-91065948    <0.001   19 (12.1%)
12q24.31  chr12:121842721-122287289  0.066    10 (6.4%)
13q14.2   chr13:50506234-51484444    <0.001   23 (14.6%)
14q13.2   chr14:33297556-39504995    0.066    11 (7%)
14q32.32  chr14:102808656-104100559  0.001    17 (10.8%)
15q15.3   chr15:43617494-45036916    <0.001   31 (19.7%)
16q12.2   chr16:33169941-90354753    0.066    9 (5.7%)
17p13.3   chr17:1-1965541            <0.001   25 (15.9%)
17p13.1   chr17:6683879-8068044      <0.001   28 (17.8%)
18q22.2   chr18:60643784-78077248    0.001    11 (7%)
19p13.3   chr19:6535499-6895865      <0.001   16 (10.2%)

Table A.2: Frequently Gained Regions Predicted by GISTIC.

Cytoband  Wide Peak Limits           Q-value  Number of Amplifications (Frequency)
1q21.1    chr1:121289003-145372142   <0.001   32 (20.4%)
1q22      chr1:153145745-157488486   <0.001   34 (21.7%)
1q24.2    chr1:161933049-173545263   <0.001   36 (22.9%)
1q25.3    chr1:178626021-184386070   <0.001   33 (21%)
1q31.1    chr1:189887716-191410504   <0.001   31 (19.7%)
1q31.3    chr1:195108489-196061710   <0.001   29 (18.5%)
1q32.1    chr1:200542195-205015283   <0.001   34 (21.7%)
1q41      chr1:215697086-219633868   <0.001   31 (19.7%)
1q42.12   chr1:220214812-236901817   0.001    29 (18.5%)
1q44      chr1:238299630-249250621   0.003    28 (17.8%)
2p16.1    chr2:60562715-60903680     <0.001   36 (22.9%)
6p25.2    chr6:1-53515129            0.02     25 (15.9%)
8q24.22   chr8:113640937-138863137   0.003    27 (17.2%)
8q24.3    chr8:138547530-146364022   <0.001   28 (17.8%)
11q23.1   chr11:112261086-112654973  0.001    34 (21.7%)
11q25     chr11:124293199-135006516  0.001    36 (22.9%)
12q14.3   chr12:51595506-80437332    <0.001   40 (25.5%)
13q31.3   chr13:86209869-115169878   0.02     25 (15.9%)
18q21.32  chr18:54632503-57045590    0.003    45 (28.7%)
18q22.3   chr18:68782554-78077248    0.008    39 (24.8%)
19q13.41  chr19:50080251-59128983    0.001    24 (15.3%)

Table A.3: Distribution of Normal/Tumour Content. Normal content was predicted by OncoSNP while tumour content was assessed by hematoxylin and eosin staining.

Sample ID  Normal Content  Tumour Content
00-26427   0.4             NA
01-19969   0.3             NA
01-26405   0.7             80
01-26579   0.9             80
01-28389   0.7             70
02-25216   0.1             NA
02-28783   0.1             NA
03-10363   0.1             NA
03-13123   0.2             NA
03-20512   0.3             NA
03-23269   0.8             95
03-23488   0               NA
03-24011   0.6             95
03-26969   0.1             NA
03-27812   0.9             50
03-28045   0.7             95
03-30438   0.8             80
03-32079   0.8             85
03-33888   0.1             NA
03-34112   0.1             NA
03-34379   0.3             NA
04-10134   0.1             NA
04-25810   0.9             40
04-39156   0.1             NA
04-42539   0.1             NA
05-13030   0.2             NA
05-13334   0.6             70
05-17793   0.9             85
05-17909   0.1             NA
05-24401   0.1             NA
05-24666   0.1             NA
05-30192   0.9             60
05-30349   0.7             95
05-33954   0.6             70
06-14742   0.9             50
06-17315   0.9             10
06-20937   0.8             20
06-22615   0.8             90
06-24925   0.3             NA
06-25778   0.4             NA
06-26378   0.2             NA
06-26792   0.1             NA
06-27905   0.2             NA
06-28798   0.1             NA
04-28140   0.9             20
04-34824   0.3             NA
04-41235   0.1             NA
07-23289   0.9             NA
83-15272   0.1             NA
01-27308   0.8             40
81-51938   0.9             NA
92-67950   0.1             NA
03-11592   0.8             40
95-11015   0.1             NA
99-22226   0.1             NA
03-31713   0.1             NA
04-29264   0.8             90
07-35482   0.1             NA
07-31833   0.3             NA
99-25549   0.1             NA
05-19287   0               NA
08-21175   0.1             NA
02-22991   0.4             NA
07-37968   0.4             NA
08-15460   0.4             NA
01-18667   0.2             NA
02-22023   0.9             70
92-56188   0               NA
81-52884   0.2             NA
05-25674   0.1             NA
02-13818   0               NA
02-20170   0.1             NA
06-23057   0               NA
99-27137   0.8             >50
02-24725   0               NA
09-33003   0.6             65
09-12737   0.1             NA
85-63855   0.1             NA
82-57570   0.4             NA
SPEC-1203  0.4             NA
SPEC-1187  0.6             >50
SPEC-1120  0.1             NA
SPEC-1185  0.8             >50
09-41082   0.1             NA
00-15694   0.3             NA
07-25012   0.1             NA
06-33777   0.9             75
04-20644   0.7             80
08-25894   0.1             NA
06-15256   0.5             50
08-11596   0.1             NA
06-25674   0.3             NA
04-36422   0.9             >50
02-30647   0.2             NA
05-20543   0.2             NA
97-14402   0.1             NA
05-11328   0.2             NA
05-32947   0.1             NA
00-12223   0.6             60
02-30519   0.1             NA
03-20981   0.9             10
04-23426   0.1             NA
98-22532   0.1             NA
05-12939   0.1             NA
05-24395   0.1             NA
05-24904   0.4             NA
06-10398   0.1             NA
06-11535   0.2             NA
06-15922   0.8             85
06-18547   0.7             90
06-19919   0.6             90
05-23110   0.2             NA
05-25439   0.1             NA
94-26795   0.4             NA
05-28122   0.4             NA
04-11156   0.2             NA
05-24561   0.8             >50
04-39108   0.9             50
04-29995   0.8             70
03-34969   0.8             70
05-25969   0.9             70
06-16316   0.2             NA
06-14634   0.7             >50
06-16716   0.2             NA
06-22057   0.1             NA
96-20883   0.3             NA
06-23907   0.1             NA
05-26084   0.6             70
06-24915   0.1             NA
06-25470   0.6             80
06-23792   0.6             80
06-27347   0.1             NA
06-28900   0.9             30
06-30025   0.1             NA
06-30145   0.5             >50
06-31353   0.9             20
06-34043   0.9             >50
07-17613   0.1             NA
03-33266   0.1             NA
07-16623   0.9             60
07-20634   0.8             70
07-23804   0.6             60
95-32814   0.4             NA
07-30628   0.6             75
07-34014   0.9             80
07-28351   0.1             NA
06-24881   0.1             NA
02-16987   0.5             80

Table A.4: List of Genes in the RCOR1 Loss-associated Gene Expression Signature (n = 233). A negative Spearman rank rho and log2 fold change > 0 means up-regulated in the event of RCOR1 loss.
Vice versa for a positive Spearman rank rho and log2 fold change < 0.

Gene Name     Spearman Correlation Rho  Log2 Fold Change
NDUFAB1       -0.602                    0.308
MLL5          0.626                     -0.518
SS18L2        -0.503                    0.384
PIAS1         0.506                     -0.375
H6PD          0.559                     -0.454
MRTO4         -0.532                    0.685
ATP2B4        0.468                     -0.594
CDK17         0.527                     -0.424
MPC1          -0.491                    0.363
ISOC2         -0.491                    0.656
KDM4A         0.544                     -0.7
UFD1L         -0.456                    0.322
EIF2B3        -0.48                     0.467
C19orf10      -0.592                    0.505
BAZ2A         0.523                     -0.369
ANKRD13A      0.629                     -0.317
TOP2B         0.581                     -0.389
GNB1          0.512                     -0.311
VDAC3         -0.508                    0.439
SMARCA2       0.441                     -0.375
DUSP12        -0.498                    0.306
NFE2L1        0.49                      -0.657
SMAP2         0.532                     -0.63
DYNLL1        -0.494                    0.484
KDM2B         0.536                     -0.398
RCOR1         1                         -0.94
GLG1          0.528                     -0.358
HNRNPC        -0.521                    0.39
ABLIM1        0.56                      -0.368
TOMM22        -0.534                    0.363
KHNYN         0.51                      -0.38
SOS2          0.611                     -0.545
PSMC6         -0.534                    0.354
PSMA3         -0.505                    0.533
PSMB5         -0.624                    0.356
PSME2         -0.547                    0.433
TBL1X         0.622                     -0.52
ELF4          0.645                     -0.389
NUTF2         -0.542                    0.374
MON1B         0.56                      -0.321
SPG11         0.527                     -0.394
TSTA3         -0.565                    0.402
ARHGEF18      0.513                     -0.41
TIMM50        -0.484                    0.346
TMEM147       -0.557                    0.406
ZKSCAN1       0.502                     -0.461
AKNA          0.519                     -0.574
RGP1          0.532                     -0.426
BAG1          -0.51                     0.726
UBE2R2        0.736                     -0.431
TNKS2         0.457                     -0.338
MED31         -0.481                    0.406
PRKAR1A       0.462                     -0.319
CLCN3         0.618                     -0.476
CBL           0.702                     -0.308
COMMD9        -0.58                     0.594
POU2AF1       0.461                     -0.361
MAGOHB        -0.557                    0.382
SUDS3         0.549                     -0.378
HECA          0.498                     -0.436
PAPD7         0.491                     -0.357
BRIX1         -0.462                    0.366
SELK          -0.612                    0.323
BCL6          0.585                     -0.563
POLE4         -0.454                    0.358
HSPE1         -0.543                    0.771
RAB3GAP1      0.654                     -0.446
PNO1          -0.443                    0.471
PLEK          0.443                     -1.505
EBNA1BP2      -0.572                    0.57
FASTKD2       -0.436                    0.589
CNTRL         0.628                     -0.596
SLIRP         -0.488                    0.493
CRIPT         -0.54                     0.342
KIAA0922      0.678                     -0.412
ZCCHC17       -0.552                    0.314
GTF3A         -0.573                    0.37
SRGN          -0.482                    0.309
PRDX4         -0.569                    0.645
NDUFAF4       -0.486                    0.315
EMC3          -0.56                     0.492
MRPS7         -0.532                    0.408
SNRPB2        -0.493                    0.367
KDM5C         0.482                     -0.308
KLHDC10       0.457                     -0.433
MRPS12        -0.58                     0.546
TXNDC17       -0.558                    0.484
LSM7          -0.521                    0.378
PRRC2B        0.648                     -0.338
SESN2         0.47                      -2.076
COX7B         -0.522                    0.499
NUP210        0.489                     -0.397
KDM6B         0.452                     -1.089
GPS2          -0.442                    0.48
SERINC3       0.579                     -0.516
POMP          -0.575                    0.316
PIK3C2B       0.498                     -0.668
MKRN1         0.489                     -0.472
EDEM1         0.585                     -0.729
TIMM10        -0.547                    0.317
ETS1          0.483                     -0.364
ISCA1         -0.44                     0.494
TSPAN31       -0.533                    0.354
DCAF7         0.699                     -0.348
EPRS          0.499                     -0.452
ANP32B        0.561                     -0.327
MRPL15        -0.54                     0.311
SULF1         0.445                     -0.658
NDUFAF1       -0.477                    0.484
GTF2B         -0.571                    0.507
SLC7A1        0.474                     -0.897
SRP14         -0.469                    0.302
COPS3         -0.526                    0.312
PFKL          0.492                     -0.343
RERE          0.464                     -0.417
ZNF593        -0.601                    0.356
WDTC1         0.545                     -0.737
SDHC          -0.473                    0.441
JTB           -0.482                    0.307
GATAD2B       0.639                     -0.488
ITPKB         0.598                     -0.36
SNRPG         -0.574                    0.474
TPRKB         -0.438                    0.408
CISD2         -0.471                    0.311
RNF44         0.438                     -0.419
PPP1R18       0.448                     -0.32
DOCK11        0.531                     -0.393
NTMT1         -0.438                    0.394
LAMTOR1       -0.563                    0.306
TIMM8B        -0.626                    0.461
GEMIN6        -0.557                    0.743
ATP5J         -0.459                    0.367
HK1           0.469                     -0.359
ZFAND3        0.576                     -0.44
SEC13         -0.545                    0.481
WASF2         0.494                     -0.349
ATP5G1        -0.531                    0.48
CCDC58        -0.544                    0.426
ZER1          0.452                     -0.385
MED27         -0.455                    0.304
DVL3          0.491                     -0.388
IKZF3         0.474                     -0.409
POLR3K        -0.552                    0.376
PYGO2         0.509                     -0.445
YEATS2        0.554                     -0.502
UBXN7         0.453                     -0.652
LSM6          -0.532                    0.522
EBF1          0.49                      -0.368
FABP5         -0.557                    0.505
PSIP1         0.549                     -0.46
ATP5C1        -0.502                    0.358
FUNDC2        -0.484                    0.366
ARIH1         0.555                     -0.334
PPIB          -0.473                    0.705
VPS39         0.491                     -0.341
MBD6          0.443                     -0.956
PHB           -0.51                     0.603
TRUB2         -0.556                    0.648
MIDN          0.537                     -0.387
ZNF146        0.521                     -0.332
TRANK1        0.58                      -0.335
DTYMK         -0.445                    0.395
NDUFS5        -0.537                    0.368
MRPL1         -0.53                     0.7
FAM103A1      -0.515                    0.448
STRA13        -0.549                    0.483
DCXR          -0.447                    0.35
SF3B5         -0.555                    0.426
POLH          0.466                     -0.435
AKAP13        0.646                     -0.454
TRIAP1        -0.52                     0.328
LSM3          -0.435                    0.481
POLR1C        -0.556                    0.406
ZNF318        0.677                     -0.436
CLSTN1        0.633                     -0.324
FAM195A       -0.497                    0.407
MYEOV2        -0.554                    0.52
RASGRP1       0.444                     -0.379
KLHL6         0.509                     -1.192
TNKS          0.598                     -0.533
MOB1B         0.578                     -0.528
STAT5B        0.507                     -0.396
USMG5         -0.58                     0.428
TRMT10C       -0.495                    0.439
MRPL11        -0.596                    0.303
TOMM5         -0.492                    0.312
UNC119B       0.553                     -0.359
TIMM22        -0.471                    0.346
POLR2L        -0.494                    0.693
IMP3          -0.611                    0.353
ZHX2          0.542                     -0.494
EXOSC4        -0.447                    0.303
GADD45GIP1    -0.576                    0.376
DCTPP1        -0.559                    0.631
ZNF609        0.542                     -0.491
MRPS11        -0.562                    0.325
SEPT9         0.467                     -0.484
NDUFA12       -0.537                    0.351
GTF2H5        -0.497                    0.353
PSMD13        -0.717                    0.35
ZFP36L1       0.468                     -0.557
UBE2H         0.612                     -0.72
RPS19BP1      -0.453                    0.351
FNBP1         0.481                     -0.354
NBR1          0.586                     -0.342
NDUFA4        -0.528                    0.413
PAX5          0.61                      -0.425
STK40         0.536                     -0.457
YRDC          -0.509                    0.47
PLXNB2        0.496                     -0.426
MAFG          0.448                     -0.346
MRPL21        -0.611                    0.368
RPS26         -0.554                    0.823
UBL5          -0.537                    0.449
ASNA1         -0.564                    0.313
GPN1          -0.508                    0.364
ZNF511        -0.632                    0.305
STK39         0.441                     -0.314
MSRB1         -0.656                    0.575
PNP           -0.506                    0.548
DENND4B       0.477                     -0.406
SREBF2        0.595                     -0.631
CENPW         -0.528                    0.311
TMEM57        0.527                     -0.301
PPP1R10       0.621                     -0.523
PSMB10        -0.534                    0.54
TMX2          -0.477                    0.32
MSMP          0.465                     -0.499
RP11-211G3.3  0.593                     -0.435
GPX1          -0.6                      0.441
MRPL20        -0.451                    0.393
S1PR2         0.554                     -0.316
AL022328.1    0.531                     -0.401

Table A.5: Table of Up/Down-regulated Enriched Biological Processes (FDR < 0.05) that are Associated with the RCOR1 Loss-associated Gene Signature (n = 233). Letters in parentheses represent the source pathway database: (B), NCI-PID BioCarta; (C), cancer cell map; (K), KEGG; (N), NCI-PID curated pathways; (P), PANTHER; (R), Reactome.

Biological Processes                                                       Selected Genes                          Direction
Ribosome (K)                                                               RPS26, MRPL15, MRPL11, MRPS12, MRPL1    up
Oxidative phosphorylation (K)                                              NDUFAB1, NDUFS5, ATP5C1, NDUFA4, ATP5J  up
M/G1 Transition (R)                                                        PSMA3, PSME2, PSMB10, PSMC6, PSMB5      up
APC/C-mediated degradation of cell cycle proteins (R)                      PSMB5, PSMD13, PSMC6, PSMA3, PSME2      up
Spliceosome (K)                                                            SF3B5, SNRPB2, SNRPG, LSM6, LSM7        up
The citric acid (TCA) cycle and respiratory electron transport (R)         ATP5C1, NDUFAB1, ATP5G1, ATP5J, NDUFS5  up
Regulation of DNA replication (R)                                          PSMA3, PSMB10, PSME2, PSMB5, PSMC6      up
Degradation of beta-catenin by the destruction complex (R)                 PSMB10, PSME2, PSMB5, PSMD13, PSMA3     up
Regulation of Apoptosis (R)                                                PSMB10, PSMD13, PSMB5, PSMA3, PSME2     up
Synthesis of DNA (R)                                                       PSME2, PSMD13, PSMB5, PSMC6, PSMA3      up
Mitochondrial Protein Import (R)                                           TOMM5, ATP5G1, TIMM10, TOMM22, TIMM22   up
Regulation of mRNA Stability by Proteins that Bind AU-rich Elements (R)    PSMB5, EXOSC4, PSMD13, PSMC6, PSME2     up
S Phase (R)                                                                PSMB5, PSME2, PSMB10, PSMD13, PSMC6     up
Cell Cycle Checkpoints (R)                                                 PSMB10, PSMA3, PSMB5, PSMC6, PSME2      up
Mitotic G1-G1/S phases (R)                                                 PSMA3, PSMB5, PSMD13, PSME2, PSMB10     up
beta-catenin independent WNT signaling (R)                                 PSMD13, PSME2, PSMA3, PSMC6, PSMB5      up
Metabolism of amino acids and derivatives (R)                              PSMC6, PSMB10, PSMA3, PSMB5, PSME2      up
Proteasome (K)                                                             PSMB10, PSMC6, PSMA3, PSMB5, PSME2      up
Deadenylation-dependent mRNA decay (R)                                     LSM6, LSM7, LSM3, EXOSC4                up
Mitotic Metaphase and Anaphase (R)                                         PSMB10, PSMD13, PSMC6, PSMB5, PSME2     up
IL4-mediated signaling events (N)                                          STAT5B, BCL6, CBL, ETS1                 down
Epstein-Barr virus infection (K)  PSMC6, PSMD13, POLR1C, GTF2B, POLR3K  up
RNA degradation (K)  LSM6, LSM7, LSM3, EXOSC4  up
Processing of Capped Intron-Containing Pre-mRNA (R)  HNRNPC, SNRPG, SNRPB2, SF3B5, POLR2L  up
RNA polymerase (K)  POLR2L, POLR3K, POLR1C  up
Class I MHC mediated antigen processing & presentation (R)  PSMA3, PSMD13, PSMC6, PSMB10, PSMB5  up
Signaling events mediated by HDAC Class II (N)  NUP210, BCL6, GNB1  down
il-2 receptor beta chain in t cell activation (B)  STAT5B, IKZF3, CBL  down
Pyrimidine metabolism (K)  POLR2L, DTYMK, POLR3K, POLR1C  up
Ubiquitin mediated proteolysis (K)  UBE2R2, CBL, UBE2H, PIAS1  down
RNA Polymerase I, RNA Polymerase III, and Mitochondrial Transcription (R)  GTF3A, POLR2L, POLR3K, POLR1C  up
EGF receptor signaling pathway (P)  STAT5B, SOS2, CBL  down
C-MYB transcription factor network (N)  PAX5, ETS1, SMARCA2  down
Transcriptional Regulation of White Adipocyte Differentiation (R)  SREBF2, EBF1, TBL1X  down
Pathways in cancer (K)  ETS1, SOS2, STAT5B, CBL, DVL3  down
Jak-STAT signaling pathway (K)  STAT5B, SOS2, CBL, PIAS1  down
ErbB signaling pathway (K)  STAT5B, SOS2, CBL  down
Cytosolic DNA-sensing pathway (K)  POLR2L, POLR3K, POLR1C  up
Ras signaling pathway (K)  RASGRP1, SOS2, GNB1, ETS1  down
Signaling by the B Cell Receptor (BCR) (R)  PSME2, PSMC6, PSMD13, PSMA3, PSMB5  up
Cytosolic sensors of pathogen-associated DNA (R)  POLR2L, POLR3K, POLR1C  up
T cell receptor signaling pathway (K)  RASGRP1, SOS2, CBL  down
Wnt signaling pathway (P)  GNB1, PYGO2, TBL1X, SMARCA2  down

A.1.2 Chapter 4

Table A.6: RHL30 vs. Reported Prognostic Markers for Post-BMT-OS.

Gene Name  Accession  Gene Type
A2M  NM_000014.4  Endogenous
ABCA1  NM_005502.2  Endogenous
ABCA10  NM_080282.3  Endogenous
ABCA12  NM_015657.3  Endogenous
ABCA13  NM_152701.3  Endogenous
ABCA2  NM_001606.4  Endogenous
ABCA3  NM_001089.2  Endogenous
ABCA4  NM_000350.2  Endogenous
ABCA5  NM_172232.2  Endogenous
ABCA6  NM_080284.2  Endogenous
ABCA7  NM_019112.3  Endogenous
ABCA8  NM_007168.2  Endogenous
ABCA9  NM_080283.3  Endogenous
ABCB1  NM_000927.3  Endogenous
ABCB10  NM_012089.2  Endogenous
ABCB11  NM_003742.2  Endogenous
ABCB2  NM_000593.5  Endogenous
ABCB3  NM_000544.3  Endogenous
ABCB4  NM_018849.2  Endogenous
ABCB5  NM_001163941.1  Endogenous
ABCB6  NM_005689.2  Endogenous
ABCB7  NM_004299.3  Endogenous
ABCB8  NM_007188.3  Endogenous
ABCB9  NM_019624.3  Endogenous
ABCC1  NM_004996.3  Endogenous
ABCC10  NM_001198934.1  Endogenous
ABCC11  NM_032583.3  Endogenous
ABCC12  NM_033226.2  Endogenous
ABCC13  NR_003088.1  Endogenous
ABCC2  NM_000392.3  Endogenous
ABCC3  NM_001144070.1  Endogenous
ABCC4  NM_005845.3  Endogenous
ABCC5  NM_001023587.1  Endogenous
ABCC6  NM_001171.5  Endogenous
ABCC7  NM_000492.3  Endogenous
ABCC8  NM_000352.3  Endogenous
ABCC9  NM_020298.2  Endogenous
ABCD1  NM_000033.3  Endogenous
ABCD2  NM_005164.3  Endogenous
ABCD3  NM_001122674.1  Endogenous
ABCD4  NM_005050.3  Endogenous
ABCE1  NM_001040876.1  Endogenous
ABCF1  NM_001090.2  Endogenous
ABCF2  NM_007189.1  Endogenous
ABCF3  NM_018358.2  Endogenous
ABCG1  NM_207174.1  Endogenous
ABCG2  NM_004827.2  Endogenous
ABCG4  NM_022169.4  Endogenous
ABCG5  NM_022436.2  Endogenous
ABCG8  NM_022437.2  Endogenous
ABI3BP  NM_015429.3  Endogenous
ACP2  NM_001131064.1  Endogenous
ACSL1  NM_001995.2  Endogenous
ACTB  NM_001101.2  Housekeeping
ACYP2  NM_138448.3  Endogenous
ADCY1  NM_021116.2  Endogenous
ADH1B  NM_000668.4  Endogenous
ADPRM  NM_020233.4  Endogenous
ADRB2  NM_000024.3  Endogenous
AHR  NM_001621.3  Endogenous
AK8  NM_152572.2  Endogenous
ALAS1  NM_000688.4  Housekeeping
ALDH1A1  NM_000689.3  Endogenous
ALDOA  NM_184041.2  Endogenous
ALK1  NM_004304.3  Endogenous
ALOX5  NM_000698.2  Endogenous
ANGPTL4  NM_139314.1  Endogenous
ANKRD22  NM_144590.2  Endogenous
ANTXR1  NM_018153.3  Endogenous
APOB  NM_000384.2  Endogenous
APOD  NM_001647.3  Endogenous
APOE  NM_000041.2  Endogenous
APOL3  NR_027833.1  Endogenous
APOL6  NM_030641.3  Endogenous
APRIL  NM_003808.3  Endogenous
AQP9  NM_020980.3  Endogenous
ARG1  NM_000045.2  Endogenous
ARL3  NM_004311.2  Endogenous
ASNS  NM_183356.2  Endogenous
ATF3  NM_001674.2  Endogenous
ATG10  NM_001131028.1  Endogenous
ATG12  NM_004707.2  Endogenous
ATG14  NM_014924.3  Endogenous
ATG16  NM_017974.3  Endogenous
ATG3  NM_022488.3  Endogenous
ATG4B  NM_013325.4  Endogenous
ATG5  NM_004849.2  Endogenous
ATG7  NM_001136031.2  Endogenous
ATG9A  NM_001077198.1  Endogenous
ATM  NM_138292.3  Endogenous
ATP10D  NM_020453.3  Endogenous
ATP5L  NM_006476.4  Endogenous
ATP6AP1  NM_001183.4  Endogenous
B2M  NM_004048.2  Endogenous
B3GAT1  NM_054025.2  Endogenous
BACH2  NM_021813.2  Endogenous
BAD  NM_004322.3  Endogenous
BAFF  NM_006573.4  Endogenous
BAG1  NM_004323.3  Endogenous
BAK  NM_001188.2  Endogenous
BAX  NM_138761.2  Endogenous
BCL11A  NM_018014.2  Endogenous
BCL2  NM_000633.2  Endogenous
BCL2L1  NM_138578.1  Endogenous
BCL6  NM_138931.1  Endogenous
BCMA  NM_001192.2  Endogenous
BECN1  NM_003766.2  Endogenous
BET1  NM_005868.4  Endogenous
BID  NM_197966.1  Endogenous
BIRC4BP  NM_199139.1  Endogenous
BLK  NM_001715.2  Endogenous
BLNK  NM_013314.2  Endogenous
BOLA3  NM_001035505.1  Endogenous
BSG  NM_198590.1  Endogenous
BTLA  NM_181780.2  Endogenous
BUD31  NM_003910.3  Endogenous
C10orf128  NM_001010863.1  Endogenous
C12orf45  NM_152318.2  Endogenous
C16orf75  NM_152308.1  Endogenous
C1orf54  NM_024579.3  Endogenous
C1QA  NM_015991.2  Endogenous
C1QB  NM_000491.3  Endogenous
C1QC  NM_001114101.1  Endogenous
C22orf32  NM_033318.4  Endogenous
C3  NM_000064.2  Endogenous
C3AR1  NM_004054.2  Endogenous
C4orf52  NM_001145432.1  Endogenous
C7  NM_000587.2  Endogenous
CAPNS1  NM_001749.2  Endogenous
CARM1  NM_199141.1  Endogenous
CASP10  NM_032977.3  Endogenous
CASP14  NM_012114.1  Endogenous
CASP3  NM_004346.3  Endogenous
CASP8  NM_001228.4  Endogenous
CCDC167  NM_138493.2  Endogenous
CCL13  NM_005408.2  Endogenous
CCL14  NM_032962.4  Endogenous
CCL17  NM_002987.2  Endogenous
CCL18  NM_002988.2  Endogenous
CCL19  NM_006274.2  Endogenous
CCL2  NM_002982.3  Endogenous
CCL20  NM_004591.1  Endogenous
CCL21  NM_002989.2  Endogenous
CCL22  NM_002990.3  Endogenous
CCL23  NM_145898.1  Endogenous
CCL28  NM_148672.2  Endogenous
CCL3  NM_002983.2  Endogenous
CCL4  NM_002984.2  Endogenous
CCL5  NM_002985.2  Endogenous
CCL7  NM_006273.2  Endogenous
CCNA2  NM_001237.2  Endogenous
CCND1  NM_053056.2  Endogenous
CCND2  NM_001759.2  Endogenous
CCNE2  NM_057735.1  Endogenous
CCR1  NM_001295.2  Endogenous
CCR2  NM_001123041.2  Endogenous
CCR3  NM_001837.2  Endogenous
CCR4  NM_005508.4  Endogenous
CCR5  NM_000579.1  Endogenous
CCR7  NM_001838.2  Endogenous
CD14  NM_000591.2  Endogenous
CD160  NM_007053.2  Endogenous
CD163  NM_004244.4  Endogenous
CD177  NM_020406.2  Endogenous
CD19  NM_001770.4  Endogenous
CD1A  NM_001763.2  Endogenous
CD1C  NM_001765.2  Endogenous
CD2  NM_001767.2  Endogenous
CD200  NM_005944.5  Endogenous
CD200R1  NM_138939.2  Endogenous
CD22  NM_001771.2  Endogenous
CD244  NM_016382.2  Endogenous
CD27  NM_001242.4  Endogenous
CD274  NM_014143.2  Endogenous
CD276  NM_001024736.1  Endogenous
CD28  NM_001243078.1  Endogenous
CD300A  NM_007261.2  Endogenous
CD300C  NM_006678.3  Endogenous
CD33  NM_001177608.1  Endogenous
CD34  NM_001025109.1  Endogenous
CD36  NM_001001548.2  Endogenous
CD38  NM_001775.2  Endogenous
CD3D  NM_000732.4  Endogenous
CD3E  NM_000733.2  Endogenous
CD3G  NM_000073.2  Endogenous
CD4  NM_000616.3  Endogenous
CD44  NM_000610.3  Endogenous
CD45RA  NM_002838.4  Endogenous
CD45RO  NM_080921.3  Endogenous
CD47  NM_001777.3  Endogenous
CD58  NM_001779.2  Endogenous
CD68  NM_001251.2  Endogenous
CD69  NM_001781.1  Endogenous
CD70  NM_001252.2  Endogenous
CD72  NM_001782.2  Endogenous
CD74  NM_001025159.1  Endogenous
CD79A  NM_001783.3  Endogenous
CD80  NM_005191.3  Endogenous
CD86  NM_006889.3  Endogenous
CD8A  NM_001768.5  Endogenous
CD8B  NM_004931.3  Endogenous
CD93  NM_012072.3  Endogenous
CD95  NM_152876.1  Endogenous
CDC2  NM_001130829.1  Endogenous
CDKN2A  NM_000077.3  Endogenous
CDKN2B  NM_004936.3  Endogenous
CEACAM8  NM_001816.3  Endogenous
CEBPB  NM_005194.2  Endogenous
CEBPD  NM_005195.3  Endogenous
CENPF  NM_016343.3  Endogenous
CFHR1  NM_002113.2  Endogenous
CFLAR  NM_001127183.1  Endogenous
CGRRF1  NM_006568.2  Endogenous
CHN1  NM_001025201.2  Endogenous
CHN2  NM_004067.2  Endogenous
CHUK  NM_001278.3  Endogenous
CKS2  NM_001827.1  Endogenous
CLC  NM_001828.4  Endogenous
CLEC2B  NM_005127.2  Endogenous
CLPS  NM_001832.2  Endogenous
CLTC  NM_004859.2  Housekeeping
CLU  NM_203339.1  Endogenous
CMBL  NM_138809.3  Endogenous
CNR1  NM_016083.3  Endogenous
COL18A1  NM_030582.3  Endogenous
COL1A2  NM_000089.3  Endogenous
COL3A1  NM_000090.3  Endogenous
COL4A1  NM_001845.4  Endogenous
COL6A1  NM_001848.2  Endogenous
COMT  NM_000754.3  Endogenous
CPA3  NM_001870.2  Endogenous
CPVL  NM_019029.2  Endogenous
CR2  NM_001006658.1  Endogenous
CRADD  NM_003805.3  Endogenous
CRCP  NM_014478.3  Endogenous
CREG1  NM_003851.2  Endogenous
CSF1  NM_000757.4  Endogenous
CSF1R  NM_005211.2  Endogenous
CSF2RB  NM_000395.2  Endogenous
CSF3R  NM_156038.2  Endogenous
CSTA  NM_005213.3  Endogenous
CTA_246H3.1  NR_029395.1  Endogenous
CTLA4  NM_005214.3  Endogenous
CTSB  NM_001908.3  Endogenous
CTSC  NM_001114173.1  Endogenous
CTSG  NM_001911.2  Endogenous
CTSL1  NM_001912.4  Endogenous
CTSS  NM_004079.3  Endogenous
CTSZ  NM_001336.3  Endogenous
CTTN  NM_005231.3  Endogenous
CX3CL1  NM_002996.3  Endogenous
CX3CR1  NM_001337.3  Endogenous
CXCL10  NM_001565.1  Endogenous
CXCL11  NM_005409.3  Endogenous
CXCL12  NM_199168.2  Endogenous
CXCL13  NM_006419.2  Endogenous
CXCL9  NM_002416.1  Endogenous
CXCR3  NM_001504.1  Endogenous
CXCR4  NM_001008540.1  Endogenous
CXCR5  NM_001716.3  Endogenous
CYCS  NM_018947.4  Endogenous
CYP27B1  NM_000785.3  Endogenous
CYP2C18  NM_000772.2  Endogenous
CYP2C9  NM_000771.3  Endogenous
CYP4Z1  NM_178134.2  Endogenous
DEFA1  NM_004084.2  Endogenous
DHRS2  NM_005794.3  Endogenous
DLEU1  NR_002605.1  Endogenous
DNAM1  NM_006566.2  Endogenous
DOCK4  NM_014705.3  Endogenous
DPH5  NM_001077395.1  Endogenous
DPP4  NM_001935.3  Endogenous
DRAM1  NM_018370.2  Endogenous
DUSP4  NM_057158.2  Endogenous
E2F7  NM_203394.2  Endogenous
EARS2  NM_133451.1  Endogenous
EBER1  HHV4_000057.1  Endogenous
EBER2  HHV4_000080.1  Endogenous
EBNA1  HHV4_000053.1  Endogenous
EBNA2  HHV4_000068.1  Endogenous
EBNA3  HHV4_000040.1  Endogenous
EFEMP1  NM_004105.3  Endogenous
EGF  NM_001963.3  Endogenous
EGFR  NM_201282.1  Endogenous
EIF3E  NM_001568.2  Endogenous
ELANE  NM_001972.2  Endogenous
ELMO3  NM_024712.3  Endogenous
EMR1  NM_001974.3  Endogenous
ENC1  NM_003633.2  Endogenous
ENO3  NM_001976.4  Endogenous
EOMES  NM_005442.2  Endogenous
EPCAM  NM_002354.1  Endogenous
ERM2  NM_013447.2  Endogenous
ETS2  NM_005239.4  Endogenous
EVI2A  NM_014210.3  Endogenous
F13A1  NM_000129.3  Endogenous
FAM129A  NM_052966.2  Endogenous
FAM58A  NM_152274.3  Endogenous
FAS  NM_000043.3  Endogenous
FASLG  NM_000639.1  Endogenous
FCER1G  NM_004106.1  Endogenous
FCER2  NM_002002.4  Endogenous
FCER2_b  NM_001207019.2  Endogenous
FCGR1A  NM_000566.3  Endogenous
FCGR3A  NM_000569.6  Endogenous
FCGR3B  NM_000570.3  Endogenous
FCGRT  NM_004107.4  Endogenous
FGL2  NM_006682.2  Endogenous
FGR  NM_005248.1  Endogenous
FKBP5  NM_001145775.1  Endogenous
FLNA  NM_001456.3  Endogenous
FLRT2  NM_013231.4  Endogenous
FLT1  NM_002019.2  Endogenous
FMNL2  NM_052905.3  Endogenous
FN1  NM_212482.1  Endogenous
FOS  NM_005252.2  Endogenous
FOXO1  NM_002015.3  Endogenous
FOXP3  NM_014009.3  Endogenous
FPR1  NM_002029.3  Endogenous
FPR3  NM_002030.3  Endogenous
FUT4  NM_002033.2  Endogenous
FUZ  NM_025129.3  Endogenous
G6PD  NM_000402.2  Housekeeping
GALNAC4S_6ST  NM_001270765.1  Endogenous
GAPDH  NM_002046.3  Housekeeping
GAS6  NM_000820.2  Endogenous
GAS7  NM_001130831.1  Endogenous
GATA1  NM_002049.2  Endogenous
GATA3  NM_001002295.1  Endogenous
GB_Virus_C  NC_001710.1  Endogenous
GBP1  NM_002053.1  Endogenous
GCA  NM_012198.3  Endogenous
GCH1  NM_000161.2  Endogenous
GCS  NM_001498.2  Endogenous
GEMIN2  NM_003616.2  Endogenous
GFM1  NM_024996.5  Endogenous
GJB2  NM_004004.5  Endogenous
GLUL  NM_001033056.1  Endogenous
GMFG  NM_004877.2  Endogenous
GNAI2  NM_002070.2  Endogenous
GNB2  NM_005273.3  Endogenous
GNG11  NM_004126.3  Endogenous
GNS  NM_002076.3  Endogenous
GPNMB  NM_001005340.1  Endogenous
GPR160  NM_014373.1  Endogenous
GPR18  NM_001098200.1  Endogenous
GPR65  NM_003608.2  Endogenous
GPX3  NM_002084.3  Endogenous
GPX7  NM_015696.4  Endogenous
GR  NM_001018077.1  Endogenous
GRAP  NM_006613.3  Endogenous
GSTA1  NM_145740.3  Endogenous
GSTM1  NM_000561.2  Endogenous
GSTP1  NM_000852.2  Endogenous
GTSF1L  NM_176791.3  Endogenous
GUSB  NM_000181.1  Housekeeping
GYS1  NM_002103.4  Endogenous
GZMB  NM_004131.3  Endogenous
GZMM  NM_005317.2  Endogenous
HCK  NM_002110.2  Endogenous
HCLS1  NM_005335.4  Endogenous
HGF  NM_000601.4  Endogenous
HHEX  NM_002729.4  Endogenous
HIST1H1D  NM_005320.2  Endogenous
HIST1H2BF  NM_003522.3  Endogenous
HIST1H4C  NM_003542.3  Endogenous
HK3  NM_002115.1  Endogenous
HLA-A  NM_002116.5  Endogenous
HLA-B  NM_005514.6  Endogenous
HLA-C  NM_002117.4  Endogenous
HLA-DRA  NM_019111.3  Endogenous
HLA-DRB1  NM_002124.2  Endogenous
HLA-DRB3  NM_022555.3  Endogenous
HLA-DRB4  NM_021983.4  Endogenous
HMBS  NM_000190.3  Housekeeping
HMMR  NM_012484.2  Endogenous
HMOX1  NM_002133.1  Endogenous
HOPX  NM_001145460.1  Endogenous
HPRT1  NM_000194.1  Endogenous
HRH1  NM_000861.2  Endogenous
HRSP12  NM_005836.2  Endogenous
HSD17B8  NM_014234.3  Endogenous
HSP90AA1  NM_005348.3  Endogenous
HSPA1A  NM_005345.5  Endogenous
HSPA1L  NM_005527.3  Endogenous
HSPB1  NM_001540.3  Endogenous
HYOU1  NM_006389.3  Endogenous
ICAM1  NM_000201.2  Endogenous
ICAM3  NM_002162.3  Endogenous
ICOS  NM_012092.2  Endogenous
ICOSLG  NM_015259.4  Endogenous
ID2  NM_002166.4  Endogenous
IDO1  NM_002164.3  Endogenous
IER3  NM_003897.2  Endogenous
IFI44  NM_006417.4  Endogenous
IFI44L  NM_006820.2  Endogenous
IFITM2  NM_006435.2  Endogenous
IFNAR1  NM_000629.2  Endogenous
IFNAR2  NM_000874.3  Endogenous
IFNB  NM_002176.2  Endogenous
IFNG  NM_000619.2  Endogenous
IFNGR1  NM_000416.1  Endogenous
IFNGR2  NM_005534.3  Endogenous
IGF1  NM_000618.3  Endogenous
IGF2R  NM_000876.1  Endogenous
IGFBP7  NM_001553.1  Endogenous
IGL-A  J00252.1  Endogenous
IGSF3  NM_001542.2  Endogenous
IGSF6  NM_005849.2  Endogenous
IKBKG  NM_003639.2  Endogenous
IKZF2  NM_016260.2  Endogenous
IL10  NM_000572.2  Endogenous
IL12  NM_000882.2  Endogenous
IL13  NM_002188.2  Endogenous
IL13RA1  NM_001560.2  Endogenous
IL15RA  NM_002189.2  Endogenous
IL17A  NM_002190.2  Endogenous
IL17F  NM_052872.3  Endogenous
IL1A  NM_000575.3  Endogenous
IL1B  NM_000576.2  Endogenous
IL1R1  NM_000877.2  Endogenous
IL1R2  NM_004633.3  Endogenous
IL1RN  NM_000577.3  Endogenous
IL2  NM_000586.2  Endogenous
IL21  NM_021803.2  Endogenous
IL22  NM_020525.4  Endogenous
IL2RA  NM_000417.1  Endogenous
IL2RG  NM_000206.1  Endogenous
IL33  NM_033439.2  Endogenous
IL3RA  NM_002183.2  Endogenous
IL4  NM_000589.2  Endogenous
IL4I1  NM_152899.1  Endogenous
IL4R  NM_000418.2  Endogenous
IL5  NM_000879.2  Endogenous
IL6  NM_000600.1  Endogenous
IL6R  NM_000565.2  Endogenous
IL7  NM_000880.2  Endogenous
IL7R  NM_002185.2  Endogenous
IL8  NM_000584.2  Endogenous
IL8RB  NM_001557.2  Endogenous
IL9  NM_000590.1  Endogenous
IL9R  NM_002186.2  Endogenous
ILT2  NM_001081637.1  Endogenous
INHBA  NM_002192.2  Endogenous
IQCG  NM_001134435.1  Endogenous
IRF1  NM_002198.1  Endogenous
IRF4  NM_002460.1  Endogenous
ITGA2  NM_002203.2  Endogenous
ITGA4  NM_000885.4  Endogenous
ITGAE  NM_002208.4  Endogenous
ITGAL  NM_002209.2  Endogenous
ITGAM  NM_000632.3  Endogenous
ITGAX  NM_000887.3  Endogenous
ITGB2  NM_000211.2  Endogenous
ITM2A  NM_004867.4  Endogenous
ITM2B  NM_021999.2  Endogenous
JMJD6  NM_015167.2  Endogenous
JUN  NM_002228.3  Endogenous
KBTBD8  NM_032505.2  Endogenous
KCNJ11  NM_000525.3  Endogenous
KCNJ2  NM_000891.2  Endogenous
KCNJ8  NM_004982.2  Endogenous
KCTD12  NM_138444.2  Endogenous
KIAA1598  NM_018330.5  Endogenous
KIR2DL1  NM_014218.2  Endogenous
KIR2DS1  NM_014512.1  Endogenous
KIR3DL1  NM_013289.2  Endogenous
KIT  NM_000222.1  Endogenous
KLRAP1  NR_028045.1  Endogenous
KLRB1  NM_002258.2  Endogenous
KLRD1  NM_002262.3  Endogenous
KLRF1  NM_016523.1  Endogenous
KLRG1  NM_005810.3  Endogenous
KYNU  NM_003937.2  Endogenous
LAG3  NM_002286.5  Endogenous
LAIR1  NM_002287.3  Endogenous
LAMC1  NM_002293.3  Endogenous
LAYN  NM_178834.3  Endogenous
LCN2  NM_005564.3  Endogenous
LGALS1  NM_002305.3  Endogenous
LGALS3BP  NM_005567.3  Endogenous
LGMN  NM_001008530.2  Endogenous
LILRB2  NM_005874.1  Endogenous
LMNB1  NM_005573.2  Endogenous
LMO2  NM_005574.3  Endogenous
LMP1  HHV4_000020.1  Endogenous
LMP2  HHV4_000043.1  Endogenous
LOC54103  NM_017439.3  Endogenous
LOXL1  NM_005576.2  Endogenous
LOXL2  NM_002318.2  Endogenous
LOXL3  NM_032603.2  Endogenous
LOXL4  NM_032211.6  Endogenous
LPHN1  NM_001008701.1  Endogenous
LPL  NM_000237.2  Endogenous
LRP1  NM_002332.2  Endogenous
LRRC20  NM_207119.1  Endogenous
LSM5  NM_012322.2  Endogenous
LSM7  NM_016199.2  Endogenous
LTA  NM_000595.2  Endogenous
LTBP1  NM_000627.3  Endogenous
LY75  NM_002349.2  Endogenous
LY86  NM_004271.3  Endogenous
LYPD3  NM_014400.2  Endogenous
LYZ  NM_000239.2  Endogenous
MAFB  NM_005461.3  Endogenous
MAOA  NM_000240.2  Endogenous
MAP1LC3B  NM_022818.4  Endogenous
MAP3K13  NM_004721.3  Endogenous
MAP4  NM_030885.3  Endogenous
MAP7D1  NM_018067.3  Endogenous
MAPK13  NM_002754.3  Endogenous
MAPK7  NM_139033.1  Endogenous
MARCO  NM_006770.3  Endogenous
MATK  NM_139354.1  Endogenous
MDM2  NM_006878.2  Endogenous
MED30  NM_080651.2  Endogenous
MET  NM_000245.2  Endogenous
METRNL  NM_001004431.1  Endogenous
MFAP2  NM_001135247.1  Endogenous
MGMT  NM_002412.3  Endogenous
MGST1  NM_145792.1  Endogenous
MGST3  NM_004528.2  Endogenous
MID2  NM_012216.3  Endogenous
MIF  NM_002415.1  Endogenous
MINK1  NM_170663.3  Endogenous
MKI67  NM_002417.2  Endogenous
MME  NM_000902.2  Endogenous
MMP11  NM_005940.3  Endogenous
MMP13  NM_002427.2  Endogenous
MMP2  NM_004530.2  Endogenous
MMP3  NM_002422.3  Endogenous
MMP7  NM_002423.3  Endogenous
MMP9  NM_004994.2  Endogenous
MNDA  NM_002432.1  Endogenous
MPO  NM_000250.1  Endogenous
MRC1  NM_002438.2  Endogenous
MRPL33  NM_004891.3  Endogenous
MRPL42  NM_014050.3  Endogenous
MRPL54  NM_172251.2  Endogenous
MRPS36  NM_033281.5  Endogenous
MS4A1  NM_152866.2  Endogenous
MS4A4A  NM_024021.2  Endogenous
MS4A6A  NM_152851.2  Endogenous
MSN  NM_002444.2  Endogenous
MST1R  NM_002447.1  Endogenous
MT1E  NM_175617.3  Endogenous
MT1F  NM_005949.3  Endogenous
MT1G  NM_005950.1  Endogenous
MT2A  NM_005953.3  Endogenous
MYC  NM_002467.3  Endogenous
MYLK  NM_053032.2  Endogenous
NAMPT  NM_005746.2  Endogenous
NAPA  NM_003827.2  Endogenous
NCAM1  NM_000615.5  Endogenous
NCF2  NM_000433.3  Endogenous
NCKIPSD  NM_016453.2  Endogenous
NCR1  NM_001145457.1  Endogenous
NCR3  NM_147130.1  Endogenous
NDUFA4  NM_002489.2  Endogenous
NDUFB6  NM_001199987.1  Endogenous
NDUFS4  NM_002495.2  Endogenous
NEB  NM_004543.3  Endogenous
NFATC4  NM_004554.4  Endogenous
NFIA  NM_005595.1  Endogenous
NFIB  NM_005596.2  Endogenous
NFKB1  NM_003998.2  Endogenous
NGK2D  NM_024518.1  Endogenous
NINJ1  NM_004148.3  Endogenous
NM23A  NM_000269.2  Endogenous
NOS2  NM_000625.4  Endogenous
NOTCH1  NM_017617.3  Endogenous
NPY1R  NM_000909.4  Endogenous
NSMCE2  NM_173685.2  Endogenous
NT5DC4  XM_001716359.4  Endogenous
NUAK1  NM_014840.2  Endogenous
NUCB2  NM_005013.2  Endogenous
OCIAD2  NM_152398.2  Endogenous
ODZ2  NM_001122679.1  Endogenous
ORM1  NM_000607.1  Endogenous
OSBPL6  NM_145739.2  Endogenous
P21  NM_000389.2  Endogenous
P2RY10  NM_014499.2  Endogenous
P2RY14  NM_014879.2  Endogenous
P4HB  NM_000918.3  Endogenous
PARP1  NM_001618.3  Endogenous
PARP14  NM_017554.2  Endogenous
PCDHGC3  NM_032402.1  Endogenous
PDCD1  NM_005018.1  Endogenous
PDCD10  NM_145859.1  Endogenous
PDCD1LG2  NM_025239.3  Endogenous
PDGFA  NM_002607.5  Endogenous
PDGFRA  NM_006206.3  Endogenous
PDGFRB  NM_002609.3  Endogenous
PDIA5  NM_006810.2  Endogenous
PEA15  NM_003768.2  Endogenous
PECAM1  NM_000442.3  Endogenous
PERP  NM_022121.3  Endogenous
PEX2  NM_000318.2  Endogenous
PEX3  NM_003630.2  Endogenous
PFDN6  NM_014260.2  Endogenous
PGE2  NM_025072.6  Endogenous
PGK1  NM_000291.2  Housekeeping
PHLDA2  NM_003311.3  Endogenous
PICALM  NM_007166.2  Endogenous
PIEZO1  NM_001142864.1  Endogenous
PIK3C3  NM_002647.2  Endogenous
PIK3CB  NM_006219.1  Endogenous
PKIG  NM_007066.3  Endogenous
PLA1A  NM_015900.2  Endogenous
PLAU  NM_002658.2  Endogenous
PLD3  NM_001031696.2  Endogenous
PLEKHC1  NM_001135000.1  Endogenous
PLEKHF2  NM_024613.2  Endogenous
PLOD1  NM_000302.2  Endogenous
PLOD3  NM_001084.4  Endogenous
PLTP  NM_006227.2  Endogenous
PLVAP  NM_031310.1  Endogenous
PLXNB2  NM_012401.2  Endogenous
POLR1B  NM_019014.3  Housekeeping
POLR2A  NM_000937.2  Housekeeping
POU2AF1  NM_006235.2  Endogenous
PPIL3  NM_032472.3  Endogenous
PRDX4  NM_006406.1  Endogenous
PRF1  NM_005041.3  Endogenous
PRG1  NR_026881.1  Endogenous
PRG2  NM_002728.4  Endogenous
PRKAR2B  NM_002736.2  Endogenous
PRKCB1  NM_212535.1  Endogenous
PSMD14  NM_005805.4  Endogenous
PSTPIP2  NM_024430.3  Endogenous
PTAFR  NM_000952.3  Endogenous
PTGDR2  NM_004778.1  Endogenous
PTGS1  NM_000962.2  Endogenous
PTGS2  NM_000963.1  Endogenous
PTPLAD1  NM_016395.2  Endogenous
PTPRF  NM_002840.3  Endogenous
PTS  NM_000317.2  Endogenous
RAB7A  NM_004637.5  Endogenous
RAB9A  NM_001195328.1  Endogenous
RAPGEF2  NM_014247.2  Endogenous
RASIP1  NM_017805.2  Endogenous
RASSF2  NM_014737.2  Endogenous
RASSF4  NM_032023.3  Endogenous
RB1  NM_000321.1  Endogenous
RC3H2  NM_018835.2  Endogenous
RCSD1  NM_052862.3  Endogenous
RELA  NM_021975.2  Endogenous
RELL1  NM_001085400.1  Endogenous
RETNLB  NM_032579.2  Endogenous
RGC32  NM_014059.2  Endogenous
RGS18  NM_130782.2  Endogenous
RGS2  NM_002923.1  Endogenous
RHEBL1  NM_144593.1  Endogenous
RNASE2  NM_002934.2  Endogenous
RNF144B  NM_182757.2  Endogenous
RNF19A  NM_183419.1  Endogenous
ROMO1  NM_080748.2  Endogenous
RORC  NM_001001523.1  Endogenous
RPA3  NM_002947.3  Endogenous
RPL19  NM_000981.3  Housekeeping
RPL22L1  NM_001099645.1  Endogenous
RPL31  NM_000993.4  Endogenous
RPLP0  NM_001002.3  Housekeeping
RPS21  NM_001024.3  Endogenous
RRAD  NM_004165.1  Endogenous
RXRA  NM_002957.4  Endogenous
S100A11  NM_005620.1  Endogenous
S100A13  NM_001024210.1  Endogenous
SASH1  NM_015278.3  Endogenous
SCARB2  NM_005506.2  Endogenous
SDC1  NM_002997.4  Endogenous
SDC4  NM_002999.2  Endogenous
SDHA  NM_004168.1  Housekeeping
SEC61G  NM_001012456.1  Endogenous
SECTM1  NM_003004.2  Endogenous
SELL  NR_029467.1  Endogenous
SEMA4C  NM_017789.4  Endogenous
SEPP1  NM_005410.2  Endogenous
SERPINA1  NM_000295.4  Endogenous
SERPINB1  NM_030666.2  Endogenous
SERPINE2  NM_006216.2  Endogenous
SERPING1  NM_000062.2  Endogenous
SH3BGRL3  NM_031286.2  Endogenous
SHC1  NM_183001.4  Endogenous
SHMT1  NM_148918.1  Endogenous
SIGLEC10  NM_001171158.1  Endogenous
SLAMF8  NM_020125.2  Endogenous
SLC15A3  NM_016582.1  Endogenous
SLC22A14  NM_004803.3  Endogenous
SLC31A2  NM_001860.2  Endogenous
SLC4A11  NM_032034.2  Endogenous
SMAD1  NM_005900.2  Endogenous
SMAP2  NM_001198979.1  Endogenous
SMC3  NM_005445.3  Endogenous
SNRPD1  NM_006938.2  Endogenous
SNRPD2  NM_004597.4  Endogenous
SOCS2  NM_003877.3  Endogenous
SOD2  NM_000636.2  Endogenous
SORD  NM_003104.4  Endogenous
SPARC  NM_003118.2  Endogenous
SPARCL1  NM_004684.4  Endogenous
SPN  NM_003123.3  Endogenous
SRPX  NM_006307.2  Endogenous
STAP1  NM_012108.2  Endogenous
STAT1  NM_007315.2  Endogenous
STK17B  NM_004226.2  Endogenous
SYTL3  NM_001009991.2  Endogenous
TACI  NM_012452.2  Endogenous
TALDO1  NM_006755.1  Endogenous
TBC1D9  NM_015130.2  Endogenous
TBP  NM_001172085.1  Housekeeping
TBX21  NM_013351.1  Endogenous
TCEAL1  NM_001006639.1  Endogenous
TCIRG1  NM_006053.2  Endogenous
TCN2  NM_000355.2  Endogenous
TCTEX1D2  NM_152773.3  Endogenous
TEK  NM_000459.2  Endogenous
TFB2M  NM_022366.2  Endogenous
TFPI2  NM_006528.3  Endogenous
TGFB1  NM_000660.3  Endogenous
TGFBI  NM_000358.2  Endogenous
THBS1  NM_003246.2  Endogenous
THEMIS2  NM_001039477.1  Endogenous
TIA1  NM_022037.1  Endogenous
TIMM10  NM_012456.2  Endogenous
TIMP1  NM_003254.2  Endogenous
TIMP4  NM_003256.2  Endogenous
TL6  NM_005092.2  Endogenous
TLR1  NM_003263.3  Endogenous
TLR10  NM_030956.2  Endogenous
TLR2  NM_003264.3  Endogenous
TLR3  NM_003265.2  Endogenous
TLR4  NR_024168.1  Endogenous
TLR5  NM_003268.3  Endogenous
TLR6  NM_006068.2  Endogenous
TLR7  NM_016562.3  Endogenous
TLR8  NM_016610.2  Endogenous
TLR9  NM_017442.2  Endogenous
TMEM126A  NM_032273.3  Endogenous
TMSB15A  NM_021992.2  Endogenous
TNF  NM_000594.2  Endogenous
TNFRSF11A  NM_003839.2  Endogenous
TNFRSF14  NM_003820.2  Endogenous
TNFRSF18  NM_004195.2  Endogenous
TNFRSF1A  NM_001065.2  Endogenous
TNFRSF1B  NM_001066.2  Endogenous
TNFRSF4  NM_003327.2  Endogenous
TNFRSF8  NM_001243.3  Endogenous
TNFRSF9  NM_001561.4  Endogenous
TNFSF10  NM_003810.2  Endogenous
TNFSF11  NM_003701.2  Endogenous
TNFSF4  NM_003326.2  Endogenous
TNFSF8  NM_001244.2  Endogenous
TNFSF9  NM_003811.3  Endogenous
TNS1  NM_022648.4  Endogenous
TNS3  NM_022748.11  Endogenous
TOP2A  NM_001067.2  Endogenous
TOP2B  NM_001068.2  Endogenous
TP53  NM_000546.2  Endogenous
TPD52  NM_001025252.1  Endogenous
TPP1  NM_000391.3  Endogenous
TPSAB1  NM_003294.3  Endogenous
TRA  X02592.1  Endogenous
TRADD  NM_003789.2  Endogenous
TRAF2  NM_021138.3  Endogenous
TRAT1  NM_016388.2  Endogenous
TSPAN4  NM_001025234.1  Endogenous
TUBB  NM_178014.2  Housekeeping
TUBB1  NM_030773.2  Endogenous
TUBB2B  NM_178012.3  Endogenous
TUBB4  NM_006087.2  Endogenous
UBA1  NM_003334.3  Endogenous
UBE3A  NM_000462.2  Endogenous
UBE3B  NM_183415.2  Endogenous
ULK1  NM_003565.1  Endogenous
UPB1  NM_016327.2  Endogenous
USP3  NM_006537.2  Endogenous
UTS2R  NM_018949.1  Endogenous
UXT  NM_004182.2  Endogenous
VAMP7  NM_005638.5  Endogenous
VCAM1  NM_001078.2  Endogenous
VCAN  NM_004385.3  Endogenous
VEGFA  NM_001025366.1  Endogenous
VEGFR1  NM_002019.4  Endogenous
VNN2  NM_004665.2  Endogenous
VPREB3  NM_013378.2  Endogenous
VWA5A  NM_001130142.1  Endogenous
VWF  NM_000552.3  Endogenous
WARS  NM_173701.1  Endogenous
WBP4  NM_007187.3  Endogenous
WDR83  NM_032332.3  Endogenous
WHSC2  NM_005663.3  Endogenous
WT1  NM_000378.3  Endogenous
XIAP  NM_001204401.1  Endogenous
XKR4  NM_052898.1  Endogenous
ZBTB32  NM_014383.1  Endogenous
ZDHHC21  NM_178566.4  Endogenous
ZNF32  NM_001005368.1  Endogenous
ZNHIT3  NM_001033577.1  Endogenous
ZYX  NM_003461.4  Endogenous

Table A.7: Primary vs. Relapse Specimens Gene Expression Signature Difference Correlations.

Signature1  Signature2  r2
CTL  B-cell  -0.028
Drug-Resistance  B-cell  0.426
Eosinophil  B-cell  -0.431
FDC  B-cell  0.598
Fibroblast-ECM  B-cell  -0.471
HRS  B-cell  -0.014
Macrophage  B-cell  -0.796
Mast  B-cell  -0.278
MDSC  B-cell  -0.716
Neutrophil  B-cell  -0.641
NK  B-cell  0.110
Plasma  B-cell  0.355
T-cell  B-cell  0.652
Treg  B-cell  0.138
Drug-Resistance  CTL  -0.066
Eosinophil  CTL  -0.107
FDC  CTL  0.043
Fibroblast-ECM  CTL  -0.195
HRS  CTL  -0.115
Macrophage  CTL  0.075
Mast  CTL  0.035
MDSC  CTL  -0.025
Neutrophil  CTL  -0.086
NK  CTL  0.353
Plasma  CTL  0.179
T-cell  CTL  0.201
Treg  CTL  0.024
Eosinophil  Drug-Resistance  -0.043
FDC  Drug-Resistance  0.291
Fibroblast-ECM  Drug-Resistance  -0.514
HRS  Drug-Resistance  -0.226
Macrophage  Drug-Resistance  -0.555
Mast  Drug-Resistance  -0.187
MDSC  Drug-Resistance  -0.456
Neutrophil  Drug-Resistance  -0.438
NK  Drug-Resistance  0.299
Plasma  Drug-Resistance  0.124
T-cell  Drug-Resistance  0.307
Treg  Drug-Resistance  -0.200
FDC  Eosinophil  -0.297
Fibroblast-ECM  Eosinophil  0.018
HRS  Eosinophil  0.037
Macrophage  Eosinophil  0.227
Mast  Eosinophil  -0.031
MDSC  Eosinophil  0.284
Neutrophil  Eosinophil  0.433
NK  Eosinophil  0.229
Plasma  Eosinophil  -0.267
T-cell  Eosinophil  -0.290
Treg  Eosinophil  -0.119
Fibroblast-ECM  FDC  -0.388
HRS  FDC  0.173
Macrophage  FDC  -0.588
Mast  FDC  0.034
MDSC  FDC  -0.461
Neutrophil  FDC  -0.441
NK  FDC  0.031
Plasma  FDC  0.203
T-cell  FDC  0.569
Treg  FDC  0.310
HRS  Fibroblast-ECM  -0.281
Macrophage  Fibroblast-ECM  0.505
Mast  Fibroblast-ECM  0.206
MDSC  Fibroblast-ECM  0.656
Neutrophil  Fibroblast-ECM  0.278
NK  Fibroblast-ECM  -0.078
Plasma  Fibroblast-ECM  0.197
T-cell  Fibroblast-ECM  -0.635
Treg  Fibroblast-ECM  -0.125
Macrophage  HRS  -0.033
Mast  HRS  -0.140
MDSC  HRS  -0.103
Neutrophil  HRS  0.029
NK  HRS  -0.274
Plasma  HRS  -0.408
T-cell  HRS  0.225
Treg  HRS  0.552
Mast  Macrophage  0.379
MDSC  Macrophage  0.744
Neutrophil  Macrophage  0.674
NK  Macrophage  -0.147
Plasma  Macrophage  -0.212
T-cell  Macrophage  -0.718
Treg  Macrophage  -0.149
MDSC  Mast  0.296
Neutrophil  Mast  0.301
NK  Mast  -0.230
Plasma  Mast  0.179
T-cell  Mast  -0.245
Treg  Mast  -0.027
Neutrophil  MDSC  0.570
NK  MDSC  0.032
Plasma  MDSC  -0.088
T-cell  MDSC  -0.773
Treg  MDSC  -0.190
NK  Neutrophil  -0.204
Plasma  Neutrophil  -0.374
T-cell  Neutrophil  -0.582
Treg  Neutrophil  -0.098
Plasma  NK  0.303
T-cell  NK  0.022
Treg  NK  -0.144
T-cell  Plasma  0.155
Treg  Plasma  -0.060
Treg  T-cell  0.463

Table A.8: RHL30 vs. Reported Prognostic Markers for Post-BMT-FFS.

Cohort  Index  Feature  P
BCCA Training  1  Time to First Relapse  0.883
BCCA Training  1  RHL30 High-risk  0.000
BCCA Training  2  Primary Refractory Status  0.791
BCCA Training  2  RHL30 High-risk  0.000
BCCA Training  3  Chemoresistance to Salvage Therapy  0.086
BCCA Training  3  RHL30 High-risk  0.000
BCCA Training  4  Age ≥ 45  0.459
BCCA Training  4  RHL30 High-risk  0.000
BCCA Training  5  Stage IV  0.314
BCCA Training  5  RHL30 High-risk  0.000
UMCG Validation  1  Time to First Relapse  0.294
UMCG Validation  1  RHL30 High-risk  0.059
UMCG Validation  2  Primary Refractory Status  0.384
UMCG Validation  2  RHL30 High-risk  0.027
UMCG Validation  3  Chemoresistance to Salvage Therapy  0.002
UMCG Validation  3  RHL30 High-risk  0.009
UMCG Validation  4  Age ≥ 45  0.270
UMCG Validation  4  RHL30 High-risk  0.027
AUH Validation  1  Time to First Relapse  0.996
AUH Validation  1  RHL30 High-risk  0.005
AUH Validation  2  Primary Refractory Status  0.514
AUH Validation  2  RHL30 High-risk  0.003
AUH Validation  3  Chemoresistance to Salvage Therapy  0.980
AUH Validation  3  RHL30 High-risk  0.013
AUH Validation  4  Age ≥ 45  0.168
AUH Validation  4  RHL30 High-risk  0.002
AUH Validation  5  Stage IV  0.848
AUH Validation  5  RHL30 High-risk  0.003

Table A.9: RHL30 vs. Reported Prognostic Markers for Post-BMT-OS.

Cohort  Index  Feature  P
BCCA Training  1  Time to First Relapse  0.253
BCCA Training  1  RHL30 High-risk  0.000
BCCA Training  2  Primary Refractory Status  0.341
BCCA Training  2  RHL30 High-risk  0.000
BCCA Training  3  Chemoresistance to Salvage Therapy  0.016
BCCA Training  3  RHL30 High-risk  0.001
BCCA Training  4  Age ≥ 45  0.765
BCCA Training  4  RHL30 High-risk  0.000
BCCA Training  5  Stage IV  0.091
BCCA Training  5  RHL30 High-risk  0.000
UMCG Validation  1  Time to First Relapse  0.782
UMCG Validation  1  RHL30 High-risk  0.017
UMCG Validation  2  Primary Refractory Status  0.512
UMCG Validation  2  RHL30 High-risk  0.015
UMCG Validation  3  Chemoresistance to Salvage Therapy  0.230
UMCG Validation  3  RHL30 High-risk  0.009
UMCG Validation  4  Age ≥ 45  0.930
UMCG Validation  4  RHL30 High-risk  0.012
AUH Validation  1  Time to First Relapse  0.955
AUH Validation  1  RHL30 High-risk  0.022
AUH Validation  2  Primary Refractory Status  0.916
AUH Validation  2  RHL30 High-risk  0.017
AUH Validation  3  Chemoresistance to Salvage Therapy  0.781
AUH Validation  3  RHL30 High-risk  0.077
AUH Validation  4  Age ≥ 45  0.858
AUH Validation  4  RHL30 High-risk  0.016
AUH Validation  5  Stage IV  0.020
AUH Validation  5  RHL30 High-risk  0.005
