UBC Theses and Dissertations

Detection of genomic rearrangements in archival lymphoma tissues using targeted capture sequencing
Chong, Lauren Camille
2016

Detection of Genomic Rearrangements in Archival Lymphoma Tissues Using Targeted Capture Sequencing

by

Lauren Camille Chong

B.C.S. Bioinformatics, The University of Waterloo, 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Bioinformatics)

The University of British Columbia

(Vancouver)

April 2016

© Lauren Camille Chong, 2016

Abstract

The B-cell lymphomas are a heterogeneous group of disease entities arising from mature B lymphocytes and are characterized by frequent genomic rearrangements. Recurrent rearrangements involving the MHC class II transactivator CIITA and the programmed death 1 ligands PDL1 and PDL2 have been shown to contribute to an immune privilege phenotype in multiple B-cell lymphomas, with implications for novel therapeutic approaches. However, the landscape of fusion partners for these genes has not been well characterized, and methods that use formalin-fixed paraffin-embedded (FFPE) tumour samples for breakpoint discovery have not been explored.

We selected 68 B-cell lymphoma patients with known CIITA and PDL1/2 rearrangement status determined by fluorescence in situ hybridization (FISH) break-apart assays. DNA surrounding the CIITA and PDL1/2 loci was captured from FFPE tumour libraries using a hybridization-based target enrichment assay and sequenced on an Illumina HiSeq 2500. Multiple structural variant (SV) detection tools were used in an ensemble approach to generate SV predictions. We identified 35 novel translocation partners and observed translocation cluster breakpoint regions (CBRs) in CIITA, PDL1, PDL2 and the SOCS1 tumour suppressor gene downstream of CIITA. Recurrent intra-chromosomal deletions, inversions and duplications were also identified in each region. Immunohistochemistry (IHC) analysis of PD-L1 and PD-L2 surface expression demonstrated that CBR translocations and a subset of intra-chromosomal rearrangements are significantly associated with increased protein expression of the respective ligand. In conjunction with published reports, this suggests that distinct rearrangement types have variable functional consequences. We also report many SVs below the detection resolution of FISH, suggesting the value of a combined approach integrating FISH, capture sequencing and IHC data for characterizing genomic rearrangements in lymphomas.

This study confirms the utility of a targeted sequencing approach for detecting structural variation in FFPE lymphoma tissues. Future capture designs interrogating the full set of recurrently rearranged lymphoma genes are being explored, with the aim of designing a comprehensive, high-throughput and clinically relevant assay for routine profiling of rearrangement status to guide clinical decision making.

Preface

This project was conceived primarily by Dr. Christian Steidl. I was responsible for performing all bioinformatics analysis and interpretation of the findings.

Chapter 2: Study Design and Methodology

A version of this chapter has been used as the methodology section of a manuscript which is under revision for publication. This manuscript is co-first-authored by David D. W. Twa and myself (manuscript subsequently referred to as Chong, Twa et al.). The contributions of each author on Chong, Twa et al. include:

• Project design: Christian Steidl (CS), David D. W. Twa (DT), Anja Mottok (AM), Susana Ben-Neriah (SB), Andrew J. Mungall (AJM), Bruce W. Woolcock, Kerry J. Savage, David W. Scott, Randy D. Gascoyne, Ryan D. Morin
• FISH analyses: SB
• Cohort selection: CS, DT, SB, AM
• Tissue extractions: AM, SB
• Capture and sequencing: AJM, Yongjun Zhao, Marco A. Marra
• Bioinformatics analysis: Lauren Chong (LC)
• Validations: DT
• Immunohistochemistry: AM
• Interpretation of results: DT, LC, AM, CS
• Writing the manuscript: DT, LC
• Edited and approved the manuscript: All authors

Chapter 3: Results and Discussion

Section 3.1: Pipeline implementation and optimization

Versions of Section 3.1.2 and Section 3.1.4 are included in the results of Chong, Twa et al., and some figures and tables shown here are used in that manuscript. I generated all results and figures included in this chapter, except the library construction quality metrics, which were generated by AJM and his team at the Genome Sciences Centre (GSC).

Section 3.2: PDL rearrangements

A version of this chapter has been written as the results and discussion sections of Chong, Twa et al., in which I generated the rearrangement predictions and visualizations. DT validated the rearrangements, AM generated immunohistochemistry data, and interpretation of the results was performed by DT and myself. Some of the tables and figures in this chapter are included in Chong, Twa et al. The findings described here, with some additional functional analysis, may be included in DT’s thesis.

Section 3.3: CIITA and SOCS1 rearrangements

I was responsible for generating the predicted rearrangements and figures included in this chapter. Validations of predicted events were performed by AM, and interpretation of the findings was performed by AM and myself. This work is not yet published. These results and further functional interpretation may be included in AM’s thesis.

Chapter 4: Conclusions

Some of the conclusions described here are also included in the discussion section of Chong, Twa et al. and were authored by DT, CS and myself.

Appendix A: Supplementary Methods

This section contains paragraphs taken from the supplementary methods of Chong, Twa et al.
Credit for authorship of each paragraph is described with footnotes throughout the appendix.

Figure 1.2, Figure 1.4, Figure 1.6 and their legends are exact reproductions with permission from published manuscripts. Credit for these figures can be found at the end of their legends.

This study abided by the Declaration of Helsinki and was approved by the BC Cancer Agency review board (REB: H11-00684).

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgments
1 Introduction
  1.1 B-cell lymphomas
    1.1.1 Types
    1.1.2 Genomic rearrangements in B-cell lymphomas
      Pathogenesis and mechanisms
    1.1.3 Anti-tumour immune response
      MHC class II and CIITA
      Programmed death signalling pathways
  1.2 Targeted sequencing
    1.2.1 Methodology
    1.2.2 Applications
      Detecting genomic rearrangements
      Utility for archival tissues
  1.3 Research questions and project aims
2 Study Design and Methodology
  2.1 Sample selection and library construction
  2.2 Capture design and sequencing protocol
  2.3 Structural variant detection and filtering
    2.3.1 Alignment-based analysis
    2.3.2 Read trimming
    2.3.3 Assembly-based analysis
    2.3.4 Integrating results
  2.4 High-confidence variants and validations
  2.5 Immunohistochemistry
3 Results and Discussion
  3.1 Pipeline implementation and optimization
    3.1.1 Library construction
    3.1.2 Target coverage depth and uniformity
    3.1.3 Assembly statistics
    3.1.4 Structural variant detection
  3.2 PDL rearrangements
    3.2.1 Patterns of rearrangement
      3.2.1.1 Translocation cluster breakpoint regions in CD274 and PDCD1LG2
      3.2.1.2 Intra-chromosomal rearrangements
      3.2.1.3 Immunohistochemistry
    3.2.2 Interpretation of PDL rearrangements
    3.2.3 Concordance with FISH
  3.3 CIITA and SOCS1 rearrangements
    3.3.1 Patterns of rearrangement
      3.3.1.1 Translocation cluster breakpoint regions in CIITA intron 1 and SOCS1
      3.3.1.2 Intra-chromosomal rearrangements
    3.3.2 Interpretation of CIITA and SOCS1 rearrangements
    3.3.3 Concordance with FISH
4 Conclusions
  4.1 Utility of capture sequencing in B-cell lymphoma rearrangement detection
  4.2 Clinical implications of genomic rearrangements
  4.3 Further research applications
Bibliography
A Supplementary Methods
B Supplementary Tables

List of Tables

Table 2.1 Chromosomal coordinates (hg19) of the BAC probes used in FISH break-apart staining
Table 2.2 Description of the 92 patients selected for investigation
Table 2.3 Coordinates of the capture space used in the custom Agilent SureSelect design
Table 3.1 High-confidence predicted SVs in the PDL region
Table 3.2 High-confidence predicted SVs in the chr16 capture region
Table B.1 Software parameters used in various steps of the bioinformatics pipeline
Table B.2 Merged structural variant predictions in the PDL region
Table B.3 Immunohistochemistry scoring results for the 68 cases
Table B.4 Merged structural variant predictions in the chr16 capture region

List of Figures

Figure 1.1 Stages of B-cell development in the lymph node
Figure 1.2 Molecular processes that remodel immunoglobulin genes
Figure 1.3 B-cell lymphoma types are associated with normal B-cell counterparts
Figure 1.4 Interactions with the microenvironment that mediate immune escape
Figure 1.5 Agilent SureSelect kit design and methodology
Figure 1.6 Coverage plot for array and solution hybrid capture
Figure 1.7 Schematic of the FISH break-apart assays used to detect rearrangements
Figure 1.8 Capture of DNA fragments spanning rearrangement breakpoints can identify structural variants
Figure 2.1 FISH break-apart assays for the CIITA and PDL loci
Figure 2.2 Target regions selected for custom Agilent SureSelect design
Figure 2.3 Different types of structural variants introduce representative abnormalities in read pair mappings
Figure 2.4 Schematic describing the read trimming strategy utilized for the FFPE sequencing data
Figure 2.5 Schematic describing assembly-based SV detection
Figure 3.1 Fragment size distribution for cases failing library construction
Figure 3.2 Total number of reads sequenced for each of the 68 libraries
Figure 3.3 Schematic describing how insert size is measured
Figure 3.4 Distribution of insert sizes from the 68 sequencing libraries
Figure 3.5 Mean coverage across each of the target regions
Figure 3.6 Mean coverage of each sample across the target regions
Figure 3.7 Proportion of bases sequenced to 100x depth in each sample
Figure 3.8 Regions of the target space averaging less than 100x coverage depth
Figure 3.9 Proportion of bases aligned outside the capture space
Figure 3.10 Number of off-target bases reaching 500-fold depth
Figure 3.11 Distribution of assembly statistics based on selected k-mer value
Figure 3.12 Overlap of rearrangement predictions from four tools
Figure 3.13 Tool-specific rearrangement predictions are primarily found in a single result-set
Figure 3.14 Overlap of deStruct predictions at each trimming length
Figure 3.15 Overlap of DELLY predictions at each trimming length
Figure 3.16 Overlap of deStruct and DELLY predictions for all 68 samples
Figure 3.17 Translocation events in the PDL region
Figure 3.18 Intra-chromosomal predictions in the PDL region
Figure 3.19 Immunohistochemistry staining for three representative lymphoma cases
Figure 3.20 IHC histological score is significantly associated with CBR translocations and gene duplications
Figure 3.21 Concordance between FISH and capture-based findings in the PDL region
Figure 3.22 Schematic of the FISH assay used to investigate a focal amplification in A43037
Figure 3.23 Location of translocation breakpoints throughout the chr16 capture space
Figure 3.24 Location of translocation breakpoints throughout the CIITA intron 1 region
Figure 3.25 Dual fusion FISH assay used to validate a t(10;16) prediction in A43115
Figure 3.26 Location of translocation breakpoints throughout the SOCS1 gene region
Figure 3.27 Location of intra-chromosomal rearrangements in the chr16 capture space
Figure 3.28 Location of intra-chromosomal rearrangements within CIITA intron 1
Figure 3.29 Intronic CIITA deletions observed in A43071
Figure 3.30 Location of intra-chromosomal rearrangements in the SOCS1 gene locus
Figure 3.31 Concordance between FISH and capture-based structural variants on chr16

List of Abbreviations

BCR: B-cell receptor
SHM: somatic hypermutation
CSR: class-switch recombination
MHC: major histocompatibility complex
PDL: programmed death ligand
DLBCL: diffuse large B-cell lymphoma
PMBCL: primary mediastinal large B-cell lymphoma
HL: Hodgkin lymphoma
cHL: classical Hodgkin lymphoma
FL: follicular lymphoma
SV: structural variant
CBR: cluster breakpoint region

Acknowledgments

I would like to thank the Canadian Cancer Society Research Institute for providing funding for this project through an innovation grant (702519 to Christian Steidl), and the Terry Fox Research Institute for funding through a Program Project Grant (1023 to Christian Steidl and Randy Gascoyne). I would also like to thank the Canadian Institutes of Health Research for providing my MSc funding through the Bioinformatics Training Program (BTP).

I want to thank my supervisors, Dr. Christian Steidl and Dr. Ryan Morin, for their ongoing support and advice. I also want to thank Dr. Inanc Birol for his participation in my committee, and his lab members for providing valuable technical support and feedback. Thanks to Dr. Steven Jones for his role as BTP advisor and for chairing my defense. Thank you to Sharon Ruschkowski for her constant support throughout my studies.

All the members of the Steidl lab assisted me greatly throughout my project, but I especially want to thank David Twa, Anja Mottok and Susana Ben-Neriah for their integral contributions to my thesis work. I am also grateful to the Centre for Lymphoid Cancer’s bioinformatics team for helpful discussions, and to Andrew McPherson for frequent assistance with software troubleshooting.

Lastly, I would like to thank my extraordinary parents, family and friends for their continued patience and support.

Chapter 1
Introduction

1.1 B-cell lymphomas

The B-cell lymphomas represent a heterogeneous group of disease entities that all arise from mature B lymphocytes. The current World Health Organization (WHO) classification was released in 2008 and specifies more than 35 distinct types of B-cell lymphomas, distinguished using multiple criteria including morphology, immunophenotype, genetic data and clinical factors [98]. B-cell lymphoma types vary significantly in prognosis, clinical behaviour and response to treatment, highlighting the importance of the classification in clinical decision-making. Our understanding of B-cell lymphomas has advanced significantly in the past two decades, and continues to evolve as novel technologies and treatment strategies are explored.

1.1.1 Types

The current classification system used for lymphomas is the WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues, whose first edition was published in 2001 [13]. This classification incorporated aspects of a number of others that preceded it, most significantly the Revised European-American Classification of Lymphoid Neoplasms (REAL) published in 1994 by the International Lymphoma Study Group [33].
The WHO system defined B-cell lymphomas mainly by morphology, immunophenotype, clinical features and some genetic abnormalities identified through classical cytogenetics. These features demonstrated that each lymphoma is closely related to a normal cell at a certain stage of B-cell development, suggesting that lymphomas can be partially distinguished by the normal cell counterpart from which they putatively arise. Some key aspects of the initial classification included the distinction between Hodgkin lymphoma (HL) and non-Hodgkin lymphoma (NHL), and the definition of the two most common forms of NHL, diffuse large B-cell lymphoma (DLBCL) and follicular lymphoma (FL) [98].

B-cell lymphomas are derived from normal cells at various developmental stages (Figure 1.1). When naive B-cells enter the lymph node, they express an antibody complex on their surface called a B-cell receptor (BCR) [43]. This antibody is developed in the bone marrow. Once in the lymph node, a mature naive B-cell is activated when either free antigen or antigen displayed on an antigen-presenting cell (APC) binds its BCR [9]. The activated B-cell migrates into a region within the lymph node called the germinal centre, consisting of a dark zone and a light zone, each containing cells that are defined by specific expression patterns and activated pathways [39]. Cells first enter the dark zone, where they undergo rapid proliferation and somatic hypermutation (SHM). SHM is a process by which random point mutations and indels are introduced into the variable regions of the immunoglobulin genes that encode the BCR (Figure 1.2b). This alters the binding site of the antibody and can either increase or decrease its affinity for the antigen. Cells then migrate into the light zone and undergo selection, in which B-cells with reduced antigen affinity undergo apoptosis. B-cells that survive this process may re-enter the dark zone and repeat the SHM process to further increase antigen affinity.
After selection, cells in the light zone undergo a process called class-switch recombination (CSR or “class switching”), in which the type of antibody they make can be changed from IgM to another type (e.g. IgG, IgE, IgA) by removal of constant regions (Figure 1.2c). Cells can then leave the germinal centre and differentiate into plasma cells, which produce large amounts of the high-affinity antibody, or memory B-cells that are primed for recognition of the antigen in the future. The B-cells in each stage of this process have specific regulatory and expression programs corresponding to the activities they must undergo.

The initiation and pathogenesis of lymphomas are closely related to the cellular context in which they arise (Figure 1.3). Cells at certain stages in the developmental cycle are more susceptible to becoming malignant. For instance, germinal centre B-cells have repressed cell cycle regulation and DNA damage responses in order to facilitate proliferation and somatic hypermutation in normal B-cell development [85]. The silencing of these processes is mediated by the BCL6 transcriptional repressor, but in many lymphoma types BCL6 is over-expressed, allowing uncontrolled growth to persist in the cell outside of the germinal centre.

Figure 1.1: Stages of B-cell development in the lymph node. After a naive B-cell is activated by binding of a free antigen or antigen displayed on an antigen-presenting cell (not shown), it migrates into the dark zone of the germinal centre. Here it proliferates and undergoes somatic hypermutation of the immunoglobulin gene variable regions. The mutations introduced will either increase or decrease the BCR’s affinity for the antigen. Cells then migrate to the light zone, where selection causes them to undergo apoptosis if no BCR stimulation occurs due to decreased antigen binding affinity. Cells that survive selection may migrate back to the dark zone to repeat the process and further increase antigen specificity. As cells progress through the light zone they undergo class-switch recombination at the immunoglobulin heavy chain constant region to determine the type of antibody produced (e.g. IgG, IgE, IgA). They can then exit the germinal centre and differentiate into memory B-cells or plasma cells.

The comprehensive characterization of expression patterns in B-cell lymphomas was facilitated by the introduction of high-throughput gene expression profiling (GEP) methods. Importantly, GEP identified additional distinct lymphoma subtypes that were indistinguishable by other features but showed variable expression patterns. A key example of this was the identification of two distinct subtypes of DLBCL [2]. While morphologically identical, these subtypes had expression patterns that clearly clustered into two groups, each highly correlated to a distinct normal cell type (Figure 1.3). One type resembled germinal centre (GC) B-cells (termed “GCB DLBCL”), and the other resembled activated plasmablast cells (termed “activated B-cell like (ABC) DLBCL”). These have since been classified as two distinct molecular subtypes [13]. The mutated pathways differ between the two types, leading to variable prognosis and response to treatment, with ABC DLBCL patients having inferior survival compared with GCB DLBCL patients [83]. DLBCL subtype is now frequently assessed in clinical trials, and assays are being developed that will facilitate implementation in clinical practice [90, 117]. GEP has also provided further evidence for the classification of primary mediastinal B-cell lymphoma (PMBCL) as its own entity, which likely derives from a thymic B-cell [84, 86, 96]. This disease has been historically classified as a third subtype of DLBCL, but GEP studies have shown it to have features similar to nodular sclerosis classical Hodgkin lymphoma (cHL), and it is now widely considered to be a distinct entity. These examples highlight the value of novel data types in advancing our understanding of B-cell lymphomas.

Figure 1.2: “Molecular processes that remodel immunoglobulin genes. Immunoglobulins (Igs) are expressed by B cells and consist of variable (V) regions, which interact with antigen, and constant (C) regions, which mediate the effector functions of Igs. To create a functional Ig, B cells must rearrange DNA segments that encode the heavy (H)- and light-chain (not shown) regions of the variable genes . . . b | The process of somatic hypermutation is activated when B cells reach the germinal centre . . . This process leads to the introduction of point mutations, deletions or duplications in the rearranged V-region of Ig genes (denoted by ‘Xs’ in the figure) [44]. These mutations occur in the V-region of Ig genes – not in the downstream Cµ region. c | Class switching results in the replacement of the originally expressed H-chain C-region gene with that of another Ig gene. In the diagram, the C-region for IgM (Cµ) and IgD (Cδ) are exchanged for the C-region of IgG (Cγ1) by recombination at the switch regions for these genes (Sµ and Sγ1, respectively). This results in an antibody with different effector functions but the same antigen-binding domain.” NOTE: This figure is an exact reproduction with permission of Figure 1b-c in Küppers 2005.

Figure 1.3: B-cell lymphoma types are associated with normal B-cell counterparts. Arrows show the normal cell type from which each lymphoma type is putatively derived. ABC DLBCL: activated B-cell like diffuse large B-cell lymphoma; GCB DLBCL: germinal centre B-cell like diffuse large B-cell lymphoma; PMBCL: primary mediastinal B-cell lymphoma.

The advent of next-generation sequencing techniques has allowed for further exploration into the genetic basis for lymphomagenesis and pathogenesis. Sequencing studies have identified recurrent driver mutations including single nucleotide variants (SNVs), copy number aberrations (CNAs) and structural variants (SVs) in multiple lymphoma types, often shedding light on common mechanisms and pathways that are deregulated in specific types [29, 31, 51, 53, 63, 64, 70]. Alterations are frequently found in genes integral to normal B-cell differentiation (e.g. the BCL6 gene, encoding the BCL6 protein). The combination of developmental stage and acquired alterations gives clues about how pathogenesis of certain lymphoma types might be achieved. In particular, sequencing has been used to characterize recurrent lymphoma-specific chromosomal translocations. Early cytogenetic work led to lymphomas being termed “translocation cancers,” because they are enriched for specific translocations that often represent driver events in lymphomagenesis [98].
While it is now appreciated that translocations comprise only a subset of lymphoma driver mutations, and that other alterations such as SNVs and CNAs play an essential role in many types, translocations still represent a hallmark of many lymphomas and are a distinctive feature of this disease compared with other cancers [29, 64, 70]. The introduction of sequencing data has allowed for detailed characterization of translocation breakpoint anatomy, and has provided additional information about the mechanisms by which these rearrangements arise and their functional impact.

1.1.2 Genomic rearrangements in B-cell lymphomas

One of the key findings from early classical cytogenetics studies is the prevalence of genomic rearrangements in B-cell lymphomas. These are typically distinct and pathognomonic of specific lymphoma types, and have been key to understanding the pathway alterations underlying lymphoma pathogenesis. While the presence of rearrangements has long been appreciated, investigation of the specific breakpoints and possible functional effects of translocations has been greatly advanced in recent years by next-generation sequencing technology.

Pathogenesis and mechanisms

As discussed in Section 1.1.1, one early genetic factor used to distinguish B-cell lymphoma types is the presence of translocations throughout the tumour genome. These occur when portions of two non-homologous chromosomes are erroneously joined after double-stranded breaks (DSBs) are introduced, resulting in derivative hybrid chromosomes. The mechanisms by which these events arise and their possible functional effects are a major focus of study in lymphoma biology.

One highly recurrent pathogenic mechanism in B-cell lymphoma-associated rearrangements is the involvement of the immunoglobulin heavy chain (IGH) locus. IGH is recurrently rearranged in multiple lymphoma types, with different translocation partners occurring in different types. In follicular lymphoma, BCL2-IGH translocations are found in approximately 90% of cases, and Burkitt’s lymphoma is characterized by MYC-IGH translocations in greater than 95% of cases [88]. DLBCL shows multiple recurrent IGH translocations, including BCL6-IGH, BCL2-IGH and MYC-IGH, at frequencies ranging from 15-35% [43].

The pathogenic significance of IGH rearrangements is explained by the identity of the rearrangement partners, many of which are proto-oncogenes [72, 113]. Studies have demonstrated that the IGH locus is enriched with enhancer elements, since this region must be highly expressed during development for the production of BCR antibody. When the enhancer-rich IGH locus is placed in close proximity to a proto-oncogene, it results in over-expression of the partner gene, promoting its oncogenic activity [43]. For example, BCL6-IGH translocations lead to over-expression of BCL6 protein, which represses cell cycle regulation and DNA damage response. The BCL2 protein, encoded by the BCL2 gene, is an anti-apoptotic factor, and BCL2-IGH translocations cause its over-expression and promote tumour cell survival [85].

The frequency of translocations involving IGH suggests a shared mechanism by which they arise. As described, the normal development of B-cells involves multiple mechanisms by which the BCR antibody increases its affinity for antigen (“affinity maturation”). One such stage is somatic hypermutation in the dark zone of the germinal centre, where mutations are introduced into the variable regions of the immunoglobulin heavy (IGH) and light (IGK and IGL) chain loci (Figure 1.2b). This process is mediated by a protein called activation-induced cytidine deaminase (AID), which converts cytosine residues to uracils at specific DNA motifs, causing random point mutations and small indels to be introduced during repair. This random variability in the immunoglobulin loci has the potential to increase or decrease antibody affinity.
In the germinal centre light zone, the process of class-switch recombination (CSR) at the IGH constant region is also initiated by AID-induced alterations, where subsequent repair mechanisms introduce DSBs and allow rearrangement of the constant segments (Figure 1.2c).

AID works by recognizing specific DNA sequence motifs of the form RGYW and WRCY, where R represents purine bases (adenine or guanine), G represents guanine, W represents adenine or thymine, Y represents pyrimidine bases (cytosine or thymine) and C represents cytosine [40]. RGYW is the reverse complement of WRCY, so the two describe the same hotspot read from opposite strands. These motifs are abundant in the immunoglobulin (Ig) variable and constant switch regions, facilitating mutation by AID during normal B-cell development. However, because these motifs are short and degenerate, they also occur at many other sites in the genome, providing the opportunity for breaks to be introduced elsewhere.

In fact, it has been shown that recurrent partners of IGH translocations, such as BCL6 and MYC, are enriched for mutations in AID target motifs [107]. This suggests that the introduction of DSBs by AID in both IGH and the partner region facilitates the frequently observed translocation events. AID is active in germinal centre B-cells during SHM and CSR, explaining why so many B-cell lymphomas arise at this stage. Off-target AID activity provides an example of how malignancy can arise from aberrant behaviour during normal cell development. It has since been demonstrated that AID activity is a requirement for lymphomagenesis in germinal centre-derived lymphomas [71]. The susceptibility to aberrant AID activity is further exacerbated in the germinal centre by repressed DNA damage responses [47].

In addition to IGH, other genes have been shown to be recurrently rearranged in lymphomas. Recently our group has described two such loci: the CIITA gene locus on chromosome 16, and the programmed death ligand (PDL) locus containing the CD274 (PDL1) and PDCD1LG2 (PDL2) genes on chromosome 9 [97, 101].
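The motif definitions above translate directly into regular expressions. The following minimal Python sketch expands the IUPAC codes and reports every position matching WRCY or its reverse complement RGYW, using a zero-width lookahead so that overlapping hits are not missed. The function names are hypothetical; the sketch only finds sequence matches and does not model AID activity or compute any enrichment statistic.

```python
import re

# IUPAC codes needed for the AID hotspot motifs (a subset of the full alphabet).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]",   # purine
         "Y": "[CT]",   # pyrimidine
         "W": "[AT]"}   # A or T


def motif_regex(motif: str) -> str:
    """Expand an IUPAC motif such as 'WRCY' into a regex pattern string."""
    return "".join(IUPAC[base] for base in motif)


def find_aid_hotspots(seq: str, motifs=("WRCY", "RGYW")):
    """Return sorted (0-based start, matched bases, motif) tuples.

    WRCY places the target C on the given strand; RGYW is its reverse
    complement, i.e. the same hotspot with the target C on the other strand.
    The zero-width lookahead lets overlapping matches be reported.
    """
    seq = seq.upper()
    hits = []
    for motif in motifs:
        pattern = re.compile("(?=(" + motif_regex(motif) + "))")
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.group(1), motif))
    return sorted(hits)


def hotspots_near(seq: str, position: int, flank: int, motifs=("WRCY", "RGYW")):
    """Hits whose start lies within `flank` bp of `position` (e.g. a breakpoint)."""
    return [h for h in find_aid_hotspots(seq, motifs) if abs(h[0] - position) <= flank]


if __name__ == "__main__":
    for hit in find_aid_hotspots("TAGCTAGCT"):
        print(hit)
```

In this toy sequence the palindromic hotspot AGCT is reported at positions 1 and 5 under both orientations. Counting hits with `hotspots_near` in windows around breakpoints, and comparing the counts against a background expectation, is one way to ask whether breakpoints in regions such as CIITA and the PDL locus coincide with AID hotspots.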
CIITA and the PDL genes are integral to the anti-tumour immune response, and have become a major area of interest in the lymphoma community.

1.1.3 Anti-tumour immune response

In addition to our understanding of B-cell lymphomas through the profiling of tumour cells, there has been an increasing appreciation of the contribution of non-malignant microenvironmental cells to lymphoma pathogenesis [88]. A highly investigated topic in recent years is the interaction between tumours and the host innate and adaptive immune system. Multiple studies have demonstrated that tumour cells are able to interact with components of the anti-tumour immune response, allowing them to circumvent destruction. Furthermore, recent studies have demonstrated that the mechanisms underlying these interactions often relate to acquired genetic alterations.

The adaptive immune response is critical for anti-tumour activity, and in particular the cell-mediated T-cell response plays a major role in the recognition and destruction of tumour cells. Activation of such T-cells requires two signals [67, 88]. The first involves binding of a tumour-specific antigen to the T-cell receptor (TCR) on the surface of the T-cell. The second involves co-stimulatory and co-inhibitory pathways required for an activated T-cell to interact with its targets.

Figure 1.4: “Interactions with the microenvironment that mediate immune escape. The figure shows mechanisms observed in selected lymphomas that affect the relationship between tumour cells and the immune system. Part a highlights the loss of surface molecules that are involved in the interactions with the immune system. Loss of major histocompatibility complex class I (MHC I) and MHC II molecules are postulated to reduce the presentation of tumour-specific antigens to cells of the immune system. The targeting of the cells with loss of MHC I by natural killer (NK) cells is abrogated by the coordinate loss of expression of CD58, a ligand that activates NK cells. Part b shows the gain of surface molecules that reduce antitumour immune function. The ligands of programmed cell death protein 1 (PD1) provide inhibitory signals to T cells, and HLA class I histocompatibility antigen, α-chain G (HLA-G) reduces immune function.” NOTE: This figure is an exact reproduction with permission of Figure 3a-b in Scott and Gascoyne 2014. CTL: cytotoxic T lymphocyte, TCR: T-cell receptor.

The TCR is able to recognize tumour-specific antigen when it is presented by a tumour cell or an APC. This presentation is facilitated by the major histocompatibility complexes (MHCs) expressed on the cell surface. Lymphoma cells are often able to disrupt antigen presentation through alterations in genes central to MHC activity (Figure 1.4a). Mutations affect both structural components of the MHCs, such as B2M in DLBCL, and regulators of MHC expression, such as the MHC class II transactivator CIITA in PMBCL and cHL [67]. When MHC function is impaired and tumour-specific antigen cannot be presented, the T-cells that would recognize the tumour cells are never activated, allowing the tumour to avoid detection by the immune system.

The second stage of T-cell activation involves co-stimulatory or co-inhibitory pathways that modulate T-cell activity. These pathways allow the immune response to be tuned, keeping T-cells active when needed but inactive otherwise to prevent autoimmunity and tissue damage [67]. One highly studied protein involved in this process is programmed death 1 (PD-1), an inhibitor of T-cell activity. Alterations involving amplification and rearrangement of the PD-1 ligands, PD-L1 (encoded by CD274) and PD-L2 (encoded by PDCD1LG2), are frequent in PMBCL, cHL and DLBCL (Figure 1.4b) [30, 92, 101]. The JAK-STAT pathway that mediates PD-1 ligand expression is also frequently altered [30].
In these cases, disruption of the PD-1/PDL co-inhibitory pathway allows tumour cells to constitutively inhibit the T-cell response.

As CIITA and the PD-1 ligands are frequently rearranged in multiple lymphoma types and share a role in the anti-tumour response, they have been a major research focus since these rearrangements were discovered [67, 103].

MHC class II and CIITA

The first stage of T-cell activation requires the presentation of an antigen to the T-cell receptor. CD4+ T-cells recognize antigen when it is presented on an MHC class II complex (Figure 1.4a).

The MHC class II transactivator CIITA (encoded by CIITA) regulates expression of the MHC class II components. This gene is affected by recurrent mutations and translocations in multiple lymphoma types, including DLBCL, cHL and PMBCL [80, 97]. The frequency of CIITA alterations varies between types, with the highest incidence in PMBCL, where mutations have been found in up to 71% of cases [68]. Our group has demonstrated decreased MHC class II expression as a result of CIITA mutations, leading to reduced tumour cell immunogenicity because less tumour antigen is presented to T-cells [20]. Reduced MHC class II expression has also been associated with decreased survival [21, 82].

Importantly, the recurrence of CIITA translocations is a recent discovery from our group [97]. Studies so far have demonstrated that CIITA rearranges promiscuously with multiple partner regions, including the PD-1 ligand genes CD274 and PDCD1LG2 [101]. The breakpoints of all CIITA translocations described to date occur in intron 1, suggesting a common rearrangement mechanism. In fact, further investigation of CIITA intron 1 revealed that mutations in this region frequently fell in AID target sites, suggesting that the CIITA locus is also likely a target of aberrant AID activity during somatic hypermutation [68].

The described CIITA rearrangements have two functional effects.
Firstly, they disrupt the coding sequence of the CIITA gene and likely prevent translation of a functional CIITA protein, impairing MHC class II expression as described. Secondly, the CIITA promoter is highly active in B-cells, and some partners of CIITA are known proto-oncogenes; a promoter swap mechanism is thought to increase expression of these oncogenic partner proteins, analogous to IGH translocations [103].

The rearrangement of the CIITA locus is a relatively new area of exploration in B-cell lymphomas. Further investigation into the full set of translocation partners, the location of breakpoints, and the presence of additional rearrangements in this region is required to fully understand the relevance and mechanisms of CIITA alterations.

Programmed death signalling pathways

The second stage of the anti-tumour immune response involves co-stimulation or co-inhibition of T-cell activity. A key protein in this process is PD-1. When bound to one of its ligands, PD-L1 or PD-L2, PD-1 inhibits T-cell activation and proliferation, resulting in T-cell anergy and apoptosis [67].

The genes encoding the PD-1 ligands, CD274 and PDCD1LG2, are located adjacent to each other at chromosome 9p24.1. The PDL region is frequently amplified in multiple lymphoma types, including PMBCL, cHL and DLBCL [30, 92, 101]. Amplifications cause over-expression of the ligands, leading to increased PD-1/PDL binding and decreased T-cell activity. Tumours containing PDL amplifications exhibit an immune escape phenotype, in which inactivation of the T-cells that recognize the tumour cells prevents those cells from being destroyed [22].

Recently, our group has demonstrated recurrent translocations in the PDL region in PMBCL patients and cell lines [101]. As with CIITA, the PDL locus appears to rearrange promiscuously, with multiple different partners reported [97, 101]. Translocations in the PDL region have been shown to lead to over-expression of the involved ligand, often through promoter swap with a highly active partner promoter (e.g.
CIITA, IGH) [102]. Some rearrangements also occur near the 3’ end of the genes and may cause over-expression through the introduction of downstream enhancer elements [103]. While PD-L1 expression is comparable between cases with CD274 translocations and cases with copy number aberrations, cases with PDCD1LG2 translocations show significantly higher PD-L2 expression than amplified cases, highlighting the importance of these newly discovered translocation events [101].

The mechanism by which PDL translocations arise is currently unknown. The reported breakpoint sites do not show an enrichment for AID motifs, making it unlikely that aberrant somatic hypermutation is the cause of the DSBs [103]. Further investigation of PDL breakpoint anatomy in a larger sample will be required to identify commonalities.

In recent years, the PD-1/PD-L1 axis has been a major area of interest in immunotherapy. Since many lymphomas rely on deregulation of the immune response for tumour survival, the signalling pathways mediating this response represent a potential therapeutic target. Immune checkpoint inhibitors that bind either PD-1 or PD-L1 have been developed and are now in clinical trials for lymphomas [4, 5, 112] as well as multiple solid tumour types [73]. These act as antagonists that prevent PD-1/PDL binding, thereby blocking inhibitory signalling and restoring T-cell activity against the malignant cells [73, 91]. PD-1 inhibitors have shown variable efficacy in lymphoma patients, with the greatest success in relapsed and refractory Hodgkin lymphoma, where a phase I/II trial showed an objective response rate of 87% in treated patients [4]. Trials in relapsed FL and in DLBCL patients after autologous hematopoietic stem-cell transplantation showed more modest overall response rates of 66% and 51%, respectively [5, 112].
The variable responses of patients receiving these therapies, in both lymphomas and solid cancers, indicate the need for predictive biomarkers that can identify the subset of patients most likely to benefit [91]. PD-L1 expression level has already shown some value as a predictive biomarker, and understanding the specific genomic alterations that change PDL expression may provide further predictive capability [100].

1.2 Targeted sequencing

The advent of next-generation sequencing has greatly increased our understanding of B-cell lymphoma biology and of cancer biology in general. While whole-genome sequencing (WGS) provides the most comprehensive information about a tumour genome, it has trade-offs compared with targeted sequencing approaches, which examine only a portion of the genomic space [60]. To achieve sufficient coverage depth in WGS data to identify variation above background noise, 30-fold redundant coverage of the 3 Gb human genome is typically required, representing approximately 90 Gb of aligned sequencing data [69]. This requirement is even higher for the discovery of low-frequency mutations in tumour genomes, and rare mutations may require coverage on the order of 100-fold. The large amount of sequencing required for WGS leads to lower throughput and consequently higher cost than targeted approaches [79]. The number of alterations discovered by WGS is also likely to be much larger, making it challenging to isolate functionally relevant findings in a discovery context. Furthermore, the analysis of WGS data requires significantly more computational resources (e.g. storage space, CPU hours).

In contrast, targeted sequencing approaches provide data for a limited genomic space, but require less sequencing to achieve comparable depths [79]. This also makes them amenable to multiplexed sequencing, increasing throughput and reducing both monetary and computational costs per sample [56].
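The sequencing-volume comparison above is simple arithmetic: mean depth multiplied by the size of the interrogated region, inflated by any reads lost off-target. The minimal Python sketch below reproduces the WGS figures from the text. The capture design (1 Mb target, 500-fold depth, 50% on-target reads) and the 30 Gb lane yield are hypothetical values chosen only to illustrate the scale of the difference; they are not parameters of this study or of any particular instrument.

```python
def required_bases(depth: float, target_bp: float, on_target_fraction: float = 1.0) -> float:
    """Sequenced bases needed for a mean `depth` over `target_bp`.

    Dividing by the on-target fraction accounts for reads that fall outside
    the region of interest (off-target DNA), which still cost sequencing.
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return depth * target_bp / on_target_fraction


def samples_per_lane(lane_yield_bases: float, bases_per_sample: float) -> int:
    """Number of libraries that fit in one lane when multiplexed."""
    return int(lane_yield_bases // bases_per_sample)


# WGS figures quoted in the text: 30-fold (and 100-fold) coverage of a 3 Gb genome.
wgs_30x = required_bases(30, 3e9)     # 9e10 bases (~90 Gb)
wgs_100x = required_bases(100, 3e9)   # 3e11 bases (~300 Gb)

# Hypothetical capture design and lane yield, for illustration only.
capture = required_bases(500, 1e6, on_target_fraction=0.5)  # 1e9 bases (~1 Gb)
per_lane = samples_per_lane(30e9, capture)                   # 30 libraries

print(f"WGS 30x: {wgs_30x / 1e9:.0f} Gb; WGS 100x: {wgs_100x / 1e9:.0f} Gb")
print(f"Capture: {capture / 1e9:.1f} Gb per sample; {per_lane} samples per lane")
```

Under these assumed values, a single 30-fold genome consumes as much sequencing as 90 capture libraries, which is why targeted designs lend themselves to multiplexing.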
The amount of genomic space targeted can vary based on user-specific needs, ranging from whole-exome sequencing (WES) to sequencing of a single genic region. Many companies produce custom kits that enrich DNA in user-defined regions, and this technology is well suited to cancer research since many tumour types have been shown to harbour recurrent mutations and rearrangements in a small number of loci. The range of applications for targeted sequencing is continuously being explored.

1.2.1 Methodology

While methodology varies between different approaches, targeted sequencing always begins with enrichment of the target DNA from a genomic sample. Methods for target enrichment can be separated into two broad categories: PCR-based enrichment and hybridization capture-based enrichment [60].

PCR-based enrichment involves the design of primers that flank the target region. These are used to PCR-amplify the target DNA, and the products are subsequently sequenced. This method has high specificity owing to the uniqueness of the primer pairs [60]. However, it is not well suited to large target regions, since there is an upper limit on the size of a PCR product that can be amplified (~10 kb), and larger regions require multiple tiled primer sets [56]. It is also low throughput, and amplification bias in the PCR step can result in allelic dropout, where one copy of a given locus is preferentially amplified, causing loss of the other allele. Some regions are also poorly suited to PCR-based enrichment due to factors like high GC content or secondary structures that reduce amplification efficiency [57].

Hybridization capture-based enrichment is performed using labelled oligonucleotide probes (or ‘baits’) that are designed in a tiled fashion, complementary to the region of interest (Figure 1.5). The probes are hybridized to DNA in the genomic library and are used to isolate bound fragments (i.e. those originating in the target region) and remove unbound fragments.
The bound fragments can then be washed off and sequenced. Hybrid capture is better suited to larger target spaces since the probe design is easily scalable, and it is useful for investigating regions with frequent alterations but no defined hotspot. Furthermore, it can interrogate multiple regions concurrently and retains high sensitivity [56]. However, in some instances it can be less specific than PCR-based methods, because the capture probes often bind, through partial complementarity, to fragments outside the target space [60]. As a result, the enriched portion after capture often includes some DNA that does not come from the region of interest (“off-target” DNA).

Two general methods of hybrid capture enrichment have been developed: an array-based approach and a solution-based approach [56]. In the array-based approach, probes are immobilized on an array and sample DNA is washed over the surface. This requires specialized equipment, is somewhat labour-intensive and requires a large amount of starting sample material. The solution-based approach requires less DNA since it relies on a much larger ratio of probes to genomic DNA. Benchmarking studies have demonstrated that solution-based capture yields higher coverage of the target space and higher uniformity, especially when the captured region is relatively small (Figure 1.6). Solution-based capture is also less labour-intensive. Furthermore, this approach is well suited to multiplexing, and allows pooling of multiple samples prior to capture and sequencing. One of the most popular commercial kits available for hybrid capture is the Agilent SureSelect, and this protocol has been shown to be highly effective compared with other kits in a benchmarking study of whole-exome platforms [59].

Figure 1.5: Agilent SureSelect kit design and methodology. (A) Custom target regions for capture (red box) can be designed that fall in any genomic region of interest (grey box). (B) The Agilent SureSelect kit is created by designing tiled biotinylated RNA probes complementary to the chosen capture region. Probes will hybridize with fragments in the DNA library that come from the target space, and these fragments can be isolated by capture with streptavidin-coated beads. The isolated target DNA can then be sequenced. Blue lines represent RNA probes and yellow circles represent biotin labels. Red lines represent DNA in the target space, and black lines represent non-target DNA. This figure was adapted from the SureSelect Workflow diagram on the Agilent website.

Figure 1.6: “Coverage plot for array and solution hybrid capture, for 3.5 Mb of exonic target and whole human exome. Values were taken from five independent array and solution experiments, using the same CTR, with each capture using a different DNA sample, and each yielding roughly 10⁷ mappable sequences per lane. One lane of sequencing was used for 3.5 Mb captures, whereas two or three lanes were used for the whole exome. Error bars, s.d. (n = 5).” NOTE: This figure is an exact reproduction with permission of Figure 4 in Mamanova et al. 2010. CTR: capture target region.

In general, the suitability of each targeted sequencing approach depends on the specific application. Critical considerations include the size of the desired target space (a region greater than ~10 kb is better suited to hybrid capture), the number of samples (multiplexing is possible with solution-based hybrid capture), the amount of available starting material (array-based hybrid capture requires more material), and the type of analysis that will be performed (e.g. SNV vs. CNA vs.
SV). In particular, the detection of SVs, including translocations where the partner region is unknown, is not feasible with a PCR-based approach, since primer design requires knowledge of the sequence on both sides of a breakpoint. A hybrid capture approach can be used for this purpose, since capture probes can bind any DNA fragment containing complementary sequence, including fragments that span an SV breakpoint with one end in the target space (see Section ).

Applications

The applications of targeted sequencing continue to be explored and expanded. Many studies have demonstrated the utility of this method for identifying SNVs, CNAs and SVs in genomic data. At the whole-exome level, this approach has been used to describe the mutational landscape of several tumour types and to elucidate novel oncogenic pathways. Such studies have investigated solid tumours and hematologic malignancies, including prostate cancer, melanoma, breast cancer, colorectal cancer, chronic lymphocytic leukemia and non-Hodgkin lymphoma [7, 63, 75, 109, 116]. Many more studies have performed focused analyses interrogating specific cancer-related genes [26, 32, 74]. Targeted sequencing approaches are increasingly being explored as potential clinical tools, with the ultimate goal of identifying patient-specific genomic alterations that can guide treatment strategies [1, 18, 60, 79, 105, 106].

Detecting genomic rearrangements

A major focus of study in cancer genomics is the identification of structural variants, including inter-chromosomal translocations and large intra-chromosomal deletions, inversions and duplications. This is particularly relevant in lymphomas due to the role of recurrent translocations in lymphomagenesis.

A commonly used method for detecting SVs in lymphoma samples is the fluorescence in situ hybridization (FISH) break-apart assay. In this method, differentially labelled probes (e.g. red and green) are designed to hybridize on either side of a region where rearrangements are expected (Figure 1.7A).
The probes are hybridized to tumour samples and the nuclei of the tumour cells are visualized under a microscope (Figure 1.7B). If the cells are wild-type (i.e. no rearrangement in the region of interest), the two labelled regions lie next to each other in the nuclei, producing adjacent red and green signals. If a cell harbours a rearrangement, the two labelled regions become spatially separated, causing a separation of the two coloured signals. This is referred to as a “break-apart” pattern. The absence of one coloured signal is also typically classified as break-apart positive, since it indicates an abnormality in one of the probe binding regions, suggestive of a chromosomal break as part of an unbalanced rearrangement.

Figure 1.7: Schematic of the FISH break-apart assays used to detect rearrangements. (A) Differentially labelled probes flanking the gene or region of interest are hybridized to tissue specimens. (B) Break-apart positive cases harbouring rearrangements in the region of interest are characterized by nuclei with separation of the red and green signals or deletion of one signal, while normal alleles yield adjacent signals.

The currently reported genomic rearrangements in CIITA and the PDLs were initially discovered through FISH break-apart analysis [97, 101, 102]. While FISH assays provide high-level information about rearrangement status, they cannot locate a rearrangement breakpoint exactly, making it impossible to understand specific breakpoint anatomy. In most cases, published breakpoint information has been obtained using RNA-seq analysis to identify the partner region, followed by PCR experiments that rely on amplification and Sanger sequencing of DNA from fresh-frozen tissue to find the exact breakpoint location.

Only in the past decade has targeted capture-based sequencing been explored as a potential tool for SV detection.
In 2007, it was first demonstrated that mapping of paired-end sequencing reads could be used to identify SVs in human genomes [41]. The authors described how discordantly mapped read pairs (reads mapping to two chromosomes, an abnormal distance between mapped reads, or reads mapping in unexpected orientations) could be used to identify translocations, deletions, inversions and tandem duplications in the genome (Figure 1.8). Additional SV detection algorithms have since been developed that incorporate data from reads that sequence across a rearrangement breakpoint (“split reads”), or that use assembly-based methods rather than read mapping (see Section 2.3) [3]. The number of SV detection tools is now so large that international benchmarking challenges comparing the sensitivity and specificity of various algorithms have been performed (Synapse: syn312572).

Figure 1.8: Capture of DNA fragments spanning rearrangement breakpoints can identify structural variants. Black fragments represent individual reads and dashed lines join paired reads. (A) Translocations with one side in the target space can be sequenced to identify the unknown fusion partner. (B) Intra-chromosomal SVs that have one or both ends in the capture space can be identified based on paired read mapping information such as the distance between mapped reads.

The utility of hybridization capture-based sequencing for investigating genomic rearrangements is becoming more widely appreciated in cancer genomics, though it is generally considered a more difficult task than identifying point mutations and copy number aberrations [60]. Studies have used custom capture designs targeting frequently rearranged genes, such as ALK in lung cancers and KMT2A in acute leukemias, to identify novel translocations in these regions at base-pair resolution [1].
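The read-pair signatures described above can be expressed as a small decision rule. The Python below is a deliberately simplified sketch, not the logic of any specific published caller: the `ReadPair` record and the fixed 1,000 bp `max_distance` threshold are hypothetical, it assumes a forward-reverse (inward-facing) paired-end library, and it ignores mapping quality, split reads and the clustering of multiple supporting pairs that real SV tools rely on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadPair:
    """Simplified alignment of one read pair (hypothetical record, not a BAM API)."""
    chrom1: str
    pos1: int      # leftmost mapped coordinate of read 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, max_distance: int = 1000) -> Optional[str]:
    """Return a candidate SV class for a discordant pair, or None if concordant.

    Assumes a forward-reverse library: a concordant pair maps to one
    chromosome with the left read on "+", the right read on "-", and the
    two reads no more than `max_distance` bp apart.
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation"            # reads on two chromosomes
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if left_strand == right_strand:
        return "inversion"                # same orientation (++ or --)
    if left_strand == "-" and right_strand == "+":
        return "tandem_duplication"       # reads point away from each other
    if right_pos - left_pos > max_distance:
        return "deletion"                 # expected orientation, too far apart
    return None


if __name__ == "__main__":
    examples = [
        ReadPair("chr16", 10_990_000, "+", "chr9", 5_460_000, "-"),
        ReadPair("chr9", 5_000, "+", "chr9", 5_300, "-"),
        ReadPair("chr9", 5_000, "+", "chr9", 50_000, "-"),
        ReadPair("chr9", 5_000, "+", "chr9", 50_000, "+"),
        ReadPair("chr9", 5_000, "-", "chr9", 50_000, "+"),
    ]
    for p in examples:
        print(classify_pair(p))
```

A single discordant pair is weak evidence on its own; in practice, callers look for several pairs that cluster at the same two loci before reporting an event, and may refine the breakpoint to base-pair resolution using split reads.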
Some groups have also designed custom cancer panels that include many cancer-related genes shown to have rearrangements with prognostic relevance in specific tumour types [10, 24, 95].

Utility for archival tissues

As sequencing becomes more feasible for clinical use, its performance under conditions typical of clinical settings needs to be explored [37]. Many of the landmark studies that have described the mutational landscape of tumours have relied on fresh frozen tumour samples for sequencing. However, in routine clinical practice the standard tissue preservation approach is formalin fixation and paraffin embedding (FFPE) [28]. This method is logistically preferable to fresh freezing because it takes up less physical space, requires less monitoring and does not require freezers. Therefore, FFPE tissue repositories are generally much larger than biobanks of fresh-frozen material, increasing statistical power for correlative studies and allowing the study of relatively rare subtypes [11]. Moreover, methodologies and findings might be directly applicable to clinical assay development using routinely acquired FFPE tissue. However, FFPE introduces challenges in downstream analysis, because DNA prepared with this method is highly susceptible to fragmentation and degradation, both before fixation and as a product of cross-linking [11, 46, 118].

Targeted capture-based sequencing of FFPE tissue has been widely reported for the identification of SNVs, CNAs, and small insertions and deletions (indels). One group used WES of FFPE patient samples to identify SNVs and CNAs, including clinically actionable mutations, with power comparable to that achieved with frozen tissue [105]. Custom capture kits have also been used that target actionable or druggable genes [106], known kinase genes [115], mutated genes specific to individual tumour types [50], and known oncogenes and tumour suppressors [54].
These have been used with great success to identify SNVs, CNAs and indels across multiple tumour types.

Fewer studies have used targeted sequencing of FFPE tissue to identify SVs, as the difficulty of SV detection is exacerbated by the shorter fragments in degraded FFPE DNA. However, this has been successfully explored in a few recent reports. Pritchard et al. identified clinically relevant ALK fusions and a ROS inversion in FFPE lung tumours using a custom capture design targeting clinically actionable genes [74]. Another group also detected ALK rearrangements in lung cancers and identified novel translocation partners by targeting the disease-related GPS V2 gene set [1]. Beltran et al. used their own cancer panel to identify known recurrent ERG rearrangements and a novel translocation in the BRAF oncogene in FFPE prostate tumours [10].

The applicability of capture-based sequencing for SV detection in FFPE tissue remains unexplored in B-cell lymphomas. This methodology is ideally suited to lymphomas due to the high recurrence of rearrangements and the large repository of FFPE tissue samples available at the BC Cancer Agency.

1.3 Research questions and project aims

While genomic rearrangements in B-cell lymphomas have been shown to have biological and clinical relevance, several open questions remain in our current understanding of CIITA and PDL alterations. Firstly, while FISH-based approaches have demonstrated that B-cell lymphomas are frequently rearranged at these loci, the specific patterns of rearrangement breakpoints and the landscape of CIITA and PDL fusion partners have not been well characterized. These loci have been hypothesized to rearrange promiscuously (i.e. with multiple partner loci) based on individual reported cases, but no study with a sufficient sample size to test this hypothesis has been performed.
Furthermore, rearrangements may occur that are below the detection resolution of FISH, producing too small a separation or deletion of FISH probes to be observable with break-apart assays. Rearrangements of this nature have never been assessed, and their frequency has not been explored. A key reason for the lack of comprehensive characterization is that no strategies capable of using archival FFPE tissue specimens have been developed, limiting the number of samples available for analysis.

I contended that a capture-based sequencing approach could be applied to overcome these obstacles. Specifically, I hypothesized that:
• Targeted capture-based sequencing of the CIITA and PDL loci in B-cell lymphomas can confirm the presence of genomic rearrangements, reveal breakpoint anatomy at base-pair resolution, and identify novel rearrangement partners. Furthermore, this methodology can be applied to FFPE tumour tissue as a tool for discovery;
• Understanding breakpoint anatomy and the landscape of rearrangement fusion partners will provide information about the functional effects of CIITA and PDL rearrangements.

In designing a strategy to test these hypotheses, I attempted to answer three main research questions:
1. Can a capture-based sequencing approach be effectively applied to FFPE lymphoma tissue for rearrangement detection? (Feasibility)
2. Do rearrangements in B-cell lymphomas show recurrent patterns? (Recurrent rearrangement patterns)
3. Are there functional or clinical implications that can be elucidated by examining rearrangements at this level of resolution?
(Functional impact)

To answer these questions, my thesis work focused on developing, optimizing and applying a rearrangement detection and analysis workflow to a large number of FFPE lymphoma specimens.

Chapter 2
Study Design and Methodology

For all software steps described in this chapter, the full set of non-default parameters used can be found in Table B.1 in Appendix B.

2.1 Sample selection and library construction

When selecting cases for investigation, we sought to identify patients in whom we expected to find a rearrangement in either the CIITA or PDL loci. This allowed us to address our research questions by including enough patients to assess recurrent patterns in our findings, and by enriching for cases in which we could test the sensitivity of our capture-based method for detecting rearrangements and novel partners. We used results from FISH break-apart assays to select these patients.

FISH break-apart assays were performed as previously published, using tissue microarrays (TMAs) containing cores from each specimen [97, 101]. Briefly, TMA sections were stained with green- and red-labelled break-apart bacterial artificial chromosome (BAC) probes flanking the region of interest. After hybridization, slides were imaged and 200 interphase nuclei were assessed for each specimen. For specimens with examinable patterns, rearrangement status was determined, and cases were classified as break-apart positive if >5% of nuclei had split signals or patterns indicative of unbalanced rearrangements of the genomic locus (e.g. loss of one signal). See Appendix A for additional details.

Figure 2.1: FISH break-apart assays for the CIITA and PDL loci. Labelled probes flanking the (A) CIITA and (B) PDL loci were hybridized to tumour specimens on tissue microarrays.
After hybridization, nuclei were observed and classified as break-apart positive or negative.

In a translational research setting, FISH break-apart assays are routinely applied to lymphoma specimens to assess the rearrangement status of multiple loci, including CIITA, the PDL locus, and two additional recurrently rearranged genes, TBL1XR1 and TP63 (Figure 2.1)¹ [89]. Table 2.1 describes the coordinates of the BAC probes used for these assays. Specimens were selected if they had a break-apart positive status at any of these four loci. In addition, five samples were included because they showed over-expression of one or both PD-1 ligands by qRT-PCR but did not have a break-apart positive signal pattern [101]. Patients harbouring only TBL1XR1 or TP63 rearrangements, or showing over-expression, served as negative controls in our experiment, since we did not expect to find CIITA or PDL rearrangements in these cases.

In total, 92 cases were selected for investigation. These included 41 PMBCLs, 39 DLBCLs, 11 FLs, and 1 cHL cell line (L-1236). Of the 91 clinical cases, 85 were available as FFPE tissue and the remaining 6 were preserved as fresh frozen samples. Table 2.2 describes the samples and their break-apart status at the four loci of interest.

¹ These figures were adapted from schematics generated by Susana Ben-Neriah.

Table 2.1: Chromosomal coordinates (hg19) of the BAC probes used in FISH break-apart staining.
Shading in the “BAC name” column shows the fluorescence colour used.

| Locus | Chromosome | BAC name | Start | End | Length (bp) |
|---|---|---|---|---|---|
| PDL | 9 | RP11-963L3 | 5,157,912 | 5,297,920 | 140,009 |
| | 9 | RP11-12D24 | 5,297,921 | 5,454,512 | 156,592 |
| | | CD274 (PDL1) gene | 5,450,503 | 5,470,567 | 20,065 |
| | | PDCD1LG2 (PDL2) gene | 5,510,545 | 5,571,282 | 60,738 |
| | 9 | RP11-207C16 | 5,649,499 | 5,819,060 | 169,562 |
| | 9 | RP11-845C2 | 5,838,250 | 6,013,270 | 175,021 |
| CIITA | 16 | RP11-109M19 | 10,680,358 | 10,866,730 | 186,373 |
| | | CIITA gene | 10,971,055 | 11,018,840 | 47,786 |
| | 16 | RP11-66H6 | 11,036,514 | 11,203,598 | 167,085 |
| TBL1XR1 | 3 | RP11-1148M8 | 176,526,100 | 176,685,112 | 159,013 |
| | | TBL1XR1 gene | 176,738,542 | 176,915,048 | 176,507 |
| | 3 | RP11-1065H24 | 177,054,192 | 177,253,920 | 199,729 |
| TP63 | 3 | RP11-24F1 | 189,161,021 | 189,338,343 | 177,323 |
| | | TP63 gene | 189,349,216 | 189,615,068 | 265,853 |
| | 3 | RP11-53D15 | 189,675,419 | 189,769,543 | 94,125 |

Table 2.2: Description of the 92 patients selected for investigation. A break-apart status of 1 indicates a positive break-apart signal, 0 represents a normal signal, and ‘NA’ indicates a FISH assay was not performed. A library ID of ‘NA’ indicates the sample failed library construction (discussed in Section 3.1.1).
BA: break-apart, NA: not applicable, DLBCL: diffuse large B-cell lymphoma, PMBCL: primary mediastinal B-cell lymphoma, FL: follicular lymphoma, cHL: classical Hodgkin lymphoma, O-EX: over-expressed, FFPE: formalin-fixed paraffin-embedded.

| Case ID | Library ID | Disease | CIITA BA status | PDL BA status | TP63 BA status | TBL1XR1 BA status | Tissue preparation | Sequencing batch | Reads |
|---|---|---|---|---|---|---|---|---|---|
| BC_077 | NA | DLBCL | 1 | 0 | 0 | 0 | FFPE | NA | NA |
| BC_078 | A43029 | DLBCL | 1 | 0 | 0 | 0 | FFPE | 2 | 29,099,284 |
| BC_069 | A43030 | PMBCL | 1 | 1 | 0 | 0 | FFPE | 2 | 33,385,716 |
| BC_079 | A43031 | DLBCL | 1 | 0 | 0 | 0 | FFPE | 2 | 24,367,610 |
| BC_080 | A43032 | DLBCL | 1 | 0 | 0 | 0 | FFPE | 1 | 18,365,200 |
| BC_081 | A43033 | DLBCL | 1 | 0 | 0 | 0 | FFPE | 1 | 16,714,140 |
| BC_082 | A43034 | DLBCL | 1 | 0 | 0 | 0 | FFPE | 2 | 25,282,130 |
| BC_072 | NA | PMBCL | 1 | 0 | NA | NA | FFPE | NA | NA |
| BC_093 | A43036 | PMBCL | 1 | 0 | NA | NA | Frozen | 1 | 22,844,204 |
| BC_094 | A43037 | PMBCL | 1 | 0 | NA | NA | Frozen | 1 | 83,581,936 |
| BC_083 | A43038 | DLBCL | 0 | 1 | 0 | 0 | FFPE | 2 | 26,849,204 |
| BC_073 | NA | DLBCL | 0 | 1 | 0 | 0 | FFPE | NA | NA |
| BC_091 | NA | DLBCL | 0 | 1 | 0 | 0 | FFPE | NA | NA |
| BC_092 | A43041 | DLBCL | 0 | 1 | 0 | 0 | FFPE | 2 | 27,498,186 |
| BC_084 | A43042 | DLBCL | 0 | 1 | 0 | 0 | FFPE | 1 | 24,937,322 |
| BC_095 | A43043 | PMBCL | 0 | 1 | NA | NA | Frozen | 1 | 20,380,282 |
| BC_085 | NA | DLBCL | 0 | O-EX | NA | 1 | FFPE | NA | NA |
| BC_086 | A43045 | DLBCL | 0 | 0 | 0 | 1 | FFPE | 2 | 26,489,992 |
| BC_087 | A43046 | DLBCL | 0 | 0 | 0 | 1 | FFPE | 2 | 30,854,814 |
| BC_088 | A43047 | DLBCL | 0 | 0 | 1 | 1 | FFPE | 2 | 20,123,804 |
| BC_001 | NA | PMBCL | 1 | 0 | 0 | 0 | FFPE | NA | NA |
| BC_002 | A43049 | PMBCL | 1 | 0 | 0 | 0 | FFPE | 2 | 30,604,788 |
| BC_003 | A43050 | PMBCL | 1 | 0 | 0 | 0 | FFPE | 1 | 21,416,260 |
| BC_004 | A43051 | DLBCL | 1 | 0 | 0 | 0 | FFPE | 2 | 34,707,102 |
| BC_005 | A43052 | DLBCL | 1 | 0 | 0 | 0 | FFPE | 2 | 27,384,062 |
| BC_006 | A43053 | DLBCL | 1 | 0 | 0 | 0 | FFPE | 2 | 19,988,266 |
| BC_007 | NA | FL | 1 | 0 | 0 | 0 | FFPE | NA | NA |
| BC_008 | NA | FL | 1 | 0 | 0 | 0 | FFPE | NA | NA |
| BC_009 | NA | FL | 1 | 0 | 0 | 0 | FFPE | NA | NA |
| BC_010 | NA | FL | 1 | 0 | 0 | 0 | FFPE | NA | NA |
| BC_011 | NA | FL | 1 | 0 | 0 | 0 | FFPE | NA | NA |
| BC_012 | NA | FL | 1 | 0 | 0 | 0 | FFPE | NA | NA |
| BC_013 | NA | FL | 1 | 0 | 0 | 0 | FFPE | NA | NA |
| BC_014 | NA | FL | 1 | 0 | 0 | 1 | FFPE | NA | NA |
| BC_015 | NA | FL | 1 | 0 | 0 | 0 | FFPE | NA | NA |
| BC_016 | NA | FL | 1 | 0 | 0 | 0 | FFPE | NA | NA |
| BC_017 | NA | PMBCL | 1 | 0 | NA | NA | FFPE | NA | NA |
| BC_018 | NA | PMBCL | 1 | 0 | NA | NA | FFPE | NA | NA |
| BC_019 | NA | PMBCL | 1 | 0 | NA | NA | FFPE | NA | NA |
| BC_020 | A43067 | PMBCL | 1 | 0 | NA | NA | FFPE | 1 | 20,980,184 |
| BC_021 | A43068 | PMBCL | 1 | 0 | NA | NA | FFPE | 2 | 26,260,090 |
| BC_022 | A43069 | PMBCL | 1 | 0 | NA | NA | FFPE | 1 | 26,637,312 |
| BC_023 | A43070 | PMBCL | 1 | 0 | NA | NA | FFPE | 2 | 32,798,066 |
| BC_024 | A43071 | PMBCL | 1 | 1 | NA | NA | FFPE | 2 | 23,983,668 |
| BC_025 | A43072 | PMBCL | 1 | 0 | NA | NA | FFPE | 2 | 25,715,502 |
| BC_026 | A43073 | PMBCL | 1 | 0 | NA | NA | FFPE | 1 | 21,280,568 |
| BC_027 | A43074 | PMBCL | 1 | 1 | NA | NA | FFPE | 2 | 31,268,866 |
| BC_028 | A43075 | PMBCL | 1 | 0 | NA | NA | FFPE | 2 | 33,042,774 |
| BC_029 | A43076 | PMBCL | 1 | 0 | NA | NA | FFPE | 2 | 30,669,416 |
| BC_030 | A43077 | PMBCL | 1 | 0 | NA | NA | FFPE | 1 | 19,505,062 |
| BC_031 | A43078 | PMBCL | 1 | 0 | NA | NA | FFPE | 2 | 38,848,866 |
| BC_032 | A43079 | PMBCL | 1 | 1 | NA | NA | FFPE | 2 | 31,723,494 |
| BC_033 | A43080 | PMBCL | 1 | 0 | NA | NA | FFPE | 2 | 25,873,248 |
| BC_034 | A43081 | PMBCL | 1 | 0 | NA | NA | FFPE | 1 | 17,208,682 |
| BC_035 | A43082 | PMBCL | 0 | 1 | 0 | 0 | FFPE | 2 | 30,733,728 |
| BC_036 | A43083 | DLBCL | 0 | 1 | 0 | 0 | FFPE | 2 | 20,632,328 |
| BC_037 | A43084 | PMBCL | 0 | 1 | 0 | 0 | FFPE | 2 | 29,466,520 |
| BC_038 | A43085 | DLBCL | 0 | 1 | 0 | 0 | FFPE | 2 | 38,320,144 |
| BC_039 | A43086 | DLBCL | 0 | 1 | 0 | 0 | FFPE | 2 | 36,915,282 |
| BC_040 | A43087 | DLBCL | 0 | 1 | 0 | 0 | FFPE | 2 | 34,875,622 |
| BC_041 | A43088 | DLBCL | 0 | 1 | 0 | 0 | FFPE | 2 | 36,333,466 |
| BC_042 | A43089 | DLBCL | 0 | 1 | 0 | 0 | FFPE | 2 | 32,349,052 |
| BC_043 | A43090 | PMBCL | 0 | 1 | NA | NA | FFPE | 2 | 27,833,264 |
| BC_044 | A43091 | PMBCL | 0 | 1 | NA | NA | FFPE | 2 | 28,187,526 |
| BC_045 | A43092 | PMBCL | 0 | 1 | NA | NA | FFPE | 2 | 28,087,804 |
| BC_046 | A43093 | PMBCL | 0 | 1 | NA | NA | FFPE | 2 | 28,008,090 |
| BC_047 | A43094 | PMBCL | 0 | 1 | NA | NA | FFPE | 2 | 29,605,210 |
| BC_048 | A43095 | PMBCL | 0 | 1 | NA | NA | FFPE | 2 | 33,427,098 |
| BC_049 | A43096 | PMBCL | 0 | 1 | NA | NA | Frozen | 1 | 22,146,670 |
| BC_050 | A43097 | PMBCL | 0 | 1 | NA | NA | FFPE | 2 | 29,934,310 |
| BC_051 | NA | PMBCL | 0 | 1 | NA | NA | FFPE | NA | NA |
| BC_052 | A43099 | DLBCL | 0 | 1 | NA | NA | FFPE | 2 | 39,860,280 |
| BC_053 | A43100 | PMBCL | 0 | O-EX | NA | NA | FFPE | 2 | 32,110,152 |
| BC_054 | A43101 | PMBCL | 0 | O-EX | NA | NA | FFPE | 2 | 38,178,636 |
| BC_55B | A43102 | DLBCL | 0 | 0 | 1 | 0 | FFPE | 2 | 44,623,430 |
| BC_056 | A43103 | DLBCL | 0 | O-EX | NA | NA | FFPE | 2 | 26,688,558 |
| BC_057 | A43104 | DLBCL | 0 | 0 | 0 | 1 | FFPE | 2 | 26,897,756 |
| BC_058 | A43105 | DLBCL | 0 | 0 | 0 | 1 | FFPE | 2 | 15,968,588 |
| BC_061 | A43106 | DLBCL | 0 | 0 | 1 | 1 | FFPE | 2 | 34,222,570 |
| BC_062 | A43107 | DLBCL | 0 | 0 | 1 | 1 | FFPE | 2 | 36,120,840 |
| BC_063 | A43108 | DLBCL | 0 | 0 | 0 | 1 | FFPE | 2 | 34,118,112 |
| BC_064 | A43109 | DLBCL | 0 | 0 | 1 | 0 | FFPE | 2 | 33,381,920 |
| BC_065 | A43110 | DLBCL | 0 | 1 | 1 | 1 | FFPE | 2 | 36,768,436 |
| BC_066 | A43111 | DLBCL | 0 | 0 | 1 | 0 | FFPE | 2 | 33,127,150 |
| BC_076 | NA | DLBCL | 0 | 0 | 1 | 1 | FFPE | NA | NA |
| BC_074 | NA | DLBCL | 1 | 0 | 0 | 0 | FFPE | NA | NA |
| BC_075 | NA | DLBCL | 1 | 0 | 0 | 0 | FFPE | NA | NA |
| BC_097 | A43115 | PMBCL | 1 | 1 | NA | NA | Frozen | 1 | 22,986,098 |
| BC_060 | NA | DLBCL | 0 | 0 | 1 | 1 | FFPE | NA | NA |
| BC_071 | A43117 | cHL | 0 | 1 | NA | NA | Cell line | 1 | 28,873,216 |
| BC_099 | A43118 | PMBCL | 0 | O-EX | NA | NA | Frozen | 1 | 25,885,766 |
| BC_089 | A43119 | FL | 1 | 0 | NA | NA | FFPE | 2 | 29,793,978 |

DNA extraction was performed by Anja Mottok and Susana Ben-Neriah at the Centre for Lymphoid Cancer, and library preparation was then attempted for all 92 samples at the Genome Sciences Centre (GSC; see Appendix A for details). Twenty-four samples failed library preparation because either an insufficient amount of DNA was extracted or the DNA fragments were too small for sequencing (‘NA’ library IDs in Table 2.2). The remaining 68 libraries were successful, and unique hexamer barcodes were introduced to each library to allow multiplexing during the sequencing step. The 68 successful libraries consisted of 35 PMBCLs, 31 DLBCLs, 1 FL and the cHL cell line (L-1236) [36, 114].

2.2 Capture design and sequencing protocol

To enrich for the regions of interest, a custom Agilent SureSelect protocol was designed. We chose this solution-based hybrid capture approach for several reasons. First, it allowed us to detect SVs with unknown partners, and it was amenable to multiplexing, which suited our large number of samples. It was also well suited to small amounts of input DNA, which are typical of libraries created from FFPE tissue, because the nucleic acids in these specimens are highly degraded and the number of fragments long enough for sequencing is more limited than in frozen tissue.
Finally, it could easily capture the fairly large region that we wanted to investigate (on the order of 500 kb). This included the full genes and their surrounding non-coding space, since no confirmed hotspot regions had been previously described.

Target coordinates were chosen based on previously described rearrangement breakpoints and included three regions. Two regions were located on chromosome 16: one spanning the CIITA gene region and the other located downstream, spanning the SOCS1 gene region. SOCS1 is a tumour suppressor that regulates JAK/STAT signalling and is frequently mutated in multiple lymphoma types. Typically, these mutations are inactivating and prevent the formation of a functional SOCS1 protein [61, 65, 97, 111]. The third region spanned the adjacent CD274 and PDCD1LG2 genes on chromosome 9 and was designed to include all previously described PDL translocation breakpoints [97, 101, 102]. Table 2.3 contains the exact target coordinates used in the capture design, which spanned approximately 0.46 Mb (459,880 bp) of genomic space in total.

An Agilent SureSelect custom capture assay was created consisting of labelled RNA probes complementary to the target space (Figure 1.5, Figure 2.2). Probes were designed using 5x tiling and standard boosting, and no masking of the repetitive regions was used during the probe design.

Table 2.3: Coordinates of the capture space used in the custom Agilent SureSelect design (hg19).

| Region | Chromosome | Start | End | Length (bp) |
|---|---|---|---|---|
| CIITA | 16 | 10,959,585 | 11,277,301 | 317,717 |
| SOCS1 | 16 | 11,332,082 | 11,350,098 | 18,017 |
| PDL | 9 | 5,449,434 | 5,573,579 | 124,146 |
| Total | | | | 459,880 |

Figure 2.2: Target regions selected for custom Agilent SureSelect design. Three custom target regions were chosen surrounding the CIITA, SOCS1 and PDL (CD274 and PDCD1LG2) loci. X-axes represent genomic coordinates, and grey boxes represent the location of gene regions. Coloured boxes above show the selected target space.
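The lengths reported in Table 2.3 (and in Table 2.1) are consistent with treating both the start and the end coordinate as inclusive, i.e. length = end - start + 1. The minimal Python sketch below recomputes the capture-space lengths and their total under that assumption. The coordinates are copied from Table 2.3; the names `CAPTURE_REGIONS`, `inclusive_length` and `region_lengths` are illustrative only.

```python
# Capture-space coordinates (hg19) copied from Table 2.3.
# Assumption: coordinates are 1-based and inclusive at both ends,
# so each region covers end - start + 1 bases.
CAPTURE_REGIONS = {
    "CIITA": ("16", 10_959_585, 11_277_301),
    "SOCS1": ("16", 11_332_082, 11_350_098),
    "PDL": ("9", 5_449_434, 5_573_579),
}


def inclusive_length(start: int, end: int) -> int:
    """Number of bases in the closed interval [start, end]."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def region_lengths(regions: dict) -> dict:
    """Map each region name to its inclusive length in bp."""
    return {name: inclusive_length(start, end)
            for name, (_chrom, start, end) in regions.items()}


if __name__ == "__main__":
    lengths = region_lengths(CAPTURE_REGIONS)
    for name, length in lengths.items():
        print(f"{name}\t{length:,} bp")
    total = sum(lengths.values())
    print(f"Total\t{total:,} bp ({total / 1e6:.2f} Mb)")
```

Under a half-open (BED-style, 0-based start) convention the length would instead be end - start, which would make every tabulated value one base shorter; the reported values match the inclusive convention.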
The decision not to mask multi-mapping probes increased our capacity to identify breakpoints located in repetitive regions, but also likely resulted in a higher off-target capture rate (described further in Section 3.1.2).

Sequencing was performed in two batches (listed in the ‘Sequencing batch’ column of Table 2.2). The first batch contained the 16 libraries with the highest library yield. These cases were pooled prior to capture, then sequenced on a single lane of an Illumina HiSeq 2500 using version 3 chemistry. This produced paired-end 100 bp reads that were de-multiplexed back into individual libraries. The second sequencing batch contained the remaining 52 libraries, which were pooled into two groups of 17 and one group of 18 prior to capture and sequencing. Lanes in the second batch were sequenced using version 4 chemistry, yielding paired-end 125 bp reads. Sequencing produced an average of 25.9 million reads for samples in the first batch and 30.4 million reads for samples in the second batch (Table 2.2).

FASTQ files were filtered to remove reads failing Illumina chastity filters and aligned to the Ensembl human GRCh37 genome using bwa mem (v0.7.5a) [48]. Duplicate reads were flagged using the bammarkduplicates function from the biobambam tool (v0.0.185) [99]. Target coverage and additional quality metrics were calculated using the CalculateHsMetrics function in the Picard tool suite (v1.126). Per-base coverage was assessed using the coverage function in the bedtools suite (v2.17.0) [76].

2.3 Structural variant detection and filtering

Structural variant detection was performed using existing tool suites. Two classes of algorithms were utilized for this purpose: alignment-based algorithms (Section 2.3.1) and assembly-based algorithms (Section 2.3.3).
These are able to identify inter-chromosomal translocations in addition to intra-chromosomal deletion, duplication and inversion events using sequence and paired-end information.

2.3.1 Alignment-based analysis

One class of algorithms for identifying SVs utilizes an alignment-based approach. In these methods, paired-end sequencing reads are first aligned to the reference genome, and the alignment information within reads and within read pairs is used to identify mappings indicative of rearrangements. To improve the sensitivity of SV detection, we explored the use of multiple alignment-based tools in combination. SV analysis was performed in two batches corresponding to the sequencing batches, the first containing 16 samples and the second containing 52 samples.

The first tool utilized was DELLY (v0.5.5) [78]. This tool was chosen based on the results of previous benchmarking studies, including the recent ICGC-TCGA DREAM Somatic Mutation Calling Challenge, where DELLY ranked in the top 3 submissions (based on balanced accuracy) in multiple synthetic data sets (Synapse ID syn312572). The DELLY algorithm works in two stages, the first utilizing paired-end information and the second utilizing single-end data. In the first stage, “spanning” paired-end reads are identified, in which one read aligns on one side of an SV and the other read aligns on the opposite side. Different classes of rearrangements yield different characteristic patterns when mapped to the reference genome (Figure 2.3). Deletions are represented by pairs with an abnormally large insert size compared to the median, inversions and duplications are represented by pairs with abnormal read-pair orientations, and translocations are represented by pairs that map to different chromosomes. The second stage identifies single-end “split” reads that sequence across the breakpoint.
These are found by isolating unmapped reads that have a mapped mate, and matching the mapped read to the regions flanking potential SVs identified in the first stage. The unmapped reads represent possible split reads; split reads matching each potential SV are then grouped and used to determine the consensus breakpoint sequence and the exact breakpoint locations of the SV. DELLY is run separately for each of the four SV types. Each predicted SV is listed in the output along with multiple quality measures describing the probability that the variant is real, and counts of the spanning and split reads supporting the variant. We ran DELLY on each sample individually for each SV type to obtain raw predictions. These were filtered to retain only predictions with a ‘PASS’ filter value, indicating that they had support from at least 3 spanning reads (5 for translocations) with an average mapping quality of at least 20. A further filter removed predictions flagged as ‘IMPRECISE’, indicating that they had no split-read support and the exact breakpoint could not be determined. DELLY breakpoint coordinates were then annotated with SnpEff (v3.5) using canonical transcripts [17]. The results from DELLY were collapsed for all samples within each batch, so that events with identical breakpoints (same chromosome, strand and position) were considered to be the same.

The next tool tested was the deStruct tool (v0.1.3). This tool is currently unpublished, but its methodology has been briefly described in the supplemental methods of Eirew et al. 2014:

“In brief, discordant and non-mapping reads were extracted from bam files and re-aligned using a seed and extend strategy. Split alignment across a putative breakpoint was attempted for reads that did not fully align to a single loci. Discordant alignments were clustered according to the likelihood they were produced from the same breakpoint.
Multiple mapped reads were assigned to a single mapping location using previously described methods [34, 58]. Finally, heuristic filters removed predicted breakpoints with poor discordant read coverage of sequence flanking predicted breakpoints.” [25]

Figure 2.3: Different types of structural variants introduce characteristic abnormalities in read-pair mappings. (A) Deletions result in pairs with the expected orientation (one forward and one reverse), but aligning with an insert size larger than expected based on the observed distribution. (B) Inversions result in pairs in the same orientation. The reads can both be forward (black) or reverse (grey) with respect to the reference, depending on which end of the inversion they span. (C) Duplication events yield pairs in opposite orientations where the reverse read is located upstream of the forward read. (D) Translocations create pairs that align to two different chromosomes. This figure was adapted from Figure 2 of the DELLY manuscript [78].

Empirically, deStruct produced many more raw results than DELLY. In contrast to DELLY, which was run on a per-sample basis, deStruct was run on multiple samples concurrently. Analysis was performed in the two batches corresponding to the sequencing batches, so that the first set of deStruct results contained combined SVs from 16 samples and the second set contained SVs from 52 samples. deStruct predictions were filtered if either end was located in a non-standard chromosome (i.e. other than chr1-22, X or Y), since these represent unplaced contigs with high repeat content, and SVs involving these regions were likely to be artefacts due to sequence similarity.
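The non-standard-chromosome filter described above amounts to a whitelist check on both ends of each breakpoint. The Python sketch below illustrates it under stated assumptions: `SVPrediction` is a hypothetical, simplified record (not deStruct's actual output format), and chromosome names are accepted with or without a `chr` prefix, because Ensembl GRCh37 names chromosomes without the prefix (e.g. `9`, `X`, `MT`).

```python
from dataclasses import dataclass

# Standard chromosomes as defined in the text: 1-22, X and Y.
STANDARD_CHROMOSOMES = frozenset([str(i) for i in range(1, 23)] + ["X", "Y"])


@dataclass(frozen=True)
class SVPrediction:
    """Hypothetical, simplified SV record: one chromosome/position per breakpoint end."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def normalize(chrom: str) -> str:
    """Strip an optional 'chr' prefix so that 'chr9' and '9' compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def on_standard_chromosomes(sv: SVPrediction) -> bool:
    """True only if both breakpoint ends lie on chromosomes 1-22, X or Y."""
    return (normalize(sv.chrom1) in STANDARD_CHROMOSOMES
            and normalize(sv.chrom2) in STANDARD_CHROMOSOMES)


def filter_nonstandard(predictions):
    """Drop predictions with either end on an unplaced contig or other non-standard sequence."""
    return [sv for sv in predictions if on_standard_chromosomes(sv)]


if __name__ == "__main__":
    # Hypothetical toy records, not real predictions from this study.
    preds = [
        SVPrediction("16", 11_000_000, "3", 1_000_000),   # kept
        SVPrediction("chr9", 5_460_000, "GL000220.1", 1_000),  # unplaced contig: removed
        SVPrediction("9", 5_460_000, "MT", 2_000),        # mitochondrial: removed
    ]
    print(len(filter_nonstandard(preds)))  # prints 1
```

A whitelist is used rather than a blacklist of known contig names so that any sequence not explicitly listed (unplaced contigs, decoys, the mitochondrial genome) is excluded by default.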
Predictions were also filtered if they matched known SVs in the Database of Genetic Variants as annotated in the deStruct output.

The LUMPY tool for SV detection was also explored [45]. LUMPY utilizes multiple distinct sources of evidence, most commonly those from paired-end and split-read data sets. In contrast to DELLY, which only uses split-read evidence to corroborate paired-end findings, LUMPY analyses both paired-end and split-read data separately and integrates the results (along with other possible evidence sources) to generate a set of SV predictions. As described in the manual, discordant read pairs and split reads were first extracted from the library-level BAM files, and the distribution of insert sizes for each library was calculated using a script distributed with LUMPY. LUMPY was run for all samples concurrently, separated by batch. Breakpoints for predicted SVs were annotated using SnpEff as described in the DELLY pipeline.

For each tool, the set of predictions was further filtered based on read support to keep only those with seven or more spanning reads of support. The cut-off of seven reads was chosen based on a series of attempted validations, in which predictions with spanning-read support of two or more were selected. All selected predictions with fewer than seven supporting reads failed validation, so we removed these predictions since they consistently represented false positives in our data. To further remove artefacts likely due to sequence similarity, predictions were also removed if they had seven or more spanning reads of support in multiple samples, or if the total number of spanning reads supporting the variant in all other samples combined was greater than or equal to the support in the single case where the SV was identified.

2.3.2 Read trimming

As described in Section 2.3.1, some alignment-based detection tools such as DELLY require uniquely mapped paired-end reads for identification of potential SVs.
Since FFPE DNA is often heavily degraded, we expected some of the paired reads to have overlapping sequence when the fragment size was less than 250 bp (i.e. 2 x 125 bp reads), thereby reducing the likelihood of read pairs aligning uniquely to the two sides of a breakpoint. We experimented with a trimming strategy in which the end of each read was trimmed by a fixed amount prior to alignment. We expected that this strategy would artificially introduce a positive distance between some read pairs and increase the likelihood of obtaining discordantly mapped pairs with high mapping quality (Figure 2.4).

Trimming was performed using the fastx_trimmer tool in the FASTX-Toolkit (v0.0.13.2). After trimming, reads were aligned as described in Section 2.2, and SV detection was performed as described in Section 2.3.1. This process was performed for read lengths of 125 bp (only applicable to libraries in the second batch), 100 bp, 85 bp and 75 bp. After filtering, results from different tools and trimming lengths were merged (see Section 2.3.4) and compared. The findings are described in Section 3.1.4.

2.3.3 Assembly-based analysis

An orthogonal method to the alignment-based SV detection strategy is to perform assembly of the sequenced data. In assembly, reads that come from overlapping fragments in the genome are merged by identifying shared sub-sequences. The tiled sequences of all the reads are combined with the goal of reconstructing the full-length sequence of each region in the data (i.e. each target region in our experiment). Because this method does not depend on a reference genome, it avoids the complication of aligning reads to regions that contain structural variants.

To provide longer contiguous sequences for assembly, overlapping read pairs in the FASTQ files were first merged into single reads using the FLASH tool (v1.2.11) [55].
Reads were merged if they had a minimum overlap of 10 base pairs and a maximum overlap of the full read length. Assembly was then performed using the ABySS assembler (v1.5.2) in paired-end mode [94]. The un-merged reads from FLASH were used as a paired library, and the merged reads were used as a single-end library. This produced a FASTA file containing the assembled contigs for each sample.

Figure 2.4: Schematic describing the read trimming strategy utilized for the FFPE sequencing data. (A) When fragment sizes are larger than the combined length of the two paired-end reads (i.e. >250 bp), there is a high likelihood of generating paired-end reads that uniquely align to each side of a breakpoint (black arrows). (B) When the fragment length is shorter than the combined length of the paired reads, the reads overlap, reducing the chance of obtaining uniquely mapped discordant pairs. By trimming the ends of the reads prior to alignment, a positive inner mate distance is created and discordant pairs can again be found.

A key consideration in running ABySS is the selection of the k-mer value used for assembly [94]. During the assembly process, each read in the sequencing data is split into all possible sub-sequences of length k (referred to as k-mers), and shared k-mers are used to identify overlapping reads that are merged in subsequent steps. A smaller k-mer value increases the likelihood of finding overlaps because it requires a shorter shared sequence, but it also uses more computational time and memory. Conversely, a larger k-mer value increases specificity and reduces the likelihood of shared k-mers occurring by chance, but may also reduce sensitivity in regions of low coverage, since it requires fragments with larger overlaps.
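The k-mer trade-off can be made concrete with a small Python sketch. This is not how ABySS is implemented internally; it only decomposes a read into its overlapping k-mers and counts how many of them a single substituted base changes. A read of length L yields L - k + 1 k-mers, and any one base is contained in at most k of them, so larger k values leave fewer error-free k-mers per read. The 125 bp read length and the k values are taken from the text; the read sequence and error position are arbitrary choices for illustration.

```python
def kmers(seq: str, k: int) -> list:
    """Return all overlapping substrings of length k (len(seq) - k + 1 of them)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def substitute(seq: str, pos: int) -> str:
    """Introduce a single-base substitution at 0-based position pos."""
    new_base = "C" if seq[pos] == "A" else "A"
    return seq[:pos] + new_base + seq[pos + 1:]


def changed_kmers(seq: str, pos: int, k: int) -> int:
    """Count k-mers that differ after a substitution at pos.

    The k-mer starting at index i covers positions i..i+k-1, so it changes
    exactly when it contains pos; at most k k-mers can be affected.
    """
    original = kmers(seq, k)
    mutated = kmers(substitute(seq, pos), k)
    return sum(a != b for a, b in zip(original, mutated))


if __name__ == "__main__":
    read = ("ACGGTCATTG" * 13)[:125]  # arbitrary 125 bp test read
    error_pos = 62                    # a base near the middle of the read
    for k in (50, 55, 60, 64):
        total = len(kmers(read, k))
        bad = changed_kmers(read, error_pos, k)
        print(f"k={k}: {total} k-mers, {bad} changed, {total - bad} error-free")
```

With k = 64, a single error near the middle of a 125 bp read alters all 62 k-mers the read contributes, whereas with k = 50 about a third of them (26 of 76) remain error-free.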
A large k-mer value also propagates sequencing errors more significantly, since each single-base error in a read will generate a larger number of erroneous k-mers. To minimize the error introduced by either extreme, we performed assembly with multiple k-mer values and integrated the results. We chose values of 50, 55, 60 and 64. These values were selected based on multiple quality metrics (number of contigs, N50 value and total size of the assembly); see Section 3.1.3 for details.

After assembly, each set of contigs (i.e. each combination of sample and k-mer value) was aligned to the reference genome (GRCh37) using bwa mem (v0.7.5a). The raw reads from the sample were also aligned to the assembled contigs to assess the sequencing depth supporting each contig. The PAVFinder tool (v0.2.0) was used to identify structural variants using both the contigs mapped to the reference and the reads mapped to the contigs (Figure 2.5). Results from all samples in each batch were then merged and filtered to remove predictions with either end in a non-standard chromosome. Predictions were further filtered based on read counts as described in Section 2.3.1 to retain only predictions found with at least 7 reads of support in a single sample.

2.3.4 Integrating results

After all alignment- and assembly-based tools were applied and the resulting predictions were filtered, the results were combined. Each individual set of predictions is henceforth referred to as a result-set. For the alignment-based algorithms, each tool was run for all possible trimming lengths, and each combination of a tool with a specific trimming length is referred to as a result-set. For assembly-based output, only full-length reads were used as input, and the predictions for each k-mer value are referred to as a result-set. This means that, in total, predictions for samples in the first sequencing batch could be found in up to 13 result-sets: deStruct with all trimming lengths (100 bp, 85 bp, 75 bp), DELLY with all trimming lengths, LUMPY with all trimming lengths and PAVFinder with all k-mer values (50, 55, 60, 64). Predictions for samples in the second sequencing batch could be found in an additional 3 result-sets, since reads of 125 bp were also available for the alignment-based tools. Predictions from different result-sets were merged if both ends of the breakpoint were located within 25 bp of each other and occurred on the same strand.

Figure 2.5: Schematic describing assembly-based SV detection. Read pairs with overlapping sequence were merged, and the merged and un-merged reads were used to create an assembly with ABySS. Assembled contigs were mapped to the reference and raw paired-end reads were mapped to the contigs. Integrated data from both mappings were used to detect structural variants (e.g. translocations as shown here) with the PAVFinder tool.

2.4 High-confidence variants and validations²

² Validations of PDL SVs were performed by David Twa and validations of CIITA SVs were performed by Anja Mottok.

After filtering, we applied multiple criteria to further select a list of high-confidence SVs that we expected to represent true variants. Predictions were typically considered low-confidence and were filtered out if they were identified in only a single result-set and had low read support (i.e. 10 reads or fewer). We also considered SVs to be low-confidence if they had multiple highly concordant mappings on both sides of the breakpoint as assessed with BLAT. When we observed multiple mappings of the same rearrangement due to ambiguous sequence in the unknown partner region, we chose a single breakpoint as the most likely mapping. We selected the breakpoint that had evidence for the reciprocal rearrangement (i.e. the breakpoint on the other derivative chromosome for translocations, or at the other end of an inversion).
If no mappings had a reciprocal prediction, we selected the partner that was most biologically plausible based on strand orientation (i.e. avoiding predicted dicentric derivative chromosomes). After removing low-confidence SVs and collapsing the multi-mapping predictions, we were left with a list of high-confidence SV predictions.

For the PDL SVs, validation was attempted for every high-confidence prediction. However, in some cases genomic material for validation was not available because the tissue sample had been exhausted. When reciprocal translocations were identified, only the prediction for the derivative chromosome that retains the PDL coding locus was selected for validation. For predicted rearrangements in the CIITA region, we chose a subset of the translocations for validation based on tissue availability, and validated both reciprocal breakpoints when applicable.

To validate the selected events, custom PCR primers flanking the predicted breakpoints were designed using the primer3 tool (v4.0.0), and PCR-amplified fragments were Sanger sequenced to confirm the predicted breakpoint sequence and location [104]. Two of the predicted t(10;16) translocations had partner sequences mapping to multiple locations. In these cases designing unique PCR primers was challenging, so a dual fusion FISH assay was used for validation³ (see Section … for details).

³ Dual fusion FISH assay was performed by Susana Ben-Neriah.

2.5 Immunohistochemistry⁴

⁴ IHC staining was performed by Katy Milne and Brad H. Nelson, and scoring was performed by Anja Mottok.

Immunohistochemistry (IHC) was performed to assess PD-L1 (encoded by CD274) and PD-L2 (encoded by PDCD1LG2) surface protein expression. A detailed description of the methodology can be found in Appendix A. Briefly, TMAs containing tissue samples from 44 of the 67 clinical specimens (66%) were stained with PD-L1 rabbit monoclonal antibody and PD-L2 mouse monoclonal antibody.
Staining was evaluable for PD-L1 in 35 specimens and for PD-L2 in 37 specimens. In these cases, the percentage of positive tumour cells (0-100) and the staining intensity (1-3) were recorded. An IHC histological score was calculated for each case by multiplying the percentage of positive tumour cells by the staining intensity, giving a range of 0-300.

Chapter 3
Results and Discussion

3.1 Pipeline implementation and optimization

My first research question addressed the feasibility of using a capture-based sequencing approach with FFPE tissue for rearrangement detection in B-cell lymphomas. To examine this question, in this section I describe the metrics I used to assess the suitability of the method.

3.1.1 Library construction¹

¹ The metrics and figures described in this section were generated by Andy Mungall and his team at the Genome Sciences Centre.

As described in Section 2.1, we initially selected 92 cases (91 clinical specimens and 1 cell line) for investigation in this study. The GSC protocol required a DNA yield of ~94 ng after library construction to proceed to sequencing, and 20 samples failed because they did not meet this requirement. The yield in these specimens ranged from 0.02 to 87 ng, with a median of 35 ng. An additional 4 libraries had sufficient DNA yield but failed because the DNA fragments were too small for sequencing (Figure 3.1). Both of these issues are a likely consequence of DNA degradation in FFPE specimens [11, 46, 118].

Figure 3.1: Fragment size distribution for cases failing library construction. The fragment size represents the length of the DNA insert plus adapter sequences. The x-axis shows fragment size (base pairs) and the y-axis shows frequency. Red lines show the fragment size distribution of a passed library from this experiment for reference, and blue lines represent the distribution in each of the four failed cases. Arrows indicate the peaks that represent genomic DNA fragments.

The issue of DNA yield represents a major challenge in FFPE extraction, and in our study 22% (20/91) of the selected cases could not be sequenced due to insufficient yield. To combat these limitations, the GSC has developed an automated DNA extraction method using FormaPure bead-based selection (Agencourt). This method has demonstrated improved yield over the column-based AllPrep method (Qiagen), which likely reflects the higher efficiency of a bead-based approach over a column-based approach. In our expanded experiments, the automated extraction method has shown a library construction success rate of 100% in a cohort of 92 FFPE specimens (see Section 4.3), and as such it will be our preferred method in the future. We plan to re-assess the libraries that failed in this study using the automated extraction approach in a future experiment.

3.1.2 Target coverage depth and uniformity

For the successful libraries, we first performed quality analysis to assess whether capture sequencing produced adequate results for SV analysis with the FFPE tissue libraries.

First, we examined the total number of reads sequenced in each of the libraries (Figure 3.2). One library (A43037) had many more reads than the others, which was later explained by the identification of a focal amplification in the chr9 target region for this sample (described in Section …). This amplification resulted in preferential enrichment of DNA from this library during the pooled capture step, since it produced more fragments from the target space than samples with no amplification. We also noticed a difference in total reads between libraries in the two sequencing batches. This effect is primarily due to differences in the Illumina HiSeq chemistry, since the version 4 chemistry (batch 2; right panels) yields a higher number of reads per lane than the version 3 chemistry (batch 1; left panels), as specified by the manufacturer.
Although the total number of reads varied across samples, the number of reads that were not PCR duplicates and the mean coverage in the target space were more uniform. We proceeded with analysis, since only these reads would be used for rearrangement detection.

We next examined the insert sizes of the mapped reads, where the challenges of DNA degradation in FFPE tissue were apparent. Insert size is the estimated total length of the sequenced DNA fragment, inferred from the mapping positions of its paired ends (Figure 3.3). Ideally, sheared fragments would be longer than the combined length of the paired reads (i.e. >250 bp for 125 bp reads). In our FFPE libraries, however, this range had to be relaxed to include smaller fragments in order to extract sufficient DNA for library construction. Most insert sizes fell between 100 and 300 bp, and the median insert size of the FFPE libraries was significantly lower than that of fresh frozen libraries (Wilcoxon-Mann-Whitney P = 9.05×10⁻⁵; Figure 3.4). To address overlapping read pairs in downstream analysis, we used read trimming strategies (discussed in Section 2.3.2 and Section 3.1.4).

We next sought to determine whether the sequencing data provided adequate coverage of the target regions.
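The insert-size definition of Figure 3.3 and the Wilcoxon-Mann-Whitney (rank-sum) comparison of FFPE and frozen libraries can be illustrated with a short Python sketch. This is not the code used in this study, and the helper names are hypothetical. Each read pair is reduced to the leftmost reference positions of a forward read and its reverse-strand mate. The p-value uses the normal approximation to the U distribution, with a continuity correction but no tie correction. For groups as small as the six frozen libraries, an exact test from a statistics package would be preferable.

```python
import math


def insert_size(left_start: int, mate_start: int, read_length: int) -> int:
    """Insert size of a forward/reverse read pair (hypothetical helper).

    Both positions are 1-based leftmost reference coordinates. The insert
    runs from the first base of the left read to the last base of its mate,
    so it includes both read lengths plus any unsequenced gap (Figure 3.3).
    """
    return mate_start + read_length - left_start


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max(abs(u1 - mean_u) - 0.5, 0.0) / sd_u  # continuity-corrected |z|
    p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u1, p


# Illustrative per-library median insert sizes only, not data from this study.
ffpe_medians = [150, 162, 171, 158, 149, 166]
frozen_medians = [205, 221, 198, 230]
u, p = mann_whitney_u(ffpe_medians, frozen_medians)
print(insert_size(1000, 1150, 125), u, round(p, 4))
```

Ranking the pooled values, rather than comparing means, makes the test insensitive to the long right tail typical of insert-size distributions.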
We assessed the mean coverage in each region, calculated as the average per-base sequencing depth (after removal of PCR duplicates) across all bases in the region (Figure 3.5). Across the 68 libraries, the median value for mean coverage was above 600x for all three regions, and was slightly higher (~700x) for the PDL region. The sample that produced an abnormally high number of reads (A43037) also had very high coverage of the PDL locus (>12,500x), consistent with the focal amplification of this region that we later identified. The average depths were consistent with previous studies that successfully performed structural variant detection with targeted sequencing approaches [1, 12]. In particular, a study of rearrangement detection in lung tumours and leukemias found that 600x average coverage of the target regions, in tumours with 20% tumour content, was sufficient to detect SVs with 90% sensitivity (using a different set of SV detection tools) [1].

To identify sample-specific biases, we compared coverage of the target regions on a per-sample basis (Figure 3.6). Coverage was highly variable between samples, and in some cases specific trends were later explained by SV identification.

Figure 3.2: Total number of reads sequenced for each of the 68 libraries. Libraries from the first sequencing batch are shown in the left panels (n = 16) and libraries in the second batch are shown in the right panels (n = 52). Top panels show read counts in each sample, with total number of reads in red and number of unique (non-PCR-duplicated) reads in blue. Bottom panels show the average fold coverage across all bases in the target space.

Figure 3.3: Schematic describing how insert size is measured (sequencing adapters, sequenced reads, insert size and genomic DNA are labelled). The size of the genomic DNA fragment is determined by the mapping of the read pairs, and the insert size measurement includes the length of the sequenced reads plus the distance between them with respect to the reference genome.

Figure 3.4: Distribution of insert sizes from the 68 sequencing libraries. (A) Each box on the x-axis represents a library and box colour represents the tissue preparation type. Upper and lower whiskers represent extreme values falling within 1.5 × IQR, where IQR is the inter-quartile range. FFPE: formalin-fixed paraffin-embedded. (B) The median insert size in libraries produced from FFPE tissue specimens (n = 62) is significantly smaller than in frozen tissue libraries (n = 6; Wilcoxon-Mann-Whitney P = 9.05×10⁻⁵).

Figure 3.5: Mean coverage across each of the target regions. Values on the y-axis represent mean coverage of each region, and boxes represent summary statistics across the 68 samples (median values are labelled: CIITA 638.34, PDL 701.2, SOCS1 620.41). One extreme data point from the PDL region (A43037) was much higher than the others and has been omitted for visualization purposes.
For instance, A43077 showed low coverage of the CIITA and SOCS1 regions compared to the PDL locus. This case was later shown to harbour a deletion spanning most of the chromosome 16 target space (described later in this chapter). In general, mean target coverage across samples was strongly correlated with the total number of reads, as expected (Pearson correlation = 0.68, P = 1.4×10⁻¹⁰).

Figure 3.6: Mean coverage of each sample across the target regions (CIITA, PDL and SOCS1 panels). Library names are shown across the x-axis and bars are coloured by tissue preparation type. FFPE: formalin-fixed paraffin-embedded. Coverage of the PDL locus for A43037 is not plotted because it is abnormally high, but the coverage value (12,584.35) is listed in grey.

While average coverage depth was high in each of the regions, we also wanted to characterize the uniformity of coverage throughout the capture space.

Figure 3.7: Proportion of bases sequenced to 100x depth in each sample. Samples are listed across the x-axis, and the y-axis shows the percentage of bases covered at 100x.
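The coverage summaries used in this section reduce to simple operations on a per-base depth array. These are the mean depth per region, the fraction of bases reaching 100x, and the runs of bases below 100x (as displayed in Figure 3.8). The Python sketch below assumes such an array has already been produced for each region after duplicate removal, for example from samtools depth output. The function names are hypothetical, and this is not the pipeline code used by the GSC. It also includes a Pearson correlation helper of the kind used to relate mean target coverage to total read count.

```python
import math


def mean_coverage(depths):
    """Average per-base depth over every base in a region."""
    return sum(depths) / len(depths)


def fraction_at_least(depths, threshold=100):
    """Proportion of bases sequenced to at least `threshold`-fold depth."""
    return sum(1 for d in depths if d >= threshold) / len(depths)


def low_coverage_intervals(region_start, depths, threshold=100):
    """Inclusive (start, end) coordinates of consecutive bases below `threshold`."""
    intervals = []
    run_start = None
    for offset, depth in enumerate(depths):
        pos = region_start + offset
        if depth < threshold and run_start is None:
            run_start = pos  # a low-coverage run begins here
        elif depth >= threshold and run_start is not None:
            intervals.append((run_start, pos - 1))  # run ended at previous base
            run_start = None
    if run_start is not None:  # run extends to the end of the region
        intervals.append((run_start, region_start + len(depths) - 1))
    return intervals


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy depths for a 5 bp region starting at position 1000 (illustrative only).
depths = [150, 40, 60, 200, 90]
print(mean_coverage(depths))                 # 108.0
print(fraction_at_least(depths))             # 0.4
print(low_coverage_intervals(1000, depths))  # [(1001, 1002), (1004, 1004)]
```

Reporting low-coverage runs as intervals, rather than as individual bases, makes them easy to load as a browser track and compare against mappability, as in Figure 3.8.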
Across the 68 samples, 97.7% of bases in the target space had an average coverage depth of at least 100x. For individual samples, the proportion of bases reaching this depth ranged from 84.7% to 98.4% (Figure 3.7). We identified regions of the target space with bases averaging less than 100x depth and visualized them in the UCSC genome browser (Figure 3.8). These low-coverage regions overlap with areas where the reference sequence has low uniqueness. This indicates that the paucity of reads aligning to these areas is likely due to an inability to map the corresponding reads unambiguously, rather than a lack of DNA captured from these regions. These regions may still present a challenge for SV detection: rearrangement breakpoints could be located there, and the small number of supporting reads would make the exact breakpoint anatomy difficult to characterize. However, average coverage in these regions was still fairly high at ~60x. Since they constituted only 2.3% of the capture space, we anticipated that they would not be a limiting factor in downstream analysis.

Figure 3.8: Regions of the target space averaging less than 100x coverage depth. Low-coverage regions are shown in the UCSC genome browser for (A) the CIITA target region, (B) the SOCS1 target region and (C) the PDL target region. The “Low covg (<100x)” track shows bases in black if they have an average depth less than 100x across the 68 samples. RefSeq gene models are shown in blue. The uniqueness of the reference sequence in 35 bp windows is shown on the bottom track (“Duke Uniq 35”), where white regions indicate non-unique sequences found more than 4 times throughout the reference genome.

Figure 3.9: Proportion of bases aligned outside the capture space. The mean proportion across the 68 samples is 72.7%. The proportion of off-target bases is significantly lower for libraries in the first sequencing batch than for those in the second batch (Wilcoxon-Mann-Whitney P = 3.5×10⁻⁹).

We also wanted to examine the specificity of the capture approach for enriching DNA in the target regions. We observed high off-target coverage in all samples, with an average of 72.7% of aligned bases mapping outside the capture space (Figure 3.9). The off-target percentage was significantly higher in samples from the second batch (Wilcoxon-Mann-Whitney P = 3.5×10⁻⁹), possibly because these libraries had a lower DNA yield and contained fewer fragments from the target regions. Our off-target proportion was higher than in previous studies. This was likely due to the small size of our capture space, the reduced complexity of FFPE tissue, and our decision not to mask probes in repetitive regions during capture kit design [12]. This strategy is a trade-off. It provides more complete probe coverage and increases the likelihood of identifying rearrangements with breakpoints in repetitive regions, but it also produces probes with high complementarity to regions outside the capture space.

To investigate how much of the off-target region was sequenced to depths comparable to the target space, we counted the off-target bases with at least 500x coverage, which is approximately the lower quartile of mean target coverage (Figure 3.2). Across the 68 samples, the mean number of off-target bases reaching 500-fold depth was 8,481 (Figure 3.10). This is a very small fraction of the total off-target region (0.0003%) and only 1.8% of the target space size. Most of the high-coverage off-target bases were shared between multiple samples, so that only 8.6% (49,310/576,704) of the total bases were unique. When the common high-depth regions were assessed with BLAT, most had high-concordance matches to regions within the capture space.
This supports the hypothesis that our high off-target percentage is caused by capture of off-target fragments through partial probe complementarity. We concluded that, despite the high off-target percentage, the deep coverage in the target regions indicated that the on-target data were still sufficient for SV detection.

Figure 3.10: Number of off-target bases reaching 500-fold depth. Samples are listed across the x-axis. The mean number of bases across the 68 samples is 8,481, representing 0.0003% of the total off-target region (3,095,217,532 base pairs).

3.1.3 Assembly statistics

To assess the effect of k-mer value on assembly, we examined multiple ABySS output metrics. As k-mer size increased, the number of contigs in the assembly decreased (Figure 3.11A). This is generally preferable, as it indicates a more finished assembly with fewer unjoined segments. The N50 metric is commonly used to assess assemblies. It is the length of the shortest contig for which that contig and all longer contigs together make up at least half of the total assembly size, so a higher N50 value indicates longer contigs. In our data, the N50 value increased with the k-mer value (Figure 3.11B). Together, these two metrics suggested that larger k-mer values produced more finished assemblies with longer contigs. However, the total size of the assembly also increased with k-mer size, exceeding the actual size of the capture space by a growing margin (Figure 3.11C).
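The three assembly metrics compared in Figure 3.11 can be computed from a list of contig lengths. The Python sketch below uses the common convention that N50 is the largest length L such that contigs of length L or longer together contain at least half of the assembled bases. It is an illustration only: ABySS reports these statistics itself (for example through its abyss-fac utility), and the function names here are hypothetical.

```python
def n50(contig_lengths):
    """Contig length at which the cumulative sum of lengths, taken from
    longest to shortest, first reaches half of the total assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def assembly_summary(contig_lengths):
    """The three quantities plotted in Figure 3.11 for a single assembly."""
    return {
        "contigs": len(contig_lengths),
        "n50": n50(contig_lengths),
        "total_size": sum(contig_lengths),
    }


# Toy assembly with illustrative lengths, not data from this study.
print(assembly_summary([2, 3, 4, 5, 6, 7, 8, 9, 10]))
# {'contigs': 9, 'n50': 8, 'total_size': 54}
```

N50 alone cannot flag the problem noted above: an assembly padded with erroneous contigs can still have a high N50. This is why total assembly size is tracked against the known capture-space size.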
The growth in assembly size may be due to the propagation of sequencing errors with larger k-mers, which produces additional erroneous contigs that inflate the total size of the assembly. We decided that using these four k-mer values in combination provided a good trade-off across the metrics.

Figure 3.11: Distribution of assembly statistics based on selected k-mer value (k50, k55, k60 and k64). The (A) number of contigs, (B) N50 value, and (C) total size of the assembly for the 68 samples are shown on the y-axes. The red dashed line represents the true size of the capture space.

3.1.4 Structural variant detection

After quality analysis, we proceeded with detection of structural variants from our sequencing data. As described in Section 2.3, we used an integrative approach combining multiple SV prediction tools and multiple trimmed read lengths.

Figure 3.12: Overlap of rearrangement predictions from four tools (deStruct, DELLY, LUMPY and PAVFinder). The number of shared predicted SVs after merging is shown in each area, with the proportion of total predictions in parentheses.

We explored four tools for predicting structural variants: the alignment-based tools deStruct, DELLY and LUMPY, and the assembly-based tool PAVFinder. We began by testing the performance of these tools on the data from our first sequencing batch of 16 libraries. These samples had 100 bp paired-end reads, and we performed SV detection on read lengths of 100 bp, 85 bp and 75 bp for the three alignment-based tools. PAVFinder was run on assemblies generated with k-mer values of 50, 55, 60 and 64 using the full-length 100 bp reads.
These settings provided 13 result-sets for comparison: three read lengths for each of the three alignment-based tools, plus four k-mer values for PAVFinder. In total, we obtained 134 merged predictions from the 13 result-sets. Only 14.9% (20/134) of the predictions were identified by all four tools (Figure 3.12), and 70.1% (94/134) were found by a single tool. Most of these tool-specific predictions were identified in only one result-set, representing a single parameter setting (i.e. trimming length or k-mer value; Figure 3.13).

Figure 3.13: Tool-specific rearrangement predictions are primarily found in a single result-set. The number of tool-specific predictions from (A) deStruct, (B) DELLY, (C) LUMPY and (D) PAVFinder is shown on the y-axis, summarized by the number of result-sets in which they were predicted (x-axis).

We examined the tool-specific predictions further, since these were the most likely to represent false positives. Many of the LUMPY-specific predictions had low spanning read support (~10 reads) and no split read support, suggesting they were found only with paired-end evidence. A high proportion of these were small rearrangements (<1 kb) within CIITA intron 1. These may represent true variation but are unlikely to have any functional consequence. When LUMPY identified rearrangements in repetitive regions, it often produced multiple outputs whose breakpoint locations differed by tens or hundreds of bases. These likely represent different mappings of the same rearrangement, generated from inexact breakpoint information when paired-end data are available with no corresponding split reads. The lack of split read support for these rearrangements and their location within repetitive regions suggest they may be artefacts of sequence similarity.
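The merging of predictions across tools and result-sets is specified in Section 2.3 and is not reproduced here. Purely as an illustration, the Python sketch below treats two calls as the same event when both breakpoints lie on the same chromosomes and within a fixed window of each other. The 500 bp window is an arbitrary assumption. Ends are compared regardless of which one a tool reports first. The sketch then counts how many distinct tools support each merged event, the quantity summarized in Figure 3.12. The class and function names are hypothetical. The greedy grouping is order-dependent, because groups are never re-merged once formed.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    tool: str        # e.g. "deStruct" or "DELLY"
    result_set: str  # e.g. "DELLY_85bp" (trimmed length or k-mer setting)
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def _ends(call):
    """Breakpoint ends in a canonical order, so A->B and B->A compare equal."""
    return sorted([(call.chrom1, call.pos1), (call.chrom2, call.pos2)])


def same_event(a, b, window=500):
    """True if both ends match by chromosome and lie within `window` bp."""
    return all(ca == cb and abs(pa - pb) <= window
               for (ca, pa), (cb, pb) in zip(_ends(a), _ends(b)))


def merge_calls(calls, window=500):
    """Greedy grouping: each call joins the first group with a matching member."""
    groups = []
    for call in calls:
        for group in groups:
            if any(same_event(call, other, window) for other in group):
                group.append(call)
                break
        else:  # no existing group matched
            groups.append([call])
    return groups


def tool_support(groups):
    """Counter mapping number of supporting tools -> number of merged events."""
    return Counter(len({c.tool for c in group}) for group in groups)


# Synthetic calls: the first two describe one event with ends reported in
# opposite order; the third is a separate event seen by a single tool.
calls = [
    SVCall("deStruct", "deStruct_100bp", "chr9", 10_000, "chr3", 50_000),
    SVCall("DELLY", "DELLY_75bp", "chr3", 50_020, "chr9", 10_004),
    SVCall("LUMPY", "LUMPY_85bp", "chr9", 90_000, "chr9", 95_000),
]
groups = merge_calls(calls)
print(len(groups), tool_support(groups))
```

A window-based rule like this explains why alternate mappings of one rearrangement to distant repetitive copies, as described for deStruct below, would remain unmerged.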
The PAVFinder-specific results contained many predictions in repetitive regions whose consensus sequences mapped to multiple locations in BLAT. They also contained several predictions with both breakpoints outside the capture space, likely indicating mis-assemblies resulting from off-target reads. In addition, the output formats differ because PAVFinder distinguishes insertion-type events from translocation events. Many of the PAVFinder-specific predictions were insertions corresponding to translocations found by the other tools. This distinction provides valuable information about rearrangement anatomy, but it can also be deduced from the strand orientation and consensus sequence information given by the other tools. The deStruct-specific events included some rearrangements that matched the observed FISH patterns, and these were hypothesized to be true positives missed by the other tools. Many of the predictions found in a single deStruct result-set were alternate mappings of rearrangements in repetitive regions where the unknown partner region was ambiguous. These were therefore not merged, even though they represented the same variant. Other predictions had a consensus sequence mapping to multiple locations on both sides of the breakpoint and likely represented false positives.

We wanted to limit the number of artefactual results requiring validation and to decrease our computational requirements, so we reduced the number of tools used in our final analysis pipeline. We kept DELLY because it provided a small set of highly concordant results and had previously performed well in benchmarking studies. Of the remaining tools, deStruct produced the largest number of predictions and appeared to contain some true positives based on its concordance with FISH patterns.
Furthermore, its output format was the most comprehensive and the easiest to parse: LUMPY does not produce consensus sequences, and PAVFinder does not report split-read support. Finally, the combination of deStruct and DELLY reproduced 61% (35/57) of the PAVFinder predictions and 47% (25/53) of the LUMPY predictions (Figure 3.12). We therefore combined deStruct and DELLY results in our analysis pipeline, with the aim of maintaining high sensitivity.

After the tools were finalized, SV detection was also performed on the 52 samples from the second sequencing batch, which had 125 bp reads. Across the 68 cases, a total of 243 merged predictions were produced, occurring in 48 samples. Specific findings are discussed in Section 3.2 and Section 3.3. Results obtained at different trimming lengths overlapped substantially for both deStruct (Figure 3.14) and DELLY (Figure 3.15). However, only 42% (88/209) of the deStruct predictions and 50% (76/151) of the DELLY predictions were identified at all trimming lengths, indicating the importance of the trimming strategy for prediction sensitivity. Concordance of deStruct and DELLY in the full set of predictions was high (Figure 3.16): 70% (117/168) of the predictions found in more than one result-set were found by both tools.

Our list of merged results for all 68 samples contained 12 predictions with both breakpoints outside the capture space, likely due to off-target coverage. The remaining 231 predictions were in the capture space, with 145 in the chr16 region and 86 in the PDL region. We next proceeded with the interpretation and validation of these results.

Figure 3.14: Overlap of deStruct predictions at each trimming length. Absolute counts are shown in each area, with the proportion of total predictions below. Predictions are separated into those found in samples from (A) sequencing batch 1 (75, 85 and 100 bp) and (B) sequencing batch 2 (75, 85, 100 and 125 bp).

Figure 3.15: Overlap of DELLY predictions at each trimming length. Absolute counts are shown in each area, with the proportion of total predictions below. Predictions are separated into those found in samples from (A) sequencing batch 1 (75, 85 and 100 bp) and (B) sequencing batch 2 (75, 85, 100 and 125 bp).

Figure 3.16: Overlap of deStruct and DELLY predictions for all 68 samples (deStruct only: 92, 37.9%; both tools: 117, 48.1%; DELLY only: 34, 14%).

3.2 PDL rearrangements²

My second research question asked whether we could identify recurrent rearrangement patterns in our target regions. To address this question, we used our SV predictions to examine the identity of novel rearrangement partners and the breakpoint anatomy. We began by assessing SVs in the PDL region.

² A version of this section is used as the methods and results sections of a publication that is currently in revision (Chong, Twa et al.).

3.2.1 Patterns of rearrangement

After filtering, we generated a list of 86 predicted PDL rearrangements (Table B.2). As described in Section 2.4, we further selected 36 SVs from 25 cases as high-confidence predictions (Table 3.1): 19 translocations, 9 deletions, 5 inversions and 3 duplications. The selection of these specific events is discussed further in Sections 3.2.1.1 and 3.2.1.2.

For the PDL rearrangements, we attempted to validate all high-confidence SVs. Two cases contained predictions that could not be validated because the sample tissue had been exhausted, leaving no material for validation.
However, the five SV predictions in these two cases had high read support and uniquely mapping consensus sequences, so we consider them likely to represent real variants. Validation of the remaining 31 predictions was attempted using PCR and Sanger sequencing. Twenty-seven (87%) of these were validated by Sanger sequencing, and the remaining four failed validation. In these four cases, PCR produced a band that could represent the fusion region, but the Sanger sequence of the product did not map to the predicted partner region. This was likely due to sequencing complications arising from repeated base stretches.

Table 3.1: High-confidence predicted SVs in the PDL region. Partner genes shown are those closest to the breakpoint in RefSeq. When the partner sequence mapped to multiple locations, we chose the one most biologically plausible based on strand orientation. Breakpoint coordinates are based on hg19. BA: break-apart positive, CBR1: cluster breakpoint region 1, CBR2: cluster breakpoint region 2, DEL: deletion, DUP: duplication, FISH: fluorescence in situ hybridization, INV: inversion, N*: not validated due to lack of tissue, N: failed validation, TRA: translocation, Y: Sanger-sequencing validated. (a) This prediction was filtered out in some result-sets because it had 7+ reads in multiple samples, but was included for validation because of its unusually high support in A43037.
(b) The predicted partner region for this translocation was on chromosome 2, but this was corrected based on BLAT and validation results.

Sample | Type | Position 1 | Position 2 | Gene 1 | Gene 2 | FISH | Validation | Region
-------|------|------------|------------|--------|--------|------|------------|-------
A43029 | DEL | Chr9:5467908 | Chr9:5889215 | CD274 | MLANA | Normal | Y | Intra-chromosomal
A43030 | TRA | Chr10:48174662 | Chr9:5526439 | CTSL1P2 | PDCD1LG2 | BA | Y | Other TRA (5’)
A43037 | DUP (a) | Chr9:5447066 | Chr9:5570695 | CD274 | PDCD1LG2 | Normal | Y | Intra-chromosomal
A43038 | INV | Chr9:5477220 | Chr9:37409812 | CD274 | GRHPR | BA | Y | Intra-chromosomal
A43041 | TRA | Chr20:49127848 | Chr9:5450396 | PTPN1 | CD274 | BA | Y | CBR1
A43041 | DUP | Chr9:5451269 | Chr9:5467963 | CD274 | CD274 | BA | Y | Intra-chromosomal
A43041 | INV | Chr9:5467981 | Chr9:5627175 | CD274 | RIC1 | BA | Y | Intra-chromosomal
A43041 | DEL | Chr9:5466595 | Chr9:5775524 | CD274 | RIC1 | BA | Y | Intra-chromosomal
A43041 | DEL | Chr9:5451258 | Chr9:5451585 | CD274 | CD274 | BA | Y | Intra-chromosomal
A43042 | TRA | Chr11:35161226 | Chr9:5452813 | CD44 | CD274 | BA | N* | CBR1
A43042 | DEL | Chr9:5491418 | Chr9:6135848 | PDCD1LG2 | IL33 | BA | N* | Intra-chromosomal
A43043 | TRA | Chr17:56409009 | Chr9:5518645 | BZRAP1-AS1 | PDCD1LG2 | BA | Y | CBR2
A43043 | TRA | Chr10:47272696 | Chr9:5518489 | BMS1P6 | PDCD1LG2 | BA | Y | CBR2
A43067 | DEL | Chr9:5467992 | Chr9:5508323 | CD274 | PDCD1LG2 | Normal | N | Intra-chromosomal
A43071 | TRA | Chr1:28833535 | Chr9:5518615 | RCC1 | PDCD1LG2 | BA | Y | CBR2
A43075 | DUP | Chr9:5500059 | Chr9:5570090 | PDCD1LG2 | PDCD1LG2 | Normal | Y | Intra-chromosomal
A43079 | TRA | Chr17:56409576 | Chr9:5517129 | BZRAP1-AS1 | PDCD1LG2 | BA | Y | CBR2
A43082 | TRA | Chr2:89159416 | Chr9:5510982 | IGK | PDCD1LG2 | BA | Y | CBR2
A43084 | DEL | Chr9:5467983 | Chr9:5470588 | CD274 | CD274 | BA | Y | Intra-chromosomal
A43085 | TRA | Chr13:46946572 | Chr9:5451322 | KIAA0226L | CD274 | BA | Y | CBR1
A43088 | TRA | Chr9:5466034 | Chr6:52178267 | CD274 | MCM3 | BA | Y | Other TRA (3’)
A43089 | INV | Chr9:5480091 | Chr9:37381934 | CD274 | GRHPR | BA | Y | Intra-chromosomal
A43090 | DEL | Chr9:5563053 | Chr9:5984410 | PDCD1LG2 | KIAA2026 | BA | N | Intra-chromosomal
A43093 | TRA | Chr16:27326593 | Chr9:5511678 | IL4R | PDCD1LG2 | BA | Y | CBR2
A43093 | TRA | Chr7:932289 | Chr9:5513369 | GET4 | PDCD1LG2 | BA | Y | CBR2
A43093 | TRA | Chr9:5565388 | ChrX:10651175 | PDCD1LG2 | MID1 | BA | N | Other TRA (3’)
A43094 | TRA | Chr5:68784424 | Chr9:5491406 | OCLN (b) | PDCD1LG2 | BA | N | Other TRA (intergenic)
A43095 | TRA | Chr22:23231719 | Chr9:5510728 | IGLL5 | PDCD1LG2 | BA | Y | CBR2
A43097 | TRA | Chr16:27326880 | Chr9:5518313 | IL4R | PDCD1LG2 | BA | Y | CBR2
A43099 | INV | Chr9:5527098 | Chr9:37498687 | PDCD1LG2 | POLR1E | BA | Y | Intra-chromosomal
A43101 | INV | Chr9:5469428 | Chr9:5561319 | CD274 | PDCD1LG2 | Normal | Y | Intra-chromosomal
A43110 | TRA | Chr2:89157495 | Chr9:5450316 | IGK | CD274 | BA | N* | CBR1
A43110 | TRA | Chr2:89051353 | Chr9:5502180 | RPIA | PDCD1LG2 | BA | N* | Other TRA (intergenic)
A43110 | DEL | Chr9:5470453 | Chr9:6440816 | CD274 | UHRF2 | BA | N* | Intra-chromosomal
A43115 | DEL | Chr9:5518701 | Chr9:5520276 | PDCD1LG2 | PDCD1LG2 | BA | Y | Intra-chromosomal
A43115 | TRA | Chr22:23235076 | Chr9:5511361 | IGLL5 | PDCD1LG2 | BA | Y | CBR2

3.2.1.1 Translocation cluster breakpoint regions in CD274 and PDCD1LG2

In total, our filtered list contained 46 predicted translocations. Of these, 19 represented unique high-confidence translocation events (Table 3.1). The remaining 27 translocation predictions consisted of 16 reciprocal events, 4 alternate mappings of the same rearrangement, and 7 low-confidence predictions. These low-confidence predictions had read support of 10 or less, occurred in repetitive regions, and were typically identified in a single result-set.

CD274 and PDCD1LG2 both rearranged promiscuously with multiple partner genes (Figure 3.17). Previous studies had suggested this by identifying multiple partners, but the small size of published cohorts had prevented its verification [15, 97, 101, 102]. No exact breakpoints were recurrent, but BZRAP1-AS1, IGK, IL4R and IGLL5 were each found as partners in two cases.

There were two obvious clusters of translocation breakpoints in the regions surrounding the 5’ end of each PDL gene. The first cluster contained 4 translocation breakpoints and spanned from 300 bp upstream of CD274 to the first 2.5 kb of intron 1. We have termed this newly identified hotspot region “cluster breakpoint region 1 (CBR1)”.
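Assigning translocations to hotspot categories amounts to an interval lookup on the chromosome 9 breakpoint. The Python sketch below applies to both CBR1 and CBR2, the PDCD1LG2 intron 1 cluster described in the next paragraph. The interval bounds are an assumption made for illustration. They simply span the outermost chr9 breakpoints labelled CBR1 and CBR2 in Table 3.1 (hg19), rather than boundaries defined from the gene models. The example rows are taken from Table 3.1, and the function names are hypothetical.

```python
from collections import Counter

# Assumed hg19 intervals spanning the CBR-labelled chr9 breakpoints in Table 3.1.
CLUSTER_REGIONS = {
    "CBR1": ("chr9", 5_450_316, 5_452_813),  # 5' end of CD274 into intron 1
    "CBR2": ("chr9", 5_510_728, 5_518_645),  # PDCD1LG2 intron 1
}


def classify_breakpoint(chrom, pos):
    """Return the cluster containing (chrom, pos), or "non-CBR"."""
    for name, (c, start, end) in CLUSTER_REGIONS.items():
        if chrom == c and start <= pos <= end:
            return name
    return "non-CBR"


def pdl_breakpoint(chrom1, pos1, chrom2, pos2):
    """The chr9 end of a translocation (assumes exactly one end is on chr9)."""
    return (chrom1, pos1) if chrom1 == "chr9" else (chrom2, pos2)


# (sample, position 1, position 2) for four translocations from Table 3.1.
translocations = [
    ("A43041", ("chr20", 49_127_848), ("chr9", 5_450_396)),  # PTPN1-CD274
    ("A43043", ("chr17", 56_409_009), ("chr9", 5_518_645)),  # BZRAP1-AS1-PDCD1LG2
    ("A43088", ("chr9", 5_466_034), ("chr6", 52_178_267)),   # CD274-MCM3
    ("A43094", ("chr5", 68_784_424), ("chr9", 5_491_406)),   # OCLN-PDCD1LG2
]
categories = Counter(
    classify_breakpoint(*pdl_breakpoint(*end1, *end2))
    for _sample, end1, end2 in translocations
)
print(categories)
```

With these four rows, one breakpoint falls in each cluster. The CD274 3’ and intergenic events are labelled non-CBR, matching the Region column of Table 3.1.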
The second hotspot occurred in the first 8 kb of PDCD1LG2 intron 1 and contained 10 translocation breakpoints. We refer to this second region as “cluster breakpoint region 2 (CBR2)”. Five additional translocations had breakpoints outside the CBRs, and we define these as a third category of translocation. Two of them had breakpoints near the 3’ end of a PDL gene (one for each gene), and two had intergenic breakpoints located between the PDL genes. The last translocation occurred within PDCD1LG2 intron 2. We hypothesized that the three translocation types (CBR1, CBR2, non-CBR) would have distinct functional impacts, and we explored this hypothesis further using immunohistochemistry. These results are discussed in Section 3.2.1.3.

We also wanted to assess whether any of the PDL translocations might produce chimeric proteins. We examined the location and reading frame of each set of translocation breakpoints and found CD44-CD274 to be the only potential in-frame fusion. In this case, breakpoints occur in intron 1 of each gene, so the promoter and exon 1 regions of the two genes are exchanged. Since both PDL genes have a non-coding exon 1, the full coding sequence of CD274 is retained downstream of CD44 exon 1. This rearrangement is predicted to produce a putative fusion protein with three parts: the first 21 amino acids of CD44, encoded by exon 1; five amino acids from the normally untranslated segment of CD274 exon 2; and the full-length amino acid sequence of PD-L1. Unfortunately, the existence of this chimeric protein cannot be assessed because tissue from this sample has been exhausted. For all other rearrangements, no chimeric protein is likely to exist, for one of several reasons: the partner region is intergenic, the partner genes are in opposite orientations, the fusion is out-of-frame, or one side of the rearrangement contains only untranslated sequence.

Figure 3.17: Translocation events in the PDL region. (A) Circos plot showing the location of PDL rearrangement partners throughout the genome. Black arcs represent translocations between genes labelled on either side. Dot size represents rearrangement frequency: smaller dots indicate rearrangements observed once and larger dots indicate rearrangements observed twice. Black dots mark Sanger-validated translocations, red dots mark those that failed Sanger validation, and grey dots mark those not validated due to lack of tissue. The green box represents the capture region. (B) Location of translocation breakpoints (inverted triangles) throughout the PDL region, with partners labelled above. The x-axis represents genomic coordinates and gene models are shown below. The grey shaded region represents the capture space.

3.2.1.2 Intra-chromosomal rearrangements

A fourth type of rearrangement in the PDL region consists of intra-chromosomal SVs: deletions, inversions and duplications. We identified 40 intra-chromosomal SV predictions in our filtered data. One prediction was the reciprocal end of an inversion event, and one was an alternate mapping of a deletion breakpoint. Another 21 were low-confidence predictions that were smaller than 1 kb, had low read support and were usually found in a single result-set. This left 17 unique high-confidence predictions for validation: 9 deletions, 5 inversions and 3 duplications (Table 3.1).

Deletions of the 3’ untranslated region (UTR) of CD274 were seen in five cases (Figure 3.18). These often occurred together with deletion of the entire PDCD1LG2 gene region (3/5 cases). A deletion of the PDCD1LG2 3’ UTR was also seen in one sample. Two inversions also displaced the CD274 3’ region, and both were relatively small (less than 200 kb).
The three remaining inversions were much larger (~32 Mb), and all had their partner breakpoints in the same region of chromosome 9, surrounding the GRHPR and POLR1E genes (9p13.2). Duplications were seen in three cases, all within PDL gene regions. One duplication spanned both PDL genes, and each of the other two amplified a single gene.

Figure 3.18: Intra-chromosomal predictions in the PDL region. Deletion, inversion and duplication events identified in the PDL region are shown as coloured rectangles. The x-axis represents genomic coordinates, and rectangle borders mark the span of each SV. Individual samples are shown on the y-axis. Gene models (CD274 and PDCD1LG2) are shown above, and the grey rectangle represents the capture space.

3.2.1.3 Immunohistochemistry³

To examine the impact of the genomic rearrangements we discovered, we performed immunohistochemistry (IHC) staining to assess surface expression of the PD-L1 (encoded by CD274) and PD-L2 (encoded by PDCD1LG2) proteins. IHC was performed on 44 cases and was evaluable for PD-L1 expression in 35 cases and for PD-L2 expression in 37 cases (Table B.3).

We first examined the effects of CBR translocations on ligand expression, since these represented a novel recurrent type and made up the majority of translocations. For both types, a CBR rearrangement was associated with positivity of the corresponding ligand. IHC was available for 2 of the 4 cases with a CBR1 translocation, and both stained positively for PD-L1 in 100% of cells at maximum intensity. Of the 8 cases with a CBR2 translocation, 6 had IHC data available, and all of these stained positively (i.e. ≥20% of cells were positive) for PD-L2.
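The histological score defined in the methods (percentage of positive tumour cells multiplied by staining intensity, range 0-300) and the ≥20% positivity cut-off used in this section can be written as two small Python functions. This is a hedged sketch with hypothetical function names. It also assumes that positivity depends only on the percentage of positive cells, independent of intensity, which is one reading of the wording above.

```python
def h_score(percent_positive: float, intensity: int) -> float:
    """IHC histological score: % positive tumour cells (0-100) x intensity (1-3)."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if intensity not in (1, 2, 3):
        raise ValueError("intensity must be 1, 2 or 3")
    return percent_positive * intensity  # ranges from 0 to 300


def is_positive(percent_positive: float, cutoff: float = 20.0) -> bool:
    """Call a case positive when at least `cutoff` percent of tumour cells stain."""
    return percent_positive >= cutoff


# A CBR1 case staining 100% of cells at maximum intensity reaches the top score.
print(h_score(100, 3), is_positive(100))  # 300 True
```

Keeping the continuous score separate from the binary call lets the same measurements feed both the positivity statements above and the rank-based score comparisons reported below.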
Figure 3.19⁴ shows a representative case (A43095) where a CBR2 rearrangement corresponds to expression of PD-L2 and no expression of the unaltered PD-L1.

³ IHC scoring was performed by Anja Mottok and interpretation was performed by Anja and myself.
⁴ This figure was created by Anja Mottok.

Figure 3.19: Immunohistochemistry staining for three representative lymphoma cases. A43095 contains a CBR2 translocation, and tumour cells stain positively for PD-L2 while remaining negative for PD-L1. A43075 has a focal amplification of the PDCD1LG2 gene region, and showed positivity for PD-L2 in the majority of tumour cells and negativity for PD-L1 except in macrophages. A43101 contains an inversion that removes the 3' UTR of CD274, and stained positively for PD-L1 only. Scale bar = 25 µm.

[Figure 3.20 plots: (A) PD-L1 IHC histological score by CD274 rearrangement status (0/1), P = 0.018; (B) PD-L2 IHC histological score by PDCD1LG2 rearrangement status (0/1), P = 0.00048.]

Figure 3.20: IHC histological score is significantly associated with CBR translocations and gene duplications. Wilcoxon-Mann-Whitney tests for (A) PD-L1 and (B) PD-L2 histological scores showed that samples with a CBR translocation or focal gene amplification (x = 1) are significantly associated with a higher score compared to those with no such rearrangement (x = 0). Points represent individual cases.

Duplications were also associated with higher ligand expression. The case with a duplication spanning both PDL genes stained positively for PD-L1 and PD-L2, and the cases with amplifications of individual genes stained positively for the amplified ligand and negatively, or in only a low percentage of cells, for the other ligand.
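The comparison in Figure 3.20 uses a Wilcoxon-Mann-Whitney test. As an illustration of the statistic (not the software used to obtain the reported P-values), the sketch below computes U from mid-ranks and an exact two-sided P-value by enumerating every way of assigning cases to the two groups, using only the Python standard library. The scores are invented. Enumeration grows as C(n1 + n2, n1), so it is only practical for small groups; for cohorts of the size analysed here a library routine such as `scipy.stats.mannwhitneyu` would be the usual choice.

```python
from itertools import combinations


def midranks(values):
    """Rank values 1..n, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U for group x, exact two-sided permutation P-value)."""
    ranks = midranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    expected_r1 = n1 * (n1 + n2 + 1) / 2   # mean rank sum of group x under H0
    observed_r1 = sum(ranks[:n1])
    observed_dev = abs(observed_r1 - expected_r1)
    extreme = total = 0
    # Every possible split of the pooled ranks into a group of size n1.
    for idx in combinations(range(n1 + n2), n1):
        r1 = sum(ranks[i] for i in idx)
        total += 1
        if abs(r1 - expected_r1) >= observed_dev - 1e-9:
            extreme += 1
    u1 = observed_r1 - n1 * (n1 + 1) / 2
    return u1, extreme / total


# Invented histological scores for cases with (x = 1) and without (x = 0) a rearrangement.
rearranged = [300, 280, 250, 200]
not_rearranged = [0, 10, 20, 60, 100]
u, p = mann_whitney_exact(rearranged, not_rearranged)
print(u, round(p, 4))
```

With complete separation between the groups, only the two most extreme of the 126 possible splits are as unbalanced as the observed one, so P = 2/126.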
Figure 3.19 demonstrates positive staining only for PD-L2 in A43075, corresponding to the focal amplification of PDCD1LG2.

Since both CBR translocations and focal gene amplifications appeared to be correlated with over-expression, we compared IHC histological scores between cases with and without these rearrangements. We found that samples harbouring either type of rearrangement were significantly associated with a higher histological score for the corresponding protein (Wilcoxon-Mann-Whitney P = 0.018 for PD-L1; P < 0.001 for PD-L2; Figure 3.20).

We next considered non-CBR translocations. Three cases harbouring these rearrangements had IHC data available. The case with a translocation in PDCD1LG2 intron 2 stained negatively for both ligands. Conversely, a case with a PDCD1LG2 3' translocation and a case with an intergenic translocation both stained positively for both ligands.

Cases harbouring intra-chromosomal rearrangements that disrupt the 3' UTR of a gene also appeared to be associated with increased protein expression. Both cases with deletions of the CD274 3' UTR for which IHC was available showed 100% positivity for PD-L1, and the case with a PDCD1LG2 3' deletion stained positively for PD-L2 in 90% of cells. A case with an inversion displacing the 3' UTR of CD274 also showed positivity for PD-L1 only (A43101 in Figure 3.19). IHC data were available for two of the three cases containing a 32 Mb inversion. Both cases were negative for PD-L2, and only one case was positive for PD-L1, even though both breakpoints of the two inversions lay in close proximity.

3.2.2 Interpretation of PDL rearrangements⁵

The SV analysis performed in this study represents the first comprehensive assessment of the PDL rearrangement landscape at base-pair resolution. Moreover, it represents the first analysis of B-cell lymphomas using FFPE tissue in conjunction with hybrid capture-based targeted sequencing.

We verified that CD274 and PDCD1LG2 rearrange promiscuously, and identified 19 novel translocations.
We further described breakpoint cluster regions CBR1 and CBR2 for the first time. Importantly, our observations are consistent with previous reports of PDL translocations that identified multiple breakpoints in the intron 1 regions of CD274 and PDCD1LG2 corresponding to the CBRs [15, 97]. Published studies have also reported non-CBR rearrangements in PDCD1LG2 with breakpoints in intron 6 and downstream of the gene, which are consistent with the 3' rearrangements identified here (i.e. CD274-MCM3 and PDCD1LG2-MID1) [101, 102]. Combined, these observations support the classification of translocations into three groups by breakpoint location (CBR1, CBR2 and non-CBR).

Immunohistochemistry data suggest that, by retaining the full coding sequence contained in exon 2 onwards, CBR rearrangements result in over-expression of the rearranged ligand. This is consistent with published literature [92, 101, 102]. Many of the partner genes for CBR rearrangements possess highly active promoters, such as CD44 and the previously reported CIITA. This supports a promoter swap mechanism as the likely cause of induced expression. Similarly, many of the partner regions are enriched in enhancer elements, such as the recurrent immunoglobulin partners IGK and IGLL5 and the previously reported IGH. This suggests that the introduction of enhancers upstream of the PDL gene regions may also result in expression of the ligands. Surface expression of PD-1 ligands has been demonstrated to inhibit T-cell activity in a subset of lymphomas, allowing tumour cells to evade immune attack [67, 97]. Taken together, these findings indicate the functional impact of the newly described CBR rearrangements.

⁵ A version of this section is included in the discussion in Chong, Twa et al. and was co-authored by David Twa.

Non-CBR translocations also account for a number of the observed rearrangements, and appear to have disparate functional effects.
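The three-way grouping of PDL translocations by breakpoint location (CBR1, CBR2, non-CBR) is essentially an interval lookup. The sketch below approximates CBR1 and CBR2 as intron 1 of CD274 and PDCD1LG2 respectively, as suggested by the prior reports cited above. The interval coordinates are placeholders, not hg19 intron boundaries, and would need to be taken from a gene annotation before use.

```python
from collections import Counter

# Placeholder intervals (chrom, start, end, label); NOT real hg19 intron coordinates.
CBR_INTERVALS = [
    ("chr9", 5_450_000, 5_460_000, "CBR1"),  # stands in for CD274 intron 1
    ("chr9", 5_510_000, 5_520_000, "CBR2"),  # stands in for PDCD1LG2 intron 1
]


def classify_breakpoint(chrom: str, pos: int) -> str:
    """Return 'CBR1', 'CBR2' or 'non-CBR' for the PDL-side breakpoint of a translocation."""
    for c, start, end, label in CBR_INTERVALS:
        if chrom == c and start <= pos <= end:
            return label
    return "non-CBR"


# Hypothetical PDL-side breakpoints.
breakpoints = [("chr9", 5_455_000), ("chr9", 5_515_500), ("chr9", 5_600_000)]
print(Counter(classify_breakpoint(c, p) for c, p in breakpoints))
```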
A subset of these rearrangements involve the 3' end of the gene and are predicted to cause removal of exon 7 and/or exon 6; translocations of this type appear to be associated with increased ligand expression by IHC. This may be the result of the exchange of cis-regulatory elements in the 3' region, removal of microRNA binding sites in the 3' UTR that allows persistence of mRNA transcripts, and possibly increased translational efficiency through altered secondary mRNA structure [16, 52, 81, 108, 119]. Since some of the 3' translocation partners include enhancer-rich regions such as IGH (previously reported), enhancer elements may also induce expression in this subset of translocations. Unfortunately, IHC data were unavailable for intergenic rearrangements, and these have yet to be functionally characterized.

A further consequence of PDL translocations may be the disruption of partner genes that function as tumour suppressors, such as CIITA, GET4, KIAA0226L and PTPN1 [31, 35, 42, 97].

We also discovered multiple intra-chromosomal rearrangements and observed recurrent deletions and inversions displacing the 3' UTR of the PDL genes. The effect of these SVs is likely similar to that of the 3' translocations (i.e. removal of miRNA binding sites and increased transcript half-life), and indeed these cases showed over-expression of the altered ligands by IHC. Interestingly, the recurrent large inversions seen in three cases had no consistent effect on protein expression, suggesting that they may arise because breaks are preferentially introduced in the partner regions, but have no functional consequence on PDL expression. Duplication of gene regions also appears to be a mechanism by which the ligands are over-expressed.

My third research question aimed to assess whether the specific rearrangements we identified had a functional impact or clinical relevance.
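Deciding whether an SV displaces the 3' UTR depends only on the gene model and the SV coordinates. In the sketch below, a breakpoint inside the 3' UTR (as for the inversions and 3' translocations) or a deletion that overlaps the 3' UTR while sparing the coding sequence is flagged. The `Gene` record, the gene name `GENE_X` and its coordinates are hypothetical; real values would come from a RefSeq annotation, and on minus-strand genes the 3' UTR lies at the low-coordinate end.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    strand: str     # "+" or "-"
    tx_start: int   # lowest transcript coordinate
    tx_end: int     # highest transcript coordinate
    cds_start: int  # lowest coding coordinate
    cds_end: int    # highest coding coordinate


def three_prime_utr(g: Gene) -> tuple[int, int]:
    """Genomic interval of the 3' UTR, lowest coordinate first."""
    if g.strand == "+":
        return (g.cds_end + 1, g.tx_end)
    return (g.tx_start, g.cds_start - 1)


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def breakpoint_in_3utr(g: Gene, pos: int) -> bool:
    start, end = three_prime_utr(g)
    return start <= pos <= end


def deletion_removes_3utr(g: Gene, del_start: int, del_end: int) -> bool:
    """Deletion overlaps the 3' UTR but leaves the coding sequence intact."""
    deleted = (del_start, del_end)
    return overlaps(deleted, three_prime_utr(g)) and not overlaps(deleted, (g.cds_start, g.cds_end))


gene = Gene("GENE_X", "+", 1_000, 5_000, 1_200, 4_000)  # hypothetical gene model
print(deletion_removes_3utr(gene, 4_500, 6_000))  # spares the CDS
print(deletion_removes_3utr(gene, 3_500, 4_500))  # also removes coding sequence
print(breakpoint_in_3utr(gene, 4_200))
```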
Our observed SV patterns and the associated IHC data demonstrated that specific patterns were indeed associated with significant differences at the expression level. When considered with the existing literature, our findings suggest that both CBR translocations and intra-chromosomal rearrangements are likely to result in inhibition of the T-cell response and an immune privilege phenotype.

Figure 3.21: Concordance between FISH and capture-based findings in the PDL region. (A) Overlap by SV: of the 36 structural variants discovered by capture sequencing, 23 matched FISH break-apart patterns and 13 did not match or were below FISH resolution. Nine cases were FISH break-apart positive for PDL and contained no matching SVs found by sequencing. (B) Overlap by case: of the 68 cases, 55 had a concordant rearrangement status: 35 were break-apart negative and had no predicted SVs; 19 were break-apart positive and had matching SVs identified in sequencing; and 1 was break-apart positive and harboured an SV that did not match the FISH pattern. The other 13 samples had discordant findings between the two methods, with 8 being break-apart positive by FISH and negative in capture, and 5 with SVs detected by capture but producing a normal FISH pattern.

3.2.3 Concordance with FISH⁶

To assess the consistency between the classical FISH-based approach and the novel capture-based sequencing approach, we compared the PDL results from each on a per-SV and per-case basis (Figure 3.21).

On a per-case basis, 79% (54/68) of the samples contained concordant results from FISH and sequencing. That is, the FISH break-apart signal pattern matched the SVs discovered by sequencing, or a normal FISH pattern was observed in a case with no rearrangements. Concordant and non-concordant samples showed no significant difference in either tissue preparation type (Wilcoxon-Mann-Whitney P = 0.43) or sequencing batch (Wilcoxon-Mann-Whitney P = 0.63). When comparing the validated SV predictions, 36% (13/36) of the SVs were identified in capture but not with FISH. These 13 events were all intra-chromosomal rearrangements, most of which were too small to be detected using break-apart assays. However, two were deletions that were expected to result in the loss of one red signal but did not demonstrate this pattern.

⁶ A version of this section is included in the results and discussion of Chong, Twa et al. and was co-authored by David Twa.

An interesting example came from A43037, where a focal amplification of both gene regions was observed that fell between the two FISH probe regions (Figure 3.18). To investigate this further, we performed an additional FISH assay that included a third, aqua-labelled probe region located between the red and green regions (Figure 3.22⁷). Nuclei in this sample showed a brighter aqua signal than red or green on both alleles, indicating that the amplification is indeed focal to the aqua-labelled region and is also homozygous. The labelled cells also showed minimal separation between the red and green probes due to the relatively small size of the amplified region, which explains why this sample was classified as break-apart normal. IHC for A43037 was positive for both PD-L1 and PD-L2, indicating that this micro-amplification is associated with over-expression of both ligands.

There are many possible explanations for discordance between the FISH and sequencing results. Rearrangements suggested by FISH may be missed in capture sequencing if they are located outside of the defined target space, or if they occur in highly repetitive regions where mapping is difficult and low coverage limits the ability to detect SVs (Figure 3.8).
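The interplay between coverage and clone size can be made concrete with a back-of-the-envelope model. Assume a diploid locus, an SV on one allele in a fraction f of the sequenced cells, and a local depth D; the expected number of breakpoint-supporting reads is then roughly λ = D·f/2, and under Poisson sampling the chance of seeing at least k supporting reads is 1 − P(X < k). The depths and the k = 5 read threshold are hypothetical, and real FFPE capture data depart from Poisson sampling (PCR duplicates, mapping bias in repeats).

```python
import math


def expected_support(depth: float, cell_fraction: float, copies: int = 2) -> float:
    """Expected breakpoint-supporting reads for a one-allele SV present in `cell_fraction` of cells."""
    return depth * cell_fraction / copies


def prob_detect(depth: float, cell_fraction: float, min_reads: int = 5) -> float:
    """P(at least `min_reads` supporting reads) when read counts are Poisson(lambda)."""
    lam = expected_support(depth, cell_fraction)
    p_below = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(min_reads))
    return 1.0 - p_below


# A low-frequency clone (5% of cells) at two hypothetical depths.
for depth in (100, 1_000):
    print(depth, round(prob_detect(depth, 0.05), 3))
```

Under these assumptions, an SV carried by 5% of cells has only about an 11% chance of reaching five supporting reads at 100× but is almost certain to at 1,000×, which is the low-frequency scenario discussed in this section.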
Conversely, SVs identified in capture sequencing may be missed using FISH due to the limited resolution of break-apart assays for smaller or complex rearrangements (as demonstrated for A43037). Differences may also be caused by intra-tumoural heterogeneity. Rearrangements in small clones may be present at low frequency (i.e. < 5%), so that the cells display a normal FISH pattern, yet be detectable by sequencing due to high coverage depth. Spatial heterogeneity within tumour tissue may also be a factor, where sampling bias may result in the two assays using samples with different clonal composition.

The presence of SVs that cannot be detected by FISH suggests the added value of performing capture-based sequencing analysis for lymphoma samples. Our assessment of the focal amplification in A43037 demonstrates that SVs below FISH resolution may have important functional consequences.

⁷ The schematic and FISH image in this figure were generated by Susana Ben-Neriah, and the IHC images were generated by Anja Mottok.

Figure 3.22: Schematic of the FISH assay used to investigate a focal amplification in A43037. (A) A third set of aqua-labelled FISH probes was designed to hybridize between the red and green regions. This labelled region overlapped with the capture space. (B) A representative stained nucleus from A43037 showed a brighter aqua signal (labelled with aqua arrows) than the red and green signals (labelled with white arrows), indicating that the amplification is focal to this region and is homozygous. The close proximity between the red and green signals demonstrates why the cells were not classified as break-apart positive. IHC staining in this case was positive for both (C) PD-L1 and (D) PD-L2 surface expression. Scale bar = 25 µm.
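The per-case comparison in Figure 3.21 reduces to two binary calls per case: FISH break-apart status and whether capture sequencing detected an SV. Following the figure legend, a break-apart-positive case with a non-matching SV still counts as concordant for rearrangement status. The case records below are invented, and the function is a sketch rather than the analysis code used for this thesis.

```python
from collections import Counter


def concordance(cases: dict[str, tuple[bool, bool]]) -> tuple[Counter, float]:
    """`cases` maps case ID -> (FISH break-apart positive?, capture SV detected?)."""
    labels = Counter()
    for fish_pos, capture_pos in cases.values():
        if fish_pos and capture_pos:
            labels["+/+"] += 1           # includes SVs that do not match the FISH pattern
        elif not fish_pos and not capture_pos:
            labels["-/-"] += 1
        elif fish_pos:
            labels["FISH only"] += 1
        else:
            labels["capture only"] += 1
    rate = (labels["+/+"] + labels["-/-"]) / len(cases)
    return labels, rate


# Invented example: one case of each category.
cases = {"C1": (True, True), "C2": (False, False), "C3": (True, False), "C4": (False, True)}
labels, rate = concordance(cases)
print(dict(labels), rate)
```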
Conversely, the large inversions we observed in three cases produced a FISH break-apart signal but did not affect ligand expression by IHC. This highlights the importance of characterizing rearrangement type when assessing the potential impact of PDL rearrangements. Our results indicate that performing FISH, capture sequencing and IHC in conjunction may be an improved method for rearrangement detection and interpretation. This would maximize the sensitivity and resolution of SV detection and allow for an informed analysis of the functional consequences of genomic rearrangements.

3.3 CIITA and SOCS1 rearrangements

We next examined whether the chromosome 16 rearrangements at the CIITA and SOCS1 loci also showed recurrent rearrangement patterns.

3.3.1 Patterns of rearrangement

After filtering, we generated a list of 145 predicted CIITA rearrangements (Table B.4). We further selected a list of 82 high-confidence SVs from 36 cases as described in Section 2.4 (Table 3.2). This consisted of 16 translocations, 44 deletions, 19 inversions and 3 duplications. A detailed description of the criteria used for this selection can be found in Section … and Section ….

In contrast to our PDL analysis, we validated only selected translocation events in our CIITA results. Given the high validation rate we observed in our high-confidence PDL SV list, we expected our selected CIITA SV predictions to be true with similar specificity, which avoided the need for intensive validation of all 82 results.

Table 3.2: High-confidence predicted SVs in the chr16 capture region. Partner genes shown are those closest to the breakpoint in RefSeq. When the partner sequence mapped to multiple locations, the one with support for the reciprocal event was chosen; if none had reciprocal support, the one that is most biologically plausible based on strand orientation was chosen. Empty cells in the 'Validation' column indicate validation was not attempted. Breakpoint coordinates are based on hg19. BA: break-apart positive, DEL: deletion, DUP: duplication, FISH: fluorescence in situ hybridization, INV: inversion, N*: failed validation by dual fusion FISH, TRA: translocation, Y: Sanger-sequencing validated, Y*: validated by dual fusion FISH.

| Sample | Type | Position 1 | Position 2 | Gene 1 | Gene 2 | FISH | Validation | Region |
|---|---|---|---|---|---|---|---|---|
| A43029 | INV | chr16:11,201,073 | chr16:11,785,708 | CLEC16A | TXNDC11 | BA |  | Other |
| A43030 | INV | chr16:11,348,883 | chr16:11,351,573 | SOCS1 | SOCS1 | BA |  | Other |
|  | DEL | chr16:10,947,070 | chr16:11,116,222 | CIITA | CLEC16A |  |  | Other |
|  | DEL | chr16:11,349,015 | chr16:11,349,193 | SOCS1 | SOCS1 |  |  | Other |
|  | TRA | chr14:106,326,048 | chr16:11,349,135 | IGH | SOCS1 |  |  | SOCS1 translocation |
| A43031 | INV | chr16:11,013,706 | chr16:11,033,187 | CIITA | DEXI | BA |  | Other |
| A43036 | INV | chr16:10,972,119 | chr16:11,349,103 | CIITA | SOCS1 | BA |  | Other |
|  | TRA | chr16:10,972,522 | chrX:41,548,791 | CIITA | GPR34 |  | Y | CIITA translocation |
|  | DEL | chr16:10,972,770 | chr16:10,973,118 | CIITA | CIITA |  |  | CIITA intron 1 |
| A43037 | DEL | chr16:10,972,948 | chr16:10,973,706 | CIITA | CIITA | BA |  | CIITA intron 1 |
|  | INV | chr16:11,062,836 | chr16:57,168,099 | CLEC16A | CPNE2 |  |  | Other |
|  | INV | chr16:11,063,029 | chr16:30,683,452 | CLEC16A | FBRS |  |  | Other |
|  | INV | chr16:10,537,483 | chr16:11,080,769 | ATF7IP2 | CLEC16A |  |  | Other |
|  | INV | chr16:11,110,612 | chr16:28,448,129 | CLEC16A | EIF3C |  |  | Other |
|  | DUP | chr16:11,106,289 | chr16:21,748,652 | CLEC16A | OTOA |  |  | Other |
|  | INV | chr16:11,059,965 | chr16:21,428,484 | CLEC16A | NPIPB3 |  |  | Other |
| A43043 | DEL | chr16:10,972,128 | chr16:10,972,316 | CIITA | CIITA | Normal | Y | CIITA intron 1 |
| A43049 | INV | chr16:3,056,935 | chr16:10,966,595 | CLDN9 | CIITA | BA |  | Other |
| A43050 | DEL | chr16:10,972,395 | chr16:10,973,044 | CIITA | CIITA | BA |  | CIITA intron 1 |
| A43051 | DEL | chr16:7,638,085 | chr16:10,972,040 | RBFOX1 | CIITA | BA |  | Other |
| A43052 | TRA | chr16:10,974,001 | chr2:61,108,477 | CIITA | REL | BA | Y | CIITA translocation |
|  | TRA | chr16:10,972,714 | chr2:89,159,665 | CIITA | IGK |  | Y | CIITA translocation |
|  | DEL | chr16:10,972,769 | chr16:10,973,286 | CIITA | CIITA |  |  | CIITA intron 1 |
| A43067 | INV | chr16:10,973,601 | chr16:27,326,617 | CIITA | IL4R | BA |  | Other |
| A43068 | DEL | chr16:10,962,704 | chr16:11,310,352 | CIITA | CLEC16A | BA |  | Other |
|  | TRA | chr16:11,348,887 | chr14:106,211,708 | SOCS1 | IGH |  |  | SOCS1 translocation |
| A43069 | TRA | chr22:39,854,860 | chr16:10,973,113 | MGAT3 | CIITA | BA | Y | CIITA translocation |
| A43070 | DEL | chr16:10,983,031 | chr16:11,812,699 | CIITA | TXNDC11 | BA |  | Other |
|  | TRA | chr16:10,972,750 | chr1:2,985,148 | CIITA | PRDM16 |  | Y | CIITA translocation |
|  | DEL | chr16:11,215,236 | chr16:11,480,301 | CLEC16A | RMI2 |  |  | Other |
|  | INV | chr16:11,215,206 | chr16:11,480,274 | CLEC16A | RMI2 |  |  | Other |
| A43071 | DEL | chr16:10,756,488 | chr16:11,339,356 | TEKT5 | SOCS1 | BA |  | Other |
| A43072 | DEL | chr16:10,861,924 | chr16:10,996,431 | NUBP1 | CIITA | BA |  | Other |
| A43075 | DEL | chr16:11,348,417 | chr16:11,349,240 | SOCS1 | SOCS1 | BA |  | Other |
|  | TRA | chr16:10,973,178 | chr12:8,764,607 | CIITA | AICDA |  | Y | CIITA translocation |
|  | DEL | chr16:10,972,691 | chr16:10,972,983 | CIITA | CIITA |  |  | CIITA intron 1 |
| A43076 | INV | chr16:11,037,301 | chr16:11,352,433 | CLEC16A | RMI2 | BA |  | Other |
|  | DEL | chr16:10,982,313 | chr16:12,374,408 | CIITA | SNX29 |  |  | Other |
|  | DUP | chr16:10,972,350 | chr16:10,972,662 | CIITA | CIITA |  |  | CIITA intron 1 |
| A43077 | DEL | chr16:10,972,806 | chr16:12,062,507 | CIITA | TNFRSF17 | BA |  | Other |
| A43078 | TRA | chr8:128,808,741 | chr16:10,972,594 | PVT1 | CIITA | BA | Y | CIITA translocation |
|  | DEL | chr16:10,972,600 | chr16:10,972,919 | CIITA | CIITA |  |  | CIITA intron 1 |
|  | DEL | chr16:10,972,500 | chr16:10,973,104 | CIITA | CIITA |  |  | CIITA intron 1 |
| A43079 | TRA | chr16:10,972,800 | chr10:48,989,246 | CIITA | GLUD1P7 | BA |  | CIITA translocation |
|  | DEL | chr16:10,973,408 | chr16:10,973,892 | CIITA | CIITA |  |  | CIITA intron 1 |
|  | TRA | chr14:106,325,853 | chr16:11,349,095 | IGH | SOCS1 |  |  | SOCS1 translocation |
| A43080 | DEL | chr16:10,973,122 | chr16:11,348,838 | CIITA | SOCS1 | BA |  | Other |
|  | DEL | chr16:10,971,699 | chr16:10,972,446 | CIITA | CIITA |  |  | CIITA intron 1 |
| A43081 | DEL | chr16:10,971,906 | chr16:10,973,504 | CIITA | CIITA | BA |  | CIITA intron 1 |
|  | DEL | chr16:10,971,940 | chr16:10,972,143 | CIITA | CIITA |  |  | CIITA intron 1 |
|  | INV | chr16:10,972,733 | chr16:10,973,111 | CIITA | CIITA |  |  | CIITA intron 1 |
|  | TRA | chr10:48,150,225 | chr16:10,973,648 | CTSL1P2 | CIITA |  | N* | CIITA translocation |
| A43082 | DEL | chr16:11,349,119 | chr16:11,349,550 | SOCS1 | SOCS1 | Normal |  | Other |
|  | DEL | chr16:10,972,561 | chr16:10,972,737 | CIITA | CIITA |  |  | CIITA intron 1 |
|  | DEL | chr16:10,972,338 | chr16:10,972,492 | CIITA | CIITA |  |  | CIITA intron 1 |
|  | DEL | chr16:10,972,316 | chr16:10,972,598 | CIITA | CIITA |  |  | CIITA intron 1 |
| A43084 | DEL | chr16:10,972,316 | chr16:10,972,598 | CIITA | CIITA | Normal |  | CIITA intron 1 |
| A43090 | DEL | chr16:11,179,301 | chr16:11,180,645 | CLEC16A | CLEC16A | Normal |  | Other |
|  | DEL | chr16:11,348,503 | chr16:11,348,970 | SOCS1 | SOCS1 |  |  | Other |
| A43092 | TRA | chr8:128,749,148 | chr16:11,349,126 | MYC | SOCS1 | Normal |  | SOCS1 translocation |
| A43093 | DEL | chr16:10,973,558 | chr16:10,975,142 | CIITA | CIITA | Normal |  | CIITA intron 1 |
|  | INV | chr16:10,971,536 | chr16:10,972,727 | CIITA | CIITA |  |  | CIITA intron 1 |
|  | DEL | chr16:10,797,444 | chr16:11,084,314 | NUBP1 | CLEC16A |  |  | Other |
|  | INV | chr16:10,971,870 | chr16:10,972,644 | CIITA | CIITA |  |  | CIITA intron 1 |
|  | DUP | chr16:10,972,945 | chr16:10,973,145 | CIITA | CIITA |  |  | CIITA intron 1 |
| A43094 | DEL | chr16:9,992,019 | chr16:11,348,834 | GRIN2A | SOCS1 | Normal |  | Other |
| A43095 | DEL | chr16:10,972,127 | chr16:10,974,070 | CIITA | CIITA | Normal |  | CIITA intron 1 |
|  | DEL | chr16:11,348,681 | chr16:11,349,373 | SOCS1 | SOCS1 |  |  | Other |
|  | DEL | chr16:10,972,015 | chr16:10,972,312 | CIITA | CIITA |  |  | CIITA intron 1 |
|  | INV | chr16:10,973,186 | chr16:10,973,372 | CIITA | CIITA |  |  | CIITA intron 1 |
|  | DEL | chr16:10,974,425 | chr16:10,974,698 | CIITA | CIITA |  |  | CIITA intron 1 |
|  | DEL | chr16:10,972,020 | chr16:10,974,069 | CIITA | CIITA |  |  | CIITA intron 1 |
| A43097 | DEL | chr16:10,971,586 | chr16:10,971,998 | CIITA | CIITA | Normal | Y | CIITA intron 1 |
| A43101 | INV | chr16:10,971,827 | chr16:10,973,681 | CIITA | CIITA | Normal |  | CIITA intron 1 |
| A43110 | DEL | chr16:10,810,673 | chr16:11,143,548 | NUBP1 | CLEC16A | Normal |  | Other |
| A43115 | DEL | chr16:10,972,963 | chr16:10,973,158 | CIITA | CIITA | BA |  | CIITA intron 1 |
|  | INV | chr16:10,971,282 | chr16:10,971,688 | CIITA | CIITA |  |  | CIITA intron 1 |
|  | DEL | chr16:11,349,402 | chr16:11,349,494 | SOCS1 | SOCS1 |  |  | Other |
|  | TRA | chr16:10,980,174 | chr7:128,309,229 | CIITA | FAM71F2 |  |  | CIITA translocation |
|  | TRA | chr10:48,986,597 | chr16:10,973,066 | GLUD1P7 | CIITA |  | Y* | CIITA translocation |
| A43117 | DEL | chr16:10,972,620 | chr16:10,973,536 | CIITA | CIITA | Normal |  | CIITA intron 1 |
| A43119 | TRA | chr14:106,094,545 | chr16:10,973,001 | IGH | CIITA | BA |  | CIITA translocation |

3.3.1.1 Translocation cluster breakpoint regions in CIITA intron 1 and SOCS1

In total, 36 translocation predictions were identified in the two capture regions on chromosome 16. This list contained 16 unique high-confidence rearrangements. The remaining 20 translocations consisted of 11 reciprocal breakpoints, 8 alternate mappings of the same rearrangement, and 1 low-confidence rearrangement that had only 7 supporting reads, was detected in a single result-set, and mapped to multiple locations in BLAT. When translocations had alternate mappings of the unknown partner region, the prediction with support for the reciprocal event was chosen for reporting; if none had reciprocal support, the prediction that was most biologically plausible based on strand orientation was selected (i.e. avoiding predicted dicentric derivative chromosomes). The 16 novel translocations came from 13 samples and included 12 breakpoints in the CIITA region and 4 in the SOCS1 region (Figure 3.23).

There are two clear clusters of translocation breakpoints within the chromosome 16 capture space. The first of these is located in CIITA intron 1 and contains 12 breakpoints (Figure 3.24). The identity of the partners in this region confirms that CIITA rearranges promiscuously. No exact breakpoints were recurrent, but two immunoglobulin partners were observed (IGK and IGH). Furthermore, 3 translocations had their partner region on chromosome 10 (GLUD1P7 and CTSL1P2), and all 3 had breakpoint sequences that mapped to multiple locations on chromosome 10 with nearly 100% identity. Two of these were selected for further investigation using a dual fusion FISH assay (Figure 3.25⁸). The t(10;16) prediction in A43115 was successfully validated, but the prediction in A43081 failed validation. This suggests that the predicted partner region in A43081 may be incorrect, and that the real partner may be one of the alternate mappings. Additional assays will be required to identify the correct partner in this case.
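The rule for choosing among alternate partner mappings (prefer reciprocal support; otherwise choose a strand orientation that avoids a dicentric derivative) can be sketched as follows. Each breakend's orientation determines which segment of that chromosome is retained on the derivative. A balanced exchange is plausible only if exactly one of the two retained segments carries a centromere: a junction joining two centromeric segments is dicentric, and one joining two acentric segments implies a dicentric reciprocal derivative. The `Breakend`/`Mapping` records, the approximate centromere midpoints and the candidate mappings are illustrative assumptions, not the selection code used in this study.

```python
from dataclasses import dataclass


@dataclass
class Breakend:
    chrom: str
    pos: int
    keeps: str  # "left": derivative keeps pter..pos of this chromosome; "right": keeps pos..qter


@dataclass
class Mapping:
    a: Breakend
    b: Breakend
    reciprocal_support: bool
    reads: int


# Approximate hg19 centromere midpoints, for illustration only.
CENTROMERE_MID = {"chr10": 40_750_000, "chr16": 36_800_000}


def keeps_centromere(be: Breakend) -> bool:
    cen = CENTROMERE_MID[be.chrom]
    return cen < be.pos if be.keeps == "left" else cen > be.pos


def monocentric(m: Mapping) -> bool:
    """Exactly one retained segment carries a centromere."""
    return keeps_centromere(m.a) != keeps_centromere(m.b)


def choose_mapping(candidates: list[Mapping]) -> Mapping:
    """Prefer reciprocal support, then a monocentric configuration, then higher read support."""
    pool = ([m for m in candidates if m.reciprocal_support]
            or [m for m in candidates if monocentric(m)]
            or candidates)
    return max(pool, key=lambda m: m.reads)


# Hypothetical alternate chr10 mappings for one chr16 breakend on the short arm.
cands = [
    Mapping(Breakend("chr16", 10_973_000, "left"), Breakend("chr10", 48_990_000, "right"), False, 12),
    Mapping(Breakend("chr16", 10_973_000, "left"), Breakend("chr10", 48_150_000, "left"), False, 9),
]
best = choose_mapping(cands)
print(best.b.chrom, best.b.pos)
```

In this invented example the better-supported mapping is rejected because neither retained segment carries a centromere, so the reciprocal derivative would be dicentric.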
The CIITA cluster breakpoint region we observed is consistent with previous reports from our group that described 10 additional translocations occurring within the first 3 kb of intron 1 in PMBCLs [68, 97].

The other breakpoint cluster occurred in the SOCS1 gene locus, with all breakpoints falling in the coding region of exon 2 (Figure 3.26). IGH was a recurrent partner region in SOCS1 rearrangements (3/4 translocations). Previous reports have demonstrated frequent inactivating SOCS1 mutations in multiple lymphoma types, but have never described an enrichment for translocation breakpoints in this region [65, 87, 111].

⁸ The FISH image in this figure was generated by Susana Ben-Neriah.

Figure 3.23: Location of translocation breakpoints throughout the chr16 capture space. (A) Circos plot demonstrating the location of translocation partners across the genomic space. Black arches represent translocations, and green boxes show the target space. Dot size shows the frequency of each rearrangement, with the smallest dots representing a single event. Black dots were Sanger-validated, red dots failed validation, and grey dots were not validated. (B) Location of chromosome 16 translocation breakpoints. X-axis represents genomic coordinates. Upper track shows the breakpoint locations (inverted triangles) with the partner region labelled above. Bottom track shows gene models, and shaded grey regions represent the target space.

Figure 3.24: Location of translocation breakpoints throughout the CIITA intron 1 region. X-axis represents genomic coordinates and dashed grey lines show the borders of intron 1. Inverted triangles show exact breakpoint locations with partner regions labelled above.

Figure 3.25: Dual fusion FISH assay used to validate a t(10;16) prediction in A43115. Green and red probes are bound to regions spanning the breakpoints in the (A) CIITA region on chromosome 16 and the (B) GLUD1P7 region on chromosome 10. Dashed lines represent predicted breakpoint locations. (C) In a non-rearranged allele, red and green probes are located in distinct spatial regions on the normal chromosomes (red and green arrows). If a translocation occurs within the probe binding regions, the green and red signals will be joined on the two derivative chromosomes (white arrows). A representative nucleus in A43115 shows that the sample has one normal allele (separate red and green signals) and one translocated allele (two merged signals).

Figure 3.26: Location of translocation breakpoints throughout the SOCS1 gene region. X-axis represents genomic coordinates. Inverted triangles represent breakpoint locations with partner regions labelled above. Grey shaded region represents the capture space and the SOCS1 gene model is shown below.

After analysing the translocation products, we found two rearrangements with the potential to create a fusion protein: CIITA-REL and CIITA-PRDM16. In both cases, the breakpoints occurred in intron 1 of CIITA and in the region upstream of the partner gene. This would cause the CIITA intronic region to extend to the splice acceptor site at exon 2 of the partner gene, placing exon 1 of CIITA in frame upstream of the partner's coding exon 2. The putative chimeric proteins would consist of the first 17 amino acids of CIITA encoded in exon 1, followed by all amino acids of the partner protein from exon 2 onwards. The CIITA-GPR34 and CIITA-FAM71F2 rearrangements were predicted to create truncated fusion transcripts, since they joined CIITA exon 1 to partner exons in a different reading frame, in both cases introducing stop codons shortly after the fusion.
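Whether a fusion keeps the 3' partner in frame is simple modular arithmetic. If the 5' partner contributes n coding nucleotides before the junction, the 3' partner is read in frame when n mod 3 equals the number of the partner's own coding bases preceding the joined exon, mod 3 (called the exon phase here; the GFF3 CDS "phase" column uses a different convention). For the CD44-CD274 rearrangement described earlier, 21 CD44 codons plus 5 codons from the normally untranslated part of CD274 exon 2 give 78 nt ahead of the PD-L1 start codon, whose phase is 0, so the frame is preserved. The same reasoning says a deletion confined to coding sequence, such as the in-frame SOCS1 deletion reported in this chapter, keeps the frame when it removes a multiple of 3 nt. Values other than the CD44-CD274 example are hypothetical, and a frame check alone does not rule out stop codons in the retained sequence.

```python
def fusion_in_frame(upstream_coding_nt: int, downstream_exon_phase: int) -> bool:
    """In frame when the 5'-partner coding length matches the 3'-partner exon phase (mod 3)."""
    if downstream_exon_phase not in (0, 1, 2):
        raise ValueError("exon phase must be 0, 1 or 2")
    return upstream_coding_nt % 3 == downstream_exon_phase


def deletion_in_frame(deleted_coding_nt: int) -> bool:
    """A deletion confined to coding sequence keeps the frame if it removes whole codons."""
    return deleted_coding_nt % 3 == 0


# CD44-CD274 example from the text: 21 CD44 codons + 5 CD274 UTR codons before the PD-L1 ATG.
print(fusion_in_frame(21 * 3 + 5 * 3, 0))
# Hypothetical: 52 upstream coding bases joined to a phase-0 exon shift the frame.
print(fusion_in_frame(52, 0))
print(deletion_in_frame(36), deletion_in_frame(40))
```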
These transcripts are highly unlikely to result in functional protein products. The remainder of the translocations joined CIITA or SOCS1 to partner genes in opposite orientations or to intergenic space.

In all CIITA translocations, the coding sequence of CIITA was disrupted between coding exons 1 and 2. Similarly, all SOCS1 translocations disrupted the coding sequence in exon 2 and are expected to be inactivating.

3.3.1.2 Intra-chromosomal rearrangements

There were 109 predicted intra-chromosomal rearrangements (deletions, inversions and duplications) detected in the filtered results. Among these were 66 unique SVs that we classified as high-confidence, including 44 deletions, 19 inversions and 3 duplications (Figure 3.27). The remaining 43 SVs comprised 8 reciprocal inversion events; 3 alternate mappings of breakpoints in repetitive regions; 7 predictions that were deemed low-confidence because they had only 7 reads of support, were found in a single result-set, were typically smaller than 1 kb, and had no predicted functional impact (e.g. deletions contained within an intronic region); and 25 whole-intron deletions in A43117, which are discussed further in this section and are unlikely to represent real genomic deletions.

Thirty-two of the 66 high-confidence intra-chromosomal rearrangements (48%) were small events contained within CIITA intron 1 (labelled as "CIITA intron 1" in Table 3.2), and most of these fell within the first 3 kb of the intron (Figure 3.28). This is consistent with a report from our group that showed recurrent single nucleotide variants (SNVs) and small indels in nearly half of the PMBCL tumours profiled [68]. Sequence analysis of this region showed that mutations were enriched in AID target motifs, suggesting that aberrant somatic hypermutation is likely responsible for the alterations seen here.

An interesting case was A43071, where we identified 26 unique deletions.
Upon inspection, all but one of these had coordinates ending at the borders of CIITA introns (Figure 3.29). This strongly suggested that the library suffered from RNA contamination during extraction, and that these deletions represented the splicing out of intronic regions in processed mRNA. This pattern was not observed in any other case. Improved distinction between DNA and RNA is another feature of the GSC's new automated extraction method (discussed in Section 3.1.1).

We saw frequent intra-chromosomal SVs in the CIITA gene locus that were expected to abrogate protein expression. Twelve deletions were observed: 6 deleted the full CIITA gene and 6 deleted either the start or the end of the coding sequence. One of these partial coding deletions resulted in a putative NUBP1-CIITA fusion protein joining the first 10 exons of NUBP1 to CIITA exon 8 onwards in A43072. A similar deletion with breakpoints in NUBP1 intron 9 and CIITA intron 1 has been previously reported by our group, and was shown to cause reduced CIITA expression due to the weaker activity of the NUBP1 promoter [68]. In our case, the combined reduction in promoter activity and removal of additional CIITA coding sequence are likely to produce a non-functional CIITA protein.

Figure 3.27: Location of intra-chromosomal rearrangements in the chr16 capture space. Samples are listed on the y-axis and genomic space is shown across the x-axis. Boxes represent the boundaries of predicted rearrangements, with red boxes representing deletions, purple showing duplications and yellow showing inversions. Gene models are shown above. Grey rectangles represent the target space.
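The signature seen in A43071, where apparent deletions begin and end at intron boundaries, can be screened for by comparing each deletion against annotated introns. The sketch assumes a list of intron intervals (the values below are invented, not CIITA coordinates) and a few base pairs of tolerance, since SV callers differ on whether they report the last retained or the first deleted base. A library in which a large fraction of deletions are splice-like would be a candidate for review for RNA contamination.

```python
def matches_intron(deletion: tuple[int, int], introns: list[tuple[int, int]], tol: int = 5) -> bool:
    """True if both deletion ends lie within `tol` bp of the boundaries of the same intron."""
    start, end = deletion
    return any(abs(start - i_start) <= tol and abs(end - i_end) <= tol
               for i_start, i_end in introns)


def splice_like_fraction(deletions: list[tuple[int, int]],
                         introns: list[tuple[int, int]], tol: int = 5) -> float:
    """Fraction of a library's deletions that look like spliced-out introns."""
    if not deletions:
        return 0.0
    hits = sum(matches_intron(d, introns, tol) for d in deletions)
    return hits / len(deletions)


# Invented intron intervals and deletion calls for one library.
introns = [(1_100, 1_900), (2_050, 3_000), (3_120, 4_400)]
deletions = [(1_101, 1_899), (2_050, 3_001), (3_120, 4_400), (2_500, 2_600)]
print(splice_like_fraction(deletions, introns))
```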
Two large inversions with a breakpoint in CIITA intron 1 were also observed, and another inversion occurred upstream of the gene locus, displacing the CIITA promoter region.

The SOCS1 locus also harboured frequent intra-chromosomal rearrangements (Figure 3.30). Three deletions of the full gene were observed. Another deletion removed the entire coding sequence between the UTRs, and five additional deletions removed either the start or the end of the coding region. One in-frame deletion within the coding sequence was observed, but it is predicted to abrogate protein function by altering the SH2 domain that allows SOCS1 to bind to target proteins [38]. Two inversions that disrupted the SOCS1 coding sequence were also observed.

Alterations of the CLEC16A gene, which lies between CIITA and SOCS1, were also frequent, with 7 full-gene deletions, 3 partial deletions and 7 inversions within the coding region identified.

Figure 3.28: Location of intra-chromosomal rearrangements within CIITA intron 1. Dashed grey lines represent the borders of intron 1. Individual samples are listed on the y-axis, with the x-axis representing genomic coordinates.

Figure 3.29: Intronic CIITA deletions observed in A43071. Deletions were identified in A43071 corresponding to intronic boundaries, suggesting mRNA contamination during library preparation. X-axis shows genomic coordinates and red rectangles show deletions, with edges representing boundaries. The CIITA gene model is shown above.

Figure 3.30: Location of intra-chromosomal rearrangements in the SOCS1 gene locus. Samples are listed on the y-axis and genomic coordinates are shown across the x-axis. Boxes represent the boundaries of predicted rearrangements, with red boxes representing deletions, purple showing duplications and yellow showing inversions. The SOCS1 gene model is shown above. Grey shading represents the target space.

3.3.2 Interpretation of CIITA and SOCS1 rearrangements⁹

After identifying recurrent patterns in CIITA and SOCS1 rearrangements, we assessed whether different rearrangement types were likely to have a functional impact.

Our translocation findings in CIITA are consistent with previous reports in the literature: CIITA rearranges promiscuously, with an enrichment for breakpoints in CIITA intron 1 [68, 97]. Our group has demonstrated that alterations in this region are enriched in AID target motifs, suggesting that aberrant SHM is the likely cause of the DSBs that facilitate these translocations [68]. This is also the likely mechanism by which the large number of small intra-chromosomal events we observed within intron 1 arise.

One functional consequence of CIITA intron 1 translocations is the disruption of the CIITA locus, most often placing the majority of the coding sequence downstream of an intergenic region and preventing the production of CIITA protein. As CIITA is the master transcriptional regulator of MHC class II expression, its loss has been shown to result in decreased expression of MHC class II surface molecules [97]. Decreased expression of MHC class II components is associated with inferior survival in cHL and PMBCL, and allows tumour cells to escape immune surveillance [21, 82, 97].

⁹ Interpretation was performed by myself and Anja Mottok.
This is also the likely functional consequence of the recurrent intra-chromosomal rearrangements we observed, namely deletions and inversions that remove all or part of the CIITA coding sequence.

Translocations in CIITA intron 1 can have a second functional impact arising from the juxtaposition of promoter regions. The CIITA promoter is highly active in B-cells, so placing it upstream of a partner gene can result in over-expression of the partner protein. This has been observed previously in cases with CIITA-CD274 and CIITA-PDCD1LG2 translocations. In those cases, over-expression of the PD-1 ligands was demonstrated and is known to correspond to an immune privilege phenotype [97]. Here we observed two translocations that resulted in a promoter swap and production of a putative chimeric protein. Notably, both of the partners in these cases are known proto-oncogenes: REL and PRDM16. Alterations in the REL locus have been described in multiple lymphomas, including cHL and PMBCL [8, 103, 110]. The REL gene encodes the c-REL protein, a subunit of the NF-κB complex that regulates cell proliferation and survival [19]. Frequent amplifications of the REL gene are associated with nuclear expression of c-REL, which indicates activation of the NF-κB signalling pathway and leads to tumour cell survival and proliferation. Over-expression of c-REL through the CIITA promoter swap mechanism may have the same functional effect, leading to constitutive NF-κB signalling. PRDM16 rearranges with RPN1 and other partners in a subset of adult and pediatric acute myeloid leukemias, leading to PRDM16 over-expression and poor prognosis in these patients [23, 93].
Over-expression of PRDM16 driven by the CIITA promoter may have a similar adverse effect in our lymphoma case.

Taken together, our novel CIITA rearrangement findings support the theory of a double-hit functional impact. In a subset of cases, abrogation of CIITA protein expression through translocations, deletions and inversions is combined with over-expression of oncogenic partner genes.

The intra-chromosomal rearrangements we observed in SOCS1 were also consistent with previous reports. Inactivating SOCS1 mutations occur frequently in PMBCL, HL, DLBCL and FL [62, 66, 87, 111]. The SOCS1 protein functions as an inhibitor of the JAK/STAT signalling pathway, and its impairment has been shown to cause de-regulated STAT activity, resulting in increased tumour cell survival and proliferation [65, 96, 111]. The frequent deletions and inversions we observed in the SOCS1 locus remove all or part of the coding sequence, and are predicted to prevent expression of a functional protein product. As in CIITA, these rearrangements are likely to arise from aberrant SHM, since previously described mutations in this region are also enriched in AID target motifs [65].

Strikingly, we discovered recurrent translocation breakpoints clustered in the coding region of SOCS1 exon 2. Translocations in this region have not been previously described, but the recurrence of IGH partners suggests that aberrant SHM is also the likely mechanism of these rearrangements. The functional consequences of SOCS1 translocations have yet to be assessed. Because they also disrupt the coding sequence, however, they are predicted to have inactivating effects similar to those of the intra-chromosomal rearrangements. Together, our SOCS1 findings suggest that both translocations and intra-chromosomal rearrangements disrupt SOCS1 protein expression, leading to increased tumour cell survival and proliferation.

Finally, frequent deletions of the CLEC16A locus were observed.
This gene has not been well characterized, but has recently been implicated in normal B-cell development [49]. Its frequent deletion suggests that its loss may contribute further to tumour cell malignancy, but this hypothesis requires further exploration.

3.3.3 Concordance with FISH

We compared the results generated with the capture-based sequencing approach to those from the FISH break-apart assays (Figure 3.31). Only 65% (44/68) of the cases investigated had concordant results. These cases had either a FISH break-apart pattern and a matching SV identified by capture, or a normal FISH pattern and no SVs found. Six samples were positive by both methods, but had discordant findings between the two.

Figure 3.31: Concordance between FISH and capture-based structural variants on chr16. (A) Overlap by SV: twenty-four of the 82 high-confidence SVs found with capture matched the observed FISH break-apart pattern, and the other 58 were found only with capture. Twelve samples were break-apart positive by FISH but had no matching SV identified with sequencing. (B) Overlap by case: of the 68 cases, 50 had concordant rearrangement statuses. Of these, 26 were break-apart negative and had no SVs identified (-/-), 18 were break-apart positive and had matching SVs found in sequencing (+/+), and 6 were break-apart positive and harboured SVs that did not match the observed FISH pattern (+/+ mismatch). The other 18 samples had a discordant rearrangement status between the two methods: 6 were FISH break-apart positive and had no predicted SVs by capture, and 12 had predictions by capture with a normal FISH pattern.
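The simplified Python model below illustrates the reasoning in this section about which SVs a break-apart assay can detect, and how cases fall into the concordance categories of Figure 3.31B. It is a sketch under stated assumptions, not the procedure used in this thesis. It assumes a single green probe upstream of a single red probe on the same chromosome, and it mirrors only those probes that lie entirely inside an inversion. It also treats the two signals as separated only when they end up at least `min_split` bp apart, using a hypothetical 1 Mb stand-in for the visual scoring threshold. The probe coordinates in the usage example are invented. Only the category counts at the end are taken from Figure 3.31B.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Probe:
    start: int
    end: int

    @property
    def centre(self) -> float:
        return (self.start + self.end) / 2


@dataclass(frozen=True)
class SV:
    kind: str                  # "DEL", "INV" or "TRA"
    start: int                 # breakpoint on the probed chromosome
    end: Optional[int] = None  # second breakpoint; None for translocations


def fish_change(sv: SV, green: Probe, red: Probe, min_split: int = 1_000_000) -> str:
    """Predict the break-apart pattern one SV would produce (simplified model)."""
    if sv.kind == "TRA":
        # A breakpoint between the probes moves one signal to another chromosome.
        return "split" if green.end < sv.start < red.start else "normal"
    if sv.end is None:
        raise ValueError("DEL and INV calls need two breakpoints")
    lo, hi = sorted((sv.start, sv.end))
    if sv.kind == "DEL":
        # Deleting DNA between the probes only brings them closer together;
        # overlapping a probe removes that signal instead.
        overlaps = any(p.start <= hi and lo <= p.end for p in (green, red))
        return "loss" if overlaps else "normal"
    if sv.kind == "INV":
        def new_centre(p: Probe) -> float:
            # Probes fully inside the inverted segment are mirrored; others stay put.
            return lo + hi - p.centre if lo <= p.start and p.end <= hi else p.centre

        before = abs(red.centre - green.centre)
        after = abs(new_centre(red) - new_centre(green))
        return "split" if abs(after - before) >= min_split else "normal"
    raise ValueError(f"unknown SV type: {sv.kind}")


def concordance_category(fish_pattern: str, svs: List[SV], green: Probe, red: Probe) -> str:
    """Place one case in a Figure 3.31B-style category."""
    fish_positive = fish_pattern != "normal"
    if not svs:
        return "FISH only" if fish_positive else "-/-"
    if not fish_positive:
        return "capture only"
    matched = any(fish_change(sv, green, red) == fish_pattern for sv in svs)
    return "+/+" if matched else "+/+ mismatch"


# Invented probe layout: green upstream of red, 150 kb gap between them.
green, red = Probe(1_000_000, 1_150_000), Probe(1_300_000, 1_450_000)
# A 377 kb inversion spanning the red probe barely moves it, so FISH stays normal.
print(fish_change(SV("INV", 1_200_000, 1_577_000), green, red))  # normal
# A translocation breakpoint between the probes gives a break-apart signal.
print(fish_change(SV("TRA", 1_250_000), green, red))  # split

# Per-case counts reported in Figure 3.31B.
reported = {"-/-": 26, "+/+": 18, "+/+ mismatch": 6, "FISH only": 6, "capture only": 12}
total = sum(reported.values())
fully_concordant = reported["-/-"] + reported["+/+"]
print(f"{fully_concordant}/{total} = {fully_concordant / total:.0%}")  # 44/68 = 65%
```

In this toy layout, the 377 kb inversion mirrors the red probe but shifts its centre by only about 27 kb. This reproduces the kind of situation described for A43036, where a coding-sequence-disrupting inversion leaves the FISH pattern normal. The model ignores partial probe overlap, signal size and tumour heterogeneity, all of which affect real FISH scoring.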
As with the PDL findings, concordant and non-concordant samples did not differ significantly in tissue preparation type (Wilcoxon-Mann-Whitney P = 0.33) or sequencing batch (Wilcoxon-Mann-Whitney P = 0.84).

On a per-SV basis, 24 of the 82 high-confidence SVs (29%) matched an observed FISH pattern. Most of the 58 capture-specific SVs fell into two groups. The first was small CIITA intron 1 rearrangements, which are undetectable at FISH resolution. The second was SOCS1 rearrangements, which FISH cannot observe because they are not located between the CIITA probe binding regions. Many cases harboured CIITA intron 1 alterations in the region containing the alternative CIITA pIV promoter and alternative exon 1, often causing deletion or disruption of these features (12 cases). This observation is consistent with previous reports that demonstrated frequent deletions and mutations in this region [68]. These small intron 1 rearrangements may be of pathogenic relevance. When alterations affect the normal B-cell pIII promoter or exon 1, pIV-driven transcription could act as a rescue mechanism, and these rearrangements would prevent it. Our capture-specific SV list also contained some larger intra-chromosomal rearrangements predicted to have a functional effect. A 377 kb inversion in A43036 disrupts the CIITA coding sequence. Because its breakpoints fall on either side of the red probe region, however, FISH shows no visual separation of the red and green probes. Deletions in A43072 and A43093 remove part or all of the CIITA coding sequence, but they are contained between the probe regions and do not alter the FISH signals. These intra-chromosomal SVs are predicted to inhibit CIITA function, but are undetectable by FISH. The presence of functionally relevant events below FISH resolution highlights the value of a capture approach for increasing the sensitivity of SV detection alongside routine FISH assays.

The discovery of recurrent translocations in the SOCS1 coding region is a novel finding.
Abrogation of SOCS1 protein function has been shown to have adverse effects in multiple lymphoma types, so adding this small region to our capture-based sequencing panel provides valuable information. We identified predicted inactivating SOCS1 translocations and intra-chromosomal rearrangements in 14 cases. This underscores the importance of examining the SOCS1 locus, which FISH assays do not routinely investigate. However, the functional impact of these rearrangements is predicted from previous literature and needs to be properly assessed in future experiments.

The reasons for discrepancies between FISH and capture are likely the same as those described for the PDL results: the differential resolution of the two methods, and possible tumour heterogeneity between the tissue samples used in the two assays. Five of the 12 break-apart samples with no matching capture prediction were break-apart positive in a very low percentage of cells (10% or less). In these cases, the rearrangements are likely found in a small clone. That clone is either absent from the tissue used for sequencing library construction or not detectable above background noise in the sequencing data. The tumour heterogeneity hypothesis is further supported by cases that harbour SVs predicted to cause an observable change by FISH but still display a normal pattern. For example, a large deletion in A43094 should correspond to loss of both the red and green signals in one allele. Instead, FISH showed a normal signal pattern, suggesting that the two methods may have profiled different clonal populations.

Chapter 4
Conclusions

4.1 Utility of capture sequencing in B-cell lymphoma rearrangement detection

The significance of genomic rearrangements in B-cell lymphomas is well appreciated. They have diagnostic value in identifying lymphoma types and in predicting prognosis, and thereby influence clinical management.
The analysis of structural variation by sequencing is a relatively new area of exploration in lymphoma, and methods that are cost-effective, high-throughput and clinically relevant are required.

My first research question addressed feasibility: could we use capture-based targeted sequencing to perform rearrangement detection in FFPE lymphoma tissues? FFPE DNA has known limitations, reflected here in low library yields and small insert sizes. Despite these, assessment of quality metrics showed that we achieved very high sequencing coverage of the target regions, comparable to that of our fresh frozen samples. Confirming our first hypothesis, our ensemble method for SV detection allowed us to detect predicted genomic rearrangements in FFPE tissue. We could also describe breakpoint anatomy at base-pair resolution and identify novel rearrangement partners. In addition, we discovered many novel intra-chromosomal rearrangements and described previously uncharacterized SVs in the SOCS1 region.

The efficacy of this method, combined with our large sample size, allowed us to address our second research question and identify recurrent rearrangement patterns. A major finding is the presence of translocation cluster breakpoint regions (CBRs) in the CIITA, SOCS1 and PDL regions. The base-pair resolution of our capture-based sequencing assay uniquely allowed us to detect these clusters. We provide the first comprehensive characterization of breakpoint locations and partner genes for these regions. The CBRs identified in the PDL genes are a novel finding. In particular, this method distinguishes CD274 from PDCD1LG2 rearrangements and CBR from non-CBR breakpoints, and reveals the relative frequency of each. In CIITA, the cluster breakpoint pattern supports aberrant somatic hypermutation as the predicted cause of rearrangements [68].
We identified 12 novel translocations and described previously unknown CIITA partner genes, including known proto-oncogenes. Translocations in the SOCS1 coding region have never been reported, and we describe this recurrent pattern for the first time. Finally, we identified recurrent patterns of intra-chromosomal rearrangements in all regions. These included focal duplications of the PDL gene regions, removal of the 3’ UTR in the PDL loci, and frequent disruption of the CIITA and SOCS1 coding regions by deletions and inversions.

Our final research question addressed the possible functional impact of the rearrangements we detected. We used IHC analysis of PD-L1 and PD-L2 to assess functional impact at the expression level. This demonstrated that CBR rearrangements and focal amplifications are indeed associated with increased protein expression of the altered ligand. By combining the observed rearrangement patterns, IHC data and findings in the published literature, we confirmed our second hypothesis: understanding rearrangement anatomy can inform the functional interpretation of structural variants.

Beyond our specific research questions, integrating our capture predictions with FISH and IHC data allowed us to assess the utility of our method in a broader context. As described throughout Chapter 3, a large proportion of the novel SVs we identified were below the detection resolution of FISH assays. Many of these are likely to impact protein expression. Examples include the focal amplification of the PDL region in A43037, which was associated with increased ligand expression, and the deletions of the CIITA coding sequence that fall between the FISH probe regions (Figure 3.22). Conversely, some cases that are break-apart positive by FISH are unlikely to have a functional impact based on their rearrangement anatomy.
The recurrent large inversions with a breakpoint between CD274 and PDCD1LG2 showed no association with expression changes, despite producing a break-apart signal. However, some cases with FISH break-apart positivity were not explained by sequencing, suggesting that the capture-based method cannot replace FISH altogether. Our findings suggest that the most sensitive method for detecting and interpreting genomic rearrangements in B-cell lymphomas is a combined approach integrating FISH, capture sequencing and IHC.

4.2 Clinical implications of genomic rearrangements¹

The application of targeted sequencing for clinical use is becoming more realistic. Many studies have begun to explore the feasibility of identifying actionable mutations in clinically relevant FFPE tissue to guide treatment [1, 18, 60, 79, 105, 106]. Of particular relevance, with the recent progress made in immune therapy, tumours that harbour alterations resulting in an immune privilege phenotype have become a major focus [67].

Immune checkpoint inhibitors targeting the PD-1/PDL axis are of great interest for cancer therapy, and in recent years inhibitors targeting PD-1, PD-L1 and PD-L2 have been explored. Their efficacy has already been shown to vary with the expression level of the ligands. It follows that copy number and rearrangement status at the PDL locus are also likely to affect efficacy [91]. Supporting this theory, a phase I/II clinical study of a PD-1 inhibitor in relapsed and refractory Hodgkin lymphoma recently reported on patient genotypes. A number of responsive patients harboured copy number gains of CD274 and PDCD1LG2 linked to increased ligand expression, suggesting that CNAs in this region may have potential as a predictive biomarker [4]. Our findings have demonstrated that rearrangements in the PDL genes are also frequently associated with expression changes, and that specific breakpoint anatomy informs their functional impact.
The CBRs we identified may even explain differences in clinical response between patients treated with immune checkpoint therapy. Together, these data suggest that adding targeted sequencing data to IHC and FISH information may have predictive value when administering immune therapy.

¹ Parts of this section are included in the discussion of Chong, Twa et al. and were co-authored by David Twa and Christian Steidl.

Another potentially important factor that needs to be explored is the impact of specific rearrangement partners. Rearrangements in CIITA have been shown to have a double-hit effect. They abrogate CIITA protein expression and simultaneously increase expression of the translocated partner region, which is placed under control of the active CIITA promoter. We have demonstrated the presence of proto-oncogenic partner genes both here and previously [97, 101]. Understanding the functional impact of partner over-expression may be key to fully understanding and addressing the mechanisms of pathogenesis. This applies particularly to cases where the partner region is involved in targetable pathways. For instance, c-Rel is a critical component of NF-κB signalling, a pathway that is targetable by numerous inhibitors [27]. CBR rearrangements in the PD-1 ligands frequently place them under control of the partner gene’s promoter. Assessing the partner and its regulatory pathways may therefore inform strategies for regulating aberrant PDL expression. The value of partner information is currently only theoretical. However, observations in DLBCL have shown that cases harbouring MYC rearrangements with immunoglobulin or non-immunoglobulin partners have different prognoses following R-CHOP therapy [14]. This suggests the potential value of characterizing partner gene function.

Our data also demonstrate the prevalence of inactivating SOCS1 alterations.
This locus is not routinely assessed by FISH because it has not previously been implicated in recurrent translocations. Its inclusion in the capture design demonstrates the strength of this method. We added this small region only because it was easy to include in our CIITA-centric design, and ended up characterizing unexpected novel rearrangement patterns. Discoveries in the SOCS1 region may also have clinical impact. Mutations in this gene have been shown to result in constitutive activation of the JAK/STAT signalling pathway, which is frequently deregulated in PMBCL and HL [85]. This pathway is actively being explored as a therapeutic target via JAK inhibitors [66, 77].

Taken together, our findings indicate that understanding the full spectrum of SVs in B-cell lymphoma rearrangements will be critical for future clinical approaches that match lymphoma genotypes and phenotypes to tailored therapeutic regimens.

4.3 Further research applications

This study made many novel discoveries, including the recurrence of specific rearrangement types. For many of these types we have hypothesized a functional impact, and we have begun to investigate expression using immunohistochemistry in the PDL data. Further functional studies are planned, such as the cloning of specific rearrangement types (e.g. 3’ PDCD1LG2 rearrangements) into lymphoma cell lines. These studies will allow us to explore the functional impact of these rearrangements in vitro, including co-culture experiments that specifically examine the effect on T-cell response. We will also perform additional validation of rearrangements in the CIITA and SOCS1 regions, and investigate the potential CIITA fusion proteins further.

While our study produced many novel, biologically relevant discoveries, it also serves as a proof of concept. A key benefit of our approach is the ability to interrogate multiple loci in a single assay.
We have now created a design with an expanded target space that includes a more comprehensive set of B-cell lymphoma-related genes². These include the recurrently rearranged immunoglobulin (IGH, IGL, IGK), MYC, BCL2 and BCL6 loci. The total expanded capture space encompasses approximately 7.8 Mb and contains 18 regions of interest. The goal of this design is a more clinically relevant assay that can assess the full spectrum of lymphoma-related rearrangements simultaneously. This is particularly important for diseases like “double-hit” DLBCL, where concurrent BCL2 and MYC translocations are associated with especially poor prognosis [6]. We hope this design will provide a practical, high-throughput method that can be routinely applied to patient samples to assess rearrangement status comprehensively at all relevant loci. It may also have value in a clinical setting for designing personalized combination therapies.

Using our expanded design, we have now sequenced an initial set of 91 FFPE B-cell lymphomas. Preliminary results demonstrate that an ensemble-based SV detection pipeline incorporating multiple trim lengths and prediction tools is still highly sensitive. However, the new design shows some major differences in quality metrics, such as lower average target coverage owing to the much larger target space. To address this issue, we are investigating the impact of deeper coverage obtained by sequencing each pool on multiple lanes instead of a single lane. We will also revisit an assembly-based approach to evaluate its efficacy on the new target space, and experiment with additional newly published alignment-based tools.

² The expanded capture design was created by myself, Christian Steidl, Susana Ben-Neriah, David Scott, David Twa and Anja Mottok.

Our expanded experiments also provide an opportunity to address limitations of the current study. A key example is the high failure rate (26%) we observed during library preparation in our current cohort.
This is a common issue with FFPE extraction at the GSC, and is now being addressed by the adoption of automated bead-based methods. These show preliminary improvements over the column-based methods that we have used previously. In our new cohort of 91 cases, we have switched to the automated extraction method and observed a 100% success rate during library preparation.

In conclusion, this study has addressed the three main research questions I aimed to answer. We have shown that the capture-based sequencing approach is feasible and effective for SV detection in FFPE lymphomas, demonstrated recurrent patterns of rearrangement with respect to translocation breakpoints and intra-chromosomal rearrangements, and interpreted the findings and performed preliminary functional analysis to assess the potential functional impact of the rearrangement patterns we describe. This study represents an important step towards the routine profiling of genomic rearrangements in lymphomas for guiding clinical decision making, and our ongoing experiments will continue to work toward developing a comprehensive, high-throughput and clinically relevant assay for this purpose.

Bibliography

[1] Abel, H. J., H. Al-Kateb, C. E. Cottrell, A. J. Bredemeyer, C. C. Pritchard, A. H. Grossmann, M. L. Wallander, J. D. Pfeifer, C. M. Lockwood, and E. J. Duncavage (2014). Detection of gene rearrangements in targeted clinical next-generation sequencing. The Journal of Molecular Diagnostics 16(4), 405–17.
[2] Alizadeh, A. A., M. B. Eisen, R. E. Davis, C. Ma, I. S. Lossos, A. Rosenwald, J. C. Boldrick, H. Sabet, T. Tran, X. Yu, J. I. Powell, L. Yang, G. E. Marti, T. Moore, J. Hudson, L. Lu, D. B. Lewis, R. Tibshirani, G. Sherlock, W. C. Chan, T. C. Greiner, D. D. Weisenburger, J. O. Armitage, R. Warnke, R. Levy, W. Wilson, M. R. Grever, J. C. Byrd, D. Botstein, P. O. Brown, and L. M. Staudt (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769), 503–11.
[3] Alkan, C., B. P. Coe, and E. E. Eichler (2011). Genome structural variation discovery and genotyping. Nature Reviews Genetics 12(5), 363–76.
[4] Ansell, S. M., A. M. Lesokhin, I. Borrello, A. Halwani, E. C. Scott, M. Gutierrez, S. J. Schuster, M. M. Millenson, D. Cattry, G. J. Freeman, S. J. Rodig, B. Chapuy, A. H. Ligon, L. Zhu, J. F. Grosso, S. Y. Kim, J. M. Timmerman, M. A. Shipp, and P. Armand (2014). PD-1 Blockade with Nivolumab in Relapsed or Refractory Hodgkin’s Lymphoma. New England Journal of Medicine 372(4), 311–9.
[5] Armand, P., A. Nagler, E. A. Weller, S. M. Devine, D. E. Avigan, Y.-B. Chen, M. S. Kaminski, H. K. Holland, J. N. Winter, J. R. Mason, J. W. Fay, D. A. Rizzieri, C. M. Hosing, E. D. Ball, J. P. Uberti, H. M. Lazarus, M. Y. Mapara, S. A. Gregory, J. M. Timmerman, D. Andorsky, R. Or, E. K. Waller, R. Rotem-Yehudar, and L. I. Gordon (2013). Disabling immune tolerance by programmed death-1 blockade with pidilizumab after autologous hematopoietic stem-cell transplantation for diffuse large B-cell lymphoma: results of an international phase II trial. Journal of Clinical Oncology 31(33), 4199–206.
[6] Aukema, S. M., R. Siebert, E. Schuuring, G. W. van Imhoff, H. C. Kluin-Nelemans, E.-J. Boerma, and P. M. Kluin (2011). Double-hit B-cell lymphomas. Blood 117(8), 2319–31.
[7] Barbieri, C. E., S. C. Baca, M. S. Lawrence, F. Demichelis, M. Blattner, J.-P. Theurillat, T. A. White, P. Stojanov, E. Van Allen, N. Stransky, E. Nickerson, S.-S. Chae, G. Boysen, D. Auclair, R. C. Onofrio, K. Park, N. Kitabayashi, T. Y. MacDonald, K. Sheikh, T. Vuong, C. Guiducci, K. Cibulskis, A. Sivachenko, S. L. Carter, G. Saksena, D. Voet, W. M. Hussain, A. H. Ramos, W. Winckler, M. C. Redman, K. Ardlie, A. K. Tewari, J. M. Mosquera, N. Rupp, P. J. Wild, H. Moch, C. Morrissey, P. S. Nelson, P. W. Kantoff, S. B. Gabriel, T. R. Golub, M. Meyerson, E. S. Lander, G. Getz, M. A. Rubin, and L. A. Garraway (2012). Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nature Genetics 44(6), 685–9.
[8] Barth, T. F. E., J. I. Martin-Subero, S. Joos, C. K. Menz, C. Hasel, G. Mechtersheimer, R. M. Parwaresch, P. Lichter, R. Siebert, and P. Möller (2003). Gains of 2p involving the REL locus correlate with nuclear c-Rel protein accumulation in neoplastic cells of classical Hodgkin lymphoma. Blood 101(9), 3681–6.
[9] Batista, F. D. and N. E. Harwood (2009). The who, how and where of antigen presentation to B cells. Nature Reviews Immunology 9(1), 15–27.
[10] Beltran, H., R. Yelensky, G. M. Frampton, K. Park, S. R. Downing, T. Y. MacDonald, M. Jarosz, D. Lipson, S. T. Tagawa, D. M. Nanus, P. J. Stephens, J. M. Mosquera, M. T. Cronin, and M. A. Rubin (2013). Targeted next-generation sequencing of advanced prostate cancer identifies potential therapeutic targets and disease heterogeneity. European Urology 63(5), 920–6.
[11] Blow, N. (2007). Tissue preparation: Tissue issues. Nature 448(7156), 959–63.
[12] Bouamar, H., S. Abbas, A.-P. Lin, L. Wang, D. Jiang, K. N. Holder, M. C. Kinney, S. Hunicke-Smith, and R. C. T. Aguiar (2013). A capture-sequencing strategy identifies IRF8, EBF1, and APRIL as novel IGH fusion partners in B-cell lymphoma. Blood 122(5), 726–33.
[13] Campo, E., S. H. Swerdlow, N. L. Harris, S. Pileri, H. Stein, and E. S. Jaffe (2011). The 2008 WHO classification of lymphoid neoplasms and beyond: evolving concepts and practical applications. Blood 117(19), 5019–32.
[14] Carey, C. D., D. Gusenleitner, B. Chapuy, A. E. Kovach, M. J. Kluk, H. H. Sun, R. E. Crossland, C. M. Bacon, V. Rand, P. Dal Cin, L. P. Le, D. Neuberg, A. R. Sohani, M. A. Shipp, S. Monti, and S. J. Rodig (2015). Molecular classification of MYC-driven B-cell lymphomas by targeted gene expression profiling of fixed biopsy specimens. The Journal of Molecular Diagnostics 17(1), 19–30.
[15] Chapuy, B., M. G. Roemer, Y. Tan, C. Stewart, L. Zhang, A. J. Dunford, E. S. Jordanova, F. Feuerhake, G. Illerhaus, D. Gusenleitner, E. Linden, H. H. Sun, M. Aono, G. J. Freeman, T. R. Golub, G. Getz, S. J. Rodig, D. de Jong, S. Monti, and M. A. Shipp (2014). Actionable Genetic Features of Primary Testicular and Primary Central Nervous System Lymphomas. Blood 124(21), 74.
[16] Chen, L., D. L. Gibbons, S. Goswami, M. A. Cortez, Y.-H. Ahn, L. A. Byers, X. Zhang, X. Yi, D. Dwyer, W. Lin, L. Diao, J. Wang, J. D. Roybal, M. Patel, C. Ungewiss, D. Peng, S. Antonia, M. Mediavilla-Varela, G. Robertson, S. Jones, M. Suraokar, J. W. Welsh, B. Erez, I. I. Wistuba, L. Chen, D. Peng, S. Wang, S. E. Ullrich, J. V. Heymach, J. M. Kurie, and F. X.-F. Qin (2014). Metastasis is regulated via microRNA-200/ZEB1 axis control of tumour cell PD-L1 expression and intratumoral immunosuppression. Nature Communications 5, 5241.
[17] Cingolani, P., A. Platts, L. L. Wang, M. Coon, T. Nguyen, L. Wang, S. J. Land, X. Lu, and D. M. Ruden (2012). A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6(2), 80–92.
[18] Cottrell, C. E., H. Al-Kateb, A. J. Bredemeyer, E. J. Duncavage, D. H. Spencer, H. J. Abel, C. M. Lockwood, I. S. Hagemann, S. M. O’Guin, L. C. Burcea, C. S. Sawyer, D. M. Oschwald, J. L. Stratman, D. A. Sher, M. R. Johnson, J. T. Brown, P. F. Cliften, B. George, L. D. McIntosh, S. Shrivastava, T. T. Nguyen, J. E. Payton, M. A. Watson, S. D. Crosby, R. D. Head, R. D. Mitra, R. Nagarajan, S. Kulkarni, K. Seibert, H. W. Virgin, J. Milbrandt, and J. D. Pfeifer (2014). Validation of a next-generation sequencing assay for clinical molecular oncology. The Journal of Molecular Diagnostics 16(1), 89–105.
[19] Curry, C. V., A. A. Ewton, R. J. Olsen, B. R. Logan, H. A. Preti, Y.-C. Liu, S. L. Perkins, and C.-C. Chang (2009). Prognostic impact of C-REL expression in diffuse large B-cell lymphoma. Journal of Hematopathology 2(1), 20–6.
[20] Cycon, K. A., J. L. Clements, R. Holtz, H. Fuji, and S. P. Murphy (2009). The immunogenicity of L1210 lymphoma clones correlates with their ability to function as antigen-presenting cells. Immunology 128(1 Suppl), e641–51.
[21] Diepstra, A., G. W. van Imhoff, H. E. Karim-Kos, A. van den Berg, G. J. te Meerman, M. Niens, I. M. Nolte, E. Bastiaannet, M. Schaapveld, E. Vellenga, and S. Poppema (2007). HLA class II expression by Hodgkin Reed-Sternberg cells is an independent prognostic factor in classical Hodgkin’s lymphoma. Journal of Clinical Oncology 25(21), 3101–8.
[22] Dong, H., S. E. Strome, D. R. Salomao, H. Tamura, F. Hirano, D. B. Flies, P. C. Roche, J. Lu, G. Zhu, K. Tamada, V. A. Lennon, E. Celis, and L. Chen (2002). Tumor-associated B7-H1 promotes T-cell apoptosis: a potential mechanism of immune evasion. Nature Medicine 8(8), 793–800.
[23] Duhoux, F. P., G. Ameye, C. P. Montano-Almendras, K. Bahloula, M. J. Mozziconacci, S. Laibe, I. Wlodarska, L. Michaux, P. Talmant, S. Richebourg, E. Lippert, F. Speleman, C. Herens, S. Struski, S. Raynaud, N. Auger, N. Nadal, K. Rack, F. Mugneret, I. Tigaud, M. Lafage, S. Taviaux, C. Roche-Lestienne, D. Latinne, J. M. Libouton, J.-B. Demoulin, and H. A. Poirel (2012). PRDM16 (1p36) translocations define a distinct entity of myeloid malignancies with poor prognosis but may also occur in lymphoid malignancies. British Journal of Haematology 156(1), 76–88.
[24] Duncavage, E. J., H. J. Abel, P. Szankasi, T. W. Kelley, and J. D. Pfeifer (2012). Targeted next generation sequencing of clinically significant gene mutations and translocations in leukemia. Modern Pathology 25(6), 795–804.
[25] Eirew, P., A. Steif, J. Khattra, G. Ha, D. Yap, H. Farahani, K. Gelmon, S. Chia, C. Mar, A. Wan, E. Laks, J. Biele, K. Shumansky, J. Rosner, A. McPherson, C. Nielsen, A. J. L. Roth, C. Lefebvre, A. Bashashati, C. de Souza, C. Siu, R. Aniba, J. Brimhall, A. Oloumi, T. Osako, A. Bruna, J. L. Sandoval, T. Algara, W. Greenwood, K. Leung, H. Cheng, H. Xue, Y. Wang, D. Lin, A. J. Mungall, R. Moore, Y. Zhao, J. Lorette, L. Nguyen, D. Huntsman, C. J. Eaves, C. Hansen, M. A. Marra, C. Caldas, S. P. Shah, and S. Aparicio (2014). Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution. Nature 518(7539), 422–426.
[26] Forshew, T., M. Murtaza, C. Parkinson, D. Gale, D. W. Y. Tsui, F. Kaper, S.-J. Dawson, A. M. Piskorz, M. Jimenez-Linan, D. Bentley, J. Hadfield, A. P. May, C. Caldas, J. D. Brenton, and N. Rosenfeld (2012). Noninvasive identification and monitoring of cancer mutations by targeted deep sequencing of plasma DNA. Science Translational Medicine 4(136), 136ra68.
[27] Gilmore, T. D. and M. Herscovitch (2006). Inhibitors of NF-κB signaling: 785 and counting. Oncogene 25(51), 6887–99.
[28] Goetz, L., K. Bethel, and E. J. Topol (2013). Rebooting cancer tissue handling in the sequencing era: toward routine use of frozen tumor tissue. Journal of the American Medical Association 309(1), 37–8.
[29] Green, M. R., A. J. Gentles, R. V. Nair, J. M. Irish, S. Kihira, C. L. Liu, I. Kela, E. S. Hopmans, J. H. Myklebust, H. Ji, S. K. Plevritis, R. Levy, and A. A. Alizadeh (2013). Hierarchy in somatic mutations arising during genomic evolution and progression of follicular lymphoma. Blood 121(9), 1604–11.
[30] Green, M. R., S. Monti, S. J. Rodig, P. Juszczynski, T. Currie, E. O’Donnell, B. Chapuy, K. Takeyama, D. Neuberg, T. R. Golub, J. L. Kutok, and M. A. Shipp (2010). Integrative analysis reveals selective 9p24.1 amplification, increased PD-1 ligand expression, and further induction via JAK2 in nodular sclerosing Hodgkin lymphoma and primary mediastinal large B-cell lymphoma. Blood 116(17), 3268–77.
[31] Gunawardana, J., F. C. Chan, A. Telenius, B. Woolcock, R. Kridel, K. L. Tan, S. Ben-Neriah, A. Mottok, R. S. Lim, M. Boyle, S. Rogic, L. M. Rimsza, C. Guiter, K. Leroy, P. Gaulard, C. Haioun, M. A. Marra, K. J. Savage, J. M. Connors, S. P. Shah, R. D. Gascoyne, and C. Steidl (2014). Recurrent somatic mutations of PTPN1 in primary mediastinal B cell lymphoma and Hodgkin lymphoma. Nature Genetics 46(4), 329–35.
[32] Han, S.-W., H.-P. Kim, J.-Y. Shin, E.-G. Jeong, W.-C. Lee, K.-H. Lee, J.-K. Won, T.-Y. Kim, D.-Y. Oh, S.-A. Im, Y.-J. Bang, S.-Y. Jeong, K. J. Park, J.-G. Park, G. H. Kang, J.-S. Seo, J.-I. Kim, and T.-Y. Kim (2013). Targeted sequencing of cancer-related genes in colorectal cancer using next-generation sequencing. PloS One 8(5), e64271.
[33] Harris, N. L., E. S. Jaffe, H. Stein, P. M. Banks, J. K. Chan, M. L. Cleary, G. Delsol, C. De Wolf-Peeters, B. Falini, and K. C. Gatter (1994). A revised European-American classification of lymphoid neoplasms: a proposal from the International Lymphoma Study Group. Blood 84(5), 1361–92.
[34] Hormozdiari, F., I. Hajirasouliha, P. Dao, F. Hach, D. Yorukoglu, C. Alkan, E. E. Eichler, and S. C. Sahinalp (2010). Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery. Bioinformatics 26(12), i350–7.
[35] Huisman, C., G. B. A. Wisman, H. G. Kazemier, M. A. T. M. van Vugt, A. G. J. van der Zee, E. Schuuring, and M. G. Rots (2013). Functional validation of putative tumor suppressor gene C13ORF18 in cervical cancer by Artificial Transcription Factors. Molecular Oncology 7(3), 669–79.
[36] Kanzler, H., M. L. Hansmann, U. Kapp, J. Wolf, V. Diehl, K. Rajewsky, and R. Küppers (1996). Molecular single cell analysis demonstrates the derivation of a peripheral blood-derived cell line (L1236) from the Hodgkin/Reed-Sternberg cells of a Hodgkin’s lymphoma patient. Blood 87(8), 3429–36.
[37] Kerick, M., M. Isau, B. Timmermann, H. Sültmann, R. Herwig, S. Krobitsch, G. Schaefer, I. Verdorfer, G. Bartsch, H.
Klocker, H. Lehrach, and M. R. Schweiger (2011). Targeted highthroughput sequencing in clinical cancer settings: formaldehyde fixed-paraffin embedded(FFPE) tumor tissues, input amount and tumor heterogeneity. BMC Medical Genomics 4, 68.→ pages 19[38] Kile, B. T., B. A. Schulman, W. S. Alexander, N. A. Nicola, H. M. Martin, and D. J. Hilton(2002). The SOCS box: a tale of destruction and degradation. Trends in BiochemicalSciences 27(5), 235–41. → pages 87[39] Klein, U. and R. Dalla-Favera (2008). Germinal centres: role in B-cell physiology andmalignancy. Nature Reviews Immunology 8(1), 22–33. → pages 2[40] Klein, U., T. Goossens, M. Fischer, H. Kanzler, A. Braeuninger, K. Rajewsky, andR. Küppers (1998). Somatic hypermutation in normal and transformed human B cells.Immunological Reviews 162, 261–80. → pages 7[41] Korbel, J. O., A. E. Urban, J. P. Affourtit, B. Godwin, F. Grubert, J. F. Simons, P. M. Kim,D. Palejev, N. J. Carriero, L. Du, B. E. Taillon, Z. Chen, A. Tanzer, A. C. E. Saunders, J. Chi,F. Yang, N. P. Carter, M. E. Hurles, S. M. Weissman, T. T. Harkins, M. B. Gerstein, M. Egholm,and M. Snyder (2007). Paired-end mapping reveals extensive structural variation in the humangenome. Science 318(5849), 420–6. → pages 18[42] Krenciute, G., S. Liu, N. Yucer, Y. Shi, P. Ortiz, Q. Liu, B.-J. Kim, A. O. Odejimi, M. Leng,J. Qin, and Y. Wang (2013). Nuclear BAG6-UBL4A-GET4 complex mediates DNA damagesignaling and cell death. The Journal of Biological Chemistry 288(28), 20547–57. → pages 71[43] Küppers, R. (2005). Mechanisms of B-cell lymphoma pathogenesis. Nature ReviewsCancer 5(4), 251–62. → pages 2, 4, 7[44] Küppers, R., M. Zhao, M. L. Hansmann, and K. Rajewsky (1993). Tracing B celldevelopment in human germinal centres by molecular analysis of single cells picked fromhistological sections. The EMBO Journal 12(13), 4955–67. → pages 4[45] Layer, R. M., C. Chiang, A. R. Quinlan, and I. M. Hall (2014). LUMPY: a probabilisticframework for structural variant discovery. 
Genome Biology 15(6), R84. → pages 35[46] Lehmann, U. and H. Kreipe (2001). Real-time PCR analysis of DNA and RNA extractedfrom formalin-fixed and paraffin-embedded biopsies. Methods 25(4), 409–18. → pages 20, 42102[47] Lenz, G. and L. M. Staudt (2010). Aggressive lymphomas. The New England Journal ofMedicine 362(15), 1417–29. → pages 8[48] Li, H. and R. Durbin (2010). Fast and accurate long-read alignment with Burrows-Wheelertransform. Bioinformatics 26(5), 589–95. → pages 32[49] Li, J., S. F. Jørgensen, S. M. Maggadottir, M. Bakay, K. Warnatz, J. Glessner, R. Pandey,U. Salzer, R. E. Schmidt, E. Perez, E. Resnick, S. Goldacker, M. Buchta, T. Witte, L. Padyukov,V. Videm, T. Folseraas, F. Atschekzei, J. T. Elder, R. P. Nair, J. Winkelmann, C. Gieger, M. M.Nöthen, C. Büning, S. Brand, K. E. Sullivan, J. S. Orange, B. Fevang, S. Schreiber, W. Lieb,P. Aukrust, H. Chapel, C. Cunningham-Rundles, A. Franke, T. H. Karlsen, B. Grimbacher,H. Hakonarson, L. Hammarström, and E. Ellinghaus (2015). Association of CLEC16A withhuman common variable immunodeficiency disorder and role in murine B cells. NatureCommunications 6, 6804. → pages 89[50] Lipson, D., M. Capelletti, R. Yelensky, G. Otto, A. Parker, M. Jarosz, J. A. Curran,S. Balasubramanian, T. Bloom, K. W. Brennan, A. Donahue, S. R. Downing, G. M. Frampton,L. Garcia, F. Juhn, K. C. Mitchell, E. White, J. White, Z. Zwirko, T. Peretz, H. Nechushtan,L. Soussan-Gutman, J. Kim, H. Sasaki, H. R. Kim, S.-i. Park, D. Ercan, C. E. Sheehan, J. S.Ross, M. T. Cronin, P. A. Jänne, and P. J. Stephens (2012). Identification of new ALK and RETgene fusions from colorectal and lung cancer biopsies. Nature Medicine 18(3), 382–4. → pages20[51] Liu, Y., F. R. Abdul Razak, M. Terpstra, F. C. Chan, A. Saber, M. Nijland, G. van Imhoff,L. Visser, R. Gascoyne, C. Steidl, J. Kluiver, A. Diepstra, K. Kok, and A. van den Berg (2014).The mutational landscape of Hodgkin lymphoma cell lines determined by whole-exomesequencing. Leukemia 28(11), 2248–51. 
[52] Loeb, G. B., A. A. Khan, D. Canner, J. B. Hiatt, J. Shendure, R. B. Darnell, C. S. Leslie, and A. Y. Rudensky (2012). Transcriptome-wide miR-155 binding map reveals widespread noncanonical microRNA targeting. Molecular Cell 48(5), 760–70.
[53] Love, C., Z. Sun, D. Jima, G. Li, J. Zhang, R. Miles, K. L. Richards, C. H. Dunphy, W. W. L. Choi, G. Srivastava, P. L. Lugar, D. A. Rizzieri, A. S. Lagoo, L. Bernal-Mizrachi, K. P. Mann, C. R. Flowers, K. N. Naresh, A. M. Evens, A. Chadburn, L. I. Gordon, M. B. Czader, J. I. Gill, E. D. Hsi, A. Greenough, A. B. Moffitt, M. McKinney, A. Banerjee, V. Grubor, S. Levy, D. B. Dunson, and S. S. Dave (2012). The genetic landscape of mutations in Burkitt lymphoma. Nature Genetics 44(12), 1321–5.
[54] MacConaill, L. E., C. D. Campbell, S. M. Kehoe, A. J. Bass, C. Hatton, L. Niu, M. Davis, K. Yao, M. Hanna, C. Mondal, L. Luongo, C. M. Emery, A. C. Baker, J. Philips, D. J. Goff, M. Fiorentino, M. A. Rubin, K. Polyak, J. Chan, Y. Wang, J. A. Fletcher, S. Santagata, G. Corso, F. Roviello, R. Shivdasani, M. W. Kieran, K. L. Ligon, C. D. Stiles, W. C. Hahn, M. L. Meyerson, and L. A. Garraway (2009). Profiling critical cancer gene mutations in clinical tumor samples. PloS One 4(11), e7887.
[55] Magoč, T. and S. L. Salzberg (2011). FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27(21), 2957–63.
[56] Mamanova, L., A. J. Coffey, C. E. Scott, I. Kozarewa, E. H. Turner, A. Kumar, E. Howard, J. Shendure, and D. J. Turner (2010). Target-enrichment strategies for next-generation sequencing. Nature Methods 7(2), 111–8.
[57] McDowell, D. G., N. A. Burns, and H. C. Parkes (1998). Localised sequence regions possessing high melting temperatures prevent the amplification of a DNA mimic in competitive PCR. Nucleic Acids Research 26(14), 3340–7.
[58] McPherson, A., F. Hormozdiari, A. Zayed, R. Giuliany, G. Ha, M. G. F. Sun, M. Griffith, A. Heravi Moussavi, J. Senz, N. Melnyk, M. Pacheco, M. A. Marra, M. Hirst, T. O. Nielsen, S. C. Sahinalp, D. Huntsman, and S. P. Shah (2011). deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data. PLoS Computational Biology 7(5), e1001138.
[59] Meienberg, J., K. Zerjavic, I. Keller, M. Okoniewski, A. Patrignani, K. Ludin, Z. Xu, B. Steinmann, T. Carrel, B. Röthlisberger, R. Schlapbach, R. Bruggmann, and G. Matyas (2015). New insights into the performance of human whole-exome capture platforms. Nucleic Acids Research 43(11), e76.
[60] Meldrum, C., M. A. Doyle, and R. W. Tothill (2011). Next-generation sequencing for cancer diagnostics: a practical perspective. The Clinical Biochemist 32(4), 177–95.
[61] Melzner, I., A. J. Bucur, S. Brüderlein, K. Dorsch, C. Hasel, T. F. E. Barth, F. Leithäuser, and P. Möller (2005). Biallelic mutation of SOCS-1 impairs JAK2 degradation and sustains phospho-JAK2 action in the MedB-1 mediastinal lymphoma line. Blood 105(6), 2535–42.
[62] Mestre, C., F. Rubio-Moscardo, A. Rosenwald, J. Climent, M. J. S. Dyer, L. Staudt, D. Pinkel, R. Siebert, and J. A. Martinez-Climent (2005). Homozygous deletion of SOCS1 in primary mediastinal B-cell lymphoma detected by CGH to BAC microarrays. Leukemia 19(6), 1082–4.
[63] Morin, R. D., M. Mendez-Lago, A. J. Mungall, R. Goya, K. L. Mungall, R. D. Corbett, N. A. Johnson, T. M. Severson, R. Chiu, M. Field, S. Jackman, M. Krzywinski, D. W. Scott, D. L. Trinh, J. Tamura-Wells, S. Li, M. R. Firme, S. Rogic, M. Griffith, S. Chan, O. Yakovenko, I. M. Meyer, E. Y. Zhao, D. Smailus, M. Moksa, S. Chittaranjan, L. Rimsza, A. Brooks-Wilson, J. J. Spinelli, S. Ben-Neriah, B. Meissner, B. Woolcock, M. Boyle, H. McDonald, A. Tam, Y. Zhao, A. Delaney, T. Zeng, K. Tse, Y. Butterfield, I. Birol, R. Holt, J. Schein, D. E. Horsman, R. Moore, S. J. M. Jones, J. M. Connors, M. Hirst, R. D. Gascoyne, and M. A. Marra (2011). Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature 476(7360), 298–303.
[64] Morin, R. D., K. Mungall, E. Pleasance, A. J. Mungall, R. Goya, R. D. Huff, D. W. Scott, J. Ding, A. Roth, R. Chiu, R. D. Corbett, F. C. Chan, M. Mendez-Lago, D. L. Trinh, M. Bolger-Munro, G. Taylor, A. Hadj Khodabakhshi, S. Ben-Neriah, J. Pon, B. Meissner, B. Woolcock, N. Farnoud, S. Rogic, E. L. Lim, N. A. Johnson, S. Shah, S. Jones, C. Steidl, R. Holt, I. Birol, R. Moore, J. M. Connors, R. D. Gascoyne, and M. A. Marra (2013). Mutational and structural analysis of diffuse large B-cell lymphoma using whole-genome sequencing. Blood 122(7), 1256–65.
[65] Mottok, A., C. Renné, M. Seifert, E. Oppermann, W. Bechstein, M.-L. Hansmann, R. Küppers, and A. Bräuninger (2009). Inactivating SOCS1 mutations are caused by aberrant somatic hypermutation and restricted to a subset of B-cell lymphoma entities. Blood 114(20), 4503–6.
[66] Mottok, A., C. Renné, K. Willenbrock, M.-L. Hansmann, and A. Bräuninger (2007). Somatic hypermutation of SOCS1 in lymphocyte-predominant Hodgkin lymphoma is accompanied by high JAK2 expression and activation of STAT6. Blood 110(9), 3387–90.
[67] Mottok, A. and C. Steidl (2015). Genomic alterations underlying immune privilege in malignant lymphomas. Current Opinion in Hematology 22(4), 343–54.
[68] Mottok, A., B. Woolcock, F. C. Chan, K. M. Tong, L. Chong, P. Farinha, A. Telenius, E. Chavez, S. Ramchandani, M. Drake, M. Boyle, S. Ben-Neriah, D. W. Scott, L. M. Rimsza, R. Siebert, R. D. Gascoyne, and C. Steidl (2015). Genomic Alterations in CIITA Are Frequent in Primary Mediastinal Large B Cell Lymphoma and Are Associated with Diminished MHC Class II Expression. Cell Reports 13(7), 1418–31.
[69] Mwenifumbo, J. C. and M. A. Marra (2013). Cancer genome-sequencing study design. Nature Reviews Genetics 14(5), 321–32.
[70] Okosun, J., C. Bödör, J. Wang, S. Araf, C.-Y. Yang, C. Pan, S. Boller, D. Cittaro, M. Bozek, S. Iqbal, J. Matthews, D. Wrench, J. Marzec, K. Tawana, N. Popov, C. O’Riain, D. O’Shea, E. Carlotti, A. Davies, C. H. Lawrie, A. Matolcsy, M. Calaminici, A. Norton, R. J. Byers, C. Mein, E. Stupka, T. A. Lister, G. Lenz, S. Montoto, J. G. Gribben, Y. Fan, R. Grosschedl, C. Chelala, and J. Fitzgibbon (2014). Integrated genomic analysis identifies recurrent mutations and evolution patterns driving the initiation and progression of follicular lymphoma. Nature Genetics 46(2), 176–81.
[71] Pasqualucci, L., G. Bhagat, M. Jankovic, M. Compagno, P. Smith, M. Muramatsu, T. Honjo, H. C. Morse, M. C. Nussenzweig, and R. Dalla-Favera (2008). AID is required for germinal center-derived lymphomagenesis. Nature Genetics 40(1), 108–12.
[72] Pasqualucci, L., P. Neumeister, T. Goossens, G. Nanjangud, R. S. Chaganti, R. Küppers, and R. Dalla-Favera (2001). Hypermutation of multiple proto-oncogenes in B-cell diffuse large-cell lymphomas. Nature 412(6844), 341–6.
[73] Postow, M. A., M. K. Callahan, and J. D. Wolchok (2015). Immune Checkpoint Blockade in Cancer Therapy. Journal of Clinical Oncology 33(17), 1974–82.
[74] Pritchard, C. C., S. J. Salipante, K. Koehler, C. Smith, S. Scroggins, B. Wood, D. Wu, M. K. Lee, S. Dintzis, A. Adey, Y. Liu, K. D. Eaton, R. Martins, K. Stricker, K. A. Margolin, N. Hoffman, J. E. Churpek, J. F. Tait, M.-C. King, and T. Walsh (2014). Validation and implementation of targeted capture and sequencing for the detection of actionable mutation, copy number variation, and gene rearrangement in clinical cancer specimens. The Journal of Molecular Diagnostics 16(1), 56–67.
[75] Quesada, V., L. Conde, N. Villamor, G. R. Ordóñez, P. Jares, L. Bassaganyas, A. J. Ramsay, S. Beà, M. Pinyol, A. Martínez-Trillos, M. López-Guerra, D. Colomer, A. Navarro, T. Baumann, M. Aymerich, M. Rozman, J. Delgado, E. Giné, J. M. Hernández, M. González-Díaz, D. A. Puente, G. Velasco, J. M. P. Freije, J. M. C. Tubío, R. Royo, J. L. Gelpí, M. Orozco, D. G. Pisano, J. Zamora, M. Vázquez, A. Valencia, H. Himmelbauer, M. Bayés, S. Heath, M. Gut, I. Gut, X. Estivill, A. López-Guillermo, X. S. Puente, E. Campo, and C. López-Otín (2012). Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. Nature Genetics 44(1), 47–52.
[76] Quinlan, A. R. and I. M. Hall (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6), 841–2.
[77] Quintás-Cardama, A. and S. Verstovsek (2013). Molecular pathways: Jak/STAT pathway: mutations, inhibitors, and resistance. Clinical Cancer Research 19(8), 1933–40.
[78] Rausch, T., T. Zichner, A. Schlattl, A. M. Stütz, V. Benes, and J. O. Korbel (2012). DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28(18), i333–i339.
[79] Rehm, H. L. (2013). Disease-targeted sequencing: a cornerstone in the clinic. Nature Reviews Genetics 14(4), 295–300.
[80] Rimsza, L. M., W. C. Chan, R. D. Gascoyne, E. Campo, E. S. Jaffe, L. M. Staudt, J. Delabie, A. Rosenwald, and S. P. Murphy (2009). CIITA or RFX coding region loss of function mutations occur rarely in diffuse large B-cell lymphoma cases and cell lines with low levels of major histocompatibility complex class II expression. Haematologica 94(4), 596–8.
[81] Ringnér, M. and M. Krogh (2005). Folding free energies of 5’-UTRs impact post-transcriptional regulation on a genomic scale in yeast. PLoS Computational Biology 1(7), e72.
[82] Roberts, R. A., G. Wright, A. R. Rosenwald, M. A. Jaramillo, T. M. Grogan, T. P. Miller, Y. Frutiger, W. C. Chan, R. D. Gascoyne, G. Ott, H. K. Muller-Hermelink, L. M. Staudt, and L. M. Rimsza (2006). Loss of major histocompatibility class II gene and protein expression in primary mediastinal large B-cell lymphoma is highly coordinated and related to poor patient survival. Blood 108(1), 311–8.
[83] Rosenwald, A., G. Wright, W. C. Chan, J. M. Connors, E. Campo, R. I. Fisher, R. D. Gascoyne, H. K. Muller-Hermelink, E. B. Smeland, J. M. Giltnane, E. M. Hurt, H. Zhao, L. Averett, L. Yang, W. H. Wilson, E. S. Jaffe, R. Simon, R. D. Klausner, J. Powell, P. L. Duffey, D. L. Longo, T. C. Greiner, D. D. Weisenburger, W. G. Sanger, B. J. Dave, J. C. Lynch, J. Vose, J. O. Armitage, E. Montserrat, A. López-Guillermo, T. M. Grogan, T. P. Miller, M. LeBlanc, G. Ott, S. Kvaloy, J. Delabie, H. Holte, P. Krajci, T. Stokke, and L. M. Staudt (2002). The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. The New England Journal of Medicine 346(25), 1937–47.
[84] Rosenwald, A., G. Wright, K. Leroy, X. Yu, P. Gaulard, R. D. Gascoyne, W. C. Chan, T. Zhao, C. Haioun, T. C. Greiner, D. D. Weisenburger, J. C. Lynch, J. Vose, J. O. Armitage, E. B. Smeland, S. Kvaloy, H. Holte, J. Delabie, E. Campo, E. Montserrat, A. Lopez-Guillermo, G. Ott, H. K. Muller-Hermelink, J. M. Connors, R. Braziel, T. M. Grogan, R. I. Fisher, T. P. Miller, M. LeBlanc, M. Chiorazzi, H. Zhao, L. Yang, J. Powell, W. H. Wilson, E. S. Jaffe, R. Simon, R. D. Klausner, and L. M. Staudt (2003). Molecular diagnosis of primary mediastinal B cell lymphoma identifies a clinically favorable subgroup of diffuse large B cell lymphoma related to Hodgkin lymphoma. The Journal of Experimental Medicine 198(6), 851–62.
[85] Rui, L., R. Schmitz, M. Ceribelli, and L. M. Staudt (2011). Malignant pirates of the immune system. Nature Immunology 12(10), 933–40.
[86] Savage, K. J., S. Monti, J. L. Kutok, G. Cattoretti, D. Neuberg, L. De Leval, P. Kurtin, P. Dal Cin, C. Ladd, F. Feuerhake, R. C. T. Aguiar, S. Li, G. Salles, F. Berger, W. Jing, G. S. Pinkus, T. Habermann, R. Dalla-Favera, N. L. Harris, J. C. Aster, T. R. Golub, and M. A. Shipp (2003). The molecular signature of mediastinal large B-cell lymphoma differs from that of other diffuse large B-cell lymphomas and shares features with classical Hodgkin lymphoma. Blood 102(12), 3871–9.
[87] Schif, B., J. K. Lennerz, C. W. Kohler, S. Bentink, M. Kreuz, I. Melzner, O. Ritz, L. Trümper, M. Loeffler, R. Spang, and P. Möller (2013). SOCS1 mutation subtypes predict divergent outcomes in diffuse large B-Cell lymphoma (DLBCL) patients. Oncotarget 4(1), 35–47.
[88] Scott, D. W. and R. D. Gascoyne (2014). The tumour microenvironment in B cell lymphomas. Nature Reviews Cancer 14(8), 517–34.
[89] Scott, D. W., K. L. Mungall, S. Ben-Neriah, S. Rogic, R. D. Morin, G. W. Slack, K. L. Tan, F. C. Chan, R. S. Lim, J. M. Connors, M. A. Marra, A. J. Mungall, C. Steidl, and R. D. Gascoyne (2012). TBL1XR1/TP63: a novel recurrent gene fusion in B-cell non-Hodgkin lymphoma. Blood 119(21), 4949–52.
[90] Scott, D. W., G. W. Wright, P. M. Williams, C.-J. Lih, W. Walsh, E. S. Jaffe, A. Rosenwald, E. Campo, W. C. Chan, J. M. Connors, E. B. Smeland, A. Mottok, R. M. Braziel, G. Ott, J. Delabie, R. R. Tubbs, J. R. Cook, D. D. Weisenburger, T. C. Greiner, B. J. Glinsmann-Gibson, K. Fu, L. M. Staudt, R. D. Gascoyne, and L. M. Rimsza (2014). Determining cell-of-origin subtypes of diffuse large B-cell lymphoma using gene expression in formalin-fixed paraffin-embedded tissue. Blood 123(8), 1214–7.
[91] Sharma, P. and J. P. Allison (2015). The future of immune checkpoint therapy. Science 348(6230), 56–61.
[92] Shi, M., M. G. M. Roemer, B. Chapuy, X. Liao, H. Sun, G. S. Pinkus, M. A. Shipp, G. J. Freeman, and S. J. Rodig (2014). Expression of programmed cell death 1 ligand 2 (PD-L2) is a distinguishing feature of primary mediastinal (thymic) large B-cell lymphoma and associated with PDCD1LG2 copy gain. The American Journal of Surgical Pathology 38(12), 1715–23.
[93] Shiba, N., K. Ohki, T. Kobayashi, Y. Hara, G. Yamato, R. Tanoshima, H. Ichikawa, D. Tomizawa, M.-J. Park, A. Shimada, M. Sotomatsu, H. Arakawa, K. Horibe, S. Adachi, T. Taga, A. Tawa, and Y. Hayashi (2016). High PRDM16 expression identifies a prognostic subgroup of pediatric acute myeloid leukaemia correlated to FLT3-ITD, KMT2A-PTD, and NUP98-NSD1: the results of the Japanese Paediatric Leukaemia/Lymphoma Study Group AML-05 trial. British Journal of Haematology 172(4), 581–91.
[94] Simpson, J. T., K. Wong, S. D. Jackman, J. E. Schein, S. J. M. Jones, and I. Birol (2009). ABySS: a parallel assembler for short read sequence data. Genome Research 19(6), 1117–23.
[95] Sobreira, N. L. M., V. Gnanakkan, M. Walsh, B. Marosy, E. Wohler, G. Thomas, J. E. Hoover-Fong, A. Hamosh, S. J. Wheelan, and D. Valle (2011). Characterization of complex chromosomal rearrangements by targeted capture and next-generation sequencing. Genome Research 21(10), 1720–7.
[96] Steidl, C. and R. D. Gascoyne (2011). The molecular pathogenesis of primary mediastinal large B-cell lymphoma. Blood 118(10), 2659–69.
[97] Steidl, C., S. P. Shah, B. W. Woolcock, L. Rui, M. Kawahara, P. Farinha, N. A. Johnson, Y. Zhao, A. Telenius, S. B. Neriah, A. McPherson, B. Meissner, U. C. Okoye, A. Diepstra, A. van den Berg, M. Sun, G. Leung, S. J. Jones, J. M. Connors, D. G. Huntsman, K. J. Savage, L. M. Rimsza, D. E. Horsman, L. M. Staudt, U. Steidl, M. A. Marra, and R. D. Gascoyne (2011). MHC class II transactivator CIITA is a recurrent gene fusion partner in lymphoid cancers. Nature 471(7338), 377–81.
[98] Swerdlow, S. H., E. Campo, N. L. Harris, E. S. Jaffe, S. A. Pileri, H. Stein, J. Thiele, and J. W. Vardiman (Eds.) (2008). WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues. Lyon, France: IARC Press.
→ pages 1, 2, 6[99] Tischler, G. and S. Leonard (2014). biobambam: tools for read pair collation basedalgorithms on BAM files. Source Code for Biology and Medicine 9(1), 13. → pages 32[100] Topalian, S. L., F. S. Hodi, J. R. Brahmer, S. N. Gettinger, D. C. Smith, D. F. McDermott,J. D. Powderly, R. D. Carvajal, J. A. Sosman, M. B. Atkins, P. D. Leming, D. R. Spigel, S. J.Antonia, L. Horn, C. G. Drake, D. M. Pardoll, L. Chen, W. H. Sharfman, R. A. Anders, J. M.Taube, T. L. McMiller, H. Xu, A. J. Korman, M. Jure-Kunkel, S. Agrawal, D. McDonald, G. D.Kollia, A. Gupta, J. M. Wigginton, and M. Sznol (2012). Safety, activity, and immune correlatesof anti-PD-1 antibody in cancer. The New England Journal of Medicine 366(26), 2443–54. →pages 12[101] Twa, D. D. W., F. C. Chan, S. Ben-Neriah, B. W. Woolcock, A. Mottok, K. L. Tan, G. W.Slack, J. Gunawardana, R. S. Lim, A. W. McPherson, R. Kridel, A. Telenius, D. W. Scott, K. J.Savage, S. P. Shah, R. D. Gascoyne, and C. Steidl (2014). Genomic rearrangements involvingprogrammed death ligands are recurrent in primary mediastinal large B-cell lymphoma.Blood 123(13), 2062–5. → pages 8, 10, 11, 12, 18, 23, 24, 30, 64, 70, 95, 111[102] Twa, D. D. W., A. Mottok, F. C. Chan, S. Ben-Neriah, B. W. Woolcock, K. L. Tan, A. J.Mungall, H. McDonald, Y. Zhao, R. S. Lim, B. H. Nelson, K. Milne, S. P. Shah, R. D. Morin,108M. A. Marra, D. W. Scott, R. D. Gascoyne, and C. Steidl (2015). Recurrent genomicrearrangements in primary testicular lymphoma. The Journal of Pathology 236(2), 136–41. →pages 11, 18, 30, 64, 70[103] Twa, D. D. W. and C. Steidl (2015). Structural genomic alterations in primary mediastinallarge B-cell lymphoma. Leukemia & Lymphoma 56(8), 2239–50. → pages 10, 11, 12, 88[104] Untergasser, A., I. Cutcutache, T. Koressaar, J. Ye, B. C. Faircloth, M. Remm, and S. G.Rozen (2012). Primer3–new capabilities and interfaces. Nucleic Acids Research 40(15), e115.→ pages 40[105] Van Allen, E. M., N. Wagle, P. Stojanov, D. L. Perrin, K. 
Cibulskis, S. Marlow,J. Jane-Valbuena, D. C. Friedrich, G. Kryukov, S. L. Carter, A. McKenna, A. Sivachenko,M. Rosenberg, A. Kiezun, D. Voet, M. Lawrence, L. T. Lichtenstein, J. G. Gentry, F. W. Huang,J. Fostel, D. Farlow, D. Barbie, L. Gandhi, E. S. Lander, S. W. Gray, S. Joffe, P. Janne,J. Garber, L. MacConaill, N. Lindeman, B. Rollins, P. Kantoff, S. A. Fisher, S. Gabriel, G. Getz,and L. A. Garraway (2014). Whole-exome sequencing and clinical interpretation offormalin-fixed, paraffin-embedded tumor samples to guide precision cancer medicine. NatureMedicine 20(6), 682–8. → pages 17, 20, 94[106] Wagle, N., M. F. Berger, M. J. Davis, B. Blumenstiel, M. Defelice, P. Pochanard, M. Ducar,P. Van Hummelen, L. E. Macconaill, W. C. Hahn, M. Meyerson, S. B. Gabriel, and L. A.Garraway (2012). High-throughput detection of actionable genomic alterations in clinical tumorsamples by targeted, massively parallel sequencing. Cancer Discovery 2(1), 82–93. → pages17, 20, 94[107] Wang, J. H., M. Gostissa, C. T. Yan, P. Goff, T. Hickernell, E. Hansen, S. Difilippantonio,D. R. Wesemann, A. A. Zarrin, K. Rajewsky, A. Nussenzweig, and F. W. Alt (2009).Mechanisms promoting translocations in editing and switching peripheral B cells.Nature 460(7252), 231–6. → pages 8[108] Wang, W., F. Li, Y. Mao, H. Zhou, J. Sun, R. Li, C. Liu, W. Chen, D. Hua, and X. Zhang(2013). A miR-570 binding site polymorphism in the B7-H1 gene is associated with the risk ofgastric adenocarcinoma. Human Genetics 132(6), 641–8. → pages 71[109] Wei, X., V. Walia, J. C. Lin, J. K. Teer, T. D. Prickett, J. Gartner, S. Davis, K. Stemke-Hale,M. A. Davies, J. E. Gershenwald, W. Robinson, S. Robinson, S. A. Rosenberg, and Y. Samuels(2011). Exome sequencing identifies GRIN2A as frequently mutated in melanoma. NatureGenetics 43(5), 442–6. → pages 17[110] Weniger, M. A., S. Gesk, S. Ehrlich, J. I. Martin-Subero, M. J. S. Dyer, R. Siebert, P. Möller,and T. F. E. Barth (2007). 
Gains of REL in primary mediastinal B-cell lymphoma coincide withnuclear accumulation of REL protein. Genes, Chromosomes & Cancer 46(4), 406–15. → pages88[111] Weniger, M. A., I. Melzner, C. K. Menz, S. Wegener, A. J. Bucur, K. Dorsch, T. Mattfeldt,T. F. E. Barth, and P. Möller (2006). Mutations of the tumor suppressor gene SOCS-1 inclassical Hodgkin lymphoma are frequent and associated with nuclear phospho-STAT5accumulation. Oncogene 25(18), 2679–84. → pages 30, 83, 89109[112] Westin, J. R., F. Chu, M. Zhang, L. E. Fayad, L. W. Kwak, N. Fowler, J. Romaguera,F. Hagemeister, M. Fanale, F. Samaniego, L. Feng, V. Baladandayuthapani, Z. Wang, W. Ma,Y. Gao, M. Wallace, L. M. Vence, L. Radvanyi, T. Muzzafar, R. Rotem-Yehudar, R. E. Davis,and S. S. Neelapu (2014). Safety and activity of PD1 blockade by pidilizumab in combinationwith rituximab in patients with relapsed follicular lymphoma: a single group, open-label, phase2 trial. The Lancet. Oncology 15(1), 69–77. → pages 12[113] Willis, T. G. and M. J. Dyer (2000). The role of immunoglobulin translocations in thepathogenesis of B-cell malignancies. Blood 96(3), 808–22. → pages 7[114] Wolf, J., U. Kapp, H. Bohlen, M. Kornacker, C. Schoch, B. Stahl, S. Mücke, C. von Kalle,C. Fonatsch, H. E. Schaefer, M. L. Hansmann, and V. Diehl (1996). Peripheral bloodmononuclear cells of a patient with advanced Hodgkin’s lymphoma give rise to permanentlygrowing Hodgkin-Reed Sternberg cells. Blood 87(8), 3418–28. → pages 30[115] Wong, S. Q., J. Li, R. Salemi, K. E. Sheppard, H. Do, R. W. Tothill, G. A. McArthur, andA. Dobrovic (2013). Targeted-capture massively-parallel sequencing enables robust detection ofclinically informative mutations from formalin-fixed tumours. Scientific Reports 3, 3494. →pages 20[116] Wood, L. D., D. W. Parsons, S. Jones, J. Lin, T. Sjöblom, R. J. Leary, D. Shen, S. M. Boca,T. Barber, J. Ptak, N. Silliman, S. Szabo, Z. Dezso, V. Ustyanksky, T. Nikolskaya, Y. Nikolsky,R. Karchin, P. A. Wilson, J. S. Kaminker, Z. 
Zhang, R. Croshaw, J. Willis, D. Dawson,M. Shipitsin, J. K. V. Willson, S. Sukumar, K. Polyak, B. H. Park, C. L. Pethiyagoda, P. V. K.Pant, D. G. Ballinger, A. B. Sparks, J. Hartigan, D. R. Smith, E. Suh, N. Papadopoulos,P. Buckhaults, S. D. Markowitz, G. Parmigiani, K. W. Kinzler, V. E. Velculescu, andB. Vogelstein (2007). The genomic landscapes of human breast and colorectal cancers.Science 318(5853), 1108–13. → pages 17[117] Wright, G., B. Tan, A. Rosenwald, E. H. Hurt, A. Wiestner, and L. M. Staudt (2003). Agene expression-based method to diagnose clinically distinct subgroups of diffuse large B celllymphoma. Proceedings of the National Academy of Sciences of the United States ofAmerica 100(17), 9991–6. → pages 4[118] Xie, R., J.-Y. Chung, K. Ylaya, R. L. Williams, N. Guerrero, N. Nakatsuka, C. Badie, andS. M. Hewitt (2011). Factors influencing the degradation of archival formalin-fixedparaffin-embedded tissue sections. The Journal of Histochemistry and Cytochemistry 59(4),356–65. → pages 20, 42[119] Zhu, J., L. Chen, L. Zou, P. Yang, R. Wu, Y. Mao, H. Zhou, R. Li, K. Wang, W. Wang,D. Hua, and X. Zhang (2014). MiR-20b, -21, and -130b inhibit PTEN expression resulting inB7-H1 over-expression in advanced colorectal cancer. Human Immunology 75(4), 348–53. →pages 71110Appendix ASupplementary MethodsFISH assays1FISH was performed as previously described on specimens represented as duplicate 0.6 mm coreson tissue microarrays (TMA) [97, 101]. Briefly, TMA sections were stained with in-house spec-trum green and orange PDL break-apart bacterial artificial chromosome probes and DAPI was usedto counterstain the nuclei. Slides were imaged using an Olympus BX61 microscope, in conjunctionwith Ariol imaging equipment (Leica Biosystems). Scoring at least four fields of view, Susana Ben-Neriah and David Twa counted 200 interphase nuclei for each case. In the event that reported signalpatterns were discordant, Christian Steidl reviewed an additional 100 interphase nuclei. 
For cases with examinable signal patterns, thresholds were set as follows: copy number loss, >40% of nuclei with one fusion signal; gain, >20% of nuclei with three or four fusion signals; amplification, >20% of nuclei with five or more fusion signals; break-apart, >5% of nuclei with split signals or signal patterns indicative of unbalanced rearrangements of the genomic locus.

¹ This section was written by David Twa and Anja Mottok and is included in the supplementary methods of Chong, Twa et al.

Library construction²

A 96-well FFPE tissue (FFPET) genomic DNA library construction protocol was used. Because DNA extracted from FFPET is damaged by the fixation process and by prolonged storage in non-ideal conditions, DNA quality is expected to vary across the collection, with some samples highly degraded. DNA was normalized to 100 ng in 62 µL of elution buffer (Qiagen) and transferred into a microTUBE plate for shearing on an LE220 acoustic sonicator (Covaris) using the following conditions: duty factor 20%, peak incident power 450 W, 200 cycles per burst, and a duration of 2 × 60 seconds with an intervening spin. Sheared FFPE DNA extracted with the Qiagen AllPrep DNA/RNA FFPE protocol has a fragment-size peak between 300 and 400 bp. To improve the quality of libraries built from FFPE-derived DNA, solid phase reversible immobilization (SPRI) bead-based size selection was performed before library construction to remove smaller DNA fragments from highly degraded FFPE DNAs; if not removed early in the library construction process, these smaller fragments would dominate the final amplified library. FFPE DNA damage repair, end repair and phosphorylation were combined in a single reaction using an enzymatic premix (NEB), followed by bead purification at a 0.8:1 (bead:sample) ratio to remove small FFPE fragments. Repaired DNA fragments were then A-tailed for ligation to paired-end, partial Illumina sequencing adapters and purified twice with SPRI beads (1:1 ratio). Full-length adaptered products were obtained by performing 8 cycles of PCR with primers introducing fault-tolerant hexamer "barcodes" that allow multiplexing of libraries. Indexed PCR products were double-purified with beads at a 1:1 ratio. The concentration of the final libraries was determined from size profiles obtained on a high-sensitivity Caliper LabChip GX together with Quant-iT (Invitrogen) quantification. Libraries were pooled prior to capture.

² This section was written by Andrew J. Mungall and is included in the supplementary methods of Chong, Twa et al.

Immunohistochemistry³

Sections (4 µm) were cut from six tissue microarrays and deparaffinised, followed by antigen retrieval in a decloaking chamber with Diva Decloaker solution. Automated immunohistochemical staining was performed on the Biocare Intellipath FLX autostainer; all reagents except the antibodies were purchased from Biocare (Concord, CA). Sections were blocked with peroxidase-1 and Background Sniper. PD-L1 rabbit monoclonal antibody (clone SP142, Spring Bioscience) was applied at a dilution of 1:100 in Da Vinci Green Diluent for 30 minutes at room temperature. PD-L2 mouse monoclonal antibody (clone 366C.9E5, G. Freeman) was used at a concentration of 0.07 g/ml in Renaissance Background Reducing Diluent for 30 minutes at room temperature. MACH 2 rabbit-HRP and mouse-HRP polymers were applied for 30 minutes and visualized using intelliPATH FLX DAB chromogen. Staining was evaluated, and the percentage of positive tumour cells and the staining intensity were recorded. Representative pictures were taken with a Nikon Eclipse 80i microscope equipped with a Nikon DS-Ri1 camera. Katy Milne and Brad H. Nelson performed IHC staining, Gordon J. Freeman provided the PD-L2 antibody, and Anja Mottok performed IHC evaluation.

³ This section was written by Anja Mottok and is included in the supplementary methods of Chong, Twa et al.

Appendix B
Supplementary Tables

Table B.1: Software parameters used in various steps of the bioinformatics pipeline. Parameter names listed in angle brackets represent unnamed inputs. NA: not applicable.

Extract FASTQ files
    Software and version: Picard 1.96
    Command: SamToFastq
    Options: INPUT=<sample>.bam FASTQ=<sample>.1.fastq.gz SECOND_END_FASTQ=<sample>.2.fastq.gz VALIDATION_STRINGENCY=LENIENT MAX_RECORDS_IN_RAM=5000000 OUTPUT_PER_RG=false RE_REVERSE=true INTERLEAVE=false INCLUDE_NON_PF_READS=false READ1_TRIM=0 READ2_TRIM=0 INCLUDE_NON_PRIMARY_ALIGNMENTS=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false

Align FASTQ files
    Software and version: Burrows-Wheeler Aligner (BWA) 0.7.5a
    Command: bwa mem
    Options: -t 4 -M -R @RG\tID:<sample>\tLB:<sample>\tPL:ILLUMINA\tSM:<sample> <reference>: GRCh37-lite.fa <fastq1>: <sample>.1.fastq.gz <fastq2>: <sample>.2.fastq.gz

Run QC metrics
    Software and version: FastQC 0.10.1
    Command: perl (see next entry) fastqc
    Options: --nogroup <input>: <sample>.bam

Run perl scripts
    Software and version: Perl 5.18.2
    Command: perl
    Options: NA

Calculate read count and coverage statistics
    Software and version: Picard 1.96
    Command: CalculateHsMetrics
    Options: BAIT_INTERVALS=<capture regions intervals file> BAIT_SET_NAME=hs TARGET_INTERVALS=<capture regions intervals file> INPUT=<sample>.bam OUTPUT=<sample>.hs_metrics.txt METRIC_ACCUMULATION_LEVEL=[AL_READS] REFERENCE_SEQUENCE=GRCh37-lite.fa PER_TARGET_COVERAGE=<sample>.interval_hs_metrics.txt VALIDATION_STRINGENCY=LENIENT MAX_RECORDS_IN_RAM=5000000 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false

Collect other QC metrics
    Software and version: Picard 1.96
    Command: CollectMultipleMetrics
    Options: INPUT=<sample>.bam REFERENCE_SEQUENCE=GRCh37-lite.fa OUTPUT=<sample> VALIDATION_STRINGENCY=LENIENT MAX_RECORDS_IN_RAM=5000000 ASSUME_SORTED=true STOP_AFTER=0 PROGRAM=[CollectAlignmentSummaryMetrics, CollectInsertSizeMetrics, QualityScoreDistribution, MeanQualityByCycle] VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false

Calculate per-base coverage
    Software and version: bedtools 2.17.0
    Command: coverage
    Options: -d -abam <sample BAM with duplicates removed> (see next entry) -b <BED file of capture regions>

Extract BAM with duplicates removed
    Software and version: Samtools 0.1.18
    Command: view
    Options: -b -F 1024 <input>: <sample>.bam

Identify structural variants
    Software and version: deStruct 0.1.3
    Command: python (see next entry)
    Options: < file> --tmpdir <sample>/tmp --submit asyncqsub --maxjobs 100 <input>: <file listing all sample names and associated BAMs>

Run python tools
    Software and version: Python (Anaconda installation version 1.9.2)
    Command: python
    Options: NA

Identify structural variants
    Software and version: DELLY 0.5.5
    Command: delly
    Options: -t DEL/DUP/INV/TRA (run for each) -g GRCh37-lite.fa <input>: <sample>.bam

Annotate DELLY breakpoints
    Software and version: SnpEff 3.5
    Command: java (see next entry)
    Options: -Xmx4G -jar snpEff.jar -canon GRCh37.75 -noStats <input>: <VCF output from DELLY>

Run Java applications
    Software and version: Java 1.7.0_25
    Command: java
    Options: NA

Run LUMPY (see LUMPY manual for full description):
1. Extract discordant read pairs
    Software and version: Samtools 0.1.18
    Command: view
    Options: -b -F 1294 <input>: <sample>.bam
2. Extract split reads
    Software and version: LUMPY 0.2.8
    Command: scripts/extractSplitReads_BwaMem
    Options: -i stdin <input>: samtools view -h <sample>.bam
3. Calculate insert size distribution
    Software and version: LUMPY 0.2.8
    Command: python (see entry)
    Options: scripts/ <read length> -X 4 -N 10000 <input>: samtools view <sample>.bam | tail -n+100000
4. Run LUMPY
    Software and version: LUMPY 0.2.8
    Command: lumpy
    Options: -mw 10 -tt 0. For each sample: -pe bam_file:<discordant reads file>,histo_file:<insert size distribution file>,mean:<mean value from distribution output>,stdev:<standard deviation from distribution output>,read_length:<read length>,min_non_overlap:<read length>,discordant_z:4,back_distance:25,min_mapping_threshold:20,weight:1 -sr bam_file:<split reads file>,weight:1,back_distance:25,min_mapping_threshold:20

Trim reads in FASTQ file
    Software and version: FASTX-Toolkit 0.0.13.2
    Command: fastx_trimmer
    Options: -l 75/85/100 (run for each) <input>: gunzip -c <sample fastq.gz file 1 or 2>

Merge overlapping reads
    Software and version: FLASH 1.2.11
    Command: flash
    Options: --min-overlap=10 --max-overlap=<read length> --compress --output-prefix=<sample> --threads=1

Run assembly
    Software and version: ABySS 1.5.2
    Command: abyss-pe
    Options: name=<sample> E=0 k=50/55/60/64 (run for each) c=50 e=50 v=-v np=8 in='<FLASH un-merged fastq.gz 1> <FLASH un-merged fastq.gz 2>' se='<FLASH merged fastq.gz>'

Align ABySS contigs to reference:
1. Align contigs to reference
    Software and version: BWA 0.7.5a
    Command: bwa mem
    Options: -t 4 -a -R @RG\tID:<sample>\tLB:<sample>\tPL:ILLUMINA\tSM:<sample> <reference>: GRCh37-lite.fa <fastq>: <contigs FASTA> (from ABySS)
2. Filter out vendor-failed reads
    Software and version: Samtools 0.1.18
    Command: view
    Options: -b -F 512 <input>: <contigs BAM>
3. Sort based on coordinate
    Software and version: Samtools 0.1.18
    Command: sort
    Options: <input>: <contigs BAM with vendor-failed reads removed>
4. Sort based on name (used in PAVFinder step)
    Software and version: Samtools 0.1.18
    Command: sort
    Options: -n <input>: <coordinate-sorted contigs BAM>

Align reads to ABySS contigs:
1. Align reads to contigs
    Software and version: BWA 0.7.5a
    Command: bwa mem
    Options: -t 4 <reference>: <contigs FASTA> (from BWA) <input>: <(zcat fastq/<sample>.1.fastq.gz fastq/<sample>.2.fastq.gz)
2. Sort by name
    Software and version: Samtools 0.1.18
    Command: sort
    Options: -n <input>: <reads to contigs BAM>
3. Annotate pair information
    Software and version: Samtools 0.1.18
    Command: fixmate
    Options: <input>: <name-sorted reads to contigs BAM>
4. Sort based on coordinate
    Software and version: Samtools 0.1.18
    Command: sort
    Options: <input>: <mate-fixed reads to contigs BAM>

Detect structural variants in assembly data
    Software and version: PAVFinder 0.2.0
    Command: pavfinder genome
    Options: <name-sorted contigs BAM> bwa_mem <contigs FASTA> <reference>: GRCh37-lite.fa -b <coordinate-sorted reads to contigs BAM> --min_size 100

Table B.2: Merged structural variant predictions in the PDL region. Position1 and Position2 give the location of the rearrangement breakpoints as chromosome, position and read orientation. Coordinates are based on hg19. The Reads column gives the maximum number of spanning reads supporting the variant in any of the result-sets. The Result-sets column gives the number of result-sets in which the prediction was found. *Gene columns were manually annotated using the predicted coordinates in the UCSC Genome Browser, with "downstream" and "upstream" referring to relative locations on the positive strand. DEL: deletion, INV: inversion, DUP: duplication, TRA: translocation.
ᵃ This prediction was filtered out in some result-sets because it had 7+ reads in multiple samples, but was included for validation due to its unusually high support in A43037.

Sample | Type | Position1 | Position2 | Gene1* | Gene2* | Length | Reads | Result-sets
A43037 | DUPᵃ | chr9:5,447,066 (-) | chr9:5,570,695 (+) | CD274 (upstream) | PDCD1LG2 (exon 7) | 123,629 | 4893 | 6
A43037 | INV | chr9:5,523,487 (-) | chr9:5,523,443 (-) | PDCD1LG2 (intron 2) | PDCD1LG2 (intron 2) | 44 | 8 | 1
A43037 | TRA | chr3:71,705,002 (-) | chr9:5,514,773 (-) | EIF4E3 (upstream) | PDCD1LG2 (intron 1) | 0 | 8 | 1
A43037 | TRA | chr3:71,704,967 (-) | chr9:5,514,845 (-) | EIF4E3 (upstream) | PDCD1LG2 (intron 1) | 0 | 10 | 1
A43037 | TRA | chr3:NA (-) | chr9:NA (-) | EIF4E3 (unknown) | PDCD1LG2 (unknown) | 0 | 9 | 1
A43037 | INV | chr9:5,486,121 (+) | chr9:5,486,135 (+) | CD274 (downstream) | CD274 (downstream) | 14 | 7 | 1
A43042 | TRA | chr9:5,452,813 (-) | chr11:35,161,226 (+) | CD274 (intron 1) | CD44 (intron 1) | 0 | 204 | 6
A43042 | TRA | chr11:35,161,233 (-) | chr9:5,452,798 (+) | CD44 (intron 1) | CD274 (intron 1) | 0 | 86 | 4
A43042 | DEL | chr9:6,135,848 (-) | chr9:5,491,418 (+) | RANBP6 (downstream) | PDCD1LG2 (upstream) | 644,430 | 35 | 1
A43042 | DEL | chr9:6,135,737 (-) | chr9:5,491,392 (+) | RANBP6 (downstream) | PDCD1LG2 (upstream) | 644,345 | 35 | 1
A43043 | TRA | chr17:56,409,009 (-) | chr9:5,518,645 (-) | BZRAP1-AS1 (intron 2) | PDCD1LG2 (intron 1) | 0 | 81 | 5
A43043 | TRA | chr17:56,409,011 (+) | chr9:5,518,629 (+) | BZRAP1-AS1 (intron 2) | PDCD1LG2 (intron 1) | 0 | 61 | 5
A43043 | TRA | chr10:47,272,097 (+) | chr9:5,518,472 (+) | BMS1P6 (downstream) | PDCD1LG2 (intron 1) | 0 | 79 | 3
A43043 | TRA | chr9:5,518,489 (-) | chr10:48,156,678 (+) | PDCD1LG2 (intron 1) | CTSL1P2 (intron 1) | 0 | 54 | 1
A43043 | TRA | chr10:48,981,636 (-) | chr9:5,518,489 (-) | GLUD1P7 (downstream) | PDCD1LG2 (intron 1) | 0 | 50 | 1
A43043 | TRA | chr10:47,272,696 (-) | chr9:5,518,489 (-) | BMS1P6 (downstream) | PDCD1LG2 (intron 1) | 0 | 51 | 1
A43067 | DEL | chr9:5,508,323 (-) | chr9:5,467,992 (+) | PDCD1LG2 (upstream) | CD274 (3' UTR) | 40,331 | 22 | 6
A43069 | INV | chr9:5,497,600 (+) | chr9:5,497,717 (+) | PDCD1LG2 (upstream) | PDCD1LG2 (upstream) | 117 | 11 | 1
A43069 | INV | chr9:5,538,313 (+) | chr9:5,538,288 (+) | PDCD1LG2 (intron 3) | PDCD1LG2 (intron 3) | 25 | 9 | 1
A43069 | INV | chr9:5,530,511 (+) | chr9:5,530,435 (+) | PDCD1LG2 (intron 2) | PDCD1LG2 (intron 2) | 76 | 8 | 1
A43069 | INV | chr9:5,509,123 (+) | chr9:5,509,071 (+) | PDCD1LG2 (upstream) | PDCD1LG2 (upstream) | 52 | 7 | 1
A43069 | INV | chr9:5,467,690 (+) | chr9:5,467,670 (+) | CD274 (intron 5) | CD274 (intron 5) | 20 | 7 | 1
A43069 | INV | chr9:5,558,742 (+) | chr9:5,558,687 (+) | PDCD1LG2 (intron 5) | PDCD1LG2 (intron 5) | 55 | 8 | 1
A43069 | INV | chr9:5,509,071 (+) | chr9:5,509,084 (+) | PDCD1LG2 (upstream) | PDCD1LG2 (upstream) | 13 | 7 | 1
A43069 | INV | chr9:5,519,687 (+) | chr9:5,519,725 (+) | PDCD1LG2 (intron 1) | PDCD1LG2 (intron 1) | 38 | 11 | 1
A43069 | INV | chr9:5,500,494 (+) | chr9:5,500,465 (+) | PDCD1LG2 (upstream) | PDCD1LG2 (upstream) | 29 | 8 | 1
A43069 | INV | chr9:5,529,417 (+) | chr9:5,529,472 (+) | PDCD1LG2 (intron 2) | PDCD1LG2 (intron 2) | 55 | 8 | 1
A43069 | INV | chr9:5,552,016 (-) | chr9:5,552,115 (-) | PDCD1LG2 (intron 4) | PDCD1LG2 (intron 4) | 99 | 7 | 1
A43073 | INV | chr9:5,559,810 (-) | chr9:5,559,962 (-) | PDCD1LG2 (intron 5) | PDCD1LG2 (intron 5) | 152 | 8 | 1
A43077 | INV | chr9:5,570,345 (-) | chr9:5,570,480 (-) | PDCD1LG2 (3' UTR) | PDCD1LG2 (3' UTR) | 135 | 7 | 1
A43077 | INV | chr9:5,533,337 (+) | chr9:5,533,378 (+) | PDCD1LG2 (intron 2) | PDCD1LG2 (intron 2) | 41 | 10 | 1
A43077 | INV | chr9:5,524,996 (+) | chr9:5,524,970 (+) | PDCD1LG2 (intron 2) | PDCD1LG2 (intron 2) | 26 | 7 | 1
A43115 | DEL | chr9:5,520,276 (-) | chr9:5,518,701 (+) | PDCD1LG2 (intron 1) | PDCD1LG2 (intron 1) | 1,575 | 163 | 5
A43115 | TRA | chr22:23,235,076 (-) | chr9:5,511,361 (-) | IGLL5 (intron 1) | PDCD1LG2 (intron 1) | 0 | 281 | 4
A43115 | TRA | chr22:23,230,931 (+) | chr9:5,511,348 (+) | IGLL5 (intron 1) | PDCD1LG2 (intron 1) | 0 | 68 | 1
A43029 | DEL | chr9:5,889,215 (-) | chr9:5,467,908 (+) | MLANA (upstream) | CD274 (3' UTR) | 421,307 | 119 | 8
A43030 | TRA | chr10:48,963,626 (-) | chr9:5,526,439 (+) | GLUD1P7 (intron 2) | PDCD1LG2 (intron 2) | 0 | 46 | 2
A43030 | TRA | chr10:48,174,662 (+) | chr9:5,526,439 (+) | CTSL1P2 (downstream) | PDCD1LG2 (intron 2) | 0 | 49 | 2
A43038 | INV | chr9:5,477,220 (+) | chr9:37,409,812 (+) | CD274 (downstream) | ZCCHC7 (downstream) | 31,932,592 | 17 | 8
A43041 | TRA | chr20:49,127,848 (-) | chr9:5,450,396 (-) | PTPN1 (intron 1) | CD274 (upstream) | 0 | 48 | 8
A43041 | DUP | chr9:5,451,269 (-) | chr9:5,467,963 (+) | CD274 (intron 1) | CD274 (3' UTR) | 16,694 | 34 | 8
A43041 | INV | chr9:5,467,981 (+) | chr9:5,627,175 (+) | CD274 (3' UTR) | RIC1 (upstream) | 159,194 | 16 | 8
A43041 | INV | chr9:5,468,614 (-) | chr9:5,627,760 (-) | CD274 (3' UTR) | RIC1 (upstream) | 159,146 | 13 | 7
A43041 | TRA | chr20:49,127,782 (+) | chr9:5,450,389 (+) | PTPN1 (intron 1) | CD274 (upstream) | 0 | 53 | 6
A43041 | DEL | chr9:5,775,524 (-) | chr9:5,466,595 (+) | RIC1 (3' UTR) | CD274 (intron 4) | 308,929 | 9 | 5
A43041 | DEL | chr9:5,451,585 (-) | chr9:5,451,258 (+) | CD274 (intron 1) | CD274 (intron 1) | 327 | 38 | 4
A43041 | TRA | chr9:5,491,406 (-) | chr2:33,091,981 (-) | PDCD1LG2 (upstream) | LINC00486 (intron 2) | 0 | 7 | 1
A43071 | TRA | chr9:5,518,615 (-) | chr1:28,833,535 (+) | PDCD1LG2 (intron 1) | RCC1 (intron 1) | 0 | 33 | 8
A43071 | TRA | chr1:28,833,554 (-) | chr9:5,518,591 (+) | RCC1 (intron 1) | PDCD1LG2 (intron 1) | 0 | 20 | 7
A43071 | TRA | chr9:5,482,706 (-) | chr1:231,007,812 (-) | CD274 (downstream) | LOC101927604 (upstream) | 0 | 7 | 1
A43075 | DUP | chr9:5,500,059 (-) | chr9:5,570,090 (+) | PDCD1LG2 (upstream) | PDCD1LG2 (3' UTR) | 70,031 | 231 | 8
A43075 | TRA | chr9:5,482,706 (-) | chr1:231,007,775 (-) | CD274 (downstream) | LOC101927604 (upstream) | 0 | 8 | 2
A43079 | TRA | chr17:56,409,576 (-) | chr9:5,517,129 (-) | BZRAP1-AS1 (intron 2) | PDCD1LG2 (intron 1) | 0 | 643 | 10
A43079 | TRA | chr17:56,409,564 (+) | chr9:5,517,123 (+) | BZRAP1-AS1 (intron 2) | PDCD1LG2 (intron 1) | 0 | 21 | 4
A43082 | TRA | chr9:5,510,982 (-) | chr2:89,159,416 (+) | PDCD1LG2 (intron 1) | IGK | 0 | 49 | 8
A43082 | TRA | chr2:89,160,608 (-) | chr9:5,510,954 (+) | IGK | PDCD1LG2 (intron 1) | 0 | 15 | 6
A43084 | DEL | chr9:5,470,588 (-) | chr9:5,467,983 (+) | CD274 (downstream) | CD274 (3' UTR) | 2,605 | 64 | 8
A43085 | TRA | chr13:46,946,572 (-) | chr9:5,451,322 (-) | KIAA0226L (intron 1) | CD274 (intron 1) | 0 | 113 | 8
A43085 | TRA | chr13:46,946,271 (+) | chr9:5,450,461 (+) | KIAA0226L (5' UTR) | CD274 (upstream) | 0 | 23 | 7
A43085 | DUP | chr9:5,461,461 (-) | chr9:5,461,651 (+) | CD274 (intron 2) | CD274 (intron 2) | 190 | 13 | 5
A43088 | TRA | chr9:5,466,034 (+) | chr6:52,178,267 (+) | CD274 (intron 4) | MCM3 (downstream) | 0 | 21 | 7
A43089 | INV | chr9:5,480,091 (+) | chr9:37,381,934 (+) | CD274 (downstream) | ZCCHC7 (downstream) | 31,901,843 | 114 | 8
A43090 | DEL | chr9:5,984,410 (-) | chr9:5,563,053 (+) | KIAA2026 (intron 2) | PDCD1LG2 (intron 5) | 421,357 | 9 | 7
A43093 | TRA | chr7:931,958 (-) | chr9:5,513,369 (+) | GET4 (exon 6) | PDCD1LG2 (intron 1) | 0 | 166 | 8
A43093 | TRA | chr9:5,511,678 (-) | chr16:27,326,593 (+) | PDCD1LG2 (intron 1) | IL4R (intron 1) | 0 | 125 | 8
A43093 | TRA | chrX:10,651,175 (-) | chr9:5,565,388 (+) | MID1 (intron 1) | PDCD1LG2 (intron 6) | 0 | 25 | 8
A43093 | TRA | chr9:5,564,531 (-) | chrX:10,628,217 (+) | PDCD1LG2 (intron 6) | MID1 (intron 1) | 0 | 23 | 8
A43093 | TRA | chr9:5,513,369 (-) | chr7:932,289 (+) | PDCD1LG2 (intron 1) | GET4 (intron 6) | 0 | 69 | 7
A43093 | TRA | chr16:27,326,876 (-) | chr9:5,511,670 (+) | IL4R (intron 1) | PDCD1LG2 (intron 1) | 0 | 61 | 7
A43093 | TRA | chr9:5,513,373 (-) | chr7:932,321 (+) | PDCD1LG2 (intron 1) | GET4 (intron 6) | 0 | 65 | 1
A43093 | INV | chr9:5,500,133 (+) | chr9:5,500,828 (+) | PDCD1LG2 (upstream) | PDCD1LG2 (upstream) | 695 | 16 | 1
A43093 | INV | chr9:5,492,236 (-) | chr9:5,493,334 (-) | PDCD1LG2 (upstream) | PDCD1LG2 (upstream) | 1,098 | 7 | 1
A43094 | TRA | chr9:5,491,406 (-) | chr2:33,091,985 (-) | PDCD1LG2 (upstream) | LINC00486 (intron 2) | 0 | 13 | 1
A43095 | TRA | chr22:23,231,719 (-) | chr9:5,510,728 (-) | IGLL5 (intron 1) | PDCD1LG2 (5' UTR) | 0 | 87 | 6
A43095 | TRA | chr22:23,231,701 (+) | chr9:5,510,714 (+) | IGLL5 (intron 1) | PDCD1LG2 (5' UTR) | 0 | 41 | 2
A43097 | TRA | chr9:5,518,313 (-) | chr16:27,326,880 (+) | PDCD1LG2 (intron 1) | IL4R (intron 1) | 0 | 84 | 8
A43097 | TRA | chr16:27,326,882 (-) | chr9:5,518,334 (+) | IL4R (intron 1) | PDCD1LG2 (intron 1) | 0 | 34 | 4
A43099 | INV | chr9:5,527,098 (+) | chr9:37,498,687 (+) | PDCD1LG2 (intron 2) | POLR1E (intron 9) | 31,971,589 | 7 | 5
A43101 | INV | chr9:5,469,428 (+) | chr9:5,561,319 (+) | CD274 (3' UTR) | PDCD1LG2 (intron 5) | 91,891 | 194 | 8
A43106 | INV | chr9:5,500,133 (+) | chr9:5,500,828 (+) | PDCD1LG2 (upstream) | PDCD1LG2 (upstream) | 695 | 8 | 1
A43109 | TRA | chr9:5,482,706 (-) | chr1:231,007,816 (-) | CD274 (downstream) | LOC101927604 (upstream) | 0 | 8 | 3
A43110 | TRA | chr9:5,450,316 (-) | chr2:89,157,495 (+) | CD274 (upstream) | IGK | 0 | 389 | 8
A43110 | TRA | chr2:89,157,512 (-) | chr9:5,450,311 (+) | IGK | CD274 (upstream) | 0 | 104 | 8
A43110 | TRA | chr2:89,134,008 (-) | chr9:5,497,709 (+) | IGK | PDCD1LG2 (upstream) | 0 | 34 | 8
A43110 | TRA | chr2:89,051,353 (-) | chr9:5,502,180 (+) | RPIA (downstream) | PDCD1LG2 (upstream) | 0 | 29 | 8
A43110 | DEL | chr9:6,440,816 (-) | chr9:5,470,453 (+) | UHRF2 (intron 3) | CD274 (3' UTR) | 970,363 | 11 | 6

Table B.3: Immunohistochemistry scoring results for the 68 cases. The percentage of tumour cells staining positive for PD-L1 or PD-L2 and the staining intensity are shown. The histological score is calculated by multiplying the positive percentage by the intensity. Empty cells represent cases that were not evaluated by IHC, and NA values represent cases that were stained but not evaluable at scoring. BA: break-apart, O-EX: over-expressed.

Sample ID | PDL FISH BA status | PD-L1 IHC % | PD-L1 IHC intensity | PD-L1 IHC histological score | PD-L2 IHC % | PD-L2 IHC intensity | PD-L2 IHC histological score
A43029 | 0 | | | | | |
A43030 | 1 | 10 | 1 | 10 | 0 | 0 | 0
A43031 | 0 | 0 | 0 | 0 | 0 | 0 | 0
A43032 | 0 | 0 | 0 | 0 | 40 | 2 | 80
A43033 | 0 | 0 | 0 | 0 | 60 | 1 | 60
A43034 | 0 | 0 | 0 | 0 | 50 | 2 | 100
A43036 | 0 | 20 | 2 | 40 | 0 | 0 | 0
A43037 | 0 | 100 | 3 | 300 | 70 | 3 | 210
A43038 | 1 | 0 | 0 | 0 | 0 | 0 | 0
A43041 | 1 | 100 | 3 | 300 | 30 | 1 | 30
A43042 | 1 | 100 | 3 | 300 | 30 | 1 | 30
A43043 | 1 | 10 | 2 | 20 | 50 | 2 | 100
A43045 | 0 | | | | | |
A43046 | 0 | | | | | |
A43047 | 0 | | | | | |
A43049 | 0 | 0 | 0 | 0 | 10 | 1 | 10
A43050 | 0 | 50 | 2 | 100 | 0 | 0 | 0
A43051 | 0 | | | | | |
A43052 | 0 | | | | | |
A43053 | 0 | 80 | 2 | 160 | 30 | 1 | 30
A43067 | 0 | NA | NA | NA | NA | NA | NA
A43068 | 0 | NA | NA | NA | NA | NA | NA
A43069 | 0 | 80 | 2 | 160 | 60 | 2 | 120
A43070 | 0 | NA | NA | NA | NA | NA | NA
A43071 | 1 | NA | NA | NA | NA | NA | NA
A43072 | 1 | NA | NA | NA | NA | NA | NA
A43073 | 0 | NA | NA | NA | NA | NA | NA
A43074 | 1 | 60 | 1 | 60 | 10 | 1 | 10
A43075 | 0 | 0 | 0 | 0 | 100 | 3 | 300
A43076 | 0 | 100 | 3 | 300 | 0 | 0 | 0
A43077 | 0 | 90 | 2 | 180 | 80 | 3 | 240
A43078 | 0 | NA | NA | NA | 20 | 2 | 40
A43079 | 1 | 0 | 0 | 0 | 50 | 3 | 150
A43080 | 0 | 30 | 1 | 30 | 60 | 1 | 60
A43081 | 0 | 0 | 0 | 0 | 0 | 0 | 0
A43082 | 1 | | | | | |
A43083 | 1 | | | | | |
A43084 | 1 | 100 | 3 | 300 | 20 | 1 | 20
A43085 | 1 | | | | | |
A43086 | 1 | | | | | |
A43087 | 1 | | | | | |
A43088 | 1 | | | | | |
A43089 | 1 | 100 | 3 | 300 | 0 | 0 | 0
A43090 | 1 | 100 | 3 | 300 | 90 | 3 | 270
A43091 | 1 | 100 | 3 | 300 | 0 | 0 | 0
A43092 | 1 | 0 | 0 | 0 | 10 | 1 | 10
A43093 | 1 | 70 | 2 | 140 | 60 | 3 | 180
A43094 | 1 | 100 | 3 | 300 | 90 | 3 | 270
A43095 | 1 | 0 | 0 | 0 | 100 | 3 | 300
A43096 | 1 | 100 | 3 | 300 | 0 | 0 | 0
A43097 | 1 | 20 | 1 | 20 | 30 | 1 | 30
A43099 | 1 | | | | | |
A43100 | O-EX | NA | NA | NA | 0 | 0 | 0
A43101 | O-EX | 80 | 2 | 160 | 0 | 0 | 0
A43102 | 0 | | | | | |
A43103 | O-EX | | | | | |
A43104 | 0 | | | | | |
A43105 | 0 | | | | | |
A43106 | 0 | | | | | |
A43107 | 0 | | | | | |
A43108 | 0 | | | | | |
A43109 | 0 | | | | | |
A43110 | 1 | | | | | |
A43111 | 0 | 0 | 0 | 0 | 60 | 2 | 120
A43115 | 1 | 0 | 0 | 0 | 100 | 2 | 200
A43117 | 1 | | | | | |
A43118 | O-EX | NA | NA | NA | NA | NA | NA
A43119 | 0 | | | | | |

Table B.4: Merged structural variant predictions in the chr16 capture region. Position1 and Position2 give the location of the rearrangement breakpoints as chromosome, position and read orientation. Coordinates are based on hg19. The Reads column gives the maximum number of spanning reads supporting the variant in any of the result-sets. The Result-sets column gives the number of result-sets in which the prediction was found.
*Gene columns were manually annotated using the predicted coordinates in the UCSC Genome Browser, with "downstream" and "upstream" referring to relative locations on the positive strand. DEL: deletion, INV: inversion, DUP: duplication, TRA: translocation.

Sample | Type | Position1 | Position2 | Gene1* | Gene2* | Length | Reads | Result-sets
A43036 | INV | chr16:10,972,119 (+) | chr16:11,349,103 (+) | CIITA (intron 1) | SOCS1 (exon 2) | 376,984 | 144 | 6
A43036 | TRA | chrX:41,548,791 (-) | chr16:10,972,522 (+) | GPR34 (intron 1) | CIITA (intron 1) | NA | 68 | 6
A43036 | DEL | chr16:10,973,118 (-) | chr16:10,972,770 (+) | CIITA (intron 1) | CIITA (intron 1) | 348 | 112 | 4
A43036 | INV | chr16:10,972,127 (-) | chr16:11,349,114 (-) | CIITA (intron 1) | SOCS1 (exon 2) | 376,987 | 65 | 4
A43036 | TRA | chr16:10,972,530 (-) | chrX:41,548,793 (+) | CIITA (intron 1) | GPR34 (intron 1) | NA | 70 | 2
A43037 | DEL | chr16:10,973,706 (-) | chr16:10,972,948 (+) | CIITA (intron 1) | CIITA (intron 1) | 758 | 252 | 6
A43037 | INV | chr16:11,062,836 (-) | chr16:57,168,099 (-) | CLEC16A (intron 3) | CPNE2 (intron 12) | 46,105,263 | 234 | 6
A43037 | INV | chr16:11,063,029 (+) | chr16:30,683,452 (+) | CLEC16A (exon 4) | FBRS (downstream) | 19,620,423 | 148 | 6
A43037 | INV | chr16:10,537,483 (+) | chr16:11,080,769 (+) | ATF7IP2 (intron 7) | CLEC16A (intron 9) | 543,286 | 36 | 6
A43037 | INV | chr16:11,060,782 (-) | chr16:30,690,172 (-) | CLEC16A (intron 3) | FBRS (downstream) | 19,629,390 | 134 | 4
A43037 | INV | chr16:11,110,612 (+) | chr16:28,448,129 (+) | CLEC16A (intron 10) | EIF3C (downstream) | 17,337,517 | 217 | 3
A43037 | DUP | chr16:11,106,289 (-) | chr16:21,748,652 (+) | CLEC16A (intron 10) | OTOA (intron 21) | 10,642,363 | 10 | 3
A43037 | INV | chr16:11,059,965 (+) | chr16:21,428,484 (+) | CLEC16A (intron 3) | NPIPB3 (intron 2) | 10,368,519 | 18 | 2
A43037 | INV | chr16:11,060,709 (-) | chr16:30,690,172 (-) | CLEC16A (intron 3) | FBRS (downstream) | 19,629,463 | 134 | 1
A43043 | DEL | chr16:10,972,316 (-) | chr16:10,972,128 (+) | CIITA (intron 1) | CIITA (intron 1) | 188 | 94 | 4
A43050 | DEL | chr16:10,973,044 (-) | chr16:10,972,395 (+) | CIITA (intron 1) | CIITA (intron 1) | 649 | 108 | 6
A43067 | INV | chr16:10,973,601 (+) | chr16:27,326,617 (+) | CIITA (intron 1) | IL4R (intron 1) | 16,353,016 | 39 | 6
A43067 | INV | chr16:10,973,610 (-) | chr16:27,326,640 (-) | CIITA (intron 1) | IL4R (intron 1) | 16,353,030 | 16 | 6
A43069 | TRA | chr22:39,854,860 (+) | chr16:10,973,113 (+) | MGAT3 (intron 1) | CIITA (intron 1) | NA | 55 | 6
A43069 | TRA | chr22:39,854,856 (-) | chr16:10,973,113 (-) | MGAT3 (intron 1) | CIITA (intron 1) | NA | 40 | 6
A43069 | INV | chr16:11,336,283 (-) | chr16:11,335,246 (-) | SOCS1 (upstream) | SOCS1 (upstream) | 1,037 | 7 | 1
A43077 | DEL | chr16:12,062,507 (-) | chr16:10,972,806 (+) | TNFRSF17 (downstream) | CIITA (intron 1) | 1,089,701 | 81 | 6
A43081 | DEL | chr16:10,973,504 (-) | chr16:10,971,906 (+) | CIITA (intron 1) | CIITA (intron 1) | 1,598 | 21 | 5
A43081 | DEL | chr16:10,972,143 (-) | chr16:10,971,940 (+) | CIITA (intron 1) | CIITA (intron 1) | 203 | 12 | 4
A43081 | INV | chr16:10,972,733 (-) | chr16:10,973,111 (-) | CIITA (intron 1) | CIITA (intron 1) | 378 | 17 | 3
A43081 | INV | chr16:10,973,005 (+) | chr16:10,972,714 (+) | CIITA (intron 1) | CIITA (intron 1) | 291 | 18 | 3
A43081 | INV | chr16:11,191,188 (+) | chr16:11,191,294 (+) | CLEC16A (intron 18) | CLEC16A (intron 18) | 106 | 7 | 1
A43081 | TRA | chr10:48,988,103 (-) | chr16:10,973,628 (+) | GLUD1P7 (downstream) | CIITA (intron 1) | NA | 7 | 1
A43081 | INV | chr16:11,082,018 (+) | chr16:11,081,907 (+) | CLEC16A (intron 9) | CLEC16A (intron 9) | 111 | 7 | 1
A43081 | TRA | chr16:10,973,648 (-) | chr10:47,279,151 (+) | CIITA (intron 1) | BMS1P6 (downstream) | NA | 8 | 1
A43081 | TRA | chr10:46,798,352 (-) | chr16:10,973,628 (+) | BMS1P5 (downstream) | CIITA (intron 1) | NA | 7 | 1
A43081 | INV | chr16:11,111,185 (+) | chr16:11,111,295 (+) | CLEC16A (intron 10) | CLEC16A (intron 10) | 110 | 7 | 1
A43081 | TRA | chr10:48,150,225 (-) | chr16:10,973,648 (-) | CTSL1P2 (upstream) | CIITA (intron 1) | NA | 8 | 1
A43081 | INV | chr16:11,191,264 (+) | chr16:11,191,197 (+) | CLEC16A (intron 18) | CLEC16A (intron 18) | 67 | 7 | 1
A43081 | INV | chr16:11,045,032 (-) | chr16:11,045,138 (-) | CLEC16A (intron 1) | CLEC16A (intron 1) | 106 | 7 | 1
A43115 | DEL | chr16:10,973,158 (-) | chr16:10,972,963 (+) | CIITA (intron 1) | CIITA (intron 1) | 195 | 141 | 5
A43115 | INV | chr16:10,971,282 (+) | chr16:10,971,688 (+) | CIITA (intron 1) | CIITA (intron 1) | 406 | 251 | 3
A43115 | INV | chr16:10,971,301 (-) | chr16:10,971,709 (-) | CIITA (intron 1) | CIITA (intron 1) | 408 | 236 | 3
A43115 | DEL | chr16:11,349,494 (-) | chr16:11,349,402 (+) | SOCS1 (intron 1) | SOCS1 (intron 1) | 92 | 25 | 3
A43115 | TRA | chr7:128,309,229 (-) | chr16:10,980,174 (+) | FAM71F2 (upstream) | CIITA (intron 1) | NA | 15 | 2
A43115 | TRA | chr10:48,150,928 (+) | chr16:10,973,053 (+) | CTSL1P2 (upstream) | CIITA (intron 1) | NA | 98 | 1
A43115 | TRA | chr16:10,973,066 (-) | chr10:46,796,841 (+) | CIITA (intron 1) | BMS1P5 (downstream) | NA | 129 | 1
A43115 | TRA | chr10:47,278,446 (-) | chr16:10,973,053 (+) | BMS1P6 (downstream) | CIITA (intron 1) | NA | 107 | 1
A43115 | TRA | chr16:10,973,066 (-) | chr10:48,986,597 (+) | CIITA (intron 1) | GLUD1P7 (downstream) | NA | 135 | 1
A43115 | TRA | chr10:48,987,377 (-) | chr16:10,973,053 (+) | GLUD1P7 (downstream) | CIITA (intron 1) | NA | 116 | 1
A43117 | DEL | chr16:10,973,536 (-) | chr16:10,972,620 (+) | CIITA (intron 1) | CIITA (intron 1) | 916 | 349 | 6
A43029 | INV | chr16:11,201,073 (-) | chr16:11,785,708 (-) | CLEC16A (intron 18) | TXNDC11 (exon 8) | 584,635 | 67 | 8
A43030 | INV | chr16:11,348,883 (-) | chr16:11,351,573 (-) | SOCS1 (exon 2) | SOCS1 (downstream) | 2,690 | 14 | 5
A43030 | DEL | chr16:11,116,222 (-) | chr16:10,947,070 (+) | CLEC16A (intron 11) | CIITA (upstream) | 169,152 | 38 | 4
A43030 | DEL | chr16:11,349,193 (-) | chr16:11,349,015 (+) | SOCS1 (exon 2) | SOCS1 (exon 2) | 178 | 9 | 2
A43030 | TRA | chr16:11,349,135 (-) | chr14:106,326,048 (+) | SOCS1 (exon 2) | IGH | NA | 19 | 2
A43031 | INV | chr16:11,013,706 (+) | chr16:11,033,187 (+) | CIITA (intron 14) | DEXI (intron 1) | 19,481 | 149 | 8
A43031 | INV | chr16:11,013,709 (-) | chr16:11,033,083 (-) | CIITA (intron 14) | DEXI (intron 1) | 19,374 | 110 | 7
A43038 | TRA | chr16:10,979,957 (+) | chr4:19,079,508 (-) | CIITA (intron 1) | Intergenic | NA | 7 | 1
A43049 | INV | chr16:3,056,935 (+) | chr16:10,966,595 (+) | CLDN9 (upstream) | CIITA (upstream) | 7,909,660 | 21 | 8
A43051 | DEL | chr16:10,972,040 (-) | chr16:7,638,085 (+) | CIITA (intron 1) | RBFOX1 (intron 7) | 3,333,955 | 19 | 5
A43052 | TRA | chr16:10,974,031 (-) | chr2:61,108,467 (+) | CIITA (intron 1) | REL (upstream) | NA | 21 | 8
A43052 | TRA | chr2:61,108,477 (-) | chr16:10,974,001 (+) | REL (upstream) | CIITA (intron 1) | NA | 13 | 8
A43052 | TRA | chr2:89,159,665 (-) | chr16:10,972,714 (+) | IGK | CIITA (intron 1) | NA | 15 | 7
A43052 | DEL | chr16:10,973,286 (-) | chr16:10,972,769 (+) | CIITA (intron 1) | CIITA (intron 1) | 517 | 13 | 4
A43052 | TRA | chr16:10,972,727 (-) | chr2:89,159,288 (+) | CIITA (intron 1) | IGK | NA | 13 | 3
A43068 | DEL | chr16:11,310,352 (-) | chr16:10,962,704 (+) | CLEC16A (downstream) | CIITA (upstream) | 347,648 | 54 | 8
A43068 | TRA | chr14:106,211,708 (-) | chr16:11,348,887 (+) | IGH | SOCS1 (exon 2) | NA | 28 | 3
A43068 | TRA | chr16:11,348,899 (-) | chr14:106,213,380 (+) | SOCS1 (exon 2) | IGH | NA | 106 | 3
A43068 | TRA | chr16:11,348,887 (+) | chr14:106,211,391 (-) | SOCS1 (exon 2) | IGH | NA | 18 | 1
A43070 | DEL | chr16:11,812,699 (-) | chr16:10,983,031 (+) | TXNDC11 (intron 5) | CIITA (intron 1) | 829,668 | 41 | 8
A43070 | TRA | chr16:10,972,919 (-) | chr1:2,984,655 (+) | CIITA (intron 1) | PRDM16 (downstream) | NA | 56 | 8
A43070 | TRA | chr1:2,985,148 (-) | chr16:10,972,750 (+) | PRDM16 (downstream) | CIITA (intron 1) | NA | 102 | 5
A43070 | DEL | chr16:11,480,301 (-) | chr16:11,215,236 (+) | RMI2 (downstream) | CLEC16A (intron 19) | 265,065 | 26 | 3
A43070 | DEL | chr16:11,215,210 (+) | chr16:11,480,299 (-) | CLEC16A (intron 19) | RMI2 (downstream) | 265,089 | 24 | 2
A43070 | INV | chr16:11,215,206 (+) | chr16:11,480,274 (+) | CLEC16A (intron 19) | RMI2 (downstream) | 265,068 | 19 | 2
A43070 | DEL | chr16:11,480,347 (-) | chr16:11,215,210 (+) | RMI2 (downstream) | CLEC16A (intron 19) | 265,137 | 29 | 1
A43071 | DEL | chr16:10,996,515 (-) | chr16:10,996,041 (+) | CIITA (intron 7) | CIITA (intron 7) | 474 | 29 | 8
A43071 | DEL | chr16:11,339,356 (-) | chr16:10,756,488 (+) | SOCS1 (upstream) | TEKT5 (intron 5) | 582,868 | 21 | 8
A43071 | DEL | chr16:11,000,356 (-) | chr16:10,998,669 (+) | CIITA (intron 10) | CIITA (intron 10) | 1,687 | 20 | 7
A43071 | DEL | chr16:11,004,046 (-) | chr16:11,003,045 (+) | CIITA (intron 10) | CIITA (intron 10) | 1,001 | 30 | 7
A43071 | DEL | chr16:10,989,620 (+) | chr16:10,992,525 (-) | CIITA (intron 3) | CIITA (intron 3) | 2,905 | 19 | 7
A43071 | DEL | chr16:10,989,526 (-) | chr16:10,989,285 (+) | CIITA (intron 2) | CIITA (intron 2) | 241 | 23 | 6
A43071 | DEL | chr16:10,998,601 (-) | chr16:10,997,752 (+) | CIITA (intron 9) | CIITA (intron 9) | 849 | 19 | 6
A43071 | DEL | chr16:11,012,381 (+) | chr16:11,016,021 (-) | CIITA (intron 14) | CIITA (intron 14) | 3,640 | 8 | 5
A43071 | DEL | chr16:10,997,593 (-) | chr16:10,996,663 (+) | CIITA (intron 8) | CIITA (intron 8) | 930 | 28 | 4
A43071 | DEL | chr16:10,971,237 (+) | chr16:10,989,136 (-) | CIITA (intron 1) | CIITA (intron 1) | 17,899 | 17 | 4
A43071 | DEL | chr16:11,010,316 (+) | chr16:11,012,296 (-) | CIITA (intron 13) | CIITA (intron 13) | 1,980 | 11 | 3
A43071 | DEL | chr16:11,009,507 (+) | chr16:11,010,223 (-) | CIITA (intron 12) | CIITA (intron 12) | 716 | 11 | 3
A43071 | DEL | chr16:11,000,446 (-) | chr16:10,997,760 (+) | CIITA (intron 10) | CIITA (intron 9) | 2,686 | 35 | 2
A43071 | DEL | chr16:11,000,326 (-) | chr16:10,997,762 (+) | CIITA (intron 10) | CIITA (intron 9) | 2,564 | 34 | 2
A43071 | DEL | chr16:11,002,003 (+) | chr16:11,002,882 (-) | CIITA (intron 10) | CIITA (intron 10) | 879 | 23 | 2
A43071 | DEL | chr16:11,004,116 (+) | chr16:11,009,426 (-) | CIITA (intron 11) | CIITA (intron 11) | 5,310 | 11 | 2
A43071 | DEL | chr16:11,016,347 (+) | chr16:11,017,084 (-) | CIITA (intron 16) | CIITA (intron 16) | 737 | 9 | 2
A43071 | DEL | chr16:11,016,107 (+) | chr16:11,016,263 (-) | CIITA (intron 15) | CIITA (intron 15) | 156 | 11 | 2
A43071 | DEL | chr16:10,996,060 (-) | chr16:10,995,326 (+) | CIITA (intron 7) | CIITA (intron 5) | 734 | 8 | 1
A43071 | DEL | chr16:11,003,989 (-) | chr16:11,002,129 (+) | CIITA (intron 10) | CIITA (intron 10) | 1,860 | 7 | 1
A43071 | DEL | chr16:10,992,769 (-) | chr16:10,989,418 (+) | CIITA (intron 4) | CIITA (intron 2) | 3,351 | 12 | 1
A43071 | DEL | chr16:10,992,681 (-) | chr16:10,989,701 (+) | CIITA (intron 4) | CIITA (intron 3) | 2,980 | 10 | 1
A43071 | DEL | chr16:10,995,896 (-) | chr16:10,992,830 (+) | CIITA (intron 6) | CIITA (intron 4) | 3,066 | 13 | 1
A43071 | DEL | chr16:10,992,480 (-) | chr16:10,989,155 (+) | CIITA (intron 3) | CIITA (intron 1) | 3,325 | 7 | 1
A43071 | DEL | chr16:10,992,859 (+) | chr16:10,995,891 (-) | CIITA (intron 5) | CIITA (intron 6) | 3,032 | 8 | 1
A43071 | DEL | chr16:10,995,412 (+) | chr16:10,995,891 (-) | CIITA (intron 6) | CIITA (intron 6) | 479 | 11 | 1
A43072 | DEL | chr16:10,996,431 (-) | chr16:10,861,924 (+) | CIITA (intron 7) | NUBP1 (intron 9) | 134,507 | 73 | 8
A43075 | DEL | chr16:11,349,240 (-) | chr16:11,348,417 (+) | SOCS1 (exon 2) | SOCS1 (3' UTR) | 823 | 90 | 7
A43075 | TRA | chr16:10,973,366 (-) | chr12:8,762,984 (+) | CIITA (intron 1) | AICDA (intron 1) | NA | 48 | 6
A43075 | TRA | chr12:8,764,607 (-) | chr16:10,973,178 (+) | AICDA (intron 1) | CIITA (intron 1) | NA | 133 | 5
A43075 | DEL | chr16:10,972,691 (+) | chr16:10,972,983 (-) | CIITA (intron 1) | CIITA (intron 1) | 292 | 102 | 1
A43076 | INV | chr16:11,037,301 (+) | chr16:11,352,433 (+) | CLEC16A (upstream) | RMI2 (intron 1) | 315,132 | 40 | 7
A43076 | DEL | chr16:12,374,408 (-) | chr16:10,982,313 (+) | SNX29 (intron 15) | CIITA (intron 1) | 1,392,095 | 20 | 6
A43076 | DUP | chr16:10,972,350 (-) | chr16:10,972,662 (+) | CIITA (intron 1) | CIITA (intron 1) | 312 | 45 | 4
A43078 | TRA | chr8:128,808,741 (-) | chr16:10,972,594 (-) | PVT1 (intron 1) | CIITA (intron 1) | NA | 90 | 4
A43078 | TRA | chr8:128,808,716 (+) | chr16:10,972,569 (+) | PVT1 (intron 1) | CIITA (intron 1) | NA | 39 | 3
A43078 | DEL | chr16:10,972,919 (-) | chr16:10,972,600 (+) | CIITA (intron 1) | CIITA (intron 1) | 319 | 55 | 2
A43078 | DEL | chr16:10,973,104 (-) | chr16:10,972,500 (+) | CIITA (intron 1) | CIITA (intron 1) | 604 | 41 | 1
A43079 | TRA | chr10:48,989,246 (-) | chr16:10,972,800 (+) | GLUD1P7 (downstream) | CIITA (intron 1) | NA | 146 | 7
A43079 | DEL | chr16:10,973,892 (-) | chr16:10,973,408 (+) | CIITA (intron 1) | CIITA (intron 1) | 484 | 189 | 6
A43079 | TRA | chr16:11,349,095 (+) | chr14:106,325,853 (+) | SOCS1 (exon 2) | IGH | NA | 29 | 4
A43079 | TRA | chr16:10,972,823 (-) | chr10:46,794,179 (+) | CIITA (intron 1) | BMS1P5 (downstream) | NA | 18 | 2
A43079 | TRA | chr16:10,972,823 (-) | chr10:48,983,939 (+) | CIITA (intron 1) | GLUD1P7 (downstream) | NA | 24 | 1
A43080 | DEL | chr16:11,348,838 (-) | chr16:10,973,122 (+) | SOCS1 (exon 2) | CIITA (intron 1) | 375,716 | 195 | 8
A43080 | DEL | chr16:10,972,446 (-) | chr16:10,971,699 (+) | CIITA (intron 1) | CIITA (intron 1) | 747 | 106 | 8
A43082 | DEL | chr16:11,349,550 (-) | chr16:11,349,119 (+) | SOCS1 (intron 1) | SOCS1 (exon 2) | 431 | 45 | 6
A43082 | DEL | chr16:10,972,737 (-) | chr16:10,972,561 (+) | CIITA (intron 1) | CIITA (intron 1) | 176 | 29 | 2
A43082 | DEL | chr16:10,972,338 (+) | chr16:10,972,492 (-) | CIITA (intron 1) | CIITA (intron 1) | 154 | 28 | 2
A43082 | DEL | chr16:10,972,598 (-) | chr16:10,972,316 (+) | CIITA (intron 1) | CIITA (intron 1) | 282 | 18 | 1
A43084 | DEL | chr16:10,972,598 (-) | chr16:10,972,316 (+) | CIITA (intron 1) | CIITA (intron 1) | 282 | 97 | 1
A43090 | DEL | chr16:11,180,645 (-) | chr16:11,179,301 (+) | CLEC16A (intron 18) | CLEC16A (intron 18) | 1,344 | 70 | 8
A43090 | DEL | chr16:11,348,970 (-) | chr16:11,348,503 (+) | SOCS1 (exon 2) | SOCS1 (3' UTR) | 467 | 16 | 5
A43092 | TRA | chr8:128,749,207 (-) | chr16:11,349,139 (-) | MYC (intron 1) | SOCS1 (exon 2) | NA | 23 | 8
A43092 | TRA | chr8:128,749,148 (+) | chr16:11,349,126 (+) | MYC (intron 1) | SOCS1 (exon 2) | NA | 13 | 5
A43093 | DEL | chr16:10,975,142 (-) | chr16:10,973,558 (+) | CIITA (intron 1) | CIITA (intron 1) | 1,584 | 80 | 8
A43093 | INV | chr16:10,971,536 (+) | chr16:10,972,727 (+) | CIITA (intron 1) | CIITA (intron 1) | 1,191 | 90 | 6
A43093 | DEL | chr16:11,084,314 (-) | chr16:10,797,444 (+) | CLEC16A (intron 9) | NUBP1 (upstream) | 286,870 | 8 | 5
A43093 | INV | chr16:10,971,870 (-) | chr16:10,972,644 (-) | CIITA (intron 1) | CIITA (intron 1) | 774 | 83 | 3
A43093 | DUP | chr16:10,972,945 (-) | chr16:10,973,145 (+) | CIITA (intron 1) | CIITA (intron 1) | 200 | 35 | 2
A43094 | DEL | chr16:11,348,834 (-) | chr16:9,992,019 (+) | SOCS1 (exon 2) | GRIN2A (intron 3) | 1,356,815 | 111 | 7
A43095 | DEL | chr16:10,974,070 (-) | chr16:10,972,127 (+) | CIITA (intron 1) | CIITA (intron 1) | 1,943 | 20 | 7
A43095 | DEL | chr16:11,349,373 (-) | chr16:11,348,681 (+) | SOCS1 (5' UTR) | SOCS1 (3' UTR) | 692 | 85 | 5
A43095 | DEL | chr16:10,972,312 (-) | chr16:10,972,015 (+) | CIITA (intron 1) | CIITA (intron 1) | 297 | 76 | 5
A43095 | INV | chr16:10,973,186 (-) | chr16:10,973,372 (-) | CIITA (intron 1) | CIITA (intron 1) | 186 | 66 | 3
A43095 | INV | chr16:10,973,169 (+) | chr16:10,973,376 (+) | CIITA (intron 1) | CIITA (intron 1) | 207 | 37 | 3
A43095 | DEL | chr16:10,974,698 (-) | chr16:10,974,425 (+) | CIITA (intron 1) | CIITA (intron 1) | 273 | 83 | 2
A43095 | DEL | chr16:10,972,020 (+) | chr16:10,974,069 (-) | CIITA (intron 1) | CIITA (intron 1) | 2,049 | 14 | 2
A43097 | DEL | chr16:10,971,998 (-) | chr16:10,971,586 (+) | CIITA (intron 1) | CIITA (intron 1) | 412 | 138 | 5
A43101 | INV | chr16:10,971,827 (+) | chr16:10,973,681 (+) | CIITA (intron 1) | CIITA (intron 1) | 1,854 | 52 | 6
A43101 | INV | chr16:10,971,821 (-) | chr16:10,973,687 (-) | CIITA (intron 1) | CIITA (intron 1) | 1,866 | 36 | 5
A43101 | DEL | chr16:11,349,678 (-) | chr16:11,349,563 (+) | SOCS1 (intron 1) | SOCS1 (intron 1) | 115 | 7 | 1
A43110 | DEL | chr16:11,143,548 (-) | chr16:10,810,673 (+) | CLEC16A (intron 16) | NUBP1 (upstream) | 332,875 | 63 | 7
A43119 | TRA | chr16:10,973,001 (-) | chr14:106,094,545 (-) | CIITA (intron 1) | IGH | NA | 9 | 1
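The Reads and Result-sets columns of Tables B.2 and B.4 summarize each prediction across result-sets. Reads is the maximum spanning-read support, and Result-sets is the number of result-sets that reported the prediction. The following minimal Python sketch shows that bookkeeping. It assumes that two calls describe the same event when sample, SV type, chromosomes and read orientations agree and both breakpoints fall within a hypothetical 50 bp window. The `SVCall` schema, the `merge_calls` helper and the window size are illustrative assumptions, not the matching rule the pipeline actually used.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SVCall:
    """One breakpoint-pair prediction from a single result-set (hypothetical schema)."""
    result_set: str
    sample: str
    sv_type: str   # DEL, INV, DUP or TRA
    chrom1: str
    pos1: int
    strand1: str   # read orientation at breakpoint 1, "+" or "-"
    chrom2: str
    pos2: int
    strand2: str
    reads: int     # spanning reads supporting the call in this result-set


@dataclass
class MergedSV:
    representative: SVCall                         # first call seen for this event
    max_reads: int                                 # corresponds to the "Reads" column
    result_sets: set = field(default_factory=set)  # len() gives "Result-sets"


def same_event(a: SVCall, b: SVCall, window: int) -> bool:
    """Same sample, type, chromosomes and orientations; breakpoints within `window` bp."""
    return (
        (a.sample, a.sv_type, a.chrom1, a.strand1, a.chrom2, a.strand2)
        == (b.sample, b.sv_type, b.chrom1, b.strand1, b.chrom2, b.strand2)
        and abs(a.pos1 - b.pos1) <= window
        and abs(a.pos2 - b.pos2) <= window
    )


def merge_calls(calls: List[SVCall], window: int = 50) -> List[MergedSV]:
    """Greedy merge: each call joins the first matching group or starts a new one."""
    groups: List[MergedSV] = []
    for call in calls:
        for group in groups:
            if same_event(group.representative, call, window):
                group.max_reads = max(group.max_reads, call.reads)
                group.result_sets.add(call.result_set)
                break
        else:
            groups.append(MergedSV(call, call.reads, {call.result_set}))
    return groups


# Hypothetical calls for one sample, reported by two result-sets.
calls = [
    SVCall("setA", "S1", "TRA", "chr9", 5452813, "-", "chr11", 35161226, "+", 204),
    SVCall("setB", "S1", "TRA", "chr9", 5452820, "-", "chr11", 35161231, "+", 150),
    SVCall("setB", "S1", "DEL", "chr9", 6135848, "-", "chr9", 5491418, "+", 35),
]
merged = merge_calls(calls)
for m in merged:
    print(m.representative.sv_type, m.max_reads, len(m.result_sets))
```

Under these assumptions, the two nearby translocation calls collapse into one event supported by two result-sets with a maximum of 204 reads, while the deletion stays separate. A greedy first-match merge is the simplest choice; a real implementation might instead cluster all calls jointly so that the result does not depend on input order.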
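Table B.3 defines the histological score as the percentage of positive tumour cells multiplied by the staining intensity. The short Python sketch below applies that rule and checks it against four rows copied from the table. Two aspects are assumptions: intensity is treated as an integer graded 0 to 3, which is consistent with the values and the maximum score of 300 in Table B.3, and the function name is illustrative.

```python
def histological_score(percent_positive: int, intensity: int) -> int:
    """Percentage of positive tumour cells multiplied by staining intensity."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity is assumed to be graded 0, 1, 2 or 3")
    return percent_positive * intensity


# (sample, marker, % positive, intensity, reported score) taken from Table B.3
rows = [
    ("A43037", "PD-L2", 70, 3, 210),
    ("A43043", "PD-L1", 10, 2, 20),
    ("A43093", "PD-L1", 70, 2, 140),
    ("A43115", "PD-L2", 100, 2, 200),
]
for sample, marker, pct, intensity, reported in rows:
    score = histological_score(pct, intensity)
    print(sample, marker, score, score == reported)
```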
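Several Samtools steps in Table B.1 filter reads with `view -F`, which drops any alignment that has at least one of the given FLAG bits set. The following Python sketch decodes the three masks used there, with bit names as defined in the SAM format specification. The decoded masks show that `-F 1024` removes PCR/optical duplicates and `-F 512` removes reads that failed vendor quality checks. The `-F 1294` mask removes properly paired, unmapped, mate-unmapped, secondary and duplicate alignments, which leaves the discordant read pairs that LUMPY expects. The helper name is illustrative.

```python
from typing import List

# FLAG bits and the names samtools uses for them (SAM format specification).
SAM_FLAG_BITS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def decode_flag(value: int) -> List[str]:
    """Names of all FLAG bits set in `value`, in ascending bit order."""
    return [name for bit, name in SAM_FLAG_BITS.items() if value & bit]


# -F masks used in Table B.1; reads with ANY of these bits set are excluded.
for mask in (1024, 1294, 512):
    print(mask, decode_flag(mask))
```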
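The FISH scoring thresholds in Appendix A (FISH assays) reduce to a small rule set over the proportions of scored nuclei showing each signal pattern. The Python sketch below encodes those rules under stated assumptions. The `NucleiCounts` tallies and the `fish_calls` function are hypothetical, and percentages are assumed to be taken over all nuclei scored for the case. The appendix does not say how to resolve a case that exceeds more than one threshold, so the sketch reports every threshold that is exceeded rather than imposing a precedence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NucleiCounts:
    """Hypothetical per-case tallies of scored interphase nuclei."""
    total: int                 # number of nuclei scored for the case
    one_fusion: int            # nuclei with one fusion signal
    three_or_four_fusion: int  # nuclei with three or four fusion signals
    five_plus_fusion: int      # nuclei with five or more fusion signals
    split_or_unbalanced: int   # split signals or unbalanced-rearrangement patterns


def fish_calls(c: NucleiCounts) -> List[str]:
    """Apply the Appendix A thresholds; every threshold exceeded is reported."""
    if c.total <= 0:
        raise ValueError("no nuclei scored")

    def pct(n: int) -> float:
        return 100.0 * n / c.total

    calls = []
    if pct(c.one_fusion) > 40:
        calls.append("copy number loss")
    if pct(c.three_or_four_fusion) > 20:
        calls.append("gain")
    if pct(c.five_plus_fusion) > 20:
        calls.append("amplification")
    if pct(c.split_or_unbalanced) > 5:
        calls.append("break-apart")
    return calls


print(fish_calls(NucleiCounts(200, 10, 5, 0, 14)))  # 7% split signals
print(fish_calls(NucleiCounts(200, 10, 5, 0, 10)))  # exactly 5% is not above the threshold
```

The thresholds are strict inequalities, as written in the appendix, so a case with exactly 5% split signals is not called break-apart.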
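The library construction protocol in Appendix A uses fault-tolerant hexamer barcodes for multiplexing. Such barcode sets are commonly designed so that every pair of barcodes differs at three or more positions, which lets a single sequencing error be corrected without ambiguity. The appendix does not describe the barcode design or the demultiplexing software, so this is an assumption. The Python sketch below shows Hamming-distance assignment of an observed index read. The barcode sequences, library names and `assign_barcode` helper are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed: str, barcodes: Dict[str, str],
                   max_mismatches: int = 1) -> Optional[str]:
    """Return the library whose barcode is within max_mismatches of the index
    read, or None if no barcode (or more than one) matches."""
    hits = [library for barcode, library in barcodes.items()
            if hamming(observed, barcode) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical hexamer barcodes; these two differ at all six positions.
BARCODES = {"ACGTAC": "library_1", "TGCATG": "library_2"}

print(assign_barcode("ACGTAC", BARCODES))  # exact match
print(assign_barcode("ACGTAA", BARCODES))  # one sequencing error tolerated
print(assign_barcode("AAAAAA", BARCODES))  # too far from every barcode
```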
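Appendix A states that final library concentration was determined by combining Caliper LabChip GX size profiles with Quant-iT mass quantification. A common way to combine the two converts a mass concentration and a mean fragment size into molarity, using about 660 g/mol per base pair of double-stranded DNA. The appendix does not give the exact formula used, so the Python sketch below is only an illustration of that standard conversion. The constant name, function name and example values are hypothetical.

```python
AVG_MASS_PER_BP = 660.0  # g/mol per base pair of double-stranded DNA (approximation)


def library_molarity_nM(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a mass concentration (ng/uL) and mean fragment size (bp) to nM.

    1 ng/uL equals 1 mg/L, so moles per litre = conc * 1e-3 / (660 * size);
    multiplying by 1e9 for nM simplifies to conc * 1e6 / (660 * size).
    """
    if conc_ng_per_ul < 0 or mean_size_bp <= 0:
        raise ValueError("invalid concentration or fragment size")
    return conc_ng_per_ul * 1e6 / (AVG_MASS_PER_BP * mean_size_bp)


# Hypothetical library: 2 ng/uL with a mean fragment size of 400 bp.
print(round(library_molarity_nM(2.0, 400.0), 2))
```

Because the fragment size sits in the denominator, two libraries with the same mass concentration but different size profiles can differ substantially in molarity. This is why the size profile matters when libraries are pooled in equimolar amounts before capture.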

