Open Collections

UBC Theses and Dissertations

Using genomic sequencing technology to provide insight into cancer biology and their mechanisms Thibodeau, My Linh 2020

USING GENOMIC SEQUENCING TECHNOLOGY TO PROVIDE INSIGHT INTO CANCER BIOLOGY AND THEIR MECHANISMS

by

My Linh Thibodeau

DEC, Cégep de Trois-Rivières, 2006
M.D., Université Laval, 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
in
The Faculty of Graduate Studies and Postdoctoral Studies
(Bioinformatics)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

February 2020

© My Linh Thibodeau, 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

Using genomic sequencing technology to provide insight into cancer biology and their mechanisms

submitted by My Linh Thibodeau in partial fulfillment of the requirements for the degree of Master of Science in Bioinformatics

Examining Committee:

Kasmintan A. Schrader, Hereditary Cancer Program
Co-supervisor

Steven J.M. Jones, Canada's Michael Smith Genome Sciences Centre at BC Cancer
Co-supervisor

Tony Ng, Department of Pathology and Laboratory Medicine
Supervisory Committee Member

Inanc Birol, Canada's Michael Smith Genome Sciences Centre at BC Cancer
Supervisory Committee Member

Jan M. Friedman, Department of Medical Genetics
Additional Examiner

Abstract

Genomic sequencing technology provides insight into cancer pathogenesis and tumoural mechanisms. Tumour RNA sequencing can be used to assess the functionality of genes by allowing for gene expression quantification and transcriptome analysis. Mutational signatures are somatic patterns of mutations arising from specific mutagenic processes such as exogenous and endogenous exposures, defective DNA repair mechanisms or DNA enzymatic editing. Such signatures are “genomic scars” informing on the underlying biological processes that led to cancer.
Whole genome sequencing (WGS) of tumour DNA and matched blood DNA as well as whole transcriptome sequencing (WTS) of tumour RNA was performed in advanced cancers of diverse types as part of the Personalized OncoGenomics project. Germline single nucleotide variants (SNVs), copy number variants (CNVs) and structural variants (SVs) in 98 hereditary cancer genes were analyzed from germline WGS data. Somatic SNVs, CNVs and SVs were analyzed from tumour WGS and WTS data. Somatic SNV profiles were used for mutational signature modelling. Gene expression was obtained from WTS. Transcriptome targeted assembly was performed for transcript splicing analysis.

We present specific examples demonstrating the usefulness of combined genomic and bioinformatic approaches for understanding clinically unusual cases of cancer and their molecular mechanisms. We used somatic mutational signature profiling to determine the functional impact of germline and somatic variants in MUTYH, a base excision repair gene, on the overall mutational landscape. In Chapter 2, we present a case series of patients with germline MUTYH variants and diverse cancers. We identified two MUTYH variants whose previous classifications in public databases are inconsistent, and we show that these variants cause aberrant splicing and base excision repair deficiency signatures enriched for C:G>A:T transversion mutations. Our results support the pathogenicity of these variants. In Chapter 3, we present the example of comprehensive genomic profiling of a rare and uncharacterized tumour, the eccrine porocarcinoma, in which CDKN2A was identified as a potential novel driver. In both chapters, we used transcriptome targeted assembly to detect and characterize aberrant splicing due to selected germline and somatic variants of interest.

Lay summary

All cancers are caused by genetic mutation and most cancers are thought to arise by chance from sporadic genetic mutations acquired during one’s lifetime.
However, some genetic mutations, called germline, are present in all cells of the body from birth and are passed down from one generation to the next. In these cases, cancer appears to be "running" in the family (e.g. BRCA1 mutation).

Despite the availability of public databases with large quantities of germline and cancer genomic data, each cancer is unique and understanding the complex biology of a given cancer remains challenging. It is important to determine the mechanisms underlying cancer, as targeting these mechanisms may change patient care, from earlier cancer screening to the choice of chemotherapy if cancer arises.

Using DNA sequencing technologies such as whole genome sequencing and whole transcriptome sequencing, together with various data analysis methods, provides insight into cancer biology and their mechanisms.

Preface

The work presented here was completed at the BC Cancer Genome Sciences Centre under the supervision of Dr Kasmintan A. Schrader and Dr Steven J.M. Jones. The data used in this work were generated by the Personalized OncoGenomics project, an ongoing clinical trial (ClinicalTrials.gov ID: NCT02155621) approved by the Research Ethics Board at BC Cancer (H12-00137 and H14-00681).

A version of chapter 2 has been published in the Cold Spring Harbor Molecular Case Studies journal as Thibodeau et al. (2019) under the title “Base excision repair deficiency signatures implicate germline and somatic MUTYH aberrations in pancreatic ductal adenocarcinoma and breast cancer oncogenesis”, of which I am the first author. A version of chapter 3 has been published in npj Precision Oncology journal as Thibodeau et al. (2018) under the title “Whole genome and whole transcriptome genomic profiling of a metastatic eccrine porocarcinoma”, of which I am the first author. I was the main analyst for these two studies and integrated both clinical and genomic data. I wrote and edited the manuscripts under the scientific supervision and guidance of Kasmintan A.
Schrader and Steven J.M. Jones.

Contributions to this work were made by numerous members of the Personalized OncoGenomics research group. Marco A. Marra and Janessa Laskin contributed the Personalized OncoGenomics study design. Eric Y. Zhao provided mutational signature modelling tools and assisted in data interpretation. Erin Pleasance provided scientific oversight. Andrew Mungall contributed a comprehensive description of the laboratory methods and Yussanne Ma the description of bioinformatics methods. Karen Mungall and Caralyn Reisle assisted in structural variant characterization. Melika Bonakdar collaborated closely in pathway characterization of the porocarcinoma genomic profiling report. Yaoqing Shen assisted in germline variant annotation. Readman Chiu and Inanc Birol collaborated on transcriptome characterization of splicing aberrations using transcriptome assembly.

Table of contents

Abstract
Lay summary
Preface
Table of contents
List of Tables
List of Figures
List of Supplementary Materials
List of Abbreviations
Acknowledgements
Dedication
Chapter 1. Introduction
  1.1 Background
    1.1.1 Genomic sequencing technology and comprehensive genomic profiling
    1.1.2 Abnormal splicing
    1.1.3 Somatic mutational signatures
  1.2 Clinical implications and importance
    1.2.1 Challenges in determining the impact of genomic variation
    1.2.2 Combining genomic and bioinformatic approaches to understand variant impact
  1.3 Hypothesis and research objectives
Chapter 2: Role of germline and somatic MUTYH aberrations in base excision repair deficiency signatures
  2.1 Introduction
  2.2 Key findings
  2.3 Methods
  2.4 Results
    2.4.1 Biallelic germline MUTYH variants and pancreatic ductal adenocarcinoma
      2.4.1.1 Clinical history
      2.4.1.2 Germline analysis
      2.4.1.2 Somatic analysis
    2.4.2 Germline MUTYH carriers with somatic second hits in breast cancer
      2.4.2.1 Patient 2 overview
      2.4.2.2 Patient 3 overview
      2.4.2.3 Patient 4 overview
      2.4.2.4 MUTYH Asian founder splice site variant
  2.5 Discussion
  2.6 Conclusion
Chapter 3: Comprehensive genomic profiling of a rare tumour, the eccrine porocarcinoma
  3.1 Introduction
  3.2 Key findings
  3.3 Methods
    3.3.1 Clinical sample
    3.3.2 Tissue collection and preparation
    3.3.3 Whole genome DNA library construction
    3.3.4 Strand-specific RNA library construction
    3.3.5 Whole genome and transcriptome sequencing
    3.3.6 Bioinformatic analysis
      3.3.6.1 Germline alteration assessment
      3.3.6.2 Somatic alteration assessment
      3.4.6.3 Transcriptome gene expression assessment
      3.3.6.4 Biological interpretation and therapeutic association
    3.3.7 Mutational signatures
      3.3.7.1 Monte-Carlo simulation
      3.3.7.2 Timing of mutational processes
      3.3.7.3 Radiation exposure analysis
  3.4 Clinical description
  3.5 Results
    3.5.1 Pathology
    3.5.2 Somatic profiling
      3.5.2.1 Cell cycle regulation
      3.5.2.2 Cell growth, cell survival and Wnt pathway
      3.5.2.3 Copy number and structural variants
  3.6 Discussion
  3.7 Conclusion
Chapter 4: Conclusions
Bibliography
Appendices
  Appendix A List of 98 Mendelian hereditary cancer predisposition genes
  Appendix B Chapter 2
    B.1 Distribution of most common cancer types (oncotree codes) across our 731 cases of advanced cancers.
    B.2 Detailed MUTYH somatic status for MUTYH germline carriers.
    B.3 Patient 1 coding mutation summary.
    B.4 Patient 1 somatic small mutations.
    B.5 Patient 1 somatic structural variants.
    B.6 Patient 1 MUTYH targeted transcriptome assembly (TAP) splicing bed coordinates.
    B.7 Patient 1 COSMIC mutational signatures.
    B.8 Patient 1 SigProfiler mutational signatures.
    B.9 Patient 1 SignatureAnalyzer mutational signature.
    B.10 Patient 1 COSMIC signatures Bayesian probabilities for KRAS G12C mutation.
    B.11 Patient 1 SigProfiler signatures Bayesian probabilities for KRAS G12C mutation.
    B.12 Patient 1 SignatureAnalyzer signatures Bayesian probabilities for KRAS G12C mutation.
    B.13 Patient 2 coding mutation summary.
    B.14 Patient 2 somatic small mutations.
    B.15 Patient 2 somatic structural variants.
    B.16 Patient 2 MUTYH targeted transcriptome assembly (TAP) splicing bed coordinates.
    B.17 Patient 2 COSMIC mutational signatures.
    B.18 Patient 2 SigProfiler mutational signatures.
    B.19 Patient 2 SignatureAnalyzer mutational signatures.
    B.20 Patient 3 coding mutation summary.
    B.21 Patient 3 somatic small mutations.
    B.22 Patient 3 somatic structural variants.
    B.23 Patient 3 COSMIC mutational signatures.
    B.24 Patient 3 SigProfiler mutational signatures.
    B.25 Patient 3 SignatureAnalyzer mutational signatures.
    B.26 EthSeq predicted ethnicity of patients with MUTYH germline variants.
  Appendix C Chapter 3
    C1. Coding mutation summary.
    C2. Somatic small mutations.
    C3. Somatic structural variants.
    C4. Monte Carlo simulation mutational signatures.
    C5. Monte Carlo simulation mutational signatures timing.
    C6. Features associated with radiation-induced tumours as described in Behjati et al. (2016).
    C7. CDKN2A transcripts coverage.
    C8. CDKN2A exon coverage.

List of Tables

Table 2.1 MUTYH pathogenic germline variants.
Table 2.2 MUTYH somatic status for patients with MUTYH pathogenic germline variants.
Table 3.1 RNA expression metrics of selected genes (diploid model).

List of Figures

Figure 1.1 High-level overview of Personalized OncoGenomics workflow.
Figure 1.2 Selected mechanisms by which germline or somatic variants lead to aberrant splicing.
Figure 2.1 Role of MUTYH in base excision repair pathway and relationship to Signature 18 from the Catalogue of Somatic Mutations In Cancer (COSMIC) enriched for C:G>A:T transversion mutations.
Figure 2.2 Pedigree of patient 1.
Figure 2.3 Pancreatic ductal adenocarcinoma histology.
Figure 2.4 Patient 1 compound heterozygous MUTYH germline variants.
Figure 2.6 Mutation catalogs and COSMIC mutational signatures of MUTYH germline or combined germline/somatic biallelic aberrations.
Figure 2.7 Germline MUTYH founder Asian splice site variant impact on splicing.
Figure 3.1 Histology and immunochemistry profile of poroid neoplasm.
Figure 3.2 CIRCOS plot illustrating the somatic copy number variants (CNV) observed in the porocarcinoma tumour (Krzywinski et al., 2009).
Figure 3.3 Transcriptome Spearman correlation across TCGA cancer types.
Figure 3.4 Mutational signatures – Monte-Carlo simulation.
Figure 3.5 Mutational signatures timing as described by McGranahan et al. (McGranahan et al., 2015).
Figure 3.6 Genomewide deletion distribution as described by Behjati et al. (Behjati et al., 2016).
Figure 3.7 CDKN2A splicing.
Figure 3.8 CDKN2A transcripts (green/blue) and expression overlay (top).
Figure 3.9 Exon-specific collapsed transcripts expression of CDKN2A transcripts NM_058195 (p14ARF) and NM_000077 (p16INK4a).
Figure 3.10 Large 46Mb chromosome 3 deletion (3:149653091-196530353, hg19) creating a RNF13-PAK2 gene fusion expressed in the transcriptome.
Figure 3.11 Large 45Mb chromosome 5 deletion (chr5:67564688-112859542, hg19) creating a PIK3R1-YTHDC2 gene fusion expressed in the transcriptome.

List of Supplementary Materials

Supplementary Table S1. MUTYH project – Patient 1 somatic CNV and RNA expression data.
Supplementary Table S2. MUTYH project – Patient 2 somatic CNV and RNA expression data.
Supplementary Table S3. MUTYH project – Patient 3 somatic CNV and RNA expression data.
Supplementary Table S4. Porocarcinoma project – Somatic CNV and RNA expression data.

List of Abbreviations

ASCNA     Allele-specific copy number alteration
BER       Base excision repair
bp        Base pair
CNV       Copy number variant
COSMIC    Catalogue Of Somatic Mutations In Cancer
CT        Computed tomography
DLOH      Deletion loss of heterozygosity
EP        Eccrine porocarcinoma
ESCA-SCC  Oesophageal squamous cell carcinoma
FC        Fold change
FDG-PET   Fluorodeoxyglucose (FDG) positron emission tomography (PET)
GOF       Gain-of-function
H&E       Hematoxylin & eosin
Indel     Insertion/deletion
kIQR      Inter-quartile range intervals away from the median
LOF       Loss of function
LOH       Loss of heterozygosity
MRI       Magnetic resonance imaging
OCT       Optimal cutting temperature
PDAC      Pancreatic ductal adenocarcinoma
POG       Personalized OncoGenomics
RPKM      Reads per kilobase per million mapped reads
SBS       Single base substitution
SNV       Single nucleotide variant
SV        Structural variant
TAP       Targeted transcriptome assembly
TCGA      The Cancer Genome Atlas
TSG       Tumour suppressor gene
WGS       Whole genome sequencing
WTS       Whole transcriptome sequencing

Acknowledgements

I would like to thank my supervisors, Dr Kasmintan Schrader and Dr Steven Jones, for their support, research and career guidance, and for the scientific and professional opportunities they provided to me. I am thankful to the patients and families enrolled in the Personalized OncoGenomics project and to the Personalized OncoGenomics project clinical and research team for making this work possible. I thank all the members of the Jones group, trainees and scientists, for always being available to answer my numerous questions.

I am thankful to the Clinician Investigator Program (CIP) at the University of British Columbia for their funding and giving me the opportunity to acquire the skills required for a clinician-scientist career.
Special thanks to Dr Sian Spacey for her useful words of advice to troubleshoot research challenges, and to Tessa Feuchuk for her prompt assistance in navigating the CIP and UBC training requirements as well as her kind words in times of need.

I thank my committee, Dr Tony Ng, Dr Inanc Birol, Dr Kasmintan Schrader and Dr Steven Jones, for their valuable feedback during my training.

I would like to thank my sister for her unconditional support and deeply rooted belief that I could achieve anything I put my mind to in life.

Dedication

Dedicated to my sister Katlyn Thibodeau

Chapter 1. Introduction

1.1 Background

1.1.1 Genomic sequencing technology and comprehensive genomic profiling

Scientific advances and dropping costs of genomic sequencing technologies have made large-scale genomic research projects possible and offer unprecedented opportunities for studying the relationships between genomic variation and cancer (Schadt et al. 2010; Bonetta 2010; Chang et al. 2011; Cancer Genome Atlas Research Network et al. 2013).

While large studies using panel and exome sequencing improved our understanding of cancer risk, cancer behaviour and targeted therapy opportunities at the population level (Yang et al. 2015; Johnson et al. 2014; Hirotsu et al. 2015), more comprehensive genomic and bioinformatic approaches are required to understand the complex biology of an individual tumour and to investigate the potential therapeutic implications for specific patients (Mandelker et al. 2017; Kim and Kim 2018). This is especially important for patients with advanced and metastatic cancer, who often present with drug resistance and have limited therapeutic options (Kurzrock and Giles 2015; Belin et al. 2017). Most large-scale studies and databases with comprehensive genomic profiling focus on primary cancers, and the tumour biology of advanced and metastatic cancers remains poorly understood (Gao et al.
2013; Cancer Genome Atlas Research Network et al. 2013; Grossman et al. 2016; Ma et al. 2018). Tumours are typically heterogeneous even when originating from the same primary site, and ultimately, each tumour has its own unique genomic landscape (Wang et al. 2013; Seoane and De Mattos-Arruda 2014; Mertens et al. 2015; Gonzalez-Perez et al. 2013). The usefulness of such comprehensive genomic profiling and bioinformatic analysis in advanced cancers has previously been illustrated in the Personalized OncoGenomics project initiative (see Figure 1.1) (Laskin et al. 2015).

Figure 1.1 High-level overview of Personalized OncoGenomics workflow. Whole genome sequencing (WGS) of tumour DNA with matched blood-derived DNA is performed. The germline blood DNA is only analyzed for 98 known cancer predisposition genes (Appendix A) and for subtracting the normal DNA background variation from tumour DNA in order to identify the somatic events unique to the cancer. Tumour DNA small mutations include single nucleotide variants (SNVs) and indels. Copy number variants (CNVs) include deletions and duplications of larger segments of DNA. Structural variants (SVs) include translocations and inversions, but also overlap with CNVs and indels. Mutational signatures include all SNVs, coding and noncoding, taking into consideration the 5’ and 3’ flanking nucleotides (trinucleotide context). Tumour whole transcriptome sequencing (WTS) is performed to assess gene expression, gene fusions and gene splicing.
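To make the trinucleotide context mentioned in the Figure 1.1 caption concrete, the following minimal sketch labels a single base substitution by its 5’ and 3’ flanking bases. It assumes the widely used convention of reporting each substitution from the pyrimidine (C or T) strand, which yields 96 possible channels. The function names `reverse_complement` and `sbs96_channel` are hypothetical and chosen for illustration; this is not the pipeline used in the Personalized OncoGenomics project.

```python
# Illustrative sketch only: label an SNV with its trinucleotide context,
# e.g. "A[C>A]G". Names are hypothetical, not taken from the thesis.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs96_channel(five_prime, ref, alt, three_prime):
    """Return the pyrimidine-centred channel label for one substitution."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    if ref in "AG":
        # Purine reference base: report the event on the opposite strand
        # so that the mutated base is always C or T.
        context = reverse_complement(five_prime + ref + three_prime)
        five_prime, ref, three_prime = context[0], context[1], context[2]
        alt = COMPLEMENT[alt]
    return f"{five_prime}[{ref}>{alt}]{three_prime}"


# A G>T change in a T-G-C context is the same event as C>A on the other strand.
label = sbs96_channel("T", "G", "T", "C")
```

Counting these labels across all somatic SNVs of a tumour gives a mutation catalog, which is the usual input to mutational signature modelling.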
1.1.2 Abnormal splicing

Whole transcriptome sequencing (WTS) has multiple applications in cancer genomics; WTS is often used as a surrogate predictor of gene expression (Peng et al. 2015) and for identifying oncogenic fusions (McPherson et al. 2011; Kim et al. 2013). Moreover, transcriptome data can be used for detecting and characterizing aberrant splicing (Trapnell et al. 2012; Chiu et al. 2018).

Aberrant splicing plays an important role in oncogenesis (Chen and Weiss 2015; Sveen et al. 2016). It can arise from somatic mutations disrupting or creating exonic and intronic splicing motifs or from mutations or dysregulation of genes involved in the splicing machinery (Climente-González et al. 2017; Jayasinghe et al. 2018; Seiler et al. 2018). Germline and somatic variants in cancer genes can lead to aberrant splicing by various mechanisms (Supek et al. 2014; Jung et al. 2015). Selected examples of mutation types causing aberrant splicing that have been characterized in the cancer literature are shown in Figure 1.2 below. Jayasinghe et al. found that 26% and 11% of somatic splice-site-creating mutations (1964 mutations in The Cancer Genome Atlas dataset) were previously misannotated as “missense” and “silent”, respectively (Jayasinghe et al. 2018). Several pathogenic germline splice-disrupting and splice-creating mutations in genes associated with hereditary cancer predisposition syndromes have been described in the literature, such as for Lynch (e.g. MLH1, MSH2, PMS2, MSH6) (Auclair et al. 2006; Etzler et al. 2008; Rhine et al. 2018), hereditary breast and ovarian cancer (e.g. BRCA1, BRCA2, PALB2) (Walker et al.
2010; Wappenschmidt et al. 2012; Yang et al. 2019), Li-Fraumeni (TP53) (Varley et al. 2001) and familial melanoma (BAP1) (Wadt et al. 2012) syndromes.

Figure 1.2 Selected mechanisms by which germline or somatic variants lead to aberrant splicing. Synonymous and missense variants distant from the canonical splice site can create a novel acceptor or donor splice site or enhance an existing cryptic acceptor or donor splice site. Such alternative splice sites compete with the canonical splice site and often lead to deletion of several amino acids (in-frame or frameshift). Similarly, missense variants in the vicinity of a canonical splice site can alter the strength of the splice acceptor or donor and lead to donor loss with intron retention, acceptor loss with exon skipping or acceptor loss with removal of several amino acids (in-frame or frameshift). Intronic single nucleotide variants or indels at the canonical splice donor (+1 bp and +2 bp) or splice acceptor (-1 bp and -2 bp) or in the vicinity of the canonical splice site cause loss of the splice site. Such mutations have been described to lead, respectively, to donor loss with intron retention, acceptor loss with exon skipping or acceptor loss with the removal of several amino acids (in-frame or frameshift). Red stars represent sites of potential germline or somatic variants.

1.1.3 Somatic mutational signatures

Mutational signatures are tumoural patterns of mutations arising from specific mutagenic processes such as exogenous (e.g. tobacco usage) and endogenous exposures, defective DNA repair mechanisms (e.g. homologous recombination) or DNA enzymatic editing (e.g. APOBEC enzymes) (Alexandrov et al. 2013b).
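As a rough illustration of how a tumour's mutation pattern can be compared with a reference signature, the sketch below computes the cosine similarity between two profiles over the six pyrimidine-centred substitution classes. Cosine similarity is a common way to compare mutation catalogs with reference signatures, but it is used here only as an assumption for illustration; it is not the signature modelling method of this thesis. The function name and all numbers are hypothetical.

```python
import math

# Six pyrimidine-centred substitution classes, in a fixed order.
CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two equally long, non-zero profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    dot = sum(x * y for x, y in zip(profile_a, profile_b))
    norm_a = math.sqrt(sum(x * x for x in profile_a))
    norm_b = math.sqrt(sum(y * y for y in profile_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("profiles must not be all zeros")
    return dot / (norm_a * norm_b)


# Hypothetical values for illustration only, not data from this thesis:
# a tumour catalog dominated by C>A changes and a C>A-rich reference profile.
tumour_counts = [620, 40, 180, 35, 90, 35]
reference_profile = [0.70, 0.03, 0.15, 0.03, 0.06, 0.03]
similarity = cosine_similarity(tumour_counts, reference_profile)
```

Because cosine similarity ignores overall scale, raw counts can be compared directly with a profile expressed as proportions. In practice the comparison is usually made over the 96 trinucleotide channels rather than six classes.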
Germline mutations in hereditary cancer predisposition genes have been associated with specific somatic signatures. Inherited BRCA1/BRCA2 pathogenic mutations are associated with homologous recombination deficiency (HRD) signatures in breast cancer (Nik-Zainal et al. 2012; Zhao et al. 2017). Germline variations at the APOBEC3A/APOBEC3B locus are associated with cytosine deamination signatures in breast and bladder cancers (Middlebrooks et al. 2016). More recently, germline and combined germline/somatic MUTYH aberrations have been associated with a base excision repair (BER) deficiency signature characterized by a somatic C:G>A:T transversion pattern in colorectal cancer, pancreatic neuroendocrine tumours and adrenocortical tumours (Viel et al. 2017; Scarpa et al. 2017; Pilati et al. 2017).

1.2 Clinical implications and importance

Investigation of the molecular determinants involved in advanced and metastatic cancers is a task of great complexity and current research suggests that detailed and in-depth analysis is required to expand knowledge on these complex oncological entities (Kurzrock and Giles 2015; Belin et al. 2017). More specifically, advancing our understanding of the role of both germline and somatic variation in cancer molecular profiles and the tumoural landscape constitutes an essential stepping stone towards precision medicine (Kim and Kim 2018; Waszak et al. 2017; Knijnenburg et al. 2015; Ma et al. 2018).

1.2.1 Challenges in determining the impact of genomic variation

Numerous public repositories contain genomic data on constitutional DNA population variation (e.g. 1000 Genomes (1000 Genomes Project Consortium et al. 2015), ExAC and gnomAD (Lek et al. 2016), dbSNP (Sherry et al. 2001)) and disease-causing variation (e.g. ClinVar (Landrum et al. 2016), LOVD (Fokkema et al. 2011)). Similarly, multiple online cancer datasets are publicly available through resources such as The Cancer Genome Atlas (TCGA) (Cancer Genome Atlas Research Network et al.
2013), the Catalogue Of Somatic Mutations In Cancer (COSMIC) (Forbes et al. 2017), cBioPortal (Cerami et al. 2012) and the International Cancer Genome Consortium (ICGC) (Zhang et al. 2011).  However, despite the recent development of large data public genomic databases and online tools (Tsang et al. 2017; Katsila et al. 2018), determining the functional impact of germline (benign versus pathogenic) (Moghadasi et al. 2016; Bland et al. 2018) and somatic (driver versus passenger) (Sukhai et al. 2016; Madhavan et al. 2018) variants in cancer genes relying solely on genomic features remains challenging. For any given rare variant, scientific evidence is typically lacking and the impact of such variants in oncogenesis often remains uncertain; the prohibitive number of germline and somatic variants makes a functional assessment of individual variants impractical (Richards et al. 2015; Li et al. 2017).   7  1.2.2 Combining genomic and bioinformatic approaches to understand variant impact Genomic sequence information is often insufficient to determine the impact of a variant on gene function (Moghadasi et al. 2016; Hoffman-Andrews 2017). Combining genomic annotations from both germline and somatic databases with combined genomic (genome and transcriptome sequencing) and bioinformatic (e.g. transcriptome assembly) approaches offers the opportunity for functional assessment of variants in cancer genes (Jones et al. 2010).   Considering the subset of genes previously associated to cancer processes such as the ones from the Catalogue Of Somatic Mutations In Cancer (COSMIC) cancer gene census (Forbes et al. 2017), several instances of recurrent hotspot mutations can be found in both germline variation databases and somatic databases (Huang et al. 2018). 
This crosstalk between germline and somatic contexts constitutes a reciprocal relationship that supports pathogenicity of such variants in hereditary cancer predisposition on one hand and in cancer pathogenesis on the other hand (Kalinsky et al. 2009; Rivière et al. 2012; Huang et al. 2018).

While variant classification of recurrent hotspot mutations in cancer genes as well as loss of function (LOF) mutations in tumour suppressor genes (TSGs) is mostly straightforward, determining the impact of synonymous, missense and splice site variants on function is challenging (Eggington et al. 2014). Gene-level transcriptome assembly can support variant pathogenicity by providing functional evidence of aberrant splicing associated with selected germline or somatic variants (Thibodeau et al. 2018; Chiu et al. 2018).

Beyond the detection of coding variants, tumour whole genome sequencing (WGS) is necessary to survey both driver and passenger mutations and leads to more accurate modelling of mutational signatures, which provides a better assessment of the underlying biological processes shaping the tumoural landscape (Zhao et al. 2017).

1.3 Hypothesis and research objectives

I hypothesize that using combined genomic approaches (e.g. genome, transcriptome, tumour and matched normal sequencing) with diversified bioinformatic methods (e.g. transcriptome assembly, mutational signature modelling) will offer novel insights into clinically unusual cases of cancer.

My thesis consists of two unrelated studies, namely MUTYH aberrations in relation to base excision repair signatures enriched for C:G>A:T transversion mutations (Chapter 2) and comprehensive profiling of an eccrine porocarcinoma tumour (Chapter 3).

In Chapter 2, I present a study of patients with diverse advanced cancers and MUTYH aberrations in relation to somatic base excision repair deficiency mutational signatures enriched for C:G>A:T transversion mutations (Thibodeau et al. 2019).
I describe in more detail three patients with germline or combined germline/somatic biallelic MUTYH mutations and tumours not classically seen in patients with MUTYH-associated polyposis (MAP) syndrome. We found that mutational signatures previously linked to MUTYH deficiency contributed to the mutational landscape of these three patients’ tumours.

In Chapter 3, I present the detailed genomic analysis of a rare tumour, the eccrine porocarcinoma, to illustrate the importance of comprehensive genomic profiling for understanding the cancer biology of a poorly characterized tumour (Thibodeau et al. 2018). In-depth assessment of somatic molecular determinants is crucial in characterizing uncommon and under-studied cancers (Boyd et al. 2016; Painter et al. 2017; Gatta et al. 2017).

Both studies presented in Chapter 2 and Chapter 3 illustrate the use of combined genomic and bioinformatic approaches to provide insight into the molecular oncogenesis of clinically unusual cancer cases.

Chapter 2: Role of germline and somatic MUTYH aberrations in base excision repair deficiency signatures

The content of this chapter has been published in the Cold Spring Harbor Molecular Case Studies journal (Thibodeau et al. 2019). The data used in this chapter were generated as part of the Personalized OncoGenomics project, which is an ongoing research initiative enrolling patients with advanced cancers for comprehensive genomic profiling of their tumour in order to identify opportunities for precision medicine. Patient enrollment, normal and tumour sampling and preparation, genomic sequencing, bioinformatic pipeline data processing and annotations were contributed by the Personalized OncoGenomics trial project at BC Cancer (NCT02155621).

I performed genomic and clinical data analysis, interpretation and integration, as well as the writing of the manuscript as the first author.
I also performed customized data annotations and analysis using bioinformatic tools such as transcriptome targeted assembly pipeline (TAP)/PAVFinder (Chiu et al. 2018) for splicing and signIT (Zhao et al. 2017) for mutational signatures modelling.  2.1 Introduction  Oxidative DNA damage leads to the formation of 8-Oxoguanine, which during DNA replication results in 8-OxoG:A mispairing and subsequent C:G>A:T transversion mutations (Cheadle and Sampson 2007; Boiteux et al. 2017). The MUTYH DNA glycosylase excises mismatched adenine from 8-OxoG:A complex (Banda et al. 2017). Colorectal tumours from patients with   11 germline biallelic MUTYH aberrations and MUTYH-associated polyposis (MAP) syndrome feature base excision repair (BER) deficiency mutational signatures enriched for C:G>A:T transversion mutations (Viel et al. 2017). Novel associations of this transversion signature have also been identified in a subset of germline MUTYH carriers with pancreatic neuroendocrine tumours and adrenocortical tumours displaying somatic inactivation of the second allele (Scarpa et al. 2017; Pilati et al. 2017). Please refer to Figure 2.1 for a high-level overview of the role of MUTYH in base excision repair and C:G>A:T transversion signatures.  Figure 2.1 Role of MUTYH in base excision repair pathway and relationship to Signature 18 from the Catalogue of Somatic Mutations In Cancer (COSMIC) enriched for C:G>A:T transversion mutations. a) Oxidative DNA damage leads to formation of 8-Oxoguanine (8-oxo-G). This damaged DNA can undergo short patch repair with DNA glycosylase OGG1 enzymatic excision of 8-Oxo-G, followed by APE1 removal of 3’ sugar phosphate at the apurinic/apyrimidinic site, POL ! insertion of a normal Guanine (facilitated by PCNA/RP-A) and finally ligation by DNA ligase II and XRCC1. If DNA replication takes place prior to short patch repair, long patch repair can be undertaken to   12 correct the 8-OxoG:A mispairing. 
MUTYH DNA glycosylase enzymatically removes the incorrectly inserted Adenine base, then APE1 removes the 3’ sugar phosphate at the apurinic/apyrimidinic site, POL " inserts a Cytosine (facilitated by PCNA/RP-A) on the opposite strand of the 8-oxo-G damage base, FEN1 trims the 5’ flap and DNA ligase I ligates the site. The long patch repair process restores the initial lesion (8-oxo-G:C), which can then undergo short patch repair. b) Defective MUTYH in cancer leads to mispairing of 8-Oxo-G:A and enrichment for transversion mutations (C:G>A:T). This pattern is predominant in the COSMIC Signature 18 described by Alexandrov et al (Alexandrov et al. 2013a). The signature 18 polar plot was created using signIT (Zhao et al. 2017). Please note that the description and schematic of panel a) are adapted from Boiteux et al (Boiteux et al. 2017) and Dizdaroglu et al (Dizdaroglu et al. 2017).  Panel b) is adapted from my previous original work, which is publicly available through the Mutational signatures Wikipedia page (https://en.wikipedia.org/wiki/Mutational_signatures).  I hypothesize that using comprehensive genomic and bioinformatic analyses methods provide insight into the role of germline and somatic MUTYH aberrations in C:G>A:T transversion signatures previously associated with defective MUTYH.  We report a case of early-onset pancreatic ductal adenocarcinoma in a patient harbouring biallelic MUTYH germline mutations. Their tumour featured somatic mutational signatures consistent with defective MUTYH-mediated base excision repair and the associated driver KRAS transversion mutation p.Gly12Cys. Analysis of an additional 730 advanced cancer cases (n=731) was undertaken to determine whether the mutational signatures were also present in tumours from germline MUTYH heterozygote carriers, or if instead, the signatures were only seen in those with biallelic loss of function. 
This review of our cohort revealed two female patients with breast cancer each carrying a single heterozygous pathogenic germline MUTYH variant associated with loss of heterozygosity in the tumour and the same somatic signatures. Therefore, we describe three cases with germline or combined germline/somatic biallelic MUTYH aberrations and C:G>A:T mutational signatures previously linked to defective MUTYH in tumours not classically associated with MAP syndrome. These novel associations illustrate that both biallelic and monoallelic MUTYH pathogenic germline variants can contribute to the mutational landscape of an individual’s cancer and should therefore be considered when elevated somatic C:G>A:T transversion signatures possibly suggestive of MUTYH deficiency (e.g. COSMIC Signature 18, SigProfiler SBS18/SBS36, SignatureAnalyzer SBS18/SBS36) are identified.

2.2 Key findings

We report biallelic germline mutations in MUTYH associated with early-onset pancreatic ductal adenocarcinoma. The tumour demonstrated somatic signatures consistent with MUTYH-mediated base excision repair deficiency and the associated KRAS p.Gly12Cys transversion mutation. Our results suggest that monoallelic inactivation of MUTYH is not sufficient for C:G>A:T transversion signatures previously linked to MUTYH deficiency to arise (n=9), but that biallelic complete loss of MUTYH function can cause such signatures to arise even in tumours not classically seen in MUTYH-associated polyposis (n=3). Although defective MUTYH is not the only determinant of these signatures, MUTYH germline variants may be present in a subset of patients with tumours demonstrating elevated somatic signatures possibly suggestive of MUTYH deficiency.

2.3 Methods

Patient written informed consent was obtained for the Personalized OncoGenomics trial at BC Cancer (NCT02155621), which was approved by the University of British Columbia/BC Cancer Research Ethics Board.
Tumour with matched normal blood whole genome sequencing (WGS) and tumour whole transcriptome sequencing was performed in 731 patients with advanced cancers of diverse origins (Appendix B.1). The average depth of coverage was 80X and 40X for the tumour and normal genome respectively. Library preparation, sequencing and bioinformatics analyses were performed according to previously published protocols (Thibodeau et al. 2018). For each patient, tumour and matched normal samples were analyzed together to identify somatic loss of heterozygosity (LOH) and copy number alteration (CNA) regions using CNAseq (v0.0.8) (Jones et al. 2010) and APOLLOH (v0.1.2) (Ha et al. 2012) tools, which both utilize a Hidden Markov model. The ensemble of LOH and CNA regions were compared to a set of theoretical ploidy models ranging from diploid to pentaploid and a set of theoretical tumour content varying by 10% intervals from the initial tumour content assessed on pathological examination. For example, if the tumour content was 44% on initial assessment, ploidy models (diploid, triploid, tetraploid, pentaploid) were assessed with theoretical tumour contents of 14%, 24%, 34%, 44%, 54%, 64%, 74%, 84% and 94%. This results in a total collection of 40 models. The collected copy states are compared to the values for theoretical copy states in each model. The model which results in the lowest error and model complexity is selected upon manual review. Mutational burden was reported in terms of coding nonsynonymous mutation count as well as total SNVs count (coding and noncoding) and total SNVs per Mb rate. The SNVs per Mb rate was calculated using an approximate hg19 genome assembly size of 3200 Mb. The structural   15 variant (SV) count (or SV mutational load) of a given case was compared to the cohort local database of structural variants at the time of analysis. 
A case enrolled earlier in the study is compared to a smaller cohort (smaller local database) than a case enrolled later in the study (larger local database). Somatic mutations called by Strelka (Saunders et al. 2012) were classified into 96 classes based on the variant base and its 3’ and 5’ context (trinucleotide). Using publicly available signature reference matrices from the Catalogue of Somatic Mutations In Cancer (COSMIC) (Alexandrov et al. 2013a), SigProfiler (Alexandrov et al. 2018) and SignatureAnalyzer (Haradhvala et al. 2018), the contribution of each signature was determined with the Bayesian R package signIT as previously described (Zhao et al. 2017).

To calculate the probability for the driver KRAS transversion mutation (C[C>A]A transversion leading to p.Gly12Cys) to be caused by signatures previously associated with MUTYH deficiency (COSMIC Signature 18, SigProfiler SBS18/SBS36 and SignatureAnalyzer SBS18/SBS36) in patient 1, Bayesian inference was used. For each signature, the posterior probability P(S|M) for the mutation to be caused by that given signature was calculated as follows:

$$P(S \mid M) = \frac{P(M \mid S) \times P(S)}{P(M)}$$

where P(M|S) is the prior probability of C[C>A]A in the reference signature matrix, P(S) is the proportion of mutations contributed by this signature and P(M) is the total number of C[C>A]A events.

2.4 Results

Genomic profiling of a pancreatic ductal adenocarcinoma (PDAC) case with germline biallelic compound heterozygous MUTYH variants revealed markedly elevated Catalogue of Somatic Mutations in Cancer (COSMIC) Signature 18. As COSMIC Signature 18 has previously been associated with defective MUTYH, these findings prompted review of our entire cohort of advanced cancers (n=731) for MUTYH status and mutational signatures.
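The posterior calculation described in the Methods (section 2.3) can be sketched in a few lines of Python. This is a minimal illustration and not the signIT implementation: the function name `signature_posteriors` and all numeric values are hypothetical, and P(M) is assumed here to be obtained by summing P(M|S) x P(S) over all signatures, so that the posteriors sum to 1.

```python
def signature_posteriors(likelihoods, exposures):
    """Return P(S|M) for each signature S, given one mutation class M.

    likelihoods: dict signature -> P(M|S), the probability of the mutation
                 class (e.g. C[C>A]A) in that reference signature.
    exposures:   dict signature -> P(S), the proportion of the tumour's
                 mutations attributed to that signature.
    """
    joint = {s: likelihoods[s] * exposures[s] for s in likelihoods}
    # Assumption: P(M) is the sum of the joint terms (law of total probability).
    p_m = sum(joint.values())
    if p_m == 0:
        raise ValueError("mutation class has zero probability under all signatures")
    return {s: joint[s] / p_m for s in joint}


# Hypothetical toy values, not taken from the thesis data.
likelihoods = {"Signature 18": 0.05, "Signature 1": 0.005, "Signature 5": 0.01}
exposures = {"Signature 18": 0.47, "Signature 1": 0.30, "Signature 5": 0.23}
posteriors = signature_posteriors(likelihoods, exposures)
```

With these toy inputs the signature that is both highly exposed and enriched for the mutation class dominates the posterior, which is the pattern the KRAS p.Gly12Cys analysis looks for.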
2.4.1 Biallelic germline MUTYH variants and pancreatic ductal adenocarcinoma 2.4.1.1 Clinical history We report a female (patient 1) of Chinese ancestry initially diagnosed with Stage IIB distal pancreatic ductal adenocarcinoma (PDAC) at the age of 45 years.   Of note, the patient had prior colonoscopy and gastroscopy at age 43 for epigastric pain, which reported three colonic tubular adenomas, one rectal tubular adenoma and one gastric fundic gland polyp. She was evaluated at the Hereditary Cancer Program for her early-onset PDAC and positive family history of gastrointestinal cancers. Her father was diagnosed with pancreatic cancer at the age of 79 years. Her brother was diagnosed with colorectal cancer on a reported background of 30 tubular adenomas at the age of 54 years and her sister reported multiple polyps at the age of 58 years (Figure 2.2 for pedigree).   17  Figure 2.2 Pedigree of patient 1. Patient 1 was diagnosed with early-onset pancreatic ductal adenocarcinoma at the age of 45. Her father was diagnosed with pancreatic cancer at the age of 79 years and her brother diagnosed with colorectal cancer at the age of 54 years. Although segregation is not confirmed in the family, the family history is consistent with MUTYH-Associated Polyposis (MAP) syndrome in consideration of the patient’s brother reportedly with 30 tubular adenomas noted in addition to his colon cancer. The patient had 4 colorectal polyps. The patient’s sister had 3-5 polyps on colonoscopy at age 58. The patient’s uncle was reported with oesophageal cancer at age 64 (deceased at age 65) and one of his four children (not shown) reported with lung cancer at age 55 years (deceased at age 55).  Distal pancreatectomy and partial splenectomy revealed a 6 cm moderately differentiated ductal adenocarcinoma within the pancreatic tail (Figure 2.3). Margins were clear, but there was lymphovascular and perineural invasion as well as one of four lymph nodes involved. 
There was no evidence of intratumoural lymphocytes. The final pathologic stage was pT2N1.

[Figure 2.2 pedigree labels: d. 68y; Stomach Ca (68y); d. 70y (stroke); d. 80y; PDAC (79y); d. 83y (stroke); Renal cysts; Renal cysts; CRC (54y); Multiple polyps; Multiple polyps; PDAC (45y); d. 48y; d. 65y; Esophageal Ca (64y).]

Figure 2.3 Pancreatic ductal adenocarcinoma histology. Hematoxylin & eosin (H&E) stained section of moderately differentiated pancreatic ductal adenocarcinoma within pancreatic tail in patient 1. Specimen obtained from distal pancreatectomy and partial splenectomy surgical procedure. Typical histology with prominent peritumoural fibrosis and no evidence for intratumoural lymphocytes (100X magnification).

The patient’s primary tumour was assessed for DNA mismatch repair protein expression by immunohistochemistry and intact staining was noted in the MLH1, MSH2, MSH6 and PMS2 proteins (data not shown).

She received adjuvant chemotherapy with 5-fluorouracil, oxaliplatin and irinotecan on a clinical trial and completed all 12 planned cycles. Post-treatment imaging and CA19-9 showed no evidence of disease recurrence. Three months after completing adjuvant chemotherapy, routine surveillance CT detected signs of local recurrence. Patient 1 received radiotherapy to the pancreatic remnant, 50 Gy in 25 fractions, with concurrent capecitabine. Restaging FDG-PET performed one month after completion of therapy revealed multiple liver metastases. The patient consented to participate in the Personalized OncoGenomics study and underwent ultrasound-guided liver biopsy, which confirmed the diagnosis of metastatic pancreatic ductal adenocarcinoma.

She commenced first-line palliative chemotherapy with gemcitabine and nab-paclitaxel and had an excellent radiological partial response and biochemical response after 6 cycles of treatment. Six months later, she was switched to nivolumab for progression, but she did not respond to immunotherapy.
The patient unfortunately passed away from complications of her metastatic disease at age 48.

2.4.1.2 Germline analysis

Clinical germline genetic testing identified compound heterozygous pathogenic variants in MUTYH (c.996G>A, p.Ser332Ser and c.815G>A, p.Gly272Glu; NM_001048171). Blood DNA whole genome sequencing with analysis of 98 cancer predisposition genes (Appendix A) also identified the MUTYH variants (Table 2.1), and no other pathogenic variants. For these germline MUTYH carriers, selected somatic MUTYH features are reported in Table 2.2 and Appendix B.2.

Table 2.1 MUTYH pathogenic germline variants. Variant annotations with GRCh37 (hg19) genomic positions, MUTYH transcript NM_001048171 positions and MUTYH protein transcript NP_001041636 positions are provided. The Cancer Genome Atlas (TCGA) MUTYH expression percentile is provided in comparison to the TCGA average for all cancers. ACC, Adrenocortical Carcinoma; ATRT, Atypical Teratoid/Rhabdoid Tumor; IDC, Breast Invasive Ductal Carcinoma; LUAD, Lung Adenocarcinoma; NA, not available; PAAD, Pancreatic Adenocarcinoma; RCC, renal cell carcinoma; TCGA, The Cancer Genome Atlas; ULMS, Uterine Leiomyosarcoma.
Column groups: General (Id, Tumour type); Germline (all other columns).

Id | Tumour type | Chromosome and genomic position | HGVS DNA Reference | HGVS Protein Reference | Variant type | dbSNP
#1 | PAAD | chr1:45797481C>T | c.996G>A | p.Ser332Ser | Synonymous | rs372673338
#1 | PAAD | chr1:45797914C>T | c.815G>A | p.Gly272Glu | Missense | rs730881833
#2 | IDC | chr1:45797760T>C | c.892-2A>G | | Splice acceptor | rs77542170
#3 | IDC | chr1:45797228C>T | c.1145G>A | p.Gly382Asp | Missense | rs36053993
#4 | ACC | chr1:45797228C>T | c.1145G>A | p.Gly382Asp | Missense | rs36053993
#5 | ATRT | chr1:45798475T>C | c.494A>G | p.Tyr165Cys | Missense | rs34612342
#6 | LUAD | chr1:45797760T>C | c.892-2A>G | | Splice acceptor | rs77542170
#7 | RCC | chr1:45798475T>C | c.494A>G | p.Tyr165Cys | Missense | rs34612342
#8 | IDC | chr1:45798475T>C | c.494A>G | p.Tyr165Cys | Missense | rs34612342
#9 | ULMS | chr1:45797760T>C | c.892-2A>G | | Splice acceptor | rs77542170
#10 | PAAD | chr1:45797760T>C | c.892-2A>G | | Splice acceptor | rs77542170
#11 | PAAD | chr1:45797760T>C | c.892-2A>G | | Splice acceptor | rs77542170
#12 | LUAD | chr1:45797760T>C | c.892-2A>G | | Splice acceptor | rs77542170

Table 2.2 MUTYH somatic status for patients with MUTYH pathogenic germline variants. COSMIC Signature 18 exposures are shown in this table. Please refer to Appendices for complete mutational signatures profiles using reference matrices from COSMIC, SigProfiler and SignatureAnalyzer for patient #1 (Appendices B.7, B.8 and B.9), patient #2 (Appendices B.17, B.18 and B.19) and patient #3 (Appendices B.23, B.24 and B.25). ACC, Adrenocortical Carcinoma; ATRT, Atypical Teratoid/Rhabdoid Tumor; IDC, Breast Invasive Ductal Carcinoma; LUAD, Lung Adenocarcinoma; NA, not available; PAAD, Pancreatic Adenocarcinoma; RCC, renal cell carcinoma; TCGA, The Cancer Genome Atlas; ULMS, Uterine Leiomyosarcoma.
Column groups: Germline (Id, HGVS DNA Reference); Somatic (all other columns).

Id | HGVS DNA Reference | Copy category | Zygosity of germline variant in tumour | Germline variant DNA ALT/total | Germline variant RNA ALT/total | TCGA percentile | Sig18 exposure | Sig18 proportion
1 | c.996G>A | Neutral | heterozygous | 26/53 | 2/15 | 16 | 5683 | 0.4702
1 | c.815G>A | Neutral | heterozygous | 18/45 | 20/29 | 16 | 5683 | 0.4702
2 | c.892-2A>G | Loss | homozygous | 33/49 | 80/121 | 56 | 830 | 0.1668
3 | c.1145G>A | Loss | homozygous | 49/55 | 113/125 | 20 | 1735 | 0.0857
4 | c.1145G>A | Neutral | heterozygous | 22/51 | NA | NA | 308 | 0.2081
5 | c.494A>G | Neutral | heterozygous | biop2: 33/74; biop1: 27/68 | biop2: 41/75; biop1: 52/101 | 23 | 76 | 0.0536
6 | c.892-2A>G | Neutral | heterozygous | 37/67 | 9/28 | 18 | 50 | 0.0059
7 | c.494A>G | Gain | heterozygous | 37/76 | 59/126 | 22 | 20 | 0.0047
8 | c.494A>G | Gain | heterozygous | 62/90 | 91/120 | 56 | 21 | 0.0029
9 | c.892-2A>G | Neutral | heterozygous | 50/101 | 84/221 | 69 | 46 | 0.0056
10 | c.892-2A>G | Neutral | heterozygous | 45/95 | 72/231 | 72 | 23 | 0.0031
11 | c.892-2A>G | Loss | heterozygous | 19/44 | 13/50 | 37 | 18 | 0.0011
12 | c.892-2A>G | Neutral | heterozygous | 19/44 | 36/97 | 58 | 48 | 0.0028

Somatic transcriptome data showed the synonymous variant (p.Ser332Ser) to cause abnormal splicing and that the pathogenic variants are in trans (Figure 2.4, red box).

Figure 2.4 Patient 1 compound heterozygous MUTYH germline variants. (A) Integrative Genomics Viewer (IGV) (Thorvaldsdóttir et al. 2013) capture of whole genome sequencing of tumour and matched blood DNA with whole transcriptome sequencing of tumour RNA data at the genomic region encompassing the two MUTYH variants (c.996G>A, p.Ser332Ser and c.815G>A, p.Gly272Glu; NM_001048171). Paired-end transcriptome reads showing the germline variants to be in trans with read pairs containing the p.Gly272Glu variant (exon 10) but not the p.Ser332Ser variant (exon 12) (red box). The splicing aberration caused by p.Ser332Ser variant was not visually apparent in IGV and required transcriptome Targeted Assembly Pipeline (TAP) for detection and characterization. TAP was used to assemble reads into contigs.
Contig 1 shows aberrant splicing and contig 2 shows normal splicing. (B) Schematic of TAP analysis results showing the synonymous germline variant (c.996G>A, p.Ser332Ser) creating a novel canonical AG acceptor splice site at chr1:45797480-45797481 and removing 42 bp (14 amino acids).  2.4.1.2 Somatic analysis A tetraploid model (4 copies) with an estimated tumour content of 51% was used for the PDAC. Tumour genomic features such as single nucleotide variants (SNVs), indels, structural variants (SVs), copy number variants (CNVs) and mutational signatures were analyzed in conjunction with gene expression from transcriptome data. In comparison to The Cancer Genome Atlas (TCGA) dataset, the overall coding mutational burden was moderate with 88 coding SNVs (73rd percentile for all TCGA cancers and 76th percentile for TCGA pancreatic cancers) and 6 coding indels (73rd percentile for all TCGA cancers and 1st percentile for TCGA pancreatic cancers). The total number of somatic SNVs (coding and noncoding) for patient 1 was 12087, or 3.78 SNVs per Mb. There were 86 SVs (60th percentile amongst our local database of 626 cancers of diverse origins). In regards to patient 1 somatic data, please refer to Appendix B.3 for the coding mutation summary, Appendix B.4 for the small mutations, Supplementary Table S1 for the CNV and RNA expression data and Appendix B.5 for the structural variants.  As mentioned previously, tumour transcriptome data showed the germline variants to be in trans and expressed (Figure 2.4A). Targeted Assembly Pipeline (TAP) (Chiu et al. 2018) on the transcriptome showed the synonymous germline variant (c.996G>A, p.Ser332Ser) to create a novel canonical AG acceptor splice site removing 42 bp and 14 amino acids (Figure 2.4B, Appendix B.6). Due to the creation of this novel acceptor splice site and subsequent splicing out   24 of the c.996G>A variant, the variant was only present in 2 out of 15 RNA-seq reads spanning the chr1:45797481 genomic site. 
The second pathogenic variant (c.815G>A, p.Gly272Glu) was present in 20 out of 29 RNA-seq reads covering the chr1:45797914 genomic site. The novel exon junction at chr1:45797479 was supported by 12 reads while the canonical junction at chr1:45797521 was supported by 19 reads (Appendix B.6).   The PDAC tumour featured the highest Signature 18 and SBS36 signatures of our cohort using three different published signatures references matrices (COSMIC, SigProfiler, SignatureAnalyzer) with a proportion of mutations contributed of 41-48% (4962-5769 SNVs) (Figure 2.5, Figure 2.6, Appendices B.7, B.8 and B.9). KRAS increased expression (94th percentile compared to the TCGA average) was associated with copy gains (+3 copies) and the driver KRAS p.Gly12Cys (c.34G>T, NM_033360) transversion mutation (3/6 copies). Given the predicted mutational signatures probability from each reference matrix, the Bayesian probability for the KRAS transversion mutation to be caused by the transversion signatures previously associated with MUTYH deficiency is 84% for Signature 18 (COSMIC, Appendix B.10), 75% for combined SBS18/SBS36 signatures (SigProfiler, Appendix B.11) and 62% for combined SBS18/SBS36 signatures (SignatureAnalyzer, Appendix B.12). These probabilities strongly suggest that the oncogenic KRAS p.Gly12Cys mutation was caused by signatures previously associated with defective MUTYH.   25   Figure 2.5 Comparison of mutation signatures exposures of MUTYH germline or combined germline/somatic biallelic aberrations against 731 cancer genomes of mixed origins. The cohort distribution of mutation signature exposures for each signature is shown using signature composition reference matrices from COSMIC, SigProfiler and SignatureAnalyzer. The mutation fraction exposures (proportion of all mutations contributed by each signature) of patient 1, patient 2 and patient 3 are superimposed. 
Signatures previously associated with MUTYH-mediated base excision repair (BER) deficiency are highlighted by a rectangle. Patient 1 (pancreatic ductal adenocarcinoma, pink diamond), patient 2 (breast cancer, green diamond) and patient 3 (breast cancer, blue diamond) demonstrated germline or combined germline/somatic biallelic MUTYH aberrations. Functional biallelic MUTYH loss of function was present in tumours of patient 1, patient 2 and patient 3 and all three patients displayed elevated outlier (>1.5 interquartile range above the third quartile) signatures previously associated with defective MUTYH (COSMIC Signature 18, SigProfiler SBS18 and SBS36, SignatureAnalyzer SBS18 and SBS36).

Figure 2.6 Mutation catalogs and COSMIC mutational signatures of MUTYH germline or combined germline/somatic biallelic aberrations. Complete catalogs of single nucleotide variants from whole-genome sequencing of tumours were classified based on variant and 3’/5’ contexts into 96 categories (left side) and mutation catalogs were used to calculate the contribution of each signature to mutational burden (right side). The proportion of mutations or mutation count in each of the 96 categories is shown here as a barplot (left side), while the signIT Markov Chain Monte Carlo simulation result for COSMIC (Catalogue of Somatic Mutations in Cancer) mutational signatures is shown as a violin plot distribution (right side). COSMIC Signature 18 composition is shown at the top for reference. Patient 1 (germline MUTYH c.815G>A, p.Gly272Glu; c.996G>A, p.Ser332Ser; NM_001048171) and patient 2 (germline MUTYH c.892-2A>G) tumours showed increased Signature 18 somatic mutation burden, as exemplified by the resemblance of their mutation catalogs to COSMIC reference Signature 18. Other mutational processes contributed most to the mutational burden of patient 3’s tumour.

The tumour harbored several somatic events that have previously been associated with PDAC (Bailey et al. 2016).
A 10 Mb deletion at 9p led to homozygous copy loss of tumour suppressor genes (TSGs) CDKN2A, CDKN2B, MTAP and a 51 Mb deletion at 18q led to heterozygous copy loss of TSGs SMAD2 and SMAD4. In addition to the gain of function (GOF) KRAS mutation (p.Gly12Cys), somatic small mutations of interest included TP53 frameshift indel with a deletion copy loss leading to biallelic loss of function, as well as heterozygous frameshift indels in two epigenetic TSGs: ARID1A and SETD2. Including subclonal variants, 53/95 (56%) of moderate to high impact variants in protein-coding genes were due to transversion events (Appendix B.4). Fusions of potential biological relevance included a chr12 duplication leading to an inversion-fusion of GPRC5A and CCDC91 associated with a copy gain (+2 copies) and elevated GPRC5A expression (96th percentile compared to the TCGA average). Base excision repair (BER) pathway genes OGG1 and NUDT1 had low percentile expression (3rd percentile for both) and OGG1 had a copy loss (-1 copy). Please refer to Supplementary Table S1 for additional somatic CNV and gene expression data.     28 2.4.2 Germline MUTYH carriers with somatic second hits in breast cancer Analysis of an additional 730 advanced cancer cases (total=731) was undertaken to determine whether the mutational signatures seen in patient 1’s PDAC were also present in tumours from germline MUTYH heterozygote carriers, or if instead, the signatures were only seen in the case of biallelic loss of function.   Using reference matrices from COSMIC, SigProfiler and SignatureAnalyzer, somatic mutational signatures were analyzed in terms of mutation counts (number of SNVs) and proportions (fraction between 0 and 1) contributed to the overall mutational burden. Mutation counts (mean exposures) and mutation fractions (proportions contributed by each signature) higher than 1.5 interquartile range above the third quartile were considered outliers. 
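The outlier rule stated above (values more than 1.5 interquartile ranges above the third quartile) can be sketched as follows. This is an illustrative sketch only: the helper name `upper_outliers` and the cohort values are hypothetical, and the quartile convention (Python's "inclusive" interpolation) is an assumption, since the text does not state which quartile method was used.

```python
import statistics


def upper_outliers(values):
    """Return the values lying more than 1.5 x IQR above the third quartile."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)
    return [v for v in values if v > fence]


# Hypothetical signature proportions for a small cohort (not thesis data).
cohort = [0.01, 0.02, 0.03, 0.04, 0.05, 0.47]
flagged = upper_outliers(cohort)
```

The same function applies unchanged to mutation counts (mean exposures), since the rule depends only on the quartiles of whichever quantity is being compared across the cohort.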
Amongst germline MUTYH carriers (n=12, Table 1), all three cases of biallelic MUTYH aberrations (patient 1, patient 2, patient 3) were outliers for the proportion of mutations contributed by transversion signatures previously associated with defective MUTYH (Figure 2.5, Figure 2.6).

2.4.2.1 Patient 2 overview

Patient 2 is a female patient of North-East Asian ancestry diagnosed with metastatic invasive ductal breast carcinoma (4x5cm mass on ultrasound, Nottingham grade 2/3, Estrogen Receptor 8/8, Progesterone Receptor 7/8 and HER2 negative) at the age of 32 years. Two bone lesions at L5 and T10 were seen on MRI and PET imaging. She was treated with doxorubicin and cyclophosphamide for 6 cycles, followed by mastectomy and axillary node resection. Family history was negative for MAP-associated cancers. Patient 2 was found to carry a single germline MUTYH splice site variant (c.892-2A>G; NM_001048171). No other pathogenic variants were identified on clinical multigene panel testing or analysis of 97 additional cancer susceptibility genes by WGS. Their tumour featured a somatic copy loss of MUTYH (triploid model, tumour content 38%), resulting in the splice site variant being homozygous in the tumour.

The tumour had an unremarkable coding mutational burden with 24 nonsynonymous coding SNVs (29th percentile amongst all TCGA cancers and 54th percentile amongst TCGA breast cancer dataset), 1 coding indel (24th percentile amongst all TCGA cancers and 28th percentile amongst TCGA breast cancer dataset) and 116 SVs (73rd percentile amongst our local database of 583 cancer cases). The total number of somatic SNVs (coding and noncoding) for patient 2 was 4979, or 1.52 SNVs per Mb. For patient 2 somatic data, please refer to Appendix B.13 for the coding mutation summary, Appendix B.14 for small mutations, Supplementary Table S2 for CNV and RNA expression data and Appendix B.15 for structural variants.

Targeted Assembly Pipeline (TAP) (Chiu et al. 
2018) on the transcriptome showed two mechanisms of abnormal splicing associated with this germline variant. The first mechanism leads to a 9 bp deletion due to the canonical splice site acceptor disruption and selection of a downstream canonical AG acceptor splice site at chr1:45797750-45797751 (new exon start at chr1:45797749). The second mechanism leads to marked intron 10 and intron 11 retention (Figure 2.7, Appendix B.16).

Figure 2.7 Germline MUTYH Asian founder splice site variant impact on splicing. The Asian founder MUTYH mutation disrupting the canonical acceptor splice site at exon 11 (c.892-2A>G) was found in 6 patients (patient 2, patient 6, patient 9, patient 10, patient 11, patient 12). (A) Integrative Genomics Viewer (IGV) (Thorvaldsdóttir et al. 2013) capture of whole genome sequencing of tumour and matched blood DNA with whole transcriptome sequencing of tumour RNA data at the genomic region encompassing the MUTYH variant (c.892-2A>G; NM_001048171). Transcriptome data show abnormal splicing removing 9 bp from exon 11 (19 reads with a black line) and marked intron 10 and intron 11 retention. (B) Schematic of abnormal splicing removing 9 bp from exon 11. (C) Schematic of abnormal splicing resulting in intron retention.

Despite the overall low mutation counts (4979 SNVs), the contributions of COSMIC Signature 18 (Appendix B.17), SigProfiler SBS36 (Appendix B.18) and SignatureAnalyzer SBS36 (Appendix B.19) signatures were elevated, suggesting that some degree of MUTYH-mediated BER deficiency could be present in the tumour due to the combined germline and somatic biallelic MUTYH aberrations. COSMIC Signature 1 (1203 SNVs or 24.2% of all SNVs), associated with age, contributed most to overall mutational burden, followed by Signature 18 (830 SNVs or 16.7% of all SNVs) and non-specific Signature 5 (757 SNVs or 15.2% of all SNVs) (Figure 2.5, Figure 2.6, Appendices B.17, B.18 and B.19).   
Patient 2 had a concurrent germline CHEK2 variant of uncertain significance (VUS) (NM_007194, c.542G>A; p.Arg181His; chr22:29121015C>T, hg19; rs121908701). Although the CHEK2 variant was found homozygous in the tumour by reason of deletion LOH, there was no significant contribution of Signature 3, associated with homologous recombination deficiency, to the overall mutational burden.

Somatic genomic features of interest included PIK3CA p.His1047Arg (heterozygous, 2/3 copies) GOF mutation, ADGRA2 amplification (+9 copies), AKT1 copy gain (+1 copy) and AKT3 amplification (+5 copies). Additional CNVs of potential clinical relevance included amplification of MDM4 (+5 copies) and single copy gains of MDM2, CDK4, CDK6 and AURKA. BER gene OGG1 had low expression (2nd percentile), but both OGG1 and NUDT1 were copy neutral. Please refer to Supplementary Table S2 for additional somatic CNV and gene expression data.

2.4.2.2 Patient 3 overview

Patient 3 is a female patient of European ancestry diagnosed at the age of 69 years with invasive left ductal breast carcinoma pT2N1aMX grade 3 (Estrogen Receptor/Progesterone Receptor-positive, HER2 negative). She was initially treated with partial left mastectomy and left axillary lymph node dissection followed by standard care adjuvant therapy doxorubicin, cyclophosphamide and paclitaxel, but paclitaxel was replaced by docetaxel following a severe allergic reaction to paclitaxel. This regimen was followed by adjuvant letrozole and radiation therapy (16 cycles). In addition to the pathogenic germline MUTYH variant (c.1145G>A, p.Gly382Asp), this patient also carries a heterozygous pathogenic germline CHEK2 mutation (NM_007194:c.1100delC, p.Thr367Metfs, rs555607708). Family history revealed a maternal aunt with premenopausal breast cancer at the age of 30 years who underwent bilateral mastectomy.   
The tumour had an elevated coding mutational burden with 92 nonsynonymous coding SNVs (74th percentile amongst all TCGA cancers and 93rd percentile amongst TCGA breast cancer dataset), 4 coding indels (59th percentile amongst all TCGA cancers and 69th percentile amongst TCGA breast cancer dataset) and 248 SVs (93rd percentile amongst our local database of 626 cancer cases). The total number of somatic SNVs (coding and noncoding) for patient 3 was 20237, or 6.32 SNVs per Mb. For patient 3 somatic data, please refer to Appendix B.20 for the coding mutation summary, to Appendix B.21 for the small mutations, to Supplementary Table S3 for the CNV and RNA expression data and to Appendix B.22 for the structural variants.

The tumour had a somatic MUTYH copy loss (triploid model, tumour content 70%) resulting in the germline variant being homozygous in the tumour. Although COSMIC Signature 18 had a significant contribution (1735 or 8.6% of all SNVs), Signature 2 (6621 or 32.7% of all SNVs) and Signature 13 (3512 or 17.4% of all SNVs), associated with the APOBEC family of cytidine deaminases, were the main contributors to overall mutational burden (Table 2.1, Figure 2.5, Figure 2.6, Appendices B.23, B.24 and B.25). Deletion copy loss (-1 copy) of BRCA1 and BRCA2 was observed in the tumour. Furthermore, the germline CHEK2 c.1100delC heterozygous variant was homozygous in the tumour by reason of a somatic deletion loss of heterozygosity (LOH) event. However, Signature 3, associated with homologous recombination deficiency, only very modestly contributed to the overall mutational burden (304 or 1.5% of all SNVs).

Other somatic events of interest included a PIK3CA copy gain (+2 copies) with the p.His1047Arg GOF mutation (heterozygous, 4/5 copies), FGFR2 amplification (+58 copies), CCND1 amplification (+19 copies) and AKT3 amplification (+4 copies). 
Additional CNVs of potential clinical relevance included amplifications of MDM4 (+8 copies), MDM2 (+4 copies), CDK2/CDK4 (+3 copies) and AURKA (+3 copies). BER pathway genes NUDT1 and OGG1 had low percentile expression (respectively 2nd and 15th percentile) and NUDT1 had copy neutral LOH while OGG1 had a copy loss (-1 copy). Please refer to Supplementary Table S3 for additional somatic CNV and gene expression data.

2.4.2.3 Patient 4 overview

Patient 4 (Table 2.1) is a female of Asian ancestry who was initially diagnosed at the age of 41 years with triple negative breast cancer. Left mastectomy revealed 2 foci of tumour (15 mm and 5 mm) and 1/5 sentinel node positive (staging pT1cN1a). She was treated with 4 cycles of doxorubicin and cyclophosphamide, then 4 cycles of paclitaxel followed by radiation therapy to the breast and axillary nodes. Family history was negative for MAP cancers. Patient 4 was also found to be a germline carrier of the Asian founder MUTYH splice site variant (c.892-2A>G). A pentaploid model (5 copies) with estimated tumour content of 29% was used. While their tumour also had one somatic MUTYH copy loss, the germline variant remained heterozygous in the tumour (2/4 copies). Their tumour did not display outlier signatures previously associated with defective MUTYH (data not shown).

2.4.2.4 MUTYH Asian founder splice site variant

The Asian founder MUTYH splice site mutation c.892-2A>G (rs77542170) (Taki et al. 2016) was identified in 6 out of 731 patients (minor allele frequency 0.0041, or 0.41%). According to the gnomAD database, this splice site variant has a minor allele frequency of 0.11% (0.0011) in the general population and 1.5% (0.015) in the East Asian population (Lek et al. 2016). Given the frequency of this variant in our cohort, we conducted principal component analysis of single nucleotide polymorphisms using the EthSeq R package (Romanel et al. 
2017) to show that all patients carrying this pathogenic germline variant were of East Asian ancestry (Appendix B.26). The ethnic composition of our cohort explains the large number of carriers identified with the MUTYH c.892-2A>G variant. Transcriptome targeted assembly pipeline (TAP) analysis of the tumours from patients carrying the c.892-2A>G splice site mutation consistently identified the same two abnormal splicing mechanisms described in Figure 2.7 (transcriptome assembly data for the other MUTYH carriers not shown).

2.5 Discussion

We observed mutational signatures featuring a strong C:G>A:T transversion phenotype that have previously been associated with defective MUTYH in tumours from patients with biallelic germline or combined germline and somatic MUTYH loss of function (n=3). Such signatures, collectively referred to as SBS18/SBS36 signatures, include COSMIC Signature 18, SigProfiler SBS18/SBS36 and SignatureAnalyzer SBS18/SBS36 (Viel et al. 2017; Pilati et al. 2017; Scarpa et al. 2017; Alexandrov et al. 2018). The study of patient 1 (germline compound heterozygote) provides evidence for the contribution of aberrant MUTYH function to the PDAC genomic landscape. In keeping with previously published reports in the Asian population, Zhou et al. reported a high frequency of KRAS p.Gly12Cys transversion mutations in pancreatic cancer (80/126, 63%) (Zhou et al. 2016; Kairupan et al. 2005; Win et al. 2011; Fokkema et al. 2011). The KRAS p.Gly12Cys variant is seen in 2% of colorectal cancers, and of those, 25% had germline biallelic pathogenic MUTYH variants (Jones et al. 2004; Aimé et al. 2015). As activated KRAS is a frequent and early driver of PDAC, germline mutations in MUTYH have been hypothesized to be associated with development of pancreatic adenocarcinoma on this basis, although this had not been demonstrated prior to our study (Smith et al. 2009). 
Given the predominant contribution of SBS18/SBS36 signatures to the PDAC mutational landscape in the setting of germline biallelic MUTYH mutations (patient 1), the oncogenic driver KRAS transversion was likely caused by MUTYH deficiency. A Bayesian probability approach consistently ranked the transversion signatures previously associated with germline or combined germline/somatic MUTYH impairment [COSMIC Signature 18 (Pilati et al. 2017; Scarpa et al. 2017), SigProfiler SBS36 (Alexandrov et al. 2018) and SignatureAnalyzer SBS36 (Alexandrov et al. 2018)] at the top (Appendices B.10, B.11 and B.12).

In all the MUTYH mutation carriers (n=9) where a wild type copy was retained in the tumour, signatures previously associated with defective MUTYH (SBS18/SBS36) were not predominant or were inconsistent between methods (COSMIC, SigProfiler, SignatureAnalyzer; data not shown). These results suggest that MUTYH is haplosufficient with respect to MUTYH function in the monoallelic state, but that biallelic complete loss of MUTYH function can cause SBS18/SBS36 signatures to arise even in tumours not classically associated with MAP syndrome (Table 2.1).

We identified strong evidence for MUTYH-mediated BER deficiency contributing to oncogenesis in our PDAC case with germline biallelic MUTYH aberrations. Patient 2 and patient 3 both carry a heterozygous germline MUTYH pathogenic variant (c.892-2A>G and c.1145G>A (p.Gly382Asp), respectively) and their breast tumours displayed a somatic MUTYH copy loss causing the germline variant to become functionally homozygous (2/2 remaining copies, triploid model). While APOBEC signatures were the main contributors to patient 3’s mutational landscape, signatures associated with age contributed most to patient 2’s somatic landscape. 
Although the contribution of the SBS18/SBS36 signatures remained modest in the two breast cancer cases, accounting for 16.7-19.8% and 8.9-10.3% of all mutations in patient 2 and patient 3, respectively, the rarity of these characteristic signatures in the setting of combined germline and somatic biallelic MUTYH inactivation suggests that MUTYH deficiency is likely the main determinant of C:G>A:T transversion mutations in these tumours. There is debate in the literature as to whether or not inactivation of MUTYH predisposes to breast cancer (Wasielewski et al. 2010; Boesaard et al. 2014). Although evidence of somatic SBS18/SBS36 signatures in two breast cancer cases does not resolve this question, our findings suggest that germline MUTYH heterozygous mutations can occasionally contribute to somatic mutational evolution in the presence of secondary somatic MUTYH loss of function. Even though other known germline cancer predisposition variants such as the CHEK2 c.1100delC in patient 3 are stronger determinants of lifetime breast cancer risk, a minor influence of germline MUTYH pathogenic variant carrier status cannot be excluded.

MUTYH is one of two BER DNA glycosylase genes that have been linked to recessive familial adenomatous polyposis syndromes and specific somatic mutational signatures. NTHL1 is associated with a C:G>T:A mutational phenotype (Weren et al. 2015). MUTYH is associated with the 8-OxoG BER pathway and a C:G>A:T transversion phenotype (Viel et al. 2017). The 8-OxoG BER pathway is redundant and therefore, it is possible that multiple genomic hits in this pathway, including NUDT1 and OGG1, may be required for the SBS18/SBS36 C:G>A:T transversion signatures to arise in the presence of oxidative DNA damage. NUDT1-null mice showed increased tumourigenesis, but the number of spontaneous mutations was not increased compared to wild type mice (Tsuzuki et al. 2001). 
When the NUDT1 knockout was superimposed on a mismatch repair deficient background, tumours displayed enrichment for C:G>A:T transversion mutations (Egashira et al. 2002). OGG1-null mice showed increased spontaneous tumourigenesis and transversion mutations (Klungland et al. 1999; Minowa et al. 2000; Sakumi et al. 2003). However, double knockout of OGG1 and NUDT1 did not increase tumourigenesis, which suggests that the oncogenic effects of OGG1 deficiency may be counteracted by NUDT1 deficiency (Sakumi et al. 2003). Finally, MUTYH-null mice showed increased intestinal tumourigenesis and C:G>A:T transversion mutations (Sakamoto et al. 2007). To date, germline variation in OGG1 and NUDT1 has not been associated with a Mendelian hereditary cancer predisposition and MUTYH remains the only validated Mendelian cancer predisposition gene linked to the SBS18/SBS36 signatures with a strong phenotype of C:G>A:T transversions (Mur et al. 2018).

Beyond consideration for defective BER as a potential underlying etiology, SBS18/SBS36 transversion signatures ultimately arise from an excess of reactive oxygen species (ROS) and subsequent formation of 8-OxoGuanine (8-OxoG) (Cheng et al. 1992). Increased oxidative damage leads to C:G>A:T transversion mutations even in the presence of intact MUTYH, but the frequency of such transversions is markedly increased by MUTYH deficiency (Sakai et al. 2006). Suzuki et al. suggested that MUTYH activity may actually increase C:G>A:T transversions in the presence of 8-Oxo-dGTP (Suzuki et al. 2010). In light of this, SBS18/SBS36 signatures are most likely multifactorial in origin and depend on the state of oxidative stress and metabolism, as well as on the functionality of MUTYH and BER mechanisms.

We identified several known pathogenic MUTYH founder mutations within our cohort as well as previously described pathogenic variants (Table 2.1, NM_001048171). 
Founder variants included the European p.Gly382Asp and p.Tyr165Cys variants (Aretz et al. 2014; Lejbkowicz et al. 2012) and the Asian c.892-2A>G variant (Taki et al. 2016). Previously reported pathogenic variants included c.815G>A (p.Gly272Glu) and c.996G>A (p.Ser332Ser) (Kairupan et al. 2005; Win et al. 2011; Fokkema et al. 2011). While classifications of c.996G>A and c.892-2A>G are less than uniform amongst reporting centres per ClinVar (Landrum et al. 2016), ranging from variant of uncertain significance to likely pathogenic or pathogenic, they are locally considered to be pathogenic alleles. The combined germline and somatic data presented herein support these classifications. The presence of markedly elevated SBS18/SBS36 transversion signatures in patient 1 associated with abnormal splicing of c.996G>A and phasing of the biallelic compound heterozygous germline variants (c.996G>A, p.Ser332Ser and c.815G>A, p.Gly272Glu) support pathogenicity of the synonymous variant (p.Ser332Ser). Patient 2 carries a heterozygous germline MUTYH variant (c.892-2A>G) shown to cause aberrant splicing and their tumour featured somatic inactivation of the remaining allele resulting in biallelic MUTYH aberrations. The presence of the same transversion signatures in patient 2 also supports pathogenicity of the Asian founder variant c.892-2A>G.

2.6 Conclusion

Our study demonstrates that tumours with biallelic MUTYH aberrations arising in the germline or combined germline and somatic contexts display a characteristic elevation of somatic C:G>A:T transversion signatures previously associated with defective MUTYH-mediated base excision repair (BER). Defective MUTYH is not the sole determinant of these signatures, but MUTYH germline variants may be present in a subset of patients with tumours demonstrating mutational signatures possibly suggestive of MUTYH deficiency (e.g. COSMIC Signature 18, SigProfiler SBS18/SBS36, or SignatureAnalyzer SBS18/SBS36). 
Further research in large cohorts will be important to elucidate the role of germline and somatic MUTYH aberrations in relation to base excision repair deficiency in diverse cancer types, in particular within populations where founder mutations have been identified.

Chapter 3: Comprehensive genomic profiling of a rare tumour, the eccrine porocarcinoma

The content of this chapter was published in npj Precision Oncology under the title “Whole genome and whole transcriptome genomic profiling of a metastatic eccrine porocarcinoma”, of which I am the first author (Thibodeau et al. 2018). As mentioned previously, the data used in this chapter were generated as part of the Personalized OncoGenomics project, which is an ongoing research initiative enrolling patients with advanced cancers for comprehensive genomic profiling of their tumour in order to identify opportunities for precision medicine. Patient enrollment, normal and tumour sampling and preparation, genomic sequencing, bioinformatic pipeline data processing and annotations were contributed by the Personalized OncoGenomics project.

I performed genomic and clinical data analysis, interpretation and integration, as well as the writing of the first-author manuscript. In collaboration with Readman Chiu and Inanc Birol, I also performed customized data annotations and analysis using bioinformatic tools such as transcriptome targeted assembly pipeline (TAP)/PAVFinder for splicing analysis. Andrew Mungall provided a comprehensive description of the laboratory methods and Yussanne Ma the description of bioinformatics methods. Bioinformatics methods were reviewed and updated by Karen Mungall and myself following comments from reviewers of the npj Precision Oncology journal, which led to a complete re-run of this case through the updated bioinformatics pipeline.

3.1 Introduction

Eccrine porocarcinomas (EP) are very rare malignant tumours of the intraepidermic sweat gland duct (Riera-Leal et al. 2015). 
In the United States, the age-adjusted incidence rate ratio of porocarcinoma is 0.4 cases per 1 million person-years and the median age at diagnosis is 75 years (Blake et al. 2010). Very little is known about the molecular pathophysiology of this tumour and only targeted tumour sequencing of EP has been published to date. Harms et al. recently suggested that porocarcinomas feature recurrent somatic HRAS and EGFR gain of function (GoF) mutations and loss of function (LoF) mutations in various tumour suppressor genes (Harms et al. 2016). A PIK3CA somatic GoF mutation has been reported in one case of porocarcinoma (Dias-Santagata et al. 2011). We describe the whole genome and whole transcriptome profiling of metastatic EP in a 66-year-old male with a previous history of localized EP of the scalp.

3.2 Key findings

We describe the whole genome and whole transcriptome genomic profiling of a metastatic EP in a 66-year-old male patient with a previous history of localized porocarcinoma of the scalp. Whole genome and whole transcriptome genomic profiling was performed on the metastatic EP. Whole genome sequencing was performed on blood-derived DNA in order to allow a comparison between germline and somatic events. We found somatic copy losses of several tumour suppressor genes including APC, PTEN, CDKN2A, CDKN2B and CDKN1A. We identified a somatic hemizygous CDKN2A pathogenic splice site variant. De novo transcriptome assembly revealed abnormal splicing of CDKN2A p14ARF. Elevated expression of oncogenes EGFR and NOTCH1 was noted and no somatic mutations were found in these genes. Wnt pathway somatic alterations were also observed. Our results suggest that the molecular pathophysiology of malignant EP features high complexity and subtle interactions of multiple key genes. Cell cycle dysregulation and CDKN2A loss of function (LoF) was found to be a new potential driver in EP tumourigenesis.  
3.3 Methods

3.3.1 Clinical sample

This research project was approved by the University of British Columbia Cancer Agency (BCCA) Research Ethics Board (REB) (protocol H14-00681). Informed written consent was obtained from the patient for tumour profiling using RNA-seq (tumour) as well as whole genome sequencing (tumour and blood). The protocol allows data to be used for research reports and scientific publications, and also to be made available to named investigators of institutions who agree, through a data transfer agreement, to honour the same ethical and privacy principles required by the BCCA REB. Following informed consent, the patient underwent imaging-guided biopsies of left neck lymph node metastases as part of the Personalized OncoGenomics trial at the British Columbia Cancer Agency (Clinicaltrials.gov ID:NCT02155621). Methods were conducted in accordance with the review board approved protocols.

3.3.2 Tissue collection and preparation

Lymph node biopsy tissue (10 x 3 mm) was surgically excised and embedded in optimal cutting temperature (OCT) compound and snap frozen on dry ice. Eight tubes containing 4 x 50 µm tissue sections with an estimated average tumour content of 57% (pathology review) were chosen for nucleic acid extraction using an AllPrep kit on a QiaCube instrument (Qiagen). Tumour genomic DNA (gDNA) was pooled and quantified (Qubit assay, Invitrogen), yielding 15 µg. Total RNA (7 µg) was obtained with an RNA Integrity Number (RIN) of 8.1 as determined by Agilent Bioanalyzer.

3.3.3 Whole genome DNA library construction

PCR-free tumour (P00471) and blood (P00427) whole genome libraries were constructed using an automated implementation of the TruSeq DNA PCR-free kit (FC-121-1002, Illumina Inc.) from 1 µg gDNA, arrayed in a 96-well microtitre plate and sheared by Covaris sonication (Perkin Elmer). 
Sheared DNA was end-repaired and size selected using AMPure XP beads targeting a 300-400 bp fraction. After 3’ A-tailing, full-length TruSeq adapters were ligated. Libraries were purified using AMPure XP beads. Library fragment sizes were assessed using an aliquot of PCR amplified library DNA on a Caliper GX DNA1000 chip. The PCR-free library concentration was quantified using a qPCR Library Quantification kit (KAPA, KK4824).

3.3.4 Strand-specific RNA library construction

A strand-specific messenger RNA library (P00475) was constructed from 1 µg total RNA. Polyadenylated (polyA+) RNA was purified using a 96-well MultiMACS mRNA isolation kit on a MultiMACS 96 separator (Miltenyi Biotec, Germany) from 1 µg total RNA with on-column DNaseI-treatment as per the manufacturer's instructions. The eluted polyA+ RNA was ethanol precipitated and resuspended in 10 µL of DEPC treated water with 1:20 SuperaseIN (Life Technologies, USA). First-strand cDNA was synthesized from the purified polyadenylated messenger RNA using a Maxima H Minus First Strand cDNA Synthesis kit (Thermo-Fisher, USA) and random hexamer primers at a concentration of 5 µM along with a final concentration of 1 µg/µL Actinomycin D, followed by Ampure XP SPRI bead purification on a Biomek FX robot (Beckman-Coulter, USA). Second strand cDNA was synthesized following the Superscript cDNA Synthesis protocol by replacing the dTTP with dUTP in dNTP mix, allowing second strand digestion using Uracil-N-Glycosylase (Life Technologies, USA) in the post-adapter ligation reaction and thus achieving strand specificity. cDNA was fragmented by Covaris E210 sonication for 55 seconds at a “Duty cycle” of 20% and “Intensity” of 5. The paired-end sequencing library was prepared following the BC Cancer Genome Sciences Centre strand-specific, plate-based and paired-end library construction protocol on a Biomek FX robot (Beckman-Coulter, USA). 
Briefly, the cDNA was purified in 96-well format using Ampure XP SPRI beads, and was subject to end-repair and phosphorylation by T4 DNA polymerase, Klenow DNA Polymerase, and T4 polynucleotide kinase in a single reaction, followed by cleanup using Ampure XP SPRI beads and 3’ A-tailing by Klenow fragment (3’ to 5’ exo minus). After purification using Ampure XP SPRI beads, Quant-iT quantification was performed to determine the amount of Illumina PE adapters to be used in the next step of the adapter ligation reaction. The adapter-ligated products were purified using Ampure XP SPRI beads, and digested with UNG (1 U/µL) at 37˚C for 30 min followed by deactivation at 95˚C for 15 min. The digested cDNA was purified using Ampure XP SPRI beads, and then PCR-amplified with Phusion DNA Polymerase (Thermo Fisher Scientific Inc. USA) using Illumina’s PE primer set, with cycle conditions of 98˚C for 30 sec followed by 10 cycles of 98˚C for 10 sec, 65˚C for 30 sec and 72˚C for 30 sec, and then 72˚C for 5 min. The PCR products were purified using Ampure XP SPRI beads, and quality determined using a LabChip GX for DNA samples using the High Sensitivity Assay (Caliper, PerkinElmer, Inc. USA). PCR product in the 250-400 bp size range was purified using SPRI beads, and the DNA quality was assessed and quantified using an Agilent DNA 1000 series II assay and Quant-iT dsDNA HS Assay Kit using Qubit fluorometer (Invitrogen), then diluted to 8 nM for Illumina Sequencing.

3.3.5 Whole genome and transcriptome sequencing

Tumour and blood genome libraries were sequenced to 90X and 42X coverage, respectively, using paired-end 125 base pair reads on an Illumina HiSeq2500 with version 4 chemistry. The tumour RNA library was sequenced with paired-end 75 base reads on an Illumina HiSeq2500 sequencer using version 4 chemistry, generating 186 million passed filter reads.    
3.3.6 Bioinformatic analysis

3.3.6.1 Germline alteration assessment

Sequence reads from the whole genome libraries were aligned to the human reference genome (hg19) using the Burrows-Wheeler Alignment tool (BWA-MEM v0.7.6) (Li 2013). Variant calling and filtering were performed with mpileup and varFilter from SAMtools (v0.1.17), respectively (Li et al. 2009; Li 2011).

3.3.6.2 Somatic alteration assessment

Sequence reads from the whole genome libraries were aligned to the human reference genome (hg19) using the Burrows-Wheeler Alignment tool (BWA-MEM v0.7.6) (Li 2013). The tumour’s genomic sequence was compared to that of the patient’s constitutive DNA to identify somatic alterations.

Regions of copy number variation (CNV) and loss of heterozygosity (LOH) were identified by comparing the tumour to the matched normal using the Hidden Markov model-based approaches CNAseq (v0.0.8) (Jones et al. 2010) and APOLLOH (v0.1.2) (Ha et al. 2012), respectively. The collection of distinct CNA and LOH regions was compared to a set of theoretical models for ploidy (ranging from diploid to pentaploid) and tumour content (10% intervals from the initial estimated pathology review tumour content of 56%: 16%, 26%, 36%, 46%, 56%, 66%, 76%, 86%, 96%). The best fit was a diploid model at 66% tumour content.

Single nucleotide mutations were identified using a probabilistic joint variant calling approach utilizing SAMtools (v0.1.17) (Li et al. 2009), MutationSeq (v4.3.5) (Ding et al. 2012) and Strelka (v1.0.6) (Saunders et al. 2012); small insertions and deletions (indels) were identified using Strelka (v1.0.6) (Saunders et al. 2012) and Trans-ABySS (v1.4.10) (Birol et al. 2009; Robertson et al. 2010). 
De novo assembly and annotation of genomic and transcriptomic data with ABySS (v1.3.4) (Simpson et al. 2009), Trans-ABySS (v1.4.10) (Birol et al. 2009; Robertson et al. 2010), deFuse (McPherson et al. 2011) and MAVIS (Reisle et al. 2019) were used to identify structural variants (SV) and fusion genes. Structural variant percentile was determined according to our local database of 584 diverse cancer cases, as described previously (Thibodeau et al. 2017).

Variants were annotated to genes using the Ensembl database (v69) (Flicek et al. 2014). Coding single nucleotide variants (SNV) were compared to the COSMIC database (downloaded 2015/02/26) (Forbes et al. 2017) to identify previously recorded somatic events. Genes were associated with pathways and cancers using the ConsensusPathDB pathway database (v30) (Herwig et al. 2016) and the COSMIC cancer gene census (downloaded 2015/02/26) (Forbes et al. 2017). Genes were linked with potential therapeutics using DGIdb (downloaded 2016/06/21) (Wagner et al. 2016). Zygosity and variant calling of small mutations (SNV, indels) take both the ploidy (diploid) and the tumour content (66%) into consideration.

3.3.6.3 Transcriptome gene expression assessment

RNA-Seq reads were aligned against a database of exon junction sequences and subsequently processed to reposition all read alignments, with gaps, onto the same genomic reference using JAGuaR (v2.2.0) (Butterfield et al. 2014). Further processing (in-house software, taking into account read strand) was used to determine gene and exon read counts and normalized expression level in reads per kilobase per million mapped reads (RPKM) (Mortazavi et al. 2008).

As no reference matched normal tissue was available for differential expression analysis, we used an approach similar to that in Jones et al. (Jones et al. 2010). 
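For reference, the RPKM normalization mentioned above can be written as a one-line calculation. This is a minimal sketch of the standard definition (reads per kilobase of gene model per million mapped reads), not the in-house software used in this project; the function name and the example numbers are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene model per million mapped reads."""
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling factors.
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical gene: 500 reads on a 2,000 bp gene model, 50 million mapped reads.
example_rpkm = rpkm(read_count=500, gene_length_bp=2_000, total_mapped_reads=50_000_000)
print(example_rpkm)
```

Because the value depends on the gene length used, RPKM figures are only comparable between datasets when the same gene models are applied, which is why a mapping between Ensembl and TCGA gene models is needed before comparing to TCGA.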
We compared RPKM expression levels in the tumour RNA sample to transcriptome sequencing data from 16 different normal human tissues from the Illumina Human BodyMap (HBM) 2.0 project (www.illumina.com; ArrayExpress ID: E-MTAB-513) (Asmann et al. 2012), computing a fold change (FC) value compared to the appropriately matched tissue, or to the average if no appropriate match was available for the tumour sample. Gene-based RPKM (transcript-normalized) values were calculated using Ensembl (v69) gene models. To compare to TCGA datasets, which use GAF gene models, a mapping from a given Ensembl gene to the appropriate GAF gene was determined based on position overlap.

The Cancer Genome Atlas (TCGA) gene expression data (level 3) were downloaded (https://tcga-data.nci.nih.gov/tcga/) and analyzed to calculate reads per kilobase per million mapped reads (RPKM) values from each of the available cancer types. Each gene in the patient tumour sample was compared to a ranked list of all TCGA expression values for that gene to assign it an overall tumour percentile within the same tumour type if available, or within all tumour types. A within-sample expression rank was also calculated for each gene to help assess the significance of outlier gene expression levels.

To select a subset of genes for correlation analysis, we implemented ANOVA on log-transformed TCGA expression data. Expression data for the top 3000 genes ranked by F-ratios were normalized (mean) across TCGA samples and the patient sample. We then computed Spearman correlations between TCGA expression datasets and the patient sample.

3.3.6.4 Biological interpretation and therapeutic association

Alterations with specific therapeutic associations are identified using the Genome Sciences Centre's expert-curated Knowledgebase, which integrates information from sources including cancer databases, drug databases, clinical tests, and the biomedical literature.
Associations are considered based on the level of evidence for the use of that drug in the context of the observed alteration, including those that are approved in the same or other cancer types, and those that have early clinical or preclinical evidence. Inferences about novel or poorly studied alterations are made based on the biological function and activity of the protein and the expected impact of the alteration. Additional biological and therapeutic inferences are made based on patient-specific annotated pathways. The comprehensive tumour description is expert reviewed, including highlighting driver mutations, providing pathway context, interpreting results in tumour type context, and refining potential therapeutic targets.

3.3.7 Mutational signatures

3.3.7.1 Monte-Carlo simulation

The mutation signature profile was determined by classifying all genomic SNVs into 96 classes based on variant and 3’/5’ mutation context to obtain a mutation catalog vector as described by Alexandrov et al. (2013) (Alexandrov et al. 2013b). In order to determine the best fit to a consensus set of 30 mutation signatures (available at http://cancer.sanger.ac.uk/cosmic/signatures), we performed non-negative least squares decomposition to determine the exposure vector $e$ in the model $Se = m$. $S$ is the known $96 \times 30$ matrix of consensus signatures, $m$ is the 96-element mutation catalog, and the members of $e$ denote each signature’s relative contribution to the overall mutation burden. Determining the solution of $e$ which minimizes the squared residuals, $\lVert Se - m \rVert^2$, subject to $e \geq 0$, is a well-studied problem. We determined $e$ using a quadratic programming approach implemented in the R package “nnls” (version 1.4), which rapidly converges to an optimal solution. In order to estimate the sampling variance inherent in $m$, we performed a Monte Carlo simulation. 1000 replicate vectors, $\hat{m}_i$, were randomly drawn from a multinomial distribution, $\hat{m}_i \sim \mathrm{Multinomial}(n, m/n)$, where $n$ is the number of somatic mutations.
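The decomposition and resampling steps of this subsection can be illustrated with a minimal Python sketch. It is not the pipeline used for this case: the analysis relied on the R package "nnls", whereas the sketch swaps in a simple cyclic coordinate-descent solver so that it runs with the standard library only. The function names, the toy 4-class, 2-signature matrix and the mutation counts are all hypothetical; a real run would use the 96 x 30 consensus signature matrix.

```python
import random


def nnls_coordinate_descent(S, m, sweeps=100):
    """Minimise ||S e - m||^2 subject to e >= 0 by cyclic coordinate descent.

    S is a list of rows (mutation classes x signatures); m is the catalog.
    This solver is an illustrative stand-in for the R "nnls" package.
    """
    n_rows, n_cols = len(S), len(S[0])
    e = [0.0] * n_cols
    residual = [float(x) for x in m]  # r = m - S e, with e = 0 initially
    col_norms = [sum(S[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(sweeps):
        for j in range(n_cols):
            if col_norms[j] == 0.0:
                continue
            grad = sum(S[i][j] * residual[i] for i in range(n_rows))
            new_ej = max(0.0, e[j] + grad / col_norms[j])  # clip at zero
            delta = new_ej - e[j]
            if delta != 0.0:
                for i in range(n_rows):
                    residual[i] -= S[i][j] * delta
                e[j] = new_ej
    return e


def simulate_intervals(S, m, replicates=1000, seed=1):
    """Resample the catalog from Multinomial(n, m/n) and refit each replicate.

    Returns, per signature, the 25th and 975th ranked exposure out of 1000
    (the ranks scale with `replicates`, which is assumed to be at least 40).
    """
    rng = random.Random(seed)
    n = int(sum(m))
    classes = range(len(m))
    fits = []
    for _ in range(replicates):
        counts = [0] * len(m)
        for c in rng.choices(classes, weights=m, k=n):
            counts[c] += 1
        fits.append(nnls_coordinate_descent(S, counts))
    lo_rank = max(replicates * 25 // 1000 - 1, 0)
    hi_rank = replicates * 975 // 1000 - 1
    intervals = []
    for j in range(len(S[0])):
        ranked = sorted(fit[j] for fit in fits)
        intervals.append((ranked[lo_rank], ranked[hi_rank]))
    return intervals


# Toy problem (hypothetical values): 4 mutation classes, 2 signatures.
S_toy = [[0.7, 0.1], [0.1, 0.1], [0.1, 0.1], [0.1, 0.7]]
m_toy = [220, 40, 40, 100]  # equals 300 * signature 1 + 100 * signature 2
exposures = nnls_coordinate_descent(S_toy, m_toy)
intervals = simulate_intervals(S_toy, m_toy)
```

In this toy setting the catalog is an exact mixture, so the fitted exposures recover the mixture weights; with real data the fit is approximate, and a simulated interval whose lower bound reaches zero indicates a signature whose contribution cannot be distinguished from sampling noise.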
1000 corresponding exposure vectors were computed from the simulated catalogs and the 95% simulated interval was reported using the 25th and 975th ranked values for each signature exposure.

3.3.7.2 Timing of mutational processes

Mutations were partitioned into early and late categories as described by McGranahan et al (McGranahan et al. 2015). Early mutations represent clonal variants or those present in multiple copies in a region with copy number gain (and which therefore likely occurred before the duplication). Late mutations represent subclonal variants that were not present prior to copy number gains. Mutation signatures were deciphered in the previously described manner for both early and late variants.

3.3.7.3 Radiation exposure analysis

Indel/substitution and deletion/insertion ratios were analyzed as a surrogate of radiation-induced mutational processes as described by Behjati et al (2016) (Behjati et al. 2016), who found that radiation-induced tumours had higher such ratios. A custom analysis was carried out, which included genome-wide assessment of indels and SNV, detection of microhomology fraction and fraction of deletions in simple repeat regions.

3.4 Clinical description

The male patient presented at the age of 64 with a bleeding left-scalp lesion. The surgical excision biopsy revealed a well-circumscribed, but atypical proliferation of squamoid cells with connection to the overlying epidermis and conspicuous mitotic activity (10-15 mitoses per 10 high power fields) with foci of necrosis. The lesion was resected and pathology examination revealed clear margins. Pathology examination showed features in keeping with eccrine porocarcinoma (EP). Eighteen months later, the patient presented with left cervical lymphadenopathy. Neck ultrasound imaging showed numerous pathological-appearing cervical lymph nodes, the largest measuring 1.2x1.2x1.2 cm.
A fine-needle aspiration did not yield a satisfactory pathological sample, and was followed by a 2.5 cm incisional biopsy of a lymph node next to the left sternocleidomastoid muscle. Tumour staging investigations included an FDG-PET scan, showing the left occipital malignancy and incidentally identifying a pituitary adenoma, which maintained stable appearance on subsequent imaging. Oncological management included left neck radical dissection with en bloc excision of the left occipital porocarcinoma mass, followed by local radiation therapy to the occipital, cervical and supraclavicular areas (60 Gy in 30 fractions). Post-treatment PET scan imaging was negative for evidence of distant metastasis. Over a three-month period following initial management completion, the patient developed increasing pain in the left neck and left shoulder area, but CT and FDG-PET scan imaging did not uncover any suspicious focus of malignancy. Six months later, the follow-up FDG-PET scan showed a tracer-avid left supraclavicular node and fine needle aspiration cytology revealed features in keeping with metastatic carcinoma. Subsequent wide incisional biopsy of a posterior triangle cervical lymph node confirmed the malignant epithelial neoplasm consistent with metastatic EP. Systemic therapies were considered, but not pursued, given the patient's desire to optimize his quality of life and the lack of scientific literature supporting the efficacy of such therapies in EP management. Shortly after, the patient acutely developed cerebellar signs with pronounced slurred speech and truncal ataxia. No acute vascular event was seen on unenhanced brain CT and contrast CT-angiography imaging, but brain MRI with gadolinium showed an enhancing mass (1.9x3.3x2.1 cm) within the superior vermis and right superior cerebellar hemisphere. Right occipital craniotomy and surgical resection of the cerebellar metastasis were performed for symptomatic relief.
Pathology of the resected tissue confirmed metastatic EP (Figure 3.1). The patient developed leptomeningeal disease over the following two months and passed away from progressive central nervous system involvement.

3.5 Results

3.5.1 Pathology

Pathological examination of the primary scalp lesion showed a monotonous proliferation with epidermal attachment and elevated mitotic activity, consistent with a poroid neoplasm (Figure 3.1a). Examination of the subsequent left neck resection specimen showed very similar histologic features, with metastatic tumour cells in 38 of 49 neck lymph nodes examined, confirming the diagnosis of EP (Figure 3.1b). The subsequent cerebellar metastatic tumour showed the same histological features as the primary lesion and the neck resection specimen, consisting of a highly infiltrative carcinoma composed of epithelioid cells with eosinophilic and focally clear cytoplasm, arranged in sheets and nodules, within the cerebellar parenchyma, including widespread central necrosis and areas of discohesive growth (Figure 3.1c). CK5 immunohistochemistry showed diffuse cytoplasmic staining in the tumour cells, consistent with the expected immunoprofile for EP (Figure 3.1d). β-catenin staining showed strong membranous and cytoplasmic positivity with no nuclear staining, supporting the absence of canonical Wnt pathway activation (Figure 3.1e). Staining for p16, the protein product of CDKN2A (p16INK4a), was absent in the tumour (Figure 3.1f). EGFR immunostaining showed diffuse membranous staining (Figure 3.1g), while staining for p53 showed variable nuclear positivity, consistent either with wild-type or missense TP53 status (Figure 3.1h).

Figure 3.1 Histology and immunochemistry profile of poroid neoplasm. a) Hematoxylin & eosin (H&E) stained section of primary scalp lesion. b) H&E-stained section of subsequent metastatic left neck lesion. c) H&E-stained section of subsequent metastatic cerebellar tumour.
Immunohistochemistry of the cerebellar tumour for d) CK5, e) β-catenin, f) p16, g) EGFR, and h) p53. All images are shown at 200× magnification. Image reproduced with permission from Thibodeau et al (2018) according to Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

3.5.2 Somatic profiling

Tumour genomic profiling was performed on a lymph node metastasis from the wide left neck biopsy (2015) at the site where previous left radical neck dissection with en bloc excision of left occipital porocarcinoma was performed (2013). Unless otherwise specified, the tumour expression percentile comparison was against The Cancer Genome Atlas (TCGA) average of all cancers (disease comparator) (Cancer Genome Atlas Research Network et al. 2013). Somatic profiling included assessment of protein-coding mutations (small mutations and structural rearrangements) and mutational burden, copy number analysis, gene expression analysis from transcriptome data and mutational signatures (please refer to Methods for more details). There were 38 protein-coding somatic small mutations, including 35 non-synonymous single nucleotide variants (SNV) (44th percentile) and 3 indels (38th percentile), and there were 40 structural variants (SV) (21st percentile amongst our local database of 584 diverse cancer cases). Two expressed SV fusions due to large deletion events were identified: RNF13-PAK2 and PIK3R1-YTHDC2 (see section below on copy number and structural variants). For additional somatic genomic data, please refer to Appendix C.1 for the coding mutation summary, Supplementary Table S4 for comprehensive copy number and gene expression, Appendix C.2 for small mutations and Appendix C.3 for structural variants.

There were regional copy gains (chromosome 3) and losses (chromosomes 1, 3 and 5), as well as chromosome-wide copy loss of chromosome 6 (Figure 3.2). There were no large regions of loss of heterozygosity (LOH).
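The percentiles quoted in this section place the patient's value within a reference distribution, and Table 3.1 additionally reports kIQR, the number of inter-quartile range intervals away from the median. The short Python sketch below shows one way such metrics can be computed. The exact conventions of the in-house pipeline are not stated in this chapter, so the "at or below" percentile definition, the inclusive quartile method, the function names and the reference values are all assumptions made for illustration.

```python
import statistics


def percentile_rank(value, reference):
    """Percentage of reference values at or below `value` (assumed convention)."""
    return 100.0 * sum(r <= value for r in reference) / len(reference)


def k_iqr(value, reference):
    """Distance of `value` from the reference median, in units of the IQR."""
    q1, median, q3 = statistics.quantiles(reference, n=4, method="inclusive")
    return (value - median) / (q3 - q1)


# Hypothetical reference expression values for one gene across a cohort.
reference_rpkm = [1, 2, 3, 4, 5, 6, 7, 8, 9]
patient_rpkm = 13

patient_percentile = percentile_rank(patient_rpkm, reference_rpkm)
patient_kiqr = k_iqr(patient_rpkm, reference_rpkm)
```

A percentile saturates at 100 once the patient's value exceeds every reference sample, whereas kIQR keeps growing with the size of the deviation, which is why the two metrics are reported side by side for outlier genes.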
Figure 3.2 CIRCOS plot illustrating the somatic copy number variants (CNV) observed in the porocarcinoma tumour (Krzywinski et al. 2009). From the outer circle: copy gains are represented in red; copy losses are represented in green and regions of loss of heterozygosity (LOH) are represented in blue.

The transcriptome Spearman correlation showed the highest correlation with squamous cancers, specifically esophageal squamous carcinoma (Figure 3.3).

Figure 3.3 Transcriptome Spearman correlation across TCGA cancer types. The highest correlation found was with squamous cancers, specifically esophageal squamous carcinoma (ESCA). ACC, adrenocortical carcinoma; BLCA, bladder urothelial carcinoma; BM, bone marrow; BRCA, breast invasive cancer; CESC CAD, cervical squamous cell carcinoma and endocervical adenocarcinoma - cervical adenocarcinoma; CESC SCC, cervical squamous cell carcinoma and endocervical adenocarcinoma - squamous cell carcinoma; CHOL, cholangiocarcinoma; COADREAD, colorectal adenocarcinoma; DLBC, diffuse large B-cell lymphoma; DLBCL, diffuse large B-cell lymphoma; ESCA EAC, esophageal carcinoma esophageal adenocarcinoma; ESCA SCC, esophageal carcinoma squamous cell carcinoma; FL, follicular lymphoma; GBM, glioblastoma; HNSC, head and neck squamous cell carcinoma; KICH, kidney chromophobe; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma; LAML, acute myeloid leukemia; LGG, low grade glioma; LIHC, liver hepatocellular carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; MB, medulloblastoma;
MESO, mesothelioma; NCI, National Cancer Institute; OV, ovarian serous cystadenocarcinoma; PAAD, pancreatic adenocarcinoma; PCPG, pheochromocytoma and paraganglioma; PRAD, prostate adenocarcinoma; SARC, sarcoma; SKCM, skin cutaneous melanoma; STAD, stomach adenocarcinoma; TFRI, Terry Fox Research Institute; TGCT, testicular germ cell tumours; THCA, thyroid carcinoma; THYM, thymoma; UCEC, uterine corpus endometrial carcinoma; UCS, uterine carcinosarcoma; UVM, uveal melanoma.

The somatic SNV profile revealed a best-fit mutation signature model comprised of signatures 1, 8, 9, and 16 (Figure 3.4) (Alexandrov et al. 2013a). Signature 1 is associated with age, and is ubiquitous across cancer types. Please refer to Appendix C.4 for the Monte Carlo simulation mutational signatures.

Figure 3.4 Mutational signatures – Monte-Carlo simulation. Mean exposure (mutation count) for each mutational signature. The exposure is displayed on the x-axis and represents the total number of mutations contributing to each individual signature on the y-axis. Non-negative least squares (NNLS) with Monte Carlo resampling was used to generate the 95% confidence interval for the mutation count of each signature. Image reproduced with permission from Thibodeau et al (2018) according to Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

For investigation of potential radiation-associated processes, mutational signature timing analysis (Figure 3.5) and genome-wide deletion mutation analyses (Figure 3.6) were performed according to the methods described by McGranahan et al (McGranahan et al. 2015) and Behjati et al (Behjati et al. 2016), respectively.
Results were inconsistent, and did not present obvious radiation-associated mutational signature features. Please refer to Appendix C.5 for Monte Carlo simulation mutational signatures timing and to Appendix C.6 for features associated with radiation-induced tumours.

Figure 3.5 Mutational signatures timing as described by McGranahan et al (McGranahan et al. 2015). (A) Early mutations (B) Late mutations. Image reproduced with permission from Thibodeau et al (2018) according to Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Figure 3.6 Genome-wide deletion distribution as described by Behjati et al (Behjati et al. 2016). Image reproduced with permission from Thibodeau et al (2018) according to Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

3.5.2.1 Cell cycle regulation

CDK6 had elevated expression (86th percentile).
The tumour had deletional copy loss of one CDKN2A allele and a somatic splice site acceptor mutation located at the last base of intron 1, at its junction with exon 2 (chr9:21971208C>T, GRCh37; c.151-1G>A, NM_000077.4; c.194-1G>A, NM_058195.3; COSM127095; rs730881677) on the remaining allele (http://cancer.sanger.ac.uk) (Forbes et al. 2017). The variant was absent in germline (0/30 reads) and hemizygous in the tumour (19/19 reads) (Appendix C.7). This variant is present in ClinVar (rs730881677) as a pathogenic variant predisposing to familial melanoma (Landrum et al. 2016).

De novo transcriptome assembly revealed that the CDKN2A splice site variant rs730881677 causes two independent abnormal splicing events of the p14ARF transcript (ENST00000361570; NM_058195.3) (Figure 3.7) (Chiu et al. 2018). The first splicing abnormality, supported by 7 reads, is the abnormal splicing out of the entire exon 2 (Figure 3.7b). The second splicing abnormality, supported by 5 reads, is the removal of the first base of exon 2 (chr9:21971208, hg19) resulting in a frameshift indel and truncation of CDKN2A at 6 bp into exon 2 (Figure 3.7c).

Figure 3.7 CDKN2A splicing. a) CDKN2A (p14ARF/p16INK4a) normal splicing. b) CDKN2A exon 2 skipping caused by the somatic splice site mutation (p14ARF c.194-1G>A, NM_058195.3; p16INK4a c.151-1G>A, NM_000077.4). c) CDKN2A (p14ARF) abnormal splicing caused by the c.194-1G>A (NM_058195.3) somatic mutation leading to a 1 base pair deletion and a frameshift. Image reproduced with permission from Thibodeau et al (2018) according to Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
The exon-specific collapsed transcript expression (cumulative coverage across each exon) showed relatively low levels of expression in reads per kilobase per million mapped reads (RPKM) at the corresponding genomic area of exon 2 for both ENST00000361570 (p14ARF) and ENST00000304494 (p16INK4a) transcripts (Figure 3.8 and Figure 3.9, Appendices C.7 and C.8), consistent with exon 2 skipping.

Figure 3.8 CDKN2A transcripts (green/blue) and expression overlay (top). The depth of read coverage in the transcriptome is given by the top graph. Ensembl transcript diagrams are depicted below where IDs are given to the left of each transcript and exonic regions have been scaled larger than intronic regions. The position of the splice-site mutation (9:21971208C>T) is noted with the vertical line running perpendicular to the transcripts. Image reproduced with permission from Thibodeau et al (2018) according to Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Figure 3.9 Exon-specific collapsed transcript expression of CDKN2A transcripts NM_058195 (p14ARF) and NM_000077 (p16INK4a). RPKM (reads per kilobase of transcript per million mapped reads) coverage on the Y-axis in relation to genomic regions approximately corresponding to CDKN2A exons on the X-axis (Exon 1 p14ARF chr9:21994138-21994490; Exon 1 p16INK4a chr9:21970901-21971207; Exon 2 p14ARF/p16INK4a chr9:21967752-21968241; Exon 3 p14ARF/p16INK4a chr9:21968574-21968770; hg19). Image reproduced with permission from Thibodeau et al (2018) according to Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
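For readers less familiar with the unit used in Figure 3.9, the sketch below applies the standard RPKM definition (Mortazavi et al. 2008) to per-exon read counts. It does not reproduce the in-house software used in this work: the function name, the read counts, the exon lengths and the library size are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads.

    Equivalent to read_count / (feature_length_bp / 1e3) / (total_mapped_reads / 1e6).
    """
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical library of 50 million mapped reads and three exons,
# each given as (read count, exon length in bp).
total_mapped = 50_000_000
exons = {
    "exon 1": (500, 2000),
    "exon 2": (25, 500),
    "exon 3": (300, 1000),
}

exon_rpkm = {
    name: rpkm(count, length, total_mapped)
    for name, (count, length) in exons.items()
}
```

Because the count is divided by feature length, a short exon with few reads can still be compared with a long one; in this made-up example exon 2 sits well below its neighbours, which is the kind of dip interpreted in the text as consistent with exon skipping.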
3.5.2.2 Cell growth, cell survival and Wnt pathway

Copy loss of the tumour suppressor PTEN was observed. When compared to the TCGA average, we noted elevated expression for KRAS, EGFR and NOTCH1 (Table 3.1). Amplification of GSK3B (glycogen synthase kinase 3 beta) was seen (4 copies in total), associated with high expression compared to the TCGA average (100th percentile, Table 3.1). WNT10A was highly expressed (Table 3.1), but no genomic causal event was identified. Copy losses of APC, CTNNB1, WNT5A and WNT2B were identified, but their transcriptome expression was average.

Table 3.1 RNA expression metrics of selected genes (diploid model). TCGA, The Cancer Genome Atlas; ASCNA, allele-specific copy number alteration; DLOH, deletion loss of heterozygosity; ESCA-SCC, oesophageal squamous cell carcinoma; FC, fold change; kIQR, number of inter-quartile range intervals away from the median; %ile, percentile. Table reproduced with permission from Thibodeau et al (2018) according to Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
| Gene | Copy number change (diploid model) | All TCGA %ile | All TCGA kIQR | ESCA-SCC TCGA %ile | ESCA-SCC TCGA kIQR | All TCGA (matched normal) %ile | All TCGA (matched normal) kIQR | Bodymap mean FC |
|---|---|---|---|---|---|---|---|---|
| APC | -1 (DLOH) | 84 | 1.1 | 70 | 0.41 | 86 | 0.88 | -1.17 |
| BAP1 | -1 (DLOH) | 1 | -1.5 | 0 | -1.77 | 0 | -2.33 | -1.79 |
| BRAF | 0 | 94 | 1.7 | 79 | 0.73 | 100 | 2.06 | 1.25 |
| CDK4 | 0 | 2 | -0.96 | 4 | -1.1 | 13 | -0.91 | 1.48 |
| CDK6 | 0 | 96 | 4.25 | 89 | 1.08 | 100 | 9.78 | 7.59 |
| CDKN1A | -1 (DLOH) | 60 | 0.2 | 56 | 0.06 | 56 | 0.11 | 1.44 |
| CDKN2A | -1 (DLOH) | 60 | 0.15 | 72 | 0.56 | 99 | 4.35 | 4.67 |
| CDKN2B | -1 (DLOH) | 96 | 3.74 | 90 | 1.43 | 91 | 1.78 | 3.91 |
| CTNNB1 | -1 (DLOH) | 86 | 1.06 | 97 | 2.49 | 91 | 1.5 | 1.73 |
| CYLD | 0 | 65 | 0.28 | 56 | 0.1 | 53 | 0.04 | -1.59 |
| DPH3 | 0 | 95 | 1.91 | 100 | 3.36 | 95 | 1.3 | -1.01 |
| E2F1 | 0 | 31 | -0.3 | 9 | -0.73 | 92 | 1.5 | 1.98 |
| E2F2 | 0 | 46 | -0.08 | 6 | -0.99 | 82 | 2.24 | 1.45 |
| E2F3 | -1 (DLOH) | 48 | -0.03 | 47 | -0.05 | 91 | 1.13 | 1.09 |
| EGFR | 0 | 99 | 17.91 | 93 | 3.62 | 100 | 29.5 | 16.72 |
| ERBB2 | 0 | 22 | -0.47 | 3 | -0.68 | 10 | -0.79 | 1.77 |
| EZH2 | 0 | 42 | -0.14 | 14 | -0.63 | 90 | 2.35 | 1.57 |
| FZD1 | 0 | 90 | 2.28 | 92 | 2.15 | 92 | 1.97 | 3.21 |
| FZD6 | 0 | 97 | 2.78 | 53 | 0.04 | 100 | 3.95 | 5.25 |
| FZD7 | 0 | 98 | 5.58 | 93 | 1.52 | 97 | 4.43 | 4.18 |
| GSK3B | +2 (ASCNA) | 100 | 5.08 | 97 | 2.22 | 100 | 8.58 | 3.62 |
| HRAS | 0 | 7 | -0.69 | 0 | -1.09 | 14 | -0.7 | 1.05 |
| JAG1 | 0 | 99 | 5.62 | 83 | 1.02 | 100 | 8.97 | 6.42 |
| KRAS | 0 | 96 | 2.45 | 90 | 1.24 | 100 | 4.41 | 1.83 |
| MDM2 | 0 | 97 | 3.31 | 97 | 4.19 | 99 | 5.89 | 2.86 |
| MDM4 | 0 | 93 | 2.08 | 90 | 2.07 | 100 | 3.26 | -1.07 |
| MUTYH | 0 | 1 | -0.97 | 2 | -1.37 | 5 | -0.91 | -1.14 |
| NOTCH1 | 0 | 96 | 3.16 | 82 | 0.96 | 100 | 4.81 | 5.35 |
| PIK3CB | +2 (ASCNA) | 89 | 1.19 | 82 | 0.98 | 87 | 1.05 | 1.76 |
| PTEN | -1 (DLOH) | 80 | 0.69 | 76 | 0.64 | 66 | 0.33 | 1.14 |
| RB1 | 0 | 95 | 1.82 | 64 | 0.29 | 100 | 3.31 | 2.26 |
| TGFB1 | 0 | 79 | 0.73 | 18 | -0.49 | 94 | 1.75 | 1.86 |
| TGFBR1 | 0 | 94 | 1.79 | 90 | 1.45 | 97 | 2.05 | 1.15 |
| TGFBR2 | 0 | 99 | 3.84 | 100 | 6.67 | 81 | 0.82 | 1.75 |
| TP53 | 0 | 64 | 0.29 | 56 | 0.16 | 89 | 1.19 | 18.52 |
| WNT10A | 0 | 96 | 8.33 | 89 | 1.54 | 100 | 14.41 | 2.9 |
| WNT5A | -1 (DLOH) | 83 | 1.09 | 33 | -0.13 | 91 | 2.12 | -1.17 |

3.5.2.3 Copy number and structural variants

The copy number variants (CNV) of note were copy losses of APC, CDKN2A/B, PTEN, CTNNB1, FOXP1, MITF, and BAP1 and copy gains of GSK3B, PIK3CB and ATR (refer to Supplementary Table S4 and Figure 3.2 for CNV details). Two SV are expressed in the transcriptome.
A 46 Mb deletion on chromosome 3 (chr3:149653091-196530353, hg19) leads to the fusion of RNF13 and PAK2 (Figure 3.10).

Figure 3.10 Large 46 Mb chromosome 3 deletion (chr3:149653091-196530353, hg19) creating a RNF13-PAK2 gene fusion expressed in the transcriptome. PAK2 has a two-copy gain and is highly expressed (99th percentile). This event leads to loss of the RNF13 RING domain, which is predicted to be essential for this ubiquitin ligase. PAK2 loses its p21-RHO-binding domain. The numbers above the gene models represent exon numbers. Image reproduced with permission from Thibodeau et al (2018) according to Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

A 45 Mb deletion on chromosome 5 (chr5:67564688-112859542, hg19) leads to the fusion of PIK3R1 and YTHDC2 (Figure 3.11), and PIK3R1 has high expression (96th percentile). Refer to Supplementary Table S4 for copy number and gene expression and to Appendix C.3 for structural variants.

Figure 3.11 Large 45 Mb chromosome 5 deletion (chr5:67564688-112859542, hg19) creating a PIK3R1-YTHDC2 gene fusion expressed in the transcriptome. All critical YTHDC2 domains (RNA helicase related) are maintained, but the ssDNA/RNA binding domain (R3H) is lost. Image reproduced with permission from Thibodeau et al (2018) according to Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
3.6 Discussion

Whole genome and whole transcriptome analysis of this case of metastatic EP provided insight into the complex molecular pathophysiology of this rare tumour. Overall, somatic SNV, CNV/LOH and SV were scarce when compared to other cutaneous tumours or even non-cancerous sun-exposed skin (Martincorena et al. 2015). However, some key components of cell cycle regulation and Wnt pathways were somatically altered, and will be further discussed below.

Previously reported cases of EP have been characterized by LOH, TP53 alterations, and a paucity of cytogenetically detectable abnormalities when compared to other cutaneous squamous cell carcinomas (Takata et al. 2000). This contrasts with the findings in our case, where multiple disruptions to CDKN2A (loss of one copy and somatic hemizygous splice site acceptor mutation) were identified, but no TP53 mutation.

While the global gene expression level of CDKN2A was not obviously perturbed (Table 3.1), the p16INK4a isoform had decreased expression (Appendix C.7) and the CDKN2A splice site variant rs730881677 caused two independent abnormally spliced p14ARF transcripts (Figure 3.7, Figure 3.9, Appendix C.7). Moreover, p16INK4a protein (CDKN2A) IHC staining was absent in our case, in keeping with the absence of a functional protein product (Figure 3.1f). Interestingly, Tsujita et al (Tsujita et al. 2015) found p16INK4a staining to be moderate to strongly positive in 16/17 eccrine poroma and focally and diffusely positive in 4/4 porocarcinoma tumour samples. Germline CDKN2A mutations are associated with significant predisposition to pancreatic cancer and hereditary melanoma (Hill et al. 2013). CDKN2A somatic mutations are seen not only in melanoma and non-melanoma skin cancers (Pacifico et al.
2008), but also in tumours arising from the central nervous system, the pleura and the esophagus (Forbes et al. 2017). Combined with somatic TP53 alterations, CDKN2A LoF is a frequent feature of cutaneous squamous cell (Saridaki et al. 2003) and esophageal carcinomas (Suzuki et al. 1995). Intragenic, epigenetic and copy loss mutations of CDKN2A and CDKN2B play an important role in esophageal squamous cell oncogenesis (Hu et al. 2004). Our porocarcinoma transcriptome profiling displays the highest correlation with TCGA esophageal squamous carcinoma expression data, which may be explained by the epithelial origin of both cancer types and the genomic events described above leading to cell cycle dysregulation and oncogenesis (Figure 3.2, Figure 3.3, Figure 3.8 and Figure 3.9).

The somatic SNV profile revealed a best-fit mutation signature model comprised of signatures 1, 8, 9, and 16 (Figure 3.4, Appendix C.4). Signature 1 is associated with age, and is ubiquitous across cancer types. Signature 9 is attributed to polymerase η mutagenesis (Alexandrov et al. 2013a). The aetiologies of signatures 8 and 16 are not clearly characterized. While Signature 8 has been weakly associated with homologous recombination deficiency, the role of the more specific Signature 3 was unclear in this tumour (Davies et al. 2017). Signature 7 is associated with UV-induced mutagenesis (Alexandrov et al. 2013a). The NNLS score of Signature 7 was 0.015, suggesting that up to 1.5% of mutational processes may be attributable to UV-related processes, but the confidence interval of this signature crosses zero in the Monte Carlo simulation. Contrary to previously reported porocarcinoma cases, Signature 7 is unlikely to be a significant contributor in either early or late mutations to the porocarcinoma tumour profile of our patient.

Using techniques described by McGranahan et al (McGranahan et al.
2015), we can partition mutations into likely early and likely late, and then decipher mutation signatures separately for both (Figure 3.5, Appendix C.5). Temporal dissection of mutations revealed 1469 late-arising and subclonal variants and 2605 early-arising or clonal variants. Mutation signatures deciphered from temporally dissected SNVs demonstrated stable representation of signatures 8, 9, and 16 across early and late mutations. Signature 1 decreased over time, agreeing with previously reported findings by McGranahan et al (McGranahan et al. 2015). Temporal analysis of mutational signatures (Figure 3.5) shows Signature 30 as a late-arising mutational signature. Signature 30 has been associated with NTHL1 mutations (Drost et al. 2017). However, the contribution of Signature 30 remains modest (Appendix C.5).

Using techniques described by Behjati et al (Behjati et al. 2016), we did not identify a clear pattern of radiation-induced mutational processes. We note that the overall analysis of potential radiation-induced genomic features (Appendix C.6) is inconsistent and therefore inconclusive. Although the fraction of deletions in simple repeat regions is 4.2%, a number above what could be expected by chance given that simple repeats cover only 2.5% of the hg19 genome, the distribution of deletions across the genome displays variation, with distances between deletions ranging from 100 kb to 100 Mb (Figure 3.6), which is not in keeping with the even distribution of radiation-induced deletions described by Behjati et al (Behjati et al. 2016). The deletion microhomology fraction (8.6%) was low and we did not identify an increase in large inversion SV, which also does not support radiation-induced processes (Behjati et al. 2016). Although Behjati et al (2016) (Behjati et al.
2016) looked at inferred radiation-induced tumours rather than post-radiation treatment tumours, tumours exposed to radiotherapy could theoretically present similar genomic features. Our EP’s indels and substitution SNV features did not point towards radiation-associated processes. We cannot comment on whether radiotherapy in our patient played a role in porocarcinoma evolution and tumoural landscape (Behjati et al. 2016; Alexandrov et al. 2013a).

Whole genome porocarcinoma sequencing revealed a somatic focal copy loss at 3p21.3, which encompasses BAP1, another cell cycle gene frequently mutated in inherited and sporadic melanomas (Murali et al. 2013). Together, these findings indicate that cell cycle dysregulation likely plays a role in our patient's EP oncogenesis.

A 46 Mb deletion on chromosome 3 (Figure 3.10, chr3:149653091-196530353, hg19, Appendix C.3) leads to the fusion of RNF13 (intron 9 breakpoint) and PAK2 (intron 4 breakpoint). This genomic event is well supported in the transcriptome. This event leads to loss of the RNF13 RING domain, and as RNF13 is thought to be a ubiquitin ligase, its RING domain would be critical to this functionality (Zhang et al. 2009) and loss of function can be inferred from this genomic event. Although RNF13 activity was initially thought to favour cancer progression (Zhang et al. 2009), reduced RNF13 has also been shown to enhance metastasis (Cheng et al. 2015). The role of this gene in cancer pathogenesis remains unclear. Due to the genomic event, PAK2 loses its p21-RHO-binding domain, which is required for binding of CDC42 and RAC1 and the subsequent increase in PAK2 kinase activity. However, as PAK2 loses its GTPase-binding domain, necessary for dimerization and autoinhibition, this deletion event could lead to constitutive activation of this serine-threonine kinase.
PAK2 has a two-copy gain (total of 4 copies) and its expression is increased (100th percentile among ESCA TCGA cancers and 99th percentile among all TCGA cancers). Transcriptome data revealed the expression of a chimeric transcript fusing the first eight exons of RNF13 to exons 5 to 15 of PAK2. Transcriptome support for the event is high (more than 500 flanking read pairs, 386 spanning reads of which 203 are forward reads and 183 are reverse reads). PAK2 was shown to play a role in multiple cancer types including ovarian, endometrial, breast and others (Stofega et al. 2004; Siu et al. 2010; Flate and Stalvey 2014; Marlin et al. 2009; Siu et al. 2015; Radu et al. 2014). Taken together with the loss of the PAK2 GTPase-binding domain, the copy gain (4 copies in total) and high expression of PAK2 support a potential role for PAK2 in our patient’s EP oncogenesis, but a functional assay (e.g. phosphorylation assay) would be required to investigate this hypothesis further.

A 45 Mb deletion on chromosome 5 (Figure 3.11, chr5:67564688-112859542, hg19, Appendix C.3) leads to the fusion of PIK3R1 (intron 2 breakpoint) and YTHDC2 (intron 2 breakpoint). All critical YTHDC2 domains (RNA helicase related) are maintained, but the ssDNA/RNA binding domain (R3H) is lost due to the fusion; loss of function is therefore inferred from the absence of the R3H binding domain. The functional domain of PIK3R1 (encoded by exon 2), which is required for interaction with adaptor proteins and tyrosine kinases, is maintained. Transcriptome data revealed the expression of a chimeric transcript fusing the first two exons of PIK3R1 to exons 4 to 17 of YTHDC2. Transcriptome support for the event is high (37 flanking read pairs, 32 spanning reads of which 18 are forward reads and 14 are reverse reads).
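As a rough illustration of how the spanning-read counts quoted above could be checked for strand balance, the following Python sketch applies an exact two-sided binomial test under the null hypothesis that forward and reverse reads are equally likely. The function name and the choice of test are assumptions made for illustration only; they are not part of the fusion-calling pipeline used in this work.

```python
from math import comb


def strand_balance_pvalue(forward: int, reverse: int) -> float:
    """Exact two-sided binomial p-value for forward/reverse balance (p = 0.5).

    Hypothetical helper: a quick sanity check on fusion read support,
    not a replacement for a dedicated structural variant caller.
    """
    n = forward + reverse
    k = min(forward, reverse)
    # With p = 0.5 the binomial distribution is symmetric, so the two-sided
    # p-value is twice the lower tail, capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * lower_tail)


# Spanning-read counts reported in the text for the two fusion events.
fusions = {
    "RNF13-PAK2": (203, 183),
    "PIK3R1-YTHDC2": (18, 14),
}

for name, (fwd, rev) in fusions.items():
    p = strand_balance_pvalue(fwd, rev)
    print(f"{name}: {fwd + rev} spanning reads, "
          f"forward fraction {fwd / (fwd + rev):.2f}, p = {p:.3f}")
```

A large p-value here only indicates that the counts are compatible with balanced strand support; it says nothing about the biological relevance of the fusion.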
PIK3R1 showed increased expression (96th percentile among ESCA TCGA cancers, 87th percentile among all TCGA cancers) while YTHDC2 expression was slightly elevated (73rd percentile among ESCA TCGA cancers, 85th percentile among all TCGA cancers). While PIK3R1 is thought to have tumour suppressor properties by negatively regulating the epithelial-to-mesenchymal transition in renal cancer cells via the AKT/GSK3B/CTNNB1 pathway (Lin et al. 2015), some data suggest that YTHDC2 may act as an oncogene by enhancing the efficiency of the HNF1A translational process. HNF1A expression level was unremarkable (Supplementary Table S4). Therefore, the biological relevance of this structural variant remains unclear.

PTEN copy loss was observed in our case. Somatic PTEN copy loss and mutations have not been reported in human porocarcinoma, but are relatively frequent events in melanoma (Dillon and Miller 2014) and squamous cell carcinomas (Ming and He 2009). A mouse model of squamous cell carcinoma of the skin showed that epidermal Pten knockout leads to skin tumour formation via increased autocrine FGF signalling (Hertzler-Schaefer et al. 2014). Suzuki et al demonstrated that combining in vitro and animal models provides critical insight into skin tumour development (Suzuki et al. 2003). They created a keratinocyte-specific Pten Cre-loxP knockout mouse model. All k5Ptenflox/flox mice and 23% of k5Ptenflox/+ mice developed squamous papillomas and squamous cell carcinomas, and one mouse developed an eccrine sweat gland adenocarcinoma (or eccrine porocarcinoma), suggesting that PTEN loss may be a critical and early event in the development of EP (Suzuki et al. 2003). Our EP tumour displays a single-copy loss of PTEN, but PTEN expression was unremarkable (80th percentile for all TCGA cancers).

We observed several alterations of the PI3K-AKT-RAS pathway at the genomic and transcriptome levels.
Somatic copy loss of PTEN, a regulator of the PI3K-AKT-RAS pathway, may contribute to EGFR overexpression. Although the expression profile of AKT genes is unremarkable, this might be explained by the effects of downstream GSK3B copy gains (4 copies in total) and GSK3B high expression compared to TCGA average (100th percentile, Table 3.1).

EGFR, which has been hypothesized to be an oncogenic driver in porocarcinoma (Harms et al. 2016), has elevated expression (99th percentile for all TCGA cancers, 93rd percentile for TCGA-ESCA and 15.73-fold change from the mean Illumina BodyMap), but we identified no mutation in EGFR and the expression of EGFR ligands was unremarkable (Appendix C.2, Supplementary Table S4) (Schneider and Wolf 2009; Chen et al. 2016). We found a single-copy loss of LRIG1, a known EGFR inhibitor, in our EP tumour (Gotoh 2009). EP EGFR IHC showed strong positivity (Figure 3.1g), also supporting the overrepresentation of EGFR at the functional cellular level. FISH and immunochemistry studies suggest that EGFR inhibitors could hypothetically inhibit growth of metastatic adnexal tumours (Dias-Santagata et al. 2011). Although such therapeutic agents have not been studied clinically, they represent promising treatment avenues deserving further studies. SOS1/SOS2 overexpression combined with increased EGFR activity most likely explains the RAS pathway activation with high expression of KRAS and BRAF. As EGFR IHC has many caveats and no EGFR genomic alterations were detected, combining the transcriptome data and pathway analysis allowed us to identify EGFR as a potential therapeutic target to consider in EP management. EGFR phosphorylation assays may aid in determining if EGFR overexpression alone can be used as a marker for response to EGFR inhibitors or if EGFR site-specific phosphorylation may be associated with EGFR inhibitor response in a subset of wild-type EGFR cases, as demonstrated in non-small cell lung cancer (Sette et al. 2015).
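The expression comparisons quoted above (percentile within a TCGA cohort, fold change relative to a reference mean) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the percentile is taken as the share of cohort samples with expression at or below the sample of interest, the function names are hypothetical, and the cohort values are invented placeholders rather than TCGA or Illumina BodyMap data. The exact definitions used for the reported figures may differ.

```python
from bisect import bisect_right
from statistics import fmean
from typing import Sequence


def percentile_rank(value: float, cohort: Sequence[float]) -> float:
    """Percentage of cohort samples whose expression is <= value (assumed definition)."""
    ranked = sorted(cohort)
    return 100.0 * bisect_right(ranked, value) / len(ranked)


def fold_change(value: float, reference: Sequence[float]) -> float:
    """Ratio of the sample's expression to the mean of a reference set."""
    return value / fmean(reference)


# Placeholder expression values (arbitrary units), for illustration only.
cohort_expression = [2.1, 3.4, 5.0, 7.8, 9.9, 12.5, 14.0, 21.3, 30.2, 45.7]
normal_reference = [3.0, 4.0, 5.0]
tumour_expression = 40.0

print(f"percentile in cohort: {percentile_rank(tumour_expression, cohort_expression):.0f}")
print(f"fold change vs reference mean: {fold_change(tumour_expression, normal_reference):.2f}")
```

Reporting both measures is useful because a high percentile within a tumour cohort and a high fold change against normal tissue answer different questions about the same gene.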
EP KRAS overexpression was observed (Table 3.1) and is of interest given the previously described oncogenic role of RAS family genes in EP (Harms et al. 2016). NOTCH1, which can contribute to RAS pathway over-activation, was also overexpressed (Table 3.1). Recently, KRAS and PIK3CB signalling have been noted to have a direct relationship in oral squamous cell carcinomas and these oncogenes may become therapeutic targets in the future (Al-Rawi et al. 2014). Overall, unlike several well-characterized tumour types, such as esophageal carcinomas or skin melanomas, the genomic profiling of our EP tumour is unique and does not fit any specific pattern. In our case, the patient’s clinical situation deteriorated rapidly after resection of his cerebellar metastases due to leptomeningeal spread, and systemic therapy was not feasible, but advances in understanding EP pathophysiology coupled with novel targeted agents may soon offer better therapeutic strategies to treat patients with this rare tumour.

3.7 Conclusion

Our results suggest that the molecular pathophysiology of malignant EP features high complexity and subtle interactions of multiple key genes. Cell cycle dysregulation and CDKN2A LoF were found to be new potential drivers in EP tumourigenesis. Moreover, the combination of somatic copy number variants and abnormal gene expression, perhaps partly related to epigenetic mechanisms, likely contributes to the development of this rare malignancy in our patient.

Given GSK3B (glycogen synthase kinase 3 beta) amplification and high expression and copy losses of APC and CTNNB1, our data raise the possibility of Wnt pathway contribution to EP pathogenesis. No clear “targetable” pathway or genomic alteration was identified in our EP tumour, but given the rarity of this tumour as well as the paucity of EP genomic data available, determining the utility of genomic profiling in guiding EP management requires additional comprehensive genomic studies.
Specifically, further research is necessary to determine whether EP tumours display recurrent and potentially targetable mutations, or whether such tumours are molecularly heterogeneous and difficult to characterize. Moreover, complementary functional studies such as proteomics and detailed immunochemistry are needed to improve genomic profiling interpretation and assist in delineating the molecular pathophysiology of EP.

Chapter 4: Conclusions

The use of combined sequencing and data analysis methods was crucial in understanding the molecular pathophysiology of clinically unusual cancer cases in the two separate studies constituting Chapter 2 and Chapter 3.

In Chapter 2 of this thesis, I presented a study of patients with advanced cancers of diverse origins and germline or combined germline and somatic MUTYH aberrations (Thibodeau et al. 2019). I provided a detailed clinical and genomic description of a case of early-onset pancreatic ductal adenocarcinoma in a patient with biallelic germline MUTYH pathogenic variants and markedly elevated mutational signatures previously associated with defective MUTYH-mediated base excision repair. Additionally, I provided a brief summary of clinical and genomic features in two patients with breast cancer and combined germline and somatic biallelic MUTYH aberrations in which the same mutational signatures were present, although to a lesser extent. Interestingly, the tumours described in this report are not classically seen in MUTYH-associated polyposis syndrome. The use of tumour and paired normal DNA sequencing combined with transcriptome assembly and mutational signatures modelling provided useful insight into the functional impact of germline and somatic MUTYH variants through lines of evidence such as splicing aberrations and BER deficiency signatures.

In Chapter 3 of this thesis, I presented the detailed genomic analysis of a rare tumour, the eccrine porocarcinoma (Thibodeau et al. 2018).
This report illustrates the importance of comprehensive genomic profiling for understanding the cancer biology of a previously poorly characterized tumour. Although recurrent mutations in EGFR, KRAS and PIK3CA were reported previously in the literature, based on our genomic analyses, we found cell cycle dysregulation and CDKN2A loss of function to be a new potential driver in eccrine porocarcinoma tumourigenesis.

In summary, these reports highlight the power of combining diverse genomic and bioinformatic approaches to provide novel insights into clinically unusual cancer cases. This research work also highlights a need for more research on rare cancers, phenotypes or genotypes in order to elucidate the molecular pathophysiology underlying uncommon oncological entities.

Bibliography

1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, et al. 2015. A global reference for human genetic variation. Nature 526: 68–74. Aimé A, Coulet F, Lefevre JH, Colas C, Cervera P, Flejou J-F, Lascols O, Soubrier F, Parc Y. 2015. Somatic c.34G>T KRAS mutation: a new prescreening test for MUTYH-associated polyposis? Cancer Genet 208: 390–395. Alexandrov L, Kim J, Haradhvala NJ, Huang MN, Ng AWT, Boot A, Covington KR, Gordenin DA, Bergstrom E, Lopez-Bigas N, et al. 2018. The repertoire of mutational signatures in human cancer. BioRxiv 00: 1–29. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S, Biankin AV, Bignell GR, Bolli N, Borg A, Børresen-Dale A-L, et al. 2013a. Signatures of mutational processes in human cancer. Nature 500: 415–421. Alexandrov LB, Nik-Zainal S, Wedge DC, Campbell PJ, Stratton MR. 2013b. Deciphering signatures of mutational processes operative in human cancer. Cell Rep 3: 246–259. Al-Rawi N, Ghazi A, Merza M. 2014. PIK3CB and K-ras in oral squamous cell carcinoma. A possible cross-talk! J Orofac Sci 6: 99–103.
Aretz S, Tricarico R, Papi L, Spier I, Pin E, Horpaopan S, Cordisco EL, Pedroni M, Stienen D, Gentile A, et al. 2014. MUTYH-associated polyposis (MAP): evidence for the origin of the common European mutations p.Tyr179Cys and p.Gly396Asp by founder events. Eur J Hum Genet 22: 923–929. Auclair J, Busine MP, Navarro C, Ruano E, Montmain G, Desseigne F, Saurin JC, Lasset C, Bonadona V, Giraud S, et al. 2006. Systematic mRNA analysis for the effect of MLH1 and MSH2 missense and silent mutations on aberrant splicing. Hum Mutat 27: 145–154. Bailey P, Chang DK, Nones K, Johns AL, Patch A-M, Gingras M-C, Miller DK, Christ AN, Bruxner TJC, Quinn MC, et al. 2016. Genomic analyses identify molecular subtypes of pancreatic cancer. Nature 531: 47–52. Banda DM, Nuñez NN, Burnside MA, Bradshaw KM, David SS. 2017. Repair of 8-oxoG:A mismatches by the MUTYH glycosylase: Mechanism, metals and medicine. Free Radic Biol Med 107: 202–215. Behjati S, Gundem G, Wedge DC, Roberts ND, Tarpey PS, Cooke SL, Van Loo P, Alexandrov LB, Ramakrishna M, Davies H, et al. 2016. Mutational signatures of ionizing radiation in second malignancies. Nat Commun 7: 1–8. Belin L, Kamal M, Mauborgne C, Plancher C, Mulot F, Delord JP, Gonçalves A, Gavoille C, Dubot C, Isambert N, et al. 2017. Randomized phase II trial comparing molecularly targeted therapy based on tumor molecular profiling versus conventional therapy in patients with refractory cancer: cross-over analysis from the SHIVA trial. Ann Oncol 28: 590–596. Birol I, Jackman SD, Nielsen CB, Qian JQ, Varhol R, Stazyk G, Morin RD, Zhao Y, Hirst M, Schein JE, et al. 2009. De novo transcriptome assembly with ABySS. Bioinformatics 25: 2872–2877. Blake PW, Bradford PT, Devesa SS, Toro JR. 2010. Cutaneous appendageal carcinoma incidence and survival patterns in the United States: a population-based study. Arch Dermatol 146: 625–632. Bland A, Harrington EA, Dunn K, Pariani M, Platt JCK, Grove ME, Caleshu C. 2018.
Clinically impactful differences in variant interpretation between clinicians and testing laboratories: a single-center experience. Genet Med 20: 369–373. Boesaard EP, Vogelaar IP, Bult P, Wauters CA, van Krieken JHJ, Ligtenberg MJ, van der Post RS, Hoogerbrugge N. 2014. Germline MUTYH gene mutations are not frequently found in unselected patients with papillary breast carcinoma. Hered Cancer Clin Pract 12: 1–4. Boiteux S, Coste F, Castaing B. 2017. Repair of 8-oxo-7,8-dihydroguanine in prokaryotic and eukaryotic cells: Properties and biological roles of the Fpg and OGG1 DNA N-glycosylases. Free Radic Biol Med 107: 179–201. Bonetta L. 2010. Whole-genome sequencing breaks the cost barrier. Cell 141: 917–919. Boyd N, Dancey JE, Gilks CB, Huntsman DG. 2016. Rare cancers: a sea of opportunity. Lancet Oncol 17: 52–61. Butterfield YS, Kreitzman M, Thiessen N, Corbett RD, Li Y, Pang J, Ma YP, Jones SJM, Birol İ. 2014. JAGuaR: junction alignments to genome for RNA-seq reads. PLoS One 9: 1–6. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM. 2013. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45: 1113–1120. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, Jacobsen A, Byrne CJ, Heuer ML, Larsson E, et al. 2012. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov 2: 401–404. Chang H, Jackson DG, Kayne PS, Ross-Macdonald PB, Ryseck R-P, Siemers NO. 2011. Exome sequencing reveals comprehensive genomic alterations across eight cancer cell lines. PLoS One 6: 1–9. Cheadle JP, Sampson JR. 2007. MUTYH-associated polyposis--from defect in base excision repair to clinical genetic testing. DNA Repair (Amst) 6: 274–279. Chen J, Weiss WA. 2015. Alternative splicing in cancer: implications for biology and therapy. Oncogene 34: 1–14. Chen J, Zeng F, Forrester SJ, Eguchi S, Zhang M-Z, Harris RC. 2016. 
Expression and function of the epidermal growth factor receptor in physiology and disease. Physiol Rev 96: 1025–1069. Cheng H, Wang A, Meng J, Zhang Y, Zhu D. 2015. Enhanced metastasis in RNF13 knockout mice is mediated by a reduction in GM-CSF levels. Protein Cell 6: 746–756. Cheng KC, Cahill DS, Kasai H, Nishimura S, Loeb LA. 1992. 8-Hydroxyguanine, an abundant form of oxidative DNA damage, causes G>T and A>C substitutions. J Biol Chem 267: 166–172. Chiu R, Nip KM, Chu J, Birol I. 2018. TAP: a targeted clinical genomics pipeline for detecting transcript variants using RNA-seq data. BMC Med Genomics 11: 1–9. Climente-González H, Porta-Pardo E, Godzik A, Eyras E. 2017. The functional impact of alternative splicing in cancer. Cell Rep 20: 2215–2226. Davies H, Glodzik D, Morganella S, Yates LR, Staaf J, Zou X, Ramakrishna M, Martin S, Boyault S, Sieuwerts AM, et al. 2017. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat Med 23: 517–525. Dias-Santagata D, Lam Q, Bergethon K, Baker GM, Iafrate AJ, Rakheja D, Hoang MP. 2011. A potential role for targeted therapy in a subset of metastasizing adnexal carcinomas. Mod Pathol 24: 974–982. Dillon LM, Miller TW. 2014. Therapeutic targeting of cancers with loss of PTEN function. Curr Drug Targets 15: 65–79. Ding J, Bashashati A, Roth A, Oloumi A, Tse K, Zeng T, Haffari G, Hirst M, Marra MA, Condon A, et al. 2012. Feature-based classifiers for somatic mutation detection in tumour-normal paired sequencing data. Bioinformatics 28: 167–175. Dizdaroglu M, Coskun E, Jaruga P. 2017. Repair of oxidatively induced DNA damage by DNA glycosylases: Mechanisms of action, substrate specificities and excision kinetics. Mutat Res 771: 99–127. Drost J, van Boxtel R, Blokzijl F, Mizutani T, Sasaki N, Sasselli V, de Ligt J, Behjati S, Grolleman JE, van Wezel T, et al. 2017. Use of CRISPR-modified human stem cell organoids to study the origin of mutational signatures in cancer. Science 358: 234–238.
Egashira A, Yamauchi K, Yoshiyama K, Kawate H, Katsuki M, Sekiguchi M, Sugimachi K, Maki H, Tsuzuki T. 2002. Mutational specificity of mice defective in the MTH1 and/or the MSH2 genes. DNA Repair (Amst) 1: 881–893. Eggington JM, Bowles KR, Moyes K, Manley S, Esterling L, Sizemore S, Rosenthal E, Theisen A, Saam J, Arnell C, et al. 2014. A comprehensive laboratory-based program for classification of variants of uncertain significance in hereditary cancer genes. Clin Genet 86: 229–237. Etzler J, Peyrl A, Zatkova A, Schildhaus HU, Ficek A, Merkelbach-Bruse S, Kratz CP, Attarbaschi A, Hainfellner JA, Yao S, et al. 2008. RNA-based mutation analysis identifies an unusual MSH6 splicing defect and circumvents PMS2 pseudogene interference. Hum Mutat 29: 299–305. Flate E, Stalvey JRD. 2014. Motility of select ovarian cancer cell lines: effect of extra-cellular matrix proteins and the involvement of PAK2. Int J Oncol 45: 1401–1411. Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, et al. 2014. Ensembl 2014. Nucleic Acids Res 42: 749–755. Fokkema IFAC, Taschner PEM, Schaafsma GCP, Celli J, Laros JFJ, den Dunnen JT. 2011. LOVD v.2.0: the next generation in gene variant databases. Hum Mutat 32: 557–563. Forbes SA, Beare D, Boutselakis H, Bamford S, Bindal N, Tate J, Cole CG, Ward S, Dawson E, Ponting L, et al. 2017. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res 45: 777–783. Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, Sun Y, Jacobsen A, Sinha R, Larsson E, et al. 2013. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal 6: 2064–2077. Gatta G, Capocaccia R, Botta L, Mallone S, De Angelis R, Ardanaz E, Comber H, Dimitrova N, Leinonen MK, Siesling S, et al. 2017. Burden and centralised treatment in Europe of rare tumours: results of RARECAREnet-a population-based study. Lancet Oncol 18: 1022–1039. 
Gonzalez-Perez A, Mustonen V, Reva B, Ritchie GRS, Creixell P, Karchin R, Vazquez M, Fink JL, Kassahn KS, Pearson JV, et al. 2013. Computational approaches to identify functional genetic variants in cancer genomes. Nat Methods 10: 723–729. Gotoh N. 2009. Feedback inhibitors of the epidermal growth factor receptor signaling pathways. Int J Biochem Cell Biol 41: 511–515. Grossman RL, Heath AP, Ferretti V, Varmus HE, Lowy DR, Kibbe WA, Staudt LM. 2016. Toward a shared vision for cancer genomic data. N Engl J Med 375: 1109–1112. Ha G, Roth A, Lai D, Bashashati A, Ding J, Goya R, Giuliany R, Rosner J, Oloumi A, Shumansky K, et al. 2012. Integrative analysis of genome-wide loss of heterozygosity and monoallelic expression at nucleotide resolution reveals disrupted pathways in triple-negative breast cancer. Genome Res 22: 1995–2007. Haradhvala NJ, Kim J, Maruvka YE, Polak P, Rosebrock D, Livitz D, Hess JM, Leshchiner I, Kamburov A, Mouw KW, et al. 2018. Distinct mutational signatures characterize concurrent loss of polymerase proofreading and mismatch repair. Nat Commun 9: 1–9. Harms PW, Hovelson DH, Cani AK, Omata K, Haller MJ, Wang ML, Arps D, Patel RM, Fullen DR, Wang M, et al. 2016. Porocarcinomas harbor recurrent HRAS-activating mutations and tumor suppressor inactivating mutations. Hum Pathol 51: 25–31. Hertzler-Schaefer K, Mathew G, Somani A-K, Tholpady S, Kadakia MP, Chen Y, Spandau DF, Zhang X. 2014. Pten loss induces autocrine FGF signaling to promote skin tumorigenesis. Cell Rep 6: 818–826. Herwig R, Hardt C, Lienhard M, Kamburov A. 2016. Analyzing and interpreting genome data at the network level with ConsensusPathDB. Nat Protoc 11: 1889–1907. Hill VK, Gartner JJ, Samuels Y, Goldstein AM. 2013. The genetics of melanoma: recent advances. Annu Rev Genomics Hum Genet 14: 257–279. Hirotsu Y, Nakagomi H, Sakamoto I, Amemiya K, Oyama T, Mochizuki H, Omata M. 2015.
Multigene panel analysis identified germline mutations of DNA repair genes in breast and ovarian cancer. Mol Genet Genomic Med 3: 459–466. Hoffman-Andrews L. 2017. The known unknown: the challenges of genetic variants of uncertain significance in clinical practice. J Law Biosci 4: 648–657. Hu N, Wang C, Su H, Li W-J, Emmert-Buck MR, Li G, Roth MJ, Tang Z-Z, Lu N, Giffen C, et al. 2004. High frequency of CDKN2A alterations in esophageal squamous cell carcinoma from a high-risk Chinese population. Genes Chromosomes Cancer 39: 205–216. Huang K-L, Mashl RJ, Wu Y, Ritter DI, Wang J, Oh C, Paczkowska M, Reynolds S, Wyczalkowski MA, Oak N, et al. 2018. Pathogenic germline variants in 10,389 adult cancers. Cell 173: 355–37. Jayasinghe RG, Cao S, Gao Q, Wendl MC, Vo NS, Reynolds SM, Zhao Y, Climente-González H, Chai S, Wang F, et al. 2018. Systematic Analysis of Splice-Site-Creating Mutations in Cancer. Cell Rep 23: 270–281. Johnson DB, Dahlman KH, Knol J, Gilbert J, Puzanov I, Means-Powell J, Balko JM, Lovly CM, Murphy BA, Goff LW, et al. 2014. Enabling a genetically informed approach to cancer medicine: a retrospective evaluation of the impact of comprehensive tumor profiling using a targeted next-generation sequencing panel. Oncologist 19: 616–622. Jones S, Lambert S, Williams GT, Best JM, Sampson JR, Cheadle JP. 2004. Increased frequency of the k-ras G12C mutation in MYH polyposis colorectal adenomas. Br J Cancer 90: 1591–1593. Jones SJ, Laskin J, Li YY, Griffith OL, An J, Bilenky M, Butterfield YS, Cezard T, Chuah E, Corbett R, et al. 2010. Evolution of an adenocarcinoma in response to selection by targeted kinase inhibitors. Genome Biol 11: 1-12. Jung H, Lee D, Lee J, Park D, Kim YJ, Park W-Y, Hong D, Park PJ, Lee E. 2015. Intron retention is a widespread mechanism of tumor-suppressor inactivation. Nat Genet 47: 1242–1248. Kairupan CF, Meldrum CJ, Crooks R, Milward EA, Spigelman AD, Burgess B, Groombridge C, Kirk J, Tucker K, Ward R, et al. 2005. 
Mutation analysis of the MYH gene in an Australian series of colorectal polyposis patients with or without germline APC mutations. Int J Cancer 116: 73–77. Kalinsky K, Jacks LM, Heguy A, Patil S, Drobnjak M, Bhanot UK, Hedvat CV, Traina TA, Solit D, Gerald W, et al. 2009. PIK3CA mutation associates with improved outcome in breast cancer. Clin Cancer Res 15: 5049–5059. Katsila T, Viennas E, Bartsakoulia M, Komianou A, Sarris K, Tzimas G, Patrinos GP. 2018. Chapter 10: Human genomic databases in translational medicine. In Human Genome Informatics 195–222. Elsevier. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. 2013. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14: 1–13. Kim H, Kim Y-M. 2018. Pan-cancer analysis of somatic mutations and transcriptomes reveals common functional gene clusters shared by multiple cancer types. Sci Rep 8: 1–14. Klungland A, Rosewell I, Hollenbach S, Larsen E, Daly G, Epe B, Seeberg E, Lindahl T, Barnes DE. 1999. Accumulation of premutagenic DNA lesions in mice defective in removal of oxidative base damage. Proc Natl Acad Sci USA 96: 13300–13305. Knijnenburg TA, Bismeijer T, Wessels LFA, Shmulevich I. 2015. A multilevel pan-cancer map links gene mutations to cancer hallmarks. Chin J Cancer 34: 439–449. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. 2009. Circos: an information aesthetic for comparative genomics. Genome Res 19: 1639–1645. Kurzrock R, Giles FJ. 2015. Precision oncology for patients with advanced cancer: the challenges of malignant snowflakes. Cell Cycle 14: 2219–2221. Landrum MJ, Lee JM, Benson M, Brown G, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Hoover J, et al. 2016. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res 44: 862–868. Laskin J, Jones S, Aparicio S, Chia S, Ch’ng C, Deyell R, Eirew P, Fok A, Gelmon K, Ho C, et al. 2015.
Lessons learned from the application of whole-genome analysis to the treatment of patients with advanced cancers. Cold Spring Harb Mol Case Stud 1: 1–14. Lejbkowicz F, Cohen I, Barnett-Griness O, Pinchev M, Poynter J, Gruber SB, Rennert G. 2012. Common MUTYH mutations and colorectal cancer risk in multiethnic populations. Fam Cancer 11: 329–335. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O’Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, et al. 2016. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536: 285–291. Li H. 2011. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27: 2987–2993. Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. ArXiv 00: 1–3. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079. Li MM, Datto M, Duncavage EJ, Kulkarni S, Lindeman NI, Roy S, Tsimberidou AM, Vnencak-Jones CL, Wolff DJ, Younes A, et al. 2017. Standards and guidelines for the interpretation and reporting of sequence variants in cancer: A joint consensus recommendation of the Association for Molecular Pathology, American Society of Clinical Oncology, and College of American Pathologists. J Mol Diagn 19: 4–23. Lin Y, Yang Z, Xu A, Dong P, Huang Y, Liu H, Li F, Wang H, Xu Q, Wang Y, et al. 2015. PIK3R1 negatively regulates the epithelial-mesenchymal transition and stem-like phenotype of renal cancer cells through the AKT/GSK3β/CTNNB1 signaling pathway. Sci Rep 5: 1–12. Ma X, Liu Y, Liu Y, Alexandrov LB, Edmonson MN, Gawad C, Zhou X, Li Y, Rusch MC, Easton J, et al. 2018. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature 555: 371–376.
Madhavan S, Ritter D, Micheel C, Rao S, Roy A, Sonkin D, Mccoy M, Griffith M, Griffith OL, Mcgarvey P, et al. 2018. ClinGen Cancer Somatic Working Group - standardizing and democratizing access to cancer molecular diagnostic data to drive translational research. Pac Symp Biocomput 23: 247–258. Mandelker D, Zhang L, Kemel Y, Stadler ZK, Joseph V, Zehir A, Pradhan N, Arnold A, Walsh MF, Li Y, et al. 2017. Mutation Detection in Patients With Advanced Cancer by Universal Sequencing of Cancer-Related Genes in Tumor and Normal DNA vs Guideline-Based Germline Testing. JAMA 318: 825–835. Marlin JW, Eaton A, Montano GT, Chang Y-WE, Jakobi R. 2009. Elevated p21-activated kinase 2 activity results in anchorage-independent growth and resistance to anticancer drug-induced cell death. Neoplasia 11: 286–297. Martincorena I, Roshan A, Gerstung M, Ellis P, Van Loo P, McLaren S, Wedge DC, Fullam A, Alexandrov LB, Tubio JM, et al. 2015. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348: 880–886. McGranahan N, Favero F, de Bruin EC, Birkbak NJ, Szallasi Z, Swanton C. 2015. Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci Transl Med 7: 1–12. McPherson A, Hormozdiari F, Zayed A, Giuliany R, Ha G, Sun MGF, Griffith M, Heravi Moussavi A, Senz J, Melnyk N, et al. 2011. deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data. PLoS Comput Biol 7: 1–16. Mertens F, Johansson B, Fioretos T, Mitelman F. 2015. The emerging complexity of gene fusions in cancer. Nat Rev Cancer 15: 371–381. Middlebrooks CD, Banday AR, Matsuda K, Udquim K-I, Onabajo OO, Paquin A, Figueroa JD, Zhu B, Koutros S, Kubo M, et al. 2016. Association of germline variants in the APOBEC3 region with cancer risk and enrichment with APOBEC-signature mutations in tumors. Nat Genet 48: 1330–1338. Ming M, He Y-Y. 2009. PTEN: new insights into its regulation and function in skin cancer. 
J Invest Dermatol 129: 2109–2112. Minowa O, Arai T, Hirano M, Monden Y, Nakai S, Fukuda M, Itoh M, Takano H, Hippou Y, Aburatani H, et al. 2000. Mmh/Ogg1 gene inactivation results in accumulation of 8-hydroxyguanine in mice. Proc Natl Acad Sci USA 97: 4156–4161. Moghadasi S, Eccles DM, Devilee P, Vreeswijk MPG, van Asperen CJ. 2016. Classification and clinical management of variants of uncertain significance in high penetrance cancer predisposition genes. Hum Mutat 37: 331–336. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 2008. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628. Mur P, Jemth A-S, Bevc L, Amaral N, Navarro M, Valdés-Mas R, Pons T, Aiza G, Urioste M, Valencia A, et al. 2018. Germline variation in the oxidative DNA repair genes NUDT1 and OGG1 is not associated with hereditary colorectal cancer or polyposis. Hum Mutat 39: 1214–1225. Murali R, Wiesner T, Scolyer RA. 2013. Tumours associated with BAP1 mutations. Pathology 45: 116–126. Nik-Zainal S, Alexandrov LB, Wedge DC, Van Loo P, Greenman CD, Raine K, Jones D, Hinton J, Marshall J, Stebbings LA, et al. 2012. Mutational processes molding the genomes of 21 breast cancers. Cell 149: 979–993. Pacifico A, Goldberg LH, Peris K, Chimenti S, Leone G, Ananthaswamy HN. 2008. Loss of CDKN2A and p14ARF expression occurs frequently in human nonmelanoma skin cancers. Br J Dermatol 158: 291–297. Painter C, Dunphy M, Anastasio E, McGillicuddy M, Anderka K, Larkin K, Lennon N, Chen Y-L, Lander E, Golub T, et al. 2017. The Angiosarcoma Project: Generating the genomic landscape of a rare cancer through a direct-to-patient initiative. J Clin Oncol 35: 1519–1519. Peng L, Bian XW, Li DK, Xu C, Wang GM, Xia QY, Xiong Q. 2015. Large-scale RNA-Seq Transcriptome Analysis of 4043 Cancers and 548 Normal Tissue Controls across 12 TCGA Cancer Types. Sci Rep 5: 1-–.
Pilati C, Shinde J, Alexandrov LB, Assié G, André T, Hélias-Rodzewicz Z, Ducoudray R, Le Corre D, Zucman-Rossi J, Emile J-F, et al. 2017. Mutational signature analysis identifies MUTYH deficiency in colorectal cancers and adrenocortical carcinomas. J Pathol 242: 10–15. Radu M, Semenova G, Kosoff R, Chernoff J. 2014. PAK signalling during the development and progression of cancer. Nat Rev Cancer 14: 13–25. Reisle C, Mungall KL, Choo C, Paulino D, Bleile DW, Muhammadzadeh A, Mungall AJ, Moore RA, Shlafman I, Coope R, et al. 2019. MAVIS: merging, annotation, validation, and illustration of structural variants. Bioinformatics 35: 515–517. Rhine CL, Cygan KJ, Soemedi R, Maguire S, Murray MF, Monaghan SF, Fairbrother WG. 2018. Hereditary cancer genes are highly susceptible to splicing mutations. PLoS Genet 14: 1–18. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, Grody WW, Hegde M, Lyon E, Spector E, et al. 2015. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 17: 405–424. Riera-Leal L, Guevara-Gutiérrez E, Barrientos-García JG, Madrigal-Kasem R, Briseño-Rodríguez G, Tlacuilo-Parra A. 2015. Eccrine porocarcinoma: epidemiologic and histopathologic characteristics. Int J Dermatol 54: 580–586. Rivière J-B, Mirzaa GM, O’Roak BJ, Beddaoui M, Alcantara D, Conway RL, St-Onge J, Schwartzentruber JA, Gripp KW, Nikkel SM, et al. 2012. De novo germline and postzygotic mutations in AKT3, PIK3R2 and PIK3CA cause a spectrum of related megalencephaly syndromes. Nat Genet 44: 934–940. Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, Mungall K, Lee S, Okada HM, Qian JQ, et al. 2010. De novo assembly and analysis of RNA-seq data. Nat Methods 7: 909–912. Romanel A, Zhang T, Elemento O, Demichelis F. 2017. EthSEQ: ethnicity annotation from whole exome sequencing data. Bioinformatics 33: 2402–2404. 
Sakai A, Nakanishi M, Yoshiyama K, Maki H. 2006. Impact of reactive oxygen species on spontaneous mutagenesis in Escherichia coli. Genes Cells 11: 767–778.
Sakamoto K, Tominaga Y, Yamauchi K, Nakatsu Y, Sakumi K, Yoshiyama K, Egashira A, Kura S, Yao T, Tsuneyoshi M, et al. 2007. MUTYH-null mice are susceptible to spontaneous and oxidative stress induced intestinal tumorigenesis. Cancer Res 67: 6599–6604.
Sakumi K, Tominaga Y, Furuichi M, Xu P, Tsuzuki T, Sekiguchi M, Nakabeppu Y. 2003. Ogg1 knockout-associated lung tumorigenesis and its suppression by Mth1 gene disruption. Cancer Res 63: 902–905.
Saridaki Z, Liloglou T, Zafiropoulos A, Koumantaki E, Zoras O, Spandidos DA. 2003. Mutational analysis of CDKN2A genes in patients with squamous cell carcinoma of the skin. Br J Dermatol 148: 638–648.
Saunders CT, Wong WSW, Swamy S, Becq J, Murray LJ, Cheetham RK. 2012. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28: 1811–1817.
Scarpa A, Chang DK, Nones K, Corbo V, Patch A-M, Bailey P, Lawlor RT, Johns AL, Miller DK, Mafficini A, et al. 2017. Whole-genome landscape of pancreatic neuroendocrine tumours. Nature 543: 65–71.
Schadt EE, Turner S, Kasarskis A. 2010. A window into third-generation sequencing. Hum Mol Genet 19: 227–240.
Schneider MR, Wolf E. 2009. The epidermal growth factor receptor ligands at a glance. J Cell Physiol 218: 460–466.
Seiler M, Peng S, Agrawal AA, Palacino J, Teng T, Zhu P, Smith PG, Cancer Genome Atlas Research Network, Buonamici S, Yu L. 2018. Somatic Mutational Landscape of Splicing Factor Genes and Their Functional Consequences across 33 Cancer Types. Cell Rep 23: 282–296.
Seoane J, De Mattos-Arruda L. 2014. The challenge of intratumour heterogeneity in precision medicine. J Intern Med 276: 41–51.
Sette G, Salvati V, Mottolese M, Visca P, Gallo E, Fecchi K, Pilozzi E, Duranti E, Policicchio E, Tartaglia M, et al. 2015. Tyr1068-phosphorylated epidermal growth factor receptor (EGFR) predicts cancer stem cell targeting by erlotinib in preclinical models of wild-type EGFR lung cancer. Cell Death Dis 6: 1–11.
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. 2001. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29: 308–311.
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I. 2009. ABySS: a parallel assembler for short read sequence data. Genome Res 19: 1117–1123.
Siu MKY, Kong DSH, Ngai SYP, Chan HY, Jiang L, Wong ESY, Liu SS, Chan KKL, Ngan HYS, Cheung ANY. 2015. p21-Activated Kinases 1, 2 and 4 in Endometrial Cancers: Effects on Clinical Outcomes and Cell Proliferation. PLoS One 10: 1–13.
Siu MKY, Wong ESY, Chan HY, Kong DSH, Woo NWS, Tam KF, Ngan HYS, Chan QKY, Chan DCW, Chan KYK, et al. 2010. Differential expression and phosphorylation of Pak1 and Pak2 in ovarian cancer: effects on prognosis and cell invasion. Int J Cancer 127: 21–31.
Smith LM, Sharif S, Brand R, Fink E, Lamb J, Whitcomb DC. 2009. MUTYH exon 7 and 13 mutations associated with colorectal cancer (MAP syndrome) are not commonly associated with sporadic pancreatic cancer. Pancreatology 9: 793–796.
Stofega MR, Sanders LC, Gardiner EM, Bokoch GM. 2004. Constitutive p21-activated kinase (PAK) activation in breast cancer cells as a result of mislocalization of PAK to focal adhesions. Mol Biol Cell 15: 2965–2977.
Sukhai MA, Craddock KJ, Thomas M, Hansen AR, Zhang T, Siu L, Bedard P, Stockley TL, Kamel-Reid S. 2016. A classification system for clinical relevance of somatic variants identified in molecular profiling of cancer. Genet Med 18: 128–136.
Supek F, Miñana B, Valcárcel J, Gabaldón T, Lehner B. 2014. Synonymous mutations frequently act as driver mutations in human cancers. Cell 156: 1324–1335.
Suzuki A, Itami S, Ohishi M, Hamada K, Inoue T, Komazawa N, Senoo H, Sasaki T, Takeda J, Manabe M, et al. 2003. Keratinocyte-specific Pten deficiency results in epidermal hyperplasia, accelerated hair follicle morphogenesis and tumor formation. Cancer Res 63: 674–681.
Suzuki H, Zhou X, Yin J, Lei J, Jiang HY, Suzuki Y, Chan T, Hannon GJ, Mergner WJ, Abraham JM. 1995. Intragenic mutations of CDKN2B and CDKN2A in primary human esophageal cancers. Hum Mol Genet 4: 1883–1887.
Suzuki T, Harashima H, Kamiya H. 2010. Effects of base excision repair proteins on mutagenesis by 8-oxo-7,8-dihydroguanine (8-hydroxyguanine) paired with cytosine and adenine. DNA Repair (Amst) 9: 542–550.
Sveen A, Kilpinen S, Ruusulehto A, Lothe RA, Skotheim RI. 2016. Aberrant RNA splicing in cancer; expression changes and driver mutations of splicing factor genes. Oncogene 35: 2413–2427.
Takata M, Hashimoto K, Mehregan P, Lee MW, Yamamoto A, Mohri S, Ohara K, Takehara K. 2000. Genetic changes in sweat gland carcinomas. J Cutan Pathol 27: 30–35.
Taki K, Sato Y, Nomura S, Ashihara Y, Kita M, Tajima I, Sugano K, Arai M. 2016. Mutation analysis of MUTYH in Japanese colorectal adenomatous polyposis patients. Fam Cancer 15: 261–265.
Thibodeau ML, Bonakdar M, Zhao E, Mungall KL, Reisle C, Zhang W, Bye MH, Thiessen N, Bleile D, Mungall AJ, et al. 2018. Whole genome and whole transcriptome genomic profiling of a metastatic eccrine porocarcinoma. npj Precision Onc 2: 1–6.
Thibodeau ML, Reisle C, Zhao E, Martin LA, Alwelaie Y, Mungall KL, Ch’ng C, Thomas R, Ng T, Yip S, et al. 2017. Genomic profiling of pelvic genital type leiomyosarcoma in a woman with a germline CHEK2:c.1100delC mutation and a concomitant diagnosis of metastatic invasive ductal breast carcinoma. Cold Spring Harb Mol Case Stud 3: 1–18.
Thibodeau ML, Zhao EY, Reisle C, Ch’ng C, Wong H-L, Shen Y, Jones MR, Lim HJ, Young S, Cremin C, et al. 2019. Base excision repair deficiency signatures implicate germline and somatic MUTYH aberrations in pancreatic ductal adenocarcinoma and breast cancer oncogenesis. Cold Spring Harb Mol Case Stud 5: 1–18.
Thorvaldsdóttir H, Robinson JT, Mesirov JP. 2013. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinformatics 14: 178–192.
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L. 2012. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7: 562–578.
Tsang H, Addepalli K, Davis SR. 2017. Resources for interpreting variants in precision genomic oncology applications. Front Oncol 7: 1–10.
Tsujita J, Kaku Y, Ichiki T, Eto A, Maemura H, Otsuka A, Nakaie R, Kitagawa N, Morioka Y, Matsuda T, et al. 2015. Immunohistological Expression of p16INK4a is Commonly Present Both in Benign and Malignant Sweat Gland Neoplasias. Fukuoka Igaku Zasshi 106: 323–329.
Tsuzuki T, Egashira A, Igarashi H, Iwakuma T, Nakatsuru Y, Tominaga Y, Kawate H, Nakao K, Nakamura K, Ide F, et al. 2001. Spontaneous tumorigenesis in mice defective in the MTH1 gene encoding 8-oxo-dGTPase. Proc Natl Acad Sci USA 98: 11456–11461.
Varley JM, Attwooll C, White G, McGown G, Thorncroft M, Kelsey AM, Greaves M, Boyle J, Birch JM. 2001. Characterization of germline TP53 splicing mutations and their genetic and functional analysis. Oncogene 20: 2647–2654.
Viel A, Bruselles A, Meccia E, Fornasarig M, Quaia M, Canzonieri V, Policicchio E, Urso ED, Agostini M, Genuardi M, et al. 2017. A Specific Mutational Signature Associated with DNA 8-Oxoguanine Persistence in MUTYH-defective Colorectal Cancer. EBioMedicine 20: 39–49.
Wadt K, Choi J, Chung J-Y, Kiilgaard J, Heegaard S, Drzewiecki KT, Trent JM, Hewitt SM, Hayward NK, Gerdes A-M, et al. 2012. A cryptic BAP1 splice mutation in a family with uveal and cutaneous melanoma, and paraganglioma. Pigment Cell Melanoma Res 25: 815–818.
Wagner AH, Coffman AC, Ainscough BJ, Spies NC, Skidmore ZL, Campbell KM, Krysiak K, Pan D, McMichael JF, Eldred JM, et al. 2016. DGIdb 2.0: mining clinically relevant drug-gene interactions. Nucleic Acids Res 44: 1036–1044.
Walker LC, Whiley PJ, Couch FJ, Farrugia DJ, Healey S, Eccles DM, Lin F, Butler SA, Goff SA, Thompson BA, et al. 2010. Detection of splicing aberrations caused by BRCA1 and BRCA2 sequence variants encoding missense substitutions: implications for prediction of pathogenicity. Hum Mutat 31: 1484–1505.
Wang E, Zou J, Zaman N, Beitel LK, Trifiro M, Paliouras M. 2013. Cancer systems biology in the genome sequencing era: part 2, evolutionary dynamics of tumor clonal networks and drug resistance. Semin Cancer Biol 23: 286–292.
Wappenschmidt B, Becker AA, Hauke J, Weber U, Engert S, Köhler J, Kast K, Arnold N, Rhiem K, Hahnen E, et al. 2012. Analysis of 30 putative BRCA1 splicing mutations in hereditary breast and ovarian cancer families identifies exonic splice site mutations that escape in silico prediction. PLoS One 7: 1–9.
Wasielewski M, Out AA, Vermeulen J, Nielsen M, van den Ouweland A, Tops CMJ, Wijnen JT, Vasen HFA, Weiss MM, Klijn JGM, et al. 2010. Increased MUTYH mutation frequency among Dutch families with breast cancer and colorectal cancer. Breast Cancer Res Treat 124: 635–641.
Waszak SM, Tiao G, Zhu B, Rausch T, Muyas F, Rodriguez-Martin B, Rabionet R, Yakneen S, Escaramis G, Li Y, et al. 2017. Germline determinants of the somatic mutation landscape in 2,642 cancer genomes. BioRxiv 00: 1–26.
Weren RDA, Ligtenberg MJL, Kets CM, de Voer RM, Verwiel ETP, Spruijt L, van Zelst-Stams WAG, Jongmans MC, Gilissen C, Hehir-Kwa JY, et al. 2015. A germline homozygous mutation in the base-excision repair gene NTHL1 causes adenomatous polyposis and colorectal cancer. Nat Genet 47: 668–671.
Win AK, Cleary SP, Dowty JG, Baron JA, Young JP, Buchanan DD, Southey MC, Burnett T, Parfrey PS, Green RC, et al. 2011. Cancer risks for monoallelic MUTYH mutation carriers with a family history of colorectal cancer. Int J Cancer 129: 2256–2262.
Yang C, Ceyhan-Birsoy O, Mandelker D, Jairam S, Catchings A, O’Reilly EM, Walsh MF, Zhang L. 2019. A synonymous germline variant PALB2 c.18G>T (p.Gly6=) disrupts normal splicing in a family with pancreatic and breast cancers. Breast Cancer Res Treat 173: 79–86.
Yang C-Y, Liau J-Y, Huang W-J, Chang Y-T, Chang M-C, Lee J-C, Tsai J-H, Su Y-N, Hung C-C, Jeng Y-M. 2015. Targeted next-generation sequencing of cancer genes identified frequent TP53 and ATRX mutations in leiomyosarcoma. Am J Transl Res 7: 2072–2081.
Zhang J, Baran J, Cros A, Guberman JM, Haider S, Hsu J, Liang Y, Rivkin E, Wang J, Whitty B, et al. 2011. International Cancer Genome Consortium Data Portal–a one-stop shop for cancer genomics data. Database (Oxford) 2011: 1–11.
Zhang Q, Meng Y, Zhang L, Chen J, Zhu D. 2009. RNF13: a novel RING-type ubiquitin ligase over-expressed in pancreatic cancer. Cell Res 19: 348–357.
Zhao EY, Shen Y, Pleasance E, Kasaian K, Leelakumari S, Jones M, Bose P, Ch’ng C, Reisle C, Eirew P, et al. 2017. Homologous Recombination Deficiency and Platinum-Based Therapy Outcomes in Advanced Breast Cancer. Clin Cancer Res 23: 7521–7530.
Zhou L, Baba Y, Kitano Y, Miyake K, Zhang X, Yamamura K, Kosumi K, Kaida T, Arima K, Taki K, et al. 2016. KRAS, BRAF, and PIK3CA mutations, and patient prognosis in 126 pancreatic cancers: pyrosequencing technology and literature review. Med Oncol 33: 1–8.

Appendices

Appendix A List of 98 Mendelian hereditary cancer predisposition genes

Main organ system associated	Genes
Nervous system (e.g. neuroblastoma)	NF1, NF2, TSC1, TSC2, AKT1, ALK, EZH2, NSD1, RB1, PHOX2B, SUFU
Breast and ovary	BRCA1, BRCA2, PTEN, CHEK2, ATR, RAD50, RAD51, RAD51B, RAD51C, RAD51D, BARD1, BLM, BRIP1, NBN, AXIN2, MRE11A, PALB2
Endocrine and neuroendocrine	MEN1, SDHAF2, SDHA, SDHB, SDHD, SDHC, VHL, RET, PRKAR1A, TMEM127, MAX, CDC73, CDKN1B
Gastric	CDH1, KIT, PDGFRA
Colorectal (e.g. Lynch, polyposis) and other gastrointestinal sites (e.g. duodenum)	APC, SMAD4, MLH1, MSH2, MSH6, PMS1, PMS2, MUTYH, STK11, EPCAM, GREM1, POLD1, POLE, TGFBR1
Kidney	BMPR1A, FLCN, FH, MET, WT1, HNF1A
Skin (e.g. melanoma)	CDKN2A, PTCH1, MITF, BAP1, CDK4
Hematological (e.g. dyskeratosis congenita)	TERT, DKC1, TERC, TINF2, GATA2, ATM, WRN, RECQL4, RUNX1, ERCC2, ERCC3, ERCC4, ERCC5, CBL, ETV6, FAM175A, FANCA, FANCC, PTPN11, PAX5, SH2D1A
Musculoskeletal, connective tissues (e.g. sarcoma, rhabdoid)	TP53, SMARCB1, SMARCA4
Mosaic disorders (diverse cancer types)	IDH1, PIK3CA, AKT1, HRAS
Miscellaneous	DICER1, EGFR

Appendix B Chapter 2

B.1 Distribution of the most common cancer types (Oncotree codes) across our 731 cases of advanced cancers. Tumour types with 5 or more cases in the POG cohort are shown.

Oncotree tumour type	count
Breast Invasive Ductal Carcinoma (IDC)	135
Colorectal Adenocarcinoma (COAD)	78
Pancreatic Adenocarcinoma (PAAD)	51
Lung Adenocarcinoma (LUAD)	48
High-Grade Serous Ovarian Cancer (HGSOC)	13
Leiomyosarcoma (LMS)	12
Cholangiocarcinoma (CHOL)	10
Pancreatic Neuroendocrine Tumor (PANET)	10
NA	10
Breast Invasive Lobular Carcinoma (ILC)	9
Melanoma (MEL)	9
Osteosarcoma (OS)	9
Undifferentiated Pleomorphic Sarcoma/Malignant Fibrous Histiocytoma/High-Grade Spindle Cell Sarcoma (MFH)	9
Uveal Melanoma (UM)	9
Adenocarcinoma of the Gastroesophageal Junction (GEJ)	7
Low-Grade Serous Ovarian Cancer (LGSOC)	7
Neuroblastoma (NBL)	7
Adrenocortical Carcinoma (ACC)	6
Anal Squamous Cell Carcinoma (ANSC)	6
Clear Cell Ovarian Cancer (CCOV)	6
Lung Squamous Cell Carcinoma (LUSC)	6
Neurofibroma (NFIB)	6
Stomach Adenocarcinoma (STAD)	6
Ewing Sarcoma (ES)	5
Gastrointestinal Stromal Tumor (GIST)	5
Glioblastoma Multiforme (GBM)	5
Uterine Carcinosarcoma/Uterine Malignant Mixed Mullerian Tumor (UCS)	5
Uterine Endometrioid Carcinoma (UEC)	5

B.2 Detailed MUTYH somatic status for MUTYH germline carriers. The cDNA variant description relates to the NM_001048171 transcript and the amino acid variant description to the NP_001041636 protein. All variants relate to the HUGO gene MUTYH. ACC, Adrenocortical Carcinoma; ALOH, allele-specific loss of heterozygosity; ATRT, Atypical Teratoid/Rhabdoid Tumor; FC, fold change; DLOH, deletion loss of heterozygosity; HET, heterozygous; IDC, Breast Invasive Ductal Carcinoma; kIQR, number of inter-quartile range intervals away from the median; LOH, loss of heterozygosity; LUAD, Lung Adenocarcinoma; NA, not available; PAAD, Pancreatic Adenocarcinoma; RCC, renal cell carcinoma; TCGA, The Cancer Genome Atlas; ULMS, Uterine Leiomyosarcoma; %ile, percentile.

ID	Tumour type	Copy category	Copy change	Ploidy copy number	Ploidy copy change	LOH	LOH ratio	LOH	TCGA kIQR	TCGA %ile	FC Bodymap
1	PAAD	Neutral	0	4	0	HET	0.57	No	-0.59	16	1.36
2	IDC	Loss	-1	2	-1	DLOH	0.74	Yes	0.12	56	2.23
3	IDC	Loss	-1	2	-1	DLOH	0.86	Yes	-0.52	20	1.45
4	ACC	Neutral	NA	3	0	NLOH	NA	NA	NA	NA	NA
5	ATRT	Neutral	0	2	0	HET	0.57	No	-0.41	23	1.49
6	LUAD	Neutral	0	2	0	HET	0.48	No	-0.55	18	1.43
7	RCC	Gain	0	4	1	HET	0.49	No	-0.48	22	1.53
8	IDC	Gain	1	4	1	ALOH	0.65	Yes	0.12	56	2.25
9	ULMS	Neutral	0	2	0	HET	0.38	No	0.42	69	2.57
10	PAAD	Neutral	0	2	0	HET	0.37	No	0.48	72	2.68
11	PAAD	Loss	-1	3	-2	HET	0.72	No	-0.23	37	1.82
12	LUAD	Neutral	0	3	0	HET	0.54	No	0.16	58	2.28

B.3 Patient 1 coding mutation summary. PDAC, pancreatic ductal adenocarcinoma; SNV, single nucleotide variant; SV, structural variant; TCGA, The Cancer Genome Atlas.

Type of mutation (coding)	Number of events	Percentile - TCGA PDAC	Percentile - all TCGA
SNV	88	76	73
INDEL	6	1	73
SV	86	60th percentile amongst our local database of 626 cancers

B.4 Patient 1 somatic small mutations.
chr	pos	ref	alt	hugo	Variant support	zygosity	Eff_type	hgvs_cds	hgvs_protein
1	27101233	C	CT	ARID1A	'2/4'	het	frameshift	c.4516dupT	p.Tyr1506fs
1	185834975	G	T	HMCN1	'1/5'	het	missense	c.601G>T	p.Asp201Tyr
1	40704081	G	T	RLF	'1/4'	het	missense	c.3707G>T	p.Ser1236Ile
1	17594448	C	T	PADI3	'2/4'	het	missense	c.643C>T	p.His215Tyr
1	34083158	C	T	CSMD2	'2/4'	het	missense	c.184G>A	p.Glu62Lys
1	42046091	C	A	HIVEP3	'1/4'	het	Stop gained	c.4378G>T	p.Glu1460*
1	52947564	C	A	ZCCHC11	'2/4'	het	missense	c.1547G>T	p.Gly516Val
1	75112406	T	A	ERICH3	'2/4'	het	missense	c.188A>T	p.Lys63Ile
1	76378528	G	T	MSH4	'1/4'	het	missense	c.2767G>T	p.Asp923Tyr
1	156438059	G	A	MEF2D	'3/5'	het	missense	c.1280C>T	p.Ala427Val
1	167095584	G	T	DUSP27	'2/5'	het	Stop gained	c.1216G>T	p.Glu406*
1	201845115	G	A	IPO9	'1/5'	het	missense	c.3059G>A	p.Cys1020Tyr
1	204093892	C	T	SOX13	'3/5'	het	missense	c.1499C>T	p.Thr500Ile
1	215848148	C	A	USH2A	'1/5'	het	missense	c.13105G>T	p.Ala4369Ser
2	202137420	G	T	CASP8	'2/4'	het	missense	c.567G>T	p.Leu189Phe
2	125660593	G	T	CNTNAP5	'2/4'	het	missense	c.3568G>T	p.Val1190Phe
2	179612847	T	A	TTN	'2/4'	het	missense	c.14280A>T	p.Arg4760Ser
2	201474083	G	T	AOX1	'1/4'	het	missense	c.1099G>T	p.Asp367Tyr
2	217124045	G	A	MARCH04	'2/4'	het	missense	c.1223C>T	p.Thr408Met
2	241622023	G	T	AQP12B	'1/4'	het	missense	c.232C>A	p.Leu78Met
2	158287474	A	T	CYTIP	'0/4'	sub	missense+splice	c.280T>A	p.Ser94Thr
3	47164579	CT	C	SETD2	'2/3'	het	Frameshift variant	c.1546delA	p.Arg516fs
3	123010055	C	T	ADCY5	'1/5'	het	missense	c.3232G>A	p.Ala1078Thr
3	195508078	G	A	MUC4	'1/5'	het	missense	c.10373C>T	p.Pro3458Leu
3	183755945	C	T	HTR3D	'3/5'	het	missense	c.797C>T	p.Thr266Ile
3	195513154	G	T	MUC4	'0/5'	sub	missense	c.5297C>A	p.Thr1766Asn
4	187627950	G	T	FAT1	'1/3'	het	missense	c.3032C>A	p.Ser1011Tyr
4	4864599	C	T	MSX1	'1/5'	het	missense	c.641C>T	p.Thr214Met
4	47679958	C	T	CORIN	'2/4'	het	missense	c.1246G>A	p.Val416Ile
4	69203306	G	C	YTHDC1	'2/4'	het	missense	c.443C>G	p.Thr148Arg
4	119626803	G	A	METTL14	'2/4'	het	missense	c.893G>A	p.Arg298His
4	119631257	G	T	METTL14	'2/4'	het	Stop gained	c.1171G>T	p.Glu391*
4	155410851	C	T	DCHS2	'1/3'	het	missense	c.1636G>A	p.Ala546Thr
4	5857987	C	A	CRMP1	'0/5'	sub	Stop gained	c.703G>T	p.Glu235*
5	140207905	G	T	PCDHA6	'1/5'	het	missense	c.229G>T	p.Val77Leu
5	41160466	C	A	C6	'1/4'	het	missense	c.1462G>T	p.Ala488Ser
5	83402473	C	T	EDIL3	'1/3'	het	Stop gained	c.645G>A	p.Trp215*
6	35261681	C	A	ZNF76	'2/4'	het	missense	c.1483C>A	p.Gln495Lys
6	125569472	C	A	TPD52L1	'1/4'	het	missense	c.329C>A	p.Ala110Glu
7	115890461	C	T	TES	'0/4'	sub	Stop gained	c.613C>T	p.Gln205*
7	99774657	C	T	GPC2	'1/4'	het	missense+splice	c.166G>A	p.Gly56Ser
7	98792740	T	A	KPNA7	'1/4'	het	missense	c.506A>T	p.Asn169Ile
7	133580424	C	A	EXOC4	'1/4'	het	missense	c.1807C>A	p.Gln603Lys
7	4247730	G	T	SDK1	'2/4'	het	Splice acceptor
7	128496611	G	A	FLNC	'1/4'	het	missense	c.7291G>A	p.Val2431Met
7	77814966	C	A	MAGI2	'0/4'	sub	missense	c.2291G>T	p.Ser764Ile
7	82585845	C	A	PCLO	'0/4'	sub	missense	c.4424G>T	p.Arg1475Met
8	53596157	C	T	RB1CC1	'3/5'	het	missense	c.321G>A	p.Met107Ile
8	28226083	C	A	ZNF395	'1/2'	het	Splice acceptor
8	39550173	G	T	ADAM18	'3/4'	het	missense	c.1876G>T	p.Ala626Ser
8	145058273	C	A	PARP10	'2/3'	het	Splice acceptor
8	145750009	CTGT	C	LRRC24	'2/3'	het	Disruptive inframe deletion	c.251_253delACA	p.Asn84del
9	34513156	C	A	DNAI1	'4/10'	het	missense	c.1536C>A	p.Phe512Leu
9	34513183	C	G	DNAI1	'3/10'	het	missense	c.1563C>G	p.Ile521Met
9	34513132	C	A	DNAI1	'5/10'	het	missense	c.1512C>A	p.Phe504Leu
9	139342595	G	A	SEC16A	'2/4'	het	missense	c.6331C>T	p.Arg2111Cys
10	25197504	C	A	PRTFDC1	'1/3'	het	missense	c.450G>T	p.Trp150Cys
10	34673088	C	G	PARD3	'1/4'	het	missense	c.985G>C	p.Asp329His
10	34673152	C	G	PARD3	'1/4'	het	missense	c.921G>C	p.Glu307Asp
10	84744945	G	A	NRG3	'2/4'	het	missense	c.1084G>A	p.Glu362Lys
11	5221197	T	G	OR51V1	'1/4'	het	missense	c.734A>C	p.Lys245Thr
11	6425011	G	T	APBB1	'1/4'	het	missense	c.763C>A	p.Leu255Met
11	49075477	G	T	TRIM64C	'1/4'	het	Stop gained	c.1133C>A	p.Ser378*
11	119991292	T	C	TRIM29	'1/4'	het	missense	c.1517A>G	p.Tyr506Cys
12	25398285	C	A	KRAS	'3/6'	het	missense	c.34G>T	p.Gly12Cys
12	40716230	G	T	LRRK2	'1/3'	het	missense	c.5427G>T	p.Lys1809Asn
12	40868582	T	A	MUC19	'1/3'	het	missense	c.4756T>A	p.Ser1586Thr
12	96674568	G	T	CDK17	'1/3'	het	missense	c.1393C>A	p.His465Asn
12	97211672	C	A	CFAP54	'1/3'	het	Stop gained	c.9077C>A	p.Ser3026*
12	107393747	G	T	CRY1	'1/3'	het	Stop gained	c.798C>A	p.Tyr266*
12	133501958	TTTATA	T	ZNF605	'2/3'	het	Frameshift variant+stop lost	c.1922_1926delTATAA	p.Ile641fs
13	111532294	G	A	ANKRD10	'1/5'	het	missense	c.953C>T	p.Pro318Leu
13	47324708	G	T	LRCH1	'1/6'	het	Stop gained	c.2014G>T	p.Glu672*
13	78472444	G	T	EDNRB	'1/5'	het	Stop gained	c.1490C>A	p.Ser497*
14	20529047	C	A	OR4L1	'1/6'	het	missense	c.844C>A	p.Leu282Met
14	92537354	C	CCTGCTGCTGCTGCTGCTGCTGCTGCTG	ATXN3	'1/4'	het	Inframe insertion	c.942_943insCAGCAGCAGCAGCAGCAGCAGCAGCAG	p.Gln306_Gln314dup
14	81259445	G	A	CEP128	'4/4'	hom	missense	c.1219C>T	p.Arg407Cys
15	39885410	G	A	THBS1	'1/4'	het	missense	c.2977G>A	p.Asp993Asn
15	50535435	C	T	HDC	'1/4'	het	missense	c.1147G>A	p.Glu383Lys
16	89349070	C	A	ANKRD11	'1/5'	het	Stop gained	c.3880G>T	p.Glu1294*
16	88744917	G	A	SNAI3	'1/5'	het	missense	c.818C>T	p.Thr273Ile
16	20043917	C	G	GPR139	'1/5'	het	missense	c.202G>C	p.Ala68Pro
16	1828491	G	T	SPSB3	'1/5'	het	missense	c.249C>A	p.Ser83Arg
16	47581404	G	T	PHKB	'3/5'	het	missense	c.655G>T	p.Gly219Cys
17	7578503	CAGGGCAGGTCTTGGCCAGTTGGCAAAACATCTTGTTG	C	TP53	'1/2'	het	Frameshift variant	c.390_426delCAACAAGATGTTTTGCCAACTGGCCAAGACCTGCCCT	p.Leu130fs
17	79880745	C	A	MAFG	'1/4'	het	missense	c.225G>T	p.Gln75His
18	9887041	T	C	TXNDC2	'1/3'	het	missense	c.565T>C	p.Ser189Pro
18	8784164	G	T	MTCL1	'2/3'	het	missense	c.1054G>T	p.Val352Leu
18	25589767	G	A	CDH2	'2/4'	het	missense	c.616C>T	p.Pro206Ser
19	51381718	G	T	KLK2	'2/4'	het	missense	c.689G>T	p.Gly230Val
19	15793291	C	A	CYP4F12	'1/4'	het	Stop gained	c.618C>A	p.Cys206*
19	4054394	C	A	ZBTB7A	'2/4'	het	missense	c.837G>T	p.Glu279Asp
19	9002548	G	A	MUC16	'2/4'	het	missense	c.40268C>T	p.Thr13423Ile
19	14268190	C	A	ADGRL1	'2/4'	het	missense	c.2633G>T	p.Cys878Phe
19	22497758	G	T	ZNF729	'1/4'	het	missense	c.1539G>T	p.Lys513Asn
19	30019359	C	A	VSTM2B	'1/4'	het	missense	c.287C>A	p.Thr96Asn
19	56953763	C	A	ZNF667	'2/4'	het	Stop gained	c.601G>T	p.Glu201*
20	31369174	C	T	DNMT3B	'0/4'	sub	missense	c.158C>T	p.Ser53Leu
22	21579692	A	G	GGT2	'2/2'	hom	missense	c.152T>C	p.Leu51Ser
22	32019749	G	T	PISD	'2/2'	hom	missense	c.140C>A	p.Pro47His
X	84510384	G	T	ZNF711	'2/2'	hom	missense	c.199G>T	p.Ala67Ser

B.5 Patient 1 somatic structural variants.

Event type1	Event type2	gene1	gene2	Flanking DNA reads	Spanning DNA reads	location	Flanking RNA reads	Spanning RNA reads
fusion	deletion	NA	KCNU1	23	8	8:28620411|8:36782032	NA	NA
fusion	duplication	MECOM	FNDC3B	4	11	3:168815920|3:171938862	3	7
fusion	duplication	ZNF273	ZNF273	0	7	7:64346453|7:64347998	NA	NA
fusion	duplication	GATA4	GATA4	0	4	8:11612178|8:11614051	NA	NA
indel	ins	ARID1A	ARID1A	NA	19	1:27101233-27101233	NA	33
fusion	inversion	GPRC5A	GPRC5A	11	8	12:13049191|12:28485150	NA	NA
fusion	inversion	ITPR2	CACNA1C	10	7	12:26619559|12:2328550	NA	NA
fusion	inversion	ATR	NA	8	1	3:142209236|3:145768433	2	3
fusion	inversion	PLD1	NA	11	8	3:158464289|3:171349515	NA	NA
fusion	inversion	NA	NOL6	8	5	9:33219310|9:33461702	1	6
fusion	translocation	NA	GATA4	2	7	1:149243941|8:11612655	NA	NA
fusion	translocation	CCDC30	MON1B	2	11	1:43008532|16:77230193	NA	NA
fusion	translocation	PPM1A	KIAA1598	3	33	14:60741400|10:118722884	NA	NA
fusion	translocation	NA	SYTL2	2	32	2:218265091|11:85444947	NA	NA
fusion	translocation	NA	ITCH	12	9	2:85350509|20:33095084	NA	NA
fusion	translocation	ANTXR2	NA	2	11	4:80858869|8:72399221	NA	NA
fusion	translocation	NA	B4GALNT3	8	6	9:20265196|12:633218	2	3

B.6 Patient 1 MUTYH targeted transcriptome assembly (TAP) splicing BED coordinates.
chr1	45795108	45796188	MUTYH.NM_001048171.E16.E15	36	-	45795108	45796188	0	2	1,1	0,1079
chr1	45796228	45796854	MUTYH.NM_001048174.E15.E14	42	-	45796228	45796854	0	2	1,1	0,625
chr1	45797005	45797092	MUTYH.NM_001048171.E14.E13	49	-	45797005	45797092	0	2	1,1	0,86
chr1	45797227	45797333	MUTYH.NM_001048174.E13.E12	41	-	45797227	45797333	0	2	1,1	0,105
chr1	45797478	45797695	MUTYH.NM_001048174.E12.E11	12	-	45797478	45797695	0	2	1,1	0,216
chr1	45797520	45797695	MUTYH.NM_001048171.E12.E11	17	-	45797520	45797695	0	2	1,1	0,174
chr1	45797757	45797838	MUTYH.NM_001048174.E11.E10	34	-	45797757	45797838	0	2	1,1	0,80
chr1	45797981	45798063	MUTYH.NM_001048174.E10.E9	37	-	45797981	45798063	0	2	1,1	0,81
chr1	45798159	45798246	MUTYH.NM_001048174.E9.E8	36	-	45798159	45798246	0	2	1,1	0,86
chr1	45798358	45798435	MUTYH.NM_001048174.E8.E7	48	-	45798358	45798435	0	2	1,1	0,76
chr1	45798505	45798590	MUTYH.NM_001048174.E7.E6	44	-	45798505	45798590	0	2	1,1	0,84
chr1	45798505	45798769	MUTYH.NM_001048174.E7.E5	1	-	45798505	45798769	0	2	1,1	0,263
chr1	45798630	45798769	MUTYH.NM_001048171.E6.E5	33	-	45798630	45798769	0	2	1,1	0,138
chr1	45798841	45798957	MUTYH.NM_001048171.E5.E4	32	-	45798841	45798957	0	2	1,1	0,115
chr1	45798995	45799085	MUTYH.NM_001048174.E4.E3	39	-	45798995	45799085	0	2	1,1	0,89
chr1	45799232	45800063	MUTYH.NM_001048174.E3.E2	19	-	45799232	45800063	0	2	1,1	0,830
chr1	45800182	45805891	MUTYH.NM_001048171.E2.E1	4	-	45800182	45805891	0	2	1,1	0,5708
chr1	45805975	45806745	TOE1.NM_025077.E1.E2	14	+	45805975	45806745	0	2	1,1	0,769
chr1	45807240	45807621	TOE1.NM_025077.E4.E5	2	+	45807240	45807621	0	2	1,1	0,380

B.7 Patient 1 COSMIC mutational signatures. The total number of somatic mutations was 12087. The exposure columns refer to the mutation counts (mean exposure, 50%) contributed by each signature and the proportion columns refer to the proportion of mutations contributed by each signature. Sig, signature.
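As a quick check on how the Exposure and Proportion columns in the B.7 table relate, the proportion for a signature appears to be its mean exposure divided by the total somatic mutation count (12087). The sketch below assumes that relationship; the function name `signature_proportions` is illustrative and does not come from any signature-analysis package. The three exposures are copied from the B.7 rows for signatures 1, 8 and 18.

```python
def signature_proportions(exposures, total_mutations):
    """Return each signature's share of the total somatic mutation count."""
    if total_mutations <= 0:
        raise ValueError("total_mutations must be positive")
    return {sig: count / total_mutations for sig, count in exposures.items()}


# Total mutation count stated in the B.7 caption.
TOTAL_MUTATIONS = 12087

# Mean exposures for three COSMIC signatures, taken from the B.7 table.
mean_exposures = {"1": 718, "8": 2140, "18": 5683}

proportions = signature_proportions(mean_exposures, TOTAL_MUTATIONS)
for sig, share in proportions.items():
    print(f"Signature {sig}: {share:.4f}")
```

Rounded to four decimals, these shares should agree with the Proportion column for the same three signatures.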
Sig	Exposure	0%	100%	2.50%	97.50%	Proportion	Exposure outlier	Proportion outlier
1	718	485	1055	557	876	0.0594	no	no
2	413	286	587	343	489	0.0342	no	no
3	11	0	79	0	37	0.0009	no	no
4	79	0	339	3	230	0.0066	no	no
5	36	0	189	2	120	0.0030	no	no
6	17	0	109	0	61	0.0014	no	no
7	25	0	106	0	85	0.0020	no	no
8	2140	1686	2647	1866	2435	0.1770	no	no
9	205	40	425	92	329	0.0170	no	no
10	811	593	1028	667	961	0.0671	yes	yes
11	35	0	151	2	106	0.0029	no	no
12	85	0	304	4	202	0.0071	no	no
13	29	0	106	3	68	0.0024	no	no
14	1176	871	1512	978	1385	0.0973	yes	yes
15	9	0	61	0	34	0.0008	no	no
16	186	1	494	17	388	0.0154	no	no
17	5	0	41	0	16	0.0004	no	no
18	5683	5296	6093	5435	5920	0.4702	yes	yes
19	109	1	368	8	269	0.0090	no	no
20	35	0	175	1	115	0.0029	no	no
21	10	0	56	0	32	0.0008	no	no
22	11	0	67	0	39	0.0009	no	no
23	23	0	111	1	79	0.0019	no	no
24	21	0	131	0	73	0.0018	no	no
25	52	0	268	1	166	0.0043	no	no
26	23	0	117	0	78	0.0019	no	no
27	74	3	141	33	118	0.0061	no	no
28	4	0	29	0	17	0.0004	no	no
29	33	0	182	1	110	0.0027	no	no
30	30	0	175	1	93	0.0024	no	no

B.8 Patient 1 SigProfiler mutational signatures. The total number of somatic mutations was 12087. The exposure columns refer to the mutation counts (mean exposure, 50%) contributed by each signature and the proportion columns refer to the proportion of mutations contributed by each signature. Sig, signature.
Sig	Exposure	0%	100%	2.50%	97.50%	Proportion	Exposure outlier	Proportion outlier
1	730	528	875	633	823	0.0604	no	no
2	211	98	334	136	291	0.0174	no	no
3	23	0	159	1	78	0.0019	no	no
4	42	0	350	1	149	0.0035	no	no
5	46	0	267	1	151	0.0038	no	no
6	102	0	371	4	271	0.0084	no	no
7a	180	31	343	64	287	0.0149	yes	no
7b	19	0	108	0	66	0.0016	no	no
7c	54	0	156	4	127	0.0045	no	no
7d	24	0	93	2	63	0.0020	no	no
8	748	373	1057	495	980	0.0619	no	no
9	100	2	272	17	208	0.0083	no	no
10a	172	3	447	28	333	0.0142	no	no
10b	48	0	151	7	99	0.0039	no	no
11	21	0	163	1	75	0.0017	no	no
12	46	0	218	2	139	0.0038	no	no
13	29	1	97	4	63	0.0024	no	no
14	203	1	433	43	378	0.0168	yes	yes
15	98	2	302	10	224	0.0081	no	no
16	96	5	253	20	183	0.0080	no	no
17a	27	0	105	1	65	0.0022	no	no
17b	18	0	54	1	38	0.0015	no	no
18	549	12	1292	72	1055	0.0454	no	no
19	255	104	408	155	353	0.0211	no	no
20	59	0	266	2	178	0.0048	no	no
21	20	0	90	1	60	0.0016	no	no
22	58	0	208	4	137	0.0048	no	no
23	16	0	87	0	57	0.0013	no	no
24	36	0	238	1	110	0.0030	no	no
25	104	0	439	5	294	0.0086	no	no
26	67	0	247	2	172	0.0055	no	no
27	21	0	92	1	56	0.0017	no	no
28	16	0	85	1	45	0.0013	no	no
29	83	0	371	3	263	0.0069	no	no
30	60	0	256	3	184	0.0050	no	no
31	30	0	174	1	104	0.0025	no	no
32	251	94	446	132	366	0.0208	no	no
33	11	0	59	0	35	0.0009	no	no
34	158	71	251	100	222	0.0131	no	no
35	66	0	348	2	183	0.0055	no	no
36	5769	4833	6598	5188	6283	0.4772	yes	yes
37	50	0	213	1	151	0.0042	no	no
38	86	0	278	6	208	0.0072	yes	yes
39	14	0	82	0	47	0.0011	no	no
40	51	0	307	1	182	0.0042	no	no
41	32	0	184	1	106	0.0027	no	no
42	27	0	141	1	99	0.0022	no	no
43	6	0	42	0	23	0.0005	no	no
44	112	0	380	6	283	0.0092	no	no
45	31	0	176	1	108	0.0026	no	no
46	74	1	267	4	184	0.0061	no	no
47	42	0	196	3	120	0.0035	no	no
48	5	0	37	0	18	0.0004	no	no
49	5	0	24	0	17	0.0004	no	no
50	41	0	166	1	121	0.0034	no	no
51	21	0	119	1	74	0.0017	no	no
52	23	0	130	1	86	0.0019	no	no
53	37	0	179	2	103	0.0031	no	no
54	52	0	161	5	115	0.0043	no	no
55	12	0	49	0	36	0.0010	no	no
56	76	0	422	1	252	0.0063	no	no
57	36	0	170	1	103	0.0030	no	no
58	500	265	716	332	652	0.0413	no	no
59	85	0	246	9	180	0.0070	yes	yes
60	2	0	14	0	8	0.0002	no	no

B.9 Patient 1 SignatureAnalyzer mutational signatures. The total number of somatic mutations was 12087. The exposure columns refer to the mutation counts (mean exposure, 50%) contributed by each signature and the proportion columns refer to the proportion of mutations contributed by each signature. Sig, signature.

Sig	Exposure	0%	100%	2.50%	97.50%	Proportion	Exposure outlier	Proportion outlier
1	754	269	1002	498	926	0.0624	no	no
2	382	160	540	287	477	0.0316	no	no
3	35	0	154	1	115	0.0029	no	no
4	171	1	499	12	400	0.0141	no	no
5	81	0	433	3	249	0.0067	no	no
6	137	1	688	5	385	0.0113	no	no
7a	52	0	206	2	152	0.0043	no	no
7b	27	0	158	1	100	0.0023	no	no
7c	19	0	103	1	66	0.0015	no	no
8	908	559	1331	697	1121	0.0752	no	no
9	65	0	192	9	137	0.0054	no	no
10a	26	0	224	1	89	0.0021	no	no
11	64	1	217	4	168	0.0053	no	no
12	35	0	176	1	116	0.0029	no	no
13	49	0	178	3	131	0.0041	no	no
14	181	0	621	7	424	0.0150	yes	yes
15	53	0	297	2	163	0.0044	no	no
16	167	11	318	72	265	0.0138	no	no
17a	87	3	178	31	145	0.0072	no	no
17b	14	0	70	1	38	0.0012	no	no
18	982	642	1304	798	1185	0.0813	yes	yes
19	298	131	436	214	385	0.0246	yes	no
21	15	0	89	0	56	0.0013	no	no
22	227	122	349	157	301	0.0188	no	no
26	44	0	169	2	126	0.0036	no	no
28	18	0	96	0	60	0.0015	no	no
30	77	1	285	4	206	0.0064	no	no
33	9	0	45	0	29	0.0008	no	no
35	54	1	210	3	137	0.0044	no	no
36	4962	4573	5272	4755	5166	0.4105	yes	yes
37	32	0	209	1	109	0.0027	no	no
38	196	50	344	101	282	0.0162	yes	yes
39	15	0	132	1	54	0.0013	no	no
40	22	0	133	0	68	0.0018	no	no
60	3	0	14	0	9	0.0002	no	no
55	10	0	52	0	32	0.0008	no	no
44	195	1	523	9	417	0.0161	no	no
61	17	0	110	1	62	0.0014	no	no
62	29	0	152	1	89	0.0024	no	no
63	22	0	143	1	76	0.0018	no	no
64	71	0	387	2	228	0.0059	no	no
65	117	1	543	5	327	0.0096	no	no
66	28	0	150	0	96	0.0023	no	no
67	29	0	147	0	88	0.0024	no	no
68	106	0	331	7	255	0.0088	no	no
69	227	22	420	71	361	0.0188	no	no
70	39	0	229	2	137	0.0033	no	no
71	540	161	841	324	761	0.0447	no	no
72	37	0	144	2	111	0.0030	no	no
73	24	0	162	0	85	0.0020	no	no
74	83	0	482	2	316	0.0068	no	no
75	37	0	152	1	114	0.0031	no	no
76	35	0	188	1	110	0.0029	no	no
77	13	0	86	1	47	0.0011	no	no
78	14	0	76	0	46	0.0012	no	no
79	48	0	197	2	138	0.0039	no	no
80	58	0	279	2	172	0.0048	no	no
81	21	0	98	1	67	0.0018	no	no
82	16	0	92	0	54	0.0014	no	no
83	7	0	33	0	18	0.0006	no	no

B.10 Patient 1 COSMIC signatures Bayesian probabilities for KRAS G12C mutation. P_SignatureSNV, prior probability of C[C>A]A in the reference signature matrix; P_Mutation_given_SigExposure, proportion of mutations caused by the signature (or probability of any mutation being caused by the signature); posterior_probability, posterior probability of the C[C>A]A transversion mutation being caused by the signature; Sig, signature.

Sig	P_SignatureSNV	P_Mutation_given_SigExposure	posterior_probability
18	0.08876811	0.47021181	0.8361342
8	0.03172376	0.17704921	0.11251336
14	0.0056	0.0972541	0.01090992
1	0.00659587	0.05940324	0.00784888
4	0.0461	0.00655486	0.00605326
16	0.0159	0.01536053	0.00489248
10	0.0031	0.06709785	0.00416673
9	0.0098	0.0169839	0.00333418
29	0.05141021	0.0026998	0.00278039
24	0.06355928	0.0017763	0.00226163
19	0.0112	0.00897888	0.00201449
12	0.0135	0.00706021	0.00190931
25	0.01483295	0.00432253	0.00128437
20	0.01737711	0.00288755	0.00100515
27	0.00506507	0.00609936	6.19E-04
5	0.0096749	0.00296404	5.74E-04
2	6.77E-04	0.03417228	0.00046374
3	0.01878173	8.98E-04	3.38E-04
6	0.0101	0.00138326	2.80E-04
15	0.0106	7.79E-04	1.66E-04
26	0.00370585	0.00186422	1.38E-04
22	0.00454969	9.39E-04	8.56E-05
13	0.00171009	0.0023952	8.21E-05
7	0.0012	0.00204784	4.92E-05
11	7.00E-04	0.00291909	4.09E-05
21	0.002	8.08E-04	3.24E-05
17	0.00103243	4.10E-04	8.47E-06
28	0.00116852	3.50E-04	8.20E-06
23	1.65E-04	0.00188026	6.20E-06
30	0	0.00244904	0

B.11 Patient 1 SigProfiler signatures Bayesian probabilities for KRAS G12C mutation.
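The posterior probabilities in the B.10 and B.11 tables are consistent with Bayes' rule applied per signature: the P_SignatureSNV value is multiplied by the P_Mutation_given_SigExposure value and the products are normalised over all signatures. The sketch below is one reading of the column definitions, not a confirmed description of the pipeline used in this thesis; it reuses the 30 COSMIC rows from B.10, and the function name `posterior_by_signature` is hypothetical.

```python
def posterior_by_signature(rows):
    """Normalise P(context | signature) * P(signature) over all signatures.

    rows: iterable of (signature, p_context_given_sig, p_sig) tuples.
    """
    joint = {sig: p_context * p_sig for sig, p_context, p_sig in rows}
    evidence = sum(joint.values())
    if evidence <= 0:
        raise ValueError("no signature can explain this mutation context")
    return {sig: value / evidence for sig, value in joint.items()}


# (Sig, P_SignatureSNV, P_Mutation_given_SigExposure) copied from table B.10.
B10_ROWS = [
    ("18", 0.08876811, 0.47021181), ("8", 0.03172376, 0.17704921),
    ("14", 0.0056, 0.0972541), ("1", 0.00659587, 0.05940324),
    ("4", 0.0461, 0.00655486), ("16", 0.0159, 0.01536053),
    ("10", 0.0031, 0.06709785), ("9", 0.0098, 0.0169839),
    ("29", 0.05141021, 0.0026998), ("24", 0.06355928, 0.0017763),
    ("19", 0.0112, 0.00897888), ("12", 0.0135, 0.00706021),
    ("25", 0.01483295, 0.00432253), ("20", 0.01737711, 0.00288755),
    ("27", 0.00506507, 0.00609936), ("5", 0.0096749, 0.00296404),
    ("2", 6.77e-04, 0.03417228), ("3", 0.01878173, 8.98e-04),
    ("6", 0.0101, 0.00138326), ("15", 0.0106, 7.79e-04),
    ("26", 0.00370585, 0.00186422), ("22", 0.00454969, 9.39e-04),
    ("13", 0.00171009, 0.0023952), ("7", 0.0012, 0.00204784),
    ("11", 7.00e-04, 0.00291909), ("21", 0.002, 8.08e-04),
    ("17", 0.00103243, 4.10e-04), ("28", 0.00116852, 3.50e-04),
    ("23", 1.65e-04, 0.00188026), ("30", 0.0, 0.00244904),
]

posterior = posterior_by_signature(B10_ROWS)
top_sig = max(posterior, key=posterior.get)
print(top_sig, round(posterior[top_sig], 3))
```

Under this reading, signature 18 comes out as the most probable source of the KRAS G12C transversion, in line with the first row of B.10. The same function could be applied to the B.11 columns once all SigProfiler rows are available.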
P_SignatureSNV, prior probability of C[C>A]A in the reference signature matrix; P_Mutation_given_SigExposure, proportion of mutations caused by the signature (or probability of any mutation being caused by the signature); posterior_probability, posterior probability of the C[C>A]A transversion mutation being caused by the signature; Sig, signature.

Sig  P_SignatureSNV  P_Mutation_given_SigExposure  posterior_probability
36  0.0558  0.477249  0.664419
18  0.074  0.045422  0.083861
38  0.381  0.007154  0.068
8  0.0401  0.061885  0.061915
53  0.334  0.003074  0.02562
45  0.238  0.00258  0.015322
29  0.0542  0.00689  0.009317
15  0.0418  0.00814  0.00849
4  0.0807  0.003452  0.006951
14  0.0163  0.016812  0.006837
58  0.00618  0.041348  0.006375
20  0.0376  0.004842  0.004543
35  0.0299  0.005485  0.004092
25  0.0148  0.008637  0.003189
52  0.0574  0.001937  0.002775
46  0.0156  0.006131  0.002386
49  0.229  3.98E-04  0.002275
40  0.0208  0.004239  0.0022
47  0.0246  0.003501  0.002149
16  0.0104  0.007967  0.002067
50  0.0225  0.003409  0.001914
24  0.0254  0.003017  0.001912
10a  0.00318  0.014247  0.00113
3  0.0225  0.00187  0.00105
9  0.0048  0.008261  9.89E-04
7d  0.0145  0.002012  7.28E-04
5  0.00743  0.003765  6.98E-04
19  0.00132  0.021098  6.95E-04
31  0.0107  0.00251  6.70E-04
57  0.0087  0.003002  6.52E-04
32  0.00124  0.020774  6.43E-04
42  0.0116  0.002204  6.38E-04
34  0.00195  0.013095  6.37E-04
12  0.00542  0.003781  5.11E-04
1  3.12E-04  0.060399  4.70E-04
59  0.00223  0.007009  3.90E-04
56  0.00248  0.006251  3.87E-04
6  0.00182  0.008414  3.82E-04
37  0.00322  0.00417  3.35E-04
51  0.00698  0.001716  2.99E-04
39  0.0101  0.001147  2.89E-04
22  0.00183  0.004783  2.18E-04
27  0.00507  0.00172  2.18E-04
26  0.00141  0.005535  1.95E-04
7a  4.55E-04  0.01492  1.69E-04
55  0.00625  0.001029  1.61E-04
41  0.00238  0.002682  1.59E-04
30  7.68E-04  0.004958  9.50E-05
2  2.08E-04  0.01744  9.05E-05
43  0.00663  5.33E-04  8.82E-05
13  0.0014  0.002422  8.46E-05
28  0.00203  0.001285  6.51E-05
33  0.0027  8.86E-04  5.97E-05
10b  5.21E-04  0.003936  5.12E-05
7b  0.00114  0.001574  4.48E-05
11  6.18E-04  0.001739  2.68E-05
17a  2.95E-04  0.002196  1.62E-05
7c  1.09E-04  0.004507  1.23E-05
48  0.00118  4.06E-04  1.20E-05
17b  2.71E-04  0.00146  9.87E-06
54  6.32E-05  0.00431  6.80E-06
44  2.84E-05  0.009236  6.54E-06
60  0.00118  1.99E-04  5.85E-06
23  1.51E-04  0.001325  4.99E-06
21  2.22E-16  0.001623  8.99E-18

B.12 Patient 1 SignatureAnalyzer signatures Bayesian probabilities for KRAS G12C mutation. P_SignatureSNV, prior probability of C[C>A]A in the reference signature matrix; P_Mutation_given_SigExposure, proportion of mutations caused by the signature (or probability of any mutation being caused by the signature); posterior_probability, posterior probability of the C[C>A]A transversion mutation being caused by the signature; Sig, signature.

Sig  P_SignatureSNV  P_Mutation_given_SigExposure  posterior_probability
36  0.055086  0.377881  0.506045
38  0.448623  0.018962  0.206802
18  0.051216  0.090431  0.112594
8  0.033119  0.072551  0.058413
4  0.076342  0.031008  0.057547
63  0.010374  0.046092  0.011624
44  0.015704  0.023616  0.009016
14  0.011058  0.024923  0.0067
1  0.004494  0.055341  0.006046
31  0.020006  0.006079  0.002957
19  0.003502  0.023957  0.002039
5  0.010573  0.007685  0.001975
30  0.013068  0.005928  0.001883
22  0.002843  0.021505  0.001486
16  0.005049  0.010398  0.001276
70  0.00968  0.005008  0.001178
78  0.010811  0.003939  0.001035
69  0.001754  0.018076  7.71E-04
12  0.012033  0.002628  7.69E-04
9  0.004048  0.00767  7.55E-04
37  0.011041  0.002781  7.46E-04
68  0.001423  0.019288  6.67E-04
67  0.004892  0.005507  6.55E-04
2  7.59E-04  0.03372  6.22E-04
26  0.003824  0.006461  6.01E-04
77  0.007946  0.003073  5.94E-04
40  0.011716  0.001692  4.82E-04
81  0.003584  0.005291  4.61E-04
71  0.00788  0.001902  3.64E-04
74  0.006828  0.002026  3.36E-04
66  0.005499  0.002463  3.29E-04
39  0.008071  0.00155  3.04E-04
65  0.005309  0.002283  2.95E-04
73  0.006004  0.001973  2.88E-04
75  0.00441  0.002589  2.78E-04
80  0.003792  0.002298  2.12E-04
15  0.002889  0.002839  1.99E-04
7c  0.003702  0.002088  1.88E-04
76  0.005935  0.001166  1.68E-04
7b  0.001727  0.003483  1.46E-04
10a  0.002578  0.002203  1.38E-04
7a  7.57E-04  0.007253  1.34E-04
64  0.001917  0.002411  1.12E-04
72  0.001884  0.002262  1.04E-04
62  0.001874  0.002054  9.36E-05
61  0.00234  0.001518  8.63E-05
28  0.001957  0.001778  8.46E-05
13  0.001392  0.002429  8.22E-05
21  0.00303  0.001103  8.13E-05
17a  2.95E-04  0.007463  5.35E-05
17b  0.001218  0.001722  5.10E-05
11  3.80E-04  0.00525  4.85E-05
79  0.00122  0.001589  4.71E-05
55  9.68E-04  0.001191  2.80E-05
33  4.19E-04  7.62E-04  7.77E-06
60  0  2.39E-04  0
82  0  6.22E-04  0

B.13 Patient 2 coding mutation summary. BRCA, breast invasive carcinoma; SNV, single nucleotide variant; SV, structural variant; TCGA, The Cancer Genome Atlas

Type of mutation (coding)  Number of events  Percentile - TCGA BRCA  Percentile - all TCGA
SNV  24  54  29
INDEL  1  28  24
SV  116  73rd percentile amongst local databases (n=583)

B.14 Patient 2 somatic small mutations.

chr  pos  ref  alt  effect type  hugo  hgvs_protein  hgvs_cds  Alt reads  Ref reads
1  152323924  A  C  missense  FLG2  p.Val2113Gly  c.6338T>G  8  72
1  1720669  C  T  missense  GNB1  p.Asp247Asn  c.739G>A  16  21
1  201052359  C  T  missense  CACNA1S  p.Val442Ile  c.1324G>A  8  93
1  227842419  C  A  missense  ZNF678  p.Asp156Glu  c.468C>A  15  156
1  143767357  G  C  missense  PPIAL4G  p.Phe164Leu  c.492C>G  32  146
1  202124732  G  A  stop gained  PTPN7  p.Gln238*  c.712C>T  50  55
2  79349136  C  A  stop gained  REG1A  p.Ser69*  c.206C>A  8  46
2  38302054  G  T  missense  CYP1B1  p.Pro160Thr  c.478C>A  5  60
2  96521280  T  G  missense  ANKRD36C  p.Ile1577Leu  c.4729A>C  6  61
3  178952085  A  G  missense  PIK3CA  p.His1047Arg  c.3140A>G  15  41
3  124728548  C  T  splice donor +intron  HEG1  NA  NA  13  43
3  13363242  G  A  missense  NUP210  p.Ser1670Phe  c.5009C>T  23  36
3  72864536  T  G  missense  SHQ1  p.Ile301Leu  c.901A>C  7  46
5  153065833  C  T  missense  GRIA1  p.Arg360Cys  c.1078C>T  5  59
9  123156895  C  T  missense  CDK5RAP2  p.Val1825Ile  c.5473G>A  20  48
9  4286195  G  T  missense  GLIS3  p.Ser77Arg  c.231C>A  7  52
11  62295549  C  T  missense  AHNAK  p.Ala2114Thr  c.6340G>A  3  37
12  100175842  A  T  missense  ANKS1B  p.Leu235Met  c.703T>A  14  47
14  91700884  C  T  missense  GPR68  p.Val171Met  c.511G>A  7  59
14  60616818  T  G  missense  DHRS7  p.Ile241Leu  c.721A>C  11  48
15  42302379  C  G  missense  PLA2G4E  p.Val23Leu  c.67G>C  22  33
15  25953394  G  T  missense  ATP10A  p.Leu800Ile  c.2398C>A  4  46
16  614970  C  T  missense  C16orf11  p.Pro460Leu  c.1379C>T  6  74
16  31497561  C  T  missense  SLC5A2  p.Ala180Val  c.539C>T  7  43
19  1004839  G  A  missense  GRIN3B  p.Asp447Asn  c.1339G>A  12  42
19  39789037  G  A  missense  IL29  p.Ala162Thr  c.484G>A  5  43
21  46021011  A  G  missense  KRTAP10-7  p.Ile164Val  c.490A>G  5  58
X  149919509  G  T  missense  MTMR1  p.Gly380Val  c.1139G>T  7  37
X  152113821  G  T  missense  ZNF185  p.Asp407Tyr  c.1219G>T  4  65
X  155251977  G  A  splice acceptor +intron  WASH6P  NA  NA  7  95

B.15 Patient 2 somatic structural variants.

Event type1  Event type2  gene1  gene2  Flanking DNA reads  Spanning DNA reads  location
fusion  deletion  NA  TRAJ6  1  25  14:22466410|14:23008024
fusion  deletion  CEPT1  NA  68  30  1:111703918|1:222978503
fusion  deletion  CEPT1  NA  68  30  1:111703918|1:222981200
fusion  deletion  NA  CDC42BPA  75  74  1:223204394|1:227307505
fusion  deletion  NA  CDC42BPA  75  74  1:223205339|1:227307505
fusion  deletion  IGLV4-69  IGLJ3  1  5  22:22385863|22:23247167
fusion  deletion  IGLV4-69  IGLJ3  1  5  22:22385863|22:23247167
fusion  deletion  IGLV4-69  IGLJ3  1  5  22:22385863|22:23247167
fusion  deletion  IGLV4-69  IGLJ3  1  5  22:22385867|22:23247168
fusion  deletion  IGLV4-69  IGLJ3  1  5  22:22385867|22:23247168
fusion  deletion  IGLV4-69  IGLJ3  1  5  22:22385867|22:23247168
fusion  deletion  NA  IGLL5  1  5  22:22390956|22:23235965
fusion  deletion  NA  IGLL5  1  5  22:22390956|22:23235965
fusion  deletion  NA  IGLL5  1  5  22:22390956|22:23235965
fusion  deletion  NA  IGLL5  1  5  22:22390956|22:23235965
fusion  duplication  NA  IGLL5  4  4  8:40171873|8:40389757
fusion  duplication  NA  ZMAT4  4  4  8:40171873|8:40389757
fusion  inversion  NA  CCND1  4  5  11:69366640|11:69468926
fusion  inversion  NA  CCND1  4  5  11:69366640|11:69468926
fusion  inversion  ATP1A1  NA  13  13  1:116936922|1:117817159
fusion  translocation  POLD3  NA  14  13  11:74330516|GL000220.1:102185
fusion  translocation  WDR3  NA  32  31  1:118497981|11:50076790
fusion  translocation  WDR3  NA  32  31  1:118497981|11:50076790
fusion  translocation  WDR3  NA  32  31  1:118497981|11:50076793
fusion  translocation  WDR3  NA  32  31  1:118497981|11:50076793
fusion  translocation  SPAG17  PACS1  3  1  1:118571980|11:65983869
fusion  translocation  SPAG17  PACS1  3  1  1:118571980|11:65983869
fusion  translocation  DISP1  NA  8  5  1:223116674|11:69181511
fusion  translocation  DISP1  NA  8  5  1:223116674|11:69181511
fusion  translocation  ZNF678  RP11-702F3  6  5  1:227843464|11:46197900
fusion  translocation  ZNF678  RP11-702F3  6  5  1:227843464|11:46197900
fusion  translocation  NA  NA  6  5  1:227864366|11:46197520
fusion  translocation  HOOK3  NA  13  10  8:42814457|8:50157265
fusion  translocation  NA  PIBF1  17  16  8:80943266|13:73399551
fusion  translocation  NA  PIBF1  17  16  8:80943266|13:73399551
fusion  translocation  TPD52  PIBF1  17  16  8:80950355|13:73401148
fusion  translocation  TPD52  PIBF1  17  16  8:80950355|13:73401148
fusion  translocation  TPD52  PIBF1  17  16  8:80954855|13:73401148
fusion  translocation  TPD52  PIBF1  17  16  8:80954855|13:73401148

B.16 Patient 2 MUTYH targeted transcriptome assembly (TAP) splicing bed coordinates.
chr1 45795108 45796188 MUTYH.NM_001048171.E16.E15 114 - 45795108 45796188 0 2 1,1 0,1079
chr1 45796228 45796854 MUTYH.NM_001048171.E15.E14 109 - 45796228 45796854 0 2 1,1 0,625
chr1 45797005 45797092 MUTYH.NM_001048174.E14.E13 206 - 45797005 45797092 0 2 1,1 0,86
chr1 45797227 45797333 MUTYH.NM_001048174.E13.E12 198 - 45797227 45797333 0 2 1,1 0,105
chr1 45797520 45797695 MUTYH.NM_001048174.E12.E11 62 - 45797520 45797695 0 2 1,1 0,174
chr1 45797748 45797838 MUTYH.NM_001048174.E11.E10 46 - 45797748 45797838 0 2 1,1 0,89
chr1 45797757 45797838 MUTYH.NM_001048174.E11.E10 31 - 45797757 45797838 0 2 1,1 0,80
chr1 45797981 45798063 MUTYH.NM_001048174.E10.E9 233 - 45797981 45798063 0 2 1,1 0,81
chr1 45798159 45798246 MUTYH.NM_001048174.E9.E8 257 - 45798159 45798246 0 2 1,1 0,86
chr1 45798358 45798435 MUTYH.NM_001048174.E8.E7 253 - 45798358 45798435 0 2 1,1 0,76
chr1 45798505 45798590 MUTYH.NM_001048174.E7.E6 232 - 45798505 45798590 0 2 1,1 0,84
chr1 45798505 45798957 MUTYH.NM_001128425.E7.E4 3 - 45798505 45798957 0 2 1,1 0,451
chr1 45798630 45798769 MUTYH.NM_001048171.E6.E5 229 - 45798630 45798769 0 2 1,1 0,138
chr1 45798841 45798957 MUTYH.NM_001128425.E5.E4 252 - 45798841 45798957 0 2 1,1 0,115
chr1 45798991 45799085 MUTYH.NM_001128425.E4.E3 2 - 45798991 45799085 0 2 1,1 0,93
chr1 45798995 45799085 MUTYH.NM_001128425.E4.E3 274 - 45798995 45799085 0 2 1,1 0,89
chr1 45799168 45800063 MUTYH.NM_001293196.E3.E2 35 - 45799168 45800063 0 2 1,1 0,894
chr1 45799232 45800063 MUTYH.NM_001048171.E3.E2 65 - 45799232 45800063 0 2 1,1 0,830
chr1 45799235 45800063 MUTYH.NM_001048172.E3.E2 15 - 45799235 45800063 0 2 1,1 0,827
chr1 45799265 45800063 MUTYH.NM_001293191.E3.E2 4 - 45799265 45800063 0 2 1,1 0,797
chr1 45799274 45800063 MUTYH.NM_001128425.E3.E2 4 - 45799274 45800063 0 2 1,1 0,788
chr1 45800182 45805648 MUTYH.NM_001293192.E2.E1 7 - 45800182 45805648 0 2 1,1 0,5465
chr1 45800182 45805875 MUTYH.NM_001293192.E2.E1 5 - 45800182 45805875 0 2 1,1 0,5692
chr1 45800182 45805891 MUTYH.NM_001048171.E2.E1 26 - 45800182 45805891 0 2 1,1 0,5708
chr1 45805975 45806745 TOE1.NM_025077.E1.E2 14 + 45805975 45806745 0 2 1,1 0,769
chr1 45806213 45806745 TOE1.NM_025077.E1.E2 4 + 45806213 45806745 0 2 1,1 0,531
chr1 45806886 45806975 TOE1.NM_025077.E2.E3 5 + 45806886 45806975 0 2 1,1 0,88
chr1 45807014 45807145 TOE1.NM_025077.E3.E4 1 + 45807014 45807145 0 2 1,1 0,130

B.17 Patient 2 COSMIC mutational signatures. The total number of somatic mutations was 4979. The exposure columns refer to the mutation counts (mean exposure, 50%) contributed by each signature and the proportion columns refer to the proportion of mutations contributed by each signature. Sig, signature.

Sig  Exposure  0%  100%  2.50%  97.50%  Proportion  Exposure outlier  Proportion outlier
1  1203  992  1467  1064  1350  0.2416  no  no
2  143  85  219  97  193  0.0288  no  no
3  422  69  831  186  653  0.0848  no  no
4  45  0  224  2  139  0.0091  no  no
5  757  48  1509  338  1210  0.1521  no  no
6  31  0  242  1  112  0.0063  no  no
7  21  0  104  1  66  0.0042  no  no
8  209  11  517  41  416  0.0419  no  no
9  327  152  555  209  459  0.0657  no  no
10  56  1  162  6  129  0.0113  no  no
11  27  0  134  1  81  0.0053  no  no
12  227  6  554  55  419  0.0457  no  no
13  21  0  76  1  49  0.0041  no  no
14  29  0  137  1  95  0.0057  no  no
15  18  0  140  0  62  0.0037  no  no
16  91  0  393  3  261  0.0182  no  no
17  5  0  41  0  16  0.0010  no  no
18  830  645  1028  690  959  0.1668  yes  yes
19  29  0  182  1  104  0.0058  no  no
20  176  13  328  66  293  0.0354  no  yes
21  17  0  88  0  59  0.0034  no  no
22  7  0  43  0  24  0.0013  no  no
23  14  0  87  0  51  0.0028  no  no
24  24  0  128  1  82  0.0048  no  no
25  41  0  225  1  128  0.0082  no  no
26  42  0  235  1  145  0.0084  no  no
27  26  0  69  3  55  0.0051  no  no
28  8  0  45  0  27  0.0015  no  no
29  78  0  296  3  213  0.0156  no  yes
30  57  0  312  2  168  0.0114  no  no

B.18 Patient 2 SigProfiler mutational signatures. The total number of somatic mutations was 4979.
The exposure columns refer to the mutation counts (mean exposure, 50%) contributed by each signature and the proportion columns refer to the proportion of mutations contributed by each signature. Sig, signature.

Sig  Exposure  0%  100%  2.50%  97.50%  Proportion  Exposure outlier  Proportion outlier
1  579  441  696  494  659  0.1162  no  no
2  84  22  159  41  131  0.0169  no  no
3  71  0  408  2  222  0.0143  no  no
4  30  0  181  1  95  0.0060  no  no
5  227  0  1193  9  642  0.0456  no  no
6  94  0  317  6  237  0.0189  no  no
7a  31  0  133  0  98  0.0062  no  no
7b  22  0  113  1  78  0.0045  no  no
7c  50  1  134  6  105  0.0100  no  no
7d  16  0  91  0  51  0.0033  no  no
8  56  0  255  2  163  0.0113  no  no
9  86  1  247  8  182  0.0172  no  no
10a  65  0  162  10  128  0.0131  no  no
10b  24  0  93  1  63  0.0047  no  no
11  17  0  108  1  58  0.0035  no  no
12  145  1  361  18  281  0.0291  no  no
13  22  0  59  2  47  0.0044  no  no
14  26  0  120  1  80  0.0052  no  no
15  28  0  183  1  88  0.0057  no  no
16  83  2  263  9  180  0.0168  no  no
17a  10  0  67  0  34  0.0021  no  no
17b  4  0  20  0  13  0.0008  no  no
18  774  166  1137  516  992  0.1554  no  yes
19  75  1  201  10  153  0.0150  no  no
20  25  0  141  1  72  0.0049  no  no
21  26  0  147  1  71  0.0052  no  no
22  16  0  75  1  48  0.0032  no  no
23  28  0  130  1  91  0.0057  no  no
24  38  0  166  2  120  0.0076  no  no
25  49  0  215  1  139  0.0099  no  no
26  57  0  322  2  188  0.0115  no  no
27  14  0  65  1  38  0.0028  no  no
28  7  0  40  0  23  0.0014  no  no
29  111  1  328  8  269  0.0223  no  yes
30  168  1  364  6  291  0.0337  no  no
31  40  0  171  1  123  0.0080  no  no
32  40  0  182  2  110  0.0080  no  no
33  31  0  106  2  77  0.0063  no  no
34  61  6  128  18  104  0.0123  no  no
35  30  0  150  1  88  0.0060  no  no
36  132  0  550  7  350  0.0266  no  yes
37  162  4  453  23  307  0.0325  no  no
38  8  0  49  0  29  0.0017  no  no
39  270  35  494  148  387  0.0542  no  no
40  126  0  567  4  361  0.0254  no  no
41  41  0  216  2  129  0.0081  no  no
42  29  0  168  1  102  0.0058  no  no
43  15  0  72  1  47  0.0030  no  no
44  89  0  279  6  220  0.0178  no  no
45  10  0  61  0  32  0.0019  no  no
46  35  0  158  1  107  0.0070  no  no
47  45  0  159  2  112  0.0090  no  no
48  5  0  26  0  14  0.0009  no  no
49  4  0  33  0  16  0.0009  no  no
50  82  3  195  10  157  0.0164  no  no
51  57  0  182  6  128  0.0114  no  no
52  15  0  77  0  45  0.0031  no  no
53  13  0  62  0  43  0.0025  no  no
54  201  68  309  116  277  0.0404  no  no
55  12  0  58  0  36  0.0024  no  no
56  37  0  184  1  110  0.0075  no  no
57  25  0  115  1  83  0.0050  no  no
58  163  10  340  54  269  0.0327  no  no
59  37  1  114  4  86  0.0074  no  yes
60  6  0  33  0  19  0.0013  no  no

B.19 Patient 2 SignatureAnalyzer mutational signatures. The total number of somatic mutations was 4979. The exposure columns refer to the mutation counts (mean exposure, 50%) contributed by each signature and the proportion columns refer to the proportion of mutations contributed by each signature. Sig, signature.

Sig  Exposure  0%  100%  2.50%  97.50%  Proportion  Exposure outlier  Proportion outlier
1  615  295  839  450  746.5  0.1234  no  no
2  108  6  201  43  166.0  0.0217  no  no
3  81  0  292  4  204.2  0.0163  no  no
4  76  0  309  4  202.3  0.0153  no  no
5  193  2  669  16  435.7  0.0387  no  no
6  293  11  727  42  578.1  0.0589  no  no
7a  26  0  147  1  81.2  0.0052  no  no
7b  29  0  136  1  96.7  0.0058  no  no
7c  28  0  115  1  82.4  0.0056  no  no
8  71  0  281  4  178.7  0.0143  no  no
9  101  4  255  15  177.0  0.0203  no  no
10a  26  0  130  1  82.1  0.0052  no  no
11  25  0  139  1  76.4  0.0051  no  no
12  46  0  278  2  154.3  0.0091  no  no
13  24  0  94  1  66.6  0.0048  no  no
14  23  0  132  0  81.9  0.0047  no  no
15  22  0  198  0  82.4  0.0045  no  no
16  74  0  221  4  164.5  0.0148  no  no
17a  25  0  106  1  70.9  0.0051  no  no
17b  6  0  29  0  20.3  0.0012  no  no
18  559  379  754  442  669.0  0.1123  no  yes
19  55  4  143  9  111.1  0.0111  no  no
21  13  0  74  0  45.4  0.0025  no  no
22  19  0  76  0  56.3  0.0038  no  no
26  153  7  348  30  281.2  0.0307  no  no
28  14  0  81  0  47.6  0.0028  no  no
30  163  1  398  25  304.2  0.0327  no  no
33  60  9  139  24  104.1  0.0122  no  no
35  42  0  144  2  104.4  0.0084  no  no
36  424  272  602  335  519.0  0.0852  no  yes
37  26  0  185  1  100.0  0.0052  no  no
38  18  0  85  1  54.0  0.0037  no  no
39  290  73  556  129  443.9  0.0583  no  no
40  52  0  234  3  134.3  0.0105  no  no
60  11  0  38  1  23.3  0.0022  no  no
55  16  0  57  1  40.2  0.0032  no  no
44  98  0  345  8  255.4  0.0197  no  no
61  15  0  90  1  47.4  0.0030  no  no
62  17  0  87  1  55.2  0.0033  no  no
63  16  0  100  1  59.7  0.0033  no  no
64  59  0  324  2  187.6  0.0118  no  no
65  55  0  327  1  161.3  0.0111  no  no
66  17  0  126  0  58.2  0.0035  no  no
67  13  0  70  0  43.8  0.0026  no  no
68  47  0  202  3  126.1  0.0094  no  no
69  59  0  216  4  150.6  0.0118  no  no
70  76  0  302  3  201.7  0.0153  no  no
71  108  0  388  11  257.3  0.0216  no  no
72  19  0  105  1  58.3  0.0038  no  no
73  15  0  111  0  55.7  0.0029  no  no
74  48  0  304  1  174.8  0.0097  no  no
75  18  0  88  1  58.5  0.0035  no  no
76  67  0  290  2  212.5  0.0135  no  no
77  19  0  94  1  55.2  0.0038  no  no
78  8  0  40  0  29.2  0.0017  no  no
79  17  0  78  1  58.6  0.0034  no  no
80  246  11  491  70  418.1  0.0495  no  no
81  71  0  179  6  142.7  0.0143  no  no
82  50  0  192  2  140.2  0.0101  no  no
83  13  0  38  2  27.8  0.0026  no  no

B.20 Patient 3 coding mutation summary. BRCA, breast invasive carcinoma; SNV, single nucleotide variant; SV, structural variant; TCGA, The Cancer Genome Atlas

Type of mutation (coding)  Number of events  Percentile - TCGA BRCA  Percentile - all TCGA
SNV  92  93  74
INDEL  4  69  59
SV  248  93rd percentile amongst our local database of 626 cancer cases

B.21 Patient 3 somatic small mutations.
chr  pos  ref  alt  effect type  hugo  hgvs_protein  hgvs_cds  Alt DNA reads  Ref DNA reads
1  150445667  C  T  missense  RPRD2  p.Arg1415Trp  c.4243C>T  19  119
1  152283235  C  G  missense  FLG  p.Arg1376Thr  c.4127G>C  19  91
1  167844392  C  A  missense  ADCY10  p.Arg480Ile  c.1439G>T  14  110
1  171753387  C  T  stop gained  METTL13  p.Gln221*  c.661C>T  63  87
1  211526730  C  T  missense  TRAF5  p.Ser50Leu  c.149C>T  16  132
1  222923392  C  T  missense  FAM177B  p.Pro157Ser  c.469C>T  69  82
1  230503842  C  T  missense  PGBD5  p.Arg58His  c.173G>A  36  158
1  235909784  C  T  missense  LYST  p.Met2608Ile  c.7824G>A  23  132
1  9786997  G  A  missense  PIK3CD  p.Glu1010Lys  c.3028G>A  23  58
1  43738808  G  A  missense  TMEM125  p.Ala139Thr  c.415G>A  12  50
1  53446082  G  C  missense  SCP2  p.Met280Ile  c.840G>C  7  56
1  99772192  G  A  missense  RP4-788L13.1  p.Glu640Lys  c.1918G>A  53  16
1  114247307  G  A  missense  PHTF1  p.Ser595Leu  c.1784C>T  7  54
1  13001252  T  C  missense  PRAMEF6  p.Gln144Arg  c.431A>G  18  53
1  183602300  T  A  splice acceptor +intron  ARPC5  NA  NA  25  87
2  44513181  A  G  missense  SLC3A1  p.Tyr259Cys  c.776A>G  11  64
2  26700064  C  A  missense  OTOF  p.Gln833His  c.2499G>T  5  85
2  99439514  C  T  missense  KIAA1211L  p.Glu408Lys  c.1222G>A  23  74
2  141816500  C  G  missense  LRP1B  p.Glu454Gln  c.1360G>C  7  98
2  166771739  C  A  missense  TTC21B  p.Asp704Tyr  c.2110G>T  27  67
2  179600280  C  T  missense  TTN  p.Glu4965Lys  c.14893G>A  50  43
2  3518580  G  A  missense +splice region  ADI1  p.Ser34Leu  c.101C>T  14  88
2  61412636  G  A  splice acceptor +intron  AHSA2  NA  NA  21  68
2  63283125  G  A  missense  OTX1  p.Asp247Asn  c.739G>A  7  65
3  101483815  A  G  missense  CEP97  p.Glu673Gly  c.2018A>G  10  45
3  178952085  A  G  missense  PIK3CA  p.His1047Arg  c.3140A>G  83  46
3  183776361  C  A  missense  HTR3C  p.Gln236Lys  c.706C>A  18  97
3  186330958  C  T  missense  AHSG  p.Leu10Phe  c.28C>T  49  88
3  195515294  C  G  missense  MUC4  p.Asp1053His  c.3157G>C  20  96
3  197670653  C  T  missense  IQCG  p.Cys93Tyr  c.278G>A  22  109
3  64633662  G  A  missense  ADAMTS9  p.Thr555Ile  c.1664C>T  24  57
3  77684138  G  T  missense  ROBO2  p.Gly1309Val  c.3926G>T  7  62
3  121179001  G  C  missense  POLQ  p.Gln2350Glu  c.7048C>G  22  63
3  141712417  G  A  missense  TFDP2  p.Arg107Trp  c.319C>T  28  70
3  64004546  T  C  missense  PSMD6  p.Tyr222Cys  c.665A>G  8  52
4  8207058  C  T  missense  SH3TC1  p.Ala46Val  c.137C>T  31  26
4  76407851  C  T  missense  RCHY1  p.Arg228Gln  c.683G>A  23  75
4  175896983  C  T  missense  ADAM29  p.His103Tyr  c.307C>T  7  64
4  184629724  C  G  missense  TRAPPC11  p.Gln1079Glu  c.3235C>G  8  48
4  175898243  G  A  missense  ADAM29  p.Glu523Lys  c.1567G>A  13  65
5  145969658  A  C  missense  PPP2R2B  p.Val395Gly  c.1184T>G  7  72
5  53751624  C  A  missense  HSPB3  p.Ala2Glu  c.5C>A  31  84
5  108134086  C  T  missense  FER  p.Ser68Phe  c.203C>T  14  77
5  115814263  C  G  missense  SEMA6A  p.Glu468Gln  c.1402G>C  14  59
5  145969670  C  T  missense  PPP2R2B  p.Arg391Gln  c.1172G>A  44  36
5  157240156  C  G  missense  CLINT1  p.Arg144Ser  c.432G>C  39  42
5  168671762  C  A  missense  SLIT3  p.Gln96His  c.288G>T  5  82
5  127497486  G  T  missense  SLC12A2  p.Leu870Phe  c.2610G>T  8  84
5  153190694  G  A  missense  GRIA1  p.Arg877Gln  c.2630G>A  6  60
6  70733532  A  T  missense  COL19A1  p.Glu347Val  c.1040A>T  27  68
6  15638031  C  T  missense  DTNBP1  p.Glu56Lys  c.166G>A  26  87
6  26124519  C  T  missense  HIST1H2AC  p.Ser20Phe  c.59C>T  36  79
6  57006829  C  T  stop gained  ZNF451  p.Gln314*  c.940C>T  11  133
6  170592141  C  T  missense  DLL1  p.Asp701Asn  c.2101G>A  21  43
6  12124454  G  A  missense  HIVEP1  p.Glu1476Lys  c.4426G>A  51  62
6  31116224  G  A  missense  CCHCR1  p.Ser424Leu  c.1271C>T  10  108
6  32373042  G  T  missense  BTNL2  p.Pro34His  c.101C>A  49  60
6  150067176  G  A  missense  NUP43  p.Ser48Phe  c.143C>T  23  54
6  165792846  G  T  missense  PDE10A  p.His608Asn  c.1822C>A  20  53
7  1785931  C  G  missense  ELFN1  p.Gln567Glu  c.1699C>G  46  44
7  48313147  C  A  stop gained  ABCA13  p.Ser1295*  c.3884C>A  29  107
7  100680120  C  A  missense  MUC17  p.Thr1808Asn  c.5423C>A  3  55
7  102604079  C  T  missense  FBXL13  p.Asp209Asn  c.625G>A  9  63
7  103230136  G  C  missense  RELN  p.Ser1351Cys  c.4052C>G  14  55
8  38845457  C  G  missense +splice region  HTRA4  p.Ser424Cys  c.1271C>G  47  44
8  144990775  C  T  missense  PLEC  p.Arg4542His  c.13625G>A  12  91
8  24254967  G  C  missense +splice region  ADAMDEC1  p.Glu209Gln  c.625G>C  24  37
8  38831876  G  A  missense  HTRA4  p.Glu32Lys  c.94G>A  30  107
8  120435337  G  A  missense  NOV  p.Glu347Lys  c.1039G>A  58  67
9  38595865  C  T  missense  ANKRD18A  p.Arg491His  c.1472G>A  22  93
9  101984014  C  T  missense  ALG2  p.Asp55Asn  c.163G>A  34  68
9  134358812  C  T  stop gained  PRRC2B  p.Gln1814*  c.5440C>T  43  40
9  92220375  G  A  missense  GADD45G  p.Glu28Lys  c.82G>A  11  105
10  26851281  A  C  missense  APBB1IP  p.Asn466His  c.1396A>C  61  38
10  51584644  A  G  missense  NCOA4  p.Asn248Ser  c.743A>G  10  95
10  43691936  C  T  stop gained  RASGEF1A  p.Trp470*  c.1409G>A  7  124
10  75435524  C  G  missense  AGAP5  p.Lys298Asn  c.894G>C  10  73
10  76603082  C  A  missense  KAT6B  p.Ala156Asp  c.467C>A  8  59
10  102743326  C  T  missense  SEMA4G  p.Pro657Leu  c.1970C>T  24  82
10  135381809  C  T  missense  AL161645.1  p.Arg12Gln  c.35G>A  20  75
10  125426277  G  T  missense  GPR26  p.Met118Ile  c.354G>T  36  80
10  75435513  T  G  missense  AGAP5  p.Lys302Thr  c.905A>C  8  74
11  4790388  A  C  missense  OR51F1  p.Tyr254Asp  c.760T>G  36  45
11  46407822  C  T  missense  CHRM4  p.Gly96Ser  c.286G>A  34  63
11  55595388  G  A  missense  OR5L2  p.Glu232Lys  c.694G>A  9  125
11  55606240  G  A  missense  OR5D16  p.Glu5Lys  c.13G>A  15  108
11  64678311  G  A  missense  ATG2A  p.Pro528Ser  c.1582C>T  8  117
11  66114535  G  C  missense  B3GNT1  p.Ser161Trp  c.482C>G  48  93
11  134122772  G  C  missense  THYN1  p.Ser2Trp  c.5C>G  20  39
12  82792988  A  T  missense  METTL25  p.Asn316Tyr  c.946A>T  11  103
12  49375071  C  T  missense  WNT1  p.Ser254Leu  c.761C>T  11  121
12  12672813  G  A  missense  DUSP16  p.Ser117Phe  c.350C>T  20  55
12  46582778  G  T  stop gained  SLC38A1  p.Ser480*  c.1439C>A  17  82
12  51753008  G  A  missense  GALNT6  p.Arg426Cys  c.1276C>T  11  90
12  55886528  G  T  missense  OR6C68  p.Ala123Ser  c.367G>T  11  147
12  93101480  G  A  missense  C12orf74  p.Asp188Asn  c.562G>A  7  89
12  105583567  G  A  missense  APPL2  p.Ser481Leu  c.1442C>T  16  73
12  40114716  T  C  missense  C12orf40  p.Val541Ala  c.1622T>C  13  85
14  53345326  C  T  missense  FERMT2  p.Glu313Lys  c.937G>A  27  59
14  86520877  C  A  splice donor +intron  RP11-1079H9.1  NA  NA  7  70
15  31359358  C  A  missense  TRPM1  p.Asp154Tyr  c.460G>T  47  29
15  48787337  C  T  missense  FBN1  p.Cys887Tyr  c.2660G>A  7  83
15  34653699  G  A  missense  LPCAT4  p.Pro349Ser  c.1045C>T  17  68
16  1389528  C  T  missense  BAIAP3  p.Thr146Met  c.437C>T  41  87
16  3187316  C  A  missense  ZNF213  p.Pro12His  c.35C>A  27  79
16  28847399  C  T  missense  ATXN2L  p.Ser1014Leu  c.3041C>T  14  108
16  57449013  C  T  missense  CCL17  p.Arg31Trp  c.91C>T  7  71
16  737223  G  A  missense  WDR24  p.Arg285Trp  c.853C>T  13  96
16  69810091  G  C  splice acceptor +intron  WWP2  NA  NA  6  98
17  16220120  C  T  stop gained  PIGL  p.Gln180*  c.538C>T  43  20
17  18392276  C  T  stop gained  LGALS9C  p.Gln156*  c.466C>T  19  47
17  7983488  G  T  missense  AC129492.6  p.Ala57Ser  c.169G>T  17  68
17  48593978  G  A  missense  MYCBPAP  p.Glu85Lys  c.253G>A  10  56
17  73809270  G  A  missense  UNK  p.Arg321Gln  c.962G>A  9  59
18  14748464  G  A  missense  ANKRD30B  p.Glu16Lys  c.46G>A  24  34
19  58773851  A  G  missense  ZNF544  p.Thr627Ala  c.1879A>G  53  35
19  22363737  C  G  missense  ZNF676  p.Gly261Ala  c.782G>C  12  100
19  31770377  C  T  missense  TSHZ3  p.Asp108Asn  c.322G>A  42  81
19  56466698  C  G  missense  NLRP8  p.Ser425Cys  c.1274C>G  6  80
19  56733357  C  T  missense  ZSCAN5A  p.Val360Met  c.1078G>A  25  59
19  5661695  G  C  missense  SAFB  p.Glu677Gln  c.2029G>C  24  62
19  21606649  G  C  missense  ZNF493  p.Lys268Asn  c.804G>C  5  90
19  46338356  G  C  missense  SYMPK  p.Thr458Arg  c.1373C>G  28  45
20  32251456  G  A  missense  C20orf144  p.Gly82Glu  c.245G>A  15  87
21  46875805  A  C  missense  COL18A1  p.Ile121Leu  c.361A>C  24  69
X  24024500  A  C  missense  KLHL15  p.Ile104Ser  c.311T>G  21  87
X  3228287  C  T  missense  MXRA5  p.Glu2653Lys  c.7957G>A  17  94
X  5821677  C  T  missense  NLGN4X  p.Val348Ile  c.1042G>A  36  56
X  25031679  C  T  missense  ARX  p.Ala145Thr  c.433G>A  7  95
X  38037538  C  G  missense +splice region  SRPX  p.Asp53His  c.157G>C  62  54
X  141291745  C  T  missense  MAGEC2  p.Arg10His  c.29G>A  4  41
X  6975846  G  A  missense  HDHD1  p.Arg177Cys  c.529C>T  46  60

B.22 Patient 3 somatic structural variants.

Event type1  Event type2  Gene1  Gene2  Flanking DNA reads  Spanning DNA reads  location
fusion  deletion  INPP5F  NA  28  26  10:121506254|10:123077622
fusion  deletion  NA  JAKMIP3  21  5  10:125352994|10:133976963
fusion  deletion  NA  NA  11  16  11:40819743|11:40826018
fusion  deletion  NA  SLC25A21  28  17  14:33400099|14:37510020
fusion  deletion  ITFG1  WWP2  58  51  16:47208829|16:69809286
fusion  deletion  ITFG1  WWP2  58  51  16:47208829|16:69809286
fusion  deletion  CTC-512J12  SERTAD1  20  10  19:44842437|19:40927760
fusion  deletion  CTC-512J12  SERTAD1  20  10  19:44842437|19:40927760
fusion  deletion  CTC-512J12  SERTAD1  20  10  19:44842437|19:40927760
fusion  deletion  NA  PCP4  96  85  21:31620116|21:41290103
fusion  deletion  WDR12  NA  14  12  2:203752640|2:208658423
fusion  deletion  ATP2B2  ATP2B2  0  4  3:10397475|3:10397806
fusion  deletion  LCORL  LCORL  30  35  4:17985703|4:18005255
fusion  deletion  TRAM2  NA  25  16  6:52410815|6:52741023
fusion  deletion  TRAM2  NA  25  16  6:52410815|6:52741023
fusion  deletion  NA  NA  23  13  8:111568164|8:111629899
fusion  deletion  NA  NA  23  13  8:111568164|8:111629899
fusion  deletion  NA  TG  17  18  8:132448501|8:133941833
fusion  deletion  TG  NA  32  22  8:133974900|8:143975474
fusion  deletion  TG  NA  32  22  8:133974900|8:143975474
fusion  deletion  NA  NA  23  12  8:138895620|8:140137129
fusion  deletion  NA  NA  23  12  8:138895620|8:140137129
fusion  deletion  NA  NA  23  12  8:138895620|8:140137129
fusion  deletion  NRG1  NA  3  95  8:31505948|8:114792909
fusion  deletion  NA  NA  60  33  8:64185278|8:65211898
fusion  deletion  PDE7A  NA  50  25  8:66689454|8:79389766
fusion  deletion  PDE7A  NA  50  25  8:66689454|8:79389766
fusion  deletion  NA  NA  23  27  8:72108711|8:73227506
fusion  deletion  RP11-383H13  NA  4  25  8:72864883|8:78243355
fusion  deletion  NA  NA  4  33  8:78426809|8:78458988
fusion  deletion  NA  NA  40  74  8:78459166|8:79332689
fusion  deletion  NA  NA  134  59  8:85084012|8:91564587
fusion  deletion  NA  NA  134  59  8:85084012|8:91564587
fusion  deletion  NA  NA  47  27  X:130657294|X:138182242
fusion  deletion  RP5-972B16  RP5-972B16  60  28  X:38404812|X:38413346
fusion  duplication  NA  NA  24  20  10:121387754|10:121438137
fusion  duplication  NA  NA  34  57  10:121437485|10:122689172
fusion  duplication  NA  NA  34  57  10:121437485|10:122689172
fusion  duplication  NA  FGFR2  24  20  10:122548224|10:123255861
fusion  duplication  NA  FGFR2  24  20  10:122548224|10:123255861
fusion  duplication  BICC1  ANK3  47  30  10:60553625|10:62040434
fusion  duplication  BICC1  ANK3  47  30  10:60553625|10:62040434
fusion  duplication  REEP3  NA  50  31  10:65342152|10:65512378
fusion  duplication  REEP3  NA  50  31  10:65342152|10:65512378
fusion  duplication  NA  NA  42  19  10:8845361|10:9324611
fusion  duplication  NA  NA  50  22  10:92278137|10:93411483
fusion  duplication  NA  NA  50  22  10:92278137|10:93411483
fusion  duplication  PLCE1  PDE6C  55  30  10:96076074|10:95417425
fusion  duplication  PLCE1  PDE6C  55  30  10:96076074|10:95417425
fusion  duplication  PLCE1  PDE6C  55  30  10:96076074|10:95417425
fusion  duplication  NA  NA  38  41  11:29354330|11:29546607
fusion  duplication  NA  NA  38  41  11:29354330|11:29546607
fusion  duplication  NA  CACNA2D4  50  31  12:1796972|12:2012578
fusion  duplication  RASSF3  TBK1  24  13  12:65041922|12:64882424
fusion  duplication  RASSF3  TBK1  24  13  12:65041922|12:64882424
fusion  duplication  LATS2  NA  38  36  13:21549397|13:21760382
fusion  duplication  WDR25  NA  51  16  14:100912344|14:102030807
fusion  duplication  NA  NA  39  16  14:29341990|14:29415395
fusion  duplication  NA  NA  39  16  14:29341990|14:29415395
fusion  duplication  STRN3  BAZ1A  44  19  14:31466248|14:35273371
fusion  duplication  STRN3  BAZ1A  44  19  14:31466248|14:35273371
fusion  duplication  NA  FANCI  55  30  15:86645004|15:89808246
fusion  duplication  NA  FANCI  55  30  15:86645004|15:89808246
fusion  duplication  NETO2  NA  67  31  16:47142923|16:64177386
fusion  duplication  SRCIN1  DNAJC7  51  23  17:36714341|17:40132098
fusion  duplication  SRCIN1  DNAJC7  51  23  17:36714341|17:40132098
fusion  duplication  PRX  ARHGAP35  34  9  19:40917129|19:47432289
fusion  duplication  PRX  ARHGAP35  34  9  19:40917129|19:47432289
fusion  duplication  ATP2B4  NA  63  37  1:203634945|1:206949589
fusion  duplication  HHAT  KCNH1  13  13  1:210796152|1:211016172
fusion  duplication  HHAT  KCNH1  13  13  1:210796152|1:211016172
fusion  duplication  NA  ZMYM4  44  30  1:31261705|1:35811844
fusion  duplication  NA  DNAJC6  59  27  1:65479590|1:65815126
fusion  duplication  RBMS1  NA  30  18  2:161343558|2:161554603
fusion  duplication  AGPS  PDE11A  22  12  2:178293675|2:178569801
fusion  duplication  AGPS  PDE11A  22  12  2:178293675|2:178569801
fusion  duplication  AGPS  PDE11A  22  12  2:178293675|2:178569801
fusion  duplication  DNAH6  NA  62  21  2:84986785|2:85319476
fusion  duplication  NA  COL6A6  38  27  3:129897914|3:130301859
fusion  duplication  NA  NA  54  24  3:172144557|3:172564862
fusion  duplication  FAM19A1  NA  40  17  3:68540207|3:70224505
fusion  duplication  NA  NA  25  17  4:18514138|4:19309328
fusion  duplication  FAM193A  RNF4  29  12  4:2659350|4:2496111
fusion  duplication  FAM193A  RNF4  29  12  4:2659350|4:2496111
fusion  duplication  BMPR1B  BMPR1B  51  8  4:95772023|4:95970082
fusion  duplication  STPG2  PPA2  21  14  4:98824701|4:106335084
fusion  duplication  STPG2  PPA2  21  14  4:98824701|4:106335084
fusion  duplication  NA  SNX24  45  23  5:121910890|5:122235545
fusion  duplication  RP11-73O6  RP11-73O6  47  15  6:127171803|6:127521211
fusion  duplication  CMTR1  NA  58  28  6:37420103|6:37751970
fusion  duplication  NA  NA  29  26  8:115302795|8:116744252
fusion  duplication  NA  NA  29  26  8:115302795|8:116744252
fusion  duplication  EXT1  NA  28  7  8:118905561|8:125766861
fusion  duplication  NSMCE2  NA  17  15  8:126208682|8:138538251
fusion  duplication  NA  NA  55  27  8:128258045|8:128918826
fusion  duplication  NA  NA  142  82  8:1392166|8:107528871
fusion  duplication  NA  PXDNL  57  28  8:52183854|8:52388035
fusion  duplication  NA  PXDNL  57  28  8:52183854|8:52388035
fusion  duplication  FAM110B  NA  61  36  8:58952119|8:59240203
fusion  duplication  NA  NA  22  17  8:613267|8:5420289
fusion  duplication  CHD7  NA  50  27  8:61593787|8:71877158
fusion  duplication  PDE7A  RP11-383H13  51  37  8:66688794|8:72758899
fusion  duplication  PDE7A  RP11-383H13  51  37  8:66688794|8:72758899
fusion  duplication  CPA6  NA  57  29  8:68370190|8:79332997
fusion  duplication  NA  NA  99  68  8:75010634|8:112319604
fusion  duplication  NA  NA  61  34  9:3246131|9:3417642
fusion  duplication  NTRK2  NA  48  22  9:87456099|9:89610517
fusion  duplication  NTRK2  NA  48  22  9:87456099|9:89610517
fusion  duplication  NA  NA  68  48  X:130506123|X:131317323
fusion  duplication  RP5-972B16  RP5-972B16  82  41  X:38386677|X:38450280
fusion  duplication  RP5-972B16  RP5-972B16  82  41  X:38386677|X:38450280
fusion  inversion  NA  STK32C  21  14  10:124203323|10:134085427
fusion  inversion  NA  ANO1  326  166  11:69053576|11:69956486
fusion  inversion  NA  NA  92  127  11:70746973|11:70763459
fusion  inversion  WDR25  WDR25  30  11  14:100912834|14:100922656
fusion  inversion  G2E3  NA  41  14  14:31069925|14:31252744
fusion  inversion  G2E3  NA  41  14  14:31069925|14:31252744
fusion  inversion  ONECUT2  NA  40  29  18:55118638|18:58966382
fusion  inversion  PRX  SYMPK  13  15  19:40915333|19:46350245
fusion  inversion  PRX  SYMPK  13  15  19:40915333|19:46350245
fusion  inversion  CTC-512J12  ARHGAP35  27  18  19:44848488|19:47428441
fusion  inversion  CTC-512J12  ARHGAP35  27  18  19:44848488|19:47428441
fusion  inversion  CTC-512J12  ARHGAP35  27  18  19:44848488|19:47428441
fusion  inversion  BCL9  NA  52  19  1:147097325|1:150613821
fusion  inversion  NA  PCP4  189  81  21:30904757|21:41289323
fusion  inversion  NA  PCP4  189  81  21:30904757|21:41289323
fusion  inversion  GRIK1  NA  196  67  21:31070518|21:41310987
fusion  inversion  TIAM1  NA  131  97  21:32922041|21:41314479
fusion  inversion  RIPK4  NA  66  6  21:43164202|21:43217835
fusion  inversion  ZBTB21  NA  55  29  21:43420629|21:45229202
fusion  inversion  ZBTB21  NA  55  29  21:43420629|21:45229202
fusion  inversion  UBASH3A  TRPM2  126  152  21:43833842|21:45821383
fusion  inversion  UBASH3A  TRPM2  126  152  21:43833842|21:45821383
fusion  inversion  PHLDB2  NA  45  40  3:111672718|3:116433851
fusion  inversion  PHLDB2  NA  45  40  3:111672718|3:116433851
fusion  inversion  PHLDB2  NA  45  40  3:111672718|3:116433851
fusion  inversion  MANBA  NA  25  14  4:103636270|4:130748304
fusion  inversion  MANBA  NA  25  14  4:103636270|4:130748304
fusion  inversion  NA  PAPSS1  58  24  4:104134101|4:108635628
fusion  inversion  NA  NA  24  25  4:107285215|4:112130996
fusion  inversion  ANGPT1  NA  2  23  8:108473559|8:112319490
fusion  inversion  ANGPT1  NA  14  50  8:108479404|8:112695597
fusion  inversion  ANGPT1  NA  4  39  8:108480165|8:112695694
fusion  inversion  TRPS1  NA  42  24  8:116675646|8:124185229
fusion  inversion  NA  NSMCE2  28  15  8:116948967|8:126153475
fusion  inversion  NSMCE2  NA  49  0  8:126139829|8:132453964
fusion  inversion  NA  ST3GAL1  33  0  8:130335971|8:134508635
fusion  inversion  ST3GAL1  NA  27  13  8:134524848|8:140168596
fusion  inversion  CDCA2  NA  63  31  8:25322363|8:30240740
fusion  inversion  CDCA2  NA  63  31  8:25322363|8:30240740
fusion  inversion  NA  UNC5D  51  31  8:25365701|8:35100689
fusion  inversion  NA  NRG1  67  29  8:25367116|8:32597271
fusion  inversion  NA  NA  35  45  8:61967065|8:78458282
fusion  inversion  ASPH  NA  56  1  8:62627070|8:68285216
fusion  inversion  PDE7A  NA  90  67  8:66687375|8:71876775
fusion  inversion  NA  NA  41  32  8:73226764|8:78457260
fusion  inversion  NA  NA  67  21  8:75843889|8:77535459
fusion  inversion  NA  NA  63  35  8:76920450|8:78457760
fusion  inversion  NA  CNGB3  140  65  8:76924005|8:87692070
fusion  inversion  NA  NA  39  69  8:77270120|8:78459853
fusion  inversion  NA  NA  147  62  8:807634|8:1428356
fusion  inversion  NA  F8  96  41  X:130934776|X:154245366
fusion  inversion  NA  F8  96  41  X:130934776|X:154245366
fusion  inversion  F9  NA  52  30  X:138636858|X:146937243
fusion  translocation  NA  HSF5  31  51  10:108991269|17:56527097
fusion  translocation  APBB1IP  NA  4  25  10:26852979|16:72357454
fusion  translocation  NA  NA  2  1  10:54396005|14:103820452
fusion  translocation  NA  NA  1  23  11:31917053|X:3064239
fusion  translocation  NA  DSCAM  1  6  11:34397386|21:41957374
fusion  translocation  NA  RTTN  2  15  12:31406636|18:67868552
fusion  translocation  LIN7A  NA  6  5  12:81315235|X:27423677
fusion  translocation  LATS2  NA  32  32
13:21549631|19:16019314 fusion translocation NA NA 62 22 13:21756938|19:15870600 fusion translocation NA CCDC178 1 28 15:66668151|18:30923983 fusion translocation NA CCDC178 1 28 15:66668151|18:30923983 fusion translocation NETO2 BACE2 70 29 16:47146287|21:42603857 fusion translocation NETO2 BACE2 70 29 16:47146287|21:42603857 fusion translocation NAPG NA 13 20 18:10552686|X:152964040   140 fusion translocation ARHGEF2 NA 1 5 1:155924395|13:44368134 fusion translocation NAV1 NA 120 1 1:201745265|5:152652359 fusion translocation NAV1 NA 120 1 1:201745265|5:152652359 fusion translocation KCNK2 NA 2 7 1:215285275|6:156507203 fusion translocation SUSD4 NA 13 10 1:223491188|6:148195789 fusion translocation SUSD4 NA 13 10 1:223491188|6:148195789 fusion translocation SMYD3 NA 11 12 1:245989321|3:155883101 fusion translocation SMYD3 NA 11 12 1:245989321|3:155883101 fusion translocation SMYD3 NA 11 12 1:245989321|3:155883101 fusion translocation NA FTO 7 41 1:43252985|16:53874981 fusion translocation NA FTO 7 41 1:43252985|16:53874981 fusion translocation AGBL4 RPS6KA2 5 0 1:49357678|6:166930941 fusion translocation AGBL4 RPS6KA2 5 0 1:49357678|6:166930941 fusion translocation NA MLH3 3 2 1:77100369|14:75499581 fusion translocation BACE2 NETO2 52 34 21:42607679|16:47144192 fusion translocation BACE2 NETO2 52 34 21:42607679|16:47144192 fusion translocation NA RTN1 4 27 2:15163766|14:60139664 fusion translocation NA NA 54 16 2:189095624|7:22716668 fusion translocation NA NA 50 24 2:189097893|12:55726686 fusion translocation NA GNE 3 25 2:194539252|9:36223516 fusion translocation NA GNE 3 25 2:194539252|9:36223516 fusion translocation ERBB4 NA 2 31 2:213168117|3:94261462 fusion translocation PNKD NA 5 6 2:219198684|Y:15683653 fusion translocation PNKD NA 5 6 2:219198684|Y:15683653 fusion translocation XDH NA 18 8 2:31559681|9:2997920 fusion translocation XDH NA 18 8 2:31559681|9:2997920 fusion translocation ALLC NA 3 21 2:3718461|5:92677463 fusion translocation SOCS5 NA 5 36 
2:46928951|8:136416502   141 fusion translocation CD80 CDCA2 48 13 3:119253446|8:25361860 fusion translocation CD80 CDCA2 48 13 3:119253446|8:25361860 fusion translocation NA NA 48 31 3:120277305|8:92708594 fusion translocation TRPC1 NA 3 9 3:142460039|5:151760916 fusion translocation NA SLC2A9 3 26 3:155040675|4:9853868 fusion translocation NA NA 6 18 3:162025903|10:25957326 fusion translocation NA MPP6 4 60 3:176143034|7:24678256 fusion translocation NA NA 13 36 3:181209606|10:89373681 fusion translocation NA NA 13 36 3:181209606|10:89373681 fusion translocation NA NA 13 36 3:181209606|10:89373681 fusion translocation NA ARSB 1 37 3:181490515|5:78253759 fusion translocation NA ARSB 1 37 3:181490515|5:78253759 fusion translocation UTS2B PKD1L1 1 5 3:190999464|7:47895504 fusion translocation UTS2B PKD1L1 1 5 3:190999464|7:47895504 fusion translocation UTS2B PKD1L1 1 5 3:190999464|7:47895504 fusion translocation UTS2B NA 1 4 3:190999464|7:82271836 fusion translocation NA NA 3 1 3:25031262|11:13813018 fusion translocation NA NA 3 1 3:25031262|11:13813018 fusion translocation GBE1 NA 4 11 3:81678169|11:41831810 fusion translocation GBE1 NA 4 11 3:81678169|11:41831810 fusion translocation NA WDR7 43 24 4:104139324|18:54366896 fusion translocation TACR3 NA 36 34 4:104589989|X:145476015 fusion translocation ARHGEF38 NA 1 50 4:106507402|10:86384131 fusion translocation NA NA 23 14 4:107280417|18:35549789 fusion translocation CCDC109B GAB3 87 58 4:110524024|X:153916995 fusion translocation CCDC109B GAB3 87 58 4:110524024|X:153916995 fusion translocation NA TCF4 27 13 4:112130638|18:53229600 fusion translocation NA NA 2 10 4:119296375|10:56902369   142 fusion translocation NA CDH7 33 34 4:134528790|18:63536333 fusion translocation NA NA 37 62 4:166468026|8:88410425 fusion translocation CCDC110 ARHGEF28 2 8 4:186382160|5:73216370 fusion translocation CCDC110 ARHGEF28 2 8 4:186382160|5:73216370 fusion translocation NA F8 74 48 4:82233085|X:154244601 fusion translocation NA NA 
44 15 4:82949213|18:35271423 fusion translocation NA NA 34 24 4:87039|12:58884478 fusion translocation NA NA 34 24 4:87039|12:58884478 fusion translocation NA NA 1 44 5:162239054|X:33302850 fusion translocation NA NA 3 2 5:21651574|6:40731415 fusion translocation NA NRXN3 1 19 5:62500890|14:79723858 fusion translocation C5orf49 NA 1 5 5:7846053|22:30909148 fusion translocation C5orf49 NA 1 5 5:7846053|22:30909148 fusion translocation NA LAMA2 1 16 5:83127432|6:129387494 fusion translocation NA LAMA2 1 16 5:83127432|6:129387494 fusion translocation NA NA 19 27 5:8432628|21:37321084 fusion translocation NA NA 19 27 5:8432628|21:37321084 fusion translocation NA NA 19 27 5:8432628|21:37321084 fusion translocation MCTP1 RP11-73O6 4 16 5:94109622|6:126423676 fusion translocation MCTP1 RP11-73O6 4 16 5:94109622|6:126423676 fusion translocation SLC16A10 NA 4 2 6:111484346|12:58083260 fusion translocation NA LRRC69 2 26 6:133160107|8:92153186 fusion translocation PDE7B DOCK1 4 9 6:136352901|10:129190306 fusion translocation PDE7B DOCK1 4 9 6:136352901|10:129190306 fusion translocation NA TTC28 10 1 6:156460598|22:29065965 fusion translocation NA HACE1 1 21 6:19236515|6:105222269   143 fusion translocation NA NPLOC4 10 12 6:23978344|17:79578012 fusion translocation NA C20orf194 1 6 6:54445472|20:3242408 fusion translocation NA NA 3 15 6:56134168|11:27039982 fusion translocation NA ABCA10 1 5 6:66816124|17:67206305 fusion translocation FAM135A NA 60 38 6:71186652|8:91728366 fusion translocation FAM135A NA 60 38 6:71186652|8:91728366 fusion translocation FAM135A NA 49 46 6:71188819|8:93492489 fusion translocation SMAP1 TMEM67 51 41 6:71482396|8:94823457 fusion translocation SMAP1 TMEM67 51 41 6:71482396|8:94823457 fusion translocation PPP1R9A NA 8 28 7:94779065|11:39115802 fusion translocation NA TSPAN11 6 19 7:95308530|12:31120731 fusion translocation NA ENOX1 6 10 7:97067538|13:43822016 fusion translocation NA NA 152 87 8:1100547|21:44589025 fusion translocation NA TMPRSS3 
110 46 8:112319362|21:43815930 fusion translocation NA TMPRSS3 156 123 8:112319831|21:43815992 fusion translocation NA NA 45 30 8:112695527|21:27089960 fusion translocation CSMD3 NA 146 88 8:113897658|21:30769076 fusion translocation CSMD3 NA 146 88 8:113897658|21:30769076 fusion translocation NA NA 33 5 8:138838856|21:15718733 fusion translocation NA NA 33 5 8:138838856|21:15718733 fusion translocation SGCZ TTC7B 3 6 8:14343340|14:91185180 fusion translocation SGCZ TTC7B 3 6 8:14343340|14:91185180 fusion translocation DLGAP2 DGKH 8 31 8:1543543|13:42726787 fusion translocation DLGAP2 DGKH 8 31 8:1543543|13:42726787 fusion translocation DLGAP2 DGKH 8 31 8:1543543|13:42726787 fusion translocation ARHGEF10 NA 29 15 8:1857626|21:25267360 fusion translocation KBTBD11 NA 180 96 8:1934787|21:30371771 fusion translocation NA NA 162 59 8:5421506|8:111567841   144 fusion translocation NA PCP4 141 93 8:5421811|21:41289800 fusion translocation NA GRIK1 104 72 8:5664739|21:31134541 fusion translocation NA GRIK1 104 72 8:5664739|21:31134541 fusion translocation NA NA 65 56 8:75010634|21:26637221 fusion translocation NA NA 65 56 8:75010634|21:26637221 fusion translocation NA DSCAM 75 98 8:808234|21:41867641 fusion translocation NA DSCAM 75 98 8:808234|21:41867641 fusion translocation NA TMPRSS3 175 86 8:808387|21:43816502 fusion translocation NA TMPRSS3 175 86 8:808387|21:43816502 fusion translocation IMPA1 SHANK2 48 66 8:82591570|11:70444180 fusion translocation IMPA1 SHANK2 48 66 8:82591570|11:70444180 fusion translocation NA NA 19 15 8:87209885|21:37262122 fusion translocation NA NA 19 15 8:87209885|21:37262122 fusion translocation SDC2 RFX3 51 30 8:97547036|9:3248257 fusion translocation SDC2 RFX3 51 30 8:97547036|9:3248257 fusion translocation SDC2 RFX3 52 25 8:97595075|9:3248281 fusion translocation SDC2 RFX3 52 25 8:97595075|9:3248281 fusion translocation NA TTC28 25 1 9:8094868|22:29065790    145 B.23 Patient 3 COSMIC mutational signatures. 
The total number of somatic mutations was 20236. The exposure columns refer to the mutation counts (mean exposure, 50%) contributed by each signature and the proportion columns refer to the proportion of mutations contributed by each signature. Sig, signature.

Sig  Exposure  0%  100%  2.50%  97.50%  Proportion  Exposure outlier  Proportion outlier
1  1152  812  1495  967  1329  0.0569  no  no
2  6621  6252  6894  6404  6830  0.3272  yes  yes
3  304  1  781  36  647  0.015  no  no
4  102  0  459  5  289  0.005  no  no
5  2770  1548  3692  2024  3436  0.1369  no  no
6  49  0  292  2  160  0.0024  no  no
7  86  0  326  2  218  0.0042  no  no
8  1464  970  1921  1121  1790  0.0723  no  no
9  593  351  884  419  784  0.0293  no  no
10  518  295  763  375  670  0.0256  yes  no
11  74  0  265  4  190  0.0036  no  no
12  256  3  682  38  514  0.0127  no  no
13  3512  3243  3768  3350  3663  0.1736  yes  yes
14  112  0  383  6  258  0.0055  yes  no
15  20  0  117  1  71  0.001  no  no
16  308  4  901  28  690  0.0152  no  no
17  4  0  25  0  16  0.0002  no  no
18  1735  1412  2166  1545  1947  0.0857  yes  yes
19  39  0  242  1  144  0.0019  no  no
20  63  1  292  3  192  0.0031  no  no
21  31  0  221  1  101  0.0015  no  no
22  13  0  62  0  43  0.0006  no  no
23  23  0  153  1  84  0.0012  no  no
24  27  0  138  1  96  0.0014  no  no
25  67  0  327  2  205  0.0033  no  no
26  60  0  306  3  184  0.003  no  no
27  100  24  185  50  153  0.0049  no  no
28  8  0  60  0  29  0.0004  no  no
29  34  0  185  1  121  0.0017  no  no
30  92  0  397  3  269  0.0046  no  no

B.24 Patient 3 SigProfiler mutational signatures. The total number of somatic mutations was 20236. The exposure columns refer to the mutation counts (mean exposure, 50%) contributed by each signature and the proportion columns refer to the proportion of mutations contributed by each signature. Sig, signature.
Sig  Exposure  0%  100%  2.50%  97.50%  Proportion  Exposure outlier  Proportion outlier
1  714  529  871  607  820  0.0353  no  no
2  5504  5122  5802  5302  5701  0.2720  yes  yes
3  106  1  723  3  336  0.0052  no  no
4  105  0  537  3  334  0.0052  no  no
5  487  3  1760  29  1311  0.0241  no  no
6  81  0  378  4  249  0.0040  no  no
7a  169  0  527  17  380  0.0084  yes  no
7b  75  0  373  1  235  0.0037  yes  no
7c  136  17  289  52  241  0.0067  no  no
7d  24  0  116  1  75  0.0012  no  no
8  469  73  950  166  772  0.0232  no  no
9  152  6  534  27  297  0.0075  no  no
10a  108  1  303  8  250  0.0053  no  no
10b  377  223  542  280  486  0.0186  yes  yes
11  24  0  177  0  90  0.0012  no  no
12  375  26  771  122  619  0.0185  no  no
13  3909  3693  4098  3781  4044  0.1932  yes  yes
14  135  2  367  14  277  0.0067  yes  yes
15  72  0  240  4  179  0.0035  no  no
16  369  84  768  190  526  0.0182  no  no
17a  6  0  38  0  24  0.0003  no  no
17b  6  0  27  0  19  0.0003  no  no
18  1330  537  2068  813  1801  0.0657  yes  yes
19  388  159  574  255  532  0.0192  yes  no
20  43  0  240  1  139  0.0021  no  no
21  50  0  188  2  134  0.0025  no  no
22  93  1  265  8  191  0.0046  no  no
23  24  0  143  1  94  0.0012  no  no
24  30  0  167  1  107  0.0015  no  no
25  195  0  626  11  434  0.0096  no  no
26  90  0  444  3  284  0.0045  no  no
27  36  0  120  1  85  0.0018  no  no
28  12  0  69  0  39  0.0006  no  no
29  88  0  430  3  273  0.0043  no  no
30  395  59  762  165  642  0.0195  no  no
31  42  0  202  2  135  0.0021  no  no
32  235  17  497  80  414  0.0116  no  no
33  38  0  133  2  93  0.0019  no  no
34  223  89  346  147  297  0.0110  no  no
35  21  0  118  1  82  0.0011  no  no
36  659  59  1463  136  1210  0.0326  yes  yes
37  576  178  924  325  827  0.0285  no  no
38  104  0  312  10  217  0.0051  yes  yes
39  627  279  945  416  837  0.0310  no  no
40  315  3  1202  17  877  0.0156  no  no
41  68  0  337  2  219  0.0034  no  no
42  30  0  155  1  113  0.0015  no  no
43  10  0  68  0  31  0.0005  no  no
44  68  0  331  2  224  0.0034  no  no
45  47  0  242  3  147  0.0023  yes  no
46  43  0  271  2  145  0.0021  no  no
47  56  0  247  3  159  0.0028  no  no
48  4  0  27  0  12  0.0002  no  no
49  25  0  73  1  58  0.0013  yes  no
50  63  0  245  2  179  0.0031  no  no
51  205  9  377  66  327  0.0101  no  no
52  27  0  208  1  90  0.0014  no  no
53  36  0  170  1  105  0.0018  no  no
54  148  13  294  43  255  0.0073  no  no
55  7  0  42  0  27  0.0004  no  no
56  155  0  520  7  409  0.0076  yes  no
57  62  0  255  3  174  0.0031  no  no
58  128  0  565  8  341  0.0063  no  no
59  31  0  162  1  93  0.0016  no  no
60  5  0  31  0  17  0.0002  no  no

B.25 Patient 3 SignatureAnalyzer mutational signatures. The total number of somatic mutations was 20236. The exposure columns refer to the mutation counts (mean exposure, 50%) contributed by each signature and the proportion columns refer to the proportion of mutations contributed by each signature. Sig, signature.

Sig  Exposure  0%  100%  2.50%  97.50%  Proportion  Exposure outlier  Proportion outlier
1  742  363  975  530  920.2  0.0366  no  no
2  4399  3902  4770  4162  4631.7  0.2174  yes  yes
3  226  1  575  28  455.4  0.0112  no  no
4  264  1  892  43  570.5  0.0130  no  no
5  541  1  1200  127  967.7  0.0268  no  no
6  195  2  699  9  494.4  0.0096  no  no
7a  73  0  353  3  223.8  0.0036  yes  no
7b  69  1  351  3  207.8  0.0034  no  no
7c  76  0  329  3  202.8  0.0038  no  no
8  311  22  687  69  572.2  0.0153  no  no
9  273  101  504  152  399.0  0.0135  no  no
10a  44  0  263  1  144.0  0.0022  no  no
11  123  0  377  6  272.9  0.0061  yes  no
12  100  1  466  4  290.5  0.0049  no  no
13  5389  4916  5756  5100  5652.0  0.2663  yes  yes
14  40  0  232  2  134.4  0.0020  no  no
15  29  0  184  1  104.5  0.0015  no  no
16  434  179  711  277  586.9  0.0214  no  no
17a  28  0  112  1  81.1  0.0014  no  no
17b  10  0  61  0  33.2  0.0005  no  no
18  628  371  899  457  809.4  0.0310  no  no
19  142  4  299  38  251.1  0.0070  no  no
21  21  0  119  1  72.6  0.0010  no  no
22  156  50  312  77  239.1  0.0077  no  no
26  128  0  452  10  285.2  0.0063  no  no
28  23  0  112  1  78.1  0.0012  no  no
30  272  4  729  30  524.5  0.0134  no  no
33  44  0  129  2  90.3  0.0022  no  no
35  21  0  134  0  75.6  0.0010  no  no
36  1447  1074  1768  1238  1628.6  0.0715  yes  yes
37  265  1  741  34  543.2  0.0131  yes  no
38  218  77  363  139  300.3  0.0108  yes  yes
39  331  6  797  61  627.9  0.0164  no  no
40  333  1  675  98  548.2  0.0165  no  no
60  10  0  37  1  22.7  0.0005  no  no
55  10  0  49  0  33.9  0.0005  no  no
44  71  0  300  1  217.2  0.0035  no  no
61  33  0  228  1  110.7  0.0016  no  no
62  165  0  457  13  360.0  0.0082  yes  yes
63  26  0  195  1  97.4  0.0013  no  no
64  130  1  513  5  354.1  0.0064  no  no
65  143  0  658  4  433.3  0.0071  no  no
66  25  0  150  1  90.9  0.0012  no  no
67  25  0  170  0  92.6  0.0012  no  no
68  212  3  524  30  444.0  0.0105  no  no
69  253  1  1162  10  748.5  0.0125  no  no
70  67  0  387  1  234.5  0.0033  no  no
71  384  17  820  137  626.5  0.0190  no  no
72  31  0  122  1  96.6  0.0015  no  no
73  53  0  279  1  159.9  0.0026  yes  no
74  78  0  403  2  240.6  0.0038  no  no
75  134  1  509  7  322.6  0.0066  yes  no
76  169  1  659  8  432.8  0.0084  no  no
77  25  0  142  1  89.9  0.0012  no  no
78  24  0  161  1  77.6  0.0012  no  no
79  23  0  106  1  75.0  0.0011  no  no
80  431  65  903  127  707.9  0.0213  no  no
81  224  45  405  102  348.4  0.0111  no  no
82  45  0  323  2  148.5  0.0022  no  no
83  48  12  92  26  73.0  0.0024  no  no

B26. EthSeq predicted ethnicity of patients with MUTYH germline variants. The cDNA variant description relates to the NM_001048171 transcript and the amino acid variant description to the NP_001041636 protein. All variants are in the HUGO gene MUTYH. ACC, Adrenocortical Carcinoma; ATRT, Atypical Teratoid/Rhabdoid Tumor; EAS, East Asian; EUR, European; IDC, Breast Invasive Ductal Carcinoma; LUAD, Lung Adenocarcinoma; NA, not available; PAAD, Pancreatic Adenocarcinoma; RCC, renal cell carcinoma; SAS, South Asian; ULMS, Uterine Leiomyosarcoma.
ID  Tumour type  cDNA variant  AA variant  Effect  Genomic position  Ethnicity  Type ethnicity  Ethnicity contribution
1  PAAD  c.996G>A; c.815G>A  p.Ser332Ser; p.Gly272Glu  Synonymous; Missense  chr1:45797481C>T; chr1:45797914C>T  EAS  CLOSEST  EAS(78.37%)|SAS(21.63%)
2  IDC  c.934-2A>G  splice_site  Splice acceptor  chr1:45797760T>C  EAS  INSIDE  NA
3  IDC  c.1145G>A  p.Gly382Asp  Missense  chr1:45797228C>T  EUR  INSIDE  NA
4  ACC  c.1145G>A  p.Gly382Asp  Missense  chr1:45797228C>T  EUR  INSIDE  NA
5  ATRT  c.536A>G  p.Tyr179Cys  Missense  chr1:45798475T>C  EUR  INSIDE  NA
6  LUAD  c.934-2A>G  splice_site  Splice acceptor  chr1:45797760T>C  EAS  INSIDE  NA
7  RCC  c.536A>G  p.Tyr179Cys  Missense  chr1:45798475T>C  EUR  CLOSEST  EUR(81.17%)|SAS(18.83%)
8  IDC  c.536A>G  p.Tyr179Cys  Missense  chr1:45798475T>C  EUR  CLOSEST  EUR(72.96%)|SAS(27.04%)
9  ULMS  c.934-2A>G  splice_site  Splice acceptor  chr1:45797760T>C  EAS  CLOSEST  EUR(19.2%)|EAS(42.24%)|SAS(38.55%)
10  PAAD  c.934-2A>G  splice_site  Splice acceptor  chr1:45797760T>C  EAS  INSIDE  NA
11  IDC  c.934-2A>G  splice_site  Splice acceptor  chr1:45797760T>C  EAS  INSIDE  NA
12  LUAD  c.934-2A>G  splice_site  Splice acceptor  chr1:45797760T>C  EAS  CLOSEST  EAS(93.44%)|SAS(6.56%)

Appendix C Chapter 3

C1. Coding mutation summary. SNV, single nucleotide variant; SV, structural variant; TCGA, The Cancer Genome Atlas

Type of mutation  Number of events  Percentile
SNV  35  44 (amongst all TCGA cancers)
indels  3  38 (amongst all TCGA cancers)
SV  40  21 (amongst our local database of 584 diverse cancer cases)

C2. Somatic small mutations

chr  pos  ref  alt  hugo  Mutation support  zygosity  Eff type  hgvs_cds  hgvs_protein
1  46106067  G  A  GPBP1L1  '1/2'  het  missense  c.559C>T  p.Pro187Ser
1  184020992  C  A  TSEN15  '1/2'  het  missense  c.103C>A  p.Pro35Thr
2  1893188  G  A  MYT1L  '0/2'  sub  missense  c.2339C>T  p.Pro780Leu
2  26204120  T  A  KIF3C  '1/2'  het  missense  c.667A>T  p.Thr223Ser
2  141032086  C  T  LRP1B  '1/2'  het  missense  c.13049G>A  p.Arg4350His
2  176965326  G  A  HOXD12  '1/2'  het  missense  c.679G>A  p.Gly227Arg
2  177053900  C  T  HOXD1  '1/2'  het  missense  c.371C>T  p.Ala124Val
2  215865636  G  A  ABCA12  '1/2'  het  missense  c.2972C>T  p.Thr991Ile
3  49405964  C  A  RHOA  '1/1'  hom  missense  c.174G>T  p.Trp58Cys
3  49760112  C  G  GMPPB  '1/1'  hom  missense  c.478G>C  p.Val160Leu
3  101578211  A  T  NFKBIZ  '1/4'  het  missense  c.1853A>T  p.Tyr618Phe
5  16877756  C  A  MYO10  '1/2'  het  missense  c.82G>T  p.Ala28Ser
6  41889465  C  A  BYSL  '1/2'  het  missense  c.165C>A  p.Ser55Arg
6  42162535  C  A  GUCA1B  '1/2'  het  missense  c.24G>T  p.Glu8Asp
6  131184818  C  T  EPB41L2  '1/1'  hom  missense  c.2870G>A  p.Arg957His
7  142658880  ACCTC  A  KEL  '1/2'  het  splice donor + inframe deletion + splice region + intron  c.79_81+1delGAGG  p.Glu27del
8  104987587  A  C  RIMS2  '1/2'  het  missense  c.2156A>C  p.His719Pro
9  21971208  C  T  CDKN2A  '1/1'  hom  splice acceptor + intron
9  87570398  T  A  NTRK2  '1/2'  het  missense  c.2090T>A  p.Met697Lys
9  125512944  G  C  OR1L6  '1/2'  het  missense  c.818G>C  p.Arg273Pro
9  115246828  AT  A  C9orf147  '1/2'  het  frameshift  c.453delA  p.Lys151fs
10  5042786  C  A  AKR1C2  '1/2'  het  missense  c.325G>T  p.Asp109Tyr
10  26459419  C  A  MYO3A  '1/2'  het  missense  c.3349C>A  p.Gln1117Lys
10  28142292  T  C  ARMC4  '0/2'  sub  missense + splice region  c.1597A>G  p.Thr533Ala
10  99153472  C  T  RRP12  '1/2'  het  missense  c.499G>A  p.Val167Ile
11  56085891  A  C  OR8K3  '1/2'  het  missense  c.109A>C  p.Ile37Leu
12  54769686  A  T  ZNF385A  '0/2'  sub  missense  c.200T>A  p.Leu67His
12  108929181  T  G  SART3  '1/2'  het  missense  c.1510A>C  p.Asn504His
13  46917552  G  T  RUBCNL  '0/2'  sub  missense  c.1957C>A  p.Leu653Ile
13  94482464  G  A  GPC6  '1/2'  het  missense  c.377G>A  p.Arg126Gln
13  46917549  GAA  G  RUBCNL  '0/2'  sub  frameshift  c.1958_1959delTT  p.Leu653fs
14  89629203  G  C  FOXN3  '1/2'  het  missense  c.962C>G  p.Ser321Cys
15  43784283  G  A  TP53BP1  '0/2'  sub  missense  c.203C>T  p.Ser68Phe
15  92973313  G  A  ST8SIA2  '1/2'  het  missense  c.133G>A  p.Val45Met
17  4872017  C  T  CAMTA2  '1/2'  het  missense  c.3710G>A  p.Arg1237His
17  38519248  G  T  GJD3  '1/2'  het  missense  c.820C>A  p.Arg274Ser
17  76548862  T  A  DNAH17  '1/2'  het  missense  c.2204A>T  p.Glu735Val
17  76548863  C  A  DNAH17  '1/2'  het  stop gained  c.2203G>T  p.Glu735*
18  25568504  GA  G  CDH2  '1/2'  het  frameshift  c.1724delT  p.Phe575fs
20  16359898  A  T  KIF16B  '1/2'  het  missense  c.2749T>A  p.Leu917Met
21  44589331  C  T  CRYAA  '1/2'  het  missense  c.122C>T  p.Ser41Leu
22  23540573  G  T  BCR  '1/2'  het  splice donor + intron
22  37887298  G  T  CARD10  '1/2'  het  missense  c.2998C>A  p.Leu1000Met
22  38119267  G  A  TRIOBP  '1/2'  het  missense  c.704G>A  p.Arg235Gln

C3. Somatic structural variants.
type  gene1  gene2  breakpoint1  breakpoint2  flanking_reads  split_reads
deletion  ARMC6  NCAN  19:19163504  19:19361176  36  20
deletion  NOS1AP  NOS1AP  1:162230748  1:162231284  0  7
deletion  TXNRD3  TXNRD3  3:126358938  3:126359250  0  5
deletion  RNF13  PAK2  3:149653091  3:196530353  115  89
deletion  ZNF385D  ZNF385D  3:21723705  3:21725967  2  0
deletion  NA  NA  3:85734745  3:146762311  48  25
deletion  PIK3R1  YTHDC2  5:67564688  5:112859541  40  18
deletion  NA  RARS2  6:44701028  6:88280684  38  12
deletion  NA  NA  9:18755113  9:22413855  59  21
duplication  BOC  BOC  3:112932831  3:112934083  0  90
duplication  THRB  THRB  3:24205346  3:24205472  0  30
inversion  NA  NA  3:85644568  3:85734152  54  37
translocation  NAV2  IFT172  11:19963012  2:27669870  2  17
translocation  RTN1  NA  11:32892058  14:60139663  3  22
translocation  NA  NA  11:86507395  18:60686113  2  12
translocation  RTTN  NA  12:31406636  18:67868552  2  12
translocation  HDHD2  NA  12:93576680  18:44641894  1  14
translocation  NA  NA  13:25512014  16:12459616  1  13
translocation  LHFP  GREM2  13:39964150  1:240747421  30  8
translocation  C14orf159  NA  14:91656636  X:119527843  2  15
translocation  NA  DPP6  1:22421092  7:153966050  9  13
translocation  SLC35F3  FSTL4  1:234213058  5:132753954  7  17
translocation  SMYD3  NA  1:246592331  17:47177341  5  18
translocation  NA  TMEM178A  1:79582076  2:39935866  1  18
translocation  NA  IL1RAPL2  2:158028141  X:104786837  10  10
translocation  ACVR1C  NA  2:158433673  4:133445843  1  11
translocation  NA  SYTL2  2:218265091  11:85444947  4  21
translocation  NA  NA  3:39611778  3:146762500  38  37
translocation  SCFD2  NA  4:54113738  21:21256658  4  14
translocation  FRAS1  NA  4:79322982  5:99608159  3  8
translocation  STK10  NA  5:171514937  13:61162457  16  37
translocation  ATXN1  PREX2  6:16613209  8:68981808  4  9
translocation  AKAP9  NA  7:91642608  11:104144748  4  8
translocation  NA  NA  8:96730502  18:54706193  3  27
translocation  DMRT1  RERG  9:848897  12:15365296  4  9

C4. Monte Carlo simulation mutational signatures.
fit, number of mutations contributed by the signature, determined by non-negative least squares (NNLS); means, mean exposure (number of mutations) from Monte Carlo resampling; lCI, lower limit of the 95% confidence interval (CI) for the number of mutations; uCI, upper limit of the 95% CI for the number of mutations.

Sig  fit  means  lCI  uCI  Probable Associations
1  906.919686  906.012972  773.767126  1040.53789  Age
2  72.6622468  73.2384786  15.3702575  130.82021  APOBEC
3  250.956731  271.506512  0  561.329522  BRCA1/2 mutations
4  0  0  0  0  Smoking
5  741.474803  678.258313  0  1320.64833  -
6  0  0  0  0  DNA MMR deficiency
7  40.8601126  42.4134975  0  126.055423  UV light
8  677.65071  666.129821  464.964631  848.340652  -
9  385.700172  384.292944  245.247115  515.536086  IG gene hypermutation
10  10.6821552  16.9021457  0  65.8417119  Pol ε mutations
11  35.4312068  42.9907755  0  137.861827  Temozolomide
12  129.970833  134.189454  0  330.928836  -
13  70.1510528  67.437123  29.4102214  107.665822  APOBEC
14  0  0.02612349  0  0  -
15  43.1293493  48.6762018  0  145.844348  DNA MMR deficiency
16  678.30871  688.879246  294.159951  1074.56829  -
17  0  0  0  0  -
18  0  0.000286  0  0  -
19  0  0.40953287  0  0  -
20  0  0  0  0  DNA MMR deficiency
21  0  2.54600286  0  35.8642342  -
22  0  0.00355691  0  0  aristolochic acid
23  0  0  0  0  -
24  0  0  0  0  aflatoxin
25  0  25.0029242  0  131.202734
26  0  5.12111865  0  79.5735363  DNA MMR deficiency
27  44.7210005  43.6489258  12.3757606  75.0740492  -
28  0  0.00288866  0  0  -
29  0  0  0  0  Tobacco chewing
30  187.700319  180.70141  0  385.119988  -

C5. Monte Carlo simulation mutational signatures timing. nnls_mean, mean exposure (number of mutations) contributed by the signature across Monte Carlo resampled mutation catalogs; nnls_lCI, lower limit of the 95% confidence interval (CI) for the number of mutations; nnls_uCI, upper limit of the 95% CI for the number of mutations; nnls_solution, number of mutations contributed by the signature according to the NNLS best fit. The mutation counts were rounded to the nearest integer.
Sig  Early nnls_mean  Early nnls_lCI  Early nnls_uCI  Early nnls_solution  Late nnls_mean  Late nnls_lCI  Late nnls_uCI  Late nnls_solution  Probable Associations
1  724  614  837  723  130  70  188  133  Age
2  46  7  87  47  12  0  38  9  APOBEC
3  150  0  361  116  106  0  265  102  BRCA1/2 mutations
4  0  0  0  0  0  0  0  0  Smoking
5  300  0  725  396  215  0  550  216  -
6  0  0  0  0  2  0  20  0  DNA MMR deficiency
7  87  13  156  95  0  0  0  0  UV light
8  369  204  519  371  289  168  402  297  -
9  148  55  241  152  222  140  315  222  IG gene hypermutation
10  8  0  46  0  10  0  39  7  Pol ε mutations
11  35  0  114  27  3  0  29  0  Temozolomide
12  48  0  183  37  67  0  169  69  -
13  45  12  75  47  21  0  43  23  APOBEC
14  0  0  0  0  1  0  13  0  -
15  68  0  157  70  0  0  4  0  DNA MMR deficiency
16  496  198  773  477  227  0  454  229  -
17  0  0  0  0  0  0  0  0  -
18  1  0  6  0  0  0  0  0  -
19  2  0  25  0  1  0  19  0  -
20  0  0  0  0  1  0  10  0  DNA MMR deficiency
21  2  0  30  0  3  0  29  0  -
22  2  0  18  0  0  0  0  0  aristolochic acid
23  0  0  0  0  0  0  0  0  -
24  0  0  0  0  0  0  0  0  aflatoxin
25  32  0  133  30  10  0  63  0
26  13  0  86  0  0  0  0  0  DNA MMR deficiency
27  5  0  22  2  37  17  60  38  -
28  1  0  8  0  0  0  0  0  -
29  0  0  0  0  0  0  0  0  Tobacco chewing
30  15  0  108  0  136  49  211  147  -

C6. Features associated with radiation-induced tumours as described in Behjati et al. (2016). SNV, single nucleotide variants.

Feature  Approximate values in radiation-induced tumours  Ratio feature associated with radiated tumours  Porocarcinoma value  Support radiation-induced process?
Indel/SNV ratio  4  High compared to non-irradiated  0.0021  no
Deletion/insertion ratio  0.12  High compared to non-irradiated  2.1  no
Deletion microhomology fraction  Up to 3/4 of structural variants  High compared to non-irradiated  0.086  no
Fraction of deletion in simple repeats  0.053 for radiated cancers, 0.098 for non-radiated cancers  Low compared to non-irradiated  0.042  yes

C7. CDKN2A transcripts coverage. ENST00000304494 and ENST00000361570 transcripts correspond to p16INK4a (NM_000077, NP_000068) and p14ARF (NM_058195, NP_478102) respectively.
Transcript  Length  TPM  RPKM  KPKM  Number of reads
ENST00000304494  1218  7.68047  4.15995  4.15995  6.81043
ENST00000361570  1151  4.60101  2.49203  2.49203  3.85183
ENST00000579755  1283  0.605136  0.327758  0.327758  0.565675
ENST00000494262  989  0.215702  0.11683  0.11683  0.154737
ENST00000498628  926  0.377957  0.204711  0.204711  0.253523
ENST00000498124  880  2.15073  1.16489  1.16489  1.36948
ENST00000470819  856  3.03883  1.64591  1.64591  1.88105
ENST00000380151  794  2.70542  1.46533  1.46533  1.55061
ENST00000530628  748  75.9888  41.1575  41.1575  40.968
ENST00000446177  716  0  0  0  0
ENST00000578845  678  0.93195  0.504769  0.504769  0.454198
ENST00000579122  666  11.8744  6.43149  6.43149  5.68176
ENST00000497750  565  5.35274  2.89919  2.89919  2.16141
ENST00000479692  544  1.08157  0.585808  0.585808  0.419935
ENST00000380150  493  1.92151  1.04074  1.04074  0.673579
ENST00000577854  442  5.82578  3.1554  3.1554  1.82248
ENST00000582361  293  9.25164  5.01094  5.01094  1.87472

C8. CDKN2A exon coverage. Merged expression values across approximate coordinates for exons of p16INK4a (p16) and p14ARF (p14) are presented below. Cov, coverage.

exon  chr  start  end  Number fractional reads  Exon cov  Exon_cov_per_length  RPKM
E3 (p14, p16)  9  21967752  21968241  896.573333  67243  137.2306  19.7237
E2 (p14, p16)  9  21968574  21968770  11.8  885  4.4924  0.6457
E1 (p16)  9  21974403  21975038  234.24  17568  27.6226  3.9701
E1 (p14)  9  21994138  21994490  616.653333  46249  131.017  18.8307
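
The columns of table C8 appear to be related by simple arithmetic, which the following minimal Python sketch recomputes from the tabulated values. It assumes that exon coordinates are 1-based and inclusive, and that the per-length coverage is the exon coverage divided by the exon length; both are inferences from the numbers in the table rather than statements from the thesis, and the names `EXONS`, `exon_length` and `cov_per_length` are hypothetical.

```python
# Rows copied from table C8: (exon, start, end, fractional_reads, exon_cov).
EXONS = [
    ("E3 (p14, p16)", 21967752, 21968241, 896.573333, 67243),
    ("E2 (p14, p16)", 21968574, 21968770, 11.8, 885),
    ("E1 (p16)", 21974403, 21975038, 234.24, 17568),
    ("E1 (p14)", 21994138, 21994490, 616.653333, 46249),
]


def exon_length(start, end):
    """Exon length in bases, assuming 1-based inclusive coordinates."""
    return end - start + 1


def cov_per_length(start, end, exon_cov):
    """Exon coverage divided by exon length (assumed column definition)."""
    return exon_cov / exon_length(start, end)


for name, start, end, reads, cov in EXONS:
    # cov / reads gives the number of covered bases per fractional read.
    print(
        f"{name}: length={exon_length(start, end)} "
        f"cov_per_length={cov_per_length(start, end, cov):.4f} "
        f"bases_per_read={cov / reads:.1f}"
    )
```

Under these assumptions the recomputed per-length coverage can be compared directly with the Exon_cov_per_length column, and the ratio of exon coverage to fractional reads indicates how many bases each read contributes.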
