UBC Theses and Dissertations

The clinical actionability and evolution of mutational processes in metastatic cancer Zhao, Eric Yang 2018

The Clinical Actionability and Evolution of Mutational Processes in Metastatic Cancer

Eric Yang Zhao
B.Sc. (Honours), The University of British Columbia, May 2013

A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

The Faculty of Graduate and Postdoctoral Studies (Bioinformatics)
The University of British Columbia (Vancouver)
November 2018
© Eric Yang Zhao, 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

The Clinical Actionability and Evolution of Mutational Processes in Metastatic Cancer

submitted by Eric Yang Zhao in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics

Examining Committee:
Steven J. M. Jones, Medical Genetics (Supervisor)
Christian Steidl, Pathology & Laboratory Medicine (Supervisory Committee Member)
Cathie Garnis, Department of Surgery (University Examiner)
Judy Wong, Medical Genetics (University Examiner)

Additional Supervisory Committee Members:
Marco Marra, Medical Genetics (Supervisory Committee Member)
Inanc Birol, Medical Genetics (Supervisory Committee Member)
Stephen Yip, Pathology & Laboratory Medicine (Supervisory Committee Member)

Abstract

Cancers are characterized by somatic mutation arising from the interplay of mutagen exposure and deficient DNA repair. Whole genome sequencing of tumours reveals characteristic patterns of mutation, known as mutation signatures, which often correspond with specific processes such as cigarette smoke exposure or the loss of a DNA repair pathway. Quantifying DNA repair deficiency can have clinical implications. Cancer chemotherapies which induce DNA damage are known to be more effective against cancers with deficient DNA repair. However, it is not yet known whether mutation signatures can serve as reliable predictive biomarkers for response to these treatments.
Furthermore, the current understanding of mutation signatures stems largely from studies of primary, untreated tumours, whereas metastasis underpins as much as 90% of cancer-related mortality. This thesis aims to (1) describe the association between mutation signatures and clinical response to DNA-damaging chemotherapy, (2) enable accurate personalized assessment of mutation signatures and their evolution over time, and (3) characterize the evolution of mutational processes in metastatic cancers. To assess clinical actionability, we quantified signatures of single nucleotide variants, structural variants, copy number variants, and small deletions in 93 metastatic breast cancers, 33 of which received platinum-based chemotherapy. We found that patients with signatures of homologous recombination deficiency had improved responses and prolonged treatment durations on platinum-based chemotherapy. Next, we formulated a Bayesian model called SignIT, which improves the accuracy of individualized mutation signature analysis and infers signature evolution over tumour subpopulations. We demonstrated SignIT’s superior accuracy on both simulated data and somatic mutations from The Cancer Genome Atlas, and validated temporal dissection using whole genomes from 24 multiply-sequenced cancers. We highlighted a potential clinical application of mutation timing in a BRCA1-mutated pancreatic adenocarcinoma with a low homologous recombination deficiency (HRD) signature but exceptional response to platinum-containing chemotherapy. Finally, we deciphered mutation signatures from nearly 500 metastatic cancer whole genomes, revealing evolution of mutational processes associated with late metastasis and exposure to cytotoxic chemotherapy. Taken together, our findings demonstrate the complex interplay of factors shaping the metastatic cancer genome.
We highlight both clinical opportunities of studying genomic instability and the additional insights available from understanding their temporal evolution.

Lay Summary

Changes in DNA, known as mutations, happen for many reasons such as exposure to cigarette smoke and ultraviolet rays. Healthy human cells have tools to fix mutated DNA, but cancer cells often lose this ability. This might make tumours vulnerable to chemotherapy designed to damage DNA, because healthy cells can repair this damage but cancer cells cannot. Can we tailor treatments to exploit this vulnerability? To answer this question, we detected all the mutations in about 500 late-stage cancers of different types. The patterns of DNA mutation can reveal if a cancer is properly repairing DNA. Within breast cancers, we found that cancers unable to repair broken DNA were more sensitive to chemotherapies called cisplatin and carboplatin. This could help improve personalized treatment plans for some cancer patients. We also found mutation patterns caused by chemotherapy, showing that cancer treatments themselves can alter DNA.

Preface

All of the work contained within this thesis was conducted at the BC Cancer Agency Genome Sciences Centre under the auspices of the Personalized Oncogenomics Project (NCT 02155621), approved by the University of British Columbia Research Ethics Board (approval #H12-00137 and #H14-00681-A019). Written informed consent, including potential publication of findings, was obtained from patients prior to genomic profiling. Patient information was anonymized, and each was assigned an alphanumeric identification code. Whole-genome sequencing and RNA-seq data (.bam files) are deposited in the European Genome-Phenome Archive (www.ebi.ac.uk/ega/home) under the study accession number EGAS00001001159.

Chapters 1 and 5 are my original work, and contain brief excerpts from an invited book chapter:

Zhao EY, Jones MR, Jones SJM. “Whole Genome Sequencing in Cancer”. Next Generation Sequencing.
Ed. McCombie WR, Mardis ER, Knowles J, and McPherson JD. New York: Cold Spring Harbor Press, Accepted Feb 2018.

A version of chapter 2 has been published:

Zhao EY, Shen Y, Pleasance E, Kasaian K, Leelakumari S, Jones M, Bose P, Ch’ng C, Reisle C, Eirew P, Corbett R, Mungall KL, Thiessen N, Ma Y, Schein JE, Mungall AJ, Zhao Y, Moore RA, Den Brok W, Wilson S, Villa D, Shenkier T, Lohrisch C, Chia S, Yip S, Gelmon K, Lim H, Renouf D, Sun S, Schrader KA, Young S, Bosdet I, Karsan A, Laskin J, Marra MA, and Jones SJM. Homologous Recombination Deficiency and Platinum-Based Therapy Outcomes in Advanced Breast Cancer. Clin Cancer Res. 2017 Dec 15;23(24):7521-7530.

I conceptualized this project with my supervisor, Steven Jones. I was the primary researcher, leading the study design, clinical chart review, data analysis, interpretation of findings, and manuscript preparation. Yaoqing Shen, Erin Pleasance, Katayoon Kasaian, Sreeja Leelakumari, Martin Jones, Pinaki Bose, Carolyn Ch’ng, Caralyn Reisle, and Peter Eirew were involved in the analysis of specific patient cases and bioinformatics pipeline development. Richard Corbett, Karen Mungall, Nina Thiessen, and Yussanne Ma were responsible for developing and operating the bioinformatics pipelines. Jacqueline Schein, Andrew Mungall, Yongjun Zhao, and Richard Moore were responsible for developing and operating laboratory protocols for the handling and sequencing of samples. Wendie Den Brok, Sheridan Wilson, Diego Villa, Tamara Shenkier, Caroline Lohrisch, Stephen Chia, Karen Gelmon, and Sophie Sun were responsible for recruiting patients. Karen Gelmon, Howard Lim, Daniel Renouf, and Janessa Laskin provided guidance and supervision of the clinical chart review. Stephen Yip performed histopathology analysis. Yaoqing Shen, Howard Lim, Kasmintan Schrader, Sean Young, Ian Bosdet, and Aly Karsan assisted with analysis and interpretation of germline variants, as well as associated ethics discussions.
Janessa Laskin and Marco Marra provided supervision over the Personalized Oncogenomics Project. All authors were involved in manuscript revision.

A version of chapter 3 has been submitted for publication:

Zhao EY, Pleasance ED, Jones MR, Shen Y, Reisle CR, Mungall AJ, Zhao Y, Moore RA, Laskin J, Marra MA, Jones SJM. SignIT: Inferring Mutation Signatures and their Temporal Evolution in Individual Tumours. In Review.

In addition to chapter 3, aspects of the analysis submitted for publication also contributed to chapter 5. I was the primary researcher on this project and was responsible for conceptualization, methodology development, data analysis, and manuscript preparation. Erin Pleasance, Martin Jones, Yaoqing Shen, and Caralyn Reisle were involved in various aspects of data analysis and cohort assembly. Andrew Mungall, Yongjun Zhao, and Richard Moore were responsible for laboratory protocols, including the sequencing of samples. Janessa Laskin and Marco Marra supervised the Personalized Oncogenomics Project, which was the major source of data for the study. Steven Jones supervised the project and was involved in conceptualization and interpretation of findings throughout. All authors revised and approved the manuscript.

A version of chapter 4 has been submitted for publication as a case study, for which I was an equally contributing primary author:

Wong H, Zhao EY, Jones MR, Reisle CR, Eirew P, Pleasance ED, Grande BM, Karasinka JM, Kallogher SE, Lim HJ, Shen Y, Yip S, Morin RD, Laskin J, Marra MA, Jones SJM, Schrader KA, Schaeffer DF, and Renouf DJ. Temporal dynamics of genomic alterations in a BRCA1 germline mutated pancreatic cancer with low genomic instability burden but exceptional response to FOLFIRINOX. In Review.

Hui-li was responsible for detailing the clinical history, and performing retrospective review. I was principally responsible for the temporal analysis of BRCA1 and other genomic events, as well as the analysis and timing of mutation signatures.
Hui-li and I jointly conceptualized the project, interpreted the findings, and wrote the manuscript. Martin Jones, Caralyn Reisle, Peter Eirew, and Erin Pleasance were responsible for initial genome and transcriptome characterization of the tumour as part of the Personalized Oncogenomics Project. Howard Lim, Yaoqing Shen, and Kasmintan Schrader were responsible for the analysis of germline variants. Bruno Grande and Ryan Morin were responsible for the clonality analysis of copy number variants. Joanna Karasinka and Steve Kalloger were responsible for coordinating the study. Stephen Yip was responsible for histopathology analysis. Janessa Laskin and Marco Marra were responsible for supervising the Personalized Oncogenomics Project, the study for which this patient was recruited. Steven Jones, Kasmintan Schrader, David Schaeffer, and Daniel Renouf jointly supervised this project and were involved at various stages of conceptualization and interpretation of findings. All authors revised and approved the manuscript.

A version of chapter 5 is part of an in-draft manuscript which will detail the whole genome sequencing of the first 570 metastatic cancers in the Personalized Oncogenomics Project. This project is led by Erin Pleasance, Yaoqing Shen, and Martin Jones, and is supervised by Marco Marra. However, I am the lead researcher on the aspect of the project which involves mutation signatures, which is the only portion reported in this thesis. I was solely responsible for the analysis and interpretation of mutation signatures, with guidance from the study leads and Steven Jones. Erin Pleasance and Martin Jones were responsible for curating the final list of participants included in the study and grouping them into cancer type cohorts. Yussanne Ma oversees bioinformatics analyses for the project. Additionally, Laura Williamson was involved in the analysis of mismatch repair deficiency genes and microsatellite instability.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
    1.1 Research Aims
    1.2 Background
        1.2.1 Cancer is an Evolving Genetic Disease
        1.2.2 Whole Genome Sequencing of Cancer
        1.2.3 From Cancer Genome to Personalized Oncogenomics
    1.3 Thesis Objectives and Chapter Overview
2 The Clinical Actionability of Homologous Recombination Deficiency in Advanced Breast Cancer
    2.1 Introduction
    2.2 Results
        2.2.1 Somatic Mutation Signatures
        2.2.2 Genomic Findings Associated with HRD
        2.2.3 HRD Mutation Signatures are Associated with Platinum Outcomes
        2.2.4 Effects of HRDetect on Overall Survival and Treatment Duration
        2.2.5 Feasibility of HRD Analysis in Personalized Medicine
    2.3 Discussion
    2.4 Methods
        2.4.1 Patient Samples, Ethics, and Data Policy
        2.4.2 Sample Collection, Preparation, and Sequencing
        2.4.3 Bioinformatic Analysis
        2.4.4 Determining HRDetect Scores
        2.4.5 Single Nucleotide Variant Mutation Signatures
        2.4.6 Structural Variant Mutation Signatures
        2.4.7 Calculation of the HRD Index
        2.4.8 Analysis of Deletion Microhomology
        2.4.9 Review of Clinical Case Data
3 SignIT: Inferring Mutation Signatures and their Temporal Evolution in Individual Tumours
    3.1 Introduction
    3.2 Results
        3.2.1 SignIT Reports Credible Intervals and Signature Bleed
        3.2.2 Resilience to Complexity and Noise
        3.2.3 SignIT Better Reproduces Signatures in Cancer Data
        3.2.4 SignIT Infers the Temporal Evolution of Signatures
        3.2.5 Metastatic Tumours Demonstrate Divergence of Mutational Processes
    3.3 Discussion
    3.4 Methods
        3.4.1 The SignIT Generative Model
        3.4.2 Simulated Genomes
        3.4.3 Publicly Available Cancer Mutation Data
        3.4.4 De Novo Signature Analysis
        3.4.5 Structural Variant Mutation Signatures
        3.4.6 Whole Genome Sequencing of Metastatic Cancers
        3.4.7 Ploidy-correction of Copy Number Variants
        3.4.8 Whole Genome Sequencing of Primary Tumours
4 Clinical Application of Mutation Timing in a BRCA1-mutated Pancreatic Adenocarcinoma
    4.1 Introduction
    4.2 Case Report
    4.3 Results
        4.3.1 BRCA1 Loss in the Primary and Metastasis
        4.3.2 Timing of the BRCA1 Loss
        4.3.3 Evolution of Mutation Signatures from Primary to Metastasis
        4.3.4 Evolution of Orthogonal HRD-associated Mutational Signatures
        4.3.5 Mutation Signature Timing
    4.4 Discussion
    4.5 Methods
        4.5.1 Tissue Collection, Processing, and Storage
        4.5.2 Sequencing and Bioinformatics
        4.5.3 Mutation Timing Analysis
        4.5.4 Mutation Signature and Signature Timing Analysis
        4.5.5 Additional HRD Metrics: Deletion Microhomology and HRD Score
5 The Evolution of Mutational Processes in Metastatic Cancer
    5.1 Introduction
    5.2 Results
        5.2.1 Aging-related Mutation Signatures
        5.2.2 Signatures of Exogenous Mutation
        5.2.3 Signatures of Endogenous Mutation and DNA Repair Deficiency
        5.2.4 The Late-arising Signatures of Metastases and Chemotherapy Exposure
        5.2.5 Signature M3 Results from Exposure to Cisplatin
    5.3 Discussion
    5.4 Methods
        5.4.1 Whole Genome Sequencing of Metastatic Cancers
        5.4.2 Mutation Calling
        5.4.3 Mutation Signature Analysis
        5.4.4 Temporal Analysis of Mutation Signatures
        5.4.5 Microsatellite Instability Scores
        5.4.6 Quantifying Gene Expression from Transcriptomes
        5.4.7 Retrospective Clinical Review
        5.4.8 Analysis of Drug-Signature Associations
6 Conclusion
    6.1 Summary of Major Findings
    6.2 The Clinical Implications of Genomic Instability
    6.3 The Mutational Processes of Metastatic Cancers
    6.4 Limitations
    6.5 A Role for Mutation Signatures in Precision Oncology
    6.6 Future Research Directions
    6.7 Looking Forward: Biomarker Discovery in the Era of Genomic Data
Bibliography
Appendices
    A1 Note on nucleic acid nomenclature
    A2 Appendix Tables
    A3 Appendix Figures

List of Tables

Table 2.1 Summary of patient molecular and clinical characteristics by HRDetect status
Table 2.2 Test metrics of HRDetect predictions computed using specified thresholds
Table 2.3 Area under the curve of homologous recombination deficiency (HRD) signatures in platinum response prediction
Table 2.4 Logistic regression model odds ratios of clinical improvement (CI) on platinum-based chemotherapy
Table 3.1 The numbers of samples and variants in each TCGA cohort analyzed
Table 5.1 The number of patients belonging to each cancer type specific cohort
Table A.1 The expanded nomenclature for nucleic acid naming
Table A.2 Significance tests for differences in mutation signatures across molecular subtypes
Table A.3 Sample details for whole genome sequencing of multiply-sequenced tumours

List of Figures

Figure 1.1 Mutational prevalence is related to time of mutation onset
Figure 1.2 Whole genome sequencing data reveal diverse forms of genomic alteration
Figure 1.3 The analysis of mutation signatures by non-negative matrix factorization
Figure 2.1 Nine signatures of single nucleotide variation deciphered from 93 breast cancer whole genomes
Figure 2.2 Breast cancer signatures across subtypes
Figure 2.3 Breast cancer structural variant signatures
Figure 2.4 Association of platinum-based treatment outcomes with HRDetect, an aggregate of six homologous recombination deficiency (HRD) mutation signatures
Figure 2.5 HRDetect scores, mutations in key homologous recombination genes, and outcomes on platinum-based therapy
Figure 2.6 Homologous recombination deficiency is associated with extended overall survival (OS) and total duration on platinum-based therapy (TDT)
Figure 2.7 N of 1 signatures by non-negative least squares (NNLS) accurately reproduce HRDetect scores
Figure 2.8 Treatment timelines and radiographic outcomes on platinum-based chemotherapy arranged by total duration on platinum-based chemotherapy
Figure 3.1 SignIT reports complete posterior distributions along with signal bleed between signatures
Figure 3.2 SignIT improves signature estimation for complex models with noisy data
Figure 3.3 Runtimes of n-of-1 mutation signature decomposition tools
Figure 3.4 Comparison of NMF and n-of-1 methods across nine cancer exome cohorts
Figure 3.5 Subpopulation-specific mutation signatures in a somatic cancer whole genome
Figure 3.6 Mutation signatures in serially sequenced metastatic tumours demonstrate time-dependent divergence from the primary
Figure 3.7 SignIT improves upon binary temporal partitioning (BTP) by modeling the tumour subpopulation structure
Figure 3.8 A colorectal cancer demonstrates errors in binary partitioning resulting from unusually high mutational prevalence of population 2
Figure 3.9 Complete SignIT joint population-signature model
Figure 3.10 The time of sample collection for multiply sequenced tumours
Figure 4.1 Evolution of single nucleotide variant (SNV) mutation signatures in a pancreatic adenocarcinoma with exceptional response to FOLFIRINOX
Figure 4.2 Genomic analysis and clinical evolution of a germline BRCA1 c.68_69delAG-associated pancreatic ductal adenocarcinoma (PDAC) primary tumor (left) and metastasis (right)
Figure 4.3 Joint calling of CNV, LOH, and clonal status performed across the metastatic genome using TITAN
Figure 4.4 Comparison of inferred timing for events shared between pancreatic primary tumor and metastasis
Figure 4.5 Mutation signature bleed between signatures 3 and 8
Figure 4.6 Evolution of structural variation alterations between the pancreatic primary and metastasis
Figure 4.7 Filtering of small segments for mutation timing and HRD scores pre-processing
Figure 5.1 Novel metastatic signatures not catalogued in COSMIC
Figure 5.2 Mutation signatures and their temporal dissection in metastatic cancer
Figure 5.3 De novo mutation signatures deciphered from metastatic cancers
Figure 5.4 The hypermutating signatures of tobacco smoking and ultraviolet radiation
Figure 5.5 Platinum exposure is associated with temporal evolution of homologous recombination deficiency (HRD) associated mutation signatures
Figure 5.6 A novel signature of mismatch repair (MMR) deficiency is associated with microsatellite instability and underexpression of MLH1
Figure 5.7 Screening of drug-signature interactions reveals statistically significant associations with cisplatin, oxaliplatin, and fluorouracil
Figure A.1 Underestimated mutation signature exposures from simulated data
Figure A.2 Overestimated mutation signature exposures from simulated data
Figure A.3 Model selection for mutation signature analysis of nine cohorts of The Cancer Genome Atlas
Figure A.4 Matching de novo mutation signatures to previously identified known signatures
Figure A.5 Clustering of mutation signatures across multiple cancer cohorts into a common consensus signature set
Figure A.6 Mutation signatures were successfully deciphered across 12 cancer cohorts
Figure A.7 Late-arising mutation signatures across cancer types
Figure A.8 Late-arising mutation signatures across biopsy sites
List of Abbreviations

ADVI automatic differentiation variational inference
BER base excision repair
BTP binary temporal partitioning
CCF cancer cell fraction
CGH comparative genomic hybridization
CI clinical improvement
CNLOH copy-neutral loss of heterozygosity
CNV copy number variant
COSMIC Catalogue of Somatic Mutations in Cancer
DDR deficient DNA repair
EGA European Genome-Phenome Archive
EM expectation maximization
FFPE formalin-fixed and paraffin embedded
Gb gigabase
GDC Genomic Data Commons
HMC Hamiltonian Monte Carlo
HMM hidden Markov model
HR homologous recombination
HRD homologous recombination deficiency
ICGC International Cancer Genome Consortium
ITH intratumour heterogeneity
kb kilobase
LOH loss of heterozygosity
LST large-scale transition
MAF mutation annotation format
Mb megabase
MCMC Markov-chain Monte Carlo
MCN mutation copy number
MMR mismatch repair
MMRD mismatch repair deficiency
MSI microsatellite instability
NER nucleotide excision repair
NGS Next-generation sequencing
NHEJ non-homologous end joining
NMF non-negative matrix factorization
NNLS non-negative least squares
OCT optimal cutting temperature
OS overall survival
PARP poly (ADP-ribose) polymerase
PCR polymerase chain reaction
PD progressive disease
PD-L1 Programmed death ligand 1
PDAC pancreatic ductal adenocarcinoma
PNET pancreatic neuroendocrine tumour
POG Personalized Oncogenomics Project
QP quadratic programming
ROS reactive oxygen species
RPKM reads per kilobase of transcript per million mapped reads
SA simulated annealing
SD stable disease
SNV single nucleotide variant
SSE sum of squared errors
SV structural variant
TAI telomeric allelic imbalance
TARGET Therapeutically Applicable Research to Generate Effective Treatments
TCGA The Cancer Genome Atlas
TDT total duration on platinum-based therapy
TMB total mutational burden
UV ultraviolet
VAF variant allele fraction
VUS variants of uncertain significance
WAIC Watanabe-Akaike information criterion
WES whole exome sequencing
WGS whole genome sequencing
WGTA whole genome and transcriptome analysis
WTS whole transcriptome sequencing

Acknowledgements

I am infinitely grateful to my supervisor Dr. Steven J. M. Jones for his mentorship. With his support, I have had the privilege of learning from and contributing to the fascinating research field of cancer genomics and precision oncology. In addition, I would like to thank the members of my thesis advisory committee, Dr. Marco Marra, Dr. Inanc Birol, Dr. Christian Steidl, and Dr. Stephen Yip. Their input at committee meetings has continually challenged me to critically consider the unifying questions and themes of my research, and to rise to the standards of a PhD.

My sincerest thanks go to the participants of the Personalized Oncogenomics (POG) project, who are contributing critical insights in the fight against cancer. Additionally, I thank the BC Cancer Foundation for providing generous funding support for POG.

Thank you to the POG directors Marco Marra and Janessa Laskin, as well as the POG coordinators and project managers, Robyn Roscoe, Alexandra Fok, Katherine Mui, Jessica Nelson, and Payal Sipahimalani, without whom this work would not be possible. Also, a big thank you to POG pathologists and oncologists who have at various times served as both research and clinical mentors to me, in particular Drs. Janessa Laskin, Daniel Renouf, Stephen Yip, Sophie Sun, Karen Gelmon, Howard Lim, Stephen Chia, Hui-li Wong, and Wendie Den Brok. I am also grateful for the administrative work of Louise Clarke, Cath Ennis, Leslie Alfaro, and Mhairi Sigrist.

I would like to thank all of the members of the Jones Lab, as well as the staff scientists, group leaders, project managers, and computational biologists involved in POG, who have been constant companions in this research endeavour.
I am grateful to some individuals in particular, who frequently made time to discuss my research and contributed significantly to the development of my skills: Erin Pleasance, Yaoqing Shen, Martin Jones, Jake Lever, Jasleen Grewal, Laura Williamson, Bruno Grande, Ryan Morin, Shaun Jackman, Katayoon Kasaian, Pinaki Bose, Darya Dargahi, Caroline Ch’ng, Caralyn Reisle, Daniel Paulino, Luka Culibrk, Richard Corbett, Martin Krzywinski, Karen Mungall, Andrew Mungall, and Yussanne Ma.

The results in this thesis are in whole or part based upon data generated by The Cancer Genome Atlas (TCGA) managed by the NCI and NHGRI. Information about TCGA can be found at http://cancergenome.nih.gov.

I am grateful for funding support during my graduate studies from the BCCA-CIHR-UBC MD/PhD Studentship, a CIHR Vanier Canada Graduate Scholarship, a UBC 4 Year Doctoral Fellowship, a CIHR Canada Graduate Scholarship Master’s Award, and a UBC Faculty of Medicine Graduate Award. I would also like to acknowledge financial support for travel and technology purchases from the BCCA-CIHR-UBC MD/PhD Studentship, the John Bosdet Travel Award, the UBC Graduate Studies Travel Award, and two travel awards from the UBC Bioinformatics Program.

My sincere thanks go to the UBC MD/PhD program director Lynn Raymond, associate director Torsten Nielson, and coordinator Jane Lee. I would also like to thank the UBC Bioinformatics program coordinator, Sharon Ruschkowski. Lastly, I thank my former research supervisors Dr. Marco Marra, Dr. Jan M. Friedman and Dr. Alex MacKay for welcoming me into their labs as an undergraduate student and empowering me to pursue further research endeavours.

Thank you to the members of the 2017 Class of Medicine at the University of British Columbia, who were my daily companions during the first two years of my graduate program.
Also, a tremendous thanks to my fellow MD/PhD colleagues,many of whom have shared the same timeline as myself: Andrea Jones, AmandaDancsok, Philip Edgcumbe, Parker Jobin, Frank Lee, Victoria Baronas, CynthiaMin, Paulina Piesik, Adam Ramzy, Allen Zhang, David Twa, Jordan Squair, DanielWoodsworth, Alexander Wright, William Guest, and many others.My warmest thanks to my life partner, Kaylee Sohng, who (despite residingon the other end of the country) has seen me through all the ups and downsof this work with exceptional generosity. My deepest gratitude goes to my sisterSarah and my parents Liheng and Yongjun, who have ceaselessly supported mefor nearly three decades.xxviidedicationI dedicate this thesis to my parents Liheng Li and Yongjun Zhao, who have madeso many sacrifices while instilling in me scientific curiosity and a passion forlearning and service.xxviii1I N T R O D U C T I O N1.1 research aimsThe objective of this thesis is to precisely characterize the evolution and clinicalactionability of genomic instability in metastatic cancer. To address this objective,we begin by demonstrating the clinical applicability of mutation signatures inthe treatment of metastatic breast cancers, using homologous recombination (HR)as a model system. Next, we develop a statistical model capable of accuratelyestimating the temporal evolution of mutational processes shaping individual tu-mour genomes. Last, we catalog the mutational processes shaping 484 metastaticcancers and relate them to potential clinical implications.1.2 background1.2.1 Cancer is an Evolving Genetic DiseaseThe first known work to probe the biological basis of heredity was published in1866 by Gregor Mendel. 
In 1886, De Gouvea reported the first known case of inherited retinoblastoma, providing evidence that cancer, like other traits, is heritable. In the early 20th century, Boveri, Sutton, and Hunt demonstrated that genetic material was organized into chromosomes, and postulated that cellular processes and chromosome damage triggered the onset of oncogenesis. Later, Wynder and Graham (1950) showed that cigarette smoking was associated with lung cancer, in advance of the 1964 Surgeon General's report on the same topic (Surgeon General, 1964). However, it was only recently, with advances in genome sequencing, that the scale of DNA damage and mutation from tobacco smoke and other mutagenic sources was quantified across various types of cancer (Alexandrov et al., 2013a). This section reviews the recent literature highlighting the role of mutagenesis in tumour biology, and the opportunities and challenges it poses.

Somatic Mutations Arise from Exogenous and Endogenous Sources

Genome instability and mutagenesis are cancer hallmarks arising from a confluence of mutagenic exposures and deficient DNA repair (Hanahan and Weinberg, 2011). The total number of single nucleotide variants (SNVs) acquired by cancers varies by orders of magnitude both between and within tumour types, from as few as ten to as many as three million (Alexandrov et al., 2013a; Lawrence et al., 2013). Large scale structural variants (SVs) also exhibit heterogeneous patterns of occurrence, with substantial differences in mutation burden and distributions even between tumour subtypes (Nik-Zainal et al., 2016; Waddell et al., 2015). This reflects the tremendous diversity in the etiology of and genetic predispositions to individual cancers.

Exogenous mutational processes typically involve exposure to environmental carcinogens such as cigarette smoke or ultraviolet (UV) radiation. Mutations are also known to occur iatrogenically, through exposure to radiation therapy (Behjati et al., 2016) and chemotherapies.
For example, the alkylating chemotherapy agent, temozolomide, commonly used to treat brain cancers, causes a hypermutating signature of cytosine-to-thymine (C→T) transitions in cancers with specific DNA repair deficiencies (Alexandrov et al., 2013a; Tomita-Mitchell et al., 2000; Yip et al., 2009). When exogenous mutagenesis occurs, it is often responsible for large numbers of mutations, accounting for the high mutation rates of melanomas, lung cancers, and gastrointestinal tract cancers, which arise in tissues frequently exposed to environmental stresses.

By contrast, endogenous mutational processes refer to the action of intracellular mechanisms which induce DNA damage or otherwise cause base changes. A frequently observed example is deamination, which causes a pattern of C→G and C→T mutations at TCN trinucleotide contexts. The deamination of 5-methylcytosine at CpG loci is thought to cause aging-related mutations across cell types (Alexandrov et al., 2015a). Another form of deamination, catalyzed by the APOBEC gene family, displays a more specific mutagenic profile (Nik-Zainal et al., 2012) and is often responsible for local hypermutation, known as “kataegis” (Alexandrov et al., 2013a). Like exogenous processes, endogenous processes often accrue mutations in the context of DNA damage repair deficiencies. For example, metabolic damage due to reactive oxygen species (ROS) can cause the formation of DNA lesions, such as 8-oxoguanosine, which result in DNA mispairing. Excision of 8-oxoguanosine is performed by DNA glycosylase, encoded by the gene MUTYH. Consequently, mutations that inactivate MUTYH result in the accumulation of G→T/C→A mutations in the cancer genome (Pilati et al., 2017; Viel et al., 2017; Zehir et al., 2017).

Mutational processes each generate characteristic patterns of mutation known as mutation signatures (Alexandrov et al., 2013a).
Others refer to them as “genomic scars” (Lord and Ashworth, 2012; Watkins et al., 2014), a term which emphasizes the lasting imprint that mutations leave on the cancer genome. Mutation signature analysis leverages the genome itself as a functional assay to study mutagenesis and DNA repair. Many signatures have been associated with specific cancer types (Alexandrov et al., 2013a), histopathological subtypes (Wang et al., 2017), and response to chemotherapeutic agents (Rizvi et al., 2015; Telli et al., 2016).

Somatic mutation provides a basis for tumourigenesis. The stepwise occurrence of mutations in key oncogenes and tumour suppressors generates a diverse set of phenotypes amongst the cells of a tumour, upon which selective pressures can act (Nowell, 1976). This process is known to favour dedifferentiated cells with qualities concisely summarised as The Hallmarks of Cancer (Hanahan and Weinberg, 2011), including replicative immortality, loss of growth suppression, and evasion of immune detection, among others.

The Role of Deficient DNA Repair

Cells are equipped with the molecular machinery to repair damage and errors in the genome. Molecular pathways that repair DNA point mutations include base excision repair (BER), nucleotide excision repair (NER), mismatch repair (MMR), and direct catalytic removal of lesions (i.e. the removal of 6-O-methylguanosine by the MGMT gene) (Weinberg, 2013). Mechanisms for repair of DNA strand breakages and cross-links include HR and non-homologous end joining (NHEJ) (Chang et al., 2017; Ranjha et al., 2018).

As with mutagenic exposures, deficiencies in these DNA repair pathways can give rise to specific patterns of mutation. For example, MMR corrects erroneous base pairing, and a germline predisposition to mismatch repair deficiency (MMRD) gives rise to Lynch Syndrome, which carries a 50-80% lifetime risk of colorectal cancer (Kohlmann and Gruber, 1993).
MMRD is commonly associated with large numbers of C→T mutations and microsatellite instability (MSI), the genome-wide shrinkage or expansion of short repetitive sequences known as microsatellites (Thibodeau et al., 1993).

While DNA damage repair deficiency can drive somatic mutation, it is also hypothesized to produce vulnerability to DNA damaging chemotherapy. Overwhelming the compromised repair mechanism using platinum-based chemotherapy, alkylating agents, anthracyclines or other DNA damaging drugs is thought to stall replication and induce apoptosis (Helleday et al., 2008). A successful application of this strategy is the use of platinum-based chemotherapies in BRCA1/BRCA2 mutated cancers (Farmer et al., 2005; Kennedy et al., 2004; Yang et al., 2011). Alternatively, inducing synthetic lethal inhibition of the poly (ADP-ribose) polymerase (PARP) gene family has recently been shown as a more targeted strategy of treating tumours with mutations in BRCA1 or BRCA2 (Engert et al., 2017; Helleday et al., 2008; Robson et al., 2017).

The Clinical Implications of Homologous Recombination Deficiency

Uncovering the role of homologous recombination deficiency (HRD) in cancer susceptibility was a major scientific breakthrough of the 1990s. Hall et al. (1990) and Narod et al. (1991) first reported that inheritance of chromosomal segment 17q12-23 was strongly associated with familial breast and ovarian cancers. A few years later, Miki et al. (1994) and Albertsen et al. (1994) pinpointed the exact locus of the BRCA1 gene, and BRCA2 was identified the following year (Wooster et al., 1995). Hereditary mutations in BRCA1 and BRCA2 confer up to an 85% lifetime risk of breast cancer and drive 5-10% of total cases (Canadian Cancer Society, 2014; National Cancer Institute, 2014). However, the mechanism by which BRCA1/BRCA2 mutations resulted in cancer risk was unclear. In 1997, phosphorylation of BRCA1 was found to occur in response to DNA damage, and BRCA1 was also found to form a complex with Rad51 at damaged sites (Scully et al., 1997a, 1997b). Gradually, the complementary roles of BRCA1 and BRCA2 in HR mediated DNA damage repair were uncovered (Roy et al., 2012; Yoshida and Miki, 2004).

HR refers to the exchange of similar or identical nucleotide sequences. It facilitates error-free repair of double strand breaks and inter-strand crosslinks, as well as error-free replication support and telomere maintenance (Li and Heyer, 2008). Additional studies have described cancer risk mutations in other genes of the HR pathway, including ATM, MRN, MRE11, RAD50, NBS1, RAD51, XRCC2/3, and the FANC family, as well as downregulation of ATR (Cerbinskaite et al., 2012).

Breast cancers with BRCA1/BRCA2 mutations also display characteristic patterns of mutation. The first of these to be discovered were three large-scale copy number variant (CNV) patterns detected by hybrid-capture panels (Birkbak et al., 2012; Popova et al., 2012). This approach yielded three quantifiable scores (loss of heterozygosity: HRD-LOH, telomeric allelic imbalance: HRD-TAI, and large scale transition: HRD-LST) which correlated with BRCA1/2 mutation status (Timms et al., 2014). More recently, the whole genome analysis of breast cancers has identified characteristic SNV and SV signatures associated with BRCA1/BRCA2 mutations, as well as short homologous stretches of DNA at deletion breakpoints known as microhomology (Alexandrov et al., 2013b; Davies et al., 2017; Nik-Zainal et al., 2016; Stephens et al., 2012).

HRD is a promising target for the administration of poly-ADP ribose polymerase (PARP) inhibitors (Gelmon et al., 2011; Kaufman et al., 2013) and platinum-based therapies such as cisplatin and carboplatin (Farmer et al., 2005; Kennedy et al., 2004; Yang et al., 2011).
This is motivated by the substantial evidence of a link between germline BRCA1 and BRCA2 variants and sensitivity to platinum-based chemotherapy (Arun et al., 2011; Byrski et al., 2010; Tutt et al., 2015; Von Minckwitz et al., 2014). Conversely, spontaneous reversion mutations which restore a functional copy of BRCA1 or BRCA2 have been observed in breast and ovarian cancers with acquired resistance to platinum-based chemotherapy (Afghahi et al., 2017; Patch et al., 2015; Swisher et al., 2008).

Telli et al. (2016) showed that the aforementioned HRD score was predictive of response to platinum-containing neoadjuvant chemotherapy in primary breast cancers. However, a phase III trial could not reproduce this finding in the advanced breast cancer setting (Tutt et al., 2015). What remains unclear is whether more precise quantification of HRD-associated DNA damage using whole genome sequencing (WGS) might predict response to platinum-based chemotherapy in advanced breast cancers where the HRD score on its own could not.

Tumour Heterogeneity and The Evolution of Mutational Processes

Nowell (1976) proposed a model of cancer clonal expansion, which asserts that cancers evolve over time by natural selection. Just as cancers change, so too do the processes which cause mutations in cancer. This surfaces a limitation of mutation signatures: they condense the life history of a tumour into a single time point. As Watkins et al. (2014) wrote, “[b]y chronicling the past but not documenting the present, genomic scar measures report whether or not a defect . . . has been operative at some point in tumorigenesis and not whether it remains operative at the point of treatment.” Mutations accumulated in the cancer genome do not disappear when the processes that created them grind to a halt.
Therefore, observing a mutational pattern does not immediately reveal whether it arose long ago in tumour initiation or whether it represents an active and clinically relevant mutational process.

The lost temporal axis of cancer mutation can be partially restored using genomic features known to vary with time. Consider the analogy of the geological record. The fossilized remnants of past life have accumulated in the earth, but it can be inferred that fossils found in deeper strata likely represent earlier life forms than those found in shallow ones, even without the use of additional technology such as carbon dating. In the cancer genome, clonal heterogeneity and large scale chromosomal duplications are analogous to geological strata. Mutations occurring in a larger fraction of cancer cells likely occurred earlier than mutations occurring in few. Additionally, mutations present on multiple chromosomes in duplicated regions most likely occurred prior to the duplication event, and can also be inferred to be earlier-arising (Figure 1.1).

By inferring the relative timing of mutations, an analysis across publicly available cancer datasets in The Cancer Genome Atlas (TCGA) verified that mutational processes do indeed change over time (McGranahan et al., 2015). Those associated with aging, cigarette smoking, and UV radiation were prevalent among earlier-occurring mutations, whereas the impacts of DNA repair deficiencies were more likely to be split evenly between early and late mutations. While perhaps not as precise as taking multiple time-separated samples of a tumour to track its evolution, this approach enables the temporal dissection of mutations using a single biopsy, obviating the inconvenience, medical risk, and expense of sequencing multiple biopsies of a tumour.

Just as chromosomal duplications can temporally dissect point mutations, so too can point mutations provide a molecular clock for duplications. A late-arising duplication can be expected to reveal a region spanning many prior (and therefore now duplicated) point mutations. Conversely, an early-arising duplication will likely go on to acquire many late-arising (and therefore non-duplicated) point mutations. Importantly, this model refers to “molecular time,” which differs from true time by a factor proportional to the mutation rate, which can vary over time. Purdom et al. (2013) implemented a generative model which estimates the relative timing of chromosomal duplications. An important limitation is that relative timing of events in regions with more complex chromosomal abnormalities can only be inferred if the exact temporal ordering of events is known or can be deduced. This limitation stems from the inability of sequencing techniques to distinguish which homologous chromosome a DNA read originated from. The advent of long-read and linked-read (Zheng et al., 2016) sequencing technology may eventually help to address this limitation.

[Figure 1.1, panels A and B. Diagram labels: SNV copies, timing (early, late, early/late), cellular prevalence; events shown include duplication, deletion, copy-neutral loss of heterozygosity, and allele-specific amplification.]

Figure 1.1: Mutational prevalence is related to time of mutation onset. Mutational prevalence is defined as the arithmetic product of cellular prevalence and mutational copy number, and has a direct impact on variant allele fraction. (A) For a given locus, the cellular prevalence is the fraction of cancer cells carrying a mutation at the locus. Higher cellular prevalence is associated with “trunk” mutations, which are more likely to be early arising amongst cancer cells. (B) Increased mutation copy number is associated with early arising mutations which occurred prior to duplication events.

1.2.2 Whole Genome Sequencing of Cancer

The application of WGS to derive a more complete understanding of cancer has been a central goal of cancer researchers since before the human genome was first decoded in 2003 (Lander et al., 2001). It would take a further 5 years and a sea change in genome sequencing technology before the first application of next-generation whole genome sequencing to a cancer sample was described. Ley et al. (2008) reported the analysis of a cytogenetically normal AML in 2008, only six months after the first human whole genome sequence by next-generation technologies was published (Wheeler et al., 2008). At the time, the bioinformatics tools and genomic resources to facilitate the in-depth analysis of whole genome data could be considered in their infancy compared to today’s standards. Even so, the insights gained into both the approach taken to sequencing the tumour and the biology of the tumour itself were profound when compared to the targeted sequencing approaches commonly applied to cancer research at the time.

Today, the resources required for WGS analysis have decreased substantially. Alongside a steady reduction in the cost of WGS, there have been improvements in the technologies for generating and processing quality raw data as well as the tools and companion datasets that contextualize findings for biological and clinical interpretation. However, a majority of cancer genomics efforts remain focused around targeted deep sequencing and whole exome sequencing (WES) (Morris et al., 2017; Raphael et al., 2017).

Large-scale efforts using genome sequencing to characterize a wide variety of adult and paediatric cancers began in earnest as early as 2005.
This included projects such as TCGA, the International Cancer Genome Consortium (ICGC), Catalogue of Somatic Mutations in Cancer (COSMIC), and Therapeutically Applicable Research to Generate Effective Treatments (TARGET), to name but a few. Not only have such efforts progressed our understanding of cancer as a genomic disease, they also provide the data needed for developing tools and resources that facilitate the rapid detection and analysis of potentially relevant genomic events (Cerami et al., 2012; Gao et al., 2013; Gonzalez-Perez et al., 2013; Rubio-Perez et al., 2015). However, since the bulk of the data produced is focused on coding regions of the genome, the available data are underpowered to inform how untranslated, intronic and intergenic regions might impact the molecular pathogenesis of disease. In many cases the data also lack comprehensive clinical annotation, which is required for linking genomic events to specific cancer types, prognoses, and treatment responses. Furthermore, the majority of samples in these cohorts are from primary untreated disease and do not offer insight into how tumours respond to often complex and disparate treatment regimens (Robinson et al., 2017). Additional cancer cohorts of samples from multiple time-points and biopsy sites that include rich clinical information are therefore still required to better define tumour biology and the relationship to treatment history and response.

The number of tumour whole genome sequences that have been published and made publicly available has steadily increased over the past 10 years. These analyses have led to surprising insights into cancer biology, particularly from the analysis of SVs in tumour genomes (Chong et al., 2017). They range in scope from characterization of cancer cell lines (Pleasance et al., 2010b, 2010a) and n-of-one case reports with rich clinical detail (Ellis et al., 2012), to ultra-deep sequencing of a single tumour to uncover clonal heterogeneity (Griffith et al., 2015).
Larger-scale efforts are also emerging, focused both on the characterization of somatic mutations, including non-coding and SVs (Alexandrov et al., 2013b; Banerji et al., 2012; Nik-Zainal et al., 2016; Wang et al., 2014) and on germline-specific analysis for the discovery of predisposing factors (Foley et al., 2015). For example, Nik-Zainal et al. (2016) sequenced the whole genomes of 560 breast cancers, 260 supplemented with transcriptome sequencing. They demonstrated how such approaches fill gaps in our understanding of the genome between the exons and expand the known repertoire of biological mechanisms underlying tumorigenesis with potential clinical utility (Alexandrov and Stratton, 2014; Alexandrov et al., 2013b; Davies et al., 2017; Zolkind and Uppaluri, 2017).

WGS has many unique capabilities that together enable the complete cataloguing of somatic variation within a single experimental protocol (Figure 1.2). Whereas targeted sequencing has the advantage of being efficient and affordable while capturing much of the known actionable variation, WGS carries advantages in the analysis of genomic instability and SVs. The research within this thesis relies upon insights gleaned from recent advances in the analysis and interpretation of cancer WGS data. The following sections provide an overview of unique capabilities of WGS which are leveraged in later chapters.

High Resolution Structural Variant Analysis

Few images are more deeply inscribed in the history of human genetics than the karyotype, which emphasizes the functional importance of genomic organization. Over the years, a plethora of technologies have enabled inspection of SVs and CNVs in the genome. Some, such as fluorescence in-situ hybridization, are precisely targeted, while others, such as array-comparative genomic hybridization, are comprehensive at varying resolutions. WGS promises to deliver precise and comprehensive characterization of both SVs and CNVs.
While many challenges stand in the way of achieving a complete “digital karyotype” of cancer, WGS has made dramatic advances towards this goal.

[Figure 1.2. Diagram labels: Whole Genome Sequencing; Genomic Interpretation (Small Variants, CNV & LOH, Structural Variants); Tumor Biology (Mutagenesis, Genomic Instability); Pathway Integration; Mutation Signatures; Functional Prediction; Therapeutic Targets. Example genes shown: FGFR3, ELK4, SOS2, EGFR, NRAS, BRAF, MAP2K2, MAP2K1, ERBB3, MAPK6, MYC, NF1.]

Figure 1.2: Whole genome sequencing data reveal diverse forms of genomic alteration. Tumour genomes exhibit frequent mutation and genomic instability which drive the hallmarks of cancer. Whole genome sequencing can catalog various forms of genomic alteration, enabling integrative analyses of tumour biology.

The cancer genome frequently features complex and interlocking patterns of somatic SVs, expanding the realm of possible cancer-driving alterations. Since the discovery of the Philadelphia chromosome (Nowell P., 1960), the characterization of oncogenic fusions has been central to cancer diagnosis and treatment. Aside from fusions, SVs also modulate gene regulation by rearranging the non-coding genome. Variants that impact the copy number or relative positioning of promoters and other regulatory elements can alter gene expression (Alaei-Mahabadi et al., 2017).

Paired-end WGS has become the standard for comprehensively and precisely cataloguing SVs and CNVs. While targeted arrays and WES can provide comparative read counts for CNV analysis, they lack the resolution to detect microamplifications and microdeletions and suffer from sequencing depth bias. These challenges, along with a need for computational methods to address them, limit the accuracy of CNV calls and result in high false discovery rates (Zare et al., 2017). While WES can detect gene fusions (Chmielecki et al., 2013), it misses fusions affecting splice sites, promoters, and other functionally critical loci.

SV analysis methods fall broadly into four categories: read density, split reads, paired end reads, and de novo assembly (Liu et al., 2015; Tattini et al., 2015). Recent methods combine these strategies to capitalize on the unique strengths of each. For example, DELLY improves sensitivity by considering paired end reads while enabling base-pair breakpoint calling precision by examining split reads (Rausch et al., 2012). A major challenge in digital karyotype construction is poor concordance between SVs and CNV breakpoints (Alaei-Mahabadi et al., 2017), which suggests mismatched detection thresholds between these two modalities. This mismatch causes further difficulty in reconstructing complex rearrangements such as chromothripsis (Stephens et al., 2011) and chromoplexy (Baca et al., 2013).

SVs and CNVs play important roles in the somatic alteration of cancer genes, and are best comprehensively characterized by WGS. Evolving methodologies promise to move the field towards increasingly accurate reconstruction of the digital karyotype of cancer.

Methods for Analyzing Mutation Signatures

A mutation signature can be defined as a set of somatic mutation types which occur at specific relative frequencies. For example, a simple mutation signature could describe the relative frequencies of each base change amongst SNVs. To avoid redundancy, it is typical to collapse complementary bases, resulting in six mutation classes: C→A, C→G, C→T, T→A, T→C, T→G. A widely-used extension of this classification includes the 3’ and 5’ trinucleotide context.
For example, a C→T mutation in an ApCpG context would be considered one mutation class. Because there are four possible 3’ and four possible 5’ bases, this parameterization yields a total of 96 SNV classes (Alexandrov et al., 2013b).

This approach accounts for bias in the frequencies of mutations observed in specific trinucleotide contexts. For example, deamination of methyl-cytosine is a common cause of C→T mutations, and often occurs at CpG sites, which are frequently methylated. As a result, signatures of deamination often feature high rates of C→T mutations in NCG trinucleotide contexts (Nik-Zainal et al., 2012).

The analysis of mutation signatures involves two steps: (1) generating mutation count vectors, and (2) inferring signatures and exposures. Using the chosen parameterization, the number of mutations of each class is determined to yield a mutation count vector, also known as a mutation catalog. The inference of signatures and exposures was first performed using non-negative matrix factorization (NMF), which determines a dimensionally reduced set of mutation signatures and their relative contributions to each sample’s genome (Alexandrov et al., 2013b). Thus, the mutation counts of a given genome are modeled as a linear combination of the signatures, which is consistent with the notion of multiple overlapping mutational processes each exerting additive effects (Figure 1.3). While this method can be used to derive mutation signatures both from genomes and exomes, the number of mutations sampled by WES is often insufficient to detect all but the most hypermutating signatures, unless many hundreds of cancers are sequenced.

[Figure 1.3. Diagram labels: Operative Signatures (P): Signature A, Signature B, Signature C; Signature Exposures (E); Linear Combination of Signatures (P × E); Subject Signatures (M): Subject 1, Subject 2. Arrows: "Natural processes - summating signatures" and "Deciphering signatures".]

Figure 1.3: The analysis of mutation signatures by non-negative matrix factorization. Mutational occurrence probabilities are modeled as a linear combination of signatures. This defines a generative process whereby the genomes of individual cancers differ in the relative contributions of each mutation signature. In reality, the mutation count matrix (M) is known, and the signatures (P) and exposures (E) are unknown. Non-negative matrix factorization is an unsupervised learning method which infers a stable factorization of the mutation count matrix, which is often referred to as “deciphering mutation signatures de novo.”

There are many variations on the analysis of mutation signatures which involve modifying (1) the parameterization of mutation types, or (2) the dimensionality reduction algorithm. An example of varying mutation types is the inference of SV mutation signatures in breast cancer (Nik-Zainal et al., 2016), wherein mutations were classified by size, type (deletions, duplications, inversions, and translocations), and whether or not their breakpoints are clustered. Alternative dimensionality reduction algorithms include principal components analysis (Gehring et al., 2015), expectation-maximization (Fischer et al., 2013), empirical Bayes (Rosales et al., 2017), and Bayesian NMF (Kim et al., 2016). Additionally, mutation signatures can be determined in subsets of the total mutation set to address specific biological hypotheses. For example, Supek and Lehner (2017) partitioned mutations into clustered and non-clustered sets to determine which mutational processes generate local hypermutation, and McGranahan et al. (2015) partitioned mutations into inferred early and late sets to examine mutation signature evolution.

The first integrative analysis of mutation signatures across cancer types aggregated 21 mutation signatures across 7,042 cancers, mostly from publicly available sequencing data (Alexandrov et al., 2013a). Later, this set of “consensus” mutation signatures was expanded to 30 to incorporate the continued discovery of novel signatures (for example Poon et al. (2015)). The 30-signature reference set is described in detail at http://cancer.sanger.ac.uk/cosmic/signatures.

Lastly, methods for determining the contributions of known mutation signatures to individual cancer genomes have also emerged (Huang et al., 2017; Rosenthal et al., 2016). The previously described mutation signature analysis methods can be considered de novo signature analysis, because they simultaneously infer mutation signatures and their contributions to individual cancers from scratch. By contrast, n-of-1 mutation signature analysis requires determining the best fit of an individual mutation profile against a set of known signatures. In the personalized cancer genomics paradigm, it is important to be able to reproducibly analyze the signatures of individual genomes to identify patterns of mutation with potential therapeutic implications.

Methods for Analyzing Tumour Heterogeneity

The analysis of intratumour heterogeneity in cancer is an area of active research. Significant advances in single-cell genome sequencing are enabling the direct sampling of genomic heterogeneity across tumour cells. While single-cell sequencing technologies are rapidly improving, resource limitations make them prohibitive at this time for use in clinical research, where large cancer cohorts are often required to distinguish significant effects. Another strategy for understanding tumour evolution involves the statistical inference of cancer cell subclones via the analysis of digital next-generation sequencing (NGS) read counts.
This approach can be made more computationally feasible with the collection and comparison of multiple cancer sequencing timepoints.

NGS by Illumina-based protocols involves the capture, sequencing, and alignment of fragmented DNA reads. This process yields digital read depth counts at each locus, which, at sufficient sequencing depth, enables genome-wide statistical inference of DNA copy number. Additionally, somatic mutation loci can be queried for the fraction of total reads supporting the variant allele, also known as the variant allele fraction (VAF). The number of variant reads is related to (1) the tumour content or purity of the sample, (2) the tumour and normal cell copy number at the mutated locus, (3) the number of DNA copies carrying the mutation, also known as the mutation copy number (MCN), and (4) the fraction of cancer cells carrying the mutation, also known as cancer cell fraction (CCF).

If the tumour content and copy number are known, the MCN and CCF of a given mutation can be estimated from the VAF. The random sampling of fragmented reads from the bulk tumour sample also introduces noise. With sufficiently deep sequencing, discrete clusters representing mutations of varying MCN, and sometimes different CCF, can be observed. At lower sequencing depths, it may be challenging to deconvolute MCN and CCF. A common goal for tumour heterogeneity inference methods is to infer the number of tumour subclones present as well as their relative CCF (Miller et al., 2014; Roth et al., 2014). These techniques are often designed to work on deep sequencing data, wherein sites of known somatic mutation are sequenced to a depth of hundreds or thousands of reads.

A simplified approach to CCF estimation has been applied to the temporal analysis of mutation signatures from WES or WGS data.
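
To make the dependence of VAF on purity, copy number, MCN, and CCF concrete, the sketch below uses the standard mixture relationship in which the expected VAF is the number of mutated DNA copies divided by the total number of copies at the locus, averaged over tumour and normal cells. It is an illustrative assumption rather than the model developed in this thesis, and it ignores sampling noise. The function names are hypothetical, and the final step, which splits mutational prevalence (CCF × MCN, as defined in the caption of Figure 1.1) into separate MCN and CCF values, is a deliberately crude heuristic that assumes any mutation present on more than one copy is clonal.

```python
def expected_vaf(purity, tumour_cn, mcn, ccf, normal_cn=2):
    """Expected variant allele fraction at a locus, ignoring sampling noise.

    purity:    fraction of cells in the sample that are cancer cells
    tumour_cn: total copy number of the locus in cancer cells
    mcn:       mutation copy number (copies carrying the mutation)
    ccf:       cancer cell fraction (fraction of cancer cells mutated)
    normal_cn: copy number of the locus in normal cells (assumed diploid)
    """
    mutated_copies = purity * ccf * mcn
    total_copies = purity * tumour_cn + (1 - purity) * normal_cn
    return mutated_copies / total_copies


def mutational_prevalence(vaf, purity, tumour_cn, normal_cn=2):
    """Invert expected_vaf to recover the product CCF x MCN from a VAF."""
    total_copies = purity * tumour_cn + (1 - purity) * normal_cn
    return vaf * total_copies / purity


def split_prevalence(prevalence):
    """Heuristic split of CCF x MCN into (MCN, CCF).

    Assumption for illustration only: a prevalence of at least 1 is treated
    as a clonal mutation (CCF = 1) on a whole number of copies; anything
    lower is treated as a single-copy mutation in a fraction of cells.
    """
    if prevalence >= 1:
        return max(1, round(prevalence)), 1.0
    return 1, prevalence
```

For example, a clonal mutation on two of three copies in a sample of 80% purity has an expected VAF of 1.6 / 2.8, or about 0.57, and inverting that VAF returns a mutational prevalence of 2. The product alone cannot distinguish MCN from CCF, which is why the last step needs an extra assumption.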
This simplified approach involves partitioning cancer somatic mutations according to their MCN and CCF (McGranahan et al., 2015) and has yielded promising findings regarding mutation signature evolution in lung cancer (Bruin et al., 2014), breast cancer (Yates et al., 2017), and liver cancer (Letouzé et al., 2017).

Complementarity with the Transcriptome

Alongside advancements in WGS, RNA sequencing technologies including whole transcriptome sequencing (WTS) can provide complementary insights in a personalized medicine setting. WTS enables genome-wide quantification of gene expression, which substantially expands the capture of potentially actionable molecular aberrations. Substantial efforts aim to use tumour expression profiles to refine cancer diagnoses and subtyping. Translation of these efforts into interpretable and clinically actionable parameters, such as the widely used PAM50 gene set for subtyping, stratification, and prognostication of breast cancers (Chia et al., 2012), can further advance these aims. The integration of gene expression data into functional clusters or pathways using statistical methods or visualizations can help identify dysregulated cancer pathways, including DNA repair pathways (Mulligan et al., 2014). This can provide rationale for guiding targeted cancer treatment, including the use of experimental therapeutics (Tomasetti et al., 2017). When treatments fail, gene expression analysis can also elucidate resistance mechanisms and potentially suggest follow-up targets (Jones et al., 2010).

The existence and functional impacts of many events observed in the genome can be further analyzed by WTS. Amplified and deleted genes can be assessed for differential expression. The presence of oncogenic or deleterious mutations on the expressed transcript can be confirmed. Exon skipping and intron retention can be identified and potentially linked to splice site variants.
Transcriptome assembly can facilitate detection of potentially oncogenic alternative transcripts. The presence of oncogenic gene fusions can be confirmed, and their expression verified. The effects of promoter and enhancer mutations on gene expression can be quantified. Tumour suppressors such as BRCA1 and MLH1 can be assessed for potential epigenetic silencing.

1.2.3 From Cancer Genome to Personalized Oncogenomics

The scope of sequencing and its applications has also broadened into the clinic. For specific cancer types, the use of targeted genomic panels for both germline susceptibility and known ‘actionable’ somatic mutations is becoming routine in many cancer centres (Bosdet et al., 2013; De Leeneer et al., 2011). The development and application of larger-scale gene panels is also seeing routine use on a large number of patient samples (Zehir et al., 2017). These high-throughput approaches drive discovery in the clinical context and quantify the frequency and prevalence of both well-characterized and novel variants in cancer-related genes. However, they capture a tiny fraction of the genomic complexity that can exist in an individual tumour.

Jones et al. (2010) published the first attempt to characterise a whole cancer genome for a clinical application. This analysis involved sequencing an adenocarcinoma of the tongue and identifying genomic amplification and concurrent abundant expression of the RET oncogene as a potential driver of the disease. This discovery led to a personalised treatment approach using kinase inhibitors to target the RET protein. Subsequent analysis of a post-treatment sample after disease progression provided comprehensive insight into how the tumour evolved to circumvent the treatment regimen in a way that a targeted approach could not have achieved.
The success of this study led to a pilot for the Personalized Oncogenomics Project (POG) study at the BC Cancer Agency, now in its 5th year, which aims to leverage whole genome analysis with the intent to treat based on the genomic information. Sequencing of the initial 100 patients on this trial required development of pipelines and comprehensive interpretation tools (Laskin et al., 2015) and promised to restratify cancers by molecular features rather than by site of origin.

The inherent genomic complexity of cancers gives rise to a range of genomic events and signatures that are becoming increasingly relevant in patient treatment stratification. However, it is far from certain that an individual’s tumour will harbour previously described and functionally characterized genomic events. The successful clinical application of personalized genomic medicine therefore must rely on broad screening approaches, a conclusion that was also reached in a study comparing WES and WGS in gastric cancer (Wang et al., 2014). A whole genome approach is currently the most efficient way to build a comprehensive picture of the genomic variation in a tumour without resorting to multiple technical platforms. That being said, there are still significant challenges that must be overcome before approaches such as whole genome and transcriptome analysis (WGTA) can be universally adopted. However, on the assumption that sequencing costs will follow the historical downward trend, a more gradual uptake of WGTA for more refined stratification and subtyping of rare tumours may be achievable in the short term.
Furthermore, as unanticipated clinical successes from WGTA continue to permeate the field and the infrastructure to support such approaches matures, a transition towards ever more comprehensive sequencing in the clinical setting is expected.

Outstanding Challenges in the Clinical Translation of Mutation Signatures

The study of mutational processes in cancer is yielding promising predictive biomarkers (Davies et al., 2017; Rizvi et al., 2015). However, many challenges stand in the way of clinical translation. While HRD mutation signatures have been proposed as targets for DNA damaging chemotherapy (Alexandrov et al., 2015b), their association with therapeutic outcomes has not yet been established.

Additional technical challenges limit biomarker studies of mutation signatures. The individualized analysis of SNV mutation signatures is possible, but existing approaches are susceptible to signal bleed between similar signatures, which can lead to false positive signature identification. Moreover, methods for individualized temporal dissection of mutation signatures provide point estimates without confidence intervals, which are challenging to interpret in the clinical setting. Additionally, while SNV mutation signatures can be captured by WES, the accuracy of assessing processes with low to medium mutagenicity, such as HRD, is poor. WGS not only addresses this, but also simultaneously enables the detection of SV signatures (Nik-Zainal et al., 2016). However, unlike with SNV signatures, there is not yet a consensus set of SV signatures applicable across cancer types.

Lastly, the understanding of how mutation signatures evolve in the metastatic setting is limited, as nearly all studies of mutation signatures have taken place in primary, untreated cancers.
Metastasis underlies as much as 90% of cancer-related mortality (Chaffer and Weinberg, 2011), and the clinical features of metastatic dissemination and overall survival can be highly variable, even amongst subtypes of a cancer (Kennecke et al., 2010). Only recently have genomic studies investigated actionable cancer genes (Robinson et al., 2017) and the tumour evolution (Yates et al., 2017) of metastatic cancers. Understanding the factors which shape mutagenesis in metastatic cancers will help to guide the clinical application of mutation signatures.

1.3 thesis objectives and chapter overview

The overarching goal of this thesis is to elucidate the clinical actionability of mutation signatures and their temporal evolution. The HR pathway was an ideal model system to study clinical actionability of mutation signatures, as it relates to the commonly mutated genes BRCA1 and BRCA2 and to readily available platinum-based chemotherapy. I hypothesized that mutation signatures of HRD are independently associated with response and resistance to platinum-based chemotherapy, even in cancers lacking BRCA1/BRCA2 mutations. I further hypothesized that in cancers with low or moderate HRD mutation signatures, observing the continued activity of HRD mutagenesis could be similarly associated with response to platinum-based chemotherapy. The corollary to this hypothesis is that past exposure to platinum-based chemotherapy would be associated with the suppression of HRD mutation signatures in late mutations.

The aim of chapter 2 is to assess the predictive value of mutation signatures arising from HRD. In this chapter, I describe an observational study of 93 advanced-stage breast cancers, 33 of which were treated with platinum-based chemotherapy. We made use of the recently developed HRDetect metric, which combines six distinct signatures of HRD to form a more robust model that has been shown to predict mutations in BRCA1/BRCA2 with high accuracy (Davies et al., 2017).
We found that patients with increased HRDetect scores had improved outcomes on platinum-based chemotherapy, and that HRDetect was a better predictor than any of the six signatures alone. This work was published in Clinical Cancer Research (Zhao et al., 2017).

Examining temporal shifts in mutation signatures required the development of novel mutation analysis software. In chapter 3, I present SignIT: mutation signature inference in individual tumours. SignIT is a Bayesian hierarchical model which provides improved accuracy and clinical interpretability of individualized signature analysis. A natural extension of the SignIT model enabled the inference of mutation signatures across tumour subpopulations with inferred relative timing.

Having formulated SignIT, I demonstrated two applications of mutation signature timing. In chapter 4, I analyzed the evolution of HRD in a pancreatic adenocarcinoma to address the hypothesis that ongoing HRD activity is associated with treatment outcome. In chapter 5, I report mutation signatures and their temporal evolution across 484 metastatic cancers. This analysis revealed an association between prior platinum-based chemotherapy exposure and the suppression of HRD in late mutations. I also uncovered signatures arising from chemotherapy-associated mutagenesis.

2 THE CLINICAL ACTIONABILITY OF HOMOLOGOUS RECOMBINATION DEFICIENCY IN ADVANCED BREAST CANCER

2.1 introduction

Genomic instability and mutagenesis are hallmarks of human cancers that can arise from deficient DNA repair processes. One such process, HR, involves strand invasion by homologous sequences to facilitate error-free repair of double strand breaks and inter-strand crosslinks (Li and Heyer, 2008). Mutations in genes responsible for HR are prevalent among human cancers. The BRCA1 and BRCA2 genes are centrally involved in HR, DNA damage repair, end resection, and checkpoint signaling (Joosse, 2012).
Inherited mutations in BRCA1 and BRCA2 account for 5-10% of all breast cancers and confer a lifetime risk of up to 85% (Canadian Cancer Society, 2014; National Cancer Institute, 2014). There is also emerging evidence suggesting that germline BRCA1 and BRCA2 mutated cancers are associated with sensitivity to platinum-based chemotherapy (Arun et al., 2011; Byrski et al., 2010; Tutt et al., 2015; Von Minckwitz et al., 2014) and PARP inhibitors (Robson et al., 2017). This is further supported by resistance to platinum-based agents arising from secondary mutations that cause somatic reversion of germline BRCA1/2 variants (Norquist et al., 2011).

HRD is complex, and its myriad causes are not fully understood. However, examining characteristic patterns of mutation, collectively known as mutation signatures or genomic scars, can provide an aggregate metric of pathway function. For example, BRCA1 and BRCA2 are associated with characteristic CNV patterns (Timms et al., 2014), which have been suggested to independently predict platinum sensitivity in primary breast cancer (Telli et al., 2016). However, a clinical trial in advanced stage triple negative breast cancer did not verify this association (Tutt et al., 2015). Meanwhile, new genomic correlates have refined the detection of HRD. Large-scale genome profiling across thousands of cancers has revealed characteristic patterns of mutation giving rise to millions of somatic SNVs (Alexandrov et al., 2013a) and SVs (Nik-Zainal et al., 2016).
Recent efforts aggregated six HRD-associated signatures into a single score called HRDetect to accurately classify breast cancers by their BRCA1 and BRCA2 status (Davies et al., 2017).

With this improved capability to quantify “BRCA-ness,” there is substantial interest in its therapeutic implications in breast cancer (Alexandrov et al., 2013a; Davies et al., 2017; Jacot et al., 2015; Lips et al., 2013; Stecklein and Sharma, 2014). Importantly, these measures may be able to identify BRCA1- and BRCA2-intact but HR-deficient tumours to guide eligibility for HRD-targeted clinical trials and treatment decision-making. However, there is not yet direct evidence that aggregated genomic scar metrics predict platinum sensitivity. In this observational biomarker study, we perform WGS to identify HRD mutation signatures in a cohort of 93 patients with advanced stage breast cancers and associate them with molecular, pathologic, and clinical features. Using HRDetect, we aggregate HRD signatures and demonstrate their association with clinical benefit on platinum-based chemotherapy.

2.2 results

2.2.1 Somatic Mutation Signatures

Using a published framework (Alexandrov et al., 2013b), we deciphered the mutation signatures of 1,182,840 somatic SNVs and 11,393 SVs from the whole genomes of 93 advanced-stage breast cancers.

Of the nine resulting SNV signatures, numbered V1-V9 (Fig. 2.1A), six closely matched previously described mutation signatures available from COSMIC. V9 (Signature 3) and V6 (Signature 8) are associated with HRD (Alexandrov et al., 2013a; Davies et al., 2017; Nik-Zainal et al., 2016). V4 (Signature 1) is associated with aging (Alexandrov et al., 2015a). V1 (Signature 2) and V2 (Signature 13) are associated with APOBEC deaminase activity. V3 (Signature 17) has been observed across many cancer types, but its etiology is unclear.

The three remaining signatures, V5, V7, and V8, represent novel breast cancer mutational signatures.
V5 predominantly displays C→T mutations in CpCpY contexts (see Table A.1 for nomenclature) and was present in only three cancers. V7 is characterized by a high pyrimidine transition rate with enrichment in NpYpG contexts and was observed across many tumours spanning histological and molecular subtypes. V8 demonstrated moderate enrichment of all base substitution types when flanked by T and A bases, and was present at low levels across many tumours. These novel signatures may reflect the advanced, recurrent, and drug-treated nature of our cohort, whereas previous mutation signatures have been derived from primary untreated cancers. Further study is necessary to verify etiology. Potential etiologies of signatures V3 and V5 will be discussed in more depth in chapter 5.

[Figure 2.1 panels: (A) signatures V1 (APOBEC), V2 (APOBEC), V3 (Sig 17), V4 (Aging), V5, V6 (HRD), V7, V8, V9 (HRD), by base change (C>A, C>G, C>T, T>A, T>C, T>G) and 5'/3' context; (B) exposure fraction per signature and mutation burden (Muts / Mb) per sample.]
Figure 2.1: Nine signatures of single nucleotide variation deciphered from 93 breast cancer whole genomes. (A) Signatures are visualized according to relative frequencies of mutations grouped by base change and 3’/5’ context. Six of nine signatures match previously published mutation signatures (cosine similarity > 0.9), five of which are associated with hypothetical etiologies. (B) Fractional exposures and mutation burdens across the patient cohort, ordered by hierarchical clustering, reveal groups defined by aging (top cluster with dominant V4), homologous recombination deficiency (middle cluster with dominant V9), and APOBEC deamination (lower cluster with dominant V1 and V2).

Hierarchical clustering revealed that most cases of high SNV burden were driven by APOBEC or HRD associated processes (Fig. 2.1B), which together were dominant in 46 (49%) of the 93 sequenced breast cancers.
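
The caption of Figure 2.1 reports that deciphered signatures were matched to published ones at a cosine similarity above 0.9. A minimal sketch of such a comparison follows. The four-channel vectors are invented toy values (SNV signatures are normally defined over 96 base-change and context channels), and `best_match` is a hypothetical helper rather than part of any published tool.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length signature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(signature, references, threshold=0.9):
    """Return (name, similarity) for the closest reference signature.

    The name is None when even the closest reference does not exceed the
    threshold, i.e. the signature would be treated as unmatched.
    """
    name, ref = max(references.items(),
                    key=lambda item: cosine_similarity(signature, item[1]))
    sim = cosine_similarity(signature, ref)
    return (name if sim > threshold else None), sim


# Toy reference set with invented values, for illustration only.
references = {
    "ref_A": [0.70, 0.10, 0.10, 0.10],
    "ref_B": [0.10, 0.10, 0.10, 0.70],
}
match, similarity = best_match([0.65, 0.15, 0.10, 0.10], references)
unmatched, _ = best_match([0.25, 0.25, 0.25, 0.25], references)
```

A signature that clears the threshold against one reference is reported as a match; a flat profile that resembles neither reference is returned with the name `None`.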
The aging mutation signature was ubiquitous across cancers, and was the dominant signature in 31 (33%) cases, all of which had low mutation burden (< 5 SNVs per Mb).

92 samples were classified into intrinsic subtypes based on expression profiles of PAM50 (Chia et al., 2012) genes. Non-parametric analysis demonstrated significant differences in signatures V2, V3, V8, and V9 across subtypes (Appendix Table A.2). Post-hoc pairwise Dunn tests revealed elevated V3, V8, and V9 within basal-like cancers (Figure 2.2), suggesting that diverse mutagenic etiologies, including HRD, underlie this subtype. Elevated signature V9 was also most common among triple-negative tumours.

The six deciphered SV signatures (Figure 2.3), numbered R1-R6, closely resembled the six previously described breast cancer signatures reported by Nik-Zainal et al. (2016). R1-R4 and R6 uniquely matched previously described signatures. By visual inspection, R5 matches previously described rearrangement signature 5 albeit with more non-clustered translocations.

[Figure 2.2 panels: (A) ER, PR, and HER2 marker status (negative, positive, unknown) at diagnosis and relapse; (B) intrinsic subtype (Basal, Her2, Luminal A, Luminal B, Normal-like), signature fraction for V1-V9, and mutation burden (Muts / Mb); (C) V3, V8, and V9 exposures by intrinsic subtype.]
Figure 2.2: Breast cancer signatures across subtypes. Comparison of SNV mutation signatures across (A) histological and (B) molecular subtypes shows more frequent signature V9 (homologous recombination deficiency) exposure in triple-negative and basal-like breast cancers. (C) Four signatures (V2, V3, V8, V9) exhibited statistically significant differences across molecular subtypes (Kruskal-Wallis test with adjusted p-values). Subtype-specific signature exposures are shown here, with pairwise statistical significance testing performed by the Dunn non-parametric test of multiple comparisons.

[Figure 2.3 panels: (A) probability of each SV type (DEL, DUP, INV, TRA) for signatures R1-R6, split into clustered and non-clustered events and by size (1 kb, 10 kb, 100 kb, 1 Mb, 10 Mb, translocation); (B) exposure to R1-R6 and SV burden per sample.]
Figure 2.3: Breast cancer structural variant signatures. (A) Six structural variant (SV) mutation signatures deciphered from breast cancer whole genomes. SVs were classified according to breakpoint clustering, mutation type, and size. (B) Signature exposures were normalized to sum to 1 for each sample, then arranged according to hierarchical clustering alongside SV mutation burden.

2.2.2 Genomic Findings Associated with HRD

Alongside the four HRD-associated SNV and SV mutation signatures, we measured two additional HRD-associated patterns of somatic mutation: the HRD index and microhomology at deletion breakpoints. The HRD index measures the frequency of large-scale loss of heterozygosity (LOH), telomeric allelic imbalance (TAI), and large-scale transition (LST) events (Timms et al., 2014), and was computed using allelic copy number ratios inferred from read alignment frequencies. The proportion of small deletions associated with microhomology was determined by comparing sequences flanking deletion breakpoints. As per a published method (Davies et al., 2017), all six scores were log transformed, normalized, and combined into a single HRDetect predictor. This was performed using a logistic function with the same coefficients as those reported by Davies et al. (2017) to ensure consistency with the previously published model.

19 breast cancers had high HRDetect scores (> 0.7), 37 had moderate scores (0.005-0.7), and 37 had low scores (< 0.005).
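
The aggregation and thresholding described above can be sketched as follows. The log transform, normalization, and logistic form follow the description in the text, but the feature names, means, standard deviations, weights, and intercept below are placeholder values invented for illustration. They are not the coefficients reported by Davies et al. (2017), and the use of log(x + 1) is likewise an assumption.

```python
import math

# Placeholder parameters, NOT the published HRDetect coefficients.
# feature name: (assumed mean of log feature, assumed sd, assumed weight)
PARAMS = {
    "snv3": (2.0, 1.0, 1.5),
    "snv8": (2.0, 1.0, 0.5),
    "sv3": (1.0, 1.0, 1.0),
    "sv5": (1.0, 1.0, 0.8),
    "hrd_index": (2.5, 0.5, 0.6),
    "microhomology": (0.2, 0.1, 2.0),
}
INTERCEPT = -3.0  # placeholder value


def hrdetect_like_score(features):
    """Log-transform, standardize, and combine six features with a logistic function."""
    linear = INTERCEPT
    for name, (mean, sd, weight) in PARAMS.items():
        z = (math.log(features[name] + 1.0) - mean) / sd
        linear += weight * z
    return 1.0 / (1.0 + math.exp(-linear))


def classify(score, low=0.005, high=0.7):
    """Thresholds used in the text: > 0.7 high, < 0.005 low, otherwise moderate."""
    if score > high:
        return "high"
    if score < low:
        return "low"
    return "moderate"


quiet = {name: 0.0 for name in PARAMS}
scarred = dict(quiet, snv3=5000.0, sv3=200.0, microhomology=0.6)
labels = [classify(hrdetect_like_score(f)) for f in (quiet, scarred)]
```

Because the score is a probability between 0 and 1, the two thresholds partition tumours into the low, moderate, and high groups counted above.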
All cancers underwent genome-wide characterization of germline and somatic point mutations, insertions and deletions, and copy loss in gene regions and splice sites.

Across the 93 breast cancers, HRDetect predicted pathogenic germline and somatic variants in BRCA1 and BRCA2 with high accuracy and an optimal differentiating threshold of 0.74 (Fig. 2.4B). These findings closely agree with the previously established threshold of 0.70 (Davies et al., 2017). Because variants of uncertain significance (VUS) have previously not been associated with increased HRDetect (Davies et al., 2017), we classified VUS as non-pathogenic mutations for the purposes of this analysis. Elevated HRDetect scores were observed in all tumours with observed BRCA1/BRCA2 frame shifts, nonsense mutations, homozygous deletions, or splice variants identified as likely pathogenic in ClinVar (Fig. 2.5). There were 11 cases with germline missense VUS. The most common of these was BRCA2 T1915M, which has a global minor allele frequency of 1.14% and conflicting reports of both reducing (Serrano-Fernández et al., 2009) and increasing (Johnson et al., 2007) breast cancer risk. In our study, seven breast cancers (BR004, BR027, BR032, BR036, BR064, BR074, BR086) harboured germline BRCA2 T1915M, of which three (BR004, BR036, BR086) were homozygous in the tumour and displayed a wide range of HRDetect scores (0, 0.04, and 0.62 respectively). However, BR086 exhibited coincident homozygous deletion of RAD51, which may account for the elevated score. These data therefore do not provide clear evidence for pathogenicity of BRCA2 T1915M.

A number of other genes involved in HR demonstrated tentative associations with HRDetect scores. Elevated HRDetect was observed in three cases with homozygous deletion of PTEN as well as one case with two coincident PTEN missense mutations (F278L and P38S). However, one case with homozygous PTEN A126D somatic mutation was associated with a low HRDetect score.
Homozygous deletions in RAD50, RAD51, and MCPH1 were observed in some tumours with moderate or high HRDetect scores. MCPH1 is a potential cancer susceptibility gene (Mantere et al., 2016) whose deletion may be a poor prognostic marker (Tsuneizumi et al., 2002). Although recurrently deleted in our cohort, its link to HRD signatures was inconsistent.

High HRDetect scores were also associated with triple negative and basal-like breast cancers (Table 2.1). Of 19 samples with high HRDetect, 11 (58%) were classified as basal-like. Among low HRDetect samples, only 2 (5%) were basal-like. Luminal B and normal-like tumours were more likely to have low HRDetect scores, whereas most (7/9) HER2-like tumours displayed moderate HRDetect. Receptor status was assessed by immunohistochemistry and retrieved from pathology records, which were available for 79 tumours at primary and 76 at relapse (Figure 2.2). High HRDetect was inversely associated with positive receptor status in all three receptors. 50% of high HRDetect tumours were triple negative, compared to only 6% of primary and 15% of metastatic low HRDetect tumours.

[Figure 2.4 panels: (A) probability of response against HRDetect score, with CI and PD model fits, response (PD, SD, CI), and BRCA1/BRCA2 status (+/+, +/-, -/-); (B-D) ROC curves (true positives against false positives) for the predictors snv 3, snv 8, SV 3, SV 5, HRD Index, Microhomology, and HRDetect.]
Figure 2.4: Association of platinum-based treatment outcomes with HRDetect, an aggregate of six homologous recombination deficiency (HRD) mutation signatures. (A) The HRDetect score is significantly associated with clinical improvement (CI) on platinum-based chemotherapy (logistic regression, adjusted for BRCA1/2 status and treatment timing, p = 0.006). There was also a trend between low HRDetect and progressive disease (PD; p = 0.112).
Moreover, of 8 BRCA1/2-intact cases with elevated HRDetect score, 5 responded favorably to platinum-based chemotherapy. Receiver-operator characteristic curves for (B) BRCA status and (C, D) therapeutic outcomes on platinum-based chemotherapy (C: CI; D: stable disease, SD). These suggest optimal HRDetect thresholds of 0.7 and 0.005 for CI and SD respectively. Specific near-threshold HRDetect values are labelled. In all three ROC curves, HRDetect had a greater area under the curve than its six constituent mutation signatures.

[Figure 2.5 tracks: HRDetect score with predicted response; platinum treatment time (before, during, or after biopsy); response (PD, SD, CI, NA); variants (deletion, splice variant, missense, nonsense, frame shift; somatic or germline) in PALB2, BRIP1, MCPH1, ATR, ATM, RAD50, RAD51, PTEN, BRCA2, BRCA1, and TP53; intrinsic subtype; ER, PR, and HER2 status.]
Figure 2.5: HRDetect scores, mutations in key homologous recombination genes, and outcomes on platinum-based therapy. Six distinct mutation signatures associated with homologous recombination deficiency (HRD) were deciphered from 93 breast cancer whole genomes and aggregated into a single HRDetect score. Radiology reports during and after treatment regimens involving platinum-based chemotherapy were reviewed for evidence of clinical improvement (CI), stable disease (SD), or progressive disease (PD).
Analysis of receiver-operator characteristic curves suggested HRDetect thresholds of 0.7 for CI and 0.005 for SD, indicated here by a colourbar.

Table 2.1: Summary of patient molecular and clinical characteristics by HRDetect status.

HRDetect status                 Low (<0.005)   Moderate     High (>0.7)   Total
Sample counts
  Total Count                   37             37           19            93
  Treated Count                 9              13           11            33
  Treated and Imaged            8              7            11            26
  Pathogenic BRCA1/2 Variant    0              0            7             7
Response to Platinum
  CI                            0              2            8             10
  SD                            2              4            2             8
  PD                            6              1            1             8
  Median TDT (days)             56 (n=9)       71 (n=13)    143 (n=11)
  Median OS (days)              122 (n=6)      160 (n=8)    384 (n=5)
Intrinsic Subtype
  Basal                         2              12           11            24
  HER2 Amplified                1              7            1             9
  Luminal A                     6              5            2             13
  Luminal B                     22             12           4             38
  Normal-like                   6              0            1             7
Primary receptor status
  ER (positive/negative)        31 / 3         20 / 9       7 / 8         58 / 20
  PR (positive/negative)        18 / 4         11 / 10      4 / 12        33 / 26
  HER2 (positive/negative)      4 / 23         4 / 22       0 / 14        8 / 59
  Triple negative               2 (6%)         8 (28%)      8 (50%)       18
Metastatic receptor status
  ER (positive/negative)        27 / 6         17 / 10      5 / 10        49 / 26
  PR (positive/negative)        15 / 15        9 / 13       2 / 10        26 / 38
  HER2 (positive/negative)      6 / 28         4 / 22       1 / 13        11 / 63
  Triple negative               5 (15%)        6 (21%)      8 (50%)       19

2.2.3 HRD Mutation Signatures are Associated with Platinum Outcomes

High HRDetect scores were significantly associated with clinical improvement on platinum-based chemotherapy, even after adjusting for BRCA1/BRCA2 status and treatment timing (p = 0.006, n = 26; Table 2.4). HRDetect demonstrated areas under the ROC curve of 0.89 for clinical improvement (CI) and 0.86 for stable disease (SD), which exceeded those of its component signatures (Fig. 2.4C, D; Table 2.3). Optimal thresholds of 0.005 for predicting SD and 0.7 for predicting CI were chosen (Fig. 2.4C, D). Sensitivity, specificity, positive predictive value, and negative predictive value were computed for both thresholds and are reported in Table 2.2.

Biallelic loss of BRCA1 or BRCA2 was also associated with clinical improvement on platinum-based chemotherapy (Fig.
2.4A) but was observed in only three of 26 treated patients with available imaging. By comparison, 11 patients demonstrated HRDetect scores above 0.7, of whom 8 experienced CI, 2 experienced SD, and 1 had disease progression. Therefore, HRDetect scores correctly identified five additional patients without biallelic loss of BRCA1 or BRCA2 who benefited from platinum-based therapy. In a joint logistic model, BRCA1 and BRCA2 status did not contribute significantly to the predictive value of HRDetect (Table 2.4).

Table 2.2: Test metrics of HRDetect predictions computed using specified thresholds. Elevated HRDetect scores computed from whole genome sequencing of a breast cancer cohort were associated with improved response to platinum-based chemotherapy. Receiver-operator characteristic (ROC) curves suggested thresholds of 0.005 for stable disease (SD) and 0.7 for clinical improvement (CI). Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were computed based on true/false positive/negative rates for both thresholds.

Response    Threshold   Accuracy   Sensitivity   Specificity   PPV     NPV
SD or CI    0.005       0.85       0.89          0.75          0.89    0.75
CI          0.7         0.81       0.8           0.82          0.73    0.88

Table 2.3: Area under the curve of homologous recombination deficiency (HRD) signatures in platinum response prediction. Six distinct HRD-associated mutation signatures were computed using whole genome sequencing data from 93 advanced-stage breast cancers. The six signatures were normalized and aggregated using a logistic regression model with coefficients trained in a previous study (Davies et al., 2017). Retrospective clinical review was performed to classify best reported radiographic response to platinum-based chemotherapy into three categories: clinical improvement (CI), stable disease (SD), and progressive disease (PD). Receiver-operator characteristics (ROC) were computed for the aggregated metric, called HRDetect, as well as the six original signatures.
Treatment success groups were defined as either CI or the union of CI and SD response groups. For both success metrics, the area under the curve of each ROC curve is reported here.

Predictor        BRCA1/2 status AUC   CI AUC   CI & SD AUC
snv 3            0.897                0.756    0.836
snv 8            0.777                0.826    0.743
SV 3             0.832                0.838    0.845
SV 5             0.769                0.821    0.822
HRD Index        0.743                0.741    0.812
Microhomology    0.899                0.75     0.605
HRDetect         0.94                 0.891    0.855

Table 2.4: Logistic regression model odds ratios of clinical improvement (CI) on platinum-based chemotherapy. HRDetect scores were computed using six mutation signatures derived from whole genome sequencing of a breast cancer cohort. Germline and somatic mutation status, deletions, and loss of heterozygosity of BRCA1 and BRCA2 were assessed to determine mono-allelic and bi-allelic loss of function. Retrospective clinical review classified responses to platinum-based chemotherapy, which were modelled using logistic regression with HRDetect scores, BRCA1 & BRCA2 status, and treatment timing as predictors. HRDetect was significantly associated with platinum response with a log odds ratio of 3.2 (odds ratio = 16, p = 0.006).

                    z        p        Log Odds Ratio   Lower CI   Upper CI
Intercept           -1.9     0.061    -2.1             -4.6       -0.12
HRDetect            2.8      0.0057   3.2              1.1        5.7
BRCA+/-             -0.22    0.83     -0.46            -4.8       3.9
BRCA-/-             0.54     0.59     0.73             -2.1       3.7
Tx During Biopsy    -0.12    0.9      -0.21            -4.1       3.1
Tx After Biopsy     -0.023   0.98     -0.028           -2.5       2.5

2.2.4 Effects of HRDetect on Overall Survival and Treatment Duration

Of patients treated post-biopsy with platinum-based chemotherapy, there was a statistically significant difference in overall survival (OS) depending upon HRDetect (p = 0.04, n = 33). 5 patients with predicted CI (HRDetect > 0.7) demonstrated a median survival of 384 days, 8 with predicted SD (0.7 > HRDetect > 0.005) had a median survival of 160 days, and 6 patients with predicted progressive disease (PD) (HRDetect < 0.005) had a median survival of 122 days.
This difference should be interpreted with caution due to small sample size and other treatments received besides platinum, but represents a promising trend which warrants further study.

In addition to OS, total duration on platinum-based therapy (TDT) was used as a surrogate for clinical response. In practice, platinum-based chemotherapy is typically continued in responding patients until disease progression or significant toxicity. Figure 2.8 verifies that, in 26 patients with available imaging, patients with reported radiographic response were more likely to undergo a longer duration of treatment. HRDetect scores were significantly associated with extended TDT with a hazard ratio of 0.24 (0.081 - 0.95; p = 0.01, n = 33), after adjusting for BRCA1 and BRCA2 mutation status, timing of treatment, and patient age (Fig. 2.6B). Tumours were classified based on HRDetect scores into predicted treatment response categories. There was a significant difference in TDT (p < 0.001, n = 33; Fig. 2.6A) between patients with predicted CI (median 143 days), SD (median 71 days), and PD (median 56 days). This amounts to an estimated three-month difference in median TDT between high HRD and low HRD cases.

2.2.5 Feasibility of HRD Analysis in Personalized Medicine

The development of precision oncology initiatives (Laskin et al., 2015; Meric-Bernstam et al., 2013; Mestan et al., 2011; Zehir et al., 2017) has necessitated genome analysis pipelines compatible with “N of 1” cases. One challenge of mutation signature analysis by NMF is the reliance upon large cohorts of sequenced tumours. This has led to techniques to determine the most likely composition of signatures for a single isolated sample (Rosenthal et al., 2016). HRD analysis provides a promising target for personalized treatment decision-making.
Thus, in addition to cohort-based de novo signature discovery, we also computed individual-tumour best fit signature exposure profiles for HRD-associated SNV signatures 3 (V9) and 8 (V6) and SV signatures 3 (R1) and 5 (R5) using non-negative least squares (NNLS; details in Methods). We then recomputed HRDetect scores using these individualized NNLS signature exposures to assess accuracy.

[Figure 2.6: panel A, overall survival probability versus time (years) by predicted response (PD, SD, CI), p = 0.04; panel B, probability of continued therapy versus time by predicted response (CI, PD, SD), p = 0.00029; panel C, Cox regression hazard ratios for Age, BRCA -/-, BRCA +/-, HRDetect, Tx After Biopsy, and Tx During Biopsy.]

Figure 2.6: Homologous recombination deficiency is associated with extended overall survival (OS) and total duration on platinum-based therapy (TDT). (A) Among patients treated after the sequencing biopsy (n = 19), OS was computed as the duration between first post-biopsy treatment and death. There was a statistically significant (p = 0.04) difference between patients predicted to be CI (HRDetect > 0.7), SD (0.7 > HRDetect > 0.005), and PD (HRDetect < 0.005). (B) Platinum-treated patients (n = 33) with different predicted treatment outcomes also experienced significantly different TDT as part of standard care for advanced breast cancer. (C) Multivariate Cox survival model demonstrated a significant association between HRDetect and TDT independently of BRCA1/2 mutation status. 95% confidence intervals are shown for the hazard ratio.

HRDetect scores and all four HRD-associated SNV and SV signatures demonstrated high concordance between NMF and NNLS approaches based on Pearson linear regression (r > 0.9; Fig. 2.7). Employing the selected thresholds of 0.005 for SD and 0.7 for CI, 86 out of 93 cancers were concordantly classified by NNLS and NMF, including all cases predicted to experience CI.
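To illustrate the single-sample fitting idea behind the NNLS exposures discussed here, the sketch below solves a toy non-negative least squares problem: given fixed reference signatures, it finds non-negative exposures that best reconstruct one tumour's mutation catalogue. Everything in it is an illustrative assumption rather than the pipeline used in this work: the three four-class signatures and the catalogue are invented, the function name is hypothetical, and projected gradient descent stands in for whichever NNLS solver was actually used.

```python
def fit_exposures_nnls(signatures, catalog, iterations=2000):
    """Minimise ||sum_j x_j * signatures[j] - catalog||^2 subject to x_j >= 0.

    signatures: list of k reference signatures, each a list of m class probabilities.
    catalog:    list of m observed mutation counts for a single tumour.
    Solver: projected gradient descent (a simple stand-in, not the thesis method).
    """
    k, m = len(signatures), len(catalog)
    # The squared Frobenius norm bounds the largest eigenvalue of the Gram
    # matrix, so its reciprocal is a safe (convergent) step size.
    step = 1.0 / sum(v * v for sig in signatures for v in sig)
    x = [0.0] * k
    for _ in range(iterations):
        fitted = [sum(x[j] * signatures[j][i] for j in range(k)) for i in range(m)]
        residual = [fitted[i] - catalog[i] for i in range(m)]
        gradient = [sum(signatures[j][i] * residual[i] for i in range(m)) for j in range(k)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * gradient[j]) for j in range(k)]
    return x


# Hypothetical reference signatures over four mutation classes (each sums to 1).
toy_signatures = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]
# Catalogue built from 100 mutations of the first signature and 50 of the third.
toy_catalog = [75.0, 15.0, 30.0, 30.0]

exposures = fit_exposures_nnls(toy_signatures, toy_catalog)
```

In this noise-free toy case the fit recovers the generating exposures; with real catalogues the residual is non-zero and similar signatures can trade exposure with one another.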
NNLS reclassified 4 cancers from PD to SD, and 3 from SD to PD.

These findings demonstrate that NNLS-based N of 1 computation of mutation signature exposures provides robust HRD estimates concordant with a cohort-based NMF approach. This is promising for the application of HRD biomarkers in sequencing-driven treatment guidance. However, this approach may not translate to WES data or similarly targeted sequencing approaches due to the lower numbers of sampled mutations.

[Figure 2.7: panel A, HRDetect (NNLS) versus HRDetect (NMF), with points marked by correct classification (FALSE, TRUE) and predicted class (PD, SD, CI); panel B, NNLS exposure versus NMF exposure for SNV Signature 3, SNV Signature 8, SV Signature 3, and SV Signature 5.]

Figure 2.7: N of 1 signatures by non-negative least squares (NNLS) accurately reproduce HRDetect scores. (A) HRDetect scores were computed using component signatures derived from both NNLS and non-negative matrix factorization (NMF). Scores obtained by the two approaches were strongly correlated (Pearson’s R squared = 0.99) and demonstrated high classification concordance based on selected thresholds. (B) Individual HRD-associated mutation signatures were concordant between the two approaches (Pearson’s R squared > 0.82 for all signatures).

2.3 Discussion

In this retrospective study, HRD mutation signatures were associated with clinical benefit on platinum-based chemotherapy in advanced stage breast cancer. Specifically, we demonstrated that HRDetect, the same model independently trained to predict BRCA1 and BRCA2 status with high sensitivity and specificity (Davies et al., 2017), was also significantly associated with favorable response to platinum chemotherapy and longer TDT. Moreover, we identified an optimal HRDetect threshold of 0.7, which agrees with the previously established cut-off for BRCA1/BRCA2 status (Davies et al., 2017). Therefore, our findings both independently validate the HRDetect model and provide promising evidence for its clinical relevance.

A key limitation of this study is the inability to establish causation. As this was an observational cohort of advanced-stage breast cancers undergoing standard chemotherapy treatments, some patients were sequenced during or after courses of platinum-based chemotherapy. To mitigate the impacts of tumour evolution, we limited analyses to patients sequenced within two years of treatment. Another significant challenge when studying treated tumours is that platinum-associated mutagenesis may impact the mutation signature profile, especially in cancers biopsied after treatment. A few factors help to mitigate this challenge, but cannot entirely rule out platinum-induced mutagenesis. First, we adjusted for the treatment timing in statistical analyses of the association between HRDetect and clinical outcomes. Second, there has been reproducible evidence of HRD-associated signatures in cohorts of predominantly primary tumours (Alexandrov et al., 2013a; Davies et al., 2017; Nik-Zainal et al., 2012, 2016; Timms et al., 2014), which are a close match to the signatures we found. Lastly, the aggregation of six distinct signatures into a more robust metric should help mitigate the impact of platinum-induced mutagenesis affecting any one signature in particular.
Notably, the investigation of advanced stage breast cancers is an important feature of this study. Whereas a previous trial did not find that the HRD index alone was predictive in advanced breast cancer (Tutt et al., 2015), our findings renew promise for aggregated metrics such as HRDetect. However, studying advanced stage tumours inevitably introduces potential confounders such as variable treatment histories. Therefore, well-designed prospective clinical trials are needed to further validate HRDetect as a predictive biomarker.

Another caveat is the threshold selection for predicting CI. A threshold of 0.7 was chosen because it both agrees with the model trained by Davies et al. (2017) and optimally separated responders from non-responders in our cohort. However, there was a sharp decline in HRDetect scores below 0.7, with no cases falling between 0.25 and 0.5, and no cases with treatment response data between 0.5 and 0.7. This suggests that a superior threshold may exist between 0.25 and 0.7, and a study with greater sample size may be required to pinpoint it.

HRD is common among breast cancers. Based on our HRDetect predictive thresholds, 19 cases (20%) showed potentially targetable high HRD status (HRDetect > 0.70). An additional 37 cancers (40%) showed moderate HRD status consistent with stable disease on platinum-based chemotherapy (HRDetect > 0.005). By comparison, biallelic germline and somatic mutations were detected in only 11 cases, and known pathogenic variants in only 7. Similarly, a previous analysis of 560 breast cancer genomes, which additionally examined promoter hypermethylation, estimated the frequency of BRCA-null breast cancers at 14% (Nik-Zainal et al., 2016).
The analysis of HRD signatures may identify patients who could benefit from platinum-based therapy but would otherwise go undetected on BRCA1/2 screening. These signatures may also have implications for sensitivity to PARP inhibitors, which exploit a synthetic lethal interaction between PARP-1 and the HR pathway. Germline mutations in BRCA1 and BRCA2 are associated with improved response to PARP inhibitors (Robson et al., 2017). Additional translational research incorporating WGS is necessary to reveal whether HRD mutation signatures are similarly associated with PARP inhibitor response independently of BRCA1/2 status.

Clinical translation of HRD mutation signatures requires sufficient capture of somatic SNVs and SVs to infer the processes underlying mutagenesis. While HRDetect improves upon the accuracy of the clinically employed LOH, TAI, and LST metrics, it requires WGS, which currently poses technical and financial challenges for clinical use. Further research to develop predictive models that exclude SV signatures may enable application on cancer exomes or other targeted sequencing methods, which can capture sufficient somatic mutations for SNV signature but not SV signature analysis. Additionally, orthogonal HRD assays, for example employing gene set expression profiling (Mulligan et al., 2014), may also serve as lower cost parameters for treatment prediction. Nevertheless, as sequencing costs fall, WGS provides unique opportunities to integrate diverse markers of genomic instability and mutagenesis within a single protocol. Moreover, we demonstrated that NNLS mutation signature analysis enables accurate N of 1 HRD signature investigation for genome-driven personalized medicine initiatives.

Quantifying HRD signatures supplements existing knowledge and paradigms of cancer detection and stratification. HRDetect scores were associated not only with BRCA1 and BRCA2, but also potentially with other genes such as PTEN.
This approach provides a functional indicator for mutations whose impact on gene function is uncertain, potentially expanding the repertoire of known causative variants which comprise hereditary cancer screening (Polak et al., 2017). Additionally, we observed that HRD signatures were more common in, but not exclusive to, triple-negative and basal-like breast cancers. This agrees with previous work (Nik-Zainal et al., 2016) and helps to situate HRD in the context of other widely-used breast cancer markers. A topic for future investigation is the value of screening basal-like and triple-negative breast cancers for signatures of HRD.

Breast cancer remains the most common cancer diagnosis in women worldwide. It is evident that a substantial proportion are driven in some part by HRD. Here, we have quantified the relationship between aggregated HRD signatures and measures of sensitivity to platinum-based chemotherapy, providing the basis for further investigation of this putative predictive biomarker in prospective trials. In doing so, this study demonstrates the potential for mutation signatures to guide clinical therapy in a precision oncology setting.

2.4 Methods

2.4.1 Patient Samples, Ethics, and Data Policy

Ninety-three study participants with advanced stage breast cancer underwent tumour biopsies at the BC Cancer Agency and collaborating hospitals as part of the POG project, the first 100 cases of which were described in an earlier publication (Laskin et al., 2015). This study includes data from the first 93 verified breast cancer cases which underwent whole genome characterization and met quality assurance standards.

2.4.2 Sample Collection, Preparation, and Sequencing

Biopsy samples were embedded in optimal cutting temperature (OCT) compound and sectioned. Pathology review was completed for each specimen, including assessment of tumour content.
Genome libraries from tumour and peripheral blood (normal control) as well as transcriptome libraries from tumour were constructed using Illumina protocols. Whole genome and transcriptome sequencing was performed on an Illumina HiSeq2000 or HiSeq2500 sequencer. The details of library construction and sequencing have been previously described (Bose et al., 2015; Sheffield et al., 2015).

2.4.3 Bioinformatic Analysis

Sequencing reads were aligned to the human reference genome (GSCh37) by the BWA aligner (v0.5.7) (Li and Durbin, 2009, 2010). Somatic SNVs and small insertions/deletions were processed using samtools (Li et al., 2009) and Strelka (v0.4.6.2) (Saunders et al., 2012). CNVs were called using CNASeq (v0.0.6) as described in (Jones et al., 2010) and LOH by APOLLOH (v0.1.1) (Ha et al., 2012). The matched normal genome was used to subtract germline variants and to report cancer risk variants in 98 select actionable genes, pre-approved by an ethics committee. Germline variant pathogenicity was estimated according to established ACMG guidelines (Richards et al., 2015) using a local curated variant database and custom-built risk calculator established by the BC Cancer Agency Cancer Genetics Laboratory. Transcriptomes were repositioned using JAGuaR (version 2.0.3) (Butterfield et al., 2014). Differential expression analysis was performed by comparing RPKM expression levels against a compendium of 16 normal tissues from the Illumina BodyMap 2.0 project (available from ArrayExpress, query ID: E-MTAB-513) as described in (Jones et al., 2010). Intrinsic subtypes were determined by performing Spearman rank-order correlations on the expression of genes in the PAM50 gene set (Chia et al., 2012) for each breast cancer subtype between sequenced samples and 823 breast cancers derived from The Cancer Genome Atlas (The Cancer Genome Atlas, 2012). For each sample, the subtype with the greatest correlation coefficient was taken as the intrinsic subtype (Figure 2.2).
One tumour sample did not pass quality control for RNA-seq and was excluded from analyses involving intrinsic subtypes.

2.4.4 Determining HRDetect Scores

HRDetect scores were computed by aggregating six mutation signatures associated with HRD: (1) SNV signature 3/V9, (2) SNV signature 8/V6, (3) SV signature 3/R1, (4) SV signature 5/R5, (5) the HRD index, and (6) the fraction of deletions with microhomology. All signatures were normalized and log transformed as previously described (Davies et al., 2017), and HRDetect scores were computed using a logistic model with the same intercept and coefficients as those reported in the previously trained model, without any retraining or adjustment (Davies et al., 2017). The intercept was -3.364 and the coefficients were 1.611, 0.091, 1.153, 0.847, 0.667, and 2.398 respectively for the six HRD signatures. The sections that follow detail the computation of the six component signatures. A complete pipeline for computing HRDetect scores is available at github.com/eyzhao/hrdetect-pipeline.

2.4.5 Single Nucleotide Variant Mutation Signatures

Somatic SNVs called by Strelka were used for mutation signature calculation. SNVs were categorized based on 6 variant types and 16 trinucleotide context subtypes to yield a total of 96 mutation classes. Mutation signatures were deciphered using a published framework (Alexandrov et al., 2013b), which employs NMF to infer both the operative signatures prevalent across the 93-genome cohort and the relative exposure of each signature to each genome. Exposures are modeled as the number of mutations contributed by a mutation signature. Fractional exposure was defined as the proportion of a genome’s total mutation burden contributed by a particular signature. Signature stability estimates were obtained by bootstrap re-sampling with 1,008 iterations (84 iterations over 12 cores).
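The logistic aggregation described in Section 2.4.4 can be sketched as follows. This is a minimal illustration rather than the published pipeline: the intercept and coefficients are those quoted above, but the inputs are assumed to be already normalized and log transformed (that preprocessing is not reproduced here), the function names are hypothetical, and the treatment of scores exactly equal to a threshold is an assumption.

```python
import math

# Intercept and coefficients as reported in Section 2.4.4, in the order:
# SNV signature 3, SNV signature 8, SV signature 3, SV signature 5,
# HRD index, fraction of deletions with microhomology.
HRDETECT_INTERCEPT = -3.364
HRDETECT_COEFFICIENTS = [1.611, 0.091, 1.153, 0.847, 0.667, 2.398]


def hrdetect_score(features):
    """Logistic score from six features that are ALREADY normalized and
    log transformed (that preprocessing is assumed, not implemented here)."""
    if len(features) != len(HRDETECT_COEFFICIENTS):
        raise ValueError("expected six features")
    linear = HRDETECT_INTERCEPT + sum(
        c * f for c, f in zip(HRDETECT_COEFFICIENTS, features)
    )
    return 1.0 / (1.0 + math.exp(-linear))


def predicted_response(score, ci_threshold=0.7, sd_threshold=0.005):
    """Map a score to the predicted response categories used in the Results.
    Behaviour at exactly a threshold value is an assumption."""
    if score > ci_threshold:
        return "CI"
    if score > sd_threshold:
        return "SD"
    return "PD"


baseline = hrdetect_score([0.0] * 6)  # every feature at its assumed mean
high = hrdetect_score([2.0] * 6)      # hypothetical strongly HRD-like tumour
low = hrdetect_score([-1.0] * 6)      # hypothetical HR-proficient tumour
```

Because the model is reused without retraining, the only cohort-specific inputs are the six feature values; the thresholds of 0.7 and 0.005 are the ones selected in the Results.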
The similarity of signatures to thirty previously described mutational signatures (available from cancer.sanger.ac.uk/cosmic/signatures) was quantified using the cosine similarity metric. Solutions with 7 to 10 signatures were found to best maximize signature stability and minimize Frobenius reconstruction error. Among these, a 9-signature model was selected as it yielded one signature with maximal cosine similarity to the previously described HRD-associated Signature 3.

2.4.6 Structural Variant Mutation Signatures

Large scale somatic SVs were reconstructed by de novo assembly of tumour and normal reads using ABySS and Trans-ABySS (Robertson et al., 2010). Candidate SVs were re-aligned to the reference genome to resolve breakpoints. Additionally, we used DELLY (v0.6.1) to obtain an independent SV set by reference-based analysis of split and paired end reads (Rausch et al., 2012). Germline events were filtered out by subtracting SVs found in the matched normal genome. SVs detected by the two methods were merged to yield a high quality consensus set, containing an intersection of variants called by both methods where matching breakpoint loci were separated by no more than 20 base pairs.

32-parameter SV mutation catalog vectors were computed by binning variants based on breakpoint clustering, SV type, and SV length (Nik-Zainal et al., 2016), yielding a 32 by 93 catalogue matrix. This matrix was decomposed by NMF (as with SNV signatures) using a 6-signature model, which was chosen to maximize signature stability and minimize Frobenius reconstruction error. Pairwise comparisons of newly deciphered mutation signatures to six previously described signatures were performed using the cosine similarity metric.

2.4.7 Calculation of the HRD Index

For each cancer genome, the HRD index was computed as the arithmetic sum of LOH, TAI, and LST scores. CNV and LOH analysis pipelines yielded coordinates segmenting whole genomes by allele-specific copy number ratios.
We created an R package called HRDtools which computes LOH, TAI, and LST scores based on the genome-wide CNV profile (available from github.com/eyzhao/hrdtools). Because the HRD index relies upon large-scale events, HRDtools first filters out small events occurring within contiguous events at least 100 times larger. The three scores are then determined based on published guidelines (Timms et al., 2014).

2.4.8 Analysis of Deletion Microhomology

Somatic deletions were detected based on sequence alignment using Strelka. Sequences flanking deletion breakpoints were obtained. The microhomology fraction was determined as the proportion of deletions which were larger than three base pairs and demonstrated overlapping microhomology at the breakpoints.

2.4.9 Review of Clinical Case Data

Retrospective chart review was performed to obtain treatment history and clinical response to chemotherapy regimens. We queried a province-wide registry of oncology therapeutic records (Wu et al., 2013) to obtain dates of (1) birth, (2) death if applicable, (3) most recent cancer diagnosis, and (4) start and end dates of all platinum-based chemotherapy regimens administered to treat the most recent cancer diagnosis along with therapies used in combination. Treatment timelines and clinical response are presented in Figure 2.8. All patients were treated as part of standard cancer care either prior to, during, or after the sequencing biopsy. Platinum-treated patients were given standard doses of cisplatin (30 mg/m² on days 1 and 8 of a 21 day cycle) or carboplatin (calculated in milligrams as glomerular filtration rate + 25, multiplied by 6 for monotherapy or 5 in combination regimens).

To assess therapeutic benefit, three outcomes were chosen: OS, TDT, and clinical response based on imaging. OS was assessed in patients treated after sequencing (n = 19) and was computed as the duration from first post-biopsy dose of platinum-based chemotherapy to death.
TDT was examined as a surrogate for therapy effectiveness. To improve relevance to the present diagnosis, TDT included only treatment regimens occurring within 2 years of sequencing biopsy (n = 33; Figure 2.8).

Clinical imaging reports, including fludeoxyglucose positron emission tomography and computed tomography obtained during or within two months after the period of platinum-based therapy, were reviewed to evaluate platinum response relative to pre-treatment scans. Treatment response was classified as follows: (1) CI, any tumour shrinkage of one or more lesions with no evidence of growth or new lesions; (2) SD, either no change in lesions or decreased size of some lesions with growth of others; or (3) PD, disease progression with no associated tumour shrinkage. The best observed response per regimen was recorded.

[Figure 2.8: per-patient treatment timelines (patients BR001 to BR084 on the vertical axis) against time in days relative to biopsy, annotated by timing (Before Biopsy, During Biopsy, After Biopsy), platinum agent (carboplatin, cisplatin), and response (PD, SD, CI).]

Figure 2.8: Treatment timelines and radiographic outcomes on platinum-based chemotherapy arranged by total duration on platinum-based chemotherapy. The time axis is aligned to the biopsy date, which is centred at time zero. Treatment timelines were obtained from the Outcomes and Surveillance Integration System (OaSIS) of the BC Cancer Agency, which aggregates cancer therapy data across provincial registries. Radiographic outcomes were obtained from a retrospective review of radiologist reports specific to periods of platinum-based treatment.

3 SignIT: Inferring Mutation Signatures and Their Temporal Evolution in Individual Tumours

3.1 Introduction

Mutagenic processes in cancer leave characteristic patterns of somatic SNVs (Alexandrov et al., 2013a) and SVs (Nik-Zainal et al., 2016).
These mutation signatures reveal exposures such as tobacco smoke (Alexandrov et al., 2016) and ultraviolet radiation, as well as DNA repair deficiencies (Polak et al., 2017). They have also been shown to correlate with the etiology, biology, and pathology of tumours (Schulze et al., 2015; Wang et al., 2017). Recent studies have also revealed therapeutic implications of mutation signatures, suggesting opportunities for predictive biomarker clinical trials (Alexandrov et al., 2015b; Le et al., 2015; Rizvi et al., 2015; Zhao et al., 2017). Increasingly, high throughput sequencing is being investigated for its potential to guide cancer precision therapy (Kumar-Sinha and Chinnaiyan, 2018; Zehir et al., 2017). The use of mutation signatures as biomarkers for personalized medicine will require accurate, robust, and interpretable mutation signature analysis in individual tumours.

Most mutation signature methods focus on detecting signatures de novo, which requires large cancer genome cohorts (Alexandrov et al., 2013b; Baez-Ortega and Gori, 2017; Fischer et al., 2013; Shiraishi et al., 2015). Currently, two methods exist for n-of-1 mutation signature decomposition by fitting to consensus reference signatures: deconstructSigs (Rosenthal et al., 2016) and SignatureEstimation (Huang et al., 2017). deconstructSigs performs signature selection and point estimation of signature exposures, while SignatureEstimation additionally reports credible intervals. However, a significant challenge when fitting samples to reference signatures is multicollinearity: correlated features between signatures can cause bleed of signal between them. This can cause overfitting, resulting in over- or underestimation of clinically relevant mutation signatures.

The temporal evolution of mutation signatures is also crucial to their biological and clinical interpretation.
Temporal dissection of cancer mutation sets has shown that exogenous and aging-related mutagenic processes act earlier than endogenous mutagens and DNA repair deficiencies (Bruin et al., 2014; McGranahan et al., 2015; Rosenthal et al., 2016). Temporal shifts in mutagenesis may also reveal shifts in therapeutic targets.

One approach for tracking changing mutation signatures is serial sequencing. However, this strategy is costly and impractical in the clinical setting as it requires rebiopsy. An alternative strategy is to use digital NGS read counts to infer the cellular prevalence (also known as cancer cell fraction) and number of chromosomal copies carrying each mutation. Both are directly related to mutation timing: somatic mutations present on multiple copies likely occurred before duplication, and mutations with high cellular prevalence likely occurred before subclone development (Figure 1.1). Previous approaches have partitioned mutations a priori into “early” and “late” categories (Bruin et al., 2014; McGranahan et al., 2015). Henceforth, we will refer to this approach as binary temporal partitioning (BTP). This method is limited because it makes hard assumptions about the underlying tumour clonal architecture, uses arbitrary thresholds to define “clonal” mutations, and treats each variant independently rather than inferring shared parameters from the complete data.

In recent years, Bayesian probabilistic models have generated substantial advances in the inference of heterogeneous tumour subpopulations from both somatic SNVs and read depth data (Fischer et al., 2014; Ha et al., 2014; Miller et al., 2014; Roth et al., 2014). These methods define a hierarchical data-generating probabilistic model, then infer model parameters (such as population prevalences and signature exposures) to best reflect the data.
Bayesian inference is often more robust to noise and provides full posterior distributions over parameter estimates.

Here, we present SignIT, an R package featuring a Bayesian hierarchical model for accurate and robust mutation signature analysis of individual tumours. Full posterior estimates of signature exposures juxtaposed with comprehensive mutation signature bleed mapping can significantly enhance interpretability. SignIT also includes an extended model which enables joint inference of mutation signatures and temporally distinct tumour subpopulations, exposing signature evolution. We assess SignIT’s n-of-1 signature accuracy against deconstructSigs and SignatureEstimation using both simulated mutation count vectors and somatic mutation data from TCGA. We validate SignIT’s temporal analysis using WGS data from 24 serially sequenced primary-metastasis tumour pairs. Lastly, we apply SignIT to the analysis of 543 metastatic whole genomes, the first ever temporal analysis of mutation signatures in metastatic cancer.

3.2 Results

3.2.1 SignIT Reports Credible Intervals and Signature Bleed

We begin with an example to illustrate SignIT’s output. P10 is a patient who underwent whole genome sequencing of her metastatic breast cancer, revealing 10,068 somatic SNVs. Mutation signature analysis by SignIT revealed elevated signatures 3 and 8, both associated with HRD (Davies et al., 2017), as well as slight involvement of signatures 2, 9, and 17 (Figure 3.1A). SignIT reports full posterior probability distributions which reflect the stochasticity of somatic mutation as well as uncertainties due to signature bleed. Moreover, 2D projections of the posterior distribution provide a pairwise map of signature bleed (Figure 3.1B), which is visualized below signature exposures as a non-directed graph. Signature bleed presents as anti-correlation in the posterior samples because the existence of reference signatures with similar profiles will produce mutually exclusive solutions.
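As a rough illustration of how such anti-correlation can be turned into a pairwise bleed value, the sketch below computes the negative Spearman correlation between posterior exposure samples of two signatures, which is how bleed is quantified in Figure 3.1. The posterior draws are invented for the example and the function names are hypothetical; this is not SignIT's own code.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def signature_bleed(samples_a, samples_b):
    """Bleed between two signatures as the negative Spearman correlation
    of their posterior exposure samples."""
    return -spearman(samples_a, samples_b)


# Hypothetical posterior draws (mutation counts) for two similar signatures:
# whenever one exposure is sampled high, the other is sampled low.
sig3_draws = [2900, 3100, 3000, 3300, 2800]
sig8_draws = [3100, 2950, 3020, 2700, 3150]
bleed = signature_bleed(sig3_draws, sig8_draws)
```

A value near 1 indicates that the two exposures trade off almost perfectly across posterior samples, while a value near 0 indicates that they are estimated independently.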
Signatures 3 and 8, for example, have correlated mutation spectra, with a cosine similarity of 0.76.

SV signatures can also be analyzed. P10 possessed 146 somatic SVs, which were fitted against six SV signatures previously identified in breast cancer (Nik-Zainal et al., 2016). This revealed involvement of rearrangement signatures 2 and 5, with signature bleed between them (Figure 3.1C).

[Figure 3.1: panel A, posterior exposure for each of signatures 1 to 30, with a signature bleed graph below; panel B, 2D density of Signature 3 versus Signature 8 exposures; panel C, posterior exposure for each of SV signatures 1 to 6.]

Figure 3.1: SignIT reports complete posterior distributions along with signal bleed between signatures. (A) Complete Bayesian inference over mutation counts determines posterior parameter estimates for each mutation signature exposure. Signature bleed, defined as the negative Spearman correlation coefficient per 2D slice of the posterior distribution, is quantified pairwise between signatures and plotted as a graph below the posteriors. (B) An example of anticorrelated exposure posteriors between Signature 3 and Signature 8 is shown as a 2D density plot. (C) Mutation signature exposures can also be computed for structural variant mutation signatures, using a 6-signature reference set.

3.2.2 Resilience to Complexity and Noise

To assess the accuracy of signature exposures, we created a mutation signature simulation R package called msimR (github.com/eyzhao/msimR). Mutation count vectors were generated from known simulated signature exposures with varying mutation burden and model complexity (number of active signatures). Additionally, there can be uncertainty in the mutational profile of the processes which generated somatic mutations, as no reference set can be expected to capture all the possible biological variability of mutagenesis. To emulate this biological variability, random Gaussian perturbation of reference signatures was introduced (Figure 3.2A). It is expected that performance improves with increasing mutation burden (sample size) and declines with increasing model complexity (the number of signatures, or dimensionality) and reference signature noise.

[Figure 3.2: panel B, cosine distance versus number of mutations (10 to 100,000, log scale) for deconstructSigs, SignatureEstimation_SA, and SignIT, faceted by number of active signatures (1, 5, 12, 20) and reference signature noise (0%, 10%, 20%, 40%, 80%).]

Figure 3.2: SignIT improves signature estimation for complex models with noisy data. (A) Mutation count vectors were simulated by varying the mutation burden, number of active signatures, and amount of noise introduced into the reference signature matrix. Exposures were estimated using three signature decomposition methods and compared against true exposures. (B) 500 count vectors were simulated at each condition and the similarity of estimated to true exposures was computed using the cosine distance (lower values indicate better accuracy). The mean cosine distance per condition is shown.

Simulated genomes with higher mutation burden yielded more accurate signature exposures across all conditions and methods, as demonstrated by decreased cosine distance (Figure 3.2B), which is defined as 1 − cosine similarity. However, accuracy declined in more complex genomes with larger numbers of active signatures. SignIT was either equally accurate or more accurate than other methods in all settings with the exception of low-complexity, low-mutation genomes. SignIT was superior in genomes with many active processes and was also substantially more robust to perturbation of the underlying reference signatures.

Error rates were quantified per signature to identify over- and underestimated signature exposures. Both deconstructSigs and SignatureEstimation frequently underestimated signatures, which may result in the loss of actionable information (Appendix Figure A.1). Particularly difficult to resolve were the signatures most similar to other signatures, which are therefore most likely to exhibit signature bleed, especially signature 5. Conversely, absent signatures are frequently overestimated by all methods (Appendix Figure A.2). However, where SignIT inflated exposures, it did so with a lower relative error.
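The accuracy metric used in these simulations, cosine distance (1 − cosine similarity), can be sketched as follows. The exposure vectors are invented for illustration and the function names are hypothetical; they are not part of SignIT or msimR.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two exposure vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """1 - cosine similarity; lower values mean the estimate is closer."""
    return 1.0 - cosine_similarity(u, v)


# Hypothetical true and estimated exposures for three signatures, where the
# estimate has leaked 10 mutations from the first signature into the second.
true_exposures = [100.0, 0.0, 50.0]
estimated_exposures = [90.0, 10.0, 50.0]
distance = cosine_distance(true_exposures, estimated_exposures)
```

Because the metric depends only on the direction of the exposure vector, it compares the relative mix of signatures rather than the total mutation count.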
This robustness against dramatic over- or underestimation of signature exposures is necessary for confident clinical interpretation.

In most settings, SignIT takes longer to run than other methods, but scales to realistic mutation burdens with practical runtimes (Figure 3.3). Using default settings (8 chains in parallel with 200 burn-in iterations and 200 sampling iterations each), SignIT ran in 20 seconds on tumours with 100 mutations and 154 seconds on tumours with 1,000,000 mutations.

[Figure 3.3: run time (sec) versus number of mutations (log scale) for deconstructSigs, SignatureEstimation_SA, and SignIT, faceted by number of active signatures (1, 5, 12, 20) and reference signature noise (0%, 10%, 20%, 40%, 80%).]

Figure 3.3: Runtimes of n-of-1 mutation signature decomposition tools. Simulated mutation catalogs were generated under various conditions and their exposures were re-estimated using deconstructSigs, SignatureEstimation, and SignIT. The number of mutations varied from 10 to 1,000,000, the number of active signatures varied from 1 to 20, and random perturbation of reference signatures varied from 0 to 80 percent. Runtimes were captured across 500 trials under each set of conditions. Mean runtimes are presented here in seconds.

3.2.3 SignIT Better Reproduces Signatures in Cancer Data

To evaluate SignIT on real cancer genome mutation data, we analyzed whole exomes from nine cohorts of TCGA. Mutation signatures in each cohort were deciphered by NMF (Alexandrov et al., 2013b), as well as by SignIT, deconstructSigs, and SignatureEstimation. NMF signatures were compared against the full COSMIC 30-signature reference set to determine the best match for each de novo signature. To best emulate a clinical sequencing scenario, n-of-1 signature analysis was rendered entirely blind to NMF results. Exposures were computed against the entire COSMIC 30-signature reference set. For each COSMIC signature matched by NMF, exposures were compared to those of each n-of-1 method by Spearman correlation, which was chosen because of its robustness to outlier (hypermutating) signatures.

SignIT exposures demonstrated greater concordance with de novo NMF methods across all signatures and cohorts than deconstructSigs or SignatureEstimation (Figure 3.4A). While the methods were comparable for hypermutating signatures such as Signatures 2, 4, 7, and 13, SignIT substantially improved concordance with NMF for lower-exposure signatures (Figure 3.4B).

3.2.4 SignIT Infers the Temporal Evolution of Signatures

Returning to patient P10, we next undertake the temporal dissection of signatures across tumour subpopulations (Figure 3.5A). SignIT identified two mutational subpopulations with prevalences of 1.0 and 0.31 accounting for 80.5% and 19.5% of total mutation burden respectively.
The two populations display different mutational profiles, with the more prevalent (earlier) population being enriched for Signature 3 and the less prevalent (later) population for Signatures 16, 17, and 30. Dividing mutations into early and late sets using BTP (McGranahan et al., 2015) agreed closely with results from SignIT (Figure 3.5B).

[Figure 3.4 plot: (A) Spearman correlation by signature and method (deconstructSigs, SignatureEstimation, SignIT) for each cohort (BLCA, BRCA, CESC, COAD, LUAD, LUSC, SKCM, STAD, UCEC); (B) Spearman correlation versus median exposure (number of mutations) for each method]

Figure 3.4: Comparison of NMF and n-of-1 methods across nine cancer exome cohorts. (A) Mutation signatures were deciphered de novo in 9 cohorts from The Cancer Genome Atlas (TCGA) and matched to the most similar corresponding reference signature. N of 1 mutation signature exposures were estimated using three methods and compared against de novo signature exposures using the Spearman correlation coefficient.
(B) SignIT demonstrated improved accuracy, providing significant improvement in resolving signal from less mutagenic signatures.

Early-arising, population 1 signatures were concordant with both primary (cosine similarity = 0.988; Figure 3.5C) and metastatic (cosine similarity = 0.976; Figure 3.1A) exposure profiles, both of which showed elevated signatures 3 and 8. Population 2 may therefore provide insights into mutational processes later in metastasis.

3.2.5 Metastatic Tumours Demonstrate Divergence of Mutational Processes

To assess the temporal dissection of mutation signatures, we performed WGS of 24 metastatic tumours with paired sequencing of primaries. Twenty of those primaries were sequenced from formalin-fixed and paraffin-embedded (FFPE) tissue, and 4 from OCT tissue. The intersection of primary and metastatic SNVs was used to derive signatures of the primary tumour. The rationale for using the intersection is to focus on mutations present in the primary which persisted in the metastasis and to filter out false positives introduced by FFPE. Subpopulation-specific signature exposures computed by SignIT were compared to primary tumour signatures. The divergence away from the primary was quantified for the signatures of each subpopulation in the metastatic sample using the cosine distance.

As expected, all cases demonstrated mutation signature divergence from the primary tumour in the least prevalent (latest) subpopulation (Figure 3.6). In most cases, prevalent early subpopulations were similar to the primary tumour (cosine distance < 0.2), even when signatures from the bulk metastatic tumour differed from the primary. This suggests that signature timing can reveal the early mutational processes of tumorigenesis using a later metastatic sample.

Early mutations derived from the BTP method similarly matched primary tumour signatures except in three cases (P07, P08, and P15).
All three were characterized by highly mutagenic late (lower-prevalence) subpopulations (Figure 3.6). Examining P07 as an example (Figure 3.7), SignIT early (population 1) mutation signatures were a closer prediction of primary tumour signatures than the early mutations from BTP, which overestimated APOBEC-related signatures 2 and 13. Whereas SignIT found that 91% of mutations originated from the lower-prevalence, late subpopulation, BTP identified 80% of mutations as late-arising. This suggests that BTP may have suffered from contamination of the early mutation pool with late-arising APOBEC-associated mutations.

[Figure 3.5 plot: (A) exposure fraction by signature for Population 1 and Population 2, with population prevalence and proportion; (B) exposure fraction by signature for early and late mutations; (C) exposure fraction by signature]

Figure 3.5: Subpopulation-specific mutation signatures in a somatic cancer whole genome. (A) SignIT was used to infer subpopulation-specific mutation signatures in a breast cancer. This revealed two temporally distinct subpopulations with mutational prevalences of 1.0 and 0.31 giving rise to 80.5% and 19.5% of mutations respectively. (B) Temporally dissected mutation signatures were similar to those deciphered by binary temporal partitioning. (C) Signatures deciphered from mutations shared with the sequenced primary tumour also agreed with early subpopulation mutation signatures.

[Figure 3.6 plot: one panel per biopsy (P01 biop1 through P24 biop2); x-axis: Prevalence; y-axis: Cosine Distance from Primary; markers: Whole Metastasis and Populations 1 to 5, sized by Population Proportion]

Figure 3.6: Mutation signatures in serially sequenced metastatic tumours demonstrate time-dependent divergence from the primary. To validate SignIT and explore signature evolution in metastatic tumours, SignIT was used to decipher population-specific signatures in metastatic tumours with whole genome sequencing of paired primaries. Cosine distance was used to determine the similarity of mutation signature exposures in each subpopulation to those of mutations shared with the primary tumour. More prevalent populations typically demonstrated similarity with the primary, even when bulk metastasis signatures differed greatly. Signatures in lower-prevalence populations diverged over time.

SignIT also improves temporal dissection when tumours harbour subpopulations with prevalence values near 1.0. For example, P05 had a subpopulation with a relatively high prevalence of 0.73. This population demonstrated a dramatic drop in signature 1 and rise in signature 8, which was not resolved as clearly by BTP (Figure 3.8). SignIT mitigates these limitations of BTP by fitting the cancer's subpopulation structure.

3.3 Discussion

Mutation signatures and genomic instability are an emerging part of the ever-growing scientific literature focussed on clinically actionable cancer biomarkers. SignIT constitutes a substantial advance in mutation signature analysis and interpretation in individual tumours.
Along with providing novel insights into mutation signature bleed and tumour subpopulation structure, our findings demonstrate SignIT's accuracy and its robustness against model complexity and noise. SignIT's inference of subpopulation-specific signatures improves upon previous approaches because it directly models the underlying clonal structure. Analysis of tumours sequenced at multiple time points revealed frequent divergence of mutation signatures, highlighting the need to track the evolution of mutagenic processes in metastasis.

[Figure 3.7 plot: (A) exposure fraction by signature for Population 1 and Population 2, with population prevalence and proportion; (B) exposure fraction by signature for early and late mutations; (C) exposure fraction by signature]

Figure 3.7: SignIT improves upon binary temporal partitioning (BTP) by modeling the tumour subpopulation structure. When the tumour subpopulation structure disagrees with the strict assumptions of BTP, temporally dissected mutation signatures can be inaccurate. (A) In this temporal analysis of a lung adenocarcinoma, the lower prevalence population 2 was highly mutagenic. (B) This feature may have resulted in contamination of the smaller "early" mutation pool, resulting in poor temporal separation. (C) SignIT's early signatures better match those of the archival sample.

[Figure 3.8 plot: (A) exposure fraction by signature for Population 1 and Population 2, with population prevalence and proportion; (B) exposure fraction by signature for early and late mutations; (C) exposure fraction by signature]

Figure 3.8: A colorectal cancer demonstrates errors in binary partitioning resulting from unusually high mutational prevalence of population 2. (B) This resulted in erroneous estimates of signatures 1 and 8. (A) By contrast, SignIT jointly models the population structure and signatures, enabling clearer delineation of signatures between subpopulations. (C) SignIT's results more accurately reproduce mutation signatures from the matched archival sample.

SignIT offers the ability to resolve the mutational history of individual tumours at greater resolution than previously possible. The temporal dissection of mutation signatures can inform many questions of biological interest. For example, understanding the earliest mutational processes in cancer may help inform tumour prevention and early detection. Temporally resolving mutation signatures could also improve the understanding of mechanisms underlying known and novel signatures.
Lastly, signature timing can isolate mutagenic processes characteristic of key disease phases such as metastasis.

Successful mutation signature analysis necessitates some technical trade-offs. For instance, SignIT infers mutational prevalence as a surrogate for mutation timing without deconvolving the influences of variant copy number and cellular prevalence. It is technically challenging to determine both these factors independently without targeted deep sequencing, which is frequently used to estimate clonal composition (Roth et al., 2014). However, the analysis of mutation signatures requires the broad capture of large numbers of variants, which is most economically achieved by lower-depth WGS or WES. A potential compromise, which SignIT supports, involves using targeted deep sequencing to predetermine fixed prevalence parameters in SignIT inference.

Fulfilling the promise of cancer precision medicine will require rapid integration of orthogonal genomic biomarkers into research and clinical practice. SignIT provides a novel, easy to use, and methodologically rigorous approach to mutation signature analysis to improve interpretation of n-of-1 signature analysis. It also enables temporal dissection of mutation signatures, which is applicable to emerging biological and clinical questions in cancer genomics.

3.4 Methods

SignIT is available from github.com/eyzhao/SignIT. Version v1.0.1 was used for all analyses described in this thesis.

3.4.1 The SignIT Generative Model

Bayesian inference involves the definition of a generative model, then learning the parameters of that model which provide the best fit to data. Upon convergence, Hamiltonian Monte Carlo (HMC) enables probabilistically proportionate sampling from the complete posterior distribution over the parameters.
Here, we describe the generative model used in SignIT to perform Bayesian inference over signature exposures.

Mathematics of De Novo Mutation Signature Analysis

The early work on mutation signatures aimed to identify recurrent patterns of somatic mutation which could explain the mutational processes frequently observed in cancer. These approaches utilize NMF or similar dimensionality reduction methods to reduce a mutation count vector denoting the frequencies of a set of mutation types into a smaller set of signatures. We refer to these approaches, in aggregate, as de novo mutation signature analysis, because they identify novel mutation signatures using unsupervised learning methods.

First, let $V$ be the number of mutation classes used to parametrize $N$ mutations arising from $K$ mutation signatures in $G$ genomes. Let $M$ be a $V \times G$ mutation counts matrix, $S$ a $V \times K$ mutation signature matrix, and $E$ a $K \times G$ exposures matrix. In the mutation signature model, $M = SE$. Determining an optimal solution for $S$ and $E$ given a mutation count matrix $M$ can be performed by NMF.

N of 1 Mutation Signature Analysis

In the n-of-1 case, let $G = 1$, yielding $c = Se$, where $c \in \mathbb{N}^{V}$ is a $V$-dimensional non-negative integer mutation count vector and $e \in \mathbb{R}_{+}^{K}$ is a $K$-dimensional non-negative exposures vector. In the common parametrization of SNVs based on base change and 3'/5' context, $V = 96$. Let $S$ (with $0 < S_{ij} < 1 \;\forall\, i = 1, \ldots, V;\ j = 1, \ldots, K$) be a $V \times K$ matrix of known signatures, where $K$ is the number of known reference signatures.

At the time of writing, the most commonly used reference signature set for SNVs is a 30-signature matrix available from COSMIC at cancer.sanger.ac.uk/cosmic/signatures. However, any set of reference mutation signatures may be used. The mutational spectrum of a tumour is modeled as a linear combination of contributing signatures, with coefficients $e$; $e_1, e_2, \ldots, e_K \geq 0$.

Given known values for $c$ and $S$, the goal is to determine the best fit value of $e$.
The most common cost function is the sum of squared errors (SSE), such that $e$ is chosen to minimize $\lVert Se - c \rVert^{2}$, where $c$ is the observed mutation count vector. Given that $e$ must be non-negative in all dimensions (there cannot be a negative number of mutations), this problem is known as non-negative least squares (NNLS). NNLS is well-studied and readily implemented using quadratic programming (QP), which rapidly converges to an optimal solution.

Limitations of NNLS

There are three sources of error in the QP solution to NNLS which limit its accuracy and utility to clinical cancer sequencing. The first is sampling error in mutagenesis. The accrual of mutations can be viewed as a random process where, for the $i$th mutation, the responsible mutational process $z_i$ is drawn from the categorical distribution, $z_i \sim \mathrm{Categorical}(e)$, and the mutational class $x_i$ is subsequently drawn from $x_i \sim \mathrm{Categorical}(S_{:,z_i})$. This results in a categorical noise profile, which QP is likely to overfit, especially for small mutation counts. SignIT, on the other hand, models the data-generating process underlying mutation counts (known as a categorical mixture model) and can therefore account for noise rather than overfitting to it. This allows SignIT to yield estimates of solution uncertainty, rather than point estimates.

The second source of error arises due to multicollinearity of the reference signature matrix. Signatures in the reference set often exhibit correlation with each other, due to similarities in their mutation profiles. This can result in spurious mutation signature elevation when similar signatures are present, a phenomenon known as "signature bleed". Multicollinearity poses an inherent mathematical limitation on the ability to call mutation signatures with certainty. SignIT addresses signature bleed by mapping mutual exclusivities in signature activation. In MCMC, posterior sampling density eventually converges on the true posterior distribution; any two signatures which bleed with one another will have anticorrelated MCMC samples.
This reflects the fact that a fixed portion of mutations can be explained by various linear combinations of the two similar signatures.

The last source of error arises from the reference signature matrix, $S$. The "true" signatures giving rise to a cancer genome may differ slightly from those catalogued in $S$, resulting in the mis-estimation of exposures. This signature bias is more difficult to account for without prior knowledge of uncertainties in the reference signature matrix. However, N of 1 signature decomposition methods can be tested for robustness against reference signature bias by observing the accuracy and stability of results when the reference signatures are perturbed with a pre-determined noise profile.

Bayesian N of 1 Signature Analysis

SignIT models the acquisition of mutations as a categorical mixture of $K$ mutational processes, where $K$ is the number of reference mutation signatures. The reference signature matrix, $S$, has dimensions $V \times K$, where $V$ is the number of mutation classes (most commonly $V = 96$) and variant classes are defined by the base change and 3'/5' mutation context (Alexandrov et al., 2013b). Every column of $S$ is a probabilistic simplex denoting the probability distribution of mutation classes associated with a single reference signature. Let $e$ be a $K$-dimensional simplex denoting signature proportions, also known as exposures. A signature's exposure is the probability of a random mutation occurring as a result of that signature. Note the analogy to a topic model, where signatures represent topics, mutation classes constitute the vocabulary, and mutations serve as words. To generate a dataset of $N$ mutations, first select each mutation's signature, $u_i$, by drawing from a categorical distribution $u_i \sim \mathrm{Categorical}(e)$. Next, determine the mutation class by drawing from the column of $S$ belonging to the selected signature, $v_i \sim \mathrm{Categorical}(S_{:,u_i})$. Repeating this process for $i = 1, 2, \ldots, N$ yields a set of $N$ mutations, each belonging to a specific class.
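The two-step generative process described above can be sketched in a few lines. This is an illustrative Python/NumPy sketch only, not SignIT's implementation (SignIT is written in R and Stan); the function name `simulate_mutations` and the toy 4-class, 2-signature matrix are assumptions made for the example.

```python
import numpy as np

def simulate_mutations(S, e, n_mutations, rng):
    """Draw mutation counts from the categorical mixture (hypothetical helper).

    S: (V, K) array whose columns are signature probability vectors.
    e: length-K exposure simplex.
    Returns a length-V integer vector of mutation class counts.
    """
    V, K = S.shape
    # u_i ~ Categorical(e): the signature responsible for each mutation
    u = rng.choice(K, size=n_mutations, p=e)
    counts = np.zeros(V, dtype=int)
    for k in range(K):
        n_k = int(np.sum(u == k))
        if n_k > 0:
            # v_i ~ Categorical(S[:, k]) for the n_k mutations assigned to
            # signature k; counting categorical draws is a multinomial draw
            counts += rng.multinomial(n_k, S[:, k])
    return counts

# Toy example (values are made up): 4 mutation classes, 2 signatures
S = np.array([[0.7, 0.1],
              [0.1, 0.2],
              [0.1, 0.3],
              [0.1, 0.4]])
e = np.array([0.25, 0.75])
rng = np.random.default_rng(0)
counts = simulate_mutations(S, e, 1000, rng)

# The same distribution over counts can be sampled in one step,
# because the marginal class probabilities are the product S e
class_probs = S @ e
counts_direct = rng.multinomial(1000, class_probs)
```

Both `counts` and `counts_direct` are length-4 vectors summing to 1000; they differ only by sampling noise.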
SignIT vectorizes this by encoding mutation count vectors rather than individual mutations, which provides significantly better performance.

Vectorization

For efficiency, SignIT's implementation vectorizes the categorical mixture model presented in the main text, yielding an equivalent but simpler generative model. We begin by recognizing that the product $Se$ yields a $V$-dimensional probability simplex denoting the probability of observing each mutation class. We next define the $V$-dimensional mutation count vector $c$, where $c_i$ indicates the number of mutations belonging to the $i$th mutation class. In this scheme, the previously presented categorical model can be equivalently presented as $c \sim \mathrm{Multinomial}(n = N, p = Se)$. This vectorized parametrization yields significantly faster gradient calculation, sampling, and likelihood calculation.

The Temporal Subpopulation Model

SignIT includes a mathematically consistent extension of the aforementioned mixture model to jointly infer signature exposures and temporally separated tumour subpopulations. The complete hierarchical model (Figure 3.9) generates both the VAF and mutation class of each variant using coinciding beta-binomial and categorical finite mixtures respectively. The categorical mixture component is identical to the previously described model, except that exposures are replaced by mixing components with $K \cdot L$ elements, where $L$ is the number of subpopulations being modelled. Subpopulations are distinguished based on prevalence, $\mu' = Fm$, defined as the product of clonal cellular prevalence, $F$, and the number of allelic copies carrying the mutation, $m$. Higher prevalence subpopulations are associated with earlier-arising mutations.

We assume the presence of $L$ latent temporally distinct subpopulations which give rise to varying mutant allele counts. To account for overdispersion, which is commonly observed amongst NGS reads, we model variant allele counts using a beta-binomial finite mixture model.
Note that the beta-binomial distribution is parameterized with mean ($\mu$) and concentration ($\kappa$), which relate to the shape parameters $\alpha$ and $\beta$ by the relations $\mu = \dfrac{\alpha}{\alpha + \beta}$ and $\kappa = \alpha + \beta$.

[Figure 3.9 plate diagram: nodes $\theta$, $\phi$, $x_n$, $u_n$, $y_n$, $v_n$, $z_n$, $a_n$, $d_n$, $s_k$, $\kappa$, $\mu_l$, $\sigma$, $\rho$, $\alpha$, $\beta$; plates over $L$, $N$, and $K$]

Hyperparameters:
  $\theta = (1, 1, \ldots, 1)$ ($K \times L$ elements)
  $\sigma, \rho = 0.01$
  $\alpha, \beta = 1$
  $s_k$ = mutation type probabilities
  $d_n$ = read depths
Correction factor:
  $a_n = \dfrac{T}{C^{(T)}_n T + C^{(N)}_n (1 - T)}$
Priors:
  $\phi \sim \mathrm{Dirichlet}(\theta)$
  $x_n \sim \mathrm{Categorical}(\phi)$
  $\kappa - 2 \sim \mathrm{Gamma}(\sigma, \rho)$
  $\mu_l \sim \mathrm{Beta}(\alpha, \beta)$
Indices:
  $u_n = ((x_n - 1) \bmod K) + 1$
  $y_n = \lceil x_n / K \rceil$
Posteriors:
  $z_n \sim \mathrm{BetaBinom}(n = d_n,\ \alpha = \kappa a_n \mu_{y_n},\ \beta = \kappa (1 - a_n \mu_{y_n}))$
  $v_n \sim \mathrm{Categorical}(s_{u_n})$

Figure 3.9: Complete SignIT joint population-signature model. For each mutation, mutation type ($v_n$) and variant read depth ($z_n$) are jointly drawn based on the selected signature ($u_n$) and population ($y_n$) respectively. These responsibility terms are deterministically mapped from $x_n$, which is chosen from the mixing probabilities $\phi$. The signature probabilities ($s_k$) are user-determined. The beta-binomial population prevalences ($\mu_l$) and shared concentration term ($\kappa$) determine read depths. The correction factor ($a_n$) is computed from total read depth ($d_n$), tumour copy number ($C^{(T)}$), normal copy number ($C^{(N)}$), and tumour content ($T$).

The SignIT population model assumes that the probability of sampling a variant allele at a mutated locus is

$\dfrac{\mu T}{C^{(T)}_n T + C^{(N)}_n (1 - T)}$,

where $C^{(T)}$ is the positive integer tumour copy number, $C^{(N)}$ is the normal copy number, and $T$ is the tumour content ($0 < T < 1$). We define $\mu$ as a joint "mutational prevalence," which is the arithmetic product of cellular prevalence (the fraction of cancer cells possessing a mutation at the locus) and mutation copy number (the number of copies per cell which carry the mutant allele).

Let $\psi$ be a probability simplex with $L$ elements. For the $n$th mutation, draw the subpopulation index $y_n \sim \mathrm{Categorical}(\psi)$. This index selects the subpopulation prevalence $\mu_{y_n}$.
Letting $\kappa$ be the concentration parameter which determines the degree of beta-binomial overdispersion, we draw the variant allele depth

$z_n \sim \mathrm{BetaBinom}(n = d_n,\ \alpha = \kappa a_n \mu_{y_n},\ \beta = \kappa (1 - a_n \mu_{y_n}))$,

where $a_n = \dfrac{T}{C^{(T)}_n T + C^{(N)}_n (1 - T)}$.

This model makes a few assumptions. First, the "infinite sites" assumption that no locus undergoes multiple independent somatic mutations. Second, tumour copy number states are assumed to be clonal across the genome. Third, genomic read counts are overdispersed for all sites with a common concentration coefficient, $\kappa$.

The Complete Subpopulation Signature Model

The complete SignIT model unites the signature and population models (Figure 3.9). To facilitate this, each population-signature combination requires its own model coefficient. To replace $\psi$ from the population model, let $\phi$ be a simplex of length $K \times L$. The mixing index drawn from $\phi$ is $x_n \sim \mathrm{Categorical}(\phi)$, and simultaneously encodes a population index $y_n$ and a signature index $u_n$. The $u$th signature and $y$th population correspond to position $x = K(y - 1) + u$ in $\phi$, where $K$ is the number of signatures. The deterministic inverse mappings are $u_n = ((x_n - 1) \bmod K) + 1$ and $y_n = \lceil x_n / K \rceil$, where $\bmod$ is the modulo (remainder) operator and $\lceil \cdot \rceil$ is the ceiling (upwards rounding) operator.

Upon assignment of $u_n$ and $y_n$, the remainder of the generative model proceeds as previously described. $u_n$ selects the mutation signature, which determines the probability vector, $s_{u_n}$, across mutation types. The mutation type is drawn from $v_n \sim \mathrm{Categorical}(s_{u_n})$. $y_n$ selects the population mean $\mu_{y_n}$, and the variant allele depth is drawn from a beta-binomial distribution.

This combination of signature and population models defines a joint categorical-beta-binomial mixture model, which allows likelihoods to be computed simultaneously across both mutation types and variant allele counts. In other words, this generative model simultaneously samples variant allele depth and mutation type at each mutated locus.
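The mapping between the combined mixing index and the (signature, population) pair can be checked with a short sketch. This is a hypothetical Python illustration of the index arithmetic stated above, using the text's 1-based indices; the function names are assumptions and are not part of SignIT.

```python
def to_mixture_index(u, y, K):
    """Position x in phi for signature u and population y (all 1-based)."""
    return K * (y - 1) + u

def from_mixture_index(x, K):
    """Recover (u, y) from the 1-based mixing index x."""
    u = ((x - 1) % K) + 1
    y = (x + K - 1) // K  # integer form of ceil(x / K)
    return u, y

# Round trip over every combination for K = 30 signatures, L = 3 populations
K, L = 30, 3
pairs = [(u, y) for y in range(1, L + 1) for u in range(1, K + 1)]
indices = [to_mixture_index(u, y, K) for u, y in pairs]
recovered = [from_mixture_index(x, K) for x in indices]
```

Each of the K × L positions in the mixing simplex is used exactly once, and the inverse mapping returns the original pair.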
Using Bayesian inference methods, we can then estimate posterior distributions over the parameters $\phi$, $\mu$, and $\kappa$ based on the provided genomic data.

Avoiding Degeneracies in the Beta-Binomial Mixture Model

Finite mixture models with degenerate components often give rise to multimodal posterior distributions. This is due to non-identifiability of the mixture components (Betancourt, 2017). To remedy this, we enforced ordering over the subpopulation prevalences $\mu$. Moreover, in order to sample $\mu$ from $\mathrm{Beta}(1, 1)$ while imposing natural limitations on its allowable interval, we sampled $\mu'_l \sim \mathrm{Logistic}(0, 1)$ and applied the inverse logit transform $\mu_l = \dfrac{\exp(\mu'_l)}{1 + \exp(\mu'_l)}$.

Implementing Bayesian Inference

The SignIT hierarchical models are encoded using the Stan (2.17.0) probabilistic programming language (Carpenter et al., 2017). Stan is cross-platform and provides robust Bayesian samplers with a unified modeling language, as well as visual diagnostics for chain convergence via ShinyStan (Stan Development Team, 2017). Posteriors can be sampled by HMC for basic mutation signatures and by either HMC or automatic differentiation variational inference (ADVI) for population-specific signatures. SignIT was implemented and tested in R (version 3.4.1) using the RStan package. Analyses included in this manuscript were run on an x86_64 CentOS6 Linux cluster.

The SignIT signature model posteriors are sampled by Markov-chain Monte Carlo (MCMC), by default employing four chains each traversing 200 burn-in iterations and 200 sampling iterations with no thinning. The complete SignIT population-signature model by default employs ten chains with 200 burn-in iterations and 300 sampling iterations each. These parameters can be tuned; however, during testing these values yielded consistent convergence with sufficient sampling density and effective sample sizes over the posterior with acceptable autocorrelation.
The output contains summary statistics along with an attached Stan model output, which can be easily run through diagnostics in ShinyStan (Stan Development Team, 2017).

Because the SignIT population-signature model can be slow to sample by MCMC, especially in hypermutated cases, we also enable estimation of posteriors by variational inference. SignIT leverages Stan's ADVI module, which automatically selects a variational family and performs optimization to minimize the Kullback-Leibler divergence. Upon convergence, 1000 iterations are drawn from the posterior distribution via importance sampling. SignIT treats these 1000 iterations equivalently to MCMC iterations in subsequent analyses.

The relevant versions of all dependencies installed for this analysis are part of an Anaconda virtual environment which can be installed and executed on a Unix-based terminal using the following commands.

    git clone https://github.com/eyzhao/bio-pipeline-dependencies.git
    cd bio-pipeline-dependencies
    git checkout tags/SignIT-paper-dependencies
    make
    source miniconda3/bin/activate dependencies

Selecting the Number of Subpopulations

To select the number of subpopulations, we recommend performing inference over models ranging from 1 to at least 5 subpopulations and computing the Watanabe-Akaike information criterion (WAIC) on each. SignIT provides a function for automatically computing the WAIC on sampler output. The model which minimizes WAIC should be preferred.

If there is insufficient time or computational resources to attempt a range of models, SignIT can also provide a rough estimate of the optimal population count. Maximum a posteriori parameter estimates are obtained using the population model (Figure 3.9), excluding the portion for inferring signatures, for models ranging from 1 to 5 subpopulations.
Bayesian information criteria (BIC) are computed for each model's parameter estimates and the model with minimum BIC is chosen.

3.4.2 Simulated Genomes

To evaluate mutation signature decomposition accuracy against known "true" exposures, we devised a mutation signature simulation package called msimR, available at github.com/eyzhao/msimR. Somatic mutation count vectors were simulated by drawing mutations of 96 distinct SNV classes from a multinomial distribution, $v \sim \mathrm{Multinomial}(N, \hat{S}e)$, where $e$ represents a theoretical set of "true" exposures (Figure 3.2A). Active mutation signatures were randomly selected and all non-contributing signatures had their exposures set to zero. For each simulated mutation set, the exposures of contributing signatures were drawn at random from a uniform distribution $\mathrm{Uniform}(0, 1)$ and the resulting $e$ vector was normalized to sum to 1.

Aside from varying signature exposures, an additional source of variability may arise from differences between the reference signatures and the "true" biological mutational processes driving a tumour. It is unlikely that any static set of reference signatures can accurately reflect all possible mutational processes. Therefore, to simulate inaccuracies in the underlying reference signature set, a reference signature perturbation was performed by introducing Gaussian noise into $S$. Given a perturbation factor $p$ between 0 and 100 and "true" reference signature matrix $S$, a perturbed signature matrix $\hat{S}$ was randomly computed where $\hat{S}_{ij} \sim \mathrm{Normal}(\mu = S_{ij},\ \sigma = \tfrac{p}{100} S_{ij})$; $i = 1, 2, \ldots, V$; $j = 1, 2, \ldots, K$, with the restriction that $\hat{S}_{ij} \geq 0$.

In addition to SignIT, we implemented deconstructSigs v1.8.0 (Rosenthal et al., 2016) and SignatureEstimation v1.0.0 (Huang et al., 2017). All mutation signatures were deciphered "blindly," using only the mutation count vector $v$ and the complete consensus (non-perturbed) matrix of reference signatures $S$ (not $\hat{S}$).
This scenario best reflects the real-world problem where only the observed mutations and reference signature matrix are known (see Figure 3.2A).

Simulated count vectors were generated with combinations of three parameters. (1) The number of mutations, $N$, was varied from 10 to $10^{6}$; (2) the number of contributing signatures was varied from 1 to 20; and (3) the amount of reference signature perturbation was varied from 0% to 80%. For each combination of parameters, 500 random mutation count vectors were generated for a total of 90,000. Each count vector was decomposed into exposures using each of the three methods. Deviation between calculated exposures $\hat{e}$ and true exposures $e$ was reported using the cosine distance ($1 - \dfrac{e \cdot \hat{e}}{\lVert e \rVert \lVert \hat{e} \rVert}$, Figure 3.2B).

3.4.3 Publicly Available Cancer Mutation Data

2,748,760 cancer somatic SNVs from 4,563 exomes called by four mutation callers (MuSE, MuTect2, VarScan2, and SomaticSniper) via a harmonized pipeline were obtained as mutation annotation format (MAF) files from TCGA using the Genomic Data Commons (GDC) portal (gdc.cancer.gov). This study included data from 9 TCGA cancer cohorts; BLCA, BRCA, CESC, COAD, LUAD, LUSC, SKCM, STAD, and UCEC were chosen because they have the most cumulative somatic mutations and are thus more likely to yield reproducible, high quality mutation signatures by NMF. 459,552 SNVs called by only one of four callers were filtered out, leaving a total of 2,289,208 somatic SNVs (Table 3.1). Mutation signatures were deciphered de novo using NMF from each cohort using the WTSI framework (Alexandrov et al., 2013b), then the best-matching reference signature was chosen based on cosine similarity for comparison with n-of-1 methods. Where multiple de novo signatures best matched one reference signature, only the top match was chosen for comparison.
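The matching step described above (best reference by cosine similarity, keeping only the top de novo match per reference signature) can be sketched as follows. This is a minimal Python/NumPy illustration under assumed names and toy data; it is not the code used for the analysis, and the matrices shown are made up.

```python
import numpy as np

def best_reference_matches(de_novo, reference):
    """Match each de novo signature (column) to its most similar reference.

    de_novo: (V, D) array; reference: (V, K) array.
    Returns (best_index, best_similarity), each of length D.
    """
    a = de_novo / np.linalg.norm(de_novo, axis=0, keepdims=True)
    b = reference / np.linalg.norm(reference, axis=0, keepdims=True)
    similarity = a.T @ b  # (D, K) matrix of cosine similarities
    best = similarity.argmax(axis=1)
    return best, similarity[np.arange(similarity.shape[0]), best]

def top_match_per_reference(best, best_similarity):
    """Keep only the most similar de novo signature for each reference."""
    kept = {}
    for d, (k, s) in enumerate(zip(best, best_similarity)):
        k, s = int(k), float(s)
        if k not in kept or s > kept[k][1]:
            kept[k] = (d, s)
    return {k: d for k, (d, _) in kept.items()}

# Toy data: three reference signatures and three de novo signatures,
# two of which resemble reference 0
reference = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0],
                      [0.0, 0.0, 0.0]])
de_novo = np.array([[0.9, 0.8, 0.0],
                    [0.1, 0.2, 0.1],
                    [0.0, 0.0, 0.9],
                    [0.0, 0.0, 0.0]])
best, best_similarity = best_reference_matches(de_novo, reference)
matches = top_match_per_reference(best, best_similarity)
```

In the toy data, de novo signatures 0 and 1 both match reference 0, and only signature 0 (the closer of the two) is retained for comparison.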
N-of-1 mutation signatures were fitted using SignIT, deconstructSigs, and SignatureEstimation using the complete set of 30 SNV mutation signatures as a reference matrix, and blinded to the NMF analysis. Comparison between NMF and each n-of-1 method was performed by Spearman correlation of sample exposures between matching signatures (Figure 3.4).

Table 3.1: The numbers of samples and variants in each TCGA cohort analyzed.

Cohort  Sample Count  Total SNVs  Excluded SNVs  SNVs for NMF
BLCA    411           145,980     22,681         123,299
BRCA    983           130,254     34,701         95,553
CESC    289           112,735     18,573         94,162
COAD    399           287,035     66,401         220,634
LUAD    562           224,075     39,886         184,189
LUSC    491           195,436     28,051         167,385
SKCM    466           431,179     35,731         395,448
STAD    433           225,913     57,429         168,484
UCEC    529           996,153     156,099        840,054

3.4.4 De Novo Signature Analysis

SNVs were categorized based on 6 variant types and 16 trinucleotide context subtypes to yield a total of 96 mutation classes. Within each cohort, mutation signatures were deciphered using a published framework (Alexandrov et al., 2013b), which employs NMF to infer both the operative signatures prevalent across samples and the relative exposure of each sample to each signature. Signature stability estimates were obtained by Monte Carlo simulation with 1000 iterations (10 iterations over 100 cores). In each cohort, signature models involving 2 to 8 signatures were attempted and the solution which maximized signature stability and minimized reconstruction error was selected (Appendix Figure A.3). The similarity of signatures to thirty previously described mutational signatures (available from cancer.sanger.ac.uk/cosmic/signatures) was quantified using the cosine similarity metric and the most similar corresponding signature was selected in each case.

3.4.5 Structural Variant Mutation Signatures

For the SV signatures in Figure 3.1C, SVs were categorized as per Nik-Zainal et al.
(2016) based on the mutation type (deletion, duplication, inversion, or translocation), the length of SV (except for translocations), and whether the SV breakpoints were clustered. Clustered breakpoints were in segments with breakpoint density at least 10 times greater than average, and segments were determined using a piecewise linear fitting with smoothness parameter $\gamma = 25$ and minimum breakpoints per segment $k_{min} = 10$, implemented using the copynumber package (v1.18.0) in R. This yielded a 32-class parameterization. SignIT analysis was performed on the resulting mutation count vector using six previously published SV signatures (Nik-Zainal et al., 2016) as the reference matrix.

3.4.6 Whole Genome Sequencing of Metastatic Cancers

Study participants with advanced stage cancers underwent tumour biopsies as part of the POG Project (Laskin et al., 2015). The study was approved by the University of British Columbia Research Ethics Board (REB# H12-00137 and H14-00681). Written informed consent, including potential publication of findings, was obtained from patients prior to genomic profiling. Patient information was anonymized, and each patient was assigned an alphanumeric identification code.

[Figure 3.10 plot labels: patients P01 to P24 against relative biopsy time (years, -15 to 0); legends for occurrence (primary, metastatic) and sample preparation.]

Figure 3.10: The time of sample collection for multiply sequenced tumours. Whole genome analysis of multiply sequenced tumours was performed in 24 patients. Timing of primary tumour sample collection relative to the metastatic biopsy ranged from -15 to +1 years.

24 patients underwent whole genome sequencing of multiple temporally and/or spatially distinct tumours (Appendix Table A.3).
Of these, the primary tumour was biopsied before the metastatic tumour in all except one case (Figure 3.10). Whole-genome sequencing data (.bam files) have been submitted to the European Genome-Phenome Archive (EGA) (www.ebi.ac.uk/ega/home) under the study accession number EGAS00001001159.

The details of library construction, sequencing, and bioinformatics of metastatic samples have been previously described (Jones et al., 2010). Briefly, biopsy samples were embedded in OCT compound and sectioned. Pathology review was performed to select sections for sequencing. Genome libraries were constructed from tumor and peripheral blood (normal control) and sequenced using Illumina protocols on a HiSeq sequencer. Reads were aligned to hg19 by the BWA aligner (v0.5.7) (Li and Durbin, 2009, 2010). Somatic SNVs and small insertions/deletions were processed using samtools (Li et al., 2009) and Strelka (v0.4.6.2) (Saunders et al., 2012). CNVs were called using CNASeq (v0.0.6). LOH was called by APOLLOH (v0.1.1) (Ha et al., 2012). SVs were called both from ABySS de novo assembly (Jackman et al., 2017) and by DELLY (Rausch et al., 2012), and the two call sets were intersected based on events with breakpoints less than 20 base pairs apart.

3.4.7 Ploidy-correction of Copy Number Variants

SignIT relies upon accurate calling of CNVs in order to correct for variant allele probabilities. For every metastatic cancer which underwent whole genome sequencing, a most likely ploidy model was determined by manual review using CNV and LOH calls and informed by tumour content estimates both from pathology assessment and bioinformatic analysis. In order to correct for ploidy, the absolute copy numbers of segments called by the CNASeq hidden Markov model (HMM) were adjusted. The tumour-normal depth ratios, $R$, used as input for segmentation are computed as

$$R = \frac{C(T)\,T + C(N)(1 - T)}{T P + C(N)(1 - T)},$$

where $R$ is the mean tumour-to-normal read depth ratio across the segment, $T$ is the tumour content, and $P$ is the ploidy.
$C(T)$ is the estimated absolute tumour copy number of the segment, and $C(N)$ is the normal copy number, assumed to be 2. The ploidy model is chosen manually by inspection of allele-specific read depths. The numerator is proportional to the relative abundance of reads from the tumour sample, which is a mixture of tumour and normal cells. The denominator is the relative abundance for a region with no copy number aberration. Rearranging yields

$$C(T) = \frac{R\,(T P + C(N)(1 - T)) - C(N)(1 - T)}{T}.$$

3.4.8 Whole Genome Sequencing of Primary Tumours

Four primary samples were sequenced as frozen or OCT samples as described in the previous section. Twenty primary samples were sequenced from FFPE material.

Whole genome libraries from primary tumours were constructed as previously described (Chong et al., 2016) with modifications. The input amount of FFPE DNA samples varied from 100 ng to 2 µg depending on availability. To improve library quality, the sheared genomic DNA was size-selected either by polyacrylamide gel electrophoresis or by solid phase reversible immobilization bead-based size selection to remove smaller DNA fragments from highly degraded strands. Libraries were sequenced on the Illumina HiSeq2500 sequencer using paired-end sequencing with read lengths of 100 or 125 bp.

4 CLINICAL APPLICATION OF MUTATION TIMING IN A BRCA1-MUTATED PANCREATIC ADENOCARCINOMA

4.1 Introduction

HR facilitates error-free repair of double-strand DNA breaks and interstrand crosslinks (Li and Heyer, 2008). Mutations in BRCA1, BRCA2, and other genes responsible for HR are prevalent among human cancers, causing HRD and genomic instability (Scully and Livingston, 2000).

WGS efforts have identified mutational and structural rearrangement signatures linked to BRCA1 and BRCA2 mutations in breast and other cancers (Lord and Ashworth, 2016), which may predict response to platinum-based chemotherapy and PARP inhibitors.
However, the role of signature timing in treatment response has not been elucidated; it could help distinguish currently active, actionable mutational processes from historically active ones. In chapter 2, we demonstrated an association between signatures of HRD and response to platinum-based chemotherapy. In chapter 3, we developed a method to perform n-of-1 analysis of mutation signatures and their temporal evolution.

Here, we present the first clinical application of HRD dynamics across spatially and temporally distinct biopsies of a pancreatic ductal adenocarcinoma (PDAC). This approach helped to reconcile paradoxical findings: genomic stability and low HRD mutation signature despite a germline BRCA1 mutation and exceptional response to FOLFIRINOX. Our findings highlight the potential value of considering timing in the clinical interpretation of mutation signatures.

4.2 Case Report

As part of an ongoing study exploring the use of comprehensive molecular analysis to inform treatment decision-making (NCT02155621) (Laskin et al., 2015), a 67-year-old male with metastatic PDAC and the germline founder mutation BRCA1 c.68_69delAG (185delAG) consented to undergo biopsy of a liver metastasis for molecular analysis. The primary tumor had been resected previously, followed by 6 months of adjuvant cisplatin/gemcitabine chemotherapy, before detection of liver metastases 12 months after surgery and 6 months after discontinuing cisplatin/gemcitabine. Liver biopsy was performed before commencement of palliative FOLFIRINOX chemotherapy (5-fluorouracil, leucovorin, irinotecan, and oxaliplatin). He had an excellent response to treatment, with CA19-9 halving in 2 months and complete PET response within 4 months (Figure 4.1A). Oxaliplatin was held after 16 cycles due to peripheral neuropathy and the patient continued to have disease control on first-line chemotherapy at last follow up, 18 months later.
This represents an exceptional response, as median overall survival for metastatic pancreatic cancer is less than 6 months, or 11 months with FOLFIRINOX treatment (Conroy et al., 2011).

[Figure 4.1 plot labels: CA19-9 (kU/L x 10000) against weeks from diagnosis of metastatic PDAC, with the start of FOLFIRINOX and complete response on PET marked; exposure (mutations) and exposure fraction in primary and metastasis; substitution classes C>A, C>G, C>T, T>A, T>C, T>G with 5' and 3' context (metastasis - primary); exposure fraction by signature for Population 1 and Population 2, with proportion and prevalence.]

Figure 4.1: Evolution of single nucleotide variant (SNV) mutation signatures in a pancreatic adenocarcinoma with exceptional response to FOLFIRINOX. (A) At 20 weeks of treatment with FOLFIRINOX, the patient exhibited a complete response, which was maintained for over 18 months. (B) Mutation signature exposures in the primary tumor and metastasis reveal a substantial rise in the homologous recombination deficiency (HRD) signature. (C) The complete catalogue of new somatic SNVs (present in metastasis but absent in primary tumor), with SNVs categorized into 96 classes based on variant type and 3’/5’ context, was matched against 30 pre-defined mutation signatures. This revealed dominant involvement of the HRD signatures (Signatures 3 and 8). (D) Temporal dissection by SignIT revealed two tumour subpopulations, which showed a drop in signature 1 and rise in signatures 3 and 8.

4.3 Results

4.3.1 BRCA1 Loss in the Primary and Metastasis

Both the primary tumor and metastasis demonstrated genomically stable structural variant profiles based on previous characterization of the pancreatic cancer genome landscape (Waddell et al., 2015). The BRCA1 c.68_69delAG frameshift variant, heterozygous in the germline, was homozygous in both tumors as demonstrated by copy-neutral loss of heterozygosity (CNLOH) spanning most of chromosome 17 and detailed analysis of aligned reads (Figure 4.2). Analysis of the BRCA1 transcripts showed the presence of the mutation in all expressed transcripts.

4.3.2 Timing of the BRCA1 Loss

Based on analysis with cancerTiming (Purdom et al., 2013), the CNLOH event on chromosome 17 resulting in homozygosity of BRCA1 was the 9th earliest of 30 events (pi0 = 0.15) in the metastasis and 17th of 41 events (pi0 = 0.40) in the primary, suggesting that it was not among the earliest tumor-initiating events (Figures 4.4C, D).
TP53 loss of function was also observed and was likely a simultaneous occurrence due to the same CNLOH event. Cellular prevalence estimation using TITAN (Ha et al., 2014) converged on a 4-subclone model but suggested that the chr17 LOH event was clonal in the metastasis (Figure 4.3). The clonality of this event minimizes the risk of platinum resistance by selection of a BRCA1 wild-type subclone. Tumor content in the pancreatic primary was insufficient to estimate clonality.

Figure 4.2: Genomic analysis and clinical evolution of a germline BRCA1 c.68_69delAG-associated pancreatic ductal adenocarcinoma (PDAC) primary tumor (left) and metastasis (right). (A, B) Copy-neutral loss of heterozygosity (CNLOH) of chromosome 17. The allelic ratio is shown in the top-left and top-right plots. The copy number variant (CNV) ratios are shown in the bottom-left and bottom-right plots. LOH regions were called using APOLLOH. The position of BRCA1 is shown as a red vertical line in all plots. (C, D) Structural variants, LOH and CNV events are depicted from the centre outwards. The green and red bars represent copy loss and copy gain respectively. Estimated tumor content by sequencing was 25% in the primary tumor and 49% in the metastasis. (E) Development of liver metastases with rising CA19-9 that peaked at 45,000 kU/L. Within 4 weeks of commencing FOLFIRINOX, CA19-9 decreased by over 50%; positron emission tomography (PET) complete response was seen at 20 weeks. At the time of writing, the patient has an ongoing PET complete response and suppression of CA19-9, 79 weeks after commencing treatment.

Figure 4.3: Joint calling of CNV, LOH, and clonal status performed across the metastatic genome using TITAN. The 4-clone model yielded an optimal fit to the data.
Investigation of chromosome 17 revealed that the CNLOH event affecting the chr17 genes TP53 and BRCA1 was clonal, with high cellular prevalence.

Inferred timing of shared genomic events was significantly correlated ($r = 0.7$, $p = 4.8 \times 10^{-6}$) between primary and metastasis samples (Figure 4.4A), with events in the metastasis consistently inferred to be “earlier” (Figure 4.4B). This is expected, and reflects the “aging” of shared genomic events during the approximately one-year gap between sequencing of the two samples. To our knowledge, this is the first biological validation of a CNV timing inference model using multiple sequencing time points.

4.3.3 Evolution of Mutation Signatures from Primary to Metastasis

The relative contributions of 30 previously described mutation signatures (Alexandrov et al., 2013a) were determined from 5683 SNVs in the primary and 8315 in the metastasis. Signature 3 and, to a lesser extent, Signature 8 have been associated with HRD (Davies et al., 2017; Nik-Zainal et al., 2012). Mutations associated with signature 3 rose by 1593 in the metastasis, and those associated with signature 8 rose by 1421, more than any other signature (Figure 4.1B). Further, of new somatic mutations (present in the metastasis but absent in the primary tumor), 26% were associated with the HRD signature (Figure 4.1C), suggesting major involvement of HRD-associated mutagenesis in the evolution of this PDAC. Strong signature bleed was observed between signatures 3 and 8 (Figure 4.5), but there was little bleed between HRD and other signatures.

4.3.4 Evolution of Orthogonal HRD-associated Mutational Signatures

The presence of recently described genomic signatures associated with HRD (Davies et al., 2017) was investigated in the primary and metastasis.
Rearrangement signatures 3 and 5 (Figure 4.6A) and the fraction of indels with microhomology (Figure 4.6B) were low, but rose between the primary and the metastasis. The HRD composite score, combining LOH, TAI, and LST, increased from 30 in the primary to 38 in the metastasis (Figure 4.6C). A caveat is that the low tumour content of the primary tumour may impact the accuracy of SV and CNV calling. This was the highest observed HRD score among the first 25 PDAC cases in our study cohort.

[Figure 4.4 plot labels: (A) primary pi0 against metastasis pi0; (B) pi0 by tissue (primary, metastasis), with event types CNLOH, DoubleGain, and SingleGain; (C) primary and (D) metastasis inferred timing (pi0) by event coordinates, with the event spanning BRCA1 and TP53 marked.]

Figure 4.4: Comparison of inferred timing for events shared between pancreatic primary tumor and metastasis. (A) Inferred timing of copy-change events in the primary tumor and metastasis was strongly correlated, with a slope of 0.45, intercept of 0.02, $r = 0.7$, and $p = 4.8 \times 10^{-6}$. Only shared genomic events in overlapping regions are shown. (B) Events in the metastasis sample were inferred to have arisen “earlier” in the course of tumorigenesis, consistent with the timing of sample collection. (C, D) Inferred timing with 95% confidence intervals in the primary and metastasis. Event coordinates are labelled along the y-axis. In case of two-copy gain, only the timing of the first copy gain is shown. The loss of heterozygosity (LOH) event encompassing both BRCA1 and TP53 has been highlighted, and is the 17th earliest of 41 inferred events in the primary and 9th of 30 events in the metastasis.

[Figure 4.5 plot labels: exposure fraction by signature (1 to 30) for panels A and B; legend: signature bleed.]

Figure 4.5: Mutation signature bleed between signatures 3 and 8. Using SignIT, mutation signature posteriors were estimated from a full Bayesian solution to a categorical mixture model using Hamiltonian Monte Carlo. Signature bleed was calculated based on anticorrelation in 2D projections of the posterior probability distribution. Mutation signature exposures from (A) the primary sample and (B) the metastatic sample demonstrate increasing involvement of Signature 3, with bleed between Signatures 3 and 8.

[Figure 4.6 plot labels: (A) exposure fraction of rearrangement signatures RS1 to RS6; (B) number of indels (all indels, indels with MH); (C) HRD score (LOH, TAI, LST); each shown for primary and metastasis.]

Figure 4.6: Evolution of structural variation alterations between the pancreatic primary and metastasis. (A) Rearrangement signatures 3 and 5, associated with HRD, were low but rose between the primary and metastasis. (B) The fraction of indels with microhomology rose from 8% to 12%. (C) The total HRD score rose from 30 to 38, largely driven by a rise in loss of heterozygosity (LOH) and large-scale transitions (LST). RS: Rearrangement signature; MH: microhomology; TAI: telomeric allelic imbalance.

4.3.5 Mutation Signature Timing

Analysis with SignIT revealed two temporally distinct tumour subpopulations with mutational prevalences of 0.99 and 0.84. The higher-prevalence population reflects signatures from clonal mutations (present in every tumour cell) or mutations present on multiple copies. Both are associated with earlier-arising mutations: clonal mutations occur before subclone branching, and multi-copy mutations occur before replication of the associated segment 1.1. These early mutations account for 36% of mutations, while later mutations make up 64%.

The early subpopulation was dominated by Signature 1, which accounted for 59% of mutations, whereas Signatures 3, 8, 9, and 16 were active in the later subpopulation. These findings agree with the observed increase in signatures 3, 8, and 9 from the primary to the metastatic sample. In particular, Signature 3 exposure rose from 6% of mutations in population 1 to 27% in population 2.
These findings provide additional evidence that HRD remains an active mutation-causing process in the metastatic tumour, despite the overall low SV burden and moderate HRD signature exposure.

4.4 Discussion

Previous studies have reported platinum sensitivity in PDACs with BRCA1/2 mutations, rampant SVs, and strong mutation signature (Waddell et al., 2015). Here, we explored the genomic evolution of a BRCA1 germline mutated PDAC with a paradoxically low HRD mutation signature and genomic instability burden. Based on observations involving the temporal dynamics of mutational signatures, we postulate that HRD onset in this case may have occurred too recently to produce a heavy burden of genomic instability, thus resulting in the absence of an unstable rearrangement signature. However, rising HRD signature exposure suggests that HRD remains a “currently active” process, which may explain the patient’s excellent and sustained response to FOLFIRINOX, a platinum-containing chemotherapy.

This analysis has some limitations. The archival primary sample had low tumour content (25%), which is known to limit the accuracy of mutation calling, and thus required analytical validation of major findings. Mitigating factors include the findings that timing of CNV and LOH events was concordant between primary and metastasis and that temporal analysis of the metastasis corroborated mutation signature evolution patterns. Moreover, SNVs were called by Strelka, which is designed to operate under low cellularity (Saunders et al., 2012). Another important limitation is that FOLFIRINOX contains agents other than oxaliplatin, namely fluorouracil, leucovorin, and irinotecan, which may have contributed to the durable treatment response. Notably, the patient did not exhibit such a dramatic response to adjuvant cisplatin/gemcitabine therapy in the primary setting. While this could be explained by low HRD mutagenic activity in the primary tumor, the action of non-oxaliplatin agents cannot be neglected.
Lastly, it is possible that exposure to platinum-based therapy may have driven HRD-associated mutagenesis in the metastatic tumour. However, other studies have discovered HRD-associated signatures in treatment-naive primaries (Alexandrov et al., 2015b; Nik-Zainal et al., 2016), and they differ from recently-discovered platinum-associated signatures (Boot et al., 2017; Szikriszt et al., 2016). Despite these caveats, we believe this case raises important educational questions about whether temporal evolution of HRD signature activity may help refine prediction of therapy response.

Although HRD is commonly considered an early tumor-initiating event, a recent study suggests that the BRCA1 and BRCA2 mutation signature is also prevalent in late-arising mutations (McGranahan et al., 2015). The exploration of mutation timing has not yet been widely adopted, and presents numerous technical challenges (Purdom et al., 2013). This case was an opportune candidate for timing analysis due to the availability of primary tissue and the large chr17 CNLOH event spanning BRCA1 and TP53. Consequently, these findings raise several questions beyond the scope of this brief report. Do temporal dynamics vary across pathogenic BRCA1, BRCA2, or other HRD-associated gene variants, and is “late-onset” HRD a common phenomenon?

With a growing body of evidence supporting the role of HRD as a predictive marker of response to platinum-based therapy and PARP inhibitors across various tumor types, there is increasing interest in new approaches to identify genomic scars associated with HRD. Although WGS techniques provide a cross-sectional snapshot of the cancer genome at a fixed moment in time, they can also be used to infer the relative timing of genomic events.
We hope that ongoing comprehensive molecular analysis with high quality prospective treatment and outcome information will facilitate a deeper understanding of the nuances in HRD-related mutational processes, resulting in improved clinical predictive accuracy of HRD assessment.

4.5 Methods

4.5.1 Tissue Collection, Processing, and Storage

Following informed consent, patients underwent image-guided metastatic biopsies as part of the Personalized OncoGenomics program of British Columbia (NCT02155621, University of British Columbia Clinical Research Ethics Board approval no. H12-00137). Up to 5 biopsy cores were obtained using 18-22G biopsy needles and embedded in optimal cutting temperature (OCT) compound. Tumor sections were reviewed by a pathologist to confirm the diagnosis, evaluate tumor content and cellularity, and select areas most suitable for DNA and RNA extraction. Peripheral venous blood samples were obtained at the time of biopsy and leukocytes isolated for use as a germline DNA reference. DNA and RNA were extracted for genomic and transcriptomic library construction, which have been previously described in detail (Sheffield et al., 2015).

Tissue from the primary pancreatic tumor and from the liver metastasis was sequenced, with leukocytes isolated from blood samples used as a germline DNA reference. Tumour content was estimated at 49% for the metastatic sample and 25% for the primary. The low tumour content of the primary sample necessitates careful interpretation of variant calls along with orthogonal validation of key findings.

The primary pancreatic tumor sample was obtained from the previously resected specimen that had been snap-frozen at the time of surgery and stored at the BC Gastrointestinal Biobank at -80°C for approximately 18 months prior to extraction and analysis.
All samples were handled under sterile conditions and transported on dry ice.

4.5.2 Sequencing and Bioinformatics

Paired-end reads were generated on an Illumina HiSeq2500 sequencer and aligned to the human reference genome GRCh37 by the BWA aligner (Li and Durbin, 2010) (v0.5.7). Somatic SNVs and small insertions/deletions were processed using SAMtools (Li et al., 2009) and Strelka (Saunders et al., 2012) (v0.4.6.2). Regions of CNV were determined using CNASeq (v0.0.6) and LOH by APOLLOH (Ha et al., 2012) (v0.1.1). Tumor content and ploidy models were estimated from sequencing data through analysis of the CNA ratios and allelic frequencies of each chromosome. These were then compared to theoretical models (Ha et al., 2012) for diploid, triploid, tetraploid, and pentaploid genomes at various tumor contents (10% intervals from initial lab estimate). The analysis yielded a diploid model at 25% tumor content in the PDAC primary and 49% tumor content in the PDAC metastasis. Structural variation was detected by de novo assembly of tumor reads using ABySS and Trans-ABySS (Robertson et al., 2010), followed by variant discovery using DELLY (Rausch et al., 2012).

CNV and LOH Analysis with TITAN

Joint detection of CNV and LOH was performed on the metastatic sample using TITAN (Ha et al., 2014). The TitanCNA Bioconductor package (version 1.12.0) and its dependencies were installed in R (version 3.3.2). The germline heterozygous mutations required by TITAN were called using MutationSeq (version 4.3.8) (Ding et al., 2012) installed in Python (version 2.7.13). The germline variants were filtered for those present in dbSNP (release 138, common_all). For more information, see https://github.com/MO-BCCRC/titan_workflow. The cellular prevalence of each event was estimated according to a 4-subclone model, which yielded the best Bayesian information criterion fit to the SNV read counts.
TITAN provided an estimated tumour content of 0.56, which is similar to the tumour content estimate obtained by manual review (0.49). TITAN also provided an average tumour ploidy of 2.05. These findings were used to assess the clonal status of the LOH event on chromosome 17 spanning BRCA1 and TP53.

4.5.3 Mutation Timing Analysis

The relative temporal ordering of large scale genomic events can be inferred by leveraging SNV burden as a “molecular clock”. Note that this method can only infer the timing of events for which the precise history is known with reasonable confidence. As a result, only regions with CNLOH or allele-specific amplification with 1-copy or 2-copy gain have inferrable timing. Thus, Figure 4.4 shows the inferred timing for the subset of events which fit this criterion.

This analysis was performed using the cancerTiming package for the R programming language (Purdom et al., 2013). Because larger genomic events yield more accurate timing, small events which interrupt adjacent larger ones were automatically filtered out. This filtering step dramatically improved the association in timing between the primary and metastatic samples. For each segment, cancerTiming computes pi0, the probability of a random SNV within the affected loci occurring prior to the event. Greater pi0 values suggest later occurrence of the CNV and/or LOH. Bootstrap distributions were computed non-parametrically using 1000 iterations. 95% confidence intervals were determined by reporting the 25th and 975th ordered values from the resulting distribution.

4.5.4 Mutation Signature and Signature Timing Analysis

Patterns of somatic SVs, SNVs, and CNVs were interrogated to determine the contribution of HRD to mutagenesis. SNV and SV count vectors were computed as previously described (Alexandrov et al., 2013b; Nik-Zainal et al., 2016).
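For the SNV case, the 96-class count vector can be sketched in a few lines of Python. This is an illustrative sketch rather than the pipeline used here: it assumes the common convention of reporting each substitution relative to the pyrimidine (C or T) of the reference base pair, which gives 6 substitution types and 16 flanking contexts, and the function names, label format, and input tuples are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# 96 class labels such as "A[C>T]G": 5' base, substitution, 3' base.
CLASSES = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five in BASES
    for three in BASES
]

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def classify_snv(context, alt):
    """Label an SNV from its reference trinucleotide context and alternate base.

    If the mutated (middle) reference base is a purine, the context and the
    alternate base are flipped to the opposite strand so that the reference
    base is always C or T.
    """
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"

def count_vector(snvs):
    """Return SNV counts ordered as in CLASSES."""
    counts = Counter(classify_snv(context, alt) for context, alt in snvs)
    return [counts[label] for label in CLASSES]

# Hypothetical input: (reference trinucleotide, alternate base) pairs.
snvs = [("ACA", "T"), ("TGT", "A"), ("GTC", "G"), ("GAC", "C")]
v = count_vector(snvs)
```

Here the second and fourth SNVs fall on purine reference bases and are folded onto the same classes as the first and third, so the toy vector has two non-zero entries.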
Using the 30 consensus signatures from COSMIC as a reference set (cancer.sanger.ac.uk/cosmic/signatures), signature exposures were computed using SignIT.

4.5.5 Additional HRD Metrics: Deletion Microhomology and HRD Score

The proportion of deletions exhibiting overlapping microhomology was determined by identifying matching sequences between deleted ends and flanking regions. HRD scores were computed as the arithmetic sum of LOH, TAI, and LST metrics, which in turn were determined using CNV and LOH patterns (Figure 4.7) based on previously described guidelines (Timms et al., 2014).

[Figure 4.7 plot labels: LOH calls and filtered calls; copy number (1 to 4) across chromosomes 1 to 22; segment categories none, loh, tai, both; LOH types HET, BCNA, ASCNA, DLOH, NLOH, ALOH.]

Figure 4.7: Filtering of small segments for mutation timing and HRD scores pre-processing. The purpose of filtering is to (1) fill in gaps where no calls exist with a normal, heterozygous segment, and (2) remove tiny segments adjacent to two equivalent larger segments, as these are likely to represent later or spurious events. HRD scores are composed of telomeric allelic imbalance (TAI), large loss of heterozygosity (LOH), and large-scale transitions (LST). LST junctions are shown by the vertical black lines in the lower figure.

5 THE EVOLUTION OF MUTATIONAL PROCESSES IN METASTATIC CANCER

5.1 Introduction

Although metastasis underlies up to 90% of cancer-related mortality (Seyfried and Huysentruyt, 2013), genomic instability and mutation signatures are mostly studied in primary tumours. Mutation signatures are recurrent patterns of somatic mutation frequently associated with specific mutational mechanisms such as tobacco and UV exposure (Alexandrov et al., 2016), endogenous mutagenic processes such as deamination (Roberts et al., 2013), and DNA repair deficiencies (Nik-Zainal et al., 2012; Polak et al., 2017).
The analysis of mutation signatures in primary cancer sequencing data has catalogued over 30 known signatures (Alexandrov et al., 2013b; Letouzé et al., 2017; Nik-Zainal et al., 2016). Analysis of digital NGS read counts has also revealed that the activity of mutational processes changes over time (McGranahan et al., 2015). This additional insight can help characterize the ordering of mutagenic impacts throughout carcinogenesis and progression.

Recent studies suggest that certain mutation signatures may predict chemotherapy response. Hypermutating signatures of tobacco exposure, UV radiation, and MMR deficiency have been associated with increased neoantigen burdens and sensitivity to immunotherapies in lung, gastrointestinal, urothelial, and skin cancers (Iyer et al., 2017; Lauss et al., 2017; Le et al., 2015; Rizvi et al., 2015). Signatures of HRD have been associated with distinct cancer subtypes (Wang et al., 2017) and sensitivity to platinum-based chemotherapies (Telli et al., 2016; Zhao et al., 2017). Understanding mutational processes in metastatic cancers could uncover actionable targets and refine models of progression and drug resistance.

Also of interest in metastatic cancers are mutational spectra produced by exposure to cytotoxic chemotherapies. For example, a specific hypermutation signature of C→T transitions pervades the genomes of MGMT-methylated, MMR-deficient glioblastomas treated with temozolomide alkylator chemotherapy (Alexandrov et al., 2013a; Yip et al., 2009). Despite various efforts to catalogue the mutations induced in model organisms by chemotherapy exposures (Meier et al., 2014; Segovia et al., 2015; Szikriszt et al., 2016), few matching signatures have been observed in sequenced patient samples.
However, a signature recently discovered in cisplatin-treated human cell lines was also found in 8 hepatocellular and 2 esophageal cancers, all with histories of cisplatin exposure (Boot et al., 2017).

To catalogue the mutational signatures of metastatic cancer, whole genome and transcriptome analysis of 571 advanced cancers was performed as part of the BC Cancer Agency Personalized Oncogenomics Project. Additionally, we performed temporal dissection of mutation signatures to map their evolutionary trajectories through cancer progression. This is the largest study to date of metastatic cancer whole genomes, revealing novel mutation signatures and chemotherapy-associated evolution of mutational processes. Our findings highlight the complex interplay of factors shaping the somatic genomes of metastatic cancers.

5.2 Results

De novo inference of mutation signatures was successful in 12 cohorts out of 23, containing a total of 484 patients with 9,646,146 somatic SNVs. Primary site of origin varied (Table 5.1), with the largest cohorts being breast (n = 144), colorectal (n = 87), and lung (n = 68). Hierarchical clustering over signatures deciphered independently from each cohort yielded 20 independent mutation signatures. Signatures were compared against the current 30-signature COSMIC reference set using the cosine similarity metric. Eleven signatures closely matched at least one previously observed signature from the COSMIC set (Appendix Figure A.4).

Table 5.1: The number of patients belonging to each cancer type-specific cohort.

Cohort   Number of Participants   Primary Site / Cancer Type
BRCA     144                      Breast
COLO     87                       Colorectal
LUNG     68                       Lung
SARC     50                       Sarcoma
MISC     45                       Miscellaneous (i.e. unknown primary)
PAAD     42                       Pancreatic
OV       28                       Ovarian
CHOL     14                       Cholangiocarcinoma
SECR     12                       Secretory gland tumours
SKCM     12                       Skin
LYMP     11                       Lymphoma
STAD     11                       Stomach
ESCA     10                       Esophageal
HNSC     6                        Head & neck
UVM      6                        Uveal melanoma
KDNY     5                        Kidney
ACC      4                        Adenoid cystic
THCA     4                        Thyroid
THYM     4                        Thymoma
PRAD     3                        Prostate
GBM      2                        Glioblastoma
HCC      2                        Hepatocellular
BLCA     1                        Bladder

We inferred the temporal evolution of mutation signatures to map the progression trajectory of genomic instability. Signature evolution has previously been investigated only in primary, untreated cancers (Letouzé et al., 2017; McGranahan et al., 2015). The temporal dissection of novel metastatic signatures may help distinguish markers of metastasis.

We have numbered novel metastatic signatures Signatures M1 to M9. Diagrams of all novel signatures are provided in Figure 5.1. The mean timing of mutation signatures across every cancer type is summarized in Figure 5.2. The similarity of signatures to COSMIC reference signatures is shown in Figure 5.3. Signature exposures for every cancer sample are provided in Appendix Fig. A.5.

5.2.1 Aging-related Mutation Signatures

Of the 9 novel signatures (Figure 5.1), some were variations on known signatures. We identified signatures 1 and 5, known to be associated with aging (Alexandrov et al., 2015a). Signature M1 was characterized by C→T mutations in CpG contexts, and matched the aging-related signature 1B previously found in many primary tumours (Alexandrov et al., 2013a) but left out of COSMIC. Aging-related signatures were not observed in skin and lung cancer cohorts (SKCM and LUNG). In skin cancers, this is likely due to the small sample size and strong presence of the UV signature.

[Figure 5.1: image not reproduced. One panel per signature (Signatures M1 to M9), each showing C>A, C>G, C>T, T>A, T>C, and T>G mutations by 5' and 3' context.]

Figure 5.1: Novel metastatic signatures not catalogued in COSMIC. The analysis of recurrent mutation signatures across 12 metastatic cancer cohorts identified 9 signatures not found in the COSMIC signature catalog. These included signatures of aging (M1), cisplatin exposure (M3), mismatch repair deficiency (M4), and APOBEC deamination (M5). The etiology of the remaining signatures remains unclear.

[Figure 5.2: image not reproduced. Grid of signatures (Signatures 1, 2, 3, 4, 5, 7, 8, 13, 17, 18, 30, and M1 to M9) by cancer type cohort, coloured by timing from early to late mutations and annotated with proposed etiology.]

Figure 5.2: Mutation signatures and their temporal dissection in metastatic cancer. 20 de novo signatures deciphered from metastatic cancer whole genomes were found recurrently across tumours of 12 cancer type-specific cohorts. Temporal dissection revealed signatures biased towards early- or late-arising mutations.

In lung cancers, signatures 3, M4, and M6 were correlated and their linear combination matched signature 1, suggesting that the presence of these three signatures together obviated the need for a separate aging-related signature.
A similar previous analysis suggested that signature 1B may often be composed of signature 1 together with signature 5 (Alexandrov et al., 2016).

As in primary tumours, the aging-related signatures 1 and M1 were early-arising mutational processes across cancer types. Aging signatures were particularly elevated in cancers of rapidly proliferating epithelial cells, such as colorectal cancer, which agrees with previous findings (Alexandrov et al., 2015a). Despite previous evidence that signature 5 is also aging-related, we found that elevated signature 5 occurred primarily in late-arising mutations.

[Figure 5.3: image not reproduced. Heatmap of cosine similarity between each metastatic de novo signature and each COSMIC signature (1 to 30).]

Figure 5.3: De novo mutation signatures deciphered from metastatic cancers. Of these, 11 matched known signatures from COSMIC and 9 were novel. To compare de novo signatures against those in COSMIC, the cosine similarity metric was computed pairwise between each metastatic signature and each COSMIC signature.

5.2.2 Signatures of Exogenous Mutation

Signature 4, associated with tobacco exposure (Alexandrov et al., 2016), was a specific indicator of smoking history as expected (Figure 5.4A). Signature 4 was early-arising in all but one of the participants with a known smoking history.

Signature 7, associated with UV radiation, was found in skin cancers and head and neck cancers as well as one hypermutated cancer of the lung. These cases had, on average, approximately 100 times the mean mutation burden. Four of the 12 UV hypermutated cases displayed a bias towards early mutations, and 5 fit a single-population model (Figure 5.4B). Subsequent review was conducted into the UV-positive lung tumour.
Clinical review showed that this patient had multiple prior skin cancers, and assessment of pathology and gene expression profiles suggested that this cancer was a sarcomatoid lung tumour which likely originated from a spindle cell carcinoma of the scalp.

5.2.3 Signatures of Endogenous Mutation and DNA Repair Deficiency

Mutation signatures arising from APOBEC deamination, signatures 2 and 13 (Roberts et al., 2013), were common across cancers of various types. Additionally, we identified a similar signature (M5) in stomach adenocarcinoma, which likely shares a similar mutational mechanism. These three signatures were observed across both early and late mutations. This disagrees with a previous finding that signature 2 was late-arising in bladder, head & neck, and lung cancers and signature 13 was early-arising in bladder cancers (McGranahan et al., 2015).

[Figure 5.4: image not reproduced. (A) "Sig. 4 in Lung Cancer": timing (early 0 to late 1, or single population) versus exposure (Muts/Mb), coloured by tobacco history (smoking history, second-hand, never-smoker, unknown). (B) "Sig. 7": timing versus exposure (Muts/Mb), coloured by cohort (HNSC, LUNG, SKCM).]

Figure 5.4: The hypermutating signatures of tobacco smoking and ultraviolet radiation. (A) Mutation signature 4, associated with tobacco exposure, was early-arising and elevated in patients with a known history of cigarette smoking. The signature was absent from never-smokers, as well as one patient who reported frequent exposure to second-hand smoke. (B) Mutation signature 7, associated with ultraviolet radiation, was prevalent among skin cancers and one head & neck cancer, resulting in 10-100 times the mutation burden in exposed cases.

[Figure 5.5: image not reproduced. (A) Timing (clonal, or early 0 to late 1) versus exposure (muts/Mb) for signatures 3 and 8 in each cohort, coloured by platinum exposure (yes, no, unknown). (B) Signature 3 exposure fraction in early versus late mutations in breast cancers, by platinum exposure.]

Figure 5.5: Platinum exposure is associated with temporal evolution of homologous recombination deficiency (HRD) associated mutation signatures. (A) Mutation signatures 3 and 8, associated with HRD, were found in many cancer types. They were early-biased in high-exposure breast cancers and sarcomas. (B) In breast cancer, prior exposure to platinum-based chemotherapy was associated with a decrease in signature 3 from early to late mutations compared to non-platinum exposed tumours (p = 0.033).

Signatures 3 and 8, associated with HRD (Davies et al., 2017; Nik-Zainal et al., 2012), were observed in both early and late mutations. Higher signature exposures were associated with early-arising mutations in breast cancer and sarcoma. Signature 3 was also observed in ovarian, pancreatic, and stomach cancers, as previously described (Alexandrov et al., 2015b).
In breast cancer, prior exposure to platinum-based chemotherapy was associated with a decrease in the late mutation activity of signature 3 (p = 0.033, Figure 5.5). As discussed in the previous section, the elevation of signature 3 in lung cancers is likely artifactual.

Signature 30 was observed in two highly mutated cases, an undifferentiated round cell sarcoma and a pancreatic neuroendocrine tumour (PNET). Signature 30 was recently induced in cancer organoid models by the mutation of NTHL1 (Drost et al., 2017), a DNA glycosylase which participates in BER. In our cohort, both cases with elevated signature 30 carried deleterious mutations in the NTHL1 gene. The NTHL1 mutation in the PNET was a germline fusion event with the nearby genes TRAF7 and TSC2 and was previously described in detail (Wong et al., 2018).

Signature M4 was a driver of hypermutation in MMR-deficient tumours, and was associated with elevated MSI scores (Figure 5.6A). The signature profile was characterized by C→T and T→C transitions, and of the COSMIC signatures associated with MMR, it was closest to signature 26 (Figure 5.3). Timing bias of MMR hypermutation varied (Figure 5.6B).

Aside from hypermutated cases, MMR signatures demonstrated temporal variability across tumour types. In particular, colorectal cancers carried increased signature M4 exposure in early-biased or single-population tumours. Aside from mutation of genes responsible for MMR, a common etiology of MMR deficiency is hypermethylation of the MLH1 promoter (Kuismanen et al., 2000; Li et al., 2013), which is associated with decreased MLH1 expression. Although we did not directly measure methylation, the expression of MLH1 was estimated from transcriptome data.
Excluding cases with MSI or carrying germline mutations in an MMR gene, signature M4 exposure was negatively correlated with the expression of MLH1 (p = 0.0029, Figure 5.6C,D).

Signature M7 was a signature of unknown etiology and accounted for 27,547 mutations (0.72% of mutation burden) in a single breast cancer sample. It has a specific profile of GCG→GTG, GTC→GCC, TTC→TCC, and GTT→GCT mutations.

[Figure 5.6: image not reproduced. (A) MSIsensor score versus signature M4 exposure (Muts/Mb, log scale). (B) Timing (early 0 to late 1, or single population) versus signature M4 exposure (Muts/Mb, log scale). (C) Signature M4 exposure (Muts/Mb) versus MLH1 expression percentile (p = 0.0029). (D) Signature M4 exposure (Muts/Mb) versus MLH1 expression fold change. Points are coloured by cohort (COLO, ESCA, LUNG, PAAD) and by mutation impact (wild type, low, moderate, high).]

Figure 5.6: A novel signature of mismatch repair (MMR) deficiency is associated with microsatellite instability and underexpression of MLH1. (A) Mutation signature M4 was the only signature associated with microsatellite instability. (B) Temporal dissection of signature M4 revealed a distinct cluster of colorectal cancers with elevated early-arising signature exposure. (C,D) Cases with germline mutations in an MMR gene (MSH2, MSH3, MSH6, PMS1, PMS2, and MLH1) are shown according to their SNPeff-predicted mutation impact (low, moderate, or high). Excluding cases with germline MMR gene mutations, signature M4 was significantly correlated with MLH1 expression.

5.2.4 The Late-arising Signatures of Metastases and Chemotherapy Exposure

Two signatures of unknown etiology, signatures 17 and M2, were biased towards late-arising mutations. This was observed across cancer types (Appendix Figure A.7) and biopsy sites (Appendix Figure A.8), suggesting that these signatures may relate to shared mutational processes occurring in progression or treatment. Signature 17 has been previously found in a small number of cancers of the liver (Letouzé et al., 2017) and breast (Nik-Zainal et al., 2016).

To explore exposures to common DNA-damaging chemotherapies as a potential etiology for mutation signatures, we examined drug-signature associations for 7 common chemotherapy agents with known DNA-damaging properties. The signatures included in this analysis were those that were late-arising or of unknown etiology: signatures 5, 17, M2, M3, and M8.

Three drug-signature pairs displayed statistically significant differences in signature exposure between drug-exposed and non-exposed groups. Signature 17 was elevated in cancers exposed to oxaliplatin (p = 3.2e-07, median of 1685 vs. 228 mutations) and fluorouracil (p = 2.8e-06, median of 828 vs. 199 mutations), which are commonly administered in combination as part of FOLFOX regimens to treat gastrointestinal and pancreatic cancers (André et al., 2004; Conroy et al., 2011).
The trend was observed both in cancers fitting single-population models and multi-population models, with the latter displaying a clear late signature bias in oxaliplatin-treated cases (Figure 5.7C).

[Figure 5.7: image not reproduced. (A) Proportion of mutations by trinucleotide context for the Boot et al. signature, Signature 17, and Signature M3, with contexts marked by crosslink type (intrastrand, interstrand, either). (B) Timing (early 0 to late 1, or single population) versus signature M3 exposure (mutations/Mb) in the BRCA, CHOL, OV, SARC, and SECR cohorts, coloured by platinum exposure (non-exposed, carboplatin, cisplatin, or both). (C) Timing versus signature 17 exposure (Muts/Mb, log scale) for fluorouracil and oxaliplatin, coloured by treatment status.]

Figure 5.7: Screening of drug-signature interactions reveals statistically significant associations with cisplatin, oxaliplatin, and fluorouracil. (A) Mutation signatures associated with platinum-based chemotherapies demonstrated features consistent with intrastrand crosslink formation. Signature M3 strongly resembled a signature induced by Boot et al. (2017) in cell lines by treatment with cisplatin.
Our findings provide independent discovery of this mutation signature in clinical samples. (B) Signature M3 was deciphered during de novo mutation signature analysis in five cancer types. Elevation of signature M3 was observed in association with 16 cancers of various types previously treated with cisplatin, as well as many carboplatin-treated ovarian cancers. (C) In addition, the elevation of signature 17 exposure in association with oxaliplatin treatment was biased towards late-arising mutations.

5.2.5 Signature M3 Results from Exposure to Cisplatin

Signature M3 was elevated in patients exposed to cisplatin (p = 8e-07, median of 232 vs. 653 mutations), which is commonly administered with gemcitabine to treat cancers of the bladder, lung, breast, liver, and bile duct. Elevated signature M3 was associated with platinum exposure in four out of five osteosarcomas, as well as three other sarcomas, one breast cancer, three cholangiocarcinomas, and two salivary duct carcinomas (Figure 5.7B).

A preprint recently posted on bioRxiv (Boot et al., 2017) independently discovered a signature nearly identical to M3 (cosine similarity = 0.94) by treating human cell lines with cisplatin. Both signatures M3 and 17 display high rates of mutations consistent with intrastrand crosslink formation.

Whereas signature M3 was enriched for C→T mutations in CpCpY contexts, signature M8 exhibited C→A mutations in the same context. Signature M8 was found in only a single pancreatic adenocarcinoma, to which it contributed 13,648 mutations, accounting for 90% of mutation burden. The only chemotherapy to which the pancreatic cancer had been previously exposed was gemcitabine. However, definitive conclusions cannot be drawn from this single observation.

5.3 Discussion

Mutation signatures represent an emerging tumour biomarker orthogonal to existing clinical modalities.
Understanding the evolution of mutation signatures from carcinogenesis to progression would undoubtedly inform research on their effective clinical translation. Here, we have performed the largest investigation of mutation signatures across metastatic cancer whole genomes. In addition to discovering novel signatures in the metastatic setting, we inferred their temporal evolution, which can help guide the association of signatures with potential etiologic factors, such as exposure to DNA-damaging chemotherapies.

Our analysis also uncovered known and novel trends of temporal mutation signature bias. Signatures of aging were early-arising, as previously observed, as were signatures associated with cigarette smoke and UV radiation. Additionally, temporal dissection revealed that non-hypermutating, early involvement of MMR-deficient signatures correlated with underexpression of MLH1 specifically in colorectal cancers. Discrepancies in temporal bias compared with previous analyses (McGranahan et al., 2015), such as those observed with APOBEC signatures, may result from numerous differences between the studies. McGranahan et al. (2015) analyzed WES data, which is less likely to yield stable signature solutions than WGS, but allowed for a greater number of cases per cancer type. Additionally, the previous study employed binary temporal partitioning, which we demonstrated previously can result in lower-integrity temporal dissection than SignIT depending on the underlying clonal structure. Importantly, McGranahan et al. (2015) studied primary cancers, which may lack certain mutational processes specific to metastasis.

Examples of signatures specific to advanced, chemotherapy-treated tumours are those arising from genotoxic chemotherapy exposures. Signature M3 was associated with cisplatin exposure across diverse cancer types. Boot et al.
(2017) found a nearly identical signature in vitro, as well as in 8 hepatocellular carcinomas and 2 esophageal cancers, which together with our findings provides strong evidence of a direct link to drug exposure. We further described signature M3 in cisplatin-treated breast cancers, cholangiocarcinomas, sarcomas, and salivary gland tumours.

In addition to signature M3, we also found an association between signature 17 and treatment with oxaliplatin and fluorouracil. In this case, the signature was late-biased, suggesting later onset of drug exposure during the course of the disease. Despite having been observed in other cancers, the etiology of signature 17 remains unknown, and no in vitro studies have definitively linked it to a mutagenic agent. Mutational spectra arising from oxaliplatin and fluorouracil exposure have not yet been studied using in vitro methods. Nevertheless, examination of signatures M3 and 17 revealed shared enrichment of mutations consistent with intrastrand crosslink formation. This form of DNA damage is typically repaired by NER (Huang and Li, 2013). Deficiencies in NER or related pathways such as translesion synthesis may explain why some platinum-treated tumours display these signatures while others do not.

The temporal analysis of signatures in association with drug exposures also enables the study of hypothetical drug resistance mechanisms. Tumours with mutations in the HR-associated genes BRCA1 and BRCA2 are more sensitive to platinum-based chemotherapies (Arun et al., 2011; Byrski et al., 2010; Tutt et al., 2015; Von Minckwitz et al., 2014). We showed in chapter 2 that breast cancers with signatures of HRD, including signature 3, are also associated with improved outcomes on platinum-based chemotherapy, independent of BRCA1 and BRCA2 mutations. We observed that prior exposure to platinum-based chemotherapy was associated with depression of signature 3 exposure in late-arising mutations.
A hypothetical resistance mechanism to platinum-based chemotherapy is the reversion (or back-mutation) of BRCA1 and BRCA2, restoring function to the mutant gene (Dhillon et al., 2011; Norquist et al., 2011; Swisher et al., 2008). Our observation suggests that reversion of HRD, and therefore a drop in signature 3, may be associated with platinum resistance even in the absence of detected BRCA1 or BRCA2 reversion mutations.

The analysis of signature timing provides evidence to suggest signature evolution in response to chemotherapy exposures. However, inferring timing from a single biopsy alone is limited in its ability to definitively attribute mutation signatures to chemotherapy exposures. The availability of sequencing data from multiple time points would be helpful in this regard, but such data can be costly and technically infeasible to obtain. A further limitation of this study is the lack of available clinical data regarding prior radiotherapy at the time of analysis. As a result, we could not investigate the association of radiation exposure with mutation signatures. However, a previous analysis of radiation-treated second malignancies predominantly identified signatures of SVs, insertions, and deletions rather than SNVs (Behjati et al., 2016).

This analysis also highlights continued technical challenges in the application and interpretation of mutation signatures. There were many disagreements in signature evolution between our analysis and previous work (McGranahan et al., 2015), such as discrepant timing of signatures 1B, 2, and 13. It is likely that multicollinearity between signatures, resulting in mutation signature bleed, plays a significant role in these discrepancies. For example, in chapter 3 we demonstrated that mutation signature 5 is similar to many other signatures, and thus may bleed signal with them.
Additionally, the aging signature 1B (or M1) can be formulated from a linear combination of signatures 1 and 5 (Nik-Zainal and Morganella, 2017). This may reconcile why signature 5 was correlated with age-of-onset in a previous study (Alexandrov et al., 2015a) yet is biased towards late mutations in our study and others (McGranahan et al., 2015). This suggests that signature 5 itself may independently arise from both an aging-related process and a different late-arising mechanism.

Over the past decade, mutation signature analysis has emerged as a valuable tool for the precise delineation of genomic instability and mutation in cancers. The applicability of this approach across tumour types makes it an attractive option for biomarker discovery in personalized cancer analysis and treatment. By investigating the mutation signatures of advanced cancers, we have aimed to shed light on the diverse mutagenic influences at play during invasion and metastasis.

5.4 Methods

5.4.1 Whole Genome Sequencing of Metastatic Cancers

Study participants underwent tumour biopsies as part of the POG Project (Laskin et al., 2015). The study was approved by the University of British Columbia Research Ethics Board (REB# H12-00137 and H14-00681). Written informed consent, including potential publication of findings, was obtained from patients prior to genomic profiling. Whole-genome sequencing data (.bam files) have been submitted to the European Genome-Phenome Archive (EGA) (www.ebi.ac.uk/ega/home) under the study accession number EGAS00001001159.

The details of library construction, sequencing, and bioinformatics of metastatic samples have been previously described (Jones et al., 2010). Briefly, biopsy samples were embedded in OCT compound and sectioned. Pathology review was performed to select sections for sequencing.
Genome libraries were constructed from tumour and peripheral blood (normal control) and sequenced using Illumina protocols on a HiSeq sequencer.

5.4.2 Mutation Calling

Reads were aligned to hg19 by the BWA aligner (v0.5.7) (Li and Durbin, 2009, 2010). Somatic SNVs and small insertions/deletions were processed using samtools (Li et al., 2009) and Strelka (v0.4.6.2) (Saunders et al., 2012). CNVs were called using CNASeq (v0.0.6).

5.4.3 Mutation Signature Analysis

Mutation signature analysis was performed on 571 cancer whole genomes from 23 cancer type cohorts. Somatic SNVs called by Strelka were categorized based on 6 variant types and 16 trinucleotide context subtypes to yield a total of 96 mutation classes. Mutation signatures were deciphered using a published framework (Alexandrov et al., 2013b) for non-negative matrix factorization (NMF) of the mutation catalog matrix into de novo mutation signatures and the relative exposure of each cancer genome to each signature. Fractional exposure was defined as the proportion of a genome's total mutation burden contributed by a particular signature.

Signature stability estimates were obtained by bootstrap re-sampling with 1,000 iterations (10 iterations over 100 cores). For each cohort, the number of signatures $n_{opt}$ that best maximized signature stability while minimizing Frobenius reconstruction error was chosen with the formula

\[
n_{opt} = \underset{n}{\arg\min} \left( \frac{R_n - \min(R)}{\max(R) - \min(R)} - \frac{S_n - \min(S)}{\max(S) - \min(S)} \right),
\]

where $S_n$ and $R_n$ are the signature stability and reconstruction error values for the n-signature solution and $S$ and $R$ are vectors containing stability and reconstruction error values for all values of n. The model selection in each cohort is shown in Appendix Fig. A.6.

Mutation signature analysis of a total of 23 cancer type cohorts was attempted (Table 5.1). Eleven of these cohorts failed mutation signature analysis because of (1) too few samples, (2) too few SNVs, or (3) excessive heterogeneity in mutation signatures (as was the case in the MISC cohort).
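
The model-selection rule from Section 5.4.3 can be sketched in a few lines. This is a minimal illustration rather than the code used in the thesis: the function names and the toy stability and error values are hypothetical, and returning zeros when all values are equal is an added assumption to avoid division by zero.

```python
def min_max_scale(values):
    """Rescale values to [0, 1]; returns all zeros if every value is equal (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def choose_n_signatures(candidates, reconstruction_error, stability):
    """Pick the n that minimizes scaled reconstruction error minus scaled stability."""
    scaled_error = min_max_scale(reconstruction_error)
    scaled_stability = min_max_scale(stability)
    scores = [r - s for r, s in zip(scaled_error, scaled_stability)]
    best_index = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index]


# Hypothetical values for 2-, 3-, and 4-signature solutions.
candidates = [2, 3, 4]
errors = [10.0, 6.0, 5.0]          # Frobenius reconstruction error R_n
stabilities = [0.90, 0.95, 0.60]   # signature stability S_n
best_n = choose_n_signatures(candidates, errors, stabilities)
```

In this toy case the 3-signature solution is selected: it has the highest stability and nearly the lowest error, so its score is the smallest.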
An analysis was marked failed if every sample had its own private mutation signature (meaning dimensionality reduction did not take place) or if the stability and reconstruction error estimates were poor across all attempted models.

5.4.4 Temporal Analysis of Mutation Signatures

Temporal analysis of mutation signatures based on mutation types and NGS variant allele counts was performed using SignIT. Only cases which fit models described by more than one subpopulation can be subjected to mutation signature timing analysis. SignIT requires the annotation of SNV calls with tumour and normal copy number. Prior to annotation, CNVs from CNAseq were first corrected for ploidy using the following formula:

\[
C(T) = \frac{(R + 1)\,\bigl(TP + C(N)(1 - T)\bigr) - C(N)(1 - T)}{T},
\]

where R is the mean GC-corrected tumour-to-normal read depth ratio across the segment, T is the tumour content, and P is the ploidy. C(T) is the estimated absolute tumour copy number of the segment and was rounded to the nearest whole number, and C(N) is the normal copy number, assumed to be 2. SNVs in regions with greater than 5 copies were filtered out, as precise copy number estimation becomes difficult.

262 cases best fit a model with one subpopulation, while 215 fit multiple temporally distinct subpopulations, thus enabling signature timing. Mean early and late mutation signature exposures were computed by fitting a weighted linear model of exposure fraction versus subpopulation prevalence. The timing bias was computed as the fraction of late-arising mutations,

\[
\text{timing bias} = \frac{\text{late exposure}}{\text{late exposure} + \text{early exposure}},
\]

and could vary from 0 for early mutation signatures to 1 for late mutation signatures. To generate Figure 5.2, results were grouped by cohort and signature, and the total number of early- and late-arising mutations across all samples was computed.

5.4.5 Microsatellite Instability Scores

Microsatellite instability was quantified from paired tumour-normal whole genome sequencing using the previously described tool, MSIsensor (Niu et al., 2014).
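
As a worked illustration of Section 5.4.4, the ploidy correction and the timing bias can be written as two small functions. This is a sketch under stated assumptions, not SignIT or CNAseq code: the function and parameter names are hypothetical, and the depth ratio is assumed to be zero-centred (so that a value of 0 corresponds to the tumour's average ploidy), which is inferred from the (R + 1) term rather than stated in the text.

```python
def tumour_copy_number(ratio, tumour_content, ploidy, normal_copies=2):
    """Ploidy-corrected absolute tumour copy number for one segment.

    ratio: GC-corrected tumour-to-normal depth ratio, assumed zero-centred.
    tumour_content: fraction of tumour cells in the sample (T).
    ploidy: average tumour ploidy (P).
    normal_copies: copy number in normal cells, C(N), assumed to be 2.
    """
    normal_part = normal_copies * (1 - tumour_content)
    expected = tumour_content * ploidy + normal_part
    estimate = ((ratio + 1) * expected - normal_part) / tumour_content
    # Python's round() resolves exact .5 ties to the nearest even integer.
    return round(estimate)


def timing_bias(early_exposure, late_exposure):
    """Fraction of late-arising mutations: 0 is fully early, 1 is fully late."""
    return late_exposure / (late_exposure + early_exposure)


# Hypothetical segment: 50% tumour content, diploid tumour.
neutral = tumour_copy_number(0.0, 0.5, 2.0)   # no depth change
gained = tumour_copy_number(0.5, 0.5, 2.0)    # 50% more depth
bias = timing_bias(early_exposure=3.0, late_exposure=1.0)
```

With these toy inputs, a segment with no depth change resolves to 2 copies and a 50% depth increase resolves to 4 copies, because only half of the sampled cells are tumour cells.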
Microsatellites and homopolymers were identified in the hg19 reference genome using default parameters (homopolymer length ≥ 5, microsatellite repeat unit length ≤ 5) followed by subsampling to 50,000 sites randomly distributed across the genome. Somatic status of sites with sufficient coverage (20 spanning reads in both normal and tumour samples) was determined using default settings (median 1369 sites tested per sample). The percentage of tested sites that were unstable in the tumour sample compared to the normal sample was used as a measure of MSI. General classification of MSI status was as follows: microsatellite stable (< 10%), MSI-low (10-30%) and MSI-high (> 30% of somatic sites unstable). Four out of six cases that had conventional immunohistochemical testing for MMR deficiency and an MSI score of ≥ 10% (MSI-low or MSI-high) also tested positive for MMR deficiency by immunohistochemistry, supporting the accuracy of MSIsensor analysis.

5.4.6 Quantifying Gene Expression from Transcriptomes

Transcriptomes were repositioned using JAGuaR (version 2.0.3) (Butterfield et al., 2014). Differential expression analysis was performed by comparing reads per kilobase of transcript per million mapped reads (RPKM) values against a compendium of 16 normal tissues from the Illumina BodyMap 2.0 project (available from ArrayExpress, queryID: E-MTAB-513) as previously described (Jones et al., 2010). For every sample, the expression percentile of each gene was computed against expression data for that gene across all samples from TCGA.

5.4.7 Retrospective Clinical Review

A retrospective review of chemotherapy exposures including treatment start and end dates was performed, aided by a provincial clinical cancer database (Wu et al., 2013).
Additionally, relevant patient demographics such as age at diagnosis and tobacco smoking history were obtained.

5.4.8 Analysis of Drug-Signature Associations

Chemotherapy exposure data were available in 408 out of 484 patients, who altogether had been exposed to 119 distinct chemotherapy drug types. Among the 20 most commonly used chemotherapy agents, 7 with known DNA damaging qualities were chosen for investigation: cyclophosphamide, cisplatin, fluorouracil, doxorubicin, capecitabine, carboplatin, and oxaliplatin. Late-arising signatures and those of unknown etiology (5, 17, M2, M3, M8) were each assessed for differences in exposure between therapy-exposed and non-exposed patients by the Wilcoxon signed-rank test. Resulting p-values were adjusted for multiple hypothesis testing using the Bonferroni-Holm method.

6 Conclusion

6.1 Summary of Major Findings

This thesis had three major aims. First, to assess the association between DNA damage repair mutation signatures and response to DNA-damaging chemotherapy. Second, to enable accurate individualized mutation signature decomposition and temporal dissection. Last, to characterize the evolution of mutation signatures in metastatic cancers.

To assess the clinical actionability of mutation signatures, we studied HR as a model system. This allowed us to build upon knowledge that cancers with BRCA1 and BRCA2 mutations are more sensitive to platinum-based chemotherapy. Moreover, past efforts to quantify genomic scars as predictors of BRCA1/BRCA2 mutation (Timms et al., 2014) and platinum response (Telli et al., 2016) already demonstrated promising findings in primary breast cancers, but did not replicate in the metastatic setting (Tutt et al., 2015). In chapter 2, we used WGS of advanced breast cancers to demonstrate that response to platinum-based chemotherapies was associated with mutation signatures of HRD.
The novel aspect of this work was the integration of multiple independent signatures of various mutation types, whereas previous studies had relied only upon signatures of CNV and LOH. Our findings suggest that genome-scale analysis of mutational processes can more accurately inform the clinical management of cancers with HRD.

The use of mutation signatures in clinical guidance calls for accurate individualized decomposition of signature exposures. However, the majority of mutation signature methods perform de novo inference of signatures from large cancer datasets. As part of our breast cancer study, we found that signature inference by NNLS using the 30 signature reference set from COSMIC yielded accurate n-of-1 HRD predictions. We built upon this in chapter 3 by proposing SignIT, a hierarchical Bayesian model which performs accurate, robust, and interpretable individualized inference of mutation signatures. Using simulated data and WES mutation calls from TCGA, we demonstrated SignIT’s superiority over alternative approaches.

A challenge in the interpretation of mutation signatures or genomic scars is that they represent the aggregate mutational history of a tumour rather than the relevant mutational processes still active at the time of treatment. Serial sequencing at multiple timepoints could map out the mutational trajectory, but would be inconvenient, costly, and impose additional medical procedures. Instead, we extended SignIT to integrate genomic read depth data in order to infer the presence of temporally distinct mutational subpopulations. This enables the tracking of mutation signatures from a single sequencing timepoint. Using data from multiply sequenced metastatic tumours, we showed that early prevalent subpopulations carried signatures similar to those of the primary, whereas later-arising subpopulations diverged.
By directly inferring tumour subpopulations rather than partitioning mutations using hard assumptions, SignIT improved upon previous attempts at reconstructing mutation signature timing.

In chapter 4, we demonstrated clinical implications of mutation signature timing in a pancreatic adenocarcinoma with a germline BRCA1 variant but paradoxically low SV burden and HRD signature exposure. We synthesized the timing of chr17 LOH and of the HRD signature using both computational techniques and comparison to the primary. The findings suggested later than expected somatic LOH of BRCA1 and onset of HRD, which may reconcile the BRCA1 variant, the low HRD signature, and the cancer’s exceptional response to FOLFIRINOX therapy.

The development of SignIT allowed us to address the final aim. To date, mutational processes have been studied almost exclusively in primary, treatment-naive cancers. Successful application of mutation signatures to the analysis of advanced cancers will require an understanding of the unique forces shaping somatic mutation in metastases. In chapter 5, we deciphered mutation signatures from the whole genomes of nearly 500 metastatic, treated cancers. This uncovered novel signatures, one of which (signature M3) has been shown to arise in vitro from cells treated with cisplatin (Boot et al., 2017). Additionally, the HRD-associated signature 3 was suppressed in the late-arising mutations of cancers previously exposed to platinum-based chemotherapy, which hints at potential resistance mechanisms. These findings confirm that metastatic cancers are characterized by shifts in mutagenesis borne of selective pressures and exposure to DNA damaging therapies.

6.2 The Clinical Implications of Genomic Instability

Mutation signatures blur the line between the genotype and phenotype. While mutagenesis shapes the cancer’s genotype, it also reveals the integrity of its DNA repair processes.
By leveraging the whole genome as a functional assay, mutation signatures permit biologists to directly view the effects of DNA repair deficiencies whether or not their root cause is identifiable.

DNA repair mechanisms, such as HR, are complex and not fully understood. For example, not every HRD tumour is explained by mutation or hypermethylation of BRCA1/BRCA2. Whereas BRCA1/BRCA2 mutations underlie 5-10% of breast cancer, as many as 22% of primary breast cancers carry signatures of HRD (Davies et al., 2017). Furthermore, of the observed mutations, many are VUS, without clear evidence linking them to breast cancer risk. Polak et al. (2017) recently demonstrated that mutation signatures can help to delineate the functional relevance of mutations in DNA repair genes. This has ramifications both for guidance of treatment and for the screening of hereditary cancer risk. The mutation signature of a targetable pathway can serve as an indicator of that pathway’s function.

However, even in objective responders to platinum-based chemotherapy, recurrence rates are high (Dent et al., 2007; Nagourney et al., 2000; Sirohi et al., 2008). Follow-up analysis by WGS offers the potential to probe the origins of acquired drug resistance (Jones et al., 2010). For instance, HRD tumours have been observed to acquire secondary mutations which restore the reading frame of BRCA1/BRCA2 (Patch et al., 2015; Swisher et al., 2008). Here, the temporal dissection of mutation signatures may come into play. We found that breast cancers with prior exposure to platinum-based chemotherapy exhibited a decrease in HRD signature activity in late mutations. Again, the analysis of mutational processes may obviate the need to identify the specific somatic event giving rise to resistance, so long as a late shift in the signature is observed.

In contrast to HRD suppression, the pancreatic case study in chapter 4 demonstrated the principle of late-onset HRD.
As a common source of cancer susceptibility, HRD is thought to be an early cancer driver. However, the analysis of mutation signature timing in primary cancers by McGranahan et al. (2015) and our own temporal analysis in metastatic cancers (chapter 5) suggest that HRD may require time to accrue a notable mutation burden. In the clinical setting, this could result in the discounting of a low but active HRD signature and the missed opportunity for targeted treatment. Therefore, it is preferable to delineate currently active processes from historically active ones in order to appreciate the full timeline of actionable mutagenesis.

6.3 The Mutational Processes of Metastatic Cancers

Studying the association of mutation signatures with drug exposures was made possible by the availability of clinical treatment data. This infrastructure allowed the screening and joint modeling of drug-signature interactions. The association between signature M3 and cisplatin exposure emerged directly from this analysis. Aside from the temozolomide signature, this is the first verified signature of a chemotherapy exposure independently discovered in patient samples by NMF.

Signature M3 appears to be a specific, but not sensitive, marker of platinum exposure, as many platinum-exposed samples do not display the signature. It is not yet clear whether its accrual depends on the loss of specific DNA repair processes as with signature 11 in temozolomide treatment. It is also unclear whether signature M3 could be a marker of acquired drug sensitivity or resistance. The signature’s unbiased temporal occurrence suggests that it may be a remnant of DNA damage retained through the proliferation of a resistant clone. The lack of late bias also suggests that, in a small fraction of tumours, cisplatin treatment may induce as many as 2 mutations per megabase and could shape early tumour cells which seed to metastatic sites.
This finding calls for further study of the clinical consequences of cisplatin-dependent mutagenesis, especially in metastatic cancer types where signature M3 is frequently observed.

The association of signature 17 with oxaliplatin and fluorouracil exposure stands in contrast to signature M3 because of its bias towards late-arising mutations. Therefore, we posit that signature 17 may result from direct drug exposure of the metastatic cancer rather than the expansion of a resistant clone. However, independent validation of this signature in fluorouracil/oxaliplatin treated cells is necessary, as no studies have yet examined the mutagenic profile of oxaliplatin.

Prior to the successful induction of signature M3 in human cell lines by cisplatin treatment (Boot et al., 2017), mutagenic profiles were also accumulated in vitro via cisplatin treatment of chicken DT40 lymphoblast cells (Szikriszt et al., 2016) and Caenorhabditis elegans (Meier et al., 2014). However, these signatures have not subsequently been identified in sequenced human samples. The distinction suggests that signatures generated experimentally in model organisms may not be as applicable to human cancers as those generated in human cells. This further implies that mutation signatures stem from a delicate interplay between mutagens and repair pathways, and that subtle variation between species can dramatically alter mutagenesis.

6.4 Limitations

The study of platinum response in breast cancer was part of a larger study with the goal to guide individual treatment decisions using WGTA. The population studied was selected for inclusion, and may not reflect the full population of metastatic breast cancers. Additionally, this project occurred in two phases, the first of which included the first 100 cases (Laskin et al., 2015).
Sequencing protocols differed slightly between the two phases, which could introduce batch effects. However, bioinformatics pipelines are standardized across samples and earlier cases are occasionally re-run with updated analysis tools.

The clinical data on treatment durations were derived from a provincial database of pharmacy records at cancer centres (Wu et al., 2013). Some treatments are missing from this data, especially those delivered in different jurisdictions or health authorities, or which were part of certain clinical trials. However, data on standard treatments, such as the platinum-based chemotherapies discussed here, were near complete. Additionally, for the breast cancers studied in chapter 2, missing treatment dates were reintroduced during retrospective clinical review.

Another major limitation is the lack of standard timelines for the assessment of treatment response. This precluded the use of standard objective response criteria such as RECIST (Eisenhauer et al., 2009) and necessitated the creation of a custom response scale. Treatment duration was also available as a secondary measure of patient outcome, and was found to correlate with rated treatment responses.

A limitation of SignIT is the assumption that CNVs are clonal, meaning they are identical across all subclones of a tumour. Some methods such as TITAN (Ha et al., 2014) can estimate the cellular prevalence of CNVs. Future iterations of SignIT could provide an option to include such estimates within the model to more precisely adjust the expected variant allele counts.

Lastly, in the temporal analysis of mutation signatures by SignIT, slightly more than half of cancers fit a single-population model. This suggests that there was insufficient clonal diversity and/or too few SNVs in regions of copy number variation to accurately estimate the timing of signatures. Deeper sequencing may be necessary to uncover identifiable tumour subpopulations in these cases.
It is not certain, however, whether cancers which tend to fit a single-population model also systematically differ in the timing of mutational processes. If so, then the removal of these cases from analysis could confound the findings in chapter 5 and harm generalizability.

6.5 A Role for Mutation Signatures in Precision Oncology

Since 2014, the POG project has incorporated mutation signatures in the treatment-focused personalized analysis of cancer genomes. HRD scores were introduced in late 2016. Within the first 139 breast cancers, over 25% of cases were deemed to have an actionable target based on mutation signature, HRD score, or mutation burden. A major limitation is that distinct actionable targets from mutation signatures remain limited to HRD (for platinum/PARP inhibitors) and hypermutation (for immunotherapy). In addition to POG, other personalized cancer sequencing initiatives have also incorporated some element of mutation signature analysis (Tuxen et al., 2016; Zehir et al., 2017).

Cost remains a barrier to the integration of mutation signatures at scale into clinical care. The accuracy of mutation signature decomposition improves with increased sampling of mutations. For example, the signature 3 exposure of a typical metastatic breast cancer varied from 0 to 10,000 mutations, and total mutation burdens from 0 to 60,000. WES yields approximately 100 times fewer mutations, and targeted panels fewer still. When partitioning mutations across 96 classes, it can thus be challenging to identify clinically relevant signatures with any confidence. Using a large targeted panel, Zehir et al. (2017) could quantify signatures only of hypermutating processes (POLE, MMR, tobacco, UV, and temozolomide), even with 10,000 samples. Worse still, SV signatures are infeasible at the scale of exomes. Low-depth sequencing, such as employed by Nik-Zainal et al. (2016), is a potential cost-saving solution for mutation signature analysis.
However, this would render temporal dissection of mutational processes challenging or impossible without follow-up targeted sequencing of mutated loci.

The integration of mutation signatures into clinical practice will likely depend on the feasibility of WGS itself. The value added by WGS rises with continued characterization of the cancer genome and the proliferation of datasets supporting the contextualization of clinically relevant findings. Meanwhile, sequencing costs continue to fall, but the cost of genome analysis has not followed suit. Weymann et al. (2017) showed that the cost of WGTA within POG was $34,886 per patient from 2012 to 2015 with a downwards trajectory driven primarily by decreasing sequencing cost. The construction of automated analysis pipelines to surface known and hypothetical actionable targets needs to be a continued focus to realize the goal of scalable precision oncogenomics.

6.6 Future Research Directions

In the meantime, there remain many opportunities for research into the actionability of DNA repair deficiency. Within the field of HRD, a well-designed prospective trial leveraging HRDetect or a similar aggregate measure of HRD is necessary. Such a trial could compare the response to cisplatin/gemcitabine treatment between HRD and non-HRD breast cancers. Moreover, the recent approval of olaparib for use in germline BRCA1/BRCA2 mutated breast cancers (Center for Drug Evaluation and Research, 2018) raises the possibility of a PARP inhibitor trial. Also promising is the recent development of a DNA G-quadruplex stabilizer (Xu et al., 2017), which is now in phase I clinical trial (Canadian trial NCT02719977). G-quadruplex structures are sites of frequent DNA damage which requires repair by HR. Inducing or stabilizing G-quadruplex structures in the context of HRD may facilitate tumour-targeted cell death.

Another priority is better characterization of HRD’s actionability in cancer types other than breast cancer.
HRD has also been observed in ovarian, pancreatic, and gastric cancers (Alexandrov et al., 2015b; Davies et al., 2017), as well as osteosarcoma (Kovac et al., 2015), wherein the PARP inhibitor talazoparib has been effective in vitro (Engert et al., 2017).

Regarding platinum resistance, we hypothesized in chapter 5 that HR restoration could indicate acquired resistance to platinum-based chemotherapy. This finding may eventually inform development of biomarkers to monitor for drug resistance, similar to how HRD onset could be a marker of drug sensitivity. However, there has not yet been evidence showing that BRCA1/BRCA2 reversion mutations lead to a drop in HRD mutation signature activity. Unfortunately, no BRCA1/BRCA2 reversions have been confirmed in our metastatic cancer cohort because paired primary cancers were sequenced in only select cases. However, a WGS study of 92 chemoresistant ovarian cancers confirmed five cases of BRCA1/BRCA2 reversion. Data from this study are available via European Genome-Phenome Archive (EGA), and could be used to assess the evolution of HRD mutation signatures in association with BRCA1/BRCA2 reversion. Another promising approach is the sequencing of circulating tumour DNA to monitor for reversion mutations in BRCA1/BRCA2 (Christie et al., 2017; Mayor et al., 2017; Weigelt et al., 2017). Exome-scale capture of circulating tumour DNA could eventually enable non-invasive monitoring of mutation signatures, which would dramatically improve feasibility of clinical application.

While this thesis has focused on the timing of SNV mutation signatures, the timing of SV signatures remains a challenge. Graph-based approaches have been proposed to reconstruct genome-wide rearrangement histories (Greenman et al., 2012), but outstanding challenges exist relating both to analytical difficulties and the accuracy of CNV and SV callers (Maciejowski and Imielinski, 2016).
Likewise, the timing of CNV-based signatures such as the HRD score could be achieved by methods such as those described by Purdom et al. (2013) and applied in chapter 4, but these in turn depend upon knowledge of rearrangement history. Despite these challenges, early attempts to chart the timing of common cancer driver events have already been made (Gerstung et al., 2017).

6.7 Looking Forward: Biomarker Discovery in the Era of Genomic Data

If nothing else, I hope that this thesis has conveyed the importance and complexity of relating a novel WGS biomarker, the mutation signature, to its therapeutic potential. There are fundamental biological questions to consider, such as the confluence of factors which shape and alter mutation signatures, and whether mutational processes evolve over time. There are also technical details to unmask, such as signature bleed. Most importantly, there is the feedback loop which enables the hypothesis, discovery, and follow-up of relevant clinical associations.

For some forward-thinking jurisdictions, collating the complete genomic information of tumours coupled with extensive clinical information will provide an unprecedented research platform to understand the mechanisms underlying therapeutic response, acquired resistance, and failure. Furthermore, the serial application of WGTA, undertaken many times during the course of the disease, could provide a real-time view of cancer progression and treatment response. This feedback loop will be invaluable for the study of cancers where the goal is to improve disease stratification and therapeutic intervention.

I have been privileged to glimpse the earliest of genomics applications aimed at guiding cancer treatment decision-making.
I hope that, in the coming era of genomic medicine, these efforts expand and continue to generate insights, supported by the clinical infrastructure necessary to render them actionable.

Bibliography

Afghahi, A., Timms, K.M., Vinayak, S., Jensen, K.C., Kurian, A.W., Carlson, R.W., Chang, P.-J., Schackmann, E., Hartman, A.-R., Ford, J.M., et al. (2017). Tumor BRCA1 Reversion Mutation Arising during Neoadjuvant Platinum-Based Chemotherapy in Triple-Negative Breast Cancer Is Associated with Therapy Resistance. Clinical Cancer Research 23.

Alaei-Mahabadi, B., Bhadury, J., Karlsson, J.W., Nilsson, J.A., and Larsson, E. (2017). Global analysis of somatic structural genomic alterations and their impact on gene expression in diverse human cancers. PNAS 113, 13768–13773.

Albertsen, H., Smith, S.A., Mazoyer, S., Fujimoto, E., Stevens, J., Williams, B., Rodriguez, P., Cropp, C.S., Slijepcevic, P., Carlson, M., et al. (1994). A physical map and candidate genes in the BRCA1 region on chromosome 17q12–21. Nature Genetics 7, 472–479.

Alexandrov, L.B., and Stratton, M.R. (2014). Mutational signatures: The patterns of somatic mutations hidden in cancer genomes.

Alexandrov, L., Jones, P., Wedge, D., Sale, J., Campbell, P., Nik-Zainal, S., and Stratton, M. (2015a). Clock-like mutational processes in human somatic cells. Nat Commun.

Alexandrov, L.B., Nik-Zainal, S., Wedge, D.C., Aparicio, S.A.J.R., Behjati, S., Biankin, A.V., Bignell, G.R., Bolli, N., Borg, A., Børresen-Dale, A.-L., et al. (2013a). Signatures of mutational processes in human cancer. Nature 500, 415–421.

Alexandrov, L.B., Nik-Zainal, S., Wedge, D.C., Campbell, P.J., and Stratton, M.R. (2013b). Deciphering Signatures of Mutational Processes Operative in Human Cancer. Cell Reports 3, 246–259.

Alexandrov, L.B., Nik-Zainal, S., Siu, H.C., Leung, S.Y., and Stratton, M.R. (2015b).
A mutational signature in gastric cancer suggests therapeutic strategies. Nature Communications 6.

Alexandrov, L.B., Ju, Y.S., Haase, K., Van Loo, P., Martincorena, I., Nik-Zainal, S., Totoki, Y., Fujimoto, A., Nakagawa, H., Shibata, T., et al. (2016). Mutation signatures associated with tobacco smoking in human cancer. Science 354, 618–622.

André, T., Boni, C., Mounedji-Boudiaf, L., Navarro, M., Tabernero, J., Hickish, T., Topham, C., Zaninelli, M., Clingan, P., Bridgewater, J., et al. (2004). Oxaliplatin, Fluorouracil, and Leucovorin as Adjuvant Treatment for Colon Cancer. New England Journal of Medicine 350, 2343–2351.

Arun, B., Bayraktar, S., Liu, D.D., Barrera, A.M.G., Atchley, D., Pusztai, L., Litton, J.K., Valero, V., Meric-Bernstam, F., and Hortobagyi, G.N. (2011). Response to neoadjuvant systemic therapy for breast cancer in BRCA mutation carriers and noncarriers: a single-institution experience. Journal of Clinical Oncology 29, 3739–3746.

Baca, S.C., Prandi, D., Lawrence, M.S., Mosquera, J.M., Romanel, A., Drier, Y., Park, K., Kitabayashi, N., MacDonald, T.Y., Ghandi, M., et al. (2013). Punctuated Evolution of Prostate Cancer Genomes. Cell 153, 666–677.

Baez-Ortega, A., and Gori, K. (2017). Computational approaches for discovery of mutational signatures in cancer. Briefings in Bioinformatics 69, 26–33.

Banerji, S., Cibulskis, K., Rangel-Escareno, C., Brown, K.K., Carter, S.L., Frederick, A.M., Lawrence, M.S., Sivachenko, A.Y., Sougnez, C., Zou, L., et al. (2012). Sequence analysis of mutations and translocations across breast cancer subtypes. Nature 486, 405–409.

Behjati, S., Gundem, G., Wedge, D., Roberts, N., Tarpey, P., Cooke, S., Van, L., Alexandrov, L., Ramakrishna, M., Davies, H., et al. (2016). Mutational signatures of ionizing radiation in second malignancies. Nature Communications 12605.

Betancourt, M. (2017).
Identifying Bayesian Mixture Models.

Birkbak, N.F., Wang, Z.C., Kim, J.-Y., Eklund, A.C., Li, Q., Tian, R., Bowman-Colin, C., Li, Y., Greene-Colozzi, A., Iglehart, J.D., et al. (2012). Telomeric allelic imbalance indicates defective DNA repair and sensitivity to DNA-damaging agents. Cancer Discovery 2.

Boot, A., Huang, M., Ng, A.W.T., Kawakami, Y., Chayama, K., Teh, B.T., Nakagawa, H., and Rozen, S.G. (2017). In-depth characterization of the cisplatin mutational signature in a human cell line and in esophageal and liver tumors. bioRxiv 189233.

Bosdet, I.E., Docking, T.R., Butterfield, Y.S., Mungall, A.J., Zeng, T., Coope, R.J., Yorida, E., Chow, K., Bala, M., Young, S.S., et al. (2013). A Clinically Validated Diagnostic Second-Generation Sequencing Assay for Detection of Hereditary BRCA1 and BRCA2 Mutations. The Journal of Molecular Diagnostics 15, 796–809.

Bose, P., Pleasance, E.D., Jones, M., Shen, Y., Ch’ng, C., Reisle, C., Schein, J.E., Mungall, A.J., Moore, R., and Ma, Y. (2015). Integrative genomic analysis of ghost cell odontogenic carcinoma. Oral Oncology 51, e71–e75.

Bruin, E.C. de, McGranahan, N., Mitter, R., Salm, M., Wedge, D.C., Yates, L., Jamal-Hanjani, M., Shafi, S., Murugaesu, N., Rowan, A.J., et al. (2014). Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science (New York, N.Y.) 346, 251–256.

Butterfield, Y.S., Kreitzman, M., Thiessen, N., Corbett, R.D., Li, Y., Pang, J., Ma, Y.P., Jones, S.J.M., and Birol, İ. (2014). JAGuaR: junction alignments to genome for RNA-seq reads. PloS One 9, e102398.

Byrski, T., Gronwald, J., Huzarski, T., Grzybowska, E., Budryk, M., Stawicka, M., Mierzwa, T., Szwiec, M., Wiśniowski, R., and Siolek, M. (2010). Pathologic complete response rates in young women with BRCA1-positive breast cancers after neoadjuvant chemotherapy. Journal of Clinical Oncology 28, 375–379.

Canadian Cancer Society (2014).
BRCA gene mutations.

Carpenter, B., Gelman, A., Hoffman, M.D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., and Riddell, A. (2017). Stan: A Probabilistic Programming Language. Journal of Statistical Software; Vol 1, Issue 1 (2017).

Center for Drug Evaluation and Research (2018). Approved Drugs - FDA approves olaparib for germline BRCA-mutated metastatic breast cancer.

Cerami, E., Gao, J., Dogrusoz, U., Gross, B.E., Sumer, S.O., Aksoy, B.A., Jacobsen, A., Byrne, C.J., Heuer, M.L., Larsson, E., et al. (2012). The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discovery 2, 401–404.

Cerbinskaite, A., Mukhopadhyay, A., Plummer, E.R., Curtin, N.J., and Edmondson, R.J. (2012). Defective homologous recombination in human cancers. Cancer Treatment Reviews 38, 89–100.

Chaffer, C.L., and Weinberg, R.A. (2011). A Perspective on Cancer Cell Metastasis. Science 331, 1559–1564.

Chang, H.H.Y., Pannunzio, N.R., Adachi, N., and Lieber, M.R. (2017). Non-homologous DNA end joining and alternative pathways to double-strand break repair. Nature Reviews Molecular Cell Biology 18, 495–506.

Chia, S.K., Bramwell, V.H., Tu, D., Shepherd, L.E., Jiang, S., Vickery, T., Mardis, E., Leung, S., Ung, K., Pritchard, K.I., et al. (2012). A 50-Gene Intrinsic Subtype Classifier for Prognosis and Prediction of Benefit from Adjuvant Tamoxifen. Clinical Cancer Research 1–8.

Chmielecki, J., Crago, A.M., Rosenberg, M., O’Connor, R., Walker, S.R., Ambrogio, L., Auclair, D., McKenna, A., Heinrich, M.C., Frank, D.A., et al. (2013). Whole-exome sequencing identifies a recurrent NAB2-STAT6 fusion in solitary fibrous tumors. Nature Genetics 45, 131–132.

Chong, L.C., Twa, D.D.W., Mottok, A., Ben-Neriah, S., Woolcock, B.W., Zhao, Y., Savage, K.J., Marra, M.A., Scott, D.W., Gascoyne, R.D., et al. (2016). Comprehensive characterization of programmed death ligand structural rearrangements in B-cell non-Hodgkin lymphomas.
Blood 128, 1206–1213.

Chong, Z., Ruan, J., Gao, M., Zhou, W., Chen, T., Fan, X., Ding, L., Lee, A.Y., Boutros, P., Chen, J., et al. (2017). novoBreak: local assembly for breakpoint detection in cancer genomes. Nature Methods 14, 65–67.

Christie, E.L., Fereday, S., Doig, K., Pattnaik, S., Dawson, S.-J., and Bowtell, D.D. (2017). Reversion of BRCA1/2 Germline Mutations Detected in Circulating Tumor DNA From Patients With High-Grade Serous Ovarian Cancer. Journal of Clinical Oncology 35, 1274–1280.

Conroy, T., Desseigne, F., Ychou, M., Bouché, O., Guimbaud, R., Bécouarn, Y., Adenis, A., Raoul, J.-L., Gourgou-Bourgade, S., Fouchardière, C. de la, et al. (2011). FOLFIRINOX versus Gemcitabine for Metastatic Pancreatic Cancer. New England Journal of Medicine 364, 1817–1825.

Davies, H., Glodzik, D., Morganella, S., Yates, L.R., Staaf, J., Zou, X., Ramakrishna, M., Martin, S., Boyault, S., and Sieuwerts, A.M. (2017). HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nature Medicine.

De Leeneer, K., Hellemans, J., De Schrijver, J., Baetens, M., Poppe, B., Van Criekinge, W., De Paepe, A., Coucke, P., and Claes, K. (2011). Massive parallel amplicon sequencing of the breast cancer genes BRCA1 and BRCA2: opportunities, challenges, and limitations. Human Mutation 32, 335–344.

Dent, R., Trudeau, M., Pritchard, K.I., Hanna, W.M., Kahn, H.K., Sawka, C.A., Lickley, L.A., Rawlinson, E., Sun, P., and Narod, S.A. (2007). Triple-Negative Breast Cancer: Clinical Features and Patterns of Recurrence. Clinical Cancer Research 13, 4429–4434.

Dhillon, K.K., Swisher, E.M., and Taniguchi, T. (2011). Secondary mutations of BRCA1/2 and drug resistance. Cancer Science 102, 663–669.

Ding, J., Bashashati, A., Roth, A., Oloumi, A., Tse, K., Zeng, T., Haffari, G., Hirst, M., Marra, M.A., Condon, A., et al. (2012). Feature-based classifiers for somatic mutation detection in tumour-normal paired sequencing data. Bioinformatics 28, 167–175.

Drost, J., Boxtel, R.
van, Blokzijl, F., Mizutani, T., Sasaki, N., Sasselli, V., Ligt, J. de, Behjati, S., Grolleman, J.E., Wezel, T. van, et al. (2017). Use of CRISPR-modified human stem cell organoids to study the origin of mutational signatures in cancer. Science (New York, N.Y.) 358, 234–238.

Eisenhauer, E., Therasse, P., Bogaerts, J., Schwartz, L., Sargent, D., Ford, R., Dancey, J., Arbuck, S., Gwyther, S., Mooney, M., et al. (2009). New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1). European Journal of Cancer 45, 228–247.

Ellis, M.J., Ding, L., Shen, D., Luo, J., Suman, V.J., Wallis, J.W., Van Tine, B.A., Hoog, J., Goiffon, R.J., Goldstein, T.C., et al. (2012). Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature 486, 353–360.

Engert, F., Kovac, M., Baumhoer, D., Nathrath, M., and Fulda, S. (2017). Osteosarcoma cells with genetic signatures of BRCAness are susceptible to the PARP inhibitor talazoparib alone or in combination with chemotherapeutics. Oncotarget 8, 48794–48806.

Farmer, H., McCabe, N., Lord, C.J., Tutt, A.N.J., Johnson, D.A., Richardson, T.B., Santarosa, M., Dillon, K.J., Hickson, I., Knights, C., et al. (2005). Targeting the DNA repair defect in BRCA mutant cells as a therapeutic strategy. Nature 434, 917–921.

Fischer, A., Illingworth, C.J., Campbell, P.J., and Mustonen, V. (2013). EMu: probabilistic inference of mutational processes and their localization in the cancer genome. Genome Biology 14, R39.

Fischer, A., Vázquez-García, I., Illingworth, C.J.R., and Mustonen, V. (2014). High-definition reconstruction of clonal composition in cancer. Cell Reports 7, 1740–1752.

Foley, S.B., Rios, J.J., Mgbemena, V.E., Robinson, L.S., Hampel, H.L., Toland, A.E., Durham, L., and Ross, T.S. (2015). Use of Whole Genome Sequencing for Diagnosis and Discovery in the Cancer Genetics Clinic.
EBioMedicine 2, 74–81.

Gao, J., Aksoy, B.A., Dogrusoz, U., Dresdner, G., Gross, B., Sumer, S.O., Sun, Y., Jacobsen, A., Sinha, R., Larsson, E., et al. (2013). Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal. Science Signaling 6, pl1–pl1.

Gehring, J.S., Fischer, B., Lawrence, M., and Huber, W. (2015). SomaticSignatures: inferring mutational signatures from single-nucleotide variants. Bioinformatics btv408.

Gelmon, K.A., Tischkowitz, M., Mackay, H., Swenerton, K., Robidoux, A., Tonkin, K., Hirte, H., Huntsman, D., Clemons, M., Gilks, B., et al. (2011). Olaparib in patients with recurrent high-grade serous or poorly differentiated ovarian carcinoma or triple-negative breast cancer: a phase 2, multicentre, open-label, non-randomised study. The Lancet Oncology 12, 852–861.

Gerstung, M., Jolly, C., Leshchiner, I., Dentro, S.C., Gonzalez, S., Mitchell, T.J., Rubanova, Y., Anur, P., Rosebrock, D., Yu, K., et al. (2017). The evolutionary history of 2,658 cancers. bioRxiv 161562.

Gonzalez-Perez, A., Perez-Llamas, C., Deu-Pons, J., Tamborero, D., Schroeder, M.P., Jene-Sanz, A., Santos, A., and Lopez-Bigas, N. (2013). IntOGen-mutations identifies cancer drivers across tumor types. Nature Methods 10, 1081–1082.

Greenman, C., Pleasance, E., Newman, S., Yang, F., Fu, B., Nik-Zainal, S., Jones, D., Lau, K., Carter, N., Edwards, P., et al. (2012). Estimation of rearrangement phylogeny for cancer genomes. Genome Research 22, 346–361.

Griffith, M., Miller, C.A., Griffith, O.L., Krysiak, K., Skidmore, Z.L., Ramu, A., Walker, J.R., Dang, H.X., Trani, L., Larson, D.E., et al. (2015). Optimizing Cancer Genome Sequencing and Analysis. Cell Systems 1, 210–223.

Ha, G., Roth, A., Lai, D., Bashashati, A., Ding, J., Goya, R., Giuliany, R., Rosner, J., Oloumi, A., and Shumansky, K. (2012). Integrative analysis of genome-wide loss of heterozygosity and monoallelic expression at nucleotide resolution reveals disrupted pathways in triple-negative breast cancer.
Genome Research 22, 1995–2007.

Ha, G., Roth, A., Khattra, J., Ho, J., Yap, D., Prentice, L.M., Melnyk, N., McPherson, A., Bashashati, A., Laks, E., et al. (2014). TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data. Genome Research 24, 1881–1893.

Hall, J.M., Lee, M.K., Newman, B., Morrow, J.E., Anderson, L.A., Huey, B., and King, M.C. (1990). Linkage of early-onset familial breast cancer to chromosome 17q21. Science (New York, N.Y.) 250, 1684–1689.

Hanahan, D., and Weinberg, R.A. (2011). Hallmarks of cancer: the next generation. Cell 144, 646–674.

Helleday, T., Petermann, E., Lundin, C., Hodgson, B., and Sharma, R.A. (2008). DNA repair pathways as targets for cancer therapy. Nature Reviews Cancer 8, 193–204.

Huang, Y., and Li, L. (2013). DNA crosslinking damage and cancer - a tale of friend and foe. Translational Cancer Research 2, 144–154.

Huang, X., Wojtowicz, D., Przytycka, T.M., and Curtis, C. (2017). Detecting presence of mutational signatures in cancer with confidence. Bioinformatics 10.1093.

Iyer, G., Audenet, F., Middha, S., Carlo, M.I., Regazzi, A.M., Funt, S., Al-Ahmadie, H., Solit, D.B., Rosenberg, J.E., and Bajorin, D.F. (2017). Mismatch repair (MMR) detection in urothelial carcinoma (UC) and correlation with immune checkpoint blockade (ICB) response. J Clin Oncol 35, suppl; abstr 4511.

Jackman, S.D., Vandervalk, B.P., Mohamadi, H., Chu, J., Yeo, S., Hammond, S.A., Jahesh, G., Khan, H., Coombe, L., Warren, R.L., et al. (2017). ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Genome Research 27, 768–777.

Jacot, W., Theillet, C., Guiu, S., and Lamy, P.-J. (2015). Targeting triple-negative breast cancer and high-grade ovarian carcinoma: refining BRCAness beyond BRCA1/2 mutations? Future Oncology 11, 557–559.

Johnson, N., Fletcher, O., Palles, C., Rudd, M., Webb, E., Sellick, G., dos Santos Silva, I., McCormack, V., Gibson, L., Fraser, A., et al. (2007).
Counting potentially functional variants in BRCA1, BRCA2 and ATM predicts breast cancer susceptibility. Human Molecular Genetics 16, 1051–1057.

Jones, S.J.M., Laskin, J., Li, Y.Y., Griffith, O.L., An, J., Bilenky, M., Butterfield, Y.S., Cezard, T., Chuah, E., and Corbett, R. (2010). Evolution of an adenocarcinoma in response to selection by targeted kinase inhibitors. Genome Biology 11, 1–12.

Joosse, S.A. (2012). BRCA1 and BRCA2: a common pathway of genome protection but different breast cancer subtypes. Nature Reviews Cancer 12, 372–372.

Kaufman, B., Shapira-Frommer, R., Schmutzler, R.K., Audeh, M.W., Friedlander, M., Balmana, J., Mitchell, G., Fried, G., Bowen, K., and Fielding, A. (2013). Olaparib monotherapy in patients with advanced cancer and a germ-line BRCA1/2 mutation: An open-label phase II study. In ASCO Annual Meeting Proceedings, p. 11024.

Kennecke, H., Yerushalmi, R., Woods, R., Cheang, M.C.U., Voduc, D., Speers, C.H., Nielsen, T.O., and Gelmon, K. (2010). Metastatic Behavior of Breast Cancer Subtypes. Journal of Clinical Oncology 28, 3271–3277.

Kennedy, R.D., Quinn, J.E., Mullan, P.B., Johnston, P.G., and Harkin, D.P. (2004). The role of BRCA1 in the cellular response to chemotherapy. Journal of the National Cancer Institute 96, 1659–1668.

Kim, J., Mouw, K.W., Polak, P., Braunstein, L.Z., Kamburov, A., Tiao, G., Kwiatkowski, D.J., Rosenberg, J.E., Van Allen, E.M., D’Andrea, A.D., et al. (2016). Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors. Nature Genetics 48, 600–606.

Kohlmann, W., and Gruber, S.B. (1993). Lynch Syndrome (University of Washington, Seattle).

Kovac, M., Blattmann, C., Ribi, S., Smida, J., Mueller, N.S., Engert, F., Castro-Giner, F., Weischenfeldt, J., Kovacova, M., and Krieg, A. (2015). Exome sequencing of osteosarcoma reveals mutation signatures reminiscent of BRCA deficiency. Nature Communications 6.

Kuismanen, S.A., Holmberg, M.T., Salovaara, R., Chapelle, A. de la, and Peltomäki, P. (2000).
Genetic and Epigenetic Modification of MLH1 Accounts for a Major Share of Microsatellite-Unstable Colorectal Cancers. The American Journal of Pathology 156, 1773–1779.

Kumar-Sinha, C., and Chinnaiyan, A.M. (2018). Precision oncology in the age of integrative genomics. Nature Biotechnology 36, 46–60.

Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860–921.

Laskin, J., Jones, S., Aparicio, S., Chia, S., Ch’ng, C., Deyell, R., Eirew, P., Fok, A., Gelmon, K., Ho, C., et al. (2015). Lessons learned from the application of whole-genome analysis to the treatment of patients with advanced cancers. Molecular Case Studies 1.

Lauss, M., Donia, M., Harbst, K., Andersen, R., Mitra, S., Rosengren, F., Salim, M., Vallon-Christersson, J., Törngren, T., Kvist, A., et al. (2017). Mutational and putative neoantigen load predict clinical benefit of adoptive T cell therapy in melanoma. Nature Communications 8, 1738.

Lawrence, M.S., Stojanov, P., Polak, P., Kryukov, G.V., Cibulskis, K., Sivachenko, A., Carter, S.L., Stewart, C., Mermel, C.H., Roberts, S.A., et al. (2013). Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218.

Le, D.T., Uram, J.N., Wang, H., Bartlett, B.R., Kemberling, H., Eyring, A.D., Skora, A.D., Luber, B.S., Azad, N.S., Laheru, D., et al. (2015). PD-1 Blockade in Tumors with Mismatch-Repair Deficiency. New England Journal of Medicine 372, 2509–2520.

Letouzé, E., Shinde, J., Renault, V., Couchy, G., Blanc, J.-F., Tubacher, E., Bayard, Q., Bacq, D., Meyer, V., Semhoun, J., et al. (2017). Mutational signatures reveal the dynamic interplay of risk factors and cellular processes during liver tumorigenesis. Nature Communications 8, 1315.

Ley, T.J., Mardis, E.R., Ding, L., Fulton, B., McLellan, M.D., Chen, K., Dooling, D., Dunford-Shore, B.H., McGrath, S., Hickenbotham, M., et al. (2008).
DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456, 66–72.

Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760.

Li, H., and Durbin, R. (2010). Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595.

Li, X., and Heyer, W.-D. (2008). Homologous recombination in DNA repair and DNA damage tolerance. Cell Research 18, 99–113.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079.

Li, X., Yao, X., Wang, Y., Hu, F., Wang, F., Jiang, L., Liu, Y., Wang, D., Sun, G., and Zhao, Y. (2013). MLH1 promoter methylation frequency in colorectal cancer patients and related clinicopathological and molecular features. PloS One 8, e59064.

Lips, E.H., Mulder, L., Oonk, A., Kolk, L.E. van der, Hogervorst, F.B.L., Imholz, A.L.T., Wesseling, J., Rodenhuis, S., and Nederlof, P.M. (2013). Triple-negative breast cancer: BRCAness and concordance of clinical features with BRCA1-mutation carriers. British Journal of Cancer 108, 2172–2177.

Liu, B., Conroy, J.M., Morrison, C.D., Odunsi, A.O., Qin, M., Wei, L., Trump, D.L., Johnson, C.S., Liu, S., and Wang, J. (2015). Structural variation discovery in the cancer genome using next generation sequencing: computational solutions and perspectives. Oncotarget 6, 5477–5489.

Lord, C.J., and Ashworth, A. (2012). The DNA damage response and cancer therapy. Nature 481, 287–294.

Lord, C.J., and Ashworth, A. (2016). BRCAness revisited. Nature Reviews Cancer 16, 110–120.

Maciejowski, J., and Imielinski, M. (2016). Modeling cancer rearrangement landscapes: from pattern to mechanism, and back. Current Opinion in Systems Biology 1, 54–61.

Mantere, T., Winqvist, R., Kauppila, S., Grip, M., Jukkola-Vuorinen, A., Tervasmäki, A., Rapakko, K., and Pylkäs, K. (2016).
Targeted Next-Generation Sequencing Identifies a Recurrent Mutation in MCPH1 Associating with Hereditary Breast Cancer Susceptibility. PLOS Genetics 12, e1005816.

Mayor, P., Gay, L.M., Lele, S., and Elvin, J.A. (2017). BRCA1 reversion mutation acquired after treatment identified by liquid biopsy. Gynecologic Oncology Reports 21, 57–60.

McGranahan, N., Favero, F., Bruin, E.C. de, Birkbak, N.J., Szallasi, Z., and Swanton, C. (2015). Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Science Translational Medicine 7.

Meier, B., Cooke, S.L., Weiss, J., Bailly, A.P., Alexandrov, L.B., Marshall, J., Raine, K., Maddison, M., Anderson, E., Stratton, M.R., et al. (2014). C. elegans whole-genome sequencing reveals mutational signatures related to carcinogens and DNA repair deficiency. Genome Research 24, 1624–1636.

Meric-Bernstam, F., Farhangfar, C., Mendelsohn, J., and Mills, G.B. (2013). Building a personalized medicine infrastructure at a major cancer center. Journal of Clinical Oncology 31, 1849–1857.

Mestan, K.K., Ilkhanoff, L., Mouli, S., and Lin, S. (2011). Genomic sequencing in clinical trials. Journal of Translational Medicine 9, 222.

Miki, Y., Swensen, J., Shattuck-Eidens, D., Futreal, P.A., Harshman, K., Tavtigian, S., Liu, Q., Cochran, C., Bennett, L.M., and Ding, W. (1994). A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science (New York, N.Y.) 266, 66–71.

Miller, C.A., White, B.S., Dees, N.D., Griffith, M., Welch, J.S., Griffith, O.L., Vij, R., Tomasson, M.H., Graubert, T.A., Walter, M.J., et al. (2014). SciClone: Inferring Clonal Architecture and Tracking the Spatial and Temporal Patterns of Tumor Evolution. PLoS Computational Biology 10, e1003665.

Morris, V., Rao, X., Pickering, C., Foo, W.C., Rashid, A., Eterovic, K., Kim, T., Chen, K., Wang, J., Shaw, K., et al. (2017). Comprehensive Genomic Profiling of Metastatic Squamous Cell Carcinoma of the Anal Canal.
Molecular Cancer Research 15, 1542–1550.

Mulligan, J.M., Hill, L.A., Deharo, S., Irwin, G., Boyle, D., Keating, K.E., Raji, O.Y., McDyer, F.A., O’Brien, E., Bylesjo, M., et al. (2014). Identification and Validation of an Anthracycline/Cyclophosphamide–Based Chemotherapy Response Assay in Breast Cancer. JNCI: Journal of the National Cancer Institute 106, djt335.

Nagourney, R.A., Link, J.S., Blitzer, J.B., Forsthoff, C., and Evans, S.S. (2000). Gemcitabine Plus Cisplatin Repeating Doublet Therapy in Previously Treated, Relapsed Breast Cancer Patients. Journal of Clinical Oncology 18, 2245–2249.

Narod, S.A., Feunteun, J., Lynch, H.T., Watson, P., Conway, T., Lynch, J., and Lenoir, G.M. (1991). Familial breast-ovarian cancer locus on chromosome 17q12-q23. Lancet (London, England) 338, 82–83.

National Cancer Institute (2014). BRCA1 and BRCA2: Cancer Risk and Genetic Testing.

Nik-Zainal, S., and Morganella, S. (2017). Mutational Signatures in Breast Cancer: The Problem at the DNA Level. Clin Cancer Res 23, 2617–2629.

Nik-Zainal, S., Alexandrov, L.B., Wedge, D.C., Van Loo, P., Greenman, C.D., Raine, K., Jones, D., Hinton, J., Marshall, J., Stebbings, L.A., et al. (2012). Mutational processes molding the genomes of 21 breast cancers. Cell 149.

Nik-Zainal, S., Davies, H., Staaf, J., Ramakrishna, M., Glodzik, D., Zou, X., Martincorena, I., Alexandrov, L.B., Martin, S., and Wedge, D.C. (2016). Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534.

Niu, B., Ye, K., Zhang, Q., Lu, C., Xie, M., McLellan, M.D., Wendl, M.C., and Ding, L. (2014). MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 30, 1015–1016.

Norquist, B., Wurz, K.A., Pennil, C.C., Garcia, R., Gross, J., Sakai, W., Karlan, B.Y., Taniguchi, T., and Swisher, E.M. (2011). Secondary Somatic Mutations Restoring BRCA1/2 Predict Chemotherapy Resistance in Hereditary Ovarian Carcinomas.
Journal of Clinical Oncology 29, 3008–3015.

Nowell, P.C. (1976). The clonal evolution of tumor cell populations. Science 194, 23–28.

Nowell P., H.D. (1960). A minute chromosome in human chronic granulocytic leukemia. Science 132.

Patch, A.-M., Christie, E.L., Etemadmoghadam, D., Garsed, D.W., George, J., Fereday, S., Nones, K., Cowin, P., Alsop, K., and Bailey, P.J. (2015). Whole-genome characterization of chemoresistant ovarian cancer. Nature 521, 489–494.

Pilati, C., Shinde, J., Alexandrov, L., Assié, G., André, T., Hélias-Rodzewicz, Z., Doucoudray, R., Le, C., Zucman-Rossi, J., Emile, J., et al. (2017). Mutational signature analysis identifies MUTYH deficiency in colorectal cancers and adrenocortical carcinomas. Journal of Pathology 242.

Pleasance, E.D., Stephens, P.J., O’Meara, S., McBride, D.J., Meynert, A., Jones, D., Lin, M.-L., Beare, D., Lau, K.W., Greenman, C., et al. (2010a). A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184–190.

Pleasance, E.D., Cheetham, R.K., Stephens, P.J., McBride, D.J., Humphray, S.J., Greenman, C.D., Varela, I., Lin, M.-L., Ordóñez, G.R., Bignell, G.R., et al. (2010b). A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196.

Polak, P., Kim, J., Braunstein, L.Z., Karlic, R., Haradhavala, N.J., Tiao, G., Rosebrock, D., Livitz, D., Kübler, K., Mouw, K.W., et al. (2017). A mutational signature reveals alterations underlying deficient homologous recombination repair in breast cancer. Nature Genetics 49, 1476–1486.

Poon, S.L., Huang, M.N., Choo, Y., McPherson, J.R., Yu, W., Heng, H.L., Gan, A., Myint, S.S., Siew, E.Y., Ler, L.D., et al. (2015). Mutation signatures implicate aristolochic acid in bladder cancer development. Genome Medicine 7, 38.

Popova, T., Manié, E., Rieunier, G., Caux-Moncoutier, V., Tirapo, C., Dubois, T., Delattre, O., Sigal-Zafrani, B., Bollet, M., Longy, M., et al. (2012).
Ploidy and large-scale genomic instability consistently identify basal-like breast carcinomas with BRCA1/2 inactivation. Cancer Res 72.

Purdom, E., Ho, C., Grasso, C.S., Quist, M.J., Cho, R.J., and Spellman, P. (2013). Methods and challenges in timing chromosomal abnormalities within cancer samples. Bioinformatics 29, 3113–3120.

Ranjha, L., Howard, S.M., and Cejka, P. (2018). Main steps in DNA double-strand break repair: an introduction to homologous recombination and related processes. Chromosoma 127, 187–214.

Raphael, B.J., Hruban, R.H., Aguirre, A.J., Moffitt, R.A., Yeh, J.J., Stewart, C., Robertson, A.G., Cherniack, A.D., Gupta, M., Getz, G., et al. (2017). Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma. Cancer Cell 32, 185–203.e13.

Rausch, T., Zichner, T., Schlattl, A., Stütz, A.M., Benes, V., and Korbel, J.O. (2012). DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339.

Richards, S., Aziz, N., Bale, S., Bick, D., Das, S., Gastier-Foster, J., Grody, W.W., Hegde, M., Lyon, E., Spector, E., et al. (2015). Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in Medicine 17, 405–423.

Rizvi, N.A., Hellmann, M.D., Snyder, A., Kvistborg, P., Makarov, V., Havel, J.J., Lee, W., Yuan, J., Wong, P., and Ho, T.S. (2015). Mutational landscape determines sensitivity to PD-1 blockade in non–small cell lung cancer. Science 348, 124–128.

Roberts, S.A., Lawrence, M.S., Klimczak, L.J., Grimm, S.A., Fargo, D., Stojanov, P., Kiezun, A., Kryukov, G.V., Carter, S.L., Saksena, G., et al. (2013). An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nature Genetics 45, 970–976.

Robertson, G., Schein, J., Chiu, R., Corbett, R., Field, M., Jackman, S.D., Mungall, K., Lee, S., Okada, H.M., Qian, J.Q., et al. (2010).
De novo assembly and analysis of RNA-seq data. Nature Methods 7, 909–912.

Robinson, D.R., Wu, Y.-M., Lonigro, R.J., Vats, P., Cobain, E., Everett, J., Cao, X., Rabban, E., Kumar-Sinha, C., Raymond, V., et al. (2017). Integrative clinical genomics of metastatic cancer. Nature 548.

Robson, M., Im, S.-A., Senkus, E., Xu, B., Domchek, S.M., Masuda, N., Delaloge, S., Li, W., Tung, N., Armstrong, A., et al. (2017). Olaparib for Metastatic Breast Cancer in Patients with a Germline BRCA Mutation. New England Journal of Medicine 377, 523–533.

Rosales, R.A., Drummond, R.D., Valieris, R., Dias-Neto, E., and Silva, I.T. da (2017). signeR: an empirical Bayesian approach to mutational signature discovery. Bioinformatics 33, 8–16.

Rosenthal, R., McGranahan, N., Herrero, J., Taylor, B.S., and Swanton, C. (2016). DeconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biology 17, 31.

Roth, A., Khattra, J., Yap, D., Wan, A., Laks, E., Biele, J., Ha, G., Aparicio, S., Bouchard-Côté, A., Sohrab, &., et al. (2014). Pyclone: statistical inference of clonal population structure in cancer. Nature Methods 11.

Roy, R., Chun, J., and Powell, S.N. (2012). BRCA1 and BRCA2: different roles in a common pathway of genome protection. Nature Reviews Cancer 12, 68–78.

Rubio-Perez, C., Tamborero, D., Schroeder, M.P., Antolín, A.A., Deu-Pons, J., Perez-Llamas, C., Mestres, J., Gonzalez-Perez, A., and Lopez-Bigas, N. (2015). In silico prescription of anticancer drugs to cohorts of 28 tumor types reveals targeting opportunities. Cancer Cell 27, 382–396.

Saunders, C.T., Wong, W.S.W., Swamy, S., Becq, J., Murray, L.J., and Cheetham, R.K. (2012). Strelka: Accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28, 1811–1817.

Schulze, K., Imbeaud, S., Letouze, E., Alexandrov, L.B., Calderaro, J., Rebouissou, S., Couchy, G., Meiller, C., Shinde, J., Soysouvanh, F., et al. (2015).
Exome sequencing of hepatocellular carcinomas identifies new mutational signatures and potential therapeutic targets. Nat Genet 47, 505–511.

Scully, R., and Livingston, D.M. (2000). In search of the tumour-suppressor functions of BRCA1 and BRCA2. Nature 408, 429–432.

Scully, R., Chen, J., Plug, A., Xiao, Y., Weaver, D., Feunteun, J., Ashley, T., and Livingston, D.M. (1997a). Association of BRCA1 with Rad51 in mitotic and meiotic cells. Cell 88, 265–275.

Scully, R., Chen, J., Ochs, R.L., Keegan, K., Hoekstra, M., Feunteun, J., and Livingston, D.M. (1997b). Dynamic changes of BRCA1 subnuclear location and phosphorylation state are initiated by DNA damage. Cell 90, 425–435.

Segovia, R., Tam, A.S., and Stirling, P.C. (2015). Dissecting genetic and environmental mutation signatures with model organisms. Trends in Genetics 31, 465–474.

Serrano-Fernández, P., Debniak, T., Górski, B., Bogdanova, N., Dörk, T., Cybulski, C., Huzarski, T., Byrski, T., Gronwald, J., Wokołorczyk, D., et al. (2009). Synergistic interaction of variants in CHEK2 and BRCA2 on breast cancer risk. Breast Cancer Research and Treatment 117, 161–165.

Seyfried, T.N., and Huysentruyt, L.C. (2013). On the origin of cancer metastasis. Critical Reviews in Oncogenesis 18, 43–73.

Sheffield, B.S., Tinker, A.V., Shen, Y., Hwang, H., Li-Chang, H.H., Pleasance, E., Ch’ng, C., Lum, A., Lorette, J., and McConnell, Y.J. (2015). Personalized oncogenomics: clinical experience with malignant peritoneal mesothelioma using whole genome sequencing. PloS One 10, e0119689.

Shiraishi, Y., Tremmel, G., Miyano, S., and Stephens, M. (2015). A Simple Model-Based Approach to Inferring and Visualizing Cancer Mutation Signatures. PLOS Genetics 11, e1005657.

Sirohi, B., Arnedos, M., Popat, S., Ashley, S., Nerurkar, A., Walsh, G., Johnston, S., and Smith, I.E. (2008). Platinum-based chemotherapy in triple-negative breast cancer. Annals of Oncology 19.

Stan Development Team (2017).
shinystan: Interactive Visual and Numerical Diagnostics and Posterior Analysis for Bayesian Models.

Stecklein, S.R., and Sharma, P. (2014). Tumor homologous recombination deficiency assays: another step closer to clinical application? Breast Cancer Research : BCR 16, 409.

Stephens, P.J., Greenman, C.D., Fu, B., Yang, F., Bignell, G.R., Mudie, L.J., Pleasance, E.D., Lau, K.W., Beare, D., Stebbings, L.A., et al. (2011). Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40.

Stephens, P.J., Tarpey, P.S., Davies, H., Van Loo, P., Greenman, C., Wedge, D.C., Zainal, S.N., Martin, S., Varela, I., Bignell, G.R., et al. (2012). The landscape of cancer genes and mutational processes in breast cancer. Nature 486, 400.

Supek, F., and Lehner, B. (2017). Clustered Mutation Signatures Reveal that Error-Prone DNA Repair Targets Mutations to Active Genes. Cell 170, 534–547.e23.

Surgeon General (1964). Report of the Advisory Committee to the Surgeon General of the Public Health Service. US Department of Health, Education and Welfare, Public Health Service Publication.

Swisher, E.M., Sakai, W., Karlan, B.Y., Wurz, K., Urban, N., and Taniguchi, T. (2008). Secondary BRCA1 mutations in BRCA1-mutated ovarian carcinomas with platinum resistance. Cancer Research 68, 2581–2586.

Szikriszt, B., Póti, Á., Pipek, O., Krzystanek, M., Kanu, N., Molnár, J., Ribli, D., Szeltner, Z., Tusnády, G.E., Csabai, I., et al. (2016). A comprehensive survey of the mutagenic impact of common cancer cytotoxics. Genome Biology 17, 99.

Tattini, L., D’Aurizio, R., and Magi, A. (2015). Detection of Genomic Structural Variants from Next-Generation Sequencing Data. Frontiers in Bioengineering and Biotechnology 3, 92.

Telli, M.L., Timms, K.M., Reid, J., Hennessy, B., Mills, G.B., Jensen, K.C., Szallasi, Z., Barry, W.T., Winer, E.P., and Tung, N.M. (2016).
Homologous recombination deficiency (HRD) score predicts response to platinum-containing neoadjuvant chemotherapy in patients with triple-negative breast cancer. Clinical Cancer Research 22, 3764–3773.

The Cancer Genome Atlas (2012). Comprehensive molecular portraits of human breast tumours. Nature 490.

Thibodeau, S.N., Bren, G., and Schaid, D. (1993). Microsatellite instability in cancer of the proximal colon. Science 260, 816–819.

Timms, K.M., Abkevich, V., Hughes, E., Neff, C., Reid, J., Morris, B., Kalva, S., Potter, J., Tran, T.V., and Chen, J. (2014). Association of BRCA1/2 defects with genomic scores predictive of DNA damage repair deficiency among breast cancer subtypes. Breast Cancer Research 16, 1.

Tomasetti, C., Li, L., and Vogelstein, B. (2017). Stem cell divisions, somatic mutations, cancer etiology, and cancer prevention. Science 355.

Tomita-Mitchell, A., Kat, A.G., Marcelino, L.A., Li-Sucholeiki, X.C., Goodluck-Griffith, J., and Thilly, W.G. (2000). Mismatch repair deficient human cells: spontaneous and MNNG-induced mutational spectra in the HPRT gene. Mutation Research 450, 125–138.

Tsuneizumi, M., Emi, M., Hirano, A., Utada, Y., Tsumagari, K., Takahashi, K., Kasumi, F., Akiyama, F., Sakamoto, G., Kazui, T., et al. (2002). Association of allelic loss at 8p22 with poor prognosis among breast cancer cases treated with high-dose adjuvant chemotherapy. Cancer Letters 180, 75–82.

Tutt, A., Ellis, P., Kilburn, L., Gilett, C., Pinder, S., Abraham, J., Barrett, S., Barrett-Lee, P., Chan, S., and Cheang, M. (2015). Abstract S3-01: The TNT trial: A randomized phase III trial of carboplatin (C) compared with docetaxel (D) for patients with metastatic or recurrent locally advanced triple negative or BRCA1/2 breast cancer (CRUK/07/012). Cancer Research 75, S3–01.

Tuxen, I., Yde, C., Mau-Sørensen, M., Santoni-Rugiu, E., Lassen, U., and Nielsen, F. (2016). Copenhagen prospective personalized oncology (CoPPO): Genomic profiling to select patients for phase 1 trials.
Annals of Oncology 27.

Viel, A., Bruselles, A., Meccia, E., Fornasarig, M., Quaia, M., Canzonieri, V., Policicchio, E., Damiano Urso, E., Agostini, M., Genuardi, M., et al. (2017). A Specific Mutational Signature Associated with DNA 8-Oxoguanine Persistence in MUTYH-defective Colorectal Cancer. EBioMedicine 20.

Von Minckwitz, G., Hahnen, E., Fasching, P.A., Hauke, J., Schneeweiss, A., Salat, C., Rezai, M., Blohmer, J.U., Zahm, D.M., and Jackisch, C. (2014). Pathological complete response (pCR) rates after carboplatin-containing neoadjuvant chemotherapy in patients with germline BRCA (gBRCA) mutation and triple-negative breast cancer (TNBC): Results from GeparSixto. In ASCO Annual Meeting Proceedings, p. 1005.

Waddell, N., Pajic, M., Patch, A.M., Chang, D.K., Kassahn, K.S., Bailey, P., Johns, A.L., Miller, D., Nones, K., Quek, K., et al. (2015). Whole genomes redefine the mutational landscape of pancreatic cancer. Nature 518, 495–501.

Wang, K., Yuen, S.T., Xu, J., Lee, S.P., Yan, H.H.N., Shi, S.T., Siu, H.C., Deng, S., Chu, K.M., Law, S., et al. (2014). Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer. Nature Genetics 46, 573–582.

Wang, Y.K., Bashashati, A., Anglesio, M.S., Cochrane, D.R., Grewal, D.S., Ha, G., McPherson, A., Horlings, H.M., Senz, J., and Prentice, L.M. (2017). Genomic consequences of aberrant DNA repair mechanisms stratify ovarian cancer histotypes. Nature Genetics.

Watkins, J.A., Irshad, S., Grigoriadis, A., and Tutt, A.N.J. (2014). Genomic scars as biomarkers of homologous recombination deficiency and drug response in breast and ovarian cancers. Breast Cancer Research 16, 3405.

Weigelt, B., Comino-Méndez, I., Bruijn, I. de, Tian, L., Meisel, J.L., García-Murillas, I., Fribbens, C., Cutts, R., Martelotto, L.G., Ng, C.K.Y., et al. (2017). Diverse BRCA1 and BRCA2 Reversion Mutations in Circulating Cell-Free DNA of Therapy-Resistant Breast or Ovarian Cancer.
Clinical Cancer Research : An Official Journal of the American Association for Cancer Research 23, 6708–6720.

Weinberg, R. (2013). The biology of cancer (Garland science).

Weymann, D., Laskin, J., Roscoe, R., Schrader, K.A., Chia, S., Yip, S., Cheung, W.Y., Gelmon, K.A., Karsan, A., Renouf, D.J., et al. (2017). The cost and cost trajectory of whole-genome analysis guiding treatment of patients with advanced cancers. Molecular Genetics & Genomic Medicine 5, 251–260.

Wheeler, D.A., Srinivasan, M., Egholm, M., Shen, Y., Chen, L., McGuire, A., He, W., Chen, Y.-J., Makhijani, V., Roth, G.T., et al. (2008). The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872–876.

Wong, H.-l., Yang, K.C., Shen, Y., Zhao, E.Y., Loree, J.M., Kennecke, H.F., Kalloger, S.E., Karasinska, J.M., Lim, H.J., Mungall, A.J., et al. (2018). Molecular characterization of metastatic pancreatic neuroendocrine tumors (PNETs) using whole-genome and transcriptome sequencing. Molecular Case Studies 4, a002329.

Wooster, R., Bignell, G., Lancaster, J., Swift, S., Seal, S., Mangion, J., Collins, N., Gregory, S., Gumbs, C., Micklem, G., et al. (1995). Identification of the breast cancer susceptibility gene BRCA2. Nature 378, 789–792.

Wu, J., Ho, C., Laskin, J., Gavin, D., Mak, P., Duncan, K., French, J., McGahan, C., Reid, S., Chia, S., et al. (2013). The development of a standardized software platform to support provincial population-based cancer outcomes units for multiple tumour sites: OaSIS - Outcomes and Surveillance Integration System. Studies in Health Technology and Informatics 183, 98–103.

Wynder, E., and Graham, E. (1950). Tobacco smoking as a possible etiologic factor in bronchiogenic carcinoma; a study of 684 proved cases. Journal of the American Medical Association 143, 329–336.

Xu, H., Di Antonio, M., McKinney, S., Mathew, V., Ho, B., O’Neil, N.J., Santos, N.D., Silvester, J., Wei, V., Garcia, J., et al. (2017).
CX-5461 is a DNA G-quadruplex stabilizer with selective lethality in BRCA1/2 deficient tumours. Nature Communications 8, 14432.

Yang, D., Khan, S., Sun, Y., Hess, K., Shmulevich, I., Sood, A.K., and Zhang, W. (2011). Association of BRCA1 and BRCA2 mutations with survival, chemotherapy sensitivity, and gene mutator phenotype in patients with ovarian cancer. JAMA 306, 1557–1565.

Yates, L.R., Knappskog, S., Wedge, D., Farmery, J.H.R., Gonzalez, S., Martincorena, I., Alexandrov, L.B., Van Loo, P., Haugland, H.K., Lilleng, P.K., et al. (2017). Genomic Evolution of Breast Cancer Metastasis and Relapse. Cancer Cell 32, 169–184.e7.

Yip, S., Miao, J., Cahill, D.P., Iafrate, A.J., Aldape, K., Nutt, C.L., and Louis, D.N. (2009). MSH6 mutations arise in glioblastomas during temozolomide therapy and mediate temozolomide resistance. Clinical Cancer Research : An Official Journal of the American Association for Cancer Research 15, 4622–4629.

Yoshida, K., and Miki, Y. (2004). Role of BRCA1 and BRCA2 as regulators of DNA repair, transcription, and cell cycle in response to DNA damage. Cancer Science 95, 866–871.

Zare, F., Dow, M., Monteleone, N., Hosny, A., and Nabavi, S. (2017). An evaluation of copy number variation detection tools for cancer using whole exome sequencing data. BMC Bioinformatics 18, 286.

Zehir, A., Benayed, R., Shah, R.H., Syed, A., Middha, S., Kim, H.R., Srinivasan, P., Gao, J., Chakravarty, D., Devlin, S.M., et al. (2017). Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nature Medicine.

Zhao, E.Y., Shen, Y., Pleasance, E., Kasaian, K., Leelakumari, S., Jones, M., Bose, P., Ch’ng, C., Reisle, C., Eirew, P., et al. (2017). Homologous Recombination Deficiency and Platinum-Based Therapy Outcomes in Advanced Breast Cancer.
Clini-cal Cancer Research 23, 7521–7530.Zheng, G.X.Y., Lau, B.T., Schnall-Levin, M., Jarosz, M., Bell, J.M., Hindson, C.M.,Kyriazopoulou-Panagiotopoulou, S., Masquelier, D.A., Merrill, L., Terry, J.M., et al.167(2016). Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nature Biotechnology 34, 303–311.Zolkind, P., and Uppaluri, R. (2017). Checkpoint immunotherapy in head andneck cancers. Cancer and Metastasis Reviews 36, 475–489.168A P P E N D I C E Sa1 note on nucleic acid nomenclatureMutations are denoted by their base change. For example C>T denotes a basechange from cytosine to thymine. Additionally, the trinucleotide context of a basechange is described in one of two ways. The first is to provide the entire alteredtrinucleotide, for example a TCT > TTT mutation. The other is too indicate thecontext separately, for example a C>T mutation in a TpCpT context. Here, the pbetween bases denotes a phosphodiester bond.In addition, expanded nomenclature is occasionally used. For example, aCpApY trinucleotide includes both CpApC or CpApT. A complete table of theexpanded nomenclature is provided in table A.1.Table A.1: The expanded nomenclature for nucleic acid naming.Letter Meaning Nucleotide(s)A Adenine AT Thymine TG Guanine GC Cytosine CR Purine G or AY Pyrimidine T or CM Amino A or CK Keto G or TS Strong G or CW Weak A or TH Not Guanine A or C or T169Letter Meaning Nucleotide(s)B Not Adenine G or T or CV Not Thymine G or C or AD Not Cytosine G or T or AN Any base G or T or A or Ca2 appendix tables170Table A.2: Significance tests for differences in mutation signatures acrossmolecular subtypes. Multiple Kruskal-Wallis non-parametrictests were performed to identify variation in NMF-derived denovo mutation signatures across five breast cancer molecularsubtypes (Luminal A, Luminal B, Her2-Amplified, Basal, andNormal-like). 
P-values were adjusted for false discovery rate, and revealed statistically significant subtype variability in three signatures: V3, V8, and V9.

Signature | Chi-squared | Degrees of freedom | p | Adjusted p
V1 | 5.3 | 4 | 0.26 | 0.28
V2 | 10 | 4 | 0.035 | 0.075
V3 | 18 | 4 | 0.0015 | 0.0045
V4 | 9.9 | 4 | 0.042 | 0.075
V5 | 9.2 | 4 | 0.056 | 0.085
V6 | 7.3 | 4 | 0.12 | 0.16
V7 | 5 | 4 | 0.28 | 0.28
V8 | 18 | 4 | 0.0015 | 0.0045
V9 | 25 | 4 | 6.1e-05 | 0.00055

Table A.3: Sample details for whole genome sequencing of multiply-sequenced tumours.

ID | Occurrence | Stage | Sex | Diagnosis | Prep | Depth | Purity
P01 | Primary | Adult | F | Colorectal Cancer | FFPE | 41x | 60
P01 | Metastatic | Adult | F | Colorectal Cancer | OCT | 92x | 47
P02 | Primary | Adult | F | Breast Cancer | FFPE | 46x | 80
P02 | Metastatic | Adult | F | Breast Cancer | OCT | 99x | 80
P03 | Primary | Adult | M | Colorectal Cancer | FFPE | 49x | 70
P03 | Metastatic | Adult | M | Colorectal Cancer | OCT | 46x | 79
P04 | Primary | Adult | F | Appendix Cancer | FFPE | 42x | 60
P04 | Metastatic | Adult | F | Appendix Cancer | OCT | 87x | 81
P05 | Primary | Adult | M | Colorectal Cancer | FFPE | 50x | 65
P05 | Metastatic | Adult | M | Colorectal Cancer | OCT | 92x | 73
P06 | Primary | Adult | F | Ovarian granulosa | FFPE | 48x | 80
P06 | Metastatic | Adult | F | Ovarian granulosa | OCT | 112x | 90
P07 | Primary | Adult | F | Breast Cancer | FFPE | 37x | 75
P07 | Metastatic | Adult | F | Breast Cancer | OCT | 105x | 47
P08 | Primary | Adult | F | Breast Cancer | FFPE | 53x | 80
P08 | Metastatic | Adult | F | Breast Cancer | OCT | 95x | 35
P09 | Primary | Adult | F | Endometrial Cancer | FFPE | 44x | 55
P09 | Metastatic | Adult | F | Endometrial Cancer | OCT | 96x | 80
P10 | Primary | Adult | F | Breast Cancer | FFPE | 40x | 70
P10 | Metastatic | Adult | F | Breast Cancer | OCT | 122x | 63
P11 | Primary | Adult | F | Breast Cancer | FFPE | 43x | 70
P11 | Metastatic | Adult | F | Breast Cancer | OCT | 91x | 86
P12 | Primary | Adult | F | Breast Cancer | FFPE | 41x | 65
P12 | Metastatic | Adult | F | Breast Cancer | OCT | 105x | 70
P13 | Primary | Adult | F | Breast Cancer | FFPE | 54x | 70
P13 | Metastatic | Adult | F | Breast Cancer | OCT | 97x | 76
P14 | Primary | Adult | F | Lung Cancer | FFPE | 46x | 60
P14 | Metastatic | Adult | F | Lung Cancer | OCT | 100x | 64
P15 | Primary | Adult | F | Breast Cancer | FFPE | 39x | 65
P15 | Metastatic | Adult | F | Breast Cancer | OCT | 97x | 69
P16 | Primary | Adult | F | Breast Cancer | FFPE | 39x | 60
P16 | Metastatic | Adult | F | Breast Cancer | OCT | 93x | 63
P17 | Primary | Pediatric | M | Neuroblastoma | FFPE | 41x | 90
P17 | Metastatic | Pediatric | M | Neuroblastoma | FF | 89x | 69
P18 | Primary | Adult | F | Breast Cancer | FFPE | 31x | 50
P18 | Metastatic | Adult | F | Breast Cancer | OCT | 96x | 70
P19 | Primary | Pediatric | M | Sarcoma | FFPE | 54x | 95
P19 | Metastatic | Pediatric | M | Sarcoma | FF | 102x | 65
P20 | Primary | Adult | F | Breast Cancer | FFPE | 44x | 70
P20 | Metastatic | Adult | F | Breast Cancer | OCT | 110x | 90
P21 | Primary | Adult | F | Adenocarcinoma of the lung | OCT | 91x | 20
P21 | Metastatic | Adult | F | Adenocarcinoma of the lung | OCT | 67x | 51
P22 | Primary | Adult | F | Cholangiocarcinoma | FF | 86x | 48
P22 | Metastatic | Adult | F | Cholangiocarcinoma | OCT | 79x | 58
P23 | Primary | Adult | M | Pancreatic Adenocarcinoma | FA | 86x | 25
P23 | Metastatic | Adult | M | Pancreatic Adenocarcinoma | OCT | 84x | 49
P24 | Primary | Adult | M | Metastatic lung cancer | OCT | 100x | 48
P24 | Metastatic | Adult | M | Metastatic lung cancer | OCT | 82x | 35

A3 Appendix Figures

[Figure A.1 image: one panel per reference signature (Signature 1 to Signature 30). Axes: Error (Fraction of True Exposure); Similarity to other signatures; Reference Signature. Legend (method): deconstructSigs, SignatureEstimation_SA, SignIT.]

Figure A.1: Underestimated mutation signature exposures from simulated data. Simulated mutation catalogs with known exposure vectors were generated under various conditions and their exposures were re-estimated using deconstructSigs, SignatureEstimation, and SignIT. For every signature with non-zero simulated exposure, the error ratio was computed as $(\hat{e} - e)/e$, where $\hat{e}$ is the estimated exposure and $e$ is the true exposure.
Mutation signatures were ordered by their median similarity to other reference signatures. We observed frequent underestimation of mutation signatures by all methods, but errors were greatest in deconstructSigs and smallest in SignIT exposures. Signatures which are highly similar to other signatures were most likely to be underestimated.

[Figure A.2 image: one panel per reference signature (Signature 1 to Signature 30). Axes: method (deconstructSigs, nnls, SignatureEstimation_SA, SignIT); error_fraction_of_total_mutation.]

Figure A.2: Overestimated mutation signature exposures from simulated data. Simulated mutation catalogs with known exposure vectors were generated under various conditions and their exposures were re-estimated using deconstructSigs, SignatureEstimation, and SignIT. For every signature with zero simulated exposure, the overestimation error was computed as the estimated exposure divided by the total mutation burden.
While all methods exhibited exposure errors, SignIT overestimated exposures with lower magnitude.

[Figure A.3 image: one panel per cohort (BLCA, BRCA, CESC, COAD, LUAD, LUSC, SKCM, STAD, UCEC), with points labelled by number of signatures (2 to 8). Axes: Signature Stability; Reconstruction Error.]

Figure A.3: Model selection for mutation signature analysis of nine cohorts of The Cancer Genome Atlas. To select the number of signatures, NMF was performed for models varying from 2 to 8 mutation signatures. To estimate signature stability, each NMF algorithm was run with 1000 Monte Carlo resimulated mutation catalog matrices. In each cohort, the model containing a number of signatures maximizing signature stability while minimizing Frobenius reconstruction error was chosen. Chosen models are indicated with a black box.

[Figure A.4 image: paired signature profiles in panels A to K: A, PS3 and MS1; B, PS5 and MS3; C, PS2 and MS4; D, PS17 and MS8; E, PS7 and MS9; F, PS30 and MS10; G, PS8 and MS11; H, PS13 and MS13; I, PS4 and MS17; J, PS1 and MS18; K, PS18 and MS19. Axes: substitution type (C>A, C>G, C>T, T>A, T>C, T>G); Probability. Legend: 5' context and 3' context (A, C, G, T).]

Figure A.4: Matching de novo mutation signatures to previously identified known signatures. Mutation signatures were deciphered de novo from cancer type-specific cohorts of metastatic cancer whole genomes. Signatures were clustered across cohorts to yield a set of independent signatures. Those closely resembling primary signatures were mapped accordingly. Here, PS stands for primary signature, and the numbers correspond to the 30 COSMIC mutation signatures.

[Figure A.5 image: one panel per cohort, with axes signature and sample, alongside total mutation burden (Mutations/Mb). Signatures shown per cohort: BRCA (13, 17, 2, 3, 8, M1, M2, M3, M7, M9); CHOL (3, 5, M1, M2, M3); COLO (17, 2, M1, M4); ESCA (17, 3, 5, M1, M4); HNSC (2, 5, 7); LUNG (2, 3, 4, 7, M4, M6); OV (2, 3, 5, M1, M2, M3); PAAD (1, 18, 2, 3, 30, M2, M4, M8); SARC (2, 3, 30, M1, M2, M3); SECR (3, 5, M1, M2, M3); SKCM (3, 7); STAD (17, 3, 5, 8, M1, M2, M5).]

Figure A.5: Clustering of mutation signatures across multiple cancer cohorts into a common consensus signature set. Exposures for present signatures are shown per sample for all cohorts. Exposures were normalized by dividing by total mutation burden per sample such that the exposures of each sample (each row) sum to 1. Total mutation burden for the corresponding sample is also shown.
Rows are ordered based on hierarchical clustering of the fractional exposures.

[Figure A.6 image: one panel per cohort (BRCA, CHOL, COLO, ESCA, HNSC, LUNG, OV, PAAD, SARC, SECR, SKCM, STAD), with points labelled by number of signatures (2 to 14). Axes: Signature Stability; Reconstruction Error.]

Figure A.6: Mutation signatures were successfully deciphered across 12 cancer cohorts. For each cohort, the number of signatures to be inferred was selected by jointly minimizing reconstruction error and maximizing signature stability. Chosen models are indicated with a black box.

[Figure A.7 image: panels for Sig. M2 (SARC, SECR, STAD, BRCA, CHOL, OV, PAAD) and Sig. 17 (BRCA, COLO, ESCA, STAD). Axes: Exposure (Mutations / Mb); Timing (Single Population, then 0 (Early) to 1 (Late)).]

Figure A.7: Late-arising mutation signatures across cancer types. Temporal dissection of mutation signatures deciphered de novo from metastatic cancers revealed two late-arising mutation signatures.
Signatures 17 and M2 were observed in late-arising mutational subpopulations across various cancer types.

[Figure A.8 image: panels for Sig. M2 (Lung, Lymph Node, Spine, Abdominal Wall, Breast, Chest wall, Liver, Abdominal Mass) and Sig. 17 (Liver, Lung, Lymph Node, Abdominal Mass, Abdominal Wall, Breast, Chest wall). Axes: Exposure (Mutations / Mb); Timing (Single Population, then 0 (Early) to 1 (Late)).]

Figure A.8: Late-arising mutation signatures across biopsy sites. Temporal dissection of mutation signatures deciphered de novo from metastatic cancers revealed two late-arising mutation signatures. Signatures 17 and M2 were observed in late-arising mutational subpopulations across various biopsy sites.
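
As a supplementary illustration of the expanded nomenclature described in the note on nucleic acid nomenclature (Table A.1), the following minimal Python sketch expands a context written with expanded letters, such as CpApY, into the concrete sequences it covers. The names IUPAC and expand_context are hypothetical and introduced here only for illustration; they are not part of the software described in this thesis.

```python
from itertools import product
from typing import List

# Expanded nucleotide letters from Table A.1, mapped to the bases they cover.
# (Dictionary name is an illustrative assumption.)
IUPAC = {
    "A": "A", "T": "T", "G": "G", "C": "C",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "H": "ACT", "B": "CGT",
    "V": "ACG", "D": "AGT", "N": "ACGT",
}


def expand_context(context: str) -> List[str]:
    """Expand a context such as 'CpApY' or 'CAY' into concrete sequences."""
    # Drop the lowercase 'p' that denotes a phosphodiester bond.
    letters = context.replace("p", "")
    # Take the Cartesian product of the bases allowed at each position.
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in letters))]


print(expand_context("CpApY"))  # ['CAC', 'CAT']
```

The Cartesian product keeps the sketch short and handles contexts of any length, so a fully ambiguous context such as NpCpN expands to all 16 trinucleotides centred on cytosine.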
