UBC Theses and Dissertations


Computational prioritization of cancer driver genes for precision oncology Shrestha, Raunak 2018

Full Text


COMPUTATIONAL PRIORITIZATION OF CANCER DRIVER GENES FOR PRECISION ONCOLOGY

by

RAUNAK SHRESTHA

B.Tech. Kathmandu University, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Bioinformatics)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

August 2018

© RAUNAK SHRESTHA, 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

Computational Prioritization of Cancer Driver Genes for Precision Oncology

submitted by Raunak Shrestha in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics

Examining Committee:
Dr. Colin C. Collins, Urologic Sciences, Supervisor
Dr. S. Cenk Sahinalp, Computer Science, Co-supervisor
Dr. Artem Cherkasov, Urologic Sciences, Supervisory Committee Member
Dr. David G. Huntsman, Pathology and Laboratory Medicine, University Examiner
Dr. Leonard Foster, Biochemistry and Molecular Biology, University Examiner

Additional Supervisory Committee Members:
Dr. Yuzhuo Wang, Urologic Sciences, Supervisory Committee Member
Dr. Wan Lam, Pathology, Supervisory Committee Member

Abstract

Advances in high-throughput sequencing technologies have drastically increased the efficiency with which alterations in the genome, transcriptome, proteome, and epigenome of a cancer cell can be assessed. This has increased the computational burden of analyzing these “big data”, making the translation of this knowledge into insightful and impactful patient outcomes extraordinarily challenging.

Among these alterations, only a few “driver” alterations are expected to confer a crucial growth advantage. These are greatly outnumbered by functionally inconsequential “passenger” alterations. This poses a significant challenge for the identification of driver alterations, requiring solutions to novel algorithmic problems.
Although insight into driver alterations is critical to guide the selection of appropriate drug therapies for the patient, no specific tools exist to help clinicians contextualize the enormous amount of genomic information when making therapeutic decisions.

In this thesis we describe novel algorithms for the identification and prioritization of cancer driver genes. First, we describe HIT’nDRIVE, a combinatorial algorithm that measures the impact of genomic aberrations on global changes in gene expression patterns to prioritize cancer driver genes. We also demonstrate its application to large multi-omics cancer datasets to guide precision oncology. We further describe an integrative multi-omics characterization of peritoneal mesothelioma, a rare cancer of the abdomen. Here, using HIT’nDRIVE, we identified peritoneal mesothelioma with BAP1 loss as a distinct molecular subtype characterized by distinct gene expression patterns of chromatin remodeling, DNA repair pathways, and immune checkpoint receptor activation. We demonstrate that this subtype is correlated with an inflammatory tumor microenvironment and is thus a candidate for immune checkpoint blockade therapies. Finally, we describe cd-CAP, a combinatorial algorithm to identify subnetworks with conserved molecular alteration patterns across a large subset of a tumor sample cohort. Notably, we demonstrate that many of the largest highly conserved subnetworks within a tumor type consist solely of genes that have been subject to copy number gain, typically located on the same chromosomal arm and thus likely the result of a single, large-scale copy number amplification.

Lay Summary

Cancer arises as a result of deleterious aberrations in the genetic material and its products. The components of the genetic material interact with each other, forming an extremely complex web of networks. The accumulation of abnormalities in the genetic material results in the perturbation of critical networks, which may ultimately give rise to a tumor.
Although many alterations accumulate in a tumor over its lifetime, only a small fraction, known as “driver” alterations, are critical for tumor growth, while the majority, the “passenger” alterations, are not essential. Identifying driver alterations in the vast milieu of passenger alterations is a challenging task, but it is critical for optimal cancer management.

In this thesis, we describe novel computational methods that use advanced mathematics and computer science techniques to address the problems mentioned above. We demonstrate how our computational tools establish a link between driver alterations and tumour viability, thus revealing novel biological insights into therapeutic strategies. This will guide the selection of appropriate anti-cancer drugs and the development of new ones. We therefore believe that this work will accelerate translation from discovery to effective cancer treatment.

Preface

In conjunction with my advisors, Dr. Colin C. Collins and Dr. S. Cenk Sahinalp, I was involved in the conceptualization and design of the research activities described in this thesis. In particular, I designed, developed, and implemented the computational algorithms described in this thesis. I performed the majority of the data analysis for the molecular characterization of malignant peritoneal mesothelioma. I performed the computational experiments and data analysis, and generated the figures, tables, and text in this thesis. Where there are exceptions, they are noted below.

Chapter 1 was written by me.

The majority of Chapters 2 and 3 was written by me. The HIT’nDRIVE algorithm was developed in collaboration with Mr. Ermin Hodzic, Dr. Gholamreza Haffari, and Dr. S. Cenk Sahinalp. I performed the majority of the data analysis and generated the tables and figures. A portion of the computational experiments was performed by Mr. Ermin Hodzic. Chapters 2 and 3 have been published in:

R. Shrestha, E. Hodzic, J. Yeung, K. Wang, T. Sauerwald, P. Dao, S. Anderson, H. Beltran, M. A. Rubin, C. C. Collins, G. Haffari, and S. C. Sahinalp. HIT’nDRIVE: Multi-driver gene prioritization based on hitting time. Research in Computational Molecular Biology: 18th Annual International Conference, RECOMB 2014, Pittsburgh, PA, USA, April 2-5, 2014, Proceedings, pages 293–306, 2014. doi:10.1007/978-3-319-05269-4_23. URL http://dx.doi.org/10.1007/978-3-319-05269-4_23

and

R. Shrestha, E. Hodzic, T. Sauerwald, P. Dao, K. Wang, J. Yeung, S. Anderson, F. Vandin, G. Haffari, C. C. Collins, and S. C. Sahinalp. HIT’nDRIVE: patient-specific multidriver gene prioritization for precision oncology. Genome Research, 27(9):1573–1588, Sep 2017. ISSN 1549-5469. doi:10.1101/gr.221218.117. URL https://www.ncbi.nlm.nih.gov/pubmed/28768687

The HIT’nDRIVE software is available at: https://github.com/sfu-compbio/hitndrive

Chapter 4 was written by me. I performed the majority of the data analysis and generated the tables and figures. This work was performed in collaboration with Dr. Noushin Nabavi. Dr. Andrew Churg, Dr. Htoo Zarni Oo, Dr. Antonio Hurtado-Coll, Dr. Ladan Fazli, and Ms. Estelle Li generated the tissue microarray and performed pathological slide staining and slide reviews. Dr. Noushin Nabavi, Mr. Hans H. Adomat, Mr. Robert Shukin, Mr. Brian McConeghy, Ms. Anne Haegert, and Ms. Sonal Brahmbhatt performed experiments and data generation. Dr. Yen-Yi Lin, Dr. Fan Mo, Dr. Stanislav Volik, Mr. Shawn Anderson, and Mr. Robert H. Bell performed various computational experiments. This study was approved by the Institutional Review Board of the University of British Columbia and Vancouver Coastal Health (REB Numbers H1500902 and V15-00902). All samples and information were collected with written and signed informed consent from the participating patients. The pre-print version of this chapter is available at:

R. Shrestha, N. Nabavi, Y.-Y. Lin, F. Mo, S. Anderson, S. Volik, H. H. Adomat, D. Lin, H. Xue, X. Dong, R. Shukin, R. H. Bell, B. McConeghy, A. Haegert, S. Brahmbhatt, E. Li, H. Z. Oo, A. Hurtado-Coll, L. Fazli, J. Zhou, Y. McConnell, A. McCart, A. Lowy, G. B. Morin, M. Daugaard, S. C. Sahinalp, F. Hach, S. Le Bihan, M. E. Gleave, Y. Wang, A. Churg, and C. C. Collins. Integrated Multi-omics Molecular Subtyping Predicts Therapeutic Vulnerability in Malignant Peritoneal Mesothelioma. bioRxiv, 2018. doi:10.1101/243477. URL https://doi.org/10.1101/243477

Chapter 5 was written by me. This work was done in collaboration with Mr. Ermin Hodzic and Mr. Kaiyuan Zhu. I performed data preparation, developed the algorithm, performed data analysis, and generated the tables and figures. The pre-print version of this chapter is available at:

E. Hodzic, R. Shrestha, K. Zhu, K. Cheng, C. C. Collins, and S. C. Sahinalp. Combinatorial detection of conserved alteration patterns for identifying cancer subnetworks. bioRxiv, 2018. doi:10.1101/369850. URL https://doi.org/10.1101/369850

The cd-CAP software is available at: https://github.com/ehodzic/cd-CAP

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
1 Introduction
  1.1 Cancer driver genes
  1.2 Computational methods for the prediction of cancer driver genes
    1.2.1 Identification of recurrent somatic alterations
    1.2.2 Prediction of functional impact of somatic alterations
    1.2.3 Pathway and interaction-network based approaches
  1.3 Contributions
  1.4 Organization of the thesis
2 HIT’nDRIVE: an algorithm for cancer driver genes prioritization using hitting time
  2.1 Introduction
  2.2 Our Contributions
  2.3 HIT’nDRIVE Algorithmic Framework
    2.3.1 Reformulation of RWFL as a Weighted Multi-Set Cover (WMSC) Problem
  2.4 Results
    2.4.1 HIT’nDRIVE parameters
    2.4.2 HIT’nDRIVE: expression outlier stringency
    2.4.3 HIT’nDRIVE: random alterations and random expression outliers
    2.4.4 HIT’nDRIVE: network perturbation
    2.4.5 HIT’nDRIVE: underlying network
    2.4.6 Modified HIT’nDRIVE: when it is not required to prioritize at least one driver gene per patient
    2.4.7 HIT’nDRIVE’s ability to capture CGC genes
    2.4.8 Correlation of predicted driver genes with alteration burden
    2.4.9 Phenotype classification using dysregulated modules seeded with the predicted driver genes
    2.4.10 CGC cancer type-specific gene enrichment
    2.4.11 Phenotype classification using CGC gene seeded modules
  2.5 Discussion
  2.6 Methods
    2.6.1 Datasets and Analysis
    2.6.2 Interaction networks
    2.6.3 Validation dataset
    2.6.4 Derivation of expression outlier genes
    2.6.5 Derivation of expression outlier gene weights
    2.6.6 Statistical significance of the overlap of driver genes with that of CGC database
3 Application of HIT’nDRIVE: patient-specific multi-driver gene prioritization for precision oncology
  3.1 Introduction
  3.2 Our Contributions
  3.3 Results
    3.3.1 HIT’nDRIVE predicts frequent as well as infrequent driver genes in multi-omics cancer datasets
    3.3.2 Network properties of cancer driver genes
    3.3.3 Breast cancer subtype classification using driver modules
    3.3.4 Subtype-specific breast cancer driver modules are associated with survival outcome
    3.3.5 HIT’nDRIVE seeded driver genes accurately predict drug efficacy
  3.4 Discussion
  3.5 Methods
    3.5.1 Datasets and analysis
    3.5.2 Genomics of drug sensitivity in cancer
    3.5.3 Pathway enrichment analysis
    3.5.4 Association of driver modules with patients’ survival outcome
4 Integrated multi-omics molecular subtyping predicts therapeutic vulnerability in malignant peritoneal mesothelioma
  4.1 Introduction
  4.2 Our Contributions
  4.3 Results
    4.3.1 Patient Cohort description
    4.3.2 Landscape of somatic mutations in PeM
    4.3.3 Copy number landscape in PeM
    4.3.4 Gene fusions in PeM
    4.3.5 The global transcriptome and proteome profile of PeM
    4.3.6 Transcriptional and post-transcriptional mechanisms regulate chromatin remodeling protein-complexes
    4.3.7 BAP1del subtype is characterized by distinct expression patterns of genes involved in DNA repair pathway, and immune checkpoint receptor activation
  4.4 Discussion
  4.5 Methods
    4.5.1 Clinical samples and pathology evaluation
    4.5.2 Construction of tissue microarrays (TMAs)
    4.5.3 Immunohistochemistry and Histopathology
    4.5.4 Whole exome sequencing
    4.5.5 Somatic variant calling
    4.5.6 Copy number aberration (CNA) calls
    4.5.7 Transcriptome sequencing (RNA-seq)
    4.5.8 Transcriptome (RNA-seq) quantification
    4.5.9 Identification of fusion transcripts and validation
    4.5.10 Proteomics analysis using mass spectrometry
    4.5.11 Peptide identification and protein quantification
    4.5.12 Mutational signature analysis
    4.5.13 Prioritization of driver genes using HIT’nDRIVE
    4.5.14 Consensus clustering
    4.5.15 Protein attenuation analysis
    4.5.16 Pathway enrichment analysis
    4.5.17 Stromal and immune score
    4.5.18 Enumeration of tissue-resident immune cell types using mRNA expression profiles
    4.5.19 External datasets
5 Combinatorial detection of conserved alteration patterns for identifying cancer subnetworks
  5.1 Introduction
  5.2 Literature Review
  5.3 Our Contributions
  5.4 Algorithmic Framework of cd-CAP
    5.4.1 Combinatorial Optimization Formulation
    5.4.2 Algorithmic Framework for solving MCSC
  5.5 Results
    5.5.1 Dataset Used
    5.5.2 Maximal Colored Subnetworks Across Cancer Types
    5.5.3 Maximal Colorful Subnetworks Across Cancer Types
    5.5.4 Multiple-Subnetwork Analysis Across Cancer Types
    5.5.5 Empirical P-Value Estimates Confirm the Significance of cd-CAP Identified Networks
  5.6 Discussion
  5.7 Methods
    5.7.1 Significance of the Identified Subnetworks
    5.7.2 Pathway enrichment analysis
    5.7.3 Association of sub-networks with patients’ survival outcome
6 Conclusion
  6.1 Future Perspective
Bibliography

List of Tables

Table 5.1 Five subnetworks identified by cd-CAP in multi-subnetwork mode for each cancer type: respective columns below depict the subnetwork size, depth, and the number of nodes in the subnetwork with copy number amplification (AMP), expression increase (EXP-UP) or decrease (EXP-DOWN).

List of Figures

Figure 2.1 Overview of HIT’nDRIVE algorithm
Figure 2.2 HIT’nDRIVE identified driver genes with respect to varying parameter values in 100 selected BRCA samples.
Figure 2.3 HIT’nDRIVE identified driver genes with respect to underlying network used in 100 selected BRCA samples.
Figure 2.4 Modified HIT’nDRIVE not required to prioritize at least one driver gene per patient.
Figure 2.5 Likelihood of HIT’nDRIVE to capture CGC Genes.
Figure 2.6 Correlation between the number of driver genes predicted by HIT’nDRIVE with mutation rate and copy-number burden
Figure 2.7 Phenotype classification using driver-seeded modules
Figure 2.8 Phenotype Classification using CGC Genes Seeded Modules.
Figure 3.1 Summary of driver genes prioritized by HIT’nDRIVE
Figure 3.2 Network properties of driver genes
Figure 3.3 BRCA subtype classification using driver modules
Figure 3.4 Drug efficacy predicted by HIT’nDRIVE seeded driver genes.
Figure 4.1 Landscape of somatic mutations in PeM tumors
Figure 4.2 Landscape of copy number aberrations in PeM tumors
Figure 4.3 Gene fusions in PeM
Figure 4.4 Transcriptome and proteome profile of PeM
Figure 4.5 Immune cell infiltration in PeM tumors.
Figure 5.1 Schematic overview of cd-CAP.
Figure 5.2 Conserved colored subnetworks.
Figure 5.3 Colorful maximal subnetworks
Figure 5.4 Multiple subnetwork analysis
Figure 5.5 Empirical p-value estimates for the maximum size subnetworks identified by cd-CAP.
Glossary

AGC Automatic Gain Control
BAM Binary Alignment Map
BMR Background Mutation Rate
BRCA Breast adenocarcinoma
C-INDEX Concordance-index
CGC Cancer Gene Census
CNA Copy Number Aberration
COSMIC Catalogue of Somatic Mutations in Cancer
CRS Cytoreductive surgery
DEG Differentially expressed genes
DGIDB Drug-Gene interaction database
DNA Deoxyribonucleic acid
EQED eQTL Electrical Diagrams
EQTL Expression Quantitative Trait Loci
FL Facility Location
FDR False Discovery Rate
FFPE Formalin-Fixed Paraffin-Embedded
GBM Glioblastoma multiforme
HIPEC Hyperthermic intraperitoneal chemotherapy
HR Hazard Ratio
HT Hitting Time
IGV Integrative Genomics Viewer
IHC Immunohistochemical
ILP Integer Linear Programming
INDEL Insertion and Deletion
KNN k-nearest neighbour
LP Linear Programming
MFPT Mean First Passage Time
NIPEC Normothermic intraperitoneal chemotherapy
OV Ovarian serous cystadenocarcinoma
PCR Polymerase Chain Reaction
PM Pleural Mesothelioma
PRAD Prostate adenocarcinoma
PSM Peptide Spectral Matches
QPCR Quantitative Polymerase Chain Reaction
RECOMB Research in Computational Molecular Biology
RNA Ribonucleic Acid
RT-PCR Reverse Transcription PCR
RWFL Random Walk Facility Location Problem
RWR Random Walk with Restart
SNV Single Nucleotide Variation
SV Structural Variation
TCGA The Cancer Genome Atlas
TMA Tissue microarray
TMZ Temozolomide
TNBC Triple-Negative Breast Cancer
VCF Variant Calling Format
WMSC Weighted Multi-Set Cover

Acknowledgments

This research was supported in part by the CIHR Bioinformatics Training Program, Prostate Cancer Foundation - British Columbia (PCF-BC) Research Awards, and Mitacs Accelerate PhD Fellowship.

My deepest gratitude goes to my PhD supervisors, Dr. Colin Collins and Dr. Cenk Sahinalp, for their endless support, encouragement, and guidance throughout my graduate studies. It was their commitment, ideas, and constructive criticism that helped shape my research and enabled me to publish my papers successfully.
Under their guidance I’ve had several opportunities to hone my technical and scientific skills in computer science and biology. I have learned from them not only how to be a good researcher but also many lessons in life beyond the confined boundaries of a laboratory.

I would like to thank my thesis committee members, Dr. Yuzhuo Wang, Dr. Artem Cherkasov, and Dr. Wan Lam, for providing me with guidance whenever I needed it. Special thanks to Dr. Josh Stuart for agreeing to read and evaluate my thesis as an external examiner. I would also like to thank Dr. David G. Huntsman and Dr. Leonard Foster for agreeing to read and evaluate my thesis as university examiners.

I have been incredibly blessed to be part of the Vancouver Prostate Centre family. My most sincere gratitude goes to my colleagues Mr. Ermin Hodzic, Mr. Kaiyuan Zhu, and Dr. Noushin Nabavi, with whom I have had the opportunity to collaborate closely in my different research works. I would like to thank Dr. Anna Lapuk and Mr. Kendric Wang, who mentored me during the initial phase of my PhD. I offer my regards to Dr. Mads Daugaard, Dr. Faraz Hach, Dr. Stanislav Volik, Dr. Stephane Le Bihan, and Dr. Alex Wyatt for their invaluable guidance.

I would like to thank my collaborators Dr. Gholamreza Haffari, Dr. Andrew Churg, Dr. Thomas Sauerwald, Dr. Phoung Dao, Dr. Fabio Vandin, and Dr. Kuoyuan Cheng. I would also like to thank my present and former colleagues Dr. Yen-Yi Lin, Dr. Fan Mo, Dr. Dong Lin, Dr. Nilgun Donmez, Dr. Ibrahim Numanagic, Mr. Shawn Anderson, Mr. Hans H. Adomat, Mr. Robert Shukin, Mr. Robert H. Bell, Mr. Brian McConeghy, Ms. Anne Haegert, Ms. Sonal Brahmbhatt, Mr. Jake Yeung, Mr. Salem Malikic, Mr. Alex Gawronski, Mr. Ehsan Haghshenas, Mr. Mike Ford, Mr. Varune Ramnarine, Mr. Hossein Sharifi-Noghabi, and Mr. Hossein Asghari. I benefited greatly from these collaborations, and hope to continue working with them.

Last, but not least, I heartily thank my family for the strong motivation that they gave me to pursue my studies.
Their support was invaluable to me.

Chapter 1
Introduction

1.1 Cancer driver genes

Cancer is a major cause of death across the globe and remains a growing challenge to health-care systems. Cancer is characterized by uncontrolled division (malignant growth) of abnormal cells (tumors) in the body. All cancers arise due to somatically acquired changes in the Deoxyribonucleic acid (DNA), Ribonucleic Acid (RNA), or protein sequences of the cancer cells.

Cancer is a complex disease caused by a combination of different genetic changes. These genetic alterations include, but are not limited to, Single Nucleotide Variation (SNV), Insertion and Deletion (INDEL), Copy Number Aberration (CNA), Structural Variation (SV), gene fusions, changes in the amino-acid sequence of a protein, DNA methylation, and changes in gene- and protein-level expression. Combinations of these genetic alterations dysregulate different oncogenic or tumor-suppressive signaling pathways, thus promoting cancer growth. Furthermore, cross-talk between different signaling pathways is inevitable but often poorly understood [72].

Cancer is an evolutionary disease. The genetic changes that occur in initiating cells (clones) undergo intense evolutionary selection during disease progression and can be widely altered during treatment. Cancers evolve by a reiterative process of clonal selection, clonal expansion, and genetic diversification within the adaptive landscapes of tissue ecosystems [67]. The cancer cell evolutionary process may lead to sub-clonal divergence, resulting in genetic and molecular heterogeneity.

During tumor progression, cancer cells accumulate a multitude of genomic alterations; however, most are inconsequential “passenger” alterations that are effectively neutral. Nevertheless, a small fraction provide mission-critical “hallmark” functions and are known as “driver” alterations, which modify transcriptional programs and therefore drive and sustain tumor progression [69, 161, 195].
Driver alterations are evolutionarily advantageous for tumor development. They are causally implicated in oncogenesis, and some even trigger cancer progression and resistance to therapy. Driver alterations are positively selected during the evolution of the cancer. Improving our knowledge of driver alterations, possibly through an integrative analysis of various omics data, is critical to better understand cancer mechanisms and to select appropriate therapies for specific cancer patients.

1.2 Computational methods for the prediction of cancer driver genes

Computational methods for the prediction of cancer driver genes can be broadly grouped into three strategic approaches: (a) identification of recurrent somatic alterations, (b) prediction of the functional impact of somatic alterations, and (c) pathway and interaction-network based approaches.

1.2.1 Identification of recurrent somatic alterations

In early cancer genomics studies, driver mutations were identified as alterations that appeared more frequently across the patient cohort than expected by chance. These driver mutations were thought to drive the cancer phenotype and provide a selective advantage for clonal expansion of their lineage.

Recurrent Somatic Mutation

Several popular computational tools such as MutSigCV [96], MuSiC [46], and others [77, 159, 209] have been developed based on this strategy. These methods aim to assess the recurrence frequency of SNVs with respect to the Background Mutation Rate (BMR) in a population of tumors [68, 195, 209]. The BMR is the probability of observing a passenger mutation at a specific location of the genome. The main differences between the tools mentioned above lie in how they estimate the BMR and how many mutational contexts they consider. The BMR is not constant across the genome but depends on the genomic context. The BMR estimate greatly affects the identification of recurrent mutations.
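To make the role of the BMR concrete, the following is a minimal sketch and not the procedure used by MutSigCV or MuSiC. It assumes a single uniform BMR and treats every base of a gene in every tumor as an independent trial, so that the number of passenger mutations in the gene follows a binomial distribution. The function names, the gene length, and the cohort size are hypothetical and chosen only for illustration.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summing the upper tail directly."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        # Beyond the mean the terms only shrink; stop once they are negligible.
        if i > n * p and term <= total * 1e-17:
            break
    return min(total, 1.0)


def recurrence_p_value(n_mutations, gene_length, n_tumors, bmr):
    """P-value for seeing at least n_mutations in a gene across the cohort.

    Assumption (illustrative only): each of the gene_length * n_tumors
    positions mutates independently with the same probability `bmr`.
    """
    return binom_upper_tail(n_mutations, gene_length * n_tumors, bmr)


# Hypothetical gene of 1,000 bases observed in 100 tumors with 5 mutations.
p_low = recurrence_p_value(5, 1000, 100, bmr=1e-6)   # BMR assumed low
p_high = recurrence_p_value(5, 1000, 100, bmr=1e-5)  # BMR assumed 10x higher
print(f"assumed BMR 1e-6: p = {p_low:.3g}")
print(f"assumed BMR 1e-5: p = {p_high:.3g}")
```

Under these assumptions, the same five observed mutations yield a much smaller p-value when the assumed BMR is ten times lower, which shows why the choice of BMR estimate matters so much for calling a gene recurrently mutated.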
If the estimated BMR is lower than the true value, it will lead to false positives, whereas if it is higher than the true value, some recurrent mutations will be missed.

Recurrent copy-number alterations (CNAs)

The identification of recurrent CNAs in tumors presents a different set of challenges. Unlike SNVs, CNAs affect more than one gene. Somatic CNAs show large variation in their position and length across different tumors. For example, an oncogene can be amplified in one tumor because of a whole-chromosome duplication, whereas in other tumors the amplification may be focal, covering only a small locus that contains the oncogene. These issues make the identification of somatic driver CNAs more challenging. Thus, the computational methods developed to study such problems take a non-parametric approach. Early approaches to identifying recurrent somatic driver CNAs relied on the identification of shared regions of CNAs across the tumor cohort. The statistical significance of such overlaps was assessed by fixing the length of the alterations but independently permuting their positions in the tumor population. More recent approaches such as GISTIC2 [115], CMDS [212], JISTIC [147], DiNAMIC [196], and ADMIRE [188] use more sophisticated models to assess the statistical significance of overlapping CNAs of different lengths.

Frequency-based approaches are best suited to studying driver genes that are frequently altered across the tumor population. However, less frequently altered genes vastly outnumber frequently altered genes [200]. Recent whole-genome studies have revealed that important genes may be recurrently altered in only a small fraction of the tumor cohort under study, and can be subtype-specific [128, 170]. Furthermore, in the context of tumor evolution, personalized rare driver genes are likely to arise during advanced stages and may be isolated to a small fraction of tumor cells [49, 67].
Such rare or personalized driver alterations may be functionally important and are likely to be missed by the frequency-based approach.

1.2.2 Prediction of functional impact of somatic alterations

Another approach for distinguishing driver alterations from passenger alterations is to predict the functional impact of a mutation using additional biological information about the sequence and/or structure of the protein encoded by the mutated gene. These methods are applied to the non-silent SNVs that result in changes in the amino-acid sequence of the corresponding protein. Several methods have been developed to predict the effect of SNVs. ANNOVAR [197] provides annotation of transcript variants. FunSeq [86] includes additional annotation of non-coding elements and regulatory features. MutationAssessor [139] combines protein domain information with an evolutionary conservation model to identify the functional impact of somatic mutations. Furthermore, CHASM [29], TransFIC [65], and OncodriveFM [64] use machine-learning algorithms trained on known cancer mutations to highlight potential driver mutations. ActiveDriver [137] predicts effects that are related to protein aggregation, protein stability, and alterations of residues targeted by post-translational modification. Other popular methods to assess the effect of SNVs on protein function include Condel [63], SIFT [157], and PolyPhen [2].

1.2.3 Pathway and interaction-network based approaches

Genes and their protein products act on different hierarchies of biochemical organization. Moreover, genes/proteins do not act in isolation; rather, they act together with other genes in signaling, regulatory, or metabolic pathways, collectively known as the interactome. Examination of the collection of identified somatic alterations in the interactome can lead to a better understanding of cancer progression.
However, the complex nature of the interactome is a confounding factor for the identification of driver genes and their corresponding signaling interaction network.

Many computational methods have been developed to assess signaling networks or pathways perturbed by somatic mutations in cancer. Perhaps the first computational method to consider large scale genomic variants as driver events is CONEXIC [4]. It correlates genes with highly recurrent CNAs with variation in gene expression profiles within a Bayesian network. CONEXIC uses a score-guided search to identify the combinations of driver CNAs that best explain the patterns of gene-expression modules in the tumor phenotype. Similarly, with no prior knowledge of pathways or protein interactions, MOCA correlates gene mutation information with expression profile changes in other genes [112]. NetBox [32] uses the shortest-path approach to connect the somatically altered genes in an interaction network and then identifies statistically significant connected modules containing potential driver genes. The method by Suo et al [165] prioritizes highly mutated genes that interact with a large number of differentially expressed genes in a gene network. MEMo [37] identifies sets of proximally-located genes from interaction networks, which are also recurrently altered and exhibit patterns of mutual exclusivity across the patient population. MEMo first defines modules of highly connected nodes in the network and then assesses whether these network modules show mutually exclusive mutations. The RME algorithm [117] identifies modules with exclusive patterns of mutations using an information theoretic measure to test for the significance of the observed exclusivity. RME starts from scores that measure the exclusivity of pairs of genes, and includes only genes mutated with relatively high frequency, limiting its effectiveness in identifying rare driver mutations.
Another approach, (Multi) Dendrix, aims to simultaneously identify multiple driver pathways, assuming mutual exclusivity of mutated genes among patients, using either a Markov chain Monte Carlo algorithm [190] or Integer Linear Programming (ILP) [101]. XSEQ [48] uses a probabilistic model to compute the influence of mutated genes over expression profile changes in other genes by considering direct gene interactions. Two other methods, PARADIGM [192] and PARADIGM-SIFT [125], use a Bayesian network to integrate genomic and transcriptomic data to infer pathways altered in a patient.

Network propagation based methods

In recent years, network propagation based methods have been used extensively for the identification of disease associated genes. The main assumption behind network propagation methods is that genes that belong to the same phenotype interact with each other, resulting in an amplified biological signal [41]. This group of methods aims to identify the genes that are in close proximity to the known disease genes in the signaling interaction network. The prior knowledge or experimental measurements obtained from the genomic, transcriptomic, proteomic, or epigenomic profile of one or more individuals are superimposed on the network. The signal from the “source” node is then propagated to a distant “target” node through the edges in the global interaction network. Instead of finding the single path connecting the source to the target, network propagation methods compute the fraction of the “flow” (originating from the source node) passing through each of the intermediate nodes/edges to the target node. The fraction of the flow imitates the probability of using the path in the information propagation process. The network propagation approach gives us the ability to incorporate multiple data-types (such as mutations, genomic aberrations, gene-expression, confidence level of interactions, and functional associations of genes) into probabilistic network models [35].
Due to its ability to predict distant interactions, network propagation is used in many different disciplines including computer science, engineering, physics, and biology. In biology, network propagation has been used in the context of gene function prediction, gene-module discovery, disease gene discovery, disease subtyping, and drug target prediction. Below, we elaborate further on network propagation based methods built to identify disease genes or cancer driver genes.

The current flow approach is one of the ways to model network propagation. It models the flow of current in an electrical circuit, where each edge has an associated resistance. It is based on the well-known analogy between random walks (discussed below) and electrical networks, where the amount of current entering a node or an edge in the network is proportional to the expected number of random walk visits to that node or edge. eQTL Electrical Diagrams (EQED) [166] integrates Expression Quantitative Trait Loci (EQTL) analysis with a molecular interaction network using the circuit network model. To the best of our knowledge, NetQTL [87] is the first method to link CNAs to expression profile changes within an interaction network and to connect specific “causal” aberrant genes with potential targets in the interaction network. Its authors formulated a Weighted Multi-Set Cover (WMSC) problem and provided a greedy solution to identify the set of causal genes.

Another network propagation approach is to use a random walk (also known as fluid diffusion, diffusion kernel, or graph kernel). A random walk, as its name indicates, propagates randomly starting from a known disease gene (i.e. a seed gene) to its neighbouring genes with equal probability or with a given prior probability. This iterative random walk process is halted after a certain number of steps.
In order to capture the local neighbourhood of the disease gene, a variant of this process known as Random Walk with Restart (RWR) is used as an alternative to the halting process. In RWR, a restart parameter specifies the probability with which the random walker returns to the seed nodes at each step of propagation. In this way, we can identify genes/proteins interacting with the disease gene, as they are the nodes most often visited during the random-walk simulation. This approach helps to prioritize genes/proteins and interactions on the basis of their potential involvement in a particular disease. The network propagation algorithm was first described by Kondor et al [92]. Network propagation algorithms have been used to analyze friendship networks, where edges represent similarity or affinity. Network propagation is also the basis of Google’s original PageRank algorithm [23].

Tu et al [183] used a random walk approach on a molecular interaction network to associate causal genes and pathways explaining a given association, and applied the method to data obtained from yeast knockout experiments. The methods by Köhler et al [91] and PRINCE [191] use variants of the random walk algorithm to prioritize disease associated genes/proteins. Köhler et al [91] demonstrated that random walk analysis of interaction networks outperforms local network-based methods, such as shortest path distances and direct interactions. Yeger-Lotem et al developed a method, ResponseNet [204], which was later expanded by Lan et al [94]. ResponseNet uses a network algorithm to relate genetic perturbations to the transcriptomic response in a yeast model, thereby identifying sub-networks of regulators mediating the interactions. ResponseNet formulates a minimum-cost flow optimization problem which aims to maximize the flow between the source and target while minimizing the cost of the connecting paths. Thus, by setting the cost of an edge to the negative log of its probability, a high-probability connecting sub-network is obtained.
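As a minimal sketch of the Random Walk with Restart idea used by several of the methods above, the following Python function repeats the propagation step until the node scores stop changing. The function name, the adjacency-dictionary input, the default restart probability of 0.3, and the four-node toy network are illustrative assumptions and are not taken from any of the cited tools.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visiting probability.

    adj     : dict mapping each node to a list of its neighbours
              (undirected network, every node has at least one neighbour)
    seeds   : set of seed nodes (e.g. known disease genes)
    restart : probability of jumping back to a seed at each step
    """
    nodes = list(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            # Spread the non-restarting mass of u equally over its neighbours.
            share = (1.0 - restart) * p[u] / len(adj[u])
            for w in adj[u]:
                nxt[w] += share
        change = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path network A - B - C - D with a single hypothetical seed gene "A".
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy_network, {"A"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

On this toy network the scores decrease with distance from the seed, which is the behaviour that makes RWR useful for ranking candidate genes in the neighbourhood of known disease genes.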
The ResponseNet authors provided a Linear Programming (LP) formulation to solve this optimization problem. HotNet [189] was the first method to use a network propagation (fluid diffusion) approach [136] to compute a pairwise influence measure between the genes in the (gene interaction) network and to identify sub-networks enriched with mutations. HotNet then derives a two-stage multiple hypothesis test to reduce the False Discovery Rate (FDR) in sub-network discovery. Another method, TieDIE [130], extends the heat diffusion strategies of HotNet by leveraging two different types of genomic inputs: mutated genes and transcriptional factors. TieDIE identifies a collection of pathways and sub-networks that associate a fixed set of driver genes with expression profile changes.

Another method, DriverNet [14], aims to correlate genomic alterations with expression profile changes of target genes, but only among direct interaction partners. The novel feature of DriverNet is that it aims to find the “minimum” number of potential drivers that can “cover” targets. DriverNet provides a greedy approximation algorithm to solve the optimization problem.

Hitting Time (HT), or First Passage Time, is an alternative approach for estimating node influence in the (gene interaction) network using network propagation. HT on a network is simply the expected minimum number of steps (hops) taken from a source node to reach a target node. Since HT relies on the global topology of the network, there are many possible paths that connect the source node to the target node, given the sparseness of biological networks. However, we cannot be certain about the probability of reaching a target node given the number of steps, or about which path is the most probable. For this reason, measuring the average hitting time (or Mean First Passage Time (MFPT)) is more reasonable for pairwise influence calculations [152].
To calculate the average HT, random walk simulation can be utilized, where the transitions out of a node may have equal probabilities or some pre-defined probabilities.

Liben-Nowell et al [106] were the first to make use of HT for the link-prediction problem on social networks. Average HT has previously been used for analyzing state transition graphs in probabilistic Boolean networks to identify gene perturbations that quickly lead to a desired state of the system [153]. Yao et al [203] estimated the closeness of a candidate gene to a disease of interest by computing the HT of a random walk that starts at the corresponding disease phenotype and ends at the candidate. Condamin et al [39] developed a method for computing exact hitting times in a complex network, depending on fractal dimension (i.e. density of nodes) and random walk dimension (i.e. source-target distance in the network). Torchala et al [181] extended this method using Hill’s algorithm, which makes use of transition probabilities between the nodes. They also demonstrated that Hill’s algorithm is an efficient method to calculate average HT in a network. This was later implemented in C++ as RaTrav [182].

1.3 Contributions

In this thesis, we focus on computational problems involving the identification of cancer driver genes, and their application to guide precision oncology. Our goal here is to design efficient network propagation based computational algorithms for cancer driver gene prioritization that integrate multi-omics cancer datasets. More specifically, we present the following contributions:

• We introduce HIT’nDRIVE [154, 155], a combinatorial algorithm that measures the potential impact of genomic aberrations on changes in the global expression of other genes/proteins which are in close proximity in a gene/protein-interaction network. HIT’nDRIVE then prioritizes those aberrations with the highest impact as cancer driver genes.
HIT’nDRIVE formulates the driver prioritization problem as a “random-walk facility location” (RWFL) problem, which differs from the standard facility location problem by its use of “hitting time”, the expected number of hops to reach a “target” gene from a “source” gene, as a distance measure in an interaction network. HIT’nDRIVE uses “inverse” hitting time as a measure of influence of a source gene over a target gene to identify the subset of sequence-wise altered/source genes whose overall influence over expression altered/target genes is the maximum possible.

• Using multi-omics data from different cancer types, we identified both known as well as rare (and potentially novel) patient-specific driver genes. We also demonstrate that by using HIT’nDRIVE-identified driver genes and associated “network modules” (sub-networks seeded by driver genes whose aggregate expression profiles correlate well with the cancer phenotype) as features, it is possible to perform accurate phenotype classification. In fact, we found a number of breast cancer subtype-specific driver modules that are associated with patients’ survival outcome. Finally, we demonstrate that HIT’nDRIVE-identified driver genes accurately predict drug efficacy in pan-cancer cell lines.

• We present a first-in-field comprehensive integrative multi-omics analysis of a patient cohort of treatment-naïve peritoneal mesothelioma (PeM) [156]. In a novel contribution, using HIT’nDRIVE, we identified PeM with BAP1 loss to form a distinct molecular subtype characterized by distinct gene expression patterns of chromatin remodeling, DNA repair pathways, and immune checkpoint receptor activation. We also demonstrate that this subtype is correlated with an inflammatory tumor microenvironment and is thus a candidate for immune checkpoint blockade therapies. Our findings reveal BAP1 to be a trackable prognostic and predictive biomarker for PeM immunotherapy that refines PeM disease classification.
This is significant because almost half of PeM cases are now candidates for these therapies. BAP1 stratification may improve drug response rates in ongoing phase-I and II clinical trials exploring the use of immune checkpoint blockade therapies in PeM in which BAP1 status is not considered. This integrated molecular characterization provides a comprehensive foundation for improved management of a subset of PeM patients.

• Another novel and significant contribution is that we resolved the large discordance between mRNA and protein expression patterns in the PeM cohort. Most of this discordance is attributed to chromatin remodeling genes and proteins linked to multimeric protein complexes, the majority of which are direct protein-interaction partners of BAP1. The discordance between the mRNA and the protein expression patterns is most likely due to the ubiquitination and degradation of proteins in these BAP1 regulated complexes to maintain functional stoichiometry.

• Lastly, we present a novel computational method, cd-CAP (combinatorial detection of Conserved Alteration Patterns), that primarily uses an ILP formulation to identify subnetworks of an interaction network, each with an alteration pattern conserved across (a large subset of) a tumor sample cohort. cd-CAP simultaneously identifies more than one subnetwork, and each gene within each subnetwork has labels specific to the alteration types it harbors. Notably, we demonstrate that many of the largest highly conserved subnetworks within a tumor type consist solely of genes that have been subject to copy number gain, typically located on the same chromosomal arm and thus likely the result of a single, large scale copy number amplification.
We have also demonstrated that the subnetworks identified using cd-CAP are associated with patients’ survival outcome and hence are clinically important.

In addition to our primary contributions to the driver gene identification problems mentioned above, our other contributions to the field of Computational Biology and Cancer Genomics can be found in [61, 109, 149, 198, 201, 202].

1.4 Organization of the thesis

The rest of the thesis is organized as follows:

• In Chapter 2, we introduce HIT’nDRIVE, a combinatorial algorithm to prioritize cancer driver genes. Then we present our experimental results exploring the behaviour of HIT’nDRIVE.

• In Chapter 3, we present an extensive analysis of multi-omics data from multiple cancer types using HIT’nDRIVE. Here we identify cancer driver genes in multi-omics cancer datasets as mentioned above and explore their network properties. Then we demonstrate the application of HIT’nDRIVE to cancer phenotype and subtype classification, and to drug efficacy prediction to guide precision oncology.

• Chapter 4 describes the integrative multi-omics characterization of a patient cohort of a rare cancer, peritoneal mesothelioma. Here we demonstrate the application of HIT’nDRIVE, which helped us define a novel molecular subtype of peritoneal mesothelioma. We predicted that this subtype would likely respond to immunotherapy.

• In Chapter 5, we introduce cd-CAP, a combinatorial algorithm to identify sub-networks with a conserved molecular alteration pattern across a large subset of a tumor sample cohort.
Then we present our experimental results analyzing multi-omics data from multiple cancer types using cd-CAP.

• Finally, in Chapter 6, we offer a summary and conclusion of our contributions to cancer driver gene identification, as well as a discussion of possible directions for future work.

Chapter 2

HIT’nDRIVE: an algorithm for cancer driver genes prioritization using hitting time

2.1 Introduction

Genomic and transcriptomic alterations are the major contributors to tumorigenesis and the progression of cancer. Over the past decade, high-throughput sequencing efforts have provided an unprecedented opportunity to identify these alterations in cancer that can lead to changes in gene regulation, protein structure, and function [161]. Genomic and transcriptomic data provide unique and complementary information about a particular tumor, but the translation of “big” molecular data into insightful and impactful patient outcomes is extraordinarily challenging [195]. As explained in Chapter 1, during tumor progression, cancer cells accumulate a multitude of genomic alterations, with most being inconsequential “passenger” alterations that are effectively neutral. However, a small fraction provide mission-critical “hallmark” functions and are known as “driver” alterations that modify transcriptional programs and therefore drive and sustain tumor progression [69, 161, 195]. The knowledge of driver alterations is foundational to guide the selection of appropriate therapies. For this, we need to better integrate different omics data-types and distinguish critical driver events from others.

Among the different strategies explained in Chapter 1, the ones based on mutual exclusivity still focus on frequent events. The others, based on “information flow” in gene/protein interaction networks, do not aim to discover cancer drivers, but rather are designed to identify dysregulated sub-networks or modules.
In addition, the notion of influence they employ is based on the stationary distribution of “information” originating at a particular gene/protein. As a result, none of the available methods aim to identify rare, patient-specific driver events based on a time dependent notion of influence. Finally, none of the available techniques aim to simultaneously consider different types of genomic alterations as potential drivers.

2.2 Our Contributions

To address the above challenges, in this chapter, we introduce a novel combinatorial method, HIT’nDRIVE [154, 155], which was first presented at the Research in Computational Molecular Biology (RECOMB) conference. HIT’nDRIVE is a combinatorial algorithm that measures the potential impact of genomic aberrations on changes in the global expression of other genes/proteins which are in close proximity in a gene/protein-interaction network. HIT’nDRIVE then prioritizes those aberrations with the highest impact as cancer driver genes. HIT’nDRIVE formulates the driver prioritization problem as a “random-walk facility location” (RWFL) problem, which differs from the standard facility location problem by its use of “hitting time”, the expected number of hops to reach a “target” gene from a “source” gene, as a distance measure in an interaction network. HIT’nDRIVE uses “inverse” hitting time as a measure of influence of a source gene over a target gene to identify the subset of sequence-wise altered/source genes whose overall influence over expression altered/target genes is the maximum possible.

Since the RWFL problem is NP-hard, we estimate the multi-hitting time based on the independent hitting times of the drivers to an expression outlier, which provides an upper bound on the multi-hitting time. Our experiments show that this estimate works well for the human protein interaction network.
More importantly, our estimate enables us to reduce the RWFL problem to a Weighted Multi-Set Cover (WMSC) problem, for which we give an ILP formulation.

2.3 HIT’nDRIVE Algorithmic Framework

HIT’nDRIVE links alterations at the genomic level to changes at the transcriptome level using a gene/protein interaction network. For that, it aims to find the smallest set of altered genes that can explain most of the observed transcriptional changes in the cohort. In other words, HIT’nDRIVE identifies the minimum number of potential drivers which can cause a user-defined proportion of the downstream expression effects observed. We formulate this as a Random Walk Facility Location (RWFL) problem, a combinatorial optimization problem that we introduce here. RWFL generalizes the classical Facility Location (FL) problem by changing the notion of distance it uses. Given a network, the FL problem defines the distance between a potential driver gene and an outlier gene as the length of the shortest path between them. The RWFL problem, in contrast, uses “hitting time” [39, 106], the expected length of a random walk between the two nodes, as their distance. Under the use of hitting time, the FL problem completely changes nature: in the classical FL formulation, the goal is to associate each outlier gene in the network with exactly one (the closest) driver gene. In the RWFL formulation, each outlier gene is associated with multiple drivers (whose collective distance to the outlier will no longer be the shortest pairwise distance), forming a many-to-many relation. Intuitively, hitting time measures how accessible a particular outlier gene is from potential drivers.
Thus the RWFL problem asks to find the smallest set of sequence-altered genes from which one can reach (a good proportion of) outliers within a user defined “multi-hitting time” - the expected length of the shortest random walk originating from any of the sequence altered genes, and ending at an outlier.

In order to capture the uncertainty of interactions of genes with their neighbours, HIT’nDRIVE considers a random walk process which propagates the effect of a sequence alteration in one gene to the remainder of the genes through the network. As a result, the influence is defined to be the inverse of the hitting time, which is the expected length (number of hops) of a random walk which starts at a given potential driver gene and “hits” a given target gene for the first time in an interaction network. More specifically, for any two nodes $u, v \in V$ of an undirected, connected graph $G = (V, E)$, let the random variable $\tau_{u,v}$ denote the number of hops in a random walk starting from $u$ and visiting $v$ for the first time. Then the hitting time $H_{u,v}$ is defined as $H_{u,v} = E[\tau_{u,v}]$ [104].

In order to capture synthetic lethality like scenarios, HIT’nDRIVE considers multiple aberrated genes as potential drivers. For that, we define the influence value (of a set of potential driver genes on a target) as the inverse of the multi-hitting time. More specifically, let $U \subseteq V$ be a subset of nodes of $G$ and $v \in V \setminus U$ be a single node. We then define the multi(source)-hitting time $H_{U,v}$ as $H_{U,v} = E[\min_{u \in U} \tau_{u,v}]$.

Now the RWFL problem for a single patient can be described as follows. Let $\mathcal{X}$ be a set of potential driver genes and $\mathcal{Y}$ be a set of expression altered (outlier) genes. Then, for a user defined $k$, HIT’nDRIVE can aim to return $k$ potential driver genes as a solution to the following optimization problem:

$$\arg\min_{X \subseteq \mathcal{X},\ |X| = k} \ \max_{y \in \mathcal{Y}} H_{X,y}$$

where $H_{X,y}$ denotes the multi-hitting time from the gene set $X$ to the gene $y$.

Like the standard facility location problem, RWFL is NP-hard.
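For intuition, pairwise hitting times on a small network can be computed exactly: conditioning on the first step of an unweighted walk gives $H_{v,v} = 0$ and $H_{u,v} = 1 + \frac{1}{\deg(u)} \sum_{w \in N(u)} H_{w,v}$ for $u \neq v$. The sketch below solves this linear system by Gaussian elimination. The function name and the toy path network are hypothetical, and this is an illustration of the definition rather than the implementation used in HIT’nDRIVE.

```python
def hitting_times_to(adj, target):
    """Return {u: H[u, target]} for an unweighted random walk.

    adj    : dict mapping each node to a list of its neighbours
             (undirected, connected network)
    target : node at which the walk stops
    """
    others = [u for u in adj if u != target]
    idx = {u: i for i, u in enumerate(others)}
    n = len(others)
    # Augmented matrix of (I - P) h = 1, restricted to non-target nodes.
    rows = [[0.0] * (n + 1) for _ in range(n)]
    for u in others:
        i = idx[u]
        rows[i][i] = 1.0
        rows[i][n] = 1.0
        for w in adj[u]:
            if w != target:  # steps into the target contribute H = 0
                rows[i][idx[w]] -= 1.0 / len(adj[u])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [x / scale for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0.0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    hitting = {u: rows[idx[u]][n] for u in others}
    hitting[target] = 0.0
    return hitting


# Toy path network a - b - c; the node names are placeholders.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
h = hitting_times_to(path, "c")
for source in ("a", "b"):
    print(source, "-> c: hitting time", h[source], "influence", 1.0 / h[source])
```

On this path the walk needs 4 hops on average to reach "c" from "a" and 3 hops from "b", so under the inverse hitting time measure "b" has the larger influence on "c".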
Indeed, even the problem of computing the multi-hitting time between a set of nodes in a network and a particular target node is difficult. We overcome this difficulty by introducing a good estimate of the multi-hitting time that helps us to reduce the RWFL problem to the Weighted Multi-Set Cover (WMSC) problem, which we solve through an ILP formulation. (Although the use of set-cover for representing the most parsimonious solution in a bioinformatics context is not new [75], to the best of our knowledge this is the first use of the multi-set cover formulation for maximum parsimony.) In this formulation, we use a slightly different objective: given a user defined upper bound on the maximum multi-hitting time, we now aim to minimize the number of potential drivers that can “cover” (a user defined proportion of) the outlier genes. For more than one patient, we minimize the number of drivers that can “cover” (a user defined proportion of) patient-specific outliers such that each such outlier is covered by potential drivers that are aberrant in that patient.

2.3.1 Reformulation of RWFL as a Weighted Multi-Set Cover (WMSC) Problem

For simplicity, we first describe how HIT’nDRIVE works on single patient data. Given an interaction network with $\mathcal{X}$ denoting the set of sequence-altered genes (through SNVs or SVs) and $\mathcal{Y}$ denoting the set of expression-altered genes, HIT’nDRIVE computes the smallest subset of $\mathcal{X}$ whose joint “influence” over (a user defined fraction of) expression-altered genes is sufficiently high (i.e. above a user defined threshold). The influence of a set of (sequence-altered) genes $X$ over an expression-altered gene $g$ is defined as $1/\mathrm{MHT}(X, g)$, where $\mathrm{MHT}(X, g)$ denotes the multi-hitting time, the expected length of the shortest random walk originating at each one of the genes in $X$ that ends at $g$.
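The single-patient objective described above can be illustrated on a toy instance by exhaustive search. The sketch below assumes, purely for illustration, that the joint influence of a gene set on an outlier is approximated by the sum of pairwise influences. The function name, the parameter names `min_influence` and `alpha`, and all numbers are hypothetical, and the exponential-time search is only meant for tiny inputs.

```python
from itertools import combinations


def smallest_influential_set(influence, min_influence, alpha):
    """Find a smallest driver set covering an `alpha` fraction of outliers.

    influence     : dict driver -> dict outlier -> pairwise influence
                    (for example an inverse hitting time)
    min_influence : an outlier counts as covered when the summed influence
                    of the chosen drivers reaches this threshold
    alpha         : fraction of outliers that must be covered
    Returns a set of drivers, or None if no subset satisfies the constraints.
    """
    drivers = sorted(influence)
    outliers = sorted({g for d in drivers for g in influence[d]})
    needed = alpha * len(outliers)
    for size in range(1, len(drivers) + 1):  # try small sets first
        for subset in combinations(drivers, size):
            covered = sum(
                1
                for g in outliers
                if sum(influence[d].get(g, 0.0) for d in subset) >= min_influence
            )
            if covered >= needed:
                return set(subset)
    return None


# Hypothetical pairwise influences of three altered genes on three outliers.
toy = {
    "x1": {"y1": 0.5, "y2": 0.1},
    "x2": {"y2": 0.4, "y3": 0.35},
    "x3": {"y1": 0.1, "y3": 0.1},
}
print(smallest_influential_set(toy, min_influence=0.3, alpha=1.0))
print(smallest_influential_set(toy, min_influence=0.3, alpha=0.6))
```

With every outlier required to be covered, the toy instance needs both "x1" and "x2"; relaxing the coverage fraction to 0.6 lets "x2" alone suffice, which shows how the user defined fraction trades the number of selected drivers against coverage.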
HIT’nDRIVE therefore aims to solve the RWFL problem in a network where $\mathcal{X}$ are the “potential facilities” and $\mathcal{Y}$ are the “requests”.

Since RWFL is a computationally hard problem and cannot be solved in a reasonable amount of time in its original formulation, we reduce the RWFL problem to the WMSC problem, for which we give an ILP formulation. Intuitively, in this new formulation, HIT’nDRIVE associates the genomic alterations with transcriptomic changes in the form of a bipartite graph $G_{bip}(\mathcal{X}, \mathcal{Y}, \mathcal{E})$, where $\mathcal{X}$ is the set of aberrant genes, $\mathcal{Y}$ is the set of patient-specific expression-altered genes, and $\mathcal{E}$ is the set of edges. If gene $x_i$ is mutated in a patient $p$, we set edges between $x_i$ and all of the expression altered genes in the same patient $(y_j, p)$, where the edges are weighted by the inverse pairwise hitting times $w_{ij} := H^{-1}_{x_i, y_j}$ (Figure 2.1A). The WMSC problem on this representation of data asks to find the smallest subset of $\mathcal{X}$ (as potential drivers) whose total influence (sum of pairwise influence values) over a user defined fraction of expression-altered genes (for each patient) is sufficiently high.

The reduction from the RWFL problem to the WMSC problem is achieved by estimating the multi-hitting time as a function of the independent hitting times of the drivers to an outlier, which provides an upper bound on the multi-hitting time. The exact individual hitting times are calculated by a matrix inversion method [173]. The resulting WMSC problem can then be formulated as the ILP below, which is efficiently solvable by CPLEX (within minutes) for all data sets we considered.

$$
\begin{aligned}
\min_{x_1, \ldots, x_{|\mathcal{X}|}} \quad & \sum_i x_i \\
\text{s.t.} \quad & \forall i, j:\ x_i = e_{ij} \\
& \forall j:\ \sum_i e_{ij} w_{ij} \ge y_j \, \gamma \, \lambda_j \sum_i w_{ij} \\
& \sum_j y_j \ge \alpha \, |\mathcal{Y}| \\
& \forall p:\ \arg_{\beta \text{ fraction of highest } \lambda_j}(y_j) = 1 \\
& x_i, e_{ij}, y_j \in \{0, 1\}
\end{aligned}
$$

The above ILP formulation for the WMSC problem introduces binary variables $x_i$, $y_j$, $e_{ij}$, respectively, for each potential driver, expression-alteration event, and edge in the bipartite graph. The objective of the ILP is to minimize the number of drivers (i.e. the sum of $x_i$ values) subject to four constraints. The first constraint ensures that a selected driver contributes to the coverage of each of the expression alteration events it is connected to (in each patient, if multiple patients are available). The second constraint ensures that the selected (patient-specific) driver genes contribute enough to cover at least a $\gamma$ fraction of the sum of all incoming edge weights to each expression alteration event. This constraint corresponds to setting a lower bound on our estimate of the inverse of the multi-hitting time of the selected (patient-specific) drivers on an expression alteration event, or equivalently an upper bound on the estimated multi-hitting time. The third constraint ensures that the selected driver genes collectively cover at least an $\alpha$ fraction of the set of expression alteration events. The fourth constraint ensures that, for each patient, the top $\beta$ fraction of expression altered genes with the highest weights ($\lambda_j$) are always covered.

As indicated above, our ILP formulation for the WMSC problem can be generalized to multiple patients with the objective of minimizing the total number of driver genes across all patients, subject to the constraint that a user-defined proportion of outlier genes in each of the patients is covered by the subset of drivers present in that patient.

In order to quantitatively assess the genes identified by HIT’nDRIVE, we extended our previously developed algorithm, OptDis [44], for de novo identification of modules of small size inside the interaction network which contain (i.e. are seeded by) at least one predicted driver. The modules are chosen so that their discriminative power (for phenotype classification) is the greatest among connected sub-networks of similar size that contain the individual predicted drivers. In general, OptDis performs supervised dimensionality reduction on the set of connected sub-networks.
OptDis projects the high dimensional space of all connected sub-networks to a user-specified lower dimensional space of sub-networks such that, in the new space, the samples belonging to the same class are closer and the samples from different classes are more distant from each other (i.e. it minimizes in-class distance and maximizes out-class distance) with respect to a normalized distance measure (typically L1). Then we use module features (average expression of genes in the module) for phenotype classification (Figure 2.1B-C). Using such module features, we hope that the classifier in use does not overfit on rare drivers and is able to generalize the signal coming from rare drivers to new patients. We report the classification accuracy based on the identified driver-seeded modules as a means of quantitative validation of our results (in the absence of ground truth). We also look at the genes that build the chosen modules (of high classification accuracy) in an attempt to identify cancer-related pathways.

2.4 Results

We have implemented HIT’nDRIVE in C++ and solved the ILP using IBM CPLEX version 12.5.1. We first tested the behaviour and robustness of HIT’nDRIVE given different parameters used in the algorithm. These in silico experiments were performed using multi-omics data from four major cancer types - Glioblastoma multiforme (GBM) [175], Ovarian serous cystadenocarcinoma (OV) [176], Breast adenocarcinoma (BRCA) [177], and Prostate adenocarcinoma (PRAD) [178] - obtained from The Cancer Genome Atlas (TCGA) data portal. Here we describe the results exploring the behaviour of the HIT’nDRIVE algorithm when used for the analysis of multi-omics cancer datasets. The biologically motivated results obtained using HIT’nDRIVE are extensively discussed in Chapter 3.

2.4.1 HIT’nDRIVE parameters

HIT’nDRIVE uses three user-specified input parameters:

1. α: fraction of outliers to be covered overall (across all patients)
2. β: fraction of outliers to be covered in each patient
3. γ: fractional lower bound on the sum of the incoming edge weights from driver genes selected by HIT’nDRIVE

HIT’nDRIVE is robust with respect to changes in α and β but is somewhat sensitive to γ (Figure 2.2A-B), as expected. However, as γ grows, the driver genes identified by HIT’nDRIVE do not change but simply grow in number by the addition of new driver genes, which indicates robustness of our method with respect to γ as well.

2.4.2 HIT’nDRIVE: expression outlier stringency

The higher the stringency we apply on the expression value change in a potential outlier, the fewer outliers we will identify, which in turn will result in a smaller number of driver genes. However, the new set of driver genes obtained is, in general, a subset of the first set of driver genes, again indicating robustness (Figure 2.2C-D).

2.4.3 HIT’nDRIVE: random alterations and random expression outliers

We compared the HIT’nDRIVE predictions of driver genes among observed mutations with those obtained through randomized mutations (Figure 2.2E) and random outliers (Figure 2.2F). There is a stark contrast between the two sets of driver gene predictions with respect to their overlap with the Cancer Gene Census (CGC) [59] data set - conserved through different values of the γ parameter (the overlap is generally preserved across various settings of the remaining two parameters, namely α and β). Driver genes predicted from the non-randomized alteration (or non-randomized outlier) data (i) included a higher number of CGC genes (i.e. a larger number of true driver genes) than driver genes predicted from the randomized alteration (or randomized outlier) data, and (ii) grew quickly in their number of CGC genes with increasing γ, whereas this number stayed roughly the same when randomized data were used.
Note that while performing randomization, the original gene labels (sequence-wise altered genes or expression-outlier genes) were randomly replaced by new ones while preserving their recurrence frequency distributions.

2.4.4 HIT’nDRIVE: network perturbation

We used the STRING v10 network for our analysis. The edges of the STRING v10 network were perturbed to different extents (between 1-10%) while preserving the degree of the nodes in the network. HIT’nDRIVE analysis was performed using the different perturbed networks. The proportion of common driver genes between the unperturbed network and each of the perturbed networks was calculated (Figure 2.3F). We observed that the list of driver genes did not change to a great extent (i.e. the overlap of driver genes with the non-perturbed network was very high) even when the edges of the network were perturbed by up to 10%. This demonstrates that HIT’nDRIVE is robust to perturbations of the network.

2.4.5 HIT’nDRIVE: underlying network

We evaluated the robustness of HIT’nDRIVE on three networks, namely STRING, HPRD and REACTOME. Only 34% of the vertices are shared among all three networks; an even smaller proportion of the edges is shared. Not surprisingly, the more nodes the network has, the more driver genes HIT’nDRIVE predicts. This is consistently observed across various parameter settings. What is noteworthy is that the percentage overlap between the driver genes predicted on the three networks is quite robust, i.e., the percentage of driver genes shared between all three networks is preserved across various parameter settings - e.g. this overlap is above 60% between REACTOME and either of the other two networks, across various values of γ. In fact, the driver genes predicted on STRING are almost a superset of those predicted on REACTOME.
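The perturbation and overlap measurements described in Sections 2.4.4 and 2.4.5 can be illustrated with a minimal Python sketch. It assumes that the degree-preserving perturbation is done by double-edge swaps, a standard choice that the text does not confirm; the function names `perturb_edges` and `shared_fraction` and the toy inputs are hypothetical and are not part of the HIT’nDRIVE implementation.

```python
import random


def perturb_edges(edges, fraction, seed=0, max_tries_per_swap=100):
    """Rewire roughly `fraction` of the edges of a simple undirected graph.

    Assumption: perturbation is done by double-edge swaps, which replace
    (a, b), (c, d) with (a, d), (c, b), so every node keeps its degree.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    n_swaps = int(fraction * len(edge_list) / 2)  # each swap rewires two edges
    for _ in range(n_swaps):
        for _ in range(max_tries_per_swap):
            i, j = rng.sample(range(len(edge_list)), 2)
            a, b = edge_list[i]
            c, d = edge_list[j]
            if rng.random() < 0.5:  # pick one of the two possible pairings
                c, d = d, c
            new1 = tuple(sorted((a, d)))
            new2 = tuple(sorted((c, b)))
            # Reject swaps that would create self-loops or duplicate edges.
            if len({a, b, c, d}) < 4 or new1 in edge_set or new2 in edge_set:
                continue
            edge_set -= {edge_list[i], edge_list[j]}
            edge_set |= {new1, new2}
            edge_list[i], edge_list[j] = new1, new2
            break
    return edge_list


def shared_fraction(reference_drivers, other_drivers):
    """Fraction of the reference driver set that is also in the other set."""
    reference = set(reference_drivers)
    if not reference:
        return 0.0
    return len(reference & set(other_drivers)) / len(reference)
```

In such a setup, one would run the driver prioritization once on the original edge list and once on each perturbed edge list, and report `shared_fraction` of the two resulting driver sets.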
See Figure 2.3A-E.

2.4.6 Modified HIT’nDRIVE: when it is not required to prioritize at least one driver gene per patient.

In HIT’nDRIVE, at least one gene is picked per patient (i.e. when β > 0). This constraint is based on the implicit assumption that at least one causal mutation should be driving the cancer. There could be exceptions to this: for example, the driver event could be something other than a genomic alteration, such as methylation or aberrant expression of a regulatory RNA or a metabolite. Such events could all be incorporated in our framework given matching data, which unfortunately are not available through TCGA. There are also important performance issues related to the value of β: (1) Setting β > 0 significantly improves the robustness of our method with respect to the α parameter. In Figure 2.4, it can be observed that the α parameter has minimal effect on the output of our method, provided β is non-zero. If β = 0 (i.e. patients do not necessarily have a driver gene), our method is less robust, as can be seen in Figure 2.4B. In Figure 2.4C, especially for small values of α, the number of patients that do not have a driver gene increases as the value of γ decreases. In the worst case, ∼40% of patients do not report a driver gene; this happens when α = 0.5 and γ = 0.02. To guarantee robustness, the γ value should be set above 0.2 and the α value above 0.7, which reduces the fraction of patients with no driver genes to 5%. (2) Setting β = 0 significantly increases the running time of our method, from a couple of minutes to several days on very large datasets.

2.4.7 HIT’nDRIVE’s ability to capture CGC genes

To check whether HIT’nDRIVE is able to capture the true driver genes, we performed the following analysis. For the sake of this analysis, let us first assume that the cancer-type specific genes listed in the CGC database are the true driver genes, i.e. the ground truth.
We predicted potential driver genes in patients from four major cancer types using HIT’nDRIVE (for details see Chapter 3). For every patient analyzed, we compared the input (i.e. all sequence-wise altered genes) and the output (i.e. the subset of the input sequence-wise altered genes that are predicted as potential driver genes) of HIT’nDRIVE. We compared the number of CGC true driver genes present in the input data with the number of CGC true driver genes captured by HIT’nDRIVE.

Figure 2.5A-D summarizes the results of this analysis. As can be seen, the likelihood of a sequence-wise altered CGC gene being prioritized by HIT’nDRIVE is much higher than that of a non-CGC gene. Next, for each patient, we calculated the likelihood of HIT’nDRIVE capturing CGC genes (see Section 2.6.6 for details). We found that the majority of the samples analyzed have a highly significant p-value (i.e. < 0.01) (Figure 2.5E). This analysis demonstrates that HIT’nDRIVE is able to capture cancer driver genes, to a large extent, in the patient samples analyzed.

2.4.8 Correlation of predicted driver genes with alteration burden.

To obtain the mutation rate, we calculated the somatic mutation frequency per Mb (considering mutations in protein-coding genes only). We obtained copy-number burden values (i.e. the percentage of the genome changed by somatic copy-number alterations) using BioDiscovery Nexus Copy Number software. Figure 2.6A summarizes the correlation between mutation rate and copy-number burden. As reported in many recent studies, samples in OV, PRAD and BRCA had high copy-number burden. In the case of GBM, the majority of samples had more or less equal mutation and copy-number burden.

Figure 2.6B shows the correlation of the number of HIT’nDRIVE-predicted driver genes with mutation rate. Except for a few highly mutated samples in BRCA, the number of driver genes predicted by HIT’nDRIVE was not correlated with the somatic mutation rate of the respective sample.
Finally, Figure 2.6C shows the correlation of the number of HIT’nDRIVE-predicted driver genes with copy-number burden. Here too we observed that the number of HIT’nDRIVE-predicted driver genes was largely independent of the somatic copy-number burden in the genome. Therefore, except for the hypermutated cases, the number of HIT’nDRIVE-predicted driver genes is independent of both mutation rate and copy-number burden.

2.4.9 Phenotype classification using dysregulated modules seeded with the predicted driver genes

Evaluating computational methods for predicting cancer driver genes is challenging in the absence of ground truth (i.e. follow-up biological experiments). Therefore, we mainly focused on testing whether our predictions provide insight into the cancer phenotype and improve classification accuracy on an independent cancer dataset. To test the association of the driver genes identified by HIT’nDRIVE with the cancer phenotype, as explained in the earlier section, we used driver-gene-seeded gene modules, i.e. sets of functionally related genes (e.g. in a signaling pathway) from the protein interaction network, as features for classifying the cancer phenotype. Using OptDis (here referred to as HIT’nDRIVE-OptDis), we identified small connected sub-networks that include (i.e. are seeded by) predicted driver genes in a greedy fashion. More specifically, we prioritized sub-networks (of at most seven genes) iteratively so that in each iteration we identified the sub-networks that maximally discriminate sample phenotypes in a gene-expression matrix, among the sub-networks that share very few genes (at most 20%) with the sub-networks already prioritized.

Furthermore, we have also developed an unsupervised method for module identification (here referred to as HIT’nDRIVE-unsupervised), i.e. one that does not depend on any phenotype information. This unsupervised method seeds each module with one HIT’nDRIVE-identified driver gene and includes the outlier genes that the driver has influence over and co-occurs with significantly across patients. For this, we perform a hypergeometric test to identify significant driver-outlier interaction (i.e. mutual presence) pairs across the patient cohort (p-value < 10^-3).

Here we compare HIT’nDRIVE-OptDis and HIT’nDRIVE-unsupervised to another network-based driver gene prioritization method - DriverNet [14]. DriverNet itself does not aim to identify modules that we can compare against HIT’nDRIVE-OptDis or HIT’nDRIVE-unsupervised modules. Rather, DriverNet identifies driver genes in an iterative fashion, where in each iteration it picks the driver gene which “covers” the maximum number of uncovered outliers. We use this driver and the outlier genes it covers as the “next” DriverNet module.

We used the set of prioritized sub-networks, i.e. the driver modules, first to perform binary sample classification: tumor vs normal. For this, we used gene-expression data for each of the four cancer types (GBM, OV, PRAD and BRCA) from TCGA as discovery datasets to calculate the mean gene expression value for each sub-network/driver module, for each patient. On these sub-networks, we used the k-nearest neighbour (KNN) classifier (with k = 1) to perform classification on both the expression values from TCGA and additional validation gene-expression datasets (Figure 2.7A-C). The additional validation datasets were used in order to assess the capability of the modules identified on the TCGA cohort to classify other cohorts.

For every dataset analyzed, the maximum classification accuracy achieved by HIT’nDRIVE modules (either HIT’nDRIVE-unsupervised or HIT’nDRIVE-OptDis), for any number of modules considered, was higher than that achieved by DriverNet modules (Figure 2.7A).
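As a rough illustration of the classification step described above, the sketch below averages expression over the genes of each module to obtain one feature per module and then labels a sample with a 1-nearest-neighbour rule. The L1 distance, the function names and the toy gene identifiers (G1, G2, G3) are assumptions made for illustration; the text does not state which distance the KNN classifier used.

```python
def module_features(expression, modules):
    """One feature per module: the mean expression of the module's genes.

    expression: {gene: value} for a single sample.
    modules: list of gene lists (hypothetical driver-seeded modules).
    Genes missing from the sample are skipped (an illustrative choice).
    """
    features = []
    for genes in modules:
        values = [expression[g] for g in genes if g in expression]
        features.append(sum(values) / len(values) if values else 0.0)
    return features


def l1_distance(u, v):
    """L1 (Manhattan) distance; assumed here, not confirmed by the text."""
    return sum(abs(a - b) for a, b in zip(u, v))


def predict_1nn(train_X, train_y, x):
    """Label of the single nearest training sample (KNN with k = 1)."""
    nearest = min(range(len(train_X)), key=lambda i: l1_distance(train_X[i], x))
    return train_y[nearest]


def accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(predict_1nn(train_X, train_y, x) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Toy usage with made-up values.
modules = [["G1", "G2"], ["G3"]]
sample = {"G1": 1.0, "G2": 3.0, "G3": 5.0}
features = module_features(sample, modules)  # [2.0, 5.0]
```

Training on a discovery cohort and calling `accuracy` on a separate validation cohort mirrors the discovery/validation split described in the text.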
Moreover, in most datasets, HIT’nDRIVE methods achieve maximum or near-maximum accuracy using a smaller fraction of modules. All three methods achieved perfect or near-perfect classification accuracy in the TCGA-GBM, TCGA-OV and TCGA-BRCA datasets; the exception was the TCGA-PRAD dataset, where the maximum classification accuracy achieved was 90% by HIT’nDRIVE-unsupervised, 95% by HIT’nDRIVE-OptDis and 86% by DriverNet. Overall, the driver modules (identified in one cohort) were able to distinguish the tumor phenotype from normal very well in validation datasets (from other cohorts), supporting the relevance of the identified driver genes to the cancer phenotype.

2.4.10 CGC cancer type-specific gene enrichment.

Next, we looked into the lists of driver genes prioritized by HIT’nDRIVE and DriverNet and their overlap with the known CGC genes (Figure 2.7B). DriverNet selects a much larger number of driver genes than HIT’nDRIVE to cover most outlier genes (across all four cancer types) because its model considers only direct interactions in the network. In particular, in OV and BRCA, the number of HIT’nDRIVE-identified driver genes is an order of magnitude smaller than that of DriverNet. Although in the GBM and PRAD datasets the number of driver genes identified by DriverNet is somewhat lower and comparable to that identified by HIT’nDRIVE (primarily because most outliers were filtered out due to sharing no interaction edge with candidate altered genes), HIT’nDRIVE-identified driver genes cover a significantly larger number of outliers. More importantly, even though HIT’nDRIVE identifies a smaller number of driver genes, a larger fraction of these driver genes can be found in the CGC database in comparison to the DriverNet-identified driver genes.
In fact, an even larger fraction of CGC genes specific to the relevant cancer type can be found among HIT’nDRIVE-identified driver genes. Specifically, HIT’nDRIVE predicted four glioblastoma-specific CGC genes (IDH1, PDGFRA, PIK3CA and PIK3R1) in the TCGA-GBM dataset. Among them, IDH1, PDGFRA and PIK3CA were not identified by DriverNet. Similarly, four ovarian cancer-specific CGC genes (BRCA1, BRCA2, CCNE1 and MAPK1) were predicted in the TCGA-OV dataset; CCNE1 was not identified by DriverNet. Five prostate cancer-specific CGC genes (BRAF, ERG, FOXA1, PTEN and SPOP) were predicted in the TCGA-PRAD dataset; BRAF and SPOP were not identified by DriverNet. And seven breast cancer-specific CGC genes (BRCA2, CCND1, CDH1, GATA3, MAP3K1, PIK3CA and TP53) were predicted in the TCGA-BRCA dataset. Among them, CDH1 and MAP3K1 were not identified by DriverNet.

2.4.11 Phenotype classification using CGC gene seeded modules

To evaluate the difference between HIT’nDRIVE-predicted driver genes and a list of known driver genes (from CGC), we performed the following experiments. First, using HIT’nDRIVE-OptDis, we compared the HIT’nDRIVE driver-seeded modules with CGC gene-seeded modules for classifying tumor vs normal samples in the TCGA-PRAD patient cohort. Note that among the four TCGA cancer cohorts we study in this chapter, only the PRAD cohort includes a non-trivial number of patients with no known driver genes (based on an unpublished study by the PCAWG project) and thus provides a good testbed for novel driver gene identification by HIT’nDRIVE. As can be seen, HIT’nDRIVE-identified driver-seeded modules provide higher classification accuracy, potentially due to novel driver genes identified by HIT’nDRIVE. The top HIT’nDRIVE modules associated with PRAD are seeded by (in the order of discriminative ability) ERG, ACAN, FOXA1, ERG, PTEN and CDKN1B (Figure 2.8A). All but ACAN are CGC genes associated with PRAD.
HIT’nDRIVE successfully identified all these driver genes without the use of any information related to known PRAD driver genes from CGC. In addition, HIT’nDRIVE identified ACAN, a non-CGC gene, as a potential driver gene of PRAD. In comparison, the modules identified for CGC PRAD driver genes were seeded by (again in the order of discriminative ability) ERG, FOXA1, NCOR2, BRAF, ERG and AR - missing PTEN, potentially due to a large overlap with other modules. Overall, the modules seeded by HIT’nDRIVE-identified driver genes provide a higher accuracy in discriminating PRAD than those seeded by CGC PRAD driver genes.

Next, we compared HIT’nDRIVE driver genes to CGC genes in breast cancer subtypes in the TCGA-BRCA patient cohort. Note that breast cancer is possibly the best studied cancer type with respect to driver genes. Thus it is not surprising that the Basal, Her2 and Luminal-B subtypes show negligible differentiation between HIT’nDRIVE predictions and CGC-based predictions (Figure 2.8B). This is due to the large overlap between HIT’nDRIVE-discovered modules and CGC modules (e.g. in Basal, the top 4 HIT’nDRIVE modules almost perfectly match the top 4 CGC modules). However, HIT’nDRIVE shows some advantage in Luminal-A, where it outperformed the CGC genes from the 43rd module onward. This may be due to HIT’nDRIVE-predicted driver genes (seeds) such as DMD, ROCK1, AGAP1 and SHANK2, which are not part of CGC and which play an important role in cancer.

2.5 Discussion

Here, we have presented a network-based combinatorial method, HIT’nDRIVE, which models the collective effects of sequence altered genes on expression altered genes.
HIT’nDRIVE aims to solve the “random-walk facility location” (RWFL) problem on a gene/protein interaction network. RWFL differs from the standard facility location problem by its use of “hitting time”, the expected minimum number of hops in a random walk originating from any sequence altered gene (i.e. a potential driver) to reach an expression altered gene, as a distance measure. We introduced the notion of “multi-hitting time” and presented efficient and accurate methods to estimate it based on single-source hitting time in large-scale networks. HIT’nDRIVE reduces RWFL (with multi-hitting time as the distance) to a weighted multi-set cover problem, which it formulates and solves as an ILP.

As a measure of influence, hitting time (the expected length of a random walk between two nodes), or its generalization, the multi-hitting time, is quite different from diffusion-based measures or Rooted PageRank, which are based on asymptotic distributions. We argue that hitting time is a better measure for our purposes as it is (i) parameter free (the diffusion model introduces at least one additional parameter - the proportion of incoming flow “consumed” at a node in each time step), (ii) time dependent (while the diffusion model and PageRank measure the stationary behavior) and (iii) more robust with respect to small perturbations in the network [74].

In this chapter, we demonstrated the robustness of HIT’nDRIVE in identifying cancer driver genes in multi-omics cancer datasets using a number of different experiments: varying the user-defined parameters of HIT’nDRIVE, randomizing the input data, randomizing the interaction network, and using different interaction networks. We also demonstrated that HIT’nDRIVE is able to capture cancer driver genes, to a large extent, in the tumors analyzed.
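To make the hitting-time notion used in this discussion concrete, the following sketch computes hitting times to a single target node on a small graph by iterating the standard recurrence h(u) = 1 + Σ_v P(u, v) h(v), with h(target) = 0. It assumes transition probabilities proportional to edge weights and a connected graph; it does not implement the multi-hitting time estimate or the ILP, and the function name and toy graph are hypothetical.

```python
def hitting_times(adjacency, target, max_iter=100000, tol=1e-12):
    """Expected number of random-walk steps to first reach `target`.

    adjacency: {node: {neighbour: weight}} for a connected undirected graph.
    Assumption: the walk moves to a neighbour with probability proportional
    to the edge weight. Solved by Gauss-Seidel style fixed-point iteration.
    """
    h = {node: 0.0 for node in adjacency}
    for _ in range(max_iter):
        largest_change = 0.0
        for node, neighbours in adjacency.items():
            if node == target:
                continue  # h(target) stays 0 by definition
            total = sum(neighbours.values())
            updated = 1.0 + sum(w / total * h[v] for v, w in neighbours.items())
            largest_change = max(largest_change, abs(updated - h[node]))
            h[node] = updated
        if largest_change < tol:
            break
    return h


# Toy path graph A - B - C with unit weights (made-up example).
path = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0},
}
times_to_c = hitting_times(path, "C")
```

On this three-node path the recurrence gives h(B) = 1 + h(A)/2 and h(A) = 1 + h(B), so the walk needs 3 expected steps from B and 4 from A to reach C.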
Furthermore, we demonstrated that it is also possible to perform accurate phenotype prediction for tumor samples by using only HIT’nDRIVE-implied driver genes and their “network modules of influence” (small sub-networks involving each driver gene where the aggregate expression profile correlates well with the cancer phenotype) as features, providing additional evidence that these genes may be driving the cancer phenotype. The network modules we identified may provide new insights into the biological mechanisms underlying tumor progression.

2.6 Methods

2.6.1 Datasets and Analysis

We used publicly available datasets of four major cancer types - glioblastoma multiforme (GBM) [175], ovarian serous cystadenocarcinoma (OV) [176], breast adenocarcinoma (BRCA) [177], and prostate adenocarcinoma (PRAD) [178] - from The Cancer Genome Atlas (TCGA) project. All data were obtained from the TCGA data portal in May 2014 and were mapped to the GRCh37 genome build. Although TCGA has recently made available all data re-aligned to the newer GRCh38 genome build, to ensure compatibility, all TCGA data used in this study were mapped to GRCh37.

Somatic mutation

Somatic mutation calls (level 2 data) from all available platforms/centres were merged. Only missense, nonsense and splice-site mutations were marked as somatic-mutation alteration events.

Copy number aberrations (CNAs)

For GBM and OV, Agilent Human Genome CGH Microarray 244A (level 1) data files were used, and for PRAD and BRCA, Affymetrix Genome-Wide Human SNP Array 6.0 (level 3) data files were used to generate the copy number profiles.

These Agilent FE format sample files were loaded into BioDiscovery Nexus Copy Number software v7.0, where quality was assessed and data were visualized and analyzed. All samples were mapped to the most recent genome build (hg19, NCBI build 37) via Agilent probe identifiers and annotation (downloaded from Agilent’s website) based on the 1M SurePrint G3 Human CGH Microarray 1x1M design platform.
BioDiscovery’s FASST2 segmentation algorithm, a Hidden Markov Model based approach, was used to make copy number calls. The FASST2 algorithm, unlike other common HMM methods for copy number estimation, does not aim to estimate the copy number state at each probe but uses many states to cover more possibilities, such as mosaic events. These state values are then used to make calls based on a log-ratio threshold. The significance threshold for segmentation was set at 5 × 10^-6, also requiring a minimum of 3 probes per segment and a maximum probe spacing of 1000 between adjacent probes before breaking a segment. The log-ratio thresholds for single copy gain and single copy loss were set at 0.2 and -0.23, respectively. The log-ratio thresholds for two or more copy gain and homozygous loss were set at 1.14 and -1.1, respectively. Upon loading of raw data files, signal intensities were normalized via division by the mean. All samples were corrected for GC wave content using a systematic correction algorithm. Only the high-confidence copy number aberrations, i.e. high copy number gains or homozygous deletions, were marked as copy-number aberrant events. Finally, genes that harbour either a somatic-mutation aberrant event or a copy-number aberrant event were taken to be the final list of aberrant genes at the genomic level.

Gene expression

We used microarray-based gene expression (Affymetrix HT Human Genome U133 Array Plate Set) (level 1) for the GBM and OV data sets, whereas for the BRCA and PRAD data sets, RNA-seq-derived gene expression (level 3) was used. Gene expression profiles of normal and tumor phenotypes were used as sample groups.

Gene fusions

Transcript fusion prediction calls for GBM, OV, BRCA and PRAD were obtained from the TCGA Fusion gene Data Portal (http://www.tumorfusions.org) [207].
The fusion partner genes were tagged for gene-fusion alteration.

2.6.2 Interaction networks

We used the STRING version 10 [168] protein-interaction network, which contains high-confidence functional protein-protein interactions (PPI). Self-loops and interactions with missing HGNC symbols were discarded, and interaction scores were divided by 1000 to obtain a percentage-like reliability score. Only high-confidence interactions with a combined score of 0.9 or greater were selected. As a result we obtained a network of 10971 nodes with 214298 interactions.

In the case of prostate cancer, we integrated the STRING-10 protein-protein interaction network with a protein-DNA interaction network derived from ChIP-seq experiments for transcription factors highly relevant to prostate cancer - REST, FOXA1, AR, EZH2 [150] and ERG [141] - resulting in a new combined network of 13517 nodes and 220190 interactions.

To run HIT’nDRIVE on different underlying networks, we used two additional interaction networks: the Human Protein Reference Database - Protein-Protein Interaction Database (HPRD-PPI) network (version 9.0) [134] and the REACTOME pathway database (version 2015) [55].

2.6.3 Validation dataset

For the validation of driver modules we used the following gene-expression datasets: GBM: Murat-2008 [122], Sun-2006 [164]; OV: Yoshihara-2009 [205], Bowen-2009 [20]; PRAD: Taylor-2010 [169], Grasso-2012 [66], SMMU-PC [138]; BRCA: METABRIC [42] and Richardson-2006 [140].

2.6.4 Derivation of expression outlier genes

We used the generalized extreme studentized deviate (GESD) test [144] to obtain the outlier genes. Unlike the Grubbs test and the Tietjen-Moore test, the GESD test only requires that an upper bound for the suspected number of outliers be specified.
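The generalized ESD procedure can be sketched in Python as follows. This is an illustrative sketch of the textbook test, not the code used in this work: the function names are hypothetical, the Student-t quantile function `t_ppf(p, df)` must be supplied by the caller (for example `scipy.stats.t.ppf`), and how the test was applied to the expression matrix (e.g. per gene across patients) is not specified in the text.

```python
import math


def gesd_statistics(values, r):
    """Test statistics R_1..R_r with the index of the point removed each round.

    Requires r <= len(values) - 2 and non-constant data (non-zero std. dev.).
    """
    remaining = list(enumerate(values))
    stats = []
    for _ in range(r):
        xs = [v for _, v in remaining]
        mean = sum(xs) / len(xs)
        sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))
        # Most extreme observation among those still remaining.
        pos = max(range(len(xs)), key=lambda k: abs(xs[k] - mean))
        stats.append((abs(xs[pos] - mean) / sd, remaining[pos][0]))
        del remaining[pos]
    return stats


def gesd_critical_value(n, i, alpha, t_ppf):
    """Critical value lambda_i for the i-th round of the test."""
    p = 1.0 - alpha / (2.0 * (n - i + 1))
    t = t_ppf(p, n - i - 1)
    return (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))


def gesd_outliers(values, r, alpha, t_ppf):
    """Indices of detected outliers, given an upper bound r on their number."""
    n = len(values)
    stats = gesd_statistics(values, r)
    n_outliers = 0
    for i, (statistic, _) in enumerate(stats, start=1):
        if statistic > gesd_critical_value(n, i, alpha, t_ppf):
            n_outliers = i  # the largest i with R_i > lambda_i wins
    return [index for _, index in stats[:n_outliers]]
```

The number of outliers is taken as the largest i for which the statistic exceeds its critical value, which is why only the upper bound r has to be chosen in advance.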
Given the upper bound, r, the GESD test essentially performs r separate tests: a test for one outlier, a test for two outliers, and so on up to r outliers.

2.6.5 Derivation of expression outlier gene weights

Outlier-gene weights were calculated as follows. Let i denote genes, j denote patients and x_ij denote the gene-expression value of gene i in patient j. We first calculated the absolute value of the z-score (z_ij):

$$z_{ij} = \frac{|x_{ij} - \mu_i|}{\sigma_i}$$

where µ_i and σ_i respectively denote the mean and standard deviation of the expression value of gene i. Next we performed Student’s t-test on the gene-expression values of the normal and tumor phenotypes and set

$$\psi_i = -\log(\mathrm{pvalue}_{ttest})$$

Finally, we calculated the outlier weight ω_ij as

$$\omega_{ij} = \frac{\psi_i z_{ij}}{\sum_i \psi_i z_{ij}}$$

2.6.6 Statistical significance of the overlap of driver genes with that of CGC database.

Suppose, for a cohort of cancer patients, we predict n_total driver genes using HIT’nDRIVE, out of which n_cgc driver genes are present in the CGC database (of known cancer driver genes). Let x be the total number of sequence altered genes (i.e. all potential driver genes) and let y of these x sequence altered genes be in CGC. This means that the probability that a randomly selected gene out of these sequence altered genes happens to be a CGC gene is y/x.

The probability (p-value) that at least n_cgc out of n_total driver genes are identified in CGC is:

$$\mathrm{pvalue} = \sum_{i=n_{cgc}}^{n_{total}} \binom{n_{total}}{i} \left(\frac{y}{x}\right)^{i} \left(1 - \frac{y}{x}\right)^{n_{total}-i}$$

Next we consider driver genes in each patient. We also calculated the p-value for HIT’nDRIVE to pick at least p CGC drivers out of p′ and pick at most q non-CGC drivers out of q′ as follows:

$$\mathrm{pvalue} = \sum_{x=p'}^{p'+q'} \binom{p+q}{x} \left(\frac{p}{p+q}\right)^{x} \left(\frac{q}{p+q}\right)^{p'+q'-x}$$

Figure 2.1: Overview of HIT’nDRIVE algorithmic framework. (A) HIT’nDRIVE integrates genome and transcriptome data obtained from patients’ tumor samples.
The red and blue colors represent genomic alterations and transcriptomic changes in tumor samples, respectively. The influence values derived from the protein interaction network indicate how likely a driver gene is to influence its downstream target genes in the network. (B) The predicted driver genes are used as seeds to discover modules of genes that discriminate between the sample phenotypes using OptDis. (C) Based on this, the driver modules are ranked and thus prioritized.

Figure 2.2: HIT’nDRIVE identified driver genes with respect to varying parameter values in 100 selected BRCA samples. (A-B) The number of driver genes identified by HIT’nDRIVE with respect to varying values of (A) γ and (B) α. (C) The number of driver genes identified by HIT’nDRIVE with respect to three outlier detection threshold values, across varying values of γ. (D) Proportion of HIT’nDRIVE detected driver genes obtained for an outlier threshold of 0.01 which are also detected when the outlier threshold is 0.05 and 0.1. (E-F) Driver genes predicted by HIT’nDRIVE in non-randomized data compared with the driver genes predicted using randomized data (i.e. by gene label swapping for 100 iterations): (E) altered genes and (F) outlier genes.

Figure 2.3: HIT’nDRIVE identified driver genes with respect to the underlying network used in 100 selected BRCA samples. (A) Venn diagram showing the overlap of nodes in the three different networks used - STRING v10 (only high-confidence interactions), HPRD v9.0, and REACTOME v2015. (B) Comparison between the number of nodes in the networks. (C) Comparison between the number of edges in the networks. (D) Comparison between the number of driver genes detected using different networks. (E) Proportion of common driver genes between the networks (STRING-REACTOME and HPRD-REACTOME) as compared to driver genes detected using the REACTOME network. (F) HIT’nDRIVE identified driver genes with respect to network perturbation.
The edges of the STRING v10 network were perturbed to different extents (between 1-10%) while preserving the degree of the nodes in the network. The proportion of common driver genes between the unperturbed network and each of the perturbed networks was calculated.

Figure 2.4: Modified HIT’nDRIVE not required to prioritize at least one driver gene per patient. (A) Modified ILP formulation where we removed the constraint that ensured at least one driver gene is prioritized per patient. (B) HIT’nDRIVE simulation with different values of the gamma (γ) parameter with the modified ILP formulation as given in A. Each line represents a different value of the alpha (α) parameter, which controls the fraction of total outliers to be covered. (C) We calculated the fraction of patients with no driver genes prioritized, for the same set of driver genes prioritized in B.

Figure 2.5: Likelihood of HIT’nDRIVE to capture CGC genes. (A-D) Sequence-wise altered CGC genes prioritized by HIT’nDRIVE vs. non-CGC genes, for each patient sample, across four cancer types. Only CGC genes specific to a cancer type are considered here. Green: cancer-specific sequence-wise altered CGC genes prioritized by HIT’nDRIVE; Red: cancer-specific sequence-wise altered CGC genes not prioritized by HIT’nDRIVE; Orange: sequence-wise altered non-CGC genes prioritized by HIT’nDRIVE; Purple: sequence-wise altered non-CGC genes not prioritized by HIT’nDRIVE. The right panel depicts absolute numbers and the left panel depicts relative proportions. As can be seen, the likelihood of a sequence-wise altered CGC gene being prioritized by HIT’nDRIVE is much higher than that of a non-CGC gene.
(E) P-value distribution of the likelihood of HIT’nDRIVE to pick CGC genes.

Figure 2.6: Correlation of the number of driver genes predicted by HIT’nDRIVE with mutation rate and copy-number burden. (A) Correlation between mutation rate (frequency of somatic mutations per Mb) and copy-number burden (percentage of genome changed, calculated using somatic copy number changes). Correlation of the number of driver genes predicted by HIT’nDRIVE with (B) mutation rate and (C) copy-number burden.

Figure 2.7: Phenotype classification using driver-seeded modules. (A) Phenotype (tumor vs normal) classification accuracy in gene-expression datasets of different cancer types using three different methods - HIT’nDRIVE-unsupervised (left panel), HIT’nDRIVE-OptDis (middle panel) and DriverNet (right panel). (B) Comparison of HIT’nDRIVE with DriverNet.

Figure 2.8: Phenotype classification using CGC gene seeded modules. Phenotype classification accuracy of HIT’nDRIVE driver seeded modules vs Cancer Gene Census (CGC) gene seeded modules. (A) TCGA-PRAD gene-expression dataset with tumor and normal samples. (B) Subtype classification accuracy of HIT’nDRIVE identified driver seeded modules vs CGC BRCA driver seeded modules on the TCGA-BRCA cohort with respect to four subtypes of breast cancer (Basal, Her2, Luminal-A and Luminal-B).

Chapter 3

Application of HIT’nDRIVE: patient-specific multi-driver gene prioritization for precision oncology

3.1 Introduction

To demonstrate the utility of HIT’nDRIVE, we analyzed over 2200 genomes and transcriptomes (gene expression) of tumors from four major cancer types - glioblastoma, ovarian, breast and prostate cancer - from the TCGA project. We present the driver genes obtained by HIT’nDRIVE on this dataset and explore their functional properties.
Many of the HIT’nDRIVE-identified driver genes turn out to be known drivers from the CGC database [59], demonstrating that it is possible to replicate in silico, using HIT’nDRIVE, the lengthy and costly experimental approaches for detecting driver genes in common tumor types. This strongly supports the biological relevance of HIT’nDRIVE’s algorithmic framework and increases our confidence in the calls made by HIT’nDRIVE in rarer tumor types for which driver genes are mostly unknown. In fact, the initial results of the PanCancer Atlas project [12] reveal that more than 20% of tumors do not have a single (genomically altered) driver gene from CGC.

3.2 Our Contributions

In this chapter, we used HIT’nDRIVE to identify both known as well as rare (and potentially novel) patient-specific driver genes on large multi-omics data from different cancer types. We also demonstrate that by using HIT’nDRIVE-identified driver genes and associated “network modules” (sub-networks seeded by driver genes whose aggregate expression profiles correlate well with the cancer phenotype) as features, it is possible to perform accurate phenotype classification - as additional evidence that these genes are likely drivers of the cancer phenotype. We found a number of breast cancer subtype-specific driver modules that are associated with patients’ survival outcome. Finally, we demonstrate that HIT’nDRIVE-identified driver genes accurately predict drug efficacy in pan-cancer cell lines.

3.3 Results

3.3.1 HIT’nDRIVE predicts frequent as well as infrequent driver genes in multi-omics cancer datasets

We applied HIT’nDRIVE to prioritize driver genes in four major cancer types - Glioblastoma multiforme (GBM) [175], Ovarian serous cystadenocarcinoma (OV) [176], Breast adenocarcinoma (BRCA) [177], and Prostate adenocarcinoma (PRAD) [178] - obtained from the TCGA data portal.
Only samples with matched genomic alterations (SNVs and/or CNAs and/or gene fusions) and transcriptomic changes (outlier genes from the gene-expression profile) were used in our study. We used the fusion prediction calls as reported in the TCGA Fusion gene Data Portal [207].

In GBM, we obtained 48 unique candidate driver genes altered at varying frequencies across 258 GBM patients. EGFR (36%), TP53 (29.5%), PTEN (28%) and CHEK2 (26%) were the most frequently altered driver genes in GBM, followed by CDKN2A (16%), RB1 (13%) and SEC61G (12%). Previous efforts in GBM genome characterization identified amplification of EGFR and PDGFRA, mutations in CHEK2, TP53, PTEN, RB1 and NF1, and deletions of CDKN2A to be associated with GBM [128, 175, 193]. HIT’nDRIVE prioritized all of the above alterations. Alterations in EGFR are characteristic of the classical subtype, in NF1 of the mesenchymal subtype, and in PDGFRA and IDH1 of the pro-neural subtype of GBM [193]. Fifteen out of 48 driver genes predicted by HIT’nDRIVE (p-value = 8 × 10^-4) were present in the CGC database [59], which contains genes for which mutations have been causally implicated in cancer (Figure 3.1A). GSTT1 (deleted in 21 patients), a key player in drug metabolism, was found neither in CGC nor in the Catalogue of Somatic Mutations in Cancer (COSMIC) [58] database. Twelve GBM driver genes were found to be actionable targets. Actionable genes were extracted from the TARGET database [187], which contains genes directly linked to a clinical action. In addition to the above list, 6 other driver genes were druggable (Figure 3.1B). We extracted the list of druggable genes from the Drug-Gene interaction database (DGIDB) [70]. Interestingly, around 85% of the patients in the GBM cohort harbour at least one actionable driver gene and a further 5% of patients have druggable targets (Figure 3.1C). HIT’nDRIVE also identified 12 infrequent driver genes, which we define as genes altered in at most 2% of the cases.
Among the infrequent genes, SACS is known to be associated with neurological functions, NLRP3 is involved in apoptosis, and TIAM2 is involved in invasion and metastasis.

The 526 OV patients harboured a total of 85 unique driver alterations. TP53 mutations were prevalent in more than half (58%) of the patients in the cohort. Consistent with previous findings, we found OV patients to be driven by genomic copy-number changes rather than recurrent point mutations [38, 129]. Recurrent somatic CNAs were observed in GSTT1 (32.3%), WWOX (28.1%), FAM49B (15.0%), UGT2B17 (14.6%), CCNE1 (13.1%), SLC39A4 (13.1%) and MYC (12.5%). Mutations in TP53 and BRCA1/2 and loss of RB1, NF1 and CCNE1 were previously associated with OV [129, 176]. HIT’nDRIVE revealed 18 CGC driver genes (p-value = 2 × 10^-5) (Figure 3.1A), among which 13 genes were actionable targets, and another 12 genes were at least druggable (Figure 3.1B). More than 75% of OV patients harboured at least one actionable target and an additional 6% of patients had a druggable target (Figure 3.1C). GSTT1 (altered in 170 patients in OV) is involved in estrogen and drug metabolism. It was found neither in CGC nor in COSMIC databases. We identified 13 infrequent genes, among which MAPK1 is known to play an important role in oncogenic pathways in cancer.

HIT’nDRIVE identified 40 driver genes across 333 PRAD patients. Copy-number loss of SPECC1L (23.7%), STEAP1B (13%) and WWOX (10%) and amplification of NSD1 (16.2%) and SIRPB1 (16.2%) were the most recurrent events in PRAD patients. We also found recurrent somatic mutations in MUC4 (11%), SPOP (10.5%) and TP53 (10%). The most common alterations in PRAD genomes are fusions of androgen-regulated promoters with ERG and other members of the ETS family of transcription factors, mainly TMPRSS2-ERG fusions [180]. Since we relied on the gene fusion predictions obtained from the TCGA Fusion gene Data Portal [207], which analyzed only 178 (out of 333) patients, we observed ERG gene fusion in only 5.7% of cases.
The more recent TCGA publication [178] reported ERG fusions in almost half of the patients in the cohort. Moreover, the tools used for gene fusion detection in the two studies were different, as a result of which we observed a much smaller number of ERG fusions than reported previously. SPOP, TP53, FOXA1 and PTEN are the most frequently mutated genes that have been previously associated with prostate cancer [13]. PRAD patients harboured 12 driver genes present in the CGC database (p-value = 9 × 10^-4) (Figure 3.1A), out of which 8 driver genes were actionable (Figure 3.1B). Approximately a quarter of PRAD patients could benefit from actionable targeted therapy (Figure 3.1C). Moreover, an additional 14% of patients harboured druggable genes, which warrants deeper investigation of drug repurposing opportunities. NBPF1 (mutated in 17 patients), a known tumor suppressor gene that has neural function and is also involved in cell-cycle arrest, was found neither in CGC nor in COSMIC databases. We identified 11 infrequent genes in PRAD, among which IDH1-mutant patients were recently identified as a distinct molecular subtype of PRAD [178], NKX3-1 is required for normal prostate tissue development, and CDKN1B was previously associated with PRAD.

In BRCA, HIT’nDRIVE identified 107 driver genes across 1090 patients. Somatic mutations of PIK3CA (30.5%) and TP53 (30.2%) were the most recurrent events in BRCA. These were followed by somatic mutations of CHD1 (11.2%), GATA3 (10.5%), MUC16 (6.9%) and MAP3K1 (6.9%) and CNA amplification of NSD1 (8.7%) and MED1 (6.9%). BRCA patients harboured 16 genes present in the CGC database (p-value = 9.3 × 10^-3) (Figure 3.1A), among which 10 genes were actionable targets (Figure 3.1B). More than 60% of BRCA patients could benefit from actionable targeted therapy. Furthermore, an additional 11% of BRCA patients harboured at least one of the 19 potentially druggable genes (Figure 3.1C).
ACACA (altered in 36 patients, mostly from the HER2 subtype), involved in fatty-acid metabolism, was found neither in CGC nor in COSMIC databases. We identified 46 infrequent driver genes, among which BRCA2 and GNAS have been previously linked to BRCA.

3.3.2 Network properties of cancer driver genes

Centrality of Driver Genes in the Interactome.

Cancer driver genes are known to occupy critical positions in the interactome. To check whether HIT’nDRIVE-predicted driver genes also occupy similar positions in the interaction network, we used the node degree as a “local measure”, and node betweenness (the number of shortest paths between node pairs that pass through the node) as a “global measure” of centrality. The driver genes predicted by HIT’nDRIVE include a number of well-known high-degree hubs (TP53, EGFR, RB1, MYC, PIK3CA, ERG, CHD1) that are “central” in the interactome with high degree and high betweenness (Figure 3.2A). Although there was very weak correlation between the number of edges (i.e. degree centrality) of a node and the number of samples/patients in which it is identified as a driver, remarkably, each hub gene was typically altered in a large fraction of patients. Because of their centrality, perturbations of hub genes are likely to dysregulate several other genes and the associated signaling pathways. Interestingly, HIT’nDRIVE also identified low-degree genes (IDH1, MTAP, NF1, NRG1, NSD1) that reside in the periphery of the interaction network. In particular, in prostate cancer, there seems to be an inverse correlation between the degree and how often the gene is picked as a driver.
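The two centrality measures used above can be illustrated on a toy network. The following is a minimal sketch, not the code used in this study: the node names and the `toy_network` adjacency are hypothetical, the graph is assumed to be unweighted and undirected, and betweenness is computed with Brandes' algorithm, which the text does not name.

```python
from collections import deque


def degree_centrality(adj):
    """Local measure: number of direct interaction partners of each node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Global measure: shortest paths between other node pairs passing through a node.

    Brandes' algorithm for an unweighted, undirected graph. When several
    shortest paths tie, each contributes a fraction to the nodes on it.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}                 # BFS distance from s
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []                    # nodes in order of non-decreasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from either endpoint.
    return {v: b / 2.0 for v, b in bc.items()}


# Hypothetical toy interactome: one hub, three neighbours, one peripheral node.
toy_network = {
    "HUB": {"A", "B", "C"},
    "A": {"HUB"},
    "B": {"HUB"},
    "C": {"HUB", "D"},
    "D": {"C"},
}

degree = degree_centrality(toy_network)
betweenness = betweenness_centrality(toy_network)
```

In this toy graph the hub has both the highest degree and the highest betweenness, while the peripheral node `D` has degree 1 and betweenness 0, which mirrors the contrast between hub genes and low-degree genes discussed in the text.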
Most of these low-degree genes are altered in a small fraction of patients, indicating that HIT’nDRIVE, unlike many other methods, does not primarily return hubs that are altered in a large number of patients but is capable of identifying rare driver genes without trivial topological biases.

Influential nodes prioritized as cancer driver genes.

Next we examined the influential driver genes that are responsible for driving cancer. For this, we computed the total outgoing influence from each altered gene (which has been chosen as a driver), defined as the sum of all influence values from the source to all outlier genes it is connected to (targets), weighted by the corresponding outlier weights. First we investigated driver genes with high influence values within each cancer type. We observed that on average the total influence of driver genes was higher than that of other altered genes in all cancer types (Figure 3.2D). EGFR, PTEN, CHEK2, TP53 and CDKN2A were the most influential driver genes in GBM, which together exerted 38.5% of the total influence on the GBM patient cohort. In OV, TP53, GSTT1 and MYC together exerted 20% of the total influence. Similarly, in the PRAD cohort, SPOP, MUC4 and TP53 were the most influential genes, exerting 23.7% of the total influence. PIK3CA, TP53 and CHD1 were the most influential genes, exerting 23% of the total influence on the BRCA patient cohort. Moreover, the gene influence was positively correlated with its alteration frequency (Figure 3.2E).

We investigated the influence of the predicted driver genes within individual patients. Many recurrently altered driver genes had higher influence compared to other driver genes. Examples include EGFR in GBM; TP53 in OV; ERG in PRAD; and TP53, PIK3CA and PTEN in BRCA.

Interestingly, among the highly influential genes there were also less-recurrent but functionally important and actionable driver genes. For example, somatically mutated ABCB1 was an influential driver gene in seven GBM patients (Figure 3.2F).
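The definition of total outgoing influence given above can be written out as a short sketch. This is an illustration under stated assumptions rather than the HIT’nDRIVE implementation: the pairwise influence values and outlier weights are taken as given inputs, and the gene names, dictionaries and the `influence_share` helper are hypothetical.

```python
def total_outgoing_influence(influence, outlier_weight):
    """Weighted sum of influence from each source gene to its outlier targets.

    influence[src][tgt]  : influence value of altered gene `src` on outlier gene `tgt`
    outlier_weight[tgt]  : weight of expression-outlier gene `tgt`
    """
    return {
        src: sum(value * outlier_weight[tgt] for tgt, value in targets.items())
        for src, targets in influence.items()
    }


def influence_share(totals, genes):
    """Fraction of the cohort-wide total influence exerted by a set of genes."""
    return sum(totals[g] for g in genes) / sum(totals.values())


# Hypothetical toy values: two altered genes, two expression outliers.
toy_influence = {
    "gene1": {"outlier1": 0.2, "outlier2": 0.1},
    "gene2": {"outlier2": 0.4},
}
toy_outlier_weight = {"outlier1": 2.0, "outlier2": 1.0}

totals = total_outgoing_influence(toy_influence, toy_outlier_weight)
share_gene1 = influence_share(totals, ["gene1"])
```

Percentages such as "38.5% of the total influence" correspond to the kind of ratio returned by `influence_share`, assuming the denominator is the summed influence of all genes considered in the cohort.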
ABCB1 is a membrane-bound protein present in the endothelial cells of the blood-brain barrier. It harnesses the energy of ATP hydrolysis to drive the unidirectional transport of exogenous and xenobiotic substances (drug compounds) from the cytoplasm to the extracellular space. It is known to transport many anticancer compounds including Temozolomide (TMZ), which is used as a first-line treatment for GBM patients. Mutations and over-expression of ABCB1 in GBM have been associated with resistance to TMZ [107]. It was intriguing that some of these GBM patients had undergone treatment prior to tissue collection and were initially mislabelled as untreated patients. Treatment-induced selection pressure on the drug transporter might be a plausible reason for the high influence exerted by ABCB1.

Similarly, HIT’nDRIVE predicted BRAF as a driver gene in eight PRAD patients (6 somatic mutations and 2 gene fusions) (Figure 3.2G). These patients harboured BRAF as a highly influential driver gene. None of these patients harboured the BRAF V600E mutation that is prevalent in cutaneous melanomas, thyroid cancer and many other cancer types. However, BRAF L597R can be targeted using MEK inhibitors [21, 43]. BRAF plays important roles in growth factor signalling pathways, which affect cell division and differentiation. These results serve as proof of concept that HIT’nDRIVE can prioritize functionally relevant cancer driver genes.

3.3.3 Breast cancer subtype classification using driver modules.

Our next goal was to classify four major subtypes of breast cancer: Basal, HER2, Luminal-A and Luminal-B. For that purpose, we performed binary classification for each subtype, e.g. Basal vs non-Basal (including the normal samples). This was achieved through the use of HIT’nDRIVE-identified driver genes from TCGA-BRCA as seed genes, with which we identified subtype-specific driver modules from TCGA-BRCA gene-expression data (as described for tumor classification).
We obtained 37, 16, 43 and 39 subtype-specific driver modules for the Basal, HER2, Luminal-A and Luminal-B subtypes, respectively. As described above, using these subtype-specific driver modules as features, we performed independent classification of BRCA subtypes in the TCGA-BRCA, METABRIC-Cambridge and METABRIC-Vancouver datasets [42].

The majority of Basal-like tumors constitute Triple-Negative Breast Cancer (TNBC), which are highly aggressive tumors characterized by lack of expression of estrogen receptor 1 (ESR1), progesterone receptor (PGR) and erb-b2 receptor tyrosine kinase 2 (ERBB2). Molecular mechanisms driving TNBC are the least understood and hence no targeted therapies for TNBC exist yet [17]. Interestingly, HIT’nDRIVE-seeded driver modules were able to classify Basal-like tumors with much higher accuracy (98%) compared to other BRCA subtypes: HER2 (94%), Luminal-A (85%) and Luminal-B (83%) (Figure 3.3A). As expected, ESR1 and PGR were highly expressed in Luminal-A/B but not in Basal and HER2 subtypes. Modules containing ESR1 were consistently down-regulated in the Basal subtype and up-regulated in the Luminal-A/B subtypes, whereas module LUMB-03 was up-regulated in the Luminal-B subtype. The ESR1 network neighbourhood included eleven known transcriptional targets of ESR1 (TFF1, PGR, SLC9A3R1, GNAS, RARA, WWP1, WNT5A, TCF7L2, FKBP4, SPRY2, and RAD54B). These results were consistent with previous findings [51]. ERBB2 was expressed only in 9 (of 16) HER2 modules and was the most prominent hub in the large interactome of HER2 modules. All modules containing ERBB2 were up-regulated in the HER2 subtype and module expression patterns were consistent across different BRCA datasets. PGR was present in 2 modules (BASAL-26 and HER2-12), both of which were down-regulated in the Basal subtype but up-regulated in Luminal-A/B.
These results strongly suggest that HIT’nDRIVE can capture subtype-specific driver genes, and that the driver-seeded modules we identified can indeed differentiate BRCA subtypes.

3.3.4 Subtype-specific breast cancer driver modules are associated with survival outcome.

To test for association of subtype-specific driver modules with patient survival outcome, we developed a risk-score defined as a linear combination of the normalized gene-expression values of the component genes in the module weighted by their estimated univariate Cox proportional-hazard regression coefficients (see Methods). Based on the risk-score values, patients were stratified into low-risk (risk-score < 33rd percentile) and high-risk (risk-score > 66th percentile) groups. Both the Cox regression coefficients of each gene and the risk-score cutoff values for each module were estimated from the TCGA-BRCA cohort (training dataset); these values were later applied to the METABRIC cohorts (test dataset). To assess whether the risk-score assignment to high/low categories was valid, a log-rank test was performed for each module in both training and test datasets.

We first compared driver-seeded modules against driver-gene-free modules that, according to OptDis, have the best discriminative score for the TCGA-BRCA dataset. For each module we calculated three distinct indices: log-rank test p-value, Hazard Ratio (HR) and Concordance-index (C-INDEX). We found driver-seeded modules to outperform driver-free modules on all three indices, demonstrating that the driver-seeded modules were better correlated with survival (Figure 3.3B). Motivated by this, we identified the top modules for each of the BRCA subtypes that do well on all three indices and checked whether they can return meaningful results with respect to survival. We found 9 driver modules significantly associated with patients’ survival outcome (p-value < 0.01, hazard-ratio > 1.5 and concordance-index > 0.5) in the TCGA-BRCA cohort.
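To make the third index concrete, the following is a minimal sketch of a concordance index for right-censored survival data. It assumes Harrell's C, which the text does not specify, and the function name and toy values are hypothetical. A value of 0.5 corresponds to random ordering, which is why a threshold of concordance-index > 0.5 is a natural minimum requirement.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data (an assumed estimator).

    times        : follow-up time of each patient
    events       : 1 if the event (death) was observed, 0 if censored
    risk_scores  : higher score means higher predicted risk

    A pair (i, j) is comparable when patient i had an observed event strictly
    before patient j's follow-up time. The pair is concordant when the patient
    with the earlier event has the higher risk score; tied scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]

c_perfect = concordance_index(toy_times, toy_events, [2.0, 1.5, 0.3, 0.1])
c_partial = concordance_index(toy_times, toy_events, [2.0, 0.2, 0.3, 0.1])
c_reversed = concordance_index(toy_times, toy_events, [0.1, 0.3, 1.5, 2.0])
```

With five comparable pairs in the toy cohort, a risk score that orders all of them correctly gives 1.0, one misordered pair gives 0.8, and a fully reversed score gives 0.0.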
These 9 modules were also significantly associated with patient survival outcome (p-value < 0.01) in two additional cohorts (METABRIC cohorts). It is interesting to note that two of these modules (BASAL-02 and HER2-01) were seeded by an oncogene, the nuclear receptor coactivator 3 (NCOA3) driver gene. The NCOA3 driver module was the second-topmost module (Figure 3.3C) to separate Basal from other subtypes and the topmost module to separate the HER2 subtype. The NCOA3 driver module was down-regulated in the Basal subtype and associated with patients’ overall survival (Figure 3.3D-E). A fraction of breast (and ovarian) cancer patients are known to harbour NCOA3 mutation, amplification or deletion [71]. NCOA3 alone cannot distinguish the Basal subtype; it requires the other component genes in the module (AR, XBP1, TFF1 and SPDEF) to collectively distinguish the Basal subtype, which, to our knowledge, is a novel finding. However, the interactions within the module are well known. NCOA3 is a coactivator of the steroid hormone receptors AR and ESR1, and a transcriptional target of XBP1 [71]. NCOA3 is known to stimulate many intracellular signaling pathways that are critical for cancer proliferation and metastasis. The activity of NCOA3 is known to be associated with reduced responsiveness to tamoxifen in patients [126]. SPDEF is associated with regulation of AR activity [100].

3.3.5 HIT’nDRIVE seeded driver genes accurately predict drug efficacy

Next, we obtained somatic mutation, copy number aberration and gene expression data of pan-cancer cell lines from the Genomics of Drug Sensitivity in Cancer (GDSC) project [80]. We used HIT’nDRIVE to identify driver genes of individual cancer cell lines. Following up on the premise of [80] that potential driver genes (i.e. cancer genes, which include the CGC genes) alone could predict drug efficacy fairly well, the predicted driver genes were used as seeds in the network (STRING v10) to identify sub-networks that discriminate between the drug-response phenotypes (i.e.
sensitive vs resistant cell lines). As available in GDSC, 265 different drug treatments were tested on each cell line provided. We present results for 25 cancer types (the remaining 5 cancer types, for which only a very limited number of cell lines are available, are statistically insignificant and thus have not been used).

Perhaps our most interesting result is that, for many drugs, the top HIT’nDRIVE-predicted driver module for cell lines of a specific cancer type (more specifically, OptDis modules seeded by HIT’nDRIVE-identified driver genes, prioritized with respect to drug efficacy) includes not only the drug target but also the associated (downstream) signaling pathway. Just as importantly, we measured the accuracy of drug-response phenotype classification using HIT’nDRIVE-OptDis for each drug treatment in different cancer types (Figure 3.4A). In most cancer types, HIT’nDRIVE-OptDis correctly predicted the response to more than 25% of the drugs in 95% of the cell lines or more. Specifically, Stomach adenocarcinoma (STAD) and Chronic Myelogenous Leukemia (LCML) are the cancer types with the highest fraction of drugs predicted with an accuracy of 95% or more, whereas Liver hepatocellular carcinoma (LIHC) and GBM are the cancer types with the lowest fraction of drugs predicted with the same accuracy. Below we provide some of our observations on three well-known/promising cancer drugs for which we obtained high accuracy on cell lines of specific cancer types.

Gefitinib is a clinically approved (for patients with non-small cell lung cancer) protein kinase inhibitor which selectively inhibits EGFR. Interestingly, in BRCA, EGFR copy-number amplification or overexpression primarily activates the RAS-RAF-MAPK pathway and the PI3K-AKT-mTOR pathway, triggering a response for cell proliferation, invasion and survival. Using HIT’nDRIVE, EGFR was found as a driver gene of BRCA cell lines.
Furthermore, the EGFR-seeded driver module was the second-highest-scoring module for distinguishing the drug-response phenotype, increasing the classification accuracy to 98% (Figure 3.4B,C).

As another example, Nutlin-3a is a promising pre-clinical stage compound which inhibits the interaction between MDM2 and TP53, inducing apoptosis. MDM2 was predicted as a driver gene in OV cell lines by HIT’nDRIVE. The MDM2-seeded module was the top predictor (maximum accuracy 94%) of the drug-response phenotype when treated with Nutlin-3a (Figure 3.4B,E). Our method predicted many other interacting partners (both as seed or component genes in the module) of MDM2 and TP53 which are known to play a critical role in the TP53 pathway.

Finally, TMZ is a clinically approved first-line therapy for GBM. ABC transporters (including ABCB1) export TMZ from the cytoplasm of a cell to the extracellular space. TMZ methylates selective nucleotides of DNA, triggering the DNA repair pathway. MGMT specifically removes the methyl groups from the methylated nucleotides, allowing the cell to escape DNA strand breaks. MGMT was predicted as a component gene in the third top-scoring module. Failure to repair DNA strand breaks triggers the DNA damage response pathway, further activating TP53 and apoptosis. Interestingly, TP53 was predicted as the seed of the top-scoring module by HIT’nDRIVE-OptDis. Furthermore, another gene in the DNA damage response pathway, CDKN2A, seeds another top-ranking module, which improves the overall classification accuracy to 97% (Figure 3.4B,D). Note that both CDKN2A and TP53 are among the most frequently altered genes in GBM.

3.4 Discussion

In this chapter we have demonstrated that (1) HIT’nDRIVE increases our ability to identify potential genomic driver alterations. (2) HIT’nDRIVE prioritizes clinically actionable driver genes, many of which happen to be private drivers.
This implies that HIT’nDRIVE can replicate, in silico, the lengthy and costly experimental approaches for detecting driver genes in common tumor types, strongly supporting the biological relevance of HIT’nDRIVE’s algorithmic framework. The fact that a high proportion of HIT’nDRIVE-prioritized drivers in well-studied cancer types overlap with known driver genes increases our confidence in the calls made by HIT’nDRIVE in rarer tumor types for which driver genes are mostly unknown. (3) HIT’nDRIVE prioritizes driver genes present in both the centre and periphery of an interaction network. (4) Our analysis revealed that driver genes have higher collective influence on the transcriptome than other altered genes. Some of these driver genes are central and naturally have high influence; however, there are also many non-central driver genes with high influence over other genes in the network. (5) HIT’nDRIVE is especially suitable for identifying such non-central driver genes or infrequent/private drivers. (6) HIT’nDRIVE can capture subtype-specific driver genes, and such driver-seeded modules can indeed differentiate between different subtypes of a cancer. (7) We have demonstrated that subtype-specific driver modules are also associated with patients’ survival outcome, providing additional evidence that these driver genes have clinical significance. (8) We also demonstrated that HIT’nDRIVE-seeded driver modules (more specifically, OptDis modules seeded by HIT’nDRIVE-identified driver genes, prioritized with respect to drug efficacy) include not only the drug target but also the associated (downstream) signaling pathway.
This provides us the possibility of identifying and clinically targeting multiple genes (not necessarily altered at the sequence level but nevertheless in the module identified by HIT’nDRIVE) that dysregulate critical oncogenic or metabolic pathways.

We also note that targeted therapeutics are being extensively used in clinical trials but the drug response rate is very poor (only ∼5% of patients in clinical trials have a good response to targeted therapeutics) [111, 135]. This is most likely because even if a cancer patient harbours an alteration for which targeted therapeutics are available, we do not know if that alteration is responsible for driving the tumor [16]. HIT’nDRIVE could potentially play a key role by prioritizing potential driver alterations from a vast pool of passenger alterations. In our study, we have used drug efficacy data from pan-cancer cell lines in order to demonstrate that the potential genomic drivers (more precisely, driver-gene-seeded modules) of the cell lines can be used as features to predict drug efficacy. We believe that following a similar procedure in clinical trials, with HIT’nDRIVE applied to predict drug efficacy, would likely improve the drug response rate.

HIT’nDRIVE predicted ABCB1 as the most influential driver gene in seven TCGA-GBM cases that were treated with TMZ prior to tissue collection. Using the GDSC dataset, we demonstrated that HIT’nDRIVE-OptDis can predict mechanisms of drug sensitivity for TMZ and other drugs (Figure 3.4G-H). Since ABCB1 was not mutated in any of the GBM cell lines in the analysis, it was not identified as a driver gene of GBM cell lines. However, the top seed driver gene, TP53, is an interaction partner of ABCB1 (in the STRING v10 network). Other seed driver genes and component genes in the modules that are direct interaction partners of ABCB1 are UBC, CAV1, WDTC1 and DNAH8. ABC transporters (including ABCB1) export TMZ from the cytoplasm of a cell to the extracellular space.
On the other hand, DNA damage caused by TMZ activates TP53, thereby dysregulating apoptotic pathways. Thus, the presented analysis demonstrates that the downstream expression changes are, most likely, the manifestation of the selection pressure on ABCB1 induced by TMZ treatment.

Protein-protein interaction (PPI) networks representing physical interactions now include thousands of proteins and over a million (undirected) interactions between them. Regulatory networks, on the other hand, represent gene/protein regulation occurring at multiple levels of biological systems through directed links. Since available regulatory networks are very limited in size and scope, our study focuses on PPI networks. However, HIT’nDRIVE can easily be applied to regulatory networks as they grow in size and scope. In addition, the use of multi-hitting time as a distance measure between two or more driver genes and a target gene enables HIT’nDRIVE to capture synthetic-rescue-like scenarios; this is ideally suited for undirected PPI networks, but in principle can be extended to regulatory networks in the future.

HIT’nDRIVE is a driver gene prioritization tool that is flexible enough to incorporate different types of -omics data. The principles underlying both RWFL and HIT’nDRIVE can be utilized to identify the causal genes in other complex diseases that pose problems analogous to cancer. Finally, we believe that applications of the RWFL problem may extend beyond driver gene identification to influence analysis in social networks, disease networks and others.

3.5 Methods

3.5.1 Datasets and analysis

We used publicly available datasets of four major cancer types, GBM [175], OV [176], BRCA [177], and PRAD [178], from the TCGA project.
Details can be found in Section

Genomics of drug sensitivity in cancer

Somatic mutation, copy-number alteration, gene-expression and drug screening data of cancer cell lines were downloaded from the Genomics of Drug Sensitivity in Cancer (GDSC) [80] website http://www.cancerrxgene.org/downloads. Data were downloaded in August 2016.

3.5.3 Pathway enrichment analysis

The selected set of genes was tested for enrichment against gene sets of pathways present in the Molecular Signature Database (MSigDB) v5.0 [162]. A Fisher’s exact test based gene set enrichment analysis was used for this purpose. A cut-off threshold of false discovery rate (FDR) ≤ 0.01 was used to obtain the significantly enriched pathways. An R implementation of GESD test is available at https://github.com/raunakms/GSEA-Fisher. The same procedure is used to assign biological function to the gene modules.

3.5.4 Association of driver modules with patients’ survival outcome

To test for association of driver modules with patients’ survival outcome, we developed a risk-score based on multi-gene (component genes of the module) expression. The risk-score (S) is defined as a sum of the normalized gene-expression values of the component genes in the module weighted by their estimated univariate Cox proportional-hazard regression coefficients [15], as given in the equation below.

$$S = \sum_{i=1}^{k} \beta_i x_{ij}$$

Here $i$ and $j$ represent a gene and a patient respectively, $\beta_i$ is the coefficient of Cox regression for gene $i$, $x_{ij}$ is the normalized gene-expression of gene $i$ in patient $j$, and $k$ is the number of component genes in a gene module. The normalized gene-expression values were fitted against overall survival time with living status as the censored event using univariate Cox proportional-hazard regression (Exact method). Based on the risk-score values, patients were stratified into two groups: a low-risk group (patients with S < 33rd percentile of S), and a high-risk group (patients with S > 66th percentile of S).
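The risk-score and the percentile-based stratification described above can be sketched as follows. This is a minimal illustration, not the analysis code of this study: the Cox coefficients are assumed to have been estimated beforehand, the percentile is computed with linear interpolation between order statistics (the text does not state the method), and all gene names, patient identifiers and values are hypothetical.

```python
def risk_score(betas, expression):
    """S_j = sum over component genes i of beta_i * x_ij for one patient j."""
    return sum(beta * expression[gene] for gene, beta in betas.items())


def percentile(values, q):
    """q-th percentile using linear interpolation (an assumed convention)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def stratify(scores, low_cut, high_cut):
    """Assign low-risk / high-risk labels; patients in between are left out."""
    groups = {}
    for patient, score in scores.items():
        if score < low_cut:
            groups[patient] = "low"
        elif score > high_cut:
            groups[patient] = "high"
    return groups


# Hypothetical module with two component genes and pre-estimated Cox coefficients.
toy_betas = {"geneA": 0.5, "geneB": -1.0}

# Hypothetical normalized expression in a training cohort.
toy_training = {
    "p1": {"geneA": 2.0, "geneB": 1.0},
    "p2": {"geneA": 4.0, "geneB": 0.0},
    "p3": {"geneA": 0.0, "geneB": 2.0},
    "p4": {"geneA": 2.0, "geneB": 0.0},
    "p5": {"geneA": 0.0, "geneB": 1.0},
}

training_scores = {p: risk_score(toy_betas, x) for p, x in toy_training.items()}
low_cut = percentile(training_scores.values(), 33)
high_cut = percentile(training_scores.values(), 66)
training_groups = stratify(training_scores, low_cut, high_cut)
```

To mimic the training/test design, the same `toy_betas`, `low_cut` and `high_cut` obtained from the training cohort would be reused unchanged when calling `risk_score` and `stratify` on an independent cohort.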
Patients that fall in between (i.e. patients with S >= 33rd percentile of S and <= 66th percentile of S) were discarded from further analysis, as these patients fall into an intermediate-risk group and are bound to introduce noise while performing the log-rank test.

Both the Cox regression coefficients of each gene and the risk-score cutoff values for each module were estimated from the TCGA-BRCA cohort (training dataset); these values were later applied to the METABRIC cohorts (test dataset). To assess whether the risk-score assignment to high/low categories was valid, a log-rank test was performed for each module in both training and test datasets.

Finally, to identify the significant list of driver modules that were robust enough to predict patients’ survival, we calculated the log-rank test p-value, hazard-ratio (HR) (Wald test) and concordance-index (c-index) (Wald test).

Figure 3.1: Summary of driver genes prioritized by HIT’nDRIVE. (A) Distribution of predicted driver genes in cancer gene databases. The CGC database contains genes for which mutations have been causally implicated in cancer. Genes curated in the CGC database represent likely drivers of cancer. COSMIC is a comprehensive database of somatic mutations that have been reported in different cancers. However, not every gene present in the COSMIC database represents a driver of cancer. (B) Distribution of driver genes in druggable gene databases. Actionable genes in cancer therapy were derived from the TARGET database. The list of druggable genes was extracted from the DGI database. (A-B) The numbers in the panel represent the number of genes in the respective categories. (C) Distribution of patient druggability. Patient druggability was assessed using information in the TARGET and DGI databases. The numbers in the panel represent the number of patients in the respective categories.

Figure 3.2: Network properties of driver genes. (A) The centrality of the predicted drivers in the STRING v10 network.
The size of the circles is proportional to the alteration frequency of the driver gene. The color scale represents the total influence of the driver gene on the expression outliers. (B) Correlation between influence and centrality. Each dot represents a target node receiving a certain amount of influence from all source nodes in the network. A lowess regression line is represented in blue. (C) Correlation between incoming and outgoing influence of a node. Each dot represents a node in the network and the color scale represents its betweenness centrality. A linear regression line is represented in blue. (D) Boxplot of the total influence of driver genes predicted by HIT’nDRIVE on the expression outliers compared to that of other altered genes (genes not predicted as drivers). (E) Correlation between gene influence and its alteration frequency in the respective patient cohort. (F) Relative influence of driver genes in each patient in the GBM cohort with mutation in ABCB1. (G) Relative influence of driver genes in each patient in the PRAD cohort with mutation in BRAF. (All gene influence values have been multiplied by 10^5 before log transformation.)

Figure 3.3: BRCA subtype classification using driver modules. (A) Performance accuracy of classifying different subtypes of breast cancer using the activity-score of subtype-specific driver modules as features in three distinct datasets. (B) Box plot comparing subtype-specific driver-seeded modules and driver-free modules with respect to three distinct measures: log-rank test p-value, hazard-ratio (HR) and concordance-index (c-index). (C) A BRCA subtype-specific driver module (BASAL-02) seeded by NCOA3 that distinguished the Basal subtype from the rest of the BRCA subtypes. (D) Activity-score of the BASAL-02 module across different BRCA subtypes. (E) Kaplan-Meier plot showing the significant association of the BASAL-02 module with patients’ clinical outcome in the three datasets considered.

Figure 3.4: Drug efficacy predicted by HIT’nDRIVE seeded driver genes.
(A) Accuracy of drug-response phenotype classification for all 265 drugs used in the GDSC study across 25 cancer types (the remaining 5 cancer types, for which only a very limited number of cell lines have been made available, are statistically insignificant and thus have not been used). The classification accuracy for each drug on each cancer type is measured based on the collective use of at most 10 best discriminating modules, i.e. the accuracy is maximized across the range of 1 to 10 (best discriminating) modules. Note that many of the drugs were not tested on all cancer types; in fact, for the vast majority of cancer types only a handful of drugs were tested. (B) Classification accuracy of modules that distinguish the drug-response phenotypes after treatment with Gefitinib in BRCA cell lines (top panel), Temozolomide in GBM cell lines (middle panel), and Nutlin-3a in OV cell lines (bottom panel). Important genes identified in the modules and involved in the dysregulated signalling pathways have been highlighted. (C-E) The figures represent the dysregulated signalling pathways in the respective drug perturbation.

Chapter 4

Integrated multi-omics molecular subtyping predicts therapeutic vulnerability in malignant peritoneal mesothelioma

4.1 Introduction

Malignant mesothelioma is a rare but aggressive cancer that arises from the membranes lining the pleura and the peritoneum. While the majority of mesotheliomas are pleural in origin, peritoneal mesothelioma (PeM) accounts for approximately 10-20% of all mesothelioma cases. PeM emerges from mesothelial cells lining the peritoneal/abdominal cavities. The incidence rate of PeM is estimated to be less than 0.5 per 100,000, with 400-800 cases reported annually in the United States of America alone [172]. Occupational asbestos exposure is a significant risk factor in the development of Pleural Mesothelioma (PM).
However, epidemiological studies suggest that, unlike in PM, asbestos exposure plays a far smaller role in the etiology of PeM tumors [172].

Mesothelioma is typically diagnosed in the advanced stages of the disease. A combination of Cytoreductive surgery (CRS) and Hyperthermic intraperitoneal chemotherapy (HIPEC), sometimes followed by Normothermic intraperitoneal chemotherapy (NIPEC), has recently emerged as a first-line treatment for PeM [163]. However, even with this regimen, complete cytoreduction is hard to achieve and death ensues for most patients. Actionable molecular targets for PeM critical for precision oncology remain to be defined. Immune checkpoint blockade therapy in PM has recently caught much attention, given that 20-40% of PM cases are reported to have an inflammatory phenotype [174]. Although clinical trials typically lump PeM and PM together for immune checkpoint blockade [25–27, 56, 110], no study has yet provided any rationale why PeM should be considered for immunotherapy.

Studies investigating genetic abnormalities of PeM [5, 34, 83, 85, 99, 151, 158, 184] have revealed recurrent copy-number losses of CDKN2A on 9p21, NF2 on 22q and BAP1 on 3p21. In addition, these studies also reported recurrent mutations in BAP1, SETD2, and DDX3X. However, the downstream consequences of these genomic alterations in PeM have not been investigated in great detail. Genomic information alone is unlikely to successfully uncover candidate therapeutic targets if not analyzed in the context of transcriptomes and proteomes.

In this study, we performed an integrated analysis of the genome, transcriptome, and proteome of 19 PeM tumors, predominantly of epithelioid subtype.

4.2 Our Contributions

We present a first-in-field comprehensive integrative multi-omics analysis of a patient cohort of treatment-naive PeM [156].
In a novel contribution, using HIT’nDRIVE, we identified PeM with BAP1 loss to form a distinct molecular subtype characterized by distinct gene expression patterns of chromatin remodeling, DNA repair pathways, and immune checkpoint receptor activation. We also demonstrate that this subtype is correlated with an inflammatory tumor microenvironment and is thus a candidate for immune checkpoint blockade therapies. Our findings reveal BAP1 to be a trackable prognostic and predictive biomarker for PeM immunotherapy that refines PeM disease classification. This is significant because almost half of PeM cases are now candidates for these therapies. BAP1 stratification may improve drug response rates in ongoing phase I and II clinical trials exploring the use of immune checkpoint blockade therapies in PeM in which BAP1 status is not considered. This integrated molecular characterization provides a comprehensive foundation for improved management of a subset of PeM patients.

Another novel and significant contribution is that we resolved the large discordance between mRNA and protein expression patterns in the PeM cohort. Most of this discordance is attributed to chromatin remodeling genes and proteins linked to multimeric protein complexes, the majority of which are direct protein-interaction partners of BAP1. The discordance between the mRNA and the protein expression patterns is most likely due to the ubiquitination and degradation of proteins in these BAP1-regulated complexes to maintain functional stoichiometry.

4.3 Results

4.3.1 Patient Cohort description

We assembled a cohort of 19 tumors from 18 patients (here we refer to it as VPC-PeM) undergoing CRS at Vancouver General Hospital (Vancouver, Canada), Mount Sinai Hospital (Toronto, Canada), and Moores Cancer Centre (San Diego, California, USA). We obtained 19 fresh-frozen primary treatment-naive PeM tumors and adjacent benign tissues or whole blood from the 18 cancer patients.
For one patient, MESO-18, two tumors from distinct sites were available. Immunohistochemical staining of tissues using different biomarkers was evaluated by two independent pathologists. Both pathologists categorized all 19 tumors as epithelioid PeM with higher than 75% tumor cellularity. To the best of our knowledge, this is the largest cohort of PeM subjected to an integrative multi-omics analysis.

4.3.2 Landscape of somatic mutations in PeM

To investigate the heterogeneity of somatic gene mutations in VPC-PeM, we performed high-coverage exome sequencing (Ion Proton Hi-Q) of 19 tumors and 16 matched normal samples. We achieved a mean coverage of 180x for cancerous samples and 96x for non-cancerous samples, with at least 43-77% of targeted bases having a coverage of 100x. We identified 346 unique non-silent mutations (313 of which were not previously reported in COSMIC [58]) affecting 202 unique genes. We observed an average of 0.015 protein-coding non-silent mutations per Mb per tumor sample. Patient MESO-18 had the highest mutation burden (0.04 mutations per Mb) and MESO-11 had the lowest mutation burden (0.001 mutations per Mb). The non-silent mutation burden in PeM is low compared to other adult cancers, including many abdominal cancers (Figure 4.1A), with the exception of prostate adenocarcinomas (PRAD), kidney chromophobe carcinomas (KICH), and testicular germ cell tumors (TGCT). Notably, the mutation burden in PeM was fairly similar to that of PM as well as pancreatic adenocarcinomas (PAAD). We also assessed the mutational processes that contribute to alterations in tumors. Analysis of base-level transitions and transversions at mutated sites showed that C>T transitions were predominant in PeM (Figure 4.1B). Using the software deconstructSigs [143], we found that mutational signatures 1, 5, 12, and 6 were operative in PeM.
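The transition/transversion tally mentioned above can be illustrated with a small sketch. It is not the pipeline used in this study: it only shows the common convention of reporting each single-base substitution relative to the pyrimidine of the mutated base pair, which yields six classes (C>A, C>G, C>T, T>A, T>C, T>G). The function names and the example calls are hypothetical.

```python
from collections import Counter

# Complement map, used to re-express purine-referenced changes on the opposite strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# In pyrimidine-referenced notation, the two transition classes are C>T and T>C.
TRANSITIONS = {"C>T", "T>C"}


def substitution_class(ref, alt):
    """Collapse a single-base change into one of six pyrimidine-referenced classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Purine reference: report the equivalent change on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def tally(snvs):
    """Count substitution classes and split them into transitions and transversions."""
    classes = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    n_transitions = sum(n for cls, n in classes.items() if cls in TRANSITIONS)
    n_transversions = sum(classes.values()) - n_transitions
    return classes, n_transitions, n_transversions


# Hypothetical (ref, alt) calls, for illustration only.
example_snvs = [("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("C", "A"), ("G", "A")]
classes, n_transitions, n_transversions = tally(example_snvs)
```

In this toy input, G>A is counted together with C>T because both describe the same change of a C:G base pair, which is why C>T dominates the tally.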
Interestingly, signature 1 was often correlated with age at diagnosis, and signature 6 was associated with DNA mismatch repair and mostly found in microsatellite-unstable tumors [7].

We first identified driver genes of PeM using our recently developed algorithm HIT’nDRIVE [155]. Briefly, HIT’nDRIVE measures the potential impact of genomic aberrations on changes in the global expression of other genes/proteins which are in close proximity in a gene/protein-interaction network. It then prioritizes those aberrations with the highest impact as cancer driver genes. HIT’nDRIVE prioritized 25 unique driver genes in 15 PeM tumors for which matched genome and transcriptome data were available (Figure 4.1C). Six genes (BAP1, BZW2, ABCA7, TP53, ARID2, and FMN2) were prioritized as drivers, each harboring single nucleotide changes.

The mutation landscape of PeM was found to be highly heterogeneous. The nuclear deubiquitinase BAP1 was the most frequently mutated gene (5 out of 19 tumors) in PeM tumors. BAP1 is a tumor-suppressor gene known to be involved in chromatin remodeling, DNA double-strand break repair, and regulation of transcription of many other genes [19]. Previous studies have also reported BAP1 as the most frequently mutated gene in both PeM [5, 85] and PM [19, 24]. The BAP1 missense mutation in MESO-18A/E resulted in a single amino-acid (AA) change in the ubiquitin carboxyl hydrolase domain, keeping the rest of the amino acid chain intact. In MESO-06 and MESO-09, a BAP1 frameshift deletion resulted in a premature stop codon and chain termination. In MESO-09 approximately 91% of the BAP1 amino acid chain was intact, but in MESO-06 only 2% of the BAP1 amino acid chain was intact. We also observed a BAP1 germline mutation in only one case (MESO-09). In three (15%) tumors, we identified a recurrent R396I mutation in ZNF678 - a zinc finger protein containing zinc-coordinating DNA binding domains involved in transcriptional regulation.
We compared the mutated genes in our VPC-PeM cohort with publicly available datasets [1, 5, 24] of both PeM and PM. BAP1 was the only mutated gene common to the three PeM cohorts. Twenty-five genes, including the tumor suppressors LATS1 and TP53 and the chromatin modifier SETD2, were common to at least two PeM cohorts. Many mutated genes in VPC-PeM were also previously reported in PM. BAP1 and SETD2 were the two mutated genes found in common between VPC-PeM and all four PM cohorts.

4.3.3 Copy number landscape in PeM

To investigate the somatic CNA profiles of PeM, we derived copy-number calls from exome sequencing data using the software Nexus Copy Number Discovery Edition Version 8.0. The aggregate CNA profile of PeM tumors is shown in Figure 4.2A-B. We observed a total of 1,281 CNA events across all samples. On average, 10% of the protein-coding genome was altered per PeM tumor. MESO-14 had the highest CNA burden (42%) whereas MESO-11 had the lowest (0.01%). Interestingly, mutation and CNA burden in PeM were strongly correlated (R = 0.74).

We also compared the CNA burden in protein-coding regions of the VPC-PeM cohort with different adult cancers from the TCGA project. Similar to the mutation burden, VPC-PeM tumors were observed at the lower end of the pan-cancer CNA burden spectrum. Only UCEC, PRAD, and PAAD tumors had a lower median CNA burden than PeM tumors (Figure 4.2C). CNA status and mRNA expression for around half of the genes were positively correlated (R ≥ 0.1) and 16% of the genes had strong correlation (R ≥ 0.5). To identify cancer genes, we compared aberrations in protein-coding genes with data from the CGC. Intriguingly, CNA status and mRNA expression for the majority of CGC genes were positively correlated.

To identify recurrent focal CNAs in PeM tumors, we used the GISTIC [115] algorithm, which yielded 5 regions of focal deletions (q < 0.05), including in 3p21 and 22q13, which are characteristic of malignant mesotheliomas (Figure 4.2D).
Furthermore, GISTIC prioritized 8 regions of focal amplification (q < 0.05), which included genes such as IGH, VEGFD, BRD9, FOXL1, EGFR, and PDGFA (Figure 4.2D). Copy-number status of these genes was also significantly correlated with their respective mRNA expression. Chromosome 1 was the most aberrant region in PeM, and chromosomes 13 and 18 were relatively unchanged except in MESO-14 (Figure 4.2B).

Using HIT’nDRIVE, we identified genes in chromosome 3p21 (BAP1, PBRM1, and SETD2) as key driver genes of PeM (Figure 4.1C). Chromosome 3p21 was deleted in almost half of the tumors (8 of 19) in the cohort. Here, we refer to tumors with 3p21 (or BAP1) loss as BAP1del and the rest of the tumors, with 3p21 (or BAP1) copy-number intact, as BAP1intact. Interestingly, BAP1 mRNA transcripts in BAP1del tumors were expressed at lower levels than those in BAP1intact tumors (Wilcoxon signed-rank test p-value = 3 x 10^-4) (Figure 4.2E). We validated this using Immunohistochemical (IHC) staining, demonstrating lack of BAP1 nuclear staining in the tumors with BAP1 homozygous deletion (Figure 4.2F). Tumors with BAP1 heterozygous loss still displayed BAP1 nuclear staining. We observed three BAP1-mutated cases among BAP1intact tumors. BAP1 mRNA transcripts in these three tumors were expressed at high levels. As mentioned in the previous section, the mutation analysis also predicted that, despite the mutation in BAP1 in these three tumors, the entire BAP1 amino-acid chain is still intact and may be functionally active.
Furthermore, we found DNA copy loss of the 3p21 locus to include four concomitantly deleted cancer genes - BAP1, SETD2, SMARCC1, and PBRM1, consistent with [208]. Copy-number status of these four genes was significantly correlated with their corresponding mRNA expression, suggesting that the allelic loss of these genes is associated with decreased transcript levels. These four genes are chromatin modifiers, and PBRM1 and SMARCC1 are part of the SWI/SNF complex that regulates transcription of a number of genes.

CNA status of genes associated with key cancer pathways was observed to be different between the PeM subtypes (i.e. BAP1del and BAP1intact). We observed many genes involved in chromatin remodeling, the SWI/SNF complex and the DNA repair pathway to be deleted in BAP1del tumors as compared to BAP1intact tumors (Figure 4.1C). In contrast, we found copy-number gain of many genes in DNA repair pathways (BRCA2, ATM, MGMT, and RAD51) and the cell cycle (MYC, CDK5, CCNB1, and CCND1) in the BAP1intact tumors. Furthermore, PeM tumors (both BAP1del and BAP1intact) harbored CNA events in carcinogenic pathways such as the MAPK, PI3K, MTOR, Wnt, and Hippo pathways. Interestingly, ESR1 copy number deletion is enriched in BAP1del tumors, while co-amplification of EGFR and BRAF was present in three BAP1intact tumors. Notably, we identified copy-number loss of the tumor suppressor LATS1/2 and copy-number gain of NF2 in one case, both of which have been previously associated with mesotheliomas [24], in BAP1del tumors. Both LATS1/2 and NF2 are key regulators of the Hippo pathway [105].

Unsupervised consensus clustering of tumor samples based on copy-number segmentation mean values of the 3349 most variable genes identified four tumor sub-groups (Figure 4.2G). We observed that BAP1del and BAP1intact tumors were grouped into distinct clusters. This indicates that BAP1del tumors have distinct copy-number profiles from those of BAP1intact tumors.
We identified 692 genes (p-value < 0.01, Kruskal-Wallis test) with significantly different CNA segments between the clusters. These genes were mapped to eight distinct chromosome loci: 19p, 6q, 1q, and 13q were mostly gained in clusters 1 and 3, whereas losses of Xq, 22q, and 7p were mostly in clusters 1 and

4.3.4 Gene fusions in PeM

To identify the presence of gene fusions, we analyzed RNA-seq data in 15 PeM tumors using the deFuse algorithm [114]. Overall, 82 unique gene fusion events were identified using our filtering criteria (see Methods), out of which we successfully validated 18 gene fusions using Sanger sequencing. We observed more gene fusion events in BAP1del tumors than in BAP1intact tumors (Figure 4.3A-B).

Notably, BAP1, SETD2, PBRM1, and KANSL1 were prioritized as driver genes by HIT’nDRIVE on the basis of gene fusion. Fusions in these genes were mostly found in the BAP1del subtype. MTG1-SCART1 was the most recurrent gene fusion, observed in 7 cases. MTG1 regulates the mitochondrial ribosome, which synthesizes proteins essential for oxidative phosphorylation. SCART1 is a pseudogene predicted to act as a co-receptor of certain T-cells. This was followed by GKAP1-KIF27 and KANSL1-ARL17B (Figure 4.3C), each of which was identified in 6 different cases. Three unique fusions were present in PBRM1, 2 in KANSL1, and 1 each in BAP1 and SETD2, all of which are involved in the chromatin remodeling process (Figure 4.1C and 4.3D-F).

4.3.5 The global transcriptome and proteome profile of PeM

To segregate transcriptional subtypes of PeM, we performed total RNA-seq (Illumina HiSeq 4000) and quantification of 15 PeM tumor samples for which RNA was available (RNA for the remaining four tumor samples did not pass the quality control checks). We first performed principal-component analyses and unsupervised consensus clustering of all PeM tumors to determine transcriptomic patterns, using genes selected based on variance among tumor specimens.
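Variance-based gene selection of the kind described above can be sketched as follows. This is a minimal illustration, not the procedure used in this study: it assumes the unbiased sample variance as the ranking criterion and a fixed number of genes to keep, neither of which is specified in the text, and the gene names and values are hypothetical.

```python
from statistics import variance


def most_variable_genes(expression, top_n):
    """Return the top_n gene names ranked by sample variance across tumors.

    expression: dict mapping gene name -> list of per-tumor expression values.
    """
    # Sort gene names by decreasing variance of their expression profile.
    ranked = sorted(expression, key=lambda gene: variance(expression[gene]), reverse=True)
    return ranked[:top_n]


# Hypothetical expression matrix (3 genes x 4 tumors), for illustration only.
example_expression = {
    "GENE_A": [1.0, 1.0, 1.0, 1.0],  # constant across tumors, variance 0
    "GENE_B": [0.0, 5.0, 0.0, 5.0],  # highly variable
    "GENE_C": [2.0, 3.0, 2.0, 3.0],  # mildly variable
}
selected = most_variable_genes(example_expression, top_n=2)
```

Genes that barely change across tumors, such as GENE_A here, carry little information for separating samples, which is the usual motivation for filtering by variance before clustering.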
Consensus clustering revealed two distinct transcriptome sub-groups (Figure 4.4A). We found that BAP1intact and BAP1del tumors have some distinct transcriptomic patterns; however, a few samples showed an overlapping pattern.

We performed mass spectrometry (Fusion Orbitrap LC/MS/MS) with isobaric tagging for expressed peptide identification and corresponding protein quantification, using Proteome Discoverer as the processing pipeline, for 16 PeM tumors and 7 matched normal tissues (matched normal samples for the remaining tumors were not available). We identified 8242 unique proteins in the 23 samples analyzed. (Surprisingly, BAP1 protein was not detected in our MS experiment, likely due to inherent technical limitations with these samples and/or processing; quality control analysis of in-solution HeLa digests also showed very low BAP1, with only a single peptide observed in occasional runs.) First, we analyzed global matched mRNA-protein expression correlation. Although 58% (4715 of 8109) of proteins showed positive mRNA-protein correlation (Pearson correlation; R ≥ 0.1), only 22.7% (1839) of the proteins were strongly correlated with their corresponding mRNA (R ≥ 0.5). Expression of 2.4% (194) of proteins was strongly negatively correlated with their corresponding mRNA (R ≤ -0.5). To analyze the proteomic pattern across PeM tumors, we performed principal-component analyses and unsupervised consensus clustering following the same procedure as described above for the transcriptome. Unlike the transcriptome profiles, the proteome profiles of the BAP1 PeM tumor subtypes did not group into distinct clusters (Figure 4.4B).

To identify Differentially expressed genes (DEG) between BAP1intact and BAP1del, we performed the Wilcoxon signed-rank test using mRNA and protein expression data independently. We identified 1520 and 466 DEG (with p-value < 0.05) using mRNA and protein expression data, respectively. However, only 53 genes were found in common between the two sets of DEG.
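The mRNA-protein concordance thresholds quoted earlier in this section (R ≥ 0.1, R ≥ 0.5, R ≤ -0.5) can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the analysis code of this study: the helper names and the example profiles are hypothetical, and the categories are made mutually exclusive here, whereas in the text the "positive" group (R ≥ 0.1) includes the strongly correlated proteins.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def concordance_category(r):
    """Assign a correlation value to one mutually exclusive category."""
    if r >= 0.5:
        return "strong positive"
    if r >= 0.1:
        return "positive"
    if r <= -0.5:
        return "strong negative"
    return "weak or none"  # also reached for NaN, since all comparisons fail


# Hypothetical per-tumor profiles of one gene, for illustration only.
mrna = [1.0, 2.0, 3.0, 4.0]
protein_up = [2.0, 4.0, 6.0, 8.0]
protein_down = [8.0, 6.0, 4.0, 2.0]
r_up = pearson(mrna, protein_up)
r_down = pearson(mrna, protein_down)
```

Applied gene by gene across the matched tumors, counting the categories gives the kind of summary percentages reported for the cohort.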
As expected, BAP1, PBRM1, SMARCA4, and SMARCD3 were among the top-500 DEG. Many other important cancer-related genes were differentially expressed, such as CDK20, HIST1H4F, ERCC1, APOBEC3A, CDK11A, CSPG4, TGFB1, IL6, LAG3, and ATM.

4.3.6 Transcriptional and post-transcriptional mechanisms regulate chromatin remodeling protein-complexes

Next, we aimed to study the extent to which changes in the copy number profile of a gene affect its corresponding protein expression. For this, we calculated the Pearson correlation between CNA and mRNA expression and between CNA and protein expression. While the copy number profiles of genes, on average, agree well with their corresponding mRNA expression, a number of detected proteins had poor correlation with their respective gene’s copy number profile. Approximately 25% (1871 of 7462) of proteins were observed to have poor correlation with their gene’s copy number; we define these here as “attenuated proteins” (Methods, Figure 4.4C). Among the attenuated proteins, we identified important chromatin remodeling proteins - PBRM1, SETD2, and SMARCC1. The attenuated proteins also included cancer genes such as NF2, EGFR, APC, PIK3CA, and MAP3K4. We observed that the attenuated proteins were significantly enriched with direct protein-protein interaction partners of UBC (hypergeometric test p-value: 10^-5), BAP1 (10^-3), and PBRM1 (10^-2) in the STRING v10 interaction network. Notably, geneset enrichment analysis revealed that attenuated proteins are more likely to form part of a multimeric complex or bind to macromolecules (Figure 4.4D). These results corroborate previous findings from studies analyzing breast, ovarian and colorectal cancer datasets [62]. These attenuated proteins were found to be involved in mRNA processing, the DNA repair pathway, cell cycle regulation, the immune system, and carbohydrate and lipid metabolism.
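The hypergeometric enrichment test mentioned above, for interaction partners among the attenuated proteins, can be sketched as an upper-tail probability. This is a generic illustration with hypothetical counts: the actual population size, partner counts, and overlaps used in this study are not given in the text, and the function name is an assumption.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """Upper-tail hypergeometric probability P(X >= observed).

    population: number of proteins considered in total
    successes:  number of those that are interaction partners of the gene of interest
    draws:      number of attenuated proteins
    observed:   number of attenuated proteins that are interaction partners
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum the probability mass of every overlap at least as large as the observed one.
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
p_value = hypergeom_enrichment_p(population=20, successes=5, draws=5, observed=3)
```

A small p-value indicates that the overlap between the attenuated proteins and the interaction partners is larger than expected if attenuated proteins were drawn at random from the detected proteins.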
Strikingly, we found that DEG between the PeM subtypes are significantly associated with protein attenuation (Chi-squared test p-value: 10^-4 using mRNA expression DEG, 10^-6 using protein expression DEG). These findings suggest that the effects of CNA are attenuated at the protein level via post-transcriptional modification.

To identify large protein complexes that contain the attenuated proteins and are variable between PeM subtypes (i.e. at least one protein subunit of the complex is differentially expressed), we leveraged a manually curated set of core protein complexes from the CORUM database [145]. These included many protein complexes involved in DNA conformation modification, DNA repair, transcriptional regulation, and post-translational modification including ubiquitination. Using our data, we observed that the majority of the protein complexes were highly co-regulated at the protein level rather than at the mRNA level. Notably, we identified the SWI/SNF (BAF and PBAF) and HDAC complexes, which were highly co-regulated (Figure 4.4E-G). We found copy-number deletion in many subunits of the SWI/SNF complex, mostly in the BAP1del subtype (Figure 4.1C). About one quarter of the proteins in the BAF complex and half of the proteins in PBAF were attenuated. PBRM1 was both attenuated at the protein level and differentially expressed between PeM subtypes. SMARCB1 and SMARCA4 were also differentially expressed between PeM subtypes in this complex (Figure 4.4H). We further identified a number of HDAC complex components as highly co-regulated. The complex includes the histone deacetylases HDAC1/2, which regulate expression of a number of genes through chromatin remodeling. About one-third of the protein subunits in the complex were attenuated at the protein level. More importantly, HDAC1, CHD4 and ZMYM2 were differentially expressed between PeM subtypes in the protein complex, and different members of the HDAC protein family were highly expressed in the BAP1del subtype (Figure 4.4I).
This indicates the potential use of HDAC inhibitors to suppress tumor growth in the BAP1del subtype. We note that both the SWI/SNF and HDAC complexes interact with BAP1. The expression patterns of many subunits of these complexes were either highly correlated or highly anti-correlated with BAP1 expression (Figure 4.4E-G). Although mRNA transcripts are transcribed in proportion to the changes in the copy-number profile of the gene, the corresponding proteins are often stabilized when in complex, and free proteins in excess are usually ubiquitinated and targeted for proteasomal degradation to maintain stoichiometry [62].

4.3.7 BAP1del subtype is characterized by distinct expression patterns of genes involved in DNA repair pathway, and immune checkpoint receptor activation

To identify the pathways dysregulated by the DEG between the PeM subtypes, we performed hypergeometric test based geneset enrichment analysis (Methods) using the REACTOME pathway database. Intriguingly, we observed high concordance between the pathways dysregulated by the two sets (mRNA and protein expression data) of top-500 DEG (Figure 4.5A-B). Unsupervised clustering of pathways revealed two distinct clusters for BAP1del and BAP1intact tumors. This indicates that the enriched pathways are also differentially expressed between the patient groups. BAP1del patients demonstrated elevated levels of RNA and protein metabolism as compared to BAP1intact patients. Many genes involved in chromatin remodeling and DNA damage repair were differentially expressed between the groups. Our data suggest that BAP1del tumors have repressed DNA damage response pathways. Most importantly, protein expression data revealed that PARP1 is highly expressed in BAP1del tumors as compared to BAP1intact tumors, indicating the potential of PARP1 inhibition for BAP1del tumors. Genes involved in cell-cycle and apoptotic pathways were observed to be highly expressed in BAP1del patients.
Furthermore, glucose and fatty-acid metabolism pathways were repressed in BAP1del as compared to BAP1intact. More interestingly, we observed a striking difference in immune-system associated pathways between the PeM subtypes. Whereas BAP1del patients demonstrated strong activity of cytokine signaling and the innate immune system, the MHC-I/II antigen presentation system and the adaptive immune system were active in BAP1intact patients.

Prompted by this finding, we next analyzed whether PeM tumors were infiltrated with leukocytes. To assess the extent of leukocyte infiltration, we computed an expression-based (RNA-seq and protein) score using the immune-cell and stromal markers proposed by [206]. We discovered that the immune marker gene score was strongly correlated with the stromal marker gene score (Methods and Figure 4.5C-D). Using the CIBERSORT [124] software, we computationally estimated leukocyte representation in the bulk tumor transcriptome. We observed massive infiltration of T cells in the majority of the PeM tumors (Figure 4.5E). A subset of PeM tumors had massive infiltration of B cells in addition to T cells. Interestingly, when we grouped the PeM tumors by their BAP1 aberration status, there was a marked difference in the proportion of infiltrated plasma cells, natural killer (NK) cells, mast cells, T cells and B cells between the groups. Whereas the proportions of plasma cells, NK cells and B cells were lower in the BAP1del tumors, there was more infiltration of mast cells and T cells in BAP1del tumors as compared to BAP1intact tumors. We performed Tissue microarray (TMA) IHC staining with CD3 and CD8 antibodies on PeM tumors. We observed that BAP1del PeM tumors were positively stained for both CD3 and CD8, confirming infiltration of T cells in BAP1del PeM tumors (Figure 4.5F).
Combined, this strongly indicates that leukocytes from the tumor microenvironment infiltrate the PeM tumors.

Finally, we surveyed the PeM tumors for expression of genes involved in immune checkpoint pathways. A number of immune checkpoint receptors were highly expressed in BAP1del tumors relative to BAP1intact tumors. These included CD274 (PD-L1), CD80, CTLA4, LAG3, and ICOS (Figure 4.5G), for which inhibitors are either clinically approved or at varying stages of clinical trials. Gene expression of these immune checkpoint receptors was highly correlated with the immune score (Figure 4.5H). Moreover, a number of MHC genes, immuno-inhibitor genes and immuno-stimulator genes were differentially expressed between BAP1del and BAP1intact tumors. Furthermore, we analyzed whether the immune checkpoint receptors were differentially expressed in tumors with and without 3p21 loss in PM tumors from TCGA. Unlike in PeM, we did not observe a significant difference in immune checkpoint receptor expression between the PM tumor groups (i.e. BAP1del and BAP1intact). These findings suggest that BAP1del PeM tumors could potentially be targeted with immune checkpoint inhibitors, while PM tumors may be less likely to respond.

4.4 Discussion

In this study, we present a comprehensive integrative multi-omics analysis of malignant peritoneal mesotheliomas. Even though this is a rare disease, we managed to amass a cohort of 19 tumors. Prior studies of mesotheliomas, performed using a single omic platform, have established loss-of-function mutation or copy-number loss of BAP1 as a key driver event in both PeM and PM. Our novel contribution to PeM is that we provide evidence from integrative multi-omics analyses that BAP1 copy number loss (BAP1del) forms a distinct molecular subtype of PeM. This subtype of PeM is characterized by distinct expression patterns of genes involved in chromatin remodeling, the DNA repair pathway, and immune checkpoint activation.
Moreover, the BAP1del subtype has an inflammatory tumor microenvironment. Our results suggest that BAP1del tumors might be prioritized for immune checkpoint blockade therapies. Thus BAP1 may be both a prognostic and a predictive biomarker for PeM, enabling better disease classification and patient treatment.

Structural alterations in PeM tumors were found to be highly heterogeneous and to occur at a lower rate than in most other adult solid cancers. The majority of SNVs and CNAs were typically unique to a patient. However, many of these alterations were non-randomly distributed across critical carcinogenic pathways. We observed many alterations in genes involved in chromatin remodeling, the SWI/SNF complex, the cell cycle and the DNA repair pathway. The SWI/SNF complex is an ATP-dependent chromatin remodeling complex known to harbor aberrations in almost one-fifth of all human cancers [84]. Our results show that the SWI/SNF complex is differentially expressed between PeM subtypes, which further regulates oncogenic and tumor-suppressive pathways. Notably, we also identified another chromatin remodeling complex - the HDAC complex - which is differentially expressed between PeM subtypes. HDAC, known to be regulated by BAP1, is a potential therapeutic target for the BAP1del PeM subtype. Recent in-vitro experiments demonstrated that BAP1 loss altered the sensitivity of PM as well as uveal melanoma (UM) cells to HDAC inhibition [95, 146].

Loss of BAP1 is known to alter chromatin architecture, exposing the DNA to damage and also impairing the DNA-repair machinery [81, 210]. Similar to BRCA1/2-deficient breast and ovarian cancers, BAP1-deficient PeM tumors most likely depend on PARP1 for survival. This rationale can be utilized to test PARP inhibitors in the BAP1del PeM subtype. The DNA repair defects thus drive genomic instability and dysregulate the tumor microenvironment [121].
DNA repair deficiency leads to increased secretion of cytokines, including interferons, that promote tumor-antigen presentation and trigger recruitment of both T and B lymphocytes to destroy tumor cells. In response, tumor cells evade this immune surveillance by increasing expression of immune checkpoint receptors. The results presented here also indicate that PeM tumors are infiltrated with immune cells from the tumor microenvironment. Moreover, the BAP1del subtype displays elevated levels of immune checkpoint receptor expression, which strongly suggests the use of immune checkpoint inhibitors to treat this subtype of PeM. However, in a small subset of PM tumors in the TCGA dataset, the loss of BAP1 did not elevate expression of immune checkpoint marker genes. This warrants further investigation of the characteristics of these groups of PM tumors. Furthermore, BAP1 loss has recently been defined as a distinct molecular subtype of clear cell renal cell carcinoma (ccRCC) and UM [33, 132, 142]. These studies showed that, similar to the BAP1del PeM subtype, BAP1del tumors from both ccRCC and UM also have dysregulated chromatin modifiers, an impaired DNA repair pathway, and immune checkpoint receptor activation. More recent studies in ccRCC [116] and melanoma [127] demonstrated that inactivation of PBRM1 (or the PBAF complex) predicts response to immune checkpoint blocking therapies. Similarly, DNA repair defects have also been shown to be predictive of response to immune checkpoint blocking therapies [60, 97, 98]. This strongly indicates a pan-cancer mechanism of oncogenesis shared among tumors with BAP1 copy-number loss.

The main challenge in mesothelioma treatment is that all current efforts towards testing new therapy options are limited to therapies that have proven successful in other cancer types, without a good knowledge of the underlying molecular mechanisms of the disease.
As a result of sheer desperation, some patients have been treated even though no targeted therapy for mesothelioma has been proven effective as yet. For example, a number of clinical trials are currently in progress exploring the use of immune checkpoint inhibitors (anti-PD1/PD-L1 or anti-CTLA4) in PM and/or PeM patients who progressed under chemotherapy and are positive for immune checkpoint markers. The results of the first few clinical trials report either a very low response rate or no benefit to the patients [9, 25, 26, 110]. Notably, BAP1 copy-number or mutation status was not assessed in these studies. We believe that response rates for immune checkpoint blockade therapies in clinical trials for PeM will improve when patients are segregated by their BAP1 copy-number status.

4.5 Methods

4.5.1 Clinical samples and pathology evaluation

Primary untreated PeM tumors and matched benign samples were obtained from cancer patients undergoing cytoreductive surgeries, following protocols approved by the Clinical Research Ethics Board of the Vancouver General Hospital (Vancouver, BC, Canada), Mount Sinai Hospital (Toronto, ON, Canada), and Moores Cancer Centre (San Diego, CA, USA). This study was approved by the Institutional Review Board of the University of British Columbia and Vancouver Coastal Health (REB No. H15-00902 and V15-00902). All patients signed a formal consent form approved by the respective institutional ethics board. Histologic parameters and pathological scoring of tumors confirming PeM were established by three independent pathologists. H&E and immunostained Formalin-Fixed Paraffin-Embedded (FFPE) slides were reviewed by at least two specialized pathologists to diagnose PeM and its subtype. Hematoxylin and eosin (H&E) staining was used to determine the highest tumor cellularity (≥ 75%) from sections for sequencing. The surgical resections were snap frozen and processed at the respective institutions.
The tumors have a companion normal tissue specimen (either adjacent normal tissue or peripheral blood previously extracted for germline DNA control). Each tumor specimen was approximately 1 cm³ in size and weighed between 100 and 300 mg. Specimens were shipped overnight on dry ice that maintained an average temperature of less than −80 °C. Upon receipt, the tissues were sectioned into 5 slices for DNA, RNA, and protein extraction as well as construction of TMAs.

4.5.2 Construction of tissue microarrays (TMAs)

FFPE tissue blocks were retrieved from the archives of the Department of Pathology, Vancouver General Hospital (Vancouver, Canada). H&E stained slides from each block were reviewed by two pathologists to identify tumor areas. TMAs were constructed with 1 mm diameter tissue cores from representative tumor areas from FFPE blocks. Cores were transferred to a paraffin block using a semi-automated tissue array instrument (Pathology Devices TMArrayer, San Diego, CA). Duplicate tissue cores were taken from each specimen, resulting in a composite TMA block. Reactive mesothelial tissues from pleura were also included as benign controls. Following construction, 4 µm thick sections were cut for H&E and immunohistochemical staining.

4.5.3 Immunohistochemistry and Histopathology

Freshly cut TMA sections were analyzed for immunoexpression using a Ventana Discovery Ultra autostainer (Ventana Medical Systems, Tucson, Arizona). In brief, tissue sections were incubated in Tris-EDTA buffer (CC1) at 37 °C to retrieve antigenicity, followed by incubation with the respective primary antibodies at room temperature or 37 °C for 60-120 min. For primary antibodies, mouse monoclonal antibodies against CD8 (Leica, NCL-L-CD8-4B11, 1:100), CK5/Cytokeratin 5 (Abcam, ab17130, 1:100), and BAP1 (SantaCruz, clone C4, sc-28383, 1:50), a rabbit monoclonal antibody against CD3 (Abcam, ab16669, 1:100), and rabbit polyclonal antibodies against CALB2/Calretinin (LifeSpan BioSciences, LS-B4220, 1:20 dilution) were used. 
Bound primary antibodies were incubated with the Ventana Ultra HRP kit or Ventana universal secondary antibody and visualized using the Ventana ChromoMap or DAB Map detection kit, respectively. All stained slides were digitized with the SL801 autoloader and Leica SCN400 scanning system (Leica Microsystems; Concord, Ontario, Canada) at a magnification equivalent to 20x. The images were subsequently stored in the SlidePath digital imaging hub (DIH; Leica Microsystems) of the Vancouver Prostate Centre. Representative tissue cores were manually identified by two pathologists.

4.5.4 Whole exome sequencing

DNA was isolated from snap-frozen tumors with 0.2 mg/mL Proteinase K (Roche) in cell lysis solution using the Wizard Genomic DNA Purification Kit (Promega Corporation, USA). Digestion was carried out overnight at 55 °C before incubation with RNase solution at 37 °C for 30 minutes and treatment with protein precipitation solution, followed by isopropanol precipitation of the DNA. The amount of DNA was quantified on the NanoDrop 1000 Spectrophotometer, and an additional quality check was done by reviewing the 260/280 ratios. Quality checks were also done on the extracted DNA by running the samples on a 0.8% agarose/TBE gel with ethidium bromide.

For Ion AmpliSeq™ Exome Sequencing, 100 ng of DNA, based on Qubit® dsDNA HS Assay (Thermo Fisher Scientific) quantitation, was used as input for Ion AmpliSeq™ Exome RDY Library Preparation. This is a Polymerase Chain Reaction (PCR) based sequencing approach using 294,000 primer pairs (amplicon size range 225-275 bp), and covers >97% of Consensus CDS (CCDS; Release 12), >19,000 coding genes and >198,000 coding exons. Libraries were prepared, quantified by Quantitative Polymerase Chain Reaction (QPCR) and sequenced according to the manufacturer’s instructions (Thermo Fisher Scientific). Samples were sequenced on the Ion Proton System using the Ion PI™ Hi-Q™ Sequencing 200 Kit and Ion PI™ v3 chip. Two libraries were run per chip for a projected coverage of 40M reads per sample.

4.5.5 Somatic variant calling

Torrent Server (Thermo Fisher Scientific) was used for signal processing, base calling, read alignment, and generation of results files. Specifically, following sequencing, reads were mapped against the human reference genome hg19 using the Torrent Mapping Alignment Program. The mean target coverage ranged from 78.62 to 226.44; thus the sequencing depth ranged from 78X to 226X. Variants were identified using the Torrent Variant Caller plugin with the optimized parameters for AmpliSeq exome sequencing recommended by Thermo Fisher. The Variant Call Format (VCF) files from all samples were combined using GATK (3.2-2) [47] and all variants were annotated using ANNOVAR [197]. Only non-silent exonic variants, including non-synonymous SNVs, stop-codon gain SNVs, stop-codon loss SNVs, splice site SNVs and In-Dels in coding regions, were kept if they were supported by more than 10 reads and had an allele frequency higher than 10%. To obtain somatic variants, we filtered against dbSNP build 138 (non-flagged only) and the matched adjacent benign or blood samples sequenced in this study. Putative variants were manually scrutinized on the Binary Alignment Map (BAM) files through Integrative Genomics Viewer (IGV) version 2.3.25 [179].

4.5.6 Copy number aberration (CNA) calls

Copy number changes were assessed using Nexus Copy Number Discovery Edition Version 8.0 (BioDiscovery, Inc., El Segundo, CA). Nexus NGS functionality (BAM ngCGH) with the FASST2 Segmentation algorithm (a Circular Binary Segmentation/Hidden Markov Model approach) was used to make copy number calls. The significance threshold for segmentation was set at 5 × 10⁻⁶, also requiring a minimum of 3 probes per segment and a maximum probe spacing of 1000 between adjacent probes before breaking a segment. The log ratio thresholds for single copy gain and single copy loss were set at +0.2 and −0.2, respectively. 
The log ratio thresholds for gain of 2 or more copies and for a homozygous loss were set at +0.6 and −1.0, respectively. Tumor sample BAM files were processed with the corresponding normal tissue BAM files. Reference reads per CN point (window size) was set at 8000. Probes were normalized to the median. Relative copy number profiles from exome sequencing data were determined by normalizing tumor exome coverage to values from whole blood controls. The germline exome sequences were used to obtain allele-specific copy number profiles and to generate segmented copy number profiles. The GISTIC module on Nexus identifies significantly amplified or deleted regions across the genome. Each aberration is assigned a G-score that reflects its amplitude as well as its frequency of occurrence across multiple samples. False Discovery Rate q-values for the aberrant regions have a threshold of 0.15. For each significant region, a “peak region” is identified, which is the part of the aberrant region with the greatest amplitude and frequency of alteration. In addition, a “wide peak” is determined using a leave-one-out algorithm to allow for errors in the boundaries in a single sample. The “wide peak” boundaries are more robust for identifying the most likely gene targets in the region. Each significantly aberrant region is also tested to determine whether it results primarily from broad events (longer than half a chromosome arm), focal events, or significant levels of both. The GISTIC module reports the genomic locations and calculated q-values for the aberrant regions. It identifies the samples that exhibit each significant amplification or deletion, and it lists genes found in each “wide peak” region.

4.5.7 Transcriptome sequencing (RNA-seq)

Total RNA from 100 µm sections of snap-frozen tissue was isolated using the mirVana Isolation Kit from Ambion (AM-1560). 
Strand-specific RNA sequencing was performed on quality-controlled, high RIN value (>7) RNA samples (Bioanalyzer, Agilent Technologies) before processing at the high throughput sequencing facility core at BGI Genomics Co., Ltd. (The Children’s Hospital of Philadelphia, Pennsylvania, USA). In brief, 200 ng of total RNA was first treated to remove the ribosomal RNA (rRNA) and then purified using the Agencourt RNA Clean XP Kit (Beckman Coulter) prior to analysis with the Agilent RNA 6000 Pico Chip to confirm rRNA removal. Next, the rRNA-depleted RNA was fragmented and converted to cDNA. Subsequent steps include end repair, addition of an ‘A’ overhang at the 3’ end, and ligation of the indexing-specific adaptor, followed by purification with Agencourt Ampure XP beads. The strand-specific RNA library prepared using TruSeq (Illumina Catalogue No. RS-122-2201) was amplified and purified with Ampure XP beads. Size and yield of the barcoded libraries were assessed on the LabChip GX (Caliper), with an expected distribution around 260 base pairs. The concentration of each library was measured with real-time PCR. Pools of indexed libraries were then prepared for cluster generation and PE100 sequencing on Illumina HiSeq 4000.

4.5.8 Transcriptome (RNA-seq) quantification

Using the splice-aware aligner STAR (2.3.1z) [50], RNA-seq reads (~200 MB in size) were aligned onto the human genome reference (GRCh38) and exon-exon junctions, according to the known gene model annotation from Ensembl release 80 (http://www.ensembl.org). Apart from protein coding genes, non-coding RNA types and pseudogenes are further annotated and classified. Based on the alignment and the gene annotation (Ensembl release 80), gene expression profiles were calculated. Only reads that were unique to one gene and corresponded exactly to one gene structure were assigned to the corresponding genes using the Python tool HTSeq [11]. Normalization of read counts was conducted with the R package DESeq [10], which was designed for gene expression analysis of RNA-seq data across different samples.

4.5.9 Identification of fusion transcripts and validation

We used the deFuse algorithm [114] to predict rearrangements in RNA sequence libraries. The deFuse fusion transcript prediction calls were further filtered using the following criteria. A fusion gene candidate: (1) must be predicted to have arisen from genome rearrangement, rather than via a readthrough event; (2) must be predicted in no more than two sequence libraries; (3) must map unambiguously on both sides of the predicted breakpoints (that is, no multi-mapping reads); (4) must not map entirely to repetitive elements; (5) must be detected in >5 reads (either split or spanning); and (6) must have at least one of the fusion partner transcripts expressed.

Prioritized putative gene fusions were verified by designing PCR primers around the predicted fusion sites. Specifically, Reverse Transcription PCR (RT-PCR) was used to amplify the predicted fusion gene junctions from the same starting RNA material (100 ng) as was used for RNA-seq. Two primers (20-22 bp) spanning the exon boundary of fused genes were designed using Primer3 (v. 0.4.0) [186]. PCR was performed in 20 µl reactions using Q5 buffer (NEB), 0.2 mM dNTPs, 0.4 µM each primer, 0.12 units Q5 High-Fidelity DNA Polymerase (NEB) and 2 µl of the RT reaction. The PCR reaction was carried out with the following program: 95 °C for 30 seconds, followed by 30 cycles of 95 °C for 10 seconds, 57 °C for 20 seconds and 72 °C for 10 seconds. Resulting PCR products, ranging in size from 150 bp to 250 bp, were purified using AMPure beads (Agencourt) and sequenced using Sanger sequencing to verify fusion junctions.

4.5.10 Proteomics analysis using mass spectrometry

Fresh frozen samples dissected from tumor and adjacent normal tissue were individually lysed in 50 mM HEPES pH 8.5, 1% SDS, and the chromatin content was degraded with benzonase. 
The tumor lysates were sonicated (Bioruptor Pico, Diagenode, New Jersey, USA), and disulfide bonds were reduced with DTT and capped with iodoacetamide. Proteins were cleaned up using the SP3 method [78, 79] (Single Pot, Solid Phase, Sample Prep) and then digested overnight with trypsin in HEPES pH 8; peptide concentration was determined by Nanodrop (Thermo) and adjusted to an equal level. A pooled internal standard control was generated comprising equal volumes of every sample (10 µl of each of the 100 µl total digests) and split into 3 equal aliquots. The labeling reactions were run as three TMT 10-plex panels (9+IS), then desalted, and each panel was divided into 48 fractions by reverse phase HPLC at pH 10 with an Agilent 1100 LC system. The 48 fractions were concatenated into 12 superfractions per panel by pooling every 4th fraction eluted, resulting in a total of 36 samples.

These samples were analyzed with an Orbitrap Fusion Tribrid Mass Spectrometer (Thermo Fisher Scientific) coupled to an EasyNanoLC 1000 using a data-dependent method with synchronous precursor selection MS3 scanning for TMT tags. A short description follows; a more detailed overview is in [79]. Briefly, an in-house packed reverse phase column run with a 2 hour low pH acetonitrile gradient (5-40% with 0.1% formic acid) was used to separate and introduce peptides into the MS. Survey scans covering m/z 350-1500 were acquired in profile mode at a resolution of 120,000 (at m/z 200) with an S-Lens RF Level of 60%, a maximum fill time of 50 milliseconds, and an Automatic Gain Control (AGC) target of 4 × 10⁵. For MS2, monoisotopic precursor selection was enabled with triggering charge state limited to 2-5, a threshold of 5 × 10³ and 10 ppm dynamic exclusion for 60 seconds. 
Centroided MS2 scans were acquired in the ion trap in Rapid mode after CID fragmentation with a maximum fill time of 20 milliseconds, a 1 m/z quadrupole isolation window, collision energy of 30%, activation Q of 0.25, injection for all available parallelizable time turned ON, and an AGC target value of 1 × 10⁴. For MS3, fragment ions were isolated from a 400-1200 m/z precursor range, with ion exclusion of 20 m/z low and 5 m/z high, isobaric tag loss exclusion for TMT, and a top 10 precursor selection. Acquisition was in profile mode with the Orbitrap after HCD fragmentation (NCE 60%) with a maximum fill time of 90 milliseconds, a resolution of 50,000, a 120-750 m/z scan range, an AGC target value of 1 × 10⁵, and all available parallelizable time turned ON. The total allowable cycle time was set to 4 seconds.

4.5.11 Peptide identification and protein quantification

Qualitative and quantitative proteomics analysis was done using Proteome Discoverer (Thermo Fisher Scientific). To maintain consistency with the transcriptome annotation, we used the Ensembl GRCh38.87 human reference proteome sequence database for proteome annotation. Sequest HT 1.3 was used for Peptide Spectral Matches (PSM), with parameters specified as trypsin enzyme, two missed cleavages allowed, minimum peptide length of 6, precursor mass tolerance of 10 ppm, and a fragment mass tolerance of 0.6 Da. We allowed up to 4 variable modifications per peptide from the following categories: acetylation at protein terminus, methionine oxidation, and TMT label at N-terminal residues and the side chains of lysine residues. In addition, carbamidomethylation of cysteine was set as a fixed modification. PSM results were filtered using a q-value cutoff of 0.05 to control the FDR, as determined by Percolator. Identified peptides from both high and medium-confidence levels after FDR filtering were included in the final stage to provide protein identification and quantification results. 
Reporter ions from MS3 scans were quantified with an integration tolerance of 20 ppm using the most confident centroid. Proteins were further filtered to include only those found with a minimum of one peak in all samples. Proteome Discoverer processed data was exported for further statistical analysis.

4.5.12 Mutational signature analysis

We used deconstructSigs [143], a multiple regression approach, to statistically quantify the contribution of each mutational signature in each tumor. The mutational signatures were obtained from the COSMIC mutational signature database [8]. Both silent and non-silent somatic mutations were used together to obtain the mutational signatures. Only mutational signatures with a weight greater than 0.06 were considered for analysis.

4.5.13 Prioritization of driver genes using HIT’nDRIVE

Non-silent somatic mutation calls, CNA gain or loss, and gene-fusion calls were collapsed into a gene-patient alteration matrix with binary labels. Gene-expression values were used to derive an expression-outlier gene-patient matrix using the GESD test. The STRING v10 [167] protein-interaction network was used to compute pairwise influence values between the nodes in the interaction network. We integrated these genome and transcriptome data using the HIT’nDRIVE algorithm [155]. The following parameters were used: α=0.9, β=0.6, and γ=0.8. We used IBM-CPLEX as the ILP solver.

4.5.14 Consensus clustering

We used the ConsensusClusterPlus [199] R package to perform consensus clustering with the following parameters: maximum cluster number to evaluate: 10, number of subsamples: 10000, proportion of items to sample: 0.8, proportion of features to sample: 1, cluster algorithm: hierarchical, distance: pearson.

4.5.15 Protein attenuation analysis

For every gene/protein profiled for CNA (segment mean), RNA-seq (normalized log2 expression), and MS (normalized log2 expression), we performed the following analysis. 
For every gene/protein, the Pearson correlation coefficients were calculated for CNA-mRNA expression ($R_{\mathrm{CNA:mRNA}}$) and CNA-protein expression ($R_{\mathrm{CNA:protein}}$). The 75th percentile of the difference between these two correlation coefficients, i.e. $R_{\mathrm{diff}} = R_{\mathrm{CNA:mRNA}} - R_{\mathrm{CNA:protein}}$, was found to be approximately 0.45. Therefore, proteins with $R_{\mathrm{diff}} \geq 0.45$ were considered attenuated proteins.

4.5.16 Pathway enrichment analysis

The selected set of genes was tested for enrichment against gene sets of pathways present in the Molecular Signature Database (MSigDB) v6.0 [162]. A hypergeometric test based gene set enrichment analysis was used for this purpose (https://github.com/raunakms/GSEAFisher). A cut-off threshold of FDR < 0.01 was used to obtain the significantly enriched pathways. Only pathways enriched with at least three differentially expressed genes were considered for further analysis. To calculate the pathway activity score, the expression dataset was transformed into the standard normal distribution using the ‘inverse normal transformation’ method. This step is necessary for fair comparison between the expression values of different genes. For each sample, the pathway activity score is the mean expression level of the differentially expressed genes linked to the enriched pathway.

4.5.17 Stromal and immune score

We used two sets of 141 genes (one each for stromal and immune gene signatures) as described in [206]. We used the ‘inverse normal transformation’ method to transform the distribution of expression data into the standard normal distribution. 
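The exact variant of the inverse normal transformation is not specified here. The following is a minimal sketch, assuming a rank-based transform with the Blom offset (an assumption, not necessarily the variant used in this study): each gene's expression values across samples are ranked and mapped to standard normal quantiles, and a per-sample gene-set score is then an aggregate of the transformed values (the mean for the pathway activity score of Section 4.5.16, the sum for the stromal and immune scores of this section). The function names and the toy data are hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.375):
    """Map one gene's values (one per sample) to standard normal deviates.

    Assumption: rank-based transform with the Blom offset (0.375);
    tied values receive their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [
        standard_normal.inv_cdf((rank - offset) / (n - 2 * offset + 1))
        for rank in ranks
    ]


def gene_set_score(expression, gene_set, aggregate=sum):
    """Per-sample score: aggregate of transformed values over a gene set.

    expression: dict mapping gene -> list of expression values (one per sample).
    aggregate: sum (as for the stromal/immune scores) or statistics.mean
               (as for the pathway activity score).
    """
    transformed = [
        inverse_normal_transform(expression[gene])
        for gene in gene_set
        if gene in expression
    ]
    return [aggregate(per_sample) for per_sample in zip(*transformed)]


# Hypothetical toy data: two genes measured in three samples.
toy_expression = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [5.0, 9.0, 7.0]}
toy_scores = gene_set_score(toy_expression, ["GENE_A", "GENE_B"])
```

Transforming each gene separately puts all genes on the same scale before aggregation, which is the purpose of the step stated in Section 4.5.16.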
The stromal and immune scores were calculated, for each sample, as the sum of the standard normal deviates of each gene in the given set.

4.5.18 Enumeration of tissue-resident immune cell types using mRNA expression profiles

The CIBERSORT algorithm [124] was applied to the RNA-seq gene-expression data to estimate the proportions of 22 immune cell types (B cells naive, B cells memory, Plasma cells, T cells CD8, T cells CD4 naive, T cells CD4 memory resting, T cells CD4 memory activated, T cells follicular helper, T cells gamma delta, T cells regulatory (Tregs), NK cells resting, NK cells activated, Monocytes, Macrophages M0, Macrophages M1, Macrophages M2, Dendritic cells resting, Dendritic cells activated, Mast cells resting, Mast cells activated, Eosinophils, and Neutrophils) using the LM22 dataset provided by the CIBERSORT platform. Genes not expressed in any of the PeM tumor samples were removed from the LM22 dataset. The analysis was performed using 1000 permutations. The 22 immune cell types were later aggregated into 11 distinct groups.

4.5.19 External datasets

TCGA datasets for 16 different cancer types used in this study were downloaded from the National Cancer Institute Genomic Data Commons (NCI-GDC; https://portal.gdc.cancer.gov/) in February 2017. For somatic mutation data, non-silent variant calls that were identified by at least three out of four different tools (MuSE, MuTect2, SomaticSniper and VarScan2) were considered. CNA segmented data were further processed using Nexus Copy Number Discovery Edition Version 9.0 (BioDiscovery, Inc., El Segundo, CA) to identify aberrant regions in the genome. In the case of the RNA-seq expression data, HTSeq-FPKM-UQ normalized data were used.

Figure 4.1: Landscape of somatic mutations in PeM tumors. (A) Comparison of somatic mutation rate in protein-coding regions of PeM with different adult cancers obtained from TCGA. (B) Mutational signature present in PeM (top panel). 
Proportional contribution of different COSMIC mutational signatures per tumor sample. (C) Somatic alterations identified in PeM tumors grouped by important cancer pathways. LUSC: Lung Squamous Cell Carcinoma, LUAD: Lung adenocarcinoma, BLCA: Urothelial Bladder Carcinoma, COAD: Colorectal carcinoma, UCEC: Uterine Corpus Endometrial Carcinoma, OV: Ovarian cancer, KIRP: Kidney renal papillary cell carcinoma, KIRC: Kidney Renal Clear Cell Carcinoma, UCS: Uterine Carcinosarcoma, GBM: Glioblastoma Multiforme, BRCA: Breast Invasive Carcinoma, MESO-PM: Malignant Pleural Mesothelioma, MESO-PeM: Malignant Peritoneal Mesothelioma, PAAD: Pancreatic Adenocarcinoma, PRAD: Prostate Adenocarcinoma, KICH: Kidney Chromophobe, TGCT: Testicular Germ Cell Tumor.

Figure 4.2: Landscape of copy number aberrations in PeM tumors. (A) Aggregate copy-number alterations by chromosome regions in PeM tumors. Important genes with copy-number changes are highlighted. (B) Sample-wise view of copy-number alterations in PeM tumors. (C) Comparison of copy-number burden (considering protein-coding regions only) in PeM with respect to other adult cancers obtained from TCGA. (D) Highly aberrant genomic regions in PeM prioritized by GISTIC. (E) mRNA expression pattern of BAP1 across all PeM samples. The Wilcoxon signed-rank test p-value for BAP1 mRNA expression compared between the PeM subtypes is indicated in the box. (F) Detection of BAP1 nuclear protein expression in PeM tumors by immunohistochemistry (Photomicrographs magnification - 20x). (G) Unsupervised consensus clustering of tumor samples based on copy-number segmentation mean values of the 3349 most variable genes.

Figure 4.3: Gene fusions in PeM. (A-B) Circos plots showing the gene fusion events identified in PeM tumors. (A) BAP1intact subtype (B) BAP1del subtype. (C-F) A few selected gene fusion events identified in PeM tumors. The top and middle panels show the chromosome and the transcripts involved in the gene fusion event. 
The bottom panel shows the RNA-seq read counts detected for the respective transcripts. (C) KANSL1-ARL17B fusion, (D) PBRM1-ADGB fusion, (E) SETD2-CHP1 fusion, and (F) PHF7-PBRM1 fusion. (G-J) The chromatograms showing the Sanger sequencing validation of the fusion-junction point.

Figure 4.4: Transcriptome and proteome profile of PeM. (A-B) Principal component analysis of PeM tumors using (A) transcriptome profiles and (B) proteome profiles. (C) Effects of CNA on transcriptome and proteome. In the scatterplot, each dot represents a gene/protein. The horizontal and vertical axes represent the Pearson correlation coefficient between CNA-transcriptome and CNA-proteome, respectively. Key cancer genes that undergo protein attenuation have been highlighted. (D) Gene set enrichment analysis of attenuated proteins against gene ontologies (left panel) and Reactome pathways (right panel). (E-G) CORUM core protein complexes regulated by PBRM1 and/or BAP1. The nodes represent individual protein subunits of the respective complex. The node color represents the correlation of mRNA expression of the respective gene with BAP1. The border color of the node indicates whether the respective protein is attenuated or not. The edges represent interactions between the protein subunits. The edge information was extracted from the STRING v10 PPI network. The edge color (and edge thickness) represents the correlation of protein expression between the respective interaction partners. (E) SWI/SNF complex B (PBAF), (F) SWI/SNF complex A (BAF), and (G) HDAC complex. (H-I) mRNA and protein expression level differences between PeM subtypes. (H) SWI/SNF complex and (I) HDAC complex. The expression levels are log2 transformed and mean normalized.

Figure 4.5: Immune cell infiltration in PeM tumors. (a-b) Pathway enrichment of the top 500 differentially expressed genes between PeM subtypes obtained using (a) mRNA expression and (b) protein expression. 
(c-d) Correlation between immune score and stromal score derived for each tumor sample using (c) mRNA expression and (d) protein expression. (e) Estimated relative mRNA fractions of leukocytes infiltrated in PeM tumors based on CIBERSORT analysis. (f) CD3 and CD8 immunohistochemistry showing immune cell infiltration in a BAP1del PeM tumor (Photomicrographs magnification - 20x). (g) mRNA expression differences in immune checkpoint receptors between PeM subtypes. The bar plot on the right represents the negative log10 of the Wilcoxon signed-rank test p-value computed between PeM subtypes. (h) Correlation between immune score and mRNA expression of immune checkpoint receptors. The expression levels are log2 transformed and mean normalized.

Chapter 5

Combinatorial detection of conserved alteration patterns for identifying cancer subnetworks

5.1 Introduction

Recent large scale pan-cancer sequencing projects have revealed a multitude of somatic genomic, transcriptomic, proteomic and epigenomic alterations across cancer types. However, a tumor is likely driven by a select few alterations that provide an evolutionary advantage to the tumor, hence called “driver” alterations [195]. Distinguishing driver alterations from functionally inconsequential random “passenger” alterations is critical for therapeutic development and cancer treatment.

It is evident that, except for a few cases, cancers are often driven by multiple driver genes [12, 155]. Whereas the emergence of alterations is likely a consequence of endogenous or exogenous mutagen exposures [7], their evolutionary selection depends on the functional role of the affected genes [195] and synergistic combinations of different alterations. For example, the TMPRSS2-ERG gene fusion is considered an early driver event in almost half of prostate cancer cases, and it often co-exists with copy-number deletions of PTEN as well as NKX3-1 to drive cancer progression [31, 90, 93]. 
Recently, concomitant deletion of four cancer genes (BAP1, SETD2, PBRM1, and SMARCC1) in chromosome locus 3p21 has been identified as a driver event in a fraction of clear cell renal cell carcinoma (ccRCC) [33], uveal melanoma [142], and mesotheliomas [208]. These genes are involved in the chromatin remodeling process, and their loss further impairs the DNA damage repair pathway in the aberrant tumors [142].

Co-occurring alterations might be evolutionarily selected because an alteration in one gene might enhance the deleterious effect of the other [28]. Such co-selected genes are often part of a functionally interacting driver subnetwork (or pathway); they are observed together in the same tumor and define its phenotype. In fact, as demonstrated by the pan-cancer and other large scale sequencing efforts, co-occurring genomic and transcriptomic alterations in specific tumor types are commonly shared across a large fraction of patients. Thus efficient computational methods that can identify large subsets of functionally interacting (genomic or transcriptomic) alterations, highly conserved across specific tumor types, are in high demand.

5.2 Literature Review

Recently, a number of computational methods have been developed to identify recurrent genomic (as well as transcriptomic) alteration patterns across tumor samples. Some of these methods have been designed to identify multiple gene alterations simultaneously, based on their co-occurrence or mutual exclusivity relationships in a tumor cohort, without any reference to a molecular interaction network [45, 89, 118]. Other approaches have been developed with the aim of identifying a specific subnetwork within a molecular interaction network, either through (i) a combinatorial formulation, with the goal of maximizing the total weight of the subnetwork in a molecular interaction network with node (and possibly edge) weights [53, 108], or (ii) a network diffusion process to derive specific mutated pathways [102, 189]. 
A direction particularly relevant to our study is motivated by [6, 88, 185, 189] and explored by Bomersbach et al. [18], who proposed an alternative formulation for finding a subnetwork of a given size k with the goal of minimizing h, the number of samples for which at least one gene of the subnetwork is in an altered state. (A similar formulation where the goal is to maximize a weighted difference of k and h, for varying size k, can be found in [76].) Although the above combinatorial problems are typically NP-hard, they become manageable through the use of state of the art ILP solvers or greedy heuristics, or by the use of complex preprocessing procedures.

Complementary to the ideas proposed above, there are also several approaches to identify mutually exclusive (rather than jointly altered) sets of genes and pathways [37, 117, 190]. These approaches utilize the mutational heterogeneity prevalent in cancer genomes, and are driven by the observation that mutations acting on the same pathway are often mutually exclusive across tumor samples. Although these approaches are very interesting from a methodological point of view, they are not trivially extendable to the problem of identifying co-occurring alteration patterns (involving more than two genes) conserved across many samples.

5.3 Our Contributions

In this chapter, we present a novel computational method, cd-CAP (combinatorial detection of Conserved Alteration Patterns), that primarily uses an ILP formulation to identify subnetworks of an interaction network, each with an alteration pattern conserved across (a large subset of) a tumor sample cohort. Some of the previous methods described above attempt to solve a variant of the problem, but do so by considering only a single network and using binary labeled genes, indicating whether the gene is altered or not. 
Unlike these approaches, our method simultaneously identifies more than one subnetwork, and each gene within each subnetwork has labels specific to the alteration types it harbors. In fact, we allow a gene to have more than one label, each corresponding to a specific alteration type: somatic mutation, copy number alteration, or aberrant expression. From this point on we will refer to each distinct alteration type as a specific “color” of the corresponding node in an interaction network.

The algorithmic framework of cd-CAP consists of two major steps. The first step is an exhaustive search method, a variant of the a-priori algorithm that was originally designed for association rule mining [3]. This step computes the set of all “candidate” subnetworks (each with a distinct color assignment) of size at most k shared among at least t samples (both k and t are user defined parameters). cd-CAP provides the user with the additional options that (i) at least two distinct colors should be present in the coloring of a subnetwork, or (ii) each sample network can include up to a fraction δ of nodes whose color assignment differs from that of the “template”. cd-CAP also gives the user the option to stop at this point and obtain (a) the largest colored subnetwork that appears in at least t samples (we report on some results obtained with this option), or (b) the colored subnetwork of size k that is shared by the largest number of samples. Alternatively, the second step solves, via ILP, the maximum conserved subnetwork cover problem, which asks to cover the maximum number of nodes in all samples with at most l colored subnetworks obtained in the first step (l is user defined).

We have applied cd-CAP, with each of the possible options above, i.e., (i), (ii), (a) and (b), to TCGA breast cancer (BRCA), colorectal adenocarcinoma (COAD), and glioblastoma (GBM) datasets, which collectively include over 1000 tumor samples. 
cd-CAP identified several connected subnetworks of interest, each exhibiting a specific gene alteration pattern across a large subset of samples.

In particular, cd-CAP results with option (a) demonstrated that many of the largest highly conserved subnetworks within a tumor type solely consist of genes that have been subject to copy number gain, typically located on the same chromosomal arm and thus likely a result of a single, large scale amplification. One of these subnetworks, which cd-CAP observed in about one third of the COAD samples [170], includes 9 genes in chromosomal arm 20q, which corresponds to a known amplification recurrent in colorectal tumors. Another copy-number gain subnetwork cd-CAP observed in breast cancer samples corresponds to a recurrent large scale amplification in chromosome 1 [42]. It is interesting to note that cd-CAP was able to re-discover these events without specific training.

Several additional subnetworks identified by option (a) solely consist of genes that are aberrantly expressed. Further analysis with options (ii) and (b) of cd-CAP revealed subnetworks that capture signaling pathways and processes critical for oncogenesis in a large fraction of tumors. We have also demonstrated that the subnetworks identified through all three options of cd-CAP are associated with patients’ survival outcome and hence are clinically important.

In order to assess the statistical significance of subnetworks discovered by cd-CAP with option (a), we introduce for the first time a model in which likely inter-dependent events, in particular amplification or deletion of all genes in a single chromosome arm, are considered as a single event.
Conventional models of gene amplification either consider each gene amplification independently [36] (this is the model we implicitly assume in our combinatorial optimization formulations, giving a lower bound on the true p-value), or assume that each amplification can involve more than one gene (forming a subsequent sequence of genes) but with the added assumption that the original gene structure is not altered and the duplications occur in some orthogonal “dimension” [54, 148, 211]. Both models make assumptions that do not hold in reality, but inferring the evolutionary history of a genome with arbitrary duplications (that convert one string to another, longer string, by copying arbitrary substrings to arbitrary destinations) is NP-hard and even hard to approximate [40, 123]. By considering all copy number gain or loss events in the same chromosomal arm as a single event, we are, for the first time, able to compute an estimate that provides an empirical upper bound to the statistical significance (p-value) of the subnetworks discovered. (Note that this is not a true upper bound since a duplication event may involve both arms of a chromosome, but that would be very rare.) Through this upper bound, together with the lower bound above, we can sandwich the true p-value and thus the significance of our discovery.

5.4 Algorithmic Framework of cd-CAP

5.4.1 Combinatorial Optimization Formulation

Consider an undirected and node-labeled graph $G = (V, E)$, representing the human gene or protein interaction network, with $n$ nodes where $v_j \in V$ represent genes and $e = (v_h, v_j) \in E$ represent interactions among the genes/proteins. Let us assume that we have $m$ copies of the original network $G$, where each copy represents an individual sample $P_i$ in a cohort. In each network $G_i = (V, E, C_i)$ corresponding to sample $P_i$, each node $v_{i,j}$ (as a copy of $v_j$) is colored with one or more possible colors to form the set $C_{i,j}$ (i.e. $C_i$ maps $v_{i,j}$ to a possibly empty subset of colors $C_{i,j}$).
Each color represents a distinct type of alteration harbored by a gene/protein, in particular somatic mutation (single nucleotide alteration or short indel), copy number gain, copy number loss or significant alteration in expression (which can be trivially expanded to include genic structural alteration - micro-inversion or duplication, gene fusion, alternative splicing, methylation alteration, non-coding sequence alteration) observed in the gene and the protein product. Without loss of generality, $C_{i,j} = \emptyset$ implies that none of the possible alteration events are observed at $v_{i,j}$, and two nodes $v_{i,j}, v_{i',j}$ corresponding to each other in two distinct samples have at least one matching color if $C_{i,j} \cap C_{i',j} \neq \emptyset$.

The main goal of cd-CAP is to identify conserved patterns of (i.e. identically colored) connected subnetworks across a subset of sample networks $G_i$. Consider a connected subnetwork $T = (V_T, E_T)$ of the original interaction network $G$, where each node $v_j \in V_T$ is assigned exactly one color $c_j$. Such a colored subnetwork is said to be shared by a collection of sample networks $G_i$ ($i \in I$) if each node of the subnetwork harbors the same color in every sample network, i.e. $c_j \in \bigcap_{i \in I} C_{i,j}$ for each $v_j \in V_T$. A colored node in a sample network is said to be covered by a subnetwork if the subnetwork is shared by the node’s sample network (Fig. 5.1). Intuitively, a colored subnetwork represents a conserved pattern or a network motif.

cd-CAP combinatorially formulates the problem of identifying conserved patterns of subnetworks as the Maximum Conserved colored Subnetwork Identification problem (MCSI). Here the goal is to find the largest connected subnetwork $S$ of the interaction network $G$ that occurs in exactly $t$ (a user specified number) samples $\mathcal{P}$, such that each node in $S$ has the same color in each sample $P_i \in \mathcal{P}$. Note that this formulation is orthogonal to that used in [18] and [76], where the goal is to maximize the number of samples that share a fixed size subnetwork.
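The definitions of a shared colored subnetwork and of MCSI can be made concrete on a toy cohort. The following is a minimal Python sketch under stated assumptions: the data layout (`adj`, `colors`), the function names, and the gene and color labels are hypothetical and not part of cd-CAP. The search keeps any pattern shared by at least `t` samples and grows it one neighboring node at a time; it omits the pruning and duplicate avoidance that a practical implementation would need.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A colored subnetwork: a set of (gene, color) pairs, one color per gene.
Pattern = FrozenSet[Tuple[str, str]]


def samples_sharing(pattern: Pattern,
                    colors: Dict[str, Dict[str, Set[str]]]) -> Set[str]:
    """Samples i such that c_j is in C_{i,j} for every (v_j, c_j) in the pattern."""
    return {
        sample
        for sample, node_colors in colors.items()
        if all(c in node_colors.get(v, set()) for v, c in pattern)
    }


def largest_conserved(adj: Dict[str, Set[str]],
                      colors: Dict[str, Dict[str, Set[str]]],
                      t: int) -> Set[Pattern]:
    """Level-wise growth: return all largest connected patterns with depth >= t."""
    palette = {c for nc in colors.values() for cs in nc.values() for c in cs}
    # Size-1 patterns that are shared by at least t samples.
    level = {
        frozenset([(v, c)])
        for v in adj
        for c in palette
        if len(samples_sharing(frozenset([(v, c)]), colors)) >= t
    }
    best: Set[Pattern] = set()
    while level:
        best = level
        next_level: Set[Pattern] = set()
        for pattern in level:
            nodes = {v for v, _ in pattern}
            # Only neighbors of the current pattern keep it connected.
            frontier = set().union(*(adj[v] for v in nodes)) - nodes
            for u in frontier:
                for c in palette:
                    candidate = pattern | {(u, c)}
                    if len(samples_sharing(candidate, colors)) >= t:
                        next_level.add(candidate)
        level = next_level
    return best


# Toy interaction network (a path A - B - C) and three hypothetical samples.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
colors = {
    "P1": {"A": {"gain"}, "B": {"gain"}, "C": {"up"}},
    "P2": {"A": {"gain"}, "B": {"gain", "mut"}, "C": {"down"}},
    "P3": {"A": {"gain"}, "B": {"mut"}},
}
result = largest_conserved(adj, colors, t=2)
```

On this toy input two patterns of size 2 survive at depth 2, and neither can be extended with node `C` because no color of `C` is shared by two samples.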
The advantage of formulating the problem as MCSI is that it naturally admits a generalization of the a-priori algorithm. We also note that our formulation considers distinct types of mutations (as colors) in the conserved alteration patterns, another key improvement over that used in [18, 76].

cd-CAP also supports simultaneous identification of multiple conserved subnetworks that are altered in a large number of samples. In one potential formulation of the problem, one may aim to cover all nodes $v_{i,j}$ in all $m$ input sample networks $G_i$ with the smallest number of subnetworks $T = (V_T, E_T) \in \mathcal{T}$, each shared by at least one sample network. We refer to this combinatorial optimization problem as the Minimum Subgraph Cover Problem for (Node) Colored Interaction Networks (MSC-NCI).

One advantage of the MSC-NCI problem is that it is parameter-free. However, in a realistic multi-omics cancer dataset, the number of genes far exceeds the number of samples represented. Under such conditions, the solution to the MSC-NCI problem will primarily include subnetworks that are large connected components shared by only one sample network. To account for this situation, we introduce the following parameters/constraints akin to those for the MCSI formulation: (1) we require that the nodes in each subnetwork have the same color shared by at least $t$ samples (in the remainder of the discussion, $t$ is referred to as the depth of a subnetwork); and (2) we require that each subnetwork returned contains at most $k$ nodes. Note that this variant of the problem is infeasible for certain cohorts (consider a particular node which has a unique color for a particular sample; clearly requirement (1) cannot be satisfied if $t > 1$). Even if there is a feasible solution, the requirement that each subnetwork in $\mathcal{T}$ is of size at most $k$ makes the problem NP-hard (the reduction is from the problem of determining whether $G$ can be exactly partitioned into connected subnetworks, each with $k$ nodes [52]).
As a result, (3) we introduce one additional parameter, $l$, the maximum number of subnetworks (each of size at most $k$, and color-conserved in at least $t$ samples), with the objective of covering the maximum number of nodes across all samples. We call the problem of identifying at most $l$ subnetworks of size at most $k$, whose colors are conserved across at least $t$ samples, so as to maximize the total number of nodes in all these samples covered by these subnetworks, the Maximum Conserved Subnetwork Coverage problem (MCSC).

5.4.2 Algorithmic Framework for solving MCSC

We formulate the MCSC problem (as well as the MSC-NCI problem) as an ILP. A straightforward application of available ILP solvers can only handle relatively small instances of the MSC-NCI problem. This is because the number of variables and the number of constraints for the MSC-NCI ILP formulation are $O(n^2 m^2)$ and $O(n^2 m^3)$ respectively, both very large for a typical problem instance. Fortunately, in all instances of interest, only a limited number of genes are colored in comparison to the total number of nodes $nm$. This enables us to apply an exhaustive search method that is designed for association rule mining [3] to build a list of all candidate subnetworks exactly and efficiently (e.g. in comparison to the ILP or heuristic solutions in [18, 76]) and then solve the MCSC on the set of candidate subnetworks. (Footnote 1: Note that our exhaustive search method is an extension of the a-priori algorithm, with the difference that we require the candidate subnetworks to maintain connectivity as they grow.)

Generating Conserved Subnetworks

We generate the complete list of candidate subnetworks with minimum depth $t$ by the use of the “anti-monotone property” [103]: if any subnetwork $S$ has depth $< t$, then the depth of all of its supergraphs $S' \supset S$ must be $< t$. This makes it possible to grow the set $\mathcal{S}$ of valid subnetworks comprehensively but without repetition (described as the “optimal order of enumeration” in [113]) through the following breadth-first network growth strategy.

1. For every colored node $v_{i,j}$ and each of its colors $c_\ell$, we create a candidate subnetwork of size 1 containing the node with color $c_\ell$. All samples in which the node is colored $c_\ell$ naturally share this trivial subnetwork.

2. We inductively consider all candidate subnetworks of size $s$ with the goal of growing them to subnetworks of size $s+1$ as follows. For a given subnetwork $T$ of size $s$, consider each neighboring node $u$. For each possible color $c'_\ell$ of $u$, we create a new candidate subnetwork of size $s+1$ by extending $T$ with $u$, with color $c'_\ell$. We maintain this subnetwork for the next inductive step only if the number of samples sharing this new subnetwork is at least $t$; otherwise, we discard it.

During the extension of $T$ above, if the new node $u$ does not reduce the number of samples sharing it, $T$ becomes redundant and is not considered in the ILP formulation.

Solving MCSC

Given the universe $\mathcal{U} = \{ v_{i,j} \mid C_{i,j} \neq \emptyset,\ i = 1, \dots, m;\ j = 1, \dots, n \}$, containing all the coloured nodes in all the sample networks, and the collection of all subnetworks

$$\mathcal{S} = \{ T_i \mid T_i \text{ shared by at least } t \text{ samples and contains at most } k \text{ nodes} \},$$

our goal is to identify up to $l$ subnetworks from the collection $\mathcal{S}$ whose union contains the maximum possible number of elements of the universe $\mathcal{U}$.

After the list of all candidate subnetworks $\mathcal{S}$ is constructed (as described in the previous subsection), we represent the MCSC problem with the following ILP and solve it using IBM-CPLEX or Gurobi. A binary variable $C[i,j]$ corresponds to whether colored node $v_{i,j}$ was covered by at least one chosen subnetwork, and a binary variable $X[i]$ corresponds to whether colored candidate subnetwork $T_i$ was one of those chosen. Let $\mathcal{S}_{i,j}$ represent the set of all subnetworks of $\mathcal{S}$ which contain node $v_{i,j}$ properly colored in them.

$$\begin{aligned}
\text{Maximize} \quad & \sum_{v_{i,j} \in \mathcal{U}} C[i,j] \\
\text{s.t.} \quad & \sum_{T_p \in \mathcal{S}_{i,j}} X[p] \geq C[i,j] \quad (\forall v_{i,j} \in \mathcal{U}) \\
& \sum_{T_i \in \mathcal{S}} X[i] \leq l
\end{aligned}$$

Special Types of Conserved Subnetworks

In addition to the exactly-conserved colored subnetworks obtained through the general MCSC formulation, we also consider two important variants.

1. Colorful Conserved Subnetworks. A colorful subnetwork $T$ is one that has at least two distinct colors represented in the coloring of its nodes, i.e. $c_\ell, c_h \in \bigcap_{v_j \in T} C_j$ ($c_\ell \neq c_h$). In some of the datasets that we analyzed, certain colors were dominant in the input to such an extent that all subnetworks identified by our method had all nodes colored the same. By restricting focus to colorful subnetworks, it is possible, e.g., to capture conserved patterns of potential driver alterations and their impact on their vicinity in the interaction network, in the form of expression alterations. In order to identify the maximal colorful conserved subnetwork of a given depth $t$ in the tumor samples, we only need to keep track of the colorful subnetworks in each iteration, since any colorful network must contain a connected colorful subnetwork.

2. Subnetworks Conserved within error rate $\delta$. In order to reduce the sensitivity of our method to noise (or lack of precision in generating the data) in the input when detecting conserved patterns, we extend our formulation to allow some “errors” in identifying conserved subnetworks. We define $\delta$, the error rate of a colored subnetwork $T$, as the maximum allowable fraction of nodes of $T$ without an assigned color in any sample $P_i$ that shares $T$. For tolerating an error rate of $\delta$, we extend our algorithm to generate candidate subnetworks $\mathcal{S}$ for the MCSC problem by performing a post-processing step in which the list of samples sharing subnetwork $T$ is increased by including all samples that share $T$ with an error rate of $\delta$. (Note that our notion of error is restricted to nodes that do not have a color, i.e.
an observed alteration, in each specific sample.)

5.5 Results

5.5.1 Dataset Used

We obtained somatic mutation, copy number aberration and RNA-seq based gene-expression data for three distinct cancer types, glioblastoma multiforme (GBM) [175], breast adenocarcinoma (BRCA) [177], and colon adenocarcinoma (COAD) [170], from The Cancer Genome Atlas (TCGA) datasets. In addition, we distinguish four commonly observed molecular subtypes (i.e. Luminal A, Luminal B, Triple-negative/basal-like and HER2-enriched) within the BRCA cohort. For each sample, we obtained the list of genes which harbor somatic mutations, copy number aberrations, or are expression outliers as described below.

Somatic Mutations. All non-silent variant calls that were identified by at least one tool among MUSE, MuTect2, SomaticSniper and VarScan2 were considered.

Copy Number Aberrations. CNA segmented data from NCI-GDC were further processed using Nexus Copy Number Discovery Edition Version 9.0 (BioDiscovery, Inc., El Segundo, CA) to identify aberrant regions in the genome. We restricted our analysis to the most confident CNA calls, selecting only those genes with high copy gain or homozygous copy loss.

Expression outliers. We used HTSeq-FPKM-UQ normalized RNA-seq expression data, to which we applied the GESD test [144]. In particular, we used the GESD test to compare the transcriptome profile of each tumor sample (one at a time) with that from a number of available normal samples. For each gene, if the tumor sample was identified as the most extremely deviated sample (using critical value $\alpha = 0.1$), the corresponding gene was marked as an expression outlier for that tumor sample. This procedure was repeated for every tumor sample.
Finally, by comparing the tumor expression profile of these outlier genes to the normal samples, their up- or down-regulation expression patterns were determined.

5.5.2 Maximal Colored Subnetworks Across Cancer Types

We used cd-CAP to solve the maximum conserved colored subnetwork identification problem exactly in (each one of the four) protein-interaction network(s) on each cancer type, for every feasible value of network depth. As can be easily observed, the depth and the size of the identified subnetwork are inversely related. We say that a given value of the network depth is feasible if (i) the depth is at least 10% of the cohort size, (ii) the maximum network size for that depth is at least 3, and (iii) the number of “candidate” subnetworks is at most 2M per iteration when running cd-CAP for that depth.

The number of maximal solutions of cd-CAP as a function of network depth for each cancer type (COAD, GBM, BRCA Luminal A, and BRCA Luminal B) is shown in figure 5.2A-D on the STRING v10 PPI network with high confidence edges. In general, for a fixed network size, the number of distinct networks of that size decreases as the network depth increases. One can observe the “valleys” in the colored plots in figure 5.2A-D, which correspond to the largest depth that can be obtained for a given subnetwork size. Throughout the remainder of the paper we focus on the colored subnetworks of each given size for which the network depth is the maximum possible, which correspond to the valleys in the plots. If for a given subnetwork size and the corresponding maximal depth cd-CAP returns more than 1 subnetwork, we discard those solutions.

Most of the subnetworks identified for each of the four cancer types, especially those with large depth, consisted of expression outlier genes only, typically all upregulated or all downregulated (figure 5.2A-D). As the network depth decreases, maximal subnetworks that consist only of copy number variants emerge.
One of the most prominent copy-number gain subnetworks of the COAD dataset has depth 163 out of 463 patients in the cohort. This network forms the core of the larger maximal subnetworks cd-CAP identifies for lower depth values; it corresponds to a copy number gain of the chromosomal arm 20q, a known copy number aberration pattern highly specific to colorectal adenocarcinoma tumors [170].

Another subnetwork cd-CAP identified in 15% of the 422 BRCA Luminal A samples corresponds to a copy number gain on chromosome 1, which is again a known aberration associated with breast cancer [42]. With increasing depth, the maximal subnetworks cd-CAP identifies in the Luminal A cohort start to consist solely of expression outlier genes. In particular, cd-CAP identified a subnetwork of eight underexpressed genes with network depth 90 (Fig. 5.2E), including genes EGFR, PRKCA, SPRY2, and NRG2, known to be involved in EGFR/ERBB2/ERBB4 signaling pathways (Fig. 5.2F). EGFR is an important driver gene involved in progression of breast tumors to advanced forms [171] and its altered expression is observed in a number of breast cancer cases [42]. The subnetwork also included MET, another well-known oncogene [119], and is enriched for members of the Ras signaling pathway, which is also known for its role in oncogenesis and in mediating cancer phenotypes such as over-proliferation [57].

In order to test for the association between the subnetworks identified by cd-CAP and patient survival outcomes, we used a risk-score defined as a linear combination of the normalized gene-expression values of the genes in the subnetwork weighted by their estimated univariate Cox proportional-hazard regression coefficients (see Methods section for details). Based on the risk-score values, the patients covered by the subnetwork were stratified into two risk groups. The Luminal A subnetwork was the most significant among all subnetworks identified in this dataset (Fig. 5.2G).
The patients in the high-risk group have poor overall survival outcome, suggesting the clinical importance of the subnetwork identified by cd-CAP.

As another example, we identified a colored subnetwork of copy number gain genes that covered 163 patients in the COAD dataset (Fig. 5.2H). The genes in this subnetwork belong to the same chromosome locus 20q13, suggesting that they may comprise a single region of chromosomal amplification. Intriguingly, all the members also form a linear pathway-like structure at the PPI level. Among them is a group of functionally related genes consisting of transcription factors and their regulators (genes CEBPB, NCOA’s, UBE2’s), which are known to be involved in the intracellular receptor signaling pathway (Fig. 5.2I). CEBPB and UBE2’s are also involved in the regulation of cell cycle [82]. At the other end of the linear subnetwork, we found MMP9 and SDC4, established mediators of cancer invasion and apoptosis [30, 82]. We also confirmed that this set of genes is highly predictive of the patients’ survival outcome (Fig. 5.2J). These results support the functional importance and clinical relevance of the subnetwork we identified.

5.5.3 Maximal Colorful Subnetworks Across Cancer Types

We next used cd-CAP to solve the maximum conserved colored subnetwork identification problem with at least two distinct colors (see Section 5.4.2 for details), in each of the four protein-interaction network(s) and on each cancer type. Again, cd-CAP was run with every feasible value (as defined above) of network depth. The number of maximal solutions of cd-CAP as a function of network depth for each cancer type (COAD, GBM, BRCA Luminal A, and BRCA Luminal B) is shown in figure 5.3A-D on the STRING v10 PPI network with high confidence edges. Note that we distinguish here the maximal subnetworks with one or two sequence-level alterations (i.e.
somatic mutations and copy number alterations), which are of potential interest since their neighboring expression-level alterations are possibly caused by these sequence-level alterations (figure 5.3E provides an example), from all the other cases. Similarly, we only focus on the maximal colorful subnetworks of every possible size for which the network depth is the maximum possible, and discard the solutions when cd-CAP returns more than 1 colorful subnetwork for a feasible value of network depth.

One colorful COAD subnetwork of note is composed of overexpressed genes with an additional copy number gain gene, and covers 108 patients (Fig. 5.3E). This subnetwork is mainly enriched for genes involved in ribosome biogenesis (Fig. 5.3G). Cancer has long been known to have an increased demand on ribosome biogenesis [120], and increased ribosome generation has been reported to contribute to cancer development [131]. The biological relevance of this subnetwork is also supported by survival analysis, which shows a strong differentiation between the high-risk and low-risk groups (see figure 5.3F).

Another colorful subnetwork, which we observed in 58 BRCA Luminal A samples, consists of four copy number gained genes, an overexpressed gene, and two underexpressed genes, including EGFR (Fig. 5.3H). All copy-number gained genes and the overexpressed gene are located in chromosome 1q, whose gain is commonly reported in breast cancer [42]. The subnetwork involves an interesting combination of the down-regulation of the cancer gene EGFR and the amplification of a group of genes involved in T-cell receptor signaling (PTPRC, CD247, and ARPC5; see figure 5.3I). Thus we may surmise that the covered population of patients potentially has a relatively low cancer proliferation index with a higher anti-tumor immune response, which can be highly relevant indicators with regard to clinical outcome. Indeed, this subnetwork is significantly associated with patients’ survival (Fig.
5.3J).

5.5.4 Multiple-Subnetwork Analysis Across Cancer Types

We next sought to detect up to 5 subnetworks per cancer type that collectively cover the maximum possible number of colored nodes by solving the MCSC problem on the STRING v10.5 network (with experimentally validated edges). The subnetwork extension error rate was set to 20%, and we restricted the search space to subnetworks which do not consist only of expression outlier nodes, in order to obtain what we believe to be more biologically interesting results. Parameter $t$ was chosen for each dataset in a way that made it possible to construct all candidate subnetworks of maximum possible size while keeping the total number of candidate subnetworks below $2 \times 10^6$, making the problem solvable in a reasonable amount of time. We set $t$ to 69 (15% of the patients), 62 (10% of the patients), and 110 (10% of the patients) respectively for the COAD, GBM, and BRCA datasets. Table 5.1 shows the size, per sample depth and the coloring of the nodes in the resulting subnetworks.

We note that the subnetworks identified in the GBM dataset had the lowest depth (10-15% of the samples). The COAD and BRCA datasets on the other hand have much larger depth (respectively 30-48% and 15-32% of the samples). Smaller subnetworks of the GBM dataset solely consist of copy number gain genes on chromosome 7q, a known amplification in GBM [22]. The two large subnetworks each contain a single gene with copy number gain (SEC61G and EGFR, respectively) accompanied by several overexpressed genes. The BRCA dataset exhibits a similar pattern: each of the four large subnetworks contains a single copy number gain gene from chromosome 8q (NSMCE2 in one and MYC in the remaining three subnetworks). Subnetworks detected in the COAD dataset were much more colorful and recurrently conserved in a larger fraction of samples than those in the other datasets.
All genes with copy number gain are located in chromosome 20q.

We identified a subnetwork with 15 nodes (11 genes with copy number gain, 1 overexpressed and 3 underexpressed genes) in 149 COAD patients (Fig. 5.4A). All 11 copy number gain genes belong to chromosome 20q. IL6R, PLCG1, PTPN1, and HCK are involved in cytokine/interferon signaling to activate immune cells to counter proliferating tumor cells [160] (Fig. 5.4B). UBE2I, AURKA, and MAPRE1 are involved in cell cycle processes. This subnetwork was found to be associated with patients’ survival outcome (Fig. 5.4C).

We identified another subnetwork with 15 nodes (14 overexpressed and 1 copy number gain genes) in 313 breast cancer patients (Fig. 5.4D). Genes in this subnetwork are involved in cell cycle processes (Fig. 5.4E). In particular, the cell cycle checkpoint processes were dysregulated, which is known to drive tumor initiation processes [194]. The subnetwork was found to be associated with patients’ survival outcome (Fig. 5.4F), demonstrating its clinical relevance.

5.5.5 Empirical P-Value Estimates Confirm the Significance of cd-CAP Identified Networks

To evaluate the significance of cd-CAP’s findings, we performed the permutation test in Section 5.7.1 1000 times on each cancer type for each setting of subnetwork constraints. Figure 5.5 demonstrates the distribution of the empirical p-value estimates. (The lower bound results look similar to what is presented in the figure and thus are omitted.) In the permutation tests, all cd-CAP identified subnetworks (without additional constraints) of size 2-5 were composed solely of expression altered genes; in contrast, there are several larger CNV-rich subnetworks observed in the TCGA COAD data set and others, further confirming the significance of our findings.
Colorful subnetworks presented in Figure 5.3 are even less likely to occur at random (we therefore omit empirical p-value estimates for the networks in Figure 5.3).

5.6 Discussion

In this study, we introduce a novel combinatorial framework and an associated tool named cd-CAP which can identify (one or more) subnetworks of an interaction network where genes exhibit conserved alteration patterns across many tumor samples. Compared with the state-of-the-art methods (e.g. [6, 76]), cd-CAP differentiates alteration types associated with each gene (rather than relying on binary information of a gene being altered or not), and simultaneously detects multiple alteration type conserved subnetworks.

cd-CAP provides the user with two major options. (a) It computes the largest colored subnetwork that appears in at least $t$ samples. This option exhibits a significant speed advantage over available ILP-based approaches; its a-priori based algorithmic formulation allows flexible integration of special constraints (on maximal subnetworks), not only simplifying complicated ILP constraints, but also further reducing the number of candidate subnetworks in iteration steps (a good example of this is the “colorful conserved subnetworks” introduced in Section 5.4.2). However, the identified subnetworks are required to be conserved, i.e., each node only admits one alteration type among the samples sharing it (although we have relaxed constraints that allow each sample to have a few nodes without any alterations, i.e. colors). In the future, we may extend the definition of a network to include nodes with color mismatches (for example, according to the definition in [6] or [185]) with a modification to cd-CAP’s candidate subnetwork generation algorithm. (b) It solves the maximum conserved subnetwork cover (MCSC) problem to cover the maximum number of nodes in all samples with at most $l$ colored subnetworks ($l$ is user defined) via ILP.
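The coverage objective behind option (b) can be illustrated on a toy instance. The sketch below is not the ILP used by cd-CAP: it swaps in plain exhaustive enumeration over subsets of candidates, which is exact but only practical for a handful of candidates. The candidate names (`T1`, `T2`, `T3`), the covered (sample, gene) pairs, and the function name are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Node = Tuple[str, str]  # (sample, gene): one colored node v_{i,j}


def mcsc_bruteforce(candidates: Dict[str, Set[Node]],
                    l: int) -> Tuple[Tuple[str, ...], Set[Node]]:
    """Pick at most l candidates whose union covers the most colored nodes.

    Exhaustive enumeration, used here in place of an ILP solver.
    """
    names = sorted(candidates)
    best_names: Tuple[str, ...] = ()
    best_cover: Set[Node] = set()
    for r in range(1, min(l, len(names)) + 1):
        for combo in combinations(names, r):
            cover = set().union(*(candidates[n] for n in combo))
            # Strict '>' keeps the first optimum found when there are ties.
            if len(cover) > len(best_cover):
                best_names, best_cover = combo, cover
    return best_names, best_cover


# Each candidate subnetwork covers its genes in every sample that shares it.
candidates = {
    "T1": {("P1", "A"), ("P1", "B"), ("P2", "A"), ("P2", "B")},
    "T2": {("P2", "B"), ("P2", "C"), ("P3", "B"), ("P3", "C")},
    "T3": {("P1", "A"), ("P2", "A"), ("P3", "A")},
}
chosen, covered = mcsc_bruteforce(candidates, l=2)
```

With `l = 2` the pair `T1`, `T2` covers 7 distinct colored nodes; `T2`, `T3` ties at 7 but is enumerated later, so the first optimum is reported.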
In the future, we aim to refine the MCSC formulation with a reduced number of parameters and hope to develop exact or approximate solutions.

Subnetworks identified by cd-CAP in the COAD, GBM and BRCA datasets from TCGA are typically enriched with genes harboring gene-expression alterations or copy-number gain. Notably, we observed that genes in subnetworks with copy-number amplification are universally located in the same chromosomal locus. Many of these genes have known interactions and are functionally similar, demonstrating the ability of cd-CAP to capture functionally active subnetworks conserved across a large number of tumor samples. These subnetworks seem to overlap with pathways critical for oncogenesis. In the datasets analyzed, we observed cell cycle, apoptosis, RNA processing, and immune system processes that are known to be dysregulated in a large fraction of tumors. cd-CAP also captured subnetworks relevant to EGFR/ERBB2 signaling pathways, which have distinct expression patterns in specific subtypes of breast cancer [42, 133]. Survival analysis of cd-CAP identified subnetworks also confirmed their substantial clinical relevance.

5.7 Methods

5.7.1 Significance of the Identified Subnetworks

Under the assumption that each gene is altered independently, it is possible to apply the conventional permutation test [18, 89, 190] to assess the statistical significance of the subnetworks identified by cd-CAP as follows. Let $\mathcal{C}_i = \{(v_{i,j}, c) : c \in C_{i,j} \neq \emptyset, v_{i,j} \in V\}$ be a binary relation representing the existing colors on each node of sample network $G_i$. A permuted copy of the interaction network $G'_i = (V, E, C'_i)$ is generated (under the null hypothesis) by randomly shuffling the range of $\mathcal{C}_i$, such that each node $v_{i,j}$ takes a new set of colors $C'_{i,j}$ with the total number of colors $\sum_j |C_{i,j}|$ in $G_i$ preserved. (In other words, $\sum_j |C_{i,j}| = \sum_j |C'_{i,j}|$, and a simple implementation assigns $|C'_{i,j}|$ by random shuffling $(|C_{i,j}| : j = 1, 2, \dots, n)$.)
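One way to read the shuffling step is sketched below in Python. It is an illustration under an explicit assumption: whole color sets are moved between nodes of a single sample, which preserves both the total number of colors and the multiset of set sizes $|C_{i,j}|$. The function name, the node list, and the toy sample are hypothetical.

```python
import random
from typing import Dict, List, Set


def permute_sample(node_colors: Dict[str, Set[str]],
                   nodes: List[str],
                   rng: random.Random) -> Dict[str, Set[str]]:
    """Return a permuted copy of one sample network's coloring.

    Assumption: each node's whole color set (possibly empty) is reassigned
    to a randomly chosen node, so the total number of colors is unchanged.
    """
    color_sets = [set(node_colors.get(v, ())) for v in nodes]
    rng.shuffle(color_sets)
    # Keep only nodes that end up with at least one color.
    return {v: cs for v, cs in zip(nodes, color_sets) if cs}


nodes = ["A", "B", "C", "D", "E"]
sample = {"A": {"gain"}, "B": {"gain", "up"}, "D": {"mut"}}
permuted = permute_sample(sample, nodes, random.Random(0))
```

Passing an explicit `random.Random` instance keeps each permutation reproducible, which is convenient when the test is repeated many times.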
An entire set of permuted sample networks consists of one randomly generated $G'_i$ per sample, and this permutation test is repeated sufficiently many (by default 1000) times. For a particular size-$k$ subnetwork $T = (V_T, E_T)$ identified by cd-CAP (on $t$ samples), we define $P_1$ as the fraction of these permutation tests where any subnetwork of size at least $k$ appears in $t$ or more samples.

In fact, $P_1$ presents a lower bound on the p-value for $T$ since it ignores the inter-dependency of node colors (gene alteration events). In particular, whole chromosome or chromosome arm level copy number amplifications/deletions are commonly observed in various cancer types. To address this issue, we apply the following procedure to calculate $P_2$ as an empirical upper bound for the p-value of $T$, under the assumption that copy number alterations take place in whole chromosome arms. First we identify all genes $v_j \in V$ on the same chromosome arm, $\mathrm{chr}(v_j)$, and construct a set of supernodes $U^{\mathrm{chr}}_i = \{\mathrm{chr}(v_{i,j}) : \exists c, (v_{i,j}, c) \in \mathcal{C}_i\}$ from the genes on the same chromosome arm for each sample $P_i$. Let $N_E = |\{(v_{i,j}, E) \in \mathcal{C}_i\}|$ denote the number of nodes with color $E$ (corresponding to either a copy number gain or loss) in sample $P_i$. Then, each supernode is assigned the color $E$ independently with probability $N_E / |\mathcal{C}_i|$, which guarantees that the expected count of $E$ in $P_i$ is preserved. Finally we randomly assign the remaining colors to those nodes without a color assignment thus far, to obtain a new randomly permuted interaction network $G''_i = (V, E, C''_i)$ towards an empirical p-value (upper bound) estimate. We again repeat this process sufficiently many (by default 1000) times to generate distinct permuted datasets and derive $P_2$ by counting the fraction of these datasets where any subnetwork of size at least $k$ appears in $t$ or more samples.
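The supernode step of the $P_2$ procedure can be sketched as follows. This is one possible reading, not the cd-CAP implementation: it assumes that a supernode is a chromosome arm containing at least one colored gene of the sample, and that when an arm is selected every colored gene on it receives the event color. The final step (assigning the remaining colors to the still-uncolored nodes) is omitted. The function name, the `arm_of` mapping, and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set


def arm_level_assignment(node_colors: Dict[str, Set[str]],
                         arm_of: Dict[str, str],
                         event: str,
                         rng: random.Random) -> Dict[str, Set[str]]:
    """Assign `event` to whole chromosome arms of one sample.

    Each supernode (arm) receives `event` independently with probability
    N_E / |C_i|, where C_i is the set of (gene, color) pairs of the sample.
    """
    pairs = [(v, c) for v, cs in node_colors.items() for c in cs]
    n_event = sum(1 for _, c in pairs if c == event)      # N_E
    p = n_event / len(pairs) if pairs else 0.0

    # Supernodes: arms that contain at least one colored gene.
    arms: Dict[str, List[str]] = defaultdict(list)
    for gene, cs in node_colors.items():
        if cs:
            arms[arm_of[gene]].append(gene)

    assigned: Dict[str, Set[str]] = {}
    for arm in sorted(arms):
        if rng.random() < p:
            for gene in arms[arm]:
                assigned[gene] = {event}
    return assigned


arm_of = {"A": "20q", "B": "20q", "C": "1q"}
all_gain = {"A": {"gain"}, "B": {"gain"}, "C": {"gain"}}
no_gain = {"A": {"up"}, "C": {"mut"}}
every_arm = arm_level_assignment(all_gain, arm_of, "gain", random.Random(1))
no_arm = arm_level_assignment(no_gain, arm_of, "gain", random.Random(1))
```

The two toy samples are the deterministic extremes: when every color is the event the probability is 1 and all arms are selected, and when the event is absent no arm is selected.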
The true statistical significance is expected to be in the range $[P_1, P_2]$ provided that chromosome arms form the largest units of alteration.

5.7.2 Pathway enrichment analysis

The set of genes in the subnetwork was tested for enrichment against gene sets of pathways present in the Molecular Signature Database (MSigDB) v6.0 [162]. A hypergeometric test based gene set enrichment analysis [162] was used for this purpose. A cut-off threshold of false discovery rate (FDR) ≤ 0.01 was used to obtain the significantly enriched pathways.

5.7.3 Association of sub-networks with patients’ survival outcome

In order to assess the association of identified subnetworks with patients’ survival outcome, we used a risk-score based on the (weighted) aggregate expression of the genes in the subnetwork. The risk-score ($S$) of a patient is defined as the sum of the normalized gene-expression values in the subnetwork, each weighted by the estimated univariate Cox proportional-hazard regression coefficient [15], i.e., $S = \sum_{i=1}^{k} \beta_i x_{ij}$. Here $i$ and $j$ represent a gene and a patient respectively, $\beta_i$ is the coefficient of Cox regression for gene $i$, $x_{ij}$ is the normalized gene-expression of gene $i$ in patient $j$, and $k$ is the number of genes in the subnetwork. The normalized gene-expression values were fitted against overall survival time with living status as the censored event using univariate Cox proportional-hazard regression (exact method). Based on the risk-score values, patients were stratified into two groups: a low-risk group (patients with $S <$ mean of $S$) and a high-risk group (patients with $S \geq$ mean of $S$). Note that only those patients that are covered by the subnetwork are considered for the analysis above.

Figure 5.1: Schematic overview of cd-CAP. Multi-omics alteration profiles of a cohort of tumor samples are identified using appropriate bioinformatics tools. The alteration information is combined with gene-level information in the form of a sample-gene alteration matrix.
Each alteration type is assigned a distinct color. Using a (signaling) interaction network, cd-CAP identifies subnetworks with conserved alteration patterns.

Table 5.1: Five subnetworks identified by cd-CAP in multi-subnetwork mode for each cancer type: respective columns below depict the subnetwork size, depth, and the number of nodes in the subnetwork with copy number amplification (AMP), expression increase (EXP-UP) or decrease (EXP-DOWN).

Cancer  Network#  Size  Depth  AMP  EXP-UP  EXP-DOWN
COAD    1         6     206    1    5       0
COAD    2         11    152    6    5       0
COAD    3         12    137    7    3       2
COAD    4         15    149    11   1       3
COAD    5         15    223    2    10      3
GBM     1         4     72     4    0       0
GBM     2         4     69     4    0       0
GBM     3         9     67     9    0       0
GBM     4         16    70     1    15      0
GBM     5         36    96     1    32      3
BRCA    1         8     164    7    0       1
BRCA    2         10    332    1    9       0
BRCA    3         11    360    1    10      0
BRCA    4         15    313    1    14      0
BRCA    5         15    335    1    14      0

Figure 5.2: Conserved colored subnetworks. (A-D) Number of maximal solutions and the size of the conserved colored subnetwork obtained using the MCSI formulation, as a function of network depth t, in each of four cancer types analyzed, on the STRING v10 (with high confidence nodes) PPI network. The horizontal axis denotes the depth (number of patients) of the network. For the blue plot, the vertical axis denotes the maximum possible network size (in terms of the number of nodes) and thus it is monotonically non-increasing by definition. For the plots with different colors, the vertical axis denotes the number of distinct networks with network size equal to that indicated by the blue plot. (E-G) One of the 11 maximal colored subnetworks identified in the BRCA Luminal A dataset. (E) The colored subnetwork (with 8 nodes) topology. (F) Pathways dysregulated by alterations harboured by the genes in the subnetwork; these genes are involved in EGFR, ERBB2, and FGFR signaling pathways. (G) Kaplan-Meier plot showing the significant association of the subnetwork with patients’ clinical outcome. (H-J) One of the 10 maximal colored subnetworks identified in the COAD dataset. (H) The colored subnetwork (with 9 nodes) topology.
(I) Pathways dysregulated by the alterations harboured by the genes in the subnetwork; these genes are involved in signal transduction and apoptotic process. (J) Kaplan-Meier plot showing the significant association of the subnetwork with patients’ clinical outcome (73 High Risk vs 83 Low Risk patients).

Figure 5.3: Colorful maximal subnetworks. (A-D) Number of maximal solutions and the size of the conserved colorful subnetwork obtained using the MCSI formulation, as a function of network depth t, in each of four cancer types analyzed on the STRING v10 (high confidence edges) PPI network. The horizontal axis denotes the depth (number of patients) of the network. For the blue plot, the vertical axis denotes the maximum possible network size (in terms of the number of nodes) and thus it is monotonically non-increasing by definition. For the plots with different colors, the vertical axis denotes the number of distinct networks with network size equal to that indicated by the blue plot. (E-G) One of the maximal colorful subnetworks identified in the COAD dataset with depth 108 (patients). (E) The colored subnetwork (with 9 nodes) topology, obtained from the STRING v10 (with experimentally validated edges) PPI network. (F) Pathways dysregulated by alterations harboured by the genes in the subnetwork. (G) Kaplan-Meier plot showing the significant association of the subnetwork with patients’ clinical outcome (59 High Risk vs 47 Low Risk patients). (H-J) One of the maximal colorful subnetworks identified in the Luminal A dataset with no color restrictions, with depth of 58 (patients). (H) The colored subnetwork (with 8 nodes) topology, obtained in the REACTOME PPI network. (I) Pathways dysregulated by the alterations harboured by the genes in the subnetwork. (J) Kaplan-Meier plot showing the significant association of the subnetwork with patients’ clinical outcome (30 High Risk vs 30 Low Risk patients).

Figure 5.4: Multiple subnetwork analysis.
The two largest among the 15 subnetworks identified across the COAD, GBM and BRCA data sets (5 each) through the MCSC formulation of cd-CAP on the STRING v10.5 (with experimentally validated edges) PPI network. The number in parentheses next to each node represents the univariate Cox proportional-hazard regression coefficient estimated for that gene, used as its weight in the risk-score calculation to stratify the patients into two distinct risk groups (see section 5.7.3 for details). (A-C) The largest of the 5 COAD subnetworks with a network depth of 149 (patients). (A) The subnetwork topology (with 15 nodes). (B) Pathways dysregulated by alterations harboured by the genes in the subnetwork. (C) Kaplan-Meier plot showing the significant association of the subnetwork with patients’ clinical outcome (69 High Risk vs 78 Low Risk patients). (D-F) The largest of the 5 BRCA subnetworks with a network depth of 313 (patients). (D) The subnetwork topology (with 15 nodes). (E) Pathways dysregulated by the alterations harboured by the genes in the subnetwork. (F) Kaplan-Meier plot showing the significant association of the subnetwork with patients’ clinical outcome (33 High Risk vs 278 Low Risk patients).

Figure 5.5: Empirical p-value estimates for the maximum size subnetworks identified by cd-CAP. Compared with the subnetworks observed in real mutation profiles, those identified by cd-CAP in permutation tests (with identical t values) were much smaller, implying a p-value of < 0.001 for each of the colored subnetworks presented in Figure 5.2.

Chapter 6

Conclusion

In recent years, there has been an unprecedented increase in the multi-dimensional high-throughput data profiling (especially genome, transcriptome, proteome, and epigenome) of cancer patients. This has revealed extensive mutational heterogeneity in cancer (sub)types, yielding a long-tailed distribution of mutated genes across the patients and implying the existence of many rare/private driver genes.
Thus, there is a great need for computational methods to mine these massive datasets and prioritize clinically actionable driver events to aid treatment modalities using precision oncology.

The primary goal of this thesis was to develop novel computational algorithms to identify and prioritize cancer driver genes and provide insight into the heterogeneous biology to guide precision oncology. We introduced HIT’nDRIVE, a combinatorial algorithm to prioritize cancer driver genes. HIT’nDRIVE models the information flow connecting the genomic aberrations to the changes in global expression pattern in the transcriptome. HIT’nDRIVE measures the potential impact of genomic aberrations on changes in the global expression of other genes/proteins which are in close proximity in a gene/protein-interaction network. HIT’nDRIVE then prioritizes those aberrations with the highest impact as cancer driver genes. We formulated the driver prioritization problem as a “random-walk facility location” (RWFL) problem, which differs from the standard facility location problem by its use of “hitting time”, the expected number of hops to reach a “target” gene from a “source” gene, as a distance measure in an interaction network. HIT’nDRIVE uses “inverse” hitting time as a measure of influence of a source gene over a target gene to identify the subset of sequence-wise altered (source) genes whose overall influence over expression-altered (target) genes is the maximum possible.

We further demonstrated that HIT’nDRIVE accurately predicts patient-specific cancer driver genes. We also demonstrated that by using HIT’nDRIVE-identified driver genes and associated “network modules” (sub-networks seeded by driver genes whose aggregate expression profiles correlate well with the cancer phenotype) as features, it is possible to perform accurate phenotype classification.
We also demonstrated that these driver modules are associated with patients’ survival outcome and accurately predict drug efficacy in pan-cancer cell lines. Altogether, HIT’nDRIVE may help clinicians contextualize massive multi-omics data in therapeutic decision making, making widespread implementation of precision oncology possible.

In chapter 4, we described the first-in-field integrative multi-omics characterization of a cohort of malignant peritoneal mesothelioma (PeM). To our knowledge, this is the largest cohort of this rare tumor to be subjected to an integrative multi-omics analysis. We presented the integrated genome, transcriptome, and proteome landscape. BAP1 loss of function is known to be a key driver event of PeM. However, the downstream molecular and clinical significance of BAP1 loss had not been investigated in the context of PeM, and we show that it is predictive for immunotherapy. We found that BAP1 loss forms a distinct molecular subtype characterized by dysregulated gene expression patterns of chromatin remodeling, DNA repair pathways, and immune checkpoint receptor activation. We further demonstrated that this subtype is correlated with an inflammatory tumor microenvironment and is thus a candidate for immune checkpoint blockade therapies. Thus, BAP1 is a biomarker for PeM immunotherapy in 50% of the cases we studied. This is of critical importance because PeM is a rare and understudied cancer for which chemotherapy and targeted therapies have proven ineffective. BAP1 stratification may improve drug response rates in ongoing phase-I and II clinical trials exploring the use of immune checkpoint blockade therapies in PeM. In these trials, BAP1 status is not currently taken into account.

Further, we resolved the discordance between mRNA and protein expression patterns in this cohort, and this may apply to other studies incorporating mass spectrometry.
The discordance between mRNA and protein levels was found to be due to multimeric protein complexes of chromatin remodeling genes, the majority of which are direct protein-interaction partners of BAP1. The discordance between the mRNA and the protein expression patterns is most likely due to the ubiquitination and degradation of proteins in these BAP1-regulated complexes to maintain functional stoichiometry.

Finally, in chapter 5, we introduced cd-CAP, a combinatorial algorithm to identify sub-networks with conserved molecular alteration patterns across a large subset of a tumor sample cohort. cd-CAP simultaneously identifies more than one subnetwork, and each gene within each subnetwork has labels specific to the alteration types it harbors. Notably, we demonstrated that many of the largest highly conserved subnetworks within a tumor type solely consist of genes that have been subject to copy number gain, typically located on the same chromosomal arm and thus likely a result of a single, large-scale amplification. We have also demonstrated that the subnetworks identified using cd-CAP are associated with patients’ survival outcome and hence are clinically important.

6.1 Future Perspective

Continuous development and validation of novel algorithms for identification and prioritization of cancer driver genes, especially rare driver genes, will be essential given the exponential growth of sequenced tumors. Many studies over the past decade have focused on driver genomic aberrations in the protein-coding regions of genes. Driver genes harbouring aberrations in the non-coding regions are emerging. With the rise of multi-omics data profiled for a given tumor, efficient computational algorithms to integrate meaningful information from these multi-omics data together with curated knowledge of signaling pathways/networks will be necessary.
Inclusion of epigenome (DNA methylation) and 3D genome interaction (Hi-C) data together with genome, transcriptome, and proteome data would be necessary to understand cancer initiation and progression. Furthermore, I believe that as the regulatory interaction network covering the non-coding genome becomes available in the near future, it will trigger the next wave of algorithms combining the different types of data mentioned above.

Inference of sub-clonal population structure and identification of sub-clonal driver genes is another avenue which is necessary for correct identification of driver genes. However, the shallow sequencing depth of the available tumor whole genome sequences has remained a bottleneck in correctly estimating the sub-clonal population structure of tumor samples. Thus, I believe that as the cost of high-throughput sequencing shrinks further and ultra-high coverage genomes become more common, efficient computational algorithms will be able to correctly identify sub-clonal driver genes, providing further insight into tumor evolution-guided clinically actionable targets.

In the recent past, single-cell sequencing technology (single-cell DNA-seq, RNA-seq, and methylation profiling) has surged as a promising technique to study molecular changes at single-cell resolution. I believe that advances in the development of computational tools to analyze single-cell sequencing data for resolving intra-tumor heterogeneity, spatial heterogeneity, and reconstructing sub-clonal population structure in tumors will provide new insights in oncology research.

On the other hand, deep neural networks (also known as deep learning) have been recognized as an efficient approach for learning the functional relationships between different types of related data. Although in its infancy, one deep neural network based method, IBM Watson Oncology (https://www.ibm.com/watson/), is being tested for its utility in cancer therapeutics across research centres around the world. Similarly,
Similarly,Google’s DeepMind Health (https://deepmind.com) is being tested to mine patients’ medical reports topredict appropriate treatment in the UK. I believe, combining the algorithmic approaches described inthis thesis together with deep neural network approach would help such computational tools to becomemore powerful and robust which critical for precision oncology.98Bibliography[1] AACR Project GENIE Consortium. AACR Project GENIE: Powering Precision Medicine through an InternationalConsortium. Cancer discovery, 7(8):818–831, 2017. ISSN 2159-8290. doi:10.1158/2159-8290.CD-17-0151. URLhttp://www.ncbi.nlm.nih.gov/pubmed/28572459. → pages 53[2] I. Adzhubei, S. Schmidt, L. Peshkin, V. E. Ramensky, A. Gerasimova, P. Bork, A. S. Kondrashov, and S. R. Sunyaev. Amethod and server for predicting damaging missense mutations. Nature methods, 7(4):248–249, 2010. ISSN1548-7091. doi:10.1038/nmeth0410-248. URL https://www.ncbi.nlm.nih.gov/pubmed/20354512. → pages 3[3] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of the 20thInternational Conference on Very Large Data Bases, VLDB ’94, pages 487–499, San Francisco, CA, USA, 1994.Morgan Kaufmann Publishers Inc. ISBN 1-55860-153-8. → pages 77, 80[4] U. D. Akavia, O. Litvin, J. Kim, F. Sanchez-Garcia, D. Kotliar, et al. An integrated approach to uncover drivers ofcancer. Cell, 143(6):1005–17, Dec. 2010. ISSN 1097-4172. doi:10.1016/j.cell.2010.11.013. → pages 4[5] H. Alakus, S. E. Yost, B. Woo, R. French, G. Y. Lin, K. Jepsen, K. A. Frazer, A. M. Lowy, and O. Harismendy. BAP1mutation is a frequent somatic event in peritoneal malignant mesothelioma. Journal of translational medicine, 13(1):122, 2015. ISSN 1479-5876. doi:10.1186/s12967-015-0485-1. URL https://www.ncbi.nlm.nih.gov/pubmed/25889843.→ pages 51, 53[6] N. Alcaraz, T. Friedrich, T. Ko¨tzing, A. Krohmer, J. Mu¨ller, J. Pauling, and J. Baumbach. Efficient key pathwaymining: combining networks and OMICS data. 
Integrative biology: quantitative biosciences from nano to macro, 4(7):756–64, Jul 2012. ISSN 1757-9708. doi:10.1039/c2ib00133k. URL http://www.ncbi.nlm.nih.gov/pubmed/22353882. → pages 76, 87
[7] L. B. Alexandrov, S. Nik-Zainal, D. C. Wedge, S. A. J. R. Aparicio, S. Behjati, A. V. Biankin, G. R. Bignell, N. Bolli, A. Borg, A.-L. Børresen-Dale, S. Boyault, B. Burkhardt, A. P. Butler, C. Caldas, H. R. Davies, C. Desmedt, R. Eils, J. E. Eyfjörd, J. A. Foekens, M. Greaves, F. Hosoda, B. Hutter, T. Ilicic, S. Imbeaud, M. Imielinski, M. Imielinsk, N. Jäger, D. T. W. Jones, D. Jones, S. Knappskog, M. Kool, S. R. Lakhani, C. López-Otín, S. Martin, N. C. Munshi, H. Nakamura, P. A. Northcott, M. Pajic, E. Papaemmanuil, A. Paradiso, J. V. Pearson, X. S. Puente, K. Raine, M. Ramakrishna, A. L. Richardson, J. Richter, P. Rosenstiel, M. Schlesner, T. N. Schumacher, P. N. Span, J. W. Teague, Y. Totoki, A. N. J. Tutt, R. Valdés-Mas, M. M. van Buuren, L. van ’t Veer, A. Vincent-Salomon, N. Waddell, L. R. Yates, Australian Pancreatic Cancer Genome Initiative, ICGC Breast Cancer Consortium, ICGC MMML-Seq Consortium, ICGC PedBrain, J. Zucman-Rossi, P. A. Futreal, U. McDermott, P. Lichter, M. Meyerson, S. M. Grimmond, R. Siebert, E. Campo, T. Shibata, S. M. Pfister, P. J. Campbell, and M. R. Stratton. Signatures of mutational processes in human cancer. Nature, 500(7463):415–21, Aug 2013. ISSN 1476-4687. doi:10.1038/nature12477. URL https://www.ncbi.nlm.nih.gov/pubmed/23945592. → pages 52, 75
[8] L. B. Alexandrov, P. H. Jones, D. C. Wedge, J. E. Sale, P. J. Campbell, S. Nik-Zainal, and M. R. Stratton. Clock-like mutational processes in human somatic cells. Nature Genetics, 47(12):1402–1407, 2015. ISSN 15461718. doi:10.1038/ng.3441. URL https://www.ncbi.nlm.nih.gov/pubmed/26551669. → pages 67
[9] E. W. Alley, J. Lopez, A. Santoro, A. Morosky, S. Saraf, B. Piperdi, and E. van Brummelen. Clinical safety and activity of pembrolizumab in patients with malignant pleural mesothelioma (KEYNOTE-028): preliminary results from a non-randomised, open-label, phase 1b trial. The Lancet Oncology, 18(5):623–630, 2017. ISSN 14745488. doi:10.1016/S1470-2045(17)30169-9. URL https://www.ncbi.nlm.nih.gov/pubmed/28291584. → pages 61
[10] S. Anders and W. Huber. Differential expression analysis for sequence count data. Genome biology, 11(10):R106, 2010. ISSN 1474-760X. doi:10.1186/gb-2010-11-10-r106. URL http://www.ncbi.nlm.nih.gov/pubmed/209796212. → pages 65
[11] S. Anders, P. T. Pyl, and W. Huber. HTSeq-A Python framework to work with high-throughput sequencing data. Bioinformatics, 31(2):166–169, 2015. ISSN 14602059. doi:10.1093/bioinformatics/btu638. URL https://www.ncbi.nlm.nih.gov/pubmed/25260700. → pages 65
[12] M. H. Bailey, C. Tokheim, E. Porta-Pardo, S. Sengupta, D. Bertrand, A. Weerasinghe, A. Colaprico, M. C. Wendl, J. Kim, B. Reardon, P. K.-S. Ng, K. J. Jeong, S. Cao, Z. Wang, J. Gao, Q. Gao, F. Wang, E. M. Liu, L. Mularoni, C. Rubio-Perez, N. Nagarajan, I. Cortés-Ciriano, D. C. Zhou, W.-W. Liang, J. M. Hess, V. D. Yellapantula, D. Tamborero, A. Gonzalez-Perez, C. Suphavilai, J. Y. Ko, E. Khurana, P. J. Park, E. M. Van Allen, H. Liang, MC3 Working Group, Cancer Genome Atlas Research Network, M. S. Lawrence, A. Godzik, N. Lopez-Bigas, J. Stuart, D. Wheeler, G. Getz, K. Chen, A. J. Lazar, G. B. Mills, R. Karchin, and L. Ding. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell, 173(2):371–385.e18, Apr 2018. ISSN 1097-4172. doi:10.1016/j.cell.2018.02.060. URL http://www.ncbi.nlm.nih.gov/pubmed/29625053. → pages 34, 75
[13] C. E. Barbieri, S. C. Baca, M. S. Lawrence, F. Demichelis, M. Blattner, J.-P. Theurillat, T. A. White, P. Stojanov, E. Van Allen, N. Stransky, E. Nickerson, S.-S. Chae, G. Boysen, D. Auclair, R. C. Onofrio, K. Park, N. Kitabayashi, T. Y. MacDonald, K. Sheikh, T. Vuong, C. Guiducci, K. Cibulskis, A. Sivachenko, S. L. Carter, G. Saksena, D. Voet, W. M. Hussain, A. H. Ramos, W. Winckler, M. C. Redman, K. Ardlie, A. K. Tewari, J. M. Mosquera, N. Rupp, P. J. Wild, H. Moch, C. Morrissey, P. S. Nelson, P. W. Kantoff, S. B. Gabriel, T. R. Golub, M. Meyerson, E. S. Lander, G. Getz, M. A. Rubin, and L. A. Garraway. Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nature genetics, 44(6):685–9, Jun 2012. ISSN 1546-1718. doi:10.1038/ng.2279. URL https://www.ncbi.nlm.nih.gov/pubmed/22610119. → pages 36
[14] A. Bashashati, G. Haffari, J. Ding, G. Ha, K. Lui, et al. DriverNet: uncovering the impact of somatic driver mutations on transcriptional networks in cancer. Genome biology, 13(12):R124, Dec 2012. ISSN 1465-6914. doi:10.1186/gb-2012-13-12-r124. → pages 6, 19
[15] D. G. Beer, S. L. R. Kardia, C.-C. Huang, T. J. Giordano, A. M. Levin, D. E. Misek, L. Lin, G. Chen, T. G. Gharib, D. G. Thomas, M. L. Lizyness, R. Kuick, S. Hayasaka, J. M. G. Taylor, M. D. Iannettoni, M. B. Orringer, and S. Hanash. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature medicine, 8(8):816–24, Aug 2002. ISSN 1078-8956. doi:10.1038/nm733. → pages 44, 89
[16] H. Beltran, K. Eng, J. M. Mosquera, A. Sigaras, A. Romanel, H. Rennert, M. Kossai, C. Pauli, B. Faltas, J. Fontugne, K. Park, J. Banfelder, D. Prandi, N. Madhukar, T. Zhang, J. Padilla, N. Greco, T. J. McNary, E. Herrscher, D. Wilkes, T. Y. MacDonald, H. Xue, V. Vacic, A.-K. Emde, D. Oschwald, A. Y. Tan, Z. Chen, C. Collins, M. E. Gleave, Y. Wang, D. Chakravarty, M. Schiffman, R. Kim, F. Campagne, B. D. Robinson, D. M. Nanus, S. T. Tagawa, J. Z. Xiang, A. Smogorzewska, F. Demichelis, D. S. Rickman, A. Sboner, O. Elemento, and M. A. Rubin. Whole-Exome Sequencing of Metastatic Cancer and Biomarkers of Treatment Response. JAMA Oncology, 10021, 2015. ISSN 2374-2437. doi:10.1001/jamaoncol.2015.1313. URL https://www.ncbi.nlm.nih.gov/pubmed/26181256. → pages 43
[17] G. Bianchini, J. M. Balko, I. A. Mayer, M. E. Sanders, and L. Gianni. Triple-negative breast cancer: challenges and opportunities of a heterogeneous disease. Nature reviews. Clinical oncology, May 2016. ISSN 1759-4782. doi:10.1038/nrclinonc.2016.66. URL http://www.ncbi.nlm.nih.gov/pubmed/27184417. → pages 39
[18] A. Bomersbach, M. Chiarandini, and F. Vandin. An Efficient Branch and Cut Algorithm to Find Frequently Mutated Subnetworks in Cancer. In M. Frith and C. N. Storm Pedersen, editors, Algorithms in Bioinformatics, pages 27–39, Cham, 2016. Springer International Publishing. ISBN 978-3-319-43681-4. doi:10.1007/978-3-319-43681-4_3. URL http://link.springer.com/10.1007/978-3-642-33122-0. → pages 76, 79, 80, 88
[19] M. Bott, M. Brevet, B. S. Taylor, S. Shimizu, T. Ito, L. Wang, J. Creaney, R. A. Lake, M. F. Zakowski, B. Reva, C. Sander, R. Delsite, S. Powell, Q. Zhou, R. Shen, A. Olshen, V. Rusch, and M. Ladanyi. The nuclear deubiquitinase BAP1 is commonly inactivated by somatic mutations and 3p21.1 losses in malignant pleural mesothelioma. Nature genetics, 43(7):668–672, 2011. ISSN 1061-4036. doi:10.1038/ng.855. URL https://www.ncbi.nlm.nih.gov/pubmed/21642991. → pages 53
[20] N. J. Bowen, L. D. Walker, L. V. Matyunina, S. Logani, K. A. Totten, B. B. Benigno, and J. F. McDonald. Gene expression profiling supports the hypothesis that human ovarian surface epithelia are multipotent and capable of serving as ovarian cancer initiating cells. BMC medical genomics, 2:71, 2009. ISSN 1755-8794. doi:10.1186/1755-8794-2-71. → pages 24
[21] S. E. Bowyer, A. D. Rao, M. Lyle, S. Sandhu, G. V. Long, G. A. McArthur, J. M. Raleigh, R. J. Hicks, and M. Millward. Activity of trametinib in K601E and L597Q BRAF mutation-positive metastatic melanoma. Melanoma research, 24(5):504–8, 2014. ISSN 1473-5636. doi:10.1097/CMR.0000000000000099. → pages 38
[22] C. W. Brennan, R. G. W. Verhaak, A. McKenna, B. Campos, H. Noushmehr, S. R. Salama, S. Zheng, D. Chakravarty, J. Z. Sanborn, S. H. Berman, R. Beroukhim, B. Bernard, C.-J. Wu, G. Genovese, I. Shmulevich, J. Barnholtz-Sloan, L. Zou, R. Vegesna, S. A. Shukla, G. Ciriello, W. K. Yung, W. Zhang, C. Sougnez, T. Mikkelsen, K. Aldape, D. D. Bigner, E. G. Van Meir, M. Prados, A. Sloan, K. L. Black, J. Eschbacher, G. Finocchiaro, W. Friedman, D. W. Andrews, A. Guha, M. Iacocca, B. P. O’Neill, G. Foltz, J. Myers, D. J. Weisenberger, R. Penny, R. Kucherlapati, C. M. Perou, D. N. Hayes, R. Gibbs, M. Marra, G. B. Mills, E. Lander, P. Spellman, R. Wilson, C. Sander, J. Weinstein, M. Meyerson, S. Gabriel, P. W. Laird, D. Haussler, G. Getz, L. Chin, and TCGA Research Network. The somatic genomic landscape of glioblastoma. Cell, 155(2):462–77, Oct 2013. ISSN 1097-4172. doi:10.1016/j.cell.2013.09.034. → pages 86
[23] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7):107–117, Apr 1998. ISSN 01697552. doi:10.1016/S0169-7552(98)00110-X. URL http://dx.doi.org/10.1016/S0169-7552(98)00110-X. → pages 6
[24] R. Bueno, E. W. Stawiski, L. D. Goldstein, S. Durinck, A. De Rienzo, Z. Modrusan, F. Gnad, T. T. Nguyen, B. S. Jaiswal, L. R. Chirieac, D. Sciaranghella, N. Dao, C. E. Gustafson, K. J. Munir, J. A. Hackney, A. Chaudhuri, R. Gupta, J. Guillory, K. Toy, C. Ha, Y.-J. Chen, J. Stinson, S. Chaudhuri, N. Zhang, T. D. Wu, D. J. Sugarbaker, F. J. de Sauvage, W. G. Richards, and S. Seshagiri. Comprehensive genomic analysis of malignant pleural mesothelioma identifies recurrent mutations, gene fusions and splicing alterations. Nature Genetics, 48(October 2015):1–13, 2016. ISSN 1061-4036. doi:10.1038/ng.3520. URL http://www.ncbi.nlm.nih.gov/pubmed/26928227. → pages 53, 55
[25] L. Calabrò, A. Morra, E. Fonsatti, O. Cutaia, G. Amato, D. Giannarelli, A. M. Di Giacomo, R. Danielli, M. Altomonte, L. Mutti, and M. Maio. Tremelimumab for patients with chemotherapy-resistant advanced malignant mesothelioma: An open-label, single-arm, phase 2 trial. The Lancet Oncology, 14(11):1104–1111, 2013.
ISSN 14702045.doi:10.1016/S1470-2045(13)70381-4. URL https://www.ncbi.nlm.nih.gov/pubmed/24035405. → pages 51, 61[26] L. Calabro`, A. Morra, E. Fonsatti, O. Cutaia, C. Fazio, D. Annesi, M. Lenoci, G. Amato, R. Danielli, M. Altomonte,D. Giannarelli, A. M. Di Giacomo, and M. Maio. Efficacy and safety of an intensified schedule of tremelimumab forchemotherapy-resistant malignant mesothelioma: An open-label, single-arm, phase 2 study. The Lancet RespiratoryMedicine, 3(4):301–309, 2015. ISSN 22132619. doi:10.1016/S2213-2600(15)00092-2. URLhttps://www.ncbi.nlm.nih.gov/pubmed/25819643. → pages 61101[27] L. Calabro`, A. Morra, D. Giannarelli, G. Amato, A. D’Incecco, A. Covre, A. Lewis, M. C. Rebelatto, R. Danielli,M. Altomonte, A. M. Di Giacomo, and M. Maio. Tremelimumab combined with durvalumab in patients withmesothelioma (NIBIT-MESO-1): an open-label, non-randomised, phase 2 study. The Lancet. Respiratory medicine,2600(18):1–10, may 2018. ISSN 2213-2619. doi:10.1016/S2213-2600(18)30151-6. URLhttp://www.ncbi.nlm.nih.gov/pubmed/29773326. → pages 51[28] P. J. Campbell. Cliques and Schisms of Cancer Genes. Cancer Cell, 32(2):129–130, 2017. ISSN 18783686.doi:10.1016/j.ccell.2017.07.009. URL http://dx.doi.org/10.1016/j.ccell.2017.07.009. → pages 76[29] H. Carter, S. Chen, L. Isik, S. Tyekucheva, V. E. Velculescu, K. W. Kinzler, B. Vogelstein, and R. Karchin.Cancer-specific high-throughput annotation of somatic mutations: Computational prediction of driver missensemutations. Cancer Research, 69(16):6660–6667, 2009. ISSN 00085472. doi:10.1158/0008-5472.CAN-09-1133. URLhttps://www.ncbi.nlm.nih.gov/pubmed/19654296. → pages 3[30] L. Carvallo, R. Mun˜oz, F. Bustos, N. Escobedo, H. Carrasco, G. Olivares, and J. Larraı´n. Non-canonical Wnt signalinginduces ubiquitination and degradation of Syndecan4. The Journal of biological chemistry, 285(38):29546–55, sep2010. ISSN 1083-351X. doi:10.1074/jbc.M110.155812. URL http://www.ncbi.nlm.nih.gov/pubmed/20639201. →pages 84[31] B. S. 
Carver, J. Tran, A. Gopalan, Z. Chen, S. Shaikh, A. Carracedo, A. Alimonti, C. Nardella, S. Varmeh, P. T.Scardino, C. Cordon-Cardo, W. Gerald, and P. P. Pandolfi. Aberrant ERG expression cooperates with loss of PTEN topromote cancer progression in the prostate. Nature Genetics, 41(5):619–624, 2009. ISSN 1061-4036.doi:10.1038/ng.370. URL https://www.ncbi.nlm.nih.gov/pubmed/19396168. → pages 75[32] E. Cerami, E. Demir, N. Schultz, B. S. Taylor, and C. Sander. Automated network analysis identifies core pathways inglioblastoma. PLoS ONE, 5(2), 2010. ISSN 19326203. doi:10.1371/journal.pone.0008918. → pages 4[33] F. Chen, Y. Zhang, Y. Senbabaoglu, G. Ciriello, L. Yang, E. Reznik, B. Shuch, G. Micevic, G. De Velasco, E. Shinbrot,M. S. Noble, Y. Lu, K. R. Covington, L. Xi, J. A. Drummond, D. Muzny, H. Kang, J. Lee, P. Tamboli, V. Reuter, C. S.Shelley, B. A. Kaipparettu, D. P. Bottaro, A. K. Godwin, R. A. Gibbs, G. Getz, R. Kucherlapati, P. J. Park, C. Sander,E. P. Henske, J. H. Zhou, D. J. Kwiatkowski, T. H. Ho, T. K. Choueiri, J. J. Hsieh, R. Akbani, G. B. Mills, A. A.Hakimi, D. A. Wheeler, and C. J. Creighton. Multilevel Genomics-Based Taxonomy of Renal Cell Carcinoma. CellReports, 14(10):2476–2489, 2016. ISSN 22111247. doi:10.1016/j.celrep.2016.02.024. URLhttps://www.ncbi.nlm.nih.gov/pubmed/26947078. → pages 60, 75[34] P. Chirac, D. Maillet, F. Lepreˆtre, S. Isaac, O. Glehen, M. Figeac, L. Villeneuve, J. Pe´ron, F. Gibson, F. Galateau-Salle´,F. N. Gilly, and M. Brevet. Genomic copy number alterations in 33 malignant peritoneal mesothelioma analyzed bycomparative genomic hybridization array. Human Pathology, 55:72–82, 2016. ISSN 15328392.doi:10.1016/j.humpath.2016.04.015. URL https://www.ncbi.nlm.nih.gov/pubmed/27184482. → pages 51[35] D.-Y. Cho, Y.-A. Kim, and T. M. Przytycka. Chapter 5: Network Biology Approach to Complex Diseases. PLoSComputational Biology, 8(12):e1002820, Dec. 2012. ISSN 1553-7358. doi:10.1371/journal.pcbi.1002820. 
URLhttp://dx.plos.org/10.1371/journal.pcbi.1002820. → pages 5[36] S. A. Chowdhury, S. E. Shackney, K. Heselmeyer-Haddad, T. Ried, A. A. Scha¨ffer, and R. Schwartz. Algorithms toModel Single Gene, Single Chromosome, and Whole Genome Copy Number Changes Jointly in Tumor Phylogenetics.PLoS Computational Biology, 10(7), 2014. ISSN 15537358. doi:10.1371/journal.pcbi.1003740. → pages 78[37] G. Ciriello, E. Cerami, C. Sander, and N. Schultz. Mutual exclusivity analysis identifies oncogenic network modules.Genome research, 22(2):398–406, Feb. 2012. ISSN 1549-5469. doi:10.1101/gr.125567.111. → pages 4, 76102[38] G. Ciriello, M. L. Miller, B. A. Aksoy, Y. Senbabaoglu, N. Schultz, and C. Sander. Emerging landscape of oncogenicsignatures across human cancers. Nature genetics, 45(10):1127–1133, sep 2013. ISSN 1546-1718.doi:10.1038/ng.2762. → pages 36[39] S. Condamin, O. Be´nichou, V. Tejedor, R. Voituriez, and J. Klafter. First-passage times in complex scale-invariantmedia. Nature, 450(7166):77–80, 2007. ISSN 0028-0836. doi:10.1038/nature06201. → pages 7, 12[40] G. Cormode, G. Cormode, M. Paterson, M. Paterson, S. Sahinalp, S. Sahinalp, U. Vishkin, and U. Vishkin.Communication complexity of document exchange. In Proceedings of the eleventh annual ACM-SIAM symposium onDiscrete algorithms, pages 197–206, Philadelphia, 2000. Society for Industrial and Applied Mathematics. ISBN0-89871-453-2. URL http://portal.acm.org/citation.cfm?id=338219.338252. → pages 78[41] L. Cowen, T. Ideker, B. J. Raphael, and R. Sharan. Network propagation: a universal amplifier of genetic associations.Nature reviews. Genetics, 18(9):551–562, 2017. ISSN 1471-0064. doi:10.1038/nrg.2017.38. URLhttp://www.ncbi.nlm.nih.gov/pubmed/28607512. → pages 5[42] C. Curtis, S. P. Shah, S.-F. Chin, G. Turashvili, O. M. Rueda, M. J. Dunning, D. Speed, A. G. Lynch, S. Samarajiwa,Y. Yuan, S. Gra¨f, G. Ha, G. Haffari, A. Bashashati, R. Russell, S. McKinney, A. Langerød, A. Green, E. Provenzano,G. Wishart, S. Pinder, P. 
Watson, F. Markowetz, L. Murphy, I. Ellis, A. Purushotham, A.-L. Børresen-Dale, J. D. Brenton, S. Tavaré, C. Caldas, and S. Aparicio. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature, 486(7403):346–52, jun 2012. ISSN 1476-4687. doi:10.1038/nature10983. URL http://www.ncbi.nlm.nih.gov/pubmed/22522925. → pages 24, 39, 78, 84, 85, 88
[43] K. B. Dahlman, J. Xia, K. Hutchinson, C. Ng, D. Hucks, P. Jia, M. Atefi, Z. Su, S. Branch, P. L. Lyle, D. J. Hicks, V. Bozon, J. A. Glaspy, N. Rosen, D. B. Solit, J. L. Netterville, C. L. Vnencak-Jones, J. A. Sosman, A. Ribas, Z. Zhao, and W. Pao. BRAF L597 mutations in melanoma are associated with sensitivity to MEK inhibitors. Cancer Discovery, 2(9):791–797, 2012. ISSN 21598274. doi:10.1158/2159-8290.CD-12-0097. → pages 38
[44] P. Dao, K. Wang, C. Collins, M. Ester, A. Lapuk, and S. C. Sahinalp. Optimally discriminative subnetwork markers predict response to chemotherapy. Bioinformatics, 27(13), Jul 2011. → pages 14
[45] P. Dao, Y.-A. Kim, D. Wojtowicz, S. Madan, R. Sharan, and T. M. Przytycka. BeWith: A Between-Within method to discover relationships between cancer modules via integrated analysis of mutual exclusivity, co-occurrence and functional interactions. PLoS computational biology, 13(10):e1005695, oct 2017. ISSN 1553-7358. doi:10.1371/journal.pcbi.1005695. → pages 76
[46] N. D. Dees, Q. Zhang, C. Kandoth, M. C. Wendl, W. Schierding, D. C. Koboldt, T. B. Mooney, M. B. Callaway, D. Dooling, E. R. Mardis, R. K. Wilson, and L. Ding. MuSiC: Identifying mutational significance in cancer genomes. Genome Research, 22(8):1589–1598, 2012. ISSN 10889051. doi:10.1101/gr.134635.111. URL https://www.ncbi.nlm.nih.gov/pubmed/22759861. → pages 2
[47] M. A. DePristo, E. Banks, R. Poplin, K. V. Garimella, J. R. Maguire, C. Hartl, A. A. Philippakis, G. del Angel, M. A. Rivas, M. Hanna, A. McKenna, T. J. Fennell, A. M. Kernytsky, A. Y. Sivachenko, K. Cibulskis, S. B. Gabriel, D. Altshuler, and M. J. Daly.
A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43(5):491–498, 2011. ISSN 1061-4036. doi:10.1038/ng.806. URL https://www.ncbi.nlm.nih.gov/pubmed/21478889. → pages 63
[48] J. Ding, M. K. McConechy, H. M. Horlings, G. Ha, F. Chun Chan, T. Funnell, S. C. Mullaly, J. Reimand, A. Bashashati, G. D. Bader, D. Huntsman, S. Aparicio, A. Condon, and S. P. Shah. Systematic analysis of somatic mutations impacting gene expression in 12 tumour types. Nature Communications, 6(1):8554, dec 2015. ISSN 2041-1723. doi:10.1038/ncomms9554. URL http://www.ncbi.nlm.nih.gov/pubmed/26436532. → pages 4
[49] L. Ding, T. J. Ley, D. E. Larson, C. A. Miller, D. C. Koboldt, et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature, 481(7382):506–10, Jan. 2012. ISSN 1476-4687. doi:10.1038/nature10738. URL https://www.ncbi.nlm.nih.gov/pubmed/22237025. → pages 3
[50] A. Dobin, C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson, and T. R. Gingeras. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics, 29(1):15–21, 2013. ISSN 13674803. doi:10.1093/bioinformatics/bts635. URL https://www.ncbi.nlm.nih.gov/pubmed/23104886. → pages 64
[51] B. Dutta, L. Pusztai, Y. Qi, F. André, V. Lazar, G. Bianchini, N. Ueno, R. Agarwal, B. Wang, C. Y. Shiang, G. N. Hortobagyi, G. B. Mills, W. F. Symmans, and G. Balázsi. A network-based, integrative study to identify core biological pathways that drive breast cancer clinical subtypes. British journal of cancer, 106(6):1107–16, mar 2012. ISSN 1532-1827. doi:10.1038/bjc.2011.584. URL http://www.ncbi.nlm.nih.gov/pubmed/22343619. → pages 39
[52] M. Dyer and A. Frieze. On the complexity of partitioning graphs into connected subgraphs. Discrete Applied Mathematics, 10(2):139–153, 1985. ISSN 0166-218X. doi:10.1016/0166-218X(85)90008-3. URL http://www.sciencedirect.com/science/article/pii/0166218X85900083. → pages 80
[53] M. El-Kebir and G. W.
Klau. Solving the Maximum-Weight Connected Subgraph Problem to Optimality. arXiv, pages 1–32, sep 2014. URL http://arxiv.org/abs/1409.5308. → pages 76
[54] M. El-Kebir, B. J. Raphael, R. Shamir, R. Sharan, S. Zaccaria, M. Zehavi, and R. Zeira. Copy-Number Evolution Problems: Complexity and Algorithms. In M. Frith and C. N. Storm Pedersen, editors, Algorithms in Bioinformatics, pages 137–149, Cham, 2016. Springer International Publishing. ISBN 978-3-319-43681-4. doi:10.1007/978-3-319-43681-4_11. URL http://link.springer.com/10.1007/978-3-642-33122-0. → pages 78
[55] A. Fabregat, K. Sidiropoulos, P. Garapati, M. Gillespie, K. Hausmann, R. Haw, B. Jassal, S. Jupe, F. Korninger, S. McKay, L. Matthews, B. May, M. Milacic, K. Rothfels, V. Shamovsky, M. Webber, J. Weiser, M. Williams, G. Wu, L. Stein, H. Hermjakob, and P. D’Eustachio. The Reactome pathway Knowledgebase. Nucleic acids research, 44(D1):D481–7, jan 2016. ISSN 1362-4962. doi:10.1093/nar/gkv1351. URL http://www.ncbi.nlm.nih.gov/pubmed/24243840. → pages 24
[56] D. A. Fennell, E. Kirkpatrick, K. Cozens, M. Nye, J. Lester, G. Hanna, N. Steele, P. Szlosarek, S. Danson, J. Lord, C. Ottensmeier, D. Barnes, S. Hill, M. Kalevras, T. Maishman, and G. Griffiths. CONFIRM: a double-blind, placebo-controlled phase III clinical trial investigating the effect of nivolumab in patients with relapsed mesothelioma: study protocol for a randomised controlled trial. Trials, 19(1):233, apr 2018. ISSN 1745-6215. doi:10.1186/s13063-018-2602-y. URL http://www.ncbi.nlm.nih.gov/pubmed/29669604. → pages 51
[57] A. Fernández-Medarde and E. Santos. Ras in cancer and developmental diseases. Genes & cancer, 2(3):344–58, mar 2011. ISSN 1947-6027. doi:10.1177/1947601911411084. URL http://www.ncbi.nlm.nih.gov/pubmed/21779504. → pages 84
[58] S. A. Forbes, D. Beare, H. Boutselakis, S. Bamford, N. Bindal, J. Tate, C. G. Cole, S. Ward, E. Dawson, L. Ponting, R. Stefancsik, B. Harsha, C. YinKok, M. Jia, H. Jubb, Z. Sondka, S. Thompson, T. De, and P. J. Campbell.
COSMIC: Somatic cancer genetics at high-resolution. Nucleic Acids Research, 45(D1):D777–D783, 2017. ISSN 13624962. doi:10.1093/nar/gkw1121. → pages 35, 52
[59] P. A. Futreal, L. Coin, M. Marshall, T. Down, T. Hubbard, R. Wooster, N. Rahman, and M. R. Stratton. A census of human cancer genes. Nature reviews. Cancer, 4(3):177–83, mar 2004. ISSN 1474-175X. doi:10.1038/nrc1299. → pages 16, 34, 35
[60] G. Germano, S. Lamba, G. Rospo, L. Barault, A. Magrì, F. Maione, M. Russo, G. Crisafulli, A. Bartolini, G. Lerda, G. Siravegna, B. Mussolin, R. Frapolli, M. Montone, F. Morano, F. de Braud, N. Amirouchene-Angelozzi, S. Marsoni, M. D’Incalci, A. Orlandi, E. Giraudo, A. Sartore-Bianchi, S. Siena, F. Pietrantonio, F. Di Nicolantonio, and A. Bardelli. Inactivation of DNA repair triggers neoantigen generation and impairs tumour growth. Nature, 2017. ISSN 0028-0836. doi:10.1038/nature24673. URL https://www.ncbi.nlm.nih.gov/pubmed/29186113. → pages 60
[61] E. E. Gill, L. S. Chan, G. L. Winsor, N. Dobson, R. Lo, S. J. Ho Sui, B. K. Dhillon, P. K. Taylor, R. Shrestha, C. Spencer, R. E. W. Hancock, P. J. Unrau, and F. S. L. Brinkman. High-throughput detection of RNA processing in bacteria. BMC genomics, 19(1):223, 2018. ISSN 1471-2164. doi:10.1186/s12864-018-4538-8. URL http://www.ncbi.nlm.nih.gov/pubmed/29587634. → pages 9
[62] E. Gonçalves, A. Fragoulis, L. Garcia-Alonso, T. Cramer, J. Saez-Rodriguez, and P. Beltrao. Widespread Post-transcriptional Attenuation of Genomic Copy-Number Variation in Cancer. Cell Systems, 0(0):1–13, 2017. ISSN 24054712. doi:10.1016/j.cels.2017.08.013. URL https://www.ncbi.nlm.nih.gov/pubmed/29032074. → pages 57, 58
[63] A. González-Pérez and N. López-Bigas. Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel. American Journal of Human Genetics, 88(4):440–449, 2011. ISSN 00029297. doi:10.1016/j.ajhg.2011.03.004. URL https://www.ncbi.nlm.nih.gov/pubmed/21457909. → pages 3
[64] A. Gonzalez-Perez and N.
Lopez-Bigas. Functional impact bias reveals cancer drivers. Nucleic Acids Research, 40(21):1–10, 2012. ISSN 03051048. doi:10.1093/nar/gks743. URL https://www.ncbi.nlm.nih.gov/pubmed/22904074. → pages 3
[65] A. Gonzalez-Perez, J. Deu-Pons, and N. Lopez-Bigas. Improving the prediction of the functional impact of cancer mutations by baseline tolerance transformation. Genome medicine, 4(11):89, 2012. ISSN 1756-994X. doi:10.1186/gm390. URL https://www.ncbi.nlm.nih.gov/pubmed/23181723. → pages 3
[66] C. S. Grasso, Y.-M. Wu, D. R. Robinson, X. Cao, S. M. Dhanasekaran, A. P. Khan, M. J. Quist, X. Jing, R. J. Lonigro, J. C. Brenner, I. A. Asangani, B. Ateeq, S. Y. Chun, J. Siddiqui, L. Sam, M. Anstett, R. Mehra, J. R. Prensner, N. Palanisamy, G. A. Ryslik, F. Vandin, B. J. Raphael, L. P. Kunju, D. R. Rhodes, K. J. Pienta, A. M. Chinnaiyan, and S. A. Tomlins. The mutational landscape of lethal castration-resistant prostate cancer. Nature, 487(7406):239–43, jul 2012. ISSN 1476-4687. doi:10.1038/nature11125. URL https://www.ncbi.nlm.nih.gov/pubmed/22722839. → pages 24
[67] M. Greaves and C. C. Maley. Clonal evolution in cancer. Nature, 481(7381):306–13, Jan. 2012. ISSN 1476-4687. doi:10.1038/nature10762. URL https://www.ncbi.nlm.nih.gov/pubmed/22258609. → pages 1, 3
[68] C. Greenman, R. Wooster, P. A. Futreal, M. R. Stratton, and D. F. Easton. Statistical analysis of pathogenicity of somatic mutations in cancer. Genetics, 173(4):2187–98, Aug. 2006. ISSN 0016-6731. doi:10.1534/genetics.105.044677. URL https://www.ncbi.nlm.nih.gov/pubmed/16783027. → pages 2
[69] C. Greenman, P. Stephens, R. Smith, G. L. Dalgliesh, C. Hunter, et al. Patterns of somatic mutation in human cancer genomes. Nature, 446(7132):153–8, Mar. 2007. ISSN 1476-4687. doi:10.1038/nature05610. URL https://www.ncbi.nlm.nih.gov/pubmed/17344846. → pages 1, 10
[70] M. Griffith, O. L. Griffith, A. C. Coffman, J. V. Weible, J. F. McMichael, N. C. Spies, J. Koval, I. Das, M. B. Callaway, J. M. Eldred, C. A. Miller, J. Subramanian, R.
Govindan, R. D. Kumar, R. Bose, L. Ding, J. R. Walker, D. E. Larson, D. J. Dooling, S. M. Smith, T. J. Ley, E. R. Mardis, and R. K. Wilson. DGIdb: mining the druggable genome. Nature methods, 10(12):1209–10, 2013. ISSN 1548-7105. doi:10.1038/nmeth.2689. → pages 35
[71] A. Gupta, M. M. Hossain, N. Miller, M. Kerin, G. Callagy, and S. Gupta. NCOA3 coactivator is a transcriptional target of XBP1 and regulates PERK-eIF2α-ATF4 signalling in breast cancer. Oncogene, 35(October 2015):1–12, apr 2016. ISSN 1476-5594. doi:10.1038/onc.2016.121. URL http://www.ncbi.nlm.nih.gov/pubmed/27109102. → pages 40
[72] D. Hanahan and R. A. Weinberg. Hallmarks of cancer: the next generation. Cell, 144(5):646–74, mar 2011. ISSN 1097-4172. doi:10.1016/j.cell.2011.02.013. URL http://www.ncbi.nlm.nih.gov/pubmed/21376230. → pages 1
[73] E. Hodzic, R. Shrestha, K. Zhu, K. Cheng, C. C. Collins, and S. C. Sahinalp. Combinatorial detection of conserved alteration patterns for identifying cancer subnetworks. bioRxiv, 2018. doi:10.1101/369850. URL https://doi.org/10.1101/369850. → pages
[74] J. Hopcroft and D. Sheldon. Manipulation-resistant reputations using hitting time. In Algorithms and Models for the Web-Graph, pages 68–81. Springer, 2007. → pages 22
[75] F. Hormozdiari, C. Alkan, E. E. Eichler, and S. C. Sahinalp. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome research, 19(7):1270–1278, July 2009. → pages 13
[76] B. H. Hristov and M. Singh. Network-based coverage of mutational profiles reveals cancer genes. Cell Systems, 5(3):221–229.e4, 2017. ISSN 16113349. doi:10.1016/j.cels.2017.09.003. URL http://arxiv.org/abs/1704.08544. → pages 76, 79, 80, 87
[77] X. Hua, H. Xu, Y. Yang, J. Zhu, P. Liu, and Y. Lu. DrGaP: A powerful tool for identifying driver genes and pathways in cancer sequencing studies. American Journal of Human Genetics, 93(3):439–451, 2013. ISSN 00029297. doi:10.1016/j.ajhg.2013.07.003. URL https://www.ncbi.nlm.nih.gov/pubmed/23954162.
→ pages 2
[78] C. S. Hughes, S. Foehr, D. A. Garfield, E. E. Furlong, L. M. Steinmetz, and J. Krijgsveld. Ultrasensitive proteome analysis using paramagnetic bead technology. Molecular Systems Biology, 10(10):757–757, 2014. ISSN 1744-4292. doi:10.15252/msb.20145625. URL http://www.ncbi.nlm.nih.gov/pubmed/25358341. → pages 65
[79] C. S. Hughes, M. K. McConechy, D. R. Cochrane, T. Nazeran, A. N. Karnezis, D. G. Huntsman, and G. B. Morin. Quantitative Profiling of Single Formalin Fixed Tumour Sections: proteomics for translational research. Scientific Reports, 6(1):34949, 2016. ISSN 2045-2322. doi:10.1038/srep34949. URL http://www.ncbi.nlm.nih.gov/pubmed/27713570. → pages 65, 66
[80] F. Iorio, T. A. Knijnenburg, D. J. Vis, G. R. Bignell, M. P. Menden, M. Schubert, N. Aben, E. Gonçalves, S. Barthorpe, H. Lightfoot, T. Cokelaer, P. Greninger, E. van Dyk, H. Chang, H. de Silva, H. Heyn, X. Deng, R. K. Egan, Q. Liu, T. Mironenko, X. Mitropoulos, L. Richardson, J. Wang, T. Zhang, S. Moran, S. Sayols, M. Soleimani, D. Tamborero, N. Lopez-Bigas, P. Ross-Macdonald, M. Esteller, N. S. Gray, D. A. Haber, M. R. Stratton, C. H. Benes, L. F. A. Wessels, J. Saez-Rodriguez, U. McDermott, and M. J. Garnett. A Landscape of Pharmacogenomic Interactions in Cancer. Cell, 166(3):740–54, jul 2016. ISSN 1097-4172. doi:10.1016/j.cell.2016.06.017. URL http://www.ncbi.nlm.nih.gov/pubmed/27397505. → pages 41, 44
[81] I. H. Ismail, R. Davidson, J.-P. Gagné, Z. Z. Xu, G. G. Poirier, and M. J. Hendzel. Germline mutations in BAP1 impair its function in DNA double-strand break repair. Cancer research, 74(16):4282–94, aug 2014. ISSN 1538-7445. doi:10.1158/0008-5472.CAN-13-3109. URL http://www.ncbi.nlm.nih.gov/pubmed/24894717. → pages 60
[82] P. F. Johnson. Molecular stop signs: regulation of cell-cycle arrest by C/EBP transcription factors. Journal of cell science, 118(Pt 12):2545–55, jun 2005. ISSN 0021-9533. doi:10.1242/jcs.02459. URL http://www.ncbi.nlm.nih.gov/pubmed/15944395. → pages 84
[83] N. M. Joseph, Y.-Y.
Chen, A. Nasr, I. Yeh, E. Talevich, C. Onodera, B. C. Bastian, J. T. Rabban, K. Garg, C. Zaloudek, and D. A. Solomon. Genomic profiling of malignant peritoneal mesothelioma reveals recurrent alterations in epigenetic regulatory genes BAP1, SETD2, and DDX3X. Modern pathology: an official journal of the United States and Canadian Academy of Pathology, Inc, 30(2):246–254, 2017. ISSN 1530-0285. doi:10.1038/modpathol.2016.188. URL http://www.ncbi.nlm.nih.gov/pubmed/27813512. → pages 51
[84] C. Kadoch and G. R. Crabtree. Mammalian SWI/SNF chromatin remodeling complexes and cancer: Mechanistic insights gained from human genomics. Science Advances, 1(5):e1500447–e1500447, 2015. ISSN 2375-2548. doi:10.1126/sciadv.1500447. URL http://www.ncbi.nlm.nih.gov/pubmed/26601204. → pages 60
[85] S. Kato, B. N. Tomson, T. P. H. Buys, S. K. Elkin, J. L. Carter, and R. Kurzrock. Genomic Landscape of Malignant Mesotheliomas. Molecular Cancer Therapeutics, 15(10):2498–2507, 2016. ISSN 1535-7163. doi:10.1158/1535-7163.MCT-16-0229. URL https://www.ncbi.nlm.nih.gov/pubmed/27507853. → pages 51, 53
[86] E. Khurana, Y. Fu, V. Colonna, X. J. Mu, H. M. Kang, T. Lappalainen, A. Sboner, L. Lochovsky, J. Chen, A. Harmanci, J. Das, A. Abyzov, S. Balasubramanian, K. Beal, D. Chakravarty, D. Challis, Y. Chen, D. Clarke, L. Clarke, F. Cunningham, U. S. Evani, P. Flicek, R. Fragoza, E. Garrison, R. Gibbs, Z. H. Gümüs, J. Herrero, N. Kitabayashi, Y. Kong, K. Lage, V. Liluashvili, S. M. Lipkin, D. G. MacArthur, G. Marth, D. Muzny, T. H. Pers, G. R. S. Ritchie, J. A. Rosenfeld, C. Sisu, X. Wei, M. Wilson, Y. Xue, F. Yu, E. T. Dermitzakis, H. Yu, M. A. Rubin, C. Tyler-Smith, and M. Gerstein. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science (New York, N.Y.), 342(6154):1235587, 2013. ISSN 1095-9203. doi:10.1126/science.1235587. URL http://www.ncbi.nlm.nih.gov/pubmed/24092746. → pages 3
[87] Y.-A. Kim, S. Wuchty, and T. M. Przytycka.
Identifying causal genes and dysregulated pathways in complex diseases. PLoS computational biology, 7(3):e1001095, Mar. 2011. ISSN 1553-7358. doi:10.1371/journal.pcbi.1001095. → pages 5
[88] Y.-A. Kim, R. Salari, S. Wuchty, and T. M. Przytycka. Module cover - a new approach to genotype-phenotype studies. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, pages 135–46, 2013. ISSN 2335-6936. URL http://www.ncbi.nlm.nih.gov/pubmed/23424119. → pages 76
[89] Y.-A. Kim, D.-Y. Cho, P. Dao, and T. M. Przytycka. MEMCover: integrated analysis of mutual exclusivity and functional network reveals dysregulated pathways across multiple cancer types. Bioinformatics (Oxford, England), 31(12):i284–92, jun 2015. ISSN 1367-4811. doi:10.1093/bioinformatics/btv247. URL http://www.ncbi.nlm.nih.gov/pubmed/26072494. → pages 76, 88
[90] J. C. King, J. Xu, J. Wongvipat, H. Hieronymus, B. S. Carver, D. H. Leung, B. S. Taylor, C. Sander, R. D. Cardiff, S. S. Couto, W. L. Gerald, and C. L. Sawyers. Cooperativity of TMPRSS2-ERG with PI3-kinase pathway activation in prostate oncogenesis. Nature Genetics, 41(5):524–526, 2009. ISSN 1061-4036. doi:10.1038/ng.371. URL https://www.ncbi.nlm.nih.gov/pubmed/19396167. → pages 75
[91] S. Köhler, S. Bauer, D. Horn, and P. N. Robinson. Walking the Interactome for Prioritization of Candidate Disease Genes. American Journal of Human Genetics, 82(4):949–958, 2008. ISSN 00029297. doi:10.1016/j.ajhg.2008.02.013. URL http://www.cell.com/AJHG/abstract/S0002-9297(08)00172-9. → pages 6
[92] R. I. Kondor and J. D. Lafferty. Diffusion kernels on graphs and other discrete input spaces. In Proceedings of the Nineteenth International Conference on Machine Learning, ICML ’02, pages 315–322, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc. ISBN 1-55860-873-7. URL http://dl.acm.org/citation.cfm?id=645531.655996. → pages 6
[93] K. J. Kron, A. Murison, S. Zhou, V. Huang, T. N. Yamaguchi, Y.-J. Shiah, M. Fraser, T. van der Kwast, P. C. Boutros, R. G.
Bristow, and M. Lupien. TMPRSS2-ERG fusion co-opts master transcription factors and activates NOTCH signaling in primary prostate cancer. Nature Genetics, 49(9):1336–1345, 2017. ISSN 1061-4036. doi:10.1038/ng.3930. URL https://www.ncbi.nlm.nih.gov/pubmed/28783165. → pages 75
[94] A. Lan, I. Y. Smoly, G. Rapaport, S. Lindquist, E. Fraenkel, and E. Yeger-Lotem. ResponseNet: Revealing signaling and regulatory networks linking genetic and transcriptomic screening data. Nucleic Acids Research, 39(SUPPL. 2):424–429, 2011. ISSN 03051048. doi:10.1093/nar/gkr359. → pages 6
[95] S. Landreville, O. A. Agapova, K. A. Matatall, Z. T. Kneass, M. D. Onken, R. S. Lee, A. M. Bowcock, and J. W. Harbour. Histone deacetylase inhibitors induce growth arrest and differentiation in uveal melanoma. Clinical Cancer Research, 18(2):408–416, 2012. ISSN 10780432. doi:10.1158/1078-0432.CCR-11-0946. URL https://www.ncbi.nlm.nih.gov/pubmed/22038994. → pages 60
[96] M. S. Lawrence, P. Stojanov, P. Polak, G. V. Kryukov, K. Cibulskis, A. Sivachenko, S. L. Carter, C. Stewart, C. H. Mermel, S. A. Roberts, A. Kiezun, P. S. Hammerman, A. McKenna, Y. Drier, L. Zou, A. H. Ramos, T. J. Pugh, N. Stransky, E. Helman, J. Kim, C. Sougnez, L. Ambrogio, E. Nickerson, E. Shefler, M. L. Cortés, D. Auclair, G. Saksena, D. Voet, M. Noble, D. DiCara, P. Lin, L. Lichtenstein, D. I. Heiman, T. Fennell, M. Imielinski, B. Hernandez, E. Hodis, S. Baca, A. M. Dulak, J. Lohr, D.-A. Landau, C. J. Wu, J. Melendez-Zajgla, A. Hidalgo-Miranda, A. Koren, S. A. McCarroll, J. Mora, R. S. Lee, B. Crompton, R. Onofrio, M. Parkin, W. Winckler, K. Ardlie, S. B. Gabriel, C. W. M. Roberts, J. A. Biegel, K. Stegmaier, A. J. Bass, L. A. Garraway, M. Meyerson, T. R. Golub, D. A. Gordenin, S. Sunyaev, E. S. Lander, and G. Getz. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature, 499(7457):214–8, July 2013. ISSN 1476-4687. doi:10.1038/nature12213. URL http://www.ncbi.nlm.nih.gov/pubmed/23770567. → pages 2
[97] D. T.
Le, J. N. Uram, H. Wang, B. R. Bartlett, H. Kemberling, A. D. Eyring, A. D. Skora, B. S. Luber, N. S. Azad, D. Laheru, B. Biedrzycki, R. C. Donehower, A. Zaheer, G. A. Fisher, T. S. Crocenzi, J. J. Lee, S. M. Duffy, R. M. Goldberg, A. de la Chapelle, M. Koshiji, F. Bhaijee, T. Huebner, R. H. Hruban, L. D. Wood, N. Cuka, D. M. Pardoll, N. Papadopoulos, K. W. Kinzler, S. Zhou, T. C. Cornish, J. M. Taube, R. A. Anders, J. R. Eshleman, B. Vogelstein, and L. A. Diaz. PD-1 Blockade in Tumors with Mismatch-Repair Deficiency. New England Journal of Medicine, 372(26):2509–2520, 2015. ISSN 0028-4793. doi:10.1056/NEJMoa1500596. URL https://www.ncbi.nlm.nih.gov/pubmed/26028255. → pages 60
[98] D. T. Le, J. N. Durham, K. N. Smith, H. Wang, B. R. Bartlett, L. K. Aulakh, S. Lu, H. Kemberling, C. Wilt, B. S. Luber, F. Wong, N. S. Azad, A. A. Rucki, D. Laheru, R. Donehower, A. Zaheer, G. A. Fisher, T. S. Crocenzi, J. J. Lee, T. F. Greten, A. G. Duffy, K. K. Ciombor, A. D. Eyring, B. H. Lam, A. Joe, S. P. Kang, M. Holdhoff, L. Danilova, L. Cope, C. Meyer, S. Zhou, R. M. Goldberg, D. K. Armstrong, K. M. Bever, A. N. Fader, J. Taube, F. Housseau, D. Spetzler, N. Xiao, D. M. Pardoll, N. Papadopoulos, K. W. Kinzler, J. R. Eshleman, B. Vogelstein, R. A. Anders, and L. A. Diaz. Mismatch repair deficiency predicts response of solid tumors to PD-1 blockade. Science (New York, N.Y.), 357(6349):409–413, 2017. ISSN 1095-9203. doi:10.1126/science.aan6733. URL http://www.ncbi.nlm.nih.gov/pubmed/28596308. → pages 60
[99] N. Leblay, F. Leprêtre, N. Le Stang, A. Gautier-Stein, L. Villeneuve, S. Isaac, D. Maillet, F. Galateau-Sallé, C. Villenet, S. Sebda, A. Goracci, G. Byrnes, J. D. McKay, M. Figeac, O. Glehen, F. N. Gilly, M. Foll, L. Fernandez-Cuesta, and M. Brevet. BAP1 Is Altered by Copy Number Loss, Mutation, and/or Loss of Protein Expression in More Than 70% of Malignant Peritoneal Mesotheliomas. Journal of Thoracic Oncology, 12(4):724–733, 2017. ISSN 15561380. doi:10.1016/j.jtho.2016.12.019.
URL https://www.ncbi.nlm.nih.gov/pubmed/28034829. → pages 51
[100] B. D. Lehmann, J. A. Bauer, X. Chen, M. E. Sanders, A. B. Chakravarthy, Y. Shyr, and J. A. Pietenpol. Identification of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies. The Journal of clinical investigation, 121(7):2750–67, jul 2011. ISSN 1558-8238. doi:10.1172/JCI45014. → pages 41
[101] M. D. M. Leiserson, D. Blokh, R. Sharan, and B. J. Raphael. Simultaneous identification of multiple driver pathways in cancer. PLoS computational biology, 9(5):e1003054, May 2013. ISSN 1553-7358. doi:10.1371/journal.pcbi.1003054. → pages 4
[102] M. D. M. Leiserson, F. Vandin, H.-T. Wu, J. R. Dobson, J. V. Eldridge, J. L. Thomas, A. Papoutsaki, Y. Kim, B. Niu, M. McLellan, M. S. Lawrence, A. Gonzalez-Perez, D. Tamborero, Y. Cheng, G. A. Ryslik, N. Lopez-Bigas, G. Getz, L. Ding, and B. J. Raphael. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nature Genetics, 47(2):106–114, 2014. ISSN 1061-4036. doi:10.1038/ng.3168. URL http://www.ncbi.nlm.nih.gov/pubmed/25501392. → pages 76
[103] C. K.-S. Leung. Anti-monotone Constraints, pages 98–98. Springer US, Boston, MA, 2009. ISBN 978-0-387-39940-9. doi:10.1007/978-0-387-39940-9_5046. URL https://doi.org/10.1007/978-0-387-39940-9_5046. → pages 80
[104] D. A. Levin, Y. Peres, and E. L. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2008. → pages 12
[105] W. Li, J. Cooper, L. Zhou, C. Yang, H. Erdjument-Bromage, D. Zagzag, M. Snuderl, M. Ladanyi, C. O. Hanemann, P. Zhou, M. A. Karajannis, and F. G. Giancotti. Merlin/NF2 loss-driven tumorigenesis linked to CRL4(DCAF1)-mediated inhibition of the hippo pathway kinases Lats1 and 2 in the nucleus. Cancer cell, 26(1):48–60, jul 2014. ISSN 1878-3686. doi:10.1016/j.ccr.2014.05.001. URL https://www.ncbi.nlm.nih.gov/pubmed/25026211. → pages 55
[106] D. Liben-Nowell and J. Kleinberg.
The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007. ISSN 15322882. doi:10.1002/asi.20591. → pages 7, 12
[107] F. Lin, M. C. De Gooijer, E. M. Roig, L. C. M. Buil, S. M. Christner, J. H. Beumer, T. Würdinger, J. H. Beijnen, and O. Van Tellingen. ABCB1, ABCG2, and PTEN determine the response of glioblastoma to temozolomide and ABT-888 therapy. Clinical Cancer Research, 20(10):2703–2713, 2014. ISSN 15573265. doi:10.1158/1078-0432.CCR-14-0084. → pages 38
[108] A. A. Loboda, M. N. Artyomov, and A. A. Sergushichev. Solving Generalized Maximum-Weight Connected Subgraph Problem for Network Enrichment Analysis. In M. Frith and C. N. Storm Pedersen, editors, Algorithms in Bioinformatics, pages 210–221, Cham, 2016. Springer International Publishing. ISBN 978-3-319-43681-4. doi:10.1007/978-3-319-43681-4_17. URL http://link.springer.com/10.1007/978-3-642-33122-0. → pages 76
[109] I. S. U. Luk, R. Shrestha, H. Xue, Y. Wang, F. Zhang, D. Lin, A. Haegert, R. Wu, X. Dong, C. C. Collins, A. Zoubeidi, M. E. Gleave, P. W. Gout, and Y. Wang. BIRC6 Targeting as Potential Therapy for Advanced, Enzalutamide-Resistant Prostate Cancer. Clinical cancer research, 23(6):1542–1551, mar 2017. ISSN 1078-0432. doi:10.1158/1078-0432.CCR-16-0718. URL http://www.ncbi.nlm.nih.gov/pubmed/27663589. → pages 9
[110] M. Maio, A. Scherpereel, L. Calabrò, J. Aerts, S. C. Perez, A. Bearz, K. Nackaerts, D. A. Fennell, D. Kowalski, A. S. Tsao, P. Taylor, F. Grosso, S. J. Antonia, A. K. Nowak, M. Taboada, M. Puglisi, P. K. Stockman, and H. L. Kindler. Tremelimumab as second-line or third-line treatment in relapsed malignant mesothelioma (DETERMINE): a multicentre, international, randomised, double-blind, placebo-controlled phase 2b trial. The Lancet Oncology, pages 1–13, 2017. ISSN 14702045. doi:10.1016/S1470-2045(17)30446-1. URL https://www.ncbi.nlm.nih.gov/pubmed/28729154. → pages 51, 61
[111] J. Marquart, E. Y. Chen, and V. Prasad.
Estimation of The Percentage of US Patients With Cancer Who Benefit From Genome-Driven Oncology. JAMA Oncology, 97239:1–7, apr 2018. ISSN 2374-2437. doi:10.1001/jamaoncol.2018.1660. URL http://dx.doi.org/10.1001/jamaoncol.2018.1660. → pages 43
[112] D. L. Masica and R. Karchin. Correlation of somatic mutation and expression identifies genes important in human glioblastoma progression and survival. Cancer research, 71(13):4550–61, July 2011. ISSN 1538-7445. doi:10.1158/0008-5472.CAN-11-0180. → pages 4
[113] S. Maxwell, M. R. Chance, and M. Koyutürk. Efficiently Enumerating All Connected Induced Subgraphs of a Large Molecular Network. In A.-H. Dediu, C. Martín-Vide, and B. Truthe, editors, Algorithms for Computational Biology, pages 171–182, Cham, 2014. Springer International Publishing. ISBN 978-3-319-07953-0. doi:10.1007/978-3-319-07953-0_14. URL http://link.springer.com/10.1007/978-3-319-07953-0_14. → pages 80
[114] A. McPherson, F. Hormozdiari, A. Zayed, R. Giuliany, G. Ha, M. G. F. Sun, M. Griffith, A. Heravi Moussavi, J. Senz, N. Melnyk, M. Pacheco, M. A. Marra, M. Hirst, T. O. Nielsen, S. C. Sahinalp, D. Huntsman, and S. P. Shah. deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data. PLoS computational biology, 7(5):e1001138, may 2011. ISSN 1553-7358. doi:10.1371/journal.pcbi.1001138. URL http://www.ncbi.nlm.nih.gov/pubmed/21625565. → pages 55, 65
[115] C. H. Mermel, S. E. Schumacher, B. Hill, M. L. Meyerson, R. Beroukhim, and G. Getz. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biology, 12(4):R41, 2011. ISSN 1465-6906. doi:10.1186/gb-2011-12-4-r41. URL https://www.ncbi.nlm.nih.gov/pubmed/21527027. → pages 3, 54
[116] D. Miao, C. A. Margolis, W. Gao, M. H. Voss, W. Li, D. J. Martini, C. Norton, D. Bossé, S. M. Wankowicz, D. Cullen, C. Horak, M. Wind-Rotolo, A. Tracy, M. Giannakis, F. S. Hodi, C. G. Drake, M. W. Ball, M. E. Allaf, A. Snyder, M.
D. Hellmann, T. Ho, R. J. Motzer, S. Signoretti, W. G. Kaelin, T. K. Choueiri, and E. M. Van Allen. Genomic correlates of response to immune checkpoint therapies in clear cell renal cell carcinoma. Science (New York, N.Y.), 5951(January):1–11, jan 2018. ISSN 1095-9203. doi:10.1126/science.aan5951. URL http://www.ncbi.nlm.nih.gov/pubmed/29301960. → pages 60
[117] C. A. Miller, S. H. Settle, E. P. Sulman, K. D. Aldape, and A. Milosavljevic. Discovering functional modules by identifying recurrent and mutually exclusive mutational patterns in tumors. BMC medical genomics, 4(1):34, 2011. ISSN 1755-8794. doi:10.1186/1755-8794-4-34. URL http://www.biomedcentral.com/1755-8794/4/34. → pages 4, 76
[118] M. Mina, F. Raynaud, D. Tavernari, E. Battistello, S. Sungalee, S. Saghafinia, T. Laessle, F. Sanchez-Vega, N. Schultz, E. Oricchio, and G. Ciriello. Conditional Selection of Genomic Alterations Dictates Cancer Evolution and Oncogenic Dependencies. Cancer cell, 29(0):723–736, jul 2017. ISSN 1878-3686. doi:10.1016/j.ccell.2017.06.010. → pages 76
[119] G. Minuti and L. Landi. MET deregulation in breast cancer. Annals of translational medicine, 3(13):181, aug 2015. ISSN 2305-5839. doi:10.3978/j.issn.2305-5839.2015.06.22. URL http://www.ncbi.nlm.nih.gov/pubmed/26366398. → pages 84
[120] L. Montanaro, D. Treré, and M. Derenzini. Nucleolus, ribosomes, and cancer. The American journal of pathology, 173(2):301–10, aug 2008. ISSN 1525-2191. doi:10.2353/ajpath.2008.070752. URL http://www.ncbi.nlm.nih.gov/pubmed/18583314. → pages 85
[121] K. W. Mouw, M. S. Goldberg, P. A. Konstantinopoulos, and A. D. D’Andrea. DNA Damage and Repair Biomarkers of Immunotherapy Response. Cancer discovery, 7(7):675–693, 2017. ISSN 2159-8290. doi:10.1158/2159-8290.CD-17-0226. URL http://www.ncbi.nlm.nih.gov/pubmed/28630051. → pages 60
[122] A. Murat, E. Migliavacca, T. Gorlia, W. L. Lambiv, T. Shay, M.-F. Hamou, N. de Tribolet, L. Regli, W. Wick, M. C. M. Kouwenhoven, J. A. Hainfellner, F. L. Heppner, P.-Y. Dietrich, Y.
Zimmer, J. G. Cairncross, R.-C. Janzer, E. Domany, M. Delorenzi, R. Stupp, and M. E. Hegi. Stem cell-related “self-renewal” signature and high epidermal growth factor receptor expression associated with resistance to concomitant chemoradiotherapy in glioblastoma. Journal of clinical oncology, 26(18):3015–24, jun 2008. ISSN 1527-7755. doi:10.1200/JCO.2007.15.7164. URL http://www.ncbi.nlm.nih.gov/pubmed/18565887. → pages 24
[123] S. Muthukrishnan and S. C. Sahinalp. Approximate nearest neighbors and sequence comparison with block operations. In Proceedings of the Thirty-second Annual ACM Symposium on Theory of Computing, pages 416–424, New York, 2000. ACM. ISBN 1581131844. doi:10.1145/335305.335353. → pages 78
[124] A. M. Newman, C. L. Liu, M. R. Green, A. J. Gentles, W. Feng, Y. Xu, C. D. Hoang, M. Diehn, and A. A. Alizadeh. Robust enumeration of cell subsets from tissue expression profiles. Nature methods, 12(5):453–7, may 2015. ISSN 1548-7105. doi:10.1038/nmeth.3337. URL http://www.ncbi.nlm.nih.gov/pubmed/25822800. → pages 58, 68
[125] S. Ng, E. A. Collisson, A. Sokolov, T. Goldstein, A. Gonzalez-Perez, N. Lopez-Bigas, C. Benz, D. Haussler, and J. M. Stuart. PARADIGM-SHIFT predicts the function of mutations in multiple cancers using pathway impact analysis. Bioinformatics, 28(18):640–646, 2012. ISSN 13674803. doi:10.1093/bioinformatics/bts402. → pages 4
[126] C. K. Osborne, V. Bardou, T. A. Hopp, G. C. Chamness, S. G. Hilsenbeck, S. A. W. Fuqua, J. Wong, D. C. Allred, G. M. Clark, and R. Schiff. Role of the estrogen receptor coactivator AIB1 (SRC-3) and HER-2/neu in tamoxifen resistance in breast cancer. Journal of the National Cancer Institute, 95(5):353–61, mar 2003. ISSN 0027-8874. doi:10.1017/CBO9781107415324.004. URL http://www.ncbi.nlm.nih.gov/pubmed/12618500. → pages 40
[127] D. Pan, A. Kobayashi, P. Jiang, L. Ferrari de Andrade, R. E. Tay, A. Luoma, D. Tsoucas, X. Qiu, K. Lim, P. Rao, H. W. Long, G.-C. Yuan, J. Doench, M. Brown, S. Liu, and K. W. Wucherpfennig.
A major chromatin regulator determines resistance of tumor cells to T cell-mediated killing. Science (New York, N.Y.), 1710(January):1–12, jan 2018. ISSN 1095-9203. doi:10.1126/science.aao1710. URL http://www.ncbi.nlm.nih.gov/pubmed/29301958. → pages 60
[128] D. W. Parsons, S. Jones, X. Zhang, J. C.-H. Lin, R. J. Leary, P. Angenendt, et al. An integrated genomic analysis of human glioblastoma multiforme. Science (New York, N.Y.), 321(5897):1807–12, Sept. 2008. ISSN 1095-9203. doi:10.1126/science.1164382. URL https://www.ncbi.nlm.nih.gov/pubmed/18772396. → pages 3, 35
[129] A.-M. Patch, E. L. Christie, D. Etemadmoghadam, D. W. Garsed, J. George, S. Fereday, K. Nones, P. Cowin, K. Alsop, P. J. Bailey, K. S. Kassahn, F. Newell, M. C. J. Quinn, S. Kazakoff, K. Quek, C. Wilhelm-Benartzi, E. Curry, H. S. Leong, A. Hamilton, L. Mileshkin, G. Au-Yeung, C. Kennedy, J. Hung, Y.-E. Chiew, P. Harnett, M. Friedlander, M. Quinn, J. Pyman, S. Cordner, P. OBrien, J. Leditschke, G. Young, K. Strachan, P. Waring, W. Azar, C. Mitchell, N. Traficante, J. Hendley, H. Thorne, M. Shackleton, D. K. Miller, G. M. Arnau, R. W. Tothill, T. P. Holloway, T. Semple, I. Harliwong, C. Nourse, E. Nourbakhsh, S. Manning, S. Idrisoglu, T. J. C. Bruxner, A. N. Christ, B. Poudel, O. Holmes, M. Anderson, C. Leonard, A. Lonie, N. Hall, S. Wood, D. F. Taylor, Q. Xu, J. L. Fink, N. Waddell, R. Drapkin, E. Stronach, H. Gabra, R. Brown, A. Jewell, S. H. Nagaraj, E. Markham, P. J. Wilson, J. Ellul, O. McNally, M. A. Doyle, R. Vedururu, C. Stewart, E. Lengyel, J. V. Pearson, N. Waddell, A. DeFazio, S. M. Grimmond, and D. D. L. Bowtell. Whole-genome characterization of chemoresistant ovarian cancer. Nature, 521(7553):489–494, 2015. ISSN 0028-0836. doi:10.1038/nature14410. → pages 36
[130] E. O. Paull, D. E. Carlin, M. Niepel, P. K. Sorger, D. Haussler, et al. Discovering causal pathways linking genomic events to transcriptional states using Tied Diffusion Through Interacting Events (TieDIE).
Bioinformatics (Oxford,England), pages 1–8, Sept. 2013. ISSN 1367-4811. doi:10.1093/bioinformatics/btt471. → pages 6[131] J. Pelletier, G. Thomas, and S. Volarevic´. Ribosome biogenesis in cancer: new players and therapeutic avenues. Naturereviews. Cancer, 18(1):51–63, jan 2018. ISSN 1474-1768. doi:10.1038/nrc.2017.104. URLhttp://www.ncbi.nlm.nih.gov/pubmed/29192214. → pages 85[132] S. Pen˜a-Llopis, S. Vega-Rubı´n-de Celis, A. Liao, N. Leng, A. Pavı´a-Jime´nez, S. Wang, T. Yamasaki, L. Zhrebker,S. Sivanand, P. Spence, L. Kinch, T. Hambuch, S. Jain, Y. Lotan, V. Margulis, A. I. Sagalowsky, P. B. Summerour,W. Kabbani, S. W. W. Wong, N. Grishin, M. Laurent, X.-J. Xie, C. D. Haudenschild, M. T. Ross, D. R. Bentley,P. Kapur, and J. Brugarolas. BAP1 loss defines a new class of renal cell carcinoma. Nature Genetics, 44(7):751–759,2012. ISSN 1061-4036. doi:10.1038/ng.2323. URL https://www.ncbi.nlm.nih.gov/pubmed/22683710. → pages 60[133] C. M. Perou, T. Sørlie, M. B. Eisen, M. van de Rijn, S. S. Jeffrey, C. A. Rees, J. R. Pollack, D. T. Ross, H. Johnsen,L. A. Akslen, O. Fluge, A. Pergamenschikov, C. Williams, S. X. Zhu, P. E. Lønning, A. L. Børresen-Dale, P. O. Brown,and D. Botstein. Molecular portraits of human breast tumours. Nature, 406(6797):747–52, aug 2000. ISSN 0028-0836.doi:10.1038/35021093. → pages 88[134] T. S. K. Prasad, K. Kandasamy, and A. Pandey. Human Protein Reference Database and Human Proteinpedia asdiscovery tools for systems biology. Methods in molecular biology (Clifton, N.J.), 577:67–79, jan 2009. ISSN1940-6029. doi:10.1007/978-1-60761-232-2 6. URL http://www.ncbi.nlm.nih.gov/pubmed/19718509. → pages 24[135] V. Prasad. Perspective: The precision-oncology illusion. Nature, 537(7619):S63–S63, Sep 2016. ISSN 0028-0836.URL http://dx.doi.org/10.1038/537S63a. Outlook. → pages 43111[136] Y. Qi, Y. Suhail, Y.-y. Lin, J. D. Boeke, and J. S. Bader. 
Finding friends and enemies in an enemies-only network: agraph diffusion kernel for predicting novel genetic interactions and co-complex membership from yeast geneticinteractions. Genome research, 18(12):1991–2004, Dec. 2008. ISSN 1088-9051. doi:10.1101/gr.077693.108. → pages6[137] J. Reimand and G. D. Bader. Systematic analysis of somatic mutations in phosphorylation signaling predicts novelcancer drivers. Molecular systems biology, 9(637):637, 2013. ISSN 1744-4292. doi:10.1038/msb.2012.68. URLhttps://www.ncbi.nlm.nih.gov/pubmed/23340843. → pages 3[138] S. Ren, G.-H. Wei, D. Liu, L. Wang, Y. Hou, S. Zhu, L. Peng, Q. Zhang, Y. Cheng, H. Su, X. Zhou, J. Zhang, F. Li,H. Zheng, Z. Zhao, C. Yin, Z. He, X. Gao, H. E. Zhau, C.-Y. Chu, J. B. Wu, C. Collins, S. V. Volik, R. Bell, J. Huang,K. Wu, D. Xu, D. Ye, Y. Yu, L. Zhu, M. Qiao, H.-M. Lee, Y. Yang, Y. Zhu, X. Shi, R. Chen, Y. Wang, W. Xu, Y. Cheng,C. Xu, X. Gao, T. Zhou, B. Yang, J. Hou, L. Liu, Z. Zhang, Y. Zhu, C. Qin, P. Shao, J. Pang, L. W. Chung, J. Xu, C.-L.Wu, W. Zhong, X. Xu, Y. Li, X. Zhang, J. Wang, H. Yang, J. Wang, H. Huang, and Y. Sun. Whole-genome andTranscriptome Sequencing of Prostate Cancer Identify New Genetic Alterations Driving Disease Progression.European Urology, 73(3):322–339, mar 2018. ISSN 03022838. doi:10.1016/j.eururo.2017.08.027. URLhttp://www.ncbi.nlm.nih.gov/pubmed/28927585. → pages 24[139] B. Reva, Y. Antipin, and C. Sander. Predicting the functional impact of protein mutations: Application to cancergenomics. Nucleic Acids Research, 39(17):37–43, 2011. ISSN 03051048. doi:10.1093/nar/gkr407. URLhttps://www.ncbi.nlm.nih.gov/pubmed/21727090. → pages 3[140] A. L. Richardson, Z. C. Wang, A. De Nicolo, X. Lu, M. Brown, A. Miron, X. Liao, J. D. Iglehart, D. M. Livingston,and S. Ganesan. X chromosomal abnormalities in basal-like human breast cancer. Cancer Cell, 9(2):121–132, 2006.ISSN 15356108. doi:10.1016/j.ccr.2006.01.013. → pages 24[141] D. S. Rickman, T. D. Soong, B. Moss, J. M. Mosquera, J. 
Dlabal, S. Terry, T. MacDonald, K. Bunting, F. Demichelis,A. Melnick, O. Elemento, and M. a. Rubin. Oncogene-mediated alterations in chromatin conformation. Proceedings ofthe National Academy of Sciences of the United States of America, 109(23):9083–9088, 2012. ISSN 0008-5472. →pages 24[142] A. Robertson, J. Shih, C. Yau, E. Gibb, J. Oba, K. Mungall, J. Hess, V. Uzunangelov, V. Walter, L. Danilova,T. Lichtenberg, M. Kucherlapati, P. Kimes, M. Tang, A. Penson, O. Babur, R. Akbani, C. Bristow, K. Hoadley, L. Iype,M. Chang, M. Abdel-Rahman, R. Akbani, A. Ally, J. Auman, O. Babur, M. Balasundaram, S. Balu, C. Benz,R. Beroukhim, I. Birol, T. Bodenheimer, J. Bowen, R. Bowlby, C. Bristow, D. Brooks, R. Carlsen, C. Cebulla,M. Chang, A. Cherniack, L. Chin, J. Cho, E. Chuah, S. Chudamani, C. Cibulskis, K. Cibulskis, L. Cope, S. Coupland,L. Danilova, T. Defreitas, J. Demchok, L. Desjardins, N. Dhalla, B. Esmaeli, I. Felau, M. Ferguson, S. Frazer,S. Gabriel, J. Gastier-Foster, N. Gehlenborg, M. Gerken, J. Gershenwald, G. Getz, E. Gibb, K. Griewank, E. Grimm,D. Hayes, A. Hegde, D. Heiman, C. Helsel, J. Hess, K. Hoadley, S. Hobensack, R. Holt, A. Hoyle, X. Hu, C. Hutter,M. Jager, S. Jefferys, C. Jones, S. Jones, C. Kandoth, K. Kasaian, J. Kim, P. Kimes, M. Kucherlapati, R. Kucherlapati,E. Lander, M. Lawrence, A. Lazar, S. Lee, K. Leraas, T. Lichtenberg, P. Lin, J. Liu, W. Liu, L. Lolla, Y. Lu, L. Iype,Y. Ma, H. Mahadeshwar, O. Mariani, M. Marra, M. Mayo, S. Meier, S. Meng, M. Meyerson, P. Mieczkowski, G. Mills,R. Moore, L. Mose, A. Mungall, K. Mungall, B. Murray, R. Naresh, M. Noble, J. Oba, A. Pantazi, M. Parfenov, P. Park,J. Parker, A. Penson, C. Perou, T. Pihl, R. Pilarski, A. Protopopov, A. Radenbaugh, K. Rai, N. Ramirez, X. Ren,S. Reynolds, J. Roach, A. Robertson, S. Roman-Roman, J. Roszik, S. Sadeghi, G. Saksena, X. Sastre, D. Schadendorf,J. Schein, L. Schoenfield, S. Schumacher, J. Seidman, S. Seth, G. Sethi, M. Sheth, Y. Shi, C. Shields, J. Shih,I. Shmulevich, J. 
Simons, A. Singh, P. Sipahimalani, T. Skelly, H. Sofia, M. Soloway, X. Song, M.-H. Stern, J. Stuart,Q. Sun, H. Sun, A. Tam, D. Tan, M. Tang, J. Tang, R. Tarnuzzer, B. Taylor, N. Thiessen, V. Thorsson, K. Tse,V. Uzunangelov, U. Veluvolu, R. Verhaak, D. Voet, V. Walter, Y. Wan, Z. Wang, J. Weinstein, M. Wilkerson,M. Williams, L. Wise, S. Woodman, T. Wong, Y. Wu, L. Yang, L. Yang, C. Yau, J. Zenklusen, J. Zhang, H. Zhang,E. Zmuda, A. Cherniack, C. Benz, G. Mills, R. Verhaak, K. Griewank, I. Felau, J. Zenklusen, J. Gershenwald,L. Schoenfield, A. Lazar, M. Abdel-Rahman, S. Roman-Roman, M.-H. Stern, C. Cebulla, M. Williams, M. Jager,112S. Coupland, B. Esmaeli, C. Kandoth, and S. Woodman. Integrative Analysis Identifies Four Molecular and ClinicalSubsets in Uveal Melanoma. Cancer Cell, 32(2):204–220, 2017. ISSN 18783686. doi:10.1016/j.ccell.2017.07.003.URL https://www.ncbi.nlm.nih.gov/pubmed/28810145. → pages 60, 75[143] R. Rosenthal, N. McGranahan, J. Herrero, B. S. Taylor, and C. Swanton. deconstructSigs: delineating mutationalprocesses in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. GenomeBiology, 17(1):31, 2016. ISSN 1474-760X. doi:10.1186/s13059-016-0893-4. URLhttps://www.ncbi.nlm.nih.gov/pubmed/26899170. → pages 52, 67[144] B. Rosner. Percentage Points for a Generalized ESD Many-Outlier Procedure. Technometrics), 25(2):165–172, 2013.→ pages 24, 83[145] A. Ruepp, B. Waegele, M. Lechner, B. Brauner, I. Dunger-Kaltenbach, G. Fobo, G. Frishman, C. Montrone, and H. W.Mewes. CORUM: The comprehensive resource of mammalian protein complexes-2009. Nucleic Acids Research, 38(SUPPL.1):497–501, 2009. ISSN 03051048. doi:10.1093/nar/gkp914. URLhttp://www.ncbi.nlm.nih.gov/pubmed/19884131. → pages 57[146] J. J. Sacco, J. Kenyani, Z. Butt, R. Carter, H. Y. Chew, L. P. Cheeseman, S. Darling, M. Denny, S. Urbe´, M. J. Clague,and J. M. Coulson. 
Loss of the deubiquitylase BAP1 alters class I histone deacetylase expression and sensitivity ofmesothelioma cells to HDAC inhibitors. Oncotarget, 6(15):13757–71, 2015. ISSN 1949-2553.doi:10.18632/oncotarget.3765. URL http://www.ncbi.nlm.nih.gov/pubmed/25970771. → pages 60[147] F. Sanchez-Garcia, U. D. Akavia, E. Mozes, and D. Pe’er. JISTIC: identification of significant targets in cancer. BMCbioinformatics, 11:189, 2010. ISSN 1471-2105. doi:10.1186/1471-2105-11-189. URLhttps://www.ncbi.nlm.nih.gov/pubmed/20398270. → pages 3[148] R. F. Schwarz, A. Trinh, B. Sipos, J. D. Brenton, N. Goldman, and F. Markowetz. Phylogenetic quantification ofintra-tumour heterogeneity. PLoS computational biology, 10(4):e1003535, apr 2014. ISSN 1553-7358.doi:10.1371/journal.pcbi.1003535. URL http://www.ncbi.nlm.nih.gov/pubmed/24743184. → pages 78[149] H. Sharifi-Noghabi, Y. Liu, N. Erho, R. Shrestha, M. Alshalalfa, E. Davicioni, C. C. Collins, and M. Ester. Deepgenomic signature for early metastasis prediction in prostate cancer. bioRxiv, 2018. doi:10.1101/276055. URLhttps://doi.org/10.1101/276055. → pages 9[150] N. L. Sharma, C. E. Massie, A. Ramos-Montoya, V. Zecchini, H. E. Scott, A. D. Lamb, S. MacArthur, R. Stark, A. Y.Warren, I. G. Mills, and D. E. Neal. The Androgen Receptor Induces a Distinct Transcriptional Program inCastration-Resistant Prostate Cancer in Man. Cancer Cell, 23(1):35–47, 2013. ISSN 15356108.doi:10.1016/j.ccr.2012.11.010. → pages 24[151] B. S. Sheffield, A. V. Tinker, Y. Shen, H. Hwang, H. H. Li-Chang, E. Pleasance, C. Ch’ng, A. Lum, J. Lorette, Y. J.McConnell, S. Sun, S. J. Jones, A. M. Gown, D. G. Huntsman, D. F. Schaeffer, A. Churg, S. Yip, J. Laskin, and M. A.Marra. Personalized oncogenomics: Clinical experience with malignant peritoneal mesothelioma using whole genomesequencing. PLoS ONE, 10(3):1–12, 2015. ISSN 19326203. doi:10.1371/journal.pone.0119689. URLhttps://www.ncbi.nlm.nih.gov/pubmed/25798586. → pages 51[152] M. F. Shlesinger. 
Mathematical physics: first encounters. Nature, 450(7166):40–41, 2007. ISSN 0028-0836.doi:10.1038/450040a. → pages 7[153] I. Shmulevich, E. R. Dougherty, and W. Zhang. Gene perturbation and intervention in probabilistic Boolean networks.Bioinformatics (Oxford, England), 18(10):1319–1331, 2002. ISSN 1367-4803, 1460-2059.doi:10.1093/bioinformatics/18.10.1319. → pages 7113[154] R. Shrestha, E. Hodzic, J. Yeung, K. Wang, T. Sauerwald, P. Dao, S. Anderson, H. Beltran, M. A. Rubin, C. C. Collins,G. Haffari, and S. C. Sahinalp. HIT’nDRIVE: Multi-driver gene prioritization based on hitting time. Research inComputational Molecular Biology: 18th Annual International Conference, RECOMB 2014, Pittsburgh, PA, USA, April2-5, 2014, Proceedings, pages 293–306, 2014. doi:10.1007/978-3-319-05269-4 23. URLhttp://dx.doi.org/10.1007/978-3-319-05269-4 23. → pages 7, 11[155] R. Shrestha, E. Hodzic, T. Sauerwald, P. Dao, K. Wang, J. Yeung, S. Anderson, F. Vandin, G. Haffari, C. C. Collins, andS. C. Sahinalp. HIT’nDRIVE: patient-specific multidriver gene prioritization for precision oncology. Genome research,27(9):1573–1588, sep 2017. ISSN 1549-5469. doi:10.1101/gr.221218.117. URLhttps://www.ncbi.nlm.nih.gov/pubmed/28768687. → pages 7, 11, 52, 67, 75[156] R. Shrestha, N. Nabavi, Y.-Y. Lin, F. Mo, S. Anderson, S. Volik, H. H. Adomat, D. Lin, H. Xue, X. Dong, R. Shukin,R. H. Bell, B. McConeghy, A. Haegert, S. Brahmbhatt, E. Li, H. Z. Oo, A. Hurtado-Coll, L. Fazli, J. Zhou,Y. McConnell, A. McCart, A. Lowy, G. B. Morin, M. Daugaard, S. C. Sahinalp, F. Hach, S. Le Bihan, M. E. Gleave,Y. Wang, A. Churg, and C. C. Collins. Integrated Multi-omics Molecular Subtyping Predicts Therapeutic Vulnerabilityin Malignant Peritoneal Mesothelioma. bioRxiv, 2018. doi:10.1101/243477. URL https://doi.org/10.1101/2434777. →pages 8, 51[157] N.-L. Sim, P. Kumar, J. Hu, S. Henikoff, G. Schneider, and P. C. Ng. SIFT web server: predicting effects of amino acidsubstitutions on proteins. 
Nucleic acids research, 40(Web Server issue):W452–7, 2012. ISSN 1362-4962.doi:10.1093/nar/gks539. URL https://www.ncbi.nlm.nih.gov/pubmed/22689647. → pages 3[158] A. D. Singhi, A. M. Krasinskas, H. A. Choudry, D. L. Bartlett, J. F. Pingpank, H. J. Zeh, A. Luvison, K. Fuhrer,N. Bahary, R. R. Seethala, and S. Dacic. The prognostic significance of BAP1, NF2, and CDKN2A in malignantperitoneal mesothelioma. Modern pathology : an official journal of the United States and Canadian Academy ofPathology, Inc, 29(1):14–24, 2016. ISSN 1530-0285. doi:10.1038/modpathol.2015.121. URLhttp://www.ncbi.nlm.nih.gov/pubmed/26493618. → pages 51[159] T. Sjo¨blom, L. D. Wood, D. W. Parsons, J. Lin, T. D. Barber, D. Mandelker, R. J. Leary, J. Ptak, N. Silliman, S. Szabo,P. Buckhaults, C. Farrell, P. Meeh, S. D. Markowitz, J. Willis, D. Dawson, J. K. V. Willson, A. F. Gazdar, J. Hartigan,L. Wu, C. Liu, G. Parmigiani, B. H. Park, and K. E. Bachman. The Consensus Coding Sequences of Human Breast andColorectal Cancers. Science, 314(October):268–274, 2006. ISSN 0036-8075, 1095-9203.doi:10.1126/science.1133427. URL https://www.ncbi.nlm.nih.gov/pubmed/16959974. → pages 2[160] M. R. Spalinger, R. Manzini, L. Hering, J. B. Riggs, C. Gottier, S. Lang, K. Atrott, A. Fettelschoss, F. Olomski, T. M.Ku¨ndig, M. Fried, D. F. McCole, G. Rogler, and M. Scharl. PTPN2 Regulates Inflammasome Activation and ControlsOnset of Intestinal Inflammation and Colon Cancer. Cell reports, 22(7):1835–1848, feb 2018. ISSN 2211-1247.doi:10.1016/j.celrep.2018.01.052. → pages 86[161] M. R. Stratton, P. J. Campbell, and P. A. Futreal. The cancer genome. Nature, 458(7239):719–24, Apr. 2009. ISSN1476-4687. doi:10.1038/nature07943. URL https://www.ncbi.nlm.nih.gov/pubmed/19360079. → pages 1, 10[162] A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, S. L. Pomeroy,T. R. Golub, E. S. Lander, and J. P. Mesirov. 
Gene set enrichment analysis: a knowledge-based approach forinterpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States ofAmerica, 102(43):15545–50, oct 2005. ISSN 0027-8424. doi:10.1073/pnas.0506580102. → pages 44, 68, 89[163] P. H. Sugarbaker and D. Chang. Long-term regional chemotherapy for patients with epithelial malignant peritonealmesothelioma results in improved survival. European Journal of Surgical Oncology, 43(7):1228–1235, 2017. ISSN15322157. doi:10.1016/j.ejso.2017.01.009. URL http://dx.doi.org/10.1016/j.ejso.2017.01.009. → pages 50114[164] L. Sun, A. M. Hui, Q. Su, A. Vortmeyer, Y. Kotliarov, S. Pastorino, A. Passaniti, J. Menon, J. Walling, R. Bailey,M. Rosenblum, T. Mikkelsen, and H. A. Fine. Neuronal and glioma-derived stem cell factor induces angiogenesiswithin the brain. Cancer Cell, 9(4):287–300, 2006. ISSN 15356108. doi:10.1016/j.ccr.2006.03.003. → pages 24[165] C. Suo, O. Hrydziuszko, D. Lee, S. Pramana, D. Saputra, H. Joshi, S. Calza, and Y. Pawitan. Integration of somaticmutation, expression and functional data reveals potential driver genes predictive of breast cancer survival.Bioinformatics, 31(16):2607, Mar. 2015. doi:10.1093/bioinformatics/btv164. → pages 4[166] S. Suthram, A. Beyer, R. M. Karp, Y. Eldar, and T. Ideker. eQED: an efficient method for interpreting eQTLassociations using protein networks. Molecular systems biology, 4(162):162, 2008. ISSN 1744-4292.doi:10.1038/msb.2008.4. → pages 5[167] D. Szklarczyk, a. Franceschini, S. Wyder, K. Forslund, D. Heller, J. Huerta-Cepas, M. Simonovic, a. Roth, a. Santos,K. P. Tsafou, M. Kuhn, P. Bork, L. J. Jensen, and C. von Mering. STRING v10: protein-protein interaction networks,integrated over the tree of life. Nucleic Acids Research, 43(D1):D447–D452, 2014. ISSN 0305-1048.doi:10.1093/nar/gku1003. URL http://www.ncbi.nlm.nih.gov/pubmed/25352553. → pages 67[168] D. Szklarczyk, A. Franceschini, S. Wyder, K. Forslund, D. Heller, J. 
Huerta-Cepas, M. Simonovic, A. Roth, A. Santos,K. P. Tsafou, M. Kuhn, P. Bork, L. J. Jensen, and C. von Mering. String v10: proteinprotein interaction networks,integrated over the tree of life. Nucleic Acids Research, 43(D1):D447–D452, 2015. doi:10.1093/nar/gku1003. → pages23[169] B. S. Taylor, N. Schultz, H. Hieronymus, A. Gopalan, Y. Xiao, B. S. Carver, V. K. Arora, P. Kaushik, E. Cerami,B. Reva, Y. Antipin, N. Mitsiades, T. Landers, I. Dolgalev, J. E. Major, M. Wilson, N. D. Socci, A. E. Lash, A. Heguy,J. a. Eastham, H. I. Scher, V. E. Reuter, P. T. Scardino, C. Sander, C. L. Sawyers, and W. L. Gerald. Integrative genomicprofiling of human prostate cancer. Cancer cell, 18(1):11–22, jul 2010. ISSN 1878-3686.doi:10.1016/j.ccr.2010.05.026. → pages 24[170] TCGA. Comprehensive molecular characterization of human colon and rectal cancer. Nature, 487(7407):330–7, July2012. ISSN 1476-4687. doi:10.1038/nature11252. URL https://www.ncbi.nlm.nih.gov/pubmed/22810696. → pages 3,77, 82, 84[171] N. Tebbutt, M. W. Pedersen, and T. G. Johns. Targeting the ERBB family in cancer: couples therapy. Nature ReviewsCancer, 13(9):663–673, 2013. ISSN 1474-175X. doi:10.1038/nrc3559. URLhttp://www.ncbi.nlm.nih.gov/pubmed/23949426. → pages 84[172] J. R. Testa. Asbestos and Mesothelioma. Current Cancer Research. Springer International Publishing, 2017. ISBN978-3-319-53558-6. doi:10.1007/978-3-319-53560-9. URL http://link.springer.com/10.1007/978-3-319-53560-9. →pages 50[173] P. Tetali. Design of on-line algorithms using hitting times. SIAM J. Comput., 28(4):1232–1246, 1999. → pages 13[174] B. Thapa, A. Salcedo, X. Lin, M. Walkiewicz, C. Murone, M. Ameratunga, K. Asadi, S. Deb, S. A. Barnett, S. Knight,P. Mitchell, D. N. Watkins, P. C. Boutros, and T. John. The Immune Microenvironment, Genome-wide Copy NumberAberrations, and Survival in Mesothelioma. Journal of Thoracic Oncology, 12(5):850–859, 2017. ISSN 15561380.doi:10.1016/j.jtho.2017.02.013. 
URL http://dx.doi.org/10.1016/j.jtho.2017.02.013. → pages 51[175] The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastomagenes and core pathways. Nature, 455(7216):1061–8, Oct. 2008. ISSN 1476-4687. doi:10.1038/nature07385. →pages 15, 22, 35, 44, 82[176] The Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature, 474(7353):609–15, June 2011. ISSN 1476-4687. doi:10.1038/nature10166. → pages 15, 22, 35, 36, 44115[177] The Cancer Genome Atlas Research Network. Comprehensive molecular portraits of human breast tumours. Nature,490(7418):61–70, Oct. 2012. ISSN 0028-0836. doi:10.1038/nature11412. → pages 15, 22, 35, 44, 82[178] The Cancer Genome Atlas Research Network. The Molecular Taxonomy of Primary Prostate Cancer. Cell, 163(4):1011–25, nov 2015. ISSN 1097-4172. doi:10.1016/j.cell.2015.10.025. → pages 15, 22, 35, 36, 44[179] H. Thorvaldsdo´ttir, J. T. Robinson, and J. P. Mesirov. Integrative Genomics Viewer (IGV): High-performance genomicsdata visualization and exploration. Briefings in Bioinformatics, 14(2):178–192, 2013. ISSN 14675463.doi:10.1093/bib/bbs017. URL https://www.ncbi.nlm.nih.gov/pubmed/22517427. → pages 63[180] S. a. Tomlins, D. R. Rhodes, S. Perner, S. M. Dhanasekaran, R. Mehra, X.-W. Sun, S. Varambally, X. Cao, J. Tchinda,R. Kuefer, C. Lee, J. E. Montie, R. B. Shah, K. J. Pienta, M. a. Rubin, and A. M. Chinnaiyan. Recurrent fusion ofTMPRSS2 and ETS transcription factor genes in prostate cancer. Science (New York, N.Y.), 310(5748):644–648, 2005.ISSN 0036-8075. doi:10.1126/science.1117679. → pages 36[181] M. Torchala, P. Chelminiak, and P. A. Bates. Mean first-passage time calculations: Comparison of the deterministicHill’s algorithm with Monte Carlo simulations. European Physical Journal B, 85(4), 2012. ISSN 14346028.doi:10.1140/epjb/e2012-20760-8. → pages 7[182] M. Torchala, P. Chelminiak, M. Kurzynski, and P. a. Bates. 
RaTrav: a tool for calculating mean first-passage times onbiochemical networks. BMC systems biology, 7:130, 2013. ISSN 1752-0509. doi:10.1186/1752-0509-7-130. URLhttp://www.ncbi.nlm.nih.gov/pubmed/24261882. → pages 7[183] Z. Tu, L. Wang, M. N. Arbeitman, T. Chen, and F. Sun. An integrative approach for causal gene identification and generegulatory pathway inference. Bioinformatics, 22(14):489–496, 2006. ISSN 13674803.doi:10.1093/bioinformatics/btl234. → pages 6[184] G. Ugurluer, K. Chang, M. E. Gamez, A. L. Arnett, R. Jayakrishnan, R. C. Miller, and T. T. Sio. Genome-basedMutational Analysis by Next Generation Sequencing in Patients with Malignant Pleural and Peritoneal Mesothelioma.Anticancer research, 36(5):2331–8, may 2016. ISSN 1791-7530. URLhttp://www.ncbi.nlm.nih.gov/pubmed/27127140. → pages 51[185] I. Ulitsky, A. Krishnamurthy, R. M. Karp, and R. Shamir. DEGAS: de novo discovery of dysregulated pathways inhuman diseases. PloS one, 5(10):e13367, oct 2010. ISSN 1932-6203. doi:10.1371/journal.pone.0013367. URLhttp://www.ncbi.nlm.nih.gov/pubmed/20976054. → pages 76, 87[186] A. Untergasser, I. Cutcutache, T. Koressaar, J. Ye, B. C. Faircloth, M. Remm, and S. G. Rozen. Primer3–newcapabilities and interfaces. Nucleic acids research, 40(15):e115, aug 2012. ISSN 1362-4962. doi:10.1093/nar/gks596.URL http://www.ncbi.nlm.nih.gov/pubmed/22730293. → pages 65[187] E. M. Van Allen, N. Wagle, P. Stojanov, D. L. Perrin, K. Cibulskis, S. Marlow, J. Jane-Valbuena, D. C. Friedrich,G. Kryukov, S. L. Carter, A. McKenna, A. Sivachenko, M. Rosenberg, A. Kiezun, D. Voet, M. Lawrence, L. T.Lichtenstein, J. G. Gentry, F. W. Huang, J. Fostel, D. Farlow, D. Barbie, L. Gandhi, E. S. Lander, S. W. Gray, S. Joffe,P. Janne, J. Garber, L. MacConaill, N. Lindeman, B. Rollins, P. Kantoff, S. A. Fisher, S. Gabriel, G. Getz, and L. A.Garraway. Whole-exome sequencing and clinical interpretation of formalin-fixed, paraffin-embedded tumor samples toguide precision cancer medicine. 
Nature medicine, 20(6):682–8, jun 2014. ISSN 1546-170X. doi:10.1038/nm.3559.→ pages 35[188] E. Van Dyk, M. J. T. Reinders, and L. F. a. Wessels. A scale-space method for detecting recurrent DNA copy numberchanges with analytical false discovery rate control. Nucleic Acids Research, 41(9), 2013. ISSN 03051048.doi:10.1093/nar/gkt155. URL https://www.ncbi.nlm.nih.gov/pubmed/23476020. → pages 3116[189] F. Vandin, E. Upfal, and B. J. Raphael. Algorithms for detecting significantly mutated pathways in cancer. Journal ofcomputational biology : a journal of computational molecular cell biology, 18(3):507–22, Mar. 2011. ISSN1557-8666. doi:10.1089/cmb.2010.0265. → pages 6, 76[190] F. Vandin, E. Upfal, and B. J. Raphael. De novo discovery of mutated driver pathways in cancer. Genome research, 22(2):375–85, Feb. 2012. ISSN 1549-5469. doi:10.1101/gr.120477.111. → pages 4, 76, 88[191] O. Vanunu, O. Magger, E. Ruppin, T. Shlomi, and R. Sharan. Associating genes and protein complexes with disease vianetwork propagation. PLoS Computational Biology, 6(1), 2010. ISSN 1553734X. doi:10.1371/journal.pcbi.1000641.→ pages 6[192] C. J. Vaske, S. C. Benz, J. Z. Sanborn, D. Earl, C. Szeto, et al. Inference of patient-specific pathway activities frommulti-dimensional cancer genomics data using PARADIGM. Bioinformatics (Oxford, England), 26(12):i237–45, June2010. ISSN 1367-4811. doi:10.1093/bioinformatics/btq182. → pages 4[193] R. G. W. Verhaak, K. a. Hoadley, E. Purdom, V. Wang, Y. Qi, et al. Integrated genomic analysis identifies clinicallyrelevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer cell, 17(1):98–110, Jan. 2010. ISSN 1878-3686. doi:10.1016/j.ccr.2009.12.020. → pages 35[194] R. Visconti, R. Della Monica, and D. Grieco. Cell cycle checkpoint in cancer: a therapeutically targetabledouble-edged sword. Journal of experimental & clinical cancer research : CR, 35(1):153, sep 2016. ISSN 1756-9966.doi:10.1186/s13046-016-0433-9. 
→ pages 86[195] B. Vogelstein, N. Papadopoulos, V. E. Velculescu, S. Zhou, L. a. Diaz, and K. W. Kinzler. Cancer genome landscapes.Science (New York, N.Y.), 339(6127):1546–58, mar 2013. ISSN 1095-9203. doi:10.1126/science.1235122. URLhttps://www.ncbi.nlm.nih.gov/pubmed/23539594. → pages 1, 2, 10, 75[196] V. Walter, A. B. Nobel, and F. a. Wright. DiNAMIC: A method to identify recurrent DNA copy number aberrations intumors. Bioinformatics, 27(5):678–685, 2011. ISSN 13674803. doi:10.1093/bioinformatics/btq717. URLhttps://www.ncbi.nlm.nih.gov/pubmed/21183584. → pages 3[197] K. Wang, M. Li, and H. Hakonarson. ANNOVAR: Functional annotation of genetic variants from high-throughputsequencing data. Nucleic Acids Research, 38(16):1–7, 2010. ISSN 03051048. doi:10.1093/nar/gkq603. URLhttps://www.ncbi.nlm.nih.gov/pubmed/20601685. → pages 3, 63[198] K. Wang, R. Shrestha, A. W. Wyatt, A. Reddy, J. Leha´r, Y. Wang, A. Lapuk, and C. C. Collins. A meta-analysisapproach for characterizing pan-cancer mechanisms of drug sensitivity in cell lines. PloS one, 9(7):e103050, 2014.ISSN 1932-6203. doi:10.1371/journal.pone.0103050. URL http://www.ncbi.nlm.nih.gov/pubmed/25036042. → pages9[199] M. D. Wilkerson and D. N. Hayes. ConsensusClusterPlus: A class discovery tool with confidence assessments and itemtracking. Bioinformatics, 26(12):1572–1573, 2010. ISSN 13674803. doi:10.1093/bioinformatics/btq170. URLhttps://www.ncbi.nlm.nih.gov/pubmed/20427518. → pages 67[200] L. D. Wood, D. W. Parsons, S. Jones, J. Lin, T. Sjo¨blom, R. J. Leary, D. Shen, S. M. Boca, T. Barber, J. Ptak,N. Silliman, S. Szabo, Z. Dezso, V. Ustyanksky, T. Nikolskaya, Y. Nikolsky, R. Karchin, P. a. Wilson, J. S. Kaminker,Z. Zhang, R. Croshaw, J. Willis, D. Dawson, M. Shipitsin, J. K. V. Willson, S. Sukumar, K. Polyak, B. H. Park, C. L.Pethiyagoda, P. V. K. Pant, D. G. Ballinger, A. B. Sparks, J. Hartigan, D. R. Smith, E. Suh, N. Papadopoulos,P. Buckhaults, S. D. Markowitz, G. Parmigiani, K. W. Kinzler, V. E. 
Velculescu, and B. Vogelstein. The genomiclandscapes of human breast and colorectal cancers. Science (New York, N.Y.), 318(5853):1108–1113, 2007. ISSN1095-9203. doi:10.1126/science.1145720. URL https://www.ncbi.nlm.nih.gov/pubmed/17932254. → pages 3117[201] A. W. Wyatt, F. Mo, K. Wang, B. McConeghy, S. Brahmbhatt, L. Jong, D. M. Mitchell, R. L. Johnston, A. Haegert,E. Li, J. Liew, J. Yeung, R. Shrestha, A. V. Lapuk, A. McPherson, R. Shukin, R. H. Bell, S. Anderson, J. Bishop,A. Hurtado-Coll, H. Xiao, A. M. Chinnaiyan, R. Mehra, D. Lin, Y. Wang, L. Fazli, M. E. Gleave, S. V. Volik, and C. C.Collins. Heterogeneity in the inter-tumor transcriptome of high risk prostate cancer. Genome biology, 15(8):426, aug2014. ISSN 1474-760X. doi:10.1186/s13059-014-0426-y. URL http://www.ncbi.nlm.nih.gov/pubmed/25155515. →pages 9[202] M. Yamada, J. Tang, J. Lugo-Martinez, E. Hodzic, R. Shrestha, H. Ouyang, P. Radivojac, C. Sahinalp, F. Menczer,Y. Chang, A. Saha, H. Mamitsuka, and D. Yin. Ultra High-Dimensional Nonlinear Feature Selection for Big BiologicalData. IEEE Transactions on Knowledge and Data Engineering, 30(7):1352–1365, 2018. ISSN 1041-4347.doi:10.1109/TKDE.2018.2789451. URL https://doi.org/10.1109/TKDE.2018.2789451. → pages 9[203] X. Yao, H. Hao, Y. Li, and S. Li. Modularity-based credible prediction of disease genes and detection of diseasesubtypes on the phenotype-gene heterogeneous network. BMC systems biology, 5(1):79, 2011. ISSN 1752-0509.doi:10.1186/1752-0509-5-79. URL http://www.biomedcentral.com/1752-0509/5/79. → pages 7[204] E. Yeger-Lotem, L. Riva, L. J. Su, A. D. Gitler, A. G. Cashikar, O. D. King, P. K. Auluck, M. L. Geddie, J. S.Valastyan, D. R. Karger, S. Lindquist, and E. Fraenkel. Bridging high-throughput genetic and transcriptional datareveals cellular responses to alpha-synuclein toxicity. Nature genetics, 41(3):316–323, 2009. ISSN 1061-4036.doi:10.1038/ng.337. → pages 6[205] K. Yoshihara, A. Tajima, D. Komata, T. Yamamoto, S. Kodama, H. Fujiwara, M. 
Suzuki, Y. Onishi, M. Hatae,K. Sueyoshi, H. Fujiwara, Y. Kudo, I. Inoue, and K. Tanaka. Gene expression profiling of advanced-stage serousovarian cancers distinguishes novel subclasses and implicates ZEB2 in tumor progression and prognosis. CancerScience, 100(8):1421–1428, 2009. ISSN 13479032. doi:10.1111/j.1349-7006.2009.01204.x. → pages 24[206] K. Yoshihara, M. Shahmoradgoli, E. Martı´nez, R. Vegesna, H. Kim, W. Torres-Garcia, V. Trevin˜o, H. Shen, P. W. Laird,D. a. Levine, S. L. Carter, G. Getz, K. Stemke-Hale, G. B. Mills, and R. G. W. Verhaak. Inferring tumour purity andstromal and immune cell admixture from expression data. Nature communications, 4:2612, 2013. ISSN 2041-1723.doi:10.1038/ncomms3612. URL http://www.ncbi.nlm.nih.gov/pubmed/24113773. → pages 58, 68[207] K. Yoshihara, Q. Wang, W. Torres-Garcia, S. Zheng, R. Vegesna, H. Kim, and R. G. W. Verhaak. The landscape andtherapeutic relevance of cancer-associated transcript fusions. Oncogene, 34(37):4845–4854, 2014. ISSN 0950-9232.doi:10.1038/onc.2014.406. → pages 23, 35, 36[208] Y. Yoshikawa, M. Emi, T. Hashimoto-Tamaoki, M. Ohmuraya, A. Sato, T. Tsujimura, S. Hasegawa, T. Nakano,M. Nasu, S. Pastorino, A. Szymiczek, A. Bononi, M. Tanji, I. Pagano, G. Gaudino, A. Napolitano, C. Goparaju, H. I.Pass, H. Yang, and M. Carbone. High-density array-CGH with targeted NGS unmask multiple noncontiguous minutedeletions on chromosome 3p21 in mesothelioma. Proceedings of the National Academy of Sciences of the United Statesof America, 113(47):13432–13437, 2016. ISSN 1091-6490. doi:10.1073/pnas.1612074113. URLhttp://www.ncbi.nlm.nih.gov/pubmed/27834213. → pages 54, 75[209] A. Youn and R. Simon. Identifying cancer driver genes in tumor genome sequencing studies. Bioinformatics (Oxford,England), 27(2):175–81, Jan. 2011. ISSN 1367-4811. doi:10.1093/bioinformatics/btq630. URLhttps://www.ncbi.nlm.nih.gov/pubmed/21169372. → pages 2[210] H. Yu, H. Pak, I. Hammond-Martel, M. Ghram, A. Rodrigue, S. Daou, H. Barbour, L. 
Corbeil, J. Hebert, E. Drobetsky,J. Y. Masson, J. M. Di Noia, and E. B. Affar. Tumor suppressor and deubiquitinase BAP1 promotes DNA double-strandbreak repair. Proceedings of the National Academy of Sciences, 111(1):285–290, 2014. ISSN 0027-8424.doi:10.1073/pnas.1309085110. URL http://www.ncbi.nlm.nih.gov/pubmed/24347639. → pages 60118[211] S. Zaccaria, M. El-kebir, G. W. Klau, and B. J. Raphael. The Copy-Number Tree Mixture Deconvolution Problem andApplications to Multi-sample Bulk Sequencing Tumor Data. In S. C. Sahinalp, editor, Research in ComputationalMolecular Biology, pages 318–335, Cham, 2017. Springer International Publishing. ISBN 978-3-319-56970-3.doi:10.1007/978-3-319-56970-3 20. URL http://link.springer.com/10.1007/978-3-319-56970-3. → pages 78[212] Q. Zhang, L. Ding, D. E. Larson, D. C. Koboldt, M. D. McLellan, K. Chen, X. Shi, A. Kraja, E. R. Mardis, R. K.Wilson, I. B. Borecki, and M. a. Province. CMDS: A population-based method for identifying recurrent DNA copynumber aberrations in cancer from high-resolution data. Bioinformatics, 26(4):464–469, 2009. ISSN 13674803.doi:10.1093/bioinformatics/btp708. URL https://www.ncbi.nlm.nih.gov/pubmed/20031968. → pages 3119

