UBC Faculty Research and Publications

DriverNet: uncovering the impact of somatic driver mutations on transcriptional networks in cancer
Bashashati, Ali; Haffari, Gholamreza; Ding, Jiarui; Ha, Gavin; Lui, Kenneth; Rosner, Jamie; Huntsman, David G; Caldas, Carlos; Aparicio, Samuel A; Shah, Sohrab P
Dec 22, 2012


Full Text

METHOD Open Access

DriverNet: uncovering the impact of somatic driver mutations on transcriptional networks in cancer

Ali Bashashati1†, Gholamreza Haffari1,2†, Jiarui Ding1,3†, Gavin Ha1,4, Kenneth Lui1, Jamie Rosner1, David G Huntsman5,6, Carlos Caldas7, Samuel A Aparicio1,5 and Sohrab P Shah1,3,5*

Abstract

Simultaneous interrogation of tumor genomes and transcriptomes is underway in unprecedented global efforts. Yet, despite the essential need to separate driver mutations modulating gene expression networks from transcriptionally inert passenger mutations, robust computational methods to ascertain the impact of individual mutations on transcriptional networks are underdeveloped. We introduce a novel computational framework, DriverNet, to identify likely driver mutations by virtue of their effect on mRNA expression networks. Application to four cancer datasets reveals the prevalence of rare candidate driver mutations associated with disrupted transcriptional networks and a simultaneous modulation of oncogenic and metabolic networks, induced by copy number co-modification of adjacent oncogenic and metabolic drivers. DriverNet is available on Bioconductor or at http://compbio.bccrc.ca/software/drivernet/.

Keywords: driver mutations, sequencing, cancer, transcriptional networks.

Background

Cancer genome sequencing experiments are designed to enumerate all somatic mutations within a cancer. Some of these mutations will serve as actionable genomic aberrations upon which to develop and apply targeted therapies (for example, mutations in PIK3CA, BRAF, and KRAS), ultimately enabling rational frameworks for improved clinical management and patient care based on precise genomic patterns of somatic alteration. To this end, next generation sequencing (NGS) technology has shifted the rate-limiting step from identifying all cancer mutations in a sequenced genome to identifying the relatively few functional mutations that drive the phenotype of malignant cells.
Therein lies a major challenge in the cancer genomics field: distinguishing pathogenic driver mutations from the so-called passenger mutations that accrue stochastically, but do not confer selective advantages.

In order to discover novel driver mutations, several large-scale sequencing initiatives such as The Cancer Genome Atlas project (TCGA, for example, [1]) are generating simultaneous whole genome and transcriptome interrogations for hundreds of cases of the same tumor type. This opens the possibility of ascertaining the impact of individual somatic mutations on gene expression networks. Initial observations in high-throughput datasets, coupled with innumerable functional studies, suggest that driver mutations are expected to alter gene expression of their cognate proteins, their interacting partners, or genes that share the same biochemical pathway. This will lead to a correlated pattern of gene expression in a network of genes associated with a driver mutation, which differs from benign passenger mutations with little to no phenotype. Moreover, somatic aberrations in genes may alter more than one transcriptional network, thus enabling the enumeration of a group of pathways driven by a single genomic event. The importance of placing mutations in the context of their gene expression has been illuminated recently by Prahallad and colleagues [2], who established the therapeutic effect of PLX4032 against the BRAF V600E oncoprotein, which is mechanistically linked to the activation of EGFR. Thus, differential expression of EGFR in different cell types (colon cancers versus melanomas) has a dramatic impact on drug efficacy. Consequently, knowing active pathways coupled with mutational profiles will be critical for implementation of therapeutic decisions informed by the presence of mutations in a cancer.

* Correspondence: sshah@bccrc.ca
† Contributed equally
1 Department of Molecular Oncology, British Columbia Cancer Agency, 675 West 10th Avenue, Vancouver, BC, V5Z 1L3, Canada
Full list of author information is available at the end of the article
Bashashati et al. Genome Biology 2012, 13:R124
http://genomebiology.com/2012/13/12/R124
© 2012 Bashashati et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Current approaches for driver analysis typically rely on the frequency of aberration of a given gene or locus in a population of tumors as a function of the background mutation rate (for example, [3-5]). Recent whole genome interrogations, however, have revealed that the vast majority of mutated genes exhibit low population frequencies [6-10]. While most of these events can be explained by stochastically acquired mutations due to increased proliferation or acquisition of mutagenic processes, with no oncogenic properties, many others are in fact well-known pathogenic mutations with, in some cases, actionable clinical utility. For example, sequencing of complete exomes of 316 ovarian cancers [7] and 65 triple negative breast cancers [11] revealed rare but functionally important and actionable mutations (for example, in ERBB2 and BRAF) in a small percentage of cases that were not identified by frequency and background mutation rate analyses. Thus, frequency analysis will fail to recognize infrequent, but nonetheless important driver mutations.

We suggest that integrative analysis of genomic aberrations and transcriptional profiles in cancer will reveal somatic mutations that drive biological processes, regardless of the population frequency. Furthermore, we propose that biological networks can be leveraged to relate mutations to their consequent effect on transcription and gene expression.
Figure 1A shows an example of high-level amplification of EGFR in a glioblastoma multiforme (GBM) tumor, accompanied by the coincident outlying expression of genes that are connected to EGFR through known biological pathways. We note that BRAF in this case, although not amplified itself, exhibits elevated expression compared to the population distribution. Other genes known to interact with EGFR exhibit similar extreme changes in expression levels in this example, such that PI3K signaling and MAPK signaling could be affected by this single genomic event. Figure 1B shows fitted Gaussian expression distributions of three genes that interact with EGFR: FGF11, PIK3R1, and PRKACB, and shows that some cases with outlying expression have coincident EGFR amplifications. Our assumption is that amplification of EGFR in these cases has driven expression of the example genes to the tails of their respective distributions. Thus, extreme changes in expression levels of genes related to genomic aberrations are observable in orthogonally measured high-throughput transcriptome assays. As such, simultaneous analysis of genome and transcriptome measurements should amplify important signals in the data. Motivated by this idea, we hypothesize that driver aberrations will measurably disrupt transcriptional profiles regardless of their frequency in the population.

Algorithmic frameworks to exploit the relationship between genomic events and consequent changes in gene expression to nominate putative driver genes are underdeveloped. We therefore propose an integrated genome/transcriptome analysis framework, called DriverNet, to contextualize genomic aberrations (for example, mutations and copy number alterations) by their effect on transcriptional networks and identify candidate genomic aberrations suitable for functional experimental follow-up.
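
The idea that an aberration drives expression of connected genes to the tails of their fitted Gaussian distributions can be illustrated with a minimal sketch. It assumes a per-gene Gaussian fit across patients and a hypothetical cut-off `z_threshold` in standard deviations; neither the cut-off nor the function name `outlier_events` comes from this section, and the sketch is not the DriverNet implementation.

```python
from statistics import mean, stdev


def outlier_events(expression, z_threshold=2.0):
    """Return the set of (patient, gene) pairs whose expression lies in a tail.

    expression:  {gene: {patient: expression value}}
    z_threshold: hypothetical cut-off in standard deviations (an assumption,
                 not a value taken from the paper).
    """
    events = set()
    for gene, per_patient in expression.items():
        values = list(per_patient.values())
        if len(values) < 2:
            continue  # a distribution cannot be fitted to a single value
        mu = mean(values)
        sigma = stdev(values)  # sample standard deviation of the fitted Gaussian
        if sigma == 0:
            continue  # constant expression has no tails
        for patient, value in per_patient.items():
            if abs(value - mu) / sigma > z_threshold:
                events.add((patient, gene))
    return events
```

Each returned pair is one patient-gene outlying expression event, which is the quantity that the bipartite graph in the overview of the approach places on its right partition.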
Our approach allows individual mutations to be related to coincident changes in gene expression and assigns statistical significance to candidate predictions, thus quantitatively and rationally prioritizing candidate genes. We note that our intent differs from complementary approaches such as the one described by Vaske et al. [12], which aims at nominating driver pathways rather than driver genes in cancer, and from those that leverage genome data without considering expression [4,13]. Both Masica and Karchin [14] and Ciriello et al. [15] integrate genome and transcriptome relationships in their framework; however, they differ from our approach, since Masica and Karchin [14] do not utilize known biological pathway information and Ciriello et al. [15] only consider mRNA expression associated with copy number aberrations and not with mutations. Other methods focusing on copy number and expression associations do not consider mutations, nor do they employ the use of previously annotated pathways [16,17].

To study the properties and advantages of our approach, we analyzed four large-scale genome-transcriptome interrogations of tumor populations (Table 1) in human gliomas, triple negative breast cancers, a population of nearly 1,000 breast tumors (all subtypes) and high-grade serous ovarian cancers. We present results from three experiments: i) ascertainment of sensitivity and specificity in the context of several cancer datasets; ii) enumeration of well-known, but infrequent, drivers modulating transcriptional networks; and iii) identification of complex driver events that implicate compound metabolic and oncogenic pathway modulation from single genomic events.

Results

Overview of DriverNet approach

We developed a novel, integrated algorithmic approach (DriverNet) to analyze population-based genomic and transcriptomic interrogations of tumor (sub)types for identification of pathogenic driver mutations.

[Figure 1: three panels (a)-(c). Panel (c) labels: Genomic Aberrations, Influence Graph, Gene Expression, DriverNet; matrix axes: Patients (p1-p5) and Genes (g1-g6).]

Figure 1 A schematic showing how DriverNet works. (a) An example of a Cytoscape visualization of a glioblastoma patient with a high-level amplification of epidermal growth factor receptor (EGFR) (shown in green) and coincident outlying expression of genes connected to EGFR in the Reactome influence graph (shown in yellow). Examples of the overrepresented pathways (by Reactome FI plug-in for Cytoscape, FDR < 0.001) from the list of genes showing outlying expression associated with the EGFR amplification are depicted at the bottom. The box plot shows the population-level expression distribution of BRAF, an interacting protein with EGFR, and where the specific case with EGFR amplification sits on that distribution (red ‘x’). We note that in this case, BRAF itself is not mutated or amplified. (b) Fitted Gaussian expression distributions of three genes that interact with EGFR: FGF11, PIK3R1, and PRKACB, with each point indicating the probability density function for individual cases. For each gene, blue dots indicate cases with mutations in the gene itself and red arrows indicate cases with outlying expression with coincident EGFR amplifications. (c) Schematic representation of the DriverNet approach. Given the genomic aberration states for different patients and genes, gene expression data, and the influence graph, which captures biological pathway information, the bipartite graph shown on the right is constructed. Green nodes on the left partition of the bipartite graph correspond to aberrated genes and nodes on the right represent the outlying expression status for each patient where red indicates outlying patient-gene events from the gene expression matrix. The genes with the highest number of outlying expression events (for example, g2) are nominated as putative drivers.

Our approach relates genomic aberrations to disrupted transcriptional patterns, informed by known associations or interactions between genes. The full details of the algorithm are described in the Online Methods, but will be summarized here in brief. Shown schematically in Figure 1C, DriverNet formulates associations between mutations and expression levels using a bipartite graph where nodes are: i) the set of genes representing the mutation status (the left partition of the graph) and ii) the set of genes representing outlying expression status in each of the patients (the right partition of the graph).
For each patient, an edge between the nodes on the left and right partitions of the graph is drawn if the following three conditions are all satisfied: i) gene gi is mutated in patient p of the population (green nodes on the left partition of the graph); ii) gene gj shows outlying expression in patient p (red nodes on the right partition of the graph); and iii) gi and gj are known to interact according to pathway or gene set databases (an ‘influence graph’ after [18]). Our method then uses a greedy optimization approach to explain as many nodes on the right partition of the bipartite graph as possible using the fewest nodes on the left partition of the graph, such that the genes explaining the highest number of outlying expression events (for example, g2 in Figure 1C) are nominated as putative driver genes. Finally, we apply statistical significance tests to these candidates based on null distributions informed by stochastic resampling.

Datasets

For our analysis, we used four publicly available datasets that contain genome and transcriptome data of several tumor types (Table 1). Detailed descriptions of the analysis of the datasets and pre-processing workflows can be found in Additional file 1. The GBM dataset represents copy number, mutations and expression data for 120 glioblastoma multiforme patients [6] taken from the TCGA portal [19]. Note that only the cases which had both mutation and copy number data were included in this dataset. The METABRIC dataset [20] represents copy number alterations and accompanying gene expression data for 997 breast cancer patients. TN represents the validated mutations, copy number, and expression data for 66 triple negative breast cancer patients [11]. The TCGA HGS dataset contains mutations, copy number, and expression data for 304 high-grade serous ovarian cancer patients [7] that were taken from the TCGA portal. As with the GBM dataset, we only included the cases which had both mutation and copy number data.
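
The three edge conditions and the greedy selection summarized in the overview of the DriverNet approach can be sketched as follows. This is a simplified illustration, not the published implementation: the function names and the dictionary-based inputs are hypothetical, ties are broken alphabetically by assumption, a gene's effect on its own expression is counted only if the influence graph lists it explicitly, and the resampling-based significance test is omitted.

```python
from collections import defaultdict


def build_bipartite(aberrations, outliers, influence):
    """Map each aberrated gene to the outlying (patient, gene) events it connects to.

    aberrations: {patient: set of aberrated genes}            (left partition)
    outliers:    {patient: set of genes with outlying expression} (right partition)
    influence:   {gene: set of genes it is known to interact with}
    """
    edges = defaultdict(set)
    for patient, aberrated_genes in aberrations.items():
        patient_outliers = outliers.get(patient, set())
        for gi in aberrated_genes:
            # An edge requires all three conditions: gi aberrated in the patient,
            # gj outlying in the same patient, and gi-gj linked in the influence graph.
            for gj in patient_outliers & influence.get(gi, set()):
                edges[gi].add((patient, gj))
    return dict(edges)


def greedy_rank(edges):
    """Repeatedly pick the gene that explains the most still-unexplained events."""
    remaining = {gene: set(events) for gene, events in edges.items()}
    ranking = []
    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical), an assumption.
        best = max(sorted(remaining), key=lambda gene: len(remaining[gene]))
        covered = remaining.pop(best)
        if not covered:
            break  # no remaining gene explains any unexplained event
        ranking.append((best, len(covered)))
        for gene in remaining:
            remaining[gene] -= covered
    return ranking
```

In this sketch a gene that only explains events already covered by a higher-ranked gene drops out of the ranking, which mirrors the stated goal of explaining as many outlying events as possible with the fewest aberrated genes.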
The data analysis workflow is shown schematically in Additional file 2. The GBM2, TN2, and HGS2 datasets represent mutations only and gene expression data for 140, 66, and 307 glioblastoma, triple negative, and high-grade serous ovarian cancer patients, respectively.

Performance benchmarking analysis establishes DriverNet as a sensitive and specific algorithm

In practice, quantitative measurements with standard sensitivity/specificity benchmarking techniques are impractical in the absence of ground truth. However, due to the availability of well-studied cancer gene databases, including the Cancer Gene Census (CGC) [21] and the Catalogue of Somatic Mutations in Cancer (COSMIC) [22], we set out to approximate performance metrics and compare DriverNet with the following two competing methods: i) a method described by Masica and Karchin [14], which uses correlation-based statistics followed by a Fisher exact test to associate mutations with gene expression patterns (referred to as ‘Fisher’, see Additional file 1); ii) a method described in Youn and Simon [5], which identifies driver genes based on the background mutation rate, functional impact on proteins, and redundancy in genetic code (referred to as ‘Frequency’). In adherence to both approaches mentioned above, we removed copy number data from the analysis and restricted the comparisons to mutation data only (GBM2, TN2, and HGS2, Table 1), resulting in the exclusion of the METABRIC dataset as it contained copy number aberration data only. We used two systematic benchmarking measures as follows: i) examining the proportion of predictions found in the CGC database [21]; ii) examining the prevalence of somatic mutations of candidate genes in accordance with the COSMIC database, assuming genes with higher mutation prevalence in the corresponding patient population of interest in COSMIC (glioblastoma, breast and ovarian cancer) are more likely to be driver genes.
Theoretically, this measure should favor the Frequency approach.

To systematically evaluate specificity, we compared the proportion of predictions that were present in CGC as a function of decreasing sensitivity thresholds (Figure 2A, B, C) for all three methods. We also looked at the cumulative distribution of mutation prevalence in the COSMIC database for all three datasets (Figure 2D, E, F). Throughout the range of the top predictions output by DriverNet, the concordance with CGC was always higher than for Fisher and Frequency in the GBM2 and TN2 datasets. For HGS2, DriverNet and the Frequency approach outperformed the Fisher method. The cumulative prevalence in the COSMIC dataset was higher for DriverNet compared to the other two approaches throughout the range of the top predictions, with Frequency second best. Thus, far fewer predictions are required by DriverNet to capture the majority of drivers in the dataset, indicating higher relative specificity.

Table 1 Description of datasets

Dataset    Tumor type              Number of cases  Genomic aberrations  Outliers  Reference
GBM        glioblastoma            120              3,198                26,956    [6]
GBM2       glioblastoma            140              573                  35,618
METABRIC   breast                  997              18,331               214,530   [20]
TN         triple negative breast  66               4,824                15,929    [11]
TN2        triple negative breast  66               1,019                15,929
HGS        serous ovarian          304              8,229                91,697    [7]
HGS2       serous ovarian          307              4,919                92,491

For GBM2 (mutations only), the Frequency method identified eight genes: EGFR, IDH1, NF1, PIK3R1, PTEN, RB1, TP53, and FKBP9 as significantly altered, with seven of these found in CGC (Additional file 3). In total, DriverNet identified 34 genes (p < 0.05) including seven of the genes nominated by the Frequency-based approach (Additional file 4). Several genes found in CGC (PIK3C2G, MDM2, BCR, ERBB2, DDIT3, FGFR1, BRCA2, MET, and PDGFRA) were also among the top 34 genes nominated by DriverNet.
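
The first benchmarking measure, the proportion of the top N ranked predictions that appear in a reference gene list such as CGC, can be sketched as below. The function name `concordance_curve` and the toy gene labels are hypothetical; the cap of 200 ranked genes follows the range plotted in Figure 2.

```python
def concordance_curve(ranked_genes, reference_genes, max_n=200):
    """Fraction of the top-N ranked genes found in a reference set, for N = 1..max_n.

    ranked_genes:    candidate drivers ordered from most to least significant
    reference_genes: curated gene list used as a proxy for ground truth
    """
    reference = set(reference_genes)
    curve = []
    hits = 0
    for n, gene in enumerate(ranked_genes[:max_n], start=1):
        if gene in reference:
            hits += 1
        curve.append(hits / n)
    return curve


# Toy ranking with placeholder labels; real input would be a method's ranked output.
toy_curve = concordance_curve(["geneA", "geneB", "geneC", "geneD"],
                              {"geneA", "geneC", "geneD"})
```

A method whose curve stays higher over the range of N places more reference genes near the top of its ranking, which is how the comparison between DriverNet, Fisher, and Frequency is read here.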
We detected MET as the 29th ranked gene (p = 0.002, mutated in three cases), which was reported in [1], suggesting that it has been overlooked by the Frequency method, which ranked this gene as the 93rd.

For TN2 (mutation only, no copy number), the Frequency method identified five genes: PIK3CA, RB1, TP53, PTEN, and MYO3A as significantly altered genes by mutation, of which four were found in CGC (Additional file 5). In total, DriverNet identified 59 genes with p < 0.05, four of which were nominated by the Frequency-based approach (Additional file 6). A DriverNet prediction not identified by the Frequency approach included JAK1 (p = 0, ranked 13th, mutated in one case), which plays a key role in prolactin signaling, which is implicated in breast cancer [23,24].

For HGS2 (mutation only, no copy number), the Frequency method identified CSMD3, BRCA1, BRCA2, and TP53 as significantly altered genes, three of which were found in CGC (Additional file 7). DriverNet identified BRCA1, BRCA2, and TP53 in addition to CGC genes, KRAS, PTEN, KIT, NRAS, RPN1, RB1, PIK3CA, CLTCL1, ATIC, CREBBP, MET, PPP2R1A, CLTC, CTNNB1, BRAF, and TSHR (Additional file 8). BRAF, PIK3CA, KRAS, and NRAS are known oncogenic drivers and emphasize the power of integration of expression data to nominate important but infrequently mutated genes. In addition, the known tumor suppressor gene, PTEN, was among the top genes in DriverNet (rank 11th) but was overlooked by the Frequency method, which ranked this gene as 525th.

[Figure 2: six panels (A-F). Panel titles: Concordance with Cancer Gene Census (A-C) and Concordance with COSMIC (D-F). x-axis: Top N Ranked Genes (0 to 200); y-axis: 0.0 to 1.0. Legend: Fisher-based Approach, Frequency-based Approach, DriverNet.]

Figure 2 DriverNet performance benchmarking with the GBM2, TN2, and HGS2 datasets. (A-C) Concordance with Cancer Gene Census for DriverNet, Frequency-based, and Fisher-based approaches as a function of the top N ranked genes (out of 200) for the GBM2, TN2, and HGS2 datasets, respectively. (D-F) Concordance with the COSMIC database (cumulative distribution of mutation prevalence in the COSMIC database) for DriverNet, Frequency-based, and Fisher-based approaches as a function of the top N ranked genes (out of 200) for the GBM2, TN2, and HGS2 datasets, respectively. Note that for the GBM2 dataset, DriverNet nominates 113 genes as candidate drivers; therefore, the concordance of DriverNet genes with the Cancer Gene Census is plotted for the 113 candidates.

Infrequent mutations modulating transcriptional networks feature prominently in population level datasets

We then sought to ascertain the prevalence of rare drivers in all four datasets overlooked by the Frequency-based approach to driver prediction. We identified ‘infrequent’ significant drivers (p < 0.05) where the gene of interest was abrogated by mutation or copy number alteration (CNA) in < 2% of cases. Due to unknown ground truth with respect to actual drivers, we restrict presentation to those genes also found in the CGC. This resulted in 22 genes in METABRIC, 13 genes in HGS, 1 gene in TN, and 2 genes in GBM (Table 2). The infrequent drivers in METABRIC were PTEN, RB1, MDM2, MYC, CDKN2A, CLTC, CREBBP, GNAS, EGFR, CDH1, CCNE1, EP300, CBL, PIK3R1, JAK2, TP53, NUP98, ATM, PIK3CA, IDH2, KRAS, and TRA@.
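
The selection of infrequent drivers described above (p < 0.05, altered in fewer than 2% of cases, and present in CGC) amounts to a three-way filter, sketched here. The function name `infrequent_drivers` and the tuple layout are hypothetical; the KIT and NRAS rows reuse the HGS counts and corrected P values from Table 2 with the 304 HGS cases from Table 1, while `geneX` is an invented placeholder for a frequently altered gene.

```python
def infrequent_drivers(candidates, n_cases, reference_genes,
                       p_cutoff=0.05, max_fraction=0.02):
    """Keep significant candidates altered in fewer than max_fraction of cases.

    candidates:      iterable of (gene, corrected p value, number of altered cases)
    n_cases:         number of cases in the dataset
    reference_genes: curated list (for example CGC) used to restrict the output
    """
    reference = set(reference_genes)
    selected = []
    for gene, p_value, n_altered in candidates:
        is_significant = p_value < p_cutoff
        is_rare = n_altered / n_cases < max_fraction
        if is_significant and is_rare and gene in reference:
            selected.append(gene)
    return selected


# KIT: 5 SNV/indel + 1 homozygous deletion; NRAS: 2 SNV/indel (HGS rows of Table 2).
hgs_rows = [("KIT", 2.0e-4, 6), ("NRAS", 9.0e-4, 2), ("geneX", 0.0, 250)]
rare = infrequent_drivers(hgs_rows, n_cases=304,
                          reference_genes={"KIT", "NRAS", "geneX"})
```

With 304 cases, six altered cases correspond to 1.97% and two to 0.66%, matching the percentages reported for KIT and NRAS, so both pass the 2% cut-off while the placeholder `geneX` does not.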
Both PIK3CA (two cases with high-level amplifications) and PIK3R1 (two cases with homozygous deletions) were altered in 0.19% of cases, and yet showed evidence of driving expression levels of the connected genes to the tails of the expression distribution. Interestingly, we identified seven cases (0.67%) with homozygous deletions in TP53 (locus 17p13.1) coincident with outlying expression in MAPK and Wnt signaling pathways (Additional files 9 and 10). Loss of function of TP53 is typically associated with mutation; however, these results suggest that in rare cases, homozygous deletions may be the mechanism by which TP53 is lost in breast cancer.

In HGS, we found 13 genes that were infrequent drivers also found in CGC (AKT2, KIT, NRAS, RPN1, PIK3CA, CREBBP, PPP2R1A, ATIC, CLTCL1, MET, MAP2K4, ETV1, and EP300) (Table 2). Intriguingly, KIT (1.97% of cases) and NRAS (0.66% of cases) were detected as drivers (p = 2E-4 and 9E-4, respectively; Additional files 11 and 12). KIT is mutated in melanomas, gastrointestinal stromal tumors, adult acute myeloid leukemia patients, and many other tumor types at high frequency and is the target of the kinase inhibitor imatinib. The mutations in NRAS (typically associated with melanomas, multiple myelomas, acute myelogenous leukemia, and thyroid cancer) were, in both cases, the Q61R hotspot mutation in the Ras domain. Both the KIT and NRAS mutations were overlooked as driver mutations by the Frequency-based approach (Additional file 7). This illustrates the increased sensitivity of DriverNet in identifying infrequent drivers in the population. Interestingly, mutations typically associated with lower grade (Type I) ovarian cancers such as PIK3CA (0.66% cases mutated) and CTNNB1 (0.6% cases mutated) were also nominated as drivers despite having extremely low frequency. The two PIK3CA mutations were both in well-known, activating hotspots, E545K and H1047R.
We suggest that these (four separate) cases might actually be histologically misdiagnosed ovarian cancers. These cases represent an important anecdote as many tumor populations contain rare mutations that create aberrant expression profiles. Type I ovarian cancers exhibit considerably different expression profiles compared to Type II high-grade serous cancers [25]. If indeed these cases are non-serous it would be unsurprising, given the DriverNet formulation of integration of genomic and transcriptomic profiles, that these rare mutations would cover many outlier events. In addition, we note that MAP2K4, with a mutation in one case and homozygous deletions in two cases, and ETV1, typically known for gene fusions, are listed amongst the infrequent drivers in the HGS ovarian data. Finally, we cross-referenced the list of genes with p < 0.05 with Cheung et al. [26] (a list of genes with genetic vulnerabilities in cancer cell lines) and noted that ALG8 and CCNE1 overlapped.

In the TN and GBM datasets, results were sparser. In the TN dataset, only one gene was an infrequent driver that was also in CGC: JAK1 with a mutation occurring in a single case (Table 2). JAK1-associated outliers were enriched for EGFR1 signaling (Additional files 13 and 14), suggesting that the mutation has downstream effects on an important oncogenic signaling network. In the GBM dataset, two genes, namely KRAS and AKT1, were infrequent drivers and were also found in CGC. KRAS-associated outliers were enriched for MAPK and PDGFR signaling and AKT1-associated outliers were enriched for FoxO family signaling (Additional files 15 and 16).
AKT activation is associated with many malignancies, where AKT acts, in part, by inhibiting FoxO tumor suppressors [27]. Collectively, investigations of rare drivers in METABRIC, HGS, TN, and GBM point out bona fide, but rare driver mutations, which would likely be omitted by methods examining genomic aberrations by selection or frequency analysis. These results indicate that rare driver mutations modulating expression networks comprise a meaningful component of the landscape of transcriptional variation attributed to the somatic genome, and thus should not be overlooked in the comprehensive enumeration of driver mutations in population-level studies.

Genomic copy number changes harboring known oncogenes simultaneously modulate metabolic pathways

We next examined patterns of modulated expression associated with drivers occurring within the same high-level amplification or homozygous deletion. Surprisingly, we noted four examples in the METABRIC and GBM datasets whereby genes proximal to known drivers and within the same genomic copy number change exhibited evidence for altering the expression of metabolic pathways exclusive of known oncogenic or tumor suppressor pathway modulation (Figure 3). PNMT encodes the phenylethanolamine N-methyltransferase enzyme and resides approximately 20 Kb centromeric to ERBB2 with one intervening gene. ERBB2, amplified in approximately 15-20% of breast cancers, is a well-known, targetable membrane-bound growth-factor receptor that is effectively inhibited by trastuzumab in clinical practice. The proximity of PNMT to ERBB2 results in co-amplification of both genes in nearly all cases (82/83 cases with high-level amplification of ERBB2 (Additional file 10)). PNMT was the top ranked driver in our analysis (ERBB2 was rank 3).
When we examined the outlier genes associated with ERBB2 and PNMT, ERBB2-associated outlier genes were, as expected, enriched for ErbB signaling and EGF signaling pathways. PNMT-associated outliers were enriched for non-oncogenic macromolecule biosynthesis pathways including metabolic pathways and tyrosine metabolism (Figure 3A). The co-occurring modulation of oncogenic and metabolic pathways was also found in other high-level amplifications in METABRIC including the 11q14 amplification of PAK1 and NDUFC2 (Additional file 10). PAK1 (27 cases with high-level amplifications) shows evidence of driving EGFR signaling (Figure 3B) and importantly segregates with a poor outcome ER positive subtype as reported in [20]. NDUFC2 (30 cases with high-level amplifications), downstream of PAK1 by approximately 660 Kb, encodes an NADH dehydrogenase enzyme. Outliers associated with NDUFC2 were associated with metabolic pathways and an oxidative phosphorylation pathway: a metabolic pathway that uses energy released by the oxidation of nutrients to produce adenosine triphosphate (Figure 3B).

Table 2 The predicted rare drivers

Dataset   Gene     Gband     SNV/Indel  HLAMP  HOMD  Corrected P value  Percent altered
METABRIC  PTEN     10q23.31  0          0      16    0                  1.54
METABRIC  RB1      13q14.2   0          0      16    0                  1.54
METABRIC  MDM2     12q15     0          11     0     0                  1.06
METABRIC  MYC      8q24.21   0          10     0     0                  0.96
METABRIC  CDKN2A   9p21.3    0          0      16    0                  1.54
METABRIC  CLTC     17q23.1   0          16     0     0                  1.54
METABRIC  CREBBP   16p13.3   0          1      2     0                  0.29
METABRIC  GNAS     20q13.32  0          7      0     0                  0.67
METABRIC  EGFR     7p11.2    0          3      1     0                  0.39
METABRIC  CDH1     16q22.1   0          0      16    0                  1.54
METABRIC  CCNE1    19q12     0          6      1     0                  0.67
METABRIC  EP300    22q13.2   0          0      4     0                  0.39
METABRIC  CBL      11q23.3   0          0      13    0                  1.25
METABRIC  PIK3R1   5q13.1    0          0      2     1.00E-04           0.19
METABRIC  JAK2     9p24.1    0          0      7     1.00E-04           0.67
METABRIC  TP53     17p13.1   0          0      7     2.00E-04           0.67
METABRIC  NUP98    11p15.4   0          0      8     0.0011             0.77
METABRIC  ATM      11q22.3   0          0      15    0.0149             1.45
METABRIC  PIK3CA   3q26.32   0          2      0     0.017              0.19
METABRIC  IDH2     15q26.1   0          4      1     0.017              0.48
METABRIC  KRAS     12p12.1   0          3      1     0.0348             0.39
METABRIC  TRA@     14q11.2   0          1      5     0.0388             0.58
TN        JAK1     1p31.3    1          0      0     0.0026             1.5
HGS       AKT2     19q13.2   0          3      1     0                  1.32
HGS       KIT      4q12      5          0      1     2.00E-04           1.97
HGS       NRAS     1p13.2    2          0      0     9.00E-04           0.66
HGS       RPN1     3q21.3    2          0      0     0.0019             0.66
HGS       PIK3CA   3q26.32   2          0      0     0.0029             0.66
HGS       CREBBP   16p13.3   5          0      1     0.0031             1.97
HGS       PPP2R1A  19q13.33  3          0      1     0.0046             1.32
HGS       ATIC     2q35      2          0      1     0.005              0.99
HGS       CLTCL1   22q11.21  4          0      1     0.0068             1.64
HGS       MET      7q31.2    4          0      0     0.0132             1.32
HGS       MAP2K4   17p12     1          0      2     0.044              0.99
HGS       ETV1     7p21.2    1          1      1     0.0468             0.99
HGS       EP300    22q13.2   1          0      3     0.0492             1.32
GBM       KRAS     12p12.1   1          0      1     1.41               1.67
GBM       AKT1     14q32.33  0          1      0     1.64               0.83

A similar pattern of simultaneous modulation of metabolic pathways by the copy number changes harboring known oncogenes was observed in GBM data. The cyclin-dependent kinase inhibitor CDKN2A and the methylthioadenosine phosphorylase MTAP are adjacent genes separated by approximately 100 Kb. MTAP (DriverNet rank 3) and known tumor-suppressor CDKN2A (DriverNet rank 4) are known to be co-deleted and they were observed as such in our analysis. We observed 53 cases with homozygous deletions in CDKN2A with accompanying co-deletion of MTAP in all cases (Additional file 16). In two additional cases with CDKN2A point mutations, MTAP was not found to be mutated or deleted. The enriched pathways of the CDKN2A-associated outliers included cell cycle, p53 signaling, and the FOXM1 transcription factor network amongst others.
The only significant enriched pathway of MTAP-deletion associated outliers was the metabolic pathway (Figure 3C).

We examined PNMT-, NDUFC2-, and MTAP-associated outlying genes that were part of metabolic pathways and also ERBB2-, PAK1-, and CDKN2A-associated outlying genes that were related to the oncogenic/tumor suppressor pathways. Outlying genes related to metabolic pathways and oncogenic/tumor suppressor pathways were distributed across disparate loci in the genome, eliminating co-amplification as the cause for the observed signals (Additional file 17).

The results of metabolic genes being co-aberrated with oncogenic and tumor suppressor genes suggest strongly that at least a portion of metabolic pathway disruption in cancer can be mechanistically attributed to somatic aberrations in the genome. Moreover, our results indicate the intriguing possibility that genomic aberrations harboring known oncogenic/tumor suppressor drivers are being selected for due to oncogenic pathway modulation coupled with non-overlapping metabolic pathway modulation.

Figure 3 Simultaneous modulation of metabolic pathways in copy number alterations harboring known oncogenes. EnrichmentMap [32] diagrams depicting Reactome pathways enriched in the set of outliers associated with pairs of genes that are co-amplified or co-deleted. In each pair, one gene is a known tumor suppressor or oncogene while the other is a metabolism gene. Pathways are shown as connected nodes in a graph where the size of the node indicates the number of genes in the pathway. Edges between nodes indicate genes common to both pathways where the thickness of the edge represents the degree of overlap. In general, little overlap was observed between metabolic drivers and oncogenic/tumor-suppressor drivers. (A) PNMT and ERBB2 co-amplified genes at the chr17q12 locus in breast cancer. (B) PAK1 and NDUFC2 co-amplified genes at the 11q14 locus in breast cancer. (C) CDKN2A and MTAP co-deleted genes at chr9p21.3 in GBM.

Discussion
A major challenge in large-scale interrogation of genomic and transcriptomic profiles of tumor types is to contextualize genomic aberrations within their gene expression profiles. Assessing the impact of a somatic mutation on the expression networks of a tumor provides strong evidence for its status as a driver. We presented a novel algorithm called DriverNet for integrative analysis of genomic and transcriptomic data derived from population-level studies of tumors. DriverNet associates the presence of a mutated gene with its impact on the gene expression levels of its known interacting partners. We showed in several cancer datasets that this approach is both sensitive and specific with respect to known driver genes and is suitable for application in population-level datasets for numerous tumor types that will rapidly emerge in the coming years.

Investigation of infrequent drivers revealed a surprising number of rare mutations in known cancer genes typically associated with other cancers. Although infrequent, they nonetheless modulate the expression profiles and their identification is critical to understanding the pathogenesis of the cancers that harbor them. We suggest that examination of genomic patterns in the population without the integration of the transcriptome would likely result in overlooking these important, but rare drivers.
The structure of the bipartite graph induces an interplay between the influence graph, the frequency of mutations, and the frequency of aberrant expression. A natural question that arises is the role of both frequency of mutation and node degree in the ranking of the output. Additional files 18 and 19 show that while rank is correlated with both frequency and node degree, the relationship is not monotonic and therefore the structure of the graph does not deterministically order the output. This suggests instead that simultaneous observations in the genome and the transcriptome in many cases override the structure induced by the influence graph and mutation frequency and can therefore penetrate the seemingly deterministic structure induced by the initial bipartite graph.

Finally, we describe a set of aberrations whereby proximal drivers appear to simultaneously modulate oncogenic and metabolic pathways. This was observed in both breast cancer and GBM datasets and leaves open the possibility that selection of well-known drivers such as ERBB2 and EGFR may be synergistically acting on altered metabolic processes abrogated by co-altered, nearby metabolism genes. In light of recent renewed interest in studying altered metabolism in cancer [28] owing to IDH1/2 somatic mutations in AML and GBM, the compound effects of single genomic events on metabolic and oncogenic pathways suggest that disruption of metabolic pathways by somatic mutations may be more widespread than previously thought and provide an impetus for novel therapies that might restore normal metabolic function in a cancer-cell specific manner.

Limitations
The DriverNet algorithm has some limitations. As outlying expression is computed in a deterministic manner, we may not be capturing less extreme but nonetheless important changes in expression that are modulated by a genomic event. Furthermore, DriverNet does not gracefully handle the directionality of the expression change.
A probabilistic model would account for subtler changes in expression; however, the combinatorial complexity of inference required in a fully probabilistic framework remains a daunting and unresolved challenge because of the number of parameters to estimate. Thus, this remains an open problem. In addition, DriverNet relies on the genomic aberrations, including mutations and extreme copy number alteration events, that are supplied to the algorithm. The threshold to determine what constitutes a significant copy number alteration lies within third-party copy number analysis algorithms and can affect DriverNet results. Performance benchmarking suggests that, in most cases, DriverNet performs better when only extreme copy number alterations, that is, high-level amplifications and homozygous deletions, are included in the analysis (Additional file 20). Reducing the thresholds to detect more copy number alterations (such as chromosome-arm level events) results in too large a space of altered genes in a given dataset (Additional files 21, 22, 23, 24).

The DriverNet framework relies on a predetermined influence graph that is undoubtedly sparse and incomplete. This is underscored by the omission in the METABRIC dataset of ZNF703, which resides in the amplification of the 8p12 locus that includes FGFR1. We have recently described ZNF703 as a driver [29] in luminal B cancers; however, DriverNet was not positioned to identify it due to its absence in the Reactome database. There are undoubtedly other false negative predictions due to poor characterization and lack of protein-protein interaction data; however, as interaction databases increase in density and volume of interactions, the DriverNet framework will be well placed to leverage such improvements.
Nevertheless, our goal is not to discover new protein interactions in this work, but rather to describe the association of mutations and expression in the context of well-understood knowledge bases.

Finally, we note that this framework is suitable for datasets with many patients sequenced. Ultimately, we wish to extend the framework for application to individual patients to determine the effectiveness of identification of actionable driver mutations for clinical use. This will require the accumulation of large gene expression repositories for tumor types that can be used to contextualize a patient's expression and mutational profiles.

Conclusions
We have presented a comprehensive analysis from four independent datasets of how transcriptional networks are affected by genomic aberrations in cancer and demonstrate how integrative analysis can be used effectively to identify novel driver genes in population-level studies of tumor genomes and transcriptomes. Our results demonstrate the power of integrative analysis across multiple tumor types in recently generated population-scale datasets in revealing infrequent, but functionally important, mutations and novel patterns of pathway disruption in cancer. We expect DriverNet to generalize well to planned future studies, including application to patient-specific mutational and expression profiles for genome/transcriptome-informed personalized cancer care.

Methods
In this section we present the essential details of the DriverNet algorithm. Additional details of data analysis, data preprocessing, and the Fisher method are presented in Additional file 1.

Details of DriverNet algorithm
Consider two gene-patient matrices. The first matrix, M, is binary: M(i, j) = 1 indicates that gene i is mutated in patient j and M(i, j) = 0 indicates the absence of a mutation.
Mutations can take the form of somatic point mutations, indels, copy number changes, or possibly epigenomic events. Matrix G(i, j) captures the real-valued gene expression measure of gene i in patient j and can be derived from gene expression arrays or RNA-Seq. Optionally, G(i, j) can be transformed into a matrix G'(i, j) indicating whether gene i in patient j is an outlier from the population-level distribution for that gene. Given these matrices, we can formulate the problem of finding driver mutations with a bipartite graph (Figure 1C), where nodes on the left represent genomic aberration status from M (green nodes show the genes that have a mutation in at least one patient) and nodes on the right are patient-gene events from G or G' (for every patient, outliers are shown as red nodes). Edges are drawn between nodes in different partitions of the graph under the following conditions: for each patient p_k, draw an edge between node g_i in the left partition and node g_j for patient p_k in the right partition if g_i is mutated, g_j exhibits outlying expression, and g_i and g_j interact according to known gene networks (for example, Reactome FI [30]), termed the influence graph after [18].

The aim of the inference algorithm is to identify genes in the left partition that are connected to the most nodes in the right partition (for example, g_2 as shown in Figure 1C), thereby identifying mutated genes with the largest extent of transcriptional disruption, and simultaneously implicating a network of connected genes in the influence graph with outlying expression that associate with the mutation. The genes are ranked according to their node coverage in the bipartite graph. If we denote the set of all the mutated genes by U, we postulate that the top n driver gene set D_n ⊆ U is the set of n genes that cover the maximum number of nodes in the right partition of the bipartite graph.
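The graph construction and coverage ranking described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published R implementation: the function names (build_bipartite, greedy_drivers), the dictionary-based inputs, and the toy gene and patient labels are all hypothetical, and the ranking step uses a simple greedy choice as one way to approximate the maximum-coverage objective.

```python
import random
from collections import defaultdict


def build_bipartite(mutated, outliers, influence):
    """Connect each mutated gene to the outlier events it may explain.

    mutated:   dict patient -> set of mutated genes (the 1-entries of M)
    outliers:  dict patient -> set of genes with outlying expression (G')
    influence: dict gene -> set of interacting genes (influence graph)
    Returns a dict: mutated gene -> set of (patient, gene) outlier events.
    """
    cover = defaultdict(set)
    for patient, genes in mutated.items():
        patient_outliers = outliers.get(patient, set())
        for gi in genes:
            # Edge rule: gi mutated and gj outlying in the SAME patient,
            # and gi, gj linked in the influence graph.
            for gj in influence.get(gi, set()) & patient_outliers:
                cover[gi].add((patient, gj))
    return dict(cover)


def greedy_drivers(cover, seed=0):
    """Rank genes by repeatedly taking the one covering most uncovered events."""
    rng = random.Random(seed)  # hypothetical choice: seeded tie-breaking
    remaining = {g: set(events) for g, events in cover.items()}
    ranked = []
    while True:
        best = max((len(ev) for ev in remaining.values()), default=0)
        if best == 0:  # every connected outlier event is covered
            break
        ties = sorted(g for g, ev in remaining.items() if len(ev) == best)
        g = rng.choice(ties)
        covered = remaining.pop(g)
        ranked.append((g, len(covered)))
        for ev in remaining.values():
            ev -= covered  # drop events that are now explained
    return ranked


# Toy data (hypothetical labels).
mutated = {"p1": {"g1", "g2"}, "p2": {"g2"}, "p3": {"g3"}}
outliers = {"p1": {"a", "b"}, "p2": {"a", "c"}, "p3": {"c"}}
influence = {"g1": {"a"}, "g2": {"a", "b", "c"}, "g3": {"c"}}

cover = build_bipartite(mutated, outliers, influence)
print(greedy_drivers(cover))  # [('g2', 4), ('g3', 1)]
```

In this toy case g1 is never selected: its only outlier event, (p1, a), is already explained by g2, so it adds no coverage.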
It should be noted that: i) due to different factors, not all the outlying expression events may be explained by the given mutations; and ii) the algorithm formulation makes the strong assumption that drivers will modulate the expression of many genes, which will primarily apply for genes that alter large, well-defined transcriptional networks. Finally, we observe that solving this problem is closely related to the minimum set cover problem, which is NP-hard.

A greedy approximation algorithm to solve the optimization problem
Given a set of elements (called the universe) and some sets whose union comprises the universe, the set cover problem is to identify the smallest number of sets whose union still contains all elements in the universe. The analogy of the minimum set cover problem to our driver mutation framework is as follows: i) elements of the universe are the patient-gene (outlying expression) events, and ii) each mutation corresponds to a set that consists of those patient-gene events connected to this mutation in the bipartite graph. The greedy algorithm for our problem is similar to that for the set cover problem: at each stage, choose the mutated gene whose set contains the largest number of uncovered outlying expression events (see Algorithm 1). The stopping condition is when all the connected outlying expression events are covered. In other words, the algorithm looks for the minimum covering for all of the elements in the universe. It can be shown that the greedy algorithm achieves an approximation ratio of H(s), where s is the size of the largest set and $H(n) = \sum_{k=1}^{n} 1/k$ is the nth harmonic number.

Significance tests
The statistical significance of the driver genes is assessed using a randomization framework. The original datasets are permuted N = 500 times, the algorithm is run on the N randomly generated datasets, and the results on real data are assessed to see if they are significantly different from the results on randomized datasets.
This is an indirect way of perturbing the bipartite graph corresponding to the original problem. To generate the random datasets, we permute both the patient-mutation, M, and patient-outlier, G', matrices according to the following procedure: i) construct a J × K zero matrix where J represents the number of patients and K represents the total number of Ensembl 54 protein-coding genes, ii) put 1 in N_total randomly selected cells, where N_total represents either the total number of mutations or the total number of outlying genes depending on which matrix is permuted, iii) remove the columns whose elements are all 0. Using the same influence graph, the algorithm is run on the N = 500 permuted patient-mutation, M_1, ..., M_N, and patient-outlier, G'_1, ..., G'_N, matrices.

Suppose D is the result of the driver mutation discovery algorithm. D contains a ranked list of driver genes with their corresponding node coverage in the bipartite graph. The statistical significance of a gene g ∈ D with a corresponding node coverage, COV_g, is the fraction of times that we observe driver genes with node coverage of more than COV_g in the N = 500 random runs of the algorithm:

$$\mathrm{pvalue}(g) = \frac{\sum_{i=1}^{N} \sum_{j=1}^{S_i} \delta\left[\mathrm{COV}_{g_{ij}} > \mathrm{COV}_g\right]}{\sum_{i=1}^{N} S_i}$$

where S_i is the number of drivers identified in the ith run of the algorithm, g_ij is the jth driver identified in that run, and δ[·] equals 1 when its argument is true and 0 otherwise. We then use the Benjamini-Hochberg approach for correcting the P values for multiple tests.

Building the influence graph
The influence graph captures the knowledge about the influence of mutation in a gene on the change of expression of another gene. Various sources of information such as the protein-protein interaction (PPI) networks or networks based on copy number and/or expression data can be used to build the influence graph. In this paper, we utilize the protein functional interaction network derived in [30] to build the influence graph.
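The empirical P value from the Significance tests section (the fraction of drivers, pooled over all permuted runs, whose node coverage exceeds COV_g) and the subsequent Benjamini-Hochberg correction can be sketched as follows. This is a hedged illustration rather than the package's code: the function names, the dictionary and list inputs, and the toy coverage values are assumptions made for the example.

```python
def empirical_p_values(observed_cov, permuted_cov_runs):
    """Pooled empirical P value per gene.

    observed_cov:      dict gene -> node coverage on the real data (COV_g)
    permuted_cov_runs: list with one entry per permuted run; each entry is
                       the list of coverages of the drivers found in that run
    """
    null = [cov for run in permuted_cov_runs for cov in run]
    total = len(null)  # sum over runs of S_i
    if total == 0:
        raise ValueError("no drivers were found in any permuted run")
    # Strict inequality, as in the formula: count null coverages > COV_g.
    return {g: sum(c > cov for c in null) / total
            for g, cov in observed_cov.items()}


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment; returns dict gene -> adjusted P."""
    ordered = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ordered)
    adjusted = {}
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        gene, p = ordered[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[gene] = running_min
    return adjusted


# Toy values (hypothetical): two observed drivers, three permuted runs.
observed = {"g2": 4, "g3": 1}
runs = [[2, 1], [3, 1, 1], [5]]

raw = empirical_p_values(observed, runs)
print(raw)                      # g2: 1/6, g3: 3/6
print(benjamini_hochberg(raw))  # g2: 1/3, g3: 1/2
```

Because the comparison is strict, a gene whose coverage is at least as large as every permuted coverage receives a P value of exactly 0. The influence graph is held fixed across all permuted runs; here it is built from the protein functional interaction network of [30].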
This network extends the protein functional interaction network in curated pathways with non-curated sources of information, including protein-protein interactions, gene co-expression, protein domain interaction, gene ontology (GO) annotations, and text-mined protein interactions, which cover close to 50% of the human proteome.

Implementation
The DriverNet algorithm is implemented in a publicly available R package [31]. The memory complexity of the greedy algorithm is O(MN + MR + R^2), where M is the number of patients, N is the number of mutated genes, and R is the number of genes that have gene expression values and are also in the influence graph. The algorithm needs memory to hold the patient-mutation matrix, the patient-outlier matrix, and the influence graph. Note that all three matrices are sparse binary matrices, thus the memory usage can be decreased by using a sparse representation of the matrices. If we rank all the mutated genes, the time complexity is O(δ × N(N + 1)/2), where δ is the time used to compute the outliers explained by a gene, which is bounded by its node degree in the influence graph. In practice, the algorithm is fast when the memory usage is low. For example, for the GBM dataset, it takes about 1 minute to run on a dual-core desktop Mac computer without computing the empirical P values.

Additional material
Additional file 1: Supplementary text.
Additional file 2: Data analysis workflow.
Additional file 3: Ranked list of candidate driver genes using the Youn-Simon approach for the GBM2 dataset. rank: rank of the gene, hgnc_symbol: gene symbol, p.value: P value, p.adjust: adjusted P value using the Benjamini-Hochberg approach.
Additional file 4: Ranked list of candidate driver genes for the GBM2 dataset.
rank: rank of the gene according to DriverNet, gene: gene symbol, gband: gene chromosome location and gene band, SNV.Indel: number of cases with SNV or indel in that specific gene, HLAMP: number of cases with copy number high-level amplifications, AMP: number of cases with copy number amplifications, HOMD: number of cases with copy number homozygous deletions, HETD: number of cases with copy number hemizygous deletions, covered events: the number of events (edges) connected to the gene on the left of the bipartite graph, node degree: the number of genes connected to the gene of interest in the influence graph, p.value: P value corrected for multiple testing using the Benjamini-Hochberg approach, CGC.status: Cancer Gene Census (CGC) membership status (1 = found in CGC, 0 = not in CGC), percentage.event: percentage of cases with genomic aberrations in the gene of interest, p.way: top pathways associated with outlying genes (posterior probability > 0.8); numbers in parentheses show the posterior probability.

Algorithm 1 Greedy driver gene selection algorithm
Require: B = (U, V, E) be the bipartite graph, where U denotes the set of nodes corresponding to mutated genes, V denotes the set of nodes corresponding to the patient-specific outlying expression events, and E denotes the set of edges between U and V
1: D ← ∅ //the set of selected driver genes
2: Z ← number of nodes in V with at least one edge in E //the number of all the connected outlying expression events
3: z ← 0 //the number of covered outlying expression events so far
4: while z < Z do
5:   g ← the node in U with the highest degree in B //pick mutated gene with the highest degree; in case of a tie, randomly pick one of the genes
6:   z ← z + degree of g in B //update the number of covered outlying events
7:   D ← D ∪ {g} //add g to the driver set
8:   S ← the set of nodes in V connected to g
9:   for g' ∈ S do
10:    remove the node g' and its connected edges from B
11:  end for
12: end while
13: return D

Additional file 5: Ranked list of candidate driver genes using the Youn-Simon approach for the TN2 dataset.
rank: rank of the gene, hgnc_symbol: gene symbol, p.value: P value, p.adjust.BH: adjusted P value using the Benjamini-Hochberg approach.
Additional file 6: Ranked list of candidate driver genes for the TN2 dataset. rank: rank of the gene according to DriverNet, gene: gene symbol, gband: gene chromosome location and gene band, SNV.Indel: number of cases with SNV or indel in that specific gene, HLAMP: number of cases with copy number high-level amplifications, AMP: number of cases with copy number amplifications, HOMD: number of cases with copy number homozygous deletions, HETD: number of cases with copy number hemizygous deletions, covered events: the number of events (edges) connected to the gene on the left of the bipartite graph, node degree: the number of genes connected to the gene of interest in the influence graph, p.value: P value corrected for multiple testing using the Benjamini-Hochberg approach, CGC.status: Cancer Gene Census (CGC) membership status (1 = found in CGC, 0 = not in CGC), percentage.event: percentage of cases with genomic aberrations in the gene of interest, p.way: top pathways associated with outlying genes (posterior probability > 0.8); numbers in parentheses show the posterior probability.
Additional file 7: Ranked list of candidate driver genes using the Youn-Simon approach for the HGS2 dataset. rank: rank of the gene, hgnc_symbol: gene symbol, p.value: P value, p.adjust: adjusted P value using the Benjamini-Hochberg approach.
Additional file 8: Ranked list of candidate driver genes for the HGS2 dataset.
rank: rank of the gene according to DriverNet, gene: gene symbol, gband: gene chromosome location and gene band, SNV.Indel: number of cases with SNV or indel in that specific gene, HLAMP: number of cases with copy number high-level amplifications, AMP: number of cases with copy number amplifications, HOMD: number of cases with copy number homozygous deletions, HETD: number of cases with copy number hemizygous deletions, covered events: the number of events (edges) connected to the gene on the left of the bipartite graph, node degree: the number of genes connected to the gene of interest in the influence graph, p.value: P value corrected for multiple testing using the Benjamini-Hochberg approach, CGC.status: Cancer Gene Census (CGC) membership status (1 = found in CGC, 0 = not in CGC), percentage.event: percentage of cases with genomic aberrations in the gene of interest, p.way: top pathways associated with outlying genes (posterior probability > 0.8); numbers in parentheses show the posterior probability.
Additional file 9: Ranked list of candidate driver genes for the METABRIC dataset.
rank: rank of the gene according to DriverNet, gene: gene symbol, gband: gene chromosome location and gene band, SNV.Indel: number of cases with SNV or indel in that specific gene, HLAMP: number of cases with copy number high-level amplifications, AMP: number of cases with copy number amplifications, HOMD: number of cases with copy number homozygous deletions, HETD: number of cases with copy number hemizygous deletions, covered events: the number of events (edges) connected to the gene on the left of the bipartite graph, node degree: the number of genes connected to the gene of interest in the influence graph, p.value: P value corrected for multiple testing using the Benjamini-Hochberg approach, CGC.status: Cancer Gene Census (CGC) membership status (1 = found in CGC, 0 = not in CGC), percentage.event: percentage of cases with genomic aberrations in the gene of interest, p.way: top pathways associated with outlying genes (posterior probability > 0.8); numbers in parentheses show the posterior probability.
Additional file 10: Figure showing the SNVs/indels, homozygous deletion (HOMD), and high-level amplification (HLAMP) status across the patients for the top 190 candidate driver genes (ranked from top to bottom) for the METABRIC dataset. Genes with P values ≤ 0.05 are shown. Red blocks show HLAMPs and blue blocks show HOMDs for each case.
Additional file 11: Ranked list of candidate driver genes for the HGS dataset.
rank: rank of the gene according to DriverNet, gene: gene symbol, gband: gene chromosome location and gene band, SNV.Indel: number of cases with SNV or indel in that specific gene, HLAMP: number of cases with copy number high-level amplifications, AMP: number of cases with copy number amplifications, HOMD: number of cases with copy number homozygous deletions, HETD: number of cases with copy number hemizygous deletions, covered events: the number of events (edges) connected to the gene on the left of the bipartite graph, node degree: the number of genes connected to the gene of interest in the influence graph, p.value: P value corrected for multiple testing using the Benjamini-Hochberg approach, CGC.status: Cancer Gene Census (CGC) membership status (1 = found in CGC, 0 = not in CGC), percentage.event: percentage of cases with genomic aberrations in the gene of interest, p.way: top pathways associated with outlying genes (posterior probability > 0.8); numbers in parentheses show the posterior probability.
Additional file 12: Figure showing the SNVs/indels, homozygous deletion (HOMD), and high-level amplification (HLAMP) status across the patients for the top 144 candidate driver genes (ranked from top to bottom) for the HGS dataset. Genes with P values ≤ 0.05 are shown. Green blocks show SNVs or indels, red blocks show HLAMPs, and blue blocks show HOMDs for each case.
Additional file 13: Ranked list of candidate driver genes for the TN dataset.
rank: rank of the gene according to DriverNet, gene: gene symbol, gband: gene chromosome location and gene band, SNV.Indel: number of cases with SNV or indel in that specific gene, HLAMP: number of cases with copy number high-level amplifications, AMP: number of cases with copy number amplifications, HOMD: number of cases with copy number homozygous deletions, HETD: number of cases with copy number hemizygous deletions, covered events: the number of events (edges) connected to the gene on the left of the bipartite graph, node degree: the number of genes connected to the gene of interest in the influence graph, p.value: P value corrected for multiple testing using the Benjamini-Hochberg approach, CGC.status: Cancer Gene Census (CGC) membership status (1 = found in CGC, 0 = not in CGC), percentage.event: percentage of cases with genomic aberrations in the gene of interest, p.way: top pathways associated with outlying genes (posterior probability > 0.8); numbers in parentheses show the posterior probability.
Additional file 14: Figure showing the SNVs/indels, homozygous deletion (HOMD), and high-level amplification (HLAMP) status across the patients for the top 50 candidate driver genes (ranked from top to bottom) for the TN dataset. Genes with P values ≤ 0.05 are shown. Green blocks show SNVs or indels, red blocks show HLAMPs, and blue blocks show HOMDs for each case.
Additional file 15: Ranked list of candidate driver genes for the GBM dataset.
rank: rank of the gene according to DriverNet, gene: gene symbol, gband: gene chromosome location and gene band, SNV.Indel: number of cases with SNV or indel in that specific gene, HLAMP: number of cases with copy number high-level amplifications, AMP: number of cases with copy number amplifications, HOMD: number of cases with copy number homozygous deletions, HETD: number of cases with copy number hemizygous deletions, covered events: the number of events (edges) connected to the gene on the left of the bipartite graph, node degree: the number of genes connected to the gene of interest in the influence graph, p.value: P value corrected for multiple testing using the Benjamini-Hochberg approach, CGC.status: Cancer Gene Census (CGC) membership status (1 = found in CGC, 0 = not in CGC), percentage.event: percentage of cases with genomic aberrations in the gene of interest, p.way: top pathways associated with outlying genes (posterior probability > 0.8); numbers in parentheses show the posterior probability.
Additional file 16: Figure showing the SNVs/indels, homozygous deletion (HOMD), and high-level amplification (HLAMP) status across the patients for the top 49 candidate driver genes (ranked from top to bottom) for the GBM dataset. Genes with P values ≤ 0.05 are shown.
Green blocks show SNVs or indels, red blocks show HLAMPs, and blue blocks show HOMDs for each case.
Additional file 17: Circos plots showing outlying genes related to metabolic pathways for PNMT (A), NDUFC2 (B), and MTAP (C) and outlying genes related to oncogenic/tumor suppressor pathways for ERBB2 (D), PAK1 (E), and CDKN2A (F) genes.
Additional file 18: Frequency of aberrations versus the rank of significant genes (p ≤ 0.05) for the GBM (A), HGS (B), TN (C), and METABRIC (D) datasets.
Additional file 19: Node degree in the influence graph versus the rank of significant genes (p ≤ 0.05) for the GBM (A), HGS (B), TN (C), and METABRIC (D) datasets.
Additional file 20: DriverNet performance benchmarking on GBM, TN, HGS, and METABRIC datasets when copy number amplifications (AMP) and hemizygous deletions (HETDs) were included in addition to the high-level amplifications (HLAMP) and homozygous deletions (HOMDs). (A-D) Concordance with Cancer Gene Census for DriverNet, Frequency-based, and Fisher-based approaches as a function of the top N ranked genes (out of 200) for the GBM, TN, HGS, and METABRIC datasets, respectively. (E-H) Concordance with COSMIC database (cumulative distribution of mutation prevalence in the COSMIC database) for DriverNet, Frequency-based, and Fisher-based approaches as a function of the top N ranked genes (out of 200) for the GBM, TN, HGS, and METABRIC datasets, respectively.
Additional file 21: Ranked list of candidate driver genes for the METABRIC dataset when copy number amplifications and hemizygous deletions were included in addition to the mutations, high-level amplifications, and homozygous deletions.
rank: rank of the gene according to DriverNet, gene: gene symbol, gband: gene chromosome location and gene band, SNV.Indel: number of cases with SNV or indel in that specific gene, HLAMP: number of cases with copy number high-level amplifications, AMP: number of cases with copy number amplifications, HOMD: number of cases with copy number homozygous deletions, HETD: number of cases with copy number hemizygous deletions, covered events: the number of events (edges) connected to the gene on the left of the bipartite graph, node degree: the number of genes connected to the gene of interest in the influence graph, p.value: P value corrected for multiple testing using the Benjamini-Hochberg approach, CGC.status: Cancer Gene Census (CGC) membership status (1 = found in CGC, 0 = not in CGC), percentage.event: percentage of cases with genomic aberrations in the gene of interest, p.way: top pathways associated with outlying genes (posterior probability > 0.8); numbers in parentheses show the posterior probability.
Additional file 22: Ranked list of candidate driver genes for the HGS dataset when copy number amplifications and hemizygous deletions were included in addition to the mutations, high-level amplifications, and homozygous deletions.
rank: rank of the gene according to DriverNet, gene: gene symbol, gband: gene chromosome location and gene band, SNV.Indel: number of cases with SNV or indel in that specific gene, HLAMP: number of cases with copy number high-level amplifications, AMP: number of cases with copy number amplifications, HOMD: number of cases with copy number homozygous deletions, HETD: number of cases with copy number hemizygous deletions, covered events: the number of events (edges) connected to the gene on the left of the bipartite graph, node degree: the number of genes connected to the gene of interest in the influence graph, p.value: P value corrected for multiple testing using the Benjamini-Hochberg approach, CGC.status: Cancer Gene Census (CGC) membership status (1 = found in CGC, 0 = not in CGC), percentage.event: percentage of cases with genomic aberrations in the gene of interest, p.way: top pathways associated with outlying genes (posterior probability > 0.8); numbers in parentheses show the posterior probability.
Additional file 23: Ranked list of candidate driver genes for the TN dataset when copy number amplifications and hemizygous deletions were included in addition to the mutations, high-level amplifications, and homozygous deletions.
rank: rank of the geneaccording to DriverNet, gene: gene symbol, gband: gene chromosomelocation and gene band, SNV.Indel: number of cases with SNV or indel inthat specific gene, HLAMP: number of cases with copy number high-level amplifications, AMP: number of cases with copy numberamplifications, HOMD: number of cases with copy number homozygousdeletions, HETD: number of cases with copy number hemizygousdeletions, covered events: the number of events (edges) connected tothe gene on the left of the bipartite graph, node degree: the number ofgenes connected to the gene of interest in the influence graph, p.value:P value corrected for the multiple test using the Benjamini-Hochbergapproach, CGC.status: Cancer Gene Census (CGC) membership status (1 =found in CGC, 0 = not in CGC), percentage.event: percentage of caseswith genomic aberrations in the gene of interest, p.way: top pathwaysassociated with outlying genes (posterior probability > 0.8); numbers inparentheses show the posterior probability.Additional file 24: Ranked list of candidate driver genes for theGBM dataset when copy number amplifications and hemizygousdeletions were included in addition to the mutations, high-levelamplifications, and homozygous deletions. 
rank: rank of the gene according to DriverNet, gene: gene symbol, gband: gene chromosome location and gene band, SNV.Indel: number of cases with SNV or indel in that specific gene, HLAMP: number of cases with copy number high-level amplifications, AMP: number of cases with copy number amplifications, HOMD: number of cases with copy number homozygous deletions, HETD: number of cases with copy number hemizygous deletions, covered events: the number of events (edges) connected to the gene on the left of the bipartite graph, node degree: the number of genes connected to the gene of interest in the influence graph, p.value: P value corrected for multiple testing using the Benjamini-Hochberg approach, CGC.status: Cancer Gene Census (CGC) membership status (1 = found in CGC, 0 = not in CGC), percentage.event: percentage of cases with genomic aberrations in the gene of interest, p.way: top pathways associated with outlying genes (posterior probability > 0.8); numbers in parentheses show the posterior probability.

Abbreviations
AMP: amplifications; CGC: cancer gene census; CNA: copy number alteration; COSMIC: catalogue of somatic mutations in cancer datasets; GBM: glioblastoma multiforme; GO: gene ontology; HETD: hemizygous deletion; HLAMP: high-level amplification; HOMD: homozygous deletion; MTAP: methylthioadenosine phosphorylase; NGS: next generation sequencing; PPI: protein-protein interaction; TCGA: the cancer genome atlas.

Authors’ contributions
SS was responsible for the project’s conception and oversight. GH, AB, and SS designed and/or implemented different parts of the research plan and wrote the manuscript. AB, GH, JD, GaH, and JR conducted the analyses of the data. KL, GH, AB, and JD contributed to the R package. CC and SA are METABRIC project leaders. SA is TN sequencing project leader. SA and DH contributed to the project’s conception.
All authors read and approved the final manuscript.

Acknowledgements
Technical support is acknowledged from the Centre for Translational Genomics (CTAG), the Michael Smith Genome Sciences Centre technical group. This work was supported by the BC Cancer Foundation, Canadian Breast Cancer Foundation (BC Yukon) (SA, SS), Eli-Lilly Canada (AB), Michael Smith Foundation for Health Research (SS), and the Canadian Cancer Society (SS) (grant no. 2012-701125).

Author details
1 Department of Molecular Oncology, British Columbia Cancer Agency, 675 West 10th Avenue, Vancouver, BC, V5Z 1L3, Canada.
2 Faculty of Information Technology, Monash University, Wellington Road, Clayton, VIC 3800, Australia.
3 Department of Computer Science, University of British Columbia, 2366 Main Mall, Vancouver, BC, V6T 1Z4, Canada.
4 Bioinformatics Training Program, University of British Columbia, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada.
5 Department of Pathology and Laboratory Medicine, University of British Columbia, 2211 Wesbrook Mall, Vancouver, BC, V6T 2B5, Canada.
6 Centre for Translational and Applied Genomics, BC Cancer Agency, 600 West 10th Avenue, Vancouver, BC, V5Z 4E6, Canada.
7 Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.

Received: 1 August 2012. Revised: 19 November 2012. Accepted: 22 December 2012. Published: 22 December 2012.

References
1. Cancer Genome Atlas Research Network: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 2008, 455:1061-1068.
2. Prahallad A, Sun C, Huang S, Di Nicolantonio F, Salazar R, Zecchin D, Beijersbergen RL, Bardelli A, Bernards R: Unresponsiveness of colon cancer to BRAF(V600E) inhibition through feedback activation of EGFR. Nature 2012, 483:100-103.
3. Greenman C, Wooster R, Futreal PA, Stratton MR, Easton DF: Statistical analysis of pathogenicity of somatic mutations in cancer. Genetics 2006, 173:2187-2198.
4. Getz G, Hofling H, Mesirov JP, Golub TR, Meyerson M, Tibshirani R, Lander ES: Comment on: the consensus coding sequences of human breast and colorectal cancers. Science 2007, 317:1500-1500 [http://www.hubmed.org/display.cgi?uids=17872428].
5. Youn A, Simon R: Identifying cancer driver genes in tumor genome sequencing studies. Bioinformatics 2011, 27:175-181.
6. Parsons DW, Jones S, Zhang X, Lin JC, Leary RJ, Angenendt P, Mankoo P, Carter H, Siu IM, Gallia GL, Olivi A, McLendon R, Rasheed BA, Keir S, Nikolskaya T, Nikolsky Y, Busam DA, Tekleab H, Diaz LA, Hartigan J, Smith DR, Strausberg RL, Marie SK, Shinjo SM, Yan H, Riggins GJ, Bigner DD, Karchin R, Papadopoulos N, Parmigiani G, et al: An integrated genomic analysis of human glioblastoma multiforme. Science 2008, 321:1807-1812.
7. Cancer Genome Atlas Research Network: Integrated genomic analyses of ovarian carcinoma. Nature 2011, 474:609-615.
8. Hammerman PS, Hayes DN, Wilkerson MD, Schultz N, Bose R, Chu A, Collisson EA, Cope L, Creighton CJ: Comprehensive genomic characterization of squamous cell lung cancers. Nature 2012, 489:519-525.
9. Muzny DM, Bainbridge MN, Chang K, Dinh HH, Drummond JA, Fowler G, Kovar CL, Lewis LR, Morgan MB, Newsham IF: Comprehensive molecular characterization of human colon and rectal cancer. Nature 2012, 487:330-337.
10. Koboldt DC, Fulton RS, McLellan MD, Schmidt H, Kalicki-Veizer J, McMichael JF, Fulton LL, Dooling DJ, Ding L, Mardis ER: Comprehensive molecular portraits of human breast tumours. Nature 2012, 490:61-70.
11. Shah S, Roth A, Goya R, Oloumi A, Ha G, Zhao Y, Turashvili G, Ding J, Tse K, Haffari G, Bashashati A, Prentice L, Khattra J, Burleigh A, Yap D, Bernard V, McPherson A, Shumansky K, Crisan A, Giuliany R, Heravi-Moussavi A, Rosner J, Lai D, Birol I, Varhol R, Tam A, Dhalla N, Zeng T, Ma K, Chan S, et al: The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature 2012, 486:395-399.
12. Vaske C, Benz S, Sanborn J, Earl D, Szeto C, Zhu J, Haussler D, Stuart J: Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics 2010, 26:i237-i245.
13. Vandin F, Upfal E, Raphael B: De novo discovery of mutated driver pathways in cancer. Genome Res 2012, 22:375-385.
14. Masica DL, Karchin R: Correlation of somatic mutation and expression identifies genes important in human glioblastoma progression and survival. Cancer Res 2011, 71:4550-4561.
15. Ciriello G, Cerami E, Sander C, Schultz N: Mutual exclusivity analysis identifies oncogenic network modules. Genome Research 2012, 22:398-406.
16. Akavia UD, Litvin O, Kim J, Sanchez-Garcia F, Kotliar D, Causton HC, Pochanard P, Mozes E, Garraway LA, Pe’er D: An integrated approach to uncover drivers of cancer. Cell 2010, 143:1005-1017.
17. Nelander S, Wang W, Nilsson B, She QB, Pratilas C, Rosen N, Gennemark P, Sander C: Models from experiments: combinatorial drug perturbations of cancer cells. Molecular Systems Biology 2008, 4.
18. Vandin F, Upfal E, Raphael B: Algorithms for detecting significantly mutated pathways in cancer. Journal of Computational Biology 2011, 18:507-522.
19. TCGA data portal. [https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm].
20. Curtis C, Shah SP, Chin SF, Turashvili G, Rueda OM, Dunning MJ, Speed D, Lynch AG, Samarajiwa S, Yuan Y, Graf S, Ha G, Haffari G, Bashashati A, Russell R, McKinney S, the METABRIC GROUP, Langerod A, Green A, Provenzano E, Wishart G, Pinder S, Watson P, Markowetz F, Murphy L, Ellis I, Purushotham A, Borresen-Dale AL, Brenton JD, Tavare S, et al: The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012, 486:346-352.
21. Futreal P, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton M: A census of human cancer genes. Nature Reviews Cancer 2004, 4:177-183.
22. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, Teague JW, Campbell PJ, Stratton MR, Futreal PA: COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Research 2011, 39:D945-D950.
23. Clevenger CV: Role of prolactin/prolactin receptor signaling in human breast cancer. Breast Dis 2003, 18:75-86.
24. Neilson LM, Zhu J, Xie J, Malabarba MG, Sakamoto K, Wagner KU, Kirken RA, Rui H: Coactivation of janus tyrosine kinase (Jak)1 positively modulates prolactin-Jak2 signaling in breast cancer: recruitment of ERK and signal transducer and activator of transcription (Stat)3 and enhancement of Akt and Stat5a/b pathways. Mol Endocrinol 2007, 21:2218-2232.
25. Kobel M, Kalloger SE, Boyd N, McKinney S, Mehl E, Palmer C, Leung S, Bowen NJ, Ionescu DN, Rajput A, Prentice LM, Miller D, Santos J, Swenerton K, Gilks CB, Huntsman D: Ovarian carcinoma subtypes are different diseases: implications for biomarker studies. PLoS Med 2008, 5:e232.
26. Cheung HW, Cowley GS, Weir BA, Boehm JS, Rusin S, Scott JA, East A, Ali LD, Lizotte PH, Wong TC, Jiang G, Hsiao J, Mermel CH, Getz G, Barretina J, Gopal S, Tamayo P, Gould J, Tsherniak A, Stransky N, Luo B, Ren Y, Drapkin R, Bhatia SN, Mesirov JP, Garraway LA, Meyerson M, Lander ES, Root DE, Hahn WC: Systematic investigation of genetic vulnerabilities across cancer cell lines reveals lineage-specific dependencies in ovarian cancer. Proceedings of the National Academy of Sciences 2011 [http://www.pnas.org/content/early/2011/07/06/1109363108.abstract].
27. Sykes SM, Lane SW, Bullinger L, Kalaitzidis D, Yusuf R, Saez B, Ferraro F, Mercier F, Singh H, Brumme KM, Acharya SS, Scholl C, Scholl C, Tothova Z, Attar EC, Frohling S, DePinho RA, Armstrong SA, Gilliland DG, Scadden DT: AKT/FOXO signaling enforces reversible differentiation blockade in myeloid leukemias. Cell 2011, 146:697-708.
28. Cairns R, Harris I, Mak T: Regulation of cancer cell metabolism. Nature Reviews Cancer 2011, 11:85-95.
29. Holland DG, Burleigh A, Git A, Goldgraben MA, Perez-Mancera PA, Chin SF, Hurtado A, Bruna A, Ali HR, Greenwood W, Dunning MJ, Samarajiwa S, Menon S, Rueda OM, Lynch AG, McKinney S, Ellis IO, Eaves CJ, Carroll JS, Curtis C, Aparicio S, Caldas C: ZNF703 is a common luminal B breast cancer oncogene that differentially regulates luminal and basal progenitors in human mammary epithelium. EMBO Molecular Medicine 2011, 3:167-180.
30. Wu G, Feng X, Stein L: A human functional protein interaction network and its application to cancer data analysis. Genome Biology 2010, 11:R53.
31. DriverNet algorithm. [http://bioconductor.org/packages/2.12/bioc/html/DriverNet.html].
32. Merico D, Isserlin R, Stueker O, Emili A, Bader GD: Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS ONE 2010, 5:e13984.

doi:10.1186/gb-2012-13-12-r124
Cite this article as: Bashashati et al.: DriverNet: uncovering the impact of somatic driver mutations on transcriptional networks in cancer. Genome Biology 2012 13:R124.
