McPherson et al. Genome Biology (2017) 18:140
DOI 10.1186/s13059-017-1267-2

METHOD Open Access

ReMixT: clone-specific genomic structure estimation in cancer

Andrew W. McPherson1,2, Andrew Roth3,4, Gavin Ha5,6, Cedric Chauve7, Adi Steif1, Camila P. E. de Souza1,2, Peter Eirew1, Alexandre Bouchard-Côté8, Sam Aparicio1,2, S. Cenk Sahinalp9,10 and Sohrab P. Shah1,2*

Abstract
Somatic evolution of malignant cells produces tumors composed of multiple clonal populations, distinguished in part by rearrangements and copy number changes affecting chromosomal segments. Whole genome sequencing mixes the signals of sampled populations, diluting the signals of clone-specific aberrations, and complicating estimation of clone-specific genotypes. We introduce ReMixT, a method to unmix tumor and contaminating normal signals and jointly predict mixture proportions, clone-specific segment copy number, and clone specificity of breakpoints. ReMixT is free, open-source software and is available at http://bitbucket.org/dranew/remixt.

Keywords: Cancer genomics, DNA sequencing, Tumour heterogeneity, Genomic rearrangement, Copy number variation

Background
Chromosomal rearrangements pattern the genomes of cancer cells. Owing to various forms of DNA repair deficiency, such structural variations accumulate on cell division, leading to genome instability in the life histories of cancer cells. Coupled with evolutionary selection and clonal expansion, genomic instability and consequent segmental aneuploidies mark expanded cell populations within a tumour, forming important components of their genotypes. Within each tumour, branched evolution produces mixed populations of tumour cells with ancestrally related, but divergent chromosomal structures.

Accurate detection and quantification of genomic structural changes in a population of cancer cells as measured by bulk, whole genome sequencing (WGS) remains a significant computational challenge.
The process of DNA extraction from a tumour sample pools and admixes molecules from the input material without labelling the assignment of DNA to its parent cell. The resulting sequencing data represent a randomly sampled subset of DNA fragments from the admixed pool, leaving the problem of unmixing the structural rearrangements which mark the constituent clones in the input material. The key difficulty of the problem is that the admixed pool dilutes the signal of genomic rearrangements and copy number alterations in the data, often to a level approaching that of the experimental noise.

*Correspondence: sshah@bccrc.ca
1 Department of Molecular Oncology, BC Cancer Agency, 675 West 10th Avenue, Vancouver, BC, Canada
2 Department of Pathology and Laboratory Medicine, University of British Columbia, 2329 West Mall, Vancouver, BC, Canada
Full list of author information is available at the end of the article

© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Rearrangements and copy number changes are intrinsically linked, with unbalanced rearrangements producing changes in copy number, and loss or gain of rearranged chromosomes resulting in segment-specific copy changes. Rearrangement breakpoints representing tumour-specific adjacencies can be predicted with reasonable accuracy from WGS data using a variety of tools [1–4]. However, existing methods for copy number analysis do not consider tumour-specific adjacencies, and instead model segments as adjacent only if they are adjacent in the reference genome [5–9]. This results in only partial ability to leverage the spatially correlated nature of the data to borrow statistical strength.

We propose that breakpoints provide the potential for a more comprehensive model of genome structure. Knowledge of long-range connectivity between segments of a cancer genome provides the opportunity to simultaneously analyse breakpoints and copy number in a unified model and to reconstruct the true genomic topology. Integrating both copy number and breakpoints also provides additional information about each breakpoint: whether the breakpoint is real or a false positive, the prevalence of the breakpoint in the clone mixture, and the number of chromosomes harbouring the breakpoint per clone. A natural hypothesis then emerges: a comprehensive model of genome structure will improve both copy number inference and biological interpretation through reconstructed tumour genomes.

Some progress has been made on more comprehensive modelling of genome structure in tumour clones. Mahmoody et al. [10] propose an algorithm to infer missing adjacencies in a mixture of rearranged tumour genomes; however, they do not model copy number. Zerbino et al. [11] propose a framework for sampling from the rearrangement history of tumour genomes. Oesper et al. [12] propose PREGO, a method for inferring the copy number of segments and breakpoints using a genome graph-based approach, though they do not model normal contamination or tumour heterogeneity, limiting applicability of their method to real tumour data. More recently, Li et al.
[13] formulate a Markov random field model of allele-specific copy number change and apply their method, Weaver, to samples harbouring a single tumour clone and contaminating normal cells.

We propose ReMixT, a method for jointly inferring clone mixture proportions, clone- and allele-specific segment copy numbers, and clone-specific breakpoint copy number from WGS data. We formulate the problem as a posterior inference problem on a probabilistic graphical model. Our model captures the spatial correlation both between segments that are adjacent in the reference genome and between segments that are adjacent in the tumour genome as nominated by predicted breakpoints. We describe an algorithmic solution using structured variational inference. Importantly, our algorithm is similar in complexity to a breakpoint-naive hidden Markov model (HMM) of segment copy number. We leverage haplotype blocks to more accurately measure allele-specific read counts and infer allele-specific copy number for each clone.

We assert that joint inference of all three features of genome sequencing described above will result in more accurate prediction compared to independent inference. Knowledge of rearrangement breakpoints will prevent the smoothing over of copy number changes produced by true rearrangements. Incorrect smoothing of highly rearranged chromosomes may have detrimental effects on the estimation of mixing proportions and variance parameters, as the model would be forced to compensate for an unexpected increase or decrease in read depth across the smoothed chromosomes.
Finally, post hoc prediction of rearrangement breakpoint copy number based on segment copy number may fail if the exact locations of associated copy number transitions are not identified, particularly for rearrangements present in a minor fraction of clones.

We show using simulations that a more complete model of genome structure that includes breakpoint information results in improved inference of mixture proportion and segment copy number over an otherwise equivalent HMM combined with post hoc annotation. Performance improvements are most dramatic when the proportion of one clone is small. We benchmark ReMixT against TITAN [5], THetA2 [14], Battenberg [8], and CloneHD [7] using a novel framework for generating realistic partially simulated WGS datasets from an existing WGS dataset. As further validation, we applied ReMixT to four primary tumour samples from a patient with high-grade serous ovarian cancer (HGSOvCa) and performed single cell breakpoint sequencing on a subset of the clone-specific breakpoints. Next we applied ReMixT to a primary breast cancer sample and its derived mouse xenograft samples, recapitulating previously described [15] clonal dynamics identified using deep sequencing of single nucleotide variants (SNVs). Finally, we analysed two HGSOvCa cell lines, providing examples of how ReMixT-predicted clone-specific breakpoints can phase disparate subclonal genomic regions into partial tumour chromosomes towards fully reconstructing clone-specific cancer genomes.

Results
The ReMixT model of genome structure
We consider the problem of predicting segment and breakpoint copy number given WGS data from tumour and matched normal samples. Assume as input a set of alignments of uniquely mapped concordant reads and a set of putative breakpoints predicted from discordant reads. Given $N$ segments indexed by $n$, $n \in \{1 \ldots N\}$; $K$ breakpoints indexed by $k$, $k \in \{1 \ldots K\}$; and assuming $M$ clones indexed by $m$, $m \in \{1 \ldots M\}$, we aim to predict the following:

1. Mixture proportions of tumour clones and normal cells $\rho_m$
2. Clone- and allele-specific copy numbers of genomic segments $c_{nm}$
3. Clone-specific copy number of rearrangement breakpoints $b_{km}$

Data preprocessing
Preprocessing of tumour WGS data produces measured total and allele-specific read counts for a set of genomic segments in addition to tumour-specific adjacencies between those segments. First, the genome is partitioned into regular length segments, with segments containing the breakends of input breakpoints further partitioned such that each breakend coincides with a segment boundary. Total read counts are obtained by counting the number of uniquely aligned paired-end reads fully contained within each segment. Next, haplotype blocks are predicted from single nucleotide polymorphisms (SNPs) using shapeit2 [16] and a 1000 Genomes reference panel. Reads containing heterozygous SNPs are assigned to haplotype blocks, and haplotype block counts are aggregated within segments, resulting in per-segment allele-specific read counts. GC and mappability biases contribute significant variance to segment read counts. We use a position-specific model [17] to calculate a bias-adjusted effective length for each segment, where segments with shorter effective lengths are statistically less well represented by read counts. For visualization purposes, we calculate raw major and minor copy numbers for each segment from observed depths and allele ratios and inferred normal and tumour depth. Additional details are provided in Additional file 1: Sections 1.1 and 1.2.

Probabilistic model
We propose a probabilistic model of genome structure and a structured variational inference algorithm for calculating the optimal clone mixture and segment and breakpoint copy number (Fig. 1). Below we focus on a model of total copy number and defer the details of the allele-specific model and modelling of outliers to Additional file 1: Section 1.3.
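As an aside on the preprocessing described above, the segmentation step (regular-length segments, further split so that every breakend falls on a segment boundary) can be illustrated with a minimal sketch. The function name `segment_chromosome`, the 0-based half-open coordinate convention, and the treatment of each breakend as a single integer position are assumptions made for illustration; they are not taken from the ReMixT software.

```python
def segment_chromosome(chrom_length, segment_length, breakends):
    """Partition [0, chrom_length) into half-open (start, end) segments.

    Hypothetical helper: boundaries are placed every `segment_length` bases,
    and every breakend position strictly inside the chromosome is added as an
    extra boundary so that each breakend coincides with a segment boundary.
    """
    boundaries = set(range(0, chrom_length, segment_length))
    boundaries.add(chrom_length)
    boundaries.update(p for p in breakends if 0 < p < chrom_length)
    ordered = sorted(boundaries)
    return list(zip(ordered[:-1], ordered[1:]))


# A 1000 bp toy chromosome, 400 bp regular segments, one breakend at 550.
segments = segment_chromosome(1000, 400, [550])
print(segments)
```

In this toy example the regular segment covering positions 400 to 800 is split at the breakend, so a copy number transition caused by the breakpoint can be represented exactly at a segment boundary.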
Let $p(x|c, h, l, \theta)$ be the likelihood of observed total read count $x$ given per clone segment copy number $c$, segment length $l$, global likelihood parameters $\theta$, and per clone haploid read depths $h$. The haploid read depths encode both the mixture and depth of sequencing and are specified as reads per nucleotide for a single copy of a segment. The expected read count $\mu_n$ of segment $n$ is a linear combination of the segment length, clone-specific copy number, and clone-specific haploid read depth, summed over clones (Eq. 1):

$$\mu_n = l_n \sum_m h_m c_{nm} \tag{1}$$

A reasonable starting point is to assume read counts are Poisson distributed [18] ($x_n \sim \mathrm{Pois}(\mu_n)$); however, we show in Additional file 1: Section 1.2.3 that a two-component negative binomial mixture provides a significantly better fit to real data.

Fig. 1 An overview of the ReMixT method. a) Bulk sequencing is applied to a mixture of cells modeled as a set of clones of unknown proportion each with distinct sets of chromosomes with unknown structure. b) Observed data include binned read counts per segment, and rearrangement breakpoints connecting segment ends. c) The ReMixT graphical model as a factor graph. d) Calculation of the transition factor involves calculating the number of telomeres t, the number of segment ends left unconnected to another segment end in the model

Let $p(C, B|O, \lambda)$ be the joint probability of segment and breakpoint copy number ($C$ and $B$ respectively) given breakend orientations $O$. We assume the copy numbers of a sequence of segments have the Markov property given breakpoint copy number, and represent the resulting chain structure as a product of un-normalized transition factors (see footnote 1). A breakpoint with a breakend interposed between two segments will result in a copy number transition between those segments.
For instance, a transition in copy number is expected between two segments to either side of the start of a deletion, with the difference in segment copy number equal to the number of chromosomes harbouring the deletion event, or equivalently, the number of copies of the deletion breakpoint. A mismatch in segment and breakpoint copy number implies that at least one segment end is left disconnected (Fig. 1d). We call these free ends telomeres, and define the transition factors of our probability model in terms of the number of telomeres $t$ implied by the segment and breakpoint copy number. Without a breakpoint, the number of telomeres is simply the absolute difference in copy number between adjacent segments, $t(c, c') = |c - c'|$. Depending on its orientation, a positive copy number for a breakpoint may explain some or all of the difference in copy number between adjacent segments. The number of telomeres at a transition coincident with a breakpoint can thus be calculated as $t(c, c', b|o) = |c - c' - o \cdot b|$, with orientation $o \in \{-1, +1\}$. For multiple clones, $t$ may be a more complex function of the copy number differences for each clone (see Additional file 1: Section 1.4).

Define transition factors $f(c, c', b|o, \lambda) = e^{-\lambda t(c, c', b|o)}$, and let $k_n$ be the index of the breakpoint interposed between segment $n$ and $n + 1$. Write the joint probability over the observed read counts and segment and breakpoint copy number as given by Eq. 2:

$$p(X, C, B|h, L, O, \theta, \lambda) = p(X|C, L, h, \theta)\, p(C, B|O, \lambda) \propto \prod_{n=1}^{N} p(x_n|c_n, h, l_n, \theta) \times \prod_{n=1}^{N-1} f(c_n, c_{n+1}, b_{k_n}|o_n, \lambda) \tag{2}$$

Exact inference in the ReMixT model is intractable due to additional dependencies introduced by modelling the long-range connectivity of breakpoints.

Structured variational inference
We are seeking to infer the posterior probability $p(z|x)$ of the unobserved model variables $z$ given observed data $x$.
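Before turning to inference, the model quantities defined so far (the expected read count of Eq. 1, the telomere count, the transition factor, and the unnormalized joint probability of Eq. 2) can be made concrete with a minimal sketch. It rests on several simplifying assumptions that are not the authors' implementation: read counts follow the Poisson starting point mentioned in the text rather than the negative binomial mixture, the multi-clone telomere count is taken to be the sum of per-clone telomere counts (the paper notes the true function may be more complex), and every adjacent segment pair carries one breakpoint entry whose copy number is zero when no breakpoint is present. All function and variable names are hypothetical.

```python
import math


def expected_read_count(length, haploid_depths, copy_numbers):
    """Eq. 1 for one segment: mu_n = l_n * sum_m h_m * c_nm."""
    return length * sum(h * c for h, c in zip(haploid_depths, copy_numbers))


def telomeres(c, c_next, b=0, o=+1):
    """Single-clone telomere count t(c, c', b | o) = |c - c' - o * b|."""
    return abs(c - c_next - o * b)


def transition_factor(c, c_next, b, o, lam):
    """Un-normalized transition factor f = exp(-lambda * t)."""
    return math.exp(-lam * telomeres(c, c_next, b, o))


def poisson_logpmf(x, mu):
    """Log Poisson probability of count x with mean mu (requires mu > 0)."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def joint_log_prob(x, lengths, h, C, B, O, lam):
    """Unnormalized log of Eq. 2 under the assumptions stated above.

    C[n][m]: copy number of segment n in clone m.
    B[n][m]: copy number in clone m of the breakpoint between n and n + 1.
    O[n]:    orientation (+1 or -1) of that breakend.
    """
    log_lik = sum(
        poisson_logpmf(x_n, expected_read_count(l_n, h, c_n))
        for x_n, l_n, c_n in zip(x, lengths, C)
    )
    n_telomeres = sum(
        telomeres(c, c_next, b, o)
        for row, next_row, b_row, o in zip(C, C[1:], B, O)
        for c, c_next, b in zip(row, next_row, b_row)
    )
    return log_lik - lam * n_telomeres


# Toy example: a normal clone (copy number 2 everywhere) and one tumour clone
# with a single-copy deletion of the middle segment.
lengths = [1000, 1000, 1000]
h = [0.01, 0.02]                      # haploid depths: normal, tumour
C = [[2, 2], [2, 1], [2, 2]]
x = [60, 40, 60]
O = [+1, -1]                          # orientations of the two deletion breakends
B_deletion = [[0, 1], [0, 1]]         # one copy of the deletion breakpoint in the tumour clone
B_none = [[0, 0], [0, 0]]             # breakpoint absent
lam = 1.5

with_bp = joint_log_prob(x, lengths, h, C, B_deletion, O, lam)
without_bp = joint_log_prob(x, lengths, h, C, B_none, O, lam)
print(with_bp - without_bp)
```

In the toy example the read count likelihood is identical in both cases, so the two scores differ only by the penalty on the two telomeres that remain unexplained when the deletion breakpoint is assigned zero copies.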
The variational inference approach seeks to approximate an intractable posterior $p(z|x)$ with a more tractable family of distributions $q(z)$, typically characterized by an increased number of parameters and fewer dependencies [19]. An optimal $q(z)$ is computed by minimizing the Kullback-Leibler (KL) divergence between $p(z|x)$ and $q(z)$ as given by Eq. 3:

$$D_{KL}(q(z) \,\|\, p(z|x)) = \int q(z) \log\left(\frac{q(z)}{p(z|x)}\right) dz = \log p(x) - \int q(z) \log p(x, z)\, dz + \int q(z) \log q(z)\, dz = \log p(x) - E_q[\log p(x, z) - \log q(z)] \tag{3}$$

The expectation given in the final form of Eq. 3 forms a lower bound on the log model evidence $\log p(x)$, since $D_{KL}(q(z) \,\|\, p(z|x))$ is non-negative and approaches zero for a perfect approximation. Importantly, the difficult problem of directly minimizing the KL divergence is equivalent to the easier problem of maximizing this evidence lower bound (ELBO). The mean field approximation assumes a distribution $q(z) = \prod_i q_i(z_i)$ that factorizes over single model variables. In structured variational inference, each $z_i$ is a disjoint set of model variables, allowing $q$ to have a more complex dependency structure that better approximates the posterior [20, 21]. Independence between factors of $q$ allows for application of a coordinate ascent algorithm that iteratively maximizes the ELBO with respect to each factor $q_j$ using general updates given by Eq. 4:

$$\log q^*(z_j) = E_{\prod_{i \neq j} q_i(z_i)}[\log p(x, z)] + \mathrm{const} \tag{4}$$

We approximate the posterior $p(C, B, h, \theta|X, L, O, \lambda)$ using a distribution $q$ with factorization given by Eq. 5:

$$q(C, B, h, \theta) = q(h)\, q(\theta)\, q(C) \prod_k q_k(b_k) \tag{5}$$

Taking a variational expectation maximization (EM) approach, we specify the distributional form of $q(h)$ and $q(\theta)$ to be the Dirac delta function, and compute point estimates for those parameters. Applying Eq. 4 to $q(C)$ results in Eq. 6 (see footnote 2):

$$\log q^*(C) = \sum_B \left(\prod_k q_k(b_k)\right) \log p(X, C, B, h, \theta|L, O, \lambda) + \mathrm{const} = \sum_n \zeta_n(c_n) + \sum_{n=1}^{N-1} \zeta_n(c_n, c_{n+1}) + \mathrm{const} \tag{6}$$

$$\zeta_n(c_n) = \log p(x_n|c_n, h, l_n, \theta) \tag{7}$$

$$\zeta_n(c_n, c_{n+1}) = \sum_b q_{k_n}(b) \log f(c_n, c_{n+1}, b|o_n, \lambda) \tag{8}$$

By inspection, the probability distribution $q^*(C)$ given by Eq.
6 has a chain topology equivalent to an HMM, with an emission calculated as a function of the read count likelihood and transition matrices calculated by modifying $f$ according to $q_{k_n}(b)$ (Eqs. 7 and 8). The emission and transition terms $\zeta_n(c_n)$ and $\zeta_n(c_n, c_{n+1})$ define the variational parameters of $q(C)$. The sum product algorithm can be used to calculate the single and pairwise posterior marginal probabilities of $q(C)$, denoted $\gamma_n(c)$ and $\gamma_n(c, c')$ respectively. The posterior marginals of $q(C)$ will appear in the updates of the other factors of $q$, as shown below.

Fig. 2 Simulation results for the integrated breakpoint model and an equivalent hidden Markov model (HMM) with postprocessing to infer breakpoint copy number. Also shown are results for the breakpoint model with perfect initialization. Two sets of simulations were performed, varying fraction of the descendant tumour clone (left column) and proportion of the genome with divergent copy number (right column). Boxplots show proportion of the genome (a, b) and proportion of breakpoints (c, d) for which the tool correctly called clone-specific copy number, in addition to relative normal fraction error (e, f) and relative minor clone fraction error (g, h). Boxes show the interquartile range (IQR) with a line depicting the median. Whiskers extend 1.5 × IQR above quartile 3 and below quartile 1. Diamonds show positions of outlier data points

Applying Eq. 4 to optimize $q_k(b_k)$ results in Eq.
9:

$$\log q_k^*(b_k) = \sum_C q(C) \log p(X, C, B, h, \theta|L, O, \lambda) + \mathrm{const} = \sum_{n: k_n = k} \sum_c \sum_{c'} \gamma_n(c, c') \log f(c, c', b_k|o_n, \lambda) + \mathrm{const} \tag{9}$$

Intuitively, the variational updates for $q(C)$ and $q_k(b_k)$ described above involve first updating the transition matrices of an HMM, weighting specific transitions that correspond to copy number changes induced by high-probability breakpoint copy number states, and then updating breakpoint copy number states according to the probabilities over adjacent segments in the HMM.

Since the entropy of a delta function is constant, optimal estimates of $h$ and $\theta$ involve only the $E_q[\log p(x, z)]$ term of the ELBO, which is maximized (equivalently, its negative is minimized). Read counts are independent of breakpoints given segment copy number; thus, the expectation is calculated over $q(C)$ only (Eq. 10). Optimization is accomplished by computing derivatives with respect to the parameters and using quasi-Newton methods to find a local optimum.

$$E_q[\log p(x, z)] = \sum_C q(C) \log p(X, C, B, h, \theta|L, O, \lambda) = \sum_n \sum_c \gamma_n(c) \log p(x_n|c, h, l_n, \theta) \tag{10}$$

Realistic simulations of bulk genome sequencing
We developed a principled method of simulating rearranged genomes that fulfilled three important criteria. First, the simulated tumour genomes were required to have been produced by a known evolutionary history composed of duplication, deletion, and balanced rearrangement events applied successively to an initially non-rearranged normal genome. Second, the copy number profile of the simulated tumour genome should be reasonably similar to those of previously observed tumours. Third, the simulated data should be subject to the same biases seen in real genome sequence data.

To satisfy the first two criteria, we developed a sampling framework for generating realistic evolutionary histories based on a scoring and re-sampling strategy (see Additional file 1: Section 2.1). This first step produces a set of rearrangements, in addition to per-clone per-segment copy numbers. WGS read-level data are generated from segment copy numbers in one of two possible ways.
For segment count simulations, read counts are simulated directly from a likelihood model given simulated segment copy number. For aligned read re-sampling, individual reads are re-sampled from a very high depth source normal genome dataset based on simulated segment copy number. By using an appropriate likelihood model, segment count simulations can be used to generate read counts with a distribution that reflects the over-dispersion and outliers in real data. Aligned read re-sampling datasets are computationally more intensive to generate, but are able to produce read count data with GC and mappability bias similar to that of the source dataset. See Additional file 1: Section 2.2 for additional details.

Breakpoint model improves inference for segment count simulations
We first sought to understand the benefit of an integrated breakpoint model using segment count simulations. We compared the ReMixT model with an equivalent breakpoint-naive HMM followed by post hoc breakpoint copy number calculation. For the breakpoint-naive model, we first infer segment copy number using the ReMixT model with breakpoint copy number fixed at zero. We then use a simple greedy algorithm (see Additional file 1: Section 2.5) to perform a post hoc computation of the breakpoint copy number based on the segment copy number inferred using the HMM. As variational inference is sensitive to initialization, we also included results using the ReMixT breakpoint model with perfect initialization. We performed our evaluation on two sets of simulations, one in which we varied the proportion of the genome simulated to be subclonal, and one in which we varied the descendant clone fraction (see Additional file 1: Section 2.3 for details; see also footnote 3).

We evaluated the breakpoint model and the HMM on the model's ability to recover the true clonal mixture, segment copy number, and breakpoint copy number (Fig.
2). Mixture prediction was assessed by calculating the relative deviation of the predicted normal fraction and descendant clone fraction from the simulated values. Segment and breakpoint copy number prediction was assessed by calculating the proportion of segments/breakpoints for which the true clone-specific copy number was recovered by the method.

For both segment and breakpoint copy number prediction, the breakpoint model outperformed the baseline HMM. The proportion of segment copy number called correctly was significantly higher for the breakpoint model for all simulations with the exception of those simulations with a descendant clone fraction of 55% (paired t test, p value < 0.05, Fig. 2a and b). Additionally, the proportion of breakpoints with correctly predicted copy number was significantly higher for the breakpoint model for all simulations with the exception of those with the proportion of the genome subclonal set at 45% (paired t test, p value < 0.05, Fig. 2c and d). Improvement with respect to prediction of minor clone fraction was observed for descendant clone fractions 0.05 and 0.3 (paired t test, p value < 0.05, Fig. 2g). No improvement was observed with respect to normal fraction prediction, though we did observe a decrease in accuracy for descendant clone fraction 0.55 (paired t test, p value = 0.03, Fig. 2e). Perfect initialization showed improved results over our current initialization method, indicating additional room for improvement with respect to this aspect of the algorithm.

Fig. 3 Performance comparison of ReMixT with CloneHD, TITAN, Battenberg, and THetA using read re-sampling simulations. Two sets of simulations were performed, varying fraction of the descendant tumour clone (left column) and proportion of the genome with divergent copy number (right column). Boxplots show proportion of the genome for which the tool correctly called the copy number of the dominant clone (a, b), relative mean ploidy error compared to simulated (c, d), relative proportion divergent error compared to simulated (e, f), relative normal fraction estimation error compared to simulated (g, h), and relative minor clone fraction estimation error compared to simulated (i, j). Battenberg was excluded from the minor clone fraction benchmark, as it does not produce a global estimate of this parameter. Boxes show the interquartile range (IQR) with a line depicting the median. Whiskers extend 1.5 × IQR above quartile 3 and below quartile 1. Diamonds show positions of outlier data points

Comparison with existing copy number inference methods
We used our aligned read re-sampling framework to compare the performance of ReMixT to four existing methods for subclonal copy number inference: TITAN [5], CloneHD [7], Battenberg [8], and THetA2 [12, 14]. We performed our comparison on two sets of genome mixtures, one in which we varied the proportion of the genome simulated to be subclonal, and one in which we varied the descendant clone fraction. We used aligned read re-sampling to produce realistic simulated datasets using 200X sequencing of the NA12878 hapmap individual provided by Illumina [22]. Each tool was run with default parameters according to available instructions (see Additional file 1: Section 4 for details).

Performance of the four tools varied significantly across each measure (Fig. 3). CloneHD was unable to recover the copy number of the dominant clone with reasonable accuracy for a majority of the simulations (< 43% accurate for 50% of simulations).
In general, CloneHD copy number results showed a higher mean ploidy and higher divergent proportion (proportion of the genome predicted to have clonally divergent copy number) than simulated results (average 37% higher and 44% higher respectively). However, in many instances, CloneHD was able to estimate normal fraction with reasonable accuracy (within 6.6% of simulated for 50% of the simulations). Minor clone fraction estimation was less accurate (within 28% of simulated for 50% of the simulations). Our results imply that CloneHD is prone to over-fitting, producing unrealistic copy number profiles.

THetA, by contrast, produced solutions accurate with respect to mean ploidy (within 6.5% of simulated for 75% of simulations) and, to a lesser extent, divergent proportion (within 20% of simulated for only 25% of simulations). Additionally, THetA copy number predictions were more consistent in their accuracy, with the dominant copy number predicted with greater than 81% accuracy for 50% of the simulations. The normal fraction estimation error was in general higher than for the other tools (within 17% of simulated for 50% of simulations). THetA's estimated descendant clone fractions were also less accurate than those of the other tools (within 21% of simulated for only 25% of simulations).

TITAN's results were the most variable, with dominant copy predicted accurately for a large number of simulations (> 88% for 25% of simulations) but poorly for many other simulations (< 21% for 25% of simulations). As with CloneHD, TITAN appeared to over-fit for a subset of the simulations, producing solutions for which mean ploidy and divergent proportion were higher than simulated (> 28% higher than simulated ploidy for 25% of simulations and > 66% higher than simulated divergent proportion for 50% of simulations).
TITAN estimated normal fractions with low error for a majority of simulations (within 5% of simulated for 50% of simulations), though prediction of minor clone fractions was more variable (error greater than 19% of simulated for 75% of simulations).

Battenberg's results were the most consistent of the competing tools. For the simulations with 50/50 tumour mixtures, Battenberg produced a solution at double the simulated ploidy, highlighting the unidentifiability of this particular scenario. Excluding the 50/50 tumour mixture simulations, Battenberg predicted dominant copy number within 3% for 75% of the simulations and ploidy within 4% for 75% of the simulations. Battenberg in general under-estimated the divergent proportion, 13% lower than simulated for 75% of simulations. Normal fractions were also accurate, within 6% of simulated for 100% of simulations, excluding 50/50 mixtures. Battenberg does not estimate minor clone fraction and was thus excluded from such analyses.

ReMixT consistently outperformed the four competing tools on all measures. For 75% of the simulations, ReMixT was able to infer integer copy number for both clones with greater than 91% accuracy. Lower accuracy results were obtained for 50/50 tumour mixtures, primarily due to the inherent ambiguity of assigning copy numbers to specific clones for such mixtures. Normal fraction estimation was slightly biased, and was over-estimated by 1.4% of simulated on average, though never by more than 2.6%. As expected, minor clone fraction estimation was less accurate for mixtures with the smallest simulated minor clone fractions, up to 50% of simulated, averaging 5%.
For the remaining simulations minor clone fraction estimation error averaged 0.6% with a maximum of 8%.

Targeted single cell validation of clone-specific breakpoints
Next we sought to establish the accuracy of breakpoint copy number inference in a realistic setting using targeted single cell sequencing in a set of spatially separated high-grade serous ovarian tumour samples [23]. The set of samples included two obtained from the patient's right ovary, one from the left ovary, and one from the omentum (Fig. 4b). Each sample was whole genome sequenced to an approximate depth of 30X.

Fig. 4 Single cell validation of ReMixT results for 12 breakpoints in 294 cells from 4 HGS Ovarian tumour samples: Omentum 1 (Om1), Right Ovary 1 and 2 (ROv1 and ROv2), and Left Ovary 1 (LOv1). (a) Breakpoint (x-axis) by cell (y-axis) presence (dark blue) / absence (light blue) with cells annotated by sample of origin and clone as inferred by the Single Cell Genotyper. (b) Approximate anatomic location of the 4 tumour samples. (c) F-measure, precision and recall for ReMixT calls of breakpoint presence and subclonality

We hand-selected 12 breakpoints associated with putative copy number changes for validation by targeted single cell sequencing (Fig. 4). Specifically, for each of the 12 candidate breakpoints, at least one breakend coincided with a transition in copy number in at least one sample, where copy number was inferred using an earlier version of ReMixT [23]. In addition, we selected 60 somatic and 24 germline single nucleotide changes based on their utility as clonal markers [23]. Targeted single cell sequencing was performed as previously described [23], cells were clustered into clones using the Single Cell Genotyper [24], and breakpoints were assigned to clones if they were present in at least three cells of that clone.
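The assignment rule stated above (a breakpoint is assigned to a clone when it is present in at least three cells of that clone) can be sketched as follows. This is an illustrative sketch only: the input layout (a mapping from cell to clone label and a mapping from cell to the set of breakpoints detected in it), the function name, and the toy identifiers are assumptions, and the clustering of cells into clones by the Single Cell Genotyper is taken as given.

```python
from collections import Counter, defaultdict


def assign_breakpoints_to_clones(cell_clone, cell_breakpoints, min_cells=3):
    """Return {clone: set of breakpoints present in >= min_cells cells of it}.

    cell_clone:       hypothetical mapping of cell id -> clone label.
    cell_breakpoints: hypothetical mapping of cell id -> iterable of
                      breakpoint ids detected in that cell.
    """
    counts = defaultdict(Counter)
    for cell, clone in cell_clone.items():
        # set() guards against a breakpoint being listed twice for one cell
        for breakpoint in set(cell_breakpoints.get(cell, ())):
            counts[clone][breakpoint] += 1
    return {
        clone: {bp for bp, n in counts[clone].items() if n >= min_cells}
        for clone in set(cell_clone.values())
    }


# Toy data: four cells in clone A, two cells in clone B.
cell_clone = {"c1": "A", "c2": "A", "c3": "A", "c4": "A", "c5": "B", "c6": "B"}
cell_breakpoints = {
    "c1": {"bp1", "bp2"},
    "c2": {"bp1", "bp2"},
    "c3": {"bp1"},
    "c5": {"bp2"},
    "c6": {"bp2"},
}
assignment = assign_breakpoints_to_clones(cell_clone, cell_breakpoints)
print(assignment)
```

Here `bp1` is seen in three cells of clone A and is assigned to it, while `bp2` is seen in only two cells of each clone and is assigned to neither.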
Joint analysis of the breakpoint and single nucleotide data produced a robust estimate of the clonal genotypes with respect to the targeted breakpoints (Fig. 4a).

Next we evaluated the ability of ReMixT to accurately determine which breakpoints were present/absent and clonal/subclonal in each sample. We calculated the F measure for present/absent and clonal/subclonal calls (Fig. 4c). F measure values were similar to results obtained from running ReMixT on aligned read re-sampling simulations.

Tracking clonal expansions using clone-specific breakpoints
Several previous studies have used clone-specific SNVs to identify patterns of clonal evolution [25], infer patterns of cancer cell dissemination to metastatic sites [23, 26], and track expansion and contraction of tumour clones over time, in response to therapy [27], and in response to xenograft passaging [15]. We sought to evaluate the utility of clone-specific breakpoints predicted by ReMixT for investigating clonal evolution in successive xenograft passages. To this end, we analysed primary and xenograft tumour samples derived from a patient with breast cancer (SA501 from [15]). Our analysis focused on four samples, the primary tumour sample and three xenograft samples labelled X1A, X3A, and X3F. The relationship between these four samples and the additional two un-sequenced xenograft samples X2A and X2F is shown in Fig. 5b.

For validation of X3F clone-specific copy number changes, we used recently published single cell WGS data [28]. We inferred total integer copy number and performed phylogenetic analysis using previously described techniques [15, 28]. Three major clones were identified. Proportions of cells assigned to each clone were 0.82, 0.11, and 0.07 for clones A, B, and C respectively. Clones B and C were highly similar and formed a distinct clade; thus, for this analysis we merged clones B and C.
For clone A and merged clone BC, we reconstructed clone copy number profiles by selecting the most prevalent copy number within each clone for each segment. Segments with copy number 6 or higher were removed, as specific copy number states above 5 could not be inferred using available techniques.

ReMixT analysis using default parameters estimated a clonal mixture of 0.85 for the dominant clone and 0.15 for the minor clone. Clone-specific copy numbers matched single cell copy number for 91% of the genome. Accuracy was highest for segments in lower copy number states (≤ 3 total copies). Segments with higher copy number (≥ 4 total copies) and no clonal divergence were frequently predicted as subclonal by ReMixT, evidence that ReMixT over-fits some segments with higher copy number (Fig. 5c). Additional disparity appeared to be the result of noisy segments in lower copy states predicted as subclonal.

Fig. 5 Tracking clonal expansions in xenograft passages. (a) Breakpoints identified by ReMixT as clone-specific were classified according to their clonal prevalence change between SA501X1A and replicate xenograft passages SA501X3A and SA501X3F. All breakpoints could be classified as ascending in both SA501X3A and SA501X3F, descending in both, or stable in at least one. Shown are the clonal prevalence changes between pairs of samples for which WGS was available. (b) Relationship between primary tumour sample T and xenograft passages X*. (c) Accuracy of copy number inference for X3F based on single cell whole genome sequencing. Shown is the proportion of regions with correctly predicted copy number (y-axis) for each clone A copy number (x-axis), split between clonal and subclonal (blue/green) as determined from single cell data. (d) Copy number profile (top) for chromosomes 7 and 15 showing corroboration between single cell (bottom) and ReMixT (middle) subclonal copy number prediction. Yellow flags show the location of translocation breakpoints predicted to be subclonal by ReMixT. (e) Similarly, chromosomes 1/18 translocation breakpoints predicted to be subclonal by ReMixT. Copy number plots show raw major (red) and minor (blue) copy numbers

Next we identified a set of high confidence subclonal breakpoints for analysis of clonal dynamics in the xenograft passages. We smoothed segments smaller than 100 kb and aggregated adjacent segments with the same allele-specific difference between clone copy numbers. We then removed segments with length less than 1 Mb or copy number greater than 4. Breakpoints were selected if they were predicted to be subclonal, and were immediately adjacent at each breakend to a segment with subclonal copy number from the above set of filtered high confidence segments. This technique was used to identify 17 subclonal breakpoints in one of X1, X3A, X3F, and X5 or the primary tumour sample. In X3F, the ReMixT copy number matched the single cell copy number for 84% of the 1-Mb regions to either side of each breakend. For 11 of the predictions, corroboration was >92%, and for the remaining predictions, corroboration was closer to 50%, indicating a lack of corroboration on one side of each breakend. Included in the set of breakpoints were inter-chromosomal translocations linking subclonal segments on disparate chromosomes, indicative of clone-specific loss or gain of rearranged tumour chromosomes (Fig. 5d and e).

Patient SA501 was previously shown to have exhibited reproducible patterns of clonal expansions across multiple replicate xenografts using a combination of targeted bulk and single cell sequencing of SNVs [15]. In particular, X3A and X3B showed similar patterns of clonal expansions for clusters of SNVs used as clonal markers. We sought to establish whether the same clonal dynamics were evident in X3F, and whether those clonal dynamics could be understood using clone-specific breakpoints.
To that end, we classified each of the high confidence subclonal breakpoints according to whether they exhibited the same expansion patterns from X1 to X3A and X1 to X3F. Of the 17 high confidence breakpoints, 6 could be classified as ascending in both X3A and X3F, 6 as descending in both X3A and X3F, with the remaining 5 stable from X1 to either X3A or X3F (Fig. 5a). Strikingly, we did not identify any conflicting breakpoints, that is, those ascending in X3A and descending in X3F or vice versa.

Assembling tumour chromosomes using subclonal breakpoints

We applied ReMixT to WGS data from two tumour-derived cell line samples and a matched normal sample obtained from a patient with HGSOvCa [29]. The two cell lines are derived from an ascites sample (DAH354) and a primary tumour sample (DAH355) obtained during debulking surgery. Cell line samples and matched normals were sequenced to approximately 30X and analysed with ReMixT using default parameters. Tetraploid solutions were selected based on ploidy evidence from preliminary single cell sequencing experiments for DAH355 (data not shown).

As expected of HGSOvCa, the copy number profiles of the cell line samples showed substantial evidence of genome instability. For both samples, the fraction of the genome predicted to be diploid heterozygous was insignificant, and the fraction of the genome with loss of heterozygosity was 40% and 35% for DAH354 and DAH355 respectively. Both DAH354 and DAH355 showed evidence of multiple genomically distinct clonal populations, with dominant clone fractions of 0.7 and 0.61 respectively, and with 14% and 32% of the diploid genome predicted as subclonal respectively.
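Genome fractions such as those quoted above can be read as length-weighted summaries over copy number segments. The text does not spell out the calculation, so the sketch below rests on assumptions: segments are hypothetical (start, end, major, minor) tuples for one clone, loss of heterozygosity is taken to mean a minor copy number of zero with a non-zero major copy number, and diploid heterozygous means one copy of each allele.

```python
def genome_fraction(segments, predicate):
    """Length-weighted fraction of segments whose (major, minor) satisfies predicate."""
    total = sum(end - start for start, end, _, _ in segments)
    if total == 0:
        return 0.0
    selected = sum(
        end - start
        for start, end, major, minor in segments
        if predicate(major, minor)
    )
    return selected / total


def is_loh(major, minor):
    # Assumed definition: one allele lost entirely, the other still present.
    return major > 0 and minor == 0


def is_diploid_heterozygous(major, minor):
    # Assumed definition: exactly one copy of each allele.
    return major == 1 and minor == 1


# Toy segments as (start, end, major, minor); coordinates are illustrative only.
segments = [(0, 40, 2, 0), (40, 100, 2, 2)]
loh_fraction = genome_fraction(segments, is_loh)
```

Weighting by segment length rather than counting segments keeps many short segments from dominating the summary.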
A total of 348 somatic breakpoints were identified by deStruct [4], of which 278 were determined to be present (positive copy number) by ReMixT in one or both samples. A total of 97 breakpoints were predicted to have clone-specific copy number in one or both samples, with 17 having clone-specific copy number in both samples.

In both DAH354 and DAH355, we observed several clone-specific translocations adjacent to large segments with clonally divergent copy numbers. As with SA501, we suspected that the loss or duplication of a single tumour chromosome would result in multiple clonally divergent segments across the reference genome. We thus searched for clonally divergent segments connected by subclonal breakpoints as a method for understanding the structure of tumour chromosomes with divergent copy number across the clonal population (Fig. 6). In DAH354, we identified a tumour chromosome composed of three segments from reference chromosomes 7, 11, and 9 (Fig. 6a), and in DAH355, we identified a tumour chromosome composed of four segments from reference chromosomes 6, 1, 3, and 15 (Fig. 6b).

Discussion

We have demonstrated that ReMixT improves both inference and interpretation of copy number changes and genomic rearrangements. Improved accuracy was observed for prediction of clone fraction, clone-specific copy number, and clone specificity of breakpoints. We show how breakpoint copy number changes can be used as markers of clonal populations, and used to track clonal population dynamics in the same way as SNVs. By linking clone-specific copy number changes to breakpoints, we show how targeted single cell sequencing can be used to jointly profile clonal genotypes in SNV and copy number space. Furthermore, we are able to reconstruct partial tumour chromosomes lost or gained in sub-populations of cells.

Although our method shows performance gains over other methods, further improvements are possible.
The performance of our variational inference algorithm is highly dependent on the quality of the initialization. Improvement may be gained using more sophisticated or informed initialization methods, or extensions to variational inference using annealing or MCMC. Our current implementation is limited to two tumour clones, largely due to the increased computational complexity of modelling additional clones. An approximating distribution factorized per clone would solve the complexity issue within the context of structured variational inference; however, based on our own experimentation, such a factorization exacerbates the initialization problem and was found to be infeasible. Thus improvements to the variational inference method may also allow for the use of a more factorized approximation, removing the limitation on the number of clones.

Fig. 6 Inference of partial tumour chromosome assemblies based on linking subclonal segments and breakpoints. Two assembled chromosomes are shown for cell lines DAH354 (a) and DAH355 (b). Shown for each assembled chromosome is a schematic of the segments involved (top left), a table of breakpoint copy number predicted by ReMixT (top right), and a chromosome copy number plot (bottom). Each copy number plot shows raw major (red) and minor (blue) copy numbers (top axis), in addition to prediction of subclonality (bottom axis)

Conclusions

Traditionally, classes of genomic aberration have been predicted and characterized independently, with post-hoc analysis to determine correlation between events in each class. However, there are clear dependencies between classes of aberrations with respect to their generation via mutational processes and their observation using genome sequencing.
A number of existing methods partially leverage class dependencies [7, 30, 31], and the development of ReMixT represents a further step towards a comprehensive model of genomic aberrations in tumour populations. We anticipate further benefit may be gained from jointly modelling copy number changes, rearrangements, SNPs and SNVs, all within the context of an appropriate phylogenetic model. Future research leveraging the patterns of genome damage and the totality of somatic alterations in a cancer’s evolutionary history to elucidate its biologic and mutagenic properties will derive benefit from ReMixT’s improved accuracy in structural alteration detection and interpretation.

Endnotes

1. A product of normalized conditional probabilities and a prior probability for the first segment would also be possible, though we believe integration of breakpoints into the model would be less intuitive.
2. Assuming uniform improper priors over h and θ, we have log p(X, C, B | h, θ, L, O, λ) = log p(X, C, B, h, θ | L, O, λ) + const.
3. We maintained a distinction between ancestral/descendant clone mixtures of x / 1 − x and the reversed 1 − x / x clone mixture, as results for these mixtures differ.

Additional file

Additional file 1: Supplementary methods, supplementary analyses, and additional experimental results. (PDF 35000 kb)

Acknowledgements

We thank Dr. Anne-Marie Mes-Masson, CHUM Research Centre (CRCHUM), Montreal, Canada for providing the high-grade serous ovarian cancer cell lines.

Funding

This work was supported by a Discovery Frontiers project grant, The Cancer Genome Collaboratory, jointly sponsored by the Natural Sciences and Engineering Research Council (NSERC), Genome Canada (GC), the Canadian Institutes of Health Research (CIHR), and the Canada Foundation for Innovation (CFI) to SPS and SCS. In addition, we acknowledge generous long-term funding support from the BC Cancer Foundation.
The SPS and SA groups receive operating funds from the Canadian Breast Cancer Foundation, the Canadian Cancer Society Research Institute (impact grant 701584 to SA and SPS), the Terry Fox Research Institute (Program Project Grants program on forme fruste tumours), the CIHR (grant MOP-115170 to SA and SPS), and the CIHR Foundation (grant FDN-143246 to SPS). SPS and SA are supported by Canada Research Chairs. SPS is a Michael Smith Foundation for Health Research scholar. Additional funding is provided by NIH GM108348 and by the Indiana University Precision Health Grand Challenge Initiative to SCS.

Availability of data and materials

ReMixT is written in Python and C++. Source code is available at http://bitbucket.org/dranew/remixt, released under the MIT license. The version of ReMixT used for this manuscript has been published on http://zenodo.org with doi: 10.5281/zenodo.819479.

High-grade serous ovarian cancer
We used previously published data [23] available from the European Genome-phenome Archive (http://www.ebi.ac.uk/ega/) under accession number [EGA:S00001000547].

Breast cancer
We used previously published data [28] available from the European Genome-phenome Archive (http://www.ebi.ac.uk/ega/) under accession number [EGA:S00001002170].

Authors’ contributions

AM developed and implemented the algorithm, performed the analysis, and wrote the manuscript. AS analyzed the SA501 single cell data. PE was responsible for development of the xenograft models. SPS contributed to the manuscript text. AR, GH, CC, CD, AB, SCS, and SA contributed ideas during development of the model and analysis of the results. All authors read and approved the final manuscript.

Ethics approval and consent to participate

High-grade serous ovarian cancer
Ethical approval was obtained from the University of British Columbia (UBC) Research Ethics Board (H08-01411 NGS Huntsman).
Women undergoing debulking surgery (primary or recurrent) for carcinoma of ovarian, peritoneal, and/or fallopian tube origin were approached for informed consent for the banking of tumour tissue. All experimental methods comply with the Helsinki Declaration.

Breast cancer
Anonymized tumour tissue from women aged 26–82 undergoing surgery or diagnostic core biopsy was collected with informed consent, according to procedures approved by the UBC Research Ethics Board (H06-00289 Breast Tumour Tissue Repository and H13-01125 Breast Xenograft Aparicio). All experimental methods comply with the Helsinki Declaration.

Patient-derived xenografts
Female NOD/SCID interleukin-2 receptor gamma null (NSG) and NOD Rag-1 null interleukin-2 receptor gamma null (NRG) mice were bred and housed at the Animal Resource Centre at the British Columbia Cancer Research Centre and the Biological Resource Unit at the Cancer Research UK Cambridge Research Institute. Surgery was carried out on mice between the ages of 5–10 weeks. All experimental procedures were approved by the University of British Columbia Animal Care Committee and the University of Cambridge Animal Welfare and Ethical Review Committee.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details

1 Department of Molecular Oncology, BC Cancer Agency, 675 West 10th Avenue, Vancouver, BC, Canada.
2 Department of Pathology and Laboratory Medicine, University of British Columbia, 2329 West Mall, Vancouver, BC, Canada.
3 Department of Statistics, Oxford University, 24-29 St Giles, Oxford, United Kingdom.
4 Ludwig Institute for Cancer Research, Oxford University, Old Road Campus Research Building, Headington, Oxford, United Kingdom.
5 Dana-Farber Cancer Institute, 450 Brookline Ave, Oxford, Boston, USA.
6 Eli and Edythe L. Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA, USA.
7 Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada.
8 Department of Statistics, University of British Columbia, 2329 West Mall, Vancouver, BC, Canada.
9 Vancouver Prostate Centre, 2660 Oak Street, Vancouver, Canada.
10 Department of Computer Science, Indiana University Bloomington, 107 S. Indiana Avenue, Bloomington, IN, USA.

Received: 16 March 2017 Accepted: 3 July 2017

References

1. Rausch T, Zichner T, Schlattl A, Stütz AM, Benes V, Korbel JO. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28(18):333–9.
2. Hormozdiari F, Alkan C, Eichler EE, Sahinalp SC. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Res. 2009;19(7):1270–8.
3. Layer RM, Chiang C, Quinlan AR, Hall IM. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 2014;15(6):84.
4. McPherson A, Shah SP, Sahinalp SC. deStruct: accurate rearrangement detection using breakpoint specific realignment. bioRxiv. 2017. https://doi.org/10.1101/117523.
5. Ha G, Roth A, Khattra J, Ho J, Yap D, Prentice LM, Melnyk N, McPherson A, Bashashati A, Laks E, Biele J, Ding J, Le A, Rosner J, Shumansky K, Marra MA, Gilks CB, Huntsman DG, McAlpine JN, Aparicio S, Shah SP. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data. Genome Res. 2014;24(11):1881–93.
6. Oesper L, Mahmoody A, Raphael BJ. THetA: inferring intra-tumor heterogeneity from high-throughput DNA sequencing data. Genome Biol. 2013;14(7):80.
7. Fischer A, Vázquez-García I, Illingworth CJR, Mustonen V. High-definition reconstruction of clonal composition in cancer. Cell Rep. 2014;7(5):1740–52.
8. Nik-Zainal S, Van Loo P, Wedge DC, Alexandrov LB, Greenman CD, Lau KW, Raine K, Jones D, Marshall J, Ramakrishna M, Shlien A, Cooke SL, Hinton J, Menzies A, Stebbings LA, Leroy C, Jia M, Rance R, Mudie LJ, Gamble SJ, Stephens PJ, McLaren S, Tarpey PS, Papaemmanuil E, Davies HR, Varela I, McBride DJ, Bignell GR, Leung K, Butler AP, Teague JW, Martin S, Jönsson G, Mariani O, Boyault S, Miron P, Fatima A, Langerød A, Aparicio SAJR, Tutt A, Sieuwerts AM, Borg Å, Thomas G, Salomon AV, Richardson AL, Børresen-Dale AL, Futreal PA, Stratton MR, Campbell PJ, Breast Cancer Working Group of the International Cancer Genome Consortium. The life history of 21 breast cancers. Cell. 2012;149(5):994–1007.
9. Boeva V, Popova T, Bleakley K, Chiche P, Cappo J, Schleiermacher G, Janoueix-Lerosey I, Delattre O, Barillot E. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics. 2012;28(3):423–5.
10. Mahmoody A, Kahn CL, Raphael BJ. Reconstructing genome mixtures from partial adjacencies. BMC Bioinformatics. 2012;13 Suppl 19:9.
11. Zerbino DR, Paten B, Hickey G, Haussler D. An algebraic framework to sample the rearrangement histories of a cancer metagenome with double cut and join, duplication and deletion events. arXiv. 2013. arXiv:1303.5569v1.
12. Oesper L, Ritz A, Aerni SJ, Drebin R, Raphael BJ. Reconstructing cancer genomes from paired-end sequencing data. BMC Bioinformatics. 2012;13 Suppl 6:10.
13. Li Y, Zhou S, Schwartz DC, Ma J. Allele-specific quantification of structural variations in cancer genomes. Cell Syst. 2016;3(1):21–34.
14. Oesper L, Satas G, Raphael BJ. Quantifying tumor heterogeneity in whole-genome and whole-exome sequencing data. Bioinformatics. 2014;30(24):3532–40. doi:10.1093/bioinformatics/btu651.
15. Eirew P, Steif A, Khattra J, Ha G, Yap D, Farahani H, Gelmon K, Chia S, Mar C, Wan A, Laks E, Biele J, Shumansky K, Rosner J, McPherson A, Nielsen C, Roth AJL, Lefebvre C, Bashashati A, de Souza C, Siu C, Aniba R, Brimhall J, Oloumi A, Osako T, Bruna A, Sandoval JL, Algara T, Greenwood W, Leung K, Cheng H, Xue H, Wang Y, Lin D, Mungall AJ, Moore R, Zhao Y, Lorette J, Nguyen L, Huntsman D, Eaves CJ, Hansen C, Marra MA, Caldas C, Shah SP, Aparicio S. Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution. Nature. 2015;518(7539):422–6.
16. Delaneau O, Marchini J, Zagury JF. A linear complexity phasing method for thousands of genomes. Nat Methods. 2012;9(2):179–81.
17. Benjamini Y, Speed TP. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res. 2012;40(10):72.
18. Medvedev P, Fiume M, Dzamba M, Smith T, Brudno M. Detecting copy number variation with mated short reads. Genome Res. 2010;20(11):1613–22.
19. Blei DM, Kucukelbir A, McAuliffe JD. Variational inference: A review for statisticians. 2016. arXiv preprint arXiv:1601.00670.
20. Saul LK, Jordan MI. Exploiting tractable substructures in intractable networks. Adv Neural Inform Process Syst. 1996;486–92.
21. Ghahramani Z, Jordan MI, Smyth P. Factorial hidden Markov models. Mach Learn. 1997;29(2-3):245–73.
22. Eberle MA, Fritzilas E, Krusche P, Källberg M, Moore BL, Bekritsky MA, Iqbal Z, Chuang HY, Humphray SJ, Halpert AL, et al. A reference dataset of 5.4 million human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. bioRxiv. 2016;055541.
23. McPherson A, Roth A, Laks E, Masud T, Bashashati A, Zhang AW, Ha G, Biele J, Yap D, Wan A, Prentice LM, Khattra J, Smith MA, Nielsen CB, Mullaly SC, Kalloger S, Karnezis A, Shumansky K, Siu C, Rosner J, Chan HL, Ho J, Melnyk N, Senz J, Yang W, Moore R, Mungall AJ, Marra MA, Bouchard-Côté A, Gilks CB, Huntsman DG, McAlpine JN, Aparicio S, Shah SP. Divergent modes of clonal spread and intraperitoneal mixing in high-grade serous ovarian cancer. Nat Genet. 2016;48(7):758–67.
24. Roth A, McPherson A, Laks E, Biele J, Yap D, Wan A, Smith MA, Nielsen CB, McAlpine JN, Aparicio S, Bouchard-Côté A, Shah SP. Clonal genotype and population structure inference from single-cell tumor sequencing. Nat Methods. 2016;13(7):573–6.
25. Shah SP, Roth A, Goya R, Oloumi A, Ha G, Zhao Y, Turashvili G, Ding J, Tse K, Haffari G, Bashashati A, Prentice LM, Khattra J, Burleigh A, Yap D, Bernard V, McPherson A, Shumansky K, Crisan A, Giuliany R, Heravi-Moussavi A, Rosner J, Lai D, Birol I, Varhol R, Tam A, Dhalla N, Zeng T, Ma K, Chan SK, Griffith M, Moradian A, Cheng S.-W. G, Morin GB, Watson P, Gelmon K, Chia S, Chin SF, Curtis C, Rueda OM, Pharoah PD, Damaraju S, Mackey J, Hoon K, Harkins T, Tadigotla V, Sigaroudinia M, Gascard P, Tlsty T, Costello JF, Meyer IM, Eaves CJ, Wasserman WW, Jones S, Huntsman D, Hirst M, Caldas C, Marra MA, Aparicio S. The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature. 2012;486(7403):395–9.
26. Gundem G, Van Loo P, Kremeyer B, Alexandrov LB, Tubio JMC, Papaemmanuil E, Brewer DS, Kallio HML, Högnäs G, Annala M, Kivinummi K, Goody V, Latimer C, O’Meara S, Dawson KJ, Isaacs W, Emmert-Buck MR, Nykter M, Foster C, Kote-Jarai Z, Easton D, Whitaker HC, ICGC Prostate UK Group, Neal DE, Cooper CS, Eeles RA, Visakorpi T, Campbell PJ, McDermott U, Wedge DC, Bova GS. The evolutionary history of lethal metastatic prostate cancer. Nature. 2015;520(7547):353–7.
27. Ding L, Ley TJ, Larson DE, Miller CA, Koboldt DC, Welch JS, Ritchey JK, Young MA, Lamprecht T, McLellan MD, McMichael JF, Wallis JW, Lu C, Shen D, Harris CC, Dooling DJ, Fulton RS, Fulton LL, Chen K, Schmidt H, Kalicki-Veizer J, Magrini VJ, Cook L, McGrath SD, Vickery TL, Wendl MC, Heath S, Watson MA, Link DC, Tomasson MH, Shannon WD, Payton JE, Kulkarni S, Westervelt P, Walter MJ, Graubert TA, Mardis ER, Wilson RK, DiPersio JF. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature. 2012;481(7382):506–10.
28. Zahn H, Steif A, Laks E, Eirew P, VanInsberghe M, Shah SP, Aparicio S, Hansen CL. Scalable whole-genome single-cell library preparation without preamplification. Nat Methods. 2017;14(2):167–73.
29. Létourneau IJ, Quinn MCJ, Wang LL, Portelance L, Caceres KY, Cyr L, Delvoye N, Meunier L, de Ladurantaye M, Shen Z, Arcand SL, Tonin PN, Provencher DM, Mes-Masson AM. Derivation and characterization of matched cell lines from primary and recurrent serous ovarian cancer. BMC Cancer. 2012;12:379.
30. Greenman CD, Pleasance ED, Newman S, Yang F, Fu B, Nik-Zainal S, Jones D, Lau KW, Carter N, Edwards PAW, Futreal PA, Stratton MR, Campbell PJ. Estimation of rearrangement phylogeny for cancer genomes. Genome Res. 2012;22(2):346–61.
31. Deshwar AG, Vembu S, Yung CK, Jang GH, Stein L, Morris Q. PhyloWGS: reconstructing subclonal composition and evolution from whole-genome sequencing of tumours. Genome Biol. 2015;16:35.
ReMixT: clone-specific genomic structure estimation in cancer McPherson, Andrew W.; Roth, Andrew; Ha, Gavin; Chauve, Cedric; Steif, Adi; de Souza, Camila P; Eirew, Peter; Bouchard-Côté, Alexandre; Aparicio, Sam; Sahinalp, S. Cenk; Shah, Sohrab P. Jul 27, 2017
Item Metadata
Title | ReMixT: clone-specific genomic structure estimation in cancer |
Creator | McPherson, Andrew W.; Roth, Andrew; Ha, Gavin; Chauve, Cedric; Steif, Adi; de Souza, Camila P; Eirew, Peter; Bouchard-Côté, Alexandre; Aparicio, Sam; Sahinalp, S. Cenk; Shah, Sohrab P. |
Publisher | BioMed Central |
Date Issued | 2017-07-27 |
Description | Somatic evolution of malignant cells produces tumors composed of multiple clonal populations, distinguished in part by rearrangements and copy number changes affecting chromosomal segments. Whole genome sequencing mixes the signals of sampled populations, diluting the signals of clone-specific aberrations, and complicating estimation of clone-specific genotypes. We introduce ReMixT, a method to unmix tumor and contaminating normal signals and jointly predict mixture proportions, clone-specific segment copy number, and clone specificity of breakpoints. ReMixT is free, open-source software and is available at http://bitbucket.org/dranew/remixt . |
Subject | Cancer genomics; DNA sequencing; Tumour heterogeneity; Genomic rearrangement; Copy number variation |
Genre | Article |
Type | Text |
Language | eng |
Date Available | 2017-07-27 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution 4.0 International (CC BY 4.0) |
DOI | 10.14288/1.0349092 |
URI | http://hdl.handle.net/2429/62439 |
Affiliation | Medicine, Faculty of; Science, Faculty of; Non UBC; Pathology and Laboratory Medicine, Department of; Statistics, Department of |
Citation | Genome Biology. 2017 Jul 27;18(1):140 |
Publisher DOI | 10.1186/s13059-017-1267-2 |
Peer Review Status | Reviewed |
Scholarly Level | Faculty |
Copyright Holder | The Author(s) |
Rights URI | http://creativecommons.org/licenses/by/4.0/ |
Aggregated Source Repository | DSpace |