UBC Theses and Dissertations

Simulating human tumour heterogeneity using cancer cell line mixtures Billings, Raewyn Lorraine 2014

SIMULATING HUMAN TUMOUR HETEROGENEITY USING CANCER CELL LINE MIXTURES

by

Raewyn Lorraine Billings

BBiotech(Med-Forensic), Charles Sturt University, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Medical Genetics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

November 2014

© Raewyn Lorraine Billings, 2014

Abstract

Tumours comprise (epi)genetically and phenotypically diverse cellular subpopulations evolving over space and time. This heterogeneity can be observed as differences in morphology or immunohistology of tumour sections, gene expression levels, genomic sequence and structure, proliferative potential, or metastatic ability. In order to unravel how this heterogeneity persists, one may study the clonal structure and evolution of tumours. Understanding intra-tumour heterogeneity, or the differences amongst cells in a single tumour, is of particular importance to facilitate treatment combinations that effectively target all clinically relevant subclones. This clonality-informed approach requires identification and monitoring of clonal cell populations within a tumour. In this study I simulated tumour heterogeneity using cancer cell lines in idealised mixtures. Deep allelic measurements using next generation DNA amplicon sequencing were integrated in a computational modelling program (PyClone) to retrieve cellular prevalences and clonal structure within these cell line mixtures. This approach to modeling heterogeneity employed both diploid and aneuploid cell lines, with genomic analysis focused on heterozygous alleles and changes in the prevalence of these alleles when cell lines were mixed in different proportions. I first found that NGS combined with PyClone modeling enables elucidation of population clonal structure as predicted from idealised mixtures of diploid cell lines.
However, the aneuploid cell line mixtures demonstrated a requirement for copy number information to be included as a prior input to clonality modeling in order to avoid misleading interpretations of cellular prevalence and clonal structure. Defining and monitoring clonal heterogeneity in patient tumours is of importance to track functionally relevant subpopulations of tumour cells, enabling oncologists to administer cocktails of therapeutic agents targeting relevant clones irrespective of their numerical abundance. This clonality-informed treatment approach is a promising development to tackle the growing challenge of therapy-resistant subclones and thus limit cancer recurrence.

Preface

The project described in this thesis was initiated and designed by Dr. S. Aparicio and Dr. S. Shah. I performed all laboratory experiments, designed and selected all primers, and performed MiSeq sequencing, with procedural training provided by Jas Khattra, Jazmine Brimhall and Damian Yap. K. Shumansky, R. Aniba, Hossein Farahani and Camila de Souza of the Shah lab performed the bioinformatic data analysis and PyClone analysis.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Glossary
Acknowledgements
Dedication
Chapter 1: Introduction
1.1 Cancer today
1.2 Heterogeneity
1.2.1 Inter-tumour heterogeneity
1.2.2 Intra-tumour heterogeneity
1.3 Clonal evolution
1.4 Sequencing
1.4.1 MiSeq
1.5 Detection of allelic prevalence in biological data
1.5.1 PyClone
1.6 Overarching hypotheses
1.7 Thesis objectives
Chapter 2: Materials and Methods
2.1 Cell lines
2.1.1 HCT116 WT
2.1.2 184-hTERT-L2
2.1.3 TOV3133D and TOV3133G
2.2 Cell culture techniques
2.2.1 Cell culture
2.2.2 Cell counting
2.2.3 Cell mixing
2.3 PCR primer design and primer selection
2.3.1 Primer position selection
2.3.2 Illumina design studio
2.3.3 Primer3 design
2.3.4 Primer selection
2.4 Molecular biology techniques
2.4.1 DNA extraction
2.4.2 Real-time quantitative PCR (qPCR)
2.4.3 PCR
2.4.4 Agilent bioanalyser
2.5 Amplicon library construction – sequencing
2.5.1 Illumina multiplex sequencing method
2.5.1.1 Hybridization of oligo pool
2.5.1.2 Removal of unbound oligos
2.5.1.3 Extension ligation
2.5.1.4 PCR amplification
2.5.1.5 PCR clean up
2.5.1.6 Library normalisation
2.5.1.7 Library for MiSeq sequencing
2.5.2 Singleplex PCR sequencing method
2.5.2.1 qPCR
2.5.2.2 ExoSAP-IT
2.5.2.3 Barcode adaptation – 2nd PCR
2.5.2.4 Sample cleanup and size selection
2.5.2.5 Quantitation and quality control
2.5.2.6 Denaturation and dilution of library
2.6 Genomic data analysis
2.6.1 Allelic prevalence plots
2.6.2 PyClone
Chapter 3: Results
3.1 H1: Multiplex PCR sequencing performs similarly to singleplex PCR sequencing in determining allelic prevalence from idealised mixtures
3.2 H2: DNA concentration mixed samples as contrasted with cell number mixed samples results in more accurate idealized mixtures
3.3 H3: Copy number-complex mixtures require copy number aware clonal analysis
Chapter 4: Discussion
4.1 Overview
4.2 Summary of findings
4.2.1 Multiplex PCR sequencing performed similarly to singleplex PCR sequencing in determining allelic prevalence from idealised mixtures
4.2.2 DNA concentration mixed samples when contrasted with cell number mixed samples resulted in more accurate idealised mixtures
4.2.3 PyClone analysis accurately estimated cellular mutation prevalence in copy number simple mixtures
4.2.4 PyClone analysis of copy number complex mixtures required copy number informed analysis for correct cellular prevalence estimations
4.3 Future directions and conclusion
Bibliography
Appendices
Appendix A - Supplementary material for chapter 3
List of Tables

Table 2.1 Primer sequences used for singleplex PCR sequencing 184-hTERT/HCT116 cell mixing experiments
Table 2.2 Primer sequences used for singleplex PCR sequencing DAH55/DAH56 cell mixing experiments
Table 2.3 Primer sequences used for multiplex 184-hTERT and HCT116 cell mixing experiments
Table 2.4 The 99 primer sequences common to the HCT116/184-hTERT multiplex and singleplex PCR sequencing experiments.
Table 3.1 Table of experiments, experimental design and results figures
Table 3.2 Summarised expected and observed allelic prevalence measurements derived from multiplex sequencing of idealized 184-hTERT/HCT116 mixtures.
Table 3.3 Summarised expected and observed allelic prevalence measurements derived from singleplex sequencing of idealized 184-hTERT/HCT116 mixtures.
Table 3.4 Summarised expected and observed allelic prevalence measurements derived from singleplex sequencing of idealized 184-hTERT/HCT116 mixtures.
Table 3.5 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from multiplex sequencing of HCT116/184-hTERT mixtures.
Table 3.6 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from singleplex sequencing of HCT116/184-hTERT mixtures.
Table 3.7 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from singleplex sequencing of HCT116/184-hTERT mixtures.
Table 3.8 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from singleplex sequencing of DAH55/DAH56 mixtures. No copy number.
Table 3.9 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from singleplex sequencing of DAH55/DAH56 mixtures. DAH55 copy number informed.
Table 3.10 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from singleplex sequencing of DAH55/DAH56 mixtures. DAH56 copy number informed.
Table 3.11 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from singleplex sequencing of DAH55/DAH56 mixtures. Non-homogenous SNVs only used in analysis.
Table 3.12 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from singleplex sequencing of DAH55/DAH56 mixtures. Homogenous SNVs only used for analysis.

List of Figures

Figure 1.1 Tumour heterogeneity – Inter-tumour and Intra-tumour heterogeneity
Figure 1.2 Inter-tumour heterogeneity
Figure 1.3 Darwin's evolutionary tree and clonal evolution
Figure 2.1 Box and Whisker Plot diagram
Figure 3.1 Workflow of comparative experiments performed with idealised mixtures.
Figure 3.2 Allelic prevalence boxplots of idealised mixtures for the multiplex sequencing (184-hTERT specific, HCT116 specific, shared positions).
Figure 3.3 Allelic prevalence boxplots of idealised mixtures for the singleplex sequencing (184-hTERT specific, HCT116 specific, shared positions)
Figure 3.4 Allelic prevalence plots of idealised mixtures for the singleplex sequencing (184-hTERT specific, HCT116 specific, shared positions)
Figure 3.5 Coverage vs. allelic prevalence and cellular prevalence plots for the multiplex sequencing idealised HCT116/184-hTERT mixtures.
Figure 3.6 Coverage vs. allelic prevalence and cellular prevalence plots for the singleplex sequencing idealised HCT116/184-hTERT mixtures.
Figure 3.7 Similarity matrix plot for the multiplex PCR sequencing HCT116/184-hTERT idealised mixtures.
Figure 3.8 Similarity matrix plot for singleplex PCR sequencing HCT116/184-hTERT idealised mixtures.
Figure 3.9 PyClone cellular prevalence plots for the multiplex sequencing idealised HCT116/184-hTERT mixtures.
Figure 3.10 PyClone cellular prevalence plots for the samples mixed by cell number from the singleplex sequencing idealised HCT116/184-hTERT mixtures.
Figure 3.11 PyClone cellular prevalence plots for the samples mixed by DNA concentration from singleplex sequencing idealised HCT116/184-hTERT mixtures.
Figure 3.12 DAH55 CN informed, coverage vs. allelic prevalence and cellular prevalence plots for the singleplex sequencing idealised DAH55/DAH56 mixtures.
Figure 3.13 DAH56 CN informed, coverage vs. allelic prevalence and cellular prevalence plots for the singleplex sequencing idealised DAH55/DAH56 mixtures.
Figure 3.14 Similarity matrix plot for singleplex PCR sequencing DAH55/DAH56 idealised mixtures (No copy number information).
Figure 3.15 DAH55 CN informed similarity matrix plot for singleplex PCR sequencing DAH55/DAH56 idealised mixtures.
Figure 3.16 DAH56 CN informed similarity matrix plot for singleplex PCR sequencing DAH55/DAH56 idealised mixtures.
Figure 3.17 PyClone cellular prevalence plots for the singleplex sequencing idealised DAH55/DAH56 mixtures, No copy number information used.
Figure 3.18 DAH55 CN informed PyClone cellular prevalence plots for the singleplex sequencing idealised DAH55/DAH56 mixtures.
Figure 3.19 DAH56 CN informed PyClone cellular prevalence plots for the singleplex sequencing idealised DAH55/DAH56 mixtures.
Figure 3.20 PyClone cellular prevalence plots of the non-homogenous SNVs for the singleplex sequencing idealised DAH55/DAH56 mixtures.
Figure 3.21 PyClone cellular prevalence plots of the homogenous SNVs for the singleplex sequencing idealised DAH55/DAH56 mixtures.

List of Abbreviations

CAF - Cancer Associated Fibroblast
CNA - Copy Number Alteration
CN - Copy Number
dNTPs - Deoxynucleotide Triphosphates
ddNTPs - Dideoxynucleotide Triphosphates
ECM - Extra Cellular Matrix
HMEC - Human Mammary Epithelial Cells
LOH - Loss of Heterozygosity
Oligos - Oligonucleotides
PCR - Polymerase Chain Reaction
qPCR - Quantitative Polymerase Chain Reaction
SBS - Sequencing by Synthesis
TNBC - Triple Negative Breast Cancer

Glossary

Cellular Mutation Prevalence - The number of tumour cells in which somatic mutations occur (the product of the distribution of mutations over clones and the size of each subclonal population); see Roth et al. (2014)
Clone – A set of cells descended from a single common ancestral cell
Clonal Evolution – The accumulation of mutational changes over time that form a new generation of clones within a population
Clonal Genotype – The set of unique fixed mutations that define the clone
Clonal Lineage – Groups of cells that are related by descent
Clonal Prevalence – The proportion of cells in a population harboring a given genotype
Next Generation Sequencing - Sequencing methods that have been developed in the last 10 years and that do not use the older “Sanger” style sequencing methodology

Acknowledgements

Firstly, I would like to thank and express my sincere gratitude to my supervisor Dr. Sam Aparicio for allowing me the opportunity to pursue a project in his lab. I am extremely grateful for the guidance, support, and encouragement and for his generous contribution of knowledge and experience during my time as a student. I would also like to thank my committee members Dr. Sohrab Shah, Dr. Dixie Mager and Dr. Martin Hirst for their time and invaluable advice. Thank you to all the wonderful members of the Aparicio lab and molecular genetics department, who have been a fantastic group to work with and made my time here enjoyable. I would especially like to thank Jas Khattra, who has provided valuable guidance and support during my time as a student, and for providing lots of laughs. A big thank you also to Arusha Oloumi, Damian Yap, Tehmina Masud, Adi Stief, Jazmine Brimhall and Charles Soong for sharing their knowledge and encouragement. Also thank you to all the lab staff and students. I would also like to thank Karey Shumansky, Rad Aniba, Hossein Farahani and Camila de Souza, who completed the data analysis that generated the figures for my project and dealt with my numerous requests to make changes to the analysis on a weekly basis. I also want to thank Cheryl Bishop, the Medical Genetics program assistant, who has had all the answers to my numerous questions and has been a valuable source of support.
Special thanks are owed to my parents, family and friends, who have encouraged me and kept me focused and on track when I have wanted to give up. This thesis would not have been possible without their support, love and faith in my ability to achieve this goal.

Dedication

Dedicated to my parents, Ross and Gail Billings.

To my wonderful parents, who my entire life have provided me with so much encouragement, support, inspiration and love, and who always told me that I could achieve whatever I put my mind to. The amazing strength shown by my parents during some extremely hard times over the years, while never wavering in the love and support shown to my brother and me, is a testament to the incredible and inspiring people they are. This thesis is dedicated to my parents, my heroes.

Chapter 1: Introduction

1.1 Cancer today

Cancer is a multifaceted disease that annually accounts for more than 15% of deaths worldwide (Varmus & Kumar, 2013). To date more than 100 distinct diseases have been classified as cancer (Stratton, Campbell, & Futreal, 2009a), and together they place a huge economic burden on the health system. In order to lessen this burden, earlier and more accurate diagnosis and targeted treatments to prevent relapse are required.

Cancer is characterised by the uncontrolled and sometimes rapid proliferation of cells that have acquired the ability to outgrow normal tissue cells, to move beyond the normal boundaries of their tissue microenvironment and to metastasize to other parts of the body (Stratton, Campbell, & Futreal, 2009b). Cancer cells are essentially cells that are defective in the regulation of proliferation and homeostasis (Hanahan & Weinberg, 2000). The hallmarks of cancer as described by Hanahan and Weinberg (2000) are alterations of six biological processes that lead to the development of cancer.
These six hallmarks of cancer are the ability of cells to evade apoptosis, the capacity for tissue invasion and metastasis, limitless replicative potential, insensitivity to anti-growth signals, sustained angiogenesis and self-sufficiency in growth signals (Hanahan & Weinberg, 2000). These changes in the cells are acquired during the evolution of the tumour and enable progression and genomic instability. In recent years these hallmarks have been studied in greater detail using omics approaches, giving a better understanding of cancer biology. The framework has since been extended with further additions, including the deregulation of cellular energetics and genomic instability and mutation (Hanahan & Weinberg, 2011). For the purposes of this thesis project, the addition of note is genomic instability and mutation. Genomic instability can be the main cause of heterogeneity within a tumour, and this heterogeneity can result in phenotypic features that lead to treatment failure and relapse.

The process of tumour formation is understood to begin with a somatic mutation within a cell of a particular organ. Over time the originally mutated cell or clone will acquire additional mutations, and the cancer will gain sub-clones that may have acquired characteristics, different from those of the original cell, that bestow a survival advantage (Kreso et al., 2013). Although it has been known for many years that changes in the genetics of cells drive tumour progression, it is only recently, with advances in technology, that understanding has increased of how tumours are formed and how the mutations that cause them arise (Bunting & Nussenzweig, 2013).
DNA sequencing technologies have revealed that single nucleotide variants (SNVs) are the most prevalent mutations found in human tumours, although only a small proportion of these actually alter the DNA sequence that codes for a protein; such a change may confer a proliferative and selective advantage (Loeb, 2011). DNA sequencing has also shown that the stepwise accumulation of these mutations is the key to cancer formation and can be used to trace the evolution and clonal expansion of a tumour, allowing for a greater understanding of the complexities of tumour progression (Forment, Kaidi, & Jackson, 2012; Greaves & Maley, 2012). Increased knowledge and understanding of cancer biology, in particular tumour heterogeneity and evolution, will lead to more effective cancer treatments and fewer therapeutic failures, easing the burden on the global health system and decreasing the number of cancer deaths worldwide.

1.2 Heterogeneity

Tumour heterogeneity is characterised by distinct subpopulations of cells that differ in their genotypes and phenotypes (Fisher, Pusztai, & Swanton, 2013). The differences can be observed in the morphology and histology of the cells, gene and marker expression, metastatic ability, proliferative potential and spatial diversity (De Sousa E Melo, Vermeulen, Fessler, & Medema, 2013; Gerashchenko et al., 2013; Marusyk, Almendro, & Polyak, 2012).

Heterogeneity is present to some extent in all cancers, and cancers such as breast, ovarian and renal carcinomas and melanomas have been shown to be highly heterogeneous (Abelson et al., 2012; Gerlinger et al., 2012; Menzies et al., 2014; Shah et al., 2012). A number of factors have been shown to contribute to the heterogeneity of a tumour, including genetic variation, epigenetic changes, and the influence of the microenvironment of the tumour (De Sousa E Melo et al., 2013; Gerashchenko et al., 2013).
Heterogeneity of tumours can be observed histologically by examination of tissue sections from various regions of a tumour, and until the advent of gene expression profiling and next generation sequencing (NGS), histological observation was one of the only ways to characterise a tumour (Navin et al., 2010). In breast cancers for example, a highly heterogeneous collection of cancers, histological observation enabled the classification of a tumour as either in situ carcinoma or invasive (infiltrating) carcinoma, with each of these types further categorised into subclasses based on growth patterns that could be seen microscopically (Malhotra, Zhao, Band, & Band, 2010). This classification system helped determine treatment options based on the morphological heterogeneity of the tumour. Histological characterisation is still used as a tool in the classification of some tumours. We do however now understand that it is not just the phenotype of the tumour that needs classification: the cells' genotype and the genetic heterogeneity of the tumour must also be classified and understood in order to tailor effective treatment of an individual tumour.

Genetic heterogeneity is the most studied and well known cause of tumour heterogeneity. It can be caused by mutational events such as translocations, deletions, amplifications and aneuploidy that occur over time and are a major contributor to tumour heterogeneity (Navin et al., 2010). These genetic changes may cause genomic instability within the cells, which is one of the hallmarks of cancer. The very first genetic alterations/mutations that initiate a tumour will remain constant as the tumour evolves and become “fixed”, meaning that these mutations will be present in 100% of the new sub-populations / sub-clones of cells as the tumour progresses (Hanahan & Weinberg, 2011; Hayes & Paoletti, 2013).
Over time more changes will arise as each new sub-clone is formed, further increasing the complexity of the tumour. This is known as clonal evolution and will be discussed in detail later in this chapter.

Microenvironmental factors, although not fully understood, also contribute to the heterogeneity of tumours. As a tumour progresses there is a significant shift in the microenvironment that includes changes to the extracellular matrix (ECM), stromal cell complexity and the extrinsic factors that are controlled and produced by stromal cells, as well as exposure of the cells to hypoxia, cancer-associated fibroblasts (CAFs) and a number of inflammatory cells that infiltrate the tumour and affect the phenotype of the cells (De Sousa E Melo et al., 2013; Marusyk et al., 2012). It has been shown that the microenvironment is able to serve as a selective factor in the Darwinian evolution of the tumour. A microenvironment that offers only weak growth-promoting properties results in the growth and selection of more aggressive and invasive clones, while a permissive microenvironment supports a greater variety of clones of varying aggressiveness (cell-autonomous growth) (Anderson, Weaver, Cummings, & Quaranta, 2006). CAFs can promote tumourigenesis, increase proliferation and cytokine secretion and change the cellular matrix of the tumour, which in turn causes more heterogeneity among the cells that are exposed to these CAFs (Junttila & de Sauvage, 2013). The microenvironment may also differ spatially within a tumour. For example, intra-tumour heterogeneity may result if there is an area of inflammation. This localised inflammatory site will have increased secretions from infiltrating immune cells and may have increased DNA damage caused by oxidative stress at the site of inflammation.
This leads to changes in the evolution of tumour cells at the site of inflammation and results in a set of sub-clones that are totally distinct from clones in an area where no inflammation is present (De Sousa E Melo et al., 2013; Meira et al., 2008). The microenvironment itself can be said to be heterogeneous: its composition differs between tumours and within a single tumour, and evolves as the tumour progresses (Anderson et al., 2006; De Sousa E Melo et al., 2013).

There has been an increase in the study of tumour heterogeneity over the past few years, as this heterogeneity plays a role in progression and metastasis, and the heterogeneous nature of tumours may dictate the success or failure of therapeutic treatments (Michor & Polyak, 2010). The advent of new technologies such as next generation sequencing (NGS) has made possible the elucidation of heterogeneity in a number of different cancers. Heterogeneity can be broken down into two categories. The first, known as inter-tumour heterogeneity, is variability between individuals, or within an individual between the primary tumour and metastases. The second, known as intra-tumour heterogeneity, is variability within a tumour itself (Shibata & Shen, 2013), as shown schematically in figure 1.1.

Figure 1.1 Tumour heterogeneity: inter-tumour and intra-tumour heterogeneity. A. Inter-tumour heterogeneity between patients. Patients diagnosed with the same type of cancer and tumour subtype display differences in the genotypes and phenotypes of the tumours. B. Inter-tumour heterogeneity in an individual, showing divergent subpopulations of cells between the primary tumour and metastases. C. Intra-tumour heterogeneity in an individual tumour, showing the clonal evolution of the tumour and subclones that expand after selective pressure events such as drug therapy.
1.2.1 Inter-tumour heterogeneity

Inter-tumour heterogeneity can be described as the differences between tumours of the same type and tissue location in different individuals (Shibata & Shen, 2013). This diversity between individuals means that although the tumours are of the same type, the make-up of the population of tumour clones causes behavioural and biological differences in the tumours and may result in the need for different treatment options for each individual, as the same type of cancer may respond differently to treatment in each tumour (Gerashchenko et al., 2013).

A study by Shah et al. (2012) demonstrated vast inter-tumour differences among 104 triple negative breast cancer (TNBC) patients. The authors investigated TNBC patients from the time of diagnosis and observed that even in the early stages of cancer development there was great variation in somatic mutations between patients. They used deep re-sequencing measurements of allelic abundance in over 2000 somatic mutations and revealed that TNBCs have large variability in the clonal composition of tumours between patients at diagnosis. The results of this study identified inter-tumour heterogeneity as an important factor in understanding the biology of TNBC tumours, and the clonal prevalence within individual tumours may be important when deciding the best therapeutic treatment for the patient.

In another study, Menzies et al. (2014) investigated inter-tumour heterogeneity in patients with metastatic melanoma. This study used response to therapy as a way to understand heterogeneity and drug sensitivity between patients with a very specific type of melanoma. The study followed 24 patients who had been diagnosed with MAPK inhibitor naïve BRAF-mutant melanoma and treated with combination drug therapy. CT scans were taken at baseline and then every 8 weeks to measure the response of the tumour to therapy.
The authors found a range of responses to treatment: only 4 of the patients had a complete overall response, while nearly all patients had a complete initial response in individual metastases. Although all patients had an initial response to the drugs, patients who had a more varied response to treatment also had shorter survival. The authors concluded that inter-tumour heterogeneity is an important factor in treatment and survival outcomes and needs to be investigated further in patients with metastatic melanoma (Menzies et al., 2014).

In breast cancers, inter-tumour heterogeneity is well known and has been extensively studied. These studies included expression profiling to enable the classification of breast cancers into a number of subtypes based on the presence or absence of particular receptors and growth factors. The subtypes include luminal A, luminal B, HER2-positive and triple negative (basal, claudin-low, non-basal phenotype). These subtypes exhibit great variation in sensitivity and resistance to treatments, which has meant that treatment is directed by the receptors and growth factors that are present in the tumour (Gerashchenko et al., 2013; Hayes & Paoletti, 2013; Sørlie et al., 2001; Voduc et al., 2010). Although breast cancers are classified into one of the different subtypes, each tumour may contain some cells from another subtype (Fig 1.2). For example, cells with a luminal A phenotype are predominant in luminal A tumours, although these tumours may also contain a small number of basal-like cells, further increasing the heterogeneity of the tumour (Polyak, 2011).

Chen et al. (2012) studied non-small cell lung cancer in the largest inter-tumour heterogeneity study conducted between primary lung tumours and their corresponding metastases, investigating patients with epidermal growth factor receptor (EGFR) mutations who had a mixed response to treatment with tyrosine kinase inhibitors (TKIs).
The group performed direct sequencing on a large number of samples and found that in some cases the primary tumour had wild type EGFR while the metastases displayed the mutant EGFR genotype. Chen et al. also identified that EGFR mutation status was higher in patients who had multiple pulmonary nodules than in patients with only one nodule. It was also shown that the amplification and overexpression of EGFR lowers the efficacy of the TKIs, and that the TKIs may actually have a mutagenic effect in these patients. This study suggests that serial biopsy samples of the primary tumour and biopsies of the metastases are required to gain a fuller understanding of the cancer in each patient and to prevent treatment failure (Chen et al., 2012).

Inter-tumour heterogeneity can also be seen within an individual, between the primary tumour and its metastases. The divergence of the subclones from the tumour to the metastatic sites is important when making clinical treatment decisions during the progression of the tumour (Bedard, Hansen, Ratain, & Siu, 2013; Fisher et al., 2013; Shah et al., 2009).

Figure 1.2 Inter-tumour heterogeneity. This model shows the inter-tumour heterogeneity of breast cancers between individuals and between the cancer subtypes. The basal-like subtypes are mainly composed of basal-like cancer cells and the luminal subtypes are composed predominantly of luminal cancer cells; differences in the prevalence of the cell types allow for further heterogeneity within the cancer subtypes. (Adapted from Polyak, 2011).

1.2.2 Intra-tumour heterogeneity

An individual tumour may show remarkable diversity in the genotypes of the cells that make up the tumour. Different markers may also display variable expression by groups of cells in different locations of the tumour (Gerashchenko et al., 2013). These sub-populations of cells may differ with location in the tumour and with the evolution of the tumour.
Diversity within an individual tumour is known as intra-tumour heterogeneity. Intra-tumour heterogeneity is a great challenge in providing effective treatment and a good survival outcome for the patient. Resistance of a cancer to therapy, the biggest obstacle to the effective treatment of a tumour, may be attributed to intra-tumour heterogeneity (Abelson et al., 2012; Gerashchenko et al., 2013). Intra-tumour heterogeneity can be studied in the context of the clonal evolution of the tumour, so that over time therapy can be targeted towards the most prevalent clones. Therapy may cause further heterogeneity of the tumour, as resistant clones may survive and over time evolve to produce new, stronger sub-clonal populations. For this reason it is important to study the tumour for an extended period of time, and importantly after the course of therapy, as the prevalence of the clones may change and treatment options may need to be changed to counter these new clones.

Different cancer types also show different levels of intra-tumour heterogeneity, and many studies have investigated the complexity and intra-tumour heterogeneity of individual tumours in a range of cancer types.

A study by Gerlinger et al. (2012) evaluated tumour samples from four patients with metastatic renal-cell carcinoma. Biopsy samples from numerous sites, including the primary tumour and metastatic sites, and a germline DNA sample were collected before a 6 week treatment regime began, and multiple regions of the nephrectomy specimen were collected after a 1 week washout period during which the patient did not receive any drug therapy (Gerlinger et al., 2012). Whole exome sequencing and SNP array analysis were performed on the samples.
The results of this study revealed branched evolutionary tumour growth in the patients, and up to 69% of all the somatic mutations could not be detected in every tumour region studied. A number of genes showed mutational intra-tumour heterogeneity and underwent numerous different and spatially diverse inactivating mutations within a single tumour, which suggests a form of Darwinian selection and evolution (Gerlinger et al., 2012). This study revealed that intra-tumour heterogeneity can lead to misrepresentation of the mutational and genomic landscape when only a single tumour biopsy is examined. To ensure the best individualised therapy, multiple samples of individual tumours and metastatic sites are required for a representative picture of the tumour.

The influence that chemotherapy and drug treatments have on intra-tumour heterogeneity is of particular importance, as shown by Kreso et al. (2013), who investigated the influence of chemotherapy on intra-tumour heterogeneity in colorectal cancers. Using copy number alteration (CNA) profiling, sequencing and lineage tracking, the authors followed the repopulation dynamics of 150 single marked lineages from a number of colorectal cancers through multiple xenograft passages in mice. They also investigated how different clones responded to the introduction of a chemotherapeutic drug. This extensive study allowed the authors to conclude that colorectal cancer clones from the same lineage showed great variability, and that chemotherapy caused the dominance of lineages and clones that before treatment were dormant or minor populations (Kreso et al., 2013). This study reinforced the notion that intra-tumour heterogeneity is complex and ever changing within the tumour, and that chemotherapy and drug treatments may actually increase heterogeneity.
From the above-mentioned studies and many others it can be concluded that intra-tumour heterogeneity is the main determinant of therapy response/failure and recurrence of the disease (Bedard et al., 2013). Thus, in order to establish an effective clinical regime for patients, intra-tumour heterogeneity must be contained.

1.3 Clonal evolution

Clonal evolution is a model of cancer progression that was proposed by Peter Nowell in 1976. In this landmark paper, Nowell proposed a mechanism of clonal evolution by which a cancer develops from a unitary cell of origin, a “clone”. Tumour progression follows and is due to the stepwise acquisition of somatic mutations that, under selection, provide an advantage for the fitter clones (Nowell, 1976). The fittest clones will have somatic mutations that confer survival advantages such as an increase in proliferative, survival or invasive ability and drug resistance (Podlaha, Riester, De, & Michor, 2012; Yachida et al., 2010).

Nowell’s 1976 model has been widely accepted and is often compared to Darwin’s basic principle of species evolution as depicted in his evolutionary tree diagram (Figure 1.3). Darwin’s 1837 notebook sketch depicts a phylogenetic tree in which branches A, B, C and D are all descended from a single originator. Darwin’s theory of evolution also states that natural selection takes advantage of successive variations and occurs over time, allowing only the fittest to survive and reproduce, a process that is seen in cancer progression and clonal evolution (Darwin, 1859; Greaves & Maley, 2012).

Clonal evolution has three central concepts: clonal genotype, clonal lineage, and clonal prevalence. Clonal genotype can be described as the collection of unique, stable, heritable marks/mutations that define the specific clone.
Clonal lineage is described as the relationships between all the clones and their evolution over time, or simply put, from where each clone is descended (figure 1.3B). The third concept in tumour clonal evolution studies is clonal prevalence, a measure of the presence of a specific mutation within the clonal population that takes into account the size of the clonal population carrying that mutation, that is, how often the mutation/mark appears among the cells (Aparicio & Caldas, 2013). For example, if mutation ‘A’ was in the founding clone of 20 progeny clones, it would have a clonal prevalence of 100%, being found in 20/20 clones, whereas a mutation ‘C’ that arose after 2 ancestral mutations and is found in 10/20 clones would have a clonal prevalence of 50%. Knowledge of these concepts enables the study of clonal evolution by identifying allelic frequencies with NGS techniques, which provides insight into which clones are driving the clonal expansions and may allow for a more specific treatment regime.

Since Nowell’s publication nearly 4 decades ago, numerous studies have supported and validated his mechanism of cancer evolution (Abelson et al., 2012; Fidler & Hart, 1982; Gerlinger et al., 2012; Landau et al., 2013; Navin et al., 2010; Shah et al., 2009; Sottoriva et al., 2013; Yachida et al., 2010). The advent of technologies such as whole genome and transcriptome sequencing has been a major driver for the study of clonal evolution and has enabled the elucidation of clonal and sub-clonal populations by single nucleotide variants, including at the single cell level (Bedard et al., 2013; Shah et al., 2009).

One of the first studies to utilise genome and transcriptome sequencing in the investigation of clonal evolution of individual tumours was carried out by Shah et al. (2009).
The group sequenced a metastatic tumour from an individual patient at depth, and measured the frequency of mutations in genomic DNA from the primary tumour diagnosed 9 years before collection of the metastases. Thirty-two somatic coding mutations were identified in the metastases, of which 5 were also seen in the primary tumour, 6 were present but at much lower frequencies, 19 were not identified at all in the primary tumour and 2 were undetermined (Shah et al., 2009). The results demonstrated how single nucleotide mutations can reveal that significant evolution occurs with the progression of the cancer. The study also concluded that using sequencing technologies to investigate different cell populations within a tumour is important for understanding evolution at early and late stages of tumour progression. Sequencing also allows for the estimation of the allelic frequency and prevalence of mutations in the tumour and can reveal the mutations that initiated the tumour (Shah et al., 2009).

Tumour evolution can also be studied at the level of a single cell using single-cell sequencing. Several papers published in recent years use single cells to track the evolution of tumours (Hou et al., 2012; Navin et al., 2011; Xu et al., 2012). The study by Navin et al. (2011) was one of the first to use single-cell sequencing to track tumour evolution. The group investigated two TNBCs, one a polygenomic tumour and the other a monogenomic tumour with paired metastatic liver tumours. The primary tumours were microdissected into 12 sectors, and after FACS sorting and ploidy analysis, 100 cells from multiple sectors and ploidy fractions were sequenced. Copy number profiles of all cells were determined. For the polygenomic tumour, 3 distinct subclonal populations and evolutionary branches were identified, suggesting that significant clonal expansion occurred between the primary tumour and the metastases.
The copy number profiles of the monogenomic primary tumour and its metastases showed that a single clonal expansion gave rise to the primary tumour and both metastatic tumours (Navin et al., 2011). The study showed that single cell analysis of numerous cells across different sections of a primary tumour and its metastatic sites enables copy number profiling that can reveal valuable information about the way the tumour evolves and the clonal expansions that occur over time (Navin et al., 2011).

Another study showing the value of genome sequencing for clonal evolution was conducted by Ding et al. (2012). The group investigated relapse in acute myeloid leukemia (AML), which is of great significance as most patients will die after relapse of the disease. The primary tumour and relapse tumour genomes of 8 patients were sequenced, revealing a number of recurrent but novel mutations and 2 important patterns of clonal expansion in relapse. In the first pattern, a single founding clone from the primary tumour gave rise to the relapse after the acquisition of multiple mutations. In the second pattern, a sub-clone survived treatment, then gained mutations and expanded, causing relapse. The study showed that clonal evolution in AML is in part driven by the chemotherapy that is used to treat the cancer. Because of this, it will be of great importance in future to take each AML patient’s genome into consideration, identify the founding clone and sub-clones, and base treatment options on the clonal evolution in each individual (Ding et al., 2012).

The above-mentioned studies, and the numerous other studies investigating clonal evolution that have been performed to date, all show that the process of clonal evolution is one of the main factors that give rise to heterogeneity within tumours.
The way a tumour evolves is of great importance in understanding the biology of cancer formation and in choosing a treatment regime that will be successful at preventing recurrence. Clonal evolution may occur over a long period of time (decades), such as in some breast cancers, or in a short time frame (years) in cancers such as AML, but in each case clonal expansion can lead to metastasis, relapse and eventual death, making the investigation of a cancer’s clonal evolution imperative to future successful drug therapies, monitoring after drug therapy and total remission of the cancer. In the future, using technologies such as NGS in the analysis of tumours could provide clonal prevalence information on mutations that could direct a clinician’s targeted treatment of the most prevalent mutations, improving the survival outcome of the patient.

Figure 1.3 Darwin’s evolutionary tree and clonal evolution. A) The sketch from Darwin’s 1837 notebook. He depicts branches A, B, C and D as descended from one common ancestor (1), with branches and lineages formed over time. B) Clonal evolution of a tumour. The clonal genotype is represented by the letters that signify the mutations/marks present in the clone. The clonal lineage shows how the clones are descended over time. (Adapted from Aparicio & Caldas, 2013).

1.4 Sequencing

Deep sequencing methods refer to the massively parallel sequencing methods that have been developed in the last 10 years and that do not use the older “Sanger” style sequencing methodology (Soon, Hariharan, & Snyder, 2013). In 1977 Fred Sanger developed a sequencing method that was the gold standard of sequencing for the next 30 years. “Sanger” sequencing, also referred to as “first generation” sequencing, was based on the chain-termination method. This method uses a mixture of deoxynucleotide triphosphates (dNTPs) and di-deoxynucleotide triphosphates (ddNTPs).
The ddNTPs cause termination of the elongation step as they lack the 3′-OH group that is required for the addition of further nucleotides. The fragments that are generated differ from each other in chain length by one base. The fragments from the four ddNTP reactions can then be separated by size by running them in parallel on polyacrylamide electrophoresis gels. The pattern of bands can then be read to obtain the sequence of the original DNA. The ddNTPs can also be labelled and more easily detected on automated sequencing machines, which were introduced in 1987 and made Sanger sequencing faster and less laborious (Grada & Weinbrecht, 2013; Liu et al., 2012; Sanger & Nicklen, 1977).

Sanger sequencing was the technology used for the human genome project, a massive undertaking that began in 1990 with the aim of sequencing the entire genome at a cost of $US3 billion. It was a slow and labour-intensive project and took 13 years to complete the sequencing of the entire human genome (Grada & Weinbrecht, 2013). Since the completion of the human genome project there has been a drive to find a faster and more cost-effective method for genome sequencing. The past 5 years have seen a rapid change in the technologies for genome sequencing, and this has dramatically dropped the cost of a genome from $3 billion to only $3,000-5,000, a price that is continuing to fall in the fashion of Moore’s law: the cost is more than halving every 2 years (DeVita & Rosenberg, 2012). There has also been a tremendous decrease in turnaround time, from 13 years to only 1-4 days per genome at the present time (Grada & Weinbrecht, 2013; Soon et al., 2013).
The advancement in high throughput sequencing has allowed for the discovery and exploration of new genes, the identification of known disease-causing mutations and the diagnosis of certain conditions. It has also enabled clonal evolution studies and RNA sequencing that delivers information on the entire transcriptome, all of which are of great benefit to researchers, biologists and healthcare providers alike (Grada & Weinbrecht, 2013; Liu et al., 2012; Loman et al., 2012). Current technology differs from Sanger sequencing in that it performs massively parallel analysis, sequencing thousands to millions of molecules at one time, at a reduced cost and much faster than Sanger sequencing (Grada & Weinbrecht, 2013; Liu et al., 2012).

One of the deep sequencing platforms that has been used widely in recent years is a bench top sequencing system by Illumina called MiSeq. The MiSeq platform was used to perform all sequencing experiments for this thesis project.

1.4.1 MiSeq

The MiSeq is a sequencing platform that is small enough to fit on a bench top and provides more focused sequencing and fast turnaround times. It is aimed at medium size laboratories and clinical diagnostics, for the purpose of small genome sequencing, targeted sequencing, amplicon sequencing, targeted gene expression and metagenomics (Illumina, 2014b; Quail et al., 2012). The MiSeq can produce read lengths of 2 × 300 bp and an output of up to 15 Gb with 25 million sequencing reads. The MiSeq also allows up to 384 samples to be indexed and sequenced in one run (Illumina, 2014b). A number of publications have compared the MiSeq with other sequencing platforms such as the Ion Torrent and 454 GS Junior, and the collective results show that the MiSeq had the lowest error rates, the highest throughput, and the broadest range of applications (Liu et al., 2012; Loman et al., 2012; Quail et al., 2012).
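As a rough consistency check on the MiSeq figures quoted above, the read length, read count and total output can be related by simple arithmetic. The sketch below assumes that the 25 million figure counts clusters, each sequenced as a pair of 300 bp reads; that interpretation is an assumption made here for illustration, not a statement from the cited specification.

```python
# Assumption: 25 million refers to clusters (read pairs), each read 2 x 300 bp.
clusters = 25_000_000
reads_per_cluster = 2      # paired-end: one forward and one reverse read
read_length_bp = 300

total_bases = clusters * reads_per_cluster * read_length_bp
total_gb = total_bases / 1e9   # gigabases

print(f"Approximate run output: {total_gb:.0f} Gb")
```

Under this assumption the three quoted figures agree with each other (15 Gb).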
The MiSeq uses reversible terminator sequencing by synthesis (SBS) chemistry (Bentley et al., 2008). This chemistry utilises four fluorescently labelled reversible terminator nucleotides, which compete with each other for incorporation on the DNA template. Bridge amplification forms the template clusters on the surface of a glass flowcell. Each time a labelled nucleotide is added to the growing DNA strand, the clusters are excited by a laser and emit a characteristic fluorescent signal, which is imaged and used to call the base; the label is then cleaved to allow the incorporation of the next base. Millions of clusters are sequenced in a massively parallel manner (Grada & Weinbrecht, 2013; Illumina, 2014a; Quail et al., 2012). Sequences from pooled sample libraries carry unique indices that are added during sample preparation, and reads are separated based on this indexing. For each unique sample, stretches of similar DNA sequence are clustered and the forward and reverse reads are paired, creating contiguous sequences that can then be aligned to the genome for variant calling (Illumina, 2014a, 2014b). The MiSeq is able to do preliminary data analysis and provides information on run performance and cluster generation, so that it can be determined whether downstream data analysis is worthwhile.

This thesis project used the TruSeq chemistry and barcodes from Nextera for the singleplex PCR sequencing method for MiSeq. TruSeq is a custom amplicon kit in which primer design is completed using Illumina software and primers are generated based on a multiplex design. The Nextera chemistry uses customer-designed primers, and for this project singleplex primer reactions were used. The Nextera chemistry differs from TruSeq in that fragmentation and tagging are done simultaneously in a single step (Illumina, 2014c).
The MiSeq is a robustly performing sequencing platform that yields quality sequencing data at an affordable price with a fast time to results.

1.5 Detection of allelic prevalence in biological data

As mentioned above, there is immense value in knowing the prevalence of clones and sub-clones within a tumour population. In order to show the diversity and presence of clonal populations within a tumour, the data must be analysed bioinformatically. The resulting analyses enable visualisation of clonal clusters, allelic prevalence and tumour evolution over time.

1.5.1 PyClone

One statistical method that has been developed for the calculation and visualisation of clonal frequencies within a population of cells is PyClone. This model for inferring clonal populations was developed by Roth et al. (2014). The model is used to analyse deeply sequenced mutations in order to identify and quantify clonal populations in a sample or tumour (Roth et al., 2014). PyClone is a hierarchical Bayesian model that uses measured allelic prevalence to estimate the proportion of the tumour cell population that carries a specific mutation. These measurements can also quantify the clonal populations by assuming that mutations that fall into the same place in the clonal phylogeny share the same cellular prevalence (Roth et al., 2014). The allelic prevalence of a mutation is estimated once the loss of heterozygosity (LOH) state, copy number state and cellularity have been taken into consideration, and this information is used for the final allelic prevalence values (Roth et al., 2014; Shah et al., 2012). Once the allelic prevalence has been calculated it can be graphed for visualisation and the spread of the clusters shown.
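To make the relationship between allelic and cellular prevalence concrete, the following sketch uses the commonly used point-estimate relation between variant allele fraction, cellularity and copy number. It is a deliberately simplified illustration and not PyClone's hierarchical Bayesian model; the function names, parameter names and example values are assumptions introduced here, and it assumes every tumour cell has the same total copy number at the locus.

```python
def expected_vaf(cell_prevalence, purity=1.0, mutant_copies=1,
                 tumour_copies=2, normal_copies=2):
    """Expected variant allele fraction at one locus (simple point estimate)."""
    mutant_alleles = purity * cell_prevalence * mutant_copies
    total_alleles = purity * tumour_copies + (1.0 - purity) * normal_copies
    return mutant_alleles / total_alleles


def prevalence_from_vaf(vaf, purity=1.0, mutant_copies=1,
                        tumour_copies=2, normal_copies=2):
    """Invert expected_vaf: fraction of tumour cells carrying the mutation."""
    total_alleles = purity * tumour_copies + (1.0 - purity) * normal_copies
    return vaf * total_alleles / (purity * mutant_copies)


# Hypothetical case: a heterozygous SNV private to a diploid cell line
# that makes up 10% of an idealised two-line mixture (purity = 1).
vaf = expected_vaf(0.10)
print(f"expected allelic prevalence: {vaf:.3f}")
print(f"recovered cellular prevalence: {prevalence_from_vaf(vaf):.3f}")
```

In a diploid mixture the cellular prevalence is simply twice the allelic prevalence of a heterozygous SNV; the copy number and purity arguments show why that shortcut no longer holds for aneuploid or impure samples.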
For the purposes of this thesis, PyClone data for allelic and cellular mutation prevalence (the result of PyClone model fitting) are shown graphically in PyClone plots and in similarity matrix plots.

1.6 Overarching hypotheses

Tumours often comprise functionally heterogeneous mixtures of cells whose abundance varies spatially and temporally in response to microenvironmental changes and intrinsic genomic determinants. Measuring the dynamics of clones defined by allelic sequence marks is an efficient approach to determining clinically relevant subpopulations of cells in a tumour.

I hypothesise that, for the elucidation of clonal structure in idealised cell mixtures:
1) Multiplex PCR-derived amplicon sequencing results in greater accuracy of allele prevalence estimates than singleplex PCR-derived amplicon sequencing;
2) Mixing samples by DNA concentration, as contrasted with mixing by cell number, results in more accurate idealised mixtures;
3) Copy number-complex mixtures require copy number-aware clonal analysis.

1.7 Thesis objectives

This project was designed to simulate aspects of tumour heterogeneity in vitro in order to gain a better understanding of the influence of experimental parameters on the measurement of allelic and cellular prevalence used to inform tumour clonal structure. To create this simulation of heterogeneity, various cell lines with known genomic sequence features were mixed in defined proportions.

Chapter 2: Materials and Methods

2.1 Cell lines

The cell lines used in this project were selected on the basis of exome and SNP6 copy number data in order to pick target SNV positions for each cell line. The HCT116 and 184-hTERT-L2 lines were chosen as they are relatively copy number simple. The ovarian cell lines were chosen as they are derived from one individual and are copy number complex, and thus provide a more biologically relevant model.
2.1.1 HCT116 WT

This cell line was derived from the colon of an adult male with colorectal carcinoma (Brattain, Fine, Khaled, Thompson, & Brattain, 1981). The cell line was obtained from the ATCC (Manassas, Virginia) and has been maintained in the Aparicio lab.

2.1.2 184-hTERT-L2

This cell line was derived from human mammary epithelial cells (HMEC) and immortalized by transduction with hTERT (Yaswen & Stampfer, 2002). The 184-hTERT-L2 cell lines used in these experiments were constructed in the Laboratory of Molecular Carcinogenesis at the NIEHS by Dr. Carl Barrett and were gifted to the Aparicio laboratory.

2.1.3 TOV3133D and TOV3133G

These ovarian cancer cell lines were derived from solid tumours of the right (TOV3133D) and left (TOV3133G) ovary of a 52-year-old female, and established using the scrape method (Létourneau et al., 2012). The cell lines in this experiment were a gift from the laboratory of Dr. Huntsman, Vancouver. In the results section of this thesis, these cell lines are referred to as DAH55 (TOV3133D) and DAH56 (TOV3133G), the identifiers given to the cell lines upon exome sequencing.

2.2 Cell culture techniques

2.2.1 Cell culture

184-hTERT cell lines were cultured at 37°C, 5% CO2, in serum-free mammary epithelial cell basal medium (MEBM, Lonza), supplemented with mammary epithelial cell growth medium SingleQuots (Lonza), 5 µg/ml transferrin (Sigma) and 1.25M of isoproterenol (Sigma Aldrich). HCT116 cell lines were cultured at 37°C, 5% CO2, in McCoy's 5A medium (Sigma Aldrich) supplemented with 10% FBS (Sigma Aldrich). TOV3133D and TOV3133G cell lines were cultured at 37°C, 5% CO2, in a 1:1 mix of media 199 (Sigma Aldrich) and media 105 (Sigma Aldrich) supplemented with 10% FBS.

2.2.2 Cell counting

The cells were grown until confluent and then passaged. The cells were counted using a dual chamber haemocytometer (Hausser Scientific) after staining with Trypan blue (Gibco).
To calculate the number of cells per mL, I used the following formula: (cell number / number of quadrants counted) x dilution factor x 10^4 = number of cells per mL. This concentration was then used to calculate the volume required to make cell mixtures of defined proportions. E.g. for 1 million cells: volume (mL) = 1 x 10^6 x 1 / (number of cells per mL). Once cell numbers were calculated, the cells were centrifuged and then washed with PBS. Washed cells were frozen prior to cell mixing.

2.2.3 Cell mixing

Cells were pelleted according to cell number proportions and mixed prior to DNA extraction. For example, if a cell mixture of 0.9 184-hTERT and 0.1 HCT116 was required, these cell pellets were thawed and mixed together with PBS at the beginning of the DNA extraction. Cell mixing was also carried out according to each sample's DNA concentration. DNA was extracted from pelleted cells and quantified, and the extracted samples were then mixed in the required proportions according to DNA concentration. For example, all mixes were prepared at 5 ng/µl per reaction for 150 reactions (5 x 150 = 750 ng in total). The amount of DNA required from each sample was calculated first: 0.9 (90% of 750 ng) = 675 ng and 0.1 (10% of 750 ng) = 75 ng. The required volume of each sample was then calculated: if the 0.9 sample had a DNA concentration of 482 ng/µl, then 675/482 = 1.40 µl of sample was added, and if the 0.1 sample had a DNA concentration of 8.8 ng/µl, then 75/8.8 = 8.5 µl of sample was added to the mix. The required sample volumes were combined and made up with TE buffer to the volume required for qPCR.

2.3 PCR primer design and primer selection

2.3.1 Primer position selection

Target heterozygous SNV positions were selected for each cell line.
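The cell counting and DNA mixing arithmetic of Sections 2.2.2 and 2.2.3 can be sketched in Python as follows. This is a minimal illustration only; the function names and default arguments are hypothetical, and the example values are the ones quoted in the text.

```python
def cells_per_ml(cells_counted, quadrants_counted, dilution_factor):
    """Haemocytometer count converted to cells per mL (Section 2.2.2)."""
    return cells_counted / quadrants_counted * dilution_factor * 1e4


def volume_for_cells_ml(target_cells, concentration_cells_per_ml):
    """Volume (mL) of suspension that contains the requested number of cells."""
    return target_cells / concentration_cells_per_ml


def dna_mix_volumes_ul(proportions, stock_ng_per_ul,
                       ng_per_reaction=5.0, reactions=150):
    """Volume (µl) of each DNA stock needed for a mix (Section 2.2.3).

    proportions     -- fraction of the mix for each sample, e.g. [0.9, 0.1]
    stock_ng_per_ul -- measured DNA concentration of each sample
    """
    total_ng = ng_per_reaction * reactions  # 5 x 150 = 750 ng in the text
    return [fraction * total_ng / concentration
            for fraction, concentration in zip(proportions, stock_ng_per_ul)]


if __name__ == "__main__":
    # Worked example from the text: 0.9 sample at 482 ng/µl, 0.1 sample at 8.8 ng/µl.
    volumes = dna_mix_volumes_ul([0.9, 0.1], [482.0, 8.8])
    print([round(v, 2) for v in volumes])
```

With the concentrations quoted in the text, the sketch gives approximately 1.40 µl and 8.52 µl, matching the worked example to the precision reported there.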
The criteria were that, for the cell lines in a mixing experiment, 48 heterozygous positions needed to be unique to each cell line and 48 needed to be common to the two cell lines.

2.3.2 Illumina design studio

TruSeq Custom Amplicon sequencing primer design was completed using the DesignStudio program from Illumina (http://www.illumina.com/informatics/experimental-design/designstudio.ilmn). Target positions were entered, and the best 144 primers (48 unique to HCT116 WT, 48 unique to 184-hTERT-L2, and 48 shared positions) were selected based on how they performed in a multiplex reaction. Primers were ordered directly from Illumina and were supplied as a pool of 144 primers.

2.3.3 Primer3 design

The 2-Step PCR sequencing method utilised primers that were designed as singleplex primers. Chosen target positions were entered into Primer3, a program used for primer design (Primer3 File - http://primer3.sourceforge.net) (Untergasser et al., 2012). The following settings were used when designing the primers for these experiments:

P3_FILE_TYPE=settings
PRIMER_THERMODYNAMIC_PARAMETERS_PATH=/meta/o/oncoapop/Apps/primer3-2.3.5/src/primer3_config/
PRIMER_MISPRIMING_LIBRARY=/meta/o/oncoapop/Apps/primer3-2.3.5/src/humrep_and_simple.txt
PRIMER_LIBERAL_BASE=0
PRIMER_LIB_AMBIGUITY_CODES_CONSENSUS=0
PRIMER_MIN_SIZE=18
PRIMER_OPT_SIZE=22
PRIMER_MAX_SIZE=25
PRIMER_MIN_TM=57.0
PRIMER_OPT_TM=59.0
PRIMER_MAX_TM=63.0
PRIMER_PAIR_MAX_DIFF_TM=5.0
PRIMER_PRODUCT_SIZE_RANGE=150-170 140-190
PRIMER_EXPLAIN_FLAG=1
PRIMER_NUM_RETURN=5
PRIMER_MAX_NS_ACCEPTED=0
PRIMER_MAX_POLY_X=4
PRIMER_GC_CLAMP=2
PRIMER_MIN_LEFT_THREE_PRIME_DISTANCE=5
PRIMER_MIN_RIGHT_THREE_PRIME_DISTANCE=5
=

After Primer3 design, all selected target primers were validated with in-silico PCR using the UCSC online tool (http://genome.ucsc.edu/cgi-bin/hgPcr). Target positions that passed all design QC were used. To make the primers compatible with the MiSeq chemistry, adapter sequences were added to each primer.
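The adapter tailing step can be sketched in Python as below. This is a minimal illustration, not the procedure actually used to prepare the primer orders. The two overhang constants follow the orientation that appears in the primer table (Table 2.1), where forward primers begin with TCGTCGG... and reverse primers begin with GTCTCGT...; the function name and the input validation are hypothetical additions.

```python
# Adapter overhangs in the orientation shown in Table 2.1 of this thesis.
FORWARD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REVERSE_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"


def tail_primer_pair(forward_locus, reverse_locus):
    """Prepend the adapter overhangs to a locus-specific primer pair."""
    tailed = []
    for overhang, locus in ((FORWARD_OVERHANG, forward_locus),
                            (REVERSE_OVERHANG, reverse_locus)):
        # Strip stray whitespace and normalise case before checking the bases.
        locus = "".join(locus.split()).upper()
        if not locus or set(locus) - set("ACGT"):
            raise ValueError("primer must contain only A, C, G and T")
        tailed.append(overhang + locus)
    return tuple(tailed)


if __name__ == "__main__":
    # Locus-specific parts of the Chr1_40879952 primers listed in Table 2.1.
    forward, reverse = tail_primer_pair("CATGGATTTGTTGGGCCTTGG",
                                        "ACCACTACACCACACTGTCG")
    print(forward)
    print(reverse)
```

Stripping whitespace before concatenation is a deliberate choice here, since a stray space between the overhang and the locus-specific sequence would otherwise end up in the ordered oligonucleotide.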
Sequence information was supplied by Illumina as follows:
For the forward primer: 5' TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-[locus specific sequence]
For the reverse primer: 5' GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-[locus specific sequence]
Primers with the Illumina adapter sequence were ordered desalted from IDT at 0.5nM concentration, final volume 250 µl.

2.3.4 Primer selection

All primers that were singleplex-designed were tested for amplification performance using the qPCR method described above. PCR products were also run on a 2% agarose gel (Sigma Aldrich) to check the band size and quality for each primer pair. Primers that passed all QC checks were used in the experiments. In total, 48 primers for cell line 1, 48 primers for cell line 2 and 48 primers shared across cell lines 1 and 2 were selected. The selected primers and corresponding sequences for the experiments are shown in the tables below.

POSITION	FORWARD PRIMER	REVERSE PRIMER	TARGET
Chr1_40879952	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCATGGATTTGTTGGGCCTTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACCACTACACCACACTGTCG	Shared
Chr1_47496983	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAAACTTTGCAACCCTCAGGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGTTGGTGACGGTTTTCTGC	Shared
Chr1_47583415	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGCACCTGTAATCCAGCAGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTTGAGTGGTCTGGAGCC	Shared
Chr1_52266120	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTATAAGCTACCATGCCCGGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTGGGTGTGGGGGTTAAGG	Shared
Chr1_68603432	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTCATTCACCGTCTCCCAGT	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGGCATCTATGGGATGTGGA	Shared
Chr1_8404093	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGAAATTCCTCCCAGCGACG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGCCACCAGAGAGCAATGC	Shared
Chr10_3178881	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGTACAAGGCCAGCTATGACG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGACACACTCCACAGACAGC	Shared
Chr10_38894616	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGACATGAACGTTTTTGGGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGCACGATATCCAGGATGGC	Shared
Chr11_104972190	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAACTTTTTCCTCCCAGGGGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTTGCAGCCTACAGTTCAGG	Shared
Chr11_112584001	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAGTTCCAGCTCCTCAGTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTCTTGGAGCTCTACTTTTCCA	Shared
Chr11_130781475	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGAAACGGCATTCTCCTCCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCCTTGACGGTGAAAACAGC	Shared
Chr11_31454975	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATCTCTATGGGAGCAGGGGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCAGGTGCTATGGACCTATTCC	Shared
Chr12_25957056	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAACTTGGAATTTGGGCTTCAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTTGCATCTTCTTGGGCAAGC	Shared
Chr12_5603632	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAATTACCAGAGCACCCTGCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTGATGTTCTGTTCGCCACC	Shared
Chr12_62104219	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGTTGAGTTAAGTTTCTATGCGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGAAATACCTGCATCTGTTAGTCC	Shared
Chr13_110844721	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCCTGCAACACCATCTCTGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCGTCGGGTCACCTATTCC	Shared
Chr14_57052511	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCAAACATTTAGTCATTGGTGCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCCACAGAGAAGATTGACGC	Shared
Chr16_24834581	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATTCCTAAGCGGGGAATCAG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGGAGGGCTTCATTCACTCA	Shared
Chr16_33410688	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAAGTGCTGGGATTACAGGCG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCCAAAGTGAAGAGCTGTCC	Shared
Chr17_12905574	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAGCCAAGCCCATTCATTAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGTTCAGGGTGAATGCCTCC	Shared
Chr17_12905743	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGACGGAGCTGGTACTTGAGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAGTTCACAACCTTCGCAGC	Shared
Chr17_21217547	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGAAGACGGACATTGCTGCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTGCTTCTTTGGCACTTGGG	Shared
Chr17_26695832	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCTGGAACTGGTACTCCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTTCAAGGGTACTCAGGGGG	Shared
Chr17_29673219	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATACAAGAGCCGAGGAGTGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGAAAAGATGCTGCAAGCTACC	Shared
Chr17_39436909	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAAGGTCCAGATTGGAAGGA	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAGCAGGTGGTCACACAGAC	Shared
Chr17_64025068	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCATTGGAATTTATATGTTTCCTC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAGCCTAGAGCCATAGATAGCC	Shared
Chr19_19446936	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTTGGAATTTCAGGGCAGGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACACGGAGAGACACTTTGC	Shared
Chr19_21751895	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGGACTGAGGCTGAGTTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCCAGTCTCCGTGACTTTGT	Shared
Chr19_41383217	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAGAGAAGGGCTGGAAGTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCAGCCAAGGTCCATGAGG	Shared
Chr19_5273571	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGACTGTGATCTCCCCAACCG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAATTCTCCCAGTCACCCCC	Shared
Chr19_57760057	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCATTCCAAGGTCTGGTCCCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGAAGTCTCAAGCCAGCAGGG	Shared
Chr2_133019862	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCCAAACTGGAGCAAAGTCT	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGTTTGGGATTTTAGGTGGAA	Shared
Chr2_133020562	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAGTCTGCAACAGGAACACG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAGTTAGTGACTACCAGCACC	Shared
Chr2_36691951	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGAGGGTTCAACTTAGCTGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGGCTCCAGGATCCTTTTCC	Shared
Chr20_41100774	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGGATGGGGGTACAAAAGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGGAGGAGAGCTTACATGGG	Shared
Chr22_18655860	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACCTGCCTTCCATCTCTTGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACTGGAAAGAGACAGGCACG	Shared
Chr22_23501286	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGATGCTCTGAGGGAGAACG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGAAATGGCCCTTGTTGACCC	Shared
Chr3_138188485	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGACAGCTTGGTGTGTAGCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGGACATGGGCTTCAGAACC	Shared
Chr3_75713826	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTTGGGCTGCTCCTATGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATGGCTCACCTCTCCCTAGG	Shared
Chr4_141847279	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCTTGCACATGGGACAGGTA	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCACCACAGGTTATAACAGGAAACA	Shared
Chr4_149358014	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGAGACTGTGGTAGCCTTTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTGTTCTGACATCTCGACAAGC	Shared
Chr6_127768950	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACATTTGGCATTTCCTGGAG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCACTGAGAGGTAGACTTCATTCCA	Shared
Chr6_32188383	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGTCAATGCAGGTGGATCCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTACTCATAGGGCTCCCAGC	Shared
Chr6_33235755	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTACCTGCACCTTCATGAGCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCACAACCCCAAGGAAGTGC	Shared
Chr6_36353472	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCAGCTTGCAGATCCCCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTTGATGCAGTCTTCTGGTGG	Shared
Chr8_134050678	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAGGGAGGGGTTCCTTTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTGGCAGCAAGAGAAAGAGC	Shared
Chr8_95853708	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGTAGCAGGGCTGTTTTTGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACTGGACAGGTAAACTCAAGG	Shared
Chr9_114841154	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATCTCTGGCCCCAAATGTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTCCCAGTTACTTGGGAGGC	Shared
Chr3_112066360	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTTGGGTAGGATATGCCATGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTTCCAGAAGACCATGGGGG	hTERT
Chr4_123236857	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTTACAACACTGACAGGAGAGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCACAGTGAACCACCATTCCC	hTERT
Chr4_70172516	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGACTGGAAGCAACTCAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGCCACAGGACAAAACAGAAG	hTERT
Chr5_71609959	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGTTCTTTTTCTCTTGCTTCCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCACTGAATTGTCAACTATGCG	hTERT
Chr5_140903792	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTTCGACACCCAGGATGAG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTGTGGGAGAGGGGAAATCA	hTERT
Chr6_7301834	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACAGTGTTCAGAGGAAGAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGGAAAGGGTGCCAGAATCC	hTERT
Chr6_158658857	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACTTTTCCACCCTTCCAAATGT	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCAAATATGGGCGGGAGGAG	hTERT
Chr7_6370144	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTGCAGAGCCAGTAACCCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACCCACCAAGTGCTTTTCC	hTERT
Chr8_131138344	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCCCATTTCCCTATGGATGCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGGAAACTTTTATGGTAGCCCC	hTERT
Chr8_105361354	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCCTTGCCACTCCACTAAGT	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTCATGAAGAGGCCAGTGC	hTERT
Chr9_126520068	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCGGAAGAGCAATGCAGAAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTTACTGCACATTCATTTTCCCC	hTERT
Chr9_117458475	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGAGGATGGTTTGGAGGGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCATATATTGCATGTGGGTAGCC	hTERT
Chr10_103560186	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCTTCATCACTGCCTTCATTTTC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGTGGTTTGGTGAAGGGTCT	hTERT
Chr10_29840038	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTTTCCTCCCCGCTTCTCG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTACTGCACAGAAACCTCCGG	hTERT
Chr11_82644764	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCCATGTTCACAGTCAACTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACGTCTGGTTGTGAAACACC	hTERT
Chr11_60105199	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACAAGAAATGATCCCTCCGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCGTCCAACTTACACTGCAA	hTERT
Chr12_109936056	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACATCCTCCATAACCACGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCAGCCACGGATACTTACCC	hTERT
Chr12_1553797	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCCTTAGATCATCCAGCCCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGCAACTGGATTTTCACAGGC	hTERT
Chr13_33629393	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGACTTTCTAAGCCAGGACAAG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCCATGACAATCAGCCTAGG	hTERT
Chr14_73459917	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCCAATGAGAAGCCCTTGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGAGAAGCTCATCCAGGACCG	hTERT
Chr14_45473265	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGGGATGAAAACCTGAATGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGTTCGAGGTGATGAGACAGG	hTERT
Chr15_40330518	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTAGCCACACAGGGCAAAAGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTTCAGATGACGGTCGAACC	hTERT
Chr15_45974869	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTGGTTGGGTGGATGTGGAT	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTGAGTCTCCCAAAGTGCT	hTERT
Chr16_81208515	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTCAGCATGGAGGACACAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCATCCAAGCCTCTCTGACCC	hTERT
Chr16_58573901	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCATCTGTTGCCCAAGCTTGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTCTTCCCTTATGCCTGCC	hTERT
Chr17_17168316	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGGAAGGGCAGGCTTAAAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGAAACACTTGTGGTCCTAAGC	hTERT
Chr17_16649880	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGGCACTGGTCTTCTCTTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTCACCAAGGAGCAGCTGG	hTERT
Chr19_7605637	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAGCCTTCTCAGCATCCTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGGCGTGAAGAGAGACAAGG	hTERT
Chr20_50803444	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAATTCTGGCCGACTCCCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCCATATCTGCGGCATCTGC	hTERT
Chr21_38439686	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCACTCTTTCATTGCCCCAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACATGTTGGCTTTGTGACCG	hTERT
Chr21_28212760	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCACACTTGCCGTTGATACACC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGGACAAGCCTCAGAATCCC	hTERT
Chr22_50572406	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGGGATTACACACGTGAGCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTGGATGACGTGTTCAAGTGC	hTERT
Chr22_46652372	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGCAGAGGACAAGATTGGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAACAGCCACCGTTATCTGGG	hTERT
ChrX_135730555	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCACTGGACTGCCCATCAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACAAACATTATGCCACAAGACC	hTERT
Chr2_188690750	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTACAAGTGGCGAGGGAAAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTCTGGGCTCTCTCATGC	hTERT
Chr20_8770822	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCACCCTTATGTTTCTGTCACC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCTCTACGAGTTTTTCTTGCCG	hTERT
ChrX_10062163	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGAGGGAAGAGACACATCACC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTCTCCCGGTGCATTTCTCC	hTERT
Chr5_140558212	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGGCAGGAAATACCCAGAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGACCTTCACAACCAGGAAGC	hTERT
Chr19_15164559	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTACTCCAGGCCTGTTTGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTCCAGCTTGACCATCTCCC	hTERT
Chr1_160640680	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGACCATTCTGATGCAGGTGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTCCTCTCCTTCCTCAACG	hTERT
Chr7_10491642	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTCGGTGTGTCAATCCTCA	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACTGACTTGGGTGCTGGAAG	hTERT
Chr7_102854504	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTCAAATTCGGGAGAACAGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTTTTGTTTTCTTAGTGGGATGTC	hTERT
Chr6_158502527	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCGATGAGGATCCGTGAGAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCACAGGTCAGGGCAAGAGG	hTERT
Chr6_32268501	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGGTCCTAGGCACAGAAATAGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACACTGGGCAAAGAGAAGGG	hTERT
Chr2_204322298	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCAGGCTGGTCTTGAACTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTTTCTCTCCCACCCATCCC	hTERT
Chr3_15080696	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAGGTTTGACCAGCACAAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGAGTGTGATCCAATGGAGGC	hTERT
Chr9_37490422	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGCTACCTCACGTCTTTCAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGAACCAATGGGAAGCAGCC	hTERT
Chr15_60345430	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCTCCTGCTGGCTTCTTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCCACTATCCACAGCTAGAGG	hTERT
Chr1_75622616	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCTGTTGCTTTTCTCCCTGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTGATTGCCTGGGAGACAGC	HCT116
Chr1_84880380	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCATCTTTGCAGCCTGGATGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGGTGTGGACTCCGATTTAGG	HCT116
Chr10_112724573	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCCGGGAAGAGAATTCAATGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCCGTAAATCAAGCATCCGC	HCT116
Chr10_115489167	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGCTGATTTGGTCGTCTCCT	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGTTCCTTGGTGAGCATGGA	HCT116
Chr10_5177350	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAACAGACCAGGTGAGTCACG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGAAACCGTGTCTGCTTAGCG	HCT116
Chr11_104869708	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGCACCCCCAAATTTTTGAC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCGTTCTGCATTTCTATTCTGGA	HCT116
Chr11_114451027	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGTTTTCCTCTCAGGCATTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACCACCTTACCCCCTCTTAGG	HCT116
Chr11_120745874	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTAACCTGGGGCCTGTTTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGATAACTGGGAGCAGGAGGC	HCT116
Chr11_5475506	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATCCACCTTTGGCATGGACC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCACAGGAGGCACAAATAGG	HCT116
Chr11_63520131	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCTGTCCTCAGAAGCTTTCCA	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGAGCTAAGCATACGTTCATTTTT	HCT116
Chr11_85630411	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGCTGGTAATTTGCTGAATCACC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGTTTCCTTTGCAGCTTTCCG	HCT116
Chr12_20864356	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAAATCTGCTGTTAAACACTGGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTGCTTGACTCTAGGAGACACG	HCT116
Chr14_24606792	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGGGAAGACCTGGGAGAAGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGATTGAGCCTGTGTGAGGG	HCT116
Chr15_44900675	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCCCTACATTAAATGCCCTCCG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGAAGAAAGTGGATCCCCAGC	HCT116
Chr15_51529112	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGATTGCTCACTTCATTTCAGTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCAGATTCCTGTGGATGGGG	HCT116
Chr15_78764285	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTGTCAAGAGTGGTTTTTGAAGA	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGATTTTGTTGTTGCTGCTG	HCT116
Chr16_20648702	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAAGGATGACTGGAGCAGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTGGCCTCTCTGATCAACCC	HCT116
Chr16_65022234	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTGTGATTTCAAACGATTCCA	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCCTTTGAGAAATTTATTGGGATT	HCT116
Chr16_84215692	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACGGTGCATGTCACCACTT	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTCTTGCGAGACAGGAGATTC	HCT116
Chr17_2323517	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTCAACCCAGTGCATCTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAACCCACGGAGGATGAAAGG	HCT116
Chr18_72238472	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAAGGAACCAAGGCAGGAAGT	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCCTTGTTCAGAGCTGGTTC	HCT116
Chr19_15038754	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCATTTTCCACCTGTCCGTCT	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGGACAACTGGCCCTTTTCT	HCT116
Chr19_21713459	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGCTGTTTCTAAGCCAGACC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGAACCACTGTGCCAGGC	HCT116
Chr19_44652954	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAAAATCTGGACCAGCTCCCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTCACCCACTGACAGCAGG	HCT116
Chr19_57686731	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGAAGTCAAAGATGGCGGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGCAAGAGCCTGAAGAACGG	HCT116
Chr19_9226192	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGCCATGACTGCAAGAAAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAACCTGCTCATTCTCCTGGC	HCT116
Chr19_9236886	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATGAGTGGGTTCAGCATGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGTGTTGTTCCTCTCTCTGGG	HCT116
Chr2_169763262	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGCTTGTATCTATTCTTCCATCG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAAGTTGATTACCGGGGACA	HCT116
Chr2_170055255	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTGAAATCAGCCCGAATACCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTTCAAGCTGATCGCACTTGC	HCT116
Chr2_178762824	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCCATGGTTGAAGAGTCACG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTGAAGAACAGACTGACCTGG	HCT116
Chr2_27663416	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCAGGACCAGGAAGAGAACC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCTTCTCACTCTGCTCTCACC	HCT116
Chr2_44201259	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCCGCATCCTCTTTCTACCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGCATGCCCCGTATCAAAGG	HCT116
Chr3_119211294	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGGAAGCCAGTTTATTAGGAACC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCAAGGTATAGGTGCCAAGC	HCT116
Chr3_121838319	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGCAACCTCTATTCTGCCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCAGGAACTTACAAAGGAAAGGG	HCT116
Chr3_188327461	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGTCCACCTCCTTCAACACG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGTTCTGGTTCCCTGCTCC	HCT116
Chr3_189706661	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAGCTCAGGAGAGAAGAAATGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGTGAGGGAACTTGCCACC	HCT116
Chr4_175413296	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGGTTTTTGCTTGAAATGGA	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCACATTTCCCTATAACATGTTCAACT	HCT116
Chr4_177113779	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCGACATTTCTTTCATCTGTGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGCGTTTACCCCTGGTTTCC	HCT116
Chr4_2181016	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAAATGCCAAGGTCACAACT	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCACACCCTCTTTTGAAGATTT	HCT116
Chr4_42580442	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGCCTGGGTAAATTTCACAAC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGCAAAGTGTTCCTCCCAAA	HCT116
Chr5_102440383	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGAATTCATCTCTTTGGTCCATT	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCACCGAAACAGCACCTTCTC	HCT116
Chr5_78296755	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTCTCACCGAACCAGATGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCATTGATGGCGAGGAATGG	HCT116
Chr6_29455098	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGGATAGCATGGAAGAGCCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGTGTGTCTCTGTCTCCTGG	HCT116
Chr6_30231224	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCTTTCTCCCCTGGGTGTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTGGCATCCCTCTTCTCTGC	HCT116
Chr6_30899524	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACACACAGCCCCATCTTTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTGGGGAAGAGGAACTGTGC	HCT116
Chr6_36642168	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGCTTCTGGGCAGAACTTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTGGTTTGTCTGCATTGGGG	HCT116
Chr6_51720838	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTCCAAGAGCAGAGCCATCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGAGGGCCTAAGACATATGTGG	HCT116
Chr7_36446121	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACAAGAACAGCCTGGTACCG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAAGAAAAGGCTTCCAAAGACC	HCT116

Table 2.1 Primer sequences used for singleplex PCR sequencing in the 184-hTERT/HCT116 cell mixing experiments. The table shows the chromosome position, the forward and reverse primer sequences including the Illumina adapter sequence (the first 33 bases of each forward primer and the first 34 bases of each reverse primer), and the unique cell line or shared position for all 144 primers used in the 2-Step PCR chemistry 184-hTERT/HCT116 mixing experiment.

Position	Forward Primer	Reverse Primer	Target
chr1_151158224	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAAGTGGCATAGGGAGTACTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACATTGATGAAGGGGATGAACC	DAH55
chr1_183543763	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCCAATGCTAACTGTTCTTCAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCTCAGAGTATACGTGTGAGCC	DAH55
chr1_185703859	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGAGTTACTCTGAGAGGAAACCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGCTAGGGAAGAATAAAGAAGAGC	DAH55
chr1_228786415	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGATATGAGACCTTCTTCCAGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAAGGATTTCAACTGGACCACC	DAH55
chr1_23848591	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGTGTCATACCGAGTCTTCTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCTTTGTAAACTGGAAAGTGGG	DAH55
chr1_240176234	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGAAGATGGTGGTGGTGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTTTCGGGTTGGCTTAGGC	DAH55
chr1_241162048	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGGGCATTGATCTTATTCTCCG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGAACCTTTCCTCAGCTGCC	DAH55
chr1_247039362	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTGTATGTGTTAGCGAAAGTAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTAGCCATTGAACGAGCTAAGC	DAH55
chr1_75684963	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGTATAGGATCTCTGCTTCGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTACAAAGTCATAGCTCCAGGGG	DAH55
chr10_124008639	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTGGGCATTTGTTCTCGTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATGTTACCTGTCCAGTTTCTCC	DAH55
chr11_67134963	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCACCCCATCATTTACCAGAACC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTCCATCCAGAAAACCTATGACC	DAH55
chr11_74000249	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGGGCACACGAAACATTTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTGCTGTTCCTTTCAGTGAGC	DAH55
chr12_32137858	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCACGAGGTAACCTTTCACTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTCTGGTGATAATGACTGTCCC	DAH55
chr13_24903037	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGACTTTTGCCAGTGAGTAGCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTGATGTGCCAGGTAAGAAAGC	DAH55
chr13_73293249	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGACAGAGTTTCCATATCTAAGCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGATTTCCAAAACCTGGGATGC	DAH55
chr14_23891672	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATCTTTGTGTTCAGGACTTGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAGAAAGAATACGGCTTTGGGG	DAH55
chr14_61246803	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGGACGAGTTCTTCTCATTCG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGTTGTTCCACCACATTATTTAGG	DAH55
chr14_90651010	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCTCTATCCTCATCAAACAGTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCTCTATGGAGATGTTGCAGCG	DAH55
chr15_79231523	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGACAGTTTGGTTTGATAACAGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACTCAACATTACTGTGATGCGC	DAH55
chr15_93609300	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGGATCTCTGCCACCATGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGGAATTTGATGTCGTGGAAGG	DAH55
chr17_21518829	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGCTATCCAAGAGTCACAGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACAACATTATACACGATTCTCCCC	DAH55
chr19_10252889	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTGATGTCTGCGTGGTAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTCTTTACCCTGCAGTTCCC	DAH55
chr19_18707559	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTCCACTCACTCCAGATCCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCCTCTACTCTGACCTTGTCC	DAH55
chr19_7601057	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAGAGATGGGGATTTGCTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCATAGAAGAGCACTTTGTCCC	DAH55
chr19_8953870	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATAATTCCTGCAGAGGGAGTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGGTCTCTCACTTTCTCTGCC	DAH55
chr2_109098378	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAATTAAGAAATTCGACTTTGCAGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCTTCTCTCCTAAGTGGGAAGG	DAH55
chr2_200213522	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTTCAGCTCATCTCTGACTTGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAACCAACAGATTGCCGTTAGC	DAH55
chr2_55523308	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAGACTGTGCAATATAAAATTGTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCAATAAGCTTACCTCAGTACAGG	DAH55
chr2_85779867	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAAAATGGAACTTGCCTTTGGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACAAATTTGAGTCTGCTTTGCC	DAH55
chr2_86088425	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGGATATAAAGGATCCACACTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCATAGCAGGCAGACTCATTTAGG	DAH55
chr2_99008387	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCGAGTACCTGATGCTGTGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTAATGTCCCATCCACCATGC	DAH55
chr20_39798840	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTCTTCGTCTTCTCCATCAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAGGTTCCAGCAGAATCTCACC	DAH55
chr4_154318534	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGACCAAAGGGAACAGCTAAAGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACATGGAGAATAGTTAGCAGGC	DAH55
chr5_140181901	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGATCACTGCACAGTTCTACTCG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGTGACCTGTCCATTGACTCC	DAH55
chr5_148392362	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGACAGACATGAACTTGTGAGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACAGCTGATTGGGGTAGAGC	DAH55
chr5_149301168	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTTCAAAGGAGACAACCCAACG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAGTTCCACCTTACTCTGGTCC	DAH55
chr5_37815622	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGTTGTCTGTAGCTGGATCTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAAAGAAAGGGACCAAGGTTCC	DAH55
chr5_78697782	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGACATCAGCCCTGAAATTACCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTGAACTGGCTACCCTCAAAGG	DAH55
chr6_41204403	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCTTGGTGCTATGTGGACTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTGGCTGCTTCTTCCTAGC	DAH55
chr6_51882397	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGTCAAATCACACTGCACATAGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATTTGTGAAACCGACTAGAGGC	DAH55
chr7_122635434	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACGGTAAGCAAGCTGTTTAACC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTGGACATGATTCTCATCAGCC	DAH55
chr7_65253193	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCGAATGATCTGTATGCGGAAGG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGAAGACACTGAATTTGTTTCTGC	DAH55
chr7_99084738	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAAAGTGTTTTCGGCAGCTCG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCCACACTCAAGGCATTTATGG	DAH55
chr8_141756987	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTCTGTGATGACTCCAATCAGC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGCAATGAGTCGAGATTGTACC	DAH55
chr8_77515545	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGAGGAGAACAGGTTTAGATCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAAGTACTGTTGTGAGGTGACG	DAH55
chr9_113096681	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCACTGCGAGTTTGTGTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCAGCATAGTGATAACCTGGC	DAH55
chrX_2635857	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACAGAAGTGATTTGAGTGCTCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAAGCACACAGAGATGTTGTCC	DAH55
chrX_65253624	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTATGGTCTCCAGAAGAGTCACG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAGTGCCAGAGAGTGTAACAGG	DAH55
chr1_10177361	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCTGGTTTTCTAAGTTGCCACC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGATCCCTGTACACATAACAAAGC	DAH56
chr1_110172346	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTTGGAGCTACATCTGTCACCC	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACATGAAGACAGTGTCAGCTCC	DAH56
chr1_144917557	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGTAGCTTCATTGGAGGAGAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGAGTCTTCCTTTCCTTGCAGC	   DAH56	  chr1_208266341	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAAGGACAGAGATGGTGAGAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCACAATTCTTGGTGCTAGAGG	   DAH56	    36 Position	   Forward	  Primer	   Reverse	  Primer	   Target	  chr1_216373059	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCTTACCATTTAGTTCCGCTGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTACAGGCGTGTACTAGCGG	   DAH56	  chr1_27099880	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAACTTACCAGTTTGTTCACCGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATAGGGATAGTGCTGTCGTGG	   DAH56	  chr1_79215892	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTGATAAGAGGTCGAAACTGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTGCAAATCCACTGTCCATACC	   DAH56	  chr10_124013694	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGTTTTGGACTTAACTGTTGCG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTCTGATAGTCTCCTTAGTACTGC	   DAH56	  chr10_135176526	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTATCCTCACAGAGCCTGATGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTAGAGAGAGGCCGTGAAAATCC	   DAH56	  chr10_5496548	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGGAGGATGCTGTAAGTAGTCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTTAGTGCACAGCGTACTAACG	   DAH56	  chr10_98764573	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGAAGCCCTACCTGTAGATGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGAGAGAAAGTGCCACTGCC	   DAH56	  chr11_121424611	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCACATGCCAATGAATTTGAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGTAGATGGATTTCCTCACAGC	   DAH56	  chr11_58111675	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCTATCAGAGCAAGACAGAGCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGTGTACATGTCTGGCCATAGG	   DAH56	  chr11_74058341	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCGGTAGAAAAGTCTGGATCAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGAGTTAAACTCGAAGACCACC	   DAH56	  chr12_55251910	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAGAAATTGGTTGTTTATCTTCCCG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAAATATTGTCAAGGGGCTTCG	   DAH56	  chr13_114789786	   
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACAAAAGAGCTGAGGAGTTTGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCTTTCTCTCTCGTCTGACTCC	   DAH56	  chr14_63174915	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATCACGGTTGTTCTGCTTAAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTGAGAGGAACCAACTCCAGG	   DAH56	  chr14_74995317	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTCCTCTGCCCTTGCTCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACCTCTCAGTTTGTCCTCTGG	   DAH56	  chr16_18026605	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCCTCTTCAGTGCCCTATAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAGGTACATAGAAGTTGCCAGC	   DAH56	  chr17_74163678	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTGGTCTCCTGATCTTACCTGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCATCACCATCTATTACCAGGCC	   DAH56	  chr19_36913799	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTCAATAACCCTATGTGAATCAGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTCCTGACCTCATGATCTGC	   DAH56	  chr19_39025898	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAATGGAGGCAATGCTGAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGAAGCCAACTTCCTTCTTGTCC	   DAH56	  chr19_48783139	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGGTAAGTTGGCAGCTCACC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCATCTCTCTCCAAGCAGAGG	   DAH56	  chr2_109868023	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGGAATATTGTGGTCATTTGGTCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCATTATTTATACTCGTTCCAAGGC	   DAH56	  chr2_210558708	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCATCTGCCTTGAAAGAAGAAGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTGGGCTGGGATTCTTTTCC	   DAH56	  chr2_219457176	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGTTAGATGACACTGGTTTGGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGGACTGAAGTTGTTCAATGCC	   DAH56	  chr2_88474686	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGGTTCACAAGTTCACTTATGTCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGAACAAAGGGAATGAACGAGG	   DAH56	  chr20_34257747	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGAGGGCTTAGATAGGAAGTCG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGAGCTGACAAGGACTTACAGG	   DAH56	  chr21_32502678	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCATGTGTCGAAATCTGAAGGGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTACTGCGTGAGAGTCTGTCG	   
DAH56	  chr22_41310266	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTCTTACTTTGTTCCAGGATGGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCATACTCCTAAGAAGCAGTGGG	   DAH56	    37 Position	   Forward	  Primer	   Reverse	  Primer	   Target	  chr3_100467068	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTCAGCAGTTCCAGGGATATGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTAGTAGGCTGAGAAGTTTGGGC	   DAH56	  chr3_190366285	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTGTTCTAAGCCCCAACTACG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTCTTCAGCTCTTTCACCTTCG	   DAH56	  chr3_53777131	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAATTTGCTGGATATGCTGGTGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATCATGAAAATGATGCCCAGCC	   DAH56	  chr3_62571052	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGGGATGGAGAATAACCTGTAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGTTTCTATCCCCAGGTATGGC	   DAH56	  chr4_140264024	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCACTTGCTGTTACTAAAAGCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAATTCTTCTAAAATCTTTGCTGCC	   DAH56	  chr5_106763140	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTTGAACCCTTTGGAAGTGTGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGATTCCAGAGGGGTGACTACC	   DAH56	  chr5_35807348	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAAGTCGCTGTTCTCTTGAAGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTTCTTTCTCTTCCATGCCAGC	   DAH56	  chr5_36680565	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACAGACAGTCAGAACTACCAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCATGTTAATGGTGGCTCCTACG	   DAH56	  chr6_167550792	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGAAGGAAGTACAAGTCCTCAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATGTTTCACACATGCCTTAGGG	   DAH56	  chr6_25812965	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCTGTTGCTGGAGTGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTTGTCTGTCTCTAACGACCC	   DAH56	  chr6_34850761	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATTTGTGTGGTCAAGTGAGAGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCATCATTACTTAATCCTGCAGCC	   DAH56	  chr7_103216161	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTGGATCCATCGGAATCTGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGAAGGTTAAGTGCAGTCAAGC	   DAH56	  chr7_94412373	   
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGATAACTGCTGGTGTGGACTGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGTTTTCACAATCAGCTACAGC	   DAH56	  chr8_113702265	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAACCCAAACTGGCATTCAAACC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGTATCATATGCCTGCAACCC	   DAH56	  chr8_53536478	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATCTGTTTTGTGCCTAAGAGGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGCAATGAGTAATATGATAGGAGGG	   DAH56	  chrX_100650650	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGGTCTTTTGCTACAAGGAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTTAGAACTGGATCACTTGGCC	   DAH56	  chrX_145078143	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGTTCACGTCTTAAGGATGCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTCCTTTCTGCGTAGAGTAAGG	   DAH56	  chrX_18912518	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCATGATGGCTTCCACCAGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTCGGTGATTGATTGATAGGGC	   DAH56	  chr1_112329518	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAAGCTACCTTACCTTGCAAAGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCAGAGAGCTGATAAACGCAGG	   Shared	  chr1_157068868	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGTCACATTCTCCTCCCTAGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGGATTTCAAGGAAAAGGGCC	   Shared	  chr1_196254802	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGCCAATTTTAAGTTAGTAGCTGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGCTGGGAGGGTGTTTAGC	   Shared	  chr10_26569951	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAAGCTCAGTTGCGCATGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTCTTCTCTAACCAGGAGAGC	   Shared	  chr10_37626477	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCGCTATTCTCTTGCTCAATGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATGGCTTACTTTCAGTGTGACC	   Shared	  chr10_70958138	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGTCACAAGAAGTTGAGTTGGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTAAAACAGACAATGCAGTCCCC	   Shared	  chr12_102131538	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGAAACACAAGTACAGCTGGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCAGTACATCTCATGTTGGGC	   Shared	  chr12_1943494	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGGGAGTGCAAGTTCTAGATGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGAGCAGTAGTTGAACATCTCC	   
Shared	    38 Position	   Forward	  Primer	   Reverse	  Primer	   Target	  chr12_1943537	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTTCTCTTCTGCATGTAGGGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGGCCCAGGAATATCCAGC	   Shared	  chr12_7343529	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTCTCTCTTAGGCCTCCAAGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATTCTCATTCGGTCAGGAAAGC	   Shared	  chr12_9000316	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGGATGAGAGTGTCTTACTGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTCCCTAAGACTTTCTCTCCCC	   Shared	  chr14_102606275	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGAAGGAAGAGGCAGGGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGGTGAATTGGGTCTTAATCTCG	   Shared	  chr14_20404531	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAGTGGCCTGATATCATTGAGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGCCCGAAGAAAAGAATGACC	   Shared	  chr14_86089576	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCGCGGTGATATTTGTGCTGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGGATGGAGTTGTCCTTCTTGG	   Shared	  chr14_99183455	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCACCAATGCATGTTTCAATCCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCTGCTTAAATCTTGGTGTAGC	   Shared	  chr16_67381476	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGCTGCATTTTAGTCAGTTGGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATTTGTGGGAAATTTCTGCTGG	   Shared	  chr17_19476175	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGTTTCATTTTCCAGGCTCAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGTGTCTCACTAACAGCCCC	   Shared	  chr17_38926484	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGAACAGTGGACTTCAAGAAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAGAGTGTTGGATGAACTGACC	   Shared	  chr17_46608203	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGTTCTGCTGAAACGCAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGAGTACCCACTCTGTAACCGG	   Shared	  chr17_79954545	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAAGCAGAGCACAAGGGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACTTGGACTTCTTTGGCTTGG	   Shared	  chr19_15918695	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACAGGAAGAGGTACATGGGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAGTGACCCAATTCATCCTCG	   Shared	  chr19_22868599	   
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGACTGTGGCAAAGCTTTTAACC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATTTGTAGGGTTTCTCTCCAGC	   Shared	  chr19_53431826	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAAGTCTATGATGACGTGCAAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGAAACCTTACAAGTGTAGTGAGC	   Shared	  chr19_6908923	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAAAGATGTCTTTATGGCTGGGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGAGTAGCTGGTACTATGGGC	   Shared	  chr2_141027788	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGGATTTCATGTACATACAAAGGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCTGCACTAATGGAAAGATTGC	   Shared	  chr2_170377554	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGGAGTCTAAAGAATTTGCACCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAGAAGTAGCTTGCATGTTGGG	   Shared	  chr2_74437264	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGGTAAGGAAATTGCTTTCAGGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGTGCAGCTTCTCTACAAAGG	   Shared	  chr2_79312563	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGAGGATCAGAAGGAGTGTAGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACAATCTATCCACCTGTTGCC	   Shared	  chr21_30699316	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCAGGGAAGATAGTAGTGTTGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCTGTGAGATCTGTTCTGAGCC	   Shared	  chr3_101049143	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGCAGCACAATCTAATGATGGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTTCGACACTGTCTTTGAGTGC	   Shared	  chr3_109190520	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCAAATTTCTCCCCACAGATGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCTGCAAGATTCCAAAGACAGC	   Shared	  chr3_151045637	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTGTTGCATTCTCTTAGTAATGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGAGACTTCCGTAGATAATGTGG	   Shared	  chr3_164737303	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCATCCACTCCTCTACTTCCACC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATCCAAGCCACTTCTTTATGGG	   Shared	  chr4_135865620	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTAAAGGAATCCCCAACGTGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCTCATTTGTCATTTTCATGGGC	   Shared	    39 Position	   Forward	  Primer	   Reverse	  Primer	   Target	  chr4_38830962	   
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTAGAAAGCTCATGTCAGAGACC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGGAACCAGAATCCAGTTCTCC	   Shared	  chr5_70081281	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTAAGCCCTGGTCTCTCTTGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAGAGGTTGGCACCCAGC	   Shared	  chr6_137330455	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACATGAATGTATGCCCCATTCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCATTTCTGTTGTCCTGACAGC	   Shared	  chr6_23856901	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAACAAGAGGTTCTGGTTGCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTTTACGAGGCTCCTGCTTACC	   Shared	  chr6_71004238	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCAGAGGAATCCTGAATCTGCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATGGGTACAGGAAACAAGATGC	   Shared	  chr7_117232474	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCGAAGAGGATTCTGATGAGCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGAGTGTGTCATCAGGTTCAGG	   Shared	  chr7_65970411	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGATATTGTCCTCCAGCAGGTCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTCAGTGAATACGACCAAGTCC	   Shared	  chr7_92983022	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTTATGGGAAAGCATTGTGGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCCTTCTACAATAGTTCGATTAGCC	   Shared	  chr9_104448903	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATTTTACTCTCTGTCATGGCCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCATGATCCAACCAGAACTTGC	   Shared	  chr9_127300355	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGACACTTTGAGACTGCAAGGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCACTTCTGCCCCAACAAGC	   Shared	  chr9_127617888	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGGCTGGAGAAATTAGCAAGG	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACAGTGCAGTCACAGGAGG	   Shared	  chr9_6477539	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGGGTAACAAGAGCAAAACTCC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCACACAGGTCACATTCAGGG	   Shared	  chrX_135754085	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCCTCTAAAGTCTCCCAAATGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAGAAAAGTGAGTAGAGCAGGC	   Shared	  chrX_53602588	   TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTGAAATTCCCCGGGGATAGC	   GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGATGTATACCAGCAGTACCGG	   
Shared

Table 2.2 Primer sequences used for singleplex PCR sequencing in the DAH55/DAH56 cell mixing experiments. The table shows the chromosome position, the forward and reverse primer sequences including the Illumina adapter sequence (the first 33 bases of each primer), and the unique cell line or shared position for all 144 primers used in the 2-step PCR chemistry DAH55/DAH56 mixing experiment.

Position	Forward Primer	Reverse Primer	Target
chr6_29455098	GGTTCTTACAGGTGACTCTGCTAACT	TCATCTGGGGCCTGCCTTTTTGCATCA	HCT116
chr6_30231224	GCATTAGAGCCAGTGCTAGGATTACTA	GGCTTGAAAATTCATCTCTTAATACCACAG	HCT116
chr7_36446121	GGGATGGGTACCCAACTTAACGATTTT	GCTTTATTCAAAGAGGCATCGCCATC	HCT116
chr8_28157994	GGATTGTGGAGCTTTATTCCCAAGGAG	CCACCTCTTCTTTTTAACTTCTTTCTTG	HCT116
chr8_6728198	CTTTCGCGAGATGTTCTCAAATCGTTG	CAGAAGTGAAATGAACTTTTTATAAGCATT	HCT116
chr9_107533175	TTTGGAGATAAAAATTAAAACAGGTACCA	TGGCCCACCAGGTTTCATCTGAAA	HCT116
chr4_2181016	TGCTGATTTTATAGGGCTAGATCCCAG	CTGGAAGCTTTTTCTATTTTATTGATAGCC	HCT116
chr4_42580442	GGAAGAGGGTTTGCAATATTTGACAGT	TTCTTCCTGCCTCTCCCAACAGATGG	HCT116
chr5_102440383	TGTGTTTTCTTTATTGTGTGCCTGTTG	CACCAAATGGATTGTGATTTTGCCTCT	HCT116
chr5_137475787	GGGTTTCTTTTCTGGGTAGCATCATAC	TGATGGAAAACCTGTTTTCTAACCTGG	HCT116
chr5_78296755	GCAAACAAAAGCAATGATTAGGTGCAG	ACAGCTCCACCTGTCTATCAGTTGAA	HCT116
chr6_136882717	CTGCCAATCTCAATTGTTTCTATGAAC	ACTGGCTGAGAGTGAATGGAGCTGATG	HCT116
chr22_44322922	ACCACACCCATCCCAAAACAGCTTT	CGGAGGAACTTGCTTAAGTTGAAGGAT	HCT116
chr3_119211294	CATCTCACGATTTCTGGTCCTTATTCAT	GATGACTACTATAGTTCAGTTTTCAACAT	HCT116
chr3_121838319	TTACAGGGAGGCTATTCCACTTAGAGG	CATCAGATCTTTCAGGTATATGGATTTTT	HCT116
chr3_188327461	CCCCTGTATTCTCCTCCTTATGTTACT	AGAGTCATTTCTGCCCCCATAGCCTGG	HCT116
chr4_175413296	CTGCCAAAATGATGGAAGGTTACAGCT	ATATGCCTCATTCTTTCGTTTCAAAATA	HCT116
chr4_177113779	
CCAGGATGACTTTGTGAATTTCACGGA	   GAAACAGCTGAGTTCTTCCAAGCTCTAT	   HCT116	  chr19_44652954	   GGAAAGTCAGGTCTACAAATGTAGGCG	   AGCCACATCCTTGAAGGTCAGTCC	   HCT116	  chr19_57686731	   CCCGGATTTCTTTGCCAGTAAAAATGA	   GAGAGGGTTGCACTTCCAGAATAAAAT	   HCT116	  chr2_169763262	   CCACTTTGTACATGGAGGCTATGAGAG	   GGAAAATGTGTTGCTATGAATACTCTGG	   HCT116	  chr2_170055255	   GATGTGTTAGCACGGATACTGTTCATA	   GCATGTTGTGAGCTAATACAAAGATATG	   HCT116	  chr2_27663416	   AGATGAAGACAGGTCACATGGGCTCTG	   TGACCAATCTGCCCCTTTCAGACCCT	   HCT116	  chr21_44515685	   CTAACCAGCAAGATTTCTGTTGTTTGC	   GTATAACGAGTCTTAAACCACTGTGGTC	   HCT116	  chr16_84215692	   TTCTGGATGAGGCCTGCACTGGGGGC	   ACTGATCCTTAGTGAGAGGCCCAGAT	   HCT116	  chr17_2323517	   CGTCATTCAGGCCTTGGAAGAGAAAAA	   TTGTCAAGGCTCTAGCAACGGAGCCCA	   HCT116	  chr17_28396142	   TTCTTCTGGGGATAACTCATCTTCACC	   CTTTTCTTATCCACCACTCTTTTAACTC	   HCT116	    41 Position	   Forward	  Primer	   Reverse	  Primer	   Target	  chr17_39081713	   CAACTGGTCGACATCTGCTAAGTACTC	   AACTGGAGCGGCAGAACAATGAATAC	   HCT116	  chr18_72238472	   CTTATAAGTGGAGGGAACAGTTTGCCA	   TGCCACCAAAGGTTCCTGAGTGAAAA	   HCT116	  chr19_15038754	   AAAAAGAGCAGTTTCCACTTCCATGAT	   ACTGAGGCTATTGCACCTGAGTGTGAG	   HCT116	  chr14_24606792	   TTTAAGGGGGAGGGCTTACCAGGTTGA	   TGTGGTTTCAAGTACCTTTGTCTTCAT	   HCT116	  chr15_44900675	   CACTACCACATACTCTTCTCAAGCTAA	   CATCTGATATTACAATCTACCACCTTATTC	   HCT116	  chr15_51529112	   CGTGATTCACAGATATACATCACATGT	   TGGAGAATTCATGCGAGTCTGGATCTC	   HCT116	  chr15_78764285	   GTGCTGTGCTAGACACTGTGTAAATTT	   TAGTCATTTACTTACCCCACCCCAGA	   HCT116	  chr16_20648702	   TAGACTAAGGCCTTCCAGCTTCCCTAC	   TCCAGGTCTGGTCCTAAGGGCAT	   HCT116	  chr16_65022234	   CCTGGACCCAGATATTGTTTTATCACC	   TGTGTCAGAAGCAGCCGTCCCTGGGGA	   HCT116	  chr11_104869708	   CCTCCAGGTTCTCAGATGACTGTGAA	   AGCCCACATGCATTAAGTATTTGTCTT	   HCT116	  chr11_114451027	   AAAACAGCTGTTCCATTAAGAGAATAC	   AATGACATCCACAATCCCCAGTGGGCAT	   HCT116	  chr11_120745874	   ATCCAGCCTGCTGCTGTTGGGGATAAC	   CCGGTAAGACCTTCCAATTCTACCTG	   HCT116	 
 chr12_120636549	   ATGTGAAGTCACTGTGCCAGCCCAGAA	   ATGTGCAGCTGATCAAGACTGGAGACA	   HCT116	  chr12_20864356	   AGCTTTTGGGAGCAAAAATCATAGACA	   GAATATCTCTCATATTTGTACCTGCATAAG	   HCT116	  chr13_32936646	   CCATACTGCCGTATATGATTACGTAAT	   TCATAAAAACCACATAGGATGATACTGA	   HCT116	  chr1_10238770	   CAGATTGTCTCAAAAAGCAGCTCTAGC	   ATTTGATCCCTGCCTTCCGCATCTTTG	   HCT116	  chr1_114280856	   CTTCTGTCCTAGTGATTGAGGCAAGAA	   ATCTCTGACTCGGAAGGGGCTTGTTC	   HCT116	  chr1_207263829	   CTTTCCCATAGAGGGAATTGATTCTGT	   GCACTCAGTAGTGGTGTTATCCCAC	   HCT116	  chr10_112724573	   CTCGAATGCTAAGCATGCTGAGTTTTG	   CCAGTGTCATGAGATTTACTAAACATCCC	   HCT116	  chr10_115489167	   TCATTAATGAGTGACCCATTGCTTCTC	   TCTTTTCCGTGCTCCTCCAGGATGGA	   HCT116	  chr10_44112214	   GCTCTTCCTAAAGGCTTTTCCACATTC	   GTTTCTGGCCAGTATGAATTCTCTGATG	   HCT116	  chr6_33235755	   GATCCGAGAGTTTATCCTCCAGAAGAT	   CAATGAACGAGCAACAGCAAAGGAG	   Shared	  chr7_127234121	   TTTCTGAGTGGCCAAGCTGTGTGT	   GTTTTCTGAGTGGCCAAGCTGTGT	   Shared	  chr7_47467787	   GAAAATCAGCTGATCTTTTGTCTCCTG	   TCAGTGCCTAGAAACCCAGTTTCCAG	   Shared	  chr8_103316494	   ACTGAGACTGAAGTACTGGTAAGTAGG	   GAGATTAGTGTTTAGTGGACTATACATACT	   Shared	  chr8_134050678	   GACAACACCTCCTTTGATCGAAAGAAG	   GTATTTTAGAAGCCAAGAGAAGCCACAT	   Shared	  chr9_137677921	   ATGGGGAGTGGATGAACCTGTGCAG	   ACACCTAACACGGCCAGAAGTTACC	   Shared	    42 Position	   Forward	  Primer	   Reverse	  Primer	   Target	  chr4_149358014	   GTTCTGACATCTCGACAAGCTGTAGT	   ATGGAGACCAAAGGCTACCACAGTCT	   Shared	  chr5_110284484	   GCCAGCAAAAGCACTTCAGAGTCATTT	   GAGAGTTTTTAAAGAAGAAAATCAAAAGGG	   Shared	  chr5_140626627	   TTCTGCAGCGGGTACAGCACGAAGGGC	   GCGGCAGCAGCGAGTAGGTGAC	   Shared	  chr5_60083275	   AGCAGTGTCTCTAGAAGGATGGAAAAC	   TTTACTGAAGAACCCATTAAAAGAGCC	   Shared	  chr6_127768950	   GGGTAGGTGAAGTCTCAGGTAATAGTG	   TGCTTTTGAGAGCTTATTCTAAATAAAGTT	   Shared	  chr6_32188383	   AATCCAGACAACTGTGTCAGCCACCA	   AGGGTCCCCCTCACTGCAGAAA	   Shared	  chr22_18655860	   TCGGGCCTAGTGTGGGCAGGTATAAAT	   TGGTAAGAAACATCATCCACAATTGCC	  
 Shared	  chr22_23501286	   TACAGAGGGAAACACTGCATGGCAGG	   TGCCTGCTTCCCACTGTGCTGCC	   Shared	  chr3_100631031	   GTACTGCTCTTGTATCCCTTGAGGTTT	   ACGAAGGGTAAGAGCAGGTGTTCTTG	   Shared	  chr3_138188485	   TCATATTGGCCTTAGAGCTCACTTCAG	   ATTCTGAAGACTAAATAACATCAAGTGC	   Shared	  chr3_75713738	   CCTAGGCAGAGTCCCATAGTCTATCTC	   AAGGCCTTACCTTGCCTTTGTGTGTG	   Shared	  chr4_141847279	   CCTATTGAGAAAAAGAAAAATCTCTCAGTT	   AGGTATTCTCTGCAGTTACCGCC	   Shared	  chr2_133019862	   GTGATTGTGGATATTTTCCCTTCTGAAATG	   GGCTCAATACCTGAGGTGTTGAACT	   Shared	  chr2_133020562	   AGTTATTGACCAGTATGTACCATCCCT	   AGCATAATTTGGATCCACTTGGATAGC	   Shared	  chr2_141272057	   TGGATTTGTGATGGAGAAGATGACTGT	   AGCTGTTATTCAATATTTGGTATTTTTCCT	   Shared	  chr2_36691951	   AAGTGTGGCAGGCCCAATGCAGTTT	   GGATCTTAAGCAAACCTTTTTGAGTGC	   Shared	  chr20_41100774	   ATCCGGCTGCGACTCTTGCTGTCTAA	   CAATGAAGGACTATGTAAACTGTGCATT	   Shared	  chr21_11114683	   TAATGTCGTATTCGCCACACAGCAGCC	   ACCTGGAAGTGAGTCCCACCCAG	   Shared	  chr17_21217547	   ACAAGGGCCTGGCTGGGAGCCCC	   CCTATGAGTCTTCTCCCAGGATCTCC	   Shared	  chr17_26695832	   TTTCACTTTCCCCACCTCTTAGGGTAG	   ACAGGCCTCAGCATGTGCCCACCCC	   Shared	  chr17_29673219	   TGATGGGCTTGAGACATGTGCAGAGAC	   TTGGGCAGGGTCAATATGCTTCTTTGG	   Shared	  chr19_10794630	   CTTTGCAGTTACACGAGTTCGTGTATT	   AGTTGCCTCCGTACCCATAGCTTCCA	   Shared	  chr19_19446936	   CTGTACTTCCTTAACTGGACGTCACTAG	   GGGTGGGAGGACAGGGTCTCACTCTGT	   Shared	  chr19_21751895	   GGGATTTAAGCGTTATCCAATCAGCAA	   GACATCCTGAGAAAAGGGAAGGAGT	   Shared	  chr13_110844721	   CCACAGATTCTGTTATCTCACAAGCAG	   TTTTATGCTTTTGAGTCTTAAGAAAAGGT	   Shared	  chr14_57052511	   GTGAAATCACATGTGTTAGGGAAACCA	   TGAACAAAGGATAGCCAGGAGGATGG	   Shared	  chr15_29525855	   CATTTGACATCATCTCCCACAGAAAGC	   AGTGCATAAACTAAGCTCTCGCACAC	   Shared	    43 Position	   Forward	  Primer	   Reverse	  Primer	   Target	  chr16_24834581	   GGCTTTGTGCAAAGAAACGACTGATC	   TCTCCAGGGCCTGGTGCCTTGCCCA	   Shared	  chr16_33410688	   AATCTGCCCATACCTTGGTCTTGGACT	   
CTCTTGCTCAGTTTTAGCTAGAATAGAGGA	   Shared	  chr16_56900931	   ATTTCTGGACATCACGCACCACCCAC	   ATCCAGCCCACTGACCTGCACCACTG	   Shared	  chr11_112584001	   GCATGAGTTTGTTTTTGGTTTTCTGTT	   ATGGACCCCTGGGTTTAAGGTCACATA	   Shared	  chr11_117800083	   TAAAGGCAGCTTCAGGTTCCCGGTATTT	   AAGATGGATCTTCTCCTCGACATCAGC	   Shared	  chr12_25957056	   CTGCTATGTTATGCCGTCATCCAGT	   CAGCATATGGTGGTGCTTACGGAGAAA	   Shared	  chr12_5603632	   CAGACTCTCACTGTCACATACCGAGTA	   GCTGGAATGCTGACTTGGCGGG	   Shared	  chr12_62104219	   TCACTTGGATAATTGTGCAAATGGCTT	   TTTCTCCTTCCTTTGTAGGTAACCCAT	   Shared	  chr12_8376470	   GGAAAAGGAAGGGAAGGAAAACCTGAA	   AGTTCCACACCCAAGAGCTCCTTTGG	   Shared	  chr1_103483514	   ACTGGGACCAAATATGACCTTAGAGAC	   ATGAGATTAGGAATCTGATTAAGATATGGT	   Shared	  chr1_196659237	   AGACATGAACATGCTAGGATTTCAGAG	   ATAAGGAAAATAACATTTTCCTAAGGACC	   Shared	  chr1_28209366	   GGATTATACTAGTACCTTGACGCCTCC	   AGTCAGGTCCAGCCACCGGGGAGGGAT	   Shared	  chr10_3178881	   ACCTTTAGTAGCCACCCATAGATTGAG	   TAATACGTGCATAAAGTGCTGATAAAATA	   Shared	  chr10_38894616	   TGGCACTTTAGACTCCTGGATGTTGA	   GGGTAGATCACCTGGTCACGCA	   Shared	  chr11_104972190	   TGTCATTCTTGTCTTATTCTCCAAGCC	   GCATGCTTTCAGTTTCAGCTTACACA	   Shared	  chr1_23640222	   TCCTTGAAGGTTCTACTCAGTGTTAGG	   GTTTTGTTTATTTGTTGCTTACTAGGTAAA	   hTERT	  chr1_89887017	   TCAGTCTATTCATAGTAAGCATTTGATCA	   GCTGAGTAGCTAACTAAAGAAATGTGA	   hTERT	  chr2_231333852	   CCCCTCTACTCAGGAACTACTTTATCC	   GAGTTTTTAATTGCTCTTACGTCTCTT	   hTERT	  chr2_62067433	   CATCATTTTCAGAGCCTGATTTAGGCC	   CAGTACCGGAGCCTTTTCAAATGATGAT	   hTERT	  chr3_97607290	   ACAGTCTGCCACTCAAGTAAATGTTTG	   GTAATGATTTCATTTCTGCTGTAGAAATC	   hTERT	  chr3_112066360	   GGTCTTTGATATGGAGGATGCTGGTAA	   AGTCCTGCAGTCTCCTCAGAGAGAT	   hTERT	  chr22_46652372	   GTTCAGCTTTCTGACCTCGCAATCTAA	   GTCATGCTTTCTACTCATGTCCTTCAC	   hTERT	  chrX_135730555	   AGCACTTTTTGCTGTGTATCTTCATAG	   GTTAGGATCTTGGGATCTGGAGTCAGA	   hTERT	  chr2_188690750	   CAAGAAAATTACCAGGCATTACCCCAG	   CTTCTTTCAGCGTTGCACTAGCATCT	   hTERT	  chr20_8770822	   
TCTGAGACTGTGTACAAGGTACTTATG	   AGAAAGTTACAAGAAACATTACTTTCATCT	   hTERT	  chrX_10062163	   GGAAAATTACACCAAACCAGTGATGGC	   TGAGACAAGCTCTTTGAACGGAGAT	   hTERT	  chr19_15164559	   AACCACAGACTCTGTGCTCTTGGATCT	   TCTTCCTGGAATTGCTGGAGTGAGAG	   hTERT	    44 Position	   Forward	  Primer	   Reverse	  Primer	   Target	  chr19_7605637	   AGACACGGTCCGCTGGGGATGCTGGTG	   CCCCTGCCTCAGAAACTGGTCA	   hTERT	  chr20_50803444	   CCTACCATGAATCTGGTCTGCATTACA	   AAACAGTGCCTGCCACCCAGACTCAGA	   hTERT	  chr20_37396262	   GAATTCTACGCATGTCTTCATCTTCCA	   GACTACTCGAACAATCAGTACCTGAAAA	   hTERT	  chr21_38439686	   GAAGACAGCTCAAGTGCTCAAAGCATT	   CAGATATTGGGCAGTTGCATTACCTGT	   hTERT	  chr21_28212760	   ACAGTTTTCAGATCACTTTGCTGTCTA	   ATGCAGCCAGCACATGTAGCACCT	   hTERT	  chr22_50572406	   ATGACTGAAGAGACTATTTTACACAAAAT	   ATATTAAGGCTGGCATAAGGTAATGAA	   hTERT	  chr16_58573901	   ATTATGCTGCTGTCTGCCAGGTTTTCC	   TTGTCTCTTGTTGGTTGTTTCCTTCTG	   hTERT	  chr17_17168316	   CTAGCAAACAAAGCTGTGAACAAATTG	   ACATACCAGTTAAGCATCCCTAATCTG	   hTERT	  chr17_16649880	   TTGTCCACGCTCCGCCCCGGTGGGAA	   CTTCCTTGACAGCAAAGATCCAGAATA	   hTERT	  chr18_14106039	   ACAAACATTTAACTGTGAAAGGAGACA	   AATGAATGTGAAAAAGTATTTTACCAATCC	   hTERT	  chr18_31599259	   AAAACTGGGCTGGAGCAAGATGAACAG	   ATTTTTGAGTTGGTCATGAAGGTCACA	   hTERT	  chr19_21477403	   TGCTATGTCATCTTGTTTATATAGTTGGT	   AGGTCACAAGGGACTTAACCGGTG	   hTERT	  chr13_31543095	   CATTTTTGTGGGCATTCAACTCCAGTG	   GTCACTTTGGTATGTGGTCTGTTTCTT	   hTERT	  chr14_73459917	   GGGAGAACGGGTGGTGTTATTCTCTA	   ACAGGCACATGGGTAGTGAGTGACATT	   hTERT	  chr14_45473265	   ACAGAATTATCTTGAGACTTCTTTGGAG	   ATGTATACACAGTTTAACAGTTAAAAGCT	   hTERT	  chr15_40330518	   GAGGTGTTTAACAAAACTAGAGCAGCG	   AGCTACCGATGGGAAGAAGAAGATCA	   hTERT	  chr15_45974869	   ATTGCAGGCGCCCACCACACGCCTGAC	   TTGACTTACCTACTGCAGCAGCGGTCT	   hTERT	  chr16_81208515	   AGTGGTAGAGCAATGACTTGCGCCAG	   TCGGAGACACATGTGTTGAGGATGTAG	   hTERT	  chr10_29840038	   GTGCAGTAGAGTGTCATTCTGTGTGTA	   CAAGGTACAAAGCAGAAAGAAGGCGA	   hTERT	  chr11_82644764	   
CCCATTCATCACTATCAGTTTCATGGC	TTTTTGAAAGCTCTGTTTATCCCATGA	hTERT
chr11_60105199	GCAGTCCATGGTGTAAGGTGAATGATA	GGTAAGAATCATTCCCAAAGGGCCA	hTERT
chr12_109936056	CAACCAAATCTCTAGAATAGTGACAGG	AACAGTGAGGCGCTGCAAAGAATAGCA	hTERT
chr12_1553797	TGCATCAACTACACGGCAACTGGATT	GTCAGGTGTCCAATGTACAACTTTAATT	hTERT
chr13_33629393	AGATTGGAAGAGTTTTGGAGAGAAGGG	GATTTTCAGGTAAAGGAGGGAAGCCAT	hTERT
chr7_6370144	AAAGGAGAGCCTCGGGTGTCACGACTG	CCCATGTTTTCAAAGCAAACAATAGAATTC	hTERT
chr7_135322853	AAGAACCATGTGACTTCTCTAAGGTTT	TACTTGGGGTGGCTGAGGCAGGAGAATT	hTERT
chr8_131138344	AGGAAAGCCTTTCTCACTTTTTCACAA	AACCCACCTGGCTTTCAACCAACTT	hTERT
chr8_105361354	CCTCTGTTGATGTCTCTCCCTTTCATC	TTTGCTGTCATTTAGCTGTGCCTCCA	hTERT
chr9_126520068	TTCTGAGGTGGTCACTAGAATAGGCTG	TCGTGCTCACTGACATTGACAGCAAA	hTERT
chr10_103560186	ACATTACGTTTAGGTTACTATCCTATATGA	TTGCAGCTGACAGTGAAGATAGTAC	hTERT
chr4_123236857	CCCCACTCCCTACTCATTTCAGTTTTA	GAGTTTACTTACAGTATGGATAGTGGGAGA	hTERT
chr4_70172516	GCCATGAGGTAACAGTATTGACTCACT	TCAATGAGACACTCATGAAGAAGCTAC	hTERT
chr5_71609959	GCATCTTCACTTGTTAGGGGTTGAATG	TGTAGACAGCCACAAATGGGAAGCAA	hTERT
chr5_140903792	GGTCTGCATTGGATAATAGGCAATGAG	ATGAGACAGGTGTGATGGACAGTCTT	hTERT
chr6_7301834	CACATGATTTGGATGTGATCTGGTCTG	TGTCTCTATTTTGTTTCTCAGATTTTCC	hTERT
chr6_158658857	GTCTTCACACAAGCTATCTGGTGGTATC	TTGCCATCATCAGATAGTATGTCTCTA	hTERT

Table 2.3 Primer sequences used for multiplex 184-hTERT and HCT116 cell mixing experiments. The table shows the chromosome position, forward and reverse primer sequence, and the unique cell line or shared position for all 144 primers used in the TruSeq chemistry 184-hTERT/HCT116 mixing experiment.

Table 2.4 The 99 primer sequences common to the HCT116/184-hTERT multiplex and singleplex PCR sequencing experiments.
The table shows the chromosome position and the cell line or shared target for the 99 primers that are common to both methodologies. There are 29 common shared targets, 33 common HCT116 targets and 36 common 184-hTERT targets used in the 184-hTERT/HCT116 mixing experiments.

No.	Cell Line	Common targets of multiplex and single-plex methods	Cell Line	Common targets of multiplex and single-plex methods	Cell Line	Common targets of multiplex and single-plex methods
1	Shared	chr10_3178881	HCT116	chr10_112724573	hTERT-184	chr10_103560186
2	Shared	chr10_38894616	HCT116	chr10_115489167	hTERT-184	chr10_29840038
3	Shared	chr11_104972190	HCT116	chr11_104869708	hTERT-184	chr11_60105199
4	Shared	chr11_112584001	HCT116	chr11_114451027	hTERT-184	chr11_82644764
5	Shared	chr12_25957056	HCT116	chr11_120745874	hTERT-184	chr12_109936056
6	Shared	chr12_5603632	HCT116	chr12_20864356	hTERT-184	chr12_1553797
7	Shared	chr12_62104219	HCT116	chr14_24606792	hTERT-184	chr13_33629393
8	Shared	chr13_110844721	HCT116	chr15_44900675	hTERT-184	chr14_45473265
9	Shared	chr14_57052511	HCT116	chr15_51529112	hTERT-184	chr14_73459917
10	Shared	chr16_24834581	HCT116	chr15_78764285	hTERT-184	chr15_40330518
11	Shared	chr16_33410688	HCT116	chr16_20648702	hTERT-184	chr15_45974869
12	Shared	chr17_21217547	HCT116	chr16_65022234	hTERT-184	chr16_58573901
13	Shared	chr17_26695832	HCT116	chr16_84215692	hTERT-184	chr16_81208515
14	Shared	chr17_29673219	HCT116	chr17_2323517	hTERT-184	chr17_16649880
15	Shared	chr19_19446936	HCT116	chr18_72238472	hTERT-184	chr17_17168316
16	Shared	chr19_21751895	HCT116	chr19_15038754	hTERT-184	chr19_15164559
17	Shared	chr2_133019862	HCT116	chr19_44652954	hTERT-184	chr19_7605637
18	Shared	chr2_133020562	HCT116	chr19_57686731	hTERT-184	chr2_188690750
19	Shared	chr2_36691951	HCT116	chr2_169763262	hTERT-184	chr20_50803444
20	Shared	chr20_41100774	HCT116	chr2_170055255	hTERT-184	chr20_8770822
21	Shared	chr22_18655860	HCT116	chr2_27663416	hTERT-184	chr21_28212760
22	Shared	chr22_23501286	HCT116	chr3_119211294	hTERT-184	chr21_38439686
23	Shared	chr3_138188485	HCT116	chr3_121838319	hTERT-184	chr22_46652372
24	Shared	chr4_141847279	HCT116	chr3_188327461	hTERT-184	chr22_50572406
25	Shared	chr4_149358014	HCT116	chr4_175413296	hTERT-184	chr3_112066360
26	Shared	chr6_127768950	HCT116	chr4_177113779	hTERT-184	chr4_123236857
27	Shared	chr6_32188383	HCT116	chr4_2181016	hTERT-184	chr4_70172516
28	Shared	chr6_33235755	HCT116	chr4_42580442	hTERT-184	chr5_140903792
29	Shared	chr8_134050678	HCT116	chr5_102440383	hTERT-184	chr5_71609959
30			HCT116	chr5_78296755	hTERT-184	chr6_158658857
31			HCT116	chr6_29455098	hTERT-184	chr6_7301834
32			HCT116	chr6_30231224	hTERT-184	chr7_6370144
33			HCT116	chr7_36446121	hTERT-184	chr8_105361354
34					hTERT-184	chr8_131138344
35					hTERT-184	chr9_126520068
36					hTERT-184	chrX_10062163

1.2 Molecular biology techniques

1.2.1 DNA extraction
DNA was extracted using the QIAamp DNA Mini Kit (Qiagen), following the protocol for cultured cells. DNA was eluted in 20µl of elution buffer to increase the DNA concentration. DNA concentration was measured by fluorometry using the Qubit dsDNA BR Assay (Life Technologies) as per the manufacturer's protocol. DNA quality was assessed using the NanoDrop ND1000 (Thermo Scientific) with 1µl of extracted genomic DNA as per the manufacturer's protocol.

1.2.2 Real-time quantitative PCR (qPCR)
qPCR was performed with 5ng of genomic DNA template, 5µl of 2x SYBR Select Master Mix (Life Technologies), and 0.2µM each of forward and reverse primers. Each primer pair was run as a singleplex reaction. Cycling conditions were as follows: Standard curve (AQ), 50°C for 2 min, followed by 40 cycles of [95°C for 10 s, 95°C for 15 s, 60°C for 1 min]. A dissociation step was added to the end of the program. The ABI 7900HT was used for all qPCR experiments.
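The qPCR set-up described above can be sketched as a small calculation. This is a minimal illustration, not part of the thesis workflow: the step temperatures and times are transcribed as written in the text, while the 10µl final reaction volume (inferred from 5µl of a 2x master mix) and the 10µM primer stock concentration are assumptions, and all function and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One thermal step: temperature in degrees C and hold time in seconds."""
    temp_c: float
    seconds: int


# Steps transcribed as written in the qPCR cycling conditions.
HOLD = [Step(50, 120)]                              # 50 C for 2 min
CYCLE = [Step(95, 10), Step(95, 15), Step(60, 60)]  # repeated 40 times
N_CYCLES = 40


def program_seconds(hold, cycle, n_cycles):
    """Total programmed hold time; ramping and the dissociation step are ignored."""
    return sum(s.seconds for s in hold) + n_cycles * sum(s.seconds for s in cycle)


def primer_volume_ul(final_um, reaction_ul, stock_um):
    """Volume of primer stock needed, from C1 * V1 = C2 * V2."""
    return final_um * reaction_ul / stock_um


if __name__ == "__main__":
    # ASSUMPTION: 10 ul final volume (5 ul of 2x master mix) and a 10 uM primer stock.
    print(program_seconds(HOLD, CYCLE, N_CYCLES), "s of programmed hold time")
    print(primer_volume_ul(0.2, 10.0, 10.0), "ul of each primer stock per reaction")
```

Under these assumptions each reaction would need 0.2µl of each primer stock, which suggests preparing a diluted working stock or a master mix rather than pipetting such a small volume directly.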
1.2.3 PCR
The PCR for the 2-step MiSeq method was performed with 1µl of the qPCR ExoSAP DNA template, 10x FastStart HiFi Rxn buffer w/o MgCl2, 25mM MgCl2, DMSO, 10ml PCR grade Nucleotide, 5U/µl FastStart HiFi Enzyme (all from Roche), and 4µl of each I7 and I5 Barcode-Adapters (Illumina). PCR cycling conditions were as follows: 95°C for 10 min, followed by 15 cycles of [95°C for 15 s, 60°C for 30 s, 72°C for 1 min, 72°C for 3 min] and 4°C hold.

The PCR for the TruSeq Custom Amplicon method was performed as per the manufacturer’s protocol. Cycling conditions were as follows: 95°C for 3 min, followed by 30 cycles of [95°C for 30 s, 66°C for 30 s, 72°C for 1 min], 72°C for 5 min.

All PCR experiments were carried out on the Veriti 96-well machine (Applied Biosystems).

1.2.4 Agilent bioanalyser
The Bioanalyser was used as a quality control step for determining the correct size distribution of the SPRIselect and magnetic bead purified samples, which were then pooled as one sample for MiSeq sequencing. Quality control and size distribution of the samples was performed using the Agilent DNA 1000 DNA kit (Agilent Technologies) per the manufacturer’s protocol. 1µl of the sample was required.

1.3 Amplicon library construction – sequencing

1.3.1 Illumina multiplex sequencing method
After samples were mixed and DNA was extracted and quantified, the Illumina TruSeq Custom Amplicon Library preparation protocol was followed. All reagents were supplied as part of the TruSeq custom kit (catalog # FC-130-1001 and FC-130-1003); because most of the reagents are proprietary, their components are not listed. The starting template for each sample was 5µl of DNA at 50ng/µl (250ng total).

1.3.1.1 Hybridization of oligo pool
The first step of the protocol involved the use of the pooled custom-designed oligos and the hybridization of these oligos to each of the DNA samples. 5µl of DNA at 50ng/µl (250ng total) was added to each well of the hybridization plate.
5µl of the custom oligo was added to each sample well. 5µl of DNA control (ACD1) and 5µl of control oligo pool (ACP1) were added to one well of the plate. The plate was sealed and centrifuged for 1 min at 1,000 x g. 40µl of oligo hybridization for sequencing (OHS1) was added to each sample. The plate was again centrifuged at 1,000 x g for 1 min. The plate was then heated at 95°C for 1 min and the block temperature was then set to 40°C for an 80 minute incubation.

1.3.1.2 Removal of unbound oligos
After hybridization, all of the unbound oligos were removed with the use of a filter plate that is capable of size selection. The filter plate was supplied by Illumina as part of the TruSeq Custom Amplicon kit. The filter plate was pre-washed by adding 45µl of stringent wash 1 (SW1) to each well and centrifuging for 2 minutes at 2,400 x g. After incubation, the entire volume of each sample well was transferred to the corresponding well of the pre-washed filter plate. The plate was again centrifuged at 2,400 x g for 2 minutes. 45µl of SW1 was added to each sample well and the plate was centrifuged at 2,400 x g for 2 minutes. The wash step was then repeated. 45µl of the universal buffer was added to each well before the plate was centrifuged for 2 minutes at 2,400 x g. This step removed the unbound oligos and washed and prepared the samples for extension-ligation.

1.3.1.3 Extension ligation
This step connects the hybridized upstream and downstream oligos and results in products that contain regions of interest flanked by the specific sequences needed for the amplification step. 45µl of the extension ligation mix 3 (ELM3) was added to each sample well of the filter plate.
The filter plate was sealed with adhesive aluminum foil and the entire filter plate assembly was incubated at 37°C for 45 minutes.

1.3.1.4 PCR amplification
The products formed by extension-ligation are amplified with primers that add the multiplexing index sequences (i5 and i7) and common adapters (P5 and P7) necessary for the generation of clusters in each sample. A new 96 well plate was labeled as the index amplification plate (IAP). 4µl of the i5 primers was added to each of the sample wells. 4µl of the i7 primers was added to each of the corresponding sample wells. The samples that were incubated in the filter plate were centrifuged at 2,400 x g for 2 min at the end of the incubation period. 25µl of 50mM NaOH was added to each of the filter plate sample wells. The plate was incubated at room temperature for 5 minutes. While the filter plate was incubating, 22µl of the PCR master mix was added to the IAP plate containing the index primers. 20µl of the sample from the filter plate was transferred to the corresponding well of the IAP plate. The plate was sealed and centrifuged at 1,000 x g for 1 minute before being placed on the PCR machine.

PCR conditions: 1 cycle of 95°C for 3 minutes; 25 cycles of 95°C for 30 seconds, 62°C for 30 seconds and 72°C for 60 seconds; 1 cycle of 72°C for 5 minutes; then hold at 10°C. The PCR machine used for this experiment was the Veriti 96 well (Applied Biosystems).

1.3.1.5 PCR clean up
PCR amplification leaves reaction components that must be removed in order to obtain a clean, purified library. Agencourt AMPure beads (Beckman Coulter) were used to purify the PCR products in the samples. A new plate was labeled as the clean up plate (CLP), and 45µl of AMPure XP beads was added to each well of this plate. The entire PCR product sample was transferred to the CLP plate. The plate was sealed and placed on the micro-plate shaker for 2 min at 1,800 rpm. The plate was then incubated at room temperature for 10 min without shaking.
The plate was placed on a magnetic stand until the supernatant was clear, and the supernatant was discarded. 200µl of 80% ethanol was added to each sample well and the plate was incubated on the magnetic stand until the supernatant appeared clear. The supernatant was discarded. The ethanol wash step was repeated. The plate was removed from the magnetic stand and allowed to air dry for 10 min. 30µl of the elution buffer with Tris (EBT) was added to each well of the plate, then the plate was sealed and placed on the micro-plate shaker for 2 min at 1,800 rpm. The plate was then incubated at room temperature without shaking and placed on the magnetic stand for 2 minutes or until the supernatant appeared clear. 20µl of supernatant was added to a new plate labeled library normalization plate (LNP).

1µl of each purified sample was run on the Agilent Bioanalyser (Agilent) to confirm that library amplification was successful and of the required size. If samples were of the correct size, around 350 bp, they were carried through to the next steps of the protocol.

1.3.1.6 Library normalisation
Normalisation of the individual libraries ensures that there is equal library representation in the final pooled sample. This is an important step to ensure that sequencing is not biased to one sample over another. 45µl of the freshly made LNA1/LNB1 (Illumina-supplied) normalization bead mix was added to each well of the LNP plate. The plate was sealed and placed on the micro-plate shaker at 1,800 rpm for 30 min. After shaking, the plate was placed on the magnetic stand for 2 min and the supernatant was discarded. The plate was removed from the magnetic stand and 45µl of library normalization wash 1 (LNW1) was added to each well. The plate was placed on the micro-plate shaker for 5 min at 1,800 rpm and then placed on the magnetic stand for 2 min. The supernatant was discarded. The LNW1 wash step was repeated. The plate was removed from the magnetic stand and the supernatant discarded. 30µl of 0.1 N NaOH was added to each sample well.
The plate was sealed and placed on the micro-plate shaker for 5 min at 1,800 rpm. While the plate was on the shaker, 30µl of library normalization storage buffer 1 (LNS1) was added to each well of a new plate labeled storage plate (SGP). After the 5 min elution, the LNP plate was placed on the magnetic stand for 2 min and 30µl of the supernatant was transferred to the corresponding well of the SGP plate.

1.3.1.7 Library for MiSeq sequencing
The normalised libraries were combined in equal volumes, then diluted and denatured by heat prior to loading on the MiSeq cartridge for sequencing. 5µl of each sample from the SGP plate was added to an 8-tube PCR strip. The contents of the PCR strips were transferred to one single tube. 6µl of this pooled sample was then added to 594µl of hybridization buffer (HT1) and mixed. This HT1 and sample mix tube was incubated at 96°C for 2 min and then immediately placed in an ice bath. This pooled library tube was then loaded onto the MiSeq for sequencing.

1.3.2 Singleplex PCR sequencing method
After samples were mixed, DNA was extracted and quantified, and 5ng total of genomic DNA was used per reaction for each sample.

1.3.2.1 qPCR
Amplification of DNA was performed on the ABI 7900HT, as it allowed for the assessment of the quality of each primer in the samples and allowed for higher throughput than standard PCR. An additional dissociation curve step was added to the qPCR run. The SYBR Select Mastermix protocol, mentioned above, was used. At the end of each qPCR run, the 144 singleplex reaction wells for each sample were pooled into one tube per sample and immediately placed at -20°C.

1.3.2.2 ExoSAP-IT
After qPCR, any unconsumed dNTPs and primers that could potentially interfere with downstream sequencing reactions were removed using ExoSAP-IT (Affymetrix USB) reagent.
10µl of post-qPCR product was added to 4µl of ExoSAP-IT. The sample was incubated at 37°C for 15 min; this step degrades any remaining unconsumed primers and dNTPs. The sample was then incubated at 80°C for a further 15 min to inactivate the ExoSAP-IT reagent.

1.3.2.3 Barcode adaptation – 2nd PCR
This PCR allows for the addition of the Illumina i5 and i7 barcode adapters, required for sequencing on the MiSeq, to the pooled qPCR amplicon product samples. The Illumina barcodes provide a unique sequence adapter for each sample and allow for complexity in the final pooled sample. The PCR was performed as per the protocol described in section 2.3.3 using the FastStart High Fidelity PCR system (Roche).

1.3.2.4 Sample cleanup and size selection
After the barcode adaptation PCR, small PCR fragments were removed from the samples in order to provide a clean sample of the optimum size for sequencing. SPRIselect beads use Solid Phase Reversible Immobilization (SPRI) to remove amplicons that are 200bp or less in size from each sample. The manufacturer’s protocol for SPRIselect bead clean up (Beckman Coulter) was followed.

1.3.2.5 Quantitation and quality control
DNA concentration of the SPRIselect purified samples was quantified using the Qubit dsDNA BR Assay (Life Technologies), as per the technical protocol. To ensure that the size distribution of the SPRIselect purified samples was at the target size, all samples were run on an Agilent Bioanalyser using the Agilent DNA 1000 DNA kit (Agilent Technologies) as per the technical protocol. If all samples passed the quality control and were of the correct size, each sample was diluted to a concentration of 4nM before final pooling of samples.

1.3.2.6 Denaturation and dilution of library
After the SPRIselect purified samples were adjusted to 4nM, all samples were pooled to make a single library with a concentration of 4nM.
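Bringing each sample to 4nM requires converting the mass concentration reported by the Qubit assay (ng/µl) into molarity. A minimal sketch of that arithmetic is given here, assuming the commonly used average of 660 g/mol per base pair of double-stranded DNA. The function names, the 400 bp mean fragment length and the 13.2 ng/µl reading are illustrative assumptions, not values taken from this study.

```python
# Sketch only: unit conversion and dilution arithmetic for library pooling.
# AVG_BP_MASS is the usual approximation for double-stranded DNA.
AVG_BP_MASS = 660.0  # g/mol per base pair (assumed average)


def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration in ng/ul to nM for a given mean fragment size."""
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def dilution_volumes(stock_nM, target_nM, final_volume_ul):
    """Return (stock volume, diluent volume) in ul using C1*V1 = C2*V2."""
    if target_nM > stock_nM:
        raise ValueError("stock is more dilute than the target concentration")
    stock_ul = target_nM * final_volume_ul / stock_nM
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical sample: 13.2 ng/ul at a mean size of 400 bp
stock = ng_per_ul_to_nM(13.2, 400)            # about 50 nM
stock_ul, diluent_ul = dilution_volumes(stock, 4.0, 50.0)
print(f"{stock:.1f} nM stock: mix {stock_ul:.1f} ul with {diluent_ul:.1f} ul diluent")
```

The mean fragment size would in practice come from the Bioanalyser trace for each sample.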
The 4nM library was denatured as per the Illumina Nextera technical protocol. After the library was denatured into single strand DNA molecules, it was diluted to 11pM, which was previously determined to provide optimal cluster density on the MiSeq for this set of experiments.

1.4 Genomic data analysis

1.4.1 Allelic prevalence plots
After amplicon library sequencing, the data files were transferred to the Bioinformatics group. Raw data, processed as fastq files, were copied directly from the MiSeq machine. The reads were then mapped to hg19, and the reference and alternate alleles were counted per position, per alignment, per sample. The read depth, p values and q values were calculated for each of the HCT116, 184-hTERT and shared target lists. The allelic frequencies in each sample and target list were calculated based on a binomial exact test. The results for the variant prevalences were compiled into a tabular format for each sample. The table information was input into ggplot2 (a program designed to create graphical displays) and the box plots were created. These box plots show the allelic prevalence of variant positions in each sample for the specific target set, e.g. HCT116, 184-hTERT or Shared. Figure 2.1 is an example of a box plot and gives an overview of how to interpret each box plot and what values are represented.

Figure 2.1 Box and Whisker Plot diagram
The Figure illustrates the features of a box and whisker plot.

1.4.2 PyClone
The allelic prevalence data was input into the PyClone computational model (Roth et al., 2014). Cellular prevalences were calculated using the PyClone beta binomial model. The tumour content was assumed to be 1.0 for all cell lines. Copy number (CN) information for the ovarian cell lines was used to calculate the correct allelic prevalence values for all the samples. The HCT116 and 184-hTERT cell lines were assumed to be diploid.
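The way these assumptions link allelic prevalence to cellular prevalence can be illustrated with a simplified, deterministic sketch. It assumes a mixture of exactly two cell lines, a tumour content of 1.0, no sequencing error, and a variant present in only one of the lines. It is not PyClone's implementation, which fits a statistical (beta binomial) model to read counts rather than inverting point estimates; the function and parameter names are hypothetical.

```python
# Sketch only: expected variant allele fraction (VAF) in a two-line mixture
# and its inversion, under the simplifying assumptions stated above.

def expected_vaf(prevalence, variant_copies=1, cn_variant_line=2, cn_other_line=2):
    """Expected VAF when the variant-carrying line makes up `prevalence` of cells."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    variant_alleles = prevalence * variant_copies
    total_alleles = prevalence * cn_variant_line + (1.0 - prevalence) * cn_other_line
    return variant_alleles / total_alleles


def cellular_prevalence(vaf, variant_copies=1, cn_variant_line=2, cn_other_line=2):
    """Invert expected_vaf: cellular prevalence implied by an observed VAF."""
    denominator = variant_copies - vaf * (cn_variant_line - cn_other_line)
    return vaf * cn_other_line / denominator


# Diploid case: a heterozygous variant in a line at 50% prevalence gives VAF 0.25
print(expected_vaf(0.5))                       # 0.25
# Same prevalence, but the variant sits on a three-copy segment
print(expected_vaf(0.5, cn_variant_line=3))    # 0.2
```

In the diploid case the relationship reduces to cellular prevalence = 2 x VAF; once the local copy number differs from two, the same VAF implies a different cellular prevalence, which is why copy number has to be supplied to the model.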
However, chr20 of the 184-hTERT cell line is known to be triploid after early passaging (passage 22), and the CN for the targets on chr20 was taken into account. The cellular prevalence means were plotted using R and ggplot.

[Figure 2.1 annotations. Upper quartile (Q3): 75% of the targets fall below the upper quartile. Lower quartile: 25% of the targets fall below the lower quartile. Median (Q2, the middle quartile): the mid-point of the data. Interquartile range: 50% of targets fall in the "box". Upper and lower whiskers: targets that are outside the middle 50%. Outliers: between 1.5 and 3 box lengths from the 75% or the 25% quartile.]

Chapter 2: Results

I set out to test the ability of deep sequencing to retrieve allelic prevalences and elucidate clonal structure in idealised cell mixtures. Previous studies have investigated clonal prevalence and evolution in tumours, but it has not yet been established how statistical models such as PyClone perform with idealised mixtures of cell lines representing clones. I tested three main hypotheses using idealised cell line mixtures, the first two addressing sequencing/experimental factors in allele prevalence measurement and the third testing an assumption related to analysis and interpretation of raw allele prevalence measurements:
1) Multiplex PCR-derived amplicon sequencing results in greater accuracy of allele prevalence estimates than singleplex PCR-derived amplicon sequencing.
2) Generation of idealised mixtures by DNA concentration ratios is more accurate and precise than mixing by cell number counting.
3) Cell line mixtures with copy number-complex genomes require copy number aware methods to accurately resolve clonal mixtures.

First, I tested whether multiplex (using Illumina TruSeq chemistry) vs.
singleplex (2 Step targeted PCR) targeted sequencing results in different allelic prevalence estimates in defined mixtures of two diploid cell lines, 184-hTERT (breast epithelium) and HCT116 (colorectal cancer). Both cell lines have been extensively characterized previously by sequencing, and polymorphisms were used to simulate clonal mixtures. Oligonucleotide primers for a total of 144 multiplex amplicons were designed using Illumina software, which optimizes primer sequences to avoid primer-dimer reactions. The 144 singleplex amplicons were designed with Primer3 (Untergasser et al., 2012), with each amplicon designed for optimal target amplification and processed as a single reaction. Second, I tested whether the manner of mixing cells to generate idealised mixtures would influence the measured allele prevalence. This was achieved by generating the idealised sample mixtures by mixing using cell number and DNA concentration. Finally, I evaluated the performance of PyClone, a copy number aware algorithm for estimating clonal prevalence in populations, compared with simple measurements of allele prevalence. I tested both copy number-simple (HCT116 and 184-hTERT cell lines) and copy number-complex (DAH55 and DAH56 ovarian cell lines) idealised mixtures. The results show that incorporation of copy number into clonal prevalence estimates is able to recreate the expected clonal proportions in copy number complex mixtures. This is relevant to the situation in many epithelial cancers, where copy number aberrations are prevalent in the somatic genome.

Table 2.1 Table of experiments, experimental design and results figures
The table indicates all the experiments conducted that will be discussed in the results section. The table also shows the figure numbers that refer to each experiment.
Experiment 1
  Type of chemistry: TruSeq
  Primer design/PCR: Multiplex
  Cell mixtures: HCT116/184-hTERT
  # of mixing proportions: 15 mixtures
  Sample mixture type: By cell number only
  Primer sets: 48 loci HCT116, 48 loci 184-hTERT, 48 loci Shared
  Figures: 3.2, 3.5, 3.7, 3.9

Experiment 2
  Type of chemistry: 2 step PCR MiSeq
  Primer design/PCR: Single-plex
  Cell mixtures: HCT116/184-hTERT
  # of mixing proportions: 24 mixtures
  Sample mixture type: By cell number and DNA concentration
  Primer sets: 48 loci HCT116, 48 loci 184-hTERT, 48 loci Shared
  Figures: 3.3, 3.4, 3.6, 3.8, 3.10, 3.11

Experiment 3
  Type of chemistry: 2 step PCR MiSeq
  Primer design/PCR: Single-plex
  Cell mixtures: DAH55/DAH56
  # of mixing proportions: 24 mixtures
  Sample mixture type: By cell number and DNA concentration
  Primer sets: 48 loci DAH55, 48 loci DAH56, 48 loci Shared
  Figures: 3.12 - 3.21

Figure 2.1 Workflow of comparative experiments performed with idealised mixtures.
This figure illustrates the three central hypotheses of the thesis and the comparative experiments that were conducted to test the hypotheses. 1) Multiplex PCR amplification sequencing results in greater accuracy of allele prevalence estimates than singleplex PCR sequencing, 2) Generation of idealised mixtures by DNA concentration ratios as opposed to cell number counting is more accurate/precise, 3) Cell mixtures with copy number-complex cellular genomes yield uncertain allelic and cellular prevalences.

[Figure panels. Comparative experiments: 1) PCR amplification method: multiplex, single-plex. 2) Sample mixture type: samples mixed by cell number, samples mixed by DNA concentration. 3) Copy number status: copy number simple (HCT116/184-hTERT cell lines), copy number complex (DAH55/DAH56 ovarian cell lines).]

2.1 H1: Multiplex PCR sequencing performs similarly to singleplex PCR sequencing in determining allelic prevalence from idealised mixtures

To address hypothesis 1, I compared the performance of multiplex PCR sequencing and singleplex PCR sequencing at 48 genomic locations with shared heterozygous allele status between HCT116 and 184-hTERT cells (full positions in tables 2.1 and 2.2). Both cell lines are copy number diploid across their genomes.
Also incorporated in the primer design pool were 48 heterozygous alleles specific to each cell line. These are mutually exclusive between HCT116 and 184-hTERT and provide controls for inadvertent cross contamination. Out of the 144 selected genomic locations, 99 were common between the multiplex and singleplex PCR sequencing methods (table 2.4). Idealised mixtures were made by counting cells and mixing in proportions of 0.0:1.0, 0.1:0.9, 0.25:0.75, 0.5:0.5, 0.75:0.25, 0.9:0.1 and 1.0:0.0. In the multiplex sequencing experiments, additional intervals were generated with mixtures of 0.001, 0.999, 0.01, and 0.99, as shown in tables 3.2 and 3.3. After PCR sequencing as described in the methods, reads were aligned to target sequences (described in section 2.6.1) and the proportion of reads for each allele was calculated as a fraction of the total reference and alternate allele read counts. Allelic prevalence box plots were generated to provide a visualisation of sample and target performance.

The distribution of measured allele prevalence for both multiplex and singleplex methods generally follows the expected trend (Figures 3.2 and 3.3). As expected, the idealised mixtures of the multiplex experiment (Figure 3.2) demonstrated a pattern of decreasing HCT116 allelic prevalence with a decrease in HCT116 proportion within the sample, and an increased allelic prevalence of 184-hTERT as the proportion of 184-hTERT in the sample increased. The calculated median values obtained from each of the samples were compared to the expected values, and the standard deviation (SD) and variance were calculated (Table 3.2). The range of departure of the median from the expected values over all prevalence measurements was low (0.0 to 5.0×10^-2), with an overall SD across all samples and positions of 1.46×10^-2 and an overall variance of 2.0×10^-4, revealing that all sample positions were tightly clustered and produced values close to the expected allelic prevalence (Table 3.2).
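The comparison of observed and expected values can be sketched as follows. The expected values follow the pattern used in the summary tables: 0.5 for shared heterozygous positions regardless of the mix, and half the cell line's mixing proportion for cell line specific heterozygous positions in diploid lines. The function names, the read counts and the per-position values are hypothetical and serve only to show the arithmetic.

```python
# Sketch only: allelic prevalence from read counts, expected prevalence for
# heterozygous targets in diploid cell lines, and departure of the median.
from statistics import median


def allelic_prevalence(ref_reads, alt_reads):
    """Fraction of reads supporting the alternate allele at one position."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("position has no coverage")
    return alt_reads / total


def expected_prevalence(cell_line_proportion, target_type):
    """Expected allelic prevalence for a heterozygous target."""
    if target_type == "shared":
        return 0.5                       # heterozygous in both cell lines
    if target_type == "specific":
        return 0.5 * cell_line_proportion
    raise ValueError("target_type must be 'shared' or 'specific'")


def median_departure(observed, cell_line_proportion, target_type):
    """Absolute difference between the observed median and the expected value."""
    return abs(median(observed) - expected_prevalence(cell_line_proportion, target_type))


# Hypothetical position with 750 reference and 250 alternate reads
print(allelic_prevalence(750, 250))
# Hypothetical per-position prevalences for a cell line mixed at 0.75
print(median_departure([0.40, 0.411, 0.42], 0.75, "specific"))
```

A group-level SD or variance can then be taken over these departures; whether a sample or population estimator is appropriate is a choice the sketch leaves open.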
The shared allele positions between HCT116 and 184-hTERT produced the tightest clustering of observed allelic prevalence (departure of the median from expected ranging from 5.0×10^-4 to 1.0×10^-2), with an SD of 3×10^-3 and a variance of 1.13×10^-5 (Table 3.2). Three of the shared positions (chr2_133019862, chr2_133020562 and chr10_38894616) consistently read out at 0.0 for all samples, despite adequate coverage of the positions, suggesting primer design failure. The HCT116 and 184-hTERT specific positions also showed tight clustering around the expected values (departure of the median from expected ranging from 0.0 to 5.7×10^-1), with a very small difference in variance and SD between HCT116 (SD 1.5×10^-2, variance 2.13×10^-4) and 184-hTERT (SD 1.78×10^-2, variance 3.24×10^-4).

The singleplex PCR sequencing experiment also performed in accordance with the expected allelic prevalences per sample proportion mixture (Figure 3.3). As expected, the idealised mixtures demonstrated a pattern of decreasing HCT116 allelic prevalence with a decrease in HCT116 proportion in each sample, and an increase in the allelic prevalence of 184-hTERT positions as the proportion of 184-hTERT in the sample increased. The calculated range of departure of the median from the expected values over all prevalence measurements was 4.7×10^-3 to 0.1 (SD 2.29×10^-2, overall variance 8×10^-4), revealing that all sample positions were tightly clustered and produced values close to the expected allelic prevalence values (Table 3.3). As was also the case for the multiplex PCR method, the shared allele positions produced the tightest clustering and values closest to the expected allelic prevalence values, with the difference of median values from expected ranging from 1.7×10^-3 to 1.3×10^-2 (SD 4×10^-3, variance 1.66×10^-5). Two shared positions (chr2_133020562 and chr10_38894616) consistently produced an allelic prevalence value of 0.0.
It should be noted that these two shared positions were common between both the multiplex and singleplex PCR methods and failed in both experiments. The HCT116 and 184-hTERT specific positions showed tight clustering around the expected values (calculated range of departure of the median from the expected values 0.0 to 5.7×10^-1), with a small difference in variance between the HCT116 (variance 1.0×10^-3) and 184-hTERT (variance 9×10^-4) positions. The SD was the same for both the HCT116 and 184-hTERT positions (SD 3.1×10^-2) (Table 3.3).

Figure 2.2 Allelic prevalence boxplots of idealised mixtures for the multiplex sequencing (184-hTERT specific, HCT116 specific, shared positions).
The box plots show the allelic prevalence values for the HCT116 (red), Shared (green) and 184-hTERT (blue) specific heterozygous positions in each sample. The vertical axis indicates the distribution of allelic prevalence values by boxes (interquartile range), horizontal line (median) and dots (outliers greater than 2SD), over the interval 0.0 to 1.0 determined by sequencing and alignments of reads over all informative positions. The horizontal axis indicates the proportions of each cell line in each sample mix ranging from 0.0/1.0 to 1.0/0.0 184-hTERT/HCT116. All samples were mixed by cell number.

Figure 2.3 Allelic prevalence boxplots of idealised mixtures for the singleplex sequencing (184-hTERT specific, HCT116 specific, shared positions)
The box plots show the allelic prevalence values for the HCT116 (red), Shared (green) and 184-hTERT (blue) specific heterozygous positions in each sample. The vertical axis indicates the distribution of allelic prevalence values by boxes (interquartile range), horizontal line (median) and dots (outliers greater than 2SD), over the interval 0.0 to 1.0 determined by sequencing and alignments of reads over all informative positions.
The horizontal axis indicates the proportions of each cell line in each sample mix ranging from 0.0/1.0 to 1.0/0.0 184-hTERT/HCT116. All samples were mixed by cell number.

Table 2.2 Summarised expected and observed allelic prevalence measurements derived from multiplex sequencing of idealized 184-hTERT/HCT116 mixtures.
The table columns represent the proportions of cell mixing for HCT116 and 184-hTERT, allele target type (shared or cell type specific), median observed prevalence, expected prevalence (as a function of cell mixing) and difference of the median from expected. The standard deviation (SD) and variance are given once per target type and overall.

HCT116   184-hTERT  Target type  Median    Expected  Difference
1        0          HCT116       5.00E-01  5.00E-01  3.00E-04
1        0          HCT116       4.88E-01  5.00E-01  1.18E-02
0.999    0.001      HCT116       4.93E-01  5.00E-01  6.40E-03
0.99     0.01       HCT116       4.89E-01  4.95E-01  5.70E-03
0.9      0.1        HCT116       4.90E-01  4.50E-01  3.96E-02
0.75     0.25       HCT116       4.11E-01  3.75E-01  3.62E-02
0.5      0.5        HCT116       2.50E-01  2.50E-01  0.00E+00
0.5      0.5        HCT116       2.63E-01  2.50E-01  1.29E-02
0.25     0.75       HCT116       8.91E-02  1.25E-01  3.59E-02
0.1      0.9        HCT116       1.84E-02  5.00E-02  3.16E-02
0.1      0.9        HCT116       1.86E-02  5.00E-02  3.15E-02
0.01     0.99       HCT116       1.82E-02  5.00E-04  1.77E-02
0.001    0.999      HCT116       2.20E-03  5.00E-04  1.70E-03
0        1          HCT116       2.02E-03  0.00E+00  2.02E-03
0        1          HCT116       0.00E+00  0.00E+00  0.00E+00
HCT116 targets: SD 1.52E-02, variance 2.31E-04

1        0          Shared       4.93E-01  5.00E-01  6.90E-03
1        0          Shared       4.99E-01  5.00E-01  1.10E-03
0.999    0.001      Shared       5.09E-01  5.00E-01  9.40E-03
0.99     0.01       Shared       5.01E-01  5.00E-01  1.00E-03
0.9      0.1        Shared       4.97E-01  5.00E-01  3.10E-03
0.75     0.25       Shared       5.08E-01  5.00E-01  8.40E-03
0.5      0.5        Shared       5.02E-01  5.00E-01  1.80E-03
0.5      0.5        Shared       5.01E-01  5.00E-01  1.40E-03
0.25     0.75       Shared       5.11E-01  5.00E-01  1.10E-02
0.1      0.9        Shared       5.01E-01  5.00E-01  1.40E-03
0.1      0.9        Shared       5.04E-01  5.00E-01  3.70E-03
0.01     0.99       Shared       5.01E-01  5.00E-01  5.00E-04
0.001    0.999      Shared       4.95E-01  5.00E-01  5.30E-03
0        1          Shared       5.05E-01  5.00E-01  5.10E-03
0        1          Shared       5.06E-01  5.00E-01  5.70E-03
Shared targets: SD 3.36E-03, variance 1.13E-05

1        0          184-hTERT    1.75E-03  0.00E+00  1.75E-03
1        0          184-hTERT    1.18E-03  0.00E+00  1.18E-03
0.999    0.001      184-hTERT    2.20E-03  5.00E-04  1.70E-03
0.99     0.01       184-hTERT    1.22E-02  5.00E-03  7.19E-03
0.9      0.1        184-hTERT    7.08E-03  5.00E-02  4.29E-02
0.75     0.25       184-hTERT    6.75E-02  1.25E-01  5.75E-02
0.5      0.5        184-hTERT    2.36E-01  2.50E-01  1.39E-02
0.5      0.5        184-hTERT    2.36E-01  2.50E-01  1.41E-02
0.25     0.75       184-hTERT    4.06E-01  3.75E-01  3.07E-02
0.1      0.9        184-hTERT    4.89E-01  4.50E-01  3.90E-02
0.1      0.9        184-hTERT    4.74E-01  4.50E-01  2.35E-02
0.01     0.99       184-hTERT    4.76E-01  4.95E-01  1.91E-02
0.001    0.999      184-hTERT    5.00E-01  5.00E-01  2.00E-04
0        1          184-hTERT    5.03E-01  5.00E-01  3.10E-03
0        1          184-hTERT    4.97E-01  5.00E-01  3.20E-03
184-hTERT targets: SD 1.80E-02, variance 3.24E-04

Overall values: SD 1.46E-02, variance 2.13E-04

Table 2.3 Summarised expected and observed allelic prevalence measurements derived from singleplex sequencing of idealized 184-hTERT/HCT116 mixtures.
The table columns represent the proportions of cell mixing for HCT116 and 184-hTERT, allele target type (shared or cell type specific), median observed prevalence, expected prevalence (as a function of cell mixing), difference, standard deviation and variance.
HCT116   184-hTERT  Target type  Median    Expected  Difference
1        0          HCT116       5.01E-01  5.00E-01  1.10E-03
1        0          HCT116       5.00E-01  5.00E-01  4.00E-04
0.9      0.1        HCT116       4.77E-01  4.50E-01  2.67E-02
0.9      0.1        HCT116       4.73E-01  4.50E-01  2.33E-02
0.75     0.25       HCT116       4.67E-01  3.75E-01  9.21E-02
0.75     0.25       HCT116       4.75E-01  3.75E-01  1.00E-01
0.5      0.5        HCT116       2.73E-01  2.50E-01  2.28E-02
0.5      0.5        HCT116       2.74E-01  2.50E-01  2.40E-02
0.25     0.75       HCT116       7.33E-02  1.25E-01  5.17E-02
0.25     0.75       HCT116       7.41E-02  1.25E-01  5.09E-02
0.1      0.9        HCT116       9.34E-03  5.00E-02  4.07E-02
0.1      0.9        HCT116       8.31E-03  5.00E-02  4.17E-02
0        1          HCT116       1.03E-03  0.00E+00  1.03E-03
0        1          HCT116       7.35E-04  0.00E+00  7.35E-04
HCT116 targets: SD 3.19E-02, variance 1.02E-03

1        0          Shared       4.87E-01  5.00E-01  1.33E-02
1        0          Shared       4.98E-01  5.00E-01  1.70E-03
0.9      0.1        Shared       4.97E-01  5.00E-01  2.60E-03
0.9      0.1        Shared       5.04E-01  5.00E-01  3.60E-03
0.75     0.25       Shared       4.96E-01  5.00E-01  3.90E-03
0.75     0.25       Shared       4.87E-01  5.00E-01  1.26E-02
0.5      0.5        Shared       4.93E-01  5.00E-01  6.60E-03
0.5      0.5        Shared       4.98E-01  5.00E-01  2.00E-03
0.25     0.75       Shared       4.97E-01  5.00E-01  2.70E-03
0.25     0.75       Shared       4.94E-01  5.00E-01  5.70E-03
0.1      0.9        Shared       4.90E-01  5.00E-01  1.01E-02
0.1      0.9        Shared       4.97E-01  5.00E-01  3.40E-03
0        1          Shared       4.95E-01  5.00E-01  4.90E-03
0        1          Shared       4.89E-01  5.00E-01  1.12E-02
Shared targets: SD 4.08E-03, variance 1.66E-05

1        0          184-hTERT    1.09E-03  0.00E+00  1.09E-03
1        0          184-hTERT    5.31E-04  0.00E+00  5.31E-04
0.9      0.1        184-hTERT    2.58E-02  5.00E-02  2.42E-02
0.9      0.1        184-hTERT    2.60E-02  5.00E-02  2.40E-02
0.75     0.25       184-hTERT    2.79E-02  1.25E-01  9.71E-02
0.75     0.25       184-hTERT    3.03E-02  1.25E-01  9.47E-02
0.5      0.5        184-hTERT    2.17E-01  2.50E-01  3.29E-02
0.5      0.5        184-hTERT    2.18E-01  2.50E-01  3.25E-02
0.25     0.75       184-hTERT    4.19E-01  3.75E-01  4.39E-02
0.25     0.75       184-hTERT    4.25E-01  3.75E-01  4.95E-02
0.1      0.9        184-hTERT    4.93E-01  4.50E-01  4.31E-02
0.1      0.9        184-hTERT    4.94E-01  4.50E-01  4.39E-02
0        1          184-hTERT    4.96E-01  5.00E-01  3.70E-03
0        1          184-hTERT    4.95E-01  5.00E-01  4.90E-03
184-hTERT targets: SD 3.08E-02, variance 9.48E-04

Overall values: SD 2.86E-02, variance 8.17E-04

2.2 H2: DNA concentration mixed samples as contrasted with cell number mixed samples results in more accurate idealized mixtures

To address the second hypothesis, I set out to assess whether the accuracy and precision of results from idealised sample mixtures was dependent on the procedural method of sample mixing.
In order to test this, samples were mixed by cell number or by DNA concentration ratio and sequenced using the singleplex PCR sequencing method, with the same genome positions used for all samples. The cell number sample mixes were achieved by counting cells and mixing the cell lines in appropriate proportions based on cell count numbers, while the DNA concentration mixed samples were generated by extracting DNA from each cell line, quantifying the DNA concentration, and then mixing the DNA according to the specified proportion required. The idealised mixtures used for this experiment comprised the HCT116 and 184-hTERT diploid cell lines mixed in defined proportions of 0.0:1.0, 0.1:0.9, 0.25:0.75, 0.5:0.5, 0.75:0.25, 0.9:0.1 and 1.0:0.0, with a replicate for each sample.

The cell number and DNA concentration mixed samples both performed in accordance with the expected allelic prevalences per sample proportion mixture (Figure 3.4), although variation in the allelic prevalence of the samples can be seen between the mixing methodologies. As expected, both of the mixing methods produced samples that demonstrated a pattern of decreasing HCT116 allelic prevalence with a decrease in HCT116 proportion in each sample, and an increase in the allelic prevalence of 184-hTERT positions as the proportion of 184-hTERT in the sample increased. The samples mixed by DNA concentration, however, showed overall closer adherence to the expected allelic prevalence (Table 3.3), with a difference between observed and expected median of < 3.5×10^-2, an overall SD of 7.0×10^-3 and an overall variance of 5.4×10^-5 (Table 3.4), whereas the cell mixing method resulted in differences of < 9.7×10^-2, as noted below.
The DNA concentration mixed samples also produced more accurate and precise results for the HCT116 and 184-hTERT specific positions, with a calculated range of departure of the median from the expected values of 4.0×10^-4 to 1.6×10^-2 for HCT116 (SD 5.9×10^-3, variance 3.5×10^-5) and 5.0×10^-4 to 1.9×10^-2 for 184-hTERT (SD 6×10^-3, variance 4.3×10^-5) (Table 3.4). The shared positions showed a larger SD (9.4×10^-3) and variance (8.8×10^-5) than the cell number mixed samples (SD 5.0×10^-3, variance 1.66×10^-5) due to one outlying set of measurements: the 184-hTERT/HCT116 0.9/0.1 replicates displayed a larger spread of values within the box plot between the 1st and 3rd quartiles, ranging from 0.3 to 0.5, with a median value of 0.46, which is lower than the expected value of 0.5.

The idealised mixtures generated by cell number, although adhering to the expected pattern, showed a larger deviation from the expected values, evident from the overall calculated range of departure of the median from the expected values of 4.0×10^-4 to 9.7×10^-2, with an overall SD of 2.9×10^-2, and also produced a greater variance (8.0×10^-4) than the DNA concentration mixed samples. The HCT116 and 184-hTERT specific positions clustered around the expected values, evident from the calculated range of departure of the median from the expected values of 4.0×10^-4 to 9.7×10^-2, with a variance of 1.0×10^-3 for the HCT116 positions and 9.0×10^-4 for the 184-hTERT positions. The SD was 3.1×10^-2 for both the HCT116 and 184-hTERT positions, indicating that the values for these positions were close to the expected allelic prevalence values, although not as tightly clustered as those of the DNA concentration mixed samples (Tables 3.3 and 3.4).
The cell number mixed shared positions produced a small SD of 5.0x10-3 and variance of 1.66x10-5, with the departure of the median values from expected ranging from 2.0x10-3 to 1.0x10-2. Two shared positions (chr2_133020562 and chr10_38894616) consistently produced an allelic prevalence of 0.0 in all samples for both mixing methods, despite adequate coverage of the positions, suggesting primer design failure.

Figure 3.4 Allelic prevalence plots of idealised mixtures for the singleplex sequencing (184-hTERT specific, HCT116 specific, shared positions). The box plots show the allelic prevalence values for the HCT116 specific (red), shared (green) and 184-hTERT specific (blue) heterozygous positions in each sample. The vertical axis indicates the distribution of allelic prevalence values by boxes (interquartile range), horizontal line (median) and dots (outliers greater than 2SD), over the interval 0.0 to 1.0, determined by sequencing and alignment of reads over all informative positions. The horizontal axis indicates the proportions of each cell line in each sample mix, ranging from 0.0/1.0 to 1.0/0.0. The horizontal axis also indicates sample mixing by cell number or DNA concentration.

Table 3.4 Summarised expected and observed allelic prevalence measurements derived from singleplex sequencing of idealised 184-hTERT/HCT116 mixtures. The table columns represent the proportions of cell mixing for HCT116 and 184-hTERT, allele target type (shared or cell type specific), median observed prevalence, expected prevalence (as a function of cell mixing), difference, standard deviation and variance. DNA concentration mixed samples.
Allele target  HCT116      184-hTERT   Median    Expected allelic  Difference of median
               proportion  proportion            prevalence        from expected value
HCT116         1           0           5.01E-01  5.00E-01          1.10E-03
HCT116         1           0           5.00E-01  5.00E-01          4.00E-04
HCT116         0.9         0.1         4.46E-01  4.50E-01          3.60E-03
HCT116         0.9         0.1         4.55E-01  4.50E-01          5.30E-03
HCT116         0.75        0.25        3.59E-01  3.75E-01          1.63E-02
HCT116         0.75        0.25        3.57E-01  3.75E-01          1.78E-02
HCT116         0.5         0.5         2.45E-01  2.50E-01          5.40E-03
HCT116         0.5         0.5         2.60E-01  2.50E-01          9.90E-03
HCT116         0.25        0.75        1.35E-01  1.25E-01          9.70E-03
HCT116         0.25        0.75        1.21E-01  1.25E-01          4.50E-03
HCT116         0.1         0.9         6.28E-02  5.00E-02          1.28E-02
HCT116         0.1         0.9         6.20E-02  5.00E-02          1.20E-02
HCT116         0           1           1.00E-03  0.00E+00          1.00E-03
HCT116         0           1           7.00E-04  0.00E+00          7.00E-04
HCT116 positions: standard deviation (SD) 5.91E-03, variance 3.49E-05
Shared         1           0           4.87E-01  5.00E-01          1.33E-02
Shared         1           0           4.98E-01  5.00E-01          1.70E-03
Shared         0.9         0.1         4.97E-01  5.00E-01          3.30E-03
Shared         0.9         0.1         5.00E-01  5.00E-01          1.00E-04
Shared         0.75        0.25        5.00E-01  5.00E-01          0.00E+00
Shared         0.75        0.25        5.00E-01  5.00E-01          2.00E-04
Shared         0.5         0.5         4.98E-01  5.00E-01          2.20E-03
Shared         0.5         0.5         4.95E-01  5.00E-01          4.90E-03
Shared         0.25        0.75        4.94E-01  5.00E-01          6.10E-03
Shared         0.25        0.75        4.97E-01  5.00E-01          3.50E-03
Shared         0.1         0.9         4.64E-01  5.00E-01          3.58E-02
Shared         0.1         0.9         5.01E-01  5.00E-01          8.00E-04
Shared         0           1           4.95E-01  5.00E-01          4.90E-03
Shared         0           1           4.89E-01  5.00E-01          1.12E-02
Shared positions: standard deviation (SD) 9.40E-03, variance 8.83E-05
184-hTERT      1           0           1.10E-03  0.00E+00          1.10E-03
184-hTERT      1           0           5.00E-04  0.00E+00          5.00E-04
184-hTERT      0.9         0.1         4.61E-02  5.00E-02          3.90E-03
184-hTERT      0.9         0.1         4.58E-02  5.00E-02          4.20E-03
184-hTERT      0.75        0.25        1.38E-01  1.25E-01          1.32E-02
184-hTERT      0.75        0.25        1.40E-01  1.25E-01          1.46E-02
184-hTERT      0.5         0.5         2.42E-01  2.50E-01          8.50E-03
184-hTERT      0.5         0.5         2.31E-01  2.50E-01          1.87E-02
184-hTERT      0.25        0.75        3.61E-01  3.75E-01          1.41E-02
184-hTERT      0.25        0.75        3.76E-01  3.75E-01          7.00E-04
184-hTERT      0.1         0.9         4.34E-01  4.50E-01          1.63E-02
184-hTERT      0.1         0.9         4.34E-01  4.50E-01          1.57E-02
184-hTERT      0           1           4.96E-01  5.00E-01          3.70E-03
184-hTERT      0           1           4.95E-01  5.00E-01          4.90E-03
184-hTERT positions: standard deviation (SD) 6.58E-03, variance 4.33E-05
Overall values: standard deviation (SD) 7.33E-03, variance 5.37E-05

3.3  H3: Copy number-complex mixtures require copy number aware clonal analysis

To address the third hypothesis, I set out to assess how the estimation of clonal prevalence is affected in copy number complex cell lines, comparing the unmodified allele prevalence measurements, with model-adjusted clonal mutation prevalence determined
by PyClone. PyClone is a copy number aware algorithm that estimates clonal mutation prevalence by incorporating copy number information and clustering allele prevalence using a Dirichlet Process model (Roth et al., 2014). The performance of PyClone was evaluated with two approaches: first, PyClone analysis was performed on the cell mixing experiments of the copy number simple (diploid) HCT116 and 184-hTERT cell lines; second, analysis was performed with the copy number complex ovarian cell lines DAH55 and DAH56. A separate copy number informed PyClone analysis was conducted for each of the DAH55 and DAH56 cell lines, as copy number status differs between the two. Copy number information was obtained from OncoSNP (Yau, 2013) analysis of the DAH55 and DAH56 cell line copy number array measurements. Multiplex PCR sequencing was performed on the HCT116/184-hTERT idealised mixtures, with samples generated in defined proportions of 0.0:1.0, 0.001:0.999, 0.01:0.99, 0.1:0.9, 0.25:0.75, 0.5:0.5, 0.75:0.25, 0.9:0.1, 0.99:0.01, 0.999:0.001 and 1.0:0.0. Singleplex PCR sequencing was also performed on the HCT116/184-hTERT and DAH55/DAH56 idealised mixtures, with samples generated in defined proportions of 0.0:1.0, 0.1:0.9, 0.25:0.75, 0.5:0.5, 0.75:0.25, 0.9:0.1 and 1.0:0.0, with a replicate for each sample.

PyClone analysis takes the allelic prevalence measurements generated from singleplex PCR sequencing and from these estimates the proportion of cells that contain the mutation/SNV (Roth et al., 2014). To obtain the PyClone predicted cellular prevalence, allele-specific copy number information was also input into the analysis, and a posterior probability density distribution was produced for each SNV of the sample. The posterior densities then undergo post processing to determine the location of the mean or median estimate of cellular mutation prevalence.
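The reason copy number has to enter the estimate can be illustrated with a deterministic point calculation. The sketch below is not PyClone's model, which is a hierarchical Bayesian clustering model; it is a simplified two-population relationship stated here as an assumption, with hypothetical function and parameter names. It assumes the mutation is carried on a fixed number of copies in the mutated cells, that the remaining cells carry none, and it ignores sequencing error and uncertainty in the copy number calls.

```python
def allele_fraction(cellular_prevalence, mutant_copies, cn_mutant_cells, cn_other_cells):
    """Expected variant allele fraction at one locus.

    cellular_prevalence: fraction of cells carrying the mutation.
    mutant_copies: copies of the mutant allele in each mutated cell.
    cn_mutant_cells / cn_other_cells: total copy number at the locus in
    mutated and non-mutated cells (simplifying assumption: two populations).
    """
    phi = cellular_prevalence
    mutant_alleles = phi * mutant_copies
    total_alleles = phi * cn_mutant_cells + (1.0 - phi) * cn_other_cells
    return mutant_alleles / total_alleles


def cellular_prevalence(vaf, mutant_copies, cn_mutant_cells, cn_other_cells):
    """Invert allele_fraction to recover the fraction of mutated cells."""
    denominator = mutant_copies - vaf * (cn_mutant_cells - cn_other_cells)
    if denominator <= 0:
        raise ValueError("allele fraction is not achievable with these copy numbers")
    return vaf * cn_other_cells / denominator


# Diploid case: a heterozygous allele seen at 0.25 implies half the cells carry it.
print(cellular_prevalence(0.25, mutant_copies=1, cn_mutant_cells=2, cn_other_cells=2))

# Copy number gain in the mutated cells: doubling the allele fraction
# (the diploid shortcut) would understate the true prevalence of 0.5.
vaf = allele_fraction(0.5, mutant_copies=1, cn_mutant_cells=4, cn_other_cells=2)
print(2.0 * vaf, cellular_prevalence(vaf, 1, 4, 2))
```

In the diploid HCT116/184-hTERT mixtures the two copy numbers are equal and the relationship reduces to twice the allele fraction, whereas loci with differing copy number need the fuller expression.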
The relationships between the raw data and the PyClone estimates can be visualised and interpreted in a number of ways (Roth et al., 2014): 1) measured allele prevalence vs. sequence coverage and measured allele prevalence vs. inferred cellular prevalence, 2) PyClone predicted cellular mutation prevalence measurements of all positions in all samples, and 3) the degree of similarity between the posterior density distributions of alleles included in each PyClone cluster, as a similarity matrix.

3.3.1 H3.1 PyClone accurately estimates cellular mutation prevalence in copy number simple mixtures

We first asked how accurate PyClone analysis is at predicting cellular mutation prevalence with the copy number simple (diploid) HCT116 and 184-hTERT cell line mixtures of the multiplex and singleplex PCR sequencing experiments.

First, PyClone was used to generate sequence coverage plots and measured allele ratio vs. inferred cellular prevalence plots. The sequencing plots provide visualisation of the sequencing performance of each experiment and show any potential contamination of samples that may have occurred during cell mixing, while the allele ratio vs. cellular prevalence plots indicate the overall performance of PyClone in estimating the cellular prevalence of the clones in each sample. Allele ratio vs. coverage plots (Figure 3.5 A) were generated for the multiplex PCR sequencing HCT116/184-hTERT mixing experiment, for a selection of samples (HCT116/184-hTERT 0.5/0.5, 0.25/0.75, 0.1/0.9 and 0.0/1.0). This analysis showed that the multiplex method performed well, with 90% of positions in each sample producing sequencing coverage above the minimum coverage threshold of 50 reads (90% have 98-5054 reads) (Table A.1). One 184-hTERT position consistently produced high read coverage (above 1900 reads) in all samples, outperforming all other positions. For positions covered at >50 reads, allele ratio does not appear dependent on depth of coverage (Figure 3.5A).
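The coverage check described above can be sketched as a simple filter. The read counts below are hypothetical values for illustration only, not data from this study, and the function name is an assumption. The threshold of 50 reads comes from the text; treating it as a strict "greater than" comparison is an assumption.

```python
MIN_COVERAGE = 50  # reads; minimum coverage threshold stated in the text


def summarise_coverage(positions, min_coverage=MIN_COVERAGE):
    """Keep positions covered above the threshold and report their allele ratio.

    positions: list of (name, reference_reads, variant_reads) tuples.
    Returns (kept, fraction_kept), where kept holds (name, depth, allele_ratio).
    """
    kept = []
    for name, ref_reads, var_reads in positions:
        depth = ref_reads + var_reads
        if depth > min_coverage:  # assumption: strictly above the threshold
            kept.append((name, depth, var_reads / depth))
    fraction_kept = len(kept) / len(positions) if positions else 0.0
    return kept, fraction_kept


# Hypothetical read counts (not measurements from the experiments).
example_positions = [
    ("pos_a", 600, 580),
    ("pos_b", 30, 10),
    ("pos_c", 1500, 520),
    ("pos_d", 70, 65),
]
kept, fraction = summarise_coverage(example_positions)
for name, depth, ratio in kept:
    print(f"{name}: depth={depth}, allele ratio={ratio:.3f}")
print(f"fraction of positions retained: {fraction:.2f}")
```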
As expected, the allele prevalence ratio showed a gradual increase of 184-hTERT in the samples as the proportion of 184-hTERT increased. There was also no evidence of contamination during the cell mixing procedure, as shown by the allele prevalence ratio value of 0.0 for HCT116 in the 1.0 184-hTERT sample.

The cellular mutation prevalence mean vs. the variant allele ratios (Figure 3.5 B) for the above-mentioned sample mixes showed that, for the multiplex chemistry, the cellular mutation prevalence was recapitulated by PyClone at values close to expected for each of the samples. The shared positions all showed a PyClone predicted cellular prevalence of 1.0, as expected. The HCT116 and 184-hTERT specific positions also showed cellular mutation prevalence values close to expected (described in detail below). Very similar results were obtained for the copy number simple mixtures sequenced using singleplex chemistry. Some 85% of positions (Figure 3.6A) displayed >50 reads coverage (85% have 63-51,309 reads) (Table A.2). As for the multiplex chemistry, allele ratio values are not dependent on the sequencing coverage (Figure 3.6A).

Similar to the multiplex method, the singleplex PCR method showed a linear increase of 184-hTERT allele prevalence in the samples as the proportion of 184-hTERT increased. There was also no evidence of contamination during the cell mixing procedure, as shown by the allele ratio value of 0.0 for HCT116 in the 1.0 184-hTERT sample.

The comparison of cellular mutation prevalence vs. the variant allele ratios (Figure 3.6 B) for the singleplex PCR method showed a similar pattern to the multiplex chemistry, although the singleplex PCR sequencing produced cellular mutation prevalence values closer to expected than the multiplex method (detailed below).

A key result of PyClone analysis is the number of clusters formed and the strength of association of alleles within each cluster.
This relationship is explored in the similarity matrix plots (Figure 3.7), which represent the mean of the posterior distribution of each position and indicate the strength of association between positions (dark blue means 100% pairwise similarity and white means 0% pairwise similarity). In the idealised diploid cell line mixture, three major clusters should be observed, one for each of the HCT116, 184-hTERT and shared positions. Considering the multiplex sequencing chemistry applied to HCT116/184-hTERT mixtures (cell count and DNA count are combined), three large clusters (ranging from 31-48 positions in each) and three smaller clusters (ranging from 2-13 positions in each) were indeed seen (Figure 3.7). The HCT116 positions all clustered as a large group with 100% similarity. The 184-hTERT positions clustered as two groups: the larger cluster, with thirty-five 184-hTERT positions, had a similarity of 100%, while the smaller second 184-hTERT cluster, containing 13 positions, showed a variable similarity of 50-100%. The large cluster seen in the middle of the figure comprises thirty-seven shared positions and showed similarity within the cluster of 100%. Adjacent to this cluster were two smaller clusters (three and six positions), both composed of shared positions.

The similarity and clustering for the singleplex HCT116/184-hTERT idealised mixtures (cell number and DNA concentration combined) (Figure 3.8) also resulted in three large clusters (35-47 positions in each) and five small clusters (1-7 positions in each). The HCT116 positions clustered as one large group with 100% similarity, with only 3 positions clustering as one group independent of the main HCT116 cluster. The 184-hTERT positions also exhibited excellent clustering and 100% similarity, with the exception of one 184-hTERT position that formed an independent cluster with two shared positions; this small cluster showed 100% similarity.
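One common way to build a pairwise similarity matrix of this kind is to record, for each pair of positions, the fraction of posterior samples in which the two positions are assigned to the same cluster. The sketch below illustrates that idea under stated assumptions: whether PyClone computes its matrix in exactly this way is not established here, the function name is hypothetical, and the cluster labels are made-up values rather than output from these experiments.

```python
def posterior_similarity(label_samples):
    """Pairwise co-clustering frequency across posterior samples.

    label_samples: list of samples; each sample is a list of cluster labels,
    one per position, with positions in the same order in every sample.
    Returns a square matrix (list of lists) with values between 0.0 and 1.0.
    """
    n_samples = len(label_samples)
    n_positions = len(label_samples[0])
    counts = [[0] * n_positions for _ in range(n_positions)]
    for labels in label_samples:
        for i in range(n_positions):
            for j in range(n_positions):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[count / n_samples for count in row] for row in counts]


# Hypothetical cluster labels for four positions over four posterior samples.
samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 2],  # the last position moves to its own cluster in one sample
    [0, 0, 1, 1],
]
similarity = posterior_similarity(samples)
for row in similarity:
    print(" ".join(f"{value:.2f}" for value in row))
```

Under this reading, a value of 1.0 corresponds to the dark blue (100%) cells of the plot and a value of 0.0 to the white (0%) cells, and intermediate values indicate positions whose cluster membership is uncertain.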
The majority of shared positions (37 positions) also clustered as a large group with 100% similarity, with seven shared positions forming four smaller independent clusters, with high similarity of 90-100% seen within those clusters.

Finally, we asked how cellular mutation prevalence differed with respect to the expected values by examining the PyClone predictions for each of the samples. We separated multiplex vs. singleplex chemistry and cell counting vs. DNA concentration mixing as factors.

First, considering multiplex sequencing (Figure 3.9, Table 3.5), the greatest deviation from expected was 5.9x10-2 for the HCT116 specific positions and 6.6x10-2 for the 184-hTERT specific positions (Table 3.5). There were 13 of 48 184-hTERT positions that did not group as expected and formed a cluster (A) independent of the main 184-hTERT cluster. The shared (B) positions exhibited a PyClone cellular prevalence mean of 1.0, as expected. Three shared positions did not perform as expected and formed a separate cluster (E) independent of the main shared cluster. The cellular mutation prevalence values of all the positions in the individual mutation clusters (Figure 3.9 B) showed that variance of the positions was low, with a maximum variance of 1.7x10-11 for the HCT116 positions and 4.6x10-6 for the 184-hTERT positions (Table 3.5).

Second, considering cell counting as the mixing method (Figure 3.10, Table 3.6), both HCT116 and 184-hTERT positions indicated good agreement between PyClone predicted mutation cellular prevalence and the expected cellular prevalence. For HCT116 positions, the range of departure of the median from the expected values was 5.2x10-3 to 1.0x10-1 (SD 5.9x10-2) (Table 3.6) and for 184-hTERT the range was 3.5x10-3 to 1.8x10-1 (SD 1.5x10-2).
In this data set the observed prevalences for the 0.75, 0.25 and 0.1 mixtures deviated from the expected values by an amount greater than in the initial experiments described in sections 3.1/3.2, suggesting a one-off experimental error in either the cell counting or the PCR sequencing (Table 3.6). The shared positions performed as expected and exhibited the predicted cellular prevalence of 1.0. One cluster composed of four shared positions did not follow the expected pattern and formed a cluster independent of the main shared target cluster. The HCT116 and 184-hTERT positions were each tightly grouped (Figure 3.10 B), and the variance of the positions was low, with a maximum variance of 2.64x10-12 for HCT116 and 2.06x10-11 for 184-hTERT (Table 3.6).

Finally, considering DNA concentration determined mixtures (Figure 3.11, Table 3.7), HCT116 and 184-hTERT specific positions showed very good overall agreement. The departure of the median from the expected values ranged from 2.0x10-3 to 4.4x10-2 (SD of 1.39x10-2) for HCT116 and from 1.2x10-3 to 4.7x10-2 (SD of 1.5x10-2) for 184-hTERT (Table 3.7). The shared (C) positions performed as expected and exhibited the predicted cellular prevalence of 1.0. There were two distinct clusters (D and E), both composed of shared positions, that did not cluster as expected and formed clusters independent of the main shared mutation cluster. The cellular prevalence of all the positions in the individual mutation clusters (Figure 3.11 B) showed that they were tightly grouped and that the variance of the positions was very low, with a maximum variance of 7.3x10-13 for the HCT116 positions and 8.0x10-11 for the 184-hTERT positions (Table 3.7).

Figure 3.5 Coverage vs. allelic prevalence and cellular prevalence plots for the multiplex sequencing idealised HCT116/184-hTERT mixtures.
The plot shows HCT116 (red), shared (blue) and 184-hTERT (green) positions. The vertical axis of plot A indicates the allele ratio values from 0.0-1.0. The horizontal axis shows the coverage (number of reads) values. The horizontal axis of plot B indicates the cellular prevalence mean values from 0.0-1.0 and the vertical axis indicates the allele ratio ranging from 0.0-1.0. The dotted lines represent the expected prevalence, HCT116 (red) and 184-hTERT (green). Samples used in plots: 0.5/0.5, 0.75/0.25, 0.9/0.1 and 1.0/0.0 184-hTERT/HCT116.

Figure 3.6 Coverage vs. allelic prevalence and cellular prevalence plots for the singleplex sequencing idealised HCT116/184-hTERT mixtures. The plot shows HCT116 (red), shared (blue) and 184-hTERT (green) positions. The vertical axis of plot A indicates the allele ratio values from 0.0-1.0. The horizontal axis shows the coverage (number of reads) values. The horizontal axis of plot B indicates the cellular prevalence mean values from 0.0-1.0 and the vertical axis indicates the allele ratio ranging from 0.0-1.0. The dotted lines represent the expected prevalence, HCT116 (red) and 184-hTERT (green). Samples used in plots: 0.5/0.5, 0.75/0.25, 0.9/0.1 and 1.0/0.0 184-hTERT/HCT116.

Figure 3.7 Similarity matrix plot for the multiplex PCR sequencing HCT116/184-hTERT idealised mixtures. The horizontal axis lists all 48 of each of the shared, 184-hTERT specific and HCT116 specific positions. The vertical axis lists all 48 of each of the shared, 184-hTERT specific and HCT116 specific positions. The dark blue signifies 100% similarity and lightens in colour to white, signifying 0% similarity between any two points, one from the horizontal and one from the vertical axis.

Figure 3.8 Similarity matrix plot for singleplex PCR sequencing HCT116/184-hTERT idealised mixtures. The horizontal axis lists all 48 of each of the shared, 184-hTERT specific and HCT116 specific positions.
The vertical axis lists all 48 of each of the shared, 184-hTERT specific and HCT116 specific positions. The dark blue signifies 100% similarity and lightens in colour to white, signifying 0% similarity between any two points, one from the horizontal and one from the vertical axis.

Figure 3.9 PyClone cellular prevalence plots for the multiplex sequencing idealised HCT116/184-hTERT mixtures. In each plot, the vertical axis indicates the range of mean cellular mutation prevalence from 0.0-1.0 and the horizontal axis indicates the proportions of each sample mix. In plot A the dotted lines represent the expected values and the coloured circles represent the PyClone predicted cellular mutation prevalence values of each sample. The cluster size (n) denotes the number of positions in each cluster. The clusters correctly predicted by PyClone as the three HCT116, 184-hTERT and shared clusters are: C (HCT116), D (184-hTERT) and B (shared). Plot B illustrates the cellular prevalence of all positions in each sample.

Figure 3.10 PyClone cellular prevalence plots for the samples mixed by cell number from the singleplex sequencing idealised HCT116/184-hTERT mixtures. In each plot, the vertical axis indicates the range of mean cellular mutation prevalence from 0.0-1.0 and the horizontal axis indicates the proportions of each sample mix. In plot A the dotted lines represent the expected values and the coloured circles represent the PyClone predicted cellular mutation prevalence values of each sample. The cluster size (n) denotes the number of positions in each cluster. The clusters correctly predicted by PyClone as the three HCT116, 184-hTERT and shared clusters are: A (HCT116), B (184-hTERT) and C (shared). Plot B illustrates the cellular prevalence of all positions in each sample.
[Figure 3.10 plot labels recovered from the image: panels A and B; titles "Cellular Mutation Prevalence" and "Cellular Mutation Prevalence All Positions"; horizontal axis "Sample Ratio (184-hTERT: HCT116)", from hTERT=0.0;HCT116=1.0 to hTERT=1.0;HCT116=0.0 with two replicates per ratio; vertical axis "Mean Cellular Mutation Prevalence", 0.00 to 1.00; cluster size (n): A (47), B (48), C (37), D (4).]

Figure 3.11 PyClone cellular prevalence plots for the samples mixed by DNA concentration from singleplex sequencing idealised HCT116/184-hTERT mixtures. In each plot, the vertical axis indicates the range of mean cellular mutation prevalence from 0.0-1.0 and the horizontal axis indicates the proportions of each sample mix. In plot A the dotted lines represent the expected values and the coloured circles represent the PyClone predicted cellular mutation prevalence values of each sample. The cluster size (n) denotes the number of positions in each cluster. The clusters correctly predicted by PyClone as the three HCT116, 184-hTERT and shared clusters are: A (HCT116), B (184-hTERT) and C (shared). Plot B illustrates the cellular prevalence of all positions in each sample.
[Figure 3.11 plot labels recovered from the image: panels A and B; titles "Cellular Mutation Prevalence" and "Cellular Mutation Prevalence All Positions"; horizontal axis "Sample Ratio (184-hTERT: HCT116)", from hTERT=0.0;HCT116=1.0 to hTERT=1.0;HCT116=0.0 with two replicates per ratio; vertical axis "Mean Cellular Mutation Prevalence", 0.00 to 1.00; cluster size (n): A (47), B (48), C (37), D (3), E (2).]

Table 3.5 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from multiplex sequencing of HCT116/184-hTERT mixtures. Table columns represent the allele target type (cell type specific), proportions of cell mixing for HCT116 and 184-hTERT (the proportions also represent the expected values), observed mutation cluster median, difference, standard deviation and the variance of the individual mutation clusters.
Allele target  Expected value /  Individual mutation  Difference of median  Individual mutation
               Proportion        cluster median       from expected value   cluster variance
184-hTERT      1                 9.82E-01             1.82E-02              3.35E-06
184-hTERT      0.9               9.71E-01             7.06E-02              4.46E-06
184-hTERT      0.75              8.40E-01             9.01E-02              3.88E-06
184-hTERT      0.5               5.02E-01             1.92E-03              2.23E-06
184-hTERT      0.25              1.69E-01             8.08E-02              2.94E-07
184-hTERT      0.1               3.26E-02             6.74E-02              4.38E-10
184-hTERT      0                 1.22E-02             1.22E-02              2.50E-11
184-hTERT      0.99              9.64E-01             2.62E-02              4.66E-06
184-hTERT      0.01              4.31E-02             3.31E-02              7.12E-09
184-hTERT      0.999             9.81E-01             1.79E-02              2.97E-06
184-hTERT      0.001             1.70E-02             1.60E-02              1.57E-10
184-hTERT      1                 9.84E-01             1.62E-02              2.99E-06
184-hTERT      0                 1.02E-02             1.02E-02              1.55E-10
184-hTERT      0.5               5.00E-01             2.42E-04              2.19E-06
184-hTERT      0.9               9.66E-01             6.61E-02              3.79E-06
184-hTERT positions: overall SD 3.07E-02
HCT116         0                 1.48E-02             1.48E-02              2.98E-13
HCT116         0.1               5.17E-02             4.83E-02              1.18E-11
HCT116         0.25              1.92E-01             5.82E-02              5.55E-12
HCT116         0.5               5.22E-01             2.15E-02              2.50E-12
HCT116         0.75              8.10E-01             5.97E-02              5.89E-12
HCT116         0.9               9.51E-01             5.09E-02              3.17E-12
HCT116         1                 9.64E-01             3.62E-02              9.61E-12
HCT116         0.01              5.43E-02             4.43E-02              8.51E-12
HCT116         0.99              9.44E-01             4.64E-02              6.97E-12
HCT116         0.001             1.59E-02             1.49E-02              1.41E-12
HCT116         0.999             9.56E-01             4.33E-02              4.76E-12
HCT116         0                 7.67E-03             7.67E-03              7.52E-12
HCT116         1                 9.56E-01             4.40E-02              1.76E-12
HCT116         0.5               5.15E-01             1.54E-02              1.69E-11
HCT116         0.1               5.53E-02             4.47E-02              5.95E-13
HCT116 positions: overall SD 1.72E-02

Table 3.6 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from singleplex sequencing of HCT116/184-hTERT mixtures. Cell number mixed samples only. Table columns represent the allele target type (cell type specific), proportions of cell mixing for HCT116 and 184-hTERT (the proportions also represent the expected values), observed mutation cluster median, difference, standard deviation and the variance of the individual mutation clusters.
Allele target  Expected value /  Individual mutation  Difference of median  Individual mutation
               Proportion        cluster median       from expected value   cluster variance
184-hTERT      1                 9.86E-01             1.38E-02              1.63E-12
184-hTERT      0                 6.76E-03             6.76E-03              9.05E-14
184-hTERT      0.9               9.80E-01             7.99E-02              5.92E-13
184-hTERT      0.75              8.50E-01             9.98E-02              1.82E-12
184-hTERT      0.5               4.56E-01             4.42E-02              2.06E-11
184-hTERT      0.25              6.35E-02             1.87E-01              9.74E-13
184-hTERT      0.1               6.12E-02             3.88E-02              6.10E-14
184-hTERT      1                 9.88E-01             1.21E-02              7.62E-13
184-hTERT      0                 3.49E-03             3.49E-03              4.09E-12
184-hTERT      0.9               9.82E-01             8.16E-02              1.09E-11
184-hTERT      0.75              8.50E-01             1.00E-01              6.38E-12
184-hTERT      0.5               4.56E-01             4.36E-02              4.49E-13
184-hTERT      0.25              6.95E-02             1.81E-01              1.48E-12
184-hTERT      0.1               6.33E-02             3.67E-02              2.24E-13
184-hTERT positions: overall SD 5.95E-02
HCT116         0                 7.40E-03             7.40E-03              6.43E-13
HCT116         1                 9.82E-01             1.82E-02              8.37E-13
HCT116         0.1               2.59E-02             7.41E-02              1.20E-12
HCT116         0.25              1.52E-01             9.77E-02              5.79E-13
HCT116         0.5               5.41E-01             4.07E-02              1.56E-12
HCT116         0.75              9.31E-01             1.81E-01              9.11E-13
HCT116         0.9               9.38E-01             3.80E-02              2.72E-12
HCT116         0                 5.19E-03             5.19E-03              3.34E-13
HCT116         1                 9.86E-01             1.43E-02              9.71E-15
HCT116         0.1               2.47E-02             7.53E-02              6.60E-14
HCT116         0.25              1.55E-01             9.50E-02              9.31E-14
HCT116         0.5               5.44E-01             4.45E-02              3.12E-13
HCT116         0.75              9.30E-01             1.80E-01              7.68E-13
HCT116         0.9               9.39E-01             3.86E-02              2.64E-12
HCT116 positions: overall SD 5.75E-02

Table 3.7 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from singleplex sequencing of HCT116/184-hTERT mixtures. DNA concentration mixed samples only. Table columns represent the allele target type (cell type specific), proportions of cell mixing for HCT116 and 184-hTERT (the proportions also represent the expected values), observed mutation cluster median, difference, standard deviation and the variance of the individual mutation clusters.
Allele target  Expected value /  Individual mutation  Difference of median  Individual mutation
               Proportion        cluster median       from expected value   cluster variance
184-hTERT      1                 9.87E-01             1.30E-02              4.53E-12
184-hTERT      0                 6.11E-03             6.11E-03              2.78E-12
184-hTERT      0.9               8.73E-01             2.70E-02              1.78E-11
184-hTERT      0.75              7.44E-01             5.70E-03              4.46E-11
184-hTERT      0.5               5.01E-01             8.30E-04              1.19E-11
184-hTERT      0.25              2.93E-01             4.29E-02              3.78E-12
184-hTERT      0.1               1.03E-01             3.38E-03              2.93E-12
184-hTERT      1                 9.88E-01             1.17E-02              2.40E-11
184-hTERT      0                 2.95E-03             2.95E-03              1.36E-11
184-hTERT      0.9               8.81E-01             1.95E-02              9.64E-12
184-hTERT      0.75              7.74E-01             2.39E-02              1.44E-11
184-hTERT      0.5               4.77E-01             2.35E-02              2.28E-12
184-hTERT      0.25              2.97E-01             4.67E-02              8.07E-11
184-hTERT      0.1               9.88E-02             1.19E-03              1.38E-11
184-hTERT positions: overall SD 1.50E-02
HCT116         0                 6.35E-03             6.35E-03              1.35E-16
HCT116         1                 9.82E-01             1.85E-02              4.16E-13
HCT116         0.1               1.34E-01             3.39E-02              2.79E-14
HCT116         0.25              2.74E-01             2.45E-02              3.05E-15
HCT116         0.5               5.02E-01             1.97E-03              8.29E-14
HCT116         0.75              7.05E-01             4.45E-02              7.69E-15
HCT116         0.9               8.80E-01             1.97E-02              1.73E-13
HCT116         0                 4.61E-03             4.61E-03              7.30E-15
HCT116         1                 9.86E-01             1.37E-02              7.28E-13
HCT116         0.1               1.30E-01             3.04E-02              1.11E-14
HCT116         0.25              2.44E-01             5.90E-03              2.02E-15
HCT116         0.5               5.25E-01             2.48E-02              4.22E-14
HCT116         0.75              7.13E-01             3.74E-02              3.50E-14
HCT116         0.9               9.03E-01             2.90E-03              1.21E-13
HCT116 positions: overall SD 1.39E-02

H3.2: PyClone estimation of cellular mutation prevalence is concordant with expected values in copy number complex mixtures

In the final set of experiments, we asked how the inclusion of copy number information affects the accuracy and precision of PyClone predictions of cellular mutation prevalence and clonal structure in copy number complex mixtures. We did this by performing PyClone analysis on idealised mixtures of DAH55 and DAH56, two ovarian cancer cell lines with measured copy number and mutation status. Two comparisons were made: first with no copy number information applied (assuming a diploid state), and second with the PyClone analysis informed by the copy number state of the DAH55 and DAH56 cell lines. Since PyClone assumes the copy number of each clone will be the same, we also examined how copy number variation within a clone affects the accuracy and precision of PyClone predictions.
This was achieved by looking at SNV positions over all CNV regions, regardless of whether they were copy number concordant between cell lines (non-homogenous copy number), and contrasting these with SNVs restricted to regions of homogenous copy number between the two cell lines.

In all these analyses, SNVs that failed to validate as a variant after alignment and binomial testing were removed (12 positions from DAH55, 10 positions from DAH56 and 2 shared positions). For simplicity, only singleplex chemistry was used, and mixtures generated by cell counting and DNA concentration were pooled in the analysis. As for other mixture experiments, comparison of allele ratio vs. coverage (Figure 3.12 A and Figure 3.13 A) for a selection of samples (DAH55/DAH56 0.5/0.5, 0.25/0.75, 0.1/0.9 and 0.0/1.0) showed good sequence coverage (90% of positions had 57-33,352 reads) (Table A.3). Unlike previous comparisons, in this copy number complex mixture the relationship between mixing proportion and the increase in allele prevalence (e.g. green DAH56 dots, Figure 3.12A) is less obvious. Some minor contamination of DAH56 into the DAH55 samples may have occurred, as for the 1.0 DAH55 proportion five DAH56 positions were seen at an allele ratio above the expected value of 0.0.

We examined PyClone predictions using either DAH55 copy number calls (Figure 3.12B) or DAH56 copy number calls (Figure 3.13B). In both cases, the DAH55 and DAH56 specific positions consistently had a lower PyClone predicted cellular mutation prevalence than expected, although the DAH56 positions were closer to expected than DAH55. The shared positions also produced cellular prevalence means lower than expected.

Next, considering the situation where no copy number information is incorporated, the similarity and clustering for the singleplex DAH55/DAH56 idealised mixtures (cell number and DNA concentration combined) resulted in many more than the three expected clusters and a range of similarity between 40-100% (Figure 3.14).
Six clusters were observed, the largest of which, in the bottom right of the figure, comprised thirty DAH56 positions with 100% similarity, as expected. One smaller cluster was composed of twenty-three DAH55 positions and showed 100% similarity. The remaining clusters did not group as expected and were composed mainly of shared and DAH55 positions, ranging in similarity from 30-100%.

We next incorporated copy number, first including all regions to simulate copy number clonal variation. Including copy number at all positions from DAH55 and DAH56 still resulted in considerably more than three major clusters, regardless of whether DAH55 copy number calls (Figure 3.15) or DAH56 copy number calls (Figure 3.16) were used to inform the model. For DAH55 calls, eight clusters were seen (ranging in size from two to twenty-seven positions, with similarity from 30-100%, Figure 3.15) and for DAH56 copy number calls five clusters were observed (ranging in size from two to forty-two positions, with similarity from 40-100%, Figure 3.16). As expected, the largest cluster for the DAH55 informed analysis was composed entirely of DAH55 positions, and for the DAH56 informed analysis the DAH56 positions comprised the largest cluster. These observations are reflected in the deviation of cellular mutation prevalence in each situation. With no copy number information (Figure 3.17), five clusters were formed, only three of which (E, B and D, Figure 3.17) correctly map to the expected private and shared cell line positions. The other clusters are mixed and do not track the expected allele prevalence values. For the DAH55 cluster, the range of departure of the median from the expected values was 2.6x10-3 to 1.7x10-4 (SD of 4.6x10-2), and for DAH56 (B) the range was 1.4x10-2 to 4.2x10-1 (SD of 1.5x10-1) (Table 3.8). The measured shared (B) cluster exhibited a cellular prevalence mean of 1.0, as expected.
There was a large deviation of the observed cellular mutation prevalence values from the expected values for the cell number mixed 0.25 DAH56 proportions, by an amount greater than the usual variance, suggesting a one-off experimental error in either the cell mixing or the PCR sequencing; this is also seen in all subsequent analyses (Table 3.8). Twenty-nine positions formed two clusters (C and E) independent of the main DAH55, DAH56 and shared clusters. Cluster C was composed of nineteen shared positions and one DAH56 position, and cluster E of nine DAH55 positions. With copy number information at all positions of DAH55 or DAH56, the number of clusters is still greater than the expected three, and the departures of cellular prevalence from expected are greater than for copy number simple mixtures (Figure 3.18, Figure 3.19). Considering DAH55 copy number informed analysis, only three of eight clusters (A, B and E, Figure 3.18) correctly map to the expected private and shared cell line positions. The other clusters are mixed and do not track the expected allele prevalence values.

For DAH55 (A), the range of departure of the median from the expected values was 5.4x10-3 to 2.3x10-1 (SD 5.9x10-2) and for DAH56 (E) the range was 2.9x10-3 to 2.6x10-1 (SD 8.3x10-2) (Table 3.9). Forty-one positions did not group with the main DAH55, DAH56 or shared cellular prevalence mutation clusters and formed five distinct independent clusters (D, F-H) that deviated from the PyClone predictions. Cluster D was composed of two DAH55, four DAH56 and twelve shared positions, cluster F was composed of three shared positions, cluster G was composed of three shared positions and cluster H was composed of one DAH55 and one DAH56 position. Considering DAH56 copy number informed PyClone analysis, only three of eight clusters (A, B and C, Figure 3.19) correctly map to the expected private and shared cell line positions.
The DAH55 and DAH56 positions showed a range of departure of the median from the expected values of 2.1 × 10^-3 to 2.8 × 10^-1 (SD 8.65 × 10^-2) for DAH55 (A) and 6.16 × 10^-3 to 2.2 × 10^-1 (SD 6.54 × 10^-2) for DAH56 (B) (Table 3.10).

Combining all copy number calls into the analysis, only three of the six clusters formed (A, C and D, Figure 3.20) correctly map to the expected private and shared cell line positions. For DAH55 (A), the range of departure of the median from the expected values was 5.3 × 10^-4 to 1.5 × 10^-1 (SD of 5.5 × 10^-2), and for DAH56 (D) the range was 5.8 × 10^-3 to 3.5 × 10^-1 (SD of 1.3 × 10^-1) (Table 3.11). The shared (B) positions did not perform as expected and only produced a predicted cellular mutation prevalence of 1.0 for the 0.1/1.0 DAH55/DAH56 samples. Twenty-six positions did not group with the main DAH55, DAH56 or shared clusters and formed three distinct independent clusters (C, E-F) that deviated from the expected clusters. Cluster C was composed of three shared positions, cluster E was composed of five shared positions and one DAH56 position, and cluster F was composed of two DAH55 positions.

Finally, we analyzed the situation in which copy number complex mixtures accurately reflect the underlying model assumption of PyClone, namely that copy number is uniform between clones. This was achieved by considering only SNV positions in regions of homogenous copy number state (homogenous SNVs). Here the agreement with the predicted cellular prevalence and clusters was much closer (Figure 3.21). Three robust clusters were formed, as expected. The range of departure of the median from the expected values for DAH55 (C) was 8.45 × 10^-3 to 1.9 × 10^-1 (SD 5.7 × 10^-2), and the range for the DAH56 (B) positions was 9.9 × 10^-3 to 2.3 × 10^-1 (SD 6.9 × 10^-2) (Table 3.12). The shared (A) positions in each sample performed as expected with a predicted cellular prevalence of 1.0.
The DAH55 (C) and DAH56 (B) clusters also performed as expected and no extra clusters were formed, in contrast with the non-homogenous SNV analysis (Figures 3.17-20).

Overall results summary

(1) Multiplex and singleplex PCR sequencing methods performed similarly and produced allele prevalence measurements in close agreement with expected values.
(2) Testing of sample mixing methods showed that DNA concentration mixed samples perform more accurately than samples mixed by cell number.
(3) PyClone applied to copy number simple mixtures results in the expected number of clonal groups and good agreement with the expected cellular mutation prevalence.
(4) PyClone performs as expected on copy number complex mixtures: when copy number homogeneity is simulated, good agreement is seen; when locus specific copy number heterogeneity exists, the current PyClone model does not accurately resolve the number of clusters and clonal prevalence.

Figure 2.12 DAH55 CN informed, coverage vs. allelic prevalence and cellular prevalence plots for the singleplex sequencing idealised DAH55/DAH56 mixtures. The plot shows DAH55 (green), shared (blue) and DAH56 (red) positions. The vertical axis of plot A indicates the allele ratio values from 0.0-1.0. The horizontal axis shows the coverage (number of reads) values. The horizontal axis of plot B indicates the cellular prevalence mean value from 0.0-1.0 and the vertical axis indicates the allele ratio ranging from 0.0-1.0. The dotted lines represent the expected prevalence, DAH55 (green) and DAH56 (red). Samples used in plots: 0.5/0.5, 0.75/0.25, 0.9/0.1 and 1.0/0.0 DAH55/DAH56 mixes. DAH55 copy number informed.

Figure 2.13 DAH56 CN informed, coverage vs. allelic prevalence and cellular prevalence plots for the singleplex sequencing idealised DAH55/DAH56 mixtures. The plot shows DAH55 (green), shared (blue) and DAH56 (red) positions.
The vertical axis of plot A indicates the allele ratio values from 0.0-1.0. The horizontal axis shows the coverage (number of reads) values. The horizontal axis of plot B indicates the cellular prevalence mean value from 0.0-1.0 and the vertical axis indicates the allele ratio ranging from 0.0-1.0. The dotted lines represent the expected prevalence, DAH55 (green) and DAH56 (red). Samples used in plots: 0.5/0.5, 0.75/0.25, 0.9/0.1 and 1.0/0.0 DAH55/DAH56 mixes. DAH56 copy number informed.

Figure 2.14 Similarity matrix plot for singleplex PCR sequencing DAH55/DAH56 idealised mixtures (No copy number information). The horizontal axis lists all 48 of each of the shared, DAH55 specific and DAH56 specific positions. The vertical axis lists all 48 of each of the shared, DAH55 specific and DAH56 specific positions. The dark blue signifies 100% similarity and lightens in colour to white, signifying 0% similarity between any two points, one from the horizontal and one from the vertical axis.

Figure 2.15 DAH55 CN informed similarity matrix plot for singleplex PCR sequencing DAH55/DAH56 idealised mixtures. The horizontal axis lists all 48 of each of the shared, DAH55 specific and DAH56 specific positions. The vertical axis lists all 48 of each of the shared, DAH55 specific and DAH56 specific positions. The dark blue signifies 100% similarity and lightens in colour to white, signifying 0% similarity between any two points, one from the horizontal and one from the vertical axis.

Figure 2.16 DAH56 CN informed similarity matrix plot for singleplex PCR sequencing DAH55/DAH56 idealised mixtures. The horizontal axis lists all 48 of each of the shared, DAH55 specific and DAH56 specific positions. The vertical axis lists all 48 of each of the shared, DAH55 specific and DAH56 specific positions.
The dark blue signifies 100% similarity and lightens in colour to white, signifying 0% similarity between any two points, one from the horizontal and one from the vertical axis.

Figure 2.17 PyClone cellular prevalence plots for the singleplex sequencing idealised DAH55/DAH56 mixtures. No copy number information used. The vertical axis indicates the mean cellular mutation prevalence ranging from 0.0-1.0 and the horizontal axis indicates the proportions of each sample mix. The dotted lines represent the expected values and the coloured circles represent the PyClone predicted cellular mutation prevalence values of each sample. The cluster size (n) denotes the number of positions in each cluster. The clusters correctly predicted by PyClone as the three DAH55, DAH56 and shared clusters are: E (DAH55), B (DAH56) and D (shared).

Figure 2.18 DAH55 CN informed PyClone cellular prevalence plots for the singleplex sequencing idealised DAH55/DAH56 mixtures. The vertical axis indicates the mean cellular mutation prevalence ranging from 0.0-1.0 and the horizontal axis indicates the proportions of each sample mix. The dotted lines represent the expected values and the coloured circles represent the PyClone predicted cellular mutation prevalence values of each sample. The cluster size (n) denotes the number of positions in each cluster. The clusters correctly predicted by PyClone as the three DAH55, DAH56 and shared clusters are: A (DAH55), E (DAH56) and B (shared). All samples in this plot are DAH55 copy number informed.

Figure 2.19 DAH56 CN informed PyClone cellular prevalence plots for the singleplex sequencing idealised DAH55/DAH56 mixtures. The vertical axis indicates the mean cellular mutation prevalence ranging from 0.0-1.0 and the horizontal axis indicates the proportions of each sample mix.
The dotted lines represent the expected values and the coloured circles represent the PyClone predicted cellular mutation prevalence values of each sample. The cluster size (n) denotes the number of positions in each cluster. The clusters correctly predicted by PyClone as the three DAH55, DAH56 and shared clusters are: A (DAH55), B (DAH56) and C (shared). All samples in this plot are DAH56 copy number informed.

Figure 2.20 PyClone cellular prevalence plots of the non-homogenous SNVs for the singleplex sequencing idealised DAH55/DAH56 mixtures. The vertical axis indicates the mean cellular mutation prevalence ranging from 0.0-1.0 and the horizontal axis indicates the proportions of each sample mix. The dotted lines represent the expected values and the coloured circles represent the PyClone predicted cellular mutation prevalence values of each sample. The cluster size (n) denotes the number of positions in each cluster. The clusters correctly predicted by PyClone as the three DAH55, DAH56 and shared clusters are: D (DAH55), A (DAH56) and C (shared). Only non-homogenous SNVs used for analysis.

Figure 2.21 PyClone cellular prevalence plots of the homogenous SNVs for the singleplex sequencing idealised DAH55/DAH56 mixtures. The vertical axis indicates the mean cellular mutation prevalence ranging from 0.0-1.0 and the horizontal axis indicates the proportions of each sample mix. The dotted lines represent the expected values and the coloured circles represent the PyClone predicted cellular mutation prevalence values of each sample. The cluster size (n) denotes the number of positions in each cluster. The clusters correctly predicted by PyClone as the three DAH55, DAH56 and shared clusters are: C (DAH55), B (DAH56) and A (shared). Only homogenous SNVs used for analysis.
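The similarity values plotted in Figures 2.14-2.16 can be illustrated with a small sketch. It assumes, as one common reading of such matrices, that the similarity between two positions is the fraction of sampled clusterings in which they share a cluster. The function name and the toy trace below are hypothetical and are not taken from PyClone or from the thesis data.

```python
def co_clustering_similarity(label_trace):
    """Return an n x n matrix of pairwise co-clustering frequencies.

    label_trace is a list of sampled clusterings; each clustering is a list
    holding one cluster label per position. Entry [i][j] of the result is the
    fraction of clusterings in which positions i and j share a label.
    """
    if not label_trace:
        raise ValueError("label_trace must contain at least one clustering")
    n = len(label_trace[0])
    counts = [[0] * n for _ in range(n)]
    for labels in label_trace:
        if len(labels) != n:
            raise ValueError("every clustering must label the same positions")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(label_trace)
    return [[count / total for count in row] for row in counts]


# Hypothetical trace: four sampled clusterings of four positions.
trace = [
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
similarity = co_clustering_similarity(trace)
```

In this toy trace, positions 0 and 1 share a cluster in three of the four clusterings, so their similarity is 0.75, which would plot as a mid-blue cell rather than the dark blue of a fully consistent pair.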
Table 2.8 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from singleplex sequencing of DAH55/DAH56 mixtures. Table columns represent the allele target type (cell type specific), proportions of cell mixing for DAH55 and DAH56 (the proportions represent the expected values), observed mutation cluster median, difference, standard deviation and the variance of the individual mutation clusters. No copy number information used.

Expected value/Proportion   Individual mutation cluster median   Difference of median from expected value   Individual mutation cluster variance

DAH55 (Overall SD 4.57E-02)
1.00   9.23E-01   7.66E-02   8.12E-04
1.00   9.18E-01   8.22E-02   8.26E-04
0.90   8.55E-01   4.53E-02   8.43E-04
0.90   8.45E-01   5.53E-02   7.91E-04
0.90   9.21E-01   2.11E-02   7.63E-04
0.90   8.49E-01   5.14E-02   6.17E-04
0.75   6.97E-01   5.33E-02   4.98E-04
0.75   7.30E-01   2.02E-02   6.53E-04
0.75   8.69E-01   1.19E-01   7.46E-04
0.75   9.09E-01   1.59E-01   6.96E-04
0.50   4.67E-01   3.26E-02   2.05E-04
0.50   6.72E-01   1.72E-01   4.72E-04
0.25   2.47E-01   2.62E-03   4.02E-05
0.25   2.74E-01   2.40E-02   5.82E-05
0.25   2.82E-01   3.20E-02   6.56E-05
0.25   2.17E-01   3.29E-02   3.51E-05
0.10   1.77E-01   7.66E-02   2.20E-05
0.10   1.27E-01   2.66E-02   7.05E-06
0.00   3.44E-02   3.44E-02   4.87E-07
0.00   3.36E-02   3.36E-02   5.15E-07

DAH56 (Overall SD 1.47E-01)
0.00   3.09E-02   3.09E-02   1.93E-34
0.00   3.16E-02   3.16E-02   1.93E-34
0.10   1.15E-01   1.46E-02   3.08E-33
0.10   1.33E-01   3.31E-02   3.08E-33
0.10   4.14E-02   5.86E-02   4.81E-35
0.10   1.15E-01   1.55E-02   7.70E-34
0.25   2.15E-01   3.48E-02   1.23E-32
0.25   2.17E-01   3.28E-02   1.93E-32
0.25   1.15E-01   1.35E-01   4.81E-33
0.25   4.61E-02   2.04E-01   0.00E+00
0.50   3.49E-01   1.51E-01   2.77E-32
0.50   2.53E-01   2.47E-01   0.00E+00
0.75   4.74E-01   2.76E-01   2.77E-32
0.75   4.66E-01   2.84E-01   1.11E-31
0.75   4.46E-01   3.04E-01   1.23E-32
0.75   5.01E-01   2.49E-01   1.23E-32
0.90   5.14E-01   3.86E-01   0.00E+00
0.90   5.45E-01   3.55E-01   1.23E-32
1.00   5.85E-01   4.15E-01   4.93E-32
1.00   5.88E-01   4.12E-01   1.97E-31
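The "difference of median from expected value" column, and the range of departure quoted in the text, can be reproduced with a short sketch. The helper name is hypothetical, and the rows are the first five DAH55 entries of Table 2.8 with medians rounded to three decimals, so results agree with the table only to that precision. How the overall SD was calculated is not stated in the text and is not reproduced here.

```python
def departures_from_expected(rows):
    """Absolute difference between each observed cluster median and its expected value.

    rows is a list of (expected_proportion, observed_median) pairs.
    """
    return [abs(observed - expected) for expected, observed in rows]


# First five DAH55 rows of Table 2.8 (no copy number information),
# with medians rounded to three decimals.
rows = [
    (1.00, 0.923),
    (1.00, 0.918),
    (0.90, 0.855),
    (0.90, 0.845),
    (0.90, 0.921),
]
diffs = departures_from_expected(rows)

# The range of departure reported in the text is the smallest and largest difference.
lowest, highest = min(diffs), max(diffs)
```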
Table 2.9 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from singleplex sequencing of DAH55/DAH56 mixtures. DAH55 copy number informed. Table columns represent the allele target type (cell type specific), proportions of cell mixing for DAH55 and DAH56 (the proportions represent the expected values), observed mutation cluster median, difference, standard deviation and the variance of the individual mutation clusters.

Expected value/Proportion   Individual mutation cluster median   Difference of median from expected value   Individual mutation cluster variance

DAH55 (Overall SD 5.87E-02)
1.00   9.78E-01   2.20E-02   6.49E-04
1.00   9.68E-01   3.16E-02   4.88E-04
0.90   8.69E-01   3.12E-02   8.63E-04
0.90   8.73E-01   2.74E-02   9.95E-04
0.90   9.73E-01   7.27E-02   4.68E-04
0.90   8.81E-01   1.95E-02   3.68E-04
0.75   6.68E-01   8.18E-02   4.58E-04
0.75   6.98E-01   5.20E-02   5.10E-04
0.75   9.05E-01   1.55E-01   4.14E-04
0.75   9.80E-01   2.30E-01   5.55E-04
0.50   3.79E-01   1.21E-01   1.28E-04
0.50   6.33E-01   1.33E-01   2.27E-04
0.25   1.53E-01   9.66E-02   1.13E-05
0.25   1.85E-01   6.54E-02   2.69E-05
0.25   1.86E-01   6.36E-02   1.91E-05
0.25   1.36E-01   1.14E-01   1.55E-05
0.10   1.03E-01   3.39E-03   1.29E-05
0.10   5.58E-02   4.42E-02   1.33E-06
0.00   5.63E-03   5.63E-03   1.38E-08
0.00   5.39E-03   5.39E-03   9.78E-10

DAH56 (Overall SD 8.27E-02)
0.00   7.62E-03   7.62E-03   7.23E-09
0.00   7.16E-03   7.16E-03   1.37E-09
0.10   3.62E-01   2.62E-01   4.82E-06
0.10   1.05E-01   5.29E-03   2.45E-08
0.10   1.07E-02   8.93E-02   7.94E-09
0.10   7.67E-02   2.33E-02   6.38E-07
0.25   2.53E-01   2.90E-03   5.10E-06
0.25   2.45E-01   4.54E-03   4.18E-06
0.25   7.27E-02   1.77E-01   3.69E-07
0.25   1.51E-02   2.35E-01   4.99E-08
0.50   5.12E-01   1.25E-02   1.74E-04
0.50   3.11E-01   1.89E-01   2.26E-05
0.75   8.26E-01   7.65E-02   1.32E-03
0.75   8.26E-01   7.56E-02   1.38E-03
0.75   7.76E-01   2.61E-02   9.45E-04
0.75   8.94E-01   1.44E-01   1.50E-03
0.90   9.30E-01   3.01E-02   1.87E-03
0.90   9.56E-01   5.61E-02   1.72E-03
1.00   9.74E-01   2.61E-02   1.41E-03
1.00   9.80E-01   2.03E-02   1.51E-03

Table 2.10 Summarised expected and observed PyClone predicted cellular
mutation prevalence measurements derived from singleplex sequencing of DAH55/DAH56 mixtures. DAH56 copy number informed. Table columns represent the allele target type (cell type specific), proportions of cell mixing for DAH55 and DAH56 (the proportions also represent the expected values), observed mutation cluster median, difference, standard deviation and the variance of the individual mutation clusters.

Expected value/Proportion   Individual mutation cluster median   Difference of median from expected value   Individual mutation cluster variance

DAH55 (Overall SD 8.65E-02)
1.00   7.89E-01   2.11E-01   4.93E-32
1.00   7.23E-01   2.77E-01   1.97E-31
0.90   6.67E-01   2.33E-01   1.11E-31
0.90   6.54E-01   2.46E-01   3.08E-31
0.90   7.49E-01   1.51E-01   1.11E-31
0.90   7.59E-01   1.41E-01   3.08E-31
0.75   6.08E-01   1.42E-01   1.11E-31
0.75   5.74E-01   1.76E-01   0
0.75   7.54E-01   3.69E-03   1.11E-31
0.75   7.63E-01   1.31E-02   0
0.50   3.66E-01   1.34E-01   0
0.50   5.94E-01   9.36E-02   1.23E-32
0.25   1.64E-01   8.65E-02   3.08E-33
0.25   1.80E-01   6.97E-02   7.70E-34
0.25   1.85E-01   6.54E-02   6.93E-33
0.25   1.32E-01   1.18E-01   0
0.10   1.02E-01   2.07E-03   4.81E-33
0.10   6.85E-02   3.15E-02   7.70E-34
0.00   1.36E-02   1.36E-02   1.20E-35
0.00   1.29E-02   1.29E-02   7.52E-35

DAH56 (Overall SD 6.54E-02)
0.00   1.74E-02   1.74E-02   6.15E-11
0.00   1.75E-02   1.75E-02   5.42E-11
0.10   8.50E-02   1.50E-02   1.82E-11
0.10   1.06E-01   6.16E-03   3.04E-11
0.10   2.21E-02   7.79E-02   4.93E-11
0.10   8.63E-02   1.37E-02   1.53E-11
0.25   2.39E-01   1.08E-02   1.34E-13
0.25   2.42E-01   8.22E-03   3.62E-12
0.25   8.13E-02   1.69E-01   3.14E-11
0.25   2.71E-02   2.23E-01   2.96E-11
0.50   4.75E-01   2.48E-02   1.67E-13
0.50   2.99E-01   2.01E-01   9.59E-13
0.75   7.40E-01   1.01E-02   4.86E-11
0.75   7.25E-01   2.51E-02   4.25E-11
0.75   6.88E-01   6.24E-02   3.71E-11
0.75   8.00E-01   4.99E-02   8.17E-11
0.90   8.25E-01   7.54E-02   9.45E-11
0.90   8.81E-01   1.89E-02   1.13E-10
1.00   9.36E-01   6.37E-02   1.46E-10
1.00   9.34E-01   6.63E-02   1.06E-10

Table 2.11 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from singleplex sequencing of DAH55/DAH56 mixtures.
Non-homogenous SNVs only used in analysis. Table columns represent the allele target type (cell type specific), proportions of cell mixing for DAH55 and DAH56 (the proportions also represent the expected values), observed mutation cluster median, difference, standard deviation and the variance of the individual mutation clusters. Only positions that had differing copy number across the cell lines were included.

Expected value/Proportion   Individual mutation cluster median   Difference of median from expected value   Individual mutation cluster variance

DAH55 (Overall SD 5.54E-02)
1.00   9.74E-01   2.63E-02   4.96E-06
1.00   9.65E-01   3.55E-02   3.77E-06
0.90   8.49E-01   5.09E-02   7.02E-06
0.90   8.57E-01   4.26E-02   7.83E-06
0.90   9.67E-01   6.68E-02   2.93E-06
0.90   8.58E-01   4.23E-02   2.54E-06
0.75   6.38E-01   1.12E-01   3.29E-06
0.75   6.82E-01   6.82E-02   3.78E-06
0.75   9.01E-01   1.51E-01   3.02E-06
0.75   9.76E-01   2.26E-01   3.01E-06
0.50   3.84E-01   1.16E-01   1.15E-06
0.50   6.12E-01   1.12E-01   1.54E-06
0.25   1.54E-01   9.63E-02   1.38E-07
0.25   1.85E-01   6.48E-02   1.43E-07
0.25   1.89E-01   6.08E-02   9.72E-08
0.25   1.37E-01   1.13E-01   1.58E-07
0.10   9.95E-02   5.34E-04   9.32E-08
0.10   5.55E-02   4.45E-02   1.69E-09
0.00   3.59E-03   3.59E-03   6.19E-11
0.00   4.29E-03   4.29E-03   9.61E-11

DAH56 (Overall SD 1.27E-01)
0.00   6.49E-03   6.49E-03   7.52E-37
0.00   5.78E-03   5.78E-03   7.52E-37
0.10   7.89E-02   2.11E-02   0
0.10   1.12E-01   1.18E-02   1.73E-33
0.10   1.00E-02   9.00E-02   0
0.10   9.07E-02   9.35E-03   0
0.25   2.40E-01   9.70E-03   3.08E-33
0.25   2.40E-01   1.01E-02   6.93E-33
0.25   8.43E-02   1.66E-01   7.70E-34
0.25   1.28E-02   2.37E-01   2.71E-35
0.50   4.16E-01   8.41E-02   1.23E-32
0.50   2.83E-01   2.17E-01   0
0.75   5.38E-01   2.12E-01   1.23E-32
0.75   5.26E-01   2.24E-01   0
0.75   5.20E-01   2.30E-01   1.23E-32
0.75   5.74E-01   1.76E-01   0
0.90   5.66E-01   3.34E-01   0
0.90   6.03E-01   2.97E-01   4.93E-32
1.00   6.59E-01   3.41E-01   1.23E-32
1.00   6.49E-01   3.51E-01   1.23E-32

Table 2.12 Summarised expected and observed PyClone predicted cellular mutation prevalence measurements derived from singleplex sequencing of DAH55/DAH56
mixtures. Homogenous SNVs only used for analysis. Table columns represent the allele target type (cell type specific), proportions of cell mixing for DAH55 and DAH56 (the proportions also represent the expected values), observed mutation cluster median, difference, standard deviation and the variance of the individual mutation clusters. Only positions that were of the same copy number across cell lines were used for analysis.

Expected value/Proportion   Individual mutation cluster median   Difference of median from expected value   Individual mutation cluster variance

DAH55 (Overall SD 5.74E-02)
1.00   9.44E-01   5.61E-02   6.54E-04
1.00   9.36E-01   6.37E-02   5.40E-04
0.90   8.88E-01   1.16E-02   4.18E-04
0.90   8.87E-01   1.26E-02   1.02E-03
0.90   9.43E-01   4.29E-02   6.65E-04
0.90   9.09E-01   8.60E-03   6.15E-04
0.75   7.38E-01   1.16E-02   3.96E-04
0.75   7.10E-01   4.03E-02   4.80E-04
0.75   8.87E-01   1.37E-01   5.06E-04
0.75   9.43E-01   1.93E-01   7.44E-04
0.50   3.67E-01   1.33E-01   5.15E-05
0.50   6.71E-01   1.71E-01   3.41E-04
0.25   1.66E-01   8.36E-02   4.85E-06
0.25   1.94E-01   5.58E-02   3.07E-05
0.25   1.94E-01   5.57E-02   2.75E-05
0.25   1.35E-01   1.15E-01   1.86E-06
0.10   1.14E-01   1.41E-02   7.15E-06
0.10   6.45E-02   3.55E-02   2.29E-06
0.00   1.10E-02   1.10E-02   1.42E-09
0.00   8.45E-03   8.45E-03   3.89E-08

DAH56 (Overall SD 6.85E-02)
0.00   9.90E-03   9.90E-03   6.81E-11
0.00   1.19E-02   1.19E-02   1.43E-10
0.10   7.75E-02   2.25E-02   1.08E-09
0.10   8.99E-02   1.01E-02   1.09E-09
0.10   1.51E-02   8.49E-02   1.41E-10
0.10   6.65E-02   3.35E-02   5.46E-10
0.25   2.32E-01   1.79E-02   6.39E-09
0.25   2.13E-01   3.69E-02   5.29E-09
0.25   5.91E-02   1.91E-01   1.84E-09
0.25   2.08E-02   2.29E-01   7.80E-11
0.50   4.60E-01   4.01E-02   2.64E-09
0.50   2.77E-01   2.23E-01   5.42E-09
0.75   7.26E-01   2.37E-02   6.50E-10
0.75   7.14E-01   3.55E-02   7.37E-11
0.75   7.21E-01   2.88E-02   2.23E-09
0.75   8.00E-01   4.96E-02   9.96E-10
0.90   8.32E-01   6.84E-02   3.43E-09
0.90   8.72E-01   2.75E-02   1.20E-08
1.00   9.21E-01   7.86E-02   2.42E-08
1.00   9.34E-01   6.60E-02   1.70E-08

Chapter 3: Discussion

3.1 Overview

Significant advancements in sequencing technologies have enabled investigation of complex cell mixtures in
cancers. The elucidation of clonal structure is of particular importance for gaining a clearer understanding of the evolution of a tumour, which in turn provides valuable information on how best to treat the cancer. While many studies have investigated bulk complex cell samples from tumours, there is little information regarding how simple idealised mixtures perform across different sequencing methodologies and what effect copy number has on the performance of a computational model used to determine clonal structure. In this thesis I investigated how idealised cell line mixtures performed under a number of conditions and the ability of the computational model (PyClone) to predict the cellular prevalence and clonal structure of copy number simple (diploid) and copy number complex mixtures. Specifically, the main goals were 1) to assess the performance of multiplex and singleplex PCR sequencing methodologies; 2) to determine how the method of cell mixing would impact the accuracy of results; and 3) to evaluate the ability of PyClone to retrieve clonal structure from copy number complex cell lines with copy number naïve and copy number aware analysis.

3.2 Summary of findings

3.2.1 Multiplex PCR sequencing performed similarly to singleplex PCR sequencing in determining allelic prevalence from idealised mixtures

In this study, two sequencing methodologies were examined: the multiplex PCR sequencing method and the singleplex PCR sequencing method. The comparison of the two methods identified potential limitations and strengths of each. The results showed that both sequencing methods performed well and produced allelic prevalence values close to what was expected, with an overall SD of 1.5 × 10^-2 for the multiplex method and 2.9 × 10^-2 for the singleplex method.
The multiplex method produced results that were closer (within 50%) to the expected values than the singleplex method, although the singleplex method showed less variance across the positions in the samples.

High read coverage of all positions is an important factor in ensuring quality sequencing results and downstream analysis. In this study we found that the singleplex method outperformed the multiplex method in this respect, with ten times the sequencing coverage (read depth) of the positions. This may be due to the sequencing method itself or to the number of cycles completed for each experiment: 300 cycles for the multiplex experiment and 500 cycles for the singleplex experiment. Both methods did, however, produce high read coverage for the majority of the positions.

The selection of the PCR sequencing method depends on the intended application and the amount of data required for downstream analysis. In this study we found that the multiplex PCR sequencing method is more expensive and offers less flexibility in primer selection; for example, some regions of the genome may be overlooked because the SNVs in those regions are not compatible with a multiplex primer design. This method does, however, perform well overall and would be a good choice in a more clinical setting where a set of common primers is designed and used regularly for an experiment. The singleplex PCR sequencing method is more laborious than the multiplex method in the initial qPCR steps, but its advantage is flexibility: primers can be added or removed depending on experimental requirements, which may be of benefit in a research setting. Furthermore, singleplex primer designs are locally optimal, ensuring that most regions of the genome are accessible, whereas multiplex designs sacrifice individual primer performance for optimization of a pool.
It also permits a greater number of quality control procedures on the samples at various steps of the methodology than the multiplex method, so any sample failures can be detected early.

Overall, we found that either method is a suitable option for obtaining a linear representation of allelic abundance, which can be used in downstream analysis for the retrieval of clonal structure in mixed cell populations. Some positions, mainly 184-hTERT positions, showed preferential amplification in both the multiplex and singleplex experiments and produced greater sequence coverage than other positions. This PCR amplification bias could be due to differences in the GC content of the primers. Primers with a higher GC content will amplify with higher efficiency, causing superior amplification of these positions. Amplification bias can also be introduced if the melting temperatures of the positions selected differ by more than 5 °C (Aird et al., 2011; Polz & Cavanaugh, 1998). To prevent this bias in future, all primers should be designed with similar GC content and primer melting temperatures.

3.2.2 DNA concentration mixed samples when contrasted with cell number mixed samples resulted in more accurate idealised mixtures

When performing deep sequencing, equal sequencing performance of all samples in a library is important to ensure there is no sample bias. To understand the effect that sample type has on sequencing results, we used two mixing methodologies and compared the accuracy of the results. Samples were mixed by cell number or by DNA concentration in specified proportions and contrasted in the singleplex HCT116/184-hTERT mixing experiment. We identified that while both mixing procedures produced allelic prevalences as expected, the DNA concentration mixed samples showed a closer adherence to the expected values.
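As a rough illustration of mixing by DNA concentration, the sketch below computes the volume of each DNA stock needed for one stock to contribute a chosen fraction of the total DNA mass. The function name, stock concentrations and total mass are hypothetical and are not taken from the thesis protocol. The sketch also assumes that DNA mass fraction approximates cell fraction, which is only reasonable when both cell lines carry a similar amount of DNA per cell.

```python
def mixing_volumes(conc_a, conc_b, fraction_a, total_mass):
    """Volumes (in µL) of two DNA stocks needed for a target mass mixture.

    conc_a and conc_b are stock concentrations in ng/µL, fraction_a is the
    fraction of total_mass (in ng) that stock A should contribute.
    """
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must lie between 0 and 1")
    if conc_a <= 0 or conc_b <= 0:
        raise ValueError("stock concentrations must be positive")
    mass_a = fraction_a * total_mass
    mass_b = total_mass - mass_a
    return mass_a / conc_a, mass_b / conc_b


# Hypothetical stocks: 50 ng/µL and 25 ng/µL, mixed 0.75/0.25 to 100 ng in total.
vol_a, vol_b = mixing_volumes(conc_a=50.0, conc_b=25.0, fraction_a=0.75, total_mass=100.0)
```

With these illustrative values, 1.5 µL of the first stock and 1.0 µL of the second give the 0.75/0.25 mixture.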
PyClone analysis also resulted in a closer adherence to the expected cellular mutation prevalence for the DNA concentration mixed samples than for the cell number mixed samples.

The discrepancy between the mixing types may be due to cell line bias. The cell number mixing method may introduce a bias when cells are counted, as technical error in the counting may skew cell numbers in favour of one cell line over the other. It was also noted that the HCT116 cells did not extract as well as the 184-hTERT cells, which again may introduce bias when the cell lines are extracted together. In a biologically relevant bulk tumour sample the cells would not need to be mixed, so mixing by cell number would not occur, and the DNA concentration of the bulk cell population would be considered for any subsequent sequencing analysis. The study revealed that the DNA concentration mixed samples produced results that were closer to the values expected from PyClone than the samples mixed by cell number, and that mixing by DNA concentration would be the more accurate methodology for future mixing experiments.

3.2.3 PyClone analysis accurately estimated cellular mutation prevalence in copy number simple mixtures

PyClone is a computational model used for the estimation of cellular mutation prevalence and clonal structure in mixed cell populations. In this study we tested the ability of PyClone to elucidate cellular mutation prevalence in copy number simple (HCT116 and 184-hTERT) idealised mixtures. The 184-hTERT and HCT116 cell lines were chosen because both are assumed to be diploid and had previous exome/genome data available for SNV calling. Multiplex and singleplex PCR sequencing of the HCT116 and 184-hTERT idealised mixtures showed that, regardless of the sequencing method, PyClone estimated cellular mutation prevalence as expected and the samples showed a close adherence to the expected values.
In the clonal mixtures, the expectation is for three major clonal groups to be identified, and in fact three major clusters were observed, with mis-assignment of only a small number of positions into orphan clusters. This shows that the model performs as expected and does not introduce a large unexpected variation in the idealized clonal population analysis. We did observe, however, that a few of the 184-hTERT and shared positions in the multiplex PCR sequencing experiment, and a few of the shared positions in the singleplex PCR sequencing experiment, grouped independently of the main 184-hTERT or shared mutation cluster. This unexpected cluster formation may be due to primer bias and unequal amplification of the positions, leaving PyClone unable to cluster the positions correctly. Further analysis of these positions, by re-sequencing or by repeating the binomial testing to ensure that only validated positions are included in the analysis, would allow for a clearer understanding of why PyClone did not correctly estimate the clonal mutation prevalence, and repeating the experiment would rule out technical or sequencing error if the same clustering pattern occurred. We were able to provide “ground truth” based analysis suggesting that the current PyClone model analysis of copy number simple mixtures is accurate and that PyClone is a computational model that can correctly retrieve clonal structure within cell populations having simple genomes (i.e. diploid state).

3.2.4 PyClone analysis of copy number complex mixtures required copy number informed analysis for correct cellular prevalence estimations

The most important experiment undertaken in this thesis was the creation of idealised mixtures with copy number complex cell lines, and the retrieval of cellular mutation prevalence and clonal structure using PyClone. In bulk tumour samples copy number alterations are common and must be considered when informing analysis with computational models such as PyClone.
Even setting aside copy number complexity, the copy number complex cell line mixtures did not perform as well as the copy number simple cell lines, possibly owing to technical error when mixing the cell lines or when pooling equal volumes of all samples into the library, or to poor primer design. If some of the primers were not designed to ensure the best amplification, or were placed in regions of high GC content, they may not have performed as well as expected.

As previously discussed, the PyClone model performs well when copy number simple cell line mixtures are analysed, although as copy number complexity increases the ability of PyClone to produce the expected results decreases. We performed copy number naïve and copy number informed analysis and identified the differences in cellular mutation clusters between the two analysis types.

Firstly, PyClone analysis of the copy number naïve sample mixtures (assuming a diploid state) did not yield clonal structure and prevalence close to the expected values. Many more than three main clusters were formed. This is a predictable result, since copy number variation changes the expectation of allele prevalence and, if not incorporated, will lead to mis-assignment of alleles to clusters. We next used the same sample mixtures and performed copy number informed PyClone analysis of each cell line to predict cellular mutation prevalence and clonal structure of the samples. A fundamental assumption of the PyClone model used in this analysis is that each clone has the same copy number state at each locus. We therefore tested the idealized mixtures under this constraint, but also examined the situation where copy number heterogeneity exists at the same locus in different clones. These two situations were simulated by either restricting analysis to SNVs at regions of equal copy number status, or including all regions.
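The effect of copy number on the expected allele prevalence can be made concrete with a small sketch. It assumes an idealised mixture of exactly two cell lines mixed by cell fraction, an SNV private to one line, and unbiased amplification. The function name and the copy number values are hypothetical and are not drawn from the DAH55/DAH56 calls.

```python
def expected_allele_prevalence(fraction_a, variant_copies_a, total_copies_a, total_copies_b):
    """Expected variant allele fraction of an SNV private to cell line A.

    fraction_a is the cell fraction of line A in a two-line mixture; the other
    arguments are copy numbers at the SNV locus. Line B carries no variant copies.
    """
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must lie between 0 and 1")
    variant = fraction_a * variant_copies_a
    total = fraction_a * total_copies_a + (1.0 - fraction_a) * total_copies_b
    if total == 0:
        raise ValueError("the locus has no copies in the mixture")
    return variant / total


# Heterozygous SNV in a 0.5/0.5 mixture, both lines diploid at the locus.
diploid = expected_allele_prevalence(0.5, 1, 2, 2)

# Same SNV and mixture, but line A has four copies at the locus (one variant copy).
amplified = expected_allele_prevalence(0.5, 1, 4, 2)
```

Under these assumptions the diploid case gives 0.25, while the amplified case gives about 0.17. A copy number naïve analysis that treats the second locus as heterozygous diploid would double the allele fraction and infer a cellular prevalence of about 0.33 instead of 0.5, which illustrates the kind of mis-assignment described above.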
In the separated analysis, copy number calls were analyzed using either DAH55 or DAH56 copy number values.

When only homogenous copy number states were used, PyClone accurately resolved three major clusters, and the cellular mutation prevalence of each cluster closely tracked the expected values to within 8.5 × 10^-3 to 2.3 × 10^-1. However, when all copy number states were included, the performance degraded, with multiple large clusters being formed (six clusters, each containing between two and seventeen positions). Thus, where there is significant copy number heterogeneity between clones, the version of PyClone used in this work would mis-allocate alleles to clonal groups. This is expected behavior, since clonal copy number variation was not encompassed in the initial model. In general, measurement of copy number suffers from quantitative issues, such as the dynamic range and accuracy of measurements for extreme amplifications (beyond 4 copies or so). Incorrect copy number information used to inform PyClone analysis will result in misleading cellular mutation cluster predictions and is likely a significant source of uncertainty in practice. This manifests as very broad posterior distributions (uncertain locations of the mean) and/or mis-assignment of alleles to clusters. The variety of posterior distributions can also be seen in bulk tumour populations, where some genes show a narrow clonal prevalence while others show a distribution that spans multiple clonal prevalence values (Shah et al., 2012).

3.3 Future directions and conclusion

In this study I determined that PyClone analysis of copy number complex mixtures requires copy number informed analysis in order to obtain accurate cellular mutation prevalence in idealised mixtures. This was an important finding and helped establish a ground truth based on idealised mixtures and the prediction of cellular mutation prevalence using PyClone.
The findings also show the limitations of the assumptions in the model used in this study. Extending the PyClone model to incorporate heterogeneous copy number states would be of value, and this has been achieved in part with the TITAN model (Ha et al., 2014), which uses the same principles as PyClone for clustering alleles but assumes that copy number states can be heterogeneous. A future extension would be to apply this model to the idealised mixtures generated here, to determine whether the model assumptions of TITAN improve on those of PyClone. Future experiments analysing idealised mixtures by single cell analysis would also be of great value in establishing a ground truth for how single cell experiments perform with PyClone. These experiments would provide an understanding of any noise introduced during single cell analysis, helping to inform future single cell analysis and clonal evolution studies. Finally, further experiments involving drug selection of mixed cell lines in idealised mixtures would provide important information on the effects of drug selection on clonal populations over time and show how well PyClone is able to retrieve the changes in cellular mutation prevalence.

In summary, PyClone is a useful tool for the elucidation of cellular prevalence and clonal structure of mixed cell populations. Copy number informed analysis is an essential step in ensuring correct prediction of clonal structure in mixed cell populations. An in-depth understanding of the clonal structure of tumours is of particular importance in designing the most efficacious treatment strategy in a clinical setting and improving survival outcomes.

Bibliography

Abelson, S., Shamai, Y., Berger, L., Shouval, R., Skorecki, K., & Tzukerman, M. (2012). Intratumoral heterogeneity in the self-renewal and tumorigenic differentiation of ovarian cancer. Stem Cells (Dayton, Ohio), 30(3), 415–24. doi:10.1002/stem.1029
Aird, D., Ross, M.
G., Chen, W.-S., Danielsson, M., Fennell, T., Russ, C., … Gnirke, A. (2011). Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biology, 12(2), R18. doi:10.1186/gb-2011-12-2-r18
Anderson, A. R. A., Weaver, A. M., Cummings, P. T., & Quaranta, V. (2006). Tumor morphology and phenotypic evolution driven by selective pressure from the microenvironment. Cell, 127(5), 905–15. doi:10.1016/j.cell.2006.09.042
Aparicio, S., & Caldas, C. (2013). The implications of clonal genome evolution for cancer medicine. The New England Journal of Medicine, 368(9), 842–51. doi:10.1056/NEJMra1204892
Bedard, P. L., Hansen, A. R., Ratain, M. J., & Siu, L. L. (2013). Tumour heterogeneity in the clinic. Nature, 501(7467), 355–64. doi:10.1038/nature12627
Bentley, D. R., Balasubramanian, S., Swerdlow, H. P., Smith, G. P., Milton, J., Brown, C. G., … Smith, A. J. (2008). Accurate whole human genome sequencing using reversible terminator chemistry. Nature, 456(7218), 53–9. doi:10.1038/nature07517
Brattain, M. G., Fine, W. D., Khaled, F. M., Thompson, J., & Brattain, D. E. (1981). Heterogeneity of malignant cells from a human colonic carcinoma. Cancer Research, 41, 1751–1756.
Bunting, S. F., & Nussenzweig, A. (2013). End-joining, translocations and cancer. Nature Reviews. Cancer, 13(7), 443–54. doi:10.1038/nrc3537
Chen, Z.-Y., Zhong, W.-Z., Zhang, X.-C., Su, J., Yang, X.-N., Chen, Z.-H., … Wu, Y.-L. (2012). EGFR mutation heterogeneity and the mixed response to EGFR tyrosine kinase inhibitors of lung adenocarcinomas. The Oncologist, 17(7), 978–85. doi:10.1634/theoncologist.2011-0385
Darwin, C. (1859). On the Origin of Species (p. 162). John Murray. Retrieved July 14, 2014, from http://www.darwins-theory-of-evolution.com/
De Sousa E Melo, F., Vermeulen, L., Fessler, E., & Medema, J. P. (2013). Cancer heterogeneity--a multifaceted view. EMBO Reports, 14(8), 686–95. doi:10.1038/embor.2013.92
DeVita, V.
T., & Rosenberg, S. A. (2012). Two hundred years of cancer research. The New England Journal of Medicine, 366(23), 2207–14. doi:10.1056/NEJMra1204479
Ding, L., Ley, T. J., Larson, D. E., Miller, C. A., Koboldt, D. C., Welch, J. S., … DiPersio, J. F. (2012). Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature, 481(7382), 506–10. doi:10.1038/nature10738
Fidler, I. J., & Hart, I. R. (1982). Biological diversity in metastatic neoplasms: origins and implications. Science (New York, N.Y.), 217(4564), 998–1003. Retrieved August 20, 2014, from http://www.ncbi.nlm.nih.gov/pubmed/7112116
Fisher, R., Pusztai, L., & Swanton, C. (2013). Cancer heterogeneity: implications for targeted therapeutics. British Journal of Cancer, 108(3), 479–85. doi:10.1038/bjc.2012.581
Forment, J. V., Kaidi, A., & Jackson, S. P. (2012). Chromothripsis and cancer: causes and consequences of chromosome shattering. Nature Reviews. Cancer, 12(10), 663–70. doi:10.1038/nrc3352
Gerashchenko, T. S., Denisov, E. V., Litviakov, N. V., Zavyalova, M. V., Vtorushin, S. V., Tsyganov, M. M., … Cherdyntseva, N. V. (2013). Intratumor heterogeneity: nature and biological significance. Biochemistry, 78(11), 1201–15. doi:10.1134/S0006297913110011
Gerlinger, M., Endesfelder, D., Gronroos, E., Martinez, P., … Swanton, C. (2012). Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. The New England Journal of Medicine, 366(10), 883–92.
Grada, A., & Weinbrecht, K. (2013). Next-generation sequencing: methodology and application. The Journal of Investigative Dermatology, 133(8), e11. doi:10.1038/jid.2013.248
Greaves, M., & Maley, C. C. (2012). Clonal evolution in cancer. Nature, 481(7381), 306–13. doi:10.1038/nature10762
Ha, G., Roth, A., Khattra, J., Ho, J., Yap, D., Prentice, L. M., … Shah, S. P. (2014). TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data. Genome Research.
doi:10.1101/gr.180281.114
Hanahan, D., & Weinberg, R. A. (2000). The Hallmarks of Cancer. Cell, 100(1), 57–70. doi:10.1016/S0092-8674(00)81683-9
Hanahan, D., & Weinberg, R. A. (2011). Hallmarks of cancer: the next generation. Cell, 144(5), 646–74. doi:10.1016/j.cell.2011.02.013
Hayes, D. F., & Paoletti, C. (2013). Circulating tumour cells: insights into tumour heterogeneity. Journal of Internal Medicine, 274(2), 137–43. doi:10.1111/joim.12047
Hou, Y., Song, L., Zhu, P., Zhang, B., Tao, Y., Xu, X., … Wang, J. (2012). Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. Cell, 148(5), 873–85. doi:10.1016/j.cell.2012.02.028
Illumina. (2014a). An Introduction to Next-Generation Sequencing Technology. Retrieved October 07, 2014, from http://www.illumina.com/technology/next-generation-sequencing/sequencing-technology.ilmn
Illumina. (2014b). MiSeq Desktop Sequencer. Retrieved November 07, 2014, from http://www.illumina.com/systems/miseq.ilmn
Illumina. (2014c). Nextera DNA Sample Prep. Retrieved July 11, 2014, from http://www.illumina.com/products/nextera_dna_sample_prep_kit.ilmn
Junttila, M. R., & de Sauvage, F. J. (2013). Influence of tumour micro-environment heterogeneity on therapeutic response. Nature, 501(7467), 346–54. doi:10.1038/nature12626
Kreso, A., O’Brien, C. A., van Galen, P., Gan, O. I., Notta, F., Brown, A. M. K., … Dick, J. E. (2013). Variable clonal repopulation dynamics influence chemotherapy response in colorectal cancer. Science (New York, N.Y.), 339(6119), 543–8. doi:10.1126/science.1227670
Landau, D. A., Carter, S. L., Stojanov, P., McKenna, A., Stevenson, K., Lawrence, M. S., … Wu, C. J. (2013). Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell, 152(4), 714–26. doi:10.1016/j.cell.2013.01.019
Létourneau, I. J., Quinn, M. C. J., Wang, L.-L., Portelance, L., Caceres, K. Y., Cyr, L., … Mes-Masson, A.-M. (2012).
Derivation and characterization of matched cell lines from primary and recurrent serous ovarian cancer. BMC Cancer, 12, 379. doi:10.1186/1471-2407-12-379
Liu, L., Li, Y., Li, S., Hu, N., He, Y., Pong, R., … Law, M. (2012). Comparison of next-generation sequencing systems. Journal of Biomedicine & Biotechnology, 2012, 251364. doi:10.1155/2012/251364
Loeb, L. A. (2011). Human cancers express mutator phenotypes: origin, consequences and targeting. Nature Reviews. Cancer, 11(6), 450–7. doi:10.1038/nrc3063
Loman, N. J., Misra, R. V., Dallman, T. J., Constantinidou, C., Gharbia, S. E., Wain, J., & Pallen, M. J. (2012). Performance comparison of benchtop high-throughput sequencing platforms. Nature Biotechnology, 30(5), 434–9. doi:10.1038/nbt.2198
Malhotra, G. K., Zhao, X., Band, H., & Band, V. (2010). Histological, molecular and functional subtypes of breast cancers. Cancer Biology & Therapy, 10(10), 955–60. Retrieved from http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3047091&tool=pmcentrez&rendertype=abstract
Marusyk, A., Almendro, V., & Polyak, K. (2012). Intra-tumour heterogeneity: a looking glass for cancer? Nature Reviews. Cancer, 12(5), 323–34. doi:10.1038/nrc3261
Meira, L. B., Bugni, J. M., Green, S. L., Lee, C.-W., Pang, B., Borenshtein, D., … Samson, L. D. (2008). DNA damage induced by chronic inflammation contributes to colon carcinogenesis in mice. The Journal of Clinical Investigation, 118(7), 2516–25. doi:10.1172/JCI35073
Menzies, A. M., Haydu, L. E., Carlino, M. S., Azer, M. W. F., Carr, P. J. A., Kefford, R. F., & Long, G. V. (2014). Inter- and intra-patient heterogeneity of response and progression to targeted therapy in metastatic melanoma. PloS One, 9(1), e85004. doi:10.1371/journal.pone.0085004
Michor, F., & Polyak, K. (2010). The origins and implications of intratumor heterogeneity. Cancer Prevention Research (Philadelphia, Pa.), 3(11), 1361–4.
doi:10.1158/1940-6207.CAPR-10-0234
Navin, N., Kendall, J., Troge, J., Andrews, P., Rodgers, L., McIndoo, J., … Wigler, M. (2011). Tumour evolution inferred by single-cell sequencing. Nature, 472(7341), 90–4. doi:10.1038/nature09807
Navin, N., Krasnitz, A., Rodgers, L., Cook, K., Meth, J., Kendall, J., … Wigler, M. (2010). Inferring tumor progression from genomic heterogeneity. Genome Research, 20(1), 68–80. doi:10.1101/gr.099622.109
Nowell, P. C. (1976). The clonal evolution of tumor cell populations. Science (New York, N.Y.), 194(4260), 23–8. Retrieved June 07, 2014, from http://www.ncbi.nlm.nih.gov/pubmed/959840
Podlaha, O., Riester, M., De, S., & Michor, F. (2012). Evolution of the cancer genome. Trends in Genetics: TIG, 28(4), 155–63. doi:10.1016/j.tig.2012.01.003
Polyak, K. (2011). Review series introduction Heterogeneity in breast cancer, 121(10). doi:10.1172/JCI60534.3786
Polz, M. F., & Cavanaugh, C. M. (1998). Bias in template-to-product ratios in multitemplate PCR, 64(10).
Quail, M. A., Smith, M., Coupland, P., Otto, T. D., Harris, S. R., Connor, T. R., … Gu, Y. (2012). A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics, 13(1), 341. doi:10.1186/1471-2164-13-341
Roth, A., Khattra, J., Yap, D., Wan, A., Laks, E., Biele, J., … Shah, S. P. (2014). PyClone: statistical inference of clonal population structure in cancer. Nature Methods, 11(4). doi:10.1038/nmeth.2883
Sanger, F., & Nicklen, S. (1977). DNA sequencing with chain-terminating inhibitors. PNAS, 74(12), 5463–5467.
Shah, S. P., Morin, R. D., Khattra, J., Prentice, L., Pugh, T., Burleigh, A., … Aparicio, S. (2009). Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature, 461(7265), 809–13. doi:10.1038/nature08489
Shah, S. P., Roth, A., Goya, R., Oloumi, A., Ha, G., Zhao, Y., … Aparicio, S. (2012).
The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature, 486(7403), 395–9. doi:10.1038/nature10933
Shibata, M., & Shen, M. M. (2013). The roots of cancer: stem cells and the basis for tumor heterogeneity. BioEssays: News and Reviews in Molecular, Cellular and Developmental Biology, 35(3), 253–60. doi:10.1002/bies.201200101
Soon, W. W., Hariharan, M., & Snyder, M. P. (2013). High-throughput sequencing for biology and medicine. Molecular Systems Biology, 9(640), 640. doi:10.1038/msb.2012.61
Sørlie, T., Perou, C. M., Tibshirani, R., Aas, T., Geisler, S., Johnsen, H., … Børresen-Dale, A. L. (2001). Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proceedings of the National Academy of Sciences of the United States of America, 98(19), 10869–74. doi:10.1073/pnas.191367098
Sottoriva, A., Spiteri, I., Piccirillo, S. G. M., Touloumis, A., Collins, V. P., Marioni, J. C., … Tavaré, S. (2013). Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics. Proceedings of the National Academy of Sciences of the United States of America, 110(10), 4009–14. doi:10.1073/pnas.1219747110
Stratton, M. R., Campbell, P. J., & Futreal, P. A. (2009a). The cancer genome. Nature, 458(7239), 719–24. doi:10.1038/nature07943
Stratton, M. R., Campbell, P. J., & Futreal, P. A. (2009b). The cancer genome. Nature, 458(7239), 719–24. doi:10.1038/nature07943
Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B. C., Remm, M., & Rozen, S. G. (2012). Primer3--new capabilities and interfaces. Nucleic Acids Research, 40(15), e115. doi:10.1093/nar/gks596
Varmus, H., & Kumar, H. S. (2013). Addressing the Growing International Challenge of Cancer: A Multinational Perspective, 5(175).
Voduc, K. D., Cheang, M. C. U., Tyldesley, S., Gelmon, K., Nielsen, T. O., & Kennecke, H. (2010). Breast cancer subtypes and the risk of local and regional relapse.
Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology, 28(10), 1684–91. doi:10.1200/JCO.2009.24.9284
Xu, X., Hou, Y., Yin, X., Bao, L., Tang, A., Song, L., … Wang, J. (2012). Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor. Cell, 148(5), 886–95. doi:10.1016/j.cell.2012.02.025
Yachida, S., Jones, S., Bozic, I., Antal, T., Leary, R., Fu, B., … Iacobuzio-Donahue, C. A. (2010). Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature, 467(7319), 1114–7. doi:10.1038/nature09515
Yaswen, P., & Stampfer, M. R. (2002). Molecular changes accompanying senescence and immortalization of cultured human mammary epithelial cells. The International Journal of Biochemistry & Cell Biology, 34(11), 1382–1394. doi:10.1016/S1357-2725(02)00047-X
Yau, C. (2013). OncoSNP-SEQ: a statistical approach for the identification of somatic copy number alterations from next-generation sequencing of cancer genomes. Bioinformatics (Oxford, England), 29(19), 2482–4. doi:10.1093/bioinformatics/btt416

Appendices

Appendix A - Supplementary material for chapter 3

Table A.1 Coverage/read depths of samples for the multiplex HCT116.184-hTERT mixing experiment. The table contains the minimum and maximum read depths of the samples for each of HCT116, 184-hTERT and shared targets. The shaded rows refer to the samples used to create Figure 3.5.
Sample proportion     184-hTERT read depth  HCT116 read depth     Shared read depth
184-hTERT  HCT116     Minimum    Maximum    Minimum    Maximum    Minimum    Maximum
1          0          2          1855       6          1129       98         1477
0.9        0.1        6          5054       10         2748       174        2197
0.75       0.25       3          3076       12         1400       151        1633
0.5        0.5        4          1917       7          1060       107        1759
0.25       0.75       6          3205       6          1992       121        2065
0.1        0.9        10         2886       9          2012       113        2043
0          1          6          3001       7          1896       149        2192
0.99       0.01       7          2072       8          1259       141        1506
0.01       0.99       3          3088       10         2654       160        2290
0.999      0.001      3          4421       8          2816       222        2351
0.001      0.999      6          3664       9          2480       186        2351
1          0          1          3056       5          1090       105        1508
0          1          4          2627       15         2119       108        2140
0.5        0.5        10         3902       12         2786       178        2351
0.9        0.1        2          5028       6          1832       98         2073

Table A.2 Coverage/read depths of samples for the singleplex HCT116.184-hTERT mixing experiment. The table contains the minimum and maximum read depths of the samples for each of HCT116, 184-hTERT and shared targets. The shaded rows refer to the samples used to create Figure 3.6.
Sample proportion     184-hTERT read depth  HCT116 read depth     Shared read depth
184-hTERT  HCT116     Minimum    Maximum    Minimum    Maximum    Minimum    Maximum
1          0          176        16659      521        13682      428        12310
0          1          110        12886      385        13944      438        13682
0.9        0.1        344        48864      1783       34532      1          23626
0.75       0.25       113        16506      115        16656      37         7841
0.5        0.5        172        15760      559        15925      4          15996
0.25       0.75       238        10498      30         17363      811        18089
0.1        0.9        6          24189      1401       44925      141        22086
0.9        0.1        218        15413      903        51915      962        23280
0.75       0.25       372        19669      1223       29577      623        26343
0.5        0.5        297        17523      24         28407      1195       29223
0.25       0.75       410        19849      680        30011      1691       43236
0.1        0.9        342        24312      759        38247      363        24767
1          0          383        36383      1291       32413      1838       19311
0          1          259        18579      791        37492      6          34214
0.9        0.1        394        18703      966        32264      1489       31338
0.75       0.25       498        14997      1321       40531      1539       22600
0.5        0.5        557        33096      1204       48830      8          45930
0.25       0.75       566        29187      972        30017      1683       39857
0.1        0.9        336        33046      872        57736      1281       47109
0.9        0.1        348        27433      823        44859      936        51309
0.75       0.25       292        19177      1077       41132      1612       41179
0.5        0.5        346        31362      836        42699      1589       37701
0.25       0.75       214        20369      554        27850      958        26831
0.1        0.9        412        22706      63         31509      1204       27704

Table A.3 Coverage/read depths of samples for the singleplex DAH55/DAH56 mixing experiment. The table contains the minimum and maximum read depths of the samples for each of DAH55, DAH56 and shared targets. The shaded rows refer to the samples used to create Figures 3.12 and 3.13.
Sample proportion     DAH55 read depth      DAH56 read depth      Shared read depth
DAH55      DAH56      Minimum    Maximum    Minimum    Maximum    Minimum    Maximum
1          0          305        11670      1826       8369       640        13499
0          1          197        16406      2838       12943      1040       13533
0.9        0.1        189        26759      5088       23492      1689       23780
0.75       0.25       180        15407      3176       13945      1128       13645
0.5        0.5        215        18770      3130       15520      1123       16144
0.25       0.75       203        19323      3709       17716      1461       17370
0.1        0.9        199        20083      3543       17738      1554       17469
0.9        0.1        152        20172      3042       16899      1458       17132
0.75       0.25       239        28072      4383       22746      1968       23301
0.5        0.5        102        18369      2938       14102      1301       14352
0.25       0.75       88         16795      33         14448      5          14781
0.1        0.9        187        18456      4          15958      992        14378
1          0          219        15887      1329       12841      705        14619
0          1          243        19220      2873       12783      1208       15272
0.9        0.1        398        33352      5435       29491      14         29395
0.75       0.25       90         25185      4133       20211      1658       20077
0.5        0.5        154        25829      3723       19830      1442       19983
0.25       0.75       182        31177      4930       24992      1669       24260
0.1        0.9        338        15610      2511       11946      1007       19118
0.9        0.1        31         23255      2939       23289      1350       20792
0.75       0.25       158        22017      3          19349      1292       19777
0.5        0.5        57         13102      464        10985      7          11326
0.25       0.75       122        22224      3682       18784      1456       17194
0.1        0.9        99         16462      137        14774      1103       14288
