UBC Theses and Dissertations

Development and application of microscale technology for single-cell sequencing Zahn, Hans 2018

Development and Application of Microscale Technology for Single-Cell Sequencing

by

Hans Zahn

B.Eng., Hochschule München, Germany, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Genome Science and Technology)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

April 2018

© Hans Zahn, 2018

Abstract

Genomic heterogeneity is a central feature of many cancers and plays a critical role in disease initiation, progression, and response to treatment. As a result, development of robust, scalable and high-fidelity single-cell genomics has become critical to understanding the structure and dynamics of cellular heterogeneity in cancer. However, conventional methods based on library construction from pre-amplified samples introduce artefacts and coverage bias, and are also prohibitively expensive for large-scale studies of thousands of single cells. In this thesis I describe the development and application of technology that uses nanolitre volume transposase reactions to enable the streamlined preparation of single-cell next-generation sequencing libraries without the need for prior amplification. This workflow permits the economical and high-throughput analysis of large numbers of cells. I show the application of the direct library preparation (DLP) method to examine whole genomes from cell lines and murine xenografts derived from a primary triple-negative human breast cancer tumour. Low-coverage sequencing and analysis revealed that the lack of pre-amplification resulted in high uniform coverage amenable to copy number inference, with greater uniformity of coverage and more reliable detection of megabase-scale copy number alterations than existing methods. Pooling of individual cells was used to generate “bulk-equivalent” genomes with equal coverage breadth and uniformity to a standard bulk genome with the same sequencing depth.
Phylogenetic analysis of hundreds of single-cell copy number profiles revealed minor clonal sub-populations that were undetectable in bulk measurements. This thesis presents new technology for low-depth sequencing and pooling of single-cell libraries constructed without pre-amplification, and demonstrates that this strategy can provide an effective replacement for conventional bulk sequencing strategies, permitting detailed reconstruction of copy number clonal lineages as well as standard inference of other variants, without significantly increasing the total sequencing effort.

Lay Summary

On average, over 500 Canadians will be diagnosed with cancer every day and 75,000 will die from this disease each year. Research efforts are underway to develop effective new treatment strategies, but a significant challenge is that tumours are comprised of a mixture of cells with distinct genetic alterations. While many cancerous cells might be sensitive to drug treatment and die during therapy, small sub-populations of tumour cells can harbour genetic alterations that make them resistant to treatment and over time lead to relapse. In this thesis, we have developed technologies that provide the sensitivity to disentangle the mixture of cells and help to identify resistant cell populations. We combined sophisticated micromachining, molecular biology, and statistical procedures to miniaturize reactions and perform genome analysis across thousands of single cells. This work improves our ability to study cancer evolution to guide treatment choices, minimize unnecessary procedures, and improve patient outcomes.

Preface

Research presented in this thesis is part of an ongoing collaborative effort between the Hansen Lab, Aparicio Lab, Shah Lab, and the BC Genome Sciences Centre to develop and apply microfluidic technology for single-cell analysis in cancer. The work has resulted in co-authored publications (published and in preparation) and patent applications.
Chapter 1 includes sections of an early version of: H. Zahn, A. Steif, E. Laks, P. Eirew, M. VanInsberghe, S. P. Shah, S. Aparicio, and C. L. Hansen, “Scalable whole-genome single-cell library preparation without preamplification,” Nat. Methods, vol. 14, no. 2, pp. 167–173, Jan. 2017. H.Z., A.S., C.L.H., S.A., and S.P.S. wrote the paper. H.Z. wrote Chapter 1.

Figure 2.2 was adapted from White A.K., VanInsberghe M., Petriv O.I., Hamidi M., Sikorski D., Marra M.A., Piret J., Aparicio S., Hansen C.L. (2011) High-throughput microfluidic single-cell RT-qPCR. Proceedings of the National Academy of Sciences. 108: 13999-14004. Permission for use of the figure was granted by Kay McLaughlin from PNAS Permissions on December 21st, 2017.

Chapter 2 includes research that led to the following patent application: Carl Lars Genghis Hansen, Hans Zahn, Jens Huft, Marinus Theodorus Johannes Van Loenhout, Kaston Leung, Bill Kengli Lin, Anders Klaus, Samuel Alves Jana Rodrigues Aparicio, Sohrab Prakash Shah, “Methods and devices for analyzing particles,” PCT/CA2016/000031, Feb. 04, 2016. H.Z. designed and fabricated the described microfluidic device and conducted related experiments. K.L., B.L. and A.K. designed and fabricated the virtual well platform and performed WGA measurements. M.L. designed and built the cell picking system. S.A. and S.S. advised the research. H.Z., K.L. and C.L.H. wrote the invention disclosure. H.Z. designed and conducted the research and wrote Chapter 2.

Chapter 3 is a peer-reviewed and accepted version of the published research article: H. Zahn, A. Steif, E. Laks, P. Eirew, M. VanInsberghe, S. P. Shah, S. Aparicio, and C. L. Hansen, “Scalable whole-genome single-cell library preparation without preamplification,” Nat. Methods, vol. 14, no. 2, pp. 167–173, Jan. 2017. H.Z., A.S., S.P.S., S.A., and C.L.H. designed the research. H.Z. performed experiments. A.S. analyzed the data. H.Z., A.S., C.L.H., S.A., and S.P.S. wrote the paper. E.L.
prepared tissue samples and bulk libraries. P.E. performed xenograft transplants. M.V. contributed to technology development. C.L.H., S.A., and S.P.S. supervised the research. “The anonymized human tumour tissue for xenografting was collected with informed patient consent according to procedures approved by the Ethics Committee at the University of British Columbia, under protocols H06-00289 BCCA-TTR-BREAST and H11-01887 Neoadjuvant Xenograft Study” (see ref (1): Supplementary Notes, section 2.2).

Chapter 4 is a version of a research article in preparation: H. Zahn, E. Laks, A. Steif, D. Lai, M. VanInsberghe, R. Coope, S. P. Shah, C. L. Hansen and S. Aparicio, “Highly-scalable direct library preparation (HT-DLP) of single-cell genomes in open nanolitre wells,” Paper in preparation, 2018. H.Z. developed the single-cell platform. H.Z., E.L., A.S., S.P.S., C.L.H., and S.A. designed experiments. D.L., H.Z., E.L., and A.S. analysed the data. H.Z. and E.L. wrote the draft. M.V. and R.C. contributed to technology development. S.A., S.P.S. and C.L.H. supervised the research.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Abbreviations
List of Tables
List of Figures
Acknowledgements
Chapter 1: Introduction
  1.1 Why measure genomic alterations at single-cell resolution in cancer research?
  1.2 Analysis at single-cell resolution
  1.3 Microfluidic technology for single-cell analysis
    1.3.1 Benefits of miniaturizing and integration
    1.3.2 Microfluidic technologies
      1.3.2.1 Closed microfluidic circuits made using soft lithography
      1.3.2.2 Open microwell structures
  1.4 Library preparation for next-generation sequencing applications
  1.5 Summary
  1.6 Research contribution
    1.6.1 Deposition of primers and template into multilayer microfluidic devices
    1.6.2 Inflatable reaction chamber
    1.6.3 Direct library preparation (DLP) without pre-amplification
    1.6.4 Highly-scalable direct library preparation (DLP+) of single-cell genomes in open nanolitre well arrays
Chapter 2: Microfluidic designs and device fabrication for single-cell measurements
  2.1 Overview
  2.2 Introduction
  2.3 Deposition of primers and template into microfluidic devices
    2.3.1 Method
      2.3.1.1 Chip fabrication methods
      2.3.1.2 Spotting into microfluidic chamber arrays
    2.3.2 Results
      2.3.2.1 Previous issues encountered during primer incorporation
      2.3.2.2 Chip designs and operation
      2.3.2.3 qPCR results: device sensitivity and performance
  2.4 Inflatable reaction chamber
    2.4.1 Method
      2.4.1.1 Device fabrication
      2.4.1.2 Bead purification
    2.4.2 Design and results
      2.4.2.1 Architecture of the inflatable microfluidic chamber
      2.4.2.2 Microfluidic chip for single-cell library preparation
      2.4.2.3 On-chip size selection using magnetic beads
  2.5 Discussion
Chapter 3: Direct library preparation without pre-amplification
  3.1 Overview
  3.2 Introduction
  3.3 Methods
    3.3.1 Device fabrication
    3.3.2 Sample preparation
      3.3.2.1 Cell culture
      3.3.2.2 Xenograft passaging and tissue dissociation
    3.3.3 Library preparation protocol
      3.3.3.1 Preparation of cells and device for loading
      3.3.3.2 On-chip direct library preparation
      3.3.3.3 Bulk library preparation
      3.3.3.4 Whole-genome sequencing
    3.3.4 Data analysis
      3.3.4.1 Data alignment and sequencing metrics
      3.3.4.2 Single-cell copy number inference
      3.3.4.3 Clonal and bulk-equivalent genome analysis
  3.4 Results
    3.4.1 Microfluidic device for direct single-cell library construction
    3.4.2 Uniformity of coverage in diploid single-cell and tumour-cell sequencing metrics
    3.4.3 Copy number heterogeneity and clonal evolution in serial breast cancer xenograft passages
    3.4.4 Pooling of single-cell genomes yields high-depth, low-bias clonal and bulk-equivalent genomes
  3.5 Discussion
  3.6 Tables
Chapter 4: Highly-scalable direct library preparation (DLP+) of single-cell genomes in open nanolitre well arrays
  4.1 Overview
  4.2 Introduction
  4.3 Methods
    4.3.1 Sample preparation
      4.3.1.1 Cell culture
      4.3.1.2 Cell staining and dilution
    4.3.2 Robot operation
      4.3.2.1 Primer spotting and wash routine
      4.3.2.2 Cell spotting
    4.3.3 Chip imaging and cell calling
    4.3.4 Library preparation protocols
    4.3.5 Data analysis and copy number inference
  4.4 Results
    4.4.1 Scalable single-cell library preparation in open nanolitre wells
    4.4.2 Automated classification of copy-number quality
    4.4.3 Optimization of key reaction parameters
    4.4.4 Evaluation of library fidelity
      4.4.4.1 Bootstrap analysis of GM18507 cells with 2h and overnight lysis
      4.4.4.2 Sequencing metrics of GM18507 cell line libraries sequenced on 12 HiSeq lanes
      4.4.4.3 PCR bootstrap analysis and GC bias
    4.4.5 Copy number analysis of a large near-diploid cell line dataset
    4.4.6 Analysis of single-cell genomes from the 184-hTERT TP53 null isogenic cell-line pair
  4.5 Discussion
  4.6 Tables
Chapter 5: Conclusion
  5.1 Contribution to knowledge
  5.2 Future recommendations and concluding remarks
References
Appendices
  Appendix A: Detailed methods: primer integration
    A.1 Device fabrication using the push-down MSL workflow
    A.2 Device fabrication and operation of the double-sided imprinted chip
    A.3 Spotting robot operation
    A.4 PCR experiments
    A.5 Image analysis
  Appendix B: Detailed methods: inflatable chamber
    B.1 Device fabrication
  Appendix C: Extra figures Chapter 3
  Appendix D: Extra figures Chapter 4
  Appendix E: Sequencing data processing
    E.1 Data alignment and coverage uniformity analysis
    E.2 Single-cell copy number inference
    E.3 Phylogenetic inference
    E.4 Clonal genome analysis
    E.5 Bulk-equivalent and bulk genome analysis
    E.6 GC-content estimation
236  xiii  List of Abbreviations ATAC  Assay  for transposase accessible chromatin CAD   Computer-aided design Chr  Chromosome CN  Copy-number CNA  Copy number alteration CT  Cycle threshold DLP  Direct library preparation DNA  Deoxyribonucleic acid DOP-PCR Degenerative-oligonucleotide-PCR  EDTA  Ethylenediaminetetraacetic acid FACS  Fluorescence-activated cell sorting  G2  Buffer G2 is a lysis buffer from QIAGEN gDNA  Genomic DNA HMM  Hidden Markov model KW test Kruskal-Wallis test  LOH  Loss of heterozygosity MAD  Median absolute deviation MALBAC Multiple annealing and looping-based amplification MDA  Multiple displacement amplification MF-DLP DLP performed on a microfluidic device xiv  MSL  Multilayer soft lithography NCC  No-cell control NGS  Next-generation sequencing NTC  No-template control PBS  Phosphate buffered saline PCR  Polymerase chain reaction PDMS  Polydimethylsiloxane qPCR  Quantative PCR SDS  Sodium dodecyl sulfate SNV  Single nucleotide variants Tn5  Transposase of the Tnp family WGA  Whole-genome amplification WGS  Whole-genome sequencing WT  Wildtype xv  List of Tables Table 3.1 Number of constructed indexed libraries, flagged no-template controls, flagged single cells, and single cells retained for downstream analysis ..................................................... 102 Table 3.2 Sequencing metrics for indexed libraries from immortalized normal breast epithelial cell line 184-hTERT-L2 and immortalized normal lymphoblastoid cell line GM18507 ............................................................................................................................................................. 103 Table 3.3 Sequencing metrics for indexed libraries from patient-derived breast cancer xenograft passages SA501X3F and SA501X4F ..................................................................................... 
104 Table 3.4 Statistics table showing results of the Kruskal-Wallis tests and Pearson’s correlation .......................................................................................................................................................... 105 Table 3.5 Sequencing metrics for bulk-equivalent and bulk libraries ........................................ 105 Table 3.6 Cost breakdown for device fabrication and library preparation, per chip and per indexed library .................................................................................................................................................. 106 Table 4.1 Summary of experimental conditions .................................................................................. 115 Table 4.2 Statistics table with KW tests for 64 and 128 merged single cells............................ 149 Table 4.3 Sequencing metrics for indexed libraries from immortalized normal lymphoblastoid cell line GM18507 ........................................................................................................... 150 Table 4.4 Number of constructed indexed 184-hTERT libraries .................................................. 151 Table 4.5 Total number of DLP+ libraries .............................................................................................. 152  xvi  List of Figures Figure 1.1 Valve architecture ........................................................................................................................ 14 Figure 1.2 Tagmentation workflow ............................................................................................................ 25 Figure 2.1 Chip interface connection.......................................................................................................... 32 Figure 2.2 Reaction assembly in common microfluidic designs ..................................................... 
34 Figure 2.3 Design of the push-down microfluidic device ................................................................... 38 Figure 2.4 Schematic of the double-sided imprinting workflow ..................................................... 40 Figure 2.5 Microphotograph of integrated primers ............................................................................. 41 Figure 2.6 Design of the double-sided imprinted microfluidic device .......................................... 43 Figure 2.7 Performance of incorporated probe substrates ............................................................... 49 Figure 2.8 Layer arrangements .................................................................................................................... 53 Figure 2.9 Device fabrication workflow .................................................................................................... 54 Figure 2.10 Microfluidic device design ...................................................................................................... 57 Figure 2.11 On-chip size selection ............................................................................................................... 60 Figure 3.1 DLP workflow ................................................................................................................................ 65 Figure 3.2 Comparison of amplification methods ................................................................................. 67 Figure 3.3 Indexing primers and trapped cells in the microfluidic device .................................. 77 Figure 3.4 Single-cell copy number profiles from 184-hTERT-L2, an immortalized normal breast epithelial cell line ................................................................................................................................. 
Figure 3.5 Single-cell copy number profiles from GM18507, an immortalized normal lymphoblastoid cell line
Figure 3.6 Coverage uniformity of immortalized normal human cell lines and tumour cell sequencing metrics
Figure 3.7 Single-cell copy number profiles from SA501X3F, a third-passage xenograft derived from a primary triple-negative breast cancer tumour
Figure 3.8 Heatmap from fourth passage xenograft SA501X4F
Figure 3.9 Analysis of pooled clonal genomes
Figure 3.10 Analysis of LOH, SNVs, and breakpoints on the pooled bulk-equivalent genome and a standard bulk genome for sample SA501X3F
Figure 4.1 Flow chart showing data-analysis workflow
Figure 4.2 DLP+ workflow
Figure 4.3 Effect of lysis buffer and storage time on library quality
Figure 4.4 Key parameter optimization for DLP+
Figure 4.5 Effect of splitting experimental condition by cell state on library quality
Figure 4.6 Effect of lysis time on coverage breadth of merged single-cell genomes
Figure 4.7 Sequencing metrics for DLP GM18507 libraries
Figure 4.8 Effect of Tn5 concentration and PCR on coverage breadth of merged single-cell genomes
Figure 4.9 Single-cell copy number profiles from GM18507, an immortalized normal lymphoblastoid cell line
Figure 4.10 184-hTERT wild-type passage 21 single-cell heatmap
Figure 4.11 184-hTERT wild-type passage 51 single-cell heatmap
Figure 4.12 184-hTERT TP53 null single-cell heatmap
Figure 4.13 Example CN profiles from low-coverage 184-hTERT libraries
Figure A.1 Field target reference point setup for imprinted microfluidic device
Figure A.2 Target setup for push-down geometry device
Figure A.3 Target setup for double-sided imprinted microfluidic device
Figure A.4 Run tasks for devices described in Chapter 2
Figure C.1 Schematic of the microfluidic device operation
Figure C.2 Total reads by cell call
Figure C.3 Lorenz curves
Figure C.4 Examples copy number profiles from third passage xenograft SA501X3F
Figure C.5 Example integer copy number profiles for single cells from fourth passage xenograft SA501X4F
Figure C.6 Analysis of contaminating mouse reads in libraries prepared from SA501X3F and SA501X4F xenograft samples
Figure C.7 Comparison of copy number segment calls in the fifteen xenograft cells with the highest number of total reads relative to bulk, as a function of the single-cell bin size
Figure D.1 Spotter setup and single-cell isolation
Figure D.2 Cross-contamination during primer spotting
Figure D.3 Feature ranking
Figure D.4 Significance correlation matrix for key parameters
Figure D.5 Live/dead significance matrix
Figure D.6 Insert size distribution of DLP+ libraries
Figure D.7 Representative single-cell CN-profiles from GM18507 cells
Figure D.8 Representative single-cell CN-profiles from GM18507 cells showing clonal events
Figure D.9 Representative single-cell CN-profiles from GM18507 cells with Chr 16q alterations
Figure D.10 Total mapped reads split by cell call

Acknowledgements

I want to start my acknowledgements by saying thank you to my supervisor and mentor Dr. Carl Hansen. Carl first took me on as a co-op student with no experience in the life sciences and again as a PhD student in the Genome Science and Technology program, and gave me the opportunity to prove myself in a new field. Carl, thank you so much for believing in me, for your guidance and continuous support, and for the unique opportunity to be part of the world-leading collaboration you built. I am and will forever be grateful for the opportunity you gave me. It was a privilege to be part of your lab!

I also wish to thank Dr. Sam Aparicio and Sohrab Shah, who were driving forces in applying microfluidic technology to cancer research. In particular, Dr. Sam Aparicio enthusiastically encouraged and supported the idea of massively scaling single-cell measurements, and without his vision and patience, the DLP+ platform wouldn’t have become a reality. Thank you very much for the opportunity to work in your lab, Sam. It was a great honour.

Furthermore, I want to thank my supervisory committee members Dr. Marco Marra and Dr. Andre Marziali for their valuable input and support of my PhD project. My research rotation at the GSC under Dr. Marra’s supervision not only provided me with essential insights into the development of production-quality protocols but also allowed me to build an invaluable network of specialists with a vast knowledge of sequencing technology.

Special thanks also go to Mike VanInsberghe and Adi Steif.
I want to acknowledge both of you in particular, since without your contributions this work wouldn’t have been possible. Mike, thanks for teaching me everything from pipetting to microfabrication to the optimization of protocols. Your critical feedback always pushed me to go the extra mile. Adi, I really enjoyed working on the DLP project with you. We had to work through so many failures and setbacks but ultimately succeeded. I couldn’t have asked for a better partner on this project, and I’m honoured and excited to share the authorship of the DLP paper with you. My time here would not have been the same without the two of you.

I also wish to sincerely say thank you to all my colleagues in the Hansen lab. I was part of an extraordinary interdisciplinary team that not only helped me launch my life-sciences career but also made me feel truly welcome here. In particular, I would like to thank our amazing lab manager Carmen de Hoog, my friends Georgia and Keith Mewis, Adam White, Darek Sikorski, Kaston Leung, Kevin Heyries, Jens Huft, Marijn van Loenhout and Tim Leaver for all their support.

The implementation of the open array platform was an enormous team effort, both in the wet lab and on the bioinformatics front, and so many people were involved in getting to where we are today. Thanks to all the wonderful lab members and collaborators at the BCCRC for integrating me into your teams: Emma Laks, the treble J’s (Jazmine Brimhall, Justina Biele, Jecy Wang), Adrian Wan, Damian Yap and Peter Eirew, and the bioinformatics teams around Daniel Lai, Andrew McPherson, and Cydney Nielsen. The enormous push to implement the DLP+ workflow at the current scale wouldn’t have been possible without all of you!

In addition, I would like to thank Dr. Robin Coope for his continuous support and advice. The idea of implementing the DLP protocol on the open-array platform arose during a visit to his office.
Robin did not hesitate a second to provide me with one of the nanowell chips, which ultimately led to the development of DLP+ described in the last research chapter.

I am also very grateful to the funding agencies NSERC, Genome BC, and CIHR for supporting my research and providing the Vanier scholarship.

In addition to those mentioned above, I’m genuinely thankful to my cousin Britta Berg, her husband Tobias and their kids Emilia, Manresa, Jonathan, and Kilian. You not only helped me come to Vancouver and find the position in the Hansen Lab, but you also made me feel at home away from home from the first day.

Special thanks also go to all my new friends in Vancouver. In particular, I would like to thank my ski buddies Wylie Spencer, Will Matthews, and Hanna Pierse; my sailing team Fraser Hall, Kyle Duignan, David Volk, John-Mark Ferguson, and Bryan Stern; my roommates Stuart Rowe and Mitch Walker; my dear friends Alisha Neuner and Clayton Van Megen; and the many more friends that I can’t list here. You made Vancouver so special, enjoyable, and memorable for me, and it is safe to say that because of you I had the best time of my life in Canada. Thank you for everything!

Finally, I thank my mum and dad for their support throughout my years of education. Above all, I want to thank my mum from the bottom of my heart. Your encouragement and belief in me got me to this point in my career. I wouldn’t be here without you!

Chapter 1: Introduction

In this thesis I describe the development and application of a miniaturized high-throughput protocol to analyze the genomes of single cells. In the following sections I will: 1) motivate my work by describing the importance of single-cell analysis in cancer research; 2) provide the rationale for miniaturizing the protocol and review microfluidic technology to manipulate reagents at the desired nanolitre volume scale; and 3) outline methods to process DNA samples for sequencing readouts.
1.1 Why measure genomic alterations at single-cell resolution in cancer research?

Cancers arise as a result of the accumulation of sufficiently advantageous mutations in the genome of a single cell (2). During cancer evolution, cancer cells then continuously acquire additional heritable genomic aberrations and may pass these variants on to their descendants (2–4). Selective pressures such as drug treatment, the tumour microenvironment, and the immune response act on the diverse group of cancer cells and may differentially affect their survival (5,6). Analogous to Darwinian evolution, this process ultimately selects for cells with a proliferative and survival advantage, thereby forming clonal populations, each composed of a family of cells that derive from a common ancestor and can be identified by shared genomic features (7,8).

In this thesis a clone is operationally defined as a sub-group of cells that share one or more genomic variants that distinguish them from the rest of the cells in a mixed population. The number of cells within a clone can vary over time (clonal dynamics) and new clonal genotypes can emerge (clonal evolution). It is important to note that genomic mutations can function as lineage markers, regardless of their phenotypic impact, to enable reconstruction of a clonal lineage (the relationship between clones over time). Studying the clonal composition and the response to selective pressure is an essential step in understanding resistance to cancer therapy and relapse.

While early genomic studies were limited to the observation of large-scale mutational events, such as the identification of chromosomal aberrations using fluorescence in situ hybridization (FISH, (9)), comparative genomic hybridization (CGH, (10)) and sequencing technology (11) progressively improved the resolution and throughput.
In particular, sequencing technology now enables researchers to acquire a detailed picture of genomic aberrations across the entire genome (for more details on sequencing see Chapter 1.4, (12)). Previous studies have used whole-genome sequencing (WGS) to capture a complete snapshot of all genomic aberrations across bulk samples from various cancer types (13–16). These studies were done using conventional library preparation workflows that require tens to hundreds of nanograms of input material. Therefore, preparing DNA for sequencing applications begins from mixtures of many cells, each containing roughly 6 pg of DNA. Consequently, the vast majority of our molecular and cell biological understanding of cancer initiation and progression is derived from studying signal averages over large cell populations which are inherently heterogeneous in their genome features (17–24). This averaging obscures the underlying biology and poses a major obstacle to studying the evolution of cancer.

1.2 Analysis at single-cell resolution

Tumour heterogeneity describes the phenomenon that individual tumour cells are not all the same. For example, they may differ in their complement of genomic mutations, or in their transcriptional states. While the differences between individual tumour cells are widely acknowledged, our understanding of the consequences of intra-tumour heterogeneity (within a patient) is currently limited (25). In large part this is due to a paucity of data on the cellular heterogeneity that underlies tumour clonality (26). Disentangling the clonal mixture of a tissue sample and identifying minor, yet important, cell populations is a helpful approach to decipher the behaviour of populations of cells during tumour evolution and treatment. Recent studies approached the problem of identifying sub-populations from two different directions: 1) computational methods were applied to bulk signals and 2) measurements were performed on single cells directly.
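As a rough illustration of the input requirement described in Section 1.1, the following Python sketch estimates how many cells must be combined to reach a given DNA input, using the figure of roughly 6 pg of DNA per cell quoted in the text. The function name `cells_required` and the 10 ng and 100 ng example inputs are illustrative assumptions standing in for "tens to hundreds of nanograms".

```python
import math

# Approximate DNA mass of one cell in picograms (value quoted in the text).
PG_PER_CELL = 6.0


def cells_required(input_ng: float, pg_per_cell: float = PG_PER_CELL) -> int:
    """Smallest number of cells whose combined DNA mass reaches input_ng."""
    input_pg = input_ng * 1000.0  # 1 ng = 1000 pg
    return math.ceil(input_pg / pg_per_cell)


if __name__ == "__main__":
    # Hypothetical inputs spanning "tens to hundreds of nanograms".
    for input_ng in (10.0, 100.0):
        print(f"{input_ng:g} ng input needs about {cells_required(input_ng)} cells")
```

Under these assumptions, even the low end of the input range corresponds to well over a thousand cells, which is why a conventional bulk library reports a population average rather than the genome of any individual cell.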
Clonal inference from bulk measurements

Recently, a variety of studies have used computational methods to infer clonal architectures from bulk measurements (27–29). These approaches are generally based on analysis of data obtained from standard bulk library preparation protocols and deep sequencing to determine variant allele prevalences (VAP) (8). From these data, computational methods attempt to disentangle the effects of contaminating normal cells, copy number alterations (CNA), and loss of heterozygosity (LOH) to predict clusters of co-occurring mutations (30). While such methods have proven successful in characterizing the clonal structure of major sub-clones in tumours, their capacity to resolve minor clonal populations is severely limited by sampling and the sequencing error rate. In addition, such approaches are of limited use when tumour cellularity (the fraction of cells in a sample that is from the tumour) is low. These approaches also perform poorly in the detection of copy number alterations at low prevalence; these variants are much more difficult to detect in bulk genomes than low-prevalence single nucleotide variants (SNVs) (24,31). Finally, mutational events that do not lead to a clonal expansion are impossible to detect in bulk measurements. Examples of such events include processes that usually lead to negative selection, such as programmed cell death (checkpoint response) due to mitotic chromosome segregation errors (32), or a subclonal mutation in a DNA repair gene resulting in random mutational patterns within this clone that are undetectable once averaged in bulk measurements.

Measurements of single-cell signals directly

Performing measurements on single cells directly, instead of inferring the signal from a bulk mixture, offers a more direct and sensitive approach to study the heterogeneity of tumour cell genomes (33–39).
Single-cell measurements have become possible due to technology advancements, including the development of flow sorting (FACS, (40,41)) to isolate single cells, improved molecular biology to amplify the DNA of a cell, and reduced input requirements for library preparation protocols and sequencing platforms (42–48). A number of recent studies examined the architecture of heterogeneous samples by applying these advancements to sequence and study mutational signatures in single cells (24,39,46,49–54). Generally, these approaches require whole-genome amplification (WGA) to generate enough material to sequence (50). The three most commonly used single-cell WGA methods are: 1) multiple displacement amplification (MDA), 2) degenerate oligonucleotide-primed PCR (DOP-PCR), and 3) multiple annealing and looping-based amplification cycles (MALBAC).

MDA is based on random priming of denatured DNA followed by isothermal extension using a DNA polymerase with strong strand-displacement activity, typically phi29 (55). When the newly extended strand reaches the next starting site, the polymerase displaces the fragment synthesized there. This strand displacement generates new single-stranded DNA for primer annealing and extension, resulting in a highly branched, high molecular weight amplification product.

An alternative approach, DOP-PCR, relies on a set of primers with defined sequences at the 5’-end and 3’-end and degenerate nucleotides in between (56–58). In the first stage of a two-stage amplification, short copies of the template DNA, spread across the genome, are generated. In the second stage, the PCR annealing temperature is increased to selectively amplify the short, tagged copies of the template.

To increase the probability of random priming across the entire genome, MALBAC first performs multiple rounds of pre-amplification using an MDA variation (50).
In contrast to MDA, the random primers contain complementary ends that enable fully extended amplicons to form a loop. This looping mechanism excludes copies from further pre-amplification, so that only the original genomic DNA is pre-amplified. The pre-amplification product is then subject to whole-genome PCR amplification.

A common goal of most WGA protocols has been the recovery of single-cell genomes with maximum fidelity and coverage. To do this, WGA generates many copies of the original template prior to library preparation for sequencing. This pre-amplification may increase library complexity, permitting the sequencing of cells to a higher coverage depth (average number of mapped non-duplicate reads per genomic position) and attaining a higher coverage breadth (fraction of the genome covered by at least one sequencing read) than would otherwise be possible (31,46). However, local biases in amplification reduce the uniformity of coverage, thus obscuring the detection of copy number changes. Moreover, polymerase errors propagated over many amplification cycles lead to high rates of false-positive SNV calls (35,50,59,60).

Previous comparisons of WGA methods have detailed the strengths and weaknesses associated with each approach. In general, protocols based on DOP-PCR achieve the highest coverage uniformity, with a lower dispersion in binned read counts relative to MDA and MALBAC, making these methods the most amenable to single-cell copy number inference (33,61). However, the coverage breadth of DOP-PCR libraries tends to saturate with deeper sequencing, making it less suitable for the analysis of single-cell SNVs (46). In contrast, MALBAC and MDA libraries may be sequenced to a high depth in order to achieve the high coverage breadth necessary for SNV analysis (46,50).
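The two coverage metrics defined above can be made concrete with a minimal Python sketch. It assumes a hypothetical list of per-position read counts from which duplicate reads have already been removed; the function names and the toy ten-position "genomes" are illustrative and do not come from any particular analysis pipeline.

```python
from typing import Sequence


def coverage_depth(per_base: Sequence[int]) -> float:
    """Average number of (non-duplicate) reads per genomic position."""
    if not per_base:
        raise ValueError("per_base must contain at least one position")
    return sum(per_base) / len(per_base)


def coverage_breadth(per_base: Sequence[int], min_reads: int = 1) -> float:
    """Fraction of positions covered by at least min_reads reads."""
    if not per_base:
        raise ValueError("per_base must contain at least one position")
    covered = sum(1 for depth in per_base if depth >= min_reads)
    return covered / len(per_base)


if __name__ == "__main__":
    # Two toy genomes with the same total number of reads (8) over 10 positions.
    spread_out = [0, 0, 3, 1, 0, 2, 0, 0, 2, 0]
    piled_up = [8, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    for name, counts in (("spread_out", spread_out), ("piled_up", piled_up)):
        print(name, coverage_depth(counts), coverage_breadth(counts))
```

The two toy inputs have the same depth but different breadth, which illustrates why non-uniform amplification can leave large parts of a genome unobserved even when many reads are sequenced.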
However, MALBAC libraries suffer from a high rate of polymerase base substitutions (50), while MDA libraries generally exhibit substantially higher dispersion in binned read counts and are therefore more susceptible to copy number artefacts (46).

Finally, it should be emphasized that, independent of the library preparation method chosen, the throughput and cost of WGA methods are often limiting. Further, even if libraries are in hand, the cost of sequencing a biologically meaningful number of cells (ideally in the thousands) to a high coverage depth and breadth remains prohibitively expensive for the vast majority of labs (54). Accordingly, past studies examined only ten to several hundred single-cell genomes (39,46,50,54,62), with some reducing sequencing costs through exome capture or the use of targeted sequencing. If the experimental design requires the analysis of only a few cells (for example, cells extracted for pre-implantation diagnostic testing), and a range of variant types are of interest, WGA methods remain the preferred approach. However, in the foreseeable future, high-depth single-cell sequencing using WGA methods is unlikely to provide an unbiased representation of a large heterogeneous population.

Recently, several library preparation strategies (see Chapter 1.4 for definition of library) that eliminate the need for pre-amplification (WGA) have been described (1,46,63–67). These methods still rely on amplification to generate enough input material for sequencing. However, the final library is amplified rather than the template prior to library preparation. This is an important difference, since duplicates resulting from amplifying the final library are identical copies and can be flagged and filtered computationally. In contrast, pre-amplification generates many copies of the template, which is subsequently fragmented into short inserts for sequencing.
These fragments do not have overlapping start and end coordinates and therefore cannot be identified easily, even though they are all copies of the same template. For the purpose of this thesis, a library construction method will be referred to as amplification-free even if the library is subject to PCR amplification after construction.

A recent approach, dubbed Linear Amplification via Transposon Insertion (LIANTI), aimed to improve amplification fidelity of the final library by inserting a T7 promoter during DNA fragmentation that was used for subsequent linear amplification (68). In theory, linear amplification can overcome some of the limitations of exponential amplification, as the high local variability due to different amplification efficiencies and the propagation of early polymerase errors over many rounds of amplification are minimized (68). However, both a detailed comparison of LIANTI to existing WGA methods and a demonstration of the scalability of this approach have yet to be shown.

Similar to previous single-cell studies, LIANTI aimed to recover the complete genomes of single cells. Other studies followed a new theme. Instead of analyzing a few cells at a high coverage breadth and depth, studies with more cells but at low coverage depth have been reported (1,64–66). Low-depth sequencing is suitable to cluster subsets of cells based on large-scale genomic alterations, such as CNAs, and to perform additional analyses within the clusters of merged single-cell data.

Bos et al. introduced a ligation-based single-cell WGS protocol in multiwell plates without pre-amplification (64). The protocol is very similar to Strand-seq (63,69), a library preparation technique for selectively sequencing the parental template strands of a daughter cell. The main limitation of these methods is the need to purify the reaction product after each enzymatic step to overcome buffer inhibition and to eliminate side products.
Since purification also leads to the loss of template DNA, the library complexity is low, which results in low genomic coverage per cell (63,64,69).

Alternatively, the Shendure group has reported combinatorial indexing workflows that achieve a significantly higher throughput than those that directly index samples in individual wells (70,71). In combinatorial indexing, populations of intact nuclei are initially distributed into an array and each population receives a first unique molecular barcode. The populations are then pooled and indexed nuclei are redistributed into a new array of wells where a second barcode is introduced to complete the library preparation. By limiting how many cells are redistributed into the second set of wells, most nuclei receive a unique combination of molecular barcodes (72). After sequencing, these barcodes can be split bioinformatically and reads assigned to a single cell. This is an elegant workflow that doesn’t require specialized equipment and achieves a very high throughput.

Two recent studies applied this strategy to analyze more than 10,000 single cells (65,66). However, since reaction tubes contain multiple cells at a time, nuclei needed to be kept intact for successive labelling and pooling steps in order to distinguish reads derived from different cells. This requirement makes it challenging to remove DNA-bound proteins effectively and can result in uncertain copy number calls (73). Furthermore, during the redistribution and second labelling step, nuclei with the same initial molecular label can be pooled together again, making it impossible to distinguish sequencing reads derived from different nuclei (65,66). Harsher lysis conditions and protease digestions can better strip DNA of all binding proteins; however, keeping thousands of cells separate in individual reaction tubes followed by traditional library preparation in microlitre reactions is prohibitively expensive and labour intensive.
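The barcode-collision issue noted above for combinatorial indexing can be estimated with a simple probabilistic sketch. It assumes, as a simplification not taken from the cited studies, that every first-round barcode is equally likely and that nuclei are redistributed at random; the function name and the example values (96 or 384 first-round barcodes, 25 nuclei per second-round well) are hypothetical.

```python
def collision_fraction(n_first_barcodes: int, nuclei_per_well: int) -> float:
    """Expected fraction of nuclei that share their first-round barcode with
    at least one other nucleus in the same second-round well.

    Simplifying assumption: first-round barcodes are uniformly distributed
    and independent between nuclei.
    """
    if n_first_barcodes < 1 or nuclei_per_well < 1:
        raise ValueError("both arguments must be positive integers")
    # Probability that none of the other nuclei carries the same barcode.
    p_unique = (1.0 - 1.0 / n_first_barcodes) ** (nuclei_per_well - 1)
    return 1.0 - p_unique


if __name__ == "__main__":
    # Hypothetical plate formats and loading density.
    for barcodes in (96, 384):
        frac = collision_fraction(barcodes, nuclei_per_well=25)
        print(f"{barcodes} barcodes, 25 nuclei per well: {frac:.1%} ambiguous")
```

Under these assumptions the collision rate falls when fewer nuclei are loaded per well or more first-round barcodes are used, which is the trade-off behind limiting how many cells are redistributed into the second set of wells.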
In this thesis we have advanced a strategy for tagmentation-based direct library preparation (DLP) in nanolitre volumes (1), described in detail in the following research chapters. A key benefit of our approach is the implementation using microfluidics, since this technology is ideally suited to implement a streamlined and automated high-throughput protocol for the analysis of thousands of single cells. In the following section I will discuss the benefits of miniaturizing reagent volumes and describe relevant microfluidic technologies for manipulating nanolitre volumes.

1.3 Microfluidic technology for single-cell analysis

1.3.1 Benefits of miniaturizing and integration

Miniaturizing biological assays has numerous advantages, including practical benefits directly linked to the reduced volume and others that result from unique properties at this scale. The following section briefly discusses these advantages as they relate to the analysis of single cells.

One obvious result of miniaturized volumes is reduced reagent consumption, which can significantly decrease the analysis cost per sample. In addition, reducing the physical dimensions significantly increases the achievable reaction density for highly scalable designs.

More fundamentally, the reduced volumes achievable in microfluidic systems enable molecular biology in a regime of much higher template concentration. For example, processing a single cell in a 50 nL reaction volume corresponds to a template concentration that is comparable to the optimal values used in tube-based workflows with modern library preparation protocols (e.g. 10-50 ng input DNA in a 50 μL reaction volume, Nextera DNA Library Prep Kit, Illumina).

By the same mechanism, reducing the reaction volume also reduces the relative contamination concentration. In microlitre volumes, even small concentrations of contaminants in the reagents become significant compared to the minute amounts of template DNA contained in a single cell.
By reducing the reaction volume 1000×, the effect of reagent contamination, which scales with the total reaction volume, is reduced commensurately.

Microfluidics can also provide some advantages in mixing kinetics due to small length scales. For instance, in reactions that go to completion, such as the tagmentation reaction described in Chapter 3, the small characteristic length scales of microreactors make diffusive mixing much faster, with time constants scaling roughly as the square of the length.

Another advantage of miniaturization is the facile integration of fluidic components for end-to-end processing within the same device. Performing all processing operations within the same unit reduces the risk of contamination, minimizes or avoids sample loss, and improves reproducibility in defined chamber volumes. Integrated networks can also improve reproducibility by combining many processing units on a single device and performing all measurements in parallel, thereby eliminating possible temporal effects of serial processing and enabling higher throughput through parallelization.

In summary, performing analyses in small reaction volumes can help to enhance sensitivity, reproducibility, and experimental scale and is ideally suited to integrated single-cell processing with miniaturized molecular biology assays (60,74–77). In the following section, I will introduce the microscale technologies that we applied in the pursuit of implementing a scalable single-cell library preparation protocol.

1.3.2 Microfluidic technologies

In this thesis I present the use of microscale library construction technologies that can generally be classified into two types: closed microfluidic devices and open microwell arrays. The following section reviews the foundation of these technologies and their use.
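Two of the scaling arguments from Section 1.3.1 can be checked with a short Python sketch: the template concentration of a single cell in a nanolitre reaction, and the quadratic dependence of diffusive mixing time on length. The 6 pg per cell, 50 nL, and 10-50 ng in 50 μL figures are taken from the text; the diffusion coefficient, the two length scales, and the one-dimensional estimate t ≈ L²/(2D) are illustrative assumptions rather than values from this thesis.

```python
def concentration_ng_per_ul(mass_ng: float, volume_ul: float) -> float:
    """DNA concentration in ng/uL for a given mass and reaction volume."""
    return mass_ng / volume_ul


def diffusion_time_s(length_m: float, diffusivity_m2_per_s: float) -> float:
    """Characteristic 1-D diffusion time, t ~ L^2 / (2 D)."""
    return length_m ** 2 / (2.0 * diffusivity_m2_per_s)


if __name__ == "__main__":
    # Single cell (about 6 pg = 0.006 ng) in 50 nL (0.05 uL), values from the text.
    single_cell = concentration_ng_per_ul(0.006, 0.05)
    # Tube-based range quoted in the text: 10-50 ng in 50 uL.
    tube_low = concentration_ng_per_ul(10.0, 50.0)
    tube_high = concentration_ng_per_ul(50.0, 50.0)
    print(f"single cell in 50 nL: {single_cell:.2f} ng/uL")
    print(f"tube-based range: {tube_low:.2f} to {tube_high:.2f} ng/uL")

    # Assumed diffusion coefficient and length scales (hypothetical values).
    D = 1e-10        # m^2/s
    chamber = 1e-4   # 100 um microreactor dimension
    tube = 1e-3      # 1 mm tube-scale dimension
    speedup = diffusion_time_s(tube, D) / diffusion_time_s(chamber, D)
    print(f"10x shorter length gives about {speedup:.0f}x faster diffusive mixing")
```

With these inputs the single-cell concentration falls close to the lower end of the tube-based range, and a tenfold reduction in length shortens the diffusion time by roughly a factor of one hundred, regardless of the diffusion coefficient assumed.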
1.3.2.1 Closed microfluidic circuits made using soft lithography

Introduction to soft lithography

While early microfluidic devices were fabricated in glass or silicon using MEMS (microelectromechanical systems) processes, elastomeric materials soon emerged as an alternative and have been a focus of microfluidics research, both in academia and industry (78). In particular, the use of silicone elastomers has gained popularity due to relatively simple replica moulding and device assembly methods, and lower fabrication costs. The process of fabricating polymeric devices by replica moulding, termed soft lithography, was introduced by Whitesides in 1997 (79) and combines high-resolution printing, photolithography, and moulding of elastomers such as polydimethylsiloxane (PDMS). Networks of ridges defined on a glass or silicon mould become channels and chambers in the cast PDMS.

Multilayer Soft Lithography (MSL) builds on this technique, but instead uses consecutive micro-moulding and bonding steps to create three-dimensional multilayer structures (80). By aligning these patterns such that channels that cross on two adjacent layers are separated by a thin elastomer membrane, monolithic valves can be made for active flow control (Figure 1.1). These valves are actuated by pressurizing one channel, which deflects the membrane to form a true sealing valve (81). When the pressure is relieved, the elasticity of the PDMS causes the membrane to return to its original position and the valve opens. The low Young's modulus and sealing properties of PDMS lead to small valve dimensions and thus enable the integration of tens of thousands of these valves within the area of a bank card. Higher-level fluidic components such as peristaltic pumps (80), rotary mixers (82), and demultiplexers (83,84) can also be assembled using these microvalves.

Figure 1.1 Valve architecture. Schematic of the two most common valve architectures.
(a) Push-down and (b) push-up arrangements. Blue represents the flow layer, orange the control layer, and grey a glass substrate.

Typically, one of two valve architectures is used, defined by the arrangement of the layer containing the fluidic path that needs to be controlled (flow layer) and the layer containing the control channels that form and actuate the valves (control layer). In the “push-down” arrangement (Figure 1.1a), pressure is applied to deflect the ceiling of the flow channel downwards, whereas in the “push-up” architecture the bottom of the channel is deflected upwards (Figure 1.1b). The latter arrangement has two major advantages. First, the actuation pressure is significantly lower due to a favourable membrane shape and constant thickness. Second, reaction chambers located in the flow channel can be taller. In the “push-down” arrangement, the chamber height is restricted by the maximum layer thickness that can be deflected sufficiently to form a closed valve. In contrast, in the “push-up” configuration, the membrane forming the seal is located on the control layer. In this arrangement, the reaction chamber height is limited only by manufacturing constraints.

Multi-layer devices are typically cast from a “control” and a “flow” mould. Depending on valve architecture, PDMS is spin-coated onto the mould that contains the valve membrane. The second layer is cast in a thick (3-8 mm) PDMS slab to simplify handling during alignment. To gain access to the channel geometries, the thick slab is removed from its mould after curing the PDMS, and aligned to the thin membrane, which is still attached to its mould. Both shrinkage of the PDMS slab after it is removed from the mould and the elasticity of the material can make alignment challenging. A strong bond between layers is crucial for device operation, since pressurizing channels can create large local forces that can otherwise delaminate (peel apart) layers.
There are two techniques for bonding PDMS layers: off-ratio diffusive bonding and functionalizing the PDMS surface using oxygen plasma treatment. The former relies on off-ratio mixing of a two-component polymer that is normally mixed at a 10:1 ratio. For off-ratio bonding purposes, the thick slab is usually cast in a 5:1 ratio and the thin membrane in a 20:1 ratio. After alignment of the two layers, excess functional groups can interact at the interface and form a covalent bond. This process is usually carried out at 80 °C. In the case of oxygen plasma treatment, devices can be made entirely from 10:1 PDMS. Plasma treatment functionalizes surface groups that can react to covalently bond the two layers. In general, oxygen plasma bonding forms stronger bonds, is less time sensitive, and is less dependent on PDMS batch quality. However, alignment is more challenging since activated surfaces form an immediate bond upon contact, making it more difficult to align complex devices.

Both passively controlled single-layer devices and actively controlled multi-layer devices have been developed for a range of single-cell applications.

Single-layer soft lithography has primarily been used to implement emulsion droplet approaches to generate large numbers of discrete volumes in an immiscible phase. Each droplet acts as an individual reaction compartment, and methods for merging and splitting droplets as well as adding new reagents to each droplet have been published (85). The main strength of this approach is its raw throughput. Groups have used emulsion droplets to analyze gene expression of tens of thousands of single cells (86), implement single-cell chromatin profiling (87), and perform single-cell WGA for next-generation sequencing library construction (88).
Recently, an entire genomic library preparation workflow was implemented on a platform that uses a combination of droplets and hydrogels to enable subsequent wash and enzymatic processing steps while maintaining compartmentalization (89). Even though droplets can rapidly isolate thousands of cells into separate compartments and are exceptionally well suited for scaling simple protocols, it is difficult to manipulate specific droplets. Emulsion droplet formats, in general, suffer from limited capabilities in performing complex multistep protocols (e.g. protocols that require wash routines or multiple enzymatic steps). In addition, the very small volumes and insufficient dilution of the cell lysate might inhibit reactions in droplets (90). This inhibition can be obstructive for WGS workflows, which generally require more elaborate library preparation protocols to strip genomic DNA and carry it through a multi-step process to prepare the DNA for sequencing.

For complex workflows that require a higher degree of fluid and volume control, multilayer microfluidic devices with integrated valves are ideal. Integrated valves can be used to isolate cells and precisely route reagents, with volumes in the picolitre to nanolitre range, through a network of microchannels and chambers (91). Valve-based devices made from transparent materials also allow for high-resolution imaging of cells to confirm single-cell capture and to correlate phenotype with sequencing data. In a typical chip layout, processing units are designed as a series of chambers of increasing size that are separated by valves. During an experiment, valves are used to open a fluidic path to new chambers, and reaction products can be mixed (actively or by diffusion) with fresh reagents (see Figure 2.2). Some groups have integrated the entire genomic library preparation workflow on a valve-based microfluidic device (92), including protocols that require purification and size selection (93).
Notably, the commercial C1 Auto Prep System is a fully automated valve-based microfluidic solution that is capable of isolating and processing up to 96 single cells in parallel (94–96). Gawad et al. implemented a WGA protocol on the C1 system to study acute lymphoblastic leukaemia (ALL) in 1479 single tumour cells (52), and Buenrostro et al. developed protocols for ATAC-seq (Assay for Transposase Accessible Chromatin) on the same platform (95). Other studies utilized custom chip designs to demonstrate the application of valve-based microfluidic chips in single-cell whole genome and targeted studies (91,94,97–99).

Limitations of closed microfluidic circuits

There are, however, some important limitations to the use of integrated microfluidic systems for single-cell analysis. First, microfluidic devices are typically designed for a specific application and thus have fixed volume ratios between successive reagent additions, which generally prevents a single device from being repurposed for different protocols. Second, cell isolation is typically achieved with cell-capture traps that are optimized for a specific range of cell sizes and are not suitable for all samples. Thus, multiple devices are required for broad applicability, and the specific design must be carefully selected and optimized for each application (100). Finally, the fabrication of MSL devices requires highly specialized facilities and expertise, making the technology largely inaccessible to most biology researchers. Even when devices can be obtained from other sources, extensive lab equipment, such as microscopes, pressure sources, and control valves, is needed for chip operation.

1.3.2.2 Open microwell structures

Compared to the droplet and valve-based microfluidic devices, open microwell arrays provide a simple format that offers operational flexibility similar to strip tubes or multiwell plates.
Microwell arrays are open nanolitre structures cast in PDMS or manufactured in rigid substrates such as aluminium. To address individual microwells within an array, two principles have been reported: 1) bulk addition of cells or reagents followed by loading the individual wells by gravity or diffusion, or 2) targeted dispensing of cells or reagents directly into fixed locations.

The first approach addresses all microwells simultaneously by adding a cell suspension or reagents to the entire array in a bath-like structure. Cells settle into wells by gravity, and new reagents can be carefully added and mixed with the existing solution. This method benefits from a simple workflow that does not require specialized lab equipment and could potentially achieve very high throughput. However, the absence of a physical barrier between wells increases the risk of cross-contamination and reduces sensitivity. Despite these limitations, Gole et al. applied this methodology to isolate single cells and perform WGA in open microwells, before extracting the amplified DNA from individual wells for library construction and sequencing (74). Others attempted to capture the genetic material by incubating cells with stacked beads to reduce the risk of cross-contamination and improve sensitivity for transcriptional analysis (101). To introduce physical barriers while maintaining the simplicity of bulk cell seeding and reagent addition, microwell arrays have been integrated into multilayer devices that facilitate oil-based separation during cell lysis and capture (102,103). While this design improves cell isolation over previous implementations that did not use any barriers, it lacks the ability to perform buffer exchanges or additions in multi-step reactions. Furthermore, since individual wells cannot be labelled during parallel processing and recovery, only a low occupancy rate (<10%) can be used if wells that contain more than one cell are to be avoided (103).
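The trade-off between well occupancy and multi-cell wells in bulk-loaded arrays can be illustrated with a short calculation. The sketch below assumes that cells settle into wells independently, so that the number of cells per well follows a Poisson distribution; this is a common modelling assumption rather than something measured in the cited studies, and the function names are illustrative.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k cells in a well for a mean loading of lam cells."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def mean_for_occupancy(occupancy: float) -> float:
    """Mean cells per well that leaves the given fraction of wells non-empty."""
    return -math.log(1.0 - occupancy)


def loading_stats(lam: float) -> dict:
    """Fractions of empty, single-cell and multi-cell wells (Poisson assumption)."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # share of occupied wells that hold more than one cell
        "multiplets_among_occupied": multiple / (1.0 - empty),
    }


if __name__ == "__main__":
    for occupancy in (0.10, 0.30, 0.63):
        stats = loading_stats(mean_for_occupancy(occupancy))
        print(f"occupancy {occupancy:.0%}: "
              f"{stats['multiplets_among_occupied']:.1%} of occupied wells are multiplets")
```

Under this assumption, a 10% occupancy corresponds to roughly 5% of occupied wells holding more than one cell, and the multiplet share rises quickly with occupancy. This is consistent with the low occupancy rates used in formats that cannot exclude individual wells.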
Other groups have used semi-porous membranes instead of oil to create a barrier between wells that are otherwise connected in a bath. These membranes trap macromolecules inside the well while still enabling solution exchanges (104,105). This size-dependent retention is, however, an obvious limitation in library preparation protocols, during which the genetic material is processed into short fragments for sequencing.

The second approach uses non-contact droplet dispensing robots to address each well individually in order to achieve true separation between adjacent wells. Droplet dispensing systems generate droplets on demand by producing a pressure wave inside a nozzle that ejects small amounts of liquid (106,107). Reagents and cells can thus be encapsulated into droplets and deposited directly into microwells. This process has been shown to be gentle, preserving cell viability (108,109). Typically, cells are loaded at a concentration such that there is an average of one cell per droplet. The major advantage of using dispensing robots instead of the previously described bulk-addition methods is the ability to combine the fixed spatial locations of the microwells with imaging information to specifically process selected wells (76,110). Unwanted wells, such as those that contain contaminants and dead or multiple cells, can be excluded from downstream processing. In addition, some methods enable the assessment and selection of cells in the dispensing nozzle, thereby overcoming the limiting-dilution distributions (109,111). Finally, a large range of cell sizes can be processed on the same system, since no mechanical isolation is required. Wafergen’s ICELL8 Single-Cell System is an automated commercial product that combines contactless dispensing with a chip imaging station and cell-selection software to perform single-cell measurements in up to 5,184 microwells in parallel.
This system has been used to prepare RNA libraries from single cells (112) as well as nuclei (110). Others have developed similar custom systems to implement single-cell reverse transcription quantitative PCR (RT-qPCR) assays (113) and single-cell MDA in nanolitre volumes (76).

In summary, microscale technology provides an important platform to enable scalable, high-throughput library preparation from single cells. The technology enables the robust isolation of single cells, reduces reaction costs by using nanolitre volumes compared to conventional plate-based methods, and improves data consistency through parallelization and automation. However, buffer exchanges and purification protocols are difficult to implement in any microfluidic format, and therefore robust single-pot library reaction protocols are needed.

1.4 Library preparation for next-generation sequencing applications

Innovations in sequencing technology have rapidly increased throughput and lowered costs, making sequencing an integral research tool in modern biology. Historically, capillary sequencing, also termed Sanger sequencing (114), was the “gold standard” sequencing technique and led to important milestones such as the sequencing of the human genome (115,116). However, high costs, slow speed and intensive labour requirements make Sanger sequencing unsuitable for routine whole genome sequencing. The launch of second-generation sequencing platforms, such as 454’s Genome Sequencer FLX (now Roche, (117)), Solexa’s Genome Analyzer (now Illumina, (12)), and Thermo Fisher Scientific’s Ion Torrent (118), has progressively reduced cost and increased speed ((119); a detailed history of the development of sequencing technology has been published previously (120)). This development propelled rapid discovery in cancer research by providing the ability to quickly identify rare mutations with high confidence.
This ability has led to the development of an extensive collection of library preparation protocols in order to provide sequencing solutions to a wide range of research questions. However, the enormous throughput of the latest sequencing platforms creates a new disparity between sequencing throughput and up- and down-stream procedures, such as library preparation and data analysis. For example, Illumina’s NovaSeq 6000 system is now capable of producing up to 20 billion reads in as little as 44 hours; this is roughly equal to 48 human genomes at 30X genome coverage per run (121). Innovative high-throughput library preparation methods are now needed to keep pace with these recent developments, especially for applications where analyzing many different samples is more important than obtaining very deep coverage on one or a few samples.

Library preparation is the process of converting genetic material into a format that is compatible with the requirements of the sequencing system. It usually involves the fragmentation of the DNA into short inserts and the attachment of “handles” on either end. These handles, called flow-cell adaptors, are used to covalently attach the processed genetic material to the flow cell for sequencing.

Illumina provides comprehensive reports summarizing library preparation methods that can be used with their dominant sequencing by synthesis (SBS) technology (DNA: (122), RNA: (123), Single-Cell: (124)). For this dissertation, we will focus on whole-genome sequencing (WGS) library preparation protocols. Unlike targeted studies, WGS attempts to deliver a comprehensive representation of the entire genome. This strategy is ideal for discovery applications with no prior knowledge of the mutational landscape (125).

In general, the two dominant approaches taken to prepare WGS libraries are either the conventional ligation-based procedure or one based on transposon insertions (126).
The conventional protocol includes procedures to fragment the DNA to the desired length (mechanically or enzymatically), followed by end-repair, size selection, A-tailing, and sequencing adaptor ligation. Optionally, final libraries might be PCR amplified to boost the DNA quantity in order to meet platform-specific input requirements. To eliminate reaction byproducts, purification steps and buffer exchanges are performed after each enzymatic step, which results in time-consuming and work-intensive protocols with potential losses during purification steps.

Adey et al. first introduced a “tagmentation” protocol for bulk library construction as an alternative to ligation-based protocols. The method benefits from low input requirements, very consistent genome-wide coverage and no significant GC-insertion bias compared to libraries built with a conventional ligation-based protocol (127,128). The authors indicated that they had successfully built libraries from 50 ng of human genomic DNA and tested the limits of starting material down to 10 pg of gDNA (approaching a single-cell equivalent of a human genome). However, for low-input libraries only the number of uniquely mapped reads was reported, and the authors indicated that the library complexity was low but did not provide further details (127). From an application point of view, it is also noteworthy that libraries were built from purified genomic DNA and not from a single cell directly, which would add challenges in determining suitable reagent compatibility, lysis conditions, and genome accessibility. Thus, while the chemistry was successfully tested on sub-nanogram amounts of gDNA, the working range of the protocol demonstrated was tens of nanograms of starting material in a 50 μL final reaction volume. This is comparable to the regime now used in commercially available products.
The tagmentation chemistry typically uses the hyperactive Tn5 transposase, which creates 9-bp staggered double-stranded breaks randomly spread over the entire genome and simultaneously end-joins synthetic adaptors to the 5’ end of the target DNA (a process also referred to as tagmentation) (126,129) (Figure 1.2). The fragment size can be adjusted by varying the concentration of transposase relative to the DNA amount. Finally, sequencing adapters and sample indexes are added to each fragment during limited rounds of PCR amplification. In commercial solutions (Nextera DNA Library Prep kit, Illumina), a mix of two adaptors (A and B) is inserted into the target DNA in order to generate library inserts with two different flow-cell adaptors. During the random tagmentation process, four species of inserts are produced: the desired A-B or B-A tagged inserts, as well as fragments with the same adaptor on either end (A-A or B-B) (Figure 1.2). Enrichment PCR (130) is then used to enrich for the desired A-B-tagged sequences and to add sequencing adaptors (126). This enrichment has important consequences when starting from a single-cell template, as 50% of the genome will not produce usable reads. Some groups attempted to further streamline the tagmentation protocol by using a simple heat-inactivation of the transposase before PCR instead of a chemical denaturation and buffer exchange (126,131). However, it has also been shown that the Tn5 transposase stays bound to the target DNA and only completely fragments the DNA after denaturation of the enzyme (70,71). This behaviour can also have important consequences when libraries are built from limited amounts of starting template, such as single cells, as bound enzymes prevent the subsequent addition of sequencing adaptors.

Figure 1.2 Tagmentation workflow
Schematic showing the tagmentation reaction using Tn5 transposase (Nextera, Illumina). (a) Transposases are loaded randomly with A and B tagged transposons.
(b) Loaded transposases are incubated with target dsDNA. DNA is fragmented and transposons are covalently attached to the 5’ end. (c) Transposases bound to DNA are released. The resulting inserts are a mix of AA, BB, AB and BA tagged fragments. (d) Enrichment PCR is performed to enrich for inserts with different tags and add sequencing adapters.

Due to its simplicity of use, groups have been working on implementing less expensive homemade versions of the Tn5 system, or developing new applications with customized adaptors. For example, some groups successfully produced their own Tn5 transposase with comparable results to the commercial kits (132) and used it to load custom adaptors for molecular counting in single-cell gene expression analysis (133) and bisulfite sequencing (134,135). Others used a commercially available transposase but combined it with customized adaptors to construct directional RNA-seq libraries (136). Alternatively, the commercial kit can be applied in innovative ways without modification; for example, the library preparation workflow termed ATAC-seq has been applied to study chromatin accessibility in bulk and at single-cell resolution (72,95,137).

1.5 Summary

In summary, studying cancer heterogeneity and how it links to disease progression, resistance to therapeutics, and relapse requires measurements at single-cell resolution. Recent advances in single-cell processing have rapidly advanced our understanding of cancer evolution and the role of cellular heterogeneity. However, many technical challenges remain before single-cell technology can become a routine research tool and part of everyday clinical practice. The need for innovative library preparation methods for single-cell genomic analysis has been amplified by our quickly evolving ability to target essential cancer genes and pathways (38) and has been a key factor motivating the project described in this dissertation.
1.6 Research contribution

This thesis describes the development of new microscale technology and associated protocols that allow for the scalable and high-fidelity analysis of genomes from single cells. The main designs, protocols, and demonstrations described are outlined below.

1.6.1 Deposition of primers and template into multilayer microfluidic devices

A major technical challenge in implementing a microfluidic version of the tagmentation protocol is the need to interface devices that operate on the nanolitre scale with large numbers of primers or other reagents, the so-called “world-to-chip” interface problem (138). This is true both for implementing protocols that require many different reagents, and for the recovery of hundreds to thousands of individual reaction products. In Chapter 2.3, I describe an optimized protocol for addressable deposition of molecular labels into the microfluidic devices using nanolitre dispensing. By labelling the reaction products and pooling them for recovery on-chip, the number of interface connections is independent of the number of reaction chambers. This approach can be an attractive world-to-chip solution for scalable analysis in genomic and genetic applications.

1.6.2 Inflatable reaction chamber

Establishing a robust protocol in a microfluidic device generally requires extensive iterations and optimization. For multi-step assays this is particularly challenging since the number of reagent additions and volume ratios are typically “hard-wired” into the device design, often making it necessary to iterate device structure as the protocol develops, with lengthy steps of design and manufacturing. In Chapter 2.4, I present a novel inflatable chamber architecture that solves this problem, allowing for the assembly of any multistep reaction with a single device. In this architecture, a flexible membrane is integrated into the reaction chambers during chip fabrication and is used to expand or reduce its capacity.
I combine the inflatable chamber with other design elements to create a flexible microfluidic platform that supports a wide array of single-cell protocols. I also demonstrate that this architecture enables solid-phase capture and size-selection protocols, further expanding the range of potential applications.

1.6.3 Direct library preparation (DLP) without pre-amplification

As described above, the vast majority of single-cell sequencing protocols require a step of whole-genome amplification. This process has been shown to introduce errors, which limits the detection sensitivity of genomic aberrations. Using the microfluidic innovations developed in Chapter 2, I describe a new single-cell direct library preparation (DLP) method in Chapter 3 that eliminates the need for pre-amplification to yield high-fidelity single-cell genotypes. The scalability and low bias of this approach are used in combination with a custom bioinformatic workflow that first uses the detection of large-scale structural variants to define the clonal structure, and then uses this structure to reconstruct high-depth clonal genomes from which small-scale genomic variants are inferred with high confidence.

1.6.4 Highly-scalable direct library preparation (DLP+) of single-cell genomes in open nanolitre well arrays

While the microfluidic implementation of the DLP library preparation protocol improves genomic coverage fidelity, the use of microfluidic devices increases complexity and places some constraints on the ultimate scalability and throughput of single-cell genomics. To address this technical limitation, Chapter 4 describes the transfer and optimization of the DLP process to a highly scalable open-array format using off-the-shelf microwell arrays and a commercially available piezo-electric dispenser.
We identified key parameters for the successful preparation of single-cell DLP libraries in this new format and demonstrated the scalability of this approach in generating thousands of single-cell libraries per day.

Chapter 2: Microfluidic designs and device fabrication for single-cell measurements

2.1 Overview

A central theme in my thesis is the use of miniaturization to enable a scalable and sensitive protocol for the generation of next-generation sequencing libraries using nanolitre transposition reactions. The first implementation of this was pursued using integrated microfluidic devices made with multilayer soft lithography (MSL). As described above (Chapter 1.3.2.1), this technology facilitates the dense integration of valves, channels and chambers to support multi-step processing in nanolitre volume reactors. However, two important limitations of MSL technology were encountered during the development of single-cell library construction. The first is related to indexing individual libraries for multiplexed sequencing; the second is related to the assembly of multi-step reactions on-chip with minimal loss. This chapter describes technology development work that was done to resolve these limitations so that a direct single-cell library preparation could be realized on chip. The resulting innovations are also more generally applicable in the field of microfluidics, and we anticipate these approaches will benefit studies where multiplexed high-throughput measurements are essential.

2.2 Introduction

Soft lithography provides an inexpensive and robust microfabrication technology to precisely manipulate minute volumes and has become a popular tool to miniaturize protocols for life science research (Chapter 1.3.2.1). During the implementation of a single-cell library preparation protocol, I encountered two major limitations.
First, a scalable method to maintain the identity of hundreds of single-cell libraries during recovery was required for off-chip sequencing analysis. In genomics applications this problem can be addressed by encoding the origin of reaction products using DNA sequences, a strategy called “indexing”. Specifically, the inclusion of a sample-specific sequence into products originating from a single reaction allows samples to be conveniently labelled, pooled, and then informatically separated after sequencing. Given the large combinatorial diversity of even a short DNA sequence, this approach is scalable to many thousands of reactions. However, indexing solves only the output interface and still requires that a unique reagent containing this label be introduced to every reaction chamber.

A conventional approach for assay delivery is to connect nanolitre microfluidic chambers to microlitre ports that can be interfaced with standard macroscale lab equipment (Figure 2.1). Applying this tactic across more than a few reagents, however, would require an impractical number of inlet ports, each occupying a large area on the device, and would necessitate very complex channel routing that would severely limit the number of microfluidic processing units in a given area. Further, this strategy requires manual sample addition to each well, a process that is both error prone and labour intensive.

Figure 2.1 Chip interface connection
Photograph shows a microfluidic device connected to control lines (tubing) and reagents (pipette tips).

This interface constraint is encountered frequently in genomics applications, both when reaction products must be recovered from the microfluidic device for off-chip analysis, such as next-generation sequencing (NGS) of single cells, and when individual assays must be delivered to each reaction chamber for on-chip readouts, such as qPCR (139).
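The indexing strategy introduced earlier in this section can be sketched in a few lines. The example below is a minimal illustration, not the demultiplexing software used in this work: the 8-base index length, the chamber names, the index sequences, and the one-mismatch tolerance are all assumptions chosen for the example.

```python
def index_capacity(length: int) -> int:
    """Number of distinct DNA sequences of a given length (4 bases per position)."""
    return 4 ** length


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, index_to_sample, max_mismatches=1):
    """Group pooled reads by sample using the index sequence attached to each read.

    reads: iterable of (index_sequence, insert_sequence) tuples.
    index_to_sample: dict mapping each known index to a sample name.
    Reads whose index matches no sample, or more than one, go to 'undetermined'.
    """
    assigned = {sample: [] for sample in index_to_sample.values()}
    assigned["undetermined"] = []
    for index_seq, insert_seq in reads:
        hits = [sample for known, sample in index_to_sample.items()
                if len(known) == len(index_seq)
                and hamming(known, index_seq) <= max_mismatches]
        target = hits[0] if len(hits) == 1 else "undetermined"
        assigned[target].append(insert_seq)
    return assigned


# Hypothetical indices for two reaction chambers (illustrative sequences only).
INDEX_TO_SAMPLE = {"ACGTACGT": "chamber_001", "TGCATGCA": "chamber_002"}

POOLED_READS = [
    ("ACGTACGT", "GATTACA"),  # exact match to chamber_001
    ("TGCATGCT", "CCGGAAT"),  # one mismatch to chamber_002
    ("GGGGGGGG", "TTTTAAA"),  # close to neither index
]

if __name__ == "__main__":
    print(index_capacity(8))
    print(demultiplex(POOLED_READS, INDEX_TO_SAMPLE))
```

An 8-base index allows 4^8 = 65,536 possible sequences. In practice only a subset would be used, chosen so that indices remain distinguishable in the presence of sequencing errors.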
Recent advances in microfluidic technology have given rise to highly integrated devices capable of addressing each chamber individually. In particular, Leung et al. (99) used a 2-phase droplet system to recover microfluidic reaction products with minimal cross-contamination, and VanInsberghe et al. (papers in review) used 3-dimensional channel routing technologies (140) to deliver different assays or indexing primers to individual reaction chambers. However, both of these designs resulted in increased chip complexity and decreased reaction chamber density. These processes are also error prone and labour intensive. To fully exploit the scalability of microfluidic devices, new methods for addressing this interconnection bottleneck are required.

Second, a novel chamber design was required to implement complex multi-step library preparation reactions with minimal loss. In current device architectures, an increase in protocol steps is usually associated with an increase in device complexity. For example, one-pot reaction protocols can be implemented by pushing intermediate products into ever increasing chamber volumes using new reagents, thereby assembling the next reaction condition (Figure 2.2). This strategy has been carried out on microfluidic devices for a variety of analysis workflows (52,59,90,141), with commercial products currently available for PCR analysis (Dynamic Array™, Fluidigm), and single-cell genetic or genomic analysis (C1™, Fluidigm). However, for complex multistep protocols, this approach can lead to large fluidic networks that require a large footprint, both to accommodate the increasing chamber sizes and to allow for routing of the required valve structures. Moreover, the sequence and volumes of reagent additions are pre-determined by the chip design and chamber geometries, so such devices are not capable of running arbitrary protocols.
This can lead to time-consuming optimization experiments, since an entirely new chip has to be designed and fabricated if an alternate set of fluid-handling steps is required for the adoption of an optimized or new protocol.

Figure 2.2 Reaction assembly in common microfluidic designs
Schematic shows reagent assembly strategy in MSL devices. Intermediate products are pushed into ever increasing chambers with new reagents (D-I). Figure adapted from (90).

Another limitation of current chip designs is the need to transfer reagents between chambers (Figure 2.2). This may lead to template losses, either due to adherence of intermediate products to the chamber walls or from incomplete volume transfer between each reaction chamber. Incomplete transfers can be minimized by a sufficiently large transfer volume; however, the increasing reactor length scale associated with larger volumes poses challenges in achieving efficient mixing and may require active mixing using peristaltic pumping or other means. Also, if small amounts of new reagents are to be added to a larger volume, active mixing is required, the implementation of which further complicates device architecture and limits density. This is an important point since the addition of small volumes is a common and important step in many molecular biology protocols (e.g. enzyme additions or neutralization steps), including those used in library preparation.

The reliance on consecutive dilutions also introduces a challenge in the removal of reagents, such as EDTA or SDS, which can inhibit downstream reactions and therefore generally require purification steps and buffer exchanges. Current microfluidic approaches are not well suited to implement any purification steps that require sequential capture, washing, and elution of analytes from a solid phase.
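The mixing challenge noted earlier in this section, where larger transfer volumes increase the reactor length scale, can be made concrete with an order-of-magnitude estimate. The sketch below assumes a cube-shaped chamber and purely diffusive mixing with a characteristic time t ≈ L²/(2D); the diffusion coefficients are illustrative round numbers for a small molecule and a large macromolecule, not values taken from this work.

```python
def chamber_length_m(volume_nl: float) -> float:
    """Edge length (m) of a cube with the given volume in nanolitres."""
    volume_m3 = volume_nl * 1e-12  # 1 nL = 1e-12 m^3
    return volume_m3 ** (1.0 / 3.0)


def diffusion_time_s(volume_nl: float, diffusivity_m2_s: float) -> float:
    """Characteristic 1-D diffusion time t = L^2 / (2 D) across the chamber."""
    length = chamber_length_m(volume_nl)
    return length ** 2 / (2.0 * diffusivity_m2_s)


# Assumed diffusion coefficients in water (order of magnitude only).
ASSUMED_DIFFUSIVITY = {
    "small molecule": 1e-9,        # m^2/s
    "large macromolecule": 1e-11,  # m^2/s
}

if __name__ == "__main__":
    for volume_nl in (1, 10, 100):
        for species, d in ASSUMED_DIFFUSIVITY.items():
            t = diffusion_time_s(volume_nl, d)
            print(f"{volume_nl:>4} nL, {species}: ~{t:,.0f} s")
```

Because the diffusion time scales with the square of the length, and the length with the cube root of the volume, a 1000-fold increase in chamber volume lengthens the diffusive mixing time about 100-fold under these assumptions.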
It should be noted that solid-phase capture has been implemented in some microfluidic devices by stacking columns of beads within the channel network, but such approaches are inherently not scalable and require careful design to ensure robust and even bead stacking and flow across columns (93,142).

In the following two sections I describe our effort to address these limitations with two new microfluidic designs. Both are relevant for the implementation of the library preparation protocol described in Chapter 3 and are also more generally applicable to the field of microfluidics. These elements are integrated into a microfluidic chip designed for the analysis of up to 192 single cells per run. This chip is the first implementation of a single-cell processing platform using multi-layer soft lithography that provides both programmability and scalability, and it was a major milestone towards the project goal.

2.3 Deposition of primers and template into microfluidic devices

To overcome the world-to-chip limitation for our single-cell sequencing application, we developed a new fabrication method that uses contactless dispensing to introduce hundreds of unique reagents into a microfluidic device. Picolitre volumes of unique molecular index sequences are deposited directly into open microfluidic chambers, which are then sealed during the chip fabrication process. During an experiment, the index sequences may be re-suspended by addition of buffer, and then added to each reaction separately. The inclusion of unique molecular indices in reaction products also allows samples to be pooled on-chip for recovery while maintaining their identity, solving the issue of unique sample recovery. From a practical perspective, this fabrication process drastically reduces the number of required chip interface connections and increases the total number of reagents that can be used in a device run, while simultaneously simplifying channel routing and maximizing reaction density.
The small footprint of microfluidic chambers and PDMS elasticity represent a technical challenge to accurately dispense molecular labels directly into an array of chambers. To overcome these challenges, I developed two chip fabrication workflows that can incorporate PCR primers and template DNA into microfluidic devices. The first is based on an established MSL push-down workflow (80,83) (Figure 1.1); the second is a novel imprinting workflow. The following sections describe the chip fabrication methods, the operation of the spotting robot, and the validation of the developed routines and workflows using qPCR readouts. The imprinting and spotting processes described in this chapter provide the foundation for the microfluidic device and protocols described in the next chapter.

2.3.1 Method

2.3.1.1 Chip fabrication methods

Accurate dispensing into the small footprint of a microfluidic chamber array requires open chamber geometries with a flat, minimally distorted chamber array. These characteristics are not easily achieved with the established MSL workflows since distortion naturally occurs once flexible membranes are removed from the rigid mould, as is required to gain access to the transferred patterns (see Chapter 1.3.2.1). We first developed a solution to this problem based on the traditional push-down geometry. Instead of dispensing directly into microfluidic chambers, an array of primers or template DNA was spotted onto a blank membrane and then aligned to a chip assembly containing the flow and control layers. This approach bypassed both the issue of fabricating an array of microfluidic chambers with minimal distortion and the problem of aligning the spotting robot to the chamber array.

Figure 2.3 Design of the push-down microfluidic device
(a) Layout of 300 reaction chambers accessed through six inlets and twelve outlets. Reagent inlets and outlets are in gray; control channels are in red.
(b) Expanded view of one processing unit, featuring: (I) reagent inlet, (II) chamber interfacing with the dispensed probe substrate, (III) reaction chamber, (IV) reagent outlet, and (V) reagent supply channel.

Briefly, microfluidic devices were cast from PDMS (RTV615, General Electric), with each device consisting of three elastomeric layers: a top control layer, a middle flow layer, and a blank layer sealing the chip (detailed protocol in Appendix A). The control layer was made from a 5:1 mixture of PDMS (5 parts RTV615A and 1 part RTV615B) and the thin flow layer was fabricated by spin coating the flow mould with a 20:1 mixture of PDMS (20 parts RTV615A and 1 part RTV615B). Both the control and the flow layers were cured at 80 °C for 17 minutes. This shortened bake time, compared to the more common 45 min to 1 hour, left the PDMS stickier and resulted in a stronger bond between the control and flow layers. This was particularly important because push-down valve architectures require a high valve operation pressure (~45 psi). Initially, a longer bake time was used, which resulted in a weaker bond and a high rate of chip failures due to delamination between the two layers.

After the layers were aligned and bonded together for 2 hours, interlayer connections were added with a custom laser ablation system (140). In parallel, microscope slides were spin coated with 20:1 PDMS and also cured for 17 minutes. The microscope slides were then placed on a spotting robot and an array of primers or template was deposited onto the blank PDMS membranes (see Chapters 2.3.1.2 and A.3). Finally, the blank membranes containing the spotted array and the 2-layer chips were plasma oxidized for 30 seconds and immediately aligned. The finished chips were baked at 80 °C for 2 hours to strengthen the final bond. Due to the flexibility of the PDMS assembly and the small dimensions of the chambers, manual alignment was challenging and resulted in a high failure rate.
Furthermore, the chip design required interlayer connections (called vias) to connect the thin bottom layer to the taller high-volume reaction chambers located in the top control layer; this requirement further complicated the chip design and constrained the scalability. Ultimately, this approach proved too complicated and error prone, and was abandoned. Instead, we developed a new microfabrication workflow that is better suited for interfacing with spotting technology (Figure 2.4).

Figure 2.4 Schematic of the double-sided imprinting workflow. Blue and green geometries represent the flow mould, orange geometries represent the control mould. The fabrication protocol is:
(a) TMCS treat silicon wafer
(b) Parylene coat glass wafer
(c) Spin coat 15:1 PDMS onto silicon wafer at 500 rpm for 1 min
(d) Cast 15:1 PDMS onto glass wafer and degas for 20 min
(e) Cast 5-mm thick 10:1 PDMS layer and cure at 80 °C for 1 hour
(f) Align glass to Si wafer using interlocking geometries and cure assembly at 80 °C for 1 hour
(g) Cut and release thick PDMS layer
(h) Release glass wafer and use fiducial markers to position spotting robot to dispense reagents
(i) Plasma oxidize PDMS
(j) Plasma bond two PDMS layers and bake at 80 °C for 5 min
(k) Release chip from silicon wafer, dice chips to desired size, punch access ports, and plasma bond chips to glass slide.

In this new double-sided imprinting workflow (Figure 2.4), liquid PDMS is sandwiched between two moulds that are aligned using interlocking geometries (Figure 2.6c). By using a glass substrate, these alignment geometries are visible to the naked eye (thereby not requiring specialized optics for alignment) and are fabricated at a defined height that determines the thickness of the membrane. During the fabrication, the “control” mould is pre-treated with TMCS (trichloromethylsilane), and the “flow” mould with parylene-C.
We found that the polymerized PDMS membrane adheres more strongly to the TMCS-coated substrate, thereby allowing the opposing parylene-coated mould to release, leaving the flow geometry exposed on the top surface of the PDMS membrane. Because the flat membrane is still attached to the control mould, there is no distortion of the membrane, facilitating accurate spotting directly into the reaction chambers (Figure 2.5). A detailed fabrication workflow is provided in the appendix (sections A.2 and A.3).

Figure 2.5 Microphotograph of integrated primers. Dried primers (1200 pL dispensed volume) in the imprinted microfluidic device. Scale bar: 100 μm.

The double-sided imprinting workflow also addresses the fundamental issues of aligning the control and flow layers. As mentioned earlier, the elasticity of PDMS and its shrinkage after a layer is removed from the mould make it challenging to align a flexible membrane to the second mould. In the double-sided imprinting workflow, this problem is completely bypassed since all microfluidic channels and chambers are transferred in the same moulding step and only two points of the rigid moulds need to be aligned.

Based on the described imprinting workflow, we designed a simple microfluidic chip to test and optimize spotting protocols for the incorporation of primers for single-cell indexing applications (Figure 2.6).

2.3.1.2 Spotting into microfluidic chamber arrays

To routinely incorporate sequencing primers into a microfluidic chip for single-cell analysis, primers need to be precisely deposited into the desired location. For this, we developed a fully automated dispensing routine (details in Appendix A.3) that integrates all needed wash and quality control steps on a contactless piezo dispenser (sci-FLEXARRAYER S3, Scienion AG). The system includes a 3-axis stage, dispensing nozzles for a variety of volumes, cameras for both optimizing droplet stability and volume, and target recognition and alignment (Figure 2.6d).
Furthermore, the system includes an active fresh-water wash station.

Figure 2.6 Design of the double-sided imprinted microfluidic device. (a) Layout of 288 reaction chambers accessed through six reagent inlets or twelve serial recovery ports. (b) Expanded view of two processing units, featuring: (I) reagent inlet, (II) reaction chamber, (III) serial recovery channels, (IV) reagent supply channel, and (V) integrated valves. (c) Design of the alignment features. Red photoresist features are located on the control mould; the black pillar is located on the flow mould. (d) Design of the fiducial markers for aligning the spotting robot to the reaction chambers. (e) Wafer layout showing an array of 4 chips. Blue boxes indicate the location of the alignment markers; green boxes indicate the location of the fiducial markers used for spotting.

In order to dispense the probe substrate directly into the microfluidic chambers, a number of protocols needed to be developed. First, to account for misplacement and rotational errors of the chamber array, a field target recognition algorithm optimized for the microfluidic chips was implemented (details in Appendix A.3). This algorithm was used at the beginning of each run to ensure accurate dispensing into the microfluidic chambers. In addition, a fully automated dispensing routine was developed with a focus on robustness. We found that, after a new probe is taken up, small amounts of sample remain on the outside of the nozzle. This had a negative effect on droplet stability and usually affected the flight path. While the deviation from the ideal path was small and would not cause any problems in other applications, droplets often missed the microfluidic chambers completely or partially. We addressed this issue by first flushing the nozzle with fresh 18 MΩ water after an uptake to remove any contamination from the outside of the nozzle and then dispensing the mixed probe/water interface into waste.
After removing the mixed interface, we used the droplet camera to confirm that the droplets were within a narrow region of interest before dispensing the probe into the array. If the droplet deviated too far from the ideal flight path, the probe was not dispensed and the event was recorded for quality control. Another issue we had to address was the risk of cross-contamination. The spotting robot uses a non-disposable nozzle. After a dispense step, the probe is washed out before the next one is taken up. To ensure that no contamination remains in the nozzle before the next dispensing step, we had to develop and optimize a rigorous wash routine (for details, see Appendix A.3). We found that flushing with fresh water alone did not remove probes sufficiently and that detergents, such as Tween 20, were needed. Fresh-water flush steps and detergent washes were therefore combined into a wash routine that was carried out after every dispense step. A detailed description of the PCR experiment can be found in Appendix A.4.

2.3.2 Results

2.3.2.1 Previous issues encountered during primer incorporation

Incorporating primers into microfluidic devices had been a goal in our lab for many years. To make it possible, I had to solve two major problems encountered by others before, namely reliable dispensing of picolitre volumes and robust bonding of multilayer devices after droplet deposition. Before switching to a contactless piezo dispenser, a pin contact spotter was used. Spotting pins draw reagents into the pin by capillary force when immersed in the source liquid. During the deposition cycle, loaded pins are brought into contact with the target, and surface tension interacting with the target surface leads to the formation of a droplet when the pin pulls away. Spot formation and volume are determined by the interaction of many parameters, including substrate hydrophobicity, spotting buffer, viscosity and pH of the reagents, DNA concentration, temperature (air and target), and humidity.
The operation was further complicated by the phenomenon that the first spots were larger than subsequently printed ones. This is linked to solution coating the outside of the pin during loading, leading to spots of variable volume. Because of these complex surface interactions and challenges in spotting robot operation, it was impossible to deposit droplets in a robust and repeatable fashion. In addition, the fixed pin spacing put restrictions on the chip design. The reaction chambers had to be aligned with the pin pitch, while also accommodating flow channels and valve control networks. Reaction density further suffered from large tolerances, since pin wobbling and instrument positioning errors were common. Finally, the instrument did not include the capabilities to align a probe substrate precisely with the pins, which is a fundamental requirement to dispense probes directly into microfluidic chambers.

Beyond issues during spotting, inconsistent resuspension of the tiny spots was regularly observed during an experiment. It was determined that the success of resuspension depended on the bake time during off-ratio bonding. For example, PCR experiments on chips with bake times that exceeded 1 hour at 80 °C frequently did not produce any signal. In contrast, chips that were bonded using shorter bake times usually delaminated, even at very low operating pressures in the flow channels (<5 psi). Finally, chips could not be reliably off-ratio bonded to the spotted substrate if the time between curing the PDMS-coated spot substrate and alignment of the final device exceeded 2 hours.

An essential step towards addressing these issues was the switch to a contactless dispenser (Scienion) since droplet volumes are independent of the surface properties of the target substrate.
A second crucial development was the implementation of oxygen plasma bonding instead of off-ratio mixing, which improved the bonding strength and solved the frequently experienced delamination issues.

2.3.2.2 Chip designs and operation

Two microfluidic devices (Figure 2.3, Figure 2.6) were designed to develop, improve and validate protocols for incorporating contactless spotting technology into chip fabrication. The devices feature either 288 or 300 reaction chambers arrayed in 12 columns of 24 or 25 chambers, respectively. Six independent reagent inlets are connected to 48 or 50 chambers in parallel. In addition, each sub-array of 48 or 50 chambers may be connected in series in order to enable recovery of pooled reaction products. During an experiment, PCR mix was first injected into each of the 6 inlets and pushed into all reaction chambers in parallel. In all experiments, a single reagent inlet was reserved for no-template controls (NTCs). Quantitative PCR (qPCR) was carried out on a modified BioMark™ reader (Fluidigm), while all valves were kept pressurized. Reaction products could be recovered by flushing 10-15 μL of 0.1% Tween through each of the 6 sub-arrays.

2.3.2.3 qPCR results: device sensitivity and performance

Experiments were performed to assess the uniformity, dynamic range, and cross-contamination in PCR reactions assembled using our newly established chip fabrication and spotting protocols. Measurements were performed using an assay to detect a 126 bp region of the RPPH1 gene (143) on a synthetic template of the gene (details in Appendix A.4). Dynamic range and uniformity measurements were performed on chips built with the push-down workflow, while the cross-contamination test was performed on a chip built with the double-sided imprinting protocol. First, we determined the efficiency and dynamic range of the qPCR reaction in our device over a 32× dilution series of template DNA (Figure 2.7a).
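As a hedged illustration of the standard-curve analysis used for this dilution series, the sketch below fits mean CT against log2 concentration and converts the slope into an amplification efficiency. The concentrations and mean CT values are those reported in this section for Figure 2.7a. The function names are hypothetical, and the slope-to-efficiency conversion is the common textbook form, assumed here because the text does not spell it out. The last function applies a Poisson estimate to the digital readout at the lowest concentration (27 of 50 chambers amplified).

```python
import math

# Concentrations (nM) and mean CT values reported in this section for the four
# highest points of the 32x dilution series (Figure 2.7a).
CONCENTRATIONS_NM = [2.84, 0.0888, 2.77e-3, 0.0867e-3]
MEAN_CT = [4.62, 10.84, 17.82, 24.07]


def least_squares(x, y):
    """Return (slope, intercept, r_squared) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, (sxy * sxy) / (sxx * syy)


def pcr_efficiency(concentrations, ct_values):
    """Fit CT against log2(concentration) and convert the slope to an efficiency.

    Assumed conversion (not spelled out in the text): the per-cycle
    amplification factor is 2 ** (-1 / slope), and efficiency is that factor
    minus one, so perfect doubling (slope = -1) gives 1.0, i.e. 100%.
    """
    log2_c = [math.log2(c) for c in concentrations]
    slope, _, r_squared = least_squares(log2_c, ct_values)
    return 2.0 ** (-1.0 / slope) - 1.0, slope, r_squared


def mean_copies_per_chamber(positive, total):
    """Poisson estimate of mean template copies per chamber from a digital readout."""
    return -math.log(1.0 - positive / total)


efficiency, slope, r_squared = pcr_efficiency(CONCENTRATIONS_NM, MEAN_CT)
copies = mean_copies_per_chamber(27, 50)  # 27 of 50 chambers amplified at 2.71 fM

print(f"slope = {slope:.3f} cycles per doubling, R^2 = {r_squared:.4f}")
print(f"efficiency = {efficiency:.0%}")
print(f"mean copies per chamber = {copies:.2f}")
```

Under these assumptions the fit gives an efficiency of about 70% and an R² of about 0.9995, consistent with the values stated in the text, and the Poisson estimate gives roughly 0.8 template copies per chamber at the lowest concentration, in line with the interpretation as single-molecule amplification.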
Forward and reverse primers were spotted into all chambers and then sealed during the chip fabrication process. During an experiment, the PCR mix was pushed into the microfluidic chambers, thereby resuspending the primers. The efficiency of the PCR amplification was calculated from the slope of the linear least-squares fit of log2 concentration (C) versus CT over the highest four concentrations, and was determined to be 70% with an R² of 0.9995. The low standard deviation in the CT values across the arrays indicates uniform amplification (4.62 ± 0.10 for 2.84 nM, 10.84 ± 0.20 for 0.0888 nM, 17.82 ± 0.26 for 2.77 pM, 24.07 ± 0.36 for 0.0867 pM; stated as CT ± standard deviation). The lowest concentration (2.71 fM) resulted in a digital amplification pattern with 27 of 50 chambers generating signal, which is characteristic of single-molecule amplification. The average CT value of positively amplified chambers obtained from this array was found to be 30.29 ± 1.19. In summary, these results demonstrate that the device has single-molecule sensitivity, and that amplification across the device is uniform.

Figure 2.7 Performance of incorporated probe substrates. Real-time amplification curves are derived from fluorescent images taken during PCR amplification. Heatmaps of CT values are derived from real-time amplification curves and organized in the device layout. Black cells were not detected. The two columns on the right side are negative controls. (a) Dilution series: Heatmap and standard curve generated from a 32× dilution series of template DNA. Primers were pre-spotted during chip fabrication. The digital pattern in the second-to-last array is characteristic of single-molecule amplification. Points in the standard curve represent mean CTs ± standard deviation of all measurements at one concentration. (b) Uniformity: DNA template was incorporated in the microfluidic chip and re-suspended during the experiment.
(c) Cross-contamination: 0.1% Tween 20 in water and template DNA were spotted in an alternating pattern starting in the bottom row with 4 replicates in the same row. No amplification was detected in the NTC chambers, indicating that cross-contamination during spotting was less than 1 in 14.5 million molecules. Row 2 and Row 22 (from the bottom) show later amplification, which is consistent within the replicates. This indicates that less volume was dispensed during the chip fabrication process.

Next, we assessed how reproducibly the dispensed template DNA can be resuspended in an experiment (Figure 2.7b). During chip fabrication, 1.2 nL of synthetic template DNA at a concentration of 11.86 nM in 0.1% Tween 20 was dispensed in all but the NTC chambers. The same PCR mix was then pushed into the entire array, thereby resuspending the template DNA in each chamber. The CT values obtained across the entire array demonstrated uniform amplification and were found to be in the dynamic range of the device with a mean CT of 17.07 ± 0.28 (Figure 2.7b). The measurement precision was estimated to be 12%, which is approaching the limit of qPCR (90). The bottom row had a higher average CT and standard deviation (18.19 ± 0.47), which can be attributed to dehydration of the chamber contents or to variability in spotting volume. Finally, we tested for cross-contamination by the spotter (Figure 2.7c). Template (1.2 nL of synthetic template DNA at a concentration of 20 nM in 0.1% Tween 20) and resuspension solution (0.1% Tween 20) were dispensed in alternating order, with wash steps between dispensing steps. For each sub-array, template DNA was spotted into 22 chambers, and template-free 0.1% Tween 20 was dispensed into an additional 22 chambers, while leaving the remaining chambers empty. Replicates of each dispensing step were included across 4 sub-arrays. The final array was left empty to serve as a master mix no-template control (NTC).
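The bound of 1 in 14.5 million quoted for this test corresponds to the number of template molecules in each spotted droplet. The short check below is a sketch under stated assumptions: the function name is hypothetical, and the bound assumes that a single carried-over molecule would have produced a detectable signal, which is plausible given the single-molecule sensitivity reported for the dilution series.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_dispensed(volume_nl, concentration_nm):
    """Number of molecules in a droplet of the given volume (nL) and concentration (nM)."""
    return (volume_nl * 1e-9) * (concentration_nm * 1e-9) * AVOGADRO


# Cross-contamination test: 1.2 nL of 20 nM template per chamber.
template_copies = molecules_dispensed(1.2, 20.0)

# Assumption: one carried-over molecule would have been detected, so the
# carry-over fraction is bounded by one molecule out of those dispensed.
carry_over_bound = 1.0 / template_copies

print(f"template molecules per droplet: {template_copies:.3g}")
print(f"carry-over bound: < 1 in {template_copies / 1e6:.1f} million")
```

The same function can be applied to the uniformity test by calling `molecules_dispensed(1.2, 11.86)` for the 11.86 nM template.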
The PCR was carried out by pushing the same master mix into all chambers, and subsequently performing 40 cycles of amplification. The average CT across all chambers that received template was found to be 10.34 ± 0.66, with two rows (row 2 and row 22 counted from the bottom of the array) having higher average CT values (row 2: 11.77 ± 0.43; row 22: 22.28 ± 0.07). This difference is again likely caused by variation in the spotting volume since the variation within a row (replicates of the same dispensing step) was lower than the variation between rows. More importantly, we were not able to detect amplification in any of the template-free wells, indicating that potential carry-over between successive spotting steps is removed by the wash routine to a level of less than 1 in 14.5 million molecules.

2.4 Inflatable reaction chamber

To enable multi-step protocols on microfluidic devices, I designed and implemented an inflatable microfluidic chamber that allows us to adjust the volume of the defined structure freely rather than being constrained by chamber size. In the following section, I describe the fabrication process and architecture of this new chamber and introduce a microfluidic chip for single-cell genomic library preparation that integrates both the developed protocols for primer incorporation and the inflatable chamber. I used this chip to show that the inflatable chamber structure is compatible with solid-phase capture, wash steps, and buffer exchanges; a single-cell library preparation workflow on the same chip is described in the next chapter.

2.4.1 Method

2.4.1.1 Device fabrication

The fabrication process (Figure 2.9) for the inflatable chamber architecture builds on the imprinting capabilities described in the previous section with two significant modifications: 1) chips were fabricated entirely in 10:1 PDMS and 2) flow and control moulds were cast in two separate steps.
These modifications were necessary because the inflatable chamber architecture required a much thinner membrane with a highly uniform thickness across the entire array compared to what was reliably achievable with the double-sided imprinting workflow (Figure 2.8). Even with the established MSL workflows, uniform membrane thicknesses of less than 5 μm on top of channels or chamber moulds are not easily achieved. Patterns on the substrate lead to a “crowding” effect of PDMS around and on top of the features, which increases the membrane thickness. We found that the most reliable way to fabricate thin uniform membranes of less than 5 μm thickness is to spin-coat a blank substrate. To incorporate a thin blank layer as an active element into the chip, cast channels and chambers have to face each other; to achieve these conditions, we imprinted only the flow layer into a PDMS membrane to create microfluidic channels that are open to the top. The control layer was cast in a thick slab of PDMS with channels facing the bottom (Figure 2.8c).

Figure 2.8 Layer arrangements. Schematics show layer arrangements for (a) the common MSL “push-up” architecture; (b) the double-sided imprinting workflow; (c) the inflatable chamber architecture. The arrows highlight the membrane that could form the boundary between two adjacent chambers.

The final chip was assembled by aligning these two patterned layers with a thin membrane in between. The top “control” layer was cast from 55 g PDMS mixture, degassed and cured at 80 °C for 60 min. The middle layer, a thin blank membrane, was made by spin coating a blank wafer with PDMS at 5,500 rpm (G3 Spin Coater, Specialty Coating Systems, Inc.) and cured for 30 min at 80 °C. After baking, the cured PDMS was removed from the control mould and plasma bonded to the blank membrane.
The two-layer structure was then incubated for 5 min at 80 °C, before removing it from the blank wafer in order to punch access ports and laser ablate interlayer connections (140).

Figure 2.9 Device fabrication workflow.
(a) TMCS treat flow wafer
(b) Parylene coat control wafer
(c) Pour 10:1 PDMS and degas for 40 min
(d) Cast 10:1 PDMS layer 5 mm thick, degas for 60 min, and cure at 80 °C for 1 hour
(e) Plasma oxidize glass slide for 10 sec
(f) Spin coat 10:1 PDMS at 6500 rpm for 1 min and cure at 80 °C for 20 min
(g) Align oxidized glass slide to flow mould and cure at 80 °C for 1 hour
(h) Release control mould and plasma oxidize PDMS
(i) Use fiducial markers to position spotting robot and dispense desired amount of reagent
(j) Plasma bond two PDMS layers, bake at 80 °C for 5 min and punch access ports
(k) Plasma oxidize flow layer for 20 sec and control assembly for 30 sec
(l) Dice chips to desired size and plasma bond chips to glass slide.

The bottom “flow” layer was fabricated using a single-sided imprinting workflow to facilitate the deposition of indexing primers into open microfluidic structures with minimal distortion of the array. PDMS in its liquid state was sandwiched between the “flow” mould and a plasma-oxidized glass slide (100 × 100 mm² Schott D263 Borosilicate Glass, S.I. Howard Glass Co.) and baked for 30 min at 80 °C. The stronger adhesion of the cured PDMS towards the glass slide allowed the “flow” mould to be lifted off, while the cast membrane remained attached to the glass slide. Molecular probes could be deposited directly into the open chambers on the flow layer, if desired. Afterwards, the two-layer structure was aligned and plasma bonded to the flow layer (details in Appendix B.1).

2.4.1.2 Bead purification

Bead purification was carried out by loading 300 pump cycles of a 100-bp ladder (TrackIt 100bp DNA ladder, Thermo Fisher Scientific) into two arrays.
The amount of fluid transferred during each cycle of the peristaltic pump is equal to the volume displaced by the middle valve. The first array was an on-chip control and was not touched during size selection of DNA fragments. The second array was size selected with AMPure XP beads at a ratio of 1× sample to 0.8× beads. After beads were added to the reaction chambers and incubated at room temperature for 10 min, they were collected on the opposite side of the reagent in- and outlet using a magnetic field. After approximately 5 min, the cleared solution was pushed out of the reaction chamber by completely deflating the reaction chamber. Two washes of 800 pump cycles each with 80% ethanol were performed, while the chip remained on the magnet. Next, the beads were resuspended in 400 pump cycles of PCR water, before both arrays were recovered from the chip by flushing 4.5 μL of PCR water through each array. Finally, the control and size-selected ladders were normalized to 5 μL final volume. A tube control was performed in parallel following the manufacturer’s protocol, but 80% ethanol was used instead of the recommended 70%. 20 μL of the same ladder were combined with 18 μL of beads and washed in 500 μL of ethanol. The product was eluted in 20 μL of PCR water.

2.4.2 Design and results

2.4.2.1 Architecture of the inflatable microfluidic chamber

We designed a microfluidic chamber with a freely adjustable volume to carry out multi-step reactions in a programmable format. The inflatable chamber comprises two open microfluidic chambers that are stacked on top of each other. At the interface, a thin, flexible membrane is integrated to separate the two chambers (Figure 2.8c). This configuration allows the thin membrane to be deflected up or down, thus expanding or reducing the volume of the chambers (Figure 2.10c).

We refer to the chamber containing reagents and template as the reaction chamber; the second chamber is referred to as the displacement chamber.
Before the first reagent addition, the reaction chamber can be deflated entirely by pressurizing the displacement chamber. This reduces the volume of the reaction chamber to near zero. A peristaltic pump can then be used to add precise amounts of reagents or template. At any point, the volume can either be reduced by running the peristaltic pump in the opposite direction or completely reset to nearly zero by opening the chamber outlet valves and pressurizing the displacement chamber.

Figure 2.10 Microfluidic device design. (a) Device layout, featuring 192 single-cell processing units accessed through four cell loading inlets, to enable case vs. control studies. (b) Expanded view of one cell processing unit, featuring: (I) the cell lysis inlet, (II) a cell trap, (III) an inflatable reaction chamber, (IV) the reagent inlet, (V) an index-spotting chamber, (VI) the reagent supply channel. (c) Microphotograph of a partially inflated microfluidic reaction chamber (red), and displacement chamber (yellow). (d) CAD drawing of the on-chip cell filter. (e) Fluorescent image of a cell filter after loading.

2.4.2.2 Microfluidic chip for single-cell library preparation

We integrated the inflatable chamber as well as the primer spotting protocols into a microfluidic device for the preparation of single-cell libraries for next-generation sequencing. The device features 192 cell-processing units arrayed in four columns of 48 (Figure 2.10a). Each cell-processing unit (Figure 2.10b) includes an inflatable reaction chamber, a chamber that contains molecular labelling sequences, a cell trap, and reagent in- and outlets. The last cell trap in each column is missing the trapping structure for no-cell controls. During an experiment, up to four different cell suspensions can be injected into the chip. A microfluidic filter structure (Figure 2.10d, e) is used to remove cell clumps to avoid clogging the cell traps.
Two additional inlets are connected to the cell inlet channel to enable wash steps before isolating the trapped cells using integrated valves. Channels connected to each cell trap can then be used to transfer the cells into the inflatable reaction chambers by flushing buffer backwards over the trapping structure. Cell and reagent inlets connected to the inflatable reaction chamber are routed through a peristaltic pump to meter out precise volumes. The inflatable chamber is connected to 5 inlets and outlets in total: one cell inlet, one reagent inlet, one inlet connected to the index chamber, and two channels for serial recovery. Finally, all displacement chambers are connected to one outlet port and can be pressurized for recovery. In addition, the displacement chamber offers a constantly pressurized reservoir of buffer or water in close proximity to the reaction chamber that can minimize the effect of evaporation during heat incubation steps.

2.4.2.3 On-chip size selection using magnetic beads

As a demonstration of the inflatable chamber, we implemented a commonly used DNA size selection protocol, where DNA fragments selectively bind to paramagnetic beads based on the volume ratio of sample and bead suspension buffer (AMPure XP, Beckman Coulter). A 100-bp DNA ladder containing a tracking dye to examine the size selection and efficacy of the wash steps was loaded into a device. Paramagnetic beads were added under binding conditions to half of the device, with the other half serving as an unpurified control. After allowing the DNA to bind to the bead surface, the beads were magnetically separated and the supernatant was removed by deflating the reaction chambers. Two ethanol wash steps were carried out by repeatedly inflating and deflating the chambers to remove short fragments and wash away the tracking dye.
Finally, the remaining fragments were re-suspended and both the unpurified ladder (on-chip control) and the size-selected fragments were recovered from the chip. To compare the on-chip size selection to the manufacturer’s protocol, we performed the same purification in tubes. Size distribution traces (Agilent Bioanalyzer) were used to compare the size cut-offs and relative yield. We found that on chip, a 0.8× bead to sample ratio removed fragments shorter than 300 bp (Figure 2.11a); off-chip, the cut-off using this bead ratio was 200 bp (Figure 2.11b). The yield, calculated as the percent of the size-selected samples remaining relative to the input amount, was found to be comparable at 64% for the on-chip, and 60% for the off-chip purification. This difference indicates that slightly more sample was removed during size selection in tubes compared to the microfluidic device, despite its shorter cut-off. A similar trend can be observed in the size distribution traces. An overlay of the control and size-selected samples shows that the amounts are highly correlated above the cut-off length for the microfluidic device (Figure 2.11a), whereas a lower concentration of DNA was detected across the size-selected distribution from the tube experiment (Figure 2.11b). In addition, the tracking dye could not be detected in any of the size-selected samples, indicating that the wash steps sufficiently removed contaminants.

Figure 2.11 On-chip size selection. Size-distribution histogram (Bioanalyzer traces) from (a) on-chip and (b) in-tube size selection experiment. X-axis shows the fragment size and Y-axis fluorescent units. Red trace represents the control reaction and the blue trace shows the product of the size selection. The dashed orange box indicates the enlarged trace (solid orange). Green arrows highlight the size cut-off and the blue arrow points to the signal generated by the Tartrazine tracking dye.
2.5 Discussion

Here I developed microfabrication workflows that combine multi-layer soft-lithography and contactless spotting technology to allow for the deposition of DNA oligos (molecular barcodes) during the fabrication of microfluidic devices. This capability permits the routine labelling of reaction products and sample pooling on the microfluidic device for recovery. I also developed a new inflatable microfluidic chamber to assemble multi-step reactions in arbitrary sequences on-chip. Based on these protocols and workflows, I then designed a microfluidic device that combines the inflatable chamber with cell traps and pre-dispensed indexing primers to provide a flexible architecture for single-cell analysis.

A critical innovation in implementing both technologies is a novel chip fabrication method. The inflatable chamber architecture requires the integration of thin membranes with uniform thickness into the microfluidic device for reproducible results, since the thickness determines how much pressure is needed for the membrane to expand. The thin membranes also have additional benefits, namely reducing the minimum valve actuation pressure and enabling more reliable ablation of inter-layer connections at lower laser power. This allows us, on the one hand, to reduce the size of the valves while on the other hand, to decrease the spacing between vias and adjacent channels and chambers. As a result, more reaction chambers can be integrated on the same size footprint. The new chip fabrication also provides the functionality to fabricate open chamber arrays that can be interfaced with a piezo-dispenser. Fiducial markers in the PDMS membrane enable the alignment of the dispensing nozzle to the array. Furthermore, we implemented and optimized software that continuously monitors droplet stability and the flight path during spotting to ensure that droplets are accurately dispensed into the microfluidic chambers.
We also established a robust wash routine to ensure minimal cross-contamination between subsequent dispensing steps. This is integral to routinely incorporating distinct reagents into microfluidic devices for applications such as indexing single-cell libraries for multiplexed sequencing, so that reads can be assigned unambiguously to a specific cell. In developing the described wash routine, eliminating potential cross-contamination was prioritized over dispensing speed. However, in time-sensitive applications or procedures requiring prolonged dispensing, shortening the wash routine may improve the protocol.

In summary, by combining the established protocols and designs into a chip for single-cell analysis, we created the first implementation of a flexible and universal microfluidic design that enables multistep protocols in a scalable format. This was a fundamental stepping stone towards integrating an entire genomic single-cell library preparation workflow on-chip, which is described in the next chapter.

Chapter 3: Direct library preparation without pre-amplification

3.1 Overview

The device design and fabrication advancements described in the previous chapter were applied to the high-throughput construction of single-cell next-generation sequencing libraries. We implemented a tagmentation-based chemistry to fragment and index single cells directly, a workflow we termed ‘direct library preparation’ or DLP. In this chapter, I introduce the DLP protocol and show results from 782 single-cell DLP libraries. We first applied our method to analyze 268 single cells from two immortalized normal human cell lines. Through comparison to past studies, we demonstrate that lack of pre-amplification results in more uniform single-cell genome coverage than WGA-based methods, and that pooled libraries achieve equivalent uniformity to a bulk genome at the same sequencing depth.
In addition, we applied our method to examine the genomes of 514 single cells from two passages of a patient-derived triple negative breast cancer xenograft line, yielding a detailed map of this tumour’s copy number architecture, while also inferring SNVs, LOH, and breakpoints on the high-depth, low-bias pooled “bulk-equivalent genome”.

3.2 Introduction

Here we present an alternative to both bulk and single-cell WGA approaches, whereby indexed libraries are constructed directly from single-cell template DNA without any pre-amplification or sorting steps (Figure 3.1a). Our direct library preparation (DLP) protocol involves direct tagmentation (127) of single-cell template DNA in nanolitre volumes, followed by several cycles of PCR to add sequencing adaptors and index barcodes. Indexed libraries are pooled for multiplexed sequencing at low depth, enabling inference of single-cell copy number profiles that are grouped to identify clonal sub-populations in copy number space (Figure 3.1b).

Figure 3.1 DLP workflow
Concept schematic. (a) Experimental workflow. Heterogeneous samples are dissociated and single cells (or nuclei) are isolated from a cell suspension in individual reaction chambers; cells are lysed and unamplified single-cell DNA is fragmented and tagged using the Tn5 transposase chemistry (Nextera, Illumina); a minimal number of PCR cycles adds unique single-cell indices and sequencing adaptors to the tagmented DNA; finally, indexed libraries from all cells are pooled for multiplexed sequencing. (b) Analytical workflow.
Sequencing reads derived from individual cells are demultiplexed, aligned to the human reference genome, and binned; following GC-content correction, a copy number profile is inferred for each low-coverage single-cell genome; single-cell copy number profiles are grouped, and the sequencing reads of cells with similar profiles are merged to produce higher-depth clonal genomes; sequencing reads from all cells may also be merged to produce a high-depth bulk-equivalent genome; additional variants such as SNVs, LOH and breakpoints are inferred from high-coverage merged clonal genomes or the merged bulk-equivalent; finally, phylogenetic inference of the clonal lineage may be derived based on one or more classes of genomic variants.

As mentioned briefly earlier, the use of PCR following tagmentation has very different consequences than whole-genome amplification (see Figure 3.2 for comparison schematic). DOP-PCR, MDA, and MALBAC all carry out amplification of the DNA template prior to fragmentation (39,50,55). These methods generate many copies of each template strand in the form of long molecules that are only fragmented into sequencing inserts after amplification is complete. Because of this, any given region in the original template is represented by multiple inserts with non-overlapping start and end coordinates, which cannot be filtered computationally as duplicates, even though they are copies of the exact same template molecule. In contrast, direct library preparation involves fragmentation of the original DNA template as the very first step. While the DLP protocol does require several PCR cycles (11 cycles were used in the current experiments) for index barcode and sequencing adaptor incorporation, all duplicate molecules generated through this process have the same start and end coordinates.
All such duplicates can be identified and removed computationally after alignment, resulting in single-cell genomes in which any correctly-mapped read is a unique representation of the template from which it originated. Thus, the direct library preparation approach overcomes some of the principal challenges of both bulk tumour sequencing and WGA-based single-cell sequencing methods.

Figure 3.2 Comparison of amplification methods
Comparison of Whole Genome Amplification (WGA) based methods and Direct Library Preparation (DLP), illustrating the differing consequences of amplification prior to (a) or following (b) fragmentation. (a) WGA produces long copies of the template DNA that are subsequently fragmented into short sequencing inserts. Multiple inserts with non-overlapping start and end coordinates may, therefore, originate from the same region in the template. However, these inserts cannot be identified as duplicates and filtered computationally, and thus distort read depth. (b) In contrast, the very first step in the DLP method is direct fragmentation of the single-cell DNA template into short sequencing inserts. PCR is required in order to add index barcodes and sequencing adaptors and to boost the library quantity to match sequencer input requirements. However, this amplification step produces exact duplicates, which are identified and removed computationally. All correctly-mapped inserts in the resulting single-cell genome are therefore unique representations of a fragment in the original template.

3.3 Methods

3.3.1 Device fabrication

Devices were fabricated as described in Chapter 2.4, with the difference that indexing primers were included during chip fabrication. Fiducial markers located on the imprinted flow layer were used to accurately interface the imprinted PDMS membrane with a contact-less micro-dispenser (sci-FLEXARRAYER S3, Scienion AG).
During the automated spotting routine, 700 pL of each sequencing primer was deposited into the microfluidic chambers using a PDC-70 Type 1 nozzle. To avoid primer cross-contamination, a rigorous wash routine was implemented between subsequent dispensing steps. Both the wash and run routines are described in Chapter 2.3. Droplet volumes before and after each deposition step were recorded for quality control.

3.3.2 Sample preparation

3.3.2.1 Cell culture

Immortalized human mammary epithelial cells from cell line 184-hTERT-L2 (passage 17) were cultured at 37 °C, 5% CO2 in MEBM Mammary Epithelial Cell Growth Medium (Lonza) with 5 µg/mL transferrin (Sigma) and 2.5 µg/mL isoproterenol (Sigma), supplemented with Lonza MEGM™ Mammary Epithelial Cell Growth Medium SingleQuots excluding gentamicin. Cells were grown to near confluence, trypsinized, spun down, resuspended in cryopreservation media (50% media, 40% FBS, 10% DMSO) and frozen at –1 °C/minute to –80 °C in a Mr. Frosty Freezing Container (Thermo Scientific). During the experiment, cells were rapidly thawed in a 37 °C water bath. Immortalized human lymphoblastoid cells from cell line GM18507 (passage 13) were cultured at 37 °C, 5% CO2 in RPMI-1640 medium with 2.05 mM L-glutamine (HyClone) supplemented with 10% FBS (Gibco/Invitrogen) at 3 × 10^5 cells/mL. During the experiment, fresh cells taken from suspension culture were spun down at 128 ×g for 10 minutes to enrich for live cell clusters, and re-suspended in fresh medium.

3.3.2.2 Xenograft passaging and tissue dissociation

Xenograft samples were transplanted and passaged as described in Eirew et al. (2015) (24). The anonymized human tumour tissue for xenografting was collected with informed patient consent according to procedures approved by the Ethics Committee at the University of British Columbia, under protocols H06-00289 BCCA-TTR-BREAST and H11-01887 Neoadjuvant Xenograft Study.
Following harvesting, the tissue was finely minced with scalpels, then mechanically disaggregated for 1 minute using a Stomacher 80 Biomaster (Seward Limited, Worthing) in 1-2 mL cold DMEM-F12 medium (STEMCELL Technologies). Aliquots from the resulting suspension of cells and organoids were cryopreserved in viable freezing medium (47:47:6 DMEM-F12:FBS:DMSO) and stored at –196 °C until further processing. Thawed tissue suspensions were enzymatically dissociated to single cells by sequential incubation in warm (37 °C) 300 U/mL collagenase (STEMCELL Technologies) plus 100 U/mL hyaluronidase (STEMCELL Technologies) for 2.5 hours, 0.25% trypsin/EDTA (Corning) while triturating with a pipette for 4 minutes, then 5 U/mL dispase (STEMCELL Technologies) plus 0.1 mg/mL DNase I (STEMCELL Technologies) while triturating with a pipette for 4 minutes, before passing through a 50 µm filter.

3.3.3 Library preparation protocol

3.3.3.1 Preparation of cells and device for loading

Thawed 184-hTERT-L2, fresh GM18507 cells, or dissociated thawed xenograft cells were centrifuged for 10 min at 200 ×g and re-suspended in fresh PBS (Life Technologies) to a final concentration of 1 × 10^6 cells/mL. Cells were filtered using a 40 µm filter to remove cell debris and clumped cells, and 0.15 µL SYTO 9 Green Fluorescent Nucleic Acid Stain (Life Technologies) was added to 49.85 µL filtered cells. Next, 10 µL of stained cells were mixed with 8 µL PBS and 2 µL loading buffer (81.25 µL Percoll, Sigma-Aldrich; 15 µL SuperBlock PBS Blocking Buffer, Fisher Scientific; 3 µL UltraPure 0.5M EDTA, Life Technologies; 0.75 µL 10% Tween 20, Sigma-Aldrich). The ratio of PBS and loading buffer was optimized for neutral cell buoyancy and can be adjusted for different cell types.

Prior to cell loading, cell sorting channels and inlet ports were primed with a Pluronic solution (10% Pluronic F-127, Sigma-Aldrich; 0.3% SYTO 9 stain in UltraPure DNase/RNase-Free Distilled water, ThermoFisher) (144).
Priming helped to prevent cells from adhering to the PDMS. The prepared single-cell suspension was connected to the sample inlets using microcapillary pipette tips and 5-6 psi of pressure was applied to inject the cell suspension into the device and to push out any trapped air against an inlet valve (Figure C.1a). The pressure was then reduced to 1.5-2.5 psi and cell loading and separation valves were opened to allow for flow through the cell-sorting channels. The trapped cells were washed with PBS to remove untrapped cells, cell debris, and extracellular DNA. After washing, the trapped cells were isolated into single chambers by closing the cell separation valves (Figure C.1b). Cell occupancy was determined by microscopy and recorded for later analysis. The investigator was blinded to subsequent sequencing results when recording cell occupancy.

Using the on-chip peristaltic pump, 1.2 nL lysis buffer was used to push the trapped and washed cells into the inflatable reaction chamber. The cell lysis buffer was prepared with 25 µL lysis buffer G2 (Qiagen) and 2.5 µL Qiagen Protease (Protease was re-suspended in 7 mL UltraPure water). The cells were lysed at 50 °C for 1 hour on a flatbed thermocycler (Bio-Rad PTC-200), after which the protease was inactivated at 70 °C for 15 minutes and the device was cooled to 10 °C. The displacement chamber was pressurized at 7-8 psi during the heating step.

3.3.3.2 On-chip direct library preparation

Single-cell whole-genome sequencing libraries were prepared using a modified Nextera protocol (Nextera DNA Library Preparation Kit, Illumina). 10.8 nL of Tagmentation mix (6 nL TD Buffer, 1.6 nL TDE1, 3.2 nL Buffer 1 [1.22 mM magnesium chloride solution (Sigma-Aldrich) and 0.3% Tween 20 in water]) were added through the reagent inlet to each chamber and incubated on a flatbed thermocycler at 55 °C for 10 minutes (Figure C.1c).
Supply channels were flushed with 20 µL UltraPure water and dried with filtered compressed air before adding the next reagent. The tagmentation reaction was neutralized by adding 1 nL Qiagen Protease and 1 nL PCR water, and incubated at 50 °C for 15 min (Figure C.1d). The Protease was then inactivated at 70 °C for 15 min. Following neutralization, 21 nL PCR master mix (10.5 nL NPM, 3.5 nL PPC, 0.35 nL 10% Tween 20, 6.65 nL PCR water) was added to each chamber. During PCR master mix addition, pre-spotted index primers (700 pL of 20 µM solution per primer) were re-suspended and added to the PCR reaction (Figure C.1e). PCR was performed using the following conditions: 72 °C for 3 min; 95 °C for 30 sec; 11 cycles of 95 °C for 10 sec/55 °C for 30 sec/72 °C for 30 sec; 72 °C for 5 min; and finally 10 °C. The final libraries were pooled and recovered from the microfluidic device by flushing 12 µL EBT through the inflatable reaction chamber array (Figure C.1f). Finally, size selection was performed using a 1.8× ratio of AMPure XP beads (Beckman Coulter) to sample.

3.3.3.3 Bulk library preparation

Flash-frozen xenograft tissue was thawed and immediately homogenized in lysis buffer using a rotor-stator homogenizer (Polytron PT1000). DNA was prepared from the lysate using the AllPrep DNA/RNA Mini kit (Qiagen). Cells from the 184-hTERT-L2 line were thawed and DNA was extracted with the QIAamp DNA mini kit (Qiagen) following the protocol for cultured cells. DNA was quantified using the Qubit dsDNA HS Assay Kit (ThermoFisher Scientific) and bulk libraries were constructed following the Nextera DNA Sample Preparation Guide (Illumina) with the following alteration: after tagmentation, the DNA was purified from the transposome using the NucleoSpin PCR clean-up kit (Clontech).

3.3.3.4 Whole-genome sequencing

Library insert size and quantity were determined using a Bioanalyzer High Sensitivity DNA kit (Agilent) and the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific), respectively.
The 184-hTERT-L2 and xenograft libraries were sequenced on an Illumina HiSeq 2500 instrument using paired-end 125 bp reads at the Michael Smith Genome Sciences Centre (GSC, Vancouver, Canada). The GM18507 libraries were sequenced on an Illumina NextSeq instrument using paired-end 125 bp reads at the UBC Biomedical Research Centre (BRC) in Vancouver, BC.

3.3.4 Data analysis

Full details about sequencing data processing are provided in Appendix E.

3.3.4.1 Data alignment and sequencing metrics

Demultiplexed FASTQ files for the 184-hTERT-L2 and xenograft libraries were obtained from the GSC. For the GM18507 libraries, BCL files were demultiplexed and converted to FASTQ format using Illumina’s bcl2fastq2 program (http://support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversion-software.html). Demultiplexed paired-end FASTQ files were trimmed to remove Nextera adaptor contamination and low-quality bases on the 3' ends of reads using Trim Galore! (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/), and aligned with BWA (145) to the human reference genome GRCh37-lite, or mouse reference genome GRCm38. The resulting BAM files were sorted with Picard (http://broadinstitute.github.io/picard/) and indexed with Samtools (146). Indels were realigned with GATK (147) and duplicates removed with Picard. Downsampling and merging of BAM files was also carried out with Picard, as was collection of sequencing metrics.

3.3.4.2 Single-cell copy number inference

Inference of single-cell copy number profiles was carried out with HMMcopy (148), with several modifications to the standard usage of this tool. Mappability and GC-content files were generated for the reference genome using the generateMap, mapCounter, and gcCounter tools packaged with HMMcopy.
Due to the inclusion of downsampled datasets, single-cell BAM files were binned into 200 kb bins for all datasets analyzed in Figure 3.6c; analysis of all other datasets was carried out with 150 kb bins. Instead of running the model on the logged, GC- and mappability-corrected values, the hidden Markov model (HMM) was run on the non-logged GC-corrected values, following filtering of bins with low mappability. The model was run with seven hidden states, rather than the default six states, and the default model parameters were modified as follows:

    params$m <- c(0, 0.5, 1, 1.5, 2, 2.5, 3)
    params$mu <- c(0, 0.5, 1, 1.5, 2, 2.5, 3)
    params$kappa <- c(25, 50, 800, 50, 25, 25, 25)
    params$e <- 0.995
    params$S <- 35

Following segmentation, binned read counts were converted to integer copy number scale by dividing all bin counts by half the median value for bins assigned to the neutral (2-copy) state. Segment medians were recomputed, and the median of each segment rounded to the nearest integer to derive an integer copy number profile for the sample. Thus, despite the last HMM state encompassing all CNVs with six or more copies, high-level amplifications were assigned to their correct integer values.

3.3.4.3 Clonal and bulk-equivalent genome analysis

Clonal genomes were generated by merging the BAM files of all single cells belonging to a given clonal group. Bulk-equivalent genomes were generated by merging all indexed libraries for a given sample (including those flagged as a single cell, multiple cells, or contaminated/ambiguous using fluorescence imaging), excluding those libraries flagged as containing mouse contamination. Copy number inference for the clonal genomes was carried out using HMMcopy, with the same parameters applied to the single-cell samples, as described above. For bulk-equivalent and bulk genomes, inference of LOH, SNVs, and breakpoints was carried out with Titan (31), mutationSeq (149), and destruct (150), respectively.
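
The conversion to integer copy number scale described in Section 3.3.4.2 can be illustrated with a minimal Python sketch. This is not the code used in this work (the analysis was carried out with HMMcopy in R); the function name `to_integer_copy_number`, the encoding of HMM states as copy number labels 0 to 6, the half-open segment ranges, and the toy values are all assumptions made purely for illustration.

```python
import math
from statistics import median


def to_integer_copy_number(bin_values, bin_states, segments, neutral_state=2):
    """Rescale corrected bin values and round segment medians to integers.

    bin_values    : GC-corrected, non-logged read-count values, one per bin
    bin_states    : HMM state per bin, encoded here (an assumption) as the
                    copy number label 0..6, where 6 means "six or more"
    segments      : list of (start, end) half-open bin index ranges
    neutral_state : label of the neutral (2-copy) state
    """
    neutral = [v for v, s in zip(bin_values, bin_states) if s == neutral_state]
    if not neutral:
        raise ValueError("no bins were assigned to the neutral state")

    # Divide every bin by half the neutral-state median, so that neutral
    # bins are centred on a value of 2.
    scale = median(neutral) / 2.0
    scaled = [v / scale for v in bin_values]

    # Recompute each segment median on the new scale and round it to the
    # nearest integer (values are non-negative, so floor(x + 0.5) suffices).
    profile = []
    for start, end in segments:
        seg_median = median(scaled[start:end])
        profile.append((start, end, math.floor(seg_median + 0.5)))
    return scaled, profile


if __name__ == "__main__":
    # Toy data: four neutral bins, three highly amplified bins, two loss bins.
    values = [1.0, 0.98, 1.02, 1.0, 4.1, 4.0, 3.9, 0.5, 0.52]
    states = [2, 2, 2, 2, 6, 6, 6, 1, 1]
    segs = [(0, 4), (4, 7), (7, 9)]
    _, result = to_integer_copy_number(values, states, segs)
    for start, end, copies in result:
        print(f"bins {start}-{end - 1}: {copies} copies")
```

In this toy example the amplified segment carries the highest state label, yet its rescaled median determines the integer value it receives, which mirrors how amplifications beyond the last HMM state can still be assigned an integer copy number.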
3.4 Results

3.4.1 Microfluidic device for direct single-cell library construction

We hypothesized that nanolitre volume reactions would allow us to adapt conventional transposase-based library construction, typically performed on bulk DNA (127), to genomic material obtained from single cells. To implement this strategy, we designed and fabricated a novel microfluidic device that integrates the entire single-cell library preparation workflow: cell isolation, imaging, lysis, DNA tagmentation, barcoding, and sequencing adaptor incorporation (for a detailed description of the device see Chapter 2.4.2.2 and Figure 2.10). A unique set of index barcodes was pre-spotted into the indexing chambers during fabrication and integrated into the chip (Figure 3.3a,b). During an experiment, a cell suspension was injected into each of four cell-loading inlets; single cells were sequentially caught in cell traps and washed with PBS to remove contamination and untrapped cells. Integrated valves above and below each cell trap were closed to isolate trapped cells into individual cell processing units, and high-magnification fluorescence imaging was used to identify single cells and flag chambers with multiple cells and contaminating debris (Figure 3.3c,d,e). Only libraries identified as true single cells were included in subsequent single-cell copy number analyses. Four no-template control (NTC) chambers without cell traps are included to assess contamination for each chip run.

Figure 3.3 Indexing primers and trapped cells in the microfluidic device
(a) Unique index primers are pre-spotted during device fabrication using contactless spotting technology. (b) Brightfield image of a pre-spotted primer incorporated in the microfluidic device. (c,d,e) Fluorescence imaging of cell traps permits the identification and labelling of chambers containing single cells, cells with contaminating debris, and two or more cells, prior to sequencing.

Next, each trapped cell was transferred into its inflatable reaction chamber through actuation of a peristaltic pump. Cells were lysed, and single-cell libraries prepared by employing a “one-pot” tagmentation library preparation protocol (Nextera, Illumina) directly on unamplified genomic DNA released from the single cells (127). Index barcodes were re-suspended and pushed into the reaction chamber, and 11 PCR cycles were applied on-chip to incorporate both the barcodes and Illumina sequencing adaptors onto the tagmented DNA. Following this, valves separating the reaction chambers were opened, permitting the pooled recovery of indexed single-cell libraries for multiplexed sequencing, while maintaining the identity of sequencing reads. Analysis of reads obtained from NTC reactions indicates that control chambers had very low levels of contamination (Figure C.2). We subsequently carried out highly multiplexed low-coverage sequencing of single-cell libraries to approximately 0.07X coverage depth per cell (192 indexed samples per Illumina HiSeq lane, given approximately 15X per lane; see Table 3.1 for library information). This approach permits the economical profiling of thousands of single cells.

3.4.2 Uniformity of coverage in diploid single-cell and tumour-cell sequencing metrics

To evaluate the uniformity of coverage obtained by direct single-cell library preparation, we first sequenced 192 indexed libraries from 184-hTERT-L2, an immortalized normal breast epithelial cell line (152 single cells identified with fluorescence microscopy, 8 no-template controls). This cell line is primarily diploid, but is known to slowly gain chromosomes and acquire copy number alterations with serial passaging (151).
To provide an additional reference dataset from a known diploid sample, we also sequenced 192 indexed libraries from GM18507, an immortalized normal lymphoblastoid cell line, previously characterized in the HapMap project (153) (123 single cells identified with fluorescence microscopy, 44 no-template controls). The GM18507 libraries were sequenced on one Illumina NextSeq flowcell, while the 184-hTERT-L2 and subsequent xenograft datasets were sequenced on an Illumina HiSeq instrument (184-hTERT-L2 mean 3.02 ± 0.65 million reads, 0.07 ± 0.01X coverage depth per cell; GM18507 mean 7.13 ± 2.07 million reads, 0.12 ± 0.03X coverage depth per cell; see Table 3.2).

Figure 3.4 Single-cell copy number profiles from 184-hTERT-L2, an immortalized normal breast epithelial cell line
(a) Heatmap of integer copy number profiles (148) and Bayesian phylogeny (152) for 184-hTERT-L2 cells filtered with the same MAD cutoff as the xenograft datasets (n = 114). Rows correspond to cells, columns to genomic bins (150 kb). (b) Example copy number profiles, from top to bottom: a diploid cell; a cell with a unique one-copy gain of chromosome 8; a cell with a unique high-level amplification in chromosome 10; an example cell from a sub-population with a deletion in chromosome 10; an example cell from a sub-population with a two-copy gain of the long-arm of chromosome 20; an example cell from a sub-population with a one-copy gain of chromosome 20; a cell potentially in the early stages of DNA replication; a cell potentially in the late stages of DNA replication; a cell with a non-integer profile potentially undergoing apoptosis.

We used a hidden Markov model (HMM) (148) to infer integer copy number profiles, and found that while the majority of cells were diploid, both cell lines featured several sub-populations of cells with shared integer copy number alterations, as well as a minority of cells with unique integer alterations (see copy number heatmaps and sample cells for 184-hTERT-L2 in Figure 3.4 and GM18507 in Figure 3.5). In addition, both datasets included sub-populations of cells with a multitude of non-integer alterations across the genome, which have been reported in previous studies and may be related to biological processes, such as DNA replication and apoptosis (44).

Figure 3.5 Single-cell copy number profiles from GM18507, an immortalized normal lymphoblastoid cell line
(a) Heatmap of integer copy number profiles (148) and Bayesian phylogeny (152) for GM18507 cells filtered with the same MAD cutoff as the xenograft datasets (n = 91). Rows correspond to cells, columns to genomic bins (150 kb). (b) Example copy number profiles, from top to bottom: a diploid cell; a unique cell with both a one-copy loss in chromosome 1 and a two-copy gain in chromosome 16; an example cell from a sub-population with a one-copy gain in chromosome 11; an example cell from a sub-population with a two-copy gain in chromosome 16; an example cell from a sub-population with a one-copy loss in chromosome 16; a cell from the same sub-population with a one-copy loss on chromosome 16, that is potentially in the early stages of DNA replication; a cell potentially in the late stages of DNA replication; a cell with a non-integer profile potentially undergoing apoptosis.

To examine how coverage breadth (defined as the fraction of the genome covered by at least one sequencing read) increases with the number of single-cell genomes pooled, we carried out bootstrap sampling (n = 30 draws per condition) and pooling of all libraries flagged as single cells from our immortalized normal cell line samples, excluding only those with fewer than 0.25 million reads (Figure 3.6a; GM18507 n = 122, dark orange; 184-hTERT-L2 n = 146, dark blue). This exclusion criterion was selected because the DOP-PCR dataset we used for comparison was filtered in this manner (54). We compared our results to cells from an immortalized normal lymphoblastoid line sequenced using the C-DOP-L protocol (54), a variant of DOP-PCR, which was the only published dataset we identified that featured a comparable number of immortalized normal cells sequenced in multiplex at 96 cells per Illumina HiSeq lane (54) (Figure 3.6a; 315A n = 95, dark green). Despite pooling half as many samples per HiSeq lane (96 vs. 192), cells from the C-DOP-L dataset had a mean depth of 0.015X (approximately 4.5-fold lower than the DLP 184-hTERT-L2 samples). In order to provide a fair comparison of coverage uniformity, we trimmed our reads by 44.8% (from 125 bp to 69 bp), discarded the second read in each read pair, aligned our data as single-end reads, and downsampled each single-cell 184-hTERT-L2 genome by 48% and each GM18507 genome by 66%, to achieve the same mean depth per cell as the C-DOP-L 315A dataset (Figure 3.6a; GM18507, light orange; 184-hTERT-L2, light blue). Analysis indicates that pooling the genomes of 64 184-hTERT-L2 cells prepared without pre-amplification results in a median 94.5% genome coverage; with slightly higher depth per cell, pooling the genomes of 64 GM18507 cells resulted in a median 96.8% genome coverage (Figure 3.6a).
In the trimmed and downsampled datasets, 64 DLP 184-hTERT-L2 and GM18507 cells achieved a median of 57.7% and 58.8% coverage breadth, respectively (Figure 3.6a). In contrast, 64 cells from the C-DOP-L dataset were found to have a median of 44.5% breadth (Figure 3.6a). This demonstrates that direct library preparation without pre-amplification not only yields more depth per cell, but also that, sequencing depth being equal, reads are distributed more uniformly across the genome.

Figure 3.6 Coverage uniformity of immortalized normal human cell lines and tumour cell sequencing metrics
(a) Bootstrap sampling (n = 30) and pooling of single-cell genomes vs. coverage breadth (fraction of the genome covered by at least one read). Dark orange, DLP GM18507 immortalized male lymphoblastoid cells sequenced in multiplex at 192 libraries per NextSeq flow cell (mean 0.12X depth, n = 122); light orange, GM18507 cells downsampled to the same mean depth as the 315A dataset. Dark blue, DLP 184-hTERT-L2 immortalized female breast epithelial cells sequenced in multiplex at 192 libraries per HiSeq lane (mean 0.07X depth, n = 146); light blue, 184-hTERT-L2 cells downsampled to the same mean depth as the 315A dataset. Green, 315A immortalized male lymphoblastoid cells sequenced with the C-DOP-L protocol at 96 cells per HiSeq lane (mean 0.015X depth, n = 95) (54). (b) Lorenz curves, showing uniformity of coverage for pooled single-cell genomes. Each blue curve corresponds to the median pooled 184-hTERT-L2 samples from panel (a). Black, a bulk genome for the same 184-hTERT-L2 sample prepared using the standard Nextera protocol; orange, a pooled genome corresponding to 48 DLP 184-hTERT-L2 cells. Dotted black line represents perfectly uniform coverage. (c) Comparison of sequencing metrics for single breast cancer tumour cells sequenced with DLP (SA501X3F and SA501X4F, blue), and DOP-PCR based methods WGA4 and C-DOP-L (54) (Pt41, green). Xenograft tumour cells were downsampled (grey), to achieve the same median number of mapped non-duplicate reads as tumour cells from Pt41 (54). Top left, number of mapped non-duplicate reads per single cell; top right, total number of reads per cell; middle left, fraction of mapped reads; middle right, fraction of duplicate reads; bottom left, the median absolute deviation (MAD) over chromosome 19 (the only chromosome diploid in both the SA501 and Pt41 tumours); bottom right, coverage breadth per single cell, dotted line indicates that coverage breadth should not be compared across tumours from different patients. (d) Copy number profile for the 184-hTERT-L2 cell with median coverage breadth from the bootstrap analysis (top), and for a sample GM18507 cell with a unique integer gain in chromosome 1 (bottom). Profiles inferred using a hidden Markov model (148) with 150 kb bins. Colours correspond to integer copy number state assignments. Black lines indicate segment medians.

To determine how coverage uniformity for our pooled samples compares to a bulk genome, we next plotted one Lorenz curve for each condition in Figure 3.6a, using the pooled 184-hTERT-L2 genome with median coverage breadth for that condition (Figure 3.6b, blue curves). A bulk genome for the same 184-hTERT-L2 sample prepared using the standard Nextera protocol was sequenced at 3.44X (Figure 3.6b, black curve), and we show that a pooled genome corresponding to 48 184-hTERT-L2 single cells with the same coverage depth (3.44X) achieves equivalent coverage breadth and uniformity (Figure 3.6b, dashed orange curve). Lorenz curves for the DLP GM18507 and C-DOP-L 315A datasets are found in Figure C.3.
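
The pooling and uniformity analyses above can be sketched in a few lines of Python. The sketch below is illustrative only and is not the pipeline used in this work: the function names, the representation of each cell as a set of covered genomic positions, and the choice to draw each pool without replacement are assumptions. The Lorenz curve follows the standard construction, in which positions are sorted by depth and the cumulative fraction of reads is plotted against the cumulative fraction of the genome; perfectly uniform coverage falls on the diagonal.

```python
import random
from statistics import median


def pooled_breadth(cells, genome_size):
    """Fraction of positions covered by at least one read in the pool.

    cells : list of sets, each holding the positions covered in one cell
            (a toy stand-in for a low-coverage single-cell genome)
    """
    covered = set().union(*cells)
    return len(covered) / genome_size


def bootstrap_breadth(cells, pool_size, genome_size, draws=30, seed=0):
    """Median pooled breadth over repeated random pools of `pool_size` cells.

    Each pool is drawn without replacement (an assumption; the source
    text does not specify the sampling scheme).
    """
    rng = random.Random(seed)
    breadths = []
    for _ in range(draws):
        pool = rng.sample(cells, pool_size)
        breadths.append(pooled_breadth(pool, genome_size))
    return median(breadths)


def lorenz_curve(depths):
    """Return Lorenz curve points for a list of per-position depths.

    Each point is (cumulative fraction of positions, cumulative fraction
    of reads), with positions sorted from lowest to highest depth.
    """
    ordered = sorted(depths)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no reads: the Lorenz curve is undefined")
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, depth in enumerate(ordered, start=1):
        running += depth
        points.append((i / n, running / total))
    return points


if __name__ == "__main__":
    toy_cells = [{0, 1}, {1, 2}, {5}, {6, 7}]
    print(bootstrap_breadth(toy_cells, pool_size=2, genome_size=10))
    print(lorenz_curve([0, 0, 2, 2]))
```

Comparing how far each curve returned by `lorenz_curve` departs from the diagonal gives a depth-independent view of uniformity, which is why pooled and bulk genomes are compared at matched sequencing depth in the text.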
As an application to the analysis of copy number profiles and clonal structure in tumours, we next carried out multiplex sequencing of libraries from two passages of a patient-derived primary triple-negative breast cancer xenograft (patient-identifier SA501, four microfluidic chips or 768 indexed libraries). Of these libraries, 384 were from a third-passage xenograft tumour (SA501X3F; 296 single cells identified with fluorescence microscopy, 32 no-template controls), and 384 from a fourth-passage xenograft tumour derived from the third passage (SA501X4F; 299 single cells identified with fluorescence microscopy, 22 no-template controls). When sequenced at the equivalent of 192 indexed samples per 15X sequencing lane, single cells from these xenograft tumours attained similar depth as the 184-hTERT-L2 cells (SA501X3F mean 3.83 ± 0.67 million reads, 0.07 ± 0.01X coverage depth per cell; SA501X4F mean 4.65 ± 1.12 million reads, 0.087 ± 0.02X coverage depth per cell; see Table 3.3). Due to differences in the proportion of amplified and deleted regions between our xenograft tumour cells and those in previously published datasets, we did not carry out a direct comparison of coverage uniformity for these samples. Instead, we compared basic sequencing metrics between our samples and those from a published breast cancer dataset (54) with similar ploidy, namely cells from an ER-positive breast cancer patient (Pt41), which were sequenced using two different DOP-PCR whole-genome amplification protocols (WGA4 (39,54), n = 74; and C-DOP-L (54), n = 64). We excluded 3 xenograft cells with fewer than 0.25 million reads, as this cutoff was applied to exclude cells from the Pt41 dataset (54).
Once again, to provide a fair comparison, we downsampled our X3F and X4F xenograft libraries by 73% and 78%, respectively, to achieve the same median number of mapped non-duplicate reads per cell as the Pt41 libraries (Figure 3.6c, top left panel; KW test showed no significant difference in the number of mapped non-duplicate reads, p = 0.11; see Table 3.4). As the middle panels of Figure 3.6c show, WGA4 libraries suffered from low mappability due to WGA adaptor contamination, while C-DOP-L libraries had high duplicate rates. As a result, nearly twice the number of total reads was needed to achieve the same number of mapped non-duplicate reads (Figure 3.6c, top right panel). Given the same number of mapped non-duplicate reads, we next compared the variation in binned read counts (200 kb bins) between datasets, by computing the median absolute deviation (MAD) over chromosome 19, the only chromosome which was diploid in both the SA501 xenograft passages and the Pt41 dataset. As the bottom left panel of Figure 3.6c shows, our heavily downsampled xenograft cells had lower median variation in binned read counts than the DOP-PCR cells, given the same number of mapped non-duplicate reads (KW test, p < 2.2e-16; a post-hoc Dunn’s test with Benjamini-Hochberg correction showed all comparisons between the downsampled xenograft datasets and Pt41 datasets were significant, but there was no significant difference between the X3F and X4F xenograft passages; see Table 3.4). While it is difficult to compare coverage breadth for different tumour samples in a fair manner, it should be noted that the C-DOP-L protocol achieved higher median breadth than the WGA4 protocol when applied to the same patient tumour (Figure 3.6c, bottom right panel; KW test, p = 1.89e-5; see Table 3.4), suggesting that WGA4 would not be expected to out-perform direct library preparation without pre-amplification in a bootstrap pooling analysis, were such a dataset available for comparison.
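For readers less familiar with the Kruskal-Wallis (KW) test used in these comparisons, the sketch below computes the H statistic for several groups of per-cell metrics using only the standard library. It is an illustration of the textbook formula, not the analysis code used for Table 3.4: it omits the tie-correction factor and the p-value (H is compared against a chi-squared distribution with k − 1 degrees of freedom for k groups), and the function names are assumptions.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [value for group in groups for value in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
```

In practice a statistics package would normally be used for the test and for the post-hoc Dunn’s comparisons; the sketch is only meant to show what the statistic measures, namely how far the rank sums of the groups depart from what would be expected if all groups shared one distribution.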
Without downsampling, the SA501 xenograft cells had a median duplicate rate of approximately 20%, substantially lower than the C-DOP-L samples, suggesting sufficient diversity for additional sequencing. Finally, Figure 3.6d shows a sample copy number profile from each of the immortalized normal human cell lines: the cell with median coverage breadth from 184-hTERT-L2 (top), and an example of a cell from GM18507 with a unique integer copy number gain (bottom).

3.4.3 Copy number heterogeneity and clonal evolution in serial breast cancer xenograft passages

We next sought to examine copy number heterogeneity in our low-coverage single-cell breast cancer xenograft samples, and identify sub-populations with shared copy number profiles. For each low-coverage single-cell genome, binned read counts (150 kb bins) were extracted, GC-content correction applied, and low-mappability regions removed from the analysis (see Appendix E for details). Profiles were segmented and hidden copy number states derived using a hidden Markov model with seven states and Student’s-t emissions (148). Following hidden state inference, profiles were converted to integer copy number scale by dividing all bin counts by half the median value of bins assigned to the neutral (2-copy) state. The median absolute deviation (MAD) of all bins assigned to the neutral state was computed, and all samples identified as single cells by fluorescence imaging with MAD < 0.15 were retained for downstream analysis. Figure 3.7a displays a heatmap of integer copy number states for 260 single cells from xenograft sample SA501X3F.
Clonal lineage reconstruction using a Bayesian phylogenetic model (152) revealed three distinct sub-populations: a major clonal group with one copy of chromosome X (Clone A, n = 214), a minor population with two copies of chromosome X and numerous smaller alterations relative to the dominant population (Clone B, n = 28; see chromosomes 1, 2, 3, 5, 6, 8, 14, 15, 18, and 20), and a third sub-population which shares the profile of Clone B but features additional alterations in chromosome 11 (Clone C, n = 18). Representative single-cell copy number profiles from each of the three clones are presented in Figure 3.7b.

Figure 3.7 Single-cell copy number profiles from SA501X3F, a third-passage xenograft derived from a primary triple-negative breast cancer tumour
(a) Heatmap of integer copy number profiles for SA501X3F cells (n = 260), inferred using a hidden Markov model (148). Rows correspond to cells, columns to genomic bins (150 kb). Colours correspond to integer copy number states, while the left-hand boxes indicate clone assignment, based on a Bayesian phylogeny (152).
(b) Representative single-cell copy number profiles from Clones A, B, and C. Colours correspond to the integer copy number state assignment for a given genomic bin. Black lines indicate segment medians.

It should be noted that while cells within a clone share most major copy number alterations, numerous cells with unique amplifications and deletions are evident, especially within the dominant clonal group. The lack of contamination in our no-template control samples and the clear placement of segment medians along integer copy number values suggest that many of these events represent genuine copy number diversity in this highly rearranged triple-negative tumour (see Figure C.4 for sample copy number profiles with unique alterations).
A copy number heatmap and phylogeny for 254 single cells from the subsequent xenograft passage SA501X4F reveal that by the fourth passage, minor Clones B and C are no longer detectable, and the descendants of Clone A dominate the population (Figure 3.8). Furthermore, the population has diversified, and features numerous small sub-populations with distinct integer copy number alterations that are shared by at least three cells (see example populations highlighted in Figure 3.8, and sample cells from each of these populations in Figure C.5). One of these alterations, a set of minute amplifications and deletions in the long arm of chromosome 12 (a possible chromothripsis event), is in fact evident in the previous passage (n = 2 in SA501X3F, n = 8 in SA501X4F; see population (v) in Figure 3.8 and Figure C.5). The fourth passage also contains several sub-populations with profiles not evident in the third passage, including a group of cells that have lost the ancestral high-level amplification in chromosome 16 (n = 20; see population (ix) in Figure 3.8 and Figure C.5).

Figure 3.8 Heatmap from fourth passage xenograft SA501X4F
Integer copy number heatmap and phylogeny for single cells from fourth passage xenograft SA501X4F (n = 254) reveal that the population is dominated by the descendants of cells from SA501X3F Clone A, but also features a multitude of small sub-populations with distinct shared copy number alterations. Rows correspond to single-cell profiles, columns to genomic bins (150 kb). Bin colours correspond to integer copy number states. Grey boxes to the left of the heatmap provide a non-exhaustive list of distinct sub-populations with three or more cells that share consistent differences from the majority profile. An example profile from each of these sub-populations is found in Figure C.5.
Finally, we aligned all xenograft samples to the mouse reference genome, and found a single mouse cell among the 768 indexed libraries we sequenced (SA501X4F Cell 106, 0.07X depth relative to mouse genome, 0.0007X depth relative to human genome; Figure C.6). Five additional samples had minute quantities of mouse contamination (defined as depth ≥ 0.001X relative to the mouse genome), but all had previously been flagged as containing contaminating debris using fluorescence imaging prior to library construction. The remaining sequenced samples had depths on the order of 0.0001X relative to the mouse genome (Figure C.6).

3.4.4 Pooling of single-cell genomes yields high-depth, low-bias clonal and bulk-equivalent genomes

As demonstrated in Figure 3.7, multiplexed sequencing of unamplified single cells yields highly uniform low-coverage genomes suitable for integer copy number inference. These may be used to provide insight into the copy number heterogeneity and clonal architecture of tumours. However, while other single-cell sequencing approaches may permit such insights, one of the principal strengths of our method lies in the ability to subsequently pool information from multiple cells to yield high-depth clonal or bulk-equivalent genomes with coverage breadth and uniformity equal to that of a bulk genome prepared using standard protocols.

We first generated a pooled genome for each clonal sub-population identified in Figure 3.7, as well as a genome for all three populations combined, and inferred copy number profiles using the same model parameters applied to our single-cell samples (Figure 3.9a). Despite numerous differences in copy number, little evidence of minor Clones B and C is visible (as shifts in median segment values) in the combined copy number profile (Figure 3.9a, bottom panel), underscoring the challenge of identifying low-prevalence sub-clonal copy number alterations in bulk genomes, and the advantage of using a single-cell approach.
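As a rough illustration of the pooling step, the sketch below sums binned read counts over the cells assigned to each clone while skipping libraries that exceed the mouse-contamination threshold quoted above (depth ≥ 0.001X relative to the mouse genome). In the thesis, pooling operates on sequencing reads rather than on pre-binned counts; the dictionary layout and field names here are hypothetical and chosen only to keep the example self-contained.

```python
MOUSE_DEPTH_THRESHOLD = 0.001  # X depth relative to the mouse genome (from the text)


def is_mouse_contaminated(mouse_depth):
    """Flag a library whose mouse-genome depth meets or exceeds the threshold."""
    return mouse_depth >= MOUSE_DEPTH_THRESHOLD


def pool_bins_by_clone(cells):
    """Sum per-bin read counts across cells of the same clone.

    `cells` is a list of dicts with hypothetical keys:
      "clone"        clone label for the cell,
      "bins"         list of binned read counts (same length for all cells),
      "mouse_depth"  coverage depth relative to the mouse reference.
    Mouse-contaminated libraries are excluded from every pool.
    """
    pooled = {}
    for cell in cells:
        if is_mouse_contaminated(cell["mouse_depth"]):
            continue
        totals = pooled.setdefault(cell["clone"], [0] * len(cell["bins"]))
        for i, count in enumerate(cell["bins"]):
            totals[i] += count
    return pooled
```

Because the pooled profile of all clones combined is dominated by the largest clone, summing every cell into a single pool reproduces the effect noted above: alterations private to a minor clone contribute only a small shift to the combined counts.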
Next, we generated a bulk-equivalent genome by pooling all indexed samples from xenograft passage SA501X3F (including multi-cell samples and cells with contaminating debris, but excluding those with mouse contamination). Figure 3.9b shows Lorenz curves for the three SA501X3F clonal genomes, the bulk-equivalent genome, and a true bulk genome prepared using the standard Nextera protocol. Once again, the bulk-equivalent genome features comparable uniformity of coverage to the bulk genome with the same sequencing depth. In Figure 3.9c, we sought to determine the size range of copy number variants we can consistently infer in our single-cell samples, by comparing single-cell copy number profiles from each clone to their respective clonal genome profile. Each boxplot corresponds to the set of segments called as deleted (blue) or amplified (orange) in the clonal genome, and the y-axis represents the fraction of cells with overlapping calls.

Figure 3.9 Analysis of pooled clonal genomes
(a) Inferred copy number profiles for the pooled genomes of all SA501X3F single cells from Clone A (top, 82.31% of cells), Clone B (top middle, 10.77% of cells), and Clone C (lower middle, 6.92% of cells), as well as a genome consisting of cells from all three clones combined (bottom). Profiles were inferred using the same parameters applied to the single-cell genomes (150 kb bins).
(b) Lorenz curves for the three clonal genomes from xenograft SA501X3F, the pooled bulk-equivalent genome (featuring all indexed samples with the exception of those containing mouse contamination), and a standard bulk genome.
(c) Comparison of the overlap between inferred deletions (blue) and amplifications (orange) in the pooled clonal genomes and single cells belonging to those clones.
Each boxplot represents the distribution over segments of a given size range (x-axis), while the y-axis represents the fraction of cells in which the copy number call for that segment overlapped with the clonal genome call.

This analysis may under-represent our ability to call copy number, as true diversity within each clonal group will reduce the fraction of overlapping calls. Nevertheless, these results suggest that in cells sequenced at low depth without pre-amplification, we can confidently call clonal copy number alterations in the range of 1-5 Mb. This stands in contrast to a recent analysis of copy number inference in whole-genome amplified single cells, which found that known germline variants < 5 Mb could not be detected consistently (154).

In addition, we sought to determine if smaller bin sizes could improve our sensitivity for known bulk variants in cells with higher coverage depth. We identified the fifteen cells from SA501X4F with the highest number of total reads (mean 7.02 ± 0.58 million reads, 0.13 ± 0.02X coverage depth per cell), and computed, for each known bulk copy number segment, the fraction of cells at a given bin size with the exact same integer copy number call as the bulk (Figure C.7). This analysis indicates that in cells sequenced at higher depth, we can confidently call known bulk segments in the range of 100-500 kb.

Finally, we asked whether standard variant calling methods can be applied to the bulk-equivalent genome. As shown in Figure 3.10, we inferred a high-resolution copy number profile, loss of heterozygosity (LOH), single nucleotide variants (SNVs), and breakpoints on the SA501X3F bulk-equivalent genome, and compared these variants to those inferred on the standard bulk genome for this xenograft passage with the same coverage depth (see Table 3.5 for bulk-equivalent and bulk genome sequencing metrics).
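The per-segment agreement measure used in Figure 3.9c and Figure C.7 can be sketched as below. The thesis does not spell out how a single cell's call for a segment is summarized, so this example assumes the most common integer state across the segment's bins is taken as the cell's call; that rule, the tuple layout, and the function name are assumptions made for illustration.

```python
from statistics import mode


def fraction_matching(cell_profiles, segment):
    """Fraction of cells whose call over a segment equals the reference call.

    cell_profiles: list of per-bin integer copy number lists, one per cell.
    segment: (start_bin, end_bin, reference_state), with end_bin exclusive.
    A cell's call is taken as the most common state among the segment's bins
    (ties resolve to the first state encountered, as statistics.mode does in
    Python 3.8 and later).
    """
    start, end, reference_state = segment
    hits = sum(
        1 for profile in cell_profiles if mode(profile[start:end]) == reference_state
    )
    return hits / len(cell_profiles)
```

Computing this fraction for every segment and grouping the results by segment length would give the kind of size-stratified distributions summarized in the boxplots.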
Figure 3.10 Analysis of LOH, SNVs, and breakpoints on the pooled bulk-equivalent genome and a standard bulk genome for sample SA501X3F
(a) Simultaneous inference of copy number alterations and loss of heterozygosity (LOH) (31) on the pooled DLP bulk-equivalent genome with standard 1 kb bins.
(b) The same CNV and LOH analysis applied to the true bulk genome (31).
(c) Scatter plot showing correlation (ρ = 0.92) of LOH state calls for 1,516,294 heterozygous germline variants between the pooled DLP bulk-equivalent genome and standard bulk genome (31).
(d) Scatter plot showing correlation (ρ = 0.84) of allelic ratios for 8745 high-confidence SNV calls (149) between the pooled bulk-equivalent genome and standard bulk genome.
(e) Scatter plot showing correlation (ρ = 0.93) of allelic ratios between the pooled bulk-equivalent genome and standard bulk genome for 184 SNVs previously validated using targeted amplicon sequencing in the SA501 xenograft series (24).
(f) Circos plot of rearrangement breakpoint calls (150) in the bulk and bulk-equivalent genomes, showing overlapping calls (grey, n = 133), calls with high probability only in the bulk-equivalent genome (orange, n = 18) and those with high probability only in the standard bulk genome (blue, n = 44).

The first panels show the output of a hidden Markov model for simultaneous inference of copy number and LOH (31), applied to the pooled bulk-equivalent genome (Figure 3.10a) and the bulk genome (Figure 3.10b). In both samples, evidence for the minor sub-populations with two copies of chromosome X is seen as an inward shift in allele ratios for that chromosome. However, the other alterations that distinguish these two minor sub-populations are not evident, once again underscoring the advantage of a single-cell approach.
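The correlations reported in the caption above are Pearson coefficients computed on paired values from the two genomes. As a minimal sketch, assuming each variant contributes one allelic ratio per genome, the calculation could look like the following; the helper names are illustrative and the variant callers themselves (31, 149) are not reproduced.

```python
from math import sqrt


def allelic_ratio(alt_reads, total_reads):
    """Fraction of reads at a site that support the variant allele."""
    return alt_reads / total_reads


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

Here `xs` would hold the allelic ratios from the bulk-equivalent genome and `ys` the ratios for the same variants from the standard bulk genome, in the same order.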
The overlap in LOH state calls and variant allele fractions for these two samples demonstrates that loss of heterozygosity can be inferred on the bulk-equivalent genome (Figure 3.10c; n = 1,516,294; Pearson’s correlation, ρ = 0.92). Figure 3.10d shows the correlation in variant allele frequencies for the union of high-probability SNV calls (149) between the two samples (n = 8745; Pearson’s correlation, ρ = 0.84), while Figure 3.10e shows the correlation in variant allele frequencies for a set of SNVs previously validated (24) in the SA501 xenograft series using targeted amplicon sequencing (n = 184; Pearson’s correlation, ρ = 0.93).

Finally, Figure 3.10f demonstrates the overlap (grey, n = 133) in breakpoints inferred in the bulk-equivalent (orange, n = 18) and bulk (blue, n = 44) genomes (150). These analyses suggest that single-cell library construction without pre-amplification, followed by genome pooling, generates a low-bias bulk-equivalent genome to which standard variant inference methods can be applied. It should also be noted that, given sufficient sequencing depth, such analyses may also be applied to the individual clonal genomes, yielding information, for example, on overlap between heterogeneity in copy number space and SNV space.

3.5 Discussion

We describe a novel experimental method for simultaneous acquisition of high-resolution single-cell copy number profiles and bulk tumour genomes, using direct single-cell library preparation (DLP) without pre-amplification. We demonstrate highly uniform coverage by examining the genomes of 268 cells from two immortalized normal cell lines. Analysis showed that compared to existing methods for single-cell copy number analysis, direct library preparation produces more usable reads per cell, permits more cells to be multiplexed per sequencing lane, and generates genomes with lower variation in reads per bin.
Pooling multiple cells produced a bulk-equivalent genome with coverage breadth and uniformity equivalent to those of a standard bulk genome with the same coverage depth. We next examined the copy number architecture of two patient-derived TNBC xenograft tumours by analyzing the genomes of 514 single cells. Third-passage xenograft tumour SA501X3F features three main clonal sub-populations, distinguished by multiple CNVs across the genome. We noted that many cells with low dispersion in binned read counts featured unique alterations that fall along integer copy number values, suggesting that these represent rare private events. The SA501X4F xenograft, on the other hand, was composed primarily of cells with a single major copy number profile, but also featured numerous sub-populations with distinct alterations shared by three or more cells. This presents the possibility that some highly unstable breast cancer tumours may feature substantial copy number heterogeneity. Finally, we carried out standard analysis of SNVs, LOH, and breakpoints on the pooled bulk-equivalent genome for the SA501X3F xenograft, and demonstrated a high degree of overlap with variants called on the standard bulk genome for the same tumour.

A critical element of this method is the ability to perform efficient transposase library preparation directly on unamplified single-cell template DNA using nanolitre-volume processing. Here we apply the Tn5 chemistry in an optimized nanolitre volume process to enable, for the first time, the direct construction of single-cell libraries by tagmentation. Our results demonstrate that this approach generates single-cell libraries with uniformity that surpasses previously reported methods. We implemented this approach using our novel microfluidic chip design, which features pre-spotted index primers, fluorescence imaging of trapped cells, and inflatable reaction chambers that permit protocol customization.
Current DOP-PCR WGA protocols for single-cell library preparation take approximately three days to complete (44), are performed primarily on 96 samples or fewer per experiment (39,44–48), carry an estimated cost of $15 per cell for amplification alone (without factoring in the cost of subsequent library construction), and suffer from either low mappability due to WGA adaptor contamination (WGA4), or high duplicate rates (C-DOP-L), so that fewer cells can be multiplexed per sequencing lane. The DLP protocol offers substantial gains in throughput in terms of cost ($0.50 per cell, including dead volumes and consumables, see Table 3.6), labour (192 libraries in 2.5 hours of hands-on time), and sequencing effort (192 libraries per HiSeq lane).

Future device designs may accommodate additional chambers, permitting preparation of thousands of single-cell libraries per week. In the current experiment, we utilized a DNA stain combined with fluorescence imaging of trapped cells to distinguish true single-cell samples from doublets and contaminating debris, prior to cell lysis. Future experiments may utilize a more powerful microscope and other stains to examine cell morphology and phenotype in greater detail. While we cannot selectively exclude undesirable samples (such as doublets and traps with contaminating debris) from being pooled for sequencing, reads derived from these samples are not wasted, as they contribute to the depth of the pooled bulk-equivalent genome. Finally, we note that either commercialization of the device or incorporation of key elements critical to the DLP protocol (such as the inflatable reaction chamber and pre-spotted index primers) into existing commercial microfluidic systems could lead to widespread adoption of the method in a relatively short time frame. In addition, we believe the DLP protocol is readily adaptable to other small-volume formats, such as micro-droplets (86,87,155,156).
It should be emphasized that the purpose of the described method is not to capture the complete genomes of single cells, but rather to provide high-resolution single-cell copy number profiles while at the same time producing high-quality bulk genomes in a single sequencing experiment. For a diploid cell, any given position in the genome is represented by only two non-duplicate fragments. Physical limitations, such as sample handling, will place a limit on the diversity of our DLP libraries, and affect our ability to detect SNVs on the single-cell level. However, existing computational methods for bulk genome analysis are better powered to detect low-prevalence sub-clonal SNVs than low-prevalence sub-clonal copy number variants. Knowledge of sub-clonal copy number architecture gleaned from low-depth single-cell sequencing may be used to inform and improve inference of sub-clonal SNV structure from pooled bulk-equivalent genomes. In addition, it is possible to fully characterize all sub-types of sub-clonal variants using our method by simply sequencing more cells and generating pooled clonal genomes of sufficient depth for SNV, LOH, and breakpoint inference. We posit that in most cases, this is a superior approach to sequencing a few cells to very high depth, as it avoids the high error rates introduced by amplification, and provides a better representation of the tumour cell population. Sequencing the full genomes of ten cells using MALBAC or MDA to 30X depth would be equivalent to sequencing a 300X bulk-equivalent genome by pooling 6000 single cells (at 0.05X) using DLP. The latter provides a much more comprehensive view of copy number diversity in the population, while yielding sufficient depth (30X) to examine SNVs using standard methods in any copy number clone that makes up at least 10% of the population.
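The depth comparison in the preceding paragraph is simple arithmetic, sketched below under two stated assumptions: every cell is sequenced to the same depth, and a clone's share of the pooled depth equals its share of the cells. The function names are illustrative.

```python
def pooled_depth(n_cells, depth_per_cell):
    """Total depth of a genome pooled from n_cells equally sequenced cells."""
    return n_cells * depth_per_cell


def clone_depth(total_depth, clone_fraction):
    """Depth available to a clone making up clone_fraction of the pooled cells."""
    return total_depth * clone_fraction


# Ten amplified cells at 30X and 6000 DLP cells at 0.05X give the same total depth.
amplified_total = pooled_depth(10, 30)
dlp_total = pooled_depth(6000, 0.05)

# A clone representing 10% of the 6000 pooled cells receives about 30X.
minor_clone = clone_depth(dlp_total, 0.10)
```

The two totals are both approximately 300X, but the DLP pool spreads that depth over 600 times as many cells, which is what allows a 10% clone to reach roughly 30X on its own.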
Finally, direct library preparation permits identification of contaminating normal cells and their exclusion from pooled bulk-equivalent genomes in order to rescue low-cellularity samples. Low tumour cellularity hinders variant calling, and presents a significant obstacle to sequencing and analysis of valuable donated patient tissue (24). In addition, identification and removal of contaminating mouse cells in xenograft samples reduces the risk of introducing false-positive variant calls from mouse reads aligning to the human reference genome (24). We envision direct single-cell library preparation as the new standard approach to the sequencing of heterogeneous populations.

3.6 Tables

Table 3.1 Number of constructed indexed libraries, flagged no-template controls, flagged single cells, and single cells retained for downstream analysis
• sample - Sample from which the indexed libraries were constructed
• total_indexed_libraries - Total number of indexed libraries constructed for the sample (192 libraries per chip)
• flagged_no_template_controls - Number of indexed libraries flagged as no-template controls with fluorescence microscopy
• flagged_single_cells - Number of indexed libraries flagged as single cells with fluorescence microscopy
• single_cell_pass_cnv_filters - Number of single-cell libraries which passed the median absolute deviation (MAD) criteria for copy number phylogenetic analysis (MAD ≤ 0.15)

Table 3.2 Sequencing metrics for indexed libraries from immortalized normal breast epithelial cell line 184-hTERT-L2 (page 1) and immortalized normal lymphoblastoid cell line GM18507 (page 2)
• sample_id - Identifier for the indexed library
• cell_call - Flag assigned to the library following fluorescence imaging: “1 cell”, unambiguous single cell; “No-template control”, empty cell trap; “Multiple cells”, two or more single cells; “Contaminated or uncertain”, contaminating debris, single cell with contaminating debris, or uncertain cell call.
• total_reads - Total number of demultiplexed reads assigned to the library
• total_mapped_reads - Total number of mapped reads
• total_duplicate_reads - Total number of duplicate reads
• mean_insert_size - Mean insert size for paired-end reads
• coverage_depth - Coverage depth (average number of reads per genomic position)
• coverage_breadth - Coverage breadth (fraction of the genome covered by at least one read)
• mad_neutral_state - Median absolute deviation (MAD) for all bins assigned to the “neutral” or 2-copy state using a hidden Markov model (148)

Table 3.3 Sequencing metrics for indexed libraries from patient-derived breast cancer xenograft passages SA501X3F (page 1) and SA501X4F (page 2)
• sample_id - Identifier for the indexed library
• cell_call - Flag assigned to the library following fluorescence imaging: “1 cell”, unambiguous single cell; “No-template control”, empty cell trap; “Multiple cells”, two or more single cells; “Contaminated or uncertain”, contaminating debris, single cell with contaminating debris, or uncertain cell call.
• total_reads - Total number of demultiplexed reads assigned to the library
• total_mapped_reads - Total number of mapped reads
• total_duplicate_reads - Total number of duplicate reads
• mean_insert_size - Mean insert size for paired-end reads
• coverage_depth - Coverage depth (average number of reads per genomic position)
• coverage_breadth - Coverage breadth (fraction of the genome covered by at least one read)
• mad_neutral_state - Median absolute deviation (MAD) for all bins assigned to the “neutral” or 2-copy state using a hidden Markov model (148)
• coverage_depth_mouse - Coverage depth relative to the mouse reference genome
• mouse_contamination - Flag indicating whether library exceeds coverage threshold for mouse contamination: 0, below threshold; 1, above threshold (≥ 0.001X).
Table 3.4 Statistics table showing results of the Kruskal-Wallis tests (page 1) and Pearson’s correlation (page 2)
• variable - Independent variable examined
• samples compared - List of samples compared
• n - Number of cells in sample group
• chi-squared - Chi-squared value
• df - Degrees of freedom
• p-value - Calculated probability
• rho - Pearson’s correlation

Table 3.5 Sequencing metrics for bulk-equivalent and bulk libraries
• sample id - Identifier for the library
• total reads - Total number of demultiplexed reads assigned to the library
• total mapped reads - Total number of mapped reads
• total duplicate reads - Total number of duplicate reads
• mean insert size - Mean insert size for paired-end reads
• coverage depth - Coverage depth (average number of reads per genomic position)
• coverage breadth - Coverage breadth (fraction of the genome covered by at least one read)

Table 3.6 Cost breakdown for device fabrication and library preparation, per chip and per indexed library
• device fabrication - Cost breakdown for device fabrication materials (US dollars)
• library preparation - Cost breakdown for library preparation reagents (US dollars)
• total costs per chip - Total costs for fabrication and application per microfluidic device (US dollars)
• total costs per cell - Total costs for fabrication and application per indexed library (US dollars)

Chapter 4: Highly-scalable direct library preparation (DLP+) of single-cell genomes in open nanolitre well arrays

4.1 Overview

In the previous chapter, we demonstrated that direct library preparation (DLP) without pre-amplification outperforms standard single-cell library preparation (based on WGA) and provided an experimental workflow that is capable of analyzing hundreds to thousands of single cells.
However, the manufacturing process and infrastructure required to run microfluidic chips limit the adoption of this application in other labs, and also limit ultimate scalability due to device manufacturing constraints. To address this, we implemented the principles of DLP in an open nano-well substrate using off-the-shelf components. As part of this technology transfer we carefully tested and optimized key reaction parameters of DLP+, ultimately achieving equal performance to the microfluidic device. We demonstrate the scalability of this approach through construction and analysis of 18,129 single-cell genome sequencing libraries.

4.2 Introduction

A tumour is composed of millions of cells that harbour a range of common and unique genetic alterations. Differences in the tumour genotype can result in differing phenotypes, including drug sensitivity and metastatic potential. While in many patients the majority of cells respond to cancer therapy and die, small subpopulations of tumour cells can resist treatment and over time contribute to relapse. Characterizing tumour heterogeneity at single-cell resolution can improve identification of resistant cell populations, development of personalized treatments, and disease monitoring during progression and treatment. Existing methods, however, lack the required sensitivity and scalability to reliably genotype thousands of single cells. They are expensive, labour intensive, and introduce coverage bias and polymerase errors that complicate data interpretation. In the previous chapter we introduced DLP, a high-fidelity single-cell library preparation method in nanolitre volumes (1). Nanolitre microfluidic reaction chambers not only made this method more economical compared to microlitre approaches, but also allowed us to adapt Illumina’s Nextera transposase chemistry to a single-cell template directly.
In addition, we introduced an analysis workflow that combines low-depth sequencing of multiplexed single cells and merging of copy-number clones to infer additional variants such as SNVs and breakpoints. This makes DLP ideally suited to sequencing a large population of tumour cells and to identifying small, yet important sub-populations. Despite the performance of microfluidic-based DLP analysis, the use of custom microfluidic devices presents an obstacle to adoption in many labs and also places limits on scalability due to fabrication constraints. To address this limitation, we sought to develop and optimize an alternative and much higher-throughput implementation of nanolitre volume DLP, referred to here as DLP+, that is based on high-density microwell arrays and picolitre volume piezo dispensing technology. To achieve comparable performance to our microfluidic approach, we rigorously optimized key reaction parameters, providing new insights into the mechanisms that determine library quality. We then show that optimized DLP+ enables high-fidelity analysis of thousands of cells per experiment. In addition, we describe an improvement for cell loading based on image processing and real-time selection during single-cell dispensing. This further increases throughput by overcoming inefficiencies in stochastic loading (i.e. limiting dilution Poisson distributions) and allows imaging information to be correlated with sequencing results.

We first present results from rigorous optimization of parameters that determine library quality. We then demonstrate the robust and scalable performance of DLP+ on 1,409 cells from a lymphoblastoid cell line and 10,665 single cells from a breast epithelial cell line.
4.3 Methods

4.3.1 Sample preparation

4.3.1.1 Cell culture

Cells from the immortalized normal human lymphoblastoid cell line GM18507 (Coriell Cell Repositories) were cultured at 37 °C and 5% CO2 in RPMI-1640 medium with 2.05 mM L-glutamine (HyClone) supplemented with 10% FBS (Gibco/Invitrogen). Cells from the immortalized normal breast epithelial cell line 184-hTERT L9 were cultured at 37 °C and 5% CO2 in MEBM Mammary Epithelial Cell Growth Medium (Lonza) with transferrin (Sigma) and isoproterenol (Sigma), supplemented with Lonza MEGM™ Mammary Epithelial Cell Growth Medium SingleQuots. The parental 184-hTERT cell line was generated by Dr. Carl Barratt (Laboratory of Molecular Carcinogenesis at the National Institute of Environmental Health Sciences) and the monoclonal 184-hTERT L9 cell line was derived by Dr. Angela Burleigh in the Aparicio lab (BC Cancer Agency), while the subsequent CRISPR knock-out of TP53 was carried out and verified by Western blot and Sanger sequencing by Dr. Tehmina Masud in the same lab. Cells were tested for mycoplasma with h-IMPACT II human pathogen testing (IDEXX Bioresearch). Cells were grown to near confluence, trypsinized, resuspended in cryopreservation media and frozen down at –1 °C/minute to –80 °C.

4.3.1.2 Cell staining and dilution

Cells were fluorescently stained using a combination of CellTrace™ CFSE Cell Proliferation Kit (ThermoFisher) and LIVE/DEAD® Fixable Red Dead Cell Stains (ThermoFisher), incubating for 20 min at 37 °C. Cells were resuspended in fresh PBS at a concentration of 220,000 cells/mL prior to dispensing into chips with unique dual index barcodes already dispensed in each well.

4.3.2 Robot operation

All cell and reagent spotting was carried out on a contactless spotting robot (sciFLEXARRAYER S3, Scienion). Pulse and voltage were adjusted before every dispensing step or routine to achieve a stable droplet.
PDC 70 Type 1 nozzles were used for primer dispensing, PDC 70 Type 4 nozzles were used for reagent addition, and PDC 90 Type 4 nozzles were used for cell dispensing. The spotter was primed daily with fresh and degassed water according to the manufacturer's recommendation. Briefly, 700 mL of 18 MΩ water was filtered through a 0.22 μm filter (Millipore Express Plus). The filtered water was placed in a sonicating water bath (VWR Symphony) and a vacuum was applied for 30 min using a custom adaptor lid. Following the “Prime” program prompts, the bottle containing the fresh system water was then connected to the spotter. To minimize travel time during cell spotting, a custom chip holder was mounted next to the droplet camera. All other reagent additions were carried out on a temperature controlled target holder (either at dew point or 4 °C). If the dew point was below 4 °C, the relative humidity was increased to 38% ± 2%, with the exception of index primer and cell dispensing. The built-in “Field Target To Reference Point” function was used to adjust for placement and rotational errors. Nozzles were removed at the end of the day and all system liquid lanes were drained.

4.3.2.1 Primer spotting and wash routine

A unique combination of two dual index primers (2.1 nL each at 20 μM) was dispensed into each well of the microwell chip (Wafergen, 5184 nanowells arranged in a 72 × 72 well array, 110 nL each) in advance of cell spotting. 144 customized i7 and i5 primers (Integrated DNA Technologies) were used, where ‘NNNNNN’ was replaced with a unique hexamer barcode (see ref (69)):
i5 5’-AATGATACGGCGACCACCGAGATCTACACNNNNNNTCGTCGGCAGCGTC-3’
i7 5’-CAAGCAGAAGACGGCATACGAGATNNNNNNGTCTCGTGGGCTCGG-3’
Prior to spotting, the primers were desalted and normalized to 100 μM stock concentration in IDTE pH 8.0. Working plates were prepared by diluting each stock primer to 20 μM in 0.1% Tween 20 in TE pH 8.0.
For primer dispensing, humidity control was not used and the primers were allowed to dry down for storage at room temperature. Finally, a custom wash routine was implemented to avoid cross-contamination of index primers during spotting (Figure D.2). The wash cycle includes a series of pump and sonication steps with 2% Tween 20 and 1% SciClean solution.

4.3.2.2 Cell spotting

Single cells or nuclei were isolated by dispensing a limiting dilution (Poisson distribution) or using active selection during cell spotting (CellenOne). For cell/nuclei isolation by limiting dilution, stained cells or nuclei were diluted to 1 million cells/mL in PBS. 1 nL of the diluted sample was dispensed into a test array to determine the isolation rate and optimize the spotting volume to achieve optimal single-cell occupancy before the remaining wells were filled using the optimized spot volume. Under optimal conditions about one-third of wells contained a single cell or nucleus; other wells were empty or contained multiple cells/nuclei.

In addition, optional spotting software (CellenOne) was used to select single cells in the dispensing nozzle, resulting in an almost perfect single-cell isolation rate. Stained cells were diluted to 220,000 cells/mL in PBS. To help avoid imaging artifacts due to reflections or external light, the robot enclosure was blacked out with opaque panels. An automated machine learning algorithm was executed after every cell uptake to set ejection and sedimentation boundaries with a mapping density threshold between 0.25 and 0.3. The CellenOne software allows for thresholding on size, circularity, and elongation of cells, enabling the exclusion of doublets and debris. The following advanced settings were used: min area 20, max area 250 to 1000 (depending on cell type), circularity 1.35, elongation 2.5. In addition, the LED pulse width was increased to 10 ms. Brightfield images and particle metrics from deposited cells were saved with spatial information.
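The "about one-third" single-occupancy figure quoted above for limiting dilution is what a Poisson loading model predicts. The following is a minimal sketch, assuming that cells are distributed independently in the suspension and that exactly 1 nL is dispensed per well; the function names are illustrative and not part of the spotting software.

```python
import math

def mean_cells_per_droplet(cells_per_ml: float, droplet_nl: float) -> float:
    """Expected number of cells in one droplet (1 mL = 1e6 nL)."""
    return cells_per_ml * droplet_nl / 1e6

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k cells when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def occupancy(lam: float) -> dict:
    """Fractions of wells that are empty, hold one cell, or hold several."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}

# Values from the limiting-dilution protocol: 1 million cells/mL, 1 nL per well.
lam = mean_cells_per_droplet(1_000_000, 1.0)
fractions = occupancy(lam)
for state, fraction in fractions.items():
    print(f"{state}: {fraction:.1%}")
```

Under this model the single-cell fraction is largest when the mean is one cell per droplet, where it equals 1/e (about 37%). This is the ceiling that active selection in the nozzle is meant to overcome.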
Isolated cells were frozen in the microwell chip at –20 °C until library preparation.

4.3.3 Chip imaging and cell calling

All microwell chips were scanned on a 10× inverted fluorescent microscope (Nikon TI-E). Standard stages were replaced with fast travel stages to increase speed (ASI stages fitted with an ultra-coarse lead screw (28 mm/s)). Control software was written in LabView (LabView 2015) and images were acquired on a Grasshopper3 USB camera (Point Grey Research/FLIR). Custom image analysis software (Java) was then used to confirm single-cell occupancy and acquire cell state information. Intensity and area thresholding could be used to select cells of choice automatically. Automated calling was then reviewed, and a spotting robot input file was created to process selected wells only. All imaging information together with additional information, such as sample type, sample processing, and spatial information, was recorded in a custom database.

4.3.4 Library preparation protocols

We used a one-pot transposase chemistry (Nextera DNA Library Preparation Kit, Illumina) as described in Zahn et al. (1). After adding all reagents, the microwell chips were sealed (Microseal® film A, BioRad; pressed on with a pneumatic sealer) and reagents collected at the bottom of the well with a centrifugation step at 3214 ×g for 2 min. All chip incubations, with the exception of the cell heat lysis, were carried out on a flatbed thermal cycler (DNA Engine Tetrad 2, MJ Research), followed by a centrifugation step for 2 min at 3214 ×g. Table 4.1 summarizes all experimental conditions; a detailed description of each condition can be found below.

Dispensing method

Cells were dispensed by a limiting dilution (Poisson) to isolate single cells or single cells were selected directly in the nozzle (block CellenOne, see section Cell spotting). Active selection of cells in the nozzle results in a block pattern vs.
the scattered pattern of single isolated cells resulting from the limiting dilution. We investigated the effect of sample distribution on the chip and mimicked a limiting dilution-like scattered distribution using target dispensing of selected single cells (scattered CellenOne).

Table 4.1 Summary of experimental conditions

Lysis

We investigated the following lysis conditions on the open-array platform.
Lysis buffer & Protease: G2 lysis buffer was prepared with 25 µL lysis buffer G2 (Qiagen) and 2.5 µL (+) Qiagen Protease (Protease was re-suspended in 7 mL UltraPure water). Direct lysis buffer was prepared by combining DirectPCR Cell Lysis Reagent (Viagen; 25 µL) and Qiagen Protease (+: 2 µL, ++: 5 µL, +++: 10 µL) in 5% glycerol and 0.1% Pluronic F-127 (Sigma) in PCR water.
Volume & presoak time: 1 nL or 10 nL of the specified lysis solution was dispensed into the selected wells of the microwell chip and cells were incubated at 4 °C (0 hours, 2 hours, 4 hours or overnight (19-22 hours)).
Protease top-up: If applicable, 2.5 nL of additional lysis solution was added to each well.
Water bath/temp/dry down: Heat lysis was carried out at 50 °C for 1 hour followed by a protease inactivation incubation at 70 °C for 15 min, with a final cooling to 10 °C. If applicable, cell heat lysis was performed by immersing the chip into a water bath at 50 °C for 1 hour, followed by a transfer to a thermal cycler for protease inactivation (70 °C for 15 min, then hold at 10 °C). During immersion, the chip was mounted in a custom-built chip clamp to ensure a secure fit of the seal. Finally, for some experiments a dry down was performed at room temperature for 15 min, followed by dispensing 10 nL of water to equalize volumes before tagmentation. Lysis solution was added to all wells, including gDNA, no-template (NTC) and no-cell (NCC) controls.
Tagmentation

After cell lysis, 18 nL of one of three tagmentation mixes, each made up in PCR water and named after its TDE1 volume, was dispensed into each well: the 2.2 nL tagmentation mix (9 nL TD Buffer, 2.2 nL TDE1, 0.165 nL 10% Tween-20), the 3.5 nL tagmentation mix (14.335 nL TD Buffer, 3.5 nL TDE1, and 0.165 nL 10% Tween-20) or the 6.5 nL tagmentation mix (11.3 nL TD Buffer, 6.5 nL TDE1, and 0.165 nL 10% Tween-20). Wells were incubated at 55 °C for 10 min followed by cooling to 10 °C.

Neutralization

The tagmentation reaction was neutralized with 4 nL Qiagen Protease and 4 nL 0.2% Tween-20, and an incubation at 50 °C for 15 min, followed by a protease inactivation incubation for 15 min at 70 °C, with a final cooling to 10 °C.

PCR

After neutralization, 39 nL of PCR master mix (19.5 nL NPM, 6.5 nL PPC, 0.65 nL 10% Tween-20, 12.35 nL PCR water) was dispensed to each well. PCR was performed using the following conditions: 72 °C for 3 min; 95 °C for 30 sec; 8 cycles or 11 cycles of 95 °C for 10 sec, 55 °C for 30 sec, 72 °C for 30 sec; 72 °C for 3 min; and finally 10 °C. The indexed single-cell libraries were then recovered by centrifugation through a recovery funnel into a pool. Finally, size selection was performed using a 1.8× Ampure XP (Beckman Coulter) bead to sample ratio.

4.3.5 Data analysis and copy number inference

Sequencing data were processed as described in Appendix E. A summary can be found in the following flow chart (Figure 4.1).

Figure 4.1 Flow chart showing data-analysis workflow

4.4 Results

4.4.1 Scalable single-cell library preparation in open nanolitre wells

We implemented DLP+ using a contactless piezoelectric dispenser (sciFLEXARRAYER S3, Scienion) and open microwell arrays (SmartChip, Wafergen) (Figure D.1a). Each chip includes 5184 wells with a volume of 110 nL per well. The spotting robot can address and fill each well in increments of 300-550 pL (Figure D.1b), thus providing ~200-360 freely programmable reagent addition steps.
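As a quick consistency check on the volumes given in the Methods, the reagent additions can be summed and compared with the 110 nL well capacity. This is a minimal sketch that assumes the additions simply accumulate (no evaporation, no dry-down step), that the pre-spotted primers are dry, and that the volume of the cell droplet itself is negligible; these assumptions are ours and are not stated in the protocol.

```python
WELL_CAPACITY_NL = 110.0

# Liquid additions per well, in nL, taken from the protocol text.
additions_nl = {
    "lysis solution": 10.0,
    "protease top-up": 2.5,
    "tagmentation mix": 18.0,
    "neutralization (protease + Tween-20)": 4.0 + 4.0,
    "PCR master mix": 39.0,
}

total_nl = sum(additions_nl.values())
headroom_nl = WELL_CAPACITY_NL - total_nl

# The PCR master mix components should add up to the dispensed 39 nL.
pcr_mix_nl = 19.5 + 6.5 + 0.65 + 12.35

def max_addition_steps(droplet_pl: float) -> float:
    """Number of droplets of a given size (in pL) that would fill one well."""
    return WELL_CAPACITY_NL * 1000.0 / droplet_pl

print(f"total dispensed: {total_nl} nL, headroom: {headroom_nl} nL")
print(f"steps at 550 pL: {max_addition_steps(550):.0f}")
print(f"steps at 300 pL: {max_addition_steps(300):.0f}")
```

With these assumptions the full protocol uses about 70% of the well volume, and the droplet sizes of 300-550 pL correspond to a few hundred possible additions per well, in line with the range quoted above.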
Dual index primers are pre-spotted (Figure D.2) to label each chamber for pooled recovery. This also provides a more streamlined library preparation protocol, since the same reagent mix can be delivered to all wells in parallel. An additional software package (CellenOne, Cellenion) allows us to identify single cells inside the dispensing nozzle and deposit the desired cells selectively into reaction chambers (Figure D.1c-e). The software uses real-time imaging and automated machine learning to map the sedimentation and location of cells in the nozzle (Figure D.1c). This enables active selection of single cells as they are dispensed, overcoming the limitations of cell isolation by limiting dilution (Figure D.1d,e). Single-cell selection can be based on various cell properties, including area, elongation, and circularity, and this information can be linked to the fixed spatial location of a microwell array. Additional information, such as cell state (live/dead), is linked to each well after imaging the entire device and automatically extracting fluorescent imaging information (Figure D.1f,g). Since imaging occurs before the library preparation reagents are spotted, doublets, empty wells, or cells with contamination can be excluded from library preparation (Figure 4.2). The final libraries are pooled during recovery and sent for low-coverage multiplexed sequencing on HiSeq or NextSeq instruments (Illumina). After the libraries are demultiplexed, aligned, and analyzed for copy number alterations (CNA), processed single-cell data are loaded into a custom visualization platform.

Figure 4.2 DLP+ workflow. Concept schematic of the experimental workflow. (a) Cell isolation and lysis.
A cell suspension is loaded into a dispensing nozzle; single cells are identified using real-time imaging during dispensing and selected cells are isolated in individual nanowells; occupancy and cell state are confirmed by fluorescent imaging and recorded; wells can be selectively addressed to add cell lysis solution; spin and seal steps are used to collect reagents in the bottom of the well and prevent evaporation during heat lysis. (b) Open-array library construction. DLP+ libraries from unamplified single cells are built by carrying the chip through a series of reagent addition, spin, seal, and heat incubation steps. (c) Pooled recovery. Indexed single-cell libraries are pooled for multiplexed sequencing.

4.4.2 Automated classification of copy-number quality

During analysis, we extract more than 20 sequencing and alignment metrics from the single-cell data. To simplify the evaluation of the library quality from hundreds of cells, we implemented a random forest classifier to jointly evaluate a range of sequencing and post-alignment metrics and provide a single quality score for each cell (QS; unpublished work by Adi Steif). In addition to standard sequencing and alignment metrics, several new quality metrics were developed that were shown to substantially improve discrimination. We trained the classifier on 1598 manually classified near-diploid single-cell genomes (GM18507 lymphoblastoid cell line (153)) and found that “non-integerness” (deviation of segment median to the nearest integer), the fit of the hidden Markov model (HMM) (log-likelihood), and median absolute deviation of reads (MAD) were the highest ranked features informing on the CN quality (Figure D.3). To validate the classifier performance, we collapsed the continuous quality score (0-1) into a binary “high” vs. “low” quality call with a 0.5 cutoff.
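Two of the steps described above lend themselves to a short illustration: the “non-integerness” feature and the collapse of the quality score into a binary call that is then compared against manual labels. The sketch below is not the classifier itself. It assumes that non-integerness is averaged over segments and that scores at or above 0.5 count as “high”; the text fixes neither choice, and the scores and labels are made-up toy values.

```python
def non_integerness(segment_medians):
    """Mean distance of each copy-number segment median from the nearest integer."""
    return sum(abs(m - round(m)) for m in segment_medians) / len(segment_medians)

def binary_calls(quality_scores, cutoff=0.5):
    """Collapse continuous quality scores (0-1) into high (True) / low (False) calls."""
    return [score >= cutoff for score in quality_scores]

def evaluate(calls, manual_high):
    """Accuracy, sensitivity and specificity of the calls against manual labels."""
    tp = sum(c and m for c, m in zip(calls, manual_high))
    tn = sum((not c) and (not m) for c, m in zip(calls, manual_high))
    fp = sum(c and (not m) for c, m in zip(calls, manual_high))
    fn = sum((not c) and m for c, m in zip(calls, manual_high))
    return {
        "accuracy": (tp + tn) / len(calls),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical toy data: six cells, three of them manually labelled high quality.
toy_scores = [0.95, 0.80, 0.30, 0.10, 0.20, 0.05]
toy_manual = [True, True, True, False, False, False]
toy_metrics = evaluate(binary_calls(toy_scores), toy_manual)
toy_noise = non_integerness([2.0, 2.1, 2.9, 1.5])
print(toy_metrics, toy_noise)
```

In the toy data no low-quality cell is called high (specificity 1.0) while one high-quality cell is missed, which is the same kind of trade-off between specificity and sensitivity that a conservative cutoff produces.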
Using this binary classification, we found that the classifier is 89% accurate (p = 2.2e-16, McNemar’s test) compared to our manually classified CN data, with a high specificity (100%) but at the expense of sensitivity (61%). The trained classifier was applied to the subsequent optimization experiments to identify statistically relevant improvements.

4.4.3 Optimization of key reaction parameters

In our first attempt to implement DLP+, we applied exactly the same protocol that was used on the microfluidic device. This resulted in poor quality libraries as determined by the prevalence of noisy copy-number profiles and failed libraries. Thus, we needed to identify and carefully optimize a range of key reaction parameters on the open-array platform, in order to achieve comparable performance to our published microfluidic DLP method. The parameters studied for optimization were: the cell dispensing method; the lysis volume, type and time; and the transposase (Tn5) concentration. We built sequencing libraries from the GM18507 lymphoblastoid cell line (153), used an HMM (148) to infer copy-number profiles, and classified cells in each library using our trained model. We included all wells containing single cells (as identified by microscopy) in the following analysis. Cells that did not produce any reads were assigned a quality score of zero (QS = 0) during classification. First, we tested new lysis buffer conditions (Figure 4.3a). The low volume of lysis buffer (~1 nL Buffer G2, Qiagen) used in the microfluidic device (1) was not robust in the open-array format (Figure 4.3a, blue). After dispensing 1 nL of lysis buffer into a well, a droplet formed at the bottom of the well that did not cover the entire cross-section and thus could miss the single cell to be lysed. We found that higher volumes of the same buffer poisoned the downstream reactions due to insufficient dilution (Figure 4.3a, orange; mean total number of reads: G2 1 nL = 2.1 × 10^6, G2 10 nL = 1.7 × 10^3).
To address this, we evaluated a PCR compatible lysis buffer (DirectPCR Cell Lysis Reagent, Viagen). An extended lysis time also motivated a protease top-up to assess possible degradation of the protease over time (see supplemental methods, Figure 4.3a). Both the increased lysis volume (10 nL) and the additional protease top-up significantly improved library quality (1 nL Viagen median QS = 0.12 ± 0.21/0.13 ± 0.16 and 10 nL Viagen median QS = 0.63 ± 0.22/0.37 ± 0.23 with and without protease top-up, respectively; Figure 4.3a,c).

Figure 4.3 Effect of lysis buffer and storage time on library quality. (a) Effect of lysis buffer type (G2 buffer and Viagen Direct PCR Cell Lysis Reagent), volumes, and top-up before heat on library quality. (b) Effect of sample type (cells or nuclei) and sample storage in chip (overnight or 2 months at -20 °C) on library quality. (c) Correlation matrix of KW tests for experimental conditions in panel a. (d) Same as c for panel b. Colour in violin plots represents different experimental conditions as indicated in the figure. Colour and size of dots in the correlation plots illustrate significance from KW test; P = 0.05 in gray.

We next tested the PCR compatible lysis buffer with and without the addition of protease (Protease, Qiagen) and observed a substantial improvement in library quality (Figure D.4) for the medium (3.5 nL Tn5; delta median QS = 0.23) and high (6.5 nL Tn5; delta median QS = 0.26) tagmentation concentrations in the presence of protease (Figure 4.4a). Our findings suggest that the higher Tn5 concentration is able to recover more genomic fragments when the single cell is insufficiently lysed, since we did not observe an improvement in the proportion of high-quality cells for the low Tn5 concentration (2.2 nL; delta median QS = -0.03; Figure D.4).
Next, we examined limiting dilution cell dispensing against real-time selected cells (CellenOne, Scienion) dispensed in a block or limiting dilution-like scatter pattern (Figure 4.4b), and found a significant improvement in library quality of the actively selected cells over the passively dispensed cells for the high Tn5 concentration (median QS scattered = 0.28 ± 0.30, median QS block = 0.20 ± 0.30, median QS limiting dilution = 0.13 ± 0.25 for 6.5 nL Tn5; Figure D.4). We speculate that this is due to the selection of cells based on three images in the actively selected cells vs. two images for cells dispensed by limiting dilution. For the low Tn5 concentration, no significant improvement was detected since almost all cells failed in both conditions (Figure D.4). While we observed an improvement in quality in the actively selected cells, the overall library quality remained poor (median QS = 0.17 ± 0.24). In contrast, libraries built from gDNA or crude lysate produced high-quality copy number profiles.

Based on this observation, we speculated that the lysis step was not sufficient to expose the DNA for efficient tagmentation and sought to investigate a range of lysis extensions at 4 °C (2h, 4h and 22h) followed by a 1h heat lysis (Figure 4.4c) to improve the single-cell library quality. For all experimental conditions, except for the low Tn5 concentration at 22 hours, the extended lysis improved CN-quality significantly (median QS 0h = 0.12 ± 0.35, 2h = 0.65 ± 0.4, 4h = 0.80 ± 0.39, 22h = 0.64 ± 0.39; Figure D.4). We also repeated the overnight lysis in combination with the low Tn5 concentration in the next experiment and observed the same improvement. Based on this we speculated that an undetected error during spotting might have caused the poor quality initially.
As mentioned, we repeated two extended lysis times (2 hours and 19 hours at 4 °C) over a range of protease concentrations and found that increasing protease concentration beyond the amount used in the microfluidic device had little impact on library quality (Figure 4.4d, Figure D.4). We determined that an overnight (19 hours) lysis at 4 °C in combination with the lowest protease concentration provided the best overall performance (Figure 4.4d panel ix; median QS for 2.2 nL Tn5 = 0.93 ± 0.30; 3.5 nL Tn5 = 0.92 ± 0.20; 6.5 nL Tn5 = 0.91 ± 0.37). In order to break up the protocol so that it can be easily completed over two working days, we next investigated the effects of storing isolated single cells and nuclei on the open-well device. After dispensing and imaging, chips were sealed and stored at -20 °C. While the median library quality was slightly worse for both cells and nuclei (Figure 4.3b; median QS cells fresh = 0.88 ± 0.17; cells stored QS = 0.75 ± 0.31; nuclei fresh QS = 0.89 ± 0.41; nuclei stored QS = 0.87 ± 0.37), the changes were not significant (KW tests, Figure 4.3d).

Figure 4.4 Key parameter optimization for DLP+. Effect of (a) protease in lysis buffer, (b) cell dispensing method, (c) lysis time, (d) lysis time and protease concentration, over three tagmentation conditions on the library quality. Colour represents Tn5 concentration: blue 2.2 nL Tn5, orange 3.5 nL Tn5, green 6.5 nL Tn5. Numbers of cells in each condition are indicated above the corresponding violin.

Figure 4.5 Effect of splitting experimental condition by cell state on library quality. Each row is a tested key parameter shown in Figure 4.4; panels show combined tagmentation conditions (2.2, 3.5 and 6.5 nL Tn5) split by cell state (live/dead). Colours represent cell state: green live, orange dead. Numbers of cells in each condition are indicated above the corresponding violin.
Experimental conditions are as follows: a) i) cells dispensed in a limiting dilution–like pattern with CellenOne, ii) cells dispensed in a block of wells by CellenOne, iii) cells dispensed by limiting dilution; b) i) lysis buffer contains protease, ii) lysis buffer contains no protease; c) i) 0 hour cold lysis, ii) 2 hour cold lysis, iii) 4 hour cold lysis, iv) 19 hour cold lysis; d) i) 2 hour cold lysis, 2 µL protease, ii) 2 hour cold lysis, 5 µL protease, iii) 2 hour cold lysis, 10 µL protease, iv) 19 hour cold lysis, 2 µL protease, v) 19 hour cold lysis, 5 µL protease, vi) 19 hour cold lysis, 10 µL protease.

Finally, we examined the impact of dead cells on library quality. We used imaging information to split all examined experimental conditions by cell state (live vs. dead; Figure 4.5) and found a significant improvement in the quality of live cells over dead cells (Figure D.5). It is however noteworthy that we were able to detect some high-quality CN-profiles in the dead population. This may be related to our inability to distinguish between early and late apoptotic cells with our current staining approach. In general, dead cells had a high dropout rate (69% of dead wells failed to produce a library), whereas live cells had a high success rate (94% of live wells produced libraries). Finally, we note that even our best experimental conditions produce data sets with a subpopulation of cells that have non-integer alterations within the live population, as has been previously reported by our group and others (1,44).

4.4.4 Evaluation of library fidelity

4.4.4.1 Bootstrap analysis of GM18507 cells with 2h and overnight lysis

We next investigated how the lysis time affects the coverage breadth of merged single-cell libraries.
We carried out bootstrap sampling and merging analysis of single-cell libraries prepared from the GM18507 cell line using the 2-hour and overnight lysis protocols and compared the results to single-cell libraries prepared from the same cell line using our previously published microfluidic DLP (MF-DLP) protocol (1) (Figure 4.6a). To provide a fair comparison, we only used live cells, removed duplicates and downsampled (DLP+ 2 hours to 72.7%, DLP+ overnight to 88.9%, and MF-DLP (1) to 8.2% of unique non-duplicate reads) each single-cell library to approximately 0.01X mean coverage depth (Figure 4.6b). Bootstrap sampling (1-8 cells n = 360 draws, 16 cells n = 240 draws, 32 cells n = 120 draws, 64 cells n = 60 draws, 128 cells n = 40 draws) and merging of 64 single-cell libraries resulted in a median coverage breadth of 43.7%, 44.4% and 43.3% in the MF-DLP, DLP+ 2 hours and DLP+ overnight libraries, respectively. The differences in coverage breadth are likely a result of different coverage depths in the merged genomes since both characteristics are closely correlated (Figure 4.6a,b). Given that differences in coverage breadth between the lysis conditions were not significant (KW test: p = 0.6766), we determined that the longer lysis condition was preferable, since it permitted the protocol to be split between two reasonable-length work days.

Figure 4.6 Effect of lysis time on coverage breadth of merged single-cell genomes. (a), (b) Bootstrap sampling of single-cell GM18507 libraries prepared using 2-hour and overnight cold lysis conditions; DLP+ 2 hours (n = 148), DLP+ overnight (n = 133), MF-DLP (1) (n = 122). Single-cell libraries were downsampled to a similar mean coverage depth. Coverage breadth is shown in (a) and coverage depth in (b). Box plots show median and quartiles, the whiskers show the remaining distribution, and dots indicate outliers. (c) Lorenz curves showing coverage uniformity for merged single-cell genomes.
Curves are median merged genomes from (a). Experimental condition and number of merged cells are indicated in the plot. Dotted black line indicates perfectly uniform genome.

Next we evaluated how the genome-wide coverage uniformity of our merged DLP+ libraries compared to the MF-DLP dataset. For each condition in the bootstrap analysis, we plotted one Lorenz curve for the merged genome with median coverage breadth and found that merged DLP+ genomes achieve comparable coverage uniformity to the MF-DLP libraries (Figure 4.6c). It has previously been shown that the MF-DLP single-cell libraries achieved equivalent coverage breadth and uniformity to a standard Nextera bulk genome of equivalent depth (1). It can, therefore, be reasoned that the merged DLP+ genomes are also comparable in quality to a bulk library. Combined, these results demonstrate that either DLP+ lysis condition sufficiently disrupts cell membranes and proteins and provides adequate access to the genomic DNA during library preparation, generating single-cell libraries with uniformity equivalent to microfluidic DLP.

4.4.4.2 Sequencing metrics of GM18507 cell line libraries sequenced on 12 HiSeq lanes

To evaluate the optimized tagmentation protocols over a range of coverage depths using a cell line with near normal karyotype, we prepared libraries from the GM18507 cell line using the optimized 2.2 nL and 3.5 nL tagmentation protocols and sequenced them on 6 HiSeq lanes each (Figure 4.7). From these we compared the sequencing metrics to our previously published MF-DLP GM18507 dataset (2.2 nL Tn5: mean 0.084 ± 0.037X depth per cell, n = 618 single cells; 3.5 nL Tn5: mean 0.099 ± 0.043X depth per cell, n = 603 single cells; MF-DLP GM18507 (1): 0.12 ± 0.035X depth per cell, n = 123 single cells). One MF-DLP library failed to produce enough reads and was excluded from analysis.
The remaining libraries were down-sampled to yield the same number of total mapped reads per condition (0.125, 0.25, 0.5, 1, 2, 4 million reads). At 4 million total mapped reads, the libraries reached 7.5% (MF-DLP (1)), 9.5% (2.2 nL Tn5) and 9.4% (3.5 nL Tn5) coverage breadth. To achieve the same number of total mapped reads, both DLP+ conditions required on average 5.3% (2.2 nL Tn5) and 7.1% (3.5 nL Tn5) more total reads compared to the MF-DLP dataset (Figure 4.7a) due to a higher proportion of unmapped reads (Figure 4.7b). The MF-DLP libraries, on the other hand, suffer from a very small insert size (Figure 4.7c). If the insert size is smaller than the read length of a sequencing run, the end of each read will contain sequences that originate from the sequencing adapter rather than the template. Since the sequence of adapters is known, these bases can be bioinformatically removed in a process called adapter trimming. Adapter trimming essentially reduces the read length of the microfluidic libraries, resulting in a lower coverage depth and breadth for the same number of raw reads (Figure 4.7d,e). On the other hand, the short inserts in the MF-DLP library produce a lower duplication rate. The prevalence of short reads is in part introduced during cluster formation on the Illumina flow-cell, since short fragments are favoured in amplification over longer fragments. Indeed, long fragments above 1 kb do not readily form clusters.

Figure 4.7 Sequencing metrics for DLP GM18507 libraries. Sequencing metrics for GM18507 cell line libraries built on the open-array platform (2.2 nL Tn5, n = 587; 3.5 nL Tn5, n = 571) and on a microfluidic device (n = 141, (1)). Single-cell libraries were downsampled to the same number of total mapped reads. Box plots show median and quartiles, the whiskers show the remaining distribution, and dots indicate outliers. The top column labels state the numbers of cells per condition. Colours represent experimental condition.
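The adapter-trimming effect described in this section can be made concrete with a small calculation. The sketch below assumes paired-end reads and ignores quality trimming; the 125 bp read length and the two insert sizes are hypothetical values chosen for illustration, not measurements from this study.

```python
def trimmed_read_length(insert_size: int, read_length: int) -> int:
    """Template bases left in one read after the adapter read-through is trimmed."""
    return min(insert_size, read_length)

def unique_bases_per_pair(insert_size: int, read_length: int) -> int:
    """Distinct template bases covered by a read pair (overlap counted once)."""
    return min(insert_size, 2 * read_length)

READ_LENGTH = 125  # hypothetical read length in bp

# A short insert loses read length to trimming; a long insert does not.
for insert in (80, 300):
    print(
        insert,
        trimmed_read_length(insert, READ_LENGTH),
        unique_bases_per_pair(insert, READ_LENGTH),
    )
```

For the same number of raw read pairs, the short-insert library in this toy example contributes 80 template bases per pair while the long-insert library contributes 250, which is the mechanism behind the lower depth and breadth per raw read discussed above.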
Therefore, if the size distribution is narrow and short, the entire library forms clusters equally well. Otherwise, longer fragments form clusters at a slower rate resulting in the enrichment of short fragments. This has important consequences for DLP libraries, since each location in the genome is only represented by fragments of the same insert size. As a result, short narrow inserts, as obtained on the microfluidic device, have a lower duplication rate compared to the longer fragments from the open arrays (Figure 4.7f). However, the 2.2 nL and 3.5 nL protocols seem to achieve a good compromise between insert size and duplicate rate, since both achieve a higher coverage breadth and depth when compared to the MF-DLP data set. We showed in the previous chapter that the MF-DLP dataset outperforms both WGA4 and C-DOP-L libraries (Figure 3.6). Since libraries prepared on the open-array platform achieved higher coverage breadth and depth than the MF-DLP libraries, it can be reasoned that DLP+ would also outperform WGA4 or C-DOP-L; since we did not have suitable data sets from these methods, a direct comparison was not performed.

4.4.4.3 PCR bootstrap analysis and GC bias

We finally set out to determine the effect of GC-bias on our ability to merge single-cell genomes effectively. GC-bias is a library characteristic that reflects the correlation between coverage depth of a specific genomic location and its GC-content. Strong GC-bias introduced during library preparation can lead to the under-representation of some genomic regions and over-representation of others. This not only complicates CN inference, it can also result in the dropout of AT- and GC-rich regions due to low coverage, thereby undermining SNV and breakpoint inference in merged clonal or bulk-equivalent genomes (157–159).
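The definition of GC-bias given above (a correlation between the coverage of a region and its GC-content) can be illustrated in a few lines. This is a minimal sketch using a plain Pearson correlation over genomic windows; the window sequences and read counts are invented toy data, and the analysis pipeline used in this work may quantify GC-bias differently.

```python
import math

def gc_fraction(sequence: str) -> float:
    """Fraction of G and C bases in a sequence."""
    sequence = sequence.upper()
    return sum(base in "GC" for base in sequence) / len(sequence)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical windows with increasing GC-content and their read counts.
toy_windows = ["ATATATAT", "ATGCATAT", "GCGCATAT", "GCGCGCAT", "GCGCGCGC"]
toy_coverage = [4, 6, 9, 12, 15]

toy_gc = [gc_fraction(w) for w in toy_windows]
toy_bias = pearson(toy_gc, toy_coverage)
print(toy_gc, round(toy_bias, 3))
```

A coefficient near zero would indicate coverage that is independent of GC-content, whereas the strongly positive value in this toy example corresponds to a library in which GC-rich regions are over-represented.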
We prepared DLP+ libraries using a range of Tn5 concentrations and PCR cycles (Figure D.6) and observed an increase of the average GC-content in our single-cell libraries with increasing Tn5 concentration (8 PCR cycles with 2.2 nL Tn5 (2.2nL/8PCR) = 47.1%; 3.5 nL Tn5 (3.5nL/8PCR)  = 52.7%; 6.5 nL Tn5 (6.5nL/8PCR) = 61.5%; Figure 4.8c). In contrast, the increased number of PCR cycles had a very small effect on the GC-content (11 PCR cycles with 2.2 nL Tn5 (2.2nL/11PCR)  = 49.4%, Figure 4.8c). To determine if the increased GC bias compromises our ability to generate a high-coverage merged genome, we carried out bootstrap sampling and merging of single-cell libraries from the different experimental conditions and compared the results to our microfluidic GM18507 DLP dataset (1). Before the bootstrap analysis, all single-cell libraries were downsampled to achieve the same mean coverage depth of 0.05X (Figure 4.8b; KW test, p= 0.16). Merging 64 downsampled single-cell libraries resulted in 3.16X median coverage depth and 90% median coverage breadth across all libraries (Figure 4.8a,b). 
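The merged depths reported in the bootstrap analyses follow directly from the per-cell depth, and an idealized upper bound on breadth can be attached to them. The sketch below uses the standard Poisson approximation for uniformly random sampling, in which the covered fraction at depth c is 1 − e^(−c). That approximation is our illustrative assumption, not a model used in this work; real libraries fall below it because of GC-bias, duplicates and unmappable regions.

```python
import math

def merged_depth(n_cells: int, per_cell_depth: float) -> float:
    """Expected depth after merging n_cells libraries of equal depth."""
    return n_cells * per_cell_depth

def ideal_breadth(depth: float) -> float:
    """Fraction of the genome covered at least once under uniform random sampling."""
    return 1.0 - math.exp(-depth)

# (cells merged, per-cell depth, reported merged depth or None, reported breadth)
reported = [
    (64, 0.05, 3.16, 0.90),    # GC-bias bootstrap, median across libraries
    (64, 0.01, None, 0.437),   # lysis-time bootstrap, MF-DLP median breadth
]

for n_cells, depth, observed_depth, observed_breadth in reported:
    expected = merged_depth(n_cells, depth)
    basis = observed_depth if observed_depth is not None else expected
    print(
        f"{n_cells} cells at {depth}X: expected depth {expected:.2f}X, "
        f"ideal breadth {ideal_breadth(basis):.1%}, reported {observed_breadth:.1%}"
    )
```

The expected merged depth of 64 cells at 0.05X is 3.2X, close to the 3.16X median reported above, and the reported breadths sit a few percentage points below the idealized bound, as expected for real libraries.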
However, the 6.5nL/8PCR and 2.2nL/11PCR conditions had significantly lower genome coverage breadth (86.2% and 89.9%, respectively) than the MF-DLP dataset and the 2.2nL/8PCR and 3.5nL/8PCR conditions (90.6%, 91.3%, and 90.7%, respectively; KW test, p = 1.542e-13) at the same coverage depth (KW test, p = 0.6974). A post-hoc Dunn’s test with Benjamini-Hochberg correction showed that all comparisons between the MF-DLP, 2.2nL/8PCR, and 3.5nL/8PCR conditions on the one hand and the 6.5nL/8PCR and 2.2nL/11PCR conditions on the other were significant, whereas there was no significant difference among the MF-DLP, 2.2nL/8PCR, and 3.5nL/8PCR conditions (see Table 4.2). Finally, pooling 128 cells at a mean coverage depth of 0.05X per cell resulted in 96.9% coverage breadth at an aggregated depth of 6.35X for the 2.2nL/8PCR protocol. In comparison, the 6.5nL/8PCR condition achieved significantly less genome coverage (93.2%; results of KW tests in Table 4.2) at the same depth (KW test, p = 0.3302). This suggests that the higher GC bias associated with the increased Tn5 concentration indeed reduced coverage breadth.

Figure 4.8 Effect of Tn5 concentration and PCR on coverage breadth of merged single-cell genomes
(a), (b) Bootstrap sampling of single-cell GM18507 libraries prepared using a range of Tn5 concentrations and PCR indexing cycles on the open-array and compared to the MF-DLP dataset (1); DLP+ 2.2 nL Tn5, 8 PCR (n = 188), 3.5 nL Tn5, 8 PCR (n = 190), 6.5 nL Tn5, 8 PCR (n = 197), 2.2 nL Tn5, 11 PCR (n = 198), and MF-DLP (1) (n = 122). Single-cell libraries were downsampled to a similar mean coverage depth. Coverage breadth of merged single-cell genomes is shown in (a) and coverage depth in (b). Box plots show median and quartiles, the whiskers show the remaining distribution, and dots indicate outliers. (c) GC-content of single-cell libraries split by the experimental conditions shown in (a), (b). Blue graphs show normalized GC-content for a single-cell genome. An unbiased genome would produce a straight line at 1.0 normalized coverage. The top panel labels state the experimental condition. Red distributions represent the probability of sampling a 100 bp fragment from the reference genome with the GC-content indicated on the x-axis. (d), (e) Lorenz curves showing genome-wide coverage uniformity of merged single-cell genomes. Curves are median merged genomes from (a). (d) Overlay of experimental conditions at 64-cell merged depth; (e) detailed Lorenz curves for each condition. Dotted curves indicate 64 merged single cells from the microfluidic device. The dotted straight black line indicates a perfectly uniform genome.

To evaluate genome-wide uniformity, we again plotted Lorenz curves for each condition in the bootstrap analysis. We found no major difference between the microfluidic dataset and the 2.2 nL and 3.5 nL Tn5 conditions; however, the 6.5 nL Tn5 condition is considerably biased, with 128 merged single cells achieving coverage breadth and uniformity comparable to those of 64 merged single-cell genomes from the MF-DLP dataset. It should be noted that the MF-DLP dataset was previously shown to produce breadth and uniformity equivalent to those of a bulk genome at the same coverage depth (1).

4.4.5 Copy number analysis of a large near-diploid cell line dataset

To test our optimized DLP+ protocol at scale, we sequenced 2188 indexed libraries from the near-diploid male lymphoblastoid cell line GM18507 split across 3 experiments (Table 4.3). Prior to library construction, we identified 294 dead and 1600 live single cells by fluorescent microscopy; the remaining 294 libraries were controls or flagged as containing contamination or multiple cells. We excluded live cells with fewer than 250,000 total mapped reads and inferred copy-number profiles using HMM (n = 1409 cells; Figure 4.9). CN profiles can be clustered into three major classes: (1) diploid (n = 1156), (2) rare integer-valued CN alterations (n = 176), and (3) noisy non-integer profiles (n = 77).
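The three classes listed above can be made concrete with a small sketch. The classification rule below is a hypothetical heuristic written for illustration only; it is not the clustering procedure used in this work. It assumes that each profile is summarized by its autosomal segment medians (chromosome X is left out because this male cell line carries a single copy), and the thresholds `tol` and `max_noisy_fraction` are arbitrary assumptions.

```python
def classify_profile(segment_medians, ploidy=2, tol=0.2, max_noisy_fraction=0.1):
    """Assign a copy-number profile to one of three illustrative classes.

    segment_medians: autosomal segment medians in copy-number units.
    tol and max_noisy_fraction are hypothetical thresholds.
    """
    # Segments whose median sits far from any integer copy-number state.
    off_integer = [m for m in segment_medians if abs(m - round(m)) > tol]
    if len(off_integer) / len(segment_medians) > max_noisy_fraction:
        return "noisy"
    if all(round(m) == ploidy for m in segment_medians):
        return "diploid"
    return "integer_alteration"

# Class sizes reported in the text for the filtered GM18507 cells.
class_counts = {"diploid": 1156, "integer_alteration": 176, "noisy": 77}
total_cells = sum(class_counts.values())                           # 1409
non_diploid_fraction = 1 - class_counts["diploid"] / total_cells   # about 0.18
```

The closing arithmetic simply confirms that the three reported class sizes add up to the 1409 cells that passed the read filter, and that roughly 18% of these cells fall outside the diploid class.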
As expected, all diploid cells had a single copy of chromosome X in an otherwise diploid genome (Figure D.7a). For chromosome Y, 18 cells had a total or q-arm loss and 3 cells had a gain. The second class of CN profiles is dominated by cells with rare integer alterations in an otherwise normal genome (Figure D.7b). Within this class of rare events, 59 events were recurrent (present in at least 2 cells; Figure D.8). These include cells with abnormal gain or loss of entire chromosomes (Figure D.8a) and cells with small alterations (Figure D.8b,c). The lack of contamination in our negative control reactions (Figure D.10), the clear “step-wise” or integer nature of copy number segment medians in otherwise diploid genomes, and the recurrence of clonal events detected across multiple experiments suggest that many of these alterations represent real, rare events. Interestingly, cells with a possible mis-segregation event in Chr 16q followed by clonal expansion (1-copy gain (n=6) and 1-copy loss (n=7); Figure D.9a-c) were also detected in 4 cells in our previous DLP microfluidic study (2-copy gain (n=1) and 1-copy loss (n=3); Figure D.9d). The frequency of cells with a 1-copy loss of Chr 16q was slightly higher than that of cells with gains of the same region, which might be linked to the presence of one or more tumour suppressor genes on this chromosome arm (160). More importantly, it shows that we were able to reliably and reproducibly detect a minor sub-population of cells that makes up less than 1% of the analyzed population. Finally, the third class of cells featured a multitude of non-integer alterations across the entire genome (Figure 4.9c). Such profiles have been reported in previous studies (44) and can be easily identified and filtered. We and others have speculated that these alterations might be related to processes such as DNA replication.
Figure 4.9 Single-cell copy number profiles from GM18507, an immortalized normal lymphoblastoid cell line
Heatmap of integer copy number profiles (148) for GM18507 cells (filtered live single cells) with a minimum of 250,000 mapped reads (n = 1409). Rows correspond to cells, columns to genomic bins. Example profiles of nearly diploid, rare CN events, and noisy cells are shown in Figure D.7. Purple arrows highlight clonal CNA (Figure D.8), green arrows highlight cells with alterations in Chr 16q (Figure D.9). Colours in the heatmap correspond to integer HMM copy-number states, and the hcluster tree shows dissimilarity distance.

4.4.6 Analysis of single-cell genomes from the 184-hTERT TP53 null isogenic cell-line pair

Finally, we applied DLP+ to investigate 184-hTERT cells from an early and a late wild-type (WT) passage (passage 21 and 51, respectively) along with cells from an isogenic TP53 monoclonal knockout line. Fluorescent microscopy was used to confirm single-cell occupancy and to determine cell state (live/dead) prior to cell lysis and library construction. In each experiment, GM18507 cell controls, gDNA positive controls, no-cell controls (NCC), and no-template controls (NTC) were included. In total, 11,315 libraries were built, including 1879 early WT, 1878 late WT, and 6596 TP53 null live single cells (Table 4.4). Libraries were initially sequenced on 6 HiSeq lanes, resulting in a median coverage depth of less than 0.01X. The low coverage was sufficient to assess library quality and investigate clonal structures that were defined by large copy-number alterations. However, additional sequencing depth is needed to infer CN alterations at higher resolution, determine the phylogeny of the population, and identify other variants, such as SNVs, in merged clonal genomes.

For the initial analysis, we filtered live cells and inferred copy-number profiles using HMM.
However, due to the low coverage depth, we lowered the read filter from the 250,000 total mapped reads used in the previous analysis to 100,000 and increased the bin size to 1 Mb.

Figure 4.10 184-hTERT wild-type passage 21 single-cell heatmap
Heatmap of integer copy number profiles (148) for 184-hTERT-L2 cells (passage 21) filtered with a 100k mapped read cutoff (n = 1548). Rows correspond to cells, columns to genomic bins (1 Mb).

Figure 4.11 184-hTERT wild-type passage 51 single-cell heatmap
Heatmap of integer copy number profiles (148) for 184-hTERT-L2 cells (passage 51) filtered with the same 100k mapped read cutoff as 184-hTERT-L2 passage 21 (n = 1443). Rows correspond to cells, columns to genomic bins (1 Mb).

The majority of single-cell libraries built from the early WT passage had a flat genome without CN alterations (n = 1167, Figure 4.10). The remaining libraries from this passage harboured rare and clonal integer-valued CN alterations (for example: 4 cells with Chr 1q loss; 4 cells with Chr 9q loss; 2 cells with Chr 13 gain; 4 cells with Chr 15 loss; 7 cells with Chr 20 gain; 3 cells with a 2-copy Chr 20q gain; 3 cells with Chr 4q loss; 2 cells with Chr 18 gain; and 10 cells with losses in Chr X) but did not form any major sub-clones. In the later passage of WT cells, two major sub-clones emerged, defined by a one-copy gain of Chr 20 and a one-copy gain of the long arm of Chr 11, respectively (Figure 4.11). Visual examination of the heatmaps, however, suggested a much more complex clonal structure, such as a sub-population of the Chr 11q clone with an additional small loss in Chr 22 or a sub-population with an additional Chr X loss. Similar to the late-passage WT cells, the TP53 null 184-hTERT line also had several major sub-clones, with an apparent ancestral clone containing only a loss and gain in the short arm of Chr 17, the region where TP53 is located (Figure 4.12).
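To give a feel for what the 100,000-read filter and 1 Mb bins mean for copy-number inference, the following back-of-the-envelope sketch estimates the expected number of reads per bin and the resulting counting noise. It assumes a genome size of roughly 3.1 Gb, uniformly distributed reads, and pure Poisson sampling noise; these are simplifying assumptions, and the function names are illustrative.

```python
import math

GENOME_SIZE_BP = 3.1e9  # assumption: approximate size of the human genome

def reads_per_bin(total_reads, bin_size_bp, genome_size_bp=GENOME_SIZE_BP):
    """Expected reads per bin if reads were spread uniformly over the genome."""
    n_bins = genome_size_bp / bin_size_bp
    return total_reads / n_bins

def poisson_cv(mean_count):
    """Coefficient of variation of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(mean_count)

low = reads_per_bin(100_000, 1_000_000)    # cell at the 100k read cutoff
high = reads_per_bin(250_000, 1_000_000)   # cell at the earlier 250k cutoff
cv_low = poisson_cv(low)
cv_high = poisson_cv(high)
```

Under these assumptions, a cell at the 100,000-read cutoff has on the order of 30 reads per 1 Mb bin, so per-bin counts fluctuate by close to 20% from sampling noise alone. A one-copy change in a diploid region shifts the expected count by 50%, which is straightforward to call over many consecutive bins but much harder for an alteration that spans only a few bins.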
Due to the low sequencing depth, the loss/gain region was not assigned to the correct copy-number state in many cells, even though the raw binned read counts suggest the presence of this alteration (Figure 4.13). Hierarchical clustering and visual inspection suggested 5 major sub-clones of the ancestral TP53 knockout line distinguished by additional CN alterations in: (1) Chr 20q (one-copy gain, n = 603); (2) Chr 20q (two-copy gain, n = 251); (3) Chr 9p, Chr 10p and Chr 19q (one-copy losses, n = 1131); (4) Chr 4p and Chr 19 (n = 608); and (5) Chr 13 and Chr 20 (one-copy gains, n = 135). However, the clonal structure is certainly more complex. For example, a subset of cells with the Chr 19, 9p and 10p alterations also have a Chr 19p gain.

Figure 4.12 184-hTERT TP53 null single-cell heatmap
Heatmap of integer copy number profiles (148) for 184-hTERT-L2 cells (with TP53 knockout) filtered with the same 100k mapped read cutoff as 184-hTERT-L2 passage 21 (n = 5302). Rows correspond to cells, columns to genomic bins (1 Mb).

Inferring the full clonal structure requires a detailed phylogenetic analysis of the entire dataset. However, the low coverage depth and the resulting incorrect assignment of copy-number states introduce bias that makes the interpretation of the current results problematic. To improve the sensitivity and fidelity of CN inference and ultimately the clonal assignment of single cells, additional sequencing is required for all libraries. This is currently underway and was not available for inclusion in this thesis.

Figure 4.13 Example CN profiles from low-coverage 184-hTERT libraries
Examples of integer copy number profiles for single cells from the TP53 knockout 184-hTERT-L2 cell line with an ancestral CN alteration in Chr 17 (a). Due to the low sequencing depth, the small CN alteration in Chr 17 is only partially (b) or not at all (c) correctly identified. Black arrows highlight missing segment calls. Colours correspond to the CN state assignment for a given genomic segment. Black lines indicate segment medians.

4.5 Discussion

In this chapter, we demonstrate truly large-scale and robust single-cell whole-genome sequencing using direct library preparation (DLP) in open microwells (Table 4.5). Robust isolation of individual cells and scalable processing of genetic material are integral to single-cell analysis. We address both requirements by adapting off-the-shelf products to identify and isolate single cells in real time during cell deposition into open nanolitre arrays. Compared to existing methods, this overcomes the well occupancy distribution imposed by limiting dilution and enables selection based on cell geometry (size, circularity, elongation). Furthermore, the small shear volume minimizes contamination in the carrier fluid compared to single cells isolated by FACS (piezo dispenser ~400 pL vs. FACS ~2 nL droplet volume). Image information acquired during cell spotting and from whole-chip fluorescence scans can be linked to single-cell libraries or used to selectively process only cells of interest. Finally, through the optimization of key reaction parameters, we obtained high-quality performance comparable to that of DLP libraries built on the customized microfluidic device described in the previous chapter. Application of the newly optimized protocol to an unsorted population of near-diploid lymphoblastoid cells (GM18507) revealed that while the majority of cells have a nearly diploid genome (n = 1156), private as well as shared CN alterations are observed in ~18% of the population. These populations would not be detectable in bulk measurements and highlight the importance of single-cell approaches.

Two recent studies (161,162) used large genomic datasets to investigate the evolutionary selection principles that drive tumour evolution.
These studies emphasize the view that cancer evolution is driven by somatic mutations that give rise to positive selection, and purport that negative selection is largely absent (161). This conclusion likely reflects the fact that bulk genome analysis cannot detect mutations that are present at a low level and therefore misses a large fraction of the mutational diversity that exists in any tumour. Our findings indicate that there exists a large complement of rare mutations, visible only using high-depth single-cell analysis, that do not get propagated during tumour evolution. While selection must always be viewed as a result of relative fitness, more careful analysis of such mutations may prove useful in identifying mutational events that expose weaknesses in tumours. Future studies accounting for intra-tumour heterogeneity and rare mutations will likely modify the estimates of positive and negative selection (163), with large single-cell datasets being required to study the underlying evolutionary principles of cancer, both positive and negative, in more detail.

As a demonstration of the use of our method for large-scale analysis and tracking of genomic evolution in tumour cells, we constructed and analyzed a total of 10,665 single-cell libraries (184-hTERT-L2), setting a new bar for single-cell genomics throughput and performance. It should be emphasized that the DLP+ method offers unprecedented throughput and flexibility; sample sizes and sequencing depth can be adjusted to suit the needs of particular experiments. Small numbers of cells may be pooled for sequencing at higher depth, or thousands of cells can be sequenced in multiplex to provide a high-resolution overview of copy number heterogeneity. DLP+ obviates the need to sequence bulk genomes, as single-cell libraries may simply be pooled to permit SNV and breakpoint inference at the population level.
In conclusion, DLP+ is an optimized library preparation protocol without WGA preamplification that permits high-fidelity sequencing of thousands of single-cell genomes.

4.6 Tables

Table 4.2 Statistics table with KW tests for 64 and 128 merged single cells

1. Kruskal-Wallis test (cell count: 64)
variable       Coverage breadth
chi-squared    66.056
df             4
p-value        p = 1.54e-13

Dunn's test (post hoc) with Benjamini-Hochberg correction
Comparison                  Z            P.unadj     P.adj
2.2nL/8PCR - 2.2nL/11PCR    4.054832     5.02e-05    1.25e-04
2.2nL/8PCR - 3.5nL/8PCR     0.861107     3.89e-01    4.32e-01
2.2nL/11PCR - 3.5nL/8PCR    -3.193725    1.40e-03    2.81e-03
2.2nL/8PCR - 6.5nL/8PCR     6.981505     2.92e-12    2.92e-11
2.2nL/11PCR - 6.5nL/8PCR    2.926673     3.43e-03    5.71e-03
3.5nL/8PCR - 6.5nL/8PCR     6.120399     9.33e-10    4.67e-09
2.2nL/8PCR - MF-DLP         1.209910     2.26e-01    2.83e-01
2.2nL/11PCR - MF-DLP        -2.844923    4.44e-03    6.35e-03
3.5nL/8PCR - MF-DLP         0.348803     7.27e-01    7.27e-01
6.5nL/8PCR - MF-DLP         -5.771596    7.85e-09    2.62e-08

2. Kruskal-Wallis test (cell count: 128)
variable       Coverage breadth
chi-squared    68.303
df             3
p-value        p = 9.85e-15

Dunn's test (post hoc) with Benjamini-Hochberg correction
Comparison                  Z            P.unadj     P.adj
2.2nL/8PCR - 2.2nL/11PCR    4.722072     2.33e-06    4.67e-06
2.2nL/8PCR - 3.5nL/8PCR     3.442894     5.76e-04    6.91e-04
2.2nL/11PCR - 3.5nL/8PCR    -1.279178    2.01e-01    2.01e-01
2.2nL/8PCR - 6.5nL/8PCR     8.164966     3.22e-16    1.93e-15
2.2nL/11PCR - 6.5nL/8PCR    3.442894     5.76e-04    8.63e-04
3.5nL/8PCR - 6.5nL/8PCR     4.722072     2.33e-06    7.00e-06

• chi-squared - Chi-squared value
• df - Degrees of freedom
• p-value - Calculated probability

Table 4.3 Sequencing metrics for indexed libraries from immortalized normal lymphoblastoid cell line GM18507
(Shown transposed: one column per chip.)

Chip_id                    A90689B    A90694A    A90694B
Coverage_breadth           0.0156     0.0106     0.0112
Coverage_depth             0.0159     0.0108     0.0115
Total_mapped_reads         526389     544429     373650
Percent_duplicate_reads    0.0335     0.1005     0.0363
Live_cell                  618        157        825
Dead_cell                  0          46         248
Empty                      0          21         61
Ambiguous                  0          8          20
gDNA                       29         0          13
NCC                        58         0          41
NTC                        29         0          14
Total_sublibraries         734        232        1222

• Chip id - Identifier for the library
• Coverage breadth - Mean coverage breadth (fraction of the genome covered by at least one read)
• Coverage depth - Mean coverage depth (average number of reads per genomic position)
• Total mapped reads - Mean total number of mapped reads
• Percent duplicate reads - Total number of duplicate reads
• Live cell - Number of indexed libraries flagged as single live cells with fluorescence microscopy
• Dead cell - Number of indexed libraries flagged as single dead cells with fluorescence microscopy
• Empty - Number of indexed libraries flagged as empty wells with fluorescence microscopy
• Ambiguous - Number of indexed libraries flagged as ambiguous cell calls with fluorescence microscopy
• gDNA - Number of indexed libraries flagged as positive controls
• NCC - Number of indexed libraries flagged as no-cell controls
• NTC - Number of indexed libraries flagged as no-template controls
• Total sublibraries - Number of libraries pooled per chip

Table 4.4 Number of constructed indexed 184-hTERT libraries
(Shown transposed: one column per sample.)

Sample                                   WT Passage 21    WT Passage 51    TP53 null    Total
Single-cells (live)                      1879             1878             6596         10353
Single-cells (dead)                      82               88               142          312
gDNA control                             32               32               96           160
GM cell control cells (live)             14               19               100          133
GM cell control cells (dead)             14               5                12           31
NCC                                      32               31               97           160
NTC                                      32               32               96           160
others                                   x                6                x            6
Total reads per library (median)         330542           288984           361486       ---
hTERT cells pass filter                  1548             1443             5302         8293

• Sample - Identifier for the library
• Single-cell (live) - Number of indexed libraries flagged as single live cells with fluorescence microscopy
• Single-cell (dead) - Number of indexed libraries flagged as single dead cells with fluorescence microscopy
• gDNA control - Number of indexed libraries flagged as positive controls
• GM cell control cells (live) - Number of indexed libraries flagged as single live GM18507 control libraries
• GM cell control cells (dead) - Number of indexed libraries flagged as single dead GM18507 control libraries
• NCC - Number of indexed libraries flagged as no-cell controls
• NTC - Number of indexed libraries flagged as no-template controls
• Others - Number of indexed libraries flagged as ambiguous cell calls with fluorescence microscopy
• Total reads per library - Total number of demultiplexed reads assigned to the library
• hTERT cells pass filter - Number of single-cell libraries which passed the total mapped read filter criteria

Chapter 5: Conclusion

5.1 Contribution to knowledge

Microfluidic technology has become an important tool in biomedical research, benefiting from high reaction densities, low sample and reagent consumption, and excellent properties to scale and automate processes. The work described in this thesis focuses on developing and adopting microfluidic technology for single-cell genomic measurements and has impacted the following areas:

Design elements in multilayer soft lithography
Generally, microfluidic designs are unique assemblies of common design elements, such as chambers, valves, channels, interlayer connections, cell isolation geometries, and more. We added two designs to the catalogue that generally expand the functionality of microfluidic processing. First, we developed a method to integrate DNA oligonucleotides into a microfluidic chip. This enables the assembly of hundreds of unique reactions and the labelling of separate reaction products on chip, thereby solving the world-to-chip interface for applications where spotted reagents are stable when dried. Second, a novel inflatable microfluidic chamber was developed. This architecture supports programmable multistep assembly of reactions for the rapid development and optimization of protocols on a single device architecture, and is also compatible with solid-phase capture and purification methods.
Library preparation from single-cell genomes
We developed a direct library preparation method using transposase-mediated tagmentation of unamplified single-cell genomes. This approach avoids whole-genome amplification artefacts, such as polymerase errors or uneven genome coverage, and permits the robust identification of copy-number alterations at low sequencing depth. Combined with a workflow that first defines clones based on shared CNAs and merges single-cell genomes, other variants, such as SNVs and breakpoints, can be inferred. This approach addresses limitations of current single-cell protocols, is highly scalable, and matches throughput with the current bandwidth of sequencing technology.

Studying tumour heterogeneity
We implemented the direct library preparation method on two scalable microfluidic platforms to build sequencing libraries from single cells at larger scale. Measurements of genomic mutations at single-cell resolution can be particularly informative in cancer research to investigate the clonal dynamics and evolution that result from mutations accumulated in single cells. Even though this project was focused on technology development, our experiments have provided new insights into mutational patterns and clonal evolution, both in cell lines and in xenograft tissue.

This project has led to four poster presentations at major single-cell conferences, one published paper (1), two publications in preparation, and contributed to the filing of one patent application (PCT/CA2016/000031) and one invention disclosure. At the BCCRC, a team of more than ten people is now using the developed methods to build single-cell libraries at a rate of roughly 10,000 cells per week to uncover the relationship between mutational patterns and cellular fitness or disease trajectory. Furthermore, the open-array technology has been transferred to the Michael Smith Genome Sciences Centre, where it is offered as a single-cell library preparation service.
Finally, aspects of the open-array approach have contributed to several projects in the Hansen Lab, including one publication describing single-cell WGA in virtual wells (76), and two papers in preparation studying RNA and miRNA expression at single-cell resolution.

5.2 Future recommendations and concluding remarks

Alternative single-cell workflows
Work in this thesis has focused on methods for single-cell genomic library preparation, but both of the described microfluidic formats provide the necessary flexibility to implement additional single-cell workflows. In particular, compared to the closed microfluidic device, the open-array implementation has benefits regarding simplicity of use and scalability. Protocols implemented on the closed microfluidic device, on the other hand, could benefit from solid-phase capture followed by wash steps and buffer exchanges. Among protocols that could be readily adapted, ATAC-seq (72,95,137) is the most similar to our current DLP protocol. The method aims to identify regions of the genome that are unpacked and free of DNA-binding proteins or nucleosomes. The major difference between ATAC-seq and our DLP protocol is a milder lysis condition that only bursts the cell and nuclear membranes while preserving the nucleosomes and histones. While altering the lysis buffer is trivial, it has yet to be tested whether our current chip handling protocols (including spin steps) disrupt the DNA packing structure.

Although the microwell DLP workflow presented in this thesis moved the bottleneck in single-cell genomic analysis from library preparation to sequencing, the sequencing bandwidth is steadily increasing. As sequencing becomes more affordable, greater genome coverage can be achieved for a large number of single cells. DLP libraries, however, suffer from a 50% loss of unsuitable inserts, which limits the achievable sequencing depth and breadth. Gertz et al.
described a transposase-mediated workflow to build directional RNA-seq libraries (136). The same approach could be adapted to overcome this 50% loss. This procedure could further be combined with the strand-seq methodology (63), in which the second strand is degraded, resulting in strand-specific directional genomic libraries. This workflow could be particularly attractive for the identification of structural variation and the recovery of haplotype information. Strand-seq, among other techniques, has been used to comprehensively assess haplotype-resolved structural variations in the human genome (164). Combined with our microwell approach, a modified version of the strand-seq library preparation could provide the full spectrum of genetic variation for each copy-number clone defined at single-cell resolution. Finally, similar open-array implementations have been used in commercially available systems and research applications to profile the genome and transcriptome of single cells (76,110,112,165) and should be readily compatible with our open-array system as well.

Single cell isolation
A significant challenge in single-cell studies is the robust isolation of cells. Many cell lines, biopsies, or other tissue samples do not readily form single-cell suspensions, and elaborate cell dissociation protocols have to be used, which can lead to significant cell losses. Excessive sample loss is especially problematic when analyzing biopsies from primary tissue, since only very limited starting material is available. During cell pre-processing, filter and spin steps can also damage cell and nuclear membranes and could lead to leakage of DNA content into the carrier fluid. Fixation might stabilize the membranes and contain the DNA, but, in our hands, initial tests using ethanol-based protocols led to cell clumping.
Agents such as EDTA or Ficoll (111) might prevent cell clumping, but downstream reaction inhibition could restrict their use or limit the concentration of some additives. On the closed microfluidic device, on-chip filters were placed upstream from the cell traps, but serial loading is still prone to clogging and thus failure of entire sections of the chip. The minimum dimensions of the dispensing nozzle are an order of magnitude larger than those of the cell traps; however, clogging of the nozzle and loss of the droplet are still observed frequently. In summary, single-cell studies would greatly benefit from improved protocols to both form and stabilize single-cell suspensions, and to store them for additional cell isolation cycles.

Another area of interest is the combination of spatial information and mutational patterns. The PALM MicroBeam instrument (Zeiss) or LMD6/7 system (Leica) enables users to precisely mark regions of interest on fluorescent or bright-field images and to accurately extract these areas from a wide range of source materials, including live tissue, cryosections, and FFPE materials (166–169). The open geometry of the microwell array is ideally suited to capture extracted regions and process the genetic material in a scalable format.

Bioinformatic tools
An important area of further research will also be the development of bioinformatics tools that take advantage of the unique properties of pre-amplification-free genomic libraries. An important and extensively studied class of genomic aberration in cancer is single nucleotide variations (SNVs). Due to the error rate of sequencing (~1 in 1000-10,000), a high coverage depth has to be achieved to infer SNVs with confidence. Cellular heterogeneity further complicates inferring genomic aberrations in bulk measurements.
However, new bioinformatics tools that leverage the knowledge that reads are derived from different cells could differentiate between errors resulting from sequencing inaccuracies and genuine variants shared across single-cell genomes. In merged copy-number clonal genomes, this knowledge could also help to identify a sub-clonal structure that might be manifested in a unique SNV pattern.

Another characteristic of DLP libraries is that inserts have unique mapping locations (unique 5’ and 3’ sequences) and a 9 bp overlap between adjacent genomic regions introduced by the tagmentation reaction with Tn5 (126). At sufficient sequencing depth, these properties could be leveraged to assemble sequencing reads into long, phased contigs, enabling de novo assemblies and the mapping of structural variations and breakpoints from short next-generation sequencing reads. The tagmentation workflow circumventing the 50% loss of inserts mentioned above is essential for this approach.

While sequencing unsorted populations provides a more complete picture of the clonal composition, the lineage analysis can be confounded by cell state (e.g., dead, dividing, abnormal ploidy), type (normal or tumour cells), and library quality. Therefore, it will be important to develop methods to classify libraries and use this information to carefully control the input for the predictions of phylogenetic trees. Detecting fractions of dividing and apoptotic cells might also provide information on clonal fitness and enable clonal dynamic projections. The identification of rare normal cells could help to match normal and tumour tissue in order to identify genome-wide somatic variants unique to clonal events, all derived from the same library preparation and sequencing run. Future work could also look into statistical models to predict the number of cells needed to capture the clonal structure to varying degrees of detail.
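As a minimal illustration of what the simplest version of such a model might look like, the sketch below uses a binomial sampling model to estimate how many cells would need to be sequenced to observe a clone of a given frequency in at least `k` cells. The choice of `k = 2` mirrors the recurrence criterion used in Chapter 4 (an event present in at least 2 cells); the assumption that cells are sampled independently and without isolation bias, the 95% confidence level, and all function names are illustrative.

```python
import math

def prob_at_least(k, n, f):
    """P(X >= k) for X ~ Binomial(n, f): chance of seeing >= k clone cells."""
    p_fewer = sum(math.comb(n, i) * f**i * (1 - f)**(n - i) for i in range(k))
    return 1.0 - p_fewer

def cells_needed(f, k=2, confidence=0.95, n_max=1_000_000):
    """Smallest number of sampled cells giving P(X >= k) >= confidence.

    Assumption: cells are drawn independently from a population in which
    the clone of interest has frequency f.
    """
    for n in range(k, n_max + 1):
        if prob_at_least(k, n, f) >= confidence:
            return n
    raise ValueError("no sample size up to n_max reaches the requested confidence")

n_for_one_percent = cells_needed(0.01)        # clone at 1% frequency
p_in_1409_cells = prob_at_least(2, 1409, 0.01)
```

Under this simple model, several hundred cells are needed to see a 1% clone at least twice with 95% confidence, and the required number grows roughly in inverse proportion to the clone frequency. A practical model would also have to account for library failure rates and for errors in assigning cells to clones.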
Such models could use information from small exploratory experiments to provide an initial prediction and dynamically refine the forecast and increase confidence as additional information become available. Finally, while inferring CN alterations and determining clonal structure is informative, joint ‘omic measurements (170–173) from the same single cell might be needed to determine the effect of genomic mutations on the transcriptome, epigenome, and proteome. We imagine that this level of information might ultimately be necessary to advance our understanding of the clinical implications of clonal dynamics and evolution in cancer patients and help to devise a personalized treatment. Finally, we hope that the methods described in this thesis will stimulate further research and provide many opportunities for biologists to uncover real biological processes.  160  References 1. Zahn, H. et al. Scalable whole-genome single-cell library preparation without preamplification. Nat. Methods 14, 167–173 (2017). 2. Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009). 3. Burrell, R. A., McGranahan, N., Bartek, J. & Swanton, C. The causes and consequences of genetic heterogeneity in cancer evolution. Nature 501, 338–345 (2013). 4. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011). 5. Merlo, L. M. F., Pepper, J. W., Reid, B. J. & Maley, C. C. Cancer as an evolutionary and ecological process. Nat. Rev. Cancer 6, 924–935 (2006). 6. Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–313 (2012). 7. Nowell, P. C. The clonal evolution of tumor cell populations. Science 194, 23–28 (1976). 8. Aparicio, S. & Caldas, C. The implications of clonal genome evolution for cancer medicine. N. Engl. J. Med. 368, 842–851 (2013). 9. Testa, J. R., Mintz, U., Rowley, J. D., Vardiman, J. W. & Golomb, H. M. Evolution of karyotypes in acute nonlymphocytic leukemia. Cancer Res. 
39, 3619–3627 (1979). 10. Kallioniemi, A. et al. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 258, 818–821 (1992). 11. Goodwin, S., McPherson, J. D. & McCombie, W. R. Coming of age: ten years of next-161  generation sequencing technologies. Nat. Rev. Genet. 17, 333–351 (2016). 12. Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008). 13. Durbin, R. M. et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010). 14. Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013). 15. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013). 16. Sabarinathan, R. et al. The whole-genome panorama of cancer drivers. Preprint at bioRxiv. doi:10.1101/190330 (2017). 17. Shah, S. P. et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 461, 809–813 (2009). 18. Yachida, S. et al. Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature 467, 1114–1117 (2010). 19. Campbell, P. J. et al. The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature 467, 1109–1113 (2010). 20. Gerlinger, M. et al. Intratumor Heterogeneity and Branched Evolution Revealed by Multiregion Sequencing. N. Engl. J. Med. 366, 883–892 (2012). 21. Ding, L. et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 481, 506–510 (2012). 162  22. Landau, D. A. et al. Evolution and Impact of Subclonal Mutations in Chronic Lymphocytic Leukemia. Cell 152, 714–726 (2013). 23. Gundem, G. et al. The evolutionary history of lethal metastatic prostate cancer. Nature 520, 353–357 (2015). 24. Eirew, P. et al. 
Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution. Nature 518, 422–426 (2014).
25. Gillies, R. J., Verduzco, D. & Gatenby, R. A. Evolutionary dynamics of carcinogenesis and why targeted therapy does not work. Nat. Rev. Cancer 12, 487–493 (2012).
26. Navin, N. E. The first five years of single-cell cancer genomics and beyond. Genome Res. 25, 1499–1507 (2015).
27. Roth, A. et al. JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics 28, 907–913 (2012).
28. Shah, S. P. et al. The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature 486, 395–399 (2012).
29. Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421 (2012).
30. Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer. Nat. Methods 11, 396–398 (2014).
31. Ha, G. et al. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data. Genome Res. 24, 1881–1893 (2014).
32. Kops, G. J. P. L., Weaver, B. A. A. & Cleveland, D. W. On the road to cancer: aneuploidy and the mitotic checkpoint. Nat. Rev. Cancer 5, 773–785 (2005).
33. Gawad, C., Koh, W. & Quake, S. R. Single-cell genome sequencing: current state of the science. Nat. Rev. Genet. 17, 175–188 (2016).
34. Navin, N. E. Cancer genomics: one cell at a time. Genome Biol. 15, 452 (2014).
35. Macaulay, I. C. & Voet, T. Single cell genomics: advances and future perspectives. PLoS Genet. 10, e1004126 (2014).
36. Blainey, P. C. The future is now: single-cell genomics of bacteria and archaea. FEMS Microbiol. Rev. 37, 407–427 (2013).
37. Wang, Y. & Navin, N. E. Advances and Applications of Single-Cell Sequencing Technologies. Mol. Cell 58, 598–609 (2015).
38. Baslan, T. & Hicks, J.
Unravelling biology and shifting paradigms in cancer with single-cell sequencing. Nat. Rev. Cancer 17, 557–569 (2017).
39. Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011).
40. Julius, M. H., Masuda, T. & Herzenberg, L. A. Demonstration that antigen-binding cells are precursors of antibody-producing cells after purification with a fluorescence-activated cell sorter. Proc. Natl. Acad. Sci. U. S. A. 69, 1934–8 (1972).
41. Shapiro, H. M. Practical flow cytometry. (Wiley-Liss, 2003).
42. Gross, A. et al. Technologies for Single-Cell Isolation. Int. J. Mol. Sci. 16, 16897–16919 (2015).
43. Navin, N. & Hicks, J. Future medical applications of single-cell sequencing in cancer. Genome Med. 3, 31 (2011).
44. Baslan, T. et al. Genome-wide copy number analysis of single cells. Nat. Protoc. 7, 1024–1041 (2012).
45. McConnell, M. J. et al. Mosaic copy number variation in human neurons. Science 342, 632–637 (2013).
46. Wang, Y. et al. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature 512, 155–160 (2014).
47. Knouse, K. A., Wu, J., Whittaker, C. A. & Amon, A. Single cell sequencing reveals low levels of aneuploidy across mammalian tissues. Proc. Natl. Acad. Sci. 111, 13409–13414 (2014).
48. Cai, X. et al. Single-Cell, Genome-wide Sequencing Identifies Clonal Somatic Copy-Number Variation in the Human Brain. Cell Rep. 8, 1280–1289 (2014).
49. Hou, Y. et al. Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. Cell 148, 873–885 (2012).
50. Zong, C., Lu, S., Chapman, A. R. & Xie, X. S. Genome-wide detection of single-nucleotide and copy-number variations of a single human cell. Science 338, 1622–1626 (2012).
51. Ni, X. et al. Reproducible copy number variation patterns among single circulating tumor cells of lung cancer patients. Proc. Natl. Acad. Sci. 110, 21083–21088 (2013).
52. Gawad, C., Koh, W. & Quake, S. R.
Dissecting the clonal origins of childhood acute lymphoblastic leukemia by single-cell genomics. Proc. Natl. Acad. Sci. 111, 17947–17952 (2014).
53. Lohr, J. G. et al. Whole-exome sequencing of circulating tumor cells provides a window into metastatic prostate cancer. Nat. Biotechnol. 32, 479–484 (2014).
54. Baslan, T. et al. Optimizing sparse sequencing of single cells for highly multiplex copy number profiling. Genome Res. 25, 714–724 (2015).
55. Dean, F. B. et al. Comprehensive human genome amplification using multiple displacement amplification. Proc. Natl. Acad. Sci. 99, 5261–5266 (2002).
56. Telenius, H. et al. Degenerate oligonucleotide-primed PCR: general amplification of target DNA by a single degenerate primer. Genomics 13, 718–725 (1992).
57. Cheung, V. G. & Nelson, S. F. Whole genome amplification using a degenerate oligonucleotide primer allows hundreds of genotypes to be performed on less than one nanogram of genomic DNA. Proc. Natl. Acad. Sci. 93, 14676–14679 (1996).
58. Arneson, N., Hughes, S., Houlston, R. & Done, S. Whole-Genome Amplification by Degenerate Oligonucleotide Primed PCR (DOP-PCR). CSH Protoc. (2008).
59. Wang, J., Fan, H. C., Behr, B. & Quake, S. R. Genome-wide single-cell analysis of recombination activity and de novo mutation rates in human sperm. Cell 150, 402–412 (2012).
60. de Bourcy, C. F. A. et al. A Quantitative Comparison of Single-Cell Whole Genome Amplification Methods. PLoS One 9, e105585 (2014).
61. Garvin, T. et al. Interactive analysis and assessment of single-cell copy-number variations. Nat. Methods 12, 1058–1060 (2015).
62. Gao, R. et al. Punctuated copy number evolution and clonal stasis in triple-negative breast cancer. Nat. Genet. 48, 1119–1130 (2016).
63. Falconer, E. et al. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution. Nat. Methods 9, 1107–1112 (2012).
64. van den Bos, H. et al.
Single-cell whole genome sequencing reveals no evidence for common aneuploidy in normal and Alzheimer’s disease neurons. Genome Biol. 17, 116 (2016).
65. Vitak, S. A. et al. Sequencing thousands of single-cell genomes with combinatorial indexing. Nat. Methods 14, 302–308 (2017).
66. Ramani, V. et al. Massively multiplex single-cell Hi-C. Nat. Methods 14, 263–266 (2017).
67. Xi, L. et al. New library construction method for single-cell genomes. PLoS One 12, e0181163 (2017).
68. Chen, C. et al. Single-cell whole-genome analyses by Linear Amplification via Transposon Insertion (LIANTI). Science 356, 189–194 (2017).
69. Sanders, A. D., Falconer, E., Hills, M., Spierings, D. C. J. & Lansdorp, P. M. Single-cell template strand sequencing by Strand-seq enables the characterization of individual homologs. Nat. Protoc. 12, 1151–1176 (2017).
70. Adey, A. et al. In vitro, long-range sequence information for de novo genome assembly via transposase contiguity. Genome Res. 24, 2041–2049 (2014).
71. Amini, S. et al. Haplotype-resolved whole-genome sequencing by contiguity-preserving transposition and combinatorial indexing. Nat. Genet. 46, 1343–1349 (2014).
72. Cusanovich, D. A. et al. Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348, 910–914 (2015).
73. Zhang, K. Stratifying tissue heterogeneity with scalable single-cell assays. Nat. Methods 14, 238–239 (2017).
74. Gole, J. et al. Massively parallel polymerase cloning and genome sequencing of single cells using nanoliter microwells. Nat. Biotechnol. 31, 1126–1132 (2013).
75. Fu, Y. et al. Uniform and accurate single-cell sequencing based on emulsion whole-genome amplification. Proc. Natl. Acad. Sci. 112, 11923–11928 (2015).
76. Leung, K. et al. Robust high-performance nanoliter-volume single-cell multiple displacement amplification on planar substrates. Proc. Natl. Acad. Sci. 113, 8484–8489 (2016).
77. Xu, L., Brito, I. L., Alm, E. J. & Blainey, P. C.
Virtual microfluidics for digital quantification and single-cell sequencing. Nat. Methods 13, 759–762 (2016).
78. Duffy, D. C., McDonald, J. C., Schueller, O. J. A. & Whitesides, G. M. Rapid Prototyping of Microfluidic Systems in Poly(dimethylsiloxane). Anal. Chem. 70, 4974–4984 (1998).
79. Zhao, X.-M., Xia, Y. & Whitesides, G. M. Soft lithographic methods for nano-fabrication. J. Mater. Chem. 7, 1069–1074 (1997).
80. Unger, M. A., Chou, H.-P., Thorsen, T., Scherer, A. & Quake, S. R. Monolithic Microfabricated Valves and Pumps by Multilayer Soft Lithography. Science 288, 113–116 (2000).
81. Hong, J. W. & Quake, S. R. Integrated nanoliter systems. Nat. Biotechnol. 21, 1179–1183 (2003).
82. Chou, H.-P., Unger, M. A. & Quake, S. R. A Microfabricated Rotary Pump. Biomed. Microdevices 3, 323–330 (2001).
83. Thorsen, T., Maerkl, S. J. & Quake, S. R. Microfluidic large-scale integration. Science 298, 580–584 (2002).
84. Hua, Z. et al. A versatile microreactor platform featuring a chemical-resistant microvalve array for addressable multiplex syntheses and assays. J. Micromechanics Microengineering 16, 1433–1443 (2006).
85. Guo, M. T., Rotem, A., Heyman, J. A. & Weitz, D. A. Droplet microfluidics for high-throughput biological assays. Lab Chip 12, 2146–2155 (2012).
86. Macosko, E. Z. et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell 161, 1202–1214 (2015).
87. Rotem, A. et al. Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state. Nat. Biotechnol. 33, 1165–1172 (2015).
88. Hosokawa, M., Nishikawa, Y., Kogawa, M. & Takeyama, H. Massively parallel whole genome amplification for single-cell sequencing using droplet microfluidics. Sci. Rep. 7, 5199 (2017).
89. Lan, F., Demaree, B., Ahmed, N. & Abate, A. R. Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding. Nat. Biotechnol. 35, 640–646 (2017).
90. White, A. K. et al.
High-throughput microfluidic single-cell RT-qPCR. Proc. Natl. Acad. Sci. 108, 13999–14004 (2011).
91. Fan, H. C., Wang, J., Potanina, A. & Quake, S. R. Whole-genome molecular haplotyping of single cells. Nat. Biotechnol. 29, 51–57 (2011).
92. Kim, S. et al. High-throughput automated microfluidic sample preparation for accurate microbial genomics. Nat. Commun. 8, 13919 (2017).
93. Tan, S. J. et al. A microfluidic device for preparing next generation DNA sequencing libraries and for automating other laboratory protocols that require one or more column chromatography steps. PLoS One 8, e64084 (2013).
94. Szulwach, K. E. et al. Single-Cell Genetic Analysis Using Automated Microfluidics to Resolve Somatic Mosaicism. PLoS One 10, e0135007 (2015).
95. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
96. Wu, A. R. et al. Quantitative assessment of single-cell RNA-sequencing methods. Nat. Methods 11, 41–46 (2013).
97. Yu, Z., Lu, S. & Huang, Y. Microfluidic Whole Genome Amplification Device for Single Cell Sequencing. Anal. Chem. 86, 9386–9390 (2014).
98. Yang, Y., Swennenhuis, J. F., Rho, H. S., Le Gac, S. & Terstappen, L. W. M. M. Parallel Single Cancer Cell Whole Genome Amplification Using Button-Valve Assisted Mixing in Nanoliter Chambers. PLoS One 9, e107958 (2014).
99. Leung, K. et al. A programmable droplet-based microfluidic device applied to multiparameter analysis of single microbes and microbial communities. Proc. Natl. Acad. Sci. 109, 7665–7670 (2012).
100. Fluidigm. Doublet rate and detection on the C1 IFCs. White Pap. (2016).
101. Fan, H. C., Fu, G. K. & Fodor, S. P. A. Combinatorial labeling of single cells for gene expression cytometry. Science 347, 1258367–1258367 (2015).
102. Bose, S. et al.
Scalable microfluidics for single-cell RNA printing and sequencing. Genome Biol. 16, 120 (2015).
103. Yuan, J. & Sims, P. A. An Automated Microwell Platform for Large-Scale Single Cell RNA-Seq. Sci. Rep. 6, 33883 (2016).
104. DeKosky, B. J. et al. High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire. Nat. Biotechnol. 31, 166–169 (2013).
105. Gierahn, T. M. et al. Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput. Nat. Methods 14, 395–398 (2017).
106. Gudapati, H., Dey, M. & Ozbolat, I. A comprehensive review on droplet-based bioprinting: Past, present and future. Biomaterials 102, 20–42 (2016).
107. Li, J., Rossignol, F. & Macdonald, J. Inkjet printing for biosensor fabrication: combining chemistry and technology for advanced manufacturing. Lab Chip 15, 2538–2558 (2015).
108. Xu, T., Jin, J., Gregory, C., Hickman, J. J. & Boland, T. Inkjet printing of viable mammalian cells. Biomaterials 26, 93–99 (2005).
109. Scienion. Controlled Cell Dispensing. (2017). at <http://www.scienion.com/company/applications/controlled-cell-dispensing/>
110. Gao, R. et al. Nanogrid single-nucleus RNA sequencing reveals phenotypic diversity in breast cancer. Nat. Commun. 8, 228 (2017).
111. Cheng, E., Yu, H., Ahmadi, A. & Cheung, K. C. Investigation of the hydrodynamic response of cells in drop on demand piezoelectric inkjet nozzles. Biofabrication 8, 15008 (2016).
112. Wu, L. et al. Full-length single-cell RNA-seq applied to a viral human cancer: applications to HPV expression and splicing analysis in HeLa S3 cells. Gigascience 4, 51 (2015).
113. Zhu, Y. et al. Printing 2-dimentional droplet array for single-cell reverse transcription quantitative PCR assay with a microfluidic robot. Sci. Rep. 5, 9551 (2015).
114. Sanger, F., Nicklen, S. & Coulson, A. R. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. 74, 5463–5467 (1977).
115. Stranneheim, H. & Lundeberg, J.
Stepping stones in DNA sequencing. Biotechnol. J. 7, 1063–73 (2012).
116. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
117. Ronaghi, M., Karamohamed, S., Pettersson, B., Uhlén, M. & Nyrén, P. Real-Time DNA Sequencing Using Detection of Pyrophosphate Release. Anal. Biochem. 242, 84–89 (1996).
118. Rothberg, J. M. et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature 475, 348–352 (2011).
119. Wheeler, D. A. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872–6 (2008).
120. Kulski, J. K. Next Gener. Seq. - Adv. Appl. Challenges 3–60 (InTech, 2016).
121. Illumina. NovaSeq 6000 Sequencing System. (2016). at <https://www.illumina.com/content/dam/illumina-marketing/documents/products/datasheets/novaseq-6000-system-specification-sheet-770-2016-025.pdf>
122. Illumina. DNA SEQUENCING METHODS COLLECTION. (2016). at <https://www.illumina.com/content/dam/illumina-marketing/documents/products/research_reviews/dna-sequencing-methods-review-web.pdf>
123. Illumina. RNA SEQUENCING METHODS COLLECTION. (2016). at <https://www.illumina.com/content/dam/illumina-marketing/documents/products/research_reviews/rna-sequencing-methods-review-web.pdf>
124. Illumina. Single-Cell Research: An Overview of Recent Single-Cell Research Publications Featuring Illumina Technology. (2016). at <https://www.illumina.com/content/dam/illumina-marketing/documents/products/research_reviews/single-cell-sequencing-research-review.pdf>
125. Ashley, E. A. et al. Clinical assessment incorporating a personal genome. Lancet 375, 1525–1535 (2010).
126. Syed, F., Grunenwald, H. & Caruccio, N. Next-generation sequencing library preparation: simultaneous fragmentation and tagging using in vitro transposition. Nat. Methods | Appl. Notes. (2009).
127. Adey, A. et al. Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition.
Genome Biol. 11, R119 (2010).
128. Grunenwald, H., Baas, B., Caruccio, N. & Syed, F. Rapid, high-throughput library preparation for next-generation sequencing. Nat. Methods | Appl. Notes 7, (2010).
129. Reznikoff, W. S. Transposon Tn5. Annu. Rev. Genet. 42, 269–286 (2008).
130. Lukyanov, K. A., Gurskaya, N. G., Bogdanova, E. A. & Lukyanov, S. A. Selective Suppression of Polymerase Chain Reaction. Russ. J. Bioorganic Chem. Transl. from Bioorganicheskaya Khimiya 25, 141–147 (1999).
131. Baym, M. et al. Inexpensive Multiplexed Library Preparation for Megabase-Sized Genomes. PLoS One 10, e0128036 (2015).
132. Picelli, S. et al. Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res. 24, 2033–2040 (2014).
133. Islam, S. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods 11, 163–166 (2013).
134. Wang, Q. et al. Tagmentation-based whole-genome bisulfite sequencing. Nat. Protoc. 8, 2022–2032 (2013).
135. Adey, A. & Shendure, J. Ultra-low-input, tagmentation-based whole-genome bisulfite sequencing. Genome Res. 22, 1139–1143 (2012).
136. Gertz, J. et al. Transposase mediated construction of RNA-seq libraries. Genome Res. 22, 134–141 (2012).
137. Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).
138. Liu, J., Hansen, C. & Quake, S. R. Solving the ‘world-to-chip’ interface problem with a microfluidic matrix. Anal. Chem. 75, 4718–4723 (2003).
139. Spurgeon, S. L., Jones, R. C. & Ramakrishnan, R. High Throughput Gene Expression Measurement with Real Time PCR in a Microfluidic Dynamic Array. PLoS One 3, e1662 (2008).
140. Huft, J., Da Costa, D. J., Walker, D. & Hansen, C. L. Three-dimensional large-scale microfluidic integration by laser ablation of interlayer connections. Lab Chip 10, 2358–2365 (2010).
141. VanInsberghe, M., Zahn, H., White, A. K., Petriv, O. I. & Hansen, C. L.
Highly multiplexed single-cell quantitative PCR. PLoS One 13, e0191601 (2018).
142. Huft, J., Haynes, C. A. & Hansen, C. L. Microfluidic Integration of Parallel Solid-Phase Liquid Chromatography. Anal. Chem. 85, 2999–3005 (2013).
143. Heyries, K. A. et al. Megapixel digital PCR. Nat. Methods 8, 649–651 (2011).
144. Gómez-Sjöberg, R., Leyrat, A. A., Pirone, D. M., Chen, C. S. & Quake, S. R. Versatile, Fully Automated, Microfluidic Cell Culture System. Anal. Chem. 79, 8557–8563 (2007).
145. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
146. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
147. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
148. Ha, G. et al. Integrative analysis of genome-wide loss of heterozygosity and monoallelic expression at nucleotide resolution reveals disrupted pathways in triple-negative breast cancer. Genome Res. 22, 1995–2007 (2012).
149. Ding, J. et al. Feature-based classifiers for somatic mutation detection in tumour–normal paired sequencing data. Bioinformatics 28, 167–175 (2012).
150. McPherson, A. et al. nFuse: Discovery of complex genomic rearrangements in cancer using high-throughput sequencing. Genome Res. 22, 2250–2261 (2012).
151. Burleigh, A. et al. A co-culture genome-wide RNAi screen with mammary epithelial cells reveals transmembrane signals required for growth and differentiation. Breast Cancer Res. 17, 4 (2015).
152. Ronquist, F. & Huelsenbeck, J. P. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574 (2003).
153. The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005).
154. Knouse, K. A., Wu, J. & Amon, A.
Assessment of megabase-scale somatic copy number variation using single cell sequencing. Genome Res. 26, 376–384 (2016).
155. Mazutis, L. et al. Multi-step microfluidic droplet processing: kinetic analysis of an in vitro translated enzyme. Lab Chip 9, 2902 (2009).
156. Mazutis, L. et al. Single-cell analysis and sorting using droplet-based microfluidics. Nat. Protoc. 8, 870–891 (2013).
157. Benjamini, Y. & Speed, T. P. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res. 40, e72 (2012).
158. Chen, Y.-C., Liu, T., Yu, C.-H., Chiang, T.-Y. & Hwang, C.-C. Effects of GC bias in next-generation-sequencing data on de novo genome assembly. PLoS One 8, e62856 (2013).
159. Ross, M. G. et al. Characterizing and measuring bias in sequence data. Genome Biol. 14, R51 (2013).
160. Rakha, E. A., Green, A. R., Powe, D. G., Roylance, R. & Ellis, I. O. Chromosome 16 tumor-suppressor genes in breast cancer. Genes, Chromosom. Cancer 45, 527–535 (2006).
161. Martincorena, I. et al. Universal Patterns of Selection in Cancer and Somatic Tissues. Cell 171, 1029–1041.e21 (2017).
162. Campbell, B. B. et al. Comprehensive Analysis of Hypermutation in Human Cancer. Cell 171, 1042–1056.e10 (2017).
163. Bakhoum, S. F. & Landau, D. A. Cancer Evolution: No Room for Negative Selection. Cell 171, 987–989 (2017).
164. Chaisson, M. J. P. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Preprint at bioRxiv. doi:10.1101/193144 (2017).
165. Goldstein, L. D. et al. Massively parallel nanowell-based single-cell gene expression profiling. BMC Genomics 18, 519 (2017).
166. Espina, V. et al. Laser-capture microdissection. Nat. Protoc. 1, 586–603 (2006).
167. Di Martino, D. et al. Single sperm cell isolation by laser microdissection. Forensic Sci. Int. 146, S151–S153 (2004).
168. Boone, D. R., Sell, S. L. & Hellmich, H. L.
Laser capture microdissection of enriched populations of neurons or single neurons for gene expression analysis after traumatic brain injury. J. Vis. Exp. doi:10.3791/50308 (2013).
169. Frumkin, D. et al. Amplification of multiple genomic loci from single cells isolated by laser micro-dissection of tissues. BMC Biotechnol. 8, 17 (2008).
170. Dey, S. S., Kester, L., Spanjaard, B., Bienko, M. & van Oudenaarden, A. Integrated genome and transcriptome sequencing of the same cell. Nat. Biotechnol. 33, 285–289 (2015).
171. Mertes, F. et al. Combined ultra-low input mRNA and whole-genome sequencing of human embryonic stem cells. BMC Genomics 16, 925 (2015).
172. Macaulay, I. C. et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat. Methods 12, 519–522 (2015).
173. Angermueller, C. et al. Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity. Nat. Methods 13, 229–232 (2016).

Appendices

Appendix A: Detailed methods: primer integration

A.1 Device fabrication using the push-down MSL workflow

The first chip design was based on a common push-down architecture (80). Briefly, silicon moulds were fabricated by photolithography. Photomasks were designed with CAD software (Autodesk Inc.) and printed on transparency masks at 20,000 dpi (CAD/ART SERVICES INC). The control layer consisted of control lines and a ~31-nL reaction chamber, whereas the flow layer contained chambers that interfaced with the spots and flow channels that connected reagent inlets to reaction chambers. Microfluidic devices were cast from polydimethylsiloxane (PDMS, RTV615, General Electric) with each device consisting of 3 elastomeric layers: a top control layer, a middle flow layer, and a blank layer sealing the chip. To minimize PDMS adhesion to the photoresist structures, both moulds were surface treated with chlorotrimethylsilane (TMCS, Sigma Aldrich) vapour for 20 minutes.
The control layer was made by pouring 40 g of a 5:1 mixture of PDMS (5 parts RTV615A and 1 part RTV615B) onto the control mould, followed by degassing the PDMS until all visible bubbles were gone (~20 minutes). In the meantime, the thin flow layer was fabricated by spin coating the flow mould with a 20:1 mixture of PDMS (20 parts RTV615A and 1 part RTV615B). Both the control and the flow layers were cured at 80 °C for 17 minutes. Following this bake, the control layer was peeled off the mould and access ports were punched. The control layer was then aligned to the flow mould and baked for 2 hours to bond the two layers together. Next, the combined slab was peeled off the flow mould, the remaining access ports were punched, interlayer connections were added with a custom laser ablation system (140), and the chips were diced to their final size. In parallel, microscope slides were spin coated with 20:1 PDMS and cured for 17 minutes. After baking, the microscope slides were placed on a piezo dispenser (sciFLEXARRAYER S3, Scienion AG) and an array of primers or template was spotted onto the blank PDMS membranes. Details about this process can be found in section A.3 of the appendix. Finally, the spotted blank membrane and 2-layer chips were plasma oxidized for 30 seconds and immediately aligned. The finished chips were baked at 80 °C for an additional 2 hours to strengthen the final bond and stored in a dark box at room temperature until use.

A.2 Device fabrication and operation of the double-sided imprinted chip

The following imprinting workflow is a variation of the multilayer soft lithography workflow.

Lithography masks
4-inch silicon wafers (Silicon Quest) and 100 mm × 100 mm glass slides (D263-8, Howard Glass) were patterned by photolithography using photomasks designed with CAD software (Autodesk Inc.). The design from each layer was transferred onto a 5-inch square chrome mask using a laser writer system (LW405 Laser Mask Writer, Microtech).
Next, the resist was developed using AZ developer (PPD-455 developer, HTA Enterprises) diluted in an equal amount of water. As a uniform pattern became visible, the mask was transferred into a water bath to stop the development process, then rinsed with DI water and dried with nitrogen. The mask was then immersed in chrome etchant and agitated until the pattern became clear. Shortly after, the mask was removed from the etchant and transferred to a clean water bath. Again, the mask was rinsed with DI water and dried using nitrogen. Lastly, the remaining AZ resist was stripped off by immersing the mask in stripper (PRS-100, HTA Enterprises) and agitating for 2 minutes. Any residual resist was removed by wiping the surface with a wet cleanroom wipe. The stripper was washed off by thoroughly rinsing the mask with fresh DI water and isopropanol (IPA), followed by drying with nitrogen. These steps were repeated for all layers.

Mould fabrication
Photoresist was deposited by spin coating (G3 Spin Coater, Specialty Coating Systems, Inc.) and baking the wafer, followed by exposing it on a mask aligner (Suss MA6) in hard contact mode. All bake steps were performed on large area precision hot plates (300 mm × 300 mm Brewer Cee 10) unless otherwise specified. Resist thicknesses were measured on a KLA-TENCOR Alpha-Step IQ surface profiler. SU-8 photoresist and developer were acquired from MicroChem. AZ50XT resist and developer were obtained from AZ Electronic Materials. Oxygen-plasma treatments were performed in a PDC-32G & PDC-FMG plasma cleaner (Harrick Plasma) using the following settings: RF power, “high”; oxygen gas pressure, 600 mTorr. The control layer design was transferred to silicon moulds in two lithographic steps. First, wafers were dehydrated in a convection oven at 190 °C overnight. Next, the wafers were coated with SU-8 2025 photoresist at 2700 rpm for 30 seconds. The coated wafers were soft baked at 65 °C for 3 min, 95 °C for 6 min and 65 °C for 2 min.
Subsequently, the resist was exposed to 315 mJ/cm² UV light and immediately transferred back to the hotplates for the post-exposure bake at 65 °C for 2 min, 95 °C for 5 min and 65 °C for 1 min. The exposed moulds were then immersed in SU-8 developer, developed for 1 min while agitating the solution, rinsed with fresh SU-8 developer and cleaned with IPA to remove the developer. Finally, the moulds were dried with compressed and filtered nitrogen and hard baked at 175 °C for 1 hour. Next, the 110-μm-high valve chambers were fabricated. SU-8 50 resist was deposited at 1250 rpm for 30 seconds and soft baked at 65 °C for 10 min, 95 °C for 30 min and 65 °C for 5 min. To obtain vertical sidewalls, a long pass filter minimizing UV radiation below 350 nm was used during exposure (350 mJ/cm²). Moulds were post-baked at 65 °C for 2 min, 95 °C for 10 min and 65 °C for 1 min, and developed in SU-8 developer for 7 min. Finally, the resist was rinsed with IPA and hard baked at 175 °C for 1 hour. The flow mould consisted of 3 layers: rounded flow channels, reaction chambers, and alignment posts. First, glass slides were exposed to HMDS (hexamethyldisilazane) vapour for 5 min. The HMDS-coated glass slides were next wetted with IPA before pouring AZ 50XT onto the slides. The slides were then spun at 3500 rpm for 45 seconds and incubated at RT for 20 min, before transferring the slides to an 85 °C hotplate for 2 min and 115 °C for 10 min. The baked resist was rehydrated at room temperature overnight and then exposed to 3 doses of UV light (280 mJ/cm²) with a 5 min wait between exposures. The exposed slides were directly transferred into diluted developer (400XT developer diluted 1:2 in water) for ~10 min. The developer was washed off with DI water and the slides were transferred to a pre-heated 65 °C convection oven.
To achieve a rounded channel geometry and facilitate valve closure, the resist was reflowed by ramping the temperature to 190 °C followed by a hard bake at the same temperature overnight. Next, the temperature was ramped down to room temperature, and the slides were plasma cleaned for 1 min. After a 30-min incubation at room temperature, reaction chambers were created in SU-8 50 resist. The slides were spin coated at 1500 rpm for 30 seconds, followed by a soft bake at 65 °C for 10 min, 95 °C for 26 min and 65 °C for 5 min. The resist was exposed through a long pass filter (315 mJ/cm²), immediately followed by a post-exposure bake at 65 °C for 2 min, 95 °C for 9 min and 65 °C for 1 min. The moulds were then developed for 7 min in SU-8 developer, rinsed with fresh IPA and hard baked at 175 °C for 1 hour. Finally, the alignment and distance posts were formed in SU-8 50 resist. The resist was spun on at 1250 rpm for 30 seconds and soft baked at 65 °C for 10 min, 95 °C for 30 min and 65 °C for 5 min. Using a long pass filter, the slides were exposed (350 mJ/cm²) and again baked at 65 °C for 2 min, 95 °C for 10 min and 65 °C for 1 min. The slides were then slowly cooled to room temperature before they were immersed in SU-8 developer for 7 min while constantly agitating. The developer was rinsed off with IPA and the resist was hard baked at 175 °C for 1 hour.

Surface treatment
Before casting microfluidic devices, both the control silicon wafers and the glass slides received a surface coating. The silicon moulds were plasma cleaned for 1 min and then immediately exposed to chlorotrimethylsilane (TMCS, Sigma Aldrich) vapour for 8–15 hours to minimize adhesion of PDMS to the photoresist structures. This treatment was repeated after every 20 casting steps. The glass moulds and additional blank silicon wafers were oxygen plasma treated for 1 min and then transferred to a preheated 70 °C vacuum oven (VWR).
A 3-psi vacuum was applied, and the glass moulds were exposed to 3-(Trimethoxysilyl)propyl methacrylate (also known as A174, Sigma Aldrich) vapour for 30 min inside the oven. Next, the moulds were immediately transferred to a parylene coater and received a 200–250 nm thick coating.

Device fabrication
Microfluidic devices were assembled from two elastomeric membranes. The top layer is a blank 3–5 mm thick membrane that seals the microfluidic channels and chambers and provides the world-to-chip interface with holes for control lines and reagent inlets. The middle layer contains all microfluidic features, such as channels, valve structures, and chambers. Both layers were bonded to a glass slide to provide a rigid substrate for handling the devices. The elastomeric membranes were fabricated in PDMS (RTV615, General Electric, USA). The structured layer was made by pouring a thin layer of a 15:1 mixture of PDMS (15 parts RTV615A and 1 part RTV615B) onto the parylene-coated glass slide (enough to cover all patterned areas). The mould was then placed in a vacuum chamber and the PDMS was degassed for 20 min. At the same time, the silicon control mould was spin coated with 15:1 PDMS at 500 rpm for 1 min, before the glass mould was aligned to the silicon mould, thereby sandwiching the PDMS in between the two moulds. A 500-g weight was placed in the center of the glass mould for 15 min to squeeze out excess PDMS. Photoresist posts on the glass mould and a receiving structure on the silicon mould allowed for accurate alignment. Furthermore, the height of the posts determined the thickness of the membrane. Next, the assembly was cured at 80 °C for 1 hour in a convection oven. After baking, the glass mould was carefully lifted off, leaving the PDMS membrane attached to the silicon mould. Next, template was spotted into the open chambers (operation described in section A.3 of the appendix).
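As an illustration only, and not part of the original protocol, the amount of each PDMS component needed for the mixtures quoted in this appendix can be worked out as follows. The sketch assumes that the quoted ratios (for example 5:1 or 15:1) are parts by mass of RTV615A to RTV615B; the function name `pdms_components` is hypothetical.

```python
from __future__ import annotations


def pdms_components(total_mass_g: float, base_parts: float,
                    curing_parts: float = 1.0) -> tuple[float, float]:
    """Split a total PDMS mass into base (RTV615A) and curing agent (RTV615B).

    Assumes the ratio base_parts:curing_parts is expressed in parts by mass.
    """
    if total_mass_g <= 0 or base_parts <= 0 or curing_parts <= 0:
        raise ValueError("masses and ratio parts must be positive")
    total_parts = base_parts + curing_parts
    base_g = total_mass_g * base_parts / total_parts
    curing_g = total_mass_g - base_g
    return base_g, curing_g


if __name__ == "__main__":
    # 40 g of a 5:1 mixture, as used for the push-down control layer in A.1.
    base_g, curing_g = pdms_components(40.0, 5)
    print(f"5:1 mix, 40 g total: {base_g:.2f} g RTV615A + {curing_g:.2f} g RTV615B")
```

For the 40 g, 5:1 control-layer mixture this gives roughly 33.3 g of RTV615A and 6.7 g of RTV615B; the same function applies to the 15:1 and 20:1 mixtures once a total mass is chosen.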
At the same time, a blank, 3-5 mm-thick PDMS membrane was made by pouring a mixture of 10:1 PDMS onto blank silicon wafers. The PDMS was degassed for 30-45 min and then cured at 80 °C for 1 hour. Next, the blank and structured membranes were plasma oxidised for 1 min and 30 seconds, respectively, and immediately assembled. To strengthen the bond, the assembled layers were incubated at 80 °C for 5 min. Finally, the two-layer structure was peeled off the silicon mould, 0.71-mm access ports were punched, and individual devices were cut to size and plasma bonded to a glass slide.

Operation

The push-down and imprinted devices were operated using 7 and 3 manual pneumatic valves, respectively. The on-chip valves were connected to the pneumatic valves using Teflon tubing (Cole-Parmer) and 20-gauge stainless steel pins (New England Small Tube Corporation). Control lines were filled with Krytox oil (DuPont) and actuated at 25-30 psi. Reagents were pushed into the chip using gel loading pipette tips (Xcluda™ Style G, BioRad) connected at 2-8 psi pneumatic pressure.

A.3 Spotting robot operation

All spotting operations were carried out on a sciFLEXARRAYER S3 from Scienion. To operate the system, the user can create a run routine from a list of tasks. The software provides a library of common tasks, but more specialized custom tasks can be created. In the following sections, only custom tasks and the final run routine are described; details about the published tasks can be found in the user manual.

Alignment of the imprinted PDMS substrate to the spotting robot

Two marks on the target holder were used to place the PDMS-glass substrates with a 2-mm tolerance from the marks. Alignment marks defined in the PDMS membrane were then used to automatically align the spotting robot to the microfluidic chambers. The function gave an error if the fiducial marker could not be found within the target area, in which case manual adjustment of the substrate was required.
To compensate for both rotation and placement errors, two reference points were used. The following settings were used: Find Target Reference Point, Match Mode: Shift Invariant, Minimum Score: 650, Search Strategy: Conservative, Match Feature Mode: Shape, Minimum Contrast: 10, No. of Reference Points: 2, Spot Area: X Offset 15000 – Y Offset 0, enable “Use Two Remote Image Frames”: Second Point X Offset 0 – Y Offset 40000 (Figure A.1). Furthermore, the camera settings were optimized to improve contrast and brightness on the highly reflective silicon wafer (silicon wafers: brightness: 16, contrast: 42.7, gamma: 1.11; glass slides: brightness: 113, contrast: 42.7, gamma: 1.11).

Figure A.1 Field target reference point setup for imprinted microfluidic device

Wash routine

A custom wash routine was combined with a provided wash function to avoid cross-contamination during spotting. The custom routine includes a series of pump and sonication steps with a cleaning solution [2% sciClean solution (Scienion AG) in water] to remove any contamination. First, the wash pump was turned on to constantly flush fresh 20 MΩ water through the wash station and remove the waste liquid. Next, 200 μL of system liquid was pumped into the waste station at a speed of 13 μL/sec. During the pump cycle, the piezo was turned on for 5 sec, and subsequently the nozzle was immersed into the wash station until 200 μL of the system liquid was pumped through the nozzle. Next, 18 μL of a cleaning solution was taken up at a speed of 3 μL/sec. The nozzle was moved back to the waste station, and 30 μL was pumped through the nozzle at a speed of 3 μL/sec. As soon as the pump cycle started, the piezo was turned on for 4 sec. Once the 30-μL pump cycle had finished, the wash pump was turned off.

Spotting routine

1200 pL of probe substrate was automatically deposited into microfluidic chambers or on a blank PDMS membrane using a PDC-70 Type 1 nozzle.
Two run routines were developed from a combination of published and custom tasks. The field target recognition task was not needed for depositing the probe substrate onto the blank membrane, whereas the quality-control task did not work in combination with the microfluidic chambers. The run task started with a short wash to wet the capillary and produce a stable droplet. If applicable, the custom “Find Target Reference Point” task was executed to align the spotting robot to the substrate. Next, 5 μL of probe substrate was taken up, and a short dip into the wash station then removed remaining reagents stuck to the outside of the nozzle. After the wash, 750 droplets were dispensed to eject any reagents diluted by the dip in the wash station, evaluate droplet stability, and record the volume of the drop for quality control. The probe substrate was then dispensed onto the PDMS membrane or into the chambers directly (Figure A.2, Figure A.3).

Figure A.2 Target setup for push-down geometry device

Figure A.3 Target setup for double-sided imprinted microfluidic device

After spotting, the volume was again recorded before the remaining reagents were flushed out using the “WashFlush_Strong” task and finally the “WashFlush_Medium” task (the custom wash task is described above). If more than one probe was dispensed, the cycle was restarted by taking up 5 μL of the next probe. After all probes were dispensed, a final short wash was performed and the nozzle parked in the home position. A screenshot of the run routines can be found in Figure A.4.
Figure A.4 Run tasks for devices described in Chapter 2

A.4 PCR experiments

All on-chip qPCR experiments were performed on a customized prototype instrument of the BioMark™ reader (Fluidigm); the prototype version is a qPCR machine for custom microfluidic devices and includes a vacuum flatbed thermocycler, 4-megapixel camera, 175 W xenon arc bulb, various fluorescence filters (FAM, VIC, ROX, QAS), and control software. The PCR experiments were performed using 2× TaqMan™ Fast Universal PCR Master Mix (Thermo Fisher Scientific). The thermocycling protocol for all experiments included a 20 second hot-start at 95 °C and 40 or 45 cycles at 95 °C for 1 second and 60 °C for 30 seconds. All measurements were performed using a TaqMan assay for the RPPH1 gene on a synthetic template fragment (143).

Forward Primer: 5’-GAGGTCAGACTGGGCAGGAG-3’
Reverse Primer: 5’-CCTCACCTCAGCCATTGAACTC-3’
TaqMan Probe: 5’-FAM-TGCCGTGGACCCCGCCCTTCG-BHQ1-3’
Synthetic fragment (reverse strand not shown): 5’-GAGGTCAGACTGGGCAGGAGATGCCGTGGACCCCGCCCTTCGGGGAGGGGCCCGGCGGATGCCTCCTTTGCCGGAGCTTGGAACAGACTCACGGCCAGCGAAGTGAGTTCAATGGCTGAGGTGAGG-3’

Synthetic template DNA, primers, and probes were obtained from Integrated DNA Technologies. Single-stranded template sequences were annealed as described by Heyries et al. (143) and diluted in low TE to a final concentration of 10 μM. Primers were combined and diluted to 6.5 μM in 0.1% Tween 20 and the TaqMan probe was diluted to 10 μM. After dilution, all solutions were kept frozen at –20 °C. During an experiment, all reagents were premixed and, if applicable, dispensed in different tubes to add template DNA at various concentrations. For the dynamic range experiment, PCR reagent was prepared with 1× Fast Universal PCR Master mix, 500 nM Probe, 0.1% Tween 20, 0.1% BSA and PCR grade water. Synthetic DNA fragments were first diluted to 2.84 nM followed by five 32× dilutions.
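The template dilution series for the dynamic range experiment can be written out as simple arithmetic. The following is a minimal sketch, assuming that "five 32× dilutions" means five successive 32-fold steps starting from the 2.84 nM stock (six concentrations in total); the function name `dilution_series` is hypothetical and is not taken from the thesis.

```python
def dilution_series(start_nm, fold, steps):
    """Return the starting concentration followed by `steps` serial dilutions.

    Assumption: each step dilutes the previous concentration by `fold`.
    """
    return [start_nm / fold ** i for i in range(steps + 1)]


# 2.84 nM stock followed by five successive 32-fold dilutions.
series_nm = dilution_series(2.84, 32, 5)

for step, conc in enumerate(series_nm):
    print(f"dilution {step}: {conc:.3e} nM")
```

Under this reading the series spans 32^5 (about 3.4 × 10^7-fold) between the highest and lowest template concentrations.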
Next, 1.5 μL of each template dilution was combined with 5.5 µL of premixed PCR reagent. For the NTC, an equal amount of PCR grade water was added instead. 1200 pL of primers were pre-spotted such that the final concentration was 250 nM in each chamber. PCR reagent for the resuspension experiment was prepared with 1× Fast Universal PCR Master mix, 500 nM Probe, 250 nM Primer, 0.1% Tween 20, 0.1% BSA and PCR grade water. Template DNA was pre-spotted (1200 pL at 11.86 nM) into the first 5 inlets; the sixth inlet was left empty as a negative control. The PCR reagent for the cross-contamination experiment was the same as the PCR mix for the resuspension experiment, but the template was spotted (1200 pL) into the chip with an alternating pattern of water and synthetic template at a concentration of 20 nM with wash steps in between (see wash routine above). The first 5 inlets are replicates and the last inlet was left empty as a negative control.

A.5 Image analysis

Two fluorescent images (one passive reference and one reporter image) of the devices were taken after each PCR cycle and analyzed using custom MATLAB (MathWorks) code to generate real-time PCR curves and extract cycle-threshold (CT) values (90). CT values are stated as mean ± standard deviation. Additional analysis was performed using Excel (Microsoft). PCR amplification efficiency was calculated from the slope of the linear fit of log2(C) versus CT over the highest four concentrations (Efficiency = 100*(2^(-1/slope)-1)). Precision estimates were calculated by converting CT values into concentrations using the standard curve and then calculating the mean and standard deviation across the entire array of positive chambers (Precision = standard deviation (C) / mean (C)).

Appendix B: Detailed methods: inflatable chamber

B.1 Device fabrication

Fabrication process

Microfluidic devices were fabricated using a modified multilayer soft lithography workflow (80,83).
First, 4-inch silicon wafers (Silicon Quest) were patterned by photolithography. Photomasks were designed in AutoCAD (Autodesk Inc.) and either printed on transparency films at a 20,000 dpi resolution (CAD/Art Services Inc.) or written directly onto chrome masks using a laser writer system (LW405 Laser Mask Writer, Microtech). Chrome masks were developed and etched following manufacturer’s directions (also see Chapter A.2). The patterns were transferred to the “flow” substrate in four lithographic steps. First, SPR220-7.0 photoresist (DOW) was used to define 6.5 µm high pumps, valves, and channels connecting reaction chambers. A rounded channel geometry to facilitate the fabrication of valves was achieved by reflowing the resist at 115 °C, immediately followed by a hard bake at 190 °C for 1 hour. Second, the cell traps, cell inlet filters, as well as the chambers holding the indexing primers, were fabricated in 12 µm high SU-8 2010 photoresist (Microchem) and hard baked at 150 °C for 30 min. Next, a 15 µm thick AZ50XT (AZ Electronic Materials) layer was deposited to define the channels connecting the cell traps as well as the inlet valves. To obtain rounded channel walls, the resist was again reflowed and hard-baked by ramping the temperature from 65 °C to 190 °C and then holding the temperature overnight. The hard bake protected the AZ photoresist from SU-8 developer erosion. Finally, the 100 µm high bus channels and reaction chambers were manufactured in SU-8 100 photoresist (Microchem).

The “control” mould consisted of two layers. First, the control channels were fabricated in 30 µm SU-8 2025 photoresist (Microchem). A short 15 min hard-bake at 150 °C protected the resist from further development during the subsequent manufacturing step. Second, the displacement chambers were defined in a 210 µm thick SU-8 100 photoresist layer.
To prevent PDMS from sticking to the photoresist structures and substrate, the “flow” mould was treated with trimethylchlorosilane (TMCS, Aldrich) vapour overnight and the “control” mould and a blank silicon wafer were parylene coated.

The microfluidic devices were fabricated using QSil 216 (Quantum Silicones), a two-part, clear, liquid elastomer. The two components were mixed (ARE-310 Mixer, Thinky) in a 10:1 base to catalyst ratio by weight. Oxygen-plasma bonding (PDC-32G & PDC-FMG, Harrick Plasma) was used to assemble the devices using the following settings: RF power settings, high; oxygen gas pressure, 600 mTorr; treatment time, 25 sec. The devices were assembled from three layers. The top “control” layer was cast from 55 g PDMS mixture, degassed and cured at 80 °C for 60 min. The middle layer, a thin blank membrane, was made by spin coating the blank wafer with PDMS at 5,500 rpm (G3 Spin Coater, Specialty Coating Systems, Inc.) and cured for 30 min at 80 °C. After baking, the cured PDMS was removed from the control mould and bonded to the blank membrane. The two-layer structure was then incubated for 5 min at 80 °C, before removing it from the blank wafer in order to punch 0.71 mm access ports and laser ablate interlayer connections (140). The bottom “flow” layer was fabricated using an imprinting workflow to facilitate the deposition of indexing primers into open microfluidic structures with minimal distortion of the array. PDMS in its liquid state was sandwiched between the “flow” mould and a plasma-oxidized glass slide (100 × 100 mm² Schott D263 Borosilicate Glass, S.I. Howard Glass Co.) and baked for 30 min at 80 °C. The stronger adhesion of the cured PDMS towards the glass slide allowed the “flow” mould to be lifted off, while the cast membrane remained attached to the glass slide. Molecular probes could be deposited directly into the chambers on the flow layer, if desired.
Afterwards, the two-layer structure was aligned and plasma bonded to the flow layer. The assembled device was removed from the glass slide used for imprinting and bonded to a new glass slide. The final devices were incubated at 80 °C for 2 hours.

Device operation was semi-automated. Custom LabView software (National Instruments) was used to control on-chip valves through an output card (PCI-6512, SCB-100, National Instruments) and solenoids (MH1 Miniature valve, FESTO). Tygon tubing (Cole-Parmer) and 20-gauge stainless steel pins (New England Small Tube Corporation) were used to connect the control ports of the microfluidic device to the solenoids. Compressed air (25-30 psi) and Krytox (DuPont) oil in the control channels were used to operate the device.

Appendix C: Extra figures Chapter 3

Figure C.1 Schematic of the microfluidic device operation
(a) Cell loading. Cell traps permit washing to remove untrapped cells, cell debris, and extra-cellular contamination.
(b) Cell lysis. The isolated single cell is transferred into the inflatable reaction chamber with lysis solution.
(c-d) Sequential reagent addition. The reagent supply channel is flushed with water and dried with compressed air between reagent additions. The inflatable reaction chamber permits flexibility in the experimental design and reagent addition is limited only by the maximum volume of the chamber, not by the number of reagents.
(e) Library indexing. Pre-spotted index primers are re-suspended with new reagents and added to the inflatable chamber.
(f) Product recovery. Indexed libraries are pooled and recovered from the chip for sequencing.

Figure C.2 Total reads by cell call
Lack of reads in “no-template control” libraries indicates extremely low levels of contamination in the cell suspension fluid.
Boxplots show the number of total reads in indexed libraries flagged using fluorescence microscopy as: “1 cell”, nothing visible in the trap except a single cell; “no-template control”, an empty trap containing only the cell suspension fluid, which acts as a measure of the background contamination expected to be present in the other libraries; “multiple cells”, a doublet or other integer number of single cells; “contaminating debris or uncertain cell call”, a trap containing contaminating debris, cells with contaminating debris, or a clump with an uncertain number of cells. The title of each panel indicates the number of indexed libraries in each category for the given sample.

Figure C.3 Lorenz curves
Lorenz curves, showing uniformity of coverage for merged single-cell genomes. Each curve corresponds to the median merged genome for the given sample from the bootstrap merging analysis. Dotted black lines represent perfectly uniform coverage. (a) DLP GM18507, sequenced at 192 cells per NextSeq flowcell. (b) DLP 184-hTERT-L2, sequenced at 192 cells per HiSeq lane. Black, a bulk genome for the same 184-hTERT-L2 sample prepared using the standard Nextera protocol; orange, a merged genome corresponding to 48 DLP 184-hTERT-L2 cells. (c) C-DOP-L 315A (54), sequenced at 96 cells per HiSeq lane. (d) A comparison of merged genomes from the full-depth DLP 184-hTERT-L2 dataset and the C-DOP-L 315A dataset (54). (e) A comparison of merged genomes from the DLP 184-hTERT-L2 dataset trimmed and downsampled to the same mean depth per cell as the C-DOP-L 315A dataset (54). Despite featuring the same mean depth per condition, the DLP libraries have more uniform coverage.
Figure C.4 Example copy number profiles from third passage xenograft SA501X3F
Examples of integer copy number profiles for single cells from third passage xenograft SA501X3F with unique copy number alterations. Black arrows highlight differences between a given cell and the dominant profile for the clone to which that cell belongs. Colours correspond to the copy number state assignment for a given genomic bin. Black lines indicate segment medians.

Figure C.5 Example integer copy number profiles for single cells from fourth passage xenograft SA501X4F
Top of first page, a cell with the majority profile; (i) through (ix), example cells from minor sub-populations highlighted in Figure 3.8; bottom of second page, a cell showing signs of DNA replication. Black arrows highlight differences between a given cell and the majority profile. Colours correspond to the copy number state assignment for a given genomic bin (150 kb). Black lines indicate segment medians.

Figure C.6 Analysis of contaminating mouse reads in libraries prepared from SA501X3F (upper panels) and SA501X4F (lower panels) xenograft samples
(a) Boxplots showing coverage depth relative to the human (left panels) and mouse (right panels) reference genomes. (b) Barplots showing total (grey) and mapped (blue) reads relative to the human and mouse reference genomes. The six indexed libraries with coverage depth ≥ 0.001 relative to the mouse reference are indicated in orange.

Figure C.7 Comparison of copy number segment calls in the fifteen xenograft cells with the highest number of total reads relative to bulk, as a function of the single-cell bin size
All cells were from the SA501X4F passage and are compared to the SA501X4F bulk Titan (31) copy number profile.
For each known bulk segment, and for each single cell, if at least 90% of single-cell bins had the exact same copy number call as the bulk, the call in that cell is considered correct. For each bulk segment, the fraction of cells with a correct call is then computed. Plots show the distribution of this “fraction of cells” with a correct copy number call for bulk segments in a given size range, and for a given HMMcopy (148) bin size. Results show that for DLP single cells sequenced at higher coverage depth, 50 kb bins permit consistent identification of known bulk segments in the range of 100 kb–1 Mb. Using 50 kb bins, cells in this dataset had a mean of 86.0 ± 8.7 reads per bin, with a median of 16 empty bins across the genome (excluding those on chromosome Y, since the sample is female). With the smallest bin size of 10 kb, even smaller known bulk segments are detected. However, the average number of reads per bin at this bin size is 17.2 ± 1.8, and a substantial number of bins have no reads (median 1646 empty bins outside of chromosome Y). While the overall copy number profile is remarkably well preserved at a bin size of 10 kb, these empty bins cause a drop-off in the fraction of correct calls for the larger segments.

Appendix D: Extra figures Chapter 4

Figure D.1 Spotter setup and single-cell isolation
(a) Spotting robot setup featuring: (I) microwell open-array located on customized chip-holder, (II) wash-solution reservoir, (III) active fresh-water wash station, (IV) dispensing nozzle, (V) droplet camera, (VI) chilled target holder. (b) Brightfield image of the dispensing nozzle. Orange arrow highlights ejected droplet. (c) Overlay of a brightfield image showing the dispensing nozzle and the mapping density of detected cells.
Green dots indicate ejected cells; blue dots indicate cells that were again detected after ejecting a single droplet; dotted blue line shows boundary of cell ejection area/volume; dotted orange line indicates sedimentation boundary. (d) Automated imaging permits the identification of single cells and targeted deposition into a nanowell. Cells were deposited if a single cell was detected in the ejection area and no particle was present in the sedimentation area (see (c)). Orange arrow highlights the single cell selected for deposition. (e) Brightfield image showing contaminating debris (orange arrow). (f) Montage of 186 fluorescent images of isolated single cells in the bottom of a nanowell using the CellenOne software. Images are aligned according to the array layout. (g) Montage of 186 fluorescent images from cells dispensed in a limiting dilution. Images are aligned according to the array layout.

Figure D.2 Cross-contamination during primer spotting
Cross-contamination was assessed by spotting an alternating pattern of human gDNA (n = 24) and salmon sperm DNA (n = 24) followed by library construction and sequencing. The conditions were repeated for a wash routine containing SciClean (Scienion) solution (a) and an alternative wash routine using 2% Tween 20 (b). Control libraries (n = 31, yellow) were not exposed to any washes and were built from single GM18507 cells. 0.90X ± 0.15 of total reads originating from the control libraries map to the human genome and 0.0054X ± 0.0095 to the salmon genome. The majority of reads originating from wells designated for human gDNA (green) align to the human reference genome (0.86X ± 0.062 and 0.91X ± 0.024 for the SciClean and Tween20 wash routine, respectively), whereas only a small fraction of reads from the same libraries align to the salmon genome (0.0047X ± 0.00085 and 0.0054X ± 0.00070 for the SciClean and Tween20 wash routine, respectively). This is approaching the sensitivity limit of our assay.
Comparably, only a small fraction of reads derived from wells that were designated for salmon sperm DNA (blue) aligned to the human genome (0.0094X ± 0.011 and 0.06X ± 0.14 for the SciClean and Tween20 wash routine, respectively). 0.35X ± 0.022 and 0.27X ± 0.15 of reads derived from wells exposed to the SciClean and Tween20 wash routine, respectively, align to the salmon genome. The low mappability can most likely be attributed to a mismatch between the salmon species of the reference genome (Salmo salar) and that of the DNA source (Oncorhynchus keta). Box plots show median and quartiles, the whiskers show the remaining distribution, and dots indicate outliers. Fractions of reads reported in the caption are mean ± standard deviation.

Figure D.3 Feature ranking
List of ranked features informing on the library quality of single cells.

Figure D.4 Significance correlation matrix for key parameters
Shows results of KW tests for experiments in Figure 4.4. Colour and size of dots in the correlation plots illustrate significance; P = 0.05 in gray.

Figure D.5 Live/dead significance matrix
Shows results of KW tests for experimental conditions in Figure 4.5. Colour and size of dots in the correlation plots illustrate significance; P = 0.05 in gray.

Figure D.6 Insert size distribution of DLP+ libraries
Insert size distribution of 4 DLP+ libraries. Experimental conditions are indicated in the panel heading. X-axis shows insert size; y-axis shows fluorescence units (FU; a measure of library quantity).

Figure D.7 Representative single-cell CN-profiles from GM18507 cells
Example profiles from two different experiments showing two nearly-diploid cells (a), three rare copy-number alterations detected in only one cell (b), and two cells with a high variability in reads (c).
Figure D.8 Representative single-cell CN-profiles from GM18507 cells showing clonal events
Example profiles from two different experiments (showing one cell from each experiment) with a 1-copy whole chromosome gain in Chr 2 (a), a 1-copy gain of the long arm of Chr 9 (b), and 1-copy deletion in Chr 5 (c).

Figure D.9 Representative single-cell CN-profiles from GM18507 cells with Chr 16q alterations
Example profiles from cells with a possible mis-segregation event detected in 3 different experiments carried out on 3 different chips using the open-array platform (a-c) and in the published DLP GM18507 dataset (d, see ref (1)).

Figure D.10 Total mapped reads split by cell call
Low mapped read counts in no-cell controls (NCC) and no-template controls (NTC) indicate low levels of human contamination. Cells were scanned on a fluorescence microscope and flagged as: “live single-cell”, nothing but a single cell was detected in the first fluorescent channel; “dead single-cell”, nothing but a single cell was detected in the second or both fluorescent channels; “gDNA”, wells that contain a single-cell equivalent amount of gDNA used as positive control; “NCC”, empty wells that contain an equal volume of cell suspension fluid as wells containing cells; “NTC”, empty wells containing only the library construction reagents; “Other”, which combines uncertain cell calls, multiple cells, clumps, debris and other observations that do not fall into any other category. NCC and NTC act as a measure of the background contamination that can be expected to be observed in the cell libraries. Box plots show median and quartiles, the whiskers show the remaining distribution, and dots indicate outliers.

Appendix E: Sequencing data processing

E.1 Data alignment and coverage uniformity analysis

Alignment

184-hTERT-L2 and xenograft libraries were sequenced on Illumina HiSeq instruments at the Michael Smith Genome Sciences Centre (GSC, Vancouver, Canada).
Demultiplexed FASTQ files were obtained from the GSC. GM18507 libraries were sequenced on an Illumina NextSeq instrument at the University of British Columbia Biomedical Research Centre (Vancouver, Canada). BCL files were demultiplexed and converted to FASTQ format using Illumina’s bcl2fastq2 program, version 2.16.0.10 (http://support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversion-software.html). Demultiplexed paired-end FASTQ files were trimmed to remove Nextera adaptor contamination and low-quality bases on the 3’ ends of reads using Trim Galore! version 0.3.7 with default parameters (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Trimmed reads were then aligned to the human reference genome GRCh37-lite.fa using BWA (145) version 0.7.10. The resulting BAM files were sorted using the SortSam tool from Picard version 1.121 (http://broadinstitute.github.io/picard/), and indexed with Samtools (146) version 0.1.19. Realignment intervals were identified and realigned concurrently for all indexed samples in a given sequencing lane, using the RealignerTargetCreator and IndelRealigner tools from GATK (147) version 3.4-46. Duplicate reads were marked and removed using MarkDuplicates from Picard, and the resulting files were indexed with Samtools, using the same versions as above.

For external datasets, as specified in Baslan et al. (2015), 7 bp index barcodes were trimmed using Trim Galore! off of the 5’ ends of all single-end reads prepared using the C-DOP-L protocol, and 8 bp were similarly trimmed for the WGA4 dataset (54). Low-quality bases on the 3’ ends of reads were also removed with Trim Galore! to match treatment of our samples. WGA4 adaptors were not trimmed, as no mention of adaptor trimming was found in the publication from which the dataset was derived, and the degenerate WGA primer sequence is proprietary (54). Alignments were then carried out as described above.
Merging and downsampling

Single-cell BAM files were merged using the MergeSamFiles tool from Picard version 1.121. Downsampling of BAM files was carried out using the DownsampleSam tool from the same version of Picard. Following downsampling, duplicates were re-marked using MarkDuplicates, as described above.

Sequencing metrics

Basic sequencing metrics were collected using the flagstat command from Samtools version 0.1.19. Additional metrics were collected for all aligned, merged and downsampled paired-end BAM files using Picard version 1.121: duplication metrics using MarkDuplicates, insert metrics using CollectInsertSizeMetrics, and genome coverage metrics using CollectWgsMetrics. The minimum base quality and minimum mapping quality were set to 20, and the coverage cap to 500 for WGS metric collection. Lorenz curves and coverage breadth were computed based on the histogram from CollectWgsMetrics output. For single-end BAM files, WGS coverage metrics were collected using CollectWgsMetrics from Picard version 1.141, with the COUNT_UNPAIRED=True option.

Statistical analysis of tumour sequencing metrics

In Figure 3.6c, we compare sequencing metrics between our breast cancer xenograft single-cell libraries and single breast cancer cells sequenced using two different DOP-PCR methods, WGA4 and C-DOP-L (patient-identifier, Pt41 (54)). To provide a fair comparison, we downsampled our SA501X3F and SA501X4F xenograft libraries by 73% and 78%, respectively, in order to achieve the same median number of mapped non-duplicate reads per cell as the Pt41 libraries. A KW test between our two downsampled datasets and the two Pt41 datasets showed no significant difference in the number of mapped non-duplicate reads (p=0.11). Next, we compared the variation in binned read counts (200 kb bins) between datasets, by computing the median absolute deviation (MAD) over chromosome 19, the only chromosome which was diploid in both the SA501 xenograft passages and Pt41 dataset.
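The MAD of binned read counts for one cell could be computed along the following lines. This is a minimal sketch rather than the analysis code used here: the function names and the example counts are hypothetical, and dividing the counts by their median before taking the MAD is an assumption made so that cells with different read totals are comparable (the text does not state whether or how counts were scaled).

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    values = list(values)
    centre = median(values)
    return median([abs(v - centre) for v in values])


def normalised_mad(bin_counts):
    """MAD of bin counts after scaling by their median.

    Assumption: scaling by the median is used only to make cells with
    different sequencing depth comparable; it is not taken from the text.
    """
    centre = median(bin_counts)
    return median_absolute_deviation([c / centre for c in bin_counts])


# Hypothetical read counts for seven 200 kb bins on chromosome 19.
chr19_counts = [98, 102, 100, 95, 110, 100, 105]
chr19_mad = normalised_mad(chr19_counts)
print(round(chr19_mad, 3))
```

A lower value indicates less bin-to-bin variation in read counts across the chromosome.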
Another KW test shows that our heavily downsampled xenograft cells had lower median variation in binned read counts relative to the DOP-PCR cells, given the same number of mapped non-duplicate reads (p<2.2e-16). A post-hoc Dunn’s test with Benjamini-Hochberg correction showed all comparisons between the downsampled xenograft datasets and Pt41 datasets were significant, but there was no significant difference between the X3F and X4F xenograft passages (see Table 3.4). While it is difficult to compare coverage breadth for different tumour samples in a fair manner, it should be noted that the C-DOP-L protocol achieved higher median breadth than the WGA4 protocol when applied to the same patient tumour (KW test, p=1.89e-5), suggesting that WGA4 would not be expected to out-perform DLP in a bootstrap merging analysis, were such a dataset available for comparison.

Plotting

Sequencing metrics were plotted using Python 2.7 with the Matplotlib and Seaborn packages (version 1.4.0 and 0.6.0, respectively). For all boxplots, the middle line represents the median, the box indicates the quartiles, and the whiskers show the rest of the distribution, except for “outlier” data points. The Seaborn default definition for outliers was used, with the whis parameter set to 1.5 (the proportion of the inter-quartile range past the high and low quartiles to which the whiskers extend). All points past this range are considered “outliers” and are displayed with small diamond-shaped markers.

E.2 Single-cell copy number inference

Copy number profiles were inferred using HMMcopy (148) version 0.99.0, with several modifications to the standard usage of this tool. For the analysis in Figure 3.6, due to the inclusion of downsampled datasets, all profiles were inferred with 200 kb bins for consistency. For all other analyses, profiles were inferred using 150 kb bins.
First, a mappability file was generated for the reference genome GRCh37-lite.fa using the generateMap.pl script packaged with HMMcopy, with the --window option set to 35 bp. Following this, binned GC-content and mappability files were generated using the gcCounter and mapCounter tools packaged with HMMcopy, with the -w option set to the bin size (200 kb or 150 kb, as described above). For each single-cell BAM file, a binned WIG file was generated using the readCounter tool packaged with HMMcopy, with a minimum mapping quality of 20. These were read into R version 3.0.2 with the HMMcopy command wigsToRangedData, and corrected using the correctReadcount command with the GC-content and mappability files described above. However, instead of using the logged, GC- and mappability-corrected values, the HMM was run on the non-logged GC-corrected values, following filtering of bins with mappability less than 0.78. Instead of the default six states, the model was run with seven states, and the default model parameters were modified as follows:

    params$m <- c(0, 0.5, 1, 1.5, 2, 2.5, 3)
    params$mu <- c(0, 0.5, 1, 1.5, 2, 2.5, 3)
    params$kappa <- c(25, 50, 800, 50, 25, 25, 25)
    params$e <- 0.995
    params$S <- 35

After segmentation, binned read counts were converted from the arbitrary GC-corrected scale (where the “neutral” state was centered at 1) to an integer copy number scale. This was done by dividing all bins across the genome by half the median value for bins assigned to the “neutral” or 2-copy state. It should be noted that this approach would have to be adjusted if the sample ploidy differed greatly from two, which was not the case for our samples. Following conversion to integer copy number scale, segment medians were recomputed, and the median for each segment was rounded to the nearest integer to derive an “integer copy number profile” for the sample.
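The rescaling and rounding steps described above can be illustrated with a minimal Python sketch. It is not the code used for the thesis analysis (which was carried out in R with HMMcopy); the function name, argument names, state labels and toy values are all hypothetical and serve only to show the arithmetic.

```python
import math
from statistics import median


def to_integer_copy_number(bin_values, bin_states, segment_ids, neutral_state):
    """Rescale GC-corrected bin values to a copy number scale and round
    each segment median to the nearest integer.

    All names and state labels here are illustrative assumptions.
    """
    # Median of the bins that the HMM assigned to the neutral (2-copy) state.
    neutral = [v for v, s in zip(bin_values, bin_states) if s == neutral_state]
    if not neutral:
        raise ValueError("no bins assigned to the neutral state")
    scale = median(neutral) / 2.0  # half the neutral-state median

    # Divide every bin across the genome by that value.
    scaled = [v / scale for v in bin_values]

    # Recompute segment medians on the new scale and round them.
    by_segment = {}
    for seg, v in zip(segment_ids, scaled):
        by_segment.setdefault(seg, []).append(v)
    segment_cn = {seg: int(math.floor(median(vals) + 0.5))
                  for seg, vals in by_segment.items()}
    return scaled, segment_cn


# Toy input: three segments (neutral, one-copy loss, high-level gain).
values = [1.02, 0.98, 1.00, 0.51, 0.49, 4.1, 3.9]
states = [2, 2, 2, 1, 1, 6, 6]     # hypothetical HMM state labels
segments = [0, 0, 0, 1, 1, 2, 2]   # hypothetical segment identifiers
scaled, segment_cn = to_integer_copy_number(values, states, segments,
                                            neutral_state=2)
print(segment_cn)
```

In this toy input the third segment sits near 4 on the GC-corrected scale, so it is rounded to eight copies after rescaling, even though its bins carry the label of the highest HMM state.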
Thus, despite the last HMM state encompassing all CNVs with more than six copies, high-level amplifications were assigned to their correct integer values. Following copy number inference, the median absolute deviation (MAD) of all bins assigned to the “neutral” 2-copy state was computed, and all indexed libraries with “neutral MAD” < 0.15 that were flagged as “1 cell” using fluorescence microscopy were retained for downstream phylogenetic analysis.

E.3 Phylogenetic inference
Phylogenetic inference was carried out using MrBayes (152) version 3.2.2. A matrix of integer copy number across all genomic bins was generated for each dataset. Since the standard (morphological) model accepts states from 0 to 9, all bins with copy number greater than the maximum allowable value were set to 9. The parameters of the likelihood model were set to lset coding=all rates=adgamma, to account for correlated rates for adjacent sites. The MCMC was then run for 10 million generations, with a burn-in of 5 million. The resulting consensus trees were plotted with FigTree version 1.4.0 (http://tree.bio.ed.ac.uk/software/figtree/).

E.4 Clonal genome analysis
Copy number profiles were generated for the merged clonal genomes from xenograft SA501X3F using the same parameters applied to the single-cell samples, as described above. For each clonal genome, all segments were first classified as amplifications (three or more copies) or deletions (one copy or fewer), and sub-divided by size range. Next, for each such clonal copy number segment, the copy number calls of all single cells within that clone were compared, and those with matching classification (amplification/deletion) for at least 90% of bins within the segment were considered “overlapping”. The fraction of cells with overlapping calls for each segment was computed.
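As an illustration of the overlap criterion just described, the following Python sketch classifies integer copy number calls and computes the fraction of cells that agree with a clonal segment. It is a simplified sketch under stated assumptions, not the analysis code used here: the function names and the toy per-cell bin calls are hypothetical.

```python
def classify(copy_number):
    """Amplification (three or more copies), deletion (one copy or fewer),
    or neutral, following the classification in the text."""
    if copy_number >= 3:
        return "amp"
    if copy_number <= 1:
        return "del"
    return "neutral"


def fraction_overlapping(segment_class, cells_bins, min_fraction=0.9):
    """Fraction of cells whose bin calls within one clonal segment match
    the segment's class in at least min_fraction of the bins.

    cells_bins holds one list of integer copy number calls per cell,
    restricted to the bins inside the segment (hypothetical layout).
    """
    n_overlapping = 0
    for bins in cells_bins:
        matching = sum(1 for cn in bins if classify(cn) == segment_class)
        if matching / len(bins) >= min_fraction:
            n_overlapping += 1
    return n_overlapping / len(cells_bins)


# Toy example: one amplified clonal segment spanning ten bins, three cells.
cells = [
    [3] * 10,          # all ten bins amplified
    [4] * 9 + [2],     # nine of ten bins amplified (exactly 90%)
    [3] * 8 + [2, 2],  # eight of ten bins amplified (below the cutoff)
]
fraction = fraction_overlapping("amp", cells)
print(fraction)
```

With these toy calls, two of the three cells meet the 90% criterion, so the segment would contribute a value of two thirds to the distribution for its size range.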
Finally, boxplots were generated showing the distribution for segments within a given size range.

E.5 Bulk-equivalent and bulk genome analysis
Bulk-equivalent genomes were generated by merging all indexed libraries for a given xenograft passage, excluding those flagged as containing mouse contamination (≥ 0.001X depth relative to the mouse reference genome, Mus_musculus.GRCm38.dna.primary_assembly.fa). Loss of heterozygosity (LOH), somatic single nucleotide variants (SNVs), and rearrangement breakpoints were then inferred independently for the standard bulk libraries and bulk-equivalent genomes of each xenograft passage.

Loss of heterozygosity analysis
Copy number and loss of heterozygosity were inferred using Titan version 1.5.5. First, germline heterozygous single nucleotide polymorphisms (SNPs) were identified in a matched normal BAM file for patient SA501 using mutationSeq (147) version 4.3.7. Next, allele counts were extracted for the tumour genome at each heterozygous germline position using mutationSeq. Read depth for 1000 bp bins was extracted across the tumour and normal genomes using the HMMcopy readCounter tool packaged with Titan, and counts were corrected for GC-content and mappability. Samples were analyzed with the Titan ploidy parameter initialized at 1.8, based on previous experimental measurements for the SA501 patient tumour sample (24). The following additional parameters were modified:

    txn_z_strength = 1e6
    txn_exp_len = 1e16
    alpha_k = 15000

For SA501X3F, the two-cluster solution was selected, as Titan did identify the sub-clonal deletion of chromosome X, which is found in both minor Clones B and C, but did not identify the chromosome 11 amplifications found only in Clone C.

Single-nucleotide variant calling
Somatic single nucleotide variants were predicted on the paired tumour and normal samples from patient SA501 with mutationSeq (149), using model v4.1.2.
Variants with probability of at least 0.9 and without the filter=INDEL flag were considered high confidence. The union of all high-confidence variants called in the bulk and bulk-equivalent samples was identified, and mutationSeq was run again in “targeted” mode to extract the reference and variant allele counts for those positions in both samples. Variant allele counts for a set of 184 somatic SNVs previously validated (24) in the SA501 xenograft series using targeted ultra-deep re-sequencing were also extracted for the bulk and bulk-equivalent genomes.

Rearrangement breakpoint inference
Rearrangement breakpoints were predicted using deStruct version 0.0.4, a tool derived from nFuse (150) and available at https://bitbucket.org/dranew/destruct. The deStruct model was run independently in “single-mode” on the bulk and bulk-equivalent samples, and all reported breakpoints with probability ≥ 0.9 in at least one of the two samples were considered “high-confidence” and retained for downstream analysis. Within this set of “high-confidence” breakpoints, those with identical start and end coordinates called in both samples were considered overlapping.

Bin size and segment length limits in higher-depth xenograft cells
To determine whether smaller bin sizes could be used in DLP cells sequenced to higher depth, the fifteen cells with the highest number of total reads were identified. All fifteen cells were from the SA501X4F xenograft passage; copy number profiles for these cells were therefore compared to the Titan (31) profile for the SA501X4F bulk genome (inferred with 1 kb bins, as described above). Analysis of overlap in segment calls was carried out as follows. For each known bulk segment, and for each cell, if at least 90% of single-cell bins within the region had the exact same copy number call as the bulk, the call in that cell was considered correct.
The fraction of cells with a correct call was then computed, and the distribution of this “fraction of cells” with a correct call was plotted for a given segment size range. This was done for different HMMcopy (148) bin sizes (150, 100, 50, and 10 kb bins). As bin sizes decreased, the mean number of reads per bin also decreased, leading to an increase in the number of empty bins in low-mappability regions of the genome. To compensate for this, the mappability cutoff for HMMcopy was increased (from 0.78 for 150 kb bins, to 0.82, 0.86, and 0.90 for 100, 50, and 10 kb bins, respectively). In addition, given more bins across the genome for smaller bin sizes, the probability of staying in a segment (HMMcopy parameter e) was increased such that the expected number of state changes remained constant at 100 (0.995 for 150 kb bins, and 0.9967, 0.9983, and 0.99967 for 100, 50, and 10 kb bins, respectively).

E.6 GC-content estimation
To estimate the GC-content of libraries for different experimental conditions, aligned and processed BAM files were run through the Picard tool CollectGcBiasMetrics (https://broadinstitute.github.io/picard/command-line-overview.html#CollectGcBiasMetrics), which computes the fraction of normalized coverage as a function of the GC content of 100 base pair windows for each cell individually. We then combined the fraction of normalized coverage over all cells per experimental condition using a generalized additive model, fitted with the gam function from the mgcv package in R (http://www.rdocumentation.org/packages/mgcv/versions/1.8-23/topics/gam) via the ggplot2 geom_smooth function (http://ggplot2.tidyverse.org/reference/geom_smooth.html). The end result is a single smooth curve per experimental condition, representative of the normalized coverage over all cells as a function of GC content.
Finally, to determine the average GC content per experimental condition, we computed a coverage-weighted mean: each GC content value was multiplied by its fractional coverage, the products were summed, and the result was divided by the sum of the single-cell coverages for that experimental condition.
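The weighted mean described above amounts to the sum of (GC content × coverage) divided by the sum of coverages. A minimal Python sketch follows; the function name and the toy values are hypothetical and are not taken from the thesis data.

```python
def weighted_mean_gc(gc_content, coverage):
    """Coverage-weighted mean GC content.

    gc_content: GC percentage of each window class.
    coverage:   fractional (normalized) coverage observed at that GC value,
                pooled over the cells of one experimental condition.
    """
    if len(gc_content) != len(coverage):
        raise ValueError("gc_content and coverage must have equal length")
    total = sum(coverage)
    if total == 0:
        raise ValueError("total coverage is zero")
    # Multiply each GC value by its coverage, then divide by total coverage.
    return sum(g * c for g, c in zip(gc_content, coverage)) / total


# Toy example: three GC classes with twice the coverage at 40% GC.
gc_values = [30, 40, 50]
coverages = [1.0, 2.0, 1.0]
mean_gc = weighted_mean_gc(gc_values, coverages)
print(mean_gc)
```

Because the toy coverage is symmetric around 40% GC, the weighted mean here equals 40; a library with coverage skewed towards GC-rich windows would shift the value upwards.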
