UBC Theses and Dissertations


Gene synthesis by assembly of short oligonucleotides Horspool, Daniel Richard 2009


GENE SYNTHESIS BY ASSEMBLY OF SHORT OLIGONUCLEOTIDES

by

DANIEL RICHARD HORSPOOL

B.Sc., University of Victoria, 2005

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES (Bioinformatics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

December 2009

© Daniel Richard Horspool, 2009

Abstract

In principle, a pre-constructed library of all possible short oligonucleotides could be used to construct many distinct gene sequences. This approach requires computational methods to accurately determine the assembly procedure, but relieves the current technological constraints of custom oligonucleotide synthesis.

In order to assess the feasibility of such an approach, I examined T4 DNA ligase activity on short oligonucleotides and found that ligation is dependent on the formation of a double-stranded DNA duplex of at least five base pairs flanking the site of ligation. However, ligations could be performed with overhangs smaller than five nucleotides and oligonucleotides as small as octamers, in the presence of a second, complementary oligonucleotide.

As a proof of principle for DNA synthesis through the assembly of short oligonucleotides, I performed a hierarchical ligation procedure whereby octamers were combined to construct a target 128 bp segment of the human beta-actin gene coding sequence. Thus, the construction of synthetic genes, without the need for custom oligonucleotide synthesis, is feasible. Algorithmic methods were then developed to extend this approach to DNA on the order of thousands of base pairs.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Preface
Dedication
1 Introduction
  1.1 A DNA Primer
  1.2 Gene Synthesis
    1.2.1 Background
    1.2.2 Phosphoramidite Chemistry and Oligonucleotide Synthesis
    1.2.3 PCA and LCR
    1.2.4 Reducing Errors
  1.3 Rationale
    1.3.1 Ligase Requirements
    1.3.2 Library Size
2 Characterization of T4 DNA Ligase and Short Oligonucleotides
  2.1 Methodology for Fluorescence Experiments
    2.1.1 Methodology for Bead Immobilization
    2.1.2 Methodology for Basic Ligation Assays
    2.1.3 Methodology for Ligation Assays with a Supplementary, Second Oligonucleotide
    2.1.4 Methodology for Hexamer Ligation Assays with Serial Concentrations of PEG
    2.1.5 Methodology for Characterizing Rescue Assays with Serial Concentrations of the Supplementary Oligonucleotide
    2.1.6 Methodology for Ligation Assays with Alternative Rescue Oligonucleotides
  2.2 Fluorescence Experiments
    2.2.1 Minimal Oligonucleotide Lengths for Ligation
    2.2.2 Improvement of 4 bp Ligations with a Second Oligonucleotide
    2.2.3 Improvement of 3 bp Ligations with a Second Oligonucleotide
    2.2.4 Alterations in Oligonucleotide Arrangements Improve Ligation Efficiency
    2.2.5 Optimal Supplementary Oligonucleotide Concentrations are Dependent on the Ligation Kinetics of the First Oligonucleotide
    2.2.6 Improved Ligation Rates in the Presence of PEG
    2.2.7 Hairpin Oligonucleotides can Improve Hexamer Ligation
  2.3 Methodology for Pooled Oligonucleotide Experiments
    2.3.1 Methodology for Octamer and Hexamer Pools
  2.4 Pooled Oligonucleotide Experiments
    2.4.1 Octamers
    2.4.2 Hexamers
    2.4.3 Dephosphorylated Oligonucleotides can be Used in Pooled Assemblies
  2.5 Methodology for DNA Construction Experiments
    2.5.1 Experimental Methodology for Pair-wise Assembly
    2.5.2 Experimental Methodology for Serial Assembly
  2.6 DNA Construction Experiments
    2.6.1 Hierarchical Assembly of a 128 bp Construct
    2.6.2 Serial Assembly of a 128 bp Construct
3 Algorithmic Assembly of Genes
  3.1 Sequence Tiling
  3.2 Problematic Regions
    3.2.1 Restriction Sites
    3.2.2 Complete Palindromes
    3.2.3 Repeats in Oligonucleotide Pools
    3.2.4 Palindromic 4 bp Ends
    3.2.5 Single Polynucleotide Stretches
  3.3 Conflict Avoidance Strategies
    3.3.1 Restriction Sites
    3.3.2 Palindrome Avoidance
  3.4 Methodology of Algorithmic Assembly Strategies
    3.4.1 Subunit Requirement Strategies
    3.4.2 Subunit Assembly Strategies
    3.4.3 Solution Selection and Filtering Strategies
  3.5 Algorithmic Assembly of Selected Genes
    3.5.1 Overview of the Assembly Process
    3.5.2 Assembly of EGFP
    3.5.3 Assembly of TetC
4 Discussion
  4.1 Addressing Error Rates
    4.1.1 Error Rate Analysis
    4.1.2 Cost Analysis
    4.1.3 Reducing Error Rates
    4.1.4 Assembly with Hexamer Analogues
    4.1.5 Danger and Security
    4.1.6 Conclusions
  4.2 Future Research
    4.2.1 Engineering a New Ligase through Directed Evolution
    4.2.2 Microfluidic and Labcyte Assembly
  4.3 Significance
5 References
Appendix A – Glossary
Appendix B – Sequencing Results for Beta-Actin Assemblies
  B.1 Serial Assembly of 100 bp BA
  B.2 Serial Assembly of 128 bp BA using High Concentration Oligonucleotides
  B.3 Hierarchical (Pair-wise) Assembly of 128 bp BA
Appendix C – Source Code for the Birthday Paradox Algorithm
Appendix D – Source Code for Selected Assembly Algorithm Classes
Appendix E – Sample of an EGFP Assembly
Appendix F – Sample of a TetC Assembly

List of Tables

Table 1 – Milestones in synthetic DNA
Table 2 – All known minimum oligonucleotide length requirements for ligation on a complete template as previously reported in literature
Table 3 – Comparison of the number of hexamers and octamers required to construct a sample of genes
Table 4 – Number of occurrences of EarI and BbsI in various nucleotide sequences
Table 5 – The number of palindromes that occur in the first tiling frame and the number of palindromes found in the frame that contains the least number of palindromes
Table 6 – Possible subunit validation strategies
Table 7 – Possible assembly strategies
Table 8 – Possible solution filtration strategies
Table 9 – Summary of assembly solutions for eGFP with different parameters
Table 10 – Fraction of clones with correct full-length sequences
Table 11 – Costs associated with octamer gene assembly

List of Figures

Figure 1 – Composition of DNA
Figure 2 – Oligonucleotide synthesis cycle
Figure 3 – Graph of the hypothetical % yield based on oligonucleotide length and per-cycle coupling failure rates of 0.5, 1, and 2%, respectively
Figure 4 – Overview of PCA and LCA
Figure 5 – Required library size for increasing oligonucleotide lengths
Figure 6 – Oligonucleotide ligation on a complete and partial template
Figure 7 – Schematic diagram of immobilized double-stranded DNA used in ligation assays and DNA construction
Figure 8 – Observed ligation of a labeled oligonucleotide with an immobilized dsDNA containing a 3, 4, or 5 nucleotide 5’ overhang
Figure 9 – (a) Unsuccessful 4 bp duplex reactions could be salvaged by utilizing a supplementary oligonucleotide, designed to complement the first oligonucleotide
Figure 10 – Unsuccessful 3 bp duplex reactions can be salvaged by utilizing a supplementary oligonucleotide
Figure 11 – Improved ligation for a hexamer-hexamer pair with extended reaction time and reduced temperature
Figure 12 – Ligation of a labeled heptamer using a supplementary hexamer
Figure 13 – Alternative arrangements of hexamers show some improvement to ligation
Figure 14 – Ligation efficiencies with variable concentrations of the supplementary oligonucleotide
Figure 15 – Improvement of hexamer ligation efficiency with PEG
Figure 16 – Variations to the supplementary oligonucleotide used offer different ligation potentials
Figure 17 – Ligation of multiple oligonucleotides
Figure 18 – Ligation of pools of oligonucleotides
Figure 19 – Hexamer pooling experiments with increasing concentrations of PEG
Figure 20 – Dephosphorylated octamers can be used in pooled ligations when supplemented with PNK
Figure 21 – Hierarchical assembly of a 128 bp construct
Figure 22 – Result from the 128 bp hierarchical assembly
Figure 23 – Serial assembly of a 128 bp construct
Figure 24 – Schematic representation of a two-step assembly procedure
Figure 25 – Common problems encountered during subunit assembly of pooled oligonucleotides
Figure 26 – Flowchart of the inchworm algorithm
Figure 27 – Assembly of eGFP
Figure 28 – Assembly of tetC
Figure 29 – Overview of an assembly strategy that involves alternating receding/protruding sticky ends

List of Abbreviations

A       Adenine
APS     Ammonium Persulfate
BAC     Bacterial Artificial Chromosome
C       Cytosine
CPG     Controlled Pore Glass
dsDNA   Double-stranded DNA
DMT     Dimethoxytrityl
DNA     Deoxyribonucleic acid
EDTA    Ethylenediaminetetraacetic acid
FAM-6   6-Carboxyfluorescein
G       Guanine
LCR     Ligase Chain Reaction
PCA     Polymerase Cycling Assembly
PCR     Polymerase Chain Reaction
RNA     Ribonucleic acid
ssDNA   Single-stranded DNA
T       Thymine
TAE     Tris Acetate EDTA
TE      Tris EDTA
YAC     Yeast Artificial Chromosome

Acknowledgements

I thank Dr. Robert Holt for his encouragement, advice, guidance, and patience throughout the duration of my research. I would especially like to thank him for his many insightful conversations during the development of the ideas in this thesis, for his helpful suggestions and support, and for listening to my numerous and often farfetched ideas.

I thank my committee members, Gregg Morin, Ryan Brinkman, and Andre Marziali, for their helpful suggestions, guidance, and discussions of my research.

I thank Dr. Steven Jones, Ms. Sharon Ruschkowski, and the rest of the Bioinformatics Training Program for Health Research for providing me the opportunity to undertake my research.

I am indebted to Duane Smailus and all of Vectorology for their laboratory expertise and the training they provided me while tolerating all my questions.

I would also like to thank the Natural Sciences and Engineering Research Council of Canada, the Michael Smith Foundation for Health Research, and the Bioinformatics Training Program for Health Research, who have supported me during my research.

Preface

In my last rotation of the Bioinformatics Training Program, under the supervision of Dr. Robert Holt, we were interested in constructing a minimal genome using Haemophilus influenzae as a template. Unfortunately, it soon became apparent that even though many genes are well understood, there were too many genes with unknown function to make this project feasible.
The only sure means to develop a minimal genome would then have been to selectively knock out all of these undefined genes. Further still, isolating the essential genes is just the tip of the iceberg in rebuilding a functional minimal genome. This daunting task led us to consider the practicalities of genome construction and its underlying requirements. At the time of writing, a gene of a few kilobase pairs can be constructed for a few thousand dollars. Unfortunately, even the smallest known bacterial genome, that of Mycoplasma genitalium, is 580 kbp. In turn, the most basic question was raised: can long DNA be constructed reliably and cost-effectively?

Dedication

This work is dedicated to my parents, Ursula and Nigel Horspool, for their encouragement, guidance, and unconditional love.

1 Introduction

1.1 A DNA Primer

Deoxyribonucleic acid (DNA) is composed of two polymers with nucleotide units. Each of these polymers, or polynucleotides, is composed of an ordered assembly of four possible nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T). These nucleotides, while each containing a different base, all share the same deoxyribose moiety and are linked together at their 5’ and 3’ positions through phosphodiester bonds. The two polynucleotides run in opposite directions to each other and each nucleotide on one strand is complementary to the opposite nucleotide on the other strand; for this reason, the strands of DNA are said to be complementary and anti-parallel (Figure 1). Just as ones and zeros can encode a computer program, it is the precise order of nucleotides in DNA which gives rise to the instructions that a cell (or virus) is programmed to follow. DNA can easily be replicated by a cell because each polynucleotide strand is complementary to the other and thus both can be used as templates to produce new complementary strands, an essential requirement for passing on genetic information to progeny.
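
The complementary, anti-parallel relationship described above can be illustrated with a short sketch. This is an illustration only, not code from this thesis; the names COMPLEMENT, reverse_complement, and is_palindromic are hypothetical, and the sketch assumes sequences written 5’ to 3’ using only the letters A, C, G, and T.

```python
# Illustrative sketch only: the names below are hypothetical and are not
# taken from the thesis source code.

# Watson-Crick pairing: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, written 5' to 3'.

    Because the two strands are anti-parallel, the partner strand is read
    in the reverse direction, with each base swapped for its complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def is_palindromic(strand: str) -> bool:
    """True when a sequence is identical to its own reverse complement."""
    return strand.upper() == reverse_complement(strand)
```

For example, reverse_complement("ATGC") returns "GCAT", and is_palindromic("GAATTC") is true because GAATTC reads the same on both strands.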

Figure 1 – Composition of DNA. (Top) Deoxyribonucleic acid (DNA) is naturally found as a double helix of two complementary, anti-parallel polynucleotides. (Middle) Each polynucleotide chain is composed of a long contiguous series of nucleotides joined together by phosphodiester bonds. (Bottom) Individual nucleotides can be assembled in a precise order based upon the composition of the nucleotides in the complementary strand: A pairs with T, and C pairs with G.

1.2 Gene Synthesis

Gene synthesis, the de novo fabrication of DNA of gene coding sequence length (on the order of several kilobases) from chemically derived oligodeoxyribonucleotides (oligonucleotides), holds incredible promise for the advancement of the biological sciences. Gene synthesis offers the ability to optimize genes for unnatural hosts, alter existing or append new restriction sites, create chimeric fusion proteins, or even produce genes for completely artificial transcripts and more (Beattie and Fowler 1991).

With such promise, gene synthesis is quickly becoming more powerful and practical than the conventional genetic manipulation techniques of restriction, ligation, cloning, and screening (Wu et al. 2006). However, despite this progress, it is not yet possible to chemically synthesize entire genes as continuous DNA strands de novo. Rather, gene synthesis relies on assembling chemically synthesized oligonucleotides together into a complete gene (Czar et al. 2009). The challenge is then how to economically and efficiently assemble these expensive and often error-prone oligonucleotide precursors correctly into a gene. As a consequence of these drawbacks, synthetic DNA has been hindered from mainstream use in the laboratory (Zhou et al. 2004).

1.2.1 Background

The concept of synthetic and custom DNA was first realized about forty years ago. In 1968, Gupta et al.
produced the first synthetic gene of a yeast transfer RNA by a laborious effort of producing two icosanucleotides along with several shorter oligonucleotides and enzymatically joining them together (Gupta et al. 1968a). Although the product was only 30 bp and took over two years to produce, it represents the beginning of gene synthesis.

Table 1 – Milestones in synthetic DNA. (PCA: polymerase cycling assembly; BAC: Bacterial Artificial Chromosome; YAC: Yeast Artificial Chromosome)

Fragment Name                     Length     Methods Used                                Year  References
Ala-tRNA gene fragment            30 bp      Enzymatic ligation                          1968  (Gupta et al. 1968a)
Tyrosine tRNA gene                126 bp     Enzymatic ligation                          1976  (Khorana et al. 1976)
ompA signal peptide gene          250 bp     Stepwise elongation with PCR                1992  (Majumder 1992)
Plasmid                           2.7 kbp    PCA                                         1995  (Stemmer et al. 1995)
phiX174 bacteriophage genome      5.4 kbp    PCA                                         2003  (Smith et al. 2003)
Polyketide synthase gene cluster  32 kbp     PCA, BAC                                    2004  (Kodumal et al. 2004)
Rice chloroplast genome           134.5 kbp  PCA, B. subtilis BACs, inchworm elongation  2008  (Itaya et al. 2008)
Mycoplasma genitalium genome      582 kbp    PCA, BAC, YAC                               2008  (Gibson et al. 2008)

1.2.2 Phosphoramidite Chemistry and Oligonucleotide Synthesis

The key step in chemical oligonucleotide synthesis is the forging of phosphodiester bonds between specific nucleotides. In nature, ligases and polymerases catalyse the formation of a phosphodiester bond between adjacent nucleotides; however, when joining nucleotides chemically, additional steps are necessary (Caruthers et al. 1983; Caruthers et al. 1980). First, the deoxyribose moiety of a nucleotide contains two reactive hydroxyl groups and thus one must be protected while a phosphodiester bond is formed with the other. Secondly, the various nitrogenous bases attached to the deoxyribose in each nucleotide contain reactive nitrogen and carboxyl groups that must be protected throughout the entire synthesis process.
Lastly, because no reaction proceeds to completion, steps must be included to ensure unreacted intermediates do not continue to react in subsequent additions (Figure 2). With these requirements, the complete synthesis process involves a cyclic procedure of four steps repeated for each nucleotide joined (Caruthers et al. 1987). In contrast to the 5’ to 3’ direction of enzymatic, template-dependent DNA synthesis seen in nature, chemical synthesis forms oligonucleotides in a 3’ to 5’ direction.

Figure 2 – Oligonucleotide synthesis cycle. Each nucleotide addition consists of four chemical reactions: deblocking (detritylation), coupling, oxidation, and capping. During deblocking, DMT is removed from the growing oligonucleotide, resulting in a free 5’ hydroxyl group. During coupling, an activated nucleoside phosphoramidite is joined to the newly formed 5’ hydroxyl. During oxidation, the unstable tricoordinated phosphite triester linkage formed is converted to a protected tetracoordinated phosphate triester. During capping, oligonucleotides that fail to join to an activated nucleoside are permanently blocked to prevent their further use in subsequent steps. Figure adapted from (Caruthers et al. 1987).

A significant factor in the high cost of gene synthesis is the formation of errors during repeated nucleotide additions (Caruthers 1985). During each cycle, a small fraction (~1–2%) of the growing oligonucleotides do not condense with the incoming activated nucleotide. As part of the four steps, these incomplete products must be capped to prevent further additions to them. The capping process itself, however, is imperfect, and thus some incomplete products will escape capping and result in erroneous products.

Due to the repetitive nature of the synthesis process, even with a small error rate, a geometric progression of truncated products ensues (Hecker and Rill 1998).
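
The geometric decline in full-length product can be made concrete with a small calculation. The sketch below is an illustration under simplifying assumptions, not a model taken from the cited literature: it assumes a constant per-cycle coupling failure rate, that an oligonucleotide of length n needs n - 1 couplings (the first nucleoside being supplied on the solid support), and that capping failures are ignored. The function name full_length_yield is hypothetical.

```python
# Illustrative sketch only. Assumptions (not from the thesis): constant
# per-cycle failure rate, n - 1 couplings for an n-mer, and no
# contribution from capping failures.

def full_length_yield(length: int, failure_rate: float) -> float:
    """Expected fraction of chains that couple successfully in every cycle."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must lie between 0 and 1")
    couplings = length - 1
    # Each coupling succeeds independently with probability 1 - failure_rate,
    # so the yield shrinks geometrically with the number of couplings.
    return (1.0 - failure_rate) ** couplings


if __name__ == "__main__":
    for rate in (0.005, 0.01, 0.02):
        for length in (50, 100, 500):
            percent = 100.0 * full_length_yield(length, rate)
            print(f"failure rate {rate:.1%}, {length}-mer: {percent:.1f}% full length")
```

Under these assumptions, a 1% failure rate leaves roughly 61% of 50-mers and roughly 37% of 100-mers at full length, and well under 1% of 500-mers.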
Further still, capping failures result in some incomplete intermediates being reused in the remaining cycles, thus producing oligonucleotides that contain missing internal nucleotides known as N-K products (Temsamani et al. 1995). As a consequence, costly purification and isolation techniques are required post-synthesis to isolate the desired product. Moreover, the cost of the necessary starting materials may be prohibitive for producing a significant yield of product if the desired length exceeds practical limitations (Hecker and Rill 1998). Even with a conservative estimate of a 1% error rate per cycle, the yield of correct product is only about 37% after 99 cycles (Figure 3). After 500 cycles, this yield drops well below 1%.

Figure 3 – Graph of the hypothetical % yield based on oligonucleotide length and per-cycle coupling failure rates of 0.5, 1, and 2%, respectively. The true coupling failure rate typically varies between cycles but generally lies between 1 and 2%. Assuming a conservative 1% failure rate, only 61% of synthesized 50mers are expected to be accurate. By the 100mer size, the yield falls to about 37%, and by the 500mer size (not shown), it is less than 1%.

1.2.3 PCA and LCR

To overcome the length limitation imposed by chemical synthesis, most DNA synthesis approaches employ Polymerase Cycling Assembly (PCA) or Ligase Chain Reaction (LCR) (Xiong et al. 2008a). In PCA, chemically synthesized oligonucleotides designed to overlap are pooled, and repeated annealing and polymerase extension steps assemble and amplify a complete DNA sequence (Stemmer et al. 1995). Alternatively, LCR achieves DNA synthesis by annealing long oligonucleotides and ligating the fragments into one continuous double-stranded DNA sequence, amplifiable by PCR (Au et al. 1998). The oligonucleotides used in either process must be both long enough to guarantee unique annealing and error-free so as not to introduce base errors into the final product (Carr et al. 2004; Linshiz et al. 2008; Xiong et al.
2008b). Due to the requirement of precise oligonucleotide alignment, regions comprising secondary structures caused by inverted repeats, extraordinarily high or low GC content, or repetitive structures make both PCA and LCR error-prone even with perfect oligonucleotides.

Figure 4 – Overview of PCA and LCA. (Left) Polymerase Cycling Assembly (PCA) synthesis procedure. Oligonucleotides with complementary regions are pooled, and repeated annealing and polymerase extension cycles assemble and amplify a full-length product. (Right) Ligase Cycling Assembly (LCA) synthesis procedure. Using a thermostable ligase, repeated annealing and ligation cycles join oligonucleotides into increasingly larger strands. A final PCR step is often employed to amplify the full-length target from the incomplete products.

1.2.4 Reducing Errors

Eliminating error in DNA products is of great importance to DNA synthesis (Carr et al. 2004; Forster and Church 2006). Mismatches can alter critical codons, remove or introduce unwanted restriction sites, or modify expression levels. Furthermore, a single insertion or deletion can render an entire protein-coding gene meaningless through a frame shift.

To address this problem, two-step assembly is often employed (Xiong et al. 2008a), in which 300–500 bp fragments are constructed (through either PCA or LCR), cloned, sequence-validated, and then used in a second round of assembly to produce the complete target. Alternative assembly strategies often employ methods to enrich error-free DNA molecules to reduce the potential for propagating errors into the final product. These methods often use mismatch-binding proteins to exclude erroneous fragments in subsequent assembly steps (Carr et al. 2004; Tian et al. 2004; Xiong et al. 2008b).

1.3 Rationale

Addressing the cost and efficiency of de novo gene synthesis led us to consider an alternative approach to current gene synthesis methods.
As an alternative to assembling genes using large and often erroneous, custom-synthesized oligonucleotides, we proposed to use oligonucleotides short enough to permit construction of a library of all possible oligonucleotides needed for gene assembly. This approach would serve both to amortize oligonucleotide synthesis costs over many genes and to potentially provide two methods of error correction. By virtue of being short, each oligonucleotide should be more accurate (Caruthers 1985) and each ligation itself effectively provides a mismatch error check (Pritchard and Southern 1997).

Short oligonucleotides are crucial to reducing the library size to a manageable set (Figure 5); however, ligation requirements make it unlikely that an exceptionally small set could be used effectively. A compromise between these competing factors is essential for this approach to be practical, so we set out to investigate the optimal oligonucleotide lengths and conditions for which iterative ligations could be achieved using T4 DNA ligase.

Figure 5 - Required library size for increasing oligonucleotide lengths.

1.3.1 Ligase Requirements

Both oligonucleotide length and sequence-dependent hybridization efficiency play a critical role in ligase efficiency and vary substantially between ligases (Cherepanov and de Vries 2003; Nilsson and Magnusson 1982). The requirement for accurate duplex DNA surrounding a nick to be sealed differs between the ligases (Figure 6). T7 DNA ligase can effectively join a hexamer and a nonamer on a complete template whereas Tth DNA Ligase is limited to an octamer and a nonamer on a complete template (Pritchard and Southern 1997). These variations suggest some ligases are better suited to joining short oligonucleotides than others (Table 2). T4 DNA ligase was selected for the present study as it is known to ligate adjacent oligonucleotides as small as hexamers completely hybridized to a complementary template (Dunn et al. 1995; Gupta et al. 1968b).
Previous studies did not, however, examine the ligation of short, single-stranded oligonucleotides in the absence of a completely hybridized template.

Figure 6 - Oligonucleotide ligation on a complete and partial template. A complete template provides adequate double-stranded DNA before and after the site of ligation, whereas a partial template, to which the oligonucleotides do not completely hybridize, has a reduced duplex surrounding the site of ligation.

Table 2 - All known minimum oligonucleotide length requirements for ligation on a complete template as previously reported in literature.

Name | Source | 3’ OH + 5’ P on a complete template | Variations/Comments | Reference
LigTK | Thermococcus kodakaraensis (Archaea) | 9 + 7 | 8 + 8 also works | (Nakatani et al. 2000)
T7 Ligase | T7 Phage | 6 + 9 | 8 with two mismatches + 9 also works | (Pritchard and Southern 1997)
Tth Ligase | Thermus thermophilus | 8 + 9 | 7 + 9 worked but at a rate 100-fold less | (Pritchard and Southern 1997)
Chlorella Virus Ligase | Chlorella Virus | 6 + 8 | - | (Odell and Shuman 1999)
T4 DNA Ligase | T4 Phage | 6 + 6 | - | (Odell and Shuman 1999)

1.3.2 Library Size

The ideal oligonucleotide library would be as small as possible while still allowing for the efficient construction of all possible gene sequences. However, based on ligation requirements and thermodynamic reasons, it may not be possible to achieve such a goal with extremely short oligonucleotides. A library of hexamers would contain all 4096 possible oligonucleotides of length six. Larger libraries, which may be more efficient in assembly reactions, become difficult to manage: even a library of octamers would require 65,536 oligonucleotides.

2 Characterization of T4 DNA Ligase and Short Oligonucleotides

2.1 Methodology for Fluorescence Experiments

A rapidly growing number of important DNA applications utilize synthetic oligonucleotides immobilized to solid supports.
In general, the purpose of a solid support-attached oligonucleotide is to capture, enrich, identify and/or purify a target. Solid support-attached oligonucleotides are most commonly used in microarrays, which provide a powerful tool for analyzing gene expression. In our experiments, we were interested in isolating and quantifying successful ligations under a variety of different oligonucleotide combinations and reaction conditions. By joining fluorescently labeled oligonucleotides to immobilized oligonucleotides, ligation efficiencies can be obtained from the fluorescence of the immobilized DNA after ligation reactions and subsequent washes to remove unligated reactants. Streptavidin-coated magnetic microbeads (Invitrogen) were selected for the solid support in these experiments as they provided both a direct method for fastening biotin-labeled oligonucleotides and a simple magnet-based purification system (Figure 7).

Figure 7 - Schematic diagram of immobilized double-stranded DNA used in ligation assays and DNA construction. M-270 Dynabeads (Invitrogen) are attached through a streptavidin-biotin linkage to the 5’ end of a double-stranded DNA. The free end is designed with a variable 5’ overhang, complementary to 5’-phosphorylated, 3’-fluorescently labeled oligonucleotides used in ligation. The solid support and double-stranded DNA together can also be referred to as a receptor during ligation assays.

2.1.1 Methodology for Bead Immobilization

Oligonucleotides, including those 5’-biotinylated, 3’-FAM6 fluorescently labeled, and 5’-phosphorylated, were synthesized by IDT. Immobilized double-stranded DNA preparation involved purification of streptavidin-coated magnetic beads, binding of the biotinylated top strand, and annealing of the complementary bottom strand. M-270 Streptavidin Dynabeads (Invitrogen) were washed three times with equal volume 2X Bind and Wash (B&W) buffer (10 mM Tris-HCl pH 7.5, 1 mM EDTA, 2.0 M NaCl).
DNA immobilization was performed by resuspension of the purified bead solution to 1X B&W buffer supplemented with 3.33 µM 5’-biotinylated oligonucleotide. After 20 min at room temperature with gentle rotation, two washes with equal volume of 1X B&W were performed to remove unbound oligonucleotide. Immobilized oligonucleotide was then hybridized to form dsDNA by resuspending the bead mixture in 10 mM Tris-HCl (pH 7.5), 0.1 M NaCl, 1 mM EDTA and 5 µM bottom strand oligonucleotide. Bead solutions were heated to 80°C for 5 min and cooled to room temperature. Final solutions were washed twice with equal volume TE 10:1 (pH 7.5) to remove excess bottom strand and quantified using a standard PicoGreen fluorescence assay (Invitrogen).

2.1.2 Methodology for Basic Ligation Assays

Ligation reactions (25 µl) contained 0.2 µM dsDNA receptor, 0.002 to 20 µM 5’-phosphorylated, 3’-FAM oligonucleotide, 20 units of T4 DNA Ligase (NEB), and 1X T4 DNA Ligase Buffer (50 mM Tris-HCl, 10 mM MgCl2, 1 mM ATP, and 10 mM dithiothreitol, pH 7.5 @ 25°C). Ligations were carried out at 16°C for 1 h and stopped by heat inactivation at 65°C for 10 min. Reactions at each oligonucleotide concentration were performed in triplicate. Control reactions with TE substituted for ligase were also performed in triplicate at each oligonucleotide concentration. Washes with equal volume TE 10:1 (pH 7.5) were performed twice to remove excess unligated labeled oligonucleotide. For each reaction, DNA was quantified in 20 µl by 480 nm fluorescence measurements using a Wallac Victor fluorometer (Perkin Elmer). Ligations at each concentration were corrected for background by subtraction of the average TE control fluorescence levels. Fluorescence units observed were converted to moles based upon the 480 nm fluorescence measurements of the pure labeled oligonucleotide at known molar concentrations.
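The background correction and unit conversion described in Section 2.1.2 can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes that the fluorescence of the pure labeled oligonucleotide is linear in amount and passes through zero, and all function names and numbers are hypothetical rather than taken from the experiments.

```python
from statistics import mean

def fit_fu_per_pmol(standard_pmol, standard_fu):
    """Least-squares slope of a line through the origin (FU = slope * pmol).

    Assumes a linear fluorescence response over the standards used.
    """
    numerator = sum(p * f for p, f in zip(standard_pmol, standard_fu))
    denominator = sum(p * p for p in standard_pmol)
    return numerator / denominator

def ligated_pmol(ligase_fu, control_fu, fu_per_pmol):
    """Subtract the mean no-ligase (TE) control from the mean of the
    triplicate ligase reactions, then convert fluorescence units to pmol."""
    corrected_fu = mean(ligase_fu) - mean(control_fu)
    return corrected_fu / fu_per_pmol

# Hypothetical standards: known amounts of pure labeled oligonucleotide.
slope = fit_fu_per_pmol([1.0, 2.0, 4.0], [100.0, 200.0, 400.0])

# Hypothetical triplicate readings for one oligonucleotide concentration.
amount = ligated_pmol([520.0, 500.0, 510.0], [10.0, 12.0, 8.0], slope)
print(f"{amount:.2f} pmol ligated")
```

Fitting the standards through the origin keeps the conversion to a single factor; a calibration with a non-zero intercept would need an ordinary linear fit instead.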
2.1.3 Methodology for Ligation Assays With a Supplementary, Second Oligonucleotide

Ligations were performed using the same procedure as the single oligonucleotide reactions above with the addition of either 5 µl TE or 5 µl of a 20 µM second oligonucleotide, and a modified reaction time. The final 30 µl reaction mixture contained 0.167 µM DNA receptor, 0.00167 µM to 16.7 µM 5’-phosphorylated, 3’-FAM oligonucleotide, 3.33 µM second oligonucleotide, 20 units T4 DNA Ligase, and 1X T4 DNA Ligase Buffer.

2.1.4 Methodology for Hexamer Ligation Assays with Serial Concentrations of PEG

Bead and dsDNA preparations were performed as previously described. Reactions (25 µl) consisted of 10 µl 0.4 pmol/µl bead containing the three nucleotide overhang, 5 µl of variable PEG concentrations (such that the final PEG concentration varied from 0 to 20%), 2 µM 5’-phosphorylated 3’-FAM hexamer, 4 µM rescue hexamer, 20 units T4 DNA Ligase, and 1X T4 DNA Ligase Buffer. All reactions were performed in triplicate for 16 h at 4°C. After 10 min enzyme deactivation, samples were washed twice with 25 µl TE (10:1) and 20 µl of each was quantified using the Wallac Victor fluorometer.

2.1.5 Methodology for Characterizing Rescue Assays with Serial Concentrations of the Supplementary Oligonucleotide

Bead and dsDNA preparations were performed as previously described. Reactions (30 µl) consisted of 10 µl 0.5 pmol/µl bead containing the three nucleotide overhang, 8.33 µM 5’-phosphorylated 3’-FAM oligonucleotide, a variable concentration of the rescue hexamer, 20 units T4 DNA Ligase, and 1X T4 DNA Ligase Buffer. All reactions were performed in triplicate for 16 h at 4°C followed by 10 min at 65°C to deactivate the ligase. Samples were washed twice with 30 µl TE (10:1) and 20 µl of each was quantified with the Wallac Victor fluorometer.

2.1.6 Methodology for Ligation Assays with Alternative Rescue Oligonucleotides

Bead and dsDNA preparations were performed as previously described.
Reactions (30 µl) consisted of 10 µl 0.5 pmol/µl bead containing the three nucleotide overhang, 16.66 µM 5’-phosphorylated 3’-FAM hexamer, 3.33 µM rescue oligonucleotide, 20 units T4 DNA Ligase, and 1X T4 DNA Ligase Buffer. All reactions were performed in triplicate for 16 h at 4°C followed by 10 min at 65°C to deactivate the ligase. Samples were washed twice with 30 µl TE (10:1) and 20 µl of each was quantified with the Wallac Victor fluorometer.

2.2 Fluorescence Experiments

2.2.1 Minimal Oligonucleotide Lengths for Ligation

I investigated the addition of a labeled oligonucleotide which, when annealed to its dsDNA counterpart, extended the double-stranded duplex by 3 to 5 bp. Ligation for 1 h at 16°C showed saturation on overhangs of 5 bp or more; however, reactions with overhangs of fewer than 4 bp showed no detectable ligation (Figure 8).

Figure 8 - Observed ligation of a labeled oligonucleotide with an immobilized dsDNA containing a 3, 4, or 5 nucleotide 5’ overhang.

Additional experiments with temperature variations of 4, 8, and 12°C and a time extension to 16 h showed relatively little improvement in ligation efficiency of the 4 bp overhang experiment. These data show that T4 DNA ligase has an essential requirement for a DNA duplex of at least 5 bp downstream of the 5’ phosphate for efficient sealing. This requirement suggests that decamers would be necessary for efficient iterative assemblies. Thus, without further improvements, gene assembly would require a library of all possible decamers (a size of 1,048,576 oligonucleotides).

2.2.2 Improvement of 4 bp Ligations with a Second Oligonucleotide

To determine if the shorter oligonucleotide reactions could be improved in an effort to reduce the potential library size, the above reactions were repeated with the addition of a second oligonucleotide complementary to the six base linker of the oligonucleotide to be ligated.
It was expected that correct hybridization of the three components (the dsDNA substrate, the 5’-phosphorylated 3’-FAM-6 oligonucleotide, and the second oligonucleotide, complementary to the labeled oligonucleotide) would produce two dsDNA molecules with complementary overhangs (Figure 9a). In this manner, ligation efficiency for a 4 bp overhang reaction was markedly enhanced. Shorter oligonucleotides were then tested to determine the lower limit of oligonucleotide length for ligation. A pair of octamers, producing a 4 bp duplex with 4 nucleotide 5’ overhangs on both sides, was successfully ligated (Figure 9b). Thus, by utilizing a second oligonucleotide to extend the surrounding duplex, ligation of an oligonucleotide shorter than a decamer can be achieved. Such an arrangement reduces the necessary library size to 65,536 oligonucleotides and would allow for the iterative assembly of octamers by successive 4 bp increments.

Figure 9 - (a) Unsuccessful 4 bp duplex reactions could be salvaged by utilizing a supplementary oligonucleotide, designed to complement the first oligonucleotide. A 2 h ligation of the 4 bp reaction at 16°C supplemented with 3.33 µM hexamer shows successful ligation (■) while reactions without the supplementary hexamer show no activity (♦). (b) Ligation reaction of an octamer supplemented with a second octamer in which one is used for ligation and the other is used to extend the duplex. A 2 h ligation at 16°C of increasing concentrations of the octamer with 3.33 µM of the supplementary octamer shows significant ligation (■) compared to reactions without the supplementary octamer (♦).

2.2.3 Improvement of 3 bp Ligations with a Second Oligonucleotide

With the success of octamers, a further reduction to 3 bp was examined. As expected, a 3 bp overhang reaction could also be enhanced with the addition of a second oligonucleotide (Figure 10a). A pair of hexamers was then examined.
When hybridized, the hexamer pair produced a 3 bp duplex with 3 nucleotide overhangs on both sides. Such an arrangement would allow for the iterative assembly of DNA in 3 bp increments. Unfortunately, no ligation was observed (Figure 10b). Modifying the ligation conditions to 4°C and greater than 16 h incubation time provided a modest improvement (Figure 11). It has been shown that hexamer ligations are feasible, but results are inconsistent and reactions must be performed under more demanding conditions (Dunn et al. 1995). For the purposes of gene assembly, iterative assemblies requiring 16 h per hexamer would not be practical. We suspect that the lack of efficient ligation is a collective consequence of several factors: a requirement of T4 DNA ligase to join three strands of minimal length, a reduced Tm of the shorter annealing sequences, and limited contacts between T4 DNA Ligase and its substrates. We cannot exclude the possibility that the fluorescently labeled end of the hexamer may interfere with oligonucleotide entry into the catalytic pocket of T4 DNA Ligase.

Figure 10 - Unsuccessful 3 bp duplex reactions can be salvaged by utilizing a supplementary oligonucleotide. (a) A 2 h ligation of the 3 bp reaction at 16°C with 3.33 µM supplementary hexamer shows successful ligation (■) while reactions without the supplementary hexamer show no activity (♦). (b) A 2 h ligation at 16°C with a hexamer pair shows limited improvement (■) compared to the unsupplemented hexamer (♦).

Figure 11 - Improved ligation for a hexamer pair with extended reaction time and reduced temperature. A 16 h ligation at 4°C of a hexamer pair shows improved ligation (■) compared to the unsupplemented control (♦).

2.2.4 Alterations in Oligonucleotide Arrangements Improve Ligation Efficiency

In order to investigate the inefficient ligation of hexamers, two alternative approaches were investigated.
First, a 3 bp overhang with a heptamer was used to determine whether efficient ligation was limited by the ability of the two free oligonucleotides to anneal or of the immobilized dsDNA overhang to anneal to the incoming pair (Figure 12). Although the use of heptamers results in a tiling arrangement of alternating 3 and 4 nucleotide frames which may not be ideal when a second rescue oligonucleotide must also be used, their use would reduce the number of oligonucleotides in the library 4-fold from that of an octamer library.

Figure 12 - Ligation of a labeled heptamer using a supplementary hexamer. A 16 h ligation of the 3 bp dsDNA and heptamer at 4°C with 3.33 µM supplementary hexamer shows successful ligation (■) while reactions without the supplementary hexamer show little activity (♦).

In an effort to improve the ligation of a hexamer pair, ligations were performed in which the fluorescently labeled hexamer annealed with the second complementary hexamer to produce a duplex of 4 base pairs and 5’ overhangs of only 2 nucleotides. By shifting the location of the second (bottom strand) hexamer, the pair’s number of hybridized nucleotides could be increased from three to four without altering the lengths of the two hexamers (Figure 13). It was expected that this arrangement would lead to an increased stability in the hybridized pair, thus improving ligation kinetics. Unfortunately, no significant improvement was observed compared to the standard three nucleotide overhang procedure (Figure 13). This may be either a result of the reduction to an immobilized overhang of only 2 nucleotides or a result of the fact that both arrangements, regardless of organization, still leave only 6 nucleotides from the site of ligation to the end of the oligonucleotides.

Figure 13 – Alternative arrangements of hexamers show some improvement to ligation.
A 16 h, 4°C ligation of a 2 nucleotide overhang with a complementary hexamer (■), a complementary hexamer and rescue hexamer which form a 4 bp duplex pair ( ), a 3 nucleotide overhang with a complementary hexamer ( ) and a complementary hexamer and a rescue hexamer which form a 3 bp duplex pair ( ).

2.2.5 Optimal Supplementary Oligonucleotide Concentrations are Dependent on the Ligation Kinetics of the First Oligonucleotide

To determine to what extent the concentration of the second oligonucleotide has an effect on ligation of the first, experiments were performed in which the first oligonucleotide concentration was held constant while the second varied (the reverse of the previously performed experiments). For octamer reactions, which already show good efficiency, the concentration of the second oligonucleotide made little difference to ligation once a 1:1 ratio had been exceeded (Figure 14a). For the 3 bp hexamer ligation supplemented with a rescue hexamer, however, increasing concentrations of the second oligonucleotide beyond a 1:1 ratio increased ligation efficiency (Figure 14b). This correlation suggests that the greater concentration of the second oligonucleotide increases the number of available duplexes and thus increases the likelihood that ligase successfully binds a complete duplex and joins the labeled oligonucleotide to the immobilized dsDNA.

Figure 14 – Ligation efficiencies with variable concentrations of the supplementary oligonucleotide. (a) 8.33 µM labeled oligonucleotide supplemented with serial concentrations of a complementary hexamer. (b) 8.33 µM labeled hexamer supplemented with serial concentrations of a complementary hexamer. Both reactions were performed for 16 h at 4°C and all values were corrected by subtracting a no-ligase control of equal labeled oligonucleotide concentration.
2.2.6 Improved Ligation Rates in the Presence of PEG

I evaluated supplementation with polyethylene glycol (PEG) in hexamer ligations in an effort to improve reaction rates. PEG increases the effective concentration of the oligonucleotides and enzyme (Zimmerman and Pheiffer 1983). Intermolecular ligation is dependent on the concentration of DNA while the rate of intramolecular ligation is concentration independent (Zimmerman 1984). Moreover, the addition of PEG aids in the suspension of the magnetic beads in solution for longer periods of time. As expected, hexamer/hexamer ligation showed improvement with increasing concentrations of PEG (Figure 15).

Figure 15 – Improvement of hexamer ligation efficiency with PEG. A 16 h, 4°C ligation of the 3 nucleotide overhang with a labeled hexamer and supplementary hexamer at varying concentrations of PEG (■). The background fluorescence observed without ligase for the same reaction conditions (♦).

2.2.7 Hairpin Oligonucleotides can Improve Hexamer Ligation

In an attempt to further characterize hexamer ligation and potentially improve our understanding of its limitations, oligonucleotides were compared to determine their effectiveness as supplementary oligonucleotides. Specifically, a unique hairpin oligonucleotide was designed which, when combined with the labeled hexamer and the target immobilized 3 bp overhang, would enclose the labeled hexamer with double-stranded DNA on both sides (Figure 16). It was expected that such an organization would result in increased base stacking, hybridization, and T4 DNA ligase affinity for the DNA complex by reducing its ability to slide off a free end of dsDNA. Should a hexamer assembly strategy be adopted, a universal hairpin (one with degenerate bases) could be used to supplement any ligation and improve the efficiency.

Figure 16 - Variations to the supplementary oligonucleotide used offer different ligation potentials.
A hairpin supplementary oligonucleotide effectively increases the duplex far beyond just the free overhang of the first oligonucleotide and as a consequence shows a much higher ligation efficiency.

2.3 Methodology for Pooled Oligonucleotide Experiments

2.3.1 Methodology for Octamer and Hexamer Pools

Pool ligation reactions of 20 µl contained 10 µM of each 5’-phosphorylated oligonucleotide, 10 units of T4 DNA Ligase (NEB), and 1X T4 DNA Ligase Buffer. Ligations were carried out at 0°C for 4 h (unless otherwise stated) and stopped by heat inactivation at 65°C for 10 min. After incubation, all products were supplemented with 6X loading dye and run on a 20% polyacrylamide gel (20% 19:1 Acryl-Bis, 1X TAE, 0.1% APS) for 2 h at 200 V, and visualized by SYBRGreen staining.

Additional hexamer experiments were performed as outlined above but with a modified reaction time of 18 h at 0°C, a 10-fold increase in final oligonucleotide concentration, and with the addition of PEG up to a 10% final volume.

2.4 Pooled Oligonucleotide Experiments

Fluorescence experiments demonstrated the feasibility of octamer ligation and provided valuable insight for further experiments. Firstly, they demonstrated that each oligonucleotide to be ligated must be accompanied by a supplementary oligonucleotide. Secondly, they established that the magnetic beads were capable of supporting ligation and wash procedures. Lastly, they brought attention to a major concern of the assembly process: the addition of octamers one at a time at a rate of 4 bp per ligation is an extremely slow process.

For a quicker assembly procedure, ligations of multiple oligonucleotides in a single reaction were examined (Figure 17). In the case of a pool of octamers, each downstream octamer acts as a supplementary oligonucleotide for the previous with the exception of the last position.

Figure 17 - Ligation of multiple oligonucleotides.
(Left) The ligation of a single octamer requires the presence of a second supplementary oligonucleotide. (Right) If multiple octamers are combined, each can act to supplement the previous in the series (with the exception of the last).

We expected that the fidelity of T4 DNA Ligase would preferentially select against mismatched oligonucleotides, permitting multiple ligations to be performed correctly in one reaction so long as only one unique organization of the oligonucleotides is possible. (Assemblies of unique sets of oligonucleotides are referred to as subunits. These subunits can then be used to assemble larger constructs.)

2.4.1 Octamers

Three pools of octamers, each with only one possible organization of the included octamers, were tested for ligation. Correctly sized bands were distinctly visible for all of the pools (Figure 18). As anticipated, faint bands above and below the target bands suggest that mismatched and incomplete ligations also occurred to some degree.

Figure 18 - Ligation of pools of oligonucleotides. Pooled ligation of hexamers and octamers. The expected sized products from left to right were: 18 bp (6 hexamers), 18 bp (6 hexamers), 33 bp (11 hexamers), 32 bp (8 octamers), 24 bp (6 octamers), and 44 bp (11 octamers).

2.4.2 Hexamers

The same process was performed with unique sets of hexamers, and DNA quantification confirmed that each hexamer pool contained the expected concentration of oligonucleotides; however, no bands were observed (Figure 18). Further experiments revealed that a 10-fold increase in hexamer concentration and the addition of PEG were necessary to observe a product and that some hexamer combinations would not ligate even with these adjustments (Figure 19). These findings are consistent with our previous fluorescence results and suggest that hexamer assembly is, at best, limited in practice.

Figure 19 - Hexamer pooling experiments with increasing concentrations of PEG.
Nine uniquely organized hexamers showed visible ligation after 18 h at 0°C when supplemented with PEG. The same conditions were applied to a different set of eight uniquely organized hexamers; however, no product was observed.

2.4.3 Dephosphorylated Oligonucleotides can be Used in Pooled Assemblies

For our experiments, the average price of an octamer from IDT was CA$8.00 at the 1 µmol synthesis scale (~500 nmol average yield); however, the price for phosphorylation of this octamer was CA$50.00. This is a serious concern for a cost-effective assembly procedure. An alternative procedure is to use pools of dephosphorylated oligonucleotides and phosphorylate them either prior to or during the ligation reaction. Reactions containing both T4 DNA ligase and T4 Polynucleotide Kinase (PNK) were performed on a variety of oligonucleotides. Dephosphorylated octamers showed little reduction in ligation compared to their phosphorylated counterparts, suggesting that this is a cost-effective alternative to ordering phosphorylated oligonucleotides (Figure 20).

Figure 20 - Dephosphorylated octamers can be used in pooled ligations when supplemented with PNK. 2 h ligations at both 16°C and 25°C show successful pooled assembly. An alternative experiment using dephosphorylated heptamers, however, did not produce products of the expected size.

2.5 Methodology for DNA Construction Experiments

2.5.1 Experimental Methodology for Pair-wise Assembly

Pooled ligation reactions consisted of 1.5 µM immobilized dsDNA on beads, 66.7 µM of each octamer, 1X T4 DNA ligase buffer and 0.5 units/µl of T4 DNA Ligase. Reactions proceeded for 4 h at 4°C and mixtures were then washed twice with equal volume of TE to remove unligated product and enzyme. In selected bead sets, this process was performed twice using the same conditions but with the octamers split into two groups to avoid a region of repeated sequence.
Digestion was performed by resuspending the bead solutions in 25 units BbsI (NEB), 50 mM NaCl, 10 mM Tris-HCl, 10 mM MgCl2, and 1 mM dithiothreitol (pH 7.9 @ 25°C). Digestion was performed for 3 h at 37°C followed by enzyme inactivation at 65°C for 20 min. Released DNA fragments were isolated by immediate aspiration from the hot digest mixture while a magnet was applied. The extracted mixture was cooled to 4°C for 5 min and the full volume was used in subsequent ligations. Pair-wise ligation steps were performed by resuspending an immobilized bead solution with an adjacent digested fragment solution. Ligation reactions consisted of 1.5 µM immobilized DNA, the released DNA fragment (unknown concentration), 1X T4 DNA ligase buffer and 0.5 units/µl of T4 DNA Ligase. Reactions proceeded for 4 h at 4°C and mixtures were then washed twice with equal volume of TE to remove unligated product and enzyme. Digest and ligation steps were repeated as necessary to complete the pair-wise construction process.

DNA amplification was performed using 2 µl of the final bead solution and Platinum PCR SuperMix (Invitrogen) according to manufacturer’s directions. The forward (5’-GCAGTTCCGGATCCATCTAGA-3’) and reverse (5’-TGCCAGATTTTCTCCATGTCGT-3’) primers were designed to match a segment of the bead adaptor and the end of the target construct, respectively. PCR amplification was performed on a DNA Engine Dyad 2 Peltier Thermal Cycler (BIO-RAD) with parameters of 25 cycles at 96°C for 10 s (with an additional 2 min for the first cycle), 53°C for 5 s and 60°C for 1 min, followed by incubation at 4°C. PCR products were analyzed by PAGE (20% 19:1 Acryl-Bis, 1X TAE, 0.1% APS) and visualized by SYBRGreen staining. A standard polyacrylamide gel elution protocol was used to resuspend the PCR product in TE (10:1 pH 7.5). Gel-purified PCR products were cloned into a Zero-Blunt TOPO kit vector (Invitrogen) following the manufacturer’s instructions. 25 µl electrocompetent DH10-B E.
coli cells (Invitrogen) were transformed with 2 µl of the vector solution. PCR-amplified pUC19 fragment and TE were cloned and transformed as positive and negative controls, respectively. Upon electroporation, 200 µl room-temperature S.O.C. medium was immediately added and solutions were incubated for one hour at 37°C. Transformation mixtures were spread on 2xYT agar plates containing 50 µg/ml kanamycin and incubated at 37°C for 16 hours. Bacterial colonies were picked and transferred to a 96-well culture plate (Beckman Coulter) containing 1.2 ml 50 µg/ml kanamycin 2xYT. Plates were incubated at 37°C for 14 h. Preparation of 5% glycerol stocks, DNA purification, and sequencing were performed as previously described (Yang et al. 2005). Sequencing primers used were T7 (5’-AATACGACTCACTATAGG-3’) and a modified Sp6 primer (5’-ATTTAGGTGACACTATAGAATAC-3’) unique to the pCR-Blunt II-TOPO vector. Thermal cycling was performed on a DNA Engine Tetrad 2 Peltier Thermal Cycler (BIO-RAD) with parameters of 35 cycles at 96°C for 10 s, 48°C for 5 s and 60°C for 3 min, followed by incubation at 4°C. All successful sequence reads were analyzed using the ClustalX sequence-alignment tool (Chenna et al. 2003), and completed products were confirmed by direct visualization of electropherograms.

2.5.2 Experimental Methodology for Serial Assembly

Immobilized dsDNA (450 µl) was purified and quantified using PicoGreen, and was found to be 0.213 pmol/µl. Aliquots were made, each with 150 µl (32 pmol), and three distinct variations of the assembly procedure were performed: a low oligonucleotide concentration, a low oligonucleotide concentration supplemented with PEG, and a high oligonucleotide concentration.
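The amounts in Section 2.5.2 can be checked with a short calculation. The sketch below reproduces the 32 pmol of dsDNA per aliquot and, using the 60 µl reaction volume and the 5 µM and 16.667 µM octamer concentrations stated for the first pooled ligation in this section, estimates the molar excess of each octamer over the immobilized dsDNA. The helper name is hypothetical, and the excess figures are derived here for illustration rather than reported in the text.

```python
def pmol_from(conc_uM: float, volume_ul: float) -> float:
    """Amount in pmol; 1 µM is numerically equal to 1 pmol/µl."""
    return conc_uM * volume_ul

# Immobilized dsDNA per aliquot: 0.213 pmol/µl x 150 µl.
dsdna_pmol = pmol_from(0.213, 150)

# Each octamer in a 60 µl pooled ligation (low and high concentration pools).
octamer_low_pmol = pmol_from(5.0, 60)
octamer_high_pmol = pmol_from(16.667, 60)

excess_low = octamer_low_pmol / dsdna_pmol
excess_high = octamer_high_pmol / dsdna_pmol

print(f"dsDNA per aliquot: {dsdna_pmol:.2f} pmol")
print(f"octamer excess, low pool:  {excess_low:.1f}-fold")
print(f"octamer excess, high pool: {excess_high:.1f}-fold")
```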
The first pooled ligation consisted of 32 pmol of immobilized dsDNA resuspended to a 60 µl reaction volume containing either 5 µM (for low concentration pools) or 16.667 µM (for high concentration pools) of each octamer (12 in total), 1.66 units/µl T4 DNA Ligase, and 1X T4 DNA ligase buffer. PEG-supplemented experiments contained 7.5% final PEG concentration. Reactions proceeded for 3 h at room temperature and mixtures were then washed twice with equal volume of TE to remove unligated product and enzyme. Bead solutions were then resuspended for the second serial ligation.

The second pooled ligation was performed according to the same procedure as the first, but using the next thirteen octamers in the series. The third ligation was performed as the others, but using the final ten octamers.

Samples (2 µl) of each of the three final bead solutions were amplified using Platinum PCR SuperMix (Invitrogen) according to manufacturer’s directions using two primer combinations. Both used a forward primer (5’-GCAGTTCCGGATCCATCTAGA-3’) identical to a region in the adaptor DNA and the same as that used in the hierarchical assembly. Two different reverse primers were examined: the first was complementary to a region internal to the expected final construct, while the second was identical to the reverse primer used in hierarchical assembly and was expected to produce the complete product. PCR amplification was performed on a DNA Engine Dyad 2 Peltier Thermal Cycler (BIO-RAD) with parameters of 25 cycles at 96°C for 10 s (with an additional 2 min for the first cycle), 53°C for 5 s and 60°C for 1 min, followed by incubation at 4°C. PCR products were analyzed by PAGE (20% 19:1 Acryl-Bis, 1X TAE, 0.1% APS) and visualized by SYBRGreen staining. Gel elution, cloning, and sequencing were performed as previously described (Section 2.5.1). All successful sequence reads were analyzed using the ClustalX sequence-alignment tool (Chenna et al.
2003), and completed products were confirmed by direct inspection of electropherograms.

2.6 DNA Construction Experiments

From the previous experiments, we determined that the minimum overhang required to ligate an oligonucleotide to double-stranded DNA using T4 DNA Ligase was 5 nucleotides. It was found, however, that for overhangs of 4 bp, a second “supplementary” oligonucleotide that extends the double-stranded region flanking the ligation position (or nick point) greatly enhanced ligation efficiency. Three-base overhangs, however, did not ligate efficiently, suggesting that hexamers are incompatible with DNA assembly using T4 DNA Ligase and that the optimal oligonucleotide size for assembly is the octamer.

Pooling of octamers to produce intermediate-length subunits of a desired gene, followed by the assembly of these subunits into the full-length construct, offers a straightforward two-step gene synthesis procedure. In the first stage, optimal subunit boundaries are determined such that overlaps among the precursor octamers are unique and assemble into only one possible product. Subunits are assembled on solid supports (magnetic microbeads) to allow purification after ligation. In the second stage, subunits are released and ligated together to produce the complete, PCR-amplifiable product.

Two different approaches were tested to determine optimal assembly strategies. For the hierarchical assembly procedure, distinct subunits of equal length were constructed (on four distinct solid supports) and then combined through two rounds of pair-wise ligations into a complete construct. For the serial assembly procedure, only one solid support was used, but it underwent repeated pooled ligation and wash procedures such that each subunit assembly would build on the previous one until a complete construct was made.

2.6.1 Hierarchical Assembly of a 128 bp Construct

From the ligation experiments, it was concluded that DNA synthesis was feasible with octamers. 
To improve the pace of such a method, a hierarchical approach was designed in which multiple intermediate fragments could be constructed from octamers in parallel and then combined in a repeated pair-wise order (Figure 21). The solid support used to anchor growing intermediate fragments was designed such that digestion with BbsI would release any attached fragment while retaining a 4 bp overhang (Figure 21a). Released intermediates could then be used in further ligations. Four distinct bead sets were created, each with a unique 4 bp overhang (Figure 21b). The overhangs for the solid support adaptors were constructed to be complementary to evenly distributed regions of the 128 bp target such that eight octamers, overlapping in 4 bp frame shifts, would tile between each region.

Figure 21 – Hierarchical assembly of a 128 bp construct. (a) Adaptors previously used in fluorescence experiments were redesigned to include a forward primer site and a BbsI restriction site. (b) The four sets of eight octamers used to assemble the synthetic DNA. (c) An overview of the hierarchical assembly procedure used in constructing the 128 bp DNA.

In the first step, pooled ligation reactions were performed with the solid support and nine octamers. We suspected from our T4 DNA Ligase experiments that each octamer would act as a partial template for the next, with the exception of the ninth octamer, which would merely serve to allow the last octamer in the set of eight to ligate. To avoid problematic regions of non-unique complementary ends found in the octamer pools, two of the pooled ligations (C and D) were performed in two steps, avoiding the repeated region. Each of the four products from this process, A, B, C, and D, was expected to be 32 bp. In the second phase of construction, fragments B and D were detached from their solid support using BbsI and then ligated to the immobilized fragments, A and C, to produce fragments AB and CD. 
A third digest and ligation phase, identical to the second, released the fragment CD, and ligation with the immobilized AB intermediate produced ABCD (Figure 21c). PCR amplification of the final product was then performed directly from the immobilized dsDNA.

Sequencing the 170 bp target band verified a single product containing the 42 bp adaptor and 128 bp construct (Figure 22). Sequencing of a second, smaller band revealed a product missing one of the 32 bp intermediate fragments.

Figure 22 – Result from the 128 bp hierarchical assembly. (a) Gel of PCR amplification on the hierarchically assembled construct. (b) Electropherogram of the sequenced 170 bp product.

2.6.2 Serial Assembly of a 128 bp Construct

To further characterize the ability to construct DNA through pools of octamers, variations in the construction method were explored. Firstly, instead of a hierarchical, pair-wise assembly method, an inchworm assembly method was chosen, in which each subunit would be assembled and joined to a growing fragment one after another. Secondly, the number of octamers per pool was not fixed; instead, the number of octamers selected was based on how many would fit into one reaction without introducing a repeat that could cause polymerization of octamers (Figure 23a). Lastly, variations in PEG and octamer concentrations were explored to evaluate the necessary reaction conditions for assembly. Furthermore, two different sites on the construct were selected for PCR amplification to determine whether any significant loss of product was occurring after repeated ligation and wash procedures (Figure 23b).

As expected, supplementation of the reaction with 7.5% PEG and a three-fold increase in octamer concentration both showed improved assembly efficiency (Figure 23c). 
Products of the expected size (96 bp) were recovered from all three reactions at the first primer site; however, visible products from the second primer site (a 128 bp product) were only observed for the PEG-supplemented and increased-concentration reactions.

Figure 23 - Serial assembly of a 128 bp construct. (a) Three ligations with a varying number of octamers were performed on a single solid support in series. (b) PCR was performed on the final ligation product at two different end locations to determine if repeated ligations reduced the construction efficiency. (c) PCR products for each of the reverse primer locations and three possible reaction conditions show variations in assembly efficiency.

3 Algorithmic Assembly of Genes

Conceptually, assembly of genes consists of two distinct stages (Figure 24). In the first stage, oligonucleotides are grouped such that no repeat conflicts can occur, thereby ensuring that subunits produced from each group are unique and correct. In the second stage, subunits are removed from their solid support (mobilized) and combined into the final construct.

Figure 24 - Schematic representation of a two-step assembly procedure. In the first step, oligonucleotides are pooled and assembled on solid supports into unique subunits. In the second step, subunits are mobilized in some order (serially, pair-wise, or pooled) to produce a final and complete construct that can then be isolated through PCR amplification.

Although short oligonucleotides avoid many of the challenges found in longer oligonucleotides, they introduce their own challenges that must be resolved. For example, when using octamers, there are 256 palindromic octamers that can self-anneal and should be avoided. Furthermore, pools must also be designed such that no two oligonucleotides in the pool share the same overhang, as this could cause incorrect ligation. 
The latter issue drives the allocation of the octamers into pools, and the former may be dealt with by breaking the sequence across the palindrome and then combining each half after subunit assembly. In either case, there is the flexibility to choose one of eight “frames” in the design, in order to optimize the location of boundaries. Octamers are short enough to avoid issues with longer repeats that are of concern to conventional PCA assembly. With careful selection, many potential problems can be resolved. The following sections describe the basis for assembly procedures.

3.1 Sequence Tiling

A target DNA sequence can be mapped to a variety of frames to alter which oligonucleotides are used in the construction process. There are six possible tiling frames for hexamers and eight for octamers. In an ideal sequence, the selected tiling frame would be inconsequential; however, when challenging DNA regions exist, alternative tiling frames offer multiple attempts to alleviate difficulties.

3.2 Problematic Regions

3.2.1 Restriction Sites

Immobilization and release of solid-support-bound DNA is made straightforward with BbsI (or EarI for hexamers), a type IIS restriction enzyme that cleaves outside its recognition site and can therefore expose an overhang of any sequence. However, it also imposes limitations on the sequences that can be synthesized. Specifically, the use of BbsI in solid-support assembly procedures implies that the BbsI recognition site cannot also be located in the target sequence.

The frequency with which a particular restriction site occurs in any DNA depends largely on the base composition and length of the recognition site. The BbsI recognition site is six base pairs (5’-GAAGAC-3’) and so its frequency of occurrence is (1/4)^6. We would therefore expect this restriction site to occur on average every 4096 base pairs along a random DNA sequence.
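As an illustration of the screening step implied above, the following minimal Python sketch locates BbsI recognition sites in a target sequence and reproduces the 4^6 spacing estimate. The function names are hypothetical, and scanning the reverse-complement orientation as well as the forward one is an assumption of this sketch (the site is not palindromic, so it can occur in either orientation); it is not the thesis's own code.

```python
# Hypothetical helper names; a sketch, not the thesis's Appendix C code.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_sites(seq, site="GAAGAC"):
    """Return 0-based start positions where `site` occurs on either strand.

    Assumption: a recognition site in either orientation would be cut,
    so the reverse complement (GTCTTC for BbsI) is searched as well.
    """
    seq = seq.upper()
    targets = {site, reverse_complement(site)}
    k = len(site)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] in targets]


def expected_spacing(site_length, alphabet_size=4):
    """Mean spacing of one fixed site in a uniformly random sequence."""
    return alphabet_size ** site_length


if __name__ == "__main__":
    print(find_sites("TTGAAGACTTAAGTCTTCAA"))
    print(expected_spacing(6))
```

A sequence for which `find_sites` returns an empty list needs no special handling; otherwise the positions returned mark where the target would have to be partitioned or recoded.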
Although codon changes can alleviate this problem, if an exact nucleotide sequence with this restriction site is desired, a few possible solutions exist. Firstly, if a pair-wise hierarchical assembly procedure is used, subunits can be partitioned at the BbsI site such that only the final pair-wise assembly step introduces the sequence, since no further BbsI digestions are required after this step. This solution, however, makes it possible to add only one BbsI restriction site. A more robust alternative is to join multiple subunits in a single step. By designing subunits to each have a unique sticky end, multiple subunits can be joined in a single step, and thus multiple BbsI sites can be introduced if necessary. Assembly of partial constructs can also be accomplished through fusion PCR should these BbsI avoidance solutions not be adequate. Another possibility is to use an alternative restriction enzyme for the bead adaptors. BbsI was selected because of its ability to produce a 4 nucleotide overhang of any sequence; however, other restriction enzymes are available which fit the same requirements. Unfortunately, this solution requires that an alternative set of adaptors be constructed for the desired sequence.

Finally, oligonucleotides with thiol-modified ends can be chemically attached to and released from solid supports with the use of various binding and releasing agents rather than with site-specific enzymes.

3.2.2 Complete Palindromes

Oligonucleotides that are complete palindromes ligate inefficiently. As an example, consider the addition of the octamer 5’-AAAATTTT-3’ to a growing DNA strand with an overhang of 5’-TTTT-3’. Although a correct organization between sticky ends exists, the oligonucleotide is far more likely to self-anneal and effectively remove itself from the reaction (Figure 25b).

Figure 25 - Common problems encountered during subunit assembly of pooled oligonucleotides. 
(A) Repeated oligonucleotide ends do not allow for a unique order of oligonucleotides, resulting in a series of incorrect subunits. (B) Palindromes prevent a proper tiling arrangement and reduce effective subunit assembly. (C) Palindromic 4 bp oligonucleotide ends cannot be pooled correctly as they can pair with themselves to produce incorrect subunits.

For an even-length oligonucleotide, the number of perfect palindromes can be found as n^(k/2), where n is the alphabet size (four in the case of nucleotides) and k is the length of the palindrome. Of the 65536 total possible octamers, only 256 are perfect palindromes; of the 4096 hexamers, only 64 are palindromes.

Palindromes can be avoided by using an alternative tiling frame or altering codons (the latter, however, is not applicable to exact nucleotide sequences). Unfortunately, selecting an alternative tiling frame is not always possible, as a frame which avoids one palindrome may often introduce new palindromes elsewhere. In such cases, a possible solution may be to introduce a pair of unusual-length oligonucleotides into the construction process. For instance, after the first palindrome is avoided, the use of a pair of nonamers would shift the tiling frame over by +1, avoiding the next octamer palindrome. Some palindromes, however, may not be an issue if longer ligation times are used to increase the chance of their correct assembly.

3.2.3 Repeats in Oligonucleotide Pools

To improve the pace of DNA assembly, multiple uniquely annealing oligonucleotides can be combined in one step. In such a situation, each added oligonucleotide acts to rescue the one before it and improve its ligation (with the exception of the last). One can expect that these pooled ligations will produce a unique sequence so long as there is only one possible organization of the oligonucleotides. If an overhang is repeated in the pool, then polymerizations of indefinite lengths can occur (Figure 25a). 
To avoid such conflicts, pools must be designed as unique sets and, realistically, must not be too large because the chance of mismatched ligation increases with pool size.

The problem itself is similar in theory to the birthday paradox, in which one asks for the probability that, in a set of randomly chosen people, some pair of them will have the same birthday. A more relevant version of this problem is to determine the expected number of people needed before any two people in the group share the same birthday. In our case, we are interested in determining the average size of a pool of octamers possible before a repeated four nucleotide end occurs. This is analogous to the birthday problem where octamers represent the individuals and the four nucleotide ends represent their birthdays.

The problem is encountered in numerous hashing algorithms. If one samples uniformly, with replacement, from a population of size M, the number of trials required for the first repeated sampling of some individual has the expected value Q(M) (Flajolet et al. 1995).

With M = 256, the total number of unique 4 bp endings to avoid repeating, the average number of octamers required to find a pair with the same 4 bp beginning or ending is 19. In the theoretical best case, M+1 = 257 octamers would be needed before a collision occurred; at worst, a collision would occur between two octamers. On average, however, 19 are required. This number was obtained both by mathematical approximation of Q(M) and algorithmically using repeated trials of randomly generated octamers (Appendix C: Source Code).

Unfortunately, repeats are not limited to oligonucleotide pools. When a subunit is mobilized for pair-wise ligation, one must also ensure that the now free end is not complementary to the other end. In such situations, polymerization of large intermediate fragments can occur and the only solution is to select a different organization of subunits.
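The collision estimate above can be explored numerically. The sketch below is not the Appendix C source code; it assumes that Q(M) denotes the Ramanujan Q-function, Q(M) = 1 + (M-1)/M + (M-1)(M-2)/M^2 + ..., and it simplifies each octamer to a single uniformly drawn 4 bp end. Under that definition the mean number of draws up to and including the first repeat is 1 + Q(M), and for M = 256 the sum evaluates to a little under 20. All function names are hypothetical.

```python
import random


def ramanujan_q(m):
    """Evaluate Q(m) = sum over k >= 1 of m! / ((m - k)! * m**k), term by term."""
    total, term = 0.0, 1.0
    for k in range(1, m + 1):
        term *= (m - k + 1) / m  # multiply in the k-th factor (m - k + 1) / m
        total += term
    return total


def draws_until_repeat(m, rng):
    """Draw uniformly from m values; return the draw count at the first repeat."""
    seen = set()
    while True:
        value = rng.randrange(m)
        if value in seen:
            return len(seen) + 1
        seen.add(value)


def simulate(m, trials=20000, seed=0):
    """Monte Carlo estimate of the mean number of draws until the first repeat."""
    rng = random.Random(seed)
    return sum(draws_until_repeat(m, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    m = 4 ** 4  # 256 possible 4 bp ends
    print(ramanujan_q(m), simulate(m))
```

The simulated mean should track 1 + Q(M) closely; whether the pool size is counted before or after the colliding octamer accounts for a difference of one between the two quantities.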
3.2.4 Palindromic 4 bp Ends

A special case of repeated oligonucleotide ends, and a far more serious concern, is when an octamer’s 4 bp ending (or beginning) is capable of self-annealing to form a duplex that still matches the tiling organization (Figure 25c). In this situation, when the oligonucleotide is introduced, either the next expected oligonucleotide or a second copy of the problematic oligonucleotide can bind. If the second copy is introduced, an erroneous subunit will be constructed in which octamers are correctly tiled until the problematic region but, after this point, the same octamers will be tiled in reverse order and the remaining region to be constructed will be absent.

There are a total of 4^2 (or 16) 4 bp sequences that are palindromes. Because any 4 bp sequence attached to one of these 16 palindromes is of concern, there is a total of 16 × 256 (or 4096) octamers in which the first four bases are a palindrome and 4096 in which the last four bases are a palindrome.

Because all octamers joined require a supplementary oligonucleotide and because these problematic oligonucleotides can effectively self-anneal, it is not possible to simply divide these problematic regions between subunits.

To avoid these regions, modified oligonucleotides must be used. If one desires to maintain the tiling frame, a unique 16 nucleotide top and bottom oligonucleotide can be incorporated which effectively skips the problem 4 bp region and continues in the same frame afterwards. A second solution is to use a set of nonamers to skip the problem region, but this causes a shift in the tiling frame. The remaining sequence must then be organized into octamers based on this new frame.

3.2.5 Single Polynucleotide Stretches

Common elements in natural DNA often include long stretches of repeated single nucleotides or short polynucleotides. 
A polynucleotide stretch of twelve or more of the same nucleotide cannot be assembled using an octamer library because, even in the best tiling frame, polymerization will occur due to the unavoidable repeated ends. Dividing the sequence into two subunits at the problematic stretch can alleviate this problem for stretches of approximately twelve nucleotides. For longer stretches, more complex solutions include: extending the library set to include an additional set of larger oligonucleotides for common polynucleotide stretches; utilizing custom oligonucleotides; or growing the sequence in a serial inchworm process by combining phosphorylated and dephosphorylated oligonucleotide ligation steps to prevent polymerization of oligonucleotides.

3.3 Conflict Avoidance Strategies

To illustrate the assembly procedures used to produce optimal DNA sequences while avoiding problematic regions, sample genes were examined for various assembly difficulties and solutions.

For these example genes, only one tiling frame for the hexamer and octamer assemblies was examined (Table 3). Thus, a different set of unique hexamers or octamers could be found by altering the frame. The tiling frame should be chosen to optimally reduce the number of problematic regions and simplify the assembly procedure.

Table 3 - Comparison of the number of hexamers and octamers required to construct a sample of genes. Unique oligonucleotides represent how many of these oligonucleotides are not repeated in the standard tiling frame.

Gene    Hexamers Required  Unique Hexamers  Octamers Required  Unique Octamers
gfp     237                230              177                176
bla     286                278              214                214
actb    375                343              281                279
ras     690                636              517                517
gapdh   335                303              251                251
p53     393                265              294                293
h2ax    143                129              107                106
taq     1008               614              755                725
cre     343                321              257                256

3.3.1 Restriction Sites

Prior to assembly, locations of the necessary restriction sites must be identified and dealt with. 
While their frequency is expected to be low, they cannot be avoided for exact nucleotide sequences and so alternative strategies must be used.

Table 4 – Number of occurrences of EarI and BbsI in various nucleotide sequences.

Gene    Number of EarI sites (Hexamer issue)  Number of BbsI sites (Octamer issue)
gfp     0                                     0
bla     0                                     0
actb    2                                     1
ras     3                                     0
gapdh   0                                     2
p53     1                                     2
h2ax    0                                     2
taq     5                                     5
cre     0                                     0

3.3.2 Palindrome Avoidance

Using the same standard assembly frame as that for Table 3, numerous palindromes arise at both the hexamer and octamer tiling level (Table 5), although, by virtue of the greater number of possible sequences, there are fewer palindromes for octamers (a frequency of 0.0156 for hexamers versus 0.00391 for octamers).

Table 5 - The number of palindromes that occur in the first tiling frame and the number of palindromes found in the frame that contains the least number of palindromes.

        Palindrome Hexamers              Palindrome Octamers
Gene    Frame 1  Optimal Frame           Frame 1  Optimal Frame
                 (Best of six)                    (Best of eight)
gfp     5        2                       1        0
bla     3        3                       0        0
actb    7        6                       3        1
ras     11       11                      8        1
gapdh   6        4                       2        0
p53     8        7                       1        0
h2ax    0        0                       2        0
taq     15       13                      5        1
cre     6        3                       1        0

By examining each possible tiling, an optimal frame can be selected in which the number of problematic oligonucleotides is minimized. For an amino acid sequence, exact palindromes can easily be avoided with minor codon changes.

A far more serious problem is that of 4 bp oligonucleotide ends which are palindromes. Because these oligonucleotides can self-anneal and still maintain the correct tiling frame, they must be avoided. Utilizing a different tiling frame is the simplest solution; however, due to their frequency of occurrence, nonamers or 16mer pairs must be introduced to skip over these regions when there is no possible alternative.
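The frame comparison described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: only one strand is tiled (the complementary strand, offset by 4 bp, is ignored), partial oligonucleotides at the sequence ends are dropped, and a frame is scored first by its number of complete palindromes and then by its number of palindromic 4 bp ends. The function names and the scoring order are hypothetical choices, not the thesis's implementation.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindrome(oligo):
    """A DNA palindrome equals its own reverse complement (can self-anneal)."""
    return oligo == reverse_complement(oligo)


def tile(seq, k=8, frame=0):
    """Split seq into consecutive k-mers starting at offset `frame`.

    Simplification: leading and trailing partial k-mers are dropped.
    """
    return [seq[i:i + k] for i in range(frame, len(seq) - k + 1, k)]


def count_conflicts(oligos, end=4):
    """Return (complete palindromes, oligos with a palindromic first/last end)."""
    full = sum(is_palindrome(o) for o in oligos)
    ends = sum(is_palindrome(o[:end]) or is_palindrome(o[-end:]) for o in oligos)
    return full, ends


def best_frame(seq, k=8):
    """Score every tiling frame and return (lowest-conflict frame, all scores)."""
    scores = {f: count_conflicts(tile(seq, k, f)) for f in range(k)}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    frame, scores = best_frame("AAAATTTTGGCCAGTC")
    print(frame, scores)
```

Enumerating all 4-mers with `is_palindrome` gives the 16 palindromic ends counted in Section 3.2.4, which is a quick sanity check on the helper.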
3.4 Methodology of Algorithmic Assembly Strategies

3.4.1 Subunit Requirement Strategies

Accurate construction of subunits is essential to all assembly strategies. Because of this requirement and the variability in computationally designed subunits, a modular approach was implemented in which subunit validation strategies can easily be swapped or expanded (Table 6). While these strategies do not dictate the steps in assembling a sequence, they do govern the criteria used by the assembly procedure to identify valid subunits and therefore can radically alter the final solution when changed.

Table 6 - Possible subunit validation strategies.

Subunit Validation Strategy
    Description

Number of Octamers
    A subunit cannot exceed a selected total number of octamers. This limits the potential for mismatched ligations as the pool size increases; however, this strategy alone does not prevent repeat conflicts from occurring and therefore should not be considered for a large-scale assembly.

Repeats in Octamers
    A subunit cannot have any two oligonucleotides share the same 4 base fragment in either the first or last position. This prevents unwanted polymerization of oligonucleotides in a subunit.

Distinct repeats in first or last four bases
    A more lenient version of the above which allows for repeats to occur so long as the repeat is not in both the first four bases or last four bases of any pair of oligonucleotides.

Number of Octamers and Distinct Repeats
    This strategy validates a subunit based upon the two previously mentioned strategies. A subunit is considered valid only if it passes both validation strategies.

3.4.2 Subunit Assembly Strategies

Once a subunit validation strategy has been selected and a given tiling frame is chosen, the sequence must be partitioned into valid subunits. In general, there are countless solutions to this problem, as any given subunit can easily be truncated in multiple positions, leading to different assemblies for the next subunit. 
Thus, each subunit offers a branch point at which numerous downstream solutions can be explored. Each assembly strategy was designed to perform a different search method and collect all possible solutions (Table 7).

Table 7 - Possible assembly strategies.

Assembly Strategy
    Description

Inchworm
    Works from left to right along the target sequence, trying to maximize the size of the next possible subunit without violating the selected subunit constraint strategy. Once a boundary has been selected, it also recursively tries a few slightly shorter variations of the current subunit, as these may eventually lead to better alternative solutions.

Simple Divide And Conquer
    Works by simply dividing the target sequence at the middle into progressively smaller subunits until each subunit is considered valid by the constraint strategy. Certain sequences may lead this strategy to find solutions with unnecessarily small subunits since no optimization is made in planning subunit boundaries around problematic oligonucleotides.

Optimized Divide And Conquer
    Works by dividing the target sequence into progressively smaller subunits but selects a location near the middle of the subunit that separates the closest repeat conflict. This strategy reduces the number of necessary subunits compared to the simple divide and conquer strategy since each division is guaranteed to separate a conflict.

3.4.3 Solution Selection and Filtering Strategies

Because any given sequence can yield an enormous number of candidate assemblies, it is necessary to sort these solutions and identify the most promising ones. Possible selection criteria include: a minimum number of octamers in each subunit; a maximum number of total subunits (thus the least number of pair-wise assembly steps); and unique subunit overhangs for pooled subunit assemblies (Table 8).

Table 8 - Possible solution filtration strategies.
Solution Set Filtering Strategy
    Description

Minimum subunit size
    Ignores solutions that contain subunits with too few octamers. This helps to minimize the total number of subunits and to balance their lengths.

Maximum number of subunits
    Ignores solutions that contain too many subunits. This helps reduce the necessary construction steps by selecting solutions that have the least number of subunits.

Unique subunit overhangs
    Ignores solutions that contain subunits with the same sticky ends. This filter is useful for construction methods that simply pool subunits together to build the final product.

2-base minimum unique subunit overhangs
    Same as the previous but more stringent in requiring no two of the four bases in each of the subunit sticky ends to be identical.

As with all the previous strategies described, these filtering solutions can also be bundled to filter multiple criteria at once, further reducing the solution space to a few ideal candidates.

3.5 Algorithmic Assembly of Selected Genes

3.5.1 Overview of the Assembly Process

The assembly process occurs in two parts and is summarized in Figure 26. First, subunits are generated and each solution that contains only valid subunits is stored. These solutions are then examined to determine whether they would cause conflicts during assembly and, if so, are discarded. The remaining solutions are then filtered to find the best assembly procedure (generally the one with the least number of subunits and unique sticky ends for each subunit).

Figure 26 – Flowchart of the inchworm algorithm. BbsI restriction sites, complete palindrome, and 4 bp palindrome conflicts are first resolved. In the case of exact nucleotide sequences, the target sequence must be split up at restriction sites and the subproblems solved independently, while with amino acid sequences codons can easily be changed. Subunits are then generated to span the entire sequence. 
(Not shown: valid solutions must be filtered to determine the ideal assembly.)

For the following two example assemblies, subunits were designed such that each would contain no more than 25 octamers and not have repeats in either the first or last four base positions of any two octamers. Both the 16mer and nonamer conflict strategies were explored when other solutions could not be found.

3.5.2 Assembly of EGFP

Enhanced green fluorescent protein (eGFP, 724 bp) can be broken down into 181 (179 unique) octamers with no palindromes (in the best tiling frame). Unfortunately, of these 179 unique octamers, there are multiple octamers with 4 nucleotide ends that are palindromes and that therefore cannot be used in assembly. Two different strategies were employed to alleviate this issue: using nonamers to shift the tiling frame over palindromic regions, and using 16mers to span palindromic regions while remaining in the same tiling frame after the region (Table 9). For the two assemblies described below, subunits were required to contain no more than 25 oligonucleotides.

Table 9 - Summary of assembly solutions for eGFP with different parameters. This table represents the solutions found only in the first frame; shifting the tiling frame will produce eight different sets of solutions.

Parameters                                                        Total Number
Maximum Subunit Size  Minimum Subunit Size  4 bp End Fix Method   of Solutions
25                    No minimum            nonamer               732
25                    4                     nonamer               689
No maximum            No minimum            nonamer               436
No maximum            4                     nonamer               117
No maximum            8                     nonamer               36
No maximum            No minimum            16mer                 471
No maximum            4                     16mer                 442
25                    No minimum            16mer                 570
25                    4                     16mer                 534

The Inchworm subunit assembly strategy (with backtracking=2, i.e. for each subunit produced from left to right, two truncated variations of the subunit are also tried to generate alternative solutions) combined with the nonamer option generated 732 solutions. 
Of these solutions, 467 were excluded on the basis that they contained some subunits with fewer than 6 octamers (an arbitrary threshold chosen to avoid unnecessarily small subunits). The remaining 265 solutions were then examined to identify the solution with the least number of subunits and custom nonamers. Of these solutions, 65 contained 12 subunits and 6 nonamers (three pairs). When subunits were free to contain an unlimited number of oligonucleotides, 436 solutions were generated and 328 of them contained only 9 subunits (Figure 27).

Figure 27 - Assembly of eGFP. Nine subunits are constructed, followed by assembly of these subunits into a complete construct. PCR of the final product will amplify those constructs successfully assembled.

Using the same procedure, but with the 16mer option, generated 570 solutions. Of these solutions, 433 were excluded because their subunit sizes were too small (fewer than six oligonucleotides). The remaining 137 solutions were then examined to identify the solution with the least number of subunits and custom 16mers. The solution with the least number of subunits contained twelve subunits and required ten 16mers and two 24mers (24mers occur when a 16mer used to fix a problem is also problematic).

3.5.3 Assembly of TetC

The tetracycline resistance gene (tetC, 1204 bp) must be partitioned into two subsequences due to the occurrence of a BbsI restriction site. The two resulting sequences, tetCa and tetCb, contain 164 and 137 octamers, respectively; neither contains palindromes.

For the first section (the region prior to the BbsI site), 275 solutions were generated (using the 16mer method and no maximum pool size settings). Of these solutions, 257 could be removed because they contained subunits smaller than six oligonucleotides. Of the remaining eighteen solutions, seven contained eight subunits (the minimum).
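The left-to-right core of the Inchworm strategy, combined with the size and repeat constraints used for these example assemblies, can be sketched as follows. This is a minimal, hypothetical illustration rather than the thesis's implementation: it performs a single greedy pass with no backtracking, so it returns one partition instead of the full solution set, and it reads the repeat rule as "no two octamers in a subunit share the same first four bases, and no two share the same last four bases". The function names are assumptions.

```python
def is_valid_subunit(octamers, max_size=25):
    """Check one subunit against a size cap and a repeat rule.

    Assumed reading of the repeat rule: no two octamers may share the same
    first four bases, and no two may share the same last four bases.
    """
    if len(octamers) > max_size:
        return False
    heads = [o[:4] for o in octamers]
    tails = [o[-4:] for o in octamers]
    return len(set(heads)) == len(heads) and len(set(tails)) == len(tails)


def inchworm(octamers, max_size=25):
    """Greedy partition: extend the current subunit while it remains valid.

    No backtracking is performed, so only one candidate solution is produced.
    """
    subunits, current = [], []
    for octamer in octamers:
        if is_valid_subunit(current + [octamer], max_size):
            current.append(octamer)
        else:
            subunits.append(current)  # close the subunit at this boundary
            current = [octamer]
    if current:
        subunits.append(current)
    return subunits


if __name__ == "__main__":
    pool = ["AAAACCCC", "GGGGTTTT", "AAAAGGGG", "CCCCTTTT"]
    for subunit in inchworm(pool):
        print(subunit)
```

In this toy input the third octamer repeats the first four bases of the first one, so the greedy pass closes the first subunit there. A fuller implementation would also retry slightly shorter subunits at each boundary, as described in Table 7, and then apply the filters of Table 8 to the collected solutions.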
For the second section (the region after the BbsI site), 141 solutions were generated using the same criteria as the first. Of these, thirteen solutions had subunits of greater than six oligonucleotides. One of the remaining solutions had a unique set of sticky ends and could therefore be assembled in a single step involving the pooling of all of its subunits.

Lastly, because this gene was assembled in two halves, any solution from the first can be combined with any solution from the second to produce the final construct (Figure 28).

Figure 28 - Assembly of tetC. The initial sequence is broken into two fragments to avoid an internal BbsI restriction site. Eight subunits are then designed for each of the two subsequences. Assembly of each half is then performed and a final pair-wise ligation can join the two halves, constructing the final sequence with the BbsI site.

4 Discussion

Synthetically derived genes have recently been used to increase biodiesel production (Demain 2009), to develop new polymers (Moire et al. 2003), to bioremediate hazardous waste (de Lorenzo 2008), to produce attenuated vaccines by suboptimally recoding viral genomes (Mueller et al. 2009), and to produce anti-malarial drugs in yeast (Ro et al. 2006). Unfortunately, further success has been limited by the high cost associated with producing error-free DNA constructs from expensive and error-prone oligonucleotide components (Carr et al. 2004; Caruthers 1985; Linshiz et al. 2008; Xiong et al. 2008b; Zhou et al. 2004).

The use of short oligonucleotides poses biological and computational challenges to a DNA assembly process. Firstly, as with all enzymes, DNA ligases are dependent upon temperature; however, dsDNA, the substrate for DNA ligases, is also temperature sensitive. While the reduced melting temperature of mismatched DNA would act as error correction in DNA assembly, the already low annealing temperature of short oligonucleotide duplexes may be impossible to ignore. 
Secondly, the requirements for specific contacts over the length of the dsDNA substrate suggest that these interactions are necessary for effective ligation (Pritchard and Southern 1997). Lastly, while PCA and LCR methods require only a small number of carefully designed custom oligonucleotides, gene synthesis with octamers requires correctly assembling hundreds of potentially conflicting oligonucleotides. Computational methods must be designed to carefully screen all potential assemblies and select those that reduce labour, reagent costs, and error. Overcoming these limitations offers the potential to dispense with custom oligonucleotide synthesis and permit new methods for gene assembly technology.

4.1 Addressing Error Rates

Error rates remain the major limitation to low-cost DNA fabrication (Czar et al. 2009). The 7500 bp poliovirus synthesized by Cello et al. (2002) required multiple iterations of assembly and sequencing, and took months to produce. Single-step methods, such as the PCA assembly of a 2700 bp plasmid by Stemmer et al. (1995), demonstrate a faster pace of synthesis but have done little to improve error rates. Furthermore, the method used by Stemmer et al. to identify a complete product, selecting for viable plasmids, is not applicable to most DNA synthesis applications.

Errors in synthetic DNA can arise at many stages of the assembly process. Generally, most errors occur in the oligonucleotides that are used to build larger products. The errors that arise in synthetic DNA as a result of these erroneous oligonucleotides are generally single nucleotide deletions (Carr et al. 2004). Synthesis errors that are not correctly capped lead to oligonucleotides with internal deletions. These deletion products, when combined in assembly procedures, produce double-stranded DNA products with intermittent gaps.
The assembly process itself can also potentially introduce errors into the final product: both mistakes in oligonucleotide annealing and in polymerase extension can lead to erroneous products. Finally, replication of the complete product in a carrier (such as a bacterial plasmid, bacterial artificial chromosome, or a yeast artificial chromosome) can introduce errors, but at much lower frequency. Moreover, because the assembly of a full-length gene product relies on the efficient and specific alignment of long single-stranded oligonucleotides, potential obstacles for correct synthesis include: secondary structures caused by inverted repeats; extraordinarily high or low GC-content; and repetitive structures. Usually, these difficult segments can only be synthesized by splitting the procedure into several consecutive steps and a final assembly of shorter subsequences, which in turn leads to a significant increase in the time and labour needed for production.

The result of a gene synthesis experiment depends strongly on the quality of the oligonucleotides used. For annealing-based gene synthesis protocols, the quality of the product is directly and exponentially related to the correctness of the component oligonucleotides (Carr et al. 2004). Alternatively, if gene synthesis is performed with low quality oligonucleotides, more effort is required in downstream quality assurance during clone analysis, which involves time-consuming cloning and sequencing procedures.

4.1.1 Error Rate Analysis

In our method, the ability of T4 DNA ligase to discriminate against mismatched DNA is exploited in the assembly process, allowing for repeated error detection cycles. First, single stranded octamers are ligated to produce short intermediate dsDNA fragments. During this phase, erroneous and mismatched octamers have a substantially lower Tm and are selected against (Pritchard and Southern 1997).
In the second phase, repeated ligation of intermediate fragments prevents incomplete fragments or fragments with erroneous ends from joining. Wash procedures then remove these incomplete fragments from the reaction so that they do not interfere with subsequent steps.

To characterize our assembly procedure we cloned and sequenced the resulting products from a series of experimental variations (Table 10).

Table 10 - Fraction of clones with correct full-length sequences.

Assembly Type | Correct Reads (ignoring regions outside target) | Comments
Hierarchical 128 bp Assembly | 17/19 | A smaller band was also visible; only sequencing results for the band of the expected size are included here
Serial 128 bp Assembly (with high concentration of octamers) | 13/15 |
Serial 100 bp Assembly | 38/39 |

The most common error observed in the hierarchical 128 bp assembly process was a product missing one of the four internal regions. Each of the four 32 bp intermediates assembled was designed to have unique 4 bp overhangs and only one unique counterpart; therefore, this truncated product was the result of two mismatched ends joining. Because this fragment still contained the expected ends, it was able to amplify using the primers designed for the complete product. This mismatched ligation product is most likely a result of the long ligation times used during the first attempt at assembly (>4 h). Because these fragments are double stranded and longer than the initial octamers used, it is likely that the stringency of T4 DNA ligase to select against mismatched pairing is reduced. While potentially reducing yield, limiting ligation times would prevent this error in future assemblies.

Of the 128 bp hierarchical assemblies with the correct size, two errors were identified (refer to Appendix B – Sequencing Results for Beta-Actin Assemblies for the complete alignment of full-length products). The first was a single gap and the second was a transversion (from T to A).
These errors could have arisen either in the oligonucleotides assembled or during PCR.

Error-free synthetic DNA was also produced successfully from the serial construction method in which pools of octamers were added sequentially with intermittent wash steps. Of the full-length products, only one showed a major error. Three adjacent errors were introduced that coincide perfectly with the junction of a single mismatched octamer (CGAGCACG was replaced). The only other errors found were a product truncated at one end (most likely due to the failure to complete a pooled ligation) and a product with an additional end outside the PCR priming region.

4.1.2 Cost Analysis

Regardless of the method, the critical barrier in large DNA synthesis is the cost attributed to producing faultless constructs. While the price of gene synthesis is often quoted at $0.10 to $0.20 per nucleotide, this scales up to $200-$400 per 1 kb, and represents only the price of the raw oligonucleotide materials (Mueller et al. 2009). The additional costs and time involved in assembling these raw materials more accurately bring the price of gene synthesis to ~$2.00 per base (Carlson 2003).

There are three critical costs associated with our synthesis strategy: the synthesis of a library of octamer precursors, the solid supports necessary for each subunit, and the enzymes required for phosphorylation, ligation, and digestion.

At the time of writing, IDT produced a plate of 384 octamers in 64 nmol quantities for our current large-scale assembly undertaking. The approximate price of this plate was $1200, bringing the price to approximately $3.50 per octamer. Based on the quantities used in the 128 bp serial assembly procedure, a 10-fold smaller reaction (a reasonable 6 µl reaction volume) would imply each octamer could be used 640 times (16.67 µM octamer per 6 µl reaction pool, or 100 pmol per reaction).
With the assumption that each octamer contributes 4 bp to the DNA construct, the cost in nucleotides alone is $0.00135 per bp.

The solid support, Invitrogen M-270 Dynabeads, was acquired for $1000 for 10 ml. With the same 6 µl synthesis scale, each subunit would require 15 µl of this bead solution (resuspended after preparation to fit the 6 µl reaction volume). This equates to a price of $1.50 per subunit. Presuming an average subunit length of 76 bp (the predicted 19 octamers per pool), the average cost for the solid support is $0.0197 per bp.

The final additional cost is that associated with the enzymes. The 6 µl assembly reaction of each subunit requires the addition of 0.5 µl each of T4 PNK and T4 DNA ligase, and subsequent subunit assemblies require BbsI and T4 DNA ligase. The quoted NEB cost of PNK and T4 DNA ligase is $212 per 250 µl, implying enzyme costs of $0.848 per subunit (or $0.0111 per bp). Because there can be at most n-1 pair-wise steps in a hierarchical procedure, an estimated additional $0.848 per subunit (for BbsI and T4 DNA ligase) is reasonable to assume for the assembly of multiple subunits into one large construct.

The complete cost per base is outlined in Table 11; however, refinements to the assembly process may allow these costs to be reduced. Firstly, the high cost of the magnetic beads (currently the most expensive requirement) warrants the investigation of possible alternative solid support systems. Secondly, optimization of the enzyme and octamer concentrations used may allow for dramatic savings in the reagents required for each subunit. Lastly, the use of chemically cleavable attachments of solid-support bound DNA would relieve the requirement for enzymatic restriction with BbsI.
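The per-base figures quoted in this section follow from simple arithmetic on the stated prices and volumes. The following is a minimal sketch that reproduces them; the class and method names are illustrative only and are not part of the thesis software. It assumes, as the text does, 76 bp per subunit, 4 bp contributed per octamer, and one enzyme charge for pooling plus one for subunit assembly. Computed directly from the quoted inputs, the octamer term comes out at roughly $0.0014 per bp, marginally above the rounded $0.00135 quoted, and the total at roughly $0.043 per bp.

```java
// Illustrative sketch only: reproduces the cost arithmetic of Section 4.1.2
// from the prices and volumes quoted in the text.
final class OctamerCostSketch {

    static final double BP_PER_SUBUNIT = 76.0; // 19 octamers x 4 bp, as assumed in the text

    // Octamer library: price per octamer spread over its uses and the bp it contributes.
    static double octamerCostPerBp() {
        double pricePerOctamer = 3.50;            // USD, as quoted
        double suppliedPmol = 64_000.0;           // 64 nmol supplied per octamer
        double pmolPerReaction = 16.67 * 6.0;     // uM x ul = pmol (about 100 pmol)
        double usesPerOctamer = suppliedPmol / pmolPerReaction; // about 640
        double bpPerOctamer = 4.0;
        return pricePerOctamer / usesPerOctamer / bpPerOctamer;
    }

    // Solid support: $1000 per 10 ml of bead solution, 15 ul per subunit.
    static double beadCostPerBp() {
        double pricePerUl = 1000.0 / 10_000.0;
        double ulPerSubunit = 15.0;
        return pricePerUl * ulPerSubunit / BP_PER_SUBUNIT;
    }

    // Enzymes: $212 per 250 ul, 0.5 ul of each of two enzymes per subunit.
    static double enzymeCostPerBp() {
        double pricePerUl = 212.0 / 250.0;
        double ulPerSubunit = 0.5 + 0.5;
        return pricePerUl * ulPerSubunit / BP_PER_SUBUNIT;
    }

    // Total: octamers + beads + pooling enzymes + subunit assembly enzymes.
    static double totalCostPerBp() {
        return octamerCostPerBp() + beadCostPerBp() + 2.0 * enzymeCostPerBp();
    }

    public static void main(String[] args) {
        System.out.printf("Octamers: $%.5f per bp%n", octamerCostPerBp());
        System.out.printf("Beads:    $%.5f per bp%n", beadCostPerBp());
        System.out.printf("Enzymes:  $%.5f per bp (charged twice)%n", enzymeCostPerBp());
        System.out.printf("Total:    $%.3f per bp%n", totalCostPerBp());
    }
}
```

Keeping each term in its own method makes it easy to see that the beads and the two enzyme charges, not the octamers, dominate the estimate.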
Table 11 - Costs associated with octamer gene assembly

Reagent | Purpose | Cost per bp
Octamers | Preconstructed library of starting oligonucleotides | $0.00135
Dynabeads | Anchor for subunit assembly and subunit purification | $0.01970
Pooling Enzymes | PNK and Ligase for subunit assembly from octamer pools | $0.0111
Subunit Assembly Enzymes | BbsI and Ligase for assembly of multiple subunits | $0.0111
Total | | $0.043

4.1.3 Reducing Error Rates

4.1.3.1 Improved Ligation Conditions

Without changing the assembly scheme, mismatched ligation errors could easily be avoided by reducing ligation times during both the pooled assembly steps and the pairwise assembly steps. Increasing reaction temperatures during all ligations would further reduce unwanted ligations. Shorter and mismatched oligonucleotide duplexes are denatured at lower temperatures than longer, perfectly matched duplexes and are selected against by ligases (Pritchard and Southern 1997).

4.1.3.2 Receding/Protruding End Assembly Strategy

In the current hierarchical assembly process, mobilized intermediates have two free 5’ protruding ends. While the process is designed such that only one of these ends is complementary to the immobilized intermediate ligation target, mismatched ligation is still a potential problem.

To avoid such an error, two adjacent intermediates can be designed to alternate their sticky ends as either 5’ receding or 5’ protruding, thus ensuring only one compatible orientation during any given joining (Figure 29).

Figure 29 - Overview of an assembly strategy that involves alternating receding/protruding sticky ends. The final step of the subunit ligations flips the orientation of the combined fragments.

A complication of this procedure is that fragments are intermittently reversed due to the association of their ends.
Assembly must plan for these events by constructing some intermediates in reverse prior to hierarchical assembly so that the completed product is correct and in the expected orientation.

4.1.3.3 MutS Purification

A major drawback of PCR-based assembly strategies is their inability to select against mismatched bases. After one cycle of PCR, a dsDNA fragment containing a mismatch will produce two dsDNA sequences which no longer contain mismatches (assuming no new errors are introduced), and so it becomes impossible to differentiate or identify the original error.

Carr et al. (2004) addressed this problem by introducing a MutS-mediated filtration step in which mismatched dsDNA could be selected against during a PAGE purification. A similar process could be implemented to remove mismatched assemblies during hierarchical assembly. Mobilized intermediates freed from their solid supports could potentially be purified using an affinity-column bound MutS approach. Because this process would most likely also reduce quantities of valid fragments, the process could be performed just once prior to PCR amplification. Mismatches introduced at any stage during assembly will persist and remain identifiable up until amplification, and so only one purification step is required prior to PCR.

4.1.3.4 Endonuclease V Selection

Bang and Church (2008) addressed the accumulation of oligonucleotide errors during LCA by utilizing a combination of enzymes. Reactions were designed to produce circularized DNA constructs through LCA. The products of this ligation were then transferred to an endonuclease V and exonuclease reaction. Circular DNA is not a target for exonuclease on its own; however, endonuclease V cleaves mismatched DNA, causing erroneous products to be opened and resulting in their degradation by exonuclease.

A similar approach can be applied to our synthesis method by utilizing endonuclease V.
Prior to PCR amplification, the synthetic DNA is immobilized at one end and free at the other. With this, two potential methods can be employed. Without any modifications, an endonuclease V treatment will cleave mismatches preferentially and to some degree on both sides of the error. After these cleavages, a wash at a high temperature will remove either one strand or both strands from the location of the mismatch, making the fragment unusable in PCR. Alternatively, the final step of assembly could add an exonuclease-resistant cap to the free end of the product. In a combined endonuclease V and exonuclease reaction, endonuclease V would then specifically cleave mismatched fragments and allow for their degradation by exonuclease.

4.1.4 Assembly with Hexamer Analogues

While hexamer assembly may be possible with optimizations in PEG, ligase, and oligonucleotide concentrations, our efforts suggest that such a procedure would be time consuming and unreliable. Furthermore, Dunn et al. (1995) and our own experiments suggest that not all hexamer combinations will ligate and the conditions for optimal ligation are inappropriate for the annealing of short oligonucleotides.

Nucleic acid analogues offer a potential solution to this problem. Both locked nucleic acids (LNA) and peptide nucleic acids (PNA) bind to each other with significantly higher affinities than either does to DNA, and with even greater affinities than the analogous DNA-DNA duplexes (Ng and Bergstrom 2005). More importantly, T4 DNA ligase shows no selectivity against these analogues, suggesting that a construction procedure with a hexamer analogue library could be a feasible alternative to an octamer library. For comparison, a duplex of the sequence CGATGC/GCATCG has a Tm of greater than 90°C for an LNA/LNA duplex, 41°C for an LNA/DNA duplex, and ~14°C for the natural DNA/DNA duplex.
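For a sense of the scale involved, a complete library of every possible oligonucleotide of length n over the four bases contains 4^n members, so a complete hexamer library is sixteen times smaller than a complete octamer library. The following sketch simply tabulates this count; the class and method names are illustrative and not part of the thesis software.

```java
// Illustrative sketch only: size of a complete library of all n-mers over {A, C, G, T}.
final class LibrarySizeSketch {

    // Returns 4^length, computed as 2^(2 * length) with a bit shift.
    static long librarySize(int length) {
        if (length < 0 || length > 31) {
            throw new IllegalArgumentException("length must be between 0 and 31");
        }
        return 1L << (2 * length);
    }

    public static void main(String[] args) {
        System.out.println("Hexamers: " + librarySize(6));
        System.out.println("Octamers: " + librarySize(8));
        System.out.println("Ratio:    " + librarySize(8) / librarySize(6));
    }
}
```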
While this alternative library increases costs by orders of magnitude, it offers both a more manageable library size and increased reaction rates due to the higher annealing temperatures and reaction kinetics. Further still, it may be possible to reduce costs by utilizing DNA hexamers with only a few selected LNA base substitutions, thus increasing their annealing temperature without requiring complete LNA hexamers.

4.1.5 Danger and Security

At the present time, the construction of the 582,970 bp synthetic genome by Gibson et al. (2008) demonstrates that it is now possible to construct DNA larger than most viral genomes. Recent projects involving the synthesis of a viral genome on the brink of eradication (Cello et al. 2002) and the resurrection of the 1918 pandemic influenza virus (Tumpey et al. 2005) further highlight the potential dangers of synthetic DNA.

Currently, all synthetic sequences ordered in North America are screened against publicly available DNA sequence data to ensure no dangerous sequences are constructed. DNA construction with oligonucleotides on the scale of octamers, however, makes this screening process difficult to manage. Ordering additional unnecessary oligonucleotides, codon-altered oligonucleotides, and oligonucleotides from multiple commercial sources would make it a daunting task to determine the original target sequence.

A simplistic solution to the dangers of synthetic DNA would be to prohibit its use or to seriously limit its access. However, given that all life is encoded by nucleic acids, future advances in DNA synthesis are inevitable, regardless of the dangers it brings. As with many technologies developed in the past century, synthetic DNA offers both great promise and danger for the future. While idealists in synthetic biology envision a future in which synthetic DNA can solve many of the world’s problems, it should be approached with caution.
4.1.6 Conclusions

Making faultless DNA is a fundamental challenge to genetic engineering applications. Here we show for the first time how a complete library of short oligonucleotides can be used to address this challenge while remaining cost competitive. A further benefit of this approach is its simple and recursive construction procedure, which is capable of assembling error-free DNA from error-prone oligonucleotides.

4.2 Future Research

4.2.1 Engineering a New Ligase through Directed Evolution

Ligase is essential to this gene synthesis approach. To date, no ligase is known to successfully assemble hexamers at a reasonable rate. While protein engineering may prove this to be impossible, such an improvement would alleviate the difficulties of library size. Furthermore, an alternative approach may be to engineer a more stringent, high-fidelity ligase to further reduce the potential error rate of our ligase-based assembly.

4.2.2 Microfluidic and Labcyte Assembly

A major limitation to an oligonucleotide library synthesis approach is the storage and handling requirements of such a large number of oligonucleotides. A further difficulty is the reduction in volumes necessary to achieve a cost competitive solution. Multichamber, continuous flow microfluidic devices and acoustic droplet ejection technology offer two potential solutions to these requirements.

Continuous flow microfluidics makes use of photolithographically etched, silicone wafer chips designed to handle the continuous flow of liquids through microscopic channels and reaction chambers. Actuation of liquid flow is implemented by external pressure sources, which control various “push-down” valves found on the flow layer of the chip. A generalized chip typically consists of a flow layer and a control layer bound together and fixed to a glass slip (Kong et al. 2007).
Gene synthesis using microfluidic approaches has demonstrated the PCA assembly of a 1 kb gene utilizing four 500 nl reaction chambers (Kong et al. 2007). Despite such successes, limitations in this technology prevent its feasibility in a library synthesis approach. Firstly, because of the high porosity of the chips, sample evaporation can occur throughout the reaction stages. To counter this, additional channels must be designed on a chip to act as a fluid reservoir to reduce sample loss. It is not clear, however, if this also suggests that unwanted mixing of oligonucleotide pools between independent reaction chambers may occur. Secondly, while the reaction chambers themselves are on the nanoliter scale, a complex series of input channels is necessary. Considering the size of the library, were such channels from the library to the microfluidic chip fabricated, there would be a vast dead volume associated with each oligonucleotide added, most likely on a scale much larger than the reaction volumes themselves. Lastly, the experiments performed by Kong et al. (2007) did not include repeated wash and reaction steps on anchored oligonucleotides. For such a device to work successfully for our purposes, the device must be able to isolate completed DNA subunits by washing away unreacted oligonucleotides, a process which requires the manipulation of tethered (or bead-bound) DNA fragments.

Acoustic droplet ejection offers an alternative approach to oligonucleotide manipulation. By utilizing ultrasound pulses, it is possible to eject fluid droplets as small as a picoliter from a sample without any physical contact. Labcyte Inc., a leading manufacturer in this technology, has a device capable of transferring volumes as small as 2.5 nl between source and destination plates as large as 1536 wells. In comparison to continuous flow microfluidics, this approach is far simpler and is not limited by the previously discussed barriers.
The Labcyte approach, however, implies that a two-stage assembly process would be necessary. In the first stage, the Labcyte device would cherry pick all the necessary oligonucleotides into pools on a reaction plate. In the second stage, this plate would undergo the typical pooled ligation and pairwise assembly strategy either manually or on a specifically programmed Biomek.

4.3 Significance

Global sales of DNA oligonucleotides are about $700 million a year, while the gene synthesis industry is estimated at only $60 million a year (NextBigFuture 2007). However, this latter field is growing rapidly, resulting in a 700-fold increase in DNA synthesis over the past decade, almost doubling every year. Proof of the demand for synthetic DNA can be seen in the numerous companies entering the market offering custom gene synthesis: GENEART, GeneWIZ, GenScript Corporation, Mr. Gene, DNA2.0, and Blue Heron Bioscience.

Here we demonstrate the feasibility of a new method for DNA synthesis, which does not rely on the design and synthesis of custom primers. The success of this research will mark a shift in DNA synthesis technology away from current cumbersome methods to a rapid library-based, cherry-picking procedure of synthesis. Its avoidance of polymerase cycling steps also implies that error correction techniques can easily be applied to further refine the process of creating accurate synthetic DNA.

We hope that this research will help realize the potential and feasibility of large scale DNA synthesis. Just as the sequencing of complete genomes is becoming an everyday commodity, so too will the synthesis of these genomes and even artificial genomes.

5 References

Au LC, Yang FY, Yang WJ, Lo SH, Kao CF. 1998. Gene synthesis by a LCR-based approach: high-level production of leptin-L54 using synthetic gene in Escherichia coli. Biochem Biophys Res Commun 248(1):200-3.
Bang D, Church GM. 2008. Gene synthesis by circular assembly amplification. Nat Methods 5(1):37-9.
Beattie KL, Fowler RF. 1991. Solid-phase gene assembly. Nature 352:548-549.
Carlson R. 2003. The pace and proliferation of biological technologies. Biosecur Bioterror 1(3):203-14.
Carr PA, Park JS, Lee YJ, Yu T, Zhang S, Jacobson JM. 2004. Protein-mediated error correction for de novo DNA synthesis. Nucleic Acids Res 32(20):e162.
Caruthers MH. 1985. Gene synthesis machines: DNA chemistry and its uses. Science 230(4723):281-5.
Caruthers MH, Barone AD, Beaucage SL, Dodds DR, Fisher EF, McBride LJ, Matteucci M, Stabinsky Z, Tang JY. 1987. Chemical synthesis of deoxyoligonucleotides by the phosphoramidite method. Methods Enzymol 154:287-313.
Caruthers MH, Beaucage SL, Becker C, Efcavitch JW, Fisher EF, Galluppi G, Goldman R, deHaseth P, Matteucci M, McBride L and others. 1983. Deoxyoligonucleotide synthesis via the phosphoramidite method. Gene Amplif Anal 3:1-26.
Caruthers MH, Beaucage SL, Efcavitch JW, Fisher EF, Matteucci MD, Stabinsky Y. 1980. New chemical methods for synthesizing polynucleotides. Nucleic Acids Symp Ser(7):215-23.
Cello J, Paul AV, Wimmer E. 2002. Chemical synthesis of poliovirus cDNA: generation of infectious virus in the absence of natural template. Science 297(5583):1016-8.
Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD. 2003. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 31(13):3497-500.
Cherepanov AV, de Vries S. 2003. Kinetics and thermodynamics of nick sealing by T4 DNA ligase. Eur J Biochem 270(21):4315-25.
Czar MJ, Anderson JC, Bader JS, Peccoud J. 2009. Gene synthesis demystified. Trends Biotechnol 27(2):63-72.
de Lorenzo V. 2008. Systems biology approaches to bioremediation. Current Opinion in Biotechnology 19(6):579-589.
Demain AL. 2009. Biosolutions to the energy problem. J Ind Microbiol Biotechnol 36(3):319-32.
Dunn JJ, Butler-Loffredo LL, Studier FW. 1995.
Ligation of hexamers on hexamer templates to produce primers for cycle sequencing or the polymerase chain reaction. Anal Biochem 228(1):91-100.
Flajolet P, Grabner PJ, Kirschenhofer P, Prodinger H. 1995. On Ramanujan's Q-function. J. Comput. Appl. Math. 58(1):103-116.
Forster AC, Church GM. 2006. Towards synthesis of a minimal cell. Mol Syst Biol 2:45.
Gibson DG, Benders GA, Andrews-Pfannkoch C, Denisova EA, Baden-Tillson H, Zaveri J, Stockwell TB, Brownley A, Thomas DW, Algire MA and others. 2008. Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science 319(5867):1215-20.
Gupta NK, Ohtsuka E, Sgaramella V, Buchi H, Kumar A, Weber H, Khorana HG. 1968a. Studies on polynucleotides, 88. Enzymatic joining of chemically synthesized segments corresponding to the gene for alanine-tRNA. Proc Natl Acad Sci U S A 60(4):1338-44.
Gupta NK, Ohtsuka E, Weber H, Chang SH, Khorana HG. 1968b. Studies on polynucleotides. LXXXVII. The joining of short deoxyribopolynucleotides by DNA-joining enzymes. Proc Natl Acad Sci U S A 60(1):285-92.
Hecker KH, Rill RL. 1998. Error analysis of chemically synthesized polynucleotides. Biotechniques 24(2):256-60.
Itaya M, Fujita K, Kuroki A, Tsuge K. 2008. Bottom-up genome assembly using the Bacillus subtilis genome vector. Nat Methods 5(1):41-3.
Khorana HG, Agarwal KL, Besmer P, Buchi H, Caruthers MH, Cashion PJ, Fridkin M, Jay E, Kleppe K, Kleppe R and others. 1976. Total synthesis of the structural gene for the precursor of a tyrosine suppressor transfer RNA from Escherichia coli. 1. General introduction. J Biol Chem 251(3):565-70.
Kodumal SJ, Patel KG, Reid R, Menzella HG, Welch M, Santi DV. 2004. Total synthesis of long DNA sequences: synthesis of a contiguous 32-kb polyketide synthase gene cluster. Proc Natl Acad Sci U S A 101(44):15573-8.
Kong DS, Carr PA, Chen L, Zhang S, Jacobson JM. 2007. Parallel gene synthesis in a microfluidic device. Nucleic Acids Res 35(8):e61.
Linshiz G, Yehezkel TB, Kaplan S, Gronau I, Ravid S, Adar R, Shapiro E. 2008. Recursive construction of perfect DNA molecules from imperfect oligonucleotides. Mol Syst Biol 4:191.
Majumder K. 1992. Ligation-free gene synthesis by PCR: synthesis and mutagenesis at multiple loci of a chimeric gene encoding ompA signal peptide and hirudin. Gene 116(1):115-6.
Moire L, Rezzonico E, Poirier Y. 2003. Synthesis of novel biomaterials in plants. J Plant Physiol 160(7):831-9.
Mueller S, Coleman JR, Wimmer E. 2009. Putting synthesis into biology: a viral view of genetic engineering through de novo gene and genome synthesis. Chem Biol 16(3):337-47.
Nakatani M, Ezaki S, Atomi H, Imanaka T. 2000. A DNA ligase from a hyperthermophilic archaeon with unique cofactor specificity. J Bacteriol 182(22):6424-33.
NextBigFuture. 2007. Synthetic Biology and Gene Synthesis. http://nextbigfuture.com/2007/12/synthetic-biology-and-dna-synthesis_17.html.
Ng PS, Bergstrom DE. 2005. Alternative nucleic acid analogues for programmable assembly: hybridization of LNA to PNA. Nano Lett 5(1):107-11.
Nilsson SV, Magnusson G. 1982. Sealing of gaps in duplex DNA by T4 DNA ligase. Nucleic Acids Res 10(5):1425-37.
Odell M, Shuman S. 1999. Footprinting of Chlorella virus DNA ligase bound at a nick in duplex DNA. J Biol Chem 274(20):14032-9.
Pritchard CE, Southern EM. 1997. Effects of base mismatches on joining of short oligodeoxynucleotides by DNA ligases. Nucleic Acids Res 25(17):3403-7.
Ro DK, Paradise EM, Ouellet M, Fisher KJ, Newman KL, Ndungu JM, Ho KA, Eachus RA, Ham TS, Kirby J and others. 2006. Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature 440(7086):940-3.
Smith HO, Hutchison CA, 3rd, Pfannkoch C, Venter JC. 2003. Generating a synthetic genome by whole genome assembly: phiX174 bacteriophage from synthetic oligonucleotides. Proc Natl Acad Sci U S A 100(26):15440-5.
Stemmer WP, Crameri A, Ha KD, Brennan TM, Heyneker HL. 1995.
Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. Gene 164(1):49-53.
Temsamani J, Kubert M, Agrawal S. 1995. Sequence identity of the n-1 product of a synthetic oligonucleotide. Nucleic Acids Res 23(11):1841-4.
Tian J, Gong H, Sheng N, Zhou X, Gulari E, Gao X, Church G. 2004. Accurate multiplex gene synthesis from programmable DNA microchips. Nature 432(7020):1050-4.
Tumpey TM, Basler CF, Aguilar PV, Zeng H, Solorzano A, Swayne DE, Cox NJ, Katz JM, Taubenberger JK, Palese P and others. 2005. Characterization of the reconstructed 1918 Spanish influenza pandemic virus. Science 310(5745):77-80.
Wu G, Wolf JB, Ibrahim AF, Vadasz S, Gunasinghe M, Freeland SJ. 2006. Simplified gene synthesis: A one-step approach to PCR-based gene construction. Journal of Biotechnology 124(3).
Xiong AS, Peng RH, Zhuang J, Gao F, Li Y, Cheng ZM, Yao QH. 2008a. Chemical gene synthesis: strategies, softwares, error corrections, and applications. FEMS Microbiol Rev 32(3):522-40.
Xiong AS, Peng RH, Zhuang J, Liu JG, Gao F, Chen JM, Cheng ZM, Yao QH. 2008b. Non-polymerase-cycling-assembly-based chemical gene synthesis: strategies, methods, and progress. Biotechnol Adv 26(2):121-34.
Yang GS, Stott JM, Smailus D, Barber SA, Balasundaram M, Marra MA, Holt RA. 2005. High-throughput sequencing: a failure mode analysis. BMC Genomics 6(1):2.
Zhou X, Cai S, Hong A, You Q, Yu P, Sheng N, Srivannavit O, Muranjan S, Rouillard JM, Xia Y and others. 2004. Microfluidic PicoArray synthesis of oligodeoxynucleotides and simultaneous assembling of multiple DNA sequences. Nucleic Acids Res 32(18):5409-17.

Appendix A – Glossary

Bacterial Artificial Chromosome (BAC)
A DNA construct, based on a functional fertility plasmid, used for transforming and cloning in bacteria. The unique fertility genes promote the even distribution of plasmids after bacterial cell division and make it possible for them to hold inserts in the hundreds of kbp.
Construct
A large dsDNA assembly. In the case of this project, constructs refer to the final dsDNA (after all necessary subunits have been combined).

Ligase Chain Reaction (LCR)
A technique used to synthesize a gene de novo relying on thermocycling, multiple overlapping primers, and a thermostable ligase to join primers into a complete target sequence.

Nucleoside
A nucleotide lacking any phosphates at the 5’ carbon.

Nucleotide
The basic unit of polynucleotides and DNA. Nucleotides consist of ribose (or deoxyribose) with a purine (A,G) or pyrimidine (T,C,U) base attached at the 1’ carbon and one or more phosphates attached at the 5’ carbon.

Oligonucleotide (oligo)
A short polynucleotide, typically twenty or fewer nucleotides in length.

Polymerase Cycling Assembly (PCA)
A technique used to synthesize a gene de novo utilizing a PCR reaction with multiple overlapping primers covering the complete target sequence.

Polymerase Chain Reaction (PCR)
A technique used to amplify a single piece of DNA across several orders of magnitude. This method relies on thermocycling, unique DNA primers, and a thermostable polymerase.

Polynucleotide
An organic polymer molecule composed of nucleotide monomers covalently bonded in a chain. Equivalent to single stranded DNA.

Receptor
A dsDNA attached (through a biotin linkage) to a solid support. Used in fluorescence experiments to measure labeled oligonucleotide ligation to a receptor, and in DNA assembly as the solid support on which subunits are built.

Rescue Oligonucleotide
An oligonucleotide used to supplement the ligation of another oligonucleotide when the oligonucleotides are shorter than decamers. It is not itself joined in the ligation reaction.

Strategy Pattern
In computer science, the strategy pattern is a particular software design pattern, whereby algorithms can be selected at runtime. The strategy pattern is useful for situations where it is necessary to define a family of algorithms and make them easily interchangeable.

Subunit
Multiple oligonucleotides assembled into one short dsDNA fragment. Subunits are generally built on a solid support, washed, purified, and then released to build larger subunits or constructs.

Yeast Artificial Chromosome (YAC)
A vector used to clone large DNA fragments (larger than 100 kb and up to 3000 kb). It is an artificially constructed chromosome and contains the telomeric, centromeric, and replication origin sequences needed for replication and preservation in yeast cells. YACs are typically used in situations in which the DNA to be cloned is too large for a bacterial host.

Appendix B – Sequencing Results for Beta-Actin Assemblies

The following three ClustalX alignments describe the sequencing results of the serial assembly procedure (results for the 100 bp and 128 bp products) and the hierarchical assembly procedure (results for the 128 bp product) used to construct a fragment of beta-actin.

B.1 Serial Assembly of 100bp BA

B.2 Serial Assembly of 128bp BA using High Concentration Oligonucleotides

B.3 Hierarchical (pair-wise) Assembly of 128bp BA

Appendix C – Source Code for the Birthday Paradox Algorithm

BirthdayParadox.java:

package utilities;

import java.util.Random;
import java.util.TreeSet;

/**
 * Determines how many octamers can be placed in a subunit without conflict
 * by two methods:
 * 1. mathematical approximation
 * 2. trial of randomly generated subunits
 */
public class BirthdayParadox {

    int M = 256; // 256 = 4bp tiles. we are only interested in the ends of octamers
    int depth = 100; // how deep we should proceed in summation. i = 1 to i = 100
    int trials = 100000; // how many iterations of a random sampling should we try.
    String[] alphabet = new String[] { "A", "T", "G", "C" };
    int oligonucleotidelength = 4;

    /*
     * returns (M)(M-1)(M-2)(M-3)...
     * to (M-k+1), i.e. the product of k consecutive factors
     */
    private double mMinusOneToMinusK(int k) {
        double result = 1;
        for (int i = 0; i < k; i++) {
            result *= (M - i);
        }
        return result;
    }

    private double MtoTheK(int k) {
        // return M^k;
        return java.lang.Math.pow(M, k);
    }

    private double sumFromOneToK(int k) {
        double result = 0;
        for (int i = 1; i <= k; i++) {
            double temp = mMinusOneToMinusK(i);
            double temp2 = MtoTheK(i);
            result += temp / temp2;
        }
        return result;
    }

    public void solve() {
        double result = sumFromOneToK(depth);
        System.out.println("For a depth of " + depth + " the solution is:");
        System.out.println(result);
    }

    private String getRandomOligo(Random random) {
        String oligo = "";
        for (int i = 0; i < oligonucleotidelength; i++) {
            int pick = random.nextInt(4);
            oligo += alphabet[pick];
        }
        return oligo;
    }

    public void randomSampleSolve() {
        double totalcount = 0;
        for (int i = 0; i < trials; i++) {
            Random random = new Random();
            TreeSet<String> set = new TreeSet<String>();
            boolean clash = false;
            while (clash != true) {
                String current = getRandomOligo(random);
                if (set.contains(current)) {
                    clash = true;
                } else {
                    set.add(current);
                }
            }
            // we had a collision, lets add it to the sum.
            totalcount += set.size();
        }
        System.out.println("For " + trials + " randomly generated sequences,");
        System.out.println("the average collision size was: " + (totalcount / trials));
    }

    public static void main(String[] args) {
        BirthdayParadox solver = new BirthdayParadox();
        solver.solve();
        solver.randomSampleSolve();
        return;
    }
}

BirthdayParadox.java Output:

For a depth of 100 the solution is:
19.726105902922214
For 100000 randomly generated sequences,
the average collision size was: 19.70353

Appendix D – Source Code for Selected Assembly Algorithm Classes

Oligonucleotide.java:

package components;

/**
 * Oligonucleotide
 *
 * Describes the organization of a single stranded oligonucleotide.
 * Also provides methods for common tasks such as reversing and complementing.
 */
public class Oligonucleotide implements Comparable<Oligonucleotide> {

    String oligostring;

    public Oligonucleotide(String oligostring) {
        this.oligostring = oligostring.toLowerCase();
    }

    @Override
    public boolean equals(Object object) {
        // null and objects of any other type are never equal to an oligonucleotide
        if (!(object instanceof Oligonucleotide)) {
            return false;
        }
        return ((Oligonucleotide) object).getSequence().equals(this.getSequence());
    }

    public String getSequence() {
        return oligostring;
    }

    @Override
    public String toString() {
        return getSequence();
    }

    public boolean isPalindromic() {
        return oligostring.equalsIgnoreCase(reverseComplement());
    }

    private String reverseComplement() {
        // replace lowercase bases with uppercase complements so that a base
        // which has already been complemented is not replaced a second time
        String temp = oligostring.toLowerCase();
        temp = temp.replace('a', 'T');
        temp = temp.replace('t', 'A');
        temp = temp.replace('g', 'C');
        temp = temp.replace('c', 'G');
        temp = temp.toUpperCase();
        StringBuffer sb = new StringBuffer(temp);
        return sb.reverse().toString();
    }

    public Oligonucleotide getReverseComplement() {
        return new Oligonucleotide(reverseComplement());
    }

    private String reverse() {
        StringBuffer sb = new StringBuffer(oligostring);
        return sb.reverse().toString();
    }

    public Oligonucleotide getReverse() {
        return new Oligonucleotide(reverse());
    }

    public int compareTo(Oligonucleotide o) {
        // just compare the sequences;
        return this.oligostring.compareTo(o.oligostring);
    }

    @Override
    public int hashCode() {
        return oligostring.hashCode();
    }
}

Subunit.java:

package components;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

/**
 * Subunit
 *
 * Describes the organization of many oligonucleotides to form a dsDNA fragment
 * The top and bottom strand are both 5'-> 3' and stored independently.
 * Also provides methods for adding and removing oligos and splitting the subunit
 */
public class Subunit implements Cloneable {

    protected List<Oligonucleotide> topOligos;
    protected List<Oligonucleotide> botOligos; // is stored in 5'->3'
    private int bottomoffset; // 0 would be a blunt end at the 5' left pos.
                              // 4 would be a 4nt overhang. etc

    public Subunit() {
        bottomoffset = 4;
        topOligos = new ArrayList<Oligonucleotide>(50);
        botOligos = new ArrayList<Oligonucleotide>(50);
    }

    /**
     * We need to be very careful when using this as the bottom set of oligos
     * must be in reverse since it is stored in the 5'-3' but the strand is
     * actually 3'-5' on the bottom...
     *
     * @param oligosettop
     * @param oligosetbot
     */
    public Subunit(List<Oligonucleotide> oligosettop, List<Oligonucleotide> oligosetbot) {
        this();
        this.topOligos.addAll(oligosettop);
        this.botOligos.addAll(oligosetbot);
    }

    public void addOligoToFivePrimeTop(Oligonucleotide oligo) {
        topOligos.add(0, oligo);
    }

    public void addOligoToThreePrimeTop(Oligonucleotide oligo) {
        topOligos.add(oligo);
    }

    public void addOligoToFivePrimeBottom(Oligonucleotide oligo) {
        botOligos.add(0, oligo);
    }

    public void addOligoToThreePrimeBottom(Oligonucleotide oligo) {
        botOligos.add(oligo);
    }

    public Oligonucleotide getFivePrimeTopOligo() {
        return topOligos.get(0);
    }

    public Oligonucleotide getFivePrimeBottomOligo() {
        return botOligos.get(0);
    }

    public Oligonucleotide getThreePrimeTopOligo() {
        return topOligos.get(topOligos.size() - 1);
    }

    public Oligonucleotide getThreePrimeBottomOligo() {
        return botOligos.get(botOligos.size() - 1);
    }

    public Oligonucleotide removeFivePrimeTopOligo() {
        return topOligos.remove(0);
    }

    public Oligonucleotide removeFivePrimeBottomOligo() {
        return botOligos.remove(0);
    }

    public Oligonucleotide removeThreePrimeTopOligo() {
        return topOligos.remove(topOligos.size() - 1);
    }

    public Oligonucleotide removeThreePrimeBottomOligo() {
        return botOligos.remove(botOligos.size() - 1);
    }

    public boolean isRepeatFree() {
        HashSet<String>
                chunks1 = new HashSet<String>();
        HashSet<String> chunks2 = new HashSet<String>();
        for (int i = 0; i < topOligos.size(); i++) {
            Oligonucleotide current = topOligos.get(i);
            String sequence = current.getSequence();
            String firsthalf = sequence.substring(0, 4);
            String secondhalf = sequence.substring(4, 8);
            if (chunks1.contains(firsthalf)) {
                return false;
            } else {
                chunks1.add(firsthalf);
            }
            if (chunks2.contains(secondhalf)) {
                return false;
            } else {
                chunks2.add(secondhalf);
            }
        }
        return true;
    }

    public Subunit[] splitAtIndex(int position, boolean top, boolean bottomtailoverhang) {
        if (top) {
            Subunit leftunit = new Subunit();
            for (int i = 0; i < position; i++) {
                // shift them off one at a time until we have gotten to the
                // split index.
                Oligonucleotide topoligo = removeFivePrimeTopOligo();
                leftunit.addOligoToThreePrimeTop(topoligo);
                Oligonucleotide bottomoligo = removeThreePrimeBottomOligo();
                if (bottomoligo != null) {
                    leftunit.addOligoToFivePrimeBottom(bottomoligo);
                }
            }
            // leftunit and subunit are the result of splitting subunit.
            return new Subunit[] { leftunit, this };
        } else {
            // TODO: Fill this one for the reciprocal case!
            return null;
        }
    }

    @Override
    public String toString() {
        return toString(true, true, true);
    }

    public String toString(boolean showtop, boolean showbottom, boolean reversebottom) {
        String topresult = "";
        String botresult = "";
        String result = "";
        int chunks = 4; // number of oligos to fit on one line at a time.
        int indextop = 0;
        int indexbot = 0;
        while (indextop < topOligos.size() || indexbot < botOligos.size()) {
            if (showtop) {
                topresult = "";
                for (int i = indextop; i < topOligos.size() && i < indextop + chunks; i++) {
                    topresult += ("t" + i + ":") + topOligos.get(i).getSequence() + " ";
                }
            }
            if (showbottom) {
                botresult = "";
                for (int i = indexbot; i < botOligos.size() && i < indexbot + chunks; i++) {
                    Oligonucleotide bottomoligocurrent = botOligos.get(botOligos.size() - 1 - i);
                    String seq = "";
                    if (reversebottom) {
                        seq = bottomoligocurrent.getReverse().getSequence();
                    } else {
                        seq = bottomoligocurrent.getSequence();
                    }
                    botresult += ("b" + i + ":") + seq + " ";
                }
            }
            result += topresult + "\n";
            if (showbottom) {
                if (showtop) {
                    result += " ";
                }
                result += botresult + "\n";
            }
            indextop += chunks;
            indexbot += chunks;
        }
        return result;
    }

    public int getBottomOffset() {
        return bottomoffset;
    }

    public int getTotalOligoCount() {
        return (topOligos.size() + botOligos.size());
    }

    public String getTopStrandSequence() {
        String seq = "";
        for (int i = 0; i < topOligos.size(); i++) {
            seq += topOligos.get(i).getSequence();
        }
        return seq;
    }

    public String getBottomStrandSequence() {
        String seq = "";
        for (int i = 0; i < botOligos.size(); i++) {
            seq += botOligos.get(i).getSequence();
        }
        return seq;
    }

    public int getTopStrandLength() {
        return getTopStrandSequence().length();
    }

    public int getBottomStrandLength() {
        return getBottomStrandSequence().length();
    }

    public List<Oligonucleotide> getTopOligos() {
        return topOligos;
    }

    public List<Oligonucleotide> getBottomOligos() {
        return botOligos;
    }

    @Override
    public Subunit clone() {
        // the two-list constructor copies the list contents into fresh lists,
        // so the clone can be modified without affecting this subunit
        return new Subunit(topOligos, botOligos);
    }
}

FivePrimeRecedingAdaptor.java:

package components;

/**
 * FivePrimeRecedingAdaptor
 *
 * The general adaptor used most often to anchor a
 * growing subunit.
 * Physical representations of this should have a biotinylated 5' topstrand.
 * Contains the BbsI restriction site at the 3' end.
 */
public class FivePrimeRecedingAdaptor extends Subunit implements Adaptor {

    private static final String topstrand =
            "GGCAGTTCCGGATCCATCTAGACAGAATTCAGCTGGAAGACTG"; // 5' - 3'
    private static final String bottomstrandwithoutoverhang =
            "CAGTCTTCCAGCTGAATTCTGTCTAGATGGATCCGGAACTGCC"; // 5' - 3'

    /**
     * @param adaptoroverhang Should be 5' - 3', (4 nucleotides in size for octamers)
     */
    public FivePrimeRecedingAdaptor(String adaptoroverhang) {
        String bottomstrand = adaptoroverhang + bottomstrandwithoutoverhang;
        Oligonucleotide top = new Oligonucleotide(topstrand);
        Oligonucleotide bottom = new Oligonucleotide(bottomstrand);
        addOligoToFivePrimeTop(top);
        addOligoToFivePrimeBottom(bottom);
    }

    @Override
    public String toString() {
        return toString(true, true, true);
    }

    @Override
    public String toString(boolean showtop, boolean showbottom, boolean reversebottom) {
        String topresult = "";
        String botresult = "";
        String result = "";
        if (showtop) {
            topresult = "";
            topresult += /* "adaptor: "+ */ topOligos.get(0).getSequence() + " ";
        }
        if (showbottom) {
            botresult = "";
            Oligonucleotide bottomoligo = botOligos.get(0);
            String seq = "";
            if (reversebottom) {
                seq = bottomoligo.getReverse().getSequence();
            } else {
                seq = bottomoligo.getSequence();
            }
            botresult += /* "adaptor: "+ */ seq + " ";
        }
        if (showtop) {
            result += topresult + "\n";
        }
        if (showbottom) {
            result += botresult + "\n";
        }
        return result;
    }
}

AdaptorSubunitRescueComplex.java:

package components;

/**
 * A complete subunit along with the necessary adaptor (anchor) and rescue to make the
 * subunit in the lab.
 */
public class AdaptorSubunitRescueComplex implements Cloneable {

    Adaptor adaptor;
    Subunit subunit;
    Oligonucleotide rescue;

    public AdaptorSubunitRescueComplex(Subunit subunit, Oligonucleotide rescue) {
        this(subunit);
        this.rescue = rescue;
    }

    public AdaptorSubunitRescueComplex(Adaptor adaptor, Subunit subunit, Oligonucleotide rescue) {
        this.adaptor = adaptor;
        this.subunit = subunit;
        this.rescue = rescue;
    }

    public AdaptorSubunitRescueComplex(Subunit subunit) {
        // set the subunit
        // we will determine the other necessities as we go.
        this.subunit = subunit;
        if (subunit.getBottomOffset() == 4) {
            // generate the necessary 5' receding adaptor:
            Oligonucleotide fiveprimetop = subunit.getFivePrimeTopOligo();
            Oligonucleotide topbit = new Oligonucleotide(
                    fiveprimetop.getSequence().substring(0, 4));
            // System.out.println("test0 "+topbit.toString());
            Oligonucleotide reverse = topbit.getReverseComplement();
            // System.out.println("test1 "+reverse.toString());
            // String adaptoroverhang = reverse.getSequence().substring(0,4);
            adaptor = new FivePrimeRecedingAdaptor(reverse.getSequence());
            // generate a rescue oligonucleotide, the final 4 bases don't really matter,
            // and we should try to make them conflict free with the subunit, if possible.
            // also need to determine if the rescue is on the top or the bottom
        }
    }

    public Adaptor getAdaptor() {
        return adaptor;
    }

    public Subunit getSubunit() {
        return subunit;
    }

    public Oligonucleotide getRescue() {
        return rescue;
    }

    public void setRescue(Oligonucleotide rescue) {
        this.rescue = rescue;
    }

    @Override
    public String toString() {
        String result = "AdaptorSubunitRescueComplex:\n";
        result += "adaptor:\n";
        result += adaptor.toString();
        result += "\n";
        result += "subunit:\n";
        // result += subunit.getTopStrandSequence() +"\n";
        // result += subunit.getBottomStrandSequence() +"\n";
        result += subunit.toString();
        result += "\n";
        result += "rescue:\n";
        if (rescue != null) {
            result += rescue.toString();
        } else {
            result += "(rescue not set)";
        }
        result += "\n\n";
        return result;
    }

    @Override
    public AdaptorSubunitRescueComplex clone() {
        // we DONT need to clone these either, they are always the same.
        Adaptor adaptorclone = adaptor;
        Subunit subunitclone = subunit.clone();
        // we DONT need to clone these as long as we aren't changing sequences
        Oligonucleotide rescueclone = rescue;
        return new AdaptorSubunitRescueComplex(adaptorclone, subunitclone, rescueclone);
    }

    public String getOligosLargerThanOctamersString(boolean topstrand) {
        String results = "";
        if (topstrand) {
            // print the top strand only.
            for (int i = 0; i < subunit.getTopOligos().size(); i++) {
                Oligonucleotide current = subunit.getTopOligos().get(i);
                if (current.getSequence().length() > 8) {
                    results += current.getSequence() + "\n";
                }
            }
        } else {
            // it must be bottom strand we want to print
            for (int i = 0; i < subunit.getBottomOligos().size(); i++) {
                Oligonucleotide current = subunit.getBottomOligos().get(i);
                if (current.getSequence().length() > 8) {
                    results += current.getSequence() + "\n";
                }
            }
        }
        return results;
    }
}

Solution.java:

package components;

import java.util.ArrayList;
import java.util.List;

import subunitvalidationstrategies.SubunitValidationStrategy;

/**
 * Stores ALL the AdaptorSubunitRescueComplexes for a given solution.
 * An assembly strategy should make a SolutionSet containing many Solution(s) where
 * each Solution contains the list of subunits necessary to build it.
 */
public class Solution implements Cloneable {

    List<AdaptorSubunitRescueComplex> asrclist;

    public Solution() {
        asrclist = new ArrayList<AdaptorSubunitRescueComplex>();
    }

    public void add(AdaptorSubunitRescueComplex asrc) {
        asrclist.add(asrc);
    }

    public AdaptorSubunitRescueComplex get(int i) {
        return asrclist.get(i);
    }

    public int size() {
        return asrclist.size();
    }

    public AdaptorSubunitRescueComplex remove(int i) {
        return asrclist.remove(i);
    }

    @Override
    public String toString() {
        String result = "";
        for (int i = 0; i < asrclist.size(); i++) {
            result += "=Adaptor-Subunit-Rescue-complex-" + i + "=\n";
            result += asrclist.get(i).toString();
        }
        return result;
    }

    @Override
    public Solution clone() {
        Solution clone = new Solution();
        for (int i = 0; i < asrclist.size(); i++) {
            clone.add(this.get(i).clone());
        }
        return clone;
    }

    /**
     * Prints a less friendly looking string which contains all the
     * requirements in 5'-3' order etc.
     *
     * @return
     */
    public String toRequirementsString() {
        String result = "Solution Requirements:\n";
        result += "adaptors (all 5'-3'):\n";
        // get the top strand for the first one only, they are all the same after that...
        result += asrclist.get(0).getAdaptor().toString(true, false, false);
        for (int i = 0; i < asrclist.size(); i++) {
            result += asrclist.get(i).getAdaptor().toString(false, true, false);
        }
        result += "special/repair oligonucleotide (all 5'-3'):\n";
        result += "special/repair oligonucleotide top strand (5'-3'):\n";
        for (int i = 0; i < asrclist.size(); i++) {
            result += asrclist.get(i).getOligosLargerThanOctamersString(true);
        }
        result += "\nspecial/repair oligonucleotide bottom strand (5'-3'):\n";
        for (int i = 0; i < asrclist.size(); i++) {
            result += asrclist.get(i).getOligosLargerThanOctamersString(false);
        }
        return result;
    }

    public boolean verify(SubunitValidationStrategy svs) {
        for (int i = 0; i < asrclist.size(); i++) {
            AdaptorSubunitRescueComplex asr = asrclist.get(i);
            Subunit subunit = asr.getSubunit().clone();
            if (asr.getRescue() != null) {
                subunit.addOligoToThreePrimeTop(asr.getRescue());
            }
            if (!svs.isSubunitValid(subunit)) {
                System.out.println("Solution Verification Failed!!!");
                System.out.println("Subunit with rescue failed to pass validation!");
                System.out.println(subunit.toString());
                return false;
            }
        }
        System.out.println("Solution Verification passed.");
        return true;
    }
}

SolutionSet.java:

package components;

import java.util.ArrayList;
import java.util.List;

/**
 * Stores ALL the valid solutions for a construction.
 * Assembly strategies should produce one final SolutionSet at the end of the process
 * so that the best or ideal solution can then be selected out of these.
 */
public class SolutionSet {

    List<Solution> solutions;

    public SolutionSet() {
        solutions = new ArrayList<Solution>();
    }

    public void add(Solution solution) {
        solutions.add(solution);
    }

    public Solution get(int i) {
        return solutions.get(i);
    }

    public int size() {
        return solutions.size();
    }

    @Override
    public String toString() {
        String ret = "SolutionSet:\n";
        ret += "total:" + solutions.size() + "\n";
        for (int i = 0; i < solutions.size(); i++) {
            ret += "Solution: " + i + "\n";
            ret += "# subunits: " + solutions.get(i).size() + "\n";
            ret += solutions.get(i).toString();
            ret += "\n";
        }
        return ret;
    }
}

Assembler.java:

package main;

import oligolibrary.OligoLibrary;
import repair.RepairMethod;
import solutionfilterstrategies.MaximumNumberOfSubunitsInSolutionFilter;
import solutionfilterstrategies.MinimumSubunitSizeAndSimilarAdaptorSolutionFilter;
import solutionfilterstrategies.MinimumSubunitSizeSolutionFilter;
import solutionfilterstrategies.SolutionFilterStrategy;
import strategies.*;
import subunitvalidationstrategies.RepeatFreeAndUnderMaximumPoolSize;
import subunitvalidationstrategies.RepeatFreePool;
import subunitvalidationstrategies.SubunitValidationStrategy;
import utilities.CodonOptomizer;
import utilities.SequenceLoader;
import components.*;

/**
 * Assembler
 *
 * Main entry into gene synthesis. Assembler takes a sequence, runs a selected
 * build strategy (using subunit validation criteria), and filters all possible
 * solutions for only those that pass solution criteria.
 *
 * Each solution found is a set of subunits (along with their necessary
 * solid-support adaptor and rescue oligonucleotides).
 *
 * Assembler does NOT check all tiling frames. To identify solutions in all
 * frames, an assembler bootstrap should be used which goes through all 8 frames
 * and generates solutions for each.
 *
 * Assembler contains a sample main() to run a specified sequence with some
 * general strategies.
 */
public class Assembler {

    String sequence; // the sequence we want to construct somehow.
    SubunitValidationStrategy svs; // how are subunits validated? (max size? repeats?)
    SolutionFilterStrategy sfs; // how are the solutions selected
    Strategy assemblystrategy; // how are we going to build this? (inchworm, divideandconquer)
    SolutionSet solutions; // the final set of solutions post assembly.

    /**
     * Assembler takes a sequence, runs a selected build strategy (using subunit
     * validation criteria), and filters all possible solutions for only those
     * that pass solution criteria.
     *
     * @param sequence
     *            The String sequence to be assembled
     * @param svs
     * @param assemblystrategy
     * @param sfs
     */
    public Assembler(String sequence, SubunitValidationStrategy svs,
            Strategy assemblystrategy, SolutionFilterStrategy sfs) {
        this.sequence = sequence;
        this.svs = svs;
        this.assemblystrategy = assemblystrategy;
        this.sfs = sfs;
    }

    /**
     * A default assembler. This should not really be used since most criteria
     * should be specified explicitly.
     *
     * @param sequence
     */
    public Assembler(String sequence) {
        this.sequence = sequence;
        // default settings:
        svs = new RepeatFreeAndUnderMaximumPoolSize(25);
        assemblystrategy = new Inchworm(svs, 1, RepairMethod.NOMAMER_REPAIR);
        sfs = new MinimumSubunitSizeAndSimilarAdaptorSolutionFilter(10);
    }

    /**
     * useStrategy
     *
     * Called once the Assembler has been initialized and runs the given build
     * strategy (ie runs inchworm or divideandconquer)
     */
    public void useStrategy() {
        solutions = assemblystrategy.assemble(sequence);
    }

    /**
     * useSolutionFilterStrategy
     *
     * Called once the Assembler has run the given build strategy and a desired
     * solution set needs to be found from all the potential solutions
     *
     * Also, by default, prints off the solution with the least number of
     * subunits (as would probably be the easiest to make)
     */
    public Solution useSolutionFilterStrategy() {
        // filter down all the solutions to a nice subset using our chosen
        // filter criteria.
        solutions = sfs.filter(solutions);
        // lets find the solution with the least number of subunits.
        if (solutions.size() > 0) {
            int indexofmin = 0;
            int minsubunits = Integer.MAX_VALUE;
            for (int i = 0; i < solutions.size(); i++) {
                if (solutions.get(i).size() < minsubunits) {
                    indexofmin = i;
                    minsubunits = solutions.get(i).size();
                }
            }
            System.out.println("The solution with the least number of subunits has "
                    + minsubunits + " subunits");
            // fetch the best solution now: indexofmin refers to the list as it
            // is before the next filter removes entries and shifts the indices.
            Solution best = solutions.get(indexofmin);
            // now that we have the least number of subunits
            // lets delete all the others and search through these for the best solution:
            SolutionFilterStrategy maxsubunitfilter =
                    new MaximumNumberOfSubunitsInSolutionFilter(minsubunits);
            solutions = maxsubunitfilter.filter(solutions);
            // now we have a list of ONLY those solutions which contain the
            // least number of subunits possible
            for (int i = 0; i < solutions.size(); i++) {
                Solution current = solutions.get(i);
                String sizes = "solution " + i + ": ";
                for (int j = 0; j < current.size(); j++) {
                    int size = current.get(j).getSubunit().getTotalOligoCount();
                    sizes += size + "+ ";
                }
                System.out.println(sizes);
            }
            System.out.println("best solution has " + minsubunits + " subunits.");
            System.out.println("\nSolution:");
            System.out.println(best.toString());
            System.out.println(best.toRequirementsString());
            return best;
        }
        return null;
    }

    public static void main(String[] args) {
        boolean optomize = false; // can we alter all the codons to their best
                                  // possible set?
        CodonOptomizer.getInstance();
        String folderpath = "//Users/dhorspool/Documents/workspace/GeneAssemblerThesisFix/src/";
        String gene = "sequences/" +
                // "egfp_nt_complete.txt";
                // "pBR322_tet_nt_complete.txt";
                "tet_part1.txt";
                // "tet_part2.txt";
                // "sbfp2_nt_complete.txt";
                // "egfp_nt.txt"; // a short test sequence
        // String libraryfile1 = "oligolibrary/egfp_library.csv";
        String libraryfile1 = "oligolibrary/tetc_library.csv";
        String libraryfile2 = "oligolibrary/repair_library.csv";
        OligoLibrary oligolibrary = new OligoLibrary();
        oligolibrary.loadLibrary(folderpath + libraryfile1);
        oligolibrary.loadLibrary(folderpath + libraryfile2);

        // get the sequence and make sure it worked
        SequenceLoader sl = new SequenceLoader((folderpath + gene), optomize);
        // isEmpty() compares contents; == would only compare object references
        if (sl.getNucleotideSequence() == null || sl.getNucleotideSequence().isEmpty()) {
            System.out.println("The file path you provided appears to be invalid.");
            return;
        }

        // Subunit validation strategies:
        // SubunitValidationStrategy svs = new RepeatFreeAndUnderMaximumPoolSize(24);
        SubunitValidationStrategy svs = new RepeatFreePool();
        // build strategy + details:
        Strategy strategy = new Inchworm(svs, 3, RepairMethod.SIXTEENMER_REPAIR);
        // possible filters:
        SolutionFilterStrategy sfs = new MinimumSubunitSizeSolutionFilter(6);
        // SolutionFilterStrategy sfs = new SimilarAdaptorSolutionFilter();
        // SolutionFilterStrategy sfs = new MinimumSubunitSizeAndSimilarAdaptorSolutionFilter(6);

        // now that we have specified ALL the parameters, lets build this gene:
        Assembler assmblr = new Assembler(sl.getNucleotideSequence(), svs, strategy, sfs);
        assmblr.useStrategy();
        Solution best = assmblr.useSolutionFilterStrategy();
        if (best != null) {
            if (!best.verify(svs)) {
                System.out.println("A double check of the solution showed that it failed verification with the rescues!");
            }
            String rearray = oligolibrary.arraySolutionToLibrary(best);
            System.out.println(rearray);
        }
    }
}

InchWorm.java:

package strategies;

import repair.RepairMethod;
import subunitvalidationstrategies.RepeatFreeAndUnderMaximumPoolSize;
import subunitvalidationstrategies.SubunitValidationStrategy;
import components.*;

/**
 * Inchworm Strategy
 *
 * Starting from one big subunit (that is full of repeats) smaller (safe)
 * subunits are removed from the left 5' side repeatedly until a set of subunits
 * is constructed which is safe.
 *
 * Inchworm can either use existing oligonucleotides as rescues (in which case
 * the last oligo in a subunit is the rescue, and that oligo will be repeated in
 * the next subunit) OR an arbitrary oligo can be generated, which
 * allows subunits to be slightly larger (since they can avoid repeats in the
 * rescue)
 *
 * Inchworm always works to make sets where the last oligo is on the bottom
 * strand and the rescue is the last on the top strand:
 *
 *   AAAAAAAA | GCGTTTAT | + rescue
 *       TTTTCGCA | AATAGCAC
 *
 * Inchworm can also decide to either make subunits as large as possible OR have
 * a maximum pool size, which would reduce the potential for ligation error
 *
 * @author danielh
 */
public class Inchworm implements Strategy {

    int problematicincreaselength; // 1 is used for nonamer, decamer, 11mer fixing,
                                   // 8 is used for 16mer, 24mer, ... fixing of bad oligos
    int maxoligos; // maximum of oligos a subunit can hold
    private SolutionSet solutions;
    private SubunitValidationStrategy svs; // exceeds max oligos? or repeats? etc?
    int branchfactor; // to what level of recursion should we try
                      // after each subunit along the way?
    public Inchworm() {
        this(new RepeatFreeAndUnderMaximumPoolSize(25), 1, RepairMethod.NOMAMER_REPAIR);
    }

    public Inchworm(SubunitValidationStrategy svs, int branchfactor, RepairMethod repairmethod) {
        this.svs = svs;
        this.solutions = new SolutionSet();
        this.branchfactor = branchfactor;
        if (repairmethod.equals(RepairMethod.NOMAMER_REPAIR)) {
            problematicincreaselength = 1;
        } else if (repairmethod.equals(RepairMethod.SIXTEENMER_REPAIR)) {
            problematicincreaselength = 8;
        }
    }

    public SolutionSet assemble(String sequence) {
        SolutionSet solutions = new SolutionSet();
        int sequenceindex = 0;
        Solution currentsolution = new Solution();
        recursiveBuild(sequence, sequenceindex, currentsolution, solutions, 0);
        // keep the result in the field as well, so that getSolutions() returns it
        this.solutions = solutions;
        return solutions;
    }

    private void recursiveBuild(String sequence, int sequenceindex,
            Solution currentsolution, SolutionSet solutions, int currentbranchdepth) {
        Subunit currentsubunit = new Subunit();
        boolean subunitbuildinprogressok = true;
        boolean requiredcustomoligos = false; // set to true if somewhere in
                                              // this round we needed a custom oligo
        while (subunitbuildinprogressok) {
            if (sequenceindex + 8 > sequence.length()) {
                // we have reached the end and are done, save the solution
                // and exit this recursive method.
                // if we were in the middle of building a subunit when we hit
                // the end
                // add this subunit to the solution and save the whole thing!
                if (currentsubunit.getTotalOligoCount() != 0) {
                    AdaptorSubunitRescueComplex asrc = new AdaptorSubunitRescueComplex(
                            currentsubunit);
                    currentsolution.add(asrc);
                }
                Solution solutiontoadd = currentsolution.clone();
                setRescueOligos(solutiontoadd);
                solutions.add(solutiontoadd);
                return;
            }
            // get the next pair of octamers (OR potentially longer oligos if
            // they had problems)
            Oligonucleotide[] nextpair = getNextNecessaryOligoPair(sequence, sequenceindex);
            // lets add them to our growing subunit and check if the subunit is
            // still ok.
            // Don't bother adding them if they are shorter than an octamer,
            // because then we have reached the end of the sequence.
            if (nextpair[0].getSequence().length() >= 8) {
                currentsubunit.addOligoToThreePrimeTop(nextpair[0]);
            }
            if (nextpair[1].getSequence().length() >= 8) {
                currentsubunit.addOligoToFivePrimeBottom(nextpair[1]);
            }
            if (svs.isSubunitValid(currentsubunit)) {
                // we are ok, keep going.
                // continue;
                // the step should be 8, but not when we had to use an
                // alternative oligo!
                sequenceindex += nextpair[0].getSequence().length();
                if (nextpair[0].getSequence().length() > 8) {
                    requiredcustomoligos = true;
                    // System.out.println("next pair was larger than 8!");
                    // System.out.println(nextpair[0].toString());
                }
            } else {
                // we have a conflict, need to stop here, and move on to the
                // next subunit.
                // we should also do some recursion to get multiple solutions
                // from this point.
                /*
                 * we need to do some checks for the rescue:
                 * - if the problem was on the bottom then removing one top and
                 *   one bottom is ok because the top used as a rescue wont be
                 *   a problem
                 * - but if the problem was on the top, then removing the two
                 *   wont be good enough as the top one will be reused as the
                 *   rescue and mess up the subunit anyways! so we need to stop
                 *   one pair even further back
                 */
                subunitbuildinprogressok = false;
                currentsubunit.removeFivePrimeBottomOligo();
                if (svs.isSubunitValid(currentsubunit)) {
                    // it was the bottom, we are fine to just remove the pair
                    // and thats it:
                    currentsubunit.removeThreePrimeTopOligo();
                } else {
                    currentsubunit.removeThreePrimeTopOligo();
                    // and step back once more to be safe:
                    // and reset the index as well.
                    currentsubunit.removeFivePrimeBottomOligo();
                    Oligonucleotide temp = currentsubunit.removeThreePrimeTopOligo();
                    sequenceindex = sequenceindex - temp.getSequence().length();
                }
                break;
            }
        }
        // we have ended the subunit just prior to a conflict.
        if (currentsubunit.getTotalOligoCount() == 0) {
            // we have a MAJOR problem, we weren't able to make a subunit which
            // contained even one pair
            // this solution is going to be no good, so don't bother adding it
            // to the set or doing recursive build on it.
            System.out.println("Major catastrophe when trying to add the next oligo pair.");
            System.out.println(currentsolution.toString());
            System.out.println("failure reached!!!\n\n\n");
            return;
        } else {
            // pop off the last two since we need to use one as a rescue for the subunit
            // and we don't want any conflicts to occur.
            AdaptorSubunitRescueComplex asrc = new AdaptorSubunitRescueComplex(
                    currentsubunit);
            currentsolution.add(asrc);
            // lets start the whole process over again on the next subunit.
            recursiveBuild(sequence, sequenceindex, currentsolution.clone(),
                    solutions, currentbranchdepth);
            // return;
            for (int i = 0; i < 5; i++) {
                // only proceed with recursion if we are working on a subunit of size > 6
                // AND we haven't exceeded the branchfactor depth yet.
                if (currentsubunit.getTotalOligoCount() > 6
                        && currentbranchdepth < branchfactor) {
                    // AdaptorSubunitRescueComplex branchasrctoworkon = currentsolution.remove(currentsolution.size() - 1);
                    ///** OLD BRANCH CODE
                    currentsolution.remove(currentsolution.size() - 1);
                    currentsubunit.removeFivePrimeBottomOligo();
                    Oligonucleotide temp = currentsubunit.removeThreePrimeTopOligo();
                    sequenceindex = sequenceindex - temp.getSequence().length();
                    AdaptorSubunitRescueComplex asrctemp = new AdaptorSubunitRescueComplex(
                            currentsubunit);
                    currentsolution.add(asrctemp);
                    // */
                    recursiveBuild(sequence, sequenceindex, currentsolution.clone(),
                            solutions, currentbranchdepth + 1);
                }
            }
        }
    }

    /**
     * returns the next pair of oligonucleotides (top and bottom strand) we need.
     * If there are no difficulties it will just return the next octamer at the
     * sequenceindex to sequenceindex+8 position; however, if this oligo is bad it
     * will try a nonamer or a 16mer depending on settings
     *
     * @param sequence
     *            the complete sequence needing assembly
     * @param sequenceindex
     *            the current position in the sequence we are at.
     * @return
     */
    private Oligonucleotide[] getNextNecessaryOligoPair(String sequence, int sequenceindex) {
        String nexttopoligostr = "";
        String nextbotoligostr = "";
        boolean oligosok = false;
        // jump in eights, ie if octamer is bad, use 16mer, then 24mer...
        int oligoincreasesize = 0;
        while (!oligosok) {
            // get the string for the next top oligo, shorten it if its beyond
            // the sequence length
            int topendindex = sequenceindex + 8 + oligoincreasesize;
            topendindex = Math.min(topendindex, sequence.length());
            nexttopoligostr = sequence.substring(sequenceindex, topendindex);
            // get the string for the next bot oligo, shorten it if its beyond
            // the sequence length
            int botendindex = sequenceindex + 4 + 8 + oligoincreasesize;
            botendindex = Math.min(botendindex, sequence.length());
            nextbotoligostr = sequence.substring(sequenceindex + 4, botendindex);

            if (!isNextOligoPairOk(nexttopoligostr, nextbotoligostr)) {
                // try again with a longer oligo pair
                oligoincreasesize += problematicincreaselength;
                continue;
            }
            // we made it past potential problems in the top and bottom strand,
            // so we are ok to return these oligos
            oligosok = true;
        }
        Oligonucleotide top = new Oligonucleotide(nexttopoligostr);
        Oligonucleotide bot = new Oligonucleotide(nextbotoligostr).getReverseComplement();
        return new Oligonucleotide[] { top, bot };
    }

    /**
     * takes the two oligos made in getNextNecessaryOligoPair and verifies they are ok to add.
     * if not, then getNextNecessaryOligoPair carries on and keeps recalling this method until a pair is
     * made which is "happy".
     * @param oligotop the next top oligo to be produced (ideally an octamer)
     * @param oligobottom the next bottom oligo to be produced (ideally an octamer)
     * @return
     */
    private boolean isNextOligoPairOk(String oligotop, String oligobottom) {
        if (oligotop.length() > 4) {
            String endingoftop = oligotop.substring(oligotop.length() - 4, oligotop.length());
            if (isStringReverseComplementPalindrome(endingoftop)) {
                return false;
            }
        }
        if (oligobottom.length() > 4) {
            String endingofbot = oligobottom.substring(oligobottom.length() - 4, oligobottom.length());
            if (isStringReverseComplementPalindrome(endingofbot)) {
                return false;
            }
        }
        // the endings themselves are happy, but lets also make sure when we put them together
        // we don't run into problems, ie something like:
        // ATTGXXXX
        //     XXXXGTTA
        // if we do, we will have to just make bigger oligos because we don't handle odd shaped subunits yet.
        Oligonucleotide top = new Oligonucleotide(oligotop);
        Oligonucleotide bot = new Oligonucleotide(oligobottom).getReverseComplement();
        Subunit testsubunit = new Subunit();
        testsubunit.addOligoToFivePrimeTop(top);
        testsubunit.addOligoToFivePrimeBottom(bot);
        if (!svs.isSubunitValid(testsubunit)) {
            // System.out.println("top: "+top + " bot:"+bot);
            return false;
        }

        return true;
    }

    private boolean isStringReverseComplementPalindrome(String test) {
        Oligonucleotide testoligo = new Oligonucleotide(test);
        return testoligo.isPalindromic();
    }

    public SolutionSet getSolutions() {
        return solutions;
    }

    public void setRescueOligos(Solution solution) {
        // go through ALL of the subunits with the exception of the last
        // we dont need to worry about a rescue for the last subunit...
        for (int i = 0; i < solution.size() - 1; i++) {
            AdaptorSubunitRescueComplex current = solution.get(i);
            AdaptorSubunitRescueComplex next = solution.get(i + 1);
            Oligonucleotide lastoligoofnextsubunit = next.getSubunit().getFivePrimeTopOligo();
            current.setRescue(lastoligoofnextsubunit);
        }
    }

    public String toString() {
        String result = "";
        for (int i = 0; i < solutions.size(); i++) {
            Solution current = solutions.get(i);
            result += "===== Solution: " + i + " =====\n";
            result += current.toString();
            result += "\n\n";
        }
        return result;
    }
}

Appendix E – Sample of an EGFP Assembly

SequenceLoaded.readFile: path://Users/dhorspool/Documents/workspace/GeneAssemblerThesisFix/src/sequences/egfp_nt_complete.txt
SolutionFilterStrategy:MinimumSubunitSize:
4258 solutions prior to filtering
2886 solutions were deleted because some subunits contained less than 6 oligos
1372 solutions remaining.
The solutions with the least number of subunits has 11 subunits
SolutionFilterStrategy:MaximumNumberOfSubunitsInSolutionFilter:
1372 solutions prior to filtering
1053 solutions were deleted because they contained more than 11 subunits
319 solutions remaining.
best solution has 11 subunits.
Solution:

=Adaptor-Subunit-Rescue-complex-0=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgactacc
subunit:
t0:atggtgag t1:caagggcg t2:aggagctg
b0:actcgttc b1:ccgctcct b2:cgacaagt
rescue:
ttcaccggggtggtgc

=Adaptor-Subunit-Rescue-complex-1=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacaagt
subunit:
t0:ttcaccggggtggtgc t1:ccatcctg t2:gtcgagctggacggcgacgtaaacggccacaa t3:gttcagcg
b0:ggccccaccacgggta b1:ggaccagc b2:tcgacctgccgctgcatttgccggtgttcaag b3:tcgcacag
t4:tgtccggc t5:gagggcga t6:gggcgatg
b4:gccgctcc b5:cgctcccg b6:ctacggtg
rescue:
ccacctac

=Adaptor-Subunit-Rescue-complex-2=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacggtg
subunit:
t0:ccacctac t1:ggcaagctgaccctga t2:agttcatctgcaccac
b0:gatgccgt b1:tcgactgggacttcaa b2:gtagacgtggtggccg
rescue:
cggcaagc

=Adaptor-Subunit-Rescue-complex-3=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacgccg
subunit:
t0:cggcaagc t1:tgcccgtg t2:ccctggcccaccctcg t3:tgaccacc
b0:ttcgacgg b1:gcacggga b2:ccgggtgggagcactg b3:gtgggact
rescue:
ctgaccta

=Adaptor-Subunit-Rescue-complex-4=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacgact
subunit:
t0:ctgaccta t1:cggcgtgc t2:agtgcttc t3:agccgcta
b0:ggatgccg b1:cacgtcac b2:gaagtcgg b3:cgatgggg
rescue:
ccccgacc

=Adaptor-Subunit-Rescue-complex-5=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacgggg
subunit:
t0:ccccgacc t1:acatgaag t2:cagcacga t3:cttcttca
b0:ctggtgta b1:cttcgtcg b2:tgctgaag b3:aagttcag
t4:agtccgcc t5:atgcccga t6:aggctacg t7:tccaggag
b4:gcggtacg b5:ggcttccg b6:atgcaggt b7:cctcgcgt
rescue:
cgcaccat

=Adaptor-Subunit-Rescue-complex-6=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacgcgt
subunit:
t0:cgcaccat t1:cttcttca t2:aggacgac t3:ggcaacta
b0:ggtagaag b1:aagttcct b2:gctgccgt b3:tgatgttc
t4:caagacccgcgccgag t5:gtgaagtt t6:cgagggcg
b4:tgggcgcggctccact b5:tcaagctc b6:ccgctgtg
rescue:
acaccctg

=Adaptor-Subunit-Rescue-complex-7=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgactgtg
subunit:
t0:acaccctg t1:gtgaaccg t2:catcgagc t3:tgaagggc
b0:ggaccact b1:tggcgtag b2:ctcgactt b3:cccgtagc
t4:atcgactt t5:caaggagg t6:acggcaac t7:atcctggg
b4:tgaagttc b5:ctcctgcc b6:gttgtagg b7:accccgtg
t8:gcacaagc t9:tggagtacaactacaa t10:cagccacaacgtctat
b8:ttcgacct b9:catgttgatgttgtcg b10:gtgttgcagatatagt
rescue:
atcatggc

=Adaptor-Subunit-Rescue-complex-8=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgactagt
subunit:
t0:atcatggc t1:cgacaagc t2:agaagaac t3:ggcatcaa
b0:accggctg b1:ttcgtctt b2:cttgccgt b3:agttccac
t4:ggtgaact t5:tcaagatccgccacaa t6:catcgagg t7:acggcagc
b4:ttgaagtt b5:ctaggcggtgttgtag b6:ctcctgcc b7:gtcgcacg
t8:gtgcagctcgccgacc t9:actaccag t10:cagaacac t11:ccccatcg
b8:tcgagcggctggtgat b9:ggtcgtct b10:tgtggggg b11:tagccgct
t12:gcgacggc t13:cccgtgct t14:gctgcccg t15:acaaccac
b12:gccggggc b13:acgacgac b14:gggctgtt b15:ggtgatgg
rescue:
tacctgag

=Adaptor-Subunit-Rescue-complex-9=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacatgg
subunit:
t0:tacctgag t1:cacccagt t2:ccgccctg t3:agcaaaga
b0:actcgtgg b1:gtcaggcg b2:ggactcgt b3:ttctgggg
t4:ccccaacg t5:agaagcgcgatcacat t6:ggtcctgc t7:tggagttc
b4:ttgctctt b5:cgcgctagtgtaccag b6:gacgacct b7:caagcact
t8:gtgaccgc
b8:ggcggcgg
rescue:
cgccggga

=Adaptor-Subunit-Rescue-complex-10=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacgcgg
subunit:
t0:cgccggga t1:tcactctc t2:ggcatgga t3:cgagctgt
b0:ccctagtg b1:agagccgt b2:acctgctc b3:gacatgtt
t4:acaagtaa t5:gacagtcc
b4:cattctgt
rescue:
(rescue not set)

Solution Requirements:
adaptors (all 5'-3'):
ggcagttccggatccatctagacagaattcagctggaagactg
ccatcagtcttccagctgaattctgtctagatggatccggaactgcc
tgaacagtcttccagctgaattctgtctagatggatccggaactgcc
gtggcagtcttccagctgaattctgtctagatggatccggaactgcc
gccgcagtcttccagctgaattctgtctagatggatccggaactgcc
tcagcagtcttccagctgaattctgtctagatggatccggaactgcc
ggggcagtcttccagctgaattctgtctagatggatccggaactgcc
tgcgcagtcttccagctgaattctgtctagatggatccggaactgcc
gtgtcagtcttccagctgaattctgtctagatggatccggaactgcc
tgatcagtcttccagctgaattctgtctagatggatccggaactgcc
ggtacagtcttccagctgaattctgtctagatggatccggaactgcc
ggcgcagtcttccagctgaattctgtctagatggatccggaactgcc
special/repair oligonucleotide (all 5'-3'):
special/repair oligonucleotide top strand (5'-3'):
ttcaccggggtggtgc
gtcgagctggacggcgacgtaaacggccacaa
ggcaagctgaccctga
agttcatctgcaccac
ccctggcccaccctcg
caagacccgcgccgag
tggagtacaactacaa
cagccacaacgtctat
tcaagatccgccacaa
gtgcagctcgccgacc
agaagcgcgatcacat
special/repair oligonucleotide bottom strand (5'-3'):
gaacttgtggccgtttacgtcgccgtccagct
atgggcaccaccccgg
gccggtggtgcagatg
aacttcagggtcagct
gtcacgagggtgggcc
tcacctcggcgcgggt
tgatatagacgttgtg
gctgttgtagttgtac
tagtggtcggcgagct
gatgttgtggcggatc
gaccatgtgatcgcgc
Solution Verification passed.

Appendix F – Sample of a TetC Assembly

SequenceLoaded.readFile: path://Users/dhorspool/Documents/workspace/GeneAssemblerThesisFix/src/sequences/tet_part1.txt
SolutionFilterStrategy:MinimumSubunitSize:
1967 solutions prior to filtering
1806 solutions were deleted because some subunits contained less than 6 oligos
161 solutions remaining.
The solutions with the least number of subunits has 8 subunits
SolutionFilterStrategy:MaximumNumberOfSubunitsInSolutionFilter:
161 solutions prior to filtering
145 solutions were deleted because they contained more than 8 subunits
16 solutions remaining.
best solution has 8 subunits.

Solution:

=Adaptor-Subunit-Rescue-complex-0=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgaccgag
subunit:
t0:gctcatga t1:aatctaac t2:aatgcgct t3:catcgtca
b0:tactttag b1:attgttac b2:gcgagtag b3:cagtagga
t4:tcctcggc t5:accgtcac t6:cctggatg t7:ctgtaggc
b4:gccgtggc b5:agtgggac b6:ctacgaca b7:tccgtatc
t8:ataggctt
b8:cgaaccaa
rescue:
ggttatgc

=Adaptor-Subunit-Rescue-complex-1=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacccaa
subunit:
t0:ggttatgc t1:cggtactgccgggcct t2:cttgcgggatatcgtc t3:cattccga
b0:tacggcca b1:tgacggcccggagaac b2:gccctatagcaggtaa b3:ggctgtcg
t4:cagcatcg
b4:tagcggtc
rescue:
ccagtcac

=Adaptor-Subunit-Rescue-complex-2=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacggtc
subunit:
t0:ccagtcac t1:tatggcgt t2:gctgctagcgctatatgcgttgat
b0:agtgatac b1:cgcacgac b2:gatcgcgatatacgcaactacgtt
rescue:
gcaatttc

=Adaptor-Subunit-Rescue-complex-3=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgaccgtt
subunit:
t0:gcaatttc t1:tatgcgca t2:cccgttct t3:cggagcac
b0:aaagatac b1:gcgtgggc b2:aagagcct b3:cgtgacag
t4:tgtccgac t5:cgctttgg t6:ccgccgcc t7:cagtcctg
b4:gctggcga b5:aaccggcg b6:gcgggtca b7:ggacgagc
t8:ctcgcttc t9:gctacttg t10:gagccact t11:atcgactacgcgatca
b8:gaagcgat b9:gaacctcg b10:gtgatagc b11:tgatgcgctagtaccg
t12:tggcgacc t13:acacccgt t14:cctgtgga t15:tcctctac
b12:ctggtgtg b13:ggcaggac b14:acctagga b15:gatgcggc
t16:gccggacg t17:catcgtggccggcatc t18:accggcgccacaggtgcggttgct t19:ggcgccta
b16:ctgcgtag b17:caccggccgtagtggc b18:cgcggtgtccacgccaacgaccgc b19:ggatatag
t20:tatcgccg
b20:cggctgta
rescue:
acatcacc

=Adaptor-Subunit-Rescue-complex-4=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgactgta
subunit:
t0:acatcacc t1:gatgggga t2:agatcggg t3:ctcgccac
b0:gtggctac b1:cccttcta b2:gcccgagc b3:ggtgaagc
t4:ttcgggctcatgagcg t5:cttgtttc
b4:ccgagtactcgcgaac b5:aaagccgc
rescue:
ggcgtggg

=Adaptor-Subunit-Rescue-complex-5=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacccgc
subunit:
t0:ggcgtggg t1:tatggtgg t2:caggcccc t3:gtggccgggggactgt
b0:acccatac b1:caccgtcc b2:ggggcacc b3:ggccccctgacaaccc
t4:tgggcgcc t5:atctcctt
b4:gcggtaga b5:ggaacgta
rescue:
gcatgcac

=Adaptor-Subunit-Rescue-complex-6=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgaccgta
subunit:
t0:gcatgcac t1:cattcctt t2:gcggcggc t3:ggtgctca
b0:cgtggtaa b1:ggaacgcc b2:gccgccac b3:gagttgcc
t4:acggcctc t5:aacctact t6:actgggct
b4:ggagttgg b5:atgatgac b6:ccgacgaa
rescue:
gcttccta

=Adaptor-Subunit-Rescue-complex-7=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgaccgaa
subunit:
t0:gcttccta t1:atgcagga t2:gtcgcata t3:agggagag
b0:ggattacg b1:tcctcagc b2:gtattccc b3:tctcgcag
t4:cgtcgacc t5:gatgccct t6:tgagagcc t7:ttcaaccc
b4:ctggctac b5:gggaactc b6:tcggaagt b7:tgggtcag
t8:agtcagctccttccggtgggcgcggggcatga t9:ctatcgtc t10:gccgcact t11:tatgactg
b8:tcgaggaaggccacccgcgccccgtactgata b9:gcagcggc b10:gtgaatac b11:tgacagaa
t12:tcttcttt
rescue:
(rescue not set)

Solution Requirements:
adaptors (all 5'-3'):
ggcagttccggatccatctagacagaattcagctggaagactg
gagccagtcttccagctgaattctgtctagatggatccggaactgcc
aacccagtcttccagctgaattctgtctagatggatccggaactgcc
ctggcagtcttccagctgaattctgtctagatggatccggaactgcc
ttgccagtcttccagctgaattctgtctagatggatccggaactgcc
atgtcagtcttccagctgaattctgtctagatggatccggaactgcc
cgcccagtcttccagctgaattctgtctagatggatccggaactgcc
atgccagtcttccagctgaattctgtctagatggatccggaactgcc
aagccagtcttccagctgaattctgtctagatggatccggaactgcc
special/repair oligonucleotide (all 5'-3'):
special/repair oligonucleotide top strand (5'-3'):
cggtactgccgggcct
cttgcgggatatcgtc
gctgctagcgctatatgcgttgat
atcgactacgcgatca
catcgtggccggcatc
accggcgccacaggtgcggttgct
ttcgggctcatgagcg
gtggccgggggactgt
agtcagctccttccggtgggcgcggggcatga
special/repair oligonucleotide bottom strand (5'-3'):
aatggacgatatcccg
caagaggcccggcagt
ttgcatcaacgcatatagcgctag
cgccagcaaccgcacctgtggcgc
cggtgatgccggccac
gccatgatcgcgtagt
caagcgctcatgagcc
cccaacagtcccccgg
atagtcatgccccgcgcccaccggaaggagct
Solution Verification passed.

SequenceLoaded.readFile: path://Users/dhorspool/Documents/workspace/GeneAssemblerThesisFix/src/sequences/tet_part2.txt
SolutionFilterStrategy:MinimumSubunitSizeAndSimilarAdaptorSolutionFilter
588 solutions prior to filtering
SolutionFilterStrategy:MinimumSubunitSize:
588 solutions prior to filtering
532 solutions were deleted because some subunits contained less than 6 oligos
56 solutions remaining.
SolutionFilterStrategy:SimilarAdaptorSolutionFilter:
56 solutions prior to filtering
51 solutions were deleted because some subunits had repeated adaptors
5 solutions remaining.
The solutions with the least number of subunits has 8 subunits
SolutionFilterStrategy:MaximumNumberOfSubunitsInSolutionFilter:
5 solutions prior to filtering
4 solutions were deleted because they contained more than 8 subunits
1 solutions remaining.
solution 0: 10+ 6+ 20+ 6+ 6+ 18+ 20+ 19+
best solution has 8 subunits.

Solution:

=Adaptor-Subunit-Rescue-complex-0=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacagaa
subunit:
t0:tcttcttt t1:atcatgcaactcgtag t2:gacaggtgccggcagc t3:gctctggg
b0:gaaatagt b1:acgttgagcatcctgt b2:ccacggccgtcgcgag b3:acccagta
t4:tcattttc
b4:aaagccgc
rescue:
ggcgagga

=Adaptor-Subunit-Rescue-complex-1=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacccgc
subunit:
t0:ggcgagga t1:ccgctttc t2:gctggagc
b0:tcctggcg b1:aaagcgac b2:ctcgcgct
rescue:
gcgacgatgatcggcctgtcgctt

=Adaptor-Subunit-Rescue-complex-2=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgaccgct
subunit:
t0:gcgacgatgatcggcctgtcgctt t1:gcggtatt t2:cggaatcttgcacgcc t3:ctcgctca
b0:gctactagccggacagcgaacgcc b1:ataagcct b2:tagaacgtgcgggagc b3:gagttcgg
t4:agccttcg t5:tcactggt t6:cccgccac t7:caaacgtt
b4:aagcagtg b5:accagggc b6:ggtggttt b7:gcaaagcc
t8:tcggcgag t9:aagcaggc
b8:gctcttcg b9:tccggtaa
rescue:
cattatcgccggcatggcggccgacgcgctgg

=Adaptor-Subunit-Rescue-complex-3=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacgtaa
subunit:
t0:cattatcgccggcatggcggccgacgcgctgg t1:gctacgtc t2:ttgctggc
b0:tagcggccgtaccgccggctgcgcgacccgat b1:gcagaacg b2:accgcaag
rescue:
gttcgcgacgcgaggc

=Adaptor-Subunit-Rescue-complex-4=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgaccaag
subunit:
t0:gttcgcgacgcgaggc t1:tggatggc t2:cttcccca
b0:cgctgcgctccgacct b1:accggaag b2:gggtaata
rescue:
ttatgatt

=Adaptor-Subunit-Rescue-complex-5=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacaata
subunit:
t0:ttatgatt t1:cttctcgc t2:ttccggcg t3:gcatcgggatgcccgc
b0:ctaagaag b1:agcgaagg b2:ccgccgta b3:gccctacgggcgcaac
t4:gttgcagg t5:ccatgctg t6:tccaggca t7:ggtagatg
b4:gtccggta b5:cgacaggt b6:ccgtccat b7:ctactgct
t8:acgaccat
b8:ggtagtcc
rescue:
cagggaca

=Adaptor-Subunit-Rescue-complex-6=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacgtcc
subunit:
t0:cagggaca t1:gcttcaaggatcgctc t2:gcggctct t3:taccagcc
b0:ctgtcgaa b1:gttcctagcgagcgcc b2:gagaatgg b3:tcggattg
t4:taacttcg t5:atcactgg t6:accgctga t7:tcgtcacg
b4:aagctagt b5:gacctggc b6:gactagca b7:gtgccgct
t8:gcgattta t9:tgccgcct
b8:aaatacgg b9:cggagccg
rescue:
cggcgagc

=Adaptor-Subunit-Rescue-complex-7=
AdaporSubunitRescueComplex:
adaptor:
ggcagttccggatccatctagacagaattcagctggaagactg
ccgtcaaggcctaggtagatctgtcttaagtcgaccttctgacgccg
subunit:
t0:cggcgagc t1:acatggaa t2:cgggttggcatggatt t3:gtaggcgccgccctat
b0:ctcgtgta b1:ccttgccc b2:aaccgtacctaacatc b3:cgcggcgggatatgga
t4:accttgtc t5:tgcctccccgcgttgc t6:gtcgcggt t7:gcatggagccgggcca
b4:acagacgg b5:aggggcgcaacgcagc b6:gccacgta b7:cctcggcccggtggag
t8:cctcgacc t9:tgacatgt
b8:ctggactg
rescue:
(rescue not set)

Solution Requirements:
adaptors (all 5'-3'):
ggcagttccggatccatctagacagaattcagctggaagactg
aagacagtcttccagctgaattctgtctagatggatccggaactgcc
cgcccagtcttccagctgaattctgtctagatggatccggaactgcc
tcgccagtcttccagctgaattctgtctagatggatccggaactgcc
aatgcagtcttccagctgaattctgtctagatggatccggaactgcc
gaaccagtcttccagctgaattctgtctagatggatccggaactgcc
ataacagtcttccagctgaattctgtctagatggatccggaactgcc
cctgcagtcttccagctgaattctgtctagatggatccggaactgcc
gccgcagtcttccagctgaattctgtctagatggatccggaactgcc
special/repair oligonucleotide (all 5'-3'):
special/repair oligonucleotide top strand (5'-3'):
atcatgcaactcgtag
gacaggtgccggcagc
gcgacgatgatcggcctgtcgctt
cggaatcttgcacgcc
cattatcgccggcatggcggccgacgcgctgg
gttcgcgacgcgaggc
gcatcgggatgcccgc
gcttcaaggatcgctc
cgggttggcatggatt
gtaggcgccgccctat
tgcctccccgcgttgc
gcatggagccgggcca
special/repair oligonucleotide bottom strand (5'-3'):
gagcgctgccggcacc
tgtcctacgagttgca
cgagggcgtgcaagat
ccgcaagcgacaggccgatcatcg
tagcccagcgcgtcggccgccatgccggcgat
tccagcctcgcgtcgc
caacgcgggcatcccg
ccgcgagcgatccttg
gaggtggcccggctcc
cgacgcaacgcgggga
aggtatagggcggcgc
ctacaatccatgccaa
Solution Verification passed.
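
The staggered tiling performed by getNextNecessaryOligoPair, and the palindrome test applied to the last four bases of each oligo, can be illustrated with a minimal stand-alone sketch. It is not the thesis code: the class OctamerTilingSketch and its methods are hypothetical, plain strings stand in for the Oligonucleotide and Subunit classes, and the subunit validation and oligo-lengthening steps are omitted. The input fragment is rebuilt from t0 to t2 of complex 0 in the EGFP sample, plus the first four bases of the following t0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the thesis code base.
final class OctamerTilingSketch {

    // Complement of a single lower-case base.
    static char complement(char base) {
        switch (base) {
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default: throw new IllegalArgumentException("unexpected base: " + base);
        }
    }

    // Reverse complement, written 5'-3'.
    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            sb.append(complement(seq.charAt(i)));
        }
        return sb.toString();
    }

    // True when the sequence equals its own reverse complement (e.g. gatc).
    static boolean isReverseComplementPalindrome(String seq) {
        return seq.equals(reverseComplement(seq));
    }

    // Tiles the sequence into {top, bottom} octamer pairs, both 5'-3'.
    // The bottom oligo covers the window shifted by 4 nt, so each pair
    // leaves a 4 nt overhang. Only complete pairs are returned.
    static List<String[]> tile(String sequence) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i + 12 <= sequence.length(); i += 8) {
            String top = sequence.substring(i, i + 8);
            String bottom = reverseComplement(sequence.substring(i + 4, i + 12));
            pairs.add(new String[] { top, bottom });
        }
        return pairs;
    }

    public static void main(String[] args) {
        String fragment = "atggtgagcaagggcgaggagctgttca";
        for (String[] pair : tile(fragment)) {
            System.out.println("top: " + pair[0] + "  bottom (5'-3'): " + pair[1]);
        }
    }
}
```

Reading each bottom oligo produced by this sketch backwards gives the b0 to b2 strings listed for complex 0 of the EGFP sample, which suggests that the sample output prints bottom-strand oligos 3'-5'. That reading is an inference from the output, not something the listing states.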

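The filtering steps reported in the sample logs (discard solutions with a subunit of fewer than 6 oligos, then keep only the solutions with the fewest subunits) can be sketched as follows. The filter classes themselves are not reproduced in this section, so everything here is an assumption made for illustration: the class SolutionFilterSketch is hypothetical, and a solution is reduced to an array of per-subunit oligo counts, in the style of the "solution 0: 10+ 6+ 20+ ..." log line. The repeated-adaptor filter is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the filter strategies named in the logs.
final class SolutionFilterSketch {

    // Stage 1: drop every solution that has a subunit with fewer than minOligos oligos.
    static List<int[]> dropSmallSubunits(List<int[]> solutions, int minOligos) {
        List<int[]> kept = new ArrayList<>();
        for (int[] subunitSizes : solutions) {
            boolean ok = true;
            for (int size : subunitSizes) {
                if (size < minOligos) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                kept.add(subunitSizes);
            }
        }
        return kept;
    }

    // Stage 2: keep only the solutions with the smallest number of subunits.
    static List<int[]> keepFewestSubunits(List<int[]> solutions) {
        int fewest = Integer.MAX_VALUE;
        for (int[] s : solutions) {
            fewest = Math.min(fewest, s.length);
        }
        List<int[]> kept = new ArrayList<>();
        for (int[] s : solutions) {
            if (s.length == fewest) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<int[]> solutions = new ArrayList<>();
        // Subunit sizes taken from the tet_part2 log line.
        solutions.add(new int[] { 10, 6, 20, 6, 6, 18, 20, 19 });
        // Invented examples: one with a 4-oligo subunit, one with 9 subunits.
        solutions.add(new int[] { 10, 4, 22, 6, 6, 18, 20, 19 });
        solutions.add(new int[] { 10, 6, 12, 8, 6, 6, 18, 20, 19 });

        List<int[]> best = keepFewestSubunits(dropSmallSubunits(solutions, 6));
        System.out.println(best.size() + " solutions remaining.");
    }
}
```

Running the size filter first matters in this sketch: the minimum subunit count is computed only over solutions that already satisfy the size threshold, which matches the order in which the logs report the two steps.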