UBC Theses and Dissertations

High-resolution mutation detection in Caenorhabditis elegans mutants and natural isolates using array… Maydan, Jason Stephen 2009


HIGH-RESOLUTION MUTATION DETECTION IN CAENORHABDITIS ELEGANS MUTANTS AND NATURAL ISOLATES USING ARRAY COMPARATIVE GENOMIC HYBRIDIZATION

by

JASON STEPHEN MAYDAN

M.Sc., The University of British Columbia, 2000
B.Ed., The University of Windsor, 1996
B.Sc., The University of Western Ontario, 1994

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Genetics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

March 2009

© Jason Stephen Maydan, 2009

Abstract

An essential requirement of genetic research is the ability to identify mutations. Forward genetic screens begin by selecting for a phenotype and proceed to search for the causative mutation. Reverse genetics experiments first identify the mutation and then seek to derive the mutant phenotype, if any. Both approaches depend on efficient means of detecting mutations. This thesis describes the development of methods to facilitate the detection of mutations in the model organism, Caenorhabditis elegans, using array Comparative Genomic Hybridization (aCGH). Exon-centric oligonucleotide microarrays targeting specific chromosomes and the whole genome were designed and used to detect both large multi-gene and small single-gene deletions. Both homozygous and heterozygous deletions were identified using this technique. I showed that even single nucleotide transitions and transversions are detectable when using microarrays with sufficient probe densities, which are achievable with target regions of two Mbp or less. I also used aCGH to detect extensive natural gene content variation between the N2 Bristol strain and twelve wild C. elegans isolates. Most of the DNA copy number alterations in these strains are deletions relative to Bristol. Over 5% of the genes present in the Bristol strain are absent in at least one of the natural isolates that were examined.
This represents a significant increase in the number of genes with known null alleles. These deletions were then used to infer relationships among the natural isolates, which proved to be complex. The methods described in this thesis will greatly assist in the identification of mutations in C. elegans and are also applicable to other organisms with sequenced reference genomes.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Co-Authorship Statement
1. Introduction
  1.1. Thesis overview
  1.2. The nematode Caenorhabditis elegans as a model organism
  1.3. Current methods of generating and discovering null alleles in C. elegans
  1.4. Array Comparative Genomic Hybridization
  1.5. Development of an aCGH platform for deletion discovery in C. elegans
  1.6. Detecting single nucleotide mutations in C. elegans using aCGH
  1.7. Copy number variation in natural isolates of C. elegans
  1.8. Thesis objectives
  1.9. References
2. Efficient high-resolution deletion discovery in Caenorhabditis elegans by array Comparative Genomic Hybridization
  2.1. Introduction
  2.2. Results
    2.2.1. Oligonucleotide probe quality and detection of homozygous 50-kb and 1-kb deletions
    2.2.2. Detection of single-copy number differences between hermaphrodite and male X chromosomes and in a balanced chromosome II deficiency
    2.2.3. The mIn1 balancer chromosome
    2.2.4. Novel balanced lethal deletions on chromosome II
    2.2.5. Whole-genome array CGH: Comparing N2 Bristol to Hawaiian and Madeiran wild isolates
  2.3. Discussion
    2.3.1. The utility of aCGH in screening for novel deletions
    2.3.2. Natural gene content variation in wild populations
  2.4. Methods
    2.4.1. Probe selection, microarray design, and microarray manufacture
    2.4.2. Nematode culture, harvest, and DNA preparation
    2.4.3. DNA fragmentation and labeling
    2.4.4. Sample hybridization and imaging
    2.4.5. Data analysis
  2.5. References
3. De novo identification of single nucleotide mutations in Caenorhabditis elegans using array Comparative Genomic Hybridization
  3.1. Introduction
  3.2. Results
    3.2.1. Novel single-nucleotide mutations detected utilizing an exon-centric chromosome II microarray
    3.2.2. Single-nucleotide mutations detected in 13 strains with previously mapped mutations
    3.2.3. The sensitivity of single nucleotide polymorphism detection using aCGH
  3.3. Discussion
    3.3.1. Limitations to SNP discovery using aCGH
    3.3.2. Suggestions to improve the sensitivity and specificity of SNP detection by aCGH
    3.3.3. Online resources for SNP detection using aCGH
    3.3.4. SNP detection by aCGH as an alternative to high-throughput sequencing
  3.4. Methods
    3.4.1. Mutagenesis
    3.4.2. Nematode culturing and DNA preparation
    3.4.3. Probe selection, array design and aCGH
    3.4.4. Data analysis and mutation detection
  3.5. References
4. Copy number variation in the Caenorhabditis elegans genome reveals complex relationships among natural isolates
  4.1. Introduction
  4.2. Results
    4.2.1. aCGH reveals a bias favoring coding sequence deletions over amplifications in C. elegans
    4.2.2. Extensive copy number variation in the C. elegans genome allows even very closely related strains to be distinguished
    4.2.3. The distribution of indels in the genome and the overrepresentation of indels in particular gene families
    4.2.4. Relatedness inferences based on deletions shared by multiple natural isolates
  4.3. Discussion
    4.3.1. Bias favoring deletions targeting gene families involved in environmental sensation and innate immunity
    4.3.2. New insights into complex strain relationships resulting from recombination and outcrossing in the C. elegans lineage
    4.3.3. Very common indels, mutation hotspots, and the possibility of extreme sequence divergence masquerading as deletions
  4.4. Methods
    4.4.1. Strain selection, nematode culturing and DNA preparation
    4.4.2. aCGH
    4.4.3. Indel identification
    4.4.4. Chi-square tests, t-tests and ANOVAs
    4.4.5. Affected genes
    4.4.6. Strain relationships
    4.4.7. Linkage disequilibrium
  4.5. References
5. Conclusions
  5.1. Thesis summary
  5.2. The significance of this work and its potential applications
  5.3. Strategies to reduce the cost of detecting novel induced deletions using aCGH
  5.4. Single nucleotide mutation detection
  5.5. SNP-CGH Mapping
  5.6. Future directions, deep sequencing and site-specific gene conversion
  5.7. References
Appendix 1. Genes that are completely deleted from the Hawaiian strain (CB4856) genome
Appendix 2. Genes that are completely deleted from the Madeiran strain (JU258) genome
Appendix 3. The segmentation algorithm

List of Tables

Table 2.1. Gene family members deleted in natural isolates from Hawaii (CB4856) and Madeira (JU258).
Table 4.1. Indels detected in twelve natural isolates of C. elegans.
Table 4.2. Number of deletions shared by all strain pairs.
Table 4.3. Genes affected by copy number variants in C. elegans.

List of Figures

Figure 1.1. Array Comparative Genomic Hybridization.
Figure 1.2. Detection of a single nucleotide mutation by aCGH utilizing a microarray of highly overlapping oligonucleotide probes.
Figure 2.1. Detection of a 50-kb homozygous viable deletion in gkDf2.
Figure 2.2. Detection of a 1047-bp homozygous viable deletion in ceh-39 (gk329).
Figure 2.3. Comparison of the normalized average fluorescence ratios (XO male / XX hermaphrodite) for all probe pairs to chromosomes II and X.
Figure 2.4. Detection of the 1202-bp deletion in dab-1 (gk291) in a wash-sampled balanced heterozygous population.
Figure 2.5. Deletions detected in a screen for homozygous lethal mutations in six wash-sampled balanced heterozygous populations.
Figure 2.6. Whole-genome aCGH comparing Hawaiian (CB4856) and Bristol N2 (VC196) hermaphrodites.
Figure 2.7. Whole-genome aCGH comparing Madeiran (JU258) and Bristol N2 (VC196) hermaphrodites.
Figure 2.8. A homozygous viable deletion identified on chromosome V in the Hawaiian strain (CB4856).
Figure 3.1. Novel detection of an A→T transversion in syd-1.
Figure 3.2. Estimation of the sensitivity and specificity of the current SNP detection technique.
Figure 4.1. Indels on the left arm of chromosome II in twelve natural isolates of C. elegans.
Figure 4.2. Indels in the genomes of twelve natural isolates of C. elegans.
Figure 4.3. The number of deletions detected in each of 12 natural isolates of C. elegans.
Figure 4.4. Unrooted consensus tree for twelve natural isolates of C. elegans.

Acknowledgements

I would like to thank my graduate supervisor Dr. Donald Moerman for the opportunity to work in the C. elegans Gene Knockout Facility and for his generosity, support, guidance, encouragement and friendship during my studies.
He provided me with opportunities to collaborate with many excellent researchers and attend numerous conferences to present my work. I would also like to thank Dr. Stephane Flibotte for all of his help, mentoring, guidance and support, as well as my other thesis committee members Drs. Sally Otto and Don Riddle for their helpful suggestions, guidance and advice. I enjoyed the opportunity to collaborate with Dr. James Thomas and appreciate his help and guidance. I am grateful to Genome Canada, Genome British Columbia, the Michael Smith Research Foundation, the Canadian Institute of Health Research, and the Natural Sciences and Engineering Research Council of Canada for financially supporting my work. I would like to thank Mark Edgley for all of his mentoring, guidance, advice, assistance and friendship during my PhD studies as well as my time spent as a technician in the laboratory. I owe a great deal of thanks to past and present members of the C. elegans Gene Knockout Facility at UBC for all of their assistance and support over the years, including Joanne Lau, Jaryn Perkins, Bin Shen, Christine Lee, Owen Dadivas, Allison Hay, Angela Fisher, Candice Navaroli, Nadereh Rezania, Lucy Liu, Sarah Neil, Ola Rogulu, Iasha Chaudry, Adam Lorch, Jon Taylor, Rick Zapf, Carolina Chanis, Christine Kwitkowski, and many, many others. I am also grateful for the helpful suggestions of fellow lab members Ryan Viveiros, Adam Warner, Mariana Veiga, and Drs. Teresa Rogalski, Barbara Meissner and Aruna Somasiri. It has been a pleasure to work with so many outstanding people. Finally, I would especially like to thank my parents Steve, Gail and Andrew K., and my brothers Ryan and Andrew S. for their love and support throughout my life.

Co-Authorship Statement

Together with my supervisor, Donald Moerman, I was responsible for the design of the research program described in this thesis. I was primarily responsible for the research, data analyses and manuscript preparation.
Portions of this thesis are part of multi-author publications. Co-authors of these publications contributed analyses, text, tables, figures, edits, advice, funding and supervision. O. Rogula assisted in the preparation of Figure 1.1. Specific contributions to Chapters 2, 3 and 4 are listed below.  I was the primary author of Chapter 2 and was responsible for all study design, analysis, text, figures and tables except where indicated below. S. Flibotte helped to design the study, wrote and edited portions of text, performed analyses, assisted with Fig. 2.3, selected probes for all microarrays, wrote software to calculate and normalize log2 ratios and perform segmentation of the CGH data, and provided editorial suggestions and advice. M. Edgley helped to design the study, wrote portions of text, assisted with nematode culturing and mutagenesis, and provided editorial suggestions and advice. J. Lau assisted with mutagenesis and performed the screen for lethal mutations (Section 2.2.4). R. Selzer, T. Richmond and N. Pofahl performed the aCGH work at NimbleGen. J. Thomas performed analyses, wrote portions of text and assisted with Tables 2.1, 2.2 and 2.3. D. Moerman helped to design the study, wrote portions of text, provided editorial suggestions and advice, supervised and funded the project.  I was the primary author of Chapter 3 and was responsible for all study design, experiments, analysis, text and figures except where noted below. H.M. Okada created the software described in section 3.3.3. S. Flibotte helped design the study, provided valuable advice, performed analyses including section 3.2.3, wrote and edited portions of text, selected probes for all microarrays, and contributed Figure 3.2. M.L. Edgley contributed text, editorial suggestions, assistance with nematode culturing and performed mutagenesis. D.G. 
Moerman helped design the study, contributed portions of text and editorial suggestions, performed analyses, provided valuable advice, supervised and funded the project.

I was the primary author of Chapter 4 and was responsible for all study design, experiments, analysis, text, figures and tables except where indicated below. A. Lorch assisted in the preparation of Table 4.3. M.L. Edgley provided advice and helped to culture the RW7000 strain. S. Flibotte provided advice and designed the whole genome microarray and the segmentation algorithm. D.G. Moerman provided advice, editorial suggestions, supervision and funding for the project.

1. Introduction

1.1. Thesis overview

This thesis describes the development of array Comparative Genomic Hybridization (aCGH) as a method of mutation detection in Caenorhabditis elegans. The ability to reliably detect mutations is an essential requirement of genetic research. The methods described in this thesis are capable of detecting mutations of any size, from deletions several hundred kilobases in length to single nucleotide alterations. These methods have been applied to detect both novel induced mutations and natural gene content variation in wild isolates of C. elegans. The methods developed in this thesis should greatly facilitate the detection of mutations in C. elegans.

1.2. The nematode Caenorhabditis elegans as a model organism

The nematode C. elegans has become an extraordinarily popular and useful model organism in many fields of study, including development, the nervous system, behavior, aging and evolution (Riddle et al. 1997). C. elegans is a convenient and powerful research tool for a number of reasons, including its short 3.5-day reproductive lifecycle, the ease with which it is cultured in the lab, and the fact that mutants can be preserved by freezing (Brenner 1974), obviating the need for laborious strain maintenance.

The complete C.
elegans genome sequence was published in 1998 and contains approximately 20,000 genes, roughly 40% of which have human homologues while 34% appear to be nematode-specific (C. elegans Sequencing Consortium 1998). Remarkably, the entire cell lineage from zygote to 959-cell hermaphroditic adult has been described (Sulston and Horvitz 1977; Sulston et al. 1983), along with the complete anatomical structure of the 302-cell nervous system to the level of individual processes and synaptic connections (White et al. 1986). A number of powerful tools and resources are available to C. elegans researchers, including a library of over 12,500 open reading frame clones (Lamesch et al. 2004), a nearly complete RNAi library of clones allowing the selective knockdown of 85% of the genes in the genome (Kamath et al. 2003), and a growing stock of mutants currently comprising approximately 7000 deletion alleles targeting over 5500 genes (Moerman and Barstead 2008).

Fully exploiting C. elegans as a model system requires mutations in all of its genes. Since 1998, the C. elegans Gene Knockout Consortium (http://celeganskoconsortium.omrf.org/) has been working towards the goal of generating single gene knockouts for all C. elegans genes. The Consortium includes the Barstead laboratory at the Oklahoma Medical Research Foundation in the USA, the Mitani laboratory at Tokyo Women’s University in Japan, and the Moerman laboratory at the University of British Columbia in Vancouver, Canada.

1.3. Current methods of generating and discovering null alleles in C. elegans

A nearly complete set of deletions for all Saccharomyces cerevisiae genes was achieved in 2002 through the use of homologous recombination and gene disruption (Giaever et al. 2002). In the absence of a genome-scale method of site-directed mutagenesis in C. elegans, a number of alternative strategies for generating null alleles have been used.
These methods have recently been reviewed (Moerman and Barstead 2008) and are briefly summarized here. Gene disruption methods utilizing transposon insertion and excision with either the Tc1 transposon (Zwaal et al. 1993) or the Drosophila transposable element Mos1 (Granger et al. 2004) have been developed in C. elegans, and a promising gene conversion method known as MosTIC has recently been described (Robert and Bessereau 2007). TILLING has recently been used to obtain single nucleotide mutations in C. elegans (Gilchrist et al. 2006), some of which are nonsense mutations, but has not been applied to a large-scale effort of generating null mutations. The C. elegans Gene Knockout Consortium currently uses a method employing random mutagenesis with either ethyl methanesulfonate (EMS) or tri-methylpsoralen and ultraviolet irradiation (TMP/UV) followed by targeted deletion detection using PCR and gel electrophoresis (Edgley et al. 2002; Barstead and Moerman 2006), and this method remains the primary means of generating null alleles in C. elegans (Moerman and Barstead 2008).

1.4. Array Comparative Genomic Hybridization

An alternative to PCR-based detection of deletions is Comparative Genomic Hybridization (CGH) (Kallioniemi et al. 1992; Mantripragada et al. 2004), which allows the detection of copy number differences between two DNA samples. DNA samples are differentially labeled and cohybridized to a microarray consisting of DNA probes to target sequences in the genome, and the ratio of fluorescent intensities measured at each probe reveals copy number differences existing between the two genomes (Figure 1.1). Early aCGH experiments primarily used microarrays of bacterial artificial chromosomes (BACs), cDNA clones or PCR products (Solinas-Toldo et al. 1997; Pinkel et al. 1998; Mantripragada et al. 2004).

More recently, oligonucleotide microarrays have become more popular because they allow higher resolution mutation discovery (Carvalho et al. 2004; Gresham et al.
2008). C. elegans is an attractive target for aCGH studies because of its relatively small 100 Mb genome. A less complex pool of labeled DNA fragments reduces non-specific hybridization to the probes on the microarray, increasing the signal-to-noise ratio relative to an equivalent human experiment (Flibotte and Moerman 2008) and facilitating the detection of smaller mutations.

1.5. Development of an aCGH platform for deletion discovery in C. elegans

Chapter 2 describes the development of a reliable and efficient aCGH platform to assist in the detection of deletions in C. elegans. This platform presents a number of advantages over the PCR-based method of deletion detection. Firstly, the amount of time and labour required to isolate each deletion is reduced because aCGH begins with the mutant animal already in hand, whereas the PCR-based method detects deletions among a large population of worms, subsequently requiring a lengthy process of “sibling selection” to isolate a single mutant animal (Barstead and Moerman 2006). Secondly, while PCR is limited to detecting deletions smaller than the amplicon size, aCGH has no constraint on the maximum detectable deletion size. Thirdly, aCGH can identify additional mutations elsewhere in the mutant genome that are not detected by the PCR-based method. These additional mutations could potentially confound the characterization of the mutant phenotype if not properly purged from the mutant genome by outcrossing. Of course, multiple mutations that are found in a single animal can be individually isolated through backcrosses with N2 and subsequently studied independently.

Chapter 2 also presents the first detailed description of natural copy number variation in coding sequences in the C. elegans genome. This work focused on two highly divergent wild isolates of C. elegans (CB4856 from Hawaii, USA and JU258 from Madeira, Portugal) and was later expanded upon significantly in Chapter 4.
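
The ratio-based detection described in Section 1.4 can be illustrated with a minimal sketch. It assumes per-probe fluorescence intensities for a mutant and a reference sample are already available as lists in genome order. The function names, the log2 threshold of -1.0, the minimum run of three probes and the intensity values are all hypothetical choices made for illustration; this is not the segmentation algorithm of Appendix 3.

```python
import math


def log2_ratios(sample, reference):
    """Per-probe log2(sample / reference) fluorescence ratios."""
    return [math.log2(s / r) for s, r in zip(sample, reference)]


def call_deletions(ratios, threshold=-1.0, min_probes=3):
    """Return (start, end) probe-index pairs, end exclusive, for runs of at
    least min_probes consecutive probes whose log2 ratio is below threshold.
    Both parameters are illustrative assumptions, not values from the thesis."""
    calls = []
    run_start = None
    for i, ratio in enumerate(ratios):
        if ratio < threshold:
            if run_start is None:
                run_start = i  # a candidate deleted run begins here
        else:
            if run_start is not None and i - run_start >= min_probes:
                calls.append((run_start, i))
            run_start = None
    # Close a run that extends to the last probe.
    if run_start is not None and len(ratios) - run_start >= min_probes:
        calls.append((run_start, len(ratios)))
    return calls


# Invented intensities: probes 3 to 6 show a strongly reduced mutant signal.
mutant = [1000, 980, 1020, 120, 100, 110, 130, 990, 1010]
reference = [1000] * 9
ratios = log2_ratios(mutant, reference)
deletions = call_deletions(ratios)
print(deletions)
```

Requiring several consecutive low-ratio probes, rather than a single one, is a simple way to reduce the chance that one noisy probe is mistaken for a deletion.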
I also coauthored a paper not discussed in this thesis that describes the application of the aCGH platform to characterize several genetic deficiency and duplication strains, which enabled the subsequent positional cloning and identification of several previously unidentified mutations on chromosome III (Jones et al. 2007).

1.6. Detecting single nucleotide mutations in C. elegans using aCGH

aCGH can also be used to detect underlying mutations in individuals with previously mapped phenotypes. Mapping and positional cloning efforts can be extremely laborious and time-consuming, often culminating in a candidate region containing hundreds of genes. Genomic intervals of this size permit highly sensitive aCGH experiments. As the length of a candidate region decreases, the sensitivity of aCGH experiments increases because of the increased probe density in the region of interest, allowing smaller and smaller deletions to be detected. Even single nucleotide mutations can be detected in aCGH experiments if the probe density is high enough (Gresham et al. 2006; Gresham et al. 2008), as shown in Figure 1.2. Chapter 3 describes the first use of aCGH to detect single nucleotide mutations in C. elegans, demonstrating that aCGH is a viable means of detecting null alleles resulting not only from deletions but also nonsense, frameshift or splice-site mutations. Of course, hypomorphic alleles resulting from missense mutations are also detectable. This method should also be useful for detecting single nucleotide mutations in other organisms with sequenced reference genomes such as Drosophila melanogaster. EMS is the most commonly used mutagen for C. elegans and although it is used by the Knockout Consortium to generate deletions, it primarily creates single nucleotide mutations (Anderson 1995; Cuppen et al. 2007). Many mutants generated by EMS mutagenesis exist in the C.
elegans research community, and many of the mutations in these strains have already been mapped to candidate regions small enough to permit single nucleotide polymorphism (SNP) detection using aCGH as described in Chapter 3.

As mentioned, achieving sufficient probe density to detect single nucleotide mutations using aCGH relies on previously mapping the mutation to a small enough candidate region. I coauthored a paper in which we presented a greatly improved method of mapping mutations to small genomic intervals (Flibotte et al. 2009), based on our ability to detect single nucleotide variation in C. elegans using aCGH. This method, called SNP-CGH mapping, enables the mapping of a mutation to within 200 kb after just a single genetic cross between the mutant and the highly polymorphic Hawaiian strain (see Section 5.5 for details). This rapid and simple method of mapping mutations with high resolution should be enormously useful in forward genetic screens as well as in enhancer and suppressor screens. The SNP-CGH procedure maps C. elegans mutations with sufficient precision to permit the use of the method described in Chapter 3 to identify the mutation precisely.

1.7. Copy number variation in natural isolates of C. elegans

Copy number variation is an important component of genetic diversity in humans (Sebat et al. 2004; Redon et al. 2006), mice (She et al. 2008) and flies (Emerson et al. 2008), and it factors into human disease susceptibility and prospects for personalized medicine (Sebat et al. 2004; Conrad and Hurles 2007; McCarroll and Altshuler 2007; Buchanan and Scherer 2008). Prior to my work, the extent of copy number variation in C. elegans was unknown. In Chapter 4, I describe copy number variation in the genomes of twelve natural isolates of C. elegans.
Genes that are present in the canonical N2 reference genome but absent in these natural isolates are less likely to serve critical functions and can be deprioritized in the Knockout Consortium’s process of targeted deletion discovery using PCR. Researchers interested in deletions targeting these genes could isolate them from the rest of the mutations in the genetic background of natural isolates by serial backcrosses with the N2 strain. The indels that were detected also provided a large number of genetic markers throughout the genome and made it possible to characterize more thoroughly the complicated relationships among the strains (Denver et al. 2003; Haber et al. 2005).

1.8. Thesis objectives

The primary objective of this thesis was to develop an aCGH platform capable of detecting null mutations in C. elegans. The C. elegans Gene Knockout Consortium is mainly interested in discovering single gene deletions, so the platform needed to be efficient for this purpose. The capabilities of this platform were further extended to enable the detection of single nucleotide mutations, allowing the identification of other types of null alleles as well as missense or even silent mutations. I also wanted to use aCGH to investigate the gene content variation in wild isolates of C. elegans, out of an interest in genome evolution and an effort to clarify the complicated relationships among the strains. An important objective of the natural isolate work was to compile a list of genes present in N2 but absent in at least one of the wild strains, since the natural isolates were expected to contain a wealth of null alleles for non-essential N2 genes. These genes would then be deprioritized as targets for knockout by the C. elegans Gene Knockout Consortium.

Figure 1.1. Array Comparative Genomic Hybridization. Two genomic DNA samples are fragmented by sonication and differentially end-labeled with either Cy3 or Cy5 fluorescent dyes.
The samples are then mixed in equal proportions and cohybridized to a microarray of oligonucleotide probes. Labeled fragments hybridize to complementary probe sequences on the microarray, and the ratio of fluorescent signals (Cy3 / Cy5) is measured at each probe location. As shown here, ratios significantly greater than 1 (log2 ratio > 0) are indicative of amplifications in the mutant genome relative to the wild-type reference sample, while ratios significantly less than 1 (log2 ratio < 0) indicate deletions. Figure reprinted with permission from Don Moerman and Oxford University Press: Briefings in Functional Genomics and Proteomics 7(3): 195-204, copyright 2008.

Figure 1.2. Detection of a single nucleotide mutation by aCGH utilizing a microarray of highly overlapping oligonucleotide probes. A single nucleotide mutation is sufficient to cause a detectable shift in log2 ratios on a microarray of 50-mer oligonucleotide probes. Several highly overlapping probes, each of which targets the mutation, are required in order for a shift of this magnitude to be statistically significant. The first position of each probe is indicated by a . The length of each probe targeted by the mutation is illustrated by a horizontal bar, and the position of the mutation in each probe sequence is indicated by an *. Details are given in Chapter 3.

1.9. References

Anderson, P. 1995. Mutagenesis. Methods Cell Biol 48: 31-58.
Barstead, R.J. and D.G. Moerman. 2006. C. elegans deletion mutant screening. Methods Mol Biol 351: 51-58.
Brenner, S. 1974. The genetics of Caenorhabditis elegans. Genetics 77: 71-94.
Buchanan, J.A. and S.W. Scherer. 2008. Contemplating effects of genomic structural variation. Genet Med 10: 639-647.
C. elegans Sequencing Consortium. 1998. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282: 2012-2018.
Carvalho, B., E. Ouwerkerk, G.A. Meijer, and B. Ylstra. 2004. High resolution microarray comparative genomic hybridisation analysis using spotted oligonucleotides. J Clin Pathol 57: 644-646.
Conrad, D.F. and M.E. Hurles. 2007. The population genetics of structural variation. Nat Genet 39: S30-36.
Cuppen, E., E. Gort, E. Hazendonk, J. Mudde, J. van de Belt, I.J. Nijman, V. Guryev, and R.H.A. Plasterk. 2007. Efficient target-selected mutagenesis in Caenorhabditis elegans: Toward a knockout for every gene. Genome Res. 17: 649-658.
Denver, D.R., K. Morris, and W.K. Thomas. 2003. Phylogenetics in Caenorhabditis elegans: an analysis of divergence and outcrossing. Mol Biol Evol 20: 393-400.
Edgley, M., A. D'Souza, G. Moulder, S. McKay, B. Shen, E. Gilchrist, D. Moerman, and R. Barstead. 2002. Improved detection of small deletions in complex pools of DNA. Nucleic Acids Res 30: e52.
Emerson, J.J., M. Cardoso-Moreira, J.O. Borevitz, and M. Long. 2008. Natural selection shapes genome-wide patterns of copy-number polymorphism in Drosophila melanogaster. Science 320: 1629-1631.
Flibotte, S., M.L. Edgley, J. Maydan, J. Taylor, R. Zapf, R. Waterston, and D.G. Moerman. 2009. Rapid High Resolution Single Nucleotide Polymorphism-Comparative Genome Hybridization Mapping in Caenorhabditis elegans. Genetics 181: 33-37.
Flibotte, S. and D.G. Moerman. 2008. Experimental analysis of oligonucleotide microarray design criteria to detect deletions by comparative genomic hybridization. BMC Genomics 9: 497.
Giaever, G., A.M. Chu, L. Ni, C. Connelly, L. Riles, S. Veronneau, S. Dow, A. Lucau-Danila, K. Anderson, B. Andre, A.P. Arkin, A. Astromoff, M. El-Bakkoury, R. Bangham, R. Benito, S. Brachat, S. Campanaro, M. Curtiss, K. Davis, A. Deutschbauer, K.D. Entian, P. Flaherty, F. Foury, D.J. Garfinkel, M. Gerstein, D. Gotte, U. Guldener, J.H. Hegemann, S. Hempel, Z. Herman, D.F. Jaramillo, D.E. Kelly, S.L. Kelly, P. Kotter, D. LaBonte, D.C. Lamb, N. Lan, H. Liang, H. Liao, L. Liu, C. Luo, M. Lussier, R. Mao, P. Menard, S.L. Ooi, J.L. Revuelta, C.J. Roberts, M. Rose, P. Ross-Macdonald, B. Scherens, G. Schimmack, B. Shafer, D.D. Shoemaker, S. Sookhai-Mahadeo, R.K. Storms, J.N. Strathern, G. Valle, M. Voet, G. Volckaert, C.Y. Wang, T.R. Ward, J. Wilhelmy, E.A. Winzeler, Y. Yang, G. Yen, E. Youngman, K. Yu, H. Bussey, J.D. Boeke, M. Snyder, P. Philippsen, R.W. Davis, and M. Johnston. 2002. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418: 387-391.
Gilchrist, E.J., N.J. O'Neil, A.M. Rose, M.C. Zetka, and G.W. Haughn. 2006. TILLING is an effective reverse genetics technique for Caenorhabditis elegans. BMC Genomics 7: 262.
Granger, L., E. Martin, and L. Segalat. 2004. Mos as a tool for genome-wide insertional mutagenesis in Caenorhabditis elegans: results of a pilot study. Nucleic Acids Res 32: e117.
Gresham, D., M.J. Dunham, and D. Botstein. 2008. Comparing whole genomes using DNA microarrays. Nat Rev Genet 9: 291-302.
Gresham, D., D.M. Ruderfer, S.C. Pratt, J. Schacherer, M.J. Dunham, D. Botstein, and L. Kruglyak. 2006. Genome-Wide Detection of Polymorphisms at Nucleotide Resolution with a Single DNA Microarray. Science 311: 1932-1936.
Haber, M., M. Schungel, A. Putz, S. Muller, B. Hasert, and H. Schulenburg. 2005. Evolutionary history of Caenorhabditis elegans inferred from microsatellites: evidence for spatial and temporal genetic differentiation and the occurrence of outbreeding. Mol Biol Evol 22: 160-173.
Jones, M.R., J.S. Maydan, S. Flibotte, D.G. Moerman, and D.L. Baillie. 2007. Oligonucleotide Array Comparative Genomic Hybridization (oaCGH) based characterization of genetic deficiencies as an aid to gene mapping in Caenorhabditis elegans. BMC Genomics 8: 402.
Kallioniemi, A., O.P. Kallioniemi, D. Sudar, D. Rutovitz, J.W. Gray, F. Waldman, and D. Pinkel. 1992. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 258: 818-821.
Kamath, R.S., A.G. Fraser, Y. Dong, G. Poulin, R. Durbin, M. Gotta, A. Kanapin, N. Le Bot, S. Moreno, M. Sohrmann, D.P. Welchman, P. Zipperlen, and J. Ahringer. 2003. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature 421: 231.
Lamesch, P., S. Milstein, T. Hao, J. Rosenberg, N. Li, R. Sequerra, S. Bosak, L. Doucette-Stamm, J. Vandenhaute, D.E. Hill, and M. Vidal. 2004. C. elegans ORFeome version 3.1: increasing the coverage of ORFeome resources with improved gene predictions. Genome Res 14: 2064-2069.
Mantripragada, K.K., P.G. Buckley, T.D. de Stahl, and J.P. Dumanski. 2004. Genomic microarrays in the spotlight. Trends Genet 20: 87-94.
McCarroll, S.A. and D.M. Altshuler. 2007. Copy-number variation and association studies of human disease. Nat Genet 39: S37-42.
Moerman, D.G. and R.J. Barstead. 2008. Towards a mutation in every gene in Caenorhabditis elegans. Brief Funct Genomic Proteomic 7: 195-204.
Pinkel, D., R. Segraves, D. Sudar, S. Clark, I. Poole, D. Kowbel, C. Collins, W.L. Kuo, C. Chen, Y. Zhai, S.H. Dairkee, B.M. Ljung, J.W. Gray, and D.G. Albertson. 1998. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet 20: 207-211.
Redon, R., S. Ishikawa, K.R. Fitch, L. Feuk, G.H. Perry, T.D. Andrews, H. Fiegler, M.H. Shapero, A.R. Carson, W. Chen, E.K. Cho, S. Dallaire, J.L. Freeman, J.R. Gonzalez, M. Gratacos, J. Huang, D. Kalaitzopoulos, D. Komura, J.R. MacDonald, C.R. Marshall, R. Mei, L. Montgomery, K. Nishimura, K. Okamura, F. Shen, M.J. Somerville, J. Tchinda, A. Valsesia, C. Woodwark, F. Yang, J. Zhang, T. Zerjal, J. Zhang, L. Armengol, D.F. Conrad, X. Estivill, C. Tyler-Smith, N.P. Carter, H. Aburatani, C. Lee, K.W. Jones, S.W. Scherer, and M.E. Hurles. 2006. Global variation in copy number in the human genome. Nature 444: 444-454.
Riddle, D.L., T. Blumenthal, B.J. Meyer, and J.R. Priess. 1997. C. elegans II. Cold Spring Harbor Laboratory Press, Plainview, NY.
Robert, V. and J.L. Bessereau. 2007. Targeted engineering of the Caenorhabditis elegans genome following Mos1-triggered chromosomal breaks. Embo J 26: 170-183.
Sebat, J., B. Lakshmi, J. Troge, J. Alexander, J. Young, P. Lundin, S. Maner, H. Massa, M. Walker, M. Chi, N. Navin, R. Lucito, J. Healy, J. Hicks, K. Ye, A. Reiner, T.C. Gilliam, B. Trask, N. Patterson, A. Zetterberg, and M. Wigler. 2004. Large-scale copy number polymorphism in the human genome. Science 305: 525-528.
She, X., Z. Cheng, S. Zollner, D.M. Church, and E.E. Eichler. 2008. Mouse segmental duplication and copy number variation. Nat Genet 40: 909-914.
Solinas-Toldo, S., S. Lampel, S. Stilgenbauer, J. Nickolenko, A. Benner, H. Dohner, T. Cremer, and P. Lichter. 1997. Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes Chromosomes Cancer 20: 399-407.
Sulston, J.E. and H.R. Horvitz. 1977. Post-embryonic cell lineages of the nematode, Caenorhabditis elegans. Dev Biol 56: 110-156.
Sulston, J.E., E. Schierenberg, J.G. White, and J.N. Thomson. 1983. The embryonic cell lineage of the nematode Caenorhabditis elegans. Dev Biol 100: 64-119.
White, J.G., E. Southgate, J.N. Thomson, and S. Brenner. 1986. The structure of the nervous system of the nematode C. elegans. Philosophical Transactions of the Royal Society of London - Series B: Biological Sciences: 1-340.
Zwaal, R.R., A. Broeks, J. van Meurs, J.T. Groenen, and R.H. Plasterk. 1993. Target-selected gene inactivation in Caenorhabditis elegans by using a frozen transposon insertion mutant bank. Proc Natl Acad Sci U S A 90: 7431-7435.

2. Efficient high-resolution deletion discovery in Caenorhabditis elegans by array Comparative Genomic Hybridization 1

2.1. Introduction

Comparative Genomic Hybridization (CGH) allows the detection of copy number differences between two DNA samples (Kallioniemi et al. 1992; Mantripragada et al. 2004).
The two DNA samples, one a reference and the other a test sample, are differentially labeled and hybridized to a representative genome arrayed on a matrix. Over the past several years a number of different array platforms have been utilized for CGH, from bacterial artificial chromosomes (BACs) and cosmids to cDNA clones and oligonucleotides (Solinas-Toldo et al. 1997; Pinkel et al. 1998; Mantripragada et al. 2004). As the ability to detect small alterations is limited by the spacing and size of the probes on the matrix, there has been a move away from BAC clones to oligonucleotide arrays for experiments where high resolution is required (Carvalho et al. 2004; Ishkanian et al. 2004; Sebat et al. 2004; Selzer et al. 2005). For example, oligonucleotide array-based CGH (aCGH) was recently used to measure copy number variation at specific exons in several human genes with a resolution between 50 and 500 bp (Dhami et al. 2005; Selzer et al. 2005).

We were interested in determining whether aCGH could be used to detect copy number alterations (insertions and deletions, or “indels”) among different DNA samples of the nematode Caenorhabditis elegans. Specifically, we wished to determine whether aCGH has the required sensitivity and resolving power to detect single-gene knockouts, where the deletions may be small and the animals may be heterozygous. Our laboratory is a member of the C. elegans Knockout Consortium (http://celeganskoconsortium.omrf.org/) and we are interested in examining techniques that might help us identify and clone single-gene knockouts more efficiently. Array CGH, if efficient, has a number of potential advantages over our current PCR-based method (Barstead 1999) of screening for deletions, including the ability to screen thousands of genes in a single experiment, no constraint on the maximum detectable deletion size, and identification of copy number alterations at other loci in the mutant genome. As an example, the ability to detect large deletions would be useful for screening for tandem gene family knockouts, such as the Serpentine Receptor class AB (srab) family of seven-transmembrane chemoreceptors and integral membrane proteins, where consecutive genes may share functional redundancies (Chen et al. 2005; Thomas 2006b).

1 A version of this chapter has been published. Maydan, J.S., S. Flibotte, M.L. Edgley, J. Lau, R.R. Selzer, T.A. Richmond, N.J. Pofahl, J.H. Thomas, and D.G. Moerman. 2007. Efficient high-resolution deletion discovery in Caenorhabditis elegans by array comparative genomic hybridization. Genome Res. 17: 337-347.

We have designed three exon-tiled oligonucleotide arrays, one for two chromosomes (II and X), one for a single chromosome (II), and one for the entire genome. Using these arrays, we have detected both previously characterized deletions of 1–50 kb in control experiments and new deletion alleles of genes with no known mutations. The sensitivity of aCGH is such that we can detect small deletions even in heterozygous animals. The ability to detect single-copy single-gene deletions at this resolution will allow us to use aCGH to screen for novel induced deletions in mutagenized populations. This will greatly aid our efforts to generate knockout strains for the research community. The resolution of aCGH may also make it an attractive tool for those studying the population biology and evolution of C. elegans. The large number of indel differences observed among the Bristol, Hawaiian, and Madeiran nematode strains points to the dynamic nature of genomes and the flux of many of the gene families within this organism.

2.2. Results

2.2.1. Oligonucleotide probe quality and detection of homozygous 50-kb and 1-kb deletions

We designed a pilot microarray composed of a tiled set of oligonucleotide probes to nearly 90% of the exons and 94% of the genes on chromosomes X and II. This set revealed a remarkable consistency in signal-to-noise ratios over all of the experiments.
Our initial aCGH experiment was designed to determine whether a large (50 kb) homozygous deletion could be distinguished reliably from wild-type DNA. For this experiment we used gkDf2, a homozygous-viable deletion of the dim-1 locus on chromosome X. PCR analysis indicated that the deletion breakpoints lay between 8,046,205 and 8,046,422 on the left and 8,088,676 and 8,108,916 on the right, a physical interval of ∼50 kb that potentially included up to 12 genes. For this large deletion experiment, fluorescence intensities were collected for all probes, and we calculated log2 fluorescence intensity ratios for the mutant test sample with the reference wild-type sample (gkDf2/WT). After normalizing with a LOESS regression, the average log2 intensity ratios for the probe pairs gave an SD of 0.13 with very few outliers (see Fig. 2.1a). Note that these few outliers are from a plot of > 92,000 forward and reverse complement probe pairs. The gkDf2 deletion was identified unambiguously in this plot as a prominent peak of negative log2 ratios for probe pairs targeting the chromosome X region around dim-1. An enlarged view of this region is shown in Figure 2.1b, showing that nine genes are affected by this deletion. The deletion breakpoints are clearly defined at the resolution of individual exons. These results indicated that we could certainly identify deletions smaller than 50 kb. Interestingly, probes adjacent to the breakpoints exhibit a positive log2 ratio (Fig. 2.1b), possibly indicating previously unknown duplications of the flanking sequences, which have been periodically observed for deletions caused by this type of mutagenesis (data not shown).

Using the same X:II array design, we next examined whether aCGH could detect a smaller homozygous deletion elsewhere on the X chromosome. The mutation gk329 is a 1047-bp deletion in the gene ceh-39. A hybridization plot comparing gk329 with wild-type DNA (Fig.
2.2a) showed a deletion in the chromosome X region around ceh-39, the site of gk329. This region is enlarged in Figure 2.2b and aligned with a diagram of the coding regions for ceh-41, ceh-21, and ceh-39. The 30 probe pairs representing exons 1, 2, and 3 of ceh-39 (T26C11.7) showed strong negative fluorescence log2 ratios, while the nine probe pairs representing exon 4 of ceh-39 and the probe pairs targeting the five nearest exons of ceh-21 (T26C11.6) yielded log2 ratios of lower amplitude that were still significantly different from zero (with P-values of 3 x 10^-8 and 7 x 10^-5, respectively). Probe pairs targeting ceh-41 (T26C11.5) had log2 ratios closer to zero. The negative ratios for probes targeting exons 1–3 of gk329 corresponded exactly with deletion breakpoints determined by DNA sequencing (chromosome X coordinates 1,854,827/1,855,875). The next gene to the right of the deletion, T26C11.t1 (encoding a tRNA-Glu), was not represented among the probes on the array. Fluorescence ratios for probes to the next closest gene, tbx-41 (T26C11.1), lying 9365 bp beyond the distal deletion breakpoint, showed no evidence of reduced signal intensity in the gk329 sample (data not shown). From this experiment it was clear that the X:II chip design permits detection of deletion breakpoints at the resolution of individual exons.

2.2.2. Detection of single-copy number differences between hermaphrodite and male X chromosomes and in a balanced chromosome II deficiency

Broader application of aCGH in C. elegans research and other model systems would be feasible if its sensitivity extended to detecting heterozygous (single-copy) deletions. We performed two experiments to determine whether log2 fluorescence ratios from a heterozygote are sufficient to give unambiguous identification of deletions. In the first experiment we compared the hybridization signal from wild-type C.
elegans hermaphrodite DNA (two X chromosomes) to male DNA (one X chromosome) for all probes on chromosomes II and X (Fig. 2.3). The median log2 fluorescence ratio for probe pairs to chromosome II (which should be equally represented in the two samples) was set to zero, and these ratios exhibited an SD of 0.23. Setting the median log2 ratio for chromosome II to zero led to a median log2 ratio for forward and reverse complement probe pairs to chromosome X of -0.82, with an SD of 0.22. These distributions for chromosomes II and X overlapped by only 4% (Fig. 2.3).

In a second experiment we compared wild-type DNA with that from a heterozygous 1202-bp deletion on chromosome II, using a balanced strain of genotype dab-1(gk291)/mIn1[mIs14 dpy-10(e128)]. Heterozygous animals are wild-type with a pharyngeal green fluorescent protein (GFP) signal conferred by the mIn1 balancer chromosome, and they segregate ∼50% heterozygotes, 25% gk291 homozygotes (viable and fertile but slow-growing), and 25% mIn1 homozygotes (viable and fertile Dpy with small broods and a strong pharyngeal GFP signal). Initially, we compared the hybridization signal from wild-type DNA to that from DNA made from confirmed gk291/mIn1 heterozygotes. A separate hybridization compared the wild-type signal with that from DNA made from a population containing all progeny genotypes in their normal proportions. To obtain this latter sample we simply washed animals off a plate and isolated DNA from the mixed population of animals. The data plots from these two hybridizations were virtually indistinguishable, and both yielded reliable detection of the gk291 deletion (P = 4 x 10^-13 [data not shown] and P = 8 x 10^-14 [Fig. 2.4], respectively). These experiments demonstrate that single-copy deletions within a single gene can be reliably detected using aCGH.

Figure 2.4b shows a fluorescence ratio plot for probe pairs to the dab-1 locus aligned with a diagram of the dab-1 gene model from WormBase WS120.
The sequenced deletion breakpoints lie at chromosome II coordinates 8,226,388 and 8,227,590, and agree perfectly with breakpoints predicted by the log2 fluorescence ratios. The log2 fluorescence ratios for probes to the deleted region were similar to those observed for probes to the X chromosome in the male/hermaphrodite experiment. The proximal deletion breakpoint is within an intron, while the distal deletion breakpoint is within an exon. The oligo probes around these breakpoints serve to illustrate the high resolution of aCGH. Since we targeted all aCGH probes to exons rather than using a tiling-path approach, the resolution of the proximal breakpoint was only about 400 nucleotides, because the first intron causes a gap of more than 400 nucleotides between adjacent oligos in that region. However, the distal deletion breakpoint, which lies within an exon, was more accurately resolved since it is targeted by two overlapping oligos (Fig. 2.4b). Together, these two probes span just 73 base pairs, thus resolving the distal deletion breakpoint to < 50 nucleotides. In this experiment, probes flanking the deletion on either side yielded significant positive log2 fluorescence ratios.

2.2.3. The mIn1 balancer chromosome

Figure 2.4a reveals several copy number alterations in addition to the gk291 deletion, including a deletion (roughly chromosome II coordinates 1,020,000–1,050,000) and four amplifications (roughly chromosome II coordinates 1,561,000–1,567,000, 11,698,000–11,701,000, 12,847,000–12,848,000, and 13,482,000–13,490,000). Since these experiments were the first to include the mIn1 balancer chromosome, we speculated that these additional features might represent deletions or amplifications on the balancer chromosome, or elsewhere in the balancer strain genome. Formally, the additional chromosome features could be linked to the gk291-bearing chromosome, the balancer, or distributed between them or other chromosomes (this latter possibility applies to amplifications only).
We suspected that these were features of mIn1, as the construction of this balancer chromosome required two rounds of mutagenesis (Edgley and Riddle 2001). Comparison of N2 DNA with DNA from mIn1 homozygotes showed that all of the additional features observed in the dab-1/mIn1 heterozygote (Fig. 2.4a) were indeed derived from the mIn1 strain (data not shown). These alterations included the deletion on the left arm plus the various amplified regions throughout the chromosome. We can only be certain that the deletion involves the mIn1 chromosome II, since the amplifications of chromosome II sequences in the mIn1 strain do not necessarily reside on chromosome II.

2.2.4. Novel balanced lethal deletions on chromosome II

To demonstrate that aCGH can be used to reliably detect novel deletions, we conducted a screen for lethal mutations on chromosome II balanced by the mIn1 inversion, using TMP/UV as a mutagen. We mutagenized a predominantly L4 population of DR2078 [bli-2(e768) unc-4(e120)/mIn1[mIs14 dpy-10(e128)]], set up clonal populations from the F1 progeny, and screened the F2 for absence of viable, fertile Bli-2 Unc-4 adults. Such lines indicated the presence of new recessive lethal mutations linked to bli-2 unc-4 and balanced by mIn1. Approximately 200 balanced lethal lines were obtained. We analyzed 30 of these strains by array CGH, using DNA prepared from mixed populations washed off plates, and detected 25 new deletions (0–3 deletions per strain). We describe six of these new deletions here (see Fig. 2.5). For one of the candidates, gk463, PCR using primers to the regions flanking the deletion confirmed the presence of an 8-kb deletion; sequencing this PCR product demonstrated that gk463 deletes 8063 bp between chromosome II coordinates 4,131,236 and 4,139,298, a region encoding the ast-1 (T08H4.3) gene (Fig. 2.5a). Similar experiments confirmed three other deletions.
The gk460 lesion is a 4.5-kb deletion encompassing the genes F10E7.4 (spon-1), F10E7.11, and F10E7.2 (breakpoints at chromosome II coordinates 7,118,909 and 7,123,417; Fig. 2.5b); gk462 is a 2.2-kb deletion encompassing Y51B9A.5 and an internal tRNA (Fig. 2.5c); and gk465 is a 141-bp deletion affecting the gene C06A8.1 (breakpoints at chromosome II coordinates 7,774,796 and 7,774,938; Fig. 2.5d).

In addition to these small deletions, we identified several larger deletions spanning several genes. The gk488 deletion is nearly 500 kb in size, spanning chromosome II coordinates 10,662,230 to 11,160,425, affecting 93 genes (Fig. 2.5e). An even larger deletion affecting 274 genes was identified in gk487. The deletion spans chromosome II coordinates 3,057,725 through 3,841,090, completely deleting over 783 kb with the exception of ∼4.5 kb (from chromosome II coordinates 3,131,948 through 3,136,511) (Fig. 2.5f). From these experiments, we conclude that aCGH is a powerful and efficient method for discovery of knockout mutations and for characterizing large deletions in this organism.

2.2.5. Whole-genome array CGH: Comparing N2 Bristol to Hawaiian and Madeiran wild isolates

The amount of gene content variability within natural populations of animal species is largely unknown. High-resolution aCGH appears to be an excellent method for exploring this variability. To examine this variability we designed a whole-genome oligonucleotide CGH array to detect alterations among wild-type nematode strains. This array targets the entire C. elegans genome, with 94% coverage of the exons and 98% coverage of the genes. Using our selection criteria it was not possible to obtain 100% coverage with unique probes (see Methods). Here we compared the N2 Bristol strain, isolated in England, to the strain CB4856 that was isolated on one of the Hawaiian Islands and to a strain isolated on the island of Madeira (JU258).
We chose the Hawaiian strain because it is a popular strain for single nucleotide polymorphism (SNP) mapping, as it has many sequence variations compared with N2 (Wicks et al. 2001). Comparisons among wild isolates are potentially complicated by the presence of single nucleotide changes relative to N2, which might cause reduced log2 fluorescence ratios that do not reflect deletions. To minimize this possibility, we used a conservative analysis for copy number changes and counted regions as deleted only if they had a consistently low log2 fluorescence ratio over a substantial distance covering many probes (see Methods).

Using these conservative criteria we were able to detect many indel differences between N2 and the Hawaiian strain (Fig. 2.6), illustrating that natural large-scale gene-content variation exists between populations. We observed similar differences between N2 and the Madeiran strain (Fig. 2.7). The Hawaiian strain exhibited 141 deletions relative to N2, with a total length of 1.54 Mb of DNA deleted (1.54% of the genome). These deletions removed 483 predicted genes and 48 predicted pseudogenes (2.54% of all genes) (Table 2.1). The Madeiran isolate had 122 deletions relative to N2, deleting 1.94 Mb (1.94% of the genome), removing 670 loci (39 of which are pseudogenes) (Table 2.1). Appendices 1 and 2 show chromosomal coordinates and interpretations for every deleted gene for pairwise genome comparisons between N2 and the Hawaiian and Madeiran strains, respectively.

Alterations in the Hawaiian and Madeiran strains relative to N2 Bristol are unevenly distributed both within and between chromosomes: they appear more often on the chromosome arms than in the centers, with a large number of changes on chromosomes II and V but relatively few changes on chromosome X (Figures 2.6 and 2.7). Most of the copy number alterations detected appear to be deletions in the Hawaiian and Madeiran strains relative to N2, but a few amplifications are also evident.
The genome regions deleted in the Hawaiian and Madeiran strains are not gene poor or enriched in known pseudogenes, indicating that there are major differences in the functional gene content among these isolates. Among many gene families analyzed, a few were overrepresented among deleted genes (Table 2.1). The frequency of deletions was particularly high for the MATH-BTB, F-box, C-type lectin, and Srz chemoreceptor families.

It was impractical in this study to validate all copy number changes detected by aCGH between these strains, but we did test one representative deletion extending over several probes. We identified a 2942-bp deletion on chromosome V in the Hawaiian strain, CB4856, which affects two adjacent genes, C49G7.1 and D1065.3. Both are uncharacterized genes containing ankyrin repeats as well as BRCT and WSN domains. We designed primers flanking the deletion, amplified the affected region using PCR, and sequenced the region to determine the deletion breakpoints. The deletion falls between chromosome V coordinates 4,057,455 and 4,057,457 for the proximal breakpoint and 4,060,396 and 4,060,398 for the distal breakpoint, confirming a deletion for these two genes in the Hawaiian strain relative to N2 Bristol (Fig. 2.8). We also examined a gene, gst-38, that has been sequenced from the Hawaiian strain and is known to have several SNPs relative to the Bristol strain (Denver et al. 2003). Probe targets in the Hawaiian genome contain 0–3 SNPs each, which resulted in a significantly negative log2 ratio in that region of the genome (-1.6), but not of sufficient amplitude to pass our conservative criteria for identifying deletions (see Methods).

2.3. Discussion

2.3.1. The utility of aCGH in screening for novel deletions

We have demonstrated that aCGH is a viable platform for detecting heterozygous deletions as small as 141 bp in size in C. elegans. By targeting exons it is more likely that any detected deletion alters the structure of the gene product.
Depending on the overlap of oligonucleotides on the array, the resolution of a deletion breakpoint can be < 50 bp. To increase resolution, chromosome-specific arrays can be manufactured as we did for chromosome II, which may be desirable depending on the experiment being undertaken. For identifying lethal mutations this may be the most fruitful approach, as the lethal mutation will already be balanced (as described above). PCR amplification and DNA sequencing of the deleted region in the mutant genome can be utilized to precisely identify the breakpoints after aCGH has made the initial identification.

The ability to detect deletion and amplification events in heterozygous animals is a testament to the sensitivity of aCGH. This is particularly important when screening for lethal mutations, as it means one can use DNA samples from balanced heterozygous populations that are simply washed from a plate. The added convenience of not having to separate out mutant animals should make this type of analysis more amenable as a high-throughput method.

An important feature of aCGH is that it yields a high-resolution view of a whole chromosome, or even a whole genome, without the size limitation of ∼100 kb when using a BAC-based platform. Combining an oligonucleotide-based approach with a high-density array format (∼385,000 unique probes) is already leading to widespread adoption of this method for high-resolution mapping of DNA breakpoints for larger sized chromosomal rearrangements in tumors and microdeletion syndromes in humans (Pollack et al. 2002; Selzer et al. 2005; Stallings et al. 2006; Strefford et al. 2006; Urban et al. 2006), as well as the detection of amplifications and deletions such as copy number polymorphisms < 0.1 Mb in size (Lucito et al. 2003; Sebat et al. 2004; Conrad et al. 2006).
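The dependence of breakpoint resolution on probe spacing noted in this section can be made concrete with a small sketch. It rests on a simplified model that is an assumption made here and is not taken from this study: a probe is scored as deleted if any part of it overlaps the deletion and as retained otherwise, so the first deleted base must lie after the end of the last retained probe and no later than the end of the first deleted probe. The function name, the inclusive coordinate convention, and the example coordinates are all hypothetical.

```python
from typing import List, Tuple

# (start, end, deleted) with inclusive coordinates; "deleted" is the
# hypothetical per-probe call under the simplified model described above.
Probe = Tuple[int, int, bool]


def left_breakpoint_interval(probes: List[Probe]) -> Tuple[int, int]:
    """Bracket the first deleted base using the retained-to-deleted transition.

    Returns an inclusive (lowest, highest) interval of possible positions.
    """
    ordered = sorted(probes)
    for prev, cur in zip(ordered, ordered[1:]):
        if not prev[2] and cur[2]:
            # The retained probe ends before the deletion starts, and the
            # deleted probe must overlap the deletion by at least one base.
            return prev[1] + 1, cur[1]
    raise ValueError("no retained-to-deleted transition found")


# Two overlapping 50-mers that together span 73 bases (illustrative values).
overlapping = [(1, 50, False), (24, 73, True), (60, 109, True)]
# Two 50-mers separated by a 400-base intron gap (illustrative values).
intron_gap = [(1, 50, False), (451, 500, True)]

for name, probes in (("overlapping", overlapping), ("intron gap", intron_gap)):
    low, high = left_breakpoint_interval(probes)
    print(name, (low, high), "interval width:", high - low + 1)
```

Under these assumptions the overlapping pair brackets the breakpoint within 23 bases, whereas the pair separated by an intron leaves an interval of 450 bases, which mirrors the contrast reported for the distal and proximal gk291 breakpoints.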
The power of screening a whole chromosome or whole genome for gains and losses of genomic DNA was amply illustrated when we tested for mutations balanced by the inversion mIn1. Besides the known inversion, the mIn1 strain contained several previously unknown deletions and amplifications, some linked to the inversion, but others possibly resident elsewhere in the genome (aCGH identifies only the presence of a sequence in a genome, not its location). We also found a previously undetected deletion of exons 4 and 5 of the gene K05F6.2 (fbxb-50) in our N2 strain. Curiously, this deletion must have occurred relatively recently, as all of the mutations studied here were isolated from N2 in this laboratory. Without whole-genome testing by aCGH, these novel features present in the genomes of N2 and the balancer strain would have remained undetected.

2.3.2. Natural gene content variation in wild populations

The results of our whole-genome experiment comparing N2 to the Hawaiian and Madeiran wild-type strains revealed a large amount of gene-content variation among these natural isolates. Most of these differences are deletions in the Hawaiian or Madeiran strains relative to N2. Obviously, there is a bias in favor of detecting deletions, because all probes target sequences that are present in the N2 genome. Probe targets containing several SNPs could potentially cause the identification of spurious deletions, so we have used very conservative criteria to ameliorate this possibility and observed that even a gene as divergent as gst-38 is not mistakenly identified as a deletion. Our exon-centric probe selection should also help to reduce the impact of SNPs on hybridization, since SNPs are less common in coding sequences. To identify N2 deletions we will need to compare N2 to a sequenced Hawaiian or Madeiran strain.
Previous work in nematodes has shown that chromosomal rearrangements, repeat elements, and transposons are all more common on chromosome arms than in the central region of the chromosomes (Stein et al. 2003). Homologous gene clusters are also more common on the chromosome arms, particularly on the proximal arm of chromosome II and both arms of chromosome V (Thomas 2006b), where we observe the largest number of deletions in the Hawaiian and Madeiran strains. This result suggests that non-allelic homologous recombination (Lupski 1998) on chromosome arms between repeat sequences and/or homologous gene clusters could be responsible for many of the deletions observed in these strains relative to N2. This could also explain the smaller number of gene content alterations observed between N2 and the Hawaiian and Madeiran strains on the X chromosome, where chromosomal rearrangements are less common (Stein et al. 2003). Our array designs targeted only annotated exons in the sequenced N2 genome, but the large number of deletions observed in the Hawaiian and Madeiran strains relative to N2 suggests that N2 has probably also lost genes that are still present in the other natural isolates.

The frequency of deletions was particularly high for the MATH-BTB, F-box, C-type lectin, and Srz chemoreceptor families. These four gene families are among those with the highest rates of birth–death evolution among Caenorhabditis species (J.H. Thomas, unpublished data). The correlation indicates that indel population diversity within the C. elegans species is related to long-term evolutionary stability in gene families. The nature and level of deletion polymorphisms that we find in the nematode are mirrored in human populations (Conrad et al. 2006; Hinds et al. 2006; Locke et al. 2006; McCarroll et al. 2006). Conrad et al. (2006) reported that genes involved in immunity and defense, sensory perception, cell adhesion, and signal transduction were especially prone to deletion, categories that overlap the gene families highlighted as prone to deletion in C. elegans (Thomas et al. 2005; Thomas 2006a). Array studies in nematodes and humans are the first experiments to view wholesale gene-content variation of large numbers of genes in many diverse gene families between populations. These observations from humans and nematodes offer strong support for the “less-is-more” hypothesis of evolutionary change (Olson 1999). In his review, Olson argued “loss of gene function may represent a common evolutionary response of populations undergoing a shift in environment and, consequently, a change in the pattern of selective pressures.” He went on to suggest that, “adaptive loss of function may occur regularly and may spread rapidly through small populations.” With their small genome size, rapid life cycle, and self-fertilizing mode of reproduction, dispersed wild populations of nematodes are perhaps ideally suited to monitor genomic responses to environmental selective pressures.

Like others, we observe that the Hawaiian and Madeiran strains are more similar to each other than either is to the Bristol (N2) strain (Haber et al. 2005; Stewart et al. 2005). At first this seems surprising; why should nematodes from the Hawaiian Islands located in the middle of the Pacific Ocean and nematodes from Madeira, an island in the Atlantic off the coast of the African continent, be so similar? As previously suggested (Stewart et al. 2005), we think there may be a simple explanation based on the migration of human populations. During the last half of the nineteenth and first half of the twentieth centuries, sugar cane was a major crop in Hawaii.
The workers in the cane fields came from many countries including China, Japan, the Philippines, and, after 1878, Portugal (Bartholomew and Bailey 1994). Almost all of the new immigrants from Portugal came from either the Azores or the island of Madeira (Bartholomew and Bailey 1994), and these immigrants may have inadvertently brought C. elegans with them. If this is true, we have a fairly precise timeline for the introduction of a new strain of C. elegans to Hawaii. (Subsequent analysis presented in Section 4.3.2 indicates that these two strains are not closely related.)

In the experiments described here we have demonstrated that aCGH is a robust technology with many possible applications. These range from screens for novel induced deletions to population genetic studies comparing evolutionary differences among natural isolates. The protocols and chips described here for the C. elegans genome can similarly be made for other organisms, as is already evident in human, mouse, and yeast studies. The high-resolution genome-wide investigation of DNA copy number changes reported here for C. elegans will likely prove to be a powerful tool in genome-wide studies of other model organisms, such as the fly and zebrafish, and the more recently sequenced chicken and dog.

2.4. Methods

2.4.1. Probe selection, microarray design, and microarray manufacture

The pilot project focused initially on chromosomes II and X. DNA oligonucleotides, 50 nucleotides in length, were selected to tile open reading frames from both chromosomes. Several types of filters were applied in the selection process in order to maximize the sensitivity and specificity of the oligonucleotides and the signal-to-noise ratio. The applied filters were intentionally relatively mild in order to produce data that would reveal the most important characteristics of oligonucleotides for future chip designs.
As a result, ∼90% of the exons and 94% of the genes from both chromosomes are represented on the array. Our oligonucleotide selection can be arbitrarily divided into eight sequential phases. Unless stated otherwise, all of the computer programs have been developed as part of the current work and are freely available from one of the authors (S. Flibotte).
(1) The sequences of all curated exons and RNA transcripts on chromosomes II and X were extracted from WormBase (data freeze WS120). Sequences smaller than 50 bases were extended to 50 bases and overlapping sequences were merged.
(2) All of the repeats annotated in WormBase were masked. All non-masked subsequences < 50 bases in length were then masked (this was also done after phases 3 and 4).
(3) All of the 20-mers occurring more than once in the genome were masked.
(4) Homopolymers > 5 bases in length were masked.
(5) All possible 50-mers were extracted from the non-masked subsequences and only those with GC content between 30% and 56% were kept, which corresponds to a melting temperature range of Tm = 72.6 ± 5°C.
(6) All of the 50-mers with folding energy larger than -1 kcal/mol according to a hybrid-ssmin calculation (Markham 2003) were kept.
(7) Following a MegaBLAST (Zhang et al. 2000) calculation, 50-mers without significant homology with other locations in the genome were kept.
(8) For all remaining subsequences, the 50-mers with the lowest overall 15-mer counts were selected using a greedy algorithm and probe spacing parameter, ensuring that the distance between the starting positions of two neighboring oligonucleotides is at least 22 bases for chromosome II and 21 bases for chromosome X, except for the region around dim-1, where the distance was set to 6 bases. For each subsequence, the selection continued until no further oligonucleotides could be selected while respecting the overlap constraint.
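Phases 4, 5, and 8 of the selection procedure can be illustrated with a minimal Python sketch. This is not the software used in this work: the function names are hypothetical, homopolymers are handled by rejecting candidate probes rather than by masking the sequence, and the greedy step (take candidates in order of increasing overall 15-mer count and accept each one that respects the spacing constraint) is only one plausible reading of the description. Repeat masking, the 20-mer uniqueness filter, the folding-energy filter, and the MegaBLAST filter are omitted.

```python
from collections import Counter

PROBE_LEN = 50  # probe length used for all arrays described in the text


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_long_homopolymer(seq, max_run=5):
    """True if seq contains a run of identical bases longer than max_run."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return True
    return False


def kmer_counts(genome, k=15):
    """Genomic frequency of every k-mer (single strand only, for brevity)."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def overall_kmer_count(probe, counts, k=15):
    """Sum of the genomic frequencies of all constituent k-mers of a probe."""
    return sum(counts[probe[i:i + k]] for i in range(len(probe) - k + 1))


def select_probes(region, region_start, counts, min_spacing,
                  gc_min=0.30, gc_max=0.56):
    """Greedy selection (assumed form): lowest 15-mer count first,
    keeping start positions at least min_spacing bases apart."""
    candidates = []
    for i in range(len(region) - PROBE_LEN + 1):
        probe = region[i:i + PROBE_LEN]
        if not gc_min <= gc_fraction(probe) <= gc_max:
            continue
        if has_long_homopolymer(probe):
            continue
        score = overall_kmer_count(probe, counts)
        candidates.append((score, region_start + i, probe))
    candidates.sort()  # ties are broken by start position
    chosen = []
    for _score, start, probe in candidates:
        if all(abs(start - s) >= min_spacing for s, _ in chosen):
            chosen.append((start, probe))
    return sorted(chosen)


if __name__ == "__main__":
    genome = "ATGCAT" * 50            # toy sequence, not real data
    counts = kmer_counts(genome)
    for start, probe in select_probes(genome[:120], 0, counts, min_spacing=22):
        print(start, probe)
```

Sorting candidates by score before applying the spacing constraint favours the most unique probes, at the cost of less regular spacing than a simple left-to-right tiling would give.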
The overall 15-mer count of an oligonucleotide is defined as the sum of the genomic frequencies of all constituent 15-mers. The application of all of these filters resulted in the selection of 97,481 oligonucleotides for chromosome II and 92,209 oligonucleotides for chromosome X. Microarrays were manufactured by NimbleGen Systems, with each oligonucleotide and its corresponding reverse complement synthesized at random positions on the array.

A similar procedure was used to design a chip targeting the whole C. elegans genome (using release WS139) and a chip targeting chromosome II alone (using data freeze WS150). The only differences were that no reverse complement probes were synthesized, the probe spacing parameter was adjusted, and a procedure was introduced to rescue exons targeted by fewer than two oligonucleotides. For the whole-genome chip we tried to select one probe upstream and one probe downstream as close as possible to the underrepresented exon following the filters 2–7 described in the previous paragraph. With a probe spacing parameter of 39, this resulted in 61,910 probes for chromosome I, 64,165 for chromosome II, 56,856 for chromosome III, 59,422 for chromosome IV, 82,944 for chromosome V, and 59,564 for chromosome X. For the chromosome II chip we selected 332,334 probes targeting annotated exons with a probe spacing parameter of 6, and 47,853 probes targeting noncoding sequences with a spacing parameter of 85.

2.4.2. Nematode culture, harvest, and DNA preparation

Nematodes were generally grown as previously described (Brenner 1974) on 60- or 150-mm NGM agar plates seeded with Escherichia coli strain OP50 or χ1666.
Strains used were N2 (VC196, a hermaphrodite subculture of N2 received from the Caenorhabditis Genetics Center in 2002); N2 males (male stock of CGC N2 received in 1998); mIn1[mIs14 dpy-10(e128)] homozygotes derived from a single Dpy animal selected from CGC strain DR2078 (strain not kept); VC100 (unc-112(r367) V; gkDf2 X); VC615 (dab-1(gk291)/mIn1[mIs14 dpy-10(e128)] II); VC766 (ceh-39(gk329) X); CB4856 (a subculture of the Hawaiian C. elegans wild isolate HA-8); and JU258 (a wild C. elegans isolate from Madeira). All mutant strains (excluding mIn1) were generated by mutagenesis with trimethylpsoralen (TMP) and UV-irradiation. For DNAs prepared from plate cultures, populations were grown to starvation and harvested by washing into 15-mL centrifuge tubes with 10 mL of M9 buffer containing 0.01% Triton X-100. Each population was washed seven times by centrifugation, removal of supernatant by aspiration, and resuspension and vortexing in fresh M9/Triton X-100. After the final wash, populations were plated on unseeded agar plates and left overnight at 20°C to digest any bacteria remaining in their guts, then reharvested by washing and centrifugation. For DNA from N2 males and confirmed dab-1(gk291)/mIn1[mIs14 dpy-10(e128)] II balanced heterozygotes, worms were picked directly into M9/Triton X-100 in labeled 1.8-mL microcentrifuge tubes, and washed free of bacteria in seven rounds of dilution/centrifugation/aspiration. Aliquots of pelleted worms were transferred to 1.5-mL microcentrifuge tubes containing lysis buffer (50 mM KCl, 10 mM Tris-HCl at pH 8.3, 2.5 mM MgCl2, 0.45% NP-40 [Igepal], 0.45% Tween-20, 0.01% gelatin, 300 µg/mL Proteinase K), frozen at -20°C, and incubated at 55°C–60°C for 3 hours. DNA was prepared either by standard phenol-chloroform extraction followed by ethanol precipitation or with the Puregene DNA Purification Kit (D-7000A, Gentra Systems) using the solid tissue protocol.
Purified DNAs were resuspended in nuclease-free sterile dH2O or TE (10 mM Tris-HCl, 1 mM EDTA at pH 7.0–8.0). DNA concentrations were determined with a spectrophotometer (Biomate3, Thermo Spectronic) and adjusted to 500 ng/µL for submission to NimbleGen Systems, Inc. for further processing.

2.4.3. DNA fragmentation and labeling

Samples were fragmented and labeled in the NimbleGen Service Laboratory as follows. Two micrograms of each genomic DNA sample were diluted to 80 µL with deionized (DI) water and fragmented by sonication. A portion (0.3 µg) of each sonicated sample was run on a 1% agarose gel to confirm that most of the DNA fragments were between 500 and 2000 bp in length.

Cy3 and Cy5 dye-labeled random 9-mers (TriLink BioTechnologies, Inc.) were diluted to 1 O.D./42 µL of buffer containing 0.125 M Tris-HCl (pH 8.0), 0.125 M MgCl2, 1.75 µL/mL β-mercaptoethanol. Mutant DNA samples were labeled with Cy3 and the wild-type DNA sample (VC196) was labeled with Cy5. One microgram of genomic DNA was added to each random 9-mer buffer solution, denatured at 95°C, and then chilled on ice in 0.2 mL PCR tubes. A total of 10 µL of 50x dNTP mixture (1x TE buffer, 10 mM each of dATP, dCTP, dGTP, and dTTP), 8 µL of DI water, and 100 U of Klenow fragment (exo-) was added to each tube and mixed well with a pipette. Samples were centrifuged and incubated at 37°C for 2 hours and 10 µL of 0.5 M EDTA was added and mixed well to stop the labeling reaction. DNA was precipitated by adding 11.5 µL of 0.5 M NaCl and 110 µL of isopropanol, vortexing, incubating in the dark for 10 min at room temperature, and centrifuging at 12,000g for 10 min. The supernatant was removed and the DNA pellet was washed with 500 µL of 80% ethanol. After centrifugation at 12,000g for 2 min, the supernatant was removed, and the pellet was dried in a SpeedVac on low heat for 5 min before being rehydrated in 25 µL of DI water. DNA concentration was measured using a spectrophotometer.

2.4.4. Sample hybridization and imaging

Samples were hybridized in the NimbleGen Service Facility using standard operating procedures, as previously described (Selzer et al. 2005). Briefly, 15 µg of each labeled test and reference DNA sample were added to a single 1.5 mL tube and dried down in the dark in a SpeedVac on low heat. The DNA was resuspended in 3.5 µL of DI water and vortexed; 41.5 µL of NimbleGen hybridization buffer was added to the tube, mixed well, and heated at 95°C for 5 min in the dark. Samples were hybridized at the NimbleGen Service Facility for 16–20 hours at 42°C, and then washed with NimbleGen wash buffers and scanned on an Axon scanner (Model # 4000B).

2.4.5. Data analysis

The fluorescence intensity of each feature on the array was extracted with the NimbleScan 2.1 software for the sample and reference images. The intensity ratios were normalized using robust LOESS regression on the so-called M-A plot, where M = log2(I1/I2) and A = log2(sqrt(I1 × I2)), with I1 and I2 being the intensities of the feature in the two images, similar to the procedure described in Yang et al. (2002). The LOESS regression was implemented with the library from Cleveland et al. (1992). The log2 ratios, M, corresponding to the probes targeting the forward and reverse strands at the same genomic location, were averaged. No outliers were excluded from the subsequent analysis. Copy number aberrations were detected both by careful visual inspection and with a segmentation algorithm developed and currently being tested by one of the authors (S. Flibotte). This segmentation algorithm is a very efficient implementation of a bottom-up approach (see Appendix 3). The P-value for each aberration was calculated with a one-sample t-test (however, with the total number of non-aberrant data points being very large, one-sample and Welch two-sample t-tests give essentially the same P-values).
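The quantities used in the data analysis can be written out as a short Python sketch. It shows only the M and A transformation and the t statistic of a one-sample test for a candidate segment; the LOESS normalization and the segmentation algorithm are not reproduced. The function names are hypothetical, and the choice of comparing the segment mean against a fixed baseline (0 by default, standing in for the mean of the non-aberrant probes) is an assumption made for illustration.

```python
import math
from statistics import mean, stdev


def ma_values(i1, i2):
    """Return (M, A) for one array feature.

    M = log2(I1/I2) is the log ratio of the two channel intensities;
    A = log2(sqrt(I1*I2)) is the average log intensity.
    """
    m = math.log2(i1 / i2)
    a = math.log2(math.sqrt(i1 * i2))
    return m, a


def one_sample_t(segment_log_ratios, baseline_mean=0.0):
    """t statistic comparing a segment's mean log2 ratio to a baseline.

    baseline_mean is an assumed stand-in for the mean of the
    non-aberrant probes after normalization.
    """
    n = len(segment_log_ratios)
    standard_error = stdev(segment_log_ratios) / math.sqrt(n)
    return (mean(segment_log_ratios) - baseline_mean) / standard_error


if __name__ == "__main__":
    print(ma_values(8.0, 2.0))                # illustrative intensities
    print(one_sample_t([-1.0, -1.2, -0.8]))   # illustrative log2 ratios
```

Converting the t statistic to a P-value requires the t distribution with n - 1 degrees of freedom, which is left to a statistics package.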
A total of 9500 50-mer oligonucleotides of random sequence but with the same GC content distribution as our probes were synthesized at random locations on each microarray. Use of data from these probes as an estimate of background tends to increase the overall standard deviation of the data, and therefore our analysis includes no background subtraction.

For indel comparisons between wild-type strains, genes were from the WormBase release WS150 and were classified into families using a combination of the blastclust clustering algorithm and protein alignments and trees, performed using clustalw and phyml (Thompson et al. 1994; Guindon and Gascuel 2003). We set conservative cutoff values for identifying indels, requiring log2 ratios of ≥ 1 for amplification segments and ≤ -2 for deletions. Chromosomal start and end coordinates for each gene were used to determine whether the gene was entirely contained within an assigned deletion; genes that spanned the end of a deletion were not included.

Table 2.1. Gene family members deleted in natural isolates from Hawaii (CB4856) and Madeira (JU258). P-values were computed only for families with potentially higher rates of deletions, and only values < 0.05 are shown. P-values are relative to all genes and are one-sided and computed by a 2 x 2 chi-square test with Yates correction. P-values are not corrected for multiple testing, and those with marginal values after Bonferroni correction are enclosed in parentheses. NA, not applicable; NS, not significantly different.

                                          Hawaiian vs. N2                   Madeiran vs. N2
Gene family                 No. of genes  No. deleted  % Deleted  P-value   No. deleted  % Deleted  P-value
All genes                   20,873        531          2.54       NA        670          3.21       NA
MATH only                   50            33           66.00      <0.0001   35           70.00      <0.0001
MATH-BTB                    47            17           36.17      <0.0001   24           51.06      <0.0001
E3 ubiquitin ligase         38            11           28.95      <0.0001   12           31.58      <0.0001
F-box                       536           71           13.25      <0.0001   94           17.54      <0.0001
Ubiquitin                   35            5            14.29      <0.0001   5            14.29      0.0001
Lectin C-type               304           26           8.55       <0.0001   25           8.22       <0.0001
DUF130                      52            6            11.54      <0.0001   24           46.15      <0.0001
DUF19                       84            15           17.86      <0.0001   23           27.38      <0.0001
Srh chemoreceptor           311           34           10.93      <0.0001   33           10.61      <0.0001
Srz chemoreceptor           115           23           20.00      <0.0001   10           8.70       (0.0023)
SNF-2-like helicase         105           11           10.48      <0.0001   10           9.52       0.0008
Srbc chemoreceptor          84            6            7.14       (0.0205)  10           11.90      <0.0001
DUF274                      22            0            0.00       NS        6            27.27      <0.0001
Srw chemoreceptor           148           9            6.08       (0.014)   13           8.78       0.0003
Str-Srj chemoreceptor       325           8            2.46       NS        22           6.77       0.0006
Sri chemoreceptor           81            8            9.88       0.0001    5            6.17       NS
Thioredoxin                 44            2            4.55       NS        6            13.64      0.0005
Srt chemoreceptor           75            1            1.33       NS        7            9.33       (0.0077)
Nuclear receptor            285           3            1.05                 5            1.75
Homeodomain                 108           2            1.85                 2            1.85
Collagen                    231           0            0.00                 0            0.00
Major facilitator permease  213           0            0.00                 0            0.00
Ser-thr protein kinase      309           2            0.65                 0            0.00
DUF18 (ShTK)                122           2            1.64                 2            1.64
Major sperm protein         111           0            0.00                 0            0.00
Transthyretin               96            0            0.00                 0            0.00
Ligand-gated ion channels   94            0            0.00                 0            0.00
Srd chemoreceptor           76            0            0.00                 0            0.00
Acyltransferase             59            1            1.69                 1            1.69
Rab-ras                     71            1            1.41                 0            0.00
Srg chemoreceptor           68            1            1.47                 1            1.47
DEAD-box helicase           63            0            0.00                 0            0.00
ABC transporter             61            0            0.00                 1            1.64
Receptor L                  62            1            1.61                 1            1.61
Sre chemoreceptor           56            0            0.00                 0            0.00
Sru chemoreceptor           48            0            0.00                 0            0.00
Insulin                     38            0            0.00                 0            0.00
Glycosyl hydrolase          37            1            2.70                 0            0.00
Tyr protein kinase          37            0            0.00                 0            0.00
Galectin                    24            0            0.00                 0            0.00

Figure 2.1. Detection of a 50-kb homozygous viable deletion in gkDf2. (a) Normalized log2 ratios (gkDf2/WT) of the average fluorescence intensities for each of the 92,209 forward and reverse pairs of probes to the X chromosome are represented by circles.
The deletion is identified by negative log2 ratios and indicated by an arrow. (b) A higher resolution view of fluorescence ratios for probe pairs targeting the 50-kb deletion. Horizontal bars indicate the positions of the nine genes targeted by the deletion. Duplications of sequences flanking the deletion are indicated by positive log2 ratios. Adjacent 50-mer probes in this region overlap by as much as 44 bp.

Figure 2.2. Detection of a 1047-bp homozygous viable deletion in ceh-39 (gk329). (a) Normalized log2 ratios (gk329/WT) for the average fluorescence intensities for all probe pairs to the X chromosome are shown. The arrow indicates the deletion. (b) Intensity ratios for probes to ceh-39, ceh-21, and ceh-41 are shown with WormBase gene models to illustrate probe coverage in exons near the deletion. Sequenced deletion breakpoints are indicated by dotted lines. aCGH accurately identified the left breakpoint between exons 3 and 4 of ceh-39.

Figure 2.3. Comparison of the normalized average fluorescence ratios (XO male / XX hermaphrodite) for all probe pairs to chromosomes II and X. The graph in the top right corner plots the probe density versus the log2 fluorescence ratio for probes to chromosomes II and X. The curve peaking on the left is for the X chromosome and the curve peaking on the right is for chromosome II. The distributions overlap by ∼4%.

Figure 2.4. Detection of the 1202-bp deletion in dab-1 (gk291) in a wash-sampled balanced heterozygous population. (a) The normalized log2 ratios [(dab-1(-)/mIn1)/WT] of the average fluorescence intensities for probe pairs to chromosome II are plotted. The arrow indicates the dab-1 deletion (other features are discussed in the text). (b) Normalized fluorescence ratios for probe pairs targeting dab-1 are shown. Sequenced deletion breakpoints are indicated by dotted lines and were accurately predicted by aCGH. The left breakpoint lies within the second intron.
Overlapping probes targeting the right breakpoint span just 73 bp, allowing resolution of the right breakpoint to within fewer than 50 bp.

Figure 2.5. Deletions detected in a screen for homozygous lethal mutations in six wash-sampled balanced heterozygous populations. The following normalized log2 ratios are shown: (a) (gk463/mIn1)/WT; (b) (gk460/mIn1)/WT; (c) (gk462/mIn1)/WT; (d) (gk465/mIn1)/WT; (e) (gk488/mIn1)/WT; and (f) (gk487/mIn1)/WT.

Figure 2.6. Whole-genome aCGH comparing Hawaiian (CB4856) and Bristol N2 (VC196) hermaphrodites. Large-scale copy number polymorphism is evident between these two wild-type isolates. Normalized log2 fluorescence ratios (CB4856/N2) for all probes on the chip are shown.

Figure 2.7. Whole-genome aCGH comparing Madeiran (JU258) and Bristol N2 (VC196) hermaphrodites. Large-scale copy number polymorphism is evident between these two wild-type isolates. The distribution of deletions both within and between chromosomes is similar to that seen in the Hawaiian strain (CB4856; Figure 2.6). Normalized log2 fluorescence ratios (JU258/N2) for all probes on the chip are shown.

Figure 2.8. A homozygous viable deletion identified on chromosome V in the Hawaiian strain (CB4856). Normalized log2 ratios (CB4856/N2) indicate that the deletion targets the genes C49G7.1 and D1065.3.

2.5. References

Barstead, R.J. 1999. Reverse Genetics. In C. elegans: A Practical Approach (ed. I.A. Hope), pp. 97-118. Oxford University Press, Oxford, UK.
Bartholomew, G. and B. Bailey. 1994. Maui Remembers: A Local History. Mutual Publishing, Honolulu, Hawaii.
Brenner, S. 1974. The genetics of Caenorhabditis elegans. Genetics 77: 71-94.
Carvalho, B., E. Ouwerkerk, G.A. Meijer, and B. Ylstra. 2004. High resolution microarray comparative genomic hybridisation analysis using spotted oligonucleotides. J Clin Pathol 57: 644-646.
Chen, N., S. Pai, Z. Zhao, A. Mah, R. Newbury, R.C. Johnsen, Z. Altun, D.G. Moerman, D.L. Baillie, and L.D. Stein. 2005.
Identification of a nematode chemosensory gene family. Proc Natl Acad Sci U S A 102: 146-151.
Cleveland, W.S., E. Grosse, and M.J. Shyu. 1992. A Package of C and Fortran Routines for Fitting Local Regression Models. Chapman and Hall, Ltd., London, UK.
Conrad, D.F., T.D. Andrews, N.P. Carter, M.E. Hurles, and J.K. Pritchard. 2006. A high-resolution survey of deletion polymorphism in the human genome. Nat Genet 38: 75-81.
Denver, D.R., K. Morris, and W.K. Thomas. 2003. Phylogenetics in Caenorhabditis elegans: an analysis of divergence and outcrossing. Mol Biol Evol 20: 393-400.
Dhami, P., A.J. Coffey, S. Abbs, J.R. Vermeesch, J.P. Dumanski, K.J. Woodward, R.M. Andrews, C. Langford, and D. Vetrie. 2005. Exon Array CGH: Detection of Copy-Number Changes at the Resolution of Individual Exons in the Human Genome. Am J Hum Genet 76: 750-762.
Edgley, M.L. and D.L. Riddle. 2001. LG II balancer chromosomes in Caenorhabditis elegans: mT1(II;III) and the mIn1 set of dominantly and recessively marked inversions. Mol Genet Genomics 266: 385-395.
Guindon, S. and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52: 696-704.
Haber, M., M. Schungel, A. Putz, S. Muller, B. Hasert, and H. Schulenburg. 2005. Evolutionary history of Caenorhabditis elegans inferred from microsatellites: evidence for spatial and temporal genetic differentiation and the occurrence of outbreeding. Mol Biol Evol 22: 160-173.
Hinds, D.A., A.P. Kloek, M. Jen, X. Chen, and K.A. Frazer. 2006. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat Genet 38: 82-85.
Ishkanian, A.S., C.A. Malloff, S.K. Watson, R.J. DeLeeuw, B. Chi, B.P. Coe, A. Snijders, D.G. Albertson, D. Pinkel, M.A. Marra, V. Ling, C. MacAulay, and W.L. Lam. 2004. A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet 36: 299-303.
Kallioniemi, A., O.P. Kallioniemi, D. Sudar, D. Rutovitz, J.W. Gray, F. Waldman, and D.
Pinkel. 1992. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 258: 818-821.
Locke, D.P., A.J. Sharp, S.A. McCarroll, S.D. McGrath, T.L. Newman, Z. Cheng, S. Schwartz, D.G. Albertson, D. Pinkel, D.M. Altshuler, and E.E. Eichler. 2006. Linkage disequilibrium and heritability of copy-number polymorphisms within duplicated regions of the human genome. Am J Hum Genet 79: 275-290.
Lucito, R., J. Healy, J. Alexander, A. Reiner, D. Esposito, M. Chi, L. Rodgers, A. Brady, J. Sebat, J. Troge, J.A. West, S. Rostan, K.C. Nguyen, S. Powers, K.Q. Ye, A. Olshen, E. Venkatraman, L. Norton, and M. Wigler. 2003. Representational oligonucleotide microarray analysis: a high-resolution method to detect genome copy number variation. Genome Res 13: 2291-2305.
Lupski, J.R. 1998. Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genet 14: 417-422.
Mantripragada, K.K., P.G. Buckley, T.D. de Stahl, and J.P. Dumanski. 2004. Genomic microarrays in the spotlight. Trends Genet 20: 87-94.
Markham, N.R. 2003. Hybrid: a software system for nucleic acid folding, hybridizing and melting predictions. Rensselaer Polytechnic Institute, Troy, NY, USA.
McCarroll, S.A., T.N. Hadnott, G.H. Perry, P.C. Sabeti, M.C. Zody, J.C. Barrett, S. Dallaire, S.B. Gabriel, C. Lee, M.J. Daly, and D.M. Altshuler. 2006. Common deletion polymorphisms in the human genome. Nat Genet 38: 86-92.
Olson, M.V. 1999. When less is more: gene loss as an engine of evolutionary change. Am J Hum Genet 64: 18-23.
Pinkel, D., R. Segraves, D. Sudar, S. Clark, I. Poole, D. Kowbel, C. Collins, W.L. Kuo, C. Chen, Y. Zhai, S.H. Dairkee, B.M. Ljung, J.W. Gray, and D.G. Albertson. 1998. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet 20: 207-211.
Pollack, J.R., T. Sorlie, C.M. Perou, C.A. Rees, S.S. Jeffrey, P.E. Lonning, R. Tibshirani, D.
Botstein, A.L. Borresen-Dale, and P.O. Brown. 2002. Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc Natl Acad Sci U S A 99: 12963-12968.
Sebat, J., B. Lakshmi, J. Troge, J. Alexander, J. Young, P. Lundin, S. Maner, H. Massa, M. Walker, M. Chi, N. Navin, R. Lucito, J. Healy, J. Hicks, K. Ye, A. Reiner, T.C. Gilliam, B. Trask, N. Patterson, A. Zetterberg, and M. Wigler. 2004. Large-scale copy number polymorphism in the human genome. Science 305: 525-528.
Selzer, R.R., T.A. Richmond, N.J. Pofahl, R.D. Green, P.S. Eis, P. Nair, A.R. Brothman, and R.L. Stallings. 2005. Analysis of chromosome breakpoints in neuroblastoma at sub-kilobase resolution using fine-tiling oligonucleotide array CGH. Genes Chromosomes Cancer 44: 305-319.
Solinas-Toldo, S., S. Lampel, S. Stilgenbauer, J. Nickolenko, A. Benner, H. Dohner, T. Cremer, and P. Lichter. 1997. Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes Chromosomes Cancer 20: 399-407.
Stallings, R.L., P. Nair, J.M. Maris, D. Catchpoole, M. McDermott, A. O'Meara, and F. Breatnach. 2006. High-resolution analysis of chromosomal breakpoints and genomic instability identifies PTPRD as a candidate tumor suppressor gene in neuroblastoma. Cancer Res 66: 3673-3680.
Stein, L.D., Z. Bao, D. Blasiar, T. Blumenthal, M.R. Brent, N. Chen, A. Chinwalla, L. Clarke, C. Clee, A. Coghlan, A. Coulson, P. Eustachio, D.H.A. Fitch, L.A. Fulton, R.E. Fulton, S. Griffiths-Jones, T.W. Harris, L.W. Hillier, R. Kamath, P.E. Kuwabara, E.R. Mardis, M.A. Marra, T.L. Miner, P. Minx, J.C. Mullikin, R.W. Plumb, J. Rogers, J.E. Schein, M. Sohrmann, J. Spieth, J.E. Stajich, C. Wei, D. Willey, R.K. Wilson, R. Durbin, and R.H. Waterston. 2003. The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics. PLoS Biology 1: e45.
Stewart, M.K., N.L. Clark, G. Merrihew, E.M. Galloway, and J.H. Thomas. 2005.
High genetic diversity in the chemoreceptor superfamily of Caenorhabditis elegans. Genetics 169: 1985-1996.
Strefford, J.C., F.W. van Delft, H.M. Robinson, H. Worley, O. Yiannikouris, R. Selzer, T. Richmond, I. Hann, T. Bellotti, M. Raghavan, B.D. Young, V. Saha, and C.J. Harrison. 2006. Complex genomic alterations and gene expression in acute lymphoblastic leukemia with intrachromosomal amplification of chromosome 21. Proc Natl Acad Sci U S A 103: 8167-8172.
Thomas, J.H. 2006a. Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants. Genome Res 16: 1017-1030.
Thomas, J.H. 2006b. Analysis of homologous gene clusters in Caenorhabditis elegans reveals striking regional cluster domains. Genetics 172: 127-143.
Thomas, J.H., J.L. Kelley, H.M. Robertson, K. Ly, and W.J. Swanson. 2005. Adaptive evolution in the SRZ chemoreceptor families of Caenorhabditis elegans and Caenorhabditis briggsae. Proc Natl Acad Sci U S A 102: 4476-4481.
Thompson, J.D., D.G. Higgins, and T.J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680.
Urban, A.E., J.O. Korbel, R. Selzer, T. Richmond, A. Hacker, G.V. Popescu, J.F. Cubells, R. Green, B.S. Emanuel, M.B. Gerstein, S.M. Weissman, and M. Snyder. 2006. High-resolution mapping of DNA copy alterations in human chromosome 22 using high-density tiling oligonucleotide arrays. Proc Natl Acad Sci U S A 103: 4534-4539.
Wicks, S.R., R.T. Yeh, W.R. Gish, R.H. Waterston, and R.H. Plasterk. 2001. Rapid gene mapping in Caenorhabditis elegans using a high density polymorphism map. Nat Genet 28: 160-164.
Yang, Y.H., S. Dudoit, P. Luu, D.M. Lin, V. Peng, J. Ngai, and T.P. Speed. 2002. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res 30: e15.
Zhang, Z., S.
Schwartz, L. Wagner, and W. Miller. 2000. A greedy algorithm for aligning DNA sequences. J Comput Biol 7: 203-214.

3. De novo identification of single nucleotide mutations in Caenorhabditis elegans using array Comparative Genomic Hybridization²

3.1. Introduction

A major roadblock in genetic research lies in the molecular identification of mutations responsible for an observed phenotype. Traditional positional cloning techniques are laborious, time-consuming and sometimes impractical for mapping mutations to regions smaller than a few Mbp, particularly in regions with low recombination frequencies such as the centers of C. elegans chromosomes (Barnes et al. 1995). Sequencing such a large region still remains impractical for most laboratories and as a result many mutations remain uncharacterized. aCGH has been used to detect many types of genome diversity in a variety of organisms (Gresham et al. 2008), including single nucleotide variation in the 12.5 Mb yeast genome using short 25-mer probes (Gresham et al. 2006). We have been using aCGH with exon-centric tiling arrays of 50-mer oligonucleotide probes to screen for deletions in the C. elegans genome following mutagenesis with trimethylpsoralen (TMP) and ultraviolet (UV) irradiation (Maydan et al. 2007). Here we demonstrate the use of 50-mer probes to detect single nucleotide mutations in the 100 Mb C. elegans genome.

3.2. Results

3.2.1. Novel single-nucleotide mutations detected utilizing an exon-centric chromosome II microarray

In one set of experiments utilizing a microarray with probes targeting primarily exons on C. elegans chromosome II, we screened individuals homozygous for a mutagenized chromosome II. In these experiments we identified three statistically significant putative mutations (P-values ranged from 2.7 x 10⁻⁵ to 1.8 x 10⁻¹⁴ according to one-sample t-tests).
These putative mutations affected just a few adjacent overlapping probes and produced modest signals comparable to those normally observed for heterozygous deletions. We hypothesized that very small homozygous mutations (much shorter than the length of a probe) could produce signals of this magnitude. The mutations would have to be very small in order to target only a few overlapping probes and permit some hybridization of complementary sequence to the array. Mutations of this size would not have produced statistically significant signals on our whole-genome tiling arrays because each mutation would affect only one or two probes.

² A version of this chapter has been accepted for publication. Maydan, J.S., H.M. Okada, S. Flibotte, M.L. Edgley and D.G. Moerman. In Press. De novo identification of single nucleotide mutations in C. elegans using array Comparative Genomic Hybridization. Genetics.

Our hypothesis was confirmed when PCR and DNA sequencing identified single nucleotide mutations in all three mutants. The strain VC10078 carries gk802, an A→T transversion allele of syd-1 at II: 7586645 (see Fig. 3.1), causing a non-conservative amino acid substitution (I(887) → K); VC10079 contains allele gk803, an A→G transition at nucleotide II: 10825740 which results in a synonymous base pair substitution in mix-1 at the 3rd position of a codon for leucine (CUA → CUG); and VC10077 carries gk801, an allele with two closely linked mutations in Y46E12BL.2: a G→A transition at II: 15240024, causing a conservative amino acid substitution (V(714) → I), and an A→G transition at II: 15240052 resulting in a non-conservative amino acid substitution (Y(723) → C).

3.2.2. Single-nucleotide mutations detected in 13 strains with previously mapped mutations

Dense tiling with oligonucleotides is necessary to obtain sufficient statistical power to detect single nucleotide alterations.
In a previous study we showed that a window of about 20 bases in each 50-mer probe contains a strong log2 ratio signal (see Figure 1 in Flibotte et al. 2009), and since we require about four probes to target the mutated site, this allows a maximum probe spacing of about five bases. The plot in this figure also shows that it would be useful to target both strands and use the small shift in the peak position on opposite strands to help distinguish SNPs from artifacts. Utilizing these probe spacing guidelines, we conducted an additional 13 aCGH experiments comparing homozygous mutants to their parental strains, using 50-mer oligonucleotide microarrays probing regions from 0.65 – 2.60 Mb in length (see Methods) that are known to include unidentified mutations based on prior mapping experiments. From these experiments we selected 58 candidate single nucleotide mutations on the basis of visual inspection of the data and identification using a segmentation algorithm (Maydan et al. 2007; see Appendix 3) or a sliding window technique. We then performed PCR and DNA sequencing in order to gauge the accuracy of our mutation predictions. For each candidate mutation, we calculated a SNP Score by averaging the log2 fluorescence ratios (mutant / wild-type [WT]) in a small window containing probes putatively affected by the mutation, and renormalizing by subtracting from that the average log2 ratio in the immediate flanking regions. This renormalization is necessary to account for local bias, which varies both within and among experiments and makes the detection of SNPs more difficult, since artifacts associated with a strong local bias in log2 ratio could easily be confused with the signature expected for a SNP. In contrast to previous observations that mutations near the centers of 25-mer probes are most inhibitory to efficient hybridization (Sharp et al.
2007), we observed that mutations located away from the glass slide and freely floating in the solution closer to the 5’ ends of our 50-mer probes produced a larger perturbation to the hybridization process, with a maximum perturbation at seven bases in from the 5’ end (probably due to steric effects; Figure 1 in Flibotte et al. 2009). The location of the window used to calculate the score reflects this observation. This sensitivity to mutations at the 5’-end of NimbleGen probes has also been observed by Wei et al. (2008). The sequencing results (summarized in Figure 3.2a) confirmed the presence of a single nucleotide mutation in 16 of the candidates for an overall success rate (specificity) of 28%. All mutations were either C-to-T or G-to-A transitions, as expected from EMS mutagenesis. The locations of the mutations were usually predicted to within less than 10 bp of their true positions, and to within 1 bp in one case.

3.2.3. The sensitivity of single nucleotide polymorphism detection using aCGH

In order to estimate the sensitivity of our single nucleotide mutation detection technique, we performed aCGH experiments to test our ability to detect 2639 known single nucleotide polymorphisms (SNPs) in the CB4856 strain isolated in Hawaii (see Methods for array design details). Examples of all possible transitions and transversions were detected. The SNP detection sensitivity is shown in Figure 3.2b for various thresholds in the SNP Score. At the reasonable threshold of -0.45 the specificity (the percentage of predicted SNPs that are real) would be 31% with a sensitivity (the percentage of real SNPs that are successfully detected) of 37%. In other words, with the current SNP detection technique one could expect to detect roughly one out of every three SNPs present in the targeted region and one will have to sequence roughly three candidates in order to detect a real SNP.
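The SNP Score and the success rates quoted above can be illustrated with a minimal Python sketch. It assumes plus-strand 50-mer probes indexed by the genomic coordinate of their free 5' base, a signal window at probe positions 5-17 and 50-base flanks as stated in the Methods; the function names, the dictionary layout and the exact way the flanking probes are chosen are illustrative assumptions, not the code used in this work.

```python
from statistics import mean

WINDOW_FIRST, WINDOW_LAST = 5, 17  # probe positions (from the free 5' end) used for the signal
FLANK = 50                         # width in bases of each flanking region


def snp_score(log2_by_start, snp_pos):
    """Adjusted mean log2 ratio for a candidate SNP at genomic position snp_pos.

    log2_by_start maps the coordinate of each probe's 5' base (plus strand
    assumed) to its normalized log2 ratio. Raises StatisticsError if no
    probe falls in the signal window or in the flanks.
    """
    # Probe starts that place the SNP at probe positions 5..17.
    lo = snp_pos - WINDOW_LAST + 1
    hi = snp_pos - WINDOW_FIRST + 1
    signal, flank = [], []
    for start, ratio in log2_by_start.items():
        if lo <= start <= hi:
            signal.append(ratio)
        elif lo - FLANK <= start < lo or hi < start <= hi + FLANK:
            # Assumption: flanks are the 50 start positions on either side.
            flank.append(ratio)
    return mean(signal) - mean(flank)


def confirmed_fraction(confirmed, sequenced):
    """Fraction of sequenced candidates that were real (called specificity in the text)."""
    return confirmed / sequenced


# Synthetic example: a local bias of +0.1 with a dip to -0.6 at the SNP window.
probes = {start: 0.1 for start in range(900, 1101)}
for start in range(1000 - 16, 1000 - 4 + 1):
    probes[start] = -0.6
print(round(snp_score(probes, 1000), 2))
print(round(100 * confirmed_fraction(16, 58)))       # all 58 sequenced candidates
print(round(100 * confirmed_fraction(16, 16 + 36)))  # candidates scoring below -0.45
```

Subtracting the flank mean is what removes the local bias: in the synthetic example the raw window mean is -0.6, while the score relative to its surroundings is -0.7.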
As expected, the SNP detection sensitivity of the current technique depends on the type of transition or transversion being investigated, and as can be seen in Figure 3.2c the sensitivity reaches around 50% for the most commonly induced EMS mutations (C-to-T and G-to-A).

3.3. Discussion

3.3.1. Limitations to SNP discovery using aCGH

The optimal probe length for single nucleotide mutation detection by aCGH is unclear and likely depends on the hybridization conditions. Single nucleotide mutations should have a greater impact on hybridization to shorter oligonucleotides, but longer oligonucleotides allow a greater number of overlapping probes to target a given single nucleotide mutation, and arrays with longer oligonucleotides tend to have better standard deviations in log2 ratios (Sharp et al. 2007). Further experiments will need to be done to determine the optimal probe length to achieve the greatest sensitivity and specificity as a function of the size of the targeted region; such an optimal length will probably vary with the complexity of the genome being studied. Although this technique is particularly well suited to detecting SNPs generated by EMS mutagenesis, some single nucleotide mutations may not be detectable by aCGH even with higher probe densities than we have used here. We suspected that some of the Hawaiian SNPs that we failed to detect might have been missed because they were found in regions with significant homology to other regions of the genome. In these cases, multiple regions of the genome could have hybridized to our probes, making it unlikely that the effect of a SNP on the log2 ratios would be detectable. However, filtering the oligonucleotide properties according to our best practices and standard microarray design recommendations (Flibotte and Moerman 2008) failed to improve the SNP detection sensitivity.
It is also possible that SNPs are more difficult to detect with aCGH when present in the background of the Hawaiian genome, which contains significant structural variation relative to the N2 reference genome (Maydan et al. 2007); consequently, for a more typical SNP detection experiment the sensitivity of the technique might be slightly better than what we have reported here. However, limiting the analysis to SNPs that are located far away from other known mutations did not improve the SNP detection sensitivity. Lastly, we have not yet attempted to detect heterozygous single nucleotide mutations using this technique, but this would be nearly impossible to accomplish with current microarrays.

3.3.2. Suggestions to improve the sensitivity and specificity of SNP detection by aCGH

The ability of aCGH to detect homozygous single nucleotide mutations in addition to deletions and duplications makes it possible to quickly and affordably identify mutations mapped by traditional positional cloning approaches. A clear example of the feasibility of this technique is demonstrated in the study by O’Meara et al. (in press), in which two single base lesions were mapped to the promoter of the C. elegans gene cog-1 using aCGH. We recommend a maximum probe spacing of no more than five bp in order to have a reasonable chance at successful SNP detection with this technique. This probe spacing corresponds to about two Mbp of genomic sequence on a microarray with 380,000 probes, the oligo capacity of the chips we used in this study. We prefer to apply this SNP detection technique to situations where the mutation is mapped to a maximum of a one Mbp region, as this provides denser coverage of the mutation site and allows us to target both strands. Further reducing the size of the candidate region should improve the likelihood of successful base change detection as more probes target any specific base.
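The relationship between array capacity, probe spacing and the size of the region that can be tiled is simple arithmetic, sketched below in Python. The function name is hypothetical, and reading the one Mbp preference as "five bp spacing on each of two strands" is an interpretation rather than something stated explicitly in the text.

```python
ARRAY_CAPACITY = 380_000  # probes per array, as stated in the text


def tiled_span_bp(n_probes, spacing_bp, strands=1):
    """Genomic span (bp) that n_probes can tile at spacing_bp on each strand."""
    probes_per_strand = n_probes // strands
    return probes_per_strand * spacing_bp


# One strand at five bp spacing: 1,900,000 bp, i.e. about two Mbp.
print(tiled_span_bp(ARRAY_CAPACITY, 5))

# Both strands at five bp spacing each: 950,000 bp, i.e. about one Mbp.
print(tiled_span_bp(ARRAY_CAPACITY, 5, strands=2))
```

Excluding sequence from the candidate region lowers the span that must be covered, so the same probe budget yields a proportionally smaller spacing.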
If any sequences in the mapped region can be excluded (such as non-coding DNA, repeat elements or genes which can be ruled out as candidate genes) the probe density can be further increased in the remaining regions of interest. Of course, it is possible to use more than one microarray to probe the candidate region if the region is too large to achieve the desired probe density on a single array. When the search region is small enough to allow very high density tiling, one can take advantage of the fact that the effect of a SNP on hybridization is dependent on its position in the probe by including probes that target both strands, and then primarily pursuing candidates showing a small shift between the plus and minus strand log2 ratio profiles. Targeting both strands for this purpose should result in fewer false positives.

3.3.3. Online resources for SNP detection using aCGH

In order to make the current SNP detection technique more accessible, we have mounted a web application, built by H.M. Okada, for designing oligonucleotide microarrays. The application can be found at http://hokkaido.bcgsc.ca/SNPdetection/. Downstream analysis tools to calculate and normalize the log2 ratios are also available on this web site. Given the criteria set by the user, such as the probe target region and strand(s), the oligonucleotides are selected so as to distribute the probes evenly across the selected region. Probes are selected to avoid repeat regions, non-coding regions (optional), and probe sequences that cannot be synthesized due to the cycle number constraint in NimbleGen’s manufacturing process. Once the criteria have been selected, the file is sent to the user in a format ready for submission to NimbleGen. Currently the probe selection application has been set to support the C. elegans and Drosophila melanogaster genomes, and genomes from other species will be added upon request.

3.3.4.
SNP detection by aCGH as an alternative to high-throughput sequencing

With the advent of whole genome sequencing using new high-throughput sequencing machines (Hillier et al. 2008) it might be asked if SNP detection on microarrays is a reasonable technique for mutation detection. Deep sequencing may become the method of choice in the future, but for now our method is easier to perform, especially given that we have provided a website for oligo design and data analysis. Our method involves less labor and the aCGH work can be outsourced to NimbleGen. For the time being our method is less expensive but this may change in the future. Genetic mapping of mutations remains essential. For our SNP detection method, one needs to do initial mapping to limit the mutation of interest to a small region of the genome. Although deep sequencing without prior genetic mapping is possible, one must then determine which of several hundred changes in the genome is the causative mutation (Hillier et al. 2008; David Spencer, personal communication).

3.4. Methods

3.4.1. Mutagenesis

A mixed-stage population of VC1415 (unc-4(e120)/mIn1[mIs14 dpy-10(e128)] II) was subjected to mutagenesis with TMP at 10 µg/ml for 1 hour followed by UV irradiation for 90 seconds at 340 µW/cm², and then placed on food at 20° C. Both unc-4 and dpy-10 mutations are recessive, and the mIn1 inversion suppresses recombination along the middle of chromosome II from lin-31 to rol-1 (Edgley and Riddle 2001); the mIs14 element confers a semi-dominant GFP signal confined to the pharyngeal muscle. After 48 hours, 30 gravid WT GFP+ P0 adults were singly picked onto 60mm Petri plates and allowed to self. Seven WT GFP+ F1 progeny were singly picked from each parent for a total of 210 clones, from which 100 were selected that segregated viable fertile Unc-4 F2 progeny.
Single gravid Unc-4 progeny were picked from each of these plates and used to establish 100 new populations homozygous for unc-4 and any newly induced mutations within the genetic interval balanced by mIn1. The 13 mutant strains with previously mapped mutations were generated by standard ethyl methanesulfonate (EMS) mutagenesis, which yields approximately one single nucleotide mutation every 100 – 400 kb (Anderson 1995; Cuppen et al. 2007), and then serially backcrossed with their parental strains prior to this work.

3.4.2. Nematode culturing and DNA preparation

Nematodes were grown on NGM agar plates spread with a lawn of Escherichia coli strain OP50 or χ1666. Nematode populations were grown to starvation on three 60mm Petri plates, harvested by washing, centrifugation and aspiration of supernatant, and frozen at -80° C in 2.5 volumes of worm lysis buffer (50 mM KCl; 10 mM Tris-HCl, pH 8.3; 2.5 mM MgCl2; 0.45% NP-40 (Igepal); 0.45% Tween-20; 0.01% gelatin; 300 µg/ml Proteinase K). Crude lysates were prepared from frozen samples by incubation at 65° C for two hours. Genomic DNA was prepared from the lysates as described previously by Maydan et al. (2007).

3.4.3. Probe selection, array design and aCGH

The filters used to select the 50-mer oligonucleotides for the exon-centric chromosome II chip have been described by Maydan et al. (2007; see section 2.4.1). Microarrays for the 13 previously mapped mutations were designed by tiling the target regions with equally spaced overlapping 50-mer oligonucleotides without any filtering except for the elimination of the repeats listed in WormBase and the exclusion of probes that were not possible due to the cycle number constraint in the microarray manufacturing process. The earlier arrays were designed using WormBase data freeze version WS170 while the more recent designs used WS180.
A single 380,000-oligonucleotide array was designed for each region of interest except for one experiment where two arrays were used to cover a genomic region 4.9 Mb wide. The probe spacing, i.e. the distance between the 5’ ends of consecutive probes, on these arrays ranged from 1-5 bp. Unlike our previous exon-centric arrays, no other constraints were applied to the oligonucleotides. From all the CB4856 SNPs present in WormBase data freeze WS170, we selected 2639 that were far enough from all the known mutations in that strain to minimize the presence of mutations in the immediate flanking regions of the selected SNPs. Once again, the only filter used in the design process was to eliminate the known repeats. Each SNP was represented on the array by a maximum of 150 50-mer oligonucleotides spaced one bp apart: up to 50 oligonucleotides affected by the mutation and up to 50 oligonucleotides for each immediate left and right flanking region. For each SNP the set of probes alternated between the sequence from the plus and minus strand templates; thus for a given strand the minimum spacing between probes was equal to two bases. For the CB4856 experiment we performed dye-flip hybridizations in order to evaluate the Cy3/Cy5 bias; therefore each SNP log2 ratio profile was measured four times, with two separate hybridizations and on both strands each time. Microarray manufacture, DNA sample handling, labeling with Cy3 (mutants and CB4856) or Cy5 (WT N2 [VC196] reference), hybridization, imaging and fluorescence intensity extraction were performed by Roche NimbleGen, Inc. (Selzer et al. 2005). Oligonucleotides were synthesized at random positions on all arrays.

3.4.4. Data analysis and mutation detection

Log2 fluorescence ratios (Cy3/Cy5) were calculated and normalized as previously described (Maydan et al. 2007; see Section 2.4.5).
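The per-SNP probe layout described in Section 3.4.3 for the CB4856 array (up to 150 overlapping 50-mers at one bp spacing, alternating between strands) can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the function name is hypothetical, `start` denotes the lowest genomic coordinate covered by a probe regardless of strand, and no repeat filtering is applied, so the sketch always returns the maximum of 150 probes.

```python
PROBE_LEN = 50  # 50-mer probes, as in the text


def snp_probe_layout(snp_pos):
    """Return (start, strand, covers_snp) tuples for the probes around one SNP.

    Starts run from the left flank (50 probes), through the 50 probes that
    contain the SNP, to the right flank (50 probes), at one bp spacing.
    """
    layout = []
    first_start = snp_pos - (2 * PROBE_LEN - 1)  # first probe of the left flank
    for i in range(3 * PROBE_LEN):
        start = first_start + i
        strand = "+" if i % 2 == 0 else "-"      # alternate strand templates
        covers_snp = start <= snp_pos < start + PROBE_LEN
        layout.append((start, strand, covers_snp))
    return layout


layout = snp_probe_layout(1000)
print(len(layout))                               # total probes for this SNP
print(sum(covers for _, _, covers in layout))    # probes affected by the SNP
plus_starts = [start for start, strand, _ in layout if strand == "+"]
print(plus_starts[1] - plus_starts[0])           # spacing within one strand
```

Because consecutive probes alternate strands, each strand on its own is tiled at two bp spacing, which is why the sensitivity analysis treats each strand profile separately.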
Many initial experiments were performed using the same chromosome II array design, which allowed an approximate determination (by simply averaging) and subsequent subtraction of local bias in the log2 ratio signal for individual experiments. The signature of a SNP in the log2 ratio signal is similar to that of a deletion except that the log2 ratio shows only a modest reduction for the affected probes and of course only a few probes are affected. Mutation candidates were selected by analyzing the aCGH data by visual inspection and use of a segmentation algorithm (Maydan et al. 2007; Appendix 3) or a sliding window technique. The SNP Score, or adjusted mean log2 ratio, corresponds to the average log2 ratio of the probes where the mutation is located in a window 13 bases wide (covering positions 5-17) near the 5’-end of the 50-mer oligonucleotide that is away from the slide and therefore freely floating in the solution, and is then renormalized by subtracting the mean of the log2 ratio in the immediate left and right 50-base wide flanking regions for oligonucleotides not overlapping this window. To test each mutation candidate, PCR was used to amplify products of a few hundred base pairs surrounding the candidate regions. DNA sequencing of these products precisely identified the mutations. When calculating the SNP detection sensitivity in the CB4856 experiments, each of the four log2 ratio measurements was considered separately because each profile is associated with an oligonucleotide spacing of two bp, which is more representative of the SNP detection experiments we used to evaluate the specificity of the technique. We could have averaged the four profiles to reduce the standard deviation before calculating the sensitivity, but this would not have allowed a direct and meaningful comparison with the data from our SNP detection experiments.

Figure 3.1. Novel detection of an A→T transversion in syd-1.
Normalized log2 ratios of fluorescence intensities (mutant / wild-type) are plotted as  at the first (5’) free-floating base of each 50-mer probe. The length of each probe targeted by the SNP is illustrated by a horizontal bar, and the position of the SNP is indicated by an *. Multiple adjacent overlapping probes targeted the point mutation, so its effect on hybridization was assayed several times. Aberrant fluorescence ratios at probes targeting the SNP stand out from nearby probes targeting wild-type sequence.

Figure 3.2. Estimation of the sensitivity and specificity of the current SNP detection technique. (a) The SNP Score (see Methods) is shown for the 58 candidate SNPs we have sequenced with the candidates ordered according to their score. The  and  symbols represent the candidates confirmed and not confirmed by sequencing, respectively. For example, a score smaller than -0.45 would include all the 16 confirmed cases and 36 non-confirmed candidates, corresponding to a specificity of 31%. (b) The detection sensitivity for the SNPs in the CB4856 (Hawaiian) experiments is shown as a function of the threshold in the SNP Score. Using a threshold of -0.45 as before would correspond to a sensitivity of 37%. (c) The sensitivity is shown separately for each transition and transversion type when using the same threshold of -0.45.

3.5. References

Anderson, P. 1995. Mutagenesis. Methods Cell Biol 48: 31-58.
Barnes, T.M., Y. Kohara, A. Coulson, and S. Hekimi. 1995. Meiotic recombination, noncoding DNA and genomic organization in Caenorhabditis elegans. Genetics 141: 159-179.
Cuppen, E., E. Gort, E. Hazendonk, J. Mudde, J. van de Belt, I.J. Nijman, V. Guryev, and R.H.A. Plasterk. 2007. Efficient target-selected mutagenesis in Caenorhabditis elegans: Toward a knockout for every gene. Genome Res. 17: 649-658.
Edgley, M.L. and D.L. Riddle. 2001.
LG II balancer chromosomes in Caenorhabditis elegans: mT1(II;III) and the mIn1 set of dominantly and recessively marked inversions. Mol Genet Genomics 266: 385-395.
Flibotte, S., M.L. Edgley, J. Maydan, J. Taylor, R. Zapf, R. Waterston, and D.G. Moerman. 2009. Rapid High Resolution Single Nucleotide Polymorphism-Comparative Genome Hybridization Mapping in Caenorhabditis elegans. Genetics 181: 33-37.
Flibotte, S. and D.G. Moerman. 2008. Experimental analysis of oligonucleotide microarray design criteria to detect deletions by comparative genomic hybridization. BMC Genomics 9: 497.
Gresham, D., M.J. Dunham, and D. Botstein. 2008. Comparing whole genomes using DNA microarrays. Nat Rev Genet 9: 291-302.
Gresham, D., D.M. Ruderfer, S.C. Pratt, J. Schacherer, M.J. Dunham, D. Botstein, and L. Kruglyak. 2006. Genome-Wide Detection of Polymorphisms at Nucleotide Resolution with a Single DNA Microarray. Science 311: 1932-1936.
Hillier, L.W., G.T. Marth, A.R. Quinlan, D. Dooling, G. Fewell, D. Barnett, P. Fox, J.I. Glasscock, M. Hickenbotham, W. Huang, V.J. Magrini, R.J. Richt, S.N. Sander, D.A. Stewart, M. Stromberg, E.F. Tsung, T. Wylie, T. Schedl, R.K. Wilson, and E.R. Mardis. 2008. Whole-genome sequencing and variant discovery in C. elegans. Nat Methods 5: 183-188.
Maydan, J.S., S. Flibotte, M.L. Edgley, J. Lau, R.R. Selzer, T.A. Richmond, N.J. Pofahl, J.H. Thomas, and D.G. Moerman. 2007. Efficient high-resolution deletion discovery in Caenorhabditis elegans by array comparative genomic hybridization. Genome Res. 17: 337-347.
O'Meara M.M., H. Bigelow, S. Flibotte, J.F. Etchberger, D.G. Moerman, and O. Hobert. In press. Cis-regulatory mutations in the C. elegans homeobox gene locus cog-1 affect neuronal development. Genetics.
Selzer, R.R., T.A. Richmond, N.J. Pofahl, R.D. Green, P.S. Eis, P. Nair, A.R. Brothman, and R.L. Stallings. 2005. Analysis of chromosome breakpoints in neuroblastoma at sub-kilobase resolution using fine-tiling oligonucleotide array CGH.
Genes Chromosomes Cancer 44: 305-319.
Sharp, A.J., A. Itsara, Z. Cheng, C. Alkan, S. Schwartz, and E.E. Eichler. 2007. Optimal design of oligonucleotide microarrays for measurement of DNA copy-number. Hum Mol Genet 16: 2770-2779.
Wei, H., P.F. Kuan, S. Tian, C. Yang, J. Nie, S. Sengupta, V. Ruotti, G.A. Jonsdottir, S. Keles, J.A. Thomson, and R. Stewart. 2008. A study of the relationships between oligonucleotide properties and hybridization signal intensities from NimbleGen microarray datasets. Nucleic Acids Res 36: 2926-2938.

4. Copy number variation in the Caenorhabditis elegans genome reveals complex relationships among natural isolates³

4.1. Introduction

Copy number variation is an important component of genetic diversity in both Caenorhabditis elegans (Denver et al. 2003) and humans (Sebat et al. 2004; Conrad et al. 2006; Redon et al. 2006) and has been associated with complex traits including autism spectrum disorder (Marshall et al. 2008), mental retardation (Friedman et al. 2006; Madrigal et al. 2007), and schizophrenia (Stefansson et al. 2008). Genes involved in sensory perception, innate immunity and cell adhesion are common targets of copy number variants (CNVs) (Conrad et al. 2006; Maydan et al. 2007). We had previously described extensive copy number variation in the genomes of two highly divergent strains of C. elegans, CB4856 (Hawaii) and JU258 (Madeira) (Maydan et al. 2007), and were interested in examining additional natural isolates to measure the extent of variation in the genomes of less divergent strains. We also recognized that a large number of indel loci spread throughout the genome would allow us to more thoroughly characterize the relationships among strains. Previous studies utilizing nuclear markers have identified some recombinant strains, but they have been limited in their ability to precisely identify which regions of the genome have been exchanged due to the relatively small number of loci at hand (Denver et al.
2003; Haber et al. 2005). Although C. elegans reproduces primarily through selfing of hermaphrodites, males allow rare outcrossing in the wild at an estimated rate of 1-2% (Cutter and Payseur 2003; Barriere and Felix 2005; Barriere and Felix 2007). The frequency of outcrossing varies among populations, with estimates ranging from 0.01% based on linkage disequilibrium measurements (Barriere and Felix 2005) to as high as 20% based on heterozygote frequencies (Sivasundar and Hey 2005).

³ A version of this chapter will be submitted for publication. Maydan, J.S., A. Lorch, M.L. Edgley, S. Flibotte and D.G. Moerman. Copy number variation in the Caenorhabditis elegans genome reveals complex relationships among natural isolates.

In this study we have used array comparative genomic hybridization (aCGH) to detect copy number variation in ten additional natural isolates of C. elegans. We have chosen to use the term indel to refer to the deletions and duplications that we have detected, since many of them are < 1 kb in length and thus would not commonly be considered CNVs (Feuk et al. 2006; Redon et al. 2006). We use the term CNV only for larger aberrations. We use “deletion” to refer to a sequence absent in a natural isolate but present in N2, and “amplification” to refer to an increase in copy number relative to N2. An “amplification” detected in a natural isolate might alternatively represent the deletion of a duplicate copy in the N2 lineage. Only genes that are present in a single copy in the N2 genome are represented on our microarrays; therefore the presence of a gene (as in N2) cannot be reconstituted in an isolate by mutation or conversion from an ancestor carrying a deletion allele. Nevertheless, it is possible for a gene to reappear in a lineage as the result of outcrossing, recombining the intact gene into the lineage. Also, it is possible that extreme sequence divergence could be misinterpreted as a deletion in some cases (see Section 4.3.3).
Among the strains in our study, the Hawaiian and Madeiran strains (CB4856 and JU258) carry the largest number of deletions, followed by the Vancouver strain (KR314). Overall, we detected 510 different deletions affecting 1136 genes, or over 5% of the genes in the canonical N2 genome. Indels had a median length of 2.7 kb and deletions relative to N2 were far more common than amplifications. We used deletion loci as markers to derive an unrooted tree and observed complex relationships among the strains. Close associations were identified between CB4853 and CB4858, and CB3191 and RW7000. Different regions of the genome clearly possess different genealogies due to recombination throughout the natural history of the species.

4.2. Results

4.2.1. aCGH reveals a bias favoring coding sequence deletions over amplifications in C. elegans

We performed ten new aCGH experiments utilizing our exon-centric whole genome microarray (Maydan et al. 2007), which includes probes to 94% of the exons and 98% of the genes in the N2 reference genome. Each aCGH experiment compared a different natural isolate to N2. Table 4.1 summarizes the number and lengths of indels that were detected in each strain, including CB4856 and JU258 from our previous study (Maydan et al. 2007), as well as the number of genes and pseudogenes that were deleted in each strain. It is important to note that nearly all of the indels detected in this study target coding sequences and are thus unlikely to be selectively neutral. One “amplification” initially detected in all of the strains was subsequently identified by PCR and DNA sequencing as a 1788-bp deletion in our N2 strain (VC196), targeting exons 5 and 6 of alh-2, so that false amplification was ignored in all strains. We detected 883 deletions (510 different types) and 84 amplifications in the twelve natural isolates.
Amplifications were less robustly detected than deletions due in part to the conservative criteria that we used (see Methods), but this alone does not account for the large preponderance of deletions in the strains. For example, relaxing the log2 fluorescence ratio of mutant:wild-type (log2 ratio) cutoff for amplifications from 1.0 to 0.9 or even 0.8 captured only a modest number of additional amplifications and did not affect the strong bias towards deletions in the strains (data not shown). The bias towards deletions is clearly evident from plots of log2 ratios on each chromosome (Maydan et al. 2007; see Figures 2.6 and 2.7). An analysis of variance (ANOVA) indicated that indel length varied significantly among the strains in our study (F = 1.8706, P = 0.0395), but not if RW7000 was excluded from the analysis (F = 1.4823, P = 0.1407). With just eight deletions and two amplifications, the mean length of indels in RW7000 (22.6 kb) is greater than in the other strains, due largely to an 86-kb deletion on chromosome IV and a 117-kb duplication on chromosome V. RW7000 is the only strain in our study known to have a high Tc1 transposon count (Hodgkin and Doniach 1997), so transposon activity may have been more important in generating CNVs in this strain than in the others. The median indel length among all strains was 2.7 kb, with a mean of 8476 bp, but our ability to detect very small aberrations was limited by the probe density on the arrays. Indel length also varied significantly among chromosomes (F = 5.8772, P = 0.0000233), with larger indels on chromosomes V, IV and II. The mean size of indels on each chromosome ranged from 12.8 kb on chromosome V to just one kb on chromosome X. Furthermore, a Welch’s two-sample t-test revealed a significant difference (t = 6.9256, P = 1.091 x 10⁻¹¹) between the mean indel lengths in the autosome arms (as defined by recombination rate analysis (Barnes et al.
1995)) and the autosome centers or the X chromosome (9505 bp and 3277 bp, respectively).

4.2.2. Extensive copy number variation in the C. elegans genome allows even very closely related strains to be distinguished

A single deletion was detected in CB3191 and subsequently confirmed by PCR and DNA sequencing. The deletion completely knocks out math-15 (sequenced deletion breakpoints are at chromosome coordinates II: 1866365 and II: 1868059). This illustrates the power of our aCGH experiments to distinguish among strains, since CB3191 had previously appeared identical to N2 based on nearly complete mtDNA sequencing (Denver et al. 2003) and multilocus microsatellite genotyping (Haber et al. 2005). Still, with only one deletion, CB3191 appears as similar to the canonical N2 genome as our laboratory’s N2 strain does. In effect, the aCGH experiment comparing CB3191 to N2 served as a self-versus-self negative control and demonstrated that our whole genome array design, in itself, does not produce false positives.

CB4856 and JU258 were clearly the most divergent strains relative to N2, with more indels and more genes deleted than any of the other strains. KR314 (Vancouver) was the most divergent among the other strains and had more deletions on chromosome II than did any other strain, including CB4856 and JU258, but curiously had no deletions at all on the left half of chromosome V (V-L) where 18 deletions were present in CB4856 and 23 in JU258. The absence of deletions on V-L suggests the possibility that KR314 acquired an N2-like V-L through recombination. Interestingly, RW7000 was by far the least divergent strain other than the N2-like CB3191, despite appearing to be more divergent than six of our strains in a study counting alleles among 31 chemoreceptor genes (Stewart et al. 2005).

4.2.3.
The distribution of indels in the genome and the overrepresentation of indels in particular gene families

Figure 4.1 plots all of the indels we detected in each strain on the left half of chromosome II (II-L). Figure 4.2 provides a higher-resolution plot of this type for the entire genome. Chi-square tests indicated that both the number of deletions and the number of amplifications relative to N2 varied significantly among chromosomes (P < 2.2 x 10^-16 and P = 1.281 x 10^-5, respectively), after adjusting for the different number of probes targeting each chromosome. As previously described for CB4856 and JU258 (Maydan et al. 2007), indels in C. elegans are strikingly more common on autosome arms than in the autosome centers or on the X chromosome. The autosome arms span just 38% of the probes on our Whole Genome microarray but include 83% of the indels that we identified (85% of the deletions and 64% of the amplifications).

Table 4.3 lists all of the genes targeted by the indels we detected. In total, 1136 different genes were either wholly or partially deleted in at least one strain. The same gene families that we found to be overrepresented in indels in CB4856 and JU258 are clearly also the most common targets of indels in the new strains in this study, most notably the MATH-BTB, F-box, lectin and serpentine chemoreceptor gene families. All of these gene families cluster on autosome arms (Thomas 2006).

4.2.4. Relatedness inferences based on deletions shared by multiple natural isolates

Many of the indels we detected were found in multiple strains. Figure 4.3 shows the number of deletions that were found in each strain and the number of other strains that carry the same deletions. The strains with the most deletions also have the highest proportion of unique deletions in our data set, excluding CB3191 and RW7000, which carry only unique deletions.
By this measure, CB4856 was the most divergent strain, with 66% (113/172) of its deletions being unique among the twelve strains. This agreed closely with a study reporting that 70% of CB4856 SNPs were not present in nine other natural isolates (Koch et al. 2000). Unique deletions comprise 64% (75/140) of the deletions in JU258, 42% (33/78) of the deletions in MY2, and 39% (48/124) of the deletions in KR314.

Table 4.2 displays the number of deletions shared by all pairs of strains. Deletions were used as markers to infer a strain phylogeny under both Camin-Sokal parsimony (Camin and Sokal 1965), which assumes that deletions are derived states and transitions to deletions are more likely than transitions to the absence of deletions, and Wagner parsimony (Eck and Dayhoff 1966; Kluge and Farris 1969), which considers the appearance or disappearance of deletions equally likely and does not presume ancestral states. Because of our focus on genes that are present in a single copy in N2, the presence of a gene cannot be reconstituted by mutation or conversion in a strain carrying a deletion allele, but it is possible for a gene to reappear in a lineage as the result of outcrossing. Both parsimony methods gave the same consensus tree (Figure 4.4) based on 1000 bootstrap replicates drawn with replacement from the 510 deletion loci that we identified. We chose to exclude amplifications from this analysis because they were less robustly detected than deletions.

Deletions were treated as independent characters despite the presence of linkage between loci. We detected significant multilocus linkage disequilibrium (standardized index of association (IAS) = 0.072, P < 0.001, see (Haubold and Hudson 2000)) consistent with other studies (Barriere and Felix 2005; Cutter 2006; Barriere and Felix 2007). It is important to note that because deletions are more common on the autosome arms, these regions of the genome factored heavily into the relationships that we inferred.
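
The resampling step behind the bootstrap support values can be sketched as follows. This is only a minimal illustration, assuming a presence/absence matrix keyed by strain: the function name `bootstrap_loci`, the toy strain names and the toy loci are hypothetical, and the parsimony inference that was run on each replicate in Phylip is not reproduced here.

```python
import random

def bootstrap_loci(matrix, rng):
    """Resample loci (columns) with replacement; strains (rows) stay intact.

    `matrix` maps a strain name to a list of 0/1 values, one per deletion
    locus (1 = deletion present). Hypothetical helper, for illustration only.
    """
    strains = list(matrix)
    n_loci = len(matrix[strains[0]])
    # Draw as many locus indices as there are loci, with replacement.
    picks = [rng.randrange(n_loci) for _ in range(n_loci)]
    return {s: [matrix[s][j] for j in picks] for s in strains}

# Toy data: 3 hypothetical strains scored at 5 hypothetical deletion loci.
toy = {
    "strainA": [1, 0, 1, 0, 0],
    "strainB": [1, 0, 0, 1, 0],
    "strainC": [0, 1, 0, 1, 1],
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_loci(toy, rng) for _ in range(1000)]
```

Because whole columns are resampled, each replicate keeps the pattern of shared deletions at every drawn locus, so a tree inferred from a replicate reflects a reweighting of the original loci and not new data.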
Overall, close relationships were inferred between CB4853 and CB4858, and between CB3191 and RW7000 (which closely resemble N2). The confidence at the node leading to CB3191 and RW7000 under Camin-Sokal (CS) parsimony is lower than under Wagner (W) parsimony because the two strains occurred in isolation on a subset of the CS trees, but neither strain shared any deletions with any other strain.

Different relationships are predicted by different regions of the genome due to the presence of recombination in the lineage, and this is reflected in the low bootstrap confidence at many of the nodes on the consensus tree. An example of a possible recombination event is indicated in Figure 4.1. AB1 was previously identified as a recombinant strain based on discrepancies between the phylogenies inferred by mitochondrial and nuclear marker data sets (Denver et al. 2003), and the bootstrap support at the node leading to AB1 on our consensus tree is particularly low. While JU258 was most similar to JU263 on I-R, it shared more deletions with KR314 on II-L, III-L and V-R. JU258 and KR314 were identified as each other’s closest relative on 27% of CS (32% W) trees inferred from all bootstrap replicates, and the group of JU258/KR314/JU263 was also fairly common (19% CS, 28% W). KR314 was more similar to CB4853 and CB4858 on chromosome II. Other groups with appreciable bootstrap support but not appearing on the consensus tree included AB1/CB4854 (32% CS, 20% W), CB4853/CB4858/CB4854/JU322 (21% CS, 29% W), MY2/JU258 (18% CS, 22% W) and MY2/JU258/JU263 (12% CS, 23% W).

4.3. Discussion

4.3.1. Bias favoring deletions targeting gene families involved in environmental sensation and innate immunity

We detected far more deletions than amplifications in all of the natural isolates. A bias favoring deletions over insertions has previously been observed in patterns of pseudogene variation (Robertson 2000).
It has also been suggested that there may be a high rate of spontaneous deletion in the C. elegans genome (Witherspoon and Robertson 2003) and perhaps selection for small genome size (Denver et al. 2004). It is possible that some of the deletion candidates we detected are actually regions of extreme sequence divergence (see Section 4.3.3). It is also possible that some very ancient duplications are not detected because enough time has passed for the subsequent accumulation of sequence variation to prevent hybridization to our microarrays. It is important to note that duplications present in N2 are not represented on our whole genome microarray because we selected only unique probe sequences with limited homology to other sequences in the genome.

Genes involved in sensory perception and innate immunity are enriched in indels in both C. elegans (Maydan et al. 2007) and humans (Nguyen et al. 2006). Chemoreceptor gene families have undergone significant expansion in C. elegans since its common ancestor with Caenorhabditis briggsae (Stein et al. 2003), including multiple rounds of tandem duplication in the sra and srab families (Chen et al. 2005), and appear prone to gene gains and losses. We found that indel lengths were longer on the autosome arms where homologous gene clusters of these same gene families are the most common (Thomas 2006). Higher recombination rates (Barnes et al. 1995) and the presence of homologous gene clusters and repeat sequences probably predispose autosome arms to non-allelic homologous recombination (NAHR) events, which tend to generate larger CNVs, whereas smaller indels are more likely created by non-homology based mechanisms (Conrad et al. 2006; Redon et al. 2006; Conrad and Hurles 2007).
We chose to include pseudogenes in Table 4.3 because many genes annotated as pseudogenes in the N2 strain probably have functional copies in other natural isolates, especially genes with a single defect in N2 (usually a premature stop codon or deletion) (Stewart et al. 2005). The large number of deletions relative to N2 that we detected suggests that N2 itself may lack many unknown genes that are present in other natural isolates. Of course, only probes to N2 sequences are included on our microarrays.

4.3.2. New insights into complex strain relationships resulting from recombination and outcrossing in the C. elegans lineage

Different portions of the genome will possess different genealogies due to recombination. Trees are therefore inherently flawed in their generalized depiction of strain relationships when recombination has occurred between lineages and should not be interpreted as phylogenies. Nevertheless, our results largely agree with trees inferred from nuclear markers (Denver et al. 2003; Haber et al. 2005). Strains that are close together on these trees generally share more deletions than do strains that are further apart, with a few notable exceptions probably resulting from the limited number of loci available in earlier studies. For example, our results differ markedly from the study in which CB4858 and CB4854 appeared identical at 10/10 microsatellite loci (Haber et al. 2005).

We found that CB4858 was much more closely related to CB4853, sharing 52/66 of its deletions with CB4853 (52/63 CB4853 deletions were found in CB4858) but sharing just 21/66 deletions with CB4854. CB4858 and CB4853 appeared identical on II-L and throughout chromosomes III and V (one small two-probe CB4858 deletion candidate on chromosome III did not quite meet our P-value cutoff in CB4853, but both probes showed log2 ratios < -2).
Only five CB4853 deletions not found in CB4858 were present in any other strains, including two deletions on the X chromosome that were found in MY2, and three deletions on II-R that were present in CB4854. Four of the eight deletions found in CB4858, but not in CB4853, were present in JU263 (on chromosomes I, II and X). CB4858 and CB4854 did share deletions in the regions of some of Haber et al.’s microsatellites but not in all cases. For instance, CB4858 shared a deletion with CB4853 and CB4856 that is not found in CB4854, which is just six kb away from the microsatellite allele shared by CB4858 and CB4854 on II-L. This highlights the importance of using a large number of loci spread throughout the genome when estimating strain relatedness.

Another discrepancy between the relationships inferred in our study and those estimated by Haber et al. (2005) involves JU263. Microsatellites did not reveal close relatedness between JU263 and JU258, but JU263 shared more deletions with JU258 (31/68) than with any other strain in our study. JU263 was most similar to KR314 and CB4854 on V-R, and shared 26/68 deletions with KR314 overall. Although it does not appear on the consensus tree, a JU263/JU258/KR314 group appeared on 28% of W trees inferred from all bootstrap replicates. Chromosome III told yet another story, where JU263 shared 4/4 deletions on III-L with JU322, and none of these with any other strain.

Overall, CB4856 did not appear particularly closely related to any of the other strains. CB4856 shared the most deletions with KR314, JU322, JU258, JU263 and MY2 (18, 18, 17, 17, and 16, respectively), and slightly fewer with the remaining strains (but none with CB3191 and RW7000). However, specific regions of the CB4856 genome more closely resembled particular strains. For example, CB4856 and JU258 shared five deletions in common over a 10.6-Mb interval on chromosome V, but each shared only one deletion over the same interval with any other strain (MY2).
Nevertheless, CB4856 and JU258 were significantly diverged from one another in this region, which included 18 deletions unique to CB4856 and ten deletions unique to JU258. Many other regions of similarity among different groups of strains are evident in Figure 4.2 and Table 4.3, illustrating that the relationships among strains are complicated due to recombination and outcrossing, probably to a greater extent than previously appreciated in studies utilizing fewer genetic markers (Denver et al. 2003; Haber et al. 2005).

4.3.3. Very common indels, mutation hotspots, and the possibility of extreme sequence divergence masquerading as deletions

Remarkably, we found 6.7-kb deletions (in CB4854, CB4856 and JU322) and duplications (in KR314 and MY2) that affected exactly the same 117 probes on chromosome III, suggesting the possibility that both CNVs arose from a single NAHR event and subsequently survived the process of genetic drift. Furthermore, the log2 ratios for these amplifications (particularly for MY2) indicated a possible four-fold amplification, perhaps suggesting a second duplication had occurred.

Some deletions and amplifications were more common among the strains in our study than the allele observed in N2 (see Figures 4.1 and 4.2). For example, a 10-kb deletion on chromosome III was found in all strains except CB3191 and RW7000. This deletion targeted three uncharacterized genes, Y75B8A.31, Y75B8A.32 and Y75B8A.34, and could possibly be of ancient origin. It is possible that some common indels have arisen by independent mutations, particularly if there are hotspots susceptible to mutation by NAHR (Conrad and Hurles 2007) and/or subject to positive selection. There are several very common deletions found on V-R (see Figure 4.2 and Table 4.3), which is a region rich in copy number variation.
Regions with very high sequence divergence could potentially produce log2 ratios that are sufficiently negative to appear as deletions in aCGH data, but are unlikely to account for a sizeable proportion of the deletions that we detected.  On average, roughly 10% or more of the nucleotides in each of several adjacent 50-mer probes would need to be mutated in order to approach our conservative log2 ratio cutoff for deletions (Flibotte and Moerman 2008). This level of sequence variability in our probe sequences is particularly unlikely because our probes target coding sequences. On average, single nucleotide polymorphisms relative to N2 exist at 1/840 nucleotides in CB4856 (Wicks et al. 2001; Swan et al. 2002) and 1/1500 nucleotides in CB4858 (Hillier et al. 2008), but recent whole genome sequencing of the CB4856 genome has identified several regions of much higher sequence diversity (> 10x) that sometimes coincide with deletions identified in our aCGH data (David Spencer and Ryan Morin, personal communication). Most of the deletions we detected (~ 70%) were found to coincide with gaps in the CB4856 genome sequence, suggesting they are likely to be true deletions. The majority of the deletions we inferred that were not found to coincide with gaps in the genome sequence data were small deletions (< 1000 bp) that were not robustly detected because the sequence gap analysis used a sliding window method with a minimum window size of 1000 bp. However, 15/172 Hawaiian deletions that we detected that were larger than 1000 bp were found to overlap with regions of extreme sequence divergence. Nevertheless, one CB4856 deletion that we detected and an overlapping region of high sequence diversity have both been independently identified and confirmed, and are associated with the genetic incompatibility between CB4856 and N2 (Seidel et al. 2008). This partial reproductive isolation has probably allowed for the accumulation of sequence diversity in this region. 
Still, we cannot rule out that some of the deletions we report could be false positives resulting from regions of extreme sequence divergence. This is probably more likely in the most divergent strains. Genes in some of the gene families that are overrepresented amongst the deletions we report are known to be subject to positive selection for changes in amino acid sequence (Thomas et al. 2005).

We have shown that there is substantial copy number variation in coding sequences in the C. elegans genome. Indels are most common on the autosome arms, especially on chromosomes II and V. Deletions relative to N2 are much more common than amplifications. This bias may partly be the result of selection, as it is unlikely that many of these deletions are selectively neutral because they target coding sequences. Over 5% of the annotated genes in the N2 genome overlap with indels in at least one of the twelve strains that we examined. The indels that were detected should be useful in explaining natural phenotypic variation, particularly in chemosensation (Jovelin et al. 2003) and innate immunity (Schulenburg and Muller 2004). The 5% figure underestimates the copy number variation in the C. elegans genome because we examined only twelve natural isolates, our ability to detect very small indels is limited by the probe density on our microarrays, and we did not attempt to detect indels that do not target exons. Approximately 26% of the C. elegans genome is intronic and 47% is intergenic sequence (The C. elegans Sequencing Consortium 1998). aCGH is a powerful method of quickly obtaining a large number of genetic markers spread throughout the genome, and has revealed complex relationships among wild C. elegans isolates resulting from recombination and outcrossing events throughout the natural history of the species.

4.4. Methods

4.4.1.
Strain selection, nematode culturing and DNA preparation

The N2 reference strain in all experiments was VC196, a subculture of N2 received from the Caenorhabditis Genetics Center (CGC) in 2002. RW7000 was acquired from the lab of Robert Waterston in 1987, and was submitted by that lab to the CGC in 1991. All other strains were received directly from the CGC and grown for a minimal number of generations prior to DNA preparation. We selected the strains in an attempt to sample a range of the microsatellite diversity observed by Haber et al. (2005) but also included strains thought to be very closely related to each other in an attempt to better distinguish them with a larger number of loci. We also included JU322, which was not part of the Haber et al. study. We intentionally selected some strains suspected to be recombinant, including AB1 (Denver et al. 2003) and CB4854 (Haber et al. 2005), to test our ability to identify particular regions of the genome that have been exchanged as the result of outcrossing. Nematodes were grown as previously described (Brenner 1974) on 150-mm NGM agar plates seeded with Escherichia coli strain χ1666. Nematode populations were grown to starvation, harvested by washing with M9 containing 0.01% Triton X-100, and washed an additional seven times by centrifugation, removal of the supernatant by aspiration, resuspension and vortexing in M9/Triton X-100. After the final wash, DNA was prepared by standard phenol-chloroform extraction and ethanol precipitation as previously described (Maydan et al. 2007).

4.4.2. aCGH

Probes on our whole genome microarray were initially selected from build WS139 of the C. elegans genome (Maydan et al. 2007). Microarray manufacture, DNA fragmentation and labeling, sample hybridization and imaging, and fluorescence intensity measurement were performed by Roche NimbleGen, Inc. as previously described (Maydan et al. 2007).
Log2 ratios (Natural Isolate / N2) were calculated and then normalized using the robust LOWESS regression (Cleveland 1981) implemented in the R programming language (R Development Core Team 2006) with a smoother span setting of f = 0.4.

4.4.3. Indel identification

Indels were detected with a segmentation algorithm developed by S. Flibotte (Maydan et al. 2007; Appendix 3). Aberrant segments with a P-value ≤ 0.01 were called deletions if the mean log2 ratio of probes in the segment was ≤ -2, and called amplifications if the mean log2 ratio was ≥ 1. Most indels that we classified as deletions are probably truly deletions as opposed to insertions in the N2 lineage, since we used probes targeting coding sequences that contain no non-unique 20-mers and no more than 70% homology to other genome sequences. A novel N2 gene arising by duplication and subsequent sequence divergence in the N2 lineage would have to accumulate many coding sequence mutations in order to pass our probe selection filters. N2 genes arising from recent duplications are probably among the 2% of genes not represented on our microarrays. Nonetheless, we cannot completely rule out the possibility that some insertions or rearrangements in the N2 lineage could have been misclassified as deletions in the natural isolates.

The P-value of each indel was calculated with a one-sample t-test (note that with a very large number of non-aberrant data points, this gives nearly the same value as a Welch’s two-sample t-test). P-values were not corrected for multiple tests, but a Bonferroni correction would not exclude many indels since a large majority of them have P-values far below the cutoff (91% of all indels have P-values < 0.001, and 83% have P-values < 0.0001). All indels affected three or more consecutive probes, with the exception of ten deletion candidates (~1% of the indels) that were detected by just two probes with very negative log2 ratios.
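
The calling thresholds described above can be sketched as a small classifier. This is an illustrative sketch only, not the segmentation algorithm itself: the function name `classify_segment` and the example values are hypothetical, and the P-value is assumed to have been computed beforehand (for example by a one-sample t-test).

```python
from statistics import mean

# Cutoffs taken from the text; the names are chosen here for illustration.
P_CUTOFF = 0.01               # segment P-value must be <= 0.01
DELETION_MAX_LOG2 = -2.0      # mean log2 ratio <= -2 calls a deletion
AMPLIFICATION_MIN_LOG2 = 1.0  # mean log2 ratio >= 1 calls an amplification

def classify_segment(log2_ratios, p_value):
    """Return 'deletion', 'amplification', or None for one aberrant segment.

    `log2_ratios` holds the normalized log2 ratios of the probes in the
    segment; `p_value` is supplied by the caller (assumed precomputed).
    """
    if p_value > P_CUTOFF:
        return None
    segment_mean = mean(log2_ratios)
    if segment_mean <= DELETION_MAX_LOG2:
        return "deletion"
    if segment_mean >= AMPLIFICATION_MIN_LOG2:
        return "amplification"
    return None

# Hypothetical segments (values invented for the example).
calls = [
    classify_segment([-2.5, -3.1, -2.2], 0.0005),  # strong, significant loss
    classify_segment([1.2, 0.9, 1.1], 0.004),      # significant gain
    classify_segment([-1.0, -1.2, -0.9], 0.0001),  # significant but too shallow
    classify_segment([-3.0, -3.0, -3.0], 0.05),    # deep but not significant
]
```

The asymmetric cutoffs (-2 for deletions, +1 for amplifications) reflect the text: a homozygous deletion removes all target sequence, whereas a single duplication only doubles it.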
All aberrant segments were examined manually and some adjustments were made to fine-tune the selection of the leftmost and rightmost probes (“breakpoint probes”) within the indels. Some segments were interrupted by stretches of probes that did not give log2 ratios consistent with an indel and were manually split into multiple segments. After these adjustments, the mean log2 ratios and P-values were recalculated for all segments with one-sample t-tests using R to ensure that our cutoffs were still met. In most cases these adjustments further decreased the P-values of the indels.

Occasionally, probes flanking indels show unusual log2 ratios outside the normal range for unaffected probes (Maydan et al. 2007). This can sometimes make identification of the indel breakpoints less certain. For each indel, we identified “flanking probes” beyond the left and right breakpoint probes to demark the point at which more normal log2 ratios begin to consistently appear again (log2 ratios > -0.8 for deletions, and log2 ratios < 0.5 for amplifications). 78% of these flanking probes were adjacent to their corresponding breakpoint probes, and 90% were within three probes of indels.

4.4.4. Chi-square tests, t-tests and ANOVAs

All remaining chi-square tests, t-tests and ANOVA tests were done using R. For the chi-square tests, the expected number of indels on each chromosome was calculated based on the proportion of probes targeting that chromosome, which essentially corrects for differences in length and gene content among chromosomes. Indel lengths were measured from the middle of the left breakpoint probe to the middle of the right breakpoint probe. Indels found in more than one strain were counted multiple times in these tests.

4.4.5. Affected genes

Probe coordinates were obtained by remapping probe sequences to the most recent genome data freeze (WS190) using MegaBLAST, which utilizes a greedy algorithm (Zhang et al. 2000) to align DNA sequences.
24 probes on the array no longer had perfect sequence matches in the genome due to changes in the genome sequence from WS139 to WS190, but none of these probes were present in any of the indels that we detected. In order to generate the list of genes affected by indels in Table 4.3, we first extracted the start and stop coordinates for all genes in genome build WS190 from WormBase. From that list, we extracted only the genes that overlapped the coordinates spanned by the indel breakpoint probes. Genes completely contained within the region spanned by an indel were listed as entirely affected. Discrepancies between Table 4.3 and the list of deleted genes in CB4856 and JU258 given in Maydan et al. (2007) are due to the adjustments we made to the indel breakpoints and because the genes listed in the previous study were extracted from an older genome build (WS150).

4.4.6. Strain relationships

All deletion loci were treated as discrete presence-absence characters. Strains were considered to carry the same deletion allele if their respective deletions overlapped and both their left and right breakpoint probes were within three probes of each other, or in a small number of cases (where the breakpoint and flanking probes were more ambiguous) if the breakpoint probes in one strain fell within the region spanned by the flanking probes in another strain. Remarkably, most deletions found in multiple strains according to these criteria had exactly the same breakpoint probes, illustrating the reliability of the log2 ratios. The single case where we identified deletions and amplifications affecting the same probes (see F44E2.2a in Table 4.3) could have been treated as a multi-state locus, but this was not done because we chose to exclude amplifications from this analysis.

Unrooted trees were inferred from the 510 deletion loci using Phylip 3.66 (Felsenstein 1989).
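
The main rule used in this section to decide whether two strains carry the same deletion allele can be sketched as follows. This is a minimal sketch under stated assumptions: probes are represented by their ordinal index along a chromosome, the names `Deletion` and `same_allele` are hypothetical, and the fallback rule based on flanking probes (used in a small number of ambiguous cases) is not implemented.

```python
from typing import NamedTuple

class Deletion(NamedTuple):
    chrom: str
    left: int   # ordinal index of the left breakpoint probe on the chromosome
    right: int  # ordinal index of the right breakpoint probe on the chromosome

def same_allele(a: Deletion, b: Deletion, max_offset: int = 3) -> bool:
    """True if two deletions overlap and both breakpoint probes lie within
    `max_offset` probes of each other (three probes in the text)."""
    if a.chrom != b.chrom:
        return False
    overlap = a.left <= b.right and b.left <= a.right
    left_close = abs(a.left - b.left) <= max_offset
    right_close = abs(a.right - b.right) <= max_offset
    return overlap and left_close and right_close

# Hypothetical deletions, given as probe indices on chromosome II.
d1 = Deletion("II", 100, 110)
d2 = Deletion("II", 102, 109)   # breakpoints shifted by 2 and 1 probes
d3 = Deletion("II", 105, 110)   # left breakpoint shifted by 5 probes
```

Applied to every pair of strains, a predicate of this kind yields the presence/absence characters that are then passed to the parsimony analysis.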
The most parsimonious trees were inferred under both Camin-Sokal and Wagner parsimony methods with 1000 bootstrap replicates drawn with replacement from the loci. A consensus tree was then inferred separately for each method. An unrooted consensus tree was drawn with CB4856 at the base simply because it shared the fewest alleles with all other strains. The true position of the root of the consensus tree is unknown.

4.4.7. Linkage disequilibrium

We used LIAN 3.5 (Haubold and Hudson 2000) to calculate a standardized index of association (IAS) based on the original formulae given by Brown et al. (1980) and Smith et al. (1993). IAS is 0 at linkage equilibrium. The program tested significance with a Monte Carlo simulation, resampling loci without replacement over 1000 iterations, in order to scramble their order and generate a null distribution of IAS. The P-value is the probability, under the null hypothesis of linkage equilibrium, of IAS being greater than or equal to the value observed for our data set.

Table 4.1. Indels detected in twelve natural isolates of C. elegans. The number of deletions and amplifications detected in each isolate is shown, along with statistics summarizing their lengths. The overall number of indels is the sum of those found in all strains, so indels found in more than one strain were counted multiple times. The number of deleted genes and pseudogenes includes those either wholly or partially deleted. The overall number of deleted genes and pseudogenes is the number deleted in at least one of the twelve strains (genes deleted in more than one strain were counted only once). The location refers to the site the strain was initially isolated from.

Strain   Deletions  Amplifications  Median indel  Mean indel   Maximum indel  Deleted  Deleted      Location
                                    length (bp)   length (bp)  length (bp)    genes    pseudogenes
AB1      48         9               2481          7564         75120          147      26           Adelaide, Australia
CB3191   1          0               1262          1262         1262           1        0            Altadena, California, USA
CB4853   63         10              2460          8651         109400         237      16           Altadena, California, USA
CB4854   49         7               2240          6201         68860          122      20           Altadena, California, USA
CB4856   172        10              2885          7279         103200         517      91           Oahu, Hawaii, USA
CB4858   66         7               2481          7502         109400         211      16           Pasadena, California, USA
JU258    140        10              3271          12390        185700         671      117          Ribeiro Frio, Madeira, Portugal
JU263    68         6               2680          5080         68860          145      18           Le Blanc, France
JU322    66         12              2484          6270         70040          174      15           Merlet, France
KR314    124        6               2689          8674         118100         417      77           Vancouver, British Columbia, Canada
MY2      78         5               3485          10080        184100         301      44           Roxel, Munster, Germany
RW7000   8          2               2593          22600        116900         38       2            Bergerac, France
Overall  883        84              2700          8476         185700         1136     216          NA

Table 4.2. Number of deletions shared by all strain pairs. The number listed for all comparisons between a strain and itself (indicated by an *) is simply the total number of deletions detected in that strain.

Strain  AB1     CB3191  CB4853  CB4854  CB4856  CB4858  JU258   JU263   JU322   KR314   MY2     RW7000
AB1     48*     0       14      20      12      13      20      18      12      21      13      0
CB3191  0       1*      0       0       0       0       0       0       0       0       0       0
CB4853  14      0       63*     24      12      52      16      13      23      34      12      0
CB4854  20      0       24      49*     13      21      15      15      23      19      11      0
CB4856  12      0       12      13      172*    13      17      17      18      18      16      0
CB4858  13      0       52      21      13      66*     18      17      23      34      11      0
JU258   20      0       16      15      17      18      140*    31      17      36      26      0
JU263   18      0       13      15      17      17      31      68*     18      26      15      0
JU322   12      0       23      23      18      23      17      18      66*     20      12      0
KR314   21      0       34      19      18      34      36      26      20      124*    14      0
MY2     13      0       12      11      16      11      26      15      12      14      78*     0
RW7000  0       0       0       0       0       0       0       0       0       0       0       8*

Table 4.3. Genes affected by copy number variants in C. elegans. Gene Start and Gene Stop coordinates refer to the positions of the first and last base of each gene in WS190. Indels are identified as amplifications (A) or deletions (D) relative to N2.
Only genes that are completely contained by the interval spanned by the breakpoint probes are listed as entirely affected, despite the possibility that the indel extends beyond those probes. For example, math-15 is listed as partially deleted in CB3191, but DNA sequencing has shown that the gene is entirely deleted. The coordinates listed for all flanking and breakpoint probes (see Methods) refer to the position of the first base of each probe. In some cases we did not identify a flanking probe because there was no probe on our microarray to either the left or right of the indel on that chromosome. The Indel Length is the difference between the left breakpoint and right breakpoint coordinates, and is equivalent to the distance between the middle of the first and the last probe affected by the indel. This table can be accessed at: http://www.zoology.ubc.ca/~alorch/jason/Table.4.3.pdf

Figure 4.1. Indels on the left arm of chromosome II in twelve natural isolates of C. elegans. Deletions unique to a strain are plotted in grey and deletions found in multiple strains are plotted in black. Amplified sequences present in only one strain are shown in orange and those found in multiple strains are shown in red. The actual position of amplified sequences in the genome is unknown. The position of amplifications shown here corresponds to the position of the single copy of that sequence in the N2 reference genome. Small indels are not shown to scale. The blue arrows indicate the site of a possible recombination event. KR314 shares alleles with CB4853 and CB4858 to the right of the arrows but not to the left.

Figure 4.2. Indels in the genomes of twelve natural isolates of C. elegans. Deletions unique to a strain are plotted in grey and deletions found in multiple strains are plotted in black. Amplified sequences present in only one strain are shown in orange and those found in multiple strains are shown in red.
The actual position of amplified sequences in the genome is unknown. The position of amplifications shown here corresponds to the position of the single copy of that sequence in the N2 reference genome. Small indels are not shown to scale.

This figure can be accessed at: http://www.zoology.ubc.ca/~alorch/jason/Figure.4.2.jpg

Figure 4.3. The number of deletions detected in each of 12 natural isolates of C. elegans. The numbers of other isolates that carry the same deletions are indicated by the colors in the figure legend.

Figure 4.4. Unrooted consensus tree for twelve natural isolates of C. elegans. The two numbers listed in parentheses next to each node are the percentage of trees among 1000 bootstrap replicates that included all strains distal from CB4856 under Camin-Sokal and Wagner parsimony, respectively. The tree should not be interpreted strictly as a phylogeny due to recombination between strains.

4.5. References

Barnes, T.M., Y. Kohara, A. Coulson, and S. Hekimi. 1995. Meiotic recombination, noncoding DNA and genomic organization in Caenorhabditis elegans. Genetics 141: 159-179.
Barriere, A. and M.A. Felix. 2005. High local genetic diversity and low outcrossing rate in Caenorhabditis elegans natural populations. Curr Biol 15: 1176-1184.
Barriere, A. and M.A. Felix. 2007. Temporal dynamics and linkage disequilibrium in natural Caenorhabditis elegans populations. Genetics 176: 999-1011.
Brenner, S. 1974. The genetics of Caenorhabditis elegans. Genetics 77: 71-94.
Brown, A.H., M.W. Feldman, and E. Nevo. 1980. Multilocus structure of natural populations of Hordeum spontaneum. Genetics 96: 523-536.
Camin, J.H. and R.R. Sokal. 1965. A method for deducing branching sequences in phylogeny. Evolution 19: 311-326.
Chen, N., S. Pai, Z. Zhao, A. Mah, R. Newbury, R.C. Johnsen, Z. Altun, D.G. Moerman, D.L. Baillie, and L.D. Stein. 2005. Identification of a nematode chemosensory gene family. Proc Natl Acad Sci U S A 102: 146-151.
Cleveland, W.S.
1981. LOWESS: A program for smoothing scatterplots by robust locally weighted regression. The American Statistician 35: 54.
Conrad, D.F., T.D. Andrews, N.P. Carter, M.E. Hurles, and J.K. Pritchard. 2006. A high-resolution survey of deletion polymorphism in the human genome. Nat Genet 38: 75-81.
Conrad, D.F. and M.E. Hurles. 2007. The population genetics of structural variation. Nat Genet. 7 Suppl: S30-6.
Cutter, A.D. 2006. Nucleotide polymorphism and linkage disequilibrium in wild populations of the partial selfer Caenorhabditis elegans. Genetics 172: 171-184.
Cutter, A.D. and B.A. Payseur. 2003. Selection at linked sites in the partial selfer Caenorhabditis elegans. Mol Biol Evol 20: 665-673.
Denver, D.R., K. Morris, M. Lynch, and W.K. Thomas. 2004. High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature 430: 679-682.
Denver, D.R., K. Morris, and W.K. Thomas. 2003. Phylogenetics in Caenorhabditis elegans: an analysis of divergence and outcrossing. Mol Biol Evol 20: 393-400.
Eck, R.V. and M.O. Dayhoff. 1966. Atlas of Protein Sequence and Structure 1966. National Biomedical Research Foundation, Silver Spring, Maryland.
Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166.
Feuk, L., A.R. Carson, and S.W. Scherer. 2006. Structural variation in the human genome. Nature Reviews Genetics 7: 85-97.
Flibotte, S. and D.G. Moerman. 2008. Experimental analysis of oligonucleotide microarray design criteria to detect deletions by comparative genomic hybridization. BMC Genomics 9: 85-97.
Friedman, J.M., A. Baross, A.D. Delaney, A. Ally, L. Arbour, L. Armstrong, J. Asano, D.K. Bailey, S. Barber, P. Birch, M. Brown-John, M. Cao, S. Chan, D.L. Charest, N. Farnoud, N. Fernandes, S. Flibotte, A. Go, W.T. Gibson, R.A. Holt, S.J. Jones, G.C. Kennedy, M. Krzywinski, S. Langlois, H.I. Li, B.C. McGillivray, T. Nayar, T.J. Pugh, E. Rajcan-Separovic, J.E. Schein, A. Schnerch, A. Siddiqui, M.I.
Van Allen, G. Wilson, S.L. Yong, F. Zahir, P. Eydoux, and M.A. Marra. 2006. Oligonucleotide microarray analysis of genomic imbalance in children with mental retardation. Am J Hum Genet 79: 500-513.
Haber, M., M. Schungel, A. Putz, S. Muller, B. Hasert, and H. Schulenburg. 2005. Evolutionary history of Caenorhabditis elegans inferred from microsatellites: evidence for spatial and temporal genetic differentiation and the occurrence of outbreeding. Mol Biol Evol 22: 160-173.
Haubold, B. and R.R. Hudson. 2000. LIAN 3.0: detecting linkage disequilibrium in multilocus data. Bioinformatics 16: 847-848.
Hillier, L.W., G.T. Marth, A.R. Quinlan, D. Dooling, G. Fewell, D. Barnett, P. Fox, J.I. Glasscock, M. Hickenbotham, W. Huang, V.J. Magrini, R.J. Richt, S.N. Sander, D.A. Stewart, M. Stromberg, E.F. Tsung, T. Wylie, T. Schedl, R.K. Wilson, and E.R. Mardis. 2008. Whole-genome sequencing and variant discovery in C. elegans. Nat Methods 5: 183-188.
Hodgkin, J. and T. Doniach. 1997. Natural variation and copulatory plug formation in Caenorhabditis elegans. Genetics 146: 149-164.
Jovelin, R., B.C. Ajie, and P.C. Phillips. 2003. Molecular evolution and quantitative variation for chemosensory behaviour in the nematode genus Caenorhabditis. Mol Ecol 12: 1325-1337.
Kluge, A.G. and J.S. Farris. 1969. Quantitative phyletics and the evolution of anurans. Systematic Zoology 18: 1-32.
Koch, R., H.G. van Luenen, M. van der Horst, K.L. Thijssen, and R.H. Plasterk. 2000. Single nucleotide polymorphisms in wild isolates of Caenorhabditis elegans. Genome Res 10: 1690-1696.
Madrigal, I., L. Rodriguez-Revenga, L. Armengol, E. Gonzalez, B. Rodriguez, C. Badenas, A. Sanchez, F. Martinez, M. Guitart, I. Fernandez, J.A. Arranz, M. Tejada, L.A. Perez-Jurado, X. Estivill, and M. Mila. 2007. X-chromosome tiling path array detection of copy number variants in patients with chromosome X-linked mental retardation. BMC Genomics 8: 443.
Marshall, C.R., A. Noor, J.B. Vincent, A.C. Lionel, L. Feuk, J.
Skaug, M. Shago, R. Moessner, D. Pinto, Y. Ren, B. Thiruvahindrapduram, A. Fiebig, S. Schreiber, J. Friedman, C.E. Ketelaars, Y.J. Vos, C. Ficicioglu, S. Kirkpatrick, R. Nicolson, L. Sloman, A. Summers, C.A. Gibbons, A. Teebi, D. Chitayat, R. Weksberg, A. Thompson, C. Vardy, V. Crosbie, S. Luscombe, R. Baatjes, L. Zwaigenbaum, W. Roberts, B. Fernandez, P. Szatmari, and S.W. Scherer. 2008. Structural variation of chromosomes in autism spectrum disorder. Am J Hum Genet 82: 477-488.
Maydan, J.S., S. Flibotte, M.L. Edgley, J. Lau, R.R. Selzer, T.A. Richmond, N.J. Pofahl, J.H. Thomas, and D.G. Moerman. 2007. Efficient high-resolution deletion discovery in Caenorhabditis elegans by array comparative genomic hybridization. Genome Res 17: 337-347.
Nguyen, D.Q., C. Webber, and C.P. Ponting. 2006. Bias of selection on human copy-number variants. PLoS Genet 2: e20.
R Development Core Team. 2006. R: A language and environment for statistical computing. Vienna, Austria.
Redon, R., S. Ishikawa, K.R. Fitch, L. Feuk, G.H. Perry, T.D. Andrews, H. Fiegler, M.H. Shapero, A.R. Carson, W. Chen, E.K. Cho, S. Dallaire, J.L. Freeman, J.R. Gonzalez, M. Gratacos, J. Huang, D. Kalaitzopoulos, D. Komura, J.R. MacDonald, C.R. Marshall, R. Mei, L. Montgomery, K. Nishimura, K. Okamura, F. Shen, M.J. Somerville, J. Tchinda, A. Valsesia, C. Woodwark, F. Yang, J. Zhang, T. Zerjal, J. Zhang, L. Armengol, D.F. Conrad, X. Estivill, C. Tyler-Smith, N.P. Carter, H. Aburatani, C. Lee, K.W. Jones, S.W. Scherer, and M.E. Hurles. 2006. Global variation in copy number in the human genome. Nature 444: 444-454.
Robertson, H.M. 2000. The large srh family of chemoreceptor genes in Caenorhabditis nematodes reveals processes of genome evolution involving large duplications and deletions and intron gains and losses. Genome Res 10: 192-203.
Schulenburg, H. and S. Muller. 2004. Natural variation in the response of Caenorhabditis elegans towards Bacillus thuringiensis. Parasitology 128: 433-443.
Sebat, J., B.
Lakshmi, J. Troge, J. Alexander, J. Young, P. Lundin, S. Maner, H. Massa, M. Walker, M. Chi, N. Navin, R. Lucito, J. Healy, J. Hicks, K. Ye, A. Reiner, T.C. Gilliam, B. Trask, N. Patterson, A. Zetterberg, and M. Wigler. 2004. Large-scale copy number polymorphism in the human genome. Science 305: 525-528.
Seidel, H.S., M.V. Rockman, and L. Kruglyak. 2008. Widespread genetic incompatibility in C. elegans maintained by balancing selection. Science 319: 589-594.
Sivasundar, A. and J. Hey. 2005. Sampling from natural populations with RNAi reveals high outcrossing and population structure in Caenorhabditis elegans. Curr Biol 15: 1598-1602.
Smith, J.M., N.H. Smith, M. O'Rourke, and B.G. Spratt. 1993. How clonal are bacteria? Proc Natl Acad Sci U S A 90: 4384-4388.
Stefansson, H., D. Rujescu, S. Cichon, A. Ingason, S. Steinberg, R. Fossdal, E. Sigurdsson, T. Sigmundsson, J.E. Buizer-Voskamp, T. Hansen, K.D. Jakobsen, P. Muglia, C. Francks, P.M. Matthews, A. Gylfason, B.V. Halldorsson, D. Gudbjartsson, T.E. Thorgeirsson, A. Sigurdsson, A. Jonasdottir, A. Jonasdottir, A. Bjornsson, S. Mattiasdottir, T. Blondal, M. Haraldsson, B.B. Magnusdottir, I. Giegling, H.J. Moller, A. Hartmann, K.V. Shianna, D. Ge, A.C. Need, C. Crombie, G. Fraser, N. Walker, J. Lonnqvist, J. Suvisaari, A. Tuulio-Henriksson, T. Paunio, T. Toulopoulou, E. Bramon, M. Di Forti, R. Murray, M. Ruggeri, E. Vassos, S. Tosato, M. Walshe, T. Li, C. Vasilescu, T.W. Muhleisen, A.G. Wang, H. Ullum, S. Djurovic, I. Melle, J. Olesen, L.A. Kiemeney, B. Franke, C. Sabatti, N.B. Freimer, J.R. Gulcher, U. Thorsteinsdottir, A. Kong, O.A. Andreassen, R.A. Ophoff, A. Georgi, M. Rietschel, T. Werge, H. Petursson, D.B. Goldstein, M.M. Nothen, L. Peltonen, D.A. Collier, D. St Clair, and K. Stefansson. 2008. Large recurrent microdeletions associated with schizophrenia. Nature 455: 232-236.
Stein, L.D., Z. Bao, D. Blasiar, T. Blumenthal, M.R. Brent, N. Chen, A. Chinwalla, L. Clarke, C. Clee, A. Coghlan, A. Coulson, P.
Eustachio, D.H.A. Fitch, L.A. Fulton, R.E. Fulton, S. Griffiths-Jones, T.W. Harris, L.W. Hillier, R. Kamath, P.E. Kuwabara, E.R. Mardis, M.A. Marra, T.L. Miner, P. Minx, J.C. Mullikin, R.W. Plumb, J. Rogers, J.E. Schein, M. Sohrmann, J. Spieth, J.E. Stajich, C. Wei, D. Willey, R.K. Wilson, R. Durbin, and R.H. Waterston. 2003. The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics. PLoS Biology 1: e45.
Stewart, M.K., N.L. Clark, G. Merrihew, E.M. Galloway, and J.H. Thomas. 2005. High genetic diversity in the chemoreceptor superfamily of Caenorhabditis elegans. Genetics 169: 1985-1996.
Swan, K.A., D.E. Curtis, K.B. McKusick, A.V. Voinov, F.A. Mapa, and M.R. Cancilla. 2002. High-throughput gene mapping in Caenorhabditis elegans. Genome Res 12: 1100-1105.
The C. elegans Sequencing Consortium. 1998. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282: 2012-2018.
Thomas, J.H. 2006. Analysis of homologous gene clusters in Caenorhabditis elegans reveals striking regional cluster domains. Genetics 172: 127-143.
Thomas, J.H., J.L. Kelley, H.M. Robertson, K. Ly, and W.J. Swanson. 2005. Adaptive evolution in the SRZ chemoreceptor families of Caenorhabditis elegans and Caenorhabditis briggsae. Proc Natl Acad Sci U S A 102: 4476-4481.
Wicks, S.R., R.T. Yeh, W.R. Gish, R.H. Waterston, and R.H. Plasterk. 2001. Rapid gene mapping in Caenorhabditis elegans using a high density polymorphism map. Nat Genet 28: 160-164.
Witherspoon, D.J. and H.M. Robertson. 2003. Neutral evolution of ten types of mariner transposons in the genomes of Caenorhabditis elegans and Caenorhabditis briggsae. J Mol Evol 56: 751-769.
Zhang, Z., S. Schwartz, L. Wagner, and W. Miller. 2000. A greedy algorithm for aligning DNA sequences. J Comput Biol 7: 203-214.

5. Conclusions

5.1. Thesis summary

This thesis describes the development of an aCGH platform that permits very high-resolution detection of mutations in C. elegans.
Exon-centric microarrays targeting specific chromosomes and the whole genome were used to detect novel induced mutations as small as 141 bp in length. Further restricting the candidate region for mutation detection to two Mbp or less allowed the detection of single nucleotide mutations, and many mutations have been discovered using this technique. aCGH will facilitate the identification of mutations for the C. elegans community. The whole genome array was used to characterize natural copy number variation in twelve wild isolates of C. elegans and identified over 500 different deletions affecting more than five percent of the annotated genes in the genome. The deletions present in the natural isolates largely affected genes thought to be involved in environmental sensation and innate immunity, and revealed that the relationships among strains are complicated by recombination resulting from outcrossing with males.

5.2. The significance of this work and its potential applications

aCGH is an attractive and important adjunct to the PCR-based method of deletion detection used by the C. elegans Gene Knockout Consortium and the research community. aCGH avoids the time-consuming and labor-intensive process of sibling selection and is not constrained to finding deletions smaller than PCR amplicon sizes of 2-3 kb. Large deletions may be particularly desirable for tandem gene families, such as the Serpentine Receptor class AB (srab) family of 7-transmembrane chemoreceptors and integral membrane proteins, in which consecutive genes may share functional redundancies (Stein et al. 2003; Chen et al. 2005; Thomas 2006). The ability of aCGH to reveal additional mutations elsewhere in mutant genomes that are not detected by the PCR-based method may help to prevent incorrect associations from being inferred between mutations and phenotypes. Thus far, the cost per deletion isolated has been comparable between the PCR-based method and aCGH experiments comparing known mutants to N2.
The deletions that were detected in the natural isolate work presented in Chapter 4 significantly increased the number of genes with known null alleles, reduced the number of remaining targets for the Knockout Consortium, and may also be helpful in identifying genes that are responsible for phenotypic differences among the strains such as body length and reproductive behavior (Hodgkin and Doniach 1997), social behavior (de Bono and Bargmann 1998), aging (Gems and Riddle 2000), sperm morphology (LaMunyon and Ward 2002), chemosensation (Jovelin et al. 2003) and innate immunity (Schulenburg and Muller 2004). The relationships that were inferred among the strains, together with the identification of the strains that carry the largest numbers of unique indel alleles, can inform decisions about which strains to select for deep sequencing in the hope of identifying novel loss-of-function mutations, further contributing to the number of known knockout mutations in C. elegans.

5.3. Strategies to reduce the cost of detecting novel induced deletions using aCGH

Both aCGH and the PCR-based method depend on reliable mutagenesis for their success. Since implementing the aCGH method, the Knockout Consortium has had greater success using aCGH to identify deletions in mutagenized strains that display a phenotype than in strains that are phenotypically wild-type. Thus far, the strains with phenotypes have been balanced lethals (see Chapter 2), uncoordinated (Unc) strains, and unc-22 mutants that twitch in the presence of 1% nicotine. In most cases the mutations that have been detected have not been correlated with the phenotypes, but the phenotypes serve to ensure that the animal has been mutated. We do not know precisely how common it is for mutant strains to carry multiple deletions after the standard mutagenesis procedure (Barstead and Moerman 2006), but several of the balanced lethal strains described in Chapter 2 carried multiple deletions on a single chromosome II.
Recent aCGH experiments have identified multiple mutations in the genomes of individual unc-22 mutants (Jon Taylor, personal communication). These results suggest that there are likely to be multiple deletions in many of the mutants that we generate.

One way to ensure that there are mutations in the genomes to be screened by aCGH was presented in Chapter 2 in the screens for lethal deletions on chromosome II.

Another way would be to perform the standard TMP/UV or EMS mutagenesis (Barstead and Moerman 2006), select single F2 mutants that display a phenotype, propagate those animals clonally through several generations of picking single hermaphroditic parents in order to drive mutations to homozygosity through genetic drift, and then perform aCGH comparing DNA from the F7 mutants to the N2 strain. In the absence of selection, F7 animals resulting from this procedure will have lost roughly half of the mutations present in the F1s but be homozygous for nearly all of the remaining mutations that they carry. Screening homozygous mutants is not necessary for successful deletion detection, but it does eliminate the downstream sibling selection otherwise required to isolate the mutation. Also, heterozygous deletions are less reliably detected and could lead to time being wasted attempting to confirm false deletion candidates.

A simpler method of ensuring that mutant genomes are used in the aCGH experiments, which would not require additional screening for phenotypes, would be to screen mutants that have been previously identified by the PCR-based method. These strains have already been clonally propagated by picking single parents for a handful of generations through the normal process of sibling selection, and could be further clonally propagated as described above to drive heterozygous mutations elsewhere in the genome to homozygosity if desired. Deletions previously detected by PCR would serve as positive controls in these aCGH experiments.
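The expectation that unselected mutations are roughly half lost by the F7, with nearly all of the survivors homozygous, can be checked with a short calculation. The following is a minimal sketch, assuming a single neutral mutation that is heterozygous in the F1, strict self-fertilization, and one randomly chosen hermaphrodite parent per generation; the function name is illustrative and does not come from the thesis.

```python
from fractions import Fraction


def selfing_outcome(generations_of_selfing):
    """Genotype probabilities for one neutral mutation that starts heterozygous.

    Assumption: each round of self-fertilization followed by picking one
    random offspring leaves a heterozygote heterozygous with probability 1/2,
    and otherwise fixes or loses the mutation with equal probability.
    """
    het = Fraction(1, 2) ** generations_of_selfing
    homozygous = (1 - het) / 2
    lost = (1 - het) / 2
    return {"heterozygous": het, "homozygous": homozygous, "lost": lost}


# F1 -> F7 corresponds to six rounds of selfing.
f7 = selfing_outcome(6)
retained = f7["heterozygous"] + f7["homozygous"]
print("fraction of F1 mutations lost:", float(f7["lost"]))
print("fraction of retained mutations that are homozygous:",
      float(f7["homozygous"] / retained))
```

Under these assumptions only 1 in 64 mutations is expected to remain heterozygous at the F7. Linkage to a selected phenotype or any fitness cost of a mutation would shift these proportions.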
Additional mutations found in strains previously identified by the PCR-based method could be separated from the previously known mutations through backcrosses with the N2 strain. These experiments would also further characterize the genetic background of existing mutant strains, making them more valuable tools to researchers.

Perhaps the simplest way to increase the number of deletions that are detected per aCGH experiment would be to compare two mutant DNA samples to each other instead of comparing a mutant to the N2 reference strain. This would effectively cut the cost of the aCGH method in half. For homozygous mutations there would be little ambiguity about whether a candidate mutation was a deletion in one sample or an amplification in the other, because the log2 ratio produced by a homozygous deletion in one strain could otherwise be explained only by a multi-fold amplification in the other strain, and such amplifications are comparatively rare. Although heterozygous deletions could be misinterpreted as homozygous amplifications, PCR and DNA sequencing are subsequently used to precisely characterize all candidate mutations and would distinguish these possibilities.

At the current cost of experiments using the 380,000-probe microarrays, detecting roughly one deletion in every three experiments would make aCGH cost-efficient relative to the PCR method. This success rate seems to be achievable. In the most recent set of deletion screens utilizing the whole genome microarray, four novel deletions were detected among six aCGH experiments comparing F7 twitcher (unc-22) mutants generated by TMP/UV mutagenesis with wild-type N2 animals, not including deletions affecting unc-22 (Jon Taylor, personal communication).

The availability of higher density microarrays should also make the aCGH method increasingly cost-efficient.
Higher density arrays can be used to probe a candidate region with increased probe density and sensitivity, or the arrays can be subdivided to allow multiple individual experiments on a single slide. The impact that higher density arrays will have on the cost-efficiency of screening for deletions will, of course, depend on their pricing. The Knockout Consortium plans to purchase microarrays from manufacturers and perform the sample handling, hybridizations and microarray imaging in house, which should further reduce the cost of aCGH experiments. It is also possible to gently but thoroughly remove hybridized DNA samples from a microarray after an aCGH experiment in order to permit reuse of the array for subsequent hybridizations, further reducing the cost per experiment.

5.4. Single nucleotide mutation detection

The ability to detect single nucleotide mutations using aCGH provides a boon to C. elegans researchers and can also benefit research in other model organisms with sequenced genomes. Currently, the SNP detection strategy presented in Chapter 3 is probably the cheapest method of detecting a SNP in a candidate region up to two Mb in length; however, the technique is not sensitive enough to detect all single nucleotide mutations in an interval of this size. Approximately 50% of C/G-to-T/A mutations that are generated by EMS mutagenesis are detectable in a one Mb candidate region, but A/T-to-T/A mutations are more difficult to detect. Of course, microarrays with this probe density are also capable of detecting very small deletions, and in one experiment a 10-bp deletion was detected and confirmed (data not shown). Further decreasing the size of the candidate region allows increased probe density and the targeting of both strands, which should improve the sensitivity and specificity of the technique.

5.5. SNP-CGH Mapping

The ability to detect SNPs using aCGH is exploited in the SNP-CGH mapping protocol that we have developed (Flibotte et al.
2009), which provides an extremely rapid and high-resolution means of mapping mutations in C. elegans. In this technique, ten F1 cross progeny are picked following a genetic cross of homozygous mutant hermaphrodites with a recessive phenotype and CB4856 males. One hundred homozygous mutant F2s and their progeny are then selected and allowed to grow as a mixed population. DNA is prepared from the resulting population of mutants, labeled with Cy3 and compared to Cy5-labeled N2 DNA by aCGH. A custom microarray targeting over 3,000 CB4856 SNPs spread evenly throughout the genome is used to compare the two DNA samples. This microarray includes up to 22 different probe sequences for both the N2 and the CB4856 alleles at each SNP locus. Due to recombination, mutants have inherited zero, one or two CB4856 SNP alleles at each SNP locus. At loci that are closely linked to the selected mutation, nearly all alleles in the pooled mutants are N2 alleles, so the expected log2 ratio (Cy3 / Cy5) is 0 for all probes. Loci that are unlinked to the mutation should include an equal proportion of N2 and CB4856 alleles in the absence of selection (although selection does occur at one locus; see Seidel et al. (2008)), giving log2 ratios < 0 for N2-specific probes and log2 ratios > 0 for CB4856-specific probes. The mapping signal is therefore strongest where the difference between the median log2 ratios measured for the CB4856-specific and N2-specific probes is the smallest. Fitting a cubic smoothing spline to this mapping signal allows the mutation to be mapped to within approximately 200 kb (Flibotte et al. 2009).

SNP-CGH mapping should become the method of choice for mapping mutations in C. elegans. This technique can also be applied to other model organisms. Mutations mapped in this way should then be relatively easy to identify using the method described in Chapter 3 because the candidate region is reduced to such a small interval.
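The mapping signal described above can be illustrated with a small calculation. This is only a sketch: the data are invented, the function names are hypothetical, and a centered moving average stands in for the cubic smoothing spline used in the published protocol (Flibotte et al. 2009).

```python
from statistics import median


def mapping_signal(loci):
    """Return (position, difference) pairs sorted by position.

    `loci` maps a chromosome position to a dict holding two lists of log2
    ratios: "n2" for N2-specific probes and "cb4856" for CB4856-specific
    probes. The difference is median(CB4856) - median(N2).
    """
    return [
        (pos, median(probes["cb4856"]) - median(probes["n2"]))
        for pos, probes in sorted(loci.items())
    ]


def smooth(signal, window=3):
    """Centered moving average (an assumed stand-in for the smoothing spline)."""
    half = window // 2
    smoothed = []
    for i, (pos, _) in enumerate(signal):
        neighbours = signal[max(0, i - half): i + half + 1]
        smoothed.append((pos, sum(d for _, d in neighbours) / len(neighbours)))
    return smoothed


def predicted_position(loci, window=3):
    """Position at which the smoothed difference is smallest."""
    return min(smooth(mapping_signal(loci), window), key=lambda item: item[1])[0]


# Invented data: five SNP loci (positions in bp), mutation placed near 3 Mb.
example = {
    1_000_000: {"n2": [-0.9, -1.1, -1.0], "cb4856": [1.0, 0.9, 1.2]},
    2_000_000: {"n2": [-0.5, -0.4, -0.6], "cb4856": [0.5, 0.6, 0.4]},
    3_000_000: {"n2": [0.0, 0.1, -0.1], "cb4856": [0.1, 0.0, -0.1]},
    4_000_000: {"n2": [-0.4, -0.5, -0.3], "cb4856": [0.4, 0.5, 0.6]},
    5_000_000: {"n2": [-1.0, -0.8, -0.9], "cb4856": [0.9, 1.1, 1.0]},
}
print(predicted_position(example))
```

Taking the median within each probe set limits the influence of individual poorly performing probes. A real analysis would fit the spline across thousands of loci rather than averaging a handful of neighbours.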
A candidate region of roughly 200 kb would allow probe spacing of 1 bp with probes targeting both strands, which should increase the sensitivity and specificity of the technique compared to our results in Chapter 3 with much larger candidate regions.

5.6. Future directions, deep sequencing and site-specific gene conversion

A currently more costly and less accessible alternative for SNP detection would be to utilize massively parallel high-throughput methods of DNA sequencing, or “deep sequencing”, such as the Illumina platform (Quail et al. 2008). The cost of this approach could potentially be reduced by sequencing only the candidate region instead of the entire mutant genome, by first isolating DNA from the candidate region using the method known as sequence capture (Hodges et al. 2007; Okou et al. 2007). Sequence capture involves hybridizing fragmented mutant DNA to a microarray of probes to the target region, washing away sample fragments that do not hybridize to the array, and then eluting the fragments bound to the array. These fragments are then amplified using PCR in order to prepare enough DNA for sequencing. This enriches for regions of interest, such as exons, and can avoid undesired sequencing of intergenic or repeat sequences. However, whether or not sequence capture will be worthwhile given the small size of the C. elegans genome and the decreasing cost of deep sequencing is uncertain.

While TILLING should remain an efficient method of locating an allelic series of mutations in a gene of interest (Till et al. 2003; Gilchrist et al. 2006), with increasing affordability and accessibility, deep sequencing will probably become the preferred method for untargeted genome-wide searches for null alleles. A recent resequencing study suggested that EMS mutagenesis can produce individual animals that are heterozygous for as many as 25 loss-of-function mutations (Cuppen et al. 2007).
Another study provided a proof of principle for using deep sequencing to detect single nucleotide mutations throughout the C. elegans genome by sequencing an N2 isolate and the CB4858 (Pasadena) strain (Hillier et al. 2008). The Knockout Consortium recently sequenced the genome of an unc-22 mutant generated by standard EMS mutagenesis and was able to find a single base pair mutation responsible for the unc-22 mutation, along with mutations leading to 34 non-conservative amino acid changes and one other nonsense mutation (Moerman 2008). In this experiment the gene responsible for the mutant phenotype was already known, but the SNP-CGH protocol described above could be used to narrow the list of candidate genes if deep sequencing is used in forward genetic screens. In the absence of an efficient site-directed method of mutagenesis, random mutagenesis followed by deep sequencing may eventually become the primary method of detecting null alleles for the Knockout Consortium.

Deep sequencing could also be used to identify null alleles in natural isolates. For instance, over 100 nonsense mutations were identified in a recent sequencing of the CB4856 genome (David Spencer, personal communication). The natural isolate study in Chapter 4 can be used to guide the selection of other strains for deep sequencing in the search for novel null mutations. The most promising strains would be those with the highest number of unique alleles, indicating significant divergence from the other strains in the study. These would include JU258, KR314 and MY2.

A significant disadvantage to the strategy of random mutagenesis followed by either aCGH or deep sequencing is that both methods will increasingly discover mutations in genes with existing mutations as the number of discovered mutations increases.
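The diminishing return from random mutagenesis can be made concrete with a simple occupancy calculation. The sketch below assumes, unrealistically, that every gene is equally likely to be hit by each new knockout; the gene count of 20,000 is an illustrative round number rather than a figure from the thesis, and the function names are hypothetical.

```python
def chance_next_hit_is_new(total_genes, genes_with_knockouts):
    """Probability that one more random knockout lands in a gene that has none yet."""
    return (total_genes - genes_with_knockouts) / total_genes


def expected_distinct_genes(total_genes, random_knockouts):
    """Expected number of distinct genes hit after uniformly random knockouts."""
    return total_genes * (1 - (1 - 1 / total_genes) ** random_knockouts)


TOTAL_GENES = 20_000  # assumption for illustration only

for knockouts in (5_000, 20_000, 40_000):
    covered = expected_distinct_genes(TOTAL_GENES, knockouts)
    p_new = chance_next_hit_is_new(TOTAL_GENES, covered)
    print(knockouts, round(covered), round(p_new, 2))
```

Because real mutagenesis hits large genes more often than small ones, the actual decline in newly hit genes would be steeper for the genes that remain uncovered longest.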
While it is possible to use microarrays or sequence capture strategies that do not target genes for which knockouts currently exist, both aCGH and deep sequencing will eventually become less efficient for identifying desirable knockouts as the number of gene targets decreases. Currently, both methods are attractive because knockout mutations do not exist for most C. elegans genes, so mutations that are discovered are likely to affect genes for which mutations do not yet exist. The PCR-based method involves site-directed mutation discovery but still relies on random mutagenesis. As the number of genes remaining to be knocked out decreases, site-directed methods of mutagenesis such as MosTIC (Robert and Bessereau 2007), or some other method utilizing homologous recombination that is practical at a genome scale, will likely be needed to achieve comprehensive mutation coverage of the C. elegans genome. For the foreseeable future, the aCGH platform developed in this thesis provides an efficient means of identifying novel mutations in C. elegans.

5.7. References

Barstead, R.J. and D.G. Moerman. 2006. C. elegans deletion mutant screening. Methods Mol Biol 351: 51-58.
Chen, N., S. Pai, Z. Zhao, A. Mah, R. Newbury, R.C. Johnsen, Z. Altun, D.G. Moerman, D.L. Baillie, and L.D. Stein. 2005. Identification of a nematode chemosensory gene family. Proc Natl Acad Sci U S A 102: 146-151.
Cuppen, E., E. Gort, E. Hazendonk, J. Mudde, J. van de Belt, I.J. Nijman, V. Guryev, and R.H.A. Plasterk. 2007. Efficient target-selected mutagenesis in Caenorhabditis elegans: Toward a knockout for every gene. Genome Res 17: 649-658.
de Bono, M. and C.I. Bargmann. 1998. Natural variation in a neuropeptide Y receptor homolog modifies social behavior and food response in C. elegans. Cell 94: 679-689.
Flibotte, S., M. Edgley, J. Maydan, J. Taylor, R. Zapf, R. Waterston, and D.G. Moerman. 2009. Rapid High Resolution SNP-CGH mapping in Caenorhabditis elegans. Genetics 181: 33-37.
Gems, D. and D.L. Riddle. 2000.
Defining wild-type life span in Caenorhabditis elegans. J Gerontol A Biol Sci Med Sci 55: B215-219.
Gilchrist, E.J., N.J. O'Neil, A.M. Rose, M.C. Zetka, and G.W. Haughn. 2006. TILLING is an effective reverse genetics technique for Caenorhabditis elegans. BMC Genomics 7: 262.
Hillier, L.W., G.T. Marth, A.R. Quinlan, D. Dooling, G. Fewell, D. Barnett, P. Fox, J.I. Glasscock, M. Hickenbotham, W. Huang, V.J. Magrini, R.J. Richt, S.N. Sander, D.A. Stewart, M. Stromberg, E.F. Tsung, T. Wylie, T. Schedl, R.K. Wilson, and E.R. Mardis. 2008. Whole-genome sequencing and variant discovery in C. elegans. Nat Methods 5: 183-188.
Hodges, E., Z. Xuan, V. Balija, M. Kramer, M.N. Molla, S.W. Smith, C.M. Middle, M.J. Rodesch, T.J. Albert, G.J. Hannon, and W.R. McCombie. 2007. Genome-wide in situ exon capture for selective resequencing. Nat Genet 39: 1522-1527.
Hodgkin, J. and T. Doniach. 1997. Natural variation and copulatory plug formation in Caenorhabditis elegans. Genetics 146: 149-164.
Jovelin, R., B.C. Ajie, and P.C. Phillips. 2003. Molecular evolution and quantitative variation for chemosensory behaviour in the nematode genus Caenorhabditis. Mol Ecol 12: 1325-1337.
LaMunyon, C.W. and S. Ward. 2002. Evolution of larger sperm in response to experimentally increased sperm competition in Caenorhabditis elegans. Proc Biol Sci 269: 1125-1128.
Moerman, D. 2008. Deep sequencing of an unc-22 mutant following EMS mutagenesis in C. elegans. Personal communication. Vancouver, BC, Canada.
Okou, D.T., K.M. Steinberg, C. Middle, D.J. Cutler, T.J. Albert, and M.E. Zwick. 2007. Microarray-based genomic selection for high-throughput resequencing. Nat Methods 4: 907-909.
Quail, M.A., I. Kozarewa, F. Smith, A. Scally, P.J. Stephens, R. Durbin, H. Swerdlow, and D.J. Turner. 2008. A large genome center's improvements to the Illumina sequencing system. Nat Methods 5: 1005-1010.
Robert, V. and J.L. Bessereau. 2007.
Targeted engineering of the Caenorhabditis elegans genome following Mos1-triggered chromosomal breaks. Embo J 26: 170-183.
Schulenburg, H. and S. Muller. 2004. Natural variation in the response of Caenorhabditis elegans towards Bacillus thuringiensis. Parasitology 128: 433-443.
Seidel, H.S., M.V. Rockman, and L. Kruglyak. 2008. Widespread genetic incompatibility in C. elegans maintained by balancing selection. Science 319: 589-594.
Stein, L.D., Z. Bao, D. Blasiar, T. Blumenthal, M.R. Brent, N. Chen, A. Chinwalla, L. Clarke, C. Clee, A. Coghlan, A. Coulson, P. Eustachio, D.H.A. Fitch, L.A. Fulton, R.E. Fulton, S. Griffiths-Jones, T.W. Harris, L.W. Hillier, R. Kamath, P.E. Kuwabara, E.R. Mardis, M.A. Marra, T.L. Miner, P. Minx, J.C. Mullikin, R.W. Plumb, J. Rogers, J.E. Schein, M. Sohrmann, J. Spieth, J.E. Stajich, C. Wei, D. Willey, R.K. Wilson, R. Durbin, and R.H. Waterston. 2003. The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics. PLoS Biology 1: e45.
Thomas, J.H. 2006. Analysis of homologous gene clusters in Caenorhabditis elegans reveals striking regional cluster domains. Genetics 172: 127-143.
Till, B.J., T. Colbert, R. Tompa, L.C. Enns, C.A. Codomo, J.E. Johnson, S.H. Reynolds, J.G. Henikoff, E.A. Greene, M.N. Steine, L. Comai, and S. Henikoff. 2003. High-throughput TILLING for functional genomics. Methods Mol Biol 236: 205-220.

Appendix 1. Genes that are completely deleted from the Hawaiian strain (CB4856) genome. The position listed is the middle of the deleted gene.
Genome Name Genetic Name / Family Chromosome Position Note ZK993.2  I 1,112,217 Y39G10AR.5 SR-unclass I 2,346,427 F35E2.3  I 11,743,123 T02G6.6 fbxb I 11,829,186 Y47H9C.14  I 11,906,180 Y47H9C.9  I 11,908,246 Y47H9C.10 fbxa I 11,910,701 M01G12.14 cfam4 I 12,115,787 T15D6.1 duf595 I 12,380,121 E03H4.12  I 12,444,750 H16D19.4  I 12,650,704 T26E3.8  I 12,666,675 T26E3.1 clec I 12,691,915 W02A11.6  I 12,762,730 pseudogene W02A11.8 bath-35 I 12,767,118 F44F1.1  I 13,260,146 K05C4.9  I 14,736,917 C03H5.1 clec-10 II 403,400 F28A10.3  II 840,887 T07D3.5  II 886,263 T07D3.4  II 889,574 K02E7.9 btb II 1,060,237 K02E7.5 fbx II 1,063,042 K02E7.10 cathA II 1,066,288 K02E7.12  II 1,071,265 Y51H7BR.3  II 1,534,818 Y51H7BR.2 fbxb-43 II 1,536,773 Y51H7BR.1 fbxb-42 II 1,538,130 K05F6.5 fbxb-44 II 1,540,811 K05F6.4 fbxb II 1,545,252 K05F6.6 fbxb-52 II 1,548,512 K05F6.3 fbxb-51 II 1,552,095 K05F6.7 fbxb-54 II 1,555,788 K05F6.2 fbxb-50 II 1,559,902 K05F6.8 fbx II 1,567,541 K05F6.9 fbxb-46 II 1,570,611 K05F6.1 fbxb-49 II 1,570,990 K05F6.10 btb II 1,573,768 C08E3.7 fbxa II 1,613,422 C08E3.8 fbxa II 1,615,632 C08E3.9 fbxa II 1,617,948 C08E3.10 fbxa II 1,620,582 C08E3.11 fbxa II 1,622,249 C08E3.12 fbxa II 1,624,294 ZC204.9 fbxb-20 II 1,649,688 F58E1.12 fbxc II 1,693,959   88 F58E1.13 btb II 1,695,061 F36H5.9 fbxb II 1,752,923 F36H5.11 fbxb-12 II 1,754,848 F36H5.3 math-28 II 1,764,854 F36H5.2b math-27 II 1,769,883 F36H5.1 math-26 II 1,773,253 C08F1.4a math-3 II 1,777,284 C08F1.5 math-4 II 1,780,607 C08F1.10 cfam10 II 1,785,801 C08F1.6 cfam10 II 1,787,697 C08F1.3 fbxb-13 II 1,789,688 C08F1.2 str-21 II 1,792,475 pseudogene C08F1.1 math-2 II 1,800,515 C08F1.7 str-22 II 1,802,708 pseudogene C08F1.8 cfam10 II 1,806,255 C08F1.9 revt II 1,809,929 T08E11.6 fbxb-10 II 1,817,844 T08E11.7 fbxa-3 II 1,819,328 T08E11.5 fbxc II 1,822,276 T08E11.4 math-41 II 1,827,335 T08E11.3 math-40 II 1,831,237 T08E11.2 math-39 II 1,832,898 T08E11.8 fbx II 1,834,708 T08E11.1 fbx II 1,838,364 C52E2.5 fbx II 1,841,959 
C52E2.4 fbx II 1,844,889 C52E2.6 fbxb-97 II 1,847,375 C52E2.7 fbxb-96 II 1,849,922 C52E2.3 duf130 II 1,851,866 C52E2.2 clec II 1,852,943 C52E2.1 fbxb-95 II 1,854,863 C52E2.8  II 1,856,710 C16C4.7  II 1,859,901 C16C4.6 fbxb-98 II 1,863,210 C16C4.5 math-15 II 1,866,964 C16C4.4 math-14 II 1,870,059 C16C4.15 math-10 II 1,872,349 C16C4.16 math-11 II 1,874,693 C16C4.3 math-13 II 1,876,411 C16C4.8 math-16 II 1,878,336 C16C4.9 math-17 II 1,880,206 C16C4.10 math-5 II 1,882,210 C16C4.11 math-6 II 1,884,168 C16C4.12 math-7 II 1,885,961 C16C4.13 math-8 II 1,888,192 C16C4.14 math-9 II 1,890,348 C16C4.2 math-12 II 1,891,823 C16C4.1  II 1,894,512 C46F9.4 math-25 II 1,896,912 C46F9.3 math-24 II 1,901,084 C46F9.2 math-23 II 1,903,366 C46F9.1 math-22 II 1,905,674 F52C6.5 math-30 II 1,908,950 F52C6.6 math-31 II 1,912,655 F52C6.7 bath-11 II 1,914,743 F52C6.8 bath-4 II 1,917,377   89 F52C6.9 bath-6 II 1,918,881 F52C6.10 bath-7 II 1,920,562 F52C6.11 bath-2 II 1,922,183 F52C6.4 ubq II 1,923,564 F52C6.3 ubq II 1,925,115 F52C6.2 ubq II 1,926,664 F52C6.1 bath-22 II 1,928,308 F52C6.12 ubql-e2 II 1,929,757 F52C6.13  II 1,930,470 F52C6.14  II 1,932,827 C40D2.2 math-20 II 1,997,515 C40D2.1 math-19 II 1,999,261 C40D2.4 homeodomain II 2,000,366 F59H6.5 pif-helicase II 2,005,815 F59H6.6  II 2,011,981 F59H6.4 math-32 II 2,012,019 F59H6.3  II 2,018,442 F59H6.2  II 2,021,049 F59H6.8 bath-21 II 2,026,127 F59H6.9 bath-1 II 2,028,248 F59H6.10 bath-3 II 2,029,999 F59H6.11 bath-5 II 2,031,390 F59H6.12 btb II 2,033,198 F59H6.1 bath-19 II 2,037,553 B0047.1 bath-20 II 2,041,105 B0047.2 btb II 2,043,359 B0047.3 bath-24 II 2,045,145 B0047.4 math-1 II 2,046,356 B0047.5 bath-14 II 2,048,442 F07E5.4  II 2,050,852 F07E5.2 fbxb-35 II 2,052,197 F07E5.5 ubql-e3 II 2,056,538 T16A1.4 clec II 2,081,788 T16A1.5  II 2,083,712 T16A1.9  II 2,094,932 T16A1.1 math-42 II 2,101,501 K09F6.6  II 2,277,033 K09F6.9  II 2,281,786 K09F6.10  II 2,288,421 K09F6.7 ubql-e3 II 2,291,523 K09F6.8 fbxc II 2,295,399 B0281.4 btb II 2,298,227 
B0281.5 btb II 2,300,381 B0281.6 btb II 2,301,816 B0281.3 ubql-e3 II 2,307,558 B0281.2 revt II 2,308,984 B0281.7  II 2,310,574 pseudogene B0281.8 ubql-e3 II 2,312,170 B0281.1  II 2,313,998 ZK1240.4  II 2,315,354 ZK1240.5 ubql-e3 II 2,317,122 ZK1240.9 ubql-e3 II 2,318,984 ZK1240.3 ubql-e3 II 2,321,053 ZK1240.6 ubql-e3 II 2,322,662 ZK1240.2 ubql-e3 II 2,324,267 ZK1240.8 ubql-e3 II 2,328,025   90 ZK1240.1 ubql-e3 II 2,330,624 F43C11.11 duf130 II 2,360,120 F43C11.12 duf130 II 2,365,867 F16G10.5  II 2,367,873 F16G10.4 duf130 II 2,369,674 F16G10.3 duf130 II 2,373,287 F42G2.5 ubql-e3 II 2,417,192 Y27F2A.3 sri-40 II 3,179,286 Y27F2A.6  II 3,181,603 Y27F2A.8 fbxb II 3,184,153 Y27F2A.2 sri-78 II 3,185,969 Y27F2A.1 sri-58 II 3,188,052 pseudogene Y27F2A.9 duf130 II 3,191,463 ZC239.7 gcy-15 II 3,194,643 ZC239.8 sri-51 II 3,198,351 ZC239.9 sri-48 II 3,200,257 ZC239.10 sri-53 II 3,201,907 ZC239.19 sri-50 II 3,204,624 ZC239.12 btb II 3,208,305 ZC239.6 btb II 3,210,123 ZC239.5 btb II 3,212,757 ZC239.4 btb II 3,214,931 ZC239.3 btb II 3,216,336 ZC239.2 btb II 3,218,444 ZC239.14 btb II 3,221,697 ZC239.13 btb II 3,222,750 ZC239.15 btb II 3,224,089 ZC239.16 btb II 3,225,023 ZC239.17 btb II 3,225,676 ZC239.1  II 3,227,772 pseudogene C17F4.4 srh-297 II 3,229,699 C17F4.8  II 3,232,840 C17F4.10 srz-67 II 3,236,542 F14D2.7  II 3,337,169 F14D2.5  II 3,340,797 F14D2.11 ubq II 3,341,568 F14D2.15 btb II 3,342,960 F14D2.13 bath-28 II 3,344,466 F14D2.12 bath-30 II 3,345,825 F14D2.14  II 3,347,163 F14D2.8 fbxo II 3,348,452 F14D2.10  II 3,352,058 Y49F6A.1  II 3,605,380 F19B10.11  II 3,652,760 F19B10.10 cfam19 II 3,679,013 F19B10.1  II 3,683,812 F19B10.12 srx-99 II 3,686,537 pseudogene F40H7.4 srx-101 II 3,690,101 F40H7.3 srx-100 II 3,692,502 pseudogene T24E12.5  II 3,755,628 F54D10.7 ubq II 3,820,745 F53C3.7  II 3,895,533 R03H10.7  II 4,177,232 F59E12.6a  II 5,628,313 F10E7.2  II 7,123,244 F10E7.3  II 7,124,918   91 F10E7.1 duf1114 II 7,127,730 F15A4.8b glyh II 12,477,119 Y46G5A.7 fbxb II 12,755,280 
Y46G5A.8 fbxb II 12,756,890 E01G4.5  II 13,473,126 Y39G8C.2 pkin-st II 14,096,391 Y53F4B.5  II 14,983,191 cTel54X.1 fbxa-6 III 2,094 W05G11.5 btb III 60,258 C29F9.4  III 120,719 C29F9.12  III 121,877 C29F9.14  III 124,486 C29F9.3a  III 125,611 Y71H2AM.14 glyct III 2,744,544 F44E2.2b  III 8,857,128 Y75B8A.32  III 12,356,679 Y75B8A.34 SR-unclass III 12,360,282 3R5.1  III 13,780,116 F38A1.7 clec IV 1,258,214 F38A1.14 clec IV 1,262,935 F38A1.13 clec IV 1,265,538 R05C11.2  IV 2,058,494 Y69A2AR.24  IV 2,563,882 Y69A2AR.25  IV 2,567,101 Y69A2AR.13  IV 2,570,034 Y69A2AR.12  IV 2,571,508 Y69A2AR.26 nhr-242 IV 2,579,348 Y69A2AR.11  IV 2,586,164 Y69A2AR.10 revt IV 2,591,779 Y69A2AR.9  IV 2,595,463 Y69A2AR.8  IV 2,596,557 Y69A2AR.27  IV 2,597,860 Y94H6A.10  IV 2,710,391 Y46C8AL.5 clec-72 IV 3,946,043 Y46C8AL.6 revt IV 3,950,340 Y46C8AL.1 clec-73 IV 3,955,131 Y46C8AL.8 clec-74 IV 3,958,253 Y46C8AL.9b clec-75 IV 3,962,690 Y46C8AR.1 clec-76 IV 3,967,076 F49F1.7 duf18 IV 4,124,638 F49F1.8  IV 4,132,560 F49F1.9 glec IV 4,137,201 F49F1.10 glec IV 4,138,758 F49F1.11 glec IV 4,140,539 F49F1.12  IV 4,142,226 F49F1.13  IV 4,143,825 pseudogene R07C12.3 clec IV 4,145,897 R07C12.2 clec IV 4,149,451 R07C12.4 clec IV 4,151,495 R07C12.1 clec IV 4,153,281 K08D10.10 clec IV 4,155,742 K08D10.9 clec IV 4,159,983 K08D10.8 scram IV 4,163,466 K08D10.7 scram IV 4,165,471 Y7A9C.9 srz-75 IV 16,287,201 Y7A9C.8 srz-76 IV 16,288,673   92 Y7A9C.3 srz-73 IV 16,290,158 pseudogene Y7A9C.5 srz-40 IV 16,292,490 pseudogene Y7A9C.1  IV 16,296,141 Y7A9C.7 srz-72 IV 16,299,673 Y7A9C.6 srz-41 IV 16,301,128 pseudogene Y7A9C.2 srh-225 IV 16,304,885 pseudogene Y7A9C.4 srz-39 IV 16,306,537 pseudogene K03D3.10d rac-2|rab IV 16,311,721 K03D3.8  IV 16,314,466 K03D3.6  IV 16,317,566 K03D3.5 cfam23 IV 16,320,284 K03D3.4 srz-61 IV 16,322,142 K03D3.11 srz-like IV 16,324,469 K03D3.3 srz-21 IV 16,327,404 pseudogene K03D3.2  IV 16,328,336 K03D3.1 srz-74 IV 16,330,220 C35D6.10 srz-71 IV 16,335,623 C35D6.9 srz-38 IV 16,335,881 
C35D6.1 srh-228 IV 16,340,481 C35D6.2 srh-227 IV 16,343,306 C35D6.8 srh-224 IV 16,343,495 pseudogene VY10G11R.1  IV 16,469,991 Y50D4B.6 pkin-st V 1,085,623 F59A7.10  V 2,007,363 pseudogene F59A7.8 snfh V 2,009,954 F59A7.4 hil-6 V 2,030,894 Y40B10A.3 srab-23 V 2,034,985 C45H4.9 srbc-23 V 2,154,171 C29G2.2  V 2,590,434 C29G2.1  V 2,591,294 C31B8.6 str-46 V 2,899,029 K12D9.6 srw-125 V 2,997,332 pseudogene Y73C8C.3 cfam19 V 3,116,543 T28A11.21 fbxa-64 V 3,277,435 T28A11.2 duf19 V 3,281,339 T28A11.1 str-64 V 3,283,753 F35F10.5 duf19 V 3,285,440 C17B7.8 nepp V 3,322,256 C17B7.7  V 3,327,764 C17B7.9 duf19 V 3,331,340 C17B7.5 duf750 V 3,336,142 C17B7.4 duf19 V 3,339,516 C17B7.10 nepp V 3,341,306 C17B7.3 duf19 V 3,343,161 C17B7.11 fbxa-65 V 3,345,781 C17B7.2 duf19 V 3,349,595 C17B7.1 str-63 V 3,351,996 C17B7.12 duf19 V 3,353,418 C04E12.6 duf976 V 3,354,917 C04E12.7 scram V 3,356,385 C04E12.5 duf750 V 3,359,694 C04E12.4 duf750 V 3,365,365 C04E12.2 duf19 V 3,369,327 C04E12.8 srx-121 V 3,371,664 C04E12.9 srbc-2 V 3,375,638 C04E12.10 duf750 V 3,379,587   93 C04E12.1 srbc-4 V 3,384,361 pseudogene C04E12.11  V 3,387,183 C04E12.12  V 3,389,956 T20D4.13  V 3,392,602 T20D4.15 duf19 V 3,393,567 T20D4.16 duf19 V 3,394,365 T20D4.17 duf19 V 3,395,578 T20D4.12 duf19 V 3,396,799 T20D4.11 duf19 V 3,398,867 T20D4.10 duf19 V 3,401,132 T20D4.9 nepp V 3,403,879 T20D4.8 nepp V 3,406,627 T20D4.7 thiordx V 3,408,974 T20D4.6  V 3,411,448 T20D4.5 duf750 V 3,414,406 T20D4.4 duf750 V 3,416,686 T20D4.3 duf750 V 3,420,324 T20D4.18 srab-21 V 3,423,521 T20D4.2 srab-22 V 3,426,881 T20D4.1 srab-20 V 3,429,530 T20D4.19 duf19 V 3,430,884 F47D2.6 srt-38 V 4,270,236 Y60C6A.1  V 4,784,696 T15B7.10  V 6,813,742 C03G6.1 srx-57 V 7,379,669 pseudogene Y97E10B.6 srx-11 V 7,917,665 pseudogene C54D10.7  V 12,443,219 ZK1037.4 nhr-246 V 15,320,952 T23F1.3 str-72 V 15,459,341 K08G2.9 srh-299 V 15,866,484 K08G2.10 srh-294 V 15,868,034 pseudogene K08G2.8 srh-293 V 15,869,320 C06C6.7  V 15,996,490 F44G3.8 fbxa-87 V 
16,132,156 F11A5.7 recepL V 16,208,586 F11A5.8 acylt V 16,211,006 F57E7.1 cfam23 V 16,484,385 F57E7.2  V 16,485,590 T03E6.2 str-126 V 16,584,688 F14F8.6 srw-44 V 16,684,051 F14F8.7 srw-36 V 16,686,614 F09C6.9 nhr-116 V 16,916,337 F09C6.10  V 16,918,409 Y102A5C.1 fbxa V 16,920,452 Y102A5C.2  V 16,921,743 Y102A5C.3 fbxa V 16,923,131 Y68A4A.2 srz-47 V 17,194,164 T19C9.5 scp-like V 17,230,147 T19C9.6 CUB2 V 17,232,325 T19C9.8 CUB2 V 17,242,116 Y68A4B.3  V 17,244,463 Y68A4B.2 clec V 17,247,214 Y68A4B.1 clec V 17,250,421 Y61B8A.3  V 17,253,804 pseudogene Y61B8A.1 srh-116 V 17,256,607 Y61B8A.2 srh-115 V 17,259,022   94 K10G4.2 srw-47 V 17,263,954 K10G4.3  V 17,267,087 K10G4.6 srh-117 V 17,274,886 pseudogene K10G4.7 srh-262 V 17,279,657 pseudogene K10G4.1  V 17,282,723 K10G4.4  V 17,286,351 K10G4.9 srw-30 V 17,290,602 K10G4.8 srw-46 V 17,295,549 pseudogene K10G4.5 fbx V 17,300,703 Y61B8B.1 sri-70 V 17,306,323 Y61B8B.2  V 17,308,868 F31E9.6  V 17,316,992 F31E9.7 srz-26 V 17,319,612 pseudogene F31E9.5 srz-58 V 17,321,408 F31E9.1 fbxa V 17,323,597 F31E9.3 fbx V 17,327,083 F31E9.4 fbxb V 17,329,822 F31E9.2 srg-44 V 17,332,281 F47H4.5  V 17,335,906 pseudogene F47H4.4 fbxa V 17,336,170 F47H4.6 fbxa V 17,339,956 F47H4.7 fbxa V 17,343,841 F47H4.8 fbxa V 17,346,496 F47H4.9 fbxa V 17,349,046 F47H4.10 skr-5 V 17,350,819 F47H4.1  V 17,352,023 F47H4.11 fbxa-134 V 17,354,051 F47H4.2 fbx V 17,359,324 T27C5.7 clec V 17,416,725 T27C5.12 duf595 V 17,418,588 T27C5.8  V 17,420,351 T27C5.14 srh-96 V 17,422,304 pseudogene T27C5.10 srw V 17,432,342 F20E11.15 srbc-27 V 17,434,429 F20E11.16 srbc-28 V 17,436,678 pseudogene F20E11.2 srsx-2 V 17,439,625 F20E11.11 srh-175 V 17,442,319 pseudogene F20E11.12 srh-154 V 17,443,742 F20E11.13 srh-158 V 17,446,702 pseudogene F20E11.3 srh-160 V 17,449,569 pseudogene F20E11.8 srh-157 V 17,451,538 pseudogene F20E11.9 srh-156 V 17,454,047 pseudogene F20E11.14 srh-161 V 17,455,818 pseudogene F20E11.10 srh-203 V 17,457,973 F20E11.4 str-200 V 17,459,971 F20E11.1 
srz-48 V 17,461,924 F20E11.5  V 17,464,200 F20E11.7  V 17,468,180 F20E11.6 srw-72 V 17,470,398 F08E10.1 srh-235 V 17,473,520 F08E10.8 srh-114 V 17,475,475 pseudogene F08E10.3 srh-123 V 17,477,311 F08E10.2 srbc-61 V 17,480,025 F08E10.4 srh-110 V 17,481,436 pseudogene F08E10.5 srh-253 V 17,483,929 pseudogene F08E10.6 srh-111 V 17,487,188   95 F08E10.7 scp-like V 17,489,394 K03D7.9  V 17,490,666 K03D7.8  V 17,493,132 K03D7.7 fbxa-102 V 17,495,320 K03D7.6 srh-118 V 17,497,815 K03D7.5  V 17,499,492 pseudogene K03D7.4 srh-261 V 17,500,773 C18D4.3  V 17,535,146 Y6G8.1 srz-45 V 17,595,891 Y6G8.2 fbx V 17,599,921 F57G4.5  V 17,644,968 F57G4.6  V 17,646,034 F57G4.9 tc-related V 17,646,969 F57G4.7  V 17,649,407 pseudogene F57G4.8 fbxa V 17,652,300 F59A1.5  V 17,654,479 pseudogene F59A1.9 fbxa V 17,656,958 F59A1.8 fbxa-129 V 17,658,506 F59A1.12  V 17,668,843 Y94A7B.5 srh-298 V 17,822,511 Y94A7B.6 srh-300 V 17,825,573 Y94A7B.8 srh-301 V 17,828,918 Y94A7B.9 srh-304 V 17,831,873 Y94A7B.7 srh-303 V 17,835,110 F16H6.3 duf18 V 18,199,159 F16H6.4 duf976 V 18,202,398 F16H6.5 duf976 V 18,205,191 F16H6.6  V 18,208,395 F16H6.7 duf976 V 18,210,452 F16H6.8  V 18,216,921 F16H6.9  V 18,222,266 F16H6.10  V 18,226,306 Y37H2B.1  V 18,230,180 R10E8.7  V 18,234,791 pseudogene Y51A2A.2  V 18,290,424 pseudogene Y51A2A.3 clec V 18,292,662 Y51A2A.4 clec V 18,293,715 Y51A2A.11 clec V 18,297,685 Y51A2A.5  V 18,302,334 C08E8.3  V 18,354,009 Y69H2.10b  V 18,679,902 F11D11.1 clec V 18,771,813 T26H2.2 fbxb-115 V 19,241,708 T26H2.1 fbxb-1 V 19,243,562 F21D9.6  V 19,246,505 C43D7.7  V 19,321,820 C43D7.5 sdz-6 V 19,323,487 C43D7.4  V 19,324,767 C25F9.6  V 19,405,779 C25F9.5 snfh V 19,411,733 C25F9.4 snfh V 19,415,190 C25F9.9 snfh V 19,420,199 C25F9.2  V 19,426,665 C25F9.1 srw-85 V 19,433,165 C25F9.t5  V 19,433,900 pseudogene M04C3.1a snfh V 19,443,365   96 M04C3.3  V 19,447,117 M04C3.2 snfh V 19,450,053 Y43F8B.14 snfh V 19,456,220 Y43F8B.13 snfh V 19,462,461 Y116F11B.7  V 19,848,780 Y116F11B.11  V 19,877,121 
Y113G7A.12  V 20,143,284 Y113G7A.13  V 20,146,441 F19B2.7  V 20,150,161 F19B2.6  V 20,158,010 F19B2.5 snfh V 20,161,086 F19B2.4 srz-36 V 20,166,293 pseudogene F19B2.3 srw-39 V 20,169,056 F19B2.2 srz-34 V 20,171,407 pseudogene F19B2.10  V 20,174,905 pseudogene F19B2.1 srz-33 V 20,176,869 pseudogene F19B2.8 srz-16 V 20,183,258 Y113G7B.1 fbxa-116 V 20,186,293 Y113G7B.3 fbxa-115 V 20,189,348 Y113G7B.12  V 20,208,787 Y113G7B.11  V 20,212,649 Y113G7B.t2  V 20,212,902 pseudogene Y113G7B.26  V 20,223,154 Y113G7B.14 snfh V 20,225,829 Y113G7B.15 cathA V 20,229,351 F26F2.1  V 20,557,803 F26F2.2  V 20,561,715 F26F2.3  V 20,564,027 F26F2.4  V 20,567,907 F26F2.5  V 20,571,910 H02F09.4 revt X 1,564,033 Y75D11A.1  X 1,757,098 Y75D11A.4 revt X 1,763,420 Y75D11A.5  X 1,770,272 ZC53.4  X 1,931,714 Y59E1A.1 fbxa-40 X 1,940,067 Y48D7A.1  X 3,009,828

Appendix 2. Genes that are completely deleted from the Madeiran strain (JU258) genome. The position listed is the middle of the deleted gene.

Genome Name Genetic Name / Family Chromosome Position Note Y74C10AR.3 abc-tr I 2,478,901 Y74C10AR.2  I 2,483,169 F27C1.6  I 5,424,977 ZK39.3 clec I 11,155,497 Y53H1A.3 clec I 11,242,863 Y53H1C.3  I 11,427,683 W04G5.5 duf750 I 11,643,850 T02G6.7 glec I 11,836,206 Y47H9C.14  I 11,906,180 Y47H9C.9  I 11,908,246 Y47H9C.10 fbxa I 11,910,701 Y47H10A.1 clp-3 I 12,085,997 F15H9.1 duf316 I 12,166,054 T09E11.3 duf595 I 12,378,020 T15D6.1 duf595 I 12,380,121 K07E8.3 sdz-24 II 659,188 Y51H7BR.2 fbxb-43 II 1,536,773 Y51H7BR.1 fbxb-42 II 1,538,130 K05F6.5 fbxb-44 II 1,540,811 K05F6.4 fbxb II 1,545,252 K05F6.6 fbxb-52 II 1,548,512 K05F6.3 fbxb-51 II 1,552,095 K05F6.7 fbxb-54 II 1,555,788 K05F6.2 fbxb-50 II 1,559,902 K05F6.8 fbx II 1,567,541 K05F6.9 fbxb-46 II 1,570,611 K05F6.1 fbxb-49 II 1,570,990 K05F6.10 btb II 1,573,768 T07H3.3 math-38 II 1,578,871 T07H3.4 clec-21 II 1,582,514 T07H3.5 clec-20 II 1,594,364 T07H3.2 bath-46 II 1,598,756 T07H3.6 bath-26 II 1,600,143 T07H3.1 bath-47 II 1,603,214 C08E3.3 bath-33 
II 1,605,409 C08E3.4 fbxa II 1,606,917 C08E3.5 fbxa II 1,608,608 C08E3.7 fbxa II 1,613,422 C08E3.8 fbxa II 1,615,632 C08E3.9 fbxa II 1,617,948 C08E3.10 fbxa II 1,620,582 C08E3.11 fbxa II 1,622,249 C08E3.12 fbxa II 1,624,294 ZC204.8 fbxb II 1,647,165 ZC204.9 fbxb-20 II 1,649,688 ZC204.10 fbxb-16 II 1,651,203 ZC204.7 fbxb-15 II 1,652,998   98 ZC204.11 btb II 1,654,288 ZC204.3 btb II 1,655,686 ZC204.12 btb II 1,656,831 ZC204.13  II 1,660,734 F58E1.8 fbxb-18 II 1,686,937 F58E1.9 fbxb-19 II 1,688,707 F58E1.10 fbxc II 1,690,444 F58E1.11 fbxc II 1,692,234 F58E1.12 fbxc II 1,693,959 F58E1.13 btb|fbxa II 1,695,061 F36H5.9 fbxb II 1,752,923 F36H5.11 fbxb-12 II 1,754,848 F36H5.10 cfam10 II 1,757,483 F36H5.5  II 1,759,501 F36H5.4 cfam10 II 1,762,364 F36H5.3 math-28 II 1,764,854 F36H5.2b math-27 II 1,769,883 F36H5.1 math-26 II 1,773,253 C08F1.4a math-3 II 1,777,284 C08F1.5 math-4 II 1,780,607 C08F1.10 cfam10 II 1,785,801 C08F1.6 cfam10 II 1,787,697 C08F1.3 fbxb-13 II 1,789,688 C08F1.2 str-21 II 1,792,475 pseudogene C08F1.1 math-2 II 1,800,515 C08F1.7 str-22 II 1,802,708 pseudogene C08F1.8 cfam10 II 1,806,255 C08F1.9 revt II 1,809,929 T08E11.6 fbxb-10 II 1,817,844 T08E11.7 fbxa-3 II 1,819,328 T08E11.5 fbxc II 1,822,276 T08E11.4 math-41 II 1,827,335 T08E11.3 math-40 II 1,831,237 T08E11.2 math-39 II 1,832,898 T08E11.8 fbx II 1,834,708 T08E11.1 fbx II 1,838,364 C52E2.5 fbx II 1,841,959 C52E2.4 fbx II 1,844,889 C52E2.6 fbxb-97 II 1,847,375 C52E2.7 fbxb-96 II 1,849,922 C52E2.3 duf130 II 1,851,866 C52E2.2 clec II 1,852,943 C52E2.1 fbxb-95 II 1,854,863 C52E2.8  II 1,856,710 C16C4.7  II 1,859,901 C16C4.6 fbxb-98 II 1,863,210 C16C4.5 math-15 II 1,866,964 C16C4.4 math-14 II 1,870,059 C16C4.15 math-10 II 1,872,349 C16C4.16 math-11 II 1,874,693 C16C4.3 math-13 II 1,876,411 C16C4.8 math-16 II 1,878,336 C16C4.9 math-17 II 1,880,206 C16C4.10 math-5 II 1,882,210 C16C4.11 math-6 II 1,884,168 C16C4.12 math-7 II 1,885,961   99 C16C4.13 math-8 II 1,888,192 C16C4.14 math-9 II 1,890,348 C16C4.2 
math-12 II 1,891,823 C16C4.1  II 1,894,512 C46F9.4 math-25 II 1,896,912 C46F9.3 math-24 II 1,901,084 C46F9.2 math-23 II 1,903,366 C46F9.1 math-22 II 1,905,674 F52C6.5 math-30 II 1,908,950 F52C6.6 math-31 II 1,912,655 F52C6.7 bath-11 II 1,914,743 F52C6.8 bath-4 II 1,917,377 F52C6.9 bath-6 II 1,918,881 F52C6.10 bath-7 II 1,920,562 F52C6.11 bath-2 II 1,922,183 F52C6.4 ubq II 1,923,564 F52C6.3 ubq II 1,925,115 F52C6.2 ubq II 1,926,664 F52C6.1 bath-22 II 1,928,308 F52C6.12 ubql-e2 II 1,929,757 F52C6.13  II 1,930,470 F52C6.14  II 1,932,827 C40D2.4 homeodomain II 2,000,366 F59H6.5 pif-helicase II 2,005,815 F59H6.6  II 2,011,981 F59H6.4 math-32 II 2,012,019 F59H6.3  II 2,018,442 F59H6.2  II 2,021,049 F59H6.7 cya-2 II 2,023,091 F59H6.8 bath-21 II 2,026,127 F59H6.9 bath-1 II 2,028,248 F59H6.10 bath-3 II 2,029,999 F59H6.11 bath-5 II 2,031,390 F59H6.12 btb II 2,033,198 F59H6.1 bath-19 II 2,037,553 B0047.1 bath-20 II 2,041,105 B0047.2 btb II 2,043,359 B0047.3 bath-24 II 2,045,145 B0047.4 math-1 II 2,046,356 B0047.5 bath-14 II 2,048,442 F07E5.1 fbxb-6 II 2,066,718 F07E5.7 duf130 II 2,071,062 F07E5.8 duf-wsn II 2,074,097 F07E5.10  II 2,077,463 pseudogene F07E5.9  II 2,077,850 T16A1.4 clec II 2,081,788 T16A1.5  II 2,083,712 T16A1.3 fbxc II 2,084,511 T16A1.7 pqn-66 II 2,088,014 T16A1.8 fbxb-37 II 2,092,797 T16A1.9  II 2,094,932 T16A1.2 duf-wsn II 2,096,989 T16A1.1 math-42 II 2,101,501 R52.3 math-35 II 2,105,696 R52.4 duf130 II 2,107,797 R52.5 duf130 II 2,109,416   100 R52.6 duf130 II 2,112,662 R52.7 srh-195 II 2,115,764 R52.8 math-36 II 2,118,724 R52.9 math-37 II 2,121,461 R52.10 math-btb II 2,126,732 R52.2  II 2,127,158 R52.1 math-btb II 2,131,674 C40A11.5 pphos-y II 2,134,686 C40A11.10  II 2,137,036 C40A11.4 btb II 2,138,139 K09F6.6  II 2,277,033 K09F6.9  II 2,281,786 K09F6.10  II 2,288,421 K09F6.7 ubql-e3 II 2,291,523 K09F6.8 fbxc II 2,295,399 B0281.4 btb II 2,298,227 B0281.5 btb II 2,300,381 B0281.6 btb II 2,301,816 B0281.3 ubql-e3 II 2,307,558 B0281.2 revt II 2,308,984 B0281.7 
 II 2,310,574 pseudogene B0281.8 ubql-e3 II 2,312,170 B0281.1  II 2,313,998 ZK1240.4  II 2,315,354 ZK1240.5 ubql-e3 II 2,317,122 ZK1240.9 ubql-e3 II 2,318,984 ZK1240.3 ubql-e3 II 2,321,053 ZK1240.6 ubql-e3 II 2,322,662 ZK1240.2 ubql-e3 II 2,324,267 ZK1240.8 ubql-e3 II 2,328,025 ZK1240.1 ubql-e3 II 2,330,624 F43C11.8 ubql-e3 II 2,333,396 F43C11.7 ubql-e3 II 2,336,796 F43C11.9  II 2,339,351 F43C11.6  II 2,343,009 F43C11.5 duf130 II 2,345,574 F43C11.4 duf130 II 2,348,524 F43C11.10  II 2,350,975 F43C11.3  II 2,353,839 F43C11.2 duf130 II 2,355,261 F43C11.1 duf130 II 2,357,584 F43C11.11 duf130 II 2,360,120 F43C11.12 duf130 II 2,365,867 F16G10.5  II 2,367,873 F16G10.4 duf130 II 2,369,674 F16G10.3 duf130 II 2,373,287 F16G10.2 duf130 II 2,375,633 F16G10.6 duf130 II 2,377,520 F16G10.7 duf130 II 2,379,928 F16G10.8 duf130 II 2,382,693 F16G10.9 duf130 II 2,385,469 F16G10.10 duf130 II 2,390,324 F16G10.11 duf130 II 2,392,784 F16G10.13 duf130 II 2,395,441 F29A7.3  II 2,753,579 F08D12.8 fbxb-105 II 2,770,341   101 Y110A2AL.4a  II 2,839,841 Y110A2AL.6  II 2,841,340 Y110A2AL.7  II 2,842,897 Y110A2AL.1  II 2,875,860 T11F1.3  II 2,955,703 ZC239.8 sri-51 II 3,198,351 ZC239.9 sri-48 II 3,200,257 ZC239.10 sri-53 II 3,201,907 ZC239.19 sri-50 II 3,204,624 ZC239.12 btb II 3,208,305 ZC239.6 btb II 3,210,123 ZC239.5 btb II 3,212,757 ZC239.4 btb II 3,214,931 ZC239.3 btb II 3,216,336 ZC239.2 btb II 3,218,444 ZC239.14 btb II 3,221,697 ZC239.13 btb II 3,222,750 ZC239.15 btb II 3,224,089 ZC239.16 btb II 3,225,023 ZC239.17 btb II 3,225,676 ZC239.1  II 3,227,772 pseudogene C17F4.4 srh-297 II 3,229,699 C17F4.8  II 3,232,840 C17F4.10 srz-67 II 3,236,542 C17F4.9 srz-68 II 3,239,101 pseudogene C17F4.3  II 3,241,105 C17F4.7  II 3,245,573 C17F4.5 fbxc II 3,247,242 C17F4.2  II 3,249,006 F39E9.7  II 3,306,570 F39E9.5  II 3,308,326 F39E9.6 nepp II 3,311,104 F39E9.1 duf274 II 3,317,702 Y46D2A.2 duf274 II 3,322,296 Y46D2A.1 duf274 II 3,325,425 Y46D2A.3 cfam15 II 3,326,771 F14D2.6 recepL II 3,331,956 F14D2.7  II 
3,337,169 F14D2.5  II 3,340,797 F14D2.11 ubq II 3,341,568 F14D2.15 btb II 3,342,960 F14D2.13 bath-28 II 3,344,466 F14D2.12 bath-30 II 3,345,825 F14D2.14  II 3,347,163 F14D2.8 fbxo II 3,348,452 F14D2.10  II 3,352,058 F14D2.9 tc-related II 3,354,397 F14D2.4a bath-29 II 3,354,580 F14D2.2 duf278 II 3,359,468 F14D2.1 bath-27 II 3,360,783 Y49F6C.6  II 3,364,649 Y49F6C.7  II 3,367,723 Y49F6C.8  II 3,369,682 Y49F6C.5 bath-23 II 3,371,723 F19B10.9 tbx-18 II 3,672,798 F19B10.2  II 3,676,144   102 F19B10.10 cfam19 II 3,679,013 F19B10.1  II 3,683,812 F19B10.12 srx-99 II 3,686,537 pseudogene F40H7.4 srx-101 II 3,690,101 F40H7.3 srx-100 II 3,692,502 pseudogene T24E12.5  II 3,755,628 T24E12.4 srx-111 II 3,758,520 F54D10.7 ubq II 3,820,745 F53C3.7  II 3,895,533 Y14H12A.1  II 3,993,551 W03C9.5  II 11,967,252 Y17G7B.11  II 12,061,375 F49C5.6 str-223 II 12,564,965 Y46G5A.7 fbxb II 12,755,280 Y46G5A.8 fbxb II 12,756,890 E01G4.5  II 13,473,126 T12B5.11 fbxa-67 III 949,918 T12B5.4 fbxa-11 III 951,456 T12B5.3 fbxa-10 III 953,161 T12B5.2 fbxa-54 III 957,063 T12B5.12 fbxa-70 III 961,133 T12B5.1 fbxa-51 III 962,902 R06B10.1  III 969,327 Y119D3A.3 fbxa-35 III 1,296,367 Y119D3A.2 fbxa-28 III 1,298,684 Y119D3A.1 fbxa-75 III 1,304,578 Y82E9BL.13 fbxa-79 III 1,307,354 Y82E9BL.12 duf13 III 1,310,189 Y82E9BL.16 fbxa-20 III 1,312,165 Y82E9BL.14 fbxa-80 III 1,314,267 Y82E9BL.15 fbxa-19 III 1,315,955 Y82E9BL.5 cfam10 III 1,346,712 Y82E9BL.4 fbxa-25 III 1,348,584 Y82E9BL.3 cfam10 III 1,350,284 Y82E9BL.2 cfam10 III 1,352,507 Y82E9BL.1  III 1,355,199 Y82E9BR.8  III 1,358,098 Y82E9BR.9  III 1,359,502 Y82E9BR.7  III 1,361,582 Y82E9BR.20  III 1,374,462 Y82E9BR.10  III 1,375,408 Y82E9BR.11  III 1,377,211 Y82E9BR.12 fbxa-138 III 1,381,351 Y82E9BR.6  III 1,381,526 Y82E9BR.5  III 1,387,640 Y82E9BR.13  III 1,391,180 Y82E9BR.4  III 1,401,338 Y82E9BR.21  III 1,405,468 Y82E9BR.22  III 1,406,649 Y82E9BR.14b  III 1,411,187 B0524.4 duf-wsn III 1,884,902 H04J21.1  III 2,369,758 Y75B8A.31  III 12,353,130 Y75B8A.32  III 
12,356,679 Y75B8A.34 SR-unclass III 12,360,282 Y75B8A.33  III 12,364,522   103 Y49E10.6 his-72 III 12,367,798 F38A1.11  IV 1,248,206 Y69A2AR.24  IV 2,563,882 Y69A2AR.25  IV 2,567,101 Y69A2AR.13  IV 2,570,034 Y69A2AR.12  IV 2,571,508 Y69A2AR.26 nhr-242 IV 2,579,348 Y69A2AR.11  IV 2,586,164 Y69A2AR.10 revt IV 2,591,779 Y69A2AR.9  IV 2,595,463 Y69A2AR.8  IV 2,596,557 Y69A2AR.27  IV 2,597,860 Y94H6A.10  IV 2,710,391 Y46C8AL.4 clec-71 IV 3,938,688 F49F1.7 duf18 IV 4,124,638 F49F1.8  IV 4,132,560 F49F1.9 glec IV 4,137,201 F49F1.10 glec IV 4,138,758 F49F1.11 glec IV 4,140,539 F49F1.12  IV 4,142,226 F49F1.13  IV 4,143,825 pseudogene R07C12.3 clec IV 4,145,897 R07C12.2 clec IV 4,149,451 R07C12.4 clec IV 4,151,495 R07C12.1 clec IV 4,153,281 K08D10.10 clec IV 4,155,742 K08D10.9 clec IV 4,159,983 K08D10.8 scram IV 4,163,466 K08D10.7 scram IV 4,165,471 F19C7.6  IV 4,595,282 F19C7.5  IV 4,596,309 F19C7.3 fbxb IV 4,598,308 F55B11.5  IV 14,428,030 T27E7.9 srz-like IV 14,548,126 Y105C5B.13 skr-10 IV 15,947,058 Y105C5B.27  IV 16,162,201 K03D3.8  IV 16,314,466 K03D3.6  IV 16,317,566 Y116A8C.1  IV 16,901,344 R13D11.5 srab-15 V 787,858 K10C9.4  V 1,051,400 K10C9.8 str-224 V 1,057,598 F59A7.8 snfh V 2,009,954 Y19D10B.6 nepp V 2,319,701 Y19D10B.7 cfam7 V 2,322,117 F15E11.14 cfam7 V 2,323,097 F15E11.15a cfam7 V 2,325,569 F15E11.12 cfam7 V 2,326,855 F54E2.1 duf274 V 2,811,049 F54E2.5  V 2,814,059 F54E2.6 srt-34 V 2,817,377 H27D07.2 srw-141 V 2,937,781 H27D07.3 srw-143 V 2,940,381 H27D07.4 srw-137 V 2,943,092 H27D07.6 srh-87 V 2,944,858 H27D07.1 srw-128 V 2,947,442 pseudogene   104 H27D07.5 srw-122 V 2,951,284 H05B21.1 srw-126 V 2,953,568 pseudogene H05B21.2 srh-248 V 2,954,750 C50H11.5 srt-9 V 3,078,513 C50H11.14 srt-5 V 3,079,998 C50H11.4 srt-7 V 3,082,226 C50H11.3 srt-71 V 3,084,284 C50H11.2 srt-8 V 3,087,100 C50H11.1  V 3,089,994 T28A11.9 srj-8 V 3,248,305 T28A11.13 thiordx V 3,249,801 T28A11.8 str-58 V 3,253,857 pseudogene T28A11.15 srt-63 V 3,255,291 T28A11.16 duf19 V 3,258,695 
T28A11.7 srbc-5 V 3,261,056 T28A11.6 duf750 V 3,262,666 T28A11.17 nepp V 3,264,767 T28A11.18 duf19 V 3,267,381 T28A11.19 duf19 V 3,268,984 T28A11.5 duf19 V 3,270,724 T28A11.20 nepp V 3,271,969 T28A11.4  V 3,274,041 T28A11.3 duf19 V 3,275,555 T28A11.21 fbxa-64 V 3,277,435 T28A11.2 duf19 V 3,281,339 T28A11.1 str-64 V 3,283,753 F35F10.5 duf19 V 3,285,440 F35F10.7 duf976 V 3,286,742 F35F10.6 duf976 V 3,288,422 F35F10.4 duf750 V 3,292,580 F35F10.8 srx-122 V 3,296,812 F35F10.9 srbc-1 V 3,300,268 F35F10.10 duf750 V 3,304,715 F35F10.2 srbc-3 V 3,308,905 F35F10.11  V 3,311,016 F35F10.12  V 3,314,180 F35F10.1 duf750 V 3,317,032 F35F10.13 duf19 V 3,318,461 F35F10.14  V 3,319,409 C17B7.8 nepp V 3,322,256 C17B7.7 duf1258 V 3,327,764 C17B7.9 duf19 V 3,331,340 C17B7.5 duf750 V 3,336,142 C17B7.4 duf19 V 3,339,516 C17B7.10 nepp V 3,341,306 C17B7.3 duf19 V 3,343,161 C17B7.11 fbxa-65 V 3,345,781 C17B7.2 duf19 V 3,349,595 C17B7.1 str-63 V 3,351,996 C17B7.12 duf19 V 3,353,418 C04E12.6 duf976 V 3,354,917 C04E12.7 scram V 3,356,385 C04E12.5 duf750 V 3,359,694 C04E12.4 duf750 V 3,365,365 C04E12.2 duf19 V 3,369,327 C04E12.8 srx-121 V 3,371,664   105 C04E12.9 srbc-2 V 3,375,638 C04E12.10 duf750 V 3,379,587 C04E12.1 srbc-4 V 3,384,361 pseudogene C04E12.11  V 3,387,183 C04E12.12  V 3,389,956 T20D4.13 duf750 V 3,392,602 T20D4.15 duf19 V 3,393,567 T20D4.16 duf19 V 3,394,365 T20D4.17 duf19 V 3,395,578 T20D4.12 duf19 V 3,396,799 T20D4.11 duf19 V 3,398,867 T20D4.10 duf19 V 3,401,132 T20D4.9 nepp V 3,403,879 T20D4.8 nepp V 3,406,627 T20D4.7 thiordx V 3,408,974 T20D4.6  V 3,411,448 T20D4.5 duf750 V 3,414,406 T20D4.4 duf750 V 3,416,686 T20D4.3 duf750 V 3,420,324 T20D4.18 srab-21 V 3,423,521 T20D4.2 srab-22 V 3,426,881 T20D4.1 srab-20 V 3,429,530 T20D4.19 duf19 V 3,430,884 T20C4.1 srj-9 V 3,432,785 C07G3.5 str-228 V 3,510,885 C07G3.4 str-226 V 3,513,338 C07G3.3 str-227 V 3,518,821 F59B1.5 srx-94 V 3,622,856 C17E7.1 nhr-156 V 3,906,357 T20C7.2 nhr-284 V 3,907,900 T20C7.1 srx-61 V 3,909,835 K07C6.11 
srx-68 V 3,913,463 K07C6.10 srx-67 V 3,917,456 K07C6.9 srx-66 V 3,920,015 K07C6.8 srx-65 V 3,922,263 K07C6.13 srx-69 V 3,923,630 K07C6.7 srx-64 V 3,929,914 K07C6.6 srx-63 V 3,932,323 K07C6.15 srx-70 V 3,934,252 K07C6.5 cyp-35A5 V 3,937,224 K07C6.4 cyp-35B1 V 3,939,969 K07C6.3 cyp-35B2 V 3,943,149 K07C6.2 cyp-35B3 V 3,946,012 K07C6.1 srz-89 V 3,947,972 pseudogene T09H2.1 cyp-34A4 V 3,950,828 B0213.9 str-247 V 3,954,529 B0213.10 cyp-34A5 V 3,956,537 K09D9.9  V 3,993,210 K09D9.10 srx-62 V 3,995,172 K09D9.11  V 3,999,108 K09D9.12  V 4,003,568 K09D9.8 srh-8 V 4,006,355 C49G7.1 brca-like V 4,057,429 T15B7.10  V 6,813,742 C51E3.2 srsx-27 V 10,154,026 F38B7.3  V 11,552,951   106 Y75B12A.2  V 15,112,547 ZK1037.4 nhr-246 V 15,320,952 Y6E2A.8  V 15,727,763 Y6E2A.9a tricarbox-carrier V 15,730,183 T23D5.2 str-38 V 15,733,312 T23D5.3 duf130 V 15,735,850 T23D5.1 str-27 V 15,737,506 T23D5.5 str-17 V 15,741,816 pseudogene T23D5.6 str-18 V 15,743,823 T23D5.7 str-19 V 15,746,443 T23D5.8  V 15,750,523 T23D5.10 str-6 V 15,752,198 T23D5.9 str-43 V 15,754,125 T23D5.12 str-5 V 15,756,421 T23D5.11 str-8 V 15,758,297 F57A10.1 str-9 V 15,762,180 F44G3.8 fbxa-87 V 16,132,156 F11A5.8 acylt V 16,211,006 F49H6.9 srz-92 V 17,034,333 pseudogene F49H6.11 srz-94 V 17,037,162 F49H6.10 srz-93 V 17,039,714 pseudogene T19C9.1 srbc-62 V 17,221,074 T19C9.4 srh-109 V 17,222,426 T19C9.3 srh-252 V 17,225,473 T19C9.2 srh-112 V 17,228,230 T19C9.5 scp-like V 17,230,147 T19C9.6 CUB2 V 17,232,325 T19C9.8 CUB2 V 17,242,116 Y68A4B.3  V 17,244,463 Y68A4B.2 clec V 17,247,214 Y68A4B.1 clec V 17,250,421 Y61B8A.3  V 17,253,804 pseudogene Y61B8A.1 srh-116 V 17,256,607 Y61B8A.2 srh-115 V 17,259,022 K10G4.2 srw-47 V 17,263,954 K10G4.3  V 17,267,087 K10G4.6 srh-117 V 17,274,886 pseudogene K10G4.7 srh-262 V 17,279,657 pseudogene K10G4.1  V 17,282,723 K10G4.4  V 17,286,351 K10G4.9 srw-30 V 17,290,602 K10G4.8 srw-46 V 17,295,549 pseudogene K10G4.5 fbx V 17,300,703 Y61B8B.1 sri-70 V 17,306,323 Y61B8B.2  V 17,308,868 F31E9.6  V 
17,316,992 F31E9.7 srz-26 V 17,319,612 pseudogene F31E9.5 srz-58 V 17,321,408 F31E9.1 fbxa V 17,323,597 F31E9.3 fbx V 17,327,083 F31E9.4 fbxb V 17,329,822 F31E9.2 srg-44 V 17,332,281 F47H4.5  V 17,335,906 pseudogene F47H4.4 fbxa V 17,336,170 F47H4.6 fbxa V 17,339,956 F47H4.7 fbxa V 17,343,841   107 F47H4.8 fbxa V 17,346,496 F47H4.9 fbxa V 17,349,046 F47H4.10 skr-5 V 17,350,819 F47H4.1  V 17,352,023 F47H4.11 fbxa-134 V 17,354,051 F47H4.2 fbx V 17,359,324 T27C5.14 srh-96 V 17,422,304 pseudogene T27C5.10 srw V 17,432,342 F20E11.15 srbc-27 V 17,434,429 F20E11.16 srbc-28 V 17,436,678 pseudogene F20E11.2 srsx-2 V 17,439,625 F20E11.11 srh-175 V 17,442,319 pseudogene F20E11.12 srh-154 V 17,443,742 F20E11.13 srh-158 V 17,446,702 pseudogene F20E11.3 srh-160 V 17,449,569 pseudogene F20E11.8 srh-157 V 17,451,538 pseudogene F20E11.9 srh-156 V 17,454,047 pseudogene F20E11.14 srh-161 V 17,455,818 pseudogene F20E11.10 srh-203 V 17,457,973 F20E11.4 str-200 V 17,459,971 F20E11.1 srz-48 V 17,461,924 F20E11.5  V 17,464,200 F20E11.7  V 17,468,180 F20E11.6 srw-72 V 17,470,398 F08E10.1 srh-235 V 17,473,520 F08E10.8 srh-114 V 17,475,475 pseudogene F08E10.3 srh-123 V 17,477,311 F08E10.2 srbc-61 V 17,480,025 F08E10.4 srh-110 V 17,481,436 pseudogene F08E10.5 srh-253 V 17,483,929 pseudogene F08E10.6 srh-111 V 17,487,188 F08E10.7 scp-like V 17,489,394 K03D7.9  V 17,490,666 K03D7.8  V 17,493,132 K03D7.7 fbxa-102 V 17,495,320 K03D7.6 srh-118 V 17,497,815 K03D7.5  V 17,499,492 pseudogene K03D7.4 srh-261 V 17,500,773 C38D9.2  V 17,571,597 C38D9.3  V 17,581,072 C38D9.4 fbxa-133 V 17,587,911 C38D9.5 duf-wsn V 17,591,990 F57G4.5  V 17,644,968 F57G4.6  V 17,646,034 F57G4.9 tc-related V 17,646,969 F57G4.7  V 17,649,407 pseudogene F57G4.8 fbxa V 17,652,300 F59A1.5  V 17,654,479 pseudogene F59A1.9 fbxa V 17,656,958 F59A1.8 fbxa-129 V 17,658,506 F59A1.7 fbxa-108 V 17,659,910 Y94A7B.1 srh-292 V 17,805,922 Y94A7B.3 srh-291 V 17,809,625 Y94A7B.4 srh-296 V 17,814,150 Y94A7B.5 srh-298 V 17,822,511 C31G12.2 
clec V 18,189,760 F16H6.2 clec V 18,192,878 F16H6.1 clec-42 V 18,195,355 F16H6.3 duf18 V 18,199,159 F16H6.4 duf976 V 18,202,398 F16H6.5 duf976 V 18,205,191 F16H6.6  V 18,208,395 F16H6.7 duf976 V 18,210,452 F16H6.8  V 18,216,921 F16H6.9  V 18,222,266 F16H6.10  V 18,226,306 Y37H2B.1  V 18,230,180 R10E8.7  V 18,234,791 pseudogene R10E8.3  V 18,238,098 R10E8.8  V 18,242,101 R10E8.1  V 18,245,743 Y51A2A.1 clec, duf130 V 18,287,304 B0462.1  V 18,329,487 F11D11.6 clec V 18,754,286 F11D11.5 clec V 18,756,538 F11D11.3 duf274 V 18,761,700 F11D11.4 duf274 V 18,765,026 F11D11.1 clec V 18,771,813 Y17D7B.6 clec V 18,775,431 Y17D7B.5 clec V 18,778,043 C54E10.2  V 18,831,641 Y17D7A.4 cyp-33D3 V 18,837,335 Y17D7A.3a nhr-65 V 18,844,876 T26H2.3 fbxb-2 V 19,239,268 C25F9.8  V 19,398,190 C25F9.7 srw-86 V 19,401,837 C25F9.6  V 19,405,779 C25F9.5 snfh V 19,411,733 C25F9.4 snfh V 19,415,190 C25F9.9 snfh V 19,420,199 C25F9.2  V 19,426,665 C25F9.1 srw-85 V 19,433,165 M04C3.1a snfh V 19,443,365 M04C3.3  V 19,447,117 M04C3.2 snfh V 19,450,053 Y43F8B.14 snfh V 19,456,220 Y43F8B.13 snfh V 19,462,461 Y43F8B.12 snfh V 19,471,884 Y43F8B.11 duf19 V 19,474,156 Y43F8B.10 duf19 V 19,478,277 Y43F8B.9  V 19,482,756 Y43F8B.8  V 19,486,339 pseudogene Y43F8B.7  V 19,493,568 Y43F8B.6  V 19,495,973 Y43F8B.5 scp-like V 19,497,797 W04E12.6 clec-49 V 19,748,097 Y116F11B.7  V 19,848,780 Y113G7B.6 fbxa-113 V 20,201,960 Y113G7B.8 fbxb-59 V 20,203,371 Y113G7B.9 srbc-34 V 20,205,286 Y113G7B.12 revt V 20,208,787 Y113G7B.11  V 20,212,649 Y113G7B.26  V 20,223,154 Y113G7B.14 snfh V 20,225,829 Y113G7B.15 cathA V 20,229,351 F48F5.1 pphos-y V 20,440,391 F56C3.5  X 1,363,402 F47B7.5 duf130 X 3,768,290 F46F2.1 cfam19 X 15,252,260

Appendix 3. The segmentation algorithm.

The segmentation algorithm used throughout this thesis was designed and created in the C programming language by Stephane Flibotte. It is a highly efficient implementation of a bottom-up approach.
The program assumes that the log2 ratios for all probes are drawn from a normal distribution and begins by treating each probe as an individual segment. The algorithm performs t-tests for all possible mergers of adjacent segments to estimate the probability that the log2 ratios of the two segments were drawn from populations with the same mean. These P-values are stored in a heap, which prioritizes all possible mergers. The data structure is an ordered doubly linked list. The program makes the most likely merger of adjacent segments according to these P-values, calculates the mean log2 ratio of the resulting segment, calculates P-values for subsequent mergers of the new segment with its neighbours, and updates the heap. This procedure is repeated until the P-value of the next most likely merger is less than a critical value supplied by the user (typically 0.05). The program then calculates a P-value for each remaining segment using a one-sample t-test. Segments are labeled as candidate “amplifications” or “deletions” if both their mean log2 ratios and their P-values meet or exceed cutoff values supplied by the user, as described in Chapters 2, 3 and 4. Segments that do not meet both of these criteria are labeled as “normal”. If desired, the program can then merge all adjacent segments sharing the same label, recalculating the mean log2 ratios and P-values for the merged segments. The entire process takes just a few seconds for data sets of 380,000 probes like those used in this thesis.
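
The bottom-up procedure described above can be illustrated with a minimal Python sketch. It is not the C program written by Stephane Flibotte, and several details are assumptions made only for illustration: the function names (`p_same_mean`, `segment`, `label`) are hypothetical, the t-tests are replaced by z-tests that use a user-supplied noise level `sigma` (so that single-probe segments can be compared), the linked list is stored as index arrays, and a single symmetric log2 ratio cutoff is used for both amplifications and deletions.

```python
import heapq
import math


def p_same_mean(sum_a, n_a, sum_b, n_b, sigma):
    """Two-sided P-value that two segments share a mean (z-test stand-in for the t-test)."""
    z = (sum_a / n_a - sum_b / n_b) / (sigma * math.sqrt(1.0 / n_a + 1.0 / n_b))
    return math.erfc(abs(z) / math.sqrt(2.0))


def segment(log2_ratios, sigma, p_crit=0.05):
    """Bottom-up merging; returns (start, end_exclusive, mean) for each final segment."""
    n = len(log2_ratios)
    if n == 0:
        return []
    # Doubly linked list of segments, held as parallel arrays indexed by segment id.
    start = list(range(n))
    end = [i + 1 for i in range(n)]
    total = list(log2_ratios)
    prev = [i - 1 for i in range(n)]            # -1 marks "no neighbour"
    nxt = [i + 1 if i + 1 < n else -1 for i in range(n)]
    alive = [True] * n
    version = [0] * n                           # bumped on change; flags stale heap entries
    heap = []

    def push(a, b):
        p = p_same_mean(total[a], end[a] - start[a], total[b], end[b] - start[b], sigma)
        # heapq is a min-heap, so store -P to pop the most likely merger first.
        heapq.heappush(heap, (-p, a, b, version[a], version[b]))

    for i in range(n - 1):
        push(i, i + 1)

    while heap:
        neg_p, a, b, va, vb = heapq.heappop(heap)
        if not (alive[a] and alive[b]) or version[a] != va or version[b] != vb:
            continue                            # entry refers to an outdated segment
        if -neg_p < p_crit:
            break                               # next most likely merger is too unlikely
        # Merge segment b into its left neighbour a and relink the list.
        end[a] = end[b]
        total[a] += total[b]
        version[a] += 1
        alive[b] = False
        nxt[a] = nxt[b]
        if nxt[b] != -1:
            prev[nxt[b]] = a
        # Re-test the merged segment against both neighbours.
        if prev[a] != -1:
            push(prev[a], a)
        if nxt[a] != -1:
            push(a, nxt[a])

    segments = []
    i = 0                                       # segment 0 is always the head of the list
    while i != -1:
        segments.append((start[i], end[i], total[i] / (end[i] - start[i])))
        i = nxt[i]
    return segments


def label(segments, sigma, ratio_cutoff, p_cutoff):
    """Label each segment using a one-sample z-test against a mean of zero."""
    labels = []
    for s, e, mean in segments:
        z = mean / (sigma / math.sqrt(e - s))
        p = math.erfc(abs(z) / math.sqrt(2.0))
        if p <= p_cutoff and mean >= ratio_cutoff:
            labels.append("amplification")
        elif p <= p_cutoff and mean <= -ratio_cutoff:
            labels.append("deletion")
        else:
            labels.append("normal")
    return labels


# Toy data: 45 probes with a 5-probe homozygous-deletion-like drop in the middle.
ratios = [0.0] * 20 + [-2.0] * 5 + [0.0] * 20
segs = segment(ratios, sigma=0.2)
calls = label(segs, sigma=0.2, ratio_cutoff=0.5, p_cutoff=0.05)
```

In this sketch the heap holds one entry per candidate merger, and outdated entries are skipped when popped rather than removed in place, which keeps each merger to a logarithmic-time heap update. The optional final step of merging adjacent segments that share a label is omitted.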
